Visualizing U.S. Commuters

I recently analyzed the patterns of U.S. commuters and created a visualization that summarizes these patterns at the state level.

About the Data

This data is from the Census Transportation Planning Products (CTPP). For those who don’t know, the CTPP is derived from the 2006–2010 5-year American Community Survey (ACS) data. You can learn more about this data set by visiting the home page. To make this easy to reproduce, you can download the CSV used in this analysis. I chose this data source because I have used it in the past and am familiar with it.

Steps Taken

I created a directed network graph from this data using Python’s networkx and pandas packages; the complete source code is provided below. I excluded Puerto Rico from the data set because I wanted to analyze state-level patterns. I left the District of Columbia in, so there are 51 “states” in the analysis.

I wanted to use Gephi to generate the visualization because I have done so in the past. After creating the directed network graph in Python, I imported the data into Gephi and ran its community detection algorithm with the default settings (Randomized checked, Use weights checked, Resolution = 1.0). This detected 7 communities.
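For readers who would rather stay in Python, a rough analogue of this step can be sketched with networkx’s Louvain implementation. This is a stand-in for illustration, not the Gephi run described above: the toy edge list and node names are made up, `seed` is there only for repeatability, and the resulting partition need not match what Gephi produces.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy directed graph with hypothetical nodes and weights (not the CTPP data).
toy_edges = [
    ("A", "B", 100), ("B", "A", 80), ("B", "C", 90), ("C", "A", 70),
    ("D", "E", 120), ("E", "F", 95), ("F", "D", 85),
    ("C", "D", 1),  # one weak link between the two clusters
]
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from(toy_edges)

# Weighted modularity optimization; resolution=1.0 mirrors the setting named above.
communities = louvain_communities(G_toy, weight="weight", resolution=1.0, seed=42)
for i, members in enumerate(communities):
    print(i, sorted(members))
```

Each element of `communities` is a set of node labels, so every node lands in exactly one group.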

I grouped the states by community, colored them accordingly, and placed them in a circular layout with straight edges between the nodes. I varied the edge width based on the edge weight.
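The same idea can be sketched outside Gephi. The snippet below orders nodes so that community members sit together on a circle and scales edge widths linearly with weight. The graph, the `community_of` mapping, and `MAX_WIDTH` are all hypothetical, and the linear scaling is an assumption, not necessarily what Gephi does internally.

```python
import networkx as nx

# Hypothetical graph and community assignment, for illustration only.
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from(
    [("A", "B", 50), ("B", "C", 200), ("C", "D", 10), ("D", "A", 125)]
)
community_of = {"A": 0, "C": 0, "B": 1, "D": 1}

# Order nodes by community first, then by name, so groups are contiguous.
ordered = sorted(G_toy.nodes(), key=lambda n: (community_of[n], n))

# circular_layout accepts a list of nodes and places them around a circle in order.
pos = nx.circular_layout(ordered)

# Scale widths linearly so the heaviest edge gets MAX_WIDTH (an assumed value).
MAX_WIDTH = 5.0
max_weight = max(w for _, _, w in G_toy.edges(data="weight"))
widths = {
    (u, v): MAX_WIDTH * w / max_weight
    for u, v, w in G_toy.edges(data="weight")
}
print(ordered)
print(widths)
```

With matplotlib installed, `pos` and the width values could be handed to `nx.draw_networkx_edges` through its `width` argument.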


Commuters Network


A couple of things stand out. The first is the proliferation of edges in this network. Almost every state is connected to every other state, which produces the “spirograph”-like effect in the visualization. However, I don’t find that to be the most interesting aspect.

The sub-networks highlighted in this visualization are particularly interesting. For example, one readily sees that a lot of people in New Jersey work in New York. As a former New Yorker, I find no surprises there. You also see the Capital Beltway connections in the visualization: residents of Maryland and Virginia find work in D.C. The community detection algorithm highlights this finding. Since the states are arranged by “community”, the cross-community connections are interesting to see. For example, Connecticut and New York are connected.
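Cross-community connections like these can also be pulled out programmatically. The following is a minimal sketch, assuming a node-to-community mapping is available (for example, exported from Gephi); the node labels, weights, and `community_of` dictionary are placeholders, not values from the CTPP data.

```python
import networkx as nx

# Hypothetical flows and community labels, for illustration only.
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from(
    [("A", "B", 300), ("B", "A", 40), ("C", "A", 150), ("C", "D", 20), ("B", "D", 60)]
)
community_of = {"A": 0, "B": 0, "C": 1, "D": 1}

# Keep only edges whose endpoints belong to different communities.
cross = [
    (u, v, w)
    for u, v, w in G_toy.edges(data="weight")
    if community_of[u] != community_of[v]
]
# Heaviest cross-community flows first.
cross.sort(key=lambda edge: edge[2], reverse=True)
print(cross)
```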

Source Code

The following is how I generated the directed network graph file for Gephi from the CSV.

import pandas as pd
import networkx as nx

df = pd.read_csv('Table 1 Commuting Flows Available at State to POW State and County to POW County only.csv', skiprows=[0,1], thousands=',')
# Only pull the Estimates
df = df[df['Output']=='Estimate']
# Only pull the first 4 columns
df = df[df.columns[:4]]
# Drop the N/A's
df = df.dropna()
# Drop the Output column
df = df.drop(columns='Output')
# Rename the columns
df.columns = ['from','to','weight']
# Drop the Puerto Rico records
df = df[df['from'] != 'Puerto Rico']
df = df[df['to'] != 'Puerto Rico']
# Remove the people who work where they live
commuters = df[df['from'] != df['to']]
# Build the network graph
G = nx.from_pandas_edgelist(commuters, 'from', 'to', ['weight'], create_using=nx.DiGraph())
# Write the graph so I can use Gephi
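The listing stops at the comment above, so the export call itself is not shown. One plausible way to do it, assuming GEXF (a format Gephi can open) and a made-up file name, is networkx’s `write_gexf`. The small graph here is a placeholder so the snippet runs on its own; in the script above the graph would be `G`.

```python
import os
import tempfile

import networkx as nx

# Placeholder graph with hypothetical nodes; in the script above this would be G.
G_demo = nx.DiGraph()
G_demo.add_weighted_edges_from([("A", "B", 10), ("B", "A", 3)])

# Hypothetical output location and file name.
out_path = os.path.join(tempfile.gettempdir(), "commuters.gexf")
nx.write_gexf(G_demo, out_path)

# Read the file back as a quick sanity check on the export.
G_back = nx.read_gexf(out_path)
print(sorted(G_back.edges()))
```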
