
Visualizing U.S. Commuters

I recently analyzed the patterns of U.S. commuters and created a visualization that summarizes them at the state level.

About the Data

This data is from the Census Transportation Planning Products (CTPP). For those who don’t know, the CTPP is derived from the 2006–2010 5-year American Community Survey (ACS) data. You can learn more about this data set by visiting the CTPP home page. To make this easily reproducible, you can download the CSV used in this analysis. I chose this data source because I have used it in the past and am familiar with it.

Steps Taken

I created a directed network graph from this data using Python’s networkx and pandas packages; the complete source code is provided below. I excluded Puerto Rico from the data set because I wanted to analyze state-level patterns. I did leave the District of Columbia in, so there are 51 “states” in the analysis.

I chose Gephi for the visualization because I have used it in the past. After creating the directed network graph in Python, I imported it into Gephi and ran the community detection algorithm (Modularity) with the default settings (Randomized checked, Use weights checked, Resolution = 1.0). This detected 7 communities.
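Gephi’s Modularity statistic is based on the Louvain method. For anyone who would rather stay in Python, here is a minimal sketch using `louvain_communities` from recent versions of networkx (3.x) on a small, hypothetical set of flows (not real CTPP figures). The results will not necessarily match Gephi’s exactly: `resolution=1.0` mirrors the setting above, and `seed` roughly corresponds to fixing the randomized node order.

```python
import networkx as nx

# Hypothetical toy flows (origin, destination, workers); not real CTPP figures.
flows = [
    ("New Jersey", "New York", 100),
    ("New York", "New Jersey", 40),
    ("Connecticut", "New York", 60),
    ("New York", "Connecticut", 20),
    ("New Jersey", "Connecticut", 10),
    ("Maryland", "District of Columbia", 90),
    ("Virginia", "District of Columbia", 80),
    ("Maryland", "Virginia", 30),
    ("Virginia", "Maryland", 25),
    ("New Jersey", "Maryland", 1),  # weak link between the two groups
]

G = nx.DiGraph()
G.add_weighted_edges_from(flows)  # stores each count in the 'weight' attribute

# Louvain community detection on the weighted, directed graph.
# resolution=1.0 matches Gephi's default; seed fixes the random node order.
communities = nx.community.louvain_communities(
    G, weight="weight", resolution=1.0, seed=42
)

for i, members in enumerate(communities):
    print(i, sorted(members))

# Modularity score of the partition that was found
print(round(nx.community.modularity(G, communities, weight="weight"), 3))
```

On this toy graph the heavy New York area and Capital Beltway flows should end up in separate communities, since the single weak link between them contributes almost nothing to modularity.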

I grouped the states by community, colored each group, and placed them in a circular layout with straight edges between the nodes. Edge width varies with edge weight.
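The layout idea can be sketched outside Gephi: sort the nodes by community so each group occupies a contiguous arc of the circle, then map edge weights linearly onto a range of line widths. The function names, input formats, and the 0.5 to 8.0 width range below are hypothetical choices for illustration, not Gephi settings.

```python
import math

def grouped_circular_layout(membership):
    """Place nodes evenly on a unit circle so each community forms a contiguous arc.

    membership: dict mapping node -> community id (hypothetical input format).
    Returns a dict mapping node -> (x, y).
    """
    # Sort by community first, then by name, so members of a community sit side by side
    ordered = sorted(membership, key=lambda node: (membership[node], node))
    count = len(ordered)
    return {
        node: (math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count))
        for i, node in enumerate(ordered)
    }

def edge_widths(weights, min_width=0.5, max_width=8.0):
    """Map edge weights linearly onto a line-width range (range values are arbitrary)."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all weights are equal
    return {
        edge: min_width + (w - lo) / span * (max_width - min_width)
        for edge, w in weights.items()
    }

# Hypothetical communities and flows
membership = {"New York": 0, "New Jersey": 0, "Maryland": 1, "Virginia": 1}
pos = grouped_circular_layout(membership)
widths = edge_widths({
    ("New Jersey", "New York"): 100,
    ("Virginia", "Maryland"): 25,
    ("Maryland", "New York"): 5,
})
print(pos)
print(widths)
```

A linear scale keeps the heaviest flows visually dominant; with very skewed weights a logarithmic scale might be easier to read.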

Visualization

Commuters Network

Observations

A couple of things stand out. The first is the sheer number of edges in this network: nearly all of the states are connected to one another, which produces the “spirograph”-like effect in the visualization. However, I don’t find that to be the most interesting aspect.
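One way to put a number on “nearly all of the states are connected” is graph density. For a directed graph with n nodes and m edges, density is m / (n(n − 1)), so 51 states allow at most 51 × 50 = 2,550 directed state-to-state edges. The sketch below uses networkx’s `density` on a small hypothetical graph; applying it to the commuter graph `G` from the source code below would give the actual figure.

```python
import networkx as nx

def max_directed_edges(n):
    """Number of possible directed edges between n nodes, excluding self-loops."""
    return n * (n - 1)

print(max_directed_edges(51))  # 2550 possible state-to-state flows

# Hypothetical 4-node example: 9 of the 12 possible directed edges are present
G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C"),
    ("C", "B"), ("A", "D"), ("D", "A"), ("B", "D"),
])
print(nx.density(G))  # 9 / 12 = 0.75
```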

The sub-networks highlighted in this visualization are particularly interesting. For example, one readily sees that a lot of people in New Jersey work in New York; as a former New Yorker, I find no surprise there. You also see the Capital Beltway connections: residents of Maryland and Virginia find work in D.C., and the community detection algorithm highlights this finding. Since the states are arranged by community, the cross-community connections are also interesting to see. For example, Connecticut has a connection to New York.
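Patterns like these can also be pulled out programmatically: sort the edges by weight to find the heaviest flows, and flag edges whose endpoints fall in different communities. A minimal sketch, assuming a weighted `DiGraph` like the one built in the source code below and a node-to-community mapping; the helper names, numbers, and community ids are hypothetical.

```python
import networkx as nx

def heaviest_flows(G, k=3):
    """Return the k edges with the largest 'weight' as (origin, destination, weight)."""
    edges = G.edges(data="weight")
    return sorted(edges, key=lambda e: e[2], reverse=True)[:k]

def cross_community_edges(G, membership):
    """Return edges whose endpoints belong to different communities."""
    return [
        (u, v, w) for u, v, w in G.edges(data="weight")
        if membership[u] != membership[v]
    ]

# Hypothetical flows and community assignments
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("New Jersey", "New York", 100),
    ("Maryland", "District of Columbia", 90),
    ("Virginia", "District of Columbia", 80),
    ("Connecticut", "New York", 60),
])
membership = {
    "New Jersey": 0, "New York": 0, "Maryland": 1,
    "Virginia": 1, "District of Columbia": 1, "Connecticut": 2,
}

print(heaviest_flows(G, k=2))
# [('New Jersey', 'New York', 100), ('Maryland', 'District of Columbia', 90)]
print(cross_community_edges(G, membership))
# [('Connecticut', 'New York', 60)]
```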

Source Code

The following is how I generated the directed network graph file for Gephi from the CSV.

import pandas as pd
import networkx as nx

df = pd.read_csv('Table 1 Commuting Flows Available at State to POW State and County to POW County only.csv', skiprows=[0,1], thousands=',')
# Only pull the Estimates
df = df[df['Output']=='Estimate']
# Only pull the first 4 columns
df = df[df.columns[:4]]
# Drop the N/A's
df = df.dropna()
# Drop the Output column
df = df.drop(columns='Output')
# Rename the columns
df.columns = ['from','to','weight']
# Drop the Puerto Rico records
df = df[df['from'] != 'Puerto Rico']
df = df[df['to'] != 'Puerto Rico']
# Remove the people who work where they live
commuters = df[df['from'] != df['to']]
# Build the network graph
G = nx.from_pandas_edgelist(commuters, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph())
# Write the graph so I can use Gephi
nx.write_gexf(G,'Commuters.gexf')

U.S. Labor Markets: A Network Approach

I have been busy performing a network analysis to identify labor markets. I previously did this for Florida and thought it would be interesting to try it with the whole United States.

Network Analysis

I used census commuting data to build my network, then used Gephi to analyze the network graph. I came up with 71 labor markets. Here is a visualization of the network:

graph

Findings

I translated the communities discovered from the graph into the following map (for those wishing to know more please visit my GitHub repository):
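To turn communities into a map, the node-to-community assignment has to be joined to geographic boundaries. Here is a minimal sketch of the first half of that, flattening the communities into a table with pandas; the column names, helper name, and sample county names are hypothetical, and the join to county boundaries is not shown.

```python
import pandas as pd

def partition_to_frame(communities):
    """Flatten a list of node sets into a node -> market table."""
    rows = [
        {"node": node, "market": market_id}
        for market_id, members in enumerate(communities)
        for node in sorted(members)
    ]
    return pd.DataFrame(rows, columns=["node", "market"])

# Hypothetical communities, e.g. the output of a community detection step
communities = [{"Kings County, NY", "Bergen County, NJ"}, {"Albany County, NY"}]
table = partition_to_frame(communities)
print(table)

# With no path, to_csv returns the text; pass a file name to write it to disk instead
csv_text = table.to_csv(index=False)
print(csv_text)
```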

map

Discussion

At first blush I think I’m on to something. I live in Upstate New York and find it interesting to see the division between upstate New York (in purple) and downstate (in green). It seems quite accurate (I lived in NYC, and this conforms to my sense of where downstate ends and upstate begins). What do you think?

Caveats

A couple of things to keep in mind with this map. The first is that it is based on a network, so a “six degrees of separation” effect underlies it. Look at the LA area (in an admittedly ugly yellow-brown color). That region includes:

  • Southern California
  • Arizona
  • Hawaii, and
  • Parts of Nevada, Utah, and New Mexico.

How can Utah be connected with Hawaii? People in southern Utah can be connected with people in Las Vegas, Las Vegas with eastern California, eastern California with western California, and western California with Hawaii. You can see this in the visualization of the graph above (look for chains of nodes). So some of these far-flung empires are the product of such chained connections.
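The chaining effect is easy to reproduce on a toy graph: two places with no direct commuter link can still be joined through a sequence of intermediate links. The node names below are simplified stand-ins for the regions in the paragraph, and the links themselves are hypothetical.

```python
import networkx as nx

# Hypothetical undirected links between neighboring commuter sheds
G = nx.Graph()
G.add_edges_from([
    ("Southern Utah", "Las Vegas"),
    ("Las Vegas", "Eastern California"),
    ("Eastern California", "Western California"),
    ("Western California", "Hawaii"),
])

print(G.has_edge("Southern Utah", "Hawaii"))  # False: no direct link
print(nx.shortest_path(G, "Southern Utah", "Hawaii"))
# ['Southern Utah', 'Las Vegas', 'Eastern California', 'Western California', 'Hawaii']
```

A community detection algorithm can pull both ends of such a chain into the same community even though they share no commuters directly.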

The other thing to keep in mind is that the borders are fuzzy, not hard. One of my primary motivations for doing this in the first place was to see if I could tease out labor markets, which may or may not follow political boundaries. I like seeing Connecticut and part of New Jersey joined with New York City. It makes total sense. However, this is not to say that no one in Connecticut works in the Boston area. Some people do, because the boundaries are not hard.
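One way to see how fuzzy a border is: for a given place, compute what share of its outbound commuters go to each market. A minimal sketch with hypothetical numbers and a hypothetical helper name; a place split between two markets shows substantial shares for both.

```python
from collections import defaultdict

def market_shares(outflows, membership):
    """Share of a place's outbound commuters going to each market.

    outflows: dict destination -> number of commuters (hypothetical format)
    membership: dict destination -> market id
    """
    totals = defaultdict(float)
    for destination, workers in outflows.items():
        totals[membership[destination]] += workers
    grand_total = sum(totals.values())
    return {market: count / grand_total for market, count in totals.items()}

# Hypothetical outbound flows from one Connecticut county
outflows = {"New York": 600, "Westchester": 150, "Boston": 250}
membership = {"New York": "NYC", "Westchester": "NYC", "Boston": "Boston"}
print(market_shares(outflows, membership))  # {'NYC': 0.75, 'Boston': 0.25}
```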

Further Work

Now that I have these markets identified, I think it would be interesting to see if I could tease out some specializations. Since each market represents a network of people, and knowledge spreads through networks, it would be interesting to see where the knowledge base is deepest. The New York City market could be highly specialized in finance, for example. What other specializations occur?
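A standard way to measure specialization, though not one used in this post, is the location quotient (LQ): the share of a market’s employment in an industry divided by that industry’s share of national employment. An LQ above 1 suggests the market is more concentrated in that industry than the nation as a whole. A minimal sketch; the figures are hypothetical.

```python
def location_quotient(local_industry, local_total, national_industry, national_total):
    """Location quotient: (local industry share) / (national industry share)."""
    local_share = local_industry / local_total
    national_share = national_industry / national_total
    return local_share / national_share

# Hypothetical: finance is 10% of a market's jobs but 5% of jobs nationally
print(location_quotient(1_000, 10_000, 5_000, 100_000))  # 2.0
```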

Another thing that would be interesting is to apply a contagion model to unemployment. Does a change in unemployment in one market “infect” its neighbors and move their employment levels in the same direction?
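As a very rough starting point, and not a model from this post, one could treat unemployment as diffusing along commuter links: at each step, a market’s rate moves part of the way toward the commuter-weighted average of the markets it sends workers to. The update rule, the `alpha` parameter, the helper name, and the numbers are all hypothetical assumptions.

```python
def diffuse_step(rates, flows, alpha=0.5):
    """One step of a simple linear diffusion of unemployment rates.

    rates: dict market -> unemployment rate
    flows: dict (origin, destination) -> commuters
    alpha: fraction of the gap to the neighbor average closed per step (assumed)
    """
    new_rates = {}
    for market, rate in rates.items():
        links = {dst: w for (src, dst), w in flows.items() if src == market}
        total = sum(links.values())
        if total == 0:
            new_rates[market] = rate  # no outbound commuters: rate unchanged
            continue
        neighbor_avg = sum(rates[dst] * w for dst, w in links.items()) / total
        new_rates[market] = rate + alpha * (neighbor_avg - rate)
    return new_rates

# Hypothetical markets, rates, and flows
rates = {"A": 0.04, "B": 0.10, "C": 0.06}
flows = {("A", "B"): 300, ("A", "C"): 100, ("B", "A"): 50}
print(diffuse_step(rates, flows))
```

Here market A moves from 4% toward the 9% weighted average of B and C, ending near 6.5%; a real contagion model would need to be fit to observed unemployment data.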

I would also like to put together some dot maps showing the working population in these markets.

Commuter Network Analysis to Regionalize Florida

Recently I wanted to organize the 67 counties in Florida into regions that made sense, and I wanted those regions to be based on data.

I decided I could use commuter data from the Census Transportation Planning Package (CTPP) to create a network graph. It would show the flow of workers 16 years and older from the county they reside in to the county they work in. The idea is that counties that share commuters are economically linked.

I used Gephi to create the network graph, a weighted directed graph. I had Gephi calculate the modularity using the default settings, which broke the state up into 7 regions.

Florida Network

I then exported the data out of Gephi, pulled it into R, and created a quick map to visualize the regions. At first blush these regions seem to make sense, so this may be a good approach to use in the future.

Florida Regions