Fixing FIPS Codes Mangled by Pandas

I have been working with data produced by the U.S. Census Bureau, which uses FIPS codes to identify geographies.  When I read in the data using Pandas, the FIPS codes get converted into numbers, which strips their leading zeros.  After trying to force the type to string (which doesn’t work currently), I decided to create a workaround.  Here it is:

def fix_fips(fips, total_length):
    """Left-pad a mangled FIPS code with zeros up to total_length characters."""
    # Pandas may have parsed the code as a number, so convert it back to text
    fips = str(fips)
    # rjust pads on the left and leaves codes that are already long enough unchanged
    return fips.rjust(total_length, '0')

So say I have some state-level data with a two-character FIPS code that has been read into a pandas dataframe. I would correct the mangled data with:

df['State FIPS code'] = df['State FIPS code'].apply(fix_fips, args=(2,))
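
For comparison, here is a minimal sketch of two alternatives, assuming a reasonably recent pandas release: padding the whole column with the vectorized string methods, and asking read_csv to keep the column as text through its dtype argument. The inline CSV and its column names are made up for illustration, and the dtype route is the one that did not work for me, so treat it as something to test on your own version rather than a guarantee.

```python
import io

import pandas as pd

# Made-up sample data standing in for a Census file
raw = io.StringIO(
    "State FIPS code,State\n"
    "01,Alabama\n"
    "06,California\n"
    "48,Texas\n"
)

# Default parsing infers integers, so the leading zeros are lost
mangled = pd.read_csv(raw)

# Alternative 1: repair the column afterwards with vectorized string methods
repaired = mangled["State FIPS code"].astype(str).str.zfill(2)

# Alternative 2: ask read_csv to keep the column as text from the start
raw.seek(0)
preserved = pd.read_csv(raw, dtype={"State FIPS code": str})

print(list(repaired))
print(list(preserved["State FIPS code"]))
```

The first alternative does the same job as fix_fips without a Python-level function call per row.  The second avoids the mangling altogether when it is available.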

Hope this helps other data ninjas out there!


A Couple of Great Blog Posts on Pandas

While working with Pandas today I came across two great blog posts.  The first is Greg Reda’s Intro to Pandas Data Structures.  He gives a great tutorial complete with some examples.  His writing is clear and concise.

The second is Mikhail Semeniuk’s Python Pandas Tutorial.  This post was interesting to me because of its examples of how to run regressions, which is something I will put to use.


My First Real-World Data Wrangling with Python

Today I was faced with a daunting task.  As previously mentioned, I am the developer behind Govistics, a government spending statistics tool.  We are in the process of updating some data, including the list of governments in the database.  I won’t bore you with all the details, but I decided to try out my Python skills.

The task was to update some of the names among the 94,000 records.  The script read in a CSV, loaded it into a pandas data frame, selected a subset based on a criterion, stepped through the subset making requests for JSON-encoded data, then output the results into another CSV.
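
The workflow above can be sketched roughly as follows.  This is not the actual script: the column names (gov_id, name, needs_update), the selection criterion, and the placeholder URL are all hypothetical, and the function that fetches JSON is passed in so the example can run without a network connection.

```python
import io
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(gov_id, base_url="https://example.invalid/governments/"):
    """Request JSON for one record (base_url is a placeholder, not a real API)."""
    with urlopen(base_url + gov_id) as response:
        return json.load(response)


def update_names(csv_in, csv_out, fetch):
    """Read records, refresh the selected names, and write a new CSV."""
    # dtype=str keeps identifiers from being converted into numbers
    df = pd.read_csv(csv_in, dtype=str)
    # Hypothetical criterion: only rows flagged as needing an update
    selected = df["needs_update"] == "yes"
    for idx in df.index[selected]:
        payload = fetch(df.at[idx, "gov_id"])  # expected to return a dict
        df.at[idx, "name"] = payload["name"]
    df.to_csv(csv_out, index=False)
    return df


# Stand-in for the real request so the sketch runs offline
def fake_fetch(gov_id):
    return {"name": "Updated " + gov_id}


source = io.StringIO(
    "gov_id,name,needs_update\n"
    "01001,Old A,yes\n"
    "01002,Old B,no\n"
)
result = io.StringIO()
updated = update_names(source, result, fake_fetch)
print(result.getvalue())
```

In a real run, file paths would replace the two StringIO objects and fetch_json (pointed at an actual endpoint) would replace fake_fetch.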

This was a great first real-world exercise!  Honestly, I muddled through it and followed example code found in the documentation for the most part.

But it is nice to see evidence of progress being made.  A few weeks ago, if you had told me I would be writing Python scripts to help me wrangle some data, I would have laughed.  But now it has become a reality.

I am also emboldened to create some API wrappers for some open data sources, now that I have a working example of how to make and process a JSON request.

How to Install Pandas on a Python 3.3 Windows 7 System

I have been watching the Intro to Data Science Udacity course and the instructor uses Pandas. Here are the steps I followed to install it on my system (Note: I explained how I set up Python on my system in this previous post):

  1. Install NumPy – I was unable to install it using easy_install because it couldn’t find the Atlas and Blas libraries on my machine. So I downloaded binary packages from Christoph Gohlke’s U.C. Irvine site instead.  I chose numpy‑MKL‑1.8.1rc1.win32‑py3.3.exe because it matches my setup.
  2. Open a command prompt – I clicked the start button, typed “command” in the “Search programs and files” field, and hit enter.  There are other ways to do it.
  3. Install Pandas using easy_install – I typed “C:\Python33\Scripts\easy_install.exe pandas” and it was installed on my machine.
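
After the steps above, a quick way to check the result is to ask Python whether it can import both packages.  This is a small sketch rather than part of the install itself, and the helper name is_installed is my own invention.

```python
import importlib


def is_installed(package_name):
    """Return True if the named package can be imported, False otherwise."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


# Report on the two packages installed in the steps above
for name in ("numpy", "pandas"):
    status = "found" if is_installed(name) else "missing"
    print(name + ": " + status)
```

If either package shows up as missing, the install most likely went into a different Python installation than the one running the check.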

Now I will be able to put the things I’m learning in the Udacity course into practice.