How to Install Pandas on a Python 3.3 Windows 7 System

I have been watching the Intro to Data Science Udacity course and the instructor uses Pandas. Here are the steps I followed to install it on my system (Note: I explained how I set up Python on my system in this previous post):

  1. Install NumPy – I was unable to install it using easy_install because it couldn’t find the ATLAS and BLAS libraries on my machine, so I downloaded a binary package from Christoph Gohlke’s U.C. Irvine site instead.  I chose numpy-MKL-1.8.1rc1.win32-py3.3.exe because it matches my setup.
  2. Open a command prompt – I clicked the Start button, typed “command” in the “Search programs and files” field, and hit Enter.  There are other ways to do it.
  3. Install Pandas using easy_install – I typed “C:\Python33\Scripts\easy_install.exe pandas” and Pandas is now installed on my machine.
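
A quick way to confirm that both packages landed where Python can see them is to try importing them. The following is only a minimal sketch: the helper name installed_version is my own (hypothetical), and it assumes each package exposes a __version__ attribute, which NumPy and Pandas do.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to "unknown" for modules that do not define __version__.
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "pandas"):
    version = installed_version(name)
    print(name, version if version else "not installed")
```

If either line prints "not installed", the corresponding install step did not take and is worth repeating.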

Now I will be able to put the things I’m learning in the Udacity course into practice.

Predicting Titanic Survivors

Udacity’s Intro to Data Science course has a project that involves predicting the probability of a passenger being a survivor on the Titanic.  To complete the task successfully, your predictions need an accuracy rate higher than 80%.

The following is the heuristic that I programmed. I can’t take credit for it, as I got my inspiration from Hal Varian’s paper. The heuristic has an 80.47% accuracy rate.

# Assume they aren't a survivor by default
survivor = False
# Prediction model variables
passenger_id = passenger["PassengerId"]
sex = passenger["Sex"]
pclass = passenger["Pclass"]
age = passenger["Age"]
sibsp = passenger["SibSp"]
# Let's find the survivors
if sex == "female" and pclass <= 2:
    survivor = True
elif sex == "male" and pclass > 1 and age <= 9 and sibsp <= 2:
    survivor = True
# Set the prediction for the passenger
if survivor:
    predictions[passenger_id] = 1
else:
    predictions[passenger_id] = 0
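
The snippet above only runs inside the course's exercise, where passenger and predictions already exist. As a rough, self-contained sketch of the same rules, here they are wrapped in a function and applied to a few passengers. The function name predict_survivor and the sample rows are made up for illustration; they are not from the course or the real data set.

```python
def predict_survivor(passenger):
    """Return 1 if the heuristic predicts survival, otherwise 0."""
    sex = passenger["Sex"]
    pclass = passenger["Pclass"]
    age = passenger["Age"]
    sibsp = passenger["SibSp"]

    # Women in first or second class are predicted to survive.
    if sex == "female" and pclass <= 2:
        return 1
    # So are young boys outside first class with few siblings aboard.
    if sex == "male" and pclass > 1 and age <= 9 and sibsp <= 2:
        return 1
    return 0


# Made-up passengers for illustration only; not rows from the real data set.
passengers = [
    {"PassengerId": 1, "Sex": "female", "Pclass": 1, "Age": 30.0, "SibSp": 0},
    {"PassengerId": 2, "Sex": "male", "Pclass": 3, "Age": 5.0, "SibSp": 1},
    {"PassengerId": 3, "Sex": "male", "Pclass": 1, "Age": 40.0, "SibSp": 0},
    {"PassengerId": 4, "Sex": "male", "Pclass": 2, "Age": float("nan"), "SibSp": 0},
]

predictions = {p["PassengerId"]: predict_survivor(p) for p in passengers}
print(predictions)
```

Note the last passenger: a missing age stored as NaN never satisfies age <= 9, so that passenger falls through to the default prediction of 0.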

Able to Continue with Udacity

I just wrapped up the 10th lesson of Codecademy’s Python course.  I felt like I was getting the hang of it, so I decided to test my understanding by picking up Udacity’s Intro to Data Science course where I left off.

I am happy to announce I was able to complete the assignment and make some progress on that course.  I am starting to get the hang of Python.  I’m very excited about that.

I have also finished the Varian paper tonight.  I really liked his idea of using multiple prediction models and then averaging their results to come up with a final prediction.  I also like his idea of disclosing the uncertainty of the model in economics, similar to the way hurricane landfall forecasts disclose their uncertainty.  Economists need to be more honest about how much they really don’t know.
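
To make the averaging idea concrete for myself, here is a toy sketch in Python. It is my own illustration of the general idea, not code or numbers from Varian's paper: the three model functions and every probability in them are hypothetical.

```python
def average_predictions(models, passenger):
    """Average the survival probabilities returned by several models."""
    probabilities = [model(passenger) for model in models]
    return sum(probabilities) / len(probabilities)


# Three hypothetical toy models; each returns a made-up survival probability.
def model_by_sex(passenger):
    return 0.75 if passenger["Sex"] == "female" else 0.25


def model_by_class(passenger):
    return {1: 0.6, 2: 0.5, 3: 0.25}[passenger["Pclass"]]


def model_by_age(passenger):
    return 0.6 if passenger["Age"] <= 9 else 0.4


models = [model_by_sex, model_by_class, model_by_age]
passenger = {"Sex": "female", "Pclass": 1, "Age": 30.0}

probability = average_predictions(models, passenger)
# Turn the averaged probability into a yes/no prediction at the 0.5 mark.
prediction = 1 if probability >= 0.5 else 0
print(round(probability, 3), prediction)
```

Keeping the averaged probability around, rather than only the final 0 or 1, is also one simple way to show how unsure the combined model is.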

Insight from Unexpected Places

I was working through Udacity’s Intro to Data Science course and came to the first exercise, which involves predicting whether a passenger on the Titanic would be a survivor based on the characteristics of the passenger (age, sex, etc.).

Well, I got hung up because I don’t know enough Python syntax to do the programming (I guess I need to finish my Codecademy Python course first). So I decided to switch gears and explore how economists are using Big Data.

I was reading a working paper by Hal Varian that gives a high-level overview of Big Data; I picked it because he is partially responsible for getting me on this path. It was in the 10 o’clock hour and my brain was spent, so I stopped (I am a morning person). I was flipping ahead to see how many pages were left when I saw something I did not expect.

I saw a decision tree that showed how to predict whether a person was a Titanic survivor based on a few characteristics. You can get hold of the paper on the resource page of this blog. It is funny where you can find insight. Who would have thought a paper written for economists could provide a model to predict Titanic survivors? The lesson here is that computer scientists do not have a monopoly on this field.

Udacity’s Intro to Data Science Course

As you probably can tell by my background, I am a proponent of self-education.  I want to pass on a link to Udacity’s Intro to Data Science Course.  Udacity offers higher education courses through the internet.

I have just finished the first lesson, which provides an introduction to the field.  I would highly recommend the course to anyone interested in learning more about data science.


Update: I have created a page that will house resources for you to learn more. It is located at: https://buddingdatascientist.wordpress.com/resources/.