Predicting Titanic Survivors

Udacity's Intro to Data Science course has a project that involves predicting whether a passenger on the Titanic survived. To complete the task successfully you need an accuracy rate higher than 80%.

The following is the heuristic that I programmed. I can’t take credit for it, as I got my inspiration from Hal Varian’s paper. The heuristic has an 80.47% accuracy rate.

# Assume they aren't a survivor by default
survivor = False
# Prediction model variables
passenger_id = passenger["PassengerId"]
sex = passenger["Sex"]
pclass = passenger["Pclass"]
age = passenger["Age"]
sibsp = passenger["SibSp"]
# Let's find the survivors
if sex == "female" and pclass <= 2:
    survivor = True
elif sex == "male" and pclass > 1 and age <= 9 and sibsp <= 2:
    survivor = True
# Set the prediction for the passenger
if survivor:
    predictions[passenger_id] = 1
else:
    predictions[passenger_id] = 0
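
The snippet above only runs inside the course's loop, where `passenger` and `predictions` already exist. Here is a rough, self-contained sketch of the same rules that also checks accuracy. The function names `predict_survivor` and `accuracy`, the `Survived` key, and the four sample rows are my own assumptions for illustration; the rows are made up and are not real Titanic data, so the accuracy they produce says nothing about the 80.47% figure.

```python
# Sketch only: the same heuristic wrapped in a function, plus an accuracy check.
# predict_survivor, accuracy, and the "Survived" key are illustrative assumptions.

def predict_survivor(passenger):
    """Return 1 if the heuristic predicts survival, else 0."""
    sex = passenger["Sex"]
    pclass = passenger["Pclass"]
    age = passenger["Age"]
    sibsp = passenger["SibSp"]

    # Women in first or second class
    if sex == "female" and pclass <= 2:
        return 1
    # Young boys in second or third class with few siblings aboard.
    # A missing age (None) never satisfies this rule.
    if sex == "male" and pclass > 1 and age is not None and age <= 9 and sibsp <= 2:
        return 1
    return 0


def accuracy(passengers):
    """Fraction of passengers whose prediction matches their "Survived" value."""
    correct = sum(predict_survivor(p) == p["Survived"] for p in passengers)
    return correct / len(passengers)


# Made-up rows for illustration; NOT real Titanic records.
sample = [
    {"PassengerId": 1, "Sex": "female", "Pclass": 1, "Age": 30, "SibSp": 0, "Survived": 1},
    {"PassengerId": 2, "Sex": "male", "Pclass": 3, "Age": 5, "SibSp": 1, "Survived": 1},
    {"PassengerId": 3, "Sex": "male", "Pclass": 1, "Age": 40, "SibSp": 0, "Survived": 0},
    {"PassengerId": 4, "Sex": "female", "Pclass": 3, "Age": 22, "SibSp": 0, "Survived": 1},
]

predictions = {p["PassengerId"]: predict_survivor(p) for p in sample}
print(predictions)
print(accuracy(sample))
```

Passenger 4 in the made-up sample is a third-class woman, which the rules predict as a non-survivor, so she shows the kind of case the heuristic gets wrong.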

Insight from Unexpected Places

I was working through Udacity’s Intro to Data Science course and came to the first exercise, which involves predicting whether a passenger on the Titanic survived based on the passenger's characteristics (age, sex, etc.).

Well, I got hung up because I don’t know enough Python syntax to do the programming (I guess I need to finish my Codecademy Python course first). So I decided to switch gears and explore how economists are using Big Data.

I was reading a working paper by Hal Varian, since he is partially responsible for getting me on this path. The paper is a high-level overview of Big Data. It was in the 10 o’clock hour and my brain was spent, so I stopped (I am a morning person). I was flipping ahead to see how many pages were left when I saw something I did not expect to see.

I saw a decision tree that showed how to predict whether a person was a Titanic survivor based on a few characteristics. You can get hold of the paper on the resource page of this blog. It is funny where you can find insight. Who would have thought that a paper written for economists could provide a model to predict Titanic survivors? The lesson here is a good one: computer scientists do not have a monopoly on this field.