Predicting Titanic Survivors

Udacity's Intro to Data Science course has a project that involves predicting whether each passenger on the Titanic survived. To successfully complete the task, you need an accuracy rate higher than 80%.

The following is the heuristic that I programmed. I can’t take credit for this, as I got my inspiration from Hal Varian’s paper. The heuristic has an 80.47% accuracy rate.

# Assume they aren't a survivor by default
survivor = False
# Prediction model variables
passenger_id = passenger["PassengerId"]
sex = passenger["Sex"]
pclass = passenger["Pclass"]
age = passenger["Age"]
sibsp = passenger["SibSp"]
# Let's find the Survivors
if sex == "female" and pclass <= 2:
     survivor = True
elif sex == "male" and pclass > 1 and age <= 9 and sibsp <= 2:
     survivor = True
# Set the prediction for the passenger
if survivor:
     predictions[passenger_id] = 1
else:
     predictions[passenger_id] = 0
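
For anyone who wants to try the rules outside the course environment, here is a minimal, self-contained sketch of the same heuristic wrapped in a function, along with an accuracy check. The function names, the "Survived" label key, the explicit handling of a missing age, and the four sample rows are all my own assumptions for illustration; the rows are made up and are not real Titanic records.

```python
def predict_survivor(passenger):
    """Return 1 if the heuristic predicts survival, otherwise 0."""
    sex = passenger["Sex"]
    pclass = passenger["Pclass"]
    age = passenger["Age"]
    sibsp = passenger["SibSp"]

    # Women travelling in first or second class
    if sex == "female" and pclass <= 2:
        return 1
    # Young boys outside first class with few siblings/spouses aboard.
    # Assumption: a missing age (None) never matches this rule.
    if (sex == "male" and pclass > 1 and age is not None
            and age <= 9 and sibsp <= 2):
        return 1
    return 0


def accuracy(passengers):
    """Fraction of passengers whose prediction matches the "Survived" label."""
    correct = sum(predict_survivor(p) == p["Survived"] for p in passengers)
    return correct / len(passengers)


# Hypothetical rows, invented purely to exercise each branch of the rules
sample = [
    {"PassengerId": 1, "Sex": "female", "Pclass": 1, "Age": 30, "SibSp": 0, "Survived": 1},
    {"PassengerId": 2, "Sex": "male", "Pclass": 3, "Age": 5, "SibSp": 1, "Survived": 1},
    {"PassengerId": 3, "Sex": "male", "Pclass": 1, "Age": 40, "SibSp": 0, "Survived": 0},
    {"PassengerId": 4, "Sex": "female", "Pclass": 3, "Age": 22, "SibSp": 0, "Survived": 1},
]

predictions = {p["PassengerId"]: predict_survivor(p) for p in sample}
print(predictions)
print(accuracy(sample))
```

On these four invented rows the heuristic misses passenger 4 (a woman in third class), which shows the kind of case the rules trade away for simplicity. The accuracy on this toy sample says nothing about the 80.47% figure, which comes from the course data set.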


6 thoughts on “Predicting Titanic Survivors”

    • Mike Silva says:

      He explained in his paper: “One might summarize this tree by the following principle: ‘women and children first . . . particularly if they were traveling first class.’”

      I just had to translate that into Python. The decision tree gave the parameters so it wasn’t hard to do. Best of luck with the course. I know I enjoyed it!

  1. Charles Way says:

    I am curious how you arrived at the rule that passengers matching ‘sex == "male" and pclass > 1 and age <= 9 and sibsp <= 2’ will survive too. Is this mentioned in the paper, or did you just use common sense?
