Why I Am Not Satisfied With Big Data

Today I was listening to the EconTalk episode featuring Gary Marcus on the future of AI and the brain, and it put into words a thought that I have been having for a while about Big Data.  You see, Big Data is really about correlation: this thing is related to this other thing.  Gary pointed out:

And correlation can only get you so far. And usually correlations are out there in the world because they are causal principles that make them true. But if you only pick up on the correlation rather than the causal principle, then you are wrong in the cases where maybe there is another principle that applies or something like that. And so, statistical correlations are good guides, but they are not great guides.

I find this to be remarkably insightful!  Correlation is not causation.  This needs to be part of the data scientist’s mantra.  Without this understanding, you can put too much trust in what turns out to be shaky ground.
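To make the point concrete, here is a minimal sketch in Python.  It is my own toy illustration, not anything from the episode: the names `hidden`, `x_observed`, `x_forced`, and `simulate` are made up, and the numbers are simulated rather than real data.  A hidden factor drives both x and y, so the two are clearly correlated, yet setting x by hand does nothing to y.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=10_000, seed=0):
    """Return (correlation as observed, correlation after forcing x)."""
    rng = random.Random(seed)

    # Hypothetical hidden cause that we never get to measure.
    hidden = [rng.gauss(0, 1) for _ in range(n)]

    # x and y each depend on the hidden cause plus their own noise.
    # Neither one has any effect on the other.
    x_observed = [h + rng.gauss(0, 1) for h in hidden]
    y = [h + rng.gauss(0, 1) for h in hidden]

    # "Intervention": we set x ourselves, ignoring the hidden cause.
    x_forced = [rng.gauss(0, 1) for _ in range(n)]

    return pearson(x_observed, y), pearson(x_forced, y)


observed_r, forced_r = simulate()
print(f"correlation in the observed data: {observed_r:.2f}")
print(f"correlation after forcing x:      {forced_r:.2f}")
```

The first number should come out clearly positive and the second close to zero.  Someone who only saw the first correlation might conclude that changing x will move y, and in this toy world they would be wrong.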

The Big Data world seems slow to admit this limitation.  I recently read Chris Anderson’s 2008 article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.”  It is a great example of misguided trust in correlation alone.  The argument is that there is no need to understand the deeper why, because we have discovered a strong correlation and that is sufficient.

Yes, correlation is nice, but I would much rather have a causal principle.  If Big Data can’t deliver that, then I don’t think I will be satisfied.


My Odyssey into Data Science

Here’s how my story starts.  I began teaching myself programming in junior high.  As the internet grew, I taught myself how to build webpages.  I got good enough that I had a job as a web designer in high school.

I graduated from high school and entered college majoring in computer science.  I was so excited!  Then the dot-com bubble burst and, honestly, I got scared.  Would I have a job when I graduated from college?

I took an economics course as part of my general education requirements and fell in love with the field.  I switched majors and graduated with a degree in economics, but I remained a computer geek at heart.

I worked for the BLS in New York City after I graduated from college.  As our family began to grow, we wanted to find a more family-friendly place to live.  I found a job at CGR in Rochester, NY.  I worked with large amounts of data and became good at it.  I also programmed a couple of web-based products: Govistics and informANALYTICS.

I listen to the EconTalk podcast every week.  One week they interviewed Hal Varian, the chief economist for Google.  They talked about Big Data and how it was a growing field.  I began to think I should position myself to move into that field.  After all, that is what I was already doing, just at a much smaller scale.

I have turned the idea over in my mind time and time again, and I have decided that I am going to do something about it.  This is the story of my journey.