Why I Am Not Satisfied With Big Data

Today I was listening to EconTalk’s episode featuring Gary Marcus on the future of AI and the brain, and Gary put into words a thought I have been having for a while about Big Data. You see, Big Data is really about correlation: this thing is related to this other thing. Gary pointed out:

And correlation can only get you so far. And usually correlations are out there in the world because there are causal principles that make them true. But if you only pick up on the correlation rather than the causal principle, then you are wrong in the cases where maybe there is another principle that applies or something like that. And so, statistical correlations are good guides, but they are not great guides.

I find this remarkably insightful! Correlation is not causation. This needs to be part of the data scientist's mantra. Without this understanding, you can end up putting too much trust in conclusions that rest on shaky ground.
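To make Gary's point concrete, here is a minimal simulation sketch. The variables and causal structure are hypothetical, chosen only for illustration: a hidden factor Z drives both X and Y, while X has no effect on Y at all. Looking only at observational data, X and Y come out strongly correlated. If we instead set X ourselves (an intervention, independent of Z), the correlation should essentially disappear, because the "principle" behind it was Z all along.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def observe(n, rng):
    """Observational data under a hypothetical model: Z causes X and Y; X does not cause Y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                # hidden common cause
        xs.append(z + rng.gauss(0, 0.3))   # X depends only on Z plus noise
        ys.append(z + rng.gauss(0, 0.3))   # Y depends only on Z plus noise
    return xs, ys


def intervene(n, rng):
    """Same model, but we set X ourselves at random, cutting its link to Z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1))         # X no longer depends on Z
        ys.append(z + rng.gauss(0, 0.3))   # Y is unchanged: still depends only on Z
    return xs, ys


rng = random.Random(0)
r_obs = pearson(*observe(10_000, rng))
r_int = pearson(*intervene(10_000, rng))
print(f"observed correlation:         {r_obs:.2f}")
print(f"correlation when we set X:    {r_int:.2f}")
```

Under these assumptions the observed correlation should land around 0.9, while the interventional one should sit near zero. A model trained only on the observational data would confidently (and wrongly) predict that pushing X up pushes Y up.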

The Big Data world seems slow to admit this limitation. I recently read Chris Anderson’s 2008 Wired article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” It is a great example of misguided trust in correlation alone. Its argument boils down to this: there’s no need to understand the deeper why, because we have discovered a strong correlation, and that is sufficient.

Yeah, correlation is nice, but I would much rather have a causal principle, and if Big Data can’t deliver that, I don’t think I will be satisfied.
