Today I was listening to the EconTalk episode featuring Gary Marcus on the future of AI and the brain, and it put into words a thought I have been having for a while about Big Data. You see, Big Data is really about correlation: this thing is related to this other thing. Gary pointed out:
And correlation can only get you so far. And usually correlations are out there in the world because they are causal principles that make them true. But if you only pick up on the correlation rather than the causal principle, then you are wrong in the cases where maybe there is another principle that applies or something like that. And so, statistical correlations are good guides, but they are not great guides.
I find this to be remarkably insightful! Correlation is not causation. This needs to be part of the data scientist’s mantra. Without this understanding, you can end up placing too much trust in conclusions that rest on shaky ground.
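To make the point concrete, here is a minimal sketch of a correlation that exists only because of a hidden common cause. Everything in it is made up for illustration: the variable names, the noise levels, and the `pearson` helper are my own assumptions, not anything from the episode. A hidden factor `z` drives both `x` and `y`, so they look strongly related in the observed data. Once we set `x` ourselves, independently of `z`, the relationship disappears, because `x` never caused `y` in the first place.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden cause that we never get to observe directly.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observed data: x and y each depend on z, but not on each other.
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Intervention: we choose x ourselves, ignoring z. y still depends only on z.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.5) for zi in z]

r_observed = pearson(x_obs, y_obs)
r_intervened = pearson(x_set, y_after)

print(f"correlation in observed data:   {r_observed:.2f}")
print(f"correlation after intervention: {r_intervened:.2f}")
```

In this toy setup the first number comes out high and the second comes out close to zero. A model that learned only the correlation would keep predicting `y` from `x` and be wrong the moment the underlying principle stopped applying.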
The Big Data world seems slow to admit this limitation. I recently read Chris Anderson’s 2008 article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” It is a great example of misguided trust in correlation alone. The argument goes that there’s no need to understand the deeper why: we have discovered a strong correlation, and that is sufficient.
Yeah, correlation is nice, but I would much rather have a causal principle, and if Big Data can’t deliver that, then I don’t think I will be satisfied.