Even the most casual business and media observers know we are awash in an ever-rising, welcome tidal wave of big data. Across a growing number of enterprises and organizations, there is both a belief in and a call to apply predictive models to big data, in search of sharper cause-and-effect insights and previously hidden opportunities.
To be sure, big data analysis has had many important impacts, one of the most famous being Google’s search engine. But there are voices that raise notes of caution as well. In a recent (April 6, 2014) New York Times Op-Ed column, Gary Marcus and Ernest Davis caution readers about the pitfalls of thinking that virtually any problem can be resolved just by crunching large amounts of data with state-of-the-art algorithms.
Here are three potentially confounding aspects of big data analysis to be aware of:
- Big data analysis excels at revealing underlying correlations. Because of the scope of the data sets, many correlated variables can be revealed, sometimes subtle ones. But the analytics never reveal which of these correlations are truly meaningful. For example, a big data analysis might show that from 2006 to 2011, the sharp decline in the US murder rate was strongly correlated with Internet Explorer’s market share. Is that a causal relationship or a spurious correlation?
The advice: be cautious about being over-reliant on output that flags correlations but reveals virtually nothing else about a relationship.
- Reliance on big data statistical analysis, no matter how powerful, is limiting. Additional steps are needed to understand the relationship between the correlated items. For example, from 1998 to 2007, the number of new autism cases and sales of organic foods rose sharply. Does this trend reflect a causal relationship? Perhaps, but it would have to be verified or refuted by further analysis to understand the relationship between eating habits and autism.
The cautionary note: conduct further research to shed light on the nature of the relationship revealed by big data analytics before acting on it.
- Tools based on big data can eventually grow less reliable without warning. For example, big data metrics that rely on web hits frequently change over time. They often merge web hit data aggregated in different ways and with different objectives, sometimes leading to erroneous conclusions.
The watch-out: be sure of the underlying data sourcing methodology and how it may have changed over time.
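The first caution above can be made concrete with a small sketch. It builds two series from synthetic, made-up numbers (not the murder-rate or browser data mentioned earlier) that both decline over time but are otherwise unrelated. The names `series_a`, `series_b`, `pearson`, and `differences` are hypothetical and chosen only for illustration. The raw levels correlate strongly simply because both series trend downward. The month-to-month changes, which remove the shared trend, should show little relationship.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def differences(xs):
    """Period-to-period changes, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
months = range(120)

# Assumption: two invented metrics that each fall steadily over time,
# with independent random noise. Neither one influences the other.
series_a = [100 - 0.5 * t + rng.gauss(0, 3) for t in months]
series_b = [60 - 0.3 * t + rng.gauss(0, 3) for t in months]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw levels:      {r_levels:.2f}")
print(f"correlation of monthly changes: {r_changes:.2f}")
```

A high figure on the first line and a much weaker one on the second suggests that the shared trend, not any link between the two metrics, is doing the work. Differencing is only one of several possible follow-up checks, and passing it does not establish causation either.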
Be realistic about big data. It can be an important resource for analysis and discovery, but be aware that it can sometimes be less robust than originally intended, especially with regard to predictability of consumer behavior.