Big data – our saviour or just trying to find a needle in the haystack?
Numerous publications have described using big data analytics, but there is still misunderstanding about what it is, where it came from and why we need it.
The concept is not new. Statisticians and research scientists have been using these techniques for at least 150 years. Indeed, Florence Nightingale pioneered systematic data collection during the Crimean War to demonstrate the effects of unsanitary conditions in military hospitals.
Fundamentally, the goal of big data analytics is to filter insight, or ‘signals’, from the apparently noisy background of the data itself. Those signals can then be used to characterise an item of interest, for example the diagnostic features of a disease, or the tell-tale activity of a hacker in your network. Closer to home, one of the best examples of the successful application of big data analytics is in retail, specifically in modelling an individual’s shopping habits, as demonstrated by Amazon.
Am I making a noise?
If we had all possible data for a given problem, we would not need statistics; we could simply look at the data. In reality this rarely happens, and typically only a subset of the data is available. The question then becomes how representative this subset is of the overall population, and therefore how much you can trust the information it gives you. This is where big data analytics can help. The main issue is how to separate the portion of the data that provides insight and intelligence from the ‘noise’. Depending on the data type, statistical noise may be artefacts of the collection process, or simply chance variation that is magnified because the sample size is small. In the shopping example, how an individual shops on one day in the festive season is unlikely to represent their typical shopping pattern outside that season. This brings us to the value of big data.
The power of big data is its size. As sample sizes increase, the data become less likely to be skewed disproportionately by spurious artefacts. True signals emerge because they are observed consistently, whereas noise tends to average out. Bigger samples are also more likely to capture rare events, adding detail to the dataset. Big data analytics can be extremely powerful. However, size alone is not enough: as Florence Nightingale showed, systematic collection and analysis make it easier to coax that needle out of the haystack.
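The effect of sample size on noise can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the "daily spend" figures (`true_mean`, `noise_sd`) are made-up values, not real shopping data, and the noise is assumed to be random and roughly normal rather than a systematic collection artefact. It draws many samples of a given size and measures how much the sample average wanders from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=500, true_mean=50.0,
                           noise_sd=20.0, seed=0):
    """Return the standard deviation of the sample mean over repeated samples.

    true_mean and noise_sd are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        # One "dataset": sample_size noisy observations around the true value.
        sample = [rng.gauss(true_mean, noise_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    # A small spread means individual samples rarely mislead us.
    return statistics.stdev(means)


for n in (10, 100, 1000):
    print(f"sample size {n:>4}: spread of the average = "
          f"{spread_of_sample_means(n):.2f}")
```

Under these assumptions the spread shrinks roughly in proportion to one over the square root of the sample size, which is why a single festive shopping day is a poor guide but a year of purchases is a much better one. Note that this only holds for random noise; a bias built into the collection process does not average out, however large the sample.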
Dr Wendy Ng, CISSP, CCNP; 4th April 2017