Have you ever stumbled on something you weren’t even aware was there? Mining data is often like that. You may start looking in one direction but end up somewhere totally different. BI tools like QlikView help us to investigate data and look for trends or anomalies. The important thing is to start looking.
Most of us are already familiar with this concept when using Google to search the web. We commence our search, but very quickly find ourselves side-tracked and pursuing a totally different agenda. But is there a better framework to use than simply lurching from one path to another?
We often hear the term ‘slice and dice’, but what does this really mean when you’re faced with a screen of data? Here are some practical techniques for approaching a data investigation so that it yields information you didn’t know was there.
a) Pattern Matching and Data Integrity checks
b) Comparison and Decomposition
c) Hypothesis and Correlation
d) Categorisation
A) Pattern Matching and Data Integrity checks. One of the principles of data investigations is looking for patterns and then identifying discrepancies. Discrepancies often point to wider problems, such as dirty data, broken procedures, and loopholes. For all the would-be accountants out there, it brings out the auditor within.
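As a rough illustration of pattern matching as an integrity check, the sketch below tests each field against an expected format and lists the values that break it. The records, field names, and formats (a "C-" prefix plus four digits, a four-digit postcode) are all hypothetical, invented for this example.

```python
import re

# Hypothetical records; the field names and formats are assumptions.
records = [
    {"customer_id": "C-1001", "postcode": "2000"},
    {"customer_id": "C-1002", "postcode": "3000"},
    {"customer_id": "1003", "postcode": "4000"},    # missing "C-" prefix
    {"customer_id": "C-1004", "postcode": "40O0"},  # letter O instead of zero
]

# The pattern each field is expected to follow.
patterns = {
    "customer_id": re.compile(r"C-\d{4}"),
    "postcode": re.compile(r"\d{4}"),
}


def find_discrepancies(rows, rules):
    """Return (row index, field, value) for every value that breaks its pattern."""
    issues = []
    for index, row in enumerate(rows):
        for field, rule in rules.items():
            if not rule.fullmatch(row[field]):
                issues.append((index, field, row[field]))
    return issues


discrepancies = find_discrepancies(records, patterns)
for issue in discrepancies:
    print(issue)
```

Each flagged value is a starting point for the "why" question: is it a typing error, a system that skips validation, or a process that is being bypassed?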
B) Comparison and Decomposition. In this scenario, it often helps to start with a big picture approach and then look at the data from the top down. The idea is to start large and break down the data into progressively smaller parcels. Hence, you may commence by looking at All Revenue for Australia and comparing year-on-year. If revenue significantly rises or falls from one year to the next, you want to probe further. This may lead to looking at Revenue by State, or looking at the revenue by particular quarters. This is the process of decomposition. As the data is segmented, the field of investigation narrows, and in the process variations are thrown up that can be pursued.
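The top-down approach can be sketched in a few lines: total the revenue by year first, then break the same figures down by state to see where a change comes from. The sales figures, states, and years below are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows; all values are invented for illustration.
sales = [
    {"year": 2022, "state": "NSW", "quarter": "Q1", "revenue": 120.0},
    {"year": 2022, "state": "NSW", "quarter": "Q2", "revenue": 130.0},
    {"year": 2022, "state": "VIC", "quarter": "Q1", "revenue": 100.0},
    {"year": 2022, "state": "VIC", "quarter": "Q2", "revenue": 110.0},
    {"year": 2023, "state": "NSW", "quarter": "Q1", "revenue": 125.0},
    {"year": 2023, "state": "NSW", "quarter": "Q2", "revenue": 135.0},
    {"year": 2023, "state": "VIC", "quarter": "Q1", "revenue": 60.0},
    {"year": 2023, "state": "VIC", "quarter": "Q2", "revenue": 50.0},
]


def total_by(rows, *fields):
    """Sum revenue for each distinct combination of the given fields."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in fields)
        totals[key] += row["revenue"]
    return dict(totals)


# Step 1: the big picture, all revenue compared year-on-year.
by_year = total_by(sales, "year")
print(by_year)

# Step 2: decompose the same totals by state to locate the change.
by_year_state = total_by(sales, "year", "state")
state_change = {
    state: by_year_state[(2023, state)] - by_year_state[(2022, state)]
    for state in ("NSW", "VIC")
}
print(state_change)
```

In this invented data the national total falls, but the breakdown shows the fall comes entirely from one state. Passing "quarter" as an extra field to `total_by` would be the next level of decomposition.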
C) Hypothesis and Correlation. Another means of investigation is to commence with an idea of what you might find and then search for data to substantiate your idea. For example, you might ask: do all customers that buy teapots also buy teabags from us? The hypothesis is based on your assumption that the customers who buy the teapots are also buying teabags. You search for data to see if there is a correlation between the two. Suppose the data shows the opposite, i.e. teapot customers are not buying teabags from us. This leads you to investigate why. Do you need to speak with marketing, for example?
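A simple way to test the teapot hypothesis is to measure how many teapot buyers also appear among the teabag buyers. The sketch below uses an invented purchase list, and it computes a plain overlap rate rather than a formal correlation coefficient, which is usually enough to decide whether the idea deserves a closer look.

```python
# Hypothetical (customer, product) purchase rows, invented for illustration.
purchases = [
    ("alice", "teapot"), ("alice", "teabags"),
    ("bob", "teapot"),
    ("carol", "teapot"),
    ("dave", "teabags"),
    ("erin", "teapot"), ("erin", "teabags"),
]

# Build the set of customers for each product.
teapot_buyers = {customer for customer, product in purchases if product == "teapot"}
teabag_buyers = {customer for customer, product in purchases if product == "teabags"}

# Customers who support the hypothesis, and those who contradict it.
both = teapot_buyers & teabag_buyers
teapot_only = teapot_buyers - teabag_buyers

# Share of teapot buyers who also bought teabags.
share = len(both) / len(teapot_buyers)
print(f"{share:.0%} of teapot buyers also bought teabags")
print("Follow up with:", sorted(teapot_only))
```

The `teapot_only` set is the useful by-product: it is the list of customers to investigate when the hypothesis does not hold.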
D) Categorisation. Related to Decomposition, this approach works from the bottom up. Begin by grouping or categorising data into meaningful segments. You may group all clients from a particular industry together, or split clients into groups based on purchase volume and frequency. Once the groups have been established, you can look for interesting spikes or dips, which lead to more probing questions.
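The grouping step might look like the sketch below, which places clients into purchase-volume bands and then compares average spend per band. The client data and the band thresholds (10 and 4 orders) are arbitrary assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical clients; names, industries, and figures are invented.
clients = [
    {"name": "A", "industry": "retail", "orders": 12, "spend": 2400.0},
    {"name": "B", "industry": "retail", "orders": 3, "spend": 300.0},
    {"name": "C", "industry": "mining", "orders": 15, "spend": 9000.0},
    {"name": "D", "industry": "mining", "orders": 5, "spend": 1000.0},
    {"name": "E", "industry": "hospitality", "orders": 2, "spend": 150.0},
]


def volume_band(orders):
    """Assign a purchase-volume band; the thresholds are assumptions."""
    if orders >= 10:
        return "high"
    if orders >= 4:
        return "medium"
    return "low"


# Group clients by band.
groups = defaultdict(list)
for client in clients:
    groups[volume_band(client["orders"])].append(client)

# Compare the groups to spot spikes or dips.
avg_spend = {
    band: mean([client["spend"] for client in members])
    for band, members in groups.items()
}
print(avg_spend)
```

Swapping `volume_band(client["orders"])` for `client["industry"]` groups the same clients by industry instead, which is the other segmentation mentioned above.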
At the end of the day, the overriding premise for mining data is to keep asking “Why?” Whilst this may be a cliché, the inquisitive mind that searches will find the unknown.