Introduction to Data Mining

Data mining is the process of improving decision-making by identifying useful patterns and insights in data. It is particularly useful for revealing hidden patterns during analysis, for example, estimating how many people will be affected by a specific change. It involves examining large volumes of data from different viewpoints and summarising the data so that useful patterns and connections can be established. The results are often communicated visually through dashboards and reports. The main challenge with data mining usually lies in securing data of the right type, volume and quality to draw insights from.

The BABOK Guide highlights three variants of data mining outcomes:

Descriptive: This involves the use of techniques such as clustering to display patterns within a set of data. For example, similarities between suppliers can be displayed visually.
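As a rough illustration of clustering, the sketch below groups suppliers with a minimal k-means routine written in plain Python. The supplier names, the two attributes (delivery days and defect rate) and the choice of two clusters are all assumptions made for the example; a real exercise would normally use a statistics package and real data.

```python
import math

# Hypothetical supplier data: (average delivery days, defect rate in %).
suppliers = {
    "Supplier A": (2.0, 1.0),
    "Supplier B": (2.5, 1.5),
    "Supplier C": (3.0, 1.2),
    "Supplier D": (9.0, 6.0),
    "Supplier E": (10.0, 5.5),
    "Supplier F": (9.5, 6.5),
}


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move the centroids."""
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering is stable
        centroids = new_centroids
    return assignment, centroids


names = list(suppliers)
points = [suppliers[n] for n in names]
# Seed with the first and last supplier so the run is deterministic.
labels, centres = k_means(points, [points[0], points[-1]])
clusters = dict(zip(names, labels))
for name, label in clusters.items():
    print(f"{name}: cluster {label}")
```

With this made-up data, the fast, low-defect suppliers end up in one cluster and the slow, high-defect suppliers in the other, which is the kind of similarity a descriptive view would then display visually.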

Diagnostic: With this approach, techniques such as decision trees and segmentation can be employed to show why a pattern or relationship exists within the data set. An example here is identifying the attributes of the most successful suppliers within a region.
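To make the decision tree idea concrete, here is a minimal sketch of the first step a decision tree takes: finding the single attribute and threshold that best separates successful suppliers from the rest. The records, the attribute names and the use of Gini impurity as the splitting measure are assumptions for illustration; a full decision tree would repeat this step on each resulting group.

```python
# Hypothetical supplier records: two attributes plus a success rating.
records = [
    {"delivery_days": 2, "years_trading": 12, "successful": True},
    {"delivery_days": 3, "years_trading": 4, "successful": True},
    {"delivery_days": 4, "years_trading": 9, "successful": True},
    {"delivery_days": 8, "years_trading": 11, "successful": False},
    {"delivery_days": 9, "years_trading": 3, "successful": False},
    {"delivery_days": 7, "years_trading": 8, "successful": False},
]


def gini(rows):
    """Gini impurity of the 'successful' label: 0.0 means the group is pure."""
    if not rows:
        return 0.0
    p = sum(r["successful"] for r in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows, features):
    """Return (feature, threshold, impurity) for the single best binary split."""
    best = None
    for feature in features:
        for threshold in sorted({r[feature] for r in rows}):
            left = [r for r in rows if r[feature] <= threshold]
            right = [r for r in rows if r[feature] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            # Weighted impurity of the two groups produced by the split.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


feature, threshold, impurity = best_split(
    records, ["delivery_days", "years_trading"]
)
print(f"Best split: {feature} <= {threshold} (weighted Gini {impurity:.2f})")
```

On this invented data the split falls on delivery time rather than years of trading, suggesting that fast delivery is the attribute the successful suppliers share.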

Predictive: This approach involves the use of techniques such as regression to show the probability of an event occurring in the future.
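One common way to turn regression into a probability is logistic regression. The sketch below fits a one-variable logistic model by gradient descent in plain Python. The data (late deliveries last quarter against whether the next deadline was missed), the learning rate and the number of epochs are all assumptions chosen for illustration, not recommended settings.

```python
import math

# Hypothetical history: late deliveries last quarter, and whether the
# supplier then missed the next deadline (1) or met it (0).
late_deliveries = [0, 1, 2, 3, 4, 5, 6, 7]
missed_next = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error per record; this drives both gradients.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_logistic(late_deliveries, missed_next)


def probability_of_miss(late_count):
    """Estimated probability that a supplier misses the next deadline."""
    return sigmoid(w * late_count + b)


for count in (1, 4, 7):
    print(f"{count} late deliveries -> {probability_of_miss(count):.2f}")
```

The estimated probability rises with the number of past late deliveries, which is the kind of forward-looking statement a predictive model is meant to support.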

If you are an analyst charged with a data mining exercise, follow at least these steps:

  1. Define the goal and extent of the data mining exercise. What questions are to be answered?

  2. Prepare the data set to be used as the basis for analysis. Is the data sufficient and accurate?

  3. Analyse the data using a variety of statistical measures and visualisation tools so that you can observe how data values are distributed and identify missing data. Examples of data mining techniques that can be employed include linear regression, decision tree analysis, predictive scorecards, and so on.
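The preparation and analysis steps above usually start with a simple profile of each column: how many values are missing and how the rest are distributed. The following is a minimal sketch using only Python's standard library; the order records and column names are hypothetical.

```python
import statistics

# Hypothetical order records; None marks a missing value.
orders = [
    {"supplier": "A", "order_value": 120.0, "delivery_days": 3},
    {"supplier": "B", "order_value": 95.5, "delivery_days": None},
    {"supplier": "C", "order_value": None, "delivery_days": 5},
    {"supplier": "D", "order_value": 240.0, "delivery_days": 4},
    {"supplier": "E", "order_value": 130.0, "delivery_days": 12},
]


def profile(rows, column):
    """Summarise one numeric column: missing count plus basic distribution measures."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }


for column in ("order_value", "delivery_days"):
    print(column, profile(orders, column))
```

A gap between the mean and the median, as with delivery_days here, hints at a skewed distribution or an outlier worth checking before any modelling begins.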