What Is Data Mining?

Data Mining is the art and science of discovering and exploiting new, useful, and profitable relationships in data.

In any business, we need to study what happens in order to improve. We study our potential customers. We study our actual customers. We study what we do and how we do it.

Our objective in data mining is to find patterns of behavior — predictable outcomes — and turn these into profitable opportunities.

In the past this study of patterns was severely limited by the amount of human effort involved, and by the expense of gathering the necessary data.

Now the balance has changed. As businesses have become more automated, more data is readily available than ever before. More computing power is available to process that data, and automated techniques for finding patterns in it with limited human intervention have evolved and matured.

Data Mining Techniques

Most data mining techniques fall into one of two related categories: model building and clustering.

Model Building seeks to create a predictive model related to a business question. For example, we could try to model how likely each customer is to be interested in a particular offer, and how much profit we would expect to earn if they accepted it. If we succeed, we can make a rational decision about which customers to approach based upon (a) the cost of making the offer, (b) the estimated probability of acceptance, and (c) the estimated profit if that customer accepts the offer.
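One simple way to combine (a), (b), and (c) is to approach a customer only when the expected profit exceeds the cost of the offer. The sketch below illustrates that rule. The names (Customer, expected_net_profit, customers_to_approach) and all of the figures are hypothetical, and the expected-value rule itself is an assumption about how the three quantities might be combined, not the only reasonable choice.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str                # hypothetical identifier
    p_accept: float          # (b) estimated probability of acceptance, from the model
    profit_if_accept: float  # (c) estimated profit if the customer accepts


def expected_net_profit(customer: Customer, offer_cost: float) -> float:
    """Expected profit from approaching one customer, net of the offer cost (a)."""
    return customer.p_accept * customer.profit_if_accept - offer_cost


def customers_to_approach(customers, offer_cost):
    """Keep only customers with positive expected net profit, best first."""
    worthwhile = [c for c in customers if expected_net_profit(c, offer_cost) > 0]
    return sorted(
        worthwhile,
        key=lambda c: expected_net_profit(c, offer_cost),
        reverse=True,
    )


# Illustrative figures only, not taken from any real data set.
customers = [
    Customer("A", p_accept=0.10, profit_if_accept=50.0),
    Customer("B", p_accept=0.02, profit_if_accept=40.0),
    Customer("C", p_accept=0.30, profit_if_accept=20.0),
]
selected = customers_to_approach(customers, offer_cost=2.0)
print([c.name for c in selected])  # prints ['C', 'A']
```

Customer B is dropped because an expected profit of 0.80 does not cover an offer cost of 2.00, even though B's profit on acceptance is higher than C's.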

Depending upon the techniques chosen, a model may be either an opaque model (it works but we aren't exactly sure how or why) or a transparent model (we understand exactly how the model arrives at any prediction). Either may be acceptable, depending upon the application. An opaque model that predicts production defect rates is perfectly acceptable if our interest is limited to production planning, but we would need a transparent model if we were interested in improving the process.

A particularly important type of modelling in retail situations is known as Market Basket Analysis.

Clustering attempts to segment a population into groups that have (as far as we are concerned) similar characteristics and are therefore expected to behave in a similar manner. Unlike model building, there is typically no specific outcome or attribute that must be predicted. The objective is often to group similar things together so that we can think about them better.
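As an illustration of grouping without a target to predict, here is a minimal sketch of k-means, one common clustering technique (chosen here as an example; the text does not single out any particular method). The function name, the two customer attributes, and the data are all hypothetical, and the starting centroids are simply picked by hand.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical shoppers described by (age, annual spend in thousands).
shoppers = [(22, 1.0), (25, 1.5), (24, 1.2), (51, 8.0), (55, 9.5), (49, 7.5)]
# Start from two of the points; real use would choose starting centroids
# more carefully and would usually rescale the attributes first.
labels, centroids = kmeans(shoppers, centroids=[shoppers[0], shoppers[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Note that no outcome column appears anywhere in the data: the two groups emerge purely from how close the shoppers are to one another on the chosen attributes.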

In marketing, clustering is particularly common, so common that some clusters, such as Yuppies (Young Upwardly Mobile Professionals) and Dinkies (Dual Income, No Kids Yet), have made it into everyday speech. It's impractical to aim a product or a marketing message at "everyone": it's better to find groups of people who are likely to respond similarly and to aim the product or message at that group.

Psychographic segmentation was particularly prominent in the 2016 US Presidential Election, and the Brexit referendum of the same year, when controversial use was made of clusters based upon psychological profiles to target groups using social media.

Within the model building and clustering areas there are many techniques. Which one to apply depends upon the specific business objectives, as well as on the availability and structure of the data.
