Data Mining

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting meaningful information from them. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve, scouring databases for hidden patterns and finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find where the value resides.

What Can Data Mining Do?

Although data mining is still in its infancy, companies in a wide range of industries (including retail, finance, health care, manufacturing, transportation, and aerospace) are already using data mining tools and techniques to take advantage of historical data. By applying pattern recognition technologies and statistical and mathematical techniques to warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions, and anomalies that might otherwise go unnoticed.

For businesses, data mining is used to discover patterns and relationships in the data that support better business decisions. It can help spot sales trends, develop smarter marketing campaigns, and predict customer loyalty more accurately. Specific uses of data mining include:

  • Market segmentation – Identify the common characteristics of customers who buy the same products from your company.
  • Customer churn – Predict which customers are likely to leave your company and go to a competitor.
  • Fraud detection – Identify which transactions are most likely to be fraudulent.
  • Direct marketing – Identify which prospects should be included in a mailing list to obtain the highest response rate.
  • Interactive marketing – Predict what each individual accessing a Web site is most likely interested in seeing.
  • Market basket analysis – Understand what products or services are commonly purchased together; e.g., beer and diapers.
  • Trend analysis – Reveal how a typical customer this month differs from a typical customer last month.
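Market basket analysis is commonly expressed in terms of support (the share of baskets containing a set of items) and confidence (for a rule A -> B, the share of baskets with A that also contain B). The sketch below computes both for item pairs in pure Python. The transactions and the `min_support` threshold of 0.4 are hypothetical; production tools typically use algorithms such as Apriori or FP-growth to scale this to large item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the contents of one shopping basket.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (A, B, support, confidence) for pair rules A -> B meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Confidence of A -> B is count(A and B) / count(A); emit both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that confidence is asymmetric: in this toy data every basket with chips also has beer, but only half of the beer baskets contain chips.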

Data Mining Technologies

The analytical techniques used in data mining are often well-known mathematical algorithms. What is new is the application of those techniques to general business problems, made possible by the increased availability of data and by inexpensive storage and processing power. Graphical interfaces have also produced tools that business experts can use easily.

Some of the tools used for data mining are:

Artificial neural networks – Non-linear predictive models that learn through training and loosely resemble biological neural networks in structure.
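To make "learn through training" concrete, here is a minimal sketch of a feed-forward network with one hidden layer of sigmoid units, trained by full-batch gradient descent (backpropagation) on the XOR problem, a pattern no linear model can separate. The dataset, the number of hidden units, the learning rate, and the epoch count are illustrative choices, not settings from any particular data mining tool, which would use far larger networks and an optimized library rather than hand-written loops.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

# XOR: no straight line separates the two classes, so a hidden layer is needed.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 0.0]

HIDDEN = 4           # number of hidden units (illustrative choice)
LEARNING_RATE = 1.0
EPOCHS = 5000

rng = random.Random(0)
w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(HIDDEN)]  # input -> hidden
b1 = [0.0] * HIDDEN
w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]                         # hidden -> output
b2 = 0.0

def forward(x):
    """Return the hidden activations and the network output for input x."""
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(HIDDEN)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(HIDDEN)) + b2)

def mse():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y)) / len(X)

initial_loss = mse()
for _ in range(EPOCHS):
    # Backpropagation: accumulate gradients of the mean squared error over all examples.
    g_w1 = [[0.0, 0.0] for _ in range(HIDDEN)]
    g_b1 = [0.0] * HIDDEN
    g_w2 = [0.0] * HIDDEN
    g_b2 = 0.0
    for x, y in zip(X, Y):
        h, out = forward(x)
        d_out = 2.0 * (out - y) * out * (1.0 - out) / len(X)  # dLoss / d(output pre-activation)
        g_b2 += d_out
        for j in range(HIDDEN):
            g_w2[j] += d_out * h[j]
            d_h = d_out * w2[j] * h[j] * (1.0 - h[j])  # error propagated back to hidden unit j
            g_w1[j][0] += d_h * x[0]
            g_w1[j][1] += d_h * x[1]
            g_b1[j] += d_h
    # Gradient-descent step: move every weight against its gradient.
    for j in range(HIDDEN):
        w1[j][0] -= LEARNING_RATE * g_w1[j][0]
        w1[j][1] -= LEARNING_RATE * g_w1[j][1]
        b1[j] -= LEARNING_RATE * g_b1[j]
        w2[j] -= LEARNING_RATE * g_w2[j]
    b2 -= LEARNING_RATE * g_b2

final_loss = mse()
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
for x in X:
    print(x, "->", round(forward(x)[1], 3))
```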

Decision trees – Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
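As an illustration of how a tree yields classification rules, the sketch below grows a small binary tree by choosing, at each node, the `feature <= threshold` test that most reduces Gini impurity (the splitting criterion used by CART), then reads each root-to-leaf path off as an if-then rule. The customer data, the feature names `age` and `income_k` (income in $1000s), and `max_depth` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) test with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)  # a split must beat not splitting at all
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def to_rules(tree, names, conditions=()):
    """Turn every root-to-leaf path into one (condition, class) rule."""
    if not isinstance(tree, tuple):
        return [(" and ".join(conditions) or "always", tree)]
    f, t, left, right = tree
    return (to_rules(left, names, conditions + (f"{names[f]} <= {t}",)) +
            to_rules(right, names, conditions + (f"{names[f]} > {t}",)))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical customers: (age, income_k) and whether they bought a product.
rows = [(25, 30), (32, 45), (47, 80), (51, 62), (38, 75), (29, 52), (60, 90), (44, 40)]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]

tree = build_tree(rows, labels)
for condition, label in to_rules(tree, ["age", "income_k"]):
    print(f"IF {condition} THEN buys = {label}")
```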

Rule induction – The extraction of useful if-then rules from data based on statistical significance.
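One of the simplest rule induction methods is OneR ("one rule"), shown below as a minimal sketch. For each attribute it maps every value to that value's most frequent class, then keeps the attribute whose rules make the fewest errors on the training data. Note that OneR ranks rules by training error rather than by a statistical significance test, so it is a simplified stand-in for the significance-based approach described above. The customer records and attribute names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical customer records with categorical attributes and the class to predict.
records = [
    {"plan": "basic",   "region": "north", "contract": "monthly", "churned": "yes"},
    {"plan": "premium", "region": "south", "contract": "monthly", "churned": "yes"},
    {"plan": "basic",   "region": "south", "contract": "annual",  "churned": "no"},
    {"plan": "premium", "region": "north", "contract": "annual",  "churned": "no"},
    {"plan": "basic",   "region": "north", "contract": "monthly", "churned": "yes"},
    {"plan": "premium", "region": "south", "contract": "annual",  "churned": "no"},
    {"plan": "premium", "region": "south", "contract": "monthly", "churned": "no"},
    {"plan": "premium", "region": "south", "contract": "annual",  "churned": "no"},
]

def one_r(rows, target):
    """Return (errors, attribute, rules) for the attribute with the fewest training errors."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Tally class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class: IF attr = value THEN class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best

errors, attr, rules = one_r(records, "churned")
for value, label in rules.items():
    print(f"IF {attr} = {value} THEN churned = {label}")
print(f"misclassified {errors} of {len(records)} training records")
```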

Genetic algorithms – Optimization techniques based on the concepts of genetic recombination, mutation, and natural selection.
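The three ingredients named above map directly onto code. This minimal sketch evolves bit strings toward the toy "OneMax" objective (maximize the number of 1-bits) using tournament selection, single-point crossover as the recombination step, and bit-flip mutation, with elitism so the best solution is never lost. The problem, population size, generation count, and mutation rate are all illustrative assumptions; a real data mining use would replace `fitness` with, for example, the accuracy of a candidate model.

```python
import random

GENES = 20                 # bits per individual (illustrative problem size)
POP_SIZE = 30
GENERATIONS = 60
MUTATION_RATE = 1.0 / GENES

def fitness(ind):
    # Toy OneMax objective: the number of 1-bits; the optimum is all ones.
    return sum(ind)

def tournament(pop, rng, k=3):
    # Selection: the fittest of k randomly chosen individuals becomes a parent.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # Recombination: single-point crossover splices the genes of two parents.
    point = rng.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(ind, rng):
    # Mutation: flip each bit independently with a small probability.
    return [1 - g if rng.random() < MUTATION_RATE else g for g in ind]

def evolve(seed=42):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
    history = [max(map(fitness, pop))]
    for _ in range(GENERATIONS):
        elite = max(pop, key=fitness)  # elitism: keep the best individual unchanged
        children = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
                    for _ in range(POP_SIZE - 1)]
        pop = [elite] + children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = evolve()
print("best fitness every 10 generations:", history[::10], "final:", fitness(best))
```

Because of elitism, the best fitness can never decrease from one generation to the next; selection pressure and crossover do the climbing, while mutation keeps introducing new variation.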

Nearest neighbor – A technique that classifies each record based on the records most similar to it in a historical database.
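A minimal sketch of k-nearest-neighbor classification: a new record takes the majority label of the k historical records closest to it by Euclidean distance. The historical records, the two features (years as a customer and support calls per year), and k = 3 are hypothetical. The sketch assumes the features are already on comparable scales; in practice they are usually normalized first so that no single feature dominates the distance.

```python
import math
from collections import Counter

# Hypothetical historical records: (years_as_customer, support_calls_per_year) -> outcome.
past_records = [
    ((1, 8), "leaves"),
    ((2, 7), "leaves"),
    ((1, 6), "leaves"),
    ((6, 1), "stays"),
    ((7, 2), "stays"),
    ((5, 2), "stays"),
    ((8, 0), "stays"),
]

def knn_classify(query, records, k=3):
    """Label a new record by majority vote among its k most similar historical records."""
    nearest = sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((2, 6), past_records))
print(knn_classify((6, 2), past_records))
```

Nearest neighbor needs no separate training step: the historical database itself is the model, which makes it simple but means every prediction scans the stored records.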

How Data Mining Works

How is data mining able to tell you important things that you didn't know, or predict what is going to happen next? The technique used to perform these feats is called modeling. Modeling is simply the act of building a model (a set of examples or a mathematical relationship) from data on situations where the answer is known, and then applying that model to other situations where the answers aren't known. Modeling techniques have been around for centuries, of course, but only recently have the storage and communication capabilities needed to collect and store huge amounts of data, and the computational power to apply modeling techniques directly to that data automatically, become available.
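The "build on known answers, apply to unknown ones" loop can be shown with one of the simplest mathematical-relationship models, a straight line fitted by ordinary least squares. The ad-spend and sales figures below are made up for illustration; the point is only the two phases: fitting on cases where the outcome is known, then predicting for cases where it is not.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical known cases: monthly ad spend and resulting sales, both in $1000s.
known_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
known_sales = [12.1, 13.9, 16.2, 17.8, 20.0]
slope, intercept = fit_line(known_spend, known_sales)

# Apply the model to situations where the answer is not yet known.
for spend in [6.0, 7.5]:
    print(f"spend {spend} -> predicted sales {slope * spend + intercept:.1f}")
```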

As a simple example of building a model, consider the director of marketing for a telecommunications company. He would like to focus his marketing and sales efforts on the segments of the population most likely to become big users of long distance services. He knows a lot about his customers, but with so many variables it is impossible to discern by hand the common characteristics of his best customers. From his existing customer database, which contains information such as age, sex, credit history, income, zip code, and occupation, he can use data mining tools such as neural networks to identify the characteristics of customers who make many long distance calls. For instance, he might learn that his best customers are unmarried women between the ages of 34 and 42 who earn more than $60,000 per year. This, then, is his model for high-value customers, and he would budget his marketing efforts accordingly.
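Once a profile like this has been found, a natural sanity check is its lift: how much more common heavy long distance users are inside the segment than across the whole customer base. The sketch below only evaluates the profile from the example; it does not reproduce the neural network that found it. The customer records, field names, and the `HEAVY_USER_MINUTES` cut-off are all hypothetical.

```python
# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"age": 36, "sex": "F", "married": False, "income": 72000, "ld_minutes": 900},
    {"age": 40, "sex": "F", "married": False, "income": 65000, "ld_minutes": 750},
    {"age": 38, "sex": "F", "married": False, "income": 80000, "ld_minutes": 300},
    {"age": 29, "sex": "M", "married": True,  "income": 50000, "ld_minutes": 200},
    {"age": 55, "sex": "F", "married": True,  "income": 90000, "ld_minutes": 600},
    {"age": 45, "sex": "M", "married": False, "income": 40000, "ld_minutes": 100},
    {"age": 35, "sex": "M", "married": False, "income": 70000, "ld_minutes": 150},
    {"age": 61, "sex": "F", "married": False, "income": 30000, "ld_minutes": 50},
]

HEAVY_USER_MINUTES = 500  # hypothetical cut-off for a "big" long distance user

def in_segment(c):
    """The profile from the example: unmarried women aged 34 to 42 earning over $60,000."""
    return c["sex"] == "F" and not c["married"] and 34 <= c["age"] <= 42 and c["income"] > 60000

def heavy_user_rate(group):
    """Fraction of a group whose long distance minutes meet the heavy-user cut-off."""
    return sum(c["ld_minutes"] >= HEAVY_USER_MINUTES for c in group) / len(group)

segment = [c for c in customers if in_segment(c)]
overall_rate = heavy_user_rate(customers)
segment_rate = heavy_user_rate(segment)
lift = segment_rate / overall_rate  # above 1 means the segment is richer in heavy users
print(f"{len(segment)} customers in segment; heavy-user rate {segment_rate:.2f} "
      f"vs {overall_rate:.2f} overall; lift {lift:.2f}")
```

A lift well above 1 on fresh data (not the data the profile was learned from) is what would justify concentrating the marketing budget on the segment.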
