Data Mining in The Applied World: Value From Volume






Nowadays, digital information is relatively easy to capture and fairly inexpensive to
store. The digital revolution has seen collections of data grow in size, and the
complexity of the data therein increase. A question that commonly arises from this
state of affairs is: having gathered such quantities of data, what do we actually do
with it? It is often the case that large collections of data, however well structured,
conceal implicit patterns of information that cannot be readily detected by
conventional analysis techniques. Such information may often be usefully analyzed
using a set of techniques referred to as knowledge discovery or data mining. These
techniques essentially seek to build a better understanding of data, and in building
characterizations of data that can be used as a basis for further analysis, extract
value from volume.
In this paper we present the data warehousing and mining concepts, the goals
behind data mining and its applications in the real world.
1. Introduction
We live in the Age of Information. The importance of collecting the data that reflect
business or scientific activities to achieve competitive advantage is widely recognized
now. Powerful systems for collecting data and managing it in large databases are in
place in all large and mid-range organizations. The value of raw data (collected over
a long time) lies in the ability to extract high-level information: information useful for
decision support, for exploration, and for better understanding of the phenomena
generating the data. Traditionally, this task of extracting information was done
manually: one or more analysts used statistical techniques to provide summaries
and generate reports. Such an approach fails as the
volume and dimensionality of the data increase. Who could expect to understand
millions of cases each having hundreds of fields? To complicate the issue, the data
expand and change at rates that could easily defy human analysis. Hence tools to aid
the automation of analysis tasks are becoming a necessity. Data mining, the
automatic extraction of patterns of information from data, evolved to meet this need. The
additional benefit of using the automated process of data mining systems is that this
process has a much lower cost than hiring an army of highly trained professional
statisticians (analysts). While data mining does not eliminate human participation in
solving the task completely, it significantly simplifies the job and allows an analyst to
manage the process of extracting knowledge from data. Many organizations now
view information as one of their most valuable assets and data mining allows a
company to make full use of these information assets. Two critical factors for
success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data
mining is to be applied (such as customer prospecting, retention, campaign
management, and so on).
2. Data Warehousing
Before discussing the different applications of data mining, let us first look at data
warehousing.

Data warehousing deals with the problem of gaining unified access to data from
multiple and potentially incompatible information systems. A data warehouse is a
central repository for all or significant parts of the data that an enterprise's various
business systems collect. A data warehouse is commonly defined as a subject-oriented,
integrated, time-variant, nonvolatile collection of data in support of management
decision making.
Data from various online transaction processing applications and other sources is
selectively extracted and organized on the data warehouse database for use by
analytical applications and user queries. Data warehousing emphasizes the capture
of data from diverse sources for useful analysis and access.
2. Data Mining
As the term connotes, data mining refers to the mining or discovery of new
information in terms of patterns or rules from vast amounts of data. Data mining
helps in achieving the following goals or tasks:
1. Prediction: Data mining can show how certain attributes within the data will
behave in the future. Examples of predictive data mining in the business context
include the analysis of buying transactions to predict what consumers will buy under
certain discounts and how much sales volume a store will generate in a given period.
In a scientific context, certain seismic wave patterns may predict an earthquake with
high probability.
2. Identification: Data patterns can be used to identify the existence of an item, an
event, or an activity. For example, in biological applications, the existence of a gene may
be identified by certain sequences of nucleotide symbols in the DNA sequence.
Identification also involves authentication, where it is
ascertained whether a user is indeed a specific user or one from an authorized class;
it involves a comparison of parameters or images or signals against a database.
3. Classification: Data mining can partition the data so that different classes or
categories can be identified based on a combination of parameters. For example,
customers in a supermarket can be categorized into discount-seeking shoppers,
shoppers in a rush, loyal regular shoppers and infrequent shoppers. This
classification may be used in different analyses of customer buying transactions as
a post-mining activity.
4. Optimization: One eventual goal of data mining activity is to optimize the use of
limited resources such as time, space, money, or materials and to maximize output
variables such as sales or profits under a given set of constraints.
These goals are realized with the help of different approaches such as Discovery of
sequential patterns, Discovery of patterns in time series, Discovery of classification
rules, Regression, Neural networks, Genetic Algorithms, Clustering and
3. Data Mining in the Real World
Although data mining is still in its infancy, organizations working in a wide range of
environments - including retail, finance, health care, manufacturing, transportation,
education, natural resource planning and aerospace - are already using data mining
tools and techniques to take advantage of historical data. By using pattern
recognition technologies and statistical and mathematical techniques to sift through
warehoused information, data mining helps analysts recognize significant facts,
relationships, trends, patterns, exceptions and anomalies that might otherwise go
unnoticed. We now cite some data mining applications in operation in various fields:

3.1 Business Management

For businesses, data mining is used to discover patterns and relationships in the data
in order to help make better business decisions. Data mining can help spot sales
trends, develop smarter marketing campaigns, and accurately predict customer
loyalty. Specific uses of data mining include:
Market segmentation - Identify the common characteristics of customers who
buy the same products from your company.
Customer churn - Predict which customers are likely to leave your company and
go to a competitor.
Fraud detection - Identify which transactions are most likely to be fraudulent.
Direct marketing - Identify which prospects should be included in a mailing list to
obtain the highest response rate.
Interactive marketing - Predict what each individual accessing a Web site is most
likely interested in seeing.
Market basket analysis - Understand what products or services are commonly
purchased together.
Trend analysis - Reveal the difference between a typical customer this month and
last month.
The above uses are elaborated further in the following cases:
3.1.1 Telecommunication Company
Details about who calls whom, how long they are on the phone, and whether a line is
used for fax as well as voice can be invaluable in targeting sales of services and
equipment to specific customers. But these tidbits are buried in masses of numbers
in the database. By delving into its extensive customer-call database to
manage its communications network, a regional telephone company identified new
types of unmet customer needs. Using its data mining system, it discovered how to
pinpoint prospects for additional services by measuring daily household usage for
selected periods. For example, households that make many lengthy calls between 3
p.m. and 6 p.m. are likely to include teenagers who are prime candidates for their
own phones and lines. When the company used target marketing that emphasized
convenience and value for adults - "Is the phone always tied up?" - hidden demand
surfaced. Extensive telephone use between 9 a.m. and 5 p.m. characterized by
patterns related to voice, fax, and modem usage suggests a customer has business
activity. Target marketing offering those customers "business communications
capabilities for small budgets" resulted in sales of additional lines, functions, and
equipment.
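The two usage patterns described in this case can be written down as simple rules once they have been discovered. The following is only a minimal sketch in Python: the `Call` record, the function name `suggest_offer`, and all thresholds are our own assumptions for illustration, since the company's actual system and cut-off values are not described here.

```python
from dataclasses import dataclass

@dataclass
class Call:
    hour: int       # hour of day the call started, 0-23
    minutes: float  # call duration in minutes
    kind: str       # "voice", "fax", or "modem"

def suggest_offer(calls, long_call_minutes=20, min_long_calls=5,
                  min_business_minutes=300):
    """Return a suggested offer for one household, or None.

    All thresholds are hypothetical; a real system would tune them
    against historical response data.
    """
    # Pattern 1: many lengthy calls between 3 p.m. and 6 p.m.
    afternoon_long = sum(
        1 for c in calls
        if 15 <= c.hour < 18 and c.minutes >= long_call_minutes
    )
    # Pattern 2: heavy use between 9 a.m. and 5 p.m. that includes fax or modem traffic
    business = [c for c in calls if 9 <= c.hour < 17]
    business_minutes = sum(c.minutes for c in business)
    uses_data = any(c.kind in ("fax", "modem") for c in business)

    if business_minutes >= min_business_minutes and uses_data:
        return "small-business package"
    if afternoon_long >= min_long_calls:
        return "second line"
    return None

# Made-up usage records for three households
teen_household = [Call(16, 30, "voice")] * 6
home_office = [Call(10, 45, "voice")] * 6 + [Call(11, 40, "fax")]
quiet_household = [Call(20, 5, "voice")]

for name, calls in [("teen", teen_household), ("office", home_office),
                    ("quiet", quiet_household)]:
    print(name, suggest_offer(calls))
```

The mining step proper is finding that these time windows and call types matter; the sketch only shows how such a finding might then be applied to select prospects.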
3.1.2 Credit Card Sale
A bank searching for new ways to increase revenues from its credit card operations
tested a non-intuitive possibility: Would credit card usage and interest earned
increase significantly if the bank halved its minimum required payment? With
hundreds of gigabytes of data representing two years of average credit card
balances, payment amounts, payment timeliness, credit limit usage, and other key
parameters, the bank used a powerful data mining system to model the impact of
the proposed policy change on specific customer categories, such as customers
consistently near or at their credit limits who make timely minimum or small
payments. The bank discovered that cutting minimum payment requirements for
small, targeted customer categories could increase average balances and extend
indebtedness periods, generating more than $25 million in additional interest earned.


3.1.3 Pharmaceutical Company

A pharmaceutical company analyzed its recent sales force activity and its results to
improve targeting of high-value physicians and determine which marketing activities
will have the greatest
impact in the next few months. The data included competitor market activity as well
as information about the local health care systems. The results were distributed to
the sales force via a wide-area network that enabled the representatives to review
the recommendations from the perspective of the key attributes in the decision
process. The reviews of the sales force along with the results were sent back to the
top management for final decisions. The ongoing, dynamic analysis of the data
warehouse allows best practices from throughout the organization to be applied in
specific sales situations.
3.1.4 Shelf spacing in supermarkets
A supermarket decided to allot shelf space to products and place them according to
the requirements of the customers. For this, they performed a market basket
analysis (a data mining technique) and found there was a correlation between baby
diapers and beer sold at that establishment. The company used this completely
non-intuitive information to rearrange its shelves and place the beer and diapers within
close proximity of each other and wound up with a healthy increase in sales. The
point is that these kinds of relationships are often obscure and not intuitively obvious
for a human to even think of exploring.
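To make the idea concrete, market basket analysis on item pairs can be sketched by counting how often two items occur together and computing the usual support, confidence and lift measures. The Python below is an illustrative sketch, not the supermarket's actual method: the function name `pair_rules`, the `min_support` value and the five sample baskets are all invented for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    lift       = confidence divided by the consequent's overall share;
                 values above 1 suggest the items sell together more than chance
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be of interest
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    # Strongest associations first
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Invented sample data for illustration only
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "bread"],
    ["milk", "bread", "eggs"],
]

for x, y, support, confidence, lift in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

On real transaction volumes, counting every pair in this way becomes expensive, which is why algorithms that prune infrequent itemsets early are normally used instead.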
3.2 Other Areas of Application
Though data mining is most visible in the business world, it finds application in other
areas too, where it facilitates decision-making, resource optimization, cost
effectiveness and classification. We hereby discuss some cases to support this:
3.2.1 Expert GIS for water resource planning:
The Texas Water Development Board is a state agency responsible for long-term
water supply planning. One of its major tasks is to assure water resources for a wide
region through good planning and sound water management. The manual planning
process was very tedious and difficult, and suffered from a number of limitations. Thus,
the planning system was automated, and it comprised:
1. An expert rule system.
2. A geographic information system (GIS)
3. A Network Flow solver.
The rule-based system contains expertise acquired from water resources planning
experts. The GIS system stores and analyses spatially distributed water supply and
demand data. The task of the network flow solver is to balance the flows in networks
developed by the expert GIS with input from various water analysts. The objective of
this part is to find the least costly allocation solution. In case of a deficit it is also
able to suggest alternative supplies that are efficient and cost-effective.
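The least-cost allocation task can be illustrated as a minimum-cost flow problem. The board's actual solver is not described here, so the sketch below uses one textbook method (successive shortest paths with Bellman-Ford) as a stand-in; the function name `min_cost_flow`, the node numbering, and all supplies, demands and unit costs are assumptions made up for the example.

```python
def min_cost_flow(n, edges, source, sink):
    """Send as much flow as possible from source to sink at least total cost.

    n     : number of nodes, numbered 0..n-1
    edges : list of (u, v, capacity, unit_cost)
    Returns (total_flow, total_cost).
    """
    # Residual graph; each entry is [to, residual capacity, cost, index of reverse edge]
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    inf = float("inf")
    total_flow = 0
    total_cost = 0
    while True:
        # Bellman-Ford: cheapest path from source in the residual graph
        dist = [inf] * n
        prev = [None] * n  # (previous node, edge index) on the cheapest path
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if dist[sink] == inf:
            return total_flow, total_cost  # no augmenting path is left

        # Find the bottleneck capacity along the path, then push that much flow
        push = inf
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_flow += push
        total_cost += push * dist[sink]

# Hypothetical network: 0 = source, 1-2 = reservoirs, 3-4 = cities, 5 = sink
edges = [
    (0, 1, 50, 0), (0, 2, 40, 0),    # reservoir supplies
    (1, 3, 100, 2), (1, 4, 100, 5),  # delivery costs per unit from reservoir 1
    (2, 3, 100, 4), (2, 4, 100, 3),  # delivery costs per unit from reservoir 2
    (3, 5, 30, 0), (4, 5, 45, 0),    # city demands
]
flow, cost = min_cost_flow(6, edges, source=0, sink=5)
print(flow, cost)
```

If the returned flow is smaller than the total demand, the network is in deficit; in that situation further supply nodes could be added and the problem solved again.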
3.2.2 Intelligent search agents on the Internet
The Internet mainly uses data mining in the form of intelligent search agents. One
such search agent, Purple Yogi, empowers networks to understand both content
and user needs, enabling the next generation of content management and enterprise
knowledge management solutions. A Yogi Discovery System understands the content
in the network, discovers the users' interests and empowers the network to connect
the right content to the right users. By driving this awareness into the network, a
Yogi Discovery System greatly reduces the time and effort users and content
providers expend searching for each other. Users benefit from having relevant
information made effortlessly available to them, information they might not
even know existed. Content providers benefit from reaching exactly the right set of
users interested in their content.
3.2.3 Health Care
Merck-Medco Managed Care is a mail-order business which sells drugs to the
country's largest health care providers: Blue Cross and Blue Shield state
organizations, large HMOs, U.S. corporations, state governments, etc. Merck-Medco
is mining its one-terabyte data warehouse to uncover hidden links between illnesses
and known drug treatments, and spot trends that help pinpoint which drugs are the
most effective for what types of patients. The results are more effective treatments
that are also less costly. Merck-Medco's data mining project has helped customers
save an average of 10-15% on prescription costs.
3.2.4 Education
The education domain offers many interesting and challenging applications for data
mining. First, an educational institution often has many diverse and varied sources of
information. There are the traditional databases (e.g. student information,
teacher information, class and schedule information, alumni information), online
information (online web pages and course content pages) and more recently,
multimedia databases. Second, there are many diverse interest groups in the
educational domain that give rise to many interesting mining requirements. For
example, the administrators may wish to find out information such as admission
requirements and to predict the class enrollment size for timetabling. The students
may wish to know how best to select courses based on prediction of how well they
will perform in the courses selected. The alumni office may need to know how best to
perform target mailing so as to achieve the best effort in reaching out to those
alumni that are likely to respond. All these applications not only contribute towards
the education institute delivering a better quality education experience, but also aid
the institution in running its administrative tasks. With so much information and so
many diverse needs, it is foreseeable that an integrated data mining system that is
able to cater for the special needs of an education institution will be in great demand
particularly in the 21st century.
4. Conclusion
Data mining challenges the long-standing viewpoint that computers and the Internet
bring information but not knowledge. In the new millennium, competitive enterprises
will be mining their data with sophisticated data mining tools to find and attract the
best customers, to improve and enhance their product offerings, to maximize
operating efficiency and to cut costs and improve customer satisfaction. With time
and resources in short supply, data mining software will help enterprises maximize
resources to remain competitive.
In the short term, the results of data mining will be in profitable, if mundane,
business related areas. Micro-marketing campaigns will explore new niches.
Advertising will target potential customers with new precision.
In the medium term, data mining may be as common and easy to use as e-mail. We
may use these tools to find the best airfare to New York, root out a phone number of
a long-lost classmate, or find the best prices on lawn mowers.
The long-term prospects are truly exciting. Imagine intelligent agents turned loose
on medical research data or on sub-atomic particle data. Computers may reveal new
treatments for diseases or new insights into the nature of the universe.
Thus we see that with the advancement and deployment of sophisticated data
mining tools, computers can think, bringing knowledge to our desktops.
