File 1705310604 0009750 Unit-1b
• Where does the data come from? Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys, …
• Target marketing
– Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
• E.g. Most customers with an income of 60k – 80k and food expenses of $600 – $800 a month live in that area
– Determine customer purchasing patterns over time
• E.g. Customers aged 20 to 29 with an income of 20k – 29k usually buy this type of
CD player
• Fraud detection
– Find outliers of unusual transactions
• Financial planning
– Summarize and compare the resources and spending
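The claim that customers in a segment "usually buy" a product can be checked as the confidence of a rule: among the customers who match the condition, the fraction who actually made the purchase. A minimal sketch, assuming hypothetical customer records of (age, annual income, bought the CD player); the age and income ranges come from the example above, while the records and the function name `rule_confidence` are made up for illustration.

```python
# Hypothetical customer records: (age, annual_income, bought_cd_player)
customers = [
    (22, 25_000, True),
    (27, 21_000, True),
    (24, 28_000, False),
    (35, 60_000, False),
    (29, 22_000, True),
    (45, 75_000, True),
]

def rule_confidence(records):
    """Confidence of: age 20-29 and income 20k-29k => buys the CD player."""
    # Keep only the purchase flag of customers who satisfy the rule's condition.
    matching = [bought for age, income, bought in records
                if 20 <= age <= 29 and 20_000 <= income <= 29_000]
    if not matching:
        return 0.0
    # True counts as 1, so the sum is the number of matching buyers.
    return sum(matching) / len(matching)

print(rule_confidence(customers))  # 0.75
```

A confidence well above 0.5 would support "usually buy"; in practice the threshold is a choice made by the analyst.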
[Figure: layers with increasing potential to support business decisions, including Data Exploration (Statistical Summary, Querying, and Reporting) and Decision Making (End User)]
[Figure: data mining at the confluence of Database Technology, Statistics, Machine Learning, Information Science, Visualization, and Other Disciplines]
• Data are organized around major subjects, e.g. customer, item, supplier
and activity.
• Provide information from a historical perspective (e.g. from the past
5 – 10 years)
• Typically summarized to a higher level (e.g. a summary of the
transactions per item type for each store)
• Users can perform drill-down or roll-up operations to view the data at
different degrees of summarization
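Roll-up and drill-down amount to re-aggregating the same transactions over fewer or more dimensions. A minimal sketch, assuming hypothetical transaction rows of (store, item type, item, amount) and a made-up helper `summarize`; a real data warehouse would perform these as OLAP operations on a data cube rather than with Python dictionaries.

```python
from collections import defaultdict

# Hypothetical transaction rows: (store, item_type, item, amount)
transactions = [
    ("Store A", "Electronics", "TV", 500.0),
    ("Store A", "Electronics", "Radio", 50.0),
    ("Store A", "Food", "Bread", 3.0),
    ("Store B", "Electronics", "TV", 450.0),
    ("Store B", "Food", "Milk", 2.0),
    ("Store B", "Food", "Bread", 3.0),
]

def summarize(rows, *levels):
    """Sum the amount column grouped by the chosen fields.

    levels are field indexes: 0 = store, 1 = item_type, 2 = item.
    Fewer levels give a coarser summary (roll-up); more give a finer one (drill-down).
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in levels)
        totals[key] += row[3]
    return dict(totals)

per_store_and_type = summarize(transactions, 0, 1)  # transactions per item type for each store
per_store = summarize(transactions, 0)              # roll-up to store level
per_item = summarize(transactions, 0, 1, 2)         # drill-down to individual items
print(per_store)  # {('Store A',): 553.0, ('Store B',): 455.0}
```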
• Cluster Analysis
– Class label is unknown: group data to form new classes
– Clusters of objects are formed based on the principle of maximizing
intra-class similarity and minimizing inter-class similarity
• E.g. Identify homogeneous subpopulations of customers. These clusters may
represent individual target groups for marketing.
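The slide does not name a clustering algorithm; k-means is one common choice and is shown here only as an illustration. It assigns each object to its nearest centroid and moves each centroid to the mean of its members, which targets the "maximize intra-class similarity" half of the principle. A minimal sketch, assuming hypothetical customer points of (annual income in $k, monthly food spending in $100s) and k = 2; centroids start at the first k points so the run is deterministic.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialisation
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (income in $k, food spending in $100s per month)
customers = [(25, 3), (70, 7), (22, 2), (28, 3), (65, 6), (75, 8)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # [[(25, 3), (22, 2), (28, 3)], [(70, 7), (65, 6), (75, 8)]]
```

Each resulting cluster could then be treated as a candidate target group for marketing.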
• Outlier Analysis
– Data that do not comply with the general behavior or model of the data.
– Outliers are often discarded as noise or exceptions.
– However, they are useful for fraud detection.
• E.g. Detect purchases of extremely large amounts
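One simple way to flag purchases of extremely large amounts is Tukey's fences: a value is an outlier if it lies more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The slide does not prescribe a method, so this is only one illustrative choice. A minimal sketch, assuming a hypothetical list of purchase amounts and a made-up function name `iqr_outliers`; it uses `statistics.quantiles` from the Python standard library (Python 3.8+).

```python
import statistics

def iqr_outliers(amounts, k=1.5):
    """Return the amounts outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(amounts, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < low or a > high]

# Hypothetical purchase amounts for one card, for illustration only.
purchases = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0, 3900.0, 44.0]
print(iqr_outliers(purchases))  # [3900.0]
```

For fraud detection the flagged amount would be kept and reviewed rather than discarded as noise.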
• Evolution Analysis
– Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the stocks of
particular companies.
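A basic way to describe the trend of a series that changes over time is a least-squares line through (time step, value); a positive slope indicates an upward trend. This is only one simple technique among many for evolution analysis. A minimal sketch, assuming hypothetical monthly closing prices for an overall market index and for one company, and a made-up function name `linear_trend`.

```python
def linear_trend(values):
    """Least-squares line through (t, value) for time steps t = 0, 1, 2, ...

    Returns (slope, intercept); slope > 0 means an upward trend.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    slope = s_tv / s_tt
    return slope, mean_v - slope * mean_t

# Hypothetical monthly closing prices, for illustration only.
market_index = [100.0, 103.0, 101.0, 106.0, 110.0]
company_stock = [50.0, 49.0, 47.0, 48.0, 45.0]
print(round(linear_trend(market_index)[0], 2))   # 2.3
print(round(linear_trend(company_stock)[0], 2))  # -1.1
```

Comparing the two slopes shows a company whose stock falls while the overall market rises.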
1.6 Classification of data mining systems
• Databases to be mined
– Relational, data warehouse, transactional, stream, object-oriented/relational,
active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering, trend/deviation,
outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, bio-data mining, stock
market analysis, text mining, Web mining, etc.
1.7 Data Mining Task Primitives
(1) The set of task-relevant data – which portion of the database is to be used
(5) Visualization methods – the form in which to display the result, e.g. rules, …
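Primitive (1) is often expressed as a query that selects only the rows and attributes the mining task needs. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `customer` table; the table, its columns, and the region filter are invented for illustration.

```python
import sqlite3

# Hypothetical database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, age INTEGER, income REAL, region TEXT, phone TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (1, 24, 25000, "North", "555-0101"),
    (2, 41, 72000, "South", "555-0102"),
    (3, 27, 21000, "North", "555-0103"),
])

# Task-relevant data: age and income of customers in one region.
# Other attributes (e.g. phone) and other regions are left out.
task_data = conn.execute(
    "SELECT age, income FROM customer WHERE region = ? ORDER BY id", ("North",)
).fetchall()
print(task_data)  # [(24, 25000.0), (27, 21000.0)]
```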
Data Mining: Concepts and Techniques 40
Why Data Mining Query Language?