Data Mining Introduction Unit III
Chapter 1. Introduction
1960s:
◦ Data collection, database creation, IMS and network DBMS
1970s:
◦ Relational data model, relational DBMS implementation
1980s:
◦ RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s—2000s:
◦ Data mining and data warehousing, multimedia databases, and Web
databases
Customer profiling
◦ data mining can tell you what types of customers buy what products
(clustering or classification)
Identifying customer requirements
◦ identifying the best products for different customers
◦ use prediction to find what factors will attract new customers
Provides summary information
◦ various multidimensional summary reports
◦ statistical summary information (data central tendency and variation)
September 12, 2023 Data Mining: Concepts and Techniques 8
Corporate Analysis and Risk Management
Applications
◦ widely used in health care, retail, credit card services,
telecommunications (phone card fraud), etc.
Approach
◦ use historical data to build models of fraudulent behavior and use data
mining to help identify similar instances
Examples
◦ auto insurance: detect a group of people who stage accidents to collect
on insurance
◦ money laundering: detect suspicious money transactions (US Treasury's
Financial Crimes Enforcement Network)
◦ medical insurance: detect professional patients, rings of doctors, and
rings of references
Fraud Detection and Management (2)
Detecting inappropriate medical treatment
◦ The Australian Health Insurance Commission identified that in many cases
blanket screening tests were requested (saving A$1m/yr).
Detecting telephone fraud
◦ Telephone call model: destination of the call, duration, time of day or
week. Analyze patterns that deviate from an expected norm.
◦ British Telecom identified discrete groups of callers with frequent intra-
group calls, especially mobile phones, and broke a multimillion dollar
fraud.
Retail
◦ Analysts estimate that 38% of retail shrink is due to dishonest
employees.
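The telephone call model above (destination, duration, time of day) can be sketched as a simple deviation check. This is a minimal illustration, not the method used by any carrier: the function name `flag_unusual_calls`, the tuple layout, and the threshold of three standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_unusual_calls(history, new_calls, threshold=3.0):
    """Return the new calls whose duration deviates strongly from the norm.

    history:   past call durations in minutes for one caller (the expected
               norm); needs at least two values for a standard deviation.
    new_calls: list of (destination, duration_minutes, hour_of_day) tuples.
    threshold: number of standard deviations treated as a deviation
               (an assumed value, chosen only for illustration).
    """
    mu = mean(history)
    sigma = stdev(history)
    flagged = []
    for destination, duration, hour in new_calls:
        # With zero spread in the history no z-score is defined,
        # so this sketch simply flags nothing in that case.
        z = (duration - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > threshold:
            flagged.append((destination, duration, hour))
    return flagged
```

A fuller model would profile destination and time of day in the same way; only duration is scored here to keep the sketch short.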
Sports
◦ IBM Advanced Scout analyzed NBA game statistics (shots blocked,
assists, and fouls) to gain competitive advantage for the New York Knicks
and Miami Heat
Astronomy
◦ JPL and the Palomar Observatory discovered 22 quasars with the help
of data mining
Internet Web Surf-Aid
◦ IBM Surf-Aid applies data mining algorithms to Web access logs for
market-related pages to discover customer preferences and behavior,
analyze the effectiveness of Web marketing, improve Web site
organization, etc.
Data Mining: A KDD Process
◦ Data mining: the core of the knowledge discovery process.
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
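The stages of the KDD process named above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the record layout (`customer`, `items`), the function names, and the choice of item-pair counting as the mining step are all hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def integrate(*sources):
    """Data integration: combine several sources into one list of records."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def clean(records):
    """Data cleaning: drop records with a missing customer or no items."""
    return [r for r in records if r.get("customer") and r.get("items")]


def select(records, min_items=2):
    """Data selection: keep only task-relevant baskets (two or more items)."""
    return [frozenset(r["items"]) for r in records
            if len(set(r["items"])) >= min_items]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def kdd(*sources, min_support=0.5):
    """Run the stages in order and return the surviving patterns."""
    baskets = select(clean(integrate(*sources)))
    if not baskets:
        return {}
    return evaluate(mine(baskets), len(baskets), min_support)
```

Each function maps to one box in the process, so a stage can be replaced (for example, a different mining step) without touching the others.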
Steps of a KDD Process
[Figure residue. Recoverable labels: Data Exploration (Statistical Analysis, Querying and Reporting); Pattern evaluation; Databases; Data Warehouse]
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
◦ Object-oriented and object-relational databases
◦ Spatial databases
◦ Time-series data and temporal data
◦ Text databases and multimedia databases
◦ Heterogeneous and legacy databases
◦ WWW
Data Mining Functionalities (1)
Concept description: Characterization and
discrimination
◦ Data can be associated with classes or concepts.
◦ It is useful to describe individual classes and concepts in summarized
terms: generalize, summarize, and contrast data characteristics,
e.g., dry vs. wet regions
◦ Data Characterization: Summarizing the data of the class
under study in general terms.
◦ Data Discrimination: Comparison of the general features of
target class data objects with the general features of objects
from one or a set of contrasting classes.
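Characterization and discrimination as defined above can be illustrated with per-class averages. This is a minimal sketch, assuming numeric attributes and using the mean as the summary; the function names and the attribute names `rainfall_mm` and `humidity_pct` are hypothetical, chosen to match the dry vs. wet regions example.

```python
from statistics import mean


def characterize(rows, attributes):
    """Data characterization: summarize one class by each attribute's mean."""
    return {a: mean(r[a] for r in rows) for a in attributes}


def discriminate(target_rows, contrast_rows, attributes):
    """Data discrimination: target-class mean minus contrasting-class mean."""
    target = characterize(target_rows, attributes)
    contrast = characterize(contrast_rows, attributes)
    return {a: target[a] - contrast[a] for a in attributes}


# Made-up sample data for two classes of regions.
dry = [{"rainfall_mm": 200, "humidity_pct": 30},
       {"rainfall_mm": 300, "humidity_pct": 40}]
wet = [{"rainfall_mm": 1500, "humidity_pct": 80},
       {"rainfall_mm": 1700, "humidity_pct": 70}]

attrs = ["rainfall_mm", "humidity_pct"]
dry_summary = characterize(dry, attrs)
dry_vs_wet = discriminate(dry, wet, attrs)
```

Here `dry_summary` describes the target class in general terms, while `dry_vs_wet` contrasts it with the other class attribute by attribute.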
Outlier analysis
◦ Outlier: a data object that does not comply with the general behavior of the
data
◦ It can be considered noise or an exception, but it is quite useful in fraud
detection and rare-event analysis
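One common way to make "does not comply with the general behavior of the data" concrete is the interquartile-range (IQR) rule. The sketch below is an illustration of that rule only, not a method prescribed by the slides; the function name `iqr_outliers` and the multiplier `k=1.5` are assumptions (1.5 is a conventional default).

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Whether a flagged value is discarded as noise or investigated as a possible fraud case is an application decision; the rule only identifies candidates.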
[Figure: data mining shown at the confluence of multiple disciplines. Recoverable labels: Machine Learning, Visualization, Information Science, Other Disciplines]
General functionality
◦ Descriptive data mining
◦ Predictive data mining
Different views, different classifications
◦ Kinds of databases to be mined
◦ Kinds of knowledge to be discovered
◦ Kinds of techniques utilized
◦ Kinds of applications adapted
A Multi-Dimensional View of Data Mining Classification
Databases to be mined
◦ Relational, transactional, object-oriented, object-relational, active, spatial,
time-series, text, multi-media, heterogeneous, legacy, WWW, etc.
Knowledge to be mined
◦ Characterization, discrimination, association, classification, clustering,
trend, deviation and outlier analysis, etc.
◦ Multiple/integrated functions and mining at multiple levels
Techniques utilized
◦ Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, neural network, etc.
Applications adapted
◦ Retail, telecommunication, banking, fraud analysis, DNA mining, stock market
analysis, Web mining, Weblog analysis, etc.
OLAP Mining: An Integration of Data Mining and Data Warehousing
[Figure: layered OLAP mining architecture. Recoverable labels: Layer 1, Layer 2, MDDB, Meta Data, Database API, Filtering & Integration, Filtering, Data cleaning, Data integration, Databases, Data Warehouse, Data Repository]
Major Issues in Data Mining (1)