Data Mining
Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.
December 7, 2021 Data Mining: Concepts and Techniques 9
Data Mining: A KDD Process
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Data Selection → Task-relevant Data]
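The KDD process can be sketched as a chain of small functions, one per step: cleaning, integration, selection, transformation, mining, and pattern evaluation. This is a toy illustration under stated assumptions: the two record sources, the field names (`cust`, `amount`, `region`, `age`), the min-max transformation, the per-region average used as the "mining" step, and the 0.5 evaluation threshold are all hypothetical choices, not part of the original material.

```python
from statistics import mean

# Hypothetical source "databases" keyed by customer id (illustrative data only).
sales_db = [
    {"cust": 1, "amount": 120.0, "region": "east"},
    {"cust": 2, "amount": None, "region": "west"},    # missing value
    {"cust": 3, "amount": 80.0, "region": " west "},  # inconsistent formatting
    {"cust": 4, "amount": 300.0, "region": "east"},
]
profile_db = {1: {"age": 34}, 3: {"age": 51}, 4: {"age": 29}}


def clean(records):
    """Data cleaning: drop records with a missing amount, normalize region strings."""
    return [
        {**r, "region": r["region"].strip().lower()}
        for r in records
        if r["amount"] is not None
    ]


def integrate(records, profiles):
    """Data integration: join sales records with profiles on the customer id."""
    return [{**r, **profiles[r["cust"]]} for r in records if r["cust"] in profiles]


def select(records, attrs):
    """Data selection: keep only the task-relevant attributes."""
    return [{a: r[a] for a in attrs} for r in records]


def transform(records, attr):
    """Transformation: min-max normalize one numeric attribute to [0, 1]."""
    lo = min(r[attr] for r in records)
    hi = max(r[attr] for r in records)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, attr: (r[attr] - lo) / span} for r in records]


def mine(records):
    """Data mining (toy stand-in): average normalized amount per region."""
    regions = {r["region"] for r in records}
    return {g: mean(r["amount"] for r in records if r["region"] == g) for g in regions}


def evaluate(patterns, threshold):
    """Pattern evaluation: keep only patterns whose score reaches the threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}


task_data = transform(
    select(integrate(clean(sales_db), profile_db), ["region", "amount"]), "amount"
)
print(evaluate(mine(task_data), threshold=0.5))
```

Keeping each step as a separate function mirrors the process view: any stage (for example, a different transformation) can be swapped without touching the others.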
Steps of a KDD Process
[Figure: Databases, Data Warehouse, Data Exploration (Statistical Analysis, Querying and Reporting), Pattern evaluation]
Time-series data
Stream data
Multimedia database
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the data; useful in rare-event analysis
Trend and evolution analysis
Trend and deviation: regression analysis
Similarity-based analysis
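Outlier analysis and trend analysis can each be sketched in a few lines of Python. This minimal sketch assumes numeric data held in plain lists: outliers are flagged with a simple z-score rule (the threshold `k=2.0` is an arbitrary, hypothetical choice), and the trend is estimated with an ordinary least-squares line over evenly spaced time steps. Both are illustrative stand-ins, not methods prescribed by the original material, and the sample data is made up.

```python
from statistics import mean, pstdev


def zscore_outliers(values, k=2.0):
    """Flag values lying more than k population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > k]


def linear_trend(ys):
    """Least-squares fit of y = a + b*t for t = 0..n-1; returns (intercept a, slope b)."""
    if len(ys) < 2:
        raise ValueError("need at least two points to fit a trend")
    ts = range(len(ys))
    t_bar, y_bar = mean(ts), mean(ys)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


daily_totals = [10, 12, 11, 13, 12, 95, 14, 13]  # hypothetical data
print(zscore_outliers(daily_totals))  # [95]

intercept, slope = linear_trend([100, 104, 109, 111, 118])  # hypothetical series
print(round(slope, 2))  # 4.3
```

A z-score rule is sensitive to the very outliers it tries to detect, because extreme values inflate the standard deviation; median-based scores are a common, more robust alternative.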
[Figure: Data Mining at the center of contributing disciplines: Database Systems, Statistics, Machine Learning, Visualization, Algorithm, Other Disciplines]
General functionality
Descriptive data mining
Predictive data mining
Different views, different classifications
Kinds of data to be mined
Kinds of knowledge to be discovered
Kinds of techniques utilized
Kinds of applications adapted
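[Figure: layered data mining architecture. Layer 1: data repository (Databases and a Data Warehouse, populated through data cleaning, data integration, and filtering & integration via a Database API, with Meta Data). Layer 2: MDDB (multidimensional database)]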
Major Issues in Data Mining
Mining methodology
Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Parallel, distributed and incremental mining methods
Integration of discovered knowledge with existing knowledge: knowledge fusion
User interaction
Data mining query languages and ad-hoc mining
Expression and visualization of data mining results
Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
Domain-specific data mining & invisible data mining
Protection of data security, integrity, and privacy
Summary
Data mining: discovering interesting patterns from large amounts of data
A natural evolution of database technology, in great demand, with wide applications
A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
Data mining systems and architectures
Major issues in data mining
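Association, one of the functionalities listed above, is commonly measured by support (how often items occur together) and confidence (how often the right-hand side appears given the left-hand side). The sketch below computes both over a hypothetical transaction list; it is a brute-force enumeration of single-item rules, not a scalable algorithm such as Apriori, and the items and the thresholds `min_sup=0.4` and `min_conf=0.6` are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "bag"},
    {"sd_card", "bag"},
    {"camera", "sd_card", "bag"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def rules(db, min_sup=0.4, min_conf=0.6):
    """Enumerate single-item rules lhs -> rhs meeting both thresholds (brute force)."""
    items = sorted(set().union(*db))
    found = []
    for a, b in combinations(items, 2):
        sup_ab = support({a, b}, db)
        if sup_ab < min_sup:
            continue  # the pair is too rare for either rule direction
        for lhs, rhs in ((a, b), (b, a)):
            conf = sup_ab / support({lhs}, db)
            if conf >= min_conf:
                found.append((lhs, rhs, sup_ab, conf))
    return found


for lhs, rhs, sup, conf in rules(transactions):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data the surviving rules are bag -> camera, bag -> sd_card, camera -> sd_card, and sd_card -> camera; camera -> bag is dropped because its confidence is only 0.5.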