Week 1-2
Data Warehouse
•Summary of data organized around major subjects
Involves data cleaning, integration, transformation, loading, and periodic refreshing
•Multi-dimensional database structure
Each dimension corresponds to an attribute
Data mart vs. data warehouse: department-wide vs. enterprise-wide
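The multi-dimensional view can be illustrated with a small roll-up. This is a minimal sketch only: the fact table, the dimension names (`region`, `product`, `quarter`), and the `roll_up` helper are all hypothetical and are not taken from any particular warehouse product. It shows how a summary is produced by keeping some dimensions and aggregating away the others.

```python
from collections import defaultdict

# Hypothetical fact table: each row is (region, product, quarter, sales).
FACTS = [
    ("North", "Laptop", "Q1", 120),
    ("North", "Phone", "Q1", 80),
    ("South", "Laptop", "Q1", 60),
    ("South", "Laptop", "Q2", 90),
    ("North", "Phone", "Q2", 70),
]
# Assumed dimension order, matching the first three columns of each row.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the sales measure, grouping by the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)  # coordinates in the kept dimensions
        totals[key] += row[-1]            # the last column is the measure
    return dict(totals)


by_region = roll_up(FACTS, ["region"])                      # one dimension
by_region_quarter = roll_up(FACTS, ["region", "quarter"])   # two dimensions
grand_total = roll_up(FACTS, [])                            # all dimensions aggregated away
```

Keeping fewer dimensions gives a coarser summary; keeping none gives a single grand total under the empty key `()`.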
Advanced Data and Information Systems
•Object Relational Databases
•Temporal, Sequence and Time-series Databases
Examples: data from stock exchange, inventory control and observation of natural phenomena
Data mining to unravel the change in trends
•Spatial and Spatiotemporal Databases
Uncover patterns pertaining to fields, gardens, or houses
•Text and Multimedia Databases
•Heterogeneous and Legacy Databases
Information exchange is the main issue, which may be resolved by using data mining to generalize data into higher conceptual levels
•Data Streams
Scientific and engineering data
Data mining: constantly evaluate incoming streams for patterns and dynamic changes
•The World Wide Web
Web mining
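As an illustration of the data-stream item above, evaluating an incoming stream for dynamic changes can be sketched with a sliding window. This is a simple illustrative heuristic, not a method named in these notes: the function `detect_shifts`, the window size, the threshold, and the sample readings are all assumptions.

```python
from collections import deque


def detect_shifts(stream, window=5, threshold=3.0):
    """Yield (index, value) for each reading that deviates from the mean of
    the previous `window` readings by more than `threshold`."""
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    for i, x in enumerate(stream):
        if len(recent) == window:  # compare only once the window is full
            mean = sum(recent) / window
            if abs(x - mean) > threshold:
                yield i, x
        recent.append(x)           # the oldest reading drops out automatically


# Hypothetical sensor readings with one abrupt jump at index 6.
readings = [10, 11, 10, 9, 10, 10, 18, 10, 11, 10]
shifts = list(detect_shifts(readings))
```

Because the window has a fixed size, memory use stays constant however long the stream runs, which is the main constraint when data cannot be stored and rescanned.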
Other Major Issues in Data Mining
•Mining methodology
Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Parallel, distributed and incremental mining methods
Integration of the discovered knowledge with existing knowledge: knowledge fusion
•User interaction
Data mining query languages and ad-hoc mining
Expression and visualization of data mining results
Interactive mining of knowledge at multiple levels of abstraction
•Applications and social impacts
Domain-specific data mining & invisible data mining
Protection of data security, integrity, and privacy
MAJOR ALGORITHMS IN DATA MINING
Data Mining Functionalities
•Data mining tasks can be classified into two categories:
Descriptive: Characterize the general properties of data
Predictive: Perform inference on current data in order to make predictions
•A measure of certainty may also be associated with each pattern
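The two categories can be contrasted on one small data set. The sketch below is illustrative only: the observations are made up, and a least-squares line is just one of many possible predictive models. The descriptive step summarizes the data already held; the predictive step fits a model and applies it to an unseen value.

```python
from statistics import mean

# Hypothetical observations: (engine size in litres, fuel use in L/100 km).
data = [(1.0, 5.0), (1.5, 6.0), (2.0, 7.0), (2.5, 8.0), (3.0, 9.0)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Descriptive task: characterize general properties of the data we have.
summary = {"mean": mean(ys), "min": min(ys), "max": max(ys)}

# Predictive task: fit a least-squares line y = intercept + slope * x.
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar


def predict(x):
    """Estimate fuel use for an engine size that is not in the data."""
    return intercept + slope * x


estimate = predict(4.0)
```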
•Multidimensional concept description: Characterization and discrimination
Characterization: Generalize or summarize the features of the target class (the class under study)
Discrimination: Compare the target class with a set of contrasting classes, i.e., contrast data characteristics, e.g., dry vs. wet regions
•Frequent patterns, association, correlation vs. causality
Extremist Literature → Bombs [0.5%, 75%] (Correlation or causality?)
•Classification and prediction
Construct models (functions) that describe and distinguish classes or concepts for future prediction, e.g., classify countries based on climate, or classify cars based on gas mileage
Predict some unknown or missing numerical values
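For the association item above, the bracketed pair of percentages is assumed here to follow the common [support, confidence] convention for association rules. The sketch below shows how those two numbers are computed for a rule A → B; the function `rule_metrics` and the basket data are hypothetical. A high confidence only says that A and B co-occur often, so it cannot settle the correlation-versus-causality question on its own.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of all transactions containing both sides
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    ante = sum(1 for t in transactions if antecedent <= set(t))
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"milk"})
```

Here the rule bread → milk holds in 2 of 4 baskets (support 50%) and in 2 of the 3 baskets that contain bread (confidence about 67%).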
Architecture: Typical Data Mining System