01 Intro To Data Mining
A multi-disciplinary field that combines Statistics, AI & Machine
Learning, and Databases & Data Warehousing.
Data mining is the process of discovering interesting
patterns and knowledge from large amounts of data.
The data sources can include databases, data warehouses,
the Web, other information repositories, or data that are
streamed into the system dynamically.
Improving health care and reducing costs
Predicting the impact of climate change
Reducing hunger and poverty by increasing
Prediction Methods
✓ Use some variables to predict unknown or future values
of other variables.
Description Methods
✓ Find human-interpretable patterns that describe the
data.
Data
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
Find a model for the class attribute as a function of the values of
the other attributes.
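One way to make "a model for the class attribute as a function of the other attributes" concrete is a one-rule learner: for each attribute, map every value to its majority class and keep the attribute whose rule makes the fewest mistakes. This is only an illustrative sketch, not the method the slides prescribe. The records are transcribed from the Data table above (class attribute: Cheat), and the function name `learn_one_rule` is hypothetical.

```python
from collections import Counter, defaultdict

# Records transcribed from the Data table:
# (Refund, Marital Status, Taxable Income in K, Cheat)
RECORDS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
    ("No", "Married", 60, "No"), ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"), ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Only the categorical attributes are considered in this sketch.
ATTRIBUTES = {"Refund": 0, "Marital Status": 1}
CLASS_INDEX = 3  # position of the class attribute (Cheat)


def learn_one_rule(records, attributes, class_index):
    """Return (attribute name, value -> class rule, training errors)."""
    best = None
    for name, idx in attributes.items():
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in records:
            counts[row[idx]][row[class_index]] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in records if rule[row[idx]] != row[class_index])
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


name, rule, errors = learn_one_rule(RECORDS, ATTRIBUTES, CLASS_INDEX)
print(name, rule, f"training errors: {errors}/{len(RECORDS)}")
```

The learned rule is a function from one attribute to the class; richer models such as decision trees combine several attributes in the same spirit.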
Model for predicting credit worthiness

[Figure: decision tree with internal nodes Employed, Education, and Number of years (branches > 3 yr / < 3 yr and > 7 yrs / < 7 yrs); the leaves give the class, Credit Worthy: Yes or No.]

Records with known class, used to learn the classifier:
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Test Set:
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Undergrad           7                           ?
2    No        Graduate            3                           ?
3    Yes       High School         2                           ?
…    …         …                   …                           …

[Figure: workflow. Learn Classifier → Model; the model is then applied to the Test Set.]
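The apply step of this workflow can be sketched by writing the credit-worthiness tree as a plain function, checking it against the records whose class is known, and then running it on the test records. The learning step itself is not shown. Several details are assumptions, because the extracted figure does not preserve them: that unemployed applicants fall into a "No" leaf, that Graduate uses the 3-year threshold while the other education levels use the 7-year one, and that a value exactly on a threshold counts as "No". The function name `predict_credit_worthy` is hypothetical.

```python
def predict_credit_worthy(employed, education, years_at_address):
    """Hand-coded reading of the decision tree figure (see assumptions)."""
    # ASSUMPTION: Employed = No leads directly to the class "No".
    if employed != "Yes":
        return "No"
    # ASSUMPTION: Graduate branch splits at 3 years, all others at 7 years.
    threshold = 3 if education == "Graduate" else 7
    # ASSUMPTION: a value exactly equal to the threshold is treated as "No".
    return "Yes" if years_at_address > threshold else "No"


# Records with known class (the rows listed before the ellipsis in the slide).
LABELED = [
    (("Yes", "Graduate", 5), "Yes"),
    (("Yes", "High School", 2), "No"),
    (("No", "Undergrad", 1), "No"),
    (("Yes", "High School", 10), "Yes"),
]

# Test records, whose class is shown as "?" in the slide.
TEST_SET = [
    ("Yes", "Undergrad", 7),
    ("No", "Graduate", 3),
    ("Yes", "High School", 2),
]

# Fraction of labeled records the model reproduces correctly.
matches = sum(predict_credit_worthy(*x) == y for x, y in LABELED)
training_accuracy = matches / len(LABELED)

# Apply the model to the unlabeled test records.
predictions = [predict_credit_worthy(*x) for x in TEST_SET]

print("accuracy on labeled records:", training_accuracy)
for record, label in zip(TEST_SET, predictions):
    print(record, "->", label)
```

In practice the model would be evaluated on held-out records with known labels; the "?" entries here only show how a learned model assigns a class to new data.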
Data Preparation
▪ Involves data, datasets, databases, and ETL (Extraction,
Transformation & Loading).
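A minimal ETL sketch, assuming the raw data arrives as CSV text and the target is an in-memory SQLite database. The three rows are copied from the Data table earlier in the deck; the table name `tax_records`, the column names, and the helper functions are hypothetical choices made for illustration.

```python
import csv
import io
import sqlite3

# Raw input: first three records of the Data table, as CSV text.
RAW_CSV = """Tid,Refund,Marital Status,Taxable Income,Cheat
1,Yes,Single,125K,No
2,No,Married,100K,No
3,No,Single,70K,No
"""


def extract(text):
    """Extraction: read the raw CSV into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert '125K' to 125000 and Yes/No to 1/0."""
    cleaned = []
    for r in rows:
        cleaned.append((
            int(r["Tid"]),
            int(r["Refund"] == "Yes"),
            r["Marital Status"],
            int(r["Taxable Income"].rstrip("K")) * 1000,
            int(r["Cheat"] == "Yes"),
        ))
    return cleaned


def load(conn, rows):
    """Loading: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE tax_records ("
        "tid INTEGER PRIMARY KEY, refund INTEGER, marital_status TEXT, "
        "taxable_income INTEGER, cheat INTEGER)"
    )
    conn.executemany("INSERT INTO tax_records VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Once loaded, the data can be queried like any other database table.
total = conn.execute("SELECT SUM(taxable_income) FROM tax_records").fetchone()[0]
print("total taxable income:", total)
```

Keeping the three stages as separate functions makes it easy to swap the source (for example a file or another database) without touching the cleaning logic.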