Data Mining: Department of Information Technology University of The Punjab, Jhelum Campus
DATA MINING
Ayesha Irfan
Department of Information Technology
University Of The Punjab, Jhelum Campus
Data mining
Data mining is the process of analyzing data from
different perspectives and summarizing it into useful
information, for example information that can be used
to increase revenue.
Many people treat data mining as a synonym for another
popularly used term, knowledge discovery from data, or
KDD.
Data, Information and Knowledge
Data: any facts, numbers, or text that can be processed
by a computer.
Information: the patterns, associations, or relationships
among the data provide information.
Knowledge: information can be converted into
knowledge about historical trends and future patterns.
Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{696}{12} = 58$
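The mean is the sum of the values divided by how many values there are. A minimal Python sketch of that arithmetic follows; the function name `mean` is illustrative, and the sample values are the five numbers used in this deck's standard deviation example, since the twelve values behind 696/12 are not listed here.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Five values taken from the standard deviation example in this deck.
print(mean([4, 2, 5, 8, 6]))  # 25 / 5 = 5.0

# The slide's own arithmetic: a total of 696 over 12 values.
print(696 / 12)  # 58.0
```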
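Median
Odd number of values
Data: 1, 3, 3, 6, 7, 8, 9 (n = 7)
$\text{Median} = \left(\frac{n+1}{2}\right)\text{th value} = \left(\frac{7+1}{2}\right)\text{th value} = 4\text{th value} = 6$
Even number of values
Data: 1, 2, 3, 4, 7, 8, 9, 10 (n = 8)
$\text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}\right] = \frac{1}{2}\left[4\text{th value} + 5\text{th value}\right] = \frac{4+7}{2} = \frac{11}{2} = 5.5$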
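The odd and even cases above can be sketched in Python as follows. This is a minimal illustration, not a library implementation; the function name `median` is an assumption chosen for readability.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, which is index mid when counting from 0.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [1, 3, 3, 6, 7, 8, 9]
even_data = [1, 2, 3, 4, 7, 8, 9, 10]
print(median(odd_data))   # 6
print(median(even_data))  # 5.5
```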
Basic Statistical Description of data
Mode
The mode is the value that appears the most often.
Some data may not have a mode.
Some data may have two modes; such data is called
bimodal.
Data: 0,2,1,0,0,3,2,4,2,2
Mode: 2
Data: 5,2,1,3,1,4,5,5,4,1
Mode: 1 and 5
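Finding the mode amounts to counting how often each value occurs and keeping the most frequent ones. A minimal Python sketch, assuming the convention that data in which every distinct value occurs equally often has no mode (the function name `modes` is illustrative):

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when the data has no mode."""
    counts = Counter(values)
    if not counts:
        return []
    # Assumed convention: if all distinct values are equally frequent, there is no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([0, 2, 1, 0, 0, 3, 2, 4, 2, 2]))  # [2]
print(modes([5, 2, 1, 3, 1, 4, 5, 5, 4, 1]))  # [1, 5]  (bimodal)
print(modes([1, 2, 3]))                       # []      (no mode)
```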
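Standard Deviation
Data: 4, 2, 5, 8, 6 (mean $\bar{x} = 25/5 = 5$)

x    x̄    x - x̄    (x - x̄)^2
2    5    -3       9
4    5    -1       1
5    5     0       0
6    5     1       1
8    5     3       9
Sum of squared deviations = 20

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{20}{4}} = \sqrt{5} \approx 2.24$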
Variance
$\text{Variance} = \sigma^2 = (2.24)^2 \approx 5$
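The slide's result of 2.24 corresponds to dividing the sum of squared deviations (20) by n - 1 = 4, that is, the sample form of the formula. A minimal Python sketch under that assumption (the function names are illustrative):

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)


def sample_std_dev(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [4, 2, 5, 8, 6]
print(sample_variance(data))           # 5.0
print(round(sample_std_dev(data), 2))  # 2.24
```

Dividing by n instead of n - 1 would give a variance of 4 and a standard deviation of 2 for the same data, so the choice of denominator should be stated whenever these figures are reported.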
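Box Plot Theory
Dataset: 14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20
Arranged data: 1, 1, 2, 2, 3, 3, 4, 4, 6, 7, 8, 10, 11, 14, 15, 20, 22 (n = 17)
Median (Q2) = 9th value = 6
Median of the lower half (Q1) = $\frac{2+3}{2} = 2.5$
Median of the upper half (Q3) = $\frac{11+14}{2} = 12.5$
Whiskers: minimum = 1, maximum = 22
Five-number summary (min, Q1, median, Q3, max): 1, 2.5, 6, 12.5, 22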
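The five values a box plot is drawn from can be computed with a short Python sketch. It assumes the median-of-halves convention, in which the overall median is left out of both halves when n is odd; other quartile conventions give slightly different Q1 and Q3. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are required")
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value is excluded from both halves (assumed convention).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


data = [14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20]
print(five_number_summary(data))  # (1, 2.5, 6, 12.5, 22)
```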
MINING FREQUENT
ITEMSET
Apriori Algorithm
Let’s look at an example. The dataset is given in the
table below. We will apply the Apriori algorithm
to find the frequent itemsets, with min_sup = 3.
TID Item
T1 M,O,N,K,E,Y
T2 D,O,N,K,E,Y
T3 M,A,K,E
T4 M,U,C,K,Y
T5 C,O,O,K,I,E
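Step 1: Count the number of transactions that contain each
item (its support count). C1 lists every candidate item; L1
keeps only the items with Sup_count >= min_sup. An item is
counted once per transaction, so O, which appears twice in
T5, has a support count of 3.

C1
Item  Sup_count
M     3
O     3
N     2
K     5
E     4
Y     3
D     1
A     1
U     1
C     2
I     1

L1
Item  Sup_count
M     3
O     3
K     5
E     4
Y     3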
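The first pass of Apriori can be sketched in Python as below. Each transaction is turned into a set before counting, so an item that repeats inside one transaction (O in T5) contributes 1 to its support count, which gives O a support of 3. The names `item_support`, `c1`, and `l1` are illustrative.

```python
from collections import Counter

MIN_SUP = 3

# Transactions T1..T5 from the table, one letter per item.
transactions = [
    list("MONKEY"),
    list("DONKEY"),
    list("MAKE"),
    list("MUCKY"),
    list("COOKIE"),
]


def item_support(transactions):
    """Count, for each item, the number of transactions that contain it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # set(): count an item once per transaction
    return counts


c1 = item_support(transactions)                              # candidate 1-itemsets
l1 = {item: n for item, n in c1.items() if n >= MIN_SUP}     # frequent 1-itemsets
print(sorted(l1.items()))
# [('E', 4), ('K', 5), ('M', 3), ('O', 3), ('Y', 3)]
```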
C2
Item  Sup_count
MO    1
MK    3
ME    2
MY    2
OK    3
OE    3
OY    2
KE    4
KY    3
EY    2

L2
Item  Sup_count
MK    3
OK    3
OE    3
KE    4
KY    3
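The second pass pairs up the frequent single items (M, O, K, E, Y) and counts how many transactions contain both members of each pair. A minimal Python sketch; the helper name `support` and the variable names are illustrative.

```python
from itertools import combinations

MIN_SUP = 3

# Transactions T1..T5 as sets of single-letter items.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]

# Frequent 1-itemsets found in the previous step.
frequent_items = ["M", "O", "K", "E", "Y"]


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for transaction in transactions if itemset <= transaction)


# C2: every pair of frequent items with its support count.
c2 = {pair: support(set(pair), transactions) for pair in combinations(frequent_items, 2)}

# L2: pairs that reach the minimum support.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_SUP}
print({"".join(pair): n for pair, n in l2.items()})
# {'MK': 3, 'OK': 3, 'OE': 3, 'KE': 4, 'KY': 3}
```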
C3
Item  Sup_count
MKO   1
MKE   2
MKY   2
OKE   3
OKY   2
KEY   2

L3
Item  Sup_count
OKE   3
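The whole level-by-level procedure can be sketched as one loop. This is a simplified illustration rather than a reference implementation: the function name `apriori` is an assumption, and unlike the C3 table it applies the subset-pruning step before counting, so candidates such as MKO (whose subset MO is not frequent) are discarded without a database scan and only OKE is counted at level 3.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([item]): support(frozenset([item])) for item in items}
    current = {c: n for c, n in current.items() if n >= min_sup}

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counted = {c: support(c) for c in candidates}
        current = {c: n for c, n in counted.items() if n >= min_sup}
    return frequent


transactions = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
result = apriori(transactions, min_sup=3)
largest = max(result, key=len)
print("".join(sorted(largest)), result[largest])  # EKO 3
```

The largest frequent itemset is {O, K, E} with support 3, matching L3; the loop stops at level 4 because a single frequent 3-itemset cannot be joined into any 4-item candidate.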
MINING FREQUENT
PATTERNS
FP growth tree
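Dataset (min_sup = 3)
Step 1: Calculate the sup_count for each element.

ID    Items bought
100   f,c,a,m,p
200   f,c,a,b,m
300   f,b
400   c,b,p
500   f,c,a,m,p

FP-tree after inserting all five transactions (each node is
item:count; a count increases by 1 every time a transaction
passes through the node):

root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1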
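Building the FP-tree means inserting the transactions one at a time along shared prefixes and incrementing a counter on every node visited. A minimal Python sketch follows; it assumes the transactions are already filtered and ordered exactly as listed in the table, and the class and function names (`FPNode`, `build_fp_tree`, `render`) are illustrative.

```python
class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(ordered_transactions):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = FPNode(None)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1  # one more transaction passes through this node
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented 'item:count' lines, one per node."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


# Transactions 100..500, assumed to be already in the deck's item order.
transactions = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
tree = build_fp_tree(transactions)
print("\n".join(render(tree)))
```

The printed tree has the branch f:4, c:3, a:3, m:2, p:2 as its main path, with the smaller b branches and the separate c:1, b:1, p:1 path hanging off it and the root.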
FP growth tree
Dataset
Tid Items
10 A,C,D
20 B,C,E
30 A,B,C,E
40 B,E