Lesson 2.2 - Frequent Pattern Analysis
Lesson 2.2 - Frequent Pattern Analysis
Lesson 2.2 - Frequent Pattern Analysis
Anis ur Rahman
Department of Computing
NUST-SEECS
Islamabad
1 / 54
Mining Frequent Patterns, Associations and Correlations
Road Map
Basic Concepts
Apriori Algorithm
Optimization Techniques
FP-Growth
2 / 54
Mining Frequent Patterns, Associations and Correlations
Applications
Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
3 / 54
Mining Frequent Patterns, Associations and Correlations
Frequent Patterns
4 / 54
Mining Frequent Patterns, Associations and Correlations
Association Rules
5 / 54
Mining Frequent Patterns, Associations and Correlations
Association Rules
Rules that satisfy both minsup and minconf are called strong rules.
6 / 54
Mining Frequent Patterns, Associations and Correlations
7 / 54
Mining Frequent Patterns, Associations and Correlations
8 / 54
Mining Frequent Patterns, Associations and Correlations
Computational Complexity
9 / 54
Mining Frequent Patterns, Associations and Correlations
10 / 54
Mining Frequent Patterns, Associations and Correlations
Road Map
1 Basic Concepts
2 Apriori Algorithm
3 Optimization Techniques
4 FP-Growth
11 / 54
Mining Frequent Patterns, Associations and Correlations
Apriori: Method
12 / 54
Mining Frequent Patterns, Associations and Correlations
Apriori Example
Database
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Supmin = 2
1st scan
Itemset
Itemset sup
Itemset sup
{A, B}
{A} 2
{A} 2 {A, C}
{B} 3 ⇒ {B} 3 ⇒ {A, E}
{C} 3
{C} 3 {B, C}
{D} 1
{E} 3 {B, E}
{E} 3
{C, E}
13 / 54
Mining Frequent Patterns, Associations and Correlations
Apriori Example
Database
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Supmin = 2
2nd scan
Itemset sup
Itemset sup
{A, B} 1
{A, C} 2 {A, C} 2 Itemset
{A, E} 1 ⇒ {B, C} 2 ⇒
{B, C, E}
{B, C} 2 {B, E} 3
{B, E} 3 {C, E} 2
{C, E} 2
14 / 54
Mining Frequent Patterns, Associations and Correlations
Apriori Example
Database
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Supmin = 2
3rd scan
Itemset sup
{B, C, E} 2
15 / 54
Mining Frequent Patterns, Associations and Correlations
Apriori Algorithm
16 / 54
Mining Frequent Patterns, Associations and Correlations
Candidate Generation
17 / 54
Mining Frequent Patterns, Associations and Correlations
18 / 54
Mining Frequent Patterns, Associations and Correlations
19 / 54
Mining Frequent Patterns, Associations and Correlations
S ⇒ (L − S )
If (support_count(L)/support_count(S)) ≥ min_conf
20 / 54
Mining Frequent Patterns, Associations and Correlations
Example
Question: If the minimum confidence = 70%, what are the strong rules?
21 / 54
Mining Frequent Patterns, Associations and Correlations
23 / 54
Mining Frequent Patterns, Associations and Correlations
24 / 54
Mining Frequent Patterns, Associations and Correlations
Road Map
Basic Concepts
Apriori Algorithm
Optimization Techniques
FP-Growth
25 / 54
Mining Frequent Patterns, Associations and Correlations
26 / 54
Mining Frequent Patterns, Associations and Correlations
Itemset sup
Tid Items Itemset
{A} 2
10 A, C, D {A}
⇒ {B} 3 ⇒
20 B, C, E {B}
{C} 3
30 A, B, C, E {C}
{D} 1
40 B, E {E}
{E} 3
Note. min-support=2
27 / 54
Mining Frequent Patterns, Associations and Correlations
Hash codes 0 1 2 3 4 5 6
Buckets {A,C} {B,C}
{A,B} {B,E}
{A,C} {C,E}
{A,E} {B,C}
{B,E}
{C,E}
{B,E}
Buckets counters 0 0 4 0 0 7 0
binary vector 0 0 1 0 0 1 0
28 / 54
Mining Frequent Patterns, Associations and Correlations
Itemset sup
Itemset
{A,B} 1
{A,C} 2 {A,C}
{A,E} 1 ⇒ {B,C}
{B,C} 2 {B,E}
{B,E} 3 {C,E}
{C,E} 2
29 / 54
Mining Frequent Patterns, Associations and Correlations
Hash codes 0 1 2 3 4 5 6
Buckets {A,B,C} {B,C,E}
{A,B,E} {B,C,E}
{A,C,E}
Buckets counters 0 0 3 0 0 2 0
binary vector 0 0 1 0 0 1 0
30 / 54
Mining Frequent Patterns, Associations and Correlations
Itemset sup
{A,B,C} 1 Itemset
{A,B,E} 1 ⇒
{B,C,E}
{A,C,E} 1
{B,C,E} 2
31 / 54
Mining Frequent Patterns, Associations and Correlations
32 / 54
Mining Frequent Patterns, Associations and Correlations
D1 + D2 + · · · + Dk = D
33 / 54
Mining Frequent Patterns, Associations and Correlations
D1 + D2 + · · · + Dk = D
34 / 54
Mining Frequent Patterns, Associations and Correlations
Correctness:
It is easy to see that the intersection of tidlists gives the correct
support for an itemset
Since the partitions are non overlapping, a cumulative count over
all partitions gives the global support for an itemset
35 / 54
Mining Frequent Patterns, Associations and Correlations
#Transactions= 200,000
36 / 54
Mining Frequent Patterns, Associations and Correlations
Road Map
Basic Concepts
Apriori Algorithm
Optimization Techniques
FP-Growth
37 / 54
Mining Frequent Patterns, Associations and Correlations
38 / 54
Mining Frequent Patterns, Associations and Correlations
Example FP-Growth
39 / 54
Mining Frequent Patterns, Associations and Correlations
40 / 54
Mining Frequent Patterns, Associations and Correlations
41 / 54
Mining Frequent Patterns, Associations and Correlations
42 / 54
Mining Frequent Patterns, Associations and Correlations
43 / 54
Mining Frequent Patterns, Associations and Correlations
44 / 54
Mining Frequent Patterns, Associations and Correlations
45 / 54
Mining Frequent Patterns, Associations and Correlations
46 / 54
Mining Frequent Patterns, Associations and Correlations
47 / 54
Mining Frequent Patterns, Associations and Correlations
48 / 54
Mining Frequent Patterns, Associations and Correlations
49 / 54
Mining Frequent Patterns, Associations and Correlations
50 / 54
Mining Frequent Patterns, Associations and Correlations
FP-growth properties
51 / 54
Mining Frequent Patterns, Associations and Correlations
itemset TID_set
I1 {T100 , T400 , T500 , T700 , T800 , T900 }
I2 {T100 , T200 , T300 , T400 , T600 , T800 , T900 }
I3 {T300 , T500 , T600 , T700 , T800 , T900 }
I4 {T200 , T400 }
I5 {T100 , T800 }
The frequent k-itemsets are used to construct candidate (k+1)-itemsets based on the Apriori
property
itemset TID_set
{I1 , I2 } {T100 , T400 , T800 , T900 }
{I1 , I3 } {T500 , T700 , T800 , T900 }
{I1 , I4 } {T400 }
{I1 , I5 } {T100 , T800 }
{I2 , I3 } {T300 , T600 , T800 , T900 }
{I2 , I4 } {T200 , T400 }
{I2 , I5 } {T100 , T800 }
{I3 , I5 } {T800 }
53 / 54
Mining Frequent Patterns, Associations and Correlations
itemset TID_set
{I1 , I2 , I3 } {T800 , T900 }
{I1 , I2 , I5 } {T100 , T800 }
54 / 54