Data Mining & Business Intelligence (2170715) : Unit-5 Concept Description and Association Rule Mining
Outline
What is concept description?
Market basket analysis
Association Rule Mining
Generating Rules
Improved apriori algorithm
Incremental ARM (Association Rule Mining)
Associative Classification
Association rule mining
Support
• Fraction of transactions that contain an itemset
o E.g. s({Milk, Bread, Chocolate}) = 2/5
Frequent Itemset
• An itemset whose support is greater than or equal to a minimum support
threshold
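The two definitions above can be sketched in a few lines of Python. The five transactions below are an assumption: a toy dataset chosen only so that it agrees with the figures quoted in this unit (for example s({Milk, Bread, Chocolate}) = 2/5); the items "Eggs" and "Juice" and the helper names `support` and `is_frequent` are illustrative.

```python
# Assumed toy data: five transactions picked to be consistent with the
# numbers quoted in this unit. This is NOT the original slide table.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Juice"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Juice"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support reaches the threshold."""
    return support(itemset, transactions) >= min_support


print(support({"Milk", "Bread", "Chocolate"}, TRANSACTIONS))
print(is_frequent({"Milk", "Bread", "Chocolate"}, TRANSACTIONS, 0.4))
```

With a minimum support threshold of 0.4 the itemset counts as frequent; raising the threshold to 0.5 would exclude it.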
Unit: 5 – Concept Description and Association Rule Mining 9
Association rule mining (Cont..)
Association Rule
• An implication expression of the form X → Y, where X and Y are itemsets
o E.g.: {Milk, Chocolate} → {Pepsi}
Rule Evaluation
• Support (s)
o Fraction of transactions that contain both X and Y
• Confidence (c)
o Measures how often items in Y appear in transactions that contain X
Example:
Find support & confidence for {Milk, Chocolate} → {Pepsi}
Answer
Support (s): 0.4
Confidence (c): 0.67
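Rules generated from the itemset {Milk, Chocolate, Pepsi} (every rule has the same support, s = 0.4):
{Milk, Chocolate} → {Pepsi} c = 0.67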
{Milk, Pepsi} → {Chocolate} c = 1.0
{Chocolate, Pepsi} → {Milk} c = 0.67
{Pepsi} → {Milk, Chocolate} c = 0.67
{Chocolate} → {Milk, Pepsi} c = 0.5
{Milk} → {Chocolate, Pepsi} c = 0.5
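The confidences listed above can be checked with a short sketch. It uses s(X → Y) = count(X ∪ Y) / N and c(X → Y) = count(X ∪ Y) / count(X). The five transactions are an assumption (a toy dataset chosen to reproduce the values on the slide, not the slide's own table), and `count` and `rules_from_itemset` are illustrative helper names.

```python
from itertools import combinations

# Assumed toy data, chosen to reproduce the support and confidence
# values quoted above. This is NOT the original slide table.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Juice"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Juice"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, transactions):
    """Yield (X, Y, support, confidence) for each split of itemset into X -> Y."""
    itemset = frozenset(itemset)
    n = len(transactions)
    joint = count(itemset, transactions)
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            lhs_count = count(lhs, transactions)
            if lhs_count == 0:
                continue  # confidence is undefined when X never occurs
            yield lhs, itemset - lhs, joint / n, joint / lhs_count


for lhs, rhs, s, c in rules_from_itemset({"Milk", "Chocolate", "Pepsi"}, TRANSACTIONS):
    print(f"{sorted(lhs)} -> {sorted(rhs)}  s={s:.2f}  c={c:.2f}")
```

Because all six rules split the same itemset, they share one support value and differ only in the denominator count(X).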
Association rule mining (Cont..)
A common strategy adopted by many association rule
mining algorithms is to decompose the problem into two
major subtasks:
1. Frequent Itemset Generation
• The objective is to find all the itemsets that satisfy
the minimum support threshold.
• These itemsets are called frequent itemsets.
2. Rule Generation
• The objective is to extract all the high-confidence
rules from the frequent itemsets found in the
previous step.
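The two subtasks can be sketched as two separate functions. This is a deliberately naive, brute-force illustration that enumerates every possible itemset, not an efficient algorithm. The dataset, the thresholds (0.4 and 0.8) and the function names are all assumptions made for the example.

```python
from itertools import combinations

# Assumed toy data (not taken from the slides).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Juice"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Juice"},
]


def frequent_itemsets(transactions, min_support):
    """Subtask 1 (brute force): map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t) / n
            if s >= min_support:
                result[frozenset(cand)] = s
    return result


def high_confidence_rules(freq, min_confidence):
    """Subtask 2: split each frequent itemset into X -> Y, keep confident rules."""
    rules = []
    for itemset, s in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # X is a subset of a frequent itemset, so it is in `freq` too.
                c = s / freq[lhs]
                if c >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, c))
    return rules


freq = frequent_itemsets(TRANSACTIONS, min_support=0.4)
rules = high_confidence_rules(freq, min_confidence=0.8)
for lhs, rhs, s, c in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  s={s:.2f}  c={c:.2f}")
```

Rule generation needs no further passes over the data: every support it uses was already computed in the first subtask.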
Apriori algorithm
Purpose: The Apriori Algorithm is an influential algorithm for
mining frequent itemsets for Boolean association rules.
Key Concepts:
• Frequent Itemsets:
The sets of items that have at least the minimum support (Lk denotes the set of frequent k-itemsets).
• Apriori Property:
Any subset of a frequent itemset must also be frequent.
• Join Operation:
To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.
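The key concepts above can be combined into a level-wise sketch. It is a minimal illustration under stated assumptions, not a reference implementation: the join step simply unions pairs of frequent (k-1)-itemsets and keeps the unions of size k (a simplification of the classic prefix-based join), the threshold is an absolute count rather than a fraction, and the name `apriori` and the threshold of 3 are choices made for this example. The data are the ten transactions over items A to E that appear in the FP-tree example of this unit.

```python
from itertools import combinations

# The ten transactions over items A..E used in this unit's FP-tree example.
TRANSACTIONS = ["AB", "BCD", "ACDE", "ADE", "ABC",
                "ABCD", "BC", "ABC", "ABD", "BCE"]


def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join step (simplified): union pairs from L(k-1) that give a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # Count step: one scan over the transactions per level.
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
    return frequent


freq = apriori(TRANSACTIONS, min_count=3)
for itemset in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), freq[itemset])
```

The prune step is where the Apriori property pays off: a candidate with any infrequent subset is discarded before the data are scanned.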
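FP-tree for the ten transactions below, with its header table

TID   Items
1     AB
2     BCD
3     ACDE
4     ADE
5     ABC
6     ABCD
7     BC
8     ABC
9     ABD
10    BCE

Header table
Item   Support
B      8
A      7
C      7
D      5
E      3

FP-tree (items inserted in descending order of support; each child is indented under its parent)
null
  B:8
    A:5
      C:3
        D:1
      D:1
    C:3
      D:1
      E:1
  A:2
    C:1
      D:1
        E:1
    D:1
      E:1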
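The tree for the ten transactions above can be rebuilt with a short two-pass sketch. It is an illustration under stated assumptions: ties in support (A and C both have 7) are broken alphabetically, which matches the header table shown, the node-link pointers of a full FP-tree are omitted, and the names `Node`, `build_fp_tree` and `dump` are made up for the example.

```python
from collections import Counter

# The ten transactions from the table above.
TRANSACTIONS = ["AB", "BCD", "ACDE", "ADE", "ABC",
                "ABCD", "BC", "ABC", "ABD", "BCE"]


class Node:
    """One FP-tree node: an item, its count, and its children by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count=1):
    """Return (root, header) where header lists (item, support) in tree order."""
    # Pass 1: support count of every item.
    support = Counter(item for t in transactions for item in set(t))
    keep = [i for i in support if support[i] >= min_count]
    # Descending support; alphabetical tie-break is an assumption.
    order = sorted(keep, key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Pass 2: insert each transaction along a shared prefix path.
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, [(i, support[i]) for i in order]


def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


root, header = build_fp_tree(TRANSACTIONS)
print(header)
print("\n".join(dump(root)))
```

Transactions that share a prefix in the sorted order share a path in the tree, which is why the branch through B and A carries a count of 5 instead of five separate paths.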
Header table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (each child is indented under its parent)
null
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1