Mining: Association Rules
Md Tabrez Nafis
Department of Computer Science & Engineering
JAMIA HAMDARD, New Delhi
Association Rules
Mining Association Rules between Sets of Items in Large Databases
(R. Agrawal, T. Imielinski & A. Swami, 1993).
Example transactions ( customer_# : item_a1, item_a2, … ):
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ? supp({5,8}) / supp({5}) = 4/5 = 80%. Done.
Conf ( {8} => {5} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 ≈ 0.57 or 57%
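A minimal Python sketch (my own, not from the slides) that recomputes these numbers from the database above; the function names are illustrative:

```python
# Transactions from the example database ( customer_# : items ).
transactions = {
    1: {3, 5, 8},    2: {2, 6, 8},      3: {1, 4, 7, 10}, 4: {3, 8, 10},
    5: {2, 5, 8},    6: {1, 5, 6},      7: {4, 5, 6, 8},  8: {2, 3, 4},
    9: {1, 5, 7, 8}, 10: {3, 8, 9, 10},
}

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions.values() if itemset <= t)

def confidence(antecedent, consequent):
    """conf(A => B) = supp(A ∪ B) / supp(A)."""
    return support_count(antecedent | consequent) / support_count(antecedent)

print(support_count({5}), support_count({8}), support_count({5, 8}))  # 5 7 4
print(confidence({5}, {8}))  # 0.8   -> 80%
print(confidence({8}, {5}))  # 0.571... -> 57%
```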
The first pass over the data counts item occurrences to determine the frequent 1-itemsets, denoted L1.
Each subsequent pass k consists of two phases:
First, the frequent itemsets Lk-1 found in the previous pass are used to generate the candidate itemsets Ck.
Next, the database is scanned and the support of each candidate in Ck is counted.
The frequent itemsets Lk are then the candidates in Ck that meet minimum support.
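A minimal sketch of one such pass (my own illustration, assuming transactions are given as item sets and min_support is an absolute count; the helper name apriori_pass is hypothetical):

```python
from itertools import combinations

def apriori_pass(transactions, prev_frequent, k, min_support):
    """One pass: build candidate k-itemsets Ck from L(k-1), then count them."""
    # Phase 1: candidate generation -- join L(k-1) with itself and keep only
    # unions of size k whose (k-1)-subsets are all frequent (apriori property).
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev_frequent for s in combinations(union, k - 1)
            ):
                candidates.add(frozenset(union))

    # Phase 2: scan the database and count the support of each candidate.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1

    # Lk = candidates in Ck meeting minimum support.
    return {c for c, n in counts.items() if n >= min_support}
```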
Generating Frequent Item Sets
For k products… (a minimal driver sketch follows this list)
1. The user sets a minimum support criterion
2. Generate the list of one-item sets that meet the support criterion
3. Use the list of one-item sets to generate the list of two-item sets that meet the support criterion
4. Use the list of two-item sets to generate the list of three-item sets
5. Continue up through k-item sets
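Tying the steps together, a level-wise driver could look like the following sketch (again my own; it reuses the hypothetical apriori_pass helper above):

```python
def frequent_itemsets(transactions, min_support, max_k):
    """Level-wise search: L1, then L2 from L1, and so on up to max_k items."""
    transactions = [set(t) for t in transactions]

    # Steps 1-2: one-item sets meeting the support criterion.
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
    current = {frozenset([i]) for i, n in item_counts.items() if n >= min_support}

    all_frequent = {1: current}
    # Steps 3-5: grow k-item sets from (k-1)-item sets until nothing survives.
    k = 2
    while current and k <= max_k:
        current = apriori_pass(transactions, current, k, min_support)
        if current:
            all_frequent[k] = current
        k += 1
    return all_frequent
```

For example, with an absolute minimum support of 4 on the ten-transaction database above, this yields L1 = { {3}, {5}, {8} } and L2 = { {5, 8} }.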
The Apriori algorithm
Probably the best known algorithm
Two steps:
Find all itemsets that have minimum support
(frequent itemsets, also called large itemsets).
Use frequent itemsets to generate rules.
[Diagram] Main memory in Pass 1 vs. Pass 2: in Pass 2, counts are kept only for pairs of frequent items.
A-Priori Using Triangular Matrix
[Diagram] Pass 1 / Pass 2 memory layout, with the pair counts of Pass 2 stored in a triangular matrix over the frequent items. Why would we even mention the infrequent items?
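A possible sketch (assumed, not from the slides) of triangular-matrix pair counting: the frequent items are renumbered 0..m-1, and the count of pair {i, j} with i < j lives at a single index of a flat array, so infrequent items never get a slot at all:

```python
def count_pairs_triangular(transactions, frequent_items):
    """Count pairs of frequent items in a flat (one-dimensional) triangular array."""
    # Renumber the frequent items 0..m-1; infrequent items get no slot.
    index = {item: i for i, item in enumerate(sorted(frequent_items))}
    m = len(index)
    counts = [0] * (m * (m - 1) // 2)

    def slot(i, j):
        # Position of pair (i, j), i < j, in the flattened triangular matrix.
        return i * (m - 1) - i * (i - 1) // 2 + (j - i - 1)

    for t in transactions:
        kept = sorted(index[x] for x in t if x in index)
        for a in range(len(kept)):
            for b in range(a + 1, len(kept)):
                counts[slot(kept[a], kept[b])] += 1
    return counts, index
```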
Frequent Triples, Etc.
[Diagram] C1 → Filter → L1 → Construct → C2 → Filter → L2 → Construct → C3
The first pass filters the candidate items C1 down to the frequent items L1; the second pass filters the candidate pairs C2 down to the frequent pairs L2, and so on.
Step 1: Mining all frequent itemsets
A frequent itemset is an itemset whose support is ≥ minsup.
Key idea: the apriori property (downward closure property): every subset of a frequent itemset is also a frequent itemset.
[Diagram] Itemset lattice over items A, B, C, D (each level contains the subsets of the itemsets above it):
ABC ABD ACD BCD
AB AC AD BC BD CD
A B C D
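As a small illustration (my own sketch, using the example database above): {5, 8} is frequent, so downward closure guarantees {5} and {8} are frequent too; conversely, a candidate with any infrequent subset can be pruned without counting it.

```python
from itertools import combinations

def can_be_frequent(candidate, frequent_smaller):
    """Downward closure: a k-itemset can only be frequent if every
    (k-1)-subset of it is already known to be frequent."""
    k = len(candidate)
    return all(frozenset(s) in frequent_smaller
               for s in combinations(candidate, k - 1))

# With L2 = { {5, 8} }, the triple {3, 5, 8} is pruned immediately,
# because its subset {3, 5} is not frequent.
print(can_be_frequent(frozenset({3, 5, 8}), {frozenset({5, 8})}))  # False
```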
The Algorithm
Iterative algorithm (also called level-wise search):
Find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on.
In each iteration k, only consider candidate k-itemsets whose (k-1)-subsets are all frequent.
Example: let F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}}.
After join:
C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
After pruning:
C4 = {{1, 2, 3, 4}}
because {1, 4, 5} is not in F3 ({1, 3, 4, 5} is removed)
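A hedged sketch of this candidate-generation step (join then prune), reproducing the C4 example above; the function name candidate_gen is my own:

```python
from itertools import combinations

def candidate_gen(f_prev, k):
    """Join F(k-1) with itself, then prune candidates with an infrequent subset."""
    f_prev = {frozenset(s) for s in f_prev}
    sorted_prev = [tuple(sorted(s)) for s in f_prev]

    # Join: merge two (k-1)-itemsets that agree on their first k-2 items.
    joined = set()
    for a in sorted_prev:
        for b in sorted_prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                joined.add(frozenset(a) | {b[-1]})

    # Prune: drop any candidate that has a (k-1)-subset outside F(k-1).
    return {c for c in joined
            if all(frozenset(s) in f_prev for s in combinations(c, k - 1))}

F3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]
print(candidate_gen(F3, 4))  # {frozenset({1, 2, 3, 4})} -- {1, 3, 4, 5} was pruned
```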
Step 2: Generating rules from frequent itemsets
Frequent itemsets are not yet association rules; one more step is needed to generate association rules:
For each frequent itemset X,
for each proper nonempty subset A of X,
let B = X - A;
A => B is an association rule if
Confidence(A => B) ≥ minconf,
where confidence(A => B) = support(A ∪ B) / support(A), and both supports are already available from the counts of the frequent itemsets.
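A minimal sketch of this rule-generation step (names assumed; support_counts stands for the counts recorded while mining the frequent itemsets):

```python
from itertools import combinations

def generate_rules(frequent_itemset, support_counts, minconf):
    """Emit every rule A => B with B = X - A whose confidence meets minconf."""
    X = frozenset(frequent_itemset)
    rules = []
    # Every proper nonempty subset A of X is a candidate antecedent.
    for r in range(1, len(X)):
        for A in map(frozenset, combinations(X, r)):
            B = X - A
            conf = support_counts[X] / support_counts[A]
            if conf >= minconf:
                rules.append((set(A), set(B), conf))
    return rules

# Using counts from the example database: supp({5,8}) = 4, supp({5}) = 5, supp({8}) = 7.
counts = {frozenset({5, 8}): 4, frozenset({5}): 5, frozenset({8}): 7}
print(generate_rules({5, 8}, counts, minconf=0.6))
# [({5}, {8}, 0.8)] -- the rule {8} => {5} (4/7 ≈ 0.57) falls below minconf
```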
Limitations of the Apriori Algorithm (Summary)
Needs several passes over the data