03 Pre Processing
03 Pre Processing
03 Pre Processing
Fatima Radi
Farah Al-Tufaili
University of Kufa – facility of computer science and math
computer science department 2015
First let’s look back to The Apriori Algorithm
ECLAT
FP- Growth
AprioriDP
Context Based Association Rule Mining Algorithm
Node-set-based algorithms
GUHA procedure ASSOC
OPUS search
What Is FP-Growth
TID frequency
A 5
B 6
C 3
D 6
E 4
In Table 2 you can see the numbers written in Red pen. Those are the
priority of each item according to it's frequency of occurrence. Item B got
the highest priority (1) due to it's highest number of occurrences. At the
same time you have opportunity to drop the items which not fulfill the
minimum support requirement. For instance, if Database contain F which
has frequency 1, then you can drop it.
As you see in the Table 3 new column added to the Table 1. In the
Ordered Items column all the items are queued according to it's priority,
which mentioned in the Red ink in Table 2. For example, in the case of
ordering row 1, the highest priority item is B and after that D, A and E
respectively.
Row 7:
Attach two new nodes A and E to the D node
which hanging on the null node. Then mark D,A,E
as D:2,A:1 and E:1.
Now count the frequency of occurrence of each item of the FP tree and
compare it with Table 2. If both counts equal, then it is positive point to
indicate your tree is correct.
Benefits of the FP-tree Structure
Completeness
Preserve complete information for frequent pattern mining
Never break a long pattern of any transaction
Compactness
Reduce irrelevant info—infrequent items are gone
Items in frequency descending order: the more frequently
occurring, the more likely to be shared
Never be larger than the original database (not count node-links
and the count field)
There exists examples of databases, where compression ratio
could be over 100
FP-Growth Complexity