Mining Frequent Itemsets Using Vertical Data Format
• Drawback
– The TID sets can be quite long, so they take substantial memory
space, and intersecting these long sets takes substantial
computation time.
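In the vertical format, each item maps to the set of TIDs of the transactions that contain it. The support of a candidate itemset is the size of the intersection of its items' TID sets, and this intersection is the step whose cost grows with TID-set length. Below is a minimal Python sketch. The TID set of {I1} comes from this section's diffset example. The TID set of {I2} is a hypothetical value, chosen to be consistent with that example. The function names are also illustrative.

```python
# Vertical data format: item -> set of TIDs of the transactions containing it.
# The TID set of I2 is a hypothetical value chosen for illustration.
vertical_db = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
}


def tidset(itemset, db):
    """TID set of a non-empty itemset: intersection of its items' TID sets."""
    # set.intersection returns a new set, so the database is not modified.
    return set.intersection(*(db[item] for item in itemset))


def support(itemset, db):
    """Absolute support: number of transactions containing every item."""
    return len(tidset(itemset, db))


print(sorted(tidset(["I1", "I2"], vertical_db)))  # ['T100', 'T400', 'T800', 'T900']
print(support(["I1", "I2"], vertical_db))  # 4
```

Every TID set must be held in memory, and every candidate needs its own intersection. Long TID sets therefore make both storage and support counting expensive.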
• To overcome this drawback:
– A technique called diffset is used: it keeps track of only the
difference between the TID set of a (k + 1)-itemset and the TID set
of a corresponding k-itemset.
– Example: if the TID set of {I1} is {T100, T400, T500, T700, T800, T900}
and the TID set of {I1, I2} is {T100, T400, T800, T900}, then the
diffset between the two is diffset({I1, I2}, {I1}) = {T500, T700}.
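Because the TID set of a (k + 1)-itemset is always a subset of the TID set of its k-itemset, the support of the (k + 1)-itemset can be recovered from the diffset alone. It equals the support of the k-itemset minus the size of the diffset. Below is a minimal Python sketch using the TID sets from the example. The function names are hypothetical.

```python
def diffset(tids_k, tids_k_plus_1):
    """TIDs that contain the k-itemset but not the (k + 1)-itemset."""
    return tids_k - tids_k_plus_1


def support_from_diffset(support_k, d):
    """Support of the (k + 1)-itemset = support of the k-itemset - |diffset|."""
    return support_k - len(d)


tids_i1 = {"T100", "T400", "T500", "T700", "T800", "T900"}
tids_i1_i2 = {"T100", "T400", "T800", "T900"}

d = diffset(tids_i1, tids_i1_i2)
print(sorted(d))  # ['T500', 'T700']
print(support_from_diffset(len(tids_i1), d))  # 4
```

Here only two TIDs are stored for {I1, I2} instead of four, and its support (6 − 2 = 4) is still available.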
Challenge in Mining a Large Data Set
• Mining frequent itemsets from a large data set often
generates a huge number of itemsets satisfying the minimum
support (min sup) threshold, especially when min sup is set
low.
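One reason the count explodes is that every nonempty subset of a frequent itemset is itself frequent. A frequent itemset of length k therefore brings along 2^k − 1 frequent itemsets. The Python sketch below counts frequent itemsets on a small hypothetical database as min sup is lowered. It enumerates every candidate by brute force, so it is only meant for toy inputs. The database and the function name are assumptions for illustration.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, min_sup):
    """Brute force: count itemsets contained in at least min_sup transactions."""
    items = sorted(set().union(*transactions))
    count = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            sup = sum(1 for t in transactions if set(candidate) <= t)
            if sup >= min_sup:
                count += 1
    return count


# Hypothetical toy database; min_sup is an absolute support count.
transactions = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]

for min_sup in (4, 3, 2, 1):
    print(min_sup, count_frequent_itemsets(transactions, min_sup))
# 4 1
# 3 3
# 2 7
# 1 15
```

At min sup = 1, the single 4-item transaction alone makes all 2^4 − 1 = 15 of its subsets frequent.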
High-Utility Itemset Mining
Source: https://data-mining.philippe-fournier-viger.com/introduction-high-utility-itemset-mining/
• In this problem, a transaction database records the purchase
quantity of each item in each transaction, and each item also
has a unit profit.
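– The utility of an itemset in a transaction is the sum, over its
items, of purchase quantity × unit profit. The utility of an itemset
is the sum of its utilities in all transactions that contain it.
– Example: the utility of {a, e} is 8$ + 16$ = 24$, because {a, e}
appears only in transactions T0 and T3, where its utilities are 8$
and 16$, respectively.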
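Below is a minimal Python sketch of the utility computation. The quantities and unit profits are assumed for illustration, modeled on the example at the linked page. They are chosen so that {a, e} occurs only in T0 and T3, with utilities 8$ and 16$, which reproduces the 24$ total above. The function names are hypothetical.

```python
# Assumed unit profit (in $) of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

# Assumed database: TID -> {item: purchase quantity}.
database = {
    "T0": {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    "T1": {"b": 4, "c": 3, "d": 3, "e": 1},
    "T2": {"a": 1, "c": 1, "d": 1},
    "T3": {"a": 2, "c": 6, "e": 2},
    "T4": {"b": 2, "c": 2, "e": 1},
}


def utility_in_transaction(itemset, transaction, profit):
    """Sum of quantity x unit profit over the itemset's items in one transaction."""
    return sum(transaction[item] * profit[item] for item in itemset)


def utility(itemset, db, profit):
    """Sum the itemset's utility over all transactions containing all its items."""
    return sum(
        utility_in_transaction(itemset, t, profit)
        for t in db.values()
        if all(item in t for item in itemset)
    )


per_tid = {
    tid: utility_in_transaction({"a", "e"}, t, unit_profit)
    for tid, t in database.items()
    if "a" in t and "e" in t
}
print(per_tid)  # {'T0': 8, 'T3': 16}
print(utility({"a", "e"}, database, unit_profit))  # 24
```

In T0, {a, e} contributes 1 × 5$ + 1 × 3$ = 8$. In T3, it contributes 2 × 5$ + 2 × 3$ = 16$. T2 contains a but not e, so it adds nothing.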