Data Mining and Predictive Analysis
Reporting to
Dr. Pradeep Kumar Bala
Data Mining and Predictive Analytics
Association Rule
Second rule: If Intro, Survey and DOE are taken, Cat.Data is also taken. It has
a confidence of 80% and a lift of 3.842.
The support for all rules is very low, i.e., under 2%. Each rule therefore
applies to only a small fraction of the transactions, so its practical
applicability is limited.
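The reported confidence and lift are linked: lift is the rule's confidence
divided by the benchmark confidence, i.e. the share of all transactions that
contain the consequent. As a rough check, the short Python sketch below
back-solves that share for the second rule from the two reported figures. The
function name is illustrative only, and the result is an inference from the
rounded values quoted above, not a number read from the analysis output.

```python
def benchmark_confidence(confidence: float, lift: float) -> float:
    """Share of all transactions containing the consequent.

    Follows from the definition lift = confidence / benchmark_confidence.
    """
    if lift <= 0:
        raise ValueError("lift must be positive")
    return confidence / lift


# Figures reported for the second rule (Intro, Survey, DOE -> Cat.Data).
implied_share = benchmark_confidence(confidence=0.80, lift=3.842)
print(f"Implied share of transactions with Cat.Data: {implied_share:.1%}")
```

Under these figures, roughly one transaction in five contains Cat.Data, and
the rule raises that to four in five among transactions that match the
antecedent.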
Q.3. The data shown in Figure 11.7 are a subset of a dataset on cosmetic
purchases given in binary matrix form. The complete dataset (in the file
Cosmetics.xls) contains data on the purchases of different cosmetic items at a
large chain drugstore. The store wants to analyze associations among purchases
of these items for purposes of point-of-sale display, guidance to sales
personnel in promoting cross sales, and guidance for piloting an eventual
time-of-purchase electronic recommender system to boost cross sales. Consider
first only the subset shown in Figure 11.7
(CH10-Assoc-Exer_Cosmetics-small.eps).
b. Consider the results of the Association Rules analysis shown in Figure 11.8
(CH10-Assoc-Cosmetics-small-rules.eps), and:
i. For the first row, explain the “Conf. %” output and how it is calculated.
ii. For the first row, explain the “Support(a)”, “Support(c)” and
“Support(a U c)” outputs and how they are calculated.
iii. For the first row, explain the “Lift Ratio” and how it is calculated.
iv. For the first row, explain the rule that is represented there in words.
vii. Reviewing the first couple of dozen rules, comment on their redundancy,
and how you would assess their utility.
Answer with Recommendations:
a. The “0” in the first row, first column under “bag” indicates that, in the first
transaction (i.e. the first row), no bag was purchased. The “1” to its right
indicates that blush was purchased in that first transaction.
b.
2. Each support value is a count of the rows (transactions) in the binary
matrix in which every listed item has the value 1.
Support (a) = 103 means that there were 103 transactions in which
bronzer and nail polish were both purchased.
Support (c) = 77 means that there were 77 transactions in which
brushes and concealer were both purchased.
Support (a U c) = 62 means that there were 62 transactions in which
bronzer, nail polish, brushes and concealer were all purchased.
3. Lift Ratio = 3.909 means we are 3.909 times more likely to find a
transaction with brushes + concealer IF we look only in those
transactions where bronzer + nail polish are purchased, compared to
searching randomly in all transactions.
4. The rule is: "If a transaction includes bronzer + nail polish, it will also
include brushes + concealer." There are 62 transactions that meet
this rule, out of 103 that involve bronzer + nail polish, for a confidence
of 62/103 = 60.19%. If we are searching for transactions with brushes +
concealer, limiting our search to transactions with bronzer + nail
polish will increase our probability of success by a factor of 3.909.
5. Done in R Code
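The figures quoted in items 2 to 4 can be reproduced with simple arithmetic on
the three support counts. The Python sketch below is a minimal check, not the
R code mentioned in item 5, and its function names are illustrative. Lift also
needs the total number of transactions, which is not stated above: the value
500 used here is an assumption, back-solved from the reported lift of 3.909.

```python
def confidence(support_both: int, support_antecedent: int) -> float:
    """Fraction of antecedent transactions that also contain the consequent."""
    return support_both / support_antecedent


def lift(support_both: int, support_antecedent: int,
         support_consequent: int, n_transactions: int) -> float:
    """Confidence divided by the benchmark confidence.

    The benchmark is the share of all transactions containing the consequent.
    """
    benchmark = support_consequent / n_transactions
    return confidence(support_both, support_antecedent) / benchmark


# Support counts from the first row of the rules output.
SUPPORT_A = 103        # bronzer + nail polish
SUPPORT_C = 77         # brushes + concealer
SUPPORT_A_AND_C = 62   # all four items together

# ASSUMPTION: total transaction count, inferred from the reported lift.
N_TRANSACTIONS = 500

rule_confidence = confidence(SUPPORT_A_AND_C, SUPPORT_A)
rule_lift = lift(SUPPORT_A_AND_C, SUPPORT_A, SUPPORT_C, N_TRANSACTIONS)

print(f"Confidence: {rule_confidence:.2%}")
print(f"Lift ratio: {rule_lift:.3f}")
```

With the assumed total of 500 transactions, the sketch gives a confidence of
60.19% and a lift of 3.909, matching the values in the rules output.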