Data Mining and Predictive Analysis



Analysis of Cosmetics and Coursetopics Data

Report Submitted By: Group 11

Mayur Patil M030-19
Rohan Chowdhury M143-19
Suraj Singh M053-19
Rohit Jha M068-19
Vaibhav Darshe M058-19

Reporting to
Dr. Pradeep Kumar Bala
Data Mining and Predictive Analytics
Association Rule

Questions & Solutions

Q.2. Consider the data in the file coursetopics.xls. Each row represents the
courses attended by a single customer. The firm wishes to assess alternative
sequencings and combinations of courses. Use Association Rules to analyze
these data, and interpret several of the resulting rules.

Answer: The R code is in a separate file.

Interpretation of the first two rules:


First rule: If Intro, Regression and Forecast are taken, Data mining is also taken.
It has confidence of 71.4% and a lift of 4.011.

Second rule: If Intro, Survey and DOE are taken, Cat.Data is also taken. It has
confidence of 80% and lift of 3.842.

The support for every rule is very low (under 2%). Each rule therefore
applies to only a small share of customers, which limits its practical
applicability even where confidence and lift are high.
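
To make the lift figures easier to read, the sketch below backs out how often each consequent course is taken overall, using only the rounded confidence and lift values reported above (lift = confidence / overall rate of the consequent). It is a minimal Python illustration, not the R code from the separate file; the dictionary keys and the function name `baseline_rate` are names chosen here for illustration.

```python
# Reported (rounded) confidence and lift for the two rules interpreted above.
reported_rules = {
    "Intro, Regression, Forecast -> DataMining": (0.714, 4.011),
    "Intro, Survey, DOE -> Cat.Data": (0.80, 3.842),
}


def baseline_rate(confidence, lift):
    """Overall share of customers taking the consequent.

    Follows from lift = confidence / P(consequent).
    """
    return confidence / lift


baselines = {
    name: baseline_rate(conf, lift)
    for name, (conf, lift) in reported_rules.items()
}

for name, rate in baselines.items():
    print(f"{name}: consequent taken by about {rate:.1%} of all customers")
```

Read this way, the first rule says that Data mining is taken by roughly 18% of all customers but by about 71% of those who took Intro, Regression and Forecast. The results are approximate because the inputs are rounded.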
Q.3. The data shown in Figure 11.7 are a subset of a dataset on cosmetic
purchases given in binary matrix form. The complete dataset (in the file
Cosmetics.xls) contains data on the purchases of different cosmetic items at
a large chain drugstore. The store wants to analyze associations among
purchases of these items for purposes of point-of-sale display, guidance to
sales personnel in promoting cross sales, and guidance for piloting an
eventual time-of-purchase electronic recommender system to boost cross sales.
Consider first only the subset shown in Figure 11.7
(CH10-Assoc-Exer_Cosmetics-small.eps).

a. Select several values in the matrix and explain their meaning.

b. Consider the results of the Association Rules analysis shown in Figure 11.8
(CH10-Assoc-Cosmetics-small-rules.eps).

i. For the first row, explain the “Conf. %” output and how it is calculated.

ii. For the first row, explain the “Support(a)”, “Support(c)” and
“Support(a U c)” output and how it is calculated.

iii. For the first row, explain the “Lift Ratio” and how it is calculated.

iv. For the first row, explain the rule that is represented there in words.

v. Find all the Association Rules from the data.

vi. Interpret the first several rules in the output in words.

vii. Reviewing the first couple of dozen rules, comment on their redundancy,
and how you would assess their utility.
Answer with Recommendations:

a. The “0” in the first row, first column under “bag” indicates that, in the first
transaction (i.e. the first row), no bag was purchased. The “1” to its right
indicates that blush was purchased in that first transaction.
b.

1. If bronzer and nail polish were purchased, brushes and concealer were
also purchased 60.19% of the time.

The calculation is:
(# of transactions with bronzer + nail polish + brushes + concealer) /
(# of transactions with bronzer + nail polish) × 100
= 62 / 103 × 100 = 60.19%

2. Support (a) = 103 means that there were 103 transactions in which
bronzer and nail polish were both purchased.
Support (c) = 77 means that there were 77 transactions in which
brushes and concealer were both purchased.
Support (a U c) = 62 means that there were 62 transactions in which
bronzer, nail polish, brushes and concealer were all purchased.

3. Lift Ratio = 3.909 means we are 3.909 times more likely to find a
transaction with brushes + concealer IF we look only in those
transactions where bronzer + nail polish are purchased, compared to
searching randomly in all transactions.

4. The rule is: "If a transaction includes bronzer + nail polish, it will also
include brushes + concealer." There are 62 transactions that meet
this rule, out of 103 that involve bronzer + nail polish, for confidence
of 60.19%. If we are searching for transactions with brushes +
concealer, limiting our search to transactions with bronzer + nail
polish will increase our probability of success by a factor of 3.909.
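
The first-row figures can be checked with a few lines of arithmetic. The Python sketch below uses only the counts and the lift ratio reported above. The total number of transactions in the subset is not stated in this report, so the code infers it from the reported lift; treat the resulting value as an assumption rather than a figure from the source.

```python
# Counts reported for the first row of the rules table.
support_a = 103    # transactions with bronzer + nail polish
support_c = 77     # transactions with brushes + concealer
support_ac = 62    # transactions with all four items
reported_lift = 3.909

# Confidence: share of antecedent transactions that also hold the consequent.
confidence = support_ac / support_a

# Lift = confidence / (support_c / n). Solving for n gives the number of
# transactions implied by the reported lift (an inferred value, not stated
# in the report).
implied_n = reported_lift * support_c / confidence
n = round(implied_n)

# Recompute lift with the inferred n as a consistency check.
benchmark_confidence = support_c / n
lift = confidence / benchmark_confidence

print(f"confidence = {confidence:.2%}")
print(f"implied number of transactions = {n}")
print(f"lift = {lift:.3f}")
```

Under that inferred total, brushes + concealer appear in about 15% of all transactions but in about 60% of those that contain bronzer + nail polish, which is where the lift of roughly 3.9 comes from.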

5. Done in the R code.
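
Since the R code is not reproduced here, the following is a minimal Python sketch of what "finding all the rules" involves: enumerate item combinations, keep those that meet a minimum support, and split each into antecedent and consequent. It is a brute-force illustration, not the authors' R code and not an efficient Apriori implementation. The function name `mine_rules`, the thresholds, and the toy baskets are all hypothetical; the baskets are not rows from Cosmetics.xls.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Brute-force association rule search over a list of item sets.

    Returns (antecedent, consequent, support, confidence, lift) tuples,
    sorted by lift in descending order. Suitable only for a few items.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Share of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            supp = support(itemset)
            if supp == 0 or supp < min_support:
                continue
            # Try every non-empty proper subset as the antecedent.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    a = frozenset(ante)
                    c = itemset - a
                    conf = supp / support(a)
                    if conf >= min_confidence:
                        lift = conf / support(c)
                        rules.append((sorted(a), sorted(c), supp, conf, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical toy baskets for illustration only.
baskets = [
    {"Blush", "Mascara", "Eye.shadow"},
    {"Blush", "Mascara", "Eye.shadow"},
    {"Mascara", "Eye.shadow"},
    {"Concealer"},
    {"Blush", "Concealer"},
]

rules = mine_rules(baskets)
for ante, cons, supp, conf, lift in rules:
    print(f"{ante} -> {cons}: support={supp:.2f}, conf={conf:.2f}, lift={lift:.3f}")
```

On real data with many items, a dedicated implementation with candidate pruning would be the practical choice; the sketch only shows how support, confidence and lift combine to select rules.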

6. First rule: If Blush+Concealer+Eye.shadow are purchased, Mascara is
also purchased. This rule has 96% confidence: purchasing
Blush+Concealer+Eye.shadow almost guarantees purchase of
Mascara. It has lift of 2.688, and support of about 11.9% (119
transactions out of 1000) for these 4 items together.

Second rule: If Blush+Eye.shadow are purchased, Mascara is also
purchased. This rule has 92.9% confidence: purchasing
Blush+Eye.shadow also almost guarantees purchase of Mascara. It
has lift of 2.601, and support of about 16.9% (169 transactions out of
1000) for these 3 items together.

Third rule: If Lip.Liner+Eyeliner are purchased, Concealer is also
purchased. This rule has 92.3% confidence: purchasing
Lip.Liner+Eyeliner also almost guarantees purchase of Concealer. It
has lift of 2.088, and support of about 12% (120 transactions out of
1000) for these 3 items together.

7. First, a note about utility. From a static retail presentation
perspective (buy X together with Y), the shopper's attention can
probably only handle a couple of rules. Coupon and web offer
generating systems have no such limit: although only one or two
offers are presented to a given customer at a given time, other
customers, and the same customer at a different time, may receive
different offers.
Regarding redundancy, several of the first rules overlap and can be
combined into a single offer: rules 1 and 7 can be bundled, as can
rules 9 and 12, and rules 14 and 27.
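
One way to look for overlapping rules is to flag pairs that share a consequent where one antecedent is a proper subset of the other. The Python sketch below applies that check to the three rules interpreted in item 6, using the confidence and lift values reported there. The rule numbers cited above (1, 7, 9, ...) refer to the R output, which is not reproduced here, so this does not re-derive those pairings; the function name `overlapping_pairs` and the subset criterion are choices made for this illustration.

```python
# The three rules interpreted in item 6:
# (antecedent items, consequent, confidence, lift)
interpreted_rules = [
    ({"Blush", "Concealer", "Eye.shadow"}, "Mascara", 0.96, 2.688),
    ({"Blush", "Eye.shadow"}, "Mascara", 0.929, 2.601),
    ({"Lip.Liner", "Eyeliner"}, "Concealer", 0.923, 2.088),
]


def overlapping_pairs(rules):
    """Find (general, specific, confidence_gain) index triples.

    A pair overlaps when both rules share a consequent and the general
    rule's antecedent is a proper subset of the specific rule's antecedent.
    """
    pairs = []
    for i, (a_gen, c_gen, conf_gen, _) in enumerate(rules):
        for j, (a_spec, c_spec, conf_spec, _) in enumerate(rules):
            if i != j and c_gen == c_spec and a_gen < a_spec:
                pairs.append((i, j, conf_spec - conf_gen))
    return pairs


pairs = overlapping_pairs(interpreted_rules)
for general, specific, gain in pairs:
    print(
        f"rule {specific + 1} narrows rule {general + 1}; "
        f"confidence gain = {gain:.3f}"
    )
```

Here the first rule only adds Concealer to the antecedent of the second rule and raises confidence by about three percentage points, so the two could reasonably be presented as one offer. Whether such a small gain justifies keeping both rules is a judgment call.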
