MRA Project - Shehroz Khan
MRA Project - Shehroz Khan
MRA Project - Shehroz Khan
Submitted by
1. Agenda & Executive Summary of the data -> Contents of the ppt -> Problem statement -> About
Data (Info, Shape, Summary Stats, your assumptions about data)
• We have received the 3 years data of automobile part manufacture. The data is about an
automobile parts manufacturing company.
• They have provided the data collected of transactions for 3 years.
• The data has 2747 entries (0 To 2746) of rows and 20 columns.
• The data has 1 datetime64 , 2 float64, 5 int64,and 12 Object data types.
• There is no missing values present in the data set.
• This data more or less reflects the purchasing behavior of customers in different categories.
• The company is into automobile part manufacture, and they have different product line like
Classic car , Motorcycle, plane, train, ship, Bus truck, vintage cars etc.
• The data maintained each transactions entry as order number and for each order number
maintained all required information like customer identity details , and product details like price
, quantity , product code, and sales for each customer.
• We noticed that one order number has many different entries with different product codes.
• Manufacturer's Suggested Retail Price(MSRP) for each product code is decided but we found that
this is not matching with Price of Each item & is inconsistent with MSRP.
2. PART A: Exploratory Analysis and Inferences -> Univariate, Bivariate, and multivariate analysis
using data visualization (Weekly, Monthly, Quarterly, Yearly Trends in Sales and Sales Across
different Categories of different features in the given data) -> Summarise the inferences
Univariate Analysis
• Using boxplot on sales & quantity order variable we have plotted univariate analysis.
• We can clearly see that outlier is present there.
• Also using histogram on sales variable we did univariate analysis.
• For Categorical variable like product line we also did univariate analysis using bar plot.
• We have no noticed that the sales of classic cars products are high followed by vintage car product
sales
Bivariate Analysis
• Using boxplot on sales & product line variables we have plotted bivariate analysis.
• We can clearly see that outlier is present in each product line category .
• Using boxplot on sales & deal size variables we have plotted bivariate analysis.
• We can clearly see that outlier is present in Large deal size. In Pie chart we can see the larger
portion of classic cars followed by vintage cars were as trains has the least demand.
Multivariate Analysis
• MSRP, Price Each, status, sales & product line using these variables we did multivariate analysis.
• For this we used horizontal bar, tree map, stack bar, scatter plot respectively.
• As sales are high for classic cars the company has even sold below MSRP, there might be a chances
that the company has given more discounts to its customers. And vice versa for vintage cars were
the company has sold above MSRP.
• Ship, vintage car & train are been sold above the MSRP.
• By looking at the given data almost all the transactions are been shipped.
• Yearly, Quarterly, monthly, Weekly time series analysis & its trend are been shown.
• We observed that in Last quarter sales are high as compared to other quarters.
• There is a seasonality seen.
Summary of Inferences
4. PART A: Inferences from RFM Analysis and identified segments -> Who are your best customers?
(give at least 5) -> Which customers are on the verge of churning? (give at least 5) -> Who are
your lost customers? (give at least 5) -> Who are your loyal customers? (give at least 5)
• On basis on Recency, frequency & monetary we have grouped our top customers.
• We have given the most signifcance to recency parameter as these customers has recently
purchased our products.
• Also according to RFM model the most importance is given to recency.
• Hence we have kept it as our parameter for selecting top customers.
• For eg. Customer name -Euro Shopping Channel, they have recently made a purchase, also has
high frequency with a high monetary transaction.
• On basis on Recency, frequency & monetary we have grouped our loyal customers.
• These customers have purchased multiple items with good monetary value.
• If we focus more on this segment of customers, we can easily turn them into our top best
customers too. Also, in this segment we can see the customers for productline - classic cars are
many.
Customers on verge of churning
• On basis on Recency, frequency & monetary we have grouped our Customers who are on verge
of churning.
• We should definitely focus on this group before we lose them and try to convert them into our
regular customers.
• For e.g. Customer name Saveley & Henriot,Co – Their frequency is good with good monetary
value, but low recency made them stand in this group. If the company pays more attention and
fulfil their requirement, then we can easily turn them into our regular customer and we can save
them from churning out.
Lost Customers
• On basis on Recency, frequency & monetary parameters we have grouped our Customers who
we’d lost. Their recency is very low and hasn’t made any purchase since long.
• So we can say these are our lost customers. If taken feedback from them and fulfill their demand
we might bring them back to been a good customer.
Recommendation
• Using Recency, frequency & monetary parameters we have grouped our top , loyal, on the verge
of churning and lost customers.
• Customers with good recency has been our top customers were as we also have lost customer
lists.
• Customers on verge of churning can be saved and can be converted into a good buyer.
• RFM model is used for deriving the customers types like Loyal, top or best, on verge of churning
& lost customers.
• Recency, frequency & monetary parameters were widely used to bifurcate the types of
customers.
• This model can be very helpful to the company to maintain its sales and customers and can focus
on how the company has lost the customers & can take various actions to bring back them.
• It is vital for the company to convert the customers who are on verge of churning into a regular
customer or atleast maintain them.
• And also how to increase the sales ratio can be identified.
Grocery Store (Part B)
1. PART B: Exploratory Analysis --> Exploratory Analysis of data & an executive summary (in PPT)
of your top findings, supported by graphs. --> Are there trends across
months/years/quarters/days etc. that you are able to notice?
• It is generally a good practice to drop duplicate rows in a dataset as they do not provide any
additional information and can skew the results of any analysis performed on the dataset.
• However, in this particular case, dropping duplicate rows may not be appropriate as there is no
unique identifier for each row.
• Each row consists of a date, a customer ID, and a product purchased, but the same product can
be purchased by multiple customers on the same date.
• Therefore, we drop duplicate rows, it may inadvertently remove valid information from the
dataset.
• So duplicate values are not removed from the dataset
Assumptions
• The data represents a list of items purchased at a grocery store on various dates.
• Each entry in the data represents a single item purchased.
• The first column in the data represents the date the item was purchased.
• The second column represents the customer who made the purchase.
• The third column represents the item purchased.
• The same item can be purchased by multiple customers on different dates.
• There is no information provided about the quantity or price of each item.
• We have not dropped the duplicated values.
Product Count
Count of products sold yearly
Non-Eatables
Product sales vs Time
Observations
• As we have data till 26 feb 2020 that’s why the count of products sold in 2020 is low
• As we have data till 26 feb 2020 that’s why the count of products sold in Q1 is Hight.
• In 2019 Q1 sales was highest
• In 2018 Q3 sales was highest
• Count of product sold in Q2 is aproxx same in 2019 and 2018.
• In 2018 most of the products were sold in January and least were sold in February.
• In 2019 most of the products were sold in March and least were sold in January.
• Most of the products were sold on Sundays.
• Least products were sold on Mondays.
• On other days sales in consistent.
• There are total 28 products in this category.
• Highest sold : poultry , Soda, cereals
• Least sold : pork, fruits, sandwich loaves
• There are 9 products in this categoary.
• Highest Sold Products : Soap, Toilet Paper.
• Lowest Sold Products: Hand soap.
• We can see poultry, cereals and soda are highly sold products over the years
• Poultry, Soda and Cereal are the top 3 Products sold in 2019
• Cereals, Poultry and Soda are the top 3 Products sold in 2018
Recommendations
• Focus on promoting and stocking up on poultry, soda, and cereals as they are consistently top-
selling products.
• Consider increasing the stock of soap and toilet paper as they are the highest sold non-eatable
products.
• Evaluate the reasons behind the low sales of hand soap and take measures to increase its sales.
• Schedule promotions and offers on Sundays to maximize sales on the day with the highest sales.
• Plan marketing campaigns and discounts during February to increase sales during the historically
low-sales month.
• Plan marketing campaigns and discounts during January and March to increase sales during the
historically high-sales months.
• Aim to replicate the sales patterns of Q1 2019 and Q3 2018.
• Keep the stock of products sold in Q2 consistent with the previous years to maintain sales levels.
• Keep in mind the limited data for 2020 while making sales and marketing decisions.
2. PART B: Use of Market Basket Analysis (Association Rules) -->Write Something about the
association rules and its relevance in this case -->Add KNIME workflow image -->Write about
threshold values of Support and Confidence
• In market basket analysis, support, confidence, and lift values are used to measure the strength
of association between items in a transaction dataset.
• Confidence: It is the conditional probability that a transaction containing one item also contains
another item. It is calculated as the number of transactions containing both items divided by the
number of transactions containing the first item. It measures the strength of the association
between two items. High confidence indicates that the items are likely to be bought together, and
can be used to recommend or suggest items to customers.
• Lift: It is the measure of how much more often two items occur together than expected if they
were independent of each other. It is calculated as the support of the itemset divided by the
product of the individual supports of the items. A lift value of 1 indicates that the items are
independent, while a value greater than 1 indicates a positive association between the items. A
lift value less than 1 indicates a negative association between the items. High lift indicates that
the items have a strong association and can be used for cross-selling or bundling.
Association rules are a technique used to find relationships or associations between items in a large
dataset. These rules are based on the concept of frequent itemsets, which are sets of items that appear
together frequently in a transactional dataset.
24 rules have been found with the dataset and set parameters
Recommendations
• Offer a "Buy Two Get One Free" promotion on yogurt, poultry, and aluminum foil to encourage
customers to purchase more items at once.
• Create a combo deal where customers can purchase cereals, bagels, and sandwich bags together
at a discounted price.
• Offer a discount on mixes when purchased with yogurt, poultry, or aluminum foil.
• Provide a discount on dinner rolls when purchased with spaghetti sauce or poultry.
• Create a "Paper Products Bundle" offer that includes paper towels, toilet paper, and/or tissues at
a discounted price.
• These discount offers and combos can help increase sales by providing customers with more value
for their money and encouraging them to purchase more items. It is important to promote these
offers through in-store signage, advertisements, and social media to ensure customers are aware
of the deals available.