Business Report Pradeep Chauhan 11june'23
PGDSBA
Pradeep Chauhan
A. What is the important technical information about the dataset that a database administrator would be interested in? (Hint: Information about the size of the dataset and the nature of the variables)
B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality check of the data so that the variables are consistent. Are there any discrepancies present in the data? If yes, perform preliminary treatment of data.
C. Explore all the features of the data separately by using appropriate visualizations and draw insights that can be utilized by the business.
D. Understanding the relationships among the variables in the dataset is crucial for every analytical project. Perform analysis on the data fields to gain deeper insights. Comment on your understanding of the data.
E. Employees working on the existing marketing campaign have made the following remarks. Based on the data and your analysis state whether you agree or disagree with their observations. Justify your answer based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target for a SUV sale over a Sedan Sale.
F. From the given data, comment on the amount spent on purchasing automobiles across the following categories. Comment on how a Business can utilize the results from this exercise. Give justification along with presenting metrics/charts used for arriving at the conclusions.
F1) Gender
F2) Personal_loan
G. From the current data set comment if having a working partner leads to the purchase of a higher-priced car.
H. The main objective of this analysis is to devise an improved marketing strategy to send targeted information to different groups of potential buyers present in the data. For the current analysis use the Gender and Marital_status fields to arrive at groups with similar purchase history.
I. Analyse the dataset and list down the top 5 important variables, along with the business justifications.
Problem 1
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In
its recent board meeting, concerns were raised by the members on the efficiency of the marketing
campaign currently being used. The board decides to rope in an analytics professional to improve the
existing campaign.
1. You as an analyst have been tasked with performing a thorough analysis of the data and coming
up with insights to improve the marketing campaign.
A. What is the important technical information about the dataset that a database administrator
would be interested in? (Hint: Information about the size of the dataset and the nature of the
variables)
Answer A: To improve the marketing campaign, we first check the relevance of the data.
We start by importing the required libraries: numpy for array and mathematical operations, pandas for loading the data into a DataFrame, and matplotlib and seaborn for visualising the data.
Next, we validate the data and check whether its quality is good enough to proceed further.
We examine each feature in the data, using descriptive statistics to identify its central tendency and distribution.
For numeric fields, central tendency is measured by the mean, median and mode, while the spread of each continuous variable is measured by its standard deviation; the shape of each distribution is examined visually.
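A minimal sketch of these measures with pandas, applied to a hypothetical age column (the values are made up and are not taken from the dataset). Note that pandas' std() returns the sample standard deviation (ddof=1) by default.

```python
import pandas as pd


def central_tendency(values: pd.Series) -> dict:
    """Return the mean, median, mode(s) and standard deviation of a numeric series."""
    values = values.dropna()
    return {
        "mean": values.mean(),
        "median": values.median(),
        # mode() returns every most-frequent value, so ties give several modes
        "mode": values.mode().tolist(),
        # sample standard deviation (ddof=1), pandas' default
        "std": values.std(),
    }


# Hypothetical ages, not from the actual dataset
ages = pd.Series([22, 25, 25, 30, 31, 40, 54])
stats = central_tendency(ages)
print(stats)
```

Reporting the median next to the mean is useful here because the later salary and age distributions are skewed, and the median is less affected by extreme values.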
Reading the CSV: we use pandas' read_csv function to load the CSV file into a DataFrame.
We first view the top 5 rows of the data and observe one spelling mistake in the Gender column.
We then view the last 5 rows to confirm that the data has been loaded completely.
Using the shape property of the DataFrame, we find that the data has 1581 rows and 14 features/columns (including both dependent and independent variables), so we expect each column to have 1581 entries.
The column summary shows that Gender and Partner_salary have missing values. The dataset has 1 float, 5 integer and 8 object columns.
There are 8 categorical variables: Gender, Profession, Marital_status, Education, Personal_loan, House_loan, Partner_working and make.
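The loading and inspection steps above can be sketched as follows. To keep the sketch self-contained, it reads a hypothetical three-row sample from a string instead of the real CSV, whose path the report does not give. The column names are assumed from the variables named in the report; No_of_Dependents and Make in particular are assumed spellings.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed 14-column layout; with the real data this
# would be pd.read_csv("<path to the csv file>") instead.
SAMPLE_CSV = """Age,Gender,Profession,Marital_status,Education,No_of_Dependents,Personal_loan,House_loan,Partner_working,Salary,Partner_salary,Total_salary,Price,Make
30,Male,Salaried,Married,Post Graduate,3,No,No,Yes,70000,70000,140000,61000,SUV
27,,Business,Single,Graduate,0,Yes,No,No,60000,,60000,30000,Hatchback
45,Female,Salaried,Married,Graduate,2,No,Yes,Yes,80000,,160000,57000,Sedan
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.head())   # first 5 rows
print(df.tail())   # last 5 rows
print(df.shape)    # (number of rows, number of columns)
df.info()          # non-null count and dtype of every column

# Columns that contain missing values, with how many are missing
missing = df.isnull().sum()
print(missing[missing > 0])

# Split numeric columns from categorical (text) columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(df[numeric_cols].dtypes.value_counts())
print(categorical_cols)

# count, mean, std, min, quartiles and max of every numeric column
print(df.describe())
```

Selecting categorical columns with exclude="number" rather than include="object" keeps the split working on pandas versions that store text in a dedicated string dtype.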
Using the describe() function, we observe that the maximum age is 54 and the minimum age is 22.
B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality check of
the data so that the variables are consistent. Are there any discrepancies present in the data? If
yes, perform preliminary treatment of data.
Answer B
• In Answer A we viewed the top 5 rows of the data and observed one spelling mistake in the ‘Gender’ column; here we check for further discrepancies.
• Listing the unique values of the ‘Gender’ column shows two misspelled labels as well as null values, which we treat.
• We correct both misspellings in the ‘Gender’ column using the replace() method and fill the null values with the most frequent value (the mode), ‘Male’.
• After treatment, the ‘Gender’ column has all 1581 entries, with datatype object (text).
• We have imputed the missing values in the ‘Partner_salary’ column.
• We find outliers in the ‘Total_salary’ column and treat them using the interquartile range (IQR) approach.
• We have treated the outliers in ‘Total_salary’ using the IQR method.
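A minimal sketch of the three treatments above on hypothetical rows. The misspelt labels are placeholders, because the report does not name them. The Partner_salary rule assumes Total_salary = Salary + Partner_salary, because the report does not state how the values were imputed. Capping at the IQR fences is assumed as the outlier treatment.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; "Femal" and "Femle" are placeholder misspellings.
df = pd.DataFrame({
    "Gender": ["Male", "Femal", "Male", np.nan, "Femle", "Male"],
    "Salary": [60000, 80000, 70000, 60000, 90000, 300000],
    "Partner_salary": [0.0, np.nan, 40000.0, np.nan, 30000.0, 200000.0],
    "Total_salary": [60000, 120000, 110000, 60000, 120000, 500000],
})

# 1. Gender: inspect the labels, fix misspellings, then fill nulls with the mode
print(df["Gender"].unique())
df["Gender"] = df["Gender"].replace({"Femal": "Female", "Femle": "Female"})
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])  # the mode is "Male" here

# 2. Partner_salary: ASSUMPTION (not stated in the report) that
#    Total_salary = Salary + Partner_salary, so a missing value can be derived
df["Partner_salary"] = df["Partner_salary"].fillna(df["Total_salary"] - df["Salary"])

# 3. Total_salary: cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
#    Capping is one common treatment; the report does not say whether
#    outliers were capped or removed.
q1, q3 = df["Total_salary"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["Total_salary"] = df["Total_salary"].astype(float).clip(lower=lower, upper=upper)

print(df)
```

Capping keeps all 1581 rows, which matters when the outlying customers are still genuine buyers; dropping them instead would shrink the dataset.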
C. Explore all the features of the data separately by using appropriate visualizations and draw
insights that can be utilized by the business.
Answer C
We perform univariate analysis here to examine each feature in depth and draw insights.
• The boxplot and countplot above show that most cars are purchased by people below the age of 31 (that is, aged 30 or younger).
• Most cars are purchased by people with 2 or 3 dependants.
• Most cars are purchased by male customers.
• The salary histogram above is double-peaked (bimodal), with peaks at about 60K and 80K, so most cars are bought by people earning around these two levels.
• The boxplot also shows some outliers, and the distribution is slightly right-skewed. These outliers can be analysed separately and treated using the IQR method.
• The plot above shows that most cars are purchased by married people.
• The chart above shows that Sedans are purchased the most, followed by Hatchbacks, while demand for SUVs is the lowest.
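The report draws these charts with seaborn; the sketch below uses plain matplotlib and pandas instead, so it has no extra dependency. The data are hypothetical, and the age bands are an illustrative choice rather than the ones used in the report.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the cleaned Austo dataset
df = pd.DataFrame({
    "Age": [23, 25, 26, 28, 29, 30, 31, 35, 42, 54],
    "Make": ["Hatchback", "Sedan", "Hatchback", "Sedan", "Sedan",
             "SUV", "Sedan", "Hatchback", "SUV", "Sedan"],
})


def plot_numeric(series: pd.Series, name: str):
    """Histogram (shape, peaks) and boxplot (spread, outliers) side by side."""
    data = series.dropna()
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(data, bins=10)
    ax_hist.set_title(f"{name}: histogram")
    ax_box.boxplot(data)
    ax_box.set_title(f"{name}: boxplot")
    return fig


def plot_categorical(series: pd.Series, name: str):
    """Bar chart of category counts, similar to seaborn's countplot."""
    counts = series.value_counts()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar([str(label) for label in counts.index], counts.to_numpy())
    ax.set_title(f"{name}: counts")
    return fig, counts


fig_age = plot_numeric(df["Age"], "Age")
fig_make, make_counts = plot_categorical(df["Make"], "Make")

# Purchases per age band, to check where most buyers fall
age_bands = pd.cut(df["Age"], bins=[20, 30, 40, 50, 60]).value_counts().sort_index()
print(make_counts)
print(age_bands)
```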
D. Understanding the relationships among the variables in the dataset is crucial for every analytical
project. Perform analysis on the data fields to gain deeper insights. Comment on your
understanding of the data.
Answer D
• The countplot above shows that Sedans are purchased more when both partners are working.
• Married people purchase more cars, particularly Sedans, while single people prefer Hatchbacks.
• Postgraduates purchase more cars, especially Sedans; graduates also purchase more Sedans.
• Salaried people purchase more cars, especially Sedans. SUVs are also purchased more by salaried people.
• Males purchase more cars, especially Hatchbacks, whereas SUVs are purchased more by females.
• People with no house loan purchase more cars, especially Sedans.
• The pairplot shows a positive correlation between Age and Price: as customers get older, they tend to buy more expensive cars.
• There is no apparent correlation between Total_salary and Price.
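The bivariate views above can also be checked numerically: pd.crosstab gives the counts behind a countplot with a hue, and DataFrame.corr gives the Pearson correlations that a pairplot suggests visually. The sketch uses hypothetical rows, with column names assumed from the report.

```python
import pandas as pd

# Hypothetical rows; not taken from the actual dataset
df = pd.DataFrame({
    "Age": [25, 28, 32, 36, 41, 47, 52],
    "Total_salary": [90000, 150000, 70000, 160000, 100000, 80000, 130000],
    "Price": [30000, 32000, 45000, 48000, 55000, 62000, 70000],
    "Marital_status": ["Single", "Single", "Married", "Married",
                       "Married", "Married", "Married"],
    "Make": ["Hatchback", "Hatchback", "Sedan", "Sedan", "Sedan", "SUV", "SUV"],
})

# Count of each Make within each marital status (the numbers behind a hued countplot)
ct = pd.crosstab(df["Marital_status"], df["Make"])
print(ct)

# Pairwise Pearson correlations between the numeric columns
corr = df[["Age", "Total_salary", "Price"]].corr()
print(corr.round(2))
```

A correlation close to +1 supports a statement such as "older customers buy more expensive cars", while a value near 0 supports "no correlation"; both describe linear association only.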
E. Employees working on the existing marketing campaign have made the following remarks. Based on
the data and your analysis state whether you agree or disagree with their observations. Justify your
answer based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target for a
SUV sale over a Sedan Sale.
Answer E
E1) From the plots above, we disagree with Steve Roger: the data show the opposite, with women preferring SUVs by a large margin compared to men.
E2) From the plot above, we agree with Ned Stark: salaried people buy more Sedans.
E3) From the countplot above, we disagree with Sheldon Cooper: salaried males purchase more Sedans than SUVs.
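Because the groups differ in size (male buyers outnumber female buyers), raw counts alone can mislead; comparing the share of each make within each group is a fairer test of statements like these. A sketch on hypothetical rows, with column names assumed from the report:

```python
import pandas as pd

# Hypothetical rows; not taken from the actual dataset
df = pd.DataFrame({
    "Gender": ["Male"] * 6 + ["Female"] * 4,
    "Profession": ["Salaried", "Salaried", "Business", "Salaried", "Business",
                   "Salaried", "Salaried", "Business", "Salaried", "Salaried"],
    "Make": ["Sedan", "Hatchback", "Hatchback", "SUV", "Sedan",
             "Sedan", "SUV", "SUV", "Sedan", "SUV"],
})

# E1: share of each Make within each Gender (each row sums to 1)
by_gender = pd.crosstab(df["Gender"], df["Make"], normalize="index")

# E2: share of each Make within each Profession
by_profession = pd.crosstab(df["Profession"], df["Make"], normalize="index")

# E3: restrict to salaried males, then compare SUV and Sedan counts
salaried_male = df[(df["Gender"] == "Male") & (df["Profession"] == "Salaried")]
e3_counts = salaried_male["Make"].value_counts()

print(by_gender.round(2))
print(by_profession.round(2))
print(e3_counts)
```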
F. From the given data, comment on the amount spent on purchasing automobiles across the following
categories. Comment on how a Business can utilize the results from this exercise. Give justification along
with presenting metrics/charts used for arriving at the conclusions.
F1) Gender
F2) Personal_loan
Answer F
F1
• The histogram above shows that males purchased more cars than females.
• The mean and median values show that males spent more money on purchasing cars than females.
Answer F2
• The chart above shows that whether or not a person has taken a personal loan makes no noticeable difference to the price of the car purchased.
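The metrics behind F1 and F2 can be computed with a single groupby: the number of purchases, the total amount spent, and the mean and median Price per group. A sketch on hypothetical rows, with column names assumed from the report:

```python
import pandas as pd

# Hypothetical rows; not taken from the actual dataset
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Personal_loan": ["Yes", "No", "Yes", "No", "No"],
    "Price": [30000, 48000, 60000, 40000, 50000],
})


def spend_summary(data: pd.DataFrame, by: str) -> pd.DataFrame:
    """Purchases, total amount spent, and mean/median Price for each group in `by`."""
    return data.groupby(by)["Price"].agg(["count", "sum", "mean", "median"])


gender_summary = spend_summary(df, "Gender")        # F1
loan_summary = spend_summary(df, "Personal_loan")   # F2
print(gender_summary)
print(loan_summary)
```

The total ("sum") reflects both how many cars a group buys and how much it pays per car, while the mean and median isolate spend per purchase; reading them together separates a larger group from a higher-spending one.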
G. From the current data set comment if having a working partner leads to the purchase of a higher-
priced car.
Answer G
• The plots above show no relationship between the price of the car and whether the partner is working.
• Comparing the mean and median Price of the two ‘Partner_working’ groups, we conclude that purchasing a higher-priced car is independent of whether the partner is working.
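One way to quantify this, sketched on hypothetical rows: encode Partner_working as 1/0 (assuming "Yes"/"No" labels) and compute its Pearson correlation with Price, which for a binary variable is the point-biserial correlation. A value near 0 indicates no linear relationship; the group means and medians are shown alongside.

```python
import pandas as pd

# Hypothetical rows; the "Yes"/"No" labels are assumed
df = pd.DataFrame({
    "Partner_working": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Price": [35000, 36000, 52000, 50000, 41000, 42000],
})

# Pearson correlation with a 0/1 variable equals the point-biserial correlation
partner_flag = (df["Partner_working"] == "Yes").astype(int)
r = df["Price"].corr(partner_flag)

group_stats = df.groupby("Partner_working")["Price"].agg(["mean", "median"])
print(f"point-biserial r = {r:.3f}")
print(group_stats)
```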
H. The main objective of this analysis is to devise an improved marketing strategy to send targeted
information to different groups of potential buyers present in the data. For the current analysis use the
Gender and Marital_status - fields to arrive at groups with similar purchase history.
Answer H
Based on the countplot above, we can refine our marketing strategy by making married men our target audience, reaching out to them and converting them into customers, as they are our strongest group of potential buyers.
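The Gender and Marital_status segments can be built with a groupby on both fields, giving each segment's size, average spend and preferred makes. A sketch on hypothetical rows, with column names assumed from the report:

```python
import pandas as pd

# Hypothetical rows; not taken from the actual dataset
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Male", "Female", "Male"],
    "Marital_status": ["Married", "Married", "Single", "Married",
                       "Married", "Single", "Single"],
    "Make": ["Sedan", "SUV", "Hatchback", "SUV", "Sedan", "Hatchback", "Hatchback"],
    "Price": [48000, 60000, 30000, 62000, 50000, 31000, 29000],
})

# Size and average spend of each Gender x Marital_status segment, largest first
segments = (
    df.groupby(["Gender", "Marital_status"])["Price"]
    .agg(purchases="count", mean_price="mean")
    .sort_values("purchases", ascending=False)
)

# Makes bought within each segment, for tailoring the message per group
segment_makes = pd.crosstab([df["Gender"], df["Marital_status"]], df["Make"])

print(segments)
print(segment_makes)
```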
Problem 2
A bank can generate revenue in a variety of ways, such as charging interest,
transaction fees and financial advice. Interest charged on the capital that the bank
lends out to customers has historically been the most significant method of revenue
generation. The bank earns profits from the difference between the interest rates it
pays on deposits and other sources of funds, and the interest rates it charges on the
loans it gives out.
GODIGT Bank is a mid-sized private bank that deals in all kinds of banking products,
such as savings accounts, current accounts, investment products, etc. among other
offerings. The bank also cross-sells asset products to its existing customers through
personal loans, auto loans, business loans, etc., and to do so they use various
communication methods including cold calling, e-mails, recommendations on the net
banking, mobile banking, etc.
GODIGT Bank also has a set of customers who were given credit cards based on
risk policy and customer category class but due to huge competition in the credit
card market, the bank is observing high attrition in credit card spending. The bank
makes money only if customers spend more on credit cards. Given the attrition, the
Bank wants to revisit its credit card policy and make sure that the card given to the
customer is the right credit card. The bank will make a profit only through the
customers that show higher intent towards a recommended credit card. (Higher
intent means consumers would want to use the card and hence not attrite.)
Problem 2 Question: Analyse the dataset and list down the top 5 important variables, along with the business justifications. (10 Points)
Answer
We consider the following 5 variables when issuing a new credit card to a customer:
1. avg_spends_l3m
2. annual_income_at_source
3. other_bank_cc_holding
4. card_type
5. cc_limit
1.) avg_spends_l3m
We discuss the 5 variables one by one, keeping the following objective in focus:
“The bank will make a profit only through the customers that show higher
intent towards a recommended credit card. (Higher intent means consumers
would want to use the card and hence not attrite.)”
Knowing a customer's average credit card spend over the last 3 months gives a fair idea of their intent: the higher the spend, the higher the intent and the lower the chance of attrition.
To encourage customers to use their credit card more frequently, the bank should allot a higher credit card limit.
There is a positive correlation between the average credit card spend in the last 3 months (avg_spends_l3m) and cc_limit.
2.) annual_income_at_source
There is a positive correlation between annual income and credit card limit. The higher a customer's annual income, the more money they can spend after meeting their fixed obligations.
This results in higher average credit card spend, higher intent and a lower probability of attrition.
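The two positive correlations claimed above can be checked with DataFrame.corr. The sketch uses hypothetical customer values under the column names listed in this answer; Spearman rank correlation is shown alongside Pearson because it is less sensitive to a few very high incomes.

```python
import pandas as pd

# Hypothetical customers; the column names follow the report, the values are made up
cards = pd.DataFrame({
    "annual_income_at_source": [300000, 450000, 600000, 800000, 1200000],
    "cc_limit": [50000, 80000, 100000, 150000, 250000],
    "avg_spends_l3m": [8000, 12000, 20000, 26000, 40000],
})

pearson = cards.corr()                    # linear association
spearman = cards.corr(method="spearman")  # monotonic (rank) association
print(pearson.round(2))
print(spearman.round(2))
```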
3.) other_bank_cc_holding
The bank should take into account customers who hold other banks' credit cards. By analysing these customers, the bank can gain insights into customer preferences, spending habits and the credit card features that are popular among competitors.
This information can help the bank tailor its product offerings and marketing strategies to better meet customer needs and remain competitive in the market.
4.) card_type
We can increase card usage by analysing customers' needs and preferences and offering each customer the most relevant credit card.
By understanding customers' preferences and providing them with the right credit card type, we can reduce attrition rates. When customers are satisfied with their credit card and find it suitable for their financial needs, they are less likely to switch to another provider or cancel the card.
Considering the credit card type allows the bank to target the right customers, increase card usage, reduce attrition rates and increase profits.
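Points 3 and 4 both amount to comparing recent spend across customer groups. A sketch with a small helper applied to other_bank_cc_holding and card_type; the category labels ("Y"/"N" and "gold"/"platinum"/"silver") and all values are hypothetical, as the report does not list the actual categories.

```python
import pandas as pd

# Hypothetical customers; category labels and values are made up
cards = pd.DataFrame({
    "card_type": ["gold", "gold", "platinum", "platinum", "silver", "silver"],
    "other_bank_cc_holding": ["Y", "N", "Y", "N", "Y", "N"],
    "avg_spends_l3m": [15000, 18000, 30000, 36000, 6000, 9000],
})


def spend_by(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Customer count and mean/median recent spend per value of `column`, highest mean first."""
    return (
        data.groupby(column)["avg_spends_l3m"]
        .agg(customers="count", mean_spend="mean", median_spend="median")
        .sort_values("mean_spend", ascending=False)
    )


by_other_bank = spend_by(cards, "other_bank_cc_holding")  # point 3
by_card = spend_by(cards, "card_type")                    # point 4
print(by_other_bank)
print(by_card)
```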
5.) cc_limit
The credit card limit is a critical factor in card usage, and we can customise the card type according to the customer's needs and interests. This encourages the customer to spend more and lowers the attrition rate.
The credit card limit directly impacts a customer's spending capacity. By considering the credit
card limit, banks can issue cards with limits that align with a customer's income, financial stability,
and spending habits.