Middle Class Spending Report CIA 3 Ba


CIA – 3

BUSINESS ANALYTICS

TOPIC- PREDICTIVE ANALYSIS


SUBMITTED TO –
PROF. GOKILAVANI R
SCHOOL OF COMMERCE, FINANCE AND ACCOUNTANCY
CHRIST (DEEMED TO BE UNIVERSITY)
BANGALORE
SUBMITTED BY –
NAME ROLL NO

SIDDHARTH MITRA 2314466

NUPUR ABROL 2314444

AARON SHIBU 2314402

ANJANAY SEHGAL 2314412

VRISTI P 2314478
3BCOM F&I

Predictive Analysis of Income, Savings, Expenditure, and Investment in
Middle-Class Households

Introduction:
In today's competitive environment, success depends on correctly understanding consumer
behaviour. This report examines the inter-relationships between four major variables: Age,
Education, Income, and Spending, where Spending is the dependent variable and the other
three are independent variables.
We use secondary data to examine the correlations and regressions that define the
relationships driving Spending, and apply statistical techniques to build a predictive
model that forecasts Spending from the other three variables.
This, in turn, gives deeper insight into consumer behaviour and yields actionable
recommendations for optimising marketing strategies, enabling businesses to anticipate
consumer needs more effectively and improve overall performance.

Assumptions and predictions:


From the given regression output, the following conditions are met:

1. Correlation between Independent and Dependent Variables:


- There is a very strong positive correlation between Income and Spending (0.99), meaning
that spending rises almost proportionally with income.
- Education is moderately positively correlated with Spending (0.70), which implies that
more-educated respondents tend to spend more than less-educated ones.
- Age is weakly and negatively correlated with Spending (-0.04), meaning that age has
little effect on spending in this data.
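
The coefficients quoted above are presumably Pearson correlation coefficients. A minimal sketch of how such a value is computed, using only the Python standard library, follows; the function name and the sample values are hypothetical and are not the report's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustration only (NOT the report's dataset).
income = [40000, 45000, 50000, 55000]
spending = [23000, 26500, 29000, 33000]
r = pearson_r(income, spending)
print(f"r = {r:.3f}")
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 (such as the -0.04 reported for Age) indicate almost none.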

2. Multicollinearity:
- Education and Income are strongly correlated (0.79), so they are probably multicollinear,
which may affect the accuracy of the regression model's coefficient estimates.
- The other pairs of independent variables (those involving Age) have low correlations,
indicating that multicollinearity arises primarily from the relationship between
Education and Income.
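
One common way to gauge multicollinearity is the variance inflation factor (VIF). The sketch below assumes the simplified two-predictor case, in which the VIF reduces to 1 / (1 − r²); it uses the reported Education-Income correlation of 0.79 and ignores Age, so the VIFs of the full three-predictor model may differ. The function name is illustrative.

```python
def vif_two_predictors(r):
    """VIF for a predictor whose only other predictor has correlation r with it."""
    if abs(r) >= 1:
        raise ValueError("VIF is undefined for perfectly correlated predictors")
    return 1.0 / (1.0 - r ** 2)

# 0.79 is the Education-Income correlation reported in this section.
vif = vif_two_predictors(0.79)
print(round(vif, 2))  # 2.66
```

Common rules of thumb flag VIF values above roughly 5 to 10, so this simplified figure is best read as a first check rather than a verdict.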

3. Data normalisation and outliers:


- The distribution of the data set seems reasonably good, and there are no extreme outliers in
Age, Education, Income, and Spending.
- The range is moderate and does not include any values that differ markedly from the mean.
For example, Income has a standard deviation of around 5,770, which is reasonable given its
mean of 44,892.
- The data appears to be well structured with no significant outliers; however, the high
correlation between Education and Income is problematic because it may result in
multicollinearity in the regression model.
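
The claim that the Income spread is reasonable can be checked with the coefficient of variation (standard deviation divided by mean), and individual values can be screened with z-scores. This is a sketch only: the threshold of 3 standard deviations is a common convention assumed here, not something stated in the report, and the function names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std_dev / mean

def is_outlier(value, mean, std_dev, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold (assumed to be 3)."""
    z = (value - mean) / std_dev
    return abs(z) > threshold

# Income mean and standard deviation as reported in this section.
INCOME_MEAN = 44892
INCOME_STD = 5770

cv = coefficient_of_variation(INCOME_STD, INCOME_MEAN)
print(f"{cv:.1%}")  # 12.9%
```

A relative spread of about 13% is consistent with the observation that Income values stay fairly close to the mean.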

Research Methodology:

Objective:
● Establish the relationship of Age, Education, and Income with Consumer
Spending: Run regression analysis to know how each independent variable: Age,
Education, Income impacts the dependent variable: Spending.

● Evaluate the Predictive Power of the Model:


With an R-squared value standing at 1, determine how strong this regression model is
for predicting the consumer spending in relation to the independent variables and
whether overfitting is a concern.

● Develop Consumer Profiles:


Based on the regression output, divide consumers into various profiles depending on
age, education, and income, and analyse typical spending behaviour. Determine how Age,
Education, and Income interact to shape Consumer Spending using the variable
relationships modelled by the regression.

Hypothesis
Null Hypothesis (H₀):
The null hypothesis states that a phenomenon does not exist, or there is no relationship
between the variables being compared. It simply states the default position that there is no
significant effect and that observed differences are the result of chance or random variation.
For example, in a regression analysis, it can be a statement such as "there is no relationship
between income and savings."
Alternative Hypothesis (H₁ or Hₐ):
The alternative hypothesis is the complement of the null hypothesis. It holds that an effect
exists, a relationship does exist, or a difference between variables is significant. Continuing
with the same example, the alternative hypothesis would be "there is a significant relation
between income and savings."

Analysis and Interpretation

In our data we have taken Age, Education and Income as the independent variables and
Spending as the dependent variable for the regression analysis.

Before we move onto analysis, we need to understand the basics of regression analysis for
better understanding of our data.

The "regression analysis" sheet contains the data used to forecast the dependent variable
(Spending) from the independent variables (Age, Education, Income).

Key Terms:

· Multiple R: The correlation coefficient between the observed and predicted values of the
dependent variable. It ranges in value from 0 to 1.

· R Square: Measures how much of the variation in the dependent variable is explained by
the independent variables in the regression model.

· Adjusted R Square: R Square adjusted for the number of predictors in the model, making it
more reliable for assessing model fit when working with several variables.
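
The three key terms can be sketched in code as follows. The formulas are the standard ones; the function names and the observed and predicted values are hypothetical and serve only to show the calculation.

```python
import math

def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """R Square penalised for using k predictors on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values (NOT the report's data): 4 observations, 1 predictor.
actual = [10, 20, 30, 40]
predicted = [12, 18, 31, 39]

r2 = r_squared(actual, predicted)
multiple_r = math.sqrt(r2)  # Multiple R is the square root of R Square
adj_r2 = adjusted_r_squared(r2, n=len(actual), k=1)
print(round(r2, 2), round(adj_r2, 2))  # 0.98 0.97
```

When R Square equals 1, the adjustment has no effect, because the penalty multiplies an unexplained share of zero.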

The data presented includes results from a multiple linear regression analysis, with Age,
Education and Income as predictors of Spending. Here is an interpretation of the key
statistical results:

1. Regression Statistics:

· Multiple R (1): This is the correlation coefficient, showing the strength of the
relationship between the actual and predicted values. A value of 1 indicates a perfect
positive correlation.

· R Square (1): This shows that essentially 100% of the variance in the
dependent variable (Spending) is explained by the independent variables.

· Adjusted R Square (1): This value adjusts for the number of predictors in the
model and indicates that around 100% of the variance is explained, taking into account
the number of predictors.

· Standard Error (4.47E-12): This represents the average distance that the observed
values fall from the regression line. A lower value indicates a better fit.
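
The standard error of a regression is the square root of the residual sum of squares divided by the residual degrees of freedom (n − k − 1). A minimal sketch, with an illustrative function name and hypothetical values that are not the report's data:

```python
import math

def regression_standard_error(actual, predicted, k):
    """sqrt(SSE / (n - k - 1)) for a model with k predictors."""
    n = len(actual)
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(sse / dof)

# Hypothetical values (NOT the report's data): 4 observations, 1 predictor.
actual = [10, 20, 30, 40]
predicted = [12, 18, 31, 39]
se = regression_standard_error(actual, predicted, k=1)
print(round(se, 3))  # 2.236
```

A value as small as 4.47E-12 is on the scale of floating-point rounding, which is consistent with predictions that match the observed values essentially exactly.
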
2. ANOVA Table:

The p-value for the F-statistic, denoted here as "Significance F," is extremely small (0),
meaning that the model is statistically significant. This suggests that the independent
variables collectively have a significant effect on the dependent variable (Spending).
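
The link between R Square and the F-statistic can be sketched as below. The sample size of 30 is an assumption for illustration only, since the report does not state the number of observations; the function name is also illustrative.

```python
import math

def f_statistic(r2, n, k):
    """Overall F-statistic of a regression with k predictors and n observations."""
    dof_resid = n - k - 1
    if dof_resid <= 0:
        raise ValueError("not enough observations for this many predictors")
    if r2 >= 1.0:
        # No unexplained variance: F grows without bound.
        return math.inf
    return (r2 / k) / ((1.0 - r2) / dof_resid)

# n = 30 is hypothetical; k = 3 matches Age, Education and Income.
f_good_fit = f_statistic(0.9, n=30, k=3)
f_perfect_fit = f_statistic(1.0, n=30, k=3)
print(round(f_good_fit, 1), f_perfect_fit)  # 78.0 inf
```

As R Square approaches 1, F becomes arbitrarily large and its p-value shrinks towards 0, which matches the reported Significance F.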

3. Coefficients Table:

Ø Intercept (Constant):

● This is the value of the dependent variable (Spending) when all the independent
variables (Age, Education, Income) are set to zero.
● In this model, the intercept (constant) is 600, meaning that if Age, Education, and
Income were all zero (though this may not make sense in real-world terms), the
predicted spending would be 600 units.

Ø Coefficient for Age:


· Coefficient = ~0: This means that for each additional year in age, the predicted
spending changes by almost 0 units.

Ø Coefficient for Education (X2):

· Coefficient = -800: This indicates that for each increase in education level (assuming
education is coded numerically), the predicted spending decreases by 800 units.

Ø Coefficient for Income (X3):

● Coefficient = 0.6: For each unit increase in income (for example, an increase of 1
dollar), the predicted spending increases by 0.6 units.

The following chart, "Age (X1) Line Fit Plot," shows the relationship between age (X1) and
spending (Y1) based on the regression analysis.

● The actual spending data points cluster around the predicted spending values,
meaning the data points in general lie very close to the model's predictions.
● The individual points that make up this scatter plot do not follow a strongly linear
path against age. This suggests that spending is influenced by factors other than
age.

Overall, the chart indicates that age alone may not be a strong predictor of spending. Other
factors likely influence spending patterns, and more data is needed to draw definitive
conclusions.
The given chart, titled "Income (X3) Line Fit Plot," displays the relationship between income
(X3) and spending (Y1) based on a regression analysis.

The trend of the data points seems to be upward, which suggests that as the level of income
increases, spending increases too.

The data points are scattered around the predicted spending line, which suggests that although
a general trend may exist, there is also variation in the spending patterns at each income level.

The regression suggests that, given the upward trend and the limited data, a positive
relationship exists between income and spending: as income increases, spending
increases as well.

On the whole, it reflects a positive relationship between income and spending, but to arrive at
any conclusive outcome, more data is required.
The following chart, "Education (X2) Line Fit Plot," depicts the relationship between
education (X2) and spending (Y1) based on the regression analysis.

Given the low number of data points and the absence of a well-defined linear pattern, the
regression results support a very weak or nil relation between education and spending.

Overall, from this chart, education alone cannot be used to clearly determine spending. There
may be other factors that relate to spending; thus, more data is needed if conclusive
results are to be drawn.

Model Building and equation analysis:

Y=β0+β1⋅X1+β2⋅X2+β3⋅X3

Where:

Y is the dependent variable (Spending).

X1, X2, X3 are the independent variables (Age, Education, Income).

β0 is the intercept, and β1, β2, β3 are the coefficients of the independent variables.

Spending=600+0⋅Age−800⋅Education+0.6⋅Income
Example:

Let us assume the following values for two people (these are just hypothetical values;
actual data should be used in practice):

Person A:

● Age: 30
● Education: 12 (assuming this represents years of education)
● Income: 50,000

Person B:

● Age: 45
● Education: 16
● Income: 75,000
For Person A:

SpendingA=600+0⋅30−800⋅12+0.6⋅50000

=600−9600+30000
=600+30000−9600=21000

For Person B:

SpendingB=600+0⋅45−800⋅16+0.6⋅75000

=600−12800+45000
=600+45000−12800=32800

Predicted Spending for Person A: 21,000

Predicted Spending for Person B: 32,800
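
The worked example can be reproduced with a few lines of Python. This is a sketch that hard-codes the coefficients of the fitted equation (600, approximately 0, −800 and 0.6); the function name predict_spending is illustrative and not part of the original analysis.

```python
def predict_spending(age, education, income):
    """Evaluate Spending = 600 + 0*Age - 800*Education + 0.6*Income."""
    intercept = 600.0
    coef_age = 0.0          # reported as approximately 0
    coef_education = -800.0
    coef_income = 0.6
    return (intercept
            + coef_age * age
            + coef_education * education
            + coef_income * income)

# Hypothetical people from the example.
person_a = predict_spending(age=30, education=12, income=50_000)
person_b = predict_spending(age=45, education=16, income=75_000)
print(round(person_a), round(person_b))  # 21000 32800
```

Because the Age coefficient is approximately zero, changing only the age argument leaves the prediction unchanged.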

Conclusion:

The regression analysis returns a strong model, with an R-Squared value of 1, indicating that
Age, Education, and Income perfectly explain variations in consumer spending in this
dataset. The fitted coefficients suggest that Income and Education account for this fit,
while Age contributes almost nothing. However, such a perfect fit also hints at overfitting
or at an issue in the data that should be investigated. Overall, the model depicts clear
trends in how various consumer groups allocate monetary resources and thus facilitates a
better understanding of spending habits based on age, education, and income.
