Lecture 6 - Multiple Regression Analysis
APPLICATION (QHO430)
In reality, there may be a number of contributing factors, for example:
does the size of the store have an influence?
is the area of the region served significant?
are the sales related to the local population size?
does the number of years of experience of the sales staff have an effect?
does the gender of the sales rep make a difference?
etc . . .
Multiple Regression Analysis
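We could look at each of these relationships separately to see whether they are significant.
But suppose we found a strong relationship between sales and store size, and also between sales and local population. Store size and population may themselves be related (larger stores are often located in more populous areas), so separate analyses cannot tell us how much each factor contributes on its own.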
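Previously, we related the dependent variable (e.g. sales, $y$) to one independent variable (e.g. advertising spend, $x$):
$y = a + bx$
e.g. $\text{Sales} = a + b \times (\text{advertising spend})$
Multiple regression extends this to several independent variables $x_1, x_2, \ldots$, each with its own coefficient $b_1, b_2, \ldots$:
$y = a + b_1 x_1 + b_2 x_2 + \cdots$
e.g. $\text{Sales} = a + b_1 \times (\text{advertising spend}) + b_2 \times (\text{population}) + \cdots$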
Once we know which of the variables have a significant impact, we can construct a
regression equation.
What is the P-value?
A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. Informally, it measures how easily the observed difference could have arisen by random chance alone. The lower the p-value, the greater the statistical significance of the observed difference.
When you perform a statistical test, the p-value helps you judge the significance of your results in relation to the null hypothesis.
The null hypothesis states that there is no relationship between the two variables being studied (one variable does not affect the other). It states that any apparent pattern in the results is due to chance and does not support the idea being investigated. Thus, the null hypothesis assumes that whatever you are trying to prove did not happen. In a regression, the null hypothesis for each independent variable is that its coefficient is zero, i.e. that it has no effect on the dependent variable.
The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. A p-value of 0.05 or less is conventionally treated as statistically significant. It indicates strong evidence against the null hypothesis: if the null hypothesis were true (and the results purely random), a result at least this extreme would occur less than 5% of the time. It is not the probability that the null hypothesis is correct.
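The definition above can be made concrete with a permutation test, sketched here in Python. The lecture itself uses Excel, whose regression output reports p-values from a t-test rather than from this method, so the numbers will not match exactly. The data are the advertising spend and sales figures for areas A to F from the toy-manufacturer example that follows. If the null hypothesis of no relationship is true, every pairing of advertising values with sales values is equally likely, so the p-value is the fraction of all 6! = 720 pairings whose correlation is at least as strong as the one actually observed. The function names are illustrative, not from any library.

```python
from itertools import permutations
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r (the quantity Excel's CORREL returns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys):
    """Exact two-sided permutation p-value for the correlation between xs and ys.

    Counts how many re-orderings of ys give |r| at least as large as the
    observed |r|. Only practical for small samples (n! orderings).
    """
    observed = abs(correlation(xs, ys))
    extreme = total = 0
    for perm in permutations(ys):
        total += 1
        # small tolerance so ties with the observed value survive rounding
        if abs(correlation(xs, perm)) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total

advertising = [1, 5, 8, 6, 3, 10]          # £000, areas A-F
sales = [110, 295, 405, 220, 105, 390]     # £000, areas A-F

r, p = permutation_p_value(advertising, sales)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

Because the observed pairing is itself one of the 720, the smallest possible p-value here is 1/720. Enumerating every ordering only works for very small samples; larger samples would use a large number of random shuffles instead.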
Multiple Regression Analysis
Example:
A toy manufacturer currently sells toys to retail outlets in six different areas. They are looking to expand, and their business and sales departments want to predict the probable sales in a new area. They consider a variety of possible contributing factors.
Current area   Advertising spend (£000)   Population (thousands)   Sales rep (M/F)   Sales (£000)
A              1                          220                      M                 110
B              5                          690                      F                 295
C              8                          810                      M                 405
D              6                          430                      M                 220
E              3                          105                      F                 105
F              10                         595                      F                 390
Before we can use Excel’s Data Analysis add-in, we need all
data to be quantitative (numerical).
Obviously, we also need quantitative data for our regression
equation.
We can now process our data table and interpret the results.
Re-coding:
How do we cope with gender (Male/Female)?
If we call Male “0” and Female “1”, we can use these zeros
and ones in a regression equation as before.
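A minimal sketch of this re-coding in Python, applied to the sales-rep column of the example table (the variable names are illustrative). Which category is coded 0 is arbitrary: swapping the codes only changes the sign of that variable's coefficient and shifts the intercept.

```python
# Sales rep gender for areas A-F, from the example table
sales_rep = ["M", "F", "M", "M", "F", "F"]

# Dummy (indicator) coding: Male -> 0, Female -> 1
CODES = {"M": 0, "F": 1}
sales_rep_coded = [CODES[g] for g in sales_rep]

print(sales_rep_coded)   # [0, 1, 0, 0, 1, 1]
```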
Answer
Sales (£000) = 14.83 + 19.9 × advertising (£000) + 0.27 × Population(000)
= £351 230
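The coefficients in the equation above can be cross-checked outside Excel. The sketch below, in plain Python with a hypothetical helper `fit_two_predictor_regression`, solves the least-squares normal equations for the two predictors that appear in the equation (advertising and population), using the six areas from the example table. It assumes gender has been left out of the model, as in the equation, and it does not produce the standard errors or p-values that Excel reports.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # centred sums of squares and cross-products
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # solve [[s11, s12], [s12, s22]] * [b1, b2] = [s1y, s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2   # the fitted plane passes through the means
    return a, b1, b2

advertising = [1, 5, 8, 6, 3, 10]              # £000
population = [220, 690, 810, 430, 105, 595]    # thousands
sales = [110, 295, 405, 220, 105, 390]         # £000

a, b1, b2 = fit_two_predictor_regression(advertising, population, sales)
print(f"Sales = {a:.2f} + {b1:.2f} x advertising + {b2:.3f} x population")
```

On the example data it gives a ≈ 14.83, b1 ≈ 19.90 and b2 ≈ 0.273, which round to the coefficients in the equation. Cramer's rule is only convenient for two predictors; with more, Excel's add-in or a matrix library is the practical route.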
Prediction Using Multiple Regression Analysis
Exercise:
Predict the sales return for an area with a population of 480 000 given an advertising budget
of £4000.
Answer
Sales (£000) = 14.83 + 19.9 × advertising (£000) + 0.27 × Population(000)
Sales = 14.83 + 19.9 × 4 + 0.27 × 480 = 14.83 + 79.6 + 129.6 = 224.03 (£000), i.e. approximately £224,030
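The calculation above can be wrapped in a small function. In this sketch `predict_sales` is a hypothetical name, the coefficients are the rounded values from the fitted equation, and the inputs must use the same units as the regression data (£000 for advertising, thousands for population).

```python
def predict_sales(advertising_k, population_k):
    """Predicted sales (£000) from the fitted equation.

    advertising_k: advertising spend in £000
    population_k: population in thousands
    """
    return 14.83 + 19.9 * advertising_k + 0.27 * population_k

# £4000 advertising budget, population 480 000
sales_k = predict_sales(advertising_k=4, population_k=480)
print(f"Predicted sales: £{sales_k * 1000:,.0f}")   # Predicted sales: £224,030
```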
Summary
– Excel’s Data Analysis add-in tool performs regression
– Quality of fit of the line to the data points can be judged by the correlation coefficient (r) and the coefficient of determination (R²)
– Excel has CORREL and RSQ functions
– Accuracy of prediction is reflected by the standard error (s.e.) in the output
– Approximate 95% confidence intervals for predictions can be constructed if certain assumptions about the data hold
– We can extend the regression model to include the effect of more than one significant independent variable.
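Excel's CORREL and RSQ functions are easy to mirror in code. The sketch below uses hypothetical Python helpers `correl` and `rsq` (not part of any library) on the advertising and sales columns of the toy-manufacturer data. For a single independent variable, RSQ is simply the square of CORREL.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rsq(xs, ys):
    """Coefficient of determination for a one-predictor line (Excel's RSQ)."""
    return correl(xs, ys) ** 2

advertising = [1, 5, 8, 6, 3, 10]        # £000
sales = [110, 295, 405, 220, 105, 390]   # £000

r = correl(advertising, sales)
r2 = rsq(advertising, sales)
print(f"r = {r:.3f}, R2 = {r2:.3f}")
```

For these six areas r ≈ 0.914 and R² ≈ 0.835, so roughly 84% of the variation in sales is associated with advertising spend alone.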
Relationship between Qualifications and Income (figure taken from https://www.qs.com/what-effect-does-education-level-have-on-wealth/#:~:text=There's%20a%20clear%20correlation%20between,the%20greater%20your%20salary%20becomes)
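Response and Explanatory Variables
Response variable (Dependent variable)
• The outcome variable on which comparisons are made.
Explanatory variable (Independent variable)
• The variable that defines the groups being compared, or that is used to explain or predict the response.
Example: Response/Explanatory
• Level of carbon dioxide emitted (response) and amount of gasoline used (explanatory) for cars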
“The UK Data Service is funded by the Economic and Social Research Council (ESRC) to
meet the data needs of researchers, students and teachers from all sectors, including
academia, central and local government, charities and foundations, independent research
centres, think tanks, and business consultants and the commercial sector.”
• https://census.ukdataservice.ac.uk/
This website hosts much of the UK census data, including data for England.
What is a Boundary?
Recommended Reading
You have a rich list of resources to explore:
• https://learn.solent.ac.uk/course/view.php?id=42897&section=2
Look at the Key data area and take time to understand the different types of data available.