Business Decision Making II Simple Linear Regression: Dr. Nguyen Ngoc Phan


Business Decision Making II

Simple Linear Regression

Dr. Nguyen Ngoc Phan

Dr. Nguyen Ngoc Phan Simple Linear Regression


Correlation

Definition
In statistics, dependence is any statistical relationship between
two random variables or two sets of data. Correlation refers to
any of a broad class of statistical relationships involving
dependence.


Familiar examples of dependent phenomena include the
correlation between the physical statures of parents and their
offspring, and the correlation between the demand for a
product and its price.



Correlations are useful because they can indicate a predictive
relationship that can be exploited in practice. For example, an
electrical utility may produce less power on a mild day based
on the correlation between electricity demand and weather.

A typical way of showing the correlation between two related
variables is on a scatter diagram, plotting a number of pairs of
data on the graph.



Figure: A scatter diagram



Degrees of correlation
Two variables can be one of the following:
1 Perfectly correlated
2 Partly correlated
3 Uncorrelated



Figure: Perfect correlation



Figure: Partial correlation



Figure: No correlation



Correlation, whether perfect or partial, can be positive or
negative.

Definition
A positive correlation exists when the variables move in the
same direction: as one variable decreases, the other also
decreases, and vice versa.

A negative correlation exists when the variables move in
opposite directions: as one variable decreases, the other
increases, and vice versa.



The correlation coefficient
The correlation coefficient r is a measure of the linear
correlation between two variables X and Y. It takes a value in
the interval [−1, 1], where 1 is perfect positive correlation, 0 is
no linear correlation, and −1 is perfect negative correlation.

The correlation coefficient can be calculated by

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} $$

where the (x_i, y_i) are the pairs of data for the two variables
X and Y, and n is the number of pairs of data used in the analysis.



Example
The cost of output at a factory is thought to depend on the
number of units produced. Data have been collected for the
number of units produced each month in the last six months,
and the costs, as follows:

Month   Output x ('000 units)   Cost y
1       2                       9
2       3                       11
3       1                       7
4       4                       13
5       3                       11
6       5                       15

Is there any correlation between output and cost?
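As a quick check, the computational formula for r can be applied to the output and cost data in a few lines of Python. This is a minimal sketch; the function name `pearson_r` is illustrative, not from the slides. Every cost in the table equals 2x + 5, so the points lie exactly on a rising line and r comes out as 1 (perfect positive correlation).

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


output = [2, 3, 1, 4, 3, 5]    # x, thousands of units
cost = [9, 11, 7, 13, 11, 15]  # y
r = pearson_r(output, cost)
print(round(r, 4))  # 1.0
```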



The coefficient of determination
The coefficient of determination is the square of the
correlation coefficient, r². It measures the explanatory power
of the regression model: the proportion of the variation in y
that is explained by its linear relationship with x.


Example
A study involving six employees finds the following values for
job performance rating (JPR) and years of experience. Using
these data, how well does experience explain job performance?
JPR 1.5 4 4.5 5 6.2 7
Experience 2.5 3.5 4 4.5 5 5.9


Answer: r² = 0.964
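The answer can be reproduced with the deviation form of the correlation coefficient, r = Sxy / √(Sxx·Syy), which is algebraically equivalent to the sums formula given earlier. A minimal sketch, treating experience as x and JPR as y (the names `sxx`, `syy`, `sxy` are our own):

```python
import math

experience = [2.5, 3.5, 4, 4.5, 5, 5.9]  # x, years
jpr = [1.5, 4, 4.5, 5, 6.2, 7]           # y, job performance rating

n = len(jpr)
x_bar = sum(experience) / n
y_bar = sum(jpr) / n
# Centered sums of squares and cross-products
sxx = sum((x - x_bar) ** 2 for x in experience)
syy = sum((y - y_bar) ** 2 for y in jpr)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experience, jpr))

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.982, r^2 = 0.964
```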



Spearman’s rank correlation coefficient
Spearman’s rank correlation coefficient, R, is used to measure
the correlation between the order or rank of two variables:

$$ R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

where n is the number of pairs of data and d_i is the difference
between the two rankings of the i-th pair.
The value of R can be interpreted in exactly the same way as
the ordinary correlation coefficient, and −1 ≤ R ≤ 1.



Example
Apple hired seven computer technicians. They were given a
test to measure their basic knowledge. After a year of service,
their supervisor was asked to rank each technician’s job
performance. Their test scores and performance rankings are:

Technician    Test score   Performance ranking
Smith         82           4
Jones         73           7
Boone         60           6
Lewis         80           3
Clark         67           5
Lincoln       94           1
Washington    89           2



Example
Calculate Spearman’s rank correlation coefficient and interpret
the result.

Solution: R = 0.857. Our sample suggests a strong, positive
relationship between a technician’s test score and his or her
job performance rating.
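The calculation can be sketched as follows. One assumption is needed: test scores are ranked with 1 for the highest score, so both rankings run in the same direction (1 = best). The simple formula for R also assumes there are no tied values, which holds for this data; with ties, average ranks would be needed. The helper names are hypothetical.

```python
def ranks_descending(values):
    """Rank values so the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(rank_a, rank_b):
    """Spearman's R = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Order: Smith, Jones, Boone, Lewis, Clark, Lincoln, Washington
test_scores = [82, 73, 60, 80, 67, 94, 89]
performance_rank = [4, 7, 6, 3, 5, 1, 2]

score_rank = ranks_descending(test_scores)  # [3, 5, 7, 4, 6, 1, 2]
R = spearman_r(score_rank, performance_rank)
print(round(R, 3))  # 0.857
```

Here Σd_i² = 8, so R = 1 − 48/336 ≈ 0.857.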



Lines of best fit

The correlation coefficient measures the degree of correlation
between two variables, but it does not tell us how to predict
values for one variable y given values for the other variable x.
To do that, we need to find a line that is a good fit for the
points on a scattergraph, and use that line to find the value of
y corresponding to each given value of x.
There are two methods to find such a line: the scattergraph
method and the linear regression method.



The scattergraph method
Draw a line through data points with about an equal number
of points above and below the line.

Figure: An example of scattergraph method



Linear regression using the least squares method

Definition
The least squares method of linear regression provides a
technique for estimating the equation of a line of best fit.

The equation of a line is of the form y = b0 + b1 x.

The least squares method provides estimates of b0 and b1 as
follows:

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} $$

where n is the number of pairs of data.
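The two formulas translate directly into code. The sketch below (the helper name `least_squares` is hypothetical) applies them to the factory output and cost data from the correlation example, where every point lies on y = 5 + 2x, so the fit recovers b0 = 5 and b1 = 2 exactly.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Factory data: output x ('000 units) and cost y
b0, b1 = least_squares([2, 3, 1, 4, 3, 5], [9, 11, 7, 13, 11, 15])
print(b0, b1)  # 5.0 2.0
```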



Example (4)
The management of Hop Scotch Airlines assumes a direct
relationship between advertising expenditures and the number
of passengers who choose to fly Hop Scotch. The company
hired statisticians to determine the regression model. Monthly
values for advertising expenditures and the numbers of
passengers are collected for the n = 15 most recent months.
The advertising x is in tens of thousands of dollars ($10,000s)
and passengers y are in thousands.
The accounting department for Hop Scotch must determine
the relationship between advertising expenditures and the
number of passengers to make decisions regarding the
allocations for the advertising budget.



Example (4)
Month   x    y          Month   x    y
1       10   15         9       19   24
2       12   17         10      10   17
3       8    13         11      11   16
4       17   23         12      13   18
5       10   16         13      16   23
6       15   21         14      10   15
7       10   14         15      12   16
8       14   20         Total   187  268



Solution: The regression equation is

y = 4.40 + 1.08x

The model tells us that if, for example, $100,000 is spent on
advertising (x = 10), then

y = 4.40 + 1.08(10) = 15.2

We predict on the basis of our model that 15,200 people will
choose to fly Hop Scotch when $100,000 is spent on
advertising.
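Applying the least squares formulas to the 15 months of data gives b1 = 2234/2066 ≈ 1.081. Computed without intermediate rounding, the intercept is about 4.39; the slide's 4.40 results from rounding b1 to 1.08 before computing b0, since (268 − 1.08 · 187)/15 ≈ 4.40. Both versions predict about 15.2 thousand passengers at x = 10. A minimal sketch:

```python
advertising = [10, 12, 8, 17, 10, 15, 10, 14, 19, 10, 11, 13, 16, 10, 12]  # x, $10,000s
passengers = [15, 17, 13, 23, 16, 21, 14, 20, 24, 17, 16, 18, 23, 15, 16]   # y, thousands

n = len(advertising)
sum_x, sum_y = sum(advertising), sum(passengers)
sum_xy = sum(x * y for x, y in zip(advertising, passengers))
sum_x2 = sum(x * x for x in advertising)

# Least squares slope and intercept from the sum formulas
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
y_hat = b0 + b1 * 10  # predicted passengers (thousands) at $100,000

print(f"y = {b0:.2f} + {b1:.2f}x")            # y = 4.39 + 1.08x
print(f"prediction at x = 10: {y_hat:.1f}")  # prediction at x = 10: 15.2
```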



Example (5)
Confirm that the degree of correlation between the advertising
expenditures and number of passengers is high by calculating
the correlation coefficient.

Solution: r = 0.97
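A short, self-contained sketch confirming this value with the computational formula for r (the data are repeated so the block runs on its own):

```python
import math

advertising = [10, 12, 8, 17, 10, 15, 10, 14, 19, 10, 11, 13, 16, 10, 12]  # x
passengers = [15, 17, 13, 23, 16, 21, 14, 20, 24, 17, 16, 18, 23, 15, 16]   # y

n = len(advertising)
sum_x, sum_y = sum(advertising), sum(passengers)
sum_xy = sum(x * y for x, y in zip(advertising, passengers))
sum_x2 = sum(x * x for x in advertising)
sum_y2 = sum(y * y for y in passengers)

# r = (n*Sxy - Sx*Sy) / (sqrt(n*Sx2 - Sx^2) * sqrt(n*Sy2 - Sy^2))
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)
print(round(r, 2))  # 0.97
```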



Example (6)
The chief executive officer for Hop Scotch must also decide
how the advertising funds should be allocated. The airline often
buys advertising space in the magazines Executive Weekly (EW) and
Fisherman’s Delight (FD). Market research has shown that people
who read EW have higher incomes than those reading FD.
Therefore, the CEO is curious about the impact that a
person’s income has on the frequency of flying. Data are
gathered for 10 passengers on their annual income levels, in
$’000, and the number of flights they took during the most
recent 12-month period.
Use regression analysis to determine the nature of the relationship
between income and the tendency of people to use air service in
their travel plans.



Example (6)
Passenger   Flights (y)   Income (x, $'000)
1           5             30
2           4             27
3           7             38
4           10            48
5           11            59
6           8             54
7           9             42
8           11            63
9           8             52
10          9             47
Total       82            460



Solution: y = 0.058 + 0.177x
The regression coefficient of 0.177 tells us that there is a positive
relationship between income and the frequency of flying: as
income goes up, the number of flights tends to increase.
Specifically, a one-unit ($1,000) increase in income is
associated with an increase of 0.177 in the number of flights.
If the CEO wishes to get the most out of the advertising budget,
he should place the advertisements in Executive Weekly.
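The fitted line can be checked with the same least squares formulas. Computed without intermediate rounding, the intercept is about 0.059; the slide's 0.058 comes from using the rounded slope 0.177, since (82 − 0.177 · 460)/10 = 0.058. A minimal sketch:

```python
income = [30, 27, 38, 48, 59, 54, 42, 63, 52, 47]  # x, $'000
flights = [5, 4, 7, 10, 11, 8, 9, 11, 8, 9]         # y, flights in 12 months

n = len(income)
sum_x, sum_y = sum(income), sum(flights)
sum_xy = sum(x * y for x, y in zip(income, flights))
sum_x2 = sum(x * x for x in income)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # flights per $1,000
b0 = sum_y / n - b1 * sum_x / n
print(f"y = {b0:.3f} + {b1:.3f}x")  # y = 0.059 + 0.177x
```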



Example (7)
The personnel director’s assertion is that illness will be reduced
as exercise increases. The employee records provide these data:

$$ n = 50, \quad \sum xy = 1080, \quad \sum x = 180, \quad \sum y = 450, \quad \sum x^2 = 5340 $$

where x is the number of exercise hours, y is the number of
sick days, and n is the number of employees participating in the
survey.
Use regression analysis to examine the relationship between
illness and exercise.



Solution: y = 9.4 − 0.1151x
It would appear that the personnel director and her exercise
program are vindicated. The negative sign on the regression
coefficient testifies to the claim that as employees spend more
time in the exercise program, the number of days lost to illness
decreases.
Specifically, each additional hour devoted to physical exercise
is associated with an estimated decrease of 0.1151 in the
number of sick days.
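Because only summary sums are given, the least squares formulas can be evaluated directly from them. Note that Σy² is not provided, so these figures alone do not give r or r². A minimal sketch:

```python
n = 50
sum_xy, sum_x, sum_y, sum_x2 = 1080, 180, 450, 5340

# Slope and intercept from the summary sums
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
print(f"b0 = {b0:.1f}, b1 = {b1:.4f}")  # b0 = 9.4, b1 = -0.1151
```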



Example (8)
The chairman of the Federal Reserve System of the US has
the responsibility of controlling the nation’s money supply. His
actions directly affect the mortgage rates people must pay to
buy houses. In 1999, his staff was instructed to examine the
effect of mortgage rates on the number of houses sold. A
regional center in Lexington that gathered data for the study
provided the information shown below. Housing units are in
hundreds.



Example (8)
Year   Housing units sold (hundreds)   Mortgage rate (%)
1981   20                              12.1
1984   17                              13.5
1986   13                              14.95
1988   14                              13.75
1990   15                              12.95
1992   14                              12.5
1994   15                              10.1
1996   16                              9.82
1998   17                              9.5



Example (8)
1 Determine the dependent and independent variables.
2 Assuming a linear relationship exists between these two
variables, construct the regression model.
3 Interpret the constant and the coefficient.
4 What would be the level of units sold if the mortgage rate
were 11.5 percent?
5 What would happen to the number of units sold if the rate
increased by 2 percentage points?



Example (9)
A principal theory in finance holds that as bond yields rise,
investors take funds out of the stock market, causing it to fall,
and buy bonds. Weekly data, which use the federal funds rate as a
proxy for bond yields, as reported by the Commerce
Department in the winter of a year, are shown.
1 Assuming the federal funds rate impacts the stock
market, identify the dependent variable.
2 Do these data tend to corroborate that financial theory?
In what manner and to what extent would interest rates
serve as a forecasting tool for the stock market?



Example (9)
Week   Dow Jones   Federal funds rate (%)
1      2050        6.8
2      2010        6.95
3      1983        7.3
4      2038        7.5
5      1995        7.7
6      1955        7.7
7      1878        8.3
8      1802        8.7

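Solution: DJ = 2885.21 − 120.93 FFR, where DJ is the Dow Jones
index and FFR is the federal funds rate.
r² = 0.84. The financial theory is corroborated, as evidenced by
the negative coefficient for FFR: higher rates are associated with
a lower Dow Jones. With r² = 0.84, about 84% of the weekly
variation in the Dow Jones is explained by the federal funds rate.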
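A direct computation from the eight weekly observations is sketched below. It gives an intercept of about 2884.5 and a slope of about −120.84, slightly different from the slide's 2885.21 and −120.93, and r² ≈ 0.84, which matches. The variable names are our own.

```python
fed_funds_rate = [6.8, 6.95, 7.3, 7.5, 7.7, 7.7, 8.3, 8.7]    # x, percent
dow_jones = [2050, 2010, 1983, 2038, 1995, 1955, 1878, 1802]  # y, index level

n = len(dow_jones)
x_bar = sum(fed_funds_rate) / n
y_bar = sum(dow_jones) / n
sxx = sum((x - x_bar) ** 2 for x in fed_funds_rate)
syy = sum((y - y_bar) ** 2 for y in dow_jones)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fed_funds_rate, dow_jones))

b1 = sxy / sxx           # change in DJ per 1-point rise in the rate
b0 = y_bar - b1 * x_bar  # intercept
r_squared = sxy ** 2 / (sxx * syy)

print(f"DJ = {b0:.1f} + ({b1:.2f}) x rate, r^2 = {r_squared:.2f}")
```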



Example (10)
A popular financial theory holds that there is a direct
relationship between the risk of an investment and the return
it promises. A stock’s risk is measured by its β-value. Shown
here are the returns and β-values for 12 stocks suggested by
an investment firm. Do these data seem to support this
financial theory of a direct relationship?
Investors typically view return as a function of risk. Use an
interpretation of both the regression coefficient and the
coefficient of correlation in your response.



Example (10)
Stock   Return (%)   β-value
1       5.4          1.5
2       8.9          1.9
3       2.3          1.0
4       1.5          0.5
5       3.7          1.5
6       8.2          1.8
7       5.3          1.3
8       0.5          -0.5
9       1.3          0.5
10      5.9          1.8
11      6.8          1.9
12      7.2          1.9



Solution: Return = 0.4902 + 3.39β
r = 0.896, r² = 0.80.
Both b1 and r are positive, so the financial theory of a direct
relationship is supported.
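The reported figures can be reproduced as sketched below. Computed without intermediate rounding, r comes out at about 0.897 and r² at about 0.80, in line with the slide's values.

```python
import math

beta = [1.5, 1.9, 1.0, 0.5, 1.5, 1.8, 1.3, -0.5, 0.5, 1.8, 1.9, 1.9]   # x
returns = [5.4, 8.9, 2.3, 1.5, 3.7, 8.2, 5.3, 0.5, 1.3, 5.9, 6.8, 7.2]  # y, percent

n = len(returns)
x_bar = sum(beta) / n
y_bar = sum(returns) / n
sxx = sum((x - x_bar) ** 2 for x in beta)
syy = sum((y - y_bar) ** 2 for y in returns)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beta, returns))

b1 = sxy / sxx                     # regression coefficient on beta
b0 = y_bar - b1 * x_bar            # intercept
r = sxy / math.sqrt(sxx * syy)     # correlation coefficient

print(f"Return = {b0:.4f} + {b1:.2f} beta")  # Return = 0.4902 + 3.39 beta
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")     # r = 0.897, r^2 = 0.80
```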



Example (11)
Economic theory holds that as interest rates go down, firms are
able to invest more in capital equipment. Monthly figures for
the interest rate and levels of new capital investment in
billions of dollars are shown in the table.
1 Calculate the regression model.
2 Plot the data and regression line. Does the model
support the theory that lower interest rates are associated
with higher levels of investment?



Example (11)
Month   Interest rate (%)   Capital investment ($ billions)
Jan     10.0                10
Feb     9.5                 11
Mar     9.0                 12
Jul     7.5                 16
Aug     7.0                 17
Sep     6.5                 18
Oct     6.0                 19
Nov     5.5                 20
Dec     5.0                 21

Solution: CI = 32.57 − 2.25 IR. Yes: the fitted line has a negative
slope, so lower interest rates are associated with higher capital
investment.



Simple Linear Regression Model

The equation that describes how y is related to x and an error
term is called the regression model.
Simple linear regression model

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

where ε is a random variable referred to as the error term.

The equation that describes how the expected value of y,
denoted E(y), is related to x is called the regression equation.
Simple linear regression equation

$$ E(y) = \beta_0 + \beta_1 x $$



In practice, β0 and β1 are not known and must be estimated
using sample data. Sample statistics, denoted by b0 and b1 ,
are computed as estimates of the population parameters β0
and β1 .

Estimated simple linear regression equation

$$ \hat{y} = b_0 + b_1 x $$

where b0 and b1 are computed by the least squares method,
and ŷ is the point estimator of E(y).

The graph of the estimated simple linear regression equation is
called the estimated regression line, which is also the line of
best fit that we have already introduced.
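To make the distinction between the parameters β0, β1 and their estimates b0, b1 concrete, the sketch below simulates data from a known model y = β0 + β1 x + ε and then applies least squares. The "true" values β0 = 2, β1 = 0.5, σ = 1, the uniform x range, the sample size, and the seed are hypothetical choices for illustration only; with 10,000 observations the estimates land close to the parameters.

```python
import random

random.seed(1)
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0  # hypothetical "true" parameters
n = 10_000

xs = [random.uniform(0, 10) for _ in range(n)]
# y = beta0 + beta1 * x + epsilon, with epsilon ~ Normal(0, SIGMA)
ys = [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for x in xs]

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx           # least squares estimate of beta1
b0 = y_bar - b1 * x_bar  # least squares estimate of beta0
print(f"b0 = {b0:.3f} (beta0 = {BETA0}), b1 = {b1:.3f} (beta1 = {BETA1})")
```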



Homework:
Page 608: Problems 5, 9 and 13
Page 619: Problems 19 and 21.



Model Assumptions

We saw that the value of the coefficient of determination (r²)
is a measure of the goodness of fit of the estimated regression
equation. However, even with a large value of r², the
estimated regression equation should not be used until further
analysis of the appropriateness of the assumed model has been
conducted.

An important step in determining whether the assumed model
is appropriate involves testing for the significance of the
relationship. The tests of significance in regression analysis are
based on the following assumptions about the error term ε.



Assumptions about ε in the regression model
1 E(ε) = 0. Because β0 and β1 are constants, this implies
E(y) = β0 + β1 x
2 The variance of ε, denoted by σ², is the same for all x.
3 The values of ε are independent.
4 ε is a normally distributed random variable for all values
of x.



Testing for Significance
If β1 = 0, then E (y ) = β0 . In this case, we would conclude
that x and y are not linearly related.
If β1 ≠ 0, we would conclude that the two variables are related.
Thus, to test for a significant regression relationship, we must
conduct a hypothesis test to determine whether β1 = 0.

The t test is commonly used. It requires an estimate of σ²,
the variance of ε in the regression model.
With ŷ = b0 + b1 x, we can use the mean square error s² to
estimate σ²:

$$ s^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2} $$

The standard error of the estimate, s = √(s²), is used to
estimate σ.
t Test
The purpose of the t test is to see whether we can conclude
that β1 ≠ 0. We will use the sample data to test the following
hypotheses about β1:

H0: β1 = 0    Ha: β1 ≠ 0

Sampling distribution of b1
Expected value: E(b1) = β1
Standard deviation:

$$ \sigma_{b_1} = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}} $$

Distribution form: Normal


Because we do not know the value of σ, we develop an
estimate of σ_b1, denoted s_b1, by estimating σ with s:

$$ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} $$

t test for significance in simple linear regression

H0: β1 = 0    Ha: β1 ≠ 0

Test statistic

$$ t = \frac{b_1}{s_{b_1}} $$

Reject H0 if p-value ≤ α, where the p-value is computed from a
t distribution with n − 2 degrees of freedom.



The confidence interval for β1 is

$$ b_1 \pm t_{\alpha/2}\, s_{b_1} $$

The confidence coefficient associated with this interval is
1 − α, and t_α/2 is the t value providing an area of α/2 in the
upper tail of a t distribution with n − 2 degrees of freedom.
If 0, the hypothesized value of β1, is not included in the
confidence interval, we can reject H0.
Example: Problems 23 and 26, page 629.
Homework: Problems 27 and 30, page 631.
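As an illustration, the sketch below runs the t test and builds a 95% confidence interval for β1 using the Hop Scotch advertising data from earlier (n = 15, so 13 degrees of freedom). The critical value t_0.025 = 2.160 for 13 degrees of freedom is taken from a standard t table and hard-coded to keep the example dependency-free; comparing |t| with t_α/2 is equivalent to the two-tailed p-value rule. The test statistic comes out near 14, far beyond 2.160, and the interval (about 0.91 to 1.25) excludes 0, so H0: β1 = 0 is rejected at α = 0.05.

```python
import math

advertising = [10, 12, 8, 17, 10, 15, 10, 14, 19, 10, 11, 13, 16, 10, 12]  # x
passengers = [15, 17, 13, 23, 16, 21, 14, 20, 24, 17, 16, 18, 23, 15, 16]   # y
T_CRIT = 2.160  # t_{0.025} with n - 2 = 13 df, from a t table (alpha = 0.05)

n = len(advertising)
x_bar = sum(advertising) / n
y_bar = sum(passengers) / n
sxx = sum((x - x_bar) ** 2 for x in advertising)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, passengers))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Mean square error s^2 = SSE / (n - 2) estimates sigma^2
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(advertising, passengers))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(sxx)  # estimated standard deviation of b1

t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > T_CRIT
ci_low, ci_high = b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% CI for beta1: ({ci_low:.3f}, {ci_high:.3f})")
```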



Estimation and Prediction

When using the simple linear regression model, we are making
an assumption about the relationship between x and y. We
then use the least squares method to obtain the estimated
simple linear regression equation.
If a significant relationship exists between x and y and the
coefficient of determination shows that the fit is good, the
estimated regression equation should be useful for estimation
and prediction.



The following notation will be necessary:
x* = the given value of the independent variable x
y* = the random variable denoting the possible values of
the dependent variable y when x = x*
E(y*) = the mean or expected value of the dependent
variable y when x = x*
ŷ* = b0 + b1 x* = the point estimator of E(y*) and the
predictor of an individual value of y* when x = x*.



Point estimators and predictors do not provide any information
about the precision associated with the estimate and/or
prediction.
We must develop confidence intervals and prediction intervals.
A confidence interval is an interval estimate of the mean value
of y for a given value of x.
A prediction interval is used whenever we want to predict an
individual value of y for a new observation corresponding to a
given value of x.



The formula for estimating the variance of ŷ* is

$$ s_{\hat{y}^*}^2 = s^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] $$

Confidence interval for E(y*)

$$ \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} $$

where the confidence coefficient is 1 − α and t_α/2 is based on
the t distribution with n − 2 degrees of freedom.



Recall that s² is the mean square error. Define

$$ s_{\text{pred}}^2 = s^2 + s_{\hat{y}^*}^2 = s^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] $$

Prediction interval for y*

$$ \hat{y}^* \pm t_{\alpha/2}\, s_{\text{pred}} $$

where the confidence coefficient is 1 − α and t_α/2 is based on
a t distribution with n − 2 degrees of freedom.

Example: Problem 35, page 637.


Homework: Problem 37, page 638.
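To illustrate both intervals, the sketch below uses the Hop Scotch advertising data (an illustrative choice, not one of the listed problems) and computes a 95% confidence interval for E(y*) and a 95% prediction interval for y* at x* = 10, again with the table value t_0.025 = 2.160 for 13 degrees of freedom. The prediction interval, about (13.14, 17.26), is wider than the confidence interval, about (14.55, 15.85), because s_pred² adds s² to the variance of ŷ*.

```python
import math

advertising = [10, 12, 8, 17, 10, 15, 10, 14, 19, 10, 11, 13, 16, 10, 12]  # x
passengers = [15, 17, 13, 23, 16, 21, 14, 20, 24, 17, 16, 18, 23, 15, 16]   # y
T_CRIT = 2.160  # t_{0.025} with 13 df, from a t table
x_star = 10     # given value of x ($100,000 of advertising)

n = len(advertising)
x_bar = sum(advertising) / n
y_bar = sum(passengers) / n
sxx = sum((x - x_bar) ** 2 for x in advertising)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, passengers))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(advertising, passengers))
s2 = sse / (n - 2)  # mean square error

y_hat_star = b0 + b1 * x_star
s2_yhat = s2 * (1 / n + (x_star - x_bar) ** 2 / sxx)  # variance of y-hat*
s2_pred = s2 + s2_yhat                                 # adds individual-y noise

half_ci = T_CRIT * math.sqrt(s2_yhat)
half_pi = T_CRIT * math.sqrt(s2_pred)
ci = (y_hat_star - half_ci, y_hat_star + half_ci)
pi = (y_hat_star - half_pi, y_hat_star + half_pi)

print(f"y-hat* = {y_hat_star:.2f}")
print(f"95% CI for E(y*): ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for y*:    ({pi[0]:.2f}, {pi[1]:.2f})")
```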
