Notes 1024 Part1


Chapter 12

Multiple Regression and the General Linear Model
The research context is that two or more independent variables and one
dependent variable have been observed for each of 𝑛 participants. Here, we will
discuss two independent variables 𝑥1𝑖 , 𝑥2𝑖 , 𝑖 = 1, … , 𝑛 and one dependent
variable 𝑦𝑖 , 𝑖 = 1, … , 𝑛. The mathematics and analysis for more independent
variables generalize routinely. The research team then has a spreadsheet with
𝑛 vectors of observations 𝑥1𝑖 , 𝑥2𝑖 , 𝑦𝑖 , 𝑖 = 1, … , 𝑛.

As in Chapter 11, one of the variables (here 𝑦) is the outcome variable or dependent variable. This is the variable hypothesized to be affected by the other variables in scientific research. The other variables (here 𝑥1𝑖 and 𝑥2𝑖 , 𝑖 = 1, … , 𝑛) are the independent variables. They may be hypothesized to predict the outcome variable or to cause a change in the outcome variable. The research task is to document the association between the independent and dependent variables. The multiple regression model is used to model this association.
As before, a recommended first step is to create scatterplots of the observations, with the vertical axis representing the dependent variable and the horizontal axis representing one of the independent variables. The “pencil test” can be used again. If the plots pass this test, then it is reasonable to assume that a linear model (such as 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2) describes the data. The linear model is reasonable for many data sets in observational studies. Specifically, the model for Chapter 12 is 𝑌𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 + 𝜎𝑌|12 𝑍𝑖. The parameters (𝛽0, 𝛽1, 𝛽2) are fixed but unknown. The parameter 𝜎𝑌|12 is the unknown conditional standard deviation of 𝑌𝑖 controlling for 𝑥1𝑖 and 𝑥2𝑖 , 𝑖 = 1, … , 𝑛. The standard deviation of 𝑌𝑖 is assumed to be equal for each observation. The random errors 𝑍𝑖 are assumed to be independent. The independence of the random errors (and hence the independence of the 𝑌𝑖) is important. The assumption of a linear regression function (that is, E(𝑌𝑖 | 𝑥1𝑖, 𝑥2𝑖) = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖) is also important. As in Chapter 11, this is equivalent to the dependent variable values being 𝑁𝐼𝐷(𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 , 𝜎²𝑌|12).
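To make this model concrete, here is a minimal simulation sketch in Python; the sample size and parameter values (β0 = 2.0, β1 = 1.5, β2 = −0.8, 𝜎𝑌|12 = 3.0) are hypothetical choices for illustration only, not values taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values and sample size, chosen only for illustration.
n = 200
beta0, beta1, beta2 = 2.0, 1.5, -0.8
sigma_y_given_12 = 3.0

# Fixed values of the two independent variables.
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)

# Independent standard normal errors Z_i.
z = rng.standard_normal(n)

# The Chapter 12 model: Y_i = beta0 + beta1*x1_i + beta2*x2_i + sigma_Y|12 * Z_i.
y = beta0 + beta1 * x1 + beta2 * x2 + sigma_y_given_12 * z
```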
Estimating the Linear Model Parameters

Again, OLS (ordinary least squares) is the most commonly used method to estimate the parameters of the linear model. A linear model with arbitrary arguments 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2 is used as a fit for the dependent variable values. The method uses the residuals 𝑦𝑖 − 𝑏0 − 𝑏1 𝑥1𝑖 − 𝑏2 𝑥2𝑖. As in Chapter 11, the fitted model is judged by how small the set of residuals is. Here OLS minimizes the sum of squares function SS(𝑏0, 𝑏1, 𝑏2) = Σᵢ₌₁ⁿ (𝑦𝑖 − 𝑏0 − 𝑏1 𝑥1𝑖 − 𝑏2 𝑥2𝑖)². The OLS method is to find the arguments (β̂0, β̂1, β̂2) that make SS(𝑏0, 𝑏1, 𝑏2) as small as possible. This minimization is a standard calculus problem.
Step 1 is to calculate the partial derivatives of SS(𝑏0, 𝑏1, 𝑏2) with respect to each argument.
Step 2 is to find the arguments (β̂0, β̂1, β̂2) that make the three partial derivatives simultaneously zero. The resulting equations are still called the normal equations:

Σᵢ₌₁ⁿ 𝑦𝑖 = 𝑛 β̂0 + β̂1 Σᵢ₌₁ⁿ 𝑥1𝑖 + β̂2 Σᵢ₌₁ⁿ 𝑥2𝑖
Σᵢ₌₁ⁿ 𝑥1𝑖 𝑦𝑖 = β̂0 Σᵢ₌₁ⁿ 𝑥1𝑖 + β̂1 Σᵢ₌₁ⁿ 𝑥1𝑖² + β̂2 Σᵢ₌₁ⁿ 𝑥1𝑖 𝑥2𝑖
Σᵢ₌₁ⁿ 𝑥2𝑖 𝑦𝑖 = β̂0 Σᵢ₌₁ⁿ 𝑥2𝑖 + β̂1 Σᵢ₌₁ⁿ 𝑥1𝑖 𝑥2𝑖 + β̂2 Σᵢ₌₁ⁿ 𝑥2𝑖²
These equations still have a very important mathematical interpretation. Let 𝑟𝑖 = 𝑦𝑖 − β̂0 − β̂1 𝑥1𝑖 − β̂2 𝑥2𝑖 , 𝑖 = 1, … , 𝑛. The first normal equation is equivalent to Σᵢ₌₁ⁿ 𝑟𝑖 = 0; the second is Σᵢ₌₁ⁿ 𝑟𝑖 𝑥1𝑖 = 0; and the third is Σᵢ₌₁ⁿ 𝑟𝑖 𝑥2𝑖 = 0. That is, there are three constraints on the 𝑛 residuals. The OLS residuals must sum to zero, and the OLS residuals are orthogonal to the two independent variable values. The 𝑛 residuals then have 𝑛 − 3 degrees of freedom.
Step 3 is to solve this system of three linear equations in three unknowns.
There is a more general approach to solving systems like this. Let 𝑦 be the 𝑛 × 1 column vector of the observations 𝑦𝑖 and let 𝑋 be the 𝑛 × 3 matrix whose 𝑖-th row is (1, 𝑥1𝑖, 𝑥2𝑖). In the three normal equations, the left-hand side terms are the same as the entries of 𝑋ᵀ𝑦, and the coefficients of the OLS estimators match the entries of 𝑋ᵀ𝑋. For this problem, then, the normal equations can be written in matrix form as (𝑋ᵀ𝑋)β̂ = 𝑋ᵀ𝑦. This result also holds for three or more independent variables. The proof is exactly the same as for the two independent variable case.
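As a numerical check, here is a short Python sketch (simulated data with hypothetical parameter values) that solves the normal equations in matrix form and verifies the three residual constraints noted above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated independent variables and hypothetical true parameters.
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 3.0 * rng.standard_normal(n)

# Design matrix X with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# Solve the normal equations (X^T X) beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The residuals satisfy the three constraints: they sum to zero and are
# orthogonal to each independent variable.
r = y - X @ beta_hat
print(beta_hat)
print(r.sum(), r @ x1, r @ x2)   # all three are ~0 up to rounding error
```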

The existence of (𝑋ᵀ𝑋)⁻¹ is the usual case in observational studies using multiple regression. If (𝑋ᵀ𝑋)⁻¹ does not exist, then the OLS estimators exist but are not unique.
Distribution of β̂ = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑦
Let 𝑌 = (𝑌1, 𝑌2, … , 𝑌𝑛)ᵀ be the vector of the random outcome variables. That is, the data will be collected in the future, as opposed to having the data in hand as we assumed in our OLS estimator derivation. The probabilistic model for the data can be written in matrix form as 𝑌 = 𝑋𝛽 + 𝜎𝑌|12 𝑍, where 𝑍 is the column vector of the random errors 𝑍𝑖, which are assumed to be independent. From now on, we consider the general case with 𝑝 − 1 independent variables. The vector 𝛽 of parameters in the regression function is then 𝑝 × 1, remembering that there is an intercept term in our model. The matrix 𝑋 of coefficients of the parameters of the regression function is now 𝑛 × 𝑝, with 𝑝 < 𝑛 and rank 𝑝.
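The notes do not restate the resulting distribution explicitly at this point; the standard result is that β̂ = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑌 is multivariate normal with mean 𝛽 and covariance matrix 𝜎²𝑌|1…(𝑝−1) (𝑋ᵀ𝑋)⁻¹. The following Python sketch, with hypothetical parameter values, checks this by simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 50, 2.0
beta = np.array([1.0, 0.5, -0.3])          # hypothetical true parameters

# Fixed design matrix: intercept plus two independent variables.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

# Repeatedly draw Y = X beta + sigma Z and recompute the OLS estimator.
draws = []
for _ in range(5000):
    y = X @ beta + sigma * rng.standard_normal(n)
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
draws = np.array(draws)

print(draws.mean(axis=0))                   # close to beta (unbiasedness)
print(np.cov(draws.T))                      # close to sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```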
Fisher’s Decomposition of the (Uncorrected) Total Sum of Squares
Statistical computing programs subtract the correction term 𝑛Ȳ𝑛², which has 1 degree of freedom, from both the uncorrected total sum of squares and the uncorrected regression sum of squares. That is, the programs display the corrected total sum of squares and the corrected regression sum of squares in the Analysis of Variance (ANOVA) Table, as below:

Analysis of Variance Table
𝑝 − 1 Predictor Multiple Linear Regression

Source        DF        Sum of Squares                     Mean Square                           F
Regression    𝑝 − 1     (𝑋β̂)ᵀ(𝑋β̂) − 𝑛Ȳ𝑛²                  [(𝑋β̂)ᵀ(𝑋β̂) − 𝑛Ȳ𝑛²] / (𝑝 − 1)         MS(Reg)/MSE
Error         𝑛 − 𝑝     𝑅ᵀ𝑅                                𝑅ᵀ𝑅 / (𝑛 − 𝑝)
Total         𝑛 − 1     TSS = (𝑛 − 1) SD𝑌²
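A Python sketch (simulated data, hypothetical parameter values) of computing this decomposition directly from the matrix quantities in the table:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3                                   # intercept plus two predictors

X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([2.0, 1.5, -0.8]) + 3.0 * rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta_hat
r = y - fitted

correction = n * y.mean() ** 2
ss_reg = fitted @ fitted - correction           # corrected regression SS, df = p - 1
ss_err = r @ r                                  # error SS, df = n - p
tss = (n - 1) * y.var(ddof=1)                   # corrected total SS, df = n - 1

ms_reg = ss_reg / (p - 1)
mse = ss_err / (n - p)
f_stat = ms_reg / mse
print(ss_reg + ss_err, tss)                     # the decomposition: these agree
print(f_stat)
```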
Inferences in Multiple Regression

The probabilistic model for the data is 𝑌 = 𝑋𝛽 + 𝜎𝑌|1…(𝑝−1) 𝑍. The outcome or dependent (random) variables 𝑌𝑖 , 𝑖 = 1, … , 𝑛 are each assumed to be the sum of the linear regression expected value 𝛽0 + 𝛽1 𝑥1𝑖 + ⋯ + 𝛽𝑝−1 𝑥(𝑝−1)𝑖 and a random error term 𝜎𝑌|1…(𝑝−1) 𝑍𝑖. The random variables 𝑍𝑖 , 𝑖 = 1, … , 𝑛 are assumed to be independent standard normal random variables. The parameter 𝛽0 is the intercept parameter and is fixed but unknown. The parameters 𝛽1 , … , 𝛽𝑝−1 are partial regression coefficient parameters and are also fixed but unknown. These parameters are the focus of the statistical analysis. The parameter 𝜎𝑌|1…(𝑝−1) is also fixed but unknown. Another description of this model is that 𝑌𝑖 , 𝑖 = 1, … , 𝑛 are independent normally distributed random variables, with the vector 𝑌𝑛×1 having the distribution MVN(𝑋𝛽, 𝜎²𝑌|1…(𝑝−1) 𝐼𝑛×𝑛).

Again, there are four assumptions. The two important assumptions are that the outcome variables 𝑌𝑖 , 𝑖 = 1, … , 𝑛 are independent and that the regression function is 𝛽0 + 𝛽1 𝑥1𝑖 + ⋯ + 𝛽𝑝−1 𝑥(𝑝−1)𝑖. Homoscedasticity is less important. The assumption that 𝑌𝑖 , 𝑖 = 1, … , 𝑛 are normally distributed random variables is least important.
Testing null hypotheses about the partial regression
coefficients

The mathematical analysis of the general problem is complicated. The analysis for
two independent variables, however, is more manageable—particularly the problem
of sequential tests. As before, the model for the data is 𝑌𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 +
𝜎𝑌|12 𝑍𝑖 .

The research problem is to consider a sequence of models. The first model is that
𝑌𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝜎𝑌|1 𝑍𝑖 with null hypothesis 𝐻0 : 𝛽1 = 0. This is a Chapter 11
problem. The second model is that 𝑌𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 + 𝜎𝑌|12 𝑍𝑖 with null
hypothesis 𝐻0 : 𝛽2 = 0. This is an example of a sequential test. That is, the second
hypothesis is tested after the first one. These tests require the definition of the
partial correlation coefficient.
Partial correlation coefficient
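The formula itself is not reproduced in the notes at this point; the standard first-order partial correlation coefficient, which is the quantity 𝑟𝑌2|1 used in the table below, is

𝑟𝑌2|1 = [ 𝑟(𝑥2, 𝑦) − 𝑟(𝑥1, 𝑦) 𝑟(𝑥1, 𝑥2) ] / √[ (1 − 𝑟(𝑥1, 𝑦)²)(1 − 𝑟(𝑥1, 𝑥2)²) ],

that is, the correlation between 𝑦 and 𝑥2 after the linear effect of 𝑥1 has been removed from each.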
Analysis of variance table for a sequential test
Analysis of Variance Table
Multiple Regression of 𝑌 on 𝑥1 and 𝑥2|𝑥1

Source               DF        Sum of Squares                          Mean Square                             F
Reg on 𝑥1            1         [𝑟(𝑥1, 𝑦)]² TSS                         [𝑟(𝑥1, 𝑦)]² TSS
Reg on 𝑥2|𝑥1         1         𝑟²𝑌2|1 (1 − [𝑟(𝑥1, 𝑦)]²) TSS            𝑟²𝑌2|1 (1 − [𝑟(𝑥1, 𝑦)]²) TSS            MS(Reg)/MSE
Error                𝑛 − 3     by subtraction                          MSE = SSE/(𝑛 − 3)
Total (corrected)    𝑛 − 1     TSS = (𝑛 − 1) SD𝑌²
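A Python sketch of how this table can be filled in from summary statistics alone (the sample size, the corrected total sum of squares, and the three pairwise correlations); the function name and the illustrative inputs at the end are hypothetical, and scipy is used only to obtain the F reference distribution:

```python
import numpy as np
from scipy import stats

def sequential_anova(n, tss, r_y1, r_y2, r_12):
    """Sequential ANOVA for Y regressed on x1, then on x2 given x1."""
    # First-order partial correlation of Y and x2 controlling for x1.
    r_y2_given_1 = (r_y2 - r_y1 * r_12) / np.sqrt((1 - r_y1**2) * (1 - r_12**2))

    ss_x1 = r_y1**2 * tss                                    # Reg on x1, 1 df
    ss_x2_given_x1 = r_y2_given_1**2 * (1 - r_y1**2) * tss   # Reg on x2 | x1, 1 df
    sse = tss - ss_x1 - ss_x2_given_x1                       # Error, n - 3 df
    mse = sse / (n - 3)

    f_x2_given_x1 = ss_x2_given_x1 / mse
    p_value = stats.f.sf(f_x2_given_x1, 1, n - 3)
    return r_y2_given_1, ss_x1, ss_x2_given_x1, sse, f_x2_given_x1, p_value

# Illustrative (made-up) inputs.
print(sequential_anova(n=100, tss=500.0, r_y1=0.6, r_y2=0.4, r_12=0.3))
```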
Example question:
A study collects the values of (𝑌, 𝑥1 , 𝑥2 ) on 400 subjects. The total sum of squares
for Y is 1000. The correlation between 𝑌 and 𝑥1 is 0.67; the correlation between 𝑌
and 𝑥2 is 0.50; and the correlation between 𝑥1 and 𝑥2 is 0.25.
a. Compute the analysis of variance table for the multiple regression analysis of
𝑌. Include the sum of squares due to the regression on 𝑥1 and the sum of
squares due to the regression on 𝑥2 after including 𝑥1 .
b. Test the null hypothesis that both 𝛽2 = 0 and 𝛽1 = 0 ; that is, the null
hypothesis is that there is no association between 𝑌 and these two
independent variables.
c. Test the null hypothesis that the variable 𝑥2 does not improve the fit of
the model once 𝑥1 has been included against the alternative that the
variable does improve the fit of the model. Report whether the test is
significant at the 0.10, 0.05, 0.01 levels of significance.
Complete Mediation and Complete Explanation
Causal Models
In analyzing research data from engineering or physical sciences studies, the independent variables typically operate at the same time. Given this, the fact that a partial regression coefficient is an estimate of a partial derivative indicates that caution is warranted in its interpretation. In social science and epidemiological research, however,
the independent variables may operate at different points of time. For example,
𝑥1 may describe a variable measured when the participant was between ages 5
and 6, and 𝑥2 may describe a variable measured when the participant was
between the ages of 8 and 9. The time-ordering of the independent variables is
a crucial consideration in the interpretation of partial regression coefficients.

For example, one often sees that 𝜌𝑦2 appears significant (that is, 𝑥2 has a significant F statistic in a multiple regression analysis, or 𝑟𝑦2, the Pearson product moment correlation, is significant) but that 𝜌𝑦2|1 does not appear significant. That is, in multiple regression analysis, the variable 𝑥2 does not have a significant F-to-enter once 𝑥1 is in the regression equation.
There is a fundamental paper (Simon, 1954, available on JSTOR and on our Brightspace site) that you should download and read.

Simon points out that in a common cause model (or explanation), the independent variable 𝑥1 precedes both 𝑥2 and 𝑦 in its time of operation. If 𝑥1 “causes” 𝑥2 and 𝑥1 “causes” 𝑦, then there will be a “spurious” correlation 𝜌𝑦2 (this correlation will be non-zero even though 𝑥2 has no causal relation to 𝑦), and 𝜌𝑦2|1 will be zero. For example, consider G. B. Shaw’s correlation between the number of suicides in England in a given year and the number of churches of England in the same year.
In a causal chain model, the independent variable 𝑥2 operates before and causes 𝑥1, and 𝑥1 operates before 𝑦 and causes 𝑦. Simon points out that when the model is a causal chain (or mediation), one again observes that 𝜌𝑦2 will be non-zero and 𝜌𝑦2|1 will be zero (even though 𝑥2 causes 𝑦 through the mediation of 𝑥1).

Both causal modeling situations have the same empirical fact that a partial
correlation is near 0. Deciding which interpretation is valid requires clarifying the
sequence of operation of the variables. In practice, the relevant partial correlation
may not be essentially 0. In this event, researchers speak of partial explanation
and partial mediation.
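To illustrate, here is a small simulation sketch (all coefficients are hypothetical) contrasting the two causal structures; in both cases the marginal correlation between 𝑦 and 𝑥2 is clearly non-zero while the partial correlation controlling for 𝑥1 is near zero:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

def partial_corr(a, b, c):
    """First-order partial correlation of a and b controlling for c."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ac = np.corrcoef(a, c)[0, 1]
    r_bc = np.corrcoef(b, c)[0, 1]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Common cause: x1 causes both x2 and y; x2 has no causal effect on y.
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
y = 0.7 * x1 + rng.standard_normal(n)
print(np.corrcoef(y, x2)[0, 1], partial_corr(y, x2, x1))   # non-zero, then ~0

# Causal chain (mediation): x2 causes x1, and x1 causes y.
x2 = rng.standard_normal(n)
x1 = 0.8 * x2 + rng.standard_normal(n)
y = 0.7 * x1 + rng.standard_normal(n)
print(np.corrcoef(y, x2)[0, 1], partial_corr(y, x2, x1))   # non-zero, then ~0
```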
Example question:
A research team sought to estimate the model E(𝑌) = 𝛽0 + 𝛽1 𝑥 + 𝛽2 𝑤. The variable 𝑌 was a measure of depression of a participant observed at age 25; the variable 𝑥 was a measure of anxiety shown by the participant at age 18; and the variable 𝑤 was a measure of the extent of traumatic events experienced by the participant before age 15. They observed values of 𝑦, 𝑥, and 𝑤 on 𝑛 = 800 subjects. They found that the standard deviation of 𝑌, where the variance estimator used division by 𝑛 − 1, was 12.2. The correlation between 𝑌 and 𝑤 was 0.31; the correlation between 𝑌 and 𝑥 was 0.14; and the correlation between 𝑥 and 𝑤 was 0.41.
1. Compute the partial correlation coefficients 𝑟𝑌𝑥|𝑤 and 𝑟𝑌𝑤|𝑥 .
2. Compute the analysis of variance table for the multiple regression analysis of
𝑌. Include the sum of squares due to the regression on 𝑤 and the sum of
squares due to the regression on 𝑥 after including 𝑤. Test the null hypothesis
that 𝛽1 = 0 against the alternative that the coefficient is not equal to zero. That
is, test whether 𝑥 adds significant additional explanation after using w. Report
whether the test is significant at the 0.10, 0.05, and 0.01 levels of significance.
3. What interpretations can you make of these results in terms of causal models?
Example question:
A research team sought to estimate the model E(𝑌) = 𝛽0 + 𝛽1 𝑥 + 𝛽2 𝑤. The variable 𝑌 was a measure of the extent of criminal behavior of a participant observed at age 30; the variable 𝑥 was a measure of the rebelliousness shown by the participant at age 12; and the variable 𝑤 was a measure of delinquency shown at age 18. They observed values of 𝑦, 𝑥, and 𝑤 on 𝑛 = 1500 subjects. They found that the standard deviation of 𝑌, where the variance estimator used division by 𝑛 − 1, was 15.7. The correlation between 𝑌 and 𝑤 was 0.62; the correlation between 𝑌 and 𝑥 was 0.35; and the correlation between 𝑥 and 𝑤 was 0.58.
1. Compute the partial correlation coefficients 𝑟𝑌𝑥|𝑤 and 𝑟𝑌𝑤|𝑥 .
2. Compute the analysis of variance table for the multiple regression analysis of
𝑌. Include the sum of squares due to the regression on 𝑤 and the sum of
squares due to the regression on 𝑥 after including 𝑤. Test the null hypothesis
that 𝛽1 = 0 against the alternative that the coefficient is not equal to zero. That
is, test whether 𝑥 adds significant additional explanation after using w. Report
whether the test is significant at the 0.10, 0.05, and 0.01 levels of significance.
3. What interpretations can you make of these results in terms of causal models?
