Notes 1024 Part1
Again, OLS (ordinary least squares) is the most widely used method for estimating the parameters of the linear model. A linear function with arbitrary arguments, $b_0 + b_1 x_1 + b_2 x_2$, is used as a fit for the dependent variable values. The method uses the residuals $y_i - b_0 - b_1 x_{1i} - b_2 x_{2i}$. As in Chapter 11, the fitted model is judged by how small the set of residuals is. Here OLS minimizes the sum of squares function $SS(b_0, b_1, b_2) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i})^2$. The OLS method finds the arguments $(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)$ that make $SS(b_0, b_1, b_2)$ as small as possible. This minimization is a standard calculus problem.
Step 1 is to calculate the partial derivatives of $SS(b_0, b_1, b_2)$ with respect to each argument.
Step 2 is to find the arguments $(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)$ that make the three partial derivatives simultaneously zero. The resulting equations are still called the normal equations.
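Written out for the two-variable model (a standard calculus step, stated here for completeness), the equations from Steps 1 and 2 are

$$
\begin{aligned}
\frac{\partial SS}{\partial b_0} &= -2\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_{1i} - b_2 x_{2i}\bigr) = 0,\\
\frac{\partial SS}{\partial b_1} &= -2\sum_{i=1}^{n}x_{1i}\bigl(y_i - b_0 - b_1 x_{1i} - b_2 x_{2i}\bigr) = 0,\\
\frac{\partial SS}{\partial b_2} &= -2\sum_{i=1}^{n}x_{2i}\bigl(y_i - b_0 - b_1 x_{1i} - b_2 x_{2i}\bigr) = 0.
\end{aligned}
$$

In matrix form, with design matrix $X$ and response vector $Y$, the normal equations are $X^T X \hat{\beta} = X^T Y$.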
Analysis of Variance Table

Source       df       Sum of Squares                                      Mean Square                                                 F
Regression   $p-1$    $(X\hat{\beta})^T X\hat{\beta} - n\bar{Y}_n^2$      $[(X\hat{\beta})^T X\hat{\beta} - n\bar{Y}_n^2]/(p-1)$      $MS_{REG}/MSE$
Error        $n-p$    $R^T R$                                             $R^T R/(n-p)$
Total        $n-1$    $TSS = (n-1)\,SD_Y^2$

Here $X$ is the design matrix, $\hat{\beta}$ the vector of estimated coefficients, $\bar{Y}_n$ the sample mean of the $Y_i$, and $R$ the vector of residuals.
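A minimal numpy sketch of this computation, using simulated data (the variable names and the simulated coefficient values are illustrative, not from the notes):

```python
import numpy as np

# Illustrative data: n observations of (y, x1, x2); values are simulated.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept; p = 3 parameters.
X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# ANOVA table entries, following the notation in the table above.
fitted = X @ beta_hat
R = y - fitted                           # residual vector
ss_reg = fitted @ fitted - n * y.mean() ** 2
sse = R @ R
tss = (n - 1) * np.var(y, ddof=1)        # equals ss_reg + sse

ms_reg = ss_reg / (p - 1)
mse = sse / (n - p)
F = ms_reg / mse

print("beta_hat:", beta_hat)
print(f"SS_reg={ss_reg:.2f}  SSE={sse:.2f}  TSS={tss:.2f}  F={F:.2f}")
```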
Inferences in Multiple Regression
Again, there are four assumptions. The two important assumptions are that the outcome variables $Y_i$, $i = 1, \dots, n$, are independent and that the regression function is $\beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{(p-1)i}$. Homoscedasticity is less important. The assumption that $Y_i$, $i = 1, \dots, n$, are normally distributed random variables is least important.
Testing null hypotheses about the partial regression coefficients
The mathematical analysis of the general problem is complicated. The analysis for two independent variables, however, is more manageable, particularly the problem of sequential tests. As before, the model for the data is $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \sigma_{Y|12} Z_i$.
The research problem is to consider a sequence of models. The first model is $Y_i = \beta_0 + \beta_1 x_{1i} + \sigma_{Y|1} Z_i$ with null hypothesis $H_0: \beta_1 = 0$. This is a Chapter 11 problem. The second model is $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \sigma_{Y|12} Z_i$ with null hypothesis $H_0: \beta_2 = 0$. This is an example of a sequential test. That is, the second hypothesis is tested after the first one. These tests require the definition of the partial correlation coefficient.
Partial correlation coefficient
The partial correlation of $Y$ and $x_2$ given $x_1$ is the correlation between $Y$ and $x_2$ after removing the linear effect of $x_1$ from each. In terms of the pairwise correlations,
$$r_{Y2|1} = \frac{r_{Y2} - r_{Y1}\, r_{12}}{\sqrt{(1 - r_{Y1}^2)(1 - r_{12}^2)}}.$$
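A small Python sketch of this formula (the function name partial_corr is ours, not from the notes):

```python
import math

def partial_corr(r_y2, r_y1, r_12):
    """Partial correlation r_{Y2|1} from the three pairwise correlations."""
    return (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1**2) * (1 - r_12**2))

# Example: the correlations from the example question below.
print(partial_corr(r_y2=0.50, r_y1=0.67, r_12=0.25))  # about 0.463
```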
Analysis of variance table for a sequential test

Analysis of Variance Table
Multiple Regression of $Y$ on $x_1$ and $x_2|x_1$

Source              df       Sum of Squares                             Mean Square                                F
Reg on $x_1$        1        $r^2(x_1, y)\,TSS$                         $r^2(x_1, y)\,TSS$
Reg on $x_2|x_1$    1        $r_{Y2|1}^2 (1 - r^2(x_1, y))\,TSS$        $r_{Y2|1}^2 (1 - r^2(x_1, y))\,TSS$        $MS_{Reg}/MSE$
Error               $n-3$    $(1 - r^2(x_1, y))(1 - r_{Y2|1}^2)\,TSS$   $SSE/(n-3)$
Total (corrected)   $n-1$    $TSS = (n-1)\,SD_Y^2$

The error sum of squares is what remains of $TSS$ after subtracting the two regression sums of squares.
Example question:
A study collects the values of $(Y, x_1, x_2)$ on 400 subjects. The total sum of squares for $Y$ is 1000. The correlation between $Y$ and $x_1$ is 0.67; the correlation between $Y$ and $x_2$ is 0.50; and the correlation between $x_1$ and $x_2$ is 0.25.
a. Compute the analysis of variance table for the multiple regression analysis of $Y$. Include the sum of squares due to the regression on $x_1$ and the sum of squares due to the regression on $x_2$ after including $x_1$.
b. Test the null hypothesis that both $\beta_2 = 0$ and $\beta_1 = 0$; that is, the null hypothesis is that there is no association between $Y$ and these two independent variables.
c. Test the null hypothesis that the variable $x_2$ does not improve the fit of the model once $x_1$ has been included, against the alternative that the variable does improve the fit of the model. Report whether the test is significant at the 0.10, 0.05, and 0.01 levels of significance.
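A sketch of the arithmetic for this question in Python, following the sequential ANOVA formulas above (the variable names are ours, and scipy is assumed to be available for the F critical values):

```python
import math
from scipy import stats

# Summary statistics from the question.
n, tss = 400, 1000.0
r_y1, r_y2, r_12 = 0.67, 0.50, 0.25

# Partial correlation of Y and x2 given x1.
r_y2_1 = (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1**2) * (1 - r_12**2))

# Sequential sums of squares.
ss_x1 = r_y1**2 * tss                              # regression on x1
ss_x2_given_x1 = r_y2_1**2 * (1 - r_y1**2) * tss   # regression on x2 after x1
sse = tss - ss_x1 - ss_x2_given_x1
mse = sse / (n - 3)

# (b) Overall F test of beta1 = beta2 = 0, with df = (2, n - 3).
f_overall = ((ss_x1 + ss_x2_given_x1) / 2) / mse
print(f"overall F = {f_overall:.1f} on (2, {n - 3}) df")

# (c) Sequential F test of beta2 = 0 given x1, with df = (1, n - 3).
f_x2_given_x1 = ss_x2_given_x1 / mse
for alpha in (0.10, 0.05, 0.01):
    crit = stats.f.ppf(1 - alpha, 1, n - 3)
    print(f"alpha={alpha}: F = {f_x2_given_x1:.1f} vs critical value {crit:.2f}")
```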
Complete Mediation and Complete Explanation
Causal Models
In analyzing research data from engineering or physical science studies, the independent variables typically operate at the same time. Given this, the fact that a partial regression coefficient is an estimate of a partial derivative strongly indicates that caution is warranted in its interpretation. In social science and epidemiological research, however, the independent variables may operate at different points in time. For example, $x_1$ may describe a variable measured when the participant was between the ages of 5 and 6, and $x_2$ may describe a variable measured when the participant was between the ages of 8 and 9. The time-ordering of the independent variables is a crucial consideration in the interpretation of partial regression coefficients.
For example, often one sees that $\rho_{Y2}$ appears significant (that is, $x_2$ has a significant F statistic in a multiple regression analysis, or $r_{Y2}$, the Pearson product moment correlation, is significant) but that $\rho_{Y2|1}$ does not appear significant. That is, in a multiple regression analysis, the variable $x_2$ does not have a significant F-to-enter once $x_1$ is in the regression equation.
There is a fundamental paper (Simon, 1954, available on JSTOR and on our Brightspace site) that you should download and read.
Simon points out that in a common cause model (or explanation), the independent variable $x_1$ precedes both $x_2$ and $y$ in its time of operation. If $x_1$ "causes" $x_2$ and $x_1$ "causes" $y$, then there will be a "spurious" correlation $\rho_{Y2}$ (this correlation will be non-zero even though $x_2$ has no causal relation to $y$) and $\rho_{Y2|1}$ will be zero. For example, consider G. B. Shaw's correlation between the number of suicides in England in a given year and the number of Church of England churches in the same year.
In a causal chain model, the independent variable $x_2$ operates before and causes $x_1$, and $x_1$ operates before $y$ and causes $y$. Simon also points out that, when the model is a causal chain (or mediation), one also observes that $\rho_{Y2}$ will be non-zero and $\rho_{Y2|1}$ will be zero (even though $x_2$ causes $y$ through the mediation of $x_1$).
Both causal modeling situations produce the same empirical fact: a partial correlation near 0. Deciding which interpretation is valid requires clarifying the sequence of operation of the variables. In practice, the relevant partial correlation may not be essentially 0. In that event, researchers speak of partial explanation and partial mediation.
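A simulation sketch of Simon's two patterns (the coefficient values and helper names are ours, chosen for illustration). Both data-generating processes make the raw correlation between $y$ and $x_2$ clearly non-zero while the partial correlation given $x_1$ is near zero:

```python
import numpy as np

def resid(v, x):
    """Residuals of v after a simple linear regression on x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(X, v, rcond=None)[0]
    return v - X @ beta

def report(name, y, x1, x2):
    r_y2 = np.corrcoef(y, x2)[0, 1]
    # Partial correlation computed directly from its definition:
    # correlate y and x2 after removing the linear effect of x1 from each.
    r_y2_1 = np.corrcoef(resid(y, x1), resid(x2, x1))[0, 1]
    print(f"{name}: r_y2 = {r_y2:.3f}, r_y2|1 = {r_y2_1:.3f}")

rng = np.random.default_rng(1)
n = 10_000

# Common cause: x1 causes both x2 and y; x2 has no causal effect on y.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 0.8 * x1 + rng.normal(size=n)
report("common cause", y, x1, x2)

# Causal chain (mediation): x2 causes x1, and x1 causes y.
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 0.8 * x1 + rng.normal(size=n)
report("causal chain", y, x1, x2)
```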
Example question:
A research team sought to estimate the model $E[Y] = \beta_0 + \beta_1 x + \beta_2 w$. The variable $Y$ was a measure of depression of a participant observed at age 25; the variable $x$ was a measure of anxiety shown by the participant at age 18; and the variable $w$ was a measure of the extent of traumatic events experienced by the participant before age 15. They observed values of $Y$, $x$, and $w$ on $n = 800$ subjects. They found that the standard deviation of $Y$, where the variance estimator used division by $n - 1$, was 12.2. The correlation between $Y$ and $w$ was 0.31; the correlation between $Y$ and $x$ was 0.14; and the correlation between $x$ and $w$ was 0.41.
1. Compute the partial correlation coefficients $r_{Yx|w}$ and $r_{Yw|x}$.
2. Compute the analysis of variance table for the multiple regression analysis of $Y$. Include the sum of squares due to the regression on $w$ and the sum of squares due to the regression on $x$ after including $w$. Test the null hypothesis that $\beta_1 = 0$ against the alternative that the coefficient is not equal to zero. That is, test whether $x$ adds significant additional explanation after using $w$. Report whether the test is significant at the 0.10, 0.05, and 0.01 levels of significance.
3. What interpretations can you make of these results in terms of causal models?
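As a check on part 1 (our arithmetic, using the partial correlation formula above), the partial correlation of $Y$ and $x$ given $w$ is
$$r_{Yx|w} = \frac{0.14 - (0.31)(0.41)}{\sqrt{(1 - 0.31^2)(1 - 0.41^2)}} = \frac{0.0129}{\sqrt{(0.9039)(0.8319)}} \approx 0.015,$$
which is essentially zero, so the discussion of common cause and mediation models above is directly relevant to part 3.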
Example question:
A research team sought to estimate the model $E[Y] = \beta_0 + \beta_1 x + \beta_2 w$. The variable $Y$ was a measure of the extent of criminal behavior of a participant observed at age 30; the variable $x$ was a measure of the rebelliousness shown by the participant at age 12; and the variable $w$ was a measure of delinquency shown at age 18. They observed values of $Y$, $x$, and $w$ on $n = 1500$ subjects. They found that the standard deviation of $Y$, where the variance estimator used division by $n - 1$, was 15.7. The correlation between $Y$ and $w$ was 0.62; the correlation between $Y$ and $x$ was 0.35; and the correlation between $x$ and $w$ was 0.58.
1. Compute the partial correlation coefficients $r_{Yx|w}$ and $r_{Yw|x}$.
2. Compute the analysis of variance table for the multiple regression analysis of $Y$. Include the sum of squares due to the regression on $w$ and the sum of squares due to the regression on $x$ after including $w$. Test the null hypothesis that $\beta_1 = 0$ against the alternative that the coefficient is not equal to zero. That is, test whether $x$ adds significant additional explanation after using $w$. Report whether the test is significant at the 0.10, 0.05, and 0.01 levels of significance.
3. What interpretations can you make of these results in terms of causal models?
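Again as a check on part 1 (our arithmetic):
$$r_{Yx|w} = \frac{0.35 - (0.62)(0.58)}{\sqrt{(1 - 0.62^2)(1 - 0.58^2)}} = \frac{-0.0096}{\sqrt{(0.6156)(0.6636)}} \approx -0.015,$$
again essentially zero. Here $x$ (rebelliousness at age 12) precedes $w$ (delinquency at age 18), which precedes $Y$ (criminal behavior at age 30), so a causal chain (mediation) reading is the natural candidate to consider in part 3.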