Introduction to Regression and Analysis of Variance
Jonathan Taylor

Course outline

This course is not an exhaustive survey of regression methodology.

We will focus on regression models: a large class of statistical models used in applied practice.

In our survey, we will emphasize common themes among these models.

The first half of the course bears some similarity to STATS 191 (Introduction to Applied Statistics), but we will focus a little more on the theoretical aspects of the models than in STATS 191.

Prerequisites: STATS 200 + familiarity with matrix algebra.

Evaluation: 4 assignments (60%), 1 take-home final exam (40%).

What is a regression model?

A regression model is essentially a model of the relationships between some covariates (predictors) and an outcome.

Often used in an exploratory setting: can sometimes be used for confirmatory studies but generally not for establishing causal relationships.

Example: to predict the height of the wife in a couple, based on the husband's height.
  Wife is the outcome;
  the covariate is Husband.

Simple linear regression model

Assume that we only have information on Husband and we observe $n$ pairs $(Y_i, X_i)$.

Specifying the model: given $(X_1, \dots, X_n)$ we assume that

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon \sim N(0, \sigma^2 I_{n \times n}).$$

Fitting the model: how do we estimate $(\beta_0, \beta_1)$? Least squares:

$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(\beta_0, \beta_1)} \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2.$$
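To make the least squares criterion concrete, here is a minimal Python sketch (the course itself uses R) that evaluates the sum of squared errors for a few candidate pairs $(\beta_0, \beta_1)$. The function name `sse`, the candidate values, and the height data are all made up for illustration; none of them come from the course.

```python
def sse(b0, b1, x, y):
    """Sum of squared errors for the candidate line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical heights in cm (illustrative only, not course data).
husband = [170.0, 175.0, 180.0, 185.0, 190.0]
wife = [160.0, 163.0, 165.0, 168.0, 170.0]

# Least squares picks the pair (b0, b1) with the smallest SSE
# over all possible pairs; here we only compare three candidates.
candidates = [(0.0, 0.9), (75.2, 0.5), (80.0, 0.5)]
for b0, b1 in candidates:
    print(b0, b1, sse(b0, b1, husband, wife))
```

Among these three candidates the pair (75.2, 0.5) gives the smallest SSE; for this toy data it is in fact the least squares fit.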

Parsing the name

Why is it called a simple linear regression model?

Because we were modelling the height of Wife ($Y$, the dependent variable) on Husband ($X$, the independent variable) alone, we only had one covariate: hence it is a simple model.

In the model

$$E(Y \mid X) = \beta_0 + \beta_1 X,$$

i.e. the conditional expectation of $Y$ given $X$ is linear in $X$. Hence it is a linear regression model.

In general, a linear regression model for an outcome $Y$ and covariates $X_1, \dots, X_p$ states that

$$E(Y \mid X_1, \dots, X_p) = \beta_0 + \sum_{j=1}^p \beta_j X_j.$$

Least Squares: Computation

In the Wife's height model, least squares regression chooses the line that minimizes

$$SSE(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2.$$

Normal equations:

$$\frac{\partial SSE}{\partial \beta_0} = -2 \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i) = 0$$

$$\frac{\partial SSE}{\partial \beta_1} = -2 \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i) X_i = 0$$
Solving the normal equations
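Setting both partial derivatives of the SSE to zero gives two linear equations in $(\beta_0, \beta_1)$. Their solution is the standard closed form $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$, where $S_{xx} = \sum_i (X_i - \bar X)^2$ and $S_{xy} = \sum_i (X_i - \bar X)(Y_i - \bar Y)$. The Python sketch below computes these estimates (the course itself uses R). The function name `fit_simple_ols` and the height data are illustrative assumptions, not course material.

```python
def fit_simple_ols(x, y):
    """Return (b0_hat, b1_hat), the closed-form solution of the normal equations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        # With no variation in X the slope cannot be estimated.
        raise ValueError("all X values are equal")
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Hypothetical heights in cm (illustrative only, not course data).
husband = [170.0, 175.0, 180.0, 185.0, 190.0]
wife = [160.0, 163.0, 165.0, 168.0, 170.0]

b0_hat, b1_hat = fit_simple_ols(husband, wife)
print("intercept:", b0_hat, "slope:", b1_hat)
```

For this toy data $\bar X = 180$, $\bar Y = 165.2$, $S_{xx} = 250$ and $S_{xy} = 125$, so the slope is 0.5 and the intercept is 75.2, up to floating-point rounding.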

Geometry of least squares

For each pair $(\beta_0, \beta_1)$, the vector $P_{(\beta_0, \beta_1)}$ with components

$$P_{i,(\beta_0, \beta_1)} = \beta_0 + \beta_1 X_i$$

is a linear combination of the vectors $X$ and $\mathbf{1} = (1, \dots, 1)$.

The SSE can be expressed as

$$SSE(\beta_0, \beta_1) = \sum_{i=1}^n \left( Y_i - P_{i,(\beta_0, \beta_1)} \right)^2 = \| Y - P_{(\beta_0, \beta_1)} \|^2.$$

Residuals

The residuals are defined as

$$e_i = Y_i - \hat Y_i.$$

Equivalent to

$$e = Y - \hat Y,$$

or $e$ is the projection of $Y$ onto the orthogonal complement $L^\perp$ of the plane $L$ spanned by $\mathbf{1}$, $X$.

This implies

$$\sum_{i=1}^n e_i = e \cdot \mathbf{1} = 0$$

$$\sum_{i=1}^n e_i X_i = e \cdot X = 0$$

$$\sum_{i=1}^n e_i \hat Y_i = e \cdot \hat Y = 0$$
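The three orthogonality relations can be checked numerically. The following Python sketch (the course itself uses R) fits the line with the standard closed-form least squares estimates and then computes the three dot products. The helper `dot` and the height data are illustrative assumptions.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical heights in cm (illustrative only, not course data).
x = [170.0, 175.0, 180.0, 185.0, 190.0]
y = [160.0, 163.0, 165.0, 168.0, 170.0]
n = len(x)

# Closed-form least squares estimates for the simple linear model.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0_hat = y_bar - b1_hat * x_bar

fitted = [b0_hat + b1_hat * xi for xi in x]      # Y hat
e = [yi - fi for yi, fi in zip(y, fitted)]       # residuals
ones = [1.0] * n

# Each dot product should be zero up to floating-point rounding.
print(dot(e, ones), dot(e, x), dot(e, fitted))
```

The third relation follows from the first two, since the fitted vector is itself a linear combination of $\mathbf{1}$ and $X$.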

Estimating $\sigma^2$

If we knew $(\beta_0, \beta_1)$, then

$$\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$$

and

$$\|\varepsilon\|^2 = \sum_{i=1}^n \varepsilon_i^2 = SSE(\beta_0, \beta_1) \sim \sigma^2 \chi^2_n$$

so

$$E\left( \frac{1}{n} \sum_{i=1}^n \varepsilon_i^2 \right) = \sigma^2.$$

Estimating $\sigma^2$

As $(\beta_0, \beta_1)$ is unknown we might think of using estimates of $\varepsilon_i$ instead:

$$\|e\|^2 = SSE(\hat\beta_0, \hat\beta_1) \sim \sigma^2 \chi^2_{n-2}$$

and

$$\hat\sigma^2 = MSE(\hat\beta_0, \hat\beta_1) = \frac{SSE(\hat\beta_0, \hat\beta_1)}{n-2}$$

is an unbiased estimate of $\sigma^2$.

Why $n-2$? Because $e$ is the projection of $\varepsilon$ onto an $(n-2)$-dimensional subspace, hence we can write its squared norm as $\sigma^2$ times the sum of the squares of $n-2$ independent standard normal random variables.
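A small simulation can illustrate why the divisor is $n-2$ rather than $n$. The Python sketch below (the course itself uses R) repeatedly simulates data from the simple linear model, fits it by least squares, and averages both $SSE/(n-2)$ and $SSE/n$. The true parameter values, the design points, the seed, and the number of replications are arbitrary assumptions chosen for illustration.

```python
import random


def residual_ss(x, y):
    """SSE at the least squares estimates for the simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))


random.seed(0)                            # arbitrary seed, for reproducibility
n = 10
x = [float(i) for i in range(n)]          # fixed, hypothetical design points
beta0, beta1, sigma = 1.0, 2.0, 1.0       # hypothetical true parameters
reps = 5000

total_mse = 0.0
total_naive = 0.0
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ss = residual_ss(x, y)
    total_mse += ss / (n - 2)             # divisor n - 2
    total_naive += ss / n                 # divisor n

mean_mse = total_mse / reps
mean_naive = total_naive / reps
print("average SSE/(n-2):", mean_mse)
print("average SSE/n:    ", mean_naive)
```

With $\sigma^2 = 1$ and $n = 10$, the first average should be close to 1, while the second should be close to $(n-2)/n = 0.8$, which shows the downward bias of dividing by $n$.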
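
Distribution of $\hat\beta$, $e$

$$E\big( (\hat\beta_0, \hat\beta_1) \big) = (\beta_0, \beta_1)$$

$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$$

$$\mathrm{Var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{S_{xx}} \right),$$

where $S_{xx} = \sum_{i=1}^n (X_i - \bar X)^2$.

Natural estimates of variance:

$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\hat\sigma^2}{S_{xx}}$$

$$\widehat{\mathrm{Var}}(\hat\beta_0) = \hat\sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{S_{xx}} \right).$$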

Inference for $\hat\beta$: t-statistics

Because $e$ is independent of $\hat\beta$ it follows that $\widehat{\mathrm{Var}}(\hat\beta_1)$ and $\widehat{\mathrm{Var}}(\hat\beta_0)$ are independent of $\hat\beta$.

Under the hypothesis $H_0: \beta_1 = \beta_1^0$

$$T = \frac{\hat\beta_1 - \beta_1^0}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)}} \sim t_{n-2}.$$

(Why?)

To test this hypothesis, compare $|T|$ to $t_{n-2,\,1-\alpha/2}$, the $1 - \alpha/2$ quantile of the $t$ distribution with $n-2$ degrees of freedom.

Reject $H_0$ if $|T| > t_{n-2,\,1-\alpha/2}$.
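One way to answer the "(Why?)" is sketched below. It assumes the normal error model and uses two standard facts: $(n-2)\hat\sigma^2/\sigma^2 \sim \chi^2_{n-2}$, and $\hat\sigma^2$ is independent of $\hat\beta_1$. The symbols $Z$ and $V$ are introduced here only for this sketch.

```latex
\begin{aligned}
Z &= \frac{\hat\beta_1 - \beta_1^0}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1)
   \quad \text{under } H_0, \\
V &= \frac{(n-2)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2},
   \quad V \text{ independent of } Z, \\
T &= \frac{\hat\beta_1 - \beta_1^0}{\sqrt{\hat\sigma^2 / S_{xx}}}
   = \frac{Z}{\sqrt{V / (n-2)}} \sim t_{n-2}.
\end{aligned}
```

The last line is the definition of a $t$ distribution: a standard normal divided by the square root of an independent chi-squared variable over its degrees of freedom.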
More on inference in next class.
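The test can be sketched in a few lines of Python (the course itself uses R). The function name `slope_t_statistic`, the null value $\beta_1^0 = 0$, the level $\alpha = 0.05$, and the height data are illustrative assumptions. Because the standard library has no $t$ quantile function, the critical value is taken from a $t$ table.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return (T, degrees of freedom) for H0: beta1 = beta1_null.

    Assumes n > 2 and that the residuals are not all exactly zero.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)                # unbiased estimate of sigma^2
    se_b1 = math.sqrt(sigma2_hat / s_xx)      # estimated std. error of slope
    return (b1_hat - beta1_null) / se_b1, n - 2


# Hypothetical heights in cm (illustrative only, not course data).
husband = [170.0, 175.0, 180.0, 185.0, 190.0]
wife = [160.0, 163.0, 165.0, 168.0, 170.0]

t_stat, df = slope_t_statistic(husband, wife, beta1_null=0.0)

# 0.975 quantile of the t distribution with 3 degrees of freedom,
# looked up in a t table (alpha = 0.05, two-sided).
t_crit = 3.182
reject = abs(t_stat) > t_crit
print("T =", t_stat, "df =", df, "reject H0:", reject)
```

For this toy data $\hat\sigma^2 = 0.3/3 = 0.1$ and $S_{xx} = 250$, so the estimated standard error of the slope is 0.02 and $T$ is about 25, far beyond the critical value.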

Statistics software

We will use R in this class.

R is an open source, multi-platform statistics programming environment.

Here is the code & output to fit this dummy model:

General themes in regression models

Specifying regression models.
  What is the joint (conditional) distribution of all outcomes given all covariates?
  Are outcomes independent (conditional on covariates)? If not, what is an appropriate model?

Fitting the models.
  Once a model is specified, how are we going to estimate the parameters?
  Is there an algorithm or some existing software to fit the model?

Comparing regression models.
  Inference for coefficients in the model: are some zero (i.e. is a smaller model better)?
  What if there are two competing models for the data? Why would one be preferable to the other?
  What if there are many models for the data? How do we compare models for the data?