Applied Statistics 2017: Chapters 4 and 5


Chapter 4

Simple Linear Regression

4.1 The model

The simple linear regression model for n observations can be written as

$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, 2, \ldots, n. \tag{4.1}$$

To complete the model in (4.1), we make the following additional assumptions:

1. $E(e_i \mid x_i) = 0$ for all $i = 1, 2, \ldots, n$ or, equivalently, $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$.

2. $\operatorname{var}(e_i \mid x_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$ or, equivalently, $\operatorname{var}(y_i \mid x_i) = \sigma^2$.

3. $\operatorname{cov}(e_i, e_j \mid x_i, x_j) = 0$ for all $i \neq j$ or, equivalently, $\operatorname{cov}(y_i, y_j \mid x_i, x_j) = 0$.

For brevity, the conditioning on the $x_i$'s may be dropped in these lecture notes.

4.2 Estimation

In the least-squares approach, we seek estimators $\hat\beta_0$ and $\hat\beta_1$ that minimize the sum of the squared deviations $(y_i - \hat y_i)^2$ of the $n$ observed $y_i$'s from their predicted values.

• Normal equations:
$$\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$

• $\hat\beta_1 = \dfrac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}$.

We will use the expression $\hat\beta_1 = \sum_{i=1}^{n} a_i y_i$, where $a_i = \dfrac{x_i - \bar x}{\sum_{i=1}^{n}(x_i - \bar x)^2}$.

Note that $\sum_{i=1}^{n} a_i = 0$ and $\sum_{i=1}^{n} a_i^2 = 1/S_{XX}$, where $S_{XX} = \sum_{i=1}^{n}(x_i - \bar x)^2$.

• $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.

– $\hat\beta_1$ can also be expressed as $\hat\beta_1 = S_{Xy}/S_{XX} = r\sqrt{S_{yy}/S_{XX}} = r\,\hat s_y/\hat s_x$, where $r$ is the sample correlation and $\hat s_y$, $\hat s_x$ are the sample standard deviations of $y$ and $x$.

• $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_0) = \beta_0$.

$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}, \qquad \operatorname{Var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right), \qquad \operatorname{cov}(\bar y, \hat\beta_1) = 0.$$

$$\hat\sigma^2 = (n-2)^{-1}\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$

• Partition of the sum of squares: Using the orthogonality between $\hat e$ and $\hat y$,
$$\|y\|^2 = \|X\hat\beta\|^2 + \|\hat e\|^2,$$
that is,
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat y_i^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2,$$
or
$$\underbrace{\sum_{i=1}^{n} (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat y_i)^2}_{SSE}.$$

• Coefficient of determination: $R^2 = r^2 = SSR/SST = 1 - SSE/SST = S_{Xy}^2/(S_{XX}S_{yy})$.

– $R^2$ is the proportion of the variation in $y$ that is explained by the model.

– $r$ is the sample correlation coefficient between $y$ and $x$.



• $SST = SSR + SSE$; under normality, $SSE$ and $SSR$ are independent.

• Suppose that $e = (e_1, \cdots, e_n)^T \sim N_n(0, \sigma^2 I)$. Under $H_0: \beta_1 = 0$, $SSE/\sigma^2 \sim \chi^2(n-2)$ and $SSR/\sigma^2 \sim \chi^2(1)$.
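As an illustration, here is a minimal R sketch (with simulated data, so the objects x, y, and fit below are hypothetical and not from the notes) that computes the least-squares quantities of this section by hand and compares them with lm():

set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n)

Sxx <- sum((x - mean(x))^2)                      # S_XX
Sxy <- sum((x - mean(x)) * (y - mean(y)))        # S_Xy
b1  <- Sxy / Sxx                                 # beta1-hat
b0  <- mean(y) - b1 * mean(x)                    # beta0-hat
SSE <- sum((y - b0 - b1 * x)^2)
SST <- sum((y - mean(y))^2)
sig2 <- SSE / (n - 2)                            # sigma2-hat = SSE/(n - 2)

fit <- lm(y ~ x)
rbind(c(b0, b1), coef(fit))                      # same estimates
c(sig2, summary(fit)$sigma^2)                    # same sigma2-hat
c(1 - SSE / SST, summary(fit)$r.squared)         # same R^2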

4.3 Testing hypotheses

• For the distributions of the estimators, we assume $e = (e_1, \cdots, e_n)^T \sim N_n(0, \sigma^2 I)$.

• $H_0: \beta_1 = \beta_{10}$

$$T = \frac{(\hat\beta_1 - \beta_{10})/(\sigma/\sqrt{S_{XX}})}{\sqrt{\hat\sigma^2/\sigma^2}} = \frac{\hat\beta_1 - \beta_{10}}{\hat\sigma/\sqrt{S_{XX}}}$$

is distributed as $t$ with $n-2$ degrees of freedom under the null, where $\hat\sigma^2 = SSE/(n-2) = MSE$.

• $H_0: \beta_0 = \beta_{00}$

Recall $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.

$\hat\beta_0$ is normally distributed with mean $E(\hat\beta_0) = \beta_0$ and variance $\operatorname{var}(\bar y - \hat\beta_1 \bar x) = \sigma^2\bigl(1/n + \bar x^2/\sum_{i=1}^{n}(x_i - \bar x)^2\bigr)$. Note that $\operatorname{cov}(\bar y, \hat\beta_1 \bar x) = 0$.

$$T = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{\hat\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{XX}}\right)}}$$

is distributed as $t$ with $n-2$ degrees of freedom under the null.
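A short R sketch of these t statistics, reusing the simulated x, y, fit, and n from the sketch in Section 4.2 (an assumption), and comparing them with summary():

Sxx <- sum((x - mean(x))^2)
MSE <- sum(residuals(fit)^2) / (n - 2)                       # sigma2-hat
t1  <- coef(fit)[2] / sqrt(MSE / Sxx)                        # t statistic for H0: beta1 = 0
t0  <- coef(fit)[1] / sqrt(MSE * (1/n + mean(x)^2 / Sxx))    # t statistic for H0: beta0 = 0
cbind(c(t0, t1), summary(fit)$coefficients[, "t value"])     # agree with summary(fit)
2 * pt(-abs(t1), df = n - 2)                                 # two-sided p-value for beta1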

4.4 Estimating $E(y_i \mid x_i)$

• An estimator of the regression line $E(y \mid x) = \beta_0 + \beta_1 x$ is $\hat y = \hat\beta_0 + \hat\beta_1 x$. We can check its unbiasedness: $E(\hat\beta_0 + \hat\beta_1 x) = \beta_0 + \beta_1 x$.



• Confidence intervals can be constructed using
$$\operatorname{var}(\hat\beta_0 + \hat\beta_1 x_i) = \operatorname{var}\{\bar y + \hat\beta_1 (x_i - \bar x)\} = \operatorname{var}(\bar y) + \operatorname{var}(\hat\beta_1)(x_i - \bar x)^2 + 2(x_i - \bar x)\operatorname{cov}(\bar y, \hat\beta_1) = \sigma^2\bigl[1/n + (x_i - \bar x)^2/S_{XX}\bigr].$$

• Under normality of the data, $\hat y$ is also normally distributed.
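A minimal R sketch (reusing fit, n, Sxx, and MSE from the previous sketches, which is an assumption; x0 is an arbitrary illustrative value) comparing the hand-computed interval for $E(y \mid x_0)$ with predict():

x0    <- 5                                            # a hypothetical new x value
yhat0 <- coef(fit)[1] + coef(fit)[2] * x0
se0   <- sqrt(MSE * (1/n + (x0 - mean(x))^2 / Sxx))   # square root of the variance formula above, with MSE for sigma^2
yhat0 + c(-1, 1) * qt(0.975, n - 2) * se0             # 95% CI for E(y | x0) by hand
predict(fit, newdata = data.frame(x = x0), interval = "confidence")   # same interval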

4.5 Residuals

• $e_i = y_i - \beta_0 - \beta_1 x_i$; $\hat e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$.

• Under normality of the data, $\hat e_i$ is also normally distributed, with mean $E(\hat e_i) = 0$ and variance
$$\operatorname{var}(y_i - \hat\beta_0 - \hat\beta_1 x_i) = \operatorname{var}\{y_i - \bar y - \hat\beta_1(x_i - \bar x)\} = \operatorname{var}(y_i - \bar y) + (x_i - \bar x)^2\operatorname{var}(\hat\beta_1) - 2(x_i - \bar x)\operatorname{cov}(y_i - \bar y, \hat\beta_1) = \sigma^2\bigl[1 - 1/n - (x_i - \bar x)^2/S_{XX}\bigr],$$
where $\operatorname{cov}(y_i - \bar y, \hat\beta_1) = a_i\sigma^2$ is used. Recall that $a_i = (x_i - \bar x)/\sum_{j=1}^{n}(x_j - \bar x)^2$.
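A brief R check, still with the simulated simple-regression objects above, that these residual variances match the diagonal of $\sigma^2(I - H)$, where $H = X(X^TX)^{-1}X^T$ is the hat matrix of the simple model:

X  <- cbind(1, x)                              # design matrix of the simple model
H  <- X %*% solve(t(X) %*% X) %*% t(X)         # hat matrix
v1 <- 1 - 1/n - (x - mean(x))^2 / Sxx          # the variance formula above, divided by sigma^2
v2 <- diag(diag(n) - H)                        # 1 - h_ii from the hat matrix
max(abs(v1 - v2))                              # numerically zero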


Chapter 5

Multiple Regression

5.1 The model

• The model and assumptions: The multiple linear regression model can be expressed as
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + e_i = X_i\beta + e_i$$
for $i = 1, \cdots, n$. The matrix form is
$$y = X\beta + e.$$
The assumptions for $e_i$ or $y_i$ are essentially the same as those for simple linear regression:

1. $E(e_i) = 0$ or, equivalently, $E(y_i \mid X_i) = X_i\beta$.

2. $\operatorname{cov}(y) = \sigma^2 I_n$.

• Estimation of $\beta$: The estimator of $\beta$ minimizes $(y - X\beta)^T(y - X\beta)$; it is the solution of the normal equations $X^T X\beta = X^T y$. When $X$ is of full column rank, $\hat\beta = (X^T X)^{-1} X^T y$.

• Properties of $\hat\beta$: $E(\hat\beta) = \beta$ and $\operatorname{Var}(\hat\beta) = \sigma^2(X^T X)^{-1}$. Under normality of the data, $\hat\beta$ is normally distributed as well.

• Gauss–Markov theorem: If $E(y) = X\beta$ and $\operatorname{cov}(y) = \sigma^2 I_n$, the least-squares estimators are the best linear unbiased estimators (BLUE).

(Proof) Let $\tilde\beta = Ay$ represent a linear estimator. To restrict attention to unbiased estimators, we impose $E(\tilde\beta) = E(Ay) = AX\beta = \beta$ for all $\beta$, which implies that $A$ must satisfy $AX = I$. Then
$$\tilde\beta = Ay = \{A - (X^T X)^{-1} X^T + (X^T X)^{-1} X^T\}y = \{A - (X^T X)^{-1} X^T\}y + \hat\beta,$$
$$\operatorname{var}(\tilde\beta) = \bigl(A - (X^T X)^{-1} X^T\bigr)\bigl(A^T - X(X^T X)^{-1}\bigr)\sigma^2 + 2\bigl(A - (X^T X)^{-1} X^T\bigr)X(X^T X)^{-1}\sigma^2 + \operatorname{var}(\hat\beta) \succeq \operatorname{var}(\hat\beta),$$
where the middle (cross) term vanishes because $\{A - (X^T X)^{-1} X^T\}X = AX - I = 0$, and the first term is positive semidefinite. Hence $\hat\beta$ has the smallest variance, with equality when $A = (X^T X)^{-1} X^T$.

The above theorem also holds for $c^T\hat\beta$ in estimating $c^T\beta$.

5.2 Estimation of $\sigma^2$

• Let $s^2 = (n - p - 1)^{-1}(y - X\hat\beta)^T(y - X\hat\beta)$. We can see that $E(s^2) = \sigma^2$ and, under the normality assumption,
$$\frac{(n - p - 1)s^2}{\sigma^2} \sim \chi^2(n - p - 1).$$

• Note that $\operatorname{var}(\hat\beta) = \sigma^2(X^T X)^{-1}$ is unknown. An estimator is $\widehat{\operatorname{var}}(\hat\beta) = s^2(X^T X)^{-1}$.

• MLE: Assuming $y \sim N(X\beta, \sigma^2 I)$, the maximum likelihood estimator of $\beta$ is $\hat\beta = (X^T X)^{-1} X^T y$, and that of $\sigma^2$ is $\hat\sigma^2 = n^{-1}(y - X\hat\beta)^T(y - X\hat\beta)$. Moreover,
$$\hat\beta \sim N\bigl(\beta, \sigma^2(X^T X)^{-1}\bigr), \qquad \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2(n - p - 1),$$
and $\hat\beta$ and $\hat\sigma^2$ (or $s^2$) are independent.
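As a quick R illustration (a hypothetical sketch with simulated covariates x1 and x2, not data from the notes), $s^2$ and $s^2(X^T X)^{-1}$ reproduce summary() and vcov():

set.seed(2)
n   <- 40
x1  <- rnorm(n); x2 <- rnorm(n)
y   <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X  <- model.matrix(fit)
p  <- ncol(X) - 1                                # number of covariates
s2 <- sum(residuals(fit)^2) / (n - p - 1)        # s^2 = SSE/(n - p - 1)
c(s2, summary(fit)$sigma^2)                      # identical values
max(abs(s2 * solve(t(X) %*% X) - vcov(fit)))     # s^2 (X'X)^{-1} matches vcov(fit)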

5.3 Geometry of least squares

Denote by $E^n$ the Euclidean space of $n$ dimensions. Consider a subspace of $k$ dimensions, where $n > k$. A subspace of particular interest to us is the one for which the columns of $X$ provide the basis vectors. We may denote the $k$ columns of $X$ by $x_1, x_2, \cdots, x_k$. The subspace associated with these $k$ basis vectors will be denoted by $S(X)$ or $S(x_1, \cdots, x_k)$. The basis vectors are said to span this subspace, which will in general be a $k$-dimensional subspace. The subspace $S(x_1, \cdots, x_k)$ consists of every vector that can be formed as a linear combination of the $x_i$, $i = 1, \cdots, k$. Formally, it is defined as
$$S(x_1, \cdots, x_k) \equiv \Bigl\{ z \in E^n : z = \sum_{i=1}^{k} b_i x_i, \ b_i \in \mathbb{R} \Bigr\}.$$

We would like to represent $y$ using $Xb$ and find $b$ such that the Euclidean distance between $y$ and $Xb$ is shortest. Such an $X\hat\beta$ is the projection of $y$ onto $S(x_1, \cdots, x_k)$. Let $\hat e = y - X\hat\beta$; then for every element of $S(x_1, \cdots, x_k)$, represented as $Xb$, the projection must satisfy
$$\langle Xb, \hat e\rangle = (Xb)^T\hat e = b^T X^T\hat e = 0.$$

The following two figures (from the textbook) illustrate the geometry of least squares.

[Figure 7.3: Parameter space, data space, and prediction space with representative elements.]

[Figure 7.4: Geometric relationships of vectors associated with the multiple linear regression model.]
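A small R check (reusing the simulated fit from the Section 5.2 sketch, which is an assumption) that the residual vector is orthogonal to $S(X)$, i.e., $X^T\hat e = 0$:

X    <- model.matrix(fit)
ehat <- residuals(fit)
round(t(X) %*% ehat, 10)        # X' e-hat = 0: residuals orthogonal to every column of X
yhat <- X %*% coef(fit)         # the projection of y onto S(X)
sum(yhat * ehat)                # <y-hat, e-hat> = 0 as well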

5.4 Sum of squares

• Let $H = X(X^T X)^{-1} X^T$. We can see that $HJ_n = J_n$: since the intercept column $\mathbf{1}$ lies in the column space of $X$, $H\mathbf{1} = \mathbf{1}$, and postmultiplying by $\mathbf{1}^T$ gives $HJ_n = J_n$. Therefore $(I_n - H)(H - \frac{1}{n}J_n) = 0$. Then
$$SST = \sum_{i=1}^{n}(y_i - \bar y)^2 = y^T\Bigl(I_n - \frac{1}{n}J_n\Bigr)y = y^T\Bigl(I_n - H + H - \frac{1}{n}J_n\Bigr)y = y^T(I_n - H)y + y^T\Bigl(H - \frac{1}{n}J_n\Bigr)y = SSE + SSR,$$
where $SST$ is the "total sum of squares," $SSE$ is the "sum of squares of errors," and $SSR$ is the "regression sum of squares."

• Consider the distribution of $SSE$ when $y \sim N(X\beta, \sigma^2 I)$. Note that $(I - H)X = 0$ and $X^T(I - H) = 0$; therefore $SSE = y^T(I_n - H)y = (y - X\beta)^T(I_n - H)(y - X\beta)$. Also, $I_n - H$ is idempotent and $\operatorname{rank}(I - H) = \operatorname{tr}(I - H) = n - p - 1$. Then
$$\frac{1}{\sigma^2}SSE \sim \chi^2(n - p - 1).$$

• Now consider the distribution of $SST = y^T(I_n - \frac{1}{n}J_n)y$. $(I_n - \frac{1}{n}J_n)$ is idempotent and $\operatorname{rank}(I_n - \frac{1}{n}J_n) = \operatorname{tr}(I_n - \frac{1}{n}J_n) = n - 1$. Let
$$\lambda_T = \frac{1}{2\sigma^2}\mu^T\Bigl(I_n - \frac{1}{n}J_n\Bigr)\mu = \frac{1}{2\sigma^2}\beta^T X^T\Bigl(I_n - \frac{1}{n}J_n\Bigr)X\beta.$$
Then
$$\frac{1}{\sigma^2}SST \sim \chi^2(n - 1, \lambda_T).$$

• Now consider the distribution of $SSR = y^T(H - \frac{1}{n}J_n)y$. First note that $(I_n - H)(H - \frac{1}{n}J_n) = 0$ and $(H - \frac{1}{n}J_n)$ is idempotent. Let
$$\lambda_R = \frac{1}{2\sigma^2}\mu^T\Bigl(H - \frac{1}{n}J_n\Bigr)\mu = \frac{1}{2\sigma^2}\beta^T X^T\Bigl(H - \frac{1}{n}J_n\Bigr)X\beta.$$
Then
$$\frac{1}{\sigma^2}SSR \sim \chi^2(p, \lambda_R).$$

• Coefficient of determination: $R^2 = SSR/SST$.

Properties of $R^2$ are:

1. $0 \le R^2 \le 1$.

2. $R = \widehat{\operatorname{corr}}(y_i, \hat y_i)$, where $\widehat{\operatorname{corr}}$ is the sample correlation.

3. If $\beta_1 = \beta_2 = \cdots = \beta_p = 0$, then
$$R^2 = \frac{SSR}{SST} = \frac{SSR}{SSE + SSR} = \frac{\frac{1}{\sigma^2}SSR}{\frac{1}{\sigma^2}SSE + \frac{1}{\sigma^2}SSR} \sim \frac{W_1}{W_1 + W_2} \sim \operatorname{Beta}\bigl(p/2, (n - p - 1)/2\bigr),$$
where $W_1 \sim G(p/2, 2)$, $W_2 \sim G((n - p - 1)/2, 2)$, and $W_1 \perp W_2$. Therefore, $E(R^2) = \dfrac{p/2}{p/2 + (n - p - 1)/2} = \dfrac{p}{n - 1}$. (A small simulation check is sketched after this list.)

4. $R^2$ is invariant with respect to a linear transformation of $x$ to $Ax$, where $A$ is of full rank.
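The simulation check of property 3 mentioned above, as a minimal R sketch (sample size, dimension, and seed are arbitrary illustrative choices):

set.seed(3)
n <- 20; p <- 4
r2 <- replicate(5000, {
  X <- matrix(rnorm(n * p), n, p)
  y <- rnorm(n)                          # null model: all slopes equal to zero
  summary(lm(y ~ X))$r.squared
})
c(mean(r2), p / (n - 1))                 # simulated mean of R^2 vs. the theoretical p/(n - 1)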

• Adjusted $R^2$: $R_a^2 = \dfrac{\bigl(R^2 - \frac{p}{n-1}\bigr)(n-1)}{n - p - 1} = \dfrac{(n-1)R^2 - p}{n - p - 1}$. Details will be discussed later.

5.5 Model misspecification

• Case of underfitting: Let the true model be
$$y = X\beta + e = (X_1, X_2)\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + e = X_1\beta_1 + X_2\beta_2 + e.$$
Consider model misspecification where the fitted model includes $X_1$ only, i.e.,
$$y = X_1\beta_1^* + e^*.$$
Then we have $\hat\beta_1^* = (X_1^T X_1)^{-1}X_1^T y$ and $E(\hat\beta_1^*) = \beta_1 + A\beta_2$, where $A = (X_1^T X_1)^{-1}X_1^T X_2$:
$$E(\hat\beta_1^*) = E\bigl((X_1^T X_1)^{-1}X_1^T y\bigr) = (X_1^T X_1)^{-1}X_1^T(X_1\beta_1 + X_2\beta_2) = \beta_1 + (X_1^T X_1)^{-1}X_1^T X_2\beta_2,$$
$$\operatorname{var}(\hat\beta_1^*) = \sigma^2(X_1^T X_1)^{-1}.$$
Thus $\hat\beta_1^*$ is a biased estimator of $\beta_1$ (unless $X_1^T X_2 = 0$ or $\beta_2 = 0$); a small simulation sketch follows below.
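A minimal R sketch illustrating the bias $E(\hat\beta_1^*) = \beta_1 + A\beta_2$ (a hypothetical simulated design with correlated covariates, not an example from the notes):

set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)                    # the omitted covariate is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)         # true model uses both covariates

coef(lm(y ~ x1))["x1"]                       # underfitted slope: biased away from 2
coef(lm(y ~ x1 + x2))["x1"]                  # full-model slope: close to 2

X1 <- cbind(1, x1); X2 <- cbind(x2)
A  <- solve(t(X1) %*% X1) %*% t(X1) %*% X2   # alias matrix A = (X1'X1)^{-1} X1'X2
c(1, 2) + 3 * A                              # beta1 + A beta2: expected underfitted coefficients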

• Theorem 7.9c. Let $\hat\beta = (X^T X)^{-1}X^T y$ from the full model be partitioned as $\hat\beta = \begin{pmatrix}\hat\beta_1\\ \hat\beta_2\end{pmatrix}$, and let $\hat\beta_1^* = (X_1^T X_1)^{-1}X_1^T y$ be the estimator from the reduced model. Then

(i) $\operatorname{cov}(\hat\beta_1) - \operatorname{cov}(\hat\beta_1^*) = \sigma^2 AB^{-1}A^T$, which is a positive definite matrix, where $A = (X_1^T X_1)^{-1}X_1^T X_2$ and $B = X_2^T X_2 - X_2^T X_1 A$. Thus $\operatorname{var}(\hat\beta_j) > \operatorname{var}(\hat\beta_j^*)$, where the $\hat\beta_j$'s are the entries of $\hat\beta_1$, and

(ii) $\operatorname{var}(x_0^T\hat\beta) \ge \operatorname{var}(x_{01}^T\hat\beta_1^*)$, where $x_{01}$ is the part of $x_0$ that corresponds to $\beta_1$.

(Proof) (i) We can verify (i) by directly applying the result on the inverse of a partitioned matrix. Moreover,
$$B = X_2^T X_2 - X_2^T X_1(X_1^T X_1)^{-1}X_1^T X_2 = X_2^T\bigl(I_n - X_1(X_1^T X_1)^{-1}X_1^T\bigr)X_2 \succ 0,$$
and so $AB^{-1}A^T$ is positive definite.

• Theorem 7.9d. When $y = X_1\beta_1^* + e$ is fitted, the estimator of $\sigma^2$,
$$s^2 = \frac{(y - X_1\hat\beta_1^*)^T(y - X_1\hat\beta_1^*)}{n - q},$$
has
$$E(s^2) = \sigma^2 + \frac{\beta_2^T X_2^T\bigl(I - X_1(X_1^T X_1)^{-1}X_1^T\bigr)X_2\beta_2}{n - q},$$
where $q$ is the dimension of $\beta_1$.

(Proof)
$$SSE_1 = y^T y - y^T X_1(X_1^T X_1)^{-1}X_1^T y = y^T\bigl(I_n - X_1(X_1^T X_1)^{-1}X_1^T\bigr)y,$$
$$E(SSE_1) = \operatorname{tr}\bigl\{\bigl(I_n - X_1(X_1^T X_1)^{-1}X_1^T\bigr)\sigma^2 I\bigr\} + \beta^T X^T\bigl[I_n - X_1(X_1^T X_1)^{-1}X_1^T\bigr]X\beta = (n - q)\sigma^2 + \beta_2^T X_2^T\bigl[I_n - X_1(X_1^T X_1)^{-1}X_1^T\bigr]X_2\beta_2.$$

5.6 Orthogonalization

• Theorem 7.10. If $X_1^T X_2 = O$, then the estimator of $\beta_1$ in the full model $y = X_1\beta_1 + X_2\beta_2 + e$ is the same as the estimator of $\beta_1^*$ in the reduced model $y = X_1\beta_1^* + e^*$.

• Estimation of $\beta_2$ using orthogonalization: Let $y = X_1\beta_1 + X_2\beta_2 + e$ be the full model and $y = X_1\beta_1^* + e^*$ be the reduced model. Consider the following three steps (an R sketch of the procedure follows below):

1. Regress $y$ on $X_1$ and calculate the residuals $y - \hat y(X_1)$, where $\hat y(X_1) = X_1\hat\beta_1^* = X_1(X_1^T X_1)^{-1}X_1^T y$.

2. Regress the columns of $X_2$ on $X_1$ and obtain the residuals $X_{2.1} = X_2 - \hat X_2(X_1)$. If $X_2$ is written in terms of its columns as $X_2 = (x_{21}, \cdots, x_{2j}, \cdots, x_{2(p-q+1)})$, then the regression coefficient vector for $x_{2j}$ on $X_1$ is $b_j = (X_1^T X_1)^{-1}X_1^T x_{2j}$, and $\hat x_{2j} = X_1 b_j = X_1(X_1^T X_1)^{-1}X_1^T x_{2j}$. For all columns of $X_2$ together, this becomes $\hat X_2(X_1) = X_1(X_1^T X_1)^{-1}X_1^T X_2 = X_1 A$, where $A = (X_1^T X_1)^{-1}X_1^T X_2$. Note that $X_{2.1} = X_2 - \hat X_2(X_1)$ is orthogonal to $X_1$: $X_1^T X_{2.1} = 0$. Using the alias matrix $A$, the residual matrix can be expressed as
$$X_{2.1} = X_2 - \hat X_2(X_1) = X_2 - X_1(X_1^T X_1)^{-1}X_1^T X_2 = X_2 - X_1 A.$$

3. Regress $y - \hat y(X_1)$ on $X_{2.1} = X_2 - \hat X_2(X_1)$. Since $X_{2.1}$ is orthogonal to $X_1$, we obtain the same $\hat\beta_2$ as in the full model $\hat y = X_1\hat\beta_1 + X_2\hat\beta_2$.
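A minimal R sketch of these three steps (reusing the simulated x1, x2, and y from the Section 5.5 sketch is an assumption; here $X_1$ holds the intercept and x1, and $X_2$ holds x2):

X1 <- cbind(1, x1)                                   # intercept and x1
X2 <- cbind(x2)

H1   <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)         # projection onto S(X1)
ey   <- y - H1 %*% y                                 # step 1: residuals of y on X1
X2.1 <- X2 - H1 %*% X2                               # step 2: residuals of X2 on X1
b2   <- solve(t(X2.1) %*% X2.1) %*% t(X2.1) %*% ey   # step 3: regress step-1 residuals on X2.1

c(b2, coef(lm(y ~ x1 + x2))["x2"])                   # same beta2-hat as in the full model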

• Remark 1: The above three steps show that the estimation of $\beta_2$ is closely related to the partial correlation between $y$ and $X_2$ after removing the effect of $X_1$. In other words, the estimation of $\beta_2$ uses the variation in $y$ attributable to $X_2$ after the effect of $X_1$ has been accounted for, with the relationship between $X_1$ and $X_2$ also corrected for.

• Remark 2: One can directly show that $\hat\beta_2$ and $\tilde\beta_2$ are equivalent, where $\tilde\beta_2$ is the estimator obtained in step 3:
$$\tilde\beta_2 = (X_{2.1}^T X_{2.1})^{-1}X_{2.1}^T\bigl(y - \hat y(X_1)\bigr) = (X_{2.1}^T X_{2.1})^{-1}X_{2.1}^T y,$$
where $X_{2.1} = X_2 - X_1(X_1^T X_1)^{-1}X_1^T X_2 = (I - H_1)X_2$ and $H_1 = X_1(X_1^T X_1)^{-1}X_1^T$, so that
$$\tilde\beta_2 = \bigl(X_2^T(I - H_1)X_2\bigr)^{-1}X_2^T(I - H_1)y.$$

On the other hand, $\hat\beta$ is the solution of
$$\begin{pmatrix} X_2^T X_2 & X_2^T X_1 \\ X_1^T X_2 & X_1^T X_1 \end{pmatrix}\begin{pmatrix}\beta_2\\ \beta_1\end{pmatrix} = \begin{pmatrix} X_2^T y \\ X_1^T y \end{pmatrix}.$$
Then
$$\begin{pmatrix} I & -X_2^T X_1(X_1^T X_1)^{-1} \\ 0 & I \end{pmatrix}\begin{pmatrix} X_2^T X_2 & X_2^T X_1 \\ X_1^T X_2 & X_1^T X_1 \end{pmatrix}\begin{pmatrix}\beta_2\\ \beta_1\end{pmatrix} = \begin{pmatrix} I & -X_2^T X_1(X_1^T X_1)^{-1} \\ 0 & I \end{pmatrix}\begin{pmatrix} X_2^T y \\ X_1^T y \end{pmatrix},$$
$$\begin{pmatrix} X_2^T X_2 - X_2^T H_1 X_2 & X_2^T X_1 - X_2^T H_1 X_1 \\ X_1^T X_2 & X_1^T X_1 \end{pmatrix}\begin{pmatrix}\beta_2\\ \beta_1\end{pmatrix} = \begin{pmatrix} X_2^T y - X_2^T H_1 y \\ X_1^T y \end{pmatrix}.$$
Using $X_2^T X_1 - X_2^T H_1 X_1 = 0$, we have
$$\begin{pmatrix} X_2^T(I - H_1)X_2 & 0 \\ X_1^T X_2 & X_1^T X_1 \end{pmatrix}\begin{pmatrix}\beta_2\\ \beta_1\end{pmatrix} = \begin{pmatrix} X_2^T(I - H_1)y \\ X_1^T y \end{pmatrix}.$$
Thus,
$$\hat\beta_2 = \bigl[X_2^T(I - H_1)X_2\bigr]^{-1}X_2^T(I - H_1)y = \tilde\beta_2.$$

5.7 Centered covariates

• Reparameterize the model as $y_i = \alpha + \beta_1(x_{i1} - \bar x_1) + \cdots + \beta_p(x_{ip} - \bar x_p) + e_i$, where $\alpha = \beta_0 + \beta_1\bar x_1 + \cdots + \beta_p\bar x_p$. Let $\beta_c = (\beta_1, \cdots, \beta_p)^T$ and $X_c = ((x_{ij} - \bar x_j))$. We can rewrite the model as
$$y = (\mathbf{1}, X_c)\begin{pmatrix}\alpha\\ \beta_c\end{pmatrix} + e.$$
Since the column of ones is orthogonal to the columns of the centered matrix $X_c$, the normal equations are
$$(\mathbf{1}, X_c)^T(\mathbf{1}, X_c)\begin{pmatrix}\hat\alpha\\ \hat\beta_c\end{pmatrix} = (\mathbf{1}, X_c)^T y, \qquad \text{i.e.,} \qquad \begin{pmatrix} n & 0 \\ 0 & X_c^T X_c \end{pmatrix}\begin{pmatrix}\hat\alpha\\ \hat\beta_c\end{pmatrix} = \begin{pmatrix} n\bar y \\ X_c^T y \end{pmatrix}.$$
Then
$$\hat\alpha = \bar y, \qquad \hat\beta_c = (X_c^T X_c)^{-1}X_c^T y = S_{XX}^{-1}S_{Xy}, \qquad \hat\beta_0 = \hat\alpha - \hat\beta_1\bar x_1 - \cdots - \hat\beta_p\bar x_p = \bar y - \hat\beta_c^T\bar x.$$
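A short R sketch (again with the simulated x1, x2, and y from the Section 5.5 sketch, which is an assumption) showing that centering leaves the slopes unchanged and makes the intercept equal to $\bar y$:

fit_raw <- lm(y ~ x1 + x2)
fit_ctr <- lm(y ~ I(x1 - mean(x1)) + I(x2 - mean(x2)))     # centered covariates
rbind(coef(fit_raw)[-1], coef(fit_ctr)[-1])                # identical slope estimates
c(coef(fit_ctr)[1], mean(y))                               # intercept of the centered model equals ybar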

5.8 Adjusted R2

• It is well known that the coefficient of determination, $R^2$, increases as the number of variables increases. To address this issue, an adjusted $R^2$ has been developed.

• $R^2_{\text{adj},p} = 1 - (1 - R_p^2)\dfrac{n-1}{n-p} = 1 - \dfrac{SSE_p/(n-p)}{SST/(n-1)} = 1 - (n-1)\,MSE_p/SST$, where $MSE_p = SSE_p/(n-p)$ is the mean squared error of a model with $p$ coefficients (in this section $p$ counts all coefficients, including the intercept, so the residual degrees of freedom are $n - p$). Note that the model with the largest $R^2_{\text{adj},p}$ is the model with the smallest $MSE_p$.

• Relation to the $F$ statistic: When the model with $p$ coefficients is a submodel of the model with $k$ coefficients, $R^2_{\text{adj},p}$ is related to the $F$ statistic.

– Recall that $F = \dfrac{(SSE_p - SSE_k)/(k - p)}{SSE_k/(n - k)}$.

Then one can verify
$$R^2_{\text{adj},p} = 1 - (1 - R_k^2)\,\frac{n-1}{n-k}\cdot\frac{(n-k) + F(k-p)}{n-p}$$
by using $SSE_p = SST(1 - R_p^2)$.

– One can also show that $F \ge 1$ and $R^2_{\text{adj},p} \le R^2_{\text{adj},k}$ are equivalent. This implies that model selection using the adjusted $R^2$ tends to overfit: extra covariates are retained whenever their $F$ statistic exceeds 1, a much weaker requirement than the usual significance thresholds.
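A quick R check of the first formula of this section (variables from the Section 5.5 sketch assumed; here $p = 3$ coefficients, including the intercept, following this section's convention):

fit2 <- lm(y ~ x1 + x2)
n <- length(y); p <- length(coef(fit2))            # p = 3 coefficients, including the intercept
R2 <- summary(fit2)$r.squared
c(1 - (1 - R2) * (n - 1) / (n - p),                # adjusted R^2 from the formula above
  summary(fit2)$adj.r.squared)                     # matches R's built-in value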

5.9 Numerical examples

• 7.53: gas vapor example

When gasoline is pumped into the tank of a car, vapors are vented into the atmosphere. An experiment was conducted to determine whether $y$, the amount of vapor, can be predicted using the following four variables based on initial conditions of the tank and the dispensed gasoline:

x1 = tank temperature (°F)
x2 = gasoline temperature (°F)
x3 = vapor pressure in tank (psi)
x4 = vapor pressure of gasoline (psi)

– Estimation of β

attach(gas)

fit=lm(y~., data=gas)    ## or fit=lm(y~x1+x2+x3+x4, data=gas)

X=model.matrix(fit)
t(solve(t(X)%*%X)%*%t(X)%*%y)              ## beta-hat = (X'X)^{-1} X'y

(Intercept) x1 x2 x3 x4
[1,] 1.015 -0.02861 0.2158 -4.32 8.975

t(chol2inv(chol(t(X)%*%X))%*%t(X)%*%y)     ## same, inverting X'X via its Cholesky factor

[,1] [,2] [,3] [,4] [,5]
[1,] 1.015 -0.02861 0.2158 -4.32 8.975

coefficients(fit)

(Intercept) x1 x2 x3 x4
1.01502 -0.02861 0.21582 -4.32005 8.97489

– Sum of squares

SST= sum((y-mean(y))^2)
SSR= sum((fit$fitted-mean(y))^2)
SSE= sum((y-fit$fitted)^2)       ### or sum((fit$res)^2)

SSR

[1] 2520

SSE

[1] 201.2

SST-SSR

[1] 201.2

H=X%*%solve(t(X)%*%X)%*%t(X)     ## hat matrix
n=length(y)
J=rep(1,n)%*%t(rep(1,n))         ## J_n, the n x n matrix of ones
SSE2= t(y)%*%(diag(n)-H)%*%y     ## SSE as the quadratic form y'(I - H)y

SSE2

[,1]
[1,] 201.2

SSR2= t(y)%*%(H - J/n)%*%y       ## SSR as the quadratic form y'(H - J/n)y

SSR2

[,1]
[1,] 2520

– R2

summary(fit)$r.squared

[1] 0.9261

SSR/SST

[1] 0.9261

R2=SSR/SST
summary(fit)$adj.r.squared

[1] 0.9151

((n-1)*R2 - 4)/(n-5)     # adjusted R2 by hand, with p=4 covariates

[1] 0.9151

– Some other commands

summary(fit)
anova(fit)
attributes(summary(fit))    ## components available from the summary object
attributes(anova(fit))

par(mfrow=c(2,2))
plot(fit,which=1)
plot(fit,which=2)
plot(fit,which=3)
plot(fit,which=4)
