Generalized Least Squares
1 Introduction
In both the ordinary least squares and maximum likelihood approaches to parameter estimation, we made the assumption of constant variance: the variance of an observation is the same regardless of the values of the explanatory variables associated with it. Since the explanatory variables determine the mean value of the observation, we are in effect assuming that the variance of an observation is unrelated to its mean.
[Figure: plots of y against x and of the residuals ε̂ against the fitted values ŷ for data in which Var[yᵢ] ≠ σ².]
There are many real situations in which this assumption is inappropriate. In some cases the measurement system used might be a source of variability, with the size of the measurement error proportional to the measured quantity. In other cases the errors are correlated. Also, when the underlying distribution is continuous but skewed, such as the lognormal or gamma, the variance is not constant, and in many cases it is a function of the mean.
An important point is that the constant variance assumption is linked to the assumption of a normal distribution for the response.
When the assumption of constant variance is not satisfied, a possible solution is to transform the data (for example, taking the log of the response variable and/or the explanatory variables) to achieve constant variance. Another approach is based on generalized or weighted least squares, a modification of ordinary least squares that takes into account the inequality of variance in the observations. Weighted least squares also plays an important role in parameter estimation for generalized linear models.
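A minimal sketch of these two remedies in R; the data frame dat, its variables y and x, and the assumed variance structure are hypothetical, purely for illustration:

# Hypothetical data frame 'dat' with response y and predictor x.
# Remedy 1: transform the response, e.g. take logs, hoping that
# the variance is constant on the log scale.
fit.log <- lm(log(y) ~ x, data = dat)

# Remedy 2: weighted least squares. Here we assume, for
# illustration only, that the standard deviation of y is
# proportional to x, so the weights are proportional to 1/x^2.
fit.wls <- lm(y ~ x, data = dat, weights = 1/x^2)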
2 Generalized Least Squares

Consider the model
$$Y = X\beta + \varepsilon, \qquad E[\varepsilon] = 0, \qquad Var[\varepsilon] = \sigma^2 V,$$
where V is a known n × n matrix. If V is diagonal but with unequal diagonal elements,
the observations y are uncorrelated but have unequal variance, while if V has non-zero off-
diagonal elements, the observations are correlated.
Since V is positive definite, we can factor it as V = KK′ (for example via the Cholesky decomposition) and transform the model: let
$$z = K^{-1}y, \qquad B = K^{-1}X, \qquad g = K^{-1}\varepsilon \quad\Rightarrow\quad z = B\beta + g \qquad (2.1)$$
Then $E[g] = K^{-1}E[\varepsilon] = 0$ and
$$Var[g] = Var[K^{-1}\varepsilon] = K^{-1}Var[\varepsilon](K^{-1})' = \sigma^2 K^{-1}KK'(K')^{-1} = \sigma^2 I,$$
and so we are under the assumptions of ordinary least squares. Applying ordinary least squares to the transformed model gives the generalized least squares estimator
$$\hat\beta = (B'B)^{-1}B'z = (X'V^{-1}X)^{-1}X'V^{-1}y$$
and its variance is
$$Var[\hat\beta] = \sigma^2(B'B)^{-1} = \sigma^2(X'V^{-1}X)^{-1}.$$
Again, under normal theory, the generalized least squares estimator is the maximum likelihood estimator, since the log-likelihood function is
$$L \propto -\frac{n}{2}\ln(\sigma^2) - \frac{1}{2}\ln|V| - \frac{1}{2\sigma^2}(y - X\beta)'V^{-1}(y - X\beta)$$
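Differentiating with respect to β and setting the derivative to zero gives the generalized least squares normal equations, confirming the estimator above:
$$\frac{\partial L}{\partial \beta} = \frac{1}{\sigma^2}\,X'V^{-1}(y - X\beta) = 0 \quad\Rightarrow\quad X'V^{-1}X\,\hat\beta = X'V^{-1}y.$$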
The analysis of variance table is:

Source       df      SS                                                  MS                   F
Regression   p − 1   SSR = y'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}y            MSR = SSR/(p − 1)    MSR/MSE
Error        n − p   SSE = y'V^{-1}y − y'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}y MSE = SSE/(n − p)
Total        n − 1   SST = y'V^{-1}y                                     MST = SST/(n − 1)
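A minimal sketch of how the entries of this table could be computed in R, assuming a response vector y, an n × p model matrix X (with intercept) and a known V; these objects are assumptions here, constructed explicitly only in the examples below:

# Sums of squares for the generalized least squares ANOVA table.
V.inv <- solve(V)
n <- nrow(X)
p <- ncol(X)
A <- V.inv %*% X %*% solve(t(X) %*% V.inv %*% X) %*% t(X) %*% V.inv
SSR <- drop(t(y) %*% A %*% y)                  # regression SS
SST <- drop(t(y) %*% V.inv %*% y)              # total SS
SSE <- SST - SSR                               # error SS
Fstat <- (SSR / (p - 1)) / (SSE / (n - p))     # F statistic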
When V is diagonal, generalized least squares reduces to weighted least squares, with weights wᵢ given by the reciprocals of the diagonal elements of V. In practice the weights are often unknown and must be estimated from the data, which can be done iteratively (see the sketch after this list):

1. Start with wᵢ = 1.
2. Estimate β by weighted least squares.
3. Use the residuals to estimate the variance function, and hence update the weights wᵢ.
4. Re-estimate β using the new weights.

Continue until convergence. More details on the effect of the method on β̂ can be found in Carroll and Ruppert (1988).
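A sketch of this iteration in R; the data frame dat and the log-linear variance model used in step 3 are illustrative assumptions, not prescribed by the notes:

# Iteratively reweighted least squares, assuming Var[y_i] is a
# smooth function of the mean, positive fitted values and no
# exactly-zero residuals.
w <- rep(1, nrow(dat))                          # 1. start with w_i = 1
for (iter in 1:50) {
  fit <- lm(y ~ x, data = dat, weights = w)     # 2. weighted fit
  # 3. estimate the variance function by regressing the log squared
  #    residuals on the log fitted values
  vfit <- lm(log(resid(fit)^2) ~ log(fitted(fit)))
  w.new <- 1 / exp(fitted(vfit))                # 4. updated weights
  if (max(abs(w.new - w)) < 1e-8) break         # stop at convergence
  w <- w.new
}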
3 Examples
The following examples are taken from Chapter 5 of Faraway (2002).
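The ordinary least squares fit producing the output below is not shown in the source; presumably it was obtained with something like

> g<-lm(Employed~GNP+Population,data=longley)
> summary(g,cor=T)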
Call:
lm(formula = Employed ~ GNP + Population, data = longley)
Residuals:
Min 1Q Median 3Q Max
-0.80899 -0.33282 -0.02329 0.25895 1.08800
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 88.93880 13.78503 6.452 2.16e-05
GNP 0.06317 0.01065 5.933 4.96e-05
Population -0.40974 0.15214 -2.693 0.0184
Correlation of Coefficients:
(Intercept) GNP
GNP 0.98
Population -1.00 -0.99
In data collected over time, such as this, the errors could be correlated. Assume that the errors take an autoregressive form,
$$\varepsilon_{i+1} = \rho\,\varepsilon_i + \delta_i,$$
where the δᵢ are independent errors. The lag-one correlation of the residuals then gives an estimate of ρ:
> cor(g$res[-1],g$res[-16])
[1] 0.3104092
A model with autoregressive errors has covariance matrix $V_{ij} = \rho^{|i-j|}$. Assume that ρ is known and equal to 0.3104092. Then V is computed as
> V<-diag(16)
> V<-0.3104092^abs(row(V)-col(V))
and the generalized least squares estimate $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y$ is,
> X<-model.matrix(g)
> V.inv<-solve(V)
> beta<-solve(t(X)%*%V.inv%*%X)%*%t(X)%*%V.inv%*%longley$Empl
> beta
[,1]
(Intercept) 94.89887752
GNP 0.06738948
Population -0.47427391
The standard error of β̂, $\sqrt{Var(\hat\beta)} = \sqrt{\sigma^2(X'V^{-1}X)^{-1}}$, is
> res<-longley$Empl-X%*%beta
> sig<-sum(res^2)/g$df
> sqrt(diag(solve(t(X)%*%V.inv%*%X))*sig)
(Intercept) GNP Population
14.15760467 0.01086675 0.15572652
Another way to fit this model is to use the lm() function with the transformed model of equation (2.1):
> K<-chol(V)          # upper triangular, with V = t(K) %*% K
> K.inv<-solve(t(K))  # t(K) plays the role of K in equation (2.1)
> B<-K.inv%*%X
> z<-K.inv%*%longley$Empl
> lm(z~B-1)$coeff
B(Intercept) BGNP BPopulation
94.89887752 0.06738948 -0.47427391
In practice we do not know the value of ρ, so we estimate it again, this time from the residuals of the generalized least squares fit,
> cor(res[-1],res[-16])
[1] 0.3564161
and then fit the model again, iterating until convergence.
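A sketch of that iteration, reusing the model matrix X and the longley data from above; the tolerance and the iteration cap are arbitrary choices:

# Alternate between estimating rho from the residuals and
# re-fitting by generalized least squares.
rho <- 0.3104092
for (i in 1:25) {
  V <- rho^abs(row(diag(16)) - col(diag(16)))   # AR(1) covariance
  V.inv <- solve(V)
  beta <- solve(t(X)%*%V.inv%*%X)%*%t(X)%*%V.inv%*%longley$Empl
  res <- longley$Empl - X%*%beta
  rho.new <- cor(res[-1], res[-16])             # lag-one correlation
  if (abs(rho.new - rho) < 1e-6) break          # converged
  rho <- rho.new
}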
A third option is to use the nlme library, which contains the gls() function. We can use it to fit this model:
> library(nlme)
> g<-gls(Employed~GNP+Population,correlation=corAR1(form=~Year),data=longley)
> summary(g)
Generalized least squares fit by REML
Model: Employed ~ GNP + Population
Data: longley
AIC BIC logLik
44.66377 47.48852 -17.33188
Coefficients:
Value Std.Error t-value p-value
(Intercept) 101.85813 14.198932 7.173647 0.0000
GNP 0.07207 0.010606 6.795485 0.0000
Population -0.54851 0.154130 -3.558778 0.0035
Correlation:
(Intr) GNP
GNP 0.943
Population -0.997 -0.966
Standardized residuals:
Min Q1 Med Q3 Max
-1.5924564 -0.5447822 -0.1055401 0.3639202 1.3281898
The final estimate for ρ is 0.644. However, if we check the confidence intervals
> intervals(g)
Approximate 95% confidence intervals
Coefficients:
lower est. upper
(Intercept) 71.18320440 101.85813280 132.5330612
GNP 0.04915865 0.07207088 0.0949831
Population -0.88149053 -0.54851350 -0.2155365
attr(,"label")
[1] "Coefficients:"
Correlation structure:
lower est. upper
Phi -0.4430373 0.6441692 0.9644866
attr(,"label")
[1] "Correlation structure:"
we see that the interval for ρ (labelled Phi) contains zero, so ρ is not significantly different from zero and the serial correlation can be ignored.
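The second example uses the strongx data, in which each observation comes with a known standard deviation sd, so the weights for weighted least squares are known in advance. The unweighted fit summarized below is not shown in the source; presumably it was produced by

> gu<-lm(crossx~energy,data=strongx)
> summary(gu)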
Call:
lm(formula = crossx ~ energy, data = strongx)

Residuals:
    Min      1Q  Median      3Q     Max
-14.773  -9.319  -2.829   5.571  19.818

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   135.00      10.08   13.40 9.21e-07 ***
energy        619.71      47.68   13.00 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[Figure: Residuals vs Fitted plot for lm(crossx ~ energy, data = strongx).]

Since each observation comes with a stated standard deviation, we now fit by weighted least squares with weights sd^-2:
> g<-lm(crossx~energy,data=strongx,weights=sd^-2)
> summary(g)
Call:
lm(formula = crossx ~ energy, data = strongx, weights = sd^-2)
Residuals:
Min 1Q Median 3Q Max
-2.323e+00 -8.842e-01 1.266e-06 1.390e+00 2.335e+00
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 148.473 8.079 18.38 7.91e-08 ***
energy 530.835 47.550 11.16 3.71e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
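The estimates σ̂ can be extracted directly from the two fitted objects (a small sketch, using the fits gu and g from above):

> summary(gu)$sigma
> summary(g)$sigma

For the weighted fit, a value of σ̂ close to 1 would indicate that the stated standard deviations are consistent with the residual variation.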
Compare the value of σ̂ in the two summaries, and the two fits in the following picture:
> plot(crossx~energy,data=strongx)
> abline(g)
> abline(gu,lty=2)
[Figure: crossx plotted against energy, with the weighted fit (solid line) and the unweighted fit (dashed line).]
Bibliography
Carroll, R. J. and Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall.

Faraway, J. J. (2002). Practical Regression and Anova using R. https://cran.r-project.org/doc/contrib/Faraway-PRA.pdf.

Longley, J. (1967). An appraisal of least squares programs from the point of view of the user. Journal of the American Statistical Association, 62:819–841.