Linear Algebra With R
March 2015
http://BendixCarstensen/SDC/R-course
Version 5.2
Compiled Monday 11th April, 2016, 11:21
from: /home/bendix/teach/APC/00/LinearAlgebra/linalg-notes-BxC.tex
Contents

1 Matrix algebra
  1.1 Introduction
  1.2 Vectors
      1.2.1 Vectors
      1.2.2 Transpose of vectors
      1.2.3 Multiplying a vector by a number
      1.2.4 Sum of vectors
      1.2.5 Inner product of vectors
      1.2.6 The length (norm) of a vector
      1.2.7 The 0–vector and 1–vector
      1.2.8 Orthogonal (perpendicular) vectors
  1.3 Matrices
      1.3.1 Matrices
      1.3.2 Multiplying a matrix with a number
      1.3.3 Transpose of matrices
      1.3.4 Sum of matrices
      1.3.5 Multiplication of a matrix and a vector
      1.3.6 Multiplication of matrices
      1.3.7 Vectors as matrices
      1.3.8 Some special matrices
      1.3.9 Inverse of matrices
      1.3.10 Solving systems of linear equations
      1.3.11 Some additional rules for matrix operations
      1.3.12 Details on inverse matrices
  1.4 A little exercise — from a bird's perspective

2 Linear models
  2.1 Least squares estimation
  2.2 Linear models
      2.2.1 What goes on in least squares?
      2.2.2 Projections in Epi
      2.2.3 Constructing confidence intervals
  2.3 Showing an estimated curve
      2.3.1 Splines
  2.4 Reparametrizations
Chapter 1
Matrix algebra
1.1 Introduction
These notes have two aims: 1) introducing linear algebra (vectors and matrices) and 2)
showing how to work with these concepts in R. They were written in an attempt to give a
specific group of students a “feeling” for what matrices, vectors etc. are all about. Hence
the notes/slides are not suitable for a course in linear algebra.
1.2 Vectors
1.2.1 Vectors
A column vector is a list of numbers stacked on top of each other, e.g.
$$a = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$$
A row vector is a list of numbers written one after the other, e.g.
b = (2, 1, 3)
In both cases, the list is ordered, i.e.
$$(2, 1, 3) \neq (1, 2, 3).$$
We make the following convention:
• In what follows all vectors are column vectors unless otherwise stated.
• However, writing column vectors takes up more space than row vectors. Therefore we
shall frequently write vectors as row vectors, but with the understanding that it
really is a column vector.
A general n–vector has the form
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
where the $a_i$'s are numbers, and this vector shall be written $a = (a_1, \dots, a_n)$.
A graphical representation of 2–vectors is shown in Figure 1.1. Note that row and column vectors are drawn the same way.
[Figure 1.1: Geometrical representation of the 2–vectors $a_1 = (2,2)$ and $a_2 = (1,-0.5)$.]
The vector a is in R printed “in row format” but can really be regarded as a column vector,
cfr. the convention above.
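A vector is entered in R with the c (combine) function; a small sketch (the vector a is also used in the examples below):
> a <- c(1,3,2)
> a
[1] 1 3 2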
Example 1.2.1
$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}^\top = [1, 3, 2]
\qquad \text{and} \qquad
[1, 3, 2]^\top = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$
$$a = (a^\top)^\top$$
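In R, transposing is done with t(); a small sketch, assuming the vector a from above. Note that the result of t(a) is a 1 × 3 matrix:
> t(a)
     [,1] [,2] [,3]
[1,]    1    3    2
> t(t(a))
     [,1]
[1,]    1
[2,]    3
[3,]    2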
Example 1.2.2
$$7 \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 7 \\ 21 \\ 14 \end{pmatrix}$$
Multiplication by a number:
> 7*a
[1] 7 21 14
See Figures 1.3 and 1.4. Only vectors of the same dimension can be added.
[Figure 1.2: Multiplication of a vector by a number: $a_2 = (1,-0.5)$, $2a_2 = (2,-1)$ and $-a_2 = (-1,0.5)$, with $a_1 = (2,2)$ shown for reference.]
Example 1.2.3
$$\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix}
= \begin{pmatrix} 1+2 \\ 3+8 \\ 2+9 \end{pmatrix}
= \begin{pmatrix} 3 \\ 11 \\ 11 \end{pmatrix}$$
Addition of vectors:
> a <- c(1,3,2)
> b <- c(2,8,9)
> a+b
[1] 3 11 11
[Figure 1.3: The sum of the vectors $a_1 = (2,2)$ and $a_2 = (1,-0.5)$: $a_1 + a_2 = (3,1.5)$.]
The inner product of two $n$–vectors $a$ and $b$ is the number $a \cdot b = a_1 b_1 + \dots + a_n b_n$. In R it is computed as:
> sum(a*b)
[1] 44
[Figure 1.4: The difference of the vectors $a_1$ and $a_2$: $a_1 + (-a_2) = (1,2.5)$, where $-a_2 = (-1,0.5)$.]
The 0–vector (1–vector) is the vector with 0 (1) on all entries. In R:
> rep(0,5)
[1] 0 0 0 0 0
> rep(1,5)
[1] 1 1 1 1 1
Two vectors $v_1$ and $v_2$ are orthogonal (perpendicular) exactly when their inner product is zero:
$$v_1 \perp v_2 \quad \Leftrightarrow \quad v_1 \cdot v_2 = 0$$
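A quick check in R (the vectors v1 and v2 are chosen here just for illustration):
> v1 <- c(1,1)
> v2 <- c(-1,1)
> sum(v1*v2)
[1] 0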
1.3 Matrices
1.3.1 Matrices
An $r \times c$ matrix $A$ (read “an r times c matrix”) is a table with $r$ rows and $c$ columns:
$$A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1c} \\
a_{21} & a_{22} & \dots & a_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \dots & a_{rc}
\end{pmatrix}$$
Note that one can regard $A$ as consisting of $c$ column vectors put after each other:
$$A = [a_1 : a_2 : \cdots : a_c]$$
Likewise one can regard $A$ as consisting of $r$ row vectors stacked on top of each other.
Now, create a matrix:
> A <- matrix(c(1,3,2,2,8,9),ncol=3)
> A
[,1] [,2] [,3]
[1,] 1 2 8
[2,] 3 2 9
Note that the numbers 1, 3, 2, 2, 8, 9 are read into the matrix column–by–column. To get
the numbers read in row–by–row, do:
> A2 <- matrix(c(1,3,2,2,8,9),ncol=3,byrow=T)
> A2
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 8 9
Some find it easier to create (small) matrices using rbind because you get to enter the
numbers as they appear in the matrix:
> A3 <- rbind( c(1,3,2),
+ c(2,8,9) )
> A3
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 8 9
Example 1.3.1
$$7 \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}
= \begin{pmatrix} 7 & 14 \\ 21 & 56 \\ 14 & 63 \end{pmatrix}$$
Example 1.3.2
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}^\top
= \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix}$$
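In R the same transpose is computed with t(); a small sketch where the matrix from the example is entered with rbind (the name M is just a label chosen here):
> ( M <- rbind( c(1,2), c(3,8), c(2,9) ) )
     [,1] [,2]
[1,]    1    2
[2,]    3    8
[3,]    2    9
> t(M)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    8    9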
Example 1.3.3
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}
+ \begin{pmatrix} 5 & 4 \\ 8 & 2 \\ 3 & 7 \end{pmatrix}
= \begin{pmatrix} 6 & 6 \\ 11 & 10 \\ 5 & 16 \end{pmatrix}$$
Addition of matrices
> B <- matrix(c(5,8,3,4,2,7),ncol=3,byrow=T)
> A+B
[,1] [,2] [,3]
[1,] 6 10 11
[2,] 7 4 16
Example 1.3.4
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}
\begin{pmatrix} 5 \\ 8 \end{pmatrix}
= \begin{pmatrix} 1 \cdot 5 + 2 \cdot 8 \\ 3 \cdot 5 + 8 \cdot 8 \\ 2 \cdot 5 + 9 \cdot 8 \end{pmatrix}
= \begin{pmatrix} 21 \\ 79 \\ 82 \end{pmatrix}$$
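In R, matrix–vector multiplication uses the %*% operator; a sketch using the matrix M entered above:
> M %*% c(5,8)
     [,1]
[1,]   21
[2,]   79
[3,]   82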
Example 1.3.5
$$\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}
\begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix}
= \left[ \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 \\ 8 \end{pmatrix}
: \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 4 \\ 2 \end{pmatrix} \right]
= \begin{pmatrix}
1 \cdot 5 + 2 \cdot 8 & 1 \cdot 4 + 2 \cdot 2 \\
3 \cdot 5 + 8 \cdot 8 & 3 \cdot 4 + 8 \cdot 2 \\
2 \cdot 5 + 9 \cdot 8 & 2 \cdot 4 + 9 \cdot 2
\end{pmatrix}
= \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}$$
Note that the product AB can only be formed if the number of rows in B and the
number of columns in A are the same. In that case, A and B are said to be conform.
In general AB and BA are not identical.
A mnemonic for matrix multiplication is to write the second factor above and to the right of the first; each entry of the product is then the inner product of the row of the first matrix to its left and the column of the second matrix above it:
$$\begin{array}{cc}
 & \begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix} \\[1ex]
\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} &
\begin{pmatrix}
1 \cdot 5 + 2 \cdot 8 & 1 \cdot 4 + 2 \cdot 2 \\
3 \cdot 5 + 8 \cdot 8 & 3 \cdot 4 + 8 \cdot 2 \\
2 \cdot 5 + 9 \cdot 8 & 2 \cdot 4 + 9 \cdot 2
\end{pmatrix}
= \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}
\end{array}$$
Matrix multiplication:
> A <- matrix(c(1,3,2,2,8,9),ncol=2)
> B <- matrix(c(5,8,4,2), ncol=2)
> A%*%B
[,1] [,2]
[1,] 21 8
[2,] 79 28
[3,] 82 26
• A matrix with 0 on all entries is the 0–matrix and is often written simply as 0.
• A diagonal matrix with 1s on the diagonal is called the identity matrix and is
denoted I. The identity matrix satisfies that IA = AI = A. Likewise, if x is a vector
then Ix = x.
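In R a diagonal matrix, and in particular the identity matrix, is created with diag(); a small sketch:
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag( c(1,2,3) )
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3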
The inverse of an $n \times n$ matrix $A$ is the $n \times n$ matrix $B$ which (when it exists) satisfies
$$AB = BA = I.$$
One says that $B$ is $A$'s inverse and writes $B = A^{-1}$. Likewise, $A$ is $B$'s inverse.
Example 1.3.6
Let
$$A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}
\qquad
B = \begin{pmatrix} -2 & 1.5 \\ 1 & -0.5 \end{pmatrix}$$
Now $AB = BA = I$, so $B = A^{-1}$.
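In R the inverse of a matrix is computed with solve(); a sketch using the matrix from Example 1.3.6 (note that this overwrites the R object A created earlier):
> A <- rbind( c(1,3), c(2,4) )
> solve(A)
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5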
Example 1.3.7
If $A$ is a $1 \times 1$ matrix, i.e. a number, for example $A = 4$, then $A^{-1} = 1/4$.
Consider the system of two linear equations in two unknowns:
$$\begin{aligned} x_1 + 3x_2 &= 7 \\ 2x_1 + 4x_2 &= 10 \end{aligned}$$
In matrix form this is $Ax = b$ with $A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$, $x = (x_1, x_2)^\top$ and $b = (7, 10)^\top$, so the solution is $x = A^{-1}b$.
[Figure: The two lines $x_1 + 3x_2 = 7$ and $2x_1 + 4x_2 = 10$; they intersect in the solution $(x_1, x_2) = (1, 2)$.]
In R the solution can be computed by inverting $A$ and multiplying onto $b$:
> A <- rbind( c(1,3), c(2,4) )
> b <- c(7,10)
> solve(A) %*% b
     [,1]
[1,]    1
[2,]    2
Actually, the reason for the name “solve” for the matrix inverter is that it solves
(several) systems of linear equations; the second argument of solve just defaults to the
identity matrix. Hence the above system can be solved in one go by:
> solve(A,b)
[1] 1 2
A diagonal matrix $A = \mathrm{diag}(a_1, a_2, \dots, a_n)$ with all $a_i \neq 0$ has the inverse $A^{-1} = \mathrm{diag}(1/a_1, 1/a_2, \dots, 1/a_n)$.
A generalized inverse (G–inverse) of a matrix $A$ is a matrix $A^-$ that satisfies
$$A A^- A = A.$$
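One way to get a generalized inverse in R is the ginv function from the MASS package; a sketch using the 2 × 3 matrix from section 1.3.1 (the name Am is a label chosen here):
> library(MASS)
> A <- matrix( c(1,3,2,2,8,9), ncol=3 )   # the 2 x 3 matrix from before
> Am <- ginv(A)                           # a generalized inverse of A
> round( A %*% Am, 6 )                    # A has full rank, so A %*% Am = I
     [,1] [,2]
[1,]    1    0
[2,]    0    1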
Note that since $A$ is $2 \times 3$, $A^-$ is $3 \times 2$, so the matrix $AA^-$ is the smaller of the two
square matrices $AA^-$ and $A^-A$. Because $A$ is of full rank (and only then), $AA^- = I$. This
is the case for any G–inverse of a full rank matrix.
For many practical problems it suffices to find a generalized inverse. We shall return to
this in the discussion of reparametrization of models.
• Solve these equations using the methods you have learned from linear algebra.
Chapter 2
Linear models
2.1 Least squares estimation
Suppose we have five pairs of observations $(x_i, y_i)$ and want to describe them by a straight line
$$y_i = \beta_0 + \beta_1 x_i \qquad \text{for } i = 1, \dots, 5$$
Collecting the responses in the vector $y$ and letting $X$ be the $5 \times 2$ matrix with a column of 1s and a column of the $x_i$'s, the model can be written $y = X\beta$ with $\beta = (\beta_0, \beta_1)^\top$.
The first question is: Can we find a vector $\beta$ such that $y = X\beta$? The answer is clearly no,
because that would require the points to lie exactly on a straight line.
A more modest question is: Can we find a vector $\hat\beta$ such that $X\hat\beta$ is in a sense “as close
to $y$ as possible”? The answer is yes. The task is to find $\hat\beta$ such that the length of the residual vector
$$e = y - X\hat\beta$$
is as small as possible.
> y
[1] 3.7 4.2 4.9 5.7 6.0
> X
[Figure 2.1: The five data points $(x_i, y_i)$.]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
The least squares solution is $\hat\beta = (X^\top X)^{-1} X^\top y$, which in R is:
> beta.hat <- solve( t(X)%*%X ) %*% t(X)%*%y
> beta.hat
  [,1]
  3.07
x 0.61
The linear model assumes that
$$y \sim N(X\beta, \sigma^2 I_n)$$
where $I_n$ is the $n \times n$ identity matrix.
The least squares fitted values are $\hat y = X\hat\beta = X(X^\top X)^{-1}X^\top y$.
This is the projection of the $y$-vector on the column space of $X$; it is the vector with the smallest distance to $y$ among all linear combinations of the columns of $X$. The matrix applied to $y$ is called the projection matrix,
$$P_X = X(X^\top X)^{-1}X^\top.$$
In least squares regression the fitted values are $P_X y$. The projection matrix is exactly the matrix for which the difference between $y$ and its projection is orthogonal to the columns of $X$:
$$(y - P_X y) \perp X$$
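In R the projection matrix can be formed explicitly and the orthogonality checked numerically; a minimal sketch assuming the X and y from the example above (output not shown):
> P <- X %*% solve( t(X) %*% X ) %*% t(X)   # the projection matrix P_X
> P %*% y                                   # fitted values, same as X %*% beta.hat
> t(X) %*% ( y - P %*% y )                  # orthogonality: numerically zero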
The orthogonality means that every column of $X$ is orthogonal to the residual vector
$y - P_X y = (I - P_X)y$, for any $y$. Hence every column of $X$ should be orthogonal to
every column of $(I - P_X)$, so if we take the transpose of $X$ (which is $p \times n$) and multiply
it with the matrix $(I - P_X)$ (which is $n \times n$) we should get 0:
$$(I - P_X) \perp X \quad \Leftrightarrow \quad X^\top (I - P_X) = 0$$
Orthogonality of two vectors means that their inner product is 0. But this requires an
inner product, and we have implicitly assumed that the inner product between two
$n$-vectors is defined as:
$$\langle a \mid b \rangle = \sum_i a_i b_i = a^\top b$$
But for any positive definite matrix $M$ we can define an inner product as:
$$\langle a \mid b \rangle = a^\top M b$$
In particular we can use a diagonal matrix $W$ with positive values in the diagonal:
$$\langle a \mid b \rangle = a^\top W b = \sum_i a_i w_i b_i$$
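A small numerical check in R (the weights w are an arbitrary choice, used here only for illustration):
> a <- c(1,3,2) ; b <- c(2,8,9) ; w <- c(1,2,3)
> sum( a * w * b )
[1] 104
> t(a) %*% diag(w) %*% b
     [,1]
[1,]  104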
A projection with respect to this inner product on the column space of $X$ is given by the matrix:
$$P_W = X (X^\top W X)^{-1} X^\top W$$
Exercise 2.2.1
Show that
$$(y - P_W y) \perp_W X$$
where $\perp_W$ is orthogonality with respect to the inner product induced by the diagonal
matrix $W$.
2.2.3 Constructing confidence intervals
A table of parameter estimates with 95% confidence intervals has three columns: the estimate, the lower and the upper confidence limit. These three columns can be computed from the estimates and their standard errors by taking the $p \times 2$ matrix $(\hat\beta, \hat\sigma)$ and post-multiplying it with the $2 \times 3$ matrix:
$$\begin{pmatrix} 1 & 1 & 1 \\ 0 & -1.96 & 1.96 \end{pmatrix}$$
In the Epi package the function ci.mat produces exactly this matrix:
> library(Epi)
> ci.mat()
Estimate 2.5% 97.5%
[1,] 1 1.000000 1.000000
[2,] 0 -1.959964 1.959964
> ci.mat(alpha=0.1)
Estimate 5.0% 95.0%
[1,] 1 1.000000 1.000000
[2,] 0 -1.644854 1.644854
2.3 Showing an estimated curve
The matrix we use to multiply with the parameter estimates is the age-part of the design
matrix we would have if observations were made at ages $a_1, a_2, \dots, a_n$. The product of this piece
of the design matrix and the parameter vector is the function $f(a)$ evaluated at the
ages $a_1, a_2, \dots, a_n$.
We can illustrate this approach by an example from the Epi package where we model the
birth weight as a function of the gestational age, using the births dataset:
> data(births)
> str(births)
'data.frame': 500 obs. of 8 variables:
$ id : num 1 2 3 4 5 6 7 8 9 10 ...
$ bweight: num 2974 3270 2620 3751 3200 ...
$ lowbw : num 0 0 0 0 0 0 0 0 0 0 ...
$ gestwks: num 38.5 NA 38.2 39.8 38.9 ...
$ preterm: num 0 NA 0 0 0 0 0 0 0 0 ...
$ matage : num 34 30 35 31 33 33 29 37 36 39 ...
$ hyp : num 0 0 0 0 1 0 0 0 0 0 ...
$ sex : num 2 1 2 1 1 2 2 1 2 1 ...
> ma <- lm( bweight ~ gestwks + I(gestwks^2), data=births )
> ( beta <- coef( ma ) )
(Intercept) gestwks I(gestwks^2)
-8247.693621 406.461711 -2.893052
> ( cova <- vcov( ma ) )
(Intercept) gestwks I(gestwks^2)
(Intercept) 5307195.933 -292330.6903 3995.943420
gestwks -292330.690 16204.3875 -222.720377
I(gestwks^2) 3995.943 -222.7204 3.075777
If we want to predict the birth weight for gestational ages 32 to 42 weeks, say, we just set
up the matrices needed and perform the computations:
> ga.pts <- 32:42
> G <- cbind( 1, ga.pts, ga.pts^2 )
> wt.eff <- G %*% beta
> sd.eff <- sqrt( diag(G %*% cova %*% t(G)) )
> wt.pred <- cbind( wt.eff, sd.eff ) %*% ci.mat()
> matplot( ga.pts, wt.pred, type="l", lwd=c(3,1,1), col="black", lty=1 )
This machinery has been implemented in the function ci.lin which takes a subset of
the parameters from the model (selected via grep), and multiplies them (and the
corresponding variance-covariance matrix) with a supplied matrix:
> wt.ci <- ci.lin( ma, ctr.mat=G )[,c(1,5,6)]
> matplot( ga.pts, wt.ci, type="l", lty=1, lwd=c(3,1,1), col="black" )
Figure 2.2: Prediction of the gestational age-specific birth weight. Left panel computed “by
hand”, right panel using ci.lin.
The example with the age-specific birth weights is a bit irrelevant because they could
have been obtained by the predict function — or even simpler, the ci.pred function from
the Epi package.
The interesting application is with more than one continuous variable, say gestational
age and maternal age. Suppose we have a quadratic effect of both gestational age and
maternal age:
> ma2 <- lm( bweight ~ gestwks + I(gestwks^2) +
+ matage + I(matage^2), data=births )
> ci.lin( ma2 )
Estimate StdErr z P 2.5% 97.5%
(Intercept) -7404.0059740 2592.084329 -2.8563909 0.004284873 -12484.397903 -2323.614045
gestwks 409.2978884 127.755598 3.2037570 0.001356469 158.901518 659.694259
I(gestwks^2) -2.9284416 1.759843 -1.6640354 0.096105363 -6.377671 0.520788
matage -53.8976266 75.249616 -0.7162512 0.473836261 -201.384163 93.588910
I(matage^2) 0.7959908 1.118262 0.7118107 0.476582033 -1.395762 2.987744
and that we want to show the effect of gestational age for a maternal age of 35 (pretty
close to the median) and the effect of maternal age relative to this reference point. (These
two curves can be added to give predicted values of birth weight for any combination of
gestational age and maternal age.)
First we must specify the reference point for the maternal age, and also the points for
maternal age for which we want to make predictions
> ma.ref <- 35
> ma.pts <- 20:45
> M <- cbind(ma.pts,ma.pts^2)
> M.ref <- cbind(ma.ref,ma.ref^2)
Then we derive the estimated birth weight for a maternal age of 35, as a function of gestational
age. To this end we need a contrast matrix which in each row has a 1, the
gestational age and the gestational age squared for the gestational ages of
interest (that is, the matrix G defined above), followed by the reference maternal age and its
square (i.e. the same two values in every row):
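The code for this step is not shown above; a sketch of how it might look (the names CM.ga and bw.ga are labels chosen here; G, M.ref, ma2 and ga.pts are as defined above):
> CM.ga <- cbind( G, M.ref[ rep(1,nrow(G)), ] )   # G columns plus identical reference rows
> bw.ga <- ci.lin( ma2, ctr.mat=CM.ga )[ , c(1,5,6) ]
> matplot( ga.pts, bw.ga, type="l", lty=1, lwd=c(3,1,1), col="black" )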
Then, to show the effect of maternal age on birth weight relative to the maternal age
reference of 35, we use only the maternal age estimates, but subtract the row corresponding to
the reference point from each row:
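Again the code is not shown above; a sketch (the names CM.ma and bw.ma are labels chosen here; M, M.ref and ma.pts are as defined above):
> CM.ma <- M - M.ref[ rep(1,nrow(M)), ]   # maternal-age rows minus the reference row
> bw.ma <- ci.lin( ma2, subset="matage", ctr.mat=CM.ma )[ , c(1,5,6) ]
> matplot( ma.pts, bw.ma, type="l", lty=1, lwd=c(3,1,1), col="black" )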
Figure 2.3: Prediction of the gestational age-specific birth weight for maternal age 35 (left)
and the effect of maternal age on birth weight, relative to age 35.
Exercise 2.3.1
Redo the two graphs in figure 2.3 with y-axes that have the same extent (in grams birth
weight).
Why would you want to do that?
2.3.1 Splines
The real advantage, however, comes when simple functions such as the quadratic are replaced by
spline functions, where the model is still a parametric function of the variables,
implemented via the functions ns and bs from the splines package.
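To give a flavour, a spline version of the birth weight model might look like this (a sketch only; the knot locations here are arbitrary choices, not taken from the text):
> library(splines)
> ms <- lm( bweight ~ ns( gestwks, knots=c(32,36,40) ) +
+                     ns( matage, knots=c(25,32,39) ), data=births )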
Exercise 2.3.2
Repeat the exercise above, where you replace the quadratic terms with natural splines
(using ns() with explicit knots), both in the model and in the construction of predictions
and contrasts.
2.4 Reparametrizations
Above we saw how to compute linear functions of estimated parameters using
ci.lin. But suppose you have a model parametrized by the linear predictor $X\beta$, while
you really wanted the parametrization $A\gamma$, where the columns of $X$ and $A$ span the same
linear space. Then $X\beta = A\gamma$, and we assume that both $X$ and $A$ are of full rank,
$\dim(X) = \dim(A) = n \times p$, say.
We want to find $\gamma$ given that we know $X\beta$ and that $X\beta = A\gamma$. Since $A$ has full rank
$p < n$, we have $A^- A = I$ by the properties of G–inverses, and hence:
$$\gamma = A^- A \gamma = A^- X \beta$$
1. Given that you have a set of fitted values from a model (in casu $\hat y = X\hat\beta$) and you want
the parameter estimates you would get if you had used the model matrix $A$: they are
$\gamma = A^- \hat y = A^- X \hat\beta$.
2. Given that you have a set of parameters $\beta$ from fitting a model with design matrix
$X$, and you would like the parameters $\gamma$ you would have got had you used the model
matrix $A$: they are $\gamma = A^- X \beta$.
The latter point can be illustrated by linking the two classical parametrizations of a
model with a factor, either with or without intercept:
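A sketch of how this might look in R for a 3-level factor (the factor f and the matrix names are chosen here for illustration; ginv is from the MASS package):
> library(MASS)
> f <- factor( rep(1:3, each=2) )
> X <- model.matrix( ~ f )          # intercept and two contrasts
> A <- model.matrix( ~ f - 1 )      # three group means
> round( ginv(A) %*% X )            # gamma = ginv(A) %*% X %*% beta: maps beta to group means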
The last resulting matrix is clearly the one that translates from an intercept and two
contrasts to three group means.
The practical use of this may be somewhat limited, because it requires that you know
the “new” model matrix $A$, and if that is available, you might as well just fit the model using it.