What Is Linear Algebra and Why Do I Need It?
These notes accompany the part of the course lectured by Mark Kambites.
• Linear algebra expresses some of the fundamental objects of geometry in a formal algebraic way. It
allows us to use equational reasoning to understand geometry.
• Many real-world problems (both those of a geometric flavour and others) are modelled with linear
algebra, which provides a powerful toolkit to solve these problems.
Practicalities
Lecturer (first half). Professor Mark Kambites (email [email protected]).
Notes and Lectures. Notes for Mark’s part of the course will be provided with gaps for you to
complete in lectures, on Mark’s webpage at
personalpages.manchester.ac.uk/staff/Mark.Kambites/la.php
The notes form the definitive content of this part of the course, and you should expect to refer to them
frequently. If you need the notes in a different format due to a disability, please just let me know. The lectures
will explain the same material, but sometimes more informally.
Exercises. Exercise sheets will be handed out in lectures, and contain instructions on when to do the
exercises and what to hand in. They are an essential part of learning, so if you want to do well in the course
you need to schedule time to attempt all of them (not just the ones for handing in!). Solutions will be made
available when you have had a chance to attempt the questions yourself.
Office Hour. My office is Alan Turing 2.137. My office hour will generally be 15:30-16:30 on Tuesdays
during teaching weeks; it may sometimes be necessary to change this in which case details will be posted
on my website, so please check there before making a special journey. If you can’t make it to my office hour
but need a personal meeting then please email or ask after a lecture for an appointment.
Supervisions and Homework. The exercise sheets tell you which exercises to hand in for which weeks;
your supervision group leader will tell you exactly when and where to hand in. Attendance at super-
visions, and handing in homework, is compulsory. Please make sure you arrive on time with the
exercise sheets; group leaders may mark you absent if you come late or not properly prepared.
Assessment. The assessment for the course comprises homework and supervision attendance (10%) and a
final exam (90%). The supervisions/homework for the first 6 weeks and Section A of the exam will cover
my half of the course.
Feedback. Please let me know how you are finding the course, and especially if you think there are any
problems with it. Feel free to speak to me after a lecture, email me, come along in my office hour, or even
slip an anonymous note under my office door!
Books. The course is self-contained and full notes will be supplied, so you should not need to refer to any
books. But if you would like an alternative viewpoint, the course webpage contains some suggested texts.
Prerequisite Material
This course builds directly upon MATH10101 Foundations of Pure Mathematics from last
semester. You will need to understand many of the concepts from that course, including:
• modular arithmetic.
If you can’t remember what the above terms mean, you should go back to your 10101 notes and reread the
definitions. Exercise Sheet 0 gives a chance to practice working with these.
Remark. You’ll discover this semester that mathematics at university level is not an “examine-and-
forget” subject. Each stage builds upon what you have learnt before, and demands progressively deeper
understanding of previous material. Your main aim in studying this course — and my aim in lecturing it
— is therefore not the exam, but a deep understanding of the material which will serve you well next year
and beyond. If together we can achieve that, the exam will take care of itself!
1 Matrices
Matrices are a useful and elegant way to study linear equations. For teaching purposes it is actually easier to
introduce matrices first (which we do in this chapter) and linear equations afterwards (in the next chapter).
Size. A matrix has clearly defined numbers of rows and columns, which together make up its size. A
matrix with p rows and q columns is called a p × q matrix. Of the above examples:
Matrix Entries. If a matrix is called A, we write Aij to denote the entry in the ith row and jth column of A. For example, in the matrix A above, the entries are A11 = 2, A12 = 5, A21 = √3 and A22 = 0.8. (Some people write aij instead of Aij .)
1 For now! Later we shall see that matrices (“matrices” is the plural of “matrix”) can also have other things in them, such as complex numbers or even more abstract mathematical objects, and much of what we are doing will still work. The objects which form the entries of a matrix are called the scalars (so for now “scalar” is just another word for “real number”).
Equality of Matrices. Two matrices A and B are equal if and only if
• they have the same number of rows and
• they have the same number of columns and
• corresponding entries are equal, that is Aij = Bij for all appropriate2 i and j.
Example. Let A = \begin{pmatrix} 2 & 7 \\ 7 & -1 \\ -1 & 2 \end{pmatrix} and B = \begin{pmatrix} -1 & 3 \\ 0 & 4 \\ 2 & 0 \end{pmatrix}.
Since both are 3 × 2 matrices, the subtraction A − B is meaningful and we have
A − B = \begin{pmatrix} 2 - (-1) & 7 - 3 \\ 7 - 0 & -1 - 4 \\ -1 - 2 & 2 - 0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & -5 \\ -3 & 2 \end{pmatrix}
2 By “appropriate” I mean, of course, that i and j have the right values to be “indices” (“indices” is the plural of “index”) into the matrix: in other words, i is an integer between 1 and the number of rows, and j is an integer between 1 and the number of columns. Once the size of a matrix is known it should be obvious what numbers are allowed to be row and column indices. For brevity I will often just write “for all i and j” to mean “for all integers i between 1 and the number of rows in the matrix and j between 1 and the number of columns in the matrix”.
Proposition 1.1. For any matrices A and B of the same size p × q, we have
A − B = A + (−B).
Proof. Notice first that (−B) = (−1)B is also a p × q matrix, so the sum A + (−B) makes sense. Now for
each i and j we have
So A − B and A + (−B) are the same size, and agree in every entry, which means they are equal.
A (m × n)  ×  B (n × p)  =  AB (m × p)
The entry in the ith row and the jth column of AB depends on the entries in the ith row of A and the
entries in the jth column of B. Formally,
(AB)ij = Ai1 B1j + Ai2 B2j + Ai3 B3j + · · · + Ain Bnj = \sum_{k=1}^{n} Aik Bkj .
The way entries of the factors combine to form entries of the product can be visualised as follows:
[Diagram: the entries Ai1 , Ai2 , Ai3 , . . . , Ain of the ith row of A pair off with the entries B1j , B2j , B3j , . . . , Bnj of the jth column of B to give the element (AB)ij in the ith row and jth column of the product.]
• The top entry of AB will be computed from the top row of A and the single column of B: it is 2 × 5 + 1 × (−2) = 8.
• The middle entry of AB will be computed from the middle row of A and the single column of B: it is
−3 × 5 + 7 × (−2) = −29.
• The bottom entry of AB will be computed from the bottom row of A and the single column of B: it
is 1 × 5 + 5 × (−2) = −5.
So AB = \begin{pmatrix} 8 \\ -29 \\ -5 \end{pmatrix} .
Both C and D are 2 × 2. As the number of columns in C equals the number of rows in D, CD exists, and
it will also be 2 × 2.
CD = \begin{pmatrix} 2 & 1 \\ -5 & 3 \end{pmatrix} \begin{pmatrix} 4 & -2 \\ 3 & 7 \end{pmatrix} = \begin{pmatrix} 2 \times 4 + 1 \times 3 & 2 \times (-2) + 1 \times 7 \\ -5 \times 4 + 3 \times 3 & (-5) \times (-2) + 3 \times 7 \end{pmatrix} = \begin{pmatrix} 11 & 3 \\ -11 & 31 \end{pmatrix}
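If you like to check this kind of arithmetic on a computer, the rule (AB)ij = \sum_k Aik Bkj translates directly into a few lines of Python. This is only a minimal sketch for experimenting (the function name matmul and the list-of-rows representation are illustrative choices of mine; the matrices are read off from the entry-by-entry computations above):

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both stored as lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    # (AB)_ij is the sum over k of A_ik * B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]

A = [[2, 1], [-3, 7], [1, 5]]       # the 3 x 2 matrix from the worked example above
B = [[5], [-2]]                     # the single-column matrix it was multiplied by
print(matmul(A, B))                 # [[8], [-29], [-5]]

C = [[2, 1], [-5, 3]]
D = [[4, -2], [3, 7]]
print(matmul(C, D))                 # [[11, 3], [-11, 31]]
```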
Then EF will be a 5 × 3 matrix, and for example the entry (EF )42 can be calculated using the 4th row of E, which is \begin{pmatrix} 1 & 2 & 2 & 4 \end{pmatrix}, and the 2nd column of F , which is \begin{pmatrix} 8 \\ 4 \\ -3 \\ 2 \end{pmatrix}, so:
Tip. The only way to master matrix operations — especially multiplication — is to practice! When you
have finished the examples on Exercise Sheet 1, try making up examples of your own and multiplying them.
Get a friend to do the same ones and check you get the same answer.
Exercise. Prove that if A and B are square matrices (same number of rows and columns) of the same size
as each other, then AB is always defined. What size will AB be?
Theorem 1.2. Let A, B and C be matrices such that the given operations are defined, and λ and µ be
numbers. Then
(i) A + B = B + A;
(ii) A + (B + C) = (A + B) + C;
(iv) A(B + C) = AB + AC;
Proof. We prove (i) and (iv) to exemplify the methods. Some of the other proofs are on Exercise Sheet 1.
(i) For A + B (and B + A) to be defined, A and B must be the same shape, say m × n. Now A + B and
B + A are both m × n, and for each i and j we have
So A + B and B + A are the same shape with exactly the same entries, which means A + B = B + A.
(iv) For the given operations to be defined, A must be an m × r matrix, and B and C both r × n matrices.
Now A(B + C) and AB + AC are both m × n matrices, and for each i and j we have
[A(B + C)]ij = \sum_{l=1}^{r} Ail (B + C)lj = \sum_{l=1}^{r} Ail (Blj + Clj ) = \sum_{l=1}^{r} Ail Blj + \sum_{l=1}^{r} Ail Clj
= (AB)ij + (AC)ij = (AB + AC)ij .
Remark. Property (i) in Theorem 1.2 is called the commutativity law for matrix addition; notice that there isn’t a corresponding statement for matrix multiplication - see Exercise Sheet 1. Properties (ii) and (iii) are associativity laws (for matrix addition and matrix multiplication respectively). Properties (iv)-(xi)
are distributivity laws.
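To see that there is no commutativity law for multiplication, a single small counterexample suffices. The pair below is my own illustration (not necessarily the one intended on Exercise Sheet 1):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(matmul(A, B))   # [[2, 1], [1, 1]]
print(matmul(B, A))   # [[1, 1], [1, 2]]  -- different, so AB and BA need not be equal
```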
The Main Diagonal. If A is a square matrix (say n × n) then the main diagonal (sometimes just called
the diagonal) is the diagonal from top left to bottom right of the matrix. In other words, it is the collection
of entries whose row number is the same as their column number.
A matrix is called diagonal if all the entries not on the diagonal are 0. (Some, or even all, of the
diagonal entries may also be 0.) For example,
\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
are diagonal, while
\begin{pmatrix} -2 & 0 \\ 5 & 2 \end{pmatrix}
is not diagonal because of the 5 which lies off the main diagonal.
Identity Matrices. A square matrix with 1’s on the main diagonal and 0’s everywhere else is called an
identity matrix. The n × n identity matrix is denoted In . For example:
I2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{while} \quad I3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} .
whenever A is the right size for these products to be defined. (Exercise: prove this.)
A A−1 = A−1 A = In .
Not every matrix has an inverse. Matrices with an inverse are called invertible; those which are not
invertible are called singular.
Lemma 1.3. (i) Suppose A is an invertible n × n matrix. Then the inverse of A is unique.
Proof. (i) Suppose B and C are both inverses of A. Then AB = BA = In and AC = CA = In . Now
using Theorem 1.2(iii):
(ii) Let E −1 and D −1 be inverses of E and D respectively. Then again by Theorem 1.2(iii):
A similar argument (exercise!) gives (D −1 E −1 )(ED) = In . This shows that the matrix D −1 E −1 is
an inverse of ED (and so by part (i), the inverse of ED).
Remark. The value ad − bc is a very important number associated to the matrix A. It is called the
determinant of A, and we will see later (Chapter 4) how to generalise the idea to bigger matrices.
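For 2 × 2 matrices the standard formula (the one Section 1.6 refers to) is A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, valid whenever ad − bc ≠ 0. A minimal Python sketch of this formula (the function name is my own; the example matrix is the matrix C from the multiplication examples above, not one of the exercise matrices):

```python
def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] via the ad - bc formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("the matrix is singular: ad - bc = 0")
    return [[d / det, -b / det], [-c / det, a / det]]

C = [[2, 1], [-5, 3]]         # ad - bc = 6 - (-5) = 11, so C is invertible
print(inverse_2x2(C))         # [[3/11, -1/11], [5/11, 2/11]], printed as floats
```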
Exercise. Determine which of the following matrices are invertible, and for those which are find the inverse:
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} , \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad \begin{pmatrix} -1 & 3 \\ -2 & 6 \end{pmatrix} , \quad \begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix} .
The Zero Matrix. A matrix is called a zero matrix if all of its entries are 0. For example:
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} .
Clearly there is exactly one zero matrix of each size: we write 0p×q for the p × q matrix all of whose entries
are 0, so those above are 04×1 and 02×2 .
Theorem 1.5. Let A and B be matrices such that the given operations are defined. Then:
(i) (AT )T = A;
(ii) (A + B)T = AT + B T ;
(iii) (AB)T = (B T AT ).
Proof.
(i) Clear.
(ii) For A + B to be defined we need A and B to have the same size, say m × n, in which case (A + B)T and
AT + B T are both defined and n × m. Now for each i and j,
((A + B)T )ij = (A + B)ji = Aji + Bji = (AT )ij + (B T )ij = (AT + B T )ij .
So (A + B)T and (AT + B T ) are the same size with the same entries, which means they are equal.
(iii) For AB to be defined it must be that A is m × p and B is p × n, in which case AT is p × m and B T
is n × p, so that B T AT is defined. It is easy to check that (AB)T and B T AT are both n × m. Now for
1 ≤ i ≤ n and 1 ≤ j ≤ m we have3
((AB)T )ij = (AB)ji = \sum_{k=1}^{p} Ajk Bki = \sum_{k=1}^{p} (AT )kj (B T )ik = \sum_{k=1}^{p} (B T )ik (AT )kj = (B T AT )ij .
3 This is a case where just saying “appropriate i and j” might not be entirely clear, so we are careful to specify!
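A quick numerical check of part (iii) on an arbitrarily chosen pair of matrices (my own illustration; the sizes 2 × 3 and 3 × 2 are arbitrary):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
B = [[2, 1], [0, 5], [-1, 2]]      # 3 x 2
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))   # True
```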
Theorem 1.6. If A is an invertible matrix then AT is invertible and (AT )−1 = (A−1 )T .
Remark. Notice that a matrix which is symmetric or skew symmetric must be square, and a skew symmetric
matrix must have 0s on the main diagonal.
Theorem 1.7. (i) If A is any matrix then AAT and AT A are symmetric matrices.
Proof. (i) and (ii) follow easily from Theorem 1.5, and (iii) follows from Theorem 1.6. (exercise: do
this!).
2 Systems of Linear Equations
A system of equations of the form
x + 2y = 7
2x − y = 4
or
5p − 6q + r = 4
2p + 3q − 5r = 7
6p − q + 4r = −2
is called a system of linear equations.
Definition. An equation involving certain variables is linear if each side of the equation is a sum of
constants and constant multiples of the variables.
Examples. The equations 2x = 3y + 3, √3 = 0 and x + y + z = 0 are linear.
Non-examples. The equations y = x2 and 3xy + z + 7 = 2 are not linear since they involve products of
variables.
4x − 2y = 22
9x + 3y = 12
. . . has a solution x = 3 and y = −5. (In fact in this case this is the only solution.)
Remark. In general, a system of linear equations can have no solutions, a unique solution, or infinitely
many solutions. It is not possible for the system to have, for example, exactly 2 solutions; this is a higher
dimensional generalisation of the fact that 2 straight lines cannot meet in exactly 2 points (recall Exercise
Sheet 0).
Definition. The set of all solutions is called the solution set or solution space of the system.
Warning. When we talk about solving a system of equations, we mean describing all the solutions (or
saying that there are none, if the solution set is empty), not just finding one possible solution!
3x + 4y = 4.
Each solution consists of an x-value and a y-value. We can think of these as the coordinates of a point in
2-dimensional space. For example, a solution to the above equation is x = −4 and y = 4. This solution
gives the point (−4, 4).
In general, the set of solutions to a linear equation with variables x and y forms a line in 2-dimensional
space. So each equation corresponds to a line — we say that the equation defines the line.
Systems of Equations. Now suppose we have a system of two such equations. A solution to the system
means an x-value and a y-value which solve all the equations at once. In other words, a solution
corresponds to a point which lies on all the lines. So in geometric terms, the solution set to a system of
linear equations with variables x and y is an intersection of lines in the plane.
Question. Recall from Exercise Sheet 0 what an intersection of two lines in the plane can look like:
• Usually4 it is a single point. This is why 2 equations in 2 variables usually have a unique solution.
• Alternatively, the lines could be distinct but parallel. Then the intersection is the empty set. This
is why 2 equations in 2 variables can sometimes have no solutions.
• Or then again, the two lines could actually be the same. In this case the intersection is the whole
line. This is why 2 equations in 2 variables can sometimes have infinitely many solutions.
Similarly, if we have a system of more than 2 equations, we are looking for the intersection of all the
lines (in other words, the set of points in the plane which lie on all the lines). The intersection of 3 lines is
“usually” empty but could also be a single point or a line.
3x + 4y + 7z = 4.
This time we can consider a solution as a point in 3-dimensional space. The set of all solutions (assuming
there are some) forms a plane in space. The solution set for a system of k linear equations will be an
intersection of k planes.
Exercise. What can the intersection of 2 planes in 3-space look like? Try to imagine all the possibilities,
as we did for lines in the previous section. How do they correspond to the possible forms of solution sets of
2 equations with 3 variables?
Exercise. Now try to do the same for intersections of 3 planes. (Recall that the intersection of three sets
is the set of points which lie in all three). How do the possible geometric things you get correspond to the
possible forms of solution sets for 3 equations with 3 variables?
In yet higher dimensions, the solution set to a linear equation in n variables is a copy of (n − 1)-dimensional
space inside n-dimensional space. Such a set is called a hyperplane. So geometrically, the solution set of
k equations with n variables will be an intersection of k hyperplanes in n-dimensional space.
Remark. Actually it is not quite true to say that every equation defines a hyperplane. Equations like
“0 = 0” and “2 = 2” define the whole space; if all equations in a system have this form then the solution
space will be the whole space. On the other hand, equations like “0 = 1” have no solutions; if any equation
in a system has this form then the solution space will be empty.
Remark. You might be wondering what happens if we drop the requirement that the equations be linear,
and allow for example polynomial equations? If we do this then lines, planes and hyperplanes get replaced
by more general objects called curves, surfaces and hypersurfaces respectively. For example, the solution
set to x2 + y 2 + z 2 = 1 is a sphere, which is a 2-dimensional surface in 3-space. The solution space to a
system of polynomial equations (an intersection of hypersurfaces) is called an (affine) algebraic variety.
This part of mathematics is called algebraic geometry; you can study it in your third year.
4 Of course “usually” here is a vague term, but I hope you can see intuitively what I mean! If you pick two lines at random it is somehow “infinitely unlikely” that they will be exactly parallel. It is possible to make this intuition precise; this is well beyond the scope of this course but as a (challenging!) exercise you might like to think how you would do so.
2.4 Linear Equations and Matrices
Any system of linear equations can be rearranged to put all the constant terms on the right, and all the
terms involving variables on the left. This will yield a system something like . . .
4x − 2y = 3
2x + 2y = 5
. . . so from now on we will assume all systems have this form. The system can then be written in matrix
form:
\begin{pmatrix} 4 & -2 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix} .
You can check (by multiplying out the matrix equation using the definition of matrix multiplication) that
it holds if and only if both of the original (scalar) equations hold. We can express the equations even more
concisely as an augmented matrix:
\left(\begin{array}{cc|c} 4 & -2 & 3 \\ 2 & 2 & 5 \end{array}\right) .
Definition. Two matrices A and B are called row equivalent if each can be obtained from the other by
a sequence of elementary row operations.
Row operations and row equivalence are useful because of the following fact:
Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is row equivalent to N then the two systems have the same solution set.
We’ll prove this theorem later (Chapter 3), but for now let’s explore its consequences. It means
that if we start with a system of equations, applying elementary row operations to the augmented matrix
will give us another system of equations with the same solution set. The aim of the game is to obtain a
system of equations which is easier to solve. For example, consider the system:
First we convert this into augmented matrix form (noting that the “missing” x terms are really 0x and
become 0s in the matrix):
5 There is an obvious “dual” idea of elementary column operations, written ci → λci , ci → ci + λcj and ci ↔ cj .
Now we swap rows 1 and 2 (r1 ↔ r2 ), to get
Next let’s scale rows 1 and 2 to make the first entry in each 1 (r1 → −r1 , r2 → (1/10) r2 ):
It turns out that these equations are easier to solve than the original ones. Equation [3] tells us straight away that z = 3. Now substituting this into equation [2] we get y − (3/10) × 3 = −19/10, in other words, y = −1. Finally, substituting both of these into equation [1] gives x − 4 × (−1) + 3 = 9, that is, x = 2.
So we have found the solutions to equations [1], [2] and [3]. But these were obtained from our original
system of equations by elementary row operations, so (by Theorem 2.1) they are also the solutions to the
original system of equations. (Check this for yourself !)
This strategy forms the basis of a very general algorithm, called Gaussian6 Elimination, which we
shall see in Section 2.7.
Remark. Notice how we converted the matrix into a roughly “triangular” form, with all the entries towards
the “bottom left” being 0. It was this property which made the new equations easy to solve. The following
definitions will make precise what we mean by “triangular” here.
Pivots. The pivot (or leading entry) of a row in an augmented matrix is the position of the leftmost
non-0 entry to the left of the bar. (If all entries left of the bar are 0, we call the row a zero row and it has
no pivot.)
(1) any zero rows come at the bottom; and
(2) the pivot of each of the other rows is strictly to the right of the pivot of the row above; and
(3) all the pivots are 1.
A row echelon matrix is a reduced row echelon matrix if in addition:
(4) each pivot is the only non-zero entry in its column.
Example. The matrix A above is not a row echelon matrix, because the second row pivot is not to the right of the first row pivot. The matrix
B = \left(\begin{array}{cccc|c} 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 6 \end{array}\right)
is a row echelon matrix, but it is not a reduced row echelon matrix because the 2 in the first row is in the same column as the second row pivot.
Gaussian Elimination is a simple algorithm which uses elementary row operations to reduce the
augmented matrix to row echelon form:
(1) By swapping rows if necessary, make sure that no non-0 entries (in any row) are to the left of
the first row pivot.
(2) Scale the first row so that the pivot is 1.
(3) Add multiples of the first row to each of the rows below, so as to make all entries below the first row
pivot become 0. (Since there were no non-0 entries left of the first row pivot, this ensures all pivots
below the first row are strictly to the right of the pivot in the first row.)
(4) Now ignore the first row, and repeat the entire process with the second row. (This will ensure that all
pivots below the second row are to the right of the pivot in the second row, which in turn is to the
right of the pivot in the first row.)
(5) Keep going like this (with the third row, and so on) until either we have done all the rows, or all the
remaining rows are zero.
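The whole procedure is short enough to write out as a program. The sketch below is only my own illustration of the five steps above for a matrix stored as a list of rows; it uses exact fractions rather than floating point (see Section 2.12 for why rounding is a delicate issue), and the function name is an arbitrary choice:

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce a matrix (given as a list of rows) to row echelon form."""
    A = [[Fraction(x) for x in row] for row in M]    # exact arithmetic, no rounding
    rows, cols = len(A), len(A[0])
    r = 0                                            # row in which the next pivot will sit
    for c in range(cols):
        # Step (1): find a row at position r or below with a non-0 entry in column c.
        pivot_row = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot_row is None:
            continue                                 # nothing to do in this column
        A[r], A[pivot_row] = A[pivot_row], A[r]      # swap it up
        # Step (2): scale so that the pivot is 1.
        pivot = A[r][c]
        A[r] = [x / pivot for x in A[r]]
        # Step (3): add multiples of row r to the rows below to clear the column.
        for i in range(r + 1, rows):
            factor = A[i][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1                                       # steps (4)-(5): repeat with the next row
        if r == rows:
            break
    return A

# The augmented matrix of 4x - 2y = 3, 2x + 2y = 5 from Section 2.4:
print(row_echelon([[4, -2, 3], [2, 2, 5]]))
# [[1, -1/2, 3/4], [0, 1, 7/6]]  (printed as Fraction objects)
```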
Exercise. Go back to the example in Section 2.5, and compare what we did with the procedure above.
Notice (Step 1) that there are no non-0 entries to the left of the pivot in the first row (there can’t be,
since the pivot is in the first column!). So we do not need to swap rows to ensure this is the case.
Next (Step 2) we scale to make the pivot in the first row 1, that is, r1 → (1/2) r1 , which gives
Next (Step 3) we seek to remove the entry in row 2 which is below the row 1 pivot. We can do this with
the operation r2 → r2 − 6r1 , giving the matrix:
Similarly, we remove the entry in row 3 below the row 1 pivot by r3 → r3 + 4r1 :
Now we ignore the first row, and repeat the whole process with the remaining rows. Notice that there
are no pivots to the left of the row 2 leading entry. (The one in row 1 doesn’t count, since we are ignoring
row 1!). Now we scale to make the second row pivot into 1, with r2 → −(1/3) r2 , giving:
Then remove the entry in row 3 which is below the row 2 pivot, with r3 → r3 − 3r2 , giving:
Finally, we repeat the process again, this time ignoring rows 1 and 2. All we have to do now is scale row
3 to make the pivot 1, with r3 → (1/7) r3 :
Our matrix is now in row echelon form, so we convert it back to a system of equations:
• substituting into equation [2] gives y − (2/3) × 4 = −11/3, so y = −1;
• substituting into equation [1] gives x + (1/2) × (−1) − (1/2) × 4 = −7/2, so x = −1.
Thus, the solution is x = −1, y = −1 and z = 4. (You should check this by substituting the values back
into the original equations!)
Remark. Notice how the row echelon form of the matrix facilitated the “backtracking” procedure for
solving equations [1], [2] and [3].
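The “backtracking” can also be mechanised. Here is a minimal sketch (my own illustration), assuming the augmented matrix is n × (n + 1), already in row echelon form, and has a pivot 1 in every row so that the solution is unique; the example system is reconstructed from the substitutions above:

```python
from fractions import Fraction as F

def back_substitute(R):
    """Solve an n x (n+1) augmented matrix in row echelon form with a pivot 1 in every row."""
    n = len(R)
    x = [0] * n
    for i in range(n - 1, -1, -1):                       # work upwards from the last equation
        x[i] = R[i][n] - sum(R[i][j] * x[j] for j in range(i + 1, n))
    return x

# The row echelon system reached above: x + (1/2)y - (1/2)z = -7/2, y - (2/3)z = -11/3, z = 4.
R = [[1, F(1, 2), F(-1, 2), F(-7, 2)],
     [0, 1, F(-2, 3), F(-11, 3)],
     [0, 0, 1, 4]]
print(back_substitute(R))     # [Fraction(-1, 1), Fraction(-1, 1), 4], i.e. x = y = -1, z = 4
```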
(i) 4x + y = 9
2x − 3y = 1
(ii) 2x − 4y = 12
−x + 2y = −5
• Add multiples of the 2nd row to the row above to remove any non-0 entries above the 2nd row pivot.
• Add multiples of the 3rd row to the rows above to remove any non-0 entries above the 3rd row pivot.
• Continue down the rows, adding multiples of row k to the rows above to eliminate any non-0 entries
above the kth row pivot.
Theorem 2.2. Let A be a matrix. Then A is row equivalent to a unique reduced row echelon matrix (called
the reduced row echelon form of the matrix).
Proof. It should be clear that applying the Gauss-Jordan elimination algorithm to A will give a reduced
row echelon matrix row equivalent to A. The proof of uniqueness is more difficult, and we omit it.
Important Consequence. We can check if two matrices are row equivalent, by applying Gauss-Jordan
elimination to compute their reduced row echelon forms, then checking if these are equal. (If the reduced
row echelon forms are equal then clearly the original matrices are equivalent. Conversely, if the matrices are
equivalent then by Theorem 2.2, they must have the same reduced row echelon form.)
Exercise. Check if \begin{pmatrix} 1 & 3 & 6 \\ 0 & 1 & 2 \\ 3 & 7 & 14 \end{pmatrix} and \begin{pmatrix} 1 & 0 & 0 \\ 1 & 5 & 10 \\ 1 & 1 & 2 \end{pmatrix} are row equivalent.
Example. Consider the equations.
−x + 2y + z = 2
3x − 5y − 2z = 10
x − y = 14
Remark. When we say these are the solutions, we mean that substituting in different values of λ ∈ R will
give all the solutions to the equations. For example, λ = 1 would give x = 29, y = 15 and z = 1, so this
is one solution to the system of equations (check this!). Or then again, λ = 0 gives x = 30, y = 16 and
z = 0, so this is another possible solution.
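Checking candidate solutions is a purely mechanical matter of substitution, which a couple of lines of Python can do (a small sketch, my own illustration):

```python
def is_solution(x, y, z):
    """Substitute a candidate (x, y, z) into the three equations of the example above."""
    return (-x + 2*y + z == 2) and (3*x - 5*y - 2*z == 10) and (x - y == 14)

print(is_solution(29, 15, 1))   # True  (the lambda = 1 solution)
print(is_solution(30, 16, 0))   # True  (the lambda = 0 solution)
print(is_solution(0, 0, 0))     # False (this system is not homogeneous)
```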
x + 7y − 4z = 0
2x + 4y − z = 0
3x + y + 2z = 0
An obvious observation about homogeneous systems is that they always have at least one solution,
given by setting all the variables equal to 0. This is called the trivial solution. Geometrically, this means
that the solution space to a homogeneous system always contains the origin.
Forward Pointer. Solution sets to homogeneous systems — that is, intersections of hyperplanes through
the origin — are very special and important subsets in Rn , called vector subspaces. More on this later
(Chapter 5).
When it comes to solving them, homogeneous systems of equations are treated just like non-homogeneous
ones. The only difference is that nothing interesting ever happens on the RHS of the augmented matrix, as
the following exercise will convince you!
Remark. Notice how, in your solution, everything stays “homogeneous” throughout. All the augmented
matrices in the elimination process, and all the resulting equations, have only 0’s on the RHS.
2.12 Computation in the Real World
In describing methods to solve equations (Gaussian and Gauss-Jordan elimination) we have implicitly as-
sumed two things:
But in reality, these assumptions are often not true. If we want to solve a system of equations because it
models a real-world problem, then the coefficients in the equations may have been obtained by measure-
ment, in which case their values are subject to experimental error. And even if the coefficients are known
precisely, real-world computers don’t actually store and process real numbers: they can’t, because there
is an (uncountably) infinite set of real numbers but even the most powerful digital computer has a finite
memory. Instead they work with approximations to real numbers: typical is floating point arithmetic
in which numbers are rounded to a certain number of significant figures. At every stage of the computation
numbers have to be rounded to the nearest available approximation, introducing slight inaccuracies even if
the original data was perfect.
If the starting data or the arithmetic are imperfect then it clearly isn’t realistic to expect perfect solutions.
But some methods have the useful property that data which is almost right will produce solutions which
are almost right8 . Other methods do not have this property at all: they may work perfectly in theory, but
the tiniest error in the starting data gets magnified as the computation progresses, and leads to a wildly
inaccurate answer. Gaussian elimination, in the simple form we have described, is not very good in this
respect, but it can be made better with a slightly more sophisticated approach (called partial pivoting)
which takes account of the relative size of the matrix entries.
Further consideration of these practical issues is beyond the scope of this course, but you can learn about
them in later years, starting with the second year option MATH20602 Numerical Analysis I.
8 Or the subtly different property that the solutions produced are exact solutions to almost the right problems. Of course this is an oversimplification; for an accurate explanation see the first few sections of Chapter 1 in the book Accuracy and Stability of Numerical Algorithms (2nd edition) by N. J. Higham, and in particular the distinction between forward error and backward error. (But this is not examinable in this course.)
3 Fields, Elementary Matrices and Calculating Inverses
3.1 Fields
So far we have worked with matrices whose entries are real numbers (and systems of equations whose
coefficients and solutions are real numbers). If for a moment we call the set of real numbers K (it will
become clear shortly why we’re not just calling it R!), here are a few basic properties of these operations:
(F3) there is an element of K (which we call “0”) with the property that a + 0 = 0 + a = a for all a ∈ K;
(F7) there is an element of K (which we call “1” and which is different from 0) with the property that
1a = a1 = a for all a ∈ K;
(F8) for every a ∈ K except 0, there exists an element a−1 ∈ K such that aa−1 = a−1 a = 1;
These nine properties are called the field axioms. Check for yourself that they are true for the real
numbers. Of course lots of other things are true for the real numbers, so what’s so special about these ones?
It turns out these are exactly the properties which we need to make linear algebra “work”. This means we
can replace the real numbers with any other type of “scalar” having these properties and things will still
work in the same way! For example, the complex numbers (check!) also satisfy the field axioms, so we can
do linear algebra where the scalars are complex numbers.
Definition. Let K be a set equipped with operations of addition and multiplication9. Then K is called a field if it satisfies the nine field axioms.
Examples. The real numbers satisfy the axioms (that was the whole point!) so R is a field. We have
already mentioned that the complex numbers satisfy the axioms, so C is a field as well. Exercise Sheet 3
gives some more examples.
Non-example. The set of 2 × 2 matrices (with real number entries) has operations of addition and
multiplication, but is not a field. We saw on Exercise Sheet 1 that it doesn’t satisfy (F6). In fact it satisfies
seven of the nine axioms: on Exercise Sheet 3 you can find the other one that it doesn’t satisfy!
Remark. For real numbers (and complex numbers) we have the extra operations of subtraction and division.
The definition of a field doesn’t explicitly mention these, but in fact we can use them in any field by defining:
Remark. Many other elementary facts about the real numbers can be deduced from the field axioms, which
means these facts must be true in all fields. For example, we know that in the real numbers we always have
(b + c)a = ba + ca, which is not one of the field axioms. But in fact if a, b and c come from any field then:
(b + c)a = a(b + c) = ab + ac = ba + ca
9 Of course, the field axioms don’t even make sense unless we know how to add and multiply the objects!
where the first and last equalities follow from (F6) and the middle one from (F9). Exercise Sheet 3 gives
some more examples of this.
Important Remark. In fact, almost everything we did in Chapters 1 and 2 works when the real numbers
are replaced by elements of some other field (for example, by complex numbers). Matrix operations can
be defined in exactly the same way, and everything we proved about them remains true. Systems of linear
equations still make sense, and Gaussian and Gauss-Jordan Elimination still find the solution sets (there is
an example to try on Exercise Sheet 3). Of course, wherever we referred to the real numbers 0 and 1 (for
example, in defining identity and zero matrices, or during Gaussian elimination) we use the 0 and 1 elements
of the field instead. The only thing which was really specific to the real numbers, and doesn’t quite work
over other fields, is the geometric interpretation of solution sets in space.
Remark. It is worth taking a step back to consider the “big picture” of what we did in this section:
• We start with a particular “concrete” mathematical structure (in this case the real numbers).
• We identify the vital features which make it “work”, and abstract these features as axioms.
• We forget the original structure and study an abstract structure (a field) about which the only
assumption we make is that it satisfies the axioms.
• Everything we learn in the abstract setting applies to every structure satisfying the axioms (for exam-
ple, we can apply it back to the real numbers, but we can also apply it to the complex numbers).
This process of abstraction is one of the central pillars of mathematics. Instead of studying specific
individual structures in isolation, it lets us simultaneously understand lots of different structures which
work in the same way. For example, instead of learning that something is true for the real numbers and
then having to check all over again whether it still works for the complex numbers, we can do it for both
structures at once (and for a whole load of other structures besides). This idea is the basis of abstract
algebra, which you will study in detail starting with MATH20201 Algebraic Structures I next semester.
Definition. An n × n matrix E is an elementary matrix if it can be obtained from the identity matrix In
by a single elementary row operation. For an elementary row operation ρ we write Eρ for the corresponding
matrix.
There are three types of elementary matrices, corresponding to the three types of row operations. They
look like:
Eri ↔rj =
Eri →αri =
Theorem 3.1. Let A and B be m × n matrices over a field and suppose B is obtained from A by a row
operation ρ. Then B = Eρ A.
Corollary 3.2. Suppose A can be transformed into B by applying a sequence of elementary row operations ρ1 , ρ2 , . . . , ρk . Then we have
B = Eρk . . . Eρ2 Eρ1 A.
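Theorem 3.1 and Corollary 3.2 are easy to see in action numerically. A small sketch (my own illustration; the matrix A and the operation r2 → r2 − 2r1 are arbitrary choices):

```python
def matmul(E, A):
    return [[sum(E[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(E))]

A = [[1, 3, 0], [2, 1, 4]]

# Apply r2 -> r2 - 2*r1 directly to A ...
B = [A[0][:], [a - 2 * b for a, b in zip(A[1], A[0])]]

# ... and build the elementary matrix by doing the same operation to the 2 x 2 identity.
E = [[1, 0], [-2, 1]]

print(B)              # [[1, 3, 0], [0, -5, 4]]
print(matmul(E, A))   # [[1, 3, 0], [0, -5, 4]] -- the same matrix, as Theorem 3.1 predicts
```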
Theorem 3.3. Let E be an elementary n × n matrix over a field. Then E is invertible and E −1 is also an
elementary matrix.
Proof. It is easy to check (exercise!) using the definition of the inverse that:
Elementary matrices allow us to provide the promised proof that elementary row operations on an
augmented matrix preserve the solution set.
Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is row equivalent to N then the two systems have the same solution set.
Proof. Let M = (A | B) and N = (C | D). Then the two systems of equations can be written in matrix form
(see Section 2.4) as AX = B and CX = D where X is the column matrix whose entries are the variables.
Since M and N are equivalent by a sequence of row operations, A and C are equivalent by the same sequence of row operations, and so are B and D. So by Corollary 3.2 there are elementary matrices
E1 , . . . , Ek such that
C = Ek . . . E1 A and D = Ek . . . E1 B.
Now suppose Z is a column matrix which is a solution to AX = B, that is, AZ = B. Then we have
so Z is also a solution to CX = D.
The converse (showing that a solution to CX = D is also a solution to AX = B) is very similar
(exercise!).
(i) A is invertible;
(ii) =⇒ (iii). Suppose (ii) holds. Let M be the reduced row echelon form of A. Since M is row equivalent
to A, it follows from Theorem 2.1 that the equation M X = 0n×1 has only one solution. Since M is square
and in reduced row echelon form, it is either the identity matrix or has a zero row at the bottom. Having
a zero row at the bottom would give lots of solutions to M X = 0n×1 (exercise: why? think about
(M | 0n×1 ) and backtracking), so it must be that M = In .
(iii) =⇒ (iv). By definition, A is row equivalent to its reduced row echelon form.
A = Ek . . . E1 In = Ek . . . E1
(v) =⇒ (i). By Theorem 3.3 elementary matrices are invertible, and by (an inductive argument using)
Lemma 1.3(ii) a product of invertible matrices is invertible.
In = Eρk . . . Eρ1 A
Multiplying both sides on the right by A−1 we get
But this means (by Corollary 3.2 again) that applying the sequence of row operations ρ1 , . . . , ρk to In gives
A−1 . This observation gives us an efficient way to find the inverse A−1 of an invertible matrix A:
(i) use Gauss-Jordan elimination to transform A into reduced row echelon form (which by Theorem 3.4
must turn out to be In — if it isn’t then A isn’t invertible!);
We can make the process even easier by writing the matrices A and In side-by-side in a single augmented
n × 2n matrix (A | In ), then just apply Gauss-Jordan elimination to convert the left-hand-side to In , and
let the right-hand-side “follow along”.
Example. Let A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix}. We have
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{r_2 \to r_2 - r_1}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{r_3 \to r_3 - r_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 0 & 2 & 1 & -1 & 1 \end{array}\right)
\xrightarrow{r_3 \to \frac{1}{2} r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 0 & 1 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{array}\right)
\xrightarrow[r_2 \to r_2 + 2r_3]{r_1 \to r_1 - r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{array}\right).
We have now reduced the left-hand-side to its reduced row echelon form, which is In , confirming that A is
invertible. The remaining right-hand-side must be A−1 , so
A^{-1} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix} .
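It is always worth checking a computed inverse by multiplying back. A quick sketch (my own illustration) using exact fractions:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A    = [[1, 0, 1], [1, 1, -1], [0, 1, 0]]
Ainv = [[F(1, 2), F(1, 2), F(-1, 2)],
        [0, 0, 1],
        [F(1, 2), F(-1, 2), F(1, 2)]]

print(matmul(A, Ainv))    # the 3 x 3 identity (entries print as Fractions)
print(matmul(Ainv, A))    # the 3 x 3 identity again
```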
Example. Let B = \begin{pmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{pmatrix}. This time:
\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right)
\xrightarrow[r_3 \to r_3 + r_1]{r_2 \to r_2 - 2r_1}
\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right)
\xrightarrow{r_3 \to r_3 + r_2}
\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right).
We could continue with the elimination (exercise: try it!) but the zero row which has appeared in the
bottom of the left-hand-side is clearly not going away again. This means the reduced row echelon form of
B is not In , so B is not invertible!
4 Determinants
The determinant of a square matrix is a scalar (i.e. an element of the field from which the matrix entries
are drawn) which can be associated to it, and which contains a surprisingly large amount of information
about the matrix. To define it we need some extra notation:
det M = ad − bc.
Example. The determinant of the matrix A from the previous example is:
det A = A11 det A^{11} − A12 det A^{12} + A13 det A^{13}
= 1 det \begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} − 2 det \begin{pmatrix} 4 & 6 \\ 7 & 9 \end{pmatrix} + 3 det \begin{pmatrix} 4 & 5 \\ 7 & 8 \end{pmatrix}
= 1 × (5 × 9 − 6 × 8) − 2 × (4 × 9 − 6 × 7) + 3 × (4 × 8 − 5 × 7)
= −3 + 12 − 9
= 0
Theorem 4.1. For any n × n matrix M over a field and any 1 ≤ i ≤ n and 1 ≤ j ≤ n:
(i) det M = \sum_{k=1}^{n} (−1)^{i+k} Mik det M^{ik} ;
(ii) det M = \sum_{k=1}^{n} (−1)^{k+j} Mkj det M^{kj} .
Proof. Omitted.
Computing the determinant using (i) is called expanding along the ith row, while using (ii) is
expanding down the jth column. Notice that expanding along the first row just means using the
definition of the determinant. But depending on the form of the matrix, choosing the row or column to
expand along/down carefully can make it much easier to compute the determinant.
Example. Suppose we wish to compute det \begin{pmatrix} 1 & 0 & 3 \\ 2 & 0 & 9 \\ 1 & 4 & 6 \end{pmatrix}. Noticing the two 0 entries in the second column, it makes sense to expand down this column, giving
det A = (−1)^{2+1} × 0 × det \begin{pmatrix} 2 & 9 \\ 1 & 6 \end{pmatrix} + (−1)^{2+2} × 0 × det \begin{pmatrix} 1 & 3 \\ 1 & 6 \end{pmatrix} + (−1)^{2+3} × 4 × det \begin{pmatrix} 1 & 3 \\ 2 & 9 \end{pmatrix} = −4 × 3 = −12.
Using the definition (i.e. expanding along the first row) would have given the same answer (check for
yourself !), but with more calculation.
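Cofactor expansion translates directly into a short recursive function. The sketch below (my own illustration) always expands along the first row, which matches the definition but is very inefficient for large matrices:

```python
def det(M):
    """Determinant of a square matrix (list of rows) by cofactor expansion along row 1."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]   # delete row 1 and column j+1
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, as in the earlier cofactor example
print(det([[1, 0, 3], [2, 0, 9], [1, 4, 6]]))   # -12
```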
Terminology. The value det M^{ij} is called the (i, j) minor of M , while the value (−1)^{i+j} det M^{ij} is the (i, j) cofactor of M . The method we have seen for computing the determinant is called cofactor expansion or Laplace10 expansion. (Note that some people who write mij for matrix entries use the notation Mij for the (i, j) minor of M .)
Theorem 4.2. Let M be an n × n upper triangular matrix over a field. Then det M is the product of the
entries on the main diagonal:
det M = \prod_{i=1}^{n} Mii .
Proof. In is clearly upper triangular, so det In is the product of the entries on the main diagonal,
which is 1.
Corollary 4.5. If A and B are square matrices over a field and are row equivalent, then
det A = 0 ⇐⇒ det B = 0.
10 After Pierre-Simon, Marquis de Laplace (pronounced “la-PLASS”), 1749–1827.
Proof. It follows from Theorem 4.4 that none of the types of elementary row operations ever change whether
the determinant is 0. If A and B are row equivalent then there is a sequence of row operations transforming
A into B, and none of these can ever change whether the determinant is 0.
We will prove Theorem 4.4 shortly, but first let’s see how it helps us calculate a determinant. The idea
is to use row operations to transform the matrix into a form where the determinant is easy to calculate,
and keep track of how the operations affect the determinant so that we can recover the determinant of
the original matrix. Theorem 4.2 suggests it makes sense to aim for an upper triangular matrix; we know
every square matrix is row equivalent to one of these because row echelon matrices are upper triangular,
but usually it is not necessary to go as far as finding a row echelon matrix:
Example. Let’s find the determinant of the matrix \begin{pmatrix} \frac{1}{6} & 0 & \frac{7}{6} \\ 0 & 0 & -6 \\ 1 & -2 & -4 \end{pmatrix}.
Applying the operations r1 → 6r1 , r3 → r3 − r1 , r2 ↔ r3 gives an upper triangular matrix:
\begin{pmatrix} 1 & 0 & 7 \\ 0 & -2 & -11 \\ 0 & 0 & -6 \end{pmatrix} .
By Theorem 4.2 the determinant of this matrix is 1 × (−2) × (−6) = 12. By Theorem 4.4 the operation
r1 → 6r1 will have multiplied the determinant by 6, while the row swap will have changed the sign, so the
determinant of our original matrix must be −(1/6) × 12 = −2.
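A quick cross-check of this bookkeeping, reusing the recursive cofactor sketch from above (the original matrix is the one reconstructed in the example):

```python
from fractions import Fraction as F

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

original = [[F(1, 6), 0, F(7, 6)], [0, 0, -6], [1, -2, -4]]
triangular = [[1, 0, 7], [0, -2, -11], [0, 0, -6]]
print(det(original))      # -2
print(det(triangular))    # 12 = 6 * (-1) * (-2): the r1 -> 6r1 scaling and the row swap
```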
Another application of Theorem 4.4 is to obtain the determinants of elementary matrices; this will prove
useful later.
Corollary 4.6. (i) det Eri →αri =
(ii) det Eri →ri +λrj =
(iii) det Eri ↔rj =
Proof. Each part follows by the corresponding part of Theorem 4.4 from the fact that Eρ is obtained from
In by the corresponding row operation ρ. For (iii), for example, Eri ↔rj is obtained from In by a row swap,
so by Theorem 4.4(iii) and Corollary 4.3 we have
det Eri ↔rj = − det In = −1.
Exercise. Do the other two cases.
For each k, notice that Ark is an (n − 1) × (n − 1) matrix with two rows the same. Hence, by the inductive
hypothesis, all of these matrices have determinant 0, so det A = 0.
Question/Exercise. Why, in the proof of Lemma 4.7, do we have to start the induction from n = 3? In
other words, why can’t we just use n = 1 as the base case?
Lemma 4.8. Let X, Y and Z be n × n matrices which are all the same except in row i, and suppose
Zij = Xij + Yij for all j. Then det Z = det X + det Y
Proof. Expand det Z along the ith row (exercise: write out the details!).
(ii) Suppose A \xrightarrow{r_i \to r_i + \lambda r_j} B. Let D be the matrix which is the same as A and B, except that the ith row
is λ times the jth row of A. Then we are in the position of Lemma 4.8 with X = A, Y = D and Z = B, so
we have
det B = det A + det D.
Now let C be the matrix which is the same as D, except that the ith row is exactly the jth row of A. Then
D can be obtained from C by applying the transformation ri → λri , so by part (i), det D = λ det C. But C
has two rows (rows i and j) the same, so by Lemma 4.7, det C = 0 and hence det D = λ × 0 = 0. Thus, det B = det A + 0 = det A.
(iii) Suppose A \xrightarrow{r_i \leftrightarrow r_j} B. We define matrices F , G, H, P and Q which are the same as A (and B) except
in rows i and j, where their entries are as follows:
Now we are in the position of Lemma 4.8 with X = G, Y = H and Z = F , so we have det F = det G+det H.
Similarly, the conditions of Lemma 4.8 are satisfied with X = A, Y = Q, Z = G and also with X = B, Y =
P, Z = H so we obtain
det G = det A + det Q and det H = det B + det P.
Notice also that each of F , P and Q has two identical rows, so by Lemma 4.7 they all have determinant 0.
Hence
0 = det F = det G + det H = det A + det Q + det P + det B = det A + det B
so that det A = − det B as required.
4.5 Determinants and Inverses
We saw in Section 1.6 that a 2 × 2 matrix is invertible if and only if its determinant is non-0. We now know
nearly enough to prove the corresponding statement for n × n matrices: just one more lemma is needed.
Lemma 4.9. Let M be an n × n reduced row echelon matrix over a field. Then either M = In or M has a
zero row and has determinant 0.
Proof. By Theorem 2.2, A is row equivalent to a reduced row echelon matrix M . By Theorem 3.4, A is
invertible if and only if M = In . By Lemma 4.9, M = In if and only if det M ≠ 0. But M is row equivalent to A, so by Corollary 4.5, det M ≠ 0 if and only if det A ≠ 0.
We also saw in Section 1.6 that for 2 × 2 matrices there is an explicit formula for the inverse in terms of
determinants. The formula generalises to larger matrices, although it is often too complicated to be useful
in practice:
Exercise. Check that if A is a 2 × 2 matrix, Cramer’s Rule simplifies to the formula from Section 1.6.
Proof. By strong induction on k. Suppose first k = 1. Then E1 = Eρ for some elementary row operation ρ.
Let M be an n × n matrix. Now Eρ M is obtained from M by the operation ρ. For each of the three types of
elementary row operation, we can check using the appropriate parts of Theorem 4.4 and Corollary 4.6 that
det(Eρ M ) = det Eρ det M . For example, if ρ is a row swap then by Theorem 4.4(i), det(Eρ M ) = − det M
and by Corollary 4.6(i) det(Eρ ) = −1.
Now let k ≥ 2 and suppose the statement is true for smaller k. Then applying the inductive hypothesis
three times, with the role of M played first by E1 M , then by M itself, and finally by E1 , we obtain:
Proof. Write A = Ek . . . E1 M where M is the reduced row echelon form of A and the Ei s are elementary
matrices. Then by Lemma 4.12:
Case 2: M 6= In . In this case A is not invertible and so by Theorem 4.10, det A = 0. Also, by Lemma 4.9
M has a row of zeros. It follows that M B has a row of zeros, and expanding along this row we see that
det(M B) = 0. Now
det(AB) = det(Ek . . . E1 ) det(M B) = 0 = (det A)(det B).
5 Vectors, Eigenvalues and Subspaces
5.1 Vectors and Matrices
Recall that real n-space is the set
Rn = {(a1 , . . . , an ) | a1 , . . . , an ∈ R}.
The elements of Rn are called (real) n-vectors, or just vectors when n is clear. More generally, if K is a
field then
K n = {(a1 , . . . , an ) | a1 , . . . , an ∈ K}.
is the space of n-vectors over K.
The Zero Vector. We write 0n for the zero vector (0, . . . , 0) ∈ K n , or just 0 where n is clear.
Vectors as Matrices. There is an obvious way to think about an n-vector (a1 , . . . , an ) as a 1 × n row
matrix (a1 a2 . . . an ); a vector thought of in this way we call a row n-vector. Similarly, a vector may be
thought of as an n × 1 column matrix, in which case we call it a column n-vector. Notice that each row
n-vector is the transpose of the same vector thought of as a column n-vector.
Vector Operations. We know how to add matrices of the same size (Section 1.1), and how to scale a
matrix by a scalar (Section 1.2). We can also define addition and scaling on vectors, by thinking of them as
row (or column) vectors: The resulting operations are:
Since row and column vectors are just matrices, we can also multiply them by matrices which have the right
size. For example, if v is a column n-vector and M is an n × n matrix13 then M v is defined (since the
number of columns in M equals the number of rows in v) and gives another column n-vector. On the other hand, vM is not defined14.
Definition. Let A be an n × n matrix over a field. We say that a scalar λ is an eigenvalue15 of A if there
is a non-zero column n-vector (that is, an n × 1 matrix) X such that
AX = λX.
Definition. If A is an n × n matrix then the characteristic polynomial of A is defined by
Remark. The symbol x here is a “formal variable”. What do we mean by A − xIn ? It is a matrix whose
entries are polynomials with variable x. Polynomials don’t actually form a field (exercise: which axioms
fail?) but the determinant of A − xIn is calculated just as if they did, by expanding along a row or column, and using
algebraic addition and multiplication of polynomials in place of the normal addition and multiplication of
numbers. What comes out is a polynomial (of degree n, as it happens) in the variable x.
and
χA (x) = det(A − xIn ) = det \begin{pmatrix} 3 - x & 1 \\ 1 & 3 - x \end{pmatrix} = (3 − x) × (3 − x) − 1 × 1 = (3 − x)^2 − 1 = x^2 − 6x + 8 = (x − 4)(x − 2).
I have factorised the polynomial so we can see what the roots are: notice that one of them is 4, which we
saw was an eigenvalue of A. In fact, this is a general phenomenon:
Theorem 5.1. Let A be an n × n matrix over a field K, and λ ∈ K. Then λ is an eigenvalue of A if and
only if χA (λ) = 0.
Proof. We have
Example (continued). Keeping A as above, we know that χA (λ) has roots 2 and 4. We already know that 4 is an eigenvalue, but Theorem 5.1 tells us that 2 is an eigenvalue as well. What are the corresponding eigenvectors? If X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} is one of them then AX = 2X, or equivalently, (A − 2In )X = 0_{2×1} , that is,
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} .
The solutions to this are x1 = t, x2 = −t, so the set of all eigenvectors corresponding to the eigenvalue 2 is:
\left\{ \begin{pmatrix} t \\ -t \end{pmatrix} \ \middle|\ t ∈ R, t ≠ 0 \right\} .
Notice how we exclude the case t = 0, since this gives the zero vector which by definition is not an eigenvector.
Remark. By the same line of reasoning, the collection of eigenvectors corresponding to a given eigenvalue
will always be the solution space to some system of homogeneous equations (minus the trivial solution,
which is the zero vector).
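The same eigenvalues and eigenvectors can be found numerically. A small sketch using numpy (my own illustration; numpy.linalg.eig returns floating-point approximations and scales each eigenvector to have norm 1):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
values, vectors = np.linalg.eig(A)        # eigenvectors are the columns of `vectors`
print(values)                             # approximately [4. 2.] (order may vary)
for lam, v in zip(values, vectors.T):
    print(np.allclose(A @ v, lam * v))    # True for each eigenpair
```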
Example. What are the eigenvalues and eigenvectors of A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}? This time
χA (x) = det(A − xI2 ) = det \begin{pmatrix} -x & 1 \\ -1 & -x \end{pmatrix} = (−x)(−x) − (1 × (−1)) = x^2 + 1
so the eigenvalues are the solutions to x2 + 1 = 0. If we regard A as a matrix over R (or Q) there are no
solutions in K, so A does not have any (real) eigenvalues. On the other hand, A is also a matrix over C
(every real number is also a complex number!), and this equation does have solutions in C, namely x = i
and x = −i. So A has (complex) eigenvalues i and −i. (Can you find the corresponding eigenvectors?)
v = λ1 w1 + λ2 w2 + · · · + λt wt .
(ii) span(S) is the set of all vectors which are linear combinations of w1 , . . . , wt , that is
span(S) = {µ1 w1 + µ2 w2 + · · · + µt wt | µ1 , . . . , µt ∈ K}
(i) Is v = (9, 2, 7) a linear combination of w1 and w2 ? In other words, can we find λ1 , λ2 ∈ R such that
v = λ1 w1 + λ2 w2 ? In other words, such that
This is a just a system of linear equations. Solving (e.g. by Gaussian elimination), we see that there
is a solution (λ1 = −3 and λ2 = 2). So the answer is YES, v is a linear combination of w1 and w2 .
(ii) Is v = (4, −1, 8) a linear combination of w1 and w2 ? This time v = λ1 w1 + λ2 w2 would mean
that is,
λ1 + 6λ2 = 4, 2λ1 + 4λ2 = −1 and − 1λ1 + 2λ2 = 8.
These equations have no solutions (check!), so this time v is not a linear combination of w1 and w2 .
span(S) = {λ1 w1 + λ2 w2 | λ1 , λ2 ∈ R}
= {(λ1 + 6λ2 , 2λ1 + 4λ2 , −1λ1 + 2λ2 ) | λ1 , λ2 ∈ R}
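Deciding whether a given vector is a linear combination of w1 and w2 is just solving the linear system above, so it can be automated. A sketch using numpy's least-squares routine (my own illustration): if multiplying back reproduces v exactly (up to rounding), then v lies in span(S).

```python
import numpy as np

w1, w2 = np.array([1.0, 2.0, -1.0]), np.array([6.0, 4.0, 2.0])
W = np.column_stack([w1, w2])                # columns are the spanning vectors

for v in (np.array([9.0, 2.0, 7.0]), np.array([4.0, -1.0, 8.0])):
    coeffs, *_ = np.linalg.lstsq(W, v, rcond=None)
    in_span = np.allclose(W @ coeffs, v)
    print(in_span, coeffs.round(6))
# True  [-3.  2.]  -- (9, 2, 7) = -3*w1 + 2*w2
# False ...        -- (4, -1, 8) is not a linear combination of w1 and w2
```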
Remark. Notice that the zero vector is always in span(S), since it can be written as
Remark/Definition. What if S is the empty set ∅? You might expect span(∅) to be the empty set, but
it turns out to be convenient to define
span(∅) = {0}.
so that, even in this case, the zero vector is always in the span.
5.4 Subspaces
(i) Rn of Rn .
(ii) {0n } of Rn .
(iii) {(λ, 0) | λ ∈ R} of R2 .
(iv) {(µ, µ) | µ ∈ R} of R2 .
(v) {(λ, 1) | λ ∈ R} of R2 .
Proof. We have seen that the zero vector is always in span(S), so span(S) is certainly not empty.
Let u, v ∈ span(S). Then
u = λ1 w1 + λ2 w2 + · · · + λt wt and v = µ1 w1 + µ2 w2 + · · · + µt wt
u + v = (λ1 w1 + λ2 w2 + · · · + λt wt ) + (µ1 w1 + µ2 w2 + · · · + µt wt )
= (λ1 + µ1 )w1 + (λ2 + µ2 )w2 + · · · + (λt + µt )wt
u = λ1 w1 + λ2 w2 + · · · + λt wt
for some λ1 , . . . , λt ∈ K, so
The following theorem gives many more examples of subspaces.
Theorem 5.4. Let M be an m × n matrix over a field K. Let U be the set of all n × 1 column vectors X
such that
M X = 0m×1 .
Then U is a subspace of K n .
so that λu ∈ U .
Thus, U is a subspace of K n .
Definition. The subspace U in Theorem 5.4 is called the kernel or nullspace of the matrix M . (More on
this in the second part of the course.)
Corollary 5.5. For any homogeneous system of m linear equations with n variables, the solution space is
a subspace of K n .
Proof. Because the constant terms are all zero, when we write the equations in matrix form (see Section 2.4)
they have the form
M X = 0m×1 ,
where M is m × n matrix of coefficients and X is the column matrix of variables. So the solution set is by
definition exactly the kernel of the matrix of coefficients M .
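For a concrete matrix over R the kernel can be computed numerically. A sketch (my own illustration) using scipy, whose null_space function returns an orthonormal basis of the kernel as the columns of an array; the matrix is the coefficient matrix of the homogeneous example from Chapter 2:

```python
import numpy as np
from scipy.linalg import null_space

M = np.array([[1.0, 7.0, -4.0],
              [2.0, 4.0, -1.0],
              [3.0, 1.0, 2.0]])
K = null_space(M)
print(K.shape[1])              # 1: the kernel of this M is a line through the origin
print(np.allclose(M @ K, 0))   # True: every column of K really is a solution
```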
Definition. S = {w1 , . . . , wt } ⊆ K n is a linearly independent set if the only way to write 0 as a linear
combination
0 = λ1 w1 + λ2 w2 + · · · + λt wt
is with λ1 = λ2 = · · · = λt = 0. Otherwise S is a linearly dependent set.
Example. In R2 , the set {(1, 0), (0, 1)} is linearly independent. It should be obvious that
so the only way this can be 0 is if λ1 = λ2 = 0. Intuitively, if you start at the origin and travel some non-
zero distance in the x-direction, you clearly can’t return to the origin by travelling only some distance (even
“backwards”) in the y-direction. In this sense, the x-direction and y-direction are independent directions,
which is why the vectors (1, 0) (in the x-direction) and (0, 1) (in the y-direction) are linearly independent.
Example. Let w1 = (1, −2, 3), w2 = (5, 6, −1), w3 = (3, 2, 1) and S = {w1 , w2 , w3 } ⊆ R3 . Is S linearly
dependent or linearly independent? To answer this we need to consider solutions to
λ1 + 5λ2 + 3λ3 = 0, −2λ1 + 6λ2 + 2λ3 = 0, 3λ1 − λ2 + λ3 = 0
λ1 = λ2 = −t/2 , λ3 = t
for t ∈ R. Putting t = 2, for example, gives a solution with λ1 , λ2 , λ3 not all zero, so S is linearly dependent.17
Example. Let w1 = (4, −1, 2), w2 = (−4, 10, 2) and S = {w1 , w2 } ⊆ R3 . Is S linearly dependent or
independent? This time we consider the solutions to
Here the only solution is λ1 = λ2 = 0 (check!), which means that S is linearly independent.
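Over R, questions of linear (in)dependence can also be settled numerically: a finite set of vectors is linearly independent exactly when the matrix having those vectors as columns has rank equal to the number of vectors. A sketch with numpy (my own illustration; "rank" is not a notion defined in this course yet):

```python
import numpy as np

def is_linearly_independent(vectors):
    A = np.array(vectors, dtype=float).T         # one column per vector
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent([(1, -2, 3), (5, 6, -1), (3, 2, 1)]))   # False (dependent)
print(is_linearly_independent([(4, -1, 2), (-4, 10, 2)]))             # True  (independent)
```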
(ii) span(S) = U
Example. Let S = {(4, −1, 2), (−4, 10, 2)}. We saw in the example at the end of Section 5.5 that S is
linearly independent, so S is a basis for span(S).
Theorem 5.6. Suppose U is a subspace of K n . Then U has a finite basis, and if B1 and B2 are two different
bases18 for U then |B1 | = |B2 |.
Definition. If U is a subspace of K n then the dimension of U is |B| where B is a basis for U . We write
dim U for the dimension of U . (Notice how Theorem 5.6 ensures that this definition makes sense: without
it, the dimension might depend on which basis B we choose!)
Lemma 5.7. Let U be a subspace of K n . If S ⊆ U , S is linearly independent and |S| = dim U then S is a
basis for U .
5.7 Geometry of Subspaces in Rn
An m-dimensional subspace of Rn basically looks like a copy of Rm sitting inside Rn and containing the
origin. In other words, it is a copy of m-space inside n-space, for example a line in space, a plane in space,
a 3-space in 4-space and so on.
Example. A 2-dimensional subspace of R3 is simply a plane (through the origin, since a subspace always
contains the zero vector). A 1-dimensional subspace is a line through the origin, while a 0-dimensional
subspace is a point (which has to be the origin!).
Remark. Intuitively, you can think of a basis as giving a system of m coordinates for describing points
in the subspace. For example, if U is a 2-dimensional subspace of R3 then it is a plane, so we should be able
to describe points on it with only 2 coordinates. Choose a basis {u, v} for U . Then each point in U can be
written as λu + µv in exactly one way, and we can think of (λ, µ) as its coordinates in the plane. Notice
how the basis vectors themselves are u = 1u + 0v and v = 0u + 1v, so they get coordinates (1, 0) and (0, 1)
respectively.
6 Orthogonality
In this chapter, we shall work only with vectors and matrices over R, and not over a general field. (The
reason for this will become apparent.)
Remark. If u and v are a column n-vectors, notice that hu | vi is just the entry of the 1 × 1 matrix uT v.
Proposition 6.1. Let u, v ∈ Rn . Then
(i) hu | vi = hv | ui;
(ii) hu + v | wi = hu | wi + hv | wi;
(iii) hλu | vi = λhu | vi = hu | λvi;
(iv) hv | vi ≥ 0, and hv | vi = 0 ⇐⇒ v = 0n .
Proof. Let u = (a1 , . . . , an ), v = (b1 , . . . , bn ) and w = (c1 , . . . , cn ).
(i) Clear from the definition.
(ii) u + v = (a1 + b1 , a2 + b2 , . . . , an + bn ), so
hu + v | wi = (a1 + b1 )c1 + (a2 + b2 )c2 + · · · + (an + bn )cn = (a1 c1 + · · · + an cn ) + (b1 c1 + · · · + bn cn ) = hu | wi + hv | wi.
Definition. For v = (a1 , a2 , . . . , an ) ∈ Rn , the (Euclidean) norm (also called the magnitude or length)
of v is
||v|| = √(hv | vi) = √(a1^2 + a2^2 + · · · + an^2 ).
Remark. The norm of v is just the distance from the origin to the point with coordinates given by
v. (Notice that in 2 dimensions this is Pythagoras’ Theorem!) In particular, note that ||v|| ≥ 0 and
||v|| = 0 ⇐⇒ v = 0n . If n = 1, say v = (v1 ), notice that ||v|| = √(v1^2 ) is just the absolute value |v1 | of the
entry v1 , that is, v1 if v1 ≥ 0 and −v1 if v1 < 0.
Remark. The norm gives us an algebraic way to talk about distance in Rn : the distance between the
points with coordinate vectors a and b is ||a − b||. This is the “usual” notion of the distance between two
points (there are others!) in Rn , as first studied in detail by Euclid.
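For readers who like to experiment, the inner product, norm and distance can all be computed directly in Python with numpy; this is just an illustrative sketch with made-up vectors, not part of the notes.

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([4.0, 0.0, -2.0])

print(np.dot(a, b))           # <a | b> = 4 + 0 - 4 = 0
print(np.sqrt(np.dot(a, a)))  # ||a|| = sqrt(1 + 4 + 4) = 3.0
print(np.linalg.norm(a - b))  # distance between the two points, sqrt(29)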
19. After Euclid of Alexandria, 4th-3rd century BCE. He didn’t invent inner products (vectors in the abstract sense we are studying are a 19th century innovation), but we’ll see shortly that this definition gives a notion of distance in Rn which makes it into the geometric space which Euclid studied. The inner product is sometimes called the scalar product (because it gives a scalar) or the dot product (because some people write it u · v instead of hu | vi).
20. For the first time in the whole course, we are using here a fact about the real numbers which isn’t true for fields in general. Actually property (iv) in the statement of the proposition does not really make sense over C (for example) since the order ≥ does not make sense on the complex numbers. It turns out that we can do something similar to the inner product for complex numbers (see Exercise Sheet 5) but not for fields in general.
6.2 Orthogonality and Orthonormality
Definition. Two vectors u and v are called orthogonal if hu | vi = 0. A set S ⊆ Rn is called an orthogonal
set if every pair of distinct vectors from S is orthogonal.
Remark. Geometrically, vectors are orthogonal if the lines from the origin to the corresponding points are
at right angles (or if at least one of them is the zero vector).
Example. Let u = (−1, 3, 2), v = (4, 2, −1) and w = (1, 1, 2). Then
hu | vi = (−1 × 4) + (3 × 2) + (2 × −1) = 0
so u and v are orthogonal. On the other hand,
hu | wi = (−1 × 1) + 3 × 1 + 2 × 2 = 6
so u and w are not orthogonal. Hence, the set {u, v} is orthogonal, but the set {u, v, w} is not. Also,
||u|| = √(hu | ui) = √((−1)^2 + 3^2 + 2^2 ) = √14.
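The pairwise checks in this example are easy to automate; the following Python sketch (not part of the notes) prints every pairwise inner product for the set {u, v, w} above.

import numpy as np
from itertools import combinations

vectors = {"u": np.array([-1, 3, 2]),
           "v": np.array([4, 2, -1]),
           "w": np.array([1, 1, 2])}

# Print the inner product of each pair of distinct vectors.
for (n1, x), (n2, y) in combinations(vectors.items(), 2):
    print(n1, n2, np.dot(x, y))  # u v 0, u w 6, v w 4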
Definition. A vector u ∈ Rn is called a unit vector if ||u|| = 1. A set S ⊆ Rn is called an orthonormal
set if S is an orthogonal set and every vector in S is a unit vector.
Example. In R3 the set {(1, 0, 0), (0, 0, 1)} is orthonormal (exercise: check!).
Lemma 6.3. Let v ∈ Rn and λ ∈ R. Then ||λv|| = |λ| ||v||.
Proof. Using Proposition 6.1(iii) again,
||λv|| = √(hλv | λvi) = √(λ^2 hv | vi) = |λ| √(hv | vi) = |λ| ||v||.
Corollary 6.4. If v ∈ Rn is a non-zero vector then (1/||v||) v is a unit vector.
Proof. Since ||v|| is positive, by Lemma 6.3 we have
|| (1/||v||) v || = |1/||v||| ||v|| = (1/||v||) ||v|| = 1.
Example. Let u = (0, 1, 0), v = (1, 0, 1) and w = (1, 0, −1) and let S = {u, v, w}. Then hu | vi = hu | wi =
hv | wi = 0 (exercise: check!) so S is an orthogonal set. Although ||u|| = 1 the set is not orthonormal
because ||v|| = ||w|| = √2 (exercise: again, check!). However, by Corollary 6.4 and Proposition 6.2 the set
{ u, (1/||v||) v, (1/||w||) w } = { (0, 1, 0), (1/√2, 0, 1/√2), (1/√2, 0, −1/√2) }
is orthonormal.
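Normalizing an orthogonal set, as in this example, is also easy to do by machine; the following sketch (not from the notes) rescales u, v and w and confirms that each resulting vector has norm 1.

import numpy as np

u = np.array([0.0, 1.0, 0.0])
v = np.array([1.0, 0.0, 1.0])
w = np.array([1.0, 0.0, -1.0])

for x in (u, v, w):
    unit = x / np.linalg.norm(x)       # Corollary 6.4: (1/||x||) x is a unit vector
    print(unit, np.linalg.norm(unit))  # each norm prints as 1.0 (up to rounding)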
6.3 Orthonormal Bases
Definition. Let U be a subspace of Rn and let S ⊆ U . We say that S is an orthonormal basis for U if
S is an orthonormal set and S is a basis for U .
Recall (from Section 5.6) that if S is a basis for U then we can write every vector in U as a linear
combination of vectors in S. In general, finding how to write a given vector in this way involves solving a
system of linear equations. When S is an orthonormal basis, however, things are much easier:
Lemma 6.5. Let U be a subspace of Rn and S = {v1 , v2 , . . . , vk } an orthonormal basis for U . Then for
every u ∈ U ,
u = hu | v1 iv1 + hu | v2 iv2 + · · · + hu | vk ivk .
Proof. Since S is a basis for U and u ∈ U we can write u = λ1 v1 + · · · + λk vk for some λ1 , . . . , λk ∈ R. Now
for each i,
hu | vi i = hλ1 v1 + λ2 v2 + · · · + λk vk | vi i
= λ1 hv1 | vi i + λ2 hv2 | vi i + · · · + λk hvk | vi i (by Proposition 6.1)
= λi hvi | vi i (because hvj | vi i = 0 for j ≠ i)
= λi (because hvi | vi i = 1)
Theorem 6.6. If S = {v1 , v2 , . . . , vk } ⊆ Rn is an orthonormal set then S is linearly independent.
Proof. Suppose
λ1 v1 + λ2 v2 + · · · + λk vk = 0n .
Recalling the definition of linear independence (see Section 5.5), we need to show that λ1 = · · · = λk = 0.
For each i we have
0 = h0n | vi i
= hλ1 v1 + · · · + λk vk | vi i
= λ1 hv1 | vi i + · · · + λk hvk | vi i (by Proposition 6.1)
= λi hvi | vi i (because hvj | vi i = 0 for j ≠ i)
= λi (because hvi | vi i = 1),
so λi = 0 for each i, as required.
(in fact orthonormal) set (check!) so, by Theorem 6.6, S is linearly independent. Since dim R3 = 3 = |S|,
Lemma 5.7 tells us that S is a basis (in fact, an orthonormal basis) for R3 .
Hence every element of R3 can be written as a linear combination of u, v and w, and because S is
an orthonormal basis, Lemma 6.5 gives us a practical way to do this. For example, consider x = (1, 1, 1) ∈ R3 .
We have:
(1, 1, 1) = x = hx | uiu + hx | viv + hx | wiw = u − (1/5) v + (7/5) w.
Remark. Recall from Section 5.7 that we can think of a basis for an m-dimensional subspace as giving a
system of m coordinates for describing points in the subspace. In particular, viewing R3 as a subspace of
itself, the basis S gives a different system of coordinates for R3 . The point which has “proper” coordinates
(1, 1, 1) has coordinates (1, −1/5, 7/5) in this new system of coordinates.
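Lemma 6.5 is easy to try out by machine. The sketch below (not part of the notes) uses the orthonormal set of three vectors found in the example in Section 6.2 (which is an orthonormal basis of R3 by Theorem 6.6 and Lemma 5.7) and recovers the coordinates of x = (1, 1, 1) purely from inner products.

import numpy as np

# Orthonormal basis of R^3 from the example in Section 6.2.
basis = [np.array([0.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]) / np.sqrt(2),
         np.array([1.0, 0.0, -1.0]) / np.sqrt(2)]

x = np.array([1.0, 1.0, 1.0])

coords = [np.dot(x, v) for v in basis]            # Lemma 6.5: coordinates are <x | v_i>
print(coords)                                     # [1.0, 1.414..., 0.0]
print(sum(c * v for c, v in zip(coords, basis)))  # reconstructs [1. 1. 1.]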
6.4 Gram-Schmidt Orthogonalization
We have seen that for understanding a subspace, it is useful to have an orthonormal basis. This naturally
leads to two questions: does every subspace have an orthonormal basis, and if so, how do we find it?
Theorem 6.7. Every subspace of Rn has an orthonormal basis.
By way of a proof, we will present an algorithm called the Gram-Schmidt Orthogonalization
Process, which answers the second question of how we find an orthonormal basis. The algorithm starts
with a basis for a subspace of Rn (recall that by Theorem 5.6, every subspace has one!) and produces from
it an orthonormal basis for the same subspace.
Let U be a subspace of Rn , and {u1 , . . . , uk } ⊆ Rn a (not necessarily orthonormal) basis for U .
Step 1. Set v1 = u1 .
Step k. Set
vk = uk − (huk | vk−1 i / hvk−1 | vk−1 i) vk−1 − · · · − (huk | v2 i / hv2 | v2 i) v2 − (huk | v1 i / hv1 | v1 i) v1 .
Again we find that vk ≠ 0, vk ∈ U and hvi | vk i = 0 for all i < k.
So we end up with non-zero vectors v1 , . . . , vk ∈ U such that each vj is orthogonal to each vi with i < j.
Thus, v1 , . . . , vk is an orthogonal (but not yet orthonormal!) subset of U . Now set
S = { (1/||v1 ||) v1 , (1/||v2 ||) v2 , . . . , (1/||vk ||) vk }.
Then S is orthogonal by Proposition 6.2 and the elements are unit vectors by Corollary 6.4, so S is or-
thonormal. Now by Theorem 6.6, S is linearly independent. Finally, each vector in S lies in U (since U is
closed under scalar multiplication) and |S| = k = dim U , so by Lemma 5.7, S is a basis (in fact an
orthonormal basis) for U .
Example. Let u1 = (1, 1, 1), u2 = (0, 3, 3) and u3 = (0, 0, 1). Consider the subspace
U = span({u1 , u2 , u3 }) ⊆ R3
of all linear combinations of these vectors. It is easy to check that these three vectors are linearly independent,
so they form a basis for U . However, they are neither orthogonal (check!) nor (except for u3 ) unit vectors,
so they are not an orthonormal basis. We apply the Gram-Schmidt process to compute an orthonormal
basis. First we set:
v1 = u1 = (1, 1, 1)
Now hu2 | v1 i = 6 and hv1 | v1 i = 3 so we set
v2 = u2 − (hu2 | v1 i / hv1 | v1 i) v1 = (0, 3, 3) − (6/3)(1, 1, 1) = (−2, 1, 1).
Next, hu3 | v1 i = 1, hv1 | v1 i = 3, hu3 | v2 i = 1 and hv2 | v2 i = 6, so we set
v3 = u3 − (hu3 | v2 i / hv2 | v2 i) v2 − (hu3 | v1 i / hv1 | v1 i) v1 = (0, 0, 1) − (1/6)(−2, 1, 1) − (1/3)(1, 1, 1) = (0, −1/2, 1/2).
Thus,
{v1 , v2 , v3 } = { (1, 1, 1), (−2, 1, 1), (0, −1/2, 1/2) }
is a basis of orthogonal vectors for the space U . To turn it into an orthonormal basis we just have to scale
to get unit vectors, so
{ (1/√3)(1, 1, 1), (1/√6)(−2, 1, 1), (1/√2)(0, −1, 1) }
is an orthonormal basis for U .
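As a sanity check on the computation above, here is a minimal Python (numpy) sketch of the Gram-Schmidt process; it is only an illustration, not part of the official notes, and the function name is made up.

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize, then normalize, a list of linearly independent vectors.
    orthogonal = []
    for u in vectors:
        u = np.array(u, dtype=float)
        # v_k = u_k minus the sum of (<u_k | v_i> / <v_i | v_i>) v_i over earlier v_i
        v = u - sum((np.dot(u, w) / np.dot(w, w)) * w for w in orthogonal)
        orthogonal.append(v)
    return [v / np.linalg.norm(v) for v in orthogonal]

for v in gram_schmidt([(1, 1, 1), (0, 3, 3), (0, 0, 1)]):
    print(v)
# Prints (up to rounding) (1,1,1)/sqrt(3), (-2,1,1)/sqrt(6), (0,-1,1)/sqrt(2).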
Remark. What is U in the previous example? In fact since {u1 , u2 , u3 } is a linearly independent set of size
3 = dim R3 it must be that U = R3 ! So what we have found is an orthonormal basis for R3 .
Exercise. Use the Gram-Schmidt process to convert {u1 , u2 , u3 } into an orthonormal basis for R3 where
Theorem 6.8. Let A be an n × n matrix with entries in R. Then the following are equivalent:
(i) A is orthogonal;
(ii) AT A = In ;
(iii) AAT = In ;
(iv) the columns of A (viewed as column n-vectors) form an orthonormal basis for Rn ;
(v) the rows of A (viewed as row n-vectors) form an orthonormal basis for Rn .
Proof. The equivalence of (i), (ii) and (iii) follows from Exercise 5.8.
Now let c1 , . . . , cn be the columns of A, viewed as column n-vectors. It follows from the definitions of
matrix multiplication, the transpose and the inner product that
(AT A)ij = (AT )i1 A1j + (AT )i2 A2j + · · · + (AT )in Anj = A1i A1j + A2i A2j + · · · + Ani Anj = hci | cj i
for all i and j. So AT A = In means exactly that
(AT A)ii = hci | ci i = 1 for all i and (AT A)ij = hci | cj i = 0 for i ≠ j,
which means exactly that {c1 , . . . , cn } is an orthonormal set. Since an orthonormal set is linearly independent
(Theorem 6.6) and dim Rn = n, Lemma 5.7 tells us this is the same as {c1 , . . . , cn } being an orthonormal
basis. This shows that (ii) ⇐⇒ (iv).
A very similar argument with rows instead of columns (exercise!) shows (iii) ⇐⇒ (v).
Remark/Warning. The terminology here is very confusing: an orthogonal matrix is one whose rows
(and columns) are orthonormal, not just orthogonal! (It might be better if orthogonal matrices were called
orthonormal matrices instead, but unfortunately this is an example of where illogical names have become
too ingrained to change.)
Remark. The equivalence of (iv) and (v) alone is quite remarkable. Given an orthonormal basis for Rn ,
write it down as the rows of an n × n matrix. Magically23 , the columns of the matrix will be another
orthonormal basis! For example, consider the orthonormal basis for R3 which we obtained in the example
in Section 6.4. Writing the basis vectors as the rows of a matrix we get
        [  1/√3    1/√3    1/√3 ]
    A = [ −2/√6    1/√6    1/√6 ] .
        [   0     −1/√2    1/√2 ]
Theorem 6.8 tells us both that A will be an orthogonal matrix (exercise: check!), and that the vectors
which make up the columns of A will form another orthonormal basis for R3 , namely:
( 1/√3, −2/√6, 0 ),   ( 1/√3, 1/√6, −1/√2 ),   ( 1/√3, 1/√6, 1/√2 ).
In fact, the rows and the columns of a matrix are often connected in deep and mysterious ways; this is an
example of a very general mathematical phenomenon called duality.
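Theorem 6.8 is also easy to check numerically for this particular A; the following sketch (not part of the notes) builds A from the Section 6.4 basis and verifies that both AT A and AAT are the identity, up to rounding error.

import numpy as np

# The orthonormal basis from Section 6.4, written as the rows of A.
A = np.vstack([np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
               np.array([-2.0, 1.0, 1.0]) / np.sqrt(6),
               np.array([0.0, -1.0, 1.0]) / np.sqrt(2)])

print(np.allclose(A.T @ A, np.eye(3)))  # True: the columns are orthonormal
print(np.allclose(A @ A.T, np.eye(3)))  # True: the rows are orthonormal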
23. Disclaimer: the University of Manchester does not really believe in magic. ;-)