MA100 MT Course Pack 2020-21
1 Orientation
1.1 General information
1.2 Teaching arrangements
1.3 Course materials
1.4 Current exam structure for MA100
1.5 Maple
1.6 Mathematical background
1.7 Syllabus
5 One-Variable Calculus, Part 3 of 7
5.1 Increasing and decreasing functions
5.2 Local extrema and strict local extrema
5.3 Stationary points
5.4 Tests for classifying stationary points
5.5 Convex and concave functions
5.6 Inflection points
5.7 Exercises for self study
5.8 Relevant sections from the textbooks
9.5 Relevant sections from the textbooks
11 Matrices, 2 of 3
11.1 Solving systems of n linear equations for n unknowns
11.2 Elementary row operations
11.3 Elementary matrices and row-equivalence
11.4 Theorems on matrix invertibility
11.5 Inversion algorithm based on elementary row operations
11.6 Exercises for self study
11.7 Relevant sections from the textbooks
12 Matrices, 3 of 3
12.1 Minors, cofactors and the determinant
12.2 The properties of the determinant
12.3 Calculating determinants using row operations
12.4 Inverting a matrix using cofactors and the determinant
12.5 Cramer's rule
12.6 Exercises for self study
12.7 Relevant sections from the textbooks
14 Developing Geometric Insight, 2 of 2
14.1 Visualising vectors and operations in R³
14.2 Planes in R³
14.3 Lines in R³
14.4 Geometric and algebraic approaches to linear systems
14.5 Exercises for self study
14.6 Relevant sections from the textbooks
20 Vector Spaces, 4 of 4
20.1 Inner product
20.2 Norm
20.3 The Cauchy-Schwarz inequality
20.4 Angle and orthogonality
20.5 Pythagoras' theorem and the triangle inequality
20.6 Orthonormality and the Gram-Schmidt process
20.7 Exercises for self study
20.8 Relevant sections from the textbooks
Preface
These lecture notes are intended as a self-contained study resource for the MA100 Mathematical Methods course at the LSE. At the same time, they are designed to complement
the MA100 course texts, ‘Linear Algebra, Concepts and Methods’ by Martin Anthony and
Michele Harvey, and ‘Calculus, Concepts and Methods’ by Ken Binmore and Joan Davies.
I am grateful to Martin Anthony and Michele Harvey for allowing me to use some materials
from their ‘Linear Algebra, Concepts and Methods’ textbook and to Michele Harvey for
commenting on a draft of the Calculus part of these lecture notes. I am also grateful to Siri
Kouletsis for her invaluable help with typing and editing the manuscript and for various
improvements to its content.
1 Orientation
1.1 General information
Lecturer:
Ioannis Kouletsis
Class teachers:
Abdulla Al-Othman, Phil Chan, Sam Fendrich, Jenjira Jaimunk, Ioannis Kouletsis, Siri
Kouletsis, Maria Molina-Domene, Kamila Nowakowicz, Ruilan Wang, Aled Williams, Michael
Yiasemides and George Zouros.
In a typical lecture video presentation, you are given a few minutes to attempt a part of
an exercise based on the corresponding section of the lecture notes and then you compare
your solutions to mine, presented immediately afterwards.
The classes:
These are weekly one-hour interactive sessions where you strengthen your understanding
of the course material with the help of your class teacher. You have a class per week,
starting in week 2 and finishing in week 11 of each term. Some of these classes take place
on campus and some are conducted online, but they all have the same structure and cover
the same material.
Each student is assigned to a class group of about fifteen students who usually follow
the same degree programme. Your personal timetable indicates the day and time of your
class along with the room number. Please note that class attendance is compulsory and is
recorded by your class teacher.
The class forums:
There is a forum dedicated to each class group where you can post questions about the self-study exercises and related theory (in preparation for your class) as well as clarifications
about the class exercises and other class materials (after your class). Your classmates can
answer your questions, and your class teacher will monitor the forum and intervene as
needed.
Class structure:
A typical class consists of three parts:
In the first part of the class, your class teacher will give you a ‘consolidation exercise’
in order to ensure that the main topics covered in the previous week’s class have been
understood. Your class teacher will provide the solution to this exercise afterwards. Please
feel free to ask your class teacher for further clarifications about the underlying concepts
and methods if needed.
In the second part of the class, the focus shifts to the most recent material; that is, the material corresponding to the previous week's lecture notes and videos. Your class
teacher will typically answer questions you may have about the theory, or expand on a
topic, or show you an application, and will then guide you through a set of ‘current topic
exercises’. As with the ‘consolidation exercise’, your class teacher will provide solutions to
these exercises afterwards.
Finally, in the third part of the class, you will initiate the ‘in-class practice exercise’.
As already mentioned, you can complete this exercise after class and should submit your
solution to your class teacher electronically. Please note that the ‘in-class practice exercise’
will always be based on the topics practised during the second part of the class.
Solutions to the class materials:
Solutions to the ‘consolidation exercise’, ‘current topic exercises’ and ‘in-class practice
exercise’ conducted in class during week n will be posted on MA100 Moodle at the end
of week n. I strongly recommend that you go through these solutions before the following
week’s class so that you do not leave any gaps in your understanding.
The class format and the class forums provide an opportunity for establishing stronger
connections with your class teacher and fellow students. Moreover, your active engagement
with all the class components allows the School to get to know you. A record of your class
attendance and weekly submission of work is kept throughout your studies and, along with
your exam results, forms the basis for future academic references.
Please keep in mind that some parts of the ‘consolidation exercises’, ‘current topic exercises’
and ‘in-class practice exercises’ are designed to be challenging, so do not worry too much
if you are not able to solve them all on your first attempt. This is natural and part of your
learning process. However, after the solutions to these exercises are discussed in class and
posted on the MA100 Moodle webpage, make sure that you fully understand them; that
is, solve the parts you did not manage to complete the first time around. This comment
applies to the ‘exercises for self study’ and all other exercises covered in the course as well;
it is not enough to read their solutions on MA100 Moodle – you need to keep attempting
these exercises until you have mastered the applicable theory and methods.
The pace of the course is manageable. However, if you fall behind the pace, the volume
of the unread material can progressively become overwhelming, especially since you have
other courses to study as well. It is therefore imperative to adopt good study habits from
the very beginning of the academic year; that is, from week 1. Do not make the common
mistake of thinking that a four-week break before the exams provides enough time to cover
unread material. At that point, you should only be refreshing material that you have
already practised thoroughly during the term.
Office hours:
You can find the office hours of the MA100 teaching staff posted on the Mathematics
Department site: https://www.lse.ac.uk/Mathematics
Email communication:
For all other issues related to the course, you can email me personally at [email protected].
Course texts:
• Martin Anthony and Michele Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press,
• Ken Binmore and Joan Davies, Calculus, Concepts and Methods, Cambridge University
Press.
Reference texts: The following books provide additional reading and can be found in
the library:
• H. Anton, Elementary Linear Algebra (or the Applications Version of this book by Anton
and Rorres),
• D. Lay, Linear Algebra and its Applications,
• S. L. Salas and E. Hille, Calculus, One and Several Variables,
• Schaum's Outline Series: Mathematics for Economists; Linear Algebra; Advanced Calculus; Differential and Integral Calculus.
Solutions to the ‘exercises for self study’: These refer to the exercises found at the
end of each section of the lecture notes. Each set of solutions will be posted on the MA100
Moodle webpage under the relevant week. The solutions are very detailed; in essence,
they re-teach the lecture material from a practical perspective in order to ensure that you
appreciate the links that exist between the theory and its applications.
Solutions to the ‘consolidation exercises’, the ‘current topic exercises’ and the
‘in-class practice exercises’: These exercises and their solutions will be posted on the
MA100 Moodle webpage at the end of the week in which they are introduced in class.
Solutions to the ‘assignments with exam-style questions’: The solutions to these
assignments will be posted on the MA100 Moodle webpage in weeks 5 and 9 of each term;
that is, just after the corresponding submission deadlines. The assignments themselves will
be posted a couple of weeks earlier; that is, in weeks 3 and 7 of each term.
Solutions to past exams: Past examination papers from the last three years can be found
on the MA100 Moodle webpage. Their solutions will be released on the MA100 Moodle
webpage in due course. Older examination papers, without solutions, can be found in the
library. Please note that the MA100 exam structure changed in the 2015/16 academic year.
The 2014/15 paper followed the old exam structure which was based on a single exam in
the summer term. The exam structure introduced in 2015/16 – which is also our current
exam structure and is described in detail in section 1.4 below – is based on two exams, one
held in January and one held in May.
Regarding past exams: Past examination papers are useful for familiarising yourself
with the style of questioning, but I would advise against using them for the purpose of
deducing which topics are likely to appear in the forthcoming exams. The key to success in
MA100 is having understood the entire course material and having mastered the examples
from the lecture notes, the exercises covered in the lectures and workshops, the ‘exercises
for self study’, the ‘consolidation exercises’, the ‘current topic exercises’ and the ‘in-class
practice exercises’.
1.4 Current exam structure for MA100
The 2020/21 exam structure for MA100 consists of two exams. The first exam counts 25%
towards your final MA100 grade and will take place in week 0 of the Lent term, in January
2021. It will examine material covered in lectures 2-16. The second exam counts 75%
towards your final MA100 grade. This will take place in the summer term, in May 2021,
and will examine the entire material of the course with the emphasis being placed on the
material covered in lectures 17-39. The earlier material is examined only indirectly in the
second exam in the sense that it forms the foundations on which the later material rests.
I will give you further details as the course progresses.
1.5 Maple
Maple is a computer program which is widely used for analysing, visualising and solving
mathematical problems. You can try the Maple tutorial on the MA100 Moodle webpage
on any of the computer facilities offered by the School. The relevant link is posted on
the MA100 Moodle webpage under ‘Maple’. Note that Maple is not part of the MA100
examinable material.
1.7 Syllabus
By the end of the academic year we will have covered a range of mathematical topics in Calculus and Linear Algebra and some of their applications to Economics and Statistics.
In the Michaelmas term, we will introduce the logical framework and methods of proof, one-variable calculus, Taylor polynomials, classification of stationary points, convexity, conics, general graph sketching, global optimisation, constrained optimisation, integration techniques, matrices, determinants, Gaussian elimination, linear Cartesian geometry, Cartesian and vector parametric descriptions of flats, systems of linear equations, real coordinate vector spaces, general vector spaces, vector subspaces, linear independence, linear span, basis, coordinates and dimension.
In the Lent term, we will start by discussing some special kinds of vector spaces required for analysing linear transformations and then proceed to transition matrices, linear transformations, similarity, eigenvalues and eigenvectors, algebraic and geometric multiplicities, diagonalisation and orthogonal diagonalisation, quadratic forms, multi-variate calculus, surfaces, contours, vertical sections, gradients, directional derivatives, constrained optimisation including Lagrange's method, vector-valued functions, the general chain rule, differential and difference equations including some partial differential equations and, finally, linear systems of differential and difference equations.
This brings us to the end of the introductory part of these lecture notes. Next, we get
started on our syllabus with a section on mathematical logic. This section is necessary for
understanding some methods of proof that we will encounter throughout the course.
A proposition is a statement which is either true (T) or false (F); for example: ‘14 is an even number.' The truth value of a proposition p is therefore one of two possibilities:
p
T
F
The negation of a proposition p is denoted by ¬p and is simply the proposition ‘not p’.
Its truth table follows:
p ¬p
T F
F T
Example 2.2.1 Let p be the proposition ‘4 is an even number’. Then ¬p is the proposition
‘4 is not an even number’. In this case, p is true and ¬p is false. On the other hand, let
p be the proposition ‘π is an even number’. Then ¬p is the proposition ‘π is not an even
number’. In this case, p is false and ¬p is true.
A compound proposition is a proposition that is built up from simpler propositions
using linking words such as and, or, if-then, if-and-only-if. The rest of this subsection
is concerned with some fundamental compound propositions and their truth tables.
The conjunction of two propositions p and q is denoted by p ∧ q and is the proposition
‘p and q’. The conjunction p ∧ q is true only if the propositions p and q are both true. Its
truth table is given below:
p q p∧q
T T T
T F F
F T F
F F F
Example 2.2.2 Let p be the proposition ‘11 is a multiple of 2’ and q be the proposition
‘16 is a multiple of 2’. The conjunction p ∧ q amounts to the proposition ‘11 and 16
are multiples of 2’. In this case, the conjunction p ∧ q is false since its first constituent
proposition is false.
The disjunction of two propositions p and q is denoted by p ∨ q and is the proposition ‘p
or q’. The disjunction p ∨ q is false only if the propositions p and q are both false. This
idea is reflected in the following truth table:
p q p∨q
T T T
T F T
F T T
F F F
Example 2.2.3 Let p be the proposition ‘14 is a multiple of 2’ and q be the proposition
‘14 is a multiple of 3’. The disjunction p ∨ q amounts to the proposition ‘14 is a multiple of
2 or 3’. In this case, the disjunction p ∨ q is true because its first constituent proposition
is true.
The conditional proposition denoted by p ⇒ q is the proposition ‘if p, then q’, also
referred to as ‘p implies q’. The conditional proposition is false only if p is true and q is
false. The truth table for p ⇒ q is given below:
p q p⇒q
T T T
T F F
F T T
F F T
Remark 2.2.4 The last two rows of the truth table for p ⇒ q express the idea that if the
premise p on which the conditional proposition p ⇒ q rests is false, the proposition itself
is a true proposition.
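If you would like to experiment with these definitions, truth tables can be generated mechanically. The following Python sketch is purely illustrative and not part of the course material; it encodes ‘p implies q' as (not p) or q, an encoding that reproduces the table above row by row (we establish this equivalence formally in subsection 2.3).

```python
from itertools import product

def label(b):
    # Print Python booleans in the T/F style used in the lecture notes.
    return "T" if b else "F"

# The connectives met so far, modelled as functions of Python booleans.
connectives = {
    "p AND q": lambda p, q: p and q,
    "p OR q": lambda p, q: p or q,
    "p => q": lambda p, q: (not p) or q,  # false only when p is T and q is F
}

for name, fn in connectives.items():
    print(f"\n p  q  {name}")
    for p, q in product([True, False], repeat=2):
        print(f" {label(p)}  {label(q)}  {label(fn(p, q))}")
```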
A few examples of conditional propositions are presented below:
Example 2.2.5 Let p be the proposition ‘16 is a perfect square’ and q be the proposition
‘64 is a perfect square’. Since both propositions p and q are true, the proposition p ⇒ q is
true.
Example 2.2.6 Let p be the proposition ‘13 is a perfect square’ and q be the proposition
‘64 is a perfect square’. Since p is false, the proposition p ⇒ q is true.
Example 2.2.7 Let p be the proposition ‘3 is an even number’ and q be the proposition
‘π is an even number’. Since p is false, the proposition p ⇒ q is true.
Example 2.2.8 Let p be the proposition ‘5 is an even number’ and q be the proposition
‘5 is not an even number’. Again, since p is false, the proposition p ⇒ q is true. There is
no contradiction in the proposition ‘if 5 is an even number, then 5 is not an even number’
because the premise on which it rests is false.
Example 2.2.9 Let p be the proposition ‘5 is not an even number’ and q be the proposition
‘5 is an even number’. Now, p is true and q is false, so the proposition p ⇒ q is false.
Example 2.2.10 Let p be the proposition ‘16 is an even number’ and q be the propo-
sition ‘the sum of the angles of a triangle is 180 degrees’. Since both p and q are true,
the proposition p ⇒ q is true. Note that the propositions p and q seem to be logically
disconnected here.
Example 2.2.11 Let p be the proposition ‘13 is an even number’ and q be the proposition
‘the sum of the angles of a triangle is 32 degrees’. Since p is false, the proposition p ⇒ q
is true.
Remark 2.2.12 A theorem of the form ‘p implies q’ is usually associated with a deductive
process for inferring the conclusion q from the assumed truth of the premise p. One may
wonder how the conditional proposition p ⇒ q can be relevant in capturing this idea. After
all, the truth value of the conditional p ⇒ q is entirely determined by the truth values of its
constituent propositions p and q. Whether or not we can reach q from p through a series of
mathematical deductions is not relevant for establishing the truth value of the conditional
p ⇒ q. In fact, in Examples 2.2.10 and 2.2.11, we cannot even imagine connecting the
propositions p and q. How can the conditional p ⇒ q be the right mathematical object to
represent the process of inferring q from p? The answer to this question lies in the fact that
a valid mathematical reasoning used in a proof of the form ‘p implies q’ cannot produce a
false proposition q by assuming the truth of a proposition p that is actually true. Any such
reasoning can only produce (i) a true proposition q by assuming the truth of a proposition
p that is actually true, (ii) a true proposition q by assuming the truth of a proposition p
that is actually false or (iii) a false proposition q by assuming the truth of a proposition
p that is actually false. In each of these cases, T ⇒ T , F ⇒ T or F ⇒ F , the validity of
the deductive argument and hence the validity of the resulting proposition ‘p implies q’ is
indeed captured by the truth value T of the conditional p ⇒ q.
Remark 2.2.13 The fact that a valid mathematical reasoning can never lead to a propo-
sition of the form ‘T implies F ’ provides the logical foundations for the so-called ‘proof by
contradiction’ which we will discuss among other methods of proof in subsection 2.5.
Example 2.2.14 Referring to Example 2.2.8 and Remark 2.2.12, let us construct a valid
mathematical argument that establishes the truth of the proposition ‘if 5 is an even number,
then 5 is not an even number’ in agreement with the truth value T of the corresponding
conditional. Note that we need to assume the truth of the premise ‘5 is an even number’
and infer the conclusion ‘5 is not an even number’.
Indeed, if 5 is an even number, we have that 5 = 2k for some integer k. Then, since
5 = 2(5) − 5, we can write that 5 = 2(2k) − 5 which implies that 5 = 4k − 6 + 1. Realising
that 4k − 6 is a multiple of 2, we obtain the equation 5 = 2(2k − 3) + 1 where 2k − 3 is an
integer. Thus, we deduce that 5 is not an even number. Note that this is an illustration of
the case where we produce a true proposition by assuming the truth of a proposition that
is actually false.
The biconditional proposition denoted by p ⇔ q is the compound proposition ‘p if and
only if q’. Its truth table is given below:
p q p⇔q
T T T
T F F
F T F
F F T
The converse of the conditional proposition p ⇒ q is the conditional proposition q ⇒ p. The truth tables for these propositions are displayed together below:
p q p⇒q q⇒p
T T T T
T F F T
F T T F
F F T T
Example 2.2.17 Let p be the proposition ‘4 is even’ and q be the proposition ‘5 is even’.
The converse of the conditional proposition ‘if 4 is even, then 5 is even’ is the conditional
proposition ‘if 5 is even, then 4 is even’. Here, p is true and q is false, so the conditional
proposition p ⇒ q is false and its converse proposition q ⇒ p is true.
The contrapositive of the conditional proposition p ⇒ q is the conditional proposition
(¬q) ⇒ (¬p). The truth tables for these propositions are displayed together below:
p q ¬p ¬q p⇒q (¬q) ⇒ (¬p)
T T F F T T
T F F T F F
F T T F T T
F F T T T T
Remark 2.2.18 Note that the truth table of the proposition p ⇒ q is identical to that
of its contrapositive proposition (¬q) ⇒ (¬p).
Example 2.2.19 Let p be the proposition ‘4 is even’ and q be the proposition ‘4 + 1 is
odd’. The contrapositive of the conditional proposition ‘if 4 is even, then 4 + 1 is odd’ is
the conditional proposition ‘if 4 + 1 is not odd, then 4 is not even’. Here, both p and q
are true and hence their negations ¬p and ¬q are both false. The conditional proposition
p ⇒ q and its contrapositive proposition (¬q) ⇒ (¬p) are both true.
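As a quick mechanical check of Remark 2.2.18 (illustrative only, using the same Python encoding of the connectives as before), the sketch below confirms that p ⇒ q and (¬q) ⇒ (¬p) agree on all four truth assignments, while the converse q ⇒ p does not:

```python
from itertools import product

def implies(a, b):
    # 'a => b' is false only when a is true and b is false.
    return (not a) or b

rows = list(product([True, False], repeat=2))

# The contrapositive agrees with p => q on every row of the truth table...
assert all(implies(p, q) == implies(not q, not p) for p, q in rows)

# ...whereas the converse q => p differs on some rows.
assert any(implies(p, q) != implies(q, p) for p, q in rows)

print("p => q and (not q) => (not p) are logically equivalent.")
```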
A final note on terminology: The conditional proposition ‘p ⇒ q’ is commonly known as
‘p is sufficient for q’ or, equivalently, as ‘q is necessary for p’. Indeed, provided that the
conditional proposition p ⇒ q is true, it must have one of the forms ‘T ⇒ T ’, ‘F ⇒ T ’ or
‘F ⇒ F ’. Hence, it is sufficient for p to be true in order for q to be true. However, it is
not necessary for p to be true in order for q to be true, because q can be true even if p is
false. On the other hand, it is necessary for q to be true in order for p to be true because
if this is not the case, then p has to be false. Also note that it is not sufficient for q to be
true in order for p to be true, because it may happen that q is true and p is false.
For similar reasons, the biconditional proposition p ⇔ q is known as ‘p is necessary and
sufficient for q’ or, equivalently, as ‘q is necessary and sufficient for p’.
Consider a proposition p and its double negation ¬(¬p). The following truth table shows that p and ¬(¬p) always have the same truth value:
p ¬p ¬(¬p)
T F T
F T F
In general, two propositions A and B that are built up from a single proposition p through
some logical operations such as ¬, ∧, ∨, ⇒, ⇔ are called logically equivalent if they
always have the same truth value; that is, for every truth value of p, the truth value of A
and the truth value of B are the same.
Remark 2.3.1 In the above example, the role of A is played by p and the role of B is
played by ¬(¬p). Note that both A and B are built up from p.
We can extend this idea and apply it to propositions A and B that are built up from two
propositions p and q.
Two propositions A and B that are built up from two propositions p and q are called
logically equivalent if they always have the same truth value; that is, for every pair of
truth values of p and q, the truth value of A and the truth value of B are the same.
Remark 2.3.2 Looking at the truth table preceding Remark 2.2.18, we realise that the
proposition A given by p ⇒ q and its contrapositive proposition B given by (¬q) ⇒ (¬p)
are logically equivalent. This fact can be very useful in some proofs as we will see in
subsection 2.5.
Example 2.3.3 In order to show that the biconditional proposition p ⇔ q is logically equivalent to the conjunction of the conditional propositions p ⇒ q and q ⇒ p, we construct the following truth table and focus on its last two columns:
p q p⇒q q⇒p p⇔q (p ⇒ q) ∧ (q ⇒ p)
T T T T T T
T F F T F F
F T T F F F
F F T T T T
Example 2.3.4 Similarly, the following truth table shows that the conditional proposition p ⇒ q is logically equivalent to the proposition (¬p) ∨ q:
p q ¬p p⇒q (¬p) ∨ q
T T F T T
T F F F F
F T T T T
F F T T T
Example 2.3.5 We also show that the proposition ¬(p ∨ q) is logically equivalent to the proposition (¬p) ∧ (¬q):
p q p∨q ¬(p ∨ q) ¬p ¬q (¬p) ∧ (¬q)
T T T F F F F
T F T F F T F
F T T F T F F
F F F T T T T
Remark 2.3.6 Similarly, we can show that the proposition ¬(p ∧ q) is logically equivalent to the proposition (¬p) ∨ (¬q). The equivalence of these two propositions and the equivalence of the propositions given in Example 2.3.5 are collectively known as de Morgan's Laws.
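Both of de Morgan's laws can be confirmed by brute force over the four truth assignments. A minimal Python sketch, for illustration only:

```python
from itertools import product

for p, q in product([True, False], repeat=2):
    # First law: not(p or q) is logically equivalent to (not p) and (not q).
    assert (not (p or q)) == ((not p) and (not q))
    # Second law: not(p and q) is logically equivalent to (not p) or (not q).
    assert (not (p and q)) == ((not p) or (not q))

print("Both de Morgan laws hold for every truth assignment.")
```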
Consider the statement ‘n is an even number.' This is a predicate that depends on a single variable n. For simplicity, let us assume that the variable n is an integer and let us denote this predicate by P(n). We can now construct
propositions such as P (1), P (4) and P (21) whose truth values can be determined. P (1) is
false, P (4) is true and P (21) is false.
Remark 2.4.1 In general, we cannot assign a truth value to a predicate P (n) before
we specify the value of the variable n. However, this does not exclude the possibility
of constructing predicates whose truth values can be determined irrespectively of n. For
example, let n be an integer, and consider the conditional statement ‘if n is even, then n+1
is odd’. This is a predicate P (n) whose truth value is T for all n. Indeed, for every even
value of n, P (n) becomes a proposition of the form ‘T ⇒ T ’, which is true. For every odd
value of n, P (n) becomes a proposition of the form ‘F ⇒ F ’, which is also true. Hence,
for all n, P (n) is true. Similarly, the conjunction ‘n is even and n is odd’ is a predicate
P (n) whose truth value can be determined irrespectively of n. In this case, P (n) is false
for all n.
An existential statement is a statement which expresses the existence of at least one
object of a certain kind which has a particular property. Examples of existential statements
are given below:
An existential statement can be written in the form ∃n[P (n)]. The symbol ∃ is called the
existential quantifier and simply means ‘there exists’. The property P (n) satisfied by n
is enclosed in the square brackets.
Example 2.4.2 The existential statement ‘there exists a real number n such that n² = 17' can be written in the form ‘∃ real number n[n² = 17]'. Similarly, the existential statement ‘there exists an integer m such that 3 < m < 12' can be written in the form ‘∃ integer m[3 < m < 12]'.
Remark 2.4.3 The existential statements given in Example 2.4.2 are both true. In order
to prove them, we simply need to find at least one real number n and at least one integer
m with the required properties. In general, in order to prove that an existential statement
is true we just need to find an example.
A universal statement is a statement which expresses the fact that all objects of a certain
kind have a particular property. Examples of universal statements are the following:
A universal statement can be written in the form ∀n[P (n)]. The symbol ∀ is called the
universal quantifier and simply means ‘for all’. The property P (n) satisfied by n is
enclosed in the square brackets.
Example 2.4.4 The universal statement ‘for all integers n, 2n is an even number' can be written in the form ‘∀ integers n[2n is even]'. Similarly, the universal statement ‘for all integers m, if m² + m is even, then m is even' can be written in the form ‘∀ integers m[(m² + m is even) ⇒ (m is even)]'.
Remark 2.4.5 The first statement in Example 2.4.4 is true and is straightforward to prove by considering separately the case where n is odd and the case where n is even. The second statement in Example 2.4.4 is false. In order to establish that it is false we need to find a counterexample. That is, we need to find an integer m such that m² + m is even and m is odd, thereby obtaining a conditional proposition of the form T ⇒ F which is false. We can easily see that m = 1 provides such a counterexample. In general, in order to prove that a universal statement is false we just need to find a counterexample.
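A counterexample such as m = 1 can also be found by a mechanical search. In the Python sketch below (illustrative only; the search range is an arbitrary choice), every odd m turns up, because m² + m = m(m + 1) is even for all integers m:

```python
def is_even(k):
    return k % 2 == 0

# Counterexamples to 'for all m, (m^2 + m even) => (m even)' are integers m
# for which the premise is true but the conclusion is false.
counterexamples = [m for m in range(-10, 11)
                   if is_even(m * m + m) and not is_even(m)]

print(counterexamples)  # includes m = 1, as found in Remark 2.4.5
```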
In preparation for some methods of proof that we will discuss in subsection 2.5 and also
later in the course, let us write down the negation of an existential statement as well as
that of a universal statement:
The negation of the existential statement ∃n[P (n)] is the universal statement
∀n[¬P (n)].
The negation of the universal statement ∀n[P (n)] is the existential statement
∃n[¬P (n)].
Remark 2.4.6 Equivalently, we can say that ¬(∃n[P(n)]) is logically equivalent to ∀n[¬P(n)] and that ¬(∀n[P(n)]) is logically equivalent to ∃n[¬P(n)].
Example 2.4.7 Let n be an integer. The negation of the existential statement ‘there exists an n such that n² = 7' is the universal statement ‘for all n, n² ≠ 7'.
Example 2.4.8 Let n be a natural number. The negation of the universal statement ‘for all n, n² ≥ n' is the existential statement ‘there exists an n such that n² < n'.
Example 2.5.2 Let us prove the universal statement ‘for all integers n, if n² is even, then n is even'.
A direct proof here seems to be a difficult task. We can imagine starting it by letting n² = 2k for some integer k, but then it is difficult to see how to proceed from there. We cannot claim that n = ±√(2k) is even because the only thing that we know about k is that it is an integer, and not all integers k make √(2k) an even number.
Instead, let us prove this statement by contraposition. Letting p be the predicate ‘n² is even' and q be the predicate ‘n is even', we need to prove the logically equivalent statement (¬q) ⇒ (¬p); that is, we need to show that ‘if n is not even, then n² is not even'. Note that, given that n is an integer, n² is also an integer, so if they are not even they must be odd. Thus, we are left to prove the statement ‘if n is odd, then n² is odd' whose truth can be established by a direct proof. To this end, we let n = 2s + 1 for some integer s. Then, n² = (2s + 1)² = 4s² + 4s + 1 = 2(2s² + 2s) + 1, so n² has the form 2k + 1 where k = 2s² + 2s is an integer. Therefore, n² is odd and our task has been completed.
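The identity (2s + 1)² = 2(2s² + 2s) + 1 at the heart of this proof is easy to spot-check numerically. A small Python sketch, for illustration only:

```python
# For odd n = 2s + 1, the proof shows that n^2 = 2k + 1 with k = 2s^2 + 2s.
for s in range(-5, 6):
    n = 2 * s + 1
    k = 2 * s * s + 2 * s
    assert n * n == 2 * k + 1  # the identity used in the proof
    assert (n * n) % 2 == 1    # hence n^2 is odd

print("For every odd n tested, n^2 is odd.")
```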
A proof by example or counterexample is a convenient method for either proving an existential statement ‘∃n[P(n)]' or disproving a universal statement ‘∀n[P(n)]'. Recall from subsection 2.4 that disproving a universal statement ‘∀n[P(n)]' amounts to proving the existential statement ‘∃n[¬P(n)]'.
Example 2.5.3 Let us prove the existential statement ‘there exists an integer n such that 2n = n²'.
An example suffices: n = 2 is such an integer.
Example 2.5.4 Let us disprove the universal statement ‘for all integers n, 2n ≠ n²'.
A counterexample suffices: n = 2. Note that disproving this statement amounts to prov-
ing its negation, which is logically equivalent to the existential statement encountered in
Example 2.5.3. Our ‘counterexample’ here is called an ‘example’ there.
A proof by contradiction is based on the following idea. Suppose that we want to prove
that a statement s is true. Instead of showing directly that the truth value of s is T , we
assume that s is false; in other words, we assume that ¬s is true. Our aim is to produce a
valid mathematical argument based on the assumed truth of the premise ¬s which leads to
some statement r that is false. The resulting conditional statement (¬s) ⇒ r is based on a
valid mathematical deduction so it is a true statement, as explained in Remark 2.2.12. As
such, (¬s) ⇒ r can only have one of the forms ‘T ⇒ T ’, ‘F ⇒ T ’ or ‘F ⇒ F ’. Given that
the truth value of r is F , we conclude that our statement (¬s) ⇒ r has the form ‘F ⇒ F ’.
In this way we deduce that the truth value of the premise ¬s is F and hence the truth value of s is T!
In the special case where the statement s whose truth we want to establish has the con-
ditional form ‘p ⇒ q’, we need to assume that ‘p ⇒ q’ is false; that is, we need to assume
that ‘¬(p ⇒ q)' is true. Using the fact established in subsection 2.3 that ‘p ⇒ q' is logically equivalent to ‘(¬p) ∨ q', we see that ‘¬(p ⇒ q)' is logically equivalent to ‘¬((¬p) ∨ q)'. In turn, this is logically equivalent to ‘p ∧ (¬q)' by one of de Morgan's laws. Thus,
in order to prove by contradiction that ‘p ⇒ q’ is true, we need to assume that ‘p ∧ (¬q)’
is true (equivalently, that p is true and q is false) and then produce a valid mathematical
argument that leads to some statement r that is false.
Let us apply this idea to the universal statement encountered in Example 2.5.2 in order to
prove its validity by contradiction.
Example 2.5.5 Show that for all integers n, if n² is even, then n is even.
This is a conditional statement of the form ‘p ⇒ q' where p is the predicate ‘n² is even' and q is the predicate ‘n is even'. As explained above, in order to establish the truth of this conditional statement by contradiction, we need to assume that ‘n² is even' and ‘n is not even' and produce a valid mathematical argument that leads to some statement r that is false.
We first use the fact that n is an integer, so if it is not even, then it is odd. Therefore, we assume that ‘n² is even' and ‘n is odd', and our aim is to reach a contradiction in the form of some false statement r.
Assuming that n is odd, let n = 2s + 1 for some integer s. Based on that assumption, we see that n² = (2s + 1)² = 4s² + 4s + 1 = 2(2s² + 2s) + 1 is of the form 2k + 1 where k = 2s² + 2s is an integer, so n² is odd. However, we have also assumed that n² is even. Thus, we have reached the statement r that ‘n² is both odd and even' which is clearly false. Our task has been completed.
q, (p ∧ q) ∨ q, and (p ∨ q) ∧ q
and
∀ odd n [(2n + n³) is odd].
Show that both these statements are true.
Exercise 2.6.3 (i) Use a truth table to prove that the proposition ‘¬(p ∧ q)’ is logically
equivalent to the proposition ‘(¬p) ∨ (¬q)’.
(ii) Let p be the proposition ‘4 is an even integer’ and q be the proposition ‘5 is an even
integer’. Express in words the propositions ‘¬(p∧q)’ and ‘(¬p)∨(¬q)’ in order to appreciate
their logical equivalence.
Exercise 2.6.4 (i) Use a truth table to show that the proposition ‘(q ⇒ r) ∧ (q ⇒ s)’ is
logically equivalent to the proposition ‘q ⇒ (r ∧ s)’.
(ii) Let r be a proposition whose truth value is F and let s be a proposition whose truth
value is T . Suppose that the truth value of the proposition ‘(q ⇒ r) ∧ (q ⇒ s)’ is T . What
can you say about the truth value of q?
Exercise 2.6.6 For integers n, consider the universal statement ‘∀n [(n² + 3n) is odd]'.
(i) Prove that this statement is false.
For integers n, consider the existential statement ‘∃n [(n² + 3n) is odd]'.
(ii) Use the definitions given in Exercise 2.6.5 to prove that this statement is false as well.
Hint: Prove instead that its negation is true.
2.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Some paragraphs on logic can be found on pages 8 and 9 of our Algebra Textbook as part of
its preliminaries. I would also recommend that you read through the entire preliminaries,
from page 1 to page 9.
• N. Biggs, Discrete Mathematics, Oxford University Press.
If you have a copy of this book, chapters 1 and 3 can be useful as additional reading.
The cardinality |X| of a set X is the number of elements of X. So, if A = {5, 6, 9}, its cardinality is |A| = 3. There is a unique set with no elements. This is denoted by ∅ and called the empty set. The cardinality of the empty set is zero. Some sets have infinitely many elements and therefore infinite cardinality. These are called infinite sets. The set N = {1, 2, 3, ...} of all natural numbers and the set Z = {..., −2, −1, 0, 1, 2, ...} of all integers are both infinite sets. Another infinite set for which a special symbol is reserved is the set Q of all rational numbers. These are numbers which can be written in the form a/b, where a and b ≠ 0 are elements of Z. The set R of real numbers is the set of all numbers that can be written as decimals. R consists of the elements of Q and all the irrational numbers. Irrational numbers are decimals such as √2, π, and e which cannot be written as ratios of integers. Later in the course, we will encounter the set C of complex numbers. These are numbers of the form x + iy, where x and y are elements of R and i is defined by the property that i² = −1.
Let us now focus on the set R. A subset I of R is called an interval if, whenever it
contains two real numbers, it contains all the real numbers between them. An interval can
be visualised as a line segment. It may include both its end points, one of them, or none
of them. Typical intervals are illustrated below:
In all these cases, a and b are called the endpoints of the interval regardless of whether they
belong to it or not. Any point of an interval that is not an endpoint is called an interior
point. Note that ∞ and −∞ are not real numbers but symbols. These are neither interior
points nor endpoints. A square bracket implies that the endpoint is included in the interval
while a circle bracket implies that the endpoint is excluded from the interval. The symbols
∞ and −∞ are associated with circle brackets; for example, we have R = (−∞, ∞). An
interval of the form (a, b) is called an open interval and an interval of the form [a, b] is
called a closed interval. We will not use any special name for intervals of the form (a, b]
or [a, b).
sets X, Y and the rule f have been specified, it does not matter what letters we choose to
denote these variables. For example,
f : [4, 7) → (1, 9] defined by y = f(x) = √x
and
f : [4, 7) → (1, 9] defined by f(s) = √s
are equivalent descriptions of f . Note that the second description does not even involve a
letter for the dependent variable; the rule f is equally clear without it.
The set of outputs of f : X → Y is called the range of f and is denoted by R(f ) ⊆ Y :
Note that the notation (a, b) for a point in R² coincides with the notation (a, b) for an open interval. These concepts are of course unrelated.
It is helpful to visualise R2 and its elements. To this end, we introduce the so-called
Cartesian plane. The Cartesian plane is a plane equipped with a horizontal x-axis and a
vertical y-axis. These axes intersect at a point called the origin of the Cartesian plane. The
element (a, b) ∈ R2 is represented by the point in the Cartesian plane whose x-coordinate
is a and y-coordinate is b. Thus, the element (0, 0) ∈ R2 corresponds to the origin. The
Cartesian plane is depicted below:
Figure 3.3.1
G(x, y) = 0 and H(x, y) = 0, respectively. S_F corresponds to a diagonal line in the Cartesian plane passing through the origin (0, 0), S_G consists of only the origin (0, 0) and S_H corresponds to a vertical line passing through the point (2, 0). These subsets are illustrated
below:
Figure 3.3.3
Figure 3.3.5
Figure 3.3.6
Remark 3.3.7 Figure 3.3.6 captures the following idea: A subset S_G ⊆ R² defined by G(x, y) = 0 represents a function if every vertical line in R² intersects S_G at most once.
Figure 3.4.1
Remark 3.4.2 A function is injective if every horizontal line in R² intersects its graph at most once.
Note that if we restrict the domain and the codomain of g, h and k, we can construct
corresponding bijective functions:
Figure 3.4.3
3.5 The derivative
The derivative measures the response of a dependent variable y to changes in an indepen-
dent variable x. It generalises the concept of the slope of a line.
Let us start with the graph of a line in R² described by the Cartesian equation y = mx + C.
The slope m of this graph determines the response of the dependent variable y to changes
in the independent variable x in the following sense: Consider any point on this graph.
If its x-coordinate is changed by an amount h, its y-coordinate must be changed by an
amount mh in order for this point to remain on the graph:
Figure 3.5.1
Given any two points (x0, mx0 + C) and (x0 + h, mx0 + mh + C) on this graph, the slope of the graph is defined by the ratio
δy/δx = (mx0 + mh + C − mx0 − C)/(x0 + h − x0) = mh/h = m,
where δx is the change in the x-coordinate and δy is the corresponding change in the y-coordinate.
In the case of a graph which is not a line, the response of y to x may vary from point to
point. Therefore, the definition of the slope must be modified so that it can be applied to
an individual point.
Figure 3.5.2
For this purpose, let us select a random point (x0 , f (x0 )) on the graph of some random
function f . This function is assumed to be differentiable at the point (x0 , f (x0 )) in a
sense to be explained in subsection 3.6. For the time being, our task is to define the slope
of this graph at the point (x0 , f (x0 )). As illustrated below, we introduce a nearby point
(x0 + h, f (x0 + h)) and investigate what happens to the slope of the line segment joining
the points (x0 , f (x0 )) and (x0 + h, f (x0 + h)) as we bring (x0 + h, f (x0 + h)) closer to
(x0 , f (x0 )):
Figure 3.5.3
In general, the line segment joining (x0 , f (x0 )) and (x0 +h, f (x0 +h)) does not coincide with
the piece of the graph between (x0 , f (x0 )) and (x0 + h, f (x0 + h)) because the latter may be
a curve. However, this piece looks straighter and straighter as the point (x0 + h, f (x0 + h))
moves closer and closer to the original point (x0 , f (x0 )).
Figure 3.5.4
In the limit as h tends to zero, the graph of f between the points (x0 , f (x0 )) and
(x0 + h, f (x0 + h)) effectively becomes a straight line, thereby allowing us to define the
slope of the curve y = f (x) at the individual point (x0 , f (x0 )). More precisely, we define
the slope at the point (x0 , f (x0 )) by the expression
lim_{h→0} [f(x0 + h) − f(x0)] / h
and call it the derivative of f at x0 .
The derivative of f at x0 is denoted by various symbols, among them f'(x0), Df(x0), (df/dx)(x0), (dy/dx)(x0) and y'(x0).
Remark 3.5.5 It should be emphasised that the limit h → 0 does not mean that
h = 0. It means that h becomes exceedingly small while remaining non-zero. Hence, the
numerator and the denominator of the fraction [f(x0 + h) − f(x0)]/h are not zero and their
ratio is not undefined. Still, there is no guarantee that the limit of this ratio will exist for
all points in the domain of a function f . Whether this limit exists or not depends both
on the function f and the particular point considered. We will go through some examples
in subsection 3.6 where the derivative does not exist at a given point (x0 , f (x0 )) because
taking the limit there fails to produce a unique finite answer.
First, let us apply this process to a case where the derivative does exist.
Example 3.5.6 Let us calculate the derivative of the function f(x) = x² at any given point x0 ∈ R. Using the definition, we have
f'(x0) = lim_{h→0} [f(x0 + h) − f(x0)]/h = lim_{h→0} [(x0 + h)² − x0²]/h = lim_{h→0} h(2x0 + h)/h.
Now recall that h is not actually zero; it only tends to zero. Hence, since h ≠ 0, we eliminate the h terms and obtain the expression
f'(x0) = lim_{h→0} (2x0 + h) = 2x0.
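The limit in this example can be watched numerically: as h shrinks, the difference quotient at x0 approaches 2x0. A short Python sketch (illustrative only; the choice x0 = 3 is arbitrary):

```python
def f(x):
    return x * x

x0 = 3.0  # the derivative at this point should approach 2 * x0 = 6

for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    quotient = (f(x0 + h) - f(x0)) / h  # the difference quotient from the definition
    print(f"h = {h:<8} (f(x0 + h) - f(x0))/h = {quotient:.6f}")
```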
Figure 3.5.7
Having defined the derivative f'(x), we can now define the tangent line to the graph of y = f(x) at the point (x0, f(x0)) as the line which passes through (x0, f(x0)) and has slope equal to f'(x0). A Cartesian equation for this line is
y = f(x0) + f'(x0)(x − x0).
Example 3.5.8 Let us find a Cartesian equation for the line in R² which is tangent to the graph of y = x² at the point (3, 9). By applying the above formula with x0 = 3, f(3) = 9 and f'(3) = 2(3) = 6, we get y = 9 + 6(x − 3); that is, y = 6x − 9.
Theorem 3.6.1a If a function is not continuous at a given point, then it is not differen-
tiable there either.
We will not go into the details of this theorem in this course, but do also note its contra-
positive version:
Theorem 3.6.1b If a function is differentiable at a given point, then it is continuous
there as well.
In the third example below, the function h is continuous but not differentiable at the given point because neither the left nor the right derivative exists there. The relevant calculations are performed below each figure:
Example 3.6.2 The modulus function f (x) = |x| is defined by f (x) = x when x ≥ 0
and by f (x) = −x when x < 0. This function has the following graph:
Figure 3.6.3
It is clear from this graph that f is continuous at x = 0 and at any other point, but it is
not differentiable at x = 0. Indeed, as we approach the origin from the left, the slope of the
line segment that joins the ‘approaching point’ and the origin is −1, but as we approach
the origin from the right, the slope of the corresponding line segment is 1. Hence, there is
no unique finite value for the slope of the graph at the point 0, which makes the function
non-differentiable there. Let us confirm this by direct calculation:
For a positive ε, the left derivative at the point 0 is given by
lim_{ε→0} [f(0) − f(0 − ε)]/ε = lim_{ε→0} (0 − ε)/ε = lim_{ε→0} (−1) = −1.
The fact that f (−ε) = −(−ε) = ε follows from the definition of the modulus function.
Similarly, for a positive ε, the right derivative at the point 0 is given by
lim_{ε→0} [f(0 + ε) − f(0)]/ε = lim_{ε→0} (ε − 0)/ε = lim_{ε→0} (1) = 1.
If you are wondering why the calculation of the left derivative of f can be performed using
a positive ε as opposed to a negative one, then note the following:
Starting with the general definition of the derivative of f at the point x0 , that is,
lim_{h→0} [f(x0 + h) − f(x0)]/h,
and using x0 = 0 and a negative h, we obtain the following formula for the left derivative
of f at 0:
lim_{h→0} [f(0 + h) − f(0)]/h.
If we now replace the negative h by a positive ε in the above formula, that is, if we use
h = −ε, we arrive at the alternative formula
lim_{ε→0} [f(0 − ε) − f(0)]/(−ε)
which is indeed identical to the formula
lim_{ε→0} [f(0) − f(0 − ε)]/ε
used above.
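These one-sided limits can be mimicked numerically. In the Python sketch below (illustrative only), the left and right difference quotients of the modulus function at 0 sit at −1 and 1 for every ε, so no single slope exists there:

```python
def f(x):
    return abs(x)  # the modulus function

for eps in [0.1, 0.01, 0.001]:
    left = (f(0) - f(0 - eps)) / eps    # left difference quotient at 0
    right = (f(0 + eps) - f(0)) / eps   # right difference quotient at 0
    print(f"eps = {eps:<6} left = {left}  right = {right}")
```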
Example 3.6.4 The function g is given by g(x) = x + 1 when x > 0 and by g(x) = x
when x ≤ 0. Its graph is presented below:
Figure 3.6.5
This graph shows that g is not continuous at x = 0, so on the basis of Theorem 3.6.1a,
it cannot be differentiable there either. Indeed, as we approach the origin from the left,
we see that the slope of the line segment that joins the ‘approaching point’ and the origin
is 1. However, as we approach the origin from the right, the line segment joining the
‘approaching point’ (ε, ε + 1) and the origin becomes vertical, so its slope tends to infinity.
Since there is no unique finite value for the slope of the graph at the point 0, the function
is non-differentiable there. The following calculation confirms our conclusion:
For a positive ε, the left derivative at 0 is given by
lim_{ε→0} [g(0) − g(0 − ε)]/ε = lim_{ε→0} (0 − (−ε))/ε = lim_{ε→0} (1) = 1.
Similarly, for a positive ε, the right derivative at 0 is given by
lim_{ε→0} [g(0 + ε) − g(0)]/ε = lim_{ε→0} [(ε + 1) − 0]/ε → ∞.
Example 3.6.6 The function h is given for all x by h(x) = x^(1/3). Its graph is sketched
below:
Figure 3.6.7
Exercise 3.7.2 For this question, use the formula for the derivative at a general point x,
lim_{h→0} [f(x + h) − f(x)]/h,
to show that the derivative of the function
f(x) = x³ − 4x
is the function
f'(x) = 3x² − 4.
Exercise 3.7.4 (i) Use the definition of the derivative at a general point x, namely
f'(x) = lim_{h→0} [f(x + h) − f(x)]/h,
to show that the derivative of the function
f(x) = 2x² + 3x + 1
is the function
f'(x) = 4x + 3.
(iv) Use the formula for the derivative at a point x0, namely
lim_{h→0} [g(x0 + h) − g(x0)]/h,
to calculate the left and right derivatives of g at x0 = 1. Confirm that your findings are
consistent with your answer to the previous part.
where k ∈ Z. These results should be memorised as they will be used frequently in this
course.
and so on. For example, the derivative of the function
f(x) = cos(x) − x⁵ + 3
is given by
f'(x) = −sin(x) − 5x⁴.
and so on. For example, the derivative of the function
f(x) = 5x cos(x)eˣ
is given by
f'(x) = 5 cos(x)eˣ − 5x sin(x)eˣ + 5x cos(x)eˣ.
The quotient rule
The quotient rule allows us to calculate the derivative of the ratio of two functions:
[f(x)/g(x)]' = [f'(x)g(x) − f(x)g'(x)] / [g(x)]².
The chain rule
Consider the function
y(x) = cos(x⁷ + x⁵).
This function is not an elementary function, so its derivative is not among the list presented in subsection 4.1. Either we have to differentiate y(x) from first principles or we can simply regard y(x) as the composition v(w(x)) of the elementary functions v(w) = cos(w) and w(x) = x⁷ + x⁵,
whose derivatives are known. Formally, the composition of two functions v and w is the
function denoted by v ◦ w and defined by
(v ◦ w)(x) = v(w(x)).
This expression is meaningful provided that x belongs to the domain of w and that the
range of w is a subset of the domain of v.
The chain rule tells us that the derivative of the composition y(x) = v(w(x)) is the product of the individual derivatives according to
dy/dx = (dv/dw)(dw/dx).
By applying the chain rule to our current functions and expressing the final answer in terms of x, we obtain
dy(x)/dx = −sin(w)(7x⁶ + 5x⁴) = −sin(x⁷ + x⁵)(7x⁶ + 5x⁴).
Remark 4.2.4 In practice, it is worth dispensing with all the intermediate labels such as w. Note that w represents the circle bracket (x⁷ + x⁵) and that the chain rule effectively allows us to treat this bracket as a variable. Therefore, we can write down the final answer directly:
[cos( )]' = −sin( )( )' = −sin(x⁷ + x⁵)(7x⁶ + 5x⁴).
The empty brackets are purely to illustrate the structure of the chain rule; they are not needed either.
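Derivatives like this one can also be checked symbolically. The sketch below uses the third-party Python library SymPy purely as an illustration (Maple, mentioned in subsection 1.5, would serve equally well); it is not part of the course material.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.cos(x**7 + x**5)

# SymPy applies the chain rule automatically; the result matches
# -sin(x^7 + x^5) * (7x^6 + 5x^4) from the text.
print(sp.diff(y, x))
```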
The larger the number of compositions of functions required, the greater the convenience gained by avoiding intermediate labels. For example, the derivative of the function
y(x) = [sin(ln(x³ + x²))]¹⁸
can be written down directly in this way. The same applies to the function f(x) = x⁴ ln[sin(x⁶eˣ)]. Its derivative can be obtained by combining the chain rule and the product rule:
f'(x) = 4x³ ln[sin(x⁶eˣ)] + x⁴ (1/sin(x⁶eˣ)) cos(x⁶eˣ)(6x⁵eˣ + x⁶eˣ).
4.3 Higher order derivatives
The second derivative of a twice differentiable function f is simply the derivative of its derivative:
f''(x) = [f'(x)]'.
Similarly, the third derivative of a thrice differentiable function f is the derivative of its second derivative:
f'''(x) = [f''(x)]',
and so on. The most convenient notation for the nth derivative of f is Dⁿf(x).
Example 4.3.1 Using the sum, product, quotient and chain rules, let us find the second derivative D²f(x) of the function
f(x) = x⁴ sin(3x) / (x² + 5).
4.4 Taylor polynomials and Taylor series
A real polynomial of degree n is a function that can be written in the form
P(x) = a₀ + a₁x + a₂x² + ... + aₙxⁿ,
where a₀, a₁, ..., aₙ are real numbers and aₙ ≠ 0.
Given a function f with the property that all its derivatives up to Dⁿf exist, it is possible to find a polynomial Pₙ whose graph closely resembles the graph of f in the neighbourhood of a given point a. This is known as the Taylor polynomial of f about a of degree n and is defined by:
Pₙ(x) = f(a) + Df(a)(x − a) + (1/2!)D²f(a)(x − a)² + ... + (1/n!)Dⁿf(a)(x − a)ⁿ.
The characteristic property of Pₙ is that all its derivatives up to the nth derivative evaluated at a are equal to the corresponding derivatives of f at a. That is,
Pₙ(a) = f(a), DPₙ(a) = Df(a), ..., DⁿPₙ(a) = Dⁿf(a).
For example, let us find the first three Taylor polynomials of the function f(x) = eˣ about the point 0. Its derivatives are
f(x) = eˣ, Df(x) = eˣ, D²f(x) = eˣ.
Evaluating these derivatives at the point 0, we obtain
f(0) = 1, Df(0) = 1, D²f(0) = 1.
Substituting these values into the formula for the Taylor polynomial P₂(x), we find
P₂(x) = 1 + (x − 0) + (1/2!)(x − 0)².
Therefore, the first three Taylor polynomials of f about 0 are:
P₀(x) = 1,
P₁(x) = 1 + x,
P₂(x) = 1 + x + (1/2)x².
These are sketched on the same graph as f, below. Note that the Cartesian equation
y = P₀(x) = f(a)
describes the horizontal line that passes through the point (a, f(a)) on the graph of f, while the Cartesian equation
y = P₁(x) = f(a) + Df(a)(x − a)
describes the tangent line to the graph of f at the point (a, f(a)).
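The quality of these approximations near the expansion point is easy to tabulate. A Python sketch (illustrative only) compares P₀, P₁ and P₂ for f(x) = eˣ about 0 with the exact values:

```python
import math

# The first three Taylor polynomials of e^x about 0, from the text.
def P0(x): return 1.0
def P1(x): return 1.0 + x
def P2(x): return 1.0 + x + x * x / 2.0

for x in [-0.5, -0.1, 0.0, 0.1, 0.5]:
    print(f"x = {x:5.2f}  e^x = {math.exp(x):8.5f}  "
          f"P0 = {P0(x):8.5f}  P1 = {P1(x):8.5f}  P2 = {P2(x):8.5f}")
```

The closer x is to 0 and the higher the degree, the better the approximation, exactly as the sketches of the graphs suggest.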
If all the derivatives of f exist, we can let n tend to ∞ and define the Taylor series of f about a given point a as the infinite sum
P∞(x) = f(a) + Df(a)(x − a) + (1/2!)D²f(a)(x − a)² + (1/3!)D³f(a)(x − a)³ + ....
A question arises: does the Taylor series converge to f(x)? In other words, is P∞(x) the same function as f(x)?
On most occasions, the answer is ‘yes': the Taylor series converges to f(x) for all x in the domain of f or at least for some set of values of x that includes a. For those values of x, we indeed have
f(x) = P∞(x) = f(a) + Df(a)(x − a) + (1/2!)D²f(a)(x − a)² + (1/3!)D³f(a)(x − a)³ + ....
However, there are cases where the series P∞ (x) converges to f (x) only when x = a and
hence loses its usefulness away from a, and there are also cases where P∞ (x) converges
to a function other than f (x). We will not encounter any such exceptional cases in this
module.
The following example is concerned with a typical Taylor polynomial, the corresponding
Taylor series and its convergence.
Example 4.4.2 Let us derive the Taylor polynomial Pₙ(x) about 0 for the function
f(x) = 1/(1 − x),
where x ≠ 1.
First, we calculate a few derivatives in order to spot a pattern. By the chain rule, we have
f(x) = (1 − x)⁻¹,
Df(x) = (1 − x)⁻²,
D²f(x) = 2(1 − x)⁻³,
D³f(x) = 3!(1 − x)⁻⁴,
D⁴f(x) = 4!(1 − x)⁻⁵,
and, in general, Dⁿf(x) = n!(1 − x)⁻⁽ⁿ⁺¹⁾. Evaluating these derivatives at the point 0, we obtain
f(0) = 1, Df(0) = 1, D²f(0) = 2, D³f(0) = 3!, D⁴f(0) = 4!, ..., Dⁿf(0) = n!.
Substituting these values into the formula for the Taylor polynomial Pₙ(x), we find
Pₙ(x) = 1 + (x − 0) + (1/2!)·2(x − 0)² + (1/3!)·3!(x − 0)³ + ... + (1/n!)·n!(x − 0)ⁿ.
Thus, simplifying this expression, we obtain the approximate equation
1/(1 − x) ≈ Pₙ(x) = 1 + x + x² + x³ + · · · + xⁿ.
In order to investigate the convergence of the corresponding Taylor series, let us multiply both sides of this approximation by 1 − x. We get
1 ≈ (1 − x)(1 + x + x² + x³ + · · · + xⁿ),
which, upon simplification, reduces to the following approximate equation:
1 ≈ 1 − xⁿ⁺¹.
Now recall that the Taylor series is derived from the Taylor polynomial Pₙ by letting n → ∞. As long as −1 < x < 1, we have that xⁿ⁺¹ → 0 as n → ∞. Hence, the right hand side ‘1 − xⁿ⁺¹' of the above equation approaches the left hand side ‘1'. Accordingly, the error in the approximation of f(x) by Pₙ(x) becomes exceedingly small as n → ∞, and the Taylor series
P∞(x) = 1 + x + x² + x³ + x⁴ + . . .
converges to f(x). On the other hand, if x ≤ −1 or x > 1, the expression ‘1 − xⁿ⁺¹' does not approach ‘1' as n → ∞, which means that the Taylor series does not converge to f(x).
For example, if we evaluate f(x) at x = 2 we find
f(2) = 1/(1 − 2) = −1,
while if we evaluate P∞(x) at x = 2 we find
P∞(2) = 1 + 2 + 2² + 2³ + 2⁴ + · · · → ∞.
Note that this is an example of a Taylor series which converges to f (x) for some set of
values of x that includes the point of expansion a.
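The two convergence behaviours in Example 4.4.2 can be seen side by side by computing partial sums. In the Python sketch below (illustrative only), the partial sums approach f(0.5) = 2 but run away from f(2) = −1:

```python
def partial_sum(x, n):
    # P_n(x) = 1 + x + x^2 + ... + x^n
    return sum(x**k for k in range(n + 1))

for x in [0.5, 2.0]:
    print(f"x = {x}: f(x) = 1/(1 - x) = {1.0 / (1.0 - x)}")
    for n in [5, 10, 20]:
        print(f"  P_{n}({x}) = {partial_sum(x, n)}")
```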
Exercise 4.5.1 Differentiate the following functions with respect to x:
(a) tan(x), (b) eˣ sin(x³ − 1), (c) (x² + 1)/(x² + x + 1), (d) −x/(x² + x + 1).
(e) Why is the derivative of the function in (c) the same as the derivative of the function in (d)?
(f) Find a Cartesian equation in R² that describes the tangent line to the graph of the function
f(x) = eˣ sin(x³ − 1)
at the point (1, 0) on this graph. Note that f(x) is the function encountered in (b).
Exercise 4.5.2 (i) Find the quadratic Taylor polynomial P₂(x) of the function
f(x) = x³e⁻ˣ.
(ii) Find the Taylor series P∞(x) of the function
g(x) = cos(x)
about the point a = 0. You can assume that P∞(x) converges to g(x) for all x.
Exercise 4.5.3 (i) Using the product and chain rules, differentiate the function with respect to x.
(ii) Hence, find the Taylor polynomial P₁(x) of the function f(x) about the point a = 1 and also find a Cartesian equation in R² for the tangent line to the graph of f at the point (1, 2cos(e + 6)).
Exercise 4.5.4 (i) Find the quadratic Taylor polynomial P₂(x) of the function
f(x) = √x − ln(x).
(ii) Find the Taylor series P∞(x) of the function
g(x) = sin(x)
about the point a = 0. You can assume that P∞(x) converges to g(x) for all x.
(iii) Use your result from part (ii) to find the Taylor series P∞(x) of the function
h(x) = sin(x³)
about the point a = 0. Hint: This should also follow from part (ii).
A function f is said to be increasing on an interval I if
f(x2) ≥ f(x1)
whenever x1 ∈ I, x2 ∈ I and x2 > x1. It is said to be strictly increasing on I only if
f(x2) > f(x1)
whenever x1 ∈ I, x2 ∈ I and x2 > x1. Similarly, f is decreasing on I if f(x2) ≤ f(x1), and strictly decreasing on I if f(x2) < f(x1), whenever x1 ∈ I, x2 ∈ I and x2 > x1.
Remark 5.1.1 Some books may use the terms ‘non-decreasing’ and ‘non-increasing’ to
describe what we call ‘increasing’ and ‘decreasing’, respectively. They may also use the
terms ‘increasing’ and ‘decreasing’ instead of ‘strictly increasing’ and ‘strictly decreasing’.
Keep this in mind, especially if you are using various sources when studying.
Given a differentiable function f on an open interval I, the following statements are true:
(i) f is increasing on I if and only if f′(x) ≥ 0 for all x ∈ I,
(ii) if f′(x) > 0 for all x ∈ I, then f is strictly increasing on I,
(iii) f is decreasing on I if and only if f′(x) ≤ 0 for all x ∈ I,
(iv) if f′(x) < 0 for all x ∈ I, then f is strictly decreasing on I.
Remark 5.1.2 Note that the second statement does not claim ‘if and only if’. This is because a differentiable function f on an open interval I may satisfy the non-strict inequality f′(x) ≥ 0 and still be a strictly increasing function on I. An example is the function f(x) = x³ on R. Its derivative f′(x) = 3x² is not positive everywhere on R because it vanishes at 0. Still, f is a strictly increasing function on R because x2³ > x1³ whenever x1 ∈ R, x2 ∈ R and x2 > x1. A similar remark applies to the fourth statement above.
Example 5.1.3 Let us show that the function f(x) = x² is
(i) strictly decreasing on the interval I1 = (−9, −1) and
(ii) strictly increasing on the interval I2 = (0, 6).
Indeed, observe that f is differentiable on these open intervals and that f′(x) = 2x.
Since f′(x) < 0 for all x ∈ (−9, −1), f is strictly decreasing on I1. Similarly, since f′(x) > 0
for all x ∈ (0, 6), f is strictly increasing on I2 .
Example 5.1.4 Let us show that the constant function f (x) = 5 is both increasing and
decreasing on R but is neither strictly increasing nor strictly decreasing on R.
Indeed, we have that
f (x2 ) ≥ f (x1 ) and f (x2 ) ≤ f (x1 )
whenever x1 ∈ R, x2 ∈ R and x2 > x1 . It follows that f is both increasing and decreasing
on R.
A local maximum of a function f : I → R, where I is an open interval, is a point a ∈ I with the property that
f(a) ≥ f(x)
for all x ‘sufficiently near’ a. The qualification ‘sufficiently near’ can be made precise: there exists a real number δ > 0 such that f(a) ≥ f(x) for all x in the subset (a − δ, a + δ) of I.
A strict local maximum of a function f : I → R is a point a ∈ I with the property that
f(a) > f(x)
for all x ‘sufficiently near’ a such that x ≠ a. In other words, there exists a real number δ > 0 such that f(a) > f(x) for all x in the subset (a − δ, a) ∪ (a, a + δ) of I.
Similarly, a local minimum of f : I → R is a point a ∈ I with the property that
f(a) ≤ f(x)
for all x ‘sufficiently near’ a, and a strict local minimum is a point a ∈ I such that f(a) < f(x) for all x ‘sufficiently near’ a with x ≠ a.
Local maxima and minima are known collectively as local extrema. A strict local extremum is necessarily a local extremum. However, the converse of this statement is not true: there exist local extrema which are not strict.
The following figure illustrates some local extrema graphically. Note that they are all strict
local extrema:
Figure 5.2.1
A stationary point of a differentiable function f : I → R is a point a ∈ I satisfying
f′(a) = 0.
For any such point a, the tangent line to the graph of f at (a, f(a)) is a horizontal line described by the Cartesian equation
y = f(a).
Figure 5.3.1
A stationary point of f : I → R which is neither a local minimum nor a local maximum is
called a stationary inflection point.
Note that a local maximum or a local minimum of a general function f may not be a
stationary point of f . Indeed, in Example 5.2.3, the point x = 0 is a strict local minimum
of the modulus function, but it is not a stationary point of this function because the latter
is not differentiable at x = 0.
Theorem 5.3.2 Let f be a differentiable function on an open interval I. If a point a ∈ I is a local maximum or a local minimum of f, then a is a stationary point of f.
Proof Let us prove this theorem by contradiction. Referring to subsection 2.5, we need to assume the negation of this conditional statement — that is, we need to assume that a is a local maximum or a local minimum of f and that a is not a stationary point of f — and reach a contradiction.
Indeed, assuming that a is not a stationary point of f, we have f′(a) ≠ 0. Then the slope
of the graph of f at a is either positive or negative. If this slope is positive, any nearby
point x to the right of a will satisfy f (x) > f (a) and any nearby point x to the left of a will
satisfy f (x) < f (a). Similarly, if this slope is negative, any nearby point x to the right of
a will satisfy f (x) < f (a) and any nearby point x to the left of a will satisfy f (x) > f (a).
Thus, in all cases, we end up contradicting the other assumption that we made, namely,
that a is a local minimum or a local maximum. This completes our proof.
Example 5.3.3 Find the local extrema of the function f(x) = x⁴ − 3x² on R.
The logically equivalent contrapositive version of Theorem 5.3.2 informs us that if x is not
a stationary point of f , then x is not a local extremum of f . Therefore, the only candidates
for such extrema are the stationary points of f. These are solutions of the equation
4x³ − 6x = 0.
Factorising this equation according to 2x(2x² − 3) = 0, we find that f has stationary points at
x = 0 and x = ±√3/√2.
Note that some of these candidates may not be local extrema; they may be stationary
inflection points. We will return to this example and complete it in the next subsection.
5.4 Tests for classifying stationary points
Let I be an open interval and let f : I → R be a twice-differentiable function on I. Also
let a be a stationary point of f .
The Second-Derivative Test: The second-degree Taylor polynomial of f about a provides an approximation for f in the neighbourhood of a:
f(x) ≈ f(a) + (1/2!) f″(a)(x − a)².
Given that (x − a)² > 0 for all x ≠ a, it follows that the sign of f″(a) coincides with the sign of the difference f(x) − f(a) for all x near a such that x ≠ a. Using the definitions of
local extrema and strict local extrema encountered in subsection 5.2, we deduce that the
stationary point a is:
(i) a local minimum (in fact, a strict local minimum) if f″(a) > 0,
(ii) a local maximum (in fact, a strict local maximum) if f″(a) < 0.
On the other hand, if f″(a) = 0, the second-degree Taylor polynomial at a merely tells us that f(x) ≈ f(a). Hence, it gives us no information about the difference f(x) − f(a) other than the fact that this difference vanishes to that particular degree of accuracy. In other words,
(iii) the second-derivative test fails if f″(a) = 0.
In order to deal with case (iii), a Taylor polynomial of degree higher than 2 is needed. However, f may not be more than twice differentiable at a, and even if it is, it is generally
unclear how high the degree of the Taylor polynomial should be in order for this polynomial
to yield conclusive information about the behaviour of f near a. In any case, it is simpler
to deduce the character of the point a by evaluating the function f at nearby points x1 < a
and x2 > a and comparing f (x1 ) and f (x2 ) with f (a).
The First-Derivative Test: Alternatively, we can classify the stationary point a of such
a function f : I → R by considering the behaviour of the first derivative of f near a. This
test is always conclusive and can be summarised by the following rules:
(a) if f′(x) changes sign from negative to the left of a to positive to the right of a, then a is a local minimum (in fact, a strict local minimum),
(b) if f′(x) changes sign from positive to the left of a to negative to the right of a, then a is a local maximum (in fact, a strict local maximum),
(c) if f′(x) does not change sign at a, then a is a stationary inflection point.
Example 5.4.1 Let us use the test based on the first derivative in order to classify the stationary points of the function f(x) = x⁴ − 3x² encountered in Example 5.3.3. Let us then complete that example by identifying the local extrema of f.
Continuing from Example 5.3.3, the derivative f′ can be factorised completely as follows:
f′(x) = 4x (x − √3/√2)(x + √3/√2).
The way in which the sign of f′ behaves over the corresponding subsets of the domain of f is depicted below:
−∞ < x < −√3/√2 : −,   −√3/√2 < x < 0 : +,   0 < x < √3/√2 : −,   √3/√2 < x < ∞ : +.
It follows by the first-derivative test that −√3/√2 is a strict local minimum, 0 is a strict local maximum and √3/√2 is a strict local minimum. Therefore, all the stationary points of f are local extrema and, in fact, strict local extrema.
Example 5.4.2 Consider (i) the constant function f : R → R given by f(x) = 5, (ii) the function g : R → R given by g(x) = x³ and (iii) the function h : R → R given by h(x) = x⁴. Let us show that the second-derivative test fails at all the stationary points of these functions and then classify these points in an alternative way.
(i) The derivative of f is given by f′(x) = 0. Since f′ vanishes everywhere on R, every point a ∈ R is a stationary point of f. The second derivative of f is given by f″(x) = 0 and therefore also vanishes everywhere on R. It follows that the second-derivative test fails at each stationary point a of f. In order to classify these points, we observe that the graph of f is a horizontal line. Therefore, each a ∈ R is both a local maximum and a local minimum but is neither a strict local maximum nor a strict local minimum.
(ii) The derivative of g is given by g′(x) = 3x², so the point 0 is the only stationary point of g. The second derivative of g is given by g″(x) = 6x. Since g″(0) = 0, the second-derivative test fails. In order to classify this stationary point, we observe that g(x) < g(0) for x < 0 and that g(x) > g(0) for x > 0. Hence the stationary point at 0 is neither a local minimum nor a local maximum, so it is a stationary inflection point. Alternatively, we observe that g′ does not change sign at 0. Hence, by the first-derivative test, we reach the same conclusion.
(iii) The derivative of h is given by h′(x) = 4x³, so 0 is the only stationary point of h. The second derivative of h is given by h″(x) = 12x². Since h″(0) = 0, the second-derivative test fails. In order to classify this stationary point, we observe that h(x) > h(0) for all x ≠ 0. Hence 0 is a local minimum and, in fact, a strict local minimum. Alternatively, we note that h′(x) < 0 for x < 0 and h′(x) > 0 for x > 0. By the first-derivative test, we confirm our previous conclusion.
Example 5.4.3 Confirm that the point 0 is a stationary point of the function f(x) = cos(x) and then classify this point by using the Taylor polynomial
P2(x) = 1 − x²/2
of f about 0.
The first derivative of f is given by f′(x) = −sin(x). Since f′(0) = −sin(0) = 0, the point 0 is indeed a stationary point of f.
Using the Taylor polynomial given in the question, we obtain the approximation
f(x) ≈ P2(x) = 1 − x²/2.
Noting that f(0) = 1, we deduce that for all x sufficiently near zero such that x ≠ 0 the
difference f(x) − f(0) satisfies
f(x) − f(0) ≈ −x²/2 < 0.
Hence the stationary point 0 is a local maximum and, in fact, a strict one.
For comparison, let us classify this stationary point by using the second-derivative test. In this case, this test is conclusive and provides the fastest approach.
The second derivative of f is given by
f″(x) = [−sin(x)]′ = −cos(x).
Since f″(0) = −cos(0) = −1 < 0, the second-derivative test confirms that 0 is a strict local maximum of f.
Setting this expression equal to zero, we see that a stationary point must satisfy
(e^x − 1)² = 0 or e^x = 0.
The second equation has no solution and the first equation has a unique solution when e^x = 1; that is, when x = 0. Hence the point 0 is the only stationary point of f.
5.5 Convex and concave functions
A function f defined on an interval I is called convex only if the line segment joining any
two points on the graph of f lies above or on this graph. It is called strictly convex if
the line segment joining any two (distinct) points on the graph of f lies above this graph.
Similarly, a function f defined on an interval I is called concave only if the line segment
joining any two points on the graph of f lies below or on this graph. It is called strictly
concave if the line segment joining any two (distinct) points on the graph of f lies below
this graph. Various cases of convex and concave functions are illustrated below. Note that
the first graph corresponds to a strictly convex function:
Figure 5.5.1
An alternative description of concave and convex functions utilises the idea of a convex set: A convex set in R² is a subset S of R² such that for any two points (x1, y1) and (x2, y2) in S, the line segment joining (x1, y1) and (x2, y2) lies entirely in S. A few illustrations of convex and non-convex sets in R² are given below:
Figure 5.5.2
Using this idea, we define a convex function f : R → R as a function with the property that the set S of points lying above or on the graph of f in R² is a convex set. Similarly, we define a concave function f : R → R as a function with the property that the set S of points lying below or on the graph of f in R² is a convex set.
Figure 5.5.3
Example 5.5.4 Without sketching its graph, visualise the function f : R → R given by f(x) = x². The set of points lying above or on the graph of f in R² is a convex set. Hence, f is a convex function. On the other hand, the function g : R → R given by g(x) = −x² is a concave function, because the set of points lying below or on the graph of g in R² is a convex set.
Note that the definitions of convexity and concavity do not require the function f to be
differentiable on the interval I. However, if f is differentiable on an open interval I, we
have the following additional description:
A differentiable function f on an open interval I is convex if and only if its derivative f′(x) is an increasing function on I; that is,
f′(x2) ≥ f′(x1)
whenever x1 ∈ I, x2 ∈ I and x2 > x1. In this case, all the tangent lines of the graph of f lie below or on this graph.
Similarly, a differentiable function f on an open interval I is concave if and only if its derivative f′(x) is a decreasing function on I; that is,
f′(x2) ≤ f′(x1)
whenever x1 ∈ I, x2 ∈ I and x2 > x1. In this case, all the tangent lines of the graph of f lie above or on this graph.
Figure 5.5.5
A twice-differentiable function f on an open interval I is convex if and only if its second derivative f″(x) is non-negative on I; that is,
f″(x) ≥ 0
for all x ∈ I.
Similarly, a twice-differentiable function f on an open interval I is concave if and only if its second derivative f″(x) is non-positive on I; that is,
f″(x) ≤ 0
for all x ∈ I.
Remark 5.5.6 Note that the above definitions do not disagree with the statements made in section 4.1 of our Calculus Textbook that ‘if f″(x) > 0, then f is convex’ and ‘if f″(x) < 0, then f is concave’.
Example 5.5.7 Consider the functions f : R → R given by (i) f(x) = −x², (ii) g(x) = x³, (iii) h(x) = x⁴ and (iv) k(x) = −2x, whose graphs are sketched below. Let us classify these functions as convex, concave or neither by using the above tests.
Figure 5.5.8
We note that all these functions are twice-differentiable on R, so the above tests are all
applicable. The test based on the second derivative is generally the simplest, so it is the
one we will use below:
(i) The second derivative of f is given by f″(x) = (−2x)′ = −2. Since f″(x) < 0 for all x ∈ R, f is a concave function.
(ii) The second derivative of g is given by g″(x) = (3x²)′ = 6x. Thus, it is neither true that g″(x) ≥ 0 for all x ∈ R nor that g″(x) ≤ 0 for all x ∈ R. It follows that g is neither a convex nor a concave function.
(iii) The second derivative of h is given by h″(x) = (4x³)′ = 12x². Since h″(x) ≥ 0 for all x ∈ R, h is a convex function. On the other hand, it is not true that h″(x) ≤ 0 for all x ∈ R, so h is not a concave function.
(iv) The second derivative of k is given by k″(x) = (−2)′ = 0. Since we have that k″(x) ≥ 0 and k″(x) ≤ 0 for all x ∈ R, k is both a convex and a concave function.
An inflection point of a function f is a point a of its domain at which the function changes from convex to concave or from concave to convex. Some typical inflection points are illustrated below:
Figure 5.6.1
Note that it is not necessary for f to be differentiable at an inflection point a. However, if
f is twice-differentiable on an open interval I that contains a, then we have the following
additional description:
A point a belonging to the domain of a twice-differentiable function f is an inflection point only if f″(a) = 0 and f″(x) changes sign at a. It is a stationary inflection point only if we also have that f′(a) = 0.
Example 5.6.2 Using the above test, let us investigate if the stationary points of the
functions (i) f : R → R given by f (x) = x3 and (ii) g : R → R given by g(x) = x4 are
inflection points.
We note that these functions are twice-differentiable on R, so the above test is applicable.
(i) The first and second derivatives of f are given by f′(x) = 3x² and f″(x) = 6x.
Hence, x = 0 is the only stationary point of f and we also see that f″(0) = 0.
The sign of f″ behaves over R as shown below: f″(x) < 0 for x < 0 and f″(x) > 0 for x > 0.
Since f″(0) = 0 and f″(x) changes sign at 0, we conclude that 0 is an inflection point. Moreover, we have that f′(0) = 0, so 0 is a stationary inflection point.
(ii) The first and second derivatives of g are given by g′(x) = 4x³ and g″(x) = 12x².
Hence, x = 0 is the only stationary point of g and we also see that g″(0) = 0.
The sign of g″ behaves over R as shown below: g″(x) > 0 for all x ≠ 0.
Since g″(x) does not change sign at 0, we conclude that 0 is not an inflection point.
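The same check can be automated; the sketch below (sympy, as before) evaluates the second derivative just to the left and right of the stationary point 0:

    # Does f'' change sign at 0? (Example 5.6.2)
    import sympy as sp

    x = sp.symbols('x', real=True)
    for expr in (x**3, x**4):
        d2 = sp.diff(expr, x, 2)
        left = d2.subs(x, -sp.Rational(1, 2))
        right = d2.subs(x, sp.Rational(1, 2))
        print(expr, d2, left, right)   # opposite signs signal an inflection point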
Exercise 5.7.1 (i) Sketch the graph of f. Is the point x0 = 1 a local maximum of f? Is this point a strict
local maximum of f ?
(ii) By considering the left and right derivatives of f at x0 = 1, decide whether the point
x0 = 1 is a stationary point of f .
Exercise 5.7.2 Consider the function f : R → R given by
(i) Find f′ and f″ and determine how the sign of each of these derivatives behaves over the domain of f.
(ii) Identify and classify the stationary points of f using either the first-derivative test or
the second-derivative test.
Exercise 5.7.3 Consider the function f : R → R given by
f(x) = x³ − 3x.
(i) Find the stationary points of f (x). For each stationary point, determine whether it
is a local maximum, a local minimum or a stationary inflection point by using the first-
derivative test.
(ii) Confirm your classification of each stationary point by using the second-derivative test.
(iii) Find the largest open intervals of R where the function f is increasing, strictly decreasing, convex, concave.
Exercise 5.7.4 (i) Sketch the graph of g. On the basis of your graph: Is g a convex function on R? Is the
point x0 = 2 a local minimum of g? Is this point a strict local minimum of g?
(ii) By considering the left and right derivatives of g at x0 = 2, decide whether the point
x0 = 2 is a stationary point of g.
To sketch the graph of a function f : A → R, where the domain A is a finite union of disjoint intervals {In }, the following methodology is useful:
(i) Identify each point b ∈ R which is an endpoint of an interval in the collection {In }.
Do this regardless of whether the point b belongs to A or not. For example, if
A = (−8, 5], the points of interest are −8 and 5. If A = (−∞, 3) ∪ (3, ∞), the
relevant point is 3. If b belongs to A, evaluate f at b; otherwise, investigate how f
behaves as x approaches b.
(ii) Find all the points of intersection of the graph of f with the coordinate axes.
(iii) Find f′ and hence any stationary points of f. Also investigate how the sign of f′ varies over A in order to identify subsets of A where f is increasing or strictly increasing and subsets of A where f is decreasing or strictly decreasing. Note that this process amounts to the first-derivative test for classifying stationary points.
(iv) In addition, find f″ and investigate how its sign varies over A in order to identify subsets of A where f is convex and subsets of A where f is concave. Hence find any points of inflection.
(v) If A extends to ∞ or −∞, investigate how f behaves for large positive or large
negative values of x. Then combine the above findings in order to sketch the graph
of f .
Remark 6.1.1 Some of these steps may be omitted in practice in order to reduce the time spent on sketching a graph. However, following the entire procedure as laid out above is a good way of gaining familiarity with the relevant concepts. With this in mind, the following two examples go through all these steps.
Example 6.1.2 Let us apply this methodology to the function
f : (−∞, 0) ∪ (0, ∞) → R given by f(x) = 2x − 1/x.
(i) We are interested in the point 0 ∈ R which does not belong to the domain of f . As x
approaches 0 from the left, that is, as
x → 0− ,
we see that f (x) is positive and large:
f (x) → ∞.
As x approaches 0 from the right, that is, as
x → 0+ ,
we see that f (x) is negative and large:
f (x) → −∞.
We say that the graph of f has a vertical asymptote at 0. This is the line with Cartesian
equation x = 0.
(ii) Given that 0 ∈ R does not belong to the domain of f , no y-intercept exists for this
function. In order to see if f has any x-intercepts, we solve the equation
2x − 1/x = 0.
This is equivalent to the quadratic equation
2x² − 1 = 0,
which implies that there are two x-intercepts; namely the points with Cartesian coordinates (−1/√2, 0) and (1/√2, 0).
Taking a closer look at the way in which f(x) behaves for large positive and large negative values of x, we observe the following:
2x − 1/x < 2x and 2x − 1/x ≈ 2x as x → ∞,
and
2x − 1/x > 2x and 2x − 1/x ≈ 2x as x → −∞.
This means that the graph of f approaches the line y = 2x from below as x → ∞ and from
above as x → −∞. The line y = 2x is called a slant asymptote of the graph of f .
The relevant graph is presented below:
Figure 6.1.3
Example 6.1.4 Let us also apply this methodology to the function f : R → R given by f(x) = x⁴ − 4x². Since 0 belongs to the domain of f and f(0) = 0, the graph of f has a y-intercept at
(0, 0).
This is also an x-intercept. In order to find if additional x-intercepts exist, we solve the equation
x⁴ − 4x² = 0.
We factorise the left hand side completely,
x²(x + 2)(x − 2) = 0,
and obtain two additional x-intercepts; namely the points with Cartesian coordinates (−2, 0) and (2, 0).
Setting the derivative of f equal to zero, we obtain the equation
4x³ − 8x = 0,
which is factorised as follows:
4x(x + √2)(x − √2) = 0.
Thus, there are three stationary points whose y-coordinates can be found by using the relation y = x⁴ − 4x². These are:
(−√2, −4), (0, 0) and (√2, −4).
The sign of f′ varies over the domain R in the following way:
x < −√2 : −,   −√2 < x < 0 : +,   0 < x < √2 : −,   x > √2 : +.
It follows that f is decreasing to the left of (−√2, −4), increasing between (−√2, −4) and (0, 0), decreasing between (0, 0) and (√2, −4) and increasing to the right of (√2, −4). Hence,
(−√2, −4) is a local minimum, (0, 0) is a local maximum and (√2, −4) is a local minimum.
Setting the second derivative f″(x) = 12x² − 8 equal to zero, we obtain the equation
12x² − 8 = 0.
It follows that there are two inflection points whose y-coordinates are obtained by using y = x⁴ − 4x². These are:
(−√2/√3, −20/9) and (√2/√3, −20/9).
The sign of f″ varies over the domain R in the following way:
x < −√2/√3 : +,   −√2/√3 < x < √2/√3 : −,   x > √2/√3 : +.
Thus, f is convex to the left of (−√2/√3, −20/9), concave between (−√2/√3, −20/9) and (√2/√3, −20/9), and convex to the right of (√2/√3, −20/9).
Figure 6.1.5
Ax² + Bxy + Cy² + Dx + Ey + F = 0.
A parabola in standard position and orientation, for instance, is described by a Cartesian equation of the form
x² = ay.
Figure 6.2.1
Figure 6.2.2
An ellipse in standard position and orientation is described by a Cartesian equation of the form
x²/a² + y²/b² = 1
for some real numbers a > 0 and b > 0:
Figure 6.2.3
A hyperbola in standard position and orientation is described by a Cartesian equation of the form
x²/a² − y²/b² = 1
for some real numbers a > 0 and b > 0.
Its graph follows:
Figure 6.2.4
The lines y = (b/a)x and y = −(b/a)x are the slant asymptotes of the hyperbola.
Remark 6.2.5 A quick way of deriving the Cartesian equations of these asymptotes from the Cartesian equation of the hyperbola is by factorising the difference of squares on the left hand side:
(x/a + y/b)(x/a − y/b) = 1.
Now notice that if any of these factors became zero at some point (x, y), the above equation would be violated. Hence, no point (x, y) on the hyperbola can satisfy
x/a ± y/b = 0,
which implies that the lines described by the Cartesian equations
y = ±(b/a)x
can never intersect the graph of the hyperbola. These lines are indeed the slant asymptotes
of this graph.
Remark 6.2.6 Later in the course, we will discuss conics with a general orientation with
respect to the Cartesian axes. For now, it suffices to give a few examples of conics that are
not in standard position or orientation. These examples are by no means exhaustive:
Example 6.2.7 A circle of radius r > 0 centred at a point (x0, y0) ≠ (0, 0) is described by the Cartesian equation
(x − x0)² + (y − y0)² = r².
The illustration below corresponds to the case where x0 > 0 and y0 > 0:
Figure 6.2.7
Example 6.2.8 The curve described by the Cartesian equation
(x − x0)²/a² + (y − y0)²/b² = 1
for some real numbers a > 0, b > 0 and (x0, y0) ≠ (0, 0) is an ellipse. It is in standard orientation with respect to the axes but is translated by x0 in the x-direction and by y0 in the y-direction. In the graph below, x0 > 0 and y0 > 0:
Figure 6.2.8
Example 6.2.9 Similarly, the parabola described by the Cartesian equation
x − x0 = (y − y0)²
is not in standard position and orientation. The same is true for the hyperbola
xy = 3.
The graphs of these conics follow. The graph of the parabola is illustrated using x0 > 0
and y0 > 0:
Figure 6.2.9
Note that instead of slant asymptotes, the hyperbola has a vertical and a horizontal asymptote. This is due to its non-standard orientation.
Let f : A → R be a function with domain A ⊆ R. A global maximum of f is a point a ∈ A such that f(a) ≥ f(x) for all x ∈ A. A strict global maximum of f is a point a ∈ A such that f(a) > f(x) for all x ∈ A such that x ≠ a.
Similarly, a global minimum of f is a point a ∈ A such that f (a) ≤ f (x) for all x ∈ A.
A strict global minimum of f is a point a ∈ A such that f(a) < f(x) for all x ∈ A such that x ≠ a.
Global maxima and minima will be collectively referred to as global extrema. We will
also use the term strict global extrema where appropriate. A strict global extremum is
necessarily a global extremum but the converse may not be true. Some typical examples
of strict and non-strict global extrema are illustrated below:
Figure 6.3.1
The first graph depicts a function which has two non-strict global maxima. Note that they
are both strict local maxima. The second and third graphs show a strict and a non-strict
global minimum, respectively.
Example 6.3.2 Consider the functions (i) f : R → R given by f (x) = |x|, (ii) g :
(−1, 3] → R given by g(x) = x(x − 1)(x − 2) and (iii) h : R → R given by h(x) = 5. Let us
identify any global maxima and minima for each of these functions based on the following
graphs:
(i) The function f has no global maximum but it has a global minimum at x = 0. This is
a strict global minimum.
(ii) The function g has a global maximum at the right endpoint of its domain; that is, at
x = 3. This is a strict global maximum. On the other hand, g has no global minimum.
(iii) Every point x is both a global minimum and a global maximum of h. However, none
of these points is a strict global extremum.
Remark 6.3.3 The definition of a local extremum given in subsection 5.2 refers to an
open interval and hence can only be applied to an interior point of the domain A of f .
Given that a global extremum of f can also occur at an endpoint of A, the need arises to
introduce the corresponding local concept. This is essential in order to avoid the awkward
situation of a function having a global extremum at an endpoint of its domain which — in
the absence of the relevant concept — cannot be called a local one.
To this end, let I be any one of the disjoint intervals that may comprise the domain A of
f : An endpoint b of I is a local maximum of f only if b belongs to I and
f (b) ≥ f (x)
for all x ‘sufficiently near’ b. For a left endpoint b ∈ I, the qualification ‘sufficiently near’
means that there exists a real number δ > 0 such that f (b) ≥ f (x) for all x in the subset
[b, b + δ) of I. For a right endpoint b ∈ I, the subset [b, b + δ) needs to be replaced by
(b − δ, b].
Similarly, an endpoint b of I is a strict local maximum of f only if b belongs to I and f(b) > f(x) for all x ‘sufficiently near’ b such that x ≠ b. For a left endpoint b ∈ I, this means that there exists a real number δ > 0 such that f(b) > f(x) for all x in the subset (b, b + δ) of I. For a right endpoint b ∈ I, the subset (b, b + δ) needs to be replaced by (b − δ, b).
By reversing the direction of the inequalities ‘f (b) ≥ f (x)’ and ‘f (b) > f (x)’ in the above
definitions, we obtain corresponding definitions for a local minimum and a strict local
minimum that occurs at an endpoint b of I.
These concepts are illustrated graphically below:
Figure 6.3.4
The first graph depicts a strict local maximum which occurs at the left endpoint of the
domain. This is not a global maximum. The second graph shows a non-strict local maximum (which is also a non-strict local minimum!) at the right endpoint of the domain.
This point is a non-strict global maximum. The third graph shows a strict local minimum
at the left endpoint of the domain. This is a global minimum which is not a strict global
minimum.
6.4 Optimisation
Consider the problem of finding the maximum value of a function f on an interval I ⊆ R.
This interval may be open, closed, or neither.
In general, our maximisation problem may not have a solution because, simply, a global
maximum may not exist. Even if a global maximum exists, the problem of identifying
where it occurs may not be straightforward.
However, if f is differentiable in the interior of the interval I, then the places at which
a global maximum can occur are limited considerably. The following argument explains
why:
Recall from Theorem 5.3.2 that if a function is differentiable on an open interval, then a
local maximum can only occur at a stationary point of this function. In the case we are
discussing here, the interior of the interval I is by definition an open interval so, given
that f has been assumed differentiable in this interior, the conditions of Theorem 5.3.2 are
satisfied. This theorem implies that among points in the interior of I, the stationary points
of f are the only candidates for a local maximum. Moreover, since a global maximum is
also a local maximum, it follows from Theorem 5.3.2 that if f admits a global maximum
and this maximum occurs in the interior of I, then it has to be among the stationary points
of f . If, on the other hand, f admits a global maximum and this maximum does not occur
in the interior of I, then the only other places where it can occur are those endpoints of
I that belong to I. Hence, we have deduced that if f admits a global maximum, the only
places where this can occur are the stationary points of f in the interior of I and those
endpoints of I that belong to I.
In the more general case where the domain A of a function f : A → R is a finite union of
disjoint intervals {In } and f is differentiable in the interior of each In , the above conclusion
still holds: if f admits a global maximum, the only places where this can occur are the
stationary points of f in the interior of the intervals {In } as well as any endpoints of these
intervals that belong to A.
The steps required for dealing with a general optimisation problem are summarised below.
Let us emphasise that this methodology relies on the assumption that the function f is
differentiable in the interior of the disjoint intervals {In } that comprise its domain A.
Step 1: Find the stationary points of f in the interior of each interval In by solving the equation f′(x) = 0. These points are candidates for a global extremum.
Step 2: Identify all the endpoints of the disjoint intervals {In }. Among these endpoints,
those that belong to A are candidates for a global extremum.
Step 3: If each disjoint interval In contains both its endpoints — that is, if each such interval is a closed set of the form [a, b] — then a global extremum necessarily exists on A and is among the points found in Steps 1 and 2. The proof of this statement is not part of our syllabus but it can be established using Theorem 6.4.4 below. On the other hand, if at least one of the intervals In does not have the form [a, b] — that is, if it has the form (a, b] or [a, b) or (a, b) or (−∞, b] or (−∞, b) or [a, ∞) or (a, ∞) or (−∞, ∞) — then there is no guarantee that a solution to the optimisation problem exists. We may then need to employ ad hoc methods in order to decide whether an optimal solution exists or not. Ideally, sketching the graph of f will settle this issue; however, this task may be too complicated, in which case considering the sign of the derivative f′ may be of considerable help. In any case, if a global extremum exists, then it must be among the candidates found in Steps 1 and 2.
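The candidate-collecting part of Steps 1 and 2 is mechanical and can be sketched in code. In the sympy sketch below, the function and the closed interval are illustrative choices (the function is the one from Exercise 6.5.1 below), not data taken from a particular example:

    # Steps 1-2 on a single closed interval [a, b]: gather the candidates.
    import sympy as sp

    x = sp.symbols('x', real=True)
    f = x**3 - 3*x                     # illustrative function
    a, b = -2, 2                       # illustrative closed interval

    candidates = [a, b] + [s for s in sp.solve(sp.diff(f, x), x) if a < s < b]
    print({c: f.subs(x, c) for c in candidates})   # {-2: -2, 2: 2, -1: 2, 1: -2}

Because the interval here is of the form [a, b], Step 3 guarantees that the largest and smallest of these values are the global maximum and minimum of f on the interval.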
Remark 6.4.1 The following theorems may be useful for certain types of optimisation
problems. If they are applicable, they guarantee the existence of at least one kind of global
extremum. The proof of these theorems goes beyond the scope of our course.
The reason that we require the set A to be open in Theorems 6.4.2 and 6.4.3 is that we
have defined differentiability only on open sets. Typical cases where theorems 6.4.2, 6.4.3
and 6.4.4 are relevant are illustrated below:
Figure 6.4.5
Example 6.4.6 Consider the function f : [−1, ∞) → R defined by
f(x) = x² for −1 ≤ x ≤ 2,   f(x) = x + 2 for 2 < x ≤ 4,   f(x) = −1/x for x > 4.
Without relying on the graph of f, determine whether f has any global extrema on its domain [−1, ∞). If it does, find where.
Given the definition of f, it is natural to regard its domain A = [−1, ∞) as the union I1 ∪ I2 ∪ I3 of the disjoint intervals I1 = [−1, 2], I2 = (2, 4] and I3 = (4, ∞). Clearly, f is differentiable in the interior of these intervals, so we can apply the methodology given above.
Step 1: The derivative of f in the interior (−1, 2) of I1 is f′(x) = 2x. It follows that f has a stationary point at x = 0, which is our first candidate for a global extremum.
The derivative of f in the interior (2, 4) of I2 is f′(x) = 1. This can never be zero, so there are no stationary points — and hence no candidates for a global extremum — in the interior of I2.
Finally, the derivative of f in the open interval I3 = (4, ∞) is
f′(x) = 1/x².
This is always greater than zero, so there are no stationary points — and hence no candidates for a global extremum — in the interior of I3, either.
Step 2: We identify the endpoints of the intervals I1, I2 and I3 that belong to A = [−1, ∞). These are the points −1, 2 and 4. Together with the stationary point 0, they are the only candidates for a global extremum. Let us evaluate f at each of these candidates in order to eliminate some possibilities. We find:
f(−1) = 1,   f(0) = 0,   f(2) = 4,   f(4) = 6.
Step 3: These values imply that a global maximum of f can only occur at the point x = 4.
Similarly, a global minimum of f can only occur at the point x = 0. Unfortunately, the
intervals I1 , I2 and I3 are not all closed intervals of the form [a, b], so we have no guarantee
that global extrema actually exist for our optimisation problem. In addition, we realise
that none of the theorems 6.4.2 - 6.4.4 is directly applicable, so we need to improvise.
For example, we observe that f(x) = −1/x is negative on I3 = (4, ∞). This means that the point x = 0 (at which f takes the value zero) cannot be a global minimum. Since this point is the only candidate for a global minimum, we conclude that f has no global minimum.
Regarding a global maximum, we observe that the maximum value of f(x) = −1/x on the interval I3 = (4, ∞) cannot exceed zero. This means that the point x = 4 (at which f takes the value six) remains a candidate for a global maximum. Similarly, we observe that the maximum value of f(x) = x² on the interval I1 = [−1, 2] is four, which is less than the value that f takes at our candidate x = 4. Finally, we are left with the function f(x) = x + 2 on the interval I2 = (2, 4]. Since f is strictly increasing on (2, 4], its maximum value is attained at the right endpoint of I2. This means that x = 4 is the global maximum of f and that it is in fact a strict global maximum.
For clarity, the graph of f is sketched below. If we had used this graph from the beginning,
we would have arrived at the solution more quickly. However, following the methodology
and using a few ad hoc arguments also has its merits.
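The ad hoc argument above can also be checked numerically. The sketch below encodes the piecewise definition of f as reconstructed above and confirms that none of the sampled values on I3 exceeds f(4) = 6:

    # Numeric cross-check of the optimisation example.
    def f(x):
        if -1 <= x <= 2:
            return x**2
        if 2 < x <= 4:
            return x + 2
        return -1.0 / x                      # x > 4: always negative

    print([f(c) for c in (-1, 0, 2, 4)])     # [1, 0, 4, 6]
    print(max(f(4 + k / 10.0) for k in range(1, 1000)) < f(4))   # True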
6.5 Exercises for self study
Exercise 6.5.1 Consider the function f : I → R given by
f(x) = x³ − 3x.
Find if global extrema exist and where they occur for each of the following cases:
Exercise 6.5.2 Find if global extrema exist and where they occur for each of the following
functions. Hint: For the first case below, it is not necessary to find the stationary points
of the function.
f : R → R defined by f(x) = x⁵ − 3x⁴ + 7x − 18,
g : R → R defined by g(x) = x⁴ + 6x² + 5,
h : [−2, 3] → R defined by h(x) = e^x − x.
Exercise 6.5.3 Following the methodology presented in subsection 6.1 of the Lecture Notes, sketch the graph of the function
f : (−∞, 0) ∪ (0, ∞) → R defined by f(x) = x + 1/(2x).
Exercise 6.5.4 Consider the profit function π : [0, 2] → R defined by π(x) = R(x) − C(x), where R(x) = 4x is the revenue function and C(x) = x³ − x² − 5x is the cost function. Find the maximum value of π(x) on [0, 2] and the corresponding quantity x.
This definition is equivalent to saying that the composite function f⁻¹ ∘ f : X → X is the identity function on X and the composite function f ∘ f⁻¹ : Y → Y is the identity function on Y; that is,
f⁻¹(f(x)) = x for all x ∈ X
and
f(f⁻¹(y)) = y for all y ∈ Y.
Figure 7.1.1
Example 7.1.2 Consider the function f : [0, 5) → [3, 13) defined by
f(x) = 2x + 3
and let g : [3, 13) → [0, 5) be defined by g(x) = (x − 3)/2. Let us show that g is the inverse function f⁻¹ of f.
Using the definition of an inverse function, we need to prove that the following statements
are both true:
if y = f(x), then x = f⁻¹(y)
and
if x = f⁻¹(y), then y = f(x).
Indeed, assuming that y = f(x) = 2x + 3, we solve for x and find that x = (y − 3)/2 = f⁻¹(y).
Similarly, assuming that x = f⁻¹(y) = (y − 3)/2, we solve for y and find that y = 2x + 3 = f(x).
2
Alternatively, we can show that
g(f(x)) = x for all x ∈ [0, 5)
and
f(g(x)) = x for all x ∈ [3, 13).
Indeed, for all x in the domain [0, 5) of f, we have that
g(f(x)) = g(2x + 3) = ((2x + 3) − 3)/2 = x
and, for all x in the domain [3, 13) of g, we have that
f(g(x)) = f((x − 3)/2) = 2·((x − 3)/2) + 3 = x.
As an illustration, consider a generic bijective function f : [1, 4] → [5, 6] and its inverse f⁻¹ : [5, 6] → [1, 4]:
Figure 7.1.3
Below, we apply this approach to some elementary bijective functions in order to obtain
explicit expressions for their inverses.
Linear Functions Consider the linear function f : R → R defined by the rule
f(x) = ax,
where a ≠ 0 is a real constant. Making x the subject of y = ax gives x = y/a, so the inverse function f⁻¹ : R → R is given by f⁻¹(x) = x/a.
Figure 7.2.1
Affine Functions Consider the affine function f : R → R defined by the rule
f(x) = ax + b,
where a ≠ 0 and b are real constants. By making x the subject of y = ax + b we find that
x = (y − b)/a.
Hence, the inverse function f⁻¹ : R → R is given by
f⁻¹(x) = (x − b)/a.
The graph corresponding to the case a = 2, b = 1 follows:
Figure 7.2.2
Note that a linear function can be considered as a special case of an affine function, where
b = 0. In other words, the graph of a linear function is a line that goes through the origin
and that of an affine function is a line that has a non-zero y-intercept (0, b), in general.
f(x) = x^n
Figure 7.2.3
Even Positive Integer Powers of Non-Positive Inputs Consider the function
f : (−∞, 0] → [0, ∞) defined by
f(x) = x^n,
where n is an even natural number. Note that f is bijective. By making x the subject of y = x^n we obtain two possible expressions:
x = ±y^(1/n).
However, the fact that the inverse of a bijective function is unique implies that only one of these expressions is correct. Indeed, before relabelling, we observe that x ≤ 0. This means that
x = −y^(1/n).
We deduce that the inverse function f⁻¹ : [0, ∞) → (−∞, 0] is given by
f⁻¹(x) = −x^(1/n).
Figure 7.2.4
Figure 7.2.5
Exponential Functions Consider the function exp_b : R → (0, ∞) defined by
exp_b(x) = b^x,
where b is a real number strictly greater than 1. We will call exp_b the exponential function with base b. The base b need only be greater than 0 and not equal to 1, but it suffices to impose the stricter condition b > 1.
As an optional exercise, you may find it useful to investigate why the condition b > 1 is
reasonable. In particular, check what happens in the cases where (i) b < 0, (ii) b = 0 and
(iii) b = 1. Then note that if 0 < b < 1, we have that b = q^(−1) for some q > 1, which implies that b^x = q^(−x). What does this tell us about the relationship between the graphs of the functions exp_b(x) with 0 < b < 1 and exp_q(x) with q > 1?
Being the inverse function of exp_b, the function log_b : (0, ∞) → R satisfies the following properties:
The graphs of exp_b and log_b are shown below for the case b = 2:
Figure 7.2.6
Note that any point (x, y) that satisfies the logarithmic equation y = log_2(x) also satisfies the exponential equation exp_2(y) = x. Similarly, any point (x, y) that satisfies the exponential equation y = exp_2(x) also satisfies the logarithmic equation log_2(y) = x.
Several additional properties follow from the definition log_b(exp_b(x)) = x. The most important of these properties are listed below and should be memorised:
Figure 7.2.7
Also consider the bijective function
cos : [0, π] → [−1, 1]
and its inverse function
arccos : [−1, 1] → [0, π].
Their graphs are sketched below:
Figure 7.2.8
Finally, consider the bijective function
tan : (−π/2, π/2) → R
and its inverse function
arctan : R → (−π/2, π/2).
The graphs of these functions can be found in our Calculus Textbook - see subsection 7.6
of the lecture notes for the relevant references.
7.3 The derivative of an inverse function
Let f : X → Y be a differentiable bijective function described by an expression of the form y = f(x). Our task is to find an expression for the derivative
df⁻¹(y)/dy
of the inverse function f⁻¹ : Y → X.
An alternative approach utilises the fact that the composite function f⁻¹ ∘ f is the identity function on X; that is,
f⁻¹(f(x)) = x for all x ∈ X.
Hence, using the chain rule and differentiating both sides of this equation with respect to x, we obtain the following result:
(df⁻¹(y)/dy) · (df(x)/dx) = 1.
Equivalently, writing x = f⁻¹(y) and y = f(x), this reads
dx/dy = 1/(dy/dx).
Remark 7.3.1 Note that the derivative of f⁻¹(y) at a given point y0 is well-defined only if the derivative of f(x) at the corresponding point x0 = f⁻¹(y0) is non-zero.
Remark 7.3.3 Graphically, the fact that the derivative of f⁻¹ at y0 is the reciprocal of the derivative of f at x0 = f⁻¹(y0) can be understood as follows: The graphs of f and f⁻¹ are reflections of each other in the diagonal line y = x through the origin of the Cartesian plane. Hence, if the slope of the graph of f at a point (x0, y0) is m, the slope of the graph of f⁻¹ at the corresponding point (y0, x0) is 1/m. In order to convince yourself that this is indeed the case, you may sketch on the same Cartesian plane the graph of a generic bijective function f and that of its inverse f⁻¹ and compare their slopes at corresponding points.
Example 7.3.4 Let us apply the above-mentioned result to the functions (i) g : (0, ∞) → R defined by
g(x) = log_e(x) = ln(x)
and (ii) h : R → (−π/2, π/2) defined by
h(x) = arctan(x).
(i) Writing y = g(x) = ln(x), the inverse function is given by x = g⁻¹(y) = e^y. Hence dx/dy = e^y = x, and so (ln(x))′ = dy/dx = 1/(dx/dy) = 1/x.
(ii) Writing y = h(x) = arctan(x), we have x = tan(y)
for the inverse function. Using the fact that
dy/dx = 1/(dx/dy)
and dx/dy = sec²(y), we find the following expression for the derivative dy/dx:
(arctan(x))′ = 1/sec²(y).
Finally, we need to express the right hand side in terms of x. Instead of replacing y by arctan(x) to obtain the rather complicated expression
(arctan(x))′ = 1/sec²[arctan(x)],
we follow a different approach: We use the trigonometric identity
sec²(y) = 1 + tan²(y)
and the relation x = tan(y) to obtain the simpler expression
(arctan(x))′ = 1/(1 + tan²(y)) = 1/(1 + x²).
This result should be memorised and added to our list of elementary derivatives.
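The result is easy to confirm with sympy (assumed available, as in the earlier sketches):

    # (arctan(x))' = 1/(1 + x**2): the difference simplifies to zero.
    import sympy as sp

    x = sp.symbols('x')
    print(sp.simplify(sp.diff(sp.atan(x), x) - 1 / (1 + x**2)))   # 0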
Figure 7.4.1
Example 7.4.2 Consider the function f : R → R defined by f(x) = e^(x²). Without sketching the graph of f, find a local inverse for f at the point x0 = −2.
The derivative of f is given by f′(x) = 2x e^(x²).
We also note that f′(x) < 0 for all x < 0 and that f′(x) > 0 for all x > 0. This means that the function f is strictly decreasing on the interval (−∞, 0) and strictly increasing on the interval (0, ∞). Hence, the function f is bijective on the interval A = [−3, −1], which contains x0 = −2 as an interior point. The corresponding interval B = [e, e⁹] contains y0 = f(−2) = e⁴ as an interior point. Of course, other choices of intervals A and B are possible.
Solving y = e^(x²) for x gives x = ±√(ln(y)), and we also need to satisfy the condition that x should belong to A = [−3, −1]. This implies that the negative root should be selected. Therefore, the function f⁻¹ : B → A defined by
f⁻¹(y) = −√(ln(y))
is a local inverse for f : R → R at the point x0 = −2.
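A quick numerical check that this local inverse really undoes f on A = [−3, −1] (plain Python; the function names below are ours, not from the notes):

    # f_inv(f(x)) should return x for x in [-3, -1].
    import math

    def f(x):
        return math.exp(x**2)

    def f_inv(y):
        return -math.sqrt(math.log(y))

    for x0 in (-3.0, -2.0, -1.5):
        print(x0, f_inv(f(x0)))   # recovers x0 in each case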
7.5 Exercises for self study
Exercise 7.5.1 (i) Show carefully that the derivative of g(x) = arcsin(x) is given by
g′(x) = 1/√(1 − x²).
(ii) Consider the function f : R → R given by
f(x) = sin(x).
Find a local inverse for f at x0 = 2π/3.
Hint: The function arcsin is not a local inverse of sin at x0 = 2π/3.
Exercise 7.5.2 Consider the function f : R → R given by
f(x) = x² + 4x + 7.
Sketch the graph of f and hence find a local inverse for f at the point x0 = −4.
Exercise 7.5.3 (i) Show carefully that the derivative of f(x) = arccos(x) is given by
f′(x) = −1/√(1 − x²).
Exercise 7.5.4 Let x denote the quantity of a product and let p denote the price per
unit required to sell x units of this product. You can assume that x and p are continuous
variables.
(i) Sketch the graph of the demand function D : (0, ∞) → (0, 85) defined by
x = D(p) = 85e^(−6p)
and explain why D is a bijective function. Use the labels (s, t) for your graph; i.e., sketch
t = D(s).
(ii) Find an expression for the inverse demand function p = D−1 (x) and sketch its graph
on the same Cartesian plane as in part (i); i.e., sketch t = D−1 (s).
(iii) Obtain an expression for the revenue function R(x) = xD−1 (x) and maximise this
function on (0, 85). State the quantity xmax and the price pmax at this maximum.
The price elasticity of demand ε(x) is defined by
ε(x) = (D⁻¹(x)/x) / (D⁻¹)′(x).
(iv) Verify that the absolute value of ε(xmax ) is equal to 1, where xmax is defined in part
(iii).
Remark 8.1.1 Note that there are many indefinite integrals F for a given function f. This is because the addition of a constant term C ≠ 0 to an indefinite integral F1 of f produces another indefinite integral F2 of f.
Example 8.1.2 The functions F1(x) = sin(x) and F2(x) = sin(x) + 3 are both indefinite integrals for f(x) = cos(x) because
F1′(x) = cos(x) and F2′(x) = cos(x).
Given an interval I ⊆ X whose left and right endpoints are a and b respectively, the definite integral
∫_a^b f(x) dx
of f : X → Y is numerically equal to the area bounded by the x-axis, the graph of f and
the vertical lines x = a and x = b, with the understanding that area beneath the x-axis
counts as negative:
Figure 8.1.4
It turns out that
∫_a^b f(x) dx = F(b) − F(a),
where F is any primitive of f. In other words, it tells us that the area bounded by the
graph of a given function f , the x-axis and the vertical lines x = a and x = b is equal to
F (b) − F (a), where F is any indefinite integral of f ; i.e., any function whose derivative is
f.
In fact, the above result is known as the second version of the fundamental theorem of
calculus. An alternative version of this theorem, referred to as the first version, states
that
d/dx ( ∫_a^x f(t) dt ) = f(x).
In other words, the function F(x) = ∫_a^x f(t) dt
is a primitive of f.
Remark 8.2.1 The first version of the fundamental theorem of calculus tells us that we may write an indefinite integral of a function f as a definite integral of f with the variable x being the upper limit and a constant a being the lower limit. Note that the constant a plays the role of the arbitrary constant of integration.
The following example illustrates both versions of this theorem.
Example 8.2.2 Let f : R → R be defined by
f(x) = x².
The first version of the fundamental theorem of calculus states that for a ∈ R,
d/dx ( ∫_a^x t² dt ) = x².
In order to illustrate this, let us choose t³/3 as a primitive of t². Then we have
d/dx ( ∫_a^x t² dt ) = d/dx ( x³/3 − a³/3 ) = x²,
in agreement with the first version of the theorem.
Continuing with this example, consider the interval (1, 2) ⊂ R. The second version of the fundamental theorem of calculus states that
∫_1^2 x² dx = F(2) − F(1),
where F is any primitive of x². Choosing F(x) = x³/3, we find that the integral equals 8/3 − 1/3 = 7/3.
∫ x⁻¹ dx = ln(−x) + C, x < 0,
∫ e^x dx = e^x + C, x ∈ R,
∫ cos(x) dx = sin(x) + C, x ∈ R,
∫ sin(x) dx = −cos(x) + C, x ∈ R,
∫ 1/(x² + 1) dx = arctan(x) + C, x ∈ R,
∫ 1/√(1 − x²) dx = arcsin(x) + C, x ∈ (−1, 1).
Note that the two rules ∫ x⁻¹ dx = ln(x) + C for x > 0 and ∫ x⁻¹ dx = ln(−x) + C for x < 0 can be combined into the single rule ∫ x⁻¹ dx = ln|x| + C.
Note also that the function
−arccos(x) + C
provides an alternative primitive for 1/√(1 − x²), since we have that
(−arccos(x) + C)′ = 1/√(1 − x²).
Consider, for example, the elementary rule
∫ cos(x) dx = sin(x),
where the arbitrary constant C has been omitted for simplicity. Using the corresponding rule of differentiation,
d/dx sin(x) = cos(x),
and the chain rule, we obtain the derivative of the composite function sin(u(x)):
d/dx sin(u(x)) = cos(u(x)) u′(x).
Hence, reversing this differentiation, we obtain the generalised integration rule
∫ cos(u(x)) u′(x) dx = sin(u(x)) + C,
where the function u(x) is left unspecified and thus remains at our disposal.
Consider, for example, the integral
∫ x cos(x²) dx.
We recognise that the term x inside the integral is a multiple of the derivative of the argument x² of cos(x²). We can therefore think of x² as u(x) and proceed as follows:
∫ x cos(x²) dx = (1/2) ∫ cos(x²) (2x) dx = (1/2) ∫ cos(u(x)) u′(x) dx = (1/2) sin(u(x)) + C
= (1/2) sin(x²) + C.
Indeed, by the chain rule, the derivative of the primitive on the right hand side is the integrand on the left hand side:
( (1/2) sin(x²) + C )′ = (1/2) cos(x²) (2x) = x cos(x²).
Generalising our elementary rules in this way, we produce a much wider family of such rules,
listed below. It should be noted that the domain of validity of each of these generalised
rules depends on the details of the function u(x):
∫ u^α(x) u′(x) dx = u^(α+1)(x)/(α + 1) + C, α ≠ −1,
∫ u⁻¹(x) u′(x) dx = ln|u(x)| + C,
∫ e^(u(x)) u′(x) dx = e^(u(x)) + C,
∫ cos(u(x)) u′(x) dx = sin(u(x)) + C,
∫ sin(u(x)) u′(x) dx = −cos(u(x)) + C,
∫ 1/(u²(x) + 1) u′(x) dx = arctan(u(x)) + C,
∫ 1/√(1 − u²(x)) u′(x) dx = arcsin(u(x)) + C.
Remark 8.4.2 Note that the term u′(x) dx can be written compactly as du(x). Simplifying this notation further, we can even write du(x) as du, in which case we recover our original elementary rules with x replaced by u. For example, the last rule above becomes:
∫ 1/√(1 − u²) du = arcsin(u) + C.
Example 8.4.4 This example is concerned with the case where u is a non-linear function of x: Compute the integral
∫ e^(4x²) x dx.
Example 8.4.5 This is about integrals of the arctan-type. The characteristic property of such integrals is that the quadratic polynomial in the denominator has negative discriminant: Compute the integral
∫ 1/(x² + 4x + 7) dx.
Without the explicit use of the label u, we have:
∫ 1/(x² + 4x + 7) dx = ∫ 1/((x + 2)² + 3) dx = (1/3) ∫ 1/( ((x + 2)/√3)² + 1 ) dx
= (√3/3) ∫ 1/( ((x + 2)/√3)² + 1 ) d((x + 2)/√3) = (√3/3) arctan( (x + 2)/√3 ) + C.
Example 8.4.5 This is about integrals of the arcsin-type. The characteristic property of such integrals is that the quadratic polynomial inside the square root has positive discriminant and negative leading coefficient: Compute the integral
∫ 1/√(5 − x²) dx.
Remark 8.4.6 We could add a few more integrals of the trigonometric-type to the above
list as well as some typical integrals of the hyperbolic trigonometric-type. However, that
would go beyond the scope of this course.
Returning to the integral of Example 8.4.4, we introduce the new variable u = 4x², so that
du = 8x dx.
Hence,
∫ e^(4x²) x dx = (1/8) ∫ e^(4x²) 8x dx = (1/8) ∫ e^u du = (1/8) e^u + C = (1/8) e^(4x²) + C.
This technique is known as integration by change of variable and is more general than
integration by recognition, in the sense that it can be applied to integrals where it is
difficult to recognise a primitive directly.
Consider, for example, the definite integral
∫_{−1}^{1} √(1 − x²) dx.
Here, it seems difficult to recognise a primitive, so let us try to eliminate the square root
by using a suitable change of variable. To this end, we introduce the trigonometric variable
θ by
x = sin(θ),
where
−π/2 ≤ θ ≤ π/2.
Note that the function sin : [−π/2, π/2] → [−1, 1] is bijective; this guarantees the validity of this method.
We have
dx = cos(θ)dθ,
so the integral becomes
∫_{−π/2}^{π/2} √(1 − sin²(θ)) cos(θ) dθ.
We now use the fact that 1 − sin²(θ) = cos²(θ) and note that cos(θ) ≥ 0 for θ ∈ [−π/2, π/2], which implies that
√(1 − sin²(θ)) = cos(θ).
Hence, the integral becomes
∫_{−π/2}^{π/2} cos²(θ) dθ.
Using the double angle formula cos(2θ) ≡ 2cos²(θ) − 1, we obtain the equivalent integral
(1/2) ∫_{−π/2}^{π/2} (cos(2θ) + 1) dθ,
which yields
(1/2) [ sin(2θ)/2 + θ ]_{−π/2}^{π/2}.
By evaluating these limits, we learn that the area under the graph of y = √(1 − x²) between −1 and 1 is equal to π/2. This is indeed the area of the upper half of a circle of radius 1.
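A direct sympy evaluation confirms this value:

    import sympy as sp

    x = sp.symbols('x')
    print(sp.integrate(sp.sqrt(1 - x**2), (x, -1, 1)))   # pi/2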
Figure 8.6.1
This graph has two vertical asymptotes described by the Cartesian equations x = b and
x = d. Clearly, f is not defined at the points b and d, so its domain X is the subset of R
given by
X = (−∞, b) ∪ (b, d) ∪ (d, ∞).
The points a ∈ X, c ∈ X and e ∈ X are some random points belonging to the domain of
f . Note that f is continuous on X and that f (x) → 0 as x → ±∞.
Suppose that we are interested in calculating the areas under the graph of f corresponding
to the following definite integrals:
(i) ∫_{−∞}^{a} f(x) dx,  (ii) ∫_a^b f(x) dx,  (iii) ∫_b^c f(x) dx,
(iv) ∫_c^d f(x) dx,  (v) ∫_d^e f(x) dx,  (vi) ∫_e^∞ f(x) dx.
The first and sixth integrals are called improper integrals of the first kind: the integrand
remains finite throughout the domain of integration, but the domain of integration is
unbounded. The second, third, fourth and fifth integrals are called improper integrals
of the second kind: the domain of integration is bounded, but the integrand encounters
a vertical asymptote at one of the endpoints of the domain of integration.
These improper integrals are defined as the following limits, whenever the limits exist:
(i) ∫_{−∞}^{a} f(x) dx = lim_{M→−∞} ( ∫_M^a f(x) dx ),
(ii) ∫_a^b f(x) dx = lim_{M→b⁻} ( ∫_a^M f(x) dx ),
(iii) ∫_b^c f(x) dx = lim_{M→b⁺} ( ∫_M^c f(x) dx ),
(iv) ∫_c^d f(x) dx = lim_{M→d⁻} ( ∫_c^M f(x) dx ),
(v) ∫_d^e f(x) dx = lim_{M→d⁺} ( ∫_M^e f(x) dx ),
(vi) ∫_e^∞ f(x) dx = lim_{M→∞} ( ∫_e^M f(x) dx ).
If taking the relevant limit leads to a unique finite value, we say that the improper integral
converges. Otherwise, we say that the integral diverges.
As an illustration, let us determine which of the following improper integrals converge, interpreting each as an area under a graph (see Figure 8.6.3 below):
(a) ∫_3^∞ 1/x² dx,  (b) ∫_2^∞ 1/√x dx,  (c) ∫_0^2 1/x² dx,  (d) ∫_4^7 1/√(7 − x) dx.
Figure 8.6.3
(a) This is an improper integral of the first kind. Following the definition, we have:
∫_3^∞ 1/x² dx = lim_{M→∞} ( ∫_3^M 1/x² dx ) = lim_{M→∞} [−1/x]_3^M = lim_{M→∞} ( −1/M + 1/3 ) = 1/3.
Hence, this integral converges. The first area is equal to 1/3.
(b) This is an improper integral of the first kind. We have:
∫_2^∞ 1/√x dx = lim_{M→∞} ( ∫_2^M 1/√x dx ) = lim_{M→∞} [2√x]_2^M = lim_{M→∞} ( 2√M − 2√2 ) → ∞.
Hence, this integral diverges.
(c) This is an improper integral of the second kind. The vertical asymptote occurs at x = 0, so the integral is defined by the following limit:
∫_0^2 1/x² dx = lim_{M→0⁺} ( ∫_M^2 1/x² dx ) = lim_{M→0⁺} [−1/x]_M^2 = lim_{M→0⁺} ( −1/2 + 1/M ) → ∞.
Hence, this integral diverges.
(d) This is an improper integral of the second kind. The vertical asymptote occurs at x = 7, so the integral is defined by the following limit:
∫_4^7 1/√(7 − x) dx = lim_{M→7⁻} ( ∫_4^M 1/√(7 − x) dx ) = lim_{M→7⁻} [−2√(7 − x)]_4^M = lim_{M→7⁻} ( −2√(7 − M) + 2√3 ) = 2√3.
This integral converges, so the fourth area is equal to 2√3.
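All four computations can be checked with sympy, which handles improper integrals directly (oo denotes ∞):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.integrate(1 / x**2, (x, 3, sp.oo)))         # 1/3
    print(sp.integrate(1 / sp.sqrt(x), (x, 2, sp.oo)))   # oo, i.e. diverges
    print(sp.integrate(1 / x**2, (x, 0, 2)))             # oo, i.e. diverges
    print(sp.integrate(1 / sp.sqrt(7 - x), (x, 4, 7)))   # 2*sqrt(3)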
Exercise 8.7.2 Compute the following integrals either by recognition or by a change of variable:
∫ e^(√x)/(2√x) dx
and
∫ (3 − 2x)/√(1 − x²) dx.
Express your final answers in terms of x.
Exercise 8.7.3 Let f : R → R be the probability density function of a continuous random variable X. Given such a function f, the probability P(X ≤ a) that X takes a value less than or equal to a ∈ R is equal to the improper definite integral
P(X ≤ a) = ∫_{−∞}^{a} f(x) dx.
(i) Find the value of the constant c that makes the following function f : R → R a probability density function:
f(x) = c e^x for x ∈ (−∞, 0),  f(x) = c e^(−x) for x ∈ [0, ∞).
(ii) Hence, calculate the probability P (X ≤ 2) leaving your answer in exact form.
Exercise 8.7.4 Use the change-of-variable technique to compute the indefinite integrals
∫ 1/(2x² + 12x + 23) dx
and
∫ 1/√(−x² − 8x − 10) dx.
Express both integrals in terms of x.
9 One-Variable Calculus, Part 7 of 7
9.1 Integration by partial fractions
This technique deals with integrals of the form
∫ pm(x)/pn(x) dx,
where pm(x) and pn(x) are polynomials of degree m and n, respectively.
Step 1: We start by comparing the degrees of the polynomials pm(x) and pn(x):
(i) If m < n, the integrand pm(x)/pn(x) is said to be in standard form.
(ii) If m ≥ n, we use polynomial division in order to express the integrand pm(x)/pn(x) in the form
Q(x) + R(x)/pn(x).
The quotient Q(x) is a polynomial. The remainder term R(x)/pn(x) has the property that the degree of its numerator is strictly less than the degree of its denominator. As a result, we can express the original integral as a sum of two integrals:
∫ pm(x)/pn(x) dx = ∫ Q(x) dx + ∫ R(x)/pn(x) dx.
Step 2: Since standard form is always attainable, let us only consider integrands of the form
R(x)/pn(x),
where the degree of the numerator is strictly less than n. Let us also assume that the
polynomial pn (x) has been factorised into a product of polynomials of the lowest degree;
that is, polynomials of degree 1 and polynomials of degree 2 of negative discriminant. Some
of these factors may be repeated in pn (x).
Using these factors as building blocks, we construct partial fractions according to the
following rules:
(i) Each distinct polynomial p(x) of degree 1 that is raised to the power m in pn(x) produces m partial fractions of the form
a1/p(x) + a2/(p(x))² + · · · + am/(p(x))^m,
(ii) Each distinct polynomial q(x) of degree 2 of negative discriminant that is raised to the power k in pn(x) produces k partial fractions of the form
(a1 x + b1)/q(x) + (a2 x + b2)/(q(x))² + · · · + (ak x + bk)/(q(x))^k,
Step 3: The resulting fractions can be integrated using elementary integration techniques.
In particular:
Example 9.1.1 Before we present these integration techniques, let us illustrate the
process described in step 2 by applying it to an integrand that contains all the different
kinds of factors that we have discussed: The denominator of this integrand is a polynomial
of degree ten which has been factorised into its simplest possible factors. The numerator
of this integrand is an unspecified polynomial of degree strictly less than ten. Regardless
of the particular choice of the polynomial in the numerator, the process described in step
2 results in the following partial fractions:
(A0 x⁰ + A1 x¹ + · · · + A9 x⁹) / ( (x − 4)(x + 5)³(x² + 4x + 7)(x² + 1)² )
= a1/(x − 4) + b1/(x + 5) + b2/(x + 5)² + b3/(x + 5)³
+ (c1 x + d1)/(x² + 4x + 7) + (e1 x + f1)/(x² + 1) + (e2 x + f2)/(x² + 1)².
Note that the constants on the left hand side are arbitrary but given, while the constants
on the right hand side are to be determined.
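Decompositions of this kind can be checked with a computer algebra system. The sympy sketch below applies apart() to a small illustrative integrand chosen for this note (not the degree-ten example above):

    import sympy as sp

    x = sp.symbols('x')
    expr = (7*x - 3) / ((x - 1) * (x**2 + 1))
    print(sp.apart(expr, x))   # 2/(x - 1) + (5 - 2*x)/(x**2 + 1), up to ordering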
Remark 9.1.2 Examples involving the calculation of constants such as the lower-case
constants above and examples illustrating the process of polynomial division can be found
in section 10.6 of our calculus textbook. Throughout these notes, I have assumed that you
are familiar with such calculations.
Let us now complete this subsection by presenting two examples that involve the integration of fractions of the form
(am x + bm)/(q(x))^m,
where q(x) is a polynomial of degree 2 of negative discriminant:
Example 9.1.3 Compute the integral
∫ (x³ + 6x² + 18x − 3)/(x² + 6x + 11) dx.
We observe that the integrand is not in standard form. Therefore, we use polynomial division to express it as follows:
∫ ( x + (7x − 3)/(x² + 6x + 11) ) dx.
7x − 3
Moreover, we observe that the remainder term is of the form
x2 + 6x + 11
ax + b
q(x)
where q(x) is a polynomial of degree 2 of negative discriminant. Any such integral can be calculated by using the following technique: we split the integral into three parts,
\int \frac{x^3 + 6x^2 + 18x - 3}{x^2 + 6x + 11}\,dx = \underbrace{\int x\,dx}_{I_1} + \underbrace{\frac{7}{2}\int \frac{2x + 6}{x^2 + 6x + 11}\,dx}_{I_2} \underbrace{-\; 24\int \frac{1}{x^2 + 6x + 11}\,dx}_{I_3}.
The factor 7/2 appearing in I_2 is needed in order to reproduce the ‘7x’ term in the numerator of the original integral. Similarly, the constant −24 appearing in I_3 is needed in order to reproduce the ‘−3’ term in the numerator of the original integral.
The integral I_1 is elementary: I_1 = \int x\,dx = \frac{x^2}{2}. The integral I_2 is of the logarithmic type, since the numerator is the derivative of the denominator:
I_2 = \frac{7}{2}\int \frac{2x + 6}{x^2 + 6x + 11}\,dx = \frac{7}{2}\ln(x^2 + 6x + 11),
and I_3 is of the arctan-type. We have:
I_3 = -24\int \frac{1}{x^2 + 6x + 11}\,dx = -24\int \frac{1}{(x+3)^2 + 2}\,dx = -\frac{24}{2}\int \frac{1}{\left(\frac{x+3}{\sqrt{2}}\right)^2 + 1}\,dx
= -12\sqrt{2}\int \frac{1}{\left(\frac{x+3}{\sqrt{2}}\right)^2 + 1}\,d\!\left(\frac{x+3}{\sqrt{2}}\right) = -12\sqrt{2}\arctan\left(\frac{x+3}{\sqrt{2}}\right).
Adding I_1, I_2 and I_3, we obtain the final answer:
\int \frac{x^3 + 6x^2 + 18x - 3}{x^2 + 6x + 11}\,dx = \frac{x^2}{2} + \frac{7}{2}\ln(x^2 + 6x + 11) - 12\sqrt{2}\arctan\left(\frac{x+3}{\sqrt{2}}\right) + C.
Example 9.1.4 Let us calculate the integral
\int \frac{8x + 5}{(x^2 + 6x + 11)^2}\,dx.
Recall also the recognition integral \int \frac{q'(x)}{(q(x))^2}\,dx, which yields -\frac{1}{q(x)} + C. To this end, we express our integral as the sum of two integrals:
\int \frac{8x + 5}{(x^2 + 6x + 11)^2}\,dx = \underbrace{4\int \frac{2x + 6}{(x^2 + 6x + 11)^2}\,dx}_{I_1} \underbrace{-\; 19\int \frac{1}{(x^2 + 6x + 11)^2}\,dx}_{I_2}.
The factor 4 appearing in I_1 is needed in order to reproduce the ‘8x’ term in the numerator of the original integral. Similarly, the constant −19 appearing in I_2 is needed in order to reproduce the ‘+5’ term in the numerator of the original integral.
The integral I_1 can be calculated by recognition. We have:
I_1 = 4\int \frac{2x + 6}{(x^2 + 6x + 11)^2}\,dx = -\frac{4}{x^2 + 6x + 11}.
The integral I_2 is not of the arctan-type because of the presence of the square in the denominator. However, it is closely related to such an integral in the sense that it can be simplified by using an arctan-type change of variable. In particular, we complete the square to get
I_2 = -19\int \frac{dx}{((x+3)^2 + 2)^2}
and then consider the change of variable x + 3 = \sqrt{2}\tan(\theta). Since dx = \sqrt{2}\sec^2(\theta)\,d\theta and (x+3)^2 + 2 = 2\sec^2(\theta), the integral becomes
I_2 = -19\int \frac{\sqrt{2}\sec^2(\theta)}{4\sec^4(\theta)}\,d\theta = -\frac{19\sqrt{2}}{4}\int \cos^2(\theta)\,d\theta.
We now use the double angle formula \cos^2(\theta) = \frac{1}{2}(\cos(2\theta) + 1) to obtain the equivalent integral
I_2 = -\frac{19\sqrt{2}}{8}\int (\cos(2\theta) + 1)\,d\theta = -\frac{19\sqrt{2}}{8}\left(\frac{\sin(2\theta)}{2} + \theta\right) + C.
Finally, we would like to express I_2 as a function of x. To this end, we use the double angle formula \sin(2\theta) = 2\sin(\theta)\cos(\theta) to obtain an expression for I_2 in terms of \theta, \sin(\theta) and \cos(\theta),
I_2 = -\frac{19\sqrt{2}}{8}\left(\sin(\theta)\cos(\theta) + \theta\right) + C,
and then use the relation x + 3 = \sqrt{2}\tan(\theta) to express \theta, \sin(\theta) and \cos(\theta) in terms of x. In fact, the easiest way to achieve this is to consider a right-angled triangle with \theta being one of the two acute angles. Given that \tan(\theta) = \frac{x+3}{\sqrt{2}}, we can set the side opposite to the angle \theta equal to x + 3 and the side adjacent to the angle \theta equal to \sqrt{2}. Then the hypotenuse of this triangle is equal to \sqrt{(x+3)^2 + 2}, as illustrated below:
Figure 9.1.5
Therefore, I_2 is equal to
I_2 = -\frac{19\sqrt{2}}{8}\left[\frac{x+3}{\sqrt{(x+3)^2+2}}\cdot\frac{\sqrt{2}}{\sqrt{(x+3)^2+2}} + \arctan\left(\frac{x+3}{\sqrt{2}}\right)\right] + C.
You may confirm this result by differentiating the right hand side.
9.2 Integration by parts
Integrating both sides of the product rule (f(x)g(x))' = f'(x)g(x) + f(x)g'(x) and rearranging, we obtain
\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx.
This is the formula for integration by parts. As is always the case with indefinite integrals, the left hand side is determined by the right hand side only up to the addition of an arbitrary constant.
Given an integral \int h(x)\,dx that is difficult to calculate directly, the above formula can be useful as long as the following two conditions are satisfied:
(i) The integrand h(x) can be viewed as the product of two functions f(x) and g'(x), where it is possible to integrate g'(x).
(ii) Either the resulting integral \int f'(x)g(x)\,dx can be calculated directly or can eventually lead to a solution.
Example 9.2.1 Calculate the integral \int xe^x\,dx.
We identify f(x) = x and g'(x) = e^x and apply the rule. We have f'(x) = 1, g(x) = e^x, so
\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.
Example 9.2.2 Calculate the integral \int \ln(x)\,dx.
We identify f(x) = \ln(x) and g'(x) = 1 and apply the rule. We have f'(x) = \frac{1}{x}, g(x) = x, so
\int \ln(x)\,dx = x\ln(x) - \int dx = x\ln(x) - x + C.
Remark 9.2.3 There are cases where the method of integration by parts generates a
recursive formula which eventually leads to a solution. The following example provides a
simple illustration of such a case.
Example 9.2.4 Calculate the integral \int e^x\sin(x)\,dx.
We identify f(x) = \sin(x) and g'(x) = e^x and apply the rule. We have f'(x) = \cos(x), g(x) = e^x, so
\int e^x\sin(x)\,dx = e^x\sin(x) - \int e^x\cos(x)\,dx.
Regarding the second integral, we now perform integration by parts once more. We identify f(x) = \cos(x) and g'(x) = e^x (otherwise we will undo our previous step and go back to the start) and apply the rule again. We have f'(x) = -\sin(x), g(x) = e^x, so
\int e^x\sin(x)\,dx = e^x\sin(x) - \left( e^x\cos(x) + \int e^x\sin(x)\,dx \right).
We now observe that the integral \int e^x\sin(x)\,dx appears twice in the above equation, so we can make it the subject of this equation. In this way, we find:
\int e^x\sin(x)\,dx = \frac{1}{2}\left(e^x\sin(x) - e^x\cos(x)\right) + C.
Example 9.2.5 Calculate the integral \int \arctan(x)\,dx.
We identify f(x) = \arctan(x) and g'(x) = 1 and apply the rule. We have f'(x) = \frac{1}{x^2+1} and g(x) = x, so
\int \arctan(x)\,dx = x\arctan(x) - \int \frac{x}{x^2+1}\,dx.
Integrating the second term by recognition, we obtain
\int \arctan(x)\,dx = x\arctan(x) - \frac{1}{2}\int \frac{2x}{x^2+1}\,dx = x\arctan(x) - \frac{1}{2}\ln(x^2+1) + C.
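Each of the antiderivatives obtained above can be confirmed by differentiating the right hand side. If software is available, a quick check with sympy (an illustrative sketch, assuming the library is installed) reads:

from sympy import symbols, integrate, exp, log, sin, atan

x = symbols('x')

# sympy omits the arbitrary constant C from each antiderivative.
print(integrate(x*exp(x), x))       # x*exp(x) - exp(x)
print(integrate(log(x), x))         # x*log(x) - x
print(integrate(exp(x)*sin(x), x))  # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
print(integrate(atan(x), x))        # x*atan(x) - log(x**2 + 1)/2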
9.3 Consumers’ and producers’ surplus
Consider a market with one product. Let x denote the number of units of this product and p denote the price per unit. We can assume that both x and p are continuous variables. Suppose that the inverse demand function is given by
p = D(x) = 60 - 3x,
where 0 ≤ x ≤ 20. Also suppose that the inverse supply function is given by
p = S(x) = x^2 + 20.
The equilibrium quantity x_0 satisfies D(x_0) = S(x_0); solving 60 - 3x = x^2 + 20, that is, x^2 + 3x - 40 = (x+8)(x-5) = 0, gives x_0 = 5 and hence the market price p_0 = D(5) = S(5) = 45.
Clearly, consumers who buy the product at this market price p0 would have been willing
to pay at least p0 . This monetary gain obtained by consumers — because they are able
to purchase a product for a price that is less than the highest price that they would be
willing to pay — forms the consumers’ surplus. It is the area under the demand curve
y = D(x) minus the area x0 p0 corresponding to the total amount the consumers pay. The
relevant illustration follows:
Figure 9.3.1
We can see that the consumers’ surplus corresponds to the definite integral
\int_0^{x_0} D(x)\,dx - x_0p_0 = \int_0^5 (60 - 3x)\,dx - 225 = \left[60x - \frac{3x^2}{2}\right]_0^5 - 225 = \frac{75}{2}.
Similarly, the producers’ surplus is the amount that producers benefit by selling at a
market price that is higher than the least price they would be willing to sell for. This is the
area x0 p0 corresponding to the total amount the producers receive minus the area under
the supply curve y = S(x):
Figure 9.3.2
We can see that the producers’ surplus corresponds to the definite integral
x_0p_0 - \int_0^{x_0} S(x)\,dx = 225 - \int_0^5 (x^2 + 20)\,dx = 225 - \left[\frac{x^3}{3} + 20x\right]_0^5 = \frac{250}{3}.
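Both surpluses are simple definite integrals, so they are easy to reproduce in code. The sketch below (not part of the notes; it assumes Python with sympy) recomputes the equilibrium and the two surpluses:

from sympy import symbols, integrate, solve

x = symbols('x')
D = 60 - 3*x     # inverse demand function
S = x**2 + 20    # inverse supply function

x0 = max(solve(D - S, x))   # equilibrium quantity, x0 = 5
p0 = D.subs(x, x0)          # equilibrium price, p0 = 45

CS = integrate(D, (x, 0, x0)) - x0*p0   # consumers' surplus, 75/2
PS = x0*p0 - integrate(S, (x, 0, x0))   # producers' surplus, 250/3
print(CS, PS)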
9.4 Exercises for self study
Exercise 9.4.2 Calculate the integral
\int \frac{1}{(1 - x^2)^2}\,dx.
Exercise 9.4.3 Use the method of partial fractions to calculate the integral
\int \frac{x^3 - x^2 + x + 1}{(x^2 + 1)^2}\,dx.
Exercise 9.4.4 Use the method of change of variable to calculate the integrals
(i) \int \cos^8(x)\sin^3(x)\,dx, \quad (ii) \int (x^2 + 2)^6 x^3\,dx \quad and \quad (iii) \int \frac{\sin(x)\cos(x)}{(1 + \sin(x))^3}\,dx.
10 Matrices, Part 1 of 3
10.1 Definitions, notation and terminology
We define a matrix A of size m × n as a rectangular array of numbers with m rows and
n columns:
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}.
The numbers in the array are called the entries in the matrix. In particular, the entry
that occurs in the ith row and jth column of A is denoted by aij and called the (i, j) entry.
It is often useful to denote an m × n matrix A by (aij )m×n where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Two matrices are equal if they have the same size and their corresponding entries are
equal. In other words, given A = (a_{ij})_{m×n} and B = (b_{ij})_{m×n}, we say that
A = B
if and only if aij = bij for all i, j, where i ∈ {1, 2, ..., m} and j ∈ {1, 2, ..., n}.
The zero matrix of size m × n, denoted by 0 = (0)_{m×n}, has all entries zero:
0 = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 0 \end{pmatrix}.
A column matrix, also called a column vector, is a matrix C = (c_{i1})_{m×1} with only one column:
C = \begin{pmatrix} c_{11} \\ c_{21} \\ \vdots \\ c_{m1} \end{pmatrix}.
Similarly, a row matrix, also called a row vector, is a matrix R = (r_{1i})_{1×n} with only one row:
R = \begin{pmatrix} r_{11} & r_{12} & \ldots & r_{1n} \end{pmatrix}.
A matrix S = (s_{ij})_{n×n} with n rows and n columns is called a square matrix of order n:
S = \begin{pmatrix} s_{11} & s_{12} & \ldots & s_{1n} \\ s_{21} & s_{22} & \ldots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \ldots & s_{nn} \end{pmatrix}.
An important square matrix of order n is the identity matrix I_n, which has ones on the main diagonal and zeros everywhere else:
I_n = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix}.
Equivalently, we can write I_n = (a_{ij})_{n×n} with a_{ii} = 1 for all i ∈ {1, 2, ..., n} and a_{ij} = 0 if i ≠ j.
The identity matrix In and the zero matrix 0 = (0)n×n are examples of what is known as a
diagonal matrix of order n. This is a square matrix D = (dij )n×n in which all the entries
off the main diagonal are zero:
D = \begin{pmatrix} d_{11} & 0 & \ldots & 0 \\ 0 & d_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & d_{nn} \end{pmatrix}.
A wider category to which diagonal matrices belong is that of triangular matrices. There
are two kinds of triangular matrices:
An upper triangular matrix U = (uij )n×n is a square matrix of order n in which all the
entries below the main diagonal are zero:
U = \begin{pmatrix} u_{11} & u_{12} & \ldots & u_{1n} \\ 0 & u_{22} & \ldots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & u_{nn} \end{pmatrix}.
Similarly, a lower triangular matrix L = (lij )n×n is a square matrix of order n in which
all the entries above the main diagonal are zero:
L = \begin{pmatrix} l_{11} & 0 & \ldots & 0 \\ l_{21} & l_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \ldots & l_{nn} \end{pmatrix}.
Note that a diagonal matrix of order n is both an upper triangular and a lower triangular
matrix of the same order.
The transpose of an m × n matrix A = (a_{ij})_{m×n}, denoted by A^T, is the n × m matrix whose (i, j) entry is a_{ji}; in other words, the rows of A become the columns of A^T. A square matrix A is called symmetric if it is equal to its own transpose; that is, if
A = A^T.
Example 10.1.1 Write down the zero matrix 0 of size 2 × 3, the identity matrix I2 , a
general diagonal matrix D of order 3 and a general square matrix M of order 2. Also write
down their transposes 0T , IT2 , DT and MT . Are any of these matrices symmetric?
We have:
0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}, \quad M = \begin{pmatrix} d & e \\ f & g \end{pmatrix}.
Their transposes are:
0^T = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad I_2^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad D^T = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}, \quad M^T = \begin{pmatrix} d & f \\ e & g \end{pmatrix}.
A matrix is symmetric if it is equal to its transpose. The matrix 0 is not symmetric because it is not equal to 0^T; indeed, the two matrices do not even have the same size. The matrices I_2 and I_2^T are equal, so I_2 is symmetric. Similarly, the general diagonal matrices D and D^T are equal, so D is symmetric. Regarding the general square matrix M, it is equal to M^T, and hence symmetric, if and only if e = f; otherwise, it is not symmetric.
Example 10.1.2 Write down the transpose UT of a general upper triangular matrix U
of order 3. Is UT a lower triangular matrix? What property must U satisfy in order to be
a symmetric matrix?
We have:
U = \begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix} \quad and \quad U^T = \begin{pmatrix} a & 0 & 0 \\ b & d & 0 \\ c & e & f \end{pmatrix}.
Clearly, the matrix U^T is a lower triangular matrix. The matrices U and U^T are equal, and hence U is symmetric, if and only if b = c = e = 0. In other words, U is symmetric if and only if it is diagonal.
Finally, let us introduce two more definitions which will be needed in what follows:
A matrix is said to be in row echelon form only if it has the following three properties:
1. For every non-zero row, the first non-zero entry in the row is equal to 1. Such an
entry is called a leading 1.
2. For any two non-zero rows, the leading 1 in the lower row is further to the right than
the leading 1 in the higher row.
3. Zero rows are at the bottom of the matrix.
A matrix is said to be in reduced row echelon form only if it satisfies the previous three
properties plus the fourth property added below:
4. Every column that contains a leading 1 has zeros elsewhere in the column.
Example 10.1.3 Identify every matrix that is in row echelon form and every matrix that is in reduced row echelon form:
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 5 & -4 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
The matrix A is in reduced row echelon form and hence also in row echelon form. The matrices B and C are not in row echelon form; hence, they are not in reduced row echelon form either. The matrix D is in row echelon form; however, it is not in reduced row echelon form.
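Echelon forms can also be checked mechanically. As a sketch (assuming Python with sympy), the rref method returns the reduced row echelon form of a matrix together with the indices of its pivot columns; applying it to the matrix D above shows how the remaining entry above a leading 1 is cleared:

from sympy import Matrix

D = Matrix([[1, 5, -4],
            [0, 1,  0],
            [0, 0,  0]])

R, pivots = D.rref()
print(R)        # the reduced row echelon form of D
print(pivots)   # indices of the columns that contain leading 1s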
10.2 Matrix operations
The operation of matrix addition applies only to matrices of the same size. In particular,
given matrices A = (aij )m×n and B = (bij )m×n , their sum A+B is defined to be the matrix
obtained by adding the entries of A to the corresponding entries of B. That is,
A + B = (cij )m×n
where
cij = aij + bij .
The operation of scalar multiplication applies to a matrix of any size: Given a matrix
A = (aij )m×n and a real number λ ∈ R (also known as a scalar), the matrix λA is defined
to be the matrix obtained by multiplying each entry of the matrix A by λ. That is,
λA = (bij )m×n
where
bij = λaij .
The matrix λA is called a scalar multiple of the matrix A.
Remark 10.2.1 Given matrices A = (aij )m×n and B = (bij )m×n of the same size, the
difference A − B is defined by combining the operations of matrix addition and scalar
multiplication. In particular, we have
A − B = A + (−B),
where −B is the matrix obtained by multiplying B by the scalar −1. In other words,
A − B = (cij )m×n
where
cij = aij − bij .
By combining the operations of matrix addition and scalar multiplication, we get:
3A - 2B = 3\begin{pmatrix} 3 & 2 & -4 \\ 0 & 9 & 1 \end{pmatrix} - 2\begin{pmatrix} 2 & 5 & -6 \\ 7 & 7 & 0 \end{pmatrix} = \begin{pmatrix} 9 & 6 & -12 \\ 0 & 27 & 3 \end{pmatrix} - \begin{pmatrix} 4 & 10 & -12 \\ 14 & 14 & 0 \end{pmatrix}
= \begin{pmatrix} 9 & 6 & -12 \\ 0 & 27 & 3 \end{pmatrix} + \begin{pmatrix} -4 & -10 & 12 \\ -14 & -14 & 0 \end{pmatrix} = \begin{pmatrix} 5 & -4 & 0 \\ -14 & 13 & 3 \end{pmatrix}.
On the other hand, the matrix 2A + 5C is not defined. This is because the matrices 2A
and 5C do not have the same size, so they cannot be added.
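Because both operations act entry by entry, such computations map directly onto array libraries. A quick numpy check of the calculation above (numpy being an assumption of this sketch, not part of the notes):

import numpy as np

A = np.array([[3, 2, -4],
              [0, 9,  1]])
B = np.array([[2, 5, -6],
              [7, 7,  0]])

# Scalar multiplication and addition are performed entry by entry.
print(3*A - 2*B)   # [[  5  -4   0]
                   #  [-14  13   3]]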
The operation of matrix multiplication produces a product matrix AB from two ma-
trices A and B only if the number of columns of A is equal to the number of rows of B.
In particular, given a matrix A = (aij )m×r and a matrix B = (bij )r×n , their product is the
matrix AB = (c_{ij})_{m×n} whose general entry c_{ij} is given by the rule
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj} = \sum_{k=1}^{r} a_{ik}b_{kj}.
This rule tells us that in order to find the general entry cij of AB, we need to select row
i of A and column j of B, multiply the corresponding entries from the row and column
together, and then add up the resulting products.
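The row-times-column rule translates directly into a triple loop. The following sketch (plain Python with a numpy comparison; the function name is mine, not from the notes) computes each entry c_ij exactly as described above:

import numpy as np

def matmul(A, B):
    # c_ij is the sum over k of a_ik * b_kj.
    m, r, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k in range(r):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]
print(matmul(A, B))                 # [[21, 24, 27], [47, 54, 61]]
print(np.array(A) @ np.array(B))    # the same product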
Example 10.2.3 Consider 2 × 2 matrices A and B and a 2 × 3 matrix C. Are the matrices AB, BA, AC and CA defined? Compute those which are defined.
On the other hand, the matrix CA is not defined. This is because C is a 2 × 3 matrix and
A is a 2 × 2 matrix, so the number of columns of C is not equal to the number of rows of
A.
Remark 10.2.4 The above example shows that matrix multiplication is not commutative; that is, MN ≠ NM in general. On the other hand, many familiar laws of arithmetic are valid for matrices. We summarise these laws in the following subsection.
10.3 The laws of matrix algebra
Provided that the sizes of the matrices are such that the operations below can be performed, matrix addition and matrix multiplication satisfy the following laws:
• A + B = B + A,
• (A + B) + C = A + (B + C),
• (AB)C = A(BC),
• A(B + C) = AB + AC,
• (A + B)C = AC + BC.
In addition, there are associative and distributive laws which involve scalar multiplication. The four main ones are:
• λ(µA) = (λµ)A,
• λ(A + B) = λA + λB,
• (λ + µ)A = λA + µA,
• λ(AB) = (λA)B = A(λB).
Finally, there are laws satisfied by the zero matrix 0 and the identity matrix I of the appropriate sizes:
• A + 0 = A,
• A − A = 0,
• 0A = 0,
• A0 = 0,
• AI = A,
• IA = A.
10.4 The inverse matrix and its properties
The operation of matrix inversion applies only to a square matrix A = (aij )n×n . Moreover,
not all square matrices can be inverted. We say that a matrix A = (aij )n×n is invertible
if there exists a matrix B = (bij )n×n such that
AB = BA = In .
Recall that In is the n × n identity matrix. Provided that B exists, it is called the inverse
of A and is denoted by A−1 .
First, let us show that an invertible matrix cannot have more than one inverse.
Proof Assume that an invertible n × n matrix A has two inverses, B and C, so that
AB = BA = In
and
AC = CA = In .
We will show that B and C must actually be the same matrix, thereby proving uniqueness.
To this end, consider the product CAB. Since matrix multiplication is associative and
AB = In , we have
CAB = C(AB) = CIn = C.
On the other hand, again by associativity and the fact that CA = In , we have
CAB = (CA)B = In B = B.
We conclude that C = B, so there is only one inverse matrix for any invertible n×n matrix
A.
Some of the most important properties of the inverse matrix are listed below:
• (A^{-1})^{-1} = A,
• (λA)^{-1} = \frac{1}{λ}A^{-1} for any non-zero scalar λ,
• (AB)^{-1} = B^{-1}A^{-1}, provided that A and B are invertible matrices of the same order.
10.5 Powers of a matrix
Given a square matrix A and a natural number n ∈ N, we define A^n as the product
A^n = \underbrace{A\,A \cdots A}_{n\text{ times}}.
Powers of matrices obey rules similar to those obeyed by powers of numbers. First, if A is an invertible matrix and n ∈ N, then A^n is invertible and
(A^n)^{-1} = (A^{-1})^n.
The proof of this statement follows immediately from the definition of an inverse matrix
and the associativity of matrix multiplication.
In addition, the usual rules for exponents hold; that is, for integers r, s, we have:
Ar As = Ar+s
and
(Ar )s = Ars .
10.6 The transpose and its properties
The main properties of the transpose are the following:
(A^T)^T = A,
(λA)T = λAT ,
(A + B)T = AT + BT ,
(AB)T = BT AT
and finally,
(AT )−1 = (A−1 )T .
The proofs of the first four statements are omitted but they follow directly from the definition of the transpose. The last property follows from the fourth property above and the definition of the inverse. In particular, we have:
A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I \quad and \quad (A^{-1})^TA^T = (AA^{-1})^T = I^T = I,
which shows that (A^{-1})^T is the inverse of A^T.
10.7 Matrix equations and their solutions
A system of m linear equations in n unknowns x1 , x2 , . . . , xn is a set of m equations
of the form
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = b_1
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = b_2
\vdots
a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n = b_m.
We say that s1 , s2 , . . . , sn is a solution of this system if all m equations hold true when
x 1 = s 1 , x2 = s 2 , . . . , x n = s n .
The matrix A = (a_{ij})_{m×n}, whose (i, j) entry is the coefficient a_{ij} of the linear system, is called the coefficient matrix:
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}.
We denote the n × 1 column vector of the unknowns by x, and we also denote the m × 1 column vector whose entries are the given numbers b_1, b_2, ..., b_m by
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
In this notation, the linear system takes the compact form of the matrix equation
Ax = b.
Finding the set of all solutions of such a matrix equation will be the main focus of the
following few lectures.
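Numerically, equations of this form are handed to a linear-algebra routine rather than solved by hand. A minimal numpy sketch on a small 2 × 2 instance (the entries are illustrative):

import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, 4.0]])
b = np.array([12.0, 11.0])

# Solve Ax = b directly, without forming A inverse explicitly.
x = np.linalg.solve(A, b)
print(x)   # [3. 2.]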
10.8 Exercises for self study
Exercise 10.8.1 If a and b are both column matrices of size n × 1, what is the size of
the matrix product aT b? What is the relationship between aT b and bT a?
Exercise 10.8.2 Let B = \begin{pmatrix} 2 & 5 \\ 0 & 1 \end{pmatrix} and set B^{-1} = \begin{pmatrix} x & y \\ z & w \end{pmatrix}. Find numerical values for x, y, z and w and hence determine the matrix B^{-1} by solving the system of equations
\begin{pmatrix} 2 & 5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
Exercise 10.8.3 Which of the following matrix expressions are defined? Compute those which are defined.
Exercise 10.8.4 Using the definition of the inverse of a matrix and the rules of matrix algebra found in subsection 10.3, prove that if A and B are invertible matrices of the same order, then AB is invertible and
(AB)^{-1} = B^{-1}A^{-1}.
11 Matrices, 2 of 3
11.1 Solving systems of n linear equations for n unknowns
We begin our investigation of linear systems by considering systems of n linear equations
for n unknowns x1 , x2 , ..., xn . In particular, we have
Ax = b,
where the square matrix A and the column vectors x and b are defined below:
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
In general, the matrix equation Ax = b may not have a solution, and even if it has a
solution, this solution may not be unique. In order to appreciate this point, let us consider
three simple examples of linear systems, each involving two equations for two unknowns.
We will see that the first system has a unique solution, the second system has no solution
and the third system has infinitely many solutions. For simplicity, the unknowns are
denoted by x and y instead of x1 and x2 .
Example 11.1.1 Consider the linear system
2x + 3y = 12
x + 4y = 11.
In order to solve it, we multiply the second equation by 2 to obtain the equivalent system
2x + 3y = 12
2x + 8y = 22,
and then subtract the first equation from the second one to find the value of y. In this
way, we get
5y = 10,
which implies that
y = 2.
By substituting this value in any one of the original equations, we find that
x = 3.
Thus, the system has the unique solution
(x, y) = (3, 2).
Example 11.1.2 Next, consider the linear system
2x + 6y = 12
x + 3y = 11.
Following the same method, we multiply the second equation by 2 to obtain the equivalent system
2x + 6y = 12
2x + 6y = 22,
and then subtract the first equation from the second one. We get the contradictory equation
0 = 10,
which implies that this system has no solution.
Example 11.1.3 Finally, consider the linear system
2x − 4y = 10
x − 2y = 5.
Using the same approach, we multiply the second equation by 2 to obtain the equivalent system
2x − 4y = 10
2x − 4y = 10.
This implies that our system imposes only a single restriction on the variables x and y.
Hence, any pair of values for x and y that satisfies
2x − 4y = 10
is a solution of this system. In particular, we can assign an arbitrary value t ∈ R (known
as a free parameter) to any one of the variables x and y, and then solve the equa-
tion 2x − 4y = 10 for the other variable. For example, if we let y play the role of the
free parameter t ∈ R and make x the subject of the equation 2x − 4y = 10, we obtain
x = 5 + 2t. In this way, we express the general solution of our linear system in the
so-called parametric form:
x = 5 + 2t
y = t.
Clearly, this system has infinitely many solutions. Each solution corresponds to a different
value of t ∈ R. For example, if t = 0, we find (x, y) = (5, 0). If t = 1, we find (x, y) = (7, 1),
and so on.
The first and the third linear systems are called consistent because they admit at least
one solution. The second linear system is called inconsistent because it admits no
solution.
11.2 Elementary row operations
As the number of equations and unknowns in a linear system increases, so does the com-
plexity of the algebra involved in finding solutions. A systematic approach is therefore
needed. The key idea is to perform invertible algebraic operations on the linear system
which are designed to produce a succession of increasingly simpler linear systems. These
systems must all be equivalent to the original system in the sense that they should preserve
its set of solutions. The process terminates when a linear system is reached whose con-
sistency or inconsistency is evident and whose set of solutions, if any, can be immediately
obtained. It turns out that this final system has a coefficient matrix which is always in
reduced row echelon form.
Typically, there are three main algebraic operations involved in this process:
• Interchanging two equations.
• Multiplying both sides of an equation by a non-zero constant.
• Adding a multiple of one equation to another equation.
These operations are known as elementary row operations. Each of them is invertible; their inverses are, respectively:
• Interchanging the same two equations once more.
• Multiplying both sides of that particular equation by the reciprocal of the constant used above.
• Subtracting the same multiple of the first equation from the second equation. Note that ‘subtracting ... from’ is equivalent to ‘adding the negative of ... to’, so this last operation is an elementary row operation in accordance with the definition of that term.
In subsection 11.3, we will prove the invertibility of each elementary row operation by a
more formal argument which involves invertible matrices, called elementary matrices. For
the time being, let us return to the linear systems presented in Examples 11.1.1, 11.1.2 and
11.1.3 and use elementary row operations to establish the consistency or the inconsistency
of each system and also obtain its set of solutions whenever that system is consistent.
Example 11.2.1 Let us return to the linear system of Example 11.1.1. Our aim is to use a sequence of elementary row operations that result in an equivalent
system whose coefficient matrix is in reduced row echelon form. We can start by creating
a leading 1 in the first row of the corresponding matrix equation. In order to achieve this,
we can either divide the first equation by 2 or simply interchange the two equations. Let
us perform the latter operation and denote the corresponding interchanging of the rows by
R1 ↔ R2.
In this way, we get the equivalent system
x + 4y = 11
2x + 3y = 12.
We now subtract two times the first equation from the second equation. This corresponds
to the row operation
R2 → R2 − 2R1
and produces the simpler equivalent system
x + 4y = 11
−5y = −10.
By multiplying the second equation by −\frac{1}{5}, we create a leading 1 in the second row of the corresponding coefficient matrix. This row operation is denoted by
R2 → −\frac{1}{5}R2.
Finally, we can transform this matrix into reduced row echelon form by subtracting four times the second equation from the first equation. The corresponding row operation is
R1 → R1 − 4R2,
which produces the final equivalent system
x = 3
y = 2.
The system is clearly consistent and has a unique solution which can be read directly. Note
that the coefficient matrix of this system is equal to the identity matrix
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
Example 11.2.2 We now apply the same strategy to the linear system of Example 11.1.2. We can start again by creating a leading 1 in the first row of the corresponding matrix
equation. To this end, we perform the row operation
R1 ↔ R2
which brings our system into the form
x + 3y = 11
2x + 6y = 12.
We now subtract two times the first equation from the second equation. The corresponding
row operation
R2 → R2 − 2R1
produces the equivalent system
x + 3y = 11
0 = −10,
which is clearly inconsistent. The coefficient matrix of this last system has a row of zeros,
\begin{pmatrix} 1 & 3 \\ 0 & 0 \end{pmatrix},
so the process of row-reduction cannot continue further. As with the previous example,
note that this final coefficient matrix is indeed in reduced row echelon form.
Example 11.2.3 Finally, let us consider the linear system of Example 11.1.3. Interchanging the two equations (R1 ↔ R2) and then performing the row operation
R2 → R2 − 2R1
produces the equivalent system consisting of the equation x − 2y = 5 together with the trivial equation 0 = 0. The coefficient matrix is now in reduced row echelon form, and letting y play the role of the free parameter t ∈ R, we obtain, as before, the general solution in parametric form:
x = 5 + 2t
y = t.
11.3 Elementary matrices and row-equivalence
It is an interesting fact that all the elementary row operations used in the previous sub-
section can be represented themselves by square matrices of the appropriate order. For
the examples presented in subsection 11.2, the appropriate order is 2. In the general case
where the system to which elementary row operations are applied is a linear system of n
equations for n unknowns, the appropriate order is n. These square matrices are known
as elementary matrices.
Formally, an elementary matrix E is defined as an n×n matrix obtained from the identity
matrix In by performing exactly one elementary row operation on In . For example,
\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 7 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
are all elementary matrices. The first elementary matrix is obtained from I2 by multiplying
the second row of I2 by 3. This matrix operates (in a way to be explained below) on
a linear system of two equations for two unknowns. It represents the elementary row
operation corresponding to multiplying the second equation of such a system by 3. The
second elementary matrix is obtained from I3 by interchanging the second and third rows
of I3 . This matrix operates on a linear system of three equations for three unknowns. It
represents the elementary row operation corresponding to interchanging the second and
third equations of such a system. Finally, the third elementary matrix is obtained from
I4 by adding seven times the second row to the third row of I4 . This matrix operates
on a linear system of four equations for four unknowns. It represents the elementary row
operation corresponding to adding seven times the second equation to the third equation
of such a system.
In order to illustrate how elementary matrices operate on linear systems, let us return to
Example 11.2.1 and find all the elementary matrices involved in the reduction of this system
to its simplest possible equivalent form. For notational convenience, let us introduce the
so-called augmented matrix of the linear system Ax = b, denoted by
(A|b).
As its notation suggests, the augmented matrix is obtained by appending the column vector
b to the matrix A as the last column. When actual numerical entries are used, the vertical
bar is often omitted.
Example 11.3.1 The augmented matrix of the linear system considered in Example
11.2.1 is
(A|b) = \left(\begin{array}{cc|c} 2 & 3 & 12 \\ 1 & 4 & 11 \end{array}\right).
The sequence of elementary row operations originally used in Example 11.2.1 in order to
transform this system into an equivalent system in reduced row echelon form is listed again
below:
E1 : R1 ↔ R2
E2 : R2 → R2 − 2R1
E3 : R2 → −\frac{1}{5}R2
E4 : R1 → R1 − 4R2.
The way in which elementary matrices operate on the augmented matrix of a linear system
is by left matrix multiplication. For example, the matrix product E1 (A|b) represents
the very first elementary row operation performed on our system. Since the augmented
matrix of our system is a 2 × 3 matrix, the elementary matrices E1 , E2 , E3 and E4
representing the row operations E1, E2, E3 and E4 must be 2 × 2 matrices. Otherwise,
left matrix multiplication cannot be defined. Recall that, by its very construction, a 2 × 2
elementary matrix E representing an elementary row operation E is obtained from the
identity matrix I2 by performing the exact same elementary row operation E on I2 . Hence,
we find that
E_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}, \quad E_3 = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{pmatrix}, \quad E_4 = \begin{pmatrix} 1 & -4 \\ 0 & 1 \end{pmatrix}.
Let us now consider the corresponding sequence of equivalent linear systems arising in this
process, starting from the original system and terminating with the system in reduced row
echelon form. Each such system is represented by its augmented matrix, which is denoted
below by (Ai |bi ). The original system is denoted by (A|b), the system obtained after
performing the elementary operation E1 is denoted by (A1 |b1 ), the system obtained after
performing E2 is denoted by (A2 |b2 ), and so on. The following sequence is copied from
Example 11.2.1:
Original system: (A|b) = \left(\begin{array}{cc|c} 2 & 3 & 12 \\ 1 & 4 & 11 \end{array}\right)
After performing E1: (A_1|b_1) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 2 & 3 & 12 \end{array}\right)
After performing E2: (A_2|b_2) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & -5 & -10 \end{array}\right)
After performing E3: (A_3|b_3) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & 1 & 2 \end{array}\right)
After performing E4: (A_4|b_4) = \left(\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 1 & 2 \end{array}\right).
We can verify directly that left multiplication by the corresponding elementary matrices reproduces this sequence:
E_1(A|b) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\left(\begin{array}{cc|c} 2 & 3 & 12 \\ 1 & 4 & 11 \end{array}\right) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 2 & 3 & 12 \end{array}\right) = (A_1|b_1)
E_2(A_1|b_1) = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\left(\begin{array}{cc|c} 1 & 4 & 11 \\ 2 & 3 & 12 \end{array}\right) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & -5 & -10 \end{array}\right) = (A_2|b_2)
E_3(A_2|b_2) = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{pmatrix}\left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & -5 & -10 \end{array}\right) = \left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & 1 & 2 \end{array}\right) = (A_3|b_3)
E_4(A_3|b_3) = \begin{pmatrix} 1 & -4 \\ 0 & 1 \end{pmatrix}\left(\begin{array}{cc|c} 1 & 4 & 11 \\ 0 & 1 & 2 \end{array}\right) = \left(\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 1 & 2 \end{array}\right) = (A_4|b_4).
Note that the coefficient matrix A_4 is in reduced row echelon form and equal to the identity matrix I. In other words, we have the result that
E_4E_3E_2E_1(A|b) = (I|b_4)
and, in particular, restricting attention to the coefficient matrix,
E_4E_3E_2E_1A = I.
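This matrix identity is easy to verify numerically. A numpy sketch (an illustration only; fractions are represented in floating point, so tiny rounding errors may appear):

import numpy as np

A  = np.array([[2.0, 3.0], [1.0, 4.0]])
E1 = np.array([[0.0, 1.0], [1.0, 0.0]])    # R1 <-> R2
E2 = np.array([[1.0, 0.0], [-2.0, 1.0]])   # R2 -> R2 - 2 R1
E3 = np.array([[1.0, 0.0], [0.0, -0.2]])   # R2 -> -(1/5) R2
E4 = np.array([[1.0, -4.0], [0.0, 1.0]])   # R1 -> R1 - 4 R2

print(E4 @ E3 @ E2 @ E1 @ A)   # the 2 x 2 identity matrix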
We will analyse this expression further in the following two subsections, where we will show
that not only does this expression imply that the matrix A is invertible, but it also provides
an algorithm for calculating the inverse matrix A−1 . Of course, as it stands, the expression
E_4E_3E_2E_1A = I is not valid for every linear system of n equations for n unknowns. This is because, in the general case, the reduced row echelon form of the matrix A may not be equal to the identity matrix I. Instead, it may contain one or more rows of zeros, as turned out to be the case in Examples 11.2.2 and 11.2.3.
In the general case, we modify the expression
E_4E_3E_2E_1A = I
by replacing the identity matrix I by the reduced row echelon form RRE(A) of the coefficient matrix A, and by considering k elementary row operations instead of the above 4. In this way, we derive the general expression
E_kE_{k-1}\cdots E_2E_1A = RRE(A).
In fact, this result can be regarded as a corollary of a more general theorem. In order to
be able to state this theorem, we need one more definition: If A and B are matrices of the
same size, we say that A is row-equivalent to B only if there is a sequence of elementary
row operations that transforms A to B. If this is the case, B is also row-equivalent to A.
The theorem in question states that every matrix is row-equivalent to a unique matrix in reduced row echelon form; the proof is omitted. Note that, using the terminology just introduced, our result amounts to the statement that the coefficient matrix A of a linear system of n equations for n unknowns is row-equivalent to its reduced row echelon form RRE(A).
11.4 Theorems on matrix invertibility
Theorem 11.4.1 Every elementary matrix E is invertible, and the inverse E^{-1} is also an elementary matrix.
Proof Every elementary row operation can be undone by an elementary row operation of the same kind. Let E be the elementary matrix representing an elementary row operation, and let F be the elementary matrix representing the inverse operation. Performing an operation and then its inverse (in either order) leaves the identity matrix unchanged, so
FE = I_n \quad and \quad EF = I_n.
Hence E is invertible, and its inverse is the elementary matrix
F = E^{-1}.
The Main Theorem 11.4.2 If A is an n × n matrix, then the following statements are all equivalent. This means that if any of these statements is true for a square matrix A, then all the statements are true for A. Conversely, if any of these statements is false, then all the statements are false.
(1) A is invertible; that is, A^{-1} exists.
(2) The matrix equation Ax = b has a unique solution for every b ∈ R^n.
(3) The only solution of the homogeneous equation Ax = 0 is the trivial solution x = 0.
(4) The reduced row echelon form of A is the identity matrix I; that is, A is row-equivalent to I.
Proof Note that if we show that (1) =⇒ (2) =⇒ (3) =⇒ (4) =⇒ (1), then any one
statement will imply all the others, so the statements are equivalent.
(1) =⇒ (2). We assume that A^{-1} exists, and consider the system of linear equations Ax = b where x is the vector of unknowns and b is any vector in R^n. We use the matrix A^{-1} to solve for x by multiplying the equation on the left by A^{-1}. We have
A^{-1}(Ax) = A^{-1}b =⇒ (A^{-1}A)x = A^{-1}b =⇒ x = A^{-1}b.
This shows that x = A^{-1}b is the only possible solution; and it is a solution, since A(A^{-1}b) = (AA^{-1})b = Ib = b. So Ax = b has a unique solution for any b ∈ R^n.
(2) =⇒ (3). If Ax = b has a unique solution for all b ∈ Rn , then this is true for
b = 0. The unique solution of Ax = 0 must be the trivial solution, x = 0.
(3) =⇒ (4). If the only solution of Ax = 0 is x = 0, then the reduced row echelon form of
A must have a leading 1 in every column. Otherwise, there would be at least a row of zeros
in this matrix, which would imply the presence of free parameters and hence the occurrence
of infinitely many solutions. Since the reduced row echelon form of A is a square matrix
and has a leading 1 in every column, it can only be the n × n identity matrix.
(4) =⇒ (1). We now make use of elementary matrices. If A is row equivalent to I, then
there is a sequence of row operations which reduce A to I, so there must exist elementary
matrices E1 , . . . , Ek such that
Ek Ek−1 . . . E1 A = I.
Each elementary matrix has an inverse. We use these to solve the above equation for A, by first multiplying the equation on the left by E_k^{-1}, then by E_{k-1}^{-1}, and so on, to obtain
A = E_1^{-1}E_2^{-1}\cdots E_{k-1}^{-1}E_k^{-1}I.
This expresses A as a product of invertible matrices, so A is itself invertible.
This completes the proof that the above four statements are all equivalent. !
Let us complete this subsection by including a theorem about the inverse of an invertible
square matrix A.
Recall that the inverse of such a matrix A is a matrix A−1 with the property that A−1 A = I
and AA^{-1} = I. A question arises: if a square matrix B is known only to be a right inverse of A, in the sense that
AB = I,
is it the case that B is also a left inverse of A? In other words, does it follow that
BA = I
and hence that B = A−1 ? The answer is yes, as the following theorem guarantees:
Theorem 11.4.3 If A and B are n × n matrices and AB = I, then A and B are each
invertible matrices, and A = B−1 and B = A−1 .
Proof If we show that the homogeneous system of equations Bx = 0 has only the trivial
solution, x = 0, then by Theorem 11.4.2 this will prove that B is invertible. So we consider
the matrix equation Bx = 0 and multiply both sides of this equation on the left by the
matrix A. We have:
Bx = 0 =⇒ A(Bx) = A0 =⇒ (AB)x = 0.
Since AB = I, it follows that
(AB)x = 0 =⇒ Ix = 0 =⇒ x = 0.
This shows that the only solution of Bx = 0 is the trivial solution. We therefore conclude
that B is invertible, so the matrix B−1 exists.
We now multiply both sides of the equation AB = I on the right by the matrix B^{-1}. We have:
(AB)B^{-1} = IB^{-1} =⇒ A(BB^{-1}) = B^{-1} =⇒ A = B^{-1}.
So A is the inverse of B, and therefore A is also an invertible matrix. Then taking inverses of both sides of the last equation, we conclude that A^{-1} = (B^{-1})^{-1} = B. !
Note that this theorem also implies that if a square matrix A is a left inverse of a square
matrix B, then A is also a right inverse of B.
Now, from statement (4) of the Main Theorem 11.4.2, we know that if A is an invertible
matrix of order n, the reduced row echelon form of A is the n × n identity matrix I. Hence
we have
Ek Ek−1 ...E2 E1 A = I.
Moreover, the square matrix E_kE_{k-1}\cdots E_2E_1 is a left inverse of A. By Theorem 11.4.3, it is equal to A^{-1}:
A^{-1} = E_kE_{k-1}\cdots E_2E_1 = E_kE_{k-1}\cdots E_2E_1I.
Together, the equations
E_kE_{k-1}\cdots E_2E_1A = I \quad and \quad E_kE_{k-1}\cdots E_2E_1I = A^{-1}
tell us that the same row operations that transform A to I also transform I to A^{-1}.
11.5 Inversion algorithm based on elementary row operations
This gives us a method for finding the inverse of an invertible square matrix A. We start
with the matrix A and we form a new, larger matrix by placing the identity matrix to the
right of A, obtaining the matrix denoted by (A|I). We then use row operations to reduce
this to (I|B). If this is not possible (because we get a row of zeros) then the matrix is not
invertible. If it can be done, then A is invertible and B = A−1 .
Example 11.5.1 Let us use this algorithm to show that the matrix A encountered in
Example 11.1.1 is invertible. Then let us find A−1 and use it to obtain the unique solution
of the corresponding linear system
2x + 3y = 12
x + 4y = 11.
We construct the matrix
(A|I) = \left(\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 1 & 4 & 0 & 1 \end{array}\right)
and perform a sequence of row operations that transform this matrix into (I|A−1 ) provided,
of course, that this is possible. One such sequence of row operations is given below:
R1 ↔ R2: \left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 2 & 3 & 1 & 0 \end{array}\right)
R2 → R2 − 2R1: \left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 0 & -5 & 1 & -2 \end{array}\right)
R2 → −\frac{1}{5}R2: \left(\begin{array}{cc|cc} 1 & 4 & 0 & 1 \\ 0 & 1 & -\frac{1}{5} & \frac{2}{5} \end{array}\right)
R1 → R1 − 4R2: \left(\begin{array}{cc|cc} 1 & 0 & \frac{4}{5} & -\frac{3}{5} \\ 0 & 1 & -\frac{1}{5} & \frac{2}{5} \end{array}\right)
It follows that A^{-1} = \begin{pmatrix} \frac{4}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{2}{5} \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}. Hence, by the Main Theorem 11.4.2,
the solution of the system
2x + 3y = 12
x + 4y = 11
is unique. In order to obtain this solution, we multiply the matrix equation Ax = b by
A−1 on the left to obtain
A−1 Ax = A−1 b,
which implies that
x = A^{-1}b = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 12 \\ 11 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 15 \\ 10 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.
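The same algorithm can be delegated to sympy, whose rref method row-reduces the augmented matrix (A|I) exactly as above; a sketch (sympy assumed):

from sympy import Matrix, eye

A = Matrix([[2, 3],
            [1, 4]])

# Row-reduce (A | I); the right-hand block of the result is the inverse.
M, _ = Matrix.hstack(A, eye(2)).rref()
A_inv = M[:, 2:]
print(A_inv)                       # Matrix([[4/5, -3/5], [-1/5, 2/5]])
print(A_inv * Matrix([12, 11]))    # Matrix([[3], [2]])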
11.6 Exercises for self study
Exercise 11.6.1 Use the algorithm based on elementary row operations to investigate if
the following matrices are invertible. For each matrix that is invertible, find its inverse.
(a) A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{pmatrix} \quad (b) B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 6 \end{pmatrix}
Exercise 11.6.2 When solving a collection of linear systems of the form
Ax = b,
where b may represent several different vectors, it is often more practical to find A^{-1} (if it exists) and then obtain the solutions using x = A^{-1}b. Following this approach, use your inverse A^{-1} of the matrix A obtained in Exercise 11.6.1 and solve the linear system Ax = b in each of the following cases:
(i) b = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \quad (ii) b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \quad (iii) b = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.
Exercise 11.6.3 (a) Use elementary row operations to solve the linear system
2x + 5y + z = 4
6x + 15y + 2z = 9
z = 3.
(b) Then use the algorithm based on elementary row operations to investigate if the coef-
ficient matrix of the above system is invertible. Does your answer agree with the solution
obtained in part (a)?
Exercise 11.6.4 (b) Hence find the elementary matrices E_1, E_2, ..., E_k corresponding to the elementary row operations that you performed in part (a).
(c) Calculate the product matrices
Ek Ek−1 ...E2 E1 A
and
Ek Ek−1 ...E2 E1 b.
What does each of these matrices tell you about the linear system in part (a)?
11.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Section 3.1 of our Algebra textbook is relevant.
12 Matrices, 3 of 3
12.1 Minors, cofactors and the determinant
The determinant of an n × n matrix A, denoted by |A| or detA, is a number derived
from A which determines whether or not the matrix A is invertible. By the Main Theorem
11.4.2, a matrix A is invertible if and only if the corresponding matrix equation Ax = b
has a unique solution.
In order to appreciate how the issue of the uniqueness of the solution of the matrix equation Ax = b boils down to a single number |A|, let us investigate the simplest possible matrix equation of the form Ax = b; namely, one whose coefficient matrix is a 1 × 1 matrix. Our task is to derive a condition which guarantees the uniqueness of the solution of the equation Ax = b and then define the determinant |A| as a number which is characteristic of this uniqueness property. Writing A = (a), the equation Ax = b reads ax = b, which has the unique solution x = b/a if and only if a ≠ 0. Defining |A| = a, we arrive at the statement that the matrix A is invertible if and only if |A| ≠ 0.
Carrying out the corresponding analysis for a general 2 × 2 coefficient matrix A = (a_{ij})_{2×2} and defining |A| = a_{11}a_{22} − a_{12}a_{21}, we again arrive at the statement that the matrix A is invertible if and only if |A| ≠ 0.
For matrix equations Ax = b whose coefficient matrix A is of order higher than 2, this
approach is rather impractical. Luckily, a recursive formula can be established which
expresses the determinant of the n × n matrix A in terms of the determinants of (n − 1) ×
(n − 1) sub-matrices of A. These sub-matrices can then be expressed in terms of their own
(n−2)×(n−2) sub-matrices, and so on, until the original determinant is expressed entirely
in terms of the determinants of 1 × 1 matrices. These last determinants can be calculated
directly since, by definition, they are equal to their corresponding entries, according to
|(a)| = a.
The determinants of the above-mentioned sub-matrices of A are known as the (i, j) minors of A. There is also a corresponding concept known as the (i, j) cofactors of A. The relevant definitions follow:
The (i, j) minor of a matrix A, denoted by M_{ij}, is the determinant of the sub-matrix obtained by removing the ith row and the jth column of A.
The (i, j) cofactor of a matrix A, denoted by C_{ij}, is derived from the corresponding minor of A by using the formula
C_{ij} = (−1)^{i+j}M_{ij}.
In other words, the (i, j) cofactor of A is equal to the (i, j) minor of A if i + j is even, and it is equal to the negative of that minor if i + j is odd.
Example 12.1.2 Consider a general 3 × 3 matrix
A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}.
Let us write down the (i, j) minors of A and their corresponding cofactors: Since A has nine entries, there are nine minors. The (i, j) minor of A is the determinant of the 2 × 2 matrix obtained by removing the ith row and jth column of A. We have:
M_{11} = \begin{vmatrix} e & f \\ h & i \end{vmatrix}, \quad M_{12} = \begin{vmatrix} d & f \\ g & i \end{vmatrix}, \quad M_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix},
M_{21} = \begin{vmatrix} b & c \\ h & i \end{vmatrix}, \quad M_{22} = \begin{vmatrix} a & c \\ g & i \end{vmatrix}, \quad M_{23} = \begin{vmatrix} a & b \\ g & h \end{vmatrix},
M_{31} = \begin{vmatrix} b & c \\ e & f \end{vmatrix}, \quad M_{32} = \begin{vmatrix} a & c \\ d & f \end{vmatrix}, \quad M_{33} = \begin{vmatrix} a & b \\ d & e \end{vmatrix}.
Using the formula C_{ij} = (−1)^{i+j}M_{ij}, we obtain the corresponding nine cofactors of A. We have:
C_{11} = \begin{vmatrix} e & f \\ h & i \end{vmatrix}, \quad C_{12} = -\begin{vmatrix} d & f \\ g & i \end{vmatrix}, \quad C_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix},
C_{21} = -\begin{vmatrix} b & c \\ h & i \end{vmatrix}, \quad C_{22} = \begin{vmatrix} a & c \\ g & i \end{vmatrix}, \quad C_{23} = -\begin{vmatrix} a & b \\ g & h \end{vmatrix},
C_{31} = \begin{vmatrix} b & c \\ e & f \end{vmatrix}, \quad C_{32} = -\begin{vmatrix} a & c \\ d & f \end{vmatrix}, \quad C_{33} = \begin{vmatrix} a & b \\ d & e \end{vmatrix}.
Remark 12.1.3 The formula Cij = (−1)i+j Mij says the following: The position (i, j) of
the (i, j) entry in A is associated with a ‘+’ sign or a ‘−’ sign according to the alternating
pattern
+ − +
− + − .
+ − +
It should be emphasised that this allocation of signs refers to the positions of the entries
in A. It does not refer to the values of these entries, which may be positive, negative or
zero.
This pattern of signs (which can be extended to any n × n matrix in an obvious way) allows the connection between cofactors and minors to become transparent. For example, if we want to calculate the (1, 3) cofactor of A, the ‘+’ sign in position (1, 3) informs us that the (1, 3) cofactor is equal to the (1, 3) minor:
C_{13} = +M_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix}.
On the other hand, if we want to calculate, say, the (2, 1) cofactor of A, the ‘−’ sign in
position (2, 1) informs us that the (2, 1) cofactor is equal to the negative of the (2, 1) minor:
C_{21} = -M_{21} = -\begin{vmatrix} b & c \\ h & i \end{vmatrix}.
With these definitions in place, we can provide a recursive formula for calculating the
determinant of any n × n matrix A: Choose any row or any column of A and multiply
each entry in that row or column by the corresponding cofactor of A. Then sum these
individual products to obtain the determinant |A| of A.
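This recursive formula can be written out directly in code. Below is a sketch in plain Python (the function name is mine, not from the notes) that expands along the first row and uses the definition |(a)| = a as the base case:

def det(A):
    # Determinant by cofactor expansion along the first row.
    n = len(A)
    if n == 1:
        return A[0][0]   # |(a)| = a
    total = 0
    for j in range(n):
        # The (1, j+1) minor: delete row 1 and column j+1.
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2, 3], [4, 1, 1], [-1, 3, 0]]))   # 34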
Example 12.1.4 Let us use this recursive definition of the determinant in order to confirm
that the determinant of the 2 × 2 matrix
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
is given by the expression a11 a22 − a12 a21 . Recall that this expression was previously
obtained by considering the uniqueness of the solution of the matrix equation Ax = b.
Let us choose the first row of A in order to obtain the so-called ‘cofactor expansion of |A|
by row 1’. We have:
|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}C_{11} + a_{12}C_{12} = a_{11}M_{11} - a_{12}M_{12} = a_{11}a_{22} - a_{12}a_{21}.
Example 12.1.5 Let us calculate the determinant of the matrix
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{pmatrix}.
Since we are free to expand |A| by any row or column, it is convenient to choose a row or a column which involves the fewest calculations. Here, we see that a_{33} = 0, so expanding by either row 3 or column 3 is equally convenient. Let us choose the third
column. Recall that the signs associated with the positions of the entries in A are
+ − +
− + − .
+ − +
This means, in particular, that the signs associated with the positions of the entries in the
third column of A are
+
− .
+
Thus, using the above column of signs and the corresponding minors of A, we find:
|A| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{vmatrix} = +3\begin{vmatrix} 4 & 1 \\ -1 & 3 \end{vmatrix} - 1\begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix} + 0\begin{vmatrix} 1 & 2 \\ 4 & 1 \end{vmatrix} = 3(13) - 1(5) + 0 = 34.
Example 12.1.6 Does the following linear system admit a unique solution?
x + 2y + 3z = 12
4x + y + z = −7
−x + 3y = 28
The coefficient matrix of this system is given by
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{pmatrix}.
The fastest way of establishing whether or not this system has a unique solution (if a solution actually exists) is to calculate the determinant |A|. The determinant of the matrix A was calculated in Example 12.1.5, so we know that |A| = 34. Since |A| ≠ 0, the system Ax = b has a unique solution.
For completeness, let us discuss ways of obtaining this solution. Recall from Lecture Notes
11 that one way of finding this solution is to use elementary row operations in order to
transform the matrix equation Ax = b into an equivalent equation in reduced row echelon
form; that is, Ix = s. The identity matrix I is simply the reduced row echelon form of A,
and the vector s is the unique solution. An alternative way of solving this system is to use
elementary row operations in order to derive the matrix (I|A−1 ) from the matrix (A|I)
and then obtain the unique solution s by means of the matrix equation s = A−1 b.
12.2 The properties of the determinant
Theorem 12.2.1 For any n × n matrix A,
|A^T| = |A|.
Proof This theorem follows from the result stated without proof in subsection 12.1 that
all cofactor expansions of a matrix A agree. Since the cofactor expansion of |AT | by row
i is precisely the same, number for number, as the cofactor expansion of |A| by column i,
we immediately deduce that |AT | = |A|.
Theorem 12.2.2 If an n × n matrix A contains a row or a column of zeros, then |A| = 0.
Proof We just need to realise that the cofactor expansion of |A| by that particular row or column results in every cofactor being multiplied by 0. Since all these individual products are zero, their sum and hence |A| is also zero.
Example 12.2.3 Find the determinant of the matrix A given below and also the determinant of its transpose A^T:
A = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix}.
Since the first row of A is a row of zeros, Theorem 12.2.2 gives |A| = 0. Regarding A^T, its first column is a column of zeros, so we can apply Theorem 12.2.2 one more time, or we can combine Theorem 12.2.1 with the fact that |A| = 0. Either way, we conclude that |A^T| = 0.
Theorem 12.2.4 If an n × n matrix A contains two rows or two columns which are
equal, then |A| = 0.
Proof We will only provide a sketch of this proof, as it requires the method of induction.
Moreover, we will only consider the case of ‘equal rows’. This is because if this result holds
for ‘equal rows’ then, by Theorem 12.2.1, it also holds for ‘equal columns’.
For a 2 × 2 matrix with two equal rows, a direct calculation gives
\begin{vmatrix} a & b \\ a & b \end{vmatrix} = ab − ba = 0.
Now consider a 3 × 3 matrix with two equal rows. If we expand the determinant by the
other row, then each entry in that row is multiplied by a cofactor which is zero. This is
because any such cofactor is equal (up to sign) to the determinant of a 2 × 2 matrix with
two equal rows. Since each individual such product is zero, their sum is zero and so is the
determinant. For example,
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ a & b & c \end{vmatrix} = -d\begin{vmatrix} b & c \\ b & c \end{vmatrix} + e\begin{vmatrix} a & c \\ a & c \end{vmatrix} - f\begin{vmatrix} a & b \\ a & b \end{vmatrix} = 0 + 0 + 0 = 0.
In a similar way, one can show that if this result is valid for (n − 1) × (n − 1) matrices, then it is also valid for n × n matrices. Hence, by the method of induction, it is valid for all square matrices of order n ≥ 2.
Example 12.2.5 Calculate the determinant of the following matrix:
\begin{pmatrix} 2 & 3 & -6 & 2 & 8 \\ 1 & 7 & 8 & 1 & -9 \\ 3 & 2 & 3 & 3 & -1 \\ 4 & 5 & -7 & 4 & 0 \\ 9 & 1 & 1 & 9 & 8 \end{pmatrix}.
We observe that the first and fourth columns are equal. Hence, by Theorem 12.2.4, the
determinant of this matrix is zero.
Theorem 12.2.6 If the cofactors of one row are multiplied by the entries of a different row and these products are added, then the result is 0. In other words, for i ≠ j,
a_{j1}C_{i1} + a_{j2}C_{i2} + \cdots + a_{jn}C_{in} = 0.
The proof of this result is omitted, but it can be found in section 3.3 of our Algebra
Textbook under ‘proof of Corollary 3.30’. Note that, by Theorem 12.2.1, the above result
remains valid if ‘row’ is replaced by ‘column’.
Example 12.2.7 For the general 3 × 3 matrix of Example 12.1.2, let us multiply the cofactors of the first column by the entries of the second column and add these products. We have:
a_{12}C_{11} + a_{22}C_{21} + a_{32}C_{31} = a_{12}M_{11} − a_{22}M_{21} + a_{32}M_{31} = b(ei − fh) − e(bi − ch) + h(bf − ce) = 0.
Theorem 12.2.8 If A is a triangular matrix (upper or lower) of order n, then |A| is equal to the product of the entries on the main diagonal of A.
The proof of this theorem is omitted, but the result itself is quite straightforward to understand. The following example provides a visualisation.
Example 12.2.9 Find the determinant of the following upper triangular matrix:
A = \begin{pmatrix} 1 & 3 & 0 & 5 \\ 0 & -2 & 1 & 4 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 4 \end{pmatrix}.
We expand the determinant |A| by the first column of A to get
|A| = (1)\begin{vmatrix} -2 & 1 & 4 \\ 0 & 2 & 3 \\ 0 & 0 & 4 \end{vmatrix} + 0 + 0 + 0.
One final expansion by the first column yields the final answer:
|A| = (1)(−2)(2)(4) = −16.
Alternatively, we can use Theorem 12.2.8 to get the same answer directly.
12.3 Calculating determinants using row operations
In this subsection, we look at how each elementary row operation affects the value of the determinant.
Suppose that the matrix B is obtained from a matrix A by multiplying row i by a non-zero constant α. For example,
|A| = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix}, \quad |B| = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \ldots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix}.
Expanding |B| by row i, each entry \alpha a_{ij} multiplies the same cofactor C_{ij} as in the corresponding expansion of |A|, so the common factor \alpha can be taken out and
|B| = \alpha|A|.
In other words, the effect of multiplying a row of a matrix by \alpha is to multiply the determinant by \alpha.
Next, let B be obtained from a 2 × 2 matrix A by interchanging its two rows. A direct calculation gives
|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad − bc \quad and \quad |B| = \begin{vmatrix} c & d \\ a & b \end{vmatrix} = cb − da,
so
|B| = −|A|.
Now let A be a 3 × 3 matrix and let B be a matrix obtained from A by interchanging two
rows of A. For example, suppose that we interchange the first and third rows of A:
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}, \quad |B| = \begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}.
By expanding both determinants |A| and |B| using the row that is not involved in this
interchanging, i.e., the second row, we see that the corresponding cofactors have their rows
interchanged and hence have opposite signs:
|A| = -d\begin{vmatrix} b & c \\ h & i \end{vmatrix} + e\begin{vmatrix} a & c \\ g & i \end{vmatrix} - f\begin{vmatrix} a & b \\ g & h \end{vmatrix},
|B| = -d\begin{vmatrix} h & i \\ b & c \end{vmatrix} + e\begin{vmatrix} g & i \\ a & c \end{vmatrix} - f\begin{vmatrix} g & h \\ a & b \end{vmatrix}.
So, it remains true that
|B| = −|A|.
More generally, one can show that if this result holds for (n − 1) × (n − 1) matrices, then it
also holds for n × n matrices. By the method of induction, it holds for all square matrices
of order n ≥ 2.
Summarising, the effect of interchanging two rows of a matrix is to multiply the determinant
by −1.
Suppose that the matrix B is obtained from the matrix A by adding k times row i of A
to row j of A, where j )= i. For example, consider the case in which B is obtained from A
by adding 4 times row 1 of A to row 2 of A. Then
|A| = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix},
|B| = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} + 4a_{11} & a_{22} + 4a_{12} & \ldots & a_{2n} + 4a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix}.
In general, in a situation like this, we can expand |B| by row j (which in the above example
is row 2):
|B| = (aj1 + kai1 )Cj1 + (aj2 + kai2 )Cj2 + · · · + (ajn + kain )Cjn
= aj1 Cj1 + aj2 Cj2 + · · · + ajn Cjn + k(ai1 Cj1 + ai2 Cj2 + · · · + ain Cjn )
= |A| + 0.
The second bracket above is equal to 0 because it consists of the cofactors of one row
multiplied by the entries of another row, whose sum vanishes by Theorem 12.2.6.
We conclude that there is no change in the value of the determinant if a multiple of one
row is added to another row.
Moreover, by Theorem 12.2.1, all the previous statements are true if ‘row’ is replaced by ‘column’. Collecting these statements together, we have the following theorem:
Theorem 12.3.1 Let A be a square matrix.
(i) If B is obtained from A by multiplying one row (or column) of A by a non-zero constant α, then |B| = α|A|.
(ii) If B is obtained from A by interchanging two rows (or columns) of A, then |B| = −|A|.
(iii) If B is obtained from A by adding a multiple of one row (or column) of A to another row (or column) of A, then |B| = |A|.
Example 12.3.2 Just as an illustration of the use of row operations in the calculation of determinants, let us express the determinant
\begin{vmatrix} 3 & 4 \\ 2 & 7 \end{vmatrix}
as the determinant of an upper triangular matrix:
\begin{vmatrix} 3 & 4 \\ 2 & 7 \end{vmatrix} = \begin{vmatrix} 1 & -3 \\ 2 & 7 \end{vmatrix} = \begin{vmatrix} 1 & -3 \\ 0 & 13 \end{vmatrix} = 13.
The elementary row operations used in the above sequence should be clear by inspection.
Remark 12.3.3 Several examples involving the use of elementary row operations in the
calculation of large determinants can be found in the Exercises.
12.4 Inverting a matrix using cofactors and the determinant
Theorem 12.4.2 If A is an n × n matrix, then A is invertible if and only if |A| ≠ 0.
Proof Recall that one of the conclusions of Main Theorem 11.4.2 was that A is invertible
if and only if the reduced row echelon form of A is the identity matrix. Let A be any n × n
matrix and let RRE(A) be the reduced row echelon form of A. If A is invertible then
RRE(A) is the identity matrix and |RRE(A)| = 1. If A is not invertible, then RRE(A)
has at least one row of zeros and |RRE(A)| = 0.
In subsection 12.3 we saw that row operations either rescale the determinant by a non-zero constant, or change its sign, or do not affect it at all. Therefore, we can conclude that |A| = 0 if and only if |RRE(A)| = 0, and |A| ≠ 0 if and only if |RRE(A)| = 1.
Putting these statements together, |A| ≠ 0 if and only if RRE(A) = I. By the Main Theorem 11.4.2, this implies that |A| ≠ 0 if and only if A is invertible. !
Note that this is precisely how we motivated the concept of the determinant in the begin-
ning of subsection 12.1.
Theorem 12.4.3 Given an invertible n × n matrix A, its inverse matrix A^{-1} is equal to
A^{-1} = \frac{1}{|A|}\,adj(A),
where the so-called adjoint matrix, denoted by adj(A), is defined as the transpose of the matrix of cofactors of A. The latter is simply the matrix whose (i, j) entry is the (i, j) cofactor of A. In other words, the adjoint of A is the matrix
adj(A) = \begin{pmatrix} C_{11} & C_{21} & \ldots & C_{n1} \\ C_{12} & C_{22} & \ldots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \ldots & C_{nn} \end{pmatrix}.
The proof of this theorem can be found in our Algebra Textbook in Section 3.4 under
Definition 3.40.
Example 12.4.4 Use the method of the adjoint matrix to find the inverse of
A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}.
We know that
A^{-1} = \frac{1}{|A|}\,adj(A).
Starting from |A|, we can expand this determinant by row 1. We have:
|A| = 1\begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} - 2\begin{vmatrix} -1 & 1 \\ 4 & 1 \end{vmatrix} + 3\begin{vmatrix} -1 & 2 \\ 4 & 1 \end{vmatrix} = 1(1) - 2(-5) + 3(-9) = -16.
The matrix of the cofactors A_{cof} is given by
A_{cof} = \begin{pmatrix} 1 & 5 & -9 \\ 1 & -11 & 7 \\ -4 & -4 & 4 \end{pmatrix}.
Hence,
A^{-1} = \frac{1}{|A|}\,A_{cof}^T = -\frac{1}{16}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.
One can check that A−1 A = I.
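sympy exposes both the adjoint (under the name adjugate) and the inverse, so the formula of Theorem 12.4.3 can be verified directly; a sketch (sympy assumed):

from sympy import Matrix

A = Matrix([[ 1, 2, 3],
            [-1, 2, 1],
            [ 4, 1, 1]])

print(A.det())                              # -16
print(A.adjugate() / A.det())               # the inverse by Theorem 12.4.3
print(A.adjugate() / A.det() == A.inv())    # True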
12.5 Cramer’s rule
Theorem 12.5.1 Consider an n × n matrix A with |A| ≠ 0. Let x represent the vector of unknowns x_1, x_2, ..., x_n. Then for any n × 1 column vector b, the solution of the linear system Ax = b is given by
x_i = \frac{|A_i|}{|A|}, \quad i = 1, 2, ..., n,
where A_i is defined as the matrix derived from A by replacing the ith column of A with the vector b.
The proof of Theorem 12.5.1 can be found in section 3.4.2 of our Linear Algebra textbook.
Example 12.5.1 Use Cramer’s rule to find the solution of the linear system
x + 2y + 3z = 7
−x + 2y + z = −3
4x + y + z = 5
The matrices A1 , A2 and A3 are derived from the coefficient matrix A by replacing the
1st, 2nd and 3rd column of A with the vector b. We have:
A_1 = \begin{pmatrix} 7 & 2 & 3 \\ -3 & 2 & 1 \\ 5 & 1 & 1 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 7 & 3 \\ -1 & -3 & 1 \\ 4 & 5 & 1 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 2 & 7 \\ -1 & 2 & -3 \\ 4 & 1 & 5 \end{pmatrix}.
Provided that |A| ≠ 0, Cramer’s rule tells us that the unique solution of the system Ax = b is given by
x = \frac{|A_1|}{|A|} = \frac{-16}{-16} = 1, \quad y = \frac{|A_2|}{|A|} = \frac{48}{-16} = -3, \quad z = \frac{|A_3|}{|A|} = \frac{-64}{-16} = 4.
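Cramer’s rule is equally mechanical in code. A sketch for this example (sympy assumed):

from sympy import Matrix

A = Matrix([[ 1, 2, 3],
            [-1, 2, 1],
            [ 4, 1, 1]])
b = Matrix([7, -3, 5])

solution = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                          # replace column i of A by b
    solution.append(Ai.det() / A.det())   # Cramer's rule for x_i

print(solution)   # [1, -3, 4]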
12.6 Exercises for self study
Exercise 12.6.1 (b) Can you infer from part (a) what the reduced row echelon forms of the matrices A and B are?
Exercise 12.6.2 (a) Evaluate the following determinant using row operations:
\begin{vmatrix} 1 & 4 & -1 & 3 & 0 \\ 1 & 7 & 4 & 3 & 8 \\ 2 & 8 & -2 & 6 & 0 \\ 2 & 0 & 5 & 5 & 7 \\ -1 & 9 & 0 & 9 & 2 \end{vmatrix}
(b) For which values of λ is the matrix A = \begin{pmatrix} 2-λ & 3 \\ 2 & 1-λ \end{pmatrix} not invertible?
Exercise 12.6.3 (a) Evaluate the following determinant using row operations:
|A| = \begin{vmatrix} 1 & -4 & 3 & 2 \\ 2 & -7 & 5 & 1 \\ 1 & 2 & 6 & 0 \\ 2 & -10 & 14 & 4 \end{vmatrix}
(b) Based on your answer in part (a), does the matrix equation Ax = 0 have a solution?
If yes, is it unique?
(c) Consider an n × n matrix A whose determinant has the value det(A) = 5. Find the
values of det(3A), det(A2 ), det((2A)−1 ) and det(2A−1 ).
Exercise 12.6.4 Solve the system of linear equations described by the augmented matrix
(A|b) = \left(\begin{array}{cc|c} 2 & -3 & 3 \\ 4 & 2 & 14 \end{array}\right)
in each of the following ways:
(i) Performing row operations on (A|b) to reach reduced row echelon form;
(ii) Performing row operations on (A|I) to obtain A^{-1} and then using x = A^{-1}b;
(iii) By using the adjoint matrix to find A^{-1} and then obtaining x as in part (ii);
(iv) By using Cramer’s rule.
13 Developing Geometric Insight, 1 of 2
13.1 Visualising vectors and operations in R2
Figure 13.1.1
Accordingly, the set R2 is visualised as a set of arrows of various lengths and directions,
all starting at the origin of the Cartesian plane and moving outwards:
Figure 13.1.2
13.2 Operations on vectors in R2
• Vector addition: Given vectors u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} and v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, the sum u + v = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix} is the vector illustrated below:
Figure 13.2.1
• Scalar multiplication: Given a vector u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} and a scalar α ∈ R, the scalar multiple αu = \begin{pmatrix} αu_1 \\ αu_2 \end{pmatrix} is the vector illustrated below:
Figure 13.2.2
• Parallel vectors: Two vectors v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} and w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} are called parallel if there exists a non-zero real scalar α such that w = αv; that is, if w_1 = αv_1 and w_2 = αv_2. For example, the vectors \begin{pmatrix} 2 \\ -6 \end{pmatrix} and \begin{pmatrix} -1 \\ 3 \end{pmatrix} are parallel since the system of equations −1 = 2α and 3 = −6α has a consistent non-zero solution α = −1/2; i.e.,
\begin{pmatrix} -1 \\ 3 \end{pmatrix} = -\frac{1}{2}\begin{pmatrix} 2 \\ -6 \end{pmatrix}.
• Length (or norm): Given a vector v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, its length (or norm) is defined by ||v|| := \sqrt{v_1^2 + v_2^2}. This corresponds to Pythagoras’ theorem for displacements:
Figure 13.2.4
Note that ||αv|| = \sqrt{(αv_1)^2 + (αv_2)^2} = \sqrt{α^2(v_1^2 + v_2^2)} = \sqrt{α^2}\sqrt{v_1^2 + v_2^2} = |α|\,||v||; i.e., the length of a rescaled vector is equal to the original length times the absolute value of the scalar used in the rescaling.
• Unit vector: A vector v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} is called a unit vector if its length is equal to 1; that is, if \sqrt{v_1^2 + v_2^2} = 1. For example, the vectors \begin{pmatrix} 0.6 \\ 0.8 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} and \begin{pmatrix} -1 \\ 0 \end{pmatrix} are all unit vectors, while the vector \begin{pmatrix} 2 \\ 1 \end{pmatrix} is not.
Note that given any non-zero vector v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} (any vector other than \begin{pmatrix} 0 \\ 0 \end{pmatrix}), one can create a parallel unit vector pointing in the same direction as v by dividing v by its length ||v||. For example, starting from the vector \begin{pmatrix} 3 \\ 4 \end{pmatrix}, whose length is \sqrt{3^2 + 4^2} = 5, one can create the unit vector
\frac{1}{5}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix} = \begin{pmatrix} 0.6 \\ 0.8 \end{pmatrix}.
This is parallel to the original vector and points in the same direction.
• Scalar product: The scalar product of two vectors v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} and w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} is a number defined by
⟨v, w⟩ := v_1w_1 + v_2w_2.
The geometric meaning of this number will be derived later in the course. For the time being, you just need to know that the scalar product of two vectors is equal to the product of their lengths multiplied by the cosine of the angle between them:
⟨v, w⟩ = ||v||\,||w||\cos(θ).
• Angle between vectors: Making the angle θ the subject of the equation above, we obtain the formula
\cos(θ) = \frac{⟨v, w⟩}{||v||\,||w||},
which defines the angle between any two non-zero vectors v and w.
For example, the angle between \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} 0 \\ 2 \end{pmatrix} is given by
\cos(θ) = \frac{(1)(0) + (1)(2)}{\sqrt{1^2 + 1^2}\sqrt{0^2 + 2^2}} = \frac{2}{2\sqrt{2}} = \frac{1}{\sqrt{2}},
which yields θ = \frac{π}{4}.
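The same computation in numpy (a sketch; arccos returns the angle in radians):

import numpy as np

v = np.array([1.0, 1.0])
w = np.array([0.0, 2.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.arccos(cos_theta), np.pi / 4)   # both equal 0.7853981...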
• Orthogonal vectors: Two non-zero vectors are called orthogonal if the angle θ between them is equal to \frac{π}{2}. This implies that \cos(θ) = 0. Given that
\cos(θ) = \frac{⟨v, w⟩}{||v||\,||w||},
we deduce that the scalar product of two orthogonal vectors is zero:
⟨v, w⟩ = 0.
For example, the vectors \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} -2 \\ 2 \end{pmatrix} are orthogonal since
\left\langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -2 \\ 2 \end{pmatrix} \right\rangle = (1)(-2) + (1)(2) = 0.
13.3 Lines in R2
Definition and some terminology Suppose that we are given a vector p in R2 and a
non-zero vector d in R2 . We define the line through vector p in the direction of vector
d as the subset of R2 consisting of all vectors r for which r = p + td for some value of
the real parameter t. This last equation is called the parametric equation of the line
through p in the direction of d. The corresponding line is visualised below:
Figure 13.3.1
Remark 13.3.2 Note that the line is visualised as a collection of position vectors, where
each value of the parameter t gives a different position vector in this collection. If, alterna-
tively, one wishes to visualise the line as a collection of points in the Cartesian plane, then
it is the end-points of the position vectors that trace this line. If p has components p1 and
p2 , a common terminology for the above line is that it goes through the point (p1 , p2 ) and
is parallel to the vector d.
Figure 13.3.4
Geometric description There are three common ways of describing a line in R2 : using
(i) two distinct points on the line, (ii) a point on the line and a vector parallel to the line
and (iii) a point on the line and a vector orthogonal (also known as perpendicular) to the
line. These cases are visualised on the next page by using position vectors.
In each case, the parametric and the Cartesian equations of the line are derived.
Case (i) will be discussed in detail and cases (ii) and (iii) only briefly. These cases are
interconnected, so the examples and illustrations used for case (i) are equally relevant for
the other two cases.
Figure 13.3.5
Figure 13.3.6
In order to construct a Cartesian equation, we need a position vector on the line (we can take either p or q) and a vector n perpendicular to the line. Any such vector is called a normal vector and is unique up to rescaling by a non-zero scalar. Given that the line is parallel to $\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \end{pmatrix}$, a normal vector n can be obtained by switching these components around and multiplying one of them by minus one: $n = \begin{pmatrix} q_2 - p_2 \\ -q_1 + p_1 \end{pmatrix}$. This trick works because the scalar product of $\begin{pmatrix} a \\ b \end{pmatrix}$ with $\begin{pmatrix} b \\ -a \end{pmatrix}$ is zero:
$$\left\langle \begin{pmatrix} a \\ b \end{pmatrix}, \begin{pmatrix} b \\ -a \end{pmatrix} \right\rangle = (a)(b) + (b)(-a) = 0.$$
Given such a normal vector n, the line is the set of all position vectors x for which x − p
is perpendicular to n; that is,
$$\langle x - p, n \rangle = 0.$$
Figure 13.3.7
This last equation $\langle x - p, n \rangle = 0$ can be expressed in component form as
$$(x - p_1)(q_2 - p_2) + (y - p_2)(-q_1 + p_1) = 0.$$
Indeed, provided that $q_1 - p_1$ and $q_2 - p_2$ are both non-zero, the parameter $t$ can be made the subject of the set of parametric equations $x = p_1 + t(q_1 - p_1)$, $y = p_2 + t(q_2 - p_2)$ in two different ways: either as
$$t = \frac{x - p_1}{q_1 - p_1}$$
or as
$$t = \frac{y - p_2}{q_2 - p_2}.$$
Equating these two expressions we obtain a single equation; namely,
$$\frac{x - p_1}{q_1 - p_1} = \frac{y - p_2}{q_2 - p_2}.$$
This is equivalent to the Cartesian equation derived in the previous paragraph. It is left
as an exercise to investigate what happens to this elimination process if either q1 − p1 or
$q_2 - p_2$ is zero. What is the Cartesian equation of the line in each case? Note that $q_1 - p_1$ and $q_2 - p_2$ cannot both be zero because the points $(p_1, p_2)$ and $(q_1, q_2)$ are assumed distinct.
Example 13.3.8 Find a parametric and a Cartesian equation for the line in R2 through
the points (1, 1) and (3, 0), sketched below:
Figure 13.3.9
Parametric equation We just need to select a position vector on the line and a direction vector (see the sketch): $\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + t \begin{pmatrix} 2 \\ -1 \end{pmatrix}$, $t \in \mathbb{R}$.
Cartesian equation We need to have a position vector on the line such as $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ as well as a normal vector n. The latter can be derived from the direction vector $\begin{pmatrix} 2 \\ -1 \end{pmatrix}$ using the trick mentioned earlier: $n = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. A Cartesian equation then follows: $\left\langle \begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\rangle = 0$. This equation reduces to $(x-1) + 2(y-1) = 0$ or, equivalently, to $x + 2y = 3$. Alternatively, a Cartesian equation can be obtained by eliminating t from the parametric equation. We have $t = \frac{x-1}{2}$ and $t = \frac{y-1}{-1}$. Equating these two expressions we get $\frac{x-1}{2} = \frac{y-1}{-1}$. This gives $-x + 1 = 2y - 2$ or, equivalently, $x + 2y = 3$.
Example 13.3.10 Find a parametric equation for the line in R2 described by the Carte-
sian equation x + 2y = 3:
Geometric method We arrange the Cartesian equation so that the x and y terms appear on the same side. This is already the case here. Since $x + 2y = 3$ corresponds to arranging the Cartesian equation in the form $\langle x, n \rangle = \langle p, n \rangle$, the coefficients of x and y are the components of n. Hence, $n = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. A direction vector d is derived from n by the trick: $d = \begin{pmatrix} -2 \\ 1 \end{pmatrix}$. Having found d, we just need a position vector on the line. This can be found by choosing any values (x, y) that satisfy $x + 2y = 3$. One such option is (5, −1). A parametric equation then follows:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ -1 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \end{pmatrix}, \quad s \in \mathbb{R}.$$
As an exercise, show that this parametric equation and the parametric equation in Example
13.3.8 describe the same set of position vectors; i.e., the same line.
Algebraic method A single equation such as x+2y = 3 cannot determine both variables
x and y uniquely. As a result, one of these variables can be regarded as a free parameter
in the sense that it can take any real value. For example, we can think of y as a free
parameter and write y = λ where λ ∈ R. Then, the Cartesian equation x + 2y = 3 implies
that x = 3 − 2λ. The expressions x = 3 − 2λ and y = λ provide the general solution of
the Cartesian equation x + 2y = 3 in parametric form. Each value of λ gives a distinct
solution of this equation.
Recall that we encountered this method in Lectures 11 and 12 when we applied it to linear
systems that have infinitely many solutions.
Writing the general solution $x = 3 - 2\lambda$, $y = \lambda$ in vector form, we obtain precisely a parametric description for the line $x + 2y = 3$:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} -2 \\ 1 \end{pmatrix}, \quad \lambda \in \mathbb{R}.$$
As an exercise, start from the Cartesian equation x + 2y = 3 and find its general solution
by now treating x as a free parameter (instead of y). Convince yourselves that the resulting
parametric equation is equivalent to the previous one.
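As a quick sanity check of this equivalence, the sketch below (Python with the numpy library, an assumption of the illustration) verifies that both parametric equations produce points satisfying $x + 2y = 3$:

    import numpy as np

    line1 = lambda t: np.array([1.0, 1.0]) + t * np.array([2.0, -1.0])   # Example 13.3.8
    line2 = lambda s: np.array([3.0, 0.0]) + s * np.array([-2.0, 1.0])   # Example 13.3.10
    n = np.array([1.0, 2.0])                       # normal vector of x + 2y = 3
    for t in np.linspace(-5.0, 5.0, 11):
        assert np.isclose(np.dot(line1(t), n), 3.0)
        assert np.isclose(np.dot(line2(t), n), 3.0)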
Let us finally complete the presentation of lines in R2 by briefly considering the remaining
two cases.
Case (ii): Given a point $(p_1, p_2)$ on the line and a vector $d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$ parallel to the line, a parametric equation follows directly:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}, \quad t \in \mathbb{R}.$$
A normal vector can be obtained via the trick: $n = \begin{pmatrix} d_2 \\ -d_1 \end{pmatrix}$. Hence, a Cartesian equation is
$$\left\langle \begin{pmatrix} x - p_1 \\ y - p_2 \end{pmatrix}, \begin{pmatrix} d_2 \\ -d_1 \end{pmatrix} \right\rangle = 0.$$
Figure 13.3.12
Case (iii): Given a point $(p_1, p_2)$ on the line and a vector $n = \begin{pmatrix} n_1 \\ n_2 \end{pmatrix}$ perpendicular to the line, a Cartesian equation follows directly:
$$\left\langle \begin{pmatrix} x - p_1 \\ y - p_2 \end{pmatrix}, \begin{pmatrix} n_1 \\ n_2 \end{pmatrix} \right\rangle = 0.$$
For the parametric description, we need a direction vector d. This can be found via the trick: $d = \begin{pmatrix} n_2 \\ -n_1 \end{pmatrix}$. Hence, a parametric equation is
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t \begin{pmatrix} n_2 \\ -n_1 \end{pmatrix}, \quad t \in \mathbb{R}.$$
Figure 13.3.13
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has a row whose first two entries are zero and whose third entry is
non-zero. Therefore, the system is inconsistent and has no solution.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has a row of zeros, showing that one of these linear equations is
redundant. Hence, there is a single restriction on (x, y), which implies infinitely many
solutions.
Example 13.4.3 A typical example of lines intersecting at a unique point is the following:
$$\begin{cases} x + 3y = 2 \\ 3x - 2y = 5. \end{cases}$$
From the geometric point of view, the first line has $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$ as a normal vector and the second line has $\begin{pmatrix} 3 \\ -2 \end{pmatrix}$ as a normal vector, so these lines are not parallel. Consequently, they must have a unique point of intersection.
From the point of view of linear algebra, the reduced row echelon form of the coefficient
matrix is the identity I2 . Hence, there is a unique solution.
(i) Using elementary row operations, or otherwise, determine the value of a for which this
system does not have a unique solution.
(ii) For the particular value of a obtained in part (i), determine the value of b which makes
the system consistent. Hence, write down the general solution of this system in vector
parametric form and represent two of these solutions by position vectors in R2 . Add to
your sketch the line in R2 corresponding to the general solution of this system.
(iii) In the case where a has the value obtained in part (i) and b does not have the value
obtained in part (ii), interpret geometrically the corresponding linear system.
(iv) In the case where a does not have the value obtained in part (i), what is the geometric
interpretation of the corresponding linear system? Is this interpretation affected by the
particular value of b?
Exercise 13.5.3 Consider the vectors
$$u = \begin{pmatrix} 4 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$
x + 2y = 6.
(d) On a new graph, sketch the two lines described by the above Cartesian equations.
(e) By using elementary row operations, find the reduced row echelon form (I|s) of the
augmented matrix
$$(A|b) = \begin{pmatrix} 1 & 2 & 6 \\ 2 & 1 & 4 \end{pmatrix}$$
and hence the unique solution s of the above linear system. Represent your solution s by
a position vector on your graph in part (d).
(f) On that same graph, sketch the two lines whose Cartesian equations correspond to the
augmented matrix (I|s).
Parts of Sections 1.9 and 1.10 of our Algebra Textbook are relevant.
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 1.3 and 1.4 of our Calculus Textbook are also relevant.
14 Developing Geometric Insight, 2 of 2
14.1 Visualising vectors and operations in R3
Figure 14.1.1
Each vector $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ is visualised as a position vector (an arrow starting at the origin (0, 0, 0) and ending at the point (a, b, c)) or, alternatively, as the point (a, b, c) itself.
All the definitions introduced in the case of $\mathbb{R}^2$ can be extended to $\mathbb{R}^3$. For example, the norm of a vector is given by $\|v\| := \sqrt{v_1^2 + v_2^2 + v_3^2}$, the scalar product of two vectors is given by $\langle v, w \rangle := v_1 w_1 + v_2 w_2 + v_3 w_3$, and so on.
Additionally, there is a vector operation in $\mathbb{R}^3$ that has no counterpart in $\mathbb{R}^2$. This is the following: Let
$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
be any two vectors and let
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
be the unit vectors pointing along the x, y and z axes respectively. The vector product (or cross product) of the vectors a and b is the vector a × b defined by calculating the determinant
$$a \times b = \begin{vmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2 b_3 - a_3 b_2)e_1 - (a_1 b_3 - a_3 b_1)e_2 + (a_1 b_2 - a_2 b_1)e_3$$
$$= (a_2 b_3 - a_3 b_2)\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - (a_1 b_3 - a_3 b_1)\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + (a_1 b_2 - a_2 b_1)\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
It can be shown using the properties of the determinant that the vector a × b is orthogonal
to both a and b.
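A minimal numerical illustration of the cross product and this orthogonality property (Python with the numpy library, assumed here and not part of the course materials):

    import numpy as np

    a = np.array([1.0, 3.0, -2.0])
    b = np.array([2.0, 0.0, -1.0])
    n = np.cross(a, b)                 # (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1)
    print(n)                           # [-3. -3. -6.]
    print(np.dot(n, a), np.dot(n, b))  # 0.0 0.0, i.e. n is orthogonal to a and b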
14.2 Planes in R3
Geometric Descriptions
There are three common ways of describing a plane in R3 : by using (i) three distinct points
on the plane which are not collinear (i.e., do not lie on the same line), (ii) a point on the
plane and two vectors parallel to the plane but not parallel to each other and (iii) a point
on the plane and a vector perpendicular to the plane. These cases are depicted below using
position vectors in the three-dimensional real coordinate space:
Figure 14.2.1
Parametric and Cartesian Equations
Figure 14.2.2
In order to construct a Cartesian equation we need a position vector on the plane (again,
we can take p or q or s) and a vector n perpendicular to both direction vectors q − p and
s − p. The fastest way of finding such an n is by using the vector product of the vectors
q − p and s − p; that is, the vector n = (q − p) × (s − p). Alternatively, we can find n in
a different way, by solving the system of equations 9n, q − p: = 0 and 9n, s − p: = 0. The
solution of this system is unique up to rescaling n by a non-zero scalar. Unfortunately,
there is no trick that can be used here to speed up this process. Regardless of how n is
obtained, the corresponding Cartesian equation describes the plane as the set of all position
vectors x for which x − p is perpendicular to n; that is,
9x − p, n: = 0.
Note the similarity in form between the Cartesian equation of a plane in R3 and that of
a line in R2 . The corresponding geometric situation is visualised below. For clarity, the
vectors n and x − p are not depicted as position vectors (i.e., starting at the origin) but
have been translated.
Figure 14.2.3
An alternative way for obtaining a Cartesian equation for a plane involves eliminating the
parameters λ and µ from the parametric equation
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + \lambda \begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \\ q_3 - p_3 \end{pmatrix} + \mu \begin{pmatrix} s_1 - p_1 \\ s_2 - p_2 \\ s_3 - p_3 \end{pmatrix}.$$
Example 14.2.4 Find a parametric and a Cartesian equation for the plane in R3 passing
through the points (1, 1, 1), (2, 4, −1) and (3, 1, 0). Note that the sketch below is generic
and does not correspond to the actual plane considered:
Figure 14.2.5
Parametric equation We need to select a position vector on the plane (let us choose $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$) and two vectors parallel to the plane but not parallel to each other: two such vectors are $\begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ (the relevant calculations can be found on the sketch). A parametric equation then follows:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \lambda \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} + \mu \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \lambda \in \mathbb{R}, \ \mu \in \mathbb{R}.$$
Cartesian equation A normal vector $n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$ can be found by using the cross product $n = \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \times \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ or by requiring that the scalar product of the normal vector with each of the two direction vectors is zero: $\left\langle \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \right\rangle = 0$ and $\left\langle \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} \right\rangle = 0$. It is left as an exercise to show that n must be a (non-zero) scalar multiple of $\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$. Taking $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$ and choosing $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ as the position vector on the plane, the Cartesian equation follows: $\left\langle \begin{pmatrix} x-1 \\ y-1 \\ z-1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \right\rangle = 0$. This equation reduces to $(x-1) + (y-1) + 2(z-1) = 0$ or, equivalently, to $x + y + 2z = 4$. It is useful to confirm that the points (1, 1, 1), (2, 4, −1) and (3, 1, 0) satisfy this equation.
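The same normal vector and Cartesian equation can be checked numerically; the sketch below (Python with the numpy library, an assumption of the illustration) recomputes n as a cross product and confirms that the three given points lie on the plane:

    import numpy as np

    p = np.array([1.0, 1.0, 1.0])
    q = np.array([2.0, 4.0, -1.0])
    s = np.array([3.0, 1.0, 0.0])
    n = np.cross(q - p, s - p)    # normal vector (-3, -3, -6), a multiple of (1, 1, 2)
    d = np.dot(n, p)              # right-hand side of <x, n> = <p, n>, here -12
    for point in (p, q, s):       # all three points satisfy the plane equation
        assert np.isclose(np.dot(n, point), d)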
Example 14.2.6 Find a parametric equation for the plane in R3 described by the Carte-
sian equation x + y + 2z = 4.
Geometric method We check that the Cartesian equation is arranged so that the x, y and z terms appear on the same side. Then we identify the components of n with the coefficients of x, y and z. This gives $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$. We now need to find two direction vectors for the plane; that is, two vectors perpendicular to n but not parallel to each other. For this purpose we can use a generalisation of our previous trick: given any normal vector $n = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$, its scalar products with $\begin{pmatrix} 0 \\ c \\ -b \end{pmatrix}$, $\begin{pmatrix} c \\ 0 \\ -a \end{pmatrix}$ and $\begin{pmatrix} b \\ -a \\ 0 \end{pmatrix}$ are all zero; moreover, given that a, b and c are not all zero (because n cannot be a zero vector), at least two of the vectors $\begin{pmatrix} 0 \\ c \\ -b \end{pmatrix}$, $\begin{pmatrix} c \\ 0 \\ -a \end{pmatrix}$ and $\begin{pmatrix} b \\ -a \\ 0 \end{pmatrix}$ are non-zero and can be used as direction vectors for the plane perpendicular to n. Here, $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$ so two direction vectors are $\begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$. Finally, we need a position vector on the plane. Any point (x, y, z) that satisfies the Cartesian equation $x + y + 2z = 4$ is fine; for example (10, 0, −3). The corresponding parametric equation is
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 10 \\ 0 \\ -3 \end{pmatrix} + \kappa \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix} + \nu \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \kappa \in \mathbb{R}, \ \nu \in \mathbb{R}.$$
In order to confirm that this parametric equation describes the plane of Example 14.2.4, try as an exercise to find the values of (κ, ν) that produce the position vectors $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}$ used to define this plane in the first place.
Algebraic method As in the case of a line in R2 , we use the fact that a parametric
equation is a general solution for the corresponding Cartesian equation. In R3 we have
three variables: x, y and z. Hence, a single equation such as x + y + 2z = 4 implies that
two of the variables x, y and z can be regarded as free parameters. Let us choose y and z
to play this role and write y = s and z = t, where s ∈ R, t ∈ R. Then x is determined by
the Cartesian equation according to x = 4 − s − 2t. The expressions x = 4 − s − 2t, y = s
and z = t provide the general solution of the Cartesian equation x + y + 2z = 4. Each
pair of values of (s, t) gives a distinct solution of this equation. We can write this general
solution in vector form as
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 - s - 2t \\ s \\ t \end{pmatrix}.$$
By expanding this expression, we obtain a parametric description of the plane:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} + s \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \quad s \in \mathbb{R}, \ t \in \mathbb{R}.$$
As an exercise, find pairs of values of (s, t) that produce the position vectors of the points (1, 1, 1), (2, 4, −1) and (3, 1, 0) used to define this plane in Example 14.2.4.
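If you want to check this exercise with a computer algebra system, the sketch below uses Python's sympy library (an assumption of the illustration, not part of the course materials) to find the required values of (s, t):

    from sympy import Matrix, symbols, solve

    s, t = symbols('s t')
    param = Matrix([4, 0, 0]) + s * Matrix([-1, 1, 0]) + t * Matrix([-2, 0, 1])
    for point in ([1, 1, 1], [2, 4, -1], [3, 1, 0]):
        # solve the three component equations for the two parameters
        print(point, solve(param - Matrix(point), [s, t]))
    # {s: 1, t: 1}, {s: 4, t: -1}, {s: 1, t: 0}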
Finally, let us complete the presentation of planes in R3 by considering the remaining two
ways of describing such planes.
Case (ii): Suppose we are given a point (p1 , p2 , p3 ) on the plane and two vectors u and v
parallel to the plane but not parallel to each other. Then, denoting the position vec-
tor to the point (p1 , p2 , p3 ) by p, we can write down a parametric equation directly:
x = p + su + tv, t ∈ R, s ∈ R. To construct a Cartesian equation, we find a normal
vector n by either using the cross product u × v or by solving the system of equations $\langle n, u \rangle = 0$ and $\langle n, v \rangle = 0$ (which determine n up to a non-zero multiple). The Cartesian equation then follows: $\langle x - p, n \rangle = 0$.
Figure 14.2.7
Case (iii): Given a point $(p_1, p_2, p_3)$ on the plane and a vector $n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$ perpendicular to the plane, let us denote the position vector to the point $(p_1, p_2, p_3)$ by p. A Cartesian equation can be written down directly: $\langle x - p, n \rangle = 0$. To construct a parametric equation we need two vectors u and v parallel to the plane. Following our earlier discussion, two such vectors can always be selected from the set $\left\{ \begin{pmatrix} 0 \\ n_3 \\ -n_2 \end{pmatrix}, \begin{pmatrix} n_3 \\ 0 \\ -n_1 \end{pmatrix}, \begin{pmatrix} n_2 \\ -n_1 \\ 0 \end{pmatrix} \right\}$. Then a parametric equation follows: $x = p + su + tv$, where $s \in \mathbb{R}$, $t \in \mathbb{R}$.
Figure 14.2.8
14.3 Lines in R3
There are two common ways of describing a line in R3 : by using (i) two distinct points on
the line and (ii) a point on the line and a vector parallel to the line.
The line is visualised below as a collection of position vectors, where each vector in this
collection is obtained by selecting a value for λ. Again, for clarity, some of the vectors are
not depicted as position vectors but have been translated:
Figure 14.3.1
is non-zero. Then, the first two equations of the parametric system become x = p1 and
y = p2 . These are already the equations needed for the Cartesian description. The third
equation, $\lambda = \frac{z - p_3}{q_3 - p_3}$, informs us that the variable z is free. This is in agreement with the other two variables not being free, according to $x = p_1$ and $y = p_2$.
Finally, let us note that it is not possible for all three quantities q1 −p1 , q2 −p2 , q3 −p3 to be
zero because the points (p1 , p2 , p3 ) and (q1 , q2 , q3 ) on the line have been assumed distinct.
Example 14.3.2 Find a parametric and a Cartesian equation for the line in R3 passing
through the points (1, 1, 1) and (3, 1, 0). Note that the sketch below is generic:
Figure 14.3.3
Parametric equation We need to select a position vector on the line (let us choose $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$) and a vector parallel to the line such as $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ (see the sketch). A parametric equation then follows:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \mu \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \mu \in \mathbb{R}.$$
Cartesian equation We eliminate μ by combining the first with the third equation of the parametric system $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \mu \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$, thus obtaining the equation $\frac{x-1}{2} = \frac{z-1}{-1}$.
Equivalently, this equation can be written as x+2z = 3. The other equation needed for the
Cartesian description is given directly by the second equation of the parametric system,
namely y = 1. You can confirm that the Cartesian system of equations x + 2z = 3, y = 1
is satisfied by each of the points (1, 1, 1) and (3, 1, 0) on the line.
It is important to realise that the Cartesian system x + 2z = 3, y = 1 can be interpreted
as the intersection of two planes. The first plane has Cartesian equation x + 2z = 3 and
the second plane has Cartesian equation y = 1. It is easy to see that these planes intersect because their normal vectors $\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ are not parallel. You may also note that these normal vectors are perpendicular to the direction vector $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ of the line.
Figure 14.3.4
Let us also emphasise that the Cartesian description of a line in R3 is non-unique in a way
that goes beyond a simple rescaling. While the Cartesian equation of a line in R2 can only
be rescaled (i.e., x + 2y = 3 can be written equivalently as 5x + 10y = 15), the situation in
R3 is entirely different. For example, the equations x + 2z = 3, y = 1 describing the above
line can be added together to produce an equivalent system of equations x + y + 2z = 4,
y = 1. Geometrically, the old plane x + 2z = 3 is replaced by the new plane x + y + 2z = 4,
but the intersection of this new plane with the plane y = 1 still produces the same line.
Such operations (which transform a linear system of equations into an equivalent linear
system in the sense that they preserve the set of solutions) correspond to sequences of
elementary row operations.
Example 14.3.5 Find a parametric equation for the line in R3 described by the system
of Cartesian equations x + 2z = 3, y = 1.
Let us utilise the fact that a parametric equation is a general solution for the Cartesian
equations. We have two equations for three variables, which implies that one of the vari-
ables x, y, z can be regarded as free. Clearly, this variable cannot be y because y is equal
to 1. However, we can consider z to be a free variable and write z = t, where t ∈ R. Then,
the equation $x + 2z = 3$ implies that $x = 3 - 2t$. Putting $x = 3 - 2t$, $y = 1$ and $z = t$ in vector form we obtain $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 - 2t \\ 1 \\ t \end{pmatrix}$. This provides a parametric equation for the line:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}.$$
Check that this equation is equivalent to the parametric equation of Example 14.3.2.
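The general solution can also be read off with a computer algebra system; the following sketch (Python with the sympy library, assumed here and not part of the course materials) solves the Cartesian system directly:

    from sympy import symbols, linsolve

    x, y, z = symbols('x y z')
    # The line of Example 14.3.5: x + 2z = 3 and y = 1
    print(linsolve([x + 2*z - 3, y - 1], x, y, z))
    # {(3 - 2*z, 1, z)}: z is free, matching x = 3 - 2t, y = 1, z = t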
We can now complete the presentation of lines in R3 by briefly describing the second case.
Case (ii): Suppose we are given a point (p1 , p2 , p3 ) on the line and a vector u parallel to
the line. Then, denoting the position vector to the point (p1 , p2 , p3 ) by p, we can write
down a parametric equation directly: x = p + su, s ∈ R. The best way of constructing
a Cartesian equation is by eliminating the parameter s from the parametric system, as
discussed earlier. This always results in a system of two linear equations for the variables
x, y and z.
14.4 Geometric and algebraic approaches to linear systems
Consider a system of 3 linear equations for 3 unknowns. Each equation represents a plane.
A solution of this system is a vector whose components x, y and z must satisfy all equations,
which means that the end-point of this vector is on all planes, so (x, y, z) has to be a point
of intersection of these planes. There are eight distinct geometric possibilities, illustrated
below by suppressing one dimension (what appears as a line is actually a plane):
Figure 14.4.1
Let us investigate these cases by means of simple examples. We will analyse each example
from both the geometric and the algebraic point of view.
Example 14.4.2 A typical example of three distinct parallel planes is the following:
x + y − 2z = 2
x + y − 2z = 5
x + y − 2z = 7.
From the geometric point of view, all planes have $\begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$ as a normal vector, so they are
either parallel or coincident. The first plane goes through the point (2, 0, 0) while the
second and third planes do not go through that point. This already tells us that the three
planes have no point of intersection. Moreover, the second plane goes through the point
(5, 0, 0) while the third plane does not go through that point. So the three planes are
indeed distinct parallel planes.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has two rows whose first three entries are zeros and whose fourth
entry is non-zero. In particular, if we subtract the first row from the second and third
rows, we get:
$$\begin{pmatrix} 1 & 1 & -2 & 2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 5 \end{pmatrix}.$$
Therefore, the reduced system has two inconsistent rows and no solution.
Example 14.4.3 A typical example of two coincident planes and a distinct parallel plane
is the following:
x + 3y = 2
x + 3y = 7
2x + 6y = 4.
From the geometric point of view, all planes have a multiple of $n = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix}$ as a normal
vector, so they are parallel or coincident. The first and third planes both go through the
point (2, 0, 0) so they are coincident. The second plane does not go through that point, so
it is indeed a distinct parallel plane. There is no point lying on all three planes, so there
is no point of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has a row of zeros and another row whose first three entries are zeros
and whose fourth entry is non-zero. In particular, if we subtract the first row once from
the second row and twice from the third row, we get:
$$\begin{pmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
row, we get:
$$\begin{pmatrix} 1 & 5 & 3 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
For example, y and z can be regarded as free parameters, so we get a two-parameter family
of solutions.
Example 14.4.5 A typical example of two parallel planes with a third plane intersecting
them is the following:
x+z = 5
x+y+z = 6
x + y + z = 8.
From the geometric point of view, the last two planes have $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ as a normal vector so they are parallel or coincident. The second plane goes through the point (1, 1, 4) while the third plane does not go through this point, so the planes are not coincident. The first plane has $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ as a normal vector, so it is not parallel to the last two. Although this plane intersects the other two planes, there is no point that lies on all three planes. Hence, there is no point of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix has two leading ones (because there are at least two non-parallel planes), but also
a row whose first three entries are zero and whose fourth entry is non-zero. In particular,
if we subtract the first row from the second row and we also subtract the second row from
the third row, we get:
$$\begin{pmatrix} 1 & 0 & 1 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$
Hence, there is an inconsistent row and no solution.
Example 14.4.6 A typical example of two coincident planes with a third plane inter-
secting them is the following:
x+z = 5
x+y+z = 6
2x + 2y + 2z = 12.
From the geometric point of view, the last two planes have a multiple of $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ as a normal vector so they are parallel or coincident. Both these planes go through the point (6, 0, 0), so they are coincident and hence the same plane. Moreover, this last plane is not parallel to the first plane, because the first plane has $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ as a normal vector. Recall that two
non-parallel planes intersect at a line, so there are infinitely many solutions along this line
of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix has two leading ones (because there are at least two non-parallel planes) and a
row of zeros. In particular, if we subtract the first row from the second row and we also
subtract the second row twice from the third row, we get:
$$\begin{pmatrix} 1 & 0 & 1 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Since there are only two restrictions on three variables, there is a free parameter in the
system and hence a line of solutions.
Example 14.4.7 A typical example of three non-parallel planes intersecting at a line is
the following:
x+y+z = 5
x + 2y + 3z = 6
2x + 3y + 4z = 11.
From the geometric point of view, we can see that these planes have non-parallel nor-
mal vectors, so they are not parallel. However, it is difficult to proceed further without
performing some calculations. Linear algebra becomes essential here.
Indeed, from the point of view of linear algebra, we can see that if we add the first two
equations together, we obtain the third equation. Hence, the third equation is redundant.
Indeed, the reduced row echelon form of the augmented matrix has two leading ones (be-
cause there are at least two non-parallel planes), but also a row of zeros, corresponding to
the redundant equation. In particular, if we subtract the first row once from the second
row and twice from the third row, we get:
$$\begin{pmatrix} 1 & 1 & 1 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 1 \end{pmatrix}.$$
After subtracting the second row from the first and third rows, we arrive at the reduced
row echelon form of the augmented matrix:
$$\begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Since there are only two restrictions on three variables, there is a free parameter in the
system and hence a line of solutions. Due to its shape, this case can be referred to as a star (see Figure 14.4.1).
Example 14.4.8 A typical example of three non-parallel planes with no common point
of intersection is the following:
x+y+z = 5
x + 2y + 3z = 6
2x + 3y + 4z = 12.
From the geometric point of view, we can see that these planes have non-parallel normal
vectors, so they are not parallel. Again, it is difficult to proceed further without performing
some calculations. We have to rely on linear algebra once more.
From the point of view of linear algebra, we can see that if we add the first two equations
together, we obtain an equation which contradicts the third equation. Hence, the system
is inconsistent. Indeed, the reduced row echelon form of the augmented matrix has two
leading ones (because there are at least two non-parallel planes), but also a row whose first
three entries are zeros and whose fourth entry is non-zero. In particular, if we subtract the
first row once from the second row and twice from the third row, we get:
$$\begin{pmatrix} 1 & 1 & 1 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 2 \end{pmatrix}.$$
After subtracting the second row from the first and third rows, we arrive at the reduced
row echelon form of the augmented matrix:
$$\begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Hence, there is an inconsistent row and no solution. This case can be referred to as a prism (see Figure 14.4.1).
Example 14.4.9 Finally, a typical example of three non-parallel planes intersecting at a
single point is the following:
x = 5
x+y = 6
x+y+z = 8.
Again, from the geometric point of view, we can see that these planes are not parallel but
it is difficult to proceed further without performing calculations.
From the point of view of linear algebra, the reduced row echelon form of the coefficient
matrix is the identity. In particular, the reduced row echelon form of this system is
$$\begin{pmatrix} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}.$$
Hence, the system has the unique solution (x, y, z) = (5, 1, 2).
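All of the above classifications can be confirmed by computing reduced row echelon forms mechanically. The sketch below (Python with the sympy library, an assumption of the illustration, not part of the course materials) treats three of the examples; a pivot in the last column of the augmented matrix signals an inconsistent row:

    from sympy import Matrix

    systems = {
        'star (14.4.7)':  Matrix([[1, 1, 1, 5], [1, 2, 3, 6], [2, 3, 4, 11]]),
        'prism (14.4.8)': Matrix([[1, 1, 1, 5], [1, 2, 3, 6], [2, 3, 4, 12]]),
        'point (14.4.9)': Matrix([[1, 0, 0, 5], [1, 1, 0, 6], [1, 1, 1, 8]]),
    }
    for name, aug in systems.items():
        rref, pivots = aug.rref()
        print(name, pivots)   # (0, 1) / (0, 1, 3) / (0, 1, 2)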
Exercise 14.5.1 Consider the plane Π in R3 described by the Cartesian equation
$$2x + y - 3z = 4.$$
(a) Find a vector parametric equation of the plane Π by treating x and z as free parameters;
i.e. by letting x = s, s ∈ R, and letting z = t, t ∈ R.
(b) Find a vector parametric equation of the same plane Π by performing elementary row
operations on the augmented matrix
$$\left(\begin{array}{ccc|c} 2 & 1 & -3 & 4 \end{array}\right)$$
in order to obtain the reduced row echelon form, and then letting y = λ, λ ∈ R, and
z = µ, µ ∈ R, play the role of the free parameters.
(c) In what sense are the parametric equations obtained in parts (a) and (b) equivalent?
Exercise 14.5.2 Consider the line $\ell$ of intersection of the planes Π1 and Π2 described by
the Cartesian equations
Π1 : x + 2y − 3z = 5 and Π2 : x + y + z = 7.
(a) Find a vector parametric equation for the line $\ell$ and confirm that its direction vector is orthogonal to the normal vectors of both planes Π1 and Π2 .
(b) Eliminate the parameter from the vector parametric equation of line $\ell$ to obtain a Cartesian description for the line.
(c) Establish the relationship between the linear system of equations obtained in part (b)
and the original linear system
$$\begin{cases} x + 2y - 3z = 5 \\ x + y + z = 7. \end{cases}$$
Exercise 14.5.3 Consider the planes Π1 , Π2 and Π3 described by the Cartesian equations
Π1 : x + 3y + z = 4,
Π2 : x + y − 2z = 0,
Π3 : 2x − 7z = −4.
(a) Are any of these planes parallel? Briefly justify your answer.
(b) Using elementary row operations, find the reduced row echelon form of the augmented
matrix
$$\begin{pmatrix} 1 & 3 & 1 & 4 \\ 1 & 1 & -2 & 0 \\ 2 & 0 & -7 & -4 \end{pmatrix}$$
and hence show that these planes have a common line of intersection.
(c) Confirm that the direction vector d of the line r = p + td that you obtained in part (b)
is orthogonal to the normal vectors of all three planes Π1 , Π2 , Π3 and that the components
of the vector p satisfy all three equations
x + 3y + z = 4
x + y − 2z = 0
2x − 7z = −4
Exercise 14.5.4 Consider a vertical plane Π (i.e. parallel to the z-axis) which passes
through the points (1, 3, 0) and (1, 5, 2).
(a) Find a vector parametric equation for Π in the form r = p + sv + tw.
(b) Considering the cross product of the vectors v and w, or otherwise, find a Cartesian
equation for Π in the form
ax + by + cz = d
for some constants a, b, c, d to be determined.
(c) Confirm that the points (1, 3, 0) and (1, 5, 2) both satisfy the Cartesian equation ob-
tained in part (b) and that this Cartesian equation indeed describes a plane that is vertical.
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 1.5, 1.6, 1.7 and 1.8 of our Calculus Textbook are also relevant.
15 Systems of Linear Equations, 1 of 2
A line in Rn is a one-dimensional object described by the parametric equation
$$x = p + td,$$
where $t \in \mathbb{R}$. The vector $x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ is the general position vector on the line, the vector $p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix}$ is a particular position vector on the line, and the vector $d = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix}$ is a direction vector for the line.
In order to obtain a Cartesian description for a line in Rn , we eliminate the free parameter
from its vector parametric equation. This results in n − 1 independent linear equations
for the n variables x1 , . . . , xn . Conversely, a system of n − 1 independent linear equations
for the variables x1 , . . . , xn imposes n − 1 independent restrictions on these variables. This
implies that one of these variables can be regarded as a free parameter. Hence, the general
solution of this set of equations corresponds to a line in Rn . Exercise 15.5.1 (b) provides
an example of a line in R4 .
A hyperplane in Rn is an (n − 1)-dimensional geometric object described by a Cartesian
equation of the form
$$\langle x - p, n \rangle = 0.$$
The vector x is the general position vector on the hyperplane, p is a particular position
vector on the hyperplane and n is a vector orthogonal to the hyperplane.
Algebraically, a hyperplane in Rn is the set of solutions of a single linear equation for n
variables x1 , . . . , xn . Since a single restriction is imposed on x1 , . . . , xn , we deduce that n−1
of these variables remain free in the general solution. This confirms that a hyperplane is an
(n − 1)-dimensional object in Rn . Conversely, the general solution of this single equation
involves n−1 free parameters and provides a vector parametric equation for the hyperplane.
Exercise 15.5.1 (a) gives an example of a hyperplane in R4 .
In general, a flat in Rn is the set of solutions of any matrix equation Ax = b, where A is an
m × n matrix and b is an m × 1 column vector. If the system Ax = b is inconsistent, then
the flat is the empty set. If the system Ax = b is consistent and the reduced row echelon
form of the m × n matrix A has k leading ones, then n − k of the variables x1 , . . . , xn can
be regarded as free parameters in the general solution of this system. In this case, the flat
is an (n − k)-dimensional object in Rn . Conversely, given a Cartesian description for an
(n − k)-dimensional flat in Rn in the form of a system of k independent linear equations
for n variables x1 , . . . , xn , the general solution of this system involves n − k free parameters
and provides a vector parametric equation for this flat. Exercise 15.5.2 gives an example
of a two-dimensional flat in R5 .
Remark 15.1.1 Let us emphasise again that the general solution of any set of linear
equations is a flat. In particular, points, lines, planes and hyperplanes are all flats. Also
note that in Rn , a consistent system of k independent linear equations implies the presence
of n − k free parameters in the general solution of the system. For example, a consistent
system of seven independent linear equations in R12 implies a 5-dimensional flat in R12 .
A sequence of elementary row operations that puts the augmented matrix in row echelon
form is the following:
$$\xrightarrow[\substack{R_2 \,\mapsto\, R_2 - 2R_1 \\ R_3 \,\mapsto\, R_3 - R_1 \\ R_4 \,\mapsto\, R_4 - R_1}]{} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & -1 & -1 & -1 & 0 & -2 \\ 0 & -2 & -2 & 0 & 0 & 2 \\ 0 & -1 & -1 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{R_2 \,\mapsto\, -R_2} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & -2 & -2 & 0 & 0 & 2 \\ 0 & -1 & -1 & 0 & 0 & 1 \end{pmatrix}$$
$$\xrightarrow[\substack{R_3 \,\mapsto\, R_3 + 2R_2 \\ R_4 \,\mapsto\, R_4 + R_2}]{} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 2 & 0 & 6 \\ 0 & 0 & 0 & 1 & 0 & 3 \end{pmatrix} \xrightarrow{R_3 \,\mapsto\, \frac{1}{2}R_3} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 0 & 3 \end{pmatrix} \xrightarrow{R_4 \,\mapsto\, R_4 - R_3} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
This matrix is in echelon form and the system is clearly consistent, with one row simply
having dropped out. We continue the process of reduction in order to obtain a reduced
row echelon form, starting with the third row:
$$\xrightarrow[\substack{R_1 \,\mapsto\, R_1 - R_3 \\ R_2 \,\mapsto\, R_2 - R_3}]{} \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 \,\mapsto\, R_1 - R_2} \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. The matrix is equivalent to the system of equations
x1 + x5 = 1
x2 + x3 = −1
x4 = 3.
Since we have three restrictions for five variables, we know that our set of solutions will
be a 2-dimensional flat in R5 . Moreover, the form of these equations tells us that we can
regard the variables x1 , x2 and x4 (which correspond to the leading ones) as the variables
for which we are solving these equations and then treat x3 and x5 as free parameters.
At this stage, it is useful to introduce the following terminology. The variables correspond-
ing to the leading ones in the reduced row echelon form of an augmented matrix are called
the leading variables. The other variables are called the non-leading variables.
Hence, in Example 15.2.1, x1 , x2 and x4 are the leading variables and x3 and x5 are the
non-leading variables. Assigning to x3 and x5 the arbitrary values s ∈ R and t ∈ R
respectively, we solve for the leading variables $x_1$, $x_2$ and $x_4$. We get
$$x_4 = 3, \qquad x_2 = -1 - s, \qquad x_1 = 1 - t,$$
so that, in vector parametric form,
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 3 \\ 0 \end{pmatrix} + s \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \quad s \in \mathbb{R}, \ t \in \mathbb{R}.$$
This is a vector parametric equation for a 2-dimensional flat in $\mathbb{R}^5$ (some sort of plane in $\mathbb{R}^5$). The flat goes through the point (1, −1, 0, 3, 0) and is parallel to the direction vectors $\begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$. For any particular assignment of values to s and t, such as s = 0, t = 1, we obtain a particular solution of the system.
In practice, it is convenient to read the general solution directly from the reduced row
echelon form of the augmented matrix. We have
$$(A|b) \;\to\; \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
We locate the leading ones and hence identify the corresponding leading variables. Then
we locate the non-leading variables and treat them as free parameters. In the above case,
we note that the leading ones are in the first, second and fourth column. These correspond
to the leading variables x1 , x2 and x4 . We assign arbitrary parameters to the non-leading
variables x3 = s and x5 = t and then solve the three equations (appearing in the reduced
row echelon form) for the leading variables in terms of s and t. Finally, we express the
general solution in vector parametric form to recover the expression obtained previously.
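The same general solution can be extracted mechanically; the sketch below (Python with the sympy library, assumed here and not part of the course materials) passes the reduced augmented matrix to linsolve:

    from sympy import Matrix, symbols, linsolve

    x1, x2, x3, x4, x5 = symbols('x1:6')
    aug = Matrix([[1, 0, 0, 0, 1, 1],
                  [0, 1, 1, 0, 0, -1],
                  [0, 0, 0, 1, 0, 3],
                  [0, 0, 0, 0, 0, 0]])
    print(linsolve(aug, x1, x2, x3, x4, x5))
    # {(1 - x5, -x3 - 1, x3, 3, x5)}: the non-leading variables x3, x5 remain free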
If p and q are vectors such that Ap = b and Aq = b, p ≠ q, then the general position vector v(t) on the line through p and q is described parametrically by
$$v(t) = p + t(q - p), \quad t \in \mathbb{R}.$$
This means that the vector v(t) solves the linear system Ax = b for any $t \in \mathbb{R}$ (indeed, $Av(t) = Ap + t(Aq - Ap) = b + t(b - b) = b$), so there are infinitely many solutions. □
(b) Find a vector parametric equation for the line in R4 described by the system of Cartesian
equations
x1 + x4 = 0
2x2 − x3 + 3x4 = −3
x1 + x2 + x3 + x4 = 5.
Exercise 15.5.2 Solve the following system of equations Ax = b by applying the Gauss-
Jordan elimination method. Interpret geometrically your general solution in R5 :
x1 − x2 + x3 + x4 + 2x5 = 4
−x1 + x2 + x4 − x5 = −3
x1 − x2 + 2x3 + 3x4 + 4x5 = 7.
Exercise 15.5.3 (a) Write down the augmented matrix for each of the following systems
of equations, where (x, y, z) ∈ R3 , and solve the system by reducing the augmented matrix
to reduced row echelon form. Express all solutions in vector parametric form.
$$\text{(i)}\ \begin{cases} x + y + 2z = 2 \\ 2y + z = 0 \\ -x + y - z = 0 \end{cases} \qquad \text{(ii)}\ \begin{cases} x + y + 2z = 2 \\ 2y + z = 0 \\ -x + y - z = -2 \end{cases}$$
16 Systems of Linear Equations, 2 of 2
16.1 The principle of linearity
Each equation appearing in a linear system of m equations for n unknowns imposes a single
restriction on n variables; in other words, it corresponds to a hyperplane in Rn . Therefore,
the solution set of this system is either the empty set (if the system is inconsistent) or the
set of all points (equivalently, position vectors) lying on the intersection of m hyperplanes
in Rn . In particular, provided that the system Ax = b is consistent and that the reduced
row echelon form of the coefficient matrix A has k leading ones, the set of solutions is an
(n − k)-dimensional flat in Rn .
The vector parametric equation of this flat has the form
x = p + t1 u1 + t2 u2 + · · · + tn−k un−k ,
where p is a particular solution (i.e. position vector on the flat) and t1 , t2 , . . . , tn−k are
(n − k) free parameters.
It should be clear from the examples and exercises encountered so far that each of the
vectors u1 , u2 , . . . , un−k is orthogonal to the normal vector of each hyperplane appearing
in the linear system. This is because each ui provides a direction vector for the intersection
flat of these hyperplanes and therefore also provides a direction vector for each of these
hyperplanes. As such, it must be orthogonal to the normal vector of each hyperplane.
Realising that the normal vector of the hyperplane appearing in row i of the linear system
is simply the transpose of the ith row of the coefficient matrix A, we conclude that the dot
product of the transpose of the ith row of A with each vector uj must be zero. Moreover,
this must be true for all rows of A, which implies that
$Au_j = 0$ for each $j = 1, 2, \ldots, n-k$; in other words, each of the vectors $u_1, u_2, \ldots, u_{n-k}$ belongs to the null space $N(A)$ of the matrix A (the set of all solutions of the homogeneous system $Ax = 0$). This observation is formalised by the following theorem.
Theorem 16.1.1 Suppose that A is an m × n matrix and that the system Ax = b is
consistent. Let p be a solution of the system; that is, Ap = b. Then the set of solutions
of the system consists precisely of the vectors p + z where z ∈ N (A); that is,
{x | Ax = b} = {p + z | z ∈ N (A)} .
Proof To show the two sets are equal, we need to show that each is a subset of the other.
This means showing (i) that p + z is a solution of the system for any z in the null space
of A and (ii) that all solutions x of the system are of the form p + z for some z ∈ N (A).
We start with (i): If z ∈ N (A), then
$$A(p + z) = Ap + Az = b + 0 = b,$$
so p + z is indeed a solution of the system; that is,
$$\{p + z \mid z \in N(A)\} \subseteq \{x \mid Ax = b\}.$$
For (ii), suppose that x satisfies Ax = b and let z = x − p. Then Az = Ax − Ap = b − b = 0, so z ∈ N(A) and x = p + z. Hence,
$$\{x \mid Ax = b\} \subseteq \{p + z \mid z \in N(A)\}.$$
$$\begin{pmatrix} 1 & 1 & 1 & 6 \\ 2 & -1 & -3 & -9 \\ 3 & 0 & -2 & -3 \end{pmatrix} \xrightarrow[\substack{R_2 \,\mapsto\, R_2 - 2R_1 \\ R_3 \,\mapsto\, R_3 - 3R_1}]{} \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & -3 & -5 & -21 \\ 0 & -3 & -5 & -21 \end{pmatrix} \xrightarrow{R_2 \,\mapsto\, -\frac{1}{3}R_2} \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 5/3 & 7 \\ 0 & -3 & -5 & -21 \end{pmatrix} \xrightarrow[\substack{R_1 \,\mapsto\, R_1 - R_2 \\ R_3 \,\mapsto\, R_3 + 3R_2}]{} \begin{pmatrix} 1 & 0 & -2/3 & -1 \\ 0 & 1 & 5/3 & 7 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
By letting z = t and solving for the leading variables x and y, we obtain the general solution of our system in vector parametric form:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + \frac{2}{3}t \\ 7 - \frac{5}{3}t \\ t \end{pmatrix} = \begin{pmatrix} -1 \\ 7 \\ 0 \end{pmatrix} + t \begin{pmatrix} 2/3 \\ -5/3 \\ 1 \end{pmatrix}.$$
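The structure described in Theorem 16.1.1 — a particular solution plus an element of the null space — can be checked directly; the sketch below (Python with the sympy library, an assumption of the illustration) uses the system just solved:

    from sympy import Matrix

    A = Matrix([[1, 1, 1], [2, -1, -3], [3, 0, -2]])
    b = Matrix([6, -9, -3])
    p = Matrix([-1, 7, 0])    # the particular solution found above (t = 0)
    print(A * p == b)         # True
    print(A.nullspace())      # [Matrix([[2/3], [-5/3], [1]])], the direction vector
    # Every solution is p plus a multiple of the null-space vector.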
where min{m, n} denotes the smaller of the two integers m and n.
In the particular case of a square n × n matrix A, if the reduced row echelon form of A
is the identity then, clearly, A has rank n. On the other hand, if the reduced row echelon
form of A is not the identity, then there must be at least a row of zeros in the reduced row
echelon form of A, which means that the rank of A is strictly less than n.
Hence, we can revisit the Main Theorem 11.4.2 characterising invertibility of a square
matrix A and add a few more equivalent statements to that collection.
The Main Theorem 16.2.1 If A is an n × n matrix, then the following statements are
equivalent.
• A−1 exists.
• Ax = b has a unique solution for any b ∈ Rn .
• Ax = 0 has only the trivial solution, x = 0.
• The reduced row echelon form of A is the identity I.
• $|A| \neq 0$.
• The rank of A is n.
Finally, let us introduce two additional definitions which will help us analyse in the next
subsection the solution set of a general linear system Ax = b.
Definition 3 An m × n matrix A has full row rank if its rank is equal to the number
of rows in A; that is, if ρ(A) = m.
Definition 4 Similarly, we say that an m × n matrix A has full column rank if its
rank is equal to the number of columns in A; that is, if ρ(A) = n.
Case 1: A has full row rank
Clearly, since the number of rows, m, is equal to the number of leading ones, there cannot be a row of zeros in the m × n left submatrix of $\mathrm{RRE}\big((A|b)\big)$. Therefore, the system
is consistent. Moreover, there are (n − m) free parameters present in the system, so the
solution set is an (n − m)-dimensional flat in Rn .
The solution is unique only if the solution set does not contain any free parameters; that
is, only if n = m. If this is the case, the matrix A has full column rank and full row rank.
The corresponding flat in Rn consists of only a single point (equivalently position vector)
and can be referred to as a ‘zero-dimensional’ flat.
Remark 16.3.1 Summarising, if A has full row rank, the system Ax = b is consistent.
If, additionally, A has full column rank, the solution is unique.
Case 2: A has full column rank
Consider a linear system Ax = b where A is an m × n matrix of rank n. Since the number
of rows, m, must be at least as large as the number n of leading ones, we deduce that
m ≥ n.
Typically, the reduced row echelon form RRE((A|b)) of the augmented matrix (A|b) of
such a system looks like this:
$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & * \\ 0 & 1 & 0 & 0 & * \\ 0 & 0 & 1 & 0 & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & * \end{array}\right)$$
(the left block has n columns — here n = 4 — and m ≥ n rows).
In the general case where m ≥ n, whether or not the system is consistent depends on the
particular details of the equations comprising the system. For example, if the vector b on
the right hand side of the system is such that the last three rows of the above reduced
matrix consists exclusively of zeros (i.e. if the entry ‘∗’ in each of the last three rows is
zero), then the system is consistent. In that case, the solution is unique, since the full
column rank of A implies the absence of free parameters.
In the particular case when m = n, A has full row rank and full column rank so the solution
is unique regardless of the vector b on the right hand side of the system.
Remark 16.3.2 Summarising, provided that the system is consistent, if A has full column
rank, the solution is unique. If A also has full row rank, the consistency of the system is
guaranteed.
Case 3: A has neither full row rank nor full column rank
This case corresponds to a linear system Ax = b where A is an m × n matrix of rank
k where k < min(m, n). Typically, the reduced row echelon form RRE((A|b)) of the
augmented matrix (A|b) of such a system looks like this:
$$\left(\begin{array}{ccccc|c} 1 & 0 & 0 & * & * & * \\ 0 & 1 & 0 & * & * & * \\ 0 & 0 & 1 & * & * & * \\ 0 & 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & 0 & * \end{array}\right)$$
(the left block has n columns and m rows, with k leading ones — here k = 3).
The system may or may not be consistent; this depends on the particular details of the
equations comprising the system. For example, if b is such that the last two rows of the
above matrix consist entirely of zeros (i.e. the entry ‘∗’ in each of the last two rows is
zero), the system is consistent. If this is the case, the system necessarily has infinitely
many solutions, because k < n.
Let us complete this subsection by introducing a mathematical condition that amounts to
the consistency of the linear system Ax = b. This is the following:
ρ(A|b) = ρ(A).
, -
When the above condition is satisfied, there is no row in RRE (A|b) whose first entries are
all zeros and whose last, entry -is non-zero, so the system ,is consistent.
- Indeed, the presence
of such a row in RRE (A|b) would imply that RRE (A|b) has an additional leading
one compared to RRE(A) which would immediately lead to ρ(A|b) > ρ(A). Collecting
all our findings, we have the following simple rules:
• If A has full row rank, then we definitely have ρ(A|b) = ρ(A), so the consistency of
the system Ax = b is guaranteed for all b.
• If A does not have full row rank, the system may or may not be consistent. This
depends on the details of the vector b. Provided that ρ(A|b) = ρ(A), i.e., provided
that the system is consistent, then if A has full column rank the solution is unique,
and if A does not have full column rank there are infinitely many solutions.
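These rules translate directly into a rank computation; the sketch below (Python with the sympy library, assumed here and not part of the course materials) compares ρ(A|b) with ρ(A) for the two right-hand sides of Examples 14.4.7 and 14.4.8:

    from sympy import Matrix

    A = Matrix([[1, 1, 1], [1, 2, 3], [2, 3, 4]])   # rank 2, not full row rank
    for b in (Matrix([5, 6, 11]), Matrix([5, 6, 12])):
        aug = A.row_join(b)
        print(list(b), 'consistent:', aug.rank() == A.rank())
    # [5, 6, 11] consistent: True   (rank(A|b) = 2 = rank(A))
    # [5, 6, 12] consistent: False  (rank(A|b) = 3 > rank(A))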
(a) Without solving the system, argue that the system is consistent and has a line $\ell$ of solutions.
(b) Still without solving the system, is the line $\ell$ parallel to the vector $\begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix}$?
(c) Solve the system using the Gauss-Jordan method in order to find the general parametric
solution in terms of a and b.
(a) By performing suitable elementary row operations on the augmented matrix (A|b),
find a condition on a that guarantees the consistency of this system for all c ∈ R.
(b) In the case where the coefficient matrix above cannot guarantee the consistency of the
system, what should the value of c be in order for solutions to exist? For that particular
value of c, obtain the general solution of the system.
17 Vector Spaces, 1 of 4
17.1 Definition of a vector space
A real vector space V is a non-empty set equipped with a vector addition operation and
a scalar multiplication operation such that for all α, β ∈ R and all u, v, w ∈ V :
The above axioms reproduce the algebraic properties of the operations of vector addition
and scalar multiplication as defined for column vectors in Rn . In other words, the vector
addition in Rn ,
$$\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix},$$
and the scalar multiplication in Rn,
$$\lambda \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} := \begin{pmatrix} \lambda u_1 \\ \lambda u_2 \\ \vdots \\ \lambda u_n \end{pmatrix},$$
satisfy all the above axioms and turn the set Rn into a vector space. Other well-known
properties of vectors, such as the property that 0x = 0 for any x ∈ Rn , follow from the
above axioms. To see this:
0x = (0 + 0)x
0x = 0x + 0x by distributivity
0x + (−0x) = (0x + 0x) + (−0x) by adding the negative − 0x of 0x
0 = 0x + (0x + (−0x)) by associativity
0 = 0x + 0
0 = 0x.
Remark 17.1.1 The above axioms have been motivated by the algebraic properties of
column vectors in Rn . However, it is not only column vectors in Rn that can be added
and multiplied by scalars in a way consistent with the above axioms. For example, the set
F of all functions from R to R can be regarded as a vector space provided that suitable
operations of vector addition and scalar multiplication are defined on F . These operations
allow us to view each function f in F as a vector in an abstract sense. Similarly, sets
of matrices, sets of polynomials, sets of sequences and many other sets of mathematical
objects can mimic the properties of column vectors in Rn provided that suitable definitions
of vector addition and scalar multiplication are introduced on these sets. A couple of
examples are given below:
Example 17.1.2 Let F be the set of all functions from R to R with vector addition defined by point-wise addition of functions, $(f + g)(x) := f(x) + g(x)$ for all $x \in \mathbb{R}$, and with scalar multiplication defined point-wise by $(\lambda f)(x) := \lambda f(x)$ for all $x \in \mathbb{R}$.
Then F is a vector space. Note that the function which plays the role of the zero vector 0
in the vector space F is the unique function z ∈ F which satisfies axiom 4; namely, for all
f ∈ F,
(f + z)(x) = f (x) ∀x ∈ R.
Given the definition of vector addition on F, the above condition implies that $f(x) + z(x) = f(x)$ for all $x \in \mathbb{R}$; that is, $z(x) = 0$ for all $x \in \mathbb{R}$. In other words, the zero vector of F is the function which is identically zero on R.
Remark 17.1.5 Also note that the definition of a vector space says nothing about a
norm or a scalar product. The only operations with which the definition is concerned are
vector addition and scalar multiplication. Concepts such as the norm or the scalar product
(that we introduced and visualised for column vectors in Rn ) are additional structures on
a vector space. We are going to encounter such structures in Lecture Notes 20.
Theorem (subspace criterion) Let V be a vector space and let S be a non-empty subset of V. Then S is a subspace of V if and only if S is closed under vector addition ($u + v \in S$ for all $u, v \in S$) and under scalar multiplication ($\lambda u \in S$ for all $u \in S$ and all $\lambda \in \mathbb{R}$).
The above theorem is known as the subspace criterion. It provides a practical way for deciding if a subset of a vector space V is itself a vector space; i.e., a subspace of V.
Example 17.2.2 Consider the vector space R2 with the standard operations of vector
addition and scalar multiplication. Let the subset S ⊂ R2 be defined by
$$S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, y = 2x \right\}.$$
Show that S is a subspace of R2 .
The set S is a line through the origin of R2 described by the Cartesian equation y = 2x.
To show that S is a subspace of $\mathbb{R}^2$ we apply the subspace criterion. We note that S is a non-empty set since, for example, the zero vector $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in S$. To show closure under vector addition, let us take any two elements u, v of S,
$$u = \begin{pmatrix} a \\ b \end{pmatrix} \ \text{where } b = 2a, \qquad \text{and} \qquad v = \begin{pmatrix} c \\ d \end{pmatrix} \ \text{where } d = 2c.$$
Their vector sum is the vector
$$u + v = \begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} a + c \\ b + d \end{pmatrix}.$$
The subset S is closed under vector addition only if u + v is in S. In order to verify this,
we note that
b + d = 2a + 2c = 2(a + c),
so indeed, the components of u + v satisfy the Cartesian equation y = 2x. Hence, u + v
belongs to S.
To show closure under scalar multiplication, we take any element of S,
$$u = \begin{pmatrix} a \\ b \end{pmatrix} \ \text{with } b = 2a,$$
and any scalar λ ∈ R and we consider the vector λu. We have
$$\lambda u = \lambda \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \lambda a \\ \lambda b \end{pmatrix}.$$
In order to verify that λu is in S, we note that
$$\lambda b = \lambda(2a) = 2(\lambda a),$$
so indeed, the components of λu satisfy the Cartesian equation y = 2x. Hence, λu belongs to S.
By the subspace criterion, S is a subspace of R2 .
Example 17.2.3 With the vector space R2 as in Example 17.2.2, consider the subset
U ⊂ R2 defined by
$$U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, y = 2x + 1 \right\}.$$
Show that U is not a subspace of R2 .
We note that U is a line in $\mathbb{R}^2$ which does not go through the origin. Hence, the zero vector $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \notin U$. Any subspace of $\mathbb{R}^2$ must be a vector space in its own right, so it must contain the unique zero vector of $\mathbb{R}^2$ in order to have any chance of being a vector space. Since $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \notin U$, axiom 4 fails. This implies that U is not a subspace of $\mathbb{R}^2$.
Remark 17.2.4 Examples 17.2.2 and 17.2.3 are visualised below:
Figure 17.2.5
It should be clear from the first diagram that the vector sum of any two elements of S is an
element of S and that any scalar multiple of any element of S is an element of S. On the
other hand, U is not closed under vector addition and scalar multiplication. The vectors
depicted provide counterexamples.
Remark 17.2.6 Stated without proof, the following is a valid result: Any flat in Rn
(that is, point, line, plane, etc) that contains the origin of Rn is a subspace of the vector
space Rn under the standard operations of vector addition and scalar multiplication. On
the other hand, any flat in Rn that does not contain the origin of Rn is not a subspace of
Rn under the same operations.
Remark 17.2.7 A subset S of a vector space V which contains the zero vector of V may
or may not be a subspace of V . For example, consider V = R2 and let S be the subset of
V depicted below:
The subset S contains the zero vector of V , but it is not closed under vector addition and
scalar multiplication. The vectors depicted provide counterexamples.
Show that these definitions do not turn the set V = R2 into a vector space by identifying
an axiom that fails to hold.
Exercise 17.3.2 Consider the vector space V = R2 where vector addition and scalar
multiplication are defined in the standard way:
$$u + v = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} := \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix} \qquad \text{and} \qquad \lambda u = \lambda \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} := \begin{pmatrix} \lambda u_1 \\ \lambda u_2 \end{pmatrix}.$$
Identify which of the following subsets of R2 are subspaces of R2 :
(i) $Q = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, 2x - y = 0 \right\}$,
(ii) $R = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, x = 1 \right\}$,
(iii) $S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \,\middle|\, x^2 + (y-1)^2 = 1 \right\}$.
Exercise 17.3.4 Identify which of the following subsets of R3 are subspaces of R3 under
the standard operations of vector addition and scalar multiplication:
$$S_1 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, x + y + z = 0 \right\}, \qquad S_2 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, x^2 + y^2 + z^2 = 1 \right\},$$
$$S_3 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, xy = 0 \right\}, \qquad S_4 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, x = 0 \text{ and } y = 0 \right\}.$$
In each case, provide a proof or a counterexample to justify your answer. Also describe
each subset geometrically.
18 Vector Spaces, 2 of 4
18.1 Linear span
Recall that by a linear combination of vectors v1 , v2 , . . . , vk we mean a vector v of the
form
v = α1 v1 + α2 v2 + · · · + αk vk ,
for some constants αi ∈ R.
Now suppose that V is a vector space and that the vectors v1 , v2 , . . . , vk all belong to V .
The linear span of the set X = {v1 , v2 , . . . , vk }, denoted by Lin(X) or Lin{v1 , v2 , . . . , vk },
is the set of all linear combinations of the vectors $v_1, v_2, \ldots, v_k$. That is,
$$\mathrm{Lin}(X) := \{\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k \mid \alpha_1, \ldots, \alpha_k \in \mathbb{R}\}.$$
For example, consider the vectors $v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $v_3 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ in $\mathbb{R}^2$. Any vector u in $\mathrm{Lin}\{v_1, v_2, v_3\}$ is a linear combination of vectors in $\mathbb{R}^2$, which implies that u also belongs to $\mathbb{R}^2$. Conversely, any vector $v = \begin{pmatrix} a \\ b \end{pmatrix} \in \mathbb{R}^2$ can be written as a linear combination of the vectors $v_1, v_2, v_3$; for example,
$$\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 0 \begin{pmatrix} 2 \\ 3 \end{pmatrix}.$$
Theorem 18.1.3 If X = {v1 , v2 , . . . , vk } is a set of vectors that all belong to a vector
space V , then Lin(X) is a subspace of V . In fact, it is the smallest subspace of V containing
the vectors v1 , v2 , . . . , vk .
Proof The set Lin(X) is non-empty, since
Example 18.1.4 Let $V = \mathbb{R}^4$ and consider the set of vectors $X = \{v_1, v_2\}$ given by $v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^4$ and $v_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^4$. Theorem 18.1.3 tells us that $\mathrm{Lin}\{v_1, v_2\}$ is a subspace of $\mathbb{R}^4$; call it S. We can express S parametrically as follows:
$$S = \left\{ \begin{pmatrix} s \\ t \\ 0 \\ 0 \end{pmatrix} \,\middle|\, s \in \mathbb{R}, \ t \in \mathbb{R} \right\}.$$
Note that S is the smallest such subspace. For example, the subspace $\left\{ \begin{pmatrix} s \\ t \\ u \\ 0 \end{pmatrix} \,\middle|\, s, t, u \in \mathbb{R} \right\}$ also contains $v_1$ and $v_2$, but it is larger than S since it also contains $\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \notin S$.
Given a set of vectors X = {v1 , v2 , . . . , vk } that all belong to a vector space V , we say
that X spans V if Lin(X) = V ; that is, if every vector v ∈ V can be written as a linear
combination
v = α1 v1 + α2 v2 + · · · + αk vk
for some scalars α1 , . . . , αk .
In the case where V = Rn , consider the following frequently occurring problem: We have
a set of vectors {v1 , v2 , . . . , vk } in Rn and we want to know if this set spans Rn ; that is,
we want to know if Lin{v1 , v2 , . . . , vk } = Rn .
Clearly, Lin{v1 , v2 , . . . , vk } ⊆ Rn because any linear combination of vectors in Rn is also
in Rn . So, in order to establish the equality Lin{v1 , v2 , . . . , vk } = Rn , we only need to
show that Rn ⊆ Lin {v1 , v2 , . . . , vk }. In other words, given any b ∈ Rn , we need to show
that we can find scalars α1 , α2 , . . . , αk such that
α1 v1 + α2 v2 + · · · + αk vk = b.
Defining A = (v1 v2 . . . vk) as the n × k matrix whose columns are the vectors v1, v2, . . . , vk, and the k × 1 column vector x = (α1, α2, . . . , αk)^T as the vector of the unknowns, we can express the above vector equation as the linear system

Ax = b.
In order to show that any b can be obtained as a linear combination of the vectors {vi}, we require that the above matrix system admits at least one solution for every b; that is, we require that the system (A|b) is consistent for all b. This happens if and only if A has full row rank.
This is a very useful result in practice, as the following examples illustrate:
Example 18.1.6 Show that the set of vectors X = {v1, v2, v3, v4}, where v1 = (1, 0, 0)^T, v2 = (0, 1, 1)^T, v3 = (0, 0, 3)^T and v4 = (1, 2, 1)^T, spans R3.

Forming the augmented matrix (A|b), where A = (v1 v2 v3 v4), and performing elementary row operations, we find

( 1 0 0 1 | b1 )
( 0 1 0 2 | b2 )
( 0 1 3 1 | b3 )

→ (R3 ↦ R3 − R2)

( 1 0 0 1 | b1 )
( 0 1 0 2 | b2 )
( 0 0 3 −1 | b3 − b2 )

→ (R3 ↦ (1/3)R3)

( 1 0 0 1 | b1 )
( 0 1 0 2 | b2 )
( 0 0 1 −1/3 | (b3 − b2)/3 ).
The system is consistent since A has full row rank. Hence, X spans R3 .
Alternatively, and faster, we only consider A = (v1 v2 v3 v4) and simply verify that it has full row rank:

A =
( 1 0 0 1 )
( 0 1 0 2 )
( 0 1 3 1 )

→ (R3 ↦ R3 − R2)

( 1 0 0 1 )
( 0 1 0 2 )
( 0 0 3 −1 )

→ (R3 ↦ (1/3)R3)

( 1 0 0 1 )
( 0 1 0 2 )
( 0 0 1 −1/3 ).
Example 18.1.7 Determine whether the set of vectors X = {v1, v2}, where v1 = (1, 2)^T and v2 = (2, 4)^T, spans R2.

Again, let us first approach this question by using the definition of ‘X spanning R2’. For this to be the case, we must find scalars α1, α2 such that

α1 (1, 2)^T + α2 (2, 4)^T = (b1, b2)^T

for any (b1, b2)^T ∈ R2. Performing a few elementary row operations on the augmented matrix of this system we find

( 1 2 | b1 )
( 2 4 | b2 )

→ (R2 ↦ R2 − 2R1)

( 1 2 | b1 )
( 0 0 | b2 − 2b1 ).
In this case, it is not true that the system is consistent for all (b1, b2)^T ∈ R2, since any (b1, b2)^T which satisfies b2 − 2b1 ≠ 0 immediately leads to an inconsistency. Hence X does not span R2.
Alternatively, and faster, we simply focus on A = (v1 v2) and verify that it does not have full row rank:

( 1 2 )
( 2 4 )

→ (R2 ↦ R2 − 2R1)

( 1 2 )
( 0 0 ).

The conclusion that X does not span R2 follows.
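In practice, the row-rank test is easy to automate with a computer algebra system. The following sketch uses Python with SymPy, our own choice of tool rather than one prescribed by the course, to check the two examples above (the function name is ours):

import sympy as sp

def spans_Rn(vectors):
    # X spans R^n exactly when the n x k matrix A = (v1 ... vk),
    # whose columns are the given vectors, has full row rank.
    A = sp.Matrix.hstack(*[sp.Matrix(v) for v in vectors])
    return A.rank() == A.rows

# Example 18.1.6: four vectors that span R^3
print(spans_Rn([[1, 0, 0], [0, 1, 1], [0, 0, 3], [1, 2, 1]]))  # True

# Example 18.1.7: v1 = (1, 2)^T and v2 = (2, 4)^T do not span R^2
print(spans_Rn([[1, 2], [2, 4]]))  # False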
18.2 Linear independence

Let V be a vector space and let v1, v2, . . . , vk ∈ V . The set {v1, v2, . . . , vk} is called linearly independent if the only scalars α1, α2, . . . , αk satisfying

α1 v1 + α2 v2 + · · · + αk vk = 0

are α1 = α2 = · · · = αk = 0. If, instead, there exist scalars α1, α2, . . . , αk, not all zero, such that

α1 v1 + α2 v2 + · · · + αk vk = 0,

then the set {v1, v2, . . . , vk} is called linearly dependent.
In the case where V = Rn , consider the following frequently occurring problem: We have
a set of vectors v1 , v2 , . . . , vk in Rn and we need to decide whether or not the vectors
v1 , v2 , . . . , vk are linearly independent.
Again, by letting A = (v1 v2 . . . vk) be the n × k matrix whose columns are the vectors v1, . . . , vk and x = (α1, α2, . . . , αk)^T be the k × 1 column vector of the unknowns α1, α2, . . . , αk, we express the vector equation

α1 v1 + α2 v2 + · · · + αk vk = 0

as the homogeneous linear system Ax = 0. The vectors v1, v2, . . . , vk are linearly independent if and only if this system admits only the trivial solution x = 0; that is, if and only if A has full column rank.
Example 18.2.3 In R3, are the vectors v1 = (1, 2, 0)^T, v2 = (0, 1, 1)^T linearly independent? Do they span R3?
Again, both questions can be answered by considering A = (v1 v2). We perform a few elementary row operations on A to get it in RRE form:

( 1 0 )
( 2 1 )
( 0 1 )

→ (R2 ↦ R2 − 2R1)

( 1 0 )
( 0 1 )
( 0 1 )

→ (R3 ↦ R3 − R2)

( 1 0 )
( 0 1 )
( 0 0 ).
Since A does not have full row rank, v1 , v2 do not span R3 . On the other hand, A has full
column rank, so the vectors v1 , v2 are linearly independent.
Example 18.2.4 In R2, are the vectors v1 = (1, 1)^T, v2 = (1, 2)^T, v3 = (2, 3)^T linearly independent? Do they span R2?
We consider

A =
( 1 1 2 )
( 1 2 3 )

and perform a few elementary row operations:

( 1 1 2 )
( 1 2 3 )

→ (R2 ↦ R2 − R1)

( 1 1 2 )
( 0 1 1 )

→ (R1 ↦ R1 − R2)

( 1 0 1 )
( 0 1 1 ).
Since A has full row rank, v1 , v2 , v3 span R2 . On the other hand, A does not have full
column rank, so the vectors v1 , v2 , v3 are not linearly independent.
Example 18.2.5 In R3, are the vectors e1 = (1, 0, 0)^T, e2 = (0, 1, 0)^T, e3 = (0, 0, 1)^T linearly independent? Do they span R3?

We see that A = (e1 e2 e3) is the 3 × 3 identity matrix, which has both full column rank and full row rank. So these vectors are linearly independent and span R3.
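The same matrix does double duty: its row rank answers the spanning question and its column rank answers the independence question. A minimal SymPy check of Examples 18.2.3 and 18.2.4 (the function names are ours):

import sympy as sp

def full_row_rank(vectors):
    A = sp.Matrix.hstack(*[sp.Matrix(v) for v in vectors])
    return A.rank() == A.rows   # full row rank <=> the vectors span R^n

def full_column_rank(vectors):
    A = sp.Matrix.hstack(*[sp.Matrix(v) for v in vectors])
    return A.rank() == A.cols   # full column rank <=> linear independence

# Example 18.2.3: independent, but do not span R^3
print(full_column_rank([[1, 2, 0], [0, 1, 1]]),   # True
      full_row_rank([[1, 2, 0], [0, 1, 1]]))      # False

# Example 18.2.4: span R^2, but are not independent
print(full_column_rank([[1, 1], [1, 2], [2, 3]]),  # False
      full_row_rank([[1, 1], [1, 2], [2, 3]]))     # True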
More generally, when deciding these two questions for k vectors in Rn, it is worth comparing k with n. Three cases arise.

Case (i): If the number of vectors, k, exceeds the dimension n of the vector space Rn, we
have that
ρ(A) ≤ n < k.
In this case, A may or may not have full row rank, but it certainly does not have full
column rank. In other words, the vectors cannot be linearly independent. They may or
may not span Rn , depending on whether ρ(A) < n or ρ(A) = n.
Case (ii): If the number of vectors, k, is less than the dimension n of the vector space
Rn , we have that
ρ(A) ≤ k < n.
A may or may not have full column rank, but it certainly does not have full row rank. In
other words, the vectors cannot span Rn . They may or may not be linearly independent,
depending on whether ρ(A) < k or ρ(A) = k.
Case (iii): If the number of vectors, k, is equal to the dimension n of the vector space Rn ,
the rank of A either satisfies
ρ(A) = k = n
and the vectors are linearly independent and also span Rn , or
ρ(A) < k = n,
in which case the vectors are not linearly independent and do not span Rn .
Alternatively, in Case (iii) we can consider the determinant of the square matrix A. If |A| ≠ 0, the vectors are linearly independent and span Rn. If |A| = 0, the vectors are not linearly independent and they do not span Rn.
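For the square case k = n, the determinant test is a one-line computation; a quick SymPy illustration using the examples above:

import sympy as sp

# Example 18.2.5: columns e1, e2, e3; |A| = 1 is non-zero, so the
# columns are linearly independent and span R^3.
print(sp.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).det())  # 1

# Columns (1, 2)^T and (2, 4)^T; |A| = 0, so they are linearly
# dependent and do not span R^2.
print(sp.Matrix([[1, 2], [2, 4]]).det())  # 0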
18.3 Exercises for self study

Exercise 18.3.2 Let F(R) be the set of all functions from R to R. It is given that F(R)
is a vector space under the operations of vector addition and scalar multiplication given
below:
(f + g)(x) := f (x) + g(x),
(λf )(x) := λf (x).
Determine which of the following two sets are subspaces of F(R):
S1 = {f ∈ F(R) | f (0) = 1} , S2 = {f ∈ F(R) | f (1) = 0} .
Exercise 18.3.3 (a) Show that the set S1 spans R3 , and that any vector v ∈ R3 can be
written as a linear combination of the vectors in S1 in infinitely many ways. Show that S2
does not span R3 .
S1 = { (1, 2, 3)^T, (1, 0, −1)^T, (0, 1, 1)^T, (1, 1, 0)^T },    S2 = { (1, 0, −1)^T, (2, 1, 3)^T, (1, 2, 9)^T }.
(b) Determine which of the following sets consist of linearly independent vectors:

L1 = { (1, 2)^T, (1, 3)^T },    L2 = { (1, 2)^T, (1, 3)^T, (2, 5)^T },    L3 = { (0, 0)^T, (1, 2)^T },

L4 = { (1, 2, 0)^T, (2, 7, 0)^T, (3, 5, 0)^T },    L5 = { (1, 2, 1, 3)^T, (2, 0, −1, 2)^T, (4, 4, 1, 8)^T }.
Exercise 18.3.4 Let M2 (R) be the set of all 2 × 2 matrices with real entries. It is given
that M2 (R) is a vector space under the standard operations of matrix addition and scalar
multiplication. Which of the following subsets of M2 (R) are subspaces of M2 (R)?
W1 is the set of all matrices of the form

( a 0 )
( 0 b ),

and W2 the set of all matrices of the form

( a² 0 )
( 0 b² ),

where a, b ∈ R.
19 Vector Spaces, 3 of 4
19.1 Theorems on linear span and linear independence
In this subsection, we collect four theorems relating to the linear span and the linear
independence or dependence of a set of vectors {v1 , v2 , . . . , vk } in a vector space V . These
theorems are applicable to a general vector space V and their proofs can be found in our
Algebra Textbook. Below, we restrict attention to the case where V = Rn and base the
proofs of these theorems on the properties of the matrix A = (v1 v2 . . . vk ). It is essential
to know these theorems and fully understand their proofs.
Theorem 19.1.1 The set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set if and only if
at least one vector vi in this set is a linear combination of the remaining vectors.
Proof This theorem is of the form ‘if and only if’, so we need to prove it both ways: If the
set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set, then the matrix A = (v1 v2 . . . vk )
does not have full column rank. Otherwise, the only solution of the equation Ax = 0,
where x = (α1, α2, . . . , αk)^T is a vector of scalars, would be the trivial solution x = 0. This would
imply that the vectors {v1 , v2 , . . . , vk } are linearly independent, thus contradicting our
assumption that the set {v1 , v2 , . . . , vk } is linearly dependent. Hence, since A does not
have full column rank, there exists at least one free parameter t in the general solution of
the system Ax = 0. Now suppose that the role of this free parameter t is played by the
entry αi in x for some i ∈ {1, 2, . . . , k}. In other words, suppose that αi = t. We can
choose t = 1 and set any other free parameters that may appear in the general solution of
the system Ax = 0 to zero. In this way, we obtain a particular solution for x of the form
x = (α1, . . . , αi, . . . , αk)^T = (c1, . . . , 1, . . . , ck)^T
for some constants c1 , . . . , ci−1 , ci+1 , . . . , ck that depend on the details of the system. This
solution for x implies that
c1 v1 + · · · + vi + · · · + ck vk = 0.
Hence, solving this equation for vi , we obtain vi as a linear combination of the remaining
vectors in the set {v1 , v2 , . . . , vk }. This proves one direction of the theorem; namely, that
if the set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set, then at least one vector vi in
this set is a linear combination of the remaining vectors.
Conversely, if at least one vector vi in the set {v1 , v2 , . . . , vk } is a linear combination of
the remaining vectors; that is, if
vi = c1 v1 + · · · + ci−1 vi−1 + ci+1 vi+1 + · · · + ck vk ,
we reach the conclusion that a non-trivial linear combination of the vectors in the set
{v1 , v2 , . . . , vk } (in other words, a linear combination where not all the scalars are zero) is
equal to the zero vector; namely,
c1 v1 + · · · + ci−1 vi−1 − vi + ci+1 vi+1 + · · · + ck vk = 0.
Hence, by definition, the set {v1 , v2 , . . . , vk } is linearly dependent. This proves the second
direction of the theorem and completes the proof. □
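The proof above is constructive: a free parameter in the general solution of Ax = 0 yields an explicit dependency. A short SymPy sketch of this construction on a made-up example (the vectors are ours, chosen so that v3 = v1 + 2v2):

import sympy as sp

# Columns v1 = (1, 2, 0)^T, v2 = (0, 1, 1)^T, v3 = v1 + 2 v2 = (1, 4, 2)^T.
A = sp.Matrix([[1, 0, 1],
               [2, 1, 4],
               [0, 1, 2]])

# A basis of the solution space of Ax = 0; the free parameter is set to 1,
# exactly as in the proof of Theorem 19.1.1.
x = A.nullspace()[0]
print(x.T)  # Matrix([[-1, -2, 1]])

# Reading off -1*v1 - 2*v2 + 1*v3 = 0 and solving for v3 gives
# v3 = v1 + 2*v2, a linear combination of the remaining vectors.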
Corollary to Theorem 19.1.1 Note that in the special case where we have two vectors
v1 and v2 in Rn , where n ≥ 2, the above theorem implies that these vectors are linearly
dependent if and only if at least one of them is a scalar multiple of the other. In other
words, at least one of the following two expressions is valid:
α1 v1 + v2 = 0 for some scalar α1 (hence v2 = −α1 v1 )
or
v1 + α2 v2 = 0 for some scalar α2 (hence v1 = −α2 v2 ).
For example, in R3, let v1 = (1, 2, 4)^T and v2 = (2, 4, 8)^T. The matrix A = (v1 v2) does not have full column rank, so the set {v1, v2} ⊆ R3 is a linearly dependent set. In this particular case, we can express each vector as a scalar multiple of the other, since

−2v1 + v2 = 0 (hence v2 = 2v1)
and

v1 − (1/2) v2 = 0 (hence v1 = (1/2) v2).
On the other hand, let v1 = (1, 2, 4)^T and v2 = (0, 0, 0)^T. The matrix A = (v1 v2) does not have full column rank, so the set {v1, v2} ⊆ R3 is a linearly dependent set. Here, we can express v2 as a scalar multiple of v1, since v2 = 0v1, but not v1 as a scalar multiple of v2: it is not possible to express a non-zero vector as a scalar multiple of the zero vector. This last example motivates another theorem:
Theorem 19.1.2 A set of vectors {v1 , v2 , . . . , vk } ⊆ Rn which contains the zero vector
is a linearly dependent set.
Proof This follows immediately: If one of the vectors in the set {v1 , v2 , . . . , vk } is
the zero vector, the matrix A = (v1 v2 . . . vk ) cannot have full column rank, so the set
{v1, v2, . . . , vk} is linearly dependent. □
The following theorem captures an important property of a linearly independent set of
vectors:
Theorem 19.1.3 If {v1, v2, . . . , vk} ⊆ Rn is a linearly independent set and

α1 v1 + α2 v2 + · · · + αk vk = β1 v1 + β2 v2 + · · · + βk vk ,

then

α1 = β1 , α2 = β2 , . . . , αk = βk .

In other words, every vector in Lin{v1, v2, . . . , vk} can be written as a linear combination of v1, v2, . . . , vk in exactly one way.

Now suppose that the set S = {v1, v2, . . . , vk} ⊆ Rn is linearly
independent. If w is in the linear span of S, then the new set T = {v1 , v2 , . . . , vk , w} fails
to be linearly independent. This is because, by definition of the linear span, any such w
is a linear combination of the ‘v’-vectors in T = {v1 , v2 , . . . , vk , w}. Hence, by Theorem
19.1.1, the set T is a linearly dependent set. So, what if w is not in the linear span of S?
The following theorem guarantees that in this case the new set T = {v1 , v2 , . . . , vk , w} is
still linearly independent.
Theorem 19.1.4 If S = {v1, v2, . . . , vk} ⊆ Rn is a linearly independent set and w ∈ Rn does not belong to Lin(S), then the set T = {v1, v2, . . . , vk, w} is also a linearly independent set.

19.2 Basis and dimension

A set of vectors B = {v1, v2, . . . , vk} ⊆ V is called a basis of the vector space V if B is a linearly independent set which spans V .

Theorem 19.2.1 A set of vectors B = {v1, v2, . . . , vn} ⊆ Rn is a basis of Rn if and only if every v ∈ Rn can be expressed as a unique linear combination of the vectors in B:

v = α1 v1 + α2 v2 + · · · + αn vn .

Proof If B is a basis of Rn, then B spans Rn, so every v ∈ Rn can be written as a linear combination of the vectors in B; moreover, since B is linearly independent, Theorem 19.1.3 guarantees that this expression is unique.
Conversely, if every v ∈ Rn can be expressed as a unique linear combination of the vectors
in B,
v = α1 v1 + α2 v2 + · · · + αk vk ,
then the equation Ax = v, where A = (v1 v2 . . . vk) is the matrix whose columns are the vectors in B, has a unique solution x = (α1, α2, . . . , αk)^T for any right hand side v ∈ Rn.
Hence, the n×k matrix A is actually square and invertible of size n. As a result, A has full
column rank and full row rank. This implies that the set B = {v1 , v2 , . . . , vk }, with k = n,
is a linearly independent set which spans Rn . Hence, by definition, the set B is a basis for
Rn. □
There are many different bases for a vector space V . A fundamental fact concerning vector
spaces is that if a vector space V has a basis consisting of a finite number of vectors, k,
then all the bases of V contain precisely the same number of vectors, k. This important
result is stated here without proof, but its proof can be found in our algebra textbook.
This result enables us to define exactly what we mean by the dimension of a vector space:
The number k of vectors in a finite basis of a vector space V is called the dimension of
V.
Remark 19.2.2 The vector space {0} is defined to have dimension 0. According to this
convention, {0} has a basis that contains no vectors. The only set without elements is the
empty set ∅, so this is the basis for {0}. In other words, {0} = Lin(∅).
Example 19.2.3 Any basis B of the vector space Rn has to have n vectors. Otherwise,
the matrix A whose columns are the vectors comprising the set B cannot have full column
rank and full row rank. Hence, the dimension of what is known as the n-dimensional real
coordinate space Rn is indeed n.
Example 19.2.4 Regarding the subspaces of Rn , any flat in Rn through the origin that
has a basis of k vectors is called a k-dimensional subspace of Rn . In particular, a line
through the origin is a one-dimensional subspace of Rn , a plane through the origin is a
two-dimensional subspace of Rn , and so on, until we reach a hyperplane through the origin,
which is an (n − 1)-dimensional subspace of Rn .
Example 19.2.5 On the other hand, the vector space F(R) of all functions from R to R
under the standard operations of pointwise addition and scalar multiplication has no finite
basis. We will provide more details about it later in the course. For the time being, let
us only mention that a vector space such as F(R) is called an infinite-dimensional vector
space.
A basis of a vector space V can also be characterised as a smallest spanning set of V . By a smallest spanning set, we mean a set of vectors {v1, v2, . . . , vk} that spans V and
has the property that if any of the vectors in this set is removed, the remaining vectors
can no longer span V . In particular, this implies that a smallest spanning set is a linearly
independent set.
19.3 Coordinates

Let B = {v1, v2, . . . , vk} be a basis of a vector space V . The scalars α1, α2, . . . , αk in the unique expression

v = α1 v1 + α2 v2 + · · · + αk vk

are called the coordinates of v with respect to the basis B and are denoted by (v)B. The subscript B can be omitted if the basis B happens to be the standard basis.
Example 19.3.1 In R2, consider the standard basis B = {e1, e2} and a non-standard basis C = {v1, v2}, where

e1 = (1, 0)^T, e2 = (0, 1)^T, v1 = (1, 2)^T, v2 = (1, −1)^T.

Consider the vector v whose coordinates with respect to the standard basis are (v)B = (2, −5)^T.
On the other hand, the coordinates (v)C of v with respect to the non-standard basis C are
obtained by solving the equation
α1 v1 + α2 v2 = v,
Performing a few elementary row operations (the steps are omitted), we find that α1 = −1 and α2 = 3, which means that

(v)C = (−1, 3)^T.
Note that the vector equation v = −1v1 + 3v2 is a valid equation irrespective of the
basis with respect to which the coordinates of these vectors are expressed. In particular,
expressing the vector equation v = −1v1 +3v2 with respect to the standard basis B results
in the matrix equation
(v) = −1(v1 ) + 3(v2 ).
Indeed, we have

(2, −5)^T = −1 (1, 2)^T + 3 (1, −1)^T.
Similarly, expressing the vector equation v = −1v1 + 3v2 with respect to the non-standard
basis C results in the matrix equation
(v)C = −1(v1)C + 3(v2)C .
Realising that

(v1)C = (1, 0)^T and (v2)C = (0, 1)^T,

the above matrix equation becomes

(−1, 3)^T = −1 (1, 0)^T + 3 (0, 1)^T,

which indeed holds.
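Finding coordinates with respect to a non-standard basis is just a linear solve. A SymPy sketch reproducing Example 19.3.1 (the variable names are ours):

import sympy as sp

v1 = sp.Matrix([1, 2])        # first basis vector of C
v2 = sp.Matrix([1, -1])       # second basis vector of C
v  = sp.Matrix([2, -5])       # coordinates of v in the standard basis

P = sp.Matrix.hstack(v1, v2)  # columns are the basis vectors of C
coords = P.LUsolve(v)         # solves P * (alpha1, alpha2)^T = v
print(coords.T)               # Matrix([[-1, 3]]), i.e. (v)_C = (-1, 3)^T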
19.4 Exercises for self study

Exercise 19.4.3 For each of the sets Si given below, find a basis of the vector space Lin(Si) and state its dimension. Provide a Cartesian description for any proper subspace of R2 or R3 (i.e., for any subspace that is not equal to R2 or R3):

S1 = { (1, 2)^T, (2, 3)^T },    S2 = { (1, −1)^T, (0, 0)^T, (2, −2)^T, (−3, 3)^T },

S3 = { (1, 0, −1)^T, (2, 1, 3)^T, (1, 2, 9)^T }.
Exercise 19.4.4 Let V be the set of all functions f : R → R of the form f(x) = a + bx, where a, b ∈ R. It is given that V is a vector space under the standard operations of pointwise addition and scalar multiplication; that is, under the operations

(f + g)(x) := f(x) + g(x),    (λf)(x) := λf(x).

Consider the set S = {f1, f2} ⊆ V consisting of the vectors

f1 : R → R defined by f1(x) = 1

and

f2 : R → R defined by f2(x) = x.
(a) Show that the set S is a basis of V ; that is, show that S is a linearly independent set
which spans V . Also state the dimension of V .
(b) State the coordinates (f1 )S and (f2 )S of the vectors f1 and f2 with respect to the basis
S.
Now consider the set T = {g1 , g2 } ⊆ V consisting of the vectors
g1 : R → R defined by g1 (x) = 1 + 2x
and
g2 : R → R defined by g2 (x) = 3 − x.
(c) Find the coordinates (g1 )S and (g2 )S of the vectors g1 and g2 with respect to the basis
S.
(d) Given that the set T = {g1 , g2 } ⊆ V is also a basis of V , state the coordinates (g1 )T
and (g2 )T of the vectors g1 and g2 with respect to the basis T .
20 Vector Spaces, 4 of 4
In this section, we develop the concepts of angle and length, so that these concepts can be
applied to a general vector space V where angles and lengths do not arise in a natural way.
For example, in a vector space of matrices, what is the angle between two matrices? Or, in
a vector space of functions, what is the length of a function? By imposing the mathematical
ideas of inner product and norm as additional structures on a general vector space V , the
elements of V can mimic the geometrical properties of vectors in real coordinate vector
spaces. In this way, a variety of theorems established for real coordinate vector spaces,
including theorems on orthogonality of vectors and theorems on lengths, can be imported
directly into a general vector space V, enriching V and any theory defined on it considerably.
20.1 Inner product
Recall that the scalar product — or dot product — of two vectors u and v in the real
coordinate vector space Rn ,
⟨u, v⟩ = ⟨(u1, u2, . . . , un)^T, (v1, v2, . . . , vn)^T⟩ = u1 v1 + u2 v2 + · · · + un vn ,
is equal to the product of the lengths of these vectors and the cosine of the angle between
them:

⟨u, v⟩ = ||u|| ||v|| cos(θ).
Below, we prove this result in the case of R2 , and then we use the properties of the scalar
product on R2 to motivate the introduction of what is known as an inner product on a
general vector space V .
Theorem 20.1.1 Let a, b ∈ R2 be non-zero vectors and let θ denote the angle between them. Define the scalar product ⟨a, b⟩ of the two vectors a, b by

⟨a, b⟩ = ⟨(a1, a2)^T, (b1, b2)^T⟩ = a1 b1 + a2 b2 .

Then ⟨a, b⟩ = ||a|| ||b|| cos(θ).

Proof The cosine rule states that c² = a² + b² − 2ab cos(θ), where a = ||a||, b = ||b|| and c = ||b − a||. That is,

||b − a||² = ||a||² + ||b||² − 2 ||a|| ||b|| cos(θ).
By expanding the scalar product on the left hand side and using properties which follow from the definition of this product, we obtain

||b − a||² = ⟨b − a, b − a⟩ = ⟨b, b⟩ − 2⟨a, b⟩ + ⟨a, a⟩ = ||a||² + ||b||² − 2⟨a, b⟩.

By comparing this equation with the cosine rule, we deduce that

⟨a, b⟩ = ||a|| ||b|| cos(θ),

which is the required result. □
As explained previously, we would like to develop the concepts of angle and length in such
a way so that they become applicable to a general vector space V . To this end, we observe
that the scalar product ⟨ , ⟩ defined on R2 can be regarded as a function from the set of pairs of vectors in R2 to the real numbers, i.e.,

⟨ , ⟩ : {(u, v) | u, v ∈ R2} → R,

and that this function has the following three properties:

(1) The function ⟨ , ⟩ is bilinear; that is, it is linear in each of its two arguments. For example, using linearity in the first argument, we have

⟨αa + βb, c⟩ = α⟨a, c⟩ + β⟨b, c⟩.

Bilinearity reflects the idea that the scalar product is compatible with the linear structure of the vector space.

(2) The function ⟨ , ⟩ is symmetric, in the sense that

⟨a, b⟩ = ⟨b, a⟩.

Symmetry reflects the idea that the angle between a and b is equal to the angle between b and a.

(3) The function ⟨ , ⟩ is positive, in the sense that

⟨a, a⟩ ≥ 0 for all a, with ⟨a, a⟩ = 0 if and only if a = 0.

Positivity reflects the idea that the square length ⟨a, a⟩ of any vector a is non-negative and that the only vector whose length is equal to 0 is the zero vector 0.
These three properties motivate the following definition of a generalised scalar product on
any real vector space V , referred to as an inner product.
Let V be a real vector space. An inner product on V is a function ⟨ , ⟩ from the set {(x, y) | x, y ∈ V} of pairs of vectors in V to the set R which satisfies the following properties:
(1) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all x, y, z ∈ V and all α, β ∈ R.

(2) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V .

(3) ⟨x, x⟩ ≥ 0 for all x ∈ V , and ⟨x, x⟩ = 0 if and only if x = 0, the zero vector of the vector space.
Remark 20.1.2 Property (1) is known as linearity on the left. Property (1) and the
symmetry property (2) imply linearity on the right and hence bilinearity.
A vector space equipped with an inner product is known as an inner product space.
20.2 Norm
Let V be an inner product space and let x be an element of V . Then the inner product ⟨ , ⟩ defines the norm ||x||, or length ||x||, of x by

||x|| = √⟨x, x⟩.
Note that, by the positivity of the inner product, only the zero vector 0 in V has zero
length. Also note that given any non-zero vector v in an inner product space V , then it is
a simple matter to create a unit vector u parallel to v. This is the vector

u = (1 / ||v||) v.
The process of constructing u from v is known as normalising v.
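Since the norm is defined entirely in terms of the inner product, a sketch of these definitions needs only a few lines. Here it is in Python with NumPy, using the standard dot product; any function satisfying properties (1)-(3) could be passed instead (the function names are ours):

import numpy as np

def norm(x, inner=np.dot):
    # ||x|| = sqrt(<x, x>)
    return np.sqrt(inner(x, x))

def normalise(v, inner=np.dot):
    # u = (1/||v||) v is the unit vector parallel to a non-zero v.
    return v / norm(v, inner)

v = np.array([3.0, 4.0])
print(norm(v))       # 5.0
print(normalise(v))  # [0.6 0.8], a unit vector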
A fundamental consequence of these definitions is the Cauchy-Schwarz inequality: for all x, y in an inner product space V ,

|⟨x, y⟩| ≤ ||x|| ||y||,

where the symbol on the left hand side is the absolute value of ⟨x, y⟩.
Proof Let x, y be any two vectors of V . For any real number α, we consider the vector
αx + y. Certainly, ||αx + y||² ≥ 0 for all α. But

||αx + y||² = ⟨αx + y, αx + y⟩ = α²||x||² + 2α⟨x, y⟩ + ||y||².

This quadratic in α is non-negative for every α ∈ R, so its discriminant cannot be positive:

4(⟨x, y⟩)² − 4||x||²||y||² ≤ 0,

or

(⟨x, y⟩)² ≤ ||x||² ||y||².

Taking the square root of each side, we obtain |⟨x, y⟩| ≤ ||x|| ||y||, as required. □

The Cauchy-Schwarz inequality allows us to define the angle θ between two non-zero vectors x and y of an inner product space V by

cos(θ) = ⟨x, y⟩ / (||x|| ||y||).
This definition will only make sense if we can show that this number cos(θ) is between −1
and 1. But this follows immediately from the Cauchy-Schwarz inequality, which can be
stated as

|⟨x, y⟩| / (||x|| ||y||) ≤ 1

for non-zero x and y.
The usefulness of this definition is in the concept of orthogonality.
Suppose that V is an inner product space. Then x, y ∈ V are said to be orthogonal if
and only if 9x, y: = 0. We write x ⊥ y to mean that x, y are orthogonal.
In particular, we obtain a generalised Pythagoras theorem: if x ⊥ y, then ||x + y||² = ||x||² + ||y||².

Proof We know that for any z, ||z||² = ⟨z, z⟩, simply from the definition of the norm. So,

||x + y||² = ⟨x + y, x + y⟩
          = ⟨x, x + y⟩ + ⟨y, x + y⟩
          = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
          = ||x||² + 2⟨x, y⟩ + ||y||²
          = ||x||² + ||y||²,

where the last line follows from the fact that, x, y being orthogonal, ⟨x, y⟩ = 0. □
Moreover, we also have the triangle inequality for norms:

||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V .

In the special case of the standard inner product on R2, this states the obvious fact that the length of one side of a triangle must be no more than the sum of the lengths of the other two sides.
Proof We have

||x + y||² = ⟨x + y, x + y⟩
          = ⟨x, x + y⟩ + ⟨y, x + y⟩
          = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
          = ||x||² + 2⟨x, y⟩ + ||y||²
          ≤ ||x||² + ||y||² + 2|⟨x, y⟩|
          ≤ ||x||² + ||y||² + 2||x|| ||y||
          = (||x|| + ||y||)²,

where the last inequality used is the Cauchy-Schwarz inequality. Thus, taking square roots, we have that ||x + y|| ≤ ||x|| + ||y||, as required. □
Theorem 20.5.3 Suppose that V is an inner product space and that the vectors v1, v2, . . . , vk ∈ V are pairwise orthogonal (meaning vi ⊥ vj for i ≠ j), and none is the zero vector. Then {v1, v2, . . . , vk} is a linearly independent set of vectors.
Proof Suppose that

α1 v1 + α2 v2 + · · · + αk vk = 0,

for some scalars α1, α2, . . . , αk. Fix any i ∈ {1, 2, . . . , k} and take the inner product of vi with both sides:

⟨vi, α1 v1 + · · · + αk vk⟩ = ⟨vi, 0⟩ = 0.
But

⟨vi, α1 v1 + · · · + αk vk⟩ = α1⟨vi, v1⟩ + · · · + αi⟨vi, vi⟩ + · · · + αk⟨vi, vk⟩.

Since ⟨vi, vj⟩ = 0 for i ≠ j, the right hand side is equal to αi⟨vi, vi⟩, which is equal to αi||vi||². So the equation

⟨vi, α1 v1 + · · · + αk vk⟩ = 0

reduces to

αi ||vi||² = 0.
Moreover, since vi ≠ 0, ||vi||² ≠ 0 and hence αi = 0. But i was any integer in the range 1 to k, so we deduce that

α1 = α2 = · · · = αk = 0,

as required. □
20.6 Orthonormality and the Gram-Schmidt process
A set of vectors {x1, x2, . . . , xk} in an inner product space V is said to be an orthonormal set if any two different vectors in the set are orthogonal and each vector is a unit vector; that is,

⟨xi, xj⟩ = 0 for all i ≠ j, and ||xi|| = 1 for each i.

The Gram-Schmidt process converts any linearly independent set {v1, v2, . . . , vk} in an inner product space V into an orthonormal set {u1, u2, . . . , uk} with the property that

• Lin{u1, u2, . . . , uk} = Lin{v1, v2, . . . , vk}.

A sketch of the process in code is given below.
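The sketch below implements the process in Python with NumPy for the standard inner product on R^n (the function name is ours; the course does not prescribe an implementation). Each new vector is stripped of its components along the previously constructed u's and then normalised.

import numpy as np

def gram_schmidt(vectors, inner=np.dot):
    # Orthonormalise a linearly independent list of vectors.
    basis = []
    for v in vectors:
        # Remove the components of v along the u's built so far...
        w = v - sum(inner(v, u) * u for u in basis)
        # ...then normalise what remains.
        basis.append(w / np.sqrt(inner(w, w)))
    return basis

# The vectors of Exercise 20.7.2 below:
u1, u2 = gram_schmidt([np.array([1.0, 2.0, 0.0]), np.array([1.0, 0.0, 1.0])])
print(np.dot(u1, u2))                  # ~0: orthogonal
print(np.dot(u1, u1), np.dot(u2, u2))  # ~1.0, ~1.0: unit vectors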
20.7 Exercises for self study

(b) Find a vector y orthogonal to the vector x = (1, 1)^T with respect to the inner product given above.
Exercise 20.7.2 Consider the vectors v1 = (1, 2, 0)^T and v2 = (1, 0, 1)^T. Apply the Gram-Schmidt process to the set {v1, v2} to obtain an orthonormal basis for Lin{v1, v2} with respect to the standard inner product on R3.
Exercise 20.7.3 Consider the set C[a, b] of all continuous functions defined on an interval
[a, b]. It can be shown that C[a, b] is a vector space under the standard operations of
pointwise addition and scalar multiplication of functions; that is,
(f + g)(x) := f (x) + g(x),
(λf )(x) := λf (x).
Show that the function ⟨ , ⟩ defined by

⟨f, g⟩ := ∫_a^b f(x) g(x) dx
is an inner product on the vector space C[a, b]; that is, show that 9 , : satisfies linearity on
the left, symmetry and positivity. You may use the fact that the product of two continuous
functions on [a, b] is also continuous and you may also use the fact that if a non-negative
continuous function h on [a, b] satisfies

∫_a^b h(x) dx = 0,
then h(x) = 0 for all x ∈ [a, b].
Exercise 20.7.4 Building on Exercise 20.7.3, consider the vector space L[−2, 2] of all
functions of the form a + bx defined on the interval [−2, 2]. It can be shown that the
functions
f1 : [−2, 2] → R defined by f1 (x) = 1
and
f2 : [−2, 2] → R defined by f2 (x) = x
form a basis B = {f1 , f2 } for L[−2, 2]. Apply the Gram-Schmidt procedure to find an
orthonormal basis C = {g1, g2} for L[−2, 2] with respect to the inner product given in Exercise 20.7.3; that is,

⟨f, g⟩ := ∫_{−2}^{2} f(x) g(x) dx.
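A computer algebra system can check the integrals that arise here. The SymPy sketch below evaluates the relevant inner products on [−2, 2]; since it reveals part of the answer, treat it as a way to verify your own working rather than as a substitute for it.

import sympy as sp

x = sp.symbols('x')

def inner(f, g, a=-2, b=2):
    # <f, g> := integral of f(x) g(x) over [a, b]
    return sp.integrate(f * g, (x, a, b))

f1, f2 = sp.Integer(1), x
print(inner(f1, f2))  # 0: f1 and f2 are already orthogonal on [-2, 2]
print(inner(f1, f1))  # 4: so ||f1|| = 2
print(inner(f2, f2))  # 16/3: so ||f2|| = 4/sqrt(3)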
20.8 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 10.1, 10.2 and 10.4 of our Algebra Textbook are relevant.
Written by:
Dr Ioannis Kouletsis