MA100 MT Course Pack 2020-21

MA100: Mathematical Methods (MT)


Contents
Preface

1 Orientation
1.1 General information
1.2 Teaching arrangements
1.3 Course materials
1.4 Current exam structure for MA100
1.5 Maple
1.6 Mathematical background
1.7 Syllabus

2 The Logical Framework
2.1 Definition, proposition, proof and related terminology
2.2 Truth tables, negations and compound propositions
2.3 Logical Equivalence
2.4 Predicates and quantifiers
2.5 Some methods of proof
2.6 Exercises for self study
2.7 Relevant sections from the textbooks

3 One-Variable Calculus, Part 1 of 7
3.1 Sets, subsets and intervals
3.2 Functions, domain, codomain and range
3.3 The set R², Cartesian equations and graphs
3.4 Surjective, injective and bijective functions
3.5 The derivative
3.6 Continuity and differentiability
3.7 Exercises for self study
3.8 Relevant sections from the textbooks

4 One-Variable Calculus, Part 2 of 7
4.1 Derivatives of elementary functions
4.2 Sum, product, quotient and chain rules
4.3 Higher order derivatives
4.4 Taylor polynomials and Taylor series
4.5 Exercises for self study
4.6 Relevant sections from the textbooks

5 One-Variable Calculus, Part 3 of 7
5.1 Increasing and decreasing functions
5.2 Local extrema and strict local extrema
5.3 Stationary points
5.4 Tests for classifying stationary points
5.5 Convex and concave functions
5.6 Inflection points
5.7 Exercises for self study
5.8 Relevant sections from the textbooks

6 One-Variable Calculus, Part 4 of 7
6.1 Asymptotes and graph sketching
6.2 Conics in standard position and orientation
6.3 Global extrema
6.4 Optimisation
6.5 Exercises for self study
6.6 Relevant sections from the textbooks

7 One-Variable Calculus, Part 5 of 7
7.1 The inverse function
7.2 Some elementary bijective functions and their inverses
7.3 The derivative of an inverse function
7.4 The local inverse of a non-bijective function
7.5 Exercises for self study
7.6 Relevant sections from the textbooks

8 One-Variable Calculus, Part 6 of 7
8.1 Indefinite and definite integrals
8.2 The fundamental theorem of calculus
8.3 Primitives of elementary functions
8.4 Integration by recognition
8.5 Integration by change of variable
8.6 Improper integrals
8.7 Exercises for self study
8.8 Relevant sections from the textbooks

9 One-Variable Calculus, Part 7 of 7
9.1 Integration by partial fractions
9.2 Integration by parts
9.3 Consumers’ and producers’ surplus
9.4 Exercises for self study
9.5 Relevant sections from the textbooks

10 Matrices, Part 1 of 3
10.1 Definitions, notation and terminology
10.2 Operations on matrices
10.3 The laws of matrix algebra
10.4 The inverse matrix and its properties
10.5 Powers of a matrix
10.6 Properties of the transpose of a matrix
10.7 Matrix equations and their solutions
10.8 Exercises for self study
10.9 Relevant sections from the textbooks

11 Matrices, Part 2 of 3
11.1 Solving systems of n linear equations for n unknowns
11.2 Elementary row operations
11.3 Elementary matrices and row-equivalence
11.4 Theorems on matrix invertibility
11.5 Inversion algorithm based on elementary row operations
11.6 Exercises for self study
11.7 Relevant sections from the textbooks

12 Matrices, Part 3 of 3
12.1 Minors, cofactors and the determinant
12.2 The properties of the determinant
12.3 Calculating determinants using row operations
12.4 Inverting a matrix using cofactors and the determinant
12.5 Cramer’s rule
12.6 Exercises for self study
12.7 Relevant sections from the textbooks

13 Developing Geometric Insight, 1 of 2
13.1 Visualising the set R² using position vectors
13.2 Visualising vector operations in R²
13.3 Lines in R²
13.4 Geometric and algebraic approaches to linear systems
13.5 Exercises for self study
13.6 Relevant sections from the textbooks

14 Developing Geometric Insight, 2 of 2
14.1 Visualising vectors and operations in R³
14.2 Planes in R³
14.3 Lines in R³
14.4 Geometric and algebraic approaches to linear systems
14.5 Exercises for self study
14.6 Relevant sections from the textbooks

15 Systems of Linear Equations, 1 of 2
15.1 Systems of m linear equations for n unknowns
15.2 The Gauss-Jordan method
15.3 Solution sets
15.4 Homogeneous systems
15.5 Exercises for self study
15.6 Relevant sections from the textbooks

16 Systems of Linear Equations, 2 of 2
16.1 The principle of linearity
16.2 The rank of a matrix and the main theorem revisited
16.3 Analysing the set of solutions of Ax = b
16.4 Exercises for self study
16.5 Relevant sections from the textbooks

17 Vector Spaces, 1 of 4
17.1 Definition of a vector space
17.2 Subspaces and the subspace criterion
17.3 Exercises for self study
17.4 Relevant sections from the textbooks

18 Vector Spaces, 2 of 4
18.1 Linear span
18.2 Linear independence and linear dependence
18.3 Exercises for self study
18.4 Relevant sections from the textbooks

19 Vector Spaces, 3 of 4
19.1 Theorems on linear span and linear independence
19.2 Basis and dimension
19.3 Coordinates with respect to a basis
19.4 Exercises for self study
19.5 Relevant sections from the textbooks

20 Vector Spaces, 4 of 4
20.1 Inner product
20.2 Norm
20.3 The Cauchy-Schwarz inequality
20.4 Angle and orthogonality
20.5 Pythagoras’ theorem and the triangle inequality
20.6 Orthonormality and the Gram-Schmidt process
20.7 Exercises for self study
20.8 Relevant sections from the textbooks

Preface
These lecture notes are intended as a self-contained study resource for the MA100 Mathe-
matical Methods course at the LSE. At the same time, they are designed to complement
the MA100 course texts, ‘Linear Algebra, Concepts and Methods’ by Martin Anthony and
Michele Harvey, and ‘Calculus, Concepts and Methods’ by Ken Binmore and Joan Davies.
I am grateful to Martin Anthony and Michele Harvey for allowing me to use some materials
from their ‘Linear Algebra, Concepts and Methods’ textbook and to Michele Harvey for
commenting on a draft of the Calculus part of these lecture notes. I am also grateful to Siri
Kouletsis for her invaluable help with typing and editing the manuscript and for various
improvements to its content.

1 Orientation
1.1 General information

Lecturer:
Ioannis Kouletsis

Class teachers:
Abdulla Al-Othman, Phil Chan, Sam Fendrich, Jenjira Jaimunk, Ioannis Kouletsis, Siri
Kouletsis, Maria Molina-Domene, Kamila Nowakowicz, Ruilan Wang, Aled Williams, Michael
Yiasemides and George Zouros.

Mathematics department main office:


COL 4.01

1.2 Teaching arrangements

The lecture notes:


There is a set of lecture notes for the Michaelmas term (MT) and another such set for the
Lent term (LT). The first set contains sections 1 to 20 and covers MT weeks 1 to 10. The
second set contains sections 21 to 40 and covers LT weeks 1 to 10.
Section 1 (the current section) is an orientation section. It provides information about the
teaching arrangements and course materials and explains the role of each of the course
components: the lecture notes, the lecture videos, the lecture Q&A session, the course
forums, the classes, the class forums, the office hours, the Mathematics Support Centre
and the MA100 Moodle webpage (which is the web portal of the course). Sections 2 to 39
cover the course syllabus and section 40 is devoted to the summer exam.
Each set of lecture notes is posted in PDF form on the MA100 Moodle webpage at the
beginning of the relevant term.
The lecture notes are fully annotated and very detailed. They contain illustrations, exam-
ples, remarks, ‘exercises for self study’ (whose solutions are available on MA100 Moodle)
and references to the course texts. Together with the course texts, the lecture notes are
intended as your primary source of reading.

The lecture videos:


The lecture videos contain interactive video presentations structured around solving exer-
cises. They complement the lecture notes.
There are several such videos for each section of the lecture notes, all posted on MA100
Moodle under the relevant week.

In a typical lecture video presentation, you are given a few minutes to attempt a part of
an exercise based on the corresponding section of the lecture notes and then you compare
your solutions to mine, presented immediately afterwards.

Engaging with the lecture notes and videos:


The lecture notes and lecture videos are intended to work together as a whole. It is
important that you watch the videos only after having read the relevant section of the
lecture notes and that you participate in each video presentation as if it were taking place
in real time. There is little benefit in simply watching the videos; you need to attempt the
exercises presented there.
The lecture videos have been designed to facilitate your introduction to the (sometimes
complex) concepts presented in the lecture notes. Your active engagement with these video
presentations will enhance your understanding of the lecture notes, help you prepare for
your classes and provide you with good training for answering unseen questions in exams.
In addition, it allows you to gain an accurate sense of the pace of the course.
The material is not excessively difficult but its volume should not be underestimated.
Engaging with the lecture notes and videos on a weekly basis keeps you in tune with the
course and protects you from falling behind its pace.

The lecture Q&A session:


There is one lecture Q&A session per week, starting in week 2 and finishing in week 11 of
each term. This session takes place via Zoom on Tuesday at 2pm. The Zoom link will be
posted on MA100 Moodle at the end of MT week 1.

The course forums:


There are two course forums that can be accessed via the MA100 Moodle webpage. One of
them is for general announcements I will be making throughout the year. The other forum
is intended for any questions you may have about the course, and your fellow students may
participate in the discussion.

The classes:
These are weekly one-hour interactive sessions where you strengthen your understanding
of the course material with the help of your class teacher. You have a class per week,
starting in week 2 and finishing in week 11 of each term. Some of these classes take place
on campus and some are conducted online, but they all have the same structure and cover
the same material.
Each student is assigned to a class group of about fifteen students who usually follow
the same degree programme. Your personal timetable indicates the day and time of your
class along with the room number. Please note that class attendance is compulsory and is
recorded by your class teacher.

The class forums:
There is a forum dedicated to each class group where you can post questions about the self-
study exercises and related theory (in preparation for your class) as well as clarifications
about the class exercises and other class materials (after your class). Your classmates can
answer your questions, and your class teacher will monitor the forum and intervene as
needed.

In-class practice exercise:


Each class contains an open-book in-class practice session, initiated at the end of the class.
Your class teacher will give you an ‘in-class practice exercise’ to solve and you can use the
lecture notes and your own notes as needed, but you should try to work individually. Your
class teacher will provide general hints as to how to approach the given exercise and also
provide individual help if needed.
The idea behind this exercise is to give you the opportunity to evaluate on a weekly basis
where you stand in relation to the demands and pace of the course. Each in-class practice
exercise will help you appreciate the level of preparation that is needed in order to reach
a solid understanding of the material and also give you valuable experience when it comes
to working under a time constraint.
Your solution should be submitted electronically to your teacher via MA100 Moodle by
the end of the day on which your class takes place. Information regarding how to submit
your in-class practice exercise will be provided at the end of MT week 1. Your class teacher
will receive your work electronically and give you collective feedback on your work via your
class forum, emphasising key ideas and drawing your attention to common mistakes and
misconceptions.

Class structure:
A typical class consists of three parts:
In the first part of the class, your class teacher will give you a ‘consolidation exercise’
in order to ensure that the main topics covered in the previous week’s class have been
understood. Your class teacher will provide the solution to this exercise afterwards. Please
feel free to ask your class teacher for further clarifications about the underlying concepts
and methods if needed.
In the second part of the class, the focus shifts to the most recent material; that is,
the material corresponding to the previous week’s lectures notes and videos. Your class
teacher will typically answer questions you may have about the theory, or expand on a
topic, or show you an application, and will then guide you through a set of ‘current topic
exercises’. As with the ‘consolidation exercise’, your class teacher will provide solutions to
these exercises afterwards.
Finally, in the third part of the class, you will initiate the ‘in-class practice exercise’.
As already mentioned, you can complete this exercise after class and should submit your
solution to your class teacher electronically. Please note that the ‘in-class practice exercise’
will always be based on the topics practised during the second part of the class.

Solutions to the class materials:
Solutions to the ‘consolidation exercise’, ‘current topic exercises’ and ‘in-class practice
exercise’ conducted in class during week n will be posted on MA100 Moodle at the end
of week n. I strongly recommend that you go through these solutions before the following
week’s class so that you do not leave any gaps in your understanding.

Assignments with exam-style questions:


During the year, you will be given four ‘assignments with exam-style questions’ to attempt
at home and submit electronically to your class teacher. Your class teacher will mark them
and return them to you electronically with a numerical mark and feedback. There will
be two such assignments in the Michaelmas term and another two in the Lent term. The
submission deadlines will be in weeks 5 and 9 of each term and solutions will be posted on
the MA100 Moodle webpage at the end of those weeks. Each assignment will be published
on MA100 Moodle two weeks before its submission deadline in order to give you enough
time to complete it.

Preparing for your first class:


For your first class in week 2 of the Michaelmas term you will need to prepare section 2 of
the lecture notes. Ideally, this should include:
• studying section 2 of the lecture notes,
• actively engaging with the lecture videos of section 2,
• attempting the ‘exercises for self study’ found in section 2 of your lecture notes,
• consulting the solutions to these exercises posted on MA100 Moodle, and
• studying any relevant sections from our textbooks.
Optional exercises can be found in the relevant sections of our course textbooks (please see
the references at the end of each section of the lecture notes).

Preparing for subsequent classes:


For each subsequent class, you should prepare both sections of the previous week’s lecture
notes, ideally in the manner recommended above. In addition, you should consult the
solutions to the ‘consolidation exercise’, ‘current topic exercises’ and ‘in-class practice
exercise’ from the previous week’s class, as recommended in a previous paragraph. It
is important that you work through all these materials to the best of your ability on a
weekly basis.

Some comments about your classes and the course in general:


I would like to emphasise that the ‘in-class practice exercises’ and all other exercises con-
ducted in class do not count towards your final MA100 grade. You should actively engage
with all the class components but you should not feel any stress or fear of evaluation. Your
class is the place to ask questions, practise, make mistakes and learn; it is not the place to
be examined.

The class format and the class forums provide an opportunity for establishing stronger
connections with your class teacher and fellow students. Moreover, your active engagement
with all the class components allows the School to get to know you. A record of your class
attendance and weekly submission of work is kept throughout your studies and, along with
your exam results, forms the basis for future academic references.
Please keep in mind that some parts of the ‘consolidation exercises’, ‘current topic exercises’
and ‘in-class practice exercises’ are designed to be challenging, so do not worry too much
if you are not able to solve them all on your first attempt. This is natural and part of your
learning process. However, after the solutions to these exercises are discussed in class and
posted on the MA100 Moodle webpage, make sure that you fully understand them; that
is, solve the parts you did not manage to complete the first time around. This comment
applies to the ‘exercises for self study’ and all other exercises covered in the course as well;
it is not enough to read their solutions on MA100 Moodle – you need to keep attempting
these exercises until you have mastered the applicable theory and methods.
The pace of the course is manageable. However, if you fall behind the pace, the volume
of the unread material can progressively become overwhelming, especially since you have
other courses to study as well. It is therefore imperative to adopt good study habits from
the very beginning of the academic year; that is, from week 1. Do not make the common
mistake of thinking that a four-week break before the exams provides enough time to cover
unread material. At that point, you should only be refreshing material that you have
already practised thoroughly during the term.

Office hours:
You can find the office hours of the MA100 teaching staff posted on the Mathematics
Department site: https://www.lse.ac.uk/Mathematics

Mathematics Support Centre:


The Mathematics Support Centre offers a (physical or virtual) space where teachers and lec-
turers from the Mathematics Department provide help with first-year mathematics courses;
among them, MA100 and MA103. Please check the Mathematics Department website for
the current term’s times and venues: https://www.lse.ac.uk/Mathematics

Email communication:
For all other issues related to the course, you can email me personally at [email protected].

1.3 Course materials


Lecture notes: Each set of lecture notes will be posted in PDF format on the MA100
Moodle webpage at the beginning of the relevant term.
Lecture videos: These videos will be posted on the MA100 Moodle webpage under the
relevant week.
The course texts: You are expected to have a copy of each of the following two books.
The lecture notes contain detailed references to them.

• Martin Anthony and Michele Harvey, Linear Algebra, Concepts and Methods, Cambridge
University Press,
• Ken Binmore and Joan Davies, Calculus, Concepts and Methods, Cambridge University
Press.
Reference texts: The following books provide additional reading and can be found in
the library:
• H. Anton, Elementary Linear Algebra (or the Applications Version of this book by Anton
and Rorres),
• D. Lay, Linear Algebra and its Applications,
• S. L. Salas and E. Hille, Calculus, One and Several Variables,
• Schaum Outline Series: Mathematics for Economists; Linear Algebra; Advanced Calculus;
Differential and Integral Calculus.
Solutions to the ‘exercises for self study’: These refer to the exercises found at the
end of each section of the lecture notes. Each set of solutions will be posted on the MA100
Moodle webpage under the relevant week. The solutions are very detailed; in essence,
they re-teach the lecture material from a practical perspective in order to ensure that you
appreciate the links that exist between the theory and its applications.
Solutions to the ‘consolidation exercises’, the ‘current topic exercises’ and the
‘in-class practice exercises’: These exercises and their solutions will be posted on the
MA100 Moodle webpage at the end of the week in which they are introduced in class.
Solutions to the ‘assignments with exam-style questions’: The solutions to these
assignments will be posted on the MA100 Moodle webpage in weeks 5 and 9 of each term;
that is, just after the corresponding submission deadlines. The assignments themselves will
be posted a couple of weeks earlier; that is, in weeks 3 and 7 of each term.
Solutions to past exams: Past examination papers from the last three years can be found
on the MA100 Moodle webpage. Their solutions will be released on the MA100 Moodle
webpage in due course. Older examination papers, without solutions, can be found in the
library. Please note that the MA100 exam structure changed in the 2015/16 academic year.
The 2014/15 paper followed the old exam structure which was based on a single exam in
the summer term. The exam structure introduced in 2015/16 – which is also our current
exam structure and is described in detail in section 1.4 below – is based on two exams, one
held in January and one held in May.
Regarding past exams: Past examination papers are useful for familiarising yourself
with the style of questioning, but I would advise against using them for the purpose of
deducing which topics are likely to appear in the forthcoming exams. The key to success in
MA100 is having understood the entire course material and having mastered the examples
from the lecture notes, the exercises covered in the lectures and workshops, the ‘exercises
for self study’, the ‘consolidation exercises’, the ‘current topic exercises’ and the ‘in-class
practice exercises’.

1.4 Current exam structure for MA100
The 2020/21 exam structure for MA100 consists of two exams. The first exam counts 25%
towards your final MA100 grade and will take place in week 0 of the Lent term, in January
2021. It will examine material covered in lectures 2-16. The second exam counts 75%
towards your final MA100 grade. This will take place in the summer term, in May 2021,
and will examine the entire material of the course with the emphasis being placed on the
material covered in lectures 17-39. The earlier material is examined only indirectly in the
second exam in the sense that it forms the foundations on which the later material rests.
I will give you further details as the course progresses.

1.5 Maple
Maple is a computer program which is widely used for analysing, visualising and solving
mathematical problems. You can try the Maple tutorial on the MA100 Moodle webpage
on any of the computer facilities offered by the School. The relevant link is posted on
the MA100 Moodle webpage under ‘Maple’. Note that Maple is not part of the MA100
examinable material.

1.6 Mathematical background


You may look through the review booklets ‘An Algebra Refresher’ and ‘A Calculus Re-
fresher’ if you feel the need to do so. The relevant links are posted on the MA100 Moodle
webpage under ‘Mathematical Background’. An exercise set based on material with which
you should already be familiar can also be found in the same section. Solutions to this set
and a Maple file containing all the relevant graphs are also there. In your spare time, you
can work through these exercises and consult their solutions. If you have difficulties with
some part of this material, you can look it up in the Refreshers using the relevant links.
Alternatively, you can make a note of any background you are missing and visit my office
hours or the office hours of any MA100 class teacher.

1.7 Syllabus
By the end of the academic year we will have covered a range of mathematical topics in
Calculus and Linear Algebra and some of their applications to Economics and Statistics.
In the Michaelmas term, we will introduce the logical framework and methods of proof, one-
variable calculus, Taylor polynomials, classification of stationary points, convexity, conics,
general graph sketching, global optimisation, constrained optimisation, integration tech-
niques, matrices, determinants, Gaussian elimination, linear Cartesian geometry, Cartesian
and vector parametric descriptions of flats, systems of linear equations, real coordinate vec-
tor spaces, general vector spaces, vector subspaces, linear independence, linear span, basis,
coordinates and dimension.
In the Lent term, we will start by discussing some special kinds of vector spaces required
for analysing linear transformations and then proceed to transition matrices, linear trans-
formations, similarity, eigenvalues and eigenvectors, algebraic and geometric multiplicities,

diagonalisation and orthogonal diagonalisation, quadratic forms, multi-variate calculus,
surfaces, contours, vertical sections, gradients, directional derivatives, constrained optimi-
sation including Lagrange’s method, vector-valued functions, the general chain rule, dif-
ferential and difference equations including some partial differential equations and, finally,
linear systems of differential and difference equations.
This brings us to the end of the introductory part of these lecture notes. Next, we get
started on our syllabus with a section on mathematical logic. This section is necessary for
understanding some methods of proof that we will encounter throughout the course.

2 The Logical Framework


2.1 Definition, proposition, proof and related terminology
The following proposition is typical of the ones we often encounter in mathematics:

14 is an even number.

A proposition is a mathematical statement that is either true or false. Regarding the
above proposition, we are not yet in a position to decide whether it is true or false because
we have not defined what an even number is. We need to introduce a definition. A
definition is a precise and unambiguous description of a mathematical term: a number
k is called even only if k = 2n for some integer n. Note that the concept of an integer is
assumed known - otherwise, we would have to define it using a more fundamental concept.
We can now provide a proof that the above proposition is true. Indeed, we have 14 = 2 × 7
where 7 is an integer, so 14 is an even number. The result ‘14 is an even number’ is called a
theorem. A theorem is simply a valid mathematical result and a proof is an explanation
as to why this result is valid. Sometimes a theorem may be referred to as a lemma; this
is a preliminary result whose main role is to help us prove a forthcoming important or
general theorem. A theorem may also be referred to as a corollary; this is a result whose
proof relies heavily on a preceding important or general theorem.
Example 2.1.1 The proposition ‘13 is an even number’ is false so it is not a theorem.

2.2 Truth tables, negations and compound propositions


By its very definition, a proposition p is either true or false. If p is true, we assign to it
the truth value T . If it is false, we assign to it the truth value F . The simplest truth
table is displayed below. It consists of a single column that captures the two possibilities
associated with the truth value of a proposition p:

p
T
F

The negation of a proposition p is denoted by ¬p and is simply the proposition ‘not p’.
Its truth table follows:

p ¬p
T F
F T

Example 2.2.1 Let p be the proposition ‘4 is an even number’. Then ¬p is the proposition
‘4 is not an even number’. In this case, p is true and ¬p is false. On the other hand, let
p be the proposition ‘π is an even number’. Then ¬p is the proposition ‘π is not an even
number’. In this case, p is false and ¬p is true.
A compound proposition is a proposition that is built up from simpler propositions
using linking words such as and, or, if-then, if-and-only-if. The rest of this subsection
is concerned with some fundamental compound propositions and their truth tables.
The conjunction of two propositions p and q is denoted by p ∧ q and is the proposition
‘p and q’. The conjunction p ∧ q is true only if the propositions p and q are both true. Its
truth table is given below:

p q p∧q
T T T
T F F
F T F
F F F

Example 2.2.2 Let p be the proposition ‘11 is a multiple of 2’ and q be the proposition
‘16 is a multiple of 2’. The conjunction p ∧ q amounts to the proposition ‘11 and 16
are multiples of 2’. In this case, the conjunction p ∧ q is false since its first constituent
proposition is false.
The disjunction of two propositions p and q is denoted by p ∨ q and is the proposition ‘p
or q’. The disjunction p ∨ q is false only if the propositions p and q are both false. This
idea is reflected in the following truth table:

p q p∨q
T T T
T F T
F T T
F F F

Example 2.2.3 Let p be the proposition ‘14 is a multiple of 2’ and q be the proposition
‘14 is a multiple of 3’. The disjunction p ∨ q amounts to the proposition ‘14 is a multiple of
2 or 3’. In this case, the disjunction p ∨ q is true because its first constituent proposition
is true.

The conditional proposition denoted by p ⇒ q is the proposition ‘if p, then q’, also
referred to as ‘p implies q’. The conditional proposition is false only if p is true and q is
false. The truth table for p ⇒ q is given below:

p q p⇒q
T T T
T F F
F T T
F F T

Remark 2.2.4 The last two rows of the truth table for p ⇒ q express the idea that if the
premise p on which the conditional proposition p ⇒ q rests is false, the proposition itself
is a true proposition.
A few examples of conditional propositions are presented below:
Example 2.2.5 Let p be the proposition ‘16 is a perfect square’ and q be the proposition
‘64 is a perfect square’. Since both propositions p and q are true, the proposition p ⇒ q is
true.
Example 2.2.6 Let p be the proposition ‘13 is a perfect square’ and q be the proposition
‘64 is a perfect square’. Since p is false, the proposition p ⇒ q is true.
Example 2.2.7 Let p be the proposition ‘3 is an even number’ and q be the proposition
‘π is an even number’. Since p is false, the proposition p ⇒ q is true.
Example 2.2.8 Let p be the proposition ‘5 is an even number’ and q be the proposition
‘5 is not an even number’. Again, since p is false, the proposition p ⇒ q is true. There is
no contradiction in the proposition ‘if 5 is an even number, then 5 is not an even number’
because the premise on which it rests is false.
Example 2.2.9 Let p be the proposition ‘5 is not an even number’ and q be the proposition
‘5 is an even number’. Now, p is true and q is false, so the proposition p ⇒ q is false.
Example 2.2.10 Let p be the proposition ‘16 is an even number’ and q be the propo-
sition ‘the sum of the angles of a triangle is 180 degrees’. Since both p and q are true,
the proposition p ⇒ q is true. Note that the propositions p and q seem to be logically
disconnected here.
Example 2.2.11 Let p be the proposition ‘13 is an even number’ and q be the proposition
‘the sum of the angles of a triangle is 32 degrees’. Since p is false, the proposition p ⇒ q
is true.
Remark 2.2.12 A theorem of the form ‘p implies q’ is usually associated with a deductive
process for inferring the conclusion q from the assumed truth of the premise p. One may
wonder how the conditional proposition p ⇒ q can be relevant in capturing this idea. After
all, the truth value of the conditional p ⇒ q is entirely determined by the truth values of its
constituent propositions p and q. Whether or not we can reach q from p through a series of
mathematical deductions is not relevant for establishing the truth value of the conditional
p ⇒ q. In fact, in Examples 2.2.10 and 2.2.11, we cannot even imagine connecting the
propositions p and q. How can the conditional p ⇒ q be the right mathematical object to
represent the process of inferring q from p? The answer to this question lies in the fact that
valid mathematical reasoning used in a proof of the form ‘p implies q’ cannot produce a
false proposition q by assuming the truth of a proposition p that is actually true. Any such
reasoning can only produce (i) a true proposition q by assuming the truth of a proposition
p that is actually true, (ii) a true proposition q by assuming the truth of a proposition p
that is actually false or (iii) a false proposition q by assuming the truth of a proposition
p that is actually false. In each of these cases, T ⇒ T , F ⇒ T or F ⇒ F , the validity of
the deductive argument and hence the validity of the resulting proposition ‘p implies q’ is
indeed captured by the truth value T of the conditional p ⇒ q.

Remark 2.2.13 The fact that valid mathematical reasoning can never lead to a propo-
sition of the form ‘T implies F ’ provides the logical foundations for the so-called ‘proof by
contradiction’ which we will discuss among other methods of proof in subsection 2.5.
Example 2.2.14 Referring to Example 2.2.8 and Remark 2.2.12, let us construct a valid
mathematical argument that establishes the truth of the proposition ‘if 5 is an even number,
then 5 is not an even number’ in agreement with the truth value T of the corresponding
conditional. Note that we need to assume the truth of the premise ‘5 is an even number’
and infer the conclusion ‘5 is not an even number’.
Indeed, if 5 is an even number, we have that 5 = 2k for some integer k. Then, since
5 = 2(5) − 5, we can write that 5 = 2(2k) − 5 which implies that 5 = 4k − 6 + 1. Realising
that 4k − 6 is a multiple of 2, we obtain the equation 5 = 2(2k − 3) + 1 where 2k − 3 is an
integer. Thus, we deduce that 5 is not an even number. Note that this is an illustration of
the case where we produce a true proposition by assuming the truth of a proposition that
is actually false.
The biconditional proposition denoted by p ⇔ q is the compound proposition ‘p if and
only if q’. Its truth table is given below:

p q p⇔q
T T T
T F F
F T F
F F T

Remark 2.2.15 The biconditional proposition p ⇔ q may also be referred to as ‘p implies
q and q implies p’. This is because it is equivalent to the conjunction of the conditional
propositions p ⇒ q and q ⇒ p. We will establish this result in subsection 2.3 where we
will introduce the notion of logical equivalence of two propositions.
Example 2.2.16 Let p be the proposition ‘13 is an even number’ and q be the proposition
‘17 is an even number’. Both p and q are false so the biconditional proposition p ⇔ q is
true.
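
As an aside (and not part of the examinable material), truth tables like the ones above can
be generated mechanically on a computer. The following minimal Python sketch encodes
each connective as a function on the truth values True and False; the conditional is encoded
as ‘(not p) or q’, a form whose logical equivalence to p ⇒ q is established in Example 2.3.4
below.

from itertools import product

# Each connective is a function of the truth values of p and q.
connectives = {
    "p AND q": lambda p, q: p and q,
    "p OR q":  lambda p, q: p or q,
    "p => q":  lambda p, q: (not p) or q,  # false only when p is T and q is F
    "p <=> q": lambda p, q: p == q,        # true exactly when p and q agree
}

# Print the four-row truth table of each connective.
for name, f in connectives.items():
    print(name)
    for p, q in product([True, False], repeat=2):
        print(" ", p, q, f(p, q))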
The converse of the conditional proposition p ⇒ q is the conditional proposition q ⇒ p.
For comparison, the truth tables of these two propositions are displayed together below:

p q p⇒q q⇒p
T T T T
T F F T
F T T F
F F T T

Example 2.2.17 Let p be the proposition ‘4 is even’ and q be the proposition ‘5 is even’.
The converse of the conditional proposition ‘if 4 is even, then 5 is even’ is the conditional
proposition ‘if 5 is even, then 4 is even’. Here, p is true and q is false, so the conditional
proposition p ⇒ q is false and its converse proposition q ⇒ p is true.
The contrapositive of the conditional proposition p ⇒ q is the conditional proposition
(¬q) ⇒ (¬p). The truth tables for these propositions are displayed together below:

p q ¬p ¬q p⇒q (¬q) ⇒ (¬p)
T T F F T T
T F F T F F
F T T F T T
F F T T T T

Remark 2.2.18 Note that the truth table of the proposition p ⇒ q is identical to that
of its contrapositive proposition (¬q) ⇒ (¬p).
Example 2.2.19 Let p be the proposition ‘4 is even’ and q be the proposition ‘4 + 1 is
odd’. The contrapositive of the conditional proposition ‘if 4 is even, then 4 + 1 is odd’ is
the conditional proposition ‘if 4 + 1 is not odd, then 4 is not even’. Here, both p and q
are true and hence their negations ¬p and ¬q are both false. The conditional proposition
p ⇒ q and its contrapositive proposition (¬q) ⇒ (¬p) are both true.
A final note on terminology: The conditional proposition ‘p ⇒ q’ is commonly known as
‘p is sufficient for q’ or, equivalently, as ‘q is necessary for p’. Indeed, provided that the
conditional proposition p ⇒ q is true, it must have one of the forms ‘T ⇒ T ’, ‘F ⇒ T ’ or
‘F ⇒ F ’. Hence, it is sufficient for p to be true in order for q to be true. However, it is
not necessary for p to be true in order for q to be true, because q can be true even if p is
false. On the other hand, it is necessary for q to be true in order for p to be true because
if this is not the case, then p has to be false. Also note that it is not sufficient for q to be
true in order for p to be true, because it may happen that q is true and p is false.
For similar reasons, the biconditional proposition p ⇔ q is known as ‘p is necessary and
sufficient for q’ or, equivalently, as ‘q is necessary and sufficient for p’.

2.3 Logical Equivalence


Consider a proposition p and its double negation ¬(¬p). Regardless of the truth value
of the proposition p, the propositions p and ¬(¬p) have the same truth value. This is
confirmed by the truth table below:

p ¬p ¬(¬p)
T F T
F T F

In general, two propositions A and B that are built up from a single proposition p through
some logical operations such as ¬, ∧, ∨, ⇒, ⇔ are called logically equivalent if they
always have the same truth value; that is, for every truth value of p, the truth value of A
and the truth value of B are the same.
Remark 2.3.1 In the above example, the role of A is played by p and the role of B is
played by ¬(¬p). Note that both A and B are built up from p.
We can extend this idea and apply it to propositions A and B that are built up from two
propositions p and q.
Two propositions A and B that are built up from two propositions p and q are called
logically equivalent if they always have the same truth value; that is, for every pair of
truth values of p and q, the truth value of A and the truth value of B are the same.

Remark 2.3.2 Looking at the truth table preceding Remark 2.2.18, we realise that the
proposition A given by p ⇒ q and its contrapositive proposition B given by (¬q) ⇒ (¬p)
are logically equivalent. This fact can be very useful in some proofs as we will see in
subsection 2.5.
Example 2.3.3 In order to show that the biconditional proposition p ⇔ q is logically
equivalent to the conjunction of the conditional propositions p ⇒ q and q ⇒ p we construct
the following truth table and focus on its last two columns:

p q p⇒q q⇒p p⇔q (p ⇒ q) ∧ (q ⇒ p)
T T T T T T
T F F T F F
F T T F F F
F F T T T T

Example 2.3.4 Similarly, we show that the conditional proposition p ⇒ q is logically
equivalent to the proposition (¬p) ∨ q. The relevant truth table is given below:

p q ¬p p⇒q (¬p) ∨ q
T T F T T
T F F F F
F T T T T
F F T T T

Example 2.3.5 We also show that the proposition ¬(p ∨ q) is logically equivalent to the
proposition (¬p) ∧ (¬q):

p q ¬p ¬q p∨q ¬(p ∨ q) (¬p) ∧ (¬q)
T T F F T F F
T F F T T F F
F T T F T F F
F F T T F T T

Remark 2.3.6 Similarly, we can show that the proposition ¬(p ∧ q) is logically equivalent
to the proposition (¬p) ∨ (¬q). The equivalence of these two propositions and the equiv-
alence of the propositions given in Example 2.3.5 are collectively known as de Morgan’s
Laws.
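
Equivalences of this kind can also be verified exhaustively on a computer, since two propo-
sitions built from p and q only need to be compared on four pairs of truth values. The
following minimal Python sketch (illustrative only, and not part of the examinable mate-
rial) confirms both of de Morgan’s laws, the contrapositive equivalence of Remark 2.3.2 and
the biconditional equivalence of Example 2.3.3.

from itertools import product

# Two propositions built from p and q are logically equivalent exactly
# when they agree on all four pairs of truth values.
def equivalent(a, b):
    return all(a(p, q) == b(p, q)
               for p, q in product([True, False], repeat=2))

def implies(p, q):  # p => q, encoded as (not p) or q (Example 2.3.4)
    return (not p) or q

# De Morgan's laws (Example 2.3.5 and Remark 2.3.6):
print(equivalent(lambda p, q: not (p or q),
                 lambda p, q: (not p) and (not q)))             # True
print(equivalent(lambda p, q: not (p and q),
                 lambda p, q: (not p) or (not q)))              # True

# The contrapositive (Remark 2.3.2):
print(equivalent(lambda p, q: implies(p, q),
                 lambda p, q: implies(not q, not p)))           # True

# The biconditional (Example 2.3.3):
print(equivalent(lambda p, q: p == q,
                 lambda p, q: implies(p, q) and implies(q, p))) # True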

2.4 Predicates and quantifiers


A predicate is a mathematical statement that depends on one or more variables. A simple
example is given by the statement

n is an even number.

This is a predicate that depends on a single variable n. For simplicity, let us assume that
the variable n is an integer and let us denote this predicate by P (n). We can now construct

propositions such as P (1), P (4) and P (21) whose truth values can be determined. P (1) is
false, P (4) is true and P (21) is false.

Remark 2.4.1 In general, we cannot assign a truth value to a predicate P (n) before
we specify the value of the variable n. However, this does not exclude the possibility
of constructing predicates whose truth values can be determined irrespective of n. For
example, let n be an integer, and consider the conditional statement ‘if n is even, then n+1
is odd’. This is a predicate P (n) whose truth value is T for all n. Indeed, for every even
value of n, P (n) becomes a proposition of the form ‘T ⇒ T ’, which is true. For every odd
value of n, P (n) becomes a proposition of the form ‘F ⇒ F ’, which is also true. Hence,
for all n, P (n) is true. Similarly, the conjunction ‘n is even and n is odd’ is a predicate
P (n) whose truth value can be determined irrespective of n. In this case, P (n) is false
for all n.
An existential statement is a statement which expresses the existence of at least one
object of a certain kind which has a particular property. Examples of existential statements
are given below:

there exists a real number n such that n² = 17,


there exists an integer m such that 3 < m < 12.

An existential statement can be written in the form ∃n[P (n)]. The symbol ∃ is called the
existential quantifier and simply means ‘there exists’. The property P (n) satisfied by n
is enclosed in the square brackets.
Example 2.4.2 The existential statement ‘there exists a real number n such that n² =
17’ can be written in the form ‘∃ real number n[n² = 17]’. Similarly, the existential
statement ‘there exists an integer m such that 3 < m < 12’ can be written in the form
‘∃ integer m[3 < m < 12]’.
Remark 2.4.3 The existential statements given in Example 2.4.2 are both true. In order
to prove them, we simply need to find at least one real number n and at least one integer
m with the required properties. In general, in order to prove that an existential statement
is true we just need to find an example.
A universal statement is a statement which expresses the fact that all objects of a certain
kind have a particular property. Examples of universal statements are the following:

for all integers n, 2n is an even number,


for all integers m, if m² + m is even, then m is even.

A universal statement can be written in the form ∀n[P (n)]. The symbol ∀ is called the
universal quantifier and simply means ‘for all’. The property P (n) satisfied by n is
enclosed in the square brackets.
Example 2.4.4 The universal statement ‘for all integers n, 2n is an even number’ can be
written in the form ‘∀ integers n[2n is even]’. Similarly, the universal statement ‘for all
integers m, if m² + m is even, then m is even’ can be written in the form ‘∀ integers m[(m² +
m is even) ⇒ (m is even)]’.

Remark 2.4.5 The first statement in Example 2.4.4 is true and is straightforward to
prove by considering separately the case where n is odd and the case where n is even. The
second statement in Example 2.4.4 is false. In order to establish that it is false we need
to find a counterexample. That is, we need to find an integer m such that m² + m is even
and m is odd, thereby obtaining a conditional proposition of the form T ⇒ F which is
false. We can easily see that m = 1 provides such a counterexample. In general, in order
to prove that a universal statement is false we just need to find a counterexample.
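
Searching for a counterexample is easy to mechanise. The following illustrative Python
sketch (not part of the examinable material) scans the integers from 0 to 10 for an odd m
with m² + m even and immediately finds m = 1.

# A counterexample to 'for all integers m, if m**2 + m is even, then
# m is even' is any odd m for which m**2 + m is even.
for m in range(0, 11):
    if (m**2 + m) % 2 == 0 and m % 2 == 1:
        print("counterexample: m =", m)  # the first hit is m = 1
        break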
In preparation for some methods of proof that we will discuss in subsection 2.5 and also
later in the course, let us write down the negation of an existential statement as well as
that of a universal statement:
The negation of the existential statement ∃n[P (n)] is the universal statement
∀n[¬P (n)].

The negation of the universal statement ∀n[P (n)] is the existential statement
∃n[¬P (n)].
Remark 2.4.6 Equivalently, we can say that ¬(∃n[P (n)]) is logically equivalent to
∀n[¬P (n)] and that ¬(∀n[P (n)]) is logically equivalent to ∃n[¬P (n)].

Example 2.4.7 Let n be an integer. The negation of the existential statement ‘there
exists an n such that n² = 7’ is the universal statement ‘for all n, n² ≠ 7’.
Example 2.4.8 Let n be a natural number. The negation of the universal statement ‘for
all n, n² ≥ n’ is the existential statement ‘there exists an n such that n² < n’.

2.5 Some methods of proof


In this subsection, we collect some methods of proof that we will encounter in this module.
A direct proof is the most straightforward method of proof where the truth or falsity of
a given mathematical statement is established directly.
Example 2.5.1 Let us prove the universal statement ‘for all integers n, n² + n is an even
number’.
Our aim is to show that, given any integer n, n² + n is equal to 2k where k is an integer.
Note that k will depend on n in general.
We split the proof into the case where n is odd and the case where n is even. In the
first case, let n = 2s + 1 for some integer s. Then n² + n = (2s + 1)² + (2s + 1) =
(4s² + 4s + 1) + (2s + 1) = 4s² + 6s + 2 = 2(2s² + 3s + 1). Indeed, this expression has
the form 2k where k = 2s² + 3s + 1 is an integer. In the second case, let n = 2t for some
integer t. Then n² + n = (2t)² + (2t) = 4t² + 2t = 2(2t² + t). Again, this expression has the
form 2k where k = 2t² + t is an integer. Therefore, the universal statement ‘for all integers
n, n² + n is an even number’ is true.
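
A computer check cannot replace a proof like the one above, since a program can only test
finitely many integers, but it provides quick supporting evidence and often suggests whether
a universal statement is worth trying to prove. An illustrative Python sketch (not part of
the examinable material):

# Evidence (not a proof!) for 'for all integers n, n**2 + n is even':
# test every integer in a finite sample.
assert all((n**2 + n) % 2 == 0 for n in range(-1000, 1001))
print("n**2 + n is even for every tested n")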
A proof by contraposition is concerned with the proof of a conditional statement of the
form p ⇒ q. The idea is to prove instead the logically equivalent contrapositive statement
(¬q) ⇒ (¬p) which may happen to be easier to deal with.

Example 2.5.2 Let us prove the universal statement ‘for all integers n, if n² is even,
then n is even’.
A direct proof here seems to be a difficult task. We can imagine starting it by letting
n² = 2k for some integer k, but then it is difficult to see how to proceed from there. We
cannot claim that n = ±√(2k) is even because the only thing that we know about k is
that it is an integer, and not all integers k make √(2k) an even number.
Instead, let us prove this statement by contraposition. Letting p be the predicate ‘n² is
even’ and q be the predicate ‘n is even’, we need to prove the logically equivalent statement
(¬q) ⇒ (¬p); that is, we need to show that ‘if n is not even, then n² is not even’. Note
that, given that n is an integer, n² is also an integer, so if they are not even they must be
odd. Thus, we are left to prove the statement ‘if n is odd, then n² is odd’ whose truth can
be established by a direct proof. To this end, we let n = 2s + 1 for some integer s. Then,
n² = (2s + 1)² = 4s² + 4s + 1 = 2(2s² + 2s) + 1, so n² has the form 2k + 1 where k = 2s² + 2s
is an integer. Therefore, n² is odd and our task has been completed.
A proof by example or counterexample is a convenient method for either proving
an existential statement ‘∃n[P (n)]’ or disproving a universal statement ‘∀n[P (n)]’. Recall
from subsection 2.4 that disproving a universal statement ‘∀n[P (n)]’ amounts to proving
the existential statement ‘∃n[¬P (n)]’.
Example 2.5.3 Let us prove the existential statement ‘there exists an integer n such
that 2n = n²’.
An example suffices: n = 2 is such an integer.
Example 2.5.4 Let us disprove the universal statement ‘for all integers n, 2n ≠ n²’.
A counterexample suffices: n = 2. Note that disproving this statement amounts to prov-
ing its negation, which is logically equivalent to the existential statement encountered in
Example 2.5.3. Our ‘counterexample’ here is called an ‘example’ there.
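
An example proving an existential statement can likewise be found by mechanical search.
The following illustrative Python sketch (not part of the examinable material) scans a small
range of integers for witnesses to the existential statement of Example 2.5.3; it finds n = 0
as well as the example n = 2 used above.

# Witnesses to 'there exists an integer n such that 2n = n**2'.
witnesses = [n for n in range(-10, 11) if 2 * n == n**2]
print(witnesses)  # prints [0, 2]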
A proof by contradiction is based on the following idea. Suppose that we want to prove
that a statement s is true. Instead of showing directly that the truth value of s is T , we
assume that s is false; in other words, we assume that ¬s is true. Our aim is to produce a
valid mathematical argument based on the assumed truth of the premise ¬s which leads to
some statement r that is false. The resulting conditional statement (¬s) ⇒ r is based on a
valid mathematical deduction so it is a true statement, as explained in Remark 2.2.12. As
such, (¬s) ⇒ r can only have one of the forms ‘T ⇒ T ’, ‘F ⇒ T ’ or ‘F ⇒ F ’. Given that
the truth value of r is F , we conclude that our statement (¬s) ⇒ r has the form ‘F ⇒ F ’.
In this way we deduce that the truth value of the premise ¬s is F and hence the truth
value of s is T !
In the special case where the statement s whose truth we want to establish has the con-
ditional form ‘p ⇒ q’, we need to assume that ‘p ⇒ q’ is false; that is, we need to assume
that ‘¬(p ⇒ q)’ is true. Using the fact established in subsection 2.3 that ‘p ⇒ q’ is
logically equivalent to ‘(¬p) ∨ q’, we see that ‘¬(p ⇒ q)’ is logically equivalent to
‘¬((¬p) ∨ q)’. In turn, this is logically equivalent to ‘p ∧ (¬q)’ by one of de Morgan’s
laws. Thus,
in order to prove by contradiction that ‘p ⇒ q’ is true, we need to assume that ‘p ∧ (¬q)’
is true (equivalently, that p is true and q is false) and then produce a valid mathematical
argument that leads to some statement r that is false.

Let us apply this idea to the universal statement encountered in Example 2.5.2 in order to
prove its validity by contradiction.
Example 2.5.5 Show that for all integers n, if n² is even, then n is even.
This is a conditional statement of the form ‘p ⇒ q’ where p is the predicate ‘n² is even’
and q is the predicate ‘n is even’. As explained above, in order to establish the truth of
this conditional statement by contradiction, we need to assume that ‘n² is even’ and ‘n is
not even’ and produce a valid mathematical argument that leads to some statement r that
is false.
We first use the fact that n is an integer, so if it is not even, then it is odd. Therefore, we
assume that ‘n² is even’ and ‘n is odd’, and our aim is to reach a contradiction in the form
of some false statement r.
Assuming that n is odd, let n = 2s + 1 for some integer s. Based on that assumption,
we see that n² = (2s + 1)² = 4s² + 4s + 1 = 2(2s² + 2s) + 1 is of the form 2k + 1 where
k = 2s² + 2s is an integer, so n² is odd. However, we have also assumed that n² is even.
Thus, we have reached the statement r that ‘n² is both odd and even’ which is clearly
false. Our task has been completed.

2.6 Exercises for self study


Exercise 2.6.1 Use a truth table to prove that the propositions

q,

(p ∧ q) ∨ q,
and
(p ∨ q) ∧ q

are all logically equivalent.

Exercise 2.6.2 For integers n, consider the universal statements

∀ even n [(2n + n³) is even]

and
∀ odd n [(2n + n³) is odd].
Show that both these statements are true.

Exercise 2.6.3 (i) Use a truth table to prove that the proposition ‘¬(p ∧ q)’ is logically
equivalent to the proposition ‘(¬p) ∨ (¬q)’.
(ii) Let p be the proposition ‘4 is an even integer’ and q be the proposition ‘5 is an even
integer’. Express in words the propositions ‘¬(p∧q)’ and ‘(¬p)∨(¬q)’ in order to appreciate
their logical equivalence.

Exercise 2.6.4 (i) Use a truth table to show that the proposition ‘(q ⇒ r) ∧ (q ⇒ s)’ is
logically equivalent to the proposition ‘q ⇒ (r ∧ s)’.

(ii) Let r be a proposition whose truth value is F and let s be a proposition whose truth
value is T . Suppose that the truth value of the proposition ‘(q ⇒ r) ∧ (q ⇒ s)’ is T . What
can you say about the truth value of q?

Exercise 2.6.5 Building on Exercise 2.6.4, consider the following definitions:


An integer n is even only if n = 2k for some integer k.
An integer n is odd only if n = 2k + 1 for some integer k.
Also consider the following propositions:
q: ‘1 is an even integer’,
r: ‘2 is an odd integer’,
s: ‘2 is an even integer’.
For the sake of argument, let us assume that it is difficult to establish the truth value
of any of these propositions; we can only agree that the proposition ‘r ∧ s’ is absurd and
hence obviously false.
Your task in this exercise is to prove that q is false.
Since we have assumed that it is difficult to derive the falsity of q directly from the given
definitions, you must follow a different approach:
(i) Start from the given definitions and construct a valid argument which shows that the
conditional proposition ‘q ⇒ r’ is true; that is, assume the truth of q and deduce the truth
of r on the basis of that assumption. Hint: Your argument may start as follows: ‘If 1 is an
even integer, then 1 = 2k for some integer k. Then, ...’.
(ii) Construct another valid argument which shows that the conditional proposition ‘q ⇒ s’
is also true.
It follows from parts (i) and (ii) that the conjunction ‘(q ⇒ r) ∧ (q ⇒ s)’ is true. Hence,
using the result established in Exercise 2.6.4, its logically equivalent proposition ‘q ⇒ (r∧s)’
is true as well.
(iii) Given that the conditional ‘q ⇒ (r ∧ s)’ is true and that ‘r ∧ s’ is absurd and hence
false, what do you infer about the truth value of q?
Note that this approach amounts to proving by contradiction that the proposition ‘1 is not
an even integer’ is true. Indeed, you have assumed that ‘1 is not an even integer’ is false
(equivalently, you have assumed that ‘1 is an even integer’ is true) and reached the absurd
conclusion that ‘2 is both an odd and an even integer’.

Exercise 2.6.6 For integers n, consider the universal statement ‘∀n [(n2 + 3n) is odd]’.
(i) Prove that this statement is false.
For integers n, consider the existential statement ‘∃n [(n2 + 3n) is odd]’.
(ii) Use the definitions given in Exercise 2.6.5 to prove that this statement is false as well.
Hint: Prove instead that its negation is true.

2.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Some paragraphs on logic can be found on pages 8 and 9 of our Algebra Textbook as part of
its preliminaries. I would also recommend that you read through the entire preliminaries,
from page 1 to page 9.
• N. Biggs, Discrete Mathematics, Oxford University Press.
If you have a copy of this book, chapters 1 and 3 can be useful as additional reading.

3 One-Variable Calculus, Part 1 of 7


3.1 Sets, subsets and intervals
A set is a collection of distinct objects. These objects are called the elements of the set.
A set can be defined either by listing its elements or by describing their properties. For
example, we can define a set, call it A, whose elements are the first five natural numbers.
Equivalently, we can write A = {5, 1, 4, 3, 2}. The order in which the elements of A
appear in this list is irrelevant. Moreover, listing a given element more than once does
not change the set. For example, {5, 1, 4, 3, 2}, {1, 2, 3, 4, 5} and {1, 1, 5, 1, 4, 2, 3, 3, 2, 1}
are equivalent representations of A. An alternative description of this set is given by
A = {n | n is an integer; 1 ≤ n ≤ 5}. The vertical bar means ‘such that’ and a semicolon is
used to separate the two properties satisfied by the general element n. The resulting string
of symbols is read as ‘A is the set of all n such that n is an integer and 1 ≤ n ≤ 5’.
Two sets X and Y are called equal if they have the same elements. In this case, we write
X = Y . If x is an element of a set X, we write x ∈ X. If z is not an element of X, we
write z ∉ X. A set S is said to be a subset of X if every element of S is also an element
of X. In this case, we write S ⊆ X. If, additionally, S and X are not equal, S is called
a proper subset of X. If we want to emphasise this fact, we write S ⊂ X. Note that
X = Y if and only if X ⊆ Y and Y ⊆ X.
Example 3.1.1 Consider the sets A = {1, 2, 3, 4, 5}, B = {5, 4, 3, 2, 1} and C = {2, 3, 4}.
The statements A ⊆ B, B ⊆ A, A = B, C ⊆ A, C ⊂ A, C ⊆ B, C ⊂ B are all true. The
statements A ⊂ B, B ⊂ A are both false.
Example 3.1.2 Consider the sets A = {5, 6} and B = {4, 6, 7, 8, 9}. A is not a subset
of B because not all elements of A are elements of B. In particular, 5 ∉ B. Similarly, B
is not a subset of A because not all elements of B are elements of A. For example, 7 ∉ A.
Given two sets X and Y , the union X ∪ Y is the set whose elements belong to X or Y .
The linking word ‘or’ has its standard inclusive meaning; that is, any element that belongs
to both X and Y is included in the union X ∪ Y . The intersection X ∩ Y is the set whose
elements belong to both X and Y .
Example 3.1.3 Let A = {1, 2, 3, 4, 5}, B = {1, 7, 8} and C = {4, 5, 6, 7}. Their unions
and intersections are: A ∪ B = {1, 2, 3, 4, 5, 7, 8}, A ∩ B = {1}, A ∪ C = {1, 2, 3, 4, 5, 6, 7},
A ∩ C = {4, 5}, B ∪ C = {1, 4, 5, 6, 7, 8}, B ∩ C = {7}.
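These operations mirror Python’s built-in set type, which is convenient for checking small examples like the one above. A minimal sketch (illustrative only):

# Sketch: reproducing Example 3.1.3 with Python's built-in sets.
A = {1, 2, 3, 4, 5}
B = {1, 7, 8}
C = {4, 5, 6, 7}

print(A | B)                 # union A ∪ B -> {1, 2, 3, 4, 5, 7, 8}
print(A & C)                 # intersection A ∩ C -> {4, 5}
print(B & C)                 # intersection B ∩ C -> {7}
print({2, 3, 4} < A)         # proper-subset test -> True
print(A == {5, 1, 4, 3, 2})  # order and repetition are irrelevant -> True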

The cardinality |X| of a set X is the number of elements of X. So, if A = {5, 6, 9}, its
cardinality is |A| = 3. There is a unique set with no elements. This is denoted by ∅ and
called the empty set. The cardinality of the empty set is zero. Some sets have infinitely
many elements and therefore infinite cardinality. These are called infinite sets. The set

N = {1, 2, 3, ...}

of all natural numbers and the set

Z = {..., −2, −1, 0, 1, 2, ...}

of all integers are both infinite sets. Another infinite set for which a special symbol is
reserved is the set Q of all rational numbers. These are numbers which can be written in
the form a/b, where a and b ≠ 0 are elements of Z. The set R of real numbers is the set of
all numbers that can be written as decimals. R consists of the elements of Q and all the
irrational numbers. Irrational numbers are decimals such as √2, π, and e which cannot be
written as ratios of integers. Later in the course, we will encounter the set C of complex
numbers. These are numbers of the form x + iy, where x and y are elements of R and i is
defined by the property that i2 = −1.
Let us now focus on the set R. A subset I of R is called an interval if, whenever it
contains two real numbers, it contains all the real numbers between them. An interval can
be visualised as a line segment. It may include both its end points, one of them, or none
of them. Typical intervals are illustrated below:

In all these cases, a and b are called the endpoints of the interval regardless of whether they
belong to it or not. Any point of an interval that is not an endpoint is called an interior
point. Note that ∞ and −∞ are not real numbers but symbols. These are neither interior
points nor endpoints. A square bracket implies that the endpoint is included in the interval
while a circle bracket implies that the endpoint is excluded from the interval. The symbols
∞ and −∞ are associated with circle brackets; for example, we have R = (−∞, ∞). An
interval of the form (a, b) is called an open interval and an interval of the form [a, b] is
called a closed interval. We will not use any special name for intervals of the form (a, b]
or [a, b).

3.2 Functions, domain, codomain and range


Given two sets X and Y , a function f : X → Y is a rule which assigns to each element
x ∈ X a unique element y ∈ Y denoted by f (x). The set X is called the domain of f and
the set Y is called the codomain of f . In the case where the sets X and Y are subsets of
R, the function is said to be a real-valued function of a single real variable. The variable x
is called the independent variable and y is called the dependent variable. Once the

sets X, Y and the rule f have been specified, it does not matter what letters we choose to
denote these variables. For example,

f : [4, 7) → (1, 9] defined by y = f (x) = √x

and

f : [4, 7) → (1, 9] defined by f (s) = √s

are equivalent descriptions of f . Note that the second description does not even involve a
letter for the dependent variable; the rule f is equally clear without it.
The set of outputs of f : X → Y is called the range of f and is denoted by R(f ) ⊆ Y :

R(f ) = {y ∈ Y | y = f (x) for some x ∈ X}.

Note that the range of f : X → Y may be a proper subset of the codomain Y .


Example 3.2.1 Let X = [0, 1] and Y = [3, 8]. Consider the function f : X → Y defined
by f (x) = 2x + 4. Here, the range of f is a proper subset of the codomain Y , given by
R(f ) = [4, 6].

3.3 The set R2 , Cartesian equations and graphs


The set R2 can be defined as the set of all pairs (a, b), where a and b are elements of R:

R2 = {(a, b) | a ∈ R and b ∈ R}.

Note that the notation (a, b) for a point in R2 coincides with the notation (a, b) for an open
interval. These concepts are of course unrelated.
It is helpful to visualise R2 and its elements. To this end, we introduce the so-called
Cartesian plane. The Cartesian plane is a plane equipped with a horizontal x-axis and a
vertical y-axis. These axes intersect at a point called the origin of the Cartesian plane. The
element (a, b) ∈ R2 is represented by the point in the Cartesian plane whose x-coordinate
is a and y-coordinate is b. Thus, the element (0, 0) ∈ R2 corresponds to the origin. The
Cartesian plane is depicted below:

Figure 3.3.1

A Cartesian equation of the form G(x, y) = 0 determines a subset SG ⊆ R2 which


consists precisely of those points (x, y) ∈ R2 that satisfy G(x, y) = 0.
Example 3.3.2 Let F (x, y) = y − x, G(x, y) = x2 + y 2 and H(x, y) = x − 2. Consider the
subsets SF ⊆ R2 , SG ⊆ R2 and SH ⊆ R2 defined by the Cartesian equations F (x, y) = 0,

G(x, y) = 0 and H(x, y) = 0, respectively. SF corresponds to a diagonal line in the Carte-
sian plane passing through the origin (0, 0), SG consists of only the origin (0, 0) and SH
corresponds to a vertical line passing through the point (2, 0). These subsets are illustrated
below:

Figure 3.3.3

The graph of a function f : X → Y is the subset of R2 defined by the Cartesian equation


y = f (x), where x ∈ X. Note that this equation can be cast in the general form G(x, y) = 0
if we define G(x, y) = y − f (x). Alternatively, we can write

Graph(f ) = {(x, f (x)) ∈ R2 | x ∈ X}.

Example 3.3.4 Let X = [−4, 6] and Y = R. Consider the function f : X → Y given by


f (x) = x2 . The graph of f corresponds to the subset of R2 sketched below:

Figure 3.3.5

Any real-valued function of a single real variable can be represented by a graph in R2


defined by the Cartesian equation y = f (x). However, a given subset SG ⊆ R2 defined by
a Cartesian equation of the form G(x, y) = 0 may not always represent a function. Some
characteristic cases are illustrated below:

Figure 3.3.6

Remark 3.3.7 Figure 3.3.6 captures the following idea: A subset SG ⊆ R2 defined by
G(x, y) = 0 represents a function if every vertical line in R2 intersects SG at most once.

3.4 Surjective, injective and bijective functions


A function f : X → Y is called surjective (or onto) if its range is equal to its codomain:
R(f ) = Y . In other words, for all y ∈ Y there exists an x ∈ X such that f (x) = y.
A function f : X → Y is called injective (or one-to-one) if distinct inputs are mapped
to distinct outputs; that is, for all x1 ∈ X, x2 ∈ X,

if x1 ≠ x2 , then f (x1 ) ≠ f (x2 ).

An alternative definition of an injective function utilises the logically equivalent
contrapositive statement: for all x1 ∈ X, x2 ∈ X,

if f (x1 ) = f (x2 ), then x1 = x2 .

A function f : X → Y is called bijective if it is both injective and surjective.


Typical examples of functions are given below. The function g is surjective but not injective,
h is injective but not surjective, j is bijective and k is neither surjective nor injective.

Figure 3.4.1

Remark 3.4.2 A function is injective if every horizontal line in R2 intersects its graph
at most once.
Note that if we restrict the domain and the codomain of g, h and k, we can construct
corresponding bijective functions:

Figure 3.4.3
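For functions between finite sets, surjectivity and injectivity can be checked exhaustively, which is a handy way to test small examples. A minimal Python sketch (the domain, codomain and rule below are invented for illustration):

# Sketch: testing surjectivity and injectivity on finite sets.
X = [0, 1, 2, 3]          # domain
Y = [0, 1, 2]             # codomain
f = lambda x: x % 3       # an illustrative rule X -> Y

images = [f(x) for x in X]
surjective = set(images) == set(Y)           # range equal to codomain?
injective = len(set(images)) == len(images)  # distinct inputs give distinct outputs?
print(surjective, injective)                 # -> True False (0 and 3 collide)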

3.5 The derivative
The derivative measures the response of a dependent variable y to changes in an indepen-
dent variable x. It generalises the concept of the slope of a line.

Let us start with the graph of a line in R2 described by the Cartesian equation y = mx+C.
The slope m of this graph determines the response of the dependent variable y to changes
in the independent variable x in the following sense: Consider any point on this graph.
If its x-coordinate is changed by an amount h, its y-coordinate must be changed by an
amount mh in order for this point to remain on the graph:

Figure 3.5.1

Given any two points (x0 , mx0 + C) and (x0 + h, mx0 + mh + C) on this graph, the slope
of the graph is defined by the ratio

δy/δx = (mx0 + mh + C − (mx0 + C)) / ((x0 + h) − x0 ) = mh/h = m,
where δx is the change in the x-coordinate and δy is the corresponding change in the
y-coordinate.

In the case of a graph which is not a line, the response of y to x may vary from point to
point. Therefore, the definition of the slope must be modified so that it can be applied to
an individual point.

Figure 3.5.2

For this purpose, let us select an arbitrary point (x0 , f (x0 )) on the graph of some
function f . This function is assumed to be differentiable at the point (x0 , f (x0 )) in a
sense to be explained in subsection 3.6. For the time being, our task is to define the slope
of this graph at the point (x0 , f (x0 )). As illustrated below, we introduce a nearby point
(x0 + h, f (x0 + h)) and investigate what happens to the slope of the line segment joining
the points (x0 , f (x0 )) and (x0 + h, f (x0 + h)) as we bring (x0 + h, f (x0 + h)) closer to
(x0 , f (x0 )):

Figure 3.5.3

In general, the line segment joining (x0 , f (x0 )) and (x0 +h, f (x0 +h)) does not coincide with
the piece of the graph between (x0 , f (x0 )) and (x0 + h, f (x0 + h)) because the latter may be
a curve. However, this piece looks straighter and straighter as the point (x0 + h, f (x0 + h))
moves closer and closer to the original point (x0 , f (x0 )).

Figure 3.5.4

In the limit as h tends to zero, the graph of f between the points (x0 , f (x0 )) and
(x0 + h, f (x0 + h)) effectively becomes a straight line, thereby allowing us to define the
slope of the curve y = f (x) at the individual point (x0 , f (x0 )). More precisely, we define
the slope at the point (x0 , f (x0 )) by the expression

limh→0 (f (x0 + h) − f (x0 ))/h
and call it the derivative of f at x0 .

The derivative of f at x0 is denoted by various symbols, among them f ′ (x0 ), Df (x0 ),
(df /dx)(x0 ), (dy/dx)(x0 ) and y ′ (x0 ).
Remark 3.5.5 It should be emphasised that the limit h → 0 does not mean that
h = 0. It means that h becomes exceedingly small while remaining non-zero. Hence, the
numerator and the denominator of the fraction (f (x0 + h) − f (x0 ))/h are not zero and their
ratio is not undefined. Still, there is no guarantee that the limit of this ratio will exist for
all points in the domain of a function f . Whether this limit exists or not depends both
on the function f and the particular point considered. We will go through some examples
in subsection 3.6 where the derivative does not exist at a given point (x0 , f (x0 )) because
taking the limit there fails to produce a unique finite answer.

First, let us apply this process to a case where the derivative does exist.

Example 3.5.6 Let us calculate the derivative of the function f (x) = x2 at any given
point x0 ∈ R. Using the definition, we have

f ′ (x0 ) = limh→0 (f (x0 + h) − f (x0 ))/h = limh→0 ((x0 + h)2 − x0 2 )/h.

We expand the numerator in order to eliminate the x0 2 terms and then factorise the
resulting expression to obtain

f ′ (x0 ) = limh→0 h(2x0 + h)/h.

Now recall that h is not actually zero; it only tends to zero. Hence, since h ≠ 0, we can
cancel the h terms and obtain the expression

f ′ (x0 ) = limh→0 (2x0 + h).

This expression is well-defined even if h = 0. Therefore, at this stage, letting h tend to
zero is virtually equivalent to setting h to zero. In this way, we find that

f ′ (x0 ) = 2x0 .

We finally observe that this calculation is valid for any point x0 . We can therefore replace
the label x0 by x and simply say that the derivative of f (x) = x2 is given by f ′ (x) = 2x.
Geometrically, this result can be visualised as follows:

Figure 3.5.7

Having defined the derivative f ′ (x), we can now define the tangent line to the graph of
y = f (x) at the point (x0 , f (x0 )) as the line which passes through (x0 , f (x0 )) and has slope
equal to f ′ (x0 ). A Cartesian equation for this line is

y − f (x0 ) = f ′ (x0 )(x − x0 ).

Example 3.5.8 Let us find a Cartesian equation for the line in R2 which is tangent to
the graph of y = x2 at the point (3, 9). By applying the above formula, we get

y − 9 = f ′ (3)(x − 3),

where f ′ (3) = 2(3) = 6. Hence,


y − 9 = 6(x − 3)
or, equivalently,
y = 6x − 9.
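Both the limit definition and the tangent line are easy to explore numerically. In the Python sketch below (illustrative only), the difference quotients of f (x) = x2 at x0 = 3 approach f ′ (3) = 6, and the tangent line y = 6x − 9 nearly matches f close to x0 :

# Sketch: difference quotients of f(x) = x**2 at x0 = 3 approach f'(3) = 6.
f = lambda x: x**2
x0 = 3.0

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, (f(x0 + h) - f(x0)) / h)   # -> 6.1, 6.01, 6.001, 6.0001

tangent = lambda x: 9 + 6 * (x - x0)    # the line y = 6x - 9
print(tangent(3.001), f(3.001))         # nearly equal near x0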

3.6 Continuity and differentiability


Let us now consider some typical examples where the derivative fails to exist.
In the first example below, the function f is continuous because there are no breaks in
its graph, but it is not differentiable at the given point. This is due to the fact that
although the left derivative and the right derivative both exist, they are not equal to
each other. The terms ‘left’ and ‘right’ refer to the neighbourhood of the given point and
the slope of the graph on either side of this point. We will explain this shortly.
In the second example below, the function g is neither continuous nor differentiable at the
given point. We will confirm this by direct calculation of the left and right derivatives, but
do note the following theorem:

Theorem 3.6.1a If a function is not continuous at a given point, then it is not differen-
tiable there either.
We will not go into the details of this theorem in this course, but do also note its contra-
positive version:
Theorem 3.6.1b If a function is differentiable at a given point, then it is continuous
there as well.
In the third example below, the function h is continuous but not differentiable at the given
point because neither the left nor the right derivative exist there. The relevant calculations
are performed below each figure:
Example 3.6.2 The modulus function f (x) = |x| is defined by f (x) = x when x ≥ 0
and by f (x) = −x when x < 0. This function has the following graph:

Figure 3.6.3

It is clear from this graph that f is continuous at x = 0 and at any other point, but it is
not differentiable at x = 0. Indeed, as we approach the origin from the left, the slope of the
line segment that joins the ‘approaching point’ and the origin is −1, but as we approach
the origin from the right, the slope of the corresponding line segment is 1. Hence, there is
no unique finite value for the slope of the graph at the point 0, which makes the function
non-differentiable there. Let us confirm this by direct calculation:
For a positive ε, the left derivative at the point 0 is given by

limε→0 (f (0) − f (0 − ε))/ε = limε→0 (0 − ε)/ε = limε→0 (−1) = −1.

The fact that f (−ε) = −(−ε) = ε follows from the definition of the modulus function.
Similarly, for a positive ε, the right derivative at the point 0 is given by

limε→0 (f (0 + ε) − f (0))/ε = limε→0 (ε − 0)/ε = limε→0 (1) = 1.

If you are wondering why the calculation of the left derivative of f can be performed using
a positive ε as opposed to a negative one, then note the following:

Starting with the general definition of the derivative of f at the point x0 , that is,

limh→0 (f (x0 + h) − f (x0 ))/h ,
and using x0 = 0 and a negative h, we obtain the following formula for the left derivative
of f at 0:
limh→0 (f (0 + h) − f (0))/h .

If we now replace the negative h by a positive ε in the above formula, that is, if we use
h = −ε, we arrive at the alternative formula

limε→0 (f (0 − ε) − f (0))/(−ε)

which is indeed identical to the formula

limε→0 (f (0) − f (0 − ε))/ε
used above.
Example 3.6.4 The function g is given by g(x) = x + 1 when x > 0 and by g(x) = x
when x ≤ 0. Its graph is presented below:

Figure 3.6.5

This graph shows that g is not continuous at x = 0, so on the basis of Theorem 3.6.1a,
it cannot be differentiable there either. Indeed, as we approach the origin from the left,
we see that the slope of the line segment that joins the ‘approaching point’ and the origin
is 1. However, as we approach the origin from the right, the line segment joining the
‘approaching point’ (ε, ε + 1) and the origin becomes vertical, so its slope tends to infinity.
Since there is no unique finite value for the slope of the graph at the point 0, the function
is non-differentiable there. The following calculation confirms our conclusion:
For a positive ε, the left derivative at 0 is given by

limε→0 (g(0) − g(0 − ε))/ε = limε→0 (0 − (−ε))/ε = limε→0 (1) = 1.

Note that g(−ε) = −ε as a consequence of the definition of g.

Similarly, for a positive ε, the right derivative at 0 is given by
limε→0 (g(0 + ε) − g(0))/ε = limε→0 ((ε + 1) − 0)/ε → ∞.

Example 3.6.6 The function h is given for all x by h(x) = x1/3 . Its graph is sketched
below:

Figure 3.6.7

The function h is continuous everywhere. However, it is not differentiable at x = 0 because


its graph becomes vertical there, so its slope tends to ∞. Note that there is no need to
consider the left derivative and the right derivative separately in this case, because the
function h is described by the same expression x1/3 on either side of the point 0. We have:
For either positive or negative ε, the derivative at 0 is given by
limε→0 (h(0 + ε) − h(0))/ε = limε→0 (ε1/3 − 0)/ε = limε→0 1/ε2/3 → ∞.
Remark 3.6.8 Having explained the meaning of the left and right derivatives, we can
now define what we mean by a function f being differentiable at a point x0 . We say that
f is differentiable at an interior point x0 of its domain if the left and right derivatives of
f at x0 are finite and equal to each other. If this is the case, we say that the derivative of
f exists at x0 and we denote it by one of the standard symbols f ′ (x0 ), Df (x0 ), (df /dx)(x0 ),
(dy/dx)(x0 ) or y ′ (x0 ) introduced earlier.
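These one-sided limits are easy to probe numerically. A small Python sketch (illustrative; the ε values are an arbitrary choice) estimates the left and right derivatives of the modulus function at 0:

# Sketch: one-sided difference quotients of f(x) = |x| at the point 0.
f = abs

for eps in [0.1, 0.01, 0.001]:
    left = (f(0) - f(0 - eps)) / eps    # left-derivative estimate  -> -1
    right = (f(0 + eps) - f(0)) / eps   # right-derivative estimate -> +1
    print(eps, left, right)

# The one-sided limits disagree (-1 versus 1), so f is not differentiable at 0.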

3.7 Exercises for self study


Exercise 3.7.1 Consider the function f : R → R defined by

f (x) = 1 for x ≤ 3, and f (x) = (x − 3)2 + 1 for x > 3.

(i) Sketch the graph of f on the Cartesian plane.


For the next part, use the formula for the derivative at a given point x0 :
limh→0 (f (x0 + h) − f (x0 ))/h .
(ii) Calculate the left and right derivatives of f at the point x0 = 3 in order to investigate
whether or not f is differentiable there. If it is differentiable, what is the value of f ′ at
x0 = 3?

Exercise 3.7.2 For this question, use the formula for the derivative at a general point x,

limh→0 (f (x + h) − f (x))/h ,
to show that the derivative of the function

f (x) = x3 − 4x
is the function
f ′ (x) = 3x2 − 4.

Exercise 3.7.3 Consider the function

f : R → [−3, ∞) defined by f (x) = x2 + 7.

(i) Find the domain, codomain and range of f .


(ii) Sketch the graph of f on the Cartesian plane and write down a Cartesian equation of
the form G(x, y) = 0 which corresponds to your graph.
(iii) Is f surjective? Is it injective? Explain briefly each of your answers.
(iv) Restrict the domain and the codomain of f appropriately in order for f to become a
bijective function.

Exercise 3.7.4 (i) Use the definition of the derivative at a general point x, namely

f ′ (x) = limh→0 (f (x + h) − f (x))/h ,

to show that the derivative of the function

f (x) = 2x2 + 3x + 1
is the function
f ′ (x) = 4x + 3.

For the following parts, consider the function g : R → R defined by

g(x) = 2 for x ≤ 1, and g(x) = x + 1 for x > 1.

(ii) Sketch the graph of g on the Cartesian plane.


(iii) On the basis of your graph, is g differentiable at x0 = 1?

(iv) Use the formula for the derivative at a point x0 , namely

limh→0 (g(x0 + h) − g(x0 ))/h ,

to calculate the left and right derivatives of g at x0 = 1. Confirm that your findings are
consistent with your answer to the previous part.

3.8 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sets and numbers can be found in the preliminaries of our Algebra Textbook.
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 2.1, 2.2, 2.7 and 2.8 of our Calculus Textbook are all relevant. It is also useful to
read through section 2.3 where the graphs of some elementary functions are described.

4 One-Variable Calculus, Part 2 of 7


4.1 Derivatives of elementary functions
The following results can be obtained by applying the definition of the derivative to ele-
mentary functions. We have:

f (x)      f ′ (x)

xk         kxk−1
ln(x)      1/x
ex         ex
sin(x)     cos(x)
cos(x)     −sin(x)

where k ∈ Z. These results should be memorised as they will be used frequently in this
course.

4.2 Sum, product, quotient and chain rules


In addition, given arbitrary differentiable functions f , g, h, we can derive the following
general rules:

The sum rule


The sum rule states that the derivative of a sum of functions is the sum of the individual
derivatives:

(f (x) + g(x))′ = f ′ (x) + g ′ (x),

(f (x) + g(x) + h(x))′ = f ′ (x) + g ′ (x) + h′ (x),

and so on.

Example 4.2.1 The derivative of the function

f (x) = cos(x) − x5 + 3

is given by
f ′ (x) = −sin(x) − 5x4 .

The product rule


The product rule expresses the fact that the derivative of a product of functions can be
obtained by differentiating each function in turn while leaving the other functions alone:
(f (x)g(x))′ = f ′ (x)g(x) + f (x)g ′ (x),

(f (x)g(x)h(x))′ = f ′ (x)g(x)h(x) + f (x)g ′ (x)h(x) + f (x)g(x)h′ (x),

and so on.

Example 4.2.2 The derivative of the function

f (x) = 5xcos(x)ex

is given by
f ′ (x) = 5cos(x)ex − 5xsin(x)ex + 5xcos(x)ex .
The quotient rule
The quotient rule allows us to calculate the derivative of the ratio of two functions:
(f (x)/g(x))′ = (f ′ (x)g(x) − f (x)g ′ (x)) / (g(x))2 .

Example 4.2.3 Consider the function


f (x) = ln(x)/x3 .

By applying the quotient rule and simplifying the numerator, we obtain

f ′ (x) = (x2 − 3x2 ln(x))/x6 .

The chain rule


It is best to illustrate the chain rule by an example. Consider the function

y(x) = cos(x7 + x5 ).

This function is not an elementary function, so its derivative is not among the list presented
in subsection 4.1. Either we have to differentiate y(x) from first principles or we can simply
regard y(x) as the composition v(w(x)) of the elementary functions

v(w) = cos(w) and w(x) = x7 + x5

whose derivatives are known. Formally, the composition of two functions v and w is the
function denoted by v ◦ w and defined by

(v ◦ w)(x) = v(w(x)).

This expression is meaningful provided that x belongs to the domain of w and that the
range of w is a subset of the domain of v.

The chain rule tells us that the derivative of the composition y(x) = v(w(x)) is the product
of the individual derivatives according to

dy(x)/dx = (dv(w)/dw) · (dw(x)/dx).

By applying the chain rule to our current functions and expressing the final answer in
terms of x, we obtain

dy(x)/dx = −sin(w)(7x6 + 5x4 ) = −sin(x7 + x5 )(7x6 + 5x4 ).

Remark 4.2.4 In practice, it is worth dispensing with all the intermediate labels such as
w. Note that w represents the circle bracket (x7 + x5 ) and that the chain rule effectively
allows us to treat this bracket as a variable. Therefore, we can write down the final answer
directly:

(cos( ))′ = −sin( )( )′ = −sin(x7 + x5 )(7x6 + 5x4 ).

The empty brackets are purely to illustrate the structure of the chain rule; they are not
needed either.

The larger the number of compositions of functions required, the greater the convenience
gained by avoiding intermediate labels. For example, the derivative of the function
y(x) = [sin(ln(x3 + x2 ))]18

can be written down directly without introducing any labels:

y ′ (x) = 18[sin(ln(x3 + x2 ))]17 cos(ln(x3 + x2 )) · (1/(x3 + x2 )) · (3x2 + 2x).

Example 4.2.5 Consider the function

f (x) = x4 ln[sin(x6 ex )].

Its derivative can be obtained by combining the chain rule and the product rule:
1
f " (x) = 4x3 ln[sin(x6 ex )] + x4 cos(x6 ex )(6x5 ex + x6 ex ).
[sin(x6 ex )]

4.3 Higher order derivatives
The second derivative of a twice differentiable function f is simply the derivative of its
derivative:

f ′′ (x) = (f ′ (x))′ .

Similarly, the third derivative of a thrice differentiable function f is the derivative of its
second derivative:

f ′′′ (x) = (f ′′ (x))′ ,

and so on. The most convenient notation for the nth -derivative of f is Dn f (x).

Example 4.3.1 Using the sum, product, quotient and chain rules, let us find the second
derivative D2 f (x) of the function

f (x) = x4 sin(3x)/(x2 + 5).

The first derivative of f (x) is given by

f ′ (x) = ([4x3 sin(3x) + 3x4 cos(3x)](x2 + 5) − (2x)x4 sin(3x)) / (x2 + 5)2 .
It is worth simplifying the numerator of this expression in order to bring it into the form

f ′ (x) = ((2x5 + 20x3 )sin(3x) + (3x6 + 15x4 )cos(3x)) / (x2 + 5)2 .
By applying the quotient rule once more,

f ′′ (x) = ([(2x5 + 20x3 )sin(3x) + (3x6 + 15x4 )cos(3x)]′ (x2 + 5)2
− 2(x2 + 5)(2x)[(2x5 + 20x3 )sin(3x) + (3x6 + 15x4 )cos(3x)]) / (x2 + 5)4 ,
we obtain the following expression for the second derivative:

f ′′ (x) = ([(10x4 + 60x2 )sin(3x) + 3(2x5 + 20x3 )cos(3x)](x2 + 5)2
+ [(18x5 + 60x3 )cos(3x) − 3(3x6 + 15x4 )sin(3x)](x2 + 5)2
− 2(x2 + 5)(2x)[(2x5 + 20x3 )sin(3x) + (3x6 + 15x4 )cos(3x)]) / (x2 + 5)4 .
Note that this calculation would have been significantly harder had we introduced inter-
mediate labels.

4.4 Taylor polynomials and Taylor series
A real polynomial of degree n is a function that can be written in the form

p(x) = an xn + an−1 xn−1 + ... + a1 x + a0

for some integer n ≥ 0 and some real numbers a0 , a1 , ..., an , where an ≠ 0.

Given a function f with the property that all its derivatives up to Dn f exist, it is possible
to find a polynomial Pn whose graph closely resembles the graph of f in the neighbourhood
of a given point a. This is known as the Taylor polynomial of f about a of degree n
and is defined by:
Pn (x) = f (a) + Df (a)(x − a) + (1/2!)D2 f (a)(x − a)2 + ... + (1/n!)Dn f (a)(x − a)n .

The characteristic property of Pn is that all its derivatives up to the nth -derivative evaluated
at a are equal to the corresponding derivatives of f at a. That is,

Pn (a) = f (a),

DPn (a) = Df (a),


D2 Pn (a) = D2 f (a),
..
.
Dn Pn (a) = Dn f (a).
You may find it useful to confirm these relations starting from the definition of Pn .
Example 4.4.1 Let us derive the first three Taylor polynomials, P0 (x), P1 (x) and P2 (x)
for the function f (x) = ex about the point 0 and sketch these polynomials and the function
f on the same graph. We need the derivatives of f up to the second derivative:

f (x) = ex ,

Df (x) = ex ,
D2 f (x) = ex .
Evaluating these derivatives at the point 0, we obtain

f (0) = 1,

Df (0) = 1,
D2 f (0) = 1.
Substituting these values into the formula for the Taylor polynomial P2 (x), we find
P2 (x) = 1 + (x − 0) + (1/2!)(x − 0)2 .
Therefore, the first three Taylor polynomials of f about 0 are:

P0 (x) = 1,

P1 (x) = 1 + x,
P2 (x) = 1 + x + (1/2)x2 .
These are sketched on the same graph as f , below:

Note that, in general, the Cartesian equation

y = P0 (x) = f (a)

describes the horizontal line that passes through the point (a, f (a)) on the graph of f ,
while the Cartesian equation

y = P1 (x) = f (a) + f ′ (a)(x − a)

describes the tangent line to the graph of f at the point (a, f (a)).
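To see how rapidly these polynomials approximate f near the expansion point, one can tabulate the errors numerically. A Python sketch (illustrative only), using the polynomials from Example 4.4.1:

import math

# Sketch: errors of the Taylor polynomials of exp about 0.
P0 = lambda x: 1.0
P1 = lambda x: 1.0 + x
P2 = lambda x: 1.0 + x + x**2 / 2.0

for x in [0.5, 0.1, 0.01]:
    exact = math.exp(x)
    # The errors shrink as x approaches 0, and faster for higher degrees.
    print(x, exact - P0(x), exact - P1(x), exact - P2(x))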

If all the derivatives of f exist, we can let n tend to ∞ and define the Taylor series of f
about a given point a as the infinite sum
P∞ (x) = f (a) + Df (a)(x − a) + (1/2!)D2 f (a)(x − a)2 + (1/3!)D3 f (a)(x − a)3 + ....
A question arises: ‘does the Taylor series converge to f (x)’ ? In other words, ‘is P∞ (x)
the same function as f (x)’ ?

On most occasions, the answer is ‘yes’: the Taylor series converges to f (x) for all x in the
domain of f or at least for some set of values of x that includes a. For those values of x,
we indeed have
f (x) = P∞ (x) = f (a) + Df (a)(x − a) + (1/2!)D2 f (a)(x − a)2 + (1/3!)D3 f (a)(x − a)3 + ....
However, there are cases where the series P∞ (x) converges to f (x) only when x = a and
hence loses its usefulness away from a, and there are also cases where P∞ (x) converges

to a function other than f (x). We will not encounter any such exceptional cases in this
module.

The following example is concerned with a typical Taylor polynomial, the corresponding
Taylor series and its convergence.

Example 4.4.2 Let us derive the Taylor polynomial Pn (x) about 0 for the function
f (x) = 1/(1 − x),

where x ≠ 1.
First, we calculate a few derivatives in order to spot a pattern. By the chain rule, we have

f (x) = (1 − x)−1 ,

Df (x) = −(1 − x)−2 (−1) = (1 − x)−2 ,


D2 f (x) = −2(1 − x)−3 (−1) = 2(1 − x)−3 ,
D3 f (x) = −3!(1 − x)−4 (−1) = 3!(1 − x)−4 ,
D4 f (x) = −4!(1 − x)−5 (−1) = 4!(1 − x)−5 ,
..
.
Dn f (x) = −n!(1 − x)−(n+1) (−1) = n!(1 − x)−(n+1) .

Hence, evaluating these derivatives at the point 0, we obtain

f (0) = 1,

Df (0) = 1,
D2 f (0) = 2,
D3 f (0) = 3!,
D4 f (0) = 4!,
..
.
Dn f (0) = n!.
Substituting these values into the formula for the Taylor polynomial Pn (x), we find
Pn (x) = 1 + (x − 0) + (1/2!) · 2(x − 0)2 + (1/3!) · 3!(x − 0)3 + ... + (1/n!) · n!(x − 0)n .
Thus, simplifying this expression, we obtain the approximate equation
1/(1 − x) ≈ Pn (x) = 1 + x + x2 + x3 + · · · + xn .
In order to investigate the convergence of the corresponding Taylor series, let us multiply
both sides of this approximation by 1 − x. We get

1 ≈ Pn (x)(1 − x) = (1 + x + x2 + x3 + ... + xn ) − (x + x2 + x3 + · · · + xn + xn+1 )

which, upon simplification, reduces to the following approximate equation:

1 ≈ 1 − xn+1 .

Now recall that the Taylor series is derived from the Taylor polynomial Pn by letting
n → ∞. As long as −1 < x < 1, we have that xn+1 → 0 as n → ∞. Hence, the right hand
side ‘1 − xn+1 ’ of the above equation approaches the left hand side ‘1’. Accordingly, the
error in the approximation of f (x) by Pn (x) becomes exceedingly small as n → ∞, and
the Taylor series
P∞ (x) = 1 + x + x2 + x3 + x4 + . . .
converges to f (x). On the other hand, if x ≤ −1 or x > 1, the expression ‘1 − xn+1 ’ does
not approach ‘1’ as n → ∞, which means that the Taylor series does not converge to f (x).
For example, if we evaluate f (x) at x = 2 we find
f (2) = 1/(1 − 2) = −1,
while if we evaluate P∞ (x) at x = 2 we find

P∞ (2) = 1 + 2 + 22 + 23 + 24 + · · · → ∞.

In conclusion, P∞ converges to f (x) for |x| < 1 and diverges otherwise.

Note that this is an example of a Taylor series which converges to f (x) for some set of
values of x that includes the point of expansion a.
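The dichotomy between |x| < 1 and |x| > 1 is easy to watch numerically: the partial sums settle towards 1/(1 − x) in the first case and blow up in the second. A Python sketch (illustrative only):

# Sketch: partial sums of 1 + x + x**2 + ... versus 1/(1 - x).
def partial_sum(x, n):
    return sum(x**k for k in range(n + 1))

for x in [0.5, 2.0]:
    target = 1.0 / (1.0 - x)
    # For x = 0.5 the partial sums approach 2; for x = 2 they diverge.
    print(x, target, [partial_sum(x, n) for n in (5, 10, 20)])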

4.5 Exercises for self study


Exercise 4.5.1 Differentiate the following functions with respect to x using the sum,
product, quotient and chain rules, as appropriate:

(a) tan(x), (b) ex sin(x3 − 1), (c) (x2 + 1)/(x2 + x + 1), (d) −x/(x2 + x + 1).

(e) Why is the derivative of the function in (c) the same as the derivative of the function
in (d)?
(f) Find a Cartesian equation in R2 that describes the tangent line to the graph of the
function
f (x) = ex sin(x3 − 1)
at the point (1, 0) on this graph. Note that f (x) is the function encountered in (b).

Exercise 4.5.2 Find the quadratic Taylor polynomial P2 (x) of the function

f (x) = x3 e−x

about the point a = 3.


Also derive the Taylor series P∞ (x) of the function

g(x) = cos(x)

about the point a = 0. You can assume that P∞ (x) converges to g(x) for all x.

Exercise 4.5.3 (i) Using the product and chain rules, differentiate the function

f (x) = 2x3 cos(ex + x2 + 5)

with respect to x.
(ii) Hence, find the Taylor polynomial P1 (x) of the function f (x) about the point a = 1
and also find a Cartesian equation in R2 for the tangent line to the graph of f at the point
(1, 2cos(e + 6)).

Exercise 4.5.4 (i) Find the quadratic Taylor polynomial P2 (x) of the function

f (x) = x − ln(x)

about the point a = 4.


(ii) Derive the Taylor series P∞ (x) of the function

g(x) = sin(x)

about the point a = 0. You can assume that P∞ (x) converges to g(x) for all x.
(iii) Use your result from part (ii) to find the Taylor series P∞ (x) of the function

h(x) = sin(x3 )

about the point a = 0.


(iv) Also find the Taylor polynomial P3 (x) of the function

k(x) = (2x + 1)sin(x) + 3x2

about the point a = 0. Hint: This should also follow from part (ii).

4.6 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 2.10, 2.11, 2.12, 2.13 and 2.15 (up to exercise 6) of our Calculus Textbook are
all relevant. The exercises found in sections 2.11 and 2.15 are optional but nevertheless
useful. You can be selective.

5 One-Variable Calculus, Part 3 of 7


5.1 Increasing and decreasing functions
A function f is said to be increasing on an interval I only if

f (x2 ) ≥ f (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 . It is said to be strictly increasing on I only if

f (x2 ) > f (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 .

A strictly increasing function is also an increasing function.

Similarly, a function f is said to be decreasing on an interval I only if

f (x2 ) ≤ f (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 . It is said to be strictly decreasing on I only if

f (x2 ) < f (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 .

A strictly decreasing function is also a decreasing function.

Remark 5.1.1 Some books may use the terms ‘non-decreasing’ and ‘non-increasing’ to
describe what we call ‘increasing’ and ‘decreasing’, respectively. They may also use the
terms ‘increasing’ and ‘decreasing’ instead of ‘strictly increasing’ and ‘strictly decreasing’.
Keep this in mind, especially if you are using various sources when studying.

Given a differentiable function f on an open interval I, the following statements are true:

1. f is increasing on I if and only if f ′ (x) ≥ 0 for all x ∈ I.

2. If f ′ (x) > 0 for all x ∈ I, then f is strictly increasing on I.

3. f is decreasing on I if and only if f ′ (x) ≤ 0 for all x ∈ I.

4. If f ′ (x) < 0 for all x ∈ I, then f is strictly decreasing on I.

The proof of these statements is omitted.

Remark 5.1.2 Note that the second statement does not claim ‘if and only if’. This
is because a differentiable function f on an open interval I may satisfy the non-strict
inequality f " (x) ≥ 0 and still be a strictly increasing function on I. An example is the
function f (x) = x3 on R. Its derivative f " (x) = 3x2 is not positive everywhere on R because
it vanishes at 0. Still, f is a strictly increasing function on R because x2 3 > x1 3 whenever
x1 ∈ R, x2 ∈ R and x2 > x1 . A similar remark applies to the fourth statement above.
Example 5.1.3 Let us show that the function f (x) = x2 is
(i) strictly decreasing on the interval I1 = (−9, −1) and
(ii) strictly increasing on the interval I2 = (0, 6).
Indeed, observe that f is differentiable on these open intervals and that

f ′ (x) = 2x.

Since f ′ (x) < 0 for all x ∈ (−9, −1), f is strictly decreasing on I1 . Similarly, since f ′ (x) > 0
for all x ∈ (0, 6), f is strictly increasing on I2 .
Example 5.1.4 Let us show that the constant function f (x) = 5 is both increasing and
decreasing on R but is neither strictly increasing nor strictly decreasing on R.
Indeed, we have that
f (x2 ) ≥ f (x1 ) and f (x2 ) ≤ f (x1 )
whenever x1 ∈ R, x2 ∈ R and x2 > x1 . It follows that f is both increasing and decreasing
on R.

On the other hand, it is neither true that

f (x2 ) > f (x1 )

whenever x1 ∈ R, x2 ∈ R and x2 > x1 nor that

f (x2 ) < f (x1 )

whenever x1 ∈ R, x2 ∈ R and x2 > x1 . Hence, f is neither strictly increasing nor strictly


decreasing on R.

5.2 Local extrema and strict local extrema


Let I be an open interval and let f : I → R be a function.
A local maximum of f is a point a ∈ I with the property that

f (a) ≥ f (x)

for all x ‘sufficiently near’ a. The qualification ‘sufficiently near’ can be made precise: there
exists a real number δ > 0 such that f (a) ≥ f (x) for all x in the subset (a − δ, a + δ) of I.
A strict local maximum of a function f : I → R is a point a ∈ I with the property that

f (a) > f (x)

for all x ‘sufficiently near’ a such that x ≠ a. In other words, there exists a real number
δ > 0 such that f (a) > f (x) for all x in the subset (a − δ, a) ∪ (a, a + δ) of I.
Similarly, a local minimum of f : I → R is a point a ∈ I with the property that

f (a) ≤ f (x)

for all x ‘sufficiently near’ a.


A strict local minimum of a function f : I → R is a point a ∈ I with the property that

f (a) < f (x)

for all x ‘sufficiently near’ a such that x ≠ a.

Local maxima and minima are known collectively as local extrema. A strict local ex-
tremum is necessarily a local extremum. However, the converse of this statement is not
true: there exist local extrema which are not strict.
The following figure illustrates some local extrema graphically. Note that they are all strict
local extrema:

Figure 5.2.1

Example 5.2.2 Consider the function f : R → R defined by f (x) = 5. The graph of f


is a horizontal line, so any point a in the domain of f is both a local maximum and a local
minimum. However, a is neither a strict local maximum nor a strict local minimum.
Example 5.2.3 Consider the modulus function f (x) = |x| and visualise its graph. The
point 0 is a local minimum of f which is in fact a strict local minimum.

5.3 Stationary points


Let I be an open interval and let f : I → R be a differentiable function on I. A point a ∈ I
is called a stationary point of f only if

f ′ (a) = 0.

For any such point a, the tangent line to the graph of f at (a, f (a)) is a horizontal line
described by the Cartesian equation

y = f (a).

Some stationary points are illustrated below:

Figure 5.3.1

A stationary point of f : I → R which is neither a local minimum nor a local maximum is
called a stationary inflection point.

Note that a local maximum or a local minimum of a general function f may not be a
stationary point of f . Indeed, in Example 5.2.3, the point x = 0 is a strict local minimum
of the modulus function, but it is not a stationary point of this function because the latter
is not differentiable at x = 0.

However, if a function f is differentiable on an open interval I, we have the following


important theorem which we will use frequently in this course:

Theorem 5.3.2 Let f be a function that is differentiable on an open interval I. If a


point a ∈ I is a local maximum or a local minimum of f , then a is a stationary point of f ;
i.e., a point a ∈ I such that f ′ (a) = 0.

Proof Let us prove this theorem by contradiction. Referring to subsection 2.5, we need
to assume the negation of this conditional statement — that is, we need to assume that a
is a local maximum or a local minimum of f and that a is not a stationary point of f —
and reach contradiction.

Indeed, assuming that a is not a stationary point of f , we have f ′ (a) ≠ 0. Then the slope
of the graph of f at a is either positive or negative. If this slope is positive, any nearby
point x to the right of a will satisfy f (x) > f (a) and any nearby point x to the left of a will
satisfy f (x) < f (a). Similarly, if this slope is negative, any nearby point x to the right of
a will satisfy f (x) < f (a) and any nearby point x to the left of a will satisfy f (x) > f (a).
Thus, in all cases, we end up contradicting the other assumption that we made, namely,
that a is a local minimum or a local maximum. This completes our proof.

The following example illustrates the usefulness of Theorem 5.3.2.

Example 5.3.3 Find the local extrema of the function f (x) = x4 − 3x2 on R.

We note that f is differentiable everywhere on R. Its derivative is given by

f ′ (x) = 4x3 − 6x.

The logically equivalent contrapositive version of Theorem 5.3.2 informs us that if x is not
a stationary point of f , then x is not a local extremum of f . Therefore, the only candidates
for such extrema are the stationary points of f . These are solutions of the equation

4x3 − 6x = 0.
Factorising this equation according to 2x(2x2 − 3) = 0, we find that f has stationary points
at

x = 0 and x = ±√3/√2 .
Note that some of these candidates may not be local extrema; they may be stationary
inflection points. We will return to this example and complete it in the next subsection.

5.4 Tests for classifying stationary points
Let I be an open interval and let f : I → R be a twice-differentiable function on I. Also
let a be a stationary point of f .
The Second-Derivative Test: The second-degree Taylor polynomial of f about a pro-
vides an approximation for f in the neighbourhood of a:
f (x) ≈ f (a) + (1/2!)f ′′ (a)(x − a)2 ,

where the first-order term Df (a)(x − a) is absent because a is a stationary point.
Given that (x − a)2 > 0 for all x ≠ a, it follows that the sign of f ′′ (a) coincides with the
sign of the difference f (x) − f (a) for all x near a such that x ≠ a. Using the definitions of
local extrema and strict local extrema encountered in subsection 5.2, we deduce that the
stationary point a is:
(i) a local minimum (in fact, a strict local minimum) if f ′′ (a) > 0,
(ii) a local maximum (in fact, a strict local maximum) if f ′′ (a) < 0.
On the other hand, if f ′′ (a) = 0, the second-degree Taylor polynomial at a merely tells us
that f (x) ≈ f (a). Hence, it gives us no information about the difference f (x) − f (a) other
than the fact that this difference vanishes to that particular degree of accuracy. In other
words,
(iii) the second-derivative test fails if f ′′ (a) = 0.
In order to deal with case (iii), a Taylor polynomial of degree higher than 2 is needed.
However, f may not be differentiable at a more than twice, and even if it is, it is generally
unclear how high the degree of the Taylor polynomial should be in order for this polynomial
to yield conclusive information about the behaviour of f near a. In any case, it is simpler
to deduce the character of the point a by evaluating the function f at nearby points x1 < a
and x2 > a and comparing f (x1 ) and f (x2 ) with f (a).

The First-Derivative Test: Alternatively, we can classify the stationary point a of such
a function f : I → R by considering the behaviour of the first derivative of f near a. This
test is always conclusive and can be summarised by the following rules:

(a) if f ′ (x) changes sign from negative to the left of a to positive to the right of a, then a
is a local minimum (in fact, a strict local minimum),

(b) if f ′ (x) changes sign from positive to the left of a to negative to the right of a, then a
is a local maximum (in fact, a strict local maximum),

(c) if f ′ (x) does not change sign at a, then a is a stationary point of inflection.

Example 5.4.1 Let us use the test based on the first derivative in order to classify the
stationary points of the function f (x) = x4 − 3x2 encountered in Example 5.3.3. Let us
then complete that example by identifying the local extrema of f .

Continuing from Example 5.3.3, the derivative f ′ can be factorised completely as follows:

f ′ (x) = 4x(x − √3/√2)(x + √3/√2).

The way in which the sign of f ′ behaves over the corresponding subsets of the domain of
f is depicted below:
−∞ < x < −√3/√2    −√3/√2 < x < 0    0 < x < √3/√2    √3/√2 < x < ∞
        −                  +                 −                +
It follows by the first-derivative test that −√3/√2 is a strict local minimum, 0 is a strict
local maximum and √3/√2 is a strict local minimum. Therefore, all the stationary points
of f are local extrema and, in fact, strict local extrema.
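This sign pattern can also be checked by sampling f ′ a little to the left and right of each stationary point. A Python sketch (the offsets ±0.1 are an arbitrary illustrative choice):

import math

# Sketch: first-derivative test for f(x) = x**4 - 3*x**2.
fprime = lambda x: 4*x**3 - 6*x
c = math.sqrt(3) / math.sqrt(2)   # the nonzero stationary points are at ±c

for a in (-c, 0.0, c):
    left, right = fprime(a - 0.1), fprime(a + 0.1)
    kind = 'min' if left < 0 < right else 'max' if left > 0 > right else 'inflection'
    print(round(a, 3), kind)      # -> min, max, min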

Example 5.4.2 Consider the functions on R given by

(i) f (x) = 5, (ii) g(x) = x3 and (iii) h(x) = x4 .

Let us show that the second-derivative test fails at all the stationary points of these func-
tions and then classify these points in an alternative way.
(i) The derivative of f is given by f ′ (x) = 0. Since f ′ vanishes everywhere on R, every
point a ∈ R is a stationary point of f . The second derivative of f is given by f ′′ (x) = 0
and therefore also vanishes everywhere on R. It follows that the second-derivative test
fails at each stationary point a of f . In order to classify these points we observe that the
graph of f is a horizontal line. Therefore, each a ∈ R is both a local maximum and a local
minimum but is neither a strict local maximum nor a strict local minimum.
(ii) The derivative of g is given by g ′ (x) = 3x2 , so the point 0 is the only stationary point
of g. The second derivative of g is given by g ′′ (x) = 6x. Since g ′′ (0) = 0, the second-
derivative test fails. In order to classify this stationary point, we observe that g(x) < g(0)
for x < 0 and that g(x) > g(0) for x > 0. Hence the stationary point at 0 is neither a
local minimum nor a local maximum, so it is a stationary inflection point. Alternatively,
we observe that g ′ does not change sign at 0. Hence, by the first-derivative test, we reach
the same conclusion.
(iii) The derivative of h is given by h′ (x) = 4x3 , so 0 is the only stationary point of h. The
second derivative of h is given by h′′ (x) = 12x2 . Since h′′ (0) = 0, the second-derivative test
fails. In order to classify this stationary point, we observe that h(x) > h(0) for all x ≠ 0.
Hence 0 is a local minimum and, in fact, a strict local minimum. Alternatively, we note
that h′ (x) < 0 for x < 0 and h′ (x) > 0 for x > 0. By the first-derivative test, we confirm
our previous conclusion.
Example 5.4.3 Confirm that the point 0 is a stationary point of the function
f (x) = cos(x) and then classify this point by using the Taylor polynomial

P2 (x) = 1 − x2 /2
of f about 0.
The first derivative of f is given by

f ′ (x) = −sin(x),

so 0 is indeed a stationary point of f .

Using the Taylor polynomial given in the question, we obtain the approximation

f (x) ≈ P2 (x) = 1 − x2 /2.

Noting that f (0) = 1, we deduce that for all x sufficiently near zero such that x ≠ 0 the
difference f (x) − f (0) satisfies

f (x) − f (0) ≈ −x2 /2 < 0.
Hence the stationary point 0 is a local maximum and, in fact, a strict one.
For comparison, let us classify this stationary point by using the second-derivative test. In
this case, this test is conclusive and provides the fastest approach.
The second derivative of f is given by

f ′′ (x) = [−sin(x)]′ = −cos(x).

At the stationary point x = 0, we have that

f ′′ (0) = −1 < 0.

Thus, we confirm that this point is a strict local maximum.


Example 5.4.4 Let us show that the function f (x) = (ex − 1)3 has a single stationary
point at 0 and that the second-derivative test fails there.
The first derivative of f is given by
f ′ (x) = 3(ex − 1)2 ex .

Setting this expression equal to zero, we see that a stationary point must satisfy
(ex − 1)2 = 0 or ex = 0.
The second equation has no solution and the first equation has a unique solution when
ex = 1; that is, when x = 0. Hence the point 0 is the only stationary point of f .

The second derivative of f is given by


f ′′ (x) = 6(ex − 1)(ex )2 + 3(ex − 1)2 ex .

At the stationary point x = 0, this expression vanishes,


f ′′ (0) = 0,
so the second-derivative test fails.
In order to classify this point, we observe that f (a) < f (0) for a < 0 and f (a) > f (0) for
a > 0. Therefore, the stationary point at 0 is a stationary inflection point.
The same conclusion is reached via the first-derivative test by observing that f ′ (x) does
not change sign at 0.
Alternatively, we may use a Taylor polynomial Pn about 0 where n ≥ 3. It turns out
that P3 suffices. You may find it a useful exercise to construct P3 and confirm that 0 is a
stationary inflection point of f .

5.5 Convex and concave functions
A function f defined on an interval I is called convex only if the line segment joining any
two points on the graph of f lies above or on this graph. It is called strictly convex if
the line segment joining any two (distinct) points on the graph of f lies above this graph.
Similarly, a function f defined on an interval I is called concave only if the line segment
joining any two points on the graph of f lies below or on this graph. It is called strictly
concave if the line segment joining any two (distinct) points on the graph of f lies below
this graph. Various cases of convex and concave functions are illustrated below. Note that
the first graph corresponds to a strictly convex function:

Figure 5.5.1

An alternative description of concave and convex functions utilises the idea of a convex
set: A convex set in R2 is a subset S of R2 such that for any two points (x1 , y1 ) and (x2 , y2 )
in S, the line segment joining (x1 , y1 ) and (x2 , y2 ) lies entirely in S. A few illustrations of
convex and non-convex sets in R2 are given below:

Figure 5.5.2

Using this idea, we define a convex function f : R → R as a function with the property
that the set S of points lying above or on the graph of f in R2 is a convex set. Similarly,
we define a concave function f : R → R as a function with the property that the set S
of points lying below or on the graph of f in R2 is a convex set.

Figure 5.5.3

Example 5.5.4 Without sketching its graph, visualise the function f : R → R given by
f (x) = x2 . The set of points lying above or on the graph of f in R2 is a convex set. Hence,
f is a convex function. On the other hand, the function g : R → R given by g(x) = −x2
is a concave function, because the set of points lying below or on the graph of g in R2 is a
convex set.
Note that the definitions of convexity and concavity do not require the function f to be
differentiable on the interval I. However, if f is differentiable on an open interval I, we
have the following additional description:
A differentiable function f on an open interval I is convex if and only if its derivative
f ′ (x) is an increasing function on I; that is,

f ′ (x2 ) ≥ f ′ (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 . In this case, all the tangent lines of the graph of f
lie below or on this graph.
Similarly, a differentiable function f on an open interval I is concave if and only if its
derivative f ′ (x) is a decreasing function on I; that is,

f ′ (x2 ) ≤ f ′ (x1 )

whenever x1 ∈ I, x2 ∈ I and x2 > x1 . In this case, all the tangent lines of the graph of f
lie above or on this graph.

Figure 5.5.5

If f is twice-differentiable on an open interval I we have yet another description:


A twice-differentiable function f on an open interval I is convex if and only if its second
derivative f ′′ (x) is non-negative on I; that is,

f ′′ (x) ≥ 0

for all x ∈ I.
Similarly, a twice-differentiable function f on an open interval I is concave if and only if
its second derivative f ′′ (x) is non-positive on I; that is,

f ′′ (x) ≤ 0

for all x ∈ I.

Remark 5.5.6 Note that the above definitions do not disagree with the statements
made in section 4.1 of our Calculus Textbook that ‘if f ′′ (x) > 0, then f is convex’ and ‘if
f ′′ (x) < 0, then f is concave’.
Example 5.5.7 Consider the functions f, g, h, k : R → R given by (i) f (x) = −x2 ,
(ii) g(x) = x3 , (iii) h(x) = x4 and (iv) k(x) = −2x, whose graphs are sketched below. Let
us classify these functions as convex, concave or neither by using the above tests.

Figure 5.5.8

We note that all these functions are twice-differentiable on R, so the above tests are all
applicable. The test based on the second derivative is generally the simplest, so it is the
one we will use below:
(i) The second derivative of f is given by f ′′ (x) = (−2x)′ = −2. Since f ′′ (x) < 0 for all
x ∈ R, f is a concave function.
(ii) The second derivative of g is given by g ′′ (x) = (3x2 )′ = 6x. Thus, it is neither true
that g ′′ (x) ≥ 0 for all x ∈ R nor that g ′′ (x) ≤ 0 for all x ∈ R. It follows that g is neither a
convex nor a concave function.
(iii) The second derivative of h is given by h′′ (x) = (4x3 )′ = 12x2 . Since h′′ (x) ≥ 0 for all
x ∈ R, h is a convex function. On the other hand, it is not true that h′′ (x) ≤ 0 for all
x ∈ R, so h is not a concave function.
(iv) The second derivative of k is given by k ′′ (x) = (−2)′ = 0. Since we have that k ′′ (x) ≥ 0
and k ′′ (x) ≤ 0 for all x ∈ R, k is both a convex and a concave function.

5.6 Inflection points


If the graph of f changes from strictly convex to strictly concave or from strictly concave
to strictly convex at a point a, then a is called an inflection point of f .

Figure 5.6.1

Note that it is not necessary for f to be differentiable at an inflection point a. However, if
f is twice-differentiable on an open interval I that contains a, then we have the following
additional description:
A point a belonging to the domain of a twice-differentiable function f is an inflection
point if and only if f″(a) = 0 and f″(x) changes sign at a. It is a stationary inflection point
if and only if we also have that f′(a) = 0.
Example 5.6.2 Using the above test, let us investigate if the stationary points of the
functions (i) f : R → R given by f (x) = x3 and (ii) g : R → R given by g(x) = x4 are
inflection points.
We note that these functions are twice-differentiable on R, so the above test is applicable.
(i) The first and second derivatives of f are given by

f′(x) = 3x² and f″(x) = 6x.

Hence, x = 0 is the only stationary point of f and we also see that f″(0) = 0.
The sign of f″ behaves over R as shown below:

    −∞ < x < 0    0 < x < ∞
         −             +

Since f″(0) = 0 and f″(x) changes sign at 0, we conclude that 0 is an inflection point.
Moreover, we have that f′(0) = 0, so 0 is a stationary inflection point.
(ii) The first and second derivatives of g are given by

g′(x) = 4x³ and g″(x) = 12x².

Hence, x = 0 is the only stationary point of g and we also see that g″(0) = 0.
The sign of g″ behaves over R as shown below:

    −∞ < x < 0    0 < x < ∞
         +             +

Since g″(x) does not change sign at 0, we conclude that 0 is not an inflection point.

5.7 Exercises for self study


Exercise 5.7.1 Consider the function f : R → R defined by

    f(x) =  1 + x,    x ≤ 1
            4 − 2x,   x > 1

(i) Sketch the graph of f . Is the point x0 = 1 a local maximum of f ? Is this point a strict
local maximum of f ?
(ii) By considering the left and right derivatives of f at x0 = 1, decide whether the point
x0 = 1 is a stationary point of f .

Exercise 5.7.2 Consider the function f : R → R given by

f (x) = x4 − 4x3 + 4x2 .

(i) Find f′ and f″ and determine how the sign of each of these derivatives behaves over
the domain of f.
(ii) Identify and classify the stationary points of f using either the first-derivative test or
the second-derivative test.

Exercise 5.7.3 Consider the function f : R → R given by

f (x) = x3 − 3x.

(i) Find the stationary points of f (x). For each stationary point, determine whether it
is a local maximum, a local minimum or a stationary inflection point by using the first-
derivative test.
(ii) Confirm your classification of each stationary point by using the second-derivative test.
(iii) Find the largest open intervals of R where the function f is strictly increasing, strictly
decreasing, convex, and concave.

Exercise 5.7.4 Consider the function g : R → R defined by

    g(x) =  1,             x < 2
            1 + (x − 2)²,  x ≥ 2

(i) Sketch the graph of g. On the basis of your graph: Is g a convex function on R? Is the
point x0 = 2 a local minimum of g? Is this point a strict local minimum of g?
(ii) By considering the left and right derivatives of g at x0 = 2, decide whether the point
x0 = 2 is a stationary point of g.

5.8 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
The only relevant section of our Calculus Textbook is section 4.1.

6 One-Variable Calculus, Part 4 of 7


6.1 Asymptotes and graph sketching
Consider a function f : A → R where the set A ⊆ R is either an interval I or a finite union
I1 ∪ I2 ∪ · · · ∪ In of disjoint intervals {In }, n ∈ N. Assume that f is twice differentiable
in the interior of each of these intervals. The following methodology aims to facilitate the
sketching of the corresponding graph:

(i) Identify each point b ∈ R which is an endpoint of an interval in the collection {In }.
Do this regardless of whether the point b belongs to A or not. For example, if
A = (−8, 5], the points of interest are −8 and 5. If A = (−∞, 3) ∪ (3, ∞), the
relevant point is 3. If b belongs to A, evaluate f at b; otherwise, investigate how f
behaves as x approaches b.
(ii) Find all the points of intersection of the graph of f with the coordinate axes.
(iii) Find f′ and hence any stationary points of f. Also investigate how the sign of f′ varies
over A in order to identify subsets of A where f is increasing or strictly increasing
and subsets of A where f is decreasing or strictly decreasing. Note that this process
amounts to the first-derivative test for classifying stationary points.
(iv) In addition, find f″ and investigate how its sign varies over A in order to identify
subsets of A where f is convex and subsets of A where f is concave. Hence find any
points of inflection.
(v) If A extends to ∞ or −∞, investigate how f behaves for large positive or large
negative values of x. Then combine the above findings in order to sketch the graph
of f .

Remark 6.1.1 Some of these steps may be omitted in practice in order to reduce the time
spent on sketching a graph. However, following the entire procedure as laid out above is a
good way of gaining affinity with the relevant concepts. With this in mind, the following
two examples go through all these steps.
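Although outside the syllabus, the computational parts of steps (i)-(v) can be delegated to a computer algebra system. The Python sketch below, assuming the sympy library, carries them out for the function of Example 6.1.2 which follows.

import sympy as sp

x = sp.symbols('x', real=True)
f = 2*x - 1/x                                # the function of Example 6.1.2

# (i) behaviour near the excluded point 0, and (v) behaviour at +/- infinity
print(sp.limit(f, x, 0, dir='-'))            # oo
print(sp.limit(f, x, 0, dir='+'))            # -oo
print(sp.limit(f, x, sp.oo), sp.limit(f, x, -sp.oo))

# (ii) x-intercepts
print(sp.solve(sp.Eq(f, 0), x))              # [-sqrt(2)/2, sqrt(2)/2]

# (iii)/(iv) derivatives, stationary points and inflection candidates
d1, d2 = sp.diff(f, x), sp.diff(f, x, 2)
print(d1, sp.solve(sp.Eq(d1, 0), x))         # 2 + 1/x**2, no stationary points
print(d2, sp.solve(sp.Eq(d2, 0), x))         # -2/x**3, no inflection points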
Example 6.1.2 Let us apply this methodology to the function

f : (−∞, 0) ∪ (0, ∞) → R given by f(x) = 2x − 1/x.
(i) We are interested in the point 0 ∈ R which does not belong to the domain of f . As x
approaches 0 from the left, that is, as
x → 0− ,
we see that f (x) is positive and large:
f (x) → ∞.
As x approaches 0 from the right, that is, as
x → 0+ ,
we see that f (x) is negative and large:
f (x) → −∞.
We say that the graph of f has a vertical asymptote at 0. This is the line with Cartesian
equation x = 0.
(ii) Given that 0 ∈ R does not belong to the domain of f , no y-intercept exists for this
function. In order to see if f has any x-intercepts, we solve the equation

2x − 1/x = 0.
This is equivalent to the quadratic equation

2x² − 1 = 0,

which implies that there are two x-intercepts; namely the points with Cartesian coordinates
(−1/√2, 0) and (1/√2, 0).

(iii) The derivative of f is the function

f′(x) = 2 + 1/x².

Since f′(x) > 0 for all x in the domain of f, there are no stationary points.
The way in which the sign of f′ behaves over the disjoint intervals (−∞, 0) and (0, ∞)
comprising the domain of f is recorded in the table below:

    x < 0    x > 0
      +        +

Thus, in each of these intervals, f is strictly increasing. Recall that there is a vertical
asymptote separating the corresponding pieces of the graph of f.
(iv) The second derivative of f is the function

f″(x) = −2/x³.

Since f″(x) ≠ 0 for all x in the domain of f, there are no inflection points.
The way in which the sign of f″ behaves over the intervals (−∞, 0) and (0, ∞) is recorded
below:

    x < 0    x > 0
      +        −

It follows that f is convex to the left of the vertical asymptote x = 0 and concave to the
right of this asymptote.
Note that although f″(x) changes sign at 0, there is no inflection point at 0 because 0 is
not in the domain of f.
(v) Finally, we have that
f (x) → ∞ as x → ∞ and f (x) → −∞ as x → −∞.

Taking a closer look at the way in which f(x) behaves for large positive and large negative
values of x, we observe the following:

2x − 1/x < 2x and 2x − 1/x ≈ 2x as x → ∞

and

2x − 1/x > 2x and 2x − 1/x ≈ 2x as x → −∞.
This means that the graph of f approaches the line y = 2x from below as x → ∞ and from
above as x → −∞. The line y = 2x is called a slant asymptote of the graph of f .
The relevant graph is presented below:

Figure 6.1.3

Example 6.1.4 Let us sketch the graph of the polynomial function

f : R → R defined by f (x) = x4 − 4x2 .

(i) The domain of f has no endpoints, so we proceed to step (ii).


(ii) For the y-intercept, we set x = 0 and obtain the point with Cartesian coordinates

(0, 0).

This is also an x-intercept. In order to find if additional x-intercepts exist, we solve the
equation
x4 − 4x2 = 0.
We factorise the left hand side completely,

x4 − 4x2 ≡ x2 (x2 − 4) ≡ x2 (x + 2)(x − 2),

and obtain two additional x-intercepts; namely the points with Cartesian coordinates

(−2, 0) and (2, 0).

(iii) The derivative of f is the function

f′(x) = 4x³ − 8x.

Any stationary point of f satisfies the equation

4x3 − 8x = 0,

which is factorised as follows:

4x(x + √2)(x − √2) = 0.

Thus, there are three stationary points whose y-coordinates can be found by using the
relation y = x⁴ − 4x². These are:

(−√2, −4), (0, 0) and (√2, −4).

The sign of f′ varies over the domain R in the following way:

    x < −√2    −√2 < x < 0    0 < x < √2    x > √2
       −            +              −            +

It follows that f is decreasing to the left of (−√2, −4), increasing between (−√2, −4)
and (0, 0), decreasing between (0, 0) and (√2, −4) and increasing to the right of (√2, −4).
Hence,

(−√2, −4) is a local minimum, (0, 0) is a local maximum and (√2, −4) is a local minimum.

(iv) The second derivative of f is the function

f″(x) = 12x² − 8,

so any inflection point of f satisfies the equation

12x² − 8 = 0.

It follows that there are two inflection points whose y-coordinates are obtained by using
y = x⁴ − 4x². These are:

(−√(2/3), −20/9) and (√(2/3), −20/9).

The sign of f″ varies over the domain R in the following way:

    x < −√(2/3)    −√(2/3) < x < √(2/3)    x > √(2/3)
         +                   −                   +

Thus, f is convex to the left of (−√(2/3), −20/9), concave between (−√(2/3), −20/9) and
(√(2/3), −20/9), and convex to the right of (√(2/3), −20/9).

(v) Finally, we have that


f (x) → ∞ as x → ±∞.
More specifically, for large positive and large negative values of x, we see that the dominant
term of f is x4 .
The graph of f follows:

Figure 6.1.5
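For completeness, a graph like Figure 6.1.5 can be reproduced numerically. The Python sketch below assumes the numpy and matplotlib libraries and is not part of the course material.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2.5, 2.5, 400)
plt.plot(x, x**4 - 4*x**2)
plt.axhline(0, linewidth=0.5)                  # x-axis
plt.axvline(0, linewidth=0.5)                  # y-axis
# mark the three stationary points found in step (iii)
for a in (-np.sqrt(2), 0.0, np.sqrt(2)):
    plt.plot(a, a**4 - 4*a**2, 'o')
plt.title('f(x) = x^4 - 4x^2')
plt.show()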

6.2 Conics in standard position and orientation


In three-dimensional space, a conic section or, more briefly, a conic is a curve which
arises as the intersection of a certain conical surface with a plane. Within the context
of the two-dimensional space R2 , it turns out that any such curve can be described by a
Cartesian equation of the form

Ax2 + Bxy + Cy 2 + Dx + Ey + F = 0

for some given real numbers A, B, C, D, E and F .


We will restrict attention to the so-called non-degenerate conics. These consist of the
parabola, the circle, the ellipse and the hyperbola. Moreover, for the time being, we
will only describe conics in standard position and orientation. This qualification
implies a particular position and orientation with respect to our Cartesian coordinate
system.
Parabola in Standard Position and Orientation: This is a conic described by a
Cartesian equation of the simple quadratic form

x2 = ay

for some real number a ≠ 0:

Figure 6.2.1

Circle in Standard Position and Orientation: This is a conic described by a Cartesian


equation of the form
x2 + y 2 = r 2
for some real number r > 0:

Figure 6.2.2

Ellipse in Standard Position and Orientation: This is a conic described by a Cartesian
equation which has the form of a sum of squares:

x²/a² + y²/b² = 1

for some real numbers a > 0 and b > 0:

Figure 6.2.3

Note that if a = b we recover the equation of a circle of radius r = a = b in standard


position and orientation.
Hyperbola in Standard Position and Orientation: This is a conic described by a
Cartesian equation which has the form of a difference of squares:

x²/a² − y²/b² = 1

for some real numbers a > 0 and b > 0.
Its graph follows:

Figure 6.2.4

The lines y = (b/a)x and y = −(b/a)x are the slant asymptotes of the hyperbola.
Remark 6.2.5 A quick way of deriving the Cartesian equations of these asymptotes from
the Cartesian equation of the hyperbola is by factorising the difference of squares on the
left hand side:

(x/a + y/b)(x/a − y/b) = 1.

Now notice that if either of these factors became zero at some point (x, y), the above equation
would be violated. Hence, no point (x, y) on the hyperbola can satisfy

x/a ± y/b = 0,

which implies that the lines described by the Cartesian equations

y = ±(b/a)x

can never intersect the graph of the hyperbola. These lines are indeed the slant asymptotes
of this graph.
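A computer algebra system can confirm this asymptotic behaviour: the vertical gap between the upper branch of the hyperbola and the line y = (b/a)x tends to zero. The Python sketch below assumes sympy and is outside the syllabus.

import sympy as sp

x = sp.symbols('x', positive=True)
a, b = sp.symbols('a b', positive=True)

# upper branch of x^2/a^2 - y^2/b^2 = 1, solved for y
y_branch = b * sp.sqrt(x**2 / a**2 - 1)

# vertical distance to the candidate asymptote y = (b/a) x
gap = y_branch - (b / a) * x
print(sp.limit(gap, x, sp.oo))    # prints 0, confirming the slant asymptote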

Remark 6.2.6 Later in the course, we will discuss conics with a general orientation with
respect to the Cartesian axes. For now, it suffices to give a few examples of conics that are
not in standard position or orientation. These examples are by no means exhaustive:

Example 6.2.7 A circle of radius r > 0 centred at a point (x₀, y₀) ≠ (0, 0) is described
by the Cartesian equation

(x − x₀)² + (y − y₀)² = r².
The illustration below corresponds to the case where x0 > 0 and y0 > 0:

Figure 6.2.7

Example 6.2.8 The conic described by the Cartesian equation

(x − x₀)²/a² + (y − y₀)²/b² = 1

for some real numbers a > 0, b > 0 and (x₀, y₀) ≠ (0, 0) is an ellipse. It is in standard
orientation with respect to the axes but is translated by x₀ in the x-direction and by y₀ in
the y-direction. In the graph below, x₀ > 0 and y₀ > 0:

Figure 6.2.8

Example 6.2.9 For (x₀, y₀) ≠ (0, 0), the parabola

x − x₀ = (y − y₀)²

is not in standard position and orientation. The same is true for the hyperbola

xy = 3.

The graphs of these conics follow. The graph of the parabola is illustrated using x0 > 0
and y0 > 0:

Figure 6.2.9

Note that instead of slant asymptotes, the hyperbola has a vertical and a horizontal asymp-
tote. This is due to its non-standard orientation.

6.3 Global extrema


Consider a function f : A → R defined on a set A ⊆ R. The set A is assumed to be either
an interval I or a finite union I1 ∪ I2 ∪ ... ∪ In of disjoint intervals {In }, n ∈ N.

A global maximum of f is a point a ∈ A such that f(a) ≥ f(x) for all x ∈ A.

A strict global maximum of f is a point a ∈ A such that f(a) > f(x) for all x ∈ A such
that x ≠ a.

Similarly, a global minimum of f is a point a ∈ A such that f(a) ≤ f(x) for all x ∈ A.

A strict global minimum of f is a point a ∈ A such that f(a) < f(x) for all x ∈ A such
that x ≠ a.

Global maxima and minima will be collectively referred to as global extrema. We will
also use the term strict global extrema where appropriate. A strict global extremum is
necessarily a global extremum but the converse may not be true. Some typical examples
of strict and non-strict global extrema are illustrated below:

Figure 6.3.1

The first graph depicts a function which has two non-strict global maxima. Note that they
are both strict local maxima. The second and third graphs show a strict and a non-strict
global minimum, respectively.

Example 6.3.2 Consider the functions (i) f : R → R given by f (x) = |x|, (ii) g :
(−1, 3] → R given by g(x) = x(x − 1)(x − 2) and (iii) h : R → R given by h(x) = 5. Let us
identify any global maxima and minima for each of these functions based on the following
graphs:

(i) The function f has no global maximum but it has a global minimum at x = 0. This is
a strict global minimum.

(ii) The function g has a global maximum at the right endpoint of its domain; that is, at
x = 3. This is a strict global maximum. On the other hand, g has no global minimum.

(iii) Every point x is both a global minimum and a global maximum of h. However, none
of these points is a strict global extremum.

Remark 6.3.3 The definition of a local extremum given in subsection 5.2 refers to an
open interval and hence can only be applied to an interior point of the domain A of f .
Given that a global extremum of f can also occur at an endpoint of A, the need arises to
introduce the corresponding local concept. This is essential in order to avoid the awkward
situation of a function having a global extremum at an endpoint of its domain which — in
the absence of the relevant concept — cannot be called a local one.

To this end, let I be any one of the disjoint intervals that may comprise the domain A of
f: An endpoint b of I is a local maximum of f if b belongs to I and

f(b) ≥ f(x)

for all x ‘sufficiently near’ b. For a left endpoint b ∈ I, the qualification ‘sufficiently near’
means that there exists a real number δ > 0 such that f(b) ≥ f(x) for all x in the subset
[b, b + δ) of I. For a right endpoint b ∈ I, the subset [b, b + δ) needs to be replaced by
(b − δ, b].

Similarly, an endpoint b of I is a strict local maximum of f if b belongs to I and

f(b) > f(x)

for all x ‘sufficiently near’ b such that x ≠ b. For a left endpoint b ∈ I, this means that
there exists a real number δ > 0 such that f(b) > f(x) for all x in the subset (b, b + δ) of
I. For a right endpoint b ∈ I, the subset (b, b + δ) needs to be replaced by (b − δ, b).

By reversing the direction of the inequalities ‘f(b) ≥ f(x)’ and ‘f(b) > f(x)’ in the above
definitions, we obtain corresponding definitions for a local minimum and a strict local
minimum that occur at an endpoint b of I.

These concepts are illustrated graphically below:

Figure 6.3.4

The first graph depicts a strict local maximum which occurs at the left endpoint of the
domain. This is not a global maximum. The second graph shows a non-strict local max-
imum (which is also a non-strict local minimum!) at the right endpoint of the domain.
This point is a non-strict global maximum. The third graph shows a strict local minimum
at the left endpoint of the domain. This is a global minimum which is not a strict global
minimum.

6.4 Optimisation
Consider the problem of finding the maximum value of a function f on an interval I ⊆ R.
This interval may be open, closed, or neither.

In general, our maximisation problem may not have a solution because, simply, a global
maximum may not exist. Even if a global maximum exists, the problem of identifying
where it occurs may not be straightforward.

However, if f is differentiable in the interior of the interval I, then the places at which
a global maximum can occur are limited considerably. The following argument explains
why:

Recall from Theorem 5.3.2 that if a function is differentiable on an open interval, then a
local maximum can only occur at a stationary point of this function. In the case we are
discussing here, the interior of the interval I is by definition an open interval so, given
that f has been assumed differentiable in this interior, the conditions of Theorem 5.3.2 are
satisfied. This theorem implies that among points in the interior of I, the stationary points
of f are the only candidates for a local maximum. Moreover, since a global maximum is
also a local maximum, it follows from Theorem 5.3.2 that if f admits a global maximum
and this maximum occurs in the interior of I, then it has to be among the stationary points
of f . If, on the other hand, f admits a global maximum and this maximum does not occur
in the interior of I, then the only other places where it can occur are those endpoints of
I that belong to I. Hence, we have deduced that if f admits a global maximum, the only
places where this can occur are the stationary points of f in the interior of I and those
endpoints of I that belong to I.

In the more general case where the domain A of a function f : A → R is a finite union of
disjoint intervals {In } and f is differentiable in the interior of each In , the above conclusion
still holds: if f admits a global maximum, the only places where this can occur are the

stationary points of f in the interior of the intervals {In } as well as any endpoints of these
intervals that belong to A.

This result is also applicable to the problem of minimising f : A → R. In fact, it should
be noted that minimising f on A is equivalent to maximising −f on A. Maximisation and
minimisation problems are collectively known as optimisation problems.

The steps required for dealing with a general optimisation problem are summarised below.
Let us emphasise that this methodology relies on the assumption that the function f is
differentiable in the interior of the disjoint intervals {In } that comprise its domain A.

Step 1: Find the stationary points of f in the interior of each interval In by solving the
equation f′(x) = 0. These points are candidates for a global extremum.

Step 2: Identify all the endpoints of the disjoint intervals {In }. Among these endpoints,
those that belong to A are candidates for a global extremum.

Step 3: If each disjoint interval In contains both its endpoints — that is, if each such
interval is a closed set of the form [a, b] — then a global extremum necessarily exists on
A and is among the points found in Steps 1 and 2. The proof of this statement is not
part of our syllabus but it can be established using Theorem 6.4.4 below. On the other
hand, if at least one of the intervals In does not have the form [a, b] — that is, if it has the
form (a, b] or [a, b) or (a, b) or (−∞, b] or (−∞, b) or [a, ∞) or (a, ∞) or (−∞, ∞) — then
there is no guarantee that a solution to the optimisation problem exists. We may then
need to employ ad hoc methods in order to decide whether an optimal solution exists or
not. Ideally, sketching the graph of f will settle this issue; however, this task may be too
complicated, in which case considering the sign of the derivative f′ may be of considerable
help. In any case, if a global extremum exists, then it must be among the candidates found
in Steps 1 and 2.
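Steps 1 and 2 also lend themselves to automation. The Python sketch below, which assumes sympy and is not part of the course, collects the candidates for a global extremum; global_max_candidates is a hypothetical helper written only for this illustration.

import sympy as sp

x = sp.symbols('x', real=True)

def global_max_candidates(expr, intervals):
    # Each interval is (left, right, left_included, right_included).
    # Step 1: stationary points in the interior; Step 2: included endpoints.
    candidates = []
    d1 = sp.diff(expr, x)
    for left, right, left_in, right_in in intervals:
        for s in sp.solve(sp.Eq(d1, 0), x):
            if s.is_real and left < s < right:
                candidates.append(s)
        if left_in:
            candidates.append(left)
        if right_in:
            candidates.append(right)
    return [(c, expr.subs(x, c)) for c in candidates]

# Candidates and values for f(x) = x^3 - 3x on the closed interval [-2, 1]
print(global_max_candidates(x**3 - 3*x, [(-2, 1, True, True)]))

Since [−2, 1] is closed, Theorem 6.4.4 below guarantees that the global extrema exist and lie among the printed candidates.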

Remark 6.4.1 The following theorems may be useful for certain types of optimisation
problems. If they are applicable, they guarantee the existence of at least one kind of global
extremum. The proof of these theorems goes beyond the scope of our course.

Theorem 6.4.2 If f : A → R is a concave differentiable function on an open set A and a
is a stationary point of f, then a is a global maximum of f on A.

Theorem 6.4.3 If f : A → R is a convex differentiable function on an open set A and a is
a stationary point of f, then a is a global minimum of f on A.

Theorem 6.4.4 If f : A → R is a continuous function on a closed set A = [a, b], then both
a global maximum and a global minimum exist on A.

The reason that we require the set A to be open in Theorems 6.4.2 and 6.4.3 is that we
have defined differentiability only on open sets. Typical cases where theorems 6.4.2, 6.4.3
and 6.4.4 are relevant are illustrated below:

Figure 6.4.5

Example 6.4.6 Consider the function f : [−1, ∞) → R defined by

    f(x) =  x²,      x ∈ [−1, 2]
            x + 2,   x ∈ (2, 4]
            −1/x,    x ∈ (4, ∞)

Without relying on the graph of f , determine whether f has any global extrema on its
domain [−1, ∞). If it does, find where.

Given the definition of f, it is natural to regard its domain A = [−1, ∞) as the union
I1 ∪ I2 ∪ I3 of the disjoint intervals I1 = [−1, 2], I2 = (2, 4] and I3 = (4, ∞). Clearly, f
is differentiable in the interior of these intervals, so we can apply the methodology given
above.

Step 1: The derivative of f in the interior (−1, 2) of the interval I1 = [−1, 2] is

f′(x) = (x²)′ = 2x.

It follows that f has a stationary point at x = 0, which is our first candidate for a global
extremum.

The derivative of f in the interior (2, 4) of the interval I2 = (2, 4] is

f′(x) = (x + 2)′ = 1.

This can never be zero, so there are no stationary points — and hence no candidates for a
global extremum — in the interior of I2.

Finally, the derivative of f in the open interval I3 = (4, ∞) is

f′(x) = 1/x².

This is always greater than zero, so there are no stationary points — and hence no candi-
dates for a global extremum — in the interior of I3, either.

Step 2: We identify the endpoints of the intervals I1 , I2 and I3 that belong to A = [−1, ∞).
These are the points −1, 2 and 4. Together with the stationary point 0, they are the only
candidates for a global extremum. Let us evaluate f at each of these candidates in order
to eliminate some possibilities. We find:

f (−1) = 1, f (0) = 0, f (2) = 4, f (4) = 6.

Step 3: These values imply that a global maximum of f can only occur at the point x = 4.
Similarly, a global minimum of f can only occur at the point x = 0. Unfortunately, the
intervals I1 , I2 and I3 are not all closed intervals of the form [a, b], so we have no guarantee
that global extrema actually exist for our optimisation problem. In addition, we realise
that none of the theorems 6.4.2 - 6.4.4 is directly applicable, so we need to improvise.

For example, we observe that f(x) = −1/x is negative on I3 = (4, ∞). This means that the
point x = 0 (at which f takes the value zero) cannot be a global minimum. Since this point
is the only candidate for a global minimum, we conclude that f has no global minimum.
Regarding a global maximum, we observe that the maximum value of f(x) = −1/x on the
interval I3 = (4, ∞) cannot exceed zero. This means that the point x = 4 (at which f takes
the value six) remains a candidate for a global maximum. Similarly, we observe that the
maximum value of f(x) = x² on the interval I1 = [−1, 2] is four, which is less than the value
that f takes at our candidate x = 4. Finally, we are left with the function f(x) = x + 2
on the interval I2 = (2, 4]. Since f is strictly increasing on (2, 4], its maximum value is
attained at the right endpoint of I2. This means that x = 4 is the global maximum of f
and that it is in fact a strict global maximum.

For clarity, the graph of f is sketched below. If we had used this graph from the beginning,
we would have arrived at the solution more quickly. However, following the methodology
and using a few ad hoc arguments also has its merits.

6.5 Exercises for self study
Exercise 6.5.1 Consider the function f : I → R given by

f (x) = x3 − 3x.

Find if global extrema exist and where they occur for each of the following cases:

(i) I = (−∞, 0), (ii) I = (−3, 4], (iii) I = [−2, 1].

Exercise 6.5.2 Find if global extrema exist and where they occur for each of the following
functions. Hint: For the first case below, it is not necessary to find the stationary points
of the function.
f : R → R defined by f (x) = x5 − 3x4 + 7x − 18,
g : R → R defined by g(x) = x4 + 6x2 + 5,
h : [−2, 3] → R defined by h(x) = ex − x.

Exercise 6.5.3 Following the methodology presented in subsection 6.1 of the Lecture
Notes, sketch the graph of the function

f : (−∞, 0) ∪ (0, ∞) → R defined by f(x) = x + 1/(2x).

Exercise 6.5.4 Consider the profit function π : [0, 2] → R defined by

π(x) = R(x) − C(x)

where R(x) = 4x is the revenue function and C(x) = x3 − x2 − 5x is the cost function.
Find the maximum value of π(x) on [0, 2] and the corresponding quantity x.

6.6 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
The material presented in sections 2.14, 4.2 and 4.3 of our Calculus Textbook is relevant.
The exercises found in sections 2.15 (only exercise 7) and 4.5 (up to exercise 8) are optional
but useful.

7 One-Variable Calculus, Part 5 of 7


7.1 The inverse function
Let f : X → Y be a bijective function. The sets X and Y are assumed to be subsets of R.
There is a unique function f −1 : Y → X, called the inverse function of f , which is
defined by the property that

y = f (x) if and only if x = f −1 (y).

This definition is equivalent to saying that the composite function f −1 ◦ f : X → X is the
identity function on X and the composite function f ◦ f −1 : Y → Y is the identity function
on Y ; that is,
f −1 (f (x)) = x for all x ∈ X
and
f (f −1 (y)) = y for all y ∈ Y.

Figure 7.1.1

Example 7.1.2 Consider the function f : [0, 5) → [3, 13) defined by

f (x) = 2x + 3

and let the function g : [3, 13) → [0, 5) be defined by

g(x) = (x − 3)/2.
Show that g is the inverse of f .

Using the definition of an inverse function, we need to prove that the following statements
are both true:
if y = f (x), then x = f −1 (y)
and
if x = f −1 (y), then y = f (x).
Indeed, assuming that y = f(x) = 2x + 3, we solve for x and find that x = (y − 3)/2 = f⁻¹(y).
Similarly, assuming that x = f⁻¹(y) = (y − 3)/2, we solve for y and find that y = 2x + 3 = f(x).
2
Alternatively, we can show that

g(f (x)) = x for all x ∈ [0, 5)

and
f (g(x)) = x for all x ∈ [3, 13).

Indeed, for all x in the domain [0, 5) of f, we have that

g(f(x)) = g(2x + 3) = ((2x + 3) − 3)/2 = x

and, for all x in the domain [3, 13) of g, we have that

f(g(x)) = f((x − 3)/2) = 2·((x − 3)/2) + 3 = x.

If the graphs of f : X → Y and f −1 : Y → X are sketched on the same Cartesian plane


R2 , they are reflections of each other in the diagonal line through the origin. In order to
understand why this property arises, let us introduce Cartesian coordinates (s, t) on R2
and let us denote the elements of the sets X and Y by x and y, respectively. The above-
mentioned symmetry is a consequence of the fact that (by the very definition of an inverse
function) a point (s, t) = (x, y) is on the graph of f if and only if (s, t) = (y, x) is on the
graph of f −1 .

As an illustration, consider a random bijective function f : [1, 4] → [5, 6] and its inverse
f −1 : [5, 6] → [1, 4]:

Figure 7.1.3

7.2 Some elementary bijective functions and their inverses


Consider a bijective function f : X → Y defined by an expression of the form f (x). As
we saw in Example 7.1.2, in order to find an expression f −1 (y) for the inverse function
f −1 : Y → X, we simply need to make x the subject of the equation y = f (x). Then, we
can relabel the general element of the domain of f −1 and denote it by x, as it is common.

Below, we apply this approach to some elementary bijective functions in order to obtain
explicit expressions for their inverses.

Linear Functions Consider the linear function f : R → R defined by the rule

f(x) = ax

where a ≠ 0 is a real constant. By making x the subject of y = ax we find that

x = y/a.

Hence, after relabelling the general element of its domain, the inverse function f⁻¹ : R → R
is given by the rule

f⁻¹(x) = x/a.
The particular case a = 2 is illustrated below:

Figure 7.2.1

Affine Functions Consider the so-called affine function f : R → R defined by

f(x) = ax + b

where a ≠ 0 and b are real constants. By making x the subject of y = ax + b we find that

x = (y − b)/a.

Hence, the inverse function f⁻¹ : R → R is given by

f⁻¹(x) = (x − b)/a.
The graph corresponding to the case a = 2, b = 1 follows:

Figure 7.2.2

Note that a linear function can be considered as a special case of an affine function, where
b = 0. In other words, the graph of a linear function is a line that goes through the origin
and that of an affine function is a line that has a non-zero y-intercept (0, b), in general.

Odd Positive Integer Powers Consider the function f : R → R defined by

f(x) = xⁿ

where n is an odd natural number. By making x the subject of y = xⁿ we find that

x = y^(1/n).

Hence, the inverse function f⁻¹ : R → R is described by the expression

f⁻¹(x) = x^(1/n).

The case n = 3 is illustrated below:

Figure 7.2.3

Even Positive Integer Powers of Non-Positive Inputs Consider the function
f : (−∞, 0] → [0, ∞) defined by

f(x) = xⁿ

where n is an even natural number. Note that f is bijective. By making x the subject of
y = xⁿ we obtain two possible expressions:

x = ±y^(1/n).

However, the fact that the inverse of a bijective function is unique implies that only one of
these expressions is correct. Indeed, before relabelling, we observe that x ≤ 0. This means
that

x = −y^(1/n).

We deduce that the inverse function f⁻¹ : [0, ∞) → (−∞, 0] is given by

f⁻¹(x) = −x^(1/n).

The case n = 2 results in the following graph:

Figure 7.2.4

Even Positive Integer Powers of Non-Negative Inputs Consider the function
f : [0, ∞) → [0, ∞) defined by

f(x) = xⁿ

where n is an even natural number. As in the previous case, f is bijective. By making x
the subject of y = xⁿ we obtain

x = ±y^(1/n).

We now observe that x ≥ 0. This implies that

x = y^(1/n).

Therefore, the inverse function f⁻¹ : [0, ∞) → [0, ∞) is given by

f⁻¹(x) = x^(1/n).

The case n = 2 is illustrated below:

Figure 7.2.5

Exponential functions Consider the function expb : R → (0, ∞) defined by

expb(x) = bˣ

where b is a real number strictly greater than 1. We will call expb the exponential
function with base b. The base b need only be greater than 0 and not equal to 1, but it
suffices to impose the stricter condition b > 1.

As an optional exercise, you may find it useful to investigate why the condition b > 1 is
reasonable. In particular, check what happens in the cases where (i) b < 0, (ii) b = 0 and
(iii) b = 1. Then note that if 0 < b < 1, we have that b = q⁻¹ for some q > 1, which
implies that bˣ = q⁻ˣ. What does this tell us about the relationship between the graphs of
the functions expb(x) with 0 < b < 1 and expq(x) with q > 1?

The inverse function


logb : (0, ∞) → R
of the exponential function expb : R → (0, ∞) is called the logarithmic function with
base b, where b > 1.

Being an inverse function of expb , the function logb satisfies the following properties:

logb (expb (x)) = x for all x ∈ R


and
expb (logb (x)) = x for all x ∈ (0, ∞).

The graphs of expb and logb are shown below for the case b = 2:

Figure 7.2.6

Note that any point (x, y) that satisfies the logarithmic equation y = log2 (x) also satis-
fies the exponential equation exp2 (y) = x. Similarly, any point (x, y) that satisfies the
exponential equation y = exp2 (x) also satisfies the logarithmic equation log2 (y) = x.

Several additional properties follow from the definition logb (expb (x)) = x. The most im-
portant of these properties are listed below and should be memorised:

(i) special values: logb(b) = 1 and logb(1) = 0,

(ii) product: logb(y₁y₂) = logb(y₁) + logb(y₂),

(iii) quotient: logb(y₁/y₂) = logb(y₁) − logb(y₂),

(iv) power: logb(yⁿ) = n·logb(y),

(v) change of base: logb(y) = logc(y)/logc(b).

The proofs of these statements are omitted.
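Although the proofs are omitted, properties (ii)-(v) are easy to spot-check numerically. The Python sketch below uses only the standard library's math module and is not part of the course; the chosen values of b, c, y1, y2 and n are arbitrary.

import math

b, c, y1, y2, n = 2.0, 10.0, 5.0, 7.0, 3

# Numerical spot checks of the product, quotient, power and change-of-base rules
print(math.isclose(math.log(y1 * y2, b), math.log(y1, b) + math.log(y2, b)))
print(math.isclose(math.log(y1 / y2, b), math.log(y1, b) - math.log(y2, b)))
print(math.isclose(math.log(y1**n, b), n * math.log(y1, b)))
print(math.isclose(math.log(y1, b), math.log(y1, c) / math.log(b, c)))

All four lines print True.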

Trigonometric functions To complete this subsection, let us discuss the trigonometric
functions sin, cos and tan and their inverses. Consider the bijective function

sin : [−π/2, π/2] → [−1, 1]

obtained by restricting the domain of sin to the subset [−π/2, π/2] ⊂ R. The graph of
sin : [−π/2, π/2] → [−1, 1] and that of its inverse function

arcsin : [−1, 1] → [−π/2, π/2]

are illustrated below:

Figure 7.2.7
Also consider the bijective function
cos : [0, π] → [−1, 1]
and its inverse function
arccos : [−1, 1] → [0, π].
Their graphs are sketched below:

Figure 7.2.8
Finally, consider the bijective function

tan : (−π/2, π/2) → R

and its inverse function

arctan : R → (−π/2, π/2).
The graphs of these functions can be found in our Calculus Textbook - see subsection 7.6
of the lecture notes for the relevant references.

7.3 The derivative of an inverse function
Let f : X → Y be a differentiable bijective function described by an expression of the form
y = f(x). Our task is to find an expression for the derivative

d f⁻¹(y)/dy

of the inverse function f⁻¹ : Y → X.

Ideally, we can make x the subject of y = f (x) in order to obtain an expression


x = f −1 (y) for the inverse function and then differentiate this expression directly. However,
the function x = f −1 (y) may not be among our set of elementary functions, in which case
we are lacking the appropriate rule of differentiation.

An alternative approach utilises the fact that the composite function f −1 ◦ f is the identity
function on X; that is,
f −1 (f (x)) = x for all x ∈ X.
Hence, using the chain rule and differentiating both sides of this equation with respect to
x, we obtain the following result:

( d f⁻¹(y)/dy ) · ( d f(x)/dx ) = 1.

Hence, we have that

d f⁻¹(y)/dy = 1 / ( d f(x)/dx ),

with the understanding that the expression on the right hand side should be expressed in
terms of y by using the relation x = f⁻¹(y). A common way of writing the above relation
is

dx/dy = 1 / ( dy/dx ).

Remark 7.3.1 Note that the derivative of f −1 (y) at a given point y0 is well-defined only
if the derivative of f (x) at the corresponding point x0 = f −1 (y0 ) is non-zero.

Example 7.3.2 The derivative

f′(x) = 3x²

of the bijective function

y = f(x) = x³ + 1

vanishes at the point x = 0. This means that the inverse function

x = f⁻¹(y) = (y − 1)^(1/3)

is not differentiable at the corresponding point y = f(0) = 1. Indeed, the derivative

(f⁻¹)′(y) = (1/3)(y − 1)^(−2/3)
is not defined at y = 1.

Remark 7.3.3 Graphically, the fact that the derivative of f −1 at y0 is the reciprocal of
the derivative of f at x0 = f −1 (y0 ) can be understood as follows: The graphs of f and
f −1 are reflections of each other in the diagonal line through the origin of the Cartesian
plane. Hence, if the slope of the graph of f at a point (x0 , y0 ) is m, the slope of the graph
of f −1 at the corresponding point (y0 , x0 ) is 1/m. In order to convince yourself that this
is indeed the case, you may sketch on the same Cartesian plane the graph of a random
bijective function f and that of its inverse f −1 and compare their slopes at corresponding
points.

Example 7.3.4 Let us apply the above-mentioned result to the functions (i) g : (0, ∞) →
R defined by

g(x) = loge(x) = ln(x)

and (ii) h : R → (−π/2, π/2) defined by

h(x) = arctan(x)

in order to obtain expressions for their derivatives:

(i) For the first function, we let

y = g(x) = ln(x)

and solve this equation for x in order to obtain an expression for its inverse function:

x = g⁻¹(y) = e^y.

Using the result that

dy/dx = 1 / ( dx/dy ),

we find that the derivative dy/dx is given by

(ln(x))′ = 1/e^y.

Finally, we need to express the right hand side in terms of x. Given that e^y = x, we see
that

(ln(x))′ = 1/x.

This is indeed what we would have obtained had we used the differentiation rule for ln(x)
directly.

(ii) For the second function, we let

y = h(x) = arctan(x)

and solve this equation for x in order to obtain the expression

x = h−1 (y) = tan(y)

for the inverse function. Using the fact that

dy/dx = 1 / ( dx/dy ),

we find the following expression for the derivative dy/dx:

(arctan(x))′ = 1/sec²(y).

Finally, we need to express the right hand side in terms of x. Instead of replacing y by
arctan(x) to obtain the rather complicated expression

(arctan(x))′ = 1/sec²[arctan(x)],

we follow a different approach: We use the trigonometric identity

sec²(y) = 1 + tan²(y)

and the relation x = tan(y) to obtain the simpler expression

(arctan(x))′ = 1/(1 + tan²(y)) = 1/(1 + x²).

This result should be memorised and added to our list of elementary derivatives.
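The reciprocal rule behind this calculation can also be verified symbolically: differentiating tan, taking the reciprocal and substituting y = arctan(x) reproduces the derivative of arctan. The Python sketch below assumes sympy and is outside the syllabus.

import sympy as sp

x, y = sp.symbols('x y', real=True)

direct = sp.diff(sp.atan(x), x)                           # 1/(1 + x**2)
via_inverse = (1 / sp.diff(sp.tan(y), y)).subs(y, sp.atan(x))
print(sp.simplify(direct - via_inverse))                  # prints 0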

7.4 The local inverse of a non-bijective function


In subsection 3.4, we discussed the possibility of restricting the domain X ⊆ R and the
codomain Y ⊆ R of a non-bijective function f : X → Y in order to construct a bijective
function f : A → B, where A ⊆ X and B ⊆ Y . Extending this idea, we can divide
the graph of a non-bijective function f : X → Y into smaller invertible pieces and then
introduce an appropriate inverse function for each of these pieces.

Formally, we say that

f⁻¹ : B → A

is a local inverse for

f : X → Y

if the following statement is true: Provided that x ∈ A and y ∈ B, we have

x = f⁻¹(y) if and only if y = f(x).

This definition implies that both f : A → B and f −1 : B → A are bijective. In addition,


the composite function f −1 ◦ f : A → A is the identity function of the subset A of the
domain of f , and the composite function f ◦ f −1 : B → B is the identity function of the
subset B of the codomain of f . Moreover, if x0 is an interior point of A and y0 = f (x0 ) is
an interior point of B, we say that
f −1 : B → A
is a local inverse for f : X → Y at the point x₀. Note that there are many local
inverses for f : X → Y at such a point x₀. They correspond to different choices of intervals
A ⊆ X containing x₀ as an interior point. Two such choices are illustrated below:

Figure 7.4.1

Example 7.4.2 Consider the function f : R → R defined by

f(x) = e^(x²).

Without sketching the graph of f, find a local inverse for f at the point x₀ = −2.

We note that f(x) is differentiable everywhere on R. Its derivative is given by

f′(x) = 2x·e^(x²).

We also note that f′(x) < 0 for all x < 0 and that f′(x) > 0 for all x > 0. This means
that the function f is strictly decreasing on the interval (−∞, 0) and strictly increasing on
the interval (0, ∞). Hence, the function f is bijective on the interval A = [−3, −1], which
contains x₀ = −2 as an interior point. The corresponding interval B = [e, e⁹] contains
y₀ = f(−2) as an interior point. Of course, other choices of intervals A and B are possible.

In order to obtain an explicit expression for the local inverse f⁻¹ : B → A, we need to
make x the subject of the equation y = f(x) = e^(x²). We find that

x = ±√(ln(y)),

and we also need to satisfy the condition that x should belong to A = [−3, −1]. This
implies that the negative root should be selected. Therefore, the function

f⁻¹ : [e, e⁹] → [−3, −1]

defined by

f⁻¹(y) = −√(ln(y))

is a local inverse for f : R → R at the point x₀ = −2.
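As a quick numerical spot check (outside the syllabus), composing the local inverse with f at points inside A = [−3, −1] should return those points. The Python sketch below uses only the math module.

import math

f = lambda t: math.exp(t**2)
f_inv = lambda s: -math.sqrt(math.log(s))   # the local inverse found above

# f_inv(f(x)) should reproduce x for x near -2 (here, inside [-3, -1])
for t in (-2.5, -2.0, -1.5):
    print(t, f_inv(f(t)))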

7.5 Exercises for self study
Exercise 7.5.1 (i) Show carefully that the derivative of g(x) = arcsin(x) is given by

g′(x) = 1/√(1 − x²).

(ii) Consider the non-bijective function f : R → R defined by

f(x) = sin(x).

Find a local inverse for f at x₀ = 2π/3.

Hint: The function arcsin is not a local inverse of sin at x₀ = 2π/3.

Exercise 7.5.2 Consider the function f : R → R defined by

f (x) = x2 + 4x + 7.

Sketch the graph of f and hence find a local inverse for f at the point x0 = −4.

Exercise 7.5.3 (i) Show carefully that the derivative of f(x) = arccos(x) is given by

f′(x) = −1/√(1 − x²).

(ii) For a > 1, find the derivative of g(x) = ax .


Hint: Express g as a composite function constructed from elementary functions whose
derivatives we know.
(iii) Using your answer to part (ii), find the derivative of g −1 (x) = loga (x).

Exercise 7.5.4 Let x denote the quantity of a product and let p denote the price per
unit required to sell x units of this product. You can assume that x and p are continuous
variables.
(i) Sketch the graph of the demand function D : (0, ∞) → (0, 85) defined by

x = D(p) = 85e^(−6p)

and explain why D is a bijective function. Use the labels (s, t) for your graph; i.e., sketch
t = D(s).
(ii) Find an expression for the inverse demand function p = D−1 (x) and sketch its graph
on the same Cartesian plane as in part (i); i.e., sketch t = D−1 (s).
(iii) Obtain an expression for the revenue function R(x) = xD−1 (x) and maximise this
function on (0, 85). State the quantity xmax and the price pmax at this maximum.

The price elasticity of demand ε(x) is defined by

ε(x) = ( D⁻¹(x)/x ) / (D⁻¹)′(x).

(iv) Verify that the absolute value of ε(xmax ) is equal to 1, where xmax is defined in part
(iii).

7.6 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 2.3, 2.5, 2.6, 2.9 and 7.1 of our Calculus Textbook are relevant. Exercises 1-6
found in section 7.6 are optional.

8 One-Variable Calculus, Part 6 of 7


8.1 Indefinite and definite integrals
Let X and Y be subsets of R and let f : X → Y be a function. An indefinite integral
of f : X → Y is a function

∫ f(x) dx

with the property that its derivative is f(x); that is,

d/dx ( ∫ f(x) dx ) = f(x).

Any such function is frequently denoted by the corresponding capital letter:

F(x) = ∫ f(x) dx.

Remark 8.1.1 Note that there are many indefinite integrals F for a given function f.
This is because the addition of a constant term C ≠ 0 to an indefinite integral F₁ of f
produces another indefinite integral F₂ of f.

Example 8.1.2 The functions F₁(x) = sin(x) and F₂(x) = sin(x) + 3 are both indefinite
integrals for f(x) = cos(x) because

F₁′(x) = F₂′(x) = f(x).

Remark 8.1.3 An alternative terminology for an indefinite integral F of f is an
antiderivative of f or a primitive of f.

Given an interval I ⊆ X whose left and right endpoints are a and b respectively, the
definite integral

∫[a, b] f(x) dx

of f : X → Y is numerically equal to the area bounded by the x-axis, the graph of f and
the vertical lines x = a and x = b, with the understanding that area beneath the x-axis
counts as negative:

Figure 8.1.4

8.2 The fundamental theorem of calculus


Definite and indefinite integrals are linked by the fundamental theorem of calculus,
which is stated without proof. This theorem asserts that

∫[a, b] f(x) dx = F(b) − F(a),

where F is any primitive of f. In other words, it tells us that the area bounded by the
graph of a given function f, the x-axis and the vertical lines x = a and x = b is equal to
F(b) − F(a), where F is any indefinite integral of f; i.e., any function whose derivative is
f.
In fact, the above result is known as the second version of the fundamental theorem of
calculus. An alternative version of this theorem, referred to as the first version, states
that

d/dx ( ∫[a, x] f(t) dt ) = f(x).

In other words, it states that the function F defined by

F(x) = ∫[a, x] f(t) dt

is a primitive of f.
Remark 8.2.1 The first version of the fundamental theorem of calculus tells us that we
may write an indefinite integral of a function f as a definite integral of f with the variable
x being the upper limit and a constant a being the lower limit. Note that the constant a
plays the role of the arbitrary constant of integration.
The following example illustrates both versions of this theorem.
Example 8.2.2 Let f : R → R be defined by

f (x) = x2 .

The first version of the fundamental theorem of calculus states that for a ∈ R,

d/dx ( ∫[a, x] t² dt ) = x².

In order to illustrate this, let us choose t³/3 as a primitive of t². Then we have

d/dx ( ∫[a, x] t² dt ) = d/dx ( x³/3 − a³/3 ) = x²,

in agreement with the first version of the theorem.

Equivalently, the function F defined by

F(x) = ∫[a, x] f(t) dt = ∫[a, x] t² dt = x³/3 − a³/3,

where a is any constant, is a primitive of f(x) = x².

Continuing with this example, consider the interval (1, 2) ⊂ R. The second version of the
fundamental theorem of calculus states that

∫[1, 2] x² dx = F(2) − F(1),

where F is any primitive of f. One such primitive F : R → R is the function

F(x) = x³/3 + 43

since its derivative F′(x) = x² is equal to f(x). Using this particular primitive in the
calculation above, we learn that the area under the graph of f between the vertical lines
x = 1 and x = 2 is equal to

∫[1, 2] x² dx = F(2) − F(1) = (2³/3 + 43) − (1³/3 + 43) = 7/3.

8.3 Primitives of elementary functions


The following primitives should be memorised. Do check that the derivatives of these
primitives are indeed the corresponding integrands; that is, the functions appearing inside
the integrals on the left hand side:
∫ x^α dx = x^(α+1)/(α + 1) + C,   α ≠ −1, x ∈ R,

∫ x⁻¹ dx = ln(x) + C,   x > 0,

∫ x⁻¹ dx = ln(−x) + C,   x < 0,

∫ e^x dx = e^x + C,   x ∈ R,

∫ cos(x) dx = sin(x) + C,   x ∈ R,

∫ sin(x) dx = −cos(x) + C,   x ∈ R,

∫ 1/(x² + 1) dx = arctan(x) + C,   x ∈ R,

∫ 1/√(1 − x²) dx = arcsin(x) + C,   x ∈ (−1, 1).

Remark 8.3.1 Combining

∫ x⁻¹ dx = ln(x) + C,   x > 0

and

∫ x⁻¹ dx = ln(−x) + C,   x < 0

into a single relation, we obtain

∫ x⁻¹ dx = ln|x| + C,   x ≠ 0,

where |x| denotes the modulus of x.


Remark 8.3.2 With x ∈ (−1, 1), the function

−arccos(x) + C

provides an alternative primitive for 1/√(1 − x²), since we have that

(−arccos(x) + C)′ = 1/√(1 − x²).

8.4 Integration by recognition


This technique is analogous to the chain rule of differentiation: Consider any elementary
rule of integration, such as

∫ cos(x) dx = sin(x),

where the arbitrary constant C has been omitted for simplicity. Using the corresponding
rule of differentiation,

d/dx sin(x) = cos(x),

and the chain rule, we obtain the derivative of the composite function sin(u(x)):

d/dx sin(u(x)) = cos(u(x))·u′(x).

In this way we generate the primitive

∫ cos(u(x))·u′(x) dx = sin(u(x)),

where the function u(x) is left unspecified and thus remains at our disposal.

The following example illustrates the usefulness of this technique.

Example 8.4.1 Calculate the integral

∫ x·cos(x²) dx.

We recognise that the term x inside the integral is a multiple of the derivative of the
argument x² of cos(x²). We can therefore think of x² as u(x) and proceed as follows:

∫ x·cos(x²) dx = (1/2) ∫ cos(x²)·(2x) dx = (1/2) ∫ cos(u(x))·u′(x) dx = (1/2) sin(u(x)) + C

= (1/2) sin(x²) + C.

Indeed, by the chain rule, the derivative of the primitive on the right hand side is the
integrand on the left hand side:

( (1/2) sin(x²) + C )′ = (1/2) cos(x²)·(2x) = x·cos(x²).

Generalising our elementary rules in this way, we produce a much wider family of such rules,
listed below. It should be noted that the domain of validity of each of these generalised
rules depends on the details of the function u(x):

∫ u^α(x)·u′(x) dx = u^(α+1)(x)/(α + 1) + C,   α ≠ −1,

∫ u⁻¹(x)·u′(x) dx = ln|u(x)| + C,

∫ e^(u(x))·u′(x) dx = e^(u(x)) + C,

∫ cos(u(x))·u′(x) dx = sin(u(x)) + C,

∫ sin(u(x))·u′(x) dx = −cos(u(x)) + C,

∫ u′(x)/(u²(x) + 1) dx = arctan(u(x)) + C,

∫ u′(x)/√(1 − u²(x)) dx = arcsin(u(x)) + C.
Remark 8.4.2 Note that the term u′(x) dx can be written compactly as du(x). Simpli-
fying this notation further, we can even write du(x) as du, in which case we recover our
original elementary rules with x replaced by u. For example, the last rule above becomes:

∫ 1/√(1 − u²) du = arcsin(u) + C

with the understanding that u can represent any function of x.

Let us use this notation in the examples below:
Example 8.4.3 This example is concerned with the special case where u is an affine
function of x: Compute the integral

∫ e^(4x+7) dx.

Without explicitly introducing the label u, we have:

∫ e^(4x+7) dx = (1/4) ∫ e^(4x+7) d(4x + 7) = (1/4) e^(4x+7) + C.

Example 8.4.4 This example is concerned with the case where u is a non-linear
function of x: Compute the integral

∫ e^(4x²)·x dx.

Again, without explicitly introducing the label u, we have:

∫ e^(4x²)·x dx = (1/8) ∫ e^(4x²) d(4x²) = (1/8) e^(4x²) + C.

Example 8.4.5 This is about integrals of the arctan-type. The characteristic prop-
erty of such integrals is that the quadratic polynomial in the denominator has negative
discriminant: Compute the integral

∫ 1/(x² + 4x + 7) dx.

Without the explicit use of the label u, we have:

∫ 1/(x² + 4x + 7) dx = ∫ 1/((x + 2)² + 3) dx = (1/3) ∫ 1/( ((x + 2)/√3)² + 1 ) dx

= (√3/3) ∫ 1/( ((x + 2)/√3)² + 1 ) d( (x + 2)/√3 ) = (√3/3) arctan( (x + 2)/√3 ) + C.

Example 8.4.5 This is about integrals of the arcsin-type. The characteristic prop-
erty of such integrals is that the quadratic polynomial inside the square root has positive
discriminant and negative leading coefficient: Compute the integral

∫ 1/√(5 − x²) dx.

Again, without the explicit use of u, we have:

∫ 1/√(5 − x²) dx = (1/√5) ∫ 1/√(1 − (x/√5)²) dx = ∫ 1/√(1 − (x/√5)²) d(x/√5)

= arcsin(x/√5) + C.
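Both of these integrals can be checked by machine; the answers agree with the ones above up to the arbitrary constant and algebraic rearrangement. The Python sketch below assumes sympy and is not part of the course.

import sympy as sp

x = sp.symbols('x', real=True)

print(sp.integrate(1 / (x**2 + 4*x + 7), x))     # an arctan of (x + 2)/sqrt(3)
print(sp.integrate(1 / sp.sqrt(5 - x**2), x))    # an arcsin of x/sqrt(5)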

Remark 8.4.6 We could add a few more integrals of the trigonometric-type to the above
list as well as some typical integrals of the hyperbolic trigonometric-type. However, that
would go beyond the scope of this course.

8.5 Integration by change of variable


All the primitives listed above can be re-derived by explicitly introducing a new label u.
For example, the sequence of steps

∫ e^(4x²)·x dx = (1/8) ∫ e^(4x²) d(4x²) = (1/8) e^(4x²) + C

amounts to the following process: We introduce a new variable u (strictly speaking, a
function u(x)) by

u = 4x².

By differentiating this equation,

du/dx = 8x,

we obtain a corresponding equation between the differentials du and dx:

du = 8x dx.

Hence,

∫ e^(4x²)·x dx = (1/8) ∫ e^(4x²)·8x dx = (1/8) ∫ e^u du = (1/8) e^u + C = (1/8) e^(4x²) + C.

This technique is known as integration by change of variable and is more general than
integration by recognition, in the sense that it can be applied to integrals where it is
difficult to recognise a primitive directly.

Example 8.5.1 Find the integral

∫[−1, 1] √(1 − x²) dx.

Here, it seems difficult to recognise a primitive, so let us try to eliminate the square root
by using a suitable change of variable. To this end, we introduce the trigonometric variable
θ by

x = sin(θ), where −π/2 ≤ θ ≤ π/2.

Note that the function sin : [−π/2, π/2] → [−1, 1] is bijective; this guarantees the validity of
this method.
We have

dx = cos(θ) dθ,

so the integral becomes

∫[−π/2, π/2] √(1 − sin²(θ))·cos(θ) dθ.

We now use the fact that 1 − sin²(θ) = cos²(θ) and note that cos(θ) ≥ 0 for θ ∈ [−π/2, π/2],
which implies that

√(1 − sin²(θ)) = cos(θ).

Hence, the integral becomes

∫[−π/2, π/2] cos²(θ) dθ.

Using the double angle formula cos(2θ) ≡ 2cos²(θ) − 1, we obtain the equivalent integral

(1/2) ∫[−π/2, π/2] (cos(2θ) + 1) dθ,

which yields

(1/2) [ sin(2θ)/2 + θ ] evaluated from −π/2 to π/2.

By evaluating these limits, we learn that the area under the graph of y = √(1 − x²) between
−1 and 1 is equal to π/2. This is indeed the area of the upper half of a circle of radius 1.
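A numerical quadrature gives the same area. The Python sketch below assumes the numpy and scipy libraries and is outside the syllabus.

import numpy as np
from scipy.integrate import quad

# Numerical area under sqrt(1 - x^2) between -1 and 1
area, error_estimate = quad(lambda t: np.sqrt(1 - t**2), -1, 1)
print(area, np.pi / 2)    # both approximately 1.5707963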

8.6 Improper integrals


Consider the graph of some random function f : X → R illustrated below:

Figure 8.6.1

This graph has two vertical asymptotes described by the Cartesian equations x = b and
x = d. Clearly, f is not defined at the points b and d, so its domain X is the subset of R
given by
X = (−∞, b) ∪ (b, d) ∪ (d, ∞).
The points a ∈ X, c ∈ X and e ∈ X are some random points belonging to the domain of
f . Note that f is continuous on X and that f (x) → 0 as x → ±∞.

Suppose that we are interested in calculating the areas under the graph of f corresponding
to the following definite integrals:

(i) ∫[−∞, a] f(x) dx,   (ii) ∫[a, b] f(x) dx,   (iii) ∫[b, c] f(x) dx,

(iv) ∫[c, d] f(x) dx,   (v) ∫[d, e] f(x) dx,   (vi) ∫[e, ∞] f(x) dx.

A question immediately arises: Do these integrals produce a finite numerical value or do


they diverge to infinity? To be more specific, we see that the domain of integration of the
first and of the sixth integral is unbounded. Hence, even if f (x) → 0 as x → ±∞, there
is no guarantee that the corresponding areas will be finite. The same uncertainty applies
to the second, third, fourth and fifth integrals. Their integrands all encounter a vertical
asymptote at one of the endpoints of the domain of integration. Given that f (x) tends to
infinity there, there is again no guarantee that the corresponding areas will be finite.

The first and sixth integrals are called improper integrals of the first kind: the integrand
remains finite throughout the domain of integration, but the domain of integration is
unbounded. The second, third, fourth and fifth integrals are called improper integrals
of the second kind: the domain of integration is bounded, but the integrand encounters
a vertical asymptote at one of the endpoints of the domain of integration.

An improper integral is defined by first evaluating it as a standard definite integral and
then taking the appropriate limit. For example, the six integrals introduced above are
defined as follows:

(i) ∫[−∞, a] f(x) dx = lim(M→−∞) ∫[M, a] f(x) dx,

(ii) ∫[a, b] f(x) dx = lim(M→b⁻) ∫[a, M] f(x) dx,

(iii) ∫[b, c] f(x) dx = lim(M→b⁺) ∫[M, c] f(x) dx,

(iv) ∫[c, d] f(x) dx = lim(M→d⁻) ∫[c, M] f(x) dx,

(v) ∫[d, e] f(x) dx = lim(M→d⁺) ∫[M, e] f(x) dx,

(vi) ∫[e, ∞] f(x) dx = lim(M→∞) ∫[e, M] f(x) dx.

If taking the relevant limit leads to a unique finite value, we say that the improper integral
converges. Otherwise, we say that the integral diverges.

Example 8.6.2 Compute the following improper integrals:

(a) ∫[3, ∞] 1/x² dx,   (b) ∫[2, ∞] 1/√x dx,   (c) ∫[0, 2] 1/x² dx,   (d) ∫[4, 7] 1/√(7 − x) dx.

The areas corresponding to these integrals are sketched below:

Figure 8.6.3

(a) This is an improper integral of the first kind. Following the definition, we have:

∫[3, ∞] 1/x² dx = lim(M→∞) ∫[3, M] 1/x² dx = lim(M→∞) [ −1/x ] from 3 to M
= lim(M→∞) ( −1/M + 1/3 ) = 1/3.

Hence, this integral converges. The first area is equal to 1/3.
(b) This is an improper integral of the first kind. We have:

∫[2, ∞] 1/√x dx = lim(M→∞) ∫[2, M] 1/√x dx = lim(M→∞) [ 2√x ] from 2 to M
= lim(M→∞) ( 2√M − 2√2 ) → ∞.

This integral diverges, so the second area is infinite.

(c) This is an improper integral of the second kind. The vertical asymptote occurs at
x = 0, so the integral is defined by the following limit:

∫[0, 2] 1/x² dx = lim(M→0⁺) ∫[M, 2] 1/x² dx = lim(M→0⁺) [ −1/x ] from M to 2
= lim(M→0⁺) ( −1/2 + 1/M ) → ∞.

This integral diverges, so the third area is infinite.

(d) This is an improper integral of the second kind. The vertical asymptote occurs at
x = 7, so the integral is defined by the following limit:

∫[4, 7] 1/√(7 − x) dx = lim(M→7⁻) ∫[4, M] 1/√(7 − x) dx = lim(M→7⁻) [ −2√(7 − x) ] from 4 to M
= lim(M→7⁻) ( −2√(7 − M) + 2√3 ) = 2√3.

This integral converges, so the fourth area is equal to 2√3.
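A computer algebra system handles the limits behind these improper integrals automatically. The Python sketch below assumes sympy and is not part of the course.

import sympy as sp

x = sp.symbols('x')

print(sp.integrate(1 / x**2, (x, 3, sp.oo)))          # 1/3
print(sp.integrate(1 / sp.sqrt(x), (x, 2, sp.oo)))    # oo, diverges
print(sp.integrate(1 / x**2, (x, 0, 2)))              # oo, diverges
print(sp.integrate(1 / sp.sqrt(7 - x), (x, 4, 7)))    # 2*sqrt(3)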

8.7 Exercises for self study


Exercise 8.7.1 Use a change of variable to compute the arctan-type integral

∫ 1/(x² + 6x + 11) dx

and the arcsin-type integral

∫ 1/√(−x² − 6x − 5) dx.

Express your final answers in terms of x.

Exercise 8.7.2 Compute the following integrals either by recognition or by a change of
variable:

∫ e^(√x)/(2√x) dx,

∫ (3 − 2x)/√(1 − x²) dx.

Express your final answers in terms of x.

Exercise 8.7.3 A probability density function for a continuous random variable X is a function $f: \mathbb{R} \to \mathbb{R}$ which satisfies the following two properties:
$$f(x) \geq 0 \ \ \forall x \in \mathbb{R} \qquad\text{and}\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Given such a function f, the probability $P(X \leq a)$ that X takes a value less than or equal to $a \in \mathbb{R}$ is equal to the improper definite integral
$$P(X \leq a) = \int_{-\infty}^{a} f(x)\,dx.$$

(i) Find the value of the constant c that makes the following function $f: \mathbb{R} \to \mathbb{R}$ a probability density function:
$$f(x) = \begin{cases} ce^{x} & x \in (-\infty, 0) \\ ce^{-x} & x \in [0, \infty) \end{cases}$$

(ii) Hence, calculate the probability $P(X \leq 2)$, leaving your answer in exact form.

Exercise 8.7.4 Use the change-of-variable technique to compute the indefinite integrals
$$\int \frac{1}{2x^2 + 12x + 23}\,dx$$
and
$$\int \frac{1}{\sqrt{-x^2 - 8x - 10}}\,dx.$$
Express both integrals in terms of $x$.

8.8 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 10.2, 10.3, 10.4, 10.5, 10.7, 10.8 and 10.11 of our Calculus Textbook are relevant.
Exercises 15, 16, 19, 20, 22 found in section 10.10 are optional.

9 One-Variable Calculus, Part 7 of 7
9.1 Integration by partial fractions
This technique deals with integrals of the form
$$\int \frac{p_m(x)}{p_n(x)}\,dx,$$
where $p_m(x)$ is a polynomial of degree m and $p_n(x)$ is a polynomial of degree n. The following methodology applies to all such integrals:

Step 1: We start by comparing the degrees of the polynomials $p_m(x)$ and $p_n(x)$:

(i) If $m < n$, the integrand $\frac{p_m(x)}{p_n(x)}$ is said to be in standard form.

(ii) If $m \geq n$, we use polynomial division in order to express the integrand $\frac{p_m(x)}{p_n(x)}$ in the form
$$Q(x) + \frac{R(x)}{p_n(x)}.$$
The quotient $Q(x)$ is a polynomial. The remainder term $\frac{R(x)}{p_n(x)}$ has the property that the degree of its numerator is strictly less than the degree of its denominator. As a result, we can express the original integral as a sum of two integrals:
$$\int \frac{p_m(x)}{p_n(x)}\,dx = \int Q(x)\,dx + \int \frac{R(x)}{p_n(x)}\,dx.$$
The first integral is straightforward to calculate. The integrand appearing in the second integral is in standard form.

Step 2: Since standard form is always attainable, let us only consider integrands of the form
$$\frac{R(x)}{p_n(x)}$$
where the degree of the numerator is strictly less than n. Let us also assume that the polynomial $p_n(x)$ has been factorised into a product of polynomials of the lowest degree; that is, polynomials of degree 1 and polynomials of degree 2 of negative discriminant. Some of these factors may be repeated in $p_n(x)$.

Using these factors as building blocks, we construct partial fractions according to the following rules:

(i) Each distinct polynomial $p(x)$ of degree 1 that is raised to the power m in $p_n(x)$ produces m partial fractions of the form
$$\frac{a_1}{p(x)} + \frac{a_2}{(p(x))^2} + \cdots + \frac{a_m}{(p(x))^m},$$
where the constants $\{a_i\}$ are to be determined.

(ii) Each distinct polynomial $q(x)$ of degree 2 of negative discriminant that is raised to the power k in $p_n(x)$ produces k partial fractions of the form
$$\frac{a_1x + b_1}{q(x)} + \frac{a_2x + b_2}{(q(x))^2} + \cdots + \frac{a_kx + b_k}{(q(x))^k},$$
where the constants $\{a_i\}$ and $\{b_i\}$ are to be determined.

Step 3: The resulting fractions can be integrated using elementary integration techniques. In particular:

(i) Each fraction of the form
$$\frac{a_m}{(p(x))^m},$$
where $p(x)$ is a polynomial of degree 1, can be integrated by recognition.

(ii) Each fraction of the form
$$\frac{a_kx + b_k}{(q(x))^k},$$
where $q(x)$ is a polynomial of degree 2 of negative discriminant, can be integrated by techniques which will be discussed shortly.

This completes the methodology.

Example 9.1.1 Before we present these integration techniques, let us illustrate the process described in step 2 by applying it to an integrand that contains all the different kinds of factors that we have discussed. The denominator of this integrand is a polynomial of degree ten which has been factorised into its simplest possible factors. The numerator of this integrand is an unspecified polynomial of degree strictly less than ten. Regardless of the particular choice of the polynomial in the numerator, the process described in step 2 results in the following partial fractions:
$$\frac{A_0x^0 + A_1x^1 + \cdots + A_9x^9}{(x-4)(x+5)^3(x^2+4x+7)(x^2+1)^2} = \frac{a_1}{x-4} + \frac{b_1}{x+5} + \frac{b_2}{(x+5)^2} + \frac{b_3}{(x+5)^3} + \frac{c_1x + d_1}{x^2+4x+7} + \frac{e_1x + f_1}{x^2+1} + \frac{e_2x + f_2}{(x^2+1)^2}.$$
Note that the constants on the left hand side are arbitrary but given, while the constants
on the right hand side are to be determined.
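As a computer-algebra aside (not part of the course materials), sympy's apart function carries out exactly this decomposition, including the calculation of the unknown constants. The integrand below is a small concrete instance chosen purely for illustration:

    import sympy as sp

    x = sp.Symbol('x')

    # Decompose (7x - 3) / ((x - 4)(x + 5)^2) into partial fractions.
    expr = (7*x - 3) / ((x - 4) * (x + 5)**2)
    print(sp.apart(expr, x))
    # Output (up to the ordering of terms):
    #   25/(81*(x - 4)) - 25/(81*(x + 5)) + 38/(9*(x + 5)**2)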

Remark 9.1.2 Examples involving the calculation of constants such as the lower-case
constants above and examples illustrating the process of polynomial division can be found
in section 10.6 of our calculus textbook. Throughout these notes, I have assumed that you
are familiar with such calculations.

Let us now complete this subsection by presenting two examples that involve the integration of fractions of the form
$$\frac{a_mx + b_m}{(q(x))^m},$$
where $q(x)$ is a polynomial of degree 2 of negative discriminant:

Example 9.1.3 Calculate the integral
$$\int \frac{x^3 + 6x^2 + 18x - 3}{x^2 + 6x + 11}\,dx.$$

We observe that the integrand is not in standard form. Therefore, we use polynomial division to express it as follows:
$$\int \left(x + \frac{7x - 3}{x^2 + 6x + 11}\right)dx.$$
Moreover, we observe that the remainder term $\frac{7x - 3}{x^2 + 6x + 11}$ is of the form
$$\frac{ax + b}{q(x)}$$
where $q(x)$ is a polynomial of degree 2 of negative discriminant. Any such integral can be calculated by using the following technique:

The key idea is to replace the integrand
$$\frac{7x - 3}{x^2 + 6x + 11}$$
by an equivalent expression which involves a fraction whose numerator is the derivative of $x^2 + 6x + 11$. This fraction can be integrated by recognition since it results in an integral of the form
$$\int \frac{q'(x)}{q(x)}\,dx,$$
which yields $\ln(q(x)) + C$. To this end, we express our integral
$$\int \left(x + \frac{7x - 3}{x^2 + 6x + 11}\right)dx$$
as the sum of three integrals $I_1$, $I_2$ and $I_3$:
$$\int x\,dx + \frac{7}{2}\int \frac{2x + 6}{x^2 + 6x + 11}\,dx + \int \frac{-24}{x^2 + 6x + 11}\,dx.$$

The factor $\frac{7}{2}$ appearing in $I_2$ is needed in order to reproduce the ‘7x’ term in the numerator of the original integral. Similarly, the constant −24 appearing in $I_3$ is needed in order to reproduce the ‘−3’ term in the numerator of the original integral.

The integrals $I_1$, $I_2$ and $I_3$ are straightforward to calculate:
$$I_1 = \int x\,dx = \frac{x^2}{2},$$
$$I_2 = \frac{7}{2}\int \frac{2x + 6}{x^2 + 6x + 11}\,dx = \frac{7}{2}\ln(x^2 + 6x + 11),$$
and $I_3$ is of the arctan-type. We have:

$$I_3 = -24\int \frac{1}{x^2 + 6x + 11}\,dx = -24\int \frac{1}{(x+3)^2 + 2}\,dx = -\frac{24}{2}\int \frac{1}{\left(\frac{x+3}{\sqrt{2}}\right)^2 + 1}\,dx$$
$$= -12\sqrt{2}\int \frac{1}{\left(\frac{x+3}{\sqrt{2}}\right)^2 + 1}\,d\left(\frac{x+3}{\sqrt{2}}\right) = -12\sqrt{2}\arctan\left(\frac{x+3}{\sqrt{2}}\right).$$
Adding $I_1$, $I_2$ and $I_3$, we obtain the final answer:
$$\int \frac{x^3 + 6x^2 + 18x - 3}{x^2 + 6x + 11}\,dx = \frac{x^2}{2} + \frac{7}{2}\ln(x^2 + 6x + 11) - 12\sqrt{2}\arctan\left(\frac{x+3}{\sqrt{2}}\right) + C.$$
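A quick way to gain confidence in an antiderivative is to differentiate it and compare with the original integrand. As a software aside (not part of the course materials), the following sympy sketch performs this check for the answer above:

    import sympy as sp

    x = sp.Symbol('x')

    integrand = (x**3 + 6*x**2 + 18*x - 3) / (x**2 + 6*x + 11)
    answer = (x**2/2 + sp.Rational(7, 2)*sp.log(x**2 + 6*x + 11)
              - 12*sp.sqrt(2)*sp.atan((x + 3)/sp.sqrt(2)))

    # The derivative of the answer minus the integrand should simplify to 0.
    print(sp.simplify(sp.diff(answer, x) - integrand))   # 0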

Example 9.1.4 Calculate
$$\int \frac{8x + 5}{(x^2 + 6x + 11)^2}\,dx.$$
We observe that the fraction is in standard form and that the integrand has the particular form
$$\frac{ax + b}{(q(x))^2}$$
where $q(x)$ is a polynomial of degree 2 of negative discriminant.
Once more, the key idea is to replace the integrand $\frac{8x + 5}{(x^2 + 6x + 11)^2}$ by an equivalent expression which involves a fraction whose numerator is the derivative of $x^2 + 6x + 11$. This fraction can be integrated by recognition since it results in an integral of the form
$$\int \frac{q'(x)}{(q(x))^2}\,dx,$$
which yields $-\frac{1}{q(x)} + C$. To this end, we express our integral
$$\int \frac{8x + 5}{(x^2 + 6x + 11)^2}\,dx$$
as the sum of two integrals $I_1$ and $I_2$:
$$4\int \frac{2x + 6}{(x^2 + 6x + 11)^2}\,dx + \int \frac{-19}{(x^2 + 6x + 11)^2}\,dx.$$

The factor 4 appearing in $I_1$ is needed in order to reproduce the ‘8x’ term in the numerator of the original integral. Similarly, the constant −19 appearing in $I_2$ is needed in order to reproduce the ‘+5’ term in the numerator of the original integral.

The integral $I_1$ can be calculated by recognition. We have:
$$4\int \frac{2x + 6}{(x^2 + 6x + 11)^2}\,dx = -\frac{4}{x^2 + 6x + 11}.$$

The integral $I_2$ is not of the arctan-type because of the presence of the square in the denominator. However, it is closely related to such an integral in the sense that it can be simplified by using an arctan-type change of variable. In particular, we complete the square to get
$$I_2 = -19\int \frac{dx}{((x+3)^2 + 2)^2}$$
and then consider a change of variable that is motivated by the following argument:

We would like to set
$$(x+3)^2 + 2 = 2\tan^2(\theta) + 2$$
in order to make the denominator of $I_2$ a multiple of $\sec^4(\theta)$. Indeed, we have
$$(2\tan^2(\theta) + 2)^2 = 4(\tan^2(\theta) + 1)^2 = 4\sec^4(\theta).$$
In other words, we would like to consider the change of variable given by
$$x + 3 = \sqrt{2}\tan(\theta).$$
The differential $dx$ in the numerator of $I_2$ then becomes $\sqrt{2}\sec^2(\theta)\,d\theta$, so $I_2$ simplifies to
$$I_2 = -19\int \frac{\sqrt{2}\sec^2(\theta)\,d\theta}{4\sec^4(\theta)} = -\frac{19\sqrt{2}}{4}\int \frac{d\theta}{\sec^2(\theta)} = -\frac{19\sqrt{2}}{4}\int \cos^2(\theta)\,d\theta.$$

We now use the double angle formula $\cos^2(\theta) = \frac{1}{2}(\cos(2\theta) + 1)$ to obtain the equivalent integral
$$I_2 = -\frac{19\sqrt{2}}{8}\int (\cos(2\theta) + 1)\,d\theta = -\frac{19\sqrt{2}}{8}\left(\frac{\sin(2\theta)}{2} + \theta\right) + C.$$
Finally, we would like to express $I_2$ as a function of x. To this end, we use the double angle formula $\sin(2\theta) = 2\sin(\theta)\cos(\theta)$ to obtain an expression for $I_2$ in terms of $\theta$, $\sin(\theta)$ and $\cos(\theta)$,
$$I_2 = -\frac{19\sqrt{2}}{8}\left(\sin(\theta)\cos(\theta) + \theta\right) + C,$$
and then use the relation $x + 3 = \sqrt{2}\tan(\theta)$ to express $\theta$, $\sin(\theta)$ and $\cos(\theta)$ in terms of x. In fact, the easiest way to achieve this is to consider a right-angled triangle with $\theta$ being one of the two acute angles. Given that $\tan(\theta) = \frac{x+3}{\sqrt{2}}$, we can set the side opposite to the angle $\theta$ equal to $x + 3$ and the side adjacent to the angle $\theta$ equal to $\sqrt{2}$. Then the hypotenuse of this triangle is equal to $\sqrt{(x+3)^2 + 2}$, as illustrated below:

Figure 9.1.5

In this way, we deduce that
$$\sin(\theta) = \frac{x+3}{\sqrt{(x+3)^2 + 2}} \qquad\text{and}\qquad \cos(\theta) = \frac{\sqrt{2}}{\sqrt{(x+3)^2 + 2}}.$$
Therefore, $I_2$ is equal to
$$I_2 = -\frac{19\sqrt{2}}{8}\left[\frac{x+3}{\sqrt{(x+3)^2 + 2}} \cdot \frac{\sqrt{2}}{\sqrt{(x+3)^2 + 2}} + \arctan\left(\frac{x+3}{\sqrt{2}}\right)\right] + C.$$

Putting it all together, we deduce that our original integral is equal to
$$\int \frac{8x + 5}{(x^2 + 6x + 11)^2}\,dx = -\frac{4}{x^2 + 6x + 11} - \frac{19\sqrt{2}}{8}\left[\frac{\sqrt{2}(x+3)}{(x+3)^2 + 2} + \arctan\left(\frac{x+3}{\sqrt{2}}\right)\right] + C.$$

You may confirm this result by differentiating the right hand side.
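As an aside (not part of the course materials), the suggested differentiation check can be automated with sympy:

    import sympy as sp

    x = sp.Symbol('x')

    integrand = (8*x + 5) / (x**2 + 6*x + 11)**2
    answer = (-4/(x**2 + 6*x + 11)
              - (19*sp.sqrt(2)/8) * (sp.sqrt(2)*(x + 3)/((x + 3)**2 + 2)
                                     + sp.atan((x + 3)/sp.sqrt(2))))

    # Differentiating the answer and subtracting the integrand should give 0.
    print(sp.simplify(sp.diff(answer, x) - integrand))   # 0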

9.2 Integration by parts


This technique is analogous to the product rule of differentiation and results in a general
formula for integrating integrands of a certain kind. In order to derive this formula, consider
the derivative of the product of two functions f and g. By the product rule, we have

$$(f(x)g(x))' = f'(x)g(x) + f(x)g'(x).$$

Let us arrange this equation in the form
$$f(x)g'(x) = (f(x)g(x))' - f'(x)g(x)$$
and integrate both sides with respect to x. We obtain
$$\int f(x)g'(x)\,dx = \int (f(x)g(x))'\,dx - \int f'(x)g(x)\,dx.$$

Since $f(x)g(x)$ is an indefinite integral for $(f(x)g(x))'$, we find that
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx.$$

This is the formula for integration by parts. As is always the case with indefinite
integrals, the left hand side is determined by the right hand side only up to the addition
of an arbitrary constant.
Given an integral $\int h(x)\,dx$ that is difficult to calculate directly, the above formula can be useful as long as the following two conditions are satisfied:

(i) The integrand $h(x)$ can be viewed as the product of two functions $f(x)$ and $g'(x)$, where it is possible to integrate $g'(x)$.

(ii) Either the resulting integral $\int f'(x)g(x)\,dx$ can be calculated directly or it can eventually lead to a solution.

The following examples capture these ideas:

Example 9.2.1 Calculate
$$\int xe^x\,dx.$$
We identify $f(x) = x$ and $g'(x) = e^x$ and apply the rule. We have $f'(x) = 1$, $g(x) = e^x$, so
$$\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.$$

Example 9.2.2 Calculate
$$\int \ln(x)\,dx.$$
We identify $f(x) = \ln(x)$ and $g'(x) = 1$ and apply the rule. We have $f'(x) = \frac{1}{x}$, $g(x) = x$, so
$$\int \ln(x)\,dx = x\ln(x) - \int dx = x\ln(x) - x + C.$$

Remark 9.2.3 There are cases where the method of integration by parts generates a
recursive formula which eventually leads to a solution. The following example provides a
simple illustration of such a case.

Example 9.2.4 Calculate
$$\int e^x\sin(x)\,dx.$$
We identify $f(x) = \sin(x)$ and $g'(x) = e^x$ and apply the rule. We have $f'(x) = \cos(x)$, $g(x) = e^x$, so
$$\int e^x\sin(x)\,dx = e^x\sin(x) - \int e^x\cos(x)\,dx.$$
Regarding the second integral, we now perform integration by parts once more. We identify $f(x) = \cos(x)$ and $g'(x) = e^x$ (otherwise we will undo our previous step and go back to the start) and apply the rule again. We have $f'(x) = -\sin(x)$, $g(x) = e^x$, so
$$\int e^x\sin(x)\,dx = e^x\sin(x) - \left(e^x\cos(x) + \int e^x\sin(x)\,dx\right).$$
We now observe that the integral $\int e^x\sin(x)\,dx$ appears twice in the above equation, so we can make it the subject of this equation. In this way, we find:
$$\int e^x\sin(x)\,dx = \frac{1}{2}\left(e^x\sin(x) - e^x\cos(x)\right) + C.$$

Example 9.2.5 Calculate
$$\int \arctan(x)\,dx.$$
We identify $f(x) = \arctan(x)$ and $g'(x) = 1$ and apply the rule. We have $f'(x) = \frac{1}{x^2 + 1}$ and $g(x) = x$, so
$$\int \arctan(x)\,dx = x\arctan(x) - \int \frac{x}{x^2 + 1}\,dx.$$
Integrating the second term by recognition, we obtain
$$\int \arctan(x)\,dx = x\arctan(x) - \frac{1}{2}\int \frac{2x}{x^2 + 1}\,dx = x\arctan(x) - \frac{1}{2}\ln(x^2 + 1) + C.$$
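As a final aside for this subsection (again not part of the course materials), sympy reproduces all four worked answers; its integrate function applies integration by parts internally:

    import sympy as sp

    x = sp.Symbol('x')

    for f in (x*sp.exp(x), sp.log(x), sp.exp(x)*sp.sin(x), sp.atan(x)):
        print(f, '->', sp.integrate(f, x))

    # Expected output (up to algebraic rearrangement):
    #   x*exp(x)      -> (x - 1)*exp(x)
    #   log(x)        -> x*log(x) - x
    #   exp(x)*sin(x) -> exp(x)*sin(x)/2 - exp(x)*cos(x)/2
    #   atan(x)       -> x*atan(x) - log(x**2 + 1)/2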

9.3 Consumers’ and producers’ surplus

Let us introduce this topic with the help of an example.

Consider a market with one product. Let x denote the number of units of this product and
p denote the price per unit. We can assume that both x and p are continuous variables.

Suppose that the inverse demand function is given by
$$p = D(x) = 60 - 3x,$$
where $0 \leq x \leq 20$. Also suppose that the inverse supply function is given by
$$p = S(x) = x^2 + 20.$$

At the market price $p_0$, we have the equilibrium condition
$$60 - 3x_0 = x_0^2 + 20,$$
which implies that
$$x_0^2 + 3x_0 - 40 = 0.$$
Hence either $x_0 = -8$ or $x_0 = 5$. The first solution does not belong to the domain $[0, 20]$ of the inverse demand function, so it is rejected. The second solution is accepted. The corresponding market price is $p_0 = 45$. Note that, at this market price, the total amount the consumers pay and the producers receive is $x_0p_0 = 225$.

Clearly, consumers who buy the product at this market price $p_0$ would have been willing to pay at least $p_0$. This monetary gain obtained by consumers, who are able to purchase a product for a price that is less than the highest price that they would be willing to pay, forms the consumers' surplus. It is the area under the demand curve $y = D(x)$ minus the area $x_0p_0$ corresponding to the total amount the consumers pay. The relevant illustration follows:

Figure 9.3.1

We can see that the consumers' surplus corresponds to the definite integral
$$\int_0^{x_0} D(x)\,dx - x_0p_0 = \int_0^5 (60 - 3x)\,dx - 225 = \left[60x - \frac{3x^2}{2}\right]_0^5 - 225 = \frac{75}{2}.$$

Similarly, the producers’ surplus is the amount that producers benefit by selling at a
market price that is higher than the least price they would be willing to sell for. This is the
area x0 p0 corresponding to the total amount the producers receive minus the area under
the supply curve y = S(x):

Figure 9.3.2

We can see that the producers' surplus corresponds to the definite integral
$$x_0p_0 - \int_0^{x_0} S(x)\,dx = 225 - \int_0^5 (x^2 + 20)\,dx = 225 - \left[\frac{x^3}{3} + 20x\right]_0^5 = \frac{250}{3}.$$
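Both surpluses are easy to recompute with software. As an aside (not part of the course materials), here is a sympy sketch using the demand and supply functions of this example:

    import sympy as sp

    x = sp.Symbol('x')
    D = 60 - 3*x         # inverse demand function
    S = x**2 + 20        # inverse supply function

    x0 = 5               # equilibrium quantity found above
    p0 = D.subs(x, x0)   # equilibrium price, 45

    consumers_surplus = sp.integrate(D, (x, 0, x0)) - x0*p0
    producers_surplus = x0*p0 - sp.integrate(S, (x, 0, x0))
    print(consumers_surplus, producers_surplus)   # 75/2 250/3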

9.4 Exercises for self study


Exercise 9.4.1 Calculate the integral
$$\int \frac{x^2}{x^2 + x - 2}\,dx.$$

Exercise 9.4.2 Calculate the integral
$$\int \frac{1}{(1 - x^2)^2}\,dx.$$

Exercise 9.4.3 Use the method of partial fractions to calculate the integral
$$\int \frac{x^3 - x^2 + x + 1}{(x^2 + 1)^2}\,dx$$
and the method of integration by parts to calculate the integral
$$\int_0^1 \arcsin(x)\,dx.$$

Exercise 9.4.4 Use the method of change of variable to calculate the integrals
$$\text{(i)}\ \int \cos^8(x)\sin^3(x)\,dx \qquad \text{(ii)}\ \int (x^2 + 2)^6 x^3\,dx \qquad\text{and}\qquad \text{(iii)}\ \int \frac{\sin(x)\cos(x)}{(1 + \sin(x))^3}\,dx.$$

9.5 Relevant sections from the textbooks


• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 10.6 and 10.9 of our Calculus Textbook are relevant. Exercises 13, 14, 18, 23, 24,
25 and 26 found in section 10.10 are optional.

10 Matrices, Part 1 of 3
10.1 Definitions, notation and terminology
We define a matrix A of size $m \times n$ as a rectangular array of numbers with m rows and n columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}.$$
The numbers in the array are called the entries in the matrix. In particular, the entry that occurs in the ith row and jth column of A is denoted by $a_{ij}$ and called the (i, j) entry. It is often useful to denote an $m \times n$ matrix A by $(a_{ij})_{m \times n}$ where $1 \leq i \leq m$ and $1 \leq j \leq n$.

Two matrices are equal if they have the same size and their corresponding entries are equal. In other words, given
$$A = (a_{ij})_{m \times n} \quad\text{and}\quad B = (b_{ij})_{m \times n},$$
we say that
$$A = B$$
if and only if $a_{ij} = b_{ij}$ for all i, j, where $i \in \{1, 2, \ldots, m\}$ and $j \in \{1, 2, \ldots, n\}$.

The zero matrix of size $m \times n$, denoted by $0 = (0)_{m \times n}$, has all entries zero:
$$0 = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 0 \end{pmatrix}.$$

A column matrix, also called a column vector, is a matrix $C = (c_{i1})_{m \times 1}$ with only one column:
$$C = \begin{pmatrix} c_{11} \\ c_{21} \\ \vdots \\ c_{m1} \end{pmatrix}.$$
Similarly, a row matrix, also called a row vector, is a matrix $R = (r_{1i})_{1 \times n}$ with only one row:
$$R = \begin{pmatrix} r_{11} & r_{12} & \ldots & r_{1n} \end{pmatrix}.$$
A matrix $S = (s_{ij})_{n \times n}$ with n rows and n columns is called a square matrix of order n:
$$S = \begin{pmatrix} s_{11} & s_{12} & \ldots & s_{1n} \\ s_{21} & s_{22} & \ldots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \ldots & s_{nn} \end{pmatrix}.$$

The identity matrix of order n, denoted by $I_n$, is the matrix
$$I_n = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix}.$$
Equivalently, we can write $I_n = (a_{ij})_{n \times n}$ with $a_{ii} = 1$ for all $i \in \{1, 2, \ldots, n\}$ and $a_{ij} = 0$ if $i \neq j$.

The identity matrix $I_n$ and the zero matrix $0 = (0)_{n \times n}$ are examples of what is known as a diagonal matrix of order n. This is a square matrix $D = (d_{ij})_{n \times n}$ in which all the entries off the main diagonal are zero:
$$D = \begin{pmatrix} d_{11} & 0 & \ldots & 0 \\ 0 & d_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & d_{nn} \end{pmatrix}.$$
Using the alternative notation, we have: $D = (d_{ij})_{n \times n}$ with $d_{ij} = 0$ if $i \neq j$.

A wider category in which diagonal matrices belong is that of triangular matrices. There
are two kinds of triangular matrices:

An upper triangular matrix $U = (u_{ij})_{n \times n}$ is a square matrix of order n in which all the entries below the main diagonal are zero:
$$U = \begin{pmatrix} u_{11} & u_{12} & \ldots & u_{1n} \\ 0 & u_{22} & \ldots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & u_{nn} \end{pmatrix}.$$
That is, $U = (u_{ij})_{n \times n}$ with $u_{ij} = 0$ if $i > j$.

Similarly, a lower triangular matrix $L = (l_{ij})_{n \times n}$ is a square matrix of order n in which all the entries above the main diagonal are zero:
$$L = \begin{pmatrix} l_{11} & 0 & \ldots & 0 \\ l_{21} & l_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \ldots & l_{nn} \end{pmatrix}.$$
In other words, $L = (l_{ij})_{n \times n}$ with $l_{ij} = 0$ if $i < j$.

Note that a diagonal matrix of order n is both an upper triangular and a lower triangular
matrix of the same order.

Given the general matrix $A = (a_{ij})_{m \times n}$ illustrated below,
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix},$$
the transpose of A, denoted by $A^T$, is defined to be the $n \times m$ matrix that results from interchanging the rows and columns of A. That is,
$$A^T = \begin{pmatrix} a_{11} & a_{21} & \ldots & a_{m1} \\ a_{12} & a_{22} & \ldots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \ldots & a_{mn} \end{pmatrix}.$$

A square matrix $A = (a_{ij})_{n \times n}$ is symmetric if it is equal to its transpose:
$$A = A^T.$$
This is equivalent to saying that $a_{ij} = a_{ji}$ for all i, j.

Example 10.1.1 Write down the zero matrix 0 of size $2 \times 3$, the identity matrix $I_2$, a general diagonal matrix D of order 3 and a general square matrix M of order 2. Also write down their transposes $0^T$, $I_2^T$, $D^T$ and $M^T$. Are any of these matrices symmetric?

We have:
$$0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad D = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \quad M = \begin{pmatrix} d & e \\ f & g \end{pmatrix}.$$
Their transposes are:
$$0^T = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \quad I_2^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad D^T = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \quad M^T = \begin{pmatrix} d & f \\ e & g \end{pmatrix}.$$
A matrix is symmetric if it is equal to its transpose. The matrix 0 is not symmetric because it is not equal to $0^T$ (indeed, the two matrices do not even have the same size). The matrices $I_2$ and $I_2^T$ are equal, so $I_2$ is symmetric. Similarly, the general diagonal matrices D and $D^T$ are equal, so D is symmetric. Regarding the general square matrix M, it is equal to $M^T$, and hence symmetric, if and only if $e = f$.

Example 10.1.2 Write down the transpose $U^T$ of a general upper triangular matrix U of order 3. Is $U^T$ a lower triangular matrix? What property must U satisfy in order to be a symmetric matrix?

We have:
$$U = \begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix} \quad\text{and}\quad U^T = \begin{pmatrix} a & 0 & 0 \\ b & d & 0 \\ c & e & f \end{pmatrix}.$$
Clearly, the matrix $U^T$ is a lower triangular matrix. The matrices U and $U^T$ are equal, and hence U is symmetric, if and only if $b = c = e = 0$. In other words, an upper triangular matrix is symmetric only if it is diagonal.

Finally, let us introduce two more definitions which will be needed in what follows:

A matrix is said to be in row echelon form only if it has the following three properties:

1. For every non-zero row, the first non-zero entry in the row is equal to 1. Such an
entry is called a leading 1.
2. For any two non-zero rows, the leading 1 in the lower row is further to the right than
the leading 1 in the higher row.
3. Zero rows are at the bottom of the matrix.

A matrix is said to be in reduced row echelon form only if it satisfies the previous three
properties plus the fourth property added below:

4. Every column that contains a leading 1 has zeros elsewhere in the column.

Example 10.1.3 Identify every matrix that is in row echelon form and every matrix that is in reduced row echelon form:
$$A = \begin{pmatrix} 1 & 5 & -4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad C = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad D = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The matrix A is in reduced row echelon form and hence also in row echelon form. The matrices B and C are not in row echelon form (the first non-zero entry of the first row of B is not a leading 1, and the leading 1 in the lower row of C is not further to the right than the one above it); hence, they are not in reduced row echelon form either. The matrix D is in row echelon form; however, it is not in reduced row echelon form, since the column containing the leading 1 of the second row has a non-zero entry elsewhere.

10.2 Operations on matrices


The three main operations defined on matrices are matrix addition, scalar multiplication
and matrix multiplication.

The operation of matrix addition applies only to matrices of the same size. In particular, given matrices $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$, their sum $A + B$ is defined to be the matrix obtained by adding the entries of A to the corresponding entries of B. That is,
$$A + B = (c_{ij})_{m \times n} \quad\text{where}\quad c_{ij} = a_{ij} + b_{ij}.$$
The operation of scalar multiplication applies to a matrix of any size: given a matrix $A = (a_{ij})_{m \times n}$ and a real number $\lambda \in \mathbb{R}$ (also known as a scalar), the matrix $\lambda A$ is defined to be the matrix obtained by multiplying each entry of the matrix A by $\lambda$. That is,
$$\lambda A = (b_{ij})_{m \times n} \quad\text{where}\quad b_{ij} = \lambda a_{ij}.$$
The matrix $\lambda A$ is called a scalar multiple of the matrix A.

Remark 10.2.1 Given matrices $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$ of the same size, the difference $A - B$ is defined by combining the operations of matrix addition and scalar multiplication. In particular, we have
$$A - B = A + (-B),$$
where $-B$ is the matrix obtained by multiplying B by the scalar −1. In other words,
$$A - B = (c_{ij})_{m \times n} \quad\text{where}\quad c_{ij} = a_{ij} - b_{ij}.$$

Example 10.2.2 Consider the following three matrices:
$$A = \begin{pmatrix} 3 & 2 & -4 \\ 0 & 9 & 1 \end{pmatrix} \quad B = \begin{pmatrix} 2 & 5 & -6 \\ 7 & 7 & 0 \end{pmatrix} \quad C = \begin{pmatrix} 0 & 1 & -4 & 6 \\ 1 & 0 & 3 & 2 \end{pmatrix}.$$
Are the matrices $3A - 2B$ and $2A + 5C$ defined? Compute those which are defined.

By combining the operations of matrix addition and scalar multiplication, we get:
$$3A - 2B = 3\begin{pmatrix} 3 & 2 & -4 \\ 0 & 9 & 1 \end{pmatrix} - 2\begin{pmatrix} 2 & 5 & -6 \\ 7 & 7 & 0 \end{pmatrix} = \begin{pmatrix} 9 & 6 & -12 \\ 0 & 27 & 3 \end{pmatrix} - \begin{pmatrix} 4 & 10 & -12 \\ 14 & 14 & 0 \end{pmatrix}$$
$$= \begin{pmatrix} 9 & 6 & -12 \\ 0 & 27 & 3 \end{pmatrix} + \begin{pmatrix} -4 & -10 & 12 \\ -14 & -14 & 0 \end{pmatrix} = \begin{pmatrix} 5 & -4 & 0 \\ -14 & 13 & 3 \end{pmatrix}.$$

On the other hand, the matrix 2A + 5C is not defined. This is because the matrices 2A
and 5C do not have the same size, so they cannot be added.

The operation of matrix multiplication produces a product matrix AB from two matrices A and B only if the number of columns of A is equal to the number of rows of B. In particular, given a matrix $A = (a_{ij})_{m \times r}$ and a matrix $B = (b_{ij})_{r \times n}$, their product is the matrix $AB = (c_{ij})_{m \times n}$ whose general entry $c_{ij}$ is given by the rule
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \ldots + a_{ir}b_{rj}.$$
This rule tells us that in order to find the general entry $c_{ij}$ of AB, we need to select row i of A and column j of B, multiply the corresponding entries from the row and column together, and then add up the resulting products.

As an illustration, the product of a $2 \times 4$ matrix M and a $4 \times 3$ matrix N is the $2 \times 3$ matrix MN written down below:
$$MN = \begin{pmatrix} a & b & c & d \\ e & f & g & h \end{pmatrix}\begin{pmatrix} A & B & C \\ D & E & F \\ G & H & I \\ J & K & L \end{pmatrix} = \begin{pmatrix} aA + bD + cG + dJ & aB + bE + cH + dK & aC + bF + cI + dL \\ eA + fD + gG + hJ & eB + fE + gH + hK & eC + fF + gI + hL \end{pmatrix}.$$

Example 10.2.3 Consider the following three matrices:
$$A = \begin{pmatrix} 3 & 2 \\ 0 & 9 \end{pmatrix} \quad B = \begin{pmatrix} 2 & 5 \\ 1 & -4 \end{pmatrix} \quad C = \begin{pmatrix} 0 & 1 & -6 \\ 1 & 1 & 3 \end{pmatrix}.$$

Are the matrices AB, BA, AC and CA defined? Compute those which are defined.

Using the above rule, we find that
$$AB = \begin{pmatrix} 3 & 2 \\ 0 & 9 \end{pmatrix}\begin{pmatrix} 2 & 5 \\ 1 & -4 \end{pmatrix} = \begin{pmatrix} 8 & 7 \\ 9 & -36 \end{pmatrix},$$
$$BA = \begin{pmatrix} 2 & 5 \\ 1 & -4 \end{pmatrix}\begin{pmatrix} 3 & 2 \\ 0 & 9 \end{pmatrix} = \begin{pmatrix} 6 & 49 \\ 3 & -34 \end{pmatrix}$$
and
$$AC = \begin{pmatrix} 3 & 2 \\ 0 & 9 \end{pmatrix}\begin{pmatrix} 0 & 1 & -6 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 5 & -12 \\ 9 & 9 & 27 \end{pmatrix}.$$

On the other hand, the matrix CA is not defined. This is because C is a 2 × 3 matrix and
A is a 2 × 2 matrix, so the number of columns of C is not equal to the number of rows of
A.

Remark 10.2.4 The above example shows that matrix multiplication is not commutative; that is, $MN \neq NM$ in general. On the other hand, many familiar laws of arithmetic are valid for matrices. We summarise these laws in the following subsection.
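As a quick numerical aside (not part of the course materials), the non-commutativity of matrix multiplication is easy to see with Python's numpy library, using the matrices A and B from Example 10.2.3:

    import numpy as np

    A = np.array([[3, 2], [0, 9]])
    B = np.array([[2, 5], [1, -4]])

    print(A @ B)   # [[  8   7]
                   #  [  9 -36]]
    print(B @ A)   # [[  6  49]
                   #  [  3 -34]]
    # The two products differ, so AB != BA.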

10.3 The laws of matrix algebra


The five main laws satisfied by the operations of matrix addition and matrix multiplication
are listed and named below. For all the laws presented in this subsection, it is assumed
that the sizes of the matrices are such that the indicated operations can all be performed.

• Matrix addition is commutative: A + B = B + A,

• Matrix addition is associative: (A + B) + C = A + (B + C),

• Matrix multiplication is associative: (AB)C = A(BC),

• Left distributive law: A(B + C) = AB + AC,

• Right distributive law: (B + C)A = BA + CA.

In addition, there are associative and distributive laws which involve scalar multiplication.
The four main ones are:

• λ(AB) = (λA)B = A(λB),

• λ(µA) = (λµ)A,

• λ(A + B) = λA + λB,

• (λ + µ)A = λA + µA.

Finally, there are laws satisfied by the zero matrix 0 of the appropriate size,

• A + 0 = A,

• A − A = 0,

• 0A = 0,

• A0 = 0,

and laws satisfied by the identity matrix I of the appropriate size,

• AI = A,

• IA = A.

10.4 The inverse matrix and its properties
The operation of matrix inversion applies only to a square matrix $A = (a_{ij})_{n \times n}$. Moreover, not all square matrices can be inverted. We say that a matrix $A = (a_{ij})_{n \times n}$ is invertible if there exists a matrix $B = (b_{ij})_{n \times n}$ such that
$$AB = BA = I_n.$$
Recall that $I_n$ is the $n \times n$ identity matrix. Provided that B exists, it is called the inverse of A and is denoted by $A^{-1}$.

Theorem 10.4.1 If A is an $n \times n$ invertible matrix, then the matrix $A^{-1}$ is unique.

Proof Assume that an invertible $n \times n$ matrix A has two inverses, B and C, so that
$$AB = BA = I_n$$
and
$$AC = CA = I_n.$$
We will show that B and C must actually be the same matrix, thereby proving uniqueness. To this end, consider the product CAB. Since matrix multiplication is associative and $AB = I_n$, we have
$$CAB = C(AB) = CI_n = C.$$
On the other hand, again by associativity and the fact that $CA = I_n$, we have
$$CAB = (CA)B = I_nB = B.$$
We conclude that $C = B$, so there is only one inverse matrix for any invertible $n \times n$ matrix A. □

Some of the most important properties of the inverse matrix are listed below:

• If A is an invertible matrix, then, by definition, $A^{-1}$ exists and is such that
$$AA^{-1} = A^{-1}A = I.$$
This statement can be reinterpreted as saying that A is the inverse of $A^{-1}$, that is,
$$(A^{-1})^{-1} = A.$$

• If A is an invertible matrix, then, for any scalar $\lambda \neq 0$, we have:
$$(\lambda A)^{-1} = \frac{1}{\lambda}A^{-1}.$$
In order to prove that this statement is true, we need to show that the matrix $C = \frac{1}{\lambda}A^{-1}$ satisfies $(\lambda A)C = C(\lambda A) = I$. This is straightforward using the laws of matrix algebra presented in subsection 10.3. We have:
$$(\lambda A)\left(\frac{1}{\lambda}A^{-1}\right) = \left(\lambda\frac{1}{\lambda}\right)AA^{-1} = I \quad\text{and}\quad \left(\frac{1}{\lambda}A^{-1}\right)(\lambda A) = \left(\frac{1}{\lambda}\lambda\right)A^{-1}A = I.$$

• If A and B are invertible matrices of the same order, then:
$$(AB)^{-1} = B^{-1}A^{-1}.$$
The proof of this statement is left as an Exercise.
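As a numerical aside (not part of the course materials, and of course not a proof), the last property can be checked with numpy on randomly generated matrices, which are invertible with probability 1:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))

    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))   # True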

10.5 Powers of a matrix
Given a square matrix A and a natural number $n \in \mathbb{N}$, we define $A^n$ as the product
$$A^n = \underbrace{AA\cdots A}_{n\ \text{times}}.$$
Powers of matrices obey rules similar to those obeyed by powers of numbers. First, if A is an invertible matrix and $n \in \mathbb{N}$, then:
$$(A^n)^{-1} = (A^{-1})^n.$$
The proof of this statement follows immediately from the definition of an inverse matrix and the associativity of matrix multiplication.

In addition, the usual rules for exponents hold; that is, for integers r, s, we have:
$$A^rA^s = A^{r+s}$$
and
$$(A^r)^s = A^{rs}.$$

10.6 Properties of the transpose of a matrix


Some properties of the transpose are summarised below:
$$(A^T)^T = A,$$
$$(\lambda A)^T = \lambda A^T,$$
$$(A + B)^T = A^T + B^T,$$
$$(AB)^T = B^TA^T$$
and finally,
$$(A^T)^{-1} = (A^{-1})^T.$$
The proofs of the first four statements are omitted but they follow directly from the definition of the transpose. The last property follows from the fourth property above and the definition of the inverse. In particular, we have:
$$A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I$$
and, in the same way, $(A^{-1})^TA^T = I$. Therefore, $(A^{-1})^T$ is the inverse of $A^T$.

10.7 Matrix equations and their solutions
A system of m linear equations in n unknowns $x_1, x_2, \ldots, x_n$ is a set of m equations of the form
$$a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n = b_m.$$


The given numbers $a_{ij}$ are known as the coefficients of the system. The numbers $b_1, b_2, \ldots, b_m$ appearing on the right hand side of these equations are also given.

We say that $s_1, s_2, \ldots, s_n$ is a solution of this system if all m equations hold true when $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$.

A system of linear equations is also known as a set of simultaneous equations. It is also


referred to as a linear system.

The matrix $A = (a_{ij})_{m \times n}$, whose (i, j) entry is the coefficient $a_{ij}$ of the linear system, is called the coefficient matrix:
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}.$$
If we denote the $n \times 1$ column vector whose entries are the unknowns $x_1, x_2, \ldots, x_n$ by
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$
and we also denote the $m \times 1$ column vector whose entries are the given numbers $b_1, b_2, \ldots, b_m$ by
$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix},$$
then our linear system can be expressed as the matrix equation
$$Ax = b.$$

Finding the set of all solutions of such a matrix equation will be the main focus of the
following few lectures.
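As a software aside (not part of the course materials), numpy solves such matrix equations directly. The 3 × 3 system below is a hypothetical example chosen only for illustration:

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 3.0, 2.0],
                  [1.0, 0.0, 0.0]])
    b = np.array([4.0, 5.0, 6.0])

    x = np.linalg.solve(A, b)
    print(x)                      # [  6.  15. -23.]
    print(np.allclose(A @ x, b))  # True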

10.8 Exercises for self study
Exercise 10.8.1 If a and b are both column matrices of size $n \times 1$, what is the size of the matrix product $a^Tb$? What is the relationship between $a^Tb$ and $b^Ta$?

Exercise 10.8.2 Let $B = \begin{pmatrix} 2 & 5 \\ 0 & 1 \end{pmatrix}$ and set $B^{-1} = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$. Find numerical values for x, y, z and w and hence determine the matrix $B^{-1}$ by solving the system of equations
$$\begin{pmatrix} 2 & 5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
corresponding to the matrix equation $BB^{-1} = I$. Then, generalise your result by replacing the matrix $B = \begin{pmatrix} 2 & 5 \\ 0 & 1 \end{pmatrix}$ by the general matrix $B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $ad - bc \neq 0$.

Exercise 10.8.3 Consider the matrices
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ 0 & 3 \end{pmatrix} \quad b = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} \quad C = \begin{pmatrix} 1 & 2 & 1 \\ 3 & 0 & -1 \\ 4 & 1 & 1 \end{pmatrix} \quad D = \begin{pmatrix} 0 & 1 \\ 2 & 5 \\ 6 & 3 \end{pmatrix}$$
Which of the following matrix expressions are defined? Compute those which are defined.

(a) $Ab$ (b) $CA$ (c) $A + Cb$ (d) $A + D$ (e) $b^TD$
(f) $DA^T + C$ (g) $b^Tb$ (h) $bb^T$ (i) $Cb$

Exercise 10.8.4 Using the definition of the inverse of a matrix and the rules of matrix algebra found in subsection 10.3, prove that if A and B are invertible matrices of the same order, then AB is invertible and
$$(AB)^{-1} = B^{-1}A^{-1}.$$

10.9 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.1 of our Algebra textbook are all relevant.

11 Matrices, 2 of 3
11.1 Solving systems of n linear equations for n unknowns
We begin our investigation of linear systems by considering systems of n linear equations for n unknowns $x_1, x_2, \ldots, x_n$. In particular, we have
$$a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nn}x_n = b_n$$
for some given numbers $\{a_{ij}\}$ and $\{b_i\}$.

This system can be expressed as a matrix equation of the form
$$Ax = b,$$
where the square matrix A and the column vectors x and b are defined below:
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$

In general, the matrix equation Ax = b may not have a solution, and even if it has a
solution, this solution may not be unique. In order to appreciate this point, let us consider
three simple examples of linear systems, each involving two equations for two unknowns.
We will see that the first system has a unique solution, the second system has no solution
and the third system has infinitely many solutions. For simplicity, the unknowns are
denoted by x and y instead of x1 and x2 .

Example 11.1.1 Consider the linear system
$$2x + 3y = 12$$
$$x + 4y = 11.$$
In order to solve it, we multiply the second equation by 2 to obtain the equivalent system
$$2x + 3y = 12$$
$$2x + 8y = 22$$
and then subtract the first equation from the second one to find the value of y. In this way, we get
$$5y = 10,$$
which implies that
$$y = 2.$$
By substituting this value in any one of the original equations, we find that
$$x = 3.$$
Thus, the system has the unique solution
$$(x, y) = (3, 2).$$

Example 11.1.2 Consider the linear system
$$2x + 6y = 12$$
$$x + 3y = 11.$$
Following the same method, we multiply the second equation by 2 to obtain the equivalent system
$$2x + 6y = 12$$
$$2x + 6y = 22$$
and then subtract the first equation from the second one. We get the contradictory equation
$$0 = 10,$$
which implies that this system has no solution.

Example 11.1.3 Consider the linear system
$$2x - 4y = 10$$
$$x - 2y = 5.$$
Using the same approach, we multiply the second equation by 2 to obtain the equivalent system
$$2x - 4y = 10$$
$$2x - 4y = 10.$$
This implies that our system imposes only a single restriction on the variables x and y. Hence, any pair of values for x and y that satisfies
$$2x - 4y = 10$$
is a solution of this system. In particular, we can assign an arbitrary value $t \in \mathbb{R}$ (known as a free parameter) to any one of the variables x and y, and then solve the equation $2x - 4y = 10$ for the other variable. For example, if we let y play the role of the free parameter $t \in \mathbb{R}$ and make x the subject of the equation $2x - 4y = 10$, we obtain $x = 5 + 2t$. In this way, we express the general solution of our linear system in the so-called parametric form:
$$x = 5 + 2t$$
$$y = t.$$
Clearly, this system has infinitely many solutions. Each solution corresponds to a different value of $t \in \mathbb{R}$. For example, if $t = 0$, we find $(x, y) = (5, 0)$. If $t = 1$, we find $(x, y) = (7, 1)$, and so on.

The first and the third linear systems are called consistent because they admit at least
one solution. The second linear system is called inconsistent because it admits no
solution.
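As a software aside (not part of the course materials), sympy's linsolve function classifies all three systems at once: it returns the unique solution, the empty set, or a parametric family, respectively.

    import sympy as sp

    x, y = sp.symbols('x y')

    # Example 11.1.1: a unique solution
    print(sp.linsolve([2*x + 3*y - 12, x + 4*y - 11], x, y))   # {(3, 2)}
    # Example 11.1.2: inconsistent, no solution
    print(sp.linsolve([2*x + 6*y - 12, x + 3*y - 11], x, y))   # EmptySet
    # Example 11.1.3: infinitely many solutions, in parametric form
    print(sp.linsolve([2*x - 4*y - 10, x - 2*y - 5], x, y))    # {(2*y + 5, y)}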

11.2 Elementary row operations
As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra involved in finding solutions.
needed. The key idea is to perform invertible algebraic operations on the linear system
which are designed to produce a succession of increasingly simpler linear systems. These
systems must all be equivalent to the original system in the sense that they should preserve
its set of solutions. The process terminates when a linear system is reached whose con-
sistency or inconsistency is evident and whose set of solutions, if any, can be immediately
obtained. It turns out that this final system has a coefficient matrix which is always in
reduced row echelon form.

Typically, there are three main algebraic operations involved in this process:

• Multiplying both sides of an equation by a non-zero constant.

• Interchanging two equations.

• Adding a multiple of one equation to another equation.

These operations are known as elementary row operations. It should be immediately


emphasised that they are all invertible operations whose inverses are also elementary row
operations. More precisely, their respective inverses are the following:

• Multiplying both sides of that particular equation by the reciprocal of the constant
used above.

• Interchanging the same two equations again.

• Subtracting the same multiple of the first equation from the second equation. Note
that ‘subtracting ... from’ is equivalent to ‘adding the negative of ... to’, so this last
operation is an elementary row operation in accordance with the definition of that
term.

In subsection 11.3, we will prove the invertibility of each elementary row operation by a
more formal argument which involves invertible matrices, called elementary matrices. For
the time being, let us return to the linear systems presented in Examples 11.1.1, 11.1.2 and
11.1.3 and use elementary row operations to establish the consistency or the inconsistency
of each system and also obtain its set of solutions whenever that system is consistent.

Example 11.2.1 The linear system considered in Example 11.1.1 is
$$2x + 3y = 12$$
$$x + 4y = 11.$$
Our aim is to use a sequence of elementary row operations that result in an equivalent system whose coefficient matrix is in reduced row echelon form. We can start by creating a leading 1 in the first row of the corresponding matrix equation. In order to achieve this, we can either divide the first equation by 2 or simply interchange the two equations. Let us perform the latter operation and denote the corresponding interchanging of the rows by
$$R1 \leftrightarrow R2.$$
In this way, we get the equivalent system
$$x + 4y = 11$$
$$2x + 3y = 12.$$

We now subtract two times the first equation from the second equation. This corresponds to the row operation
$$R2 \to R2 - 2R1$$
and produces the simpler equivalent system
$$x + 4y = 11$$
$$-5y = -10.$$
By multiplying the second equation by $-\frac{1}{5}$, we create a leading 1 in the second row of the corresponding coefficient matrix. This row operation is denoted by
$$R2 \to -\frac{1}{5}R2$$
and yields a system whose coefficient matrix is in row echelon form:
$$x + 4y = 11$$
$$y = 2.$$

Finally, we can transform this matrix into reduced row echelon form by subtracting four times the second equation from the first equation. The corresponding row operation is
$$R1 \to R1 - 4R2$$
and produces the simplest possible equivalent system; namely,
$$x = 3$$
$$y = 2.$$
The system is clearly consistent and has a unique solution which can be read directly. Note that the coefficient matrix of this system is equal to the identity matrix
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
and that this is indeed a matrix in reduced row echelon form.

Example 11.2.2 The linear system considered in Example 11.1.2 is
$$2x + 6y = 12$$
$$x + 3y = 11.$$
We can start again by creating a leading 1 in the first row of the corresponding matrix equation. To this end, we perform the row operation
$$R1 \leftrightarrow R2,$$
which brings our system into the form
$$x + 3y = 11$$
$$2x + 6y = 12.$$
We now subtract two times the first equation from the second equation. The corresponding row operation
$$R2 \to R2 - 2R1$$
produces the equivalent system
$$x + 3y = 11$$
$$0 = -10,$$
which is clearly inconsistent. The coefficient matrix of this last system has a row of zeros,
$$\begin{pmatrix} 1 & 3 \\ 0 & 0 \end{pmatrix},$$
so the process of row-reduction cannot continue further. As with the previous example, note that this final coefficient matrix is indeed in reduced row echelon form.

Example 11.2.3 The linear system considered in Example 11.1.3 is
$$2x - 4y = 10$$
$$x - 2y = 5.$$
Now, the elementary row operation
$$R1 \leftrightarrow R2$$
yields
$$x - 2y = 5$$
$$2x - 4y = 10$$
and the elementary row operation
$$R2 \to R2 - 2R1$$
produces the equivalent system
$$x - 2y = 5$$
$$0 = 0.$$
The coefficient matrix of this system has the reduced row echelon form
$$\begin{pmatrix} 1 & -2 \\ 0 & 0 \end{pmatrix},$$
so the process cannot continue further. As with the previous example, this matrix contains a row of zeros. However, our current system does admit solutions. In fact, the entire set of solutions is readily available: by assigning the role of the free parameter $t \in \mathbb{R}$ to the variable y and solving the first equation for the variable x, we arrive at the general solution of our system expressed in parametric form:
$$x = 5 + 2t$$
$$y = t.$$

11.3 Elementary matrices and row-equivalence
It is an interesting fact that all the elementary row operations used in the previous subsection can themselves be represented by square matrices of the appropriate order. For the examples presented in subsection 11.2, the appropriate order is 2. In the general case, where the system to which elementary row operations are applied is a linear system of n equations for n unknowns, the appropriate order is n. These square matrices are known as elementary matrices.

Formally, an elementary matrix E is defined as an $n \times n$ matrix obtained from the identity matrix $I_n$ by performing exactly one elementary row operation on $I_n$. For example,
$$\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 7 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

are all elementary matrices. The first elementary matrix is obtained from $I_2$ by multiplying the second row of $I_2$ by 3. This matrix operates (in a way to be explained below) on a linear system of two equations for two unknowns. It represents the elementary row operation corresponding to multiplying the second equation of such a system by 3. The second elementary matrix is obtained from $I_3$ by interchanging the second and third rows of $I_3$. This matrix operates on a linear system of three equations for three unknowns. It represents the elementary row operation corresponding to interchanging the second and third equations of such a system. Finally, the third elementary matrix is obtained from $I_4$ by adding seven times the second row to the third row of $I_4$. This matrix operates on a linear system of four equations for four unknowns. It represents the elementary row operation corresponding to adding seven times the second equation to the third equation of such a system.

In order to illustrate how elementary matrices operate on linear systems, let us return to
Example 11.2.1 and find all the elementary matrices involved in the reduction of this system
to its simplest possible equivalent form. For notational convenience, let us introduce the
so-called augmented matrix of the linear system Ax = b, denoted by

(A|b).

As its notation suggests, the augmented matrix is obtained by appending the column vector
b to the matrix A as the last column. When actual numerical entries are used, the vertical
bar is often omitted.

Example 11.3.1 The augmented matrix of the linear system considered in Example 11.2.1 is
$$(A|b) = \begin{pmatrix} 2 & 3 & 12 \\ 1 & 4 & 11 \end{pmatrix}.$$
The sequence of elementary row operations originally used in Example 11.2.1 in order to transform this system into an equivalent system in reduced row echelon form is listed again below:
$$E1: R1 \leftrightarrow R2$$
$$E2: R2 \to R2 - 2R1$$
$$E3: R2 \to -\frac{1}{5}R2$$
$$E4: R1 \to R1 - 4R2.$$

The way in which elementary matrices operate on the augmented matrix of a linear system is by left matrix multiplication. For example, the matrix product $E_1(A|b)$ represents the very first elementary row operation performed on our system. Since the augmented matrix of our system is a $2 \times 3$ matrix, the elementary matrices $E_1$, $E_2$, $E_3$ and $E_4$ representing the row operations E1, E2, E3 and E4 must be $2 \times 2$ matrices. Otherwise, left matrix multiplication cannot be defined. Recall that, by its very construction, a $2 \times 2$ elementary matrix E representing an elementary row operation is obtained from the identity matrix $I_2$ by performing the exact same elementary row operation on $I_2$. Hence, we find that
$$E_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad E_2 = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \quad E_3 = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{pmatrix} \quad E_4 = \begin{pmatrix} 1 & -4 \\ 0 & 1 \end{pmatrix}.$$

Let us now consider the corresponding sequence of equivalent linear systems arising in this process, starting from the original system and terminating with the system in reduced row echelon form. Each such system is represented by its augmented matrix, which is denoted below by $(A_i|b_i)$. The original system is denoted by $(A|b)$, the system obtained after performing the elementary operation E1 is denoted by $(A_1|b_1)$, the system obtained after performing E2 is denoted by $(A_2|b_2)$, and so on. The following sequence is copied from Example 11.2.1:
$$\text{Original system}: (A|b) = \begin{pmatrix} 2 & 3 & 12 \\ 1 & 4 & 11 \end{pmatrix}$$
$$\text{After performing E1}: (A_1|b_1) = \begin{pmatrix} 1 & 4 & 11 \\ 2 & 3 & 12 \end{pmatrix}$$
$$\text{After performing E2}: (A_2|b_2) = \begin{pmatrix} 1 & 4 & 11 \\ 0 & -5 & -10 \end{pmatrix}$$
$$\text{After performing E3}: (A_3|b_3) = \begin{pmatrix} 1 & 4 & 11 \\ 0 & 1 & 2 \end{pmatrix}$$
$$\text{After performing E4}: (A_4|b_4) = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \end{pmatrix}.$$

It is now straightforward to confirm that the elementary matrices $E_1$, $E_2$, $E_3$ and $E_4$ reproduce the above sequence of augmented matrices by means of left matrix multiplication. Specifically,
$$E_1(A|b) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 & 3 & 12 \\ 1 & 4 & 11 \end{pmatrix} = \begin{pmatrix} 1 & 4 & 11 \\ 2 & 3 & 12 \end{pmatrix} = (A_1|b_1)$$
$$E_2(A_1|b_1) = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 4 & 11 \\ 2 & 3 & 12 \end{pmatrix} = \begin{pmatrix} 1 & 4 & 11 \\ 0 & -5 & -10 \end{pmatrix} = (A_2|b_2)$$
$$E_3(A_2|b_2) = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{pmatrix}\begin{pmatrix} 1 & 4 & 11 \\ 0 & -5 & -10 \end{pmatrix} = \begin{pmatrix} 1 & 4 & 11 \\ 0 & 1 & 2 \end{pmatrix} = (A_3|b_3)$$
$$E_4(A_3|b_3) = \begin{pmatrix} 1 & -4 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 4 & 11 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \end{pmatrix} = (A_4|b_4).$$
Note that the coefficient matrix $A_4$ is in reduced row echelon form and equal to the identity matrix I. In other words, we have the result that
$$E_4E_3E_2E_1(A|b) = (I|b_4),$$
which implies, in particular, that
$$E_4E_3E_2E_1A = I.$$

We will analyse this expression further in the following two subsections, where we will show that not only does this expression imply that the matrix A is invertible, but it also provides an algorithm for calculating the inverse matrix $A^{-1}$. Of course, as it stands, the expression $E_4E_3E_2E_1A = I$ is not valid for every linear system of n equations for n unknowns. This is because, in the general case, the reduced row echelon form of the matrix A may not be equal to the identity matrix I. Instead, it may contain one or more rows of zeros, as it turned out to be the case in Examples 11.2.2 and 11.2.3.

Let us therefore generalise the expression
$$E_4E_3E_2E_1A = I$$
by replacing the identity matrix I by the reduced row echelon form RRE(A) of the coefficient matrix A, and by considering k elementary row operations instead of the above 4. In this way, we derive the general expression
$$E_kE_{k-1}\cdots E_2E_1A = \mathrm{RRE}(A),$$
which is valid for any linear system of n equations for n unknowns.

In fact, this result can be regarded as a corollary of a more general theorem. In order to
be able to state this theorem, we need one more definition: If A and B are matrices of the
same size, we say that A is row-equivalent to B only if there is a sequence of elementary
row operations that transforms A to B. If this is the case, B is also row-equivalent to A.

Theorem 11.3.5 Every matrix is row-equivalent to a matrix in reduced row echelon form.

The proof is omitted. Note that, using the terminology just introduced, our result
$$E_kE_{k-1}\cdots E_2E_1A = \mathrm{RRE}(A)$$
amounts to the statement that the coefficient matrix A of a linear system of n equations for n unknowns is row-equivalent to its reduced row echelon form RRE(A).
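As a numerical aside (not part of the course materials), the factorisation obtained in Example 11.3.1 can be checked with numpy: multiplying the four elementary matrices into A should reproduce the identity matrix.

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [1.0, 4.0]])

    E1 = np.array([[0, 1], [1, 0]])      # R1 <-> R2
    E2 = np.array([[1, 0], [-2, 1]])     # R2 -> R2 - 2 R1
    E3 = np.array([[1, 0], [0, -0.2]])   # R2 -> -(1/5) R2
    E4 = np.array([[1, -4], [0, 1]])     # R1 -> R1 - 4 R2

    print(E4 @ E3 @ E2 @ E1 @ A)         # [[1. 0.]
                                         #  [0. 1.]]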

11.4 Theorems on matrix invertibility


Recall that elementary row operations are invertible in the sense that their effect can be
undone by applying the inverse elementary row operations. Let us now formalise this result
by considering the elementary matrices that represent them.

Theorem 11.4.1 Every elementary matrix E is invertible, and the inverse $E^{-1}$ is also an elementary matrix.

Proof If E is an elementary matrix of order n, then E is obtained from the identity matrix $I_n$ by performing some elementary row operation on $I_n$. Given that any elementary row operation can be undone by its inverse row operation, let F be the elementary matrix obtained from $I_n$ by performing that inverse row operation on $I_n$. Since these operations cancel the effect of each other, we have that
$$FE = I_n \quad\text{and}\quad EF = I_n.$$
Thus, E is an invertible matrix, and F is its inverse:
$$F = E^{-1}.$$
This completes the proof. □

We are now ready to answer an important question: ‘When is a matrix invertible?’ We collect several results in the following theorem, referred to as the main theorem on matrix invertibility:

The Main Theorem 11.4.2 If A is an n × n matrix, then the following statements are
all equivalent. This means that if any of these statements is true for a square matrix A,
then all the statements are true for A. Conversely, if any of these statements is false, then
all the statements are false.

(1) $A^{-1}$ exists.

(2) $Ax = b$ has a unique solution for any $b \in \mathbb{R}^n$.

(3) $Ax = 0$ only has the trivial solution $x = 0$.

(4) The reduced row echelon form of A is I.

Proof Note that if we show that (1) $\implies$ (2) $\implies$ (3) $\implies$ (4) $\implies$ (1), then any one statement will imply all the others, so the statements are equivalent.

(1) $\implies$ (2). We assume that $A^{-1}$ exists, and consider the system of linear equations $Ax = b$, where x is the vector of unknowns and b is any vector in $\mathbb{R}^n$. We use the matrix $A^{-1}$ to solve for x by multiplying the equation on the left by $A^{-1}$. We have
$$A^{-1}Ax = A^{-1}b \implies Ix = A^{-1}b \implies x = A^{-1}b.$$
This shows that $x = A^{-1}b$ is the only possible solution; and it is a solution, since $A(A^{-1}b) = (AA^{-1})b = Ib = b$. So $Ax = b$ has a unique solution for any $b \in \mathbb{R}^n$.

(2) $\implies$ (3). If $Ax = b$ has a unique solution for all $b \in \mathbb{R}^n$, then this is true for $b = 0$. The unique solution of $Ax = 0$ must be the trivial solution, $x = 0$.

(3) $\implies$ (4). If the only solution of $Ax = 0$ is $x = 0$, then the reduced row echelon form of A must have a leading 1 in every column. Otherwise, there would be at least one row of zeros in this matrix, which would imply the presence of free parameters and hence the occurrence of infinitely many solutions. Since the reduced row echelon form of A is a square matrix and has a leading 1 in every column, it can only be the $n \times n$ identity matrix.

(4) $\implies$ (1). We now make use of elementary matrices. If A is row-equivalent to I, then there is a sequence of row operations which reduce A to I, so there must exist elementary matrices $E_1, \ldots, E_k$ such that
$$E_kE_{k-1}\cdots E_1A = I.$$
Each elementary matrix has an inverse. We use these to solve the above equation for A, by first multiplying the equation on the left by $E_k^{-1}$, then by $E_{k-1}^{-1}$, and so on, to obtain
$$A = E_1^{-1}\cdots E_{k-1}^{-1}E_k^{-1}I.$$
This says that A is a product of invertible matrices, hence invertible.

This completes the proof that the above four statements are all equivalent. □

Let us complete this subsection by including a theorem about the inverse of an invertible square matrix A.

Recall that the inverse of such a matrix A is a matrix $A^{-1}$ with the property that $A^{-1}A = I$ and $AA^{-1} = I$. A question arises:

If we are given a square matrix B such that B is a right inverse of A, i.e.,
$$AB = I,$$
is it the case that B is also a left inverse of A? In other words, does it follow that
$$BA = I$$
and hence that $B = A^{-1}$? The answer is yes, as the following theorem guarantees:

Theorem 11.4.3 If A and B are $n \times n$ matrices and $AB = I$, then A and B are each invertible matrices, and $A = B^{-1}$ and $B = A^{-1}$.

Proof If we show that the homogeneous system of equations Bx = 0 has only the trivial
solution, x = 0, then by Theorem 11.4.2 this will prove that B is invertible. So we consider

the matrix equation $Bx = 0$ and multiply both sides of this equation on the left by the matrix A. We have:
$$Bx = 0 \implies A(Bx) = A0 \implies (AB)x = 0.$$
But we are given that $AB = I$, so that
$$(AB)x = 0 \implies Ix = 0 \implies x = 0.$$
This shows that the only solution of $Bx = 0$ is the trivial solution. We therefore conclude that B is invertible, so the matrix $B^{-1}$ exists.

We now multiply both sides of the equation $AB = I$ on the right by the matrix $B^{-1}$. We have:
$$AB = I \implies (AB)B^{-1} = IB^{-1} \implies A(BB^{-1}) = B^{-1} \implies A = B^{-1}.$$
So A is the inverse of B, and therefore A is also an invertible matrix. Then taking inverses of both sides of the last equation, we conclude that $A^{-1} = (B^{-1})^{-1} = B$. □

Note that this theorem also implies that if a square matrix A is a left inverse of a square
matrix B, then A is also a right inverse of B.

11.5 Inversion algorithm based on elementary row operations


Recall that the application of k elementary row operations to an arbitrary linear system of n equations for n unknowns results in a matrix equation of the form
$$E_kE_{k-1}\cdots E_2E_1A = \mathrm{RRE}(A).$$
Now, from statement (4) of the Main Theorem 11.4.2, we know that if A is an invertible matrix of order n, the reduced row echelon form of A is the $n \times n$ identity matrix I. Hence we have
$$E_kE_{k-1}\cdots E_2E_1A = I.$$
Moreover, the square matrix $E_kE_{k-1}\cdots E_2E_1$ is a left inverse of A. By Theorem 11.4.3, it is equal to $A^{-1}$:
$$E_kE_{k-1}\cdots E_2E_1 = A^{-1}.$$
By inserting the identity matrix at the end of the matrix product on the left hand side of the above equation, we obtain the statement
$$E_kE_{k-1}\cdots E_2E_1I = A^{-1},$$
which together with the statement
$$E_kE_{k-1}\cdots E_2E_1A = I$$
tells us that the same row operations that transform A to I also transform I to $A^{-1}$.

This gives us a method for finding the inverse of an invertible square matrix A. We start
with the matrix A and we form a new, larger matrix by placing the identity matrix to the
right of A, obtaining the matrix denoted by (A|I). We then use row operations to reduce
this to (I|B). If this is not possible (because we get a row of zeros) then the matrix is not
invertible. If it can be done, then A is invertible and B = A−1 .

Example 11.5.1 Let us use this algorithm to show that the matrix A encountered in Example 11.1.1 is invertible. Then let us find $A^{-1}$ and use it to obtain the unique solution of the corresponding linear system
$$2x + 3y = 12$$
$$x + 4y = 11.$$
We construct the matrix
$$(A|I) = \begin{pmatrix} 2 & 3 & 1 & 0 \\ 1 & 4 & 0 & 1 \end{pmatrix}$$

and perform a sequence of row operations that transform this matrix into $(I|A^{-1})$ provided, of course, that this is possible. One such sequence of row operations is given below:
$$R1 \leftrightarrow R2: \begin{pmatrix} 1 & 4 & 0 & 1 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
$$R2 \to R2 - 2R1: \begin{pmatrix} 1 & 4 & 0 & 1 \\ 0 & -5 & 1 & -2 \end{pmatrix}$$
$$R2 \to -\frac{1}{5}R2: \begin{pmatrix} 1 & 4 & 0 & 1 \\ 0 & 1 & -\frac{1}{5} & \frac{2}{5} \end{pmatrix}$$
$$R1 \to R1 - 4R2: \begin{pmatrix} 1 & 0 & \frac{4}{5} & -\frac{3}{5} \\ 0 & 1 & -\frac{1}{5} & \frac{2}{5} \end{pmatrix}$$
It follows that
$$A^{-1} = \begin{pmatrix} \frac{4}{5} & -\frac{3}{5} \\ -\frac{1}{5} & \frac{2}{5} \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}.$$
Hence, by the Main Theorem 11.4.2, the solution of the system
$$2x + 3y = 12$$
$$x + 4y = 11$$
is unique. In order to obtain this solution, we multiply the matrix equation $Ax = b$ by $A^{-1}$ on the left to obtain
$$A^{-1}Ax = A^{-1}b,$$
which implies that
$$x = A^{-1}b = \frac{1}{5}\begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 12 \\ 11 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 15 \\ 10 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$
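As a software aside (not part of the course materials), the whole algorithm can be replayed with sympy: form the augmented matrix (A|I), row-reduce it, and read off the right-hand block.

    import sympy as sp

    A = sp.Matrix([[2, 3], [1, 4]])

    # Form (A|I) and compute its reduced row echelon form.
    augmented = A.row_join(sp.eye(2))
    reduced, _ = augmented.rref()

    A_inv = reduced[:, 2:]   # the right-hand block of (I|A^{-1})
    print(A_inv)                         # Matrix([[4/5, -3/5], [-1/5, 2/5]])
    print(A_inv * sp.Matrix([12, 11]))   # Matrix([[3], [2]])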

11.6 Exercises for self study
Exercise 11.6.1 Use the algorithm based on elementary row operations to investigate if the following matrices are invertible. For each matrix that is invertible, find its inverse.
$$\text{(a)}\ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{pmatrix} \qquad \text{(b)}\ B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 0 \\ 0 & 1 & 6 \end{pmatrix}$$

Exercise 11.6.2 Given a system of equations
$$Ax = b$$
where b may represent several different vectors, it is often more practical to find $A^{-1}$ (if it exists) and then obtain the solutions using $x = A^{-1}b$. Following this approach, use your inverse $A^{-1}$ of the matrix A obtained in Exercise 11.6.1 and solve the linear system $Ax = b$ in each of the following cases:
$$\text{(i)}\ b = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \qquad \text{(ii)}\ b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \qquad \text{(iii)}\ b = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$

Exercise 11.6.3 (a) Use elementary row operations to solve the linear system
$$2x + 5y + z = 4$$
$$6x + 15y + 2z = 9$$
$$z = 3.$$

(b) Then use the algorithm based on elementary row operations to investigate if the coef-
ficient matrix of the above system is invertible. Does your answer agree with the solution
obtained in part (a)?

Exercise 11.6.4 (a) Find a sequence E1, E2, . . . , Ek of elementary row operations in order to obtain the reduced row echelon form RRE(A|b) of the augmented matrix $(A|b)$ of the following system:
$$x - 5y = -9$$
$$2x + 4y = 10.$$
(b) Hence find the elementary matrices $E_1$, $E_2$, . . . , $E_k$ corresponding to the elementary row operations that you performed in part (a).
(c) Calculate the product matrices
$$E_kE_{k-1}\cdots E_2E_1A$$
and
$$E_kE_{k-1}\cdots E_2E_1b.$$
What does each of these matrices tell you about the linear system in part (a)?

11.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Section 3.1 of our Algebra textbook is relevant.

12 Matrices, 3 of 3
12.1 Minors, cofactors and the determinant
The determinant of an $n \times n$ matrix A, denoted by $|A|$ or $\det A$, is a number derived from A which determines whether or not the matrix A is invertible. By the Main Theorem 11.4.2, a matrix A is invertible if and only if the corresponding matrix equation $Ax = b$ has a unique solution.

In order to appreciate how the issue of the uniqueness of the solution of the matrix equation
Ax = b boils down to a single number |A|, let us investigate the simplest possible matrix
equation of the form Ax = b; namely, one whose coefficient matrix is a 1 × 1 matrix.
Our task is to derive a condition which guarantees the uniqueness of the solution of the
equation Ax = b and then define the determinant |A| as a number which is characteristic
of this uniqueness property.

With A, x and b all being 1 × 1 matrices and given by

$$A = (a), \quad x = (x), \quad b = (b),$$

the equation ax = b has the unique solution

$$x = \frac{b}{a}$$

only if a ≠ 0. Thus, if we define the determinant of a 1 × 1 matrix A by

$$|A| = |(a)| = a,$$

we arrive at the statement that the matrix A is invertible if and only if |A| ≠ 0.

Let us repeat this procedure for an equation Ax = b whose coefficient matrix is a 2 × 2 matrix. Setting

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$

and using elementary row operations, it is not difficult to show that the equation Ax = b has the unique solution

$$x_1 = \frac{a_{22}b_1 - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}} \quad \text{and} \quad x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{12}a_{21}}$$

only if $a_{11}a_{22} - a_{12}a_{21} \neq 0$. Thus, if we define the determinant of a 2 × 2 matrix A by

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21},$$

we again arrive at the statement that the matrix A is invertible if and only if |A| ≠ 0.

For matrix equations Ax = b whose coefficient matrix A is of order higher than 2, this
approach is rather impractical. Luckily, a recursive formula can be established which
expresses the determinant of the n × n matrix A in terms of the determinants of (n − 1) ×
(n − 1) sub-matrices of A. These sub-matrices can then be expressed in terms of their own
(n−2)×(n−2) sub-matrices, and so on, until the original determinant is expressed entirely
in terms of the determinants of 1 × 1 matrices. These last determinants can be calculated
directly since, by definition, they are equal to their corresponding entries, according to
|(a)| = a.

Remark 12.1.1 Provided that the formula

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

giving the determinant of a 2 × 2 matrix is memorised, the recursive procedure described above does not need to go all the way, but can instead be terminated once 2 × 2 matrices are reached.

The determinants of the above-mentioned sub-matrices of A are known as the (i, j) minors
of A. There is also a corresponding concept known as the (i, j) cofactors of A. The relevant
definitions follow:

Given an n × n matrix A, the (i, j) minor of A, denoted by $M_{ij}$, is the determinant of the (n − 1) × (n − 1) matrix obtained by removing the ith row and jth column of A.

The (i, j) cofactor of a matrix A, denoted by $C_{ij}$, is derived from the corresponding minor of A by using the formula

$$C_{ij} = (-1)^{i+j} M_{ij}.$$

In other words, the (i, j) cofactor of A is equal to the (i, j) minor of A if i + j is even, and it is equal to the negative of that minor if i + j is odd.

Example 12.1.2 Consider the 3 × 3 matrix

$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}.$$

Let us write down the (i, j) minors of A and their corresponding cofactors:

Since A has nine entries, there are nine minors. The (i, j) minor of A is the determinant of the 2 × 2 matrix obtained by removing the ith row and jth column of A. We have:

$$M_{11} = \begin{vmatrix} e & f \\ h & i \end{vmatrix}, \quad M_{12} = \begin{vmatrix} d & f \\ g & i \end{vmatrix}, \quad M_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix},$$

$$M_{21} = \begin{vmatrix} b & c \\ h & i \end{vmatrix}, \quad M_{22} = \begin{vmatrix} a & c \\ g & i \end{vmatrix}, \quad M_{23} = \begin{vmatrix} a & b \\ g & h \end{vmatrix},$$

$$M_{31} = \begin{vmatrix} b & c \\ e & f \end{vmatrix}, \quad M_{32} = \begin{vmatrix} a & c \\ d & f \end{vmatrix}, \quad M_{33} = \begin{vmatrix} a & b \\ d & e \end{vmatrix}.$$

Using the formula $C_{ij} = (-1)^{i+j} M_{ij}$, we obtain the corresponding nine cofactors of A. We have:

$$C_{11} = \begin{vmatrix} e & f \\ h & i \end{vmatrix}, \quad C_{12} = -\begin{vmatrix} d & f \\ g & i \end{vmatrix}, \quad C_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix},$$

$$C_{21} = -\begin{vmatrix} b & c \\ h & i \end{vmatrix}, \quad C_{22} = \begin{vmatrix} a & c \\ g & i \end{vmatrix}, \quad C_{23} = -\begin{vmatrix} a & b \\ g & h \end{vmatrix},$$

$$C_{31} = \begin{vmatrix} b & c \\ e & f \end{vmatrix}, \quad C_{32} = -\begin{vmatrix} a & c \\ d & f \end{vmatrix}, \quad C_{33} = \begin{vmatrix} a & b \\ d & e \end{vmatrix}.$$

Remark 12.1.3 The formula $C_{ij} = (-1)^{i+j} M_{ij}$ says the following: The position (i, j) of the (i, j) entry in A is associated with a '+' sign or a '−' sign according to the alternating pattern

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}.$$
It should be emphasised that this allocation of signs refers to the positions of the entries
in A. It does not refer to the values of these entries, which may be positive, negative or
zero.

This pattern of signs (which can be extended to any n × n matrix in an obvious way) allows the connection between cofactors and minors to become transparent. For example, if we want to calculate the (1, 3) cofactor of A, the '+' sign in position (1, 3) informs us that the (1, 3) cofactor is equal to the (1, 3) minor:

$$C_{13} = +M_{13} = \begin{vmatrix} d & e \\ g & h \end{vmatrix}.$$

On the other hand, if we want to calculate, say, the (2, 1) cofactor of A, the '−' sign in position (2, 1) informs us that the (2, 1) cofactor is equal to the negative of the (2, 1) minor:

$$C_{21} = -M_{21} = -\begin{vmatrix} b & c \\ h & i \end{vmatrix}.$$

With these definitions in place, we can provide a recursive formula for calculating the
determinant of any n × n matrix A: Choose any row or any column of A and multiply
each entry in that row or column by the corresponding cofactor of A. Then sum these
individual products to obtain the determinant |A| of A.

Example 12.1.4 Let us use this recursive definition of the determinant in order to confirm that the determinant of the 2 × 2 matrix

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

is given by the expression $a_{11}a_{22} - a_{12}a_{21}$. Recall that this expression was previously obtained by considering the uniqueness of the solution of the matrix equation Ax = b.

Let us choose the first row of A in order to obtain the so-called 'cofactor expansion of |A| by row 1'. We have:

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}C_{11} + a_{12}C_{12} = a_{11}M_{11} - a_{12}M_{12} = a_{11}|(a_{22})| - a_{12}|(a_{21})| = a_{11}a_{22} - a_{12}a_{21}.$$

Alternatively, let us choose the second column of A to obtain the 'cofactor expansion of |A| by column 2'. We have:

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{12}C_{12} + a_{22}C_{22} = -a_{12}M_{12} + a_{22}M_{22} = -a_{12}|(a_{21})| + a_{22}|(a_{11})| = -a_{12}a_{21} + a_{11}a_{22}.$$


Note that both calculations give the same answer for |A|.

Example 12.1.5 Find the determinant |A| of the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{pmatrix}.$$

Since we are free to expand |A| by any row or column, it is convenient to choose a row or a column which involves the fewest calculations. Here, we see that $a_{33} = 0$, so expanding by either row 3 or column 3 is equally convenient. Let us choose the third column. Recall that the signs associated with the positions of the entries in A are

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}.$$

This means, in particular, that the signs associated with the positions of the entries in the third column of A are

$$\begin{pmatrix} + \\ - \\ + \end{pmatrix}.$$

Thus, using the above column of signs and the corresponding minors of A, we find:

$$|A| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{vmatrix} = +3\begin{vmatrix} 4 & 1 \\ -1 & 3 \end{vmatrix} - 1\begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix} + 0\begin{vmatrix} 1 & 2 \\ 4 & 1 \end{vmatrix} = 3(12 + 1) - 1(3 + 2) = 34.$$

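The recursive definition translates almost word for word into code. The following Python sketch (the function name `cofactor_det` is ours, and Python is assumed purely for illustration) expands by row 1 at every level of the recursion and reproduces the value |A| = 34 obtained above:

```python
def cofactor_det(M):
    """Determinant via cofactor expansion by row 1 (naive recursion)."""
    n = len(M)
    if n == 1:                       # base case: |(a)| = a
        return M[0][0]
    total = 0
    for j in range(n):
        # the (1, j+1) minor: delete row 1 and column j+1
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * cofactor_det(minor)
    return total

A = [[1, 2, 3],
     [4, 1, 1],
     [-1, 3, 0]]
print(cofactor_det(A))   # 34
```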
Example 12.1.6 Does the following linear system admit a unique solution?

$$\begin{cases} x + 2y + 3z = 12 \\ 4x + y + z = -7 \\ -x + 3y = 28 \end{cases}$$

The coefficient matrix of this system is given by

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{pmatrix}.$$

The fastest way of establishing whether or not this system has a unique solution (if a solution actually exists) is to calculate the determinant |A|. The determinant of the matrix A was calculated in Example 12.1.5, so we know that |A| = 34. Since |A| ≠ 0, the system Ax = b has a unique solution.

For completeness, let us discuss ways of obtaining this solution. Recall from Lecture Notes
11 that one way of finding this solution is to use elementary row operations in order to
transform the matrix equation Ax = b into an equivalent equation in reduced row echelon
form; that is, Ix = s. The identity matrix I is simply the reduced row echelon form of A,
and the vector s is the unique solution. An alternative way of solving this system is to use
elementary row operations in order to derive the matrix (I|A−1 ) from the matrix (A|I)
and then obtain the unique solution s by means of the matrix equation s = A−1 b.

12.2 The properties of the determinant


Several results about determinants are collected below and stated as theorems. These
results arise frequently in both theoretical considerations and practical problems. They
are quite easy to memorise.

Theorem 12.2.1 If A is an n × n matrix, then

$$|A^T| = |A|.$$

Proof This theorem follows from the result stated without proof in subsection 12.1 that all cofactor expansions of a matrix A agree. Since the cofactor expansion of $|A^T|$ by row i is precisely the same, number for number, as the cofactor expansion of |A| by column i, we immediately deduce that $|A^T| = |A|$.

Theorem 12.2.2 If a row or a column of an n × n matrix A consists entirely of zeros,


then |A| = 0.

Proof We just need to realise that the cofactor expansion of |A| by that particular row or
column results in every cofactor being multiplied by 0. Since all these individual products
are zero, their sum and hence |A| is also zero.

Example 12.2.3 Find the determinant of the matrix A given below and also the determinant of its transpose $A^T$:

$$A = \begin{pmatrix} 0 & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}.$$

The cofactor expansion of |A| by row 1 gives

$$|A| = \begin{vmatrix} 0 & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} = 0\,C_{11} + 0\,C_{12} + \dots + 0\,C_{1n} = 0.$$

Alternatively, we can use Theorem 12.2.2 to obtain the same answer.

Similarly, the cofactor expansion of $|A^T|$ by column 1 gives

$$|A^T| = \begin{vmatrix} 0 & a_{21} & \dots & a_{n1} \\ 0 & a_{22} & \dots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{2n} & \dots & a_{nn} \end{vmatrix} = 0.$$

Alternatively, we can apply Theorem 12.2.2 one more time, or we can combine Theorem
12.2.1 with the fact that |A| = 0.

Theorem 12.2.4 If an n × n matrix A contains two rows or two columns which are
equal, then |A| = 0.

Proof We will only provide a sketch of this proof, as it requires the method of induction.
Moreover, we will only consider the case of ‘equal rows’. This is because if this result holds
for ‘equal rows’ then, by Theorem 12.2.1, it also holds for ‘equal columns’.

Let us first consider a 2 × 2 matrix A with two equal rows. We have

$$|A| = \begin{vmatrix} a & b \\ a & b \end{vmatrix} = ab - ab = 0.$$

Now consider a 3 × 3 matrix with two equal rows. If we expand the determinant by the other row, then each entry in that row is multiplied by a cofactor which is zero. This is because any such cofactor is equal (up to sign) to the determinant of a 2 × 2 matrix with two equal rows. Since each individual such product is zero, their sum is zero and so is the determinant. For example,

$$|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ a & b & c \end{vmatrix} = -d\begin{vmatrix} b & c \\ b & c \end{vmatrix} + e\begin{vmatrix} a & c \\ a & c \end{vmatrix} - f\begin{vmatrix} a & b \\ a & b \end{vmatrix} = 0 + 0 + 0 = 0.$$

In a similar way, one can show that if this result is valid for (n − 1) × (n − 1) matrices, then it is also valid for n × n matrices. Hence, by the method of induction, it is valid for all square matrices of order n ≥ 2.

Example 12.2.5 Calculate the determinant of the following matrix:

$$\begin{pmatrix} 2 & 3 & -6 & 2 & 8 \\ 1 & 7 & 8 & 1 & -9 \\ 3 & 2 & 3 & 3 & -1 \\ 4 & 5 & -7 & 4 & 0 \\ 9 & 1 & 1 & 9 & 8 \end{pmatrix}.$$

We observe that the first and fourth columns are equal. Hence, by Theorem 12.2.4, the
determinant of this matrix is zero.

Theorem 12.2.6 If the cofactors of one row are multiplied by the entries of a different row and these products are added, then the result is 0. In other words,

$$a_{j1}C_{i1} + a_{j2}C_{i2} + \dots + a_{jn}C_{in} = 0$$

for all i and j such that i ≠ j.

The proof of this result is omitted, but it can be found in section 3.3 of our Algebra
Textbook under ‘proof of Corollary 3.30’. Note that, by Theorem 12.2.1, the above result
remains valid if ‘row’ is replaced by ‘column’.

Example 12.2.7 Consider the matrix

$$\begin{pmatrix} 1 & 3 & 0 \\ 4 & -2 & 3 \\ 5 & 1 & 2 \end{pmatrix}.$$

Let us multiply the cofactors of the first column by the entries of the second column and add these products. We have:

$$a_{12}C_{11} + a_{22}C_{21} + a_{32}C_{31} = a_{12}M_{11} - a_{22}M_{21} + a_{32}M_{31} = (3)(-4 - 3) - (-2)(6 - 0) + (1)(9 - 0) = -21 + 12 + 9 = 0,$$


in agreement with the ‘column version’ of Theorem 12.2.6.

Theorem 12.2.8 If an n × n matrix A is upper triangular, lower triangular or diagonal, then

$$|A| = a_{11}a_{22}\cdots a_{nn}.$$

The proof of this theorem is omitted, but the result itself is quite straightforward to
understand. The following example provides a visualisation.

Example 12.2.9 Find the determinant of the following upper triangular matrix:

$$A = \begin{pmatrix} 1 & 3 & 0 & 5 \\ 0 & -2 & 1 & 4 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$

We expand the determinant |A| by the first column of A to get

$$|A| = (1)\begin{vmatrix} -2 & 1 & 4 \\ 0 & 2 & 3 \\ 0 & 0 & 4 \end{vmatrix} + 0 + 0 + 0.$$

Expanding by the first column again, we find

$$|A| = (1)(-2)\begin{vmatrix} 2 & 3 \\ 0 & 4 \end{vmatrix} + 0 + 0.$$

One final expansion by the first column yields the final answer:

$$|A| = (1)(-2)(2)(4) = -16.$$

Alternatively, we can use Theorem 12.2.8 to get the same answer directly.

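Assuming Python with numpy purely for illustration, Theorem 12.2.8 is easy to confirm on the matrix of Example 12.2.9:

```python
import numpy as np

# Upper triangular matrix from Example 12.2.9
A = np.array([[1.0, 3.0, 0.0, 5.0],
              [0.0, -2.0, 1.0, 4.0],
              [0.0, 0.0, 2.0, 3.0],
              [0.0, 0.0, 0.0, 4.0]])

print(np.prod(np.diag(A)))   # -16.0, the product of the diagonal entries
print(np.linalg.det(A))      # -16.0 (up to floating-point rounding)
```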
12.3 Calculating determinants using row operations


A square matrix in row echelon form is upper triangular. By Theorem 12.2.8, its determi-
nant can easily be calculated as the product of the diagonal entries. Hence, if we know how
the determinant of a matrix A is affected by row operations, we can use row operations to
transform A into RRE(A), and then deduce the value of the determinant of A by linking
it to that of RRE(A) which is generally much easier to calculate.

In this subsection, we look at how each elementary row operation affects the value of the
determinant.

• Multiplying a row by a non-zero constant:

Suppose that the matrix B is obtained from a matrix A by multiplying row i by a non-zero constant α. For example,

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}, \qquad |B| = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \dots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}.$$

If we evaluate |B| using the cofactor expansion by row i, we obtain

$$|B| = \alpha a_{i1}C_{i1} + \alpha a_{i2}C_{i2} + \dots + \alpha a_{in}C_{in} = \alpha(a_{i1}C_{i1} + a_{i2}C_{i2} + \dots + a_{in}C_{in}) = \alpha|A|.$$

So, the effect of multiplying a row of A by a non-zero constant α is to multiply |A| by α.

• Interchanging two rows:

Let us provide a sketch of an inductive proof. If A is a 2 × 2 matrix and B is the matrix obtained from A by interchanging two rows, then

$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad |B| = \begin{vmatrix} c & d \\ a & b \end{vmatrix} = bc - ad,$$

so

$$|B| = -|A|.$$

Now let A be a 3 × 3 matrix and let B be a matrix obtained from A by interchanging two rows of A. For example, suppose that we interchange the first and third rows of A:

$$|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}, \qquad |B| = \begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}.$$

By expanding both determinants |A| and |B| using the row that is not involved in this interchange, i.e., the second row, we see that the corresponding cofactors have their rows interchanged and hence have opposite signs:

$$|A| = -d\begin{vmatrix} b & c \\ h & i \end{vmatrix} + e\begin{vmatrix} a & c \\ g & i \end{vmatrix} - f\begin{vmatrix} a & b \\ g & h \end{vmatrix},$$

$$|B| = -d\begin{vmatrix} h & i \\ b & c \end{vmatrix} + e\begin{vmatrix} g & i \\ a & c \end{vmatrix} - f\begin{vmatrix} g & h \\ a & b \end{vmatrix}.$$

So, it remains true that

$$|B| = -|A|.$$
More generally, one can show that if this result holds for (n − 1) × (n − 1) matrices, then it
also holds for n × n matrices. By the method of induction, it holds for all square matrices
of order n ≥ 2.

Summarising, the effect of interchanging two rows of a matrix is to multiply the determinant
by −1.

• Adding a multiple of one row to another:

Suppose that the matrix B is obtained from the matrix A by adding k times row i of A to row j of A, where j ≠ i. For example, consider the case in which B is obtained from A by adding 4 times row 1 of A to row 2 of A. Then

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}, \qquad |B| = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} + 4a_{11} & a_{22} + 4a_{12} & \dots & a_{2n} + 4a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}.$$

In general, in a situation like this, we can expand |B| by row j (which in the above example is row 2):

$$|B| = (a_{j1} + ka_{i1})C_{j1} + (a_{j2} + ka_{i2})C_{j2} + \dots + (a_{jn} + ka_{in})C_{jn}$$
$$= a_{j1}C_{j1} + a_{j2}C_{j2} + \dots + a_{jn}C_{jn} + k(a_{i1}C_{j1} + a_{i2}C_{j2} + \dots + a_{in}C_{jn}) = |A| + 0.$$

The second bracket above is equal to 0 because it consists of the cofactors of one row
multiplied by the entries of another row, whose sum vanishes by Theorem 12.2.6.

We conclude that there is no change in the value of the determinant if a multiple of one
row is added to another row.

Moreover, by Theorem 12.2.1, all the previous statements are true if 'row' is replaced by 'column'. Collecting these statements together, we have the following theorem:

Theorem 12.3.1 The effect of a row (column) operation on |A|:

• If a row of A is multiplied by a non-zero constant α then |A| changes to α|A|.


• If two rows are interchanged, then |A| changes to −|A|.
• If a multiple of one row is added to another, then there is no change in |A|.

Example 12.3.2 Just as an illustration of the use of row operations in the calculation of determinants, let us express

$$\begin{vmatrix} 3 & 4 \\ 2 & 7 \end{vmatrix}$$

as a determinant of a matrix in upper triangular form. We have:

$$\begin{vmatrix} 3 & 4 \\ 2 & 7 \end{vmatrix} = 2\begin{vmatrix} 3 & 4 \\ 1 & \tfrac{7}{2} \end{vmatrix} = 2\begin{vmatrix} 0 & -\tfrac{13}{2} \\ 1 & \tfrac{7}{2} \end{vmatrix} = -2\begin{vmatrix} 1 & \tfrac{7}{2} \\ 0 & -\tfrac{13}{2} \end{vmatrix} = -2(1)\left(-\tfrac{13}{2}\right) = 13.$$

The elementary row operations used in the above sequence should be clear by inspection.

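The three effects collected in Theorem 12.3.1 can also be observed numerically. A hedged Python sketch, using the matrix of Example 12.3.2 (Python/numpy are our assumption for illustration):

```python
import numpy as np

A = np.array([[3.0, 4.0],
              [2.0, 7.0]])
d = np.linalg.det(A)              # 13.0

B = A.copy(); B[1] *= 5           # multiply row 2 by 5
print(np.linalg.det(B) / d)       # 5.0: the determinant scales by the constant

C = A[[1, 0]]                     # interchange the two rows
print(np.linalg.det(C) / d)       # -1.0: the determinant changes sign

D = A.copy(); D[1] += 10 * D[0]   # add 10 times row 1 to row 2
print(np.linalg.det(D) / d)       # 1.0: the determinant is unchanged
```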
Remark 12.3.3 Several examples involving the use of elementary row operations in the
calculation of large determinants can be found in the Exercises.

12.4 Inverting a matrix using cofactors and the determinant


Theorem 12.4.1 If A and B are n × n matrices, then

$$|AB| = |A||B|.$$

The proof of this important theorem is omitted, but we note the following consequence of this theorem: If A is invertible, then $AA^{-1} = I$. Taking the determinant of both sides of this matrix equation, we find that

$$|AA^{-1}| = |I|,$$

which implies, by Theorem 12.4.1 and the fact that |I| = 1, that

$$|A||A^{-1}| = 1.$$

Clearly, if the product of two real numbers, such as $|A|$ and $|A^{-1}|$, is non-zero then neither number is zero. Hence, dividing the above equation by |A|, we deduce that the determinant of the inverse of A is the reciprocal of the determinant of A:

$$|A^{-1}| = \frac{1}{|A|}.$$

Theorem 12.4.2 If A is an n × n matrix, then A is invertible if and only if |A| ≠ 0.

Proof Recall that one of the conclusions of Main Theorem 11.4.2 was that A is invertible
if and only if the reduced row echelon form of A is the identity matrix. Let A be any n × n
matrix and let RRE(A) be the reduced row echelon form of A. If A is invertible then
RRE(A) is the identity matrix and |RRE(A)| = 1. If A is not invertible, then RRE(A)
has at least one row of zeros and |RRE(A)| = 0.

In subsection 12.3 we saw that row operations either rescale the determinant by a non-zero constant, or change its sign, or do not affect it at all. Therefore, we can conclude that |A| = 0 if and only if |RRE(A)| = 0, and |A| ≠ 0 if and only if |RRE(A)| = 1.

Putting these statements together, |A| ≠ 0 if and only if RRE(A) = I. By the Main Theorem 11.4.2, this implies that |A| ≠ 0 if and only if A is invertible. □

Note that this is precisely how we motivated the concept of the determinant in the begin-
ning of subsection 12.1.

Theorem 12.4.3 Given an invertible n × n matrix A, its inverse matrix $A^{-1}$ is equal to

$$A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A),$$

where the so-called adjoint matrix, denoted by adj(A), is defined as the transpose of the matrix of cofactors of A. The latter is simply the matrix whose (i, j) entry is the (i, j) cofactor of A. In other words, the adjoint of A is the matrix

$$\mathrm{adj}(A) = \begin{pmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{pmatrix}.$$

The proof of this theorem can be found in our Algebra Textbook in Section 3.4 under
Definition 3.40.

Example 12.4.4 Use the method of the adjoint matrix to find the inverse of

$$A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}.$$

We know that

$$A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A).$$

Starting from |A|, we can expand this determinant by row 1. We have:

$$|A| = 1(2 - 1) - 2(-1 - 4) + 3(-1 - 8) = -16.$$

Noting that |A| ≠ 0, we proceed to calculate the inverse matrix $A^{-1}$.

The matrix of cofactors $A_{\mathrm{cof}}$ is given by

$$A_{\mathrm{cof}} = \begin{pmatrix} 1 & 5 & -9 \\ 1 & -11 & 7 \\ -4 & -4 & 4 \end{pmatrix}.$$

The adjoint matrix adj(A) is equal to the transpose of $A_{\mathrm{cof}}$, so

$$\mathrm{adj}(A) = \begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.$$

Hence,

$$A^{-1} = -\frac{1}{16}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.$$

One can check that $A^{-1}A = I$.

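The adjoint method is mechanical enough to code directly. The following Python sketch (the helper name `adjoint_inverse` is ours; Python/numpy are assumed only for illustration) recomputes the inverse found in Example 12.4.4:

```python
import numpy as np

def adjoint_inverse(A):
    """Inverse via A^{-1} = adj(A)/|A|, computing every cofactor directly."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # (i+1, j+1) minor: delete row i and column j
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / np.linalg.det(A)   # adj(A) is the transpose of the cofactor matrix

A = np.array([[1.0, 2.0, 3.0],
              [-1.0, 2.0, 1.0],
              [4.0, 1.0, 1.0]])
print(-16 * adjoint_inverse(A))   # recovers adj(A) from the example
print(adjoint_inverse(A) @ A)     # the identity matrix (up to rounding)
```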
12.5 Cramer’s rule


If A is an n × n invertible matrix, the so-called Cramer's rule gives us an alternative method of solving the system of linear equations Ax = b.

Theorem 12.5.1 Consider an n × n matrix A with |A| ≠ 0. Let x represent the vector of unknowns $x_1, x_2, \dots, x_n$. Then for any n × 1 column vector b, the solution of the linear system Ax = b is given by

$$x_i = \frac{|A_i|}{|A|},$$

where $A_i$ is defined as the matrix derived from A by replacing the ith column of A with the vector b.

The proof of Theorem 12.5.1 can be found in section 3.4.2 of our Linear Algebra textbook.

Example 12.5.1 Use Cramer's rule to find the solution of the linear system

$$\begin{cases} x + 2y + 3z = 7 \\ -x + 2y + z = -3 \\ 4x + y + z = 5 \end{cases}$$

This system can be expressed in the matrix form Ax = b where

$$A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 7 \\ -3 \\ 5 \end{pmatrix}.$$

The matrices $A_1$, $A_2$ and $A_3$ are derived from the coefficient matrix A by replacing the 1st, 2nd and 3rd column of A with the vector b. We have:

$$A_1 = \begin{pmatrix} 7 & 2 & 3 \\ -3 & 2 & 1 \\ 5 & 1 & 1 \end{pmatrix} \quad A_2 = \begin{pmatrix} 1 & 7 & 3 \\ -1 & -3 & 1 \\ 4 & 5 & 1 \end{pmatrix} \quad A_3 = \begin{pmatrix} 1 & 2 & 7 \\ -1 & 2 & -3 \\ 4 & 1 & 5 \end{pmatrix}$$

Provided that |A| ≠ 0, Cramer's rule tells us that the unique solution of the system Ax = b is given by

$$x = \frac{|A_1|}{|A|} = \frac{7(2 - 1) - 2(-3 - 5) + 3(-3 - 10)}{1(2 - 1) - 2(-1 - 4) + 3(-1 - 8)} = \frac{-16}{-16} = 1,$$

$$y = \frac{|A_2|}{|A|} = \frac{1(-3 - 5) - 7(-1 - 4) + 3(-5 + 12)}{1(2 - 1) - 2(-1 - 4) + 3(-1 - 8)} = \frac{48}{-16} = -3,$$

$$z = \frac{|A_3|}{|A|} = \frac{1(10 + 3) - 2(-5 + 12) + 7(-1 - 8)}{1(2 - 1) - 2(-1 - 4) + 3(-1 - 8)} = \frac{-64}{-16} = 4.$$

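Cramer's rule is equally mechanical. A short Python sketch (the helper name `cramer_solve` is ours) reproduces the solution x = 1, y = −3, z = 4:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule; requires |A| != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b                       # replace column i with b
        x[i] = np.linalg.det(A_i) / det_A   # x_i = |A_i| / |A|
    return x

A = [[1, 2, 3], [-1, 2, 1], [4, 1, 1]]
b = [7, -3, 5]
print(cramer_solve(A, b))   # [ 1. -3.  4.]
```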
12.6 Exercises for self study


Exercise 12.6.1 (a) Evaluate the following determinants using a cofactor expansion by an appropriate row or column.

$$|A| = \begin{vmatrix} 2 & 6 & 1 \\ 1 & 0 & 5 \\ -3 & 1 & 4 \end{vmatrix} \qquad |B| = \begin{vmatrix} 1 & 2 & 1 & 0 \\ 3 & 2 & 1 & 0 \\ 0 & 1 & 6 & 5 \\ 0 & 1 & 1 & 1 \end{vmatrix}$$

(b) Can you infer from part (a) what the reduced row echelon forms of the matrices A and
B are?

Exercise 12.6.2 (a) Evaluate the following determinant using row operations:

$$\begin{vmatrix} 1 & 4 & -1 & 3 & 0 \\ 1 & 7 & 4 & 3 & 8 \\ 2 & 8 & -2 & 6 & 0 \\ 2 & 0 & 5 & 5 & 7 \\ -1 & 9 & 0 & 9 & 2 \end{vmatrix}$$

(b) For which values of λ is the matrix $A = \begin{pmatrix} 2-\lambda & 3 \\ 2 & 1-\lambda \end{pmatrix}$ not invertible?

Exercise 12.6.3 (a) Evaluate the following determinant using row operations:

$$|A| = \begin{vmatrix} 1 & -4 & 3 & 2 \\ 2 & -7 & 5 & 1 \\ 1 & 2 & 6 & 0 \\ 2 & -10 & 14 & 4 \end{vmatrix}$$

(b) Based on your answer in part (a), does the matrix equation Ax = 0 have a solution?
If yes, is it unique?

(c) Consider an n × n matrix A whose determinant has the value det(A) = 5. Find the values of $\det(3A)$, $\det(A^2)$, $\det((2A)^{-1})$ and $\det(2A^{-1})$.

Exercise 12.6.4 Solve the system of linear equations described by the augmented matrix

$$(A|b) = \begin{pmatrix} 2 & -3 & 3 \\ 4 & 2 & 14 \end{pmatrix}$$

using the following methods:

(i) Using row operations on the augmented matrix (A|b);

(ii) Performing row operations on (A|I) to obtain A−1 and then using x = A−1 b;

(iii) By using the adjoint matrix to find A−1 and then obtaining x as in part (ii);

(iv) By using Cramer’s rule.

12.7 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 3.2, 3.3 and 3.4 of our Algebra textbook are all relevant.

13 Developing Geometric Insight, 1 of 2


13.1 Visualising the set R2 using position vectors
In Lecture Notes 3, we defined the set R2 as the set of all pairs (a, b) where a and b are
elements of R:

R2 = {(a, b) | a ∈ R and b ∈ R}.


Clearly, there is a one-to-one correspondence between the above set and the set of column matrices $\begin{pmatrix} a \\ b \end{pmatrix}$, according to which the point (a, b) corresponds to the column matrix $\begin{pmatrix} a \\ b \end{pmatrix}$. Hence, we can also write

$$\mathbb{R}^2 = \left\{ \begin{pmatrix} a \\ b \end{pmatrix} \,\middle|\, a \in \mathbb{R} \text{ and } b \in \mathbb{R} \right\}.$$

The column matrix $\begin{pmatrix} a \\ b \end{pmatrix}$ is simply known as a vector. The real numbers a and b are known as the components of the vector $\begin{pmatrix} a \\ b \end{pmatrix}$. For example, $\begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $\begin{pmatrix} -2 \\ \sqrt{3} \end{pmatrix}$, $\begin{pmatrix} 1 \\ \pi \end{pmatrix}$ are all vectors, and the real numbers 1 and π are the components of the vector $\begin{pmatrix} 1 \\ \pi \end{pmatrix}$.

In the context of the Cartesian plane, each vector $\begin{pmatrix} a \\ b \end{pmatrix}$ can be visualised as an arrow starting at the origin (0, 0) and ending at the point (a, b). This arrow is depicted below:

Figure 13.1.1

Accordingly, the set R2 is visualised as a set of arrows of various lengths and directions,
all starting at the origin of the Cartesian plane and moving outwards:

Figure 13.1.2

It is common to call these arrows position vectors.

13.2 Visualising vector operations in R2


Vectors in R2 are 2×1 matrices, so matrix addition and scalar multiplication are applicable
to vectors as well. On the other hand, it should be clear that matrix multiplication is not
defined for vectors, since the number of columns in the left matrix must be equal to the
number of rows in the right matrix in order for matrix multiplication to be defined, and
this is not the case here.
In terms of position vectors, matrix addition and scalar multiplication are visualised as follows:

• Vector addition: Given two vectors $u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$ and $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$, their sum $u + v = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}$ corresponds geometrically to the so-called parallelogram rule:

Figure 13.2.1
• Scalar multiplication: Given a vector $u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$ and a scalar α ∈ R, the scalar multiple $\alpha u = \begin{pmatrix} \alpha u_1 \\ \alpha u_2 \end{pmatrix}$ is the vector illustrated below:

Figure 13.2.2
• Parallel vectors: Two vectors $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ and $w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$ are called parallel if there exists a non-zero real scalar α such that w = αv; that is, if $w_1 = \alpha v_1$ and $w_2 = \alpha v_2$. For example, the vectors $\begin{pmatrix} 2 \\ -6 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 3 \end{pmatrix}$ are parallel since the system of equations −1 = 2α and 3 = −6α has a consistent non-zero solution α = −1/2; i.e.,

$$\begin{pmatrix} -1 \\ 3 \end{pmatrix} = -\frac{1}{2}\begin{pmatrix} 2 \\ -6 \end{pmatrix}.$$
& '
v1
• Length (or norm) : Given a vector v = , its length (or norm) is defined by
v2
/
||v|| := (v12 + v22 ). This corresponds to Pythagoras theorem for displacements:

Figure 13.2.4
Note that $\|\alpha v\| = \sqrt{(\alpha v_1)^2 + (\alpha v_2)^2} = \sqrt{\alpha^2(v_1^2 + v_2^2)} = \sqrt{\alpha^2}\,\sqrt{v_1^2 + v_2^2} = |\alpha|\,\|v\|$; i.e., the length of a rescaled vector is equal to the original length times the absolute value of the scalar used in the rescaling.
• Unit vector: A vector $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ is called a unit vector if its length is equal to 1; that is, if $\sqrt{v_1^2 + v_2^2} = 1$. For example, the vectors $\begin{pmatrix} 0.6 \\ 0.8 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$ are all unit vectors, while the vector $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is not.

Note that given any non-zero vector $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ (any vector other than $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$) one can create a parallel unit vector pointing in the same direction as v by dividing v by its length $\|v\|$. For example, starting from the vector $\begin{pmatrix} 3 \\ 4 \end{pmatrix}$, whose length is $\sqrt{3^2 + 4^2} = 5$, one can create the unit vector

$$\frac{1}{5}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix} = \begin{pmatrix} 0.6 \\ 0.8 \end{pmatrix}.$$

This is parallel to the original vector and points in the same direction.
• Scalar product: The scalar product of two vectors $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ and $w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$ is a number defined by

$$\langle v, w \rangle := v_1 w_1 + v_2 w_2.$$

The geometric meaning of this number will be derived later in the course. For the time being, you just need to know that the scalar product of two vectors is equal to the product of their lengths multiplied by the cosine of the angle between them:

$$\langle v, w \rangle = \|v\|\,\|w\| \cos(\theta).$$

• Angle between vectors: Making the angle θ the subject of the equation

$$\langle v, w \rangle = \|v\|\,\|w\| \cos(\theta)$$

we obtain the relation

$$\cos(\theta) = \frac{\langle v, w \rangle}{\|v\|\,\|w\|} = \frac{v_1 w_1 + v_2 w_2}{\sqrt{v_1^2 + v_2^2}\,\sqrt{w_1^2 + w_2^2}},$$

which defines the angle between any two non-zero vectors v and w.

For example, the angle between $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 2 \end{pmatrix}$ is given by

$$\cos(\theta) = \frac{(1)(0) + (1)(2)}{\sqrt{1^2 + 1^2}\,\sqrt{0^2 + 2^2}} = \frac{2}{2\sqrt{2}} = \frac{1}{\sqrt{2}},$$

which yields $\theta = \frac{\pi}{4}$.
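Assuming Python with numpy purely for illustration, the angle formula packages into a small function (the name `angle_between` is ours); it reproduces the value π/4 computed above:

```python
import numpy as np

def angle_between(v, w):
    """Angle (in radians) between two non-zero vectors."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))   # clip guards against rounding

print(angle_between([1, 1], [0, 2]))   # 0.7853981... radians
print(np.pi / 4)                       # the same value, pi/4
```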
• Orthogonal vectors: Two non-zero vectors are called orthogonal if the angle θ between them is equal to $\frac{\pi}{2}$. This implies that cos(θ) = 0. Given that

$$\cos(\theta) = \frac{\langle v, w \rangle}{\|v\|\,\|w\|},$$

we deduce that the scalar product of two orthogonal vectors is zero:

$$\langle v, w \rangle = 0.$$

For example, the vectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -2 \\ 2 \end{pmatrix}$ are orthogonal since

$$\left\langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -2 \\ 2 \end{pmatrix} \right\rangle = (1)(-2) + (1)(2) = 0.$$

13.3 Lines in R2
Definition and some terminology Suppose that we are given a vector p in R2 and a
non-zero vector d in R2 . We define the line through vector p in the direction of vector
d as the subset of R2 consisting of all vectors r for which r = p + td for some value of
the real parameter t. This last equation is called the parametric equation of the line
through p in the direction of d. The corresponding line is visualised below:

Figure 13.3.1

Remark 13.3.2 Note that the line is visualised as a collection of position vectors, where
each value of the parameter t gives a different position vector in this collection. If, alterna-
tively, one wishes to visualise the line as a collection of points in the Cartesian plane, then

it is the end-points of the position vectors that trace this line. If p has components p1 and
p2 , a common terminology for the above line is that it goes through the point (p1 , p2 ) and
is parallel to the vector d.

Example 13.3.3 As an example, let us construct the parametric equation of the line through $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ in the direction of $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ (or, in the common terminology, the line through the point (1, 2) parallel to the vector $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$). Denoting the general position vector on this line by $r = \begin{pmatrix} x \\ y \end{pmatrix}$, we obtain the parametric equation

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + t\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

This line is sketched below:

Figure 13.3.4

Geometric description There are three common ways of describing a line in R2 : using
(i) two distinct points on the line, (ii) a point on the line and a vector parallel to the line
and (iii) a point on the line and a vector orthogonal (also known as perpendicular) to the
line. These cases are visualised on the next page by using position vectors.

In each case, the parametric and the Cartesian equations of the line are derived.

Case (i) will be discussed in detail and cases (ii) and (iii) only briefly. These cases are
interconnected, so the examples and illustrations used for case (i) are equally relevant for
the other two cases.

Figure 13.3.5

Case (i): Given two distinct points on the line, $(p_1, p_2)$ and $(q_1, q_2)$, the corresponding position vectors are $p = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$ and $q = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix}$. In order to construct a parametric equation we need a position vector on the line (we can choose either p or q) and a vector parallel to the line. One such vector is $q - p = \begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \end{pmatrix}$. Let $x = \begin{pmatrix} x \\ y \end{pmatrix}$ be the general position vector on the line. Then, the line is described parametrically as the set of all position vectors x for which x = p + t(q − p) for some value of the real parameter t. In component form, we have

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \end{pmatrix},$$

where t ∈ R.

Figure 13.3.6

In order to construct a Cartesian equation, we need a position vector on the line (we can take either p or q) and a vector n perpendicular to the line. Any such vector is called a normal vector and is unique up to rescaling by a non-zero scalar. Given that the line is parallel to $\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \end{pmatrix}$, a normal vector n can be obtained by switching these components around and multiplying one of them by minus one: $n = \begin{pmatrix} q_2 - p_2 \\ -q_1 + p_1 \end{pmatrix}$. This trick works because the scalar product of $\begin{pmatrix} a \\ b \end{pmatrix}$ with $\begin{pmatrix} b \\ -a \end{pmatrix}$ is zero:

$$\left\langle \begin{pmatrix} a \\ b \end{pmatrix}, \begin{pmatrix} b \\ -a \end{pmatrix} \right\rangle = (a)(b) + (b)(-a) = 0.$$

Given such a normal vector n, the line is the set of all position vectors x for which x − p is perpendicular to n; that is,

$$\langle x - p, n \rangle = 0.$$

Figure 13.3.7

This last equation $\langle x - p, n \rangle = 0$ can be expressed in component form as

$$(x - p_1)(q_2 - p_2) + (y - p_2)(-q_1 + p_1) = 0.$$

It provides a Cartesian equation for the line.

An alternative way of obtaining a Cartesian equation is by eliminating the parameter t from the parametric equation

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \end{pmatrix}.$$

Indeed, provided that $q_1 - p_1$ and $q_2 - p_2$ are both non-zero, the parameter t can be made the subject of this set of equations in two different ways: either as

$$t = \frac{x - p_1}{q_1 - p_1}$$

or as

$$t = \frac{y - p_2}{q_2 - p_2}.$$

Equating these two expressions we obtain a single equation; namely,

$$\frac{x - p_1}{q_1 - p_1} = \frac{y - p_2}{q_2 - p_2}.$$

This is equivalent to the Cartesian equation derived in the previous paragraph. It is left
as an exercise to investigate what happens to this elimination process if either q1 − p1 or
q2 − p2 is zero. What is the Cartesian equation of the line in each case? Note that q1 − p1
and q2 −p2 cannot be both zero because the points (p1 , p2 ) and (q1 , q2 ) are assumed distinct.
Example 13.3.8 Find a parametric and a Cartesian equation for the line in R2 through
the points (1, 1) and (3, 0), sketched below:

Figure 13.3.9

Parametric equation We just need to select a position vector on the line and a direction vector (see the sketch):

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + t\begin{pmatrix} 2 \\ -1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Cartesian equation We need a position vector on the line, such as $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, as well as a normal vector n. The latter can be derived from the direction vector $\begin{pmatrix} 2 \\ -1 \end{pmatrix}$ using the trick mentioned earlier: $n = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. A Cartesian equation then follows:

$$\left\langle \begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\rangle = 0.$$

This equation reduces to (x − 1) + 2(y − 1) = 0 or, equivalently, to x + 2y = 3. Alternatively, a Cartesian equation can be obtained by eliminating t from the parametric equation. We have $t = \frac{x-1}{2}$ and $t = \frac{y-1}{-1}$. Equating these two expressions we get $\frac{x-1}{2} = \frac{y-1}{-1}$. This gives −x + 1 = 2y − 2 or, equivalently, x + 2y = 3.

Example 13.3.10 Find a parametric equation for the line in R2 described by the Carte-
sian equation x + 2y = 3:

Geometric method We arrange the Cartesian equation so that the x and y terms appear on the same side. This is already the case here. Since x + 2y = 3 corresponds to arranging the Cartesian equation in the form $\langle x, n \rangle = \langle p, n \rangle$, the coefficients of x and y are the components of n. Hence, $n = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. A direction vector d is derived from n by the trick: $d = \begin{pmatrix} -2 \\ 1 \end{pmatrix}$. Having found d, we just need a position vector on the line. This can be found by choosing any values (x, y) that satisfy x + 2y = 3. One such option is (5, −1). A parametric equation then follows:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ -1 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \end{pmatrix}, \quad s \in \mathbb{R}.$$

As an exercise, show that this parametric equation and the parametric equation in Example
13.3.8 describe the same set of position vectors; i.e., the same line.

Algebraic method A single equation such as x+2y = 3 cannot determine both variables
x and y uniquely. As a result, one of these variables can be regarded as a free parameter
in the sense that it can take any real value. For example, we can think of y as a free
parameter and write y = λ where λ ∈ R. Then, the Cartesian equation x + 2y = 3 implies
that x = 3 − 2λ. The expressions x = 3 − 2λ and y = λ provide the general solution of
the Cartesian equation x + 2y = 3 in parametric form. Each value of λ gives a distinct
solution of this equation.

Recall that we encountered this method in Lectures 11 and 12 when we applied it to linear
systems that have infinitely many solutions.

If we now express the general solution given by x = 3 − 2λ and y = λ in vector form,

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 - 2\lambda \\ \lambda \end{pmatrix},$$

we obtain precisely a parametric description for the line x + 2y = 3:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix} + \lambda\begin{pmatrix} -2 \\ 1 \end{pmatrix}, \quad \lambda \in \mathbb{R}.$$
As an exercise, start from the Cartesian equation x + 2y = 3 and find its general solution
by now treating x as a free parameter (instead of y). Convince yourselves that the resulting
parametric equation is equivalent to the previous one.

Let us finally complete the presentation of lines in R2 by briefly considering the remaining
two cases.
Case (ii): Given a point $(p_1, p_2)$ on the line and a vector $d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$ parallel to the line, a parametric equation follows directly:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t\begin{pmatrix} d_1 \\ d_2 \end{pmatrix}, \quad t \in \mathbb{R}.$$

A normal vector can be obtained via the trick: $n = \begin{pmatrix} d_2 \\ -d_1 \end{pmatrix}$. Hence, a Cartesian equation is

$$\left\langle \begin{pmatrix} x - p_1 \\ y - p_2 \end{pmatrix}, \begin{pmatrix} d_2 \\ -d_1 \end{pmatrix} \right\rangle = 0.$$

Figure 13.3.12
Case (iii): Given a point $(p_1, p_2)$ on the line and a vector $n = \begin{pmatrix} n_1 \\ n_2 \end{pmatrix}$ perpendicular to the line, a Cartesian equation follows directly:

$$\left\langle \begin{pmatrix} x - p_1 \\ y - p_2 \end{pmatrix}, \begin{pmatrix} n_1 \\ n_2 \end{pmatrix} \right\rangle = 0.$$

For the parametric description, we need a direction vector d. This can be found via the trick: $d = \begin{pmatrix} n_2 \\ -n_1 \end{pmatrix}$. Hence, a parametric equation is

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + t\begin{pmatrix} n_2 \\ -n_1 \end{pmatrix}, \quad t \in \mathbb{R}.$$

Figure 13.3.13

13.4 Geometric and algebraic approaches to linear systems


Consider a system of 2 linear equations for 2 unknowns. Each equation represents a line. A
solution of this system is a vector whose components x and y must satisfy both equations,
which means that the end-point of this vector is on both lines, so (x, y) has to be their
point of intersection. There are only three distinct geometric possibilities: the lines are
parallel, coincident or intersecting. Let us investigate these cases by means of simple
examples. We will analyse each example from both the geometric and the algebraic point
of view.

Example 13.4.1 A typical example of parallel lines is the following:


$$\begin{cases} x + 3y = 2 \\ x + 3y = 5. \end{cases}$$

From the geometric point of view, both lines have $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$ as a normal vector, so they are either parallel or coincident. The first line goes through the point (2, 0) while the second line does not go through that point, so they are not coincident. Hence, they are parallel and have no point of intersection.

From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has a row whose first two entries are zero and whose third entry is
non-zero. Therefore, the system is inconsistent and has no solution.

Example 13.4.2 A typical example of coincident lines is the following:


$$\begin{cases} x + 3y = 2 \\ 2x + 6y = 4. \end{cases}$$

From the geometric point of view, both lines have $n = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ as a normal vector, so they are parallel or coincident. They both go through the point (2, 0) so they are coincident. Hence, all points on the first line are also on the second line. There are infinitely many points of 'intersection'.

From the point of view of linear algebra, the reduced row echelon form of the augmented matrix of this system has a row of zeros, showing that one of these linear equations is redundant. Hence, there is a single restriction on (x, y), which implies infinitely many solutions.

Example 13.4.3 A typical example of lines intersecting at a unique point is the following:

$$\begin{cases} x + 3y = 2 \\ 3x - 2y = 5. \end{cases}$$

From the geometric point of view, the first line has $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$ as a normal vector and the second line has $\begin{pmatrix} 3 \\ -2 \end{pmatrix}$ as a normal vector, so these lines are not parallel. Consequently, they must have a unique point of intersection.

From the point of view of linear algebra, the reduced row echelon form of the coefficient
matrix is the identity I2 . Hence, there is a unique solution.

13.5 Exercises for self study


Exercise 13.5.1 Consider the vectors
$$u = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$
(a) Represent the vectors u, v and their difference u − v by position vectors in R2 .
(b) Calculate the distance ||u − v|| between these vectors.
(c) Find two unit vectors parallel to u − v and add them to your graph as position vectors.
(d) Write down an expression for the most general vector orthogonal to u.
(e) Hence write down a Cartesian equation which describes the line in R2 which is parallel
to u and passes through the origin of the Cartesian plane. Add this line to your sketch.

Exercise 13.5.2 For a ∈ R and b ∈ R, consider the following linear system:


$$\begin{cases} 2x - 3y = -1 \\ ax + 6y = b. \end{cases}$$

(i) Using elementary row operations, or otherwise, determine the value of a for which this
system does not have a unique solution.
(ii) For the particular value of a obtained in part (i), determine the value of b which makes
the system consistent. Hence, write down the general solution of this system in vector
parametric form and represent two of these solutions by position vectors in R2 . Add to
your sketch the line in R2 corresponding to the general solution of this system.
(iii) In the case where a has the value obtained in part (i) and b does not have the value
obtained in part (ii), interpret geometrically the corresponding linear system.
(iv) In the case where a does not have the value obtained in part (i), what is the geometric interpretation of the corresponding linear system? Is this interpretation affected by the particular value of b?

Exercise 13.5.3 Consider the vectors
$$u = \begin{pmatrix} 4 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$

(a) Represent the vectors u, v and their sum u + v by position vectors in R2 .


(b) Calculate the following lengths: ||u||, ||v||, ||u+v||. Is it true that ||u+v|| = ||u||+||v||?

(c) Find a vector w parallel to v such that ||w|| = 2 and represent it by a position vector
on your graph.
(d) Find a unit vector û parallel to u and represent it by a position vector on your graph.
(e) Calculate the dot product u · v and hence the angle between u and v.
(f) Find a unit vector n orthogonal to u and represent it by a position vector on your
graph.

Exercise 13.5.4 Consider the line in R2 described by the Cartesian equation

x + 2y = 6.

(a) Sketch the line in R2 .


(b) By treating y as a free parameter t ∈ R, write down the general solution of the equation
x + 2y = 6 in vector parametric form.
(c) Add to your graph in part (a) the position vectors of the solutions obtained in part (b)
corresponding to t = 0, t = 1 and t = 2.

Now consider the linear system

$$\begin{cases} x + 2y = 6 \\ 2x + y = 4. \end{cases}$$

(d) On a new graph, sketch the two lines described by the above Cartesian equations.
(e) By using elementary row operations, find the reduced row echelon form (I|s) of the augmented matrix

$$(A|b) = \begin{pmatrix} 1 & 2 & 6 \\ 2 & 1 & 4 \end{pmatrix}$$

and hence the unique solution s of the above linear system. Represent your solution s by
a position vector on your graph in part (d).
(f) On that same graph, sketch the two lines whose Cartesian equations correspond to the
augmented matrix (I|s).

13.6 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.

Parts of Sections 1.9 and 1.10 of our Algebra Textbook are relevant.

• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 1.3 and 1.4 of our Calculus Textbook are also relevant.

14 Developing Geometric Insight, 2 of 2


14.1 Visualising vectors and operations in R3

By analogy with R2, R3 is defined as the set of all 3 × 1 column matrices $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ where a, b and c are real numbers:

$$\mathbb{R}^3 = \left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \,\middle|\, a \in \mathbb{R},\ b \in \mathbb{R},\ c \in \mathbb{R} \right\}.$$

We still call the column matrix $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ a vector and the real numbers a, b and c the components of the vector $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$.

We visualise R3 as a three-dimensional coordinate space that is equipped with three mutually orthogonal axes intersecting at the origin. The x-axis and the y-axis are horizontal and the z-axis is vertical. The orientation of these axes follows the so-called right hand rule: If the index finger of the right hand points in the direction of the x-axis and the middle finger in the direction of the y-axis, then the thumb points in the direction of the z-axis:

Figure 14.1.1

Each vector $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ is visualised as a position vector (an arrow starting at the origin (0, 0, 0) and ending at the point (a, b, c)) or, alternatively, as the point (a, b, c) itself.

All the definitions introduced in the case of R2 can be extended to R3. For example, the norm of a vector is given by $\|v\| := \sqrt{v_1^2 + v_2^2 + v_3^2}$, the scalar product of two vectors is given by $\langle v, w \rangle := v_1 w_1 + v_2 w_2 + v_3 w_3$, and so on.
Additionally, there is a vector operation in R3 that has no counterpart in R2. This is the following: Let

$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$

be any two vectors and let

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

be the unit vectors pointing along the x, y and z axes respectively. The vector product (or cross product) of the vectors a and b is the vector a × b defined by calculating the determinant

$$a \times b = \begin{vmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2 b_3 - a_3 b_2)e_1 - (a_1 b_3 - a_3 b_1)e_2 + (a_1 b_2 - a_2 b_1)e_3.$$
It can be shown using the properties of the determinant that the vector a × b is orthogonal
to both a and b.
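numpy's built-in cross product implements exactly this determinant expansion, so the orthogonality claim can be verified directly (a hedged illustration in Python; the two vectors below also reappear in the plane example of the next subsection):

```python
import numpy as np

a = np.array([1.0, 3.0, -2.0])
b = np.array([2.0, 0.0, -1.0])

c = np.cross(a, b)    # the determinant expansion gives (-3, -3, -6)
print(c)
print(c @ a, c @ b)   # 0.0 0.0: c is orthogonal to both a and b
```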

14.2 Planes in R3
Geometric Descriptions

There are three common ways of describing a plane in R3 : by using (i) three distinct points
on the plane which are not collinear (i.e., do not lie on the same line), (ii) a point on the
plane and two vectors parallel to the plane but not parallel to each other and (iii) a point
on the plane and a vector perpendicular to the plane. These cases are depicted below using
position vectors in the three-dimensional real coordinate space:

Figure 14.2.1

Parametric and Cartesian Equations

Case (i): Given 3 distinct non-collinear points $(p_1, p_2, p_3)$, $(q_1, q_2, q_3)$ and $(s_1, s_2, s_3)$ on the plane, the corresponding position vectors are $p = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}$, $q = \begin{pmatrix} q_1 \\ q_2 \\ q_3 \end{pmatrix}$ and $s = \begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix}$. In order to construct a parametric equation we need a position vector on the plane (which can be chosen as p or q or s) and two vectors parallel to the plane. Two such vectors are

$$q - p = \begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \\ q_3 - p_3 \end{pmatrix} \quad \text{and} \quad s - p = \begin{pmatrix} s_1 - p_1 \\ s_2 - p_2 \\ s_3 - p_3 \end{pmatrix}.$$

Note that the vectors q − p and s − p are not parallel to each other because $(p_1, p_2, p_3)$, $(q_1, q_2, q_3)$ and $(s_1, s_2, s_3)$ are not collinear. Let $x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ be the general position vector on the plane. Then, the plane is described parametrically as the set of all position vectors x for which x = p + λ(q − p) + µ(s − p) for some values of the real parameters λ and µ. In component form, we have

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + \lambda\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \\ q_3 - p_3 \end{pmatrix} + \mu\begin{pmatrix} s_1 - p_1 \\ s_2 - p_2 \\ s_3 - p_3 \end{pmatrix},$$

where λ ∈ R, µ ∈ R. The plane is visualised as a collection of position vectors, where each vector in this collection is obtained by selecting a pair of values (λ, µ). The resulting two-parameter family of position vectors (whose endpoints are on the plane) is visualised below:

Figure 14.2.2

In order to construct a Cartesian equation we need a position vector on the plane (again, we can take p or q or s) and a vector n perpendicular to both direction vectors q − p and s − p. The fastest way of finding such an n is by using the vector product of the vectors q − p and s − p; that is, the vector n = (q − p) × (s − p). Alternatively, we can find n in a different way, by solving the system of equations $\langle n, q - p \rangle = 0$ and $\langle n, s - p \rangle = 0$. The solution of this system is unique up to rescaling n by a non-zero scalar. Unfortunately, there is no trick that can be used here to speed up this process. Regardless of how n is obtained, the corresponding Cartesian equation describes the plane as the set of all position vectors x for which x − p is perpendicular to n; that is,

$$\langle x - p, n \rangle = 0.$$

Note the similarity in form between the Cartesian equation of a plane in R3 and that of
a line in R2 . The corresponding geometric situation is visualised below. For clarity, the
vectors n and x − p are not depicted as position vectors (i.e., starting at the origin) but
have been translated.

Figure 14.2.3

An alternative way of obtaining a Cartesian equation for a plane involves eliminating the parameters λ and µ from the parametric equation

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + \lambda\begin{pmatrix} q_1 - p_1 \\ q_2 - p_2 \\ q_3 - p_3 \end{pmatrix} + \mu\begin{pmatrix} s_1 - p_1 \\ s_2 - p_2 \\ s_3 - p_3 \end{pmatrix}.$$

However, this process is quite laborious. It is mentioned only for completeness.

Example 14.2.4 Find a parametric and a Cartesian equation for the plane in R3 passing
through the points (1, 1, 1), (2, 4, −1) and (3, 1, 0). Note that the sketch below is generic
and does not correspond to the actual plane considered:

Figure 14.2.5

Parametric equation We need to select a position vector on the plane (let us choose $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$) and two vectors parallel to the plane but not parallel to each other: two such vectors are $\begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ (the relevant calculations can be found on the sketch). A parametric equation then follows:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \lambda\begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} + \mu\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \lambda \in \mathbb{R},\ \mu \in \mathbb{R}.$$

Cartesian equation A normal vector $n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$ can be found by using the cross product $n = \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \times \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$ or by requiring that the scalar product of the normal vector with each of the two direction vectors is zero:

$$\left\langle \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \right\rangle = 0 \quad \text{and} \quad \left\langle \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} \right\rangle = 0.$$

It is left as an exercise to show that n must be a (non-zero) scalar multiple of $\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$. Taking $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$ and choosing $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ as the position vector on the plane, the Cartesian equation follows:

$$\left\langle \begin{pmatrix} x - 1 \\ y - 1 \\ z - 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \right\rangle = 0.$$

This equation reduces to (x − 1) + (y − 1) + 2(z − 1) = 0 or, equivalently, to x + y + 2z = 4. It is useful to confirm that the points (1, 1, 1), (2, 4, −1) and (3, 1, 0) satisfy this equation.

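The whole recipe of Example 14.2.4 compresses into a few lines of Python (our own illustrative sketch, not part of the notes):

```python
import numpy as np

# Plane through (1, 1, 1), (2, 4, -1) and (3, 1, 0), as in Example 14.2.4
p = np.array([1.0, 1.0, 1.0])
q = np.array([2.0, 4.0, -1.0])
s = np.array([3.0, 1.0, 0.0])

n = np.cross(q - p, s - p)   # normal vector (-3, -3, -6), a multiple of (1, 1, 2)
print(n)

# Cartesian equation <x - p, n> = 0, i.e. n . x = n . p
print(n @ p)                 # -12.0, so -3x - 3y - 6z = -12, i.e. x + y + 2z = 4
print(n @ q, n @ s)          # -12.0 -12.0: all three points satisfy the equation
```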
Example 14.2.6 Find a parametric equation for the plane in R3 described by the Cartesian equation x + y + 2z = 4.

Geometric method We check that the Cartesian equation is arranged so that the x, y and z terms appear on the same side. Then we identify the components of n with the coefficients of x, y and z. This gives $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$. We now need to find two direction vectors for the plane; that is, two vectors perpendicular to n but not parallel to each other. For this purpose we can use a generalisation of our previous trick: given any normal vector $n = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$, its scalar products with $\begin{pmatrix} 0 \\ c \\ -b \end{pmatrix}$, $\begin{pmatrix} c \\ 0 \\ -a \end{pmatrix}$ and $\begin{pmatrix} b \\ -a \\ 0 \end{pmatrix}$ are all zero; moreover, given that a, b and c are not all zero (because n cannot be a zero vector), at least two of these three vectors are non-zero and can be used as direction vectors for the plane perpendicular to n. Here, $n = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$, so two direction vectors are $\begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$. Finally, we need a position vector on the plane. Any point (x, y, z) that satisfies the Cartesian equation x + y + 2z = 4 is fine; for example (10, 0, −3). The corresponding parametric equation is

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 10 \\ 0 \\ -3 \end{pmatrix} + \kappa\begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix} + \nu\begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \kappa \in \mathbb{R},\ \nu \in \mathbb{R}.$$

In order to confirm that this parametric equation describes the plane of Example 14.2.4, try as an exercise to find the values of (κ, ν) that produce the position vectors $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}$ used to define this plane in the first place.
Algebraic method As in the case of a line in R2, we use the fact that a parametric equation is a general solution for the corresponding Cartesian equation. In R3 we have three variables: x, y and z. Hence, a single equation such as x + y + 2z = 4 implies that two of the variables x, y and z can be regarded as free parameters. Let us choose y and z to play this role and write y = s and z = t, where s ∈ R, t ∈ R. Then x is determined by the Cartesian equation according to x = 4 − s − 2t. The expressions x = 4 − s − 2t, y = s and z = t provide the general solution of the Cartesian equation x + y + 2z = 4. Each pair of values of (s, t) gives a distinct solution of this equation. We can write this general solution in vector form as

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 - s - 2t \\ s \\ t \end{pmatrix}.$$

By expanding this expression, we obtain a parametric description of the plane:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} + s\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \quad s \in \mathbb{R},\ t \in \mathbb{R}.$$

As an exercise, find pairs of values of (s, t) that produce the position vectors of the points (1, 1, 1), (2, 4, −1) and (3, 1, 0) used to define this plane in Example 14.2.4.

Finally, let us complete the presentation of planes in R3 by considering the remaining two
ways of describing such planes.

Case (ii): Suppose we are given a point $(p_1, p_2, p_3)$ on the plane and two vectors u and v parallel to the plane but not parallel to each other. Then, denoting the position vector to the point $(p_1, p_2, p_3)$ by p, we can write down a parametric equation directly: x = p + su + tv, s ∈ R, t ∈ R. To construct a Cartesian equation, we find a normal vector n by either using the cross product u × v or by solving the system of equations $\langle n, u \rangle = 0$ and $\langle n, v \rangle = 0$ (which determine n up to a non-zero multiple). The Cartesian equation then follows: $\langle x - p, n \rangle = 0$.

Figure 14.2.7

Case (iii): Given a point $(p_1, p_2, p_3)$ on the plane and a vector $n = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix}$ perpendicular to the plane, let us denote the position vector to the point $(p_1, p_2, p_3)$ by p. A Cartesian equation can be written down directly: $\langle x - p, n \rangle = 0$. To construct a parametric equation we need two vectors u and v parallel to the plane. Following our earlier discussion, two such vectors can always be selected from the set

$$\left\{ \begin{pmatrix} 0 \\ n_3 \\ -n_2 \end{pmatrix}, \begin{pmatrix} n_3 \\ 0 \\ -n_1 \end{pmatrix}, \begin{pmatrix} n_2 \\ -n_1 \\ 0 \end{pmatrix} \right\}.$$

Then a parametric equation follows: x = p + su + tv, where s ∈ R, t ∈ R.

Figure 14.2.8

14.3 Lines in R3
There are two common ways of describing a line in R3 : by using (i) two distinct points on
the line and (ii) a point on the line and a vector parallel to the line.

Parametric and Cartesian Equations

Case (i): Given two distinct points


 (p1 , p2 , p3
) and(q1 , q2 , q3 ) on the line, the corresponding
p1 q1
position vectors are p = p2  and q = q2 . For a parametric equation we need
p3 q3
a position  vector  on the line 
(either
 p or q) and a vector parallel to the line such as
q1 − p 1 x
q − p = q2 − p2 . Let x = y  be the general position vector on the line. Then, the
q3 − p 3 z
line is described parametrically as the set of all position vectors x for which x = p+λ(q−p)
for some value of the real parameter λ. In component form, we have
     
x p1 q1 − p 1
y  = p2  + λ q2 − p2  , where λ ∈ R.
z p3 q3 − p 3

The line is visualised below as a collection of position vectors, where each vector in this
collection is obtained by selecting a value for λ. Again, for clarity, some of the vectors are
not depicted as position vectors but have been translated:

Figure 14.3.1

The most straightforward way of arriving at a Cartesian description for this line is by eliminating the parameter λ from the parametric equation above. Assuming for the time being that the quantities $q_1 - p_1$, $q_2 - p_2$ and $q_3 - p_3$ are all non-zero, we can make λ the subject of the above parametric system in three different ways:

$$\lambda = \frac{x - p_1}{q_1 - p_1}, \quad \lambda = \frac{y - p_2}{q_2 - p_2} \quad \text{and} \quad \lambda = \frac{z - p_3}{q_3 - p_3}.$$

Equating these three expressions produces a system of two independent equations, namely

$$\frac{x - p_1}{q_1 - p_1} = \frac{y - p_2}{q_2 - p_2} = \frac{z - p_3}{q_3 - p_3}.$$

This system of equations constitutes the Cartesian description of the line.
The reason why we need two equations in order to describe a line in R3 is the following:
Regardless of whether a line is considered in R2 or in R3 , its description requires a single free
parameter. R3 is a space of three variables, so if the description of a line requires one of them
to remain free, two independent equations must be imposed on these three variables. These
equations correspond precisely to the two equations needed for the Cartesian description.
In contrast, R2 is a space of two variables, so if the description of a line requires one of
them to remain free, one equation must be imposed on these two variables; namely the
single equation needed for the Cartesian description of a line in R2 .
For completeness, let us investigate what happens when one of the quantities in the set $\{q_1 - p_1,\ q_2 - p_2,\ q_3 - p_3\}$ is zero or when two of these quantities are zero. Without any real loss of generality, let us assume that $q_1 - p_1$ is zero while $q_2 - p_2$ and $q_3 - p_3$ are both non-zero. Considering the parametric system above, we see that the first equation becomes $x = p_1$, which already gives one of the two equations needed for the Cartesian description. The second equation is derived in the standard way by combining $\lambda = \frac{y - p_2}{q_2 - p_2}$ and $\lambda = \frac{z - p_3}{q_3 - p_3}$. This leads to

$$\frac{y - p_2}{q_2 - p_2} = \frac{z - p_3}{q_3 - p_3}.$$

The case when two of the quantities in the set $\{q_1 - p_1,\ q_2 - p_2,\ q_3 - p_3\}$ are zero is quite similar. Let us assume for simplicity that $q_1 - p_1$ and $q_2 - p_2$ are both zero while $q_3 - p_3$ is non-zero. Then, the first two equations of the parametric system become $x = p_1$ and $y = p_2$. These are already the equations needed for the Cartesian description. The third equation, $\lambda = \frac{z - p_3}{q_3 - p_3}$, informs us that the variable z is free. This is in agreement with the other two variables not being free, according to $x = p_1$ and $y = p_2$.

Finally, let us note that it is not possible for all three quantities $q_1 - p_1$, $q_2 - p_2$, $q_3 - p_3$ to be zero because the points $(p_1, p_2, p_3)$ and $(q_1, q_2, q_3)$ on the line have been assumed distinct.
Example 14.3.2 Find a parametric and a Cartesian equation for the line in R3 passing
through the points (1, 1, 1) and (3, 1, 0). Note that the sketch below is generic:

Figure 14.3.3

Parametric equation  We need to select a position vector on the line (let us choose
(1, 1, 1)^T) and a vector parallel to the line such as (2, 0, −1)^T (see the sketch). A parametric
equation then follows:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \mu \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}, \quad \text{where } \mu \in \mathbb{R}.
Cartesian equation  We eliminate μ by combining the first with the third equation of the
parametric system above, thus obtaining the equation

\frac{x - 1}{2} = \frac{z - 1}{-1}.
Equivalently, this equation can be written as x+2z = 3. The other equation needed for the
Cartesian description is given directly by the second equation of the parametric system,
namely y = 1. You can confirm that the Cartesian system of equations x + 2z = 3, y = 1
is satisfied by each of the points (1, 1, 1) and (3, 1, 0) on the line.
It is important to realise that the Cartesian system x + 2z = 3, y = 1 can be interpreted
as the intersection of two planes. The first plane has Cartesian equation x + 2z = 3 and
the second plane has Cartesian equation y = 1. It is easy to see that these planes intersect
because their normal vectors (1, 0, 2)^T and (0, 1, 0)^T are not parallel. You may also note that
these normal vectors are perpendicular to the direction vector (2, 0, −1)^T of the line.
Figure 14.3.4

Let us also emphasise that the Cartesian description of a line in R3 is non-unique in a way
that goes beyond a simple rescaling. While the Cartesian equation of a line in R2 can only
be rescaled (i.e., x + 2y = 3 can be written equivalently as 5x + 10y = 15), the situation in
R3 is entirely different. For example, the equations x + 2z = 3, y = 1 describing the above
line can be added together to produce an equivalent system of equations x + y + 2z = 4,
y = 1. Geometrically, the old plane x + 2z = 3 is replaced by the new plane x + y + 2z = 4,
but the intersection of this new plane with the plane y = 1 still produces the same line.
Such operations (which transform a linear system of equations into an equivalent linear
system in the sense that they preserve the set of solutions) correspond to sequences of
elementary row operations.
Example 14.3.5 Find a parametric equation for the line in R3 described by the system
of Cartesian equations x + 2z = 3, y = 1.
Let us utilise the fact that a parametric equation is a general solution for the Cartesian
equations. We have two equations for three variables, which implies that one of the vari-
ables x, y, z can be regarded as free. Clearly, this variable cannot be y because y is equal
to 1. However, we can consider z to be a free variable and write z = t, where t ∈ R. Then,
equation x + 2z = 3 implies that x = 3 − 2t. Putting x = 3 − 2t, y = 1 and z = t in
vector form we obtain

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 - 2t \\ 1 \\ t \end{pmatrix}.

This provides a parametric equation for the line:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}.

Check that this equation is equivalent to the parametric
equation of Example 14.3.2.
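As a quick computational cross-check (an illustrative Python sketch using the sympy library; the variable names are our own), the general solution of the Cartesian system x + 2z = 3, y = 1 can be computed directly:

    from sympy import Matrix, linsolve, symbols

    x, y, z = symbols('x y z')

    # Coefficient matrix and right-hand side of x + 2z = 3, y = 1
    A = Matrix([[1, 0, 2], [0, 1, 0]])
    b = Matrix([3, 1])

    # linsolve returns the general solution, with z left as a free variable
    print(linsolve((A, b), x, y, z))  # {(3 - 2*z, 1, z)}

Replacing the free variable z by the parameter t recovers the parametric equation above.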
We can now complete the presentation of lines in R3 by briefly describing the second case.
Case (ii): Suppose we are given a point (p1 , p2 , p3 ) on the line and a vector u parallel to
the line. Then, denoting the position vector to the point (p1 , p2 , p3 ) by p, we can write
down a parametric equation directly: x = p + su, s ∈ R. The best way of constructing
a Cartesian equation is by eliminating the parameter s from the parametric system, as
discussed earlier. This always results in a system of two linear equations for the variables
x, y and z.
14.4 Geometric and algebraic approaches to linear systems
Consider a system of 3 linear equations for 3 unknowns. Each equation represents a plane.
A solution of this system is a vector whose components x, y and z must satisfy all equations,
which means that the end-point of this vector is on all planes, so (x, y, z) has to be a point
of intersection of these planes. There are eight distinct geometric possibilities, illustrated
below by suppressing one dimension (what appears as a line is actually a plane):

Figure 14.4.1

Let us investigate these cases by means of simple examples. We will analyse each example
from both the geometric and the algebraic point of view.
Example 14.4.2 A typical example of three distinct parallel planes is the following:
    x + y − 2z = 2
    x + y − 2z = 5
    x + y − 2z = 7.
From the geometric point of view, all planes have (1, 1, −2)^T as a normal vector, so they are
either parallel or coincident. The first plane goes through the point (2, 0, 0) while the
second and third planes do not go through that point. This already tells us that the three
planes have no point of intersection. Moreover, the second plane goes through the point
(5, 0, 0) while the third plane does not go through that point. So the three planes are
indeed distinct parallel planes.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has two rows whose first three entries are zeros and whose fourth
entry is non-zero. In particular, if we subtract the first row from the second and third
rows, we get:

\begin{pmatrix} 1 & 1 & -2 & 2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 5 \end{pmatrix}.
Therefore, the reduced system has two inconsistent rows and no solution.
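This kind of analysis is easy to automate. The following is an illustrative Python sketch (our own code, using the sympy library); note that rref() continues all the way to reduced row echelon form, so the inconsistency shows up as a leading one in the last column:

    from sympy import Matrix

    # Augmented matrix of the system of Example 14.4.2
    M = Matrix([[1, 1, -2, 2],
                [1, 1, -2, 5],
                [1, 1, -2, 7]])

    R, pivots = M.rref()
    print(R)       # rows (1, 1, -2, 0), (0, 0, 0, 1), (0, 0, 0, 0)
    print(pivots)  # (0, 3): a pivot in the last column signals inconsistency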
Example 14.4.3 A typical example of two coincident planes and a distinct parallel plane
is the following:

    x + 3y = 2
    x + 3y = 7
    2x + 6y = 4.
From the geometric point of view, all planes have a multiple of n = (1, 3, 0)^T as a normal
vector, so they are parallel or coincident. The first and third planes both go through the
point (2, 0, 0) so they are coincident. The second plane does not go through that point, so
it is indeed a distinct parallel plane. There is no point lying on all three planes, so there
is no point of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has a row of zeros and another row whose first three entries are zeros
and whose fourth entry is non-zero. In particular, if we subtract the first row once from
the second row and twice from the third row, we get:
\begin{pmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Therefore, the reduced system has an inconsistent row and no solution.


Example 14.4.4 A typical example of three coincident planes is the following:
    x + 5y + 3z = 2
    2x + 10y + 6z = 4
    −x − 5y − 3z = −2.

From the geometric point of view, all planes have a multiple of n = (1, 5, 3)^T as a normal
vector, so they are parallel or coincident. All planes go through the point (2, 0, 0) so they
are coincident. Hence, all points on the first plane are also on the second and third plane.
There are infinitely many points of ‘intersection’; in fact, a whole plane of such points.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix of this system has two rows of zeros, showing that two of these linear equations
are redundant. Since there is a single restriction on the three variables (x, y, z), there are
infinitely many solutions; in fact, there are two free parameters in the general solution.
Indeed, if we subtract the first row twice from the second row and add it once to the third
row, we get:

\begin{pmatrix} 1 & 5 & 3 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
For example, y and z can be regarded as free parameters, so we get a two-parameter family
of solutions.

Example 14.4.5 A typical example of two parallel planes with a third plane intersecting
them is the following:

    x + z = 5
    x + y + z = 6
    x + y + z = 8.
From the geometric point of view, the last two planes have (1, 1, 1)^T as a normal vector so
they are parallel or coincident. The second plane goes through the point (1, 1, 4) while
the third plane does not go through this point, so the planes are not coincident. The first
plane has (1, 0, 1)^T as a normal vector, so it is not parallel to the last two. Although this plane
intersects the other two planes, there is no point that lies on all three planes. Hence, there
is no point of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix has two leading ones (because there are at least two non-parallel planes), but also
a row whose first three entries are zero and whose fourth entry is non-zero. In particular,
if we subtract the first row from the second row and we also subtract the second row from
the third row, we get:

\begin{pmatrix} 1 & 0 & 1 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Hence, there is an inconsistent row and no solution.

Example 14.4.6 A typical example of two coincident planes with a third plane inter-
secting them is the following:
    x + z = 5
    x + y + z = 6
    2x + 2y + 2z = 12.
From the geometric point of view, the last two planes have a multiple of (1, 1, 1)^T as a normal
vector so they are parallel or coincident. Both these planes go through the point (6, 0, 0),
so they are coincident and hence the same plane. Moreover, this last plane is not parallel
to the first plane, because the first plane has (1, 0, 1)^T as a normal vector. Recall that two
non-parallel planes intersect at a line, so there are infinitely many solutions along this line
of intersection.
From the point of view of linear algebra, the reduced row echelon form of the augmented
matrix has two leading ones (because there are at least two non-parallel planes) and a
row of zeros. In particular, if we subtract the first row from the second row and we also
subtract the second row twice from the third row, we get:
\begin{pmatrix} 1 & 0 & 1 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Since there are only two restrictions on three variables, there is a free parameter in the
system and hence a line of solutions.
Example 14.4.7 A typical example of three non-parallel planes intersecting at a line is
the following:

    x + y + z = 5
    x + 2y + 3z = 6
    2x + 3y + 4z = 11.
From the geometric point of view, we can see that these planes have non-parallel nor-
mal vectors, so they are not parallel. However, it is difficult to proceed further without
performing some calculations. Linear algebra becomes essential here.
Indeed, from the point of view of linear algebra, we can see that if we add the first two
equations together, we obtain the third equation. Hence, the third equation is redundant.
Indeed, the reduced row echelon form of the augmented matrix has two leading ones (be-
cause there are at least two non-parallel planes), but also a row of zeros, corresponding to
the redundant equation. In particular, if we subtract the first row once from the second
row and twice from the third row, we get:
\begin{pmatrix} 1 & 1 & 1 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 1 \end{pmatrix}.
After subtracting the second row from the first and third rows, we arrive at the reduced
row echelon form of the augmented matrix:
\begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Since there are only two restrictions on three variables, there is a free parameter in the
system and hence a line of solutions. Due to its shape, this case can be referred to as a
star (see figure 14.4.1).
Example 14.4.8 A typical example of three non-parallel planes with no common point
of intersection is the following:
    x + y + z = 5
    x + 2y + 3z = 6
    2x + 3y + 4z = 12.
From the geometric point of view, we can see that these planes have non-parallel normal
vectors, so they are not parallel. Again, it is difficult to proceed further without performing
some calculations. We have to rely on linear algebra once more.
From the point of view of linear algebra, we can see that if we add the first two equations
together, we obtain an equation which contradicts the third equation. Hence, the system
is inconsistent. Indeed, the reduced row echelon form of the augmented matrix has two
leading ones (because there are at least two non-parallel planes), but also a row whose first
three entries are zeros and whose fourth entry is non-zero. In particular, if we subtract the
first row once from the second row and twice from the third row, we get:
\begin{pmatrix} 1 & 1 & 1 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 1 & 2 & 2 \end{pmatrix}.
After subtracting the second row from the first and third rows, we arrive at the reduced
row echelon form of the augmented matrix:
\begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Hence, there is an inconsistent row and no solution. This case can be referred to as a
prism (see figure 14.4.1).
Example 14.4.9 Finally, a typical example of three non-parallel planes intersecting at a
single point is the following:

    x = 5
    x + y = 6
    x + y + z = 8.
Again, from the geometric point of view, we can see that these planes are not parallel but
it is difficult to proceed further without performing calculations.
From the point of view of linear algebra, the reduced row echelon form of the coefficient
matrix is the identity. In particular, the reduced row echelon form of this system is
\begin{pmatrix} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}.
Hence, there is a unique solution.
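For readers who want to experiment, the unique solution of this last example can be confirmed with a short Python sketch (our own illustrative code, using sympy):

    from sympy import Matrix, linsolve, symbols

    x, y, z = symbols('x y z')

    A = Matrix([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
    b = Matrix([5, 6, 8])

    # The three planes meet at the single point (5, 1, 2)
    print(linsolve((A, b), x, y, z))  # {(5, 1, 2)}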

14.5 Exercises for self study


Exercise 14.5.1 Consider the plane Π described by the Cartesian equation

2x + y − 3z = 4

(a) Find a vector parametric equation of the plane Π by treating x and z as free parameters;
i.e. by letting x = s, s ∈ R, and letting z = t, t ∈ R.
(b) Find a vector parametric equation of the same plane Π by performing elementary row
operations on the augmented matrix
\left(\begin{array}{ccc|c} 2 & 1 & -3 & 4 \end{array}\right)
in order to obtain the reduced row echelon form, and then letting y = λ, λ ∈ R, and
z = µ, µ ∈ R, play the role of the free parameters.
(c) In what sense are the parametric equations obtained in parts (a) and (b) equivalent?

Exercise 14.5.2 Consider the line ℓ of intersection of the planes Π1 and Π2 described by
the Cartesian equations

Π1 : x + 2y − 3z = 5 and Π2 : x + y + z = 7.

(a) Find a vector parametric equation for the line ℓ and confirm that its direction vector
is orthogonal to the normal vectors of both planes Π1 and Π2.
(b) Eliminate the parameter from the vector parametric equation of line ℓ to obtain a
Cartesian description for the line.
(c) Establish the relationship between the linear system of equations obtained in part (b)
and the original linear system
    x + 2y − 3z = 5
    x + y + z = 7.

Exercise 14.5.3 Consider the planes Π1 , Π2 , Π3 described by the Cartesian equations

Π1 : x + 3y + z = 4,

Π2 : x + y − 2z = 0,
Π3 : 2x − 7z = −4.

(a) Are any of these planes parallel? Briefly justify your answer.
(b) Using elementary row operations, find the reduced row echelon form of the augmented
matrix

\begin{pmatrix} 1 & 3 & 1 & 4 \\ 1 & 1 & -2 & 0 \\ 2 & 0 & -7 & -4 \end{pmatrix}
and hence show that these planes have a common line of intersection.
(c) Confirm that the direction vector d of the line r = p + td that you obtained in part (b)
is orthogonal to the normal vectors of all three planes Π1 , Π2 , Π3 and that the components
of the vector p satisfy all three equations
    x + 3y + z = 4
    x + y − 2z = 0
    2x − 7z = −4.
Exercise 14.5.4 Consider a vertical plane Π (i.e. parallel to the z-axis) which passes
through the points (1, 3, 0) and (1, 5, 2).
(a) Find a vector parametric equation for Π in the form r = p + sv + tw.
(b) Considering the cross product of the vectors v and w, or otherwise, find a Cartesian
equation for Π in the form
ax + by + cz = d
for some constants a, b, c, d to be determined.
(c) Confirm that the points (1, 3, 0) and (1, 5, 2) both satisfy the Cartesian equation ob-
tained in part (b) and that this Cartesian equation indeed describes a plane that is vertical.

14.6 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 1.9, 1.10 and 1.11 of our Algebra Textbook are all relevant.

• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 1.5, 1.6, 1.7 and 1.8 of our Calculus Textbook are also relevant.

15 Systems of Linear Equations, 1 of 2


15.1 Systems of m linear equations for n unknowns
We extend and generalise our approach to linear systems by introducing the Gauss-Jordan
elimination method. This is a systematic procedure for dealing with any system of m
linear equations for n unknowns. It is based on an idea that we have already encountered,
namely that of using elementary row operations to convert the augmented matrix (A|b) of
the system to its reduced row echelon form (RRE(A)|c) for some vector c. In this simpler
equivalent form, the consistency or inconsistency of the system is apparent. Moreover, if
the system is consistent, its general solution is readily available in parametric form.
It is useful to interpret geometrically the general solution of any linear system, so before
we begin with the Gauss-Jordan method, we need to discuss real coordinate spaces of
dimension higher than 3.
An element v ∈ Rn is an n × 1 matrix

v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},

whose components are the real numbers
v1 , v2 , ..., vn . Any such element of Rn is called an n-dimensional vector. The operations of
vector addition and scalar multiplication as well as the definitions of the norm of a vector
and the scalar product of two vectors are analogous to the corresponding operations and
definitions in R2 and R3 .
A line in Rn is a one-dimensional object described by the parametric equation

x = p + td,
where t ∈ R. The vector x = (x1, x2, . . . , xn)^T is the general position vector on the line,
the vector p = (p1, p2, . . . , pn)^T is a particular position vector on the line, and the vector
d = (d1, d2, . . . , dn)^T is a direction vector for the line.
In order to obtain a Cartesian description for a line in Rn , we eliminate the free parameter
from its vector parametric equation. This results in n − 1 independent linear equations
for the n variables x1 , . . . , xn . Conversely, a system of n − 1 independent linear equations
for the variables x1 , . . . , xn imposes n − 1 independent restrictions on these variables. This
implies that one of these variables can be regarded as a free parameter. Hence, the general
solution of this set of equations corresponds to a line in Rn . Exercise 15.5.1 (b) provides
an example of a line in R4 .
A hyperplane in Rn is an (n − 1)-dimensional geometric object described by a Cartesian
equation of the form
⟨x − p, n⟩ = 0.

The vector x is the general position vector on the hyperplane, p is a particular position
vector on the hyperplane and n is a vector orthogonal to the hyperplane.
Algebraically, a hyperplane in Rn is the set of solutions of a single linear equation for n
variables x1 , . . . , xn . Since a single restriction is imposed on x1 , . . . , xn , we deduce that n−1
of these variables remain free in the general solution. This confirms that a hyperplane is an
(n − 1)-dimensional object in Rn . Conversely, the general solution of this single equation
involves n−1 free parameters and provides a vector parametric equation for the hyperplane.
Exercise 15.5.1 (a) gives an example of a hyperplane in R4 .
In general, a flat in Rn is the set of solutions of any matrix equation Ax = b, where A is an
m × n matrix and b is an m × 1 column vector. If the system Ax = b is inconsistent, then
the flat is the empty set. If the system Ax = b is consistent and the reduced row echelon
form of the m × n matrix A has k leading ones, then n − k of the variables x1 , . . . , xn can
be regarded as free parameters in the general solution of this system. In this case, the flat
is an (n − k)-dimensional object in Rn . Conversely, given a Cartesian description for an
(n − k)-dimensional flat in Rn in the form of a system of k independent linear equations
for n variables x1 , . . . , xn , the general solution of this system involves n − k free parameters
and provides a vector parametric equation for this flat. Exercise 15.5.2 gives an example
of a two-dimensional flat in R5 .
Remark 15.1.1 Let us emphasise again that the general solution of any set of linear
equations is a flat. In particular, points, lines, planes and hyperplanes are all flats. Also
note that in Rn , a consistent system of k independent linear equations implies the presence
of n − k free parameters in the general solution of the system. For example, a consistent
system of seven independent linear equations in R12 implies a 5-dimensional flat in R12 .
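As an illustrative sketch (our own Python code using sympy, with a made-up system), the dimension n − k of a solution flat can be read off from the rank of the coefficient matrix:

    from sympy import Matrix

    # A consistent system of two independent equations for four unknowns:
    # x1 + 2x3 - 5x4 = 7 and x2 + x3 = 1 (our own example)
    A = Matrix([[1, 0, 2, -5],
                [0, 1, 1, 0]])

    n = A.cols    # number of variables
    k = A.rank()  # number of independent equations (leading ones)
    print(n - k)  # 2: the solution set is a 2-dimensional flat in R^4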

15.2 The Gauss-Jordan method


We are already familiar with elementary row operations and the concept of row equivalence
from our discussion of linear systems of n equations for n unknowns in previous sections of
the lecture notes. Row equivalence implies that if the reduced system (RRE(A)|c) is con-
sistent, then it has the same set of solutions as the original system (A|b). If (RRE(A)|c)
is inconsistent, then row-equivalence implies that the original system (A|b) is inconsistent
as well.
We are also already familiar with linear systems whose number of unknowns is not equal to
the number of equations. For example, we encountered linear systems of three equations for
three unknowns where one of the equations is redundant. Such a system can be regarded
as a system of two equations for three unknowns and constitutes an example of the general
situation where the coefficient matrix A of the system Ax = b is an m × n matrix.
Since no additional theory is required for dealing with general linear systems, let us illus-
trate the Gauss-Jordan method by means of an example.
Example 15.2.1 Consider a linear system consisting of four equations for five unknowns:


    x1 + x2 + x3 + x4 + x5 = 3
    2x1 + x2 + x3 + x4 + 2x5 = 4
    x1 − x2 − x3 + x4 + x5 = 5
    x1 + x4 + x5 = 4.

The augmented matrix of this system is

(A|b) = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 2 & 1 & 1 & 1 & 2 & 4 \\ 1 & -1 & -1 & 1 & 1 & 5 \\ 1 & 0 & 0 & 1 & 1 & 4 \end{pmatrix}.

A sequence of elementary row operations that puts the augmented matrix in row echelon
form is the following:

R2 ↦ R2 − 2R1, R3 ↦ R3 − R1, R4 ↦ R4 − R1:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & -1 & -1 & -1 & 0 & -2 \\ 0 & -2 & -2 & 0 & 0 & 2 \\ 0 & -1 & -1 & 0 & 0 & 1 \end{pmatrix}

R2 ↦ −R2:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & -2 & -2 & 0 & 0 & 2 \\ 0 & -1 & -1 & 0 & 0 & 1 \end{pmatrix}

R3 ↦ R3 + 2R2, R4 ↦ R4 + R2:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 2 & 0 & 6 \\ 0 & 0 & 0 & 1 & 0 & 3 \end{pmatrix}

R3 ↦ (1/2)R3:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 0 & 3 \end{pmatrix}

R4 ↦ R4 − R3:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 3 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
This matrix is in echelon form and the system is clearly consistent, with one row simply
having dropped out. We continue the process of reduction in order to obtain a reduced
row echelon form, starting with the third row:
R1 ↦ R1 − R3, R2 ↦ R2 − R3:

\begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

R1 ↦ R1 − R2:

\begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. The matrix is equivalent to the system of equations
    x1 + x5 = 1
    x2 + x3 = −1
    x4 = 3.

Since we have three restrictions for five variables, we know that our set of solutions will
be a 2-dimensional flat in R5 . Moreover, the form of these equations tells us that we can
regard the variables x1 , x2 and x4 (which correspond to the leading ones) as the variables
for which we are solving these equations and then treat x3 and x5 as free parameters.
At this stage, it is useful to introduce the following terminology. The variables correspond-
ing to the leading ones in the reduced row echelon form of an augmented matrix are called
the leading variables. The other variables are called the non-leading variables.
Hence, in Example 15.2.1, x1 , x2 and x4 are the leading variables and x3 and x5 are the
non-leading variables. Assigning to x3 and x5 the arbitrary values s ∈ R and t ∈ R
respectively, we solve for the leading variables x1 , x2 and x4 . We get

x4 = 3, x2 = −1 − s, x1 = 1 − t.

Then we express the general solution in vector parametric form:


\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 - t \\ -1 - s \\ s \\ 3 \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 3 \\ 0 \end{pmatrix} + s \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.

This is a vector parametric equation for a 2-dimensional flat in R5 (some sort of plane in
R5). The flat goes through the point (1, −1, 0, 3, 0) and is parallel to the direction vectors
(0, −1, 1, 0, 0)^T and (−1, 0, 0, 0, 1)^T. For any particular assignment of values to s and t, such as s = 0,
t = 1, we obtain a particular solution of the system.
In practice, it is convenient to read the general solution directly from the reduced row
echelon form of the augmented matrix. We have
(A|b) → \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

We locate the leading ones and hence identify the corresponding leading variables. Then
we locate the non-leading variables and treat them as free parameters. In the above case,
we note that the leading ones are in the first, second and fourth column. These correspond
to the leading variables x1 , x2 and x4 . We assign arbitrary parameters to the non-leading
variables x3 = s and x5 = t and then solve the three equations (appearing in the reduced
row echelon form) for the leading variables in terms of s and t. Finally, we express the
general solution in vector parametric form to recover the expression obtained previously.
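The entire reduction can be reproduced in a few lines of Python with the sympy library (an illustrative sketch of the computation, not a substitute for the hand calculation):

    from sympy import Matrix

    # Augmented matrix (A|b) of Example 15.2.1
    M = Matrix([[1, 1, 1, 1, 1, 3],
                [2, 1, 1, 1, 2, 4],
                [1, -1, -1, 1, 1, 5],
                [1, 0, 0, 1, 1, 4]])

    R, pivots = M.rref()
    print(pivots)  # (0, 1, 3): leading ones in columns 1, 2 and 4,
    print(R)       # so x1, x2, x4 are leading and x3, x5 are non-leading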

15.3 Solution sets


We have seen systems of linear equations which have a unique solution, no solution and
infinitely many solutions. It turns out that these are the only possibilities. The following
theorem establishes this fact.
Theorem 15.3.1 A system of linear equations either has no solutions, a unique solution
or infinitely many solutions.
Proof To see this, suppose that we have a consistent linear system Ax = b which does
not have a unique solution. Then it should have at least two distinct solutions, p and
q. Thinking of these solutions as position vectors in Rn , we will show that every position
vector on the line through p and q is also a solution. Therefore, as soon as there is more
than one solution, there must be infinitely many.
If p and q are vectors such that Ap = b and Aq = b, p ≠ q, then the general position
vector v(t) on the line through p and q is described parametrically by

v(t) = p + t(q − p), t ∈ R.

Hence, for any v(t) on the line, we have


Av(t) = A(p + t(q − p)) = Ap + tA(q − p) = Ap + t(Aq − Ap) = b + t(b − b) = b.

This means that the vector v(t) solves the linear system Ax = b for any t ∈ R, so there
are infinitely many solutions. □

15.4 Homogeneous systems


A homogeneous linear system is a linear system of the form Ax = 0. There are two
important facts about homogeneous systems:
• A homogeneous system Ax = 0 is always consistent.
This is because A0 = 0, so the system always has the solution x = 0. For this reason,
x = 0 is called the trivial solution. The second fact is now evident:
• If Ax = 0 has a unique solution, then it must be the trivial solution, x = 0.
We also have a theorem for homogeneous systems:
Theorem 15.4.1 If A is an m × n matrix with m < n, then Ax = 0 has infinitely many
solutions.
Proof The system is consistent because it is homogeneous. Its solutions can be found by
reducing the augmented matrix (A|0). If A is m × n and m < n, then the reduced row
echelon form of A contains at most m leading ones (more leading ones would not fit in such a
matrix). Therefore, there are at most m leading variables, which means that there must be
at least n−m free parameters. Since m < n, we have n−m ≥ 1, so there must be at least one
free parameter in the general solution of such a system. It follows that there are infinitely
many solutions. □
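A quick computational illustration of Theorem 15.4.1 (our own Python sketch with sympy, using an arbitrary 2 × 3 matrix):

    from sympy import Matrix

    # A homogeneous system Ax = 0 with m = 2 equations and n = 3 unknowns
    A = Matrix([[1, 2, 3],
                [4, 5, 6]])

    # nullspace() returns a list of vectors spanning the solutions of Ax = 0;
    # a non-empty list confirms that there are infinitely many solutions
    print(A.nullspace())  # [Matrix([[1], [-2], [1]])]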

15.5 Exercises for self study


Exercise 15.5.1 (a) Find a vector parametric equation for the hyperplane in R4 described
by the Cartesian equation
x1 + 2x3 − 5x4 = 7.

(b) Find a vector parametric equation for the line in R4 described by the system of Cartesian
equations

    x1 + x4 = 0
    2x2 − x3 + 3x4 = −3
    x1 + x2 + x3 + x4 = 5.
Exercise 15.5.2 Solve the following system of equations Ax = b by applying the Gauss-
Jordan elimination method. Interpret geometrically your general solution in R5 :
    x1 − x2 + x3 + x4 + 2x5 = 4
    −x1 + x2 + x4 − x5 = −3
    x1 − x2 + 2x3 + 3x4 + 4x5 = 7.

Exercise 15.5.3 (a) Write down the augmented matrix for each of the following systems
of equations, where (x, y, z) ∈ R3 , and solve the system by reducing the augmented matrix
to reduced row echelon form. Express all solutions in vector parametric form.
(i)
    x + y + 2z = 2
    2y + z = 0
    −x + y − z = 0

(ii)
    x + y + 2z = 2
    2y + z = 0
    −x + y − z = −2

(b) Find a vector parametric equation for the following flat in R5 :


    x1 + x2 + x3 + x5 = 1
    3x1 + 3x2 + 6x3 + 3x4 + 9x5 = 6
    2x1 + 2x2 + 4x3 + x4 + 6x5 = 5.

Exercise 15.5.4 Consider the Cartesian equation x = 3.


(a) Identify the geometric object described by this Cartesian equation when the equation
is imposed on
(i) R = {x | x ∈ R},
(ii) R2 = {(x, y) | x ∈ R, y ∈ R},
(iii) R3 = {(x, y, z) | x ∈ R, y ∈ R, z ∈ R}.
(b) In each case above, find a vector parametric equation describing the relevant geometric
object.
(c) Solve the following system of four equations for three unknowns and interpret your
solution geometrically as an intersection of planes in R3 :


    x + 3y + 4z = 2
    x − y + 2z = 5
    x + 11y + 8z = −4
    5x − y + 12z = 22.

15.6 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 2.1, 2.2, 2.3 and 2.4 (up to and including the proof of Theorem 2.21) of our Algebra
Textbook are relevant.
16 Systems of Linear Equations, 2 of 2
16.1 The principle of linearity
Each equation appearing in a linear system of m equations for n unknowns imposes a single
restriction on n variables; in other words, it corresponds to a hyperplane in Rn . Therefore,
the solution set of this system is either the empty set (if the system is inconsistent) or the
set of all points (equivalently, position vectors) lying on the intersection of m hyperplanes
in Rn . In particular, provided that the system Ax = b is consistent and that the reduced
row echelon form of the coefficient matrix A has k leading ones, the set of solutions is an
(n − k)-dimensional flat in Rn .
The vector parametric equation of this flat has the form

x = p + t1 u1 + t2 u2 + · · · + tn−k un−k ,

where p is a particular solution (i.e. position vector on the flat) and t1 , t2 , . . . , tn−k are
(n − k) free parameters.
It should be clear from the examples and exercises encountered so far that each of the
vectors u1 , u2 , . . . , un−k is orthogonal to the normal vector of each hyperplane appearing
in the linear system. This is because each ui provides a direction vector for the intersection
flat of these hyperplanes and therefore also provides a direction vector for each of these
hyperplanes. As such, it must be orthogonal to the normal vector of each hyperplane.
Realising that the normal vector of the hyperplane appearing in row i of the linear system
is simply the transpose of the ith row of the coefficient matrix A, we conclude that the dot
product of the transpose of the ith row of A with each vector uj must be zero. Moreover,
this must be true for all rows of A, which implies that

Auj = 0 for each j = 1, 2, . . . , n − k.

Considering the vector p appearing in the general parametric solution

x = p + t1 u1 + t2 u2 + · · · + tn−k un−k ,

this is a position vector on the flat of intersection of the hyperplanes in question, so it


satisfies
Ap = b.

The above geometric results, namely, Auj = 0 and Ap = b, correspond to a central


theorem of linear algebra. This theorem is stated below, after introducing a couple of
preliminary definitions:
Definition 1 Given a linear system Ax = b, the system Ax = 0 is called the associated
homogeneous system.
Definition 2 For any m × n matrix A, the null space of A, denoted by N (A), is the
subset of Rn defined by
N (A) = {x ∈ Rn | Ax = 0} .
In other words, N (A) is the set of all solutions of the homogeneous system Ax = 0.
Theorem 16.1.1 Suppose that A is an m × n matrix and that the system Ax = b is
consistent. Let p be a solution of the system; that is, Ap = b. Then the set of solutions
of the system consists precisely of the vectors p + z where z ∈ N (A); that is,

{x | Ax = b} = {p + z | z ∈ N (A)} .

Proof To show the two sets are equal, we need to show that each is a subset of the other.
This means showing (i) that p + z is a solution of the system for any z in the null space
of A and (ii) that all solutions x of the system are of the form p + z for some z ∈ N (A).
We start with (i): If z ∈ N (A), then

A(p + z) = Ap + Az = b + 0 = b,

so p + z is a solution of Ax = b; that is, p + z ∈ {x | Ax = b}. This shows that

{p + z | z ∈ N (A)} ⊆ {x | Ax = b} .

Conversely, suppose that x is any solution of Ax = b. Because p is also a solution, we


have Ap = b and
A(x − p) = Ax − Ap = b − b = 0,
so the vector z = x − p is a solution of the system Az = 0; in other words, z ∈ N (A).
But then x = p + z, where z ∈ N (A). This shows that all solutions are of the form p + z
for some z ∈ N (A); that is

{x | Ax = b} ⊆ {p + z | z ∈ N (A)} .

So the two sets are equal, as required. □


The above result is known as the principle of linearity. It says that the general solution x
of a consistent linear system Ax = b is given by a particular solution p of this system plus
the general solution z of the associated homogeneous system Az = 0; that is, x = p + z.
Remark 16.1.2 In the general case where the solution set of Ax = b is an (n − k)-
dimensional flat in Rn , the general solution z of the associated homogeneous system Ax = 0
has the form
z = t1 u1 + t2 u2 + · · · + tn−k un−k ,
where {ti } are free parameters and {ui } is a set of direction vectors for the flat. Such a
vector z is said to be a linear combination of the vectors {ui }.
Remark 16.1.3 In general, given a set of vectors {vi } and a set of scalars {ai }, the
vector v = a1 v1 + · · · + an vn , is called a linear combination of the vectors {vi }.
Example 16.1.4 Solve the linear system
    x + y + z = 6
    2x − y − 3z = −9
    3x − 2z = −3.

We apply the Gauss-Jordan elimination method:

\begin{pmatrix} 1 & 1 & 1 & 6 \\ 2 & -1 & -3 & -9 \\ 3 & 0 & -2 & -3 \end{pmatrix}
\xrightarrow{\substack{R_2 \mapsto R_2 - 2R_1 \\ R_3 \mapsto R_3 - 3R_1}}
\begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & -3 & -5 & -21 \\ 0 & -3 & -5 & -21 \end{pmatrix}
\xrightarrow{R_2 \mapsto -\frac{1}{3} R_2}
\begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 5/3 & 7 \\ 0 & -3 & -5 & -21 \end{pmatrix}
\xrightarrow{\substack{R_1 \mapsto R_1 - R_2 \\ R_3 \mapsto R_3 + 3R_2}}
\begin{pmatrix} 1 & 0 & -2/3 & -1 \\ 0 & 1 & 5/3 & 7 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
By letting z = t and solving for the leading variables x and y, we obtain the general
solution of our system in vector parametric form:
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + \frac{2}{3}t \\ 7 - \frac{5}{3}t \\ t \end{pmatrix} = \begin{pmatrix} -1 \\ 7 \\ 0 \end{pmatrix} + t \begin{pmatrix} 2/3 \\ -5/3 \\ 1 \end{pmatrix}.

Equivalently (to avoid fractions), we have


\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 \\ 7 \\ 0 \end{pmatrix} + s \begin{pmatrix} 2 \\ -5 \\ 3 \end{pmatrix}.

This is a one-dimensional flat (i.e., a line) in R3. Any solution of this system has the form
x = p + z, where Ap = b and Az = 0. Indeed, with p = (−1, 7, 0)^T and z = s(2, −5, 3)^T, we
have that

Ap = \begin{pmatrix} 1 & 1 & 1 \\ 2 & -1 & -3 \\ 3 & 0 & -2 \end{pmatrix} \begin{pmatrix} -1 \\ 7 \\ 0 \end{pmatrix} = \begin{pmatrix} 6 \\ -9 \\ -3 \end{pmatrix} = b

and, for all s,

Az = \begin{pmatrix} 1 & 1 & 1 \\ 2 & -1 & -3 \\ 3 & 0 & -2 \end{pmatrix} s \begin{pmatrix} 2 \\ -5 \\ 3 \end{pmatrix} = s \begin{pmatrix} 1 & 1 & 1 \\ 2 & -1 & -3 \\ 3 & 0 & -2 \end{pmatrix} \begin{pmatrix} 2 \\ -5 \\ 3 \end{pmatrix} = s \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = 0.
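The principle of linearity is easy to verify numerically. The following Python sketch (our own illustrative code, using sympy) checks it for the system just solved:

    from sympy import Matrix

    A = Matrix([[1, 1, 1], [2, -1, -3], [3, 0, -2]])
    b = Matrix([6, -9, -3])

    p = Matrix([-1, 7, 0])  # a particular solution: A*p == b
    d = Matrix([2, -5, 3])  # a null space direction: A*d == 0

    print(A * p == b)                  # True
    print(A * d == Matrix([0, 0, 0]))  # True
    print(A * (p + 5 * d) == b)        # True: p + 5d is also a solution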

16.2 The rank of a matrix and the main theorem revisited


The rank of a matrix A, denoted by ρ(A), is the number of leading ones in the reduced
row echelon form of A.
If A is an m × n matrix, the number of non-zero rows in the reduced row echelon form of A
(that is, the number of leading ones) cannot exceed the total number of rows of A, which
is m. Furthermore, since the leading ones must be in different columns, the number of
leading ones in the reduced row echelon form of A cannot exceed the number of columns,
which is n. Thus,
ρ(A) ≤ min {m, n} ,
where min{m, n} denotes the smaller of the two integers m and n.
In the particular case of a square n × n matrix A, if the reduced row echelon form of A
is the identity then, clearly, A has rank n. On the other hand, if the reduced row echelon
form of A is not the identity, then there must be at least a row of zeros in the reduced row
echelon form of A, which means that the rank of A is strictly less than n.
Hence, we can revisit the Main Theorem 11.4.2 characterising invertibility of a square
matrix A and add a few more equivalent statements to that collection.
The Main Theorem 16.2.1 If A is an n × n matrix, then the following statements are
equivalent.

• A−1 exists.
• Ax = b has a unique solution for any b ∈ Rn .
• Ax = 0 has only the trivial solution, x = 0.
• The reduced row echelon form of A is the identity I.
• |A| ≠ 0.
• The rank of A is n.

Finally, let us introduce two additional definitions which will help us analyse in the next
subsection the solution set of a general linear system Ax = b.
Definition 3 An m × n matrix A has full row rank if its rank is equal to the number
of rows in A; that is, if ρ(A) = m.
Definition 4 Similarly, we say that an m × n matrix A has full column rank if its
rank is equal to the number of columns in A; that is, if ρ(A) = n.
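These definitions translate directly into code. Here is a small illustrative Python sketch (our own helper functions, using sympy):

    from sympy import Matrix

    def has_full_row_rank(A):
        # rank equals the number of rows
        return A.rank() == A.rows

    def has_full_column_rank(A):
        # rank equals the number of columns
        return A.rank() == A.cols

    A = Matrix([[1, 0, 0, 1],
                [0, 1, 0, 2],
                [0, 1, 3, 1]])
    print(has_full_row_rank(A), has_full_column_rank(A))  # True False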

16.3 Analysing the set of solutions of Ax = b


In this section, we consider a general linear system Ax = b where A is an m × n matrix
of rank ρ(A). We obtain a set of simple rules that determine whether Ax = b has no
solutions, a unique solution or infinitely many solutions.
Case 1: A has full row rank
Consider a linear system Ax = b where A is an m × n matrix of rank m. Since the number
of columns, n, must be at least as large as the number m of leading ones, we deduce that
n ≥ m. Typically, the reduced row echelon form RRE((A|b)) of the augmented matrix
(A|b) of such a system looks like this:
\underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 & * & * & * & * \\ 0 & 1 & 0 & 0 & * & * & * & * \\ 0 & 0 & 1 & 0 & * & * & * & * \\ 0 & 0 & 0 & 1 & * & * & * & * \end{pmatrix}}_{n \,\ge\, m \text{ columns}} \quad (m \text{ rows})
Clearly, since the number of rows, m, is equal to the number of leading ones, there cannot
be a row of zeros in the m × n left submatrix of RRE((A|b)). Therefore, the system
is consistent. Moreover, there are (n − m) free parameters present in the system, so the
solution set is an (n − m)-dimensional flat in Rn .
The solution is unique only if the solution set does not contain any free parameters; that
is, only if n = m. If this is the case, the matrix A has full column rank and full row rank.
The corresponding flat in Rn consists of only a single point (equivalently position vector)
and can be referred to as a ‘zero-dimensional’ flat.
Remark 16.3.1 Summarising, if A has full row rank, the system Ax = b is consistent.
If, additionally, A has full column rank, the solution is unique.
Case 2: A has full column rank
Consider a linear system Ax = b where A is an m × n matrix of rank n. Since the number
of rows, m, must be at least as large as the number n of leading ones, we deduce that

m ≥ n.

Typically, the reduced row echelon form RRE((A|b)) of the augmented matrix (A|b) of
such a system looks like this:
\underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 & * \\ 0 & 1 & 0 & 0 & * \\ 0 & 0 & 1 & 0 & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & * \end{pmatrix}}_{n \text{ columns, plus the augmented column}} \quad (m \,\ge\, n \text{ rows})
In the general case where m ≥ n, whether or not the system is consistent depends on the
particular details of the equations comprising the system. For example, if the vector b on
the right hand side of the system is such that the last three rows of the above reduced
matrix consists exclusively of zeros (i.e. if the entry ‘∗’ in each of the last three rows is
zero), then the system is consistent. In that case, the solution is unique, since the full
column rank of A implies the absence of free parameters.
In the particular case when m = n, A has full row rank and full column rank so the solution
is unique regardless of the vector b on the right hand side of the system.
Remark 16.3.2 Summarising, provided that the system is consistent, if A has full column
rank, the solution is unique. If A also has full row rank, the consistency of the system is
guaranteed.
Case 3: A has neither full row rank nor full column rank
This case corresponds to a linear system Ax = b where A is an m × n matrix of rank
k where k < min(m, n). Typically, the reduced row echelon form RRE((A|b)) of the
augmented matrix (A|b) of such a system looks like this:
\underbrace{\begin{pmatrix} 1 & 0 & 0 & * & * & * \\ 0 & 1 & 0 & * & * & * \\ 0 & 0 & 1 & * & * & * \\ 0 & 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & 0 & * \end{pmatrix}}_{n \text{ columns, plus the augmented column}} \quad (m \text{ rows})

The system may or may not be consistent; this depends on the particular details of the
equations comprising the system. For example, if b is such that the last two rows of the
above matrix consist entirely of zeros (i.e. the entry ‘∗’ in each of the last two rows is
zero), the system is consistent. If this is the case, the system necessarily has infinitely
many solutions, because k < n.
Let us complete this subsection by introducing a mathematical condition that amounts to
the consistency of the linear system Ax = b. This is the following:

ρ(A|b) = ρ(A).
When the above condition is satisfied, there is no row in RRE((A|b)) whose first entries are
all zeros and whose last entry is non-zero, so the system is consistent. Indeed, the presence
of such a row in RRE((A|b)) would imply that RRE((A|b)) has an additional leading
one compared to RRE(A), which would immediately lead to ρ(A|b) > ρ(A). Collecting
one compared to RRE(A) which would immediately lead to ρ(A|b) > ρ(A). Collecting
all our findings, we have the following simple rules:

• A mathematical condition that amounts to consistency is ρ(A|b) = ρ(A).

• If A has full row rank, then we definitely have ρ(A|b) = ρ(A), so the consistency of
the system Ax = b is guaranteed for all b.

• If A does not have full row rank, the system may or may not be consistent. This
depends on the details of the vector b. Provided that ρ(A|b) = ρ(A), i.e., provided
that the system is consistent, then if A has full column rank the solution is unique,
and if A does not have full column rank there are infinitely many solutions.

• A homogeneous system Ax = 0 is always consistent, so ρ(A|b) = ρ(A) regardless


of whether or not A has full row rank. In this case, if A has full column rank the
solution x = 0 is unique, and if A does not have full column rank there are infinitely
many solutions, which include the solution x = 0.
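These rules can be packaged into a small decision function. The following Python sketch (our own illustrative code, using sympy) is one possible implementation:

    from sympy import Matrix

    def classify(A, b):
        # Compare the rank of A with the rank of the augmented matrix (A|b)
        if A.row_join(b).rank() != A.rank():
            return 'no solution'
        # Consistent: full column rank means no free parameters
        if A.rank() == A.cols:
            return 'unique solution'
        return 'infinitely many solutions'

    A = Matrix([[1, 1, 1], [1, 2, 3]])
    b = Matrix([5, 6])
    print(classify(A, b))  # infinitely many solutions (rank 2 < 3 columns)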

16.4 Exercises for self study


Exercise 16.4.1 Consider the linear system for (x, y, z) given by
    2x + 4y − 5z = a
    x − y + 2z = b

(a) Without solving the system, argue that the system is consistent and has a line ℓ of
solutions.

(b) Still without solving the system, is the line ℓ parallel to the vector (−1, 3, 2)^T?
(c) Solve the system using the Gauss-Jordan method in order to find the general parametric
solution in terms of a and b.

Exercise 16.4.2 Consider a linear system Ax = b consisting of m equations for 5 un-


knowns. It is known that the rank of A is equal to 3.
(a) State an inequality satisfied by the number m.
(b) If m is equal to 3, is the system consistent?
(c) Given that m is equal to 4, write down a vector b that guarantees consistency.
(d) Given that m is equal to 4 and that b ≠ 0, write down four typical equations that give
a consistent system.

Exercise 16.4.3 Consider the linear system for (x, y, z) given by


    3x + 2y + z = 9
    x − y + 2z = 3
    2x + 3y + az = c

(a) By performing suitable elementary row operations on the augmented matrix (A|b),
find a condition on a that guarantees the consistency of this system for all c ∈ R.
(b) In the case where the coefficient matrix above cannot guarantee the consistency of the
system, what should the value of c be in order for solutions to exist? For that particular
value of c, obtain the general solution of the system.

Exercise 16.4.4 Consider a linear system Ax = b consisting of 5 equations for n un-


knowns. It is known that the rank of A is equal to 5.
(a) State an inequality satisfied by the number n of unknowns.
(b) Is the system consistent for all b?
(c) If the system is consistent, for what values of n do we have infinitely many solutions?
(d) Given that the solution set of this system is a 3-dimensional flat in Rn , what is the
value of n?
(e) For what value of n does the solution set of this system consist of a single vector? In
what real coordinate space does this vector lie?

16.5 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 2.4, 4.1 and 4.2 of our Algebra Textbook are all relevant.
17 Vector Spaces, 1 of 4
17.1 Definition of a vector space
A real vector space V is a non-empty set equipped with a vector addition operation and
a scalar multiplication operation such that for all α, β ∈ R and all u, v, w ∈ V :

1. u + v ∈ V (closure under addition)


2. u + v = v + u (the commutative law for addition)
3. u + (v + w) = (u + v) + w (the associative law for addition)
4. there is a unique member 0 of V , called the zero vector, such that for all v ∈ V ,
v + 0 = v.
5. for every v ∈ V there is an element w ∈ V , usually written as −v and called the
negative of v, such that v + w = 0.
6. αv ∈ V (closure under scalar multiplication)
7. α(u + v) = αu + αv (distributive law)
8. (α + β)v = αv + βv (distributive law)
9. α(βv) = (αβ)v (associative law for scalar multiplication)
10. 1v = v.

The above axioms reproduce the algebraic properties of the operations of vector addition
and scalar multiplication as defined for column vectors in Rn . In other words, the vector
addition in Rn,

\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix},

and the scalar multiplication in Rn,

\lambda \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} := \begin{pmatrix} \lambda u_1 \\ \lambda u_2 \\ \vdots \\ \lambda u_n \end{pmatrix},
satisfy all the above axioms and turn the set Rn into a vector space. Other well-known
properties of vectors, such as the property that 0x = 0 for any x ∈ Rn , follow from the
above axioms. To see this:
0x = (0 + 0)x
0x = 0x + 0x by distributivity
0x + (−0x) = (0x + 0x) + (−0x) by adding the negative − 0x of 0x
0 = 0x + (0x + (−0x)) by associativity
0 = 0x + 0
0 = 0x.
Remark 17.1.1 The above axioms have been motivated by the algebraic properties of
column vectors in Rn . However, it is not only column vectors in Rn that can be added
and multiplied by scalars in a way consistent with the above axioms. For example, the set
F of all functions from R to R can be regarded as a vector space provided that suitable
operations of vector addition and scalar multiplication are defined on F . These operations
allow us to view each function f in F as a vector in an abstract sense. Similarly, sets
of matrices, sets of polynomials, sets of sequences and many other sets of mathematical
objects can mimic the properties of column vectors in Rn provided that suitable definitions
of vector addition and scalar multiplication are introduced on these sets. A couple of
examples are given below:
Example 17.1.2 Let F be the set of all functions from R to R with vector addition
defined by point-wise addition of functions,

(f + g)(x) := f (x) + g(x),

and scalar multiplication defined by

(αf )(x) := αf (x).

Then F is a vector space. Note that the function which plays the role of the zero vector 0
in the vector space F is the unique function z ∈ F which satisfies axiom 4; namely, for all
f ∈ F,
(f + z)(x) = f (x) ∀x ∈ R.
Given the definition of vector addition on F , the above condition implies that

f (x) + z(x) = f (x) ∀x ∈ R,

which in turn implies that


z(x) = 0 ∀x ∈ R.
In other words, the zero vector 0 in the vector space F corresponds to the function which
maps every real number to 0; that is, the identically zero function.
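As a small aside, these operations can be modelled directly in Python with higher-order functions (our own illustrative sketch, not part of the formal development):

    # Pointwise addition and scalar multiplication of functions from R to R
    def add(f, g):
        return lambda x: f(x) + g(x)

    def scale(alpha, f):
        return lambda x: alpha * f(x)

    zero = lambda x: 0  # the 'zero vector' of F: the identically zero function

    f = lambda x: x ** 2
    print(add(f, zero)(3.0))  # 9.0: f + 0 = f, as axiom 4 requires
    print(scale(2, f)(3.0))   # 18.0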
Example 17.1.3 The set of m × n matrices with real entries is a vector space, with
vector addition and scalar multiplication defined as the usual matrix addition and scalar
multiplication of matrices. The ‘zero vector’ in this vector space is the zero m × n matrix,
which has all entries equal to 0.
Remark 17.1.4 Of course, not all definitions of vector addition and scalar multiplication
turn a set into a vector space. For example, one can define vector addition and scalar
multiplication on the set F of all functions from R to R as follows:

(f + g)(x) := f (x) + 2g(x),

(αf )(x) := αf (x) + 1.


Then, vector addition fails to satisfy axiom 2:

(g + f)(x) := g(x) + 2f(x) ≠ (f + g)(x).

Many other axioms also fail.
Remark 17.1.5 Also note that the definition of a vector space says nothing about a
norm or a scalar product. The only operations with which the definition is concerned are
vector addition and scalar multiplication. Concepts such as the norm or the scalar product
(that we introduced and visualised for column vectors in Rn ) are additional structures on
a vector space. We are going to encounter such structures in Lecture Notes 20.

17.2 Subspaces and the subspace criterion


A subspace W of a vector space V is a non-empty subset of V that is itself a vector space
under the same operations of vector addition and scalar multiplication as V .
Theorem 17.2.1 Let V be a vector space. Then a non-empty subset W of V is a
subspace of V if and only if both the following conditions hold:

• for all u, v ∈ W, u + v ∈ W (that is, W is closed under vector addition)


• for all v ∈ W and all α ∈ R, αv ∈ W (that is, W is closed under scalar multiplica-
tion)

The above theorem is known as the subspace criterion. It provides a practical way for
deciding if a subset of a vector space V is itself a vector space; i.e., a subspace of V .
Example 17.2.2 Consider the vector space R2 with the standard operations of vector
addition and scalar multiplication. Let the subset S ⊂ R2 be defined by
S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; y = 2x \right\}.
Show that S is a subspace of R2 .
The set S is a line through the origin of R2 described by the Cartesian equation y = 2x.
To show that S is a subspace of R2 we apply the subspace criterion. We note that S is a
non-empty set since, for example, the zero vector (0, 0)^T ∈ S. To show closure under vector
addition, let us take any two elements u, v of S,
addition, let us take any two elements u, v of S,
& ' & '
a c
u= where b = 2a, and v= where d = 2c.
b d
Their vector sum is the vector
& ' & ' & '
a c a+c
u+v = + = .
b d b+d
The subset S is closed under vector addition only if u + v is in S. In order to verify this,
we note that
b + d = 2a + 2c = 2(a + c),
so indeed, the components of u + v satisfy the Cartesian equation y = 2x. Hence, u + v
belongs to S.
To show closure under scalar multiplication, we take any element of S,
u = \begin{pmatrix} a \\ b \end{pmatrix} \text{ with } b = 2a,
and any scalar λ ∈ R and we consider the vector λu. We have
\lambda u = \lambda \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \lambda a \\ \lambda b \end{pmatrix}.

The subset S is closed under scalar multiplication only if λu is in S. We note that

λb = λ(2a) = 2(λa),

so indeed, the components of λu satisfy the Cartesian equation y = 2x. Hence, λu belongs
to S.
By the subspace criterion, S is a subspace of R2 .
Example 17.2.3 With the vector space R2 as in Example 17.2.2, consider the subset
U ⊂ R2 defined by
U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; y = 2x + 1 \right\}.
Show that U is not a subspace of R2 .
We note that U is a line in R2 which does not go through the origin. Hence, the zero vector
(0, 0)^T ∉ U. Any subspace of R2 must be a vector space in its own right, so it must contain
the unique zero vector of R2 in order to have any chance of being a vector space. Since
(0, 0)^T ∉ U, axiom 4 fails. This implies that U is not a subspace of R2.
Remark 17.2.4 Examples 17.2.2 and 17.2.3 are visualised below:

Figure 17.2.5

It should be clear from the first diagram that the vector sum of any two elements of S is an
element of S and that any scalar multiple of any element of S is an element of S. On the
other hand, U is not closed under vector addition and scalar multiplication. The vectors
depicted provide counterexamples.
Remark 17.2.6 Stated without proof, the following is a valid result: Any flat in Rn
(that is, point, line, plane, etc) that contains the origin of Rn is a subspace of the vector
space Rn under the standard operations of vector addition and scalar multiplication. On
the other hand, any flat in Rn that does not contain the origin of Rn is not a subspace of
Rn under the same operations.
Remark 17.2.7 A subset S of a vector space V which contains the zero vector of V may
or may not be a subspace of V . For example, consider V = R2 and let S be the subset of
V depicted below:

The subset S contains the zero vector of V , but it is not closed under vector addition and
scalar multiplication. The vectors depicted provide counterexamples.

17.3 Exercises for self study


Exercise 17.3.1 Consider the set V = R2 and define vector addition and scalar multi-
plication as follows:
For u ∈ V and v ∈ V , vector addition is defined by numerical addition in the standard
way:

u + v = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} := \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}.

For u ∈ V and λ ∈ R, scalar multiplication is defined by numerical multiplication in a


non-standard way:

\lambda v = \lambda \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} := \begin{pmatrix} \lambda v_1 \\ 0 \end{pmatrix}.

Show that these definitions do not turn the set V = R2 into a vector space by identifying
an axiom that fails to hold.

Exercise 17.3.2 Consider the vector space V = R2 where vector addition and scalar
multiplication are defined in the standard way:
u + v = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} := \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}

and

\lambda v = \lambda \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} := \begin{pmatrix} \lambda v_1 \\ \lambda v_2 \end{pmatrix}.
Identify which of the following subsets of R2 are subspaces of R2 :
(i) Q = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; 2x - y = 0 \right\},

(ii) R = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x = 1 \right\},

(iii) S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x^2 + (y - 1)^2 = 1 \right\}.

Exercise 17.3.3 Consider the set V of positive real numbers


V = { v ∈ R | v > 0 }.
Define the operations of vector addition and scalar multiplication as follows:
For u ∈ V, v ∈ V , vector addition is defined by numerical multiplication,
u + v := uv.
For v ∈ V and scalar k ∈ R, scalar multiplication is defined by numerical exponentiation,
kv := v k .
It can be shown that these two operations satisfy all the axioms of a vector space.
(a) Identify which element of V plays the role of the ‘zero vector’.
(b) Given any v ∈ V , identify which element of V plays the role of ‘negative v’.
(c) Verify that the following axiom holds:
k(u + v) = (ku) + (kv).

Exercise 17.3.4 Identify which of the following subsets of R3 are subspaces of R3 under
the standard operations of vector addition and scalar multiplication:
S1 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + y + z = 0 \right\}, \quad S2 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x^2 + y^2 + z^2 = 1 \right\},

S3 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; xy = 0 \right\}, \quad S4 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x = 0 \text{ and } y = 0 \right\}.
In each case, provide a proof or a counterexample to justify your answer. Also describe
each subset geometrically.

17.4 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 5.1 and 5.2 of our Algebra Textbook are relevant.
18 Vector Spaces, 2 of 4
18.1 Linear span
Recall that by a linear combination of vectors v1 , v2 , . . . , vk we mean a vector v of the
form
v = α1 v1 + α2 v2 + · · · + αk vk ,
for some constants αi ∈ R.
Now suppose that V is a vector space and that the vectors v1 , v2 , . . . , vk all belong to V .
The linear span of the set X = {v1 , v2 , . . . , vk }, denoted by Lin(X) or Lin{v1 , v2 , . . . , vk },
is the set of all linear combinations of the vectors v1 , v2 , . . . , vk . That is,

Lin {v1 , v2 , . . . , vk } = {α1 v1 + α2 v2 + · · · + αk vk | α1 , α2 , . . . , αk ∈ R} .

Example 18.1.1 Let V = R2 and consider the following three vectors


v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.

The linear span of the set {v1 , v2 , v3 } is the set


N & ' & ' & ' D O
1 0 2 D
Lin {v1 , v2 , v3 } = α1 + α2 + α3 D α 1 , α2 , α3 ∈ R .
0 1 3

Note that Lin{v1 , v2 , v3 } = R2 because the statements Lin{v1 , v2 , v3 } ⊆ R2 and R2 ⊆


Lin {v1 , v2 , v3 } are both true.
Indeed, Lin{v1 , v2 , v3 } ⊆ R2 because any vector u ∈ Lin{v1 , v2 , v3 } has the form
u = \begin{pmatrix} \alpha_1 + 2\alpha_3 \\ \alpha_2 + 3\alpha_3 \end{pmatrix},

which implies that u also belongs to R2. Conversely, any vector v = (a, b)^T ∈ R2 can be
written as a linear combination of the vectors v1, v2, v3; for example,

\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 0 \begin{pmatrix} 2 \\ 3 \end{pmatrix}.

Hence v also belongs to Lin{v1 , v2 , v3 }.


Example 18.1.2 Let V = R2 and consider the vector v = (1, 3)^T. The linear span Lin{v}
is the set

Lin\{v\} = \left\{ \alpha \begin{pmatrix} 1 \\ 3 \end{pmatrix} \;\middle|\; \alpha \in \mathbb{R} \right\},

which is a line in R2 through the origin.
Theorem 18.1.3 If X = {v1 , v2 , . . . , vk } is a set of vectors that all belong to a vector
space V , then Lin(X) is a subspace of V . In fact, it is the smallest subspace of V containing
the vectors v1 , v2 , . . . , vk .
Proof The set Lin(X) is non-empty, since

0 = 0v1 + · · · + 0vk ∈ Lin(X).

Moreover, for all v = α1 v1 + · · · + αk vk ∈ Lin(X) and w = β1 v1 + · · · + βk vk ∈ Lin(X)


and all scalars λ ∈ R, we have that
v + w = (α1 + β1 )v1 + · · · + (αk + βk )vk ∈ Lin(X), and

λv = (λα1 )v1 + · · · + (λαk )vk ∈ Lin(X),

so by the Subspace Criterion, Lin(X) is a subspace of V .


Furthermore, any subspace of V which contains the vectors v1 , v2 , . . . , vk must also contain
all linear combinations of these vectors, so it must contain Lin(X). That is, Lin(X) is the
smallest subspace of V containing v1 , v2 , . . . , vk . □

Example 18.1.4 Let V = R4 and consider the set of vectors X = {v1 , v2 } given by
v1 = (1, 0, 0, 0)^T ∈ R4 and v2 = (0, 1, 0, 0)^T ∈ R4. Theorem 18.1.3 tells us that Lin{v1 , v2 } is a subspace of
R4, call it S. We can express S parametrically as follows:

S = \left\{ \begin{pmatrix} s \\ t \\ 0 \\ 0 \end{pmatrix} \;\middle|\; s \in \mathbb{R},\ t \in \mathbb{R} \right\}.

Theorem 18.1.3 also tells us that S is the smallest subspace of R4 containing the vectors
v1 and v2. For example, the subspace

W = \left\{ \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ 0 \end{pmatrix} \;\middle|\; \alpha \in \mathbb{R},\ \beta \in \mathbb{R},\ \gamma \in \mathbb{R} \right\}

of R4 also contains v1 and v2, but it is larger than S since it also contains (0, 0, 1, 0)^T ∉ S.
Given a set of vectors X = {v1 , v2 , . . . , vk } that all belong to a vector space V , we say
that X spans V if Lin(X) = V ; that is, if every vector v ∈ V can be written as a linear
combination
v = α1 v1 + α2 v2 + · · · + αk vk
for some scalars α1 , . . . , αk .
In the case where V = Rn , consider the following frequently occurring problem: We have
a set of vectors {v1 , v2 , . . . , vk } in Rn and we want to know if this set spans Rn ; that is,
we want to know if Lin{v1 , v2 , . . . , vk } = Rn .

Clearly, Lin{v1 , v2 , . . . , vk } ⊆ Rn because any linear combination of vectors in Rn is also
in Rn . So, in order to establish the equality Lin{v1 , v2 , . . . , vk } = Rn , we only need to
show that Rn ⊆ Lin {v1 , v2 , . . . , vk }. In other words, given any b ∈ Rn , we need to show
that we can find scalars α1 , α2 , . . . , αk such that
α1 v1 + α2 v2 + · · · + αk vk = b.
Defining the matrix A = (v1 v2 . . . vk) as the n × k matrix whose columns are the vectors
v1, v2, . . . , vk, and the k × 1 column vector x = (α1, . . . , αk)^T as the vector of the
unknowns, we
can express the above vector equation as the linear system
Ax = b.

In order to show that any b can be obtained as a linear combination of the vectors {vi },
we require that the above matrix system admits at least one solution; that is, we require
that the system (A|b) is consistent for all b. This happens if and only if A has full row rank.
This is a very useful result in practice, as the following examples illustrate:
 
Example 18.1.6 Show that the set of vectors X = {v1, v2, v3, v4}, where v1 = (1, 0, 0)^T,
v2 = (0, 1, 1)^T, v3 = (0, 0, 3)^T, v4 = (1, 2, 1)^T, spans R3.

First, let us approach this question by using the definition of ‘X spanning R3’. For this to
be the case, we must have that for any b = (b1, b2, b3)^T ∈ R3, there exist scalars
α1, α2, α3, α4 such that

α1 v1 + α2 v2 + α3 v3 + α4 v4 = b,

that is,

α1 (1, 0, 0)^T + α2 (0, 1, 1)^T + α3 (0, 0, 3)^T + α4 (1, 2, 1)^T = (b1, b2, b3)^T.
This is equivalent to the matrix equation Ax = b, with

A = [1 0 0 1; 0 1 0 2; 0 1 3 1],    x = (α1, α2, α3, α4)^T,    b = (b1, b2, b3)^T,
where the columns of the matrix A are precisely the four given vectors. Performing a
couple of elementary row operations on the augmented matrix (A|b) of the system we get

     
(A|b) = [1 0 0 1 | b1; 0 1 0 2 | b2; 0 1 3 1 | b3]
   → (R3 ↦ R3 − R2) → [1 0 0 1 | b1; 0 1 0 2 | b2; 0 0 3 −1 | b3 − b2]
   → (R3 ↦ (1/3)R3) → [1 0 0 1 | b1; 0 1 0 2 | b2; 0 0 1 −1/3 | (b3 − b2)/3].

The system is consistent since A has full row rank. Hence, X spans R3 .
Alternatively, and faster, we only consider A = (v1 v2 v3 v4) and simply verify that it has
full row rank:

A = [1 0 0 1; 0 1 0 2; 0 1 3 1] → (R3 ↦ R3 − R2) → [1 0 0 1; 0 1 0 2; 0 0 3 −1]
   → (R3 ↦ (1/3)R3) → [1 0 0 1; 0 1 0 2; 0 0 1 −1/3].

The conclusion that X spans R3 follows.
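The same conclusion can be checked numerically. The Python sketch below (an illustration, not a replacement for the row reduction above) computes the rank of A directly; full row rank is exactly the spanning condition.

import numpy as np

A = np.array([[1, 0, 0, 1],
              [0, 1, 0, 2],
              [0, 1, 3, 1]], dtype=float)   # columns are v1, v2, v3, v4

print(np.linalg.matrix_rank(A) == A.shape[0])   # True: X spans R3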


Example 18.1.7 Show that the set X = { (1, 2)^T, (2, 4)^T } does not span R2.

Again, let us first approach this question by using the definition of ‘X spanning R2 ’. For
this to be the case, we must find scalars α1 , α2 such that
α1 (1, 2)^T + α2 (2, 4)^T = (b1, b2)^T

for any (b1, b2)^T ∈ R2. Performing a few elementary row operations on the augmented matrix
of this system we find

[1 2 | b1; 2 4 | b2] → (R2 ↦ R2 − 2R1) → [1 2 | b1; 0 0 | b2 − 2b1].
In this case, it is not true that the system is consistent for all (b1, b2)^T ∈ R2, since any
(b1, b2)^T which satisfies b2 − 2b1 ≠ 0 immediately leads to inconsistency. Hence X does not
span R2.
Alternatively, and faster, we simply focus on A = (v1 v2) and verify that it does not have
full row rank:

A = [1 2; 2 4] → (R2 ↦ R2 − 2R1) → [1 2; 0 0].
The conclusion that X does not span R2 follows.

18.2 Linear independence and linear dependence


Let V be a vector space and v1 , v2 , . . . , vk ∈ V . The vectors v1 , v2 , . . . , vk are called
linearly independent if and only if the vector equation

α1 v1 + α2 v2 + · · · + αk vk = 0

has the unique solution α1 = α2 = · · · = αk = 0.


On the other hand, we say that the vectors v1 , v2 , . . . , vk are linearly dependent if and
only if there are real numbers α1 , α2 , . . . , αk , not all zero, such that

α1 v1 + α2 v2 + · · · + αk vk = 0.

In the case where V = Rn , consider the following frequently occurring problem: We have
a set of vectors v1 , v2 , . . . , vk in Rn and we need to decide whether or not the vectors
v1 , v2 , . . . , vk are linearly independent.
Again, by letting A = (v1 v2 . . . vk) be the n × k matrix whose columns are the vectors
v1, . . . , vk and x = (α1, . . . , αk)^T be the k × 1 column vector of the unknowns
α1, α2, . . . , αk, we
express the vector equation

α1 v1 + α2 v2 + · · · + αk vk = 0

as a homogeneous linear system Ax = 0. We have linear independence precisely when the only
solution of the matrix equation Ax = 0 is the trivial solution α1 = α2 = · · · = αk = 0;
i.e., when x = 0. This happens if and only if A has full column rank, in which case the
always consistent homogeneous system does not have any free parameters in its general
solution. The following examples illustrate the usefulness of this result as well as that of
the corresponding result concerning the linear span.
Example 18.2.1 In R2, are the vectors (1, 2)^T and (1, −1)^T linearly independent? Do they
span R2?

We consider A = [1 1; 2 −1] and perform a few elementary row operations:

[1 1; 2 −1] → (R2 ↦ R2 − 2R1) → [1 1; 0 −3] → (R2 ↦ −(1/3)R2) → [1 1; 0 1]
   → (R1 ↦ R1 − R2) → [1 0; 0 1].
It follows that A has full column rank. Hence (1, 2)^T and (1, −1)^T are linearly independent.
In addition, A has full row rank. Hence, these two vectors span R2 . Note that the last
two row operations are not really needed; the situation is clear after the very first row
operation.
Example 18.2.2 In R2, are the vectors (1, 2)^T and (2, 4)^T linearly independent? Do they
span R2?

We consider A = [1 2; 2 4] and perform a few elementary row operations:

[1 2; 2 4] → (R2 ↦ R2 − 2R1) → [1 2; 0 0].
Since A does not have full column rank, the vectors (1, 2)^T and (2, 4)^T are not linearly
independent. Also, since A does not have full row rank, these two vectors do not span R2.

   
Example 18.2.3 In R3, are the vectors v1 = (1, 2, 0)^T, v2 = (0, 1, 1)^T linearly independent?
Do they span R3?

Again, both questions can be answered by considering A = (v1 v2). We perform a few
elementary row operations on A to get it in RRE form:

[1 0; 2 1; 0 1] → (R2 ↦ R2 − 2R1) → [1 0; 0 1; 0 1] → (R3 ↦ R3 − R2) → [1 0; 0 1; 0 0].

Since A does not have full row rank, v1 , v2 do not span R3 . On the other hand, A has full
column rank, so the vectors v1 , v2 are linearly independent.
Example 18.2.4 In R2, are the vectors v1 = (1, 1)^T, v2 = (1, 2)^T, v3 = (2, 3)^T linearly
independent? Do they span R2?

We consider A = [1 1 2; 1 2 3] and perform a few elementary row operations:

[1 1 2; 1 2 3] → (R2 ↦ R2 − R1) → [1 1 2; 0 1 1] → (R1 ↦ R1 − R2) → [1 0 1; 0 1 1].

Since A has full row rank, v1 , v2 , v3 span R2 . On the other hand, A does not have full
column rank, so the vectors v1 , v2 , v3 are not linearly independent.
     
Example 18.2.5 In R3, are the vectors e1 = (1, 0, 0)^T, e2 = (0, 1, 0)^T, e3 = (0, 0, 1)^T
linearly independent? Do they span R3?

We see that A = [1 0 0; 0 1 0; 0 0 1] has both full column rank and full row rank. So these
vectors are linearly independent and span R3.

Methodology regarding linear independence and linear span

Let us now collect and generalise our findings by considering k vectors in Rn. We base
our approach on the n × k matrix A = (v1 v2 . . . vk) constructed by placing the k vectors
v1, v2, . . . , vk as its columns.

Case (i): If the number of vectors, k, exceeds the dimension n of the vector space Rn , we
have that
ρ(A) ≤ n < k.
In this case, A may or may not have full row rank, but it certainly does not have full
column rank. In other words, the vectors cannot be linearly independent. They may or
may not span Rn , depending on whether ρ(A) < n or ρ(A) = n.

Case (ii): If the number of vectors, k, is less than the dimension n of the vector space
Rn , we have that
ρ(A) ≤ k < n.
A may or may not have full column rank, but it certainly does not have full row rank. In
other words, the vectors cannot span Rn . They may or may not be linearly independent,
depending on whether ρ(A) < k or ρ(A) = k.

Case (iii): If the number of vectors, k, is equal to the dimension n of the vector space Rn ,
the rank of A either satisfies
ρ(A) = k = n
and the vectors are linearly independent and also span Rn , or
ρ(A) < k = n,
in which case the vectors are not linearly independent and do not span Rn .
Alternatively, we can consider the determinant of the square matrix A. If |A| ≠ 0, the
vectors are linearly independent and span Rn. If |A| = 0, the vectors are not linearly
independent and they do not span Rn.
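The whole methodology condenses into a few lines of Python. The sketch below (illustrative only) places the given vectors as the columns of A and reads off linear independence from the column rank and spanning from the row rank.

import numpy as np

def classify(vectors):
    A = np.column_stack(vectors)    # n x k matrix with the vectors as columns
    n, k = A.shape
    r = np.linalg.matrix_rank(A)
    return {"independent": r == k, "spans": r == n}

# Example 18.2.4 revisited: three vectors in R2, so case (i) applies.
print(classify([np.array([1.0, 1.0]),
                np.array([1.0, 2.0]),
                np.array([2.0, 3.0])]))
# {'independent': False, 'spans': True}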

18.3 Exercises for self study


Exercise 18.3.1 Determine which of the following sets of vectors spans R3 , and which
of these sets consists of linearly independent vectors.
N 1 3 O N 1 1 0 2 O
S1 = 2 , 1 , S2 = 0 , 1 , 1 , 1 ,
0 0 0 1 0 3
N 1  2  1  O N 1 2 1 O
S 3 = 2  , 1  , 0  , S4 = 2 , 1 , 0 .
3 4 1 5 4 1

Exercise 18.3.2 Let F(R) be the set of all functions from R to R. It is given that F(R)
is a vector space under the operations of vector addition and scalar multiplication given
below:
(f + g)(x) := f (x) + g(x),
(λf )(x) := λf (x).
Determine which of the following two sets are subspaces of F(R):
S1 = {f ∈ F(R) | f (0) = 1} , S2 = {f ∈ F(R) | f (1) = 0} .

Exercise 18.3.3 (a) Show that the set S1 spans R3 , and that any vector v ∈ R3 can be
written as a linear combination of the vectors in S1 in infinitely many ways. Show that S2
does not span R3 .
N 1   1  0  1  O N  1  2  1  O
S1 = 2  ,  0  , 1  , 1  , S2 =  0  , 1  , 2  .
3 −1 1 0 −1 3 9

(b) Determine which of the following sets consist of linearly independent vectors.
L1 = { (1, 2)^T, (1, 3)^T },    L2 = { (1, 2)^T, (1, 3)^T, (2, 5)^T },    L3 = { (0, 0)^T, (1, 2)^T },

L4 = { (1, 2, 0)^T, (2, 7, 0)^T, (3, 5, 0)^T },    L5 = { (1, 2, 1, 3)^T, (2, 0, −1, 2)^T, (4, 4, 1, 8)^T }.

Exercise 18.3.4 Let M2 (R) be the set of all 2 × 2 matrices with real entries. It is given
that M2 (R) is a vector space under the standard operations of matrix addition and scalar
multiplication. Which of the following subsets of M2 (R) are subspaces of M2 (R)?
W1 = { [a 0; 0 b] | a, b ∈ R },    W2 = { [a² 0; 0 b²] | a, b ∈ R }.

18.4 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 5.3 and 6.1 of our Algebra Textbook are relevant.

19 Vector Spaces, 3 of 4
19.1 Theorems on linear span and linear independence
In this subsection, we collect four theorems relating to the linear span and the linear
independence or dependence of a set of vectors {v1 , v2 , . . . , vk } in a vector space V . These
theorems are applicable to a general vector space V and their proofs can be found in our
Algebra Textbook. Below, we restrict attention to the case where V = Rn and base the
proofs of these theorems on the properties of the matrix A = (v1 v2 . . . vk ). It is essential
to know these theorems and fully understand their proofs.

Theorem 19.1.1 The set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set if and only if
at least one vector vi in this set is a linear combination of the remaining vectors.
Proof This theorem is of the form ‘if and only if’, so we need to prove it both ways: If the
set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set, then the matrix A = (v1 v2 . . . vk )
does not have full column rank. Otherwise, the only solution of the equation Ax = 0,
where x = (α1, α2, . . . , αk)^T is a vector of scalars, would be the trivial solution
x = 0. This would
imply that the vectors {v1 , v2 , . . . , vk } are linearly independent, thus contradicting our
assumption that the set {v1 , v2 , . . . , vk } is linearly dependent. Hence, since A does not

have full column rank, there exists at least one free parameter t in the general solution of
the system Ax = 0. Now suppose that the role of this free parameter t is played by the
entry αi in x for some i ∈ {1, 2, . . . , k}. In other words, suppose that αi = t. We can
choose t = 1 and set any other free parameters that may appear in the general solution of
the system Ax = 0 to zero. In this way, we obtain a particular solution for x of the form
   
(α1, . . . , αi, . . . , αk)^T = (c1, . . . , ci−1, 1, ci+1, . . . , ck)^T
for some constants c1 , . . . , ci−1 , ci+1 , . . . , ck that depend on the details of the system. This
solution for x implies that
c1 v1 + · · · + vi + · · · + ck vk = 0.
Hence, solving this equation for vi , we obtain vi as a linear combination of the remaining
vectors in the set {v1 , v2 , . . . , vk }. This proves one direction of the theorem; namely, that
if the set {v1 , v2 , . . . , vk } ⊆ Rn is a linearly dependent set, then at least one vector vi in
this set is a linear combination of the remaining vectors.
Conversely, if at least one vector vi in the set {v1 , v2 , . . . , vk } is a linear combination of
the remaining vectors; that is, if
vi = c1 v1 + · · · + ci−1 vi−1 + ci+1 vi+1 + · · · + ck vk ,
we reach the conclusion that a non-trivial linear combination of the vectors in the set
{v1 , v2 , . . . , vk } (in other words, a linear combination where not all the scalars are zero) is
equal to the zero vector; namely,
c1 v1 + · · · + ci−1 vi−1 − vi + ci+1 vi+1 + · · · + ck vk = 0.
Hence, by definition, the set {v1 , v2 , . . . , vk } is linearly dependent. This proves the second
direction of the theorem and completes the proof. □

Corollary to Theorem 19.1.1 Note that in the special case where we have two vectors
v1 and v2 in Rn , where n ≥ 2, the above theorem implies that these vectors are linearly
dependent if and only if at least one of them is a scalar multiple of the other. In other
words, at least one of the following two expressions is valid:
α1 v1 + v2 = 0 for some scalar α1 (hence v2 = −α1 v1 )
or
v1 + α2 v2 = 0 for some scalar α2 (hence v1 = −α2 v2 ).
   
For example, in R3, let v1 = (1, 2, 4)^T and v2 = (2, 4, 8)^T. The matrix A = (v1 v2) does
not have full column rank, so the set {v1, v2} ⊆ R3 is a linearly dependent set. In this
particular
case, we can express each vector as a scalar multiple of the other, since
−2v1 + v2 = 0 (hence v2 = 2v1 )

and
v1 − (1/2)v2 = 0 (hence v1 = (1/2)v2).

On the other hand, let v1 = (1, 2, 4)^T and v2 = (0, 0, 0)^T. The matrix A = (v1 v2) does not
have full column rank, so the set {v1, v2} ⊆ R3 is a linearly dependent set. Here, we can express

only one of these vectors as a scalar multiple of the other, according to

0v1 + v2 = 0 (hence v2 = 0v1 ).

It is not possible to express a non-zero vector as a scalar multiple of the zero vector. This
last example motivates another theorem:

Theorem 19.1.2 A set of vectors {v1 , v2 , . . . , vk } ⊆ Rn which contains the zero vector
is a linearly dependent set.
Proof This follows immediately: If one of the vectors in the set {v1 , v2 , . . . , vk } is
the zero vector, the matrix A = (v1 v2 . . . vk ) cannot have full column rank, so the set
{v1, v2, . . . , vk} is linearly dependent. □
The following theorem captures an important property of a linearly independent set of
vectors:

Theorem 19.1.3 If {v1 , v2 , . . . , vk } ⊆ Rn is a linearly independent set and if

α1 v1 + α2 v2 + · · · + αk vk = β1 v1 + β2 v2 + · · · + βk vk

for some scalars α1 , . . . , αk and β1 , . . . , βk , then

α1 = β1 , α2 = β2 , ..., αk = βk .

Proof If {v1, v2, . . . , vk} ⊆ Rn is a linearly independent set, then the matrix A =
(v1 v2 . . . vk) has full column rank. By letting x = (α1 − β1, α2 − β2, . . . , αk − βk)^T,
we can express the vector
equation
α1 v1 + α2 v2 + · · · + αk vk = β1 v1 + β2 v2 + · · · + βk vk
as the homogeneous linear system Ax = 0. Using the fact that A has full column rank,
the only solution of this system is the trivial solution x = 0. Hence, the conclusion of the
theorem follows. □
Now, if we have a linearly dependent set of vectors S = {v1 , v2 , . . . , vk } ⊆ Rn and we add
another vector w to S, then the new set T = {v1 , v2 , . . . , vk , w} is still linearly dependent.
This is because, by Theorem 19.1.1, at least one of the ‘v’-vectors in S, call it vi, is a
linear combination of the remaining ‘v’-vectors; the same vi is then a linear combination of
the remaining vectors in T = {v1, v2, . . . , vk, w}, without needing to involve w in that
particular linear combination.
On the other hand, if the set S = {v1 , v2 , . . . , vk } ⊆ Rn is linearly independent and we add
another vector w to S, then the new set T = {v1 , v2 , . . . , vk , w} may or may not be linearly

independent. If w is in the linear span of S, then the new set T = {v1 , v2 , . . . , vk , w} fails
to be linearly independent. This is because, by definition of the linear span, any such w
is a linear combination of the ‘v’-vectors in T = {v1 , v2 , . . . , vk , w}. Hence, by Theorem
19.1.1, the set T is a linearly dependent set. So, what if w is not in the linear span of S?
The following theorem guarantees that in this case the new set T = {v1 , v2 , . . . , vk , w} is
still linearly independent.

Theorem 19.1.4 If S = {v1 , v2 , . . . , vk } is a linearly independent set of vectors in Rn


and if w ∈ Rn is not in the linear span of S; that is, w ∉ Lin(S), then the new set
T = {v1 , v2 , . . . , vk , w} is linearly independent.
Proof If S = {v1 , v2 , . . . , vk } is a linearly independent set of vectors in Rn , the n × k
matrix A = (v1 v2 . . . vk ) has full column rank:
ρ(A) = k.
If w ∉ Lin(S), the linear system described by the augmented matrix (A|w) is inconsistent.
Otherwise, it would have at least one solution, which would imply that w is a linear combi-
nation of the vectors in S, thus contradicting the assumption that w ∉ Lin(S). Therefore,
since (A|w) is inconsistent, we have that ρ(A|w) > ρ(A). However, we know that (A|w)
differs from A by a single column, so the inequality ρ(A|w) > ρ(A) is tantamount to the
equation ρ(A|w) = ρ(A) + 1. Using the fact that ρ(A) = k, we deduce that
ρ(A|w) = k + 1.
It follows that the n × (k + 1) matrix B = (v1 v2 . . . vk w) associated with the set T =
{v1 , v2 , . . . , vk , w} has rank k + 1 and hence full column rank. This means that the set
T = {v1, v2, . . . , vk, w} is linearly independent. □

19.2 Basis and dimension


Let V be a vector space. A subset B = {v1 , v2 , . . . , vk } of V is said to be a basis of V
if and only if B is a linearly independent set which spans V.
An alternative characterisation of a basis is provided by the following theorem. This is an
important theorem to understand and remember.

Theorem 19.2.1 A subset B = {v1 , v2 , . . . , vk } of a vector space V is a basis of V if


and only if every v ∈ V can be expressed as a unique linear combination of the vectors in
B.
Proof Below, we prove this theorem only for V = Rn , utilising the properties of the
matrix A = (v1 v2 . . . vk ). We start by noting that this is an ‘if and only if’ theorem, so we
have to prove both directions: If B = {v1 , v2 , . . . , vk } is a basis of Rn , then, by definition
of a basis, B is a linearly independent set which spans Rn . This implies that the n × k
matrix A = (v1 v2 . . . vk ) has to have full column rank and full row rank. This is possible
only if k = n = ρ(A), so A must be a square invertible matrix of order n. It follows that,
 v∈ R , the linear system Ax = v has a unique solution for the vector of scalars
n
for any
α1
 α2 
 
x =  .. . In other words, every v ∈ Rn can be expressed as a unique linear combination
 . 
αn

of the vectors in B:
v = α1 v1 + α2 v2 + · · · + αn vn .
Conversely, if every v ∈ Rn can be expressed as a unique linear combination of the vectors
in B,
v = α1 v1 + α2 v2 + · · · + αk vk ,
 
the equation Ax = v has a unique solution x = (α1, α2, . . . , αk)^T for any right hand side
v ∈ Rn.
Hence, the n×k matrix A is actually square and invertible of size n. As a result, A has full
column rank and full row rank. This implies that the set B = {v1 , v2 , . . . , vk }, with k = n,
is a linearly independent set which spans Rn . Hence, by definition, the set B is a basis for
Rn. □

There are many different bases for a vector space V . A fundamental fact concerning vector
spaces is that if a vector space V has a basis consisting of a finite number of vectors, k,
then all the bases of V contain precisely the same number of vectors, k. This important
result is stated here without proof, but its proof can be found in our algebra textbook.

This result enables us to define exactly what we mean by the dimension of a vector space:
The number k of vectors in a finite basis of a vector space V is called the dimension of
V.

Remark 19.2.2 The vector space {0} is defined to have dimension 0. According to this
convention, {0} has a basis that contains no vectors. The only set without elements is the
empty set ∅, so this is the basis for {0}. In other words, {0} = Lin(∅).

Example 19.2.3 Any basis B of the vector space Rn has to have n vectors. Otherwise,
the matrix A whose columns are the vectors comprising the set B cannot have full column
rank and full row rank. Hence, the dimension of what is known as the n-dimensional real
coordinate space Rn is indeed n.

Example 19.2.4 Regarding the subspaces of Rn , any flat in Rn through the origin that
has a basis of k vectors is called a k-dimensional subspace of Rn . In particular, a line
through the origin is a one-dimensional subspace of Rn , a plane through the origin is a
two-dimensional subspace of Rn , and so on, until we reach a hyperplane through the origin,
which is an (n − 1)-dimensional subspace of Rn .

Example 19.2.5 On the other hand, the vector space F(R) of all functions from R to R
under the standard operations of pointwise addition and scalar multiplication has no finite
basis. We will provide more details about it later in the course. For the time being, let
us only mention that a vector space such as F(R) is called an infinite-dimensional vector
space.

Finally, let us mention another way of characterising a basis B = {v1 , v2 , . . . , vk } of a


vector space V . The proof of the relevant theorem can be found in our Algebra textbook.

Theorem 19.2.6 If V is a vector space, then a basis B = {v1 , v2 , . . . , vk } of V is a


smallest spanning set of V .

By a smallest spanning set, we mean a set of vectors {v1 , v2 , . . . , vk } that spans V and
has the property that if any of the vectors in this set is removed, the remaining vectors
can no longer span V . In particular, this implies that a smallest spanning set is a linearly
independent set.

19.3 Coordinates with respect to a basis


The standard basis of Rn is denoted by {e1 , e2 , . . . , en }. The vectors appearing in this
basis are defined by
     
e1 = (1, 0, . . . , 0)^T,    e2 = (0, 1, . . . , 0)^T,    . . . ,    en = (0, 0, . . . , 1)^T.
 
Given any vector v ∈ Rn, where v = (v1, v2, . . . , vn)^T, we know by Theorem 19.2.1 that we
can express v as a unique linear combination of the basis vectors. Introducing scalars
α1 , α2 , . . . , αn , we have
v = α1 e1 + α2 e2 + · · · + αn en,
or, equivalently, that
       
(v1, v2, . . . , vn)^T = α1 (1, 0, . . . , 0)^T + α2 (0, 1, . . . , 0)^T + · · · + αn (0, 0, . . . , 1)^T.

It follows that α1 = v1 , α2 = v2 , . . . , αn = vn . The scalars α1 , α2 , . . . , αn are called the


standard coordinates of the vector v with respect to the standard basis. They are
denoted by (v). Of course, the coordinates (v) of v with respect to the standard basis
coincide numerically with the entries of v as a vector in Rn ; i.e.,
 
(v) = (v1, v2, . . . , vn)^T = v.

However, this is not the case for other bases of Rn .

In general, if B = {v1 , v2 , . . . , vk } is a basis of a k-dimensional vector space V and v is


any vector in V , the scalars α1 , α2 , . . . , αk appearing in the unique linear combination

v = α1 v1 + α2 v2 + · · · + αk vk

are called the coordinates of v with respect to the basis B and are denoted by (v)B . The
subscript B can be omitted if the basis B happens to be the standard basis.

Example 19.3.1 In R2 , consider the standard basis B = {e1 , e2 } and a non-standard basis
C = {v1 , v2 }, where
e1 = (1, 0)^T,    e2 = (0, 1)^T,    v1 = (1, 2)^T,    v2 = (1, −1)^T.

Given a vector v ∈ R2, where

v = (2, −5)^T,
the coordinates (v) of v with respect to the standard basis B are obtained by solving the
equation
α1 e1 + α2 e2 = v,
which is equivalent to the linear system
[1 0; 0 1] (α1, α2)^T = (2, −5)^T.

Clearly, α1 = 2 and α2 = −5, which means that


(v) = (2, −5)^T.

On the other hand, the coordinates (v)C of v with respect to the non-standard basis C are
obtained by solving the equation

α1 v1 + α2 v2 = v,

which is equivalent to the linear system


[1 1; 2 −1] (α1, α2)^T = (2, −5)^T.

Performing a few elementary operations (the steps are omitted), we find that α1 = −1 and
α2 = 3, which means that

(v)C = (−1, 3)^T.
Note that the vector equation v = −1v1 + 3v2 is a valid equation irrespective of the
basis with respect to which the coordinates of these vectors are expressed. In particular,
expressing the vector equation v = −1v1 +3v2 with respect to the standard basis B results
in the matrix equation
(v) = −1(v1 ) + 3(v2 ).
Indeed, we have

(2, −5)^T = −1 (1, 2)^T + 3 (1, −1)^T.
Similarly, expressing the vector equation v = −1v1 + 3v2 with respect to the non-standard
basis C results in the matrix equation

(v)C = −1(v1 )C + 3(v2 )C .

Realising that

(v1)C = (1, 0)^T and (v2)C = (0, 1)^T,

the above matrix equation becomes

(−1, 3)^T = −1 (1, 0)^T + 3 (0, 1)^T,

which is again a valid equation.
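Finding coordinates with respect to a non-standard basis amounts to solving a square linear system, as the following Python sketch (illustrative only) shows for Example 19.3.1.

import numpy as np

P = np.array([[1.0, 1.0],
              [2.0, -1.0]])      # columns are the basis vectors v1 and v2
v = np.array([2.0, -5.0])

coords = np.linalg.solve(P, v)   # unique solution because P is invertible
print(coords)                    # [-1.  3.], i.e. v = -1*v1 + 3*v2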

19.4 Exercises for self study


Exercise 19.4.1 Which of the following sets Ti provide a basis of R3 ? Briefly explain
your answers:
             
 1 2 4 7   1 1 
T 1 = 2  , 1  , 1  , 2  T2 = 0 , −1
   
3 0 0 1 1 1
             
 2 1 3   2 1 3 
T 3 = 1  ,  2  , 3  T4 = 1 ,  2  , 3 .
   
1 −1 0 1 −1 1

Exercise 19.4.2 Consider the non-standard basis C = {v1 , v2 } of R2 , where


v1 = (1, 1)^T,    v2 = (2, −4)^T.

Given a general vector v ∈ R2 , where


v = (a, b)^T,

find the coordinates (v)C of v with respect to the basis C.

Exercise 19.4.3 For each of the sets Si given below, find a basis of the vector space
Lin(Si ) and state its dimension. Provide a Cartesian description for any proper subspace
of R2 or R3 (i.e., for any subspace that is not equal to R2 or R3 ):
S1 = { (1, 2)^T, (2, 3)^T },    S2 = { (1, −1)^T, (0, 0)^T, (2, −2)^T, (−3, 3)^T },

S3 = { (1, 0, −1)^T, (2, 1, 3)^T, (1, 2, 9)^T }.

Exercise 19.4.4 Consider the set of functions defined by


V = { f : R → R | f(x) = a + bx for some a, b ∈ R }.

It is given that V is a vector space under the standard operations of pointwise addition
and scalar multiplication; that is, under the operations

(f + g)(x) := f (x) + g(x)

(λf )(x) := λf (x).


Consider the set S = {f1 , f2 } ⊆ V consisting of the vectors

f1 : R → R defined by f1 (x) = 1

and
f2 : R → R defined by f2 (x) = x.
(a) Show that the set S is a basis of V ; that is, show that S is a linearly independent set
which spans V . Also state the dimension of V .
(b) State the coordinates (f1 )S and (f2 )S of the vectors f1 and f2 with respect to the basis
S.
Now consider the set T = {g1 , g2 } ⊆ V consisting of the vectors

g1 : R → R defined by g1 (x) = 1 + 2x

and
g2 : R → R defined by g2 (x) = 3 − x.

(c) Find the coordinates (g1 )S and (g2 )S of the vectors g1 and g2 with respect to the basis
S.
(d) Given that the set T = {g1 , g2 } ⊆ V is also a basis of V , state the coordinates (g1 )T
and (g2 )T of the vectors g1 and g2 with respect to the basis T .

19.5 Relevant sections from the textbooks


• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 6.1, 6.2, 6.3 and 6.4 of our Algebra Textbook are relevant.

20 Vector Spaces, 4 of 4
In this section, we develop the concepts of angle and length, so that these concepts can be
applied to a general vector space V where angles and lengths do not arise in a natural way.
For example, in a vector space of matrices, what is the angle between two matrices? Or, in
a vector space of functions, what is the length of a function? By imposing the mathematical
ideas of inner product and norm as additional structures on a general vector space V , the
elements of V can mimic the geometrical properties of vectors in real coordinate vector
spaces. In this way, a variety of theorems established for real coordinate vector spaces,
including theorems on orthogonality of vectors and theorems on lengths, can be imported
directly into a general vector space V, considerably enriching V and any theory defined on it.

20.1 Inner product
Recall that the scalar product — or dot product — of two vectors u and v in the real
coordinate vector space Rn ,
   
⟨u, v⟩ = ⟨(u1, u2, . . . , un)^T, (v1, v2, . . . , vn)^T⟩ = u1 v1 + u2 v2 + · · · + un vn,

is equal to the product of the lengths of these vectors and the cosine of the angle between
them:
⟨u, v⟩ = ||u|| ||v|| cos(θ).
Below, we prove this result in the case of R2 , and then we use the properties of the scalar
product on R2 to motivate the introduction of what is known as an inner product on a
general vector space V .
Theorem 20.1.1 Let a, b ∈ R2 and let θ denote the angle between them. Define the
scalar product ⟨a, b⟩ of two vectors a, b by

⟨a, b⟩ = ⟨(a1, a2)^T, (b1, b2)^T⟩ = a1 b1 + a2 b2

and the length ||a|| of a vector a by

||a|| = √⟨a, a⟩.

Then, ⟨a, b⟩ = ||a|| ||b|| cos(θ).


Proof Consider the triangle formed by the vectors a, b and b − a, with the angle θ between a and b (the figure is omitted here).

The cosine rule states that c² = a² + b² − 2ab cos(θ), where a = ||a||, b = ||b|| and c = ||b − a||.
That is,
||b − a||² = ||a||² + ||b||² − 2||a|| ||b|| cos(θ).
By expanding the scalar product on the left hand side and using properties which follow
from the definition of this product, we obtain

||b − a||² = ⟨b − a, b − a⟩ = ⟨b, b⟩ + ⟨a, a⟩ − 2⟨a, b⟩,

which implies that


||b − a||² = ||a||² + ||b||² − 2⟨a, b⟩.

By comparing this equation with the cosine rule

||b − a||² = ||a||² + ||b||² − 2||a|| ||b|| cos(θ),

we deduce that
⟨a, b⟩ = ||a|| ||b|| cos(θ),
which is the required result. □

As explained previously, we would like to develop the concepts of angle and length in such
a way so that they become applicable to a general vector space V . To this end, we observe
that the scalar product ⟨ , ⟩ defined on R2 can be regarded as a function from the set of
pairs of vectors in R2 to the real numbers, i.e.,

⟨ , ⟩ : {(u, v) | u, v ∈ R2} → R,

which satisfies the following three properties:

(1) The function ⟨ , ⟩ is bilinear, meaning that

⟨αa + βb, c⟩ = α⟨a, c⟩ + β⟨b, c⟩,

⟨a, βb + γc⟩ = β⟨a, b⟩ + γ⟨a, c⟩.


The above equations can be verified using the definition of the inner product on R2 :
⟨a, b⟩ = ⟨(a1, a2)^T, (b1, b2)^T⟩ = a1 b1 + a2 b2.

Bilinearity reflects the idea that the scalar product is compatible with the linear
structure of the vector space. For example, using bilinearity, we have

⟨0, 0⟩ = ⟨u − u, v − v⟩ = ⟨u, v⟩ − ⟨u, v⟩ − ⟨u, v⟩ + ⟨u, v⟩ = 0,

as expected for the square length ⟨0, 0⟩ of the zero vector 0.


(2) The function ⟨ , ⟩ is symmetric, i.e.,

⟨a, b⟩ = ⟨b, a⟩.

Symmetry reflects the idea that the angle between a and b is equal to the angle
between b and a.
(3) The function ⟨ , ⟩ is positive, in the sense that

⟨a, a⟩ ≥ 0, with equality ⟨a, a⟩ = 0 only if a = 0.

Positivity reflects the idea that the square length ⟨a, a⟩ of any vector a is non-negative
and that the only vector whose length is equal to 0 is the zero vector 0.

These three properties motivate the following definition of a generalised scalar product on
any real vector space V , referred to as an inner product.

Let V be a real vector space. An inner product on V is a function ⟨ , ⟩ from the set
{(x, y) | x, y ∈ V} of pairs of vectors in V to the set R which satisfies the following properties:

(1) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all x, y, z ∈ V and all α, β ∈ R.

(2) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V.

(3) ⟨x, x⟩ ≥ 0 for all x ∈ V, and ⟨x, x⟩ = 0 if and only if x = 0, the zero vector of the
vector space.

Remark 20.1.2 Property (1) is known as linearity on the left. Property (1) and the
symmetry property (2) imply linearity on the right and hence bilinearity.

A vector space equipped with an inner product is known as an inner product space.

20.2 Norm
Let V be an inner product space and let x be an element of V. Then the inner product ⟨ , ⟩
defines the norm ||x||, or length, of x by

||x|| = √⟨x, x⟩.

Note that, by the positivity of the inner product, only the zero vector 0 in V has zero
length. Also note that, given any non-zero vector v in an inner product space V, it is
a simple matter to create a unit vector u parallel to v. This is the vector

u = (1 / ||v||) v.
The process of constructing u from v is known as normalising v.

20.3 The Cauchy-Schwarz inequality


The following theorem is known as the Cauchy-Schwarz inequality:

Theorem 20.3.1 Let V be an inner product space. Then, for all x, y ∈ V ,

|⟨x, y⟩| ≤ ||x|| ||y||,

where the symbol on the left hand side is the absolute value of ⟨x, y⟩.

Proof Let x, y be any two vectors of V . For any real number α, we consider the vector
αx + y. Certainly, ||αx + y||² ≥ 0 for all α. But

||αx + y||² = ⟨αx + y, αx + y⟩
            = α²⟨x, x⟩ + α⟨x, y⟩ + α⟨y, x⟩ + ⟨y, y⟩
            = α²||x||² + 2α⟨x, y⟩ + ||y||².

Now, this is a quadratic expression in α. Generally, if a quadratic expression aα² + bα + c
is non-negative for all α (as is the case here), then its graph cannot go below the horizontal
Cartesian axis. This implies that the discriminant satisfies b² − 4ac ≤ 0. Applying this
observation to the above quadratic expression in α, we see that we must have

(2⟨x, y⟩)² − 4||x||² ||y||² ≤ 0,

or
(⟨x, y⟩)² ≤ ||x||² ||y||².
Taking the square root of each side, we obtain

|⟨x, y⟩| ≤ ||x|| ||y||,

which is the required result. □

20.4 Angle and orthogonality


We are now ready to extend the concept of angle to a general inner product space V . To
do this, we begin with the result that in R2, ⟨x, y⟩ = ||x|| ||y|| cos(θ), where θ is the angle
between the vectors x, y. This suggests that we might, more generally (in a general inner
product space) define the cosine of the angle between vectors x and y to be

cos(θ) = ⟨x, y⟩ / (||x|| ||y||).

This definition will only make sense if we can show that this number cos(θ) is between −1
and 1. But this follows immediately from the Cauchy-Schwarz inequality, which can be
stated as

|⟨x, y⟩| / (||x|| ||y||) ≤ 1.
The usefulness of this definition is in the concept of orthogonality.
Suppose that V is an inner product space. Then x, y ∈ V are said to be orthogonal if
and only if ⟨x, y⟩ = 0. We write x ⊥ y to mean that x, y are orthogonal.
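In computations with the standard inner product on Rn, the angle and the orthogonality test each take one line. The Python sketch below (illustrative only) is one way to realise them; the clip call guards against rounding pushing the cosine just outside [−1, 1].

import numpy as np

def angle(x, y):
    cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

x = np.array([1.0, 2.0])
y = np.array([2.0, -1.0])
print(x @ y)         # 0.0: <x, y> = 0, so x ⊥ y
print(angle(x, y))   # 1.5707963... = pi/2, a right angle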

20.5 Pythagoras’ theorem and the triangle inequality


We now have the ingredients to imitate the geometry of vectors of real coordinate vector
spaces. We are already familiar with Pythagoras’ theorem in R2 , which states that if c is
the length of the hypotenuse of a right-angled triangle, and a and b the lengths of the other
two sides, then c2 = a2 + b2 . The generalised version of Pythagoras’ theorem, applicable
to any real inner product space, is described by the following theorem.

Theorem 20.5.1 In an inner product space V , if x, y ∈ V are orthogonal, then

||x + y||² = ||x||² + ||y||².

Proof We know that for any z, ||z||² = ⟨z, z⟩, simply from the definition of the norm.
So,

||x + y||² = ⟨x + y, x + y⟩
           = ⟨x, x + y⟩ + ⟨y, x + y⟩
           = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
           = ||x||² + 2⟨x, y⟩ + ||y||²
           = ||x||² + ||y||²,

where the last line follows from the fact that, x, y being orthogonal, ⟨x, y⟩ = 0. □

Moreover, we also have the triangle inequality for norms. In the special case of the
standard inner product on R2 , this states the obvious fact that the length of one side of a
triangle must be no more than the sum of the lengths of the other two sides.

Theorem 20.5.2 In an inner product space V , if x, y ∈ V , then

||x + y|| ≤ ||x|| + ||y||.

Proof We have
||x + y||² = ⟨x + y, x + y⟩
           = ⟨x, x + y⟩ + ⟨y, x + y⟩
           = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
           = ||x||² + 2⟨x, y⟩ + ||y||²
           ≤ ||x||² + ||y||² + 2|⟨x, y⟩|
           ≤ ||x||² + ||y||² + 2||x|| ||y||
           = (||x|| + ||y||)²,

where the last inequality used is the Cauchy-Schwarz inequality. Thus, taking square roots,
we have that ||x + y|| ≤ ||x|| + ||y||, as required. □
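Both theorems are easy to illustrate numerically. The Python sketch below (an illustration, not a proof) checks the triangle inequality for random vectors, then checks Pythagoras' theorem after replacing y by its component orthogonal to x.

import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), rng.normal(size=3)

# Triangle inequality holds for any x, y.
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12

# Pythagoras needs orthogonality: project away the component of y along x.
y_perp = y - (x @ y) / (x @ x) * x
assert abs(x @ y_perp) < 1e-12
assert np.isclose(np.linalg.norm(x + y_perp)**2,
                  np.linalg.norm(x)**2 + np.linalg.norm(y_perp)**2)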

An important property of a set {u1 , . . . , uk } of orthogonal vectors in an inner product space


V is that they are linearly independent. This means that if this set of vectors spans V ,
then it is a basis for V . The linear independence is guaranteed by the following theorem.

Theorem 20.5.3 Suppose that V is an inner product space and that vectors v1 , v2 , . . . , vk
∈ V are pairwise orthogonal (meaning vi ⊥ vj for i ≠ j), and none is the zero vector.
Then {v1 , v2 , . . . , vk } is a linearly independent set of vectors.

Proof We need to show that if

α1 v1 + α2 v2 + · · · + αk vk = 0,

then α1 = α2 = · · · = αk = 0. Let i be any integer between 1 and k. Then taking the


inner product with vi ,

⟨vi, α1 v1 + · · · + αk vk⟩ = ⟨vi, 0⟩ = 0.

But
⟨vi, α1 v1 + · · · + αk vk⟩ = α1⟨vi, v1⟩ + · · · + αi⟨vi, vi⟩ + · · · + αk⟨vi, vk⟩.
Since ⟨vi, vj⟩ = 0 for i ≠ j, the right hand side is equal to αi⟨vi, vi⟩, which is equal to
αi ||vi||². So the equation

⟨vi, α1 v1 + · · · + αk vk⟩ = 0

reduces to

αi ||vi||² = 0.
Moreover, since vi ≠ 0, ||vi||² ≠ 0 and hence αi = 0. But i was any integer in the range 1
to k, so we deduce that
α1 = α2 = · · · = αk = 0,
as required. □

20.6 Orthonormality and the Gram-Schmidt process
A set of vectors {x1 , x2 , . . . , xk } in an inner product space V is said to be an
orthonormal set if any two different vectors in the set are orthogonal and each vector is
a unit vector; that is

⟨xi, xj⟩ = 0 for i ≠ j and ||xi|| = 1 for each i.

Given a set of linearly independent vectors {v1 , v2 , . . . , vk }, the Gram-Schmidt


orthonormalisation process is a way of producing k vectors that span the same space
as {v1 , v2 , . . . , vk }, and that form an orthonormal set. That is, the process produces a set
{u1 , u2 , . . . , uk } such that:

• Lin{u1 , u2 , . . . , uk } = Lin{v1 , v2 , . . . , vk }

• {u1 , u2 , . . . , uk } is an orthonormal set.

It works as follows. First, we set


u1 = v1 / ||v1||,
so that u1 is a unit vector and Lin{u1 } = Lin {v1 }.
Then we define
w2 = v2 − ⟨v2, u1⟩ u1,
and set
u2 = w2 / ||w2||.
Then {u1 , u2 } is an orthonormal set and Lin {u1 , u2 } = Lin {v1 , v2 }.
Next, we define
w3 = v3 − ⟨v3, u1⟩ u1 − ⟨v3, u2⟩ u2
and set
u3 = w3 / ||w3||.
Then {u1 , u2 , u3 } is an orthonormal set and Lin {u1 , u2 , u3 } = Lin {v1 , v2 , v3 }. The process
continues in an identical fashion until we exhaust all the given vectors {v1 , v2 , . . . vk }.
Using the inner product, you can verify that w2, and hence u2, is orthogonal to u1; that
w3, and hence u3, is orthogonal to both u1 and u2; and so on. An example of applying the
Gram-Schmidt process to create an orthonormal set of vectors can be found in Exercise 20.7.2
below; a short computational sketch is also given here.
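The Python sketch below (illustrative only, written for the standard inner product on Rn) implements the process exactly as described above: at each step it subtracts from the incoming vector its components along the vectors already produced, then normalises.

import numpy as np

def gram_schmidt(vectors):
    ortho = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in ortho:
            w -= (v @ u) * u                # remove the component of v along u
        ortho.append(w / np.linalg.norm(w)) # normalise; w != 0 by independence
    return ortho

us = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([2.0, 0.0, 2.0])])
print(round(us[0] @ us[1], 12))                      # 0.0: orthogonal
print(np.linalg.norm(us[0]), np.linalg.norm(us[1]))  # 1.0 1.0: unit vectors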

20.7 Exercises for self study


Exercise 20.7.1 An inner product ⟨ , ⟩ is defined on the real coordinate vector space R2
as follows:

⟨u, v⟩ = u^T [1 −2; −2 5] v.

(a) Show that bilinearity, symmetry and positivity are satisfied by ⟨ , ⟩.

(b) Find a vector y orthogonal to the vector x = (1, 1)^T with respect to the inner product
given above.
   
Exercise 20.7.2 Consider the vectors v1 = (1, 2, 0)^T and v2 = (1, 0, 1)^T.

(a) Show that the set {v1 , v2 } is a linearly independent set.


(b) Find a Cartesian equation for the subspace Lin{v1 , v2 } of R3 and state the dimension
of Lin{v1 , v2 }.
(c) Construct an orthonormal basis {u1 , u2 } for Lin{v1 , v2 } by applying the Gram-Schmidt
procedure on the set {v1 , v2 }, starting with v1 .
(d) Extend the basis {u1, u2} to an orthonormal basis {u1, u2, u3} for R3; that is, find an
orthonormal basis for R3 whose first two vectors span Lin{v1 , v2 } and whose third vector
u3 is orthogonal to Lin{v1 , v2 }.

Exercise 20.7.3 Consider the set C[a, b] of all continuous functions defined on an interval
[a, b]. It can be shown that C[a, b] is a vector space under the standard operations of
pointwise addition and scalar multiplication of functions; that is,
(f + g)(x) := f (x) + g(x),
(λf )(x) := λf (x).
Show that the function ⟨ , ⟩ defined by

⟨f, g⟩ := ∫_a^b f(x) g(x) dx

is an inner product on the vector space C[a, b]; that is, show that ⟨ , ⟩ satisfies linearity on
the left, symmetry and positivity. You may use the fact that the product of two continuous
functions on [a, b] is also continuous and you may also use the fact that if a non-negative
continuous function h on [a, b] satisfies
∫_a^b h(x) dx = 0,
then h(x) = 0 for all x ∈ [a, b].
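For concreteness, this inner product can be evaluated by numerical quadrature. The Python sketch below (illustrative only; it uses scipy.integrate.quad, which is an assumption of this sketch rather than part of the course material) computes ⟨f, g⟩ for two simple functions on a symmetric interval.

from scipy.integrate import quad

def inner(f, g, a, b):
    value, _ = quad(lambda x: f(x) * g(x), a, b)  # approximate the integral of f*g
    return value

f = lambda x: 1.0        # the constant function 1
g = lambda x: x          # the identity function x
print(inner(f, g, -2.0, 2.0))   # 0.0: an odd integrand over a symmetric interval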

Exercise 20.7.4 Building on Exercise 20.7.3, consider the vector space L[−2, 2] of all
functions of the form a + bx defined on the interval [−2, 2]. It can be shown that the
functions
f1 : [−2, 2] → R defined by f1 (x) = 1
and
f2 : [−2, 2] → R defined by f2 (x) = x
form a basis B = {f1 , f2 } for L[−2, 2]. Apply the Gram-Schmidt procedure to find an
orthonormal basis C = {g1 , g2 } for L[−2, 2] with respect to the inner product given in
Exercise 20.7.3; that is,

⟨f, g⟩ := ∫_{−2}^{2} f(x) g(x) dx.

20.8 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge Univer-
sity Press.
Sections 10.1, 10.2 and 10.4 of our Algebra Textbook are relevant.

   
Written by:
Dr Ioannis Kouletsis
