IFP - Mathematics and Statistics PDF
Mathematics and Statistics
FP0001
This guide was prepared for the University of London by:
This is one of a series of subject guides published by the University. We regret that due to pressure
of work the authors are unable to enter into any correspondence relating to, or arising from, the
guide. If you have any comments on this subject guide, please use the online form found on the
virtual learning environment.
University of London
Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
london.ac.uk
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form, or
by any means, without permission in writing from the publisher. We make every effort to respect
copyright. If you think we have inadvertently used your copyright material, please let us know.
Contents
Introduction
Route map to the guide
Time management
Recommendations for working through the units
Overview of learning resources
The subject guide and textbooks
Online study resources
Virtual Learning Environment (VLE)
Making use of the Online Library
Examination advice
Calculators
Part 1 Mathematics
Introduction to Mathematics
Syllabus
Aims of the course
Learning outcomes for the course (Mathematics)
Textbook
4 Functions
4.1 Functions
4.1.1 What is a function?
4.1.2 Some common functions
5 Calculus I — Differentiation
5.1 The gradient of a curve at a point
5.1.1 Tangents to a parabola
5.1.2 Chords of a parabola
5.1.3 Tangents to other curves
5.2 What is differentiation?
5.2.1 Standard derivatives
5.2.2 Two rules of differentiation
5.2.3 Some general points on what we have seen so far
Learning outcomes
Exercises
Introduction
Welcome to the world of Mathematics and Statistics! These are disciplines which are
widely applied in areas such as finance, business, management, economics and other
fields in the social sciences. The following units will provide you with the opportunity to
grasp the fundamentals of these subjects and will equip you with some of the vital
quantitative skills and powers of analysis which are highly sought-after by employers in
many sectors.
As Mathematics and Statistics have so many applications, it should not be surprising
that they form the compulsory component of the International Foundation Programme.
The analytical skills which you will develop on this course will therefore serve you well
in both your future studies and beyond in the real world of work. The material in this
course is necessary as preparation for other courses you may study later on as part of a
degree programme or diploma; indeed, in many cases a course in Mathematics or
Statistics is a compulsory component on University of London degrees.
This subject guide provides you with a framework for covering the syllabus of the
Mathematics and Statistics course in the International Foundation Programme and
directs you to additional resources such as readings and the virtual learning
environment (VLE).
The following 20 units will introduce you to these disciplines and equip you with the
necessary quantitative skills to assist you in further programmes of study. Given the
cumulative nature of Mathematics and Statistics, the units are not a series of
self-contained topics; rather, they build on each other sequentially. As such, you are
strongly advised to follow the subject guide in unit order. There is little point in rushing
past material which you have only partially understood in order to reach the final unit.
Once you have completed your work on all of the units, you will be ready for
examination revision. A good place to start is the sample examination paper which you
will find at the end of the subject guide.
Time management
About one-third of your private study time should be spent reading and the other
two-thirds doing problems. (Note the emphasis on practising problems!)
To help your time management, each unit of this course should take a week to study
and so you should be spending 10 weeks on Mathematics and 10 weeks on Statistics.
course’, unlike textbooks which may cover additional material which will not be
examinable or may not cover some material that is! Therefore the subject guide should
act as your principal resource.
However, a textbook may give an alternative explanation of a topic (which is useful if
you have difficulty following something in the subject guide) and so you may want to
consult one for further clarification. Additionally, a textbook will contain further
examples and exercises which can be used to check and consolidate your understanding.
For this course, a useful starting point is
Swift, L., and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558].
as this will serve as useful background reading. However, many books are available covering
the material frequently found in mathematics and statistics courses like this one and so,
if you need a textbook for background reading, you should find one that is appropriate
to your level and tastes.
Past examination papers and Examiners’ commentaries are available for download and
these provide advice on how each examination question might best be answered.
Self-testing activities allow you to test your knowledge and recall of the academic
content of various courses. Finally, a section of the VLE has been dedicated to
providing you with expert advice on practical study skills such as preparing for
examinations and developing digital literacy skills.
1. removing any punctuation from the title, such as single quotation marks, question
marks and colons, and/or
2. putting quotation marks around the title, for example “Why the banking system
should be regulated”.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login: http://tinyurl.com/ollathens
Examination advice
Important: the information and advice given in the following section are based on the
examination structure used at the time this subject guide was written. Please note that
subject guides may be used for several years. Because of this, we strongly advise you to
check both the current Regulations for relevant information about the examination,
and the current Examiners’ commentaries, where you should be advised of any
forthcoming changes. You should also carefully check the rubric/instructions on the
paper you actually sit and follow those instructions.
The examination is by a two-hour, unseen, written paper. No books may be taken into
the examination, but you will be provided with extracts of statistical tables (as
reproduced in this subject guide). A calculator may be used when answering questions
on this paper (see below), but it must comply in all respects with the specification given
in the General Regulations.
The examination comprises two sections, each containing three compulsory questions.
Section A covers the mathematics part of the course and counts for 50% of the marks, and
Section B covers the statistics part of the course and counts for the remaining 50%.
You are required to pass both Sections A and B to pass the examination.
In each section, the first question contains four short questions worth 5 marks each,
followed by two longer questions worth 15 marks each. Since the examination will seek
to assess a broad cross-section of the syllabus, we strongly advise you to study the
whole syllabus. A sample examination paper is provided at the end of this subject guide
along with a commentary providing extensive advice on how to answer each question.
Remember, it is important to check the VLE for:
Calculators
You will need to provide yourself with a basic calculator. It should not be
programmable, because such machines are not allowed in the examination by the
University. The most important thing is that you should accustom yourself to using
your chosen calculator and feel comfortable with it. Your calculator must comply in all
respects with the specification given in the General Regulations.
Part 1
Mathematics
Introduction to Mathematics
Syllabus
This half of the course introduces some of the basic ideas and methods of Mathematics
with an emphasis on their application. The Mathematics part of this course has the
following syllabus.
Calculus: The meaning of the derivative and how to find it (including the
product, quotient and chain rules). Using derivatives to find approximations and
solve simple optimisation problems with economic applications. Curve sketching.
Integration of simple functions and using integrals to find areas.
Textbook
As previously mentioned in the main introduction, this subject guide has been designed
to act as your principal resource. The textbook
Swift, L., and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558].
may be useful as ‘background reading’ but it is not essential. However, you might
benefit from reading parts of it if you find any of the material difficult to follow at first.
1. Review I — A review of some basic mathematics
Unit 1: Review I
A review of some basic mathematics
Overview
In this unit we revise some material on arithmetic and algebra which you should have
encountered before. Starting with arithmetic, this will involve revising the basic
mathematical operations and how they can be combined with and without the use of
brackets, how we can manipulate fractions and the use of powers. We then look at some
basic algebra and see how to use and manipulate algebraic expressions.
Aims
To revise the basics of arithmetic, including the use of fractions and powers.
1.1 Arithmetic
In this section we revise some material which could be called ‘arithmetic’. The idea
behind this revision is to refresh our memories about how things like brackets, fractions
and powers work so that our revision of ‘algebra’ in the next section will, hopefully, be
easier.
confused with a handwritten ‘x’ whereas, for division, the reason is that writing
expressions that involve division (i.e. ‘÷’) as fractions enables us to manipulate them
more easily using the laws of fractions.
Combinations of operations
Often, different mathematical operations will occur in the same expression. For
example, we might be asked to work out the values of the expressions
1. 22 − 7 + 12 − 26 + 1,
2. 125 ÷ 25 × 2 × 3 ÷ 15,
3. 22 − 20 × 3 ÷ 4 − 5.
In such cases, we have the following rules.
1. If only addition and subtraction are involved: We work from left to right to get
$22 - 7 + 12 - 26 + 1 = 15 + 12 - 26 + 1 = 27 - 26 + 1 = 1 + 1 = 2.$
2. If only multiplication and division are involved: We work from left to right to get
$125 \div 25 \times 2 \times 3 \div 15 = 5 \times 2 \times 3 \div 15 = 10 \times 3 \div 15 = 30 \div 15 = 2.$
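If you have access to Python 3, you can check the three expressions listed above. Python happens to follow the same convention: multiplication and division are done before addition and subtraction, and operations of equal rank are worked from left to right. This is an optional sketch, not part of the syllabus, and the variable names are our own. Note that Python's `/` always produces a decimal (floating-point) result, hence the `2.0`.

```python
# Python works + and - (and likewise * and /) from left to right,
# and does * and / before + and -, matching the rules above.
first = 22 - 7 + 12 - 26 + 1      # ((((22 - 7) + 12) - 26) + 1)
second = 125 / 25 * 2 * 3 / 15    # ((((125 / 25) * 2) * 3) / 15)
third = 22 - 20 * 3 / 4 - 5       # 22 - ((20 * 3) / 4) - 5

print(first, second, third)       # 2 2.0 2.0
```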
If an expression involves brackets, then the operations within the brackets must be
performed first. As such, brackets can be used to change the order in which operations
are performed. For example, we might be asked to work out the values of the expressions
1. 9 − (4 + 3) as opposed to 9 − 4 + 3,
2. 6 ÷ (2 × 3) as opposed to 6 ÷ 2 × 3,
3. (12 × 3 − 8) × 2 as opposed to 12 × 3 − 8 × 2.
In such cases, we work out the expression in brackets first. That is:
1. working out the expression in brackets first we get
$9 - (4 + 3) = 9 - 7 = 2,$
as opposed to
$9 - 4 + 3 = 5 + 3 = 8,$
where we work from left to right.
2. working out the expression in brackets first we get
$6 \div (2 \times 3) = 6 \div 6 = 1,$
as opposed to
$6 \div 2 \times 3 = 3 \times 3 = 9,$
where we work from left to right.
3. working out the expression in brackets first, and proceeding according to the rules
above as necessary, we have
$(12 \times 3 - 8) \times 2 = (36 - 8) \times 2 = 28 \times 2 = 56,$
as opposed to
$12 \times 3 - 8 \times 2 = 36 - 16 = 20,$
where we multiply first and then subtract.
What if we have two or more sets of brackets? Well, if they are not ‘nested’, for example
if we have
(12 × 3 − 8) × (24 − 14),
then we need to work out what is in each of the brackets first, proceeding according to
the rules above, i.e.
$(12 \times 3 - 8) \times (24 - 14) = (36 - 8) \times 10 = 28 \times 10 = 280.$
And, if the brackets are ‘nested’, for example
6 + (9 − (4 + 3)),
then we start with the innermost set of brackets and work ‘outwards’, i.e.
$6 + (9 - (4 + 3)) = 6 + (9 - 7) = 6 + 2 = 8.$
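The bracket examples above can also be checked in Python 3, which likewise evaluates whatever is in brackets first and works nested brackets from the innermost set outwards. This is a minimal illustration with variable names of our own choosing; as before, `/` gives a decimal result such as `1.0`.

```python
# Each pair: the expression with brackets, then the same numbers without them.
a_with, a_without = 9 - (4 + 3), 9 - 4 + 3              # 2 and 8
b_with, b_without = 6 / (2 * 3), 6 / 2 * 3              # 1.0 and 9.0
c_with, c_without = (12 * 3 - 8) * 2, 12 * 3 - 8 * 2    # 56 and 20

# Two separate (not nested) sets of brackets are each worked out first.
not_nested = (12 * 3 - 8) * (24 - 14)                   # 28 * 10 = 280
# Nested brackets are worked out from the innermost set outwards.
nested = 6 + (9 - (4 + 3))                              # 6 + (9 - 7) = 8

print(a_with, a_without, b_with, b_without, c_with, c_without, not_nested, nested)
```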
These rules allow you to work out the values of simple mathematical expressions using
brackets. In a moment we shall see another way of dealing with brackets which will be
more useful to us in this course.
Negative numbers
1. 6 − 3 = +3,
2. 6 − 6 = 0,
3. 6 − 9 = −3.
In this case, we can see that subtracting larger and larger numbers from six gives us a
positive answer, zero and a negative answer respectively. For simplicity, we usually omit
the ‘+’ sign and write ‘+3’, say, as 3.
When we have expressions involving negative numbers, we have the following handy
rules.
1. Adding a negative number: This has the same effect as subtracting the
corresponding positive number, e.g.
5 + (−3) = 5 − (+3) = 5 − 3 = 2,
and
−5 + (−3) = −5 − (+3) = −5 − 3 = −8.
2. Subtracting a negative number: This has the same effect as adding the
corresponding positive number, e.g.
5 − (−3) = 5 + (+3) = 5 + 3 = 8,
and
−5 − (−3) = −5 + (+3) = −5 + 3 = −2.
5. Dividing a positive number by a negative number (or vice versa): This gives us a
negative number, e.g.
(+6) ÷ (−3) = −(6 ÷ 3) = −2,
and
(−6) ÷ (+3) = −(6 ÷ 3) = −2.
This is normally remembered as ‘positive divided by negative is negative’ (or vice
versa).
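The sign rules above can be checked in Python 3 using the worked examples from the text. This is a small illustrative sketch; the variable names are hypothetical.

```python
# Adding a negative number: same as subtracting the corresponding positive number.
add_neg = (5 + (-3), -5 + (-3))      # (2, -8)
# Subtracting a negative number: same as adding the corresponding positive number.
sub_neg = (5 - (-3), -5 - (-3))      # (8, -2)
# Positive divided by negative (or vice versa) is negative.
div_mixed = (6 / (-3), (-6) / 3)     # (-2.0, -2.0)

print(add_neg, sub_neg, div_mixed)
```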
Brackets II: Removing brackets from expressions
A more useful way of thinking about brackets involves being able to ‘remove’ the
brackets from an expression. For example, consider the expression
3 + 2 × (9 − 4).
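Working out the bracket first, we get
$3 + 2 \times (9 - 4) = 3 + 2 \times 5 = 3 + 10 = 13.$
Alternatively, we can remove the brackets by multiplying each number inside them by 2, i.e.
$2 \times (9 - 4) = (2 \times 9) - (2 \times 4),$
so that
$3 + 2 \times (9 - 4) = 3 + ((2 \times 9) - (2 \times 4)) = 3 + (18 - 8) = 3 + 10 = 13,$
which is the same answer.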
Activity 1.1 Show that if we worked out 3 + (9 − 4) × 2, we would also get 13.
What if we had to work out 3 − (9 − 4)? We adopt the convention that a minus sign in
front of a bracket is the same as adding something that has been multiplied by −1.
Using this, and what we saw above, gives us
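$3 - (9 - 4) = 3 + (-1) \times (9 - 4) = 3 + ((-1) \times 9 - (-1) \times 4) = 3 + (-9 + 4) = 3 - 9 + 4 = -2.$
Of course, this is what we should expect as $3 - (9 - 4) = 3 - 5 = -2$.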
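Both ways of handling brackets described above give the same values, which can be confirmed in Python 3. This is a minimal sketch with names of our own choosing; `(-1) * (-4)` stands for multiplying the second number in the bracket by −1.

```python
# Working out the bracket first...
bracket_first = 3 + 2 * (9 - 4)                  # 3 + 2 * 5 = 13
# ...or removing the brackets by multiplying each number inside by 2.
brackets_removed = 3 + (2 * 9 - 2 * 4)           # 3 + (18 - 8) = 13

# A minus sign in front of a bracket multiplies each number inside by -1.
minus_bracket = 3 - (9 - 4)                      # 3 - 5 = -2
minus_removed = 3 + ((-1) * 9 + (-1) * (-4))     # 3 - 9 + 4 = -2

print(bracket_first, brackets_removed, minus_bracket, minus_removed)   # 13 13 -2 -2
```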
Absolute values
The magnitude (or absolute value) of a number is found by ignoring the minus sign
(if there is one). For example, the magnitude of 6, written |6|, is 6 and the magnitude of
−6, written |−6|, is also 6, i.e. we have
|6| = 6 and |−6| = 6.
In a way, the magnitude operation acts like a bracket as we need to evaluate the
magnitude of the number inside it before we use it in calculations, e.g.
4 − |2 − 3| = 4 − 1 = 3 as |2 − 3| = |−1| = 1, and
|4 − 2| − 3 = 2 − 3 = −1 as |4 − 2| = |2| = 2.
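In Python 3 the magnitude of a number is given by the built-in `abs()` function. As with the bars above, the expression inside the brackets of `abs(...)` is evaluated first. The variable names in this sketch are hypothetical.

```python
print(abs(6), abs(-6))            # 6 6

# The expression inside abs(...) is worked out before it is used.
abs_first = 4 - abs(2 - 3)        # 4 - |-1| = 4 - 1 = 3
abs_second = abs(4 - 2) - 3       # |2| - 3 = 2 - 3 = -1
print(abs_first, abs_second)      # 3 -1
```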
Inequalities
We use the symbols ‘<’ and ‘>’ to show that one number is ‘less than’ or ‘greater than’
another number respectively. So, for example, 2 < 3 and 5 > 1. Zero is less than any
positive number and greater than any negative number, e.g. 0 < 5 and 0 > −5. As such,
any negative number is less than any positive number, e.g. −3 < 2. Negative numbers
are larger when they have smaller magnitudes (i.e. when they are closer to zero), e.g.
−3 < −2 and −1 > −5. As such, we can say that smaller negative numbers (like −100
compared to −1) have larger absolute values (like 100 compared to 1).
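Python 3 comparisons use the same `<` and `>` symbols, so the statements above can be checked directly. In this small sketch, every comparison should print `True`; the dictionary name is our own.

```python
comparisons = {
    "2 < 3": 2 < 3,
    "0 > -5": 0 > -5,
    "-3 < 2": -3 < 2,
    "-3 < -2": -3 < -2,                    # closer to zero means larger
    "-1 > -5": -1 > -5,
    "|-100| > |-1|": abs(-100) > abs(-1),  # smaller number, larger magnitude
}
for text, result in comparisons.items():
    print(text, result)
```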
1.1.2 Fractions
A fraction such as $\frac{3}{2}$ is, using our ‘horizontal line’ notation for division, the same as
dividing the number above the line (i.e. 3) by the number below the line (i.e. 2). We call
the number above the line the numerator and the number below the line the
denominator. If we have two fractions, say
$\frac{3}{5}$ and $\frac{4}{2}$,
the number we get by multiplying their denominators together is called a common
denominator of these fractions, and this will be 5 × 2 = 10 in this case.
Manipulating fractions
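To simplify a fraction we want to write it in lowest terms, e.g. $\frac{6}{10}$ can be written as
$\frac{6}{10} = \frac{2 \times 3}{2 \times 5} = \frac{3}{5},$
by cancelling the common factor of 2. Conversely, multiplying the numerator and the
denominator by the same number does not change the value of a fraction, e.g.
$\frac{3}{5} = \frac{2 \times 3}{2 \times 5} = \frac{6}{10}.$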
Adding and subtracting fractions
To add or subtract fractions, we first put them over a common denominator, e.g.
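$\frac{4}{5} + \frac{2}{3} = \frac{4 \times 3}{5 \times 3} + \frac{2 \times 5}{3 \times 5} = \frac{12}{15} + \frac{10}{15} = \frac{12 + 10}{15} = \frac{22}{15},$
and
$\frac{4}{5} - \frac{2}{3} = \frac{4 \times 3}{5 \times 3} - \frac{2 \times 5}{3 \times 5} = \frac{12}{15} - \frac{10}{15} = \frac{12 - 10}{15} = \frac{2}{15}.$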
Multiplying fractions
To multiply fractions, we just multiply the numerators and denominators together, e.g.
$\frac{4}{5} \times \frac{2}{3} = \frac{4 \times 2}{5 \times 3} = \frac{8}{15}.$
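Python 3's standard `fractions` module works with fractions exactly, so it can be used to check the addition, subtraction and multiplication examples above. This is a minimal sketch; the variable names are our own.

```python
from fractions import Fraction

a, b = Fraction(4, 5), Fraction(2, 3)
total, difference, product = a + b, a - b, a * b

print(total, difference, product)   # 22/15 2/15 8/15
```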
Reciprocals
The reciprocal of a fraction is what we get when we swap the numerator and
denominator around, e.g. the reciprocal of $\frac{3}{5}$ is $\frac{5}{3}$. The reciprocal is useful when we come
to divide fractions as we shall now see.
Dividing fractions
To divide fractions, we multiply the first fraction by the reciprocal of the second, e.g. if
we want to evaluate
$\frac{4}{5} \div \frac{2}{3},$
the rule tells us that this is the same as multiplying $\frac{4}{5}$ by the reciprocal of $\frac{2}{3}$, which is $\frac{3}{2}$,
and so we have
$\frac{4}{5} \div \frac{2}{3} = \frac{4}{5} \times \frac{3}{2}.$
This can now be worked out using the multiplication rule, i.e.
$\frac{4}{5} \div \frac{2}{3} = \frac{4}{5} \times \frac{3}{2} = \frac{4 \times 3}{5 \times 2} = \frac{12}{10}.$
Of course, we can simplify this by noting that the numerator and denominator have a
common factor of 2, i.e. the answer is $\frac{6}{5}$ in lowest terms.
It is, perhaps, also interesting to note that the reciprocal of a fraction is just one
divided by that fraction, e.g.
$1 \div \frac{3}{2} = 1 \times \frac{2}{3} = \frac{2}{3},$
which is the reciprocal of $\frac{3}{2}$.
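The division rule can be checked with Python 3's `fractions` module. Multiplying by the reciprocal gives the same result as dividing directly, and `Fraction` reduces $\frac{12}{10}$ to $\frac{6}{5}$ on its own. The names below are illustrative.

```python
from fractions import Fraction

a, b = Fraction(4, 5), Fraction(2, 3)
reciprocal_b = 1 / b                 # 3/2: numerator and denominator swapped
quotient = a * reciprocal_b          # 12/10, which Fraction reduces to 6/5

print(quotient, quotient == a / b)   # 6/5 True
print(1 / Fraction(3, 2))            # 2/3, the reciprocal of 3/2
```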
Improper and proper fractions
An improper fraction is one where the numerator is greater in magnitude than the
denominator and a proper fraction is one where the numerator is less in magnitude
than the denominator, e.g. $\frac{22}{5}$ is an improper fraction and $\frac{4}{5}$ is a proper fraction.
Sometimes it is convenient to be able to write an improper fraction as a whole number
plus a proper fraction, e.g. we can write
$\frac{22}{5} = \frac{20 + 2}{5} = \frac{20}{5} + \frac{2}{5} = 4 + \frac{2}{5},$
as 5 goes into 20 four times. This can be written as $4\frac{2}{5}$ and we read it as ‘four and two
fifths’ to indicate that $\frac{22}{5}$ is the same as four ‘wholes’ and two fifths of a ‘whole’.
However, in this course, we will usually not use this way of writing fractions as, using
our convention of writing $4 \times \frac{2}{5}$ as $4 \cdot \frac{2}{5}$, we can easily get confused between ‘four and
two fifths’ and ‘four times two fifths’. As such, when the need arises, we will normally
stick to improper fractions.
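Splitting an improper fraction into wholes and a remainder is exactly what Python 3's built-in `divmod` returns: how many times the denominator goes into the numerator, and what is left over. A minimal sketch for $\frac{22}{5}$ follows; the variable names are our own.

```python
wholes, remainder = divmod(22, 5)            # 5 goes into 22 four times, remainder 2
print(f"22/5 = {wholes} + {remainder}/5")    # 22/5 = 4 + 2/5
```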
Decimals
Often, you will see fractions written as decimals and vice versa, e.g. the fraction $\frac{1}{4}$ is
exactly the same as the decimal 0.25. However, be aware that some fractions do not have a
nice finite decimal expansion, e.g.
$\frac{1}{3}$ is the decimal 0.333333...,
i.e. there is an infinite number of threes after the decimal point. The problem with this
is that, in such cases, using decimals instead of fractions can lead to rounding errors, e.g.
$3 \times \frac{1}{3} = 1,$
exactly. However, if we keep only the first four threes of the decimal expansion for $\frac{1}{3}$, i.e.
we round $\frac{1}{3}$ to four decimal places, written 4dp, we have 0.3333 and this gives us
$3 \times \frac{1}{3} \simeq 3 \times 0.3333 = 0.9999,$
where ‘≃’ means ‘approximately equal to’. That is, using the decimal rounded to four
decimal places gives us an answer which is not exactly one, i.e. there is a rounding error
in our calculation, and this is why we generally use fractions instead of decimals.
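The same rounding error appears in Python 3. Working with the exact fraction gives exactly 1, whereas using 0.3333 does not. The variable names are hypothetical, and `round(..., 4)` is used only to display the decimal result tidily.

```python
from fractions import Fraction

exact = 3 * Fraction(1, 3)      # uses the fraction 1/3 exactly
approx = 3 * 0.3333             # uses 1/3 rounded to 4dp

print(exact)                    # 1
print(round(approx, 4))         # 0.9999
```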
Percentages
The percentage sign, i.e. ‘%’, means ‘divide by 100’, e.g. 20% is the same as $\frac{20}{100}$ as a
fraction, or 0.2 as a decimal. As such, 20% of 150 is
$150 \times \frac{20}{100} = \frac{3{,}000}{100} = 30.$
Knowing this, we can see what it means to increase 150 by 20% or decrease 150 by 20%,
i.e.
to increase 150 by 20%, we get
150 + 30 = 180,
as 30 is 20% of 150. Notice that an increase by 20% can also be seen as 120% of the
original, i.e.
$150 \times \frac{120}{100} = \frac{18{,}000}{100} = 180,$
as before.
to decrease 150 by 20%, we get
150 − 30 = 120,
as 30 is 20% of 150. Notice that a decrease by 20% can also be seen as 80% of the
original, i.e.
$150 \times \frac{80}{100} = \frac{12{,}000}{100} = 120,$
as before.
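The percentage calculations above can be reproduced in Python 3, using `Fraction` so that 20% is held exactly as $\frac{20}{100}$. This is a minimal sketch; the names are our own.

```python
from fractions import Fraction

original = 150
percent = Fraction(20, 100)                 # 20% = 20/100
change = original * percent                 # 20% of 150 = 30
increased = original * (1 + percent)        # 120% of 150 = 180
decreased = original * (1 - percent)        # 80% of 150 = 120

print(change, increased, decreased)         # 30 180 120
```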
These ideas will be particularly useful when we come to consider compound interest in
Unit 9.
1.1.3 Powers
Another operation that you will have come across before is the idea of ‘raising a number
to a certain power’. The number which represents the power can also be called the
exponent and the number which is being raised to that power is called the base. For
example, we could have $4^2$, $4^{-2}$ or $4^{1/2}$ and, in each case, ‘4’ is the base and the other
number, i.e. ‘2’, ‘−2’ or ‘$\frac{1}{2}$’ respectively, is the exponent or power. We often refer to
expressions of this form as ‘powers’.
The simplest powers to work out are those where the power is a positive integer such as
1, 2, 3, . . . . In such cases, the power just means ‘multiply the base by itself that many
times’, e.g.
$4^1 = 4, \quad 4^2 = 4 \times 4 = 16, \quad 4^3 = 4 \times 4 \times 4 = 64, \quad \ldots.$
One application of this is standard index form (or scientific notation) where we are
able to write large numbers in terms of powers of 10, e.g. we can write three million as
$3{,}000{,}000 = 3 \times 1{,}000{,}000 = 3 \times 10^6,$
as 1,000,000 is the same as $10^6$.
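In Python 3, powers are written with `**`, and a number can be displayed in standard index form using the `e` format code (where `e+06` means $\times 10^6$). This is an illustrative sketch with names of our own choosing.

```python
powers_of_four = [4 ** n for n in (1, 2, 3)]   # [4, 16, 64]
three_million = 3 * 10 ** 6                    # ** is done before *, so 3 * (10 ** 6)

print(powers_of_four, three_million)           # [4, 16, 64] 3000000
print(f"{three_million:.0e}")                  # 3e+06
```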
Of course, as before, we can also use brackets to change the order in which we do the
operations, e.g.
$(2 \times 4)^2 + 3 = 8^2 + 3 = 64 + 3 = 67,$
and
$2 \times (4^2 + 3) = 2 \times (16 + 3) = 2 \times 19 = 38.$
In particular, when writing out expressions involving powers, take care to distinguish
between, e.g. $2^3 + 5$ and $2^{3+5}$, as the former is 13 whilst the latter is 256!
Also, similar to what we saw earlier, it is possible to remove the brackets from
expressions involving powers by applying the power to all of the numbers in the bracket.
For example,
$(2 \times 3)^4 = 2^4 \times 3^4 = 16 \times 81 = 1{,}296,$
and
$\left(\frac{2}{3}\right)^4 = \frac{2^4}{3^4} = \frac{16}{81}.$
If we have the same base, then the power laws can allow us to simplify expressions
that involve multiplying powers, dividing powers and raising to powers. These laws are
as follows.
Multiplying powers: If we multiply two powers, we add the powers. For example,
if we have $2^4 \times 2^3$, we can write
$2^4 \times 2^3 = 2^{4+3},$
$(2^4)^3 = 2^{4 \times 3},$
$3^4 \times 2^5$ we could use $3^4 \times 2^5 = 81 \times 32 = 2{,}592$, but we could not use the power law.
$\frac{3^4}{2^5}$ we could use $\frac{3^4}{2^5} = \frac{81}{32}$, but we could not use the power law.
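The bracket and power-law examples above can be checked in Python 3. Each `..._rule` variable below should be `True`. This is a sketch with hypothetical names; `Fraction` keeps $\frac{2}{3}$ exact.

```python
from fractions import Fraction

# Brackets in the exponent matter.
without_brackets, with_brackets = 2 ** 3 + 5, 2 ** (3 + 5)      # 13, 256
# Removing brackets: apply the power to every number inside.
product_rule = (2 * 3) ** 4 == 2 ** 4 * 3 ** 4 == 1296
quotient_rule = Fraction(2, 3) ** 4 == Fraction(2 ** 4, 3 ** 4) == Fraction(16, 81)
# Same base: add powers when multiplying, multiply powers when raising to a power.
multiply_rule = 2 ** 4 * 2 ** 3 == 2 ** (4 + 3)
power_rule = (2 ** 4) ** 3 == 2 ** (4 * 3)
# Different bases: no power law applies, so just evaluate each power.
different_bases = 3 ** 4 * 2 ** 5                                  # 81 * 32 = 2592

print(without_brackets, with_brackets, different_bases)            # 13 256 2592
```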
Negative integer powers
Negative integer powers, such as −1, −2, −3, . . ., mean ‘take the reciprocal of the base
raised to the corresponding positive power’. For example,
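$4^{-1} = \frac{1}{4^1} = \frac{1}{4}, \quad 4^{-2} = \frac{1}{4^2} = \frac{1}{16}, \quad 4^{-3} = \frac{1}{4^3} = \frac{1}{64}, \quad \ldots.$
In particular, note that a power of −1 is the same as the reciprocal, e.g. $4^{-1} = \frac{1}{4}$, which
is the reciprocal of 4. Similarly, this means that
$\left(\frac{3}{5}\right)^{-1} = \frac{5}{3},$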
Zero powers
We now observe that any non-zero number raised to the power zero is one. For example, as
$4^1 \times 4^{-1} = 4^{1-1} = 4^0,$
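Python 3's `Fraction` type handles negative and zero powers exactly, so the examples above can be checked directly. The names in this sketch are our own.

```python
from fractions import Fraction

four = Fraction(4)
negative_powers = [four ** -1, four ** -2, four ** -3]   # 1/4, 1/16, 1/64
reciprocal = Fraction(3, 5) ** -1                        # 5/3
zero_power = four ** 1 * four ** -1                      # 4 * 1/4 = 1, i.e. 4**0

print(*negative_powers, reciprocal, zero_power, four ** 0)   # 1/4 1/16 1/64 5/3 1 1
```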
A square root of a number, say 64, is a number which, when multiplied by itself, gives
us 64. So, as
8 × 8 = 64,
we can see that 8 is a square root of 64. Indeed, since a negative number times a
negative number is positive, we can see that
$(-8) \times (-8) = 64,$
and so −8 is also a square root of 64. Thus, we can see that the square roots of 64 are 8
and −8. We often express this by saying ‘the square roots of 64 are ±8’ where the ‘±’ is
read ‘plus or minus’. Thus, we can see, by repeating this argument, that every positive
number has two square roots, one positive and one negative, and both of the same
magnitude.
What about other numbers? Well, since 0 × 0 = 0, we can see that the square root of
zero is zero and, moreover, zero is the only square root of zero. And, if we consider
negative numbers, say −64, we can see that there are no square roots since there is no
way of multiplying a number by itself to get −64.
We often denote the positive square root of a number, say 64, by ‘$\sqrt{64}$’ and so, from the
above we can see that $\sqrt{64} = 8$ and $\sqrt{0} = 0$. Of course, as negative numbers have no
square roots, something like $\sqrt{-64}$ does not exist.
Going back to our earlier example, as the square root of 64 is a number which, when
multiplied by itself, gives us 64, we can see that
$\left(\sqrt{64}\right)^2 = \sqrt{64} \times \sqrt{64} = 64,$
and this is why the square root is so called: squaring the square root gives us the original
number. Now, if we think of raising the number 64 to the power $\frac{1}{2}$, we can see that
$\left(64^{1/2}\right)^2 = 64^{\frac{1}{2} \times 2} = 64^1 = 64,$
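The square-root facts above can be checked in Python 3. `math.sqrt` returns the positive square root as a decimal, `math.isqrt` (available from Python 3.8) returns an exact whole-number square root, and raising to the power `0.5` corresponds to the power $\frac{1}{2}$. This is an illustrative sketch with names of our own choosing.

```python
import math

both_roots = [r for r in (8, -8) if r * r == 64]   # [8, -8]: both square to 64
positive_root = math.sqrt(64)                      # 8.0, the positive square root
exact_root = math.isqrt(64)                        # 8, exact whole-number square root
half_power = 64 ** 0.5                             # 8.0, i.e. 64 to the power 1/2

print(both_roots, positive_root, exact_root, half_power)   # [8, -8] 8.0 8 8.0
```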
Activity 1.2 Find the square roots of 4, 9, 16, 25, 36 and 49.
More generally, if n is a positive integer greater than 2, we say that an nth root of a
number, say 64, is a number which gives us 64 when raised to the power n. We often
denote the nth root of a number, say 64, by $\sqrt[n]{64}$. For example,
the cube root of 64, denoted by $\sqrt[3]{64}$, is 4 as four cubed is 64, i.e.
$4^3 = 64$ and so $\sqrt[3]{64} = 4.$
Notice that 64 has no negative cube root since $(-4)^3 = -64$, not 64. As such, 64
has only one cube root, i.e. 4. Repeating this argument, we can see that every positive
number has only one cube root.
In terms of powers, as
$\left(\sqrt[3]{64}\right)^3 = 4^3 = 64$ and $\left(64^{1/3}\right)^3 = 64^{\frac{1}{3} \times 3} = 64^1 = 64,$
comparing these two expressions it is natural to think of $64^{1/3}$ as exactly the same
thing as $\sqrt[3]{64}$, i.e.
$64^{1/3} = \sqrt[3]{64},$
and so we identify cube roots with powers of $\frac{1}{3}$.
the sixth root of 64, denoted by $\sqrt[6]{64}$, is 2 as two to the power six is 64, i.e.
$2^6 = 64$ and so $\sqrt[6]{64} = 2.$
Notice that 64 also has a negative sixth root since $(-2)^6 = 64$ and so 64 has two
sixth roots, i.e. ±2. Repeating this argument, we can see that all positive numbers
will have two sixth roots.
In terms of powers, as
$\left(\sqrt[6]{64}\right)^6 = 2^6 = 64$ and $\left(64^{1/6}\right)^6 = 64^{\frac{1}{6} \times 6} = 64^1 = 64,$
comparing these two expressions it is natural to think of $64^{1/6}$ as exactly the same
thing as $\sqrt[6]{64}$, i.e.
$64^{1/6} = \sqrt[6]{64},$
and so we identify sixth roots with powers of $\frac{1}{6}$.
And, more generally, we can write the positive nth root of a number a, or $\sqrt[n]{a}$, as a to
the power of $\frac{1}{n}$, i.e. $a^{1/n}$.
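Computing `a ** (1 / n)` in Python 3 gives a decimal that may be very slightly off because of rounding (the same problem seen with decimals earlier). The hypothetical helper `integer_nth_root` below is our own sketch, assuming `a` is a non-negative whole number: it rounds the decimal estimate and then confirms it with exact whole-number arithmetic. It returns only the positive root; the last lines show why −4 is not a cube root of 64 while −2 is a sixth root.

```python
def integer_nth_root(a, n):
    """Hypothetical helper: positive whole-number nth root of a (a >= 0), or None."""
    estimate = round(a ** (1 / n))          # decimal estimate, may be slightly off
    for candidate in (estimate - 1, estimate, estimate + 1):
        if candidate >= 0 and candidate ** n == a:   # exact whole-number check
            return candidate
    return None

print(64 ** (1 / 3))                        # a decimal, possibly not exactly 4.0
print(integer_nth_root(64, 3), integer_nth_root(64, 6))   # 4 2
print((-4) ** 3, (-2) ** 6)                 # -64 64
print(integer_nth_root(10, 3))              # None: 10 is not a perfect cube
```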
Activity 1.3 Find the cube root of 27 and the fourth roots of 81.
When using the above ideas you should also bear the following in mind.
When using the square root and nth root signs, i.e. $\sqrt{\ }$ and $\sqrt[n]{\ }$, always be clear
about what parts of the expression are included in the root. For example,
$\sqrt{4 \times 16}$ and $\sqrt{4} \times 16$
are different expressions (the former is equal to 8 whilst the latter is equal to 32).
Generally speaking, you can make your expressions clear by extending the ‘tail’ of
the root sign or using brackets.
Be careful when working with powers of negative numbers since even roots of
negative numbers do not exist. For example,
$\left((-2)^2\right)^{1/2} = 4^{1/2} = 2$
is fine, but $(-2)^{1/2}$ does not exist and, as such, nor does $\left((-2)^{1/2}\right)^2$.
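Both cautions above show up in Python 3. Where the brackets go changes the answer, and `math.sqrt` refuses a negative number by raising a `ValueError`. (Beware that Python's `**` behaves differently: `(-2) ** 0.5` quietly returns a complex number rather than an error, which is outside this course.) The variable names in this sketch are our own.

```python
import math

whole_product = math.sqrt(4 * 16)         # root of 64: 8.0
first_number_only = math.sqrt(4) * 16     # root of 4, then times 16: 32.0
square_then_root = math.sqrt((-2) ** 2)   # root of 4: 2.0, which is fine
print(whole_product, first_number_only, square_then_root)   # 8.0 32.0 2.0

try:
    math.sqrt(-2)                         # math.sqrt works with real numbers only
except ValueError as err:
    print("math.sqrt(-2) raised ValueError:", err)
```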
Recap on combinations of operations
To summarise what we have seen above, operations are done in
‘BEDMAS’ order, i.e.
1.2 Algebra
We use algebra to express and manipulate information about unknown quantities.
These unknown quantities are called variables and these are normally represented by
letters such as x, y and z. One way to think of this is that numbers are constants, i.e.
they always have the same value, whereas variables can take different values depending
on the context.
4x + 3x = 7x,
as four lots of x plus three lots of x is seven lots of x. Note that all of the mathematical
operations that we have seen so far can be used in algebraic expressions.
Often, we use mathematical expressions to represent the value of some quantity. For
instance, we can consider the following examples.
1. If you have a job which pays £10 per hour and you work x hours, then your income
is given by the algebraic expression £10x.
2. If a firm has a revenue of £x and costs of £y, then its profit is £(x − y).
3. If a firm prices a product at £x per unit and sells x units of this product, then the
revenue is £$x^2$. If the costs are £x, then its profit is £$(x^2 - x)$.
As the above examples show, some algebraic expressions contain one variable, such as
4x + 3x, some contain two variables, such as 4x + 3y − 7, and some can contain one
variable used several times, such as $x^2 - x$ where x is used twice (i.e. once in an x term
and once in an $x^2$ term). Of course, the quantities represented may be more complicated
than those given in these examples.
Example 1.1 Suppose that you heat your house with gas for d days per year and
on each day you use m cubic metres of gas. This means that you use dm cubic
metres of gas each year.
If gas costs £P per cubic metre, this means that the cost of heating your house for a
year is £dmP .
Suppose that you must also pay a fixed amount of £81 per year to the gas company.
This means that the cost of heating your house for a year is now £(dmP + 81).
Suppose that you pay your gas bill in twelve equal monthly instalments. This means
that you must pay
£$\frac{dmP + 81}{12}$
every month.
Activity 1.4 What will the annual payment be if the gas company raises the price
of gas by £p per cubic metre? What will the corresponding monthly repayments be?
Given an algebraic expression, we are sometimes given specific values for each of the
variables involved and asked to evaluate it, i.e. find a value for the whole algebraic
expression given the values of the variables. So, for example, using our examples above
we have the following.
1. With x = 5, you have a job which pays £10 per hour and you work 5 hours, then
your income is given by £(10 × 5) = £50.
2. With x = 40 and y = 30, the firm has a revenue of £40 and costs of £30, and so its
profit is £(40 − 30) = £10.
3. With x = 10, the firm prices the product at £10 per unit and sells 10 units, i.e. the
revenue will be £$10^2$. The costs will be £10, and so its profit is
£$(10^2 - 10)$ = £$(100 - 10)$ = £90.
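Evaluating an algebraic expression is like calling a function with particular values for its variables. The sketch below writes the three examples above as Python 3 functions; the function names are hypothetical and amounts are in pounds.

```python
def income(x):                 # £10 per hour for x hours
    return 10 * x

def profit(revenue, costs):    # revenue £x minus costs £y
    return revenue - costs

def product_profit(x):         # price £x per unit, x units sold, costs £x
    return x ** 2 - x

print(income(5), profit(40, 30), product_profit(10))   # 50 10 90
```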
Indeed, we can also look at how this works in our more complicated example.
Example 1.2 Following on from Example 1.1, suppose that when heating your
house, gas costs £0.12 per cubic metre and that you use 13 cubic metres of gas per
day for 125 days. This means that you have to pay
$$\text{£}\,\frac{13 \times 125 \times 0.12 + 81}{12} = \text{£}\,\frac{195 + 81}{12} = \text{£}\,\frac{276}{12} = \text{£}23$$
every month.
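As a quick check on the arithmetic in Example 1.2, here is a minimal Python sketch of the monthly-payment formula £(dmP + 81)/12. The function name `monthly_payment` and its `fixed` parameter are our own illustrative choices; the price is passed as a `Fraction` so that £0.12 is represented exactly rather than as a rounded decimal.

```python
from fractions import Fraction


def monthly_payment(d, m, P, fixed=81):
    """Return the monthly instalment £(d*m*P + fixed)/12 from Example 1.1.

    d: days of heating per year; m: cubic metres of gas used per day;
    P: price in pounds per cubic metre; fixed: annual fixed charge in pounds.
    """
    annual_cost = d * m * P + fixed  # cost of heating for a year, in pounds
    return annual_cost / 12          # twelve equal monthly instalments


# Example 1.2: 125 days at 13 cubic metres per day, with gas at £0.12 per cubic metre.
print(monthly_payment(125, 13, Fraction("0.12")))  # 23
```

The same function can be reused for Activities 1.4 and 1.5 by changing the value passed as `P`.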
Activity 1.5 What is the cost of heating your house for a year?
What will the annual payment be if the gas company raises the price of gas by 8p
per cubic metre? What will the corresponding monthly repayments be?
As long as we take care to combine ‘like with like’, an algebraic expression can
sometimes be simplified, i.e. it can be changed into a form that is easier to evaluate
without altering what we will get from an evaluation. For example, we saw earlier that
4x + 3x = 7x,
and so we can write 4x + 3x as 7x, which is simpler. In particular, we can often simplify
expressions by removing brackets from an expression and simplifying what remains, e.g.
if we have an algebraic expression like 3(2x) we can think of this as ‘three lots of 2x’
which gives us 6x, i.e.
3(2x) = 6x.
However, if we have an algebraic expression like 3(x + 2), which we can think of as
‘three lots of x + 2’, we can remove the brackets by multiplying everything inside the
brackets by 3, i.e.
3(x + 2) = 3x + 6,
whereas if we have an algebraic expression like −(2x − 1), we can think of the minus as
telling us to multiply everything inside the brackets by −1, i.e.
−(2x − 1) = −2x + 1.
Indeed, we may be able to do some simplifying after we have multiplied out the
brackets, e.g.
2(x + 3) + x = 2x + 6 + x = 3x + 6,
where, here, we have multiplied out the brackets and collected ‘like’ terms to get a
simpler expression. Some other examples of simplifying algebraic expressions are:
4x − 3x = x,
4(2x) − x = 8x − x = 7x,
3(x + y) = 3x + 3y,
3(x + 1) − 4(x − 1) = 3x + 3 − 4x + 4 = −x + 7.
Notice that none of these simplifications changes the outcome of any evaluation which
we may want to perform, i.e. whatever we get if we evaluate the expression at the start
we will also get if we evaluate the expression at the end. In this sense, the expressions
may look different, but algebraically they are the same throughout.
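One way to spot-check that a simplification has not changed any evaluation is to evaluate both forms at many values of x. The sketch below (our own illustration, with the hypothetical helper `agree_on`) does this for the examples above. Agreement at a finite set of values is evidence rather than a proof in general, although for linear expressions like these, agreement at two different values of x is in fact enough.

```python
# Each pair is (original expression, simplified expression) taken from the text.
pairs = [
    (lambda x: 4 * x + 3 * x, lambda x: 7 * x),
    (lambda x: 3 * (2 * x), lambda x: 6 * x),
    (lambda x: 3 * (x + 2), lambda x: 3 * x + 6),
    (lambda x: -(2 * x - 1), lambda x: -2 * x + 1),
    (lambda x: 2 * (x + 3) + x, lambda x: 3 * x + 6),
    (lambda x: 4 * x - 3 * x, lambda x: x),
    (lambda x: 4 * (2 * x) - x, lambda x: 7 * x),
    (lambda x: 3 * (x + 1) - 4 * (x - 1), lambda x: -x + 7),
]


def agree_on(f, g, values):
    """Return True if f(x) == g(x) for every x in values."""
    return all(f(x) == g(x) for x in values)


test_values = range(-10, 11)
print(all(agree_on(f, g, test_values) for f, g in pairs))  # True

# A faulty "simplification" is caught straight away: 3(x + 2) is not 3x + 2.
print(agree_on(lambda x: 3 * (x + 2), lambda x: 3 * x + 2, test_values))  # False
```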
Multiplying out two pairs of brackets
Sometimes we will want to multiply out the brackets in more complicated expressions.
For example, how would you remove the brackets from (x + 3)(y − 2)? We can think of
this in two ways:
Multiplying out the first bracket, everything in the first bracket needs to be
multiplied by the second bracket, i.e.
(x + 3)(y − 2) = x(y − 2) + 3(y − 2),
and then simplifying this as before we get
(x + 3)(y − 2) = x(y − 2) + 3(y − 2) = xy − 2x + 3y − 6.
Multiplying out the second bracket, everything in the second bracket needs to be
multiplied by the first bracket, i.e.
(x + 3)(y − 2) = (x + 3)y + (x + 3)(−2),
and then simplifying this as before we get
(x + 3)(y − 2) = (x + 3)y + (x + 3)(−2) = xy + 3y − 2x − 6.
But, notice that these are the same expression, and so we can multiply out in either
way as long as we make sure that every term in a bracket is multiplied by every term in
the other bracket.
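The rule that every term in one bracket is multiplied by every term in the other can be written as a double loop. In this sketch, which uses our own representation rather than anything from the guide, a bracket is a dictionary mapping a pair of exponents (power of x, power of y) to a coefficient, so x + 3 is `{(1, 0): 1, (0, 0): 3}`.

```python
from collections import defaultdict


def expand(bracket1, bracket2):
    """Multiply out two brackets given as {(power of x, power of y): coefficient}.

    Every term in the first bracket is multiplied by every term in the second,
    and 'like' terms (same powers of x and y) are collected.
    """
    result = defaultdict(int)
    for (i1, j1), c1 in bracket1.items():
        for (i2, j2), c2 in bracket2.items():
            # Multiplying monomials adds their exponents and multiplies coefficients.
            result[(i1 + i2, j1 + j2)] += c1 * c2
    # Drop any terms whose coefficients cancelled to zero.
    return {mono: c for mono, c in result.items() if c != 0}


x_plus_3 = {(1, 0): 1, (0, 0): 3}    # x + 3
y_minus_2 = {(0, 1): 1, (0, 0): -2}  # y - 2
print(expand(x_plus_3, y_minus_2))
# {(1, 1): 1, (1, 0): -2, (0, 1): 3, (0, 0): -6}, i.e. xy - 2x + 3y - 6
```

Swapping the two arguments multiplies out "the second bracket first" and gives the same terms, just as in the text.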
Activity 1.6 We can write $(x + 3)^2$ as $(x + 3)(x + 3)$. Use this to remove the
brackets from the expression $(x + 3)^2$. In a similar manner, remove the brackets from
the expression $(2x + 3)^2$.
Factorising
Sometimes we can simplify expressions even further by putting brackets in, e.g. going
back to an earlier example, we could write
2(x + 3) + x = 2x + 6 + x = 3x + 6 = 3(x + 2),
as 3(x + 2) = 3x + 6 if we multiply out the brackets. The process of putting brackets
into an expression is called factorisation. For our current purposes, we just need to
note that we can factorise when every term in our expression has a common factor, such
as 3 in the example above. Some other examples, which can be verified by multiplying
out the brackets, are:
2x − 6 = 2(x − 3),
−2x − 10 = −2(x + 5), and
3xy − 12y = 3y(x − 4).
We will return to factorisation in Unit 3.
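For expressions whose terms have whole-number coefficients, the numerical part of a common factor can be found with the greatest common divisor. This is a minimal sketch under that assumption (the helper name `factor_out_common` is hypothetical); a common variable factor, such as the y in 3xy − 12y, still has to be spotted by hand.

```python
from functools import reduce
from math import gcd


def factor_out_common(coefficients):
    """Factor the largest common whole-number factor out of integer coefficients.

    Returns (factor, remaining_coefficients). If the first coefficient is
    negative, the factor is made negative too, as in -2x - 10 = -2(x + 5).
    """
    g = reduce(gcd, coefficients, 0)  # gcd(0, a) == abs(a), so g is never negative
    if g == 0:
        raise ValueError("at least one coefficient must be non-zero")
    if coefficients[0] < 0:
        g = -g
    return g, [c // g for c in coefficients]  # exact, since g divides every c


print(factor_out_common([2, -6]))    # (2, [1, -3]), i.e. 2x - 6 = 2(x - 3)
print(factor_out_common([-2, -10]))  # (-2, [1, 5]), i.e. -2x - 10 = -2(x + 5)
```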
1.2.2 Equations, formulae and inequalities
So far, we have considered how to manipulate algebraic expressions and what they may
be used to express. We now look at the ways in which a pair of algebraic expressions
may be related to one another.
Equations
and
$$\frac{4x - 8}{9} = \frac{2x + 4}{9},$$
but it has different solutions to the equation
Bearing this in mind, let’s see how we would actually solve this equation.
Example 1.3 Solve the equation 4x − 8 = 2x + 4.
We solve this by rearranging it, i.e. performing some well chosen mathematical
operations on both sides at the same time:
4x − 8 = 2x + 4 our equation
4x − 8 − 2x = 2x + 4 − 2x subtracting 2x from both sides
2x − 8 = 4 simplifying
2x − 8 + 8 = 4 + 8 adding 8 to both sides
2x = 12 simplifying
x=6 dividing both sides by 2
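Lastly, always check that any solution you find really is a solution by using it to evaluate
both sides of the original equation. Here, with x = 6, the left-hand side is 4(6) − 8 = 16 and
the right-hand side is 2(6) + 4 = 16, so x = 6 is indeed a solution.
Example 1.4 Solve the equation 3x + 6 = 5x − 10.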
3x + 6 = 5x − 10 our equation
3x + 6 − 3x = 5x − 10 − 3x subtracting 3x from both sides
6 = 2x − 10 simplifying
6 + 10 = 2x − 10 + 10 adding 10 to both sides
16 = 2x simplifying
8=x dividing both sides by 2
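The steps used in the last two examples work for any equation of the form ax + b = cx + d with a ≠ c. The sketch below (our own helper, `solve_linear`) follows the same moves, collecting the x terms on one side and the constants on the other, and finishes with the recommended check of both sides.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x, assuming a != c (integer or Fraction inputs)."""
    # Subtract c*x from both sides:  (a - c)*x + b = d
    coefficient = a - c
    # Subtract b from both sides:    (a - c)*x = d - b
    constant = d - b
    # Divide both sides by (a - c).
    x = Fraction(constant) / coefficient
    # Check the solution by evaluating both sides of the original equation.
    if a * x + b != c * x + d:
        raise ArithmeticError("check failed: x does not satisfy the equation")
    return x


print(solve_linear(4, -8, 2, 4))   # 6  (from 4x - 8 = 2x + 4)
print(solve_linear(3, 6, 5, -10))  # 8  (from 3x + 6 = 5x - 10)
```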
The equations in the last two examples are linear equations, and they will be the
starting point for the more detailed discussion of equations in Unit 2.
Inequalities
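As with equations, inequalities can be solved by rearranging them into simpler
inequalities that are true for the same range of values. Generally, given an inequality,
this means that we can add or subtract the same quantity on both sides, and multiply or
divide both sides by the same positive quantity, without changing which values satisfy it.
For example,
3x > 6 can be simplified to give x > 2 by dividing both sides by 3 (as 3 is positive).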
However, if we multiply (or divide) by a negative expression, we must ‘reverse the
direction’ of the inequality. For example,
−3x > 6 can be simplified to give x < −2 by dividing both sides by −3 and
reversing the direction of the inequality (as −3 is negative).
To see why we need to do this, consider the inequality 2 < 3, which is true. If we
multiply both sides by 2 (which is positive) we get 4 < 6, which is still true, but if we
multiply both sides by −2 (which is negative) we get −4 < −6, which is not true. However,
if we also reverse the direction of the inequality, we get −4 > −6, which is now true.
We solve this by rearranging it, i.e. performing some well chosen mathematical
operations on both sides at the same time:
Thus, the solution to our inequality is −2 < x, or rewriting this, x > −2.
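The rule for multiplying or dividing by a negative number can be made explicit in code. This sketch (a hypothetical helper that only handles inequalities of the form ax > b with a ≠ 0) returns the direction of the simplified inequality together with its bound.

```python
from fractions import Fraction


def solve_ax_greater_than_b(a, b):
    """Solve a*x > b, returning (direction, bound) meaning 'x direction bound'.

    Dividing both sides by a positive a keeps the direction '>';
    dividing by a negative a reverses it to '<'.
    """
    if a == 0:
        raise ValueError("a must be non-zero")
    bound = Fraction(b) / a
    direction = ">" if a > 0 else "<"
    return direction, bound


for a, b in [(3, 6), (-3, 6)]:
    direction, bound = solve_ax_greater_than_b(a, b)
    print(f"{a}x > {b} simplifies to x {direction} {bound}")
# 3x > 6 simplifies to x > 2
# -3x > 6 simplifies to x < -2
```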
Formulae
Example 1.6 Following on from Example 1.1, let S denote the amount, in pounds,
of our monthly gas payments so that
$$S = \frac{dmP + 81}{12}.$$
If our monthly repayment, S, is now given, for how many days, d, can we heat our
house?
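One way to answer this is to rearrange the formula: multiplying both sides by 12 gives 12S = dmP + 81, subtracting 81 gives 12S − 81 = dmP, and dividing by mP (assumed non-zero) gives
$$d = \frac{12S - 81}{mP}.$$
The short Python sketch below (the helper name `heating_days` is our own) checks this rearrangement by feeding in the figures from Example 1.2, where a £23 monthly payment should give back 125 days.

```python
from fractions import Fraction


def heating_days(S, m, P, fixed=81):
    """Days of heating affordable with monthly payment S, from S = (d*m*P + fixed)/12.

    Rearranged: 12S = d*m*P + fixed, so d = (12S - fixed) / (m*P).
    Assumes m*P is non-zero.
    """
    return (12 * S - fixed) / (m * P)


# Round trip with Example 1.2: £23 per month, 13 cubic metres per day, £0.12 per cubic metre.
print(heating_days(23, 13, Fraction("0.12")))  # 125
```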
Activity 1.9 In a similar manner, find the price, P , per cubic metre of gas.
Identities
$$x(x + 1) = x^2 + x,$$
is an identity because reading it from left to right tells us how to multiply out the
brackets in ‘x(x + 1)’ and reading it from right to left tells us how to factorise the
quadratic $x^2 + x$. In particular, notice that although this looks like an equation, it isn’t
really because it is true for all values of x! In fact, throughout this unit we have been
reviewing how certain mathematical operations work and, as you have probably
realised, many of these can be usefully summarised by using identities. For instance, the
following identities allow us to summarise some of the ideas that we encountered when
we discussed fractions.
Arithmetic with fractions
At this stage, we can also usefully summarise some of the ways in which powers work as
follows.
Power laws
We can also summarise some of our results concerning brackets by using identities as
you can see in the next activity.
Activity 1.10 Write out the identities that arise when you remove the brackets
from the following algebraic expressions.
And, just to be sure that we understand what is going on, try the next activity.
Activity 1.11 Use these identities to simplify the following algebraic expressions.
i. $(x+y)^2 - x(x+y) - y(x+y)$, ii. $\dfrac{(x + y)^2 - (x - y)^2}{4xy}$, iii. $\sqrt{x + y} - (\sqrt{x} + \sqrt{y})$.
Learning outcomes
At the end of this unit, you should be able to:
simplify and evaluate arithmetic expressions including those that involve brackets
and powers;
manipulate algebraic expressions including those that involve brackets and powers;
model certain situations using formulae and be able to rearrange such formulae;
Exercises
Exercise 1.1
Evaluate the expressions $3 \cdot 2 + \frac{6}{2} \cdot 7 + 4$ and $\dfrac{3 - (-3 - (4 - 5) - 2) - 6}{-(-(-1)) - 1}$.
Exercise 1.2
Evaluate the following expressions.
i. |−3| + |−2|, ii. |−3| − |−2|, iii. −|3| + |−2|, iv. −|3| − |−2|, v. −|3| − |2|.
Exercise 1.3
Write the mixed numbers $4\frac{2}{7}$, $1\frac{2}{3}$ and $2\frac{1}{4}$ as improper fractions.
Exercise 1.4
Evaluate the following expressions, writing your answers in lowest terms.
i. $\frac{1}{3} + \frac{1}{2}$, ii. $\frac{30}{7} - \frac{5}{3}$, iii. $\frac{2}{5} \cdot \frac{25}{4}$, iv. $\frac{13}{8} \div \frac{9}{4}$.
Exercise 1.5
You deposit £1000 in a bank account that pays 10% interest. What will the balance be
after one year? Two years?
After two years, what is the increase in the balance as a percentage of the original
deposit?
Exercise 1.6
Evaluate the following expressions.
i. $9^2 - 9^{\frac{1}{2}}$, ii. $16^{-\frac{1}{4}} + 16^{\frac{1}{2}}$, iii. $7^{\frac{1}{3}} \cdot 7^{\frac{2}{3}}$, iv. $\dfrac{10^{-2}}{2^{-10}}$.
Exercise 1.7
Express the following in the simplest form possible.
i. $\dfrac{x^2 y}{2xz} + \dfrac{xy^3 z}{xy}$, ii. $x(y^2 z^3)^{\frac{1}{2}}(xz)^{-2}$, iii. $x(xy)^{-2}(x + z)^{\frac{1}{2}}$.
Exercise 1.8
Multiply out the brackets in the following expressions simplifying your answers as far as
possible.
i. (x + 1)(x − 1), ii. (2y + 3)(y − 2), iii. (x + 3y)(2x − y), iv. (2x − 3y)(x + z).
Exercise 1.9
Solve the following equations.
i. −3 p = 21, ii. 4 q − 1 = 15,
5 1 2
iii. 5 z + 4 (z − 2) = 1, iv. 6
k − 2k + 3
= 3
,
v. 5 m − 3 (m − 2) = 11 (m + 2), vi. 83 (w − 1,996) + 17 (w − 1,996) = 600.
Exercise 1.10
You hire a car for £20 plus the cost of petrol used. Let x be the distance you travel in
miles and p be the price, in pence, of petrol per gallon. If petrol consumption is 30 miles
per gallon, write down expressions, in pence, for the amount you spend on petrol and
the cost per mile.
Exercise 1.11
Rearrange the formula $y = \dfrac{z}{2 + x} - 3$ to make x the subject.
Exercise 1.12
Solve the inequality 5 − x > 2x − 1.
2. Review II — Linear equations and straight lines
Unit 2: Review II
Linear equations and straight lines
Overview
In this unit we continue our study of equations by looking at linear equations in one and
two variables. In particular, we will see that linear equations in two variables represent
straight lines. We see how to sketch these lines by finding their intercepts and
investigate how their gradient allows us to measure changes. Lastly, we will see how to
solve problems that involve simultaneous equations.
Aims
To see how to solve simple linear equations in one and two variables.
$$ax = c,$$
where a ≠ 0 and c are constants. As in Unit 1, we can solve this by dividing both
sides by a to get
$$x = \frac{c}{a},$$
as a ≠ 0 and, in this case, we say that such an equation has a unique solution.¹ Of
course, the variable need not be x, as linear equations in one variable may use a
different variable, for example, the variable
i. y in 4y = −8, which gives the solution y = −2;
ii. z in 3z = 9, which gives the solution z = 3;
iii. q in 3q = 9, which gives the solution q = 3.
Notice, in particular, that examples ii. and iii. are the same equation written in terms of
two different variables.
More generally, a linear equation in one variable can come about through an equation
that only involves multiples of the variable and constants. For example, if we consider
the equations,
1. 6y + 4 = 2y − 4,
2. 2z − 6 = −z + 3,
3. q − 5 = −2q + 4,
we can rearrange them, as in Unit 1, to yield the linear equations that we saw above.
The only exceptions to this are when we have something like
2x − 6 = 2x + 2 which rearranges to 0 = 8,
and this is never true, i.e. an equation like this has no solutions since, whatever value of
x we put into the equation, it is never satisfied. Or we have something like
2x + 2 = 2x + 2 which rearranges to 0 = 0,
and this is always true, i.e. an equation like this has an infinite number of solutions
since, whatever value of x we put into the equation, it is always satisfied. That is, this
equation is actually an identity because it is true for all values of x.
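The three possibilities just described (a unique solution, no solutions, or infinitely many) can be told apart mechanically once the equation ax + b = cx + d is rearranged to (a − c)x = d − b. The sketch below is our own illustration, not a method given in the guide.

```python
from fractions import Fraction


def classify_linear(a, b, c, d):
    """Classify a*x + b = c*x + d by its number of solutions.

    Rearranging gives (a - c)x = d - b. If a - c is non-zero there is a unique
    solution; otherwise the equation reduces to 0 = d - b, which is either
    never true (no solutions) or always true (an identity).
    """
    coefficient, constant = a - c, d - b
    if coefficient != 0:
        return "unique", Fraction(constant, coefficient)
    if constant != 0:
        return "none", None
    return "infinitely many", None


print(classify_linear(6, 4, 2, -4))  # ('unique', Fraction(-2, 1))   6y + 4 = 2y - 4
print(classify_linear(2, -6, 2, 2))  # ('none', None)                2x - 6 = 2x + 2
print(classify_linear(2, 2, 2, 2))   # ('infinitely many', None)     2x + 2 = 2x + 2
```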
2x + y = 7,
y = 7 − 2x.
¹ That is, it always has a solution and there is only one solution.
Now, if we substitute any value of x into this equation, it will give us a value of y.
For instance, if we take
x = 1 we get y = 5;
x = 2 we get y = 3;
x = 3 we get y = 1;
and so on for any other values of x that we may choose. Furthermore, each of these
pairs of numbers is a solution to the equation as putting the x value and its
corresponding y value into the equation satisfies it.
Example 2.2 Consider the linear equation in two variables given by x = 2. Notice
that this linear equation only contains the variable x, but as we are told that it is a
linear equation in two variables, we have to think about what this means for the
other variable, which we can call y. The way to think about this is to write it as
x + 0y = 2,
and then notice that, for any value of y, the quantity ‘0y’ is always zero and so we
must always get x = 2. That is, among the solutions to this linear equation in two
variables we will find the pairs of numbers
x = 2 and y = 1;
x = 2 and y = 2;
x = 2 and y = 3;
and so on for any other value of y as long as we pair it with x = 2.
Activity 2.1 What are the solutions to the linear equation in two variables given
by y = 3?
point with coordinates (0, 0) and we call this the origin. We often refer to the ‘space’
which contains all the points with (x, y) coordinates as the ‘xy-plane’.³ Repeating this
for the other two solutions of the equation we found earlier, i.e. those with coordinates
(2, 3) and (3, 1), yields three points on our diagram as shown in Figure 2.1(b). This
procedure, of representing the solutions to an equation in two variables as points on
such a diagram, is known as plotting those points.
Figure 2.1: Plotting points. (a) The point (1, 5) and (b) the points (1, 5), (2, 3) and (3, 1).
All of the points plotted here are solutions to the equation 2x + y = 7.
If we were to repeat this procedure, i.e. if we were to plot all the points which
represented a solution to our linear equation, we would find that they are all on the
straight line shown in Figure 2.2(a). In fact, any linear equation in two variables that
has an infinite number of solutions can be represented as a line on such a diagram.
Indeed, the lines which represent the linear equations in two variables given by x = 2
(from Example 2.2) and y = 3 (from Activity 2.1) are illustrated in Figure 2.2(b). In
particular, notice that points on the vertical line, which represent the solutions to the
equation x = 2, always have coordinates of the form (2, y) where y can take any value.
Activity 2.2 In a similar manner, what can we say about the coordinates of the
points on the horizontal line? What are the coordinates of the point at which this
horizontal line and this vertical line intersect?
Activity 2.3 If we have the vertical line x = k and the horizontal line y = l where
k and l are constants, what can we say about the coordinates of the points on these
lines? What are the coordinates of the point at which these two lines intersect?
³ Basically, it’s a ‘plane’ because it is ‘flat and level’. It’s the xy-plane because the points have (x, y)
coordinates as determined by the x and y-axes.
Figure 2.2: Drawing straight lines. (a) Each point on this line has coordinates that satisfy
the equation 2x + y = 7. (b) Each point on the vertical line has coordinates that satisfy
the equation x = 2 and each point on the horizontal line has coordinates that satisfy the
equation y = 3.
Activity 2.4 What are the equations of the lines that we use to represent the x and
y-axes? What are the coordinates of the points on these lines?
If a = 0 and b ≠ 0, then the equation can be written as y = c/b and so, for any
value of x, a point with coordinates (x, c/b) is on this line. As in Activity 2.1, where
we had y = 3, we see that these equations represent horizontal straight lines as
illustrated in Figure 2.2(b). In particular, the line with equation y = 0 is the x-axis.
If a ≠ 0 and b = 0, then the equation can be written as x = c/a and so, for any
value of y, a point with coordinates (c/a, y) is on this line. As in Example 2.2,
where we had x = 2, we see that these equations represent vertical straight lines as
illustrated in Figure 2.2(b). In particular, the line with equation x = 0 is the y-axis.
If a ≠ 0 and b ≠ 0, then the equation cannot be written so simply and any point
whose coordinates satisfy this equation will be on this line. As in Example 2.1,
where we had 2x + y = 7, we see that these equations represent lines which are
neither horizontal nor vertical and we call them oblique straight lines, as illustrated
in Figure 2.2(a).
Now, given a linear equation in two variables, when it comes to drawing the straight
line that represents its solutions, all we need to do is find at most two points on the
line. In particular, on the one hand, if we can see from its equation that the line is
horizontal or vertical, we need only one point on the line to draw it. On the other hand,
if we can see from its equation that the line is oblique, then we only need to find two
points on the line to draw it. That is, if we find any two points whose coordinates
satisfy the equation, we can plot these two points on our diagram and the line we seek
is the one that goes through these two points.
ax + by = c,
x-intercept, i.e. the value of x where the line crosses the x-axis. But, the x-axis is
the line y = 0 and so we are looking for the value of x which occurs when y = 0 in
our equation. That is, we want x to be such that
$$ax = c \implies x = \frac{c}{a},$$
and so the coordinates of the x-intercept are $\left(\frac{c}{a}, 0\right)$.
y-intercept, i.e. the value of y where the line crosses the y-axis. But, the y-axis is
the line x = 0 and so we are looking for the value of y which occurs when x = 0 in
our equation. That is, we want y to be such that
$$by = c \implies y = \frac{c}{b},$$
and so the coordinates of the y-intercept are $\left(0, \frac{c}{b}\right)$.
This general case is illustrated in Figure 2.3(a).
Figure 2.3: The x and y-intercepts of an oblique straight line. (a) In general, the oblique
line ax + by = c has a ≠ 0 and b ≠ 0, so the x and y-intercepts are given by the points
$\left(\frac{c}{a}, 0\right)$ and $\left(0, \frac{c}{b}\right)$ respectively. (b) The line 2x + y = 4 has x and y-intercepts given by the
points (2, 0) and (0, 4) respectively.
Example 2.3 As an example of how this works, consider the linear equation in two
variables given by
2x + y = 4.
This represents an oblique line and so we can find its x and y-intercepts as follows.
For the x-intercept, we set y = 0 to get 2x = 4 and hence x = 2. Thus, the point
with coordinates (2, 0) is the x-intercept of this line.
For the y-intercept, we set x = 0 to get y = 4. Thus, the point with coordinates
(0, 4) is the y-intercept of this line.
Once we have plotted these two points, the line that we seek is the one that goes
through both of them, as illustrated in Figure 2.3(b).
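The intercept calculation for a general line ax + by = c can be sketched as follows. This is our own illustration; the helper returns `None` for an intercept that does not exist as a single point (for example, when a = 0 the line is horizontal, so it is either the x-axis itself or never meets it).

```python
from fractions import Fraction


def intercepts(a, b, c):
    """Return (x_intercept, y_intercept) of the line a*x + b*y = c.

    Setting y = 0 gives a*x = c, so x = c/a; setting x = 0 gives b*y = c,
    so y = c/b. An intercept is None when the corresponding coefficient is
    zero, because the line then has no single crossing point with that axis.
    """
    x_intercept = (Fraction(c, a), 0) if a != 0 else None
    y_intercept = (0, Fraction(c, b)) if b != 0 else None
    return x_intercept, y_intercept


x_int, y_int = intercepts(2, 1, 4)  # the line 2x + y = 4 from Example 2.3
print(x_int, y_int)  # (Fraction(2, 1), 0) (0, Fraction(4, 1))
```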
Activity 2.5 Suppose that you are going to spend exactly £3 when buying x
apples and y bananas. If apples cost 50p each and bananas cost 30p each, find a
linear equation in terms of x and y that gives the combinations of apples and
bananas that you can purchase. Draw the straight line that is represented by this
linear equation and comment on its economic significance.
If $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of two distinct points on a straight line,
then
the change in y is $\Delta y = y_2 - y_1$ and
the change in x is $\Delta x = x_2 - x_1$.
The gradient, m, of this straight line is then given by
$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.$$
In particular, whichever two points on the straight line we use when finding the
gradient, we will always get the same value.
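The gradient formula translates directly into code. In this minimal sketch (the helper `gradient` is hypothetical), points are (x, y) tuples and `Fraction` keeps the answer exact; a pair of points with the same x-coordinate is rejected, since the formula would then divide by zero.

```python
from fractions import Fraction


def gradient(p1, p2):
    """Gradient m = (y2 - y1) / (x2 - x1) of the line through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x1 == x2: the formula would divide by zero")
    return Fraction(y2 - y1, x2 - x1)


# The line 2x + y = 4 from Example 2.3 passes through (2, 0) and (0, 4).
print(gradient((2, 0), (0, 4)))  # -2
print(gradient((0, 4), (2, 0)))  # -2, so the order of the two points does not matter
```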
Example 2.4 Using the line from Example 2.3 which was illustrated in
Figure 2.3(b), we can see that it goes through the points with coordinates (2, 0) and
(0, 4). As such, using these two points, we can see that
$$m = \frac{4 - 0}{0 - 2} = -2.$$
Activity 2.6 Following on from Example 2.1, find the gradient of the straight line
whose equation is 2x + y = 7.
Following on from Example 2.2 and Activity 2.1, what can you say about the
gradients of the straight lines whose equations are x = 2 and y = 3?
If we know that the line has gradient, m, and y-intercept, k, then the equation of the
line is given by
y = mx + k.
For example, if we are told that a line has a gradient of 3 and its y-intercept is the point
with coordinates (0, 7), then the equation of the line is
y = 3x + 7
and, if the gradient of the line is zero and the y-intercept is the point with coordinates
(0, 5), then the equation of the line is y = 5.
If we are given the gradient of the line, m, and a point on the line other than the
y-intercept, say the point $(x_1, y_1)$, then we know that for any other point, (x, y), on the
line we must have
$$m = \frac{y - y_1}{x - x_1},$$
as the gradient of a line is the same regardless of which pair of points on the line we use
to calculate it.
To verify that this formula works, consider again the line which has a gradient of 3 and
whose y-intercept is the point with coordinates (0, 7). Using the formula, this yields the
equation,
$$3 = \frac{y - 7}{x - 0},$$
which can be rearranged to give y = 3x + 7,
as before. Similarly, in the case of the line which has a gradient of zero and whose
y-intercept is the point with coordinates (0, 5), we can see that the formula yields the
equation
$$0 = \frac{y - 5}{x - 0},$$
which can be rearranged to give y = 5,
as before.
However, the real power of this formula shows when we have to find the equation of the line
that, for example, has a gradient of 10 and goes through the point with coordinates
(2, 3). Using the formula in this case yields the equation
$$10 = \frac{y - 3}{x - 2} \implies y - 3 = 10(x - 2) \implies y - 3 = 10x - 20,$$
or y = 10x − 17. Indeed, we can verify this is correct since the x-coefficient is the
gradient and the point (2, 3) satisfies the equation.
If we know that the two distinct points $(x_1, y_1)$ and $(x_2, y_2)$ lie on the line, then its
gradient is given by
$$m = \frac{y_2 - y_1}{x_2 - x_1}.$$
Moreover, if the point (x, y) is also on the line, then the gradient between it and any
other point on the line, say $(x_1, y_1)$, is given by
$$m = \frac{y - y_1}{x - x_1}.$$
So, as the line has the same gradient regardless of the pair of points we take, this
means that the equation of the line is given by
$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}.$$
For example, if the points with coordinates (1, −7) and (2, 3) are on the line, then the
equation of the line is given by
$$\frac{y - (-7)}{x - 1} = \frac{3 - (-7)}{2 - 1} \implies y + 7 = 10(x - 1) \implies y + 7 = 10x - 10,$$
or y = 10x − 17, i.e. this line is the same as the one we saw earlier.
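Both ways of finding the equation of a line can be sketched with two small helpers (the names are our own). Each returns the pair (m, k) for y = mx + k, using $k = y_1 - m x_1$, which follows from rearranging $y - y_1 = m(x - x_1)$. The last call reuses the same helper on the production and profit figures of Example 2.6.

```python
from fractions import Fraction


def line_from_point_and_gradient(m, point):
    """Return (m, k) for y = m*x + k, given the gradient m and a point (x1, y1).

    From y - y1 = m(x - x1) we get y = m*x + (y1 - m*x1), so k = y1 - m*x1.
    """
    x1, y1 = point
    return m, y1 - m * x1


def line_through(p1, p2):
    """Return (m, k) for the non-vertical line through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # raises ZeroDivisionError if x1 == x2
    return line_from_point_and_gradient(m, p1)


print(line_from_point_and_gradient(10, (2, 3)))  # (10, -17)
print(line_through((1, -7), (2, 3)))             # (Fraction(10, 1), Fraction(-17, 1))
print(line_through((40, -4000), (80, 1000)))     # (Fraction(125, 1), Fraction(-9000, 1))
```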
Example 2.5 If y is the distance travelled and x is the time taken, then a linear
equation that relates these two variables would give us a straight line that represents
the distance travelled in terms of the time taken. In this case, the gradient of the
line, i.e.
$$m = \frac{\Delta y}{\Delta x},$$
is the speed of the object whose motion we are considering.
If the x variable is time, as in this example, we often call the gradient the rate of change
of y. So, in this example, speed is the rate of change of distance, as one might expect. If
the x variable is something else, then we call the gradient the rate of change of y with
respect to x.
In economics, if x measures the quantity being produced, then gradients are usually
referred to as marginals. So, for instance, the rate of change of profit with respect to
the amount produced would be the marginal profit and so on. To motivate this, let’s
consider another example.
Example 2.6 Suppose that a factory produces a certain product and the profit,
when written in terms of the amount produced, is thought to be linear. One year,
they produce 40 units and lose £4,000, while the next year, production is doubled
to 80 units resulting in a profit of £1,000. What is the equation describing the profit
in this case?
One way to start is to denote the profit by π and the amount produced by x. Then,
since we know that we are looking for a linear expression, we can use the form
π = mx + k, where the gradient is
$$m = \frac{1000 - (-4000)}{80 - 40} = \frac{5000}{40} = 125,$$
so that
π = 125x + k.
We can now find k, as we know that the point (80, 1000) must satisfy this linear
relationship, i.e. we must have 1000 = 125(80) + k = 10,000 + k, so that k = −9,000. Hence
π = 125x − 9,000,
and this can be verified by showing that it is also satisfied by the point (40, −4000).
So, in this example, the gradient of the straight line is the marginal profit of the factory.
That is, when quantities like profit and production are related by a straight line, we say
that the marginal profit is the gradient of that line, i.e. the change in profit divided by
the change in production.
2x + y = 4.
But, what if we want to find the points (x, y) that two lines have in common? That is,
what if we have two linear equations, say,
2x + y = 4 and x − y = −1,
and we want to find the points (x, y) that are solutions to both of them? In such cases,
we say that we are solving the two equations simultaneously, we call the pair of
equations simultaneous equations and we usually denote this by ‘pairing’ them with a
curly bracket, i.e.
$$\left.\begin{aligned} 2x + y &= 4 \\ x - y &= -1 \end{aligned}\right\}$$
Sometimes, we will refer to a collection of two or more equations, such as the ones
above, as a system of linear equations. We now turn our attention to visualising what
the solution to a pair of simultaneous equations is and how to solve them using algebra.
Figure 2.4: Finding points of intersection. (a) The lines represented by the linear equations
2x + y = 4 and x − y = −1 intersect at the point (1, 2). (b) The x and y-intercepts of the
line represented by the linear equation 2x + y = 4 are its points of intersection with the
lines y = 0 (i.e. the x-axis) and x = 0 (i.e. the y-axis) respectively.
i.e. the point that the lines they represent have in common, is the point (1, 2) where the
two lines intersect. However, finding such points from pictures, no matter how well they
are drawn, can be inaccurate, and so we want to develop an algebraic method for
solving such equations.
In fact, we have already seen examples of such an algebraic method, for example, when we
found the x and y-intercepts of the line represented by the linear equation
2x + y = 4, as illustrated in Figure 2.3(b). In this case, finding the x-intercept of the
line involved finding the point (x, y) that lies on both the line and the x-axis, i.e. this
involved solving the simultaneous equations
$$\left.\begin{aligned} 2x + y &= 4 \\ y &= 0 \end{aligned}\right\}$$
whereas finding the y-intercept of the line involved finding the point (x, y) that lies on
both the line and the y-axis, i.e. this involved solving the simultaneous equations
$$\left.\begin{aligned} 2x + y &= 4 \\ x &= 0 \end{aligned}\right\}$$
Method I: Substitution
This method involves rearranging one of the two equations so that it is in the form
y = mx + k, say, and then using this to substitute for the y in the other equation. This
yields an equation that allows us to solve for x. We can then find y by substituting this
value of x back into our equation of the form y = mx + k.
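The substitution method can be sketched for a general pair of equations $a_1x + b_1y = c_1$ and $a_2x + b_2y = c_2$. This is our own illustration, assuming $b_1 \neq 0$ (so that the first equation can be put in the form y = mx + k) and that the system has a unique solution. It is applied to the pair 2x + y = 7 and x − 2y = 1, and to the pair 2x + y = 4 and x − y = −1 from Figure 2.4(a).

```python
from fractions import Fraction


def solve_by_substitution(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by substitution.

    Each equation is a tuple (a, b, c). Assumes b1 != 0 and that the
    system has a unique solution.
    """
    a1, b1, c1 = (Fraction(v) for v in eq1)
    a2, b2, c2 = (Fraction(v) for v in eq2)
    # Rearrange the first equation into the form y = m*x + k.
    m, k = -a1 / b1, c1 / b1
    # Substitute into the second: a2*x + b2*(m*x + k) = c2, a linear equation in x.
    x = (c2 - b2 * k) / (a2 + b2 * m)
    # Substitute x back into y = m*x + k to find y.
    y = m * x + k
    return x, y


x, y = solve_by_substitution((2, 1, 7), (1, -2, 1))
print(x, y)  # 3 1
x, y = solve_by_substitution((2, 1, 4), (1, -1, -1))
print(x, y)  # 1 2
```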
Activity 2.7 Verify that the solution to these simultaneous equations is x = 3 and
y = 1 by showing that these values satisfy the two original linear equations.
Activity 2.8 Make x the subject of the second equation in Example 2.7 and use
this to find the solution to the given simultaneous equations.
Method II: Elimination
This method involves multiplying each equation by a specially chosen number, namely
the number that makes the coefficients of one of the variables the same in both
equations. Then, by subtracting one equation from the other, we can eliminate that
variable and hence solve what is left for the other.
$$\left.\begin{aligned} 2x + y &= 7 \\ x - 2y &= 1 \end{aligned}\right\}$$
2x + y = 7 multiply by 1 to get 2x + y = 7
x − 2y = 1 multiply by 2 to get 2x − 4y = 2
and subtracting gives 5y = 5
which tells us that y = 1. Then, using this value of y in either of the original
equations, say the second, we see that x = 3. Thus, the solution is x = 3 and y = 1.
Activity 2.10 Repeat the calculation in this example, but instead of eliminating x
as we did above, use your multiplications to eliminate y.
A warning
With either of these methods, we may find that when we eliminate one variable, we
eliminate the other variable as well, ending up with something like 0 = 0 or 2 = 5. In
such cases we conclude that:
If we get the former, i.e. we get something which is always true, this means that
our simultaneous equations have an infinite number of solutions. This occurs when
the two lines that are represented by our simultaneous equations are actually just
the same line and so every point on this line is a solution as every point is a point
of intersection.
If we get the latter, i.e. we get something which is never true, this means that our
simultaneous equations have no solutions. This occurs when the two lines that are
represented by our simultaneous equations are parallel, i.e. they never intersect,
and so no point on either of the lines can be a solution.
Thus, we can see that in such cases the two lines always have the same gradient, i.e. they
are parallel. If the parallel lines are actually the same line we get an infinite number of
solutions, and if they are different lines we get no solutions.
One way of seeing when such cases occur is to note that parallel lines have the same
gradient. As such, if you find that your simultaneous equations represent lines with the
same gradient, the question is whether they always intersect (e.g. they have the same
y-intercept as well) or whether they never intersect (e.g. they have different
y-intercepts).
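The elimination method and the warning above can be combined into one sketch. This is our own illustration: it multiplies the first equation by $a_2$ and the second by $a_1$ (in the example above, by 1 and 2 respectively), subtracts to eliminate x and, when y disappears as well, decides between "infinitely many" and "none" by checking whether one equation is a multiple of the other. For genuine lines this amounts to the same thing as getting 0 = 0 rather than something like 2 = 5. The parallel-line pairs in the usage lines are made up for illustration.

```python
from fractions import Fraction


def solve_by_elimination(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by elimination.

    Each equation is a tuple (a, b, c) describing a genuine line
    (a and b not both zero). Returns ("unique", (x, y)), ("none", None)
    or ("infinitely many", None).
    """
    a1, b1, c1 = (Fraction(v) for v in eq1)
    a2, b2, c2 = (Fraction(v) for v in eq2)
    # Multiply the first equation by a2 and the second by a1, so both have
    # the same x-coefficient, then subtract: (a2*b1 - a1*b2) y = a2*c1 - a1*c2.
    y_coeff = a2 * b1 - a1 * b2
    if y_coeff != 0:
        y = (a2 * c1 - a1 * c2) / y_coeff
        # Substitute y back into an original equation with a non-zero x-coefficient.
        x = (c1 - b1 * y) / a1 if a1 != 0 else (c2 - b2 * y) / a2
        return "unique", (x, y)
    # y has been eliminated too, so the lines are parallel. They are the same
    # line exactly when one equation is a multiple of the other.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many", None
    return "none", None


print(solve_by_elimination((2, 1, 7), (1, -2, 1)))  # ('unique', (Fraction(3, 1), Fraction(1, 1)))
print(solve_by_elimination((1, 1, 1), (1, 1, 3)))   # ('none', None)
print(solve_by_elimination((1, 1, 1), (2, 2, 2)))   # ('infinitely many', None)
```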
The level of demand, q, for a product depends on the [per-unit] price, p, of the
product. Generally, the level of demand falls as the price increases, and so a line
representing this relationship between p and q must have a negative gradient. We
generally denote the demand line by D.
The point where the supply and demand lines intersect is called the equilibrium point.
In theory, this is the point where the market stabilises since, at this point, the [per-unit]
price is such that the levels of supply and demand are equal. This is illustrated in
Figure 2.5.
Figure 2.5: Representing the supply, S, and demand, D, by lines, the equilibrium point
is where the two lines intersect. Notice that, at this point, the [per-unit] price, p, is such
that the levels of supply and demand, q, are equal.
Activity 2.12 If demand is given by the equation 2q + 5p = 500 and supply is given
by 3q = 25 + 7p, what are the equilibrium price and quantity?
Learning outcomes
use a linear equation in two variables to draw the corresponding straight line;
Exercises
Exercise 2.1
For the following linear equations, find two points that are solutions to the equation and
hence draw the straight line.
i. 3x + 4y = 12; iii. x − 2y = 4;
In each case, use your two points to calculate the gradient of the straight line and use
the equation of the straight line to verify that your answer for the gradient is correct.
Exercise 2.2
Draw the straight lines that go through the following pairs of points.
ii. (0, −3) and (3, 0); iv. (−2, 3) and (4, 6).
In each case, find the equation of the line you have drawn.
Exercise 2.3
Find the equations of the lines with the following properties.
i. A line that passes through the point $(8, -1)$ and has a gradient of $\frac{1}{4}$.
ii. A line with a gradient of −6 and a y-intercept with coordinates (0, 45).
iii. A line with a gradient of $\frac{7}{2}$ and an x-intercept with coordinates $\left(-\frac{11}{2}, 0\right)$.
Exercise 2.4
A company increased its weekly production from 20 to 25 units and found that its costs
went up by £800 per week. Assuming that the relationship between costs and production is linear, find the marginal cost of production.
Given that the original cost was £5,000, find the linear equation that relates the costs to production.
If the selling price was £200 per unit, how many more units does the company need to produce in order to break even?
Exercise 2.5
Solve the following sets of simultaneous equations.
i. $x + 2y = 7$ and $x - 3y = -3$;
ii. $2x + 5y = 11$ and $3x + 3y = 12$;
iii. $4x + 2y = 5$ and $2x + y = 2$;
iv. $4x + 2y = 4$ and $2x + y = 2$.
Exercise 2.6
The demand for a product, q, is related to the price, p, by the equation q = 200 − 2p
while suppliers respond to a price of p by supplying an amount, q, given by the equation
q = 3p − 200. Find the equilibrium price and the corresponding level of production.
3. Review III — Quadratic equations and parabolae
In this unit we see how algebra allows us to solve quadratic equations by factorising and
completing the square. We then see, more generally, that quadratics can represent
special curves known as parabolae and we see how to sketch them.
Aims
To see how to write quadratics in their factorised and completed square forms.
$ax^2 + bx + c = 0$,
where $a \neq 0$, $b$ and $c$ are constants.¹ As such, we refer to expressions like the one on the left-hand side of this equation as quadratics and we call the constants $a$, $b$ and $c$ the coefficients of the quadratic. In this section we shall investigate several ways in which we can solve such equations.
¹ Notice that, if $a = 0$, then we have $bx + c = 0$ and this is a linear equation. That is, to be a quadratic equation in $x$, there must be an $x^2$ term in the equation.
3.1.1 Factorising
One way of solving a quadratic equation like
$ax^2 + bx + c = 0$
is to factorise it. This involves writing $ax^2 + bx + c$ as the product of two linear factors, i.e. we want to ‘put brackets in’ so that we can write
$ax^2 + bx + c = (Ax + B)(Cx + D)$,
for some constants $A$, $B$, $C$ and $D$. If we can do this, we can then rewrite the quadratic equation as
$(Ax + B)(Cx + D) = 0$,
and this helps us because the product on the left-hand side can only equal zero if at least one of the linear factors in the brackets is equal to zero. That is, the solutions to our quadratic equation must be the solutions to the two linear equations
$Ax + B = 0$ and $Cx + D = 0$.
Thus, as $A, C \neq 0$,² the solutions will be given by
$x = -\frac{B}{A}$ and $x = -\frac{D}{C}$,
which we can easily find given the constants $A$, $B$, $C$ and $D$. Consequently, we see that if we can factorise the quadratic in this way, we can easily solve the quadratic equation.
But, how can we go about factorising a quadratic? The basic idea involves the identity
$(x + \alpha)(x + \beta) = x^2 + (\alpha + \beta)x + \alpha\beta$,
which tells us how a certain factorised form, i.e. the left-hand side, is related to a certain quadratic, i.e. the right-hand side. So, reading this the other way, if we have the quadratic
$x^2 + (\alpha + \beta)x + \alpha\beta$,
we can factorise it by simply taking the numbers $\alpha$ and $\beta$ to get the factorised form
$(x + \alpha)(x + \beta)$.
But, of course, the problem is that we do not know what the numbers $\alpha$ and $\beta$ are! So, in this relatively simple case, where $a$ (the $x^2$ coefficient) is one, we will have a quadratic like
$x^2 + bx + c$,
and we need to find the numbers $\alpha$ and $\beta$ which add together to give us $b$ (as we need $b = \alpha + \beta$) and which multiply together to give us $c$ (as we need $c = \alpha\beta$). Then, if we can find numbers $\alpha$ and $\beta$ that do this, we will have
$x^2 + bx + c = x^2 + (\alpha + \beta)x + \alpha\beta = (x + \alpha)(x + \beta)$,
as the required factorised form. The trick, then, is to take the values of $b$ and $c$, think carefully about which numbers $\alpha$ and $\beta$ could add and multiply in the right kind of way, and hopefully settle on the ones that will make everything work.
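When $b$ and $c$ are integers, this trial-and-error search can be automated. The following Python sketch (the function name `factor_monic` is hypothetical) tries every integer divisor $\alpha$ of $c$ and checks whether $\beta = c/\alpha$ also satisfies $\alpha + \beta = b$. It only looks for integer factorisations, so a result of `None` does not mean that the quadratic equation has no solutions.

```python
def factor_monic(b, c):
    """Find integers alpha, beta with alpha + beta == b and alpha * beta == c.

    Returns (alpha, beta) so that x^2 + b*x + c == (x + alpha)(x + beta),
    or None if no such pair of integers exists.
    """
    if c == 0:
        # x^2 + b*x = x(x + b), so alpha = 0 and beta = b
        return (0, b)
    for alpha in range(-abs(c), abs(c) + 1):
        if alpha != 0 and c % alpha == 0:
            beta = c // alpha  # exact, since alpha divides c
            if alpha + beta == b:
                return (alpha, beta)
    return None
```

For instance, `factor_monic(-1, -6)` returns `(-3, 2)`, i.e. $x^2 - x - 6 = (x - 3)(x + 2)$, while `factor_monic(1, 1)` returns `None` because no pair of integers adds to 1 and multiplies to 1.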
However, generally speaking, factorising any given quadratic can be tricky (especially in
the more general case where a, the x2 coefficient, is not one) and so this method for
solving quadratic equations is not always that useful. Nevertheless, because it is so nice
when it works, we will consider some examples below before we move on to some more
‘reliable’ methods.
² This must be the case since $AC = a$ and, as $a \neq 0$, neither $A$ nor $C$ can be zero.
To solve the quadratic equation $x^2 - x - 6 = 0$, we start by factorising the quadratic $x^2 - x - 6$, i.e. we need two numbers that add together to give us $-1$ and multiply together to give us $-6$. A little thought should convince you that the required numbers are $+2$ and $-3$, which means that we have
$x^2 - x - 6 = (x + 2)(x - 3)$,
as you can easily verify by multiplying out the brackets. This means that we can rewrite the quadratic equation as
$(x + 2)(x - 3) = 0$,
and so the solutions must satisfy the linear equations
$x + 2 = 0$ and $x - 3 = 0$,
i.e. the solutions are $x = -2$ and $x = 3$. When this happens, we say that we have two distinct solutions.
Activity 3.1 Verify that x = −2 and x = 3 are solutions to the quadratic equation
in Example 3.1 by substituting them into the left-hand side of the equation and
showing that they give zero.
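Similarly, to solve the quadratic equation $x^2 - 4x + 4 = 0$, we need two numbers that add together to give us $-4$ and multiply together to give us $4$. In this case, the required numbers are $-2$ and $-2$, which means that we have
$x^2 - 4x + 4 = (x - 2)(x - 2)$,
as you can easily verify by multiplying out the brackets. This means that we can rewrite the quadratic equation as
$(x - 2)(x - 2) = 0$,
and so the solutions must satisfy the linear equations
$x - 2 = 0$ and $x - 2 = 0$,
i.e. both solutions are $x = 2$. When this happens, we say that $x = 2$ is a repeated solution.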
Unfortunately, as mentioned above, we will meet quadratic equations which are difficult to solve by factorisation. For this reason, we now seek a method that will always work. The method that we will use here requires us to complete the square of the quadratic instead of factorising it. So we now consider how to perform this procedure and then, having done this, we will see how to use it to solve quadratic equations.
For any number $k$, we have
$(x + k)^2 = x^2 + 2kx + k^2$,
and, since we can write $x^2 + 2kx + k^2$ as $(x + k)^2$ in this way, we say that it is a perfect square. That is, it can be written as something (in this case, $x + k$) squared and nothing else.
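For instance, $x^2 + 6x + 9$ is a perfect square since
$x^2 + 6x + 9 = (x + 3)^2$.
However, $x^2 + 6x + 10$ is not a perfect square since it can only be written as
$x^2 + 6x + 10 = (x^2 + 6x + 9) + 1 = (x + 3)^2 + 1$,
and not as ‘something squared and nothing else’ due to the presence of the ‘+1’ on the right-hand side.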
Now imagine that we have a quadratic expression of the form $x^2 + 2kx$ and we want to complete the square. That is, we want to find something that we can add to this expression in order to get a perfect square. The idea is that, since
$x^2 + 2kx + k^2 = (x + k)^2$,
adding $k^2$ to $x^2 + 2kx$ gives us a perfect square. Subtracting $k^2$ from both sides then gives
$x^2 + 2kx = (x + k)^2 - k^2$,
where, on the right-hand side, we now have a perfect square, i.e. $(x + k)^2$, plus something which doesn’t depend on $x$, i.e. $-k^2$. When we write $x^2 + 2kx$ in this way, we have what we call its completed square form.
Example 3.4 The quadratic expression $x^2 - 4x$ can be made into a perfect square by adding $(-2)^2 = 4$ to it, i.e.
$x^2 - 4x + 4 = (x - 2)^2$.
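More generally, to find the completed square form of a quadratic expression like
$x^2 + 2kx + c$,
we use the fact that
$x^2 + 2kx = (x + k)^2 - k^2$,
to get
$x^2 + 2kx + c = (x + k)^2 - k^2 + c = (x + k)^2 + (c - k^2)$,
which is the completed square form of $x^2 + 2kx + c$ since, on the right-hand side, we now have a perfect square, i.e. $(x + k)^2$, plus something which doesn’t depend on $x$, i.e. $c - k^2$.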
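In code, the completed square form of $x^2 + 2kx + c$ needs only $k$ (half of the $x$ coefficient) and the constant $c - k^2$. A minimal Python sketch, with the hypothetical name `complete_square_monic` and taking the $x$ coefficient as $b = 2k$, using exact fractions so that halves are not rounded:

```python
from fractions import Fraction


def complete_square_monic(b, c):
    """Write x^2 + b*x + c as (x + k)^2 + m.

    Since b = 2k, we have k = b/2, and then m = c - k^2.
    Returns the pair (k, m) as exact fractions.
    """
    k = Fraction(b) / 2
    m = Fraction(c) - k * k
    return k, m
```

For example, `complete_square_monic(-4, 3)` returns `(Fraction(-2, 1), Fraction(-1, 1))`, i.e. $x^2 - 4x + 3 = (x - 2)^2 - 1$.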
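For example, to find the completed square form of $x^2 - 4x + 3$, we note that, as
$x^2 - 4x + 4 = (x - 2)^2$,
we have
$x^2 - 4x = (x - 2)^2 - 4$,
and so
$x^2 - 4x + 3 = (x^2 - 4x) + 3 = (x - 2)^2 - 4 + 3 = (x - 2)^2 - 1$,
which is its completed square form.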
We can find the completed square form of even more complicated quadratic expressions, like $ax^2 + 2kx + c$, by using brackets to break the expression down into simpler parts as we did above.
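To find the completed square form of $-2x^2 + 8x + 10$, we start by putting in brackets so that we can work with something which is similar to what we saw above, i.e. we want a quadratic expression where the $x^2$ coefficient is one. This means that we want to write
$-2x^2 + 8x + 10 = -2(x^2 - 4x) + 10$.
Then, using the completed square form $x^2 - 4x = (x - 2)^2 - 4$ from above, we get
$-2x^2 + 8x + 10 = -2(x^2 - 4x) + 10 = -2\left[(x - 2)^2 - 4\right] + 10 = -2(x - 2)^2 + 8 + 10 = -2(x - 2)^2 + 18$,
as the completed square form.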
Activity 3.4 Verify that, in the previous examples, the completed square form of
the expression is indeed equal to the original expression by multiplying out the
brackets.
Activity 3.5 Find the completed square form of −2x2 + 4x + 8 and verify that your
answer is correct by multiplying out the brackets.
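Example 3.7 Solve the quadratic equation $x^2 - 4x = 0$ by completing the square.
We know that we can write
$x^2 - 4x = (x - 2)^2 - 4$,
in completed square form. This means that the quadratic equation we have to solve is
$(x - 2)^2 - 4 = 0$.
This is easily rearranged to get
$(x - 2)^2 = 4$,
and then, if we take the square root of both sides, we get
$x - 2 = \pm 2$.
Hence, the solutions to our quadratic equation are given by $x = 2 \pm 2$, i.e. $x = 4$ and $x = 0$.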
To verify that these are the solutions, we could substitute these values into the
left-hand side of the equation and check that we get zero. Or, alternatively, we can
verify our answer by solving this quadratic equation by factorising, i.e.
$x^2 - 4x = 0 \implies x(x - 4) = 0 \implies x = 0$ and $x = 4$,
as before.
Example 3.8 Solve the quadratic equation $x^2 - 4x + 3 = 0$ by completing the square.
We saw above that we can write
$x^2 - 4x + 3 = (x - 2)^2 - 1$
in completed square form. This means that the quadratic equation we have to solve is the same as
$(x - 2)^2 - 1 = 0$.
This is easily rearranged to get
$(x - 2)^2 = 1$,
and then, if we take the square root of both sides, we get
$x - 2 = \pm 1$.
Hence, the solutions to our quadratic equation are given by $x = 2 \pm 1$, i.e. $x = 3$ and $x = 1$.
To verify that these are the solutions, we could substitute these values into the left-hand side of the equation and check that we get zero. Or, alternatively, we can verify our answer by solving this quadratic equation by factorising, i.e.
$x^2 - 4x + 3 = 0 \implies (x - 1)(x - 3) = 0 \implies x = 1$ and $x = 3$,
as before.
Hence, the solutions to our quadratic equation are given by $x = 2 \pm 3$, i.e. $x = 5$ and $x = -1$.
To verify that these are the solutions, we could substitute these values into the left-hand side of the equation and check that we get zero. Or, alternatively, we can verify our answer by solving this quadratic equation by factorising, i.e.
$-2x^2 + 8x + 10 = 0 \implies x^2 - 4x - 5 = 0 \implies (x - 5)(x + 1) = 0 \implies x = 5$ and $x = -1$,
as before.
Activity 3.6 In Examples 3.1 and 3.2, we solved the quadratic equations
$x^2 - x - 6 = 0$ and $x^2 - 4x + 4 = 0$
by factorising. Verify your answers by solving these equations by completing the square.
Activity 3.7 In Activity 3.3, you were asked to solve the quadratic equation
x2 + 7x + 12 = 0 by factorising. Verify your answer by solving it by completing the
square.
3.1.4 Warning!
So far, we have looked at the solutions to several quadratic equations and we have
found that there can be either two distinct solutions or one repeated solution. But, this
is not always the case! Consider the quadratic equation
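$ax^2 + bx + c = 0$,
for some numbers $a$, $b$ and $c$ with $a \neq 0$. When written in completed square form this will give us
$a(x + p)^2 - q = 0$,
which we can rearrange to get $(x + p)^2 = q/a$. If $q/a > 0$, taking square roots gives two distinct solutions, $x = -p \pm \sqrt{q/a}$; if $q/a = 0$, we get the one repeated solution $x = -p$; and if $q/a < 0$, there are no [real] solutions since the square root of a negative number does not exist.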
Consequently, we can see that a quadratic equation can have two, one or no [real] solutions depending on what happens when we rearrange the completed square form.
We shall investigate the consequences of this observation in the following sections.
the formulae
$p = \frac{b}{2a}$ and $q = \frac{b^2}{4a} - c$
tell us the values of $p$ and $q$.
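These formulae give a mechanical version of the completing-the-square method. The following Python sketch (both function names are hypothetical) computes $p$ and $q$ exactly with fractions and then solves $a(x + p)^2 - q = 0$ by rearranging it to $(x + p)^2 = q/a$, returning no, one or two solutions according to the sign of $q/a$.

```python
import math
from fractions import Fraction


def complete_square(a, b, c):
    """Return (p, q) such that a*x^2 + b*x + c == a*(x + p)^2 - q.

    Uses p = b / (2a) and q = b^2 / (4a) - c, which requires a != 0.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    p = b / (2 * a)
    q = b * b / (4 * a) - c
    return p, q


def solve_by_completing_square(a, b, c):
    """Solve a*x^2 + b*x + c = 0 by rearranging a*(x + p)^2 - q = 0."""
    p, q = complete_square(a, b, c)
    r = q / Fraction(a)  # (x + p)^2 = q / a
    if r < 0:
        return []  # no [real] solutions
    if r == 0:
        return [float(-p)]  # one repeated solution
    s = math.sqrt(float(r))
    return [float(-p) - s, float(-p) + s]  # two distinct solutions
```

For instance, `solve_by_completing_square(1, -4, 3)` returns `[1.0, 3.0]`, agreeing with Example 3.8, and `complete_square(-2, 8, 10)` gives $p = -2$ and $q = -18$, i.e. $-2x^2 + 8x + 10 = -2(x - 2)^2 + 18$.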
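Quadratic formula
The solutions to the quadratic equation $ax^2 + bx + c = 0$, where $a \neq 0$, are given by
$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
This formula and its use should be familiar to everyone and so we will only give one example of its use.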
Activity 3.9 In Examples 3.7, 3.8 and 3.9, we solved the quadratic equations
$x^2 - 4x = 0$, $x^2 - 4x + 3 = 0$ and $-2x^2 + 8x + 10 = 0$,
by completing the square. Use the quadratic formula to verify your answers.
You should also note that this formula comes from our method of solving quadratic equations by completing the square. Indeed, the conditions from Section 3.1.4 for two, one or no solutions can also be written as:
If $b^2 - 4ac > 0$, we will get two distinct [real] solutions from the quadratic formula.
If $b^2 - 4ac = 0$, we will get one repeated [real] solution from the quadratic formula.
If $b^2 - 4ac < 0$, we will get no [real] solutions from the quadratic formula.⁴
We also note in passing that the quantity $b^2 - 4ac$ is called the discriminant.
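The three cases above depend only on the sign of the discriminant. A minimal Python sketch of the quadratic formula follows; the function name `quadratic_roots` is hypothetical, and because it uses floating-point arithmetic, irrational solutions are only approximate.

```python
import math


def quadratic_roots(a, b, c):
    """Solve a*x^2 + b*x + c = 0 (a != 0) using the quadratic formula.

    The sign of the discriminant b^2 - 4ac decides how many [real]
    solutions there are; they are returned in increasing order.
    """
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no [real] solutions
    if disc == 0:
        return [-b / (2 * a)]  # one repeated solution
    root = math.sqrt(disc)
    x1 = (-b - root) / (2 * a)
    x2 = (-b + root) / (2 * a)
    return sorted([x1, x2])  # two distinct solutions
```

For example, `quadratic_roots(1, -4, 3)` returns `[1.0, 3.0]`, the same solutions found for $x^2 - 4x + 3 = 0$ by completing the square.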
3.2 Parabolae
In Unit 2 we saw that if we had a linear equation in two variables, say $ax + by = c$, this represented a straight line. Indeed, we saw that oblique straight lines had equations of the form
$y = mx + k$,
with $m \neq 0$. We now turn our attention to the curves which are represented by equations of the form
$y = ax^2 + bx + c$,
where $a \neq 0$ so that we can be sure that we are dealing with a quadratic expression in $x$. Two such curves, called parabolae, are illustrated in Figure 3.1. Observe that, unlike straight lines, parabolae have a minimum, like the point with coordinates $(2, -1)$ in Figure 3.1(a), or a maximum, like the point with coordinates $(1, 4)$ in Figure 3.1(b). Points like these, where the curve ‘stops going down’ or ‘stops going up’, are called turning points.
⁴ Again, this is because the square root of a negative number does not exist!
[Figure 3.1: panel (a) sketches $y = x^2 - 4x + 3$; panel (b) sketches $y = -x^2 + 2x + 3$.]
Figure 3.1: Two parabolae and their ‘key features’. (a) The parabola with equation $y = x^2 - 4x + 3$ has a minimum with coordinates $(2, -1)$, the y-intercept is $y = 3$ and the x-intercepts are $x = 1$ and $x = 3$. (b) The parabola with equation $y = -x^2 + 2x + 3$ has a maximum with coordinates $(1, 4)$, the y-intercept is $y = 3$ and the x-intercepts are $x = -1$ and $x = 3$.
We start by noting that, following on from Example 3.5, we can write the equation of this parabola as
$y = (x - 2)^2 - 1$,
in completed square form. This will enable us to find the ‘key features’ of the parabola as follows.
The y-intercept of the parabola occurs when $x = 0$ and so, substituting $x = 0$ into the original form of its equation, we get $y = 3$ as the y-intercept.
The x-intercepts of the parabola occur when $y = 0$ and so we have to solve the quadratic equation
$x^2 - 4x + 3 = 0$,
which, as we saw in Example 3.8, is easily done if we use the completed square form. Thus, as we saw there, the solutions to the quadratic equation above are $x = 1$ and $x = 3$, which means that these values of $x$ are the x-intercepts.
The turning point of the parabola can be found by using the completed square form of its equation. In this case, as we know that $(x - 2)^2 \geq 0$ for all [real] values of $x$, we can see that
$y = (x - 2)^2 - 1 \geq -1$,
and so, as $y$ must always be greater than or equal to $-1$, this must be the minimum value of $y$, which occurs when $(x - 2)^2 = 0$, i.e. when $x = 2$. Thus, the turning point is a minimum with coordinates $(2, -1)$.
With this information, we can plot the ‘key features’ of the parabola on the axes and
draw a nice parabolic shape through them to get the sketch in Figure 3.1(a).
Let’s now consider an example where we haven’t done most of the work before.
We start by finding the completed square form of the equation of the parabola. This can be found by writing
$y = -x^2 + 2x + 3 = -(x^2 - 2x) + 3$,
so that, because
$x^2 - 2x + 1 = (x - 1)^2$, i.e. $x^2 - 2x = (x - 1)^2 - 1$,
we get
$y = -\left[(x - 1)^2 - 1\right] + 3 = -(x - 1)^2 + 1 + 3 = -(x - 1)^2 + 4$,
in completed square form. This enables us to find the ‘key features’ of the parabola as follows.
The x-intercepts of the parabola occur when $y = 0$ and so we have to solve the quadratic equation
$-x^2 + 2x + 3 = 0$.
But, using the completed square form, this gives us
$-(x - 1)^2 + 4 = 0 \implies (x - 1)^2 = 4 \implies x - 1 = \pm 2$,
i.e. $x = -1$ and $x = 3$ are the x-intercepts.
The turning point of the parabola can be found by using the completed square form of its equation. In this case, as we know that $(x - 1)^2 \geq 0$ for all [real] values of $x$, we can see that
$y = -(x - 1)^2 + 4 \leq 4$,
and so, as $y$ must always be less than or equal to $4$, this must be the maximum value of $y$, which occurs when $(x - 1)^2 = 0$, i.e. when $x = 1$. Thus, the turning point is a maximum with coordinates $(1, 4)$.
With this information, we can plot the ‘key features’ of the parabola on the axes and
draw a nice parabolic shape through them to get the sketch in Figure 3.1(b).
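The steps used in these two examples can be collected into one routine. This Python sketch (the function name `key_features` is hypothetical) reads the turning point off the completed square form $a(x + p)^2 - q$, using $p = b/(2a)$ and $q = b^2/(4a) - c$ from earlier in this unit, and finds the x-intercepts by solving $(x + p)^2 = q/a$.

```python
import math


def key_features(a, b, c):
    """Return the 'key features' of the parabola y = a*x^2 + b*x + c.

    Uses the completed square form y = a*(x + p)^2 - q with
    p = b/(2a) and q = b^2/(4a) - c, so the turning point is (-p, -q).
    """
    if a == 0:
        raise ValueError("a must be non-zero for a parabola")
    p = b / (2 * a)
    q = b * b / (4 * a) - c
    kind = "minimum" if a > 0 else "maximum"
    # x-intercepts solve a*(x + p)^2 - q = 0, i.e. (x + p)^2 = q/a
    r = q / a
    if r < 0:
        x_intercepts = []
    elif r == 0:
        x_intercepts = [-p]
    else:
        s = math.sqrt(r)
        x_intercepts = [-p - s, -p + s]
    return {
        "turning_point": (-p, -q),
        "kind": kind,
        "y_intercept": c,  # the value of y when x = 0
        "x_intercepts": x_intercepts,
    }
```

For $y = x^2 - 4x + 3$, `key_features(1, -4, 3)` reports a minimum at $(2, -1)$, y-intercept $3$ and x-intercepts $1$ and $3$, matching Figure 3.1(a).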
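One thing to note from what we have seen so far is the result of the following activity.
We have seen that any quadratic can be written in completed square form as
$ax^2 + bx + c = a(x + p)^2 - q$,
for certain values of $p$ and $q$. Using this fact, explain why the turning point of the parabola
$y = ax^2 + bx + c$
will have coordinates $(-p, -q)$ and why it will be a minimum if $a > 0$ and a maximum if $a < 0$.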
In particular, observe how the sign of a determines whether the parabola has a
maximum or a minimum.
To find the points of intersection, we want to find the values of $x$ that make the values of $y$ from both equations the same, i.e. we seek the values of $x$ that satisfy the equation
$x^2 - 4x + 3 = -x + 3$.
But, in this case, these are easily found because we can rearrange this to get a quadratic equation which is particularly easy to solve, namely
$x^2 - 3x = 0 \implies x(x - 3) = 0 \implies x = 0$ or $x = 3$.
Now we know the values of $x$, we can substitute them back into either equation to get the corresponding values of $y$. So, as $y = -x + 3$ is the easier equation, we use this to get
$x = 0 \implies y = 3$ and $x = 3 \implies y = 0$.
Thus, the required points of intersection between the parabola and the straight line have coordinates $(0, 3)$ and $(3, 0)$ as illustrated by the ‘•’s in Figure 3.2(a).
To find the points of intersection, we want to find the values of $x$ that make the values of $y$ from both equations the same, i.e. we seek the values of $x$ that satisfy the equation
$x^2 - 4x + 3 = -x^2 + 2x + 3$.
But, in this case, these are easily found because we can rearrange this to get a quadratic equation which is particularly easy to solve, namely
$2x^2 - 6x = 0 \implies x^2 - 3x = 0 \implies x(x - 3) = 0 \implies x = 0$ or $x = 3$.
Now we know the values of $x$, we can substitute them back into either equation to get the corresponding values of $y$. So, using $y = x^2 - 4x + 3$, we get
$x = 0 \implies y = 3$ and $x = 3 \implies y = 9 - 12 + 3 = 0$.
[Figure 3.2: panel (a) shows $y = x^2 - 4x + 3$ with the line $y = -x + 3$; panel (b) shows $y = x^2 - 4x + 3$ with the parabola $y = -x^2 + 2x + 3$.]
Figure 3.2: Returning to the parabola $y = x^2 - 4x + 3$ first seen in Figure 3.1(a) we can see: (a) its two points of intersection with the straight line $y = -x + 3$ and (b) its two points of intersection with the parabola $y = -x^2 + 2x + 3$ first seen in Figure 3.1(b). In both cases, the points of intersection are indicated by ‘•’s.
Thus, the required points of intersection between the two parabolae have coordinates
(0, 3) and (3, 0) as illustrated by the ‘•’s in Figure 3.2(b).
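Both examples follow the same recipe: set the two expressions for $y$ equal, solve the resulting quadratic (or linear) equation, then substitute back. A minimal Python sketch of that recipe, with the hypothetical function name `intersections` and each curve stored as a tuple `(a, b, c)` for $y = ax^2 + bx + c$ (so a straight line has $a = 0$):

```python
import math


def intersections(curve1, curve2):
    """Points where y = a1*x^2 + b1*x + c1 meets y = a2*x^2 + b2*x + c2.

    Each curve is a tuple (a, b, c); use a = 0 for a straight line.
    Setting the two right-hand sides equal gives A*x^2 + B*x + C = 0,
    where A = a1 - a2, B = b1 - b2 and C = c1 - c2.
    """
    a1, b1, c1 = curve1
    a2, b2, c2 = curve2
    A, B, C = a1 - a2, b1 - b2, c1 - c2
    if A == 0 and B == 0:
        if C == 0:
            raise ValueError("the two curves are identical")
        return []  # e.g. two parallel lines never meet
    if A == 0:
        xs = [-C / B]  # the difference is linear
    else:
        disc = B * B - 4 * A * C
        if disc < 0:
            xs = []
        elif disc == 0:
            xs = [-B / (2 * A)]
        else:
            root = math.sqrt(disc)
            xs = sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])
    # Substitute each x back into the first curve to get its y-coordinate.
    return [(x, a1 * x * x + b1 * x + c1) for x in xs]
```

For example, `intersections((1, -4, 3), (0, -1, 3))` returns `[(0.0, 3.0), (3.0, 0.0)]`, the two points marked in Figure 3.2(a).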
Learning outcomes
At the end of this unit, you should be able to:
identify the ‘key features’ of a parabola and use these to draw a sketch;
find the points of intersection of a parabola with a straight line or another parabola.
Exercises
Exercise 3.1
Multiply out the following brackets.
Exercise 3.2
Factorise the following quadratic expressions.
Exercise 3.3
Solve the following equations. Try factorising first and then completing the square. Use
the quadratic formula only as a last resort!
i. $x^2 = 5$; iv. $x^2 = -7x$;
ii. $x^2 + 4x - 5 = 0$; v. $2x^2 + 5x = 3$;
Exercise 3.4
For each of the following, complete the square and then sketch the graph.
i. $y = x^2 - 6x + 5$; iii. $y = -x^2 - 6x + 6$;
In each case, you should determine the coordinates of the turning point and the x and
y-intercepts.
Exercise 3.5
i. Sketch the parabola given by the equation y = 6 − x − x2 and the straight line
given by the equation y = 2x + 4 on the same set of axes.
ii. By solving the appropriate equation, find the points where the parabola and the
straight line intersect.
iii. Sketch another line, parallel to the first, which only intersects the parabola once
and calculate the y-intercept of this second line.
Exercise 3.6
A company sells its products in a market where the price, p, is linked to the quantity
sold, q, by the demand equation p = 120 − 2q.
i. Calculate the market price, and the revenue, if the company sells 35 units. What is
the revenue in terms of q?
The company incurs fixed costs of 400 and an additional cost of 12 for each unit produced.
ii. How much will it cost to produce 35 units? What is the total cost in terms of q?
iii. What profit will the company make from producing and selling 35 units? What is
the profit in terms of q?
iv. By completing the square, calculate the number of units that will maximise the
profit. What is the corresponding market price?
4. Functions
Unit 4: Functions
Overview
In this unit we introduce the idea of a function. This will play an important role in the
rest of this course and, in particular, it bridges the gap between what we have seen so
far and what we will see when we look at calculus.
Aims
To see how functions can be combined and how they can be used in economics.
4.1 Functions
In this unit, we want to introduce the idea of a function which, at the most basic level, is just a rule that turns an input into an output. In particular, when we talk about inputs and outputs we mean numbers, or more specifically, real numbers. These can be thought of in several ways but, essentially, every number that can be written as a (possibly never-ending) decimal is a real number. Alternatively, we can think of each real number as a point on a number line (and vice versa) as illustrated in Figure 4.1. Of course, in a way, we have already seen real numbers represented in this way as the x-axis is just a real number line which represents all the inputs a rule can have. And, similarly, if we think of the y-axis as another real number line which represents all the outputs a rule can have, we may start to appreciate that the curves we have been sketching in Units 2 and 3 are just ways of visualising how certain rules relate their inputs to their outputs.
[Figure 4.1: a number line from $-3$ to $3$ with $-\frac{1}{2}$, $\sqrt{2}$, $e$ and $\pi$ marked.]
Figure 4.1: The central portion of the real number line and some of the numbers on it. We will encounter the real numbers $e$ and $\pi$ shortly.
Now that we have an idea of what our inputs and outputs can be, let’s look more
closely at the relationship between rules and functions.
It is useful to think of a function, $f$, as a machine and to represent it as
$x \longrightarrow f \longrightarrow f(x)$,
as this indicates how each input, $x$, is ‘processed’ by the function $f$ to give the output $f(x)$. Indeed, observe that we can use any variable to represent the input and so, if we had used $t$ instead of $x$, the output would be $f(t)$ and we could write
$t \longrightarrow f \longrightarrow f(t)$,
to indicate how each input, $t$, is ‘processed’ by the function $f$ to give the output $f(t)$.
Once we have this notation, we can then capture the effect of any given function on
each input by using an appropriate formula to express the rule.
Example 4.1 Let’s say that the rule we want to capture is ‘square the number and
then add one’. This rule gives us a function, let’s call it f , which can be captured by
the formula $f(x) = x^2 + 1$ which tells us how each input, $x$, is ‘processed’ by $f$ to give the output $f(x)$. In particular, we can see that, if $x = 1$, the output is $f(1) = 1^2 + 1 = 2$ whereas if the input was $x = 2$, the output would be $f(2) = 2^2 + 1 = 5$.
Notice also that this rule does define a function because every input, $x$, gives rise to exactly one output, namely whatever number $x^2 + 1$ turns out to be. And, indeed, if we had chosen to use the variable $t$ instead of $x$ we would now be using the formula $f(t) = t^2 + 1$ to capture the effect of this function.
Activity 4.1 Following on from Example 4.1, find the values of $f(0)$, $f(-1)$ and $f(\sqrt{2})$.
However, not all rules will give us a function. For instance, if we had the rule ‘take the
square root of the number’, we find that
Negative numbers do not have [real] square roots and so this rule gives us no outputs when the input is a negative number, i.e. this rule cannot specify a function because we do not get at least one output for these inputs.
Positive numbers have two square roots (for example, $4$ has the square roots $2$ and $-2$) and so this rule gives us two outputs when the input is a positive number, i.e. this rule cannot specify a function because we do not get at most one output for these inputs.
So, we can see that when looking at whether a rule can define a function, we may need
to take some care when specifying what the inputs are and whether the rule itself
actually satisfies the ‘exactly one output’ requirement.
In what follows, we will look at some of the most common functions that occur in
mathematics and we will also see how these functions can be combined to make new
functions.
constant functions: which take the form $f(x) = k$ for some constant $k$,
linear functions: which take the form $f(x) = ax + b$ for some constants $a \neq 0$ and $b$,
quadratic functions: which take the form $f(x) = ax^2 + bx + c$ for some constants $a \neq 0$, $b$ and $c$.
In particular, we know what all of these functions look like because we saw how to sketch them in Units 2 and 3. More generally, these are examples of polynomial functions because they take the form
$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$
for some whole number $n \geq 0$ and some constants $a_0, a_1, \ldots, a_n$.
Exponential functions
Given a positive number $a \neq 1$, called the base, an exponential function has the form
$f(x) = a^x$,
and, depending on whether $0 < a < 1$ or $a > 1$, they give us curves like the ones illustrated in Figure 4.2. In particular, observe that $a^x \neq 0$ for all values of $x$.
The most important exponential function occurs when the base is the number $e$, which is approximately $2.71828$ (5dp). We will encounter this function, $e^x$, and see some reasons why it is so special in Units 5 and 9.
Two other functions that we will be interested in are the sine and cosine functions. You
are probably familiar with these from their use in problems involving triangles since we
know that
$\sin(\theta) = \frac{\text{opposite}}{\text{hypotenuse}}$ and $\cos(\theta) = \frac{\text{adjacent}}{\text{hypotenuse}}$,
using the right-angled triangle illustrated in Figure 4.3.
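[Figure 4.2: two curves $y = a^x$, one for $0 < a < 1$ and one for $a > 1$, each crossing the y-axis at $1$.]
[Figure 4.3: a right-angled triangle with angle $\theta$ and sides labelled opposite, adjacent and hypotenuse.]
Figure 4.3: The sine and cosine functions can be defined in terms of the sides of a right-angled triangle.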
In this course, however, when we talk about angles, we will measure them in radians
and not degrees. The basic idea here is that π radians, where the number π is
approximately 3.142 (3dp), is the same as 180 degrees and, using this, we can convert
angles in degrees to angles in radians using the formula
$\text{angle in radians} = \frac{\pi}{180} \times \text{angle in degrees}$.
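The conversion formula is a single multiplication. A minimal Python sketch (the function name `degrees_to_radians` is hypothetical) is:

```python
import math


def degrees_to_radians(angle_in_degrees):
    """Convert an angle in degrees to radians: (pi / 180) * degrees."""
    return math.pi / 180 * angle_in_degrees
```

Python's standard library already provides `math.radians` for the same conversion, and its `math.sin` and `math.cos` functions expect their arguments in radians, which is one practical reason for working in radians.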
So, if we use the triangles in Figure 4.4 to determine the most important values of these
functions, we would get the results given in the following table.
angle (degrees)   angle (radians)   sin(θ)          cos(θ)
30                $\pi/6$           $1/2$           $\sqrt{3}/2$
45                $\pi/4$           $1/\sqrt{2}$    $1/\sqrt{2}$
60                $\pi/3$           $\sqrt{3}/2$    $1/2$
Activity 4.2 Verify that the results in the table are correct.
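[Figure 4.4: (a) a right-angled triangle with angles $\pi/6$ and $\pi/3$, hypotenuse $2$ and other sides $1$ and $\sqrt{3}$; (b) a right-angled triangle with two angles of $\pi/4$, two sides of length $1$ and hypotenuse $\sqrt{2}$.]
Figure 4.4: The triangles which allow us to find $\sin(\theta)$ and $\cos(\theta)$ when (a) $\theta = \pi/6$ or $\theta = \pi/3$ and (b) $\theta = \pi/4$.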
More generally, as illustrated in Figure 4.5, we find that these functions are periodic with a period of $2\pi$ radians, a fact that we could express mathematically by writing
$\sin(x + 2\pi) = \sin(x)$ and $\cos(x + 2\pi) = \cos(x)$,
and we can also see that the cosine function is just the sine function ‘shifted to the left’ by $\pi/2$ radians, a fact that we could express mathematically by writing
$\cos(x) = \sin\left(x + \frac{\pi}{2}\right)$.
Observe, in particular, that some other important values of these functions are given in
the following table.
angle (degrees)   angle (radians)   sin(θ)   cos(θ)
0                 $0$               $0$      $1$
90                $\pi/2$           $1$      $0$
180               $\pi$             $0$      $-1$
[Figure 4.5: graphs of $y = \sin(x)$ and $y = \cos(x)$.]
Figure 4.5: The sine and cosine functions for $-\pi \leq x \leq 4\pi$. Notice, in particular, that they are both periodic with period $2\pi$ and that the cosine function is just the sine function ‘shifted to the left’ by $\pi/2$ radians.
Activity 4.3 What are sin(x) and cos(x) when x is 3π/2? 2π?
This may sound a bit abstract, but the following example should make it clear.
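Example 4.2 Suppose that the functions $f$ and $g$ are given by the formulae
$f(x) = x^2 - 4$ and $g(x) = e^x$.
Then the function $3f$ would be given by the formula
$(3f)(x) = 3f(x) = 3(x^2 - 4)$,
i.e. it is just three times $f(x)$, whereas the function $f + g$ would be given by the formula
$(f + g)(x) = f(x) + g(x) = x^2 - 4 + e^x$,
i.e. it is just the sum of $f(x)$ and $g(x)$.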
Indeed, if we have two functions, $f$ and $g$, and two constants, $k$ and $l$, we can get the new function $kf + lg$, called a linear combination of $f$ and $g$, by using the rule
$(kf + lg)(x) = kf(x) + lg(x)$.
Example 4.3 Following on from Example 4.2, the function $2f - g$ would be given by the formula
$(2f - g)(x) = 2f(x) - g(x) = 2(x^2 - 4) - e^x = 2x^2 - 8 - e^x$.
Activity 4.4 Following on from Example 4.3, find the formulae for the functions
$-f$, $\sqrt{2}\,g$, $f - g$, $-9f + 2g$.
Activity 4.5 Explain how the linear combination rule can be obtained from the
constant multiple and sum rules.
If f and g are functions, write down the rule which would give us the new function
f − g, called the difference of f and g.
Two other ways of combining functions are products and quotients. The former, as its
name may suggest, is what we get when we have two functions, f and g, and we
multiply them together to get the new function f · g, called the product of f and g, by
using the rule
$(f \cdot g)(x) = f(x) \cdot g(x)$.
Similarly, if we divide $f$ by $g$ we get the new function $f/g$, called the quotient of $f$ and $g$, by using the rule
$\left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)}$.
It is important to observe, however, that this last rule can only be used if we have
values of $x$ where $g(x) \neq 0$. In particular, if $g(x) = 0$ at some value of $x$, the new
function f /g is undefined at that value of x because division by zero is never allowed.
Example 4.4 Following on from Example 4.2, the function f · g would be given by
the formula
$(f \cdot g)(x) = f(x) \cdot g(x) = (x^2 - 4)e^x$,
i.e. it is just $f$ times $g$, whereas the function $f/g$ would be given by the formula
$\left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)} = \frac{x^2 - 4}{e^x}$,
i.e. it is just $f$ divided by $g$. Notice, in particular, that this quotient is defined for all values of $x$ because $e^x$ is never equal to zero.
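The linear combination, product and quotient rules can be mirrored with functions that build new functions. In this minimal Python sketch, the helper names `linear_combination`, `product` and `quotient` are hypothetical, and their parameters `u` and `v` play the roles of $f$ and $g$ in the rules above; `f` and `g` themselves are the functions from Example 4.2.

```python
import math


def f(x):
    return x ** 2 - 4  # f(x) = x^2 - 4, as in Example 4.2


def g(x):
    return math.exp(x)  # g(x) = e^x, as in Example 4.2


def linear_combination(k, u, l, v):
    """Return the function k*u + l*v, i.e. x -> k*u(x) + l*v(x)."""
    return lambda x: k * u(x) + l * v(x)


def product(u, v):
    """Return the product function u . v, i.e. x -> u(x) * v(x)."""
    return lambda x: u(x) * v(x)


def quotient(u, v):
    """Return the quotient function u / v, defined only where v(x) != 0."""
    def h(x):
        denominator = v(x)
        if denominator == 0:
            raise ZeroDivisionError("u/v is undefined where v(x) = 0")
        return u(x) / denominator
    return h


two_f_minus_g = linear_combination(2, f, -1, g)  # the function 2f - g of Example 4.3
f_times_g = product(f, g)
f_over_g = quotient(f, g)
```

For instance, `two_f_minus_g(0)` evaluates $2f(0) - g(0) = 2(-4) - 1 = -9$.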
Activity 4.6 Following on from Example 4.4, verify that the function g · f is the
same as the function f · g.
Find the formula for the function g/f . For which inputs is this function not defined?
Compositions
The last way of combining functions that we will consider is their composition. If we
have two functions, f and g, then we can get the new function f ◦ g, which is the
composition we get when we apply f after applying g, by using the rule
$(f \circ g)(x) = f(g(x))$,
provided that it makes sense to apply the rule for $f$ to each output, $g(x)$, of $g$. Indeed, to see what is happening here, it is useful to think of these functions as machines again so that we can represent this composition as
$x \longrightarrow g \longrightarrow g(x) \longrightarrow f \longrightarrow f(g(x))$,
and, in this way, we see that it can only make sense if $g(x)$ is giving us an input for $f$ that allows us to get its output $f(g(x))$.
Of course, in a similar manner, we can also get the new function $g \circ f$, which is the composition we get when we apply $g$ after applying $f$, by using the rule
$(g \circ f)(x) = g(f(x))$,
provided that it makes sense to apply the rule for $g$ to each output, $f(x)$, of $f$. In this case, we could represent the composition as
$x \longrightarrow f \longrightarrow f(x) \longrightarrow g \longrightarrow g(f(x))$,
and, in this way, we see that it can only make sense if $f(x)$ is giving us an input for $g$ that allows us to get its output $g(f(x))$. In particular, observe that the functions $f \circ g$ and $g \circ f$ are usually different, as we can see in the next example.
Example 4.5 Following on from Example 4.2, the function f ◦ g would be given by
the formula
$(f \circ g)(x) = f(g(x)) = f(e^x) = (e^x)^2 - 4 = e^{2x} - 4$,
whereas the function $g \circ f$ would be given by the formula
$(g \circ f)(x) = g(f(x)) = g(x^2 - 4) = e^{x^2 - 4}$.
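Composition is itself an operation that takes two functions and returns a new one. A minimal Python sketch, where the helper name `compose` is hypothetical and `f`, `g` are the functions from Example 4.2:

```python
import math


def compose(u, v):
    """Return the composition u o v, i.e. the function x -> u(v(x))."""
    return lambda x: u(v(x))


def f(x):
    return x ** 2 - 4  # f(x) = x^2 - 4, as in Example 4.2


def g(x):
    return math.exp(x)  # g(x) = e^x, as in Example 4.2


f_after_g = compose(f, g)  # (f o g)(x) = e^(2x) - 4
g_after_f = compose(g, f)  # (g o f)(x) = e^(x^2 - 4)
```

At $x = 1$ these give $e^2 - 4$ (about 3.389) and $e^{-3}$ (about 0.0498) respectively, confirming that the order of composition matters.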
Activity 4.7 Suppose that the functions $f$ and $g$ are given by the formulae
$f(x) = x - 1$ and $g(x) = \sqrt{x}$.
Find the formulae for the compositions f ◦ g and g ◦ f . For which inputs is the latter
function not defined?
So far, we have seen how to combine certain functions in different ways to get new
functions. However, when we come to look at calculus, we will also need to do this ‘in
reverse’, i.e. we will need to be able to look at a function and see how it has been
constructed by combining other, simpler functions. This is usually quite straightforward
as illustrated in the following example.
$e^x \sin(x)$,
Activity 4.8 Find two functions f and g that can be combined to get the functions
(i) $x^2 e^x$, (ii) $\frac{x^2}{e^x}$, (iii) $e^{2x}$, (iv) $e^{2x} + 3e^x + 1$.
In each case, also indicate the kind of combination that you have found.
Activity 4.9 In Activity 2.12, supply was given by the equation 3q = 25 + 7p and
demand was given by the equation 2q + 5p = 500. Find the supply and demand
functions.
Use these functions to find the equilibrium price and quantity.
Activity 4.10 A company sells each unit of its product for £4. What is its revenue
function?
If its profit function is given by $\pi(q) = -q^2 + 6q - 4$, what is its cost function?
$x \leftarrow f^{-1} \leftarrow f(x).$
¹ Notice, in particular, that we use the Greek letter ‘π’ to denote the profit function as
we are already using ‘p’ to denote prices.
In particular, if we can find such a function, called the inverse of f, we see that it takes
the original outputs, f(x), as inputs and gives us the corresponding original inputs, x,
as outputs.² Indeed, we will find that some functions have an inverse whereas others do
not, unless we take some care with the inputs and outputs that we are considering.
Another thing to notice is that, if an inverse function exists, then the composition of a
function and its inverse gives us a function which takes an input and gives us this very
same input as its output. To see this, consider that the composition $f^{-1} \circ f$ can be
represented as
$x \to f \to f(x) \to f^{-1} \to x,$
and so we should always find that, if the inverse exists,
$(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x.$
Similarly, the composition $f \circ f^{-1}$ can be represented as
$y \to f^{-1} \to f^{-1}(y) \to f \to y,$
and so we should also find that
$(f \circ f^{-1})(y) = f(f^{-1}(y)) = y.$
In particular, notice that this is one of the few cases where the composition gives us the
same function regardless of the order in which we perform the composition. As we shall
see, the fact that the composition of a function and its inverse must behave in this way
will provide us with a useful way of verifying that we have found the correct formula for
an inverse function!
$f(x) = 2x + 3.$
If we set y = f(x), this gives us
$y = 2x + 3$
as the equation which relates the inputs, x, of f to its outputs, y. Rearranging this
to find x in terms of y, we find that
$y = 2x + 3 \implies 2x = y - 3 \implies x = \frac{y - 3}{2},$
and this equation now relates the outputs, y, of f to its inputs, x. Moreover, as each
value of y will give exactly one value of x, the inverse function exists and so,
thinking of this equation as $x = f^{-1}(y)$, we can then deduce that
$f^{-1}(y) = \frac{y - 3}{2}$
is the formula for the inverse of f.
In particular, notice that we can verify that this is correct by noting that
$(f^{-1} \circ f)(x) = f^{-1}(f(x)) = f^{-1}(2x + 3) = \frac{(2x + 3) - 3}{2} = \frac{2x}{2} = x,$
and, indeed, that
$(f \circ f^{-1})(y) = f(f^{-1}(y)) = f\left(\frac{y - 3}{2}\right) = 2\left(\frac{y - 3}{2}\right) + 3 = (y - 3) + 3 = y.$
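The two compositions can also be checked numerically. A minimal Python sketch, assuming $f(x) = 2x + 3$ and the formula $f^{-1}(y) = \frac{y - 3}{2}$ found above; the test values are arbitrary.

```python
def f(x):
    return 2 * x + 3


def f_inv(y):
    # From rearranging y = 2x + 3 into x = (y - 3) / 2
    return (y - 3) / 2


for value in [-2.5, 0.0, 1.0, 7.0]:
    # Both round trips should give back the value we started with
    print(value, f_inv(f(value)), f(f_inv(value)))
```

A check like this only tests a few values, so it supports, but does not replace, the algebraic verification.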
Indeed, thinking back to our discussion of supply and demand functions in Section 4.1.4,
we may be able to find their inverses. That is, if we have a supply equation which can
be written in the form $p = p_S(q)$, then we call $p_S$ the inverse supply function as it is
just $q_S^{-1}$, whereas if we have a demand equation which can be written in the form
$p = p_D(q)$, then we call $p_D$ the inverse demand function as it is just $q_D^{-1}$. With these
functions, we can then see that the equilibrium quantity, i.e. the quantity that makes the
suppliers’ price equal to the consumers’ price, can be found by solving the equation
$p_S(q) = p_D(q),$
and then we can use either of these functions to find the corresponding equilibrium
price.
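For linear inverse supply and demand functions, solving $p_S(q) = p_D(q)$ takes a single rearrangement. A minimal Python sketch, assuming hypothetical functions $p_S(q) = a + bq$ and $p_D(q) = c - dq$ with $b + d \neq 0$ (these are illustrative, not the functions from Activities 4.9 or 4.11); exact fractions avoid rounding.

```python
from fractions import Fraction


def equilibrium(a, b, c, d):
    """Equilibrium for hypothetical linear inverse functions
    p_S(q) = a + b*q (suppliers' price) and p_D(q) = c - d*q (consumers' price).

    Setting a + b*q = c - d*q gives q = (c - a) / (b + d).
    Returns the pair (q, p) as exact fractions.
    """
    q = Fraction(c - a) / (b + d)
    p_supply = a + b * q
    p_demand = c - d * q
    # Either inverse function gives the same equilibrium price
    assert p_supply == p_demand
    return q, p_supply


# Illustrative example: p_S(q) = 2 + q and p_D(q) = 20 - 2q
q_star, p_star = equilibrium(2, 1, 20, 2)
print(q_star, p_star)  # 6 8
```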
Activity 4.11 Following on from Activity 4.9, where supply was given by the
equation 3q = 25 + 7p and demand was given by the equation 2q + 5p = 500, find
the inverse supply and demand functions.
Use these functions to find the equilibrium price and quantity.
However, situations where inverses do not exist are not hard to find as the next example
shows.
$f(x) = x^2.$
If we set y = f(x) and rearrange to find x in terms of y, we get
$y = x^2 \implies x = \pm\sqrt{y}.$
If the output, y, of f is positive, this equation gives us two values for the
corresponding input, x. And, of course, we need to get exactly one value of x for each
value of y in order to define an inverse function!
Having said this, if we take some care with the inputs and outputs that we are
considering, then we can often overcome such problems and find an inverse function,
as the next example shows.
$f(x) = x^2$
and we only want to consider values of x that are positive, i.e. we have x > 0. If we
set y = f(x), this gives us
$y = x^2$
as the equation which relates the inputs, x, of f to its outputs, y. In particular, as
$x^2 > 0$ for all values of x > 0, this can only give us outputs, y, that are positive, i.e.
we have y > 0. Rearranging this equation to find x in terms of y, we again find that
$y = x^2 \implies x = \pm\sqrt{y},$
and this equation again relates the outputs, y, of f to its inputs, x. But now we can
find an inverse function, since y > 0 means that we can always find a value for $\sqrt{y}$,
and x > 0 means that we are only interested in the positive value, $+\sqrt{y}$, that we get
from the square root, i.e. we can reject the problematic $-\sqrt{y}$ as we know that the values
of x that we started with are positive. That is, since f only takes positive numbers
as inputs and only gives positive numbers as outputs, we must have
$x = \sqrt{y},$
for all allowed values of x and y. Consequently, as each allowed value of y will give
exactly one allowed value of x, the inverse function exists and so, thinking of this
equation as $x = f^{-1}(y)$, we can then deduce that
$f^{-1}(y) = \sqrt{y}$
is the formula for the inverse of f.
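The role of the restriction x > 0 can be seen numerically. A minimal Python sketch, assuming $f(x) = x^2$ on the inputs x > 0 and $f^{-1}(y) = \sqrt{y}$ as in Example 4.9; the error message wording is our own.

```python
import math


def f(x):
    return x**2


def f_inv(y):
    """Inverse of f(x) = x^2 when the inputs are restricted to x > 0.

    Because those inputs only produce outputs y > 0, we keep the positive
    square root +sqrt(y) and reject -sqrt(y).
    """
    if y <= 0:
        raise ValueError("f_inv is only defined for outputs y > 0")
    return math.sqrt(y)


# On the allowed inputs, the round trip gives back the original input
print([f_inv(f(x)) for x in [0.5, 2.0, 3.0]])  # [0.5, 2.0, 3.0]

# Without the restriction, x = -3 and x = 3 share the output 9,
# so f_inv cannot recover x = -3
print(f(-3.0) == f(3.0), f_inv(f(-3.0)))  # True 3.0
```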
Activity 4.12 Following on from Example 4.9, verify that the inverse is correct by
showing that
$(f^{-1} \circ f)(x) = x$ and $(f \circ f^{-1})(y) = y$,
as we should expect.
Activity 4.13 Following on from Example 4.9, suppose that the function f is again
given by the formula $f(x) = x^2$, but now we only want to consider values of x that
are negative. Does f have an inverse? If it does, what is it?
If you do find an inverse, verify that
$(f^{-1} \circ f)(x) = x$ and $(f \circ f^{-1})(y) = y$.
Furthermore, in some cases, this method for finding an inverse function just does not
work as we have no useful algebraic way of ‘rearranging’ the relevant equation. Instead,
we often have to define an entirely new, but related, function to do the job. This is what
happens, for instance, when we have an exponential function of the form
$f(x) = a^x$,
4.2.2 Logarithms
Logarithms are defined as follows.
Logarithms
Activity 4.14 Suppose that $a \neq 1$ is any positive number. Explain why the
following results are true.
i. $\log_a(1) = 0$, ii. $\log_a(a) = 1$, iii. $\log_a\left(\frac{1}{a}\right) = -1$, iv. $\log_a(a^b) = b$.
Activity 4.15 Suppose that, for some positive number, $a \neq 1$, the function, f, is
given by the formula
$f(x) = a^x.$
Explain why the inverse of f is given by $f^{-1}(y) = \log_a(y)$ as long as y is a positive
number.
Why does the inverse of f not exist if a = 1?
As logarithms to base a are closely related to powers of a, the power laws that we saw
in Unit 1 allow us to deduce the laws of logarithms. These are as follows.
Activity 4.16 Explain why $\log_a(x^2) = 2\log_a(x)$ and $\log_a(x^3) = 3\log_a(x)$ using (i)
the power law and (ii) the multiplication law. Can you see how this generalises?
Changing base
Generally, when we use logarithms, we use logarithms to the base 10 or base e. As these
bases are so special, we have special names for them:
Logarithms to the base 10 are denoted by ‘log’ and are called ‘common logarithms’.
Logarithms to the base e are denoted by ‘ln’ and are called ‘natural logarithms’.
The main reason for emphasising these logarithms is that many calculators have
buttons which enable us to easily work them out. But, in this course, the basic
calculator which you are allowed to use in the examination does not have these buttons
and so the values of any logarithms (which can not easily be figured out) will be given
to an appropriate number of decimal places.
as this value is not easy to figure out. Or, we might be told some other information
that would allow us to find this value using our basic calculators.
Sometimes, however, it is convenient to work with logarithms to some other base a, i.e.
‘$\log_a$’, and in such cases, when it comes to working them out, we need to know how to
convert the ‘$\log_a$’ into, say, ‘log’s or ‘ln’s or whatever so that we can use any given
values to evaluate them. For such purposes the rule is as follows.
Given two bases a and b, we can convert a logarithm to base a, say $\log_a(x)$, into a
logarithm to base b, say $\log_b(x)$, by using
$\log_a(x) = \frac{\log_b(x)}{\log_b(a)}.$
In particular, if we were given the relevant ‘log’s or ‘ln’s we would have to use
$\log_a(x) = \frac{\log(x)}{\log(a)}$ or $\log_a(x) = \frac{\ln(x)}{\ln(a)}$,
respectively.
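The change of base rule is also how we might compute such logarithms on a computer. A minimal Python sketch using the standard `math` module; `log_base` is a helper name of our own, and it assumes x > 0, a > 0 and $a \neq 1$.

```python
import math


def log_base(x, a):
    """log_a(x) via the change of base rule, using natural logarithms (base e)."""
    return math.log(x) / math.log(a)


# log_100(10000) should be 2, since 100**2 = 10000 (up to floating-point rounding)
print(log_base(10000, 100))
# The same rule with common logarithms (base 10): log(10000) / log(100) = 4 / 2
print(math.log10(10000) / math.log10(100))
# Python's math.log also accepts an optional base as its second argument
print(math.log(10000, 100))
```

Whichever base b we use in the rule, the ratio is the same, which is why we are free to use whichever logarithms we happen to be given.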
To see how this works, let’s say that we wanted to work out $\log_{100}(10000)$ using
common logarithms. Given the numbers involved we don’t need a calculator to see that
$\log_{100}(10000) = 2$, since $100^2 = 10000$.
Activity 4.18 Following on from Example 4.11, use the change of base rule for
logarithms to find log(101) to 5dp given that, to 5dp, ln(101) = 4.61512 and
ln(10) = 2.30259.
Learning outcomes
At the end of this unit, you should be able to:
solve problems that involve the given common functions and combinations of them;
Exercises
Exercise 4.1
The function C(x) = 10x + 315 gives the cost of producing x units of a product. Find
the cost when the following numbers of units are produced.
Exercise 4.2
Suppose that the functions f , g and h are given by
Exercise 4.3
To convert a temperature from degrees Fahrenheit to degrees Centigrade we subtract 32
and then multiply by 5/9. If f is the temperature in degrees Fahrenheit and c is the
temperature in degrees Centigrade, find the function c(f ).
What is the inverse of this function?
Exercise 4.4
The total revenue that a firm receives from selling different levels of output q is given by
the function $R(q) = 40q - 4q^2$ for $0 \le q \le 10$. A manager would like to know the inverse
function so that they can determine how many products need to be sold in order to
obtain a certain revenue. Explain why it is not possible to find this inverse function.
What happens if 0 ≤ q ≤ 5?
Exercise 4.5
Use the laws of logarithms to evaluate the following.
Exercise 4.6
Solve the following equations.
(i) $x^2 = 4$, (ii) $2^x = 4$, (iii) $2^{x^2} = 4$.
5. Calculus I — Differentiation
Unit 5: Calculus I
Differentiation
Overview
In this unit, we start our study of calculus by introducing the notion of differentiation.
In particular, we ask how we can find the gradient of a curve at a point and see that
differentiation allows us to answer this question in a very easy way. We also see how to
differentiate some simple functions using standard derivatives and rules of
differentiation.
Aims
Example 5.1 In economics, if we are given a function which links the profit, π, of
a firm to its production level, q, we may want to find out how the profit changes if
we change the production level.
Indeed, if the profit function is linear, i.e. its graph is a straight line, we have
something like
π(q) = mq + c,
and we can easily see how this works since the gradient of a linear function tells us
how changes in profit, ∆π, are related to changes in the production level, ∆q. That
is, we have
$m = \frac{\Delta \pi}{\Delta q}$,
However, we can only tell such a simple story for linear functions, i.e. straight lines,
because the gradient of a straight line is constant. That is, as we saw before, whichever
two points on the line we use to calculate the gradient, we always get the same answer.
But, unfortunately, this doesn’t work for more complicated curves.
Example 5.2 If we are given the quadratic function $f(x) = x^2$, whose graph is the
parabola $y = x^2$, we could try to estimate its gradient at the point (2, 4) by
considering the changes in x and y between this point and the points, say, (3, 9) and
(4, 16) which also lie on the parabola. Using the point (3, 9), we get
$\frac{\Delta y}{\Delta x} = \frac{9 - 4}{3 - 2} = \frac{5}{1} = 5,$
whereas using the point (4, 16), we get
$\frac{\Delta y}{\Delta x} = \frac{16 - 4}{4 - 2} = \frac{12}{2} = 6.$
But, unsurprisingly perhaps, these give us different values and so, clearly, we cannot
just use a pair of points on a parabola to find its gradient at the point (2, 4).
So, in the case of non-linear functions, i.e. curves that are not straight lines, since we
cannot just look at the changes between a point and any other point on the curve to
find the gradient of the curve at that point, we must ask what we can do to find it.
[Figure 5.1, panels (a) and (b): the parabola $y = x^2$, the point A(2, 4) and the tangent T; panel (b) also shows the lines $L_1$ and $L_2$ through A.]
Figure 5.1: (a) The straight line labelled T is the tangent to the parabola $y = x^2$ at
the point, A, on the curve with coordinates (2, 4). (b) The tangent, T, goes through the
point, A, with coordinates (2, 4) in a special way, namely, unlike other lines through A
(such as the lines $L_1$ and $L_2$), it only has one point of intersection with the parabola.
(Note that $L_1$ is ‘steeper’ than T and so it cuts the parabola at both A and B, whereas
$L_2$ is ‘shallower’ than T and so it cuts the parabola at both A and C.)
$L_2$ is shallower than T.
That is, we can see that the gradient of T must be somewhere between the gradient of
L2 and the gradient of L1 . This means that we can try and find the gradient of T by
considering the gradients of other lines whose gradients provide us with better estimates
of its value and one way of doing this is to look at chords.
[Figure 5.2, panels (a) and (b): the parabola $y = x^2$, the chord C and the tangent T at A(2, 4).]
Figure 5.2: (a) C, the chord joining the points A(2, 4) and B(3, 9) on the parabola $y = x^2$
and, T, the tangent to $y = x^2$ at the point (2, 4). (b) The chords joining the points (3, 9),
$\left(\frac{8}{3}, \frac{64}{9}\right)$ and $\left(\frac{7}{3}, \frac{49}{9}\right)$ to the point (2, 4). Observe that, as the x-coordinate of the
other end of the chord gets closer to x = 2, the gradient of the chord gets closer to the
gradient of T, the tangent to $y = x^2$ at the point (2, 4).
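(3, 9) has a gradient given by $m = \frac{9 - 4}{3 - 2} = \frac{5}{1} = 5$.
$\left(\frac{8}{3}, \frac{64}{9}\right)$ has a gradient given by $m = \frac{\frac{64}{9} - 4}{\frac{8}{3} - 2} = \frac{\frac{28}{9}}{\frac{2}{3}} = \frac{14}{3} = 4\frac{2}{3}$.
$\left(\frac{7}{3}, \frac{49}{9}\right)$ has a gradient given by $m = \frac{\frac{49}{9} - 4}{\frac{7}{3} - 2} = \frac{\frac{13}{9}}{\frac{1}{3}} = \frac{13}{3} = 4\frac{1}{3}$.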
In particular, notice that as the x-coordinate of the other point on the curve gets closer
to x = 2, the gradients of the chords get smaller (i.e. the chords get less steep) and get
closer to the gradient of T . That is, we are getting better estimates of the gradient of T
and this is an idea that we can generalise to find the gradient of T itself!
To see how this generalisation works, let h > 0 be a real number and consider the chord,
C, joining the points (2, 4) and $(2 + h, [2 + h]^2)$ on the parabola $y = x^2$ as in
Figure 5.3(a). As before, we now look at the gradient of the chord joining these two
points and we find that
$m = \frac{[2 + h]^2 - 4}{(2 + h) - 2} = \frac{[4 + 4h + h^2] - 4}{h} = \frac{4h + h^2}{h} = 4 + h.$
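The chord gradients in Figure 5.2 and the formula $m = 4 + h$ can be reproduced exactly using fractions. A minimal Python sketch; the helper `chord_gradient` is our own name, and the small values of h are arbitrary choices.

```python
from fractions import Fraction


def f(x):
    return x**2


def chord_gradient(x0, x1):
    """Gradient of the chord joining (x0, f(x0)) and (x1, f(x1)) on y = x^2."""
    return (f(x1) - f(x0)) / (x1 - x0)


a = Fraction(2)

# The three chords from Figure 5.2, with the other end at x = 3, 8/3 and 7/3
for x1 in [Fraction(3), Fraction(8, 3), Fraction(7, 3)]:
    print(x1, chord_gradient(a, x1))  # gradients 5, 14/3 and 13/3

# The chord to x = 2 + h has gradient 4 + h, which approaches 4 as h shrinks
for h in [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    print(h, chord_gradient(a, a + h))  # 41/10, 401/100, 4001/1000
```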
Now, if we let h get smaller, i.e. the x-coordinate of B gets closer to x = 2, we should
get a better estimate of the gradient of T.² Indeed, we can see that as h gets closer and
closer to zero, m gets closer and closer to four and so, this must be the sought-after
gradient of T.
[Figure 5.3, panels (a) and (b): the parabola $y = x^2$, the chord C joining A(2, 4) to $B(2 + h, [2 + h]^2)$ and the tangent T; panel (b) marks $h_1$, $h_2$ and $h_3$.]
Figure 5.3: (a) C, the chord joining the points A(2, 4) and $B(2 + h, [2 + h]^2)$ on the parabola
$y = x^2$ for some real number h > 0 and, T, the tangent to $y = x^2$ at the point (2, 4). (b)
As we take three successively smaller values of h given by $h_1 > h_2 > h_3$, we see that the
gradient of the chord gets closer to the gradient of T.
But, of course, we can generalise this method further by asking for the gradient of the
tangent at any point $(x, x^2)$ on the curve $y = x^2$. In this case, we want the chord joining
the point $(x, x^2)$ with the point $(x + h, [x + h]^2)$ for some real number h > 0. The
gradient of the chord joining these two points is then given by
$m = \frac{[x + h]^2 - x^2}{(x + h) - x} = \frac{[x^2 + 2xh + h^2] - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h.$
Now, if we let h get smaller, i.e. the x-coordinate of the point $(x + h, [x + h]^2)$ gets
closer to the x-coordinate of the point $(x, x^2)$, we should get a better estimate of the
gradient of the tangent line at the point $(x, x^2)$. Indeed, we can see that as h gets closer
and closer to zero, m gets closer and closer to 2x and so, this must be the sought-after
gradient of the tangent to $y = x^2$ at the point $(x, x^2)$. Notice, in particular, that if we
have x = 2 as we did above, we find that the gradient of the tangent is 2 × 2 = 4 as before!
² This idea is illustrated in Figure 5.3(b) where we take three successively smaller values of h given
by $h_1 > h_2 > h_3$ and see how the gradient of the chord gets closer to the gradient of T.
$m = \frac{f(x + h) - f(x)}{(x + h) - x} = \frac{f(x + h) - f(x)}{h},$
and the gradient of the tangent to the curve y = f(x) at the point (x, f(x)) is what this
gives us when we let h go to zero.
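The general recipe can be sketched numerically: compute the chord gradient for smaller and smaller values of h and watch where it settles. A minimal Python sketch, assuming $f(x) = x^2$, for which the chord gradient equals $2x + h$; very small h should be avoided because of floating-point rounding.

```python
def difference_quotient(f, x, h):
    """Gradient of the chord joining (x, f(x)) and (x + h, f(x + h)), for h != 0."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x**2


for x in [1.0, 2.0, 3.0]:
    # For f(x) = x^2 the quotient is 2x + h, so it should approach 2x as h shrinks
    estimates = [difference_quotient(square, x, h) for h in (0.1, 0.01, 0.001)]
    print(x, estimates, "approaches", 2 * x)
```

Letting h go exactly to zero is what differentiation does; the code can only approach that limit through small, nonzero values of h.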
Activity 5.1 What is the gradient of the tangent to the curve y = k (where k is a
fixed real number) at the point with x-coordinate, x?
Activity 5.2 What is the gradient of the tangent to the curve y = mx + c (where
$m \neq 0$ and c are fixed real numbers) at the point with x-coordinate, x?
Activity 5.3 What is the gradient of the tangent to the curve $y = ax^2 + bx + c$
(where $a \neq 0$, b and c are fixed real numbers) at the point with x-coordinate, x?
$\frac{df}{dx}$, or more compactly, $f'(x)$,
and this notation tells us to differentiate f (x) with respect to the variable x. The
result of this procedure is called the derivative of f (x) with respect to x.
Example 5.3 As we saw above, the gradient of the tangent to y = f(x) with
$f(x) = x^2$ at the point $(x, x^2)$ is given by 2x. Thus, the derivative of f(x) with
respect to x can be written as
$\frac{df}{dx} = 2x$, or more compactly, $f'(x) = 2x$.
If we want to calculate the derivative at a certain point, say when x = 2, we can now
evaluate
$\left.\frac{df}{dx}\right|_{x=2} = 2 \cdot 2 = 4$, or more compactly, $f'(2) = 2 \cdot 2 = 4$.
By definition, this must be the gradient of the tangent line to the parabola y = x2 at
the point where x = 2, i.e. the point with coordinates (2, 4), and this is indeed what
we found earlier.
Most functions can be differentiated, but we don’t want to use the definition of
differentiation every time we need to find a derivative. So, we let other people do the
hard work and take note of two different kinds of thing they can tell us, namely:
$f'(x) = 0.$
That is, if k is a constant, f (x) = k is a function whose derivative (or gradient) is equal
to zero at every point.
$f'(x) = kx^{k-1}.$
$f'(x) = e^x.$
Observe, in particular, that this is one of the special properties of the function $e^x$ where
e is the exponential constant: it is the function whose value and gradient are the same
at every point. (We will see another special property of $e^x$ in Section 9.1.3.)
Standard derivatives
We now look at how we can differentiate some simple combinations of these functions.
could find:
a constant multiple of f , which was the function kf where (kf )(x) = k · f (x).
the sum of f and g, which was the function f + g where (f + g)(x) = f (x) + g(x).
The question is, if we can differentiate the functions f and g, can we also differentiate
the functions kf and f + g? Obviously, the answer is ‘yes’, and we do it by using rules
of differentiation. Among other things, these rules will allow us to differentiate any
polynomial function of x.
The constant multiple rule tells us how to differentiate a constant multiple of a function
f (x) and it works as follows.
Constant multiple rule
So, in these cases, we just differentiate as before and then multiply the answer by
the appropriate constant multiple.
The sum rule tells us how to differentiate the sum of two functions f (x) and g(x) and it
works as follows.
Sum rule
It should be clear that, taken together, our two rules of differentiation enable us to
differentiate functions of the form kf (x) + lg(x) as follows.
So, in these cases, we just differentiate as before and combine the answers in the
obvious way.
Activity 5.5 Show that the constant multiple rule and the sum rule do indeed give
the linear combination rule.
Hence, use the linear combination rule to find the derivative of the function
f (x) − g(x) in terms of the derivatives of the functions f (x) and g(x).
There are some functions, related to the ones that we have seen above, which we can
differentiate using what we have seen so far.
Example 5.10 Please note that, for two functions f (x) and g(x),
$\frac{d}{dx}[f(x) \cdot g(x)]$ is NOT $\frac{df}{dx} \cdot \frac{dg}{dx}$.
$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]$ is NOT $\frac{df/dx}{dg/dx}$.
And, for things like $f(x) = e^{2x}$, we can NOT say what $f'(x)$ is, as even though we
can differentiate $e^x$, we don’t yet know how to deal with the ‘2’ in $e^{2x}$.
The correct way of differentiating all of the things listed in this example will be dealt
with in Unit 6.
Activity 5.6 If k is a constant, for each of the following functions, find its
derivative or explain why it can not be found using the results in this unit.
(i) $f(x) = e^{x+k}$, (ii) $g(x) = e^{kx}$, (iii) $h(x) = e^{x^k}$.
Activity 5.7 If k is a constant, for each of the following functions, find its
derivative or explain why it can not be found using the results in this unit.
(i) $f(x) = \ln(x + k)$, (ii) $g(x) = \ln(kx)$, (iii) $h(x) = \ln(x^k)$.
So far, we have only been differentiating functions of x, like f (x), with respect to x. But
sometimes, we will want to differentiate functions of other variables with respect to
their variable. The good news is that everything we have seen so far carries over in a
straightforward way.
That is, everything stays the same with the exception that the ‘x’s are now replaced
with ‘y’s.
Example 5.12 Similarly, if $f(q) = q^2 - 3q + 7$, then $f'(q) = 2q - 3(1) + 0 = 2q - 3$.
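Taken together, the power rule, the constant multiple rule and the sum rule let us differentiate any polynomial term by term, as in Example 5.12. A minimal Python sketch, assuming a polynomial is stored as a list of coefficients in ascending powers (so $q^2 - 3q + 7$ is `[7, -3, 1]`); this representation and the function name are our own choices.

```python
def differentiate_polynomial(coeffs):
    """Differentiate a polynomial stored as coefficients in ascending powers.

    coeffs[k] is the coefficient of x**k. The power rule and the constant
    multiple rule turn c*x**k into k*c*x**(k - 1), the constant term
    differentiates to 0, and the sum rule lets us handle each term separately.
    An empty list represents the zero polynomial.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


# f(q) = q^2 - 3q + 7, so f'(q) = 2q - 3, i.e. [-3, 2]
print(differentiate_polynomial([7, -3, 1]))  # [-3, 2]
```

The name of the variable plays no part in the calculation, which matches the remark that everything carries over when x is replaced by another variable.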
Learning outcomes
At the end of this unit, you should be able to:
explain the relationship between the gradient of a curve and the derivative of a
function;
find simple derivatives by using standard derivatives and the rules of differentiation.
Exercises
Exercise 5.1
Consider the parabola y = x2 and the point (3, 9) that is on this curve.
i. Find the gradient of the chords joining this point to the points on the curve with
x = 4, $x = 3\frac{1}{2}$ and $x = 3\frac{1}{4}$.
ii. Find the gradient of the chord joining this point to the point on the curve with
x = 3 + h where h > 0 is a real number. What value does this give you as h goes
to zero?
iii. By differentiating the function f (x) given by the curve y = f (x) above, find the
gradient of the curve at the point (3, 9).
Note: Your final answers to ii. and iii. should be the same!
Exercise 5.2
Find the derivatives of the following functions.
i. f (x) = −17; v. k(x) = 3 + ln(x);
Exercise 5.3
Find the derivatives of the following functions.
i. $f(x) = \sin(x) + \cos(x)$; vi. $l(x) = 5\sqrt{x}$;
ii. $g(x) = \ln(x) + 4e^x$; vii. $n(x) = 3x^2 - 5x + 7$;
v. $k(x) = \frac{4}{x^3}$; x. $s(x) = \frac{2}{x^2} + \frac{3}{2x} + 5$.
Exercise 5.4
Find the derivatives of the following functions.
i. $f(y) = 6y - 5$; iii. $h(z) = z^2 - \sqrt{z}$;
6. Calculus II — More differentiation
Unit 6: Calculus II
More differentiation
Overview
Aims
The product rule tells us how to differentiate the product of two functions f (x) and g(x)
and it works as follows.
¹ Provided, of course, that $g(x) \neq 0$.
Product rule
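We can write this function as $h(x) = (x + 1)(x + 1)$ and so we have the product of
the two functions
$f(x) = x + 1$ and $g(x) = x + 1$,
and these give us
$f'(x) = 1$ and $g'(x) = 1$.
As such, the product rule tells us that
$h'(x) = (1)(x + 1) + (x + 1)(1) = 2(x + 1).$
We can check this answer as we can write h(x) as
$h(x) = x^2 + 2x + 1,$
and, differentiating, this gives us $h'(x) = 2x + 2 = 2(x + 1)$ if we factorise. Clearly,
this is the same as the answer we got from the product rule.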
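A numerical check of Example 6.1. A minimal Python sketch, assuming the standard form of the product rule, $h'(x) = f'(x)g(x) + f(x)g'(x)$, and using a central-difference estimate (our choice of approximation, with a step of $10^{-6}$) as an independent check.

```python
def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x); an approximation, not an exact derivative."""
    return (func(x + step) - func(x - step)) / (2 * step)


def f(x):
    return x + 1


def g(x):
    return x + 1


def h(x):
    # h(x) = (x + 1)(x + 1)
    return f(x) * g(x)


def product_rule(x):
    # f'(x) g(x) + f(x) g'(x), with f'(x) = g'(x) = 1
    return 1 * g(x) + f(x) * 1


for x in [-2.0, 0.0, 1.5]:
    # Product rule, the simplified answer 2(x + 1), and the numerical estimate
    print(x, product_rule(x), 2 * (x + 1), numerical_derivative(h, x))
```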
Here, we cannot check the answer as we cannot rewrite the function $h(x) = xe^x$.
Here, we cannot check the answer as we cannot rewrite the function $h(x) = x\ln(x)$.
Here, we cannot check the answer as we cannot rewrite the function
$h(x) = e^x\ln(x)$.
Quotient rule
Of course, this all assumes that the quotient of the two functions is defined, i.e. it
only works for values of x where $g(x) \neq 0$.
Example 6.5 Differentiate the function $h(x) = \frac{x + 1}{x}$ for $x \neq 0$.
This is the quotient of the two functions
$f(x) = x + 1$ and $g(x) = x$,
and these give us
$f'(x) = 1$ and $g'(x) = 1$.
As such, the quotient rule tells us that
$h'(x) = \frac{(1)(x) - (x + 1)(1)}{x^2} = -\frac{1}{x^2},$
for $x \neq 0$. Notice that we can check this answer as we can write h(x) as
$h(x) = \frac{x + 1}{x} = \frac{x}{x} + \frac{1}{x} = 1 + \frac{1}{x} = 1 + x^{-1},$
and, differentiating, this gives us
$h'(x) = 0 + (-x^{-2}) = -\frac{1}{x^2}.$
Clearly, this is the same as the answer we got from the quotient rule.
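Example 6.5 can be checked the same way. A minimal Python sketch, assuming the quotient rule in the form $h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$ used in the example, with a central-difference estimate (our choice) as an independent check; the test points avoid x = 0.

```python
def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x); an approximation, not an exact derivative."""
    return (func(x + step) - func(x - step)) / (2 * step)


def h(x):
    # h(x) = (x + 1) / x, defined only for x != 0
    return (x + 1) / x


def quotient_rule(x):
    # (f'(x) g(x) - f(x) g'(x)) / g(x)^2 with f(x) = x + 1, g(x) = x and f'(x) = g'(x) = 1
    return (1 * x - (x + 1) * 1) / x**2


for x in [-3.0, 0.5, 2.0]:
    # Quotient rule, the simplified answer -1/x^2, and the numerical estimate
    print(x, quotient_rule(x), -1 / x**2, numerical_derivative(h, x))
```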
Example 6.6 Differentiate the function $h(x) = \frac{e^x}{x}$ for $x \neq 0$.
This is the quotient of the two functions
Example 6.7 Differentiate the function $h(x) = \frac{x^3}{\ln(x)}$ for $x \neq 1$.²
This is the quotient of the two functions
$f(x) = x^3$ and $g(x) = \ln(x)$,
and these give us
$f'(x) = 3x^2$ and $g'(x) = \frac{1}{x}$.
for $x \neq 1$. Here, we cannot check the answer as we cannot rewrite the function
$h(x) = x^3/\ln(x)$.
Example 6.8 Differentiate the function $h(x) = \frac{\ln(x)}{e^x}$.³
This is the quotient of the two functions
Here, we cannot check the answer as we cannot rewrite the function
$h(x) = \ln(x)/e^x$.
Chain rule
As such we have
$f'(g) = 2g$ and $g'(x) = 1$,
and so the chain rule tells us that
Notice that this is the same as the answer we found in Example 6.1.
As such we have
$f'(g) = 3g^2$ and $g'(x) = 2$,
and so the chain rule tells us that
if we factorise. And, clearly, this is the same as the answer we got from the chain
rule.
Example 6.11 Differentiate the function $h(x) = \sqrt{2x + 1}$.
The function $h(x) = \sqrt{2x + 1}$ is the composition of the functions
$f(g) = \sqrt{g} = g^{1/2}$ and $g(x) = 2x + 1$.
As such we have
$f'(g) = \frac{1}{2}g^{-1/2}$ and $g'(x) = 2$,
and so the chain rule tells us that
$h'(x) = \frac{1}{2}g^{-1/2}(2) = g^{-1/2} = \frac{1}{\sqrt{2x + 1}}.$
Here, we cannot check the answer as we cannot rewrite the function
$h(x) = \sqrt{2x + 1}$.
Example 6.12 Differentiate the function $h(x) = \sqrt{x^3 + 2}$.
The function $h(x) = \sqrt{x^3 + 2}$ is the composition of the functions
$f(g) = \sqrt{g} = g^{1/2}$ and $g(x) = x^3 + 2$.
As such we have
$f'(g) = \frac{1}{2}g^{-1/2}$ and $g'(x) = 3x^2$,
and so the chain rule tells us that
$h'(x) = \frac{1}{2}g^{-1/2}(3x^2) = \frac{3x^2}{2\sqrt{g}} = \frac{3x^2}{2\sqrt{x^3 + 2}}.$
Here, we cannot check the answer as we cannot rewrite the function
$h(x) = \sqrt{x^3 + 2}$.
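A numerical check of the chain rule in Example 6.12. A minimal Python sketch, using the same composition $f(g) = \sqrt{g}$, $g(x) = x^3 + 2$ and a central-difference estimate (our choice) as an independent check; the test points keep $x^3 + 2 > 0$.

```python
import math


def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x); an approximation, not an exact derivative."""
    return (func(x + step) - func(x - step)) / (2 * step)


def inner(x):
    # g(x) = x^3 + 2
    return x**3 + 2


def outer(g):
    # f(g) = sqrt(g) = g^(1/2)
    return math.sqrt(g)


def h(x):
    # h(x) = f(g(x)) = sqrt(x^3 + 2)
    return outer(inner(x))


def chain_rule(x):
    # f'(g) * g'(x) = (1/2) g^(-1/2) * 3x^2, with g evaluated at g(x)
    return 0.5 * inner(x) ** -0.5 * 3 * x**2


for x in [0.0, 1.0, 2.0]:
    # Chain rule, the simplified answer 3x^2 / (2 sqrt(x^3 + 2)), and the numerical estimate
    print(x, chain_rule(x), 3 * x**2 / (2 * math.sqrt(x**3 + 2)), numerical_derivative(h, x))
```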
The function $h(x) = e^{x^k}$ is the composition of the functions
$f(g) = e^g$ and $g(x) = x^k$.
As such we have
$f'(g) = e^g$ and $g'(x) = kx^{k-1}$,
and so we get
$h'(x) = (e^g)(kx^{k-1}) = kx^{k-1}e^g = kx^{k-1}e^{x^k},$
from the chain rule.
and
$f(x) = \log_a(x) \implies f'(x) = \frac{1}{x\ln(a)}.$
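Example 6.13 Find the derivative of the function $l(x) = (x^3 + 1)\ln(x^2 + 4)$.
This is the product of the two functions $f(x) = x^3 + 1$ and $g(x) = \ln(x^2 + 4)$,
and clearly, $f'(x) = 3x^2$. But, to differentiate g(x), we need to use the chain rule
because it is a composition. In this case, we have
$g(h) = \ln(h)$ and $h(x) = x^2 + 4$,
which gives us
$g'(h) = \frac{1}{h}$ and $h'(x) = 2x$,
so that
$g'(x) = \frac{1}{h}(2x) = \frac{2x}{h} = \frac{2x}{x^2 + 4},$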
by the chain rule. Now, putting all of this into the product rule gives us
$l'(x) = 3x^2\ln(x^2 + 4) + (x^3 + 1)\frac{2x}{x^2 + 4} = 3x^2\ln(x^2 + 4) + \frac{2x(x^3 + 1)}{x^2 + 4},$
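Example 6.13 combines the product rule with the chain rule, so a numerical check is reassuring. A minimal Python sketch, with a central-difference estimate (our choice of approximation) standing in for an independent derivative; the test points are arbitrary.

```python
import math


def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x); an approximation, not an exact derivative."""
    return (func(x + step) - func(x - step)) / (2 * step)


def l(x):
    # l(x) = (x^3 + 1) ln(x^2 + 4)
    return (x**3 + 1) * math.log(x**2 + 4)


def l_prime(x):
    # Product rule on f(x) = x^3 + 1 and g(x) = ln(x^2 + 4),
    # with g'(x) = 2x / (x^2 + 4) from the chain rule
    return 3 * x**2 * math.log(x**2 + 4) + 2 * x * (x**3 + 1) / (x**2 + 4)


for x in [-1.0, 0.0, 1.0, 2.5]:
    print(x, l_prime(x), numerical_derivative(l, x))
```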
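Example 6.14 Find the derivative of the function $l(x) = e^{x^2}\ln(x^3 + 1)$.
This is the product of the two functions $f(x) = e^{x^2}$ and $g(x) = \ln(x^3 + 1)$. To
differentiate g(x), we again use the chain rule with
$g(h) = \ln(h)$ and $h(x) = x^3 + 1$,
which gives us
$g'(h) = \frac{1}{h}$ and $h'(x) = 3x^2$,
so that
$g'(x) = \frac{1}{h}(3x^2) = \frac{3x^2}{h} = \frac{3x^2}{x^3 + 1},$
by the chain rule. Now, putting all of this into the product rule gives us
$l'(x) = 2xe^{x^2}\ln(x^3 + 1) + e^{x^2}\frac{3x^2}{x^3 + 1} = \left(2x\ln(x^3 + 1) + \frac{3x^2}{x^3 + 1}\right)e^{x^2},$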
caused by changing the quantity produced from, say, $q_0$ to $q_0 + \Delta q$, i.e. an increase in
production of $\Delta q$. In this case, the exact expression for the change in costs given this
change in production will be
$\Delta C = C(q_0 + \Delta q) - C(q_0).$
This can be thought of as the marginal (or additional) cost of producing an extra
quantity, $\Delta q$, of our good. But, if $\Delta q$ is small,⁴ we can find an approximation for $\Delta C$
which uses the derivative of the cost function, $C'(q)$, namely
$\Delta C \approx C'(q_0)\Delta q.$
Let’s look at an example to see how the answers from these two approaches compare.
Either way, we can see that the increase in costs resulting from an increase in
production from 50 to 51 units would be about £300.
The reason why we can use the derivative here is that, geometrically, $C'(q_0)$ is the
gradient of the tangent line, T, to the curve y = C(q) at the point $(q_0, C(q_0))$ and so,
looking at this tangent line, we can see that
$C'(q_0) = \left.\frac{dC}{dq}\right|_{q=q_0} \approx \frac{\Delta C}{\Delta q} \implies \Delta C \approx C'(q_0)\Delta q,$
as shown in Figure 6.1. As such, the discrepancy between our exact and approximate
values for $\Delta C$ is the difference between the y-coordinates of the curve y = C(q) and the
tangent line T when $q = q_0 + \Delta q$. Obviously, the smaller $\Delta q$ is, the smaller this
discrepancy will be!
In fact, economists often work with marginal quantities and so, given a function f(x),
we define the marginal function of f to be $f'(x)$. This will allow us to find the
approximate value of $\Delta f$, the change in f associated with a change in x from $x_0$ to
$x_0 + \Delta x$, by using the formula
$\Delta f \approx f'(x_0)\Delta x.$
For example, we can define the following important marginal functions from economics.
⁴ In a sense that we do not exactly specify in this course!
[Figure 6.1: the curve y = C(q), the tangent T and the value $C(q_0 + \Delta q)$.]
Figure 6.1: The curve y = C(q) and the tangent, T, to this curve at the point $(q_0, C(q_0))$.
Looking at an increase in q from $q_0$ to $q_0 + \Delta q$, we can see that the corresponding change
in the function C(q), i.e. $\Delta C$, is given exactly by $C(q_0 + \Delta q) - C(q_0)$ and approximately
by $C'(q_0)\Delta q$ where $C'(q_0)$ is the gradient of the tangent line.
If C(q) is a cost function, $MC(q) = C'(q)$ is the marginal cost function.
Example 6.16 The profit function for a firm is $\pi(q) = 100 + 20q - 2q^2$ pounds
when it is selling a quantity q. If the quantity sold is increased from 10 to 10.2, what
will be the change in profit?
Hence, the profit will decrease by approximately £4 if the quantity sold is increased
from 10 to 10.2 units.
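The calculation in Example 6.16 can be reproduced in a few lines. A minimal Python sketch, assuming $\pi(q) = 100 + 20q - 2q^2$ so that $\pi'(q) = 20 - 4q$; it compares the marginal approximation with the exact change.

```python
def profit(q):
    # pi(q) = 100 + 20q - 2q^2 (in pounds)
    return 100 + 20 * q - 2 * q**2


def marginal_profit(q):
    # pi'(q) = 20 - 4q
    return 20 - 4 * q


q0, dq = 10, 0.2
approx_change = marginal_profit(q0) * dq     # change in pi is roughly pi'(q0) * dq
exact_change = profit(q0 + dq) - profit(q0)  # change in pi is exactly pi(q0 + dq) - pi(q0)
print(approx_change, exact_change)           # about -4.0 and -4.08
```

The exact change is $\pi(10.2) - \pi(10) = 95.92 - 100 = -4.08$, so the approximate decrease of £4 is out by only 8p, consistent with the small step $\Delta q = 0.2$.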
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 6.1
For the following functions, identify the functions f (x) and g(x) such that the function
is f (x)g(x) and hence find the derivative of the function using the product rule.
i. $x^2(x + 2)$; iii. $3x^4\sqrt{x}$;
Exercise 6.2
For the following functions, identify the functions f (x) and g(x) such that the function
is f (x)/g(x) and hence find the derivative of the function using the quotient rule.
i. $\frac{x + 2}{x^2}$; iii. $\frac{4x^2 + 1}{x^3 - 2x}$;
ii. $\frac{32x^5 + 3}{2x^5}$; iv. $\frac{2x^2 + 7}{e^x}$.
Also, in parts i. and ii. check that your answer is correct by rewriting the function and
differentiating it without using the quotient rule. (Note that this check cannot easily be
performed in parts iii. and iv.)
Exercise 6.3
For the following functions, identify the functions f (x) and g(x) such that the function
is f (g(x)) and hence find the derivative of the function using the chain rule.
i. $(x + 2)^2$; iii. $(x^4 + 3)^{-1}$;
ii. $(x^3 + 3x)^2$; iv. $\sqrt[3]{2x - 1}$.
Also, in parts i. and ii. check that your answer is correct by rewriting the function and
differentiating it without using the chain rule. (Note that this check cannot be
performed in parts iii. and iv.)
Exercise 6.4
Differentiate the following functions using the appropriate rules.
i. $x^5(x^8 + x^2)$; iv. $\sqrt{x + \sqrt{x}}$;
ii. $(x^3 + 3)^3$; v. $\frac{x^4}{1 + 2x^6}$;
Exercise 6.5
Differentiate the following functions with respect to the independent variable using
whichever rule is appropriate.
i. $\frac{3}{y + 1}$; iv. $\frac{2z^5}{32z^5 + 3}$;
Exercise 6.6
The level of demand for a product, q, is linked to its price, p, by the equation
$p^2 q = 6{,}000$. By writing q as a function of p and differentiating, estimate how sales will
change if the price is increased from £10 to £10.50.
What is the exact value of the change in sales if the price is increased from £10 to
£10.50?
7. Calculus III — Optimisation
Overview
Having seen how to differentiate functions, we now turn our attention to some
applications of differentiation. In particular, we are interested in what derivatives can
tell us about the behaviour of a function. This will lead on to a study of how we can
optimise a function of one variable, i.e. how we can use differentiation to find the
maximum and/or minimum values of such a function, and how this information is
invaluable when we want to sketch its graph.
Aims
[Figure 7.1, panels (a) and (b): the curve y = f (x) with its tangent line T at the indicated value of x.]
Figure 7.1: As x increases, we see that at the indicated value of x, the function f (x) is
increasing in (a) and decreasing in (b). These correspond to points on the curve where
the tangent line has a positive or negative gradient in (a) and (b) respectively. That is,
the derivative, i.e. $f'(x)$, of the function at these values of x will be positive or negative
respectively.
Quite apart from the application of this idea to optimising functions of one variable,
this idea can be useful in economic contexts as the following example shows.
The firm’s profit will be decreasing when $\pi'(q) < 0$, i.e. when we have
i.e. if q > 5, then the firm’s profit is decreasing as q increases. As such, it would be
unwise for the firm to produce more than five units since this puts them in a
position where their profits are decreasing!
If $f'(x) = 0$, then the function is stationary at that value of x. We call such values
of x stationary points.
In particular, the key idea is that $f'(x)$ tells us the gradient of the tangent line to the
curve at this value of x, labelled T in Figure 7.2, and when $f'(x) = 0$ we find that this
tangent line must be horizontal.
[Figure 7.2, panels (a) and (b): the curve y = f (x) near a stationary point.]
Figure 7.2: Two stationary points, i.e. points where the derivative is zero. Notice that in
(a) the stationary point is a maximum and in (b) it is a minimum.
Indeed, if we look at the stationary point in:
• Figure 7.2(a), as x increases through the point, the function increases until it is
stationary and then it decreases again, i.e. we have
$f'(x) > 0$ until $f'(x) = 0$ and then $f'(x) < 0$,
and in such circumstances we say that the stationary point of the function is a
maximum.
• Figure 7.2(b), as x increases through the point, the function decreases until it is
stationary and then it increases again, i.e. we have
$f'(x) < 0$ until $f'(x) = 0$ and then $f'(x) > 0$,
and in such circumstances we say that the stationary point of the function is a
minimum.
That is, the maxima and minima of a function, f (x), will be amongst its stationary
points, i.e. points where $f'(x) = 0$, and we can identify whether we have found a
maximum or minimum by seeing how the sign of $f'(x)$ changes as we move through the
stationary point.
When looking for stationary points, i.e. finding the values of x where $f'(x) = 0$, there
will be cases where what we find will be neither a maximum nor a minimum. In such
cases, we will have a stationary point which is a point of inflection. Indeed, if we look at
the stationary point in:
• Figure 7.3(a), as x increases through the point, the function increases until it is
stationary and then it increases again, i.e. we have
$f'(x) > 0$ until $f'(x) = 0$ and then $f'(x) > 0$;
• Figure 7.3(b), as x increases through the point, the function decreases until it is
stationary and then it decreases again, i.e. we have
$f'(x) < 0$ until $f'(x) = 0$ and then $f'(x) < 0$.
[Figure 7.3, panels (a) and (b): the curve y = f (x) with its horizontal tangent line T at a stationary point.]
Figure 7.3: Two more stationary points, i.e. points where the derivative is zero. Notice
that in both (a) and (b) the stationary point is a point of inflection.
We will sometimes refer to stationary points which are maxima or minima as turning
points since the function ‘turns’ (or ‘changes direction’) at these points. However,
stationary points that are points of inflection are not turning points.
i.e. we differentiate the derivative again, and we denote the result of doing this by
$\dfrac{d^2 f}{dx^2}$, or more compactly, $f''(x)$.
Unsurprisingly, perhaps, we call this the second derivative of the original function, f (x).
Of course, we could differentiate again to get the third derivative of f (x) and so on, but
the third and higher derivatives of f (x) are not necessary for this course.
$\dfrac{df}{dx} = 3x^2 + 2x + 1$, or more compactly, $f'(x) = 3x^2 + 2x + 1$.
Now, if we want to find the second derivative of f (x), we want to differentiate $f'(x)$,
i.e. we want
$$\frac{d^2 f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d}{dx}\left(3x^2 + 2x + 1\right) = 6x + 2,$$
or, more compactly, $f''(x) = 6x + 2$. Indeed, if we want to calculate the second
derivative at a certain point, say when x = 2, we can now evaluate
$$\left.\frac{d^2 f}{dx^2}\right|_{x=2} = 6 \cdot 2 + 2 = 14,$$
or more compactly, $f''(2) = 6 \cdot 2 + 2 = 14$.
Of course, we could now differentiate this again to get the third derivative of f (x),
but we won’t!
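Repeated differentiation of a polynomial is purely mechanical, so the calculation above is easy to check by machine. The following is a minimal Python sketch, assuming a polynomial is stored as its list of coefficients in increasing powers of x (a representation chosen here for illustration; `differentiate` and `evaluate` are hypothetical helper names). It starts from $f'(x) = 3x^2 + 2x + 1$, so it does not depend on the constant term of f (x).

```python
# A polynomial is stored as its list of coefficients in increasing powers of x,
# so [1, 2, 3] stands for 1 + 2x + 3x^2.

def differentiate(coeffs):
    """Differentiate term by term: c x^k becomes k c x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x (Horner's method)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


first = [1, 2, 3]               # f'(x) = 3x^2 + 2x + 1
second = differentiate(first)   # f''(x) = 6x + 2
print(second)                   # [2, 6]
print(evaluate(second, 2))      # 14
```

Calling `differentiate` once more on `second` would give the third derivative, `[6]`, i.e. the constant 6.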
However, if our stationary point is a point of inflection, we find that $f'(x)$ decreases (or
increases) until it is zero and then it increases (or decreases) as x increases through the
point where x = a, i.e. we find that $f'(x)$ is neither increasing nor decreasing when
x = a, and this means that we will find $f''(a) = 0$.
Warning! But, having said this, do not think that $f''(a) = 0$ implies that a stationary
point is a point of inflection! The fact is that, in cases where $f''(a) = 0$, second
derivatives fail to tell us anything useful about the nature of a stationary point at
x = a. In particular, $f''(a) = 0$ is compatible with a stationary point being a maximum
or a minimum as well as a point of inflection! To see that this is the case, try the
following activity.
Activity 7.1 Consider the functions
Show that all four of these functions have a stationary point at x = 0 (i.e. that their
first derivatives are zero when x = 0) and that their second derivatives are also zero
when x = 0.
By considering how the derivatives of these four functions change as you go through
the stationary point with x = 0, determine their nature.
Deduce that, at a stationary point, a second derivative of zero tells you nothing
about the nature of that stationary point.
x is large [in magnitude] and positive, e.g. when x takes values like 1,000,000 and
even larger positive numbers and we think of this as telling us what happens to
f (x) as x goes to infinity, denoted by x → ∞, which corresponds to places which
are far along the x-axis in the right-hand direction, or
x is large in magnitude and negative, e.g. when x is −1,000,000 and even larger [in
magnitude] negative numbers and we think of this as telling us what
happens as x goes to minus infinity, denoted by x → −∞, which corresponds to
places which are far along the x-axis in the left-hand direction.
In particular, when dealing with polynomials, such as quadratics like
$ax^2 + bx + c$,
for constants a, b and c where $a \neq 0$, and cubics like
$ax^3 + bx^2 + cx + d$,
for constants a, b, c and d where $a \neq 0$, we can easily determine how these functions
behave for ‘large x’. The key to this is to isolate the highest power of x in your
polynomial (so that’s $ax^2$ in the quadratic above and $ax^3$ in the cubic) and then
consider that, if $x^n$ is this highest power, then:
If n is even, your polynomial will become arbitrarily large and positive as x goes to
both +∞ and −∞.
If n is odd, your polynomial will become arbitrarily large and positive as x goes to
+∞ and arbitrarily large [in magnitude] and negative as x goes to −∞.
Of course, multiplying $x^n$ by a constant $a > 0$ will not change this behaviour, but if we
multiply $x^n$ by a constant $a < 0$, then the sign of the large |x| behaviour above will
change.
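The rule above depends only on the degree n and the sign of the leading coefficient a, so it can be written down in a few lines. A minimal Python sketch, assuming coefficients are listed in increasing powers of x (a representation chosen for illustration; the function name `end_behaviour` is hypothetical):

```python
def end_behaviour(coeffs):
    """Describe a polynomial's behaviour for large |x|.

    coeffs lists the coefficients in increasing powers of x, and its last
    entry is the non-zero coefficient a of the highest power x^n (n >= 1).
    Returns (behaviour as x -> +inf, behaviour as x -> -inf).
    """
    n = len(coeffs) - 1
    a = coeffs[-1]
    if n < 1 or a == 0:
        raise ValueError("need degree n >= 1 and a non-zero leading coefficient")
    # For x -> +inf, x^n is large and positive, so the sign of a decides.
    at_plus = "+inf" if a > 0 else "-inf"
    # For x -> -inf, x^n is positive when n is even and negative when n is odd.
    same_sign = (n % 2 == 0)
    at_minus = at_plus if same_sign else ("-inf" if a > 0 else "+inf")
    return at_plus, at_minus


print(end_behaviour([0, 0, -3, 1]))      # x^3 - 3x^2:         ('+inf', '-inf')
print(end_behaviour([0, 0, 2, -4, 2]))   # 2x^4 - 4x^3 + 2x^2: ('+inf', '+inf')
print(end_behaviour([1, -1]))            # 1 - x:              ('-inf', '+inf')
```

Only the last coefficient and the length of the list are inspected, which mirrors the idea of isolating the highest power of x.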
Step 1: Find all the stationary points of the function, i.e. all the values of x that
satisfy the equation
$f'(x) = 0$,
and, if necessary, their corresponding values of y using y = f (x).
Step 2: Determine the nature of the stationary points that you have found by using
one of the following two methods.
Method A: The first-derivative test: If, as x increases through the stationary
point, we find that $f'(x)$ changes from:
• positive to positive, then it is a point of inflection,
• positive to negative, then it is a local maximum,
• negative to positive, then it is a local minimum,
• negative to negative, then it is a point of inflection.
This test will always work.
Method B: The second-derivative test: If, at the stationary point, we find that
$f''(x)$ is:
• negative, then it is a local maximum,
• positive, then it is a local minimum.
This test fails if we find that $f''(x) = 0$ at the stationary point and,
in such cases, the stationary point could be a local maximum, a
local minimum or a point of inflection!
Step 3: If necessary, we may need to identify any global maxima or global minima, i.e.
the largest or smallest values that the function can take over its domain. This
identification will involve some or all of the following:
Identifying the largest of the local maxima and the smallest of the local minima.
If the domain of the function is:
For Step 1, we find the stationary points of the function by solving $f'(x) = 0$ and so,
as
$f'(x) = 3x^2 - 6x$,
we solve the equation
$3x^2 - 6x = 0 \implies 3x(x - 2) = 0 \implies x = 0$ or $x = 2$,
to see that stationary points occur when x = 0 and x = 2. Indeed, at these values of
x, we see that the function itself takes the values
$f(0) = (0)^3 - 3(0)^2 = 0$,
when x = 0, and
$f(2) = (2)^3 - 3(2)^2 = 8 - 12 = -4$,
when x = 2.
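For a cubic, Step 1 always comes down to solving a quadratic, since the derivative of a cubic is a quadratic. A minimal Python sketch, assuming the quadratic formula is used for that equation (the working above factorises instead; both give the same roots), applied to $f(x) = x^3 - 3x^2$, whose derivative is the $f'(x) = 3x^2 - 6x$ used above. The helper name `stationary_points_of_cubic` is hypothetical.

```python
import math


def f(x):
    return x**3 - 3 * x**2


def stationary_points_of_cubic(a, b, c):
    """Real solutions of f'(x) = 3a x^2 + 2b x + c = 0 for f(x) = a x^3 + b x^2 + c x + d.

    The constant term d does not affect f'(x), so it is not needed. Assumes a != 0.
    """
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                         # no real stationary points
    root = math.sqrt(disc)
    return sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})


for x in stationary_points_of_cubic(1, -3, 0):   # f(x) = x^3 - 3x^2
    print(x, f(x))
# 0.0 0.0
# 2.0 -4.0
```

A set is used so that a repeated root (zero discriminant) is reported only once.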
For Step 2, we can determine the nature of these points by using the
second-derivative test. To do this, we see that
$f''(x) = 6x - 6 = 6(x - 1)$,
so that,
Of course, we can also tackle similar problems for more complicated functions as you
can see if you try the next activity.
Activity 7.2 Find and classify the stationary points of the function $g(x) = x^2 e^x$.
And, lastly, if you want to have a look at an example where the second-derivative test
fails at one of the stationary points, try the next activity.
Activity 7.3 Find and classify the stationary points of the function $h(x) = x^3 e^{-x}$.
Example 7.4 Sketch the curve with equation y = f (x) where, as above,
$f(x) = x^3 - 3x^2$.
The x-intercepts of the curve occur when y = 0 and so we have to solve the
equation
$x^3 - 3x^2 = 0 \implies x^2(x - 3) = 0 \implies x = 0$ or $x = 3$.
The stationary points of the curve were found above, i.e. we found that it had a
local maximum at the point (0, 0) and a local minimum at the point (2, −4).
With this information, we can begin to sketch the curve by roughly indicating these
‘key features’ on some axes as in Figure 7.4(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 7.4(b). In particular, notice that
in our sketch we have:
For x > 2, the function is increasing and, as we know that it must cut the x-axis
at x = 3, we expect it to increase in the manner shown as x goes to +∞.
For x < 0, the function is decreasing and as such it will continue decreasing,
going to more and more negative values of y, as x goes to −∞.
As we shall see, it is often useful to spend a moment thinking about what the curve
does away from its ‘key features’ so that we can accurately represent it in our sketch.
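The classification of the stationary points of $f(x) = x^3 - 3x^2$ used in this example can also be confirmed numerically with both tests from Step 2. The following minimal Python sketch assumes formulas for $f'(x)$ and $f''(x)$ are already known; the sampling step h = 0.001 and the function names are hypothetical choices, so treat it as a check rather than a proof.

```python
def fprime(x):
    return 3 * x**2 - 6 * x        # f'(x) for f(x) = x^3 - 3x^2


def fsecond(x):
    return 6 * x - 6               # f''(x)


def first_derivative_test(fp, a, h=1e-3):
    """Method A: compare the sign of f' just before and just after x = a.

    Only reliable if no other stationary point lies within h of a.
    """
    left_positive = fp(a - h) > 0
    right_positive = fp(a + h) > 0
    if left_positive and not right_positive:
        return "local maximum"
    if not left_positive and right_positive:
        return "local minimum"
    return "point of inflection"


def second_derivative_test(fs, a):
    """Method B: returns None when f''(a) = 0, because the test then fails."""
    value = fs(a)
    if value < 0:
        return "local maximum"
    if value > 0:
        return "local minimum"
    return None


for a in (0, 2):
    print(a, first_derivative_test(fprime, a), second_derivative_test(fsecond, a))
# 0 local maximum local maximum
# 2 local minimum local minimum
```

Passing the derivatives of $x^3$ instead ($3x^2$ and $6x$) shows the difference between the tests at x = 0: the first-derivative test reports a point of inflection, while the second-derivative test returns `None`.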
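[Figure 7.4, panels (a) and (b): the key features and the sketch of y = f (x), with x = 2 and x = 3 marked on the x-axis and y = −4 on the y-axis.]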
Example 7.5 Sketch the curve y = f (x) where $f(x) = 2x^4 - 4x^3 + 2x^2$.
We find the key features of this curve according to the list given above, namely:
Classifying the stationary points: Let’s use the second-derivative test here.
We can see that
$f''(x) = 24x^2 - 24x + 4$,
and so, looking at the stationary points, we have
• $f''(0) = 4 > 0$ and so (0, 0) is a local minimum;
• $f''(1/2) = -2 < 0$ and so (1/2, 1/8) is a local maximum; and
• $f''(1) = 4 > 0$ and so (1, 0) is a local minimum.
Limiting behaviour: The term with the highest power of x in f (x) is $2x^4$ and so
$f(x) \to \infty$ as $x \to \infty$ and as $x \to -\infty$.
With this information, we begin to sketch this curve by roughly indicating these key
features on some axes as in Figure 7.5(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 7.5(b).
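Exact arithmetic makes it easy to double-check the three classifications in this example. A minimal Python sketch using the standard-library `fractions.Fraction`, so that 1/2 and 1/8 are represented exactly; the factorisation in the comment, $f'(x) = 4x(2x - 1)(x - 1)$, comes from taking out the common factor $4x$ from $8x^3 - 12x^2 + 4x$.

```python
from fractions import Fraction


def f(x):
    return 2 * x**4 - 4 * x**3 + 2 * x**2


def f1(x):
    return 8 * x**3 - 12 * x**2 + 4 * x      # f'(x) = 4x(2x - 1)(x - 1)


def f2(x):
    return 24 * x**2 - 24 * x + 4            # f''(x)


for x in (Fraction(0), Fraction(1, 2), Fraction(1)):
    assert f1(x) == 0                        # x really is a stationary point
    value = f2(x)
    if value > 0:
        kind = "local minimum"
    elif value < 0:
        kind = "local maximum"
    else:
        kind = "second-derivative test fails"
    print("x =", x, " f(x) =", f(x), " f''(x) =", value, " ->", kind)
```

The output should report local minima at x = 0 and x = 1 and a local maximum at x = 1/2 with f(1/2) = 1/8, matching the bullets above.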
[Figure 7.5, panels (a) The key features and (b) The sketch: x = 1/2 and x = 1 marked on the x-axis and y = 1/8 on the y-axis.]
Figure 7.5: Sketching the curve $y = 2x^4 - 4x^3 + 2x^2$ in Example 7.5. (a) Using what we
have discovered about the key features of the curve, we can begin to see what it must
look like. (b) By joining up these key features with a nice smooth curve, we get the sketch
itself.
In these cases, we are free to consider any value of x and we want to find the largest and
smallest values a function can attain, i.e. its global maximum and its global minimum
respectively, if they exist! In particular, we will need to consider the value of the
function at any stationary points and the behaviour of the function for large |x|.
For the curve in Example 7.4, as sketched in Figure 7.4(b), we see that
although there is a local minimum at (2, −4) and a local maximum at (0, 0), there
is no global minimum as y can take arbitrarily large [in magnitude] negative values
as x goes to −∞ and there is no global maximum as y can take arbitrarily large
positive values as x goes to +∞.
For the curve in Example 7.5, as sketched in Figure 7.5(b), we see that
although there is a local maximum at (1/2, 1/8), there is no global maximum as y
can take arbitrarily large positive values as x goes to −∞ or +∞. We also have
local minima at (0, 0) and (1, 0) and as these both give us the smallest value that
the function can take (i.e. y = 0), we see that both of these points give us a global
minimum.
In these cases, the values of x that we are free to consider are restricted to some interval
such as a ≤ x ≤ b and we want to find the largest and smallest values a function can
attain, i.e. its global maximum and its global minimum respectively, if they exist! In
particular, in these cases, we need to consider the value of the function at any
stationary points and its value at the endpoints of the interval.
For the curve sketched in Example 7.4 with x in the interval 1 ≤ x ≤ 3, as sketched
in Figure 7.6(a), we see there is a local minimum at (2, −4) and this is the global
minimum as y = −4 is the smallest value that the function can take. Also, we see
that there is a global maximum at the endpoint where x = 3 as y = f (3) = 0 is the
largest value that the function can take. (Of course, this global maximum is an
endpoint of the interval but not a stationary point of the function!)
For the curve sketched in Example 7.5 with x in the interval −1/4 ≤ x ≤ 5/4, as
sketched in Figure 7.6(b), we see that the local minima at (0, 0) and (1, 0) still both
give us the smallest value that the function can take (i.e. y = 0), and so both of
these points still give us a global minimum. However, the local maximum at (1/2, 1/8)
is not a global maximum on this interval: as $f(x) = 2x^2(x - 1)^2$, we have
$f(-1/4) = f(5/4) = 25/128$, which is larger than $1/8 = 16/128$, and so the global
maximum, y = 25/128, occurs at both endpoints of the interval.
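On a closed interval, finding the global maximum and minimum comes down to comparing a short list of candidate values. A minimal Python sketch of this for Example 7.4 on 1 ≤ x ≤ 3, assuming the stationary points x = 0 and x = 2 found earlier are supplied by hand (the helper name `global_extrema_on_interval` is hypothetical):

```python
def f(x):
    return x**3 - 3 * x**2                    # the function from Example 7.4


def global_extrema_on_interval(func, stationary_xs, a, b):
    """Global (min, max) of func on a <= x <= b.

    Compares func at the two endpoints and at the stationary points lying in
    the interval. If several candidates share the extreme value, only the
    first one found is reported.
    """
    candidates = [a, b] + [x for x in stationary_xs if a <= x <= b]
    values = {x: func(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])


print(global_extrema_on_interval(f, [0, 2], 1, 3))   # ((2, -4), (3, 0))
```

Here x = 0 is discarded because it lies outside the interval, leaving the global minimum −4 at x = 2 and the global maximum 0 at the endpoint x = 3.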
[Figure 7.6, panels (a) and (b): in (a), x = 1, 2, 3 and y = −2, −4 are marked; in (b), x = −1/4, 1/2, 1, 5/4 and y = 1/8 are marked.]
Figure 7.6: (a) The function from Example 7.4 when we only consider values of x in the
interval 1 ≤ x ≤ 3 and (b) the function from Example 7.5 when we only consider values
of x in the interval −1/4 ≤ x ≤ 5/4. In particular, the dotted parts of these curves are
irrelevant here because they correspond to values of x which are not in the given intervals.
where R(q) is the revenue generated by selling this amount and C(q) is the cost of
producing this amount. Obviously, when doing this, the firm will want to sell an
amount q that will maximise its profit. Indeed, whereas the costs involved are
determined by factors intrinsic to the firm, the revenue generated is given by
R(q) = pq,
where p, the price per unit, is determined by the market the firm is selling in.
As an example, consider the case where the firm is a monopoly, i.e. it is the only supplier
of this product to the market. Indeed, as it is the only supplier and the amount it is
supplying is q, the price that the consumers will be willing to pay for this is given by
$p = p_D(q)$ where $p_D(q)$ is, as in Section 4.2.1, the inverse demand function of the market.
As such, in this case, the revenue generated by the sale of an amount q is given by
$R(q) = q\,p_D(q)$.
Thus, in the case of a monopoly, given the firm’s cost function and the inverse demand
function for the market, we should be able to determine the amount, q, that the firm
should be selling by finding the value of q that maximises the firm’s profit. Let’s look at
an example.
Example 7.6 Suppose that a firm is a monopoly with a cost function given by
$C(q) = q^3 - 10q^2 + 25q + 10$,
and that the inverse demand function for the market is given by
$p_D(q) = 10 - q$.
Here there is an implicit restriction on the values of q that we can consider because
we must have
q ≤ 10 as, otherwise, the price that the consumers pay would be negative.
So, we need to maximise the firm’s profit, i.e.
$\pi(q) = q\,p_D(q) - C(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10) = -q^3 + 9q^2 - 15q - 10$,
and so, as the stationary points occur when $\pi'(q) = 0$, we solve the equation
$-3q^2 + 18q - 15 = 0 \implies -3(q - 1)(q - 5) = 0 \implies q = 1$ or $q = 5$,
to see that the stationary points occur when q = 1 and q = 5. We can then see that
$\pi(1) = -1 + 9 - 15 - 10 = -17$ and $\pi(5) = -125 + 225 - 75 - 10 = 15$,
which means that the maximum occurs at q = 5 because it yields the largest profit.
Thus, q = 5 will maximise the firm’s profit.
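The monopoly calculation can be reproduced numerically. A minimal Python sketch, assuming the quadratic formula for $\pi'(q) = -3q^2 + 18q - 15 = 0$ and, as an extra assumption not stated in the example, that q ≥ 0, so that the endpoints q = 0 and q = 10 are also compared:

```python
import math


def profit(q):
    revenue = q * (10 - q)                     # R(q) = q p_D(q) with p_D(q) = 10 - q
    cost = q**3 - 10 * q**2 + 25 * q + 10      # C(q)
    return revenue - cost


# pi'(q) = -3q^2 + 18q - 15 = 0, solved with the quadratic formula.
A, B, C = -3, 18, -15
root = math.sqrt(B * B - 4 * A * C)
stationary = sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])
print(stationary)                              # [1.0, 5.0]

# Compare the profit at the stationary points and at the ends of 0 <= q <= 10.
candidates = [0, 10] + [q for q in stationary if 0 <= q <= 10]
best = max(candidates, key=profit)
print(best, profit(best))                      # 5.0 15.0
```

The profits at the other candidates are π(0) = −10, π(1) = −17 and π(10) = −260, all smaller than 15.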
Activity 7.4 Sketch the profit function from Example 7.6 and verify that q = 5
does indeed maximise the profit. (Do not try to find the q-intercepts here.)
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 7.1
Use differentiation to find the stationary point of the following quadratic functions and
determine whether it is a local maximum or a local minimum using (a) the first
derivative test and (b) the second derivative test.
Exercise 7.3
A firm has a monopoly on its market and so it can decide the price at which it sells its
product. If it sells the product for price p, then demand is given by the equation
q = 300 − 2p where q is the amount sold. The cost of producing q is given by the
function
$C(q) = 30 + 30q - \dfrac{q^2}{10}$,
and the revenue is given by the function R(q) = pq.
i. Find the revenue function, R(q), in terms of q and hence find the profit function,
π(q), in terms of q.
ii. Calculate the value of q that will give the firm its maximum profit making sure
that you check that this value of q does indeed give you the maximum profit. What
is the maximum profit that the firm makes and what price, p, will provide this?
iii. If the firm can produce at most 120 units, what price will maximise the profit?
8. Calculus IV — Integration
Unit 8: Calculus IV
Integration
Overview
Our last topic in calculus is integration which can be thought of as the ‘opposite’ of
differentiation. In this unit we will see how to find indefinite integrals and explore the
relationship between definite integrals and the area under a curve.
Aims
In such cases, as we are integrating the function f (x) with respect to x, we call it the
integrand. And, similarly to what we saw before, we will see how to find such integrals
by using the rules of integration and some standard integrals. In particular, the standard
integrals will be closely related to our standard derivatives since the key idea behind our
method for finding integrals will be the idea that integration is the process that
‘undoes’ (or ‘reverses’) the process of differentiation, i.e. the process of indefinite
integration can be thought of as antidifferentiation and the resulting indefinite integral
can be thought of as an antiderivative.
Consider the functions F (x) and f (x) where we know that f (x) is the derivative¹ of
F (x), i.e.
$\dfrac{dF}{dx} = f(x)$.
Now, using the idea that integration ‘undoes’ differentiation, i.e. if we integrate f (x)
with respect to x we are looking for a function, F (x), whose derivative is f (x), we can
see that
$\int f(x)\,dx$ must be, more or less, given by F (x).
In such cases, we say that F (x) is an antiderivative of f (x) as opposed to, say, the
indefinite integral.
However, you may wonder why we say that the function, F (x), that we found above is
‘an’, as opposed to ‘the’, antiderivative of f (x). The reason for this is that if, instead of
the function F (x) we had the function F (x) + c where c is a constant, then its
derivative would still be f (x), i.e.
$\dfrac{d}{dx}\big[F(x) + c\big] = f(x)$,
and so, using the reasoning above, we would find that
$\int f(x)\,dx$ can also, more or less, be given by F (x) + c,
where c is a constant. That is, F (x) + c is also an antiderivative of f (x) for this
constant c.
Example 8.1 Show that $4x^2$ and $4x^2 + 1$ are both antiderivatives of 8x.
$4x^2$ is an antiderivative of 8x as we can differentiate $4x^2$ to get 8x. But, similarly, we
can see that $4x^2 + 1$ is also an antiderivative of 8x as we can differentiate $4x^2 + 1$ to
get 8x.
As such, because this works for any constant c we add to F (x), we say that the
indefinite integral gives us a whole family of antiderivatives which only differ by a
constant, i.e. the choice of c. In this way, we say that indefinite integration, i.e. the
process of finding
$\int f(x)\,dx$,
is antidifferentiation, i.e. it seeks all the functions F (x) + c that can be differentiated to
yield f (x) and, as such, every one of these functions will be an antiderivative of f (x).
Example 8.2 What is $\int 8x\,dx$?
We saw in Example 8.1 that $4x^2$ is an antiderivative of 8x. This means that
$\int 8x\,dx = 4x^2 + c$,
where c is an arbitrary (i.e. any) constant. Notice that this works because
differentiating $4x^2 + c$ we get 8x.
¹ We say that it is the derivative because differentiation always yields exactly one answer.
Now that we have the idea, let’s see how we’re going to actually find the indefinite
integrals of the functions that commonly occur in this course.
We have seen how to find indefinite integrals using antiderivatives, but now we want to
explore a more convenient way of finding them. The key idea is that, similar to what we
saw in Unit 5 when we introduced derivatives, we can introduce standard integrals
which tell us how to integrate our basic functions and once we know how to integrate
these, the rules of integration will allow us to integrate combinations of these functions.
Standard integrals
$\int 8x\,dx = 4x^2 + c$,
where c is an arbitrary constant. We now state some results that will allow us to find
the indefinite integrals of our other basic functions.
² As we can add any constant to F (x) to account for the fact that F (x) + c, for any constant c, is also
an antiderivative.
Constant powers of x
If $k \neq -1$ is a constant, we have
$\int x^k\,dx = \dfrac{x^{k+1}}{k+1} + c$,
where c is an arbitrary constant and this works because
$\dfrac{d}{dx}\left(\dfrac{x^{k+1}}{k+1} + c\right) = \dfrac{(k+1)x^k}{k+1} + 0 = x^k$.
In particular, if k = 0, we have
$\int 1\,dx = \int x^0\,dx = x + c$,
where we need the modulus sign in ln |x| as x may be negative but the logarithm
function is only defined for x > 0. This works because, if x > 0, we have |x| = x and so
$\dfrac{d \ln|x|}{dx} = \dfrac{d \ln(x)}{dx} = \dfrac{1}{x}$,
whereas if x < 0, we have |x| = −x and so
$\dfrac{d \ln|x|}{dx} = \dfrac{d \ln(-x)}{dx} = \dfrac{-1}{-x} = \dfrac{1}{x}$,
if we use the chain rule.
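The case x < 0 is the one that is easy to doubt, so a quick numerical check may help. This is a minimal Python sketch using a central-difference approximation to the derivative (a numerical stand-in chosen for illustration, with hypothetical helper names); `math.log` is Python's natural logarithm.

```python
import math


def numerical_derivative(F, x, h=1e-6):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def log_abs(x):
    return math.log(abs(x))        # ln|x|, defined for every x != 0


for x in (-4.0, -0.5, 0.5, 4.0):
    # The approximate derivative should be very close to 1/x, even for x < 0.
    print(x, numerical_derivative(log_abs, x), 1 / x)
```

The sample points are kept well away from x = 0, where ln |x| is not defined.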
However, there is no nice standard integral for ln(x) and so we won’t really discuss the
indefinite integral
$\int \ln(x)\,dx$
in this course. But, if you’re interested in what it is, see Exercise 8.2.
Standard integrals
If $k \neq -1$ is a constant, then $\int x^k\,dx = \dfrac{x^{k+1}}{k+1} + c$.
In particular, if k = 0, we have $\int 1\,dx = \int x^0\,dx = x + c$.
$\int x^{-1}\,dx = \ln|x| + c$.
$\int e^x\,dx = e^x + c$.
$\int \sin(x)\,dx = -\cos(x) + c$.
$\int \cos(x)\,dx = \sin(x) + c$.
We now look at how we can integrate some simple combinations of these functions.
The constant multiple rule tells us how to integrate a constant multiple of a function
f (x) and it works as follows.
$\int \dfrac{7}{x}\,dx = 7\int x^{-1}\,dx = 7\ln|x| + c$.
$\int -4\sin(x)\,dx = -4\int \sin(x)\,dx = -4\big(-\cos(x)\big) + c = 4\cos(x) + c$.
So, in these cases, we just integrate as before and then multiply the answer by the
appropriate constant multiple.
In particular, observe that when using this rule, we integrate to find one of the
antiderivatives and then just add on an arbitrary constant, c, to take care of the
constant of integration.
Activity 8.1 Use antiderivatives to show that the constant multiple rule works.
The sum rule tells us how to integrate the sum of two functions f (x) and g(x) and it
works as follows.
Sum rule
If f and g are functions, then $\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$.
$\int [\sin(x) + \cos(x)]\,dx = \int \sin(x)\,dx + \int \cos(x)\,dx = -\cos(x) + \sin(x) + c$.
$\int \left[\dfrac{1}{x} + e^x\right] dx = \int x^{-1}\,dx + \int e^x\,dx = \ln|x| + e^x + c$.
So, in these cases, we just integrate as before and then add the answers together.
In particular, observe that when using this rule, we integrate to find the two
antiderivatives and then just add on an arbitrary constant, c, to take care of the
constant of integration.
Activity 8.2 Use antiderivatives to show that the sum rule works.
It should be clear that, taken together, our two rules of integration enable us to
integrate functions of the form kf (x) + lg(x) as follows.
If k and l are constants and f (x) and g(x) are functions then
$\int [k f(x) + l g(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx$.
Example 8.5 Clearly, this means that:
$\int [2x + 5]\,dx = 2\int x\,dx + 5\int 1\,dx = 2 \cdot \dfrac{x^2}{2} + 5x + c = x^2 + 5x + c$.
$\int [\sin(x) - \cos(x)]\,dx = \int \sin(x)\,dx - \int \cos(x)\,dx = -\cos(x) - \sin(x) + c$.
$\int \left[\dfrac{3}{x} - 4e^x\right] dx = 3\int x^{-1}\,dx - 4\int e^x\,dx = 3\ln|x| - 4e^x + c$.
So, in these cases, we just integrate as before and then combine the answers in the
obvious way.
In particular, observe that when using this rule, we integrate to find the two
antiderivatives and then just add on an arbitrary constant, c, to take care of the
constant of integration.
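For polynomials, the power rule and the linear combination rule together give a purely mechanical procedure. A minimal Python sketch, assuming coefficients are listed in increasing powers of x (a representation chosen for illustration; the function name `integrate_polynomial` is hypothetical), which reproduces the first integral in Example 8.5 with the arbitrary constant c set to zero:

```python
from fractions import Fraction


def integrate_polynomial(coeffs):
    """Antiderivative of coeffs[0] + coeffs[1] x + coeffs[2] x^2 + ...

    Each term c x^k becomes c x^(k+1) / (k+1) (the power rule, valid as
    k != -1 here), and the terms are combined using the linear combination
    rule. The arbitrary constant is taken to be 0, so the result starts with 0.
    """
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


result = integrate_polynomial([5, 2])   # the integrand 2x + 5
# result represents 0 + 5x + 1x^2, i.e. x^2 + 5x (+ c), as in Example 8.5
```

Exact fractions are used so that coefficients such as 1/2 are not rounded.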
Activity 8.3 Show that the constant multiple rule and the sum rule do indeed give
the linear combination rule.
Hence, use the linear combination rule to find the integral of the function
f (x) − g(x) in terms of the integrals of the functions f (x) and g(x).
Activity 8.4 Use the rules above to find the following integrals.
(a) $\int -3\cos(x)\,dx$, (b) $\int [e^x + \cos(x)]\,dx$, (c) $\int \left[3\sin(x) - \dfrac{3}{x}\right] dx$.
There are, of course, more rules of integration which correspond to the product and
chain rules for differentiation, but these are beyond the scope of this course.
In Section 5.1 we saw that the derivative of a function, f (x), gave us the gradient of the
curve y = f (x). We now consider what the integral of a function, f (x), tells us about
the curve y = f (x) and see how this comes about through the idea of a definite integral.
Recall that an indefinite integral is so-called since, given a function, f (x), and one of its
antiderivatives, F (x), i.e. two functions related by the fact that
$\dfrac{dF}{dx} = f(x)$,
we have
$\int f(x)\,dx = F(x) + c$,
where c is an arbitrary constant. And, indeed, it is this arbitrary constant that makes
this integral indefinite as we do not know what c is. In a similar vein, instead of writing
$\int f(x)\,dx$ we could also write $\int_a^b f(x)\,dx$,
where the constants a and b are called the limits of integration.
In order to work out integrals that look like this we need to know what to do with these
limits and the procedure is:
Firstly: Deal with the integral. Integrating f (x), we take one of its
antiderivatives, F (x), and then write
$\int_a^b f(x)\,dx = \Big[F(x)\Big]_a^b$.
In particular, as we shall see below, observe that we no longer need a constant of
integration.
i.e. the value of the integral depends only on the value of the antiderivative at the
points x = a and x = b. Thus, this is now a definite integral as it no longer involves an
arbitrary constant, c.
if c is a constant. Hence explain why we can omit the constant of integration when
evaluating definite integrals.
Another consequence of this discussion is that it allows us to see how to use our basic
rules of integration to evaluate definite integrals. For instance, if k and l are constants
and f (x) and g(x) are functions, then we can see that the linear combination rule gives
us
$\int_a^b [k f(x) + l g(x)]\,dx = k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx$,
if we are using definite integrals.
Activity 8.6 Following what we saw in Section 8.1.2, write down the constant
multiple rule and the sum rule for definite integrals.
Activity 8.7 Using what we have seen so far, derive the linear combination rule for
definite integrals.
Now that we have the basic idea, let’s see how we can work out a definite integral.
Example 8.6 Evaluate $\int_1^3 (x + 4)\,dx$.
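A minimal Python sketch of evaluating a definite integral from an antiderivative, assuming the antiderivative $F(x) = x^2/2 + 4x$ of $x + 4$ has already been found by hand with the power rule; subtracting the antiderivative's values at the limits, $F(b) - F(a)$, is the standard way of evaluating $[F(x)]_a^b$. Exact fractions avoid rounding, and the helper names are hypothetical.

```python
from fractions import Fraction


def F(x):
    """An antiderivative of f(x) = x + 4, namely x^2/2 + 4x."""
    x = Fraction(x)
    return x**2 / 2 + 4 * x


def definite_integral(antiderivative, a, b):
    """Evaluate [F(x)] from a to b as F(b) - F(a)."""
    return antiderivative(b) - antiderivative(a)


print(definite_integral(F, 1, 3))                       # 12
print(definite_integral(lambda x: F(x) + 7, 1, 3))      # 12: the constant cancels
```

The second line illustrates why no constant of integration is needed: whatever constant is added to F (x) cancels in the subtraction.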
Definite integrals are useful because they tell us about the area under a curve.
Specifically, if we have the definite integral
$\int_a^b f(x)\,dx$,
where $f(x) \geq 0$ for all x such that $a \leq x \leq b$,⁴ we say that we have a non-negative
integrand and find that the value of the integral is the area of the region between the
curve y = f (x), the x-axis and the vertical lines x = a and x = b as illustrated in
Figure 8.1.
[Figure 8.1: the curve y = f (x) above the x-axis, with x = a and x = b marked.]
Figure 8.1: The hatched region is between the curve y = f (x), the x-axis and the vertical
lines x = a and x = b. In cases like this we have a non-negative integrand, i.e. $f(x) \geq 0$
for $a \leq x \leq b$, and so the definite integral $\int_a^b f(x)\,dx$ gives us the area of this hatched
region.
Example 8.7 Find the area of the region between the line y = 4 − 2x, the x-axis
and the vertical lines x = 0 and x = 2 which is illustrated in Figure 8.2(a).
As this is just a right-angled triangle, the area is just ‘half times base times
height’, i.e.
area of triangle = $\dfrac{1}{2} \times 2 \times 4 = 4$.
Thus, the area of the region is four.
⁴ At the moment we will just accept this caveat. The reason why we need f (x) to be non-negative for
values of x between the limits of integration will become clear very soon.
As we have y = f (x) with f (x) = 4 − 2x, we can see from Figure 8.2(a) that
f (x) ≥ 0 between x = 0 and x = 2. So, as noted above, the area should be given
by
$\int_0^2 (4 - 2x)\,dx = \Big[4x - x^2\Big]_0^2 = (4 \times 2 - 2^2) - (4 \times 0 - 0^2) = (8 - 4) - 0 = 4$.
[Figure 8.2, panels (a) and (b): (a) the line y = 4 − 2x with the region for 0 ≤ x ≤ 2 hatched; (b) the parabola y = 4 − x^2 with the region for −1 ≤ x ≤ 1 hatched.]
Figure 8.2: Non-negative integrands. (a) For Example 8.7, the region between the line
y = 4 − 2x, the x-axis and the vertical lines x = 0 and x = 2. (b) For Example 8.8, the
region between the parabola y = 4 − x2 , the x-axis and the vertical lines x = −1 and
x = 1.
However, generally, we won’t have a simple geometric way of finding the area under a
curve and so we will have to use integration.
Example 8.8 Find the area of the region between the parabola y = 4 − x2 , the
x-axis and the vertical lines x = −1 and x = 1 which is illustrated in Figure 8.2(b).
As we have y = f (x) with f (x) = 4 − x2 , we can see from Figure 8.2(b) that
f (x) ≥ 0 between x = −1 and x = 1. So, as noted above, the area should be given by
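$$\int_{-1}^{1} (4 - x^2)\,dx = \left[4x - \frac{x^3}{3}\right]_{-1}^{1} = \left(4(1) - \frac{(1)^3}{3}\right) - \left(4(-1) - \frac{(-1)^3}{3}\right) = \frac{11}{3} - \left(-\frac{11}{3}\right) = \frac{22}{3}.$$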
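The link between the definite integral and area can be checked by adding up the areas of many thin rectangles under the curve. A minimal Python sketch using the midpoint rule (a standard numerical approximation, not a technique from this course; n = 1000 rectangles is an arbitrary choice and the helper name is hypothetical):

```python
def midpoint_riemann_sum(f, a, b, n=1000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


approx = midpoint_riemann_sum(lambda x: 4 - x**2, -1, 1)
print(approx, 22 / 3)   # the two values agree to about six decimal places
```

Increasing n makes the rectangles thinner and brings the approximation closer to 22/3.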
and verify that this does indeed give the correct area.
We now start to consider what happens to the relationship between definite integrals
and areas when we cannot guarantee that the integrand is non-negative. That is, what
happens if we do not have f (x) ≥ 0 for all x such that a ≤ x ≤ b? To simplify matters,
we will start by asking: What happens when this condition always fails? That is, what
happens when the integrand is non-positive as f (x) ≤ 0 for all x such that a ≤ x ≤ b?
Consider the area of the region bounded by the curve y = f (x), the x-axis and the
vertical lines x = a and x = b when we have a non-positive integrand, i.e. when $f(x) \leq 0$
for $a \leq x \leq b$, as illustrated in Figure 8.3. Now, if we note that
$f(x) \leq 0 \iff -f(x) \geq 0$,
we see that the function −f (x) does give us a non-negative integrand and so, following
what we saw above, the area, A, of the region in question is given by
$A = \int_a^b -f(x)\,dx = -\int_a^b f(x)\,dx \implies \int_a^b f(x)\,dx = -A$.
That is, for non-positive integrands, the definite integral gives us minus the area. Thus,
in the case of non-positive integrands, the area is given by the magnitude of the definite
integral. Let’s have a look at an example.
[Figure 8.3: The hatched region is between the curve y = f(x), the x-axis and the vertical lines x = a and x = b. In cases like this we have a non-positive integrand, i.e. f(x) ≤ 0 for a ≤ x ≤ b, and so the definite integral $\int_a^b f(x)\,dx$ gives us minus the area of this hatched region.]
Example 8.9 Find the area of the region between the line y = 4 − 2x, the x-axis
and the vertical lines x = 2 and x = 4 which is illustrated in Figure 8.4(a).
As this is a right-angled triangle with base 2 (from x = 2 to x = 4) and height 4 (as y = −4 when x = 4), the area is ‘half times base times height’, i.e.
$$\text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4.$$
Thus, the area of the region is four.
As we have y = f(x) with f(x) = 4 − 2x, we can see from Figure 8.4(a) that f(x) ≤ 0 between x = 2 and x = 4. So, looking at the definite integral, we get
$$\int_2^4 (4 - 2x)\,dx = \Big[4x - x^2\Big]_2^4 = (4 \times 4 - 4^2) - (4 \times 2 - 2^2) = (16 - 16) - (8 - 4) = -4,$$
which is minus the answer we would expect. As such, we take the magnitude of
this answer and so the area is, again, four.
Consequently, if f (x) ≤ 0 between the vertical lines, the definite integral gives us
minus the area and so we take the magnitude of the definite integral to find the area.
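To see numerically why a non-positive integrand gives minus the area, the sketch below approximates the integral in Example 8.9 with the midpoint rule, i.e. by adding up the signed areas of thin rectangles. The midpoint rule is not covered in this unit and is used here only as an illustrative cross-check; the function name `midpoint_sum` and the choice of 1,000 strips are our own assumptions. Every rectangle has a negative height on 2 ≤ x ≤ 4, so their total is negative and the area is its magnitude.

```python
def midpoint_sum(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b using n equal-width strips,
    each with height f(midpoint of the strip). Heights below the x-axis count as negative."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def f(x):
    return 4 - 2 * x  # the line from Example 8.9


integral = midpoint_sum(f, 2, 4)  # f(x) <= 0 on 2 <= x <= 4
area = abs(integral)              # so the area is the magnitude of the integral
print(round(integral, 6), round(area, 6))
```

For a straight line the midpoint rule is exact, so apart from floating-point rounding this prints -4.0 and 4.0.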
[Figure 8.4: Negative integrands and their relationship to area. The region between the line $y = 4 - 2x$, the x-axis and the vertical lines (a) x = 2 and x = 4 for Example 8.9, and (b) x = 0 and x = 4 for Example 8.10.]
We now consider what happens to the relationship between definite integrals and areas
when we cannot guarantee that the integrand is non-positive or non-negative. That is,
what happens if f(x) ≥ 0 for some x such that a ≤ x ≤ b but not others? Let’s start by
considering the simple case where we have an integrand which is neither non-positive
nor non-negative because there is some number c such that a ≤ c ≤ b where f(x) ≥ 0 for
a ≤ x ≤ c and f(x) ≤ 0 for c ≤ x ≤ b, as illustrated in Figure 8.5. In this case:
the area, say $A_1$, of the hatched region between the vertical lines x = a and x = c is given by the definite integral $\int_a^c f(x)\,dx$, i.e.
$$A_1 = \int_a^c f(x)\,dx,$$
and the area, say $A_2$, of the hatched region between the vertical lines x = c and x = b is given by minus the definite integral $\int_c^b f(x)\,dx$, i.e.
$$A_2 = -\int_c^b f(x)\,dx.$$
As such, the area, say A, of the hatched region between the lines x = a and x = b is now given by
$$A = A_1 + A_2 = \int_a^c f(x)\,dx - \int_c^b f(x)\,dx = \left|\int_a^c f(x)\,dx\right| + \left|\int_c^b f(x)\,dx\right|.$$
In particular, note that in this case we will need to find two different definite integrals
to find the area and not one like we did in the earlier cases!
[Figure 8.5: The hatched region is between the curve y = f(x), the x-axis and the vertical lines x = a and x = b. In cases like this, where we have a non-negative integrand for a ≤ x ≤ c and a non-positive integrand for c ≤ x ≤ b, we need to find two different definite integrals to find the area of the region.]
Thus, for general integrands, the procedure for finding the area of the region bounded
by the curve y = f (x), the x-axis and the vertical lines x = a and x = b is as follows:
Firstly, determine all the points where the curve crosses the x-axis with
x-coordinates between x = a and x = b.
Secondly, use these points to determine (possibly via a sketch) where the curve is
positive and where the curve is negative.
Thirdly, use this information to determine the areas by finding the appropriate
definite integrals (bearing in mind that the integrands will now be either
non-negative or non-positive).
Example 8.10 Find the area of the region between the line y = 4 − 2x, the x-axis
and the vertical lines x = 0 and x = 4 which is illustrated in Figure 8.4(b).
As indicated in Figure 8.4(b), the line y = 4 − 2x crosses the x-axis when x = 2 and
this lies between x = 0 and x = 4. We can also see that the function is non-negative
for 0 ≤ x ≤ 2 and non-positive for 2 ≤ x ≤ 4. As such, using our earlier workings in
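Examples 8.7 and 8.9, we split the total region into two sub-regions to see that:
For the sub-region where 0 ≤ x ≤ 2, we need the definite integral
$$\int_0^2 (4 - 2x)\,dx,$$
which gives us 4 as we saw in Example 8.7. Thus, the area is four here as we have a non-negative integrand.
For the sub-region where 2 ≤ x ≤ 4, we need the definite integral
$$\int_2^4 (4 - 2x)\,dx,$$
which gives us −4 as we saw in Example 8.9. Thus, the area is four here as we have a non-positive integrand.
Consequently, the total area is eight. Notice that if we had just found the single definite integral over the whole interval, we would have got
$$\int_0^4 (4 - 2x)\,dx = \Big[4x - x^2\Big]_0^4 = (16 - 16) - (0 - 0) = 0$$
and, as this is zero, it most definitely is not giving us the area we seek!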
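The three-step procedure above can be sketched in Python for the case where we already know an antiderivative F of f and where the curve crosses the x-axis (so steps one and two have been done by hand). The helper `area_between` is hypothetical, written only for illustration: it splits the interval at the crossings and adds the magnitudes of the definite integrals over the pieces, as in Example 8.10.

```python
def area_between(F, a, b, crossings):
    """Area between y = f(x), the x-axis and the lines x = a and x = b.

    F is an antiderivative of f and `crossings` lists the x-values where the curve
    crosses the x-axis. f keeps one sign between consecutive split points, so each
    piece contributes the magnitude of its own definite integral.
    """
    points = [a] + sorted(c for c in crossings if a < c < b) + [b]
    return sum(abs(F(right) - F(left)) for left, right in zip(points, points[1:]))


def F(x):
    return 4 * x - x ** 2  # antiderivative of f(x) = 4 - 2x (Example 8.10)


print(area_between(F, 0, 4, crossings=[2]))  # area: 8
print(F(4) - F(0))                            # single definite integral: 0
```

Run on Example 8.10, it gives an area of 8, while the single definite integral over 0 ≤ x ≤ 4 is 0.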
Activity 8.9 Verify that the answer to the previous example is correct by finding
the areas of the triangles involved.
Example 8.11 Find the area of the region between the parabola $y = 1 - x^2$, the x-axis and the vertical lines x = −2 and x = 2 which is illustrated in Figure 8.6.
[Figure 8.6: Negative integrands and their relationship to area (continued). For Example 8.11, the region between the parabola $y = 1 - x^2$, the x-axis and the vertical lines x = −2 and x = 2.]
Learning outcomes
At the end of this unit, you should be able to:
find simple indefinite integrals by using standard integrals and the rules of
integration;
Exercises
Exercise 8.1
Find the following indefinite integrals and use differentiation to verify your answer.
i. $\int -17\,dx$;
ii. $\int 27x\,dx$;
iii. $\int 2x^3\,dx$;
iv. $\int 5\sqrt{x}\,dx$;
v. $\int \frac{4}{x}\,dx$;
vi. $\int 5e^x\,dx$;
vii. $\int (3x^2 - 5x + 7)\,dx$;
viii. $\int (3x^{10} + 8x^5 + 4e^x)\,dx$;
ix. $\int \left(3\sqrt{x^3} - 2x^{-1/2}\right)dx$;
x. $\int \left(\frac{2}{x^3} + \frac{3}{2x}\right)dx$.
Exercise 8.2
Differentiate the function F(x) = x ln(x) − x. (You will need to use the product rule!) Hence find
$$\int \ln(x)\,dx,$$
Exercise 8.3
Use the ‘adding powers’ power law and the constant multiple rule to show that
$$\int e^{x+k}\,dx = e^{x+k} + c,$$
Exercise 8.4
Find the indefinite integral
$$\int (2x - 1)^2\,dx,$$
Exercise 8.5
Evaluate the following definite integrals.
i. $\int_7^{15} 2\,dx$;
ii. $\int_0^1 x^5\,dx$;
iii. $\int_2^8 \frac{1}{2x}\,dx$;
iv. $\int_{-1}^0 2e^x\,dx$;
v. $\int_1^4 3\sqrt{x}\,dx$;
vi. $\int_{-1}^2 (4x^3 + 3)\,dx$;
vii. $\int_0^9 x\sqrt{x}\,dx$;
viii. $\int_0^{\pi} \sin(x)\,dx$.
Exercise 8.6
Find the areas between the following curves, the x-axis and the vertical lines x = 1 and x = 3. (You may find it useful to sketch the curve in each case.)
i. $y = x^2 - x - 2$, ii. $y = x^2 - 3$.
9. Financial Mathematics I — Compound interest and its uses
Overview
In this unit we look at some of the basic ideas behind financial mathematics. The key
concept here is compound interest and how this adds value to our savings. We will also
look at different compounding intervals and see how we can use Annual Percentage
Rates (or APRs) to compare investments with different interest rates and compounding
intervals. Lastly, we will see how these ideas also allow us to model the depreciation of
assets over time.
Aims
9.1 Interest
Suppose you deposit a certain amount, called the principal, in a savings account that
offers you a certain rate of interest. Let’s say, for example, that you want to invest £500
in a savings account which pays 12% interest annually (i.e. every year). This means
that, after a year, you will receive 12% of £500 in interest. How much will this be?
Well, we recall that
$$12\% = \frac{12}{100} = 0.12,$$
and so we can see that 12% of £500 is given by
$$0.12 \times 500 = 60.$$
That is, you will accrue (or receive) £60 in interest from investing this principal in this
account for a year and the amount in your account, called the balance, will now be £560.
If we were to leave this money in the account for a second year, it then becomes
necessary to know how the interest is being calculated and there are two types of
interest that we may wish to consider:
146
9. Financial Mathematics I — Compound interest and its uses
Simple interest is where the bank always pays you interest on your principal.
That is, even though the balance is £560 at the end of the first year, you still only
accrue 12% of £500 in interest. As such, under simple interest, your balance at the
end of the second year will be £620 as you will have your original deposit of £500
plus two lots of £60 in interest.
Compound interest is where the bank always pays you interest on your balance.
That is, at the end of the second year you will accrue 12% of £560, i.e. 0.12 × 560 = £67.20,
in interest. As such, under compound interest, your balance at the end of the
second year will be £627.20 as you will now have an additional £67.20 to add to
your previous balance of £560.
Notice, in particular, that compound interest gives us a higher balance at the end of the second year than simple interest because we also get interest on the interest from the previous year. That is, at the end of the first year our balance is
$$£500 + £60 \quad \text{(principal plus first year's interest)},$$
and, after the second year, we get 12% interest on both of these amounts, which gives us an additional £60 from the principal and an additional
$$0.12 \times £60 = £7.20$$
from the first year’s interest, yielding, as expected from above, a total of £67.20 in interest. In this course, we will mainly focus on compound interest as it is the most widely used, but we will occasionally mention simple interest in the activities or when it provides a useful application.
Of course, this process of calculating simple or compound interest can continue for any
number of years and so, instead of working these things out year-by-year, we want to be
able to work with a formula that will tell us the balance of the account after any given
number of years. In particular, to find these formulae, we need to generalise our
discussion so that we are now dealing with the following variables.
1
In particular, even though interest rates will usually be given as a percentage, we want to work
with the decimal. That is, when speaking generally, we will specify an interest rate of 100r% as this
corresponds to the decimal r. (For example, we had 12% above and, as 100(0.12) = 12, this gives us the
decimal r = 0.12 that we used in our calculations.)
Activity 9.1 Suppose that you invest P in an account that pays simple interest at
a rate of 100r% per year. Explain why the balance of this account will be P (1 + nr)
after n years.
After one year, the balance of the account will be P from the initial investment plus Pr from the interest accrued on this investment, i.e. after one year we will have
$$P + Pr = P(1 + r).$$
After two years, the balance of the account will be P(1 + r) from the balance at the end of the first year plus P(1 + r)r from the interest accrued on this balance, i.e. after two years we will have
$$P(1 + r) + P(1 + r)r = P(1 + r)(1 + r) = P(1 + r)^2.$$
After three years, the balance of the account will be $P(1 + r)^2$ from the balance at the end of the second year plus $P(1 + r)^2 r$ from the interest accrued on this balance, i.e. after three years we will have
$$P(1 + r)^2 + P(1 + r)^2 r = P(1 + r)^2(1 + r) = P(1 + r)^3.$$
After n years, the balance of the account will be $P(1 + r)^{n-1}$ from the balance at the end of the (n − 1)th year plus $P(1 + r)^{n-1} r$ from the interest accrued on this balance, i.e. after n years we will have
$$P(1 + r)^{n-1} + P(1 + r)^{n-1} r = P(1 + r)^{n-1}(1 + r) = P(1 + r)^n.$$
A principal, P, invested in an account that pays 100r% interest per year under annual compounding will give a balance of
$$P(1 + r)^n,$$
after n years.
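The two balance formulae, P(1 + nr) for simple interest and $P(1 + r)^n$ for annual compounding, can be compared directly. A minimal Python sketch, assuming interest is credited once a year; the function names are our own. For £500 at 12% per year it reproduces the balances worked out in Section 9.1.

```python
def simple_interest_balance(P, r, n):
    """Balance after n years of simple interest at 100r% per year: P(1 + nr)."""
    return P * (1 + n * r)


def compound_interest_balance(P, r, n):
    """Balance after n years of annual compounding at 100r% per year: P(1 + r)^n."""
    return P * (1 + r) ** n


# £500 at 12% per year, as in Section 9.1
for years in (1, 2):
    simple = simple_interest_balance(500, 0.12, years)
    compound = compound_interest_balance(500, 0.12, years)
    print(f"after {years} year(s): simple £{simple:.2f}, compound £{compound:.2f}")
```

It prints £560.00 for both after one year, and £620.00 (simple) against £627.20 (compound) after two years.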
$$500 \times (1.03)^{4n},$$
since n years is the same as 4n quarters and in each quarter we compound at the
quarterly interest rate. Thus, thinking of this more generally, we get the following result.
A principal, P, invested in an account that pays 100r% interest per year under quarterly compounding will give a balance of
$$P\left(1 + \frac{r}{4}\right)^{4n},$$
after n years. Note that r/4 is the quarterly interest rate and 4n is the number of quarters in n years.
Activity 9.2 Explain why quarterly compounded interest works in this way.
There are, of course, other periods over which compounding can occur, for example:
monthly compounding uses a monthly rate of r/12 and, after the first year, the balance is
$$P\left(1 + \frac{r}{12}\right)^{12},$$
due to the twelve monthly compoundings. After n years this yields a balance of
$$P\left(1 + \frac{r}{12}\right)^{12n},$$
as there are 12n months in n years.
weekly compounding uses a weekly rate of r/52 and, after the first year, the balance is
$$P\left(1 + \frac{r}{52}\right)^{52},$$
due to the fifty-two weekly compoundings. After n years this yields a balance of
$$P\left(1 + \frac{r}{52}\right)^{52n},$$
as there are 52n weeks in n years.
Activity 9.3 Explain why monthly and weekly compounded interest work in this
way.
Activity 9.4 Say I invest £500 at 12% interest per year. What is the balance after
one year if the interest is compounded annually? Quarterly? Monthly? Weekly?
What do you notice about these numbers?
In each case, what is the balance after three years?
In each case, how much interest will you have received after six months?
[Note that, to 5dp, $(1.01)^6 = 1.06152$ and $\left(\frac{1303}{1300}\right)^{26} = 1.06176$.]
Thinking about these examples more generally leads us to the following general result
for compounding over a given interval.
A principal, P, in an account that pays 100r% interest per year where interest is compounded over m equal intervals in each year will give a balance of
$$P\left(1 + \frac{r}{m}\right)^{mn},$$
after n years. Note that r/m is the interest rate for each compounding and mn is the number of compoundings in n years.
And, using this result, we can easily recover all of the compounding results that we have
seen so far.
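The general result can be written as a single function of m. A sketch assuming all m compounding intervals in a year are equal, with a function name (`balance`) of our own choosing; setting m = 1, 4, 12 or 52 recovers the annual, quarterly, monthly and weekly formulae above, and you could use it to check your answers to Activity 9.4.

```python
def balance(P, r, m, n):
    """Balance after n years at 100r% per year, compounded over m equal intervals a year.

    r/m is the rate for each compounding and m*n is the number of compoundings.
    """
    return P * (1 + r / m) ** (m * n)


# £500 at 12% per year after one year (compare Activity 9.4)
for label, m in [("annually", 1), ("quarterly", 4), ("monthly", 12), ("weekly", 52)]:
    print(f"{label:>9}: £{balance(500, 0.12, m, 1):.2f}")
```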
Activity 9.5 Explain why the general result for compound interest over a given
interval works.
Daily compounded interest is always calculated on the assumption that a year has 365
days. But, as we know, in reality, every four years we have a leap year that has 366
days. In the next activity, just for fun, we consider how this would affect the calculation
of daily compounded interest if we were to take this into account.
Activity 9.6 Say I invest £1,000,000 at 12% interest per year at the beginning of a common (i.e. non-leap) year. If the interest is compounded daily, what is the balance at the end of the year?
Say I invest £1,000,000 at 12% interest per year at the beginning of a leap year. If the interest is compounded daily, what is the balance at the end of the year?
What is the balance after four years?
[Note that, to 8dp, $\left(\frac{9128}{9125}\right)^{365} = 1.12747462$ and $\left(\frac{3051}{3050}\right)^{366} = 1.12747468$.]
Thus we can see that if the bank was to compound continuously (or, speaking loosely, if
the value of m was ‘infinitely large’ so that the interest was effectively being
compounded at ‘every instant’), the balance of the account at the end of
A principal, P, in an account that pays 100r% interest per year under continuous compounding will give a balance of
$$Pe^{nr},$$
Clearly, this means that if I invest £500 at 12% interest per year with continuous compounding, then given that $e^{0.12} = 1.127497$ to 6dp, we can see that the balance of the account after one year will be given by
$$500e^{0.12} = 500 \times 1.127497 = 563.7485,$$
or £563.75 (to the nearest penny) which is, we note, more than we would get from any finite number of compoundings.
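To see the balance creep up towards $Pe^{nr}$ as the number of compoundings grows, here is a minimal sketch. The function names are our own and the larger values of m are chosen only for illustration; `math.exp` from Python's standard library supplies $e^{nr}$.

```python
import math


def balance(P, r, m, n):
    """Balance after n years with m equal compoundings a year: P(1 + r/m)^(mn)."""
    return P * (1 + r / m) ** (m * n)


def continuous_balance(P, r, n):
    """Balance after n years under continuous compounding: P e^(nr)."""
    return P * math.exp(n * r)


# £500 at 12% per year for one year, with more and more compoundings
for m in (1, 12, 365, 10_000, 1_000_000):
    print(f"m = {m:>9,}: £{balance(500, 0.12, m, 1):.4f}")
print(f"continuous:  £{continuous_balance(500, 0.12, 1):.2f}")
```

The last line prints £563.75, in agreement with the calculation above, and every finite-m balance printed before it is smaller.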
Activity 9.7 If I invest £500 at 12% interest per year with continuous
compounding, what will be the balance of the account after (i) two years, (ii) six
years and (iii) n years?
[Note that, to 6dp, $e^{0.24} = 1.271249$.]
to 4dp if we use the fact that $(6/5)^{1/5} = 1.0371$ to 4dp. This means that the interest
rate needs to be at least 3.8% (to 1dp) if we want to ensure that we meet, or rather just
exceed, our target.
given that we are comparing the investments over one year. Then, if we were given the
relevant information, say that
$$\left(1 + \frac{0.1}{365}\right)^{365} = 1.1052$$
to 4dp, we would find that
$$1 + r^* = 1.1052 \implies r^* = 0.1052 = 10.52\%,$$
is the APR. And, similarly, in the case of the account where we have an interest rate of 10.1% per year with quarterly compounding, investing £1 would give us
$$\text{return from account} \longrightarrow \left(1 + \frac{0.101}{4}\right)^4 = 1 + r^* \longleftarrow \text{return from annual compounding},$$
given that we are comparing the investments over one year. Then, if we were given the relevant information, say that
$$\left(1 + \frac{0.101}{4}\right)^4 = 1.1049$$
to 4dp, we would find that
$$1 + r^* = 1.1049 \implies r^* = 0.1049 = 10.49\%,$$
is the APR. Thus, as we get a better return (i.e. a higher APR) from the account where
I have an interest rate of 10% per year with daily compounding, we should opt for this
one. In particular, notice that here, the higher number of compoundings
overcompensates for the fact that this account has a slightly lower interest rate! To
summarise then, we have the following result.
An account that pays 100r% interest per year where interest is compounded over m equal intervals in each year has an APR of
$$\left(1 + \frac{r}{m}\right)^m - 1,$$
as a decimal.
If the interest is continuously compounded, then the APR is
$$e^r - 1,$$
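The APR formulae can be sketched as one Python function. This is an illustration only: the convention that `m=None` stands for continuous compounding is our own. Applied to the two accounts compared above, it reports each APR as a percentage to two decimal places.

```python
import math


def apr(r, m=None):
    """APR, as a decimal, of 100r% per year compounded over m equal intervals a year.

    m=None is used here (our own convention) to mean continuous compounding: e^r - 1.
    """
    if m is None:
        return math.exp(r) - 1
    return (1 + r / m) ** m - 1


daily = apr(0.10, 365)     # 10% per year, daily compounding
quarterly = apr(0.101, 4)  # 10.1% per year, quarterly compounding
print(f"daily: {daily:.2%}, quarterly: {quarterly:.2%}")
```

It prints 10.52% for the daily account and 10.49% for the quarterly account, so the daily account gives the better return.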
Activity 9.8 You want to invest some money for a year and are given the choice
between two accounts that use monthly compounding. Given that one account offers
an interest rate of 5% per year and the other account offers an interest rate of 6%
per year for the first three months and 4% per year for the rest of the year, find their
APRs and decide which gives the best return.
[Note that, to 5dp, $\left(\frac{241}{240}\right)^{12} = 1.05116$, $\left(\frac{201}{200}\right)^3 = 1.01508$ and $\left(\frac{301}{300}\right)^9 = 1.03040$.]
9.3 Depreciation
Often, when you buy an asset, e.g. a car, its value depreciates (or goes down) over time.
For example, if you buy a car for £10,000 and you know that a car depreciates at a rate of 5% per year, its value after one year is given by
$$10{,}000 - 5\% \text{ of } 10{,}000 = 10{,}000 - 10{,}000 \times 0.05 = 10{,}000(1 - 0.05) = 10{,}000 \times 0.95,$$
which is £9,500. To find the value after two years we follow a similar procedure to get
$$9{,}500 - 5\% \text{ of } 9{,}500 = 9{,}500(1 - 0.05) = 10{,}000(0.95)^2,$$
which is £9,025. Clearly, this generalises, so that after n years the value of the car is given by $10{,}000(0.95)^n$ pounds.
Thus, the idea behind depreciation is that the rate of depreciation acts like a compound
interest rate, but whereas with compound interest we add the effect of the interest rate,
when we look at depreciation we need to subtract to get the effect of the rate of
depreciation as the value is decreasing over time. And, as we saw above, this means that
we can use the same formulae, but now the rate of depreciation which is the positive
number, r, needs to be replaced by the negative number −r. As such, we have the
following result.
Compound depreciation
If the initial value of an asset is V and it depreciates at a rate of 100r% per year where depreciation is compounded over m equal intervals in each year, then its value will be
$$V\left(1 - \frac{r}{m}\right)^{mn},$$
after n years. Here r/m is the rate of depreciation for each compounding and mn is the number of compoundings in n years.
If this asset depreciates continuously, its value after n years is
$$Ve^{-nr},$$
Example 9.1 A computer is bought for £1,000 and its value depreciates continuously at a rate of 40% per year. How much will the computer be worth after six months?
As the value of the computer is depreciating continuously at a rate of 40% for six months, which is half of a year, its value after that time is given by
$$1{,}000e^{-0.5 \times 0.4} = 1{,}000e^{-0.2} = 1{,}000 \times 0.81873 = 818.73,$$
where we have used the fact that $e^{-0.2} = 0.81873$ to 5dp. That is, the computer will be worth £818.73 after six months.
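Since depreciation is compound interest with r replaced by −r, the same pattern of code works. A minimal sketch with our own function names, checked against the car example (£10,000 at 5% per year for two years) and Example 9.1.

```python
import math


def depreciated_value(V, r, m, n):
    """Value after n years of an asset worth V that depreciates at 100r% per year,
    compounded over m equal intervals a year: V(1 - r/m)^(mn)."""
    return V * (1 - r / m) ** (m * n)


def continuous_depreciated_value(V, r, n):
    """Value after n years of continuous depreciation at 100r% per year: V e^(-nr)."""
    return V * math.exp(-n * r)


print(f"car after two years: £{depreciated_value(10_000, 0.05, 1, 2):,.2f}")
print(f"computer after six months: £{continuous_depreciated_value(1_000, 0.40, 0.5):,.2f}")
```

This prints £9,025.00 and £818.73, matching the values in the text.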
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 9.1
You invest P pounds in a savings account that pays 5% interest per year using annual
compounding.
(i) Write down, in terms of P , the amount that will be in the account after one, two
and three years.
(ii) If, after two years, the amount in the account is £1,764, how much did you
initially invest?
Exercise 9.2
Find the value of a principal sum of £10,000 invested at an interest rate of 12% per
year for three years when the interest is compounded (i) annually, (ii) quarterly, (iii)
monthly, and (iv) continuously.
What is the APR of each of these investments?
[Note that, to 7dp, $(1.03)^4 = 1.1255088$, $(1.01)^{12} = 1.1268250$ and $e^{0.12} = 1.1274969$.]
Exercise 9.3
Two investments are made and it is given that the principal of one is 80% of the other.
If the smaller principal is put into an account where interest is paid at 5% per year
using continuous compounding and the larger principal is put into an account where
interest is paid at 2% per year using continuous compounding, how long will it take for
the two accounts to have the same balance?
[Note that, to 5dp, ln(0.8) = −0.22314.]
Exercise 9.4
A car is worth £20,000 brand-new, but its value depreciates continuously at a rate of
20% per year.
10. Financial Mathematics II — Applications of series
Overview
In this final Mathematics unit, we look at some more complicated ideas in financial
mathematics. The key concept here is a geometric series and how this allows us to deal
with regular savings plans and annuities. We will also see how to compare the value of
different investment strategies using the idea of present values.
Aims
To see how geometric series can be used to model certain kinds of investment.
2, 5, 8, 11, . . .
where here, the list is ordered because we consider 2 to be the first term, 5 to be the
second term and so on. Indeed, we could think of this sequence of numbers as what we
get when we start with two and then add three to the previous term to get each
successive term. As we could continue to do this indefinitely, we use the ‘. . .’ to
indicate that this list of numbers goes on forever. In this course, we will be interested in
two special types of sequence and what we get when we add up some (or all) of the
terms in such a sequence.
a, a + d, a + 2d, a + 3d, . . .
which is generated by adding the number d to each term to get the next term. Observe
that we call d the common difference because we move from one term of the sequence to
the next by adding d.
Of course, we have seen this kind of thing before since taking the first term to be P and
the common difference to be P r, we get the arithmetic sequence
P, P + P r, P + 2P r, P + 3P r, . . .
which is, for principal P and an interest rate of 100r% per year, the initial balance
followed by the balance after one year, two years, three years, . . . under simple interest.
Of course, this means that the balance after n years will be given by P + nP r, or
P (1 + nr), as we saw in Activity 9.1.
a, a + d, a + 2d, a + 3d, . . .
a + (a + d) + (a + 2d).
S = a + (a + d) + (a + 2d),
S = (a + 2d) + (a + d) + a,
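so that, adding the corresponding terms in these two expressions for S together, we get
$$2S = (2a + 2d) + (2a + 2d) + (2a + 2d) = 3(2a + 2d), \quad \text{i.e.} \quad S = \frac{3}{2}(2a + 2d).$$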
a, a + d, a + 2d, a + 3d, . . .
we can find a formula for the sum of any arithmetic series. If we do this, we get the
following result.
a + (a + d) + (a + 2d) + · · · + (a + [n − 1]d),
where a is the first term of the series, d is the common difference and n is the number
of terms is
$$\frac{n}{2}\big(2a + [n - 1]d\big).$$
Notice that we can write 2a + [n − 1]d as a + (a + [n − 1]d),
and so we have
$$a + (a + d) + (a + 2d) + \cdots + (a + [n - 1]d) = \frac{n}{2}\big(a + (a + [n - 1]d)\big) = n\,\frac{a + (a + [n - 1]d)}{2}.$$
So, noting that n is the number of terms, a is the first term of the series and a + [n − 1]d
is the last term in the series, this means that the sum of the arithmetic series
can be thought of as ‘the number of terms multiplied by the average of the first and last
terms of the series’.
Here the first term is 1 and we have to add three to get each successive term and so
that is the common difference. So, as there are five terms in the series, we can use
the formula to see that
$$1 + 4 + 7 + 10 + 13 = \frac{5}{2}\big(2(1) + [5 - 1](3)\big) = \frac{5}{2} \times 14 = 35.$$
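The formula can be checked against adding the terms one at a time. A minimal Python sketch, assuming integer a and d (then n(2a + [n − 1]d) is always even, so integer division by 2 is exact); the function names are our own.

```python
def arithmetic_series_sum(a, d, n):
    """Sum of a + (a + d) + ... + (a + [n - 1]d) using (n/2)(2a + [n - 1]d).

    Assumes integer a and d: n(2a + [n - 1]d) is then always even, so // is exact.
    """
    return n * (2 * a + (n - 1) * d) // 2


def arithmetic_series_direct(a, d, n):
    """The same sum found by adding the n terms one at a time."""
    return sum(a + k * d for k in range(n))


print(arithmetic_series_sum(1, 3, 5), arithmetic_series_direct(1, 3, 5))  # 35 35
```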
Activity 10.1 Find the sum of the whole numbers from 1 to 100.
If n is a whole number, what is the sum of the whole numbers from 1 to n?
Activity 10.2 Suppose that you have an eccentric aunt who, starting in 2000, gives
you a cash gift every year and the amount you get (in pounds) is given by the year.
(So, in 2000 you get a gift of £2,000 and in 2001 you get a gift of £2,001, etc.) If
you save all of these gifts in your money box, how much will you have after you have
received the gift in 2013?
How much will you have in your money box after you have received n of these gifts?
which is generated by multiplying each term by the common ratio to get the next term.
Observe that we call r the common ratio because we move from one term of the
sequence to the next by multiplying by r.
Of course, we have seen this kind of thing before since taking the first term to be P and
the common ratio to be 1 + r, we get the sequence
P, P (1 + r), P (1 + r)2 , P (1 + r)3 , . . .
which is, for a principal P and a 100r% per year interest rate, the initial balance
followed by the balance after one year, two years, three years, . . . under annual
compounding.
A geometric series is what we get when we ‘add up’ a certain number of successive
terms from a geometric sequence. For instance, if we were to add up the first three
terms of the geometric sequence
$$a, ar, ar^2, ar^3, ar^4, \ldots$$
we would want to find the sum of the geometric series
$$a + ar + ar^2.$$
We can easily find this sum, let’s call it S, by writing
$$S = a + ar + ar^2,$$
and then multiplying this whole expression by the common ratio, r, to get
$$rS = ar + ar^2 + ar^3,$$
so that, subtracting the second expression from the first, we get
$$S - rS = a - ar^3,$$
as all the intermediate terms cancel. Taking out the common factor of S on the
left-hand side and a on the right-hand side then gives
$$S(1 - r) = a(1 - r^3).$$
we can find a formula for the sum of any geometric series with a finite number of terms.
If we do this we get the following result.
$$a + ar + ar^2 + \cdots + ar^{n-1},$$
where a is the first term, r is the common ratio and n is the number of terms is
$$a\,\frac{1 - r^n}{1 - r},$$
provided that r ≠ 1. If r = 1, the sum is $an$ instead.
This geometric series has six terms where the first term is 1 and the common ratio is 2. Thus, using the formula, we have
$$1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = 1 \times \frac{1 - 2^6}{1 - 2} = \frac{1 - 2^6}{-1} = 2^6 - 1 = 63,$$
as the sum of this series. This can be verified by adding up the terms on your
calculator.
1
So that we are not dividing by zero!
We notice that each successive term of this series is multiplied by two and so this
geometric series, which has five terms, can be written as
$$3 + 6 + 12 + 24 + 48 = 3 + 3 \times 2 + 3 \times 2^2 + 3 \times 2^3 + 3 \times 2^4,$$
which means that the first term is 3 and the common ratio is 2. Thus, using the formula, we have
$$3 + 6 + 12 + 24 + 48 = 3 \times \frac{1 - 2^5}{1 - 2} = 3 \times \frac{-31}{-1} = 93,$$
as the sum of this series. This can be verified by adding up the terms on your
calculator.
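Example 10.4 Sum the geometric series $\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16}$.
We note that this geometric series has four terms where the first term is 1/2 and, as we can write it as
$$\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16} = \frac{1}{2} + \frac{1}{2}\left(-\frac{1}{2}\right) + \frac{1}{2}\left(-\frac{1}{2}\right)^2 + \frac{1}{2}\left(-\frac{1}{2}\right)^3,$$
we can see that the common ratio is −1/2. Thus, using the formula, we have
$$\frac{1}{2} \times \frac{1 - \left(-\frac{1}{2}\right)^4}{1 - \left(-\frac{1}{2}\right)} = \frac{1}{2} \times \frac{1 - \frac{1}{16}}{\frac{3}{2}} = \frac{1}{3}\left(1 - \frac{1}{16}\right) = \frac{1}{3} \times \frac{15}{16} = \frac{5}{16},$$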
as the sum of this series. This can be verified by adding up the terms on your
calculator.
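Instead of a calculator, the finite geometric series formula, including the r = 1 case, can be sketched in Python. This is an illustration under our own naming; passing `fractions.Fraction` values keeps the arithmetic exact, so the three examples above come out as whole numbers or exact fractions, and each is also compared with a term-by-term sum.

```python
from fractions import Fraction


def geometric_series_sum(a, r, n):
    """Sum of a + ar + ... + ar^(n-1), using a(1 - r^n)/(1 - r) when r != 1."""
    if r == 1:
        return a * n  # n copies of a
    return a * (1 - r ** n) / (1 - r)


def geometric_series_direct(a, r, n):
    """The same sum found by adding the n terms one at a time."""
    return sum(a * r ** k for k in range(n))


examples = [
    (Fraction(1), 2, 6),                   # 1 + 2 + ... + 2^5
    (Fraction(3), 2, 5),                   # 3 + 6 + 12 + 24 + 48
    (Fraction(1, 2), Fraction(-1, 2), 4),  # 1/2 - 1/4 + 1/8 - 1/16
]
for a, r, n in examples:
    assert geometric_series_sum(a, r, n) == geometric_series_direct(a, r, n)
    print(geometric_series_sum(a, r, n))
```

It prints 63, 93 and 5/16.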
Sometimes, we can make sense of what happens when we have an infinite number of
terms in our geometric series. In such cases, we want to find the value of
$$a + ar + ar^2 + ar^3 + \cdots,$$
and here, the absence of a last term in the series is supposed to indicate that it ‘goes on
forever’ or that it has an infinite number of terms. To see what the sum of such an
infinite geometric series would be, we recall that if we just took the first n terms of this
series, the sum would be given by
$$a\,\frac{1 - r^n}{1 - r},$$
and we want to see what happens to this formula if we let n go off to infinity.
In particular, if |r| < 1, then $r^n$ gets smaller [in magnitude] as n gets larger. This means that, as n goes to infinity, $r^n$ will go to zero, and so our formula will give us
$$a\,\frac{1 - 0}{1 - r} = \frac{a}{1 - r},$$
as the sum of our geometric series with an infinite number of terms.
However, if |r| > 1, then $r^n$ gets larger [in magnitude] as n gets larger. This means that, as n goes to infinity, the magnitude of $r^n$ will go to infinity too and so we will not be able to make any sense of the formula. In such cases, we say that the sum of the infinite geometric series ‘does not exist’.²
To summarise, we have the following formula which allows us to sum an infinite geometric series when |r| < 1:
$$a + ar + ar^2 + ar^3 + \cdots = \frac{a}{1 - r}.$$
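Example 10.5 Sum the infinite geometric series $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$.
This geometric series has an infinite number of terms, the first term is 1/2 and we can write it as
$$\frac{1}{2} + \frac{1}{2}\left(\frac{1}{2}\right) + \frac{1}{2}\left(\frac{1}{2}\right)^2 + \frac{1}{2}\left(\frac{1}{2}\right)^3 + \cdots,$$
so the common ratio is 1/2. As |1/2| < 1, we can use the formula to see that
$$\frac{\frac{1}{2}}{1 - \frac{1}{2}} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1.$$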
Example 10.6 Sum the infinite geometric series $\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16} + \cdots$.
This geometric series has an infinite number of terms, the first term is 1/2 and we can write it as
$$\frac{1}{2} + \frac{1}{2}\left(-\frac{1}{2}\right) + \frac{1}{2}\left(-\frac{1}{2}\right)^2 + \frac{1}{2}\left(-\frac{1}{2}\right)^3 + \cdots,$$
so the common ratio is −1/2. As |−1/2| = 1/2 < 1, we can use the formula to see that
$$\frac{\frac{1}{2}}{1 - \left(-\frac{1}{2}\right)} = \frac{\frac{1}{2}}{\frac{3}{2}} = \frac{1}{3}.$$
2
For reasons we won’t go into here, the sum of an infinite geometric series doesn’t exist when |r| = 1 either. The r = 1 case is obvious, but the r = −1 case is harder to understand.
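Partial sums make the limiting argument concrete. A minimal sketch (function names are our own) that compares the sum of the first n terms with a/(1 − r) for Examples 10.5 and 10.6, and that refuses to return a value when |r| ≥ 1, in line with the statement that the sum does not exist there.

```python
from fractions import Fraction


def partial_sum(a, r, n):
    """Sum of the first n terms a + ar + ... + ar^(n-1)."""
    return sum(a * r ** k for k in range(n))


def infinite_sum(a, r):
    """a/(1 - r); only meaningful when |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the sum of this infinite geometric series does not exist")
    return a / (1 - r)


half = Fraction(1, 2)
for a, r in [(half, half), (half, -half)]:  # Examples 10.5 and 10.6
    limit = infinite_sum(a, r)
    gaps = [float(abs(limit - partial_sum(a, r, n))) for n in (5, 10, 20)]
    print(limit, gaps)
```

The printed limits are 1 and 1/3, and the gap between each limit and the partial sums shrinks rapidly as n grows.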
At the end of the first year, the balance of the account is 600(1.12).
At the beginning of the second year, another £600 is added to the account making
the balance 600 + 600(1.12) and so, at the end of the second year, the balance is
$$[600 + 600(1.12)](1.12) = 600(1.12) + 600(1.12)^2.$$
At the beginning of the third year, another £600 is added to the account making the balance $600 + 600(1.12) + 600(1.12)^2$ and so, at the end of the third year, the balance is
$$[600 + 600(1.12) + 600(1.12)^2](1.12) = 600(1.12) + 600(1.12)^2 + 600(1.12)^3.$$
Now, if we add another £600 at the beginning of the fourth year, this means that the balance of the account is now
$$600 + 600(1.12) + 600(1.12)^2 + 600(1.12)^3.$$
This is a geometric series of four terms with a first term of 600 and a common ratio of 1.12 which means that, using the formula above, the balance we seek is given by
$$600 \times \frac{1 - (1.12)^4}{1 - 1.12} = 600 \times \frac{1 - (1.12)^4}{-0.12} = 5{,}000[(1.12)^4 - 1] = 2{,}867.595,$$
or £2,867.60 (to the nearest penny) if we use the fact that, to 6dp, $1.12^4 = 1.573519$.
Similarly, if we wanted to follow this investment scheme for a longer period of time, for
example if we wanted to calculate the balance at the beginning of the twenty-sixth year
(just after that year’s £600 has been invested), we would need to sum the geometric
series
$$600 + 600(1.12) + 600(1.12)^2 + 600(1.12)^3 + \cdots + 600(1.12)^{25},$$
which has 26 terms. So, again using our formula, the balance is given by
$$600 \times \frac{1 - (1.12)^{26}}{1 - 1.12} = 5{,}000[(1.12)^{26} - 1] = 90{,}200.36,$$
pounds (to the nearest penny) if we use the fact that, to 6dp, $1.12^{26} = 19.040072$.
Indeed, more generally, we can see that if we wanted to calculate the balance at the beginning of the nth year (just after that year’s £600 has been invested), we would need to sum the geometric series
$$600 + 600(1.12) + 600(1.12)^2 + \cdots + 600(1.12)^{n-1},$$
which has n terms. And so, using the formula again, we see that
$$600 \times \frac{1 - (1.12)^n}{1 - 1.12} = 5{,}000[(1.12)^n - 1],$$
is the balance of the account at the beginning of the nth year.
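The year-by-year reasoning above can be checked numerically. In this minimal sketch (the function and parameter names are our own), one function repeats the step "earn a year's interest at 12%, then add the next £600". The other uses the geometric-series formula. The two should agree for every $n$.

```python
def balance_by_simulation(n, deposit=600.0, rate=0.12):
    """Balance at the start of year n, just after that year's deposit.

    Follows the year-by-year reasoning: the balance earns a year's interest,
    then the next deposit is added.
    """
    balance = deposit  # the deposit made at the start of year 1
    for _ in range(n - 1):
        balance = balance * (1 + rate) + deposit
    return balance


def balance_by_formula(n, deposit=600.0, rate=0.12):
    """The same balance as a geometric series: deposit * (1 - R^n) / (1 - R), R = 1 + rate."""
    ratio = 1 + rate
    return deposit * (1 - ratio**n) / (1 - ratio)


for n in (4, 26):
    print(n, round(balance_by_simulation(n), 2), round(balance_by_formula(n), 2))
```

The simulation is the direct translation of the argument in the text. The formula is what makes the answer for large $n$ (such as the twenty-sixth year) quick to obtain by hand.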
10.2.2 Annuities
If we invest a certain amount of money, $P$, in a bank account that pays annually
compounded interest at a rate of $100r\%$ per year, we may want to set up an annuity.
This is where, at the end of each of the next $n$ years, we receive a payment of $I$ from
the account. The question then is, under these circumstances, how much can we afford
to withdraw each year? If we withdraw too much or for too long a time, the money in
the account will run out. But, if we withdraw too little or for too short a time, we have
put too much money in the account. How can we model an annuity so that we can be
sure that we are investing in a wise and sustainable way?
For example, suppose that we decide to invest £10,000 in an account which pays
annually compounded interest at a rate of 5% per year in order to set up an annuity
that will pay £I at the end of each year for the next ten years. What, we may ask, is
the balance of the account after this annuity’s last payment?
Well, consider that the balance in the account can be modelled as follows. Given an
initial investment of £10,000, we can see that:
At the end of the first year, the balance in the account is $10{,}000(1.05)$ and so, if we
make our first withdrawal of $I$, the balance is now $10{,}000(1.05) - I$.
At the end of the second year, the balance of the account is
and, in particular, if we consider the series in the big square brackets we see that we
have
$$1 + 1.05 + \cdots + 1.05^8 + 1.05^9,$$
which is a geometric series with first term one, common ratio 1.05 and ten terms. So,
using the formula above, we see that this gives us
$$1 \times \frac{1-1.05^{10}}{1-1.05} = \frac{1-1.05^{10}}{-0.05} = -20\left(1-1.05^{10}\right),$$
and so the balance we seek is given by
$$B = 10{,}000(1.05)^{10} - I\left[-20\left(1-1.05^{10}\right)\right] = 10{,}000(1.05)^{10} - 20I\left[1.05^{10} - 1\right].$$
We can now ask, with this annuity, how big can the withdrawals be? The key to
answering this question is to note that if our annual withdrawal, I, is too big, then at
some point before this ten year period has elapsed, the account will run out of money
and the balance will become negative. That is, if I is too big, the bank will stop
allowing us to make the withdrawals and the annuity will fail to achieve its purpose. So,
we need to see what values of I give us a balance, B, which is still non-negative after
ten years. But, if we need $B \geq 0$, this means that we must have
$$10{,}000(1.05)^{10} - 20I\left[1.05^{10} - 1\right] \geq 0,$$
and this can be rearranged to give us
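$$10{,}000(1.05)^{10} \geq 20I\left[1.05^{10} - 1\right] \implies \frac{10{,}000(1.05)^{10}}{20\left[1.05^{10} - 1\right]} \geq I,$$
as $1.05^{10} - 1 > 0$. This means that we have
$$I \leq \frac{500(1.05)^{10}}{1.05^{10} - 1} = 1{,}295.0453,$$
if we use the fact that, to 6dp, $1.05^{10} = 1.628895$. That is, the maximum withdrawal we
can make each year is £1,295.04. Here we round down rather than to the nearest penny, since
withdrawing £1,295.05 would break the inequality.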
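The annuity model above can be sketched in Python. This version generalises the worked figures ($P = £10{,}000$, 5%, ten years) to any principal, rate and term. The function names are our own. The closed form uses $\left[(1+r)^n - 1\right]/r$ for the bracketed geometric series; with $r = 0.05$ the factor $1/r$ is the 20 in the text. At full precision the bound is about 1,295.0457, rather than the 1,295.0453 obtained from the 6dp value of $1.05^{10}$. Both round down to £1,295.04.

```python
import math


def balance_after_withdrawals(principal, rate, withdrawal, years):
    """Year-by-year balance: a year's interest is added, then I is withdrawn."""
    balance = principal
    for _ in range(years):
        balance = balance * (1 + rate) - withdrawal
    return balance


def closed_form_balance(principal, rate, withdrawal, years):
    """B = P(1+r)^n - I[(1+r)^n - 1]/r; with r = 0.05 the factor 1/r is the 20 above."""
    growth = (1 + rate) ** years
    return principal * growth - withdrawal * (growth - 1) / rate


def max_withdrawal(principal, rate, years):
    """Largest I keeping B >= 0: P r (1+r)^n / ((1+r)^n - 1)."""
    growth = (1 + rate) ** years
    return principal * rate * growth / (growth - 1)


i_max = max_withdrawal(10_000, 0.05, 10)
# Round *down* to whole pence so that the final balance stays non-negative.
i_pence = math.floor(i_max * 100) / 100
print(i_pence)  # 1295.04
```

Simulating the balance and evaluating the closed form are two routes to the same $B$. Comparing them is a cheap way to catch an algebra slip in the derivation.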
Activity 10.5 Assuming that we make this maximum withdrawal at the end of
each year, what is the balance of the account after the last of these withdrawals?
Activity 10.6 Alternatively, suppose that we want this annuity to pay out £1, 500
at the end of each year. How many of these withdrawals will we be able to make?
[Note that, to 2dp, $\log_{1.05}\frac{3}{2} = 8.31$.]
Example 10.7 Suppose that you have to choose between a gift of £20,000 in ten
years’ time or a gift of £30,000 in twenty years’ time. Which should you choose
given that an interest rate of 10% per year compounded annually is available to you?
Given that an interest rate of 10% per year compounded annually is available to
you, the present value of £20,000 in ten years’ time is
$$\frac{20{,}000}{\left(1 + \frac{10}{100}\right)^{10}} = \frac{20{,}000}{(1.1)^{10}} = 7{,}710.8672$$
or £7,710.87 (to the nearest penny) if we use the fact that, to 6dp, $1.1^{10} = 2.593742$,
whereas the present value of £30,000 in twenty years’ time is
$$\frac{30{,}000}{\left(1 + \frac{10}{100}\right)^{20}} = \frac{30{,}000}{(1.1)^{20}} = 4{,}459.3088$$
or £4,459.31 (to the nearest penny) if we use the fact that, to 6dp, $1.1^{20} = 6.727500$.
Thus, you should choose the £20,000 in ten years’ time as it is worth more to you
now.³
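The present-value comparison in Example 10.7 reduces to one division per option. Footnote 3's future-value argument is the matching multiplication. Here is a minimal sketch with function names of our own choosing, assuming annual compounding as in the example.

```python
def present_value(amount, rate, years):
    """Value today of `amount` received after `years` years, discounting annually at `rate`."""
    return amount / (1 + rate) ** years


def future_value(amount, rate, years):
    """Value of `amount` after `years` years of annually compounded growth at `rate`."""
    return amount * (1 + rate) ** years


pv_ten = present_value(20_000, 0.10, 10)
pv_twenty = present_value(30_000, 0.10, 20)
print(f"£{pv_ten:,.2f} now versus £{pv_twenty:,.2f} now")
# £7,710.87 now versus £4,459.31 now

# Footnote 3's comparison: reinvest the £20,000 for the remaining ten years.
print(f"£{future_value(20_000, 0.10, 10):,.2f} after twenty years")
# £51,874.85 after twenty years
```

Comparing present values and comparing future values at a common date always lead to the same choice, because one is the other multiplied by the same factor $(1.1)^{20}$.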
Present values can also be used to see what an annuity is worth as we can find the
present value of each payment and hence the present value of the annuity as a whole.
Let’s look at an example.
Example 10.8 You win a competition and you can claim a prize of £10,000 now
or an annuity which pays £1,000 at the end of each year for ten years. Which
should you choose given that an interest rate of 5% per year compounded annually is
available to you?
The present value of the first annuity payment is $1{,}000/1.05$, the second is
$1{,}000/1.05^2$, and so on until the tenth which has a present value of $1{,}000/1.05^{10}$.
Thus, the present value of all the annuity payments is
$$\frac{1{,}000}{1.05} + \frac{1{,}000}{1.05^2} + \cdots + \frac{1{,}000}{1.05^{10}}.$$
This is a geometric series with a first term of 1,000/1.05, a common ratio of 1/1.05
and ten terms which means that, using the formula for the sum of a geometric series,
we see that the present value of this annuity is
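$$\begin{aligned}
\frac{1{,}000}{1.05} \cdot \frac{1 - \left(\frac{1}{1.05}\right)^{10}}{1 - \frac{1}{1.05}} &= \frac{1{,}000}{1.05} \cdot \frac{1 - \frac{1}{1.05^{10}}}{\frac{0.05}{1.05}} \\
&= \frac{1{,}000}{0.05}\left(1 - \frac{1}{1.05^{10}}\right) \\
&= 20{,}000\left(1 - \frac{1}{1.05^{10}}\right) \\
&= 7{,}721.74
\end{aligned}$$
pounds (to the nearest penny) if we use the fact that, to 6dp, $1.05^{10} = 1.628895$. As
such, when choosing your prize, you should opt for the £10,000 lump sum as that is
worth more to you now.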
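The annuity's present value in Example 10.8 can be computed both term by term and from the closed form $\frac{1{,}000}{0.05}\left(1 - 1.05^{-10}\right)$ derived above. This is a minimal sketch; the function names are our own. At full precision the result rounds to 7,721.73 rather than the 7,721.74 obtained from the 6dp value of $1.05^{10}$. Either way, the lump sum is worth more.

```python
def annuity_present_value(payment, rate, years):
    """Add up payment / (1 + rate)^k for k = 1, ..., years, one term at a time."""
    return sum(payment / (1 + rate) ** k for k in range(1, years + 1))


def annuity_present_value_formula(payment, rate, years):
    """Closed form from the geometric series: (payment / rate) * (1 - (1 + rate)^(-years))."""
    return payment / rate * (1 - (1 + rate) ** -years)


pv = annuity_present_value(1_000, 0.05, 10)
print(round(pv, 2))  # 7721.73
print("take the lump sum" if 10_000 > pv else "take the annuity")
```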
³ For example, you could take the £20,000 in ten years’ time and invest it for the following
ten years to end up with $20{,}000\left(1 + \frac{10}{100}\right)^{10} = 20{,}000(1.1)^{10} = 51{,}874.85$ pounds (to the nearest
penny) in twenty years’ time. This is far better than just receiving £30,000 after the same amount of time!
We also observe, in passing, that £51,874.85 is the future value, in twenty years’ time, of getting
£20,000 in ten years’ time and investing it. So, in terms of future values over a common period of time,
we should, again, opt for the £20,000 in ten years’ time!
Activity 10.7 Use present values to determine how many years the annuity would have to
pay out for in order for it to be a better prize than the lump sum.
[Note that, to 2dp, $\log_{1.05} 2 = 14.21$.]
Activity 10.8 Suppose that the annuity was a perpetuity, i.e. you would get
£1,000 at the end of each year forever. What is the present value of this perpetuity?
Activity 10.9 Why is your answer to the previous activity not a surprise?
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 10.1
Find the sums of the following arithmetic series.
i. 1 + 2 + 3 + · · · + 10; ii. 1 + 2 + 3 + · · · + n;
Exercise 10.2
Find the sums of the following geometric series.
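i. $1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3}$;
ii. $3 - 6 + 12 - 24 + 48 - 96$;
iii. $3 - 6 + 12 - 24 + \cdots + 3(-2)^n$;
iv. $1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \cdots$;
v. $1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \cdots$;
vi. $\frac{1}{4} - \frac{1}{16} + \frac{1}{64} - \frac{1}{256} + \cdots$.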
Exercise 10.3
Suppose that, at the beginning of each year, you pay £500 into a savings account paying
7% interest per year. How much will be in the account at the end of the eighth year?
[Note that, to 6dp, $1.07^8 = 1.718186$.]
Exercise 10.4
Suppose that you invest £10,000 in a bank account that pays 5% interest per year. If
you want to withdraw £900 at the end of each year, how many years will you be able to
do this for?
[Note that, to 2dp, $\log_{1.05}\frac{9}{4} = 16.62$.]
Exercise 10.5
You win a competition and can choose between the following prizes.
(ii) £10,000 at the end of each year for seven years.
Exercise 10.6
You borrow £1,200 from your bank which requires that you repay the loan in monthly
instalments over two years. If interest is charged at 12% per annum using monthly
compounding, how much will you have to pay back each month?
Part 2
Statistics
Introduction to Statistics
Syllabus
This half of the course introduces some of the basic ideas of theoretical statistics,
emphasising the applications of these methods and the interpretation of tables and
results. The Statistics part of this course has the following syllabus.
Data exploration. The statistics part of the course begins with basic data
analysis through the interpretation of graphical displays of data. Univariate,
bivariate and categorical situations are considered, including time series plots.
Distributions are summarised and compared and their patterns discussed.
Descriptive statistics are introduced to explore measures of location and dispersion.
Learning outcomes for the course (Statistics)
At the end of the Statistics part of the course, you should be able to:
interpret and summarise raw data on social science variables graphically and
numerically
Textbook
As previously mentioned in the main introduction, this subject guide has been designed
to act as your principal resource. The following textbook is referenced throughout the
Statistics part of the course.
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558].
This has been indicated as ‘background reading’, meaning it is not essential, but you
could benefit from reading it if you find any of the material in the subject guide difficult
to follow.
11. Data exploration I – The nature of statistics
Overview
We begin the Statistics section of the course with data exploration, arguably the single
most important part of any data analysis. To make sense of any data, we must first
‘understand’ the basic features of each variable under consideration. Visualising data
communicates a wealth of information to even non-technical audiences. Data
exploration covers different ways of presenting data graphically, depending on the type
of variable(s) being explored. We then move on to descriptive statistics (measures of
location and measures of dispersion) which are commonly-used statistics in the social
sciences whose roles are to ‘describe’ or ‘summarise’ data numerically.
Aims
This unit explains the nature of statistics, providing a gentle introduction to the
discipline. The concept of ‘data’ is explored including the different types of data which
may be obtained. The role of statistics in the research process is also discussed.
Particular aims are:
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Statistics’ Chapter 8.
11.1 Introduction
So just what is ‘Statistics’? Well, there are several possible definitions. A good working
one is:
Statistics is largely concerned with data. ‘Data’ is a plural noun meaning ‘given things’
or more loosely ‘information’ or ‘facts’.
Sometimes we look at non-numerical data such as sex (‘gender’) or social class, but
usually we are concerned with numerical information. The primary objective is to
determine what the data tell us about the underlying context in business, economics,
society, medicine etc.
An experiment. For example, some patients are given an active drug and others a
placebo.
A survey. For example, to find out more about consumers or voters (or computers
or cars).
The main distinction between an experiment and a survey is that, in the former case,
there is some sort of intervention by the researcher. Most, although not all, of the
statistics you may go on to carry out (in finance, politics etc.) are likely to be based on
survey data.
clearest and best. In some cases, the display alone is sufficient – there is no need for any
formal mathematical or statistical study.
11.1.5 Analysis
This is the heavy part of Statistics. Most of the time, the methods used are
well-established, so it is only necessary to learn the relevant technique. It is important
to understand that most methods depend on certain assumptions about the data. If
these assumptions fail to hold, the conclusions are likely to be invalid.
11.1.6 Interpretation
Outside a few universities and research institutes, clear interpretation is vital!
Interpretation should be understandable by managers and others without formal
statistical training. For example, do not say ‘the p-value of 0.02 shows that the result of
the t test is significant’, but rather ‘there is evidence that men and women differ in their
attitudes to a policy of lowering taxes’.¹
11.1.7 Uncertainty
In general, what is being measured is subject to uncertainty, or random variation. For
example, two randomly-chosen groups of 100 voters will typically not give exactly the
same outcomes.
We often wish to establish whether a change or a difference (between men and women,
left-wing and right-wing voters etc.) can be attributed to chance, or whether it is the
result of some real effect. We study probability largely in order to measure this
uncertainty.
inferential study – the population consists of all tyres produced, the sample
consists of 100 (or 50, or 500, or 5,000) tyres which are examined.
A sports writer wants to list the times taken to run 100 metres in Olympic
Games over 60 years. This is a descriptive study.
A politician wants to know how many votes were cast for her party in her region
at a recent election. This is a descriptive study.
Notice that in an inferential study it is the properties of the population which we wish
to determine. You could argue that it would be better to examine all population
members. This is known as conducting a census. However, this will usually be slower,
more costly and may sometimes be impossible. Consider a census of all the trees in the
UK, or all the fish in the Atlantic Ocean!
The main thing to ensure is that the sample is representative of the population. This is
most easily done using a ‘simple random sample’, where each population member has
an equal chance of inclusion in the sample, although there are alternatives. We will
explore this further in the ‘Sampling and experimentation’ section of the course.
It may not come as a surprise that, generally speaking, descriptive statistics are more
easily carried out than inferential statistics. Sadly, descriptive statistics are often poorly
done, or even omitted completely, in practical contexts as well as in student work. This is
a shame because they can tell us a great deal about the data, and can even render
inferential statistics unnecessary. As a rule, any data analysis should start with
descriptive statistics.
Categorical data (also known as qualitative data) give information about the
discrete groups into which a population, or sample, is divided. These may be
nominal or ordinal.
• Nominal data are unranked. For example, a group of individuals may be
classified by gender, eye colour or blood type (A, B, AB or O).
• Ordinal data are ranked. They give information about order or rank on a
scale. For example, a group of students may be classified by the letter grades
they receive in an examination (A, B, C etc.). So-called Likert scales are
ordinal (this course is ‘very interesting’, ‘interesting’, ‘quite interesting’, ‘not
very interesting’, ‘boring’). Investments can be graded by risk on an ordinal
scale (for example, ‘high risk’, ‘moderate risk’ or ‘low risk’).
Metric data are numerical values on some continuous scale. They may be interval
or ratio data.
• Interval data are measured on a continuous scale and have the property that
the differences between numbers have a meaning. For example, centigrade
temperatures are interval data – the difference between 15° and 16° is the
same as the difference between 25° and 26°, but both are different from the
difference between 15° and 20°. The current time (for example, 19:34) is also
measured on an interval scale.
• Ratio data are similar to interval data but now there is an absolute zero, and
hence the ratio of two numbers can be given a meaning. For example, height,
weight and the length of time an individual has been alive all constitute ratio
data. In each of these cases, there is a fixed zero – nobody can have a negative
height or weight, or have lived a negative amount of time. In contrast, the zero
for centigrade temperatures or the current time is merely a matter of
convention. (Note that Kelvin temperatures do have an absolute zero and are,
therefore, measured on a ratio scale.)
Finally, we mention that many datasets considered are on a single attribute, such as
weight. Such data are called univariate data. Sometimes we wish to consider two
variables together, say the height and weight of a group of individuals. Such data are
called bivariate data. Multivariate data arise when we consider three or more
variables together – perhaps height, weight, age and pulse rate.
There are other ways to classify data and the classification is not always precise.
However, in most cases it is fairly clear and is sufficient for most applications – in
particular, the choice of correct statistical method.
Research may be about almost any topic – physics, biology, medicine, economics,
history, literature etc. Most of our examples will be from the social sciences, i.e. from
economics, management, finance, sociology, political science, psychology etc. Research
in this sense is not just what universities do. Government, business, and all of us as
individuals do it too. Statistics is used in essentially the same way for all of these.
Understanding the gender pay gap – what has competition got to do with it?
Heeding the push from below – how do social movements persuade the rich to
listen to the poor?
We can think of the empirical research process as having five key stages.
2. Research design – deciding which kinds of data to collect, how and from where.
We conclude this section with an example of how statistics can be used to help answer a
research question.
Gill and Spriggs (2005): Assessing the impact of CCTV. Home Office Research
Study 292.
Intervention: CCTV cameras installed in the target area but not in the
control area.
Compare measures of crime and fear of crime in the target and control areas,
in the twelve months before and twelve months after the intervention.
Level of crime: the number of crimes recorded by the police in the twelve
months before and twelve months after the intervention.
Here RES < 1, which means that the observed change in the reported fear of
crime has been slightly less favourable in the target area than in the control area.
The confidence interval for RES includes 1 which means that changes in the
self-reported fear of crime in the two areas are not statistically significantly
different from each other.
Now RES = 1.34 > 1 which means that the observed change in the number of
crimes has been worse in the control area than in the target area.
However, the numbers of crimes in each area are fairly small which means that
these estimates of the changes in crime rates are fairly uncertain.
The confidence interval for RES again includes 1 which means that the changes
in crime rates in the two areas are not statistically significantly different from
each other.
In summary, this study did not support the claim that the introduction of CCTV
reduces crime or the fear of crime.
If you want to read more about research regarding this question, see:
Many of the statistical terms and concepts mentioned above have not been explained here.
However, the study serves as an interesting example of how statistics can be employed in the
social sciences to investigate research questions.
Activities 11.1, 11.2 and 11.3 are not concerned with any technicalities of statistics, and
they do not ask you to do any calculations yourself (except, perhaps, a little bit in
Activity 11.2). Instead, these activities invite you to think about various topics related to
the use of statistics, and to research design more generally. These include such issues as
the definition and measurement of variables, the selection of subjects for studies and the
justifiability of claims about causes and effects.
You are asked to think of answers to the questions, using your own reasoning and
common sense. You are welcome to discuss the questions with friends.
You do not need to worry about getting the answers right or wrong – the only point is
to start thinking!
Activity 11.1 Consider the following statements. Do you think the conclusions are
valid? If so, say why. If not, indicate why not – because the logic used is faulty,
because any assumptions made are dubious, because the data collection method is
inappropriate, or for any other reason.
(a) ‘10% of drivers involved in 100 car accidents had previously taken substance X.
A parallel study of drivers not involved in accidents showed that only 1% had
taken substance X. Therefore, substance X is a contributory cause of car
accidents.’
(b) ‘Five years ago, the average stay of patients in this hospital was 21 days. Now it
is 16 days. We now cure our patients more quickly.’
(c) ‘We wanted to see if the public approved of our plans to transfer resources to
elderly patients. We carried out a large-scale survey based on 800 daytime city
centre interviews. We found 79% of respondents approved of our plans.
Therefore, we have public backing.’
(d) ‘Nugro is the revolutionary hair restorer for men. A sample of 100 men with
thinning hair was selected to apply Nugro lotion every day for a month. Of
these, 77 reported new hair growth. Nugro is proven to be effective in the
treatment of male baldness.’
Activity 11.2 The following cross-tabulation shows data on the 3,593 people who
applied to graduate study at the University of California, Berkeley, in 1973. The
table classifies the applicants according to their sex, and whether or not they were
admitted to the university.
Sex      Admitted: No   Admitted: Yes   % Yes   Total
Male            1,180             686    36.8   1,866
Female          1,259             468    27.1   1,727
Total           2,439           1,154    32.1   3,593
The table shows that 36.8% of male applicants, but only 27.1% of female applicants,
were admitted.
Bob observes this and concludes that in that year Berkeley practised discrimination
against female applicants.
Amy, however, decides to take another look at the statistics. She adds one more piece
of data, the department to which each person applied, and creates cross-tabulations
separately for each department (which are labelled A, B, C, D and E). These tables
are shown below. For example, the first table cross-classifies the sex and admission
status of just those 585 people who applied to Department A, and so on.
Amy examines her tables and states that she disagrees with Bob – there is no
evidence of discrimination. Why does she conclude this? Why do Amy and Bob
come to different conclusions? Which one do you agree with?
Department   Sex      Admitted: No   Admitted: Yes   % Yes   Total
A            Male              207             353    63.0     560
             Female              8              17    68.0      25
             Total             215             370    63.2     585
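The ‘% Yes’ columns in these cross-tabulations are simple proportions. The following minimal sketch recomputes them from the counts (the function and variable names are our own). The same calculation can be repeated for the other departments’ tables when thinking about why Amy and Bob disagree.

```python
def admission_rate(not_admitted, admitted):
    """Percentage of applicants admitted, from the 'No' and 'Yes' counts."""
    return 100 * admitted / (not_admitted + admitted)


# (No, Yes) counts copied from the two tables above.
all_applicants = {"Male": (1_180, 686), "Female": (1_259, 468)}
department_a = {"Male": (207, 353), "Female": (8, 17)}

for label, table in (("All applicants", all_applicants), ("Department A", department_a)):
    rates = {sex: round(admission_rate(*counts), 1) for sex, counts in table.items()}
    print(label, rates)
# All applicants {'Male': 36.8, 'Female': 27.1}
# Department A {'Male': 63.0, 'Female': 68.0}
```

Note that in Department A the female admission rate is the higher of the two, even though it is the lower one in the combined table.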
(b) In a study of the ages and professions of people who had died, it was found that
the profession with the lowest average age of death was ‘student’. Therefore,
being a student is the most dangerous of professions.
(c) In 2007, the official suicide rate in Sweden was 15.8 per 100,000 people per year.
This was much higher than in many other countries, some of which even had a
rate of 0.0. This indicates that suicide is a much more serious problem in
Sweden than in those other countries.
(d) Data over the past 10 years in a country show that the number of deaths from
drowning tends to be higher in months when the total consumption of ice cream
is high. Therefore, eating ice cream before going swimming increases the risk of
drowning.
(e)† A country has two kinds of secondary schools – private schools and state-owned
schools. Statistics show that 40% of those graduating from private schools, but
only 20% of those graduating from state schools, go on to study at a university.
Therefore, private schools are twice as good as state schools.
(f)† Sociologists conduct a study where they select a random sample of people and
ask these people for a list of their close friends. A random sample of the people
named as friends is then contacted and the survey is repeated. The people
sampled at the second stage have, on average, many more friends than do the
people in the original sample. Therefore, your friends have more friends than
you do.
11.4 Summary
This introductory unit has outlined the purpose of statistics and the role the discipline
plays in the research process. Preliminary considerations of issues relating to data
collection and analysis were discussed, as well as the different types of data which exist.
Having spent some time thinking about the nature of statistics, you are now ready to
start doing statistics, beginning with data visualisation in the next unit.
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 11.1
The given working definition of ‘Statistics’ was:
Exercise 11.2
Briefly discuss the distinction between descriptive statistics and inferential statistics.
Exercise 11.3
Explain the different types of data which can occur.
Exercise 11.4
What is the measurement level for each of the following variables?
(e) Income measured by percentiles (for example, if someone’s income is at the
20th percentile, this means 20% of the population earn less).
Exercise 11.5
In 2009 the UK government reclassified cannabis from a Class C drug to a Class B
drug, thereby introducing the threat of arrest for possession of the drug. The following
table cross-classifies age and agreement with the reclassification.
Complete the table in such a way that there is a weak positive association between age
and agreement. (Assume the measurement scale of agreement as given in the table is an
ordinal one.)
12. Data exploration II – Data visualisation
Overview
Aims
This unit explains the importance of data visualisation and its role in communicating
the underlying distribution of data. Particular aims are:
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Describing data’ Chapter 1.
This is much better! We can see, for example, that 172 individuals (slightly over half
those surveyed) spend less than £125.
There is some arbitrariness in the grouping used and the choice of classes is often down
to common sense, but as a guide:
there should be between 5 and 25 classes
each piece of data should belong to one and only one class
in general, all classes should have the same width (but we can sometimes have
open-ended classes at the extremes, such as < 0 or > 1000).
The lower class limit is the smallest value which can go in a class.
The upper class limit is the largest value which can go in a class.
The class width is the difference between the upper and lower class limits for a
class.
12.2 Histograms
Diagrams are a particularly useful way of illustrating data as ‘a picture is worth a
thousand words’. We can illustrate the frequency data for the credit cards as shown in
Figure 12.1. From the histogram it is clear that most credit card holders (in the sample)
spend moderate amounts each month, with a few spending large amounts.
[Figure 12.1: histogram of the credit card expenditure data; horizontal axis ‘Expenditure in pounds’, vertical axis ‘Frequency’.]
Expenditure (£)   Frequency   Cumulative frequency
[−25, 25)                87                     87
[25, 75)                 55                    142
[75, 125)                30                    172
[125, 175)               24                    196
[175, 225)               23                    219
[225, 275)               22                    241
[275, 325)                8                    249
[325, 375)               10                    259
[375, 425)                7                    266
[425, 475)                6                    272
[475, 525)                3                    275
[525, 575)                6                    281
[575, 625)                3                    284
[625, 675)                2                    286
[675, 725)                3                    289
[725, 775)                1                    290
[775, 825)                4                    294
[825, 875)                2                    296
[875, 925)                0                    296
[925, 975)                0                    296
[975, 1025)               0                    296
[1025, 1075)              3                    299
[1075, 1125)              0                    299
[1125, 1175)              1                    300
Now we can quickly see, say, that just under two-thirds of credit card holders spend less
than £175 on a monthly basis.
Having determined the cumulative frequencies, we can construct a cumulative
frequency polygon. The horizontal axis is labelled with the class endpoints and the
vertical axis with the cumulative frequencies. A point of zero frequency is placed at the
beginning of the first class and a point is plotted at the end of each class interval for the
cumulative value. The points are then joined up and, for the credit card data, we get
Figure 12.2.
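The cumulative frequencies and the polygon's vertices described above can be computed in a few lines. This sketch (the variable names are our own) reuses the 24 class frequencies from the credit card table. Every class has width 50, starting at −25. The resulting list of points is what one would pass to a plotting library to draw the polygon.

```python
from itertools import accumulate

# Frequencies for the 24 classes [-25, 25), [25, 75), ..., [1125, 1175) in the table above.
frequencies = [87, 55, 30, 24, 23, 22, 8, 10, 7, 6, 3, 6,
               3, 2, 3, 1, 4, 2, 0, 0, 0, 3, 0, 1]
width = 50                                   # every class has the same width
lower_limits = [-25 + width * i for i in range(len(frequencies))]

cumulative = list(accumulate(frequencies))   # running totals: 87, 142, 172, ...

# Polygon vertices: a zero at the start of the first class, then each
# cumulative frequency plotted at the end of its class.
polygon = [(lower_limits[0], 0)] + [
    (lower + width, total) for lower, total in zip(lower_limits, cumulative)
]

print(polygon[:4])  # [(-25, 0), (25, 87), (75, 142), (125, 172)]
print(f"Share spending less than £175: {cumulative[3] / cumulative[-1]:.1%}")  # 65.3%
```

The last line confirms the "just under two-thirds" reading: 196 of the 300 card holders fall below £175.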
[Figure 12.2: Cumulative Frequency Polygon of Monthly Credit Card Expenditure; horizontal axis ‘Expenditure in pounds’, vertical axis ‘Cumulative frequency’.]
So, for example, from the graph, we can see that only about 16 of the 300 credit card
holders spend more than £600 in a month.
Next we look at some other types of graphical display. Recall that the type of diagram
used will depend on the type of data, and the objective of any diagram is to illustrate
the key features of the dataset.
Histograms (and some other forms of diagrams) are suitable for (univariate) interval or
ratio data. For categorical data, other alternatives are more appropriate.
0 1 0 8 2 5 0 0 0 0
0 4 3 3 3 0 0 6 0 2
0 3 1 1 0 1 0 1 1 0
2 2 0 0 0 17 1 2 1 2
0 1 6 4 3 3 1 2 4 0
0 3 15 2 0 0 0 0 0 1
1 0 2 0 2 4 4 0 2 2
(b) Assuming a year consists of 255 working days, on how many days would you
expect 5 or more stoppages to occur?
(c) Discuss in a couple of sentences what the data tell you. What recommendations
would you make?
Crest 27%
Colgate 24%
Ultrabrite 2%
Closeup 2%
Listerin 3%
Aquafresh 13%
Sensodyne 4%
Rembrandt 4%
[Bar chart by brand: Listerin, Ultrabrite, Crest, Colgate, Aquafresh, Mentadent, A&H, Sensodyne, Closeup and Rembrandt; vertical axis scale up to 350,000.]
The line graph of this dataset is shown in Figure 12.5. Note the clear ‘seasonal variation’
and a small, but probably significant, upward trend. (What might the commodity be?)
[Figure 12.5: line graph; horizontal axis ‘Season number’, vertical axis ‘Sales’.]
example, we might have data on the salary and age of a number of employees of a
company, as depicted in Figure 12.6. Think about what this scatter plot tells us about
the relationship between salary and age. (Note that the anomalous point is called an outlier
– more on this later in the unit.)
[Figure 12.6: scatter plot; horizontal axis ‘Age’, vertical axis ‘Salary (in £000s)’.]
We now consider a slightly more elaborate example which illustrates the potential
power of relatively simple descriptive statistics. Assume we have data for advertising
and sales (both in £ millions) for 60 companies of similar size in a given year. Each
company is in one of three sectors: A, B or C. How is advertising related to sales? First,
let us look at the data.
Advertising Sales Sector Advertising Sales Sector Advertising Sales Sector
38 77 A 66 77 B 93 67 C
10 57 A 43 71 B 86 68 C
60 65 A 54 73 B 20 47 C
80 77 A 46 74 B 10 43 C
68 73 A 6 29 B 37 49 C
86 55 A 25 64 B 91 87 C
1 63 A 87 30 B 89 88 C
41 77 A 59 53 B 68 66 C
86 70 A 80 26 B 7 32 C
14 76 A 31 49 B 35 44 C
25 54 A 10 18 B 42 50 C
5 49 A 94 26 B 21 40 C
3 72 A 68 68 B 28 42 C
16 84 A 41 67 B 77 77 C
22 76 A 69 72 B 53 60 C
2 63 A 6 20 B 30 39 C
29 76 A 93 24 B 24 37 C
34 77 A 3 19 B 95 91 C
55 71 A 34 47 B 84 80 C
36 92 A 100 20 B 66 75 C
[Figure 12.7: scatter plot of sales against advertising for all 60 companies, ignoring sector; vertical axis ‘Sales (in £ millions)’.]
195
12. Data exploration II – Data visualisation
Clearly, it is very difficult to say anything interesting about the dataset by looking at
the raw data in a table. So, first we plot sales against advertising while ignoring the
sector. The scatter plot is shown in Figure 12.7 and suggests that increasing advertising
may lead to higher sales, although the relationship is not very clear.
Suppose we produce scatter plots for each sector separately. These are shown in Figure
12.8. Advertising appears to have no effect on sales in Sector A. In Sector B, sales
appear to increase with advertising up to a point, after which further advertising is
associated with lower sales. This quite often happens: the market may have become
saturated, or the advertising campaign may have become less effective. Finally,
advertising appears to have a steadily increasing effect on sales in Sector C.
Figure 12.8: Scatter plot of ‘Sales’ against ‘Advertising’ for 60 companies, by sector.
Now consider data on sales, in thousands of units, of a small electronics firm over 10
years.
Year 1 2 3 4 5 6 7 8 9 10
Sales 2.51 2.72 3.22 3.19 4.09 4.76 5.23 6.36 7.28 9.28
What can we deduce? First, we plot the data as shown in Figure 12.9.
The data appear to be increasing exponentially (literally, i.e. according to a law of the
general form $y = a + be^{cx}$ for some constants $a$, $b$ and $c$). Note the precise use
of the word ‘exponentially’!
However, perhaps the data points are increasing according to a quadratic, rather than
an exponential, law, in which case we would do better to look for a relation of the
general form $y = a + bx + cx^2$. Statistical modelling can be used to determine the curve
that best fits a set of data, according to some criterion. In Unit 19 we consider how to
find the best-fitting line using a technique called ‘linear regression’.
Figure 12.9: Scatter plot of ‘Sales’ against ‘Time’ for a small electronics firm.
12.6 Summary
This unit has looked at different ways of presenting data visually. Which type of
diagram is most appropriate will depend on the type of data being analysed. You
should be able to interpret any important features which are apparent from a diagram.
Learning outcomes
At the end of this unit, you should be able to:
interpret and summarise raw data on social science variables graphically
distinguish between univariate and bivariate situations
Exercises
Exercise 12.1
A pie chart is most suitable for a variable measured using which of the following scales:
(a) nominal scale, (b) ordinal scale, or (c) interval scale? What about a bar chart?
What about a histogram?
Exercise 12.2
Name one possible advantage and one possible disadvantage of histograms.
Exercise 12.3
The table below gives the numbers of people killed or seriously injured in the UK for
different categories of road user during 1982 and 1984. These two years, 1982 and 1984,
represent a complete year before and a complete year after the introduction of the seat
belt law.
1982 1984
Car drivers 19,460 16,421
Front seat passengers 9,458 7,047
Rear seat passengers 4,706 5,062
Pedestrians 18,963 19,168
Cyclists 5,967 6,506
(a) What is the percentage change in the number of people killed or seriously injured
for each category of road user between 1982 and 1984?
(b) What was the percentage of car drivers and car front seat passengers killed or
seriously injured, out of all cases, each year?
(c) Write a brief commentary on your findings (a few sentences), with any suggestions
as to additional information you would require for a fuller investigation as to why
there were percentage changes.
Exercise 12.4
The following table shows the weekly visits for five health, fitness or nutrition websites.
Display the data using a suitable graph and comment on the results, giving possible
reasons for any trends which you notice.
13. Data exploration III – Descriptive statistics: measures of location, dispersion and skewness
Overview
Although data visualisation is useful to get a ‘feel’ for the data, in practice we also need
to be able to summarise data numerically. This unit introduces descriptive statistics and
distinguishes between measures of location, measures of dispersion and skewness. All
these statistics provide useful summaries of raw datasets.
Aims
This unit introduces and explains the importance of descriptive statistics. Particular
aims are:
to calculate simple numbers which will summarise the most important
characteristics of a dataset
to explain the use and limitations of various descriptive statistics.
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Describing data’ Chapter 2.
Summation operator
To shorten this we use the symbol $\sum$ (known as the summation operator), writing:
$$\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \cdots + x_N.$$
We can ‘translate’ this notation as follows: ‘the sum of the values, whose typical
member is $x_i$, beginning with the number $x_1$ and ending with the number $x_N$’. So,
using the above example, if $x_1 = 7$, $x_2 = 4$, $x_3 = 12$ and $x_4 = 6$, we have:
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 7 + 4 + 12 + 6 = 29.$$
As you might expect, it is possible to write down other expressions involving $\sum$. For
example, we might be interested in the sum of the squares of the values
$x_1, x_2, x_3, \ldots, x_N$, which would be written as:
$$\sum_{i=1}^{N} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_N^2.$$
Quite often the value of $N$ will be clear, and in such cases it is common to write simply
$\sum x_i$ instead of $\sum_{i=1}^{N} x_i$. With practice, using the summation operator
should not pose any difficulties. However, it is essential that you properly understand
its interpretation since the summation operator is used extensively in many areas of
statistics.
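For example, suppose $x_1 = 1$, $x_2 = 2$, $x_3 = 3$ and $y_1 = 4$, $y_2 = 5$, $y_3 = 6$. Then:
$$\sum_{i=1}^{3} y_i = 4 + 5 + 6 = 15$$
$$3\sum_{i=1}^{3} x_i = 3 \times 6 = 18$$
$$\sum_{i=1}^{3} 3x_i = 3 + 6 + 9 = 18$$
$$\sum_{i=1}^{3} x_i + \sum_{i=1}^{3} y_i = 6 + 15 = 21$$
$$\sum_{i=1}^{3} (x_i + y_i) = (1 + 4) + (2 + 5) + (3 + 6) = 21$$
$$\sum_{i=1}^{3} x_i^2 = 1^2 + 2^2 + 3^2 = 14$$
$$\left(\sum_{i=1}^{3} x_i\right)^2 = 6^2 = 36$$
$$\sum_{i=1}^{3} x_i y_i = (1 \times 4) + (2 \times 5) + (3 \times 6) = 32$$
$$\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i = 6 \times 15 = 90$$
$$\sum_{i=1}^{3} \left(x_i^3 + \frac{60}{y_i}\right) = \left(1^3 + \frac{60}{4}\right) + \left(2^3 + \frac{60}{5}\right) + \left(3^3 + \frac{60}{6}\right) = 16 + 20 + 37 = 73$$
$$\sum_{i=1}^{3} 8 = 8 + 8 + 8 = 24.$$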
We saw that $\sum_{i=1}^{3} x_i + \sum_{i=1}^{3} y_i = 21$ and $\sum_{i=1}^{3} (x_i + y_i) = 21$. It is true, in general, that:
$$\sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} (x_i + y_i).$$
We also saw that $3\sum_{i=1}^{3} x_i = 18$ and $\sum_{i=1}^{3} 3x_i = 18$. It is true, in general, that:
$$c\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} cx_i$$
for any constant $c$. However, we also saw that $\sum_{i=1}^{3} x_i^2 = 14$ while $\left(\sum_{i=1}^{3} x_i\right)^2 = 36$. In general:
$$\sum_{i=1}^{N} x_i^2 \neq \left(\sum_{i=1}^{N} x_i\right)^2.$$
We also saw that $\sum_{i=1}^{3} x_i y_i = 32$ and $\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i = 90$. In general:
$$\sum_{i=1}^{N} x_i y_i \neq \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i.$$
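As a quick check on the worked examples, the short Python sketch below recomputes each sum. It is our own illustration, not part of the guide, and assumes Python 3 and the values $x_i = 1, 2, 3$ and $y_i = 4, 5, 6$ used above; the variable names are arbitrary. The two `assert` lines encode the rules that hold in general, while the printed pairs show the cases that differ.

```python
# Values used in the worked examples: x_i = 1, 2, 3 and y_i = 4, 5, 6.
x = [1, 2, 3]
y = [4, 5, 6]

sum_x = sum(x)                                          # Σ x_i
sum_y = sum(y)                                          # Σ y_i
sum_of_pairs = sum(xi + yi for xi, yi in zip(x, y))     # Σ (x_i + y_i)
sum_3x = sum(3 * xi for xi in x)                        # Σ 3x_i
sum_sq = sum(xi ** 2 for xi in x)                       # Σ x_i^2
sq_of_sum = sum_x ** 2                                  # (Σ x_i)^2
sum_xy = sum(xi * yi for xi, yi in zip(x, y))           # Σ x_i y_i
prod_of_sums = sum_x * sum_y                            # (Σ x_i)(Σ y_i)
mixed = sum(xi ** 3 + 60 / yi for xi, yi in zip(x, y))  # Σ (x_i^3 + 60/y_i)
constant = sum(8 for _ in x)                            # Σ 8 over i = 1, 2, 3

# Rules that hold for any data
assert sum_x + sum_y == sum_of_pairs   # Σx + Σy = Σ(x + y)
assert 3 * sum_x == sum_3x             # cΣx = Σcx

# Pairs that differ in general
print(sum_sq, sq_of_sum)     # 14 36
print(sum_xy, prod_of_sums)  # 32 90
print(mixed, constant)       # 73.0 24
```

The generator expressions mirror the summation operator directly: the expression before `for` is the ‘typical member’ and the iteration runs over $i = 1, \ldots, N$.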
(a) $\sum_{i=1}^{N} 2x_i$
(b) $\sum_{i=1}^{N} x_i^2$
(c) $\sum_{i=1}^{N} (x_i - 2)$
(d) $\sum_{i=1}^{N} (x_i - 2)^2$
(e) $\left(\sum_{i=1}^{N} x_i\right)^2$
(f) $\sum_{i=1}^{N} 2$.
$\sum_{i=1}^{N} x_i^2 \neq \left(\sum_{i=1}^{N} x_i\right)^2$ and $\sum_{i=1}^{N} x_i y_i \neq \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i$?
What are $\left(\sum_{i=1}^{N} x_i\right)^2$ and $\sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i$? (Consider the case where $N = 2$.)
of a dataset. We shall encounter three ways of defining the ‘average’. In general, the
average is a value typical, or representative, of a dataset.
The (arithmetic) mean is the sum of all the members of a dataset divided by the
number of values in the dataset. Sometimes the mean is denoted by µ (pronounced
‘mew’) and sometimes it is denoted by x̄ (pronounced ‘x-bar’). The distinction between
µ and x̄ is very important: µ refers to the mean of a population, whereas x̄ refers to the
mean of a sample. For now we shall not concern ourselves too much with this
distinction – we shall return to it when we cover ‘sampling distributions’ in Unit 16.
So, for example, if a student scored 62, 74, 49, 37 and 58 in a sample of five tests, the
mean mark achieved is:
$$\bar{x} = \frac{62 + 74 + 49 + 37 + 58}{5} = \frac{280}{5} = 56.$$
Another measure of location is the median. This is the central value of the dataset
when the numbers are arranged in ascending order. If no single such central value exists
(this occurs when there is an even number of values), then the mean of the two middle
numbers is taken.
The mode of a set of numbers is the most frequently-occurring value. In some cases it
may not exist, or indeed it may not be unique.
The mode of 3, 3, 3, 3, 3, 7, 7, 8, 9, 9 is 3.
The set of values 4, 4, 30, 50, 50, 90 is bimodal – there are two modes, 4 and 50.
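The three measures of location can be sketched in a few lines of Python. This is a minimal illustration assuming Python 3; `mean`, `median` and `modes` are hypothetical helper names chosen here, and the standard library's `statistics` module offers tested equivalents (`statistics.mean`, `statistics.median`, `statistics.multimode`).

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Central value of the sorted data; with an even count, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All of the most frequently occurring values (more than one if the data are multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

marks = [62, 74, 49, 37, 58]
print(mean(marks))     # 56.0
print(median(marks))   # 58
print(modes([3, 3, 3, 3, 3, 7, 7, 8, 9, 9]))  # [3]
print(modes([4, 4, 30, 50, 50, 90]))          # [4, 50]
```

Note that `modes` returns every value when all values occur equally often; in the guide's terms, the mode then does not usefully exist.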
One possible remedy is to calculate the trimmed mean. This involves dropping $k$
observations (where $k$ is a small number, typically 1 or 2) from each end of the ordered
dataset and calculating the mean of the remaining observations.
With ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $x_{(i)}$ denotes the $i$th ordered value, the
trimmed (sample) mean, denoted $\bar{x}_{tr}$, is:
$$\bar{x}_{tr} = \frac{\sum_{i=k+1}^{n-k} x_{(i)}}{n - 2k}.$$
So the trimmed mean may be useful if we are concerned about extreme values being
present in the dataset.
For example, suppose the ordered dataset is 1, 32, 37, 38, 41, 192. Clearly, the largest
value is extreme and to a lesser extent the smallest value is too, so we set k = 1 and
compute the trimmed mean to be:
$$\bar{x}_{tr} = \frac{\sum_{i=k+1}^{n-k} x_{(i)}}{n - 2k} = \frac{32 + 37 + 38 + 41}{6 - 2} = 37.$$
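The trimmed-mean formula translates directly into code. A minimal sketch, assuming Python 3; `trimmed_mean` is a hypothetical name, and the list slice `ordered[k:n - k]` corresponds to $x_{(k+1)}, \ldots, x_{(n-k)}$ in the formula.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest observations."""
    ordered = sorted(values)
    n = len(ordered)
    if 2 * k >= n:
        raise ValueError("k is too large: no observations would remain")
    kept = ordered[k:n - k]          # x_(k+1), ..., x_(n-k)
    return sum(kept) / (n - 2 * k)

data = [1, 32, 37, 38, 41, 192]
print(trimmed_mean(data, 1))   # 37.0
print(sum(data) / len(data))   # ordinary mean, about 56.83, pulled up by 192
```

Setting `k = 0` recovers the ordinary mean, which makes the effect of trimming easy to compare.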
Activity 13.3 Consider again the data on computer stoppages in Activity 12.1.
(a) Compute the mean, median, mode and a suitable trimmed mean for these data.
Observation, xi Frequency, fi
2 4
3 2
4 3
5 1
$$\frac{(4 \times 2) + (2 \times 3) + (3 \times 4) + (1 \times 5)}{4 + 2 + 3 + 1} = \frac{31}{10} = 3.1.$$
This leads to the following more general result. If the numbers x1 , x2 , x3 , . . . , xk occur
with respective frequencies f1 , f2 , f3 , . . . , fk , then:
$$\bar{x} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \cdots + f_kx_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}. \qquad (13.1)$$
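Formula (13.1) is a weighted sum divided by the total frequency. A minimal sketch, assuming Python 3; `frequency_mean` is a hypothetical helper name, checked here against the small frequency table above.

```python
def frequency_mean(values, freqs):
    """Mean of values x_i occurring with frequencies f_i, as in (13.1)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

print(frequency_mean([2, 3, 4, 5], [4, 2, 3, 1]))   # 3.1
```

For grouped data, the same function can be applied with class midpoints in place of exact values, which is where the loss of precision discussed below comes from.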
Let us now revisit the credit card data from Unit 12. The classes are −25 to 25, 25 to
75 etc. and we can use the frequency table to estimate the mean. You should appreciate
that some information is lost when the frequency table is constructed, as we no longer
have the raw (original) data. Consequently, if we calculate the mean using (13.1) we
should expect to lose some precision, although we still expect a reasonable estimate. So
we face a trade-off: although we lose some precision (a disadvantage), we have the
convenience of summarising the data in a frequency table (an advantage). So our
frequency table is:
In the table above, the ‘Midpoint’ column is simply the centre value of the interval in
the ‘Expenditure’ column and we take this to be the expenditure value for each class.
Now we are able to estimate the mean using (13.1). The estimate is:
Both have a mean of 7, but the datasets are clearly very different. The second dataset is
more ‘compact’ while the first dataset is more ‘spread out’. Knowing that two datasets
have the same mean is not sufficient to describe them fully, because the mean cannot
distinguish between differences in the spread of the data. So we seek precise ways of
measuring spread, or dispersion. Just as there are several measures of location, there are
also several measures of dispersion.
We begin with the range, which is defined as the difference between the maximum and
minimum observations.
For the dataset 0, 1, 5, 8, 9, 19, the range is 19 − 0 = 19.
For the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, the range is 7.3 − 6.8 = 0.5.
The first dataset has a much larger range owing to the greater dispersion.
Example 13.2 The marks for 20 students in an introductory statistics class are:
88 67 64 76 86 85 82 39 75 34
90 63 89 89 84 81 96 100 70 96
Arranged in ascending order, these are:
34 39 63 64 67 70 75 76 81 82
84 85 86 88 89 89 90 96 96 100
Finding these quartiles posed no great difficulty here because the number of
observations, 20, is divisible by 4. When the number of observations is not divisible by 4,
things can become slightly more complicated, although for our purposes it will suffice to
take the average of the values either side of where each quartile is located. In practice,
however, most datasets are large, so the differences between the alternative methods
which exist become negligible.
Analogously to quartiles, it is possible to divide datasets into deciles (10 equal parts) or
even percentiles (100 equal parts). For example, we can express the median as Q2 , the
5th decile or even the 50th percentile. However, we shall not consider deciles and
percentiles any further.
Having introduced quartiles, we are now in a position to discuss another measure of
dispersion – the interquartile range (IQR).
Interquartile range
We define the IQR as the difference between the third and first quartiles, that is:
IQR = Q3 − Q1 .
For the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, the quartiles are similarly estimated to be
Q1 = 6.85, Q2 = 6.95 and Q3 = 7.05. Hence the IQR is 7.05 − 6.85 = 0.2.
As we found with the range, the first dataset has a greater IQR reflecting the
greater dispersion in the dataset.
5-number summary
We are now able to provide a more formal definition of an outlier. Earlier we described
it as an ‘extreme observation’. Now we define an outlier to be a data value which is
more than 1.5 times the interquartile range above Q3 or below Q1, that is, less than
Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR. Extreme outliers are more than 3 times
the interquartile range above Q3 or below Q1.
For example, for the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, we have found Q1 = 6.85,
Q3 = 7.05 and IQR = 0.2. So outliers are any data points which are either less than
6.85 − 1.5 × 0.2 = 6.55, or greater than 7.05 + 1.5 × 0.2 = 7.35. Hence there are no
outliers.
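The range and the outlier rule can be sketched as follows. Because several conventions exist for computing quartiles, this sketch takes Q1 and Q3 as inputs (here the values 6.85 and 7.05 found in the text) rather than computing them. It assumes Python 3, and the function names are hypothetical.

```python
def data_range(values):
    """Range: the maximum observation minus the minimum observation."""
    return max(values) - min(values)

def outlier_fences(q1, q3, multiplier=1.5):
    """Fences Q1 - m*IQR and Q3 + m*IQR (m = 1.5 for outliers, m = 3 for extreme outliers)."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def find_outliers(values, q1, q3, multiplier=1.5):
    """Observations lying strictly outside the fences."""
    low, high = outlier_fences(q1, q3, multiplier)
    return [v for v in values if v < low or v > high]

first = [0, 1, 5, 8, 9, 19]
second = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]

print(data_range(first))                  # 19
print(round(data_range(second), 2))       # 0.5
low, high = outlier_fences(6.85, 7.05)    # quartiles as found in the text
print(round(low, 2), round(high, 2))      # 6.55 7.35
print(find_outliers(second, 6.85, 7.05))  # []
```

Passing `multiplier=3` gives the fences for extreme outliers.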
Another, less often used, measure of dispersion is the mean absolute deviation
(MAD). For a dataset containing the points $x_i$, for $i = 1, 2, \ldots, n$, we define it to be:
$$\text{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
that is, we use the absolute value of the differences between the observations and the
(sample) mean. Using the absolute value sign gives equal weight to values either side of
the mean. Although it is easy to calculate the MAD, it is used far less in practice than
the more common, and important, measures of dispersion known as the variance and
standard deviation.
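A minimal sketch of the MAD, assuming Python 3; `mean_absolute_deviation` is a hypothetical name. It uses the two datasets 0, 1, 5, 8, 9, 19 and 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, both with mean 7, that appear in this unit.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    xbar = sum(values) / len(values)
    return sum(abs(x - xbar) for x in values) / len(values)

print(mean_absolute_deviation([0, 1, 5, 8, 9, 19]))                     # 5.0
print(round(mean_absolute_deviation([6.8, 6.9, 6.9, 7.0, 7.1, 7.3]), 3))  # 0.133
```

As with the range and the IQR, the more spread-out first dataset gives the larger value.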
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}. \qquad (13.2)$$
We can think of this as the average squared deviation from the mean. Because the
deviations are squared, data values which are distant from the mean have
correspondingly large values of $(x_i - \mu)^2$ and, therefore, contribute a great deal to the
variance, regardless of whether they lie far above or far below the mean. Similarly, data
values which lie close to the mean (above or below) contribute comparatively little to the
variance.
Note that the notation for the variance is $\sigma^2$, which is pronounced ‘sigma-squared’. In
fact, $\sigma^2$, as defined in (13.2), is the notation used for the population variance, i.e. when
the data values cover the entire population under consideration. If, instead, the dataset
represents a sample drawn from an underlying population, we refer to the sample
variance, which we denote by $s^2$. Clearly, if we only have sample data, not only is the
population variance, $\sigma^2$, unknown, but so is the population mean, $\mu$.
Sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}. \qquad (13.3)$$
Notice that (13.3) is similar to (13.2) except that we replace the population mean, $\mu$,
with the sample mean, $\bar{x}$, and divide by $n - 1$ instead of $N$. It should be intuitively
clear why we use $\bar{x}$ ($\mu$ is, of course, unknown). The $n - 1$ in the denominator is
present for reasons which are beyond the scope of this course.
Consider again the datasets 0, 1, 5, 8, 9, 19 and 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, each with a
mean of 7. We shall treat these as population datasets, hence µ = 7 for each dataset.
The first dataset has deviations about µ of −7, −6, −2, 1, 2 and 12. Therefore, the
squared deviations are 49, 36, 4, 1, 4 and 144, with a sum of 238. The variance,
using (13.2), is then $\sigma^2 = 238/6 = 39.67$ and the standard deviation is
$\sigma = \sqrt{39.67} = 6.30$.
The second dataset has deviations about µ of −0.2, −0.1, −0.1, 0, 0.1 and 0.3.
Therefore, the squared deviations are 0.04, 0.01, 0.01, 0, 0.01 and 0.09, with a sum
of 0.16. The variance, again using (13.2), is then $\sigma^2 = 0.16/6 = 0.027$ and the
standard deviation is $\sigma = \sqrt{0.027} = 0.16$.
As before, the first dataset has a greater variance (and hence standard deviation)
due to the greater dispersion in the dataset.
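The two variance definitions differ only in which mean is used and in the divisor. The sketch below implements (13.2) and (13.3) directly and cross-checks them against `statistics.pvariance` and `statistics.variance` from Python's standard library. It assumes Python 3, and the function names are hypothetical.

```python
import math
import statistics

def population_variance(values):
    """(13.2): average squared deviation from the population mean."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """(13.3): squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

first = [0, 1, 5, 8, 9, 19]
second = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]

for data in (first, second):
    var = population_variance(data)
    print(round(var, 3), round(math.sqrt(var), 2))
# Prints 39.667 6.3 for the first dataset and 0.027 0.16 for the second.

print(math.isclose(population_variance(first), statistics.pvariance(first)))  # True
print(math.isclose(sample_variance(first), statistics.variance(first)))       # True
```

The standard deviation is simply the square root of the variance, which is why the loop takes `math.sqrt(var)`.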
Using (13.2) for population datasets and (13.3) for sample datasets becomes onerous
when working them out by hand. It can be shown that (13.2) and (13.3) can be
equivalently expressed, respectively, as:
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 \qquad (13.4)$$
and:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}. \qquad (13.5)$$
For example, using the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3 we have:
$$\sum_{i=1}^{N} x_i^2 = (6.8)^2 + (6.9)^2 + \cdots + (7.3)^2 = 294.16$$
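The shortcut forms (13.4) and (13.5) can be checked numerically against the definitions. A minimal sketch, assuming Python 3; the function names are hypothetical. One caution: the shortcut subtracts two nearly equal numbers (here about 49.03 and 49), so in floating-point arithmetic it can lose accuracy on large or badly scaled data, even though it is convenient by hand.

```python
import statistics

def population_variance_shortcut(values):
    """(13.4): mean of the squares minus the square of the mean."""
    N = len(values)
    mu = sum(values) / N
    return sum(x * x for x in values) / N - mu ** 2

def sample_variance_shortcut(values):
    """(13.5): (sum of squares - n * xbar^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return (sum(x * x for x in values) - n * xbar ** 2) / (n - 1)

data = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]
print(round(sum(x * x for x in data), 2))            # 294.16
print(round(population_variance_shortcut(data), 3))  # 0.027, as found from (13.2)

# Cross-check against the standard library's statistics module.
print(abs(population_variance_shortcut(data) - statistics.pvariance(data)) < 1e-9)  # True
print(abs(sample_variance_shortcut(data) - statistics.variance(data)) < 1e-9)       # True
```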
For example, suppose we have the following frequency distribution for ages of students.
xi (age) 18 19 20 21 22 23 24 25 26
fi (frequency) 1 5 8 12 10 7 4 1 2
For these data we first find, using (13.1), the (population) mean:
$$\mu = \frac{(1 \times 18) + (5 \times 19) + (8 \times 20) + \cdots + (2 \times 26)}{1 + 5 + 8 + \cdots + 2} = \frac{1079}{50} = 21.58.$$
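The same calculation can be sketched in Python, assuming Python 3; the variable names are illustrative only. It applies (13.1) directly and, as a cross-check, expands the frequency table back into 50 individual ages so that `statistics.pvariance` can apply the population variance (13.2). The variance figure is our own extension of the example.

```python
import statistics

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [1, 5, 8, 12, 10, 7, 4, 1, 2]

N = sum(freqs)                                     # 50 students
mu = sum(f * x for x, f in zip(ages, freqs)) / N   # (13.1)
print(N, mu)   # 50 21.58

# Expanding the table into individual observations gives the same mean
# and lets us reuse the population variance (13.2) directly.
expanded = [x for x, f in zip(ages, freqs) for _ in range(f)]
print(statistics.mean(expanded))                   # 21.58
print(round(statistics.pvariance(expanded), 4))    # 3.2036
```

For large frequency tables, expanding the data is wasteful; weighting each squared deviation by its frequency gives the same result without building the full list.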
13.4 Skewness
We conclude our look at descriptive statistics with one further quantity since the mean
and variance, while very useful, do not provide complete information about a dataset.
Skewness of a distribution quantifies the departure from symmetry. By definition a
symmetric distribution has zero skewness. Although various methods exist to quantify
skewness, for this course we shall only be concerned with describing skewness
qualitatively, that is, whether the skewness is positive (to the right) or negative (to the
left). This can be achieved either by comparing the mean and median, or visually by
consulting a plot of the distribution of a dataset, such as a histogram.
Skewness
If the mean is greater than the median, then this indicates a positively-skewed
distribution (also referred to as ‘right-skewed’).
If the mean is less than the median, then this indicates a negatively-skewed
distribution (also referred to as ‘left-skewed’).
If the mean equals the median, then this indicates a symmetric distribution.
In the case of skewed distributions, recall that the mean is sensitive to outliers, so it is
‘pulled’ in the direction of the long tail. This leads to the above relationships between
the mean and median.
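The mean versus median comparison is easy to automate. A minimal sketch, assuming Python 3 and its `statistics` module; `skew_direction` is a hypothetical helper. Treating exact equality as ‘symmetric’ is a simplification, since real data rarely give a mean exactly equal to the median, and the comparison is a rule of thumb rather than a formal test.

```python
import statistics

def skew_direction(values):
    """Qualitative skewness from comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive (right-skewed)"
    if mean < median:
        return "negative (left-skewed)"
    return "symmetric"

print(skew_direction([1, 32, 37, 38, 41, 192]))  # positive (right-skewed)
print(skew_direction([1, 2, 3, 4, 5]))           # symmetric
```

In the first example the extreme value 192 pulls the mean (about 56.83) well above the median (37.5).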
Graphically, skewness can be determined by identifying where the long ‘tail’ of the
distribution lies. If the long tail is heading toward increasingly positive values on the
horizontal axis (i.e. on the right-hand side), then this indicates a positively-skewed
(right-skewed) distribution. Similarly, if the long tail is heading toward increasingly
negative values (i.e. on the left-hand side) then this indicates a negatively-skewed
(left-skewed) distribution, as illustrated in Figure 13.1.
Finally, a boxplot (sometimes called a box-and-whisker plot) is a graph which shows
the 5-number summary as well as any outliers and extreme outliers. Boxplots are useful
for displaying a dataset’s distribution. Unlike histograms, they explicitly depict the
quartiles. From a boxplot it is easy to obtain the following: median, quartiles, IQR,
range, skewness and outliers. An example of a (not-to-scale) boxplot can be seen in
Figure 13.2.
[Figure 13.1: a positively-skewed distribution and a negatively-skewed distribution.]
[Figure 13.2: boxplot marking Q1, Q2 = Median and Q3; 50% of cases have values within the box.]
The lower and upper quartiles, Q1 and Q3 , respectively, are represented as the ends
of the box.
‘Whiskers’ are drawn from Q1 and Q3 out to the most extreme observations lying no
more than 1.5 times the IQR below Q1 or above Q3, respectively (i.e. excluding outliers).
In the example in Figure 13.3, it can be seen that the median is around 74, Q1 is about
63 and Q3 is approximately 77. The numerous outliers at the lower end provide a useful
indication that this distribution is negatively skewed, as the long tail covers lower values
of the variable. Note also that Q3 − Q2 < Q2 − Q1 (about 3 compared with about 11),
which is consistent with negative skewness.
Activity 13.5 A group of cows was fed one of three experimental diets A, B or C.
After two weeks, the gain or loss in weight was recorded in kilograms.
13.5 Summary
This unit has introduced some quantitative approaches to summarising data, known as
descriptive statistics. We have distinguished measures of location, dispersion and
skewness. Although descriptive statistics serve as a very basic form of statistical
analysis, they nevertheless are extremely useful for capturing the main characteristics of
a dataset. Therefore, any statistical analysis of data should start with visualising the
data (covered in Unit 12) and the calculation of descriptive statistics.
Learning outcomes
At the end of this unit, you should be able to:
interpret and summarise raw data on social science variables numerically
Exercises
Exercise 13.1
The tables below show the scores of two groups of students in a test.
Mark 0 1 2 3 4
Frequency 3 x 8 5 10
Mark 0 1 2 3 4
Frequency 3 y 8 5 10
Exercise 13.2
For variables measured at which measurement level(s) (nominal, ordinal, interval or
ratio) is the arithmetic mean the most appropriate?
Exercise 13.3
When a group of 75 respondents were asked whether they agree with a proposed
increase in university tuition fees, the following counts were obtained:
1. Strongly disagree 30
2. Disagree 15
3. Neither agree nor disagree 15
4. Agree 5
5. Strongly agree 10
Total 75
(a) What are the median and mode of the responses? Using the numerical scores in the
left-hand column (i.e. the scores 1 to 5), calculate the mean response. Briefly
discuss whether the mean is appropriate for this type of data.
(b) Do these data indicate that there is widespread dissatisfaction with the proposed
increase in university tuition fees? Justify your answer briefly.
Exercise 13.4
Display the data below using a boxplot and provide the 5-number summary.
3 2 4 8 7 19 2 5 3 4 10 12.
Exercise 13.5
Exercise 13.6
A service station sells both unleaded petrol and diesel. It has recorded the following
frequency distribution for the number of gallons sold per car for the two fuels in a total
sample of 1,000 vehicles.
Unleaded (gallons) Frequency Diesel (gallons) Frequency
[0, 4.99] 74 [0, 4.99] 22
[5, 9.99] 192 [5, 9.99] 68
[10, 14.99] 280 [10, 14.99] 153
[15, 19.99] 105 [15, 19.99] 57
[20, 24.99] 23 [20, 24.99] 11
[25, 29.99] 6 [25, 29.99] 9
Total 680 Total 320
(a) Estimate the mean for these grouped data for unleaded and diesel separately.
(b) Do drivers of unleaded vehicles, or of diesel vehicles, fill up with more fuel, on
average? Give a possible reason for your answer.
(c) Suppose the service station expects to refuel 240 cars in a day, in the same
proportions as given in the above table. Suppose unleaded petrol costs $5.97 a
gallon and diesel costs $6.24 a gallon. Estimate the total daily income from the sale
of fuel.
14. Probability I – Introduction to probability theory
Overview
The world around us is an uncertain place. Will GDP growth next year be positive or
negative? Which political party will win a general election? What will the weather be
tomorrow? These are just a few examples. Yes, we know what could happen (for
example, positive GDP growth, negative GDP growth or no GDP growth) but we do
not know with certainty in advance what will happen. ‘Probability’ allows us to model
uncertainty and in this unit we explore probability theory.
Aims
This unit introduces the concept of probability and its role in modelling uncertainty.
Particular aims are:
to provide an insight into the concept of probability
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Probability’ Chapter 1.
Although probability is an interesting and important subject in its own right, our main
interest in probability arises due to its role in statistical inference. Inference is
For example, a factory has 200 workers, of whom 70 are female. The experiment
consists of randomly selecting an employee, and the event is that the selected
employee is female. In this case we have n = 70 and N = 200. We would write
P(female) = 70/200 = 0.35.
If we toss a fair (i.e. an unbiased) coin, the experiment is the toss of the coin, and
the event is obtaining a tail, say. Now n = 1 and N = 2, so P(tail) = 1/2.
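Both examples use the classical definition P(event) = n/N for equally likely outcomes. A minimal sketch, assuming Python 3; `classical_probability` is a hypothetical name, and `fractions.Fraction` keeps the answers exact.

```python
from fractions import Fraction

def classical_probability(n, N):
    """P(event) = n / N when the event comprises n of N equally likely outcomes."""
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("need N > 0 and 0 <= n <= N")
    return Fraction(n, N)

p_female = classical_probability(70, 200)
print(p_female, float(p_female))     # 7/20 0.35
print(classical_probability(1, 2))   # 1/2
```

The input check mirrors the condition 0 ≤ n ≤ N, which is what guarantees that every probability lies between 0 and 1.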
Even at this stage we can draw some general conclusions. Clearly, n ≥ 0 and n ≤ N, i.e.
0 ≤ n ≤ N. It follows that:
$$\frac{0}{N} \le \frac{n}{N} \le \frac{N}{N}$$
i.e. we have:
$$0 \le P(\cdot) \le 1$$
where P(·) denotes the probability of some event. From this, we deduce that:
It is also true that events which are equally likely to occur or not occur (for example,
tossing a fair coin and getting ‘tails’) have probabilities of 0.5. Sometimes probabilities
are converted to percentages, such as ‘there is a 70% chance of rain tomorrow’.
14.2 Terminology
As we have seen, an experiment is a process which produces outcomes – for example,
rolling a die or selecting 100 eggs from a farm and seeing how many are cracked.
Example 14.1 Suppose an experiment involves rolling two fair dice. What is the
probability that the sum of the scores is 7?
We can set out the sample space as follows:
We can see that there are six outcomes with a sum of 7 (highlighted in bold), out of
36 equally likely outcomes. Hence the probability that the sum is 7 is 6/36 = 1/6.
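The sample space for two dice can be listed by machine rather than by hand. A minimal sketch, assuming Python 3; it uses `itertools.product` to build the 36 ordered pairs (die 1, die 2), and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs
sevens = [outcome for outcome in sample_space if sum(outcome) == 7]

print(len(sample_space), len(sevens))            # 36 6
print(Fraction(len(sevens), len(sample_space)))  # 1/6
print(sevens)   # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
```

Treating (1, 6) and (6, 1) as different outcomes is what makes all 36 pairs equally likely.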
14.3 Sets
Numbers or objects enclosed in braces can be thought of as sets. Sets themselves are
simply collections of objects. For example:
the collection of all outcomes when a die is rolled is the set {1, 2, 3, 4, 5, 6}
the colours of the rainbow form the set {red, orange, yellow, green, blue, indigo,
violet}.
Sets can be represented by Venn diagrams, like the one in Figure 14.1. The set A is
shown as an oval, and the rectangle is the sample space, or ‘universal set’. If
A = {1, 2, 3, 4, 5, 6}, the universal set might be all positive integers; if A = {red, orange,
yellow, green, blue, indigo, violet}, the universal set might be all possible colours.
Union of sets
The union of two sets A and B consists of those elements in A or B or both, and is
written:
A ∪ B.
Figure 14.1: A Venn diagram showing the set A in the universal set (sample space).
No value is listed more than once in the union. For example, if A = {1, 4, 7, 9} and
B = {2, 3, 4, 5, 6}, then A ∪ B = {1, 2, 3, 4, 5, 6, 7, 9}. If A and B are represented by two
(overlapping) shaded ovals, the union is represented by the entire shaded region, as
shown in Figure 14.2.
Figure 14.2: A Venn diagram showing the union of the sets A and B (the shaded region).
Intersection of sets
The intersection of two sets A and B consists of those elements common to both
A and B, and is written:
A ∩ B.
{1, 3, 5, 7} ∩ {2, 4, 6, 8} = ∅.
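Python's built-in `set` type supports these operations directly: `|` is union and `&` is intersection. A minimal sketch, assuming Python 3, using the sets from the examples above.

```python
A = {1, 4, 7, 9}
B = {2, 3, 4, 5, 6}

print(sorted(A | B))   # union: [1, 2, 3, 4, 5, 6, 7, 9]
print(A & B)           # intersection: {4}

# The empty set is written set() in Python; {} would create an empty dict.
print({1, 3, 5, 7} & {2, 4, 6, 8} == set())   # True
```

As in the text, each element appears only once in a union, because a set cannot hold duplicates.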
Two events are mutually exclusive if the existence of one precludes the other. The
events ‘male’ and ‘female’ are mutually exclusive when we observe gender. The
outcomes ‘cracked’ and ‘not cracked’ are mutually exclusive when we sample eggs.
Figure 14.3: A Venn diagram showing the intersection of the sets A and B (the shaded
region).
In general, if A and B are mutually exclusive, the event A ∩ B is certain not to occur.
Hence:
P (A ∩ B) = 0
if A and B are mutually exclusive. Or, equivalently, P (A ∩ B) = 0 if A ∩ B = ∅.
14.4 Independence
Two events are independent if the occurrence or non-occurrence of one does not affect
the occurrence or non-occurrence of the other. For example:
whether Party A wins the election is independent of whether your car breaks down
today
coin tosses are independent of each other – the event of getting ‘heads’ on the first
toss is independent of getting ‘heads’ on the second toss.
However:
whether Party A wins the election is not independent of who the party leader is
whether a die shows a ‘6’ is not independent of whether it shows an even number.
We let P(A | B) denote the probability of A occurring given that B has occurred; this is
called a conditional probability. If A and B are independent, then the probability of
A occurring given that B has occurred is just the probability of A occurring. That is, if
A and B are independent, then:
P(A | B) = P(A) and P(B | A) = P(B).
Hence if A and B are not independent (i.e. are dependent), then P(A | B) ≠ P(A) and
P(B | A) ≠ P(B).
For example, a person’s handedness is presumably independent of whether they prefer
tea or coffee, so:
P(prefers tea | person is right-handed) = P(prefers tea).
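Conditional probability can be computed for a fair die by restricting attention to the outcomes in which the conditioning event has occurred. The sketch below is our own illustration, assuming Python 3; `prob`, `conditional` and the event functions are hypothetical names. It confirms that ‘shows a 6’ and ‘shows an even number’ are dependent, while ‘even’ and ‘shows 4 or less’ happen to be independent.

```python
from fractions import Fraction

DIE = list(range(1, 7))   # a fair die: six equally likely outcomes

def prob(event, space=DIE):
    """Classical probability: outcomes satisfying `event` divided by all outcomes in `space`."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def conditional(event_a, event_b, space=DIE):
    """P(A | B): keep only the outcomes in which B has occurred, then apply prob()."""
    given_b = [s for s in space if event_b(s)]
    return prob(event_a, given_b)

def is_six(s):
    return s == 6

def is_even(s):
    return s % 2 == 0

def is_low(s):
    return s <= 4

print(prob(is_six), conditional(is_six, is_even))    # 1/6 1/3 (dependent)
print(prob(is_even), conditional(is_even, is_low))   # 1/2 1/2 (independent)
```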
If, in rolling a fair die, event A is getting an even number, then $A^c$ is the event of
getting an odd number.
If event A is getting a cracked egg from a sample, then $A^c$ is finding a good
(non-cracked) egg.
If A is the event that it rains tomorrow, then $A^c$ is the event that there is no rain
tomorrow.
If event A occurs for $n$ out of a total of $N$ equally likely elementary events, then $A^c$
corresponds to the remaining $N - n$ elementary events. Note:
$$\frac{N - n}{N} = \frac{N}{N} - \frac{n}{N}$$
and we deduce the following important result.
Complementary events
$$P(A^c) = 1 - P(A).$$
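The complement rule can be checked on the two-dice sample space from Example 14.1 by counting the complementary event directly. A minimal sketch, assuming Python 3; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two fair dice, as in Example 14.1: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))
N = len(sample_space)

p_seven = Fraction(sum(1 for a, b in sample_space if a + b == 7), N)
p_not_seven = Fraction(sum(1 for a, b in sample_space if a + b != 7), N)  # counted directly

print(p_seven, p_not_seven, 1 - p_seven)   # 1/6 5/6 5/6
```

In practice the rule saves work: counting the 6 outcomes that sum to 7 is easier than counting the 30 that do not.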
Example 14.4 Suppose movies at a rental store are classified by genre. Let C and H be
the events that the movie rented by the next customer is a comedy and a horror movie,
respectively. Suppose P(C) = 0.26 and P(H) = 0.18. Using (14.1), we have:
$$P(C \cup H) = P(C) + P(H) = 0.26 + 0.18 = 0.44.$$
Hence the probability that it is neither comedy nor horror is 1 − 0.44 = 0.56.
(Of course, we are assuming that there are no horror comedies!)
Let us look again at the Venn diagram for the union of two sets in Figure 14.2. We shall
use the notation n(A ∪ B) for the number of elements in A ∪ B, and similarly for n(A),
n(B) and n(A ∩ B). Now count n(A ∪ B). It is not n(A) + n(B), because the elements in
A ∩ B will have been counted twice. If we subtract n(A ∩ B) we will be right. So:
$$n(A \cup B) = n(A) + n(B) - n(A \cap B).$$
Hence:
$$\frac{n(A \cup B)}{N} = \frac{n(A)}{N} + \frac{n(B)}{N} - \frac{n(A \cap B)}{N}$$
where there are N equally likely elementary events altogether. However,
n(A ∪ B)/N = P(A ∪ B), n(A)/N = P(A), and so on. We deduce the general law of addition:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B). \qquad (14.2)$$
(14.2) holds for any events A and B, regardless of whether or not they are mutually
exclusive. (14.1) is a special case of (14.2) where A ∩ B = ∅, i.e. when P (A ∩ B) = 0.
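The counting argument behind (14.2) can be checked with the sets from the union example earlier in this unit, and the law itself is a one-line function. A minimal sketch, assuming Python 3; `prob_union` is a hypothetical name, and leaving its third argument at 0 gives the mutually exclusive special case (14.1).

```python
# Counting check, using the sets from the union example earlier in this unit.
A = {1, 4, 7, 9}
B = {2, 3, 4, 5, 6}
assert len(A | B) == len(A) + len(B) - len(A & B)   # 8 == 4 + 5 - 1

def prob_union(p_a, p_b, p_a_and_b=0.0):
    """General addition law (14.2); with p_a_and_b = 0 it reduces to (14.1)."""
    return p_a + p_b - p_a_and_b

print(round(prob_union(0.26, 0.18), 2))   # 0.44, as in Example 14.4
```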
Example 14.6 If 16% of the population are left-handed, 30% are overweight, but only 25% of left-handers are overweight, what is the probability of a randomly-selected person being left-handed or overweight?
Let L and O be the events of the person being left-handed and overweight, respectively. We have P(L) = 0.16 and P(O) = 0.3. Since 25% of the 16% who are left-handed are overweight, P(L ∩ O) = 0.25 × 0.16 = 0.04. Applying (14.2) gives:
$$P(L \cup O) = P(L) + P(O) - P(L \cap O) = 0.16 + 0.3 - 0.04 = 0.42.$$
When a fair die is rolled and a fair coin is tossed, the probability of getting a ‘6’ and a head is 1/12. This is because (6, H) is just one elementary outcome out of 12, written in full as:
(1, H), (2, H), (3, H), (4, H), (5, H), (6, H),
(1, T), (2, T), (3, T), (4, T), (5, T), (6, T).
Notice that 1/6 × 1/2 = 1/12. We have been able to multiply the probabilities because the two events are independent.
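The 12 elementary outcomes listed above can be generated and counted directly. This confirms that the joint probability equals the product of the two individual probabilities. The sketch assumes a fair die and a fair coin. The helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely outcomes of rolling a fair die and tossing a fair coin
outcomes = list(product(range(1, 7), "HT"))

def prob(predicate):
    """Fraction of the equally likely outcomes satisfying the predicate."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

p_six = prob(lambda o: o[0] == 6)
p_head = prob(lambda o: o[1] == "H")
p_both = prob(lambda o: o[0] == 6 and o[1] == "H")
print(len(outcomes), p_six, p_head, p_both)   # 12 1/6 1/2 1/12
print(p_both == p_six * p_head)               # True: the events are independent
```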
Independent events
If A and B are independent events, then:
$$P(A \cap B) = P(A)\,P(B). \qquad (14.3)$$
Example 14.7 Suppose 90% of UK adults drive a car and 60% have a broadband
connection. What is the probability that a UK adult drives and has a broadband
connection?
Well, it is tempting to use (14.3) to say that the probability is 0.9 × 0.6 = 0.54.
However, this depends on the two events being independent. If they are, the answer
is correct; if not, we do not have enough information to solve the problem.
Activity 14.1 A chain is formed from n links. The strengths of the links are
mutually independent, and the probability that any one link fails under a specified
load is q. What is the probability that the chain fails under the load?
Recall that P(A | B) is the probability of event A given that event B has occurred. The event A ∩ B (A and B) occurs if both A and B occur. Event B occurs with probability P(B). Given that B has occurred, A then occurs with probability P(A | B). So we have:
$$P(A \cap B) = P(B)\,P(A \mid B). \qquad (14.4)$$
By the same argument with the roles of A and B swapped, we could also write P(A ∩ B) = P(A) P(B | A).
Example 14.8 Suppose a company has 140 employees, of whom 30 are supervisors. 80 of the employees are married, and 20% of the married employees are supervisors. What is the probability that a randomly selected employee is a married supervisor?
We let M denote married, S denote supervisor and we require P(M ∩ S). We know that P(M) = 80/140 = 4/7 and P(S | M) = 20/100 = 1/5. So, applying (14.4), we obtain:
$$P(M \cap S) = P(M)\,P(S \mid M) = \frac{4}{7} \times \frac{1}{5} = \frac{4}{35} \approx 0.1143.$$
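As a cross-check of Example 14.8, the multiplication rule (14.4) can be compared with a direct head count: 20% of the 80 married employees gives 16 married supervisors out of 140. The variable names below are illustrative.

```python
from fractions import Fraction

employees = 140
married = 80
married_supervisors = Fraction(20, 100) * married   # 20% of married employees: 16 people

p_M = Fraction(married, employees)      # P(M) = 4/7
p_S_given_M = Fraction(20, 100)         # P(S | M) = 1/5
p_M_and_S = p_M * p_S_given_M           # multiplication rule (14.4)

print(p_M_and_S, round(float(p_M_and_S), 4))          # 4/35 0.1143
print(p_M_and_S == married_supervisors / employees)   # True: 16/140 = 4/35
```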
Do not confuse ‘mutually exclusive’ and ‘independent’! ‘Mutually exclusive’ means the two events cannot occur simultaneously. ‘Independent’ means that the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of the other. These are not the same thing at all!
Activity 14.2 Suppose A and B are events with P (A) = 0.2, P (B) = p and
P (A ∪ B) = 0.6.
We state the theorem in three forms, of increasing generality, illustrating both ideas.
The first two will be justified, but not the third (the proof is beyond the scope of the
course).
14.8.1 Version 1
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}. \qquad (14.5)$$
So the unconditional probability of 5% has been changed to 10% in light of the new
information about unmet demand.
14.8.2 Version 2
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}. \qquad (14.6)$$
This follows from the first version, which has denominator P(A). Exactly one of B and $B^c$ must occur, so we can deduce that $P(A) = P(A \cap B) + P(A \cap B^c)$ (a Venn diagram may make this clear). Also, $P(A \cap B) = P(A \mid B)\,P(B)$ and $P(A \cap B^c) = P(A \mid B^c)\,P(B^c)$ by two applications of (14.4), so:
$$P(A) = P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c).$$
$$P(D \mid L) = \frac{P(L \mid D)\,P(D)}{P(L \mid D)\,P(D) + P(L \mid D^c)\,P(D^c)}.$$
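Version 2 can be checked numerically by reusing the figures from Example 14.6. Let B be left-handed (L) and A be overweight (O), and ask for P(L | O). The value P(O | $L^c$) is not given directly. The sketch derives it as (0.3 − 0.04)/0.84 and then confirms that Version 2 and Version 1 agree. The function name `bayes_two` is a hypothetical helper.

```python
from fractions import Fraction as F

def bayes_two(p_A_given_B, p_B, p_A_given_Bc):
    """Version 2 of Bayes' theorem: P(B | A) with P(A) expanded over B and B^c."""
    p_Bc = 1 - p_B
    numerator = p_A_given_B * p_B
    return numerator / (numerator + p_A_given_Bc * p_Bc)

# Data from Example 14.6: B = left-handed (L), A = overweight (O)
p_L = F(16, 100)
p_O = F(30, 100)
p_O_given_L = F(25, 100)
p_O_and_L = p_O_given_L * p_L                  # 0.04
p_O_given_Lc = (p_O - p_O_and_L) / (1 - p_L)   # 0.26 / 0.84

v2 = bayes_two(p_O_given_L, p_L, p_O_given_Lc)
v1 = p_O_given_L * p_L / p_O                   # Version 1, using P(O) directly
print(v2, v1, round(float(v2), 4))             # 2/15 2/15 0.1333
```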
14.8.3 Version 3
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_n)\,P(B_n)}. \qquad (14.7)$$
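As previously mentioned, we omit the proof of this version. However, note that exactly one of the events $B_1, B_2, \ldots, B_n$ must occur, so the denominator of (14.7) is:
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i).$$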
Example 14.11 Machines 1, 2 and 3 all produce the same two parts A and Z. Of
all the parts produced, machine 1 produces 60%, machine 2 produces 30% and
machine 3 produces 10%. In addition, 40% of parts made by machine 1 are part A,
50% of parts made by machine 2 are part A, and 70% of parts made by machine 3
are part A. A part is randomly selected and is found to be an A part. With this
knowledge, what are the revised probabilities that it came from machines 1, 2 and 3,
respectively?
Let A be the event that we have randomly selected an A part, and let $B_i$ be the event that the part came from machine i, for i = 1, 2, 3. Then:
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + P(A \mid B_3)\,P(B_3)}$$
and we can usefully put the calculations in a table:
Machine i   P(B_i)   P(A | B_i)   P(A | B_i) P(B_i)   P(B_i | A)
1           0.6      0.4          0.24                0.24/0.46 ≈ 0.52
2           0.3      0.5          0.15                0.15/0.46 ≈ 0.33
3           0.1      0.7          0.07                0.07/0.46 ≈ 0.15
Total       1.0                   0.46                1.00
Using the knowledge that the part was an A part, we have found revised probabilities of 0.52, 0.33 and 0.15 that it came from machines 1, 2 and 3, respectively. These replace the original probabilities of 0.6, 0.3 and 0.1. The unmodified and modified probabilities are sometimes called prior and posterior probabilities, respectively.
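The table for Example 14.11 is a direct application of Version 3. The sketch below computes it for any partition, given the prior probabilities P($B_i$) and the conditional probabilities P(A | $B_i$). The function name `bayes_posteriors` is illustrative and not taken from any library.

```python
def bayes_posteriors(priors, likelihoods):
    """Version 3 of Bayes' theorem for a partition B_1, ..., B_n."""
    joint = [l * p for p, l in zip(priors, likelihoods)]   # P(A | B_i) P(B_i)
    p_A = sum(joint)                                       # denominator: P(A)
    return p_A, [j / p_A for j in joint]

priors = [0.6, 0.3, 0.1]        # share of output from machines 1, 2, 3
likelihoods = [0.4, 0.5, 0.7]   # P(part A | machine i)
p_A, posteriors = bayes_posteriors(priors, likelihoods)
print(round(p_A, 2), [round(q, 2) for q in posteriors])   # 0.46 [0.52, 0.33, 0.15]
```

The posteriors always sum to 1, because each one is a share of the same total P(A).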
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}.$$
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_n)\,P(B_n)}.$$
Example 14.12 Five bonds are rated A+, A, B+, B and C (one rating each), depending on the stability of the issuing firm. An inexperienced bond buyer selects two different bonds at random from these five bonds (i.e. without replacement).
(a) What is the probability that she does not select the C-rated bond?
(b) What is the probability that she selects only the A+ and A-rated bonds?
(a) Let $C_1^c$ be the event that the first selected bond is not the C-rated bond, and $C_2^c$ be the event that the second selected bond is not the C-rated bond. We require $P(C_1^c \cap C_2^c) = P(C_1^c)\,P(C_2^c \mid C_1^c)$, and we know that $P(C_1^c) = 4/5$. To find $P(C_2^c \mid C_1^c)$, note that if the first bond is not the C-rated one, four bonds remain: one is the C-rated bond and the other three are not. So $P(C_2^c \mid C_1^c) = 3/4$, hence:
$$P(C_1^c \cap C_2^c) = P(C_1^c)\,P(C_2^c \mid C_1^c) = \frac{4}{5} \times \frac{3}{4} = \frac{3}{5}.$$
(b) The buyer selects only the A+ and A-rated bonds if either the first is A-rated and the second is A+-rated, or the first is A+-rated and the second is A-rated. The first of these probabilities is 1/5 × 1/4, by a similar argument to part (a). The second is also 1/5 × 1/4. So the probability of her selecting these two bonds is:
$$\frac{1}{20} + \frac{1}{20} = \frac{1}{10}.$$
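Both answers in Example 14.12 can be confirmed by listing every ordered selection of two different bonds. There are 5 × 4 = 20 such selections, all equally likely. This assumes the buyer picks uniformly at random, as the example states. The helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import permutations

bonds = ["A+", "A", "B+", "B", "C"]
# Ordered selections of two different bonds (without replacement)
draws = list(permutations(bonds, 2))

def prob(predicate):
    """Fraction of the equally likely ordered draws satisfying the predicate."""
    return Fraction(sum(1 for d in draws if predicate(d)), len(draws))

p_no_C = prob(lambda d: "C" not in d)                  # part (a)
p_A_plus_and_A = prob(lambda d: set(d) == {"A+", "A"})  # part (b)
print(len(draws), p_no_C, p_A_plus_and_A)               # 20 3/5 1/10
```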
Example 14.13 A recent survey of 1,700 companies showed that 49% performed studies of marketing effectiveness, 61% conducted short-term sales forecasts, and 38% undertook both activities. Let A denote the event that the firm studies marketing effectiveness and let B denote the event that the firm produces short-term sales forecasts. Find P(A ∪ B) and P(A | B), and determine how many of the firms undertook both A and B.
Note that P(A) = 0.49, P(B) = 0.61 and P(A ∩ B) = 0.38 directly. So:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.49 + 0.61 - 0.38 = 0.72$$
and:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.38}{0.61} \approx 0.62.$$
We would say that 0.38 × 1,700 = 646 firms undertook both A and B.
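The three calculations in Example 14.13 take only a few lines. The sketch uses the survey proportions exactly as given and rounds the results for display.

```python
p_A, p_B, p_AB = 0.49, 0.61, 0.38
n_firms = 1700

p_A_or_B = p_A + p_B - p_AB   # general law of addition (14.2)
p_A_given_B = p_AB / p_B      # conditional probability
both = p_AB * n_firms         # number of firms doing both activities
print(round(p_A_or_B, 2), round(p_A_given_B, 2), round(both))   # 0.72 0.62 646
```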
Example 14.14 The table below gives the marital status of adults in a country by
sex in terms of proportions of the total population.
There are more women than men in the ratio 0.525 : 0.475 = 21 : 19.
If we also knew that the total adult population is 13.6 million, we could convert the proportions to absolute numbers. The resulting table is known as a contingency table.
Example 14.15 Two events A and B are independent with P (A) = 0.3 and
P (B) = 0.1.
(a) Are A and B mutually exclusive? Give a reason.
Since A and B are independent, P(A ∩ B) = P(A) P(B) = 0.3 × 0.1 = 0.03 ≠ 0.
Therefore, the event A ∩ B can occur, i.e. A and B are not mutually exclusive.
(c) We have:
(d) Look at the Venn diagram in Figure 14.2. The white area represents both $A^c \cap B^c$ and $(A \cup B)^c$. These are the same set, i.e. $A^c \cap B^c = (A \cup B)^c$. Hence:
14.10 Summary
This unit has introduced the fundamentals of probability theory and we have seen some
important probability results, including conditional probability and independence. A
good grounding in probability theory is necessary before moving on to probability
distributions in the next unit.
Learning outcomes
At the end of this unit, you should be able to:
apply the ideas and notation used for sets in simple examples
Exercises
Exercise 14.1
Let K be the event of drawing a ‘king’ from a well-shuffled deck of playing cards. Let D
be the event of drawing a ‘diamond’ from the pack. Determine:
(a) $P(K)$
(b) $P(D)$
(c) $P(K^c)$
(d) $P(K \cap D)$
(e) $P(K \cup D)$
(f) $P(K \mid D)$
(g) $P(D \mid K)$
(h) $P(D \cup K^c)$
(i) $P(D^c \cap K)$
Exercise 14.2
If A and B are independent events such that P(A) = 0.2 and P(B) = 0.6, what is $P(A^c \cap B^c)$?
Exercise 14.3
A and B are two mutually exclusive events. State what this means:
(a) in words
Exercise 14.4
A student has an important job interview in the morning. To ensure he wakes up in
time, he sets two alarm clocks which ring with probabilities 0.97 and 0.99, respectively.
What is the probability that at least one of the alarm clocks will wake him up?
Exercise 14.5
20% of men show early signs of losing their hair. 2% of men carry a gene that is related
to hair loss. 80% of men who carry the gene experience early hair loss.
(a) What is the probability that a man carries the gene and experiences early hair loss?
(b) What is the probability that a man carries the gene, given that he experiences
early hair loss?
Exercise 14.6
Tower Construction Company (‘Tower’) is determining whether it should submit a bid
for a new shopping centre. In the past, Tower’s main competitor, Skyrise Construction
Company (‘Skyrise’), has submitted bids 80% of the time. If Skyrise does not bid on a
job, the probability that Tower will get the job is 0.6. If Skyrise does submit a bid, the
probability that Tower gets the job is 0.35.
(a) What is the probability that Tower will get the job?
(b) If Tower gets the job, what is the probability that Skyrise made a bid?
(c) If Tower did not get the job, what is the probability that Skyrise did not make a
bid?
Exercise 14.7
In a large lecture, 60% of students are female and 40% are male. Records show that
15% of female students and 20% of male students are registered as part-time students.
(a) If a student is chosen at random from the lecture, what is the probability that the
student studies part-time?
(b) If a randomly chosen student studies part-time, what is the probability that the
student is male?
Exercise 14.8
James is a salesman for a company and sells two products, A and B. He visits three
different customers each day. For each customer, the probability that James sells
product A is 1/3 and the probability is 1/4 that he sells product B. The sale of product
A is independent of the sale of product B during any visit, and the results of the three
visits are mutually independent. Calculate the probability that James will:
(a) sell both products, A and B, on the first visit
(b) sell only one product during the first visit
(c) make no sales of product A during the day
(d) make at least one sale of product B during the day.
Exercise 14.9
Given two events, A and B, state why each of the following is not possible. Use
formulae or equations to illustrate your answer.
Exercise 14.10
At a local school, 90% of the students took test A, and 15% of the students took both
test A and test B. Based on the information provided, which of the following
calculations are not possible, and why? What can you say based on the data?
(a) P (B | A).
(b) P (A | B).
(c) P (A ∪ B).
If you knew that everyone who took test B also took test A, how would that change
your answers?
Exercise 14.11
A company is concerned about interruptions to email. It was noticed that problems
occurred on 15% of workdays. To see how bad the situation is, calculate the
probabilities of an interruption during a five-day working week:
Exercise 14.12
A restaurant manager classifies customers as well-dressed, casually-dressed or
poorly-dressed and finds that 50%, 40% and 10%, respectively, fall into these categories.
The manager found that wine was ordered by 70% of the well-dressed, by 50% of the
casually-dressed and by 30% of the poorly-dressed.
(a) What is the probability that a randomly chosen customer orders wine?
(b) If wine is ordered, what is the probability that the person ordering is well-dressed?
(c) If wine is not ordered, what is the probability that the customer is poorly-dressed?
15. Probability II – Probability distributions
Overview
The previous unit introduced probability as a means for modelling uncertainty. We now
consider the probabilities attached to all possible outcomes of a chance experiment, that
is, how probability is distributed across the sample space. Just as we used descriptive
statistics to summarise important features of sample datasets, here we learn how to
calculate equivalent features of population probability distributions.
Aims
This unit explores probability distributions and how to calculate the expected value and
variance for discrete random variables. Particular aims are:
to introduce some common discrete probability distributions
to explore properties of such distributions such as the expected value and variance.
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Probability’ Chapter 2.
Example 15.1 Examine the outcomes when two fair dice are rolled. We consider the random variable X, the sum of the two scores shown. We can read off the various possibilities from the sample space:
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)
X=x 2 3 4 5 6 7 8 9 10 11 12
Probability 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
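The table above can be reproduced by enumerating the 36 equally likely ordered pairs and counting how often each sum occurs. Note that `Fraction` reduces automatically, so 2/36 is printed as 1/18, 3/36 as 1/12, and so on.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each of the 36 ordered pairs (first die, second die) is equally likely
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x, p in dist.items():
    print(x, p)   # 2 1/36, 3 1/18, ..., 7 1/6, ..., 12 1/36
```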
Example 15.2 We describe the sample space when two fair coins are tossed, and
the associated random variable, X, which counts the number of tails. The sample
space is:
S = {HH, HT, T H, T T }.
The distribution of X is:
Number of tails 0 1 2
Probability 1/4 1/2 1/4
So, X is a random variable taking values 0, 1 and 2 with probabilities 1/4, 1/2 and
1/4, respectively.
The first of these is clearly finite. The others are finite in practice – but there is no
theoretical upper limit, so it may be convenient to specify an infinite number of
outcomes.
We can describe a discrete distribution (i.e. the random variable and its associated
probabilities) with a histogram or bar chart (rarely), a table as for the dice and coins
above (sometimes) or with a rule or formula (most common).
A useful discrete distribution which we shall discuss later is the binomial
distribution.
Number of tails 0 1 2
Probability 1/4 1/2 1/4
In the penultimate line, the first term is 0 × P (X = 0), the second term is
1 × P (X = 1) and the third term is 2 × P (X = 2).
Therefore, the mean value of X is:
$$\sum_{x=0}^{2} x\,P(X = x) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 1.$$
We often write:
E(X) = µ
the same symbol used for the (population) arithmetic mean. We can think of E(X) as
the long-run average when the experiment is carried out a large number of times.
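The ‘long-run average’ interpretation can be illustrated by simulation. The sketch repeats the two-coin experiment of Example 15.2 many times and averages the number of tails. The average should settle close to E(X) = 1. The seed and the number of repetitions are arbitrary choices. A different seed gives a slightly different average.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def tails_in_two_tosses():
    """Number of tails when two fair coins are tossed."""
    return sum(random.random() < 0.5 for _ in range(2))

n = 100_000
average = sum(tails_in_two_tosses() for _ in range(n)) / n
print(average)  # close to E(X) = 1
```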
Example 15.3 Suppose I buy a lottery ticket for £1. I can win £500 with a
probability of 0.001 or £100 with a probability of 0.003. What is my expected profit?
We begin by defining the random variable X to be my profit (in pounds). Its
distribution is:
X=x −1 99 499
P (X = x) 0.996 0.003 0.001
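Hence:
$$E(X) = (-1) \times 0.996 + 99 \times 0.003 + 499 \times 0.001 = -0.996 + 0.297 + 0.499 = -0.2.$$
So I expect to make a loss of £0.20 (which will go towards funding the prize money or, possibly, charity).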
X=x 1 2 3 4 5 6
P (X = x) 1/6 1/6 1/6 1/6 1/6 1/6
These take the values derived from the function given, and the associated probabilities
are derived from those of X. Therefore, from the distribution of X we can derive, for example, the distribution of $X_1 = 1/X$.
$X_2 = x_2$          1     4     9     16    25    36
$P(X_2 = x_2)$       1/6   1/6   1/6   1/6   1/6   1/6
$X_3 = x_3$          0     1     2
$P(X_3 = x_3)$       1/2   1/3   1/6
Just as we defined:
$$E(X) = \sum_x x\,P(X = x)$$
the formula:
$$E(g(X)) = \sum_x g(x)\,P(X = x)$$
where g(X) is the function of X being considered, defines the expectation of this function of X. Again, we sum over all the x values which are taken by the random variable X.
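Example 15.4 For the random variables $X_1$, $X_2$ and $X_3$ defined above, we have the following.
For $X_1$, its expectation is:
$$E(X_1) = E\left(\frac{1}{X}\right) = \sum_x \frac{1}{x}\,P(X = x) = \frac{1}{1} \times \frac{1}{6} + \frac{1}{2} \times \frac{1}{6} + \frac{1}{3} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6} + \frac{1}{5} \times \frac{1}{6} + \frac{1}{6} \times \frac{1}{6} = \frac{49}{120}.$$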
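The rule E(g(X)) = Σ g(x) P(X = x) translates directly into a small function. The sketch stores a distribution as a mapping from values to probabilities and sums g(x) times the probability. The name `expectation` is an illustrative helper. Note that E(1/X) = 49/120 is not the same as 1/E(X) = 2/7.

```python
from fractions import Fraction

# Distribution of X, the score on one roll of a fair die
dist = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum over x of g(x) P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

print(expectation(dist))                            # 7/2
print(expectation(dist, lambda x: Fraction(1, x)))  # 49/120
print(expectation(dist, lambda x: x ** 2))          # 91/6
```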
X=x 1 2 3 4 5
P (X = x) 0.1 0.2 0.3 0.3 0.1
(b) E(2X + 1)
(c) $E(X^3)$
(d) E(1/X).
Recall that E(X) = µ, and the summation is taken over all the x values which are taken by the random variable X. We often write:
$$\operatorname{Var}(X) = \sigma^2$$
and, just as we could find a sample variance in two ways, we can rewrite this as:
$$\operatorname{Var}(X) = \sum_x x^2\,P(X = x) - \mu^2. \qquad (15.2)$$
So there are two equivalent versions we can use. The latter is often easier in practice. The square root of the variance is the standard deviation. We could write (15.2) more succinctly as follows.
$$\operatorname{Var}(X) = E(X^2) - \big(E(X)\big)^2.$$
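Example 15.5 For the two coins in Example 15.2, we saw that µ = 1. Therefore, the variance is:
$$\begin{aligned}
\sigma^2 = \sum_{x=0}^{2} (x - \mu)^2\,P(X = x) &= (0 - 1)^2 \times \frac{1}{4} + (1 - 1)^2 \times \frac{1}{2} + (2 - 1)^2 \times \frac{1}{4} \\
&= \frac{1}{4} + 0 + \frac{1}{4} \\
&= \frac{1}{2} \quad \text{(first method)}
\end{aligned}$$
or:
$$\begin{aligned}
\sigma^2 = \sum_{x=0}^{2} x^2\,P(X = x) - \mu^2 &= 0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4} - 1^2 \\
&= 0 + \frac{1}{2} + 1 - 1 \\
&= \frac{1}{2} \quad \text{(second method)}
\end{aligned}$$
giving a standard deviation of $1/\sqrt{2}$.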
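The two methods in Example 15.5 can be compared with exact arithmetic. The sketch computes the variance from the definition and from (15.2) for the two-coin distribution, and confirms that both give 1/2.

```python
from fractions import Fraction
from math import sqrt

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}   # tails in two tosses

mu = sum(x * p for x, p in dist.items())
var_first = sum((x - mu) ** 2 * p for x, p in dist.items())      # definition
var_second = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2   # (15.2)
print(mu, var_first, var_second, sqrt(var_second))   # 1 1/2 1/2 and about 0.7071
```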
X=x 0 1 2 3 4
P (X = x) 0.25 0.30 0.25 0.15 0.05
(b) What is the probability that, on any given day, the number of calls exceeds:
i. µ + 2σ
ii. µ + 3σ?
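(a) We have:
$$E(X) = \sum_x x\,P(X = x) = 0 \times 0.25 + 1 \times 0.30 + 2 \times 0.25 + 3 \times 0.15 + 4 \times 0.05 = 1.45$$
and:
$$\operatorname{Var}(X) = \sum_x x^2\,P(X = x) - \mu^2 = 3.45 - (1.45)^2 = 1.3475.$$
(b) We have that $\sigma = \sqrt{\operatorname{Var}(X)} = \sqrt{1.3475} \approx 1.16$.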
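The sketch below reproduces the mean, variance and standard deviation for the daily-calls distribution. It also evaluates the two tail probabilities asked for in (b), P(X > µ + 2σ) and P(X > µ + 3σ), by summing the probabilities of the values above each threshold. The helper `prob_exceeds` is an illustrative name.

```python
from math import sqrt

dist = {0: 0.25, 1: 0.30, 2: 0.25, 3: 0.15, 4: 0.05}

mu = sum(x * p for x, p in dist.items())
var = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2
sigma = sqrt(var)

def prob_exceeds(c):
    """P(X > c) for the discrete distribution above."""
    return sum(p for x, p in dist.items() if x > c)

print(round(mu, 4), round(var, 4), round(sigma, 2))                 # 1.45 1.3475 1.16
print(prob_exceeds(mu + 2 * sigma), prob_exceeds(mu + 3 * sigma))   # 0.05 0
```

Here µ + 2σ ≈ 3.77, so only x = 4 exceeds it. µ + 3σ ≈ 4.93 lies above every possible value.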
Activity 15.2 Find the variance and standard deviation of the discrete probability
distribution in Activity 15.1.
X=x 1 2 3 4 5 6 7
P (X = x) 1/7 1/7 1/7 1/7 1/7 1/7 1/7
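What are the mean and variance in the general case, where X takes each of the values 1, 2, ..., k with probability 1/k? Well, the mean is:
$$E(X) = \sum_x x\,P(X = x) = 1 \times \frac{1}{k} + 2 \times \frac{1}{k} + \cdots + k \times \frac{1}{k} = \frac{1}{k}\,(1 + 2 + \cdots + k).$$
A useful result from mathematics is that:
$$1 + 2 + \cdots + k = \frac{k(k+1)}{2}.$$
Refer to Activity 10.1 where this is derived. So:
$$E(X) = \frac{1}{k} \times \frac{k(k+1)}{2} = \frac{k+1}{2}.$$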
Therefore, the expectation is the arithmetic mean of the minimum and maximum values.
A similar, slightly more involved calculation shows that:
$$\operatorname{Var}(X) = \frac{k^2 - 1}{12}.$$
This calculation uses the result:
$$1^2 + 2^2 + \cdots + k^2 = \frac{k(k+1)(2k+1)}{6}.$$
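The closed forms (k + 1)/2 and (k² − 1)/12 can be checked against direct summation for many values of k at once. The sketch below does this exactly for k = 1, ..., 20. It is a numerical check, not a proof. The function name `uniform_mean_var` is illustrative.

```python
from fractions import Fraction

def uniform_mean_var(k):
    """Exact mean and variance of X uniform on {1, 2, ..., k}."""
    p = Fraction(1, k)
    mean = sum(x * p for x in range(1, k + 1))
    var = sum(x * x * p for x in range(1, k + 1)) - mean ** 2
    return mean, var

for k in range(1, 21):
    mean, var = uniform_mean_var(k)
    assert mean == Fraction(k + 1, 2)
    assert var == Fraction(k * k - 1, 12)

print(uniform_mean_var(7))   # (Fraction(4, 1), Fraction(4, 1)) for the k = 7 table
```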
If X ∼ Bernoulli(π), then:
E(X) = π
and:
Var(X) = π (1 − π).
Activity 15.4 Derive the mean and variance of the Bernoulli distribution.
If X denotes the total number of successes in these n trials then X follows a binomial
distribution with parameters n and π, where n ≥ 1 is a known integer, and 0 ≤ π ≤ 1.
This is often written as:
X ∼ Bin(n, π).
A certain type of car battery has a known market share. If we examine n cars, we
can find the probability of finding x batteries of this type.
Example 15.7 A multiple choice test has four questions, each with four possible
answers. James is taking the test, but has no idea at all about the answers. So he
guesses every answer and, therefore, has a probability of 1/4 of getting any
individual question correct by chance.
Let X denote the number of correct answers in James’ test. This follows a binomial
distribution with n = 4 and π = 0.25, hence:
X ∼ Bin(4, 0.25).
and:
$$P(X = 4) = \binom{4}{4} \times (0.25)^4 \times (0.75)^0 \approx 0.004.$$
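The sketch below computes binomial probabilities for Example 15.7. It uses the standard binomial probability function P(X = x) = C(n, x) πˣ (1 − π)ⁿ⁻ˣ, written as a hypothetical helper `binom_pmf`, with n = 4 and π = 0.25. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 15.7: four guessed questions, each correct with probability 0.25
for x in range(5):
    print(x, round(binom_pmf(x, 4, 0.25), 4))
```

The x = 4 line gives about 0.0039, which rounds to the 0.004 above.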
Example 15.9 Suppose we now have a test with 20 questions where each question
has 4 possible answers and consider again a student who guesses every one of the
answers. Let X denote the number of correct answers by such a student, so that
X ∼ Bin(20, 0.25). The expected number of correct answers is E(X) = 20 × 0.25 = 5.
The teacher wants to set the pass mark of the examination so that, for such a
student, the probability of passing is less than 0.05. What should the pass mark be?
In other words, what is the smallest x such that P (X ≥ x) < 0.05, i.e. such that
P (X < x) ≥ 0.95?
Calculating the probabilities of x = 0, 1, . . . , 20 we get (rounded to 3 decimal places):
X=x 0 1 2 3 4 5 6 7 8 9 10
P (X = x) 0.003 0.021 0.067 0.134 0.190 0.202 0.169 0.112 0.061 0.027 0.010
X=x 11 12 13 14 15 16 17 18 19 20
P (X = x) 0.003 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
We find that P (X < 8) = 0.898 and P (X < 9) = 0.959. Therefore, using the result
P (Ac ) = 1 − P (A), P (X ≥ 8) = 0.102 > 0.05 and P (X ≥ 9) = 0.041 < 0.05. So the
pass mark should be set at 9.
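The pass-mark search in Example 15.9 can be automated. The sketch finds the smallest x with P(X ≥ x) < 0.05 for X ∼ Bin(20, 0.25). The probability function is redefined here so the block is self-contained. The helper names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 20, 0.25

def prob_at_least(x):
    """P(X >= x) for X ~ Bin(20, 0.25)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

# Smallest pass mark x with P(X >= x) < 0.05 for a student who guesses everything
pass_mark = next(x for x in range(n + 1) if prob_at_least(x) < 0.05)
print(pass_mark, round(prob_at_least(8), 3), round(prob_at_least(9), 3))   # 9 0.102 0.041
```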
More generally, consider a student who has the same probability π of getting the
correct answer for each question, so that X ∼ Bin(20, π). In Figure 15.1, plots of the
probabilities for π = 0.25, 0.5, 0.7 and 0.9 are provided. Note how the shape of the
Figure 15.1: Binomial distribution probabilities where X ∼ Bin(20, π), for π = 0.25, 0.5,
0.7 and 0.9.
Figure 15.1 illustrates how different probability distributions may differ from each other
in a broader or narrower sense. In the broader sense, we have different families of
distributions which may have quite different characteristics, for example:
• among continuous: different sets of possible values (for example, all real numbers x,
or x > 0).
The distributions discussed in this unit are really families of distributions in this sense.
In the narrower sense, individual distributions within a family differ in having different
values of the parameters of the distribution. The parameters determine the mean and
• use observed data to estimate values of the parameters of that distribution, and
perform statistical inference.
(a) no defectives
If X ∼ Poisson(λ), then:
E(X) = λ (15.3)
and:
Var(X) = λ.
Poisson distributions are used for counts of occurrences of various kinds. To give a
formal motivation, suppose that we consider the number of occurrences of some
phenomenon in time, and that the process which generates the occurrences satisfies the
following conditions.
247
15. Probability II – Probability distributions
1. The numbers of occurrences in any two non-overlapping intervals of time are independent of each other.
2. The probability of two or more occurrences at the same time is negligibly small.
3. The probability of one occurrence in any short time interval of length t is approximately λt, for some constant λ > 0.
Because λ is the rate per unit of time, its value also depends on the unit of time (length
of interval) we consider. For example, if X is the number of arrivals in an hour and
X ∼ Poisson(1.5), then if Y is the number of arrivals in two hours we must have that
Y ∼ Poisson(2 × 1.5) = Poisson(3).
λ is also the mean, E(X), of the distribution as we saw in (15.3).
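The change of time unit can be checked numerically. Under condition 1, arrivals in two consecutive hours are independent. So the two-hour count is the sum of two independent Poisson(1.5) counts. The sketch convolves the two one-hour distributions and compares the result with Poisson(3), term by term. The helper `poisson_pmf` uses the standard Poisson formula e^(−λ) λˣ / x!.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

# Arrivals in two consecutive hours, each hour ~ Poisson(1.5), independent
for y in range(8):
    conv = sum(poisson_pmf(j, 1.5) * poisson_pmf(y - j, 1.5) for j in range(y + 1))
    print(y, round(conv, 6), round(poisson_pmf(y, 3.0), 6))   # the two columns agree
```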
Both motivations suggest that distributions with higher values of λ have higher
probabilities of large values of X. For example, Figure 15.2 plots the probabilities
P (X = x) for x = 0, 1, 2, . . . , 10 for Poisson(2) and Poisson(4).
Figure 15.2: Probabilities p(x) = P(X = x), for x = 0, 1, ..., 10, of the Poisson(2) (λ = 2) and Poisson(4) (λ = 4) distributions.
(b) We have:
$$\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) = 1 - \big(P(X = 0) + P(X = 1) + P(X = 2)\big) \\
&= 1 - P(X = 0) - P(X = 1) - P(X = 2) \\
&= 1 - \frac{e^{-1.6}(1.6)^0}{0!} - \frac{e^{-1.6}(1.6)^1}{1!} - \frac{e^{-1.6}(1.6)^2}{2!} \\
&= 1 - 0.2019 - 0.3230 - 0.2584 \\
&= 0.2167.
\end{aligned}$$
$$P(Y \le 1) = P(Y = 0) + P(Y = 1) = \frac{e^{-8}\,8^0}{0!} + \frac{e^{-8}\,8^1}{1!} = 0.000335 + 0.002684 = 0.003019.$$
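Both Poisson results above can be reproduced with a few lines. The sketch uses the standard Poisson probability function. It carries full precision, so it gives 0.2166 for P(X > 2). The 0.2167 in the text comes from rounding the three terms to four decimal places before subtracting.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

# X ~ Poisson(1.6): P(X > 2) = 1 - P(X <= 2)
p_x_gt_2 = 1 - sum(poisson_pmf(x, 1.6) for x in range(3))
# Y ~ Poisson(8): P(Y <= 1)
p_y_le_1 = sum(poisson_pmf(y, 8) for y in range(2))
print(round(p_x_gt_2, 4), round(p_y_le_1, 6))   # 0.2166 0.003019
```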
Activity 15.6 Hits on a website arrive at the rate of 12 per hour. Briefly discuss
whether or not you believe the assumptions underlying the Poisson distribution
hold. Assuming the assumptions are valid, calculate the probabilities that:
(a) there are exactly three hits between 10:00 and 10:30
(c) there are more than two hits between 16:40 and 16:45.
and ‘e’ key (for the Poisson), which will not appear on a basic calculator. Note that any
probability calculations which are required in the examination will be possible on a
basic calculator. For example, if a Poisson probability required the numerical value of $e^{-3}$, then this would be provided in the examination question.
15.12 Summary
This unit has introduced the concept of a random variable and explained how there are
two types of random variable – discrete and continuous. Focusing on discrete random
variables, probability distributions were constructed to represent how likely the different
possible outcomes of a chance experiment were to occur. Important theoretical
properties of these probability distributions were also discussed, specifically the
expected value and variance.
Learning outcomes
At the end of this unit, you should be able to:
calculate the expected value and variance for discrete random variables
Exercises
Exercise 15.1
The probability function P (X = x) = 0.02x is defined for x = 8, 9, 10, 11 and 12. What
are the mean and variance of this probability distribution?
Exercise 15.2
Of all the candles produced by a company, 0.01% do not have wicks (the core piece of
string). A retailer buys 10,000 candles from the company.
(a) What is the probability that all the candles have wicks?
(b) What is the probability that at least one candle will not have a wick?
Exercise 15.3
If a large grass lawn contains an average of 1 weed per 600 cm², what will be the distribution of X, the number of weeds in an area of 400 cm²? Hence find P(X ≤ 1).
Exercise 15.4
A graduate applies for 10 jobs. She believes she has a constant probability 0.1 of
receiving a job offer in each case. Assume independence of job offers.
(a) Write down the distribution of the total number of job offers received. What are
the mean and variance of the distribution?
(c) The graduate is considering using the Poisson distribution to simplify the
calculation in (b). What advice would you give her?
(d) Discuss briefly whether you think the assumption of independence is realistic in
this context.
Exercise 15.5
The random variable X has a binomial distribution such that X ∼ Bin(4, 0.3). It has
the following probability distribution.
X=x 0 1 2 3 4
P (X = x) 0.2401 0.4116 0.2646 a b
Exercise 15.6
In a prize draw, the probabilities of winning various amounts of money are:
Exercise 15.7
For a random variable X, the formula:
$$\binom{14}{6} \times (0.3)^6 \times (0.7)^8$$
was used to compute P(X = 6). What is the standard deviation of this probability distribution?
Exercise 15.8
Explain briefly when it would be appropriate to use a:
16. Probability III – The normal distribution and sampling distributions
Overview
The normal distribution is introduced and probabilities calculated for this distribution
(which requires a transformation to the standard normal distribution). We then proceed
to consider the estimation of a population mean through the use of sampling. This gives
rise to a sampling distribution and its properties are discussed. We conclude the
probability section of the course with the powerful result known as the ‘central limit
theorem’.
Aims
This unit explores the normal distribution and how it relates to sampling distributions
and the central limit theorem. Particular aims are:
to work with the normal distribution
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Probability’ Chapter 3.
approximately normal due to the central limit theorem (CLT). Because of this,
the normal distribution has a crucial role in statistical inference. This will be
discussed later.
The probability density function (i.e. the formula for the distribution curve) of the normal distribution (which you do not need to remember!) is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \quad \text{for } -\infty < x < \infty \qquad (16.1)$$
where π is the mathematical constant (i.e. π = 3.14159...), and µ and σ² are parameters, with −∞ < µ < ∞ and σ² > 0.
A random variable X with this probability density function is said to have a normal
distribution with mean µ and variance σ 2 , denoted X ∼ N (µ, σ 2 ).
If X ∼ N (µ, σ 2 ), then:
E(X) = µ
and:
Var(X) = σ 2
hence the standard deviation is σ.
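As a numerical illustration of (16.1), the sketch integrates the density with a simple midpoint rule. It shows that the total area under the curve is 1, the mean is µ and the variance is σ². The parameters µ = 5 and σ² = 9 are illustrative values, not from the text. The integration range of µ ± 12σ is an assumption: the tails beyond it carry negligible probability. The helper names are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma2):
    """Density (16.1) of N(mu, sigma2)."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu, sigma2 = 5.0, 9.0   # illustrative parameters
lo, hi = mu - 12 * sqrt(sigma2), mu + 12 * sqrt(sigma2)

area = integrate(lambda x: normal_pdf(x, mu, sigma2), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma2), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma2), lo, hi)
print(round(area, 6), round(mean, 6), round(var, 6))   # 1.0 5.0 9.0
```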
The normal distribution is the so-called ‘bell curve’. The two parameters affect it as
follows.
The mean, µ, determines the location of the curve.
The variance, σ 2 , determines the dispersion (spread) of the curve.
For example, in Figure 16.1, N (0, 1) and N (5, 1) have the same dispersion but different
locations – the N (5, 1) curve is identical to the N (0, 1) curve, but shifted 5 units to the
right, while N (0, 1) and N (0, 9) have the same location but different dispersions – the
N (0, 9) curve is centred at the same value as the N (0, 1) curve, but spread out more
widely.
The mean can also be inferred from the observation that the normal distribution is symmetric about µ. This symmetry implies that the median of the normal distribution is also µ. The density reaches its maximum at µ, so the mean and median are also equal to the mode.
Probabilities are given by areas under the curve, which involves integrating equation (16.1). Unfortunately, such integrals cannot be evaluated in closed form, so instead we make use of statistical tables.¹ Specifically, we note the special transformation:
$$Z = \frac{X - \mu}{\sigma}.$$
The transformed variable Z is known as a standardised variable, or z-score. It can be shown (but is beyond the scope of this course) that the distribution of the z-score is N(0, 1), i.e. the normal distribution with mean µ = 0 and variance σ² = 1 (and, therefore, a standard deviation of σ = 1).
¹ In practice we could also use a computer, but not in the examination!
Figure 16.1: Normal density curves for N(0, 1), N(5, 1) and N(0, 9).
If X ∼ N(µ, σ²), then:
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$
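The standardisation result says that P(X ≤ x) = P(Z ≤ (x − µ)/σ). The sketch checks this with Python's `statistics.NormalDist` (Python 3.8 or later), using illustrative values µ = 5 and σ = 3. Note that `NormalDist` takes the standard deviation, not the variance. As the footnote says, this is a study aid only; in the examination you use the table in Appendix C.

```python
from statistics import NormalDist

mu, sigma = 5.0, 3.0        # illustrative: N(5, 9) has standard deviation 3
X = NormalDist(mu, sigma)   # NormalDist takes the standard deviation
Z = NormalDist()            # standard normal N(0, 1)

for x in (-1.0, 5.0, 8.0, 11.0):   # illustrative values of x
    z = (x - mu) / sigma            # z-score
    print(x, z, round(X.cdf(x), 4), round(Z.cdf(z), 4))   # last two columns agree
```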
The cumulative probability, P (Z ≤ z), is often denoted by Φ(z) and values for various
‘z’ are given in Appendix C.
In the examination, you will have a copy of the table in Appendix C. The table shows
values of Φ(z) = P (Z ≤ z) for z ≥ 0. This can be used to calculate probabilities of any
intervals for any normal distribution – but how? The table seems to be incomplete.
1. It is only for N (0, 1), not for N (µ, σ 2 ) for any other µ and σ 2 .
2. Even for N (0, 1), it only shows probabilities for z ≥ 0.
We now show how these are not really limitations, starting with ‘2.’, i.e. how to work
out cumulative standard normal probabilities for negative z-values.
The key to using the table is that the standard normal distribution is symmetric about
zero. This means that for an interval in one tail, its ‘mirror image’ in the other tail has
the same probability.
Suppose that z ≥ 0, so that −z ≤ 0. The table in Appendix C shows:
P (Z ≤ z) = Φ(z).
In the continuous world, the probability of any single point value is zero. Since P(Z = z) = 0 for all z, it makes no difference whether we use ≤ or <. Similarly, it makes no difference whether we use ≥ or >. So P(Z ≤ z) = P(Z < z) and P(Z ≥ z) = P(Z > z). This is because:
$$P(Z \le z) = P(Z < z) + P(Z = z) = P(Z < z) + 0 = P(Z < z).$$
Figure 16.2 shows equal tail probabilities for the standard normal distribution, i.e. it
shows that P (Z ≤ −z) = P (Z ≥ z).
Figure 16.2: Equal tail probabilities for the standard normal distribution showing that
P (Z ≤ −z) = P (Z ≥ z).
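More generally, for z₁ ≤ z₂ we have P(z₁ ≤ Z ≤ z₂) = Φ(z₂) − Φ(z₁), where Φ(z₂) and
Φ(z₁) are obtained using the tabulated values in Appendix C (together with the symmetry
result above whenever a value is negative).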
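As a rough check on the symmetry argument, the Python sketch below uses the standard library only. The names `phi`, `phi_via_symmetry` and `interval_prob` are hypothetical helpers. The sketch evaluates Φ for negative z using only values for z ≥ 0, mimicking a table that covers z ≥ 0 only. It then computes an interval probability as a difference of two cumulative probabilities. The interval limits −1.24 and 1.86 are simply those shown in Figure 16.4.

```python
import math


def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi_via_symmetry(z):
    """Phi(z) using values for z >= 0 only, as a table covering z >= 0 would.

    For negative z, symmetry gives P(Z <= z) = P(Z >= -z) = 1 - Phi(-z).
    """
    if z >= 0:
        return phi(z)
    return 1.0 - phi(-z)


def interval_prob(z1, z2):
    """P(z1 <= Z <= z2) = Phi(z2) - Phi(z1), assuming z1 <= z2."""
    return phi_via_symmetry(z2) - phi_via_symmetry(z1)


print(round(phi_via_symmetry(-1.0), 4))       # 0.1587
print(round(interval_prob(-1.24, 1.86), 4))
```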
Reality check: Remember that the standard normal distribution is symmetric about 0,
hence:
Φ(0) = P (Z ≤ 0) = 0.5.
So if you ever end up with results like P (Z ≤ −1) = 0.7 or P (Z ≤ 1) = 0.2 or
P (Z > 2) = 0.95, these must be wrong! Why? Well, P (Z ≤ −1) < P (Z ≤ 0) = 0.5, so
P (Z ≤ −1) cannot be 0.7. Similarly, 0.5 = P (Z ≤ 0) < P (Z ≤ 1), so P (Z ≤ 1) cannot
be 0.2. Finally, P (Z ≥ |z|) ≤ 0.5 for any z, so P (Z > 2) cannot be 0.95.
using the ‘1.2’ row and ‘0.00’ column which shows that:
[Figure 16.3: The standard normal density function, where P(Z > 1.20) is the area of the
shaded region.]
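For comparison with the table look-up, a minimal Python sketch of the same upper-tail calculation, P(Z > 1.20) = 1 − Φ(1.20), is given below. It assumes Φ is computed from `math.erf`, which is our choice and not the method used in this guide. It also applies the reality check above: an upper-tail probability beyond a positive z-value must lie below 0.5.

```python
import math


def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


upper_tail = 1.0 - phi(1.20)   # P(Z > 1.20) = 1 - Phi(1.20)

# Reality check: an upper-tail probability beyond a positive z must be below 0.5.
assert 0.0 < upper_tail < 0.5
print(round(upper_tail, 4))    # 0.1151
```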
Example 16.2 Turn to Appendix C. Look up the probability in the ‘0.8’ row and
‘0.04’ column of the table, which shows that:
[Figure 16.4: The standard normal density function, where the red area is P(0 ≤ Z ≤ 1.86)
and the blue area is P(−1.24 ≤ Z ≤ 0).]
(e) Find a value for z such that P (−z < Z < z) = 0.80.
Example 16.4 Let X denote the diastolic blood pressure of a randomly selected
person in England. This is approximately distributed as X ∼ N (74.2, 127.87). Note
that diastolic blood pressure can only be approximately normal, rather than exactly
normal, because normal random variables can take negative values and, clearly,
diastolic blood pressure cannot be negative. However, for practical purposes, we can
use the normal distribution to model diastolic blood pressure.
Suppose we want to know the probabilities of the following intervals:
X > 90 (high blood pressure)
These are calculated using the previous results, with µ = 74.2 and σ 2 = 127.87, and
hence σ = 11.31. So here:
$$\frac{X - 74.2}{11.31} = Z \sim N(0, 1)$$
and we can refer values of this standardised variable to the table in Appendix C. We
have:
$$P(X > 90) = P\left(\frac{X - 74.2}{11.31} > \frac{90 - 74.2}{11.31}\right) = P(Z > 1.40) = 1 - \Phi(1.40)$$
Also:
$$P(X < 60) = P\left(\frac{X - 74.2}{11.31} < \frac{60 - 74.2}{11.31}\right) = P(Z < -1.26) = P(Z > 1.26) = 1 - \Phi(1.26)$$
[Figure 16.5: Probabilities for Example 16.4 regarding diastolic blood pressure
(low: 0.10, mid: 0.82, high: 0.08).]
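The whole of Example 16.4 can be reproduced in a few lines. This Python sketch uses the standard library only and computes Φ via `math.erf`, which is our assumption rather than the guide's method. It finds the low, mid and high probabilities shown in Figure 16.5. It does not round σ or the z-values to two decimal places, as the table requires. Its answers may therefore differ from the table-based figures in the second decimal place.

```python
import math


def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, var = 74.2, 127.87            # X ~ N(74.2, 127.87)
sigma = math.sqrt(var)            # about 11.31

p_high = 1.0 - phi((90 - mu) / sigma)   # P(X > 90)
p_low = phi((60 - mu) / sigma)          # P(X < 60)
p_mid = 1.0 - p_high - p_low            # P(60 <= X <= 90), the remaining probability

print(round(sigma, 2))
print(round(p_low, 3), round(p_mid, 3), round(p_high, 3))
```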
Activity 16.2 The scores on a verbal reasoning test are modelled by a normal
distribution with a mean of µ = 100 and a standard deviation of σ = 10.
(a) What proportion of the scores will be greater than 95?
(c) What is the probability of an individual selected at random having a score less
than 70?
(e) What is the range of scores such that 5% of the scores are below the range and
5% of the scores are above the range?
P (µ − 1.96 × σ < X < µ + 1.96 × σ) = 0.950. In words, 95% of the total probability
is within 1.96 standard deviations of the mean.
P (µ − 2.58 × σ < X < µ + 2.58 × σ) = 0.99. In words, 99% of the total probability
is within 2.58 standard deviations of the mean.
[Figure 16.6: Some probabilities around the mean. The shaded area shows that 68.3% of
the total probability is within 1 standard deviation of the mean. The shaded and hatched
areas combined show that 95% of the total probability is within 1.96 standard deviations
of the mean.]
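The probabilities quoted for 1, 1.96 and 2.58 standard deviations can be checked numerically. Below is a minimal Python sketch, assuming Φ is evaluated through `math.erf`. For any normal X, P(µ − kσ < X < µ + kσ) = P(−k < Z < k), so the result does not depend on µ or σ.

```python
import math


def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, i.e. P(-k < Z < k)."""
    return phi(k) - phi(-k)


for k in (1.0, 1.96, 2.58):
    print(k, round(within_k_sd(k), 3))
```

Rounded to three decimal places, the true values are 0.683, 0.950 and 0.990 for k = 1, 1.96 and 2.58, respectively.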
random selection mechanism, we do not know (in advance) which sample will occur.
Every population element has a known, non-zero probability of selection in the sample,
but no element is certain to appear.
Consider a population of N = 6 elements: A, B, C, D, E and F. We consider all
possible simple random samples of size n = 2 (without replacement, i.e. once an object
has been chosen it cannot be selected again). There are (6 × 5)/2 = 15 different, but
equally likely, such samples, so each sample has the same probability of selection,
i.e. 1/15. The
possible samples are:
AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF.
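The samples above can be generated mechanically. The short Python sketch below uses the standard library's `itertools.combinations`, which returns unordered selections without replacement. It lists the samples in the same order and checks the count against `math.comb(6, 2)`.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E", "F"]

# Every unordered sample of size n = 2 drawn without replacement.
samples = ["".join(pair) for pair in combinations(population, 2)]

print(len(samples), comb(6, 2))   # 15 15
print(", ".join(samples))         # the 15 samples listed in the text
```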
16.2.1 Estimation
A population has particular characteristics of interest such as the mean, µ, and
variance, σ 2 . Collectively we refer to these characteristics as parameters. If we do not
have population data, the parameter values will be unknown.
‘Statistical inference’ is the process of estimating the (unknown) parameter values using
the (known) sample data. We use a statistic (called an ‘estimator’) calculated from
sample observations to provide a ‘point estimate’ of a parameter.
Returning to our example, recall there are 15 different samples of size 2 from a
population of size 6. Suppose the variable of interest is monthly income, such that:
Individual A B C D E F
Income (in £000s) 3 6 4 9 7 7
We use the sample mean, X̄, as our estimator of the population mean, µ, where for
a random sample of size n we define:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}.$$
For example, if the observed sample was AB, the sample mean is (3000 + 6000)/2 =
£4,500.
Clearly, different observed random samples will lead to different sample means.
Consider the values of X̄, i.e. x̄, for all possible samples (in £000s):
So the values of X̄ vary from 3.5 to 8, depending on the sample values. Since we have
the population data here we can actually compute the population mean, µ (in £000s),
which is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6.$$
So, even with simple random sampling, we obtain some x̄ values far from µ. Here, in
fact, only one sample (AD) results in x̄ = µ.
Let us now consider the maximum possible absolute deviations of the sample mean
from the population mean, i.e. the distance |x̄ − µ|.
So, for example, there is an 80% probability of being within 1.5 units of µ (in £000s).
We now represent this as a frequency distribution. That is, we record the frequency
of each possible value of x̄.
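The calculations in this example can all be reproduced by brute force: the range of x̄, the single sample with x̄ = µ, the 80% figure and the frequency distribution. Below is a minimal Python sketch, assuming the incomes in the table above. It uses `fractions.Fraction` so that the arithmetic is exact. The dictionary and variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}    # monthly income, in £000s
mu = Fraction(sum(income.values()), len(income))              # population mean

# Sample mean x-bar for each of the 15 equally likely samples of size 2.
xbar = {a + b: Fraction(income[a] + income[b], 2) for a, b in combinations(income, 2)}

# Number of samples whose mean lies within 1.5 (i.e. £1,500) of mu.
within = sum(1 for m in xbar.values() if abs(m - mu) <= Fraction(3, 2))

# Frequency distribution of x-bar: each possible value and how many samples give it.
freq = Counter(xbar.values())

print(mu, min(xbar.values()), max(xbar.values()))   # 6 7/2 8
print(Fraction(within, len(xbar)))                   # 4/5
for value in sorted(freq):
    print(float(value), freq[value])
```

Each of the 15 samples has probability 1/15. Dividing each frequency by 15 therefore gives the sampling distribution of X̄.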
This is a good illustration of a population parameter (µ) being estimated by its sample
counterpart (X̄).
The unbiasedness of an estimator is clearly desirable. However, we also need to take into
account the dispersion of the estimator’s sampling distribution. Ideally, the possible
values of the estimator should not vary much around the true parameter value. So we
seek an estimator with a small variance. Recall the variance is defined to be the mean
of the squared deviations about the mean of the distribution. In the case of sampling
distributions, it is referred to as the sampling variance.
Returning to our example:
$$\operatorname{Var}(\bar{X}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n}.$$
$$\operatorname{Var}(\bar{X}) = \frac{6 - 2}{6 - 1} \times \frac{4}{2} = 1.6$$
as we saw above.
We use the term standard error to refer to the standard deviation of the sampling
distribution, so:
$$\text{S.E.}(\bar{X}) = \sqrt{\operatorname{Var}(\bar{X})} = \sqrt{\frac{N - n}{N - 1} \times \frac{\sigma^2}{n}} = \sigma_{\bar{X}}.$$
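The Python sketch below checks that the formula agrees with direct enumeration. It uses the standard library only and is an illustration rather than a prescribed method. It computes the sampling variance both ways for the income example with N = 6 and n = 2, then takes the square root to obtain the standard error. As above, it assumes that σ² is the population variance with divisor N, which equals 4 here.

```python
import math
from fractions import Fraction
from itertools import combinations

income = [3, 6, 4, 9, 7, 7]                       # A to F, in £000s
N, n = len(income), 2
mu = Fraction(sum(income), N)
sigma2 = sum((x - mu) ** 2 for x in income) / N   # population variance, divisor N

# Brute force: mean squared deviation of x-bar about mu over all 15 equally
# likely samples (E(X-bar) = mu, so mu is the right centre).
means = [Fraction(a + b, n) for a, b in combinations(income, n)]
var_enum = sum((m - mu) ** 2 for m in means) / len(means)

# The same quantity from the formula with the factor (N - n)/(N - 1).
var_formula = Fraction(N - n, N - 1) * sigma2 / n

se = math.sqrt(float(var_formula))                # standard error of X-bar

print(sigma2, var_enum, var_formula)              # 4 8/5 8/5
print(round(se, 4))                               # 1.2649
```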
Some implications are the following.
As the sample size, n, increases, the sampling variance decreases, i.e. the precision
increases.4
When the population size N is much larger than the sample size n:
$$\frac{N - n}{N - 1} \approx 1.$$
Returning to our example, the larger the sample, the less variability there will be
between samples.
4
Although greater precision is desirable, data collection costs will rise with n – remember why we
sample in the first place!
We see that there is a striking improvement in the precision of the estimator, because
the variability has decreased considerably. The range of possible x̄ values goes from 3.5
to 8.0 down to 5.0 to 7.25. The sampling variance is reduced from 1.6 to 0.4.
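The improvement from n = 2 to n = 4 can be checked by the same brute-force approach. Below is a minimal Python sketch, again assuming the six incomes from the table. It enumerates all samples of each size and reports the smallest and largest possible x̄, together with the sampling variance.

```python
from fractions import Fraction
from itertools import combinations

income = [3, 6, 4, 9, 7, 7]                       # A to F, in £000s
N = len(income)
mu = Fraction(sum(income), N)
sigma2 = sum((x - mu) ** 2 for x in income) / N   # population variance, divisor N

results = {}
for n in (2, 4):
    means = [Fraction(sum(s), n) for s in combinations(income, n)]
    var = sum((m - mu) ** 2 for m in means) / len(means)
    results[n] = (min(means), max(means), var)
    # Range of possible x-bar values and the sampling variance for this n.
    print(n, float(min(means)), float(max(means)), float(var))
```

For n = 4 it reports a range of 5.0 to 7.25 and a sampling variance of 0.4. This matches (N − n)/(N − 1) × σ²/n = (2/5) × (4/4) = 0.4.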
The factor (N − n)/(N − 1) decreases steadily as n → N. When n = 1 the factor equals
1, and when n = N it equals 0. When sampling without replacement, increasing n must
increase precision, since less of the population is left out. In most practical sampling N
is very large (for example, several million), while n is comparatively small (at most
1,000, say). Therefore, in such cases the factor (N − n)/(N − 1) is close to 1, hence:
$$\operatorname{Var}(\bar{X}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n} = \frac{\operatorname{Var}(X)}{n}$$
for small n/N . When N is large, it is the sample size n which is important in
determining precision, not the sampling fraction. Consider two populations: N1 = 3
million and N2 = 200 million, both with the same variance σ 2 . If we sample
n1 = n2 = 1000 from each population then:
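$$\sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \times \frac{\sigma^2}{n_1} = 0.999667 \times \frac{\sigma^2}{1000}$$
and:
$$\sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \times \frac{\sigma^2}{n_2} = 0.999995 \times \frac{\sigma^2}{1000}.$$
So:
$$\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$$
despite N1 being much less than N2.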
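The two correction factors can be computed directly. In this short Python sketch, `fpc` is a hypothetical helper name for the factor (N − n)/(N − 1).

```python
def fpc(N, n):
    """Finite population correction factor (N - n)/(N - 1)."""
    return (N - n) / (N - 1)


n = 1000
for N in (3_000_000, 200_000_000):
    print(N, round(fpc(N, n), 6))

# Extreme cases mentioned in the text: n = 1 gives 1 and n = N gives 0.
print(fpc(6, 1), fpc(6, 6))   # 1.0 0.0
```

Rounded to six decimal places, it returns 0.999667 and 0.999995. Both sampling variances are therefore essentially σ²/1000.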
[Figure panels: n = 5, n = 20 and n = 100.]
which has mean E(Xi ) = µ and variance Var(Xi ) = σ 2 , for i = 1, . . . , n. If X̄n denotes
the sample mean calculated from a random sample of size n, then:
$$\lim_{n \to \infty} P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le z\right) = \Phi(z)$$
for any z, where Φ(z) denotes P(Z ≤ z) with Z ∼ N(0, 1).
The ‘lim’ (as n → ∞) indicates that this is an asymptotic result, i.e. one which holds
increasingly well as n increases, and exactly in the limit as the sample size tends to
infinity.
In less formal language, the CLT says that, for a random sample of size n from (nearly)
any non-normal distribution with mean µ and variance σ², when n is large:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{approximately}.$$
‘Nearly’ because the CLT requires that the variance of the population distribution is
finite. If it is not, the CLT does not hold, but such distributions are not common.
It may appear that the CLT is still somewhat limited, in that it applies only to sample
means calculated from simple random samples. However, this is not really true, for two
main reasons.
There are more general versions of the CLT which do not require the observations
to be from such samples.
Even the basic version applies very widely when we realise that the ‘X’ can also be
a function of the original variables in the data. For example, if X and Y are
variables in the sample, we can also apply the CLT to:
$$\frac{\sum_{i=1}^{n} \log(X_i)}{n} \quad \text{or} \quad \frac{\sum_{i=1}^{n} X_i Y_i}{n}.$$
Therefore, the CLT can be used to derive sampling distributions for many statistics
which do not initially look at all like X̄ for a single variable in a random sample.
How large is ‘large n’? The larger the sample size n, the better is the normal
approximation provided by the CLT. In practice, we have various rules-of-thumb for
what is ‘large enough’ for the approximation to be ‘accurate enough’. This also depends
on the population distribution – for example:
for symmetric distributions, even small n is enough
for very skewed distributions, larger n is required.
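A small simulation can make the CLT concrete. The Python sketch below is an illustration under our own assumptions; it is not taken from the guide. The population is the exponential distribution with mean 1 (so µ = 1 and σ² = 1), chosen because it is strongly skewed. The sample size n = 30 and the number of replications are arbitrary. If the normal approximation is good, about 95% of the simulated sample means should lie within 1.96 × σ/√n of µ.

```python
import random
import statistics


def simulate_means(n, reps, rng):
    """Means of `reps` random samples of size n from the Exponential(1) distribution."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(1)      # fixed seed so the run is reproducible
n, reps = 30, 5000
mu, sigma2 = 1.0, 1.0       # mean and variance of Exponential(1)

means = simulate_means(n, reps, rng)
se = (sigma2 / n) ** 0.5    # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of mu.
coverage = sum(abs(m - mu) < 1.96 * se for m in means) / reps

print(round(statistics.fmean(means), 3))      # should be close to mu = 1
print(round(statistics.variance(means), 4))   # should be close to sigma^2/n, about 0.0333
print(round(coverage, 3))                     # should be close to 0.95
```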
Example 16.5 In the first example, random samples (not shown here) of sizes:
[Figure panels: n = 1, n = 5, n = 10, n = 30, n = 100 and n = 1000.]
Example 16.6 In the second example, 10,000 random samples (again, not shown
here) of sizes:
n = 1, 10, 30, 50, 100 and 1000
were simulated from the Bernoulli(0.2) distribution (for which µ = 0.2 and also
σ 2 = 0.2 × (1 − 0.2) = 0.16).
Here the distribution itself is not even continuous, and has only two possible values,
0 and 1. Nevertheless, the sampling distribution of X̄ can be well-approximated by
the normal distribution, when n is large enough, as shown in Figure 16.9.
Note that since here Xi = 1 or Xi = 0 for all i = 1, . . . , n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{m}{n}$$
where m is the number of observations for which Xi = 1. In other words, X̄ is the
sample proportion of the value X = 1.
The normal approximation is clearly very bad for small n, but reasonably good
already for n = 50.
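A scaled-down version of Example 16.6 can be simulated with Python's `random` module. This sketch makes its own assumptions: it uses 1,000 rather than 10,000 samples for each n, and a fixed seed. It shows the simulated mean of X̄ settling near µ = 0.2 and its variance tracking σ²/n = 0.16/n.

```python
import random
import statistics

p = 0.2                       # Bernoulli(0.2): mu = 0.2, sigma^2 = 0.16
rng = random.Random(2024)     # fixed seed so the simulation is reproducible
reps = 1000                   # number of simulated samples for each n


def sample_proportion(n):
    """Sample mean of n Bernoulli(p) draws, i.e. m/n where m is the number of 1s."""
    m = sum(1 for _ in range(n) if rng.random() < p)
    return m / n


results = {}
for n in (1, 10, 50, 1000):
    props = [sample_proportion(n) for _ in range(reps)]
    results[n] = (statistics.fmean(props), statistics.variance(props))
    # Compare the simulated mean and variance of X-bar with mu = 0.2 and sigma^2/n.
    print(n, round(results[n][0], 3), round(results[n][1], 5), round(p * (1 - p) / n, 5))
```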
[Figure 16.9 panels: n = 1, n = 10, n = 30, n = 50, n = 100 and n = 1000.]
A B C D
3 6 9 12
(b) Write down the sampling distribution of the sample mean for samples of size
n = 2.
(c) Using the result in (b), calculate the mean of the sampling distribution.
(d) Using the result in (b), calculate the variance of the sampling distribution.
(e) Use the formula for the variance of the sample mean to verify the relationship
between the value from (d) and the population variance.
16.4 Summary
The final probability unit covered the key points relating to the normal distribution. We
saw how to calculate probabilities for this distribution by considering areas under its
curve. We then proceeded to explain the concept of a sampling distribution and its
importance when estimating an unknown parameter such as a population mean when
sampling from a normal population and, by way of the central limit theorem, when
sampling from non-normal populations.
Learning outcomes
At the end of this unit, you should be able to:
compute areas under the curve for a normal distribution
Exercises
Exercise 16.1
The random variable X has a normal distribution with mean µ and variance σ 2 , i.e.
X ∼ N (µ, σ 2 ). It is known that:
Exercise 16.2
A random variable takes the values 1, 2 and 3, each with equal probability. List all
possible samples of size two which may be chosen, without replacement, from this
population and hence construct the sampling distribution of the sample mean, X̄.
Exercise 16.3
The weights of a large group of animals have a mean of 8.2 kg and a standard deviation
of 2.2 kg. What is the probability that a random selection of 80 animals from the group
will have mean weight between 8.3 kg and 8.4 kg? State any assumptions you make.
Exercise 16.4
A perfectly-machined regular tetrahedral (pyramid-shaped) die has four faces labelled 1
to 4. It is tossed twice onto a level surface and after each toss the number on the face
which is downward is recorded. If the recorded values are x1 and x2 and the mean is
x̄ = (x1 + x2 )/2, describe the distribution of x̄ as a random quantity over repeated
double tosses.
Exercise 16.5
A normal distribution has a mean of 40. If 10% of the distribution falls between the
values of 50 and 60, what is the standard deviation of the distribution?
Exercise 16.6
Consider the following set of data. Does it appear to approximately follow a normal
distribution? Justify your answer.
45 31 37 55 54 56
48 54 52 55 52 51
49 46 62 38 45 48
47 46 40 61 50 58
46 35 36 59 50 48
39 48 51 52 43 45
Exercise 16.7
Discuss the differences or similarities between a sampling distribution of size 5 and a
single (simple) random sample of size 5.
Exercise 16.8
The distribution of salaries of lecturers in a university is positively skewed, with most
lecturers earning near the minimum of the pay scale. What would a sampling
distribution of size 2 look like? How about size 5? How about size 50?
Exercise 16.9
In no more than 200 words, explain the term ‘central limit theorem’.
17. Sampling and experimentation I – Sampling techniques and contact methods
Overview
Statistics concerns data analysis, but to do any analysis first we need data! This unit
explores various methods which social scientists can use to gather data. Central to this
is the concept of sampling – the (possibly random) selection of a sample of members
from an underlying population. From our sample we can then make inferences about
the population. We begin by describing a range of sampling techniques, outlining their
relative advantages and disadvantages, and then consider the possible contact methods
which might be used.
Aims
This unit presents random and non-random sampling techniques and survey contact
methods. Particular aims are:
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Statistics’ Chapter 8.
17.1 Sampling
Sampling is a key component of any research design. The key to the use of statistics in
research is being able to take data from a sample and make inferences about a large
population. This idea is depicted in Figure 17.1.
Sampling design involves several basic questions.
Should a sample be taken?
Figure 17.1: A depiction of inferring population characteristics from a sample drawn from
the population of interest.
Sample or census?
The length of time available for the study is important – a sample is far quicker to
perform.
Sampling errors occur when the sample fails to adequately represent the population.
If the consequences of making sampling errors are extreme (i.e. the ‘cost’ is high),
then a census would appeal more since it eliminates sampling errors completely.
Measuring sampled elements may result in the destruction of the object, such as
testing the lifespan of a tyre. Clearly, in such cases a census is not feasible as there
would be no tyres left to sell!
The conditions which favour the use of a sample or census are summarised in Table
17.1. Of course, in practice, some of our factors may favour a sample while others favour
a census, in which case a balanced judgement is required.
Activity 17.1 Under what conditions would (a) a sample be preferable to a census,
and (b) a census be preferable to a sample?
We draw a sample from the target population, which is the collection of elements or
objects which possess the information sought by the researcher and about which
inferences are to be made. We now consider the different types of sampling techniques
which can be used in practice. These can be divided into ‘non-probability sampling
techniques’ and ‘probability sampling techniques’.
Non-probability sampling techniques are characterised by the fact that some units in
the population do not have a chance of selection in the sample. Individual units in the
population have an unknown probability of being selected. There is also an inability to
measure sampling error. Examples of such techniques are:
convenience sampling
judgemental sampling
quota sampling
snowball sampling.
Probability sampling techniques mean every population element has a known, non-zero
probability of being selected in the sample. Probability sampling makes it possible to
estimate the margins of sampling error, and hence all statistical techniques (such as
confidence intervals and hypothesis tests – not considered in this course) can be applied.
In order to perform probability sampling, we need a sampling frame which is a list of
all population elements. However, we need to consider whether the sampling frame is (i)
adequate (does it represent the target population?), (ii) complete (are there any missing
units, or duplications?), (iii) accurate (are we researching dynamic populations?), and
(iv) convenient (is the sampling frame readily accessible?). Examples of such techniques
are:
simple random sampling
systematic sampling
stratified sampling
cluster sampling
multistage sampling.
We now consider each of the listed techniques, explaining their strengths and
weaknesses. To illustrate each, we will use the example of 25 students (labelled ‘1’ to
‘25’) who happen to be in a particular class (labelled ‘A’ to ‘E’) as follows:
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
place at the right time. Examples include using students and members of social
organisations, as well as ‘people-in-the-street’ interviews.
Suppose class D happens to assemble at a convenient time and place, so all elements
(students) in this class are selected. The resulting sample consists of students 16, 17, 18,
19 and 20. Note that no students are selected from classes A, B, C and E.
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Strengths of convenience sampling include being the cheapest, quickest and most
convenient form of sampling. Weaknesses include selection bias (discussed later) and the
lack of a representative sample.
Judgemental sampling
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Quota sampling
sample composition to reflect this. See Table 17.2 assuming a required sample size of
1,000 which means 48% of the sample (480) should be male and 52% of the sample
(520) should be female.
Suppose a quota of one student from each class is imposed. Within each class, one
student is selected based on convenience or judgement. The resulting sample consists of
students 3, 6, 13, 20 and 22.
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Snowball sampling
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Snowball sampling has the major advantage of being able to increase the chance of
locating the desired characteristic in the population and is also fairly cheap. However, it
can be time-consuming.
In a simple random sample each element in the population has a known and equal
probability of selection. Each possible sample of a given size, n, has a known and equal
probability of being the sample which is actually selected. This implies that every
element is selected independently of every other element.
Suppose we select five random numbers (using a ‘random number generator’) from 1 to
25. Suppose the random number generator returns 3, 7, 9, 16 and 24. Therefore, the
resulting sample consists of students 3, 7, 9, 16 and 24. Note there is no student from
class C.
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
SRS is simple to understand and results are readily projectable. However, there may be
difficulty constructing the sampling frame, lower precision (relative to other probability
sampling methods) and there is no guarantee of a representative sample.
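A simple random sample of the 25 students can be drawn with the standard library's `random.sample`. It selects without replacement, so every subset of size 5 is equally likely. The sketch below is illustrative only. The seed is left unset, so each run gives a different sample. As in the text, some runs will leave one or more classes unrepresented.

```python
import random

students = list(range(1, 26))                                        # students 1 to 25
classes = {c: students[5 * i:5 * i + 5] for i, c in enumerate("ABCDE")}

rng = random.Random()            # unseeded; use random.Random(seed) for a repeatable draw
sample = sorted(rng.sample(students, 5))   # SRS of size 5, without replacement

print(sample)
# Which selected students fall in each class (some classes may be empty).
print({c: [s for s in sample if s in members] for c, members in classes.items()})
```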
Systematic sampling
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Stratified sampling
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Stratified sampling includes all the important subpopulations and ensures a high level
of precision. However, sometimes it might be difficult to select relevant stratification
factors and the stratification process itself might not be feasible in practice if it was not
known to which stratum each population element belonged.
1
‘Strata’ is the plural of ‘stratum’.
Cluster sampling
In cluster sampling, the target population is first divided into mutually exclusive and
collectively exhaustive subpopulations known as clusters. A random sample of clusters is
then selected, based on a probability sampling technique such as SRS. For each selected
cluster, either all the elements are included in the sample (one-stage cluster sampling),
or a sample of elements is drawn probabilistically (two-stage cluster sampling).
Elements within a cluster should be as heterogeneous as possible, but clusters
themselves should be as homogeneous as possible. Ideally, each cluster should be a
small-scale representation of the population. In ‘probability proportionate to size
sampling’ the clusters are sampled with probability proportional to size. In the second
stage, the probability of selecting a sampling unit in a selected cluster varies inversely
with the size of the cluster.
Suppose we randomly select three clusters: B, D and E. Within each cluster, randomly
select one or two elements. The resulting sample here consists of students 7, 18, 20, 21
and 23. Note that no students are selected from clusters A and C.
A B C D E
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Cluster sampling is easy to implement and cost effective. However, the technique suffers
from a lack of precision and it can be difficult to compute and interpret results.
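The difference between one-stage and two-stage cluster sampling can be sketched in Python as follows, treating each class as a cluster. The function names are hypothetical. Clusters are selected by SRS with equal probability, which is reasonable here because every class has five students. The two-stage version takes a fixed number m of students from each selected cluster, whereas the text's example takes one or two.

```python
import random

# Classes A to E treated as clusters of five students each.
classes = {c: list(range(5 * i + 1, 5 * i + 6)) for i, c in enumerate("ABCDE")}


def one_stage_cluster(clusters, k, rng):
    """SRS of k clusters; every element of each selected cluster is included."""
    chosen = rng.sample(sorted(clusters), k)
    return sorted(s for c in chosen for s in clusters[c])


def two_stage_cluster(clusters, k, m, rng):
    """SRS of k clusters, then an SRS of m elements within each selected cluster."""
    chosen = rng.sample(sorted(clusters), k)
    return sorted(s for c in chosen for s in rng.sample(clusters[c], m))


rng = random.Random(7)     # fixed seed for a reproducible illustration
print(one_stage_cluster(classes, 1, rng))
print(two_stage_cluster(classes, 3, 2, rng))
```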
Multistage sampling
telephone surveys depend very much on whether your target population is on the
telephone (and how good the telephone system is)
We now explore some2 of the advantages and disadvantages of various contact methods.
Face-to-face interviews
• Advantages: good for personal questions; allow for probing issues in greater
depth; permit difficult concepts to be explained; can show samples (such as
new product designs).
• Disadvantages: (very) expensive; not always easy to obtain detailed
information on the spot.
Telephone interviews
• Advantages: easy to achieve a large number of interviews; easy to check on the
quality of interviewers (through a central switchboard perhaps).
• Disadvantages: not everyone has a telephone so the sample can be biased;
cannot usually show samples; although telephone directories exist for landline
numbers, what about mobile numbers? Also, young people are more likely to
use mobiles rather than landlines, so they are more likely to be excluded.
Self-completion interviews
• Advantages: most people can be contacted this way (there will be little
non-response due to not-at-home reasons); allow time for people to look up
details such as income, tax returns etc.
• Disadvantages: high non-response rate – it requires effort to complete the
questionnaire; answers to some questions may influence answers to earlier
questions since the whole questionnaire is revealed to the respondent – this is
important where the order of a questionnaire matters; you have no control over
who answers the questionnaire.
2
This is not necessarily an exhaustive list. Can you add any more?
Example 17.1 The following are examples of occasions when you might use a
particular contact method.
17.2 Summary
This unit has described the different sampling techniques which exist when sampling
from a population. It is important to know the merits and limitations of each so that a
recommendation for the most suitable choice of method can be made dependent on the
circumstances of the research problem. Attention has also been given to the choice of
contact method, again with a focus on the strengths and weaknesses of each type.
Learning outcomes
At the end of this unit, you should be able to:
Exercises
Exercise 17.1
What are the main potential disadvantages of quota sampling with respect to
probability sampling?
Exercise 17.2
Why might disproportionate stratified sampling be preferable to proportionate stratified
sampling?
Exercise 17.3
In no more than 200 words, discuss the relative advantages and disadvantages of
telephone interviewing compared to face-to-face interviewing.
Exercise 17.4
The simplest probability-based sampling method is simple random sampling. Give two
reasons why it may be desirable to use a sampling design which is more sophisticated
than simple random sampling.
Exercise 17.5
What is the difference between one-stage cluster sampling and two-stage cluster
sampling?
Exercise 17.6
A corporation wants to estimate the total number of worker-hours lost for a given
month because of accidents among its employees. Each employee is classified into one of
three categories – labourer, technician and administrator. Which sampling method do
you think would be preferable here – simple random sampling, stratified sampling, or
cluster sampling? Give arguments to explain your choice.
Exercise 17.7
What criteria would you use in deciding which contact method to use in a survey of
individuals?
Exercise 17.8
Discuss the feasibility of each of the types of survey contact methods (personal
interview, postal survey, email and telephone survey) for a random sample of university
students about their undergraduate experiences and attitudes at the end of the
academic year.
Exercise 17.9
Retirement and Investment Services would like to conduct a survey on online users’
demands for additional internet retirement services. Outline your suggested sampling
and contact method and explain how the results might be affected by your methodology.
18. Sampling and experimentation II – Bias and the design of experiments
Overview
This unit explores potential sources of bias which may occur as a result of sampling.
Bias comes in various forms and potential remedies are presented. We conclude with a
look at the design of experiments in the social sciences. Unlike observational studies,
experiments are excellent for establishing causality through use of a control group.
Aims
This unit presents sources of bias and the design of experiments. Particular aims are:
to introduce the notion of causality and how properly designed experiments can
test for this.
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Statistics’ Chapter 8.
18.1 Introduction
We have previously seen that the term target population represents the collection of
units (people, objects etc.) in which we are interested. In the absence of time and
budget constraints we conduct a census, that is a total enumeration of the population.
Its advantage is that there is no sampling error because all population units are
observed and so there is no estimation of population parameters. Due to the large size,
N , of most populations, an obvious disadvantage with a census is cost, so it is often not
feasible in practice. However, even with a census non-sampling errors may occur, for
example if we have to resort to using cheaper (hence less reliable) interviewers who may
erroneously record data, misunderstand a respondent etc.
So we select a sample, that is a certain number of population members are selected and
studied. The selected members are known as elementary sampling units. Sample surveys
(hereafter ‘surveys’) are how new data are collected on a population and tend to be
based on samples rather than a census. Selected respondents may be contacted by a
variety of methods, such as face-to-face interviews, telephone, and mail or email
questionnaires.
Sampling error will occur (since not all population units are observed). However,
non-sampling errors should be fewer since resources can be used to ensure high quality
interviewers or to check completed questionnaires.
Both kinds of error can be controlled or allowed for more effectively by a pilot survey.
A pilot survey is used:
to find the standard error which can be attached to different kinds of questions and
hence to underpin the sampling design chosen
Activity 18.1 Give the main problem with the wording of the survey question: ‘Do
you want to be rich and famous?’.
18.3 Bias
Bias caused by non-response and by response error is worth a special mention. It can
cause problems at every stage of a survey, whether the sampling is random or non-random,
and however the survey is administered. The first problem can be in the sampling frame.
Is an obvious group missing? For example:
if the list is of householders, those who have just moved in will be missing
if the list is of those aged 18 or over on the electoral register, and the under-20s are
careless about registration, then younger people will be missing from the sample.
In the field, non-response (data not provided by a unit we wish to sample) is one of
the major problems of sample surveys as the non-respondents, in general, cannot be
treated like the rest of the population. As such, it is most important to try to get a
picture of any shared characteristics in those refusing to answer or people who are not
available at the time of the interview. We can classify non-response into:
item non-response, which occurs when a sampled member fails to respond to a
question in the questionnaire.
lost schedules due to information being lost or destroyed after it had been
collected.
How should we deal with non-response? Well, note that increasing the sample size will
not solve the problem – the only outcome would be that we have more data on the
types of individuals who are willing to respond! Instead, we might look at improving
our survey procedures such as data collection and interviewer training. Non-respondents
could be followed up using callbacks or an alternative contact method to the original
survey in an attempt to subsample the non-respondents. A proxy interview (where a
unit from your sample is substituted with an available unit) may be another possibility.
(Note that non-response also occurs in quota sampling but is not generally recorded –
see the earlier discussion.) Another obvious remedy is to provide an incentive (for
example cash or entry into a prize draw) to complete the survey – this exploits the
notion that human behaviour can be influenced by the right incentives!
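The claim that a larger sample does not cure non-response can be checked with a small simulation. This is a minimal sketch under invented assumptions: half of the population would answer ‘yes’, ‘yes’ people respond with probability 0.8 and ‘no’ people with probability 0.4. The function name `respondent_share` is hypothetical.

```python
import random

def respondent_share(n, p_yes=0.5, resp_yes=0.8, resp_no=0.4, seed=1):
    """Survey n sampled units when willingness to respond depends on the answer.
    Returns the proportion answering 'yes' among those who responded."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    yes_count = 0
    respondents = 0
    for _ in range(n):
        is_yes = rng.random() < p_yes      # the unit's true answer
        p_respond = resp_yes if is_yes else resp_no
        if rng.random() < p_respond:       # does this unit respond?
            respondents += 1
            yes_count += is_yes            # True counts as 1
    return yes_count / respondents

# The true proportion is 0.5, but respondents over-represent 'yes' at every n
for n in (100, 10_000, 200_000):
    print(n, round(respondent_share(n), 3))
```

As n grows the estimate settles down, but around 0.4/(0.4 + 0.2) ≈ 0.667 rather than the true 0.5: the extra data only tell us more about the people who were willing to respond.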
Response error is very problematic because it is not so easy to detect. A seemingly
clear reply may be based on a misunderstanding of the question asked or a wish to
deceive. A good example from the UK is the reply to the question about the
consumption of alcohol in the Family Expenditure Survey. Over the years there has been
up to a 50% understatement of alcohol consumption compared with the overall known
figures for sales from HM Revenue & Customs!
Sources of response error include the:
role of the interviewer due to the characteristics and/or opinions of the
interviewer, asking leading questions and the incorrect recording of responses
Control of response errors typically involves improving the recruitment, training and
supervision of interviewers, reinterviewing, consistency checks and increasing the
number of interviewers.
In relation to all of these problems pilot work is very important. It may also be possible
to carry out a check on the interviewers and contact methods used after the survey
(post-enumeration surveys).
Trend analysis – this is an attempt to discern a trend between early and late
respondents. This trend is projected to non-respondents to estimate where they
stand on the characteristic of interest.
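Trend analysis can be illustrated by fitting a least squares line through the mean response of successive response waves and extending it one wave further. This is a sketch only: treating non-respondents as a notional ‘next wave’, the straight-line trend, the function name and the data are all assumptions made for illustration, and, like any extrapolation, the projection should be treated with caution.

```python
def project_to_nonrespondents(wave_means):
    """Fit a least squares straight line to the mean response of successive
    waves (1, 2, ..., k) and extrapolate it to wave k + 1, which is taken here
    to stand for the non-respondents."""
    k = len(wave_means)
    waves = range(1, k + 1)
    w_bar = sum(waves) / k
    m_bar = sum(wave_means) / k
    s_wm = sum((w - w_bar) * (m - m_bar) for w, m in zip(waves, wave_means))
    s_ww = sum((w - w_bar) ** 2 for w in waves)
    slope = s_wm / s_ww
    return m_bar + slope * ((k + 1) - w_bar)

# Hypothetical mean weekly hours of exercise reported by the first, second
# and third waves of respondents (e.g. before and after reminders)
print(round(project_to_nonrespondents([5.2, 4.6, 4.1]), 2))
```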
(b) Why is non-response problematic for the person or organisation conducting the
research? Give two reasons.
(c) How can non-response be reduced in telephone surveys and mail surveys,
respectively?
The sample size in each treatment group must be large enough to ensure that medically
(or socially) important differences can be detected. Sample size calculations to ensure
adequate power are a routine part of experimental design.
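The guide does not give a sample size formula at this point. As one common illustration, the sketch below uses the standard normal-approximation formula for comparing two group means with a two-sided test, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\delta^2$ units per group, where δ is the smallest important difference, σ is the standard deviation of the outcome and 1 − β is the desired power. The function name and the values of δ and σ are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate number of units per treatment group needed to detect a true
    difference in means of size delta with a two-sided test at level alpha.
    Normal-approximation formula: n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    z_power = z.inv_cdf(power)             # about 0.84 when power = 0.80
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                    # round up to a whole number of units

# Hypothetical: outcome SD of 10 units, smallest important difference of 5 units
print(n_per_group(delta=5, sigma=10))
print(n_per_group(delta=2.5, sigma=10))    # halving delta roughly quadruples n
```

Because n grows with 1/δ², the choice of what counts as an ‘important’ difference drives the size of the experiment.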
To increase the accuracy of the comparisons, the units may be grouped into blocks (for
example by age and gender, or by severity of disease). Within each block one or more
units receive each treatment. Treatments are allocated using randomisation within each
block. Sometimes there are strata or subgroups of interest (for example, we might want
to know whether the drug is as effective for men as it is for women) in which case the
blocks should be chosen to correspond to strata (or to subsets of strata).
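Randomisation within blocks can be sketched as follows: units are grouped by block, shuffled separately within each block, and the treatments are then dealt out in turn so that every block receives each treatment about equally often. The function name, treatment labels and patient data are hypothetical.

```python
import random

def randomise_within_blocks(units, block_of, treatments=("drug", "placebo"), seed=None):
    """Randomly allocate treatments separately within each block.

    units    : list of unit identifiers
    block_of : dict mapping each unit to its block label (e.g. a gender or age group)
    Returns a dict mapping each unit to its allocated treatment.
    """
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_of[u], []).append(u)
    allocation = {}
    for members in blocks.values():
        members = list(members)
        rng.shuffle(members)                 # random order within this block only
        for i, u in enumerate(members):
            # deal treatments out in turn, so each block is balanced as far as possible
            allocation[u] = treatments[i % len(treatments)]
    return allocation

patients = [f"P{i}" for i in range(1, 9)]
block = {p: ("female" if i <= 4 else "male") for i, p in enumerate(patients, start=1)}
print(randomise_within_blocks(patients, block, seed=42))
```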
18.5.5 Quasi-experiments
Cluster randomised trials are used where it is not practical to apply a treatment, or
treatment combination, to individuals using randomisation, but only to groups or
clusters of individuals. In an educational experiment, schools might be clusters. Half of
the schools, chosen at random, might be given new technology and the other half not.
Results for the students could be aggregated to school level and the treatments could be
compared. Note the experimental units are the clusters (schools) and the relevant
sample size is the number of clusters (schools), not the number of subunits (students).
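The point that the schools, not the students, are the experimental units can be made concrete in a short sketch: schools are randomised, and each school's results are reduced to a single mean before the two arms are compared. The function names and scores are hypothetical, the scores are applied to a fixed illustrative split rather than the random one, and a real analysis would go on to test whether the difference is larger than chance would explain.

```python
import random
from statistics import mean

def allocate_schools(schools, seed=None):
    """Randomly split the schools (the clusters) into two equal-sized arms."""
    rng = random.Random(seed)
    order = sorted(schools)
    rng.shuffle(order)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])   # (new technology, control)

def compare_at_school_level(scores, treated, control):
    """Aggregate each school's student scores to one mean, then compare arms.
    Returns (difference in mean of school means, number of clusters analysed)."""
    school_mean = {school: mean(marks) for school, marks in scores.items()}
    diff = mean(school_mean[s] for s in treated) - mean(school_mean[s] for s in control)
    return diff, len(treated) + len(control)      # sample size = schools, not students

tech, control = allocate_schools(["A", "B", "C", "D", "E", "F"], seed=7)
print(sorted(tech), sorted(control))

# Hypothetical end-of-year scores, recorded after the trial
scores = {"A": [60, 70], "B": [50], "C": [80, 90, 100], "D": [40, 60]}
print(compare_at_school_level(scores, {"A", "C"}, {"B", "D"}))
```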
Similar methods of statistical analysis may be used for experimental and for
observational data, but the interpretation differs.
An observational study (such as a survey of schools) may show that schools with
modern technology have better examination results. However, this could be due to the
fact that these schools are generally better equipped and/or have better students.
In an experiment where schools chose to participate it might be found that those
provided with ‘modern technology’ did better than those given extra supplies of paper
and pencils. This might be evidence that having modern equipment would help schools
which would choose to participate. Experiments can provide evidence of causation.
However, the results might not apply to all schools. Less adventurous or more
hard-pressed schools might benefit more from additional paper and pencils.
18.6 Summary
This unit has explored the different sources of error and bias which exist when drawing
a sample from a population. Non-response bias is particularly problematic, and a
variety of adjustments to account for non-response were suggested. Experimentation
concluded this topic, and the importance of a control and treatment group was outlined
in order to establish causality.
Blinding Blocking
Control group Experiment
Incentive Intervention
Interviewer bias Item non-response
Non-response Non-sampling error
Observational study Pilot survey
Placebo Randomisation
Research design Response bias
Response error Sampling error
Selection bias Treatment
Unit non-response
Learning outcomes
At the end of this unit, you should be able to:
design and conduct experiments in a social science context
define different forms of bias, explain why they are problematic and offer potential
remedies
Exercises
Exercise 18.1
The following question appeared in a survey of university students: ‘How much time do
you spend studying per week?’. List two problems with the phrasing of this question
which may adversely affect the reliability of the answers to it.
Exercise 18.2
Give an example of response bias. Is response bias a form of sampling error or a form of
non-sampling error? Briefly explain why.
Exercise 18.3
In no more than 200 words, explain the difference between an experimental design and
a survey design, and discuss their relative advantages.
Exercise 18.4
Briefly discuss the advantages and disadvantages of paying respondents for an interview.
Exercise 18.5
A research group has designed a survey and finds the costs are greater than the
available budget. Two possible methods of saving money are a sample size reduction or
spending less on interviewers (for example, by providing less interviewer training or
taking on less-experienced interviewers). Discuss the advantages and disadvantages of
these two approaches.
Exercise 18.6
In no more than 200 words, discuss the role of the interviewer in a survey and the
importance of training an interviewer.
Exercise 18.7
Readers of the magazine Popular Science were asked to phone in (on a premium rate
number) their responses to the following question: ‘Should the United States build more
fossil fuel generating plants or the new so-called safe nuclear generators to meet the
energy crisis?’. Of the total call-ins, 86% chose the nuclear option. Discuss the way the
poll was conducted, the question wording, and whether or not you think the results are
a good estimate of the prevailing mood in the country.
Exercise 18.8
What is randomisation in the context of experimental design?
Exercise 18.9
Explain what is meant by each of the following and why they are considered desirable in
an experiment:
(a) placebo
(c) blocking
19. Fundamentals of regression I – Correlation and the simple linear regression model
Overview
In Section 12.5, we saw that bivariate datasets could be visualised using scatter plots.
We discussed, for example, the effect advertising appeared to have on sales, i.e. whether
there is a positive or negative relationship between the variables. In this unit we go
further by introducing correlation and then proceed to modelling a linear relationship
between variables using a common procedure known as regression.
Aims
This unit explains the concepts of correlation and the fundamentals of regression.
Particular aims are:
to highlight the importance of correlation
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Statistics’ Chapter 3.
19.1 Introduction
We now investigate the relationship between variables. When we have data on two
variables (x and y, say), we have bivariate data. We will consider how to:
measure the strength of the relationship
The first thing to do with data is to provide a graphical representation. For one variable
this might be a histogram or a pie chart. For two variables we produce a scatter plot (as
previously discussed in Section 12.5).
Example 19.1 Assume that we have some data in paired form, say:
$$(x_i, y_i), \quad \text{for } i = 1, 2, \ldots, n.$$
We plot x on the horizontal axis and y on the vertical axis. By doing so, we can
easily see whether there is any relationship between the variables. The scatter plot is
shown in Figure 19.1.
[Figure: ‘Scatter plot of Crime against Unemployment’, with ‘Number of offences’ (roughly 5000 to 6000) on the vertical axis and ‘Unemployment’ on the horizontal axis]
Figure 19.1: Scatter plot of the unemployment and reported crime data.
19.2 Correlation
Correlation measures the strength of the linear relationship between two variables,
each measured on an interval scale.
[Figure: ‘Scatter plot’ of y against x, with both axes running from 2 to 8]
Positive correlation – the two variables tend to vary in the same direction.
Negative correlation – the two variables tend to vary in opposite directions.
Perfect correlation – the two variables have points which all lie exactly on a
straight line.
If there exists a perfect linear relationship between x and y, we can represent them
using an equation of the form:
y = b0 + b1 x
where:
Variables                                   Correlation
Height and weight                           Positive
Rainfall and sunshine hours                 Negative
Ice cream sales and sun cream sales         Positive
Hours of study and examination mark         Positive
Car’s petrol consumption and goals scored   Zero
Positive correlation is characterised by large x with large y and small x with small y.
Negative correlation is characterised by large x with small y and small x with large y.
However, since x and y may have widely different numerical values, we need to take this
into account. We do this by considering how far away from their means the two
variables are.
[Figure: ‘Scatter plot’ of y against x, with both axes marked from 0 to 8]
Figure 19.3: Scatter plot showing uncorrelated data (no obvious (linear) relationship
between x and y).
So, we are interested in the degree to which variations in variable values are related to
each other. Our basis for the measurement of correlation is:
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}.$$
Unfortunately, this measure is extremely sensitive to the units in which the variables
are measured. We would prefer a measure of correlation to remain the same regardless
of the units of measurement (for example days, hours, minutes or seconds). For this
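reason we use the sample correlation coefficient, r:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
where x̄ and ȳ are the sample means, and sx and sy are the sample standard
deviations, of x and y, respectively.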
Note that r is just the sum of the products of the z-scores (see Section 16.1) of each
point’s coordinates, divided by n − 1. This statistic is completely independent of the
units used to measure the variables.
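The contrast between the raw cross-product sum and r can be checked numerically: recording study time in minutes rather than hours multiplies $\sum (x_i - \bar{x})(y_i - \bar{y})$ by 60 but leaves r unchanged. The data are hypothetical, and `statistics.stdev` is used because it divides by n − 1, matching the sample standard deviations sx and sy.

```python
from statistics import mean, stdev

def cross_product_sum(x, y):
    """Sum of (x_i - xbar)(y_i - ybar); its size depends on the units used."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

def correlation(x, y):
    """Sample correlation: sum of the products of z-scores, divided by n - 1."""
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)          # sample standard deviations (divisor n - 1)
    return sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
               for xi, yi in zip(x, y)) / (len(x) - 1)

hours = [2, 4, 5, 7, 9]                  # hypothetical weekly study time, in hours
marks = [45, 50, 62, 64, 80]             # hypothetical examination marks
minutes = [60 * h for h in hours]        # the same study times, in minutes

print(cross_product_sum(hours, marks), cross_product_sum(minutes, marks))  # second is 60 times the first
print(correlation(hours, marks), correlation(minutes, marks))              # equal, up to rounding
```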
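where:
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$
and:
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}.$$
−1 ≤ r ≤ 1.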
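When only totals such as $\sum x_i$, $\sum x_i^2$ and $\sum x_i y_i$ are reported, the shortcut forms above give Sxx, Syy and Sxy directly. The sketch combines them using $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, which is the usual computational form of the sample correlation coefficient (the factors of n − 1 cancel). The function name and the small dataset are illustrative.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation from summary totals, via the shortcut formulas
    Sxx = sum x^2 - n xbar^2, Syy = sum y^2 - n ybar^2, Sxy = sum xy - n xbar ybar,
    and r = Sxy / sqrt(Sxx * Syy)."""
    xbar, ybar = sum_x / n, sum_y / n
    s_xx = sum_x2 - n * xbar ** 2
    s_yy = sum_y2 - n * ybar ** 2
    s_xy = sum_xy - n * xbar * ybar
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data x = [1, 2, 3, 4], y = [2, 4, 5, 8], summarised as totals
r = r_from_sums(n=4, sum_x=10, sum_y=19, sum_xy=57, sum_x2=30, sum_y2=109)
print(round(r, 4))   # prints 0.9812
```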
Beware: r ≈ 0 does not necessarily imply no relationship (as there could be a
non-linear relationship). For example, the scatter plot in Figure 19.4 arises from
data where r = 0.1481, but there is a clear quadratic relationship.
[Figure: ‘Scatter plot’ of y (roughly 500 to 2500) against x (20 to 80)]
Figure 19.4: Scatter plot of data simulated from the (approximate) quadratic equation
y = 2(x − 15)(85 − x).
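A noise-free version of the curve in Figure 19.4 shows the same effect even more starkly. With x taken symmetrically from 20 to 80, every y value is determined exactly by x, yet r is zero, because the rising and falling halves of the curve cancel out in Sxy. The guide's value r = 0.1481 comes from simulated data with random noise, which are not reproduced here; the choice of x values is an assumption.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    xbar, ybar = mean(x), mean(y)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    s_xx = sum((a - xbar) ** 2 for a in x)
    s_yy = sum((b - ybar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = list(range(20, 81, 5))                    # 20, 25, ..., 80 (symmetric about 50)
y = [2 * (xi - 15) * (85 - xi) for xi in x]   # exact quadratic, no random noise
print(pearson_r(x, y))                        # prints 0.0
```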
Activity 19.1 State whether the following statements are true or false, explaining
your answers.
(a) ‘The correlation coefficient between x and y is the same as the correlation
coefficient between y and x.’
(c) ‘If two variables have a correlation coefficient of minus 1 they are not related.’
(d) ‘A large correlation coefficient means the regression line will have a large slope
b1.’
test the adequacy of the proposed model and the relevance of the explanatory
variable.
y = b0 + b1 x
where:
b0 and b1 are fixed, but unknown, parameters
b0 is the y-intercept
y = b0 + b1 x + ε
where ε is some random perturbation from the initial ‘approximate’ line. In other
words, each y observation almost lies on the hypothesised line, but ‘jumps’ off the line
according to the random variable ε. Often we refer to ε as the error term of the model.
$$\hat{y} = \hat{b}_0 + \hat{b}_1 x$$
where $\hat{y}$ is our estimate of y based on the line of best fit when x is the value of the
explanatory variable.
$$\hat{y} = 4323.6 + 0.7468x.$$
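The formulas for the estimates are not shown in this part of the guide. The sketch below uses the standard least squares results $\hat{b}_1 = S_{xy}/S_{xx}$ and $\hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}$. The crime data are not listed here, so the example uses a small hypothetical dataset rather than reproducing 4323.6 and 0.7468.

```python
from statistics import mean

def least_squares(x, y):
    """Least squares estimates (b0_hat, b1_hat) for the line y_hat = b0_hat + b1_hat * x,
    using b1_hat = Sxy / Sxx and b0_hat = ybar - b1_hat * xbar."""
    xbar, ybar = mean(x), mean(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    b0_hat = ybar - b1_hat * xbar
    return b0_hat, b1_hat

b0_hat, b1_hat = least_squares([1, 2, 3, 4], [2, 4, 5, 8])   # hypothetical data
print(round(b0_hat, 3), round(b1_hat, 3))
```

The formula for $\hat{b}_0$ means the fitted line always passes through the point (x̄, ȳ).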
19.5 Prediction
One of the reasons for calculating the line of best fit is prediction. Specifically, for
some value of x, we can provide a prediction of y. So, returning to the example, how
many offences would you predict if there were 2,000 unemployed people in a city area?
To answer this we just substitute the desired value of x into the least squares regression
line:
$$\hat{y} = 4323.6 + 0.7468 \times 2000 \approx 5817.$$
Provided we are predicting y for an x value which is within the range of the available x
data, then we can be fairly confident in our prediction. This is what we call interpolation. However,
if we base our prediction on an x value outside the available x data, then we should view
the prediction with caution. This would be an example of extrapolation which is risky
since the relationship between x and y may change for such out-of-sample values of x.
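Prediction and the interpolation check can be combined in one small helper. This is a sketch: the helper name is hypothetical, and since the unemployment figures are not listed in this part of the guide, the observed x values below are invented stand-ins used only to decide whether x = 2000 lies inside the data.

```python
def predict(x_new, b0_hat, b1_hat, x_observed):
    """Return the fitted value at x_new and whether x_new lies inside the
    range of the observed x values (interpolation) or outside it (extrapolation)."""
    y_hat = b0_hat + b1_hat * x_new
    interpolating = min(x_observed) <= x_new <= max(x_observed)
    return y_hat, interpolating

observed_unemployment = [1500, 1800, 2100, 2400]   # hypothetical x data
y_hat, ok = predict(2000, 4323.6, 0.7468, observed_unemployment)
print(round(y_hat), "interpolation" if ok else "extrapolation: treat with caution")
```

Lying inside the observed range only guards against the extrapolation risk; it does not by itself make a prediction accurate.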
Activity 19.2 The following table shows the number of computers (in 000s), x,
produced by a company each month and the corresponding monthly costs (in
£000s), y, for running its computer maintenance department.
$$\sum_{i=1}^{10} x_i^2 = 573.33 \quad \text{and} \quad \sum_{i=1}^{10} y_i^2 = 116988.$$
(b) Calculate the correlation coefficient for computers and maintenance costs.
(d) Comment on your results. How would you check on the strength of the
relationship you have found?
19.6 Summary
This unit has introduced the concept of correlation to measure the strength of a linear
relationship between two continuous variables. Having seen that a linear relationship
exists between two such variables, it is possible to model the relationship
mathematically using the simple linear regression model. Estimation of the intercept
and slope in the regression model was discussed and the subsequent use of the
estimated model for prediction.
Learning outcomes
At the end of this unit, you should be able to:
discuss the strength of correlation between two continuous variables
Exercises
Exercise 19.1
Exercise 19.2
Define the term ‘sample correlation coefficient’, r, based on data $(x_1, y_1), \ldots, (x_n, y_n)$.
Describe some properties of r in terms of how its value is different when the data have
different patterns of scatter plot.
Exercise 19.3
An area manager in a department store wants to study the relationship between the
number of workers on duty and the value of merchandise lost to shoplifters. To do so,
she assigned a different number of clerks for each of 10 weeks. The results were:
$$\sum_{i=1}^{10} x_i = 130, \quad \sum_{i=1}^{10} y_i = 3{,}090, \quad \sum_{i=1}^{10} x_i y_i = 38305,$$
$$\sum_{i=1}^{10} x_i^2 = 1760, \quad \sum_{i=1}^{10} y_i^2 = 1007750.$$
(a) Which is the independent variable and which is the dependent variable?
(b) Plot the data in a scatter plot and comment on its shape.
(f) Compute the correlation coefficient between the number of workers and the loss.
Exercise 19.4
Write down the simple linear regression model, explaining each term in the model.
Exercise 19.5
The following data were recorded during an investigation into the effect of fertiliser in
g/m², x, on crop yields in kg/ha, y.
Crop yield (kg/ha), y    160  168  176  179  183  186  189  186  184
Fertiliser (g/m²), x       0    1    2    3    4    5    6    7    8
$$\sum_{i=1}^{9} x_i^2 = 204, \quad \sum_{i=1}^{9} y_i^2 = 289099.$$
(a) Plot the data and comment on the appropriateness of using the simple linear
regression model.
(b) Calculate a least squares regression line for the data.
(c) Predict the crop yield for 3.5 g/m² of fertiliser.
(d) Would you feel confident predicting a crop yield for 10 g/m² of fertiliser? Explain
briefly why or why not.
Exercise 19.6
In a study of household expenditure a population was divided into five income groups
with the mean income, x, and the mean expenditure, y, on essential items recorded (in
Euros per month). The results are in the following table.
Mean income, x    Mean expenditure, y
1000              871
2000              1300
3000              1760
4000              2326
5000              2950
$$\sum_{i=1}^{5} x_i^2 = 55000000 \quad \text{and} \quad \sum_{i=1}^{5} y_i^2 = 19659017.$$
20. Fundamentals of regression II – Interpretation of computer output and assessing model adequacy
Overview
In practice, regression calculations are performed by computer and so in this final unit
we will consider how to interpret computer output to assess the adequacy of a given
regression model. Central to this are two questions: whether the model provides a good
‘fit’ to the data, in terms of explanatory power, and whether the explanatory variable is
statistically significant.
Aims
This unit considers how to judge whether a regression model is ‘good’ and how to
interpret typical computer output of a regression. Particular aims are:
Background reading
Swift, L. and S. Piff Quantitative methods for business, management and finance.
(Palgrave, 2014) fourth edition [ISBN 9781137376558] ‘Statistics’ Chapter 3.
20.1 Introduction
We conclude the course with a discussion of computer output for the simple linear
regression model and how to assess the adequacy of a particular model. Remember our
aims are decision-making and prediction. In order to make the best decisions and
predictions we need to use the best models.
attempt this using a single explanatory variable. The total variation in the response
variable sample data is simply:
$$\text{TSS} = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \qquad (20.1)$$
We call this the total sum of squares (TSS). As can be seen from (20.1), TSS is
simply the sum of the squared deviations of the response observations, the $y_i$'s, about
the mean, ȳ.¹ We can decompose TSS into two components which are:
the amount of variation which we are able to explain using the proposed model,
called the explained sum of squares (ESS)
the remaining (or residual) variation which we are unable to explain with the
model, called the residual sum of squares (RSS).
Hence:
TSS = ESS + RSS. (20.2)
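The decomposition can be verified numerically for a least squares line. The sketch below, on a small hypothetical dataset, computes TSS, ESS and RSS directly from their definitions and also returns ESS/TSS, the usual definition of the coefficient of determination R²; for a simple linear regression this equals r², the square of the sample correlation coefficient.

```python
from statistics import mean

def sums_of_squares(x, y):
    """Fit the least squares line, then return (TSS, ESS, RSS, R2)."""
    xbar, ybar = mean(x), mean(y)
    s_xx = sum((a - xbar) ** 2 for a in x)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * a for a in x]
    tss = sum((b - ybar) ** 2 for b in y)                  # total variation about ybar
    ess = sum((f - ybar) ** 2 for f in fitted)             # variation explained by the line
    rss = sum((b - f) ** 2 for b, f in zip(y, fitted))     # residual (unexplained) variation
    return tss, ess, rss, ess / tss

tss, ess, rss, r2 = sums_of_squares([1, 2, 3, 4], [2, 4, 5, 8])   # hypothetical data
print(round(tss, 4), round(ess + rss, 4), round(r2, 4))
```

The identity TSS = ESS + RSS relies on the line being the least squares fit (with an intercept); for an arbitrary line it generally fails.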
Coefficient of determination
However, how should we choose the candidate explanatory variables in the first place? Well, it
would be sensible to draw on our prior knowledge about the response variable and
general common sense to come up with some obvious choices. For example, if our
response variable is a macroeconomic variable (consumption, say) then we could use
basic economic theory to come up with something suitable as an explanatory variable
(income, say).
In the ‘Predictor’ column, the ‘Constant’ relates to the y-intercept and ‘Courses’ is
the explanatory variable.
In the ‘Coefficient’ column, the parameter estimates $\hat{b}_0$ and $\hat{b}_1$ (of the y-intercept
and slope of the regression line, respectively) are provided, yielding the fitted
regression line (using appropriate rounding) of:
$$\widehat{\text{GRE}} = 812.988 + 50.479 \times \text{Courses}.$$
In the ‘Std. error’ column, the ‘Courses’ value (3.347518) is the standard error of
the slope, denoted $s_{\hat{b}_1}$, which measures the precision of the slope estimate.
Similarly, the ‘Constant’ value (70.73298) is the standard error of the y-intercept,
denoted $s_{\hat{b}_0}$, which measures the precision of the y-intercept estimate, although we
shall not consider this term any further in this course.
In the ‘t ratio’ column, the ‘Courses’ value (15.0795 = 50.478786/3.347518) is the t
statistic:
$$t = \frac{\hat{b}_1}{s_{\hat{b}_1}}$$
which can be used to perform a statistical test to assess the significance of
‘Courses’ as an explanatory variable of ‘GRE’ – that is whether or not the true
slope $b_1 = 0$. However, we shall perform the test using the ‘p-value’, discussed next.
Similarly, there is a t statistic, $t = \hat{b}_0 / s_{\hat{b}_0}$, for testing whether or not the true
intercept $b_0 = 0$ – that is whether or not the true line passes through the origin –
but, again, we shall not consider this further in this course.
In the ‘p’ column, the ‘Courses’ value is the p-value for a ‘two-sided’ hypothesis
test of whether the true slope $b_1 = 0$ or $b_1 \neq 0$. A p-value less than 0.05 suggests
that $b_1 \neq 0$ and hence that the explanatory variable is statistically significant.³
In the bottom row, ‘S’ is the standard error of the regression – the standard
deviation of the observed y values about the predicted $\hat{y}$ values. It is an estimate of
the standard deviation of the model error term, ε, and tells us roughly how far the
observed y values typically lie from the fitted regression line.
In the bottom row, ‘R-sq’ is the value of the coefficient of determination, R2 , which
is the proportion of the variation in y explained by x.
In the bottom row, R (where $R = \sqrt{\text{R-sq}}$) is the size of the sample correlation
coefficient, |r|, as defined in the previous unit; r itself has the same sign as the
slope estimate $\hat{b}_1$.
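The ‘Std. error’, ‘t ratio’ and ‘S’ entries can be reproduced from raw data with the standard formulas for simple linear regression: $S = \sqrt{\text{RSS}/(n-2)}$, $s_{\hat{b}_1} = S/\sqrt{S_{xx}}$ and $t = \hat{b}_1 / s_{\hat{b}_1}$. The GRE data are not listed here, so the sketch uses a small hypothetical dataset. The p-value would then come from a t distribution with n − 2 degrees of freedom (for example via `scipy.stats.t.sf`), which is left out to keep the example to the standard library.

```python
import math
from statistics import mean

def slope_t_ratio(x, y):
    """Return (b1_hat, S, se_b1, t) for a simple linear regression, where
    S     = sqrt(RSS / (n - 2)) is the standard error of the regression,
    se_b1 = S / sqrt(Sxx)       is the standard error of the slope, and
    t     = b1_hat / se_b1      is the 't ratio' shown in regression output."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    s_xx = sum((a - xbar) ** 2 for a in x)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1_hat = s_xy / s_xx
    b0_hat = ybar - b1_hat * xbar
    rss = sum((b - (b0_hat + b1_hat * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    se_b1 = s / math.sqrt(s_xx)
    return b1_hat, s, se_b1, b1_hat / se_b1

b1_hat, s, se_b1, t = slope_t_ratio([1, 2, 3, 4], [2, 4, 5, 8])   # hypothetical data
print(round(b1_hat, 3), round(s, 3), round(se_b1, 3), round(t, 3))

# Check against the GRE output: the t ratio is the coefficient over its standard error
print(round(50.478786 / 3.347518, 4))   # prints 15.0795
```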
y = b0 + b1 x + ε.
We observe that the p-value associated with ‘Courses’ is 0.000 which is clearly below⁴
our threshold value of 0.05 and hence we conclude that b1 in (20.3) is not equal to 0.
Therefore, the number of mathematics courses taken by students, ‘Courses’, does help
to explain GRE scores. Indeed, we estimate that each additional mathematics course
taken is associated with a GRE score that is higher by 50.479 points, on average.
By looking at the sample correlation coefficient, −0.9257, we see that there is a very
strong, negative linear relationship between the number of workers and the
construction time. This means that the more workers there are on a job, the shorter the
completion time tends to be.
Looking at the p-value of the ‘Workers’ explanatory variable, we see it is 0.000 (to
three decimal places) which is clearly below 0.05, indicating that the number of
workers is a highly significant explanatory variable and so is useful in explaining the
response variable. The $R^2$ value tells us that this model is able to explain 85.7% of
the variation in pool construction time using the number of workers as the
explanatory variable.
The coefficient of ‘Workers’ is −6.725 which means each additional worker on a pool
construction job reduces the predicted completion time by 6.725 hours.
Example 20.3 An estate agent recorded sales of houses according to the sale price,
in pounds, and the size of living space, in square metres. She was interested in
investigating the relationship between these two variables and wondered whether the
sales price could be predicted from the size of living space. A regression model was
estimated with the following results:
It is clear that there is a very strong positive correlation between the sale price and
amount of living space, with a sample correlation coefficient of 0.9284. This is
perhaps not too surprising since we would expect larger properties to be worth more.
‘Living space’ is statistically significant due to the small p-value and the coefficient
of 11647 can be interpreted by saying that for every extra square metre of living
space, the predicted sale price increases by £11,647.
The coefficient of determination tells us that 86.2% of the variation in house sale
prices can be explained by living space alone. What might account for the other
13.8%? Perhaps the number of bedrooms, location, age of the property, allocated
parking etc.
The intercept of the model is negative. Is this reasonable? Well, the intercept gives
the predicted value of the response variable when the explanatory variable is zero.
Clearly, a sale price cannot be negative! However, we would expect a minimum
amount of living space for any house (perhaps 50 square metres?), so the negative
intercept causes no practical problem, since we would never encounter properties
with near-zero amounts of living space!
Finally, note the ‘large’ value of S, the standard error of the regression. This is
purely a consequence of the large values of the response variable, since sale prices are
in pounds, rather than hundreds of thousands of pounds. Always pay attention to
the units of measurement!
Activity 20.1 A retailer has asked you to develop a model which could be used to
predict total sales for some proposed new retail locations. As an expanding retailer,
it needs accurate predictions to determine whether it would be profitable to build
new stores at various locations. The company has obtained data from a household
survey on retail sales per household, y, and income per household, x. You run a
simple linear regression model and obtain the results below. Comment on the
adequacy of the model.
Activity 20.2 A company sets different prices for its DVD system in different
regions of its country of operation. Data on the number of units sold and the
corresponding prices were collected and a simple linear regression analysis
performed. The regression results are:
Remember the aim of statistics is decision-making and prediction. In order to make the
best decisions and predictions we need to use the best models. This often means making
the models more complex by adding more explanatory variables. However, the models
should not be too complex.⁵
It is very straightforward to extend the simple linear regression model to incorporate
several explanatory variables. Multiple linear regression is just a natural extension
of this framework, but with more than one explanatory variable.
Example 20.4 Suppose XYZ Catering sells catering goods and senior management
wants to know which factors affect sales. Your team has data on sales, clients,
suppliers etc. How does the management question translate into a model? What is
the dependent variable? What is (are) the independent variable(s)?
Clearly, here the dependent variable is sales (the variable we are trying to explain).
There could be several explanatory factors such as:
size of client company
location
etc.
However, multiple linear regression is beyond the scope of this course so it will not be
discussed further, although many of you are likely to meet multiple linear regression
during your undergraduate studies.
20.6 Summary
In practice most datasets which are used for regression are large and the estimation of
regression parameters can be computationally intensive. Therefore, we tend to use
computers to perform regression analysis. In this final unit of the course, we have looked
at the interpretation of regression output. Specifically, we have seen how to obtain the
equation of the best-fitting line, how to determine how much of the total variation in
the response variable can be explained by the model (using R2 ) and how to determine
whether the explanatory variable in our model was statistically significant. Finally, we
briefly looked at introducing more than one explanatory variable into the regression
model which has the advantage of being more realistic, but the disadvantage of leading
to a more complicated model.
Learning outcomes
At the end of this unit, you should be able to:
determine how good a regression model is at explaining the dependent variable
interpret the computer output of a regression model
assess the statistical significance of the explanatory variable
discuss the key terms and concepts introduced in this unit.
Exercises
Exercise 20.1
The head of a statistics department has taken data from her instructors to observe the
correlation between the number of homework assignments the instructors give for a
course and the average course grade for the students. The following is a plot of the
residuals, defined as $y_i - \hat{y}_i$, for this study.
[Figure: ‘Plot of residuals’, with ‘Grade Average’ residuals (from −0.2 to 0.2) on the vertical axis and a horizontal axis running from 0 to 15]
(a) Based on the plot of residuals, describe the strength of a linear relationship
between the number of homework assignments and grade average. What would be
a likely value for the correlation coefficient, r? Explain your answer.
(b) Based on the plot of residuals, describe the effect that the number of homework
assignments has on student grades.
(c) Based on the plot of residuals, about how many homework assignments should an
instructor give to maximise a student’s grade average? Explain your answer.
Exercise 20.2
A botanist is studying the relationship between the trunk circumferences of a species of
tree and the number of leaves it has. The scatter plot and regression results are given
here:
[Figure: scatter plot of ‘Number of Leaves’ (0 to 2500) against ‘Trunk Circumference (inches)’ (100 to 450), with one point labelled A]
(a) What is the equation of the least squares regression line relating the number of
leaves to the trunk circumference in inches? Define any variables used.
(b) If the point A, as shown in the scatter plot above, represents a tree with a trunk
circumference of 350 inches, and 980 leaves, what is the residual, $y_i - \hat{y}_i$, for this
data point?
(c) If the data point A is removed from the sample, what effect will this have on the
correlation coefficient, r? Explain.
Part 3
Appendices
A. A sample examination paper
Appendix A
A sample examination paper
Important note: This Sample examination paper reflects the examination and
assessment arrangements for this course in the academic year 2013–2014. The format
and structure of the examination may have changed since the publication of this subject
guide. You can find the most recent examination papers on the VLE, where all changes
to the format of the examination are posted.
Candidates should answer ALL questions. Section A (50 marks) covers the
Mathematics part of the course, Section B (50 marks) covers the Statistics part of the
course. Candidates are required to pass BOTH sections to pass the examination.
A list of formulae and the table of cumulative Normal probabilities are provided at the
end of this paper. (The table is provided at the back of this subject guide in Appendix C.)
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
Section A: Mathematics
Answer ALL questions (50 marks in total).
1. (a) The demand for a product, q, is related to its price, p, by the equation
q = 10 − p,
i. $\cos(x^2)$.
ii. $e^x \cos(x^2)$.
(d) Suppose that you buy a car for £10,000 and its value depreciates continuously
at a rate of 25% per year. What is its value after three years? Explain why the
car’s value is halved after $4\ln(2)$ years. (5 marks)
[You may use the fact that, to 5dp, $e^{0.25} = 1.28403$.]
3. Consider an annuity which pays £100 every year. The first payment is to be made
now and further payments will be made at the end of each year for the next n years.
(a) Find the present value of this annuity, simplifying your answer as far as
possible, given that an interest rate of 5% per annum compounded annually is
available to you. (5 marks)
(b) If the annuity is to make eleven payments, what is the smallest lump sum
payment that will be worth more to you than the annuity? (3 marks)
(c) How many payments are needed if the annuity is to be worth more than a
lump sum of £2,000? (5 marks)
(d) If the annuity were a perpetuity, what would be its present value? (2 marks)
[You may use the facts that, to 5dp, $1.05^{11} = 1.71034$ and $\log_{1.05}(21) = 62.40033$.]
Section B: Statistics
Answer ALL questions (50 marks in total).
4. (a) Would the distribution of income (in pounds per year) in the UK most likely
be symmetrically distributed, skewed to the right, or skewed to the left?
Briefly explain why. Which measure of central tendency would you use to
describe income? Justify your choice.
(5 marks)
(b) Given events $A$ and $B$ where $P(A) = 0.5$ and $P(A \cup B) = 0.7$, find $P(B)$ in
the following three cases:
6. For a group of 15 students, the following table shows the average number of hours
per week spent on study and their final results in the corresponding examination.
$$\sum_{i=1}^{15} x_i^2 = 4218.75, \quad \sum_{i=1}^{15} x_i = 247.5, \quad \sum_{i=1}^{15} y_i = 1155, \quad \sum_{i=1}^{15} y_i^2 = 92999 \quad \text{and} \quad \sum_{i=1}^{15} x_i y_i = 19750.5.$$
(a) Calculate the sample correlation coefficient for these data and comment.
(5 marks)
(b) Calculate the least squares regression line of y on x.
(5 marks)
(c) Use the calculated line to predict examination marks for a student who studied
for 16 hours. Would you consider a prediction based on 20 hours to be more
accurate? Explain why/why not.
(5 marks)
[END OF PAPER]
Formula sheet
Section A: Mathematics
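The chain rule: If $f(x) = f(g)$ for some function $g(x)$, then $\dfrac{df}{dx} = \dfrac{df}{dg}\dfrac{dg}{dx}$.
The product rule: $\dfrac{d}{dx}\big[f(x)g(x)\big] = \dfrac{df}{dx}g(x) + f(x)\dfrac{dg}{dx}$.
The quotient rule: $\dfrac{d}{dx}\left[\dfrac{f(x)}{g(x)}\right] = \dfrac{1}{[g(x)]^2}\left[g(x)\dfrac{df}{dx} - f(x)\dfrac{dg}{dx}\right]$.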
Section B: Statistics
The variances for a population and a sample are:
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}.$$
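Both variance formulas are "shortcut" forms based on $\sum x_i^2$. The Python sketch below compares them with the standard library's `statistics.pvariance` and `statistics.variance` on a small illustrative data set that is not taken from this guide.

```python
import statistics


def population_variance(xs):
    """sigma^2 = (sum of x_i^2) / N - mu^2."""
    N = len(xs)
    mu = sum(xs) / N
    return sum(x * x for x in xs) / N - mu ** 2


def sample_variance(xs):
    """s^2 = (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data only
print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), statistics.variance(data))
```

In floating-point arithmetic, the shortcut forms can lose accuracy when the values are large relative to their spread. For this reason, numerical libraries usually work with deviations from the mean instead.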
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = P(A)\,P(B \mid A)$
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$
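$E(X) = n\pi$ and $\mathrm{Var}(X) = n\pi(1 - \pi)$.
If $X \sim \text{Poisson}(\lambda)$, then:
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad E(X) = \lambda \quad \text{and} \quad \mathrm{Var}(X) = \lambda.$$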
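where:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
$$S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$
$$S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}.$$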
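The binomial and Poisson mean and variance formulas on this sheet can be checked by summing directly over the probability functions.

The line $E(X) = n\pi$, $\mathrm{Var}(X) = n\pi(1-\pi)$ refers to $X \sim \text{Bin}(n, \pi)$. Its probability function $P(X = x) = \binom{n}{x}\pi^x(1-\pi)^{n-x}$ is not shown in this extract and is assumed here.

The values $n = 10$, $\pi = 0.3$ and $\lambda = 3$ are illustrative only. The Python sketch below checks both sets of formulas.

```python
import math


def binomial_pmf(x, n, pi):
    """P(X = x) = C(n, x) pi^x (1 - pi)^(n - x) for X ~ Bin(n, pi)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def mean_and_variance(pmf_values):
    """Mean and variance of a distribution given as (x, P(X = x)) pairs."""
    mean = sum(x * p for x, p in pmf_values)
    var = sum((x - mean) ** 2 * p for x, p in pmf_values)
    return mean, var


n, pi = 10, 0.3
print(mean_and_variance([(x, binomial_pmf(x, n, pi)) for x in range(n + 1)]), n * pi, n * pi * (1 - pi))

lam = 3.0
# The Poisson support is infinite; x <= 59 leaves only a negligible tail when lam = 3.
print(mean_and_variance([(x, poisson_pmf(x, lam)) for x in range(60)]), lam)
```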
B. Solutions to the sample examination paper
Appendix B
Solutions to the sample examination paper
The solutions to the sample examination paper provided below are intended to give guidance
on the level of detail required by the Examiners. In response to a ‘qualitative’
question, it is essential to directly address the areas of the syllabus which are being
assessed. For a ‘quantitative’ question, it is essential that you show all your working, as
most of the credit will be for the method rather than the final answer.
Section A: Mathematics
Question 1
(a) As in Section 2.3.3, given the demand equation $q = 10 - p$ and the supply equation
$q = 4p - 30$, the equilibrium price is given by
$$10 - p = 4p - 30 \implies 5p = 40 \implies p = \frac{40}{5} = 8.$$
The corresponding quantity is then given by, say, q = 10 − 8 = 2. (This is, of course, the
equilibrium quantity.)
Then, as in Section 4.1.4, using the supply equation, we can see that the supply function
is given by $q_S(p) = 4p - 30$. This is economically meaningful only when both $p$ and $q$
are non-negative, which requires $4p - 30 \geq 0$, i.e. $p \geq 15/2$; any smaller price would
make the quantity supplied negative.
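Finding the equilibrium amounts to solving two linear equations. The Python sketch below assumes demand of the form $q = a - bp$ and supply $q = c + dp$; the parameter and function names are introduced here for illustration. It reproduces the equilibrium above and the lower limit on $p$ for which supply is non-negative, using exact fractions.

```python
from fractions import Fraction


def linear_equilibrium(a, b, c, d):
    """Solve a - b*p = c + d*p for the equilibrium price p and quantity q."""
    p = Fraction(a - c, b + d)   # rearranging gives (b + d) p = a - c
    q = a - b * p                # substitute back into the demand equation
    return p, q


def min_price_for_supply(c, d):
    """Smallest p with c + d*p >= 0, assuming d > 0."""
    return Fraction(-c, d)


p, q = linear_equilibrium(10, 1, -30, 4)
print(p, q)                            # 8 2
print(min_price_for_supply(-30, 4))    # 15/2
```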
(b) As in Section 4.2.2, if we note that
and so, using the definition of the logarithm, we see that $x = 10^1 = 10$.
(c) For (i), the function $f(x) = \cos(x^2)$ is the composition given by $f(g) = \cos(g)$ with
$g(x) = x^2$. Thus, using the chain rule from Section 6.1.3, we find that
$$\frac{df}{dx} = \frac{df}{dg}\cdot\frac{dg}{dx} = [-\sin(g)][2x] = -2x\sin(x^2).$$
For (ii), the function $h(x) = e^x\cos(x^2)$ is the product of the functions $e^x$ and $\cos(x^2)$.
Thus, using the product rule from Section 6.1.1, we find that
$$\frac{dh}{dx} = [e^x][\cos(x^2)] + [e^x][-2x\sin(x^2)] = e^x[\cos(x^2) - 2x\sin(x^2)],$$
if we use our answer from (i).
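You can sanity-check the results of (i) and (ii) by comparing them with a central finite-difference approximation of the derivative. This check is illustrative, not part of the expected examination answer; the step size and the test points are arbitrary choices.

```python
import math


def central_difference(func, x, step=1e-5):
    """Approximate func'(x) by (func(x + step) - func(x - step)) / (2 * step)."""
    return (func(x + step) - func(x - step)) / (2 * step)


def f(x):
    return math.cos(x ** 2)


def h(x):
    return math.exp(x) * math.cos(x ** 2)


def f_prime(x):
    # Chain rule result from (i): -2x sin(x^2)
    return -2 * x * math.sin(x ** 2)


def h_prime(x):
    # Product rule result from (ii): e^x [cos(x^2) - 2x sin(x^2)]
    return math.exp(x) * (math.cos(x ** 2) - 2 * x * math.sin(x ** 2))


for x in (-1.2, 0.3, 1.0, 1.7):
    # Both differences should be tiny if the analytic derivatives are correct.
    print(x, abs(f_prime(x) - central_difference(f, x)), abs(h_prime(x) - central_difference(h, x)))
```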
(d) As the car is initially worth £10,000 and its value depreciates continuously at a rate
of 25% per year, we can use what we saw in Section 9.3 to see that its value is given by
$$10{,}000\,e^{-0.25 \times 3} = \frac{10{,}000}{(e^{0.25})^3}$$
after three years. We are told in the question that, to 5dp, $e^{0.25} = 1.28403$ and so this
gives us
$$\frac{10{,}000}{(e^{0.25})^3} \simeq \frac{10{,}000}{(1.28403)^3} = 4{,}723.615,$$
i.e. the car’s value is £4,723.61 after three years.
The car’s value is halved from £10,000 to £5,000 after $t$ years where
$$5{,}000 = 10{,}000\,e^{-0.25t} \implies e^{-0.25t} = \frac{1}{2}.$$
Using the definition of ‘ln’, as in Section 4.2.2, this then gives us
$$-0.25t = \ln\left(\frac{1}{2}\right) \implies t = -4\ln\left(\frac{1}{2}\right) = 4\ln(2),$$
if we use the laws of logarithms. Consequently, as required, the car’s value is halved
after $4\ln(2)$ years.
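Both parts of this answer can be reproduced numerically with the continuous-depreciation model $V(t) = 10{,}000\,e^{-0.25t}$. In the Python sketch below, the constant and function names are illustrative.

```python
import math

V0 = 10_000    # initial value of the car in pounds
RATE = 0.25    # continuous depreciation rate per year


def value(t):
    """Value after t years under continuous depreciation: V0 * e^(-RATE * t)."""
    return V0 * math.exp(-RATE * t)


exact = value(3)                   # uses the full-precision exponential
approx = V0 / 1.28403 ** 3         # uses the 5dp value of e^0.25 given in the question
half_life = math.log(2) / RATE     # 4 ln(2) years

print(f"{exact:.2f} {approx:.2f}")
print(f"{half_life:.4f} {value(half_life):.2f}")
```

The exact exponential gives about £4,723.67, slightly more than £4,723.61. The difference comes only from rounding $e^{0.25}$ to five decimal places.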
Question 2
and solve the equation $f'(x) = 0$ and so, using factorisation, we can see that
$$3x^2 - 4x - 15 = 0 \implies (3x + 5)(x - 3) = 0 \implies x = -\frac{5}{3} \text{ or } 3.$$
Thus, the stationary points occur when $x = -5/3$ and $x = 3$.
To classify these stationary points, we see that the second derivative of $f(x)$ is given by
$$f''(x) = 6x - 4,$$
[Figure: two panels sketching the curve $y = f(x)$; the $y$-axis is marked at $400/27$ and $-36$, and the $x$-axis at $-3$, $-5/3$, $3$ and $5$.]
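Locating and classifying the stationary points needs only $f'(x) = 3x^2 - 4x - 15$ and $f''(x) = 6x - 4$, both stated above. The step can therefore be checked with a short Python sketch using the quadratic formula and the second-derivative test. The helper names are illustrative.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a x^2 + b x + c = 0 in increasing order (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def f_second(x):
    """f''(x) = 6x - 4, as found above."""
    return 6 * x - 4


# Stationary points solve f'(x) = 3x^2 - 4x - 15 = 0.
for x in quadratic_roots(3, -4, -15):
    curvature = f_second(x)
    if curvature < 0:
        kind = "local maximum"
    elif curvature > 0:
        kind = "local minimum"
    else:
        kind = "second-derivative test inconclusive"
    print(f"x = {x:.4f}: f''(x) = {curvature:.1f}, {kind}")
```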
(c) As in Section 8.2, to find the area of the region bounded by the curve $y = f(x)$, the
$x$-axis and the vertical lines $x = -1$ and $x = 1$, we observe that $f(x)$ is positive for
$-1 \leq x \leq 0$ and negative for $0 \leq x \leq 1$. This means that the area we need to find is
given by
$$\int_{-1}^{0} f(x)\,dx + \left|\int_{0}^{1} f(x)\,dx\right|.$$
Question 3
(d) If the annuity were a perpetuity, its present value would be given by the infinite
geometric series
$$100 + \frac{100}{1.05} + \frac{100}{1.05^2} + \cdots + \frac{100}{1.05^n} + \cdots,$$
whose sum is
$$\frac{100}{1 - \frac{1}{1.05}} = 2{,}100,$$
if we use the formula for the sum of an infinite geometric series (or think about what we
found in (a) as $n \to \infty$). Thus, the present value of the corresponding perpetuity would
be £2,100.
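The present values in this question can be checked numerically. The Python sketch below assumes, as in the question, £100 payments with the first made now and interest at 5% per annum compounded annually. The parameter `n_payments` counts every payment, including the first, and the function names are illustrative.

```python
def annuity_pv(payment, rate, n_payments):
    """Present value of n_payments equal payments, the first made now."""
    discount = 1 / (1 + rate)
    return sum(payment * discount ** k for k in range(n_payments))


def perpetuity_pv(payment, rate):
    """Limit of annuity_pv as n_payments grows: payment / (1 - 1 / (1 + rate))."""
    return payment / (1 - 1 / (1 + rate))


print(f"{annuity_pv(100, 0.05, 11):.2f}")   # eleven payments, as in part (b)
print(f"{perpetuity_pv(100, 0.05):.2f}")    # perpetuity, as in part (d)

# Smallest number of payments whose present value exceeds a £2,000 lump sum, as in part (c).
n = 1
while annuity_pv(100, 0.05, n) <= 2000:
    n += 1
print(n)
```

The loop is a brute-force alternative to the logarithm route suggested by the hint $\log_{1.05}(21) = 62.40033$.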
Section B: Statistics
Question 4
(a) The distribution of income would be skewed to the right. Most people earn, say,
between £12,000 and £60,000, with few earning less. However, a relatively small
number earn a lot more, leading to a long ‘tail’ to the right. We would probably
not use the mode, instead preferring the mean or median. The mean, though, is
sensitive to outliers and so will be ‘pulled’ up due to the few high earners. As such,
it could be argued that the median would be the best measure of central tendency
for representing the ‘average’ income of a UK employee.
(b) i. We use the fact that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. If $A$ and $B$ are
mutually exclusive, then $P(A \cap B) = 0$, so:
Hence:
$$0.7 = 0.5 + P(B) - 0.5 \times P(B).$$
Therefore, $0.5 \times P(B) = 0.2$, hence $P(B) = 0.4$.
iii. Again, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and also:
$$P(A \cap B) = P(B)\,P(A \mid B)$$
so:
$$0.7 = 0.5 + P(B) - 0.5 \times P(B).$$
Hence $P(B) = 0.2/0.5 = 0.4$.
Alternatively (and more elegantly), $P(A) = P(A \mid B)$ implies $A$ and $B$ are
independent and so we can use the result of part ii.
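All three cases come from the addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, each with a different expression for $P(A \cap B)$. In every case, $P(A \cap B) = k\,P(B)$ for a known constant $k$:

- $k = 0$ for mutually exclusive events.
- $k = P(A)$ for independent events.
- $k = P(A \mid B)$ in general, which equals 0.5 in part iii.

The Python sketch below solves each case with exact fractions. The function name is hypothetical.

```python
from fractions import Fraction


def solve_p_b(p_a, p_union, k):
    """Solve P(A u B) = P(A) + P(B) - k * P(B) for P(B), where P(A n B) = k * P(B) and k != 1."""
    return (p_union - p_a) / (1 - k)


p_a, p_union = Fraction(1, 2), Fraction(7, 10)
print(solve_p_b(p_a, p_union, 0))               # mutually exclusive: P(A n B) = 0
print(solve_p_b(p_a, p_union, p_a))             # independent: P(A n B) = P(A) P(B)
print(solve_p_b(p_a, p_union, Fraction(1, 2)))  # P(A n B) = P(B) P(A | B), with P(A | B) = 0.5
```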
$$z_A = \frac{70 - 61}{5} = 1.8 \quad \text{and} \quad z_B = \frac{70 - 64}{4} = 1.5.$$
Since $z_A > z_B$, a mark of 70 lies further into the upper tail of the type A distribution,
so type B schools have a higher proportion of students with marks above 70.
Alternatively, the actual proportions could be calculated: $P(Z > 1.8) = 0.0359$ and
$P(Z > 1.5) = 0.0668$, hence type B schools have the higher proportion.
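The tail probabilities can also be obtained without standardising first. The Python sketch below uses the standard library's `statistics.NormalDist`. As the $z$-scores above imply, it assumes means of 61 and 64 and standard deviations of 5 and 4 for type A and type B schools respectively.

```python
from statistics import NormalDist

type_a = NormalDist(mu=61, sigma=5)
type_b = NormalDist(mu=64, sigma=4)

# P(mark > 70) = 1 - P(mark <= 70)
p_a = 1 - type_a.cdf(70)
p_b = 1 - type_b.cdf(70)
print(f"{p_a:.4f} {p_b:.4f}")

# Equivalent route via z-scores and the standard normal distribution.
standard = NormalDist()  # mean 0, standard deviation 1
z_a = (70 - type_a.mean) / type_a.stdev
z_b = (70 - type_b.mean) / type_b.stdev
print(z_a, z_b, 1 - standard.cdf(z_a), 1 - standard.cdf(z_b))
```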
(d) Several forms of bias exist in this design, including undercoverage bias, response
bias and non-response bias. Readers of a hunting magazine would probably share
positive views about gun ownership. This group of readers is not a representative
sample of the general public when it comes to gun control, which suggests
undercoverage bias. Since this group of readers probably has strong
views about gun control, they might answer more often than the general public,
resulting in non-response bias. There is also an expectation among hunters that
gun control is not a good idea. This expectation might lead to a response bias not
found in the general population. In order to try to avoid this form of bias, a
magazine covering a topic unrelated to guns might be a more appropriate
population from which to select a sample.
Question 5
$A$ is the event that Mr Adams is elected
$C$ is the event that Dr Cooper is elected.
(b) We have:
Question 6
This indicates (very) strong, positive correlation between examination marks and
hours of study.
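(b) The least squares regression line parameters are estimated to be:
$$\widehat{\beta}_1 = \frac{\sum_{i=1}^{15} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{15} x_i^2 - n\bar{x}^2} = 5.1333$$
and:
$$\widehat{\beta}_0 = \bar{y} - \widehat{\beta}_1\bar{x} = -7.7000.$$
Hence the fitted line is:
$$\hat{y} = -7.7000 + 5.1333x.$$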
which we may round to 74. We expect the prediction for $x = 16$ to be more accurate than
one for $x = 20$, because the available $x$ data cover a range of 11.5 to 22: 16 is near
the middle of the sample $x$ values (close to $\bar{x} = 16.5$), whereas 20 is toward the upper limit.
Predictions from a fitted line are most reliable near the centre of the sample data, since
the uncertainty in the fitted line grows as $x$ moves away from $\bar{x}$.
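All of the numbers in this solution follow from the five summary sums given in the question. The Python sketch below uses the $S_{xx}$, $S_{yy}$ and $S_{xy}$ shortcut formulas from the formula sheet. It also uses $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, the standard sample correlation formula, which is assumed here because the working for part (a) is not shown in this extract.

```python
import math

n = 15
sum_x, sum_y = 247.5, 1155
sum_x2, sum_y2, sum_xy = 4218.75, 92999, 19750.5

x_bar, y_bar = sum_x / n, sum_y / n
s_xx = sum_x2 - n * x_bar ** 2        # S_xx = sum of x_i^2 - n x_bar^2
s_yy = sum_y2 - n * y_bar ** 2        # S_yy = sum of y_i^2 - n y_bar^2
s_xy = sum_xy - n * x_bar * y_bar     # S_xy = sum of x_i y_i - n x_bar y_bar

r = s_xy / math.sqrt(s_xx * s_yy)     # sample correlation coefficient
b1 = s_xy / s_xx                      # slope estimate
b0 = y_bar - b1 * x_bar               # intercept estimate
y_hat_16 = b0 + b1 * 16               # prediction for 16 hours of study

print(f"r = {r:.4f}, b0 = {b0:.4f}, b1 = {b1:.4f}, prediction at x = 16: {y_hat_16:.2f}")
```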
C. Table of cumulative normal probabilities
Appendix C
Table of cumulative normal probabilities
The entries in this table are cumulative probabilities for the standard normal
distribution and give $\Phi(z) = P(Z \leq z)$ for $z \geq 0$. For example, $P(Z \leq 1.96) = 0.9750$.
For values of $z < 0$, use $P(Z \leq z) = 1 - P(Z \leq |z|) = 1 - \Phi(|z|)$. For example,
$P(Z \leq -1) = 1 - P(Z \leq 1) = 1 - \Phi(1) = 1 - 0.8413 = 0.1587$.
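The table entries can be reproduced from the error function, since $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The Python sketch below uses `math.erf` to check the two worked examples above and to rebuild the $z = 1.2$ row.

```python
import math


def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(round(phi(1.96), 4))          # the first worked example
print(round(1 - phi(1), 4))         # P(Z <= -1) via the symmetry rule
# Rebuild one row of the table: z = 1.2 with second decimals 0.00 to 0.09.
print(" ".join(f"{phi(1.2 + j / 100):.4f}" for j in range(10)))
```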
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990