CS323: Lecture #1: Types of Errors
Mridul Aanjaneya
January 18, 2023
Types of Errors
1. When doing integer calculations one can often proceed exactly, except of course in certain situations, e.g., division: 5/2 = 2.5. However, when doing floating point calculations, rounding errors are the norm, e.g., 1./3. = .3333333... cannot be expressed exactly on the computer. Thus, the computer commits rounding errors to express numbers with machine precision, e.g., 1./3. = .3333333. Machine precision is about 10^{-7} for single precision and 10^{-16} for double precision. Rounding errors are only one source of approximation error when considering floating point calculations. Some others are listed below.
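These precision limits are easy to observe directly; below is a minimal sketch using NumPy's float32 and float64 types:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
print(np.finfo(np.float32).eps)   # ~1.19e-07  (single precision)
print(np.finfo(np.float64).eps)   # ~2.22e-16  (double precision)

# 1/3 is rounded to machine precision in each format.
print(f"{np.float32(1.0) / np.float32(3.0):.20f}")  # accurate to ~7 digits
print(f"{1.0 / 3.0:.20f}")                          # accurate to ~16 digits
```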
2. Approximation errors come in many forms:
(a) empirical constants: Some numbers are unknown and measured in a laboratory only to limited precision. Others may be known more accurately, but limited precision hinders our ability to express them on a finite precision computer. Examples include Avogadro’s number, the speed of light in vacuum, the charge on an electron, Planck’s constant, Boltzmann’s constant, π, etc. Note that the speed of light is 299792458 m/s exactly, so we are ok for double precision but not single precision (see the sketch after this list).
(b) modeling errors: Parts of the problem under consideration may
simply be ignored. For example, when simulating solids or fluids,
sometimes frictional or viscous effects respectively are not included.
(c) truncation errors: These are also sometimes called discretization
errors and occur in the mathematical approximation of an equation
as opposed to the mathematical approximation of the physics (i.e.,
as in modeling errors). We will see later that one cannot take a
derivative or integral exactly on the computer so we approximate
these with some formula (recall Simpson’s rule from your Calculus
class).
(d) inaccurate inputs: Many times we are only concerned with part of
a calculation and we receive a set of input numbers and produce a
set of output numbers. It is important to realize that the inputs may
have been previously subjected to any of the errors listed above and
thus, may already have limited accuracy. This can have implications
for algorithms as well, e.g., if the inputs are only accurate to 4 decimal
places, it makes little sense to carry out the algorithm to an accuracy
of 8 decimal places. This issue commonly resurfaces in scientific
visualization or physical simulation, where experimental engineers can
be unhappy with visualization algorithms that are “lossy”, while
forgetting that the part that is lost may contain no useful or accurate
information whatsoever.
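To make the remark in (a) concrete, here is a minimal sketch (again assuming NumPy) showing that the exact speed of light survives the round trip through double precision but not single precision:

```python
import numpy as np

c = 299792458  # speed of light in m/s, exact by definition of the meter

print(int(np.float64(c)) == c)  # True: the 53-bit significand holds it exactly
print(np.float32(c))            # 299792448.0: rounded, off by 10 m/s
```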
3. In dealing with errors, we will refer to both the absolute error and the
relative error.
(a) Absolute error = |approximate value - true value|
(b) Relative error = absolute error/|true value|
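For instance, for the rounded value 1./3. = .3333333 from before, a quick sketch of both quantities:

```python
true_value = 1.0 / 3.0   # double precision stand-in for the true value
approx = 0.3333333       # value rounded to single precision's ~7 digits

abs_err = abs(approx - true_value)
rel_err = abs_err / abs(true_value)
print(abs_err)  # ~3.3e-08
print(rel_err)  # ~1.0e-07, i.e., single precision's machine precision
```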
A classic illustration of how rounding errors can be amplified is solving the quadratic equation

    ax^2 + bx + c = 0,    with a = 0.0501, b = -98.78, c = 5.015,

using the familiar quadratic formula

    x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}    (1)

The exact roots, to the digits shown, are

    x_1 = 1971.605916,    x_2 = 0.05077069387

but evaluating formula (1) in 4-significant-digit arithmetic yields

    x_1 = 1972,    x_2 = 0.0998
Although the value of x_1 is reasonably accurate (it is, in fact, the correct answer
to 4 significant digits), the value x_2 is off by almost 100% from the intended
value even when rounded to 4 digits. The reason for this behavior is hiding
in the inaccurate (approximate) representation of the intermediate steps of this
calculation. In particular, the computer algorithm approximates \sqrt{b^2 - 4ac} ≈
98.77, discarding any subsequent digits due to its limited available precision.
Substituting this value into the quadratic formula yields:

    x_{1,2} = \frac{98.78 \pm 98.77}{0.1002}
The numerator involves either a sum or a difference of two near-equal quantities. When adding the two values, we compute the reasonably accurate x_1 = 1972. When subtracting, all the information that was discarded by truncating \sqrt{b^2 - 4ac} ≈ 98.77 now dominates the computed value of the numerator. As a consequence, we obtain the compromised value x_2 = 0.0998. In general, subtraction reveals the error, and division by a small number amplifies it.
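We can reproduce this behavior exactly with Python's decimal module, which lets us round every intermediate operation to 4 significant digits; a minimal sketch:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4  # round every operation to 4 significant digits

a, b, c = Decimal("0.0501"), Decimal("-98.78"), Decimal("5.015")

disc = (b * b - 4 * a * c).sqrt()  # sqrt(b^2 - 4ac) rounds to 98.77
x1 = (-b + disc) / (2 * a)
x2 = (-b - disc) / (2 * a)
print(x1, x2)  # 1972 0.09980 -- x2 has lost almost all of its accuracy
```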
So, subtracting two near-equal quantities is risky . . . can we avoid doing so?
There is, in fact, an alternative formulation:
    x_{1,2} = \frac{(-b \pm \sqrt{b^2 - 4ac})(-b \mp \sqrt{b^2 - 4ac})}{2a(-b \mp \sqrt{b^2 - 4ac})}
            = \frac{b^2 - (b^2 - 4ac)}{2a(-b \mp \sqrt{b^2 - 4ac})}
            = \frac{4ac}{2a(-b \mp \sqrt{b^2 - 4ac})}
            = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}    (2)
Equation (2) subtracts 2 quantities in the denominator where (1) adds them and
vice-versa. This technique is called derationalization. Applying this approach,
we obtain:
    x_1 = 1003,    x_2 = 0.05077

Here the roles are reversed: x_2 is quite accurate, but x_1 is off by about 50%
from the intended value. One might suggest that we selectively use (1) or (2) to
compute x_1 or x_2 respectively, and avoid the subtraction-induced inaccuracies.
Although this may be realistic in this isolated example, in more complex problems
that arise in practice, performing such case-by-case handling may prove
impractical. In many cases, it may even be impossible to predict that a value
lacks accuracy (not knowing the exact value beforehand).
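For this particular problem, though, the selective strategy has a standard implementation (a sketch, not from the notes; the function name is mine): pick the sign in (1) that avoids cancellation to get one root, then recover the other from the product of roots, x_1 x_2 = c/a.

```python
import math

def stable_quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, assuming real roots and a != 0."""
    disc = math.sqrt(b * b - 4 * a * c)
    # b and copysign(disc, b) share a sign, so this addition never cancels.
    q = -0.5 * (b + math.copysign(disc, b))
    return q / a, c / q  # (1) with the safe sign, then (2) for the other root

print(stable_quadratic_roots(0.0501, -98.78, 5.015))
# -> (1971.6059..., 0.0507706...): both roots now come out accurate
```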
Let us take a brief detour and look at an iterative scheme for computing \sqrt{a}.
Starting with an initial guess x_0, we iterate as follows:

    x_{k+1} = \frac{x_k^2 + a}{2 x_k}
Assuming this scheme converges, i.e., \lim_{k \to \infty} x_k = A, gives

    A = \frac{A^2 + a}{2A}  \Rightarrow  2A^2 = A^2 + a  \Rightarrow  A^2 = a  \Rightarrow  A = \sqrt{a}
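A minimal sketch of this iteration (the starting guess and iteration count are arbitrary choices; a handful of iterations already reaches machine precision):

```python
def iterative_sqrt(a, x0=1.0, iters=20):
    """Approximate sqrt(a) using only +, *, / via x <- (x*x + a) / (2*x)."""
    x = x0
    for _ in range(iters):
        x = (x * x + a) / (2 * x)
    return x

print(iterative_sqrt(2.0))  # 1.4142135623730951
print(2.0 ** 0.5)           # agrees to machine precision
```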
Note how we computed \sqrt{a} with just addition, multiplication and division operations! The same principle can be applied to solve a quadratic equation as well. Consider the following scheme (which is a special case of what we will later describe as Newton’s method):
• Start with an initial guess x_0 of the solution.
• Iterate

    x_{k+1} = \frac{a x_k^2 - c}{2 a x_k + b}
For our example equation (a = 0.0501, b = -98.78, c = 5.015), this produces

    x_0 = 1,    x_1 = 0.050313...,    x_2 = 0.0507706...
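A sketch of this scheme in code (the function name is mine), reproducing the iterates above:

```python
def newton_quadratic_root(a, b, c, x0=1.0, iters=10):
    """Iterate x <- (a*x**2 - c) / (2*a*x + b) toward a root of ax^2+bx+c."""
    x = x0
    for _ in range(iters):
        x = (a * x * x - c) / (2 * a * x + b)
    return x

print(newton_quadratic_root(0.0501, -98.78, 5.015, iters=1))  # 0.050313...
print(newton_quadratic_root(0.0501, -98.78, 5.015, iters=2))  # 0.0507706...
```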
As is typically the case, there are trade-offs to consider: with this iterative
method, we require an “initial guess”, and there is little guidance offered on
how to pick a good one. Also, we need a few iterations before we can obtain an
excellent approximation. On the other hand, the method does not require any
square roots, thus being significantly cheaper in terms of computation cost per
iteration. In general, the two different methodologies that we have seen fall into
two broad classes:
• Direct methods: These methods give a “recipe” for directly obtaining
the final solution. The closed-form formulas we saw above for the roots of
a quadratic equation fall in this category.
• Iterative methods: These methods obtain better approximations to the
solution through successive iterations.
The traits and trade-offs of such methods will be a point of focus for CS323,
and are central to the field we call numerical analysis and scientific computing.