Numerical Computing MCSE004

Floating Point
Arithmetic
and Errors
UNIT 1 FLOATING POINT ARITHMETIC
AND ERRORS
Structure Page Nos.
1.0 Introduction 7
1.1 Objectives 8
1.2 Floating Point Representations 8
1.2.1 Floating Point Arithmetic 10
1.2.2 Properties of Floating Point Arithmetic 10
1.2.3 Significant Digits 11
1.3 Error - Basics 15
1.3.1 Rounding-off Error 16
1.3.2 Absolute and Relative Errors 18
1.3.3 Truncation Error 20
1.4 Summary 21
1.5 Solutions/Answers 22
1.6 Exercises 23
1.7 Solutions to Exercises 24
1.0 INTRODUCTION
Numerical Analysis is the study of computational methods for solving scientific and
engineering problems by using basic arithmetic operations such as addition,
subtraction, multiplication and division. The results obtained by using such methods,
are usually approximations to the true solutions. These approximations to the true
solutions introduce errors but can be made more accurate up to some extent. There can
be several reasons behind this approximation, such as the formula or method used to
solve a problem may not be exact. i.e., the expression of sin x can be evaluated by
expressing it as an infinite power series. This series has to be truncated to the finite
number of terms. This truncation introduces an error in the computed result. As a
student of computer science you should also consider the computer oriented aspect of
this concept of approximation and errors, say the machine involved in the computation
doesn’t have the capacity to accommodate the data or result produced by calculation
of a numerical problem and hence the data is to be approximated in to the limitations
of the machine. When this approximated data is to be further utilized in successive
calculations, then it causes the propagation of error, and if the error starts growing
abnormally then some big disasters may happen. Let me cite some of the well-known
disasters caused because of the approximations and errors.
Instance 1: On February 25, 1991, during the Gulf War, an American Patriot Missile
battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud Missile.
The Scud struck an American Army barracks and killed 28 soldiers. A report of the
General Accounting office, GAO/IMTEC-92-26, entitled Patriot Missile Defense:
Software Problem Led to System Failure at Dhahran, Saudi Arabia reported on the
cause of the failure. It turns out that the cause was an inaccurate calculation of the
time since boot due to computer arithmetic errors.
Instance 2: On June 4, 1996, an unmanned Ariane 5 rocket launched by the European

Space Agency exploded just forty seconds after lift-off. The rocket was on its first
voyage, after a decade of development costing $7 billion. A board of inquiry
investigated the causes of the explosion and in two weeks issued a report. It turned out
that the cause of the failure was a software error in the inertial reference system.
7
Numerical
Computing -I Specifically, a 64-bit floating point number relating to the horizontal velocity of the
rocket with respect to the platform was converted to a 16-bit signed integer. The
number was larger than 32,768, the largest integer storeable in a 16-bit signed integer,
and thus the conversion failed.
In this Unit, we will describe the concept of number approximation, significant digits,
the way, the numbers are expressed and arithmetic operations are performed on them,
types of errors and their sources, propagation of errors in successive operations etc.
The Figure 1 describes the stages of Numerical Computing.
Mathematical
concepts
Mathematical model Computer and

Physical
software
problem
Numerical method
Error causing states Implementation
Improve Solution
Change algorithm
Modify model
method
Wrong
Modification to
Validity
reduce error
Correct
Application
Figure 1: Stages of Numerical Computation
1.1 OBJECTIVES
After studying this unit, you should be able to:
• describe the concept of fixed point and floating point numbers representations;
• discuss rounding-off errors and the rules associated with round-off errors;
• implement floating-point arithmetic on available data;
• conceptual description of significant digits, and
• analysis of different types of errors – absolute error, relative errors, truncation
error.
1.2 FLOATING POINT REPERESENTATIONS
In scientific calculations, very large numbers such as velocity of light or very small
numbers such as size of an electron occur frequently. These numbers cannot be
satisfactorily represented in the usual manner. Therefore, scientific calculations are
usually done by floating point arithmetic.
This means that we need to have two formats to represent a number, which are fixed
point representation and floating point representation. We can transform data of one
8
Floating Point
Arithmetic
format in to another and vice versa. The concept of transforming fixed point data into and Errors
floating point data is known as normalisation, and it is done to preserve the maximum
number of useful information carrying digits of numbers. This transformation
ultimately leads to the calculation errors. Then, you may ask what is the benefit of
doing this normalisation when it is contributing to erroneous results. The answer is
simply to proceed with the calculations keeping in mind the data and calculation
processing limitation of machine.
Fixed-Point numbers are represented by a fixed number of decimal places. Examples

are 62.358, 1.001, 0.007 all correctly expressed up to 3rd decimal place.
Floating-Point numbers have a fixed number of significant places. Examples are

6.236 x 103 1.306 x 10-3 which are all given as four significant figures. The position
of the decimal point is determined by the powers of base (in decimal number system it
is 10) 1.6236 x 103.
3
1.6236 x 10 Exponent
Base
Mantissa
Let us first discuss what is a floating-point number. Consider the number 123. It can
be written using exponential notation as:
1.23 x 102, 12.3 x 102, 123 x 102, 0.123 x 102, 1230 x 102, etc.
Notice how the decimal point “floats” within the number as the exponent is changed.
This phenomenon gives floating point numbers their name. The representations of the
number 123 above are in kind of standard form. The first representation, 1.23 x 102, is
in a form called “scientific notation”.
In scientific computation, a real number x is usually represented in the form
x = ±(d1d 2 ........d n ) x10m (1)
where d 1 , d 2 ,.........d n are decimal digits and m is an integer called exponent.

( d 1d 2 ........d n ) is called significand or mantissa. We denote this representation by
fl(x). A floating-point number is called a nomalised floating-point number if d1 ≠ 0 or
else d2 = d3 = …. = dn = 0. The exponent m is usually bounded in a range
–M<m<M (2)
In scientific notation, such as 1.23 x 102 in the above example, the significand is
always a number greater than or equal to 1 and less than 10. We may also write
1.23E2.
Standard computer normalisation for floating point numbers follows the fourth form
namely, 0.123 x 103 in the list above.
In the standard normalized floating-point numbers, the significand is greater than or

equal to 0.1, and is always less than 1.
In floating point notation (1), if fl(x) ≠ 0 and m ≥ M (that is, the number becomes too
large and it cannot be accommodated), then x is called an over-flow number and if
9
Numerical
Computing -I m ≤ – M (that is the number is too small but not zero) the number is called an under-
flow number. The number n in the floating-point notation is called its precision.
1.2.1 Floating Point Arithmetic
When arithmetic operations are applied on floating-point numbers, the results usually
are not floating-point numbers of the same length. For example, consider an operation
with 2 digit precision floating-point numbers (i.e., those numbers which are accurate
up to two decimal places) and suppose the result has to be in 2 digit floating point
precision. Consider the following example,
x = 0.30 x101 , y = 0.66 x10 −6 , z = 0.10 x101
then x + y = 0.300000066 x 101 = 0.30 x 101

x x y = 0.198 x 10– 5 =0 (3)
z/x = 0.333… x 100 = 0.33 x 100
Hence, if θ is one of the arithmetic operations, and θ* is corresponding floating-point

operation, then we find that
x θ* y ≠ x θ y
However, x θ y = fl(x θ y) (4)
1.2.2 Properties of Floating Point Arithmetic
Arithmetic using the floating-point number system has two important properties that
differ from those of arithmetic using real numbers.
Floating point arithmetic is not associative. This means that in general, for floating
point numbers x, y, and z:
• ( x + y) + z ≠ x + ( y + z )
• ( x . y) . z ≠ x . ( y . z)
Floating point arithmetic is also not distributive. This means that in general,
• x . ( y + z) ≠ ( x . y) + ( x . z)
Therefore, the order in which operations are carried out can change the output of a
floating-point calculation. This is important in numerical analysis since two
mathematically equivalent formulas may not produce the same numerical output, and
one may be substantially more accurate than the other.
Example 1: Let a = 0.345 x 100, b = 0.245 x 10–3 and c = 0.432 x 10–3. Using
3-digit decimal arithmetic with rounding, we have
b + c = 0.000245 + 0.000432 = 0.000677 (in accumulator)
= 0.677 × 10–3
a + (b + c) = 0.345 + 0.000677 (in accumulator)
= 0.346 × 100 (in memory) with rounding
a + b = 0.345 × 100 + 0.245 × 10–3
= 0.345 × 100 (in memory)
(a + b) + c = 0.345432 (in accumulator)
= 0.345 × 100 (in memory)
Hence, we see that,

(a + b) + c ≠ a + (b + c).
10
Floating Point
Arithmetic
Example 2: Suppose that in floating point notation (1) given above, n = 2 and m = 11. and Errors
Consider x = 0.10 x1010 , y = −0.10 x1010 and z = 0.10 x101. Then,
( x + y ) + z = 0.1x101 while x + ( y + z ) = 0.0 .
Hence, ( x + y) + z ≠ x + ( y + z ) .
From the above examples, we note that in a computational process, every floating-
point operation gives rise to some error, which may then get amplified or reduced in
subsequent operations.
Check Your Progress 1

( a − b) a b
1) Let a = 0.41, b = 0.36 and c = 0.70. Prove ≠ − .
c c c
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) Let a = .5665E1, b = .5556E – 1, c = .5644E1. Verify the associative property for

the floating point numbers i.e., prove (a + b) – c # (a – c) + b.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) Let a = .5555E1, b = .4545E1, c = .4535E1. Verify the distributive property for

these floating point numbers, i.e., prove a(b – c) # ab –ac.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
1.2.3 Significant Digits
The concept of significant digits has been introduced primarily to indicate the
accuracy of a numerical value. For example, if, in the number y = 23.40657, only the
digits 23406 are correct, then we may say that y has given significant digits and is
correct to only three decimal places.
The number of significant digits in an answer in a calculation depends on the number
of significant digits in the given data, as discussed in the rules below.
When are Digits Significant?
Non-zero digits are always significant. Thus, 22 has two significant digits, and 22.3
has three significant digits. The following rules are applied when zeros are
encountered in the numbers,
a) Zeros placed before other digits are not significant; 0.046 has two significant
digits.
b) Zeros placed between other digits are always significant; 4009 kg has four
significant digits.
c) Zeros placed after other digits but behind a decimal point are significant;
7.90 has three significant digits.
11
Numerical
Computing -I d) Zeros at the end of a number are significant only if they are behind a decimal
point as in (c). For example, in the number 8200, it is not clear if the zeros are
significant or not. The number of significant digits in 8200 is at least two, but
could be three or four. To avoid uncertainty, we use scientific notation to place
significant zeros behind a decimal point.
8.200 * 10 3 has four significant digits,
8.20 * 10 3 has three significant digits,
8.2 * 10 3 has two significant digits.
Note: Accuracy and precision are closely related to significant digits. They are related
as follows:
1) Accuracy refers to the number of significant digits in a value. For example, the
number 57.396 is accurate to five significant digits.
2) Precision refers to the number of decimal positions, i.e. the order of magnitude of
the last digit in a value. The number 57.396 has a precision of 0.001 or 10–3.
Example 1: Which of the following numbers has the greatest precision?

a) 4.3201, b) 4.32, c) 4.320106.
Solution:
a) 4.3201 has a precision of 10–4
b) 4.32 has a precision of 10–2
c) 4.320106 has a precision of 10–6
The last number has the greatest precision.
Example 2: What is the accuracy of the following numbers?

a) 95.763, b) 0.008472, c) 0.0456000, d) 36 e) 3600.00.
Solution:
a) This has five significant digits.
b) This has four significant digits. The leading or higher order zeros are only place
holders.
c) This has six significant digits.
d) This has two significant digits.
e) This has six significant digits. Note that the zeros were made significant by
writing .00 after 3600.
Significant digits in Multiplication, Division, Trigonometry functions, etc.
In a calculation involving multiplication, division, trigonometric functions, etc., the

number of significant digits in an answer should equal the least number of significant
digits in any one of the numbers being multiplied, divided, etc.
Thus, in evaluating sin(kx), where k = 0.097 m–1 (two significant digits) and
x = 4.73 m (three significant digits), the answer should have two significant digits.
Note that whole numbers have essentially an unlimited number of significant digits.
As an example, if a hairdryer uses 1.2 kW of power, then 2 identical hairdryers use
2.4 kW.
1.2 kW{2 significant digit} x 2{unlimited significant digit) = 2.4 kW
{2 significant digit}
12
Floating Point
Arithmetic
Significant digits in Addition and Subtraction and Errors
When quantities are being added or subtracted, the number of decimal places (not
significant digits) in the answer should be the same as the least number of decimal
places in any of the numbers being added or subtracted.
Keep one extra digit in Intermediate Answers
When doing multi-step calculations, keep at least one or more significant digits in
intermediate results than needed in your final answer.
For instance, if a final answer requires two significant digits, then carry at least three
significant digits in calculations. If you round-off all your intermediate answers to
only two digits, you are discarding the information contained in the third digit, and as
a result the second digit in your final answer might be incorrect. (This phenomenon is
known as “round-off error.”)
This truncation process is done either through rounding off or chopping, leading
to round off error.
Example 3: Let x = 4.5 be approximated to x* = 4.49998. Then,

xx – x = – 0.00002,
| x − xx | 1 1 –5 1
= 0.0000044 ≤ 0.000005 ≤ (.00001) = 10 = × 101–6
x 2 2 2
Hence, x* approximates x correct to 6 significant decimal digits.
Wrong way of writing significant digits
1) Writing more digits in an answer (intermediate or final) than justified by the

number of digits in the data.
2) Rounding-off, say, to two digits in an intermediate answer, and then writing three
digits in the final answer.
Example 4: Expressions for significant digits and scientific notation associated with a
floating point number.
Number Number of Scientific

Significant Notation
Figures
0.00682 3 6.82 * 10-3 Leading zeros are not significant.

1.072 4 1.072 (* 100) Embedded zeros are always
significant.
300 1 3 * 102 Trailing zeros are significant only if
the decimal point is specified.
300 3 3.00 * 102
300.0 4 3.000 * 102
Loss of Significant Digits

One of the most common (and often avoidable) ways of increasing the importance of
an error is known as loss of significant digits.
13
Numerical
Computing -I Loss of significant digits in subtraction of two nearly equal numbers:
Subtraction of two nearly equal number gives the relative error
x y
rx– y = rx – ry
x− y x− y
which becomes very large. It has largest value when rx and ry are of opposite signs.
Suppose we want to calculate the number z = x – y and x* and y* are approximations

for x and y respectively, accurate to r digits and assume that x and y do not agree in
the most left significant digit, then z* = x* – y* is as good an approximation to x – y as
x* and y* to x and y.
But, if x* and y* agree at left most digits (one or more), then the left most digits will
cancel and there will be loss of significant digits.
The more the digits on left agrees, the more loss of significant digits. A similar loss in
significant digits occurs when a number is divided by a small number (or multiplied
by a very large number).
Remark 1: To avoid this loss of significant digits in algebraic expressions, we must

rationalise these numbers. If no alternative formulation to avoid the loss of significant
digits is possible, then we can carry more significant digits in calculation using
floating-point numbers in double precision.
Example 5: Solve the quadratic equation x2 + 9.9 x – 1 = 0 using two decimal digit
arithmetic with rounding.
Solution:
Solving the quadratic equation, we have one of the solutions as
− b + b 2 − 4ac −9.9 + (9.9) 2 − 4.1.( − 1)

x= =
2a 2
−9.9 + 102 −9.9 + 10 0.1

= = = =0.05
2 2 2
while the true solutions are – 10 and 0.1. Now, if we rationalize the expression, we
obtain
− b + b 2 − 4ac −4ac
= =
2a 2a(b + b 2 − 4ac )
−2c 2 2 2 2
= = = = = ≅ 0.1 . (0.1000024)
b + b − 4ac )
2
9.9 + 102 9.9 + 10 19.9 20
which is one of the true solutions.
14
Floating Point
Arithmetic
and Errors
1.3 ERROR - BASICS
What is Error?
An error is defined as the difference between the actual value and the approximate
value obtained from the experimental observation or from numerical computation.
Consider that x represents some quantity and xa is an approximation to x, then
Error = actual value – approximate value = x – xa
How errors are generated in computers?
Every calculation has two parts, one is operand and other is operator. Hence, any
approximation in either of the two contributes to error. Approximations to operands
causes propagated error and approximation to operators causes generated errors. Let
us discuss how the philosophy behind these errors is related to computers.
Operand Point of View: Computers need fixed Operator Point of View: Computers need some
numbers to do processing, which is mostly not operation to be performed on the operands
available. Hence, we need to transform the output available. Now, the operations that occur in
of an operation to a fixed number by performing computers are at bit level and complex operations
truncation of series, rounding, chopping etc. This are simplified. There are, hence, small changes in
contributes to difference between exact value and actual operations and operations performed by
approximated value. These errors get further computer. This difference in operations produces
amplified in subsequent calculations as these errors in calculations, which get further amplified
values and the results produced are further utilized in subsequent calculations. This error contribution
in subsequent calculations. Hence, this error is referred to as generated error.
contribution is referred to as propagated error.
What are the sources of error?
The sources of error can be classified as (i) data input errors, (ii) errors in algorithms
and (iii) errors during computations.
Sources of Error?
Data Input Algorithms Computations

Input Error: The Algorithmic Errors: If direct Computational Errors:
input information is algorithms based on finite Even when elementary
rarely exact since it sequence of operations are used, operations, such as
comes from errors due to limited steps don’t multiplication or division
experiments and any amplify the existing errors. But are used, the number of
experiment can give if algorithms with infinite steps digits increases greatly so
results of only limited are used, the algorithm has to be that the number cannot be
accuracy. Moreover, stopped after a finite number of held fully in register
the quantity used can steps available in a given
be represented in a computer. In such a case, a
computer for only a certain number of digits
limited number of must be discarded, and this
digits. again leads to error
15
Numerical
Computing -I Type of Errors?
We list below the types of errors that are encountered while carrying out numerical
calculations to solve a problem.
1) Round off errors arise due to floating point representation of initial data in the
machine. Subsequent errors in the solution due to this are called propagated
errors.
2) Due to finite digit arithmetic operations, the computer produces generated errors
or rounding errors.
3) Error due to finite representation of an inherently infinite process. For example,
consider the use of a finite number of terms in the infinite series expansions of
Sin x, Cos x or f(x) by Maclaurin’s or Taylor Series expression. Such errors are
called truncation errors.
Remark 2: Sensitivity of an algorithm for a numerical process used for computing

f(x): if small changes in the initial data x lead to large errors in the value of f(x), then
the algorithm is called unstable.
How error measures accuracy?
The two terms “error” and “ accuracy” are inter-related, one measures the other, in the
sense less the error is, more the accuracy is and vice versa. In general, the errors
which are used for determination of accuracy are categorized as:
a) Absolute error b) Relative error c) Percentage error
Now, we define these errors.
a) Absolute Error: Absolute error is the magnitude of the difference between the
true value x and the approximate value xa. Therefore, absolute error = | x – xa |.
b) Relative Error: Relative error is the ratio of the absolute error and actual value.
Therefore, relative error = |x – xa | / x .
c) Percentage Error: Percentage error is defined as,

percentage error = 100er= 100 * |x – xa|/x.
Now, we discuss each of the errors defined above, and its propagation in detail.
1.3.1 Rounding-off Error
There are two ways of translating a given real number x into floating-point number
f(x) – rounding and chopping. For example, suppose we want to represent the number
5562 in the normalized floating point representation. The representations for different
values of n are as follows:
n = 1, fl(5562) = .5 * 10 4 chopped
= .6 * 10 4 rounded . (5)
n = 2, fl(5562) = .55 * 10 4 chopped

= .56 * 10 4 rounded . (6)
n = 3, fl(5562) = .556 * 10 4 chopped

= .556 * 10 4 rounded . (7)
16
Floating Point
Arithmetic
Rules for rounding-off: Whenever, we want to use only a certain number of digits and Errors
after the decimal point, then number is rounded-off to that many digits. A number is
rounded-off to n places after decimal by seeing (n+1)th place digit dn+1, as follows:
i) If dn+1 < 5, then it is chopped

ii) If dn+1 >5, then dn = dn + 1
iii) If dn = 5, and dn is odd then dn = dn + 1 else the number dn+1 is chopped.
The difference between a number x and fl(x) is called the round-off error. It is clear
that the round-off error decreases when precision increases. The round-off error also
depends on the size of x and is therefore represented relative to x as
fl( x ) = x (1 + δ). (8)
It is not difficult to show that
δ < .5 *10 −( n −1) in rounding
while, − 10 − ( n −1) < δ ≤ 0 in chopping . (9)
Definition 1: Let x be a real number and x* be a real number having non-terminal

decimal expansion, then we say x* represents x rounded to k decimal places if
1
x − x * ≤ 10 − k , where k is a positive integer.
2
Example 6: If p = 3.14159265, then find out to how many decimal places the
approximate value of 22/7 is accurate?
Solution: We find that

22
p− = 0.00126449
7
1
Since, 0.00126449 < 0.005 = 10 − 2 . Hence, k = 2, and we conclude that the
2
approximation is accurate to 2 decimal places or three significant digits.

1) Round off the following numbers to four significant digits.
(i) 450.92, (ii) 48.3668, (iii) 9.3265, (iv) 8.4155,
(v) 0.80012, (vi) 0.042514, (vii) 0.0049125, (viii) 0.00020215
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) Write the following numbers in floating-point form rounded to four significant

digits.
(i) 100000, (ii) –0.0022136, (iii) –35.666
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
17
Numerical
Computing -I 3) The numbers 28.483 and 27.984 are both approximate and are correct up to the
last digit shown. Compute their difference. Indicate how many significant digits
are present in the result and comment.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
4) Consider the number 2/3. Its floating point representation rounded to 5 decimal
places is 0.66667. Find out to how many decimal places the approximate value of
2/3 is accurate?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
5) Find out to how many decimal places the value 355/133 is accurate as an
approximation to p ?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
1.3.2 Absolute and Relative Errors
We shall now discuss two types of errors that are commonly encountered in numerical
computations. You are already familiar with the rounding off error. These rounded-off
numbers are approximations of the actual values. In any computational procedure, we
make use of these approximate values instead of the true values. How do we measure
the goodness of an approximation fl(x) to x ? The simplest measure which naturally
comes to our mind is the difference between x and fl(x). This measure is called the
error. Formally, we define error as a quantity which satisfies the identity
x = fl(x) + e, (10)
If error e is considerably small, then we say that fl(x) is a good approximation of x.

Error can be positive or negative. We are in general interested in the magnitude or
absolute value of the error which is defined as follows
e = x − fl(x) (11)
Sometimes, when the true value x is very large or very small, we prefer to study the
error by comparing it with the true value. This is known as relative error and we
define this error as
x − f ( x)
relative error = rx =
x
and
x − fl(x) e
relative error = = (12)
x x
18
Floating Point
Arithmetic
Note that in certain computations, the true value may not be available. In that case, we and Errors
replace the true value by the computed approximate value in the definition of relative
error.
Theorem: If fl(x) is the n-digit floating point representation in base β of a real

number x, then rx the relative error in x, satisfies the following:
1 1– n
i) rx < β if rounding is used.
2
ii) 0 ≤ rx ≤ β1 – n if chopping is used.
For proving i), you may use the following:
1
Case 1. dn+1 < β, then fl(x) = ± (.d1d2…dn)β e
2
x-fl(x) = dn+1, dn+2 …β
e– n–1
1 1
≤ β.β e– n–1 = β e– n
2 2
1
Case 2. dn+1 ≥ β,
2
fl(x) = ± {(.d1d2…dn)β e+β e– n}
x-fl(x) = . − d n +1 , d n + 2 . β e-n-1 + β e − n
= β e– n–1 dn+1 . dn+2 ৄ β
1 1
≤ β e– n–1 × β = β e– n
2 2
Example 7: The true value of p is 3.14159265… In menstruation problems the value

22/7 is commonly used as an approximation to p . What is the error in this
approximation?
Solution: The true value of p is p = 3.14159265.
Now, we convert 22/7 to decimal form, so that we can find the difference between the
approximate value and true value. Then, the approximate value of
22
p is = 3.14285714
7
Therefore, absolute error = 0.00126449 and relative-error = 0.00040249966.
The round-off error of computer representation of the number p depends on how

many digits are left out. Make sure that you understand each line of the following
rounding off of the number p :
Number of digits Approximation for p absolute error relative error

1 3.100 0.041593 0.0132%
2 3.140 0.001593 0.0507%
3 3.142 0.000407 0.0130%
Round-off errors may accumulate, propagate and even lead to catastrophic

cancellations leading to loss of accuracy of numerical calculations.
19
Numerical
Computing -I Check Your Progress 3
1) Let x* = .3454 and y* = .3443 be approximations to x and y respectively correct to
3 significant digits. Further, let z* = x* – y* be the approximation to x – y. Then
show that the relative error in z* as an approximation to x – y can be as large as
100 times the relative error in x or y.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) Round the number x = 2.2554 to three significant figures. Find the absolute error
and the relative error.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) If π = 3.14 instead of 22/7, find the relative error and percentage error.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
4) Determine the number of correct digits in s = 0.2217, if it has a relative

error, ε r = 0.2 * 10 −1.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
5) Round-off the number 4.5126 to four significant figures and find the relative
percentage error.
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
1.3.3 Truncation Error
Truncation error is a consequence of doing only a finite number of steps in a

calculation that would require an infinite number of steps to do exactly. A simple
example of a calculation that will be affected by truncation error is the evaluation of
an infinite sum. The computer uses only a finite number of terms and the terms that
are left out lead to truncation error.
20
Floating Point
Arithmetic
Numerical integration is another example of an operation that is affected by truncation and Errors
error. A quadrature formula works by evaluating the integrand at a finite number of
points and using smooth functions to approximate the integrand between those points.
The difference between those smooth functions and the actual integrand leads to
truncation error.
Taylor series represents the local behaviour of a function near a give n point. If one
replaces the series by the n-th order polynomial, the truncation error is said to be
order of n, or O(hn), where h is the distance to the given point. Consider the
irrational number e
e = 2.71828182845905…
and compare it with the Taylor series of the function exp(x) near the given point x = 0.
exp( x ) = 1 + x + x 2 + x 3 6 + .....
Let us check a few Taylor series approximations of the number e = exp(1):
order of n approximation for e absolute error relative error

3 2.500000 0.218282 8.030140%
4 2.666667 0.051615 1.898816%
5 2.708333 0.009948 0.365984%
Example 8: Find the value of e correct to three decimal places.
1 1 1
Solution: Recall that e = 1+ + + + .........
2! 3! 4!
The series is to be truncated such that the finite sum equals e to three decimal places.
This means the must be less than 0.0005. Suppose that the tail starts at n = k+1. Then,
∞
1 1 1
∑
n = k +1 n !
= +
(k + 1)! (k + 2)!
....... + ....
1 1 1
< [1 + + + ......
(k + 1)! (k + 1) (k + 1) 2
1  (k + 1)  1
=   = < 0.0005
(k + 1)! 1 − /(k + 1)  k !k
For k = 6, This expression is satisfied and the truncated value of e = 2.7181.
1.4 SUMMARY
In this unit, we have defined the floating point numbers and their representation for
usage in computers. We have defined accuracy and number of significant digits in a
given number. We have also discussed the sources of errors in computation. We have
defined the round-off and truncation errors and their propagation in later computations
21
Numerical
Computing -I using these values, which contains errors. Therefore, care must be taken to analyise
the computations, so that we are sure that the output of computations is meaningful.
Total error
Modelling errors Inherent errors Numerical errors Blunders
Missing Human
Information imperfection
Data Conversion Roundoff Truncation

errors errors errors errors
Measuring Computing Numerical

method machine method
Figure 2: Types of error and their contribution to total errors
1.5 SOLUTIONS/ANSWERS
1) Using two decimal digit arithmetic with rounding we have,

(a − b) a b
= .71 × 10–1 and − = .59 – .51 = .80 × 10–1
c c c
(a − b)
while true value of = 0.071428 ….
c
(a − b) a b
Therefore, ≠ −
c c c
2) Do as 1) above.
3) Do as 1) above.
1) (i) 50.9 (ii) 48.37 (iii) 9.326 (iv) 8.416 (v) 0.8001 (vi) 0.04251 (vii) 0.004912
(viii) 0.0002022
2) (i) 1000 * 102 or 0.1000 * 106 (ii) –0.2214 * 10–2 (iii) –0.3567 * 102
22
Floating Point
Arithmetic
and Errors
3) We have 28.483 – 27.984 = 00.499. The result has only three significant digits.
This is due to the loss of significant digits during subtraction of nearly equal
numbers.
1
4) We find that 2 3 − 0.66667 = 0.0000033... < 10 −5
2
We find, k = 5. Therefore, the approximation is accurate to 5 decimal places.
5) Left as an exercise.
1 1–3
1) Given, rx, ry, ≤ 10
2
z* = x* – y* = 0.3454 – 0.3443 = 0.0011 = 0.11 × 10–2.
This is correct to one significant digit since last digits 4 in x* and 3 in y* are not
reliable and second significant digit of i* is derived from the fourth digits of x*
and y*.
1 1  1  –2
Max. rz = 101–1 = = (100).   .10 ≥ 100 rx, 100 ry
2 2  2
2) The rounded-off number is 2.25. The absolute error is 0.0054.

0.0054
The relative error is ≈ = 0.0024. The percentage error is 0.24%.
2.25
 22  22
3) Relative error =  − 3.14  = 0.00093. Percentage error = 0.093 %.
 7  7
4) Absolute error = 0.2 * 10 −1 * 0.2217 = 0.04493. Hence x has only one correct digit
x ≈ 0.2 .
5) The number 4.5126 round-off to four significant figures is 4.153.

− 0.0004
Relative percentage error = * 100 = −0.0088% .
4.5126
1.6 EXERCISES
E1) Give the floating-point representation of the following numbers in 2 decimal
digit and 4 decimal digit floating point number using (i) rounding and (ii)
chopping.
(a) 37.21829
(b) 0.022718
(c) 3000527.11059
E2) Show that a(b – c) ≠ ab – ac, where, a = .5555 × 101, b = .4545 × 101,
c = .4535 × 101.
E3) How many bits of significance will be lost in the following subtraction?
37.593621 – 37.584216. Assume each number is correct to seven significant
digits.
23
Numerical
Computing -I E4) What is the relative error in the computation of x – y, where x = 0.3721448693
and y = 0.3720214371 with five decimal digit of accuracy?
E5) Find the smaller root in the magnitude of the quadratic equation
x 2 + 111.11x + 1.2121 = 0 , using five-decimal digit floating point chopped
arithmetic.
1.7 SOLUTIONS TO EXERCISES
E1) a) rounding chopping

.37 × 102 .37 × 102
.3722 × 102 .3721 × 102
b) .23 × 10– 1 .22 × 10– 1

.2272 × 10– 1 .2271 × 10– 1
c) .31 × 10 2 .30 × 102

.3056 × 102 .3055 × 102
E2) Let, a = .5555 × 101, b = .4545 × 101, c = .4535 × 101

b – c = .0010 × 101 = .1000 × 10–1
a(b – c) = (.5555 × 101) × (.1000 × 10–1) = .05555 × 100 = .5550 × 10–1
ab = (.5555 × 101) (.4545 × 101) = (.2524 × 102)
ac = (.5555 × 101) (.4535 × 101) = (.2519 × 102)
and ab – ac = .2524 × 102 – .2519 × 102 = .0005 × 102 = .5000 × 10–1
Hence a(b – c) ≠ ab – ac.
E3) 37.593621 – 37.584216 = (0.37593621)102 – (0.37584216)102

= x* – y* = (0.00009405)102
The numbers are, correct to seven significant digits. Then, in eight digit
floating-point arithmetic, the number can be written as
z* = x* – y* = (0.94050000) 10 −2 But as an approximation to z = x – y, z* is
good only to three digits, since the fourth significant digit of z* is derived from
the eighth digits of x* and y*, and both possibly contains errors. Here, while
the error in z* as an approximation to z = x – y is at most the sum of the errors in
x* and y*, the relative error in z* is possibly 10,000 times the relative error in x*
or y*. Loss of significant digits is, therefore, dangerous only if we wish to
keep the relative error small.
1
Given rx , ry < 10 1−7 , z* = (0.9405)10 −2 , is correct to three significant digits.
2
1 1
Max rz = 10 1− 3 = 10000. 10 −6 ≥ (1000) rz (10000) ry
2 2
E4) With five decimal digit accuracy x* = 0.37214 × 100 , y* = 0.37202 × 100 ,
x* – y* = 0.00012 while x – y = 0.0001234322.
( x − y ) − ( x* − y* ) 0.0000034322
= ≈ 3 × 10 −2 .
x− y 0.0001234322
24
Floating Point
Arithmetic
The magnitude of this relative error is quite large when compared with the and Errors
relative errors of x* and y* (which cannot exceed 5 × 10–5 and in this case it is
approximately 1.3 × 10–5)
E5) Using the formula

−b ± b 2 − 4ac −111.11+111.09
x= , we get, x1 = = – 0.01000
2a 2
while the true solution is x1 = −0.010910, correct to the number of digits
shown.
2c
However, if we calculate x1 as x1 = , we get
b + b 2 − 4ac
−2 x 1.2121 −2.4242
x =
1 =
111.11 + 111.09 222.20
24242
=− = − 0.0109099 = − .0109099
2222000
which is accurate to five digits.
25
Numerical
Computing -I
UNIT 2 SOLUTION OF ALGEBRAIC AND
TRANSCENDENTAL
EQUATIONS
Structure Page Nos.
2.0 Introduction 26
2.1 Objectives 27
2.2 Initial Approximation to a Root 27
2.3 Bisection Method 28
2.3.1 Error Analysis 29
2.4 Regula Falsi Method 29
2.5 Newton’s Method 30
2.5.1 Error Analysis 33
2.6 Secant Method 35
2.7 Method of Successive Iteration 36
2.8 Summary 38
2.9 Exercises 39
2.0 INTRODUCTION
We often come across equations of the forms x4 − 3 x3 + x2 + 6 x − 5 =0 or

ex + x − 2 = 0 etc. Finding one or more values of x which satisfy these equations is
one of the important problems in Mathematics.
An equation of the type f(x) = 0 is algebraic if it contains power of x, that is, f(x) is a
polynomial. The equation is called transcendental, if it contains powers of x,
exponential functions, logarithm functions etc.
Example of algebraic equatoins:
2 x = 5, x 2 + x = 1, x 7 = x(1 + 2 x) .
Example of transcendental equations
x + sin x = 0, e x
= x, tan x = x.
As we know, direct methods can be used to solve the polynomial equations of fourth
or lower orders. We do not have any direct methods for finding the solution of higher
order polynomial equations or transcendental equation. In these cases, we use
numerical methods to solve them.
In this unit , we shall discuss some numerical methods which give approximate
solutions of an equation f(x) = 0. These methods are iterative in nature. An iterative
method gives an approximate solution by repeated application of a numerical process.
In an iterative method, we start with an initial solution and the method improves this
solution until it is improved to acceptable accuracy.
Properties of polynomial equations:
i) The total number of roots of an algebraic equation is the same as its degree.
ii) An algebraic equation can have at most as many positive roots as the number of
changes of sign in the coefficients of f(x).
26
Solution of
Algebraic and
iii) An algebraic equation can have at most as many negative roots as the number of Transcendental
changes of sign in the coefficient of f( − x). Equations
iv) If f(x) = a0xn + a1xn − 1 + a2xn − 2 + ... + an − 1x + an have roots α1,α2,...,αn, then the
following hold good:
− a1 a2 an
∑αi
i =
a0
, ∑α α
i< j
i j =
a0
, ∏αi
i = (−1) n
a0
.
2.1 OBJECTIVES
After studying this unit, you should be able to :
• find an initial guess of a root;

• use bisection method;
• use Regula-falsi method;
• use Newton’s Method;
• use Secant Method, and
• use successive iterative method.
2.2 INITIAL APPROXIMATION TO A ROOT
All the numerical methods have in common the requirement that we need to make an
initial guess for the root. Graphically, we can plot the equation and make a rough
estimate of the solution. However, graphic method is not possible to use in most cases.
We with to determine analytically an approximation to the root.
Intermediate Value Theorem
This theorem states that if f is a continous function on [a, b] and the sign of f(a) is
different from the sign of f(b), that is f(a)f(b) < 0, then there exists a point c, in the
interval (a, b) such that f(c) = 0. Hence, any value c ε (a, b) can be taken as an initial
approximation to the root.
Example 1: Find an initial guess to find a root of the equation, 2x – log10 x = 7.

Solution: Let f(x) = 2 x – log10 x – 7. The values of function f are as given in Table 1.
Table 1
X 1 2 3 4
f(x) −5 − 3.301 − 1.477 0.397
We find f(3)f(4) < 0. Hence, any values in (3, 4) can be taken as an initial guess.
Example 2: Estimate an initial guess to find a root of the equation,

2x – 3 sin x – 5 =0.
Solution: Let f(x) = 2x – 3 sin x – 5 . Note that f(−x) = − 2x + 3 sin x – 5 which is
always negative. Therefore, the function f(x) has no negative real roots. We tabulate
the values of the function for positive x, in Table 2.
Table 2
x 0 1 2 3
f(x) −5 − 5.5224 − 3.7278 0.5766
Since f(2) and f(3) are of opposite signs, a root lies between 2 and 3. The initial guess
can be taken as any value in (2, 3).
27
Numerical
Computing -I
2.3 BISECTION METHOD
This is one of the simplest methods and is based on the repeated application of the
intermediate value theorem.
The bisection method is defined as follows:

i) Find an interval (a, b) in which in root lies, using intermediate value theorem.
ii) Direction the interval (a, b). Let c = (a + b)/2. If f(c) = 0, then x = c is the root and
the root is determined. Otherwise, use the intermediate value theorem to decide
whether the root lies in (a, c) or (c, b).
iii) Repeat step using the interval (a, c).
iv) The procedure is repeated while an length the last interval is less than the desired
accuracy. The mid point of this last interval is taken as the root.
Example 3: Use bisection method to find a positive root of the equation

f(x) = 0.5 ex − 5x + 2
Solution: We find that f(0) = 2.5 and f(1) = −1.6408. Therefore, there is a root
between 0 and 1. We apply the bisection method with a=0 and b= 1. The mid point is
c = 0.5 and f(0.5) = 0.32436. The root now lies in (0.5, 1.0).
The tabulated values are shown in Table 3.
Table 3
After 24 iterations we see that the smaller root can be found in the interval
[.578300476, .578300595]. Therefore, we can estimate one root to be 0 .5783005. One
of the first things to be noticed about this method is that it takes a lot of iterations to
28
Solution of
Algebraic and
get a high degree of precision. In the following error analysis, we shall see method as Transcendental
to why the method is taking so many directions. Equations
2.3.1 Error Analysis

The maximum error after the ith iteration using this process is given by
b−a
εi =
2i
Taking logarithms on both sides and simplfying, we get
i≥
[log(b − a) − log ε i ] (1)
log 2
As the interval at each iteration is halved, we have (ε i +1 / ε i ) = (1 / 2) . Thus, this
method converges linearly .
Example 4 : Obtain the smallest positive root of x3 − 2x − 5 = 0, correct upto 2

decimal places.
Solution : We have f(x) = x3 − 2x − 5 , f(2) = −1 and f(3) = 16. The smallest positive
root lies in (2, 3). Therefore, a = 2, b = 3, b − a = 1, we need solution correct to two
decimal places, that is,
ε ≤ 0.5(10−2), from (1), we get
log 1 − log[0.5(10 −2 )] − log(0.005)

i≥ = ≈ 8.
log 2 log 2
This shows that 8 iterations are required to obtain the required accuracy. Bisection
method gives the iteratived values as x1 = 2.5, x2 = 2.25, …, x8 = 2.09. Then x ≈ 2.09
is the approximate root.
2.4 REGULA FALSI METHOD
Let the root lie in the interval (a, b). Then, P(a, f(a)), Q(b, f(b)) are points as the curve.
Join the points P and Q. The point of intersection of this, with the X-axis, c, line is
taken as the next approximation to the root. We deermine by the intermediate value
theorem, whether the root now lies in (a, c) or (c, b) we repeat the procedure. If x0, x1,
x2,…… are the sequence of approximations, then we stop the iteration when
| xk+1 − xk| < given error tolerance.
Figure 2.0: Regula Falsi method
29
Numerical
Computing -I The equation of line chord joining (a, f(a)), (b, f(b)) is
f (b) − f (a )
y – f(a) = (x − a).
b−a
Setting y = 0, we set the point of intersection with X-axis as givens

b−a
x = c=a− f(a)
f (b) − f (a )
af (b) − bf (a )
=
f (b) − f (a )
If we denote x0 = a, x1 = b, then the iteration formula can be written as

x n+1 f ( x n ) − x n f ( x n−1 )
x n +1 = , n = 1,2,...... (2)
f ( x n ) − f ( x n −1 )
The rate of convergence is still linear but faster than that of the bisection method.
Both these methods will fail if f has a double root.
Example 5: Obtain the positive root of the equation x2 −1 = 0 by Regua Falsi method.
Solution: Let f(x)=x2 −1. Since f(0) = −1, f(2) = 3, Let us take that the root lies in
(0, 2). We have x0 = 0, x1 = 2.
Then, using (2), we get

x 0 f (2) − x1 f (0) 0 − 2(−1)
x2 = = = 0.5, f (0.5) = −0.75
f ( 2) − f ( 0) 3 +1
The root lies in (0.5, 2.0), we get

0.5 f (2) − 2.0 f (0.5) 0.5(3) − 2.0(−0.75
x3 = = = 0.8
f (2) − f (0.5) 3 + 0.75
f(0.8) = -0.36. The root lies in (0.8, 2). The next approximation
0.8(3) − 2.0(−0.36)
x4 = = 0.9286, f (0.9286) = −0.1377.
3 + 0.36
We obtain the next approximations as x5 = 0.9756, x6 = 0.9918, x7 = 0.9973,

x8 = 0.9990. Since, |x8 – x7| = 0.0017 < 0.005, the approximation x8 = 0.9990 is correct
to decimal places.
Note that in this problem, the lower end of the interval tends to the root, and the
minimum error tends to zero, but the upper limit and maximum error remain fixed. In
other problems, the opposite may happen. This is the property to the regula falsi
method.
2.5 NEWTON’S METHOD
This method is also called Newton-Raphsan method. We assume that f is a

differentiable function in some interval [a, b] containing the root.
We first look at a “pictorial” view of how Newton's method works. The graph of
y = f(x) is plotted in Figure 3.1. The point of intersection x = r, is the required root.
30
Solution of
Algebraic and
Transcendental
Equations
x0 r
Figure 2.1: Plor of y = f(x)
Let x0 be an initial approximation of r. Then, (x0, f(x0)) is a point as the curve

(Figure 3.2).
(x0 , f(x0 ))
x0 r
Figure 2.2: Point on the centre
Draw the tangent line to the curve at the point (x0 , f(x0 )) . This line intersects the
x-axis at a new point, say x1 (Figure 3.3).
(x0 , f(x0 ))
x0 x1 r
Figure 2.3: Tangent at (x0, f(x0))
Now, x1 is a better approximation to r, than x0 . We now repeat this process, yielding

new points x2 , x3 , ... until we are “close enough” to r. Figure 3.4 shows one more
iteration of this process, determining x2 .
(x0 , f(x0 ))
x0 x1 x2 r
Figure 2.4: Newton’s method
31
Numerical
Computing -I Now, we derive this method algebraically. The equation of the tangent at (x0, f(x0)) is
given by
y − f(x0) =f’(x0)(x − x0)

Where f’(x0) is the slope of the curve at (x0, f(x0)). Setting y = 0, we get the point of
intersection of the tangent with x-axis as
f ( x0 )
y − f(x0) =f’(x0)(x − x0), or x = x 0 −
f ' ( x0 )
But, this is our next approximation, that is

f ( x0 )
x1 = x0 −
f '( x0 )
Iterating this process, we get the Newton-Rephson as

f ( xn )
xn + 1 = xn − for n = 0, 1, 2,……. (3)
f '( xn )
Example 6: Find the smallest positive root of x7 + 9x5 − 13x − 17 = 0.
Solution : Let f(x) = x7 + 9x5 − 13x − 17, we have f(0) < 0, f(1) < 0 and f(1)f(2) < 0.
Hence, the smallest positive root lies in (1, 2). We can take any value in (1, 2) or one
of the end points as the initial approximation. Let x0 = 1, we have,
f’(x) =7x6 +45x4 − 13. The Newton-Raphson method becomes
x n7 + 9 x n5 − 13 x n − 17
x n +1 = xn − , n = 0,1,2.........
7 x n6 + 45 x n4 − 13
Starting with x0 = 1, we obtain the values given in Table 4.
Table 4
n xn f(xn ) f'(xn ) xn+1
0 1 −20 39 1.512820513
1 1.512820513 52.78287188 306.6130739 1.340672368
2 1.340672368 12.33751268 173.0270062 1.269368397
3 1.269368397 1.46911353 133.1159618 1.258332053
4 1.258332053 0.03053547 127.6107243 1.258092767
5 1.258092767 0.00001407 127.4932403 1.258092657
After 6 iterations of Newton’s method, we have

x6 − x5 = 1.258092657 − 1.258092767 = 0.000000110.
Therefore, the root correct to 6 decimal places is r = 1.258092657.
Possible drawbacks:
Newton’s method may not work in the following cases:

i) The x-values may run away as in Figure 2.5(a). This might occur when the x-axis
is an asymptote.
ii) We might choose an x-value that when evaluated, the derivative gives us 0 as in
Figure 2.5(b). The problem here is that we want the tangent line to intersect the
x-axis so that we can approximate the root. If x has a horizontal tangent line, then
we can't do this.
32
Solution of
Algebraic and
iii) We might choose an x, that is the beginning of a cycle as in Figure 2.5(c). Again Transcendental
it is hoped that the picture will clarify this. Equations
(a) Runaway (b) Flat spot
(c) Cycle
Figure 2.5: Divergence of Newton’s method
However, the dificulties posed have one artifical. Normally, we do not encounter such
problems in practice. Newton-Raphson methods is one of the powerful methods
available for obtaning a simple root of f(x) = 0.
2.5.1 Error Analysis
Let the error at the nth step be defined as

en = xn − x
Then the error at the next step is

f ( x + en )
x + en +1 = x + en − .
f ' ( x + en )
Explaing in Taylor Series, we obtain

f ( x ) + en f '( x) + 12 en 2 f ''( x) + .....
en +1 = en − (4)
f '( x) + en f ''( x) + .....
Since, x is a root, we have f(x) = 0. Then,
1 f " ( x)
en f ' ( x)[1 + en + .......]
2 f ' ( x)
en +1 = en −
f " ( x)
f ' ( x)[1 + en + ..........]
f ' ( x)
1 f " ( x) f " ( x)
= en − en [1 + en + ........][1 + en + ........]−1
2 f ' ( x) f ' ( x)
1 f " ( x) f " ( x)
= en − en [1 + en + ........][1 − en + ........]
2 f ' ( x) f ' ( x)
1 f " ( x)
= en − en [1 − en + ........]
2 f ' ( x)
1 f " ( x) 2
= en en + ........] (5)
2 f ' ( x)
We can neglect the centric and higher powers of en, as they are much smallest than en2,
(en is itself a small number).
33
Numerical
Computing -I Notice that the error is squared at each step. This means that the number of correct
decimal places doubles with each step, much faster than linear convergence. We call it
quadratic convergence.
This sequence will converge if
f " ( x) 2 f " ( x)
en < en , en < 2 (6)
f ' ( x) f ' ( x)
If f’ is not zero at the root (simple root), then there will always be a range round the
root where this method converges.
If f’ is zero at the root (double root), then the convergence becomes linear.
Example 7: Compute the square root of a, using Newton’s method. How does the
error behave?
Solution: Let x = a , or x2 = a. Define f(x) = x2 − a. Here, we know the root exactly,

so that we can see how well the method converges.
We have the Newton’s method for finding a root of f(x) = 0 as
f ( xn ) x 2 − a xn 2 + a 1  a
xn +1 = xn − = xn − n = =  xn +  (7)
f '( xn ) 2 xn 2 xn 2 xn 
Starting with any suitable initial approximation to a , we find x1, x2, ….., which
convege to the required value.
Error at the nth step is en= xn − a . Substituting, we get
(en + a ) 2
en +1 = − a
2( a + en )
2a + 2en + a + en 2
= − a
2( a + en )
2( a + en ) a + en 2
= − a
2( a + en )
en 2
= (8)
2( a + en )
If a = 0, this simplifies to en/2, as expected. Here, we are finding the root of x2 = 0,

which gives a double root x = 90.
Since a>0, en+1 will be positive, provided en is greater than −√a, i.e provided xn is
positive. Thus, starting from any positive number, all the errors, except perhaps the
first, will be positive.
The method converges when,

en 2
| en +1 |= <| en | (9)
2( a + en )
or en < 2(en + a )
34
Solution of
Algebraic and
which is always true. Thus, the method converges to the square root, starting from any Transcendental
positive number, and it does so quadratically. Equations
We now discuss another method, which does not require the knowledge of the
derivative of a function.
2.6 SECANT METHOD
Let x0, x1 be two initial approximations to the root. We do not require that the root lie
in (x0, x1) as in Regula Falsi method. Therefore, the approximations x0, x1 may lie on
the same side of the root. Further, we obtain the sequence of approximations as
x2, x3…… At any stage, we do not require or check that the not lies in the interval
(xk, xk –1). The derivation of the method is same as in the Regula Falsi method.
(Figure 2.6)
Figure 2.6: Secant method
The method is given by (see Equation (21))

x n −1 f n − x n f n −1
x n +1 = (10)
f n − f n −1
We compute x2 using x0, x1; x3 using x1, x2; and so on.
The rate of convergence of the method is super linear (1.618), that is, it works better
than the Regula Falsi method.
Example 8: Use secant method to find the roots of the equation f(x) = 0.5ex − 5x + 2.
Solution: We have f(−x) =0.5e−x + 5x +2 > 0 for all x. Hence, there is no negative
root.
We obtain,
f(0) = 2.5, f(1) = − 1.6408, f(2) = − 4.3055, f(3)= −2.9572,
f(4) = 902990, f(x) > 0 for x > 4.
Therefore, the given function has two roots, one root in (0, 1) and the second root in
(3, 4).
For finding the first not, we take x0 = 0, x1 = 1, and compute the sequence of
approximations x2, x3,…..
For finding the second root, we take x0 = 3 , x1 = 4 and compute the sequence of
approximations x2, x3,……
35
Numerical
Computing -I The results are given in Table 5.
Table 5
xn-1 xn xn-1 xn xn+1

0 1 2.5 –1.640859086 0.60373945
1 0.603739453 –1.6408591 –0.104224624 0.57686246
0.603739453 0.576862465 –0.1042246 0.002909403 0.57830459
0.576862465 0.578304589 0.0029094 –1.68E-06 0.57830058
0.578304589 0.578300578 –1.65E-05 –2.57E-09 0.57830058
0.578300578 0.578300577 –2.57E-09 1.11E-15 0.57830058
0.578300577 0.578300577 1.11E-15 0 0.57830058
3 4 –2.9572315 9.299075017 3.24128244

4 3.241282439 9.29907502 –1.423168098 3.34198736
3.241282439 3.341987358 –1.4231681 –0.572304798 3.40972316
3.341987358 3.409723161 –0.5723048 0.079817605 3.40173252
3.409723161 3.401432525 0.07981761 –0.003635061 3.40179365
3.401432525 3.401793651 –0.0036351 –2.15E-09 3.4017958
3.401793651 3.401795804 –2.15E-05 5.87E-09 3.4017958
3.401795804 3.401795804 5.87E-09 –7.11E-15 3.4017958
The two roots are 0.57830058, 3.4017958 correct to all decimal places given.
2.7 METHOD OF SUCCESSIVE ITERATION
The first step in this method is to write the equation f(x) = 0 in the form
x = g(x). (11)
For example, consider the equation x2 – 4x + 2 =0 . We can write it as
x= 4x − 2 , (12)
or as x = (x2 + 2)/ 4, (13)

2
or as x = (14)
4−x
Thus, we can choose from (11) in several ways. Since, f(x) = 0 is the same as x =
g(x), finding a root of f(x) = 0 is the same as finding a root of x = g(x), i.e. finding a
fixed point α of g(x) such that α = g(α) . The function g(x) is called an iteration
function for solving f(x) = 0.
If an initial approximation x0 to a root α is provided , a sequence x1 , x 2 ,….. may be

defined by the iteration scheme
x n+1 = g( xn ) (15)
with the hope that the sequence will converge to α. The successive iterations for
solving x = e−x/3, by the method xn + 1 = e−xn/3 is given in Figure 2.7.
36
Solution of
Algebraic and
Transcendental
Equations
Figure 2.7: Successive iteration method
The method converges if, for some constant M such that 0< M <1, the inequality
| g(x) – g(α) | ≤ M | x – α | (16)
holds true whenever |x − α | ≤ | x0 – α | . For, if (16) holds, we find that
| xn + 1 – α | = | g(xn ) – α | = | g(xn ) – g(α ) | ≤ M | xn – α | (17)
Using this relation recursively, we get
| xn+1 – α | ≤ M | xn – α | ≤ M 2 | xn-1 – α | ≤ M n | x0 – α | (18)
Since, 0 < M < 1, lim Mn = 0 and thus lim xn = α.
Condition (16) is satisfied if the function g(x) possesses a derivative g ′ (x) such that
| g ′ (x) | < 1 for | x − α | < | x0 – α| If xn is close to α , then we have
| xn + 1 – α | = | g(xn ) – g(α ) | ≤ g ′ (ξ ) | xn − α | (19)
for some ξ between x0 and α . Therefore, condition for convergence is
| g ′ (ξ ) | < 1, or | g ′ (x) | < 1. (20)
Example 9 : Let us consider the equation f(x) = x3 + x − 2. It has only one real root at
x = 1. There are several ways in which f(x)=0 can be written in the desired
form, x = g(x).
For example, we may write x = x + f(x) = g(x) and write the method as
xn+1 = xn + f ( xn ) = xn3 + 2 xn − 2
In this case, g ′ (x) = 3x2 + 2, and the convergence condition is

g ′( x) = 3 x 2 + 2 < 1, or 3 x 2 < −1.
Since, this is never true, this arrangement doesn't converge to the root.
An alternate rearrangement is
xn+1 = 2 − xn3
37
Numerical
Computing -I This method converges when
1 1
g ′( x) = − 3 x 2 < 1, or x2 < , or x < .
3 3
Since this range [−1 3 ,1 3 ] does not include the root x = 1, this method will not
converge either.
Another rearrangement is
xn+1 = 3 2 − xn
In this case, the convergence condition becomes

1 −2 3
(2 − x n ) < 1, or ( 2 − x n ) − 2 < 33 , or x n − 2 > 27 .
3
Again, this range does not conain the root.
Another rearrangement is
2
xn +1 = (21)
xn + 1
2
In this case, the convergence condition becomes

4| x|
< 1, 4 | x |< (1 + x 2 ) 2
(1 + x 2 )2
This inequality is satisfied when x > 1. Hence, if we start with such an x, the method
will converge to the root.
Let x0 = 1.2, Then, from (21), we obtain the sequence of approximations as

x1 = 0.8197, x2 = 1.1963, x3 = 0.8227, x4 = 1.1927, x5 = 0.8255, x6 = 1.1894, …….
The approximations oscillate about x = 1 and converge very slowly.
2.8 SUMMARY
In this unit, we have discussed the acutal meaning of root. In general, root
determination of an equation is a tedious exercise. So, to handle such tasks, we have
discussed some basic, simple, but still powerful methods of root determination. The
methods discussed in this unit are Bisection method, Regular falsi method, Newton’s
method, Secan method and Successive iteration method.
2.9 EXERCISES
E1) In the following problems, find the intervals of length 1 unit, in which the roots
lie
(a) 12x3 – 76x2 + 131x – 42 = 0; (b) 4x2 + 8x – 21 = 0
(c) x – e–x = 0 (d) x = 2 cos x
E2) Find all the roots in Problems 1(a), (b), (c) by ergular falth method, secant
method and Newton-Raghson method.
38
Solution of
Algebraic and
E3) Find the smaller roots in Problems 1(b) and the root in 1(c), by successive Transcendental
iteration method. Equations
E4) Show that the equation x3 – 6x – 1 = 0, has a root in the internal (–1, 0). Obtain
this root using the successive iteration method.
2.10 SOLUTIONS TO EXERCISES
E1) (a) (0, 1), (1, 2), (3, 4) (b) (– 4, –3), (1, 2)
(c) (0, 1) (d) (0.5, 1.5)
E2) (a) 0.5, 1.2, 3.5 (b) – 3.5, 1.5

(c) 0.567143
E3) (a) Use xn+1 = xn – 0.05(4xn2 + 8xn – 21) with x0 = 1.4

(b) Write xn+1 = en–xn
E4) Write xn+1 = (xn3 – 1) / 6; x0 = –0.5, – 0.167449
39
Numerical
Computing -I
UNIT 3 SOLUTION OF LINEAR
ALGEBRAIC EQUATIONS
Structure Page Nos.
3.0 Introduction 40
3.1 Objectives 41
3.2 Gauss Elimination Method 41
3.3 Pitfalls of Gauss Elimination Method 45
3.4 Gauss Elimination Method with Partial Pivoting 46
3.5 LU Decomposition Method 46
3.6 Iterative Methods 49
3.7 Summary 55
3.8 Exercises 55
3.0 INTRODUCTION
Systems of linear equations arise in many areas of study, both directly in modelling
physical situations and indirectly in the numerical solution of other mathematical
models. Linear algebraic equations occur in the linear optimization theory, least
square fitting of data, numerical solution of ordinary and partial differential equations,
statistical interference etc. Therefore, finding the numerical solution of a system of
linear equations is an important area of study.
From study of algebra, you must be familiar with the following two common methods
of solving a system of linear equations :
1) By the elimination of the variables by elementary row operations.

2) By the use of determinants, a method better known as Cramer’s rule.
When smaller number of equations are involved, Cramer’s rule appears to be better
than elimination method. However, Cramer’s rule is completely impractical when a
large number of equations are to be solved because here n+1 determinants are to be
computed for n unknowns.
Numerical methods for solving linear algebraic systems can be divided into two
methods, direct and iterative. Direct methods are those which, in the absense of
round-off or other errors, yield exact solution in a finite number of arithmetic
operations. Iterative methods, on the other hand, start with an initial guess and by
applying a suitable procedure, give successively better approximations.
To understand, the numerical methods for solving linear systems of equations, it is

necessary to have some knowledge of properties of the matrices. You might have
studied matrices, determinants and their properties in your linear algebra course.
In this unit, we shall discuss two direct methods, namely, Gauss elimination method
and LU decomposition method, and two iterative methods, viz.; Jacobi method,
Gauss – Seidel method and Successive over relaxation method.These methods are
frequently used to solve systems of linear equations.
40
Solution of
Linear Algebraic
3.1 OBJECTIVES Equations
After studying this unit, you should be able to:
• state the difference between direct and iterative methods for solving a system of
linear equations;
• learn how to solve a system of linear equations by Gauss elimination method;
• understand the effect of round off errors on the solution obtained by Gauss
elimination method;
• learn how to modify Gauss elimination method to Gaussian elimination with
partial pivoting to avoid pitfalls of the former method;
• learn LU decomposition method to solve a system of linear equations;
• learn how to find inverse of a square matrix numerically;
• learn how to obtain the solution of a sytem of linear equations by using an
iterative method, and
• state whether an iterative method will converge or not.
3.2 GAUSS ELIMINATION METHOD
One of the most popular techniques for solving simultaneous linear equations is the
Gaussian elimination method. Karl Friedrich Gauss, a great 19th century
mathematician, suggested this elimination method as a part of his proof of a particular
theorem. Computational scientists use this “proof” as a direct computational method.
The approach is designed to solve a general set of n equations and n unknowns
a11 x1 + a12 x2 + a13 x3 + ... + a1n xn = b1

a21x1 + a22 x2 + a23 x3 + ... + a2 n xn = b2
. . . . (1)
an1 x1 + an 2 x2 + an3 x3 + ... + ann xn = bn
In matrix form, we write Ax = b, where
 a11 a12 .. a1n   x1   b1 

a a 22 .. a 2 n  x  b 
A=   2 ,b=  2
21
,x=
 .. .. .. ..  .  .
     
 a n1 an2 .. a nn  xn  bn 
Gaussian elimination consists of two steps:
1) Forward Elimination: In this step, the elementary row operations are applied on
the augmented matrix [A|b] to transform the coefficient matrix A into upper
triangular form.
2) Back Substitution: In this step, starting from the last equation, each of the
unknowns is found by back substitution.
Forward Elimination of Unknowns: In this first step the first unknown, x1 is

eliminated from all rows below the first row. The first equation is selected as the
pivot equation to eliminate x1. So, to eliminate x1 in the second equation, one divides
41
Numerical
Computing -I the first equation by a11 (hence called the pivot element) and then multiply it by a21.
That is, same as multiplying the first equation by a21/ a11 to give
a21 a a
a21x1 + a12 x2 + ... + 21 a1n xn = 21 b1
a11 a11 a11
Now, this equation is subtracted from the second equation to give
 a   a  a
 a22 − 21 a12  x2 + ... +  a2 n − 21 a1n  xn = b2 − 21 b1
 a11   a11  a11
or '
a 22 x 2 + ... + a 2' n x n = b2
a 21
'
where a 22 = a 22 − a12 ,…,
a11
a 21 a
a 2' n = a 2 n − a1n , b2' = b2 − 21 b1 .
a11 a11
This procedure of eliminating x1 , is now repeated for the third equation to the
nth equation to reduce the set of equations as
a11 x1 + a12 x2 + a13 x3 + ... + a1n xn = b1

'
a 22 x 2 + a 23
'
x3 + ... + a 2' n x n = b2 (2)
a x 2 + a x + ... + a x n = b3
'
32
'
33 3
'
3n
. . . .
a x 2 + a x3 + ... + a x n = b .
'
n2
'
n3
'
nn
'
n
This completes the first step of forward elimination. Now, for the second step of
forward elimination, we start with the second equation as the pivot equation and a ' 22
as the pivot element. So, to eliminate x2 in the third equation, one divides the second
a'
equation by a ' 22 (the pivot element) and then multiply it by 32 . That is, same as
a' / a'
multiplying the second equation by 32 22 and subtracting from the third equation.
This makes the coefficient of x2 zero in the third equation. The same procedure is
now repeated for the fourth equation till the nth equation to give
a11 x1 + a12 x2 + a13 x3 + ... + a1n xn = b1

'
a 22 x 2 + a 23
'
x3 + ... + a 2' n x n = b2'
''
a 33 x3 + ... + a 3'' n x n = b3'' (3)
. .
a n'' 3 x3 + ... + a nn
''
x n = bn''
The next steps of forward elimination are done by using the third equation as a pivot
equation and so on. That is, there will be a total of (n–1) steps of forward elimination.
At the end of (n–1) steps of forward elimination, we get the set of equations
a11x1 + a12 x2 + a13 x3 + ... + a1n xn = b1

'
a 22 x 2 + a 23
'
x3 + ... + a 2' n x n = b2'
42
Solution of
''
a 33 x3 + ... + a n'' x n = b3'' (4) Linear Algebraic
Equations
. .
(n −1) (n −1 )
ann xn = bn
Back Substitution: Now, the equations are solved starting from the last equation as it
has only one unknown. We obtain
bn( n −1)
xn = ( n −1)
a nn
Now, we solve the (n–1)th equation to give
x n −1 =
1
( n − 2)
[b( n −2)
n −1 − a n( n−−1,2n) x n ]
a n −1, n − 2
since x n is determined.
We repeat the procedure until x1 is determined. The solution is given by
bn( n −1)
xn = ( n −1)
a nn
bi(i −1) − ∑ aij(i −1) x j

n
j =i +1
xi =
and aii(i −1) , for i = n – 1, n – 2,…, 1 (5)
Example 1: Solve the following linear system of equations

x1+ x2 + x3 = 3,
4x1 + 3x 2 + 4x 3= 8,
9x1 + 3 x 2+ 4x3 = 7
using the Gauss elimination method.
Solution: In augmented form, we write the system as

1 1 1 3
 
4 3 4 8 
9 3 4 7 
Subtracting 4 times the first row from the second row gives
1 1 1 3 
 
0 − 1 0 − 4 
9 3 4 7 
Subtracting 9 times the first row from the third row, we get
1 1 1 3 
 
0 − 1 0 − 4 
0 − 6 − 5 − 20
43
Numerical
Computing -I Subtracting 6 times the second row from the third row gives
1 1 1 3
 
0 − 1 0 − 4
0 0 − 5 4 
Restoring the transformed matrix equation gives
−4
Solving the last equation, we get x 3 = . Solving the second equation, we get
5
4 −1
x 2 = 4 and the first equation gives x1 = 3 – x 2 – x 3 = 3 – 4 + = .
5 5
Example 2: Use Gauss Elimination to solve

10 x1 − 7 x 2 = 7
− 3 x1 + 2.099 x 2 + 6 x3 = 3.901
5 x1 − x2 + 5 x3 = 6
correct to six places of significant digits.
Solution : In matrix form , we write
 10 − 7 0  x1   7 
− 3 2.099 6  x  = 3.901
   2  
 5 − 1 5  x3   6 
Multiply the first row by 3/10 and add to the second equation, we get
10 −7 0  x1   7 
 0 − 0.001 6  x  = 6.001
   2  
 5 −1 5  x3   6 
Multiply the first row by 5/10 and subtract from the third equation, we get
10 −7 0  x1   7 
 0 − 0.001 6  x  = 6.001
   2  
 0 2.5 5  x3   2.5 
This completes the first step of forward elimination.
Multiply the second equation by 2.5/(–0.005)= –2500 and subtract from the third
equation, we obtain
10 −7 0   x1   7 
 0 − 0.001 6  x  =  6.001 
  2  
 0 0 15005  x3  15005
We can now solve the above equations by back substitution. From the third equation,
we get
44
Solution of
15005x3 = 15005 , or x 3 = 1. Linear Algebraic
Equations
Substituting the value of x3 in the second equation, we get
− 0.001x 2 + 6 x3 = 6.001, or –0.001 x 2 = 6.001 − 6 = 0.001 , or x 2 = −1
Substituting the values of x3 and x2 in the first equation, we get
10 x1 − 7 x 2 = 7 , or 10 x1 = 7 + 7 x 2 = 0 , or x1 = 0 .
Hence, the solution is [0 − 1 1] ,

T
3.3 PITFALLS OF GAUSS ELIMINATION

METHOD
There are two pitfalls in the Gauss elimination method.
Division by zero: It is possible that division by zero may occur during forward
elimination steps. For example, for the set of equations
10 x 2 − 7 x3 = 7
6 x1 + 2.099 x 2 − 3 x3 = 3.901
5 x1 − x 2 + 5 x3 = 6
during the first forward elimination step, the coefficient of x1 is zero and hence
normalisation would require division by zero.
Round-off error: Gauss elimination method is prone to round-off errors. This is true,
when there are large numbers of equations as errors propagate. Also, if there is
subtraction of almost equal numbers, it may create large errors. We illustrate through
the following examples.
Example 3: Solve the following linear equations

10 −5 x + y = 1.0,
x + y = 2.0 (6)
correct to 4 places of accuracy.
Solution: For 4 places of accuracy the solution is , x ≈ y ≈ 1.0 .

Applying the Gauss elimination method, we get (by dividing with the pivotal element)
x + 10 5 y = 10 5
( 1 − 10 5 ) y = 2.0 − 10 5.
Now, 10 5 – 1 when rounded to four places of accuracy, become is 10 5. Similarly,

10 5 – 2 when rounded to four places of accuracy becomes 10 5.
Hence, from the second equation we get, 10 5 y = 10 5, or y = 1.0.
Substituting in the first equation, we get x = 0.0, which is not the solution.
Such errors can also arise when we perform computations with less number of digits.
To avoid these computational disasters, we apply partial pivoting to gauss elimination.
45
Numerical
Computing -I
3.4 GAUSS ELIMINATION METHOD WITH
PARTIAL PIVOTING
We perform the following modification to the Gauss elimination method. At the

beginning of each step of forward elimination, a row interchange is done, if necessary,
based on the following criterion. If there are n equations, then there are (n − 1)
forward elimination steps. At the beginning of the kth step of forward elimination, we
find the maximum of
a kk a k +1,k a nk
, , …………,
That is, maximum in magnitude of this elements on or below the diagonal element.
a
Then, if the maximum of these values is pk in the pth row, k ≤ p ≤ n, then
interchange rows p and k.. The other steps of forward elimination are the same as in
Gauss elimination method. The back substitution steps remain exactly the same as in
Gauss elimination method.
Example 4: Consider Example 3. We now apply partial pivoting on system (6).
Solution: We obtain the new system as
Since, a11 < a 21, we interchange the first and second rows (equations).
x + y = 2.0
10 −5 x + y = 1.0
On elimination , we get second equation as y = 1.0 correct to 4 places. Substituting in

the first equation, we get x = 1.0, which is the correct solution.
3.5 LU DECOMPOSITION METHOD
The Gauss elimination method has the disadvantage that the right-hand sides are
modified (repeatedly) during the steps of elimination). The LU decomposition method
has the property that the matrix modification (or decomposition) step can be
performed independent of the right hand side vector. This feature is quite useful in
practice. Therefore, the LU decomposition method is usually chosen for computations.
In this method, the coefficient matrix into a product of two matrices is written as
A=LU (7)
where L is a lower triangular matrix and U is an upper triangular matrix.
Now, the original system of equations, A x = b becomes

LU x = b (8)
Now, set U x = y, then, (8) becomes

Ly = b (9)
The rationale behind this approach is that the two systems given in (9) are both easy to
solve. Since, L is a lower triangular matrix, the equations, Ly = b , can be solved for
y using the forward substitution step. Since U is an upper triangular matrix, U x = y
can be solved for x using the back substitution algorithm.
46
Solution of
We define writing A as LU as the Decomposition Step. We discuss the following Linear Algebraic
three approaches of Decomposition using 4 × 4 matrices. Equations
Doolittle Decomposition
We choose lii = 1, i=1, 2, 3, and write
1 0 0 0  u11 u12 u13 u14   a11 a12 a13 a14 

l 0   0 u22 u23 u24   a21 a24 
 21 1 0
=
a22 a23
(10)
l31 l32 1 0  0 0 u33 u34   a31 a32 a33 a34 
    
l41 l42 l43 1  0 0 0 u44   a41 a42 a43 a44 
Because of the specific structure of the matrices, we can derive a systematic set of
formulae for the components of L and U .
Crout Decomposition:
We choose u ii = 1 , I = 1, 2, 3, 4 and write
 l11 0 0 0  1 u12 u13 u14   a11 a12 a13 a14 

l 0 0  0 1 u23 u24   a21 a24 
 21 l22  =
a22 a23
(11)
l31 l32 l33 0  0 0 1 u34   a31 a32 a33 a34 
    
l41 l42 l43 l44  0 0 0 1   a41 a42 a43 a44 
The evaluation of the components of L and U is done in a similar fashion as above.
Cholesky Factorization:
If A is a symmetric and positive definite matrix, then we can write the decomposition
as
Where L is the lower triangular matrix
 l11 0 0 0
l l 0 0 
L =  21 22 (12)
l31 l32 l33 0
 
l41 l42 l43 l44 
We now describe the rationale behind the choice of lii = 1 in (10) or u ii = 1 in (11).
Consider the decomposition of a 3 × 3 matrix as follows.
 l11 0 0  u11 u12 u13   a11 a12 a13 

l 0   0 u22 u23  =  a21 a23 
 21 l22 a22
l31 l32 l33   0 0 u33   a31 a32 a33 
 l11u11 l11u12 l11u13   a11 a12 a13 

l u l u + l u l21u13 + l22u23  = a a23  (13)
 21 11 21 12 22 22   21 a22
 l31 l31u12 + l32u22 l31u13 + l32u23 + l33u33   a31 a32 a33 
47
Numerical
Computing -I We note that L has 1+2+3=6 unknowns and U has 3+2+1=6 unknowns, that is, a total
of 12 unknowns. Comparing the elements of the matrices on the left and right hand
sides of (13), we get 9 equations to determine 12 unknowns . Hence, have 3 arbitrary
parameters, the choice of which can be done in advance. Therefore, to make
computations easy we choose lii = 1 in Doolittle method and u ii = 1 in Crout’s
method.
In the general case of decomposition of an N × N matrix, L has 1+2+3+….+N=

[N ( N + 1) / 2] And U also has N(N+1)/2 unknowns, that is a total of N 2 + N
unknowns comparing the elements of A and the product LU, we obtain N 2
equations. Hence, we have N arbitrary parameters. Therefore, we choose either
lii = 1 or u ii = 1 ,i = 1,2,….n,
Now , let us give the solution for the Doolittle and Crout decomposition.
Doolittle Method: Here l ii = 1, i = 1 to N. In this case, generalisation of (13) gives

u 1j = a 1j , j = 1 to N
l i1 = a i1 / a 11 , i = 2 to N
u2 j = a2j – l21 . u1 j , j=2 to N
l i2 = ( a i2 – l i1 u 12 ) / u22 , i = 3 to N, and so on
Crout’s Method: Here u ii = 1, i = 1 to N . In this case , we get

l i1 = a i1 , i=1 to N
u1j = a1j / a11 , j= 2 to N
l i 2 = a i2 – l i1 u 12, i = 2 to N
u 2j = ( a 2j – l21 u 1j ) / l22, j= 3 to N, and so on
Example 5: Given the following system of linear equations, determine the value of
each of the variables using the LU decomposition method.
6x1 – 2x2 = 14
9x1 – x2 + x3= 21
3x1 – 7x2 + 5x3= 9
Solution: We write A = LU, with uii = 1 as
 6 −2 0   l11 0 0  1 u12 u13 

 9 −1 1  =  l 0  0 1 u23 
   21 l22
 3 +7 5  l31 l32 l33  0 0 1 
 l11 l11u12 l11u13 

= l21 l21u12 + l22 l21u13 + l22u33


l31 l31u12 + l32 l31u13 + l32u23 + l33 
We obtain l11 = 6, l21 = 9, l31 = 3, l11u12 = – 2, u12 = – 1/3;
l11 u13 = 0, u13 = 0
l21u12 + l22 = – 2, l22 = – 1 + 3 = 2; l21u13 + l22 u23 = 1
u23 = 1/2, l31u12 + l32 = + 7, l32 = + 7 + 1 = 8;
l31u13 + l32u23 + l33 = 5, l33 = 5 – 4 = +1.
48
Solution of
6 0 0   1 −1/ 3 0  Linear Algebraic
Hence, L = 9 2 0 , U =  0
  1/ 2 
Equations
1
   
 3 8 +1  0 0 1 
 6 0 0   y  14  1
Solving Ly = b, 9 2 0   y  =  21
    
2
 3 8 1   y   9 
3
14 7 1
We get, y1 = = ; 9 y1 + 2 y2 = 21, y2 = (21–21) = 0
6 3 2
3 y1 + 8 y2 + y3 = 9, y3 = 9 – 7 = 2.
1 −1/ 3 0   x  7 / 3 1
Solving Ux = y,  0 1 1/ 2   x  =  0 
    
2
 0 0 1   x   2  3
1 1 7 7 1
We get, x3 = 2, x2 + x3 = 0, x2 = –1; x1 – x2 = , x1 = – = 2.
2 3 3 3 3
The solution vector is [2 –1 2].
Solving the system of Equations
After decomposing A as A = LU, the next step is to computate the solution. We have,
LUX = b, set UX = y
Solve first Ly = b, by forward substitution. Then, solve Ux = y, by backward

substitution to get the solution vector x.
3.6 ITERATIVE METHODS
Iterate means repeat. Hence, an iterative method repeats its process over and over,
each time using the current approximation to produce a better approximation for the
true solution, until the current approximation is sufficiently close to the true solution –
or until you realize that the sequence of approximations resulting from these iterations
is not converging to the true solution.
Given an initial guess or approximation x(0) for the true solution x, we use x(0) to find
a new approximation x(1), then we use x(1) to find the better approximation x(2), and so
on. We expect that x(k) → x as k → ∞ ; that is, our approximations should become
closer to the true solution as we take more iterations of this process.
Since, we do not actually have the true solution x, we cannot check to see how close
our current approximation x(k) is to x. One common way to check the closeness of x(k)
to x is instead by checking how close Ax(k) is to Ax, that is, how close Ax(k) is to b.
49
Numerical
Computing -I Another way to check the accuracy of our current approximation is by looking at the
magnitude of the difference in successive approximations, | x(k) − x(k-1) |. We expect
x(k) to be close to x if | x(k) − x(k-1) | is small .
The Jacobi Method
This method is also called Gauss – Jacobi method. In Jacobi method, the first equation
is used to solve for x1, second equation is used to solve x2 etc. That is,
1
x1 = [b1 − (a12 x 2 + ...... + a1n xn)]
a11
1
x 2 = [b 2 − (a 21 x1 + a 23 x 3 + ...... + a 2 n xn )]
a 22
If in the ith equation
n
∑a
j =1
i, j x j =bi (15)
we solve for the value of xi, we obtain ,

n
xi = (bi − ∑ ai , j x j ) / ai , j (16)
j ≠1
This suggests an iterative method defined by
xi( k ) = (bi − ∑ ai , j x (jk −1) / ai , j (17)

j ≠i
which is the Jacobi method. Note that the order in which the equations are solved is
irrelevant, since the Jacobi method treats them independently. For this reason, the
Jacobi method is also known as the method of simultaneous displacements, since the
updates could in principle be done simultaneously.
Jacobi method can be written in matrix notation.
Let A be written as A = L + D + U, where L is strictly lower triangular part, D the

diagonal part and U is strictly upper triangular part.
0 0 L 0
0 a 12 L a 
1na 11 0
a 0 O 0  0 0 O a   a 
L=  ,U =  ,D =  
21 2n 22
M O O M M O O M   O 
     
a n1 a L 0
n2
0 0 L 0 0 a 
nn
Therefore, we have
(L + D + U) X = b, or DX = – (L + U) x + b
Since, aii ≠ 0, D–1 exists and is equal to

D–1 = diag ( 1/a11, 1/a12, ……1/ann).
Inverting D, we write the intersection as

X(k+1) = – D–1 (L + U) X(k) + D–1 b (18)
= MJ x(k) + C (19)
where MJ – D–1 (L + U) and C = D–1 b.
The matrix MJ is called the iteration matrix. Covergence of the method depends on the
properties of the matrix MJ.
50
Solution of
Diagonally dominant: A matrix A is said to be diagonally dominant if Linear Algebraic
n Equations
| aii |≥ ∑
j =1,i ≠ j
| aij | (20)
with inequality satisfied for atleast one row.
Convergence: (i) The Jacobi method converges when the matrix A is diagonally
dominant. However, this is a sufficient condition and necessary condition.
(ii) The Jacobi method converges if Spectral radius (MJ) < 1. Where Spectiral radius
of a matrix = max =| λ | and xi are eigenralues of MJ.This is a necessary and sufficient
i
i
condition. If no intial approximation is known, we may assume X(0) = 0.
Exercise 1: Are the following matrices diagonally dominant?
 2 −5.81 34  124 34 56 
A =  45 43 1

and B = 23 53 5

   
123 16 1   96 34 129
Solution: In A, all the three rows violate the condition (20). Hence, A is not
diagonally dominant.
In B, in the row | –129 | = 129 < 96 + 34 = 130. Therefore, B is also not diagonally
dominant.
Example 2: Solve the following system, of equations

x + y – z = 0, – x + 3y = 2, x – 2z = –3 by Jacobi Method, both directly and in
matrix form. Assume the initial solution vector as [0.8 0.8 2.1]T.
Solution : We write the Jacobi method as

1 1
x(k+1) = – y(k) + z(k), y(k+1) = (2 + x(k)), z(k+1) = (3 + x(k))
3 2
with x(0) = 0.8, y(0) = 0.8, z(0) = 2.1, we get the following approximations.
x(1) = 1.3, y(1) = 0.9333, z(1) = 1.9;

x(2) = 0.9667, y(2) = 1.1, z(2) = 2.5;
x(3) = 1.0500, y(3) = 0.9889, z(3) = 1.98335;
x(4) = 0.99445, y(4) = 1.01667, z(4) = 2.025;
x(5) = 1.00833, y(5) = 0.99815, z(5) = 1.997225;
x(6) = 0.988895, y(6) = 1.00278, z(6) = 2.004165;
x(7) = 1.001385, y(7) = 0.99630, z(7) = 1.99445;
x(8) = 0.99815, y(8) = 1.00046, z(8) = 2.00069;
x(9) = 1.00023, y(9) = 0.99938, z(9) = 1.999075.
At this stage, we have,
| x(9) – x(8) | = 0.002, | y(9) – y(8) | = 0.0019, | z(9) – z(8) | = 0.0016.
Therefore, the 9th interaction is correct to two decimal places.
51
Numerical
Computing -I Let us represent the matrix A in the form
 0 0 0  1 0 0  0 1 1 
A = L + D + U =  −1 0 0  +  0 3 0  +  0 0 0 
     
 1 0 0   0 0 −2  0 0 0 
We have,
1 0 0   0 1 −1  0 −1 1 
−1
Mj = − D ( L + U ) = − 0 1/ 3 0   −1 0 0 = 1/ 3 0 0 
 
    
 0 0 −1/ 2   1 0 0  1/ 2 0 0 
1 0 0  0 
 −1
c = D b = 0 1/ 3 0   2 / 3
  
 0 0 −1/ 2  3 / 2 
Therefore, Jacobi method gives,
 0 −1 1   0 
X ( k +1)
= 1/ 3 0 0 X +  2 / 3
  (k )
   
1/ 2 0 0  3 / 2 
The intial approximation is given as X(0) = [0.8 0.8 2.1]T
Then, we have
 0 −1 1   0.8  0   1.3 
X (1)
= 1/ 3 0 0   0.8 +  2 / 3 =  0.9333
      
1/ 2 0 0   2.1 3 / 2   1.9 
which is same as X(1) obtained earlier.
Since, the two procedures (direct and in matrix form) are identical, we get the same
approximations x(2), ….. x(9). The exact solution is x = [1 1 2]T.
Note that the coefficient matrix A is not diagonal dominant. But, we have obtained the
solution correct to two decimal places in 9 interactions. This shows that the
requirement of A being diagonal dominant is a sufficient condition.
Example 3: Solve by Jacobi’s method the following system of linear equations.
2x1 – x2 + x3 = –1
x1 + 2x2 – x3 = 6
x1 – x2 + 2x3 = –3.
Solution: This system can be written as
x1 = 0.5 x2 – 0.5 x3 – 0.5

x2 = – 0.5 x1 + 0.5 x3 + 3.0
x3 = – 0.5 x1 + 0.5 x2 – 1.5
52
Solution of
So the Jacobi iteration is Linear Algebraic
Equations
 x ( k +1)   0.0 0.5 −0.5  x ( k )   −0.5
1 1
 ( k +1)     (k )   
x 2
 =  −0.5 0.0 0.5   x  +  3.0  2
 x ( k +1)   −0.5 0.5 0.0   x ( k )   −1.5

3 3
Since, no intial approximation is given, we start with x(0) = (0, 0, 0)T. We get the
following approximations.
X(1) = [– 0.5000 3.0000 – 1.5000]T

X(2) = [1.7500 2.5000 0.2500]T
X(3) = [0.6250 2.2500 – 1.1250]T
X(4) = [1.1875 2.1250 – 0.6875]T
X(5) = [0.9063 2.0625 – 1.0313]T
X(6) = [1.0469 2.0313 – 0.9219]T
X(7) = [0.9766 2.0156 – 1.0078]T
X(8) = [1.0117 2.0078 – 0.9805]T
X(9) = [0.9941 2.0039 – 1.0020]T
X(10) = [1.0029 2.0020 – 0.9951]T
X(11) = [0.9985 2.0010 – 1.0005]T
X(12) = [1.0007 2.0005 – 0.9988]T
X(13) = [0.9996 2.0002 – 1.0001]T
X(14) = [1.0002 2.0001 – 0.9997]T
After 14 iteractions, the errors in the solutions are
| x1(14) – x1(13) | = 0.0006, | x2(14) – x2(13) | = 0.0001, | x3(14) – x3(13) | = 0.0004.
The solutions x(14) are therefore almost correct to 3 decimal places.
The Gauss-Seidel Method
We observe from Examples 2 and 3 that even for a 3 x 3 system, the number of
iterations taken by the Jacobi method (to achieve 2 or 3 decimal accuracy) is large.
For large systems, the number of iterations required may run into thousands. Hence,
the Jacobi method is slow. We also observe that when the variable xi is being iterated
in say the k-th iteration, the variables, x1, …. xi-1 have already been updated in the k-th
iteration. However, these values are not being used to compute xi(k). This is the
disadvantage of the Jacobi method. If we use all the current available values, we call it
the Gauss-Seidel method.
Therefore, Gauss – seidel method is defined by

n n
x ( k ) = (bi − ∑ ai , j x ( k ) − ∑ ai , j x ( k −1) ) / a
i j j i, i (21)
j <i j >i
Two important facts about the Gauss-Seidel method should be noted. First, the
computations in (21) are serial. Since, each component of the new iterate depends
upon all previously computed components, the updates cannot be done simultaneously
as in the Jacobi method. Second, the new iterate depends upon the order in which the
equations are being used. The Gauss-Seidel method is sometimes called the method of
successive displacements to indicate the dependence of the iterates on the ordering. If
this ordering is changed, the components of the new iterate (and not just their order)
will also change.
53
Numerical
Computing -I To derive the matrix formulation, we write,
AX = (L + D + U) X = b or (L + D) X = – UX + b.
The Gauss-Seidel method can be expressed as
X(k+1) = – (L + D) –1 U X(k) + (L + D)–1b
= MG X (k) + C (22)
where MG = – (L + D) –1 U is in iteration matrix and C = (L + D) –1 b.
Again, convergence depends on the properties of MG if Spectral radius(MG) < 1, the
iteraction converges always for any intial solution vector. Further, it is known that
Gauss-Seidel method converges atleast two times faster than for Jacobi method.
Example 4: Solve the system in Example 2, by the Gauss-Seidel method. Write its
matrix form.
Solution: Gauss-Seidel method for solving the system in Example 2 is given by

1 1
x(k+1) = – y(k) + z(k), y(k+1) = (2 + x(k)), z(k+1) = (3 + x(k))
3 2
with x(0) = 0.8, y(0) = 0.8, z(0) = 2.1, we obtain the following results.
x(1) = 1.3, y(1) = 1.1, z(1) = 2.15;

x(2) = 1.05, y(2) = 1.01667, z(2) = 2.025;
x(3) = 1.00833, y(3) = 1.00278, z(3) = 2.004165;
x(4) = 1.001385, y(4) = 1.00046, z(4) = 2.00069;
x(5) = 1.00023, y(5) = 1.000077, z(5) = 2.000115;
The errors after the 5th iteractions are
| x(5) – x(4) | = 0.0012, | y(5) – y(4) | = 0.00038, | z(5) – z(4) | = 0.00057.
In 5 iteractions, we have for two place accuracy, while 9 iteractions we required in the
Jacobi method.
The matrix function can be written as

−1
1 0 0  0 1 −1 1 0 0  0 
X ( k +1)
= −  −1 3 0   0 0 0  X ( k ) +  −1 3 0   2 
      
 1 0 −2   0 0 0   1 0 −2   −3
 −6 0 0 0 1 −1  −6 0 0  0 
=+
1
−2 −2 0 0 0 0X
(k )
=
1
−2 −2 0   2
6    6  
 3 0 3
  0 0 
0  −3 0 3  −3
0 −6 6 0
− −4 
1 1
=+ 2 X
(k )
0 −2
6  6 
0 −3 3  −9 
Starting with X(0) = [ 0.8 0.8 2.1]T, we get the same iterated values as above.
Example 5: Solve the system given in Example 3 by Gauss-Seidel method.

Solution: For Gauss- Seidel iterations, the system in Example 3 can be written as
x1(k+1) = 0.5 x2(k) – 0.5 x3(k) – 0.5
x2(k+1) = –0.5 x1(k+1) + 0.5 x3(k) +3.0
x3(k+1) = –0.5 x1(k+1) + 0.5 x2(k+1) + 1.5
54
Solution of
Start with (0, 0, 0), we get the following values Linear Algebraic
Equations
X(1) = [– 0.5000 3.2500 0.3750]T
X(2) = [0.9375 2.7188 – 0.6094]T
X(3) = [1.1641 2.1133 – 1.0254]T
X(4) = [1.0693 1.9526 – 1.0583]T
X(5) = [1.0055 1.9681 – 1.0187]T
X(6) = [0.9934 1.9939 – 0.9997]T
X(7) = [0.9968 2.0017 – 0.9976]T
X(8) = [0.9996 2.0014 – 0.9991]T
X(9) = [1.0003 2.0003 – 1.0000]T
X(10) = [1.0001 1.9999 – 1.0001]T
After 10 iterations, the errors in solutions are

| x1(10) – x1(9) | = 0.0002, | x2(10) – x1(9) | = 0.0.0004, | x3(10) – x3(9) | = 0.0001.
The solutions are correct to 3 decimal places.
3.7 SUMMARY
In this unit, we have discussed direct and interative methods for solving a system of
linear equations. Under these categories of methods, used to solve a system of linear
equations, we have discussed Gauss elimination method and LU decomposition
method. We have also discussed the method of finding inverse of a square matrix.
Further, under the category of iterative methods for root determination we have
discussed Jacobi and Gauss Scidel method.
3.8 EXERCISES
E1. Solve the following systems using the Gauss elimination method.
(a) 3x1 + 2x2 + 3x3 = 5, (b) 3x1 + x2 + x3 = 1.8,

x1 + 4x2 + 2x3 = 4, 2x1 + 4x2 + x3 = 2.7,
2x1 + 4x2 + 8x3 = 8, x1 + 3x2 + 5x3 = 4.0,
(c) x1 – x2 + x3 = 0, (d) 3x1 + x2 = 5,

2x1 + 3x2 + x3 − 2x4 = −7 x1 + 3x2 + 6x3 = 6
3x1 + x2 − x3 + 4x4= 12, 4x2 + x3 + 3x4= 7
3x2 – 5x3 + x4= 9 x3 + 5x4= 8,
E2. Solve the following systems using the LU decomposition method.

(a) 3x + y + z = 3, (b) 2x + y + z = 5,
x + 4y + 2z = 0, x + 3y + 2z = 4,
2x + y + 5z = 4, −x + y + 6z = 4,
(c) 4x + y + 2z = 3.6, (d) 3x + y = −2,
x + 3y + z = 2.5, x + 3y - z = 0,
2x + y + 2z = 4.0, − y + 7z = 13.
55
Numerical
Computing -I E3. For problems in 1(a), (b); 2(a), (b), (c), (d), obtain the solution to 3 decimals
using the Jacobi and Gauss Seidel methods. Write the matrix formulations
also. Assume the initial solution vectors respectively as
(i) [0.8, 0.6, 0.5]T,
(ii) [0.3, 0.3, 0.6]T,
(iii) [0.9, −0.6, 0.6]T,
(iv) [1.9, 0.2, 0.9]T,
(v) [0.2, 0.5, 1.1]T,
(vi) [−1.1, 0.9, 2.1]T.
3.9 SOLUTIONS TO ANSWERS

1. (a) 1, ½, ½ . (b) 0.3, 0.4, 0.5.
(c) 1, −1, −2, 2. (d) 3/2, ½, ½, 3/2.
2. (a) 1, −1/2, 1/2. (b) 2, 0, 1.
(c) 0.3, 0.4, 1. (d) −1, 1, 2, (You can also try LLT decomposition)
3. Refer to Page 49 and 52.
56
Differentiation,
Integration and
Differential Equations
UNIT 2 NUMERICAL INTEGRATION
Structure Page Nos
2.0 Introduction 24
2.1 Objectives 25
2.2 Methods Based on Interpolation 25
2.2.1 Methods Using Lagrange’s Interpolation
2.2.2 Methods Using Newton’s Forward Interpolation
2.3 Composite Integration 34
2.4 Summary 38
2.0 INTRODUCTION
In Unit 1, we developed methods of differentiation to obtain the derivative of a
function f(x), when its values are not known explicitly, but are given in the form of a
table. In this unit, we shall derive numerical methods for evaluating the definite
integrals of such functions f(x). You may recall that in calculus, the definite integral
of f(x) over the interval [a, b] is defined as
b
lim
∫ f (x) dx = h → 0 R[h ]
a
(b − a )
where R[h] is the left-end Riemann sum for n subintervals of length h = and is
n
given by
n −1
R [h ] = ∑ h f (x
k −0
k)
for the nodal points x0, x1, …, xn, where x k = x0 +kh and x0 = a, xn = b.
The need for deriving accurate numerical methods for evaluating the definite integral
arises mainly, when the integrand is either
2 sin( x )
i) a complicated function such as f(x) = e − x or f ( x ) = etc. which have no
x
anti-derivatives expressible in terms of elementary functions, or
ii) when the integrand is given in the form of tables.
Many scientific experiments lead to a table of values and we may not only require an
approximation to the function f(x) but also may require approximate representations
of the integral of the function. Also, for functions the integrals of which can be
calculated analytically, analytical evaluation of the integral may lead to
transcendental, logarithmic or circular functions. The evaluation of these functions
for a given value of x may not be accurate. This motivates us to study numerical
integration methods which can be easily implemented on calculators.
In this unit we shall develop numerical integration methods where in the integral is
approximated by a linear combination of the values of the integrand i.e.
b
∫ f (x) dx = β
a
0 f ( x 0 ) + β1 f ( x 1 ) + ............ + β n f ( x n ) (1)
24
where x0, x1, ………xn are the points which divide the interval [a, b] into n sub- Numerical Integration
intervals and β0, β1, ………βn are the weights to be determined. We shall discuss in
this unit, a few techniques to determine the unknowns in Eqn. (1).
2.1 OBJECTIVES
After going through this unit you should be able to:
• state the basic idea involved in numerical integration methods for evaluating the
definite integral of a given function;
• use numerical integration method to find the definite integral of a function f(x)
whose values are given in the form of a table;
• use trapezoidal and Simpson’s rules of integration to integrate such functions
and find the errors in these formulas.
2.2 METHODS BASED ON INTERPOLATION

In Block 2, you have studied interpolation formulas, which fits the given data (xk, fk),
k = 0, 1, 2, …………….., n. We shall now see how these interpolation formulas can
be used to develop numerical integration methods for evaluating the definite integral
of a function which is given in a tabular form. The problem of numerical integration
is to approximate the definite integral as a linear combination of the values of f(x) in
the form
b n
∫a
f ( x ) dx ≈ ∑β
k =0
k fk (2)
where the n + 1 distinct points xk, k = 0, 1, 2, ………., n are called the nodes or
abscissas which divide the interval [a, b] into n sub-intervals with
(x0 < x1, < x2, ……. < xn-1 < xn) and βk, k =0, 1, ….n, are called the weights of the
integration rule or quadrature formula. We shall denote the exact value of the definite
integral by I and denote the rule of integration by
n
I h [f ] = ∑ β k f k (3)
k =0
The error of approximating the integral I by Ih [f] is given by
b n
E h [f ] = f ( x )dx −
∫ ∑β
k =0
kfk (4)
a
The order of the integration method (3) is defined as follows:
Definition : An integration method of the form (3) is said to be of order p if it

produces exact results for all polynomials of degree less than or equal to p.
In Eqn. (3) we have 2n+2 unknowns viz., n + 1 nodes xk and the n + 1 weights, βk and
the method can be made exact for polynomials of degree ≤ 2n + 1. Thus, the method
of the form (3) can be of maximum order 2n + 1. But, if some of the values are
prescribed in advance, then the order will be reduced. If all the n + 1 nodes are
prescribed, then we have to determine only n + 1 weights and the corresponding
method will be of maximum order n.
We first derive the numerical method based on Lagrange’s interpolation.

25
Differentiation, 2.2.1 Methods Using Lagrange’s Interpolation
Integration and
Differential Equations Suppose we are given the n + 1 abscissas xk’s and the corresponding values fk’s are
known that the unique Lagrange’s interpolating polynomial Pnn(x) of degree ≤ n,
satisfying the interpolatory conditions Pn (xk)= f(xk), k = 0, 1, 2, ….. n, is given by
n
f ( x ) ≈ Pn ( x ) = ∑L
k =0
k ( x )f k (5)
with the error of interpolation
π( x ) n +1
E n +1 [Pn ( x )] = f (α) , with x 0 < α < x n (6)
(n + 1)!
( x − x 0 ) ( x − x1 )....( x − x k −1 ) ( x − x k +1 )....( x − x n )
where Lk (x) =
( x k − x 0 ) ( x k − x1 ) ......( x k − x k −1 ) ( x k − x k +1 )....( x k − x n )
and π( x ) = ( x − x 0 ) ( x − x 1 )...( x − x n ) .
We replace the function f(x) in the definite integral (2) by the Lagrange interpolating
polynomial Pn (x) given by Eqn. (5) and obtain
b n b n
I n [f ] = ∫
a
Pn ( x )dx = ∑∫
k =0
a
L k ( x )f k dx = ∑β
k −0
kfk (7)
b
where β k = ∫La
k ( x ) dx. (8)
The error in the integration rule is
b b
π( x )
E n [f ] = E n +1 [Pn ( x )] dx =
∫ ∫ (n + 1)!f
n +1
(α)dx (9)
a a
We have
b
M n +1
E n [f ] ≤ ∫ π( x ) dx (10)
(n + 1)!
a
max
where M n +1 = f n+1 ( x )
x0 < x < xn
Let us consider now the case when the nodes xk ‘s are equispaced with x0 = a, xn = b
b−a
with the length of each subinterval as h = . The numerical integration methods
n
given by (7) are then known as Newton-Cotes formulas and the weights βk’s given
by (8) are known as Cotes numbers. Any point x ε [a, b] can be written as x = x0 +
sh.
With this substitution, we have
π (x) = h
n+1
s (s─1) (s─2) (s─3) ……(s─n)
26
Numerical Integration
(−1) n − k s(s − 1)....(s − k + 1)(s − k − 1).....(s − n )
L k (x) = (11)
k! (n − k )!
Using x = x0 + sh and changing the variable of integration from x to s, we obtain
n
(−1) n − k
βk =
k! (n − k )! ∫
h s(s − 1)(s − 2).........(s − k + 1)(s − k − 1)......(s − n ) ds
0
(12)
n
h n + 2 M n +1
and E n [f ] ≤ ∫ s(s − 1)(s − 2).....(s − n)ds (13)
(n + 1)!
0
We now derive some of the Newton Cotes formulas viz. trapezoidal rule and
Simpson’s rule by using first and second degree Lagrange polynomials with equally
spaced nodes.
Trapezoidal Rule
When n = 1, we have x0 = a, x1= b and h = b─a. Using Eqn. (12) the Cotes numbers can
be found as
1
h
β 0 = − h ∫ (s − 1)ds = ;
0
2
1
h
and ∫
β1 = h s ds =
0
2
.
Substituting the values of β0 and β1 in Eqn. (7), we get
h
IT [ f ] = [f 0 + f1 ] (14)
2
The error of integration is
1
h3 h3 h3
E T [f ] ≤ M2 ∫ s (s − 1) ds = − M2 = M2
2 12 12
0
Where M2 = max f " ( x )

x0 <x<x1
b
Thus, by trapezoidal rule, ∫ f (x ) dx is given by
a
h h3
I[f ] = (f 0 + f 1 ) − M2
2 12
The reason for calling this formula the trapezoidal rule is that geometrically when f(x)
is a function with positive values then h (f0 + f1)/2 is the area of the trapezium when
height h = b─a and parallel sides as fo and f1. This is an approximation to the area
under the curve y = f(x) above the x-axis bounded by the ordinates x = x1 (see Fig. 1.)
27
Differentiation, Since the error given by Eqn. (15) contains the second derivative, trapezoidal rule
Integration and integrates exactly polynomials of degree ≤ 1.
P1
a = x0 x1 = b
Fig. 1
Let us now consider an example.
Example 1: Find the approximate value of
1
dx
I= ∫ 1+ x
0
using trapezoidal rule and obtain a bound for the error. The exact value of
I = ln2 = 0.693147 correct to six decimal places.
Solution: Here x0 = 0, x1=1, and h = 1 – 0 = 1. Using Eqn. (14), we get
1 1
IT [f ] = 1 +  = 0.75
2  2
Actual error = 0.75 – 0.693147 = 0.056853.
The error in the trapezoidal rule is given by
1 2 1
ET [f ] ≤ max 3
= = 0.166667
12 (1 + x ) 6
Thus, the error bound retain is much greater than the actual error.
We now derive the Simpson’s rule.
Simpson’s Rule
b−a a+b
For n = 2, we have h = , x 0 = a, x 1 = and x 2 = b.
2 2
From (12), we find the Cotes numbers as
2
h h
β0 =
20 ∫
(s − 1) (s − 2) ds =
3
28
2 2 Numerical Integration
4h h h
∫
β1 = h s(s − 2) ds =
0
3
, β2 =
20 ∫
s(s − 1) ds = .
3
Eqn. (7) in this case reduces to
h
I S [f ] = [f 0 + 4f1 + f 2 ] (17)
3
b
Eqn. (17) is the Simpson’s rule for approximating I = ∫ f (x ) dx
a
The magnitude of the error of integration is
2
h 4M3
E2[f ] ≤ ∫ s (s − 1) (s − 2) ds
3!
0
h 4M3  
1 2
=
3!  ∫ ∫
 s (s − 1) (s − 2) ds + s(s − 1) (s − 2) ds 

0 1 
 s 4 1 2

h 4M3 2  s4 2
=  − s + s  +  − s + s  
3 3
3!  4  4
0 
 
1 

h 4M3  1 1 
= − =0
3!  4 4 
This indicates that Simpson’s rule integrates polynomials of degree 3 also exactly.
Hence, we have to write the error expression (13) with n = 3. We find
5 2
ES [ f ] ≤h M4
∫ s(s − 1) (s − 2) (s − 3) ds
24
0
h 5M 4  
1 2
=
24  0 ∫ 1
∫
 s(s − 1) (s − 2) (s − 3) ds + s (s − 1) (s − 2) (s − 3) ds 

(18)
− h 5M 4
=
90
max
where M 4 = f IV (X)
x0 < x < x2
Since the error in Simpson’s rule contains the fourth derivative, Simpson’s rule
integrates exactly all polynomials of degree ≤ 3.
b
Thus, by Simpson’s rule, ∫ f (x) dx is given by
a
5
h
I S [f ] = [f 0 + 4f 1 + f 2 ] − h M 4
3 90
29
Differentiation, h
Integration and Geometrically, [f 0 + 4f1 + f 2 ] represents the area bounded by the quadratic curve
Differential Equations 3
passing through (x0 , f0), (x1 , f1) and (x2 , f2) above the x-axis and lying between the
ordinates x = x0, x = x2 (see figure. 2).
y
f
P2
x
a= x0 x1 x2 = b
Fig. 2
In case we are given only one tabulated value in the interval [a, b], then h = b – a, and
the interpolating polynomial of degree zero is P0(x) = fk. In this case, we obtain the
rectangular integration rule given by
b
I R [f ] = f k dx ≈ hf k
∫ (19)
a
The error in the integration rule is obtained from Eqn. (13) as
h 2 M1
ER f ≤ (20)
2
where max
M1 = f '(x )
a < x < b
If the given tabulated value in the interval [a, b] is the value at the mid-point then we
( a + b)
have x k = , and f k = f 1 . In this case h = b – a and we obtain the integration
2 k+
2
rule as
b
I M [f ]= f ∫ 1 dx ≈ hf 1
(21)
k+ k+
a 2 2
Rule (21) is called the mid-point rule. The error in the rule calculated from (13) is
h2
E M [f ]=
1/ 2
2 ∫
−1 / 2
s ds = 0
This shows that mid-point rule integrates polynomials of degree one exactly. Hence
the error for the mid-point rule is given by
30
Numerical Integration
h 3M 2 h 3M 2
E M [f ]≤
1/ 2
2 ∫−1 / 2
s (s − 1) ds =
24
(22)
max
where M 2 = f " ( x ) and h = b – a
a<x<b
We now illustrate these methods through an example.
∫
2
Example 2 : Evaluate e − x dx, u sin g
0
a) Rectangular rule b) mid-point rule c) trapezoidal rule and d) Simpson’s rule. If

the exact value of the integral is 0.74682 correct to 5 decimal places, find the
error in these rules.
2
Solution: The values of the function f(x) = e − x at x = 0, 0.5 and 1 are
f(0) = 1, f(0.5) = 0.7788, f(1) = 0.36788
Taking h = 1 and using
a) I R [f ]= hf 0, we get I R [f ]=1.
b) I M [f ]= hf1 / 2, we get I M [f ]= 0.7788
h
c) I T [f ] = [f 0 + f1 ] we get I T [f ] = 1 (1 + 0.36788) = 0.68394.
2 2
Taking h = 0.5 and using Simpson’s rule, we get
h
d) I S [f ] = [f 0 + 4f1 + f 3 ]
3
h
= [f (0) + 4f (0.5) + f (1)]
3
= 0.74718
Exact value of the integral is 0.74682.
The errors in these rules are given by
E R [f ]= − 0.25318, E M [f ]= − 0.03198
E T [f ] = 0.06288, E s [t ] = − 0.00036.
You may now try the following exercise:

Ex.1) Use the trapezoidal and Simpson’s rule to approximate the following
integrals. Compare the approximations to the actual value and find a bound
for the error in each case.
2
a) ∫ 1n x dx
1
31
Differentiation, 0.1
∫x
1/ 3
Integration and b) dx
π/4
c) ∫ tan x dx
0
We now derive integration methods using Newton’s forward interpolation formula.
2.2.2 Methods Using Newton’s Forward Interpolation

Let the data be given at equi-spaced nodal points x k = x 0 + sh, s = 0,1, 2, ........., n ,
Where x0 = a and xn = x0 + nh = b.
b−a
The step length is given by h = .
n
The Newton’s forward finite difference interpolation formula interpolating this data is
given by
∆2f 0 s (s − 1) (s − 2) ... (s − n + 1) ∆n f 0
f ( x ) ≈ Pn ( x ) = f 0 + s∆f 0 + s(s − 1) + ... + (23)
2 n!
with the error of interpolation
h n +1 s(s − 1) (s − 2) ... (s − n ) n +1
E n [f ]= f (α )
(n + 1) !
Integrating both sides of Eqn. (23) w.r.t. x between the limits a and b, we can
approximate the definite integral I by the numerical integration rule
b 1
 s (s − 1) 2 
I h [f ] = Pn ( x ) dx = h
∫ ∫ f 0 + s ∆f 0 + ∆ f 0 + ... ds (24)
2 
a 0
The error of interpolation of (24) is given by
1
h n + 2 M n +1
E h (f ) ≤
(n + 1) ! 0 ∫
s (s − 1) (s − 2) ... (s − 1) ds
We can obtain the trapezoidal rule (14) from (24) by using linear interpolation i.e.,
f(x) ≈ P1 ( x ) = f 0 + s ∆ f 0 . We then have
1
IT (x ) = h ∫ [f
0
0 + s∆f 0 ]ds
1
 s2 
= h s f 0 + ∆f 0 
 2 0
 ∆f  h
= h f 0 + 0  = [f 0 + f1 ]
 2  2
32
with the error of integration given by (15). Numerical Integration
Similarly Simpson’s rule (16) can be obtained from (24) by using quadratic
interpolation i.e., f (x) ≈ P2 (x).
Taking x0 = a, x1 = x0 + h, x2 = x0 + 2h = b, we have
b 2
 s (s − 1) 2 
Is [f ]= ∫ f ( x ) dx ≈ h ∫ f 0 + s ∆f 0 + ∆ f 0  ds
a 0  2 
 ∆2 f 0 
= h 2 f 0 + 2∆f 0 + 
 3 
h
= [f n + 4f1 + f 2 ].
3
The error of interpolation is given by Eqn. (18).
1
dx
Example 3: Find the approximate value of I = ∫ 1 + x u sin g
0
Simpson’s rule. Obtain the error bound and compare it with the actual error. Also
compare the result obtained here with the one obtained in Example 1.
1
Solution: Here x0 = 0, x1 = 0.5, x2 = 1 and h = .
2
Using Simpson’s rule we have
I s [f ] =
h
[f (0) + 4f (0.5) + f (1)] = 1 1 + 8 + 0.5 = 0.694445
3 6 3 
Exact value of I =ln 2 = 0.693147.
Actual error = 0.001297. The bound for the error is given by
h5 24
E s [f ] ≤ M 4 = 0.00833, where M 4 = max = 24
90 (1 + x ) 5
Here too the actual error is less than the given bound.
Also actual error obtained here is much less than that obtained in Example 1. You
may now try the following exercise.
1.5
∫e
x
Ex. 2) Find an approximation to dx, u sin g
1.1
a) the trapezoidal rule with h = 0.4
b) Simpson’s rule with h = 0.2
The Newton-Cotes formulas as derived above are generally unsuitable for use over
large integration intervals. Consider for instance, an approximation to
4
∫e
x
dx, using Simpson’s rule with h = 2. Here
0
33
Differentiation, 4
∫ e dx ≈ 3 (e )
x 2 0
Integration and + 4e 2 + e 4 = 56.76958.
Since the exact value in this case e4 – e0 = 53.59815, the error is –3.17143. This error
is much larger than what we would generally regard as acceptable. However, large
error is to be expected as the step length h = 2.0 is too large to make the error
expression meaningful. In such cases, we would be required to use higher order
formulas. An alternate approach to obtain more accurate results while using lower
order methods is the use of composite integration methods, which we shall discuss in
the next section.
2.3 COMPOSITE INTEGRATION

In composite integration we divide the given interval [a, b] into a number of
subintervals and evaluate the integral in each of the subintervals using one of the
integration rules. We shall construct composite rules of integration for trapezoidal
and Simpson’s methods and find the corresponding errors of integration when these
composite rules are used.
Composite Trapezoidal Rule
(b − a )
We divide the interval [a, b] into N subintervals of length h = . We denote the
N
subintervals as
( x k −1 , x k ), k =1, 2, .. , N where x 0 = a , x N = b. Then
b N xk
∫
I = f ( x ) dx = ∑ ∫
k =1 x k −1
f ( x ) dx (25)
a
Evaluating each of the integrals on the right hand side by trapezoidal rule, we have
N (26)
h
I T [f ] = ∑ [f k −1 + f k ] = h [f 0 + f N + 2(f1 + f 2 + ..... + f N −1 ) ]
k =1
2 2
`
h  
3 N
E T [f ] = − ∑
 f " (α i ) , x k −1 < α i < x k , for k = 1...., N.
12  i =1 
Now since f is a continuous function on the interval [a, b] we have as a consequence

of Intermediate-value theorem
N N
∑ i =1
f " (α i ) = f " (ξ) ∑ 1, where a < ξ < b.
i =1
3
−h
∴E T [f ]= f " (ξ) N, a < ξ < b.
12
− Nh 3
= h f " ( ξ)
12
− ( b − a ) h 2.
= f " (ξ).
12
34
max Numerical Integration
If M 2 = f " (ξ) . Then
a<ξ<b
(b − a ) h 2
E T [f ] ≤ M2 (27)
12
The error is of order h2 and it decreases as h decreases
Composite trapezoidal rule integrates exactly polynomials of degree ≤ 1. We can try

to remember the formula (26) as
h
I T [f ]=   [first ordinate + last ordinate + 2 (sum of the remaining ordinates)].
2
Composite Simpson’s Rule

In using Simpson’s rule of integration (17), we need three abscissas. Hence, we divide
the interval (a, b) into an even number of subintervals of equal length giving an odd
b−a
number of abscissas in the form a = x0 < x1 < x2 < ….. < x2n = b with h = and
2N
xk = x0 + kh, k = 0, 1, 2, …, 2N. We then write
b N x 2k
∫
I = f ( x) dx = ∑ ∫
k =1 x 2 k − 2
f ( x ) dx (28)
a
Evaluating each of the integrals on the right hand side of Eqn. (28) by the Simpson’s
rule, we have
h
N = [f0 + f2N + 4(f1 + f3 +....+ f2N−1) + 2(f2 + f4 + ....+ f2N−2)] (29)
h
Is[f ] =
∑ [f2k−2+4f2k−1+f2k] 3
k=1
3
The formula (29) is known as the composite Simpson’s rule of numerical
integration. The error in (29) is obtained from (18) by adding up the errors. Thus we
get
h5  
N
E s [f ] = − ∑IV
 f (α )  , x 2 k − 2 < α k < x 2 k
90  k =1 
N
h5
= −
90
f iv
(ξ ) ∑i =1
1, a < ξ < b
Nh 5 IV
=− f (ξ)
90
(b − a ) h 4 IV
=− f (ξ).
180
max (b − a )
If M 4 = f IV (ξ) write using h =
a ≤ ξ ≤b 2N
(b − a ) 4
E s [f ] ≤ h M4 (30)
180
35
Differentiation, The error is of order h4 and it approaches zero very fast as h → 0. The rule integrates
Integration and exactly polynomials of degree ≤ 3. We can remember the composite Simpson’s rule
as
h
Is [f ]=   [first ordinate + last ordinate + 2 (sum of even ordinates) + 4 (sum of the remaining odd ordinates)]
3
We now illustrate composite trapezoidal and Simpson’s rule through examples.
1
dx
Example 4: Evaluate ∫ 1+ x
0
using
(a) Composite trapezoidal rule (b) composite Simpson’s rule with 2, 4 and 8
subintervals.
1
Solution: We give in Table 1 the values of f (x) with h = from x = 0 to x = 1.
8
Table 1
x : 0 1/8 2/8 3/8 4/8 5/8 6/8 7/8 1
f(x) : 1 8/9 8/10 8/11 8/12 8/13 8/14 8/15 8/16

f0 f1 f2 f3 f4 f5 f6 f7 f8
If N = 2 then h = 0.5 and the ordinates f 0, f4 and f8 are to be used
We get
1
I T [f ]= [f 0 + 2f 4 + f8 ]= 17 = 0.708333
4 24
1 25
Is [f ]= [f 0 + 4f1 + f 8 ]= = 0.694444
6 36
If N = 4 then h = 0.25 and the ordinates f0, f2, f4, f6, f8 are to be used.
We have
1
I T [f ] = [f 0 + f8 + 2(f 2 + f 4 + f 6 )]= 0.697024
8
1
I T [f ] = [f 0 + f 8 + 4 (f 2 + f 6 ) + 2 f 4 ] = 0.693254
12
If N = 8 then h = 1/8 and all the ordinates in Table 1 are to be used.
We obtain
1
I T [f ] = [f 0 + f8 + 2 (f1 + f 2 + f 3 + f 4 + f 5 + f 6 + f 7 )]= 0.694122
16
1
Is [f ] = [f 0 + f8 + 4 (f1 + f 3 + f 5 + f 7 ) + 2 (f 2 + f 4 + f 6 )]= 0.693147
24
The exact value of the given integral correct to six decimal places is ln2 = 0.693147.
We now give the actual errors in Table 2 below.

36
Table 2 Numerical Integration
N ET (f) Es (f)
2 0.015186 0.001297
4 0.003877 0.000107
8 0.000975 0.000008
Note that as h decreases, the errors in both trapezoidal and Simpson’s rule also
decreases. Let us consider another example.
1
dx
Example 5: Find the minimum number of intervals required to evaluate
0
1 + x ∫
with
an accuracy 10-6, by using the Simpson rule.
Solution: In example 4 you may observe from Table 2 that N ≈ 8 gives

(1.E─06) accuracy. We shall now determine N from the theoretical error bound for
Simpson’s rule which gives 1.E − 06 accuracy. Now
(b − a )5 M 4
E s [f ] ≤
2880 N 4
where
max
M4 = f IV ( x )
0 < x <1
max 24
= = 24
0 < x < 1 (1 + x )5
To obtain the required accuracy we should therefore have
24 24 * 10 6
≤ 10 − 6 , or N 4 ≥ = 8333.3333
2880 N 4 2880
∴N ≥ 9.5
We find that we cannot take N = 9 since to make use of Simpson’s rule we should
have even number of intervals. We therefore conclude that N = 10 should be the
minimum number of subintervals to obtain the accuracy 1.E – 06 (i.e., 10-6)
1
dx
Ex.3) Evaluate ∫1+ x
0
2
by subdividing the interval (0, 1) into 6 equal parts and
using.
(a) Trapezoidal rule (b) Simpson’s rule. Hence find the value of π and actual errors.
Ex. 4) A function f(x) is given by the table
x : 1.0 1.5 2.0 2.5 3.0
f(x) : 1.000 2.875 7.000 14.125 25.000
37
Differentiation, Ex.5) The speedometer reading of a car moving on a straight road is given by the
Integration and following table Estimate the distance traveled by the car in 12 minutes using
(a) Trapezoidal rule (b) Simpson’s rule.
Time : 0 2 4 6 8 10 12
(minutes)
Speedo meter : 0 15 25 40 45 20 0
Reading
0.4
Ex.6) Evaluate ∫ (sin x − 1n x + e x ) dx

0.2
using (a) Trapezoidal rule (b)
Simpson’s rule taking h = 0.1. Find the actual errors.
∫
2
Ex.7) Determine N so that composite trapezoidal rule gives the value of e − x dx
0
−x 2
correct upto 3 digits after the decimal point, assuming that e can be calculated
accurately.
2.4 SUMMARY
In this unit, we learnt the following :
1) If a function f(x) is not known explicitly but a table of values of x is given or

when it has no anti-derivative expressible in terms of elementary functions then
its integral cannot be obtained by calculus methods. In such cases numerical
integration methods are used to find the definite integral of f(x) using the given
data.
2) The basic idea of numerical integration methods is to approximate the definite

integral as a linear combination of the values of f(x) in the form
b n
∫ f ( x ) dx ≈ ∑β
k =0
k f (x k ) (See Eqn. (21))
a
where the (n+1) distinct nodes x k , k = 0,1, ...., n , with x 0 < x1 < x 2 < .... < x n divide
the interval (a, b) into n subintervals and β k , k = 0,1, ...., n are the weights of the
integration rule. The error of the integration methods is then given by
b n
E h [f ] = f ( x ) dx −
∫ ∑β
k =0
k f (x k ) (see Eqn. (4))
a
3) For equispaced nodes, the integration formulas derived by using Lagrange

interpolating polynomials Pn (x) of degree ≤ n, satisfying the interpolatory
conditions Pn ( x k ) = f ( x k ), k = 0,1, ..., n are known as Newton-Cotes formulas.
Corresponding to n = 1 and n = 2, Newton-Cotes formulas viz., trapezoidal rule
and Simpson’s rule are obtained.
38
4) For large integration intervals, the Newton-Cotes formulas are generally Numerical Integration
unsuitable for they give large errors. Composite integration methods can be
used in such cases by dividing the interval into a large number of subintervals
and evaluating the integral in each of the subintervals using one of the
integration rules.
2.5 SOLTIONS/ANSWERS
h
E1) a) I T [f ]= [f 0 + f1 ]= 0.346574
2
h
Is [f ]= [f 0 + 4f1 + f 2 ]
3
0 .5
= [41n 1.5 +1n 2]= 0.385835
3
Exact value of I = 0.386294
Actual error in I T [f ]= 0.03972
Actual error in IS [f ]= 0.0000459
Also
h 3 max 1 1
E T [f ] ≤ − 2
=− = − 0.083334
12 1 < x < 2 x 12
h5 max 6
E s [f ] ≤ − = − 0.002083
90 1 < x < 2 x 4
b) I T [f ]= 0.023208, E T [f ] = none .
IS [f ]= 0.032296, E S [f ] = none .
Exact value = 0.034812.
c) I T [f ] = 0.39270, E T [f ] = 0.161.
IS [f ]= 0.34778 , E S [f ] = 0.00831.
Exact value = 0.34657
E2) I T [f ]=1.49718
IS [f ]=1.47754
1
E3) With h = 1/6, the values of f(x) =
1 + x2
From x = 0 to 1 are
x : 0 1/6 2/6 3/6 4/6 5/6 1
f(x) : 1 0.972973 0.9 0.8 0.692308 0.590167 0.5

39
Differentiation, Now
Integration and
h
I T [f ]= [f 0 + f 6 + 2(f1 + f 2 + f 3 + f 4 + f 5 )]
2
= 0.784241
h
IS [f ]= [f 0 + f 6 + 4 (f1 + f 3 + f 5 ) + 2(f 2 + f 4 )]
3
= 0.785398
[ ]10 = π4 .
1
dx
∫0
1+ x 2
= tan −1x Exact π = 3.141593
Value of π from I T [f ]= 4 x 0.784241 = 3.136964
Error in calculating π by I T [f ] is E T [f ]= 0.004629
Value of π from Is [f ]= 4 x 0.785398 = 3.141592
Error in π by I s [f ] is E s [f ]= 1.0 ×10 −6
h
E4) I T [f ]=   [f 0 + f 4 + 2(f1 + f 2 + f 3 ) ]
2
= (1 / 4 )[1 + 25 + 2(2.875 + 7 + 14.125)] = 18.5

h
Is [f ]=   [f 0 + f 4 + 2f 2 + 4(f1 + f 3 )]
3
= (1 / 6) [1 + 25 + 2 x 7 + (2.875 + 14.125)]=18
E5) Let v 0 = 0, v 1 = 15, v 2 = 25, v 3 = 40, v 4 = 45, v 5 = 20, v 6 = 0. Then
12
I= ∫
0
v dt ,
h
I T [v] =   [v 0 + v 6 + 2( v1 + v 2 + v 3 + v 4 + v 5 )] =290
2
880
Is [v]= = 293.33.
30
E6) The values of f(x) = sin x = 1n x + ex are
f (0.2) = 3.02951, f (0.3) = 2.849352, f (0.4) = 2.797534
 0.1 
I T [f ]=   [f (0.2) + 2f (0.3) + f (0.4)]= 0.57629
 2 
 0.1 
IS [f ]=   [f (0.2) + 4f (0.3) + f (0.4)]= 0.574148
 3 
Exact value = 0.574056

40
ET = 2.234 x 10-3 Numerical Integration
Es = 9.2 x 10-5
E7) Error in composite trapezoidal rule
(b − a ) 3 max
E T [f ] = − 2
M2 , M2 = f " (x) .
12 N 0 < x <1
Thus
1 max
E T [f ] ≤ 2
f " (x)
12 N 0 ≤ x ≤1
2 2
f ( x ) = e − x , f " ( x ) = e − x (4x 2 − 2)
2
f " ' ( x ) = e − x 4x (3 − 2 x 2 ) = 0 when or x = 0, x = 1.5
max [f " (0) , f " (1)] = max 2, 2e −1 = 2 [ ]

For getting the correct value upto 3 digits, we must have
2 103 10 4
< 10 − 03 or N 2 > =
12 N 2 6 60
or
100
N> ≈12.9 .
(60)
The integer value of N =13.
41
Differentiation,
Integration and
42
Differentiation,
Integration and
UNIT 3 NUMERICAL SOLUTION OF
ORDINARY DIFFERENTIAL
EQUATIONS
Structure Page Nos.
3.0 Introduction 42
3.1 Objectives 42
3.2 Basic Concepts 42
3.3 Taylor Series Method 49
3.4 Euler’s Method 52
3.5 Summary 56
3.0 INTRODUCTION
In the previous two units, you have seen how a complicated or tabulated function can
be replaced by an approximating polynomial so that the fundamental operations of
calculus viz., differentiation and integration can be performed more easily. In this
unit we shall solve a differential equation, that is, we shall find the unknown function
which satisfies a combination of the independent variable, dependent variable and its
derivatives. In physics, engineering, chemistry and many other disciplines it has
become necessary to build mathematical models to represent complicated processes.
Differential equations are one of the most important mathematical tools used in
modeling problems in the engineering and physical sciences. As it is not always
possible to obtain the analytical solution of differential equations recourse must
necessarily be made to numerical methods for solving differential equations. In this
unit, we shall introduce two such methods namely, Euler’s method and Taylor series
method to obtain numerical solution of ordinary differential equations (ODEs). To
begin with, we shall recall few basic concepts from the theory of differential equations
which we shall be referring to quite often.
3.1 OBJECTIVES
After studying this unit you should be able to:
• identify the initial value problem (IVP) for the first order ordinary differential
equations;
• state the difference between the single step and multi-step methods of finding
solution of IVP;
• obtain the solution of the initial value problems by using single-step methods
viz., Taylor series method and Euler’s method.
3.2 BASIC CONCEPTS

In this section we shall state a few definitions from the theory of differential equations
and define some concepts involved in the numerical solution of differential equations.
Definition: An equation involving one or more unknown function (dependent

variables) and its derivatives with respect to one or more known functions
(independent variables) is called differential equation.
For example,
42
dy Numerical Solution of
x = 2y (1) Ordinary Differential
dx Equations
∂z ∂z
x +y −z=0 (2)
∂x ∂y
are differential equations.
Differential equations of the form (1), involving derivatives w.r.t. a single independent
variable are called ordinary differential equations (ODEs) whereas, those involving
derivatives w.r.t. two or more independent variables are partial differential
equations (PDEs). Eqn. (2) is an example of PDE.
Definition: The order of a differential equation is the order of the highest order
derivative appearing in the equation and its degree is the highest exponent of the
highest order derivative after the equation has been rationalized i.e., after it has been
expressed in the form free from radicals and any fractional power of the derivatives or
negative power. For example equation
2 3
 d3y  2
  + 2 d y − dy + x 2  dy  = 0 (3)
 dx 3  dx 2 dx  dx 
 
is of third order and second degree. Equation
dy a
y= x +
dx dy / dx
is of first order and second degree as it can be written in the form
2
dy  dy 
y = x  + a (4)
dx  dx 
Definition: When the dependent variable and its derivatives occur in the first degree
only and not as higher powers or products, the equation is said to be linear, otherwise
it is nonlinear.
d2y dy
Equation 2
+ y = x 2 is a linear ODE, whereas (x+y)2 = 1 is a nonlinear ODE.
dx dx
2
∂2z  ∂2z 
∂2z
Similarly, + −   = 0, is a nonlinear PDE.
∂x 2 ∂y 2  ∂x ∂y 
In this unit we shall be concerned only with the ODEs.
The general form of a linear ODE of order n can be expressed in the form
L[y] ≡ a0 (t) y(n) (t) + a1 (t) y(n-1) (t) + …..+ an-1 (t) y′ (t) + an (t) y (t) = r(t) (5)
where r(t), ai (t), i = 0,1, 2, ……………., n are known functions of t and
dn d n −1 d
L = a0(t) n
+ a1 (t) n −1
+ ……………+an-1 (t) + an (t),
dt dt dt
is the linear differential operator. The general nonlinear ODE of order n can be
written as
43
Differentiation, F(t, y, y′, y′′, ……………, y(n)) = 0 (6)
Integration and
or, y(n) = f (t, y, y′, y′′,…………, y(n-1)). (7)
Eqn. (7) is called a canonical representation of Eqn. (6). In such a form, the highest
order derivatives is expressed in terms of lower order derivatives and the independent
variable.
The general solution of an nth order ODE contains n arbitrary constants. In order to
determine these arbitrary constants, we require n conditions. If all these conditions
are given at one point, then these conditions are known as initial conditions and the
differential equation together with the initial conditions is called an initial value
problem (IVP). The nth order IVP alongwith associates initial conditions can be
written as
y(n) (t) = f (t, y, y′, y′′, ………., y(n-1))

y(p) (t0) = y (0p ) , p = 0, 1, 2, …….., n-1. (8)
We are required to find the solution y(t) for t > to
If the n conditions are prescribed at more than one point then these conditions are
known as boundary conditions. These conditions are prescribed usually at two
points, say t0 and ta and we are required to find the solution y(t) between to <t<ta .The
differential equation together with the boundary conditions is then known as a
boundary value problem (BVP).
As may be seen below, the nth order IVP (8) is equivalent to solving following system
of n first order equations:
Setting y = y 1,
Similarly setting y'i −1 = yi, we may write
y′ = y′1= y2 y1(t0) = y0
y′2 = y3 y2(t0) = y′0
…………. ……………
y′n-1 = yn yn-1 (t0) = y (0n − 2)
y′n = f(t, y1,y2, ………., yn) yn (t0) = y (0n −1) ;
In vector notation, this system can be written as a single equation as
dy
= f (t, y), y (t0) = α (9)
dx
where y = (y1, y2 ……….., yn)T, f(t,y) = (y2, y3, ………., f(t, y1, ……., yn))T
α (y0, y′0, …….., y (0n −1) )T.
Hence, it is sufficient to study numerical methods for the solution of the first order
IVP.
y′ = f(t, y), y(t0) = y0 (10)
The vector form of these methods can then be used to solve Eqn. (9). Before
attempting to obtain numerical solutions to Eqn, (10), we must make sure that the
44
problem has a unique solution. The following theorem ensures the existence and Numerical Solution of
Ordinary Differential
uniqueness of the solution to IVP (10). Equations
Theorem 1: If f(t, y) satisfies the conditions
i) f(t, y) is a real function

ii) f(t, y) is bounded and continuous for t ε [t0,b], y ε ] - ∞, ∞ [
iii) there exists a constant L such that for any t ε [t0, b] and for any two numbers y1
and y2
 f (t, y1) – f (t, y2)  ≤ L  y1 –y2 
then for any y0, the IVP (10) has a unique solution. This condition is called the
Lipschitz condition and L is called the Lipschitz constant.
We assume the existence and uniqueness of the solution and also that f(t,y) has
continuous partial derivatives w.r.t. t and y of as high order as we desire.
Let us assume that [t0, b] be an interval over which the solution of the IVP (10) is
required. If we subdivide the interval [t0, b] into n subintervals using a step size
tn − to 
h=   , where tn = b, we obtain the mesh points or grid points t0, t1, t2, ….., tn
 n 
as shown in Fig. 1.
h h
t0 t1 t2 tk tn-1 tn = b
Fig. 1
We can then write tk = t0 + kh, k = 0, 1, ….., n. A numerical method for the solution
of the IVP (10), will produce approximate values yk at the grid points tk in a step by
step manner i.e. values of y1, y2, …. etc in unit order.
Remember that the approximate values yk may contain the truncation and round-off
errors. We shall now discuss the construction of numerical methods and related basic
concepts with reference to a simple ODE.
dy
= λy, tε [a b] (11)
dt
y (t0) = y0,
where λ in a constant.
Let the domain [a, b] be subdivided into N intervals and
let the grid points be defined by,
tj = t0 + jh, j = 0, 1, ………….., N
where t0 = a and tN = t0 + Nh = b.
Separating the variables and integrating, we find that the exact solution of Eqn. (11) is
y(t) = y(t0) eλ(t-t0) . (12)

y(t0) eλ(t-t0)
45
Differentiation, In order to obtain a relation between the solutions at
Integration and two successive permits, t = tn and tn+1 in Eqn. (12), we use,
y(tn) = y(t0) e λ ( t n − t 0 )
and
−t 0 )
y(tn+1) = y(t0) e λ ( t n +1 .
Dividing we get
y( t n +1 ) e λt n +1 − tn )
= λt = e λ ( t n +1 .
y( t n ) e n
Hence we have,
y(tn+1) = eλh y (tn), n = 0, 1, ….., N-1. (13)
Eqn. (13) gives the required relation between y (tn) and y(tn+1).
Setting n = 0, 1, 2, ……, N-1, successively, we can find y(t1), y(t2),…….., y(tN) from
the given value y(t0).
An approximate method or a numerical method can be obtained by approximating eλh

in Eqn. (13). For example, we may use the following polynomial approximations,
eλh = 1 + λh + 0 (λh2 ) (14)

2 2
λ h
eλh = 1 + λh + + 0 (λh3) (15)
2
λ2 h 2 λ3 h 3
eλh = 1 +λh + + + 0 ( λh4) (16)
2 6
and so on.
Let us retain (p + 1) terms in the expansion of eλh and denote the approximation to eλh
by E(λh). The numerical method for obtaining the approximate values yn of the exact
solution y(tn) can then be written as
yn+1 = E(λh) yn, n = 0, 1, ………, N-1 (17)
The truncation error (TE) of the method is defined by
TE = y (tn+1) – yn+1.
Since (p +1) terms are retained in the expansion of eλh, we have

TE = 1 + λh + ......... +
(λh ) + (λh ) p p +1  
e θλh  − 1 + λh + ....... +
(λ h ) p 

 p! (p+1)!  
  p! 

=
(λh )p +1 e θλh , 0<θ<1
(p + 1) !
The TE is of order p+1 and the numerical method is called of order p.
The concept of stability is very important in a numerical method.
46
We say that a numerical method is stable if the error at any stage, i.e. yn –y(tn) = εn Numerical Solution of
remains bounded as n → ∞.Let us examine the stability of the numerical method Equations
(17). Putting yn+1 = y (tn+1) + εn+1 and yn = y(tn) + εn in Eqn. (17), we get
y(tn+1) + εn+1 = E(λh) [y(tn) + εn]
εn+1 = E(λh) [y(tn) + εn] – y (tn+1)
which on using eqn. (13) becomes
εn+1 = E(λh) [y(tn) + εn] - eλhy(tn)
∴ εn+1 = [E(λh) - eλh] y(tn) + E(λh) εn (18)
We note from Eqn. (18) that the error at tn+1 consists of two parts. The first part
E(λh) -eλh is the local truncation error and can be made as small as we like by
suitably determining Eλh. The second part E(λh) εn is the propagation error
from the previous step tn to tn+1and will not grow if E(λh) < 1. If E(λh) < 1,
then as n → ∞ the propagation error tends to zero and method is said to be absolutely
stable. Further, a numerical method is said to be relatively stable if
E(λh) ≤ eλh, λ >0.
The polynomial approximations (14), (15) and (16) always give relatively stable
methods. Let us now find when the methods yn+1 = E(λh) yn are absolutely stable
where E(λh) is given by (14) (15) or (16).
These methods are given by
First order: yn+1 = (1 + λh) yn
 λ2 h 2 
Second order: yn+1 = 1 + λh +  y n and
 2 
 λ2 h 2 λ3 h 3 
Third order: yn+1 = 1 + λh + + y n
 2 6 
Let us examine the conditions for absolute stability in various methods:
First order: 1 + λh ≤ 1

or -1≤ 1 + λh ≤ 1
or -2≤ λh ≤ 0
λ2 h 2
Second order:  1 + λh + ≤1
2
λ2 h 2
or – 1 ≤ 1 + λh + ≤1
2
The right inequality gives
 λh 
λh 1 + ≤ 0
 2 
λh
i.e., λh ≤ 0 and 1 + ≥ 0.
2
The second condition gives –2 ≤ λh. Hence the right inequality gives –2 ≤ λh ≤ 0.
47
Differentiation, The left inequality gives
Integration and
λ2 h 2
2 + λh + ≥ 0.
2
For –2 ≤ λh ≤ 0, this equation is always satisfied. Hence the stability condition is

-2 ≤ λh ≤0.
λ2 h 2 λ3 h 3
Third order: 1 + λh + + ≤ 1
2 6
Using the right and left inequalities, we get
─ 2.5 ≤ λh ≤ 0.
These intervals for λh are known as stability intervals.
Numerical methods for finding the solution of IVP given by Eqn. (10) may be broadly
classified as,
i) Singlestep methods
ii) Multistep methods
Singlestep methods enable us to find yn+1, an approximation to y(tn+1), in terms of

yn, and y′n.
Multistep methods enable us to find yn+1, an approximation to y(tn+1), in terms of

yi, y′i , i = n, n-1, ……n-m+1 i.e. values of y and y′ at previous m points. Such
methods are called m-step multistep methods.
In this course we shall be discussing about the singlestep methods only.
A singlestep method for the solution of the IVP

y′ = f(t, y), y(t0) = y0, t ε (t0, b)
is a recurrence relation of the form
yn+1 = yn + h φ (tn, yn, h) (19)
where φ (tn, yn, h) is known as the increment function.
If yn+1 can be determined from Eqn. (19) by evaluating the right hand side, then the
singlestep method is known as an explicit method, otherwise it is known as an
implicit method. The local truncation error of the method (19) is defined by
TE = y ( t n +1 ) -y(tn) - h φ (tn, yn, h) . (20)
The largest integer p such that
h-1 TE = O (hp) (21)

is called the order of the singlestep method.
Let us now take up an example to understand how the singlestep method works.
Example 1: find the solution of the IVP y′ = λy, y(0) = 1 in 0 < t ≤ 0.5, using the first
order method
yn+1= (1 + λh) yn with h = 0.1 and λ = ± 1.
48
0.5 0.5 Numerical Solution of
Solution: Here the number of intervals are N = = =5 Ordinary Differential
h 0.1 Equations
We have y0 = 1
y1 = (1 + λh) y0 = (1 + λh) = (1 + 0.1λ)
y2 = (1 + λh) y1 = (1 + λh)2 = (1 + 0.1λ)2
---------------------------------------------------
y5 = (1 + λh)5 = (1 + 0.1λ)5
The exact solution is y(t) = eλt.

We now give in Table 1 the values of yn for λ = ± 1 together with exact values.
Table 1
Solution of y′ = λy, y(0) = 1, 0 ≤ t ≤ 0.5 with h = 01.
λ=1 λ = ─1
t First Order Exact First Order Exact
method Solution method Solution
0 1 1 1 1
0.1 1.1 1.10517 0.9 0.90484
0.2 1.21000 1.22140 0.81 0.81873
0.3 1.33100 1.34986 0.729 0.74082
0.4 1.46410 1.49182 0.6561 0.67032
0.5 1.61051 1.64872 0.59049 0.60653
Similarly you can obtain the solution using the second order method and compare the
results obtained in the two cases.
Ex 1) Find the solution of the IVP

y′ = λy, y(0) = 1
in 0 ≤ t ≤ 0.5 using the second order method
 λ2 h 2 
yn+1 = 1 + λh + y with h = 0.1 and λ = 1.
 n
 2 
We are now prepared to consider numerical methods for integrating differential

equations. The first method we discuss is the Taylor series method. It is not strictly a
numerical method, but it is the most fundamental method to which every numerical
method must compare.
3.3 TAYLOR SERIES METHOD

Let us consider the IVP given by Eqn. (10), i.e.,
y′ = f(t,y), y(t0) = y0, t ε [t0, b]
The function f may be linear or nonlinear, but we assume that f is sufficiently

differentiable w.r.t. both t and y.
The Taylor series expansion of y(t) about any point tk is given by
(t − t k )2 (t − t k )p
y(t) = y(tk) + (t-tk) y′ (tk) + y ′′ ( t k ) + ....... + y ( p ) ( t k ) +……. (22)
2! p!
Substituting t = tk+1 in Eqn. (22), we have

49
Differentiation, h 2 y ′′( t k ) h p y (p) ( t k )
Integration and y(tk+1) = y(tk) + hy′ (tk) + + ........ + + ..... (23)
Differential Equations 2! p!
where tk+1 = tk + h. Neglecting the terms of order hp+1 and higher order terms, we have
the approximation
h2 h p (p)
yk+1 = yk + hy′k + y ′k′ + ..... + y (24)
2! p! k
= yk + h φ (tk, yk, h)
h h p −1 ( p )
where φ (tk, yk, h) = y′k + y ′k′ + ..... + y
2! p! k
This is called the Taylor Series method of order p. The truncation error of the method
is given by
TE = y (tk+1) – y(tk ) - hφ (tk,y(tk), h) (25)
h p +1 ( p +1)
= y ( t k + θh ), 0 < θ < 1
(p + 1) !
when p = 1, we get from Eqn. (24)

yk+1 = yk + hy′k (26)
which is the Taylor series method of order one.
To apply (24), we must know y(tk), y′(tk), y′′(tk)………., y(p) (tk).
However, y(tk) is known to us and if f is sufficiently differentiable, then higher order

derivatives can be obtained by calculating the total derivative of the given differential
equation w.r.t. t, keeping in mind that y is itself a function of t. Thus we obtain for the
first few derivatives as:
y′ = f(t,y)
y′′ = ft + f fy
y′′′ = ftt +2 f fty + f2 fyy + fy (ft + ffy) etc.
∂ 2f
where ft = ∂f / ∂t , f tt = ∂ 2 f / ∂t 2 , fty = etc.
∂t 2 y
The number of terms to be included in the method depends on the accuracy

requirements.
Let p = 2. Then the Taylor Series method of O(h2) is

h 2 y ′k′
yk+1 = yk + hy′k + (27)
2
h3
with the TE = y′′′ (α) , t n < α < t n +1
6
The Taylor series method of O(h3), (p = 3) is
h 2 y ′k′ h3
yk+1 = yk + hy′k + + y ′k′′ (28)
2 6
h 4 ( IV)
with the TE = y (α ) , t k ≤ α ≤ t k +1
24
50
Let us consider the following examples. Numerical Solution of
Equations
Example 2: Using the third order Taylor series method find the solution of the
differential equation.
xy′ = x-y, y = 2 at x = 2, taking h =1.
Solution: We have the derivatives and their values at x=2, y=2 as follows:
y(2) = 2 is given. Further, xy′ = x-y can be written as

y
y′ = 1 -
x
2
y′ (2) = 1 - = 0.
2
Differentiating w.r.t. x, we get
y′ y
y′′ = 0 - +
x x2
0 2 1
y′′ (2) = - + = .
2 4 2
y 2
y′ = 1- =1- =1 −1
k 2
Similarly,
y′′ 2y′ 2y
y′′′ = − + − , y′′′ (2) = − 3 / 4
x x 2 x3
1 2× 0 2 x 2
y′′′ (2) = − + −
4 4 8
3
=
4
Using taylor series method of O(h3) given by Eqn. (28), we obtain

(.1) 2 (.1) 3
y(2 + .1) = y(2) + 01 x y′ (2) + y′′(2) + y′′′(2)
2 6
or
1
y( 2.1) = 2 + .1 x 0 + .005 x .5 + .001 x × (− .75)
6
= 2+0.0025-0.000125 = 2.002375.
Example 3: Solve the equation x2y′ = 1 –xy –x2y2, y(1) = -1 from x=1 to x=2 by
using Taylor series method of O(h2) with h = 1/3 and ¼ and find the actual error at
x=2 if the exact solution is y = -1/x.
1 y
Solution: From the given equation, we have y′ = 2
− − y2
x x
Differentiating it w.r.t. x, we get
−2 y′ y
y ′′ = 3
− + − 2 yy ′
x x x2
Using the second order method (27),

51
Differentiation, h2
Integration and yk+1 = yk +hy′k + y ′′k
We have the following results

1
h= , y(1) = -1, y′(1) = 1, y′′(1) = -2.
3
4
x1 = , y(x1) = ─ 0.7778, y′(x1) = 0.5409, y′′(x1) = ─ 0.8455.
3
5
x2 = , y(x2) = ─ 0.6445, y′(x2) = 0.3313, y′′(x2) = ─ 0.4358.
3
x3 = 2, y(x3) = ─ 0.5583 = y(2)

1
h=
4
5
x1 = , y(x1) = ─ 0.8125, y′(x1) = 0.6298, y′′(x1) = ─ 1.0244.
4
3
x2 = , y(x2) = ─ 0.6871, y′(x2) = 0.4304, y′′(x2) = ─ 0.5934.
2
7
x3 = , y(x3) = -0.5980, y′(x3) = 0.3106, y′′(x3) = ─ 0.3745.
4
x4 = 2, y(x4) = -0.5321 = y(2)
Since the exact value is y(2) =─0.5, we have the actual errors as
1
e1 = 0.0583 with h =
3
1
e2 = 0.0321 with h =
4
Note that error is small when the step size h is small.
You may now try the following exercise.
Write the Taylor series method of order four and solve the IVPs E2) and E3).
E2) y′ = x – y2, y(0) = 1. Find y(0.1) taking h = 0.1.
E3) y′ = x2 + y2, y(0) = 0.5. Find y(0.4) taking h = 0.2.
E4) Using second order Taylor series method solve the IVP
y
y′ = 3x + , y(0) = 1. Find y(0.6) taking h = 0.2 and h = 0.1.
2
Find the actual error at x = 0.6 if the exact solution is y = -6x –12.
Notice that though the Taylor series method of order p give us results of desired
accuracy in a few number of steps, it requires evaluation of the higher order
derivatives and becomes tedious to apply if the various derivatives are complicated.
Also, it is difficult to determine the error in such cases. We now consider a method,
called Euler’s method which can be regarded as Taylor series method of order one
and avoids these difficulties.
3.4 EULER’S METHOD

Let the given IVP be
y′ = f(t, y), y(t0) = y0.

52
Let [t0, b] be the interval over which the solution of the given IVP is to be Numerical Solution of
determined. Let h be the steplength. Then the nodal points are defined by tk = t0 + kh, Equations
k = 0, 1, 2, …………., N with tN = t0 +Nh = b.
h h
t0 t1 t2 tk tN = b
Fig. 1
The exact solution y(t) at t = tk+1 can be written by Taylor series as
h  2 
y(tk + h) = y(tk) + hy′ (tk) +   y ′′( t k ) + ..... (29)
 2 
Neglecting the term of O(h2) and higher order terms, we get
yk+1 = yk + hy′k (30)
h  2 
with TE =   y ′′ (α), t k < α < t k +1 (31)
 2 
From the given IVP, y′ (tk) = f(tk, yk) = fk
We can rewrite Eqn. (30) as
yk+1 = yk + h fk
for k = 0, 1, …….., N-1. (32)
Eqn. (32) is known as the Euler’s method and it calculates successively the solution at
the nodal points tk, k = 1, ………, N.
Since the truncation error (31) is of order h2, Euler’s method is of first order. It is also
called an O(h) method.
Let us now see the geometrical representation of the Euler’s method.
Geometrical Interpretation
dy
Let y(t) be the solution of the given IVP. Integrating = (t, y) from tk to tk+1, we get
dt
t k +1
t k +1 dy
∫
tk dt
dt = ∫ f (t, y) dt = y(t
tk
k +1 ) − y( t k ). (33)
We know that geometrically f(t, y) represents the slope of the curve y(t). Let us
approximate the slope of the curve between tk and tk+1 by the slope at tk only. If we
approximate y(tk+1) and y(tk) by yk+1 and yk respectively, then we have
t k +1
yk+1 – yk = f(tk, yk) ∫

tk
dt (34)
= (tk+1 – tk) f(tk, yk)

= hf (tk, yk)
53
Differentiation, ∴ yk+1 = yk + hf (tk, yk), k = 0, 1, 2, …., N-1.
Integration and
Thus in Euler’s method the actual curve is approximated by a sequence of the
segments and the area under the curve is approximated by the area of the quadrilateral.
Let us now consider the following examples.
Example 4: Use Euler method to find the solution of y′ = t + y, given y(0) = 1.

Find the solution on [0, 0.8] with h = 0.2.
Solution: We have
yn+1 = yn + hfn
y(0.2) ≈ y1 = y0 + (0.2) f0
= 1 + (0.2) [0 + 1] = 1.2
y(0.4) ≈ y2 = y1 + (0.2) f1
= 1.2 + (0.2) [0.2 + 1.2]
= 1.48
y(0.6) ≈ y3 = y2 + (0.2) f2
= 1.48 + (0.2) [0.4 + 1.48]
= 1.856
y(0.8) ≈ y4 = y3 + (0.2) f3
= 1.856 + (0.2) [0.6 + 1.856]
= 2.3472
Example 5: Solve the differential equation y′ = t+y, y(0) = 1. t ε [ 0, 1] by Euler’s

method using h = 0.1. If the exact value is y(1) = 3.436564, find the exact error.
Solution: Euler’s method is
yn+1 = yn + hy′n
For the given problem, we have

yn+1 = yn + h [tn + yn]
(1 + h) yn + htn.
h = 0.1, y(0) = 1,
y1 = y0 = (1+0.1) + (0.1) (0) = 1.1
y2 = (1.1) (1.1) + (0.1) (0.1) = 1.22, y3 = 1.362
y4 = 1.5282, y5 = 1.72102, y6 = 1.943122,
y7 = 2.197434, y8 = 2.487178, y9 = 2.815895
y10 = 3.187485 ≈ y(1)
exact error = 3.436564-3.187485 = .249079
Example 6: Using the Euler’s method tabulate the solution of the IVP
y′ = -2ty2, y(0) =1
in the interval [0,1] taking h = 0.2, 0.1.

Solution: Euler’s method gives
yk+1= yk + h fk where tk = -2tk y 2k

= yk –2h tk y 2k .
Starting with t0 = 0, y0 = 1, we obtain the following table of values for h = 0.2.

54
Table 2: h = 0.2 Numerical Solution of
T y(t) Ordinary Differential
Equations
0.2 0.92
0.4
0.6 0.78458
0.8 0.63684
1.0 0.50706
Thus y(1.0) = 0.50706 with h = 0.2

Similarly, starting with t0 = 0, y0 = 1, we obtain the following table of values for h =
0.1.
Table 3: h = 0.1
t y(t) T y(t)
0.1 1.0 0.6 0.75715
0.2 0.98 0.7 0.68835
0.3 0.94158 0.8 0.62202
0.4 0.88839 0.9 0.56011
0.5 0.82525 1.0 0.50364
y(1.0) = 0.50364 with h = 0.1.
Remark: Since the Euler’s method is of O(h), it requires h to be very small to obtain
the desired accuracy. Hence, very often, the number of steps to be carried out becomes
very large. In such cases, we need higher order methods to obtain the required
accuracy in a limited number of steps.
Euler’s method constructs yk ≈ y(tk) for each k = 1, 2, …., N,

where
yk+1 = yk + hf(tk, yk).
This equation is called the difference equation associated with Euler’s method. A
difference equation of order N is a relation involving yn, yn+1, ……, yn+N. Some simple
difference equations are
y n +1 − y n = 1 

y n +1 − y n = n  (35)
y n +1 − (n + 1) y n = 0
where n is an integer.
A difference equation is said to be linear if the unknown function y n + k (k = 0, 1,

……., N) appear linearly in the difference equation. The general form of a linear non-
homogeneous difference equation of order N is
yn+N + aN-1 yn+N-1 +………..+a0 yn = b (36)
where the coefficients aN-1, aN-2, …………, a0 and b may be functions of n but not of y.
All the Eqns. (35) are linear. It is easy to solve the difference Eqn. (36), when the
coefficients are constant or a linear or a quadratic function of n.
The general solution of Eqn. (36) can be written in the form
yn = yn(c) + y (np )
where yn (c) is the complementary solution of the homogenous equation associated

with Eqn. (36) and yn(p) is a particular solution of Eqn. (36). To obtain the
complementary solution of the homogeneous equations, we start with a solution in the
55
Differentiation, form yn = βn and substitute it in the given equation. This gives us a polynomial of
Integration and
degree N. We assume that its roots β1, β2,…………,βN are all real and distinct.
Therefore, the general solution of the given problem is

5
yk = C (1+3h)k -
3
Using the condition y(0) = 1, we obtain C = 8/3.
Thus
8 5
yk = (1 + 3h ) k − .
3 3
Eqn. (41) gives the formula for obtaining yk ∀ k.
8 5
y6 = y(0.6) = (1 + 3 × 0.1) 6 −
3 3
= 11.204824.
Now Euler’s method is
yk+1 = (1 + 3h) yk + 5h
and we get for h = 0.1
y1 = 1.8, y2 = 2.84, y3 = 4.192, y4 = 5.9496, y5 = 8.23448, y6 = 11.204824.
You may now try the following exercises
Solve the following IVPs using Euler’s method
E5) y′ = 1 –2xy, y(0.2) = 0.1948. Find y(0.4) with h = 0.2
1
E6) y′ = , y(4) = 4. Find y(4.1) taking h = 0.1
x − 4y '
2
y−x
E7) y′ = , y(0) = 1. Find y(0.1) with h = 0.1
y + x'
E8) y′ = 1 + y2, y(0) = 1. Find y(0.6) taking h = 0.2 and h = 0.1
E9) Use Euler’s method to solve numerically the initial value problem y′ = t + y,
y(0) = 1 with h = 0.2, 0.1 and 0.05 in the interval [0.06].
3.5 SUMMARY
In this unit, we have covered the following
1) y′ = f(t, y), y (t0) = y0, tε [t0, b] is the initial value problem (IVP) for the first
order ordinary differential equation.
2) Singlestep methods for finding the solution of IVP enables us to find
yn+1, if yn, y′n and h are known.
3) Multistep method for IVP enables as to find yn+1, if yi, y′i, i = n, n-1, …..,
n-m+1 and h are known and are called m-step multistep methods.
56
4) Taylor series method of order p for the solution of the IVP Numerical Solution of
is given by Equations
yk+1 = yk + h φ [ tk, yk, h]
h h p −1 ( p )
where φ [ tk, yk, h] = y′k + y ′k′ + ........... + y and t k = t 0 + kh , k = 0, 1, 2,
2! p! k
…………..N-1, tN = b. The error of approximation is given by
h p +1 ( p +1)
TE = y (tk + θh), 0 < θ < 1.
(p + 1) !
5) Euler’s method is the Taylor series method of order one. The steps involved in
solving the IVP given by (10) by Euler’s method are as follows:
Step 1: Evaluate f(t0, y0)

Step 2: Find y1 = y0 + h f(t0, y0)
Step 3: If t0 < b, change t0 to t0 + h and y0 to y1 and repeat steps 1 and 2
Step 4 : If t0 = b, write the value of y1.
E1) We have y0 = 1, λ = 1, h = 0.1
 (0.1) 2 
y1 = 1 + 0.1 + 
 2 
 
y2 = (1.105)2
----------------
y5 = (1.105)5
Table giving the values of yn together with exact values is
Table 4
t Second order method Exact solution
0 1 1
0.1 1.105 1.10517
0.2 1.22103 1.22140
0.3 1.34923 1.34986
0.4 1.49090 1.49182
0.5 1.64745 1.64872
E2) Taylor series method of 0(h4) to solve y′ = x-y2, y(0) = 1 is
h2 h3 h 4 iv
yn+1 = yn + hy′n + y ′n′ + y ′n′′ + yn
2 6 24
y′ = x – y2 y′ (0) = -1
y′′ = 1-2xy′ y′′ (0) = 3
y′′′ = -2xy′′ - 2(y′)2 y′′′(0) = -8
yiv = -2yy′′ - 6y′ y′′ yiv (0) = 34
Substituting
(0.1) 2 (0.1) 3 (0.1) 4
y(0.1) = 1 –(0.1) (-1) + (3) + (−8) + (34)
2 6 24
= 0.9138083
E3) Taylor series method
y′ = x2 + y2, y(0) = 0.5, y′ (0) = 0.25, y′(0.2) = 0.35175

57
Differentiation, y′′ = 2x + 2yy′ y′′ (0) = 0.25, y′′ (0.2) = 0.79280
Integration and
y′′′ = 2 + 2yy′′ + 2(y′)2 y′′′(0) = 2.375, y′′′ (0.2) = 3.13278
yiv = 2yy′′′ + 6y′ y′′ yiv (0) = 2.75, yiv (0.2) = 5.17158
E4) Second order Taylor’s method is

h2
yn+1 = yn + hy′n + y ′n′ .
2
h = 0.2
y(0) = 1. y′(0) = 0.5, y′′(0) = 3.25
y(0.2) = 1.165, y′(0.2) = 1.1825, y′′(0.2) = 3.59125
y(0.4) = 1.47333, y′(0.4) = L93667, y′′(0.4) = 3.96833
y(0.6) = 1.94003
h = 0.1
y(0.1) = 1.06625, y′(0.1) = 0.83313, y′′(0.1) = 3.41656
y(0.2) = 1.16665, y′(0.2) = 1.18332, y′′(0.2) = 3.59167
y(0.3) = 1.46457, y′(0.3) = 1.63228, y′′(0.3) = 3.81614
y(0.4) = 1.64688, y′′(0.4) = 2.02344, y′′(0.4) = 4.01172
y(0.5) = 1.86928, y′(0.5) = 2.43464, y′′(0.5) = 4.21732
y(0.6) = 2.13383
E5) Euler’s method is yk+1 = yk + hfk = yk + h (1-2xk yk)

y(0.4) = 0.1948 + (0.2) (1-2 x 0.2 x 0.1948)
= 0.379216.
1
E6) y′ = 2
, y(4) = 4, y′(4) = 0.05
x +y
y(4.1) = y(4) + hy′(4)
= 4+ (0.1) (0.05) = 4.005.
E7) Euler’s method y′ = (y-x) / (y + x), y(0) = 1, y′(0) = 1

y(0.1) = 1 + (0.1) (1) = 1.1
E8) Euler’s method is

yk+1 = h + yk + hy 2k
Starting with t0 = 0 and y0 = 1, we have the following tables of values

Table 5: h = 0.2
T Y(t)
0.2 1.4
0.4 1.992
0.6 2.986
∴ y(0.6) = 2.9856
Table 6: h = 0.1
T Y(t)
0.1 1.2
0.2 1.444
0.3 1.7525
0.4 2.1596
0.5 2.7260
0.6 3.5691
∴ y(0.6) = 3.5691
58
E9) yk+1 = yk + h (tk + yk) = (1 + h) yk + htk Numerical Solution of
Equations
Starting with t0 = 0, y0 = 1, we obtain the following of values.
Table 7: h = 0.2
T Y(t)
0.2 1.2
0.4 1.48
0.6 1.856
∴ y(0.6) = 1.856 with h = 0.2
Table 8: h = 0.1
t y(t)
0.1 1.1
0.2 1.22
0.3 1.362
0.4 1.5282
0.5 1.72102
0.6 1.943122
∴y(0.6) = 1.943122 with h = 0.1
Table 9: h = 0.05
t y(t) T y(t)
0.05 1.05 0.35 1.46420
0.10 1.105 0.40 1.55491
0.15 1.16525 0.45 1.65266
0.20 1.23101 0.50 1.75779
0.25 1.30256 0.55 1.87068
0.30 1.38019 0.60 1.99171
∴ y(0.6) = 1.99171 with h = 0.05.
59
Differentiation,
Integration and
60
UNIT 4 SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS
USING RUNGE-KUTTA METHODS
Structure Page Nos.
4.0 Introductions 60
4.1 Objectives 60
4.2 Runge-Kutta Methods 60
4.2.1 Runge-Kutta Methods of Second Order
4.2.2 Runge-Kutta Methods of Third Order
4.2.3 Runge-Kutta Methods of Fourth Order
4.3 Summary 75
4.4 Solutions/ Answers 76
4.0 INTRODUCTION
In unit 3, we considered the IVPs
y΄ = f(t, y), y΄(t0) = y0 (1)
and developed Taylor series method and Euler’s method for its solution. As mentioned earlier, Euler’s method being a first order
method, requires a very small step size for reasonable accuracy and therefore, may require lot of computations. Higher order Taylor
series require evaluation of higher order derivates either manually or computationally. For complicated functions, finding second,
third and higher order total derivatives is very tedious. Hence, Taylor series methods of higher order are not of much practical use in
finding the solutions of IVPs of the form given by Eqn. (1).
In order to avoid this difficulty, at the end of nineteenth century, the German mathematician, Runge observed that the expression for
the increment function Ø (t, y, h) in the single step methods [see Eqn. (24) of Sec. 7.3, Unit 7]
yn+1 = yn + h ø (tn, yn, h) (2)
can be modified to avoid evaluation of higher order derivatives. This idea was further developed by Runge and Kutta (another
German mathematician) and the methods given by them are known as Runge-Kutta methods. Using their ideas, we can construct
higher order methods using only the function f(t, y) at selected points on each subinterval. We shall, in the next section, derive some
of these methods.
4.1 OBJECTIVES
After going through this unit, you should be able to:
• State the basic idea used in the development of Runge-Kutta methods;
• Obtain the solution of IVPs using Runge-Kutta methods of second, third and fourth order, and
• Compare the solutions obtained by using Runge-Kutta and Taylor series methods.
4.2 RUNGE-KUTTA METHODS

We shall first try to discuss the basic idea of how the Runge-Kutta methods are developed.
Consider the O(h2) singlestep method
h2
yn+1 = yn + hy′n + y″n (3)
2
If we write Eqn. (3) in the form of Eqn. (2) i.e., in terms of Ø [tn, yn, h] involving partial derivatives of f(t, y), we obtain
Ø(t, y, h) = f(tn, yn) +

h
2
[ ]
f t (t n , y n ) + f (t n , y n ) f y (t n , y n ) (4)
Runge observed that the r.h.s. of Eqn. (4) can also be obtained using the Taylor series expansion of f(tn + ph, yn + qhfn) as
f(tn + ph, yn + qhfn) ≈ fn + ph ft (tn, yn) + qhfn fy (tn, yn), (5)
Taylor’s expansion in two variables t, y
60
Solution of Ordinary
using Runge-Kutta
where fn = f(tn, yn) Methods
Comparing Eqns. (4) and (5) we find that p = q = ½ and the Taylor series method of O(h2) given by Eqn. (3) can also be written as
⎛ h h ⎞
y n +1 = y n + hf ⎜ t n + , y n + f n ⎟ (6)
⎝ 2 2 ⎠
Since Eqn. (5) is of O(h2), the value of yn+1 in (6) has the TE of O(h3). Hence the method (6) is of O(h2) which is same as that of (3).
The advantage of using (6) over Taylor series method (3) is that we need to evaluate the function f(t, y) only at two points (tn, yn) and
⎛ h h ⎞
⎜ t n + , y n + f n ⎟ . We observe that f(tn, yn) denotes the slope of the solution curve to the IVP (1) at (tn, yn). Further,
⎝ 2 2 ⎠
⎡ h ⎛h⎞ ⎤
f ⎢ t n + , y n + ⎜ ⎟f n ⎥ denotes an approximation to the slope of the solution curve at
⎣ 2 2
⎝ ⎠ ⎦
⎡ h ⎛ h ⎞⎤
the point ⎢ t n + , y⎜ t n + ⎟⎥ Eqn. (6) denotes geometrically, that the slope of the
⎣ 2 ⎝ 2 ⎠⎦
solution curve in the interval [t n , t n +1 ] is being approximated by an approximation to
h
the slop at the middle points tn + . This idea can be generalized and the slope of the solution curve in [t n , t n +1 ] can be replaced by a
2
weighted sum of slopes at a number of
points in [t n , t n +1 ] (called off-step points). This idea is the basis of the Runge-Kutta methods.
Let us consider for example, the weighted sum of the slopes at the two points [tn, yn] and [tn + ph, yn + qhfn], 0 < p, q < 1 as
φ (tn, yn, h) = W1f(tn, yn) + W2f [tn + ph, yn + qhfn] (7)
We call W1 and W2 as weights and p and q as scale factors. We have to determine the four unknowns W1, W2, p and q such that Ø (tn,
yn, h) is of O(h2). Substituting Eqn. (5) in (7), we have
φ (tn, yn, h) = W1fn +W2 [fn + phft (tn, yn) + phfn fy (tn, yn)]. (8)
putting this in (2) we get,
y n +1 = y n + h[W1 f n + W2 {f n + phf t (t n , y n ) + qhf n fy(t n , y n )}]

= y n + h (W1 + W2 )f n + h 2 W2 (pf t + qf n f y )n (9)
where ( )n denotes that the quantities inside the brackets are evaluated at (tn, yn). Comparing the r.h.s. of Eqn. (9) with Eqn. (3), we
find that
W 1 + W2 = 1
1
W2 p = W2 q = (10)
2
In the system of Eqns. (10), since the number of unknowns is more than the number of equations, the solutions is not unique and we
have infinite number of solutions. The solution of Eqn. (10) can be written as
W1 = 1 – W2
p = q = 1/(2W2) (11)
1
By choosing W2 arbitrarily we may obtain infinite number of second order Runge-Kutta methods. If W2 = 1, p = q = and W1 = 0,
2
1 1
then we get the method (6). Another choice is W2 = which gives p = q = 1 and W1 = . With this choice we obtain from (7), the
2 2
method
61
h
y n +1 = y n + [f (t n , y n ) + f (t n + h, y n + hf n )] (12)
2
which is known as Heun’s method.
Note that when f is a function of t only, the method (12) is equivalent to the trapezoidal rule of integration, whereas the method (6) is
equivalent to the midpoint rule of integration. Both the methods (6) and (12) are of O(h2). The methods (6) and (12) can easily be
implemented to solve the IVP (1). Method (6) is usually known as improved tangent method or modified Euler method. Method
(12) is also known as Euler – Cauchy method.
We shall now discuss the Runge-Kutta methods of O(h2), O(h3) and O(h4).
4.2.1 Runge-Kutta Methods of Second Order

The general idea of the Runge-Kutta (R-K) methods is to write the required methods as
y n +1 = y n + h (weighted sum of the slopes).
m
= yn + ∑W K
i =1
i i (13)
where m slopes are being used. These slopes are defined by

K1 = hf(tn, yn),
K2 = hf(tn + C2h, yn + a21K1),
K3 = hf(tn + C3h, yn + a31K1 + a32K3),
K4 = hf(tn + C4h, yn + a41K1 + a42K2 + a43K3),
etc. In general, we can write
⎡ i −1 ⎤
K i = hf ⎢ t n + C i h ,
⎢⎣
∑j=1
a ij K j ⎥ , i = 1, 2, ….., m with C1 = 0
⎥⎦
(14)
The parameters Ci, aij, Wj are unknowns and are to be determined to obtain the Runge-Kutta methods.
We shall now derive the second order Runge-Kutta methods.
Consider the method as
Yn+1 = yn + W1K1 + W2K2, (15)
where
K1 = hf(tn, yn)
K2 = hf(tn + C2h, yn + a21K1), (16)
where the parameters C2, a21, W1and W2 are chosen to make yn+1 closer to y(tn=1).
The exact solution satisfies the Taylor series
h2 h3
y(t n +1 ) = y(t n ) + hy′(t n ) + y′′(t n ) + y′′′(t n ) + .... (17)
2 6
where
y ′ = f (t , y )
y′′ = f t + ff y
y′′′ = f tt + 2ff ty + f yyf 2 + f y (f t + ff y )
We expand K1 and K2 about the point (tn,yn)
K1 = hf(tn, yn) = hfn
62
K2 = hf(tn + C2h, yn + a21hfn) using Runge-Kutta
Methods
⎧ 1
2!
( )
h ⎨f (t n , y n ) + (C 2 hf t + a 21hf n f y ) + C 22 h 2f tt + 2C 2a 21h 2f n f ty + a 221h 2f n2f yy + ...⎬
⎫
⎩ ⎭
Substituting these values of K1 and K2 in Eqn. (15), we have
[
y n +1 = y n + (W1 + W2 )hf n + h 2 W2C 2f t + W2a 21f n f y ]
+
h3
2
(
W2 C 22f tt + 2C 2 a 21f n f ty + a 221f n2f yy + .... ) (18)
Comparing Eqn. (18) with (17), we have
W 1 + W2 = 1
1
C2W2 =
2
1
a21W2 =
2
From these equations we find that if C2 is chosen arbitrarily we have
a 21 = C2 , W2 = 1 / (2C2 ), W1 = 1 − 1 / (2C2 ) (19)
The R-K method is given by
y n +1 = y n + h[W1f (t n , y n ) + W2 f (t n + C 2 h, y n + C 2 hf n )]
and Eqn. (18) becomes
y n +1
h2
= y n + hf n (f1 + f n f y ) +
2
C2h 3
4
( 2
)
f tt + 2f n t yy + f n f yy + ..... (20)
Subtracting Eqn. (20) from the Taylor series (17), we get the truncation error as
TE = y(tn+1) – yn+1
⎡⎛ 1 C
h 3 ⎢⎜⎜ − 2
⎞
( ) 1
( ⎤
)
⎟⎟ f tt + 2f n f ty + f n2 f yy + fy f t + f n f y ⎥ + .....
⎣⎝ 6 4 ⎠ 6 ⎦
=
h3
12
[ ]
(2 − 3C 2 )y ′′′ + 3C 2 f y y ′′ + ..... (21)
Since the TE is of O(h3), all the above R-K methods are of second order. Observe that no choice of C2 will make the leading term of
TE zero for all f(t, y). The local TE depends not only on derivatives of the solution y(t) but also on the function f(t, y). This is typical
of all the Runge-Kutta methods. Generally, C2 is chosen between 0 and 1 so that we are evaluating f(t, y) at an off-step point in [tn,
tn+1]. From the definition, every Runge-Kutta formula must reduce to a quadrature formula of the same order or greater if f(t, y) is
independent of y, where Wi and Ci will be weights and abscissas of the corresponding numerical integration formula.
Best way of obtaining the value of the arbitrary parameter C2 in our formula is to
i) choose some of Wi’s zero so as to minimize the computations.

ii) choose the parameters to obtain least TE,
iii) choose the parameter to have longer stability interval.
Methods satisfying either of the condition (ii) or (iii) are called optimal Runge-Kutta methods.
We made the following choices:
63
1 1
i) C2 = , ∴ a 21 = , W1 = 0, W2 = 1, then
2 2
yn+1 = yn + K2,
K1 = hf (tn, yn),
⎛ h K ⎞
K 2 = hf ⎜⎜ t n + , y n + 1 ⎟⎟ (22)
⎝ 2 2 ⎠
which is the same as improved tangent or modified Euler’s method.
1
ii) C 2 = 1, ∴ a 21 = 1, W1 = W2 = , then
2
1
y n +1 = y n + (K 1 + K 2 ),
2
K 1 = hf (t n , y n ),
K 2 = hf (t n + h, y n + K 1 ) (23)
which is same as the Euler-Cauchy method.
2 2 1 3
iii) C2 = , ∴ a 21 = , W1 = , W2 = , then
3 3 4 4
1
y n +1 = y n + (K 1 + 3K 2 ),
4
K 1 = hf (t n , y n ),
⎛ 2h 2K 1 ⎞
K 2 = hf ⎜⎜ t n + , yn + ⎟ (24)
⎝ 3 3 ⎟⎠
which is the optimal R-K method.
Method (24) is optimal in the sense that it has minimum TE. In other words, with the above choice of unknowns, the leading term in
the TE given by (21) is minimum. Though several other choices are possible, we shall limit our discussion with the above three
methods only.
In order to remember the weights W and scale factors Ci and aij we draw the following table:
C2 a21 1/2 1/2
W1 W2 0 1
General form Improved tangent method
1 1 2/3 2/3
1/2 1/2 1/4 ¼
Heun’s method Optimal method
We now illustrate these methods through an example.
64
using Runge-Kutta
Methods
Example 1: Solve the IVP y′ = − ty 2 , y(2 ) = 1 and find y(2.1) and y(2.2) with h = 0.1 using the following R-K methods of O(h2)
a) Improved tangent method [modified Euler method (22)]

b) Heun’s method [Euler-Cauchy method (23)]
c) Optimal R-K method [method (24)]
d) Taylor series method of O(h2).
Compare the results with the exact solution
2
y(t ) = 2
t −2
Solution: We have the exact values

y(2.1) = 0.82988 and y(2.2) = 0.70422
a) Improved tangent method is
y n +1 = y n + K 2
K 1 = hf (t n , y n )
⎛ h K ⎞
K 2 = hf ⎜ t n + y n + 1 ⎟ .
⎝ 2 2 ⎠
For this problem f( t, y) = ─ t y2 and
K1 = (0.1) [( −2) (1)]= − 0.2
[ ]
K 2 = (0.1) ( −2.05) (1 − 0.1) 2 = − 0.16605
y(2.1) = 1-0.16605 = 0.83395
Taking t1 = 2.1 and y1 = 0.83395, we have
[ ]
K1 = hf (t1 , y1 ) = (0.1) ( −2.1) (0.83395) 2 = − 0.146049
⎛ h k ⎞
K2 = hf ⎜ t1 + , y1 + 1 ⎟ .
⎝ 2 2⎠
[ ]
= (0.1) − ( 2.15) (0.83395 − 0.0730245) 2 = − 0.124487
y(2.2) = y1 +k2 = 0.83395 ─ 0.124487 = 0.70946
b) Heun’s method is :
1
y n +1 = y n + ( K1 + K 2 )
2
K1 = hf (t n , y n ) = − 0.2
K 2 = hf (t n + h , y n + K1 ) = − 0.1344
y(2.1) = 0.8328
K1 = ─ 0.14564, K2 = ─ 0.10388
y(2.2) = 0.70804
c) Optimal method is:
65
1
y n +1 = y n + (K1 + 3K 2 )
4
K1 = hf (t n , y n ) = − 0.2
⎛ 2h 2k ⎞
K 2 = hf ⎜ t n + , y n + 1 ⎟ = 0.15523
⎝ 3 3 ⎠
y(2.1) = 0.83358

K1 = ─ 0.1459197, K2 = ─ 0.117463
y (2.2) = 07090
d) Taylor series method of O(h2) :
h2
y n +1 = y n + hy'n + y"n
2
y' = − ty 2 , y" = − y 2 − 2 tyy'
y(2) = 1, y’ (2) = -2, y” (2) = 7

y(2.1) = 0.8350
with t1 = 2.1, y1 = 0.835, we get

y ′ (2.1) = -1.4641725, y ′′ (2.1) = 4.437627958
y(2.2) = 0.71077
We now summarise the results obtained and give them in Table 1.
Table 1
2
Solution and errors in solution of y ′ = ─ t y , y (2) = 1, h = 0.1. Number inside brackets denote the errors.
T Method Method Method Method Exact

(22) (23) (24) Taylor Solution
O(h2)
2.1 0.83395 0.8328 0.83358 0.8350 0.8299
(0.00405) (0.0029) (0.00368) (0.0051)
2.2 0.70746 0.708.4 0.7090 0.71077 0.7042
(0.0033) (0.00384) (0.0048) (0.00657)
You may observe her that all the above numerical solutions have almost the same error. You may now try the following exercises:
Solve the following IVPs using Heun’s method of O(h2) and the optimal R-K method of O(h2).
E1) 10y’ = t2+y2, y(0) = 1. find y (0.2) taking h = 0.1.

E2) y’ = 1+y2, y(0) = 0. Find y(0.4) taking h = 0.2. iven that the exact solution is y(t) = tan t, find the errors.
Also compare the errors at t = 0.4, obtained here with the one obtained by Taylor series method of O(h2)
1
E3) y' = 3t + y, y(0) =1. Find y(0.2) taking h = 0.1. Given y(t) = 13e t / 2 − 6 t − 12, find the error.
2
Let us now discuss the R-K methods of third order.
4.2.2 Runge-Kutta Methods of Third Order

Here we consider the method as
y n +1 = y n + W1K1 + W2 K 2 + W3 K 3 (25)
66
where using Runge-Kutta
Methods
K1 = h f(tn, yn)
K2 = hf (tn + C2h, yn + a21 K1)
K3 = hf (tn + C3h, Yn + a31K1 + a32 K2)
Expanding K2, K3 and Yn+1 by Taylor series, substituting their values in Eqn. (25) and comparing the coefficients of powers of h, h2
and h3, we obtain
1
a 21 = C 2 C 2 W2 + C3 W3 =
2
2 2 1
a 31 + a 32 = C3 C W2 + C W3 = (26)
2 3 3
1
W1 + W2 + W3 =1 C 2a 32 W3 =
6
We have 6 equations to determine the 8 unknowns (3 W′s + 2C′s + 3a′s). Hence the system has two arbitrary parameters. Eqns. (26)
are typical of all the R-K methods. Looking at Eqn. (26), you may note that the sum of aij’s in any row equals the corresponding Ci’s
and the sum of the Wi’s is equal to 1. Further, the equations are linear in w2 and w3 and have a solution for W2 and W3 if and only if
C2 C3 − 1/ 2
2 2
C C − 1/ 3 =0
0 C 2 a 32 − 1/ 6
(Ref. Sec. 8.4.2, Unit 8, Block-2, MTE-02, IGNOU material).
Expanding the determinant and simplifying we obtain
C 2 ( 2 − 3C 2 ) a 32 − C3 (C 3 − C 2 ) = 0, C 2 ≠ 0 (27)
Thus we choose C2, C3 and a32 satisfying Eqns. (27).
Since two parameters of this system are arbitrary, we can choose C2, C3 and determine a32 from Eqn. (27) as
C 3 (C 3 − C 2 )
a 32 =
C 2 (2 − 3C 2 )
2
If C3 = 0, or C2 = C3 then C2 = and we can choose a 32 ≠ 0, arbitrarily. All Ci’s should be chosen such that 0 < Ci < 1. Once C2 and
3
C3 are prescribed, Wi’s and aij’s can be determined from Eqns. (26).
We shall list a few methods in the following notation
C2 a21
C3 A31 A32
W1 W2 W3
i) Classical third order R-K method
1/2 1/2
1 -1 2
1/6 4/6 1/6
Yn +1 = Yn + 1
6
(K1 + 4k 2 + K 3 ) (28)
67
K1 = hf ( t n , y n )
⎛ h K ⎞
K 2 = hf ⎜ t n + , y n + 1 ⎟
⎝ 2 2 ⎠
h
K 3 = hf ( t n _ ; y n − K1 + 2K 2 )
2
ii) Heun’s Method
1/3 1/3
2/3 0 2/3
1/4 0 3/4
1
y n +1 = y n + (K1 + 3K 3 ) (29)
4
K1 = h f ( t n , y n )
⎛ h K ⎞
K 2 = h f ⎜ t n + , yn + 1 ⎟
⎝ 3 3 ⎠
⎛ 2K 2 ⎞
K 3 = h f ⎜ t n + 2h , y n + ⎟
⎝ 3 3 ⎠
iii) Optimal method
½ 1/2
¾ 0 ¾
2/9 3/9 4/9
1
y n +1 = y n + (2K1 + 3K 2 + 4K 3 ) (30)
9
K1 = h f ( t n , y n ) ,
h K
K 2 = f (t n + , y n + 1 ),
2 2
⎛ 3h 3K 2 ⎞
K3 = h f ⎜ t n + , yn + ⎟.
⎝ 4 4 ⎠
We now illustrate the third order R-K methods by solving the problem considered in Example 1, using (a) Heun’s method (29) (b)
optimal method (30).
a) Heun’s method
68
1
(K1 + 3K 3 )
using Runge-Kutta
y n +1 = y n + Methods
4
K1 = h f ( t n , y n )
= − 0.2
⎛ h K ⎞
K 2 = h f ⎜ t n + , yn + 1 ⎟
⎝ 3 3 ⎠
= − 0.17697
⎛ 2h 2k ⎞
K3 = h f ⎜ t n + , yn + 2 ⎟
⎝ 3 3 ⎠
= − 0.16080
y(2.1) = 0.8294
Taking t1 = 2.1and y1 = 0.8294, we have
K1 = − 0.14446
K 2 = − 0.13017
K 3 = − 0.11950
y(2.2) = 0.70366
b) Optimal method
1
y n +1 = y n + (2K1 + 3K 2 + 4K 3 )
9
K 1 = − 0 .2
K 2 = −0.16605
K 3 = −0.15905
y(2.1) = 0.8297
Taking t1 = 2.1and y1 = 0.8297, we have
K1 = −0.14456
K 2 = −012335
K 3 = − 0.11820
y(2.2) = 0.70405
You can now easily find the errors in these solutions and compare the results with those obtained in Example 1.
And now here is an exercise for you.
E4) Solve the IVP

Y’ = y-t y (0) =2
Using third order Heun’s and optimal R-K methods. Find (02) taking h = 0.1. Given the exact solution to be y (t) = 1+t+et, find the
errors at t = 0.2.
We now discuss the fourth order R-K methods.
4.2.3 Runge-Kutta Methods of Fourth Order

Consider the method as
y n +1 = y n + W1K. + W2 K 2 + W3K 3 + W4 K 4
K1 = h f ( t n , y n ),
K 2 = h f ( t n + C 2 h , y n + a 21 K1 ), (31)
K 3 = h f ( t n + C3h , y n + a 31K1 + a 32 K 2 ),
K 4 = h f ( t n + C 4 h , y n + a 41K 2 + a 43 K 3 ).
Since the expansions of K2, K3, K4 and yn+1 in Taylor series are completed, we shall not write down the resulting system of equations
for the determination of the unknowns. It may be noted that the system of equations has 3 arbitrary parameters. We shall state directly
a few R-K methods of O(h4). The R-K methods (31) can be denoted by
69
C2 a21
C3 a31 a32
C4 a41 a42 a43
W1 W2 W3 W4
For different choices of these unknowns we have the following methods :
i) Classical R-K method
½ 1/2
½ 0 ½
1 0 0 1
1/6 2/6 2/6 1/6
1
y n +1 = y n + (K1 + 2K 2 + 2K 3 + K 4 ),
6
K1 = h f ( t n , y n ),
⎛ h K ⎞
K 2 = h f ⎜ t n , y n + 1 ⎟, (32)
⎝ 2 2 ⎠
⎛ h K ⎞
K 3 = h f ⎜ t n + , y n + 2 ⎟,
⎝ 2 2 ⎠
K 4 = h f (t n + h , y n + K 3 ).
This is the widely used method due to its simplicity and moderate order. We shall also be working out problems mostly by the
classical R-K method unless specified otherwise.
1/2 1/2
1/2 ( 2 −1 (2 − 2
2 2
1 0 2 2
1+
2 2
1/6 (2 − 2 (2 + 2 1/6
6 6
1
y n +1 = y n + ( K 1 + ( 2 − 2) K 2 + ( 2 + 2) K 3 + K 4 ) (33)
6
K1 = h f ( t n , y n ),
⎛ h K ⎞
K 2 = h f ⎜ t n + , y n + 1 ⎟,
⎝ 2 2 ⎠
⎛ ⎛ 2 − 1⎞ ⎛ ⎞ ⎞
h
K3 = h f ⎜ t n + , yn + ⎜ ⎟ K1 + ⎜ 2 − 2 ⎟ K 2 ⎟ ,
⎜ 2 ⎜ 2 ⎟ ⎜ 2 ⎟ ⎟
⎝ ⎝ ⎠ ⎝ ⎠ ⎠
⎛ 2 ⎛ 2 ⎞⎟ ⎞⎟
K 4 = h f ⎜ t n + h, y n − K 2 + ⎜1 + K ,
⎜ 2 ⎜ 2 ⎟ 3⎟
⎝ ⎝ ⎠ ⎠
The Runge-Kutta-Gill method is also used widely. But in this unit, we shall mostly work out problems with the classical R-K method
of O(h4). Hence, whenever we refer to R-K method of O(h4) we mean only the classical R-K method of O(h4) given by (32). We
shall now illustrate this method through examples.
Example 2 : Solve the IVP y′ = t+y, y(0) = 1 by Runge-Kutta method of O(h4) for t ∈ (0,0.5) with h = 0.1. Also find the error at t
= 0.5, if the exact solution is y(t) = 2e t − t − 1.
Solution : We use the R-K method of O(h4) given by (32).
Initially, t0 =, t0 = 1
We have
70
using Runge-Kutta
K1 = hf ( t 0 , y 0 ) = (0.1) [0 + 1]= 0.1 Methods
⎛ h K ⎞
K 2 = hf ⎜ t 0 + , y 0 + 1 ⎟ = (0.1) [0.05 + 1 + 0.05]= 0.11
⎝ 2 2 ⎠
⎛ h K ⎞
K 3 = hf ⎜ t 0 + , y 0 + 2 ⎟ = (0.1) [0.05 + 1 + 0.055]= 0.1105
⎝ 2 2 ⎠
K 4 = hf (t 0 + h , y 0 + K 3 ) = (0.1) [0.1 + 1 + 0.1105]= 0.12105
1
y1 = y 0 + ( K1 + 2K 2 + 2K 3 + K 4 )
6
1
=1+ [1 + 0.22 + 0.2210 + 0.12105] = 1.11034167
6
Taking t1 = 0.1 and y1 = 1.11034167, we repeat the process.
K1 = hf(t1, y1) = (0.1) [0.1 + 1.11034167] = 0.121034167
⎛ h K1 ⎞
K2 = hf ⎜ t1 + , y1 + ⎟ = (0.1)
⎝ 2 2 ⎠
⎡ (0.121034167 )⎤ = 0.132085875
⎢0.1 + 0.05 + 1.1103416 + 2 ⎥
⎣ ⎦
⎛ h K2 ⎞
K3 = hf ⎜ t1 + , y1 + ⎟ = (0.1)
⎝ 2 2 ⎠
⎡ (0.132085875 )⎤ = 0.132638461
⎢0.1 + 0.05 + 1.1103417 + 2 ⎥
⎣ ⎦
K4 = hf (t1 + h, y1 + K3) = (0.1) [0.1 + 0.05 + 1.11034167 +

(0.132085875)
2
= 0.144303013
1
y 2 = y1 + (K1 + 2K2 + 2K3 + K4)
6
1
= 1.11034167 + [(0.121034167 + 2(0.132085875) + 2(0.132638461)
6
+ 0.1443030131 = 1.24280514
Rest of the values y3, y4, y5 we give in Table 2.
Table 2
tn yn
0.0 1
0.1 1.11034167
0.2 1.24280514
0.3 1.39971699
0.4 1.58364848
0.5 1.79744128
Now the exact solution is
y( t ) = 2e t − t − 1
Error at t = 0.5 is
71
y(0.5) − y 5 = ( 2e 0.5 − 0.5 − 1) − 1.79 / 44128
=1.79744254 − 1.79744128
= 0.000001261
= 0.13× 10 − 5.
Let us consider another example
Example 3 : Solve the IVP
y' = 2 y + 3e t , y(0) = 0 u sin g

a ) classical R − K method of O(h 4 )
b) R − K Gill method of O(h 4 ),
Find y (0.1), y(0.2), y(0.3) taking h = 0.1. Also find the errors at t = 0.3, if the exact solution is y(t) = 3(e 2 t − e t ).
Solution: a) Classical R-K method is
1
yn+1 = yn + (K1 +2K2 +2K3 + K4)
6
Here t0 = 0, y0 = 0, h = 0.1
K1 = h f(t0, y0) = 0.3

⎛ h K1 ⎞
K2 = h f ⎜⎜ t 0 + , y 0 + ⎟ = 0.3453813289
⎝ 2 2 ⎟⎠
⎛ h K2 ⎞
K3 = h f ⎜ t 0 + , y 0 + ⎟ = 0.3499194618
⎝ 2 2 ⎠
K4 = h f (t0 + h, y0 + K3) = 0.4015351678
y1 = 0.3486894582
Taking t 1 = 0.1, y 1 = 0.3486894582, we repeat the process and obtain
K1= 0.4012891671 K2 = 0.4584170812

K3 = 0.4641298726 K4 = 0.6887058455
Y(0.2) = 0.8112570941
Taking t2 = 0.2, y2 = 0.837870944 and repeating the process we get
K1= 0.53399502 K2 = 0.579481565

K3 = 0.61072997 K4 = 0.694677825
∴ y (0.3) = 1.416807999
b) R-K gill method is
y n +1 = y n +
1
6
( )
K 1 + (2 + (2 − 2 K 2 + (2 + 2 ) K 3 + K 4 )
Taking t0 = 0, y0 = 1 and h = 0.1, we obtain
K1= 0.3 K2 = 0.3453813289

K3 = 0.3480397056 K4 = 0.4015351678
y (0.1) = 0.3486894582
Taking t1 = 0.1, y1 = 0.3486894582, we obtain
72
using Runge-Kutta
K1= 0.4012891671 K2 = 0.4584170812 Methods
K3 = 0.4617635569 K4 = 0.5289846936
y (0.2) = 0.8112507529
Taking t2 = 0.2, y2 = 0.8112507529, we obtain
K1= 0.528670978 K2 = 0.6003248734

K3 = 0.6045222614 K4 = 0.6887058455
y (0.3) = 1.416751936
From the exact solution we get
y(0.3) = 1.416779978
Error in classical R-K method (at t = 0.3) = 02802 ×10 −04
Error in R-K Gill method (at t = 0.3) = 0.2804 ×10 −04

You may now try the following exercises
Solve the following IVPs using R-K method of O(h4)
y−t
E5) y' = , y(0) =1. Find y(0.5) taking h = 0.5.
y+t
E6) y ′ 1─ 2ty, y(0.2) = 0.1948. Find y(0.4) taking h = 0.2.
E7) 10ty '+ y 2 = 0, y(4) = 1. Find y(4.2) taking h = 0.2. Find the error give n the
1
exact solution is y( t ) = where c = 0.86137
c + 0.1in t '
1 y
E8) y'= 2
− − y 2 , y(1) = − 1. Find y (1.3) taking h = 0.1. Given the exact solution
t t
1
to be y(t) = find the error at t = 1.3.
t'
We now end this unit by giving a summary of what we have covered in it.
4.3 SUMMARY
In this unit we have learnt the following :
1) Runge-Kutta methods being singlestep methods are self-starting methods.
2) Unlike Taylor series methods, R-K methods do not need calculation of higher order derivatives of f(t, y) but need only the
evaluation of f(t, y) at the off-step points.
3) For given IVP of the form
y ' = f ( t , y) , y' ( t 0 ) = y 0 , tε [t 0 , b]
where the mesh points are t j = t 0 + jh , j = 0,1, ...., n.
t n = b = t 0 + nh , R − K methods are obtained by writing
y n +1 = y n + h (weighted sum of the slopes)
73
m
= yn + ∑W K
i =1
i i
where in slopes are used. These slopes are defined by
⎡ i −1 ⎤
K i = f ⎢ t n + C i h,
⎢⎣
∑a i =1
ij k j ⎥ , i = 1, 2, ..............., m, C1 = 0.
⎥⎦
Here is the order of the method. The unknowns Ci, aij and Wj are then obtained by expanding Ki’s and yn+1 in Taylor series about the
point (tn, yn) and comparing the coefficients of different powers of h.
1
E1) Heun’s method : y n +1 = y n + ( K1 + K 2 )
2
Starting with t 0 = 0, y 0 = 1, h = 0.1
∴ K1 = 0.01
K2 = 0.010301
y(0.1) = 1.0101505
Taking t1 = 0.1, y1 = 1.0101505
K1 = 0.0103040403
K2 = 0.0181327468
y(0.2) = 1.020709158
1
Optimal R-K, method : y n +1 = y n + (K1 + 3K 2 )
4
t 0 = 0, y 0 = 1, h = 0.1
K1 = 0.01, K 2 = 0.01017823
y(0.1) =1.010133673
t1 = 0.1, y1 =1.010133673
K i = 0.0103037, K 2 = 0.010620.
y(0.2) =1.020675142
E2) Heun’s method:
K1 = 0.2, K2 = 0.208
y(0.2) = 0.204
K1 = 0.2083232, K2 = 0.2340020843
y(0.4) = 0.4251626422
Optimal R-K, method:

K1 = 0.2, K2 = 0.2035556
y (0.2) = 0.2026667
K1 = 0.2082148, K2 = 0.223321245
y(0.4) = 0.42221134
Taylor Series method

y ′ = 1 + y2, y ′′ = 2yy ′
y(0) = 0, y ′ (0) = 1, y ′′ (0) = 0
y(0.2) = 0.2
y ′ (0.2) = 1.04, y ′′ (0.2) = 0.416
y (0.4) = 0.41632
Now the exact solution is y(t) = tan t

74
using Runge-Kutta
Exact y (0.4) = 0.422793219 Methods
Error in Heun’s method = 0.422793219
Error in Optimal R-K method = 0.236 ×10 −2

Error in Optimal R-K method = 0.582 ×10 −3
Error in Taylor series method = 0.647 ×10−2
E3) Heun’s method:
K1 = 0.05, K2 = 0.0825
y(0.1) = 1.06625
K1 = 0.0833125
y(0.2) = 1.166645313
Optimal R-K, method
K1 = 005, K2 = 0.071666667
y(0.2) = 1.166645313
Exact y(0.2) = 1.167221935
Error in both the methods is same and = 0.577 x 10-3
1
E4) Heun’s method : y n +1 = y n + (K1 + 3K 3 )
4
Starting with t0 = 0, y0 = 2, h = 0.1, we have
K i = 0.2, K 2 = 0.203334, K 3 = 0.206889

y(0.1) = 2.205167
t1 = 0.1, y1 = 2.205167 we have
K1 = 0.210517, K 2 = 0.214201, K 3 = 0.218130
y(0.2) = 2.421393717
1
Optimal R − K method : y n +1 = y n + (2K1 + 3K 2 + 4K 3 )
9
K1 = 0.2, K 2 = 0.205, K 3 = 0.207875

y(0.1) = 2.205167
t1 = 0.1, y1 = 2.205167
K1 = 0.2105167, K 2 = 0.2160425, K 3 = 0.219220
y(0.2) = 2.421393717
exact y(0.2) = 2.421402758
Since y(0.2) is same by both the mehods
Error = 0.9041 × 10 − 5 in both the methods at t = 0.2.
K1 = 0.5, K 2 = 0.333333
E5) K 3 = 0.3235294118, K 4 = 0.2258064516
y(0.5) =1.33992199.
E6) K1 = 0.184416, K2 = 0.16555904 75

K3 = 0.1666904576, K4 = 0.1421615268
y (0.4) = 0.3599794203.
E7) K1 = -0.005, K2 = -0.004853689024

K3 = -0.0048544, K4 = -0.004715784587
y (4.2) = 0.9951446726.
Exact y (4.2) = 0.995145231, Error = 0.559 × 10 −6
E8) K1 0.1, K2 = 0.09092913832

K3 = 0.9049729525, K4 = 0.8260717517
y(1.1) = ─ 0.909089993
K1 = 0.08264471138, K2 = 0.07577035491
K3 = 0.07547152415, K4 = 0.06942067502
y(1.2) = ─ 0.8333318022
K1 = 0.06944457204, K2 = 0.06411104536
K3 = 0.06389773475, K4 = 0.0591559551
y(1.3) = ─ 0.7692307692
Exact y(1.3) = ─ 0.7692287876
Error = 0.19816 x 10-5
76
Linear System of
Algebraic Equations &
Polynomial UNIT 2 INTERPOLATION
Interpolation
Structure Page Nos.

2.0 Introduction 24
2.1 Objectives 25
2.2 Lagrange’s Form 25
2.2.1 Problem of Interpolation
2.2.2 Lagrange’s Form of Interpolating Polynomial
2.2.3 Inverse Interpolation
2.2.4 General Error Term
2.3 Newton Form of the Interpolating Polynomial 31
2.4 Interpolation at Equally Spaced Points 37
2.4.1 Differences – Forward and Backward Differences
2.4.2 Newton’s Forward-Difference and Backward-Difference Formulas
2.5 Summary 43
2.0 INTRODUCTION
The interpolation has been defined as the art of reading between the lines of a table,
and in elementary mathematics the term usually denotes the process of computing
intermediate values of a function from a set of given values of that function. Suppose
the value of the function f(x) (instead of the analytical formula representing the
function) is tabulated at a discrete set of values of the argument x at x0, x1, …, xn x0 ≤
x1 ≤ … ≤ xn. If the value of f(x) is to be found at some point ξ in the interval [xo, xn]
and ξ is not one of the nodes xi, then value is estimated by using the known values of
f(x) at the surrounding points. Such estimates of f(x) can be made using a function
that fits the given data. If the point ξ is outside the interval [xo, xn], then the
estimation of f( ξ ) is called extrapolation.
The general problem of interpolation, however, is much more complex than this. In
higher mathematics we often have to deal with functions whose analytical form is
either totally unknown or else is of such a nature, complicated or otherwise, that the
function cannot easily be subjected to certain operations like differentiation and
integration etc. In either case, it is desirable to replace the given function by another
which can be more readily handled.
We derive various forms of the interpolating polynomial. Polynomials are used as the
basic means of approximation in nearly all areas of numerical analysis. We have
divided our discussion on polynomial interpolation into 3 sections. First we discuss
Lagrange form of interpolating polynomial for unequally spaced nodes. Also general
expression for the error (truncation) of polynomial interpolation is obtained which
provides the estimates of the error in polynomial approximation. In next section, we
deal with another very useful form of interpolating polynomial called the Newton
form of interpolating polynomial. Also we obtain another expression for the error
term in term of divided difference.
Finally we deal with some useful forms of interpolating polynomial for equally spaced
nodes like Newton’s forward difference form Newton’s backward difference form
after introducing the concepts of forward and backward differences.
24
Interpolation
2.1 OBJECTIVES
After going through this unit, you will be able to:
• find the Lagrange’s form and Newton’s divided difference form of

interpolating polynomials interpolating f(x) at n + 1 distinct nodal points;
• compute the approximate value of f a non-tabular point;
• compute the value of x (approximately) given a number y such that f( x ) = y
(inverse interpolation);
• compute the error committed in interpolation, if the function is known, at a non-
tabular point of interest;
• find an upper bound in the magnitude of the error;
• form a table of divided differences and find divided differences with a given set
of arguments from the table;
• write a forward (backward) difference in terms of function values from a table
of forward (backward) differences and locate a difference of given order at a
given point from the table; and
• obtain the interpolating polynomial of f(x) for a given data by Newton’s
forward (backward) difference formula.
2.2 LAGRANGE’S FORM

2.2.1 Problem of Interpolation
We are here concerned with a real-valued function f(x) defined on the interval [a, b]
such that the analytical formula representing the function is unknown, but the values
of the function f(x) are given for a given set of n + 1 distinct values of x = x0, x1, x2,
…, xn where x0 < x1 < x2, …, < xn lying in the interval [a, b]. We denote f(xk) by fk.
The technique of determining an approximate value of f(x) for a non-tabular value of
x which lies in the interval is called interpolation. The process of determining the
value of f(x) for a value of x lying outside the interval [a, b] is called extrapolation.
However, we shall assume that f(x) is defined in (─ ∞ , ∞ ) in which it is continuously
differentiable a sufficient number of times.
2.2.2 Lagrange’s Form of Interpolating Polynomial

In this section, we derive a polynomial P(x) of degree ≤ n which agrees with values
of f at the given (n + 1) distinct points, called the nodes or abscissas. In other words,
we can find a polynomial P(x) such that P(xi) = fi, i = 0, 1, 2, …, n. Such a
polynomial P(x) is called the interpolating polynomial of f(x).
First we prove the existence of an interpolating polynomial by actually constructing

one such polynomial having the desired property. The uniqueness of the interpolating
polynomial is proved by invoking the corollary of the fundamental theorem of
algebra. Then we derive a general expression for error in approximating the function
by the interpolating polynomial at a point and this allows us to calculate a bound on
the error over an interval.
In polynomial interpolation the approximating function is taken to be a polynomial

Pn (x) of degree ≤ n such that
Pn ( xi ) = f ( xi ) i = 0,1,2 …..n (2.2.1)

Let Pn (x) = a n x n + a n −1 x n −1 + ... + a 1 x x + a 0 . Then, by conditions (4.3.1) we have
25
Linear System of
Algebraic Equations & a n x in + a n −1 x in −1 + ..... + a 1 x i + a 0 = f ( x i ), i = 0, 1, 2…, n.
Polynomial
Interpolation
This is a system of n + 1 linear equations in n + 1 unknowns a0, a1, …, an. Since the
determinant of the co-efficients
1 x0 … x 0n
1 x1 … x1n = ∏i> j
( xi − x j ) ≠ 0
… …
1 xn … x nn
as xo, x1…….xn are distinct points/nodes, the values of a0, a1, …, an can be uniquely
determined so that the polynomial Pn(x) exists and is unique. But this does not give
us the explicit form of Pn(x). Hence in the following we first show the existence of
Pn(x) by constructing one such polynomial and then prove its uniqueness.
Theorem 1: Let x0, x1, …, xn be n + 1 distinct points on the real line and let f(x) be
a real-valued function defined on some interval I = [a, b] containing these points.
Then, there exists exactly one polynomial Pn(x) of degree ≤ n, which interpolates f(x)
at xo, x1, …, xn, that is, Pn ( xi ) = f ( xi ) = f i i = 0, 1, 2, …, n.
Proof: Consider the problem of determining a polynomial of degree 1 that passes

through the distinct points ( x0 y 0 ) and ( x1, f 1 ) . That is, we are approximating the
function f by means of a first degree polynomial interpolating f(x) at x = x0 and x1.
Consider the polynomial

(x − x1 ) (x − x0 )
P1 (x) = f0 + f1
(x0 − x1 ) (x1 − x0 )
Then P1(x0) = f0 and P1(x1) = f1. Hence P1(x) has the required properties. Let
( x − x1 ) ( x − x0 )
L0(x) = and L1(x) = , then P1(x) = L0(x) f0 + L1(x) f1.
( x0 − x1 ) ( x1 − x0 )
Also we note that L0 ( x 0 ) = 1, L1 ( x0 ) = 0, L0 ( x1 ) = 0 and L1(x1) = 1.
For the general case, let the required polynomial be written as
Pn ( x) = L0 ( x) f 0 + L1 ( x) f 1 + ... + Ln ( x) f n (2.2.2)
n
= ∑ Li ( x) f i
i =0
Setting x = xj, we get

n
fj = Pn ( x j ) = ∑ L (x
i =0
i j ) fi
Since the polynomial fits the data exactly,

we must have
1 i= j
Li ( x j ) = 
0 i≠ j
The polynomial Li(x) which are of degree ≤ n are called the Lagrange fundamental
polynomials. It is easily verified that these polynomials are given by
26
Interpolation
( x − x0 )( x − x1 )...( x − xi −1 )( x − xi +1 )...( x − x n )
Li ( x) =
( xi − x 0 )( xi − x1 )...( xi − xi −1 )( xi − xi +1 ).....( xi − x n )
n n
= Π (x − x j ) Π ( xi − x j ) , i = 0, 1, 2, …, n. (2.2.3)
j =0 j =0
j ≠i j ≠i
Substitution of (4.3.2) in (4.3.1) gives the required Lagrange form of the interpolating
polynomial. The uniqueness of the interpolating polynomial can be established easily
with the help of following results.
Lemma 1. If x1, x2 , …, xk are distinct zeros of the polynomial P(x), then

P(x) = (x – x1) (x – x2) … (x – xk) R(x)
for some polynomial R(x)
Corollary. If Pk(x) and Qk(x) are two polynomials of degree ≤ k which agree at the
k +1 distinct points then Pk(x) = Qk(x) identically.
Example 1: Find the Lagrange’s interpolating polynomial for the following data:
X 1 1 1
4 3
f(x) –1 2 7
Solution:
 1
 x − (x − 1)
3  1
L0 (x) =  = 16  x − (x − 1)
 1 1  1   3
 −  − 1
 4 3  4 
 1
 x − (x − 1)
4  1
L1 (x) =  = −18 x − (x − 1)
 1 1  1   4 
 −  − 1
 3 4  3 
 1  1
 x −  x − 
L2 (x) = 
3  4  = 2 x − 1  x − 1 
  
 1  1  3  4
 1 −  1 − 
 3  4
Hence P2 (x) = L0 (x) (-1) + L1 (x) (2) + L2 (x) (7)
 1  1  1  1
P 2 (x) = −16  x − (x − 1) − 36  x − (x − 1) + 14 x −  x − 
 3  4  3  4
Example 2: If f(1) = –3, f(3) = 9, f(4) = 30 and f(6) = 132, find the Lagrange’s
interpolation polynomial of f(x). Also find the value of f when x = 5.
Solution: We have x0 = 1, x1 = 3, x2 = 4, x3 = 6 and

f0 = –3, f1 = 9, f2 = 30, f3 = 132.
The Lagrange’s interpolating polynomial P3(x) is given by
P3(x) = L0(x) f 0 + L1 ( x) f 1 + L 2 ( x) f 2 + L3 ( x) f 3 (2.2.4)
27
Linear System of
Algebraic Equations & Where
Polynomial ( x − 3)( x − 4)( x − 6) 1
Interpolation L0 ( x) = = − ( x 3 − 13 x 2 + 54 x − 72)
(1 − 3)(1 − 4)(1 − 6) 30
( x − 1)( x − 4)( x − 6) 1 3
L1 ( x) = = ( x − 11x 2 + 34 x − 24)
(3 − 1)(3 − 4)(3 − 6) 6
( x − 1)( x − 3)( x − 6) 1
L 2 ( x) = = − ( x 3 − 10 x 2 + 27 x − 18)
(4 − 1)(4 − 3)(4 − 6) 6
( x − 1)( x − 3)( x − 4) 1 3
L3 ( x ) = = ( x − 8 x n + 19 x − 12)
(6 − 1)(6 − 3)(6 − 4) 30
Substituting these in (2.2.4) we have:

1 1
P3(x) = − (x 3 − 13x 2 + 54x − 72)( −3) + (x 3 − 11x 2 + 34x − 24)
30 6
1 3
×(9) − (x − 10x 2 + 27x − 18)(30)
6
1
+ (x 3 − 8x 2 + 19x − 12(132)
30
which gives on simplification
P3(x) = x3 – 3x2 + 5x – 6.
f(5) ≈ P3(5) =(5)3 − 3(5)2 + 5 × 5 −6 = 125 − 75 + 25 −6 = 69
You may now solve the following exercises
E1) Prove the uniqueness of the interpolating polynomial using corollary of

Lemma 1.
E2) Find Lagrange’s interpolating polynomial for the data. Hence obtain f(2).
x 0 1 4 5
f(x) 8 11 68 123
E3) Using the Lagrange’s interpolation formula, find the value of y when x=10
x 5 6 9 11
f(x) 12 13 14 16
2.2.3 Inverse Interpolation

In inverse interpolation in a table of values of x find y = f(x), one is given a number
y and wishes to find the point x so that f (x ) = y , where f(x) is the tabulated
function. For this, we naturally assume that the function y = f(x) has a unique inverse
x = g(y) in the range of the table. The problem of inverse interpolation simply reduces
to interpolation with x-row and y-row in the given table interchanged so that the
interpolating points now become y0, y1, …, yn (same as f0, f1, …, fn i.e. f(xi) = fi = yi)
and the corresponding function values are x0, x1, …, xn where the function is x = g(y).
Since the points y0, y1, …, yn are invariably unequally spaced, this interpolation can
be done by Lagrange’s form of interpolation (also by Newton’s divided difference
form discussed later). By Lagrange’s formula
n n (y − y j )
Pn(y) = ∑x
i =0
i
j =0 (yi − y j )
Π and
j ≠i
28
x ≈ Pn ( y ) . This process is called inverse interpolation Interpolation
Example 3: Find the value of x when y = 3 from the following table of values
x 4 7 10 12
y –1 1 2 4
Solution: The Lagrange’s interpolation polynomial of x is given by
(y − 1)(y − 2)(y − 4) (y + 1)(y − 2)(y − 4)

P(y) = (4) + (7)
( −2)( −3)( −5) (2)(1)( −3)
(y + 1)(y − 1)(y − 4) (y + 1)(y − 1)(y − 2)

+ (10) + (12)
(3)(1)( −2) (5)(3)(2)
(2)(1)( −1) (4)(1)( −1)

So P(3) = ⋅ (4) + ⋅ (7)
− (2)(3)(5) (2)(3)
(4)(2)( −1) (4)(2)(1)

+ (10) + (12)
− (3)(2) (5)(3)(2)
4 14 40 48 182
= − + + = = 12.1333
15 3 3 15 15
∴ x(3) ≈ P(3) = 12.1333.

E4) Using Lagranges interpolation formula, find the value of f(4) from the
following data:
x 1 3 7 13
f(x) 2 5 12 20
E5) From the following table, find the Lagrange’s interpolating polynomial, which
agrees with the values of x at the given value of y. Hence find the value of x
when y = 2.
x 1 19 49 101
y 1 3 4 5
E6) Using the inverse Lagrange interpolation, find the value of x when y = 3 from
the following table of values:
x 36 54 72 144
y –2 1 2 4
2.2.4 General Error Term

The next step is to estimate/calculate a remainder term or bound for the error involved
in approximating a function by an interpolating polynomial.
Let En(x) = f(x) – Pn(x) be the error involved in approximating the function f(x) by an
interpolating polynomial. We derive an expression for En(x) in the following
29
Linear System of
Algebraic Equations & theorem. We shall just indicate the proof. This result helps us in estimating a useful
Polynomial bound on the error as explained through an example.
Interpolation
Theorem 2: Let x0, x1, ..., xn be distinct numbers in the interval [a, b] and f has
(continuous) derivatives up to order (n + 1) in the open interval (a, b). If Pn(x) is the
interpolating polynomial of degree ≤ n, which interpolating f(x) at the points x0, x1,…,
xn, then for each x ∈ [a, b], a number ξ (x) in (a, b) exists such that
f ( n +1) (ξ ( x) )
En(x) = f(x) – Pn(x) = ( x − x 0 )( x − x1 )...( x − x n ) .
(n + 1)!
Proof (Indication): If x ≠ x k for any k = 0, 1, 2, …, n, define the function g for t in
n (t − x )
j
[a, b] by g(t) = f(t) – Pn(t) – [f(x) – Pn(x)] Π .
j = 0 (x − x )
j
g(t) has continuous derivatives up to (n + 1) order. Now, for k = 0, 1, 2, …, n, we
have g(xk) = 0 and g(x) = 0.
Thus, g has continuous derivatives up to order (n + 1) and g vanishes at the (n + 2)

distinct points x, x0, x1, …, xn. By generalized Rolle’s Theorem stated below, there
exists ξ (ξ(x) ) in (a, b) for which g(n+1) (ξ ) = 0. That is
[ f ( x) − Pn ( x)]
0 = g(n+1) (ξ ) = f(n+1)( ξ ) – (n+1)! n
∏ (x − x )
i =0
i
where the differentiation is with respect to t.
Simplifying above we get the error at x = x
En ( x ) = f ( x ) – P n ( x ) =
f ( n +1)
(ξ(x )) n
(n + 1)!
∏ (x − x )
i =0
i (2.2.5)
By repeated application of Rolle’s Theorem the following theorem can be proved.
Theorem 3 (Generalized Rolle’s Theorem):

Let f be a real-valued function defined on [a, b] which is n times differentiable on
(a, b). If f vanishes at the (n + 1) distinct prints x0, x1, …, xn, in [a, b] then a number
c in (a, b) exists such that f(n) (c) = 0.
The error formula derived above, is an important theoretical result and error formula
and interpolating polynomial will be used in deriving important formula for numerical
differentiation and numerical integration
It is to be noted that ξ = ξ (x ) depends on the point x at which error estimate is

required. This error formula is of limited utility since f(n+1) (x) is not known (when we
are given a set of data at specific nodes) and the point ξ is hardly known. But the
formula can be used to obtain a bound on the error of interpolating polynomial as
illustrated below:
Example 4: The following table given the values of f(x) = ex, .1 ≤ x ≤ 2. If we fit an
interpolating polynomial of degree four to the data given below, find the magnitude of
the maximum possible error in the computed value of f(x) when
x = 1.25.
x 1.2 1.3 1.4 1.5 1.6

f(x) 3.3201 3.6692 4.0552 4.4817 4.9530
30
Solution: From Equation, the magnitude of the error associated with the 4th degree Interpolation
polynomial approximation is given by
f (5) (ξξ
E4 (x) = (x − x0 )(x − x1 )(x − x2 )(x − x3 )(x − x4 )
5!
(5 )
f (ξξ
= (x − 1.2)(x − 1.3)(x − 1.4)(x − 1.5)(x − 1.6)
5!
f (5 )(x) = e x and Max f (5) (x) = e1.6 = 4.9530

1.2 ≤ x ≤1.6
Since f(x) = ex is increasing in [1.2, 1.6]. Hence
4.9530
E4 (1.25) ≤ (1.25 − 1.2)(1.25 − 1.3)(1.25 − 1.4)(1.25 − 1.5) × = 0.00000135
120
When nodes are equipspaced, we shall get another bound later.
2.3 NEWTON FORM OF THE INTERPOLATING

POLYNOMIAL
The Lagrange’s form of the interpolating polynomial discussed in previous section
has certain drawbacks. One generally calculates a linear polynomial P1(x), a quadratic
polynomial P2(x) etc. by increasing the number of interpolating points, until a
satisfactory approximation Pk(x) to f(x) has been found. In such a situation, Lagrange
form does not take any advantage of the availability of Pk-1(x) in calculating Pk(x). In
Newton form, this advantage is taken care of.
Before deriving Newton’s general form of interpolating polynomial, we introduce the

concept of divided difference and the tabular representation of divided differences.
Also the error term of the interpolating polynomial in this case is derived in terms of
divided differences. Using the two different expressions for the error term we
establish a relationship between nth order divided difference and the nth order
derivative.
Suppose we have determined a polynomial Pk-1(x) of degree ≤ k – 1 which interpolates

f(x) at the points x0, x1,…, xk–1. In order to make use of Pk–1(x) in calculating Pk(x) we
consider the following problem. What function g(x) should be added to Pk–1(x) to get
Pk(x)? Let g(x) = Pk(x) – Pk–1(x). Now, g(x) is a polynomial of degree ≤ k and
g ( xi ) = Pk ( xi ) − Pk −1 ( xi ) = f ( xi ) − f ( xi ) = 0 for = 0, 1, …, k – 1.
Hence g(x) can be written as A k ( x − x 0 ).....( x − x k −1 )

Where Ak is a constant depending on x0, x1, …, xk–1.
Suppose that Pn (x) is the Lagrange polynomial of degree at most n that agrees with
the function f at the distinct numbers x0, x1, …, xn. The divided difference of f with
respect to x0, x1, …, xn can be obtained by proving that Pn has the representation,
called Newton form.
Pn ( x ) = A 0 + A1 ( x − x 0 ) + A 2 ( x − x 0 )( x − x1 ) + ......
+ A n ( x − x 0 )......( x − x n −1 ) (2.3.1)
for appropriate constants A0, A1,…..An.
31
Linear System of
Algebraic Equations & Evaluating Pn(x) [Equation (4.4.1)] at xo we get A0 = Pn(x0) = f(x0).
Polynomial f(x1 ) − f(x1 )
Interpolation Similarly when Pn(x) is evaluated at x1, we get A1 = . Let us introduce
x1 − x0
the notation for divided differences and define it at this stage. The divided difference
of the function f, with respect to xi, is denoted by f[xi] and is simply the evaluation f at
xi, that is, f[xi]=f(xi). The first divided difference of f with respect to xi and xi+1 is
denoted by f[xi, xi+1] and defined as
f[xi +1 ] − f[xi ] f(xi +1 ) − f(xi )
f[xi, xi+1] = =
xi +1 − xi xi +1 − xi
The remaining divided differences of higher orders are defined inductively as follows.
The kth divided differences relative to xi, xi+1, …, xi+k is defined as follows:
f[xi +1,... xi + k ] − f[xi, .....xi + k −1 ]
f[xi, xi +1, .., xi + k ] =
xi + k − xi
where the (k –1)st divided differences f[xi,…, xi+k–1] and f[xi+1,…, xi+k] have been
determined. This shows that the kth divided difference is the divided differences of
(k –1)st divided differences justifying the name. It can be shown that the divided
difference f[x0, x1–1, xk] is invariant under all permutations of argument x0, x1,…, xk.
The constant A2, A3, …, An can be consecutively obtained in similar manner like the
evaluation of A0 and A1. As shown in the evaluation of A0 and A1 , the required
constants Ak=f[x0, x1,…xk].
This shows that Pn(x) can be constructed step by step with the addition of the next
term in Equation (4.4.1), as one constructs the sequence P0(x), P1(x) with Pk(x)
obtained from Pk −1(x) in the form
Pk (x) = Pk −1 (x) + Ak (x − x0 ).....(x − xk −1 ) (2.3.2)
That is, g(x) is a polynomial of degree ≤ k having (at least) k distinct zeros x0,…..xk–1.
This constant Ak is called the kth divided difference of f(x) at the points x0, x1, …, xk
and is denoted by f[x0, x1…, xk]. This coefficient depends only on the values of f(x) at
the points x0, x1, …, xk. The following expressions can be proved for f[x0, x1, …..xk].
k
f(xi )
f[x0, , x1 ....., xk ] = ∑
i =0 (xi − x0 )...(xi − xi −1 )....(xi − xi +1 )...(xi − xk )
(2.3.3)
and
f [ x1 ,.....x k ] − f [ x 0 ,.....x k − l ]
f [ x 0 , x1 ,.....x k ] = (2.3.4)
xk − x0
Equation (4.4.2) can now be rewritten as
Pk ( x) = Pk −1 ( x) + f [ x0, x1 ,..., x k ]( x − x 0 ),..., ( x − x k −1 ) (2.3.5)
Using Equation (4.4.5), Equation (4.4.1) can be rewritten as

Pk ( x) = f [ x0 ] + ( x − x0 ) f [ x0 , x1 ] + ( x − x0 )( x − x) ) f [ x0 , x1 , x 2 ]
+ ... + ( x − x0 )( x − x1 )...( x − x n −1 ) f [ x0 ,..., x n ] (2.3.6)
This can be written compactly as follows

n i −1
Pn ( x) = ∑ f [ x 0, x1 ...xi ]∏ ( x − x j ) (2.3.7)
i =0 j =0
This is Newton’s form of interpolating polynomial or Newton’s interpolatory divided-

difference formula.
32
Methods for determining the explicit representation of an interpolating polynomial Interpolation
from tabulated data are known as divided-difference methods. These methods can be
used to derive techniques for approximating the derivatives and integrals of functions,
as well as for (approximating the solutions) to differential equations.
2.3.1 Table of Divided Differences

Suppose we denote, for convenience, a first order divided difference of f(x) with any
two arguments by f[., .], a second order divided difference with any three arguments
by f[., ., .] and so on. Then the table of divided difference can be written as follows:
Table 1
X f[.] f[., .] f[., ., .] f[[., ., ., .] f[[., ., ., .,.]
x0 f0
f[x0, x1]
x1 f1 f[x0, x1, x2]
f[x1, x2] f[x0, x1, x2, x3]
x2 f2 f[x1, x2, x3] f[x0, x1, x2, x3, x4]
f[x2, x3] f[x1, x2, x3, x4]
x3 f3 f[x2, x3, x4]
f[x3, x4]
x4 f4
Example: If f(x) = x3, find the value of f[a ,b, c].
f(b) − f(a) b 3 − a 3
Solution: f[a, b] = = = b 2 + ba + a 2
b−a b−a
Similarly f[b, c] = b2 + bc + c2
f[b,c] − f[a,b]
Hence f[a, b, c] =
c−a
b + bc + c 2 − b 2 − ba − a 2
2
=
c−a
(c − a)(c + a + b)
=
(c − a)
=a+b+c
1 1
E7) If f(x)= , show that f[a, b, c, d] = – .
x abcd
E8) Using divided difference show that the following data
x 1 2 3 5 6
f(x) 1 3 7 21 31
represents a second degree polynomial. Obtain this polynomial. Hence, find

the approximate value of f(4).
Example 5: Form the following table of values, find the Newton’s form of
interpolating polynomial approximating f(x).
x –1 0 3 6 7
f(x) 3 –6 39 822 1611
33
Linear System of
Algebraic Equations & Also find the approximate value of the function f(x) at x = 2.
Polynomial
Interpolation Solution: We note that the value of x, that is xi are not equally spaced. To find the
desired polynomial, we form the table of divided differences of f(x)
Table 2
x f[.] f[., .] f[., ., .] f[[., ., ., .] f[[., ., ., .,.]
x0 –1 3
─9
x1 0 –6 6
15 5
x2 3 39 41 1
261 13
x3 6 822 132
789
x4 7 1611
Newton’s interpolating polynomial P4(x) is given by

P4 ( x ) = f ( x 0 ) + ( x − x 0 )f [ x 0 , x1 ] + ( x − x 0 )( x − x1 )f [ x 0 , x1 , x 2 ]
+(x – x0)(x – x1)(x – x2)f[x0, x1 ,x2, x3] +
(x – x0)(x – x1) (x – x2) (x – x3)f[x0, x1, x2, x3] (2.3.8)
The divided differences f[x0], f[x0, x1], f[x1, x1, x2], f[x0, x1, x2, x3], f[x0, x1, x2, x3] are
those which lie along the diagonal at f(x0) as shown by the dotted line. Substituting
the values of xi and the values of the divided differences in Equation (4.4.8), we get
P4(x) = 3 + (x + 1)( –9) + (x + 1)(x – 0)(6) + (x + 1)(x – 0)(x –3)(5)

+(x +1)(x– 0)(x – 3)(x – 6)(1)
This on simplification gives
P4(x) = x4 – 3x3 + 5x2 – 6

∴ f ( x) ≈ P4 ( x)
f (2) ≈ P4 (2) = 16 − 24 + 20 − 6 = 6
You may now solve the following exercises.
E9) From the table of values given below, obtain the value of y when x = 1 using
(a) divided difference interpolation formula.
(b) Lagrange’s interpolation formula
x 0 2 3 4
f(x) –4 6 26 64
We now give another expression for the error term, that is, the error committed in
approximating f(x) by Pn(x).
Let Pn(x) be the Newton form of interpolating polynomial of degree ≤ n which

interpolates f(x) at x0, …, xn. The Interpolating error En(x) of Pn(x) is given by
E n ( x) = f ( x) − Pn ( x). (2.3.9)
34
Let x be any point different from x0, x1,….xn. If Pn+1(x) is the Newton form of Interpolation
interpolating polynomial which interpolates f(x) at x0, x1, ….xn and x , then Pn+1( x )
= f( x ). Then by Equation (4.4.5) we have
n
Pn +1 ( x) = Pn ( x) + f [ x0 , x1 ,....., x n. x ]∏ ( x − x j )
j =0
Putting x = x , in the above, we have

n
f( x ) = Pn+1( x ) = Pn( x ) + f[x0,x1, ….xn, x ] ∏ (x − x
j =0
j )
n
That is, En( x ) = f( x ) – Pn ( x ) = f[x0,x1, …, xn, x ] ∏ (x − x
j =0
j )
This shows that the error is like the next term in the Newton form.
Comparing the two expressions derived for En(x), we establish a relationship between
divided differences and the derivatives of the function as follows:
f n +1 (ξ(x )) n
En ( x ) =
(n + 1) j =0 ∏(x − xi )
n
= f[x0 , x1 .....xn , x ] ∏ (x − x ) .
j= 0
i
Comparing, we have
f ( n +1) (ξ )
f[x0, x1, …, xn+1) =
(n + 1)!
(Considering x = xn+1 ). Further, it can be shown that ξ ∈ (min xi, max xi).
We state these results in the following theorem.
Theorem 4: Let f(x) be a real-valued function, defined on [a, b] and n time

differentiable in (a, b). If x0, x1,….xn are n+1 distinct points in [a, b], then there exists
ξ ∈ ( a, b ) such that
f (ξ )
(n)
f[x0,….xn]=
n!
n!
Corollary 1: If f(x) = xn, then f[x0, x1, …, xn]= =1
n!
Corollary 2: If f(x) = xk, k < n, then f[x0, …, xk] = 0

(Since the nth derivative of xk, k < n, is zero).
Example 6: If f(x)=an xn + an–1 xn–1 + … + a1 x + a0, then find f[x0, x1, …, xn].
n1
Solution: By corollaries 1 and 2, we have f [x0, x1, …, xn] = an + 0 = an
n1
Let us consider the error formula
f n +1 (ξ(x))
n
En(x) = f(x) – Pn(x) =
(n + 1)! ∏(x − x )
j =0
i
We want to obtain a bound on the error. For this we have

max f (n +1) (t)
En (x) = f(x) − Pn (x) ≤ t∈I max ψ n (t) (2.3.10)
(n + 1)! t∈I
where [a, b] = I, ψ n (x) = (x − x0 )(x − x1 )...(x − xn ).
35
Linear System of
Algebraic Equations & Example 7: Find a bound for the error in linear interpolation.
Polynomial
Interpolation Solution: For the case n = 1, we have linear interpolation. If x∈ [xi –1, xi] then we
approximate f(x) by P1(x) which interpolates at xi–1, xi. From Equation (4.4.10).
We have
1
E1(x) ≤ max .f"(x) max ψ1(x)
2 x ∈I x ∈I
where Ψ 1 (x ) = (x – xi–1) (x – xi).
dΨ 1
Now = x – xi–1 + x – xi = 0 ⇒ x = (xi -1 + xi ) 2 .
dx
Hence, the maximum value of |(x – xi–1) (x – xi)| occurs at x = x* = (xi -1 + xi ) 2 .

2
(xi − xi −1 ) h2
∴ max ψ1(x) = = .
xi −1 ≤t ≤xi 4 4
(xi − xi −1 )2 1 h2
Thus E 1(x) ≤ ⋅ max F"(x) = ⋅M
4 2 x ∈I 8
Where h = xi – xi–1 and f"(x) ≤ M on I=[xi–1, xi].
Example 8: Determine the largest step-size h that can be used in the tabulation of the
function f(x) = (1 + x2)2 on [0, 1] so that the error in the linear interpolation is less
than 5 × 10–5.
Solution: We have to determine the spacing h in a table of equally spaced values of

the function of f(x) = (1 + x2) 2 between 0 and 1. By assumption, the table will contain
1 −0 1
f(xi) with xi = 1 + ih, i = 0, …, N, where N = = .
h h
If x ∈[xi−1,xi ], we approximate f ( x ) by P1 ( x ), where P1(x) is the linear
polynomial which interpolates f(x) at xi–1 and xi. By Equation (4.4.10)
h2 h2
E 1( x) ≤ ⋅M ≤ ⋅ max f"(x)
8 8 x ∈[0,1]
When M = max |f''(x)|.

x ∈[xi −1,xi ]
h2
E 1(x ) ≤ .M' where M' = max |f''(x)|
8 0 ≤x ≤1
When x ∈ (0, 1).
Since f'' (x) = 12xz + 4 and it is an increasing function on [0, 1], max |f''(x)| = 16.
x∈[ 0 ,1]
2
h
Thus |E1( x )| ≤ .16 = 2h 2
8
We have 2h2 < 5 x 10–5 or h ≤ .005.
1 1000
That is N = = = 200 .
.005 5
E10) Determine the spacing h in a table of equally spaced points of the function
f(x) = x x between 1 and 2, so that the error in the linear interpolation is
less than 5 × 10–6.
36
Interpolation
2.4 INTERPOLATION AT EQUALLY SPACED
POINTS
Suppose the value of f(x) at (n + 1) equally spaced values of x are known or given,
that is, (xi, yi), i = 0,…, n are known where xi – xi–1 = h (fixed), i = 1, 2, …, x and y
= f(x). Suppose we are required to approximate value of f(x) or its derivative f’(x) for
some values of x in the interval of interest. The methods for solving such problems
are based on the concept of finite differences. However, our discussion will be
confined to obtain Newton’s forward and backward difference forms only. Each is
suitable for use under specific situation.
2.4.1 Differences – Forward and Backward Differences

Suppose that we are given a table of values (xi, yi ) i = 0,…, N where yi = f(xi) = fi.
Let the nodal points be equidistant. That is xi = x0 + ih, i = 0, 1, …, N with
N = (b ─ a)/h (I = [a, b]).
x − x0
For simplicity we introduce a linear change of variables s = s ( x) = , so that x
h
= x(s) = x0 + sh and introduce the notation.
f(x)=f(x0 + sh)=fs.
The linear change of variables transforms polynomials of degree n in x into

polynomial of degree n in s. For equally spaced nodes, we shall deal with two types
of differences, namely forward and backward and discuss their representation in the
form of a table. We shall also derive/give the relationship of these differences with
divided differences and their interrelationship.
Forward Differences
We denote the forward differences of f(x) of ith order at x = x0 + sh by ∆i f s and

define it as follows:
fs, i =0
∆ifs =  ∆(∆i −1fs ) = ∆i −1fs +1 − ∆i −1fs,i > o (2.4.1)

where ∆ denotes forward difference operator.
When s = k, that is x = sk, we have

for i = 1 ∆ fk = fk+1 –fk
for i =2 ∆ 2fk = ∆ fk+1 – ∆ fk = f k+2 – fk+1 – [fk+1 – fk]
= fk+2 –2fk+1 + fk
Similarly, you can obtain

∆ 3fk = fk+3 – 3 fk+2 + 3fk+1 – fk.
We recall the binomial theorem
s
 s  j s − j.
(a+b)s= ∑   a b (2.4.2)
j =0  j
s 
where   = C ( s, j )
j
 
and s is a real and non-negative integer.

The shift operator E is defined as
37
Linear System of
Algebraic Equations & E fi = fi+I (2.4.3)
Polynomial In general Ef(x)= f(x + h). We have Esfi = fi+s
Interpolation 1 −1
For example E4fi = fi+4. E 2
f i = f i + 1 and E 2
fi = f 1
2 i−
2
Now ∆ fi = fi +1 – f i = Ef i − f i = ( E − 1) f i
Hence the shift and forward difference operators are related by

∆ = E − 1 or E = 1+ ∆ .
Operating s times, we get
s
s 
∆ s = (E − 1) s = ∑
j =0
  E
 j
j
( − 1) s −j (2.4.4)
by Equation (4.5.2) (4.5.3), we get

s
s  s
∆ sfi = ∑
j =0
  ( −1)s −j( )fj +1
 j j
We now give in the following table, the forward differences of various orders using 5
values.
Table 3 : Forward Difference Table

x f(x) ∆f ∆ 2f ∆ 3f ∆ 4f
x0 f0
∆ f0
x1 f1 ∆ 2f0
∆ f1 ∆ 3f0
x2 f2 ∆ 2f1 ∆ 4f0
∆ f2 ∆ f1 3
x3 f3 ∆ f2
2
∆ f3
x4 f4
Note that the forward difference ∆ kf0 lie on a straight line slopping down ward to the
right.
Now we give Lemma 1 the relationship between the forward and divided differences,
which can be proved by induction. This relation is utilized to derive the Newton’s
forward-difference formula which interpolates f(x) at
xk + ih, i =0, 1, …, n.
Lemma 1: For all in ≥ 0

1 i
f[xk, xk+1, xk+i]= ∆ fk
i!h i
This has an easy corollary.
Corollary: If Pn(x) is a polynomial of degree n with leading co-efficients an, and

x0 is an arbitrary point, then
∆ nPn(x0) = an n! hn and
∆ n+1Pn(x0)=0, i.e., all higher differences are zero.
Backward Differences
The backward differences of f(x) of ith order at xs = x0 +sh are denoted by ∇ i fs. They
are defined as follows:
38
Interpolation
 fs, i =0 
∇ ifs = ∇ifs = ∇i −1 ( ∇fs ) = ∇i −1[fs −fs −1 ],i ≥ 1  = ∇t −1 fs − ∇t −1 fs −1 (2.5.5)
 
where ∇ denotes backward difference operator. When s = k, that is

x = x0 + kh = xk, we have for
i=1 ∇ fk = fk – fk–1
i=2 ∇ 2fk = ∇(fk −fk −1 ) = ∇fk − ∇fk −1 = fk −fk −1 +fk −2
Similarly for
i=3 ∇ 3fk = fk − 3fk −1 + 3fk −2 −fk −3
The relation between the backward difference operator ∇ and the shift operator E is
given by
∇ = 1 – E–1 E = (I – ∇ )–1
Also ∆ = E – 1 = E(I – E–1) = E ∇
(
Since ∇f k = f k − f k −1 = f k − E −1f k = 1 − E −1 f k )
Operating s time, we get
s
s 
∇ sfk = ( 1 – E–1)s fk = ∑( −1)
j =0
j
  f k − j
 j
We can extend the binomial coefficient notation to include negative numbers, by
letting
 − s  − s(− s − 1)(− s − 2)...(− s − i + 1) s( s + 1)...( s + i − 1)
  = = (−1) i
i  i! i!
The backward differences of various orders with 5 nodes are given in the following
table:
Table 4: Backward Difference Table
x f(x) ∇f ∇ 2f ∇ 3f ∇ 4f
x0 f0
∇ f1
x1 f1 ∇ 2f2
∇ f2 ∇ 3f3
x2 f2 ∇ f32
∇ 4f4
∇ f3 ∇ 3f4
x3 f3 ∇ f42
∇ f4
x4 f4
Note that the backward difference ∇ kf4 lie on a straight line slopping upward to the
right. Also note that ∆ fk = ∇ fk+1 = fk+1 – fk.
2.4.2 Newton’s Forward-Difference and Backward-Difference

Formulas
The Newton’s form of interpolating polynomial interpolating at xk, xk+1 ., xk+n is
n
Pn ( x) = ∑ ( x − x k )( x − x k +1 )....( x − x k +i −1 ) f [ x k +1 , x k + 2 ,..., x k +1 ] (2.4.6)
i =0
In this we make use of the following:

1
f[xk, …, xk+n] = n
∆ nfk and get
n! h
39
Linear System of
n 1
Algebraic Equations & Pn(x) = ∑ (x − x k )(x − x k +1 )...(x − x k +i −1 ) n ∆ifk (2.4.7)
Polynomial i =0 i! h
Interpolation
Setting k = 0, we have
Here xs = x0 + sh may be introduced, where
xs − x0
s=
h
s
Also f(x) = can be derived straight from Es f0 = (1 + ∆) f0
2 n
(x − x 0 ) ∆f0 (x − x 0 )(x − x 1 ) ∆ f0 (x − x 0 )...(x − x n −1 ) ∆ f0
Pn (x) = f0 + +
2
+ ...
n
(2.4.8)
1! h 2! h n! h
Also we have x − x k +j = x 0 + sh −[x 0 +(k + j)h] =(s − k − j)h
Substituting this in Equation (4.5.8) and simplifying we get

n
Pn (x) = p(x 0 + sh) = ∑(s − k)(s − k −1)....(s − k −i + 1)∆ f
i =0
i
k
n
 s −k 
= ∑
i =0
∆ifk 
 i 

(s − k)(s − k − 1) 2
= fk +(s − k)∆)k + ∆ fk +...
2!
(s − k)....(s − n − 1) n
+ ∆ fk (2.4.9)
n!
of degree ≤ n.
Setting k = 0, in Equation (4.5.9) we get the formula
n
s
Pn ( x 0 + sh ) = ∑ i =0
∆i f 0  
i
This form Equation (4.5.1)..(4.5.9)…. is called the Newton’s forward-difference
formula.
The error term is now given by

 s 
E n(x) =  h n +1f n +1(ξξ
 n +1 
For deriving the Newton’s Backward-Difference formula, we reorder the interpolating
nodes as xn, xn–1,…..x0 and apply the Newton’s divided difference form. This gives
Pn(x) = f[x n ] +(x − x n )f[x n −1,x n ] +(x − x n )(x − x n −1 )f[x n −2 ,x n −1,x n ]
+(x − x n ).....(x − x 0 )f[x n........x 0 ] (2.4.10)
Set x = xn + sh, then

x − x j = x n + sh − [ x n − (n − i)h ] = (s + n − j)h
x − x n − j = x n + sh − [ x n − (n − n + j)h ] = (s + j)h
and
(x − x n )(x − x n −1 )...(x − x n −i +1 ) = s(s + 1)...(s +i − 1)h x
1 1
Also we have f[x n −k ,...,x n ] = k
∆ k(x n ) = ∆ kf(x n )
k! h k! h k
Substituting those in Equation (2.4.10) and simplifying we get the following two
expressions
n
∑i! h (x − x
1
Pn(x) = i n )(x − x n −i )...(x − x n −i +1 )∆ifn
i =0
(x − x n ) ∆fn (x − x n )(x − x n −i ) ∆ 2fn

= fn + + +...
1! h 2! h2
40
Interpolation
(x − x n )(x − x n −i )...(x − x n ) ∆ nfn
+ (2.4.11)
n! hn
s(s + 1) 2 s(s + 1)...(s + n − 1) n
= fn + s∆∆n + ∆ fn +... + ∆ fn
2! n!
 −s  ( −1)k s(s + 1)...(s + k − 1)
Since   = , we have
k  k!
 −s  −s
2  2
−s
n  n
Pn(x) = f(x n ) +( −1) ∇f(x n ) +( −1)  ∇ f(x n ) +...+( −1)  ∇ f(x n ) (2.4.12)
1  2  n 
or
n
 −s  k
Pn(x) = ∑( −1)
k =0
k
 ∇ f(x n )
k 
This is called Newton’s backward-difference form.
In this case, error is given by

s(s + 1)...(s + n) n +1 n +1
E n(x) =( −1)n +1 h f (ξξ)
(n + 1)!
The forward-difference formula (4.5.7) is suitable for approximating the value of the
function at x that lies towards the beginning of the table and the backward-difference
form is suitable for approximating the value of the function at x that lies towards the
end of the table.
Example 9: Find the Newton’s forward-difference interpolating polynomial which

agrees with the table of values given below. Hence obtain the value of f(x) at x = 1.5.
x 1 2 3 4 5 6
f(x) 10 19 40 79 142 235
Solution:
Table 5: Forward Differences

x f(x) ∆f ∆ 2f ∆ 3f
1 10
9
2 19 12
21 6
3 40 18 0
39 6 0
4 79 24 0
63 6
5 142 30
93
6 285
The Newton’s forward-differences interpolation polynomial is
(x − 1)(x − 2) 2 (x − 1)(x − 2)(x − 3) 3

f(x) ≈ f0 +(x − 1)∆)0 + ∆ f0 + ∆ f0
2! 3!
(x − 1)(x − 2) (x − 1)(x − 2)(x − 3)
= 10 +(x − 1)(9) + (12) + (6)
2 6
On simplification we get
41
Linear System of
Algebraic Equations & f (x ) ≈ x 3 + 2x + 7
Polynomial
Interpolation ∴ f (1.5) = (1.5) 3 + 2(1.5) + 7 = 13.375
Example 10: Find Newton’s backward difference form of interpolating polynomial

for the data
x 4 6 8 10
f(x) 19 40 79 142
Hence interpolate f(9).
Solution: We have
Table 6: Backward Difference

x f(x) ∇f ∇ 2f ∇ 3f
4 19
21
6 40 18
39 6
8 79 24
63
10 142
63 ( x − 10)( x − 8) 24 ( x − 10)( x − 8)( x − 6) 6

Pn ( x ) = 142 + ( x − 10)+ ⋅ + ⋅
2 2! 4 3! 8
63 3
f (9) ≈ Pn (9) = 142 − − 3 − = 107.125
2 8

E11) Find the Newton’s backward differences interpolating polynomial for the data
of Example 9.
E12) Using forward differences, show that the following data represents a third
degree polynomial.
x –3 –2 –1 0 1 2 3
f(x) – 29 –9 –1 1 3 11 31
Find the polynomial and obtain the value of f(0.5).
E13) Using forward Differences, show that the following data:

x –1 0 1 2 3 4
f(x) 6 1 0 3 10 21
represents a second degree polynomial. Find this polynomial and an
approximate value of f(2.5)
E14) Estimate the value of f(1.45) from the data given below
x 1.1 1.2 1.3 1.4 1.5

f(x) 1.3357 1.5095 1.6984 1.9043 2.1293
E15) Evaluate the differences
(i) ∇ 3 [a 2 x 2 + a1 x + a 0 ]
(ii) ∇ 3 [a3 x 3 + a 2 x 2 + a1 x + a 0 ]
42
Interpolation
(iii) ∆3 [a3 x 3 + a 2 x 2 + a1 x + a0 ]
1
E16) Show that the nth order divided differences of f(x) = is
x
(−1) n /( x0 .x1 ...x n ) .
E17) A table of values is to be constructed for the function f(x) given by
f(x) = x4 + 1 in the interval [3, 4] with equal step-length. Determine the
spacing h such that linear interpolation gives results with accuracy 1 × 10-4.
E18) Find the Newton’s divided difference form of interpolating polynomial for the
data.
x –4 –1 0 2 5
f(x) 1245 33 5 9 1335
Also approximate f(1) from the polynomial.

E19) Construct Newton’s forward difference table for the data
x 3 5 7 9
f(x) 6 24 38 108
Hence approximate f(4) from Newton’s forward difference interpolating

polynomial.
E20) If f(x) = ax2 + bx + c (a, b, c, ∈, R), then show that f[1, 2, 3] = a.
2.5 SUMMARY
In the first section apart from deriving the Lagrange’s form of interpolating
polynomial for a given data, it has been shown that the interpolating polynomial for a
given data is unique. We have also seen how the Lagrange’s interpolation formula
can be applied with y as the independent variable and x as the dependent variable so
that the value of x corresponding to a given value of y can be calculated
approximately when some conditions are satisfied. Finally, we have derived the
general error formula and its use has been illustrated to judge the accuracy of our
calculation. In the next section, we have derived a form of interpolating polynomial
called Newton’s general form (divided difference form) which has some advantages
over the Lagrange’s form discussed in section 1. We have introduced the concept of
divided differences and discussed some of its properties before deriving Newton’s
general form. The error term also has been derived and we have established a
relationship between the divided difference and the derivative of the function f(x)
using the two different expressions of the error terms. In section 3, we have derived
interpolation formulas for data with equally spaced values of the argument. The
application of the formulas derived in this section is easier compared to the
application of the formulas derived in first and second sections.
The mathematical formulas derived in the unit are listed below for your easy reference.
1. Lagrange’s Form
n n
(x − x j )
Pn(x) = ∑
i =0
f(xi )Li(x) where Li(x) = ∏(x
j =0 i −xj)
j ≠i
2. Inverse Interpolation
43
Linear System of
Algebraic Equations & 
n
 n (y − y j )
Polynomial
Interpolation
Pn(y) = ∑ ∏
i =0
xi 
 j =0 (yi − y j )
 j ≠i
3. Interpolation Error
f (n +1)(ξξ(x)
n
E n(x) =f(x) − Pn(x) =
(n + 1)! i =0 ∏
(x − xi )
4. Divided Difference
f[x 1,.....x j ] −f[x 0 ,.....x j −1 ]
f[x 0 ,x 1,...x j ] =
x j − x0
5. Newton’s Form
n i −1
Pn(x) = ∑f[x ,...,x ]∏(x − x
i =0
0 i
j =0
j)
6. Interpolation Error (In terms of divided difference)

n
E n(x) =f[x 0,x 1,....x n,x] ∏(x − x
j =0
j)
f (n)(ξξ
7. f[x 0,x 1,...,x n ] = ,ξ ∈ (min xi, max xi)
n!
8. Newton’s Forward Difference Formula

n
s  i
Pn(x) = Pn(x 0 + sh) = ∑ ∆ f0
i =0 i 
s(s − 1) 2 s(s − 1).....(s − n + 1) n
= f0 + s∆∆0 + ∆ f0 +...+ ∆ f0
2! n!
When s =(x − x 0 )/h and
 s 
E n(x) =  h n +1f (n +1)(ξξ
 n +1 
9. Newton’s Backward Difference Formula:
n
− s
Pn ( x) = Pn ( x n + sh) = ∑ (−1) k  ∇ k f n
k =0 k 
where s = ( x − x n ) / h and
s( s + 1)...( s + n) n +1 ( n +1)
E n ( x) = (−1) n +1 h f (ξ )
(n + 1)!
1 n
10. f [ x k ,..., x k + n ] = ∆ f k and
n!h n
1
f [ x n − k ,..., x n ] = k
∇ k f ( xn).
k! h
E1) We show that if P(x) and Q(x) are two polynomials of degree ≤ k which
agree at the k + 1 distinct points x0, x1, … xk, then P(x) = Q(x) identically.
Let ψ (x) = P(x) = Q(x). Then ψ (x) is a polynomial of degree ≤ k, and by
Lemmal, it can be written as ψ (x) = (x – x0) (x – x1)… (x – xk) R(x) with
R(x) some polynomial. Let R(x) = cmxm + = cm–1xm–1 + … + c1x + c0 such that
44
cm ≠ 0. Then k ≥ degree of ψ = k + 1 + m which is absurd. Hence ψ (x) = 0 Interpolation
identically, so P(x) = Q(x).
E2) x 3 − x 2 + 3 x + 8, 18
E3) 14.6667
E4) 6.6875
E5) Let x = g(y). The Lagrange’s interpolating polynomial P(y) of g(y) is given by
1 19
P(y) = − (y 3 − 12y 2 + 47y −60) + (y 3 − 10y 2 + 29y − 20)
24 4
49 101 3
− (y 3 − 9y 2 + 23y − 15) + (y − 8y 2 + 19y − 12)
3 8
which, on simplification gives
P(y) = y 3 − y 2 + 1 when y = 2, x ≈ P(2) = 5 .
E6) x = g(y)
(y − 1)(y − 2)(y − 4) (y + 2)(y − 2)(y − 4)
P(y) = .(36) + .(54)
( −3)( −4)( −6) (3)( −1)( −3)
(y + 2)(y − 1)(y − 4) (y + 2)(y − 1)(y − 2)
+ .(72) + .(144)
(4)(1)( −2) (6)(3)(2)
(2)(1)( −1) (5)(1)( −1)
x ≈ P(3) = ×(36) + ×(54)
− 72 9
(5)(2)(1) (5)(2)(1)
+ × (72) + × (144)
8 36
= 1 − 30 + 90 + 40 = 101
1 1
−
E7) f[a,b] = b a =− 1
b −a ab
1 1
f[b,c] = − andf[c,d] = −
bc cd
1 1
f [a , b, c] = , f [b, c, d] =
abc bcd
f [b, c, d, ] − f [a , b, c] 1
f [a , b, c, d ] = =−
d−a abcd
E8) Divided Difference Table
x f[x] f[., .] f[., ., .]

1 1
2
2 3 1
4 0
3 7 1 0
7 0
5 21 1
10
6 31
Since third and higher order divided differences are zeros, given data represents a
second degree polynomial
P2(x) = f[x 0 ] +(x − x 0 ]f[x 0 ,x 1 ] +(x − x 0 )(x − x 1 )f[x 0 ,x 1,x 2 ]
45
Linear System of
Algebraic Equations & = 1 +(x − 1).2 +(x − 1)(x − 2).1 = x 2 − x + 1f(4) ≈ P2(4) = 13
Polynomial f(4) ≈ P2(4) = 13
Interpolation
E9) Lagrange’s interpolation polynomial
1 3
P(x) = (x 3 − 9x 2 + 26x − 24) + (x 3 − 7x 2 + 12x)
6 2
26 3
− (x −6x 2 + 8x) + 8(x 3 − 5x 2 + 6x)
3
= x 3 + x − 4 on simplification.
Divided difference interpolation polynomial:
x f[x] f[., .] f[., ., .] f[., ., .,.]

0 –4
5
2 6 5
20 1
3 26 9
38
4 64
P(x) = – 4 +(x – 0)5 + x(x – 2)(5) + x(x – 2)(x – 3)(1)

= x3 +x – 4
f (1) ≈ p(1) = −2 .
h2
E10) E 1(x ) ≤ M where M = max = |f'' (x)|
8 1 ≤x ≤2
3 12 3 −1
and x ε(1,2).f 1(x) = x ,f"(x) = x 2.
2 4
3
Max f"(x) =
1 ≤x ≤2 4
h2 3 1
E 1(x ) ≤ ⋅ ≤ 5 ⋅10 −6 = ⋅10 −5
8 4 2
32 16
i.e. h 2 ≤ ⋅10 −5 = 10 −5 ⇒ h = 0.0073
2 ×3 3
E11) Backward Difference Table
x f(x) ∇f ∇ 2f ∇ 3f
x0 1 10
9
x1 2 19 12
21 6
x2 3 40 18 0
39 6 … 0
x3 4 79 24 0
63 6
46
x4 5 142 30 Interpolation
93
x5 6 235
(x − x 5 )(x − x 4 ) 2
P(x) = f5 +(x − x 5 )∇f5 + ∇ f5
2!
(x − x 5 )(x − x 4 )(x − x 3 ) 3 3
+ ∇ f5 = 235 + 93(x − 6) + 15(x − 6)(x − 5) + (x − 4)(x − 5)(x − 6) = x + 2x + 7
3!
E12) Forward Difference Table
x f(x) ∆f ∆ 2f ∆ 3f ∆ 4f
–3 –29
20
–2 – 9 –12
8 6
–1 –1 –6 0
2 6
0 1 0 0
2 6
1 3 6 0
8 6
2 11 12
20
3 31
(x − x 0 ) (x − x 0 )(x − x 1 ) 2
f(x) ≈ P3(x) = f(x 0 ) + ∆f0 + ∆ f0
h 2! h 2
( x − x 0 )( x − x1 )( x − x 2 )
+ ∆3f 0
3!h 3
= x 3 + x + 1f(0.5) ≈ P3(0.5) = 1.625
x f(x) ∆f ∆ 2f ...
–1 6
–5
0 1 4
–1 0
1 0 4
3 0
2 3 4
7 0
3 10 4
11
4 21
47
Linear System of
Algebraic Equations & Since third and higher order differences are zeros, f(x) represents a second order
Polynomial polynomial
Interpolation
(x − x 0 ) (x − x 0 )(x − x 1 ) 2
f(x) ≈ P2(x) = f0 + ∆f0 + ∆ f0
h 2! h 2
(x + 1)(x − 0)
= 6 +(x + 1)( −5) + (4) = 2x 2 − 3x + 1
2
f(2.5) ≈ 6.
E14) Backward Difference Table
x f(x) ∇f ∇ 2f ∇ 3f ∇ 3f
1.1 1.3357
0.1738
1.2 1.5095 0.0151
0.1889 0.0019
1.3 1.6984 0.0170 0.0002
0.2059 0.0021
1.4 1.9043 0.0910
0.2050
1.5 2.1293
x − x n 1.45 −1.5
Here x n = 1.5,x = 1.45,h = 0.1∴s = = = −0.5
h 0.1
s(s + 1) 2 s(s + 1)(s + 2) 3
f ( x ) = f n + s∇ f n + ∇ fn + ∇ fn
2! 3!
s(s + 1)(s + 2)(s + 3)∆4 f n
+
4!
= 2.1293 − 0.1125 − 0.00239 − 0.00013 − 0.0000075
= 2.01427 ≈ 2.0143
E15) i) 0
f i (ξ )
ii) a3 3! h2 (Recall that f(x0,……xi]=
i!
iii) a3 3!h3 and consider x fixed)
E16) Prove this by induction.
h2
E17) |E1(x)| ≤ M 2 where M2 = Max |f”(x)|
8 3≤ x ≤ 4
We have f’ (x) = 4x3, f”(x) = 12x2 M2 = 12 ×16 = 192

2
h 1
We have ×192 ≤ 10 −4 ⇒ h ≤ 10 −2 ≈ 0.002
8 24
E18) Divided Difference Table
x f[.] f[., .] f[., ., .] f[[., ., ., .] f[[., ., ., .,.]

–4 1245
–404
–1 33 94
– 28 – 14
48
0 5 10 3 Interpolation
2 13
2 9 88
442
5 1355
P4(x) = 1245 − 404(x + 4) + 94(x + 4)(x + 1) − 14(x + 4)(x + 1)x +

3(x + 4)(x + 1)x(x − 2)
f(1) ≈ P4(1) = −5
x f(x) ∆f ∆ 2f ∆ 3f
3 6
18
5 24 –4
14 60
7 38 56
70
9 108
x − x0 4 − 3 1
s= = =
h 2 2
1 1 1
f ( x 0 + sh ) ≈ P4 ( x 0 + sh ) = f 0 + ∆f 0 + ∆2 f 0 + ∆3f 0
2 8 16
1 1 1
i.e., f (4) = 6 +
⋅18 − ⋅ −4 + .60
2 8 16
1 15
= 6+9+ + = 19.25
2 4
1 (n)
E20) We have f[x 0 ,.....,x n ] = f (ξξ) ∈(minxi,maxxi )
n!
1 1 d 2 (ax 2 + bc + c) 1
f[1,2,3] = f"(ξξ = = .2a = a
2! 2! dx 2 x =ξ 2
49
UNIT 1 PROBABILITY DISTRIBUTIONS
Structure Page Nos.
1.0 Introduction 5
1.1 Objectives
1.2 Random Variables
1.3 Discrete Random Variable
1.2.1 Binomial Distribution
1.2.2 Poisson Distribution
1.4 Continuous Random Variable
1.4.1 Uniform Random Variable
1.4.2 Exponential Random Variable
1.4.3 Normal Distribution
1.4.4 Chi-square Distribution
1.0 INTRODUCTION
The discipline of statistics deals with the collection, analysis and interpretation of data.
Outcomes may vary even when measurements are taken under conditions that appear to
be the same. Variation is a fact of life. Proper statistical methods can help us understand
the inherent variability in the collected data, and facilitate the appropriate analysis of the
same.
Because of this variability, uncertainty is a part of many of our decisions. In medical

research, for example, interest may center on the effectiveness of a new vaccine for
AIDS; an agronomist may want to decide if an increase in yield can be attributed to a new
strain of wheat; a meteorologist may be interested in predicting whether it is going to rain
on a particular day; an environmentalist may be interested in testing whether new controls
can lead to a decrease in the pollution level; an economist’s interest may lie in estimating
the unemployment rate, etc. Statistics, and probabilistic foundations on which statistical
methods are based, can provide the models that may be used to make decisions in these
and many other situations involving uncertainties.
Any realistic model of a real world phenomenon must take into account the
possibilities of randomness. That is, more often that not, the quantities we are interested in
will not be predicted in advance, but rather will exhibit as inherent variation that should
be taken into account by the model. Such a model is, naturally enough, referred to as a
probability model.
In this unit we shall see what is a random variable and how it is defined for a
particular random experiment. We shall see that there are two major types of probability
distribution. We shall investigate their properties and study the different applications.
1
1.1 OBJECTIVES
After reading this unit, you should be able to
• describe events and sample spaces associated with an experiment;

• define a random variable associated with an experiment;
• decide whether a random variable is discrete or continuous;
• describe the following distributions
a) Binomial distribution
b) Poisson distribution
c) Uniform distribution
d) Exponential distribution
e) Normal distribution
f) Chi-square distribution.
1.2 RANDOM VARIABLES
Definition A ”Random experiment” or a “Statistical experiment” is any act whose

outcome can not be predicted in advance. Any outcome of a random experiment is known
as “event”
We will start with the following illustrations :

1) The number of telephone calls received by Monica, a telephone operator in a call
center in Delhi, between say 1:00 am and 3:00 am in the early morning.
2) The amount of rainfall in Mumbai on August 1st.
3) The number of misprints on a randomly chosen page of a particular book.
4) The final results of the 5 one-day matches between India-Pakistan.
5) The outcome of rolling die.
6) The volume of sales of a certain item in a given year.
7) Time to failure of a machine.
In all the above cases there is one common feature. These experiments describe the
process of associating a number to an outcome of the experiment (i.e. to an event). A
function which associates a number to each possible outcome of the experiment is called a
“random variable”. It is often the case that our primary interest is in the numerical value
of the random variable rather than the outcome itself. The following examples will help to
make this idea clear.
EXAMPLE 1: Suppose we are interested in the number of heads, say X, obtained in three
tosses of a coin.
SOLUTION
If we toss a coin three times, then the experiment has a total of eight possible outcomes,
and they are as follows;
a1={HHH}, a2={HHT}, a3={HTH}, a4={HTT}

a5={THH}, a6={THT}, a7={TTH}, a8={TTT}
Denoting the event corresponding to getting k heads, k=0,1,2,3,.. as{X=k}, observe that
{X=0} => {a8} ; {X=1} => {a4, a6,a7} ; {X=2} =>{a2, a3,a5} ; {X=1} =>{a1}
2
Note that each value in the support of X corresponds to some element (or set of elements)
in the sample space S. For example the , the value 0 corresponds to the element {a8},
while the value 1 corresponds to the set of elements { a4, a6,a7}
Therefore the sample space S, the set of all possible outcomes of this experiment
can be expressed as
S={a1, a2,…,a8}
Since X is the characteristic, which denotes the number of heads out of the three tosses, it
is associated with each outcome of this experiment. Therefore, X is a function defined on
the elements of S and the possible values of X are {0,1,2,3} . The set of possible values
that X can take is called the support of X which may be denoted as χ . Observe, that X can
be explicitly expressed as follows;
X (a1)=3, X (a2)=X (a3)=X (a5)=2, X (a4)=X (a6)=X (a7)=1, X (a8)=0
It is interesting to observe that to each value there is always some element in sample
space or a set of element in sample spaces. For, example, the set of element in sample
spaces corresponding to the value ‘0’ is the point {a8}; for 1, the set is {a4, a6, a7}, for 2,
the set is {a2, a3, a5} and for 3 the point is {a1}.
Therefore, we can easily make the following identification of events

corresponding to the values associated by X. Denoting the event corresponding to ‘0’, as
{X=0}, similarly for other values, observe that
{X=0}={a8}; {X=1}={a4, a6, a7}; {X=2}={a2, a3, a5} {X=3}={a1}
If we assume that the coin is unbiased and the tosses have been performed independently,
the probabilities of all the outcomes of the sample space are equal, that is
1
P(a1)=P(a2)=…=P(a8) = . Therefore, using the probability law of disjoint events we can
8
easily obtain
1
P ({X=0}) = P ({a8}) =
8
3
P ({X=1}) = P ({a4, a6, a7}) = P ({a4})+P ({a6})+P ({a7}) =
8
3
P ({X=2}) = P ({a2, a3, a5}) = P ({a2})+P ({a3})+P ({a5}) =
8
1
P ({X=3}) = P ({a1}) =
8
Therefore, in this case the random variable X takes four values 0,1, 2,3 with the
probabilities 1/8,3/8,3/8,1/8 respectively. It is also important to observe that
P({X=0})+P({X=1})+P({X=2})+P({X=3})=1
It is not a coincidence, it is always true. If we add the probabilities of all possible values
of a random variable it is always one.
To sum up, we say that the random variable X, is a real valued function defined
on all the elements of a sample space of a random experiment. The random variable takes
different values, and for each value there is a probability associated with it. The sum of all
the probabilities of all the points in the support χ of the random variable X adds up to one.
The following figure will demonstrate the random variable X.
3
HHH HHT HTH HTT X
THH THT TTH TTT {0, 1, 2, 3}
Figure 1: Schematic representation of a random variable
Because of this representation , one can define probabilities for the set of
numbers(depending on the random variable) rather than working with arbitrary space and
this simplifies the problem considerably. Now we consider the following example.
EXAMPLE 2: You have purchased a new battery operated wristwatch and you have
inserted one new battery into it. Suppose you are interested in the following:
a) How long will it be before the first battery needs replacement ?

b) How many battery will have to be replaced during a one year period?
SOLUTION
Note that both (a) and (b) are random variables and let us discuss them one by one. In
case (a), we want to find the duration of time before the battery needs to be replaced. Note
that the variable takes values continuously along a line say from the time duration A to
time duration B. No values in between A and B are left out. In other words there is no
break in the values assumed by this random variable.
In case (b), the random variable is the number of batteries. This variable can take
values 0 or 1 or 2 etc. There is no continuity, since only non-negative integer values can
be assumed. So, the range of this variable is a discrete set of points. From this discussion
it is clear that the random variable X defined in Example 1 is also a discrete random
variable. The above examples shows that the random variables can be of two types. We
will distinguish between the following two types of random variables;
1. Discrete Random Variable , and

2. Continuous Random Variable.
E 1: Suppose you take a 50-question multiple-choice examination, guessing your answer,

and are interested in the number of correct answers obtained. Then
(a) What is the random variable X that you will consider for this situation?
(b) What is the set of possible values of X in this example?
(c) What does P(X=10) mean in this context?
Now in the next two sections we will describe the discrete and continuous random
variables in detail.
1.3 DISCRETE RANDOM VARIABLE

In this section, we define properly a discrete random variable and mention some of its
basic properties.
DEFINITION : A random variable X is said to be discrete, if the total number of values X
can take is finite or countably infinite(i.e the support of X is either finite or countable).
The support χ of X may be listed as {a0, a1, a2,…}. Moreover, for each value of
ai, there is a probability associated with it. Suppose we denote them as {p0, p1, p2,…},
4
therefore, we have P(X= ai)=pi for I=0,1,…. From the properties of a random variable and
from the probability law, we have
(a) pi ≥ 0 for all i ≥ 0
∞
(b) ∑p
i =0
i = p0 + p1 + p2 + .... = 1
From the above discussions, it follows that there exists a function p: χ → R as follows;
p if a = ai ; i = 0,1, 2.....
p(a) =  i
0 otherwise
This function p is called the probability mass function (p.m.f.) of the discrete random
variable X.
The collection of the pairs {(ai, pi;I=0,1,…} is called the probability distribution of X.
Another function which plays a very important role for any random variable is
known as the cumulative distribution function (c.d.f.) or simply the distribution function
of the random variable. The c.d.f. F : R→[0,1] of the random variable X is defined as
F(b) = P(X ≤ b), for - ∞ < b < ∞.
In other words, F(b) denotes the probability that the random variable X takes on a value
which will be less than or equal to b. Some important properties of the c.d.f. F (.) are
(a) F (b) is a non-decreasing function of b.
(b) lim F (b) = 1
b →∞
(c) lim F (b) = 0
b →−∞
Now we clarify these concepts with the same example discussed in the previous
section. Suppose X is the random variable denoting the number of heads obtained in three
independent tosses of a fair coin, then the probability mass function (p.m.f ) p is the
function, p: χ → R, such that
1 3 1
p(0) = , p(1) = p(2) = , p(3) =
8 8 8
Therefore, p(ai) = pi ≥ 0, for all ai and
3
1 3 3 1
∑
i =0
pi = + + + = 1
8 8 8 8
In this case the p.m.f of the random variable by the function p and the corresponding
 1   3   3   1  
probability distribution is the set  0,  ,  1,  ,  2,  ,  3,   .This can also be
 8   8   8   8  
expressed in a tabular form as follows
TABLE 1
PROBABILITY DISTRIBUTION OF THE NUMBER OF HEADS
IN THREE INDEPENDENT TOSSES OF A FAIR COIN
The number of heads (X value) Probability
1
0 8
3
1
8
3
2 8
5
3 1
8
Now let us see the graphical representation of the distribution.
0.50
0.40
Probability 0.30
0.20
0.10
0 1 2 3
Number of heads
Figure 2: Graphical representation of the distribution of .X
Graphically along the horizontal axis, plot the various possible values ai of a random
variable and on each value erect a vertical line with height proportional to the
corresponding probability pi.
Now let us consider the c.d.f of the random variable X. Note that if b<0, clearly
F(b)=P(X ≤ b)=0, because X takes values only {0,1,2,3}. If b=0, that is F(0)=P(X ≤
0)=P(X=0)=1/8. If 0 ≤ b ≤ 1, then P( X ≤ b)=P(X=0)+P(0<X<b)=1/8+0=1/8. Similarly, if
b=1, F(1)= P(X ≤ 1) = P(X=0) + P(X=1)=1/8+3/8=4/8 and so on. Therefore, the c.d.f. F(.)
has the following form;
 0 if b<0
1
 if 0 ≤ b < 1
8
 4
F (b) =  if 1 ≤ b < 2
8
7
 if 2 ≤ b < 3
 8
 1 if b≤3
Note :Mathematical expectation or Expected values or Expectations forms
the fundamental idea in the study of probability distribution of any discrete
random variable X , the expected value (or mean) , denoted as E(X) is defined as
E(X) = x0p0+ x1p1+ x2p2+…………..=Σxipi
Where x0, x1, x2 etc are the values assumed by X and p0, p1, p2 etc are probabilities
of these values. Under special conditions (like all probabilities are equal )then
E(X) = mean of x0, x1 ,x2……
Similarly for continuous variables X having density function p(x) where P[X=x] =
p(x) , the Expectation E(X) will be given by integral of xip(xi) w.r.t x
This concept of Expectation also contributes to the definition of Moment
Generating Function of X i.e Mx(t)= E(etx)
6
Example 3 A box contains twice as many red marbles as green marbles. One marble is
drawn at random from the box and is replaced ; then a second marble is drawn at random
from the box.If both marbles are green you win Rs. 50 ; if both marbles are red you loose
Rs. 10 ; and if the y are of different colour then neither you loose nor you win. Determine
the probability distribution for the amount you win or loose?
Solution Say X denote the amount you win (+) or loose (-) ; i.e X= +50 or -10
The probability that both marbles are green is 1/9 i.e. P[X=+50] =1/9
The probability that both marbles are red is 4/9 i.e. P[X=-10] =4/9
The probability that marbles are of different colours is 4/9 i.e. P[X=0] =4/9
Thus the probability distribution is given by following table
Amount(in Rs won(+) or lost(-)) Probability

+50 1/9
0 4/9
-10 4/9
E1: Which of the variables given below are discrete? Give reasons for your answer.
(a) The daily measurement of snowfall at Shimla.

(b) The number of industrial accidents in each month in West Bengal.
(c) The number of defective goods in a shipment of goods from a manufacturer.
1.3.1 Binomial Distribution
One very important discrete random variable (or discrete distribution) is the binomial
distribution. In this subsection, we shall discuss this random variable and its probability
distribution.
Quite often we have to deal with the experiments where there are only two
possible outcomes. For example, when a coin is tossed either a head or a tail will comes
up, a newborn is either a girl or a boy, a seed either germinates or fails to germinate. Let
us consider such an experiment. For example consider the same experiment of tossing a
coin independently three times. Note that the coin need not necessarily be a fair one, that
is P(Head) may not be equal at P(Tail)
This particular experiment has a certain characteristic. First of all it involves

repetition of three identical experiments (trials). Each trial has only two possible
outcomes: a Head or a Tail. We refer to the outcome ‘Head’ as success and the outcome
‘Tail’ as failure. All trials are independent of each other. We also know that the
probability of getting a ‘Head’ in a trial is p and probability of getting a ‘tail’ in a trial is 1
– p, that is
P(Head) = P(success) = p and P(Tail) = P(failure) = q = 1 - p
This shows that the probability of getting a ‘success’ or a ‘failure’ does not change from
one trial to another. If X denotes the total number of ‘Heads’, obtained in three trials, then
7
X is a random variable, which takes values {0,1,2,3}. Then regarding the above
experiment, we have observed the following;
(1) It involves a repetition of n identical trials (Here n=3).

(2) The trials are independent of each other.
(3) Each trial has two possible outcomes.
(4) The probability of success (p) and the probability of failure’ (q=1-p) remain
constant and do not change from trial to trial.
Now let us try to compute the probabilities P{X = 0}, P{X = 1}, P{X = 2} and P{X =
3} in this case. Note that
P(X=0) = P(getting tails in all three trials)
= P({TTT}) = (1-p)3 = q3.
Similarly,
P(X = 1) = P(getting one Tail and two Heads in three trials)
= P({THH,HTH,HHT}) = P({THH}) + P({HTH}) + ({HHT})
= (1-p)2p+(1-p)2p +(1-p)2p = 3(1-p)2p = 3q2p.
Similarly,
P(X = 2) = (getting two Tails and one Head in three trials)
= P({HTT,THT,TTH}) = P({HTT}) + P({THT}) + ({TTH})
= (1-p)p2 + (1-p)p2 + (1-p)p2 = 3(1-p)p2=3q2p
Finally
P(X=3) = P(getting Heads in three trials)
= P({HHH}) = p3
Now observe that instead of n=3, in the above example we can easily compute the
probability for any general n. Suppose we compute P(X = r), for 0 ≤ r ≤ n, then note that
P(X = r) = C(n,r)pr(1-p)n-r = C(n,r)prqn-r,

Where C(n,r) denotes the number of ways n places can be filled with r Heads and n-r
Tails. From your school mathematics, recall that it is the number of combination of n
objects taken r at a time and it can be calculated by the following formula:
n!
C ( n, r ) =
r !( n − r )!
Therefore, for r = 0,1,…,n,
n!
P( X = r ) = p r q n−r ,
r !(n − r )!
where
n = the number of trial made
r = the number of success
p = the probability of success in a trial
q = 1-p = the probability of a failure.
Now we define the binomial distribution formally.
Let X represents the number of successes in the set of n independent identical trials. Then
X is a discrete random variable taking values 0,1,…,n. The probability of the event
P(X= r) is given by
8
n!
P( X = r ) = p r q n − r , r=0,1,2,…..,n
r !(n − r )!
where n, r, p, q are same as defined before. Such a random variable X is called a binomial
random variable and its probability distribution is called the binomial distribution. A
Binomial distribution has two parameters n and p
E1 : A farmer buys a quantity of cabbage seeds from a company that claims that
approximately 90% of the seeds will germinate if planted properly. If four seeds are
planted, what is the probability that exactly two will germinate?
1.3.2 Poisson Distribution
In this subsection we will introduce another discrete distribution called ‘Poisson

Distribution’. First we shall describe the different situations where we can apply this
Poisson Distribution.
Suppose it is the first hour at a bank in a busy Monday morning , and we are
interested in the number of customers who might arrive during that hour, or during a 5-
minute or a 10-minute interval in that hour. In statistical terms , we want to find the
probabilities for the number of arrivals in a time interval.
To find this probability, we are making some assumptions similar to the binomial
distribution.
(a) The average arrival rate at any time, remains the same over the entire first hour.
(b) The number of arrivals in a time interval does not depend on what has happened
in the previous time intervals.
(c) It is extremely unlikely that more than one customer will arrive at the same time.
Under those above assumptions, we can find the required the probabilities. Suppose X
is the random variable denoting the number of customers that arrive in the first hour, then
e −λ λ i
P( X = i) = , i = 0,1,2,3,...
i!
Where λ (the Greek letter Lambda) denotes the average arrival rate per hour. For
example, suppose we know that average number of customers that arrive in that bank
during the first hour is 60 and we want to find what is the chance there will be no more
than 3 customers in the first 10 minutes. Since we know that the average arrival rate per
hour is 60 , if we denote λ to be the average arrival rate per 10 minutes, then
60 × 10
λ= =10. Therefore, we can use the above formula and get
60
e −1010i
P( X = i) = , i = 0,1, 2,3.....
i!
But we want to find the probability that no more than 3 customers will be there in the first
ten minutes and that is
P ( X ≤ 3) = P ( X = 0) + P( X = 1) + P( X = 2) + P( X = 3) (1)
e −10 10 2 e −10 10 3
= e −10 + e −10 10 + + (2)
2! 3!
9
≈ 0.00005 +0.00050 + 0.00226 +0.00757 = 0.01038 (3)
What does this value 0.01038 indicate? It tells us that if the arrival rates are uniform then
there is only 1% chance that less than three customers will be there, or in other words,
there is a 99% chance that there will be more then 3 customers in the first 10 minutes.
Similarly if we want to find out the chance that there will be no more than 3
customers in the first 5 minutes, then similarly, as above we can see that in this case
60 × 5
λ= = 5.
60
Therefore, if Y denotes the number of customers presents in the first 5 minutes, then
P (Y ≤ 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) (4)
e −5 52 e −5 53
= e −5 + 5e−5 + + (5)
2! 3!
≈ 0.00674 +0.03369 +0.08422 +0.14037 = 0.26502 (6)
From the above two examples it is clear that if we change the time unit (and hence the
value of λ), the probabilities will change. The probability mass function (p.m.f ) given by
e −λ λ i
p (i ) = P ( X = i ) = , i = 0,1, 2,3, K
i!
represents the Poisson probability distribution. From the series expansion of eλ , it easily
follows that
∞ ∞
e−λ λ i
∑
i =0
P( X = i) = ∑
i =0 i!
=1
as it should be.
One point that should always be kept in mind is that a random variable denoting
the number of occurrences in an interval of time will follow a Poisson distribution, if the
occurrences have the following characteristics:
(a) The average occurrence rate per unit time is constant.

(b) Occurrence in an interval is independent of what has happened previously.
(c) The chance that more than one occurrence will happen at the same time is
negligible.
Now let us look at some situations where we can apply the Poisson distribution. Here is
an example
EXAMPLE 4: Calls at a particular call center occur at an average rate of 8 calls per 10
minutes. Suppose that the operator leaves his position for a 5 minute coffee break. What is
the chance that exactly one call comes in while the operator is away?
Solution: In this case the conditions (a), (b) and (c) are satisfied. Therefore if X denotes
the number of calls during a 5 minute interval, then X is a Poisson random variable with
8× 5
λ= = 4. Therefore,
10
e −4 41
P ( X = 1) = = 4e −4 ≈ 0.073
1!
10
That means the chance is 7.3% that the operator misses exactly one call.
Check your progress 4

E 1: If a bank receives on an average λ = 6 bad Cheques per day, what is the probability
that it will receive 4 bad checks on any given day
1.4 CONTINUOUS RANDOM VARIABLE
So far we have discussed about the discrete random variables in details and we have
provided two important discrete distributions namely binomial and Poisson distributions.
Now in this section we will be discussing another type of random variables namely
continuous random variables.
Let us look at the part (a) of Example 2. Note that we want to find the time of
occurrence rather than the number of occurrences. Therefore if the random variable X
denotes the time of occurrence of a particular event, then the random variable X can take
any value on the positive real line or may be any value on a fixed interval say (A,B).
Therefore, the random variable can take uncountably many values. This type of a
random variable which can take uncountably many values is called a continuous random
variable. For a continuous random variable X, the probability that X takes a particular
value is always zero, but we can always specify the probability of X of any interval
through a probability density function (p.d.f.). The exact details are given below.
DEFINITION: Let X be a continuous random variable which takes values in the interval
(A,B). A real valued function f(x): R→ R is called the p.d.f of X, if
(a) f (x) ≥ 0 and f (x) = 0, if x < A or x > B.
B
(b) ∫ A
f ( x)dx = 1
d
(c) P (c < X < d) = ∫
c
f ( x)dx
Now we shall see how we can use the graph of the p.d.f. of a continuous random variable
to study real life problems
Example 5: Suppose the Director of a training program wants to conduct a programme to

upgrade the supervisory skills of the production line supervisors. Because the programme
is self-administered, supervisors require different number of hours to complete the
programme. Based on a past study, it is known that the following p.d.f. shows the
distribution of time spent by a candidate to complete the program. From the graph it is
clear that the average time
11
250
Figure 3: The p.d.f. of the time spent by a candidate to complete the program
Spent by the candidate is 250 and it is symmetrically distributed around 250. How can the
Director use this graph to find the following. What is the chance that a participant selected
at random will require
(a) more than 250 hours to complete the program
(b) less than 250 hours to complete the program
SOLUTION:Since the graph is symmetric, therefore, it is clear that area under the curve
above 250 is half. Therefore, the probability that the random variable takes values higher
1 1
than 250 is . Similarly, the random variable takes value lower than 250 is also .
2 2
Please try the following exercise now:
Now in the following subsections we will consider different continuous distributions.
1.4.1 The Uniform Random Variable
The uniform distribution is the simplest of a few well-known continuous distributions,

which occur quite often. It can be explained intuitively very easily. Suppose X is a
continuous random variable such that if we take any subinterval of the sample space, then
the probability that X belongs to this subinterval is the same as the probability that X
belongs to any other subintervals of the same length. The distribution corresponding to
this random variable is called a uniform distribution and this random variable is called a
uniform random variable.
Formally, we define the uniform random variable as follows: The random

variable X is a uniform random variable between (A, B), if its p.d.f. is given by
 1
 for A< x< B
f ( x) =  B − A
 0 otherwise
From the p.d.f. it is clear that if A < a1 < b1 < B, A < a2 < b2 <B and b1-a1 = b2-a2, then
12
P(a1 < X < b1) = P(a2 < X < b2 ). Therefore, if the length of the intervals are same then the
corresponding probabilities will be also equal. Let us see some examples of such random
variables:
EXAMPLE 6: A train is likely to arrive at a station at any time uniformly between 6:15
am and 6:30 am. Suppose X denotes the time the train reaches, measured in minutes,
after 6:00 am.
SOLUTION In this case X is a uniform random variable takes value between (15,30).
Note that in this P(20 < X < 25) is same P(18 < x < 23) and that is equal to
25 − 30 23 − 18 1
= =
30 − 15 30 − 15 3
E 1 An office fire drill is scheduled for a particular day, and the fire alarm is likely to ring
uniformly at any time between 10:00 am to 1:00 pm.
1.4.2 Exponential Random Variable
In making a mathematical model for a real world phenomenon, it is always necessary to

make certain simplifying assumptions so as to render the mathematical tractability. On the
other hand, however, we can not make too many simplifying assumptions, for then our
conclusions obtained from the mathematical model, would not be applicable to the real
world problem. Thus in short, we must take enough simplifying assumptions to enable us
to handle the mathematics but not so many that the mathematical model no longer
resembles the real world problem. One simplifying assumption that is often made is to
assume that certain random variables are exponentially distributed. The reason for this is
that the exponential distribution is both easy to work and is often a good approximation to
the actual distribution.
We use exponential distribution to model lifetime data that is the data, which are mainly
non-negative. Although, with proper modifications, it can be used to analyze any type of
data (not necessarily non-negative only). The property of the exponential distribution,
which makes it easy to analyze, is that it does not deteriorate with time. By this we mean
that if the lifetime of an item is exponentially distributed, then an item which has been in
use for say ten hours, is as good as a new item in regards to the amount of time remaining
until the item fails.
Now we define the exponential distribution formally: A continuous random

variable X is said to be an exponential random variable if the p.d.f of X is given for some
X>0, by
f(x) = λe - λx for x>0

0 otherwise
Here λ is known as the rate constant. It can be shown mathematically that the average
1
value or the mean values of X is .Shapes of f (x) for different values of λ are provided
λ
in the figure below. From the figure,
13
3
λ=3
2.5
2 λ=2
1.5 λ=1
1 λ=0 5
0.5
0
0 1 2 3 4 5 6
Figure 4: The p.d.f. of the exponential distribution for different values of λ.
It is clear that f(x) is a decreasing function for all values of λ and f(x) tends to 0 as
x tends to ∞. Now consider the following example.
EXAMPLE 7: suppose that the amount of time one spends in a bank to withdraw cash
from an evening counter is exponentially distributed with mean ten minutes, that is λ =
1/10. What is the probability that the customer will spend more than fifteen minutes in the
counter?
SOLUTION: If X represents the amount of time that the customer spend in the
counter than we need to find P(X>15). Therefore,
∞ 3
−
∫λ e
− λ x −15 λ
P (X > 15) = =e =e 2
≈ 0.223
15
P(X>15)= .223 represents that there is a 22.3 % chance that the customer has to
wait more than 15 minutes.
1.4.3 Normal Distribution
Normal distribution is undoubtedly the most used continuous distribution in different

areas like astronomy, biology, psychology and of course in probability and statistics also.
Because of its practical as well as theoretical importance it has received considerable
attentions in different fields. The normal distribution has a unique position in probability
theory, and it can be used as an approximation to other distributions. In practice ‘normal
theory’ can frequently be applied with small risk of serious error, when substantially non-
normal distributions correspond more closely to the observed value. The work of Gauss in
1809 and 1816 established techniques based on normal distribution, which became
standard methods used during the nineteenth century. Because of Gauss’s enormous
contribution, it is popularly known as Gaussian distribution also.
14
We will now state the normal distribution formally: The random variable X is said
to be normally distributed with parameters µ and σ , if the p.d.f f(x) of X is given by
( x− µ )2
1 −
f ( x) = e 2σ 2
, where - ∞ < x < ∞
2π σ
Here µ is a real number lying between -∞ and ∞ and σ is a real number lying
between 0 and ∞.
The function f(x) may look a bit strange, but do not get bother. Just notice the
following important things. Note that it involves two parameters µ and σ , that means
corresponding to each µ and σ we get a distribution function. More over it can be seen
that for -∞ < µ < ∞ and 0 < σ < ∞, the function f(x) is symmetric about µ and is a ‘bell
shaped’ one. Both µ and σ have nice interpretation. It can be easily checked that µ is
the average value or mean of the distribution and σ provides the measure of spread. The
p.d.f. of two different normal distributions are provided below.
0.8
0.7
(µ=1,σ=0.5)
0.6
0.5
0.4
0.3
(µ=0,σ=1)
0.2
0
0.1
-4 –2 0 2 4
Figure 5: The p.d.f of the normal distribution for two different values of ( µ, σ).
It is clear from the above figure that the p.d.f. is symmetric about µ and the shape
depends on σ . The spread of the distribution is more if σ is large.
Now let us find the P(a < X < b) for any a and b, when X follows normal
distribution with parameters µ and σ , note that,
( x−µ )2 b− µ z2
b 1 − 1 −
P(a < X < b) = ∫a
2πσ
e 2σ 2
dx = ∫a −σµ
σ 2π
e 2
dz.
15
x−µ
The last equality follows by simply making the transformation z = . Therefore it
σ
follows
a−µ b−µ
P(a < X < b) = P ( < Z< ),
σ σ
Where Z follows normal distribution with parameters 0 and 1. Although the probability
can not be calculated in a compact form, extensive tables are available for P (Z < z) for
different values of z. The table values can be used to compute P (a < X < b) for any µ
and σ .
Say we denote F(a) = P[Z<= a], the probability thet the standard normal variable Z takes
values less than or equal to ‘a’.The values of F for different values of a are calculated and
listed in table . One such table is given in the end of this unit
Note that the entries in the table are values of z for z=0.00 0.01,0.02 , ..0.09.To find the
probability that a random variable having standard normal distribution will take on a
value between a and b, we use the equation
P[a < Z < b] = F(b) – F(a)

And if either a or b is negative then we can make use of the identity
F(-z) = 1 - F(z)
EXAMPLE 8 Use the table to find the following probabilities

(a) P[0.87 < Z < 1.28]
(b) P[-0.34 < Z < 0.62]
(c) P[Z >= 0.85]
(d) P[Z >= -0.65]
SOLUTION
a)P[0.87 < Z < 1.28] : Find F(1.28 from the table). In the table in the row for Z=1.2 find
the value under column 0.08 it will be 0.8997 . Similarly find F(0.87) =0.8078
so, P[0.87 < Z < 1.28] = 0.8997 – 0.8078 = 0.0919
b) Similarly P[-0.34 < Z < 0.62] = F(0.62) – F(0.34) = F(0.62) – [ 1- F(0.34)]

= 0.7324 – (1 - 0.6331) = 0.3655
c) Similarly calculate P[Z > 0.85] = 1- P[Z <= 0.85] = 1-F(0.85) = 0.1977
d)P[Z > -0.65] = 1 – P [ Z <= - 0.65]

= 1 – F (-0.65)
= 1- F(1-F(0.65))
= 0.7422
Next we shall see that how to use the standard normal probability table to
calculate probability of any normal distribution
Standardising
Any normal random variable X, which has mean Φ and variance σ2 can be
standardized as follows.
Take a variable X, and
i) subtract its mean (m or Φ) and then,
ii) divide by its standard deviation(s or σ).
16
We will call the result, Z, so
X −µ
Z=
σ
For example, suppose, as earlier, that X is an individual’s IQ score and that it has a
normal distribution with mean Φ = 100 and standard deviation σ = 15. To
standardize and individuals IQ score, X, we subtract Φ = 100 and divide the result
by σ = 15 to give,
X − 100
Z=
15
In this way every value of X, has a corresponding value of Z. For instance, when
130 − 100 90 − 100
X = 130, Z = = 2 and when X = 90, Z = = −0.67 .
15 15
The distribution of standardized normal random variables

The reason for standardizing a normal random variable in this way is that a
standardized normal random variable
X −µ
Z=
σ
has a standard normal distribution.
That is, Z is N(0,1). So, if we take any normal random variable, subtract its mean
and then divide by its standard deviation, the resulting random variable will have
standard normal distribution. We are going to use this fact to calculate (non-
standard) normal probabilities.
Calculating probabilities
With reference to the problem of IQ score, suppose we want tot find the
probability that an individual’s IQ score is less than 85, i.e. P[X<85]. The
corresponding area under the pdf N(100,152) is shown in Figure 6 below.
2.5
f(x)
1.5
f(x) P(X<85)
1
0.5
70 85 100 115
x F
Figure 6: area under the pdf N(100,152)
Figure 5: Density function f
(x)
We cannot use normal tables directly because these give N(0,1) probabilities.
Instead, we will convert the statement X<85 into an equivalent statement which
17
X − 100
involves the standardized score, Z = because we know it has a standard
15
normal distribution.
We start with X=85. To turn X into Z we must standardize the X, but to ensure
that we preserve the meaning of the statement we must treat the other side of the
inequality in exactly the same way. (Otherwise we will end up calculating the
probability of another statement, not X<85). ‘Standardising’ both sides gives,
X − 100 85 − 100
< .
15 15
The left hand side is now a standard normal random variable and so we can call it
Z, and we have,
85 − 100
Z<
15
which is
Z < – 1.
So, we have established that the statement we started with, X < 85 is equivalent to
Z < – 1. This means that whenever an IQ score, X is less than 85 the
corresponding standardized score, Z will be less than – 1 and so the probability we
are seeking, P[X<85] is the same P[Z < – 1].
P[Z < – 1] is just a standard normal probability and so we can look it up in Table
1 in the usual way, which gives 0.1587. We get that P[X < 85] = 0.1587.
This process of rewriting a probability statement about X, in terms of Z, is not
difficult if you are systematically writing down what you are doing at each stage.
We would lay out the working we have just done for P[X < 85] as follows.
X has a normal distribution with mean 100 and standard deviation 15. Let us find
the probability that X is less than 85.
 X − 100 85 − 100 
P[ X < 85] = P  <
 15 15 
= P[Z – 1] = 0.1587
Let us do some problems now.
Example 9: For each of these write down the equivalent standard normal
probability.
a) The number of people who visit a historical monument in a week is
normally distributed with a mean of 10,500 and a standard deviation of
600. Consider the probability that fewer than 9000 people visit in a week.
b) The number of cheques processed by a bank each day is normally
distributed with a mean of 30,100 and a standard deviation of 2450.
Consider the probability that the bank processes more than 32,000 cheques
in a day.
Solution: Here, we want to find the standard normal probability corresponding to

the probability P[X < 9000].
18
 X − 10500 9000 − 10500 
a) We have P[ X < 9000] = P  <  = P[ Z < −2.5] .
 600 600
b) Here, we want to find the standard normal probability corresponding to the
probability P[X > 32000].
 X − 30100 32000 − 30100 
P[ X < 32000] = P  <  = P[ Z < −0.78]
 2450 2450
Note: Probabilities like P[a < X < b] can be calculated in the same way. The only
difference is that when X is standardized, similar operations must be applied to
both a and b. that is, a < X < b becomes,
a−µ X −µ b−µ
< <
σ σ σ
which is
a−µ b−µ
<Z<
σ σ
Example 10: An individual’s IQ score has a N(100, 152) distribution. Find the
probability that an individual’s IQ score is between 91 and 121.
Solution: We require P[91 < X < 121]. Standardising gives

 91 − 100 X − 100 121 − 100 
P < <
 15 15 15 
The middle term is standardized normal random variable and so we have,
 −9 21 
P  < Z <  = P[−0.6 < Z < 1.4] = 0.9192 − 0.2743 = 0.6449 .
 15 15 

E1 If a random variable has the standard normal distribution , find the probability that it
will take on a value
a) Less than 1.50
b) Less than –1.20
c) Greater than –1.75
E2 A filling machine is set to pour 952 ml of oil into bottles. The amount to fill are
normally distributed with a mean of 952 ml and a standard deviation of 4 ml. Use the
standard normal table to find the probability that the bottle contains oil between 952 and
956 ml ?
1.4.4 Chi-Square Distribution

In the last subsection we have discussed normal distribution. The chi-square distribution
can be obtained from the normal distribution as follows. Suppose Z1,…,Zn are n
independent identically distributed normal random variables with parameters 0 and 1,
then Z12 +…+ Zn2 is said to have chi-square distribution with n degrees of freedom. The
degrees of freedom here basically indicates the number of independent components which
constitute the chi-square distribution. It has received several attention because of its
appearance in the constructing analysis of variable tables, contingency tables and for
obtaining the critical values of different testing procedure. Now we formally provide the
p.d.f of a chi-square random variable with n degrees of freedom.
19
If the random variable X has chi-square distribution with n-degrees of freedom,
then the p.d.f. of X is f(x)
1
e − x / 2 x ( n / 2) −1 if x >0
f(x) = 2 n/2
Γ(n / 2)
0 otherwise
here Γ (.) is a gamma function and it is defined as
∞
Γ( a ) = ∫ x a −1e − x dx
0
Although, the p.d.f of chi-square random variable is not a very nice looking one, do not
bother about that. Keep in mind that the shapes of density functions are always skewed. In
this case also if we want to compute P(a < X < b) for any a, b and n, explicitly it is not
possible to compute. Numerical integration is needed to compute this probability. Even
for chi-square distribution extensive tables of P(a < X < b) are available for different
values of a, b and n.
Note : We have a standard table corresponding to Chi- Square Distribution, many times
you may need to refer the values from the table. So the same is given at the end , method
of usage is similar to that discussed in Normal distribution.
EXAMPLE 9 Show that the moment generating function of a random variable X

which is chi-square distributed with v degrees of freedom is M(t) = (1 – 2t) –v/2.
∞
1
SOLUTION M (t ) = E (etx ) =
2v / 2 Γ( v/2) 0 ∫
etx x ( v − 2) / 2 e− x / 2 dx
∞
1
= v/2
2 Γ( v/2) 0 ∫
x ( v − 2) / 2 e − x (1− 2t ) x / 2 dx
Letting (1 − 2t ) x / 2 = u in the last integral we find

∞ ( v − 2) / 2
1  2u  2du
M (t ) = v / 2
2 Γ( v/2) ∫
0
 
 1 − 2t 
e−u
1 − 2t
(1 − 2t ) − v / 2 ∞
=
Γ(v / 2) ∫ 0
u ( v / 2) −1e −u du = (1 − 2t ) − v / 2
E1 Let X1 and X2 be independent random variables, which are chi-square

distributed with v1 and v2 degrees of freedom respectively. Show that the moment
generating function of Z = X1 + X2 is (1 − 2t ) − v ( v + v ) / 2 1 2
E2 Find the values of x2 for which the area of the right-hand tail of the x2
distribution is 0.05, if the number of degrees of freedom v is equal to (a) 15, (b)
21, (c) 50.
20
1.5 SUMMARY
In this unit we have covered following points:
a) A random variable is a variable that takes different values according to the chance
outcome
b) Types of random variables: Discrete and Continuous
c) Probability distribution gives the probabilities with which the random variables
takes various values in their range
d) Discussed probability distributions :
a. Binomial Distribution : The probability of an event P[X=r] in this

distribution is given by
P(X = r) = C(n,r)pr(1-p)n-r = C(n,r)prqn-r,
b. Poisson Distribution : The probability of an event P[X=i] in this

distribution is given by
e −λ λ i
P( X = i) = , i = 0,1,2,3,...
i!
c. Uniform Distribution : The probability density function is defined by
 1
 for A< x< B
f ( x) =  B − A
 0 otherwise
d. Normal Distribution : The probability for this distribution is determined

by calculating the area under the curve of probability density function
defined by
( x− µ )2
1 −
f ( x) = e 2σ 2
, where - ∞ < x < ∞
2π σ
e. Chi-Square Distribution : If the random variable X has chi-square

distribution with n-degrees of freedom, then the probability density
function of X is given by
1
e − x / 2 x ( n / 2 ) −1 if x >0
f(x) = 2 n/2
Γ(n / 2)
0 otherwise
here Γ (.) is a gamma function and it is defined as
∞
Γ( a ) = ∫ x a −1e − x dx
0
21
f. Mathematical expectation or Expected values or Expectations
E(X) is defined as E(X) = x0p0+ x1p1+ x2p2+…………..=Σxipi
when all probabilities are equal then E(X) = mean of x0, x1 ,x2……
Similarly for continuous variables X having density function p(x)
where P[X=x] = p(x) , the Expectation E(X) will be given by
integral of xip(xi) w.r.t x .
This concept of Expectation also contributes to the definition of
Moment Generating Function of X i.e Mx(t)= E(etx)
1.6 SOLUTIONS

E1 a) If X denotes the number of correct answers, then X is the random
variable for this situation
b) X can take the values 0,1,2,3…..up to 50
c) P[X=10] means the probability that the number of correct answers is 10

E1 Case (a) is not discrete where as case (b) and (c) are discrete because in
case (a) we are taking values in an interval but in case(b) the number of
accident is finite , similarly you argue for case (c)

E1 This situation follows the binomial distribution with n=4 and p=90/100=9/10
The random variable X is the number of seeds that germinate. We have to
calculate the probability that exactly two of the four seedswill germinate. That is
P[X=2]. By applying Binomial formula , we get
P[X=2] = (4C2) * (9/10)2 *(1/10)2
= 6 * (81/100) * (1/100) = 486/10000 = 0.0486
So , the required probability is 0.0486

E1 Here we are dealing with the problem related to the receipt of bad Cheques,
which is an event with rare occurrence over an interval of time ( which is a day
In this case). So, we can apply Poisson distribution
Average bad Cheques received per day = 6

Thus by substituting λ = 6 and x=4 in Poisson formula we get
P[X=4] = (64 e-6)/4! = 0.135

E1 Suppose X denotes the time the fire alarm starts measured in minutes after 10:00 am.
Then clearly X is a uniform random variable between (0,180). If we want to find the
probability that fire alarm will ring before noon, then
(12 − 10) × 60 2
P (X ≤ 12 : 00 noon) = = .
180 3
22
E1 a) 0.9332
b) 0.1151
c) 0.9599
E2 The standard normal probability corresponding to this probability is given by
P[952 < Z < 956] = P[((952-952)/4) < ((X-952)/4) < ((952-956)/4)]

= P [0 < Z < 1]
= F(1) - F(0)
= 0.8413 – 0.5 = 0.343
E1 The moment generating function of Z = X1 + X2 is

M (t ) = E et ( X + X )  = E (etX ) E (etX ) = (1 − 2t ) − v / 2 (1 − 2t )− v
1 2 1 2 1 2 /2
= (1 − 2t ) − ( v + v ) / 2
1 2
using Example 9.
E2 Using the table in for Chi Square distribution we find in the column headed x2.95 the
values: (a) 25.0 corresponding to v = 15; (b) 32.7 corresponding to v = 21; (c) 67.5
corresponding to v = 50.
23
Table 1: The distribution function of standard normal random variable
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
Table 2: The critical values of chi-square distribution. The areas given across the top are
the areas to the right of the critical value. To look up an area on the left, subtract it from
one, and then look it up (i.e., 0.05 on the left is 0.95 on the right).
df 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
1 - - 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.597 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.19 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.17 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.341 46.928
UNIT 2 PSEUDO RANDOM NUMBER GENERATION
2.0 Introduction
2.1 Objectives
2.2 Uniform random number generator
2.3 Generating random variates from
arbitrary distribution
2.4 Inverse Transform
2.5 Acceptance – rejection method
2.6 Summary
2.7 Solutions
2.0 INTRODUCTION
A pseudo-random number generation is the methodology to develop algorithms and programs that can be used in, probability and
statistics applications when large quantities of random digits are needed. Most of these programs produce endless strings of single-
digit numbers, usually in base 10, known as the decimal system. When large samples of pseudo-random numbers are taken, each of
the 10 digits in the set {0,1,2,3,4,5,6,7,8,9} occurs with equal frequency, even though they are not evenly distributed in the sequence.
Many algorithms have been developed in an attempt to produce truly random sequences of numbers, endless strings of digits in which
it is theoretically impossible to predict the next digit in the sequence based on the digits up to a given point. But the very existence of
the algorithm, no matter how sophisticated, means that the next digit can be predicted! This has given rise to the term pseudo-random
for such machine-generated strings of digits. They are equivalent to random-number sequences for most applications, but they are not
truly random according to the rigorous definition.
A simulation that has any random aspects at all, must involve sampling or generating random variables from different probability
distributions. These distributions are often specified, that is the form of the distribution functions is explicitly known, for example it
can be exponential, gamma, normal or Poisson as discussed in Unit 1.
Random number generation has intrigued scientists for several years and a lot of efforts has been spent on the creation of
randomness on a deterministic (non-random) machine, that is to design computer algorithms that are able to produce ‘random’
sequences of integers. This is not a trivial task. Such algorithms are called generators and all generators have flaws because all of
them construct the n-th number in the sequence as a function of the (n – 1) – th number, initialized with a non-random seed value.
Numerous techniques have been invented over the years that measure just how random a sequence is, and most well known generator
have subjected to rigorous testing. The mathematical tools that are required to design such an algorithm are largely number theoretic
and combinatorial in nature. These tools differ drastically from those needed when we want to generate sequences of integers with
certain non-uniform distributions given that a perfect uniform random number generator is available.
The methodology of generating random numbers has a long and interesting history. The earliest methods were essentially carried out
by hand, such as casting lots, throwing dice, dealing out cards or drawing numbered balls from a well-stirred urn. Many lotteries are
still operating in this way. In the early twentieth century statisticians joined gamblers in generating random numbers and mechanized
devices were built to generate random numbers more quickly. Some time later, electric circuits based on randomly pulsating vacuum
tubes were developed that delivered random digits at rates up to 50 per second. One such random number generator machine the
Electronic Random Number Indicator Equipment (ERNIE) was used by the British General Post Office to pick the winners in the
Premium Savings Bond lottery. Another electronic device was used by the Rand Corporation to generate a table of million random
digits. In India also Indian Statistical Institute has published a book just the collection of million random numbers in the mid twentieth
century which was used for sample survey planning.
As computers and simulation became more widely used, increasing attention was paid to methods of random number
generation compatible with the way computers work. A good uniform between 0 and 1, random generator should possess certain
properties as listed below:
• Above all, the numbers produced should appear to be distributed uniformly on

[0, 1] and should not exhibit any correlation with each other; otherwise, the simulation’s results may be completely invalid.
• From a practical point of view a generator should be fast and avoid the need for a lot of storage.
• We should be able to produce a given stream of random numbers from a given initial (seed) value for at least two reasons. First
this can sometimes make debugging or verification of the computer program easier or we might want to use identical random
numbers in simulating different systems in order to obtain a more precise comparison.
In this unit we will describe how to generate U (0,1) (uniform between 0 and 1, see unit 1 for the actual definition) in a
1
computer. Once we have a U (0, 1) random number, we will use that to generate several other deviates from different discrete and
continuous distributions.
2. 1 OBJECTIVES
After reading this unit, you should know

• how to generate U (0, 1) random number in a computer
• how to generate random deviates from any discrete distribution
• how to generate random numbers from many continuous distributions, like exponential, Weibull, gamma, normal, chi-square etc.
2.2 UNIFORM RANDOM NUMBER GENERATORS
As we had mentioned in the previous section that we need the uniform random number of generator for generating random numbers
from any other distributions. Therefore, it is very important to have a very good uniform random number generator. There are several
methods available to generate uniform random numbers. But currently the most popular one is the linear congruential generator
(LCG). Most of the existing software’s today use this LCGs proposed by Lehmer in the early 50’s. The LCGs provides an algorithm
how to generate uniform random number between (0,1). It can be simply described as follows.
A sequence of integers Z1, Z2,… is defined by the recursive formula
Zi = (aZi – 1 + c) (mod m), (1)
where m, the modulus, a, the multiplier, c, the increment and Z0, the seed or the initial value, are all non-negative integers. Those who
are not familiar with the definition of modules, note that for non-negative integers, x, y and z, x = y (mod z) means x is the remainder
when the integer y is divided the integer z. For example if y = 10 and z = 3, then x = 1, or if y = 10 and z = 2, then x = 0. Therefore,
from (1) it is clear that to obtain Zi, first divide aZi – 1 + c by m and Zi is the corresponding remainder of this division. It is clear that 1
Zi
≤ Zi ≤ m – 1 and to obtain the desired random numbers Ui, for i = 1, 2…. On [0, 1], we let Ui = . The choice of the non-negative
m
integers, a, c and m are the most crucial steps, and they come from the theoretical considerations. In addition to non-negatively, they
also should satisfy 0 < m, a < m and c < m. Moreover, the initial value Z0 < m.
Immediately, two objections could be raised against LCGs. The first objection is one which is common to all random number
generators, namely the Zi’s defined by (1) are not really random. It can be easily seen that for i = 1, 2,….,
⎡ c(a i – 1) ⎤
Zi = ⎢a i Z 0 + ⎥ (mod m),
⎣ a – 1 ⎦
so that every Zi is completely determined by m, a, c and Z0. However, by careful choice of these four parameters the aim is to induce a
behavior in the Zi’s that makes the corresponding Ui’s appear to be independent identically distributed U (0, 1) random variates when
subjected to variety to statistical tests.
1 2 ( m − 1)
The second objection to LCGs might be that the Ui’s can take on only the rational numbers 0, , , …., ; in fact
m m m
the Ui’s might take only a fraction of these values, depending on the specifications of the four parameters. Therefore, there is no
0 .1 0 .9 0 .8
possibility of getting a value of Ui between, say and , whereas this should occur with probability > 0. We will see later
m m m
that the modulus m is usually chosen to be very large, say 109 or more, so that the points in [0, 1] where Ui’s can fall are very dense.
This provides an accurate approximation to the true continuous U (0, 1) distribution, sufficient for most purposes.
Let us consider the following example:
Example 1 : Consider the LCG defined by m = 16, a = 5, c = 3 and Z0 = 7. The following table gives Zi and Ui (up to three decimal
places) for i = 1, …., 19. Note that Z17 = Z1 = 6, Z18 = Z2 = 1 and so on. Therefore, from i = 17, the sequence repeats itself. Naturally
we do not seriously suggest anybody to use this generator. The main reason here m is too small. This we are presenting just for
illustrative purpose.
Table 1
The LCG Zi = (5Z i – 1 + 3) (mod 16) with Z0 = 7
2i Zi Ui i Zi Ui i Zi Ui i Zi Ui
0 7 - 5 10 0.625 10 9 0.563 15 4 0.250
1 6 0.375 6 5 0.313 11 0 0.000 16 7 0.438
2 1 0.063 7 12 0.750 12 3 0.188 17 6 0.375
3 8 0.500 8 15 0.938 13 2 0.125 18 1 0.063
4 11 0.688 9 14 0.875 14 13 0.813 19 8 0.500
Note that the repeating behaviour of LCG is inevitable. By the definition of Zi, whenever it takes on a value it had taken previously,
from that point onward the sequence will repeat itself endlessly. The length of a cycle is called the period of a generator. For LCG, Zi
depends only on the previous value Zi – 1 and since 0 ≤ Zi ≤ m – 1, it is clear that the period is at most m. If it m, the LCG is said to
have full period. Clearly, if a generator is full period, any choice of the initial seed Z0 from {0,…,m-1} will produce the entire cycle in
some order.
Since for large scale simulation projects may require hundreds of thousands of random numbers, it is desirable to have LCGs
with long periods. Moreover, it is desirable to have full period LCGs, because then it is assured to have every integer between 0 and m
– 1 exactly once in every cycle. Thus it is very important to know how to choose a, m and c so that the corresponding LCG will have
full period. We should also keep in mind that obtaining full period is just one desirable property for a good LCG. It should also have
good statistical properties, such as apparent independence, computational and storage efficiency and reproducibility. Reproducibility
is simple. For reproducibility, we must only remember that the initial seed Z0 initiates the generator with this value again to obtain the
same sequence of Ui exactly. Some of the standard LCGs with different values of a, m and c are presented below. These LCG have
been observed to perform very well in several machines and passed the standard statistical tests also.
Generator 1: a = 16807, m = 231 – 1, c = 0.

Generator 2: a = 1664525, m = 232, c = 1664525.
Fortunately today, most of the simulation packages and even simple scientific calculators have reliable U (0 1,) generator available.
E1 What do you mean by Pseudo random number generation? What is the practical advantage of the concept of random number
generation? Do you know any algorithm which works in designing the software for Random number generation?
2.3 GENERATING RANDOM VARIATES FROM ARBITRARY

DISTRIBUTIONS
A simulation that has any random aspects at all must involve generating random variates from different distributions. We usually use
the phrase generating a random variate to refer the activity of obtaining an observation or a realization on a random variable from the
desired distribution. These distributions are often specified as a result of fitting some appropriate distributional form. They are often
specified in advance, for example exponential, gamma or Poisson etc. In this section we assume that the distributional form is already
been specified including the values of the parameters and we address the issue of how we can generate random variate from this
specified distribution.
We will see in this section that the basic ingredient needed for every method of generating random variates from any
distribution is a source of independent identically distributed U (0, 1) random variates. Therefore, it is essential that a statistically
reliable U (0, 1) generator is available for generating random deviate correctly from any other distribution.Therefore, from now on
we assume that we have a reliable sequence of U(0,1) variates available to us.
There are several issues which should be kept in mind before using any particular generator. The most important issue is of
course the exactness. It is important that one should use an algorithm that results in random variates with exactly the desired
distribution, within the unavoidable external limitations of machine accuracy and the exactness of U (0,1) random number generator.
The second important issue is efficiency. Given that we have several choices, we should choose that algorithm which is efficient in
terms of both storage space and execution time. Now we provide some of the most popular and standard techniques to generator non-
uniform random deviates, it may be both discrete or continuous and even mixed distribution also.
2.4 INVERSE TRANSFORM
Suppose we want to generate a random variate X that has a continuous and strictly increasing distribution function F, when 0 < F (x) <
1, i.e., whenever x1 < x2 then F (x1) < F (x2). We draw a curve of F in the Figure 1. Let F–1 denote the inverse of the function F. Then
an algorithm for generating a random variate X having distribution function F is as follow .
ALGORITHM
• Step 1: Generate U1 from U1 (0,1)

• Step 2: Return X1 = F–1 (U1). 3
1
0.9
U1 0.8
0.7
0.6
0.5
F(x) 0.4
0.3
0.2
0.1
-2 -1 0 1 2 3
X1
Figure 1: Distribution function of a continuous a random variable.
Note that when F is a strictly increasing function, F–1 (U) will be always defined, since 0 ≤ U ≤ 1 and the range of F is [0, 1]. Figure 1
illustrates the algorithm graphically. According to the figure it is clear that the uniform random variable U1 results the random
variable X1 with the distribution function F. To show that the value X1 returned by the above algorithm has the desired distribution
function F, note that
P (X1 ≤ x) = P (F–1 (U1 )≤ x) = P (U1 ≤ F (x)) = F (x).
The first equality follows from the definition of X1, the second equality follows because F is invertible and the third equality follows
because U1 follows U (0,1).
EXAMPLE 2: Let X have the Weibull distribution with the following probability density function:
⎧⎪αλe − λxα xα −1 if x>0 -1
f(x) = ⎨ Find F
⎪⎩ 0 if x<0
SOLUTION Here α and λ both are known constants and both of them are strictly greater than 0.
Therefore, X has the distribution function
⎧⎪1 − e − λ xα if x>0
F (x) = ⎨
⎪⎩ 0 if x≤0
Therefore, to compute F–1 (u), let us equate u = F (x) and we solve for x to obtain
1
–1 1 α
F (u) = [ { – In (1 – u)} ]
λ
Therefore to generate X from a Weibull distribution with α = 2 and λ = 1, generate U from U (0, 1) and then set
1
2
X = [{ – In (1 – u)}]
In this case also as before, it is possible to replace U by 1 – U, therefore we can use

1
2
X = [{ – In u}] ,
to avoid one subtraction.
The inverse-transform method can be used also when the random variable X is discrete.
In this case the distribution function is
4
F (x) = P (X ≤ x) = ∑ p( x ),
xi ≤ x
i
Where p(xi) is the probability mass function of X, i.e.,
p (xi) = P (X = xi).
We can assume that X can take values x1, x2,…., where x1 < x2 < …. Then the algorithm is as follows:
ALGORITHM
• Step 1: Generate U from U (0,1)
• Step 2: Determine the smallest positive integer I such that U ≤ F (x1) and return
X = xI . The following figure 2 illustrates the method. In that case we generate X=x4
Now to show that the discrete inverse transform method is valid, we need to show that P (X = xi) = p (xi) for all i = 1, we get X = x1, if
and only if U ≤ F (x1) = p (x1), since xi’s are in the increasing order and U follows U (0,1). For i ≥ 2, the algorithm sets X = xi if and
only if F (xi – 1) < U ≤ F (xi), since the i chosen by the algorithm is the smallest positive integer such that U ≤ F (xi). Further, since U
follows U (0,1) and 0 ≤ F (xi – 1) < F (xi) ≤ 1,
P (X = xi) = P {F (xi – 1 < U ≤ F (xi)} = F (xi) – F (xi – 1) = p (xi).
Now consider the following example.
F(x)
x1 x2 x3 x4 x5
x X
Figure 2: Distribution function of a discrete random variable
EXAMPLE 3: Suppose we want to generate a random sample from the following discrete probability distribution.
1 1 1
P (X = 1) = , P (X = 2) = , P (X = 4) = . Generate a random sample from X?
2 4 4
SOLUTION The distribution function of the random variable X is
⎧0 if x <1
⎪1
⎪⎪ if 1≤ x < 2
2
F (x) = ⎨ 3
⎪ if 2≤ x<4
⎪4
⎪⎩1 if x≥4
The distribution function is X is presented in the Figure 3. If we want to generate a random sample from X, first generate a random
1 1 3
variable U from U (0, 1). If U ≤ , then assign X = 1. If < U ≤ , then assign X = 2, otherwise assign X = 4.
2 2 4
0.8
0.6 5
F(x)
Figure 3: Distribution function of the random variable X of example 3.
E1 Let X have the exponential distribution with mean 1. The distribution function is F (x) =
⎧1 − e − x if x ≥ 0
⎨ Find F-1
⎩ 0 if x < 0
E2 Consider another discrete random variable which may take infinitely many values. Suppose the random variable X has the
following probability mass function.
1
P (X = i) = pi = , i = 1, 2, 3, ……… . Generate a random sample from X?
2i
2. 5 ACCEPTANCE-REJECTION METHOD
In the last subsection we have discussed the inverse transformation method to generate random number from different non-uniform
distributions. Note that apparently, the inverse transformation method seems to be the most general method to generate random
deviate from any distribution function functions. In fact it can be used provided the distribution function can be written in an explicit
form, or more precisely the inverse of the distribution function can be computed analytically. For example, in case of exponential,
Weibull, Cauchy distributions the distribution function and also their inverses can be constructed analytically. Unfortunately that is
not the case in general. Suppose the random variable X follows gamma with the shape and scale parameters as α and λ respectively.
Then the density function of X, say
fx (x | α , λ ), is
⎧ λα α −1 λx
⎪ x e− if x>0
f (x | α , λ ) = ⎨ Γα
⎪ x≤0
⎩ 0 if
x
The distribution function X, say F ( x | α , λ ) = ∫0
f ( y |α , λ )dy can not be expressed in explicit form. Therefore, F–1 (x | α , λ ) also can
not be calculated explicitly. Exactly the same problem arises if X is a normal random variable. Suppose X is a normal random variable
with mean μ and variance σ2, then the probability density function of X, say f (x | μ, σ) is
( x−μ )2
1 −
2σ 2
f (x | μ , σ) = e , for – ∞ < x < ∞.
2πσ
In this case also the distribution function can not be computed analytically and similarly it’s inverse also. Therefore in these cases we
can not apply the inverse transformation method to generate the corresponding random deviates. The acceptance-rejection method can
be used quite effectively to generate these random deviates. It can be described as follows.
Suppose we have a density function f (x) and we want to generate a random deviate from the density function f (x). The
distribution function of f (x) can not be expressed in explicit form. The acceptance-rejection method requires that we specify a
function g (x) that majorizes the function f (x), that is, f (x) ≤ g (x) for all x. Naturally, g (x) will not be a density function always, since
∞ ∞
c= ∫
∞
g ( x)dx ≥ ∫ f ( x) = 1,
∞
1
but the function h (x) = g ( x) is clearly a density function provided c < ∞ . Now for any given f (x), we choose the function g (x),
c
such that c < ∞ and it is possible to generate random deviate from g (x) by a inverse transformation method. Then we can generate the
random deviate X from f (x) as follows:
ALGORITHM
6
• Step 1: Generate Y having density function g (x).
• Step 2: Generate U from U (0,1) which is independent of Y.
f (Y )
• Step 3: If U ≤ , X = Y, otherwise go back to Step 1 and try again.
g (Y )
f (Y )
Note that the algorithm is looping back to Step 1 until we generate a pair (Y, U) pairs in Steps 1 and 2 for which U ≤ , when we
g (Y )
accept the value Y for X. Theoretically it can be shown that the random deviate X generated by the above algorithm has indeed the
probability density function f (x). Since it is not very easy to prove the result we do not provide it.
EXAMPLE 4: Consider the following density function
⎧60 x 3 (1 − x) 2 if 0 ≤ x ≤1
f (x) = ⎨
⎩ 0 otherwise
In this case the distribution function of f (x) is
⎧ 0 if x<0
⎪
F (x) = ⎨10 x 6 + 15 x 4 − 24 x 5 if 0 < x < 1
⎪ 1 if x >1
⎩
1
0.8
0.6
0.4
x)
0.2
0
0 0.2 0.4 0.6 0.8 1
x
Figure 4 : Distribution function of the density function f (x)
2.5
1.5
1
)
0.5
0
7
From the distribution function F (x) is provided in figure 4 .It is clear that the distribution function F (x) is a strictly increasing
function of x in [0, 1]. Therefore F–1 (x) exists, but unfortunately to find F–1 (x) we need to solve a six degree polynomial, which can
not be obtained analytically. We need to solve numerically only. Therefore, we can not generate random deviate from the density
function f (x) using the inversion method. But we will be able to generate random deviate from f (x) using the acceptance-rejection
method.
First let us look at the graph of the density function f (x). It is presented in the Figure 5. From the Figure 5 it is clear that f (x) is an
unimodal function with a unique maximum. The maximum can be easily obtained by the standard differential calculus, that is by
df ( x)
setting = 0. We see that the maximum of f (x) occurs at x = 0.6 and the maximum value at 0.6, that is f (0.6) = 2.0736.
dx
Therefore, if we define
⎧2.0736 if 0 < x <1

g (x) = ⎨
⎩ 0 otherwise.
then clearly f (x) ≤ g (x) for all x. Now to calculate h (x), first we need to calculate c. Since,
1
c= ∫ 2.0735 = 2.0736, therefore,
0
⎧ 2.0736
⎪ = 1 if 0 < x < 1
h (x) = ⎨ c
⎪⎩ 0 otherwise.
It is immediate that h (x) is just the U(0,1) density function. Now the algorithm takes the following form.
- Step 1: Generate Y from U (0, 1)

- Step 2: Generate U from U (0,1) which is independent of Y.
60Y 3 (1 − Y ) 2
- Step 3: If U ≤ then return with X = Y,otherwise go back to Step 1.
2.0736
In this case X has the desired density function f (x).
E1 use acceptance-rejection method to generate a random deviate from gamma density function, . The gamma density function with
the shape parameter α can be written as
⎧ 1 α −1 − x
⎪ x e if x>0
f (x) = ⎨ Γ(α )
⎪
⎩ 0 otherwise
2. 6 SUMMARY
In this unit we have discussed the meaning of pseudo random number generation and along with that we have described
uniform random number generator and arbitrary random number generator. Under Uniform random number generator case
we have emphasized on LCG (Linear Congruential Generator) and some objections related to LCG. Actually LCGs
provides an algorithm how to generate uniform random number between (0,1). It can be simply described as follows.
A sequence of integers Z1, Z2,… is defined by the recursive formula
Zi = (aZi – 1 + c) (mod m), (1)

8
where m, the modulus, a, the multiplier, c, the increment and Z0, the seed or the initial value, are all non-negative integers
Then we have discussed the concept , algorithm and application of Inverse transforms for random number generation .
In brief Suppose we want to generate a random variate X that has a continuous and strictly increasing distribution function
F, when 0 < F (x) < 1, i.e., whenever x1 < x2 then F (x1) < F (x2). Let F–1 denote the inverse of the function F. Then an
algorithm for generating a random variate of X having distribution function F is as follow .
ALGORITHM
• Step 1: Generate U1 from U1 (0,1)
• Step 2: Return X1 = F–1 (U1).
The inverse-transform method can be used also when the random variable X is discrete.
In this case the distribution function is
F (x) = P (X ≤ x) = ∑ p( x ),
xi ≤ x
i
Where p(xi) is the probability mass function of X, i.e.,
p (xi) = P (X = xi).
We can assume that X can take values x1, x2,…., where x1 < x2 < …. Then the algorithm is as follows:
ALGORITHM
• Step 1: Generate U from U (0,1)
• Step 2: Determine the smallest positive integer I such that U ≤ F (x1) and return
X = xI . The following figure 2 illustrates the method. In that case we generate X=x4
Note:
• the inverse transformation method to generate random number from different non-uniform distributions. Note that apparently,
the inverse transformation method seems to be the most general method to generate random deviate from any distribution
function functions. In fact it can be used provided the distribution function can be written in an explicit form
Finally we had discussed the Acceptance Rejection method of random number generation, this method is quite important because
,when the distribution function can not be computed analytically and similarly it’s inverse also. Then in such cases we can not apply
the inverse transformation method to generate the corresponding random deviates. The acceptance-rejection method can be used quite
effectively to generate these random deviates.
In brief ,Suppose we have a density function f (x) and we want to generate a random deviate from the density function f (x). The
distribution function of f (x) can not be expressed in explicit form. The acceptance-rejection method requires that we specify a
function g (x) that majorizes the function f (x), that is, f (x) ≤ g (x) for all x. Naturally, g (x) will not be a density function always, since
∞ ∞
c= ∫
∞
g ( x)dx ≥ ∫ f ( x) = 1,
∞
1
but the function h (x) = g ( x) is clearly a density function provided c < ∞ . Now for any given f (x), we choose the function g (x),
c
such that c < ∞ and it is possible to generate random deviate from g (x) by a inverse transformation method. Then we can generate the
random deviate X from f (x) as follows:
ALGORITHM
• Step 1: Generate Y having density function g (x).
• Step 2: Generate U from U (0,1) which is independent of Y.
f (Y )
• Step 3: If U ≤ , X = Y, otherwise go back to Step 1 and try again.
g (Y )
f (Y )
Note that the algorithm is looping back to Step 1 until we generate a pair (Y, U) pairs in Steps 1 and 2 for which U ≤ , when we
g (Y )
accept the value Y for X. Theoretically it can be shown that the random deviate X generated by the above algorithm has indeed the
probability density function f (x). Since it is not very easy to prove the result we do not provide it.
9
2. 7 SOLUTIONS
E1 A pseudo-random number generation is the methodology to develop algorithms and programs that can be used in,
probability and statistics applications when large quantities of random digits are needed. Most of these programs produce
endless strings of single-digit numbers, usually in base 10, known as the decimal system. When large samples of pseudo-
random numbers are taken, each of the 10 digits in the set {0,1,2,3,4,5,6,7,8,9} occurs with equal frequency, even though
they are not evenly distributed in the sequence.
Many algorithms have been developed in an attempt to produce truly random sequences of numbers, endless strings of
digits in which it is theoretically impossible to predict the next digit in the sequence based on the digits up to a given point.
But the very existence of the algorithm, no matter how sophisticated, means that the next digit can be predicted! This has
given rise to the term pseudo-random for such machine-generated strings of digits. They are equivalent to random-number
sequences for most applications, but they are not truly random according to the rigorous definition.
There are several methods available to generate uniform random numbers. But currently the most popular one is the linear
congruential generator (LCG). Most of the existing software’s today use this LCGs proposed by Lehmer in the early 50’s.
E1 To find F-1, we set u = F(x) and solve for x to obtain

x = F −1(u ) = − ln(1 − u ).
Therefore, to generate random variate X from exponential distribution with mean 1, first generate U from a U (0,1) and
then let X = - In (1 – U). Therefore X will have exponential distribution with mean 1. Since U and 1 – U have the same U
(0, 1) distribution, we can also use X = In U. This saves a subtraction.
∞
1
E2 Note that ∑2
i =1
i
= 1, therefore pi denotes the probability mass function of a discrete random variable. The
corresponding distribution function of the discrete random variable X can be written as
⎧ 0 if x <1
⎪
F (x) = ⎨ m 1
⎪⎩∑i =1 2i if m ≤ x < m + 1,
where m is any positive integer. Now to generate a random deviate from the random variable X, first draw a random
m −1 m 0
1 1 1
sample U from U (0, 1), since 0 ≤ U ≤ 1, there exits a positive integer m such that ∑ i ≤ U < ∑ i , where ∑ i = 0,
i =1 2 i =1 2 i =1 2
then X = m,
E1 Our problem is to generate random deviate from f (x) for a given 0 < α < 1. Note that we can not use the acceptance-
rejection method in this case. It is easily observed if we take
⎧
⎪0 if x≤0
⎪
⎪⎪ x α −1
g (x) = ⎨ if 0 < x < 1
⎪ Γ(α )
⎪ e −x
⎪ if x > 1,
⎪⎩ Γ(α )
then f (x) ≤ g (x) for all x. In this case
∞ 1 x α −1 ∞ e−x 1 ⎡ (e + a ) ⎤
c= ∫∞
g (x) = ∫0 Γ(α ) dx + ∫
1 Γ(α )
dx =
Γ(α ) ⎢⎣ ae ⎥⎦
.
10
1
Therefore, h (x) = g (x) is
c
⎧
⎪ 0 if x≤0
⎪ α −1
⎪αx
h (x) = ⎨ if 0 ≤ x ≤ 1
⎪ b
⎪αe − x
⎪ if x > 1,
⎩ b
e+a
where b = . The distribution function H (x) corresponds to the density function h (x) is
e
⎧ xα
⎪⎪ if 0 ≤ x ≤1
x b
H (x) = ∫ h (y) dy = ⎨
⎪1 − αe
0 −x
if x > 1,
⎪⎩ b
Which can be easily inverted as
⎧ 1
1
⎪⎪ (bu ) α
if u ≤
H – 1 (u) = ⎨ b
⎪− In b(1 − u ) if u > 1
⎪⎩ α b
Therefore, it is possible to generate random deviate from the density function h (x) using the simple inversion method.
Generation of a random deviate Y from the density function h (x) can be performed as follows. First generate U1 from U (0,
b (1 − U1 )
1
1
1), if U1 ≤ we set Y = (bU1) α , in this case Y ≤ 1. Otherwise Y = – In and in this case Y > 1. Also note that
b α
f (Y ) ⎧⎪e
−Y
if 0 ≤ Y ≤ 1
= ⎨ α −1
g (Y ) ⎪⎩Y if Y >1
Now the algorithm to generate random deviate from a gamma density function with the shape parameter α, for 0 < α < 1
takes the following form:
• Step 1: Generate U1 from U (0,1) and let P = bU1. If P > 1, go to Step 3 otherwise proceed to Step 2.
1
• Step 2: Let Y = P and generate U2 from U (0, 1). If U2 ≤ e – Y, return X = Y.
α
Otherwise go back to Step 1.
(b − P )
• Step 3: Let Y = – In and generate U2 from U (0, 1). If U2 ≤ Y α – 1, return X = Y, otherwise go back to Step 1.
α
11
Regression Analysis
UNIT 3 REGRESSION ANALYSIS
Structure Page Nos
3.0 Introduction
3.1 Objectives
3.2 Simple Linear Regression
3.2.1 Least Squares Estimation
3.2.2 Goodness of Fit
3.2.3 Residual Analysis
3.3 Non-Linear Regression
3.3.1 Least Squares Estimation
3.4 Summary
3.5 Solutions
3.0 INTRODUCTION
In many problems there are two or more variables that are inherently related and it may be
necessary to explore the nature of their relationship. Regression analysis is a statistical
technique for modeling and investigating the relationship between two or more variables.
For example in a chemical process suppose that the yield of the product is related to the
process operating temperature. Regression analysis can be used to build a model that
expresses yield as a function of temperature. This model can be used to predict yield at a
given temperature level. It can also be used for process optimization or process control
purposes.
In general, suppose that there is a single dependent variable or response variable y and
that is related to k independent or regressor variables say x1,……..,xk. The response
variable y is a random variable and the regressor variables x1,……,xk are measured with
negligible error. The relationship between y and x1,……..xk is characterized by a
mathematical model and it is known as the regression model. It is also known as the
regression of y on x1,……..,xk. This regression model is fitted to a set of data. In many
situations the experimenter knows the exact form of the functional relationship
between y and x1, ..., xk, say φ(x1, ..., xk), except for a set of unknown parameters.
When the functional form is unknown, it has to be approximated on the basis of
past experience or from the existing information. Because of its tractability, a
polynomial function is popular in the literature.
In this unit we will be mainly discussing the linear regression model and when k = 1, that
is only one regressor variables. We will be discussing in details how to estimate the
regression line and how it can be used for prediction purposes from a given set of data.
We will also discuss briefly how we can estimate the function φ, if it is not linear.
3.1 OBJECTIVES
After reading this unit, you should be able to

• Decide how two variables are related.
• Measure the strength of the linear relationship between two variables.
• Calculate a regression line that allows to predict the value of one of the variable if
1
the value of the other variable is known.
• Analyze the data by the method of least squares to determine the estimated
regression line to be used for prediction.
• Apply the least squares methods to fit different curves and use it for prediction
purposes.
3.2 SIMPLE LINEAR REGRESSION
We wish to determine the relationship between a single regressor variable x and a

response variable y(note: The linear regression with one independent variable is referred
to as simple linear regression ). We will refer to y as the dependent variable or response
and x as the independent variable or regressor. The regressor variable x is assumed to be a
continuous variable controlled by the experimenter. You know that it is often easy to
understand data through a graph. So, let us plot the data on Scatter diagram (a set of
points in a 2-D graph where horizontal axis is regressor and vertical axis is response)
Suppose that the true relationship between y and x is straight line. Therefore, each
observation y can be described by the following mathematical relation (model)
y = β 0 + β1 x + ∈ (1)
Yield (y)
80 100 120 140 160 180 200 220
Temperature (x)
Figure 1: Scatter diagram of yield versus temperature
where ∈ is a random variable with mean 0 and variance σ2. The ∈ is known as the error
component and it is assumed to be small. If the error ∈ was absent then it was a perfect
relation between the variables y and x which may not be very practical. Let us look at the
following example.
Example 1: A chemical engineer is investigating the effect of process operating

temperature on product yield. The study results in the following data.
2
Regression Analysis
Temperature °C (x) 100 110 120 130 140 150 160 170 180 190
Yield,%(y) 45 51 54 61 66 70 74 78 85 89
The scatter diagram between the temperature and the yield is presented in the Figure 1
above. From the Figure 1 it is clear that there is a linear relationship between yield and
temperature but clearly it is not perfect. For example we can not write the relationship
between y and x as follows
y = β 0 + β1 x
Clearly the presence of the error ∈ is needed. Moreover the error ∈ is a random variable
because it is not fixed and it varies from one temperature to another. It may also vary
when two observations are taken at the same temperature. If there was a perfect
linear relationship between y and x we would have required just two points to find the
relationship. Since the relationship is not perfectly linear it is usually required much more
than two data points to find their relationship. Our main objective is to find the
relationship between them from the existing information (data points). Since it is assumed
that the relationship between x and y is linear therefore the relationship can be expressed
by the equation (1) and finding the relationship basically boils down finding the unknown
constants β 0 and β1 from the observations.
Let us discuss this concept of linear regression by one more illustration/collection of data
described in the table 1 given below. This table encloses the data of 25 samples of
cement, for each sample we have a pair of observation (x,y) where x is percentage of SO3,
a chemical and y is the setting time in minutes. These two components are strongly
related; it is the percentage of SO3 which influences the setting time of any cement
sample, the recorded observations are given in table 1 below.
Table 1: Data on SO3 and Setting Time
S.No. Percentage of SO3 Setting Time

i x Y (in minutes)
1 1.84 190
2 1.91 192
3 1.90 210
4 1.66 194
5 1.48 170
6 1.26 160
7 1.21 143
8 1.32 164
9 2.11 200
10 0.94 136
11 2.25 206
12 0.96 138
13 1.71 185
14 2.35 210
15 1.64 178
16 1.19 170
17 1.56 160
18 1.53 160 3
19 0.96 140
20 1.7 168
21 1.68 152
22 1.28 160
23 1.35 116
24 1.49 145
25 1.78 170
Total 39.04 4217
Sum of
Squares 64.446 726539
From the table 1, you see that setting time y increases as percentage of SO3
increases. Whenever you find this type of increasing (or decreasing) trend in a
table , same will be reflected in the scatter diagram ,and it indicates that there is a
linear relationship between x and y. By drawing the scatter diagram you can
observe that the relationship is not perfect in the sense that a straight line cannot
be drawn through all the points in the scatter diagram.
Nevertheless, we may approximate it with some linear equation. What formula

shall we use? Suppose, we use the formula y = 90 + 50x to predict y based on x.
To examine how good this formula is, we need to compare the actual values of y
with the corresponding predicted values. When x = 0.96, the predicted y is equal
to 138(= 90 + 50 x 0.96). Let (xi, yi) denote the values of (x, y) for the ith sample.
From Table-1, notice that x12=x19=0.96, whereas y12 = 138 and y19 = 140.
Let $y = 90 + 50 xi . That is, $yi is the predicted value of y (then using y = 90 + 50x
for the ith sample. Since, x12=x19=0.96, both $y12 and $y19 are equal to 138. Thus the
difference e$i = yi − $yi , the error in prediction, also called residual is observed to
be e$ 12 = 0 and e$ 19 = 2. The formula we have considered above, y = 90 + 50x, is
called a simple linear regression equation , we will study these terms in detail in
our successive sections.
3.2.1 Least squares estimation
Suppose that we have n pairs of observations, say (x1 , y1),…….(xn , yn). It is assumed that
the observed yi and xi satisfy a linear relation as given in the model (1). These data can be
used to estimate the unknown parameters β 0 and β1 . The method we are going to use is
known as the method of least squares, that is, we will estimate β 0 and β1 so that the sum
of squares of the deviations from the observations to the regression line is minimum. We
will try to explain it first using a graphical method in Figure 2. For illustrative purposes
we are just taking 5 data points (x, y) = (0.5, 57), (0.75, 64), (1.00, 59), (1.25, 68),
(1.50,74). The estimated regression line can be obtained as follows. For any line we have
calculated the sum of the differences (vertical distances) squares between the y value and
the value, which is obtained using that particular line. Now the estimated regression line
is that line for which the sum of these differences squares is minimum.
4
90
Regression Analysis
85
80
Estimated regression line
75
70
Observed point Difference
y 65
60
55 Estimated point
0 0.5 1 1.5 2
x
Figure 2 Differences between y values and the estimated values using regression line
Matematically the sum of squares of the deviations of the observations from the
regression line is given by
n n
L= ∑
i =1
∈i2 = ∑(y − β
i =1
i 0 − β1 xi ) 2
The least squares estimators of β 0 and β1 , are βˆ0 and βˆ1 which can be obtained by
solving the following two linear equations.
n
∂L
∂β 0 i =1
∑
= −2 ( yi − β 0 − β1 xi ) = 0
∂L n
= −2∑ ( y i − β 0 − β1 xi ) = 0
∂β1 i =1
Simplifying these two equations yields
n n
n β 0 + β1 ∑ i =1
xi = ∑y
i =1
i (2)
n n n
β 0 ∑ xi + β1 ∑ xi2 = ∑y x i i (3)
i =1 i =1 i =1
Solving (2) and (3) βˆ0 and βˆ1 can be obtained as follows:
βˆ 0 = ŷ − βˆ1 x (4)
βˆ1 =
∑
n
yx
i =1 i i
−
1
n ( ∑ yi )( ∑ x )
n
i =1
n
i =1 i
(5)
− ( ∑ xi )
2
1
∑
n n
x2
i =1 i i =1
n
5
n n
where y = ∑ y and x = ∑ x
i =1
i
i =1
i . Therefore, the fitted simple linear regression line
between these n points is
ŷ = βˆ0 + βˆ1 x (6)
Note : Linear Correlation and Regression

linear correlation and regression are very similar. One uses the correlation
coefficient to determine whether two variables are linearly related or not. The
correlation coefficient measures the strength of the linear relationship. Regression
on the other hand is used when we want to answer question about the relationship
between two variables.
Some Properties for Linear Correlation and Regression

(1) The line of regression of y on x always passes through( x, y ) where
x, and y are the mean of x and y values.
(2) There are always two line of regression one of y on x and other of x on y.
i.e., y = a 1 + b yx x or x = a 2 + b xy y
σy
where byx = Regression coeff of y on x = r
σx
σx
bxy = Regression coeff of y on x = r
σy
Correlation can be obtained by the following formula also,
r = bxy * byx (-1≤ r ≤ 1)
Angle between lines of regression is given by,
 r 2 − 1  σ x * σ y  
θ = tan −1   2 2 

 r  σ x + σ y  
Where r = correlation coeff between x and y
σ x = standard deviation of variable x
σ y = standard deviation of variable y
So, now, Regression equation of y on x can be rewritten as
σy
( y − y) = r ( x − x)
σx
And Regression equation of x on y as,
σx
( x − x) = r ( y − y)
σy
Example 1 (contd.) Now we will compute the estimates of β 0 and β1 for the data points
given in Example 1. In this case it is observed
10 10
n = 10, ∑i =1
xi = 1450, ∑ yi = 673,
i =1
x = 145, y = 67.3
6
10 10 10
Regression Analysis
∑
i =1
xi2 = 218,500, ∑
i =1
yi2 = 47, 225, ∑x y
i =1
i i = 101,570
Therefore,
101,570 − 10 × 1450 × 673
βˆ1 = = 0.483,
218,500 − 10 × 14502
and
β̂ 0 = 673 – 0.483 X 145 = – 2.739.
The fitted simple linear regression line through those 10 points is
yˆ = −2.739 + 0.483x (7)
The best fitted line along with the data points are plotted in the Figure 3. Note that the
best fitted line can be used effectively for prediction purposes also. For example suppose
we want to know the expected yield when the temperature is 170°C, for which the data is
not available. We can use the (7) for this purpose as follows.
yˆ = −2.739 + 0.483 × 170 = 79.371 .
Therefore, the best fitted line shows that the expected yield at 170°C is 79.371.
100
90 Best fitted line
80
70
Yield (y)
60
50
40
80 100 120 140 160 180 200 220
Temperature (x)
Figure 3: Data points and the best fitted regression line passing through these points
Soon in this section only, we will discuss the technique consisted of few steps; which can
be used to fit a line in best way, such that the error is minimum. In crux we will study the
technique to determine the best equation, that can fit a line in the data such that the error
is minimum. But before that lets see one more example.
7
Example 2: A survey was conducted to relate the time required to deliver a proper
presentation on a topic , to the performance of the student with the scores he/she receives.
The following Table shows the matched data:
Table 2
Hours (x) Score (y)
0.5 57
0.75 64
1.00 59
1.25 68
1.50 74
1.75 76
2.00 79
2.25 83
2.50 85
2.75 86
3.00 88
3.25 89
3.50 90
3.75 94
4.00 96
(1) Find the regression equation that will predict a student’s score if we know how many
hours the student studied.
(2) If a student had studied 0.85 hours, what is the student’s predicted score?
Solution. We will arrange the data in the form of a chart to enable us to perform the
computations easily.
Table 3
x y x2 xy
0.5 57 0.25 28.5
0.75 64 0.56 48.0
1.00 59 1.00 59.0
1.25 68 1.56 85.0
1.50 74 2.25 111.0
1.75 76 3.06 133.0
2.00 79 4.00 158.0
2.25 83 5.06 186.75
2.50 85 6.25 212.5
2.75 86 7.56 236.5
3.00 88 9.00 246.0
3.25 89 10.56 289.25
3.50 90 12.25 315.0
3.75 94 14.06 352.50
4.00 96 16.00 384.0
33.75 1188 93.44 2863
In this case n = 15, therefore
15 × 2863 − 33.75 × 1188 1

βˆ1 2
= 10.857 , βˆ0 = [1188 − 10.857 × 33.75] = 54.772
15 × 93.44 − 33.75 15
8
Therefore, the prediction line becomes: Regression Analysis
yˆ = 54.772 + 10.857 x
Now the predicted score when x = 0.85, is
yˆ = 54.772 + 10.857 × 0.85 = 64.00
100
95
90
85
80
75
Score (y) 70
65 Predicted line
60
55
50
0 1 2 3 4 5
Hours (x)
Figure 4: Hours studied and the corresponding score with the best fitted regression line passing
through these points
Thus the predicted score of the student who had studied 0.85 hours is approximately
64.00.
We have plotted the individual scores and the hours studied with the best fitted prediction
line in the Figure 4. It shows the hours studied by the student and the corresponding score
follows a linear pattern and the predicted line can be used quite effectively to predict the
score of a student if we know how many hours the student had studied.
Now, it’s the time to discuss the technique for determining the best equation, i.e., the
equation which fits the line in a way that the overall error is minimized.
From above illustrations and examples you might have noticed that different
equations give us different residuals. What will be the best equation? Obviously,
$i are small.
the choice will be that equation for which es
This means that whatever straight line we use, it is not possible to make all es$i zero
, where e$i = yi − $yi (the difference). However, we would expect that the errors are
positive in some cases and negative in the other cases so that, on the whole, their
sum is close to zero. So, our job is to find out the best values of β0 and β1 in the
formula y = β 0 + β 1 x + e ( s.t. e 0) . Let us see how we do this.
9
$i are minimum. For
Now our aim is to find the values β0 and β1 so that the error es
that we state here four steps to be done.
1) Calculate a sum Sxx defined by
n
Sxx = ∑ x 2i − n x
2
(8)
i =1
Σxi
where xi’s are given value of the data and x = is the mean of the observed
n
values and n is the sample size.
The sum Sxx is called the corrected sum of squares.
2) Calculate a sum Sxy defined by
n
Sxy = ∑ xi yi − n x y (9)
i =1
where xi’s and yi’s are the x-values and y-values given by the data and x and
y are their means.
Sxy
3) Calculate = β 1 say. That is
Sxx
Sxy
β1 = (10)
Sxx
4) Find y − β 1 x = β 0 , say.
Let us now compute these values of the data in Table 1: Data on SO3 and Setting
Time, we get
x = 1.5616, y = 168.68, Sxx = 3.4811, and Sxy = 191.2328.
Substituting these values in (10) and (11), we get
Sxy
β1 = = 54.943 and β 0 = 168.68 – 54.943 x 1.5616 = 82.88 (11)
Sxx
Therefore, the best linear prediction formula is given by
y = 82.88 + 54.943x.
After drawing this line on the scatter diagram, you can find that this straight lines
is close to more points, and hence it is the best linear prediction.
Example 3: A hosiery mill wants to estimate how its monthly costs are related to its
monthly output rate. For that the firm collects a data regarding its costs and output for a
sample of nine months as given in the following table.
Table 4
Output (tons) Production cost (thousands of dollars)
1 2
2 3
4 4
8 7
6 6
5 5
8 8
9 8
10 7 6
Regression Analysis
(a) Find the scatter diagram for the data given above.
(b) Find the regression equation when the monthly output is the dependent variable (x)
and monthly cost is the independent variable (y).
(c) Use this regression line to predict the firm’s monthly cost if they decide to produce 4
tons per month.
(d) Use this regression line to predict the firm’s monthly cost if they decide to produce 9
tons per month.
Solution
a) Suppose that xi denote the output for the ith month and yi denote the cost
for the ith month. Then we can plot the graph for the pair (xi, yi) of the
values given in Table . Then we get the scatter diagram as shown in Figure
below.
C
O
10
S
T 9
(T
H 8 • •
O
U 7 •
S
A 6 • •
N
D’
5 •
S
4 •
OF
D 3 •
O
L 2 •
L
A 1
R)
0
1 2 3 4 5 6 7 8 9 10
OUTPUT (TONS)
Figure 5: Scatter Diagram
b) Now to find the least square regression line, we first calculate the sums Sxx
and Sxy from Eqn.(8) and (9).
n
Sxx = ∑ x 2i − nx
i =1
Note that from Table(4) we get that
n
∑ x
50
x = i =1 i =
n 9
11
n
∑ y
y= i =1i = 49
n 9
∑ x 2
i = 340
∑ y = 303
2
i
and ∑ x y = 319 i i
Therefore, we get that

n
∑ xi yi −nx y
β 1 = i =n1 2
∑ 1 x 2 − nx
i =1
9 x 319 − 50 x 49
=
9 x 340 − 502
421
= = 0.752
560
Correspondingly, we get
49 50
β 0 = − (0.752) x
9 9
= 1.266
Therefore, the best linear regression line is
y = 1.266 + (0.752)x
c) If the firms decides to produce 4 tons per month, then one can predict that
its cost would be
1.266 + (0.752) x 4 = 4.274
Since the costs are measured in thousands of dollars, this means that the total
costs would be expected to be $4,274.
d) If the firms decides to produce 9 tons per month, then one can predict that
its cost would be 1.266 + (0.752) x 9=8.034
Since the costs are measured in thousands of dollars, this means that the
total costs would be expected to be $8034.
E 1: In partially destroyed laboratory record of an analysis of correlation data, the

following results only are legible.
Variance of x = 9
Regression equations : 8x – 10y + 66 = 0
40x – 18y – 214 = 0
what were (1) the mean values of x and y,

(2) the correlation coeff between x and y
(3) the standard deviation of y
E2 Since humidity influences evaporation, the solvent balance of water reducible paints
during sprayout is affected by humidity. A controlled study is conducted to examine the
relationship between humidity (X) and the extent of evaporation (Y) is given below in
12 table 5. Knowledge of this relationship will be useful in that it will allow the painter to
adjust his or her spray gun setting to account for humidity. Estimate the simple linear Regression Analysis
regression line and predict the extent of solvent evaporation (i.e loss of solvent ,by
weight)when the relative humidity is 50%
Table 5
Observation (x) (y)
Relative Solvent
humidity, Evaporation,
(%) (%) wt
1 35.3 11.0
2 29.7 11.1
3 30.8 12.5
4 58.8 8.4
5 61.4 9.3
6 71.3 8.7
7 74.4 6.4
8 76.7 8.5
9 70.7 7.8
10 57.5 9.1
11 46.4 8.2
12 28.9 12.2
13 28.1 11.9
14 39.1 9.6
15 46.8 10.9
16 48.5 9.6
17 59.3 10.1
18 70.0 8.1
19 70.0 6.8
20 74.4 8.9
21 72.1 7.7
22 58.1 8.5
23 44.6 8.9
24 33.4 10.4
25 28.6 11.1
3.2.2 Goodness of Fit
We have seen in the previous subsection that the regression line provides estimates of the
dependent variable for a given value of the independent variable. The regression line is
called the best fitted line in the sense of minimizing the sum of squared errors. The best
fitted line shows the relationship between the independent (x) and dependent (y) variables
better than any other line. Naturally the question arises “How good is our best fitted
line?”. We want a measure of this goodness of fit. More precisely we want to have a
numerical value which measures this goodness of fit.
For developing a measure of goodness of fit, we first examine the variation in y. Let us
first try the variation in the response y. Since y depends on x, if we change x, then y also
changes. In other words, a part of variation in y’s is accounted by the variation in x’s.
13
Actually, we can mathematically show that the total variation in y’s can be split up as
follows:
n S 2 xy n
Syy = ∑ i =1
( yi − y ) 2 =
S xx
+ ∑ ( y − yˆ ) ,
i =1
i i
2
(12)
where
n n
Sxx = ∑
i =1
( xi − x) 2 ; Sxy = ∑ ( x − x)( y − yˆ )
i =1
i i i
Now if we divide (12) by Syy on both sides, we get
∑
n
S 2 xy i =1
( yi − yî ) 2
1= +
S xx S yy S yy
Since the quantities on the right hand side are both non-negative, none of them can exceed
one. Also if one of them is closer to zero the other one has to be closer to one. Thus if we
denote
S xy
R=
S xx S yy
then
2
S 2 xy
R =
S xx S yy
Since R2 must be between 0 and 1, R must be between – 1 and 1. It is clear that if R2 = 1,

then
∑
n
( yi − yî ) 2 n
i =1
S yy
= 0 or ∑ ( y − yˆ )
i =1
i i
2
=0 or yi = yî for all i.
n
Again when R2 is close to 1, ∑ ( y − yˆ )
i =1
i i
2
is close to zero. When R is negative, it means
that y decreases as x increase and when R is positive y increases when x increases. Thus R
gives a measure of strength of the relationship between the variables x and y.
Now let us compute the value of R for Example 1. For calculating the numerical value of
R, the following formula can be used;
∑ ∑
n n
S xy i =1
( xi − x)( yi − y ) xy
i =1 i i
− nx y
R= = =
S xx S yy
∑ ∑ ∑ ∑
n n n n
i =1
( xi − x ) 2 i =1
( yi − y ) 2 x2
i =1 i
− nx 2 y2
i =1 i
− ny 2
Therefore, for Example 1, the value of R becomes;
101,570 − 10 × 145 × 67.3 3985

R= = = 0.9981
218,500 − 10 × 145 2
47225 − 10 × 67.3 2 8250 1932.1
and R2 = 0.9963.
Therefore, it is clear from the value of R or from R2 that both of them are very close to
one. From the figure also it is clear that the predicted line fits the data very well.
14
Moreover R is positive means, there is a positive relation between the temperature and Regression Analysis
yield. As the temperature increases the yield also increases.
Now the natural question is how large this R or R2 will be to say that the fit is very good.
There is a formal statistical test based on F-distribution which can be used to test whether
R2 is significantly large or not. We are not going into that details. But as a thumb rule we
can say that if R2 is greater that 0.9, the fit is very good, if it is between 0.6 to 0.8, the fit
is moderate and if it is less than 0.5 it is not good.
E1) For the data given in the table below compute R and R2
Table 6: $yi and e$i For Some Selected i

Sample 12 21 15 1 24
No. (i)
xi 0.96 1.28 1.65 1.84 2.35
yi 138 160 178 190 210
$yi 138
e$i 0
Note: $yi = 90 + 50x and e$i = yi – $yi
3.2.3 Residual Analysis
Fitting a regression model to a set of data requires several assumptions. For example
estimation of the model parameters requires the assumptions that the errors are
uncorrelated random variables with mean zero and constant variance. If these assumptions
are not satisfied, then using the simple least squares method may not produce the efficient
regression line. In fitting a linear model, it is also assumed that the order of the model is
correct, that is if we fit a first order polynomial, then we are assuming that phenomenon
actually behave in a first order manner. Therefore, for a practitioner it is important to
verify these assumptions and the adequacy of the model.
We define the residuals as e i = y i − yˆ i , i = 1, 2, …, n, where y i is an observation

and ŷ i is the corresponding estimated value from the best fitting regression line.
Analysis of the residuals is frequently helpful in checking the assumption that errors are
independent and identically distributed with mean zero and finite variance and in
determining `whether the additional terms in the model would be required not. It is
advisable to plot the residuals
a) in time sequence (if known),

b) against the yî or
c) against the independent variable x. These graphs will usually look like one of the
four general patterns as shown in the Figures 6 – 9.
15
9
9 9 9
9
9 9 9 9 9
ei 9 9 9
9 9
9 9 9 9
9
9 9 9 9
9 9 9 9
9 9
9 9
9 9 9
9 9 9 9
∧
yi
Figure 6 Pattern of the residual plot; satisfactory.
The Figure 6 indicates that the residuals are behaving in satisfactory manner and the
model assumptions can be assumed to be correct. The Figures 7 – 9 given below indicate
unsatisfactory behaviour of the residuals. The Figure 7 clearly indicates that the variances
are gradually increasing. Similarly the Figure 8 indicates that the variances are not
constant. If the residuals plot is like the Figure 9, then it seem the model order is not
correct, that means, the first order model is not the correct assumption. We should look
for higher order models.
*
* *
* *
* * * *
* ei * *
* * * * *
ei * * * * * *
* * * * *
* * * * * * * * * *
* * * * * * * * *
* * * *
* * * * *
* * * * * * * *
* * * * * * * * *
* * * * *
* * * *
* * * * *
* * *
* * * *
* *
* * *
* *
∧
yi ∧
16 yi
Figure 7 Pattern of the residual plot; indicates the
Figure 8 Pattern of the residual plot; indicates the vari
variance is gradually increasing this case
not constant.
Regression Analysis
*
*
*
*
* *
* * * *
*
ei * *
*
* * * *
* * *
* *
* * * *
*
* *
* *
* * *
∧
yi
Figure 9 Patent of the residual plot; indicates the
model order is not correct.
Example 4: Now we provide the residual plots of the data given in Example 1. We have
plotted yî vs. ei. It is provided in the Figure 10. From the Figure 10, it is quite clear that
the residuals plot is quite satisfactory and apparently all the model assumptions are
satisfied in Figure 10 here.
1.5
0.5
ei
0
-0.5
-1
0
45 50 55 60 65 70 75 80 85 90
∧
yi
Figure10 Pattern of the residual plot; satisfactory.
17
E1 What is the utility of residual plots? what is the disadvantage of residual plots?
3.3 NON-LINEAR REGRESSION
Linear regression is a widely used method for analyzing data described by models which
are linear in parameters. However, in many practical situations, people come across with
data where the relationship between the independent variable and the dependent variable
is no more linear. In that case definitely one should not try to use a linear regression
model to represent the relationship between the independent and dependent variable. Let
us consider the following example.
Example 5 Data on the amount of protein generated by a certain experiment were counted
and reported over time. They are presenting below in Table 7:
Table 7
Time Protein Time Protein Time Protein Time Protein
(min) (gm) (min) (gm) (min) (gm) (min) (gm)
0 0.844 80 0.818 160 0.580 240 0.457
10 0.908 90 0.784 170 0.558 250 0.448
20 0.932 100 0.751 180 0.538 260 0.438
30 0.936 110 0.718 190 0.522 270 0.431
40 0.925 120 0.685 200 0.506 280 0.424
50 0.908 130 0.685 210 0.490 290 0.420
60 0.881 140 0.628 220 0.478 300 0.414
70 0.850 150 0.603 230 0.467 310 0.411
We present the Time vs. Protein generated in the Figure 11.
From Figure 11 it is clear that the relationship between the time and protein generated is
not linear. Therefore, they can not be explained by a linear equation. In a situation like
this we may often go for a non-linear model to explain the relationship between the
independent and dependent variables and they are called the non-linear regression model.
A non-linear regression model can be formally written as
y = f ( x ,θ ) + ∈ , (13)
where f ( x,θ ) is a known response function of k-dimensional vector of explanatory

variable x and p-dimensional vector of unknown parameter θ . Here also ∈ represents the
error component and it is assumed that it has mean zero and finite variance. Therefore, it
is clear that the non-linear regression model is a generalization of the linear regression
model. In case of linear regression model f (x, θ) is a linear function, but there it can be
any non-linear function also. Similar to the linear regression model, here also our problem
is same, that is, if we observe a set of n, {(x1, y1),……, (xn, yn)} how to estimate the
unknown parameters θ, when we know the functional form of f (x, θ).
18
Regression Analysis
3.3.1 LEAST SQUARES ESTIMATION
Similar to the linear regression method here also to estimate the unknown parameters, we
adopt the same method. We find the estimate of θ by minimizing the residual sums of
squares, that is minimize
n
Q (θ ) = ∑[ y − f ( x ,θ )] ,
i =1
i i
2
(14)
with respect to the unknown parameters. The idea is same as before, that is we try to find
that particular value of θ for which the sum of squares of the distance between the points
yi and f ( xi ,θ ) is minimum. Unfortunately in this case the minimum can not be performed
as easily as before. We need to adopt some numerical technique to minimize the function
Q(θ ) . This minimization can be performed iteratively and one technique that can be used
to accomplish this is the Gauss-Newton method.
0.9
0.8
Protein 0.7
0.6
0.5
0.4
0 50 100 150 200 250 300 350
Time
Figure 11 Time vs. Protein generated in an experiment.
You have already learned about the Gauss-Newton method in details before, we just give
a brief description for your convenience. We use the following notations below:
θ = (θ1 ,.....,θ p ), θ ( k ) = (θ1( k ) ,........,θ p( k ) ) (15)

Expand the function f ( x,θ ) using a Taylor series expansion about the starting point θ (0)
and using the only the first order expansion, we get:
f ( xi ,θ ) ≈ f ( xi ,θ (0) ) + vi1 (θ1 − θ1(0) ) + ..... + vip (θ p − θ p(0) )

where
∂f ( xi ,θ )
vij = for j = 1,.........., p.
∂θ j
θ =θ ( 0)
19
Let η (θ ) = ( f ( x1 ,θ ),....., f ( xn ,θ ))' and y = y = ( y1 ,...... yn ) ' then in the matrix notation
we can write (15)
η (θ ) ≈ η (θ (0) ) + V (0) (θ − θ (0) ),
where V (0) is the η × p derivative matrix with elements vij. Therefore, to compute the first
estimates beyond the starting value is to compute
b0 = [V (0)'V (0) ]−1[ y − η (θ (0) )]

and then solve for the new estimate θ (1) as
θ (1) = b0 + θ (0) .
This procedure is repeated then with θ ( 0 ) is replaced by θ (1) and V (0) by V (1) and this
produces a new set of estimates. This iterative procedure continues until convergence is
achieved. Naturally these computations can not performed by hands, we need calculators
or computers to perform these computations.
Example 5 (Contd). In this case it is observed (theoretically) that the following model (16)
can be used to explain the relationship the time and yield generated yi where
yt = α 0 + α1e β1t + α 2 e β 2t + ∈t . (12)
Note that as we have mentioned for the general non-linear regression model, in this case
also the form of the non-linear function namely α 0 + α1e β1t + e β 2t is known, but the
parameters of the model, that is, θ = (α 0 ,α1 ,α 2 , β1 , β 2 ) is unknown. Given the data as
provided in Example 5, we want to estimate the unknown parameters.
We use the following initial guess α 0 = 0.5,α1 = 1.5,α 2 = −1.0, β1 = −0.01, β 2 = −0.02,
and finally using the Gauss-Newton algorithm we obtain the estimates of the parameters
as follows:
αˆ 0 = 0.375, αˆ1 = 1.936 αˆ 2 = 1.465 , βˆ0 = −0.013 βˆ1 = −0.022
We have plotted the points and also the best fitted regression line, namely
yˆ = 0.375 + 1.936e −0.013t − 1.465e −0.013t (17)
0.9
0.8 Fitted line
Protein 0.7
0.6
0.5
0.4
0 50 100 150 200 250 300 350
Figure 12 Time vs Protein generated in an experiment and the best fitted curve
20 Time
Regression Analysis
in the Figure 12. The Figure 12 indicates that the fitted regression curve provides a very
good relationship between the time and protein generated in that experiment. As before
the prediction curve, that is, the curve (17) can be used easily for prediction purposes also.
For example suppose we want to estimate the expected protein generation at the time 115
minutes after the experiment, then using (17), we obtain
yˆ = 0.375 + 1.936e −0.013×115 − 1.465e −0.013×115 = 0.698.

Therefore, at the 115 minutes the expected protein generation is 0.698 gms.
Some points we should remember about the non-linear regression model is that we have
to know the functional form of the relationship, but the parameters involved are unknown.
Usually the functional forms are obtained from the physical nature of the process and they
are available. If they are completely unknown it is difficult to use the non-linear
regression model. In that case we need to try with some guess models but they are not
very easy and they are not pursued here. Another important issue is the choice of the
guess value of the iterative process. This is also a difficult problem. Usually from the
prior experimental results we may have to try some trial and error method to find the
initial guess values.
E1 Data on the amount of heat generated by friction were obtained by Count Rumford in
1798. A bore was fitted into a stationery cylinder and pressed against the bottom by
means of a screw. The bore was turned by a team of horses for 30 minutes and Rumford
measured the temperature at small in intervals of time. They are provided in the Table
below:
Table 8
Time Temperature Time Temperature
(min) (°F) (min) (°F)
4 126
24 115
5 125
28 114
7 123
31 113
12 120
34 112
14 119
37.5 111
16 118
41 110
20 116
(1) Plot the time versus temperature and convince yourself that the linear regression
model is not the correct model in this case.
(2) A model based on Newton’s law of cooling was proposed as

f (t ,θ ) = 60 + 70e −θ t
Using an initial guess of θ (0) = 0.01, find the least squares estimate of θ .
(3) Based on the fitted least squares regression line find the expected temperature at the
time 15th minute after starting the experiment.
21
3.4 SUMMARY
In this unit you have seen :
• that regression analysis is an important technique, which can be used to verify the
results of any experiment.
• How to determine the relationship between a dependent and an independent

variable by using the Scatter diagram
• that by knowing the technique of regression you have an edge to analyse the
results in an organized way. Further this analysis is smoothened by application of
the concepts like least square estimation, goodness to fit and residual analysis.
• that many times the data obtained by conducting an experiment does not follow
the linear relation. So, to handle such aspects we have also discussed the concept
of non linear regression, under we have emphasized least square estimation
technique.
• Formulas and applications of following topics:
Simple Linear Regression

• Least Squares Estimation
• Goodness to Fit
• Residual Analysis
Non-Linear Regression
• Least Squares Estimation
3.5 SOLUTIONS
Check you r progress 1
E 1:
(1) Since both the regression lines pass through the point ( x , y ), we have
8 x – 10 y + 66 = 0
40 x – 18 y – 214 = 0
Solving we get, x = 13
y = 17.
Let 8x – 10y + 66 = 0 and 40x – 18y – 214 = 0

Be the lines of regression of y and x and x on y respectively. Now, we put them in
the following form.
8 66 18 214
= x + and x = y + (4)
10 10 40 40
22
8 4 Regression Analysis
∴ byx = regression coeff of y on x = =
10 5
18 9
bxy = regression coeff of x on y = =
40 20
4 9 9
Hence, r2 = bxy. byx = =
5 20 25
3
So r = ± = ± 0.6
5
Since, both the regression coeff are +ve, we take r = + 0.6
σy 4 3 σy
(3) We have, byx = r ⇒ = x
σx 5 5 3
∴ σy=4
Remarks (i) had we taken 8x – 10y + 66 = 0 as regression equation of x on y and
40x – 18y = 214, as regression equation of y on x.
10 40
Then bxy = and byx =
8 18
10 40
or r2 = bxy byx = x = 2.78
8 18
so r = ± 1.66
Which is wrong as r lies between ± 1.
E2
Observation (x) (y)
Relative Solvent
humidity, Evaporation,
(%) (%) wt
1 35.3 11.0
2 29.7 11.1
3 30.8 12.5
4 58.8 8.4
5 61.4 9.3
6 71.3 8.7
7 74.4 6.4
8 76.7 8.5
9 70.7 7.8
10 57.5 9.1
11 46.4 8.2
12 28.9 12.2
13 28.1 11.9
14 39.1 9.6
15 46.8 10.9
16 48.5 9.6
17 59.3 10.1
18 70.0 8.1
23
19 70.0 6.8
20 74.4 8.9
21 72.1 7.7
22 58.1 8.5
23 44.6 8.9
24 33.4 10.4
25 28.6 11.1
Summary statistics for these data are
n = 25 Σ x = 1314.90 Σ y = 235.70
Σ x2 = 76.308.53 Σ y2 = 2286.07 Σ xy = 11824.44
To estimate the simple linear regression line, we estimate the slope β1 and
intercept β0. these estimates are
n ∑ xy − [(∑ x)(∑ y )]
=β1 => β$ 1 = b1 =
n ∑ x 2 − (∑ x)2
25(11,824.44) − [(1314.90)(235.70)]
=
25(76,308.53) − (1314.90) 2
= – .08
β0. => β$ 0 = b0 = y − b1 x
= 9.43 – (– .08) (52.60) = 13.64
S 14
O
L 13
V
E 12
N
T 11
E 10
V
A 9
P
O 8
R
A 7
T
I 6
O
N 5
(% wt)
24 4 10 20 30 40 50 60 70 80 90 100
RELATIVE HUMIDITY (%)
Regression Analysis
Figure: A graph of the estimated line of regression of Y, the extent
of evaporation, on X, the relative humidity
Hence, the estimated regression equation is

µY | x = $y = 13.64 − .08 x
The graph of this equation is shown in Figure above. To predict the extent of
solvent evaporation when the relative humidity is 50 %, we substitute the value 50
for x in the equation.
$y = 13.64 − .08 x
to obtain $y = 13.64 − .08(50) = 9.64 . That is, when there relative humidity is 50 %
we predict that 9.64% of the solvent, by weight, will be lost due to evaporation.
Recall from elementary calculus that the slope of a line gives the change in y for a
unit change in x. If the slope is positive, then as x increases so does y; as x
decreases, so does y. If the slope is negative, things operate in reverse. An
increase in x signals a decrease in y, whereas a decrease in x yields an increase in
y.
S xy
E1 R = = (138.076) / [(2.6929 * 7839.93)1/2] = 0.9504
S xx S yy
So, R2 = 0.9033
E1 Residual plots are helpful in spotting potential problems. How ever theyare not always
easy to interpret. Residual patterns are hard to spot with small data set except in extreme
cases, residual plots are most useful with fairly large collection of data.
E1 Refer to example solved in the section 3.3.1
25

Numerical Computing MCSE004

Uploaded by

Copyright:

Available Formats

Numerical Computing MCSE004

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Numerical Computing MCSE004

Uploaded by

Copyright:

Available Formats

Floating Point

Instance 2: On June 4, 1996, an unmanned Ariane 5 rocket launched by the European

Mathematical model Computer and

Error causing states Implementation

After studying this unit, you should be able to:

1.2 FLOATING POINT REPERESENTATIONS

Fixed-Point numbers are represented by a fixed number of decimal places. Examples

Floating-Point numbers have a fixed number of significant places. Examples are

In scientific computation, a real number x is usually represented in the form

x = ±(d1d 2 ........d n ) x10m (1)

where d 1 , d 2 ,.........d n are decimal digits and m is an integer called exponent.

In the standard normalized floating-point numbers, the significand is greater than or

1.2.1 Floating Point Arithmetic

then x + y = 0.300000066 x 101 = 0.30 x 101

Hence, if θ is one of the arithmetic operations, and θ* is corresponding floating-point

However, x θ y = fl(x θ y) (4)

1.2.2 Properties of Floating Point Arithmetic

Hence, we see that,

( x + y ) + z = 0.1x101 while x + ( y + z ) = 0.0 .

Check Your Progress 1

2) Let a = .5665E1, b = .5556E – 1, c = .5644E1. Verify the associative property for

3) Let a = .5555E1, b = .4545E1, c = .4535E1. Verify the distributive property for

1.2.3 Significant Digits

When are Digits Significant?

Example 1: Which of the following numbers has the greatest precision?

Example 2: What is the accuracy of the following numbers?

Significant digits in Multiplication, Division, Trigonometry functions, etc.

In a calculation involving multiplication, division, trigonometric functions, etc., the

Keep one extra digit in Intermediate Answers

Example 3: Let x = 4.5 be approximated to x* = 4.49998. Then,

Hence, x* approximates x correct to 6 significant decimal digits.

Wrong way of writing significant digits

1) Writing more digits in an answer (intermediate or final) than justified by the

Number Number of Scientific

0.00682 3 6.82 * 10-3 Leading zeros are not significant.

Loss of Significant Digits

Suppose we want to calculate the number z = x – y and x* and y* are approximations

Remark 1: To avoid this loss of significant digits in algebraic expressions, we must

Solving the quadratic equation, we have one of the solutions as

− b + b 2 − 4ac −9.9 + (9.9) 2 − 4.1.( − 1)

−9.9 + 102 −9.9 + 10 0.1

which is one of the true solutions.

Error = actual value – approximate value = x – xa

How errors are generated in computers?

What are the sources of error?

Data Input Algorithms Computations

Remark 2: Sensitivity of an algorithm for a numerical process used for computing

How error measures accuracy?

a) Absolute error b) Relative error c) Percentage error

Now, we define these errors.

c) Percentage Error: Percentage error is defined as,

1.3.1 Rounding-off Error

n = 2, fl(5562) = .55 * 10 4 chopped

n = 3, fl(5562) = .556 * 10 4 chopped

i) If dn+1 < 5, then it is chopped

fl( x ) = x (1 + δ). (8)

It is not difficult to show that

δ < .5 *10 −( n −1) in rounding

while, − 10 − ( n −1) < δ ≤ 0 in chopping . (9)

Definition 1: Let x be a real number and x* be a real number having non-terminal