1 - MCSDSC-2.1 Linear Algebra & Statistics


Karnataka State Open University

Mukthagangotri, Mysore-570006

M.Sc. Computer science


Second Semester

LINEAR ALGEBRA & STATISTICS

COURSE – MCSDSC 2.1 BLOCK –1-4


CREDIT PAGE

Programme Name: MSc-Computer Science Year/Semester: II Semester Block No:1-4

Course Name: Linear Algebra & Statistics Credit: 4 Unit No: 1-16

Course Design Expert Committee


Dr. Sharanappa V Halase Chairman
Vice Chancellor,
Karnataka State Open University,
Mukthagangotri, Mysuru-570006.

Dr. Ashok Kamble Member


Dean (Academic),
Karnataka State Open University,
Mukthagangotri, Mysuru-570 006.

Editorial Committee
Dr. Mahesha D.M., MCA.,PhD Chairman
BOS Chairman,
Assistant Professor & Programme co-ordinator(PG)
DoS&R in Computer Science,
Karnataka State Open University, Mysuru-570 006.

Smt. Suneetha, M.Sc., (Ph.D.) Member Convener


Dept Chairperson & Programme co-ordinator (UG)
DoS&R in Computer Science,
Karnataka State Open University, Mysuru-570 006.

Dr Bhavya D.N., MTech., PhD Member


Assistant Professor & Programme co-ordinator(UG)
DoS&R in Computer Science,
Karnataka State Open University, Mysuru-570 006.

Dr. Ashoka S B., MSc.PhD Member


External Subject Expert,
Assistant Professor,
DoS&R in Computer Science,
Maharani’s Cluster University, Palace Road, Bangalore-01
Course Writers and Course Editors

Blocks 1-3 (Units 1-12):
Course Writer - Dr. B. Chaluvaraju, Assistant Professor, Dept. of Mathematics, Central College Campus, Bangalore University, Bangalore-560001.
Course Editor - Sri J. Mahendra, Guest Lecturer, DOS in Mathematics (PG), Karnataka State Open University, Mukthagangothri, Mysore-570006.

Block 4 (Units 13-16):
Course Writer - Dr. S. B. Monoli, Professor, Department of Statistics, Karnataka University, Dharwad-580003.
Course Editor - Sri Dhruvkumar B., Asst. Professor, DOS in Computer Science, JSS Women's College, Saraswathipuram, Mysore.

Copy Right
Registrar,
Karnataka State Open University,
Mukthagangotri, Mysore-570 006.

Developed by the Department of Studies and Research in Information Technology,


under the guidance of Dean (Academic), KSOU, Mysuru.
Karnataka State Open University, February-2022.
All rights reserved. No part of this work may be reproduced in any form or any other
means, without permission in writing from the Karnataka State Open University.
Further information on the Karnataka State Open University Programmes may be obtained
from the University’s Office at Mukthagangotri, Mysore – 570 006.

Printed and Published on behalf of Karnataka State Open University, Mysore-570 006 by
the Registrar (Administration)-2023
TABLE OF CONTENTS

• PREFACE
• PRELIMINARIES

BLOCK - I: Vector Spaces, Linear Transformations and Matrix


UNIT-1: VECTOR SPACES 3-20

1. 0. Objectives
1. 1. Introduction
1. 2. Vector Spaces
1. 3. Subspaces
1. 4. Linear Combinations and Systems of Linear equation
1. 5. Linear Dependence and Linear Independence
1. 6. Bases and Dimension
1. 7. Maximal Linearly Independent Subsets
1. 8. Summary
1. 9. Keywords
1. 10. Assessment Questions
1. 11. References

UNIT-2: LINEAR TRANSFORMATION AND MATRIX 21-45

2. 0. Objectives
2. 1. Introduction
2. 2. Linear Transformation
2. 3. The Matrix Representation of a Linear Transformation
2. 4. Composition of a Linear Transformation and Matrix Multiplication
2. 5. Invertibility and Isomorphism
2. 6. The Change of Coordinate Matrix
2. 7. The Dual Space
2. 8. Summary
2. 9. Keywords
2. 10. Assessment Questions
2. 11. References

UNIT-3: ELEMENTARY MATRIX OPERATION, RANK OF A MATRIX, MATRIX INVERSE

AND SYSTEM OF LINEAR EQUATION 46-60

3. 0. Objectives
3. 1. Introduction
3. 2. Elementary Matrix Operations and Elementary Matrices
3. 3. The Rank of a Matrix and Matrix Inverses
3. 4. System of linear equations
3. 5. Summary
3. 6. Keywords
3. 7. Assessment Questions
3. 8. References

UNIT- 4: PROPERTIES OF DETERMINANT, COFACTOR EXPANSIONS AND

CRAMER’S RULE 61-73

4. 0. Objectives
4. 1. Introduction
4. 2. Properties of determinant
4. 3. Cofactor Expansions
4. 4. Elementary operations and Cramer’s rule
4. 5. Summary
4. 6. Keywords
4. 7. Assessment Questions
4. 8. References

BLOCK - II: Diagonalization and Inner Product Spaces

UNIT-1: EIGENVALUES AND EIGENVECTORS, DIAGONALIZABILITY AND INVARIANT

SUBSPACES 77-89

1. 0. Objectives
1. 1. Introduction
1. 2. Eigenvalues and Eigenvectors
1. 3. Diagonalizability
1. 4. Invariant Subspaces and The Cayley-Hamilton Theorem
1. 5. Summary
1. 6. Keywords
1. 7. Assessment Questions
1. 8. References

UNIT-2: INNER PRODUCT SPACE 90-104

2. 0. Objectives
2. 1. Introduction
2. 2. Inner Products Space
2. 3. The Gram-Schmidt Orthogonalization Process
2. 4. Orthogonal Complements
2. 5. Summary
2. 6. Keywords
2. 7. Assessment Questions
2. 8. References

UNIT-3: THE ADJOINT, NORMAL, SELF-ADJOINT, UNITARY AND ORTHOGONAL

OPERATORS, ORTHOGONAL PROJECTIONS AND THE SPECTRAL THEOREM

105-120

3. 0. Objectives
3. 1. Introduction
3. 2. The Adjoint of a linear operator
3. 3. The Normal and Self-Adjoint operators
3. 4. Unitary and Orthogonal operators
3. 5. Orthogonal Projections and The Spectral Theorem
3. 6. Summary
3. 7. Keywords
3. 8. Assessment Questions
3. 9. References

UNIT- 4: BILINEAR AND QUADRATIC FORMS 121-130

4. 0. Objectives
4. 1. Introduction
4. 2. Bilinear Form
4. 3. Quadratic Form
4. 4. Summary
4. 5. Keywords
4. 6. Assessment Questions
4. 7. References

BLOCK - III: Canonical Forms


UNIT-1: THE DIAGONAL AND TRIANGULAR FORM 133-146

1. 0. Objectives
1. 1. Introduction
1. 2. The Diagonal Form
1. 3. The Triangular Form
1. 4. Summary
1. 5. Keywords
1. 6. Assessment Questions
1. 7. References

UNIT-2: JORDAN CANONICAL FORM 147-158

2. 0. Objectives
2. 1. Introduction
2. 2. The Jordan Canonical Form
2. 3. Summary
2. 4. Keywords
2. 5. Assessment Questions
2. 6. References

UNIT-3: THE MINIMAL POLYNOMIAL 159-167

3. 0. Objectives
3. 1. Introduction
3. 2. Minimal Polynomial
3. 3. Summary
3. 4. Keywords
3. 5. Assessment Questions
3. 6. References

UNIT- 4: THE RATIONAL CANONICAL FORM 168-176

4. 0. Objectives
4. 1. Introduction
4. 2. The Rational Canonical Form
4. 3. Summary
4. 4. Keywords
4. 5. Assessment Questions
4. 6. References

• GLOSSARY OF SYMBOLS
BLOCK - IV: STATISTICS
UNIT-13: Combinatorics and descriptive statistics 177-208
13.0 Objectives
13.1 Introduction
13.2 Combinatorics & permutation
13.3 Frequency Distribution
13.4 Graphical Representation of Data
13.5 Measures of Central Tendency
13.6 Moments, Skewness, Kurtosis
13.7 Summary
13.8 Keywords
13.9 Questions for self-study
13.10 References

UNIT-14: Elementary probability theory 209-228


14.0 Objectives
14.1 Introduction
14.2 Concepts of Probability
14.3 Random Variables and Probability Distributions
14.4 Theoretical Distributions
14.5 Summary
14.6 Keywords
14.7 Question for self-study
14.8 References

UNIT-15: Correlation and regression analysis 229-245

15.0 Objectives
15.1 Introduction
15.2 Regression analysis
15.3 Fitting of Second degree parabola
15.4 Inverse regression
15.5 Correlation versus Regression
15.6 Summary
15.7 Keywords
15.8 Question for self-study
15.9 References

UNIT-16: Testing of Hypothesis 246-272

16.1 Introduction
16.2 Large Sample tests
16.3 Small sample tests
16.4 Testing for population variance
16.5 Tests based on Chi-square distribution
16.6 Introduction to Monte Carlo Methods
16.7 Summary
16.8 Keywords
16.9 Question for self-study
16.10 References
PREFACE

Linear algebra is the mathematical discipline that deals with vectors and matrices and,
more generally, with vector spaces and linear transformations. Unlike other parts of
mathematics that are frequently invigorated by new ideas and unsolved problems, linear
algebra is very well understood. Linear algebra is a very useful subject, and its basic
concepts arose and were used in different areas of mathematics and its applications. It is
therefore not surprising that the subject had its roots in such diverse fields as number
theory (both elementary and algebraic), geometry, abstract algebra (groups, rings, fields,
Galois theory), analysis (differential equations, integral equations, and functional
analysis), and physics. Among the elementary concepts of linear algebra are linear
equations, matrices, determinants, linear transformations, linear independence,
dimension, bilinear forms, quadratic forms, and vector spaces. Since these concepts are
closely interconnected, several usually appear in a given context (e.g., linear equations
and matrices) and it is often impossible to disengage them. It has extensive applications
in engineering, natural sciences, computer science, and the social sciences.
Nonlinear mathematical models can often be approximated by linear ones. By 1880,
many of the basic results of linear algebra had been established, but they were not part of
a general theory. In particular, the fundamental notion of vector space, within which
such a theory would be framed, was absent. It was introduced only in 1888 by
Giuseppe Peano, an Italian mathematician also known as a founder of symbolic logic.

This study material deals with three blocks: Block-I contains vector spaces, linear
transformations and matrices; Block-II contains diagonalization and inner product spaces;
and Block-III contains canonical forms. Each block comprises four units.

0. PRELIMINARIES
This section provides a brief review of the basic concepts, definitions, examples,
etc., which will be used in the subsequent blocks. It is assumed that the reader is
familiar with elementary algebra.

Definition 0. 1. A group is a nonempty set G with a function G ×G → G denoted


(a, b) → ab, which satisfies the following axioms:

(i) (Associativity) For each a, b, c ∈ G, (ab) c = a (bc).


(ii) (Identity) There exists an element e ∈ G such that ea = a = ae for each a ∈ G.
(iii) (Inverses) For each a ∈ G, there exists an element a−1 ∈ G such that
a −1a = e = aa−1.
A group is abelian if ab = ba for all a, b ∈ G.

Examples.

1. Any vector space is an abelian group under +.


2. The set of invertible n × n real matrices is a group under matrix multiplication.

Definition 0. 2. A ring is a set R together with a function R×R → R called addition,


denoted (a, b) →a +b, and a function R × R → R called multiplication, denoted
(a, b) →ab, which satisfy the following axioms:

(i) (Commutativity of +) For each a, b ∈ R, a + b = b + a.


(ii) (Associativity) For each a, b, c ∈ R, (a + b) + c = a + (b + c) and
(ab) c = a (bc).
(iii) (+ Identity) There exists an element 0 in R such that 0 + a = a.
(iv) (+ Inverse) For each a ∈ R, there exists an element −a ∈ R such that
(−a) + a = 0.
(v) (Distributivity) For each a, b, c ∈ R, a (b + c ) = ab + ac and
(a + b) c = ac + bc.
A zero divisor in a ring R is a nonzero element a ∈ R such that there exists a nonzero
b ∈ R with ab = 0 or ba = 0.

Example. The set of integers, Z, is a ring.


Definition 0. 3. A field is a set F with at least two elements together with a function F ×
F → F called addition, denoted (a, b) → a + b, and a function F × F → F called
multiplication, denoted (a, b) → ab, which satisfy the following axioms:

(i) (Commutativity) For each a, b ∈ F, a + b = b + a and ab = ba.


(ii) (Associativity) For each a, b, c ∈ F, (a + b) + c = a + (b + c) and
(ab) c = a (bc).
(iii) (Identities) There exist two elements 0 and 1 in F such that 0 + a = a and
1a = a for each a ∈ F
(iv) (Inverses) For each a ∈ F, there exists an element −a ∈ F such that
(−a) + a = 0. For each nonzero a ∈ F, there exists an element a−1 ∈ F such that
a−1a = 1.
(v) (Distributivity) For each a, b, c ∈ F, a(b + c ) = ab + ac.
Examples.

1. The real numbers R, the complex numbers C, and the rational numbers Q are all
fields.
2. The set of integers Z, is not a field.

Definition 0. 4. An object of the form (a1, a2, . . . , an), where the entries
a1, a2, . . . , an are elements of a field F, is called an n-tuple with entries from F.

Definition 0. 5. A polynomial with coefficients from a field F is an expression of the


form p(x) = a0 + a1x + …. + anxn, where n is a nonnegative integer and each ai, called the
coefficient of xi, is in F. The set of all polynomials with coefficients from F is a vector
space, denoted by P(F) or F[x]. A polynomial p(x) in F[x] is said to be irreducible over
F if whenever p(x) = g(x)h(x) with g(x), h(x) ∈ F[x], then one of g(x) or h(x) has degree 0
(that is, a constant).

Definition 0. 6. A metric on a set X is a function d : X × X → R satisfying the following axioms:

(i) d(x, y) ≥ 0 for all x, y ∈ X
(ii) d(x, y) = 0 exactly when x = y
(iii) d(x, y) = d(y, x) for all x, y ∈ X
(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

In other words, a metric is simply a generalization of the notion of distance.

Definition 0. 7. An m × n matrix with entries from a field F is a rectangular array of the form

[ a11  a12  …  a1n ]
[ a21  a22  …  a2n ]
[  ⋮    ⋮        ⋮  ]
[ am1  am2  …  amn ],

where each entry aij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is an element of F. The set of all m × n matrices with entries from a field F is a vector space, denoted by M m×n (F).

Definition 0. 8. An n × n matrix M is called a diagonal matrix if M ij = 0 whenever


i ≠ j, that is if all its nondiagonal entries are zero.

Definition 0. 9. The transpose of an m × n matrix A = [a ij ] is the n × m matrix AT with


a ji as the element in the ith row and jth column.

The trace of an n × n matrix M, denoted tr(M), is the sum of the diagonal entries
of M. That is, tr(M) = M 11 + M 22 + . . . .+ M nn .

A symmetric matrix is a matrix A such that AT=A and a skew-symmetric matrix is


a matrix A such that AT= –A.

Definition 0. 10. For any pair of positive integers i and j, the symbol δij is defined by
δij = 0 if i ≠ j and δij = 1 if i = j. This symbol is known as the Kronecker delta.
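The matrix notions in Definitions 0.8-0.10 are easy to experiment with numerically. The following minimal sketch uses Python with numpy (an assumption of this illustration; any matrix library would serve) to show a transpose, a trace, a symmetry check, and the identity matrix, whose entries are exactly the Kronecker delta.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])          # a 2 x 3 matrix
    print(A.T)                          # transpose: the (j, i) entry of A.T equals the (i, j) entry of A

    M = np.array([[2, 7],
                  [7, 5]])
    print(np.trace(M))                  # trace: sum of the diagonal entries, here 2 + 5 = 7
    print(np.array_equal(M, M.T))       # True, so M is symmetric
    print(np.eye(3))                    # identity matrix: its (i, j) entry is the Kronecker delta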
GLOSSARY OF SYMBOLS

Symbols Meaning
Aij The ijth entry of the matrix A

A- 1 The inverse of the matrix A

A* The adjoint of the matrix A

AT The transpose of the matrix A

Tr(A) The trace of the matrix A

(A | B) The matrix A augmented by the matrix B

B(x, y) The bilinear form

B (V) The set of all bilinear forms on V

B and A The basis of B and A

B* and A* The dual basis of the basis B and A

C The field of complex numbers

C (R) The vector space of continuous functions on R

det (A) or |A| The determinant of the matrix A

δij The Kronecker delta

dim (V) The dimension of V

ei The ith standard vector of Fn

Eλ The eigenspace of T corresponding to λ

F A field

f (A) The polynomial f(x) evaluated at the matrix A

Fn The set of n - tuples with entries in a field F

f (T) The polynomial f(x) evaluated at the operator T

In or I The n x n identity matrix

IV or I The identity operator on V

Q(x) The quadratic form

Kλ The generalized eigenspace of T corresponding to λ


TA The left - multiplication transformation by matrix A

A (V) The linear transformations from vector space V to V

A (U, V) The space of linear transformations from U to V

Mm x n (F) The set of m x n matrices with entries in F

nullity(T) or n(T) The dimension of the null space of T

P (F) The space of polynomials with coefficients in F

Pn (F) The polynomials in P(F) of degree at most n

φB The standard representation with respect to basis B

R The field of real numbers

rank (A) The rank of the matrix A

rank (T) or r(T) The rank of the linear transformation T

rank (Q) or r The rank of real quadratic form of Q

range (T) The range of the linear transformation T

Span (S) The span of the set S

S⊥ The orthogonal complement of the set S

T The linear transformation

T–1 The inverse of the linear transformation T

T* The adjoint of the linear operator T

T0 The zero transformation

[T]B The matrix representation of T in basis B

[T ]B, A = [T ]BA The matrix representation of T in basis B and A

TW The restriction of T to a subspace W

V, U, Z The vector spaces

V* The dual space of the vector space V

[x]B The coordinate vector of x relative to B

W1 ⊕W2⊕ . . . . .⊕Wk The direct sum of sub spaces W1 ,W2, . . . . .,Wk

W1 +W2+ . . . . .+Wk The sum of sub spaces W1 ,W2, . . . . . ,Wk


BLOCK - I
Vector Spaces, Linear Transformations and Matrix

UNIT-1: VECTOR SPACES

STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. Vector Spaces
1. 2. 1. General Properties of Vector Space
1. 3. Subspaces
1. 3. 1. Criteria of a subspace
1. 4. Linear Combinations and Systems of Linear equation
1. 4. 1. Span
1. 5. Linear Dependence and Linear Independence
1. 5. 1. Properties of linear dependence and linear independence
1. 6. Bases and Dimension
1. 6. 1. Finite and Infinite dimensional vector space
1. 7. Maximal Linearly Independent Subsets
1. 8. Summary
1. 9. Keywords
1. 10. Assessment Questions
1. 11. References

UNIT-1: VECTOR SPACES

1. 0. Objectives

After studying this unit you will be able to:


• Define a vector space and know its general properties.
• Solve problems by using the criteria for a subspace.
• Describe systems of linear equations.
• Distinguish between linearly dependent and linearly independent sets.
• Define a basis and identify finite and infinite dimensional vector spaces.
• Explain why a maximal linearly independent subset is a basis.

1. 1. Introduction

A vector space is a mathematical structure formed by a collection of vectors:


objects that may be added together and multiplied ("scaled") by numbers, called scalars
in this unit. Scalars are often taken to be real numbers, but one may also consider vector
spaces with scalar multiplication by complex numbers, rational numbers, or even more
general fields instead. The operations of vector addition and scalar multiplication have to
satisfy certain requirements, called axioms.
This unit mainly deals with standard properties of Vectors, Subspaces, Linear
combinations and systems of linear equation, Linearly dependence and linearly
independence, Bases and Dimension, and Maximal linearly independent subsets along
with some illustrated examples and theorems.

1. 2. Vector Spaces

Definition 1. 2. 1. A vector space V over a field F is an abelian group with a scalar


product a⋅v or av defined for all a∈F and all v∈V satisfying the following axioms.

(i) a(bv) = (ab)v
(ii) (a +b)v = av + bv
(iii) a(u+v)= au + av
(iv) 1v = v, where a, b∈F and u, v∈V.
Note.
1. The elements of V are called vectors, the elements of F are called scalars.
2. To differentiate between the scalar zero and the vector zero (null vector), we will
write them as 0 and 0 (the latter set in boldface in print), respectively.

Let us examine some examples of vector spaces.


Example -1. The n-tuples of real numbers, denoted by (u 1, . . . ,u n ), form a vector space
over R. Given vectors u = (u 1, . . . ,u n ) and v = (v 1, . . . ,v n ) in Rn and a in R , we can
define vector addition by u + v = (u 1, . . . ,u n ) + (v 1, . . ,v n ) = (u 1 + v 1, . . . , u n + v n ) and
scalar multiplication by au =a(u 1, . . . ,u n ) = (au 1, . . . ,au n ).

Example - 2. If F is a field, then F[x] is a vector space over F.


The vectors in F[x] are simply polynomials. Vector addition is just polynomial addition.
If a∈F and p(x)∈F[x], then scalar multiplication is defined by ap(x).
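Both examples can be tried out directly on a computer. The short sketch below assumes Python with numpy (numpy.polynomial.Polynomial is used only as a convenient container for coefficients) and illustrates componentwise addition and scalar multiplication in R3 together with the corresponding operations in F[x].

    import numpy as np
    from numpy.polynomial import Polynomial

    u = np.array([1.0, 2.0, 3.0])       # a vector in R^3
    v = np.array([4.0, 5.0, 6.0])
    a = 2.5
    print(u + v)                          # componentwise vector addition: [5. 7. 9.]
    print(a * u)                          # scalar multiplication: [2.5 5. 7.5]

    p = Polynomial([1, 0, 2])             # 1 + 2x^2, a vector in R[x]
    q = Polynomial([3, 1])                # 3 + x
    print(p + q)                          # polynomial (vector) addition
    print(3 * p)                          # scalar multiplication of a polynomial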

1. 2. 1. General Properties of Vector Space

Theorem 1. 2. 1. Let V be a vector space over a field F and 0 be the zero vector in V.
Then each of the following statements is true.

(i) 0v = 0 for all v∈V.


(ii) a 0 = 0 for all a∈F.
(iii) If av = 0, then either a = 0 or v = 0.
(iv) (−1)v = −v for all v∈V.
(v) −(av) = (−a)v = a(−v) for all a∈F and v∈V.

Proof. To prove (i), observe that 0v = (0 + 0)v = 0v + 0v; consequently, 0 + 0v = 0v + 0v.


Since V is an abelian group, 0 = 0v. The proof of (ii) is almost identical to the proof of
(i). For (iii), we are done if a = 0. Suppose that a ≠ 0. Multiplying both sides of
av = 0 by 1/a, we have v = 0. To show (iv), observe that v + (−1)v = 1v + (−1)v = (1−1)v =
0v = 0, and so (−1)v = −v. The proof of (v) is straightforward.

1. 3. Subspace

Definition 1. 3. 1. Let V be a vector space over a field F, and W a subset of V. Then W


is a subspace of V if it is closed under vector addition and scalar multiplication; that is, if
u, v ∈ W and a∈F, it will always be the case that u + v and av are also in W.

1. 3. 1. Criteria of a subspace
1. A nonempty subset W of a vector space V over a field F is a subspace of V over
F if and only if W is closed under addition and scalar multiplication of V.
2. A subset W of a vector space V over a field F is a subspace of V over F if and
only if
(i) u ∈ W, v ∈ W ⇒ u−v ∈ W.
(ii) u ∈ W, a∈F ⇒ au ∈ W.
3. A subset W of a vector space V over a field F is a subspace of V over F if and
only if u, v ∈ W and a, b∈F ⇒ au +bv ∈ W

Using the above criteria, we illustrate the following examples


Illustrative Example - 1. Let W be the subset of R3 defined by
W = {(x1, 2x1 + x2, x1 − x2) : x1, x2 ∈ R}.
Then show that W is a subspace of R3.
Solution. Since a(x 1 , 2x 1 + x 2 , x 1 − x 2 ) = (ax 1 , a(2x 1 + x 2 ), a(x 1 − x 2 ))
= (ax 1 , a(2x 1 ) + ax 2 , ax 1 − ax 2 ),
Therefore, W is closed under scalar multiplication.
To show that W is closed under vector addition,
Let u = (x 1 , 2x 1 + x 2 , x 1 − x 2 ) and v = (y 1 , 2y 1 + y 2 , y 1 − y 2 ) be vectors in W.
Then u + v = (x1 + y1, 2(x1 + y1) + (x2 + y2), (x1 + y1) − (x2 + y2)), which again has the required form, so u + v ∈ W.
Hence W is a subspace of R3.

Illustrative Example - 2. Let R be the field of real numbers and S be the set of all
solutions of the equations x + y + 2z = 0. Show that S is a subspace of R3.
Solution. We have, S = {(x, y, z): x + y + 2z = 0, x, y, z ∈ R}.
Clearly, 1× 0 + 1× 0 + 2 × 0 = 0 . So, (0, 0, 0) satisfies the equation x + y + 2z = 0
Therefore, (0, 0, 0)∈S
⇒ S is non- empty subset of R3.
Let u = (x 1 , y 1 ,z 1 ) and v = (x 2 , y 2 , z 2 ) be any two elements of S.
Then x 1 + y 1 + 2z 1 = 0 and x 2 + y 2 + 2z 2 = 0
Let a, b be any two elements of R. Then,
au + bv = a(x 1 , y 1 ,z 1 ) + b(x 2 , y 2 , z 2 )
⇒ au + bv = (ax 1 +bx 2 , ay 1 +by 2 , az 1 + bz 2 )
Now (ax1 + bx2) + (ay1 + by2) + 2(az1 + bz2) = a(x1 + y1 + 2z1) + b(x2 + y2 + 2z2)
= a × 0 + b × 0 = 0.
Therefore, au + bv = (ax 1 +bx 2 , ay 1 +by 2 , az 1 + bz 2 ) ∈ S
Thus au + bv ∈ S for all u, v ∈ S and a, b ∈ R
Hence, S is a subspace of R3.
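A small numerical spot check of this closure argument can be written as follows (a sketch assuming Python with numpy; the helper name in_S is introduced here only for illustration). It does not replace the proof, but it shows the subspace property in action.

    import numpy as np

    def in_S(w, tol=1e-12):
        # w = (x, y, z) belongs to S exactly when x + y + 2z = 0
        x, y, z = w
        return abs(x + y + 2*z) < tol

    u = np.array([1.0, 1.0, -1.0])    # 1 + 1 + 2(-1) = 0, so u is in S
    v = np.array([2.0, -4.0, 1.0])    # 2 - 4 + 2(1) = 0, so v is in S
    a, b = 3.0, -5.0
    print(in_S(u), in_S(v), in_S(a*u + b*v))   # True True True: au + bv remains in S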

Illustrative Example - 3. Let V be the vector space of all real valued continuous
functions over the field R of all real numbers. Show that the set S of solutions of the

differential equation 2 d2y/dx2 − 9 dy/dx + 2y = 0 is a subspace of V.
Solution. We have S = { y : 2 d2y/dx2 − 9 dy/dx + 2y = 0 }, where y = f(x).
Clearly, y = 0 satisfies the given differential equation.
Therefore, 0 ∈ S ⇒ S ≠ φ.
Let y1, y2 ∈ S. Then y1, y2 are solutions of 2 d2y/dx2 − 9 dy/dx + 2y = 0, that is,
2 d2y1/dx2 − 9 dy1/dx + 2y1 = 0 and 2 d2y2/dx2 − 9 dy2/dx + 2y2 = 0.
Let a, b ∈ R and y = ay1 + by2. Then
2 d2y/dx2 − 9 dy/dx + 2y = 2 d2(ay1 + by2)/dx2 − 9 d(ay1 + by2)/dx + 2(ay1 + by2)
= a(2 d2y1/dx2 − 9 dy1/dx + 2y1) + b(2 d2y2/dx2 − 9 dy2/dx + 2y2)
= a × 0 + b × 0 = 0.
Therefore, y = ay 1 + by 2 ∈ S for all y 1 , y 2 ∈ S and a, b ∈ R.
Hence, S is a subspace of V.

1. 4. Linear combinations and systems of linear equation

Definition 1. 4. 1. Let V be a vector space and S a nonempty subset of V. A vector


v ∈V is called a linear combination of vectors of S if there exist a finite number
of vectors u 1, u 2 ,…..,u n in S and scalars a 1 ,a 2 ,……,a n in F such that
v = a 1 u 1 + a 2 u 2 + a 3 u 3 + ………… + a n u n .
In this case we also say that v is a linear combination of u 1 ,u 2 , ….u n and call a 1 ,
a 2 ,……,a n the coefficients of the linear combination.

Note. In any vector space V, 0v = 0 for each v ∈ V. Thus the zero vector is a linear
combination of any nonempty subset of V.

1. 4. 1. Span
Definition 1. 4. 2. Let S be a nonempty subset of a vector space V. The span of S,
denoted span(S), is the set consisting of all linear combinations of the vectors in S.

Note.
1. For convenience, we define span ( φ ) = {0}.
2. In R3, for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors
in R3 that have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b.
Thus the span of {(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In
this case, the span of the set is a subspace of R3. This fact is true in general.

Theorem 1. 4. 1. The span of any subset S of a vector space V is a subspace of V.


Moreover, any subspace of V that contains S must also contain the span of S.
Proof. This result is immediate if S = φ because span ( φ ) = {0}, which is a subspace
that is contained in any subspace of V.
If S ≠ φ , then S contains a vector z. So 0z = 0 is in span(S). Let x, y ∈ span(S). Then
there exist vectors u 1 , u 2 , ….u m , v 1 , v 2 , ….v n in S and scalars a 1 , a 2 , …,a m , b 1 , b 2 ,….b n
such that x = a 1 u 1 + a 2 u 2 + a 3 u 3 + …… + a m u m and y = b 1 v 1 + b 2 v 2 + b 3 v 3 + ……
+ bn vn.
Therefore, x + y = a 1 u 1 + a 2 u 2 + ……+ a m u m + b 1 v 1 + b 2 v 2 + …… + b n v n and, for
any scalar c, cx = (ca 1 )u 1 + (ca 2 )u 2 + (ca 3 )u 3 + ………+(ca m )u m are clearly linear
combinations of the vectors in S, so x + y and cx are in span (S).
Thus span (S) is a subspace of V.

Definition 1. 4. 3. A subset S of a vector space V generates (or spans) V if span (S) = V.
In this case, we also say that the vectors of S generate (or span) V.

Example - 1. The vectors (1, 1, 0), (1, 0, 1) and (0, 1, 1) generate R3 since an arbitrary
vector (a 1 , a 2 , a 3 ) in R3 is a linear combination of the three given vectors, in fact, the
scalars r, s and t for which r(1, 1, 0) + s(1, 0, 1) + t(0, 1, 1) = (a1, a2, a3) are r = ½ (a1
+ a2 – a3), s = ½ (a1 – a2 + a3), and t = ½ (–a1 + a2 + a3).

Example - 2. The polynomials x2 + 3x – 2, 2x2 + 5x – 3, and –x2 –4x + 4 generate


P 2 (R) since each of the three given polynomials belongs to P 2 (R) and each polynomial
ax2 + bx + c in P 2 (R) is a linear combination of these three, namely,
(– 8a + 5b + 3c) (x2 + 3x – 2)
+ (4a – 2b – c) (2x2 + 5x – 3)
+ (– a + b + c) (– x2 – 4x + 4) = (ax2 + bx + c).

Definition 1. 4. 4. A collection of linear equations of the form
a11x1 + a12x2 + . . . . . . + a1nxn = b1
a21x1 + a22x2 + . . . . . . + a2nxn = b2
:
am1x1 + am2x2 + . . . . . . + amnxn = bm
where the aij and bi (1 ≤ i ≤ m and 1 ≤ j ≤ n) are scalars in a field F and x1, x2, x3, . . . , xn
are n variables taking values in F, is called a system of m linear equations in n unknowns
over F.

Illustrative Example - 4. Express v = (1, –2, 5) in R3 as a linear combination of the


following vectors: v 1 = (1, 1, 1), v 2 = (1, 2, 3), v 3 = (2, –1, 1).
Solution. Let x, y, z be scalars in R such that v = xv 1 +yv 2 +zv 3
⇒ (1, –2, 5) = x(1,1,1) + y(1,2,3) + z (2, –1, 1)
⇒ (1, –2, 5) = (x + y +2z, x +2y – z, x + 3y + z)
⇒ x + y + 2z = 1, x +2y – z = –2, x + 3y + z = 5.
This system of equations can be written in matrix form as follows:
[ 1 1  2 ] [x]   [ 1]
[ 1 2 −1 ] [y] = [−2]
[ 1 3  1 ] [z]   [ 5]
Applying R2 → R2 – R1 and R3 → R3 – R1, this is equivalent to
[ 1 1  2 ] [x]   [ 1]
[ 0 1 −3 ] [y] = [−3]
[ 0 2 −1 ] [z]   [ 4]
and applying R3 → R3 – 2R2 gives
[ 1 1  2 ] [x]   [ 1]
[ 0 1 −3 ] [y] = [−3]
[ 0 0  5 ] [z]   [10]

This is equivalent to x + y + 2z = 1, y – 3z = –3, 5z = 10


⇒ x = –6, y = 3, z = 2.
Hence, v = –6v 1 + 3v 2 +2v 3 .
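The same coefficients can be obtained by solving the linear system numerically. A minimal sketch, assuming Python with numpy, is given below; the columns of the coefficient matrix are v1, v2, v3.

    import numpy as np

    A = np.array([[1, 1, 2],
                  [1, 2, -1],
                  [1, 3, 1]], dtype=float)   # columns are v1, v2, v3
    v = np.array([1, -2, 5], dtype=float)
    coeffs = np.linalg.solve(A, v)
    print(coeffs)                             # [-6.  3.  2.], i.e. v = -6 v1 + 3 v2 + 2 v3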

1. 5. Linear Dependence and Linear Independence

Definition 1. 5. 1. A subset S of a vector space V is called linearly dependent if there
exists a finite number of distinct vectors u 1 , u 2 , u 3 ,……,u n in S and scalars a 1 , a 2 , a 3 ,
……,a n , not all zero, such that a 1 u 1 + a 2 u 2 + a 3 u 3 + ……+ a n u n = 0. In this case we
also say that the vectors of S are linearly dependent.

Note. For any vectors u 1 , u 2 , u 3 , ……,u n , we have a 1 u 1 + a 2 u 2 + a 3 u 3 + …… + a n


u n = 0 if a 1 = a 2 = a 3 = ……… = a n = 0. We call this the trivial representation of 0 as a
linear combination of u 1 , u 2 , u 3 , ……, u n. Thus, for a set to be linearly dependent, there
must exist a nontrivial representation of 0 as a linear combination of vectors in the set.

Illustrative Example - 1. If the set S = {(1, 3, –4, 2), (2, 2, –4, 0), (1, –3, 2, –4), (–1,
0, 1, 0)} in R4, then show that S is linearly dependent and express one of the vectors in S
as a linear combination of the other vectors in S.
Solution. To show S is linearly dependent, we must find scalars a 1 , a 2 , a 3 and a 4 , not all
zero, such that a 1 (1, 3,–4, 2) + a 2 (2, 2,–4, 0) + a 3 (1,–3, 2,–4) + a 4 (–1, 0, 1, 0) = 0.
Finding such scalars amounts to finding a nonzero solution to the system of linear equations:
a1 + 2a2 + a3 – a4 = 0
3a1 + 2a2 – 3a3 = 0
–4a1 – 4a2 + 2a3 + a4 = 0
2a1 – 4a3 = 0
One such solution is a 1 = 4, a 2 = –3, a 3 = 2, and a 4 = 0. Thus S is a linearly dependent
subset of R4, and 4 (1, 3, –4, 2) – 3( 2, 2, –4, 0) + 2(1, –3, 2, –4) + 0(–1, 0, 1, 0) = 0.
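Such a nontrivial solution can also be found numerically. In the sketch below (assuming Python with numpy), the four vectors of S are taken as the columns of a matrix and a null-space vector is read off from the singular value decomposition; up to scaling it reproduces the relation (4, −3, 2, 0) found above.

    import numpy as np

    A = np.array([[ 1,  2,  1, -1],
                  [ 3,  2, -3,  0],
                  [-4, -4,  2,  1],
                  [ 2,  0, -4,  0]], dtype=float)   # columns are the vectors of S
    _, s, Vt = np.linalg.svd(A)
    a = Vt[-1]                     # right singular vector for the smallest singular value
    print(s[-1])                   # numerically zero, so the columns are dependent
    print(a / a[0])                # approximately [1. -0.75  0.5  0.], a multiple of (4, -3, 2, 0)
    print(np.allclose(A @ a, 0))   # True: a genuine dependence relation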

Illustrative Example - 2. Let v 1, v 2, v 3, be vectors in a vector space V(F), and


let λ 1 , λ 2 ∈ F. Show that the set { v 1, v 2, v 3 } is linearly dependent, if the set
{ v 1 + λ 1 v 2 + λ 2 v 3 , v 2, v 3 } is linearly dependent.
Solution. If the set {v 1 + λ 1 v 2 + λ 2 v 3 , v 2, v 3 } is linearly dependent, then there exist
scalars λ, µ, γ ∈F (not all zero) such that λ (v 1 + λ 1 v 2 + λ 2 v 3 ) + µ v 2 + γ v 3 = 0 V
⇒ λ v 1 + (λ λ 1 + µ) v 2 + (λ λ 2 + γ )v 3 = 0 V → (1)
The set {v 1, v 2, v 3 } will be linearly dependent if in (1) at least one of the scalar
coefficients is non-zero. If λ ≠ 0, then the set will be linearly dependent whatever may be

the values of µ and γ. But, if λ = 0, then at least one of µ and γ should not be equal to
zero and hence at least one of λ λ 1 + µ and λ λ 2 + γ will not be zero.
Since λ = 0 ⇒ λ λ 1 + µ = µ and λ λ 2 + γ = γ. Hence, from (1), we find that the scalars
λ, λ λ 1 + µ, λ λ 2 + γ are not all zero. Consequently, the set {v 1, v 2, v 3 } is linearly
dependent.

Definition 1. 5. 2. A subset S of a vector space that is not linearly dependent is called


linearly independent. As before, we also say that the vectors of S are linearly
independent.

Properties of linearly independent and linearly dependent sets.


The following properties are true in any vector space.
1. The empty set is linearly independent, for linearly dependent sets must be
nonempty.
2. A set consisting of a single nonzero vector is linearly independent. For if {u} is
linearly dependent, then au = 0 for some nonzero scalar a.
Thus, u = a–1(au) = a–1 0 = 0, a contradiction.
3. A set is linearly independent if and only if the only representations of 0 as linear
combinations of its vectors are trivial representations.

Property 3 provides a useful method for determining whether a finite set is
linearly independent. This technique is illustrated in the following examples.
Illustrative Example - 3. Prove that the set S = {(1, 0, 0, –1), (0, 1, 0, –1),
(0, 0, 1, –1), (0, 0, 0, 1)} is linearly independent.
Solution. We must show that the only linear combination of vectors in S that equals the
zero vector is the one in which all the coefficients are zero.
Suppose that a 1 , a 2 , a 3 and a 4 are scalars such that
a 1 (1, 0, 0, –1) + a 2 (0, 1, 0, –1) + a 3 (0, 0, 1, –1) + a 4 (0, 0, 0, 1) = (0, 0, 0, 0).
Equating the corresponding coordinates of the vectors on the left and the right sides of
this equation, we obtain the following system of linear equations.
a1 = 0
a2 = 0
a3 = 0
– a1 – a2 – a3 + a4 = 0
Clearly the only solution to this system is a 1 = a 2 = a 3 = a 4 = 0, and so S is linearly
independent.
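Independence of a finite set of vectors in Fn can also be checked mechanically: stack the vectors as the rows (or columns) of a matrix and compare its rank with the number of vectors. A brief sketch, assuming Python with numpy, follows.

    import numpy as np

    S = np.array([[1, 0, 0, -1],
                  [0, 1, 0, -1],
                  [0, 0, 1, -1],
                  [0, 0, 0,  1]], dtype=float)   # rows are the vectors of S
    print(np.linalg.matrix_rank(S))               # 4 = number of vectors, so S is independent
    print(np.linalg.det(S))                       # nonzero determinant (here 1.0) confirms it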

Illustrative Example - 4. Suppose the vectors u, v, w are linearly independent vectors


in the vector space V over a field F. Show that the vectors u + v, u – v, u – 2v +w are
also linearly independent.
Solution. Let λ 1 , λ 2 , λ 3 be scalars such that
λ 1 (u + v) + λ 2 (u – v) + λ 3 (u – 2v + w) = 0 V
⇒ (λ 1 + λ 2 + λ 3 ) u + (λ 1 – λ 2 – 2 λ 3 )v + λ 3 w = 0 V
⇒ λ 1 + λ 2 + λ 3 = 0 , λ 1 – λ 2 – 2 λ 3 = 0, 0λ 1 + 0λ 2 + λ 3 = 0 , since u, v, w are linearly
independent.
The coefficient matrix A of the above system of equations is given by
1 1 1
A = 1 − 1 − 2 ⇒ A = −2 ≠ 0
0 0 1 

So, the above system of equations has only trivial solution λ 1 = λ 2 = λ 3 = 0.


Thus, u + v, u – v, u – 2v +w are linearly independent vectors.

1. 6. Bases and Dimension

Definition 1. 6. 1. A basis A for a vector space V is a linearly independent subset

of V that generates V. If A is a basis for V, we also say that the vectors of A form a
basis for V.

Example - 1. Recalling that span ( φ ) = {0} and φ is linearly independent, we see that
φ is a basis for the zero vector space.

Example - 2. In M m×n (F), let Eij denote the matrix whose only nonzero entry is a 1 in the
ith row and jth column. Then {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for M m×n (F).

Definition 1. 6. 2. In F n , let e 1 = (1, 0, 0, ……,0), e 2 = (0, 1, 0……,0) , …….,
e n = (0,0,……,0,1); { e 1, e 2, e 3 , ……,e n } is readily seen to be a basis for F n and is called
the standard basis for F n and is denoted by ε n .

Example - 3. In P n (F), the set {1, x, x2,……, xn} is a basis. This is the standard basis for
P n (F).

Theorem 1. 6. 1. Let V be a vector space and A = {u1, u2,……, un} be a subset of V. Then A is a basis for V if and only if each v ∈ V can be uniquely expressed as a linear combination of vectors of A, that is, can be expressed in the form v = a1u1 + a2u2 + a3u3 + …… + anun for unique scalars a1, a2, a3, …, an.
Proof. Let A be a basis for V. If v ∈ V, then v ∈ span (A) because span (A) = V. Thus
v is a linear combination of the vectors of A. Suppose that
v = a 1 u 1 + a 2 u 2 + a 3 u 3 + ………… + a n u n and
v = b 1 u 1 + b 2 u 2 + b 3 u 3 + ………… + b n u n
are two such representations of v. Subtracting the second equation from the first gives
0 = (a 1 – b 1 )u 1 + (a 2 – b 2 )u 2 + (a 3 – b 3 )u 3 + …………….+ (a n – b n )u n .
Since A is linearly independent, it follows that
(a 1 – b 1 ) = (a 2 – b 2 ) = (a 3 – b 3 ) = …………..(a n – b n ) = 0.
Hence a 1 = b 1 , a 2 = b 2 , a 3 = b 3 ,………., a n = b n , so v is uniquely expressible as a linear
combination of the vectors of the basis A.

1. 6. 1. Finite and Infinite dimensional vector space

Definition 1. 6. 3. A vector space is called finite-dimensional if it has a basis consisting


of a finite number of vectors. The unique number of vectors in each basis for V is called
the dimension of V and is denoted by dim(V). A vector space that is not finite-
dimensional is called infinite-dimensional.

Examples - 4.
(i) The vector space {0} has dimension zero.
(ii) The vector space F n has dimension n.
(iii) The vector space M m × n (F) has dimension mn.
(iv) The vector space P n (F) has dimension (n+1).

The following examples show that the dimension of a vector space depends on its field of
scalars.
Examples - 5.
(i) Over the field of complex numbers C, the vector space of complex numbers has
dimension 1. (A basis is {1}.)
(ii) Over the field of real numbers R, the vector space of complex numbers has
dimension 2. (A basis is {1, i}).
Note.
1. In a finite dimensional vector space V with basis A = {u1, u2,……, un},
every vector u ∈ V is uniquely expressible as a linear combination of the vectors in
A, that is, u = a1u1 + a2u2 + …… + anun for unique scalars ai ∈ F.
2. There exists a basis for each finite dimensional vector space.
3. Let V be a finite dimensional vector space which is spanned by a finite set S of
vectors u 1 , u 2 ,……,u m . Then any linearly independent set of vectors in V
contains not more than m-elements.
4. Let V be finite dimensional vector space over a field F. Then any two basis of V
have same number of elements.
5. Let V be finite dimensional vector space over a field F, and let W be a subspace
of V. Then dim (V/W) = dim (V) – dim (W).
6. If A and B are two subspaces of a finite dimensional vector space V over a field
F, then dim (A+B) = dim (A) + dim (B) – dim (A∩B).

1. 7. Maximal Linearly Independent subsets

Definition 1. 7. 1. Let F be a family of sets. A member M of F is called maximal (with


respect to set inclusion) if M is contained in no member of F other than M itself.

Example- 1. Let F be the family of all subsets of a nonempty set S. (This family F is
called the power set of S.) The set S is easily seen to be a maximal element of F.

Example -2. Let S and T be disjoint nonempty sets, and let F be the union of their power
sets. Then S and T are both maximal elements of F.

Definition 1. 7. 2. Let S be a subset of a vector space V. A maximal linearly


independent subset of S is a subset H of S satisfying both of the following conditions.
(i) H is linearly independent.
(ii) The only linearly independent subset of S that contains H is H itself.

Note. A basis A for a vector space V is a maximal linearly independent subset of V,


because
1. A is linearly independent by definition.

2. If v ∈ V and v ∉ A, then A ∪ {v} is linearly dependent. Indeed, if S is a linearly
independent subset of a vector space V and v is a vector in V that is not in S, then
S ∪ {v} is linearly dependent if and only if v ∈ span (S); here span (A) = V, so
the condition v ∈ span (A) holds.

Our next result shows that the converse of this statement is also true.
Theorem 1. 7. 1. Let V be a vector space and S a subset that generates V. If A is a

maximal linearly independent subset of S, then A is a basis for V.

Proof. Let A be a maximal linearly independent subset of S. Because A is
linearly independent, it suffices to prove that A generates V.

We claim that S ⊆ span (A), for otherwise there exists a v ∈ S such that v ∉ span(A).
Since A is a linearly independent subset of V and v is a vector in V that is not in A,
the set A ∪ {v} is linearly dependent if and only if v ∈ span (A); as v ∉ span (A), this
implies that A ∪ {v} is a linearly independent subset of S, and we have contradicted
the maximality of A.

Therefore, S ⊆ span (A). By Theorem 1. 4. 1, span (A) is a subspace of V containing S,
so it must also contain span (S) = V. That is, span (A) = V.
Theorem 1. 7. 2. Let S be a linearly independent subset of a vector space V. There
exists a maximal linearly independent subset of V that contains S.
Proof. Let F denote the family of all linearly independent subsets of V that contain S.
In order to show that F contains a maximal element, we must show that if C is a chain in
F, then there exists a member U of F that contains each member of C.
We claim that U, the union of the members of C, is the desired set. Clearly U contains
each member of C, and so it suffices to prove that U ∈ F (That is U is a linearly
independent subset of V that contains S). Because each member of C is a subset of V
containing S, we have S ⊆ U ⊆ V. Thus we need only prove that U is linearly
independent.
Let u 1 , u 2 , u 3 , . . . . ,u n be in U and a 1 , a 2 , a 3 , . . . .,a n be scalars such that a 1 u 1 + a 2 u 2
+ . . . . . . + a n u n. = 0. Because u i ∈ U for i = 1, 2, 3……..n, there exists a set A i in C
such that u i ∈ A i . But since C is a chain, one of these sets, say A k , contains all the others.
Thus ui ∈ Ak for i = 1, 2, 3, ……, n. However, Ak is a linearly independent set; so a1u1 +
a2u2 + a3u3 + …… + anun = 0 implies that a1 = a2 = …… = an = 0. Hence U is linearly
independent, so U ∈ F, and by the maximal principle F has a maximal element. This
element is easily seen to be a maximal linearly independent subset of V that contains S.

1. 8. Summary

The following are the important points contained in this unit.

1. A general definition of vector space along with examples and establish its basic
properties. Vector space, a set of multidimensional quantities, known as vectors,
together with a set of one-dimensional quantities, known as scalars, such that
vectors can be added together and vectors can be multiplied by scalars while
preserving the ordinary arithmetic properties (associativity, commutativity,
distributivity, and so forth).
2. A linear subspace is usually called simply a subspace when the context serves to
distinguish it from other kinds of subspaces.

3. A linear combination is an expression constructed from a set of terms by
multiplying each term by a constant and adding the results. The concept of linear
combinations is central to linear algebra and a system of linear equations (or linear
system) is a collection of linear equations involving the same set of variables.
• The span of any subset S of a vector space V is a subspace of V.
4. A family of vectors is linearly independent if none of them can be written as a linear
combination of finitely many other vectors in the collection. A family of vectors
which is not linearly independent is called linearly dependent.
5. Every vector space has a basis, and all bases of a vector space have the same
number of elements, called the dimension of the vector space. In this unit, we also study
the dimension theorem, which plays a very vital role in subsequent units.
6. Finally, the maximal principle guarantees the existence of maximal elements in a
family of sets; with the help of this property we study maximal linearly
independent subsets.
• Let V be a vector space and S a subset that generates V. If A is a maximal
linearly independent subset of S, then A is a basis for V.
• Let S be a linearly independent subset of a vector space V. There exists a
maximal linearly independent subset of V that contains S.

1. 9. Keywords
Basis, Dimension, Dimension theorem, Finite-dimensional space, Linear combination,
Linearly dependent, Linearly independent, Polynomial, Scalar, Scalar multiplication,
Span of a subset, Spans, Standard basis, Subspace, System of linear equations, Vector,
Vector addition, Vector space.

1. 10. Assessment Questions

1. Let V be a vector space over a field F. Prove that


(i) the additive identity 0 is unique.
(ii) the additive inverse –v is unique.

Hint. See the general properties of vector space.

2. Let S be the set of all elements of the form (x + 2y, y, –x + 3y) in R3, where
x, y ∈ R. Show that S is a subspace of R3.

(Hint. Let u, v ∈S. Then, u = (x 1 + 2y 1 , y 1 , –x 1 +3y 1 ), v = (x 2 + 2y 2 , y 2 , –x 2


+3y 2 )
where x1, y1, x2, y2 ∈ R. Then show that au + bv = (α + 2β, β, –α + 3β) ∈ S, where α = ax1 + bx2
and β = ay1 + by2.)

3. Express the polynomial f(x) = x2+ 4x –3 in the vector space V of all polynomials
over R as a linear combination of the polynomials g(x) = x2 – 2x + 5,
h(x) = 2x2 – 3x and φ(x) = x +3.
Answer. f(x) = –3g(x) +2h(x) +4 φ(x).

4. Express v = (2, –5, 3) in R3 as a linear combination of the vector:


v 1 = (1, –3, 2), v 2 = (2, –4, –1), v 3 = (1, –5, 7).
Answer.
1 2 1   x  2 
    
0 2 −2   y  =  1 
0 0 0   z  3 2 

This is an inconsistent system of equations and so has no solution. Hence, v


cannot be written as a linear combination of v 1 , v 2 and v 3 .

5. Verify whether given set of vectors (0, 1, 0, 1, 1), (0, 1, 0, 1, 1), (0, 1, 0, 1, 1) and
(0, 1, 0, 1, 1) are linearly independent in R5.

Answer. b = 0, d = 0, a = 1 and c = –1. This shows the set of vectors are
Linearly dependent.

1. 11. References

1. Gilbert Strang – Linear Algebra and its Applications, Academic Press, New York,
1976.
2. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
3. N. S. Kumeresan – Linear Algebra, A Geometric approach, Prentice Hall India, 2000.
4. Michael Artin – Algebra, Prentice Hall India, New Delhi, 2007.

UNIT-2: LINEAR TRANSFORMATION AND MATRIX

STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. Linear Transformation,
2. 2. 1. Null Spaces and Ranges
2. 2. 2. Dimension Theorem
2. 3. The Matrix Representation of a Linear Transformation
2. 4. Composition of a Linear Transformation and Matrix Multiplication
2. 4. 1. The relation between multiplication of linear transformation and matrix
2. 5. Invertibility and Isomorphism
2. 5. 1. The Algebra of linear transformation
2. 5. 2. Invertible and Non-invertible
2. 5. 3. Isomorphism
2. 6. The Change of Coordinate Matrix
2. 6. 1. Transition matrix
2. 7. The Dual Space
2. 7. 1. Linear functional
2. 8. Summary
2. 9. Keywords
2. 10. Assessment Questions
2. 11. References

UNIT-2: LINEAR TRANSFORMATION AND MATRIX

2. 0. Objectives

After going through this unit, you will be able to:


• Recognize maps that respect vector addition and scalar multiplication, which are exactly the
linear transformations.
• Find the null space, range, nullity, kernel and rank of a given linear transformation.
• Explain the dimension theorem.
• Express a linear transformation by a matrix.
• Describe the product (composition) of linear transformations.
• Relate multiplication of linear transformations to multiplication of matrices.
• Define the change of coordinate matrix.
• Illustrate the concepts of dual basis and dual space.

2. 1. Introduction

A linear map, linear mapping, linear transformation, or linear operator (in some
contexts also called linear function) is a function between two vector spaces that
preserves the operations of vector addition and scalar multiplication. As a result, it
always maps straight lines to straight lines or 0. A. Cayley formally introduced m × n
matrices in two papers in 1850 and 1858 (the term “matrix” was coined by Sylvester in
1850). He noted that they “comport themselves as single entities” and recognized their
usefulness in simplifying systems of linear equations and composition of linear
transformations.

2. 2. Linear Transformation

Definition 2. 2. 1. Let U and V be vector space over field F. Then a map T: U→V is a
linear transformation (or a linear map) if T (u+v) = T(u) + T(v) and T(av) = a T (v).
Two conditions together can be written in the form T (au+bv) = aT(u) + bT(v) for
all a, b∈F and u, v∈U.
Note.
1. If T is a linear map, then T(0) = 0.
2. T is a linear map if and only if T(au + v) = aT(u) + T(v) for all a ∈ F and u, v ∈ U.
3. If T is a linear map, then T(u – v) = T(u) – T(v) for all u, v ∈ U.
4. T is a linear map if and only if, for u1, u2, . . . , un ∈ U and a1, a2, . . . , an ∈ F,
we have T(a1u1 + a2u2 + . . . + anun) = a1T(u1) + a2T(u2) + . . . + anT(un).

The following examples provide some more detailed illustrations concerning


linear transformations.

Illustrative Example - 1. If the mapping T: R2→R2 defined by T(x, y) = (4x+5y, 6x–y),


then show that T is linear transformation.
Solution. Let a, b∈ R and u, v∈ R2, where u = (u 1 , u 2 ) and v = (v 1 , v 2 ).
Then, T(au+ bv) = T(au 1 + bv 1 , au 2 + bv 2 )
= (4(au1 + bv1) + 5(au2 + bv2), 6(au1 + bv1) – (au2 + bv2))
= (4au 1 + 5au 2 , 6au 1 – au 2 ) + (4bv 1 +5 bv 2 , 6bv 1 – bv 2 )
= aT(u) + bT(v).
Thus, T is a linear transformation of R2 into R2.
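A quick numerical spot check of this linearity (not a proof, only a sanity check on randomly chosen vectors and scalars) can be written as follows, assuming Python with numpy; the function name T is chosen here simply to mirror the example.

    import numpy as np

    def T(w):
        x, y = w
        return np.array([4*x + 5*y, 6*x - y])

    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(2), rng.standard_normal(2)
    a, b = 2.0, -3.0
    print(np.allclose(T(a*u + b*v), a*T(u) + b*T(v)))   # True: T preserves linear combinations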

Note. Let P n (R) denote the vector space that consists of all polynomials in x with degree
n or less and coefficient in R.

Illustrative Example - 2. If the mapping T: P 1 (R) →P 2 (R) defined by
T(p(x)) = (1+x)p(x) for all p(x) in P 1 (R), then show that T is linear transformation.
Solution. Let a, b ∈ R and p(x), q(x) ∈ P1(R), so that T(p(x)), T(q(x)) ∈ P2(R). Then,
T(a p(x) + b q(x)) = (1+x)(a p(x) + b q(x))
= a (1+x) p(x) + b (1+x) q(x)
= a T(p(x)) + b T(q(x)).
Thus, T is a linear transformation of P1(R) into P2(R).
As a specific computation of a value of T, note that T(3+2x) = (1+x)(3+2x) = 3+5x+2x2.

Note:
1. For vector spaces V and W over a field F, we define an identity transformation
I : V → W by I(v) =v for all v∈V and the zero transformation T 0 : V → W by
T 0 (v) = 0 for all v∈V. It is clear that both of these transformations are linear.
2. If S and T are linear transformation of vector space U into V over a field F, then
sum S + T is defined by (S + T )(u) = S(u) + T(u) for all u in U. Also, for each
linear transformation T of U into V and each scalar in F, we define the product of
a and T to be the mapping aT of U into V given by (aT)(u)=a(T(u)).

Theorem 2. 2. 1. Let U and V be vector space over the same field F. Then the set of all
linear transformation of U into V is a vector space over F.
Proof. Let T 1 , T 2 and T 3 denote arbitrary linear transformation of U into V, let u and v
be arbitrary vectors in U, and let a, b and c be scalars in F.
Since, (T 1 +T 2 ) (au+ bv) = T 1 (au+ bv) + T 2 (au+ bv)
= aT 1 (u)+ b T 1 (v) + aT 2 (u)+ b T 2 (v)
= a[T 1 (u)+ T 2 (u)] + b[T 1 (v)+ T 2 (v)]
= a(T 1 + T 2 )(u) + b(T 1 + T 2 )(v),
Therefore, T 1 + T 2 is a linear transformation of U into V.
Addition is associative, since
(T 1 + (T 2 +T 3 ))(u) = T 1 (u) + (T 2 + T 3 )(u)
= T 1 (u) + [T 2 (u) + T 3 (u)]
= [T 1 (u) +T 2 (u)] + T 3 (u)
= (T 1 + T 2 ) (u) + T 3 (u)
= ((T 1 + T 2 ) +T 3 ) (u).
The zero linear transformation Z is an additive identity, since
(T 1 + Z)(u) = T 1 (u) + Z(u) = T 1 (u) + 0 = T 1 (u) for all u in U.
The additive inverse of T 1 is the linear transformation (–T 1 ) of U into V defined by
(–T 1 )(u)= –T 1 (u), since (T 1 + (–T 1 ))(u) = T 1 (u)+ (–T 1 (u)) = 0 for all u in U.
For any u in U, (T 1 + T 2 ) (u) = T 1 (u) + T 2 (u) = T 2 (u) + T 1 (u) = (T 2 + T 1 ) (u),
so T 1 + T 2 = T 2 + T 1 .
Since (cT1)(au + bv) = c(T1(au + bv)) = c(aT1(u) + bT1(v)) = a(cT1(u)) + b(cT1(v)) = a(cT1)(u) + b(cT1)(v),
cT1 is a linear transformation of U into V.

Theorem 2. 2. 2. Let T be a linear transformation of U into V. If W is any subspace of U,


then T(W) is a subspace of V.
Proof. Let W be a subspace of a vector space U. Then 0∈W and T(0) = 0, so T(W) is
nonempty. Let v 1 , v 2 ∈ T(W), and a, b∈F. There exist vectors w 1 , w 2 ∈W such that
T(w1) = v1 and T(w2) = v2. Since W is a subspace of the vector space U, aw1 + bw2 is in W
and T(aw1 + bw2) = aT(w1) + bT(w2) = av1 + bv2. Thus, av1 + bv2 is in T(W), and
T(W) is a subspace of the vector space V.

2. 2. 1. Null Spaces and ranges

Definition 2. 2. 2. The subspace T(U) of V is called range of T. The dimension of T(U)


is called the rank of T and is denoted by rank (T) or r(T).

Note. Let T be a linear transformation of U into V. If W is any subspace of a vector


space V, the inverse image T–1(W) is a subspace of a vector space U.

Definition 2. 2. 3. The subspace T–1(0) is called kernel of the linear transformation T.


The dimension of T–1(0) is the nullity of T, and is denoted by nullity (T) or n(T).

Theorem 2. 2. 3. If T is a linear transformation of U into V and A = (u 1 , u 2 , . . . ,u n ) is a

basis of U, then T(A) spans T(U).

Proof. Suppose that A = ( u 1 , u 2 , . . . ,u n ) is a basis of a vector space U, and consider the

set T(A) ={ T(u 1 ), T(u 2 ), . . . ,T(u n )}. For any vector v in T(U), there is a vector u in U
n
such that T(u) = v. The vector u can be written as u = ∑ ai ui since A is a basis of U.
i =1

n n
This = (∑ ai ui )
gives v T= ∑ aiT (ui ), and T(A) spans T(U).
=i 1 =i 1

2. 2. 2. Dimension Theorem
Theorem 2. 2. 4. Let T be a linear transformation of U into V. If U has finite dimension,
then rank(T) + nullity(T) = dim(U).
Proof. Suppose dim(U) = n, and let nullity(T) = k. Choose ( u 1 , u 2 , . . . ,u k ) to be a basis
of the kernel T–1(0). This linearly independent set can be extended to a basis
A = { u 1 , u 2 , . . . ,u k , u k+1 , . . . ,u n } of U. By Theorem 2. 2. 3, the set T(A) spans T(U).
But T(u1) = T(u2) = T(u3) = . . . . = T(uk) = 0, so this means that the set of (n–k)
vectors {T(uk+1), T(uk+2), . . . , T(un)} spans T(U).
To show that this set is linearly independent, suppose that
ck+1T(uk+1) + ck+2T(uk+2) + . . . + cnT(un) = 0.
Then T(ck+1uk+1 + ck+2uk+2 + . . . + cnun) = 0, and ck+1uk+1 + . . . + cnun is in T–1(0). Thus there are
scalars d1, d2, . . . , dk such that d1u1 + d2u2 + . . . + dkuk = ck+1uk+1 + . . . + cnun, that is,
d1u1 + . . . + dkuk – ck+1uk+1 – . . . – cnun = 0. Since A is a basis, each ci and di must be zero.


Hence {T(u k+1 ), T(u k+2 ), . . . ,T(u n )} is a basis of T(U). Since rank (T) is the dimension
of T(U), n – k = rank (T) and rank (T) + nullity (T) = n = dim (U).
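The dimension theorem is easy to verify numerically for a concrete map. The sketch below (assuming Python with numpy) takes a linear map T : R4 → R3 given by left-multiplication with a matrix A chosen purely for illustration and checks that rank(T) + nullity(T) = dim(U).

    import numpy as np

    A = np.array([[1, 0, 0, 1],
                  [0, 1, 0, 1],
                  [0, 0, 1, 1]], dtype=float)   # T(x) = A x maps R^4 into R^3
    rank = np.linalg.matrix_rank(A)              # dimension of the range T(U)
    nullity = A.shape[1] - rank                  # dimension of the kernel T^{-1}(0)
    print(rank, nullity, A.shape[1])             # 3 1 4, and indeed 3 + 1 = 4 = dim(U)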

2. 3. The Matrix Representation of a Linear Transformation

Let U and V be vector spaces of dimension n and m, respectively, over the same
field F, and let T denote a linear transformation of U into V. Suppose that
A = (u1, u2, . . . , un) is a basis of U. Any u in U can be written uniquely in the form
u = x1u1 + x2u2 + . . . + xnun, and then T(u) = x1T(u1) + x2T(u2) + . . . + xnT(un). If
B = (v1, v2, . . . , vm) is a basis of V, then each T(uj) can be written uniquely as
T(uj) = a1jv1 + a2jv2 + . . . + amjvm. Thus, with each choice of bases A and B, a
linear transformation T of U into V determines a unique indexed set {aij} of mn elements
of F.

Definition 2. 3. 1. Suppose that A = (u1, u2, . . . , un) and B = (v1, v2, . . . , vm) are bases
of U and V, respectively. Let T be a linear transformation of U into V. The matrix of T
relative to the bases A and B is the matrix A = [aij] m×n = [T]B,A, where the aij are
determined by the conditions T(uj) = a1jv1 + a2jv2 + . . . + amjvm for j = 1, 2, …, n.

Note. The symbols [aij] m×n and [T]B,A in the above definition denote the same matrix, but the
first places notational emphasis on the elements of the matrix, while the second places
emphasis on T and the bases A and B. This matrix A is also referred to as the
matrix of T with respect to A and B, and we say that T is represented by the matrix A.
Also, the elements aij are uniquely determined by T for the given bases A and B. Another
way to describe A is to observe that the jth column of A is the coordinate matrix of T(uj)
with respect to B.

That is, the jth column of A is [T(uj)]B, the column with entries a1j, a2j, . . . , amj, and
A = [T]B,A = [[T(u1)]B, [T(u2)]B, . . . , [T(un)]B].

Illustrative Example - 1. Let T: P 2 (R)→ P 3 (R) be the linear transformation defined by
T(a 0 + a 1 x+ a 2 x2) = (2a 0 + 2a 2 )+(a 0 + a 1 + 3a 2 )x+( a 1 + 2a 2 )x2 +( a 0 + a 2 ) x3.
Then, find the matrix A of T relative to the basis.
Solution. Let A = {1, 1–x, x2} be a basis of P 2 (R) and B={ 1, x,1–x2, 1+x3 }be a basis
of P 3 (R).
To find the first column of A, we compute T(1) and write it as a linear combination of the
vectors in B:
T(1) = 2 + x + x3 = (1)(1) + (1)(x) + (0)(1–x2) + (1)(1+x3),
so the first column of A is [T(1)]B = (1, 1, 0, 1).
To find the second column of A, we compute T(1–x) and write it as a linear combination
of the vectors in B:
T(1–x) = 2 – x2 + x3 = (0)(1) + (0)(x) + (1)(1–x2) + (1)(1+x3),
so the second column of A is [T(1–x)]B = (0, 0, 1, 1).
To find the third column of A, we compute T(x2) and write it as a linear combination of
the vectors in B:
T(x2) = 2 + 3x + 2x2 + x3 = (3)(1) + (3)(x) + (–2)(1–x2) + (1)(1+x3),
so the third column of A is [T(x2)]B = (3, 3, –2, 1).
Thus the matrix of T with respect to A and B is given by
A = [T]B,A = [[T(1)]B, [T(1–x)]B, [T(x2)]B] =
[ 1 0  3 ]
[ 1 0  3 ]
[ 0 1 −2 ]
[ 1 1  1 ].

Illustrative Example - 2. If the linear transformation T: R4→ R3 given by


T(x 1 , x 2 , x 3 , x 4 ) = ( x 1 – x 3 + x 4 , 2x 1 + x 2 +3 x 4 , x 1 + 2x 2 +3 x 3 +3 x 4 ). Find [T ]ε 3 ,ε 4 .

Solution. Using the definition of standard basis of ε 4 and ε 3 , we compute


T(1, 0, 0, 0) = (1, 2, 1),
T(0, 1, 0, 0) = (0, 1, 2),
T(0, 0, 1, 0) = (–1, 0, 3),
T(0, 0, 0, 1) = (1, 3, 3).

Thus the matrix of T relative to ε4 and ε3 is
[T]ε3,ε4 =
[ 1 0 −1 1 ]
[ 2 1  0 3 ]
[ 1 2  3 3 ].

Note.
1. If A is any matrix that represents the linear transformation T of U into V, then

rank (A) = rank(T). That is rank ( [T ]B, A ) = rank(T).

2. The matrix of T relative to the basis A and B is the matrix A = [a ij ] m×n = [T ]B, A .

If u is an arbitrary vector in U, then [T(u)] B = [T] B,A [u] A .

3. The matrix of T relative to the basis A and B is the matrix A = [a ij ] m×n = [T ]B, A

is a matrix such that the equation [T(u)] B = A[u] A is satisfied for all u in U, then

A is the matrix of the linear transformation T relative to the basis A and B.

2. 4. Composition of Linear Transformation and Matrix Multiplication

Definition 2. 4. 1. Let U, V and Z be vector spaces over the same field F, and let S:
U→V and T : V → Z be linear transformations. Then the product TS is the mapping
of U into Z (that is TS : U → Z) defined by TS(u) = (T o S)(u) = T(S(u)) for each u in U.

Theorem 2. 4. 1. The product of two linear transformations is a linear transformation.


Proof. Let U, V and Z be vector spaces over the same field F, and let S: U→V and
T : V → Z be linear transformations. Then TS : U → Z defined by TS(u) = (T o S)(u) =
T(S(u)) for each u in U.
To show that TS is a linear transformation, let u 1 , u 2 ∈U and a, b∈F.
TS (au 1 + b u 2 ) = T(S (au 1 + b u 2 ))
= T(aS(u 1 ) + b S( u 2 ))
= aT(S(u 1 )) + b T(S( u 2 ))
= aTS(u 1 ) + b TS( u 2 ),
and TS is indeed a linear transformation of U into Z.

2. 4. 1. The relation between multiplication of linear transformations and multiplication of matrices

Theorem 2. 4. 2. Suppose that U, V and Z are finite dimensional vector spaces with
bases A, B and C, respectively. If S has matrix A relative to A and B and T has matrix

B relative to B and C, then TS has matrix BA relative to A and C.

Proof. Assume that the hypotheses are satisfied with A = (u_1, u_2, . . . , u_n), B = (v_1, v_2, . . . , v_m), C = (w_1, w_2, . . . , w_p), A = [a_ij]_{m×n}, and B = [b_ij]_{p×m}. Let C = [c_ij]_{p×n} be the matrix of TS relative to the bases A and C. For j = 1, 2, …, n, we have

    Σ_{i=1}^{p} c_ij w_i = TS(u_j)
                         = T( Σ_{k=1}^{m} a_kj v_k )
                         = Σ_{k=1}^{m} a_kj T(v_k)
                         = Σ_{k=1}^{m} a_kj ( Σ_{i=1}^{p} b_ik w_i )
                         = Σ_{i=1}^{p} ( Σ_{k=1}^{m} b_ik a_kj ) w_i,

and consequently c_ij = Σ_{k=1}^{m} b_ik a_kj for all values of i and j.

Therefore C = BA and the theorem is proved.

Illustrative Example - 1. Let the linear transformations S : M_{2×2}(R) → R^3 and T : R^3 → P_1(R) be defined by

    S( [a b; c d] ) = (a + 2b, b − 3c, c + d)   and   T(a_1, a_2, a_3) = (a_1 − 2a_2 − 6a_3) + (a_2 + 3a_3)x.

Then


(i) Compute the product TS.
(ii) Find the matrices of S, T, TS relative to the bases

    A = { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] },   B = ε_3,   and   C = {1, x}.

(iii) Verify that the matrix of TS is the product of the matrix of T times the matrix of S.

Solution. (i) We have

    TS( [a b; c d] ) = T( S( [a b; c d] ) )
                     = T(a + 2b, b − 3c, c + d)
                     = (a + 2b − 2(b − 3c) − 6(c + d)) + ((b − 3c) + 3(c + d))x
                     = (a − 6d) + (b + 3d)x.
 1 0   0 1 
(ii) Since S  = (1, 0, 0), S  = (2,1, 0),
 0 0    0 0  

 0 0   0 0 
S = (0, −3,1), S    = (0, 0,1) .
 1 0   0 1  

the matrix A of S relative to A and B is  A = [ 1  2   0  0 ]
                                              [ 0  1  −3  0 ]
                                              [ 0  0   1  1 ].

Since T(1, 0, 0) = (1)(1) + (0)(x),
      T(0, 1, 0) = (−2)(1) + (1)(x),
      T(0, 0, 1) = (−6)(1) + (3)(x),

the matrix B of T relative to B and C is  B = [ 1  −2  −6 ]
                                              [ 0   1   3 ].
For the product TS, we have

    TS( [1 0; 0 0] ) = (1)(1) + (0)(x),   TS( [0 1; 0 0] ) = (0)(1) + (1)(x),
    TS( [0 0; 1 0] ) = (0)(1) + (0)(x),   TS( [0 0; 0 1] ) = (−6)(1) + (3)(x),

so the matrix of TS relative to A and C is  C = [ 1  0  0  −6 ]
                                                [ 0  1  0   3 ].
(iii) Using the usual arithmetic of matrix multiplication, we have

    BA = [ 1  −2  −6 ] [ 1  2   0  0 ]   [ 1  0  0  −6 ]
         [ 0   1   3 ] [ 0  1  −3  0 ] = [ 0  1  0   3 ] = C.
                       [ 0  0   1  1 ]
 

Note.
1. In the above example the product AB is not defined, and this is consistent with the fact that the composition ST is not defined.
2. The operation of addition and multiplication of matrices are connected by the
distributive property, that is A(B+C) = AB+AC, where A = [a ij ] m×n , B = [b ij ] p×m
and C = [c ij ] p×n .
3. Let A be an n × n matrix with entries from a field F. The mapping T A : F n → F n
defined by T A (x) = Ax (the matrix product of A and x) for each column vector x∈
F n . This is known as Left-Multiplication transformation.

32
Illustrative Example - 2. Let T : R^2 → R^2 be a linear transformation defined by T(a, b) = (2a − 3b, a + b) for all (a, b) ∈ R^2. Find the matrix of T relative to the bases B = {v_1 = (1, 0), v_2 = (0, 1)} and B' = {v'_1 = (2, 3), v'_2 = (1, 2)}.

Solution. In order to find the matrix of T relative to the bases B and B', we have to express T(v_1) and T(v_2) as linear combinations of v'_1 and v'_2.

For this, we first find the coordinates of an arbitrary vector (a, b) ∈ R^2 with respect to the basis B'.
(a, b) = x v'_1 + y v'_2
⇒ (a, b) = x(2, 3) + y(1, 2)
⇒ (a, b) = (2x + y, 3x + 2y)
⇒ 2x + y = a and 3x + 2y = b
⇒ x = 2a − b and y = −3a + 2b.
Therefore (a, b) = (2a − b) v'_1 + (−3a + 2b) v'_2. → (1)
Now, T(a, b) = (2a − 3b, a + b), so
T(v_1) = T(1, 0) = (2, 1) and T(v_2) = T(0, 1) = (−3, 1).
Putting (a, b) = (2, 1) and (a, b) = (−3, 1) in (1), we get
T(v_1) = 3 v'_1 − 4 v'_2 and T(v_2) = −7 v'_1 + 11 v'_2.

Hence  [T]_{B',B} = [ [T(v_1)]_{B'} , [T(v_2)]_{B'} ] = [  3  −7 ]
                                                        [ −4  11 ].

2. 5. Invertibility and Isomorphism

2. 5. 1. Algebra of Linear Transformation

Definition 2. 5. 1. Let V be a vector space over a field F and let Hom (V, V) or A(V) be the set of all linear transformations of V into itself. This is known as the algebra of linear transformations.
That is a map T : U →V is a linear transformation on a vector space with U=V.
For T 1, T 2 ∈ A(V),
(T 1 + T 2 )(v) = T 1 (v) + T 2 (v) ⇒ T 1 + T 2 ∈ A(V)

33
a(T(v)) = (aT)(v) ⇒ aT ∈ A(V)
Now, we show that (T 1 T 2 ) (au + bv) = a(T 1 T 2 )(u) + b(T 1 T 2 )(v) for all a, b ∈ F, u, v ∈
V and T 1 ,T 2 ∈ A(V).
Consider (T_1 T_2)(au + bv) = T_1(T_2(au + bv))
                            = T_1(aT_2(u) + bT_2(v))
                            = aT_1(T_2(u)) + bT_1(T_2(v))
                            = a(T_1 T_2)(u) + b(T_1 T_2)(v).

2. 5. 2. Invertible and Non-invertible


Definition 2. 5. 2. An element TϵA(V) is invertible (regular) if it is both right and left
invertible. That is, ST = TS = I for some S∈A(V). We denote the inverse S of T as T–1.
Note.
1. An element in A(V) which is not invertible is called singular.
2. It is quite possible that an element in A(V) is right invertible but is not
invertible.
For example, we consider the field F of real numbers and take V to be the vector space of all polynomials over F. Let the linear maps S, T : V → V be defined by S(q(x)) = d/dx q(x) and T(q(x)) = ∫_1^x q(t) dt.

First, TS(q(x)) = T(S(q(x))) = ∫_1^x q′(t) dt = q(x) − q(1), which need not equal q(x); therefore TS ≠ I.

Next, ST(q(x)) = S(T(q(x))) = d/dx ( ∫_1^x q(t) dt ) = q(x); therefore ST = I.

Thus S has a right inverse (namely T), but S is not invertible, and in particular ST ≠ TS.


3. Let A be an algebra with unit element e, over a field F, and let
p(x) = a_0 + a_1 x + … + a_n x^n be a polynomial in F[x]. For α ∈ A, by the polynomial p(α)
we shall mean the element a_0 e + a_1 α + … + a_n α^n in A. If p(α) = 0 we shall say
α satisfies p(x).
4. A polynomial p(x) is said to be the Minimal polynomial for T, if
(i) p(T) = 0
(ii) If for all q(x) ∈F[x] such that degree of q(x) < degree of p(x), then q(T)≠0.
5. If V is a finite dimensional vector space over a field F, then T∈A(V) is invertible
(regular) if and only if the constant term of the minimal polynomial for T is not
zero.

34
6. If V is finite dimensional over F and if Tϵ A(V) is non-invertible(singular), then
there exists an S ≠ 0 in A(V) such that ST = TS = 0.

Theorem 2. 5. 1. If V is finite dimensional vector space over a field F, then TϵA(V) is


non-invertible (singular) if and only if there exists a v ≠ 0 in V such that T(v) = 0.
Proof. Suppose T ∈ A(V) is singular. Then, by Note 6 above, there exists S ≠ 0 in A(V) such that ST = TS = 0. Since S ≠ 0, there is an element w ∈ V such that S(w) ≠ 0. Let v = S(w) ≠ 0 in V.
Then T(v) = T(S(w)) = (TS)(w) = 0(w) = 0, since TS = 0. Thus T(v) = 0 for some v ≠ 0 in V.
Conversely, suppose T(v) = 0 for some v ≠ 0 in V. We claim that T is singular. Suppose T is regular; then there exists T^{−1} such that TT^{−1} = T^{−1}T = I.
So T(v) = 0 ⇒ T^{−1}(T(v)) = T^{−1}(0) ⇒ (T^{−1}T)v = 0 ⇒ v = 0, a contradiction.
Therefore T is singular.

Note.
1. Let A be an n×n matrix. Then A is invertible if there exists an n×n matrix B such that AB = BA = I. Such a B is unique (if C is another such matrix, then C = CI = C(AB) = (CA)B = IB = B). The matrix B is called the inverse of A and is denoted by A^{−1}.
2. If V is a finite dimensional vector space over a field F with ordered bases A and B, then a linear transformation T ∈ A(V) is invertible if and only if [T]_{B,A} is invertible. Furthermore, [T^{−1}]_{A,B} = ([T]_{B,A})^{−1}.

2. 5. 3. Isomorphism

Definition 2. 5. 3. Let U and V be vector spaces. We say that U is isomorphic to V if there exists a linear transformation T: U → V that is invertible. Such a linear transformation is called an isomorphism of U onto V.

Note.
1. Let U and V be finite dimensional vector spaces (over the same field F). Then V is isomorphic to U if and only if dim(U) = dim(V).
35
2. Let V be finite dimensional vector space over a field F. Then V is isomorphic to
F n if and only if dim (V) = n.

Theorem 2. 5. 2. If A is an algebra, with unit element, over F, then A is isomorphic to a


sub algebra of A(V) for some vector space V over F.
Proof. Since A is an algebra over F, it must be a vector space over F.
We shall use V = A to prove the result
For x ∈ A, define T_x : V → V by T_x(v) = xv for all v ∈ V (= A).
We assert that T_x is a linear transformation on V (= A).
Consider T_x(v_1 + v_2) = x(v_1 + v_2) = xv_1 + xv_2 = T_x(v_1) + T_x(v_2), and
T_x(av) = x(av) = a(xv) = a(T_x(v)).
Hence T_x is a linear transformation on V. Therefore, T_x ∈ A(V).
Consider the mapping ψ : A →A(V) defined by ψ(x) = T x .
We claim that ψ is an isomorphism of A into A(V).
For x, y ∈ A and a, b ∈ F, we have
T_{ax+by}(v) = (ax + by)v
             = (ax)v + (by)v
             = a(xv) + b(yv)
             = a(T_x(v)) + b(T_y(v))
             = (aT_x + bT_y)(v), since T_x and T_y are linear transformations.
Thus, T_{ax+by} = aT_x + bT_y ⇒ ψ(ax + by) = aψ(x) + bψ(y).
Therefore ψ is a linear transformation of A into A(V).
Now for x, y ∈ A, we have
T_{xy}(v) = (xy)v = x(yv) = T_x(yv) = T_x(T_y(v)) = (T_x T_y)(v).
Thus, T_{xy} = T_x T_y ⇒ ψ(xy) = ψ(x) ψ(y).
Therefore ψ is also a ring homomorphism of A. So far we have proved that ψ is a homomorphism of A, as an algebra, into A(V).
To prove ψ is one to one, we claim that Ker (ψ) = {0}
Let x∈ Ker (ψ). Then ψ(x) = 0 ⇒ T x = 0
⇒ T x (v) = 0(v) = 0 for any v ∈V(=A)

36
⇒ T_x(e) = 0, where e is the unit element of V
⇒ xe = 0, that is, x = 0, since xe = x.
Therefore Ker(ψ) = {0} and hence ψ is one-to-one.
So ψ is an isomorphism of A into A(V).

Theorem 2. 5. 3. Let A be an algebra, with unit element, over a field F, and suppose that
A is of dimension m over F. Then every element in A satisfies some nontrivial
polynomial in F[x] (or P n (F)) of degree at most m.
Proof. Let e be the unit element of A. For y ∈ A, consider e, y, y^2, . . . , y^m in A.
Since A is m-dimensional over the field F, the (m+1) elements (vectors) e, y, y^2, . . . , y^m are linearly dependent over F. So there exist elements a_0, a_1, … , a_m in F, not all zero, such that a_0 e + a_1 y + … + a_m y^m = 0. Therefore y satisfies the nontrivial polynomial q(x) = a_0 + a_1 x + … + a_m x^m in F[x], whose degree is at most m.

Note. Hom(U, V) = {T : U → V | T is a linear transformation}; in particular, A(V) = Hom(V, V).

Theorem 2. 5. 4. If V is an n - dimensional vector space over a field F, then, given any


element T in A(V), there exists a non trivial polynomial q(x) ∈ F[x] of degree atmost n2,
such that q(T) = 0.
Proof. We know that dim(Hom(U, V)) = dim(U)·dim(V), where U and V are finite dimensional vector spaces over a field F. Hence dim(A(V)) = dim(Hom(V, V)) = n^2.
Hence, by Theorem 2.5.3, any element T of A(V) satisfies some nontrivial polynomial q(x) in F[x] of degree at most n^2; that is, q(T) = 0.

Illustrative Example-1. Let T : R2 → R2 be a linear transformation defined by


T(x, y) = (2x + y, 3x + 2y). Then show that T is invertible and find T – 1.
Solution. To show T is invertible, it is sufficient to show that T is non-singular.
That is, T(x, y) = (0, 0) ⇔ (x, y) = (0, 0).
Now, T (x, y) = (0, 0) ⇔ (2x+ y, 3x+2 y) = (0, 0)
⇔ 2x+ y = 0, 3x+2y=0
⇔ x = 0, y = 0
⇔ (x, y) = (0, 0)

37
Thus, T(x, y) = (0,0) ⇔ (x, y) = (0, 0).
To find the formula for T– 1.
Let T(x, y) = (a, b). Then, T – 1(a, b) = (x, y)
Now, T(x, y) = (a, b) ⇒ (2x + y, 3x + 2y) = (a, b)
⇒ 2x + y =a and 3x + 2y=b
⇒ x = 2a – b, y = –3a + 2b
Therefore, T – 1(a, b) = (2a – b, –3a + 2b).
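Equivalently, one can read T^{-1} off from the inverse of the matrix of T with respect to the standard basis. A short NumPy check of ours:

```python
import numpy as np

M = np.array([[2., 1.],             # matrix of T(x, y) = (2x + y, 3x + 2y)
              [3., 2.]])            # relative to the standard basis of R^2
M_inv = np.linalg.inv(M)
print(M_inv)                        # [[ 2. -1.] [-3.  2.]] -> T^{-1}(a, b) = (2a - b, -3a + 2b)
print(np.allclose(M_inv @ M, np.eye(2)))   # True
```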

2. 6. The Change of Coordinate Matrix

In this section we describe the relation between matrices that represent the same linear transformation. This description is found by examining the effect that a change in the bases of U and V has on the matrix of T.
Theorem 2. 6. 1. Let C= (w 1 , w 2 , . . . ,w k ) and C1 = (w 1 1, w 2 1, . . . , w k 1 ) be two bases
of the vector space W over F. For an arbitrary vector w in W, let
    [w]_C = C = (c_1, c_2, . . . , c_k)^T   and   [w]_{C^1} = C^1 = (c_1^1, c_2^1, . . . , c_k^1)^T
denote the coordinate matrices of w relative to C and C1 , respectively. If P is the matrix

of transition from C to C 1, then C =PC 1. That is , [w] C =P[w] C 1 .

Proof. Let P = [p_ij]_{k×k}, and assume that the hypotheses of the theorem are satisfied.
Then w = Σ_{i=1}^{k} c_i w_i = Σ_{j=1}^{k} c_j^1 w_j^1, and w_j^1 = Σ_{i=1}^{k} p_ij w_i. Combining these equalities, we have

    w = Σ_{j=1}^{k} c_j^1 ( Σ_{i=1}^{k} p_ij w_i ) = Σ_{i=1}^{k} ( Σ_{j=1}^{k} p_ij c_j^1 ) w_i.

Therefore, c_i = Σ_{j=1}^{k} p_ij c_j^1 for i = 1, 2, . . . , k, that is,

    (c_1, c_2, . . . , c_k)^T = ( p_11 c_1^1 + p_12 c_2^1 + … + p_1k c_k^1 ,
                                  p_21 c_1^1 + p_22 c_2^1 + … + p_2k c_k^1 ,
                                  … ,
                                  p_k1 c_1^1 + p_k2 c_2^1 + … + p_kk c_k^1 )^T = P C^1.

Example -1. Consider the bases C = {x, 2+x} and C1={ 4+x, 4–x} of the vector space
W = P_1(R). Since 4+x = (−1)(x) + (2)(2+x) and 4−x = (−3)(x) + (2)(2+x), the matrix of transition P from C to C^1 is given by

    P = [ −1  −3 ]
        [  2   2 ].

The vector w = 4+3x can be written as 4+3x = 2(4+x) + (−1)(4−x), so [w]_{C^1} = (2, −1)^T.
By Theorem 2.6.1, the coordinate matrix [w]_C may be found from

    [w]_C = P [w]_{C^1} = [ −1  −3 ] [  2 ]  =  [ 1 ]
                          [  2   2 ] [ −1 ]     [ 2 ].
This result can be checked by using the base vectors in C.
We get, (1)(x)+2(2+x) = 4+3x = w.
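A numerical restatement of this example (a NumPy sketch of ours):

```python
import numpy as np

P = np.array([[-1., -3.],    # transition matrix from C = {x, 2+x} to C^1 = {4+x, 4-x}
              [ 2.,  2.]])
w_C1 = np.array([2., -1.])   # coordinates of w = 4 + 3x relative to C^1
print(P @ w_C1)              # [1. 2.] -> w = (1)(x) + (2)(2+x) = 4 + 3x
```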

2. 6. 1. Transition Matrix
Definition 2. 6. 1. If A = (u_1, u_2, . . . , u_r) is a set of vectors in R^n and B = (v_1, v_2, . . . , v_s) is a set of vectors in 〈A〉, a matrix of transition (or transition matrix) from A to B is a matrix A = [a_ij]_{r×s} such that v_j = Σ_{i=1}^{r} a_ij u_i for j = 1, 2, . . . , s.

Theorem 2. 6. 2. Suppose that T has matrix A = [a ij ] m×n relative to the bases A of U and

B of V. If Q is the matrix of transition from A to the basis A1 of U and P is the matrix

of transition from B to the basis B1 of V, then the matrix of T relative to A1 and B1


is P–1AQ (See Figure-1 ).

39
Figure-1
Proof. Assume the hypotheses of the theorem are satisfied. Let u be an arbitrary vector in U, let X and X^1 denote the coordinate matrices of u relative to A and A^1,

respectively, and let Y and Y1 denote the coordinate matrices of T(u) relative to B and

B1, respectively. Since the matrix of T relative to the bases A and B is the matrix

A = [a ij ] m×n = [T ]B, A . If u is an arbitrary vector in U, then [T(u)] B = [T] B,A [u] A and by

Theorem 2. 6. 1, we have Y=AX, where Y=PY1 and X= QX1. Substituting for Y and X,
we have PY1=AQX1, and therefore Y1= ( P–1AQ)X1.

If the matrix of T relative to the bases A and B is the matrix A = [T ]B, A is a

matrix such that the equation [T(u)] B = A[u] A is satisfied for all u in U, then A is the

matrix of the linear transformation T relative to A and B. Thus P–1AQ is the matrix of T

relative to A1and B1.

Theorem 2. 6. 3. Two m×n matrices A and B represent the same linear transformation T of U into V if and only if A and B are equivalent.
Proof. If A and B represent T relative to the set of bases A,B and A1,B1, respectively,

then B= P–1AQ, where Q is the matrix of transition from A to A1and P is the matrix of

transition from B to B1. Hence A and B are equivalent. If B is equivalent to A, then

B = P^{−1}AQ for invertible P and Q. If A represents T relative to A and B, then B represents T relative to A^1 and B^1, where Q is the matrix of transition from A to A^1 and P is the matrix of transition from B to B^1.

40
Two similar results hold for row and column equivalence, since requiring B = B^1 is the same as requiring P = I_m, and requiring A = A^1 is the same as requiring Q = I_n. Thus we have the following theorem.
Theorem 2. 6. 4. Two m×n matrices A and B represent the same linear transformation T of U into V relative to the same basis of V [of U] if and only if they are row [column] equivalent.

Illustrative Example - 2. Relative to the basis B = {v 1 , v 2 } = {(1, 1), (2, 3)} of R2, find
the coordinate matrix of (i) v = (4, –3) and (ii) v = (a, b)
Solution. Let x, y ∈ R such that v = xv_1 + yv_2 = x(1, 1) + y(2, 3) = (x + 2y, x + 3y).
(i) If v = (4, −3), then v = xv_1 + yv_2 ⇒ (4, −3) = (x + 2y, x + 3y)
⇒ x + 2y = 4, x + 3y = −3
⇒ x = 18, y = −7.
Hence, the coordinate matrix [v]_B of v relative to the basis B is [v]_B = (18, −7)^T.
(ii) If v = (a, b), then v = xv_1 + yv_2 ⇒ (a, b) = (x + 2y, x + 3y)
⇒ x + 2y = a, x + 3y = b
⇒ x = 3a − 2b, y = −a + b.
Hence, the coordinate matrix [v]_B of v relative to the basis B is [v]_B = (3a − 2b, −a + b)^T.

2. 7. The dual space

2. 7. 1. Linear functional
Definition 2. 7. 1. A linear transformation from a vector space V into its field of scalars F, which is itself a vector space of dimension 1 over F, is called a linear functional on V. We generally use the letters f, g, h, … to denote linear functionals.

Example -1. Let V be the vector space of continuous real valued functions on the
interval [0, 2π]. Consider a function g ∈ V. The function h : V → R defined by

h(x) = (1/2π) ∫_0^{2π} x(t) g(t) dt is a linear functional on V. In the case that g(t) equals sin(nt) or cos(nt), h(x) is often called the nth Fourier coefficient of x.

Example -2. Let V be a finite - dimensional vector space and let B = {x 1 , x 2 , . . . . , x n }


be an ordered basis for V. For each i = 1, 2, . . . , n, define f_i(x) = a_i, where [x]_B = (a_1, a_2, . . . , a_n)^T is the coordinate vector of x relative to B. Then f_i is a linear functional on V, called the ith coordinate function with respect to the basis B.

Note. f_i(x_j) = δ_ij, where δ_ij is the Kronecker delta. Recall that if V and W are finite dimensional vector spaces (over the same field F), then V is isomorphic to W if and only if dim(V) = dim(W).

Definition 2. 7. 2. For a vector space V over F, we define the dual space of V to be the vector space Hom(V, F), denoted by V*. Thus V* is the vector space consisting of all linear functionals on V with the operations of addition and scalar multiplication. If V is finite dimensional, then dim(V*) = dim(Hom(V, F)) = dim(V)·dim(F) = dim(V). Hence, by the above note, V and V* are isomorphic. We also define the double dual V** of V to be the dual of V*.

Theorem 2. 7. 1. Suppose that V is a finite dimensional vector space over a field F


with the ordered basis B = {x 1 , x 2 , . . . . , x n }. Let B* ={ f 1 , f 2 , . . . . , f n }, where

f i (1≤ i ≤ n) is the ith coordinate function with respect to B. Then B* is an ordered basis

of V*, and , for any f∈ V*, we have 𝐟 = ∑ni=1 𝐟(𝑥𝑖 ) 𝐟𝑖 .


Proof. Let f ∈ V*. Since dim(V*) = n, it suffices to show that f = Σ_{i=1}^{n} f(x_i) f_i; it then follows that B* generates V*, and since a generating set containing exactly n vectors is a basis, B* is an ordered basis of V*. Let g = Σ_{i=1}^{n} f(x_i) f_i. For 1 ≤ j ≤ n, we have

42
    g(x_j) = ( Σ_{i=1}^{n} f(x_i) f_i )(x_j) = Σ_{i=1}^{n} f(x_i) f_i(x_j) = Σ_{i=1}^{n} f(x_i) δ_ij = f(x_j).

Therefore f = g, by the fact that if V and W are vector spaces, V has a finite basis {v_1, v_2, . . . , v_n}, and U, T : V → W are linear with U(v_i) = T(v_i) for i = 1, 2, …, n, then U = T.

Using the notion of above theorem, we define


Definition 2. 7. 3. The ordered basis B* = {f_1, f_2, . . . , f_n} of V* that satisfies f_i(x_j) = δ_ij (1 ≤ i, j ≤ n) is known as the dual basis of B.

Illustrative Example - 3. Let B = {(2, 1), (3, 1)} be an ordered basis for R^2. Suppose that the dual basis of B is given by B* = {f_1, f_2}. Determine f_1(x, y) and f_2(x, y).
Solution. We need to consider the equations
1= f 1 (2, 1) = f 1 (2e 1 + e 2 ) = 2 f 1 (e 1 ) + f 1 (e 2 )
0= f 1 (3, 1) = f 1 (3e 1 + e 2 ) = 3 f 1 (e 1 ) + f 1 (e 2 ).
Solving these equations, we obtain f 1 (e 1 )= – 1 and f 1 (e 2 ) = 3. That is , f 1 (x, y) = – x+ 3y.
Similarly, it can be shown that f_2(x, y) = x − 2y.
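In coordinates, the dual basis can be read off from a matrix inverse: if the basis vectors are placed as the columns of a matrix V, the rows of V^{-1} are the coefficient vectors of f_1 and f_2. A NumPy sketch of ours:

```python
import numpy as np

V = np.column_stack([(2, 1), (3, 1)])   # basis vectors of R^2 as columns
F = np.linalg.inv(V)                    # rows of F give the dual functionals
print(F)           # [[-1.  3.] [ 1. -2.]] -> f1(x, y) = -x + 3y,  f2(x, y) = x - 2y
print(F @ V)       # identity matrix: f_i(v_j) = delta_ij
```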

2. 8. Summary

1. In this unit, the important concept of a linear transformation of a vector space is


introduced. The relation of two vector spaces can be expressed by linear map or
linear transformation. They are functions that reflect the vector space structure. That
is, they preserve vectors in additive form and scalars in multiplicative form. Let U
and V be vector space over the same field F. Then
• the set of all linear transformation of U into V is a vector space over F.
• T(W) is a subspace of V, provided W is any subspace of U .
• rank(T) + nullity(T) = dim(U), provided U is a finite dimensional vector space.
2. The expression "linear operator" is commonly used for linear maps from a vector
space to itself (That is, endomorphisms). If one has a linear transformation T(x) in
functional form, it is easy to determine the transformation matrix A by simply
43
transforming each of the vectors of the standard basis by T and then inserting the
results into the rows [columns] of a matrix.
3. The matrix multiplication is a binary operation that takes a pair of matrices, and
produces another matrix. The matrix of a composition of linear maps is given by the
product of the matrices.
• The product of two linear transformations is a linear transformation.
• The relation between multiplication of linear transformation and multiplication of
matrices.
4. The change of basis refers to the conversion of vectors and linear transformations
between matrix representations which have different bases. Using transition matrix,
we study the
• Two m×n matrices A and B represent the same linear transformation T of U into V
if and only if A and B are equivalent.
5. A dual basis is a set of vectors that forms a basis for the dual space of a vector space.

2. 9. Keywords
Change of coordinate matrix Identity matrix
Coordinate vector relative to a basis Identity transformation
Composition of linear transformation Invertible linear transformation
Dual basis Invertible matrix
Dual space Isomorphic vector spaces
Kernel of a linear transformation Isomorphism
Left - multiplication transformation Nullity of a linear transformation
Linear functional Null space
Linear transformation Product of Matrices
Matrix multiplication Range
Non-invertible linear transformation Rank of a linear transformation

2. 10. Assessment Questions

44
1. Define the range, rank, kernel and nullity of a linear transformation with an
example.
Hint. See the section 2.2.1
2. Show that the mapping J: R2 →R3 given by J(a, b) = (a+b, a–b, b) for all
(a, b) ∈ R2, is a linear transformation. Find the range, rank, kernel and nullity of
transformation J.
Answers. rank(J) = dim(I m (J))= dim(range(J)) = 2 ,
nullity(J) = dim R2 – rank(J) = 2 – 2= 0 and Kernel (J) = Null Space = {(0,0)},
where I m (J) = { J(1,0), J(0,1)}= {(1,1,0), (1, -1, 1)}.

3. Let V = P 1 (t) = { a + bt: a, b ∈ R} be the vector space of real polynomials of


degree at most one. Find the basis {b_1, b_2} of V that is dual to the basis {x_1, x_2} of V* defined by x_1(f(t)) = ∫_0^1 f(t) dt and x_2(f(t)) = ∫_0^2 f(t) dt.

Answer. b_1 = 2 − 2t, b_2 = −1/2 + t.
4. Define algebra of linear transformation. Explain the properties of algebra of linear
transformation.
Hint. See the section 2. 5. 1.
5. Give an example to show that A(V) is right-invertible but is not invertible.
Hint. See the section 2. 5. 2.
6. Find the dual basis of the basis set B = {(1, −1, 3), (0, 1, −1), (0, 3, −2)}.

Answer. The dual basis B* = {f 1 , f 2 , f 3 }, where f 1 (x, y, z) = x, f 2 (x, y, z) =


7x−2y−3z and f 3 (x, y, z) = −2x+y+z.

2. 11. References

45
1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,
2009.
2. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.
3. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.

46
UNIT-3: ELEMENTARY MATRIX OPERATION,
RANK OF A MATRIX, MATRIX INVERSE AND
SYSTEM OF LINEAR EQUATION

STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. Elementary Matrix Operations and Elementary Matrices
3. 2. 1. Elementary row [column] operation
3. 2. 2. Elementary Matrix
3. 3. The Rank of a Matrix and Matrix Inverses
3. 3. 1. Rank of a Matrix
3. 3. 2. Augmented matrix
3. 4. System of linear equations
3. 4. 1. Homogeneous system of equation
3. 5. Summary
3. 6. Keywords
3. 7. Assessment Questions
3. 8. References

47
UNIT-3: ELEMENTARY MATRIX OPERATION,
RANK OF A MATRIX, MATRIX INVERSE AND
SYSTEM OF LINEAR EQUATION

3. 0. Objectives

After working through this unit, the reader should be able:


• To discuss the three different types of elementary row and column operations.
• To describe how an elementary row operation on a matrix is equivalent to multiplying the matrix by an elementary matrix.
• To find the rank and the inverse of a given matrix.
• To find the inverse of a matrix with the help of the augmented matrix.
• To verify whether a given system of linear equations is consistent or not.
• To explain homogeneous systems of equations.

3. 1. Introduction

In the previous two units, we have learnt about vector spaces and linear transformations between two vector spaces U(F) and V(F) defined over the same field F. In this unit, we shall see that each linear transformation from an n-dimensional vector space U(F) to an m-dimensional vector space V(F) corresponds to an m×n matrix over the field F. Here, we see how the matrix representation of a linear transformation changes with a change of bases of the given vector spaces, with the help of elementary matrix operations.

3. 2. Elementary Matrix Operations and Elementary Matrices

3. 2. 1. Elementary row [column] operation

Definition 3. 2. 1. Let A be an m × n matrix. Any one of the following three operations


on the rows [columns] of A is called an elementary row [column] operation:
48
1. Interchanging any two rows [columns] of A;
2. Multiplying any row [column] of A by a nonzero scalar;
3. Adding any scalar multiple of a row [column] of A to another row [column].
Any of these three operations is called an elementary operation. Elementary
operations are of Type -1, Type -2, or Type- 3 depending on whether they are obtained by
(1), (2) or (3).

Example - 1. Let  A = [ 1  2   3  4 ]
                      [ 2  1  −1  3 ]
                      [ 4  0   1  2 ].
Interchanging the second row of A with the first row is an example of an elementary row operation of Type-1.
The resulting matrix is  B = [ 2  1  −1  3 ]
                             [ 1  2   3  4 ]
                             [ 4  0   1  2 ].
Multiplying the second column of A by 3 is an example of an elementary column operation of Type-2.
The resulting matrix is  C = [ 1  6   3  4 ]
                             [ 2  3  −1  3 ]
                             [ 4  0   1  2 ].
Adding 4 times the third row of A to the first row is an example of an elementary row operation of Type-3.
In this case, the resulting matrix is  M = [ 17  2   7  12 ]
                                           [  2  1  −1   3 ]
                                           [  4  0   1   2 ].
If a matrix Q can be obtained from a matrix P by means of an elementary row
operation, then P can be obtained from Q by an elementary row operation of the same
type. In the above example, A can be obtained from M by adding –4 times the third row
of M to the first row of M.
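The three types of operations are easy to express as array manipulations. A NumPy sketch of ours reproducing B, C and M from A:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, 1, -1, 3],
              [4, 0, 1, 2]])

B = A.copy(); B[[0, 1]] = B[[1, 0]]     # Type-1: interchange rows 1 and 2
C = A.copy(); C[:, 1] *= 3              # Type-2: multiply column 2 by 3
M = A.copy(); M[0] += 4 * M[2]          # Type-3: add 4 times row 3 to row 1
print(B, C, M, sep="\n\n")
```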

3. 2. 2. Elementary Matrix

Definition 3. 2. 2. An n × n elementary matrix is a matrix obtained by performing


elementary operation on I n , where I n is identity matrix.
The elementary matrix is said to be of Type 1, 2, or 3 according to whether the
elementary operation performed on I n is a Type 1, 2 or 3 operation, respectively.
49
Example - 2. Interchanging the first two rows of I_3 produces the elementary matrix

    E = [ 0  1  0 ]
        [ 1  0  0 ]
        [ 0  0  1 ].

Note that E can also be obtained by interchanging the first two columns of I_3. In fact, any elementary matrix can be obtained in at least two ways – either by performing an elementary row operation on I_n or by performing an elementary column operation on I_n. Similarly,

    [ 1  0  −2 ]
    [ 0  1   0 ]
    [ 0  0   1 ]

is an elementary matrix, since it can be obtained from I_3 by an elementary column operation of Type-3 (adding −2 times the first column of I_3 to the third column) or by an elementary row operation of Type-3 (adding −2 times the third row to the first row).

3. 3. The Rank of a Matrix and Matrix Inverses

Our first theorem shows that performing an elementary row operation on a matrix
is equivalent to multiplying the matrix by an elementary matrix.
Theorem 3. 3. 1. Let A∈M mxn (F), and suppose that B is obtained from A by performing
an elementary row [column] operation. Then there exists an m × m [n × n] elementary
matrix E such that B = EA [B = AE].

In fact, E is obtained from I_m [I_n] by performing the same elementary row [column] operation as that which was performed on A to obtain B. Conversely, if E is an elementary m × m [n × n] matrix, then EA [AE] is the matrix obtained from A by performing the same elementary row [column] operation as that which produces E from I_m [I_n].
The proof, which we omit, requires verifying the above theorem for each type of
elementary row operation. The proof for column operations can then be obtained by
using the matrix transpose to transform a column operation into a row operation. The
next example illustrates the use of the above theorem.

50
Example - 3. Consider the matrices A and B in Example-1. In this case, B is obtained
from A by interchanging the first two rows of A. Performing this same operation on I 3 ,
we obtain the elementary matrix (as in Example-2)
    E = [ 0  1  0 ]
        [ 1  0  0 ]
        [ 0  0  1 ]
Note that EA = B.
In the second part of Example -1, C is obtained from A by multiplying the second
column of A by 3. Performing this same operation on I 4 , we obtain the elementary matrix
    E = [ 1  0  0  0 ]
        [ 0  3  0  0 ]
        [ 0  0  1  0 ]
        [ 0  0  0  1 ]
Observe that AE = C. It is a useful fact that the inverse of an elementary matrix is also an
elementary matrix.

Theorem 3. 3. 2. Elementary matrices are invertible, and the inverse of an elementary


matrix is an elementary matrix of the same type.
Proof. Let E be an elementary n × n matrix. Then E can be obtained by an elementary
row operation on I n . By reversing the steps used to transform I n into E, we can transform
E back into I n . The result is that I n can be obtained from E by an elementary row
operation of the same type. By Theorem 3.3.1, there is an elementary matrix E′ such that E′E = I_n. Therefore, E is invertible and E^{−1} = E′.
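A small NumPy check of these facts (an illustration of ours; E_row and E_col are the elementary matrices from the examples above):

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, 1, -1, 3],
              [4, 0, 1, 2]])

E_row = np.eye(3)[[1, 0, 2]]          # interchange the first two rows of I_3
E_col = np.eye(4); E_col[1, 1] = 3    # multiply the second column of I_4 by 3

print(np.array_equal(E_row @ A, A[[1, 0, 2]]))       # True: EA performs the row operation
print(np.allclose((A @ E_col)[:, 1], 3 * A[:, 1]))   # True: AE performs the column operation
print(np.linalg.inv(E_row))           # equals E_row itself: again an elementary matrix
```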

3. 3 .1. Rank of a Matrix


Definition 3. 3. 1. If A ∈ M mxn (F), we define the rank of A, denoted rank (A), to be the
rank of the linear transformation T A : F n → F m .

Note.
1. Many results about the rank of a matrix follow immediately from the
corresponding facts about a linear transformation. An important result of this
type is that an n × n matrix is invertible if and only if its rank is n.

51
2. Every matrix A is the matrix representation of the left-multiplication transformation T_A with respect to the appropriate standard ordered bases, so the rank of T_A is the same as the rank of one of its matrix representations, namely A. The next note extends this fact to any matrix representation of any linear transformation defined on finite-dimensional vector spaces.

3. Let T: V → W be a linear transformation between finite-dimensional vector spaces


over a field F, and let A and B be ordered bases for V and W respectively. Then
rank(T) = rank([T]_{B,A}).

4. Let A be an m × n matrix. If P and Q are invertible m × m and n × n matrices,


respectively, then

(i) rank (AQ) = rank (A)


(ii) rank (PA ) = rank (A), and therefore
(iii) rank (PAQ) = rank(A)

5. Elementary row and column operations on a matrix are rank preserving.

Theorem 3. 3. 3. The rank of any matrix equals the maximum number of its linearly
independent columns; that is, the rank of a matrix is the dimension of the subspace
generated by its columns.
Proof. For any A ∈ M_{m×n}(F), rank(A) = rank(T_A) = dim(range(T_A)). Let ε = (e_1, e_2, . . . , e_n) be the standard ordered basis for F^n; then ε spans F^n. Recall that if T : V → W is a linear transformation and (u_1, u_2, . . . , u_n) is a basis for V, then

    range(T) = span({T(u_1), T(u_2), . . . , T(u_n)}).

In particular, range(T_A) = span({T_A(e_1), T_A(e_2), . . . , T_A(e_n)}).

But, for any j, we see that T_A(e_j) = Ae_j = a_j, where a_j is the jth column of A.
Hence range(T_A) = span({a_1, a_2, a_3, . . . , a_n}).
Thus, rank(A) = dim(range(T_A)) = dim(span{a_1, a_2, a_3, . . . , a_n}).

Example - 1. Let  A = [ 1  0  1 ]
                      [ 0  1  1 ]
                      [ 1  0  1 ].

52
Observe that the first and second columns of A are linearly independent and that the third
column is a linear combination of the first two.
Thus, rank(A) = dim( span{ (1, 0, 1)^T, (0, 1, 0)^T, (1, 1, 1)^T } ) = 2.
To compute the rank of a matrix A, it is frequently useful to postpone the use of
above theorem until A has been suitably modified by means of appropriate elementary
row and column operations so that the number of linearly independent columns is
obvious.
By Note 2, guarantees that the rank of the modified matrix is the same as the rank
of A. One such modification of A can be obtained by using elementary row and column
operations to introduce zero entries. The next example illustrates this procedure.

Example - 2. Let  A = [ 1  2  1 ]
                      [ 1  0  3 ]
                      [ 1  1  2 ].
If we subtract the first row of A from rows 2 and 3 (Type-3 elementary row operations), the result is

    [ 1   2  1 ]
    [ 0  −2  2 ]
    [ 0  −1  1 ].

If we now subtract twice the first column from the second and subtract the first column from the third (Type-3 elementary column operations), we obtain

    [ 1   0  0 ]
    [ 0  −2  2 ]
    [ 0  −1  1 ].
It is now obvious that the maximum number of linearly independent columns of this
matrix is 2. Hence the rank of A is 2.
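The conclusion can be checked directly (a NumPy one-liner of ours):

```python
import numpy as np

A = np.array([[1, 2, 1],
              [1, 0, 3],
              [1, 1, 2]])
print(np.linalg.matrix_rank(A))   # 2, agreeing with the reduction above
```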

Example -3.

(i) Let  A = [ 1  2   1  1 ]
             [ 2  1  −1  1 ].

Note that the first and second rows of A are linearly independent since one is not a
multiple of the other. Thus rank (A) = 2.
(ii) Let  A = [ 1  3  1  1 ]
              [ 1  0  1  1 ]
              [ 0  3  0  0 ].

53
In this case, there are several ways to proceed. Suppose that we begin with an
elementary row operation to obtain a zero in the 2, 1 position. Subtracting the first
row from the second row, we obtain

    [ 1   3  1  1 ]
    [ 0  −3  0  0 ]
    [ 0   3  0  0 ].
Note that the third row is a multiple of the second row, and the first and second rows
are linearly independent. Thus rank (A) = 2.
(iii) Let  A = [ 1   2  3  1 ]
               [ 2   1  1  1 ]
               [ 1  −1  1  0 ].
Using elementary row operations, we can transform A as follows:

    A → [ 1   2   3   1 ]   →   [ 1   2   3   1 ]
        [ 0  −3  −5  −1 ]       [ 0  −3  −5  −1 ]
        [ 0  −3  −2  −1 ]       [ 0   0   3   0 ].

It is clear that the last matrix has three linearly independent rows and hence has
rank 3.

3. 3. 2. Augmented matrix

Definition 3. 3. 2. Let A and B be m × n and m × p matrices, respectively. By the


augmented matrix (A|B), we mean the m × (n+p) matrix (A|B), that is, the matrix whose
first n columns are the columns of A, and whose last p columns are the columns of B.
Illustrative Example - 4. Determine whether the matrix  A = [ 0  2  4 ]
                                                            [ 2  4  2 ]
                                                            [ 3  3  1 ]
is invertible or not. Also, find A^{−1}.

Solution. First, we attempt to use elementary row operations to transform


    (A | I) = [ 0  2  4 | 1  0  0 ]
              [ 2  4  2 | 0  1  0 ]
              [ 3  3  1 | 0  0  1 ]
into a matrix of the form (I|B). One method for accomplishing this transformation is to
change each column of A successively, beginning with the first column, into the
corresponding column of I. Since we need a nonzero entry in the 1, 1 position, we
begin by interchanging rows 1 and 2.

54
The result is
    [ 2  4  2 | 0  1  0 ]
    [ 0  2  4 | 1  0  0 ]
    [ 3  3  1 | 0  0  1 ].
In order to place a 1 in the 1,1 position, we must multiply the first row by 1/2; this operation yields
    [ 1  2  1 | 0  1/2  0 ]
    [ 0  2  4 | 1   0   0 ]
    [ 3  3  1 | 0   0   1 ].
We now complete work in the first column by adding −3 times row 1 to row 3 to obtain
    [ 1   2   1 | 0   1/2   0 ]
    [ 0   2   4 | 1    0    0 ]
    [ 0  −3  −2 | 0  −3/2   1 ].
In order to change the second column of the preceding matrix into the second column of
I, we multiply row 2 by ½ to obtain a 1 in the 2, 2 position. This operation produces
    [ 1   2   1 |  0    1/2   0 ]
    [ 0   1   2 | 1/2    0    0 ]
    [ 0  −3  −2 |  0   −3/2   1 ].

We now complete our work on the second column by adding −2 times row 2 to row 1 and 3 times row 2 to row 3. The result is

    [ 1  0  −3 |  −1    1/2   0 ]
    [ 0  1   2 | 1/2     0    0 ]
    [ 0  0   4 | 3/2   −3/2   1 ].

Only the third column remains to be changed. In order to place a 1 in the 3, 3 position,
we multiply row 3 by ¼ ; this operation yields
    [ 1  0  −3 |  −1    1/2    0  ]
    [ 0  1   2 | 1/2     0     0  ]
    [ 0  0   1 | 3/8   −3/8   1/4 ].

Adding appropriate multiples of row 3 to rows 1 and 2 completes the process and gives

    [ 1  0  0 |  1/8   −5/8   3/4 ]
    [ 0  1  0 | −1/4    3/4  −1/2 ]
    [ 0  0  1 |  3/8   −3/8   1/4 ].

Thus A is invertible, and the inverse of A is

    A^{−1} = [  1/8  −5/8   3/4 ]
             [ −1/4   3/4  −1/2 ]
             [  3/8  −3/8   1/4 ].
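The same elimination can be scripted; below is a small Gauss–Jordan sketch in NumPy (the function name and the use of partial pivoting are our own choices, not part of the text):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix (A | I) to (I | inverse of A)."""
    n = len(A)
    M = np.hstack([A.astype(float), np.eye(n)])
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))      # choose a nonzero pivot in column j
        if np.isclose(M[p, j], 0.0):
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                    # Type-1: interchange rows
        M[j] /= M[j, j]                          # Type-2: scale the pivot row to get a 1
        for i in range(n):                       # Type-3: clear the rest of column j
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]

A = np.array([[0, 2, 4],
              [2, 4, 2],
              [3, 3, 1]])
print(gauss_jordan_inverse(A))
# [[ 0.125 -0.625  0.75 ]
#  [-0.25   0.75  -0.5  ]
#  [ 0.375 -0.375  0.25 ]]   i.e. [1/8 -5/8 3/4; -1/4 3/4 -1/2; 3/8 -3/8 1/4]
```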

Illustrative Example - 5. Determine whether the following matrix is invertible or not.

    A = [ 1  2   1 ]
        [ 2  1  −1 ]
        [ 1  5   4 ]
Solution. Using the strategy similar to the one used in the above example,
we attempt to use elementary row operations to transform
    (A | I) = [ 1  2   1 | 1  0  0 ]
              [ 2  1  −1 | 0  1  0 ]
              [ 1  5   4 | 0  0  1 ]
into a matrix of the form (I | B). We first add −2 times row 1 to row 2 and −1 times row 1 to row 3, and then add row 2 to row 3. The result,

    [ 1   2   1 |  1  0  0 ]        [ 1   2   1 |  1  0  0 ]
    [ 0  −3  −3 | −2  1  0 ]   ⇒    [ 0  −3  −3 | −2  1  0 ],
    [ 0   3   3 | −1  0  1 ]        [ 0   0   0 | −3  1  1 ]

is a matrix with a row whose first 3 entries are zeros. Therefore A is not invertible.

3. 4. Systems of Linear Equations

Definition 3. 4. 1. The system of linear equations


a 11 x 1 + a 12 x 2 + . . . . . . . . + a 1n x n = b 1
a 21 x 1 + a 22 x 2 + . . . .. . . . + a 2n x n = b 2
:
a m1 x 1 + a m2 x 2 + . . . . . . . + a mn x n = b m
where a ij and b i (1 ≤ i ≤ m and 1 ≤ j ≤ n ) are scalars in a field F and x 1 , x 2 , x 3, . .. . ,x n
are n variables taking values in F, is called a system of m linear equations in n unknowns
over F.

Definition 3. 4. 2. The m × n matrix

    [ a_11  a_12  …  a_1n ]
    [ a_21  a_22  …  a_2n ]
    [   ⋮     ⋮          ⋮ ]
    [ a_m1  a_m2  …  a_mn ]

is called the coefficient matrix of the system.
Note.
1. If we let x = (x_1, x_2, . . . , x_n)^T and b = (b_1, b_2, . . . , b_m)^T, then the system may be rewritten as a single matrix equation Ax = b.
2. Viewing the system as a single matrix equation, a solution to the system is an n-tuple

    s = (s_1, s_2, . . . , s_n)^T ∈ F^n

such that As = b. The set of all solutions to the system of equations is called the solution set of the system. A system of equations is called consistent if its solution set is nonempty; otherwise it is called inconsistent.

Illustrative Example - 1. Solve the system of equation x 1 + x 2 = 3 and x 1 – x 2 = 1


Solution. By use of familiar techniques, we can solve the preceding system and conclude that there is only one solution: x_1 = 2, x_2 = 1; that is, s = (2, 1)^T. In matrix form, the system can be written as Ax = b with

    A = [ 1   1 ]   and   b = [ 3 ]
        [ 1  −1 ]             [ 1 ].

Illustrative Example -2. Solve the system of equation 2x 1 + 3x 2 + x 3 = 1 and


x 1 – x 2 + 2x 3 = 6.

Solution. In matrix form, the system can be written as

    [ 2   3  1 ] [ x_1 ]   [ 1 ]
    [ 1  −1  2 ] [ x_2 ] = [ 6 ].
                 [ x_3 ]

This system has many solutions, such as s = (−6, 2, 7)^T and s = (8, −4, −3)^T.

57
Illustrative Example -3. Solve the system of equation x 1 + x 2 = 0 and x 1 + x 2 = 1.

Solution. In matrix form, the system can be written as

    [ 1  1 ] [ x_1 ]   [ 0 ]
    [ 1  1 ] [ x_2 ] = [ 1 ],

and it is
evident that this system has no solutions. Thus we see that a system of linear equations
can have one, many, or no solutions.
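Consistency can also be tested computationally. One standard criterion (not stated explicitly in this unit) is that Ax = b is consistent exactly when rank(A) = rank(A | b); the NumPy sketch below applies it to the first and third examples:

```python
import numpy as np

def is_consistent(A, b):
    """Ax = b has a solution iff rank(A) equals the rank of the augmented matrix (A | b)."""
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

A1, b1 = np.array([[1., 1.], [1., -1.]]), np.array([3., 1.])
A3, b3 = np.array([[1., 1.], [1., 1.]]), np.array([0., 1.])
print(is_consistent(A1, b1), np.linalg.solve(A1, b1))   # True [2. 1.]
print(is_consistent(A3, b3))                            # False: the system is inconsistent
```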

3. 4. 1. Homogeneous system of equation


Definition 3. 4. 3. A system Ax = b of m liner equations in n unknowns is said to be
homogeneous if b = 0. Otherwise the system is said to be non homogeneous.
Note.
1. Let Ax = 0 be a homogeneous system of m linear equations in n unknowns over a field F. Let K denote the set of all solutions to Ax = 0. Then K = N(T_A), the null space of T_A; hence K is a subspace of F^n of dimension n − rank(T_A) = n − rank(A).
2. If m<n , the system Ax=0 has a nonzero solution.
Based on above note we illustrate the following examples.
Example - 4.
(i) Consider the system of equation x 1 + 2x 2 + x 3 = 0 and x 1 – x 2 – x 3 = 0.

1 2 1
Let A= � � be the coefficient matrix of this system.
1 -1 -1
It is clear that rank (A) = 2. If K is the solution set of this system, then
dim (K) = 3 – 2 = 1. Thus any nonzero solution constitutes a basis for K.
1
For example, since � −2 � is a solution to the given system,
3
1
�� −2 �� is a basis for K.
3
1 t
Thus any vector in K is of the form t � -2 � = � -2t � , where t ∈ R.
3 3t
(ii) Consider the system x 1 – 2x 2 + x 3 = 0 of one equation in three unknowns.
If A = (1 –2 1) is the coefficient matrix, then rank (A) = 1.

Hence if K is the solution set, then dim (K) = 3 – 1 = 2.

58
Note that (2, 1, 0)^T and (−1, 0, 1)^T are linearly independent vectors in K.
Thus they constitute a basis for K, so that

    K = { t_1 (2, 1, 0)^T + t_2 (−1, 0, 1)^T : t_1, t_2 ∈ R }.
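A basis for K can also be computed numerically, for instance from the singular value decomposition of A; the sketch below (our own illustration) recovers dim(K) = n − rank(A) for Example (i):

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Columns of the returned matrix form a basis of K = {x : Ax = 0}."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T                  # dim(K) = n - rank(A)

A = np.array([[1., 2., 1.],
              [1., -1., -1.]])
K = null_space_basis(A)
print(K.shape[1])                       # 1
print(np.allclose(A @ K, 0))            # True: every column solves Ax = 0
```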

Theorem 3. 4. 1. Let K be the solution set of a system of linear equations Ax = b, and let K_H be the solution set of the corresponding homogeneous system Ax = 0. Then, for any solution s to Ax = b,

    K = {s} + K_H = { s + k : k ∈ K_H }.
Proof. Let s be any solution to Ax = b. We must show that K = {s} + K H . If w ∈ K,
then Aw = b. Hence A(w – s ) = Aw – As = b – b = 0. So (w – s) ∈ K H . Thus there
exists k ∈ K H such that (w – s) = k. It follows that w = (s + k) ∈ {s} + K H , and therefore
K is a subset of {s} + K H .
Conversely, suppose that w ∈ {s} + K H ; then w = (s + k ) for some k ∈ K H . But
then, Aw = A(s +k) = As + Ak = b + 0 = b; so w ∈ K. Therefore {s} + K H is a subset of K.
Thus, K = {s} + K_H.
Note. Let Ax = b be a system of n linear equations in n unknowns. If A is invertible, then the system has exactly one solution, namely A^{−1}b. Conversely, if the system has exactly one solution, then A is invertible.

3. 5. Summary

1. This unit is devoted to two related objectives:

(i) The study of certain “rank-preserving” operations on matrices;


(ii) The application of these operations and the theory of linear transformations to the
solution of systems of linear equations.
As a consequence of Objective – (i), we obtain a simple method for computing
the rank of a linear transformation between finite-dimensional vector spaces by
applying these rank-preserving matrix operations to a matrix that represents that
transformation.

59
2. In the above unit, we define the rank of a matrix. We then use elementary operations
to compute the rank of a matrix and a linear transformation. We have remarked that
an n x n matrix is invertible if and only if its rank is n. Since we know how to
compute the rank of any matrix, we can always test a matrix to determine whether it
is invertible. The section concludes with a procedure for computing the inverse of an
invertible matrix.
3. Solving a system of linear equations is probably the most important application of
linear algebra. The familiar method of elimination for solving systems of linear
equation, involves the elimination of variables so that a simpler system can be
obtained. The technique by which the variables are eliminated utilizes three types of
operations:

(i) Interchanging any two equations in the system;


(ii) Multiplying any equation in the system by a nonzero constant;
(iii) Adding a multiple of one equation to another.

3. 6. Keywords

Augmented matrix Elementary row operation


Coefficient of a linear equations Inverse of matrix
Consistent of linear equations Matrix
Elementary column operation Rank of a matrix
Elementary matrix System of linear equations
Elementary operation Type- 1, -2 and -3 operations

3. 7. Assessment Questions

1. Define an elementary row operation. Explain the Three types of operation with an
example.
Hint. See the section. 3. 2. 1.
2. Explain the role of augmented matrix for determine the inverse matrix of the given
matrix.
60
Hint. See the section 3. 3. 2
3. Find the inverse of the matrix  A = [ 1  2  3 ]
                                       [ 1  3  4 ]
                                       [ 1  4  4 ].

   Answer.  A^{−1} = [  4  −4   1 ]
                     [  0  −1   1 ]
                     [ −1   2  −1 ].

4. Solve the system of equation


x 1 + 2x 2 + 2x 3 = 4
x 1 + 3 x 2 +3 x 3 = 5
2x 1 + 6x 2 + 5x 3 = 6
Answer. (x 1 , x 2 , x 3 ) = (2, −3, 4).
5. Explain the difference between homogeneous and non-homogeneous systems of equations with a suitable example.
Hint. See the section 3. 4. 1.

3. 8. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. P. R. Halmos – Finite Dimensional Vector Space, D. Van Nostrand, 1958.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,

61
UNIT- 4: PROPERTIES OF DETERMINANT, COFACTOR
EXPANSIONS AND CRAMER’S RULE

STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. Properties of determinant
4. 2. 1. Fundamentals of determinants
4. 3. Cofactor Expansions
4. 3. 1. Minors and Cofactors
4. 4. Elementary operations and Cramer’s rule
4. 4. 1. The Adjoint Matrix
4. 5. Summary
4. 6. Keywords
4. 7. Assessment Questions
4. 8. References

62
UNIT- 4: PROPERTIES OF DETERMINANT, COFACTOR
EXPANSIONS AND CRAMER’S RULE

4. 0. Objectives

After working through this unit, the reader should be able to:

• Define permutations and determinants.
• Express the determinant of a given n × n matrix.
• Discuss the properties of determinants and, in particular, use them to solve problems.
• Find the minors and cofactors of a given determinant.
• Find the inverse of a matrix with the help of the adjoint matrix.
• Solve a given system of equations by using Cramer's rule.

4. 1. Introduction

An exposition of the theory of determinants independent of their relation to the


solvability of linear equations was first given by Vandermonde in his “Memoir on
elimination theory” of 1772. (The word “determinant” was used for the first time by
Gauss, in 1801, to stand for the discriminant of a quadratic form, where the discriminant
of the form ax2 + bxy + cy2 is b2 − 4ac.) Laplace extended some of Vandermonde’s work
in his Researches on the Integral Calculus and the System of the World (1772), showing
how to expand n × n determinants by cofactors.
The determinant of a square matrix is a value computed from the elements of the
matrix by certain, equivalent rules. The determinant provides important information when
the matrix consists of the coefficients of a system of linear equations, and when it
describes a linear transformation: in the first case the system has a unique solution if and
only if the determinant is nonzero, in the second case that same condition means that the
transformation has an inverse operation. So this unit deals with some important properties
of determinants, cofactor expansions and Cramer’s rule.
63
4. 2. Properties of Determinants

4. 2. 1. Fundamentals of determinants

Definition 4. 2. 1. A permutation of a set {x_1, x_2, …, x_n} is simply an arrangement of the elements of the set into a particular sequence or order. Although the number of interchanges used to carry a permutation j_1, j_2, …, j_n of {1, 2, …, n} into the natural ordering 1, 2, …, n is not always the same, that number is either always odd or always even. Hence the sign (−1)^t of each term in the definition below is well defined.

Definition 4. 2. 2. The determinant of the square matrix A = [a_ij] over F is the scalar det(A) defined by

    det(A) = Σ_(j) (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{n j_n},

where Σ_(j) denotes the sum of all terms of the form (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{n j_n} as j_1, j_2, …, j_n assumes all possible permutations of the numbers 1, 2, …, n, and the exponent t is the number of interchanges used to carry j_1, j_2, …, j_n into the natural ordering 1, 2, …, n.

The notations det(A) and |A| are used interchangeably. When the elements of A = [a_ij]_n are written as a rectangular array, we have

    det(A) = |A| = | a_11  a_12  …  a_1n |
                   | a_21  a_22  …  a_2n |
                   |   ⋮     ⋮          ⋮ |
                   | a_n1  a_n2  …  a_nn |,

and det(A) is uniquely determined by A.

We observe that there are n! terms in the sum det(A) since there are n! possible
ordering of 1, 2, …, n. The determinant of n × n matrix is referred to as n × n determinant,
or a determinant of order n.

Note.
1. Determinants of order 1 and 2.

64
That is, |a_11| = a_11 and | a_11 a_12 ; a_21 a_22 | = a_11 a_22 − a_12 a_21.

2. Determinants of order 3.

Consider a 3×3 matrix  A = [ a_11  a_12  a_13 ]
                           [ a_21  a_22  a_23 ]
                           [ a_31  a_32  a_33 ].

By the definition,

    det(A) = (−1)^{t_1} a_11 a_22 a_33 + (−1)^{t_2} a_11 a_23 a_32 + (−1)^{t_3} a_12 a_23 a_31
           + (−1)^{t_4} a_12 a_21 a_33 + (−1)^{t_5} a_13 a_22 a_31 + (−1)^{t_6} a_13 a_21 a_32.


Since 1, 2, 3 is the natural ordering, we may take t 1 =0. Since 1, 3, 2 can be carried
into 1, 2, 3 by the single interchange of 2 and 3, we may take t_2 = 1. The ordering 2, 3, 1 can be carried into 1, 2, 3 by an interchange of 2 and 1, followed by an interchange of 2 and 3. Thus we may take t_3 = 2.
By the same method, we find t 4 =1, t 5 =1 and t 6 =2.

Hence, det( A) = a11a22 a33 − a11a23a32 + a12 a23a31 − a12 a21a33 − a13a22 a31 + a13a21a32 .

2 1 1   3 2 1  1 2 −1
Let A  0
Examples - 1.= , 
5 −2  B =
 and  
  −4 5 −1 C=  −2 0 7 .
1 
−3 4     3 7 
  2 −3 4   0

Then the by note-2, we have det (A) = 21, det (B) = 81 and det(C) = –70.
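These values are easy to confirm numerically; for instance (a NumPy check of ours for A and B):

```python
import numpy as np

A = np.array([[2, 1, 1], [0, 5, -2], [1, -3, 4]])
B = np.array([[3, 2, 1], [-4, 5, -1], [2, -3, 4]])
print(round(np.linalg.det(A)), round(np.linalg.det(B)))   # 21 81
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))   # True: det(A^T) = det(A)
```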

Theorem 4. 2. 1. For any matrix A = [a_ij],

    det(A) = Σ_(i) (−1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n},

where Σ_(i) denotes the sum over all possible permutations i_1, i_2, …, i_n of 1, 2, …, n, and s is the number of interchanges used to carry i_1, i_2, …, i_n into the natural ordering.

Proof. Let S = Σ_(i) (−1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n}. Now both S and det(A) have n! terms. Except possibly for sign, each term of S is a term of det(A), and each term of det(A) is a term of S. Thus, S and det(A) consist of the same terms, with a possible difference in sign.

Consider a certain term (−1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n} of S, and let (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{n j_n} be the corresponding term in det(A). Then (−1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n} can be carried into (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{n j_n} by s
interchanges of factors since the permutation i 1 , i 2 ,…,i n can be changed into the natural
ordering 1, 2, …, n by s interchanges of elements. This means that the natural ordering
1, 2, …, n can be changed into the permutations j 1 , j 2 , …, j n by s interchanges since the
column subscripts have been interchanged each time the factors were interchanged. But
j 1 , j 2 , …, j n can be carried into 1, 2, …, n by t interchanges, by the definition of det(A).
Thus 1, 2, …, n can be carried into j 1 , j 2 , …, j n and then back into itself by (s+t)
interchanges. Since 1, 2, …, n can be carried into itself by an even number(zero) of
interchanges, (s+t) is even, because the number of interchanges used to a carry a
permutation j 1 , j 2 , …, j n of {1, 2, …, n } into the natural ordering is either always odd or
even. Therefore (–1)s+t = 1 and (–1)s = (–1)t. Now we have the corresponding terms in
det (A) and S with the same sign, and therefore det(A) = S.

Theorem 4. 2. 2. If A = [a ij ] n , then det (AT) = det (A).


Proof. Let B=AT, so that b ij =a ji for all pairs i,j.

Thus det(B) = Σ_(j) (−1)^t b_{1 j_1} b_{2 j_2} ··· b_{n j_n} by the definition of det(B), and

    det(B) = Σ_(j) (−1)^t a_{j_1 1} a_{j_2 2} ··· a_{j_n n}.

Therefore det (B) = det(A), by theorem 4. 2. 1.

1 2 3 1 2 3
Example - 2. Let A =  2 1 3
 and T  
A =  2 1 1  be two matrices.

3 1 2   3 3 2
  
Then det (AT) = det (A) = 6.

Theorem 4. 2. 3. If B results from matrix A by interchanging two rows (columns) of A,


then det (B) = –det (A).
Proof. Suppose that B arises from A by interchanging rows r and s of A, say r < s. Then
we have b rj =a sj , b sj =a rj and b ij =a ij for i≠r, i≠s. Now

    det(B) = Σ_(j) (−1)^t b_{1 j_1} b_{2 j_2} ··· b_{r j_r} ··· b_{s j_s} ··· b_{n j_n}

           = Σ_(j) (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{s j_r} ··· a_{r j_s} ··· a_{n j_n}

           = Σ_(j) (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{r j_s} ··· a_{s j_r} ··· a_{n j_n}.

The permutation j 1 , j 2 … j s … j r … j n results from the permutation j 1 , j 2 … j r … j s … j n by


an interchange of two numbers; the number of inversions in the former differs by an odd
number from the number of inversions in the latter. This means that the sign of each term
in det (B) is the negative of the sign of the corresponding term in det (A).
Hence det (B) = –det (A).
Now suppose that B is obtained from A by interchanging two columns of A. Then BT is
obtained from AT by interchanging two rows of AT. So det (BT) = –det (AT), but
det (BT) = det (B) and det (AT) = det (A). Hence det (B) = – det (A).

 2 −1 3 2
Example -3. Let A =   and B =   . Then det (B) = – det (A) = –7.
3 2  2 −1

Theorem 4. 2. 4. If a matrix A = [a ij ] n is upper (lower) triangular, then


det (A) = a 11 a 22 . . . .a nn .
Proof. Let A=[a ij ] be upper triangular (that is a ij =0 for i>j). Then a term a1 j1 , a2 j2 ...., anjn

in the expression for det (A) can be nonzero only for 1≤ j 1 , 2≤ j 2 , . . .,n ≤ j n . Now, j 1 , j 2 ,
…, j n must be a permutation, or rearrangement, of {1, 2, …,n}. Hence, we must have
j n =n, j n–1 = n–1,…., j 2 =2, j 1 =1. Thus the only term of det (A) that can be nonzero is the
product of the elements on the main diagonal of A. Since the permutation 1, 2,….,n has
no inversions, the sign associated with it is +.
Therefore, det (A) =a 11 a 22 . . . .a nn .
Similarly, we can prove the lower triangular case.

4. 3. Cofactor Expansions

4. 3. 1. Minors and Cofactor

67
Definition 4. 3. 1. Let A=[a ij ] n be an n × n matrix. Let M ij be the (n–1) × (n–1)
submatrix of A obtained by deleting the ith row and jth column of A. The det(M ij ) is
called the Minor of a ij . The cofactor A ij of a ij is defined as A ij = (–1)i+j det(M ij ).

3 −1 2 
Example - 1. Let A =  4 
5 6  . Then

7 1 2 

4 6
det(M12 ) = = 8 – 42 = –32,
7 2

3 −1
det(M 23 ) = = 3 + 7=10, and
7 1

−1 2
det(M31 ) = = – 6 –10 = –16.
5 6
Also we have, A 12 = (–1)1+2 det (M 12 ) = (–1)( –34) = 34,

A 23 = (–1)2+3 det (M 23 ) = (–1)( 10) = –10, and

A 31 = (–1)3+2 det (M 31 ) = (1)( –16) = –16.


The expression given in the following theorem is referred to as an "expansion by cofactors" or, more precisely, as the "expansion about the ith row". This expansion is the main result of this section.

Theorem 4. 3. 2. If A=[a ij ] n , then det (A) = a i1 A i1 + a i2 A i2 + . . . . + a in A in .


Proof. For a fixed integer i, we collect all of the terms in the sum

    det(A) = Σ_(j) (−1)^t a_{1 j_1} a_{2 j_2} ··· a_{n j_n}

that contain a_i1 as a factor in one group, all of the terms

that contains a i2 as a factor in another group, and so on for each column number. This
separates the terms in det(A) into n groups with no overlapping, since each term contains exactly one factor from row i. In each of the terms containing a_i1, we factor out a_i1 and
let F i1 denote the remaining factor. Repeating this process for each a i1 a i2 . . . .a in in
turn, we obtain det(A) = a_i1 F_i1 + a_i2 F_i2 + . . . . + a_in F_in. To finish the proof, we need only show that F_ij = A_ij = (−1)^{i+j} det(M_ij), where det(M_ij) is the minor of a_ij. Consider first the case where i = 1 and j = 1.
We shall show that a_11 F_11 = a_11 det(M_11). Each term in F_11 was obtained by factoring a_11 from a term (−1)^{t_1} a_11 a_{2 j_2} ··· a_{n j_n} in the expansion of det(A). Thus each term of F_11 has the form (−1)^{t_1} a_{2 j_2} a_{3 j_3} ··· a_{n j_n}, where t_1 is the number of interchanges used to carry 1, j_2, …, j_n into 1, 2, …, n, while each term of det(M_11) has the form (−1)^{t_2} a_{2 j_2} a_{3 j_3} ··· a_{n j_n}, where t_2 is the number of interchanges used to carry j_2, …, j_n into 2, 3, …, n. Letting j_2, …, j_n range over all permutations of 2, 3, …, n, we see that each of F_11 and det(M_11) has (n−1)! terms. Now 1, j_2, …, j_n can be carried into the natural ordering by the same interchanges used to carry j_2, …, j_n into 2, 3, …, n. That is, we may take t_1 = t_2. This means that F_11 and det(M_11) have exactly the same terms, yielding F_11 = det(M_11) and a_11 F_11 = a_11 det(M_11). Consider now an arbitrary a_ij. By (i−1) interchanges of the original row i with the adjacent row above and then (j−1) interchanges of column j with the adjacent column on the left, we obtain a matrix B that has a_ij in the first row, first column position. Since the order of the remaining rows and columns of A was not changed, the minor of a_ij in B is the same det(M_ij) as it is in A. Since interchanging two rows (columns) changes the sign of the determinant, det(B) = (−1)^{i−1+j−1} det(A) = (−1)^{i+j} det(A). This gives det(A) = (−1)^{i+j} det(B). The sum of all the terms in det(B) that contain a_ij as a factor is a_ij det(M_ij), from our first case. Since det(A) = (−1)^{i+j} det(B), the sum of all the terms in det(A) that contain a_ij as a factor is (−1)^{i+j} a_ij det(M_ij). Thus a_ij F_ij = (−1)^{i+j} a_ij det(M_ij) = a_ij A_ij, and the theorem is proved.

The dual statement for above theorem as follows.


Theorem 4. 3. 3. If A=[a ij ] n , then det(A) = a 1j A 1j + a 2j A 2j + . . . . + a nj A nj .

Note. If A = [a_ij]_3, the expansion of det(A) about the 2nd row is given by

    |A| = | a_11  a_12  a_13 |
          | a_21  a_22  a_23 | = a_21 A_21 + a_22 A_22 + a_23 A_23
          | a_31  a_32  a_33 |

        = −a_21 (a_12 a_33 − a_13 a_32) + a_22 (a_11 a_33 − a_13 a_31) − a_23 (a_11 a_32 − a_12 a_31).

69
4. 4. Elementary Operations and Cramer Rules

For an elementary operation, which are based on the properties of determinant


and cofactor expansion, we have the following results
1. If two rows (columns) of A are equal, then det (A) = 0
2. If a row(column) of A consists entirely of zeros, then det (A)=0
3. If B is obtained from A by multiplying a row (column) of A by a real number k,
then det (B) = k det (A).
4. If B = [b ij ] n is obtained from A = [a ij ] n by adding to each element of the rth
row(column) of A a constant k times the corresponding element of the sth
row(column) r≠s of A, then det (B) = det (A).
5. The determinant of a product of two matrices is the product of their determinants; that is, det(AB) = det(A) det(B).
6. If A is non-singular, then det(A) ≠ 0 and det(A^{−1}) = 1/det(A).
4. 4. 1. The adjoint matrix
Definition 4. 4. 1. Let A = [a_ij] be an n×n matrix. The n×n matrix adj(A), called the adjoint of A, is the matrix whose (i, j)-th element is the cofactor A_ji of a_ji. Thus
adj(A) = [ A_11  A_21  ...  A_n1 ]
         [ A_12  A_22  ...  A_n2 ]
         [  ...   ...  ...   ... ]
         [ A_1n  A_2n  ...  A_nn ].
Note.
1. The adjoint of A is formed by taking the transpose of the matrix of cofactors of the elements of A.
2. It should be noted that the term adjoint has other meanings in linear algebra in addition to its use in the above definition.
3. If A = [a_ij] is an n×n matrix, then
(i) A adj(A) = (adj(A))A = det(A) I_n, where I_n is the identity matrix;
(ii) A^{-1} = (1/det(A)) adj(A), provided det(A) ≠ 0.
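The relations in Note 3 can be checked numerically. The following is a minimal sketch, assuming the NumPy library is available, which builds adj(A) as the transpose of the cofactor matrix and verifies A adj(A) = det(A) I_n and A^{-1} = adj(A)/det(A) on a sample matrix.

    import numpy as np

    def adjugate(A):
        # Illustrative sketch: adj(A) = transpose of the matrix of cofactors.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.zeros_like(A)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T

    A = np.array([[3.0, -2.0, 1.0], [5.0, 6.0, 2.0], [1.0, 0.0, -3.0]])
    adjA = adjugate(A)
    d = np.linalg.det(A)                            # -94
    print(np.allclose(A @ adjA, d * np.eye(3)))     # True
    print(np.allclose(np.linalg.inv(A), adjA / d))  # True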
Our final result of this unit makes an important connection between determinants and the solutions of certain types of systems of linear equations. The theorem presents a formula for the unknowns in terms of certain determinants. This formula is commonly known as Cramer's rule.

Theorem 4. 4. 1. Consider a system of linear equations AX = B, in which A = [a_ij]_{n×n}, X = [x_1, x_2, ..., x_n]^T, and B = [b_1, b_2, ..., b_n]^T. If det(A) ≠ 0, the unique solution of the system is given by
x_j = ( Σ_{k=1}^{n} b_k A_kj ) / det(A),   j = 1, 2, 3, ..., n.
Proof. We first show that the given values are a solution of the system. Substitution of these values for x_j into the left member of the i-th equation of the system yields
a_i1 x_1 + . . . + a_in x_n = (1/det(A)) [ a_i1 ( Σ_{k=1}^{n} b_k A_k1 ) + . . . + a_in ( Σ_{k=1}^{n} b_k A_kn ) ]
  = (1/det(A)) Σ_{j=1}^{n} Σ_{k=1}^{n} a_ij b_k A_kj
  = (1/det(A)) Σ_{k=1}^{n} [ b_k Σ_{j=1}^{n} a_ij A_kj ]
  = (1/det(A)) Σ_{k=1}^{n} b_k ( δ_ik det(A) ),  where δ_ik is the Kronecker delta
  = (1/det(A)) b_i ( δ_ii det(A) ) = b_i.
Thus, the values x_j = ( Σ_{k=1}^{n} b_k A_kj ) / det(A) furnish a solution of the system.
To prove the uniqueness, suppose that x_j = y_j, j = 1, 2, 3, ..., n, represents any solution to the system. Then the i-th equation Σ_{k=1}^{n} a_ik y_k = b_i is satisfied for i = 1, 2, ..., n.
If we multiply both members of the i-th equation by A_ij (j fixed) and form the sum of these equations, we find that
Σ_{i=1}^{n} Σ_{k=1}^{n} a_ik A_ij y_k = Σ_{i=1}^{n} b_i A_ij,  that is,  Σ_{k=1}^{n} ( Σ_{i=1}^{n} a_ik A_ij ) y_k = Σ_{i=1}^{n} b_i A_ij.
But, for each k, Σ_{i=1}^{n} a_ik A_ij = δ_kj det(A). Thus Σ_{k=1}^{n} δ_kj det(A) y_k = Σ_{i=1}^{n} b_i A_ij, and
y_j = ( Σ_{i=1}^{n} b_i A_ij ) / det(A).
Hence these y_j's are the same as the solution given in the statement of the theorem.

Note. The sum Σ_{k=1}^{n} b_k A_kj is the determinant of the matrix obtained by replacing the j-th column of A by the column of constants B = [b_1, b_2, ..., b_n]^T.

Example -1. For a system in three unknowns with det(A)≠0,

a_11 x_1 + a_12 x_2 + a_13 x_3 = b_1
a_21 x_1 + a_22 x_2 + a_23 x_3 = b_2
a_31 x_1 + a_32 x_2 + a_33 x_3 = b_3,
the solution stated in the above theorem can be written as
x_1 = det[ b_1 a_12 a_13 ; b_2 a_22 a_23 ; b_3 a_32 a_33 ] / det(A),
x_2 = det[ a_11 b_1 a_13 ; a_21 b_2 a_23 ; a_31 b_3 a_33 ] / det(A),
x_3 = det[ a_11 a_12 b_1 ; a_21 a_22 b_2 ; a_31 a_32 b_3 ] / det(A).

4. 5. Summary

1. In this unit, the fundamentals of the theory of determinants and their properties are explored. Determinants are necessary for the study of eigenvalues and eigenvectors of linear transformations. For this purpose we discussed some important theorems along with their proofs.
2. Cofactor expansion gives a method for evaluating the determinant of an n × n matrix which reduces the problem to the evaluation of determinants of matrices of order (n−1). We can then repeat the process for these (n−1) × (n−1) matrices until we get to 2 × 2 matrices. The elementary operations, which are based on the properties of determinants and cofactor expansion, give further tools for evaluation.
3. There is an important connection between determinants and the solutions of certain types of systems of linear equations. We presented Cramer's rule for finding the unknowns in terms of certain determinants.

4. 6. Keywords

Adjoint matrix, Cofactor, Cramer's rule, Determinant, Determinant of a matrix, Elementary operation, Minor, n-linear function.

4. 7. Assessment Questions

1. Write out the complete expression for det (A) if A = [a ij ] 4 .


Hint. See the section 4. 2. 1.
2. Prove or disprove each of the given statements with suitable examples:
(i) det(A + B) = det(A) + det(B)
(ii) det(A + A^T) = det(A) + det(A^T).
Hint. See the section 4. 2. 1.
3. Determine whether t is even or odd in the given term of det(A), where A = [a_ij]_n.
(i) (−1)^t a_13 a_21 a_34 a_42
(ii) (−1)^t a_14 a_21 a_33 a_42
(iii) (−1)^t a_14 a_23 a_32 a_41
(iv) (−1)^t a_12 a_24 a_31 a_43
Hint. See Theorem 4.2.1.

4. Let
A = [ 3 −2  1 ]
    [ 5  6  2 ]
    [ 1  0 −3 ].
Compute adj(A) and A^{-1}.

Answer. det(A) = −94,
adj(A) = [ −18  −6 −10 ]                 A^{-1} = (1/94) [  18   6  10 ]
         [  17 −10  −1 ]       and                       [ −17  10   1 ]
         [  −6  −2  28 ]                                 [   6   2 −28 ].

 1 2 3
5. If A =  4 5 6  , find the following minors and cofactors:
 
7 8 9
 
(i) M 23 and A 23
(ii) M 13 and A 13

Answer. (i) M 23 = – 6 and A 23 = 6, (ii) M 13 = – 3 and A 13 = – 3

6. Solve the system of linear equations by using Cramer's rule:

–2 x 1 + 3 x 2 – x 3 = 1
x1 + 2 x2 – x3 = 4
–2x 1 – x 2 + x 3 = –3.
Answer. det (A)= – 2, and x 1 =2, x 2 =3, x 3 = 4.

4. 8. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. P. R. Halmos – Finite Dimensional Vector Space, D. Van Nostrand, 1958.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.

BLOCK - II
Diagonalization and Inner Product Spaces

UNIT-1: EIGENVALUES AND EIGENVECTORS,
DIAGONALIZABILITY AND INVARIANT SUBSPACES

STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. Eigenvalues and Eigenvectors
1. 2. 1. Characteristic Polynomial
1. 3. Diagonalizability
1. 3. 1. Polynomial splits
1. 4. Invariant Subspaces
1. 4. 1. Cayley-Hamilton Theorem
1. 5. Summary
1. 6. Keywords
1. 7. Assessment Questions
1. 8. References

UNIT-1: EIGENVALUES AND EIGENVECTORS,
DIAGONALIZABILITY AND INVARIANT SUBSPACES

1. 0. Objectives

After going through this unit, the reader will be able to:

• Define eigenvalues and eigenvectors
• Find the eigenvalues by using the characteristic polynomial
• Explain the diagonalizability of a given matrix
• Discuss invariant subspaces
• Explain the Cayley-Hamilton theorem

1. 1. Introduction

Eigenvectors are a special set of vectors associated with a linear system of


equations (that is a matrix equation) that are sometimes also known as characteristic
vectors, proper vectors, or latent vectors. The determination of the eigenvectors and
eigenvalues of a system is extremely important in physics and engineering, where it is
equivalent to matrix diagonalization and arises in such common applications as stability
analysis, the physics of rotating bodies, and small oscillations of vibrating systems, to
name only a few. In 1858, A. Cayley proved the important Cayley-Hamilton theorem
that a square matrix satisfies its characteristic polynomial.

1. 2. Eigenvalues and Eigenvectors

Definition 1. 2. 1. Let T be a linear operator on a vector space V. A nonzero vector v∈


V is called an eigenvector of T if there exists a scalar λ such that T(v) = λv . The scalar λ
is called the eigenvalue corresponding to the eigenvector v.

Note.
1. Let A be in M nxn (F). A nonzero vector v∈ F n is called an eigenvector of A if v is
an eigenvector of T A ; that is, if Av = λv for some scalar λ. The scalar λ is called
the eigenvalue of A corresponding to the eigenvector v.
2. A vector is an eigenvector of a matrix A if and only if it is an eigenvector of T A .
Likewise, a scalar λ is an eigenvalue of A if and only if it is an eigenvalue of T A .

To obtain a basis of eigenvectors for a matrix (or a linear operator), we need to be


able to determine its eigenvalues and eigenvectors. The following theorem gives us a
method for finding eigenvalues.

Theorem 1. 2. 1. Let A ∈ M_{n×n}(F) be a matrix. Then a scalar λ is an eigenvalue of A if and only if det(A − λI_n) = 0.

Proof. A scalar λ is an eigenvalue of A if and only if there exists a nonzero vector v ∈ F^n such that Av = λv, that is, (A − λI_n)(v) = 0.
Since v ≠ 0, this happens if and only if the operator (A − λI_n) is not one-to-one. Recall that for a linear operator T on a finite-dimensional vector space the following are equivalent: (i) T is one-to-one, (ii) T is onto, (iii) rank(T) = dim(V).
Hence the above is true if and only if (A − λI_n) is not invertible. However, this result is equivalent to the statement that det(A − λI_n) = 0.

1. 2. 1. Characteristic Polynomial
Definition 1. 2. 2. Let A ∈ M_{n×n}(F) be a matrix. The polynomial f(t) = det(A − tI_n) is called the characteristic polynomial of A.

Note.
1. Let T be a linear operator on a n-dimensional vector space V with ordered basis B.
We define the characteristic polynomial f(t) of T to be the characteristic

polynomial of A = [T] B , which is independent of the choice of ordered basis B.

Thus if T is a linear operator on a finite-dimensional vector space V and B is an


ordered basis for V, then λ is an eigenvalue of T if and only if λ is an eigenvalue
of [T] B . The characteristic polynomial f(t) is also denoted as ∆ A (t).
2. Theorem 1. 2. 1, states that the eigenvalues of a matrix are the zeros of its
characteristic polynomial. When determining the eigenvalues of a matrix or a
linear operator, we normally compute its characteristic polynomial, as in the next
example.

Illustrative Example -1. Find the eigenvalues of
A = [ 1 1 ]
    [ 4 1 ]  ∈ M_{2×2}(R).
Solution. The characteristic polynomial of A is
det(A − tI_2) = det [ 1−t   1  ]
                    [  4   1−t ] = (1 − t)² − 4 = t² − 2t − 3 = (t − 3)(t + 1).
It follows from Theorem 1. 2. 1 that the only eigenvalues of A are 3 and −1.

Illustrative Example -2. Find the eigenvalues of the matrix
A = [ 1 1 0 ]
    [ 0 2 2 ]
    [ 0 0 3 ].
Solution. The characteristic polynomial of A is
det(A − tI_3) = (1 − t)(2 − t)(3 − t) = −(t − 1)(t − 2)(t − 3).
Hence λ is an eigenvalue of T_A (or A) if and only if λ = 1, 2 or 3.
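These hand computations can be reproduced numerically. The following minimal sketch (assuming the NumPy library is available) obtains the characteristic polynomial of the matrix of Illustrative Example 1 with numpy.poly and compares its roots with numpy.linalg.eigvals.

    import numpy as np

    A = np.array([[1.0, 1.0], [4.0, 1.0]])
    coeffs = np.poly(A)            # coefficients of det(tI - A) = t^2 - 2t - 3
    print(coeffs)                  # [ 1. -2. -3.]
    print(np.roots(coeffs))        # 3 and -1 (order may vary)
    print(np.linalg.eigvals(A))    # the same eigenvalues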

Note. Let T be a linear operator on a vector space V over a field F. Then a vector v ∈ V is an eigenvector of T corresponding to the eigenvalue λ if and only if v ≠ 0 and v belongs to the null space of (T − λI).

Illustrative Example -3. Show that a square matrix A has 0 as an eigenvalue if and only if A is not invertible.
Solution. Let A be an n × n matrix over a field F.
First, let 0 be an eigenvalue of matrix A and let the non-zero vector X ∈ F^n be a corresponding eigenvector. Then
AX = 0X ⇒ AX = 0.
If possible, let A be an invertible matrix. Then
AX = 0 ⇒ A^{-1}(AX) = A^{-1} 0
       ⇒ (A^{-1}A) X = 0
       ⇒ IX = 0
       ⇒ X = 0.
But X ≠ 0, so we arrive at a contradiction. Hence, A must be a non-invertible matrix.
Conversely, let A be a non-invertible matrix. Then the system of equations AX = 0 has non-trivial solutions. So there exists a non-zero vector X ∈ F^n such that AX = 0, that is, AX = 0X, and hence 0 is an eigenvalue of A.

Illustrative Example -4. If λ is an eigenvalue of an invertible matrix A over a field F,


then show that λ – 1 is an eigenvalue of A– 1.
Solution. Since A is an invertible matrix with an eigenvalue λ. By above example, we
have , λ ≠ 0. So λ – 1∈F.
Now, λ is an eigenvalue of matrix A.
A – 1(AX) = A – 1(λX), since A is invertible, therefore A – 1 exists
⇒ (A – 1 A) X = λ (A – 1 X)
⇒ IX = λ (A – 1 X)
⇒ X = λ (A – 1 X)
⇒ λ – 1X = A – 1 X.
Therefore, λ – 1 is the eigenvalue of A – 1.

1. 3. Diagonalizability

Definition 1. 3. 1. A linear operator T on a finite-dimensional vector space V is called


diagonalizable if there is an ordered basis B for V such that [T] B is a diagonal matrix. A
square matrix A is called diagonalizable if T A is diagonalizable.
In other words, A square matrix A over F is said to be diagonalizable if there
exists an invertible matrix C such that C–1AC is a diagonal matrix. Clearly, A is
diagonalizable if and only if A is similar to a diagonal matrix.

We want to determine when a linear operator T on a finite-dimensional vector


space V is diagonalizable and, if so, how to obtain an ordered basis B = {v 1 , v 2 , ……,v n }

for V such that [T] B is a diagonal matrix.

Note that, if D = [T]_B is a diagonal matrix, then for each vector v_j ∈ B we have
T(v_j) = Σ_{i=1}^{n} D_ij v_i = D_jj v_j = λ_j v_j, where λ_j = D_jj.

Conversely, if B = {v_1, v_2, ..., v_n} is an ordered basis for V such that T(v_j) = λ_j v_j for some scalars λ_1, λ_2, ..., λ_n, then clearly
[T]_B = diag(λ_1, λ_2, ..., λ_n).
In either case, each vector v in the basis B satisfies the condition that T(v) = λv for some scalar λ.

Theorem 1. 3. 1. Let T be a linear operator on a vector space V, and let λ 1 , λ 2 , . . . . , λ k


be distinct eigenvalues of T. If v 1 , v 2 , . . . . , v k are eigenvectors of T such that λ i
corresponds to v i (1 ≤ i ≤ k), then { v 1 , v 2 , . . . . , v k } is linearly independent.

Proof. The proof is by mathematical induction on k. Suppose that k = 1. Then v 1 ≠ 0


since v_1 is an eigenvector, and hence {v_1} is linearly independent.
Now assume that the theorem holds for (k – 1) - distinct eigenvalues, where (k – 1) ≥ 1,
and that we have k eigenvectors v 1 , v 2 , . . . . , v k corresponding to the distinct eigenvalues
λ 1 , λ 2 , . . . . , λ k . We wish to show that {v 1 , v 2 , . . . , v k } is linearly independent.
Suppose that a 1 , a 2 , ……,a k are scalars such that
a 1 v 1 + a 2 v 2 + ……….. + a k v k = 0 → (1)
Applying T – λ k I to both sides of (1), we obtain
a 1 (λ 1 – λ k )v 1 + a 2 (λ 2 – λ k )v 2 + . . . . . . . . . . + a k – 1 (λ k– 1 – λ k ) v k – 1 = 0.
By the induction hypothesis {v 1 , v 2 , . . . . , v k – 1 } is linearly independent, and hence
a 1 (λ 1 – λ k ) = a 2 (λ 2 – λ k ) = . . . . . . . . . . = a k – 1 (λ k– 1 – λ k ) = 0.
Since λ 1 , λ 2 , . . ,λ k are distinct eigenvalues, it follows that (λ i – λ k ) ≠ 0 for 1 ≤ i ≤ k – 1.
So, a 1 = a 2 = . . . . . . = a k – 1 = 0, and (1) therefore reduces to a k v k = 0. But v k ≠ 0 and
therefore a k = 0. Consequently a 1 = a 2 = . . . . = a k = 0, and it follows that {v 1 , v 2 , . . ., v k
} is linearly independent.

Theorem 1. 3. 2. Let T be a linear operator on an n-dimensional vector space V. If T


has n distinct eigenvalues, then T is diagonalizable.
Proof. Suppose that T has n distinct eigenvalues λ_1, λ_2, . . . , λ_n.
For each i choose an eigenvector v i corresponding to λ i .
By theorem 1. 3. 1, { v 1 , v 2 , . . . . , v n } is linearly independent, and since dim(V) = n, this
set is a basis for V. Thus, T is diagonalizable.

Remark. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if there exists an ordered basis B for V consisting of eigenvectors of T. So, to diagonalize a matrix or a linear operator is to find a basis of eigenvectors and the corresponding eigenvalues.

Illustrative Example -1. Let
A = [ 1 3 ]
    [ 4 2 ],  v_1 = (1, −1) and v_2 = (3, 4). Find [T_A]_B, where B = {v_1, v_2}.
Solution. Since T_A(v_1) = A v_1 = (1 − 3, 4 − 2) = (−2, 2) = −2(1, −1) = −2 v_1, v_1 is an eigenvector of T_A, and hence of A.
Here λ_1 = −2 is the eigenvalue corresponding to v_1.
Further, T_A(v_2) = A v_2 = (3 + 12, 12 + 8) = (15, 20) = 5(3, 4) = 5 v_2, and so v_2 is an eigenvector of T_A, and hence of A, with the corresponding eigenvalue λ_2 = 5.
Note that B = {v_1, v_2} is an ordered basis for R² consisting of eigenvectors of both A and T_A, and therefore A and T_A are diagonalizable.
Therefore, by the above remark we have
[T_A]_B = [ −2 0 ]
          [  0 5 ].
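The same diagonalization can be verified numerically. The sketch below (assuming the NumPy library is available) places the eigenvectors v_1 and v_2 as columns of a matrix C and checks that C^{-1} A C is the diagonal matrix of eigenvalues.

    import numpy as np

    A = np.array([[1.0, 3.0], [4.0, 2.0]])
    C = np.array([[1.0, 3.0],
                  [-1.0, 4.0]])        # columns are v1 = (1,-1) and v2 = (3,4)
    D = np.linalg.inv(C) @ A @ C
    print(np.round(D, 10))             # [[-2. 0.] [ 0. 5.]]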
Example -2. Let
A = [ 1 1 ]
    [ 1 1 ]  ∈ M_{2×2}(R).
The characteristic polynomial of A (and hence of T_A) is
det(A − tI) = (1 − t)² − 1 = t(t − 2),
and thus the eigenvalues of T_A are 0 and 2.
Since T_A is a linear operator on the two-dimensional vector space R², we conclude from the preceding Theorem 1. 3. 2 that T_A (and hence A) is diagonalizable.

1. 3. 1. Polynomial splits
Definition 1. 3. 2. A polynomial f(t) in P(F) splits over a field F if there are scalars
c, a 1 , a 2 , ……,a n (not necessarily distinct) in a field F such that
f(t) = c(t – a 1 ) (t – a 2 ) . . . . . . . (t – a n ).

Example - 3. Consider the polynomial f(t) = t2 – 1 = (t + 1) (t – 1) splits over R, but


(t2 + 1) ( t – 2) does not split over R because (t2 + 1) cannot be factored into a product of
linear factors.
However, (t² + 1)(t − 2) does split over C because it factors into the product
(t + i)(t – i)( t – 2). If f(t) is the characteristic polynomial of a linear operator or a matrix
over a field F, then f(t) splits over F.

Theorem 1. 3. 3. The characteristic polynomial of any diagonalizable linear operator


splits.
Proof. Let T be a diagonalizable linear operator on the n-dimensional vector space V,
and let B be an ordered basis for V such that [T] B = D is a diagonal matrix. Suppose that
D = diag(λ_1, λ_2, ..., λ_n),

and let f(t) be the characteristic polynomial of T. Then

f(t) = det(D − tI)
     = det diag(λ_1 − t, λ_2 − t, ..., λ_n − t)
     = (λ_1 − t)(λ_2 − t) ⋯ (λ_n − t)
     = (−1)^n (t − λ_1)(t − λ_2) ⋯ (t − λ_n).

Note. Let λ be an eigenvalue of a linear operator or matrix with characteristic


polynomial f(t). The multiplicity of λ is the largest positive integer k for which (t–λ)k is a
factor of f(t).
Example - 4. Consider the matrix
A = [ 3 1 0 ]
    [ 0 3 4 ]
    [ 0 0 4 ],
which has characteristic polynomial f(t) = −(t − 3)²(t − 4).
Hence λ = 3 is an eigenvalue of A with multiplicity 2, and λ = 4 is an eigenvalue of A with multiplicity 1.

Note. For diagonalizing matrices we have the following facts.
1. Let A be an n × n matrix over a field F. If A has n distinct eigenvalues, then the corresponding eigenvectors of A are linearly independent.
2. Let A be an n × n matrix over F. If A has n distinct eigenvalues λ_1, λ_2, . . . , λ_n, then there exists an invertible matrix C such that C^{-1}AC = diag(λ_1, λ_2, . . . , λ_n).
3. Let A be an n × n matrix over a field F. If the characteristic polynomial f(t) of the matrix A is a product of n distinct linear factors, say (−1)^n (t − λ_1)(t − λ_2) ⋯ (t − λ_n), then A is similar to the diagonal matrix D = diag(λ_1, λ_2, . . . , λ_n).

1. 4. Invariant Subspaces and the Cayley – Hamilton Theorem

If v is an eigenvector of a linear operator T, then T maps the span of {v} into


itself. Subspaces that are mapped into themselves are of great importance in the study of
linear operators.

Definition 1. 4. 1. Let T be a linear operator on a vector space V. A subspace W of V is
called a T-invariant subspace of V if T(W) ⊆ W. That is, if T(v) ∈ W for all v ∈ W.

Examples - 1. Suppose that T is a linear operator on a vector space V. Then {0}, V, the range of T, the null space of T, and the eigenspace E_λ, for any eigenvalue λ of T, are T-invariant subspaces of V.

Example - 2. Let T be the linear map on R³ defined by T(a, b, c) = (a + b, b + c, 0).
Then the xy-plane = {(x, y, 0): x, y ∈ R} and the x-axis = {(x, 0, 0): x ∈ R} are T-invariant subspaces of R³.

Definition 1. 4. 2. Let T be a linear operator on a vector space V, and let x be a nonzero


vector in V, the subspace W = span ({x, T(x), T2(x), . . . . . . }) is called the T-cyclic
subspace of V generated by x.
It is a simple matter to show that W is T-invariant. In fact, W is the “smallest” T-
invariant subspace of V containing x. That is, any T - invariant subspace of V containing
x must also contain W. Cyclic subspaces have various uses. We apply them in this unit
to establish the Cayley - Hamilton theorem.

Example -3. Let T be the linear operator on R3 defined by T(a, b, c) = (–b + c, a + c, 3c)
We determine the T - cyclic subspace generated by e 1 = (1, 0, 0).
Since T(e 1 ) = T (1, 0, 0) = (0, 1, 0) = e 2 and T2(e 1 ) = T(T(e 1 )) = T(e 2 ) = (–1, 0, 0) = –
e1,
It follows that span ({e_1, T(e_1), T²(e_1), . . . }) = span ({e_1, e_2}) = {(s, t, 0): s, t ∈ R}.

1. 4. 1. Cayley - Hamilton Theorem


Theorem 1. 4. 1. Let T be a linear operator on a finite - dimensional vector space V, and
let f(t) be the characteristic polynomial of T. Then f(T) = T 0 , the zero transformation.
That is, T “satisfies” its characteristic equation.
Proof. We show that f(T)(v) = 0 for all v ∈ V. This is obvious if v = 0 because f(T) is linear; so suppose that v ≠ 0.
Let W be the T-cyclic subspace of V generated by v, and suppose that dim(W) = k. By the earlier result on cyclic subspaces, {v, T(v), T²(v), ..., T^{k−1}(v)} is a basis for W. Hence there exist scalars a_0, a_1, ..., a_{k−1} such that
a_0 v + a_1 T(v) + . . . + a_{k−1} T^{k−1}(v) + T^k(v) = 0.
This implies that g(t) = (−1)^k (a_0 + a_1 t + . . . + a_{k−1} t^{k−1} + t^k) is the characteristic polynomial of T_W. Combining these two equations yields
g(T)(v) = (−1)^k (a_0 I + a_1 T + . . . + a_{k−1} T^{k−1} + T^k)(v) = 0.
Since g(t) divides f(t), there exists a polynomial q(t) such that f(t) = q(t)g(t).
So f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0.

Example - 4. Let T be the linear operator on R² defined by T(a, b) = (a + 2b, −2a + b), and let B = {e_1, e_2}. Then
A = [T]_B = [  1  2 ]
            [ −2  1 ].
The characteristic polynomial of T is, therefore,
f(t) = det(A − tI) = det [ 1−t   2  ]
                         [ −2   1−t ] = t² − 2t + 5.
It is easily verified that T_0 = f(T) = T² − 2T + 5I.
Similarly, f(A) = A² − 2A + 5I
  = [ −3  4 ] + [ −2 −4 ] + [ 5 0 ]
    [ −4 −3 ]   [  4 −2 ]   [ 0 5 ]
  = [ 0 0 ]
    [ 0 0 ].
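The conclusion f(A) = 0 can also be checked by machine. The following minimal sketch (assuming the NumPy library is available) recovers the characteristic polynomial of the matrix in Example 4 and substitutes A into it.

    import numpy as np

    A = np.array([[1.0, 2.0], [-2.0, 1.0]])
    print(np.poly(A))                         # [1. -2. 5.]  i.e.  t^2 - 2t + 5
    f_of_A = A @ A - 2 * A + 5 * np.eye(2)    # substitute A into f(t)
    print(f_of_A)                             # [[0. 0.] [0. 0.]]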

Corollary. Let A be an n × n matrix, and let f(t) be the characteristic polynomial of A.


Then f(A) = 0, the n × n zero matrix.

1. 5. Summary

1. In this unit we have discussed matrices and linear operators with respect to vectors (called characteristic vectors or eigenvectors) and scalars λ (called characteristic values or eigenvalues) such that Ax = λx.
2. To find, for instance, a formula for the reflection of R² about the line y = 2x, the key to success is to find a basis B′ for which [T]_{B′} is a diagonal matrix. This unit is concerned with the so-called diagonalization problem. A solution to the diagonalization problem leads naturally to the concepts of eigenvalue and eigenvector. Aside from the important role that these concepts play in the diagonalization problem, they also prove to be useful tools in the study of many non-diagonalizable operators.
3. An invariant subspace of a linear mapping T : V → V from some vector space V to itself is a subspace W of V such that T(W) is contained in W. An invariant subspace of T is also said to be T-invariant. With the help of invariant subspaces we studied the Cayley-Hamilton theorem, which states that every square matrix over a commutative ring (including the real or complex field) satisfies its own characteristic equation.

1. 6. Keywords
Cayley-Hamilton theorem, Characteristic polynomial of a linear operator, Characteristic polynomial of a matrix, Diagonalizable linear operator, Diagonalizable matrix, Eigenspace of a linear operator, Eigenspace of a matrix, Eigenvalue of a linear operator, Eigenvalue of a matrix, Eigenvector of a matrix, Invariant subspace, Splits, Transition matrix.

1. 7. Assessment Questions

1. If λ ∈ F is an eigenvalue of T ∈ A(V), then show that there is a vector v ≠ 0 in V such that T(v) = λv.
Hint. Use the definition of eigenvalue together with the fact that (T − λI) is singular (non-invertible).
2. If λ ∈ F is an eigenvalue of T ∈ A(V), then show that for any polynomial q(x) ∈ F[x], q(λ) is an eigenvalue of q(T).
Hint. Use the polynomial expressions of q(λ) ∈ F and q(T) ∈ A(V).
3. Find all the eigenvalues of the following matrices:
(i) A = [ 2 2 ]        (ii) B = [ 1  2 −1 ]
        [ 1 3 ]                 [ 1  0  1 ]
                                [ 4 −4  5 ]
Answer. (i) λ = 1 and λ = 4; (ii) λ = 1, λ = 2 and λ = 3.


4. Show that similar matrices have the same characteristic polynomial.
Hint. If the matrix B is similar to the matrix A over F, then there is an invertible
matrix C∈F such that B=C–1AC. Show det(B–xI) = det(A–xI)
5. Test the matrix
A = [ 3 1 0 ]
    [ 0 3 0 ]
    [ 0 0 4 ]  ∈ M_{3×3}(R) for diagonalizability.
Hint. The characteristic polynomial of A is det(A − tI) = −(t − 4)(t − 3)². Find the eigenvalues and show that A is not diagonalizable.
6. Let T: R3 → R3 be defined by T(x, y, z) = ( 2x + y – 2z , 2x + 3y – 4z , x + y – z ).
Find all eigenvalues of T. Is T diagonalizable?
Hint. ∆(t) = t3 – 4t2 + 5t – 2 and λ = 1 and λ = 2 are eigenvalues.

1. 8. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. P. R. Halmos– Linear Algebra Problem Book, No.16, The Mathematical Association
of America, 1995.
3. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.

UNIT-2: INNER PRODUCT SPACE

STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. Inner Products Space
2. 2. 1. Norms
2. 2. 2. Orthogonal and Orthonormal
2. 3. The Gram-Schmidt Orthogonalization Process
2. 4. Orthogonal Complements
2. 5. Summary
2. 6. Keywords
2. 7. Exercise
2. 8. References

UNIT-2: INNER PRODUCT SPACE

2. 0. Objectives

After studying this unit, the reader will be able to:

• Explain inner product spaces
• Define norms and orthogonal and orthonormal vectors
• Apply the Gram-Schmidt orthogonalization process
• Describe the orthogonal complement of a subspace

2. 1. Introduction

Inner product space is a vector space or function space in which an operation for
combining two vectors or functions (whose result is called an inner product) is defined
and has certain properties. In this unit we also study the Gram-Schmidt process,
which takes a finite, linearly independent set S = {v 1 , …, v k } for k ≤ n and generates
an orthogonal set S′ = {u 1 , …, u k } that spans the same k-dimensional subspace of
Rn as S. Finally, in this unit we study the orthogonal complement W⊥ of a subspace W of an inner product space V, which is the set of all vectors in V that are orthogonal to every vector in W.

2. 2. Inner product spaces

Definition 2. 2. 1. Let V be a vector space over F. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F, denoted 〈x, y〉, such that for all x, y and z in V and all c in F, the following hold:
1. 〈x + z, y〉 = 〈x, y〉 + 〈z, y〉.
2. 〈cx, y〉 = c〈x, y〉.
3. 〈x, y〉 is the complex conjugate of 〈y, x〉 (the bar below denotes complex conjugation).
4. 〈x, x〉 > 0 if x ≠ 0.

Note that (3) reduces to 〈x, y〉 = 〈y, x〉 if F = R. Conditions (1) and (2) simply require that the inner product be linear in the first component.
It is easily shown that if a_1, a_2, ..., a_n ∈ F and y, v_1, v_2, ..., v_n ∈ V, then
〈 Σ_{i=1}^{n} a_i v_i , y 〉 = Σ_{i=1}^{n} a_i 〈v_i, y〉.
For example, for x = (a_1, a_2, ..., a_n) and y = (b_1, b_2, ..., b_n) in F^n, define 〈x, y〉 = Σ_{i=1}^{n} a_i b̄_i.
The verification that 〈 ⋅ , ⋅ 〉 satisfies conditions (1) through (4) is easy. For example, if z = (c_1, c_2, ..., c_n), we have for (1)
〈x + z, y〉 = Σ_{i=1}^{n} (a_i + c_i) b̄_i = Σ_{i=1}^{n} a_i b̄_i + Σ_{i=1}^{n} c_i b̄_i = 〈x, y〉 + 〈z, y〉.
Thus, for x = (1 + i, 4) and y = (2 − 3i, 4 + 5i) in C², 〈x, y〉 = (1 + i)(2 + 3i) + 4(4 − 5i) = 15 − 15i = 15(1 − i).
The inner product in the above example is called the standard inner product on F^n. When F = R the conjugations are not needed, and in early courses this standard inner product is usually called the dot product and is denoted by x · y instead of 〈x, y〉.
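The standard inner product on C^n is easy to compute directly. The following is a minimal sketch (assuming the NumPy library is available) that forms Σ a_i b̄_i and reproduces the value 15 − 15i obtained above.

    import numpy as np

    def inner(x, y):
        # Standard inner product <x, y> = sum_i x_i * conjugate(y_i).
        x, y = np.asarray(x), np.asarray(y)
        return np.sum(x * np.conj(y))      # equivalently np.vdot(y, x)

    x = np.array([1 + 1j, 4])
    y = np.array([2 - 3j, 4 + 5j])
    print(inner(x, y))                      # (15-15j), as in the text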

Example - 1. If 〈x, y〉 is any inner product on a vector space V and r > 0, we may define
another inner product by the rule 〈x, y〉1 = r 〈x, y〉. If r ≤ 0, then (4) would not hold.

Example -2. Let V = C( [0,1] ), the vector space of real - valued continuous functions on
1
[0,1]. For f, g ∈ V, define 〈f, g〉 = ∫0 𝑓(𝑡)𝑔(𝑡)𝑑𝑡. Since the preceding integral is linear in
f, (1) and (2) are immediate, and (3) is trivial. If f ≠ 0, then f2 is bounded away from zero
1
on some subinterval of [0,1] and hence 〈f, f〉 = ∫0 [𝑓(𝑡)]2 𝑑𝑡 > 0.
Definition 2. 2. 2. Let A ∈ M_{m×n}(F). We define the conjugate transpose or adjoint of A to be the n × m matrix A* such that (A*)_ij is the complex conjugate of A_ji for all i, j.

Example - 3. Let
A = [ i   1 + 2i ]          Then   A* = [ −i      2      ]
    [ 2   3 + 4i ].                     [ 1 − 2i  3 − 4i ].

A vector space V over a field F endowed with a specific inner product is called an
inner product space. If F = C, we call V a complex inner product space, whereas if
F = R, we call V a real inner product space. It is clear that if V has an inner product
〈x, y〉 and W is a subspace of V, then W is also an inner product space when the same
function 〈x, y〉 is restricted to the vectors x, y ∈ W.

Note. Let V be an inner product space. Then for x, y, z ∈ V and c ∈ F, the following statements are true.
(i) 〈x, y + z〉 = 〈x, y〉 + 〈x, z〉.
(ii) 〈x, cy〉 = c̄〈x, y〉.
(iii) 〈x, 0〉 = 〈0, x〉 = 0.
(iv) 〈x, x〉 = 0 if and only if x = 0.
(v) If 〈x, y〉 = 〈x, z〉 for all x ∈ V, then y = z.

2. 2. 1. Norms
Definition 2. 2. 3. Let V be an inner product space. For x ∈ V, we define the norm or
length of x by ‖𝑥‖ = �〈x, x〉.

Example - 4. Let V = F^n. If x = (a_1, a_2, ..., a_n), then
‖x‖ = ‖(a_1, a_2, ..., a_n)‖ = [ Σ_{i=1}^{n} |a_i|² ]^{1/2}.

Theorem 2. 2. 1. Let V be an inner product space over F. Then for all x, y ∈ V and c ∈ F, the following statements are true.
(i) ‖cx‖ = |c| ⋅ ‖x‖.
(ii) ‖x‖ = 0 if and only if x = 0. In any case, ‖x‖ ≥ 0.
(iii) (Cauchy–Schwarz Inequality) |〈x, y〉| ≤ ‖x‖ ⋅ ‖y‖.
(iv) (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. (i) and (ii) are obvious.

(iii) If y = 0, then the result is immediate. So assume that y ≠ 0. For any c ∈ F, we have
0 ≤ ‖x − cy‖² = 〈x − cy, x − cy〉 = 〈x, x − cy〉 − c〈y, x − cy〉
             = 〈x, x〉 − c̄〈x, y〉 − c〈y, x〉 + c c̄〈y, y〉.
In particular, if we set c = 〈x, y〉/〈y, y〉, the inequality becomes
0 ≤ 〈x, x〉 − |〈x, y〉|²/〈y, y〉 = ‖x‖² − |〈x, y〉|²/‖y‖²,
from which the Cauchy–Schwarz inequality follows.

(iv) We have ‖x + y‖² = 〈x + y, x + y〉 = 〈x, x〉 + 〈y, x〉 + 〈x, y〉 + 〈y, y〉
  = ‖x‖² + 2 Re〈x, y〉 + ‖y‖²
  ≤ ‖x‖² + 2|〈x, y〉| + ‖y‖²
  ≤ ‖x‖² + 2‖x‖ ⋅ ‖y‖ + ‖y‖²
  = (‖x‖ + ‖y‖)²,
where Re〈x, y〉 denotes the real part of the complex number 〈x, y〉.

2. 2. 2. Orthogonal and Orthonormal

Definition 2. 2. 4. Let V be an inner product space. Vectors x and y in V are orthogonal


(perpendicular) if 〈x, y〉 = 0.
A subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A
vector x in V is a unit vector if ‖x‖ = 1. Finally, a subset S of V is orthonormal if S is
orthogonal and consists entirely of unit vectors.

Note. If S = {v 1 , v 2 , …, v k }, then S is orthonormal if and only if 〈v i , v j 〉 = δ ij , where δ ij


denotes the Kronecker delta. Also, observe that multiplying vectors by nonzero scalars
does not affect their orthogonality and that if x is any nonzero vector, then (1/ ‖𝑥‖) x is a

unit vector. The process of multiplying a nonzero vector by the reciprocal of its length is
called normalizing.

Example - 5. In F3, {(1, 1, 0), (1, –1, 1), (–1, 1, 2)} is an orthogonal set of nonzero
vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we
1 1 1
obtain the orthonormal set. � (1, 1, 0), (1, −1, 1), (−1,1,2)�.
√2 √3 √6

Note. Let V be a vector space over F, where F is either R or C. Regardless of whether V


is or is not an inner product space, we may still define a norm ‖ ⋅ ‖ as a real - valued
function on V satisfying the following three conditions for all x, y ∈ V and a ∈ F:
1. ‖𝑥‖ ≥ 0 and ‖𝑥‖ = 0 if and only if x = 0.
2. ‖𝑎𝑥‖ = |𝑎| ∙ ‖𝑥‖
3. ‖𝑥 + 𝑦‖ ≤ ‖𝑥‖ + ‖𝑦‖.

Illustrative Example-6. Let u and v be two vectors in an inner product space V such that ‖u + v‖ = ‖u‖ + ‖v‖. Prove that u and v are linearly dependent vectors. Give an example to show that the converse of this statement is not true.

Solution. We have ‖u + v‖ = ‖u‖ + ‖v‖
⇒ ‖u + v‖² = (‖u‖ + ‖v‖)²
⇒ 〈u + v, u + v〉 = ‖u‖² + ‖v‖² + 2‖u‖‖v‖
⇒ 〈u, u〉 + 2〈u, v〉 + 〈v, v〉 = ‖u‖² + ‖v‖² + 2‖u‖‖v‖
⇒ 〈u, v〉 = ‖u‖‖v‖
⇒ u and v are linearly dependent vectors, since equality holds in the Cauchy–Schwarz inequality only for linearly dependent vectors.
The converse is not true: the vectors u = (−1, 0, 1) and v = (2, 0, −2) in R³ are linearly dependent, as v = −2u, but ‖u + v‖ ≠ ‖u‖ + ‖v‖.

Illustrative Example -7. Let V be an inner product space and u, v ∈ V. Then, prove that
(i) ‖𝑢 + 𝑣‖2 − ‖𝑢 − 𝑣‖2 = 4 〈𝑢, 𝑣〉

(ii) ‖𝑢 + 𝑣‖2 + ‖𝑢 − 𝑣‖2 = 2(‖𝑢‖2 + ‖𝑣‖2 )

Solution. By using the definition of norm of a vector, we have

‖𝑢 + 𝑣‖2 = 〈𝑢 + 𝑣 , 𝑢 + 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢 + 𝑣〉 + 〈𝑣 , 𝑢 + 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢〉 + 〈𝑢 , 𝑣〉 + 〈𝑣 , 𝑢 〉 + 〈𝑣 , 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢〉 + 2 〈𝑢 , 𝑣〉 + 〈𝑣 , 𝑣〉 , since 〈𝑢 , 𝑣〉 = 〈𝑣 , 𝑢〉
⇒ ‖𝑢 + 𝑣‖2 = ‖𝑢‖2 + ‖𝑣‖2 + 2 〈𝑢 , 𝑣〉 → (1)
‖𝑢 − 𝑣‖2 = 〈𝑢 − 𝑣 , 𝑢 − 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢 − 𝑣〉 + 〈−𝑣, 𝑢 − 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 + 〈𝑢, −𝑣〉 + 〈 −𝑣, 𝑢〉 + 〈−𝑣, −𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 − 〈𝑢, 𝑣〉 − 〈 𝑣, 𝑢〉 + (−1)2 〈𝑣, 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 − 2 〈𝑢, 𝑣〉 + 〈𝑣, 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = ‖𝑢‖2 − 2 〈𝑢, 𝑣〉 + ‖𝑣‖2 → (2)
On subtracting (2) from (1), we get
‖𝑢 + 𝑣‖2 − ‖𝑢 − 𝑣‖2 = 4 〈𝑢, 𝑣〉
On adding (1) and (2), we get
‖𝑢 + 𝑣‖2 + ‖𝑢 − 𝑣‖2 = 2(‖𝑢‖2 + ‖𝑣‖2 )

2. 3. The Gram - Schmidt Orthogonalization Process

Let V be an inner product space. A subset of V is an orthonormal basis for V if it


is an ordered basis that is orthonormal.

Example -1. The standard ordered basis for F n is an orthonormal basis for F n .

Example -2. The set {(1/√5, 2/√5), (2/√5, −1/√5)} is an orthonormal basis for R².

Theorem 2. 3. 1. Let V be an inner product space and S = {v 1 , v 2 ,………,v k }be an


orthogonal subset of V consisting of nonzero vectors. If y ∈ span(S), then
y = Σ_{i=1}^{k} ( 〈y, v_i〉 / ‖v_i‖² ) v_i.

Proof. Write y = Σ_{i=1}^{k} a_i v_i, where a_1, a_2, ..., a_k ∈ F. Then, for 1 ≤ j ≤ k, we have
〈y, v_j〉 = 〈 Σ_{i=1}^{k} a_i v_i , v_j 〉 = Σ_{i=1}^{k} a_i 〈v_i, v_j〉 = a_j 〈v_j, v_j〉 = a_j ‖v_j‖².
So a_j = 〈y, v_j〉 / ‖v_j‖², and the result follows.

The next couple of results follow immediately from the above theorem
Corollary-1. If, in addition to the hypotheses of the above theorem, S is orthonormal and
y ∈ span (S), then 𝑦 = ∑𝑘𝑖=1 〈y, vi 〉 𝑣𝑖 .
If v possesses a finite orthonormal basis, then Corollary-1 allows us to compute
the coefficients in a linear combination very easily.

Corollary- 2. Let V be an inner product space, and let S be an orthogonal subset of V


consisting of nonzero vectors. Then S is linearly independent.
Proof. Suppose that v_1, v_2, ..., v_k ∈ S and Σ_{i=1}^{k} a_i v_i = 0. As in the proof of the above theorem with y = 0, we have a_j = 〈0, v_j〉 / ‖v_j‖² = 0 for all j. So S is linearly independent.

Example -3. By Corollary 2, the orthonormal set
{ (1/√2)(1, 1, 0), (1/√3)(1, −1, 1), (1/√6)(−1, 1, 2) }
obtained earlier is an orthonormal basis for R³.

Let x = (2, 1, 3). The coefficients given by Corollary 1 to Theorem 2. 3. 1 that express x as a linear combination of the basis vectors are
a_1 = (1/√2)(2 + 1) = 3/√2,
a_2 = (1/√3)(2 − 1 + 3) = 4/√3, and
a_3 = (1/√6)(−2 + 1 + 6) = 5/√6.
As a check, we have (2, 1, 3) = (3/2)(1, 1, 0) + (4/3)(1, −1, 1) + (5/6)(−1, 1, 2).
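With an orthonormal basis, each coefficient is just an inner product. The following minimal sketch (assuming the NumPy library is available) reproduces the coefficients of the example above and reconstructs x from them.

    import numpy as np

    u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    u2 = np.array([1.0, -1.0, 1.0]) / np.sqrt(3)
    u3 = np.array([-1.0, 1.0, 2.0]) / np.sqrt(6)
    x = np.array([2.0, 1.0, 3.0])

    coeffs = [np.dot(x, u) for u in (u1, u2, u3)]
    print(coeffs)                                      # 3/sqrt(2), 4/sqrt(3), 5/sqrt(6)
    print(coeffs[0]*u1 + coeffs[1]*u2 + coeffs[2]*u3)  # [2. 1. 3.]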

Theorem 2. 3. 2. Let V be an inner product space and S = {w_1, w_2, ..., w_n} be a linearly independent subset of V. Define S′ = {v_1, v_2, ..., v_n}, where v_1 = w_1 and
v_k = w_k − Σ_{j=1}^{k−1} ( 〈w_k, v_j〉 / ‖v_j‖² ) v_j   for 2 ≤ k ≤ n. → (1)
Then S′ is an orthogonal set of nonzero vectors such that span(S′) = span(S).

Proof. The proof is by mathematical induction on n, the number of vectors in S. For k = 1, 2, ..., n, let S_k = {w_1, w_2, ..., w_k}.
If n = 1, then the theorem is proved by taking S′_1 = S_1; that is, v_1 = w_1 ≠ 0. Assume then that the set S′_{k−1} = {v_1, v_2, ..., v_{k−1}} with the desired properties has been constructed by the repeated use of (1).
We show that the set S′_k = {v_1, v_2, ..., v_{k−1}, v_k} also has the desired properties, where v_k is obtained from S′_{k−1} by (1).
If v_k = 0, then (1) implies that w_k ∈ span(S′_{k−1}) = span(S_{k−1}), which contradicts the assumption that S_k is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that
〈v_k, v_i〉 = 〈w_k, v_i〉 − Σ_{j=1}^{k−1} ( 〈w_k, v_j〉 / ‖v_j‖² ) 〈v_j, v_i〉
           = 〈w_k, v_i〉 − ( 〈w_k, v_i〉 / ‖v_i‖² ) ‖v_i‖²
           = 0,
since 〈v_j, v_i〉 = 0 if i ≠ j by the induction assumption that S′_{k−1} is orthogonal.
Hence S′_k is an orthogonal set of nonzero vectors.
Now, by (1), we have that span(S′_k) ⊆ span(S_k).
But by Corollary 2 to Theorem 2. 3. 1, S′_k is linearly independent.
So dim(span(S′_k)) = dim(span(S_k)) = k.
Therefore span(S′_k) = span(S_k).
The construction of {v_1, v_2, ..., v_n} by the use of Theorem 2. 3. 2 is called the Gram-Schmidt Orthogonalization Process.

Illustrated Example -4. Let w_1 = (1, 0, 1, 0), w_2 = (1, 1, 1, 1), and w_3 = (0, 1, 2, 1) in R⁴. The set {w_1, w_2, w_3} is linearly independent. Use the Gram-Schmidt Orthogonalization Process to compute the orthogonal vectors v_1, v_2 and v_3, and then normalize these vectors to obtain an orthonormal set.
Solution. Take v_1 = w_1 = (1, 0, 1, 0).
Then v_2 = w_2 − ( 〈w_2, v_1〉 / ‖v_1‖² ) v_1 = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).
Finally, v_3 = w_3 − ( 〈w_3, v_1〉 / ‖v_1‖² ) v_1 − ( 〈w_3, v_2〉 / ‖v_2‖² ) v_2
             = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1) = (−1, 0, 1, 0).

These vectors can be normalized to obtain the orthonormal basis {u_1, u_2, u_3}, where
u_1 = (1/‖v_1‖) v_1 = (1/√2)(1, 0, 1, 0),
u_2 = (1/‖v_2‖) v_2 = (1/√2)(0, 1, 0, 1), and
u_3 = (1/‖v_3‖) v_3 = (1/√2)(−1, 0, 1, 0).
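The same computation is easy to mechanize. The sketch below (assuming the NumPy library is available) subtracts projections one vector at a time, which in exact arithmetic agrees with formula (1), and then normalizes; it reproduces the vectors of Example 4.

    import numpy as np

    def gram_schmidt(vectors):
        # Illustrative sketch of the Gram-Schmidt process.
        ortho = []
        for w in vectors:
            v = np.array(w, dtype=float)
            for u in ortho:
                v = v - (np.dot(v, u) / np.dot(u, u)) * u   # remove component along u
            ortho.append(v)
        return ortho

    W = [(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 2, 1)]
    V = gram_schmidt(W)
    print(V)                                    # [1 0 1 0], [0 1 0 1], [-1 0 1 0]
    U = [v / np.linalg.norm(v) for v in V]      # the orthonormal set u1, u2, u3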

2. 4. Orthogonal Complement

Definition 2. 4. 1. Let S be a non-empty subset of an inner product space V. We define S⊥ (read "S perp") to be the set of all vectors in V that are orthogonal to every vector in S; that is, S⊥ = {x ∈ V: 〈x, y〉 = 0 for all y ∈ S}. The set S⊥ is called the orthogonal complement of S. It is easily seen that S⊥ is a subspace of V for any subset S of V.

Theorem 2. 4. 1. Let W be a finite-dimensional subspace of an inner product space V, and let y ∈ V. Then there exist unique vectors u ∈ W and z ∈ W⊥ such that y = u + z.
Furthermore, if {v_1, v_2, . . . , v_k} is an orthonormal basis for W, then u = Σ_{i=1}^{k} 〈y, v_i〉 v_i.

Proof. Let {v_1, v_2, . . . , v_k} be an orthonormal basis for W, let u be as defined in the preceding equation, and let z = y − u. Clearly u ∈ W and y = u + z. To show that z ∈ W⊥, it suffices to show that z is orthogonal to each v_j. For any j, we have
〈z, v_j〉 = 〈 y − Σ_{i=1}^{k} 〈y, v_i〉 v_i , v_j 〉 = 〈y, v_j〉 − Σ_{i=1}^{k} 〈y, v_i〉 〈v_i, v_j〉
         = 〈y, v_j〉 − 〈y, v_j〉 = 0.
To show uniqueness of u and z, suppose that y = u + z = u′ + z′, where u′ ∈ W and z′ ∈ W⊥. Then u − u′ = z′ − z ∈ W ∩ W⊥ = {0}. Therefore u = u′ and z = z′.

In the notation of the above theorem, we have following corollary


Corollary. The vector u is the unique vector in W that is “closest” to y; that is, for any
x ∈ W, y − x ≥ y − u , and this inequality is an equality if and only if x = u.

Theorem 2. 4. 2. Suppose that S = {v_1, v_2, ..., v_k} is an orthonormal set in an n-dimensional inner product space V. Then
(i) S can be extended to an orthonormal basis {v_1, v_2, ..., v_k, v_{k+1}, ..., v_n} for V.
(ii) If W = span(S), then S_1 = {v_{k+1}, v_{k+2}, ..., v_n} is an orthonormal basis for W⊥.
(iii) If W is any subspace of V, then dim(V) = dim(W) + dim(W⊥).

Proof. (i) Any generating set for the n-dimensional space V contains at least n vectors, and a generating set for V that contains exactly n vectors is a basis for V. Hence S can be extended to an ordered basis S′ = {v_1, v_2, . . . , v_k, w_{k+1}, . . . , w_n} for V.
Now apply the Gram-Schmidt Orthogonalization Process to S′. The first k vectors resulting from this process are the vectors in S, and this new set spans V. Normalizing the last (n − k) vectors of this set produces an orthonormal set that spans V. The result follows.
(ii) Because S 1 is a subset of a basis, it is linearly independent. Since S 1 is
clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V,

we have 𝑥 = ∑𝑛𝑖=1〈𝑥 , 𝑣𝑖 〉𝑣𝑖 . If x ∈ W⊥, then 〈𝑥 , 𝑣𝑖 〉 = 0 for 1 ≤ i ≤ k.


Therefore, x = ∑𝑛𝑖=𝑘+1〈𝑥 , 𝑣𝑖 〉𝑣𝑖 ∈ span (S 1 ).
(iii) Let W be a subspace of a vector space V. It is a finite - dimensional inner
product space because V is, and so it has an orthonormal basis {v 1 , v 2 , ……,v k }. By (i)
and (ii), we have dim (V) = n = k + (n – k) = dim (W) + dim (W⊥).

Example -1. Let W = span ({e 1 , e 2 }) in F3. Then x = (a, b, c) ∈ W⊥ if and only if

0 = 〈 x, e 1 〉 = a and 0 = 〈 x, e 2 〉 = b. So x = (0, 0, c), and therefore W⊥ = span ({e 3 }).


One can deduce the same result by noting that e 3 ∈ W⊥ and from (iii), that dim (W⊥) =
3 – 2 = 1.

Illustrated Example -2. Let C[ – π, π] be the inner product space of all continuous
π
functions defined on [ – π, π] with the inner product defined by 〈𝑓, 𝑔〉 = ∫−π 𝑓(𝑡)𝑔(𝑡)𝑑𝑡.

Prove that sin t and cos t are orthogonal functions in C[ - π, π].

Solution. We have 〈f, g〉 = ∫_{−π}^{π} f(t)g(t) dt, so
〈sin t, cos t〉 = ∫_{−π}^{π} sin t cos t dt
              = (1/2) ∫_{−π}^{π} sin 2t dt
              = −(1/4) [cos 2t]_{−π}^{π}
              = −(1/4)(1 − 1) = 0.

Thus, sin t and cos t are orthogonal functions in the inner product space C[–π, π].

Illustrated Example -3. Let u = (−1, 4, −3) be a vector in the inner product space with
the standard inner product. Find a basis of the subspace u⊥ of R3.
Solution. We have
𝑢⊥ = { 𝑣 ∈ 𝑅 3 : 〈𝑢, 𝑣〉 = 0} or , 𝑢⊥ = { 𝑣 = (𝑥, 𝑦, 𝑧)∈ 𝑅 3 : − 𝑥 + 4𝑦 − 3𝑧 = 0}
Thus, u⊥ consists of all vectors v = (x, y, z) such that –x + 4y − 3z = 0. In this equation
there are only two free variables. Taking y and z as free variable, we find that
y = 1, z = 1 ⇒ x = 1; y = 0, z = 1 ⇒ x = − 3
Thus, v 1 = (1, 1, 1) and v 2 = (−3, 0, 1) are two independent solutions of −x + 4y −3z = 0.
Hence, {v 1 = (1, 1, 1), v 2 = (−3, 0, 1)} form a basis for u⊥
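A basis of u⊥ can also be obtained numerically as the null space of the 1 × 3 matrix whose row is u. The following is a minimal sketch (assuming the NumPy library is available) using the singular value decomposition; it produces an orthonormal basis of u⊥, which will in general differ from the basis {v_1, v_2} found above but spans the same subspace.

    import numpy as np

    u = np.array([[-1.0, 4.0, -3.0]])    # u as a 1 x 3 matrix
    _, s, Vt = np.linalg.svd(u)
    null_space = Vt[1:]                  # rows orthogonal to u (u has rank 1)
    print(null_space)                    # two orthonormal vectors spanning u-perp
    print(null_space @ u.T)              # approximately zero, confirming orthogonality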

Illustrated Example - 4. If { v 1 , v 2 , . . . . . . .v n } is an orthonormal set in an inner


product space V and v ∈ V. Then prove that ∑𝑛𝑖=1|〈𝑣, 𝑣𝑖 〉|2 ≤ ‖𝑣‖2 (Bessel’s
inequality). Further, show that equality holds if and only if v is in the subspace spanned
by { v 1 , v 2 , . . . . . . .v n }.
Solution. Consider the vector u given by 𝑢 = 𝑣 – ∑𝑛𝑖=1〈𝑣, 𝑣𝑖 〉𝑣𝑖
‖𝑢‖2 = 〈𝑢, 𝑢〉
𝑛 𝑛

‖𝑢‖2 = 〈𝑣 − �〈𝑣, 𝑣𝑖 〉𝑣𝑖 , 𝑣 − �〈𝑣, 𝑣𝑗 〉𝑣𝑗 〉


𝑖=1 𝑗=1
𝑛 𝑛 𝑛 𝑛

‖𝑢‖2 = 〈𝑣, 𝑣〉 − 〈𝑣, �〈𝑣, 𝑣𝑗 〉𝑣𝑗 〉 − 〈�〈𝑣, 𝑣𝑖 〉𝑣𝑖 , 𝑣〉 + 〈�〈𝑣, 𝑣𝑖 〉𝑣𝑖 , �〈𝑣, 𝑣𝑗 〉𝑣𝑗 〉
𝑗=1 𝑖=1 𝑖=1 𝑗=1
𝑛 𝑛 𝑛 𝑛

‖𝑢‖2 = 〈𝑣, 𝑣〉 − �〈𝑣, 𝑣𝑗 〉〈𝑣, 𝑣𝑗 〉 − �〈𝑣, 𝑣𝑖 〉〈𝑣𝑖 , 𝑣〉 + � �〈𝑣, 𝑣𝑖 〉〈𝑣, 𝑣𝑗 〉〈𝑣𝑖 , 𝑣𝑗 〉


𝑗=1 𝑖=1 𝑖=1 𝑗=1

𝑛 𝑛 𝑛

‖𝑢‖2 = 〈𝑣, 𝑣〉 − 2 �|〈𝑣, 𝑣𝑖 〉|2 + � �〈𝑣, 𝑣𝑖 〉〈𝑣, 𝑣𝑗 〉δ𝑖𝑗


𝑖=1 𝑖=1 𝑗=1
𝑛 𝑛

‖𝑢‖2 = ‖𝑣‖2 − 2 �|〈𝑣, 𝑣𝑖 〉|2 + �|〈𝑣, 𝑣𝑖 〉|2


𝑖=1 𝑖=1
𝑛

‖𝑢‖2 = ‖𝑣‖2 − �|〈𝑣, 𝑣𝑖 〉|2


𝑖=1
𝑛

‖𝑣‖2 − �|〈𝑣, 𝑣𝑖 〉|2 ≥ 0


𝑖=1

∑𝑛𝑖=1|〈𝑣, 𝑣𝑖 〉|2 ≤ ‖𝑣‖2 . This proves the first part.

Now, ∑𝑛𝑖=1|〈𝑣, 𝑣𝑖 〉|2 = ‖𝑣‖2


⇒ ‖𝑢‖2 = 0
⇒ 𝑢 = 0𝑣
𝑛

⇒ 𝑣 − �〈𝑣, 𝑣𝑖 〉𝑣𝑖 = 0𝑣
𝑖=1
𝑛

⇒ 𝑣 = �〈𝑣, 𝑣𝑖 〉𝑣𝑖
𝑖=1

⇒ v is the linear combination of v 1 , v 2 , . . . . . . .v n.


⇒ v ∈ subspace spanned by the set { v 1 , v 2 , . . . . . . .v n }.
Conversely, let v be linear combination of v 1 , v 2 , . . . . . . .v n . Then, 𝑣 = ∑𝑛𝑖=1〈𝑣, 𝑣𝑖 〉𝑣𝑖
⇒ u = 0v ⇒ u = 0 ⇒ ‖𝑣‖2 = ∑𝑛𝑖=1|〈𝑣, 𝑣𝑖 〉|2 .

2. 5. Summary

1. Most applications of mathematics are involved with the concept of measurement and hence of the magnitude or relative size of various quantities. So it is not surprising that the fields of real and complex numbers, which have a built-in notion of distance, should play a special role. We assume that all vector spaces are over the field F, where F denotes either R or C. In this unit we have studied a special class of vector spaces which is very rich in geometry. Consider V = R³. For any a = (a_1, a_2, a_3) ∈ V, the length is ‖a‖ = √(a_1² + a_2² + a_3²). Further, given b = (b_1, b_2, b_3) ∈ V, the dot product (or inner product) a · b = a_1 b_1 + a_2 b_2 + a_3 b_3 is well known. The angle θ between a and b is derived from the equation cos θ = a · b / (‖a‖ ‖b‖). Here we carry the idea of distance or length over to vector spaces via a much richer structure, the so-called inner product space structure.
2. In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt
process is a method for orthonormalising a set of vectors in an inner product space,
most commonly the Euclidean space Rn. The Gram–Schmidt process takes a finite,
linearly independent set S = {v 1 , …, v k } for k ≤ n and generates an orthogonal set S′ =
{u 1 , …, u k } that spans the same k-dimensional subspace of Rn as S.
3. Also, the orthogonal complement W⊥ of a subspace W of an inner product space V is
the set of all vectors in V that are orthogonal to every vector in W .

2. 6. Keywords

Gram–Schmidt orthogonalization, Inner product, Inner product space, Norm of a matrix, Norm of a vector, Normal operator, Normalizing a vector, Orthogonal complement, Orthogonal matrix, Orthogonally equivalent matrices, Orthogonal operator, Orthogonal vectors, Orthonormal.

2. 7. Assessment Questions

1. State and prove the Cauchy – Schwarz Inequality.


Hint. See Theorem 2. 2.1- (iii)
2. State and prove the Triangle Inequality.
Hint. See Theorem 2. 2.1 - (iv)

3. Let V be an inner product space with 〈 , 〉 as an inner product and u, v in V. Then show that u = v if and only if 〈u, w〉 = 〈v, w〉 for all w in V.
Hint. Use the properties of inner product space.
4. Let V be a finite dimensional inner product space. Show that V has an
orthonormal set as a basis.
Hint. See Theorem 2. 4. 2-(i)
5. Apply the Gram-Schmidt orthogonalization process to the basis B = {(1, 0, 1), (1, 0, −1), (0, 3, 4)} of the inner product space R³ to find an orthogonal basis and an orthonormal basis of R³.
Answer. u_1 = (1/√2, 0, 1/√2), u_2 = (1/√2, 0, −1/√2) and u_3 = (0, 1, 0).

2. 8. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Second Edition, PHI, 1978.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.

UNIT-3: THE ADJOINT, NORMAL, SELF-ADJOINT, UNITARY
AND ORTHOGONAL OPERATORS, ORTHOGONAL
PROJECTIONS AND THE SPECTRAL THEOREM

STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. The Adjoint of a linear operator
3. 2. 1. Properties of Adjoint operators and their Matrices
3. 3. The Normal and Self-Adjoint operators
3. 3. 1. Normal linear operator
3. 3. 2. Self-Adjoint operator
3. 4. Unitary and Orthogonal operators
3. 4. 1. Unitary linear operator
3. 4. 2. Orthogonal operator
3. 5. Orthogonal Projections
3. 5. 1. The Spectral Theorem
3. 6. Summary
3. 7. Keywords
3. 8. Assessment Questions
3. 9. References

UNIT-3: THE ADJOINT, NORMAL, SELF-ADJOINT, UNITARY
AND ORTHOGONAL OPERATORS, ORTHOGONAL
PROJECTIONS AND THE SPECTRAL THEOREM

3. 0. Objectives

After working through this unit, the reader should be able to:

• Discuss the different types of linear operators.
• Illustrate the matrix forms of adjoint, self-adjoint, normal and unitary linear operators.
• Discuss Hermitian and skew-Hermitian operators on complex inner product spaces.
• Describe the properties of orthogonal projections.
• Discuss the spectral theorem.

3. 1. Introduction

This unit investigates the space A(V) of linear operator T on an inner product
space V. Adjoints of operators generalize conjugate transposes of square matrices to
(possibly) infinite-dimensional situations. The adjoint of an operator A is also sometimes
called the Hermitian conjugate (after Charles Hermite) of A. So, most of the results on
unitary spaces are identical to the corresponding results on inner product space. In this
unit, we learn about the normal and self - adjoint operators, unitary and orthogonal
operators and their matrices, orthogonal projections and the spectral theorem, bilinear and
quadratic forms.

3. 2. The Adjoint of a linear operator

Definition 3. 2. 1. Recall the conjugate transpose A* of a matrix A. For a linear operator T on an inner product space V, we now define a related linear operator on V, called the adjoint of T and denoted T*, whose matrix representation with respect to any orthonormal basis B for V is ([T]_B)*.

Let V be an inner product space, and let y ∈ V. The function g: V → F defined by g(x) = 〈x, y〉 is linear.

Theorem 3. 2. 1. Let V be a finite-dimensional inner product space over a field F, and let g: V → F be a linear transformation. Then there exists a unique vector y ∈ V such that g(x) = 〈x, y〉 for all x ∈ V.
Proof. Let B = {v_1, v_2, ..., v_n} be an orthonormal basis for V, and let y = Σ_{i=1}^{n} conj(g(v_i)) v_i, where conj denotes complex conjugation. Define h: V → F by h(x) = 〈x, y〉, which is linear.
Furthermore, for 1 ≤ j ≤ n, we have
h(v_j) = 〈v_j, y〉 = 〈 v_j , Σ_{i=1}^{n} conj(g(v_i)) v_i 〉 = Σ_{i=1}^{n} g(v_i) 〈v_j, v_i〉 = Σ_{i=1}^{n} g(v_i) δ_ji = g(v_j).
Since g and h agree on B, we have g = h, due to the fact that if V and W are vector spaces, V has a finite basis {v_1, v_2, . . . , v_n}, and U, T: V → W are linear with U(v_i) = T(v_i) for i = 1, 2, ..., n, then U = T.
To show that y is unique, suppose that g(x) = 〈x, y′〉 for all x.
Then 〈x, y〉 = 〈x, y′〉 for all x, and since 〈x, y〉 = 〈x, z〉 for all x implies y = z, we have y = y′.

Example -1. Define g: R2 → R by g(a 1 , a 2 ) = 2a 1 + a 2 ; clearly g is a linear


transformation. Let B = {e 1 , e 2 }, and let y = g(e 1 ) e 1 + g(e 2 ) e 2 = 2 e 1 + e 2 = (2, 1), as
in the proof of Theorem 3. 2. 1, Then g(a 1 , a 2 ) = 〈(a1 , a2 ) , (2 , 1)〉 = 2𝑎1 + 𝑎2 .

Theorem 3. 2. 2. Let V be a finite - dimensional inner product space, and let T be a


linear operator on V. Then there exists a unique function T*: V → V such that
〈𝑇(𝑥), 𝑦〉 = 〈𝑥 , 𝑇 ∗ (𝑦)〉 for all x, y ∈ V. Furthermore, T* is linear.

Proof. Let y ∈ V. Define g: V → F by g(x) = 〈𝑇(𝑥), 𝑦〉 for all x ∈ V.
First, we show that g is linear.
Let x 1 , x 2 ∈ V and c ∈ F. Then g( cx 1 + x 2 ) = 〈𝑇(𝑐𝑥1 + 𝑥2 ), 𝑦〉
= 〈𝑐 𝑇(𝑥1 ) + 𝑇(𝑥2 ), 𝑦〉
= 𝑐 〈𝑇 (𝑥1 ), 𝑦〉 + 〈𝑇 (𝑥2 ), 𝑦〉
= 𝑐𝑔(𝑥1 ) + 𝑔(𝑥2 ).
Hence g is linear.
Now, we apply the Theorem 3. 2. 1, to obtain a unique vector y1 ∈ V such that
g(x) = 〈𝑥 , 𝑦1 〉. That is 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑦1 〉 for all x ∈ V.
Defining T*: V → V by T*(y) = y1, we have 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑇 ∗ ( 𝑦)〉.
To show that T* is linear, we consider y 1 , y 2 ∈ V and c ∈ F. Then for any x ∈ V, we have
〈𝑥 , 𝑇 ∗ (𝑐𝑦1 + 𝑦2 )〉 = 〈𝑇(𝑥), 𝑐𝑦1 + 𝑦2 〉
= 𝑐̅〈𝑇(𝑥), 𝑦1 〉 + 〈𝑇(𝑥), 𝑦2 〉
= 𝑐̅〈𝑥, 𝑇 ∗ (𝑦1 )〉 + 〈𝑥, 𝑇 ∗ (𝑦2 )〉
= 〈𝑥, 𝑐 𝑇 ∗ (𝑦1 ) + 𝑇 ∗ (𝑦2 ) 〉
Since x is arbitrary, we have T*(cy 1 + y 2 ) = c T*(y 1 ) + T*(y 2 ).
Let V be an inner product space. If 〈𝑥, 𝑦〉 = 〈𝑥, 𝑧〉 for all x, y, z ∈ V, then y = z.
Finally, we show that T* is unique.
Suppose that U: V → V is linear and that it satisfies 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥, 𝑈(𝑦)〉 for all
x, y ∈ V. Then 〈𝑥 , 𝑇 ∗ ( 𝑦)〉 = 〈𝑥, 𝑈(𝑦)〉 for all x, y ∈ V, so T* = U, this completes the
proof.

Note. The linear operator T* described in the above theorem is called the adjoint of the
operator T. Thus T* is the unique operator on V satisfying 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑇 ∗ ( 𝑦)〉 for
all x , y ∈ V.

Theorem 3. 2. 3. Let V be a finite-dimensional inner product space, and let B be an orthonormal basis for V. If T is a linear operator on V, then [T*]_B = ([T]_B)*.

Proof. Let A = [T]_B and C = [T*]_B, and let B = {v_1, v_2, . . . , v_n}.
Since B is an orthonormal basis of the inner product space V, for any linear operator U on V the (i, j)-entry of [U]_B is 〈U(v_j), v_i〉. In particular A_ij = 〈T(v_j), v_i〉, and hence
C_ij = 〈T*(v_j), v_i〉 = conj( 〈v_i, T*(v_j)〉 ) = conj( 〈T(v_i), v_j〉 ) = conj( A_ji ) = (A*)_ij.
Hence C = A*.

Note. Let A = [ T ] B be an n × n matrix. Then T A* = (T A )*.


Example -2. Let T be the linear transformation on C2 defined by
T( a 1 , a 2 ) = (2ia 1 + 3a 2 , a 1 – a 2 ). If B is the standard ordered basis for C2,
then
[T]_B = [ 2i  3 ]          so   [T*]_B = ([T]_B)* = [ −2i  1 ]
        [ 1  −1 ],                                  [  3  −1 ].
Hence T*(a_1, a_2) = (−2i a_1 + a_2, 3a_1 − a_2).

3. 2. 1. Properties of Adjoint Operators and their Matrices


1. Let V be an inner product space, and let T and U be linear operators on V. Then
(i) (T + U)* = T* + U*,
(ii) (cT)* = 𝑐̅ T* for any c ∈ F,
(iii) (TU)* = U* T*,
(iv) T** = T,
(v) I* = I.

2. Let A and B be n × n matrices. Then


(i) (A + B)* = A* + B*
(ii) (cA)* = 𝑐̅ A* for any c ∈ F,
(iii) (AB)* = B* A*,
(iv) A** = A,
(v) I* = I.

Note.
1. Let A ∈ M m× n (F), x∈F n , and y ∈ F m . Then 〈𝐴𝑥 , 𝑦〉𝑚 = 〈𝑥 , 𝐴∗ 𝑦〉𝑛 .
2. Let A ∈ M m× n (F). Then rank (A*A) = rank (A).
3. If A is an m× n matrix such that rank (A) = n, then A*A is invertible.
Illustrative Example-3. Find the adjoint of the linear transformation T: R² → R² given by T(x, y) = (x + 2y, x − y) for all (x, y) ∈ R².
Solution. Clearly B = {e_1, e_2} is an orthonormal basis of R², and
T(e_1) = T(1, 0) = (1, 1) = 1·e_1 + 1·e_2,
T(e_2) = T(0, 1) = (2, −1) = 2·e_1 − 1·e_2.
The matrix A that represents T relative to the standard basis B is therefore
A = [ 1  2 ]          so   A^T = [ 1  1 ]
    [ 1 −1 ],                    [ 2 −1 ].
Since T acts on a real inner product space, the adjoint T* is represented by the transpose of A. Hence
T*(X) = A^T X = [ 1  1 ] [ x ]   =  [ x + y  ]
                [ 2 −1 ] [ y ]      [ 2x − y ].
Therefore, T*(x, y) = (x + y, 2x − y).
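The defining relation 〈T(x), y〉 = 〈x, T*(y)〉 can be checked numerically for this example. The following minimal sketch (assuming the NumPy library is available) compares the two inner products for randomly chosen vectors in R².

    import numpy as np

    A = np.array([[1.0, 2.0], [1.0, -1.0]])   # matrix of T in the standard basis
    A_star = A.T                              # real case: the adjoint is the transpose

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    print(np.dot(A @ x, y), np.dot(x, A_star @ y))   # the two numbers agree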

3. 3. Normal and Self - Adjoint Operators

Lemma 3. 3. 1. Let T be a linear operator on a finite - dimensional inner product space


V. If T has an eigenvector, then so does T*.
Proof. Suppose that v is an eigenvector of a linear operator T with corresponding eigenvalue λ. Then for any x ∈ V,
0 = 〈0, x〉 = 〈(T − λI)(v), x〉 = 〈v, (T − λI)*(x)〉 = 〈v, (T* − λ̄I)(x)〉,
and hence v is orthogonal to the range of (T* − λ̄I). So (T* − λ̄I) is not onto and hence is not one-to-one. Thus (T* − λ̄I) has a nonzero null space, and any nonzero vector in this null space is an eigenvector of T* with corresponding eigenvalue λ̄.

Note.
1. A subspace W of a vector space V is said to be T-invariant if T(W) is contained in W; the restriction of T to W is defined by T_W: W → W, T_W(x) = T(x).
2. A polynomial is said to split if it factors into linear polynomials.
3. Let T be a linear operator on an n-dimensional inner product space V. Suppose that the characteristic polynomial of T splits. Then there exists an orthonormal basis B for V such that the matrix [T]_B is upper triangular. This is known as Schur's theorem.

3. .3. 1. Normal Linear Operator


Definition 3. 3. 1. Let V be an inner product space, and let T be a linear operator on V.
We say that T is normal if TT* = T*T.
An n× n real or complex matrix A is normal if AA* = A*A.

Example - 3. Let T: R² → R² be rotation by θ, where 0 < θ < π. The matrix representation of T in the standard ordered basis is given by
A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ].
Then A is normal, since A A^T = A^T A = I.

Example - 4. Suppose that A is a real skew - symmetric matrix, that is, AT= − A. Then A
is normal because both AAT and ATA are equal to −A2.

Theorem 3. 3. 2. Let V be an inner product space, and let T be a normal operator on V.


Then the following statements are true.
(i) ‖𝑇(𝑥)‖ = ‖𝑇 ∗ (𝑥)‖ for all x ∈ V.
(ii) (T – cI) is normal for every c ∈ F.
(iii) If x is an eigenvector of T, then x is also an eigenvector of T*. In fact, if
T(x) = λx. then T*(x) = λ� x.
(iv) If λ 1 and λ 2 are distinct eigenvalues of T with corresponding eigenvectors x 1
and x 2 , then x 1 and x 2 are orthogonal.

Proof. (i) For any x ∈ V, we have
‖T(x)‖² = 〈T(x), T(x)〉 = 〈T*T(x), x〉 = 〈TT*(x), x〉 = 〈T*(x), T*(x)〉 = ‖T*(x)‖².
(ii) The proof is obvious.
(iii) Suppose x is an eigenvector of T, that is, T(x) = λx for some x ∈ V. Let U = (T − λI). Then U(x) = 0, and U is normal by (ii). Thus (i) implies that
0 = ‖U(x)‖ = ‖U*(x)‖ = ‖(T* − λ̄I)(x)‖ = ‖T*(x) − λ̄x‖.
Hence T*(x) = λ̄x. So x is an eigenvector of T*.

(iv) Let λ1 and λ2 be distinct eigen values of T with corresponding eigenvectors x 1
and x 2 . Then, using (iii), we have
λ 1 〈 x 1 , x 2 〉 = 〈λ 1 x 1 , x 2 〉 = 〈T(x 1 ), x 2 〉 = 〈x 1 , 𝑇 ∗ (x 2 )〉 = 〈 x 1 , λ� 2 x 2 〉 =λ� 2 〈 x 1 , x 2 〉.
Since λ 1 ≠ λ 2 , we conclude that 〈 x 1 , x 2 〉= 0

Note. Suppose that p(z) = a n zn + a n-1 zn – 1 + . . . . . . . .+ a 1 z + a 0 is a polynomial in


P n (C) of degree n ≥ 1. Then p(z) has a zero. This is known as Fundamental Theorem of
Algebra.

Theorem 3. 3. 3. Let T be a linear operator on a finite - dimensional complex inner


product space V. Then T is normal if and only if there exists an orthonormal basis for V
consisting of eigenvectors of T.
Proof. Suppose that T is normal. By the fundamental theorem of algebra, the characteristic polynomial of T splits. So we may apply Schur's theorem to obtain an orthonormal basis
B = {v 1 , v 2 ,. . . . , v n } for V such that [T] B = A is upper triangular. We know that v 1 is
an eigenvector of T because A is upper triangular. Assume that v 1 , v 2 ,. . . . . , v k–1 are
eigenvectors of T.
We claim that v k is also an eigenvector of T. It then follows by mathematical
induction on k that all of the v i ’s are eigenvectors of T. Consider any j < k, and let λ j
denote the eigenvalue of T corresponding to v_j. By Theorem 3. 3. 2 (iii), T*(v_j) = λ̄_j v_j.
Since A is upper triangular,
T(v k ) = A 1k v 1 + A 2k v 2 + . . . . . . . . + A jk v j + . . . . . . + A kk v k .
By a finite - dimensional inner product space V with an orthonormal basis
B = {v 1 , v 2 ,. . . . , v n }. If T is a linear operator on V and A = [T] B . Then for any i and j,
A ij = 〈𝑇 ( 𝑣𝑗 ), 𝑣𝑖 〉),
A jk = 〈𝑇(𝑣𝑘 ), 𝑣𝑗 〉 = 〈(𝑣𝑘 ), 𝑇 ∗ (𝑣𝑗 )〉 = 〈𝑣𝑘 , λ�𝚥 𝑣𝑗 〉 = λ𝑗 〈𝑣𝑘 , 𝑣𝑗 〉 = 0
It follows that T(v k ) = A kk v k , and hence v k is an eigenvector of T. So by induction, all the
vectors in B are eigenvectors of T.

3. 3. 2. Self-Adjoint (Hermitian) Operator

Definition 3. 3. 2. Let T be a linear operator on an inner product space V. Then T is
known as self-adjoint (Hermitian) if T = T*.
An n × n real or complex matrix A is self-adjoint (Hermitian) if A = A*.

Lemma 3. 3. 4. Let T be a self-adjoint operator on a finite-dimensional inner product


space V. Then
(i) Every eigenvalue of T is real.
(ii) Suppose that V is a real inner product space. Then the characteristic polynomial
of T splits.

Proof. (i) Suppose that T(x) = λx for x ≠ 0. A self-adjoint operator is also normal, so
by Theorem 3.3.2 (iii), if x is an eigenvector of T with T(x) = λx, then T*(x) = λ̄x. Thus
λx = T(x) = T*(x) = λ̄x. So λ = λ̄, that is, λ is real.
(ii) Let dim(V) = n, B be an orthonormal basis for V, and A = [T]_B. Then A is self-
adjoint. Let T_A be the linear operator on Cⁿ defined by T_A(x) = Ax for all x ∈ Cⁿ.
Note that T_A is self-adjoint because [T_A]_γ = A, where γ is the standard ordered
(orthonormal) basis for Cⁿ. So, by (i), the eigenvalues of T_A are real. By the
fundamental theorem of algebra, the characteristic polynomial of T_A splits into factors of
the form (t − λ). Since each λ is real, the characteristic polynomial splits over R. But T_A
has the same characteristic polynomial as A, which has the same characteristic
polynomial as T. Therefore the characteristic polynomial of T splits.

Theorem 3. 3. 5. Let T be a linear operator on a finite - dimensional real inner product


space V. Then T is self - adjoint if and only if there exists an orthonormal basis B for V
consisting of eigenvectors of T.
Proof. Suppose that T is self-adjoint. By lemma 3. 3. 4, we may apply Schur’s theorem
to obtain an orthonormal basis B for V such that the matrix A = [T] B is upper triangular.

But A* = ([T]_B)* = [T*]_B = [T]_B = A. Since A is upper triangular and A* is lower triangular,
the equality A = A* forces A to be a diagonal matrix. Thus B must consist of eigenvectors of T.
The converse is obvious.
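A small sketch (assuming NumPy, not part of the original text) of Lemma 3.3.4 and Theorem 3.3.5 for a real symmetric (self-adjoint) matrix: the eigenvalues are real and the eigenvectors returned by eigh form an orthonormal basis that diagonalizes A.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 1.0]])              # A = A.T, so A is self-adjoint

eigenvalues, Q = np.linalg.eigh(A)            # columns of Q are orthonormal eigenvectors
print(eigenvalues)                            # all real
print(np.allclose(Q.T @ Q, np.eye(3)))        # True: orthonormal basis of eigenvectors
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))  # True: [T]_B is diagonal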

3. 4. Unitary and Orthogonal Operator

Definition 3. 4. 1. Let T be a linear operator on a finite-dimensional inner product
space V over a field F. If ‖T(x)‖ = ‖x‖ for all x ∈ V, then T is known as a unitary
operator if F = C and an orthogonal operator if F = R.

Note. In the infinite-dimensional case, an operator satisfying the preceding norm requirement
is generally called an isometry. If, in addition, the operator is onto (the norm condition
already guarantees that it is one-to-one), then the operator is called a unitary or orthogonal operator.

Example - 1. Let H be the inner product space of continuous complex-valued functions on
[0, 2π] with inner product ⟨f, g⟩ = (1/2π)∫₀^{2π} f(t)\overline{g(t)} dt, and let h ∈ H satisfy
|h(t)| = 1 for all t. Define the linear operator T on H by T(f) = hf. Then

‖T(f)‖² = ‖hf‖² = (1/2π)∫₀^{2π} h(t)f(t)·\overline{h(t)f(t)} dt = (1/2π)∫₀^{2π} |f(t)|² dt = ‖f‖²,

since |h(t)|² = 1 for all t. So T is a unitary operator.

Note. Let T be a linear operator on a finite-dimensional inner product space V. Then
the following statements are equivalent.
(i) TT* = T*T = I.
(ii) ⟨T(x), T(y)⟩ = ⟨x, y⟩ for all x, y ∈ V.
(iii) If B is an orthonormal basis for V, then T(B) is an orthonormal basis for V.
(iv) ‖T(x)‖ = ‖x‖ for all x ∈ V.

Definition 3. 4. 2. A square matrix A is called an orthogonal matrix if ATA = AAT = I


and unitary if A*A = AA* = I.
Note.
1. Since for a real matrix A we have A* = AT, a real unitary matrix is also orthogonal.
In this case, we call A orthogonal rather than unitary.
2. The condition AA* = I is equivalent to the statement that the rows of A form an
orthonormal basis for Fⁿ, because

δᵢⱼ = Iᵢⱼ = (AA*)ᵢⱼ = Σₖ₌₁ⁿ Aᵢₖ(A*)ₖⱼ = Σₖ₌₁ⁿ Aᵢₖ \overline{Aⱼₖ},

and the last term represents the inner product of the ith and jth rows of A.
A similar statement relates the columns of A to the condition A*A = I.
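A quick check (assuming NumPy, not part of the original text) of the note above: for a unitary matrix Q the rows and columns form orthonormal bases, i.e. QQ* = Q*Q = I. A convenient way to manufacture a unitary matrix is the QR factorization of a random complex matrix.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)                        # Q is unitary

print(np.allclose(Q @ Q.conj().T, np.eye(4)))  # rows are orthonormal
print(np.allclose(Q.conj().T @ Q, np.eye(4)))  # columns are orthonormal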

Theorem 3. 4. 1. Let A be a complex n × n matrix. Then A is normal if and only if A is


unitarily equivalent to a diagonal matrix.
Proof. By the above note, we need only prove that if A is unitarily equivalent to a
diagonal matrix, then A is normal. Suppose that A = P*DP, where P is a unitary matrix
and D is a diagonal matrix. Then
AA* = (P*DP)(P*DP)* = (P*DP)(P*D*P) = P*D(PP*)D*P = P*DD*P.
Similarly, A*A = P* D* D P. Since D is a diagonal matrix, however,
we have DD* = D*D. Thus AA* = A*A.
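A numerical sketch (assuming NumPy, not part of the original text) of the direction proved in Theorem 3.4.1: if A = P*DP with P unitary and D diagonal, then A is normal. The particular P and D below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
P, _ = np.linalg.qr(M)                        # unitary P
D = np.diag([2.0 + 1j, -1.0, 0.5j])           # diagonal D (complex entries allowed)

A = P.conj().T @ D @ P                        # A = P* D P
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal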

Theorem 3. 4. 2. Let A be a real n × n matrix. Then A is symmetric if and only if A is


orthogonally equivalent to a real diagonal matrix.
Proof. Proof of this theorem is similar to theorem 3. 4.1.

3. 5. Orthogonal Projections

If V = W₁ ⊕ W₂, then a linear operator T on V is the projection on W₁ along W₂
if, whenever x = x₁ + x₂ with x₁ ∈ W₁ and x₂ ∈ W₂, we have T(x) = x₁. In that case the
range is R(T) = W₁ = {x ∈ V: T(x) = x} and the null space is N(T) = W₂.
So V = R(T) ⊕ N(T). Thus there is no ambiguity if we refer to T as a “projection
on W₁” or simply as a “projection”. In fact, it can be shown that T is a projection if and
only if T = T². Because V = W₁ ⊕ W₂ = W₁ ⊕ W₃ may hold for different complements
W₂ ≠ W₃, we see that W₁ does not uniquely determine T. For an orthogonal projection T,
however, T is uniquely determined by its range.

Definition 3. 5. 1. Let V be an inner product space, and let linear operator T: V → V be a


projection. Then T is known as an orthogonal projection if R(T)⊥ = N(T) and
N(T)⊥ = R(T), where R(T) denotes the range and N(T) the null space of T.

Note. If R(T)⊥ = N(T), then R(T) = R(T)⊥⊥ = N(T)⊥, provided V is a finite-
dimensional inner product space over a field F; so the two conditions in the definition are equivalent.

Theorem 3. 5. 1. Let V be an inner product space, and let T be a linear operator on V.


Then T is an orthogonal projection if and only if T has an adjoint T* and T2 = T = T*.
Proof. Suppose that T is an orthogonal projection.
Since T² = T because T is a projection, we need only show that T* exists and T = T*.
Now V = R(T) ⊕ N(T) and R(T)⊥ = N(T).
Let x, y ∈ V. Then x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ R(T) and x₂, y₂ ∈
N(T). Hence
⟨x, T(y)⟩ = ⟨x₁ + x₂, y₁⟩ = ⟨x₁, y₁⟩ + ⟨x₂, y₁⟩ = ⟨x₁, y₁⟩ and
⟨T(x), y⟩ = ⟨x₁, y₁ + y₂⟩ = ⟨x₁, y₁⟩ + ⟨x₁, y₂⟩ = ⟨x₁, y₁⟩.
So ⟨x, T(y)⟩ = ⟨T(x), y⟩ for all x, y ∈ V; thus T* exists and T = T*.
Now suppose that T² = T = T*.
Then T is a projection, and hence we must show that
R(T) = N(T)⊥ and R(T)⊥ = N(T).
Let x ∈ R(T) and y ∈ N(T).
Then x = T(x) = T*(x), and so ⟨x, y⟩ = ⟨T*(x), y⟩ = ⟨x, T(y)⟩ = ⟨x, 0⟩ = 0.
Therefore x ∈ N(T)⊥, from which it follows that R(T) ⊆ N(T)⊥.
Let y ∈ N(T)⊥.
We must show that y ∈ R(T), that is, T(y) = y.
Now, ‖y − T(y)‖² = ⟨y − T(y), y − T(y)⟩ = ⟨y, y − T(y)⟩ − ⟨T(y), y − T(y)⟩.
Since y − T(y) ∈ N(T) and y ∈ N(T)⊥, the first term must equal zero.
But also ⟨T(y), y − T(y)⟩ = ⟨y, T*(y − T(y))⟩ = ⟨y, T(y − T(y))⟩ = ⟨y, 0⟩ = 0.
Thus y − T(y) = 0, that is, y = T(y) ∈ R(T).
Hence R(T) = N(T)⊥.
Thus, we have R(T)⊥ = N(T)⊥⊥ ⊇ N(T).
Now suppose that x ∈ R(T)⊥.
For any y ∈ V, we have ⟨T(x), y⟩ = ⟨x, T*(y)⟩ = ⟨x, T(y)⟩ = 0.
So T(x) = 0, and thus x ∈ N(T). Hence R(T)⊥ = N(T).
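A minimal sketch (assuming NumPy, not part of the original text): the orthogonal projection onto the column space W = R(A) of a full-rank matrix A can be written P = A(AᵀA)⁻¹Aᵀ, and it satisfies P² = P = Pᵀ, in line with Theorem 3.5.1.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                    # columns span a 2-dimensional subspace of R^3

P = A @ np.linalg.inv(A.T @ A) @ A.T          # orthogonal projection onto R(A)
print(np.allclose(P @ P, P))                  # P^2 = P  (a projection)
print(np.allclose(P, P.T))                    # P = P*   (self-adjoint)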

3. 5. 1. Spectral Theorem
Theorem 3. 5. 2. Suppose that T is a linear operator on a finite - dimensional inner
product space V over F with the distinct eigenvalues λ 1 , λ 2 , . . . . , λ k . Assume that T is
normal if F = C and that T is self-adjoint if F = R. For each i (1 ≤ i ≤ k), let Wᵢ be the
eigenspace of T corresponding to the eigenvalue λ i , and let T i be the orthogonal
projection of V on W i . Then the following statements are true.
(i) V = W 1 ⊕ W2 ⊕ . . . . . . . ⊕ Wk

(ii) If Wᵢ′ denotes the direct sum of the subspaces Wⱼ for j ≠ i, then Wᵢ⊥ = Wᵢ′.

(iii) T i T j = δ ij T i for 1 ≤ i , j ≤ k.

(iv) I = T 1 + T 2 + . . . . . . . . T k

(v) T = λ₁T₁ + λ₂T₂ + . . . + λₖTₖ.

To prove the spectral Theorem, we need following Facts

Fact 1. Let T be a linear operator on a finite -dimensional complex inner product space
V. Then T is normal if and only if there exists an orthonormal basis for V consisting of
eigenvectors of T.

Fact 2. Let T be a linear operator on a finite- dimensional real inner product space V.
Then T is self - adjoint if and only if there exists an orthonormal basis B for V consisting
of eigenvectors of T.

Fact 3. A linear operator T on a finite - dimensional vector space V is diagonalizable if


and only if V is the direct sum of the eigenspaces of T.

Fact 4. Let V be an inner product space, and let T be a normal operator on V. If λ₁
and λ₂ are distinct eigenvalues of T with corresponding eigenvectors x₁ and x₂, then x₁
and x₂ are orthogonal.

Fact 5. Suppose that S = { v 1 , v 2 , . . . . . . ,v k }is an orthonormal set in an n - dimensional


inner product space V. Then
a) S can be extended to an orthonormal basis { v 1 , v 2 , . . .. .v k , v k + 1 , . . . . .v n } for V.

b) If W = span (S), then S 1 = { v k + 1 , v k + 2 , . . . . . , v n } is an orthonormal basis for W⊥

c) If W is any subspace of V, then dim (V) = dim (W) + dim (W⊥).

Proof of the Main Theorem. By Fact -1 and Fact - 2, T is diagonalizable;


V = W 1 ⊕ W2 ⊕ . . . . . . . ⊕ Wk by Fact-3.
If x ∈ Wᵢ and y ∈ Wⱼ for some i ≠ j, then ⟨x, y⟩ = 0 by Fact 4. It follows easily from
this result that Wᵢ′ ⊆ Wᵢ⊥. From (i), we have

dim(Wᵢ′) = Σ_{j≠i} dim(Wⱼ) = dim(V) − dim(Wᵢ).

On the other hand, we have dim(Wᵢ⊥) = dim(V) − dim(Wᵢ) by Fact 5.
Hence Wᵢ′ = Wᵢ⊥, proving (ii). The proof of (iii) is obvious.
(iv) Since Tᵢ is the orthogonal projection of V on Wᵢ, it follows from (ii) that
N(Tᵢ) = R(Tᵢ)⊥ = Wᵢ⊥ = Wᵢ′.
Hence, for x ∈ V, we have x = x₁ + x₂ + . . . + xₖ, where Tᵢ(x) = xᵢ ∈ Wᵢ, proving (iv).
(v) For x ∈ V, write x = x₁ + x₂ + . . . + xₖ, where xᵢ ∈ Wᵢ. Then
T(x) = T(x₁) + T(x₂) + . . . + T(xₖ)
= λ₁x₁ + λ₂x₂ + . . . + λₖxₖ
= λ₁T₁(x) + λ₂T₂(x) + . . . + λₖTₖ(x)
= (λ₁T₁ + λ₂T₂ + . . . + λₖTₖ)(x).
This completes the proof.

Note. The set {λ 1 , λ 2 , . . . . ., λ k } of eigenvalues of T is called the Spectrum of T, and
the condition (iv) is called a resolution of the identity operator induced by T and the
condition (v) is called a Spectral decomposition of T.
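A sketch (assuming NumPy, not part of the original text) of the spectral decomposition for a real symmetric matrix with distinct eigenvalues: the orthogonal projections Tᵢ onto the eigenspaces resolve the identity, and Σλᵢ Tᵢ recovers A.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                    # symmetric, eigenvalues 1 and 3

eigenvalues, Q = np.linalg.eigh(A)            # orthonormal eigenvectors in the columns of Q
projections = [np.outer(Q[:, i], Q[:, i]) for i in range(len(eigenvalues))]

print(np.allclose(sum(projections), np.eye(2)))                              # I = T_1 + ... + T_k
print(np.allclose(sum(l * P for l, P in zip(eigenvalues, projections)), A))  # A = sum lambda_i T_i
print(np.allclose(projections[0] @ projections[1], np.zeros((2, 2))))        # T_i T_j = 0 for i != j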

3. 6. Summary

1. The adjoint of an operator A is also sometimes called the Hermitian conjugate of


A and is denoted by A*. The self-adjoint operator is an operator that is its own
adjoint, or, equivalently, one whose matrix is Hermitian, where a Hermitian
matrix is one which is equal to its own conjugate transpose.
2. For a linear operator to be diagonalizable, it is necessary and sufficient that the vector
space V possess a basis of eigenvectors. As V is an inner product
space in this block, it is reasonable to seek conditions (unitary, normal, self-adjoint) that
guarantee that V has an orthonormal basis of eigenvectors.
3. Let V be an inner product space, and let T be a linear operator on V. Then T is an
orthogonal projection if and only if T has an adjoint T* and T2 = T = T*.
Throughout this section we study the different types of linear operator and its
properties with their matrices form.
4. The spectral theorem also provides a canonical decomposition, called the spectral
decomposition, eigenvalue decomposition, or eigen decomposition, of the
underlying vector space on which the operator acts. The importance of
diagonalizable operators is seen in Block – II.

3. 7. Keywords

Adjoint of a linear operator Orthogonal operator


Adjoint of a matrix Orthogonal vectors
Hermitian Orthonormal basis
Normal matrix Self - adjoint matrix
Normal operator Self - adjoint operator
Normalizing a vector Spectrum

Spectral decomposition Unitary matrix
Spectral Theorem Unitary operator
Unitarily equivalent matrices

3. 8. Assessment Questions

1. Define adjoint operator T*. Show that adjoint operator T* is linear operator.
Hint. See Theorem 3. 2. 2
2. Find the adjoint of the linear operator T: R3→ R3 defined by
T(x, y, z) = (x+2y, 3x–4z, y).
Answer. T*(x, y, z) = (x+3y, 2x+z, –4y).
3. Show that a linear operator T on a complex inner product space is self-adjoint if and only if ⟨T(v), v⟩ is real for all v.
Hint. See Lemma 3. 3. 4-(i)
4. If A is an orthogonal matrix, show that AT and A−1 are orthogonal.
Hint. See Definition 3. 4. 2.
5. State and Prove the Spectral Theorem.
Hint. See Theorem 3. 5. 2.

3. 9. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Second Edition, PHI, 1978.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.

UNIT- 4: BILINEAR AND QUADRATIC FORMS

STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. Bilinear Form
4. 2. 1. Symmetric Bilinear Form
4. 3. Quadratic Form
4. 3. 1. Sylvester Inertia Theorem
4. 4. Summary
4. 5. Keywords
4. 6. Assessment Questions
4. 7. References

UNIT- 4: BILINEAR AND QUADRATIC FORMS

4. 0. Objectives

After working through this unit, the reader should be able to:


• Discuss the bilinear form.
• Express the bilinear form in matrix form.
• Define the symmetric bilinear form
• Illustrate the quadratic form
• Discuss the Sylvester Inertia Theorem
• Illustrate the real quadratic form
4. 1. Introduction

In this unit, we will generalize the notion of linear forms. In fact, we will
introduce the notion of a bilinear form on a finite-dimensional vector space. We have
studied linear forms on V(F). Here, we will study bilinear forms as mapping from V×V to
F, which are linear forms in each variable. Bilinear forms also give rise to quadratic and
Hermitian forms.

4. 2. Bilinear Forms

Definition 4. 2. 1. Let V be a vector space over a field F. A transformation B: V × V →
F is said to be a bilinear form on V if it satisfies the following properties:
(i) B(ax 1 + x 2 ,y) = a B(x 1 , y) + B (x 2 , y) for all x 1 , x 2 , y ∈ V and a ∈ F
(ii) B (x, ay 1 + y 2 ) = a B (x, y 1 ) + B (x , y 2 ) for all x, y 1 , y 2 ∈ V and a ∈ F.

We denote the set of all bilinear forms on V by B (V).


Note. An inner product on a vector space is a bilinear form if the underlying field is real,
but not if the underlying field is complex.

Example -1. Let V be a vector space over F = R. Then the mapping defined by
B(x, y) = x . y (which is the inner product of x and y) for x, y in V, is a bilinear form on V.

Example - 2. Define a function B: R² × R² → R by

B((a₁, a₂)ᵀ, (b₁, b₂)ᵀ) = 2a₁b₁ + 3a₁b₂ + 4a₂b₁ − a₂b₂

for (a₁, a₂)ᵀ, (b₁, b₂)ᵀ ∈ R².
We could verify directly that B is a bilinear form on R².
However, if

A = [ 2   3
      4  −1 ],

x = (a₁, a₂)ᵀ and y = (b₁, b₂)ᵀ, then B(x, y) = xᵀAy.

The bilinearity of B now follows directly from the distributive property of matrix
multiplication over matrix addition.

Example - 3. Let V = Fn, where the vectors are considered as column vectors.
For any A ∈ M n×n (F), define B : V × V → F by B(x, y) = xT Ay for x, y ∈ V.
Notice that since x and y are n × 1 matrices and A is an n × n matrix, B(x, y) is a 1 × 1
matrix. We identify this matrix with its single entry. The bilinearity of B follows as in
the example-2.
For example, for a ∈ F and x 1 , x 2 , y ∈ V, we have
B(ax₁ + x₂, y) = (ax₁ + x₂)ᵀAy = (ax₁ᵀ + x₂ᵀ)Ay
= ax₁ᵀAy + x₂ᵀAy
= aB(x₁, y) + B(x₂, y).
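A small numerical check (assuming NumPy, not part of the original text) of Examples 2 and 3: with A = [[2, 3], [4, −1]], the map B(x, y) = xᵀAy reproduces 2a₁b₁ + 3a₁b₂ + 4a₂b₁ − a₂b₂ and is linear in the first argument.

import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])

def B(x, y):
    return x @ A @ y                          # x^T A y as a scalar

x1 = np.array([1.0, 2.0])
x2 = np.array([-3.0, 0.5])
y = np.array([4.0, 1.0])
a = 2.5

print(np.isclose(B(a * x1 + x2, y), a * B(x1, y) + B(x2, y)))   # bilinearity in the first slot
print(np.isclose(B(x1, y), 2*1*4 + 3*1*1 + 4*2*4 - 2*1))         # matches the explicit formula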
Note.
1. By xᵀAy, we understand the product of three matrices: the 1 × n row vector xᵀ, the
n × n matrix A = [aᵢⱼ], and the n × 1 column vector y, where
x = (x₁, x₂, . . . , xₙ)ᵀ and y = (y₁, y₂, . . . , yₙ)ᵀ.

2. For any bilinear form B on a vector space V over a field F, the following properties
hold:
(i) If, for any x ∈ V, the functions Tₓ, Rₓ: V → F are defined by Tₓ(y) = B(x, y)
and Rₓ(y) = B(y, x) for all y ∈ V, then Tₓ and Rₓ are linear.
(ii) B(0, x) = B(x, 0) = 0 for all x ∈ V.
(iii) For all x, y, z, w ∈ V,
B(x + y, z + w) = B(x, z) + B(x, w) + B(y, z) + B(y, w).
(iv) If S: V × V → F is defined by S(x, y) = B(y, x), then S is a bilinear form.

Definition 4. 2. 2. Let V be a vector space, Let B 1 and B 2 be bilinear forms on V, and let
a be a scalar. We define the sum B 1 + B 2 and the scalar product aB 1 by the equations
(B 1 + B 2 )(x, y) = B 1 (x, y) + B 2 (x, y) and

(aB 1 )(x, y) = a(B 1 (x, y)) for all x, y ∈ V.
Note.
1. For any vector space V, the sum of two bilinear forms and the product of a scalar
and a bilinear form on V are again bilinear forms on V. Furthermore, B(V) is a
vector space with respect to these operations.
2. Let B = {v₁, v₂, . . . , vₙ} be an ordered basis for an n-dimensional vector space
V, and let B ∈ B(V). We can associate with B an n × n matrix A whose entry in the ith
row and jth column is defined by Aᵢⱼ = B(vᵢ, vⱼ) for i, j = 1, 2, . . . , n.
3. The matrix A above is called the matrix representation of B with respect to
the ordered basis B and is denoted by Ψ_B(B); the resulting map Ψ_B : B(V) → M_{n×n}(F) is an isomorphism.

4. 2. 1. Symmetric bilinear form


Definition 4. 2. 3. A bilinear form B on a vector space V is symmetric bilinear form, if
B(x, y) = B(y, x) for all x, y ∈ V.

Theorem 4. 2. 1. Let B be a bilinear form on a finite - dimensional vector space V, and


let B be an ordered basis for a vector space V. Then B is symmetric if and only if Ψ B (B)
is symmetric.
Proof. Let B = {v₁, v₂, . . . , vₙ} and C = Ψ_B(B).
First assume that B is symmetric.
Then for 1 ≤ i, j ≤ n, Cᵢⱼ = B(vᵢ, vⱼ) = B(vⱼ, vᵢ) = Cⱼᵢ, and it follows that C = Ψ_B(B) is symmetric.
Conversely, suppose that C = Ψ_B(B) is symmetric.
Let S: V × V → F, where F is the field of scalars for V, be the mapping defined by
S(x, y) = B(y, x) for all x, y ∈ V.
By note 2(iv) above, S is a bilinear form.
Let D = Ψ_B(S). Then for 1 ≤ i, j ≤ n,
Dᵢⱼ = S(vᵢ, vⱼ) = B(vⱼ, vᵢ) = Cⱼᵢ = Cᵢⱼ.
Thus Ψ_B(S) = D = C = Ψ_B(B); since Ψ_B is one-to-one, we have S = B.

Hence B(y, x) = S(x, y) = B(x, y) for all x, y ∈ V, and therefore B is symmetric.


This completes the proof.

Definition 4. 2 .4. A bilinear form B on a finite - dimensional vector space V is called
diagonalizable if there is an ordered basis B for V such that Ψ B (B) is a diagonal matrix.

Corollary. Let B be a diagonalizable bilinear form on a finite - dimensional vector


space V. Then B is symmetric.
Proof. Suppose that B is diagonalizable. Then there is an ordered basis B for V such that

Ψ B (B) = D is a diagonal matrix.


Trivially, D is a symmetric matrix, and hence, by theorem 4. 2. 1, B is symmetric.

4. 3. Quadratic Form

Definition 4. 3. 1. Let V be a vector space over a field F. A function Q : V → F is


called a quadratic form if there exists a symmetric bilinear form B ∈ B(V) such that
Q(x) = B(x, x) for all x ∈ V. → (1)
Note.
1. If the field F is not of characteristic two, there is a one-to-one correspondence
between symmetric bilinear forms and quadratic forms given by the equation (1).
In fact, if Q is a quadratic form on a vector space V over a field F not of
characteristic two, and Q(x) = B(x, x) for some symmetric bilinear form B on V,
then we can recover B from Q because B(x, y) = ½ [Q(x + y) – Q(x) – Q(y)].
2. Let V be a finite - dimensional real inner product space, and let B be a symmetric
bilinear form on V. Then there exists an orthonormal basis B for V such that

Ψ B (B) is a diagonal matrix.

Corollary. Let Q be a quadratic form on a finite-dimensional real inner product
space V. There exists an orthonormal basis B = {v₁, v₂, . . . , vₙ} for V and scalars
λ₁, λ₂, . . . , λₙ (not necessarily distinct) such that if x ∈ V and x = Σᵢ₌₁ⁿ sᵢvᵢ with sᵢ ∈ R,
then Q(x) = Σᵢ₌₁ⁿ λᵢsᵢ².
In fact, if B is the symmetric bilinear form determined by Q, then B can be

chosen to be any orthonormal basis for V such that Ψ B (B) is a diagonal matrix.
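A sketch (assuming NumPy, not part of the original text) of the corollary above and of the polarization identity B(x, y) = ½[Q(x+y) − Q(x) − Q(y)]: expanding x in an orthonormal eigenbasis of the symmetric matrix A turns Q(x) = xᵀAx into a weighted sum of squares Σλᵢsᵢ².

import numpy as np

A = np.array([[1.0, -2.0],
              [-2.0, 1.0]])                   # symmetric matrix of Q(x) = x1^2 - 4 x1 x2 + x2^2

def Q(x):
    return x @ A @ x

x = np.array([3.0, -1.0])
y = np.array([0.5, 2.0])

# polarization: recover the symmetric bilinear form from the quadratic form
print(np.isclose(0.5 * (Q(x + y) - Q(x) - Q(y)), x @ A @ y))

# diagonalization: Q(x) = sum_i lambda_i s_i^2 with s the coordinates in the eigenbasis
eigenvalues, Q_basis = np.linalg.eigh(A)
s = Q_basis.T @ x                             # coordinates of x in the orthonormal eigenbasis
print(np.isclose(Q(x), np.sum(eigenvalues * s**2)))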

4. 3. 1. The Sylvester Inertia Theorem

For proving the Sylvester Inertia Theorem we need following basic definitions of
real quadratic forms:

Definition 4. 3. 2. Let Q be a quadratic form on the n-dimensional real vector space V.


A basis B = {v₁, v₂, . . . , vₙ} of V for which the matrix A of Q has the block diagonal form

A = [ Iᵣ           ]
    [     −Iₛ      ]
    [          0ₜ ]

(that is, r entries +1, then s entries −1, then zeros on the diagonal, and zeros elsewhere)
is called a Sylvester basis for Q, where Iᵣ and Iₛ are respectively the r × r and
s × s unit matrices and where 0ₜ is the t × t zero matrix with 0 ≤ r, s and r + s ≤ n.

Definition 4. 3. 3. The number of positive characteristic roots plus the number of negative
characteristic roots is called the rank of the real quadratic form.
That is, rank(Q) = r + s.

Definition 4. 3. 4. The number of positive characteristic roots minus the number of negative
characteristic roots is called the signature of the real quadratic form.
That is, sig(Q) = r − s.

Theorem 4. 3. 1. For a quadratic form Q on an n-dimensional real vector space V there
always exist a Sylvester basis. The number r and s of positive and negative entries in the
diagonal matrix are independent of the choice of the Sylvester basis.
Proof. (i) Existence of the Sylvester basis:
We find such a basis by induction on n, rather as in the principal axes
transformation, only this time it is much easier. The theorem is trivial for Q = 0, so let us
suppose Q ≠ 0. Then there must be a vector x ∈ V with Q(x) = ±1, and this is all we need
for the inductive step.
If B is the symmetric bilinear form of Q, then W := {y ∈ V : B(x, y) = 0} is an
(n−1)-dimensional subspace of V. [This follows from the dimension theorem applied to the
linear map y ↦ B(x, y) from V to F: rank + nullity = dim(V), and the rank is 1 since B(x, x) = Q(x) ≠ 0.]
By the inductive assumption Q restricted to W has a Sylvester basis {v₁, v₂, . . . , vₙ₋₁}, and we
only need to add x in the right place in order to obtain a Sylvester basis for all of V.
(ii) r and s are well defined:
The quantity r can be defined independently of bases as the maximum dimension
of a subspace of V on which Q is positive definite. In order to see this, take some
Sylvester basis and consider the subspaces V₊, V₋ and V₀ spanned by the first r, the next s,
and the last n − r − s basis vectors, respectively. Then Q restricted to V₊ is positive definite, but
every subspace W of dimension greater than r must, by the dimension theorem, meet V₋ ⊕ V₀
non-trivially; therefore Q restricted to W cannot be positive definite. Analogously, s is the
maximum dimension of a subspace on which Q is negative definite.

Note. Let A = [aᵢⱼ] be a real symmetric n × n matrix acting on Fⁿ, and suppose the inner product
of (δ₁, δ₂, . . . , δₙ) and (γ₁, γ₂, . . . , γₙ) in Fⁿ is the real number
δ₁γ₁ + δ₂γ₂ + . . . + δₙγₙ. For an arbitrary vector v = (x₁, x₂, . . . , xₙ) in Fⁿ a
simple calculation shows that
Q(v) = ⟨A(v), v⟩ = a₁₁x₁² + a₂₂x₂² + . . . + aₙₙxₙ² + 2Σᵢ<ⱼ aᵢⱼxᵢxⱼ.
On the other hand, given any quadratic function in n variables
γ₁₁x₁² + γ₂₂x₂² + . . . + γₙₙxₙ² + 2Σᵢ<ⱼ γᵢⱼxᵢxⱼ,

with real coefficients γᵢⱼ, we clearly can realize it as the quadratic form associated with the
real symmetric matrix C = [γᵢⱼ].

Illustrative Example-1. Determine the rank and signature of the real quadratic form
matrix
A = [  4   5  −2
       5   1  −2
      −2  −2   3 ].
Solution. The characteristic polynomial of the matrix A is
det(A − tI) = −(t³ − 8t² − 14t + 43),
so the characteristic roots are the roots of t³ − 8t² − 14t + 43 = 0.
This equation has two positive roots and one negative root, and the matrix A is non-singular.
Hence the rank of the real quadratic form is r + s = 2 + 1 = 3 and its signature is
r − s = 2 − 1 = 1.
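A numerical cross-check (assuming NumPy, not part of the original text) of Illustrative Example 1: count the signs of the characteristic roots of the symmetric matrix to read off the rank r + s and the signature r − s.

import numpy as np

A = np.array([[4.0, 5.0, -2.0],
              [5.0, 1.0, -2.0],
              [-2.0, -2.0, 3.0]])

roots = np.linalg.eigvalsh(A)                 # real characteristic roots of a symmetric matrix
r = np.sum(roots > 1e-10)                     # number of positive roots
s = np.sum(roots < -1e-10)                    # number of negative roots
print(roots)
print("rank =", r + s, "signature =", r - s)  # rank = 3, signature = 1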

Illustrative Example-2. Find the rank and signature of the quadratic form
x₁² − 4x₁x₂ + x₂².
Solution. The symmetric matrix of the quadratic form (in the variables x₁, x₂, x₃) is

A = [  1  −2   0
      −2   1   0
       0   0   0 ],

with characteristic polynomial
det(A − tI) = −(t³ − 2t² − 3t) = −t(t − 3)(t + 1).
The roots are t = 0, −1 and 3: one positive root and one negative root.
Hence the rank of the real quadratic form is 1 + 1 = 2 and its signature is 1 − 1 = 0.

4. 4. Summary

1. There is a certain class of scalar - valued functions of two variables defined on a vector
space that arises in the study of such diverse subjects as geometry and multivariable
calculus. This is the class of bilinear forms.
2. We study the basic properties of this class with a special emphasis on symmetric bilinear
forms, and we consider some of its applications to quadratic surfaces. A quadratic form is
a homogeneous polynomial of degree two in a number of variables.
3. The Sylvester Inertia Theorem is a classification theorem for real symmetric n × n matrices;
it solves the classification problem for quadratic forms on Rn, and therefore on each n-
dimensional real vector space V.

4. 5. Keywords

Bilinear form Sylvester basis


Diagonalizable bilinear form Sylvester Inertia Theorem
Rank of real quadratic form Symmetric bilinear form
Signature of real quadratic form Quadratic form

4. 6. Assessment Questions

1. Define:
(i) Bilinear form,
(ii) Symmetric bilinear form with example.
2. Let B be a diagonalizable bilinear form on a finite - dimensional vector space V.
Show that B is symmetric.
Hint. See Corollary of Theorem 4. 2. 1.
3. State and prove the Sylvester Inertia Theorem
Hint. See Theorem 4. 3. 1.
4. Determine the rank and signature of the real quadratic form matrix

A = [  1  0  −1
       0  2   1
      −1  1   0 ].

Answer. The characteristic equation of A is t³ − 3t² + 3 = 0, which has two positive
roots and one negative root. Hence the rank of the real quadratic form is 3 and its
signature is 1.

4. 7. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.
5. Klaus Janich– Linear Algebra, Springer, 1994.

BLOCK - III
Canonical Forms

UNIT-1: THE DIAGONAL AND TRIANGULAR FORM

STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. The Diagonal Form
1. 2. 1. Similarity Classes
1. 2. 2. Basic Facts of Diagonalization
1. 2. 3. Simultaneous Diagonalizable
1. 3. The Triangular Form
1. 3. 1. Criteria of a subspace
1. 4. Summary
1. 5. Keywords
1. 6. Assessment Questions
1. 7. References

UNIT-1: THE DIAGONAL AND TRIANGULAR FORM

1. 0. Objectives

After studying this unit you will be able to:


• Discuss the similarity classes.
• Explain the Basic Facts of diagonalization
• Describe the generalized eigenspace
• Distinguish between diagonalization and non-diagonalization
• Define the triangular canonical form
• Discuss the role of minimal polynomial in triangular form
• Explain the alternate theorem for triangular canonical form

1. 1. Introduction

Not every linear operator is diagonalizable, even if its characteristic polynomial
splits. The purpose of this unit is to consider alternative matrix representations for
nondiagonalizable operators. Such representations are generally known as canonical forms.
Here, we study the diagonal and triangular canonical forms.

1. 2. The Diagonal form

1. 2. 1. Similarity Classes

1. Two square matrices A and B are said to be similar matrices if there exists a non
singular matrix C such that B = CAC–1 or A = C–1BC.
2. The linear transformations S, T∈A(V) are said to be similar linear transformation
if there exists an invertible element C∈A(V) such that T = C–1SC.
3. Similarity of linear transformation in A(V) is an equivalence relation, because

(i) T ∼ T, since T = ITI⁻¹.
(ii) T ∼ S ⇒ T = CSC⁻¹ ⇒ S = C⁻¹T(C⁻¹)⁻¹ ⇒ S ∼ T.
(iii) T ∼ S and S ∼ U ⇒ T = CSC⁻¹ and S = DUD⁻¹
⇒ T = C(DUD⁻¹)C⁻¹
⇒ T = (CD)U(CD)⁻¹
⇒ T ∼ U.
The equivalence classes are called similarity classes.

1. 2. 2. Basic Facts of Diagonalization

1. The linear operator T is called diagonalizable if there exists a basis for V with
respect to which the matrix for T is a diagonal matrix.
2. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T.
Define E λ = {x ∈ V: T(x) = λx} = n(T – λIv ). The set E λ is called the eigenspace
of T corresponding to the eigenvalue λ. Analogously, we define the eigenspace of
a square matrix A to be the eigenspace of T A .

Example -1. Let T be the linear operator on P₂(R) defined by T(f(x)) = f′(x). The
matrix representation of T with respect to the standard ordered basis B for P₂(R) is

[T]_B = [ 0  1  0
          0  0  2
          0  0  0 ].

Consequently, the characteristic polynomial of T is

det([T]_B − tI) = det [ −t   1   0
                         0  −t   2
                         0   0  −t ] = −t³.

Thus T has only one eigenvalue (λ = 0), with multiplicity 3. Solving T(f(x)) = f′(x) = 0
shows that E_λ = N(T − λI_V) = N(T) is the subspace of P₂(R) consisting of the constant
polynomials. So {1} is a basis for E_λ, and therefore dim(E_λ) = 1.
Consequently, there is no basis for P 2 (R) consisting of eigenvectors of T, and
therefore T is not diagonalizable.

3. Let T be a linear operator on a finite dimensional vector space V, and let λ be an


eigenvalue of T having multiplicity m. Then 1≤ dim (E λ ) ≤ m, where E λ is a
subspace of V consisting of the zero vector and the eigenvectors of T
corresponding to the eigenvalue λ.
4. Let T be a linear operator, and let λ 1 , λ 2 , . . , λ k be distinct eigenvalues of T.
For each i = 1, 2, . . . , k, let v i ∈E λ , the eigenspace corresponding to λ i .
If v 1 + v 2 + . . . . + v k =0, then v i = 0 for all i.
5. Let T be a linear operator, and let λ 1 , λ 2 , . . , λ k be distinct eigenvalues of T. For
each i = 1, 2, . . . , k, let S i be a finite independent subset of the eigenspaces E λi .
Then S = S₁ ∪ S₂ ∪ . . . ∪ Sₖ is a linearly independent subset of V.
6. Let T be a linear operator on a finite dimensional vector space V. Then T is
diagonalizable if and only if following conditions holds:
(i) The characteristic polynomial of T splits.
(ii) For each eigenvalue λ of T, the multiplicity of λ equals n − rank(T − λI).

The above conditions can be used to test if a square matrix A is diagonalizable


because diagonalizability of A is equivalent to diagonalizability of the operator T A .

Illustrative Example- 2. Test whether the matrix

A = [ 3  1  0
      0  3  0
      0  0  4 ] ∈ M₃ₓ₃(R)

is diagonalizable or not.
Solution. The characteristic polynomial of A is det (A–tI) = – (t –4) (t –3)2, which splits,
and so condition 1 of the test (Fact 6(i)) for diagonalization is satisfied. Also A has
eigenvalues λ 1 = 4 and λ 2 =3 with multiplicities 1 and 2, respectively. Since λ 1 has
multiplicity 1, condition 2 is satisfied for λ 1 . Thus we need only test condition 2 for λ 2 .
Because

A − λ₂I = [ 0  1  0
            0  0  0
            0  0  1 ]

has rank 2, we see that n − rank(A − λ₂I) = 3 − 2 = 1, which is not the multiplicity of λ₂.
Thus condition 2 fails for λ 2 . Therefore A is not diagonalizable.
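A short sketch (assuming NumPy, not part of the original text) of the test used in Illustrative Example 2: for the repeated eigenvalue λ₂ = 3, compare its multiplicity with n − rank(A − λ₂I).

import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])

lam = 3.0
n = A.shape[0]
eigenspace_dim = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(eigenspace_dim)                         # 1, but the multiplicity of lambda = 3 is 2,
                                              # so A is not diagonalizable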

Remark. Not every linear operator is diagonalizable, even if its characteristic
polynomial splits. This block therefore considers alternative matrix representations for
nondiagonalizable operators (see Example 1 and Illustrative Example 2). These
representations are called canonical forms.

1. 2. 3. Simultaneously diagonalizable
Definition 1. 2. 1. Two linear operators T and U on an n-dimensional vector space V are
called simultaneously diagonalizable if there exists some basis B of V such that [T] B and

[U] B are diagonal matrices.


Similarly, A, B∈M n×n (F) are called simultaneously diagonalizable if there exists an
invertible matrix C∈M n×n (F) such that both C–1AC and C–1BC are diagonal matrices.

Lemma 1. 2. 1. If D 1 , D 2 ∈M n×n (F) are two diagonal matrices, then D 1 D 2 = D 2 D 1 .


Proof. If D₁, D₂ ∈ Mₙₓₙ(F) are diagonal matrices, then (D₁)ᵢₖ = δᵢₖ(D₁)ᵢᵢ and
(D₂)ₖⱼ = δₖⱼ(D₂)ⱼⱼ. Hence
(D₁D₂)ᵢⱼ = Σₖ₌₁ⁿ (D₁)ᵢₖ(D₂)ₖⱼ = δᵢⱼ(D₁)ᵢᵢ(D₂)ᵢᵢ = δᵢⱼ(D₂)ᵢᵢ(D₁)ᵢᵢ = (D₂D₁)ᵢⱼ.
Theorem 1. 2. 2. If T and U are simultaneously diagonalizable operators, then T and U
commute.
Proof. Let B be a basis of a vector space V that diagonalizes both T and U. Since
diagonal matrices commutes with each other (by above lemma),
[TU] B = [T] B [U] B = [U] B [T] B =[UT] B .
Now we can relate the two operators in the same way:
TU = φ_B⁻¹ ∘ T_{[TU]_B} ∘ φ_B = φ_B⁻¹ ∘ T_{[UT]_B} ∘ φ_B = UT,
where φ_B is the standard representation of V with respect to the basis B and T_{[TU]_B} is the
left-multiplication transformation by the matrix [TU]_B.

Illustrative Example - 3. Show that the linear map satisfying T2 = T is diagonalizable.


Solution. The minimal polynomial of T must divide t² − t = t(t − 1), and hence it has
distinct roots. Therefore T is diagonalizable: V decomposes into the direct sum of the
subspaces annihilated by T and by T − 1, and on each of these subspaces T acts as the
scalar 0 or 1.
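A concrete instance (assuming NumPy, not part of the original text) of Illustrative Example 3: an idempotent matrix (T² = T) that is not itself diagonal still has a full basis of eigenvectors with eigenvalues 0 and 1, so it is diagonalizable.

import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 0.0]])                    # T @ T == T, but T is not diagonal

print(np.allclose(T @ T, T))                  # idempotent
eigenvalues, V = np.linalg.eig(T)
print(eigenvalues)                            # 1 and 0
print(np.linalg.matrix_rank(V) == 2)          # eigenvectors form a basis: T is diagonalizable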

Illustrative Example - 4. Suppose the characteristic of F does not divide n and T satisfies Tⁿ = 1.
Show that if T has all its eigenvalues in F, then T is diagonalizable.

Solution. Since T has all its eigenvalues in the field F, the minimal polynomial of T is
q(t) = ∏ᵢ (t − λᵢ)^{nᵢ}. We claim that all these roots are simple.
Since Tⁿ = 1, q(t) divides tⁿ − 1, and if q(t) had a multiple root, so would tⁿ − 1.
But tⁿ − 1 cannot have a multiple root in F: its derivative ntⁿ⁻¹ is nonzero (as the characteristic
of F does not divide n) and its only root is 0, which is not a root of tⁿ − 1, so tⁿ − 1 and its
derivative have no common root. Hence the minimal polynomial has distinct roots and T is
diagonalizable.

1. 3. Triangular Canonical Form

Definition 1. 3. 1. The linear transformation T is called triangular if there exists a basis


for V such that the matrix of T relative to the basis is an upper or lower triangular matrix.
In other words, a matrix A = [aᵢⱼ]ₙₓₙ is triangular if all the entries above, or all the entries
below, the main diagonal are zero. For example,

A = [ a₁₁   0   …   0          [ a₁₁  a₁₂  …  a₁ₙ
      a₂₁  a₂₂  …   0     or      0   a₂₂  …  a₂ₙ
       ⋮    ⋮   …   ⋮             ⋮    ⋮   …   ⋮
      aₙ₁  aₙ₂  …  aₙₙ ]          0    0   …  aₙₙ ].
Note.
1. det(A) = a₁₁a₂₂ ⋯ aₙₙ.
2. A is non-singular if and only if aᵢᵢ ≠ 0 for all i = 1, 2, . . . , n; that is,
det(A) ≠ 0. In fact, the eigenvalues of A are the diagonal elements.
3. The subspace W of V is invariant under T∈A(V) if T(W) ⊂ W .

Lemma 1. 3. 1. If W⊂V is invariant under T, then T induces a linear transformation 𝑇 on


V/W, defined by 𝑇 (v + W) = T(v) + W. If T satisfies the polynomial q(x)∈F[x], then so
does 𝑇 . If p 1 (x) is the minimal polynomial for 𝑇 over F and if p(x) is that for T, then
p 1 (x)/p(x).
Proof. Given that T: V → V is linear and W is an invariant subspace of V, so that T(W) ⊂ W.
Let V̄ = V/W = {v + W : v ∈ V} = {v̄ : v ∈ V}.
For v̄ ∈ V̄, define T̄: V/W → V/W by T̄(v + W) = T(v) + W.
First, we prove that T̄ is well defined.

That is, we show that v₁ + W = v₂ + W ⇒ T̄(v₁ + W) = T̄(v₂ + W).
Let v₁ + W = v₂ + W. Then
⇒ v₁ − v₂ ∈ W
⇒ T(v₁ − v₂) ∈ W, because W is invariant under T
⇒ T(v₁) − T(v₂) ∈ W, because T is linear
⇒ T(v₁) + W = T(v₂) + W
⇒ T̄(v₁ + W) = T̄(v₂ + W), that is, T̄(v̄₁) = T̄(v̄₂).
Thus, T̄ is a well-defined map on V/W.
Now, the linearity of T̄ follows from the linearity of T.
Let v₁ + W, v₂ + W ∈ V/W. Then
T̄((v₁ + W) + (v₂ + W)) = T̄((v₁ + v₂) + W), by definition of addition in V/W
= T(v₁ + v₂) + W, by definition of T̄
= (T(v₁) + W) + (T(v₂) + W), because T is linear
= T̄(v₁ + W) + T̄(v₂ + W), that is, T̄(v̄₁ + v̄₂) = T̄(v̄₁) + T̄(v̄₂).
Therefore T̄ preserves addition.


Next consider v + W ∈ V/W and a ∈ F:
T̄(a(v + W)) = T̄(av + W)
= T(av) + W
= aT(v) + W, because T is linear
= a(T(v) + W)
= aT̄(v + W), that is, T̄(av̄) = a(T̄(v̄)).
Therefore T̄ preserves scalar multiplication. Thus, T̄ is a linear transformation on V/W.
Next, let q(x) ∈ F[x]; we show that if q(T) = 0, then q(T̄) = 0.
We claim that (T̄)ᵏ = (Tᵏ)‾, the operator induced on V/W by Tᵏ.
For, (T̄)²(v + W) = T̄(T̄(v + W))
= T̄(T(v) + W)
= T(T(v)) + W
= T²(v) + W.
Similarly, (T̄)³ = (T³)‾, (T̄)⁴ = (T⁴)‾, . . . , (T̄)ᵏ = (Tᵏ)‾.
Let q(x) ∈ F[x], say q(x) = a₀ + a₁x + a₂x² + . . . + aₖxᵏ. Then
q(T̄) = a₀ + a₁(T̄) + a₂(T̄)² + . . . + aₖ(T̄)ᵏ
= (a₀ + a₁T + a₂T² + . . . + aₖTᵏ)‾.
Thus for any polynomial q(x) ∈ F[x], we have q(T̄) = (q(T))‾. Therefore q(T) = 0 ⇒ (q(T))‾ = 0,
that is, q(T̄) = 0. Now let p(x) be the minimal polynomial of T.
Then p(T) = 0 ⇒ p(T̄) = 0, and since p₁(x) is the minimal polynomial of T̄,
by the definition of the minimal polynomial we have p₁(x) divides p(x).

Theorem 1. 3. 2. If T∈A(V) has all its eigenvalues in F, then there is a basis of V in


which the matrix of T is triangular.
Proof. We prove this result by induction method on dim (V) over F.
Let dim(V(F)) = 1 , Then every element in A(V) is a scalar, and hence the theorem is true.
Suppose that the theorem is true for all vector spaces over F of dimension (n –1) and let V
be of dimension n over F.
Since all the eigenvalues of T lie in F, there exists a nonzero vector v₁ ∈ V such that
T(v₁) = λ₁v₁. Let W be the subspace of V generated by v₁, that is, W = {av₁ : a ∈ F}.
Then W is a one-dimensional subspace of V which is invariant under T. Let V̄ = V/W.
Then dim(V̄) = dim(V) − dim(W) = n − 1.
Since T ∈ A(V), we obtain T̄ ∈ A(V̄), where T̄(v + W) = T(v) + W = T̄(v̄).
We know that the minimal polynomial of T̄ divides the minimal polynomial of T,
that is, p₁(x) divides p(x) (by Lemma 1.3.1).
Therefore the eigenvalues of T̄ are in F, because the eigenvalues of T are in F. By the induction
hypothesis, there exists a basis {v̄₂, v̄₃, . . . , v̄ₙ} = {v₂ + W, v₃ + W, . . . , vₙ + W}
of V̄ = V/W over F such that
T̄(v̄₂) = a₂₂v̄₂
T̄(v̄₃) = a₃₂v̄₂ + a₃₃v̄₃
. . . . . . . . . . . . . . . . . . . . .
T̄(v̄ₙ) = aₙ₂v̄₂ + aₙ₃v̄₃ + . . . + aₙₙv̄ₙ.

We now verify that B = {v₁, v₂, . . . , vₙ} is a basis for V with respect to which T has a
matrix in triangular form.
Let a₁v₁ + a₂v₂ + . . . + aₙvₙ = 0 → (1)
⇒ (a₁v₁ + a₂v₂ + . . . + aₙvₙ) + W = W
⇒ a₁(v₁ + W) + a₂(v₂ + W) + . . . + aₙ(vₙ + W) = W
⇒ W + a₂v̄₂ + a₃v̄₃ + . . . + aₙv̄ₙ = W
⇒ a₂v̄₂ + a₃v̄₃ + . . . + aₙv̄ₙ = 0 in V/W
⇒ a₂ = 0, a₃ = 0, . . . , aₙ = 0, since v̄₂, v̄₃, . . . , v̄ₙ are linearly independent.
Now, from (1), we get a₁v₁ = 0, and therefore a₁ = 0 as v₁ ≠ 0.
Therefore B = {v₁, v₂, . . . , vₙ} is a linearly independent set of n vectors in the
n-dimensional space V, and hence it forms a basis of V (here v₂, . . . , vₙ are elements of V
mapping onto v̄₂, v̄₃, . . . , v̄ₙ, respectively).
Further, T̄(v̄₂) = a₂₂v̄₂ ⇒ T̄(v̄₂) − a₂₂v̄₂ = 0 in V/W
⇒ (T(v₂) + W) − a₂₂(v₂ + W) = W
⇒ (T(v₂) − a₂₂v₂) + W = W
⇒ T(v₂) − a₂₂v₂ ∈ W
⇒ T(v₂) − a₂₂v₂ = a₂₁v₁ for some a₂₁ ∈ F, since W = {av₁ : a ∈ F}
⇒ T(v₂) = a₂₁v₁ + a₂₂v₂.
Similarly, we can prove T(vᵢ) = aᵢ₁v₁ + aᵢ₂v₂ + . . . + aᵢᵢvᵢ.
Thus, T(v₁) = λ₁v₁ = a₁₁v₁ (where λ₁ = a₁₁)
T(v₂) = a₂₁v₁ + a₂₂v₂
. . . . . . . . . . . . . . . . . . . . .
T(vᵢ) = aᵢ₁v₁ + aᵢ₂v₂ + . . . + aᵢᵢvᵢ
. . . . . . . . . . . . . . . . . . . . .
T(vₙ) = aₙ₁v₁ + aₙ₂v₂ + . . . + aₙₙvₙ.
Therefore the matrix of T with respect to the basis B = {v₁, v₂, . . . , vₙ} is the lower
triangular matrix

A = [ a₁₁   0   …   0
      a₂₁  a₂₂  …   0
       ⋮    ⋮   …   ⋮
      aₙ₁  aₙ₂  …  aₙₙ ].
Lemma 1. 3. 3. If V is n-dimensional over F and if T ∈ A(V) has the matrix A in the
basis A = {u₁, u₂, . . . , uₙ} and the matrix B in the basis B = {v₁, v₂, . . . , vₙ}, then there is an
element C ∈ Fₙ such that B = CAC⁻¹.
In fact, if S is the linear transformation of V defined by S(uᵢ) = vᵢ for i = 1, 2, . . . , n,
then C can be chosen to be the matrix associated with S in the basis B = {v₁, v₂, . . . , vₙ}.
Proof. Let A = [aᵢⱼ] and B = [bᵢⱼ]. Then T(uᵢ) = Σⱼ₌₁ⁿ aᵢⱼuⱼ and T(vᵢ) = Σⱼ₌₁ⁿ bᵢⱼvⱼ,
respectively. Let S ∈ A(V) be defined by S(uᵢ) = vᵢ, so that S is invertible (since S takes a
basis to a basis if and only if S is bijective if and only if S is invertible).
Now from T(vᵢ) = Σⱼ₌₁ⁿ bᵢⱼvⱼ
⇒ T(S(uᵢ)) = Σⱼ₌₁ⁿ bᵢⱼS(uⱼ), since S(uⱼ) = vⱼ
⇒ (TS)(uᵢ) = S(Σⱼ₌₁ⁿ bᵢⱼuⱼ)
⇒ S⁻¹(TS)(uᵢ) = S⁻¹S(Σⱼ₌₁ⁿ bᵢⱼuⱼ)
⇒ (S⁻¹TS)(uᵢ) = Σⱼ₌₁ⁿ bᵢⱼuⱼ, since S is invertible.
Thus S⁻¹TS has the matrix B in the basis A = {u₁, u₂, . . . , uₙ}, while T has the matrix A in that
basis. Since the mapping which assigns to each operator its matrix with respect to a fixed basis is
an isomorphism of A(V) onto Fₙ, it follows that B = CAC⁻¹ for a suitable invertible matrix
C ∈ Fₙ, obtained from S under this correspondence.

Theorem 1. 3. 4. If the matrix A∈F n has all its eigen values in F, then there is a matrix
C∈ F n such that CAC–1 is a triangular matrix.
Proof. Suppose that the matrix

A = [ a₁₁  a₁₂  …  a₁ₙ
      a₂₁  a₂₂  …  a₂ₙ
       ⋮    ⋮   …   ⋮
      aₙ₁  aₙ₂  …  aₙₙ ] ∈ Fₙ

has all its eigenvalues in F.
Now define a linear map T: Fⁿ → Fⁿ on the standard basis B = {v₁, v₂, . . . , vₙ}, where
v₁ = (1, 0, . . . , 0), v₂ = (0, 1, . . . , 0), . . . , vₙ = (0, 0, . . . , 1), as indicated below:
T(v₁) = (a₁₁, a₁₂, . . . , a₁ₙ) = a₁₁v₁ + a₁₂v₂ + a₁₃v₃ + . . . + a₁ₙvₙ
T(v₂) = (a₂₁, a₂₂, . . . , a₂ₙ) = a₂₁v₁ + a₂₂v₂ + a₂₃v₃ + . . . + a₂ₙvₙ
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
T(vₙ) = (aₙ₁, aₙ₂, . . . , aₙₙ) = aₙ₁v₁ + aₙ₂v₂ + aₙ₃v₃ + . . . + aₙₙvₙ.
Thus A is precisely the matrix of T in the basis v₁, v₂, . . . , vₙ, and the eigenvalues of T,
being equal to those of A, are all in F.
By Theorem 1.3.2, there is a basis of Fⁿ in which the matrix of T is triangular. This
change of basis merely changes the matrix A of the linear transformation T in the first basis
into CAC⁻¹ for a suitable C ∈ Fₙ.
By Lemma 1.3.3, CAC⁻¹ is a triangular matrix for some C ∈ Fₙ.
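A numerical companion (assuming SciPy is available; not part of the original text) to Theorems 1.3.2 and 1.3.4: the Schur decomposition A = ZTZ* exhibits a unitary change of basis Z under which the matrix becomes upper triangular, with the eigenvalues of A on the diagonal of T.

import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 3.0],
              [1.0, 0.0, 1.0]])

T, Z = schur(A, output="complex")             # A = Z T Z*, with T upper triangular
print(np.allclose(Z @ T @ Z.conj().T, A))     # the change of basis recovers A
print(np.allclose(np.tril(T, -1), 0))         # T is upper triangular
print(np.sort_complex(np.diag(T)))            # eigenvalues of A on the diagonal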

Note.
1. Theorem 1. 3. 4, is also known as alternate form of Theorem 1. 3. 2.
2. In the next theorem, we use λᵢ = aᵢᵢ for i = 1, 2, . . . , n.

Theorem 1. 3. 5. If V is n-dimensional vector space over a field F and T∈A(V) has all
its eigenvalues in F, then T satisfies a polynomial of degree n over F.
Proof. Since T ∈ A(V) has all its eigenvalues in F, there is a basis B = {v₁, v₂, . . . , vₙ} of
V in which
T(v₁) = λ₁v₁ = a₁₁v₁ (where λ₁ = a₁₁)
T(v₂) = a₂₁v₁ + λ₂v₂
T(v₃) = a₃₁v₁ + a₃₂v₂ + λ₃v₃
. . . . . . . . . . . . . . . . . . . . .
T(vₙ) = aₙ₁v₁ + aₙ₂v₂ + . . . + λₙvₙ.
Equivalently, (T – λ 1 ) v 1 = 0
(T – λ 2 ) v 2 = a 21 v 1
(T – λ 3 ) v 3 = a 31 v 1 + a 32 v 2
………………………………
(T – λ n ) v n = a n1 v 1 + a n2 v 2 +…….. + a n, n–1 v n–1 .

Note that (T − λ₁)(T − λ₂)v₁ = (T − λ₂)(T − λ₁)v₁ = (T − λ₂)·0 = 0, since (T − λ₁)v₁ = 0.
Also, (T – λ 1 ) (T – λ 2 ) v 2 = (T – λ 1 ) a 21 v 1 , since (T – λ 2 ) v 2 = a 21 v 1
= a 21 ((T – λ 1 ) v 1 ) = a 21 .0 = 0, since (T – λ 1 ) v 1 = 0.
Continuing this type of computation, we get
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 1 = 0
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 2 = 0
……………………………………………
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v n = 0
Let S = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 )∈ A(V). Then
S (v 1 ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 1 = 0
S (v 2 ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 2 = 0
……………………………………………..………
S (v n ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v n = 0
That is S annihilates all the vectors of basis of V. So, S annihilates all the vectors of V.
That is S (v) = 0 for all v∈V.
Therefore S = 0 ⇒ (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) = 0.
Therefore T satisfies the polynomial (x – λ n ) (x – λ n–1 ) . . . . . (x – λ 1 ) in F[x] of
degree n.
This completes the proof.
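A quick check (assuming NumPy, not part of the original text) of Theorem 1.3.5 for a triangular matrix: the product (A − λ₁I)(A − λ₂I) ⋯ (A − λₙI) over the diagonal entries is the zero matrix, so A satisfies a polynomial of degree n.

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [5.0, 3.0, 0.0],
              [1.0, -4.0, 3.0]])              # lower triangular; eigenvalues 2, 3, 3

S = np.eye(3)
for lam in np.diag(A):
    S = S @ (A - lam * np.eye(3))             # accumulate (A - lambda_i I)

print(np.allclose(S, np.zeros((3, 3))))       # True: the product annihilates every vector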

1. 5. Summary

1. Suppose we have some set S of objects, with an equivalence relation. A canonical


form is given by designating some objects of S to be "in canonical form", such that
every object under consideration is equivalent to exactly one object in canonical
form. In other words, the canonical forms in S represent the equivalence classes,
once and only once. To test whether two objects are equivalent, it then suffices to
test their canonical forms for equality. A canonical form thus provides a

classification theorem and more, in that it not just classifies every class, but gives a
distinguished (canonical) representative.
2. Let's look at what it means for the matrix of T to be diagonal. Recall from
Block -II, we get the matrix by choosing a basis B= {v 1 , v 2 , . . . , v n }, and then
entering the coordinates of T(v₁) as the first column, the coordinates of T(v₂) as the
second column, etc. The matrix is diagonal, with entries λ₁, λ₂, λ₃, . . . , λₙ, if
and only if the chosen basis has the property that T(v i ) = λ i v i , for 1 ≤ i ≤ n. This
leads to the definition of an eigenvalue and its corresponding eigenvectors. So,
diagonalizing a matrix is equivalent to finding a basis consisting of eigenvectors.
3. Particularly, in triangular canonical form, we study the following
• If T∈A(V) has all its eigenvalues in F, then there is a basis of V in which the
matrix of T is triangular.
• If V is n-dimensional vector space over a field F and T∈A(V) has all its
eigenvalues in F, then T satisfies a polynomial of degree n over F.

1. 6. Keywords

Canonical form Similar linear transformation


Diagonalizable Similarity classes
Generalized eigenspace Simultaneously diagonalizable
Generalized eigenvector Triangular canonical form
Minimal polynomial

1. 7. Assessment Questions

1. Show that the similarity of linear transformation in A(V) is an equivalence


relation.
Hint. See the section 1. 2. 1.
2. Explain the diagonalization and non- diagonalization matrix with suitable
example.

Hint. See the section 1. 2. 2.
3. Let A and B be two diagonalizable n × n matrices. Prove that A and B are
simultaneously diagonalizable if and only if A and B commute (that is, AB = BA).
Hint. See the section 1. 2. 3.
4. Let A = [ 0  −2        and  B = [ −3  −4
             1   3 ]                  2   3 ].
Find the basis which simultaneously
diagonalizes A and B.
Answer. {(1, 1), (1, 2)}.
5. If T∈A(V) has all its eigenvalues in F, then show that there is a basis of V in which the
matrix of T is triangular.
Hint. See the Theorem 1. 3. 2.

1. 8. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.

UNIT-2: THE JORDAN CANONICAL FORM

STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. The Jordan Form
2. 2. 1. Basic Facts of Nilpotent Transformation
2. 2. 2. Minimal Polynomial
2. 2. 3. Basic Jordan Block
2. 3. Summary
2. 4. Keywords
2. 5. Assessment Questions
2. 6. References

UNIT-2: THE JORDAN CANONICAL FORM

2. 0. Objectives

After going through this unit, you will be able to:


• Discuss the basic facts of nilpotent transformation and index of nilpotence
• Explain the role of minimal polynomial in Jordan canonical form.
• Express the basic Jordan blocks with the importance of diagonals and
superdiagonals.
• Describe the collection Jordan blocks
• Illustrate the relation between diagonalization and Jordan canonical form.

2. 1. Introduction

The Jordan canonical form describes the structure of an arbitrary linear


transformation on a finite-dimensional vector space over an algebraically closed field.
Here we study only the most basic concepts of linear algebra, with no reference to
determinants or ideals of polynomials.

2. 2. Jordan Canonical form

2. 2. 1. Basic Facts of Nilpotent Transformations

Fact 1. If V = V 1 ⊕ V 2 ⊕ . . . . . . . ⊕ V k , where each subspace V i is of dimension n i


and is invariant under T, an element of A(V) , then a basis of V can be found so that the

matrix of T in this basis is of the form

[ A₁  0  …  0
  0   A₂ …  0
  ⋮   ⋮  …  ⋮
  0   0  …  Aₖ ],

where each Aᵢ is an nᵢ × nᵢ matrix and is the matrix of the linear transformation induced by T on Vᵢ.
Fact 2. If T ∈ A(V) is nilpotent, then a 0 + a 1 T + . . . . . . . + a m Tm , where the a i ∈ F, is
invertible if a 0 ≠ 0.
Notation. Mₛ will denote the s × s matrix

[ 0 1 0 … 0 0
  0 0 1 … 0 0
  ⋮ ⋮ ⋮ … ⋮ ⋮
  0 0 0 … 0 1
  0 0 0 … 0 0 ],

all of whose entries are 0 except on the superdiagonal, where they are all 1's.

Definition 2. 2. 1. If T ∈ A(V) is nilpotent, then k is called the index of nilpotence of T


if Tk = 0 but Tk – 1 ≠ 0.

Fact 3. If T ∈ A(V) is nilpotent, of index of nilpotence n₁, then a basis of V can be found
such that the matrix of T in this basis has the form

[ M_{n₁}   0     …   0
   0     M_{n₂}  …   0
   ⋮       ⋮     …   ⋮
   0       0     …  M_{nᵣ} ],

where n₁ ≥ n₂ ≥ . . . ≥ nᵣ and where n₁ + n₂ + . . . + nᵣ = dim(V). Here, the
integers n₁, n₂, . . . , nᵣ are called the invariants of T.

Definition 2. 2. 2. If T ∈ A(V) is nilpotent, the subspace M of V, of dimension m, which
is invariant under T, is called cyclic with respect to T if
(i) Tᵐ(M) = (0) and Tᵐ⁻¹(M) ≠ (0);
(ii) there is an element z ∈ M such that z, T(z), . . . , Tᵐ⁻¹(z) form a basis of M.

2. 2. 2. Minimal Polynomial

1. A polynomial p(x) is said to be the Minimal polynomial for T, if


(i) p(T) = 0
(ii) If for all q(x) ∈F[x] such that degree of q(x) < degree of p(x), then q(T)≠0.
2. Let V 1 be a subspace of vector space V over F invariant under T∈A(V). Then T
induces a linear transformation T 1 ∈A(V 1 ) given by T 1 (u)= T(u) for all u∈V 1 .
3. For any q(x) ∈ F[x], q(T) ∈ A(V), the linear transformation induced by q(T) on V₁
is q(T₁), i.e., q(T)|_{V₁} = q(T₁), where T₁ = T|_{V₁}.
Further, q(T) = 0 ⇒ q(T 1 ) = 0. That is T 1 satisfies any polynomial satisfied by T.

Lemma 2. 2. 1. Suppose that V = V 1 ⊕V 2 , where V 1 and V 2 are subspaces of V


invariant under T. Let T₁ and T₂ be the linear transformations induced by T on V₁ and
V₂, respectively. If the minimal polynomial of T₁ over F is p₁(x) and that of T₂ is p₂(x), then the
minimal polynomial for T over F is the least common multiple of p₁(x) and p₂(x).
Proof. Let p(x) be the minimal polynomial of T.
If q(x) is the least common multiple of {p₁(x), p₂(x)}, then p₁(x) | q(x) and p₂(x) | q(x).
Since p(T) = 0, we get p(T₁) = 0 and p(T₂) = 0.
Because p₁(x) and p₂(x) are the minimal polynomials of T₁ and T₂,
it follows that p₁(x) | p(x) and p₂(x) | p(x)
⇒ the least common multiple of {p₁(x), p₂(x)} divides p(x) ⇒ q(x) | p(x) → (1)
On the other hand, p₁(T₁) = 0 gives p₁(T₁)(v₁) = 0 for any v₁ ∈ V₁.
But then p₁(x) | q(x) ⇒ q(T₁)(v₁) = 0;
similarly p₂(x) | q(x) ⇒ q(T₂)(v₂) = 0 for any v₂ ∈ V₂.
For any v ∈ V, we have v = v₁ + v₂, where v₁ ∈ V₁, v₂ ∈ V₂.
Therefore, q(T)v = q(T)v₁ + q(T)v₂
= q(T₁)v₁ + q(T₂)v₂ = 0 for any v ∈ V.
Therefore q(T) = 0; that is, q(x) is satisfied by the linear operator T ∈ A(V). Since p(x) is
the minimal polynomial of T, we get p(x) | q(x) → (2)
From (1) and (2), we get q(x) = p(x).
Hence p(x) = least common multiple of {p₁(x), p₂(x)}.

Corollary. If V = V₁ ⊕ V₂ ⊕ . . . ⊕ Vₖ, where each Vᵢ is invariant under T, and if pᵢ(x) is


the minimal polynomial over a field F of T i , the linear transformation induced by T on V i ,

then the minimal polynomial of T over a field F is the Least common multiple of
p 1 (x), p 2 (x) ,… ., p k (x).

Note. In what follows, we use
V₁ = {v ∈ V : q₁(T)^{l₁}(v) = 0},
V₂ = {v ∈ V : q₂(T)^{l₂}(v) = 0},
. . . . . . . . . . . . . . . . . . . . . . . .
Vₖ = {v ∈ V : qₖ(T)^{lₖ}(v) = 0}
as subspaces of V, where q₁(x), q₂(x), . . . , qₖ(x) are distinct irreducible polynomials and
l₁, l₂, . . . , lₖ are positive integers. Then we can verify that each Vᵢ is invariant under T:
for u ∈ Vᵢ, we have qᵢ(T)^{lᵢ}(T(u)) = T(qᵢ(T)^{lᵢ}(u)) = T(0) = 0, since qᵢ(T)^{lᵢ}(u) = 0.
Therefore Vᵢ is invariant under T for all i = 1, 2, . . . , k.

Theorem 2. 2. 2. Let Vᵢ = {v ∈ V : qᵢ(T)^{lᵢ}(v) = 0} for i = 1, 2, . . . , k be the
invariant subspaces of V under T. Then Vᵢ ≠ {0}, V = V₁ ⊕ V₂ ⊕ . . . ⊕ Vₖ, and the
minimal polynomial of Tᵢ on Vᵢ is qᵢ(x)^{lᵢ}.
Proof. If k = 1 , then V = V 1 and there is nothing that needs proving suppose that k >1.
To prove Vᵢ ≠ {0} for all i = 1, 2, . . . , k, we define the polynomials
h₁(x) = q₂(x)^{l₂} q₃(x)^{l₃} ⋯ qₖ(x)^{lₖ},
h₂(x) = q₁(x)^{l₁} q₃(x)^{l₃} ⋯ qₖ(x)^{lₖ},
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
hᵢ(x) = ∏_{j≠i} qⱼ(x)^{lⱼ},
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
hₖ(x) = q₁(x)^{l₁} q₂(x)^{l₂} ⋯ q_{k−1}(x)^{l_{k−1}}.
If p(x) = q₁(x)^{l₁} q₂(x)^{l₂} ⋯ qₖ(x)^{lₖ} is the minimal polynomial of T ∈ A_F(V),
then hᵢ(x) ≠ p(x) and hence hᵢ(T) ≠ 0.
Therefore hᵢ(T)(v) ≠ 0 for some v ∈ V; that is, w ≠ 0, where w = hᵢ(T)(v).
However, qᵢ(T)^{lᵢ}(w) = (qᵢ(T)^{lᵢ} hᵢ(T))(v) = p(T)(v) = 0, since p(T) = qᵢ(T)^{lᵢ} hᵢ(T) = 0.
Thus w ∈ Vᵢ with w ≠ 0.
Therefore Vᵢ ≠ {0} for i = 1, 2, . . . , k.
We also note, from the above argument, that hᵢ(T)V ⊆ Vᵢ (and hᵢ(T)V ≠ {0}).
Further, for j ≠ i we have qⱼ(x)^{lⱼ} | hᵢ(x).
Therefore, for vⱼ ∈ Vⱼ, we have hᵢ(T)(vⱼ) = 0.
We now show that V = V₁ + V₂ + . . . + Vₖ.
The k polynomials h₁(x), h₂(x), . . . , hₖ(x) are relatively prime.
Hence we can find polynomials a₁(x), a₂(x), . . . , aₖ(x) in F[x] such that
1 = a₁(x)h₁(x) + a₂(x)h₂(x) + . . . + aₖ(x)hₖ(x), so
I = a₁(T)h₁(T) + a₂(T)h₂(T) + . . . + aₖ(T)hₖ(T), and
v = a₁(T)h₁(T)(v) + a₂(T)h₂(T)(v) + . . . + aₖ(T)hₖ(T)(v) for v ∈ V.
But aᵢ(T)(v) ∈ V for all i ⇒ aᵢ(T)hᵢ(T)(v) ∈ hᵢ(T)V ⊆ Vᵢ
⇒ aᵢ(T)hᵢ(T)(v) ∈ Vᵢ for all i = 1, 2, . . . , k.
Hence, from the above expression, we get
v = v₁ + v₂ + . . . + vₖ, where vᵢ = aᵢ(T)hᵢ(T)(v) for all i = 1, 2, . . . , k.
Thus V = V₁ + V₂ + . . . + Vₖ.
We must now verify that this sum is a direct sum.
To show this, it is enough to prove that if u₁ + u₂ + . . . + uₖ = 0 with each uᵢ ∈ Vᵢ, then
each uᵢ = 0. Suppose that u₁ + u₂ + . . . + uₖ = 0 where some uᵢ, say u₁, is not zero.
Then h₁(T)(u₁ + u₂ + . . . + uₖ) = 0 gives
h₁(T)(u₁) + . . . + h₁(T)(uₖ) = 0, where h₁(T)(uⱼ) = 0 for all j ≠ 1; hence h₁(T)(u₁) = 0.
But u₁ ∈ V₁ ⇒ q₁(T)^{l₁}(u₁) = 0.
Now q₁(x)^{l₁} is relatively prime to h₁(x), which implies
1 = b₁(x)h₁(x) + b₂(x)q₁(x)^{l₁} for some b₁(x), b₂(x) ∈ F[x]
⇒ I = b₁(T)h₁(T) + b₂(T)q₁(T)^{l₁}
⇒ u₁ = b₁(T)h₁(T)(u₁) + b₂(T)q₁(T)^{l₁}(u₁)
= b₁(T)(h₁(T)(u₁)) + b₂(T)(q₁(T)^{l₁}(u₁))
= b₁(T)(0) + b₂(T)(0) = 0.
This is a contradiction to the fact that u₁ ≠ 0.
Therefore, u₁ + u₂ + . . . + uₖ = 0 ⇒ uᵢ = 0 for all i.
Therefore, V = V 1 ⨁V 2 ⨁……….⨁V k .
Finally, we show that the minimal polynomial of Tᵢ on Vᵢ is qᵢ(x)^{lᵢ}.
By the definition of Vᵢ, we have qᵢ(Tᵢ)^{lᵢ} = 0 on Vᵢ.
Therefore the minimal polynomial of Tᵢ is a divisor of qᵢ(x)^{lᵢ};
that is, the minimal polynomial of Tᵢ is qᵢ(x)^{fᵢ}, where fᵢ ≤ lᵢ.
But, by the corollary on the minimal polynomial, the minimal polynomial of T
over F is the least common multiple of {q₁(x)^{f₁}, q₂(x)^{f₂}, . . . , qₖ(x)^{fₖ}},
namely q₁(x)^{f₁} q₂(x)^{f₂} ⋯ qₖ(x)^{fₖ}.
That is, q₁(x)^{l₁} q₂(x)^{l₂} ⋯ qₖ(x)^{lₖ} = q₁(x)^{f₁} q₂(x)^{f₂} ⋯ qₖ(x)^{fₖ}.
This implies that l₁ = f₁, . . . , lₖ = fₖ.
Therefore, the minimal polynomial of Tᵢ is qᵢ(x)^{fᵢ} = qᵢ(x)^{lᵢ}.
This completes the proof.

2. 2. 3. Basic Jordan block


Definition 2. 2. 3. The matrix

[ λ 1 0 … 0 0
  0 λ 1 … 0 0
  ⋮ ⋮ ⋮ … ⋮ ⋮
  0 0 0 … λ 1
  0 0 0 … 0 λ ],

with λ's on the diagonal, 1's on the superdiagonal, and 0's elsewhere, is a basic Jordan
block belonging to λ.

Note. The matrix A is said to be in Jordan form if it satisfies following conditions.


(i) It must be block diagonal form, where each block has a fixed scalar on the
main diagonal and 1’s or 0’s on the super diagonal. These blocks are called
primary blocks of A.
(ii) The scalars for different primary blocks must be distinct.
(iii) Each primary block must be made up of secondary blocks with a scalar on the
diagonal and only 1’s on the superdiagonal. These blocks must be in decreasing
size (moving down the main diagonal).

Example - 1. Here are two matrices in Jordan form. The first has two primary blocks.
While the second has three.

3 1 0  3 1 0 
0 3 1  0 3 1 
   
0 0 3  0 0 3 
  
3

 3 1   
   3 
0 3  
   2 1 0 0 
 2 1   
 0 2  
0 2 1 0

   0 0 2 1 
 2 1  0 0 0 2 
 2 
 
 0  5 
 
 5

As a point of interest, the first matrix has characteristic polynomial (x−3)⁵(x−2)⁴
and minimal polynomial (x−3)³(x−2)², while the second matrix has characteristic polynomial
(x−3)⁵(x−2)⁴(x−5)² and minimal polynomial (x−3)³(x−2)⁴(x−5).

Theorem 2. 2. 3. Let T ∈ A(V) have all its distinct characteristic roots, λ 1 , λ 2 , . . . . . ,


λₖ, in F. Then a basis of V can be found in which the matrix of T is of the form

[ J₁
      J₂
          ⋱
              Jₖ ],

where each Jᵢ = [ Bᵢ₁
                       Bᵢ₂
                           ⋱
                               Bᵢᵣᵢ ]

and where Bᵢ₁, Bᵢ₂, . . . , Bᵢᵣᵢ are basic Jordan blocks belonging to λᵢ.

Proof. We note that an m × m basic Jordan block belonging to λ is merely λI + Mₘ. By
Fact 1 and Fact 3, we can reduce to the case when T has only one characteristic root λ,
that is, T = λ + (T − λ), and since T − λ is nilpotent, by Fact 3 there is a basis in which its
matrix is of the form

[ M_{n₁}
        M_{n₂}
              ⋱
                 M_{nᵣ} ].

But then the matrix of T is

λI + diag( M_{n₁}, M_{n₂}, . . . , M_{nᵣ} ) = diag( B₁, B₂, . . . , Bᵣ ),

where each Bⱼ = λI + M_{nⱼ} is a basic Jordan block belonging to λ, using the first remark made
in this proof about the relation between a basic Jordan block and the Mₘ's. This completes the theorem.

Note.
1. In each J_i the size of B_{i1} ≥ size of B_{i2} ≥ . . . ; when this has been done, the
matrix diag( J_1, J_2, . . . , J_k ) is called the Jordan canonical form of T.
2. Two linear transformations in A(V) which have all their eigenvalues in F are
similar if and only if they can be brought to the same Jordan form.

Illustrative Example -2. Compute the Jordan canonical form for

    A = ⎛ 1  0   0 ⎞
        ⎜ 0  0  −2 ⎟
        ⎝ 0  1   3 ⎠.
Solution. Write A for the given matrix. The characteristic polynomial of A is
(λ − 1)²(λ − 2). So the two possible minimal polynomials are (λ − 1)(λ − 2) or the
characteristic polynomial itself.
We find that (A – I) (A – 2I) = 0 so the minimal polynomial is (λ–1) (λ–2), and hence the
invariant factors are λ–1, (λ–1) (λ–2).
The prime power factors of the invariant factors are the elementary divisors: λ–1, λ–1,
λ–2. Finally the Jordan canonical form of A is diagonal with diagonal entries 1, 1, 2.
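For readers who wish to check such computations mechanically, the following short sketch is a
non-authoritative illustration (it assumes the SymPy library is available; the variable names
are ours, not the text's). It verifies the minimal polynomial and the Jordan form of the matrix
in Illustrative Example-2.

    from sympy import Matrix, eye, symbols

    lam = symbols('lambda')
    A = Matrix([[1, 0, 0],
                [0, 0, -2],
                [0, 1, 3]])

    # characteristic polynomial, printed in expanded form; it factors as (lambda-1)**2*(lambda-2)
    print(A.charpoly(lam).as_expr())

    # (A - I)(A - 2I) is the zero matrix, so the minimal polynomial is (x - 1)(x - 2)
    print((A - eye(3)) * (A - 2*eye(3)))

    # Jordan form: J should be the diagonal matrix diag(1, 1, 2)
    P, J = A.jordan_form()
    print(J)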

Note. After determining that the minimal polynomial has all roots in the ground field and
no repeated roots, we can immediately conclude that the matrix is diagonalizable and
therefore the Jordan canonical form is diagonal.

Illustrative Example -3. Find all possible Jordan forms A for 6 × 6 matrices with
t²(1 − t)² as minimal polynomial.

Solution. The possible characteristic polynomials of A have the same irreducible factors
t and (1 − t), each appearing with exponent at least 2, and total degree 6. Writing, as above,
J_m(λ) for the m × m basic Jordan block belonging to λ, we have the following cases:

Case (i). Characteristic polynomial is t⁴(t − 1)². The Jordan form in this case is

    J_2(0) ⊕ J_2(0) ⊕ J_2(1)   or   J_2(0) ⊕ J_1(0) ⊕ J_1(0) ⊕ J_2(1).

Case (ii). Characteristic polynomial is t³(t − 1)³. The Jordan form is

    J_2(0) ⊕ J_1(0) ⊕ J_2(1) ⊕ J_1(1).

Case (iii). Characteristic polynomial is t²(t − 1)⁴. The Jordan form is

    J_2(0) ⊕ J_2(1) ⊕ J_2(1)   or   J_2(0) ⊕ J_2(1) ⊕ J_1(1) ⊕ J_1(1).

Illustrative Example -4. Let J be a Jordan block with diagonal entries λ. Then show that
λ is the only eigenvalue, and the associated eigenspace is only 1-dimensional.
Solution. Since J is upper-triangular, it is clear that the only eigenvalue is λ.
Solving Jx = λx gives the equations λx_i + x_{i+1} = λx_i for i < n (the last equation,
λx_n = λx_n, imposes no condition), from which we see that x_2 = ⋯ = x_n = 0, giving the
eigenvector (1, 0, . . . , 0).

2. 3. Summary

Jordan canonical form of a linear operator on a finite dimensional vector space


is an upper triangular matrix of a particular form called Jordan matrix, representing the
operator on some basis. The form is characterized by the condition that any non-diagonal

entries that are non-zero must be equal to 1, be immediately above the main diagonal (on
the superdiagonal), and have identical diagonal entries to the left and below them. The
diagonal form for diagonalizable matrices, for instance normal matrices, is a special case
of the Jordan normal form.

2. 4. Keywords

Generalized eigenspace Jordan canonical form


Generalized eigenvector Jordan form of a matrix
Index of nilpotence Minimal polynomial
Jordan block Nilpotent transformation
Jordan canonical basis

2. 5. Assessment Questions

1. Define the nilpotent transformation. Show that two nilpotent linear


transformations are similar if and only if they have same invariants.
Hint. See the Basic Facts of nilpotent transformation and section 1. 2. 1.

2. If V = V 1 ⨁V 2 ⨁…….⨁V k where each V i is a invariant under T and if p i (x) is the


minimal polynomial over a field F of T i , the linear transformation induced by T
on V i , then show that the minimal polynomial of T over a field F is the least
common multiple of p 1 (x), p 2 (x) ,… ., p k (x).
Hint. See the Lemma 2. 2. 1.

3. Define the Jordan canonical form with suitable example which contains atleast
two Jordan blocks.
Hint. See the definition 2. 2. 3 and Example-1.

2 6 − 15 
 
4. Find the Jordan canonical form of A=  1 1 − 5
1 2 − 6 

157
Answer. The characteristic polynomial of A is (λ–1)4

1 2 0 0
 
0 1 2 0
5. Determine the Jordan canonical form for the matrix A=  .
0 0 1 2
 
 1 
0 0 0

Answer. The characteristic polynomial of A is (λ − 1)⁴ and A has a single Jordan
block of type (1, 4), that is, a single 4 × 4 basic Jordan block belonging to 1.

2. 6. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.

UNIT-3: THE MINIMAL POLYNOMIAL

STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. The Minimal Polynomial
3. 3. Summary
3. 4. Keywords
3. 5. Assessment Questions
3. 6. References

UNIT-3: THE MINIMAL POLYNOMIAL

3. 0. Objectives

After working through this unit, the reader should be able:


• To define the basic concepts of the minimal polynomial.
• To show that the minimal polynomial of T exists and is unique.
• To explain the role of the division algorithm for the minimal polynomial.
• To show that a linear transformation T is diagonalizable if and only if its minimal
polynomial is a product of distinct linear factors.

3. 1. Introduction

The minimal polynomial records the distinct eigenvalues and the size of the largest Jordan
block corresponding to each eigenvalue. In this unit, we study the minimal polynomial, which
plays a vital role in canonical forms, in particular for generalized eigenvalues and eigenvectors.

3. 2. Minimal polynomial

Definition 3. 2. 1. The monic polynomial p(x) of minimum degree such that p(T) = 0 is
called the minimal polynomial of T
Note.
1. Let F be a field. Let p(x) and h(x) ∈ F[x] ( or simply, P(F)). Suppose p(x) ≠ 0.
Then we may find q(x) and r(x) ∈ F[x] such that h(x) = q(x)p(x) + r(x), where
either r(x) = 0 or deg (r(x)) < deg(p(x)). This is known as Division Algorithm.
2. If we consider the set I, T, T², . . . , T^{n²} in the n²-dimensional vector space A(V), we have
(n² + 1) elements, so they cannot be linearly independent. Thus there is some non-trivial
linear combination a_0 I + a_1 T + ⋯ + a_{n²} T^{n²} that equals the zero transformation.
Hence every T ∈ A(V) satisfies some polynomial of degree ≤ n². Knowing that there
is some polynomial that T satisfies, we can find a polynomial of minimal degree
that T satisfies, and then we can divide by its leading coefficient to obtain a monic
polynomial.

Theorem 3. 2. 1. Let p(x) be a minimal polynomial of a linear operator T on a finite


dimensional vector space V.
(i) The minimal polynomial of T exists and unique.
(ii) If p(x) is the minimal polynomial of T and h(x) is any polynomial with h(T) =0,
then p(x) is a factor of h(x).
Proof. Let p(x) be a monic polynomial of minimal degree such that
p(T) = 0.
Suppose that h(x) is any polynomial with h(T) = 0.
The division algorithm holds for polynomials with coefficients in the field F, so it is
possible to write h(x) = q(x)p(x) + r(x), where either r(x) = 0 or deg (r(x)) < deg(p(x)).
If r(x) ≠ 0, then we have r(T) = h(T) – q(T)p(T) = 0.
We can divide r(x) by its leading coefficient to obtain a monic polynomial satisfied by T,
and then this contradicts the choice of p(x) as a monic polynomial of minimal degree
satisfied by T.
We conclude that the remainder r(x) = 0, so h(x) = q(x)p(x) and p(x) is thus a
factor of h(x).
If g(x) is another monic polynomial of minimal degree with g(T) = 0, then by the
preceding paragraph p(x) is a factor of g(x), and g(x) is a factor of p(x).
Since both are monic polynomials, this forces g(x) = p(x), showing that the minimal
polynomial is unique.

Example - 1. Consider the matrices A 0 , A 1 and A 2 above.


Now n_i(x) = x^{i+1} is a polynomial such that n_i(A_i) = 0.
So the minimal polynomial p_i(x) must divide x^{i+1} in each case.
From this it is easy to see that the minimal polynomials are in fact n_i(x), so that p_0(x) = x,
p_1(x) = x² and p_2(x) = x³.

For a more involved example, consider the matrix B = B_k(λ) ∈ M_{k,k}(F), where λ is a
scalar; B_k(λ) has λ's on the main diagonal and 1's on the superdiagonal.
First consider T = B_k(0) = B_k(λ) − λI_k = B − λI_k.
This matrix is nilpotent. In fact T^k = 0, but T^{k−1} ≠ 0.
So if we set g(x) = (x − λ)^k then g(B) = 0.
Once again, the minimal polynomial p(x) of B must divide g(x).
So p(x) = (x − λ)^i for some i ≤ k.
But since T^{k−1} ≠ 0, in fact i = k, and the minimal polynomial of B is precisely
p(x) = (x − λ)^k.

Note.
1. The characteristic and minimal polynomials of a linear transformation have the
same zeros (except for multiplicities).
2. If V is a finite- dimensional vector space over F, then T∈A(V) is invertible if and
only if the constant term of the minimal polynomial for T is not zero.
3. Suppose all the characteristic roots of T ∈ A(V) are in F. Then the minimal
polynomial of T is q(x) = (x − λ_1)^{l_1} (x − λ_2)^{l_2} ⋯ (x − λ_k)^{l_k} for λ_i ∈ F.
Here q_i(x) = (x − λ_i)^{l_i} and V_i = {v ∈ V : (T − λ_i)^{l_i}(v) = 0}.
So, if all the distinct characteristic roots λ_1, . . . , λ_k of T lie in F, then V can be
written as V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k, where V_i = {v ∈ V : (T − λ_i)^{l_i}(v) = 0} and
where T_i has only one characteristic root, λ_i, on V_i.

Example - 2. Let T be the linear operator on R² defined by T(a, b) = (2a + 5b, 6a + b), and let B
be the standard ordered basis for R². Then

    [T]_B = ⎛ 2  5 ⎞
            ⎝ 6  1 ⎠,

and hence the characteristic polynomial of T is

    f(t) = det ⎛ 2 − t    5   ⎞ = (t − 7)(t + 4).
               ⎝   6    1 − t ⎠

Thus the minimal polynomial of T is also (t − 7)(t + 4).

Theorem 3. 2. 2. The linear transformation T is diagonalizable if and only if its minimal


polynomial is a product of distinct linear factors.

Proof. If T is diagonalizable, we can compute its minimal polynomial using a diagonal
matrix, and then it is clear that we just need one linear factor for each of the distinct
entries along the diagonal.
Conversely, suppose that the minimal polynomial of T is p(x) = (x − λ_1)(x − λ_2) ⋯ (x − λ_n)
with the λ_i distinct. Then V is the direct sum of the null spaces N(T − λ_i I_V), which shows that
there exists a basis for V consisting of eigenvectors.

Illustrative Example-3. Let V = F_3[x] be the space of all polynomials of degree at most
3 and let T : V → V be the linear transformation given by T(f) = f′ (the derivative of f). Find
the minimal polynomial of T.
Solution. A basis for V is {1, x, x², x³}, and T(1) = 0, T(x) = 1, T(x²) = 2x and T(x³) = 3x².
Hence the matrix of T is

    A = ⎛ 0  0  0  0 ⎞
        ⎜ 1  0  0  0 ⎟
        ⎜ 0  2  0  0 ⎟
        ⎝ 0  0  3  0 ⎠.

By direct computation, A⁴ = 0 and A³ ≠ 0. Hence the minimal polynomial of T is x⁴.
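As a quick numerical check of this example, the sketch below (an illustration only, assuming
SymPy is available; the variable names are ours) verifies A³ ≠ 0 and A⁴ = 0 directly.

    from sympy import Matrix, zeros

    # matrix of the differentiation operator above, on the basis {1, x, x^2, x^3}
    A = Matrix([[0, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 2, 0, 0],
                [0, 0, 3, 0]])

    print(A**3 == zeros(4, 4))   # False: A^3 is not the zero matrix
    print(A**4 == zeros(4, 4))   # True:  A^4 = 0, so the minimal polynomial of T is x^4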

Note.

1. Let µ be a nonempty set of elements in A(V). A subspace W ⊂ V is said to be
invariant under µ if for every M ∈ µ, WM ⊂ W.
2. The nonempty set µ of linear transformations in A(V) is called an irreducible set
if the only subspaces of V invariant under µ are {0} and V. If µ is an irreducible
set of linear transformations on V, then D = {T ∈ A(V) : TM = MT for all
M ∈ µ} is a division ring.

Illustrative Example-4. Let F be the field of real numbers and let

    ⎛ 0  1 ⎞ ∈ F₂.
    ⎝−1  0 ⎠

Then prove that the set µ consisting only of this matrix is an irreducible set. Further, find
the set D of all matrices commuting with it, where D = {T ∈ A(V) : TM = MT for all M ∈ µ}.
Solution. Consider

    ⎛ a  b ⎞ ⎛ 0  1 ⎞   ⎛ 0  1 ⎞ ⎛ a  b ⎞
    ⎝ c  d ⎠ ⎝−1  0 ⎠ = ⎝−1  0 ⎠ ⎝ c  d ⎠.

That is,

    ⎛−b  a ⎞   ⎛ c   d ⎞
    ⎝−d  c ⎠ = ⎝−a  −b ⎠.

Here −b = c and a = d. Therefore, we have

    ⎛ a  b ⎞ ⎛ 0  1 ⎞   ⎛−b   a ⎞          ⎛ 0  1 ⎞ ⎛ a  b ⎞   ⎛−b   a ⎞
    ⎝−b  a ⎠ ⎝−1  0 ⎠ = ⎝−a  −b ⎠   and    ⎝−1  0 ⎠ ⎝−b  a ⎠ = ⎝−a  −b ⎠.

Hence D = { ⎛ a  b ⎞ : a, b ∈ R },
            ⎝−b  a ⎠

that is, D is the set of all matrices which commute with ⎛ 0  1 ⎞.
                                                          ⎝−1  0 ⎠
Define a map φ : D → C by φ ⎛ a  b ⎞ = a + ib.
                            ⎝−b  a ⎠
To prove that φ is a field isomorphism:
First, φ ( ⎛ a  b ⎞ + ⎛ c  d ⎞ ) = φ ⎛  a + c    b + d ⎞
           ⎝−b  a ⎠   ⎝−d  c ⎠       ⎝−(b + d)   a + c ⎠
    = (a + c) + i(b + d)
    = (a + ib) + (c + id)
    = φ ⎛ a  b ⎞ + φ ⎛ c  d ⎞.
        ⎝−b  a ⎠     ⎝−d  c ⎠

Therefore, φ preserves addition.


Now φ ( ⎛ a  b ⎞ ⎛ c  d ⎞ ) = φ ⎛  ac − bd    ad + bc ⎞
        ⎝−b  a ⎠ ⎝−d  c ⎠       ⎝−(ad + bc)   ac − bd ⎠
    = (ac − bd) + i(ad + bc)
    = (a + ib)(c + id)
    = φ ⎛ a  b ⎞ · φ ⎛ c  d ⎞.
        ⎝−b  a ⎠     ⎝−d  c ⎠
Therefore, φ preserves multiplication. Since φ is clearly one-to-one and onto, φ is a field
isomorphism, and hence D ≅ C.

Next, let A = ⎛ 0  1 ⎞. The characteristic equation of A is det(A − λI) = 0.
              ⎝−1  0 ⎠

That is, det ⎛−λ   1 ⎞ = 0
             ⎝−1  −λ ⎠

⇒ λ² + 1 = 0 ⇒ λ = ±√−1 = ±i.
Therefore, the minimal polynomial of A is (x + i)(x − i) = x² + 1, which has no real root.
Now a set µ ⊂ A(V) is an irreducible set if M(W) ⊆ W for every M ∈ µ implies W = {0} or W = V.
So we must show that T(W) ⊆ W for every T ∈ µ implies either W = {0} or W = V.
Let T : R² → R² be the linear transformation whose matrix with respect to the standard
basis is

    ⎛ 0  1 ⎞.
    ⎝−1  0 ⎠

Suppose W is a one-dimensional subspace of V invariant under T. Let {w_1} be a basis of W;
then {w_1} can be extended to a basis {w_1, w_2} of V.
Since T(w_1) ∈ W = ⟨w_1⟩, we have T(w_1) = a_1 w_1 + 0·w_2, and T(w_2) = b_1 w_1 + b_2 w_2.
Therefore, the matrix

    A = ⎛ a_1   0  ⎞
        ⎝ b_1  b_2 ⎠

is a matrix of T with respect to {w_1, w_2}, and the matrix B = ⎛ 0  1 ⎞ is the matrix of T with
                                                                ⎝−1  0 ⎠
respect to the standard basis. Hence A and B are similar.
But B has the complex eigenvalues ±i, while the triangular matrix A has the real eigenvalues
a_1 and b_2; similar matrices have the same eigenvalues. This is a contradiction to the
assumption that W is invariant under T. Thus there is no non-trivial proper subspace of V
invariant under µ, and therefore µ is an irreducible set.

3. 3. Summary

1. The minimal polynomial records the distinct eigenvalues and the size of the largest
Jordan block corresponding to each eigenvalue. While the Jordan normal form
determines the minimal polynomial, the converse is not true. This leads to the
notion of elementary divisors.
2. If the minimal polynomial has all its roots in the ground field and no repeated roots,
we can immediately conclude that the matrix is diagonalizable and therefore the
Jordan canonical form is diagonal.
3. The elementary divisors of a square matrix A are the characteristic polynomials of
its Jordan blocks. The factors of the minimal polynomial m are the elementary
divisors of the largest degree corresponding to distinct eigenvalues. The degree of
an elementary divisor is the size of the corresponding Jordan block, therefore the
dimension of the corresponding invariant subspace. If all elementary divisors are
linear, A is diagonalizable.

3. 4. Keywords

Diagonalizable Irreducible polynomial


Division Algorithm Jordan block
Elementary divisors Minimal polynomial

3. 5. Assessment Questions

1. Let A be an n × n real matrix. Then show the minimal polynomial of A is unique.


Hint. See Theorem 3. 2.1-(i)

2. The linear transformation T is diagonalizable if and only if its minimal


polynomial is a product of distinct linear factors.

Hint. See Theorem 3. 2. 2.

3. If T, S∈A(V) and if S is regular, then show that T and STS–1 have the same
minimal polynomial.
Hint. Use the polynomial expression p(x) and show p(T) = p(STS–1) with the
definition of minimal polynomial conditions.

4. Give an example of two n × n matrices which have the same minimal polynomial,
but are not similar to each other.
Answer. Take the 4 × 4 nilpotent matrices

    A = ⎛ 0 1 0 0 ⎞        B = ⎛ 0 1 0 0 ⎞
        ⎜ 0 0 0 0 ⎟            ⎜ 0 0 0 0 ⎟
        ⎜ 0 0 0 0 ⎟            ⎜ 0 0 0 1 ⎟
        ⎝ 0 0 0 0 ⎠            ⎝ 0 0 0 0 ⎠

Since A² = B² = 0 (and A ≠ 0, B ≠ 0), both have minimal polynomial x². But rank of A = 1 and
rank of B = 2, so A and B are not similar.

3. 6. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010

UNIT- 4: THE RATIONAL CANONICAL FORM

STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. The Rational Canonical Form
4. 2. 1. Basic Facts of Cyclic Module
4. 2. 2. Companion Matrix
4. 3. Summary
4. 4. Keywords
4. 5. Assessment Questions
4. 6. References

UNIT- 4: THE RATIONAL CANONICAL FORM

4. 0. Objectives

After working through this unit, the reader should be able to:


• Discuss the role of minimal polynomial in vector space V as a cyclic module.
• Define the companion matrix
• Explain the rational canonical form
• Explain the elementary divisors and show that the characteristic polynomial is the product of the elementary divisors.
• Find all possible rational canonical forms of given n × n matrices with a minimal
polynomial.

4. 1. Introduction

Generalizing the notions of eigenvalue and eigenspace, this unit deals with a
canonical form of a linear operator suited to this context. The one that we study is
called the rational canonical form.

4. 2. Rational canonical form

4. 2. 1. Basic Facts of Cyclic module.


1. Let R be any ring, a nonempty set M is said to be an R-module (or, a module over
R) if M is an abelian group under an operation + such that for every r∈R and m∈M
there exists an element rm in M subject to
(i) r(a +b) = ra+ rb
(ii) r(sa) = (rs)a
(iii) (r + s)a = ra + sa for all a, b∈M and r, s∈R.
2. An additive subgroup A of the R-module M is called a submodule of M if, whenever
r ∈ R and a ∈ A, then ra ∈ A.
Given an R-module M and a submodule A, we can construct the quotient module
M/A in a manner similar to the way we construct quotient groups, quotient rings
and quotient spaces.
3. An R-module M is said to be cyclic if there is an element m o ∈M such that every
m∈M is of the form m=r m o , where r∈R.
4. Let V i = { v∈V /{ 𝑞𝑖 (𝑇)𝑙𝑖 (𝑣) = 0} for all i = 1, 2,….. , k be the invariant subspaces
of V under T . Then V i ≠ 0 and V = V 1 ⨁V 2 ⨁……….⨁V k, , the minimal polynomial
of T i on V i is 𝑞𝑖 (𝑥)𝑙𝑖 .

To settle the nature of a cyclic submodule for an arbitrary T, we discuss the
following lemma; by the decomposition above, it suffices to settle it for a T whose minimal
polynomial is a power of an irreducible polynomial.

Lemma 4. 2. 1. Suppose that a linear transformation T∈A(V), has as minimal polynomial


over F the polynomial p(x) = γ_0 + γ_1 x + ⋯ + γ_{r−1} x^{r−1} + x^r. Suppose, further, that
V, as a module, is a cyclic module. Then there is a basis of V over F such that, in this basis,
the matrix of T is

    ⎛  0      1      0    ⋯    0       ⎞
    ⎜  0      0      1    ⋯    0       ⎟
    ⎜  ⋮      ⋮      ⋮    ⋱    ⋮       ⎟
    ⎜  0      0      0    ⋯    1       ⎟
    ⎝ −γ_0   −γ_1   −γ_2   ⋯   −γ_{r−1} ⎠

Proof. Since V is cyclic relative to T, there exists a vector v∈V such that every element
w∈V, is of the form w = f(T)(v) for some f(x) ∈ F[x].
Now if for some polynomial h(x) ∈ F[x], h(T)(v) = 0, then for any w∈V,
h(T)(f(T)(v)) = f(T)(h(T)(v)) = f(T)(0) = 0; thus h(T) annihilates all of V and so h(T) = 0.
But then p(x) | h(x), since p(x) is the minimal polynomial of T. This remark
implies that {v, T(v), T²(v), . . . , T^{r−1}(v)} are linearly independent over the field F; for
if not, then a_0 v + a_1 T(v) + ⋯ + a_{r−1} T^{r−1}(v) = 0 with a_0, a_1, . . . , a_{r−1} in F,
not all zero.

But then (a_0 + a_1 T + ⋯ + a_{r−1} T^{r−1})(v) = 0; hence, by the above
discussion, p(x) | (a_0 + a_1 x + ⋯ + a_{r−1} x^{r−1}), which is impossible,
since p(x) is of degree r, unless a_0 = a_1 = ⋯ = a_{r−1} = 0.
Since T^r = −γ_0 − γ_1 T − ⋯ − γ_{r−1} T^{r−1}, we immediately have that T^{r+k},
for k ≥ 0, is a linear combination of 1, T, . . . , T^{r−1}, and so f(T), for any f(x) ∈ F[x], is
a linear combination of 1, T, . . . , T^{r−1} over F.
Since any w∈V is of the form w = f(T)(v) , we get that w is a linear combination of
{v, T(v), . . . .. , Tr – 1 (v)}.
Thus, we have proved, in the above two paragraphs, that the elements
{v, T(v), . . . . . . . . , Tr – 1 (v) }form a basis of V over F.
In this basis, as is immediately verified, the matrix of T is exactly as claimed.

4. 2. 2. Companion matrix

Definition 4. 2. 1. If f(x) = γ_0 + γ_1 x + ⋯ + γ_{r−1} x^{r−1} + x^r is in F[x], then the r × r
matrix

    ⎛  0      1      0    ⋯    0       ⎞
    ⎜  0      0      1    ⋯    0       ⎟
    ⎜  ⋮      ⋮      ⋮    ⋱    ⋮       ⎟
    ⎜  0      0      0    ⋯    1       ⎟
    ⎝ −γ_0   −γ_1   −γ_2   ⋯   −γ_{r−1} ⎠
is called the companion matrix of f(x). We write it as C(f(x)).

Note.
1. If V is cyclic relative to T and if the minimal polynomial of T in F[x] is p(x) then
for some basis of V the matrix of T is C(p(x)).
2. The matrix C(f(x)), for any monic f(x) in F[x], satisfies f(x) and has f(x) as its
minimal polynomial.
3. If V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k, where each subspace V_i is of dimension n_i and
is invariant under T, an element of A(V), then a basis of V can be found so that
the matrix of T in this basis is of the block-diagonal form diag( A_1, A_2, . . . , A_k ),
where each A_i is an n_i × n_i matrix and is the matrix of the linear transformation
induced by T on V_i.
4. If T ∈ A(V) is nilpotent, of index of nilpotence n_1, then a basis of V can be found
such that the matrix of T in this basis has the form diag( M_{n_1}, M_{n_2}, . . . , M_{n_r} ),
where n_1 ≥ n_2 ≥ ⋯ ≥ n_r and n_1 + n_2 + ⋯ + n_r = dim(V).
Here, the integers n_1, n_2, . . . , n_r are called the invariants of T.

Theorem 4. 2. 2. If T ∈ A(V) has minimal polynomial p(x) = q(x)^e, where q(x) is a
monic, irreducible polynomial in F[x], then a basis of V over F can be found in which the
matrix of T is of the form

    diag( C(q(x)^{e_1}), C(q(x)^{e_2}), . . . , C(q(x)^{e_r}) ),
where e = e 1 ≥ e 2 ≥ . . . . . . . ≥ e r .
Proof. Since V, as a module over F[x], is finitely generated, and since F[x] is Euclidean,
we can decompose V as V = V 1 ⊕ V 2 ⊕ . . . . . . . ⊕ V r where the V i are cyclic modules.
The V i are thus invariant under T, if T i is the linear transformation induced by T on V i ,
its minimal polynomial must be a divisor of p(x) = q(x)^e, so it is of the form q(x)^{e_i}.
We can renumber the spaces so that e_1 ≥ e_2 ≥ ⋯ ≥ e_r.
Now q(T)^{e_1} annihilates each V_i, hence annihilates V, whence q(T)^{e_1} = 0.
Thus e_1 ≥ e; since e_1 is clearly at most e, we get that e_1 = e.
By Lemma 4.2.1, since each V_i is cyclic relative to T, we can find a basis of V_i such that the
matrix of the linear transformation induced by T on V_i is C(q(x)^{e_i}).
Putting these bases together, a basis of V can be found so that the matrix of T
in this basis is


    diag( C(q(x)^{e_1}), C(q(x)^{e_2}), . . . , C(q(x)^{e_r}) ).

Corollary. If T ∈ A(V) has minimal polynomial p(x) = q_1(x)^{f_1} q_2(x)^{f_2} ⋯ q_k(x)^{f_k}
over a field F, where q_1(x), q_2(x), . . . , q_k(x) are distinct irreducible
polynomials in F[x], then a basis of V can be found in which the matrix of T is of the
form

    diag( R_1, R_2, . . . , R_k ),

where each R_i = diag( C(q_i(x)^{e_{i1}}), . . . , C(q_i(x)^{e_{ir_i}}) ),
where e i = e i1 ≥ e i2 ≥ . . . . . . . ≥ e iri .
Proof. By note-3, V can be decomposed into the direct sum V = V 1 ⊕ V 2 ⊕ . . . . . ⊕ V k ,
where each V i is invariant under T and where the minimal polynomial of T i , the linear
transformation induced by T on V i , has as minimal polynomial 𝑞𝑖 (𝑥)ei .
Using note-4 and the theorem 4. 2. 2, just proved, we obtain the above result.
If the degree of q i (x) is d i , note that the sum of all the d i e ij is n, the dimension of V over a
field F.

Definition 4. 2. 2. The matrix of T in the statement of the above corollary is called the
rational canonical form of T.

Definition 4. 2. 3. The polynomials

    q_1(x)^{e_{11}}, . . . , q_1(x)^{e_{1r_1}}, q_2(x)^{e_{21}}, . . . , q_k(x)^{e_{kr_k}}   in F[x]

are called the elementary divisors of T. So, if dim(V) = n, then the characteristic
polynomial of T, p_T(x), is the product of its elementary divisors.

Illustrative Example-1. Show that the characteristic polynomial of the companion


matrix C(q(x)) is q(x).

Solution. The companion matrix is

    C(q(x)) = ⎛  0     1     0   ⋯    0       0       ⎞
              ⎜  0     0     1   ⋯    0       0       ⎟
              ⎜  ⋮     ⋮     ⋮   ⋱    ⋮       ⋮       ⎟
              ⎜  0     0     0   ⋯    0       1       ⎟
              ⎝ −a_0  −a_1  −a_2  ⋯  −a_{n−2} −a_{n−1} ⎠,

where q(x) = a_0 + a_1 x + ⋯ + a_{n−1} x^{n−1} + x^n. Then

    det(λI − C(q(x))) = det ⎛  λ   −1    0   ⋯    0        0           ⎞
                            ⎜  0    λ   −1   ⋯    0        0           ⎟
                            ⎜  ⋮    ⋮    ⋮   ⋱    ⋮        ⋮           ⎟
                            ⎜  0    0    0   ⋯    λ       −1           ⎟
                            ⎝ a_0  a_1  a_2   ⋯  a_{n−2}  λ + a_{n−1}  ⎠.

Add to the first column λ times the second column, λ² times the third column, and so on, up to
λ^{n−1} times the last column. The first column then becomes (0, 0, . . . , 0, q(λ))ᵀ, so expanding
along the first column gives

    det(λI − C(q(x))) = (−1)^{n+1} q(λ) · (−1)^{n−1} = q(λ),

since the corresponding minor is triangular with −1's on the diagonal. Hence the characteristic
polynomial of C(q(x)) is q(λ).

Illustrative Example-2. Deduce from the previous problem that the characteristic
polynomial of T is the product of all elementary divisors of T.

Solution. Under the rational canonical form, the matrix of T is diag( R_1, . . . , R_k ), where each

    R_i = diag( C(q_i(x)^{e_{i1}}), . . . , C(q_i(x)^{e_{ir_i}}) ).

Then the characteristic polynomial of T is the product of the characteristic polynomials of the
blocks C(q_i(x)^{e_{ij}}), and by the previous problem the characteristic polynomial of
C(q_i(x)^{e_{ij}}) is q_i(x)^{e_{ij}}. Hence the characteristic polynomial of T is the product of
all the elementary divisors.

Illustrative Example-3. If F is the field of rational numbers, find all possible rational
canonical forms and elementary divisors for the 6 × 6 matrices in F_6 having
(x − 2)(x + 2)³ as minimal polynomial.
Solution. Since the matrix is of size 6 × 6 and the minimal polynomial is (x − 2)(x + 2)³,
there are three cases for the characteristic polynomial.
Case 1. Characteristic polynomial is (x − 2)(x + 2)⁵.
In this case, the rational canonical forms are C(x − 2) ⊕ C((x + 2)³) ⊕ C((x + 2)²)
or C(x − 2) ⊕ C((x + 2)³) ⊕ C(x + 2) ⊕ C(x + 2).
Case 2. Characteristic polynomial is (x − 2)²(x + 2)⁴.
In this case, the rational canonical form is C(x − 2) ⊕ C(x − 2) ⊕ C((x + 2)³) ⊕ C(x + 2).
Case 3. Characteristic polynomial is (x − 2)³(x + 2)³.
In this case, the rational canonical form is C(x − 2) ⊕ C(x − 2) ⊕ C(x − 2) ⊕ C((x + 2)³).
All these can be written in block matrix form using the matrix form of C(q(x)),
the companion matrix of q(x).
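The companion matrices occurring in these rational canonical forms can also be generated
mechanically. The sketch below is an illustration only (it assumes the SymPy library; the helper
name companion and the variable names are ours). It builds C(q(x)) in the form of Definition
4.2.1 and assembles the block form of Case 3.

    from sympy import Poly, symbols, zeros, BlockDiagMatrix

    x = symbols('x')

    def companion(q):
        # Companion matrix C(q(x)) of a monic polynomial q, as in Definition 4.2.1
        coeffs = Poly(q, x).all_coeffs()      # [1, a_{r-1}, ..., a_1, a_0]
        gammas = coeffs[1:][::-1]             # [a_0, a_1, ..., a_{r-1}]
        r = len(gammas)
        C = zeros(r, r)
        for i in range(r - 1):
            C[i, i + 1] = 1                   # 1's on the superdiagonal
        for j in range(r):
            C[r - 1, j] = -gammas[j]          # last row: -a_0, -a_1, ..., -a_{r-1}
        return C

    # Case 3: elementary divisors (x - 2), (x - 2), (x - 2), (x + 2)^3
    blocks = [companion(x - 2), companion(x - 2), companion(x - 2), companion((x + 2)**3)]
    R = BlockDiagMatrix(*blocks).as_explicit()   # the 6 x 6 rational canonical form
    print(R)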

4. 3. Summary

The Jordan canonical form is the one most generally used to prove theorems
about linear transformations and matrices. Unfortunately, it has one distinct, serious
drawback in that it puts requirements on the location of the characteristic roots. Thus we
need some canonical form for elements of A(V) (or of F_n) which presumes nothing about
the location of the characteristic roots of its elements: a canonical form and a set of
invariants created in A(V) itself, using only its elements and operations. Such a canonical
form, obtained in this unit, is the rational canonical form.

4. 4. Key words

Companion matrix Rational canonical basis


Cyclic submodule Rational canonical form
Elementary divisors R-Module
Generalized eigenspace Submodule

4. 5. Assessment Questions

1. Explain the role of R-module in rational canonical form.
Hint. See the section 4. 2. 1.
2. Show that the elements S and T in A(V) are similar in A(V) if and only if they have
the same elementary divisors.
Hint. See the similar linear transformation in unit-1 and definition 4. 2. 3.
1 1 1 1
 
0 0 0 0
3. Find the rational canonical form of A= 
0 0 −1 0
 
0 −1 1 0 

Hint. The characteristic polynomial of A is (x–1) x (x2+x+1), then use the
companion matrix of q(x).
4. If F is the field of rational numbers, find all possible rational canonical forms and
elementary divisors for the 6 × 6 matrices in F_6 having (x − 1)(x² + 1)² as minimal
polynomial.
Hint. Use similar method of illustrative example-3.

4. 6. References

1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,


2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Prentice – Hall of India, 1978, 2nd Ed.,
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.

BLOCK-4

UNIT-13 COMBINATORICS AND DESCRIPTIVE STATISTICS

STRUCTURE

13.0. Objectives
13.1 Introduction
13.2 Combinatorics & permutation
13.3 Frequency Distribution
13.4 Graphical Representation of Data
13.5 Measures of Central Tendency
13.6 Moments, Skewness, Kurtosis
13.7 Summary
13.8 Keywords
13.9 Questions for self-study
13.10 References

13.0 Objectives

After studying this unit you will be able to


 Define combinatorics
 Solving the problems in frequency distribution.
 Analyse the graphical representation of data
 Explain the measure of central tendency
 Describe moments, skewness, kurtosis

13.1 Introduction

Combinatorics is a stream of mathematics that concerns the study of finite


discrete structures. It deals with the study of permutations and
combinations, enumerations of the sets of elements. It characterizes Mathematical
relations and their properties.

Mathematicians use the term “Combinatorics” to refer to a larger subset of
Discrete Mathematics. It is frequently used in Computer Science to derive
formulas and in the estimation and analysis of algorithms. In this
unit, let us discuss what combinatorics is, together with its features, formulas, applications
and examples.

 Features of combinatorics

Some of the important features of the combinatorics are as follows:

 Counting the structures of the provided kind and size.

 To decide when particular criteria can be fulfilled and analyzing elements of the
criteria, such as combinatorial designs.

 To identify “greatest”, “smallest” or “optimal” elements, known as extremal
combinatorics.

Combinatorial structures that rise in an algebraic concept, or applying algebraic


techniques to combinatorial problems, known as algebraic combinatorics.

13.2 Combinatorics & permutation

In English, we make use of the word “combination” without thinking if the order is
important. Let’s take a simple instance.
The fruit salad is a combination of grapes, bananas, and apples. The order of fruits in
the salad does not matter because it is the same fruit salad.
But, let us assume that, the combination of a key is 475. You need to take care of the
order, since the other combinations like 457, 574, or others won’t work. Only the
combination of 4 – 7 – 5 can unlock.
Hence, to be precise;

 When the order does not have much impact, it is said to be a combination.
 When the order does have an impact, it is said to be a permutation.

 Combinatorics Formulas
The mathematical form of Permutation and Combination:
 Permutation Formula:
Permutation: The act of an arranging all the members of a set into some order
or sequence, or rearranging the ordered set, is called the process of
permutation.

Mathematically, the number of k-permutations of n is given by:

    nPk = n! / (n − k)!
 Combination Formula
Combination: Selection of members of the set where the order is disregarded.
k-combination of n is:

C (n , k) = n! / [ (n-k)! k! ]
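As a quick illustration of these two formulas (a sketch only; Python 3.8 and later expose
math.perm and math.comb, and the numbers chosen here are ours), one may check them
numerically:

    import math

    n, k = 9, 3
    # k-permutations of n: n!/(n-k)!
    print(math.perm(n, k), math.factorial(n) // math.factorial(n - k))            # 504 504
    # k-combinations of n: n!/((n-k)! k!)
    print(math.comb(n, k),
          math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))       # 84 84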

 Applications of combinatorics
Combinatorics is applied in most of the areas such as:

 Communication networks, cryptography and network security


 Computational molecular biology
 Computer architecture
 Scientific discovery
 Languages
 Pattern analysis
 Simulation
 Databases and data mining
 Homeland security
 Operations research

Combinatorics is concerned with counting. Permutation and combination form the


principles of counting and they are applied in various situations. Permutations are
understood as arrangements and combinations are understood as selections. The two
basic principles of counting which are useful in determining the number of different
ways of arranging or selecting are:
i. If a certain task can be performed in any of m different ways, another task can
be performed in n different ways, then either of the two tasks can be
performed in (m+n) ways.

ii. If a certain task can be done in m different ways, and having done it in one of
these ways, another task can be done in n different ways, then the two tasks can
together be done in m × n ways.

 Permutations:
A permutation is an arrangement in a definite order of a number of objects taken some
or all at a time.
The number of permutations of n objects taken r at a time is denoted by 𝑛𝑃𝑟 or P(n,r).
To find the number of permutations of n things taken r at a time (which is same as the
number of ways of filling up r blank spaces with n objects) is:
The first space can be filled by any one of the n objects in n ways. After filling the
first space, the second space can be filled by any one of the remaining (n-1) objects in
(n-1) ways. Proceeding like this, the r-th place can be filled by any one of the
remaining n-(r-1) (= n-r+1) ways. Hence,
    nPr = n(n − 1)(n − 2) ⋯ (n − r + 1) = n! / (n − r)!,   0 ≤ r ≤ n.       (1.1)
    nPn = n!.
Suppose that in the arrangement repetition is allowed; then each space can be filled
in n ways and hence r blank spaces can be filled in n·n· ⋯ ·n = n^r ways. Therefore the
number of permutations of n different objects taken r at a time, when repetition is
allowed, is n^r.
The number of permutations of n things taken all at a time, when p of them are of one
kind, q of them are of another kind, r of them of a third kind and so on, and the rest (if
any) are all different, is n! / (p! q! r! ⋯).

Example 1
The number of ways a 3-digit number can be formed by using the digits 1 to 9 if no
digit is repeated is given by 9P3 = 9×8×7 = 504.
Example 2
The number of different signals that can be transmitted by arranging 3 red, 2 yellow
and 2 green flags on a pole, assuming all flags are used to transmit a signal, is 7! / (3! 2! 2!).

Combinations:
A combination is a selection of some or all of number of different objects where the
order of selection is immaterial. The number of ways of selecting r objects out of n
objects is denoted by C(n, r) or nCr and is given by

    C(n, r) = n! / ( r! (n − r)! ),                                        (1.2)

where n and r are positive integers with r ≤ n.
The following results hold true:

    C(n, r) = C(n, n − r),      C(n, r) + C(n, r − 1) = C(n + 1, r),
    n · C(n − 1, r − 1) = (n − r + 1) · C(n, r − 1).
Example 3
A box contains 75 good IC chips and 25 defective chips. 12 chips are selected at
random. The number of ways this can be done is C(100, 12) (the number of samples). The
number of samples in which all 12 chips are good is C(75, 12).

Binomial Theorem:
If n is a positive integer, then
    (a + b)^n = C(n,0) a^n + C(n,1) a^{n−1} b + C(n,2) a^{n−2} b² + ⋯ + C(n,n) b^n     (1.3)

The coefficients C(n,0), C(n,1), …, C(n,n) are called binomial coefficients.
The result is proved by mathematical induction.
Using the binomial theorem, one can easily evaluate powers such as (1.1)^1000 and obtain the
expansion of (1 + x)^n.
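The expansion (1.3) is easy to check numerically; the short sketch below (our own illustration,
assuming Python with the standard math module) compares the term-by-term sum with direct
exponentiation.

    import math

    def binomial_expand(a, b, n):
        # Right-hand side of (1.3): sum of C(n, k) * a^(n-k) * b^k for k = 0, ..., n
        return sum(math.comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

    print(binomial_expand(1, 0.1, 10), (1 + 0.1)**10)   # both approximately 2.5937
    print(binomial_expand(10, 1, 3), 11**3)             # 1331 and 1331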

13.3 Frequency distribution

A characteristic of phenomena which is numerically measurable is quantitative


character (execution time of programs) and the characteristic which is not numerically
measurable is qualitative characteristic (generation number of processor).
A quantitative characteristic which varies from unit to unit is a variable. A variable
may be discrete or continuous. A variable which assumes only some specified values
in a given range is discrete variable (number of cars of different brands sold in a
year). A variable which assumes all possible values in the range is continuous variable
(time taken to complete the task by students). The information collected on
characteristics of phenomena is called data. Data is collected using statistical
techniques.

It is useful in understanding what a dataset reveals about a particular phenomenon.


A presentation of values taken by a variable and the corresponding frequency is called
frequency distribution of that variable.

A tabular presentation of frequency distribution is called frequency table. A frequency
distribution in which class intervals are considered is a continuous frequency
distribution; otherwise it is discrete (ungrouped) frequency distribution. Let us
consider an example, to understand frequency distribution.
Example 4
Marks obtained by 20 students from a class in a test are:
23, 13, 26, 11, 18, 09, 21, 23, 13, 30,
22, 11, 17, 22, 19, 13, 14, 22, 15, 16.
This form of data is known as raw data. To understand and interpret such data, it needs to
be ‘organized’. One way of organization of larger data into a concise form is construction
of frequency distribution table. Note that the term frequency refers to the number of times
an observation occurs or appears in a data set.
Marks obtained Frequency
09 1
11 2
13 3
14 1
15 1
16 1
17 1
18 1
19 1
21 1
22 3
23 2
26 1
30 1

This is an ungrouped frequency distribution table. It takes into account ungrouped data
and calculates the frequency for each data value.
Consider the data in the form of class intervals (CI) to tally the frequency for the data that
belongs to that particular class interval.
CI (Marks obtained) Frequency
5-10 1
10-15 6
15-20 5
20-25 6
25-30 1
30-35 1
This is a grouped frequency table. From this table it is clear that the whole range is
divided into mutually exclusive sub intervals called Class Intervals (CI). Each CI has
lower class limit ( 5, 10, 15, 20, 25 ) and upper class limit ( 10, 15, 20, 25, 30 ). The
difference between the class limits is called width of the CI. The number of
observations in any class is the class frequency. For a CI, let f be the class frequency,
w be the width of CI, N be the total frequency.
d = f/w is frequency density, p = f/N is relative frequency. In the above example, N is
20. For CI 20 – 25, f is 6 and w is 5. Hence, d = 6/5 and p = 6/20.
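The tables of Example 4 can be reproduced with a few lines of code. The sketch below is an
illustration only (the bin edges follow the exclusive class intervals 5-10, 10-15, … used above,
and the variable names are ours); it uses Python's collections.Counter.

    from collections import Counter

    marks = [23, 13, 26, 11, 18, 9, 21, 23, 13, 30,
             22, 11, 17, 22, 19, 13, 14, 22, 15, 16]

    # Ungrouped (discrete) frequency distribution
    ungrouped = Counter(marks)
    for value in sorted(ungrouped):
        print(value, ungrouped[value])

    # Grouped frequency distribution with exclusive class intervals [5,10), [10,15), ...
    edges = [5, 10, 15, 20, 25, 30, 35]
    grouped = Counter()
    for m in marks:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= m < hi:
                grouped[(lo, hi)] += 1
                break

    N = len(marks)
    for (lo, hi), f in sorted(grouped.items()):
        w = hi - lo
        print(f"{lo}-{hi}: f={f}, density={f/w:.2f}, relative={f/N:.2f}")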
Inclusive and Exclusive CIs:
If a CI is such that the values of data equal to lower and upper class limits are
included in the same CI, it is inclusive CI. For example, if the class intervals of above
example are to be inclusive type then the CI would be 5 – 9, 10 – 14, 15 – 19,.... If a
CI is such that the value of the variable equal to lower class limit is included in that
class but the variable having a value equal to upper class limit is included in
succeeding CI, it is exclusive CI.

The above frequency table represents exclusive type CIs. Here the data value 15 is
counted in the CI 15-20 but not in 10-15, and similarly the data value 30 is included
in the CI 30-35 but not in 25-30.

13.4 Graphical representation of data


Graphical representation of data is a vital component in analyzing and understanding
large quantities of numerical data and the relationship between data
points. Histograms and frequency polygons are used to inspect data and its
distributional properties.
 Histograms:
Histograms graphically summarize the distribution of a univariate data set.
Histograms can be used to answer the following questions:
What kind of population distribution do the data come from?
Where are the data located?
How spread out are the data?
Are the data symmetric or skewed?
Are there outliers in the data?
In histogram, the frequency distribution of data is represented by rectangular bars
with area of rectangles proportional to class frequency. The variable (CI) is taken
along X-axis, class frequency is taken along Y-axis. With CI as base, rectangles with
height proportional class frequency are drawn. In the frequency distribution, if CIs are
of unequal width( for example, 5-10, 10-20, 20-40, 40-55, 55-60 ), the rectangles are
drawn with height proportional to frequency density (f/w), so that area is proportional
to class frequency.

If the CIs are of inclusive type, they need to be converted to exclusive type by
subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit
(Example 13). From the histogram, the mode of the distribution can be obtained. Histograms
show which values are more and less common, along with their dispersion.
Example 5
Figure 1 is an example of a histogram that shows the distribution of salary, of the
employees of a corporation.
Salary (in 000s) Number of employees
140-145 4
145-150 10
150-155 18
155-160 20
160-165 15
165-170 8
170-175 5
Figure 1:

Example 6
The distribution of Weekly wages of workers in a factory are as below.
Weekly wages (in Rs.) Number of workers Frequency Density
200-400 40 0.2
400-450 85 1.7
450-500 160 3.2
500-600 280 2.8
600-700 110 1.1
700-800 60 0.6
800-900 10 0.1

Figure 2 is histogram for the same. As The CIs are of unequal width, heights of the
rectangles are proportional to frequency density (f/w).
Figure 2:

To find the mode of the distribution, mark the points A, B, C, D on the highest
rectangle (as shown in Figure 2), join AC and BD. Let these intersect at O. From O
draw perpendicular to X-axis. Let it meet X-axis at P. The value of P is mode. In this
example, mode = 540.
Frequency Polygon:
Variable is taken along X-axis. CIs may be of equal width, unequal width, inclusive
type or exclusive type.
Class mid values are obtained. Class frequencies are plotted against class-mid values.
These points are joined by straight lines. Joining the mid-points of the upper sides of
the rectangles of histogram also, frequency polygon is obtained.

13.5 Measures of Central Tendency


Generally, in data or in a frequency distribution, the observations or data items cluster
around a central value. This property of concentration of the data items around a
central value is called central tendency (measure of location or average). Thus a
measure of central tendency is a single number that represents the most common
value for a list of numbers.
Five measures of central tendency are defined and they are: mean, median, mode,
harmonic mean, geometric mean. Depending on the need and nature of study,
appropriate measure is chosen. Among these five measures of central tendency, first
three are used often. A measure of central tendency is said to be good if it possesses
following property:
Easy to understand
Should be based on all values.

189
Should not be affected by abnormal/extreme values.
Should be capable of further mathematical treatment so that it could be used in further
analysis of data.
Should be a stable measure.
Arithmetic Mean (A. M.):
It is obtained by dividing the sum of the values by the number of values in the set. The
A. M. of x_1, x_2, …, x_n is

    x̄ = (x_1 + x_2 + ⋯ + x_n)/n = (1/n) Σ_{i=1}^{n} x_i                       (1.4)

For a discrete data set,

    X : x_1  x_2  …  x_n
    f : f_1  f_2  …  f_n

    x̄ = (x_1 f_1 + x_2 f_2 + ⋯ + x_n f_n)/(f_1 + f_2 + ⋯ + f_n)
      = (Σ_{i=1}^{n} x_i f_i)/(Σ_{i=1}^{n} f_i)                               (1.5)

And for grouped data, with class intervals l_1–u_1, l_2–u_2, …, l_n–u_n, corresponding
frequencies f_1, f_2, …, f_n and class mid values x_1, x_2, …, x_n,

    x̄ = (Σ_{i=1}^{n} x_i f_i)/(Σ_{i=1}^{n} f_i).                              (1.6)

Change of origin and scale:

Let a and h be constants. If x_1, x_2, …, x_n are the mid values of the CIs, let
u_1 = (x_1 − a)/h, u_2 = (x_2 − a)/h, …, u_n = (x_n − a)/h. The arithmetic mean is then given by

    x̄ = a + h · (Σ_{i=1}^{n} u_i f_i)/(Σ_{i=1}^{n} f_i)                       (1.7)

Example 7
The pull off force collected from 10 prototype engine connectors are 11.5, 12.3, 10.2,
12.6, 13.4, 11.2, 12.1, 11.8, 10.7, 11.6. The mean pull off force is given by
    x̄ = (11.5 + 12.3 + ⋯ + 11.6)/10 = 11.74.

Example 8
The number of IC chips with different numbers of defects are as below.
Number of defects (X) Number of chips(f)
1 25
2 32
3 12
4 48
5 33
6 42
7 29
8 17
9 9
10 6
The mean number of defects per chip is given by

    x̄ = (1×25 + 2×32 + ⋯ + 10×6)/(25 + 32 + ⋯ + 6) = 4.79 (approximately).

Example 9
Daily expenditure on commutation by 120 students is given below.
Expenditure Number of Mid-values fx
In Rs. Students(f) x
10-20 21 15 315
20-30 40 25 1000
30-40 33 35 1155
40-50 14 45 630
50-60 7 55 385
60-70 3 65 195
70-80 2 75 150
The mean expenditure is

    x̄ = (315 + 1000 + ⋯ + 150)/120 = 31.91 Rs.
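The same grouped-mean computation can be written as a short script (a sketch only, using the
frequencies and mid values of Example 9; the variable names are ours):

    freqs      = [21, 40, 33, 14, 7, 3, 2]
    mid_values = [15, 25, 35, 45, 55, 65, 75]

    N = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mid_values)) / N
    print(round(mean, 2))   # about 31.92 (the text rounds this to 31.91)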

Note: The distribution of mean can be easily obtained. It can be calculated even when
some of the values are zero or negative. It is affected by abnormal values.
Median:
Median of a set of values is the middle most value when they are arranged in
ascending (or descending) order of magnitude.
For ungrouped Data, arrange the data in ascending or descending order.
Let the total number of values be n.
Median = (n + 1)/2th observation, if n is odd
Median = [(n/2)th value+ ((n/2) + 1)th value]/2, if n is even.
Example 10
For the data: 50, 67, 24, 34, 78, 43, 55, the median is obtained as the ((n + 1)/2)-th = 4th value of

24, 34, 43, 50, 55, 67, 78, which is 50.

Example 11
For the data 46, 83, 55, 26, 65, 56, 65, 37, 22, 73, median is [(n/2)th value+ ((n/2)
+ 1)th value]/2 of 22, 26, 37, 46, 55, 56, 65, 65, 73, 83 is [55+56]/2=55.5.
When the data is continuous and in the form of a frequency distribution, the median is
found as shown below:
    Median = l + ( (N/2 − c) / f ) · h                                        (1.8)

where
N = total number of values, i.e. Σf_i;
the median class is the class in which (N/2) lies;
l = lower limit of the median class;
c = cumulative frequency of the class preceding the median class;
f = frequency of the median class;
h = width of the median class.

Example 12
Following are the marks scored by 50 students; the median mark is found as follows.

    Classes   : 0-10  10-20  20-30  30-40  40-50
    Frequency :  2     12     22     8      6

    Classes    Number of students    Cumulative frequency
    0-10              2                      2
    10-20            12                 2 + 12 = 14
    20-30            22                14 + 22 = 36
    30-40             8                36 +  8 = 44
    40-50             6                44 +  6 = 50

N = 50
N/2 = 50/2 = 25
Median Class = (20 - 30)
𝑙= 20, f = 22, c = 14, h = 10
Using Median formula (1.8)
Median= 20 + (25 - 14)/22 × 10
= 20 + (11/22) × 10
= 20 + 5 = 25
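Formula (1.8) is easy to code; the sketch below (our own helper, written for equal-width
classes such as those of Example 12) reproduces the median of 25.

    def grouped_median(lower_limits, freqs, width):
        # Formula (1.8): l + ((N/2 - c)/f) * h, applied to the first class whose
        # cumulative frequency reaches N/2
        N = sum(freqs)
        cum = 0
        for l, f in zip(lower_limits, freqs):
            if cum + f >= N / 2:
                return l + (N / 2 - cum) / f * width
            cum += f

    print(grouped_median([0, 10, 20, 30, 40], [2, 12, 22, 8, 6], 10))   # 25.0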

Example 13
Following are the heights of students registered for Computer Science core
Course.
Classes: 130-139 140-149 150-154 155-159 160-164 165-175
Frequency:2 3 11 20 21 18

First convert the CI to exclusive type, find cumulative frequencies.
Classes Frequency Cum. Frequency
129.5-139.5 2 2
139.5-149.5 3 2+3=5
149.5-154.5 11 5+11=16
154.5-159.5 20 16+20=36
159.5-164.5 21 36+21=57
164.5-175.5 18 57+18=75
N = 75
N/2 = 75/2 = 37.5
Median Class = 159.5-164.5
l = 159.5, f = 21, c = 36, h = 164.5 − 159.5 = 5
Using the Median formula (1.8),
Median = 159.5 + (37.5 − 36)/21 × 5
       ≈ 159.86
Note: The median can be computed even when some extreme values are missing. It is not
affected by abnormal values/outliers. It can be used for qualitative data also. However, its
computation is not based on all the values.
Mode:
The value which appears most often in the given data i.e. the observation with the
highest frequency is called a mode of data. For ungrouped data, we need to identify the
observation which occurs maximum times. For example in the data: 7, 8, 9, 3, 4, 6, 7,
6, 8, 3, 12, 8 the value 8 appears the

most number of times. Thus, mode = 8. A data may have no mode, 1 mode, or more
than 1 mode. Depending upon the number of modes the data has, it can be called
unimodal, bimodal, trimodal, or multimodal. For example in the data: 7, 8, 9, 3, 4, 6, 7,
6, 8, 3, 12, 8, 7, 13, the values 7 and 8 appear the most number of times. Thus, 7 and
8 are modes, hence the data is bimodal. When the data is continuous and in the form of
a frequency distribution, the mode is found as shown below:
    Mode = l + ( (f_m − f_0) / (2f_m − f_0 − f_1) ) · h                       (1.9)

modal class is the class with maximum frequency.


𝑙 = lower limit of modal class,
𝑓𝑚 = frequency of modal class,
𝑓0 = frequency of class preceding modal class,
𝑓1 = frequency of class succeeding modal class,
h = class width
Example 14:
The lives of batteries, in completed hours, are 178, 190, 132, 215, 190, 156, 210,
220, 190, 215, 170, 155, 162. The mode of this data is 190.
Example 15:
Percentage Marks scored by students in summative exams of core course
are distributed as below.
CI: 30-39 40-49 50-59 60-69 70-79 80-89 90-100
Freq. : 8 19 29 36 25 13 14
First convert inclusive CI to exclusive CI.
CI Frequency
29.5-39.5 8
39.5-49.5 19

49.5-59.5 29
59.5-69.5 36
69.5-79.5 25
79.5-89.5 13
89.5-100.5 14
Modal Class is 59.5-69.5.
𝑙 = 59.5, 𝑓𝑚 = 36 , 𝑓0 = 29 , 𝑓1 = 25 , h = 10.
Using (1.9), Mode = 59.5 + (7/18) × 10 ≈ 63.39.
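Formula (1.9) in code (a sketch; the helper name and argument order are ours) confirms the
value for Example 15:

    def grouped_mode(l, f_m, f_0, f_1, h):
        # Formula (1.9): l + ((f_m - f_0)/(2 f_m - f_0 - f_1)) * h
        return l + (f_m - f_0) / (2 * f_m - f_0 - f_1) * h

    print(round(grouped_mode(59.5, 36, 29, 25, 10), 2))   # about 63.39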
Note: Mode can be calculated even when some extreme values are
missing. Not affected by extreme values/outliers. If a data has more
than one mode, it cannot be used for further statistical analysis.
 Measures of Dispersion
A measure of dispersion reflects how closely the data clusters around the measure of
central tendency. It helps to judge the reliability of measure of Central tendency, to
obtain correct picture of distribution or dispersion of values in the data, to make a
comparative study of variability of two or more data sets or samples.
The two types of measures of dispersion are - 1. Absolute measures of dispersion 2.
Relative measures of dispersion
Absolute measures of dispersion are:
Range, Quartile Deviation, Mean Deviation, Standard Deviation.
These are expressed in the same unit in which data values or observations are given.
Relative measures of dispersion are:
Co-efficient (Co-eff.) of Range, Co-eff. of Quartile Deviation, Co-eff. of Mean
Deviation, Co-eff. of Variation.

These are expressed as ratios or percentages based on the absolute measures of dispersion.
Therefore, relative measures of dispersion are also called coefficients of dispersion. They are
free from units of measurement. Relative
measures are used for comparing variability in two or more distributions having
different units of measurements.
Mean Deviation and Standard Deviation are popular measures of dispersion.
Mean Deviation:
Mean deviation is the average deviation from the mean (or median) value of the given data
set. For data values x_1, x_2, …, x_n, the mean deviation (MD) is given by

    MD = (Σ_{i=1}^{n} |x_i − m|)/n,                                           (1.10)

where m is the mean or median of the data.

For a discrete frequency distribution

    X : x_1  x_2  …  x_n
    F : f_1  f_2  …  f_n

MD is given by

    MD = (Σ_{i=1}^{n} |x_i − m| f_i)/(Σ_{i=1}^{n} f_i),                       (1.11)

where m is the mean or median of the data set.


For a continuous frequency distribution, MD is given by

    MD = (Σ_{i=1}^{n} |x_i − m| f_i)/(Σ_{i=1}^{n} f_i),                       (1.12)

where the x_i are the mid values of the CIs and m is the mean or median of the data set.
Example 16
The number of syntax errors committed on compilation of a C-program coded by 7
students are, 4,3,5,8,1,11, 6.

To find the mean deviation from the mean, the following computations are used.
x̄ = (4 + 3 + 5 + 8 + 1 + 11 + 6)/7 = 38/7 = 5.43.
|4 − 5.43| = 1.43, |3 − 5.43| = 2.43, |5 − 5.43| = 0.43,
|8 − 5.43| = 2.57, |1 − 5.43| = 4.43, |11 − 5.43| = 5.57,
|6 − 5.43| = 0.57,
MD_x̄ = (1.43 + 2.43 + 0.43 + 2.57 + 4.43 + 5.57 + 0.57)/7 = 2.49.
To find mean deviation from median, following computations are used.
1, 3, 4, 5, 6, 8, 11 (ascending order arrangement)
Median is the value of the ((7 + 1)/2)-th item = 5.

|4 − 5| = 1, |3 − 5| = 2, |5 − 5| = 0, |8 − 5| = 3, |1 − 5| = 4, |11 − 5| = 6,
|6 − 5| = 1.
𝑀𝐷𝑚𝑒𝑑 = (1+2+0+3+4+6+1)/7 = 2.4285
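Both mean deviations of Example 16 can be recomputed with a few lines of Python (a sketch
only; the list of error counts is the one given above):

    errors = [4, 3, 5, 8, 1, 11, 6]

    def mean_deviation(data, m):
        # Formula (1.10): average absolute deviation from m
        return sum(abs(x - m) for x in data) / len(data)

    mean = sum(errors) / len(errors)
    median = sorted(errors)[len(errors) // 2]        # middle value of an odd-sized list

    print(round(mean_deviation(errors, mean), 4))    # about 2.4898
    print(round(mean_deviation(errors, median), 4))  # about 2.4286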
Example 17
On inspection of 25 systems, the number of defects with respective frequencies is
found to be

No. of defects 6 5 9 12

Frequency 5 7 4 9

The mean deviation from the mean is obtained as below.

    x_i    f_i    x_i f_i    |x_i − x̄|    f_i |x_i − x̄|
     6      5       30         2.36          11.80
     5      7       35         3.36          23.52
     9      4       36         0.64           2.56
    12      9      108         3.64          32.76

x̄ = 8.36,  MD_x̄ = Σ f_i |x_i − x̄| / 25 = 2.83 (approximately).

Example 18
To find the mean deviation for the following data set,
CI: 0-2 2-4 4-6 6-8 8-10
Freq.: 4 3 5 7 6
The computations are:

    Mid values (x_i)   Freq. (f_i)   x_i f_i   |x_i − x̄|   f_i |x_i − x̄|
          1                4            4        4.64         18.56
          3                3            9        2.64          7.92
          5                5           25        0.64          3.20
          7                7           49        1.36          9.52
          9                6           54        3.36         20.16

x̄ = 5.64,  MD_x̄ = Σ f_i |x_i − x̄| / 25 = 2.3744.

Standard Deviation:
A standard deviation (SD) is a statistic that measures the dispersion of a data set
relative to its mean.
It is the positive square root of the mean of the squared deviations of the values from the
arithmetic mean x̄. It is denoted by σ and is calculated as

    σ = √( Σ(x_i − x̄)² / n ) = √( Σx_i²/n − (Σx_i/n)² )                       (1.13)

For a discrete frequency distribution

    X : x_1  x_2  …  x_n
    F : f_1  f_2  …  f_n

the SD is given by

    σ = √( Σ f_i x_i²/N − (Σ f_i x_i / N)² ),   N = Σ f_i                     (1.14)

For a continuous frequency distribution, the SD is the same as in (1.14), with the x_i
being the class mid values.
Let a and h be constants. If x_1, x_2, …, x_n are the mid values of the CIs, let
u_1 = (x_1 − a)/h, u_2 = (x_2 − a)/h, …, u_n = (x_n − a)/h. If σ is the SD of the X values,
then the SD of the U values is given by σ/h.
The square of the SD is called the variance.
Example 19
Heights (in cm) of 6 children are 132, 137, 136, 142, 135, and 140. The SD of these
values is computed as below:
x x - 𝑥̅ (𝑥 − 𝑥̅ )2
132 -5 25
137 0 0
136 -1 1
142 5 25
135 -2 4
140 3 9
N = 6, 𝑥̅ = 137, ∑(𝑥 − 𝑥̅ )2 = 64, using (1.13), SD = 3.265
Example 20
Daily sales of computer systems of a brand are given below.
Sales: 12 13 14 15 16 17 18 19
Days: 1 0 4 12 20 15 6 2

SD of sales is obtained as below:

    x     f     fx     x²     x² f
12 1 12 144 144
13 0 0 169 0
14 4 56 196 784
15 12 180 225 2700
16 20 320 256 5120
17 15 255 289 4335
18 6 108 324 1944
19 2 38 361 722
N = 60, Σ f_i x_i = 969, Σ f_i x_i² = 15749; using (1.14), SD = 1.289.
Example 21
The diastolic blood pressures of 80 individuals are given below.
CI: 78-80 80-82 82-84 84-86 86-88 88-90
Freq: 3 15 26 23 9 4
SD for this data is computed as:
    CI       f     X     u = (X − 83)/2     fu     u²     fu²

78-80 3 79 -2 -6 4 12
80-82 15 81 -1 -15 1 15
82-84 26 83 0 0 0 0
84-86 23 85 1 23 1 23
86-88 9 87 2 18 4 36
88-90 4 89 3 12 9 36
Σfu = 32, Σfu² = 122, h = 2 (class width), and using

    SD = √( Σ f_i u_i²/N − (Σ f_i u_i / N)² ) · h,

we have SD = 2.336.
Note: SD is independent of origin of measurement, but not scale. For any data set,
SD≥ 0.
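The coded (change of origin and scale) calculation of Example 21 can be checked with the short
sketch below (our own variable names; the assumed origin a = 83 and class width h = 2 are the
values used in the table above):

    from math import sqrt

    freqs = [3, 15, 26, 23, 9, 4]
    mids  = [79, 81, 83, 85, 87, 89]
    a, h  = 83, 2

    u  = [(x - a) / h for x in mids]
    N  = sum(freqs)
    s1 = sum(f * ui for f, ui in zip(freqs, u))
    s2 = sum(f * ui**2 for f, ui in zip(freqs, u))

    sd = sqrt(s2 / N - (s1 / N) ** 2) * h
    print(round(sd, 3))   # about 2.336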

13.6 Moments, Skewness, Kurtosis

Moments about the Mean:

The r-th moment of X about the mean x̄, denoted by μ_r, is defined as

    μ_r = (1/N) Σ_{i=1}^{n} f_i (x_i − x̄)^r,   r = 0, 1, 2, 3, …              (1.15)

μ_0 = 1,  μ_1 = 0,  μ_2 = σ_x²  (σ_x² = variance of X),

    μ_3 = (1/N) Σ_{i=1}^{n} f_i (x_i − x̄)³,    μ_4 = (1/N) Σ_{i=1}^{n} f_i (x_i − x̄)⁴.

 Skewness:
Measures of central tendency and dispersion are not sufficient to describe the nature
of distribution. The concepts of Skewness and Kurtosis are used to study the spread
and concentration of data values around central value respectively.
Skewness means lack of symmetry. A distribution is said to be symmetrical when the
data values are uniformly distributed around mean. In symmetric distribution mean,
median and mode are equal. The Co-efficient of skewness is given by
    S_k = 3(mean − median) / SD                                               (1.16)

Another formula for skewness is

    S_k = μ_3² / μ_2³                                                         (1.17)

For a symmetric distribution, S_k = 0. Depending on whether the coefficient of skewness is
positive or negative, the distribution is positively skewed or negatively skewed. Figure 3 shows
distributions which are symmetric, positively skewed and negatively skewed.

Figure 3

 Kurtosis:
A frequency distribution may have high concentration of values at the centre
compared to extreme values. Kurtosis indicates degree of peakedness of the
distribution. It is denoted as 𝛽2 and is defined as
    β_2 = μ_4 / μ_2²                                                          (1.18)

If 𝛽2 = 3, the distribution is mesokurtic, if 𝛽2 >3 , the distribution is leptokurtic, and


for 𝛽2< 3, the distribution is platykurtic. Figure 4 shows 3 types of kurtosis.
Figure 4
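For a concrete feel for (1.15)-(1.18), the sketch below (our own illustration, reusing the
grouped data of Example 21) computes the central moments and, from them, the skewness and
kurtosis measures.

    freqs = [3, 15, 26, 23, 9, 4]
    mids  = [79, 81, 83, 85, 87, 89]

    N    = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / N

    def moment(r):
        # r-th moment about the mean, formula (1.15)
        return sum(f * (x - mean) ** r for f, x in zip(freqs, mids)) / N

    mu2, mu3, mu4 = moment(2), moment(3), moment(4)
    skewness = mu3 ** 2 / mu2 ** 3    # formula (1.17); the direction of skew follows the sign of mu3
    kurtosis = mu4 / mu2 ** 2         # formula (1.18)
    print(round(mu2, 3), round(skewness, 4), round(kurtosis, 3))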

13.7 Summary

The following are the important points contained in this unit.


1. The number of ways a 3-digit number can be formed by using the digits 1 to 9,
if no digit is repeated, is given by 9P3 = 9×8×7 = 504.
2. The number of different signals that can be transmitted by arranging 3 red, 2
yellow and 2 green flags on a pole, assuming all flags are used to transmit a
signal, is 7! / (3! 2! 2!).
3. A box contains 75 good IC chips and 25 defective chips. 12 chips are selected
at random. The number of ways this can be done is C(100, 12) (the number of
samples). The number of samples in which all 12 chips are good is C(75, 12).
4. The pull-off forces collected from 10 prototype engine connectors are 11.5,
12.3, 10.2, 12.6, 13.4, 11.2, 12.1, 11.8, 10.7, 11.6. The mean pull-off force is
given by x̄ = (11.5 + 12.3 + ⋯ + 11.6)/10 = 11.74.
5. For the data: 50, 67, 24, 34, 78, 43, 55, the median is obtained as the
((n + 1)/2)-th = 4th value of 24, 34, 43, 50, 55, 67, 78, which is 50.

13.8 Keywords

Combinatorics
Permutation
Histograms
Skewness
Kurtosis

13.9 Questions for Self Study

1. What is Combinatorics?
2. Explain Combinatorics & permutation.
3. A bag has 9 tickets marked with number 1, 2, ..., 9.Two tickets are
drawn at random from the bag. Find the number of ways in which
both numbers drawn are i. Even ii. Odd iii. one card is even and
another card is odd.
4. Number of defective chips in 50 lots are observed as below.

22, 17, 7, 11, 24, 12, 8, 21, 14, 4, 3, 16, 24, 23, 15, 19

4, 7, 6, 13, 29, 21, 30, 16, 19, 6, 2, 14, 12, 7, 10, 8, 20

6, 7, 15, 4, 3, 9, 10, 8, 16, 18, 25, 10, 18, 6, 19, 21, 18

Prepare frequency distribution table by i. Inclusive method

ii. Exclusive method.

5. Find mode of following data using histogram.


Class Interval Frequency
0-10 6
10-20 13
20-30 19
30-40 26
40-50 38
50-60 17
60-70 6

6. Explain various measures of central tendency. State their merits
and limitations.

7. Name different Relative Measures of Dispersion. Write


formulas of standard deviations in different scenarios.
Find co-efficient of variation for following data on percentage of
marks scored by Class VIII students of a school.

Marks Scored Number of Students


30-40 6
40-50 12
50-60 29
60-70 42
70-80 34
80-90 20
90-100 5

8. Find mean, median and mode for of the following distribution.

Class Interval Frequency


0-9 5
10-19 12
20-29 21
30-39 23
40-49 18
50-59 8
60-69 7
70-79 2

13.10References

1. Chandrashekhar, K S (2009). Engineering Mathematics IV. Sudha Publications,


Banglore.
2. Gupta, S C (2021). Fundamentals of Statistics. (Seventh Edition ) Himalaya
Publishing House.
3. Montgomery D C and Runger G C (2014). Applied Statistics and Probability for
Engineers. (Sixth Edition). John Wiley and Sons, Singapore
4. Trivedi, K S (1997). Probability and Statistics with Reliability, Queuing, and
Computer Science Applications. Prentice Hall of India , New Delhi.

UNIT-14 Elementary Probability Theory

STRUCTURE

14.0 Objectives

14.1 Introduction

14.2 Concepts of Probability

14.3 Random Variables and Probability Distributions

14.4 Theoretical Distributions

14.5 Summary

14.6 Keywords

14.7 Question for self study

14.8 References

14.0 Objectives

After studying this unit you will be able to

 Define the phenomenon of probability and its basic concepts,
 Explain the standard theoretical distributions,
 Describe random variables and their probability distributions.

14.1 Introduction

Probability theory is concerned with the study of random or chance phenomena. In


computer science, the role of probability theory is to analyse the behaviour of a
system or algorithm. Probability Theory is rooted in the real-life situation where a
person performs an experiment, the outcome of which may not be certain. Such an
experiment is called random experiment. For example, an experiment may consist of
simple process of noting whether a component is functioning or has failed, it may
consist of determining the execution time of a program, or may consist of determining
the response time of a terminal request. The result of such observations, may be
simple ‘yes’ or ‘no’, meter reading, etc. are called outcomes of the experiment.

Sample Space:

The totality of possible outcomes of a random experiment is called sample space of


the experiment and is denoted by S. For example, if the status of two components is observed, there are only three possible outcomes: both functioning, both failed, or one functioning and the other failed. The sample space could be S1 = {0, 1, 2} (each number indicating how many components are functioning), or it could be S2 = {(0,0), (0,1), (1,0), (1,1)}, where 0 stands for failed and 1 stands for functioning. The elements of the sample space are thought of as the outcomes of the experiment. A sample space is further classified as a finite sample space (the number of elements of the sample space is finite) or countably infinite (the elements of the sample space are in one-to-one correspondence with the natural numbers).

Event:

An event is simply a collection of certain sample points, that is, a subset of the sample space. A single performance of the experiment is known as a trial. Let E be an event defined on a sample space S, that is, E is a subset of S. Let the outcome of a specific trial be denoted by s, an element of S. If s is an element of E, then we say that the event E has occurred. The entire sample space is an event called the universal event; the null set is used to denote the impossible event or null event.

Exhaustive Events:

An event consisting of all the various possibilities is called an exhaustive event.

Mutually exclusive Events:

Two or more events are said to be mutually exclusive if they cannot happen
simultaneously in one trial.

Independent Events:

Two or more events are said to be independent if the happening or non-happening of one event does not affect the happening or non-happening of the others.

14.2 Concepts of Probability

Mathematical or Classical Definition of Probability:

If the outcomes of a trial consist of n exhaustive, mutually exclusive and equally likely cases, of which m are favourable to an event E, then the probability of the event E, denoted P(E), is given by m/n. This probability can be at most 1. The remaining (n − m) cases are not favourable to E, so the probability of non-happening of E (the complementary event E′) is given by (n − m)/n. Hence P(E) + P(E′) = 1. If P(E) = 1, then E is a sure event; if P(E) = 0, E is a null event.

Axiomatic Definition of Probability:

If S is a sample space and E is the set of all events, then to each event A in E a unique real number P(A), known as the probability of the event A, is assigned, provided the following axioms are satisfied.

i. P(S) = 1
ii. For every event A in E, 0≤ 𝑃(𝐴) ≤ 1
iii. For any countable sequence of events 𝐴1 , 𝐴2 , … , 𝐴𝑛 , that are mutually
exclusive, 𝑃(⋃𝑛𝑖=1 𝐴𝑖 ) = ∑𝑛𝑖=1 𝑃(𝐴𝑖 ).

Addition Rule:

If A and B are any two events of S,

P(A∪ B) = P(A) + P(B) –P(A∩B). (2.1)

Conditional Probability:

Let A and B be two events. The probability of happening of event B when event A has already occurred is called the conditional probability of B given A and is denoted by P(B|A).

P(B|A) = P(both B and A) / P(A) = P(A ∩ B) / P(A).     (2.2)

Multiplication Rule:

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵|𝐴), P(A) > 0. (2.3)

If A and B are two independent events, then P(B|A) = P(B). Hence (2.3) can be
written as

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵) (2.4)

Probability Space:

A collection of subsets of S that is closed under countable unions and


complementation is called 𝜎 - field of subsets of S. Probability space is defined as a
triple (S, ℱ , P), where S is a set, ℱ is a 𝜎 - field of subsets of S which includes S, and
P is a probability measure on ℱ, assumed to satisfy (i) to (iii) axioms of probability.

Example 1

A group of four integrated circuit (IC) chips consists of two good chips g1 and g2 and two defective chips d1 and d2. If three chips are selected at random from this group, what is the probability of the event E that two of the three selected chips are defective?

Solution:

Writing all possibilities of selecting three chips out of four chips, the sample space S
is

S = { g1g2d1, g1g2d2, g1d1d2, g2d1d2}.

The two sample points favourable to E are g1d1d2and g2d1d2.

Hence, P(E) = 2/4 = 0.5.
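The same answer can be obtained by brute-force enumeration (an illustrative Python sketch, not part of the original solution; the chip labels are the ones used above):

from itertools import combinations

chips = ["g1", "g2", "d1", "d2"]
samples = list(combinations(chips, 3))                    # all ways of picking 3 of the 4 chips
favourable = [s for s in samples
              if sum(c.startswith("d") for c in s) == 2]  # exactly two defective chips
print(len(favourable) / len(samples))                     # 2/4 = 0.5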

Example 2

What is the probability that some randomly chosen k-digit decimal number is a valid
k-digit octal number.

Solution

The sample space is S = {(x1, x2, ..., xk) | xi ∈ {0, 1, ..., 9}}.

The event of interest is E = {(x1, x2, ..., xk) | xi ∈ {0, 1, ..., 7}}.

As the number is a k-digit number, the total number of possible cases is 10^k.

For the event E (the number is a valid k-digit octal number), the number of favourable cases is 8^k.

P(E) = 8^k / 10^k = 4^k / 5^k.

Example 3

The probability that 3 students A, B, C solve a problem are 1/2, 1/3, 1/4 respectively.
If the problem is simultaneously assigned to all of them, what is the probability that
the problem is solved?

Solution:

Let E be the event of solving the problem, 𝐸̅ is the event of not solving the problem.

Given that P(A) = 1/2, P(B) = 1/3, P(C) = 1/4 and hence

P(𝐴̅ ) = 1/2, P(𝐵̅ ) = 2/3, P(𝐶̅ ) = 3/4.

P(𝐸̅ ) = probability that problem is not solved

= P(𝐴̅ ) P(𝐵̅ ) P(𝐶̅ ) = 1/2 . 2/3 . 3/4 = 1/4

And P(E) = 1-1/4 = 3/4.

Example 4

The probability that a team wins match is 3/5. If this team plays 3 matches in a
tournament, what is the probability that the team

(a) Win all matches (b) win at least one match

(c) win at most one match (d) lose all matches

Solution

Let W be the event of winning a match by the team.

P(W1) = P(W2) = P(W3) = 3/5.

Since the events are independent, P(W̄1) = P(W̄2) = P(W̄3) = 2/5.

(a) Probability of winning all matches = P(W1) P(W2) P(W3) = 27/125.

(b) Probability of winning at least one match
= 1 − probability of losing all matches
= 1 − P(W̄1) P(W̄2) P(W̄3) = 1 − 8/125 = 117/125.

(c) Probability of winning at most one match
= P(W̄1) P(W̄2) P(W̄3) + P(W1) P(W̄2) P(W̄3) + P(W2) P(W̄1) P(W̄3) + P(W3) P(W̄1) P(W̄2)
= 8/125 + 3 (3/5 × 2/5 × 2/5) = 44/125.

(d) Probability of losing all the matches = P(W̄1) P(W̄2) P(W̄3) = 8/125.

Example 5

A box has 5000 IC chips, of which 1000 are manufactured by company X and rest by
company Y. 10% of chips manufactured by company X and 5% of chips
manufactured by company Y are defective. If a randomly chosen chip is found to be
defective, find the probability that it is manufactured by company X.

Solution:

Event A = chip is manufactured by company X

Event B = chip is defective.

P(A) = 1000/5000 = 0.2

A defective chip may be manufactured by X or Y company, out of 5000 chips 300 are
defective.

P(B) = 300/5000 = 0.06

The event (𝐴 ∩ 𝐵) is the event chip is made by company X and is defective. 10% of
1000 chips manufactured by company X are defective, i.e. 100 chips manufactured by
company X are defective. Hence, P(𝐴 ∩ 𝐵) = 100/5000 = 0.02.

The probability that a defective chip is manufactured by company X is P(A|B), given by

P(A|B) = P(A ∩ B) / P(B) = 0.02/0.06 = 1/3.

Example 6

In a school 25% of the students failed in first language, 15% of the students failed in
second language and 10% of the students failed in both. If a student is selected at
random, find the probability that

i. He failed in first language if he had failed in second language.


ii. He failed in second language if he had failed in first language.
iii. He had failed in either of the language.

Solution

Let L1 be the set of students failing in first language, L2 be the set of students failing
in second language. We have,

P(L1) = 25/100 = 1/4, P(L2) = 15/100 = 3/20, P(L1 ∩ L2) = 10/100 = 1/10.

i. P(L1|L2) = P(L1 ∩ L2) / P(L2) = (1/10)/(3/20) = 2/3
ii. P(L2|L1) = P(L1 ∩ L2) / P(L1) = (1/10)/(1/4) = 2/5
iii. P(L1 ∪ L2) = P(L1) + P(L2) − P(L1 ∩ L2) = 1/4 + 3/20 − 1/10 = 3/10

14.3 Random variables and Probability distributions

In a random experiment, if a real variable is associated with every possible outcome


then the real variable is called a random variable (r.v.). An r.v. X on a sample space S is a function X : S → ℝ that assigns a real number X(s) to each sample point s ∈ S.

Example 7

1 is associated if a candidate is successful in an exam, 0 is associated if he has failed.


Here S={ f , s } and if X is r.v. then X(s) = 1, X(f) = 0 with

X={0, 1}.

Example 8

Suppose 2 chips are to be selected at random from a lot, the chip may be good(g) or
defective(d). The sample space is S = {gg, gd, dg, dd} depending on the selected
chips being good or bad. Suppose X denotes the number of good chips out of 2
selected, X = 0, 1, 2.

X = {0, 1, 2}.

Example 9

In a post office, the weights (in gms.) of speed post articles received in a day are: 87,
800, 225, 430, 290, 220, 350, 105, 95 represent r.v. s

If an r.v. takes a finite or countably infinite number of values then it is called a discrete r.v. If an r.v. takes an uncountably infinite number of values (even in a small range) then it is called a continuous r.v.

Note: Generally counting problems correspond to discrete r.v.s and measuring


problems lead to continuous r.v.s.

Probability Distribution:

For each value xi of a discrete random variable X, a real number p(xi) is defined such that (i) p(xi) ≥ 0 and (ii) ∑ p(xi) = 1; p(x) is called the probability mass function (pmf). The set of values {xi, p(xi)} is called the pmf of the discrete r.v. X. p(xi) gives the probability that the r.v. X obtained on a performance of the experiment is equal to xi.

The distribution function P(x), defined by P(X ≤ x) = ∑_{xi ≤ x} p(xi), is the cumulative distribution function.

Similarly, for a continuous random variable X a real-valued function f(x) is defined such that (i) f(x) ≥ 0 and (ii) ∫ f(x) dx = 1; f(x) is called the probability density function (pdf). f(xi) δxi gives (approximately) the probability that the r.v. X obtained on a performance of the experiment belongs to the small interval (xi, xi + δxi).

The distribution function F(x), defined by F(x) = P(X ≤ x) = ∫_l^x f(u) du, is the cumulative distribution function, where l is the lower limit of the range of the r.v. x.

Example 10

In Example 8, suppose the probability that a chip is defective is 0.1; then the pmf of
X = {0, 1, 2} is

P(X=0)=P (both chips are defective) = (0.1).(0.1) = 0.01.

P(X=1) = P(1 chip is good, 1 chip is defective) = 2.(0.9)(0.1) = 0.18

P(X=2) = P(both chips are good) = (0.9).(0.9) = 0.81.

The probability distribution of X is given by

x: 0 1 2

p(x): 0.01 0.18 0.81

Example 11

Show that the following pmf represents a discrete probability distribution.

x: 10 20 30 40

p(x): 1/8 3/8 3/8 1/8

Solution

p(x) ≥ 0, for all x and ∑ 𝑝(𝑥) = 1 . Hence p(x) represents pmf.

Example 12

Find the value of k, if the following represents a pmf.

x: -3 -2 -1 0 1 2 3

p(x): k 2k 3k 4k 5k 7k 8k.

Solution

p(x) ≥ 0 for all x and ∑ 𝑝(𝑥) = 1 if p(x) has to be pmf.

That is k+2k+3k+4k+5k+7k+8k = 1, implying that k = 1/30.

Example 13

If X is r.v. with probability function

p(x) = (1/2)(2/3)^x, for x = 1, 2, 3, ...,

find whether p(x) is pmf and the probability of X being an odd number.

Solution

∑ p(x) = ∑_{x=1}^{∞} (1/2)(2/3)^x
= (1/2)[(2/3) + (2/3)² + (2/3)³ + ⋯]
= (1/2) · (2/3)/(1 − 2/3) = 1, proving that p(x) is a pmf.

P(X is an odd number) = ∑_{x = 1, 3, 5, …} (1/2)(2/3)^x
= (1/2)[(2/3) + (2/3)³ + (2/3)⁵ + ⋯]
= (1/2) · (2/3)/(1 − (2/3)²) = 3/5.

Example 14

Find whether f(x) = 2x for 0 < x < 1 (and 0 otherwise) is a pdf.

Solution

∫₀¹ f(x) dx = ∫₀¹ 2x dx = 1, and f(x) ≥ 0, so f(x) is a pdf.

Example 15

Find the value of c such that f(x) is a pdf, where f(x) = x/6 + c for 0 ≤ x ≤ 3 (and 0 otherwise); also find P(1 ≤ x ≤ 2).

Solution
For f(x) to be a pdf, we must have ∫₀³ f(x) dx = 1.

∫₀³ (x/6 + c) dx = [x²/12 + cx]₀³ = 3/4 + 3c.

∫₀³ f(x) dx = 1 ⇒ c = 1/12.

P(1 ≤ x ≤ 2) = ∫₁² (x/6 + 1/12) dx = [x²/12 + x/12]₁² = 1/3.

Example 16

Find the distribution function of the following pdf.

f(x) = 6x − 6x² for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Solution

The distribution function of x having the above pdf is

F(x) = ∫₀ˣ (6u − 6u²) du = [3u² − 2u³]₀ˣ = 3x² − 2x³, for 0 ≤ x ≤ 1.

𝐄𝐱𝐚𝐦𝐩𝐥𝐞 17

The time t (in years) required to complete a software project has pdf of the form f(t) = kt(1 − t) for 0 ≤ t ≤ 1, and 0 otherwise. Find k and also the probability that the project will be completed in less than 4 months.

Solution
k is the solution of ∫₀¹ f(t) dt = 1, i.e. ∫₀¹ kt(1 − t) dt = 1, implying that k = 6.

The probability that the project will be completed in 4 months (1/3 year) is given by

P(0 < t < 1/3) = ∫₀^(1/3) 6t(1 − t) dt = [3t² − 2t³]₀^(1/3) = 1/3 − 2/27 = 7/27.

14.4 Theoretical distributions

Binomial Distribution:

If a random experiment with only two possible outcomes (say success and failure), which are mutually exclusive and exhaustive, is repeated n times, then the probability of x successes in the n trials (with probability of success p in each trial) is given by

P(x) = C(n, x) p^x q^(n−x),  x = 0, 1, 2, ..., n,     (2.5)

with 0 < p < 1 and p + q = 1, and is called the Binomial distribution.

The mean of binomial distribution is np, its variance is npq.

Poisson Distribution:

Poisson distribution is regarded as the limiting form of binomial distribution when n


is large (n→ ∞) and p the probability of success is very small (p→ 0) so that np →m (a
fixed finite constant). The probability of x successes out of n trials (with these
conditions on n and p) is given by
P(x) = (m^x e^(−m)) / x!,  x = 0, 1, 2, ...     (2.6)

This is the Poisson distribution of r.v. X.

The mean and variance of Poisson distribution is m.

Example 18

A manufacturer produces IC chips, 1% of which are defective. Find the probability


that in a box containing 100 chips, no defective is found using i. Binomial model
ii. Poisson model

Solution

n = 100, p = 0.01.

i. With the binomial model, the probability of 0 defectives is given by
P(x = 0) = C(100, 0) (0.01)^0 (0.99)^100 = (0.99)^100 = 0.366.

ii. With the Poisson model, the probability of 0 defectives is given by
P(x = 0) = (m^0 e^(−m))/0! = e^(−1) = 0.367
(since mean = m = np = 100 × 0.01 = 1).
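The same probabilities can be read off from scipy.stats (a minimal sketch, assuming scipy is installed; it is not used in the text itself):

from scipy.stats import binom, poisson

print(binom.pmf(0, n=100, p=0.01))   # binomial model: 0.366
print(poisson.pmf(0, mu=1))          # Poisson model:  0.3679 = e**(-1)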

Example 19

An airline knows that 5% of the people making reservations on a certain flight will
not turn up. Consequently their policy is to sell 52 tickets for a flight that can only
hold 50 passengers. What is the probability that there will be a seat for every
passenger who turns up?

Solution

P ( a passenger will not turn up) = 0.05 = p, q=0.95

Let x denote the number of passengers who will not turn up. Then

P(x) = C(52, x) (0.05)^x (0.95)^(52−x),  x = 0, 1, 2, ..., 52.
A seat is assured for every passenger who turns up if the number of passengers who
fail to turn up is more than or equal to 2, the probability of which is given by

𝑃(𝑥 ≥ 2) = 1 − 𝑃(𝑥 < 2) = 1 − (𝑃(𝑥 = 0) + 𝑃(𝑥 = 1) )

= 1 − [ (0.95)52 + 52(0.05)(0.95)51 ]= 0.7405.

Hence, probability that a seat is available for every passenger who turns up is 0.7405.

Normal Distribution:

A continuous probability distribution having the pdf of the form


f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²),     (2.7)

with −∞ < x < ∞, −∞ < μ < ∞, σ > 0, is known as the normal distribution. If a r.v. X has the pdf (2.7), then we write X ~ N(μ, σ²); μ and σ² are the parameters of the distribution.

The mean and variance of normal distribution are respectively 𝜇 and 𝜎 2 .

If μ = 0 and σ² = 1, then

f(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞,     (2.8)

is known as the standard normal distribution. Whenever X has a normal distribution with parameters μ and σ², then Z = (X − μ)/σ will have the standard normal distribution N(0, 1).

The normal distribution is a symmetric distribution, from which it follows that its distribution function F satisfies F(−z) = 1 − F(z). The distribution function of N(0, 1) is represented by Φ, and

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^(−x²/2) dx

represents the area under the standard normal curve from −∞ to z. Tabulated values giving this area for different values of z are readily available; such a table is called the normal probability table. The table may give the area under the standard normal curve between −∞ and z or between 0 and z.

Note:

i. With φ defined as in (2.8),

P(−∞ ≤ z ≤ ∞) = ∫_{−∞}^{∞} φ(z) dz = 1,

P(−∞ ≤ z ≤ 0) = ∫_{−∞}^{0} φ(z) dz = 1/2 = P(0 ≤ z ≤ ∞) = ∫_{0}^{∞} φ(z) dz,

P(z ≤ z1) = P(−∞ ≤ z ≤ 0) + P(0 ≤ z ≤ z1) = 0.5 + P(0 ≤ z ≤ z1),

P(z ≥ z1) = P(z ≥ 0) − P(0 ≤ z < z1) = 0.5 − P(0 ≤ z < z1).

To find the area under the standard normal curve between 0 and 1.55, theoretically it is the value of (1/√(2π)) ∫_{0}^{1.55} e^(−x²/2) dx. Referring to the table given at the end of this section, we move vertically down the z column to reach 1.5 and then move horizontally along this row to the column headed 5 (regarded as 0.05) to read the value 0.4394.

ii. Normal distribution is important in statistical applications because of the


central limit theorem, which states that, the mean of a sample of n mutually
independent random variables (having distributions with finite mean and
variance) is normally distributed in the limit n → ∞.

Example 20

In a test of electric bulbs, it was found that the life time of a particular brand was
distributed normally with an average life of 2000 hours and standard deviation of 60
hours. If the firm purchases 2500 bulbs, find the number of bulbs that are likely to
last for

i. more than 2100 hours


ii. less than 1950 hours
iii. between 1900 to 2100 hours.

Solution

Let X = life of a bulb; X has a normal distribution with μ = 2000, σ = 60.

The standard normal variable is Z = (X − μ)/σ = (X − 2000)/60.

i. P(X > 2100) = P((X − 2000)/60 > (2100 − 2000)/60) = P(Z > 1.67) = 0.0475 (from standard normal tables).
The number of bulbs likely to last for more than 2100 hours is 2500 × 0.0475 ≈ 119.

ii. P(X < 1950) = P((X − 2000)/60 < (1950 − 2000)/60) = P(Z < −0.83) = 0.2033 (from standard normal tables).
The number of bulbs likely to last for less than 1950 hours is 2500 × 0.2033 ≈ 508.

iii. P(1900 < X < 2100) = P(−1.67 < Z < 1.67) = 2 P(0 < Z < 1.67) = 0.905 (from standard normal tables).
The number of bulbs likely to last between 1900 and 2100 hours is 2500 × 0.905 ≈ 2263.
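The same example recomputed with scipy's normal distribution (an illustrative sketch only; the small differences from the counts above come from rounding the z-values when reading printed tables):

from scipy.stats import norm

n_bulbs = 2500
p_more_2100 = norm.sf(2100, loc=2000, scale=60)     # P(X > 2100)
p_less_1950 = norm.cdf(1950, loc=2000, scale=60)    # P(X < 1950)
p_between = norm.cdf(2100, loc=2000, scale=60) - norm.cdf(1900, loc=2000, scale=60)
print(n_bulbs * p_more_2100)    # about 119 bulbs
print(n_bulbs * p_less_1950)    # about 506 bulbs (508 with table-rounded z)
print(n_bulbs * p_between)      # about 2261 bulbs (2263 with table-rounded z)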

Example 21

An analog signal received at a detector (measured in microvolts) may be modelled as


Normal random variable with mean 200 and variance 256 at a fixed point in time.
What is the probability that the signal will exceed 210 microvolts and 240 microvolts?

Solution
P(X > 210) = P((X − 200)/16 > (210 − 200)/16) = P(Z > 0.625) = 0.26599

P(X > 240) = P((X − 200)/16 > (240 − 200)/16) = P(Z > 2.5) = 0.00621

Mathematical Expectation:

The expectation, E(X), of a r.v. X is defined as

E(X) = ∑ᵢ xᵢ p(xᵢ) if X is discrete, and E(X) = ∫ x f(x) dx if X is continuous,     (2.9)

provided that the relevant sum or integral is absolutely convergent; that is, ∑ᵢ |xᵢ| p(xᵢ) < ∞ and ∫ |x| f(x) dx < ∞. If the right-hand side in (2.9) is not absolutely convergent, then E(X) does not exist.

Example 22

Suppose the pmf of X is given by

x: 0 1 2 3

p(x): 0.2 0.1 0.4 0.3

Find E(3X + 2X2 ).

Solution

E(X) = 0(0.2) + 1(0.1) + 2(0.4) + 3(0.3) = 1.8

E(X2) = 02(0.2) + 12(0.1) + 22(0.4) + 32(0.3) = 4.4

E(3X + 2X²) = 3 E(X) + 2 E(X²) = 3(1.8) + 2(4.4) = 14.2
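The same expectation written as a weighted sum (a short numpy sketch of equation (2.9) for the discrete case; numpy is an assumed tool, not part of the text):

import numpy as np

x = np.array([0, 1, 2, 3])
p = np.array([0.2, 0.1, 0.4, 0.3])
e_x = np.sum(x * p)             # E(X)    = 1.8
e_x2 = np.sum(x**2 * p)         # E(X^2)  = 4.4
print(3 * e_x + 2 * e_x2)       # E(3X + 2X^2) = 14.2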

Example 23

The time to failure in thousands of hours of an important piece of electronic


equipment used in a manufactured DVD player has the density function.

f(x) = 3e^(−3x) if x > 0, and f(x) = 0 otherwise.

Find the expected life of the piece of equipment.

Solution
E(X) = ∫₀^∞ 3x e^(−3x) dx = ∫₀^∞ e^(−3x) dx (integrating by parts) = 1/3

Hence, expected life is (1/3) thousand hours.

14.5 Summary
1. A group of four integrated circuit (IC) chips consists of two good chips g1 and g2 and two defective chips d1 and d2. If three chips are selected at random from this group, what is the probability of the event E that two of the three selected chips are defective?

2. What is the probability that some randomly chosen k-digit decimal number is a
valid k-digit octal number.

3. The probability that 3 students A, B, C solve a problem are 1/2, 1/3, 1/4
respectively. If the problem is simultaneously assigned to all of them, what is the
probability that the problem is solved?

4. The probability that a team wins match is 3/5. If this team plays 3 matches in a
tournament, what is the probability that the team

5. A box has 5000 IC chips, of which 1000 are manufactured by company X and rest
by company Y. 10% of chips manufactured by company X and 5% of chips
manufactured by company Y are defective. If a randomly chosen chip is found to be
defective, find the probability that it is manufactured by company X.

14.6 Keywords

Exhaustive Events

Mutually exclusive Events

Independent Events

Probability Space:

14.7 Question for self study

1. Define : Sample space, events, exhaustive events, independent events,


mutually exclusive events with examples for each.

2. Define complementary events, conditional probability. State and prove


addition rule of probability with example.

3. a) Give axiomatic definition of probability. State Bayes theorem.


b) Define random variable. Give examples of discrete random variable and
continuous random variable.
c) Find the value of k, if the following distribution represents a
probability mass function.
X: -3 -2 -1 0 1 2 3
P(X): k k/2 k/3 k/4 k/5 k/6 k/7

4. a) A machine has four parts. If the probabilities of failure of these parts


are respectively 0.5, 0.6, 0.4 and 0.8. The machine fails if any of these
parts fail. What is the probability that machine survives?
b) An aircraft is equipped with 2 engines. The probability of failure of an
engine is 0.002. If only one engine is needed for successful operation of
the aircraft, find the probability of successful flight of the aircraft.

5. a) Define i. Binomial distribution ii. Poisson distribution.

b) The incidence of an occupational disease in an industry is such that
the workers have 20% chance of suffering from it. What is the
probability that out of 5 workers at most two contract disease?
c) 2.5 percent of the fuses manufactured by a firm are expected to be
defective. Find the probability that a box containing 250 fuses contains
i. Defective fuses ii. 3 or more defective fuses.

6. a) Define normal distribution. State its properties.


b) The average IQ of a group of 500 children is 95. The standard deviation
is 8. Assuming normality, find the expected number of children having
IQ between 100 and 115.

14.8 References

1. Chandrashekhar, K. S. (2009). Engineering Mathematics IV. Sudha Publications,


Bangalore.
2. Gupta, S. C. (2021). Fundamentals of Statistics (Seventh Edition). Himalaya Publishing House.
3. Montgomery, D. C. and Runger, G. C. (2014). Applied Statistics and Probability for Engineers (Sixth Edition). John Wiley and Sons, Singapore.
4. Rohatgi, V. K. and Saleh, A. K. Md. Ehsanes (2001). An Introduction to Probability and Statistics (Second Edition). John Wiley & Sons, Inc.
5. Trivedi, K. S. (1997). Probability and Statistics with Reliability, Queuing, and Computer Science Applications. Prentice Hall of India, New Delhi.

UNIT-15 Correlation and regression analysis

STRUCTURE

15.0 Objectives
15.1 Introduction
15.2 Regression analysis
15.3 Fitting of Second degree parabola
15.4 Inverse regression
15.5 Correlation versus Regression
15.6 Summary
15.7 Keywords
15.8 Question for self study
15.9 References

15.0 Objectives

 Given a data set, carry out correlation and regression analyses,
 Interpret correlation and regression coefficients,
 Decide when to carry out correlation analysis and when to carry out regression analysis.

15.1 Introduction

When we are studying more than one characteristics of a process, it is essential to


know the relationship among the characteristics. This can be analyzed through
Correlation Analysis. The systematic interrelationship between continuous variables is
termed as correlation. When only two variables are involved, the correlation is simple
correlation. If more than two variables are involved the correlation is termed as
multiple correlation. When the variables move in same direction, they are said to be
positively correlated. The positive correlation is also termed as direct correlation. If a
decrease in one variable cause an increase in other variable or vice-versa then the
variables are said to be negatively correlated.
Scatter Diagram:
In correlation problems, first we have to investigate whether there is any relation
between the variables, say, X and Y. For this purpose we use scatter diagram.
Let (x1,y1), (x2,y2), ….(xn,yn) be n pairs of observations. If the values of the variables
X and Y are plotted along the X-axis and Y-axis respectively on the xy-plane of graph
sheet, the resultant diagram is known as a scatter diagram. Figure 1 shows scatter plots of all possible linear relationships between the variables X and Y.

Figure1

 Linear Correlation
The scatter diagram will give only a vague idea about the presence or absence of
correlation and the nature (positive or negative) of correlation. It will not indicate
about the strength or degree of relationship between two variables. The index of the
degree of relationship between two continuous variables is known as correlation
coefficient. The correlation coefficient is denoted by r in sample and as ρ (read as rho)
in case of population. The correlation coefficient r is known as Pearson’s correlation
coefficient. (It was developed by Karl Pearson). It is also called as Product-moment
correlation.
The correlation coefficient r is used under certain assumptions. They are:
i. The variables under study are continuous random variables and are
normally distributed.
ii. The relationship between the variables is linear.
iii. Each pair of observation is uncorrelated with other pairs.
When the assumptions for the applicability of r are not met, the other
measures of association have to be employed.

Properties and Interpretation of Correlation Coefficient:


The correlation coefficient r is a unit-free measure. It ranges from −1 to +1. Usually r will be a fraction within these limits; if we get a value of r beyond these limits, it is an indication of wrong computation. The correlation coefficient is independent of the origin and scale of measurement of the variables.
There is no direct and simple interpretation for r itself. The relationship between the
variables is interpreted by the square of the correlation coefficient r2, which is called
the coefficient of determination. The coefficient of determination shows what amount
of variability or change in one of the variables is accounted for by the variability of
the second variable. For example, if r=0.8, r2=0.64 or 64 percent. This means that on
the basis of the sample, approximately 64% of the variation in one variable, say Y, is
caused by the variations of the other variable X. The remaining 36% variation in Y is
unexplained by variation in X. In other words, the variables other than X could have
caused the remaining 36% variation in Y.
The coefficient of determination is more useful in comparing two correlation
coefficients. For example, suppose that the correlation coefficient between the
variables X and Y is -0.6 and is 0.6 between the variables X and Z. The strength of
relationship between X and Y is same as that between X and Z, since in both the cases
r2=36%. By r=0 it is implied that there is no linear relationship between the two
variables. However, there may be a non-linear relationship. In other words, when the
variables are uncorrelated (independent), r=0 but when r=0 it is not necessarily true
that the variables are independent.
When r = 0.6, we cannot say that it indicates a relationship twice as strong as r = 0.3, since r² = 0.36 (36%) in the first case and r² = 0.09 (9%) in the second. In general, the correlation coefficient should be interpreted in the light of other disturbing factors.
Unless the disturbing factors are controlled or held constant the correlation will not
reveal the true nature of relationship between them.
Pearson’s Coefficient of Correlation:
The numerical measure of correlation between two variables x and y, denoted by r(
for n pairs of observations on x and y) is given by
r = ∑(x − x̄)(y − ȳ) / (n σx σy)     (3.1)

where σx and σy are the SDs of x and y respectively.

(3.1) can be further simplified as

r = [n ∑xy − (∑x)(∑y)] / √{[n ∑x² − (∑x)²][n ∑y² − (∑y)²]}     (3.2)

Suppose x̃ = (x − x̄), ỹ = (y − ȳ); we have an alternative form for r as

r = ∑x̃ỹ / (√(∑x̃²) √(∑ỹ²)) = ∑x̃ỹ / (n σx σy)     (3.3)

Sometimes we may be interested in the correlation of attributes. The Chi-square does


not indicate the strength of relationship between two attributes. For determining the
strength of the relationship between two attributes we may either use the coefficient ф
or Cramer’s V.
Example 1
Following are the Agricultural production index (x) of an agricultural product and its
wholesale price index(y) for eight years. Find the correlation coefficient between x
and y, interpret results.
X: 164 176 178 184 175 167 173 180
Y: 158 164 165 171 163 156 163 169

Solution:
Since correlation coefficient is independent of the origin and scale of measurements
of variables, correlation coefficient between x and y is same as correlation coefficient
between u = x-170 and v = y-160.
x y u v u2 v2 uv
164 158 -6 -2 36 4 12
176 164 6 4 36 16 24
178 165 8 5 64 25 40
184 171 14 11 196 121 154
175 163 5 3 25 9 15
167 156 -3 -4 9 16 12
173 163 3 3 9 9 9
180 169 10 9 100 81 90
∑ 𝑢 = 37, ∑ 𝑣 = 29, ∑ 𝑢2 = 475,
∑ 𝑣 2 = 281, ∑ 𝑢𝑣 = 356 .
r = [n ∑uv − (∑u)(∑v)] / √{[n ∑u² − (∑u)²][n ∑v² − (∑v)²]}     (3.4)

=0.9598.
Example 2
Find the correlation coefficient for the following data.

x: 10 14 18 22 26 30
y: 18 12 24 6 30 36

Solution
𝑥̅ = 20, 𝑦̅ = 21.
𝑥̃ = (𝑥 − 20), 𝑦̃ = (𝑦 − 21)
x y 𝑥̃ 𝑦̃ 𝑥̃ 2 𝑦̃ 2 𝑥̃𝑦̃
10 18 -10 -3 100 9 30
14 12 -6 -9 36 81 54
18 24 -2 3 4 9 -6
22 6 2 -15 4 225 -30
26 30 6 9 36 81 54
30 36 10 15 100 225 150
∑ 𝑥̃ 2 = 280 , ∑ 𝑦̃ 2 = 630, ∑ 𝑥̃𝑦̃ = 252.
𝑈𝑠𝑖𝑛𝑔 (3.3), we have r = 0.6.
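The value r = 0.6 can be verified directly with numpy (an illustrative sketch; numpy.corrcoef computes the same Pearson coefficient as formula (3.2)):

import numpy as np

x = np.array([10, 14, 18, 22, 26, 30])
y = np.array([18, 12, 24,  6, 30, 36])
print(np.corrcoef(x, y)[0, 1])   # Pearson's r = 0.6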
Pearson's correlation coefficient can be calculated only if the characteristics under study are quantitative (numerically measurable). Spearman's correlation coefficient can be calculated even if the characteristics under study are qualitative. Here the values of the variables are ranked in decreasing or increasing order, and the correlation coefficient using these ranks is computed as

ρ = 1 − 6 ∑d² / (n³ − n)     (3.5)

where d is the difference between ranks of corresponding x and y.

Example 3
Marks scored by eight students in C-programming and mathematics are given below.

C-prog: 25 43 27 35 54 61 37 45
Maths: 35 47 20 37 63 54 28 40
Find correlation coefficient between the two marks scored.

Solution
Marks Ranks
C-prog. Maths R1 R2 d = R1 - R2 d2
25 35 8 6 2 4
43 47 4 3 1 1
27 20 7 8 -1 1
35 37 6 5 1 1
54 63 2 1 1 1
61 54 1 2 -1 1
37 28 5 7 -2 4
45 40 3 4 -1 1
∑ 𝑑 2 = 14, n = 8, using (3.5), r = 0.8333.
For the computation of correlation coefficient using ranks, while ranking the values,
two or more values may be equal, and so, a situation of ties may arise. In such cases,
all those values which are equal are assigned with the same average rank. And then,
the correlation coefficient is found. Here, corresponding to every repeated rank (which repeats m times), a correction factor (CF) of (m³ − m)/12 is added to ∑d². If one rank repeats m1 times, another rank repeats m2 times, a third rank repeats m3 times and so on, the CF is (m1³ − m1)/12 + (m2³ − m2)/12 + ⋯. The correlation coefficient is

ρ = 1 − 6[∑d² + CF] / (n³ − n)     (3.6)
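A quick check of Example 3 (an illustrative sketch assuming scipy is available): scipy.stats.spearmanr ranks the data and applies the tie correction automatically, so it reproduces (3.5)/(3.6).

from scipy.stats import spearmanr

c_prog = [25, 43, 27, 35, 54, 61, 37, 45]
maths  = [35, 47, 20, 37, 63, 54, 28, 40]
rho, p_value = spearmanr(c_prog, maths)
print(rho)    # 0.8333, matching Example 3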

Other Measures Of Correlation:


There are many other measures of correlation used in different situations. For a dichotomous variable we may use biserial correlation. If one variable is continuous and the other is discrete (e.g. CPU time required and number of I/O operations) we may use point biserial correlation. In time-series data the value observed at one point of time may be correlated with that at a previous point of time; such correlation is known as serial correlation.

15.2 Regression analysis

In correlation analysis, degree of association between two variables is studied. For


correlation analysis to be satisfactory, it is necessary that both the variables must be
normally distributed.
Most of the real life phenomena involve non-normal random variables or arbitrary
series of values. For example, the failure rate of electronic device and temperature;
price of CPU and speed. Here levels of temperature (or speed) are fixed arbitrarily
and have no distribution. For such problems the regression analysis is more
appropriate. It also helps in prediction of one of the variables for given value of the
other. In broader sense, regression is the theory of estimation of unknown value of a
variable with the help of known values of the variables.
Of two variables under study one may represent the cause and the other may represent
the effect. The variable representing the cause is known as independent variable and it
is denoted by X. It is also called as predictor variable or regressor or explanatory
variable. The variable representing the effect is known as dependent variable and is
denoted by Y. It is also called as predicted variable or regressand.
The relationship between the independent and dependent variables may be expressed
as a function. Such functional relationship between two variables is termed as

regression. When only two variables are involved the functional relationship is known
as simple regression. If the relationship between the two variables is straight line, it is
known as simple linear regression, otherwise it is called as simple non-linear
regression (price=a.speedb).
Figures 2, 3 and 4 respectively show scatter diagrams of linear dependence, nonlinear
dependence, no specific dependence.
Figure2

Figure3

Figure 4

When there are more than two variables and one of them is assumed to be dependent
upon the others, the functional relationship between the variables is known as
multiple regressions.

Assumptions Made In the Linear Regression Analysis:


1. The X’s are non-random or fixed constants.
2. At each fixed value of X, the corresponding value of Y has normal distribution
3. For any given x, the variance of Y is same (homoscedastic).
4. The y’s observed at different values of X are completely independent.
Failure of assumptions of independence and equal variances will distort the
conclusions. However when we have large samples a moderate departure from
normality does not impair the conclusions. If any of the assumptions fail, we may
resort to transformation of either X or Y or both. The usual transformations are
logarithmic or square root transformations.
The line of the form

𝑦 = 𝑎 + 𝑏𝑥 (3.7)
𝑥 being independent variable is called the regression line of 𝑦 on 𝑥.
Consider a set of n given values (𝑥, 𝑦), we have to find a specific relation 𝑦 = 𝑎 + 𝑏𝑥
(determine values of 𝑎 and 𝑏)for the data to satisfy as accurately as possible and such
an equation is called the best fitting equation or the curve of best fit. The residual 𝑅 =
𝑦 − (𝑎 + 𝑏𝑥) is the difference between observed and estimated values of 𝑦. The
parameters 𝑎 and 𝑏 are found such that the sum of squares of the residuals is
minimum (least) which is called the method of least squares. Let
𝑆 = ∑𝑛𝑖=1 𝑅 2 = ∑𝑛𝑖=1(𝑦 − (𝑎 + 𝑏𝑥))2 (3.8)
Treating S as a function of the two parameters a and b, the necessary conditions for S to be minimum are ∂S/∂a = 0 and ∂S/∂b = 0, which lead to

𝑛𝑎 + 𝑏 ∑ 𝑥 = ∑ 𝑦 (3.9)
𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 = ∑ 𝑥𝑦 (3.10)
Which are called normal equations for fitting a straight line 𝑦 = 𝑎 + 𝑏𝑥, solving
which we obtain the values of 𝑎 and 𝑏. Dividing (3.9) by 𝑛, we have 𝑦̅ = 𝑎 + 𝑏𝑥̅ ,
implying that regression line passes through (𝑥̅ , 𝑦̅ ). We also know that the equation
of straight line passing through (𝑥1 , 𝑦1 ) having slope m is
𝑦 − 𝑦1 = 𝑚(𝑥 − 𝑥1 ) (3.11)
Hence if (𝑥1 , 𝑦1 ) = (𝑥̅ , 𝑦̅ ), we get
𝑦 − 𝑦̅ = 𝑚(𝑥 − 𝑥̅ ) (3.12)
ỹ = m x̃, where ỹ = y − ȳ and x̃ = x − x̄.
The normal equation for fitting ỹ = m x̃ (using (3.10)) to find m will be
∑ỹx̃ = m ∑x̃²,  or  m = ∑ỹx̃ / ∑x̃²     (3.13)

But from (3.3), n r σx σy = ∑ỹx̃     (3.14)

And σx² = ∑(x − x̄)²/n = ∑x̃²/n, implying that
n σx² = ∑x̃²     (3.15)

Using (3.14) and (3.15) in (3.13), we have
m = (n r σx σy) / (n σx²) = r σy/σx     (3.16)

Hence, the regression line of y on x is given by

y − ȳ = r (σy/σx)(x − x̄)     (3.17)

Similarly, assuming the equation in the form x = a + b y and proceeding on the same lines as above, we obtain

x − x̄ = r (σx/σy)(y − ȳ)     (3.18)

as the regression line of x on y. The coefficients of x in (3.17) and of y in (3.18) are r σy/σx and r σx/σy respectively and are known as regression coefficients. The product of the regression coefficients is r². Thus r is the geometric mean of the regression coefficients.

Example 4
8x-10y+66 = 0 and 40x – 18y = 214 are the two regression lines.
i. Find the means of x and y.
ii. Find the correlation coefficient of x and y.
iii. Find σy if σx = 3.

Solution
i. We know that regression lines pass through 𝑥̅ and 𝑦̅ .
Hence, 8𝑥̅ − 10𝑦̅ = −66
40𝑥̅ − 18𝑦̅ = 214
Solving these two equations simultaneously, we have 𝑥̅ = 13, 𝑦̅ = 17.
ii. Rewriting the given equations, we have
y = 0.8x + 6.6     (i)
x = 0.45y + 5.35     (ii)
From (i), r σy/σx = 0.8, and from (ii), r σx/σy = 0.45.
Multiplying these two and taking the square root, we have
r = √(0.8 × 0.45) = 0.6, since both regression coefficients are positive.
iii. We have r σy/σx = 0.8; for σx = 3 this gives 0.6 σy = 2.4, implying that σy = 4.

Example 5
For the following data, find the two regression lines and calculate x for given value of
y = 16.
X: 36 23 27 28 28 29 30 31 33 35

Y: 29 18 20 22 27 21 29 27 29 28

Solution
From the data, we have the following:
n = 10, 𝑥̅ = 30, 𝑦̅ = 25, 𝑧̅ = 5, z = x – y.
∑ 𝑥 2 = 9138, ∑ 𝑦 2 = 6414, ∑ 𝑧 2 = 306
𝜎𝑥2 = 13.8, 𝜎𝑥 = 3.715 , 𝜎𝑦2 = 16.4, 𝜎𝑦 = 4.05
𝜎𝑧2 = 5.6
r = (σx² + σy² − σz²) / (2 σx σy) = (13.8 + 16.4 − 5.6) / (2 × 3.715 × 4.05) = 0.82.

Using (3.17) and (3.18) , substituting above values, the regression line of 𝑦 on 𝑥and
the regression line of 𝑥 on 𝑦 are respectively given by
𝑦 = 0.894𝑥 − 1.82
𝑥 = 0.752𝑦 + 11.2
To find the value of x for y = 16, x = 0.752y + 11.2 is used, and x is obtained as approximately 23.
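The two regression lines of Example 5 can also be obtained by least squares with numpy.polyfit (an illustrative sketch; the small differences from the hand-computed coefficients come from rounding r, σx and σy above):

import numpy as np

x = np.array([36, 23, 27, 28, 28, 29, 30, 31, 33, 35])
y = np.array([29, 18, 20, 22, 27, 21, 29, 27, 29, 28])

b_yx, a_yx = np.polyfit(x, y, 1)   # regression of y on x: slope, intercept
b_xy, a_xy = np.polyfit(y, x, 1)   # regression of x on y
print(b_yx, a_yx)                  # about 0.89 and -1.74 (text: 0.894, -1.82)
print(b_xy, a_xy)                  # about 0.75 and 11.25 (text: 0.752, 11.2)
print(a_xy + b_xy * 16)            # predicted x for y = 16, about 23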

15.3 Fitting of Second degree parabola

Consider a set of n given values (𝑥 , 𝑦 ) for fitting the curve


𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2 (3.19)
The residual
R = y − (a + bx + cx²)     (3.20)
is the difference between the observed and estimated values of y. We have to find
parameters 𝑎 , 𝑏, 𝑐 such that the sum of squares of the residuals is the least
(minimum).
Let 𝑆 = ∑𝑛1[𝑦 − (𝑎 + 𝑏𝑥 + 𝑐𝑥 2 )]2
Treating S as a function of the three parameters a, b, c, the necessary conditions for S to be minimum are ∂S/∂a = 0, ∂S/∂b = 0, ∂S/∂c = 0,

i.e. − ∑𝑛1 𝑦 + ∑𝑛1 𝑎 + ∑𝑛1 𝑏𝑥 + ∑𝑛1 𝑐𝑥 2 = 0


− ∑𝑛1 𝑥𝑦 + ∑𝑛1 𝑎𝑥 + ∑𝑛1 𝑏𝑥 2 + ∑𝑛1 𝑐𝑥 3 = 0
− ∑𝑛1 𝑥 2 𝑦 + ∑𝑛1 𝑎𝑥 2 + ∑𝑛1 𝑏𝑥 3 + ∑𝑛1 𝑐𝑥 4 = 0
which lead to
𝑛𝑎 + 𝑏 ∑𝑛1 𝑥 + 𝑐 ∑𝑛1 𝑥 2 = ∑𝑛1 𝑦

𝑎 ∑𝑛1 𝑥 + 𝑏 ∑𝑛1 𝑥 2 + 𝑐 ∑𝑛1 𝑥 3 = ∑𝑛1 𝑥𝑦 (3.21)
𝑎 ∑𝑛1 𝑥 2 + 𝑏 ∑𝑛1 𝑥 3 + 𝑐 ∑𝑛1 𝑥 4 = ∑𝑛1 𝑥 2 𝑦
are normal equations for fitting the second degree parabola
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2 in the least square sense. By solving these equations, we obtain the
values of 𝑎 , 𝑏, 𝑐.

𝐄𝐱𝐚𝐦𝐩𝐥𝐞 𝟔
Fit a parabola of second degree
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2
for the data given below.
X: 0 1 2 3 4
Y: 1 1.8 1.3 2.5 2.3

Solution
The normal equations for finding the values of 𝑎 , 𝑏, 𝑐 of
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2
are given in (3.21). The calculations are:
x y xy x2y x2 x3 x4
0 1 0 0 0 0 0
1 1.8 1.8 1.8 1 1 1
2 1.3 2.6 5.2 4 8 16
3 2.5 7.5 22.5 9 27 81
4 2.3 9.2 36.8 16 64 256
∑ 𝑥 = 10, ∑ 𝑦 = 8.9, ∑ 𝑥𝑦 = 21.1,

∑ 𝑥 2 𝑦 = 66.3, ∑ 𝑥 2 = 30, ∑ 𝑥 3 = 100, ∑ 𝑥 4 = 354.

Using these values in (3.21), solving for 𝑎 , 𝑏, 𝑐, we obtain


𝑎 = 1.078, 𝑏 = 0.414, 𝑐 = −0.021 , using which the best fit of second degree
parabola would be
y = 1.078 + 0.414x − 0.021x².
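The same second degree fit can be reproduced with numpy.polyfit (an illustrative sketch; polyfit returns the least-squares solution, i.e. the same values the normal equations (3.21) give, with coefficients listed highest power first):

import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 1.8, 1.3, 2.5, 2.3])
c, b, a = np.polyfit(x, y, 2)     # coefficients of x^2, x, constant
print(a, b, c)                    # about 1.08, 0.42, -0.02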
Example 7
Fit a parabola 𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2 by the method of least squares for
the data.
x 2 4 6 8 10

y 3.07 12.85 31.47 57.38 91.29

Solution
n = 5, ȳ = 39, and let x̃ = (x − 6)/2, ỹ = y − 39.

The normal equations for fitting 𝑦̃ = 𝑎 + 𝑏𝑥̃ + 𝑐𝑥̃ 2 are


∑ 𝑦̃ = 𝑛𝑎 + 𝑏 ∑ 𝑥̃ + 𝑐 ∑ 𝑥̃ 2
∑ 𝑥̃ 𝑦̃ = 𝑎 ∑ 𝑥̃ + 𝑏 ∑ 𝑥̃ 2 + 𝑐 ∑ 𝑥̃ 3
∑ 𝑥̃ 2 𝑦̃ = 𝑎 ∑ 𝑥̃ 2 + 𝑏 ∑ 𝑥̃ 3 + 𝑐 ∑ 𝑥̃ 4
The relevant computations are:
x y 𝑥̃ 𝑦̃ 𝑥̃𝑦̃ 𝑦̃𝑥̃ 2 𝑥̃ 2 𝑥̃ 3 𝑥̃ 4
2 3.07 -2 -35.93 71.86 -143.72 4 -8 16
4 12.85 -1 -26.15 26.15 -26.15 1 -1 1
6 31.47 0 -7.53 0 0 0 0 0
8 57.38 1 18.38 18.38 18.38 1 1 1
10 91.29 2 52.29 104.58 209.16 4 8 16
∑ 𝑥̃ = 0, ∑ 𝑦̃ = 1.06, ∑ 𝑥̃ 𝑦̃ = 220.97 , ∑ 𝑥̃ 2 𝑦̃ = 57.67,
∑ 𝑥̃ 2 = 10, ∑ 𝑥̃ 3 = 0, ∑ 𝑥̃ 4 = 34
Substituting these values in normal equations, we have
5a + 10c = 1.06
10b = 220.97
10a +34c = 57.67
Solving these, the values of a, b, c are obtained as
a = - 7.73, b = 22.1, c = 3.97
ỹ = a + bx̃ + cx̃² becomes
y − 39 = −7.73 + 22.1((x − 6)/2) + 3.97((x − 6)/2)², which on simplification gives
y = 0.7 − 0.86x + 0.9925x².

15.4 Inverse regression


Situations exist where we may know the value of the dependent variable Y for an
individual unit, and wish to estimate the corresponding value of independent variable
X. For example if a surgery lasts for three hours what should be the amount of
anesthetic drug to be given. For such cases we cannot use the regression equation of X

on Y as x=c+dy, as the assumptions of regression are violated. Instead, we re-arrange
the linear regression of Y on X to obtain
x = (y − a)/b.

This procedure of prediction of x from y is known as inverse prediction.

15.5 Correlation and regression


The choice of regression or correlation analysis depends upon the nature
of data and the purpose. In regression analysis, the purpose is to express one variable
as a function of another variable. Using this functional relationship we may predict the
values of dependent variable Y using the values of independent variable X. In
correlation we do not intend to find such relationship between the variables. We
merely know whether the variables are interdependent or not. There is no distinction
between independent and dependent variables in correlation analysis. In correlation
analysis both variables are assumed to be normal random variables. If one of them is
non-random and fixed and has no distribution, correlation technique cannot be used.

15.6 Summary
1. Following are the Agricultural production index (x) of an agricultural product and
its wholesale price index(y) for eight years. Find the correlation coefficient between x
and y, interpret results.
2. Find the correlation coefficient for the following data.
3. Marks scored by eight students in C-programming and mathematics are given
below.
4. 8x-10y+66 = 0 and 40x – 18y = 214 are the two regression lines.
5. For the following data, find the two regression lines and calculate x for the given value of y = 16.

15.7 Keywords
Correlation Analysis
correlation coefficient
regressand
parabola

15.8 Question for self study

1. Explain the use of a scatter diagram in finding the correlation between two variables.

2. State the assumptions under which correlation coefficient r is used.


Discuss properties and interpretation of correlation co-efficient.

3. Find the correlation co-efficient between years of schooling and annual


income (in lacs of Rs.) for following data.
Years of Schooling: 12 13 14 17 15 20 16 9
Annual Income: 20 11 15 18 10 26 15 13

4. Explain regression analysis. Describe scatter diagrams for showing linear


dependence, non-linear dependence and no dependence.

5. Define correlation, regression. Explain interpretation of R2 with an


example. Describe fitting of second degree parabola by principle of
least squares.

6. a) Compare correlation analysis with regression analysis.


b) Fit a linear regression of Y (HDL level) on X (age), find the value of HDL for
age 62 .

Age (X) : 35 38 41 44 47 50 53 56 59
HDL level: 50 56 44 49 46 49 45 51 48

15.9 References

1. Chandrashekhar, K S (2009). Engineering Mathematics IV. Sudha Publications,

Bangalore.

2. Gupta, S C (2021). Fundamentals of Statistics. (Seventh Edition ) Himalaya


Publishing House.

3. Montgomery D C and Runger G C (2014). Applied Statistics and Probability for
Engineers. (Sixth Edition). John Wiley and Sons, Singapore

4. Trivedi, K S (1997). Probability and Statistics with Reliability, Queuing, and


Computer Science Applications. Prentice Hall of India , New Delhi.

UNIT-16 Testing of Hypothesis

STRUCTURE

16.1 Introduction
16.1 Large Sample tests
16.2 Small sample tests
16.3 Testing for population variance
16.4 Tests based on Chi-square distribution
16.5 Introduction to Monte Carlo Methods
16.6 Summary
16.7 Keywords
16.8 Question for self-study
16.9 References

16.0 Objectives

 Define the basic concepts of testing of hypotheses,
 Study the various tests used in different scenarios.

16.1 Introduction

Many practical problems require us to make decision about a population by


examining a sample from that population. For example, a computer centre manager
may have to decide whether or not to upgrade the capacity of his installation. In order
to arrive at a decision, we often make an assumption or guess about the nature of the
underlying population. Such an assertion, which may or may not be valid, is called a
statistical hypothesis. Procedures that enable us to decide whether to reject or accept
hypotheses, based on the information contained in a sample, are called statistical tests.

Null and Alternate Hypotheses:

The statement of agreement with conditions presumed to be true in the population of


interest will be the null hypothesis. The statement of agreement that the analyst
arrives at after systematic study will be the alternate hypothesis. Thus, the complement
of the conclusion that the analyst is seeking to reach becomes the statement of null
hypothesis. In general, the null hypothesis is set up for the purpose of being
discredited. If the test procedure leads to rejection of null hypothesis, we say that the
data is supportive of some other hypothesis, this other hypothesis is known as
alternative hypothesis.

For example, researcher wish to know is there a difference in uric acid levels of
normal individuals and individuals with mongolism. Another example could be, two
vaccines are same in effectiveness. Similarly we may be interested in testing that the
job arrival rate 𝜆 for a certain computer system satisfies 𝜆 = 𝜆0 . The term null
hypothesis is used for any hypothesis set primarily to see whether it can be rejected.
Even in non-statistical thinking this is what is done. In a court of law, an accused is
assumed to be innocent unless he is proven guilty beyond a reasonable doubt.

The experimental evidence, upon which the test is based, will consist of a random
sample 𝑋1 , 𝑋2 , … , 𝑋𝑛 of size n. The hypothesis testing procedure consists of dividing
n-space of observations into two regions, R(H0) and R(H1). If the observed vector
(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) lies in R(H1), the null hypothesis is rejected. The region R(H0) is
known as the acceptance region and R(H1) is critical region or rejection region.

Type I and Type II Errors:

To make a decision on the hypothesis with certainty we would have to examine the
entire population. In many problems the populations are very large. Hence the
decision is made on the basis of sample data. As the decision is based on sample data,
we may make a correct decision on null hypothesis or commit one of the following
two errors. We may reject the null hypothesis when in fact it is true, i.e. Type I error.
The corresponding probability is denoted by α and is known as the level of significance of the test. Similarly, we may accept the null hypothesis when in fact it is not true, i.e. a Type II error is committed. The probability of a Type II error is denoted by β. When we say that P(Type I error) = α and P(Type II error) = β, we mean that if a test is performed a large number of times, in a proportion α of them we reject H0 when it is true, and in a proportion β of them we fail to reject H0 when in fact it is false. Whenever a null hypothesis is rejected there is always a risk of committing a Type I error (rejecting a true null hypothesis). Whenever a null hypothesis is accepted, there is always a risk of committing a Type II error (accepting a false null hypothesis). An error of Type I or Type II leads to a wrong decision, so we must attempt to minimize these errors. The relationship between the Type I and Type II errors is that if one error increases, the other will decrease; hence both errors cannot be controlled simultaneously. It is customary to fix an upper bound for the probability of a Type I error (as it is more serious), and the probability of a Type II error is then minimized as far as possible. The probability of the correct decision of rejecting the null hypothesis when it is false is known as the power of the test, which is equal to (1 − β). An example where a Type I error is more serious than a Type II error: judging a good quality article to be a bad one is a Type I error, which is more serious than judging a bad article as a good one. To make a decision on a hypothesis about a population parameter with certainty, the population parameter is estimated through a statistic. If the estimated statistic and the assumed population parameter differ significantly, it means that the discrepancy between the statistic and the parameter is too large to be reasonably attributed to chance. This is a test of significance. The difference between the parameter and the statistic is known as the sampling error.

Based on the sampling error, the sampling distributions are derived. The observed
results are then compared with the results expected on the basis of the sampling

distribution. If the difference between the observed and expected results is more than
a specified quantity of the standard error of the statistic, it is said to be significant at a
specified probability level. If the difference is significant, the null hypothesis is
rejected; otherwise it is accepted. The process of deciding whether to accept or reject
the null hypothesis is known as testing of hypothesis.

The steps involved in hypothesis testing are

Defining the objective (framing the hypothesis)

Data collection

Designing the test procedure

Decision making about objective

Conclusion

But most of the researchers use packages for their analysis. In that case the steps
involved are:

Defining the objective

Data to be identified

Decide which test to use

Using appropriate package

Interpretation

P-values:

P-value for a test may be defined as the smallest value of 𝛼 for which the null
hypothesis is rejected. When you perform a hypothesis test, a P-value helps you
determine the significance of your results. A small P-value indicates strong evidence
against the null hypothesis, so you reject a null hypothesis and a significant difference
does exist. Reporting of P-values as part of the results of an investigation is more
informative to the reader than such statements as the null hypothesis is rejected at
0.05 level of significance or the results are not significant at the 0.05 level. It is the

smallest level of significance at which null hypothesis is rejected. So the smaller P-
value implies stronger evidence in favour of alternative hypothesis.

In regression analysis the P-value for each term tests the null hypothesis that the
coefficient is equal to zero (no effect). A low P-value indicates that you can reject the
null hypothesis.

Test Statistic:

The test statistic is some statistic (function of observations) that may be computed
from data of sample. The magnitude of the test statistic decides to reject or not to
reject the null hypothesis.

Test Statistic = (relevant statistic − hypothesized parameter) / (standard error of the relevant statistic).

Relevant statistic is the statistic relevant to null hypothesis (such as hypothesis for
mean, variance, proportion,....). Standard error is the standard deviation of sampling
distribution of statistic considered.

Sampling Distribution:

Sample mean, sample variance, sample proportion,.... are statistics. Distributions of


sample mean, sample variance, sample proportion,....are sampling distributions.
Example for distribution of sample mean:

Ages of children in OPD are 6, 8,10, 12, and 14.

All possible samples of size n=2 from a population of size N=5 (Sample means are in
parentheses)

Popl. Unit 6 8 10 12 14

6 6,6(6) 6,8(7) 6,10(8) 6,12(9) 6,14(10)

8 8,6(7) 8,8(8) 8,10(9) 8,12(10) 8,14(11)

10 10,6(8) 10,8(9) 10,10(10) 10,12(11) 10,14(12)

12 12,6(9) 12,8(10) 12,10(11) 12,12(12) 12,14(13)

14 14,6(10) 14,8(11) 14,10(12) 14,12(13) 14,14(14)

Sampling Distribution of 𝑥̅ computed for above data is given below.

𝑥̅ Frequency Relative Frequency

6 1 1/25

7 2 2/25

8 3 3/25

9 4 4/25

10 5 5/25

11 4 4/25

12 3 3/25

13 2 2/25

14 1 1/25

Total 25 25/25=1

This is sampling distribution of sample mean.

Population mean = (6+8+10+12+14)/5 = 10.

Population Variance = variance of 6,8,10,12,14 is 8

Mean of the sample means (x̄ values) is 10.

Variance of the sample means = 4.

Standard error = 2.
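A brute-force enumeration of all 25 samples of size 2 (an illustrative Python sketch using only the standard library) reproduces the figures above:

from itertools import product
import statistics

population = [6, 8, 10, 12, 14]
sample_means = [statistics.mean(s) for s in product(population, repeat=2)]
print(statistics.mean(sample_means))        # 10, equal to the population mean
print(statistics.pvariance(sample_means))   # 4, the variance of the sample means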

Chi-square, t and F distributions are sampling distributions of some statistics.

Degrees of Freedom:

Number of degrees of freedom is the number of independent observations in the data


or number of independent values generated by a sample.

Testing Single Population Mean:
The testing of hypothesis about a population mean is considered under three different
conditions:1) sampling from a normally distributed population of values with known
variance (large sample) 2) sampling from a normally distributed population with
unknown variance (small sample) 3) sampling from a population that is not normally
distributed.

16.2 Large Sample tests

Normally Distributed Population with known variance:


Suppose researchers are interested in the mean level of certain characteristic of a
population. The data available will be the observations made on a characteristic of n
individuals (items) from the population of interest, i.e. 𝑥1 , 𝑥2 , … , 𝑥𝑛 is available.

The assumption will be sample comes from a normally distributed values with a
known variance σ2.

The hypothesis H0, to be tested is population mean µ = µ0. The alternate hypothesis
H1 is µ ≠ µ0.

The test statistic is

Z = (x̄ − µ0) / (σ/√n)     (4.1)

The test statistic Z is normally distributed with mean 0 and variance 1, if H0 is true.

Reject H0 if the computed value of Z falls in the rejection region, otherwise accept H0.
To specify rejection region, we should know for what values of the test statistic H0
will be rejected. If the null hypothesis is false, it may be so either because true mean
is less than µ0 or is greater than µ0. Therefore sufficiently large values or sufficiently
small values of Z will cause rejection of null hypothesis. In other words extreme
values of Z results in the rejection of H0. How extreme must a possible value of the
test statistic be to qualify for the rejection region? The answer depends on the
significance level we choose, i.e. the size of the probability of committing a Type I
error. Suppose Probability of rejecting a true null hypothesis be α=0.05. Since our
rejection region consists of two parts (sufficiently small values and sufficiently large
values of the test statistic), part of α will have to be associated with the large values

and part with small values. It is reasonable to divide α equally i.e. (α/2) =0.025 be
associated with small values and (α/2)=0.025 be associated with large values. Now,
we should know what is the value of Z to the right of which lies 0.025 of the area
under the unit normal distribution.

From the standard normal tables, for 0.025 we locate zα/2 = 1.96, i.e. P(Z ≥ zα/2) = 0.025 and P(Z ≤ −zα/2) = 0.025. Thus the values Z ≤ −1.96 or Z ≥ 1.96 form the critical region.

For x̄ = 22, µ0 = 25, σ² = 45, n = 10, Z = (x̄ − µ0)/(σ/√n) = −1.41.

The calculated value of Z is greater than -1.96 but less than 1.96, the null hypothesis
is accepted implying that the computed value of the test statistic is not significant at
0.05 level.

The P-value for this test is

P(Z≤ -1.41) + P(Z ≥ 1.41) = 0.0793+0.0793 = 0.1586.


(using standard normal tables).
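A compact restatement of this two-sided test (a hedged sketch using scipy for the tail areas; the printed-table value 0.1586 differs slightly because z was there rounded to 1.41):

import math
from scipy.stats import norm

x_bar, mu0, sigma2, n = 22, 25, 45, 10
z = (x_bar - mu0) / math.sqrt(sigma2 / n)
p_value = 2 * norm.sf(abs(z))
print(z)          # -1.414
print(p_value)    # about 0.157; since -1.96 < z < 1.96, H0 is not rejected at the 0.05 level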

One sided hypothesis test: Suppose the hypothesis to be tested is H0 : µ > µ0. To specify the rejection region, we have to consider those values of Z that would cause rejection of the null hypothesis. Looking at the hypothesis, sufficiently small values of Z would cause rejection of the null hypothesis. As it is a one sided test, the whole of α = 0.05 goes into one tail of the distribution. From the normal tables, the value of Z to the left of which lies 0.05 of the area of the standard normal curve is −1.645.

Hence, reject H0 if Zcal ≤ −1.645.

The p value is P(Z ≤ −1.41) = 0.0793.

Hypothesis Testing for Difference between Two Populations Means:

The hypothesis to be tested may be one of

H0: µ1 − µ2 = 0 against H1: µ1 − µ2 ≠ 0
H0: µ1 − µ2 ≥ 0 against H1: µ1 − µ2 < 0
H0: µ1 − µ2 ≤ 0 against H1: µ1 − µ2 > 0

The test statistic to be used is (population variances known)

Z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)        (4.2)

If the population variances are not known but are assumed equal,

Z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(Sp²/n1 + Sp²/n2)        (4.3)

where the pooled estimate of the common variance is

Sp² = ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)        (4.4)
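A minimal Python sketch (added here, not part of the original text) of the two-sample statistics (4.2)-(4.4). The function names are illustrative, SciPy is assumed available, and the numbers in the final lines are taken from Example 6 later in this unit, treating the sample variances as if they were the known population variances in (4.2).

from math import sqrt
from scipy.stats import norm

def z_known_variance(x1_bar, x2_bar, var1, var2, n1, n2, delta0=0.0):
    # Equation (4.2): population variances known.
    return ((x1_bar - x2_bar) - delta0) / sqrt(var1 / n1 + var2 / n2)

def z_pooled(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2, delta0=0.0):
    # Equations (4.3)-(4.4): unknown but equal variances, pooled estimate Sp^2.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return ((x1_bar - x2_bar) - delta0) / sqrt(sp_sq / n1 + sp_sq / n2)

# Brand A vs brand B flashlight batteries (Example 6):
z = z_known_variance(36.5, 36.8, 1.8 ** 2, 1.5 ** 2, 100, 80)
print(f"Z = {z:.4f}, two-sided p-value = {2 * norm.sf(abs(z)):.4f}")   # Z ≈ -1.22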

Sampling From a Population that is Not Normally Distributed:

If the sample on which we base our hypothesis test about a population mean comes from a population that is not normally distributed, then by the central limit theorem (for large n) we can use

Z = (x̄ − µ0) / (σ/√n)   if the population variance is known,
Z = (x̄ − µ0) / (S/√n)   otherwise.

16.2 Small sample tests


t-distribution:

If the sample size n is small and the population variance is unknown, the statistic (x̄ − µ)/(S/√n) is no longer approximately normal, and consequently the above tests cannot be used. To deal with small samples, exact sample tests have been developed. From a practical point of view, a sample with n ≤ 30 is termed a small sample.

If 𝑥1 , 𝑥2 , … , 𝑥𝑛 is a random sample of size n from a normal population with mean 𝜇


and variance 𝜎 2 , then the statistic t is defined as

t = (x̄ − µ) / (S/√n) = (x̄ − µ) / √(S²/n),        (4.5)

where S² = (1/(n − 1)) Σ(x − x̄)² is an unbiased estimate of σ².

t defined in (4.5) has Student's t-distribution with ν = (n − 1) d.f. and probability density function (pdf)

f(t) = [1 / (√ν · B(1/2, ν/2))] · (1 + t²/ν)^(−(ν+1)/2),    −∞ < t < ∞.

Let P(t > 𝑡𝜈 (𝛼)) = 𝛼 (4.6)

The value of 𝑡𝜈 (𝛼) defined in (4.6) is called upper critical value of t for 𝜈 d.f.
corresponding to significance level 𝛼.

The two tailed critical values of t for 𝜈 d.f. corresponding to significance level 𝛼 with
equal tails, each of area (𝛼/2) are given by 𝑡𝜈 (𝛼/2) (positive critical value) and
−𝑡𝜈 (𝛼/2) (negative critical value). These critical values of t have been tabulated for
different values of 𝛼 and 𝜈 and are given at the end of this section. t test can be used
to

i. test the significance of mean, population variance being unknown.

ii. test the significance of difference between two sample means,

population variance being equal but unknown.

Sampling from a normally distributed population (Variance Unknown):

When the variance of the population is not known, the test statistic used to test H0: µ = µ0 is

t = (x̄ − µ0) / (S/√n)        (4.7)

which follows student’s t distribution with (n-1) degrees of freedom.

The other decision rules remain the same as before.

Paired Comparisons

The difference in two population means was studied when the two samples are independent. A method frequently employed for assessing the effectiveness of a treatment or experimental procedure is one that makes use of related observations resulting from non-independent samples. A hypothesis test based on this type of data is known as a paired comparison test.

Related or paired observations may be obtained in a number of ways. The same


subjects may be measured before and after receiving some treatments. Pairs of twins
or siblings may be randomly assigned to two treatments in such a way that members
of single pair receive different treatments. In comparing two methods of analysis, the
material to be analyzed may be equally divided so that one half is analyzed by one
method and the other half is analyzed by the other. Here instead of performing the
analysis with individual observations, we use the difference between individual pairs
of observation as the variable of interest.

The hypothesis about the population mean difference µd is tested using

t = (d̄ − µd) / S_d̄        (4.8)

where d̄ is the mean of the differences of the paired observations, S_d̄ = S_d/√n, and S_d is the standard deviation of the differences. When H0 is true, the test statistic is distributed as Student's t with (n − 1) d.f.

If the population variance of the differences is known, the test statistic is

Z = (d̄ − µd) / (σd/√n)        (4.9)

t-test for Difference of Means

Let x1, x2, …, xn and y1, y2, …, ym be two independent random samples from normal populations. The hypothesis to be tested is that the samples have been drawn from normal populations with the same mean, i.e. that the sample means x̄ and ȳ do not differ significantly, under the assumption that the population variances are equal but unknown. The test statistic given in (4.10) has a t-distribution with (n + m − 2) d.f.

t = (x̄ − ȳ) / (S √(1/n + 1/m))        (4.10)

where S² = (1/(m + n − 2)) [Σ(x − x̄)² + Σ(y − ȳ)²]        (4.11)

is an estimate of the common population variance 𝜎 2 based on both samples.

16.3 Testing Single population Variance

When the data available for analysis consists of a simple random sample drawn from a normally distributed population, the test statistic for testing

H0: σ² = σ0²,   H1: σ² ≠ σ0²

is

Y = (n − 1)S² / σ0²        (4.12)

where n is the sample size and S² is the sample variance. The above statistic is distributed as χ² (chi-square) with (n − 1) degrees of freedom.

Hypothesis testing for ratio of two population variances

Suppose the data constitute two independent random samples each drawn from a
normally distributed population. The hypothesis to be tested is

𝐻0 : 𝜎12 ≤ 𝜎22 , 𝐻1 : 𝜎12 > 𝜎22

The test statistic would be

V = S1² / S2²        (4.13)

where S1² and S2² are the sample variances of the two samples. When the null hypothesis is true, V is distributed as F with (n1 − 1) and (n2 − 1) degrees of freedom.
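As a hedged illustration (not part of the original text), the test in (4.13) can be carried out with SciPy's F distribution; the function name and the sample variances and sample sizes below are made-up inputs for the sketch.

from scipy.stats import f

def f_test_variances(s1_sq, s2_sq, n1, n2, alpha=0.05):
    # Test H0: sigma1^2 <= sigma2^2 against H1: sigma1^2 > sigma2^2 using (4.13).
    v = s1_sq / s2_sq                           # test statistic V
    f_crit = f.ppf(1 - alpha, n1 - 1, n2 - 1)   # upper critical value of F
    return v, f_crit, v > f_crit                # reject H0 if V exceeds the critical value

# Illustrative (hypothetical) summary statistics:
v, f_crit, reject = f_test_variances(s1_sq=12.0, s2_sq=5.0, n1=16, n2=21)
print(f"V = {v:.2f}, F critical (15, 20 d.f.) = {f_crit:.2f}, reject H0: {reject}")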

Example 1

Milk sachets of 500ml. each were filled by a machine for which standard deviation of
filling is 5 ml. 72 sachets are tested for their contents and the mean content is found to
be 501.1 ml. Test whether the machine is set properly.

Solution

𝜇0 = 500, 𝜎 = 5, 𝑥̅ = 501.1, 𝑛 = 72, let 𝛼 = 0.05

𝐻0 : 𝜇 = 500 (machine is set properly)

𝐻1 : 𝜇 ≠ 500 (machine is not set properly)

Under H0, the test statistic (as given in (4.1)) is N(0, 1).

From given information, using (4.1), 𝑍𝑐𝑎𝑙 is 1.87.

From standard normal tables, at 5% level of significance, the critical values are
−𝑘𝛼/2 = −1.96 𝑎𝑛𝑑 𝑘𝛼/2 = 1.96 (two sided test). 𝑍𝑐𝑎𝑙 ∈ (−1.96 , 1.96). Hence,
accept 𝐻0 that is machine is set properly.

Example 2

A firm manufactures resistors. The standard deviation of their resistance is known to


be 0.02 ohms. It is required to test whether their mean resistance is 1.4 ohms. A random sample consisting of 64 resistors has a mean of 1.39 ohms. Based on this sample, can we conclude that the mean resistance of the whole lot is 1.4 ohms?

Solution

𝜇0 = 1.4, 𝜎 = 0.02, 𝑥̅ = 1.39, 𝑛 = 64, let 𝛼 = 0.05

𝐻0 : 𝜇 = 1.4 and 𝐻1 : 𝜇 ≠ 1.4 .

Under 𝐻0 , the test statistic (as given in (4.1)) is N (0,1).

From given information, using (4.1), 𝑍𝑐𝑎𝑙 is - 4.

𝑍𝑐𝑎𝑙 ∉ (−1.96 , 1.96) , hence 𝐻0 is rejected.

Mean resistance of the resistors is not equal to 1.4 ohms.

Example 3

For a population, it is believed that the average height is greater than 180 cms, with a standard deviation of 3.3 cms. Randomly, 50 individuals were selected and their heights measured. The average height is found to be 181.1 cms. Test the belief regarding the height of the population at 1% significance level.

Solution

𝜇0 = 180, 𝜎 = 3.3, 𝑥̅ = 181.1, 𝑛 = 50, 𝛼 = 0.01

𝐻0 : 𝜇 = 180 and 𝐻1 : 𝜇 > 180

Under 𝐻0 , the test statistic (as given in (4.1)) is N(0,1).

At 1% significance level, the critical region is Z > 2.33.

𝑍𝑐𝑎𝑙 = 2.36> 2.33, 𝐻0 is rejected.

It is concluded that the data support the belief that the average height is greater than 180 cms.

Example 4

A shop sells on an average 200 pen drives per day with a standard deviation of 50 pen
drives. After an extensive advertising campaign, the management computed the
average sales for next 25 days and is found to be 216. Find whether an improvement
has occurred or not, assuming normal distribution.

Solution

𝜇0 = 200, 𝜎 = 50, 𝑥̅ = 216, 𝑛 = 25, 𝑙𝑒𝑡𝛼 = 0.05

𝐻0 : 𝜇 = 200,𝐻1 : 𝜇 > 200.

Under H0, the test statistic (given in (4.1)) has a N(0, 1) distribution.

Z_cal = 1.6, and the critical value at the 5% level is 1.645; since Z_cal < 1.645, H0 is accepted.

The conclusion is that the sample does not support an improvement in the sales of pen drives due to the advertisement.

Example 5

An insurance agent claims that the average age of policy holders who insure through
him is less than 30.5 years. A random sample of 100 policy holders who had insured
through him had the average age of 28.8 years and standard deviation of 6.35 years.
Test whether the claim of the agent is true at 5% significance level.

Solution

𝜇0 = 30.5, 𝑠 = 6.35, 𝑥̅ = 28.8, 𝑛 = 100, 𝑙𝑒𝑡𝛼 = 0.05.

Since the sample size is large (> 30), we may use the value of the sample standard deviation as the value of σ.

𝐻0 : 𝜇 = 30.5,𝐻1 : 𝜇 < 30.5

Hence, substituting the values of µ0, s, x̄ and n in (4.1), we get Z_cal = −2.68, so that |Z_cal| = 2.68 > 1.645. Hence H0 is rejected, and it is concluded that the average age of policy holders who insure through him is less than 30.5 years.

Example 6

An investigation of the relative merits of two kinds of flash light batteries showed that
a random sample of 100 batteries of brand A lasted on the average 36.5 hours with a
standard deviation of 1.8 hours, while a random sample of 80 batteries of brand B
lasted on the average 36.8 hours with a standard deviation of 1.5 hours. Test whether
the observed difference between the average life times is significant.

Solution

𝑥̅𝐴 = 36.5, 𝑥̅𝐵 = 36.8, 𝑆𝐴 = 1.8, 𝑆𝐵 = 1.5, 𝑛𝐴 = 100, 𝑛𝐵 = 80,

𝑙𝑒𝑡𝛼 = 0.05. 𝑆𝐴2 = 3.24, 𝑆𝐵2 = 2.25.

𝐻0 : 𝜇𝐴 = 𝜇𝐵 , 𝐻1 : 𝜇𝐴 ≠ 𝜇𝐵

Under the null hypothesis, the test statistic for the difference of two means has a N(0, 1) distribution. Using the sample variances in place of the population variances in (4.2) (the samples are large),

Z_cal = −1.2195, so that |Z_cal| = 1.2195 < 1.96.

Hence, 𝐻0 is accepted and the data do not provide any evidence against null
hypothesis.

Example 7

The mean yield of wheat from district A was 210kgs with standard deviation of 10kgs
per hectare from sample of 100 plots. In another district B the yield was 220kgs with
standard deviation of 12kgs per hectare from a sample of 150 plots. Assuming that the
standard deviation of the entire state was 11kgs, test whether the mean yields of
districts A and B differ significantly.

Solution

𝑥̅𝐴 = 210, 𝑥̅𝐵 = 220, 𝑆𝐴 = 10, 𝑆𝐵 = 12 , 𝑛𝐴 = 100, 𝑛𝐵 = 150,

𝑙𝑒𝑡𝛼 = 0.01. 𝜎𝐴 = 𝜎𝐵 = 𝜎 = 11

𝐻0 : 𝜇𝐴 = 𝜇𝐵 𝐻1 : 𝜇𝐴 ≠ 𝜇𝐵

Under the null hypothesis, the test statistic defined in (4.2) (with σA = σB = 11) has a N(0, 1) distribution, and for the given values

Z_cal = −7.05, so that |Z_cal| = 7.05 > 2.58.

Hence H0 is rejected: the data provide evidence against the null hypothesis, and the mean yields of the two districts differ significantly.

Example 8

A machine is designed to produce insulating washers for electrical devices of average


thickness of 0.025 cm. A random sample of 10 washers was found to have an average
thickness of 0.024 cm with a standard deviation of 0.002 cm. Test the significance of
the deviation. Value of t for 9 d.f. at 5% significance level is 2.262.

Solution

n = 10, 𝑥̅ = 0.024, s = 0.002.

H0 :𝜇 = 0.025, H1 : 𝜇 ≠ 0.025

Under H0, the test statistic given in (4.5) has a t-distribution with 9 d.f. For the given values, t_cal = −1.5.

H0 is not rejected since |t_cal| < 2.262; it is concluded that the sample mean does not differ significantly from the population mean.
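For readers who want to check such calculations numerically, the following is a minimal Python sketch (not part of the original text). Note that if s = 0.002 is taken as the unbiased standard deviation, (4.7) gives t ≈ −1.58; the value −1.5 in the solution corresponds to dividing by s/√(n − 1), i.e. treating 0.002 as the divisor-n standard deviation. Either way |t| is below the critical value 2.262.

from math import sqrt
from scipy.stats import t as t_dist

n, x_bar, s, mu0 = 10, 0.024, 0.002, 0.025
t_stat = (x_bar - mu0) / (s / sqrt(n))          # statistic (4.7), s taken as the unbiased S
t_crit = t_dist.ppf(1 - 0.05 / 2, df=n - 1)     # two-sided 5% critical value, 9 d.f. (2.262)
print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}")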

Example 9

A group of 5 patients treated with medicine A weigh 42, 39, 48, 60 and 41 kgs.
Another group of 7 patients of same hospital treated with medicine B weigh 38, 42,
56, 64, 68, 69 and 62 kgs. Test whether medicine B increases weight significantly?
The table value of t at 5% significance level for 10 d.f. is 2.2281.

Solution

H0 :𝜇𝐴 = 𝜇𝐵 , H1 : 𝜇𝐴 < 𝜇𝐵

From the data we get 𝑥̅𝐴 = 46, 𝑦̅𝐵 = 57, 𝑛 = 5, 𝑚 = 7, using (4.11) 𝑆 2 = 121.6.

Substituting these in (4.10), we get t_cal = −1.7. The tabulated value of t at 5% significance level for 10 d.f. is 1.81 (left-tailed test). Since |t_cal| < 1.81, H0 is accepted; it is concluded that the two medicines do not differ significantly as regards increase in weight.
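The same conclusion can be checked with SciPy's pooled-variance two-sample t test. This is a minimal sketch, not part of the original text; the keyword alternative='less' requires a reasonably recent version of SciPy.

from scipy.stats import ttest_ind

weights_A = [42, 39, 48, 60, 41]               # medicine A
weights_B = [38, 42, 56, 64, 68, 69, 62]       # medicine B

# equal_var=True matches the pooled-variance statistic (4.10)-(4.11)
res = ttest_ind(weights_A, weights_B, equal_var=True, alternative='less')
print(f"t = {res.statistic:.2f}, one-sided p-value = {res.pvalue:.3f}")   # t ≈ -1.70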

Example 10

An IQ test was administered to 5 persons before and after they were trained. The
results are given below.

Candidates: 1 2 3 4 5

IQ before Tr: 110 120 123 132 125

IQ after Tr: 120 118 125 136 121

Test whether there is any change in IQ after the training.

Solution

H0 :𝜇𝑥 = 𝜇𝑦 , H1 : 𝜇𝑥 ≠ 𝜇𝑦

The values of d (IQ before − IQ after) for the 5 candidates are:

d: −10  2  −2  −4  4

d̄ = −2, Σd² = 140, hence S² = (Σd² − n d̄²)/(n − 1) = (140 − 20)/4 = 30.

Using (4.8), |t_cal| = |d̄| / √(S²/n) = 2/√6 = 0.816.

The tabulated value of t for 4 d.f. at 1% level of significance for a two-tailed test is 4.60. Since |t_cal| < 4.60, H0 is accepted, and it is concluded that the data do not support the hypothesis of a change in IQ due to the training.
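A minimal sketch (not part of the original text) of the same paired comparison using SciPy; ttest_rel works on the pairwise differences exactly as in (4.8).

from scipy.stats import ttest_rel

iq_before = [110, 120, 123, 132, 125]
iq_after  = [120, 118, 125, 136, 121]

res = ttest_rel(iq_before, iq_after)     # two-sided test of zero mean difference
print(f"t = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")   # |t| ≈ 0.816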

16.4 Tests based on Chi-square distribution

Chi-square Distribution

The square of a standard normal variable is called a chi-square variate with 1 d.f. Thus if X is a r.v. having a normal distribution with mean µ and variance σ², then ((X − µ)/σ)² is a chi-square variate with 1 d.f. If x1, x2, …, xn are n independent normal r.v.s with means µ1, µ2, …, µn and standard deviations σ1, σ2, …, σn respectively, then the variate

χ² = ((x1 − µ1)/σ1)² + ((x2 − µ2)/σ2)² + … + ((xn − µn)/σn)²

follows the χ² distribution with n d.f.

𝜒 2 – test for Population Variance:

Suppose we want to test the hypothesis that a given normal population has a specified variance σ² = σ0². If x1, x2, …, xn is a random sample of size n from the normal population, then under the null hypothesis H0: σ² = σ0² the statistic

χ² = Σ(xi − x̄)² / σ0² = nS² / σ0²        (4.14)

follows 𝜒 2 – distribution with (n – 1) d.f.

By comparing the value of 𝜒 2 obtained from (4.14) with the tabulated value of 𝜒 2
with (n-1) d.f. at desired level of significance, the null hypothesis is accepted or
rejected.

Note that for large n (n > 30), the statistic Z = √(2χ²) − √(2n − 1) is approximately N(0, 1), and this normal approximation may be used when tabulated χ² values are not available.

Example 11

A sample of 20 observations gave a standard deviation of 3.72. Is this compatible with


the hypothesis that the sample is from a normal population with variance 4.35?

Solution

n=20, S=3.72, 𝜎 2 = 4.35

𝐻0 : 𝜎 2 = 4.35 , 𝐻1 : 𝜎 2 ≠ 4.35

χ²_cal = nS² / σ0² = 63.62, which under H0 follows the χ² distribution with 19 d.f.

The tabulated value of χ² with 19 d.f. at 5% level of significance = 30.144.

As χ²_cal > 30.144, H0 is rejected and the sample is not from a normal population with variance 4.35.

Example 12

Weights in kilograms of 10 students are given below.

38, 40, 45, 53, 47, 43, 55, 48, 52, 49

Can we say that the distribution of the sample observations is normal with variance 20?

𝐻0 : 𝜎 2 = 20 , 𝐻1 : 𝜎 2 ≠ 20

𝑥̅ = 47 , 𝑛𝑆 2 = 280, 𝜎02 = 20

Using (4.14), χ²_cal = 280/20 = 14.

The χ² table value for 9 d.f. at 5% significance level is 16.92. Since χ²_cal is less than the tabulated value, the null hypothesis that the population variance is 20 is accepted.
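A minimal Python sketch (not part of the original text) of the variance test (4.14) applied to the weights in Example 12:

from scipy.stats import chi2

weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
sigma0_sq = 20
n = len(weights)
x_bar = sum(weights) / n
chi_sq = sum((x - x_bar) ** 2 for x in weights) / sigma0_sq    # n*S^2 / sigma0^2 = 14
chi_crit = chi2.ppf(0.95, df=n - 1)                            # upper 5% point for 9 d.f. (16.92)
print(f"chi-square = {chi_sq:.1f}, critical value = {chi_crit:.2f}")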

χ²-test for Independence of Attributes:

Let us suppose that the given population consists of N items divided into r mutually exclusive classes A1, A2, …, Ar with respect to attribute A, so that an item selected at random possesses one and only one of the attributes A1, A2, …, Ar.
Similarly, let us suppose that the same population is divided into s mutually disjoint
and exhaustive classes 𝐵1 , 𝐵2 , … , 𝐵𝑠 with respect to another attribute B so that an item
selected at random possesses one and only one of the attributes 𝐵1 , 𝐵2 , … , 𝐵𝑠 . The
frequency distribution of the items belonging to the classes 𝐴1 , 𝐴2 , … , 𝐴𝑟 and
𝐵1 , 𝐵2 , … , 𝐵𝑠 is given by r X s contingency table given in Table 4.1.

Table 4.1: r X s contingency table

A \ B    B1    B2    …    Bj    …    Bs    Total
𝐴1 (𝐴1 , 𝐵1 ) (𝐴1 , 𝐵2 ) … (𝐴1 , 𝐵𝑗 ) … (𝐴1 , 𝐵𝑠 ) (𝐴1 )
𝐴2 (𝐴2 , 𝐵1 ) (𝐴2 , 𝐵2 ) … (𝐴2 , 𝐵𝑗 ) … (𝐴2 , 𝐵𝑠 ) (𝐴2 )
… … … … … … … …
𝐴𝑖 (𝐴𝑖 , 𝐵1 ) (𝐴𝑖 , 𝐵2 ) … (𝐴𝑖 , 𝐵𝑗 ) … (𝐴𝑖 , 𝐵𝑠 ) (𝐴𝑖 )
… … … … … … … …
𝐴𝑟 (𝐴𝑟 , 𝐵1 ) (𝐴𝑟 , 𝐵2 ) … (𝐴𝑟 , 𝐵𝑗 ) … (𝐴𝑟 , 𝐵𝑠 ) (𝐴𝑟 )
Total (𝐵1 ) (𝐵2 ) … (𝐵𝑗 ) … (𝐵𝑠 ) N

Under the null hypothesis, that the two attributes A and B are independent, the
expected frequency for (𝐴𝑖 , 𝐵𝑗 ) is given by

E[(Ai, Bj)] = N · P(Ai, Bj) = N · ((Ai)/N) · ((Bj)/N) = (Ai)(Bj)/N        (4.15)

= (Ai, Bj)0 (say),    i = 1, 2, …, r;  j = 1, 2, …, s.

The statistic

χ² = Σi Σj [((Ai, Bj) − (Ai, Bj)0)² / (Ai, Bj)0]        (4.16)

has 𝜒 2 -distribution with ((r-1)(s-1)) d.f.

Comparing this calculated value of 𝜒 2 with tabulated values of 𝜒 2 for ((r-1)(s-1)) d.f.
at desired level of significance, the null hypothesis is rejected or accepted.

2 × 2 Contingency Table

Table 4.2: 2 × 2 contingency table

Attribute A \ Attribute B    B1       B2       Total
A1                            a        b        (a+b)
A2                            c        d        (c+d)
Total                        (a+c)    (b+d)    N = a+b+c+d

Under the null hypothesis of independence of attributes, the value of χ² for the 2 × 2 contingency table given in Table 4.2 is given by

χ² = N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]        (4.17)

If any cell frequency is less than 5, then for the 2 × 2 contingency table the Yates correction is used, and with this

χ² = N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)]        (4.18)

Example 13

A sample of 400 under-graduate (UG) and 400 post-graduate (PG) students was taken to know their opinion about autonomous colleges. 290 of the under-graduate and 310 of the post-graduate students favoured the autonomous status. Test whether opinion about autonomous status and education level of students are independent at 5% significance level.

Solution

The given information can be tabulated as

Class \ Opinion    Favouring    Opposing    Total
UG                    290          110        400
PG                    310           90        400
Total                 600          200        800

H0: Opinion about autonomous status and education level of students are independent.

H1: Opinion about autonomous status and education level of students are not independent.

Under null hypothesis, the expected frequencies are:

E(290) = (600 X 400)/800 = 300, E(110) = (200 X 400)/800 = 100,

E(310) = (600 X 400)/800 = 300, E(90) = (200 X 400)/800 = 100,

Using (4.16), χ²_cal = (290 − 300)²/300 + (310 − 300)²/300 + (110 − 100)²/100 + (90 − 100)²/100 = 2.67

d.f.= (2-1)(2-1) = 1

Critical value of 𝜒 2 for 1 d.f. at 5% significance level = 3.84.

Since χ²_cal < 3.84, H0 is accepted: opinion about autonomous status and education level are independent.
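A minimal sketch (not part of the original text) of the same test using SciPy; correction=False turns off the Yates continuity correction so that the result matches the uncorrected statistic (4.16) used above.

from scipy.stats import chi2_contingency

observed = [[290, 110],    # UG: favouring, opposing
            [310,  90]]    # PG: favouring, opposing

chi_sq, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi_sq:.2f}, d.f. = {dof}, p-value = {p_value:.3f}")
print(expected)    # expected frequencies [[300, 100], [300, 100]]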

Example 14

A movie producer takes a random survey from 1000 persons attending the pre-view of
the movie and obtains the following figures.

Liking \ Age-group    Under 20    20-39    40-59    60+     Total
Liked                    320          80      110     200      710
Disliked                  50          15       70      60      195
Indifferent               30           5       20      40       95
Total                    400         100      200     300     1000

Find whether age and liking of the movie are associated.

Solution

H0: Age does not influence the liking of the movie.

H1: Age influences the liking of the movie.

The expected frequencies are obtained as

E(320) = (400)(710)/1000, E(80) = (100)(710)/1000, …,

E(40) = (300)(95)/1000.

The expected frequencies are tabulated as follows.

Liking \ Age-group    Under 20    20-39    40-59    60+     Total
Liked                    284          71      142     213      710
Disliked                  78        19.5       39    58.5      195
Indifferent               38         9.5       19    28.5       95
Total                    400         100      200     300     1000

Using (4.16), χ²_cal = 57.987, d.f. = (3 − 1)(4 − 1) = 6.

The critical value of χ² for 6 d.f. at 5% significance level = 12.592.

Since χ²_cal > 12.592, H0 is rejected: age and liking of the movie are not independent, i.e. they are associated.

16.5 Introduction to Monte Carlo methods

Monte Carlo methods, or Monte Carlo experiments, are a broad class


of computational algorithms that rely on repeated random sampling to obtain
numerical results. The underlying concept is to use randomness to solve problems that
might be deterministic in principle. They are often used
in physical and mathematical problems and are most useful when it is difficult or
impossible to use other approaches. Monte Carlo methods are mainly used in three
problem classes: optimization, numerical integration, and generating draws from
a probability distribution. In principle, Monte Carlo methods can be used to solve any
problem having a probabilistic interpretation, to obtain the statistical properties of
some phenomenon (or behaviour). By the law of large numbers, integrals described
by the expected value of some random variable can be approximated by taking
the empirical mean of independent samples of the variable.

Monte Carlo methods vary, but tend to follow a particular pattern:

1. Define a domain of possible inputs


2. Generate inputs randomly from a probability distribution over the domain
3. Perform a deterministic computation on the inputs
4. Aggregate the results
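As a small illustration of this pattern (a sketch added here, not part of the original text), the upper-tail probability P(Z > 1.96) of a standard normal variable can be estimated by repeated random sampling and compared with the table value 0.025:

import random

random.seed(0)
n_draws = 1_000_000                                # steps 1-2: domain and random inputs
hits = sum(1 for _ in range(n_draws)
           if random.gauss(0.0, 1.0) > 1.96)       # step 3: deterministic check on each draw
estimate = hits / n_draws                          # step 4: aggregate into an empirical mean
print(f"Monte Carlo estimate of P(Z > 1.96): {estimate:.4f}   (exact value ≈ 0.0250)")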

Uses of Monte Carlo methods require large amounts of random numbers.

Areas of application include (but are not limited to) the following:

Computer graphics

Path tracing, occasionally referred to as Monte Carlo ray tracing, renders a 3D scene
by randomly tracing samples of possible light paths.

In applied statistics, Monte Carlo methods may be used for the following purposes:

To compare competing statistics for small samples under realistic data conditions.

To provide a random sample from the posterior distribution in Bayesian inference; this sample then approximates and summarizes all essential features of the posterior.

16.6 Summary
The following worked problems summarize the tests covered in this unit.
1. Milk sachets of 500ml. each were filled by a machine for which standard
deviation of filling is 5 ml. 72 sachets are tested for their contents and the mean
content is found to be 501.1 ml. Test whether the machine is set properly.
2. A firm manufactures resistors. The standard deviation of their resistance is known
to be 0.02 ohms. It is required to test whether their mean resistance is 1.4 ohms. A
random sample consisting of 64 resisters have a mean of 1.39 ohms. Based on this
sample can we conclude that the mean resistance of whole lot is 1.4 ohms?
3. For a population, it is believed that the average height is greater than 180cms. with
standard deviation of 3.3cms. Randomly, 50 individuals were selected and their
heights are measured. The average height is found to be 181.1cms. Test the belief
regarding height of population at 1% significance level.
4. A shop sells on an average 200 pen drives per day with a standard deviation of 50
pen drives. After an extensive advertising campaign, the management computed
the average sales for next 25 days and is found to be 216. Find whether an
improvement has occurred or not, assuming normal distribution.
5. An insurance agent claims that the average age of policy holders who insure
through him is less than 30.5 years. A random sample of 100 policy holders who
had insured through him had the average age of 28.8 years and standard deviation
of 6.35 years. Test whether the claim of the agent is true at 5% significance level.

16.7 Keywords

Hypothesis
Variance
Degrees of freedom
Probabilistic interpretation

16.8 Questions for Self Study

1. Define with examples: i. null and alternative hypotheses ii. Type-I and Type-II errors.

2. Outline a procedure of testing for population variance.

3. Describe i. p – values ii. Test statistic iii. sampling distribution

4. Define the t-distribution. Outline a test for testing the significance of the difference between two sample means.

5. a) Outline the procedures for paired comparisons.


b) The standard pain reliever is known to bring relief in an average of 4.5
minutes with standard deviation 2.5 minutes. To test whether the new
pain reliever works more quickly than the standard one, 50 patients
with minor surgeries were given the new pain reliever and their times
to relief were recorded. The experiment yielded sample mean of 4.1
minutes with sample standard deviation of 2.0 minutes. Test the
hypothesis that newly developed pain reliever acts more quickly.

6. a) The recommended daily intake of iron for females aged 19-50 is


18mg/day. A measurement of daily iron intake of 15 women yielded a
mean daily intake of 16.5 mg/day with sample standard deviation of
4.7 mg/day. Test whether the actual mean daily intake for all women is
different from 18mg/day at 5% level of significance.
b) Describe Chi-square test for association of attributes.

16.9 References

1. Chandrashekhar, K. S. (2009). Engineering Mathematics IV. Sudha Publications, Bangalore.

2. Gupta, S. C. (2021). Fundamentals of Statistics (Seventh Edition). Himalaya Publishing House.

3. Montgomery, D. C. and Runger, G. C. (2014). Applied Statistics and Probability for Engineers (Sixth Edition). John Wiley and Sons, Singapore.

4. Rohatgi, V. K. and Saleh, A. K. Md. Ehsanes (2001). An Introduction to Probability and Statistics (Second Edition). John Wiley & Sons, Inc.

5. Trivedi, K. S. (1997). Probability and Statistics with Reliability, Queuing, and Computer Science Applications. Prentice Hall of India, New Delhi.

STATISTICAL TABLES

Table ST2. Tail Probability Under Standard Normal Distributionᵃ

z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09

0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 391.
ᵃ This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, P{Z > z} = α. The probabilities for negative values of z are obtained by symmetry.
Table ST3. Critical Values Under Chi-Square Distributionᵃ

Degrees of Freedom    α:  0.99   0.98   0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.02   0.01

1 0.000157 0.000628 0.00393 0.0158 0.0642 0.148 0.455 1.074 1.642 2.706 3.841 5.412 6.635
2 0.0201 0.0404 0.103 0.211 0.446 0.713 1.386 2.408 3.219 4.605 5.991 7.824 9.210
3 0.115 0.185 0.352 0.584 1.005 1.424 2.366 3.665 4.642 6.251 7.815 9.837 11.341
4 0.297 0.429 0.711 1.064 1.649 2.195 3.357 4.878 5.989 7.779 9.488 11.668 13.277
5 0.554 0.752 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 11.070 13.388 15.086
6 0.872 1.134 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 12.592 15.033 16.812
7 1.239 1.564 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 14.067 16.622 18.475
8 1.646 2.032 2.733 3.490 4.594 5.527 7.344 9.524 11.030 13.362 15.507 18.168 20.090
9 2.088 2.532 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 16.919 19.679 21.666
10 2.558 3.059 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 18.307 21.161 23.209
11 3.053 3.609 4.575 5.578 6.989 8.148 10.341 12.899 14.631 17.275 19.675 22.618 24.725
12 3.571 4.178 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 21.026 24.054 26.217
13 4.107 4.765 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 22.362 25.472 27.688
14 4.660 5.368 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 23.685 26.873 29.141
15 5.229 5.985 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 24.996 28.259 30.578
16 5.812 6.614 7.962 9.312 11.152 12.624 15.338 18.418 20.465 23.542 26.296 29.633 32.000
17 6.408 7.255 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.669 27.587 30.995 33.409
18 7.015 7.906 9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 28.869 32.346 34.805
19 7.633 8.567 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 30.144 33.687 36.191
20 8.260 9.237 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 31.410 35.020 37.566
21 8.897 9.915 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 32.671 36.343 38.932
22 9.542 10.600 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 33.924 37.659 40.289
23 10.196 11.293 13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 35.172 38.968 41.638
24 10.856 11.992 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 36.415 40.270 42.980
25 11.524 12.697 14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 37.652 41.566 44.314
26 12.198 13.409 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 38.885 42.856 45.642
27 12.879 14.125 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 40.113 44.140 46.963
28 13.565 14.847 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 41.337 45.419 48.278
29 14.256 15.574 17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 42.557 46.693 49.588
30 14.953 16.306 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 43.773 47.962 50.892

Source: Reproduced from Statistical Methods for Research Workers, 14th ed., 1972, with the permission of the estate of R. A. Fisher, and Hafner Press.
ᵃ For degrees of freedom greater than 30, the expression √(2χ²) − √(2n − 1) may be used as a normal deviate with unit variance, where n is the number of degrees of freedom.

Table ST4. Student's t-Distributionᵃ

n 0.10 0.05 0.025 0.01 0.005

1 3.078 6.314 12.706 31.821 63.657


2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576

Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. Reprinted by permission of John Wiley & Sons, Inc.
ᵃ The first column lists the number of degrees of freedom (n). The headings of the other columns give probabilities (α) for t to exceed the entry value. Use symmetry for negative t values.
Table ST5. F-Distribution: 5% (lightface type) and 1% (boldface type) points for the distribution of F, indexed by the degrees of freedom for the numerator (m) and for the denominator (n).

[The body of this table did not reproduce legibly in this copy; for the critical values, consult the source below.]

Source: Reprinted by permission from George W. Snedecor and William G. Cochran, Statistical Methods, 6th ed., © 1967 by Iowa State University Press, Ames, Iowa.
