MCSDSC-2.1 Linear Algebra & Statistics
Mukthagangotri, Mysore-570006
Course Name: Linear Algebra & Statistics Credit: 4 Unit No: 1-16
Editorial Committee
Dr. Mahesha D.M., MCA.,PhD Chairman
BOS Chairman,
Assistant Professor & Programme co-ordinator(PG)
DoS&R in Computer Science,
Karnataka State Open University, Mysuru-570 006.
Copy Right
Registrar,
Karnataka State Open University,
Mukthagangotri, Mysore-570 006.
Printed and Published on behalf of Karnataka State Open University, Mysore-570 006 by
the Registrar (Administration)-2023
TABLE OF CONTENTS
• PREFACE
• PRELIMINARIES
1. 0. Objectives
1. 1. Introduction
1. 2. Vector Spaces
1. 3. Subspaces
1. 4. Linear Combinations and Systems of Linear equation
1. 5. Linearly Dependence and Linearly Independence
1. 6. Bases and Dimension
1. 7. Maximal Linearly Independent Subsets
1. 8. Summary
1. 9. Keywords
1. 10. Assessment Questions
1. 11. References
2. 0. Objectives
2. 1. Introduction
2. 2. Linear Transformation
2. 3. The Matrix Representation of a Linear Transformation
2. 4. Composition of a Linear Transformation and Matrix Multiplication
2. 5. Invertibility and Isomorphism
2. 6. The Change of Coordinate Matrix
2. 7. The Dual Space
2. 8. Summary
2. 9. Keywords
2. 10. Assessment Questions
2. 11. References
3. 0. Objectives
3. 1. Introduction
3. 2. Elementary Matrix Operations and Elementary Matrices
3. 3. The Rank of a Matrix and Matrix Inverses
3. 4. System of linear equations
3. 5. Summary
3. 6. Keywords
3. 7. Assessment Questions
3. 8. References
4. 0. Objectives
4. 1. Introduction
4. 2. Properties of determinant
4. 3. Cofactor Expansions
4. 4. Elementary operations and Cramer’s rule
4. 5. Summary
4. 6. Keywords
4. 7. Assessment Questions
4. 8. References
UNIT-5: EIGENVALUES, EIGENVECTORS, DIAGONALIZABILITY AND INVARIANT SUBSPACES
1. 0. Objectives
1. 1. Introduction
1. 2. Eigenvalues and Eigenvectors
1. 3. Diagonalizability
1. 4. Invariant Subspaces and The Cayley-Hamilton Theorem
1. 5. Summary
1. 6. Keywords
1. 7. Assessment Questions
1. 8. References
2. 0. Objectives
2. 1. Introduction
2. 2. Inner Products Space
2. 3. The Gram-Schmidt Orthogonalization Process
2. 4. Orthogonal Complements
2. 5. Summary
2. 6. Keywords
2. 7. Assessment Questions
2. 8. References
3. 0. Objectives
3. 1. Introduction
3. 2. The Adjoint of a linear operator
3. 3. The Normal and Self-Adjoint operators
3. 4. Unitary and Orthogonal operators
3. 5. Orthogonal Projections and The Spectral Theorem
3. 6. Summary
3. 7. Keywords
3. 8. Assessment Questions
3. 9. References
4. 0. Objectives
4. 1. Introduction
4. 2. Bilinear Form
4. 3. Quadratic Form
4. 4. Summary
4. 5. Keywords
4. 6. Assessment Questions
4. 7. References
1. 0. Objectives
1. 1. Introduction
1. 2. The Diagonal Form
1. 3. The Triangular Form
1. 4. Summary
1. 5. Keywords
1. 6. Assessment Questions
1. 7. References
2. 0. Objectives
2. 1. Introduction
2. 2. The Jordan Canonical Form
2. 3. Summary
2. 4. Keywords
2. 5. Assessment Questions
2. 6. References
3. 0. Objectives
3. 1. Introduction
3. 2. Minimal Polynomial
3. 3. Summary
3. 4. Keywords
3. 5. Assessment Questions
3. 6. References
4. 0. Objectives
4. 1. Introduction
4. 2. The Rational Canonical Form
4. 3. Summary
4. 4. Keywords
4. 5. Assessment Questions
4. 6. References
• GLOSSARY OF SYMBOLS
BLOCK - IV: STATISTICS
UNIT-13: Combinatorics and Descriptive Statistics
13.0 Objectives
13.1 Introduction
13.2 Combinatorics & permutation
13.3 Frequency Distribution
13.4 Graphical Representation of Data
13.5 Measures of Central Tendency
13.6 Moments, Skewness, Kurtosis
13.7 Summary
13.8 Keywords
13.9 Questions for self-study
13.10 References
15.0 Objectives
15.1 Introduction
15.2 Regression analysis
15.3 Fitting of Second degree parabola
15.4 Inverse regression
15.5 Correlation versus Regression
15.6 Summary
15.7 Keywords
15.8 Question for self-study
15.9 References
16.1 Introduction
16.2 Large Sample tests
16.3 Small sample tests
16.4 Testing for population variance
16.5 Tests based on Chi-square distribution
16.6 Introduction to Monte Carlo Methods
16.7 Summary
16.8 Keywords
16.9 Question for self-study
16.10 References
PREFACE
Linear algebra is the mathematical discipline that deals with vectors and matrices and,
more generally, with vector spaces and linear transformations. Unlike other parts of
mathematics that are frequently invigorated by new ideas and unsolved problems, linear
algebra is very well understood. Linear algebra is a very useful subject, and its basic
concepts arose and were used in different areas of mathematics and its applications. It is
therefore not surprising that the subject had its roots in such diverse fields as number
theory (both elementary and algebraic), geometry, abstract algebra (groups, rings, fields,
Galois theory), analysis (differential equations, integral equations, and functional
analysis), and physics. Among the elementary concepts of linear algebra are linear
equations, matrices, determinants, linear transformations, linear independence,
dimension, bilinear forms, quadratic forms, and vector spaces. Since these concepts are
closely interconnected, several usually appear in a given context (e.g., linear equations
and matrices) and it is often impossible to disengage them. It has extensive applications
in engineering, natural sciences, computer science, and the social sciences.
Nonlinear mathematical models can often be approximated by linear ones. By 1880,
many of the basic results of linear algebra had been established, but they were not part of
a general theory. In particular, the fundamental notion of a vector space, within which
such a theory would be framed, was absent. It was introduced only in 1888 by
Giuseppe Peano, an Italian mathematician also known as a founder of symbolic logic.
This study material deals with three blocks on linear algebra: Block I contains vector
spaces, linear transformations and matrices; Block II contains diagonalization and inner
product spaces; and Block III contains canonical forms. Each block comprises four units.
0. PRELIMINARIES
This section provides a brief review of the basic concepts, definitions, examples,
etc., which will be used in the subsequent blocks. It is assumed that the reader is
familiar with elementary algebra.
Examples.
1. The real numbers R, the complex numbers C, and the rational numbers Q are all
fields.
2. The set of integers Z is not a field.
The trace of an n × n matrix M, denoted tr(M), is the sum of the diagonal entries
of M. That is, tr(M) = M₁₁ + M₂₂ + ··· + Mₙₙ.
Definition 0. 10. For any pair of positive integers i and j, the symbol δᵢⱼ is defined by
δᵢⱼ = 0 if i ≠ j and δᵢⱼ = 1 if i = j. This symbol is known as the Kronecker delta.
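As a quick computational aside, here is a minimal Python sketch of the trace and the Kronecker delta just defined (numpy is assumed; the helper name kronecker_delta is illustrative, not from the text):

    import numpy as np

    def kronecker_delta(i, j):
        # delta_ij = 1 if i = j, and 0 otherwise
        return 1 if i == j else 0

    M = np.array([[2, 1], [0, 5]])
    print(M.trace())   # tr(M) = M11 + M22 = 2 + 5 = 7
    # The identity matrix is exactly [delta_ij]
    I = np.array([[kronecker_delta(i, j) for j in range(3)] for i in range(3)])
    print(I)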
GLOSSARY OF SYMBOLS
Symbols Meaning
Aij The ijth entry of the matrix A
F A field
UNIT-1: VECTOR SPACES
STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. Vector Spaces
1. 2. 1. General Properties of Vector Space
1. 3. Subspaces
1. 3. 1. Criteria of a subspace
1. 4. Linear Combinations and Systems of Linear equation
1. 4. 1. Span
1. 5. Linearly Dependence and Linearly Independence
1. 5. 1. Properties of Linearly dependence and linearly independence
1. 6. Bases and Dimension
1. 6. 1. Finite and Infinite dimensional vector space
1. 7. Maximal Linearly Independent Subsets
1. 8. Summary
1. 9. Keywords
1. 10. Assessment Questions
1. 11. References
UNIT-1: VECTOR SPACES
1. 0. Objectives
1. 1. Introduction
1. 2. Vector Spaces
(i) a(bv) = (ab)v
(ii) (a +b)v = av + bv
(iii) a(u+v)= au + av
(iv) 1v = v, where a, b∈F and u, v∈V.
Note.
1. The elements of V are called vectors, the elements of F are called scalars.
2. To differentiate between the scalar zero and the zero (or null) vector, we write
them as 0 and 𝟎, respectively.
Theorem 1. 2. 1. Let V be a vector space over a field F and 0 be the zero vector in V.
Then each of the following statements is true.
a ≠ 0 by 1/a, we have v = 0. To show (iv), observe that v + (−1)v = 1v + (−1)v = (1 − 1)v =
0v = 0, and so −v = (−1)v. The proof of (v) is obvious.
1. 3. Subspaces
1. 3. 1. Criteria of a subspace
1. A nonempty subset W of a vector space V over a field F is a subspace of V over
F if and only if W is closed under addition and scalar multiplication of V.
2. A subset W of a vector space V over a field F is a subspace of V over F if and
only if
(i) u ∈ W, v ∈ W ⇒ u−v ∈ W.
(ii) u ∈ W, a∈F ⇒ au ∈ W.
3. A subset W of a vector space V over a field F is a subspace of V over F if and
only if u, v ∈ W and a, b∈F ⇒ au +bv ∈ W
Illustrative Example - 2. Let R be the field of real numbers and S be the set of all
solutions of the equation x + y + 2z = 0. Show that S is a subspace of R³.
Solution. We have S = {(x, y, z) : x + y + 2z = 0, x, y, z ∈ R}.
Clearly, 1×0 + 1×0 + 2×0 = 0, so (0, 0, 0) satisfies the equation x + y + 2z = 0.
Therefore, (0, 0, 0) ∈ S ⇒ S is a nonempty subset of R³.
Let u = (x₁, y₁, z₁) and v = (x₂, y₂, z₂) be any two elements of S.
Then x₁ + y₁ + 2z₁ = 0 and x₂ + y₂ + 2z₂ = 0.
Let a, b be any two elements of R. Then
au + bv = a(x₁, y₁, z₁) + b(x₂, y₂, z₂) = (ax₁ + bx₂, ay₁ + by₂, az₁ + bz₂).
Now (ax₁ + bx₂) + (ay₁ + by₂) + 2(az₁ + bz₂) = a(x₁ + y₁ + 2z₁) + b(x₂ + y₂ + 2z₂)
= a×0 + b×0 = 0.
Therefore, au + bv = (ax₁ + bx₂, ay₁ + by₂, az₁ + bz₂) ∈ S.
Thus au + bv ∈ S for all u, v ∈ S and a, b ∈ R.
Hence, S is a subspace of R³.
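A quick numerical illustration of this closure argument (a minimal Python sketch with numpy; the membership test in_S is this illustration's own helper):

    import numpy as np

    def in_S(v, tol=1e-12):
        # Membership test for S = {(x, y, z) : x + y + 2z = 0}
        x, y, z = v
        return abs(x + y + 2*z) < tol

    u = np.array([1.0, 1.0, -1.0])    # 1 + 1 + 2(-1) = 0, so u is in S
    v = np.array([2.0, 0.0, -1.0])    # 2 + 0 + 2(-1) = 0, so v is in S
    a, b = 3.0, -5.0
    print(in_S(u), in_S(v), in_S(a*u + b*v))   # True True True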
Illustrative Example - 3. Let V be the vector space of all real valued continuous
functions over the field R of all real numbers. Show that the set S of solutions of the
differential equation 2 d²y/dx² − 9 dy/dx + 2y = 0 is a subspace of V.
Solution. We have S = { y : 2 d²y/dx² − 9 dy/dx + 2y = 0 }, where y = f(x).
Clearly, y = 0 satisfies the given differential equation.
Therefore, 0 ∈ S ⇒ S ≠ φ.
Let y₁, y₂ ∈ S. Then y₁, y₂ are solutions of 2 d²y/dx² − 9 dy/dx + 2y = 0,
⇒ 2 d²y₁/dx² − 9 dy₁/dx + 2y₁ = 0 and 2 d²y₂/dx² − 9 dy₂/dx + 2y₂ = 0.
Let a, b ∈ R and y = ay₁ + by₂. Then
2 d²y/dx² − 9 dy/dx + 2y = 2 d²(ay₁ + by₂)/dx² − 9 d(ay₁ + by₂)/dx + 2(ay₁ + by₂)
⇒ 2 d²y/dx² − 9 dy/dx + 2y = a(2 d²y₁/dx² − 9 dy₁/dx + 2y₁) + b(2 d²y₂/dx² − 9 dy₂/dx + 2y₂)
⇒ 2 d²y/dx² − 9 dy/dx + 2y = a × 0 + b × 0 = 0.
Therefore, y = ay₁ + by₂ ∈ S for all y₁, y₂ ∈ S and a, b ∈ R.
Hence, S is a subspace of V.
Note. In any vector space V, 0v = 0 for each v ∈ V. Thus the zero vector is a linear
combination of any nonempty subset of V.
1. 4. 1. Span
Definition 1. 4. 2. Let S be a nonempty subset of a vector space V. The span of S,
denoted span(S), is the set consisting of all linear combinations of the vectors in S.
Note.
1. For convenience, we define span ( φ ) = {0}.
2. In R3, for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors
in R3 that have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b.
Thus the span of {(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In
this case, the span of the set is a subspace of R3. This fact is true in general.
Example - 1. The vectors (1, 1, 0), (1, 0, 1) and (0, 1, 1) generate R³, since an arbitrary
vector (a₁, a₂, a₃) in R³ is a linear combination of the three given vectors; in fact, the
scalars r, s and t for which r(1, 1, 0) + s(1, 0, 1) + t(0, 1, 1) = (a₁, a₂, a₃) are
r = ½(a₁ + a₂ − a₃), s = ½(a₁ − a₂ + a₃), and t = ½(−a₁ + a₂ + a₃).
This is equivalent to

    [1 1  2] [x]   [ 1]
    [0 1 −3] [y] = [−3]    (applying R₂ → R₂ − R₁, R₃ → R₃ − R₁)
    [0 2 −1] [z]   [ 4]

or

    [1 1  2] [x]   [ 1]
    [0 1 −3] [y] = [−3]    (applying R₃ → R₃ − 2R₂)
    [0 0  5] [z]   [10]
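To illustrate Example 1 numerically, here is a minimal Python sketch (numpy assumed) that solves for the scalars r, s, t expressing a given vector in terms of (1, 1, 0), (1, 0, 1), (0, 1, 1):

    import numpy as np

    # Columns are the generating vectors (1,1,0), (1,0,1), (0,1,1)
    G = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1]], dtype=float)
    a = np.array([4.0, 2.0, 6.0])          # an arbitrary target vector (a1, a2, a3)
    r, s, t = np.linalg.solve(G, a)
    # Closed forms from the text: r = (a1+a2-a3)/2, s = (a1-a2+a3)/2, t = (-a1+a2+a3)/2
    print(r, s, t)                          # 0.0 4.0 2.0
    print(G @ np.array([r, s, t]))          # [4. 2. 6.]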
Definition 1. 5. 1. A subset S of a vector space V is called linearly dependent if there
exist a finite number of distinct vectors u₁, u₂, u₃, ..., uₙ in S and scalars a₁, a₂, a₃,
..., aₙ, not all zero, such that a₁u₁ + a₂u₂ + a₃u₃ + ··· + aₙuₙ = 0. In this case we
also say that the vectors of S are linearly dependent.
Illustrative Example - 1. Show that the set S = {(1, 3, –4, 2), (2, 2, –4, 0), (1, –3, 2, –4),
(–1, 0, 1, 0)} in R⁴ is linearly dependent, and express one of the vectors in S
as a linear combination of the other vectors in S.
Solution. To show S is linearly dependent, we must find scalars a₁, a₂, a₃ and a₄, not all
zero, such that a₁(1, 3, –4, 2) + a₂(2, 2, –4, 0) + a₃(1, –3, 2, –4) + a₄(–1, 0, 1, 0) = 0.
Finding such scalars amounts to finding a nonzero solution to the system of linear equations
a₁ + 2a₂ + a₃ – a₄ = 0
3a₁ + 2a₂ – 3a₃ = 0
–4a₁ – 4a₂ + 2a₃ + a₄ = 0
2a₁ – 4a₃ = 0.
One such solution is a₁ = 4, a₂ = –3, a₃ = 2, and a₄ = 0. Thus S is a linearly dependent
subset of R⁴, and 4(1, 3, –4, 2) – 3(2, 2, –4, 0) + 2(1, –3, 2, –4) + 0(–1, 0, 1, 0) = 0.
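A minimal numerical check of this dependence relation (Python with numpy; computing a null-space vector via the SVD is this illustration's choice, not the text's method):

    import numpy as np

    # Columns are the four vectors of S
    V = np.array([[ 1,  2,  1, -1],
                  [ 3,  2, -3,  0],
                  [-4, -4,  2,  1],
                  [ 2,  0, -4,  0]], dtype=float)
    # A vector in the null space of V gives dependence coefficients
    _, _, vt = np.linalg.svd(V)
    coeffs = vt[-1]
    coeffs = coeffs / coeffs[0] * 4        # scale to match a1 = 4
    print(np.round(coeffs, 6))             # approximately [ 4. -3.  2.  0.]
    print(np.round(V @ coeffs, 6))         # the zero vector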
the values of µ and γ. But if λ = 0, then at least one of µ and γ must be nonzero, and
since λ = 0 gives λλ₁ + µ = µ and λλ₂ + γ = γ, at least one of λλ₁ + µ and λλ₂ + γ is
nonzero. Hence, from (1), we find that the scalars λ, λλ₁ + µ, λλ₂ + γ are not all zero.
Consequently, the set {v₁, v₂, v₃} is linearly dependent.
Property 3 provides a useful method for determining whether a finite set is
linearly independent. This technique is illustrated in the following examples.
Illustrative Example - 3. Prove that the set S = {(1, 0, 0, –1), (0, 1, 0, –1),
(0, 0, 1, –1), (0, 0, 0, 1)} is linearly independent.
Solution. We must show that the only linear combination of vectors in S that equals the
zero vector is the one in which all the coefficients are zero.
Suppose that a 1 , a 2 , a 3 and a 4 are scalars such that
a 1 (1, 0, 0, –1) + a 2 (0, 1, 0, –1) + a 3 (0, 0, 1, –1) + a 4 (0, 0, 0, 1) = (0, 0, 0, 0).
Equating the corresponding coordinates of the vectors on the left and the right sides of
this equation, we obtain the following system of linear equations.
a₁ = 0
a₂ = 0
a₃ = 0
–a₁ – a₂ – a₃ + a₄ = 0
Clearly the only solution to this system is a₁ = a₂ = a₃ = a₄ = 0, and so S is linearly
independent.
Definition 1. 6. 1. A basis A for a vector space V is a linearly independent subset
of V that generates V. If A is a basis for V, we also say that the vectors of A form a
basis for V.
Example - 1. Recalling that span ( φ ) = {0} and φ is linearly independent, we see that
φ is a basis for the zero vector space.
Example - 2. In Mₘₓₙ(F), let Eij denote the matrix whose only nonzero entry is a 1 in the
ith row and jth column. Then {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mₘₓₙ(F).
Definition 1. 6. 2. In Fⁿ, let e₁ = (1, 0, 0, ..., 0), e₂ = (0, 1, 0, ..., 0), ...,
eₙ = (0, 0, ..., 0, 1); {e₁, e₂, e₃, ..., eₙ} is readily seen to be a basis for Fⁿ and is called
the standard basis for Fⁿ; it is denoted by εₙ.
Example - 3. In Pₙ(F), the set {1, x, x², ..., xⁿ} is a basis. This is the standard basis for
Pₙ(F).
Examples - 4.
(i) The vector space {0} has dimension zero.
(ii) The vector space F n has dimension n.
(iii) The vector space M m × n (F) has dimension mn.
(iv) The vector space P n (F) has dimension (n+1).
The following examples show that the dimension of a vector space depends on its field of
scalars.
Examples - 5.
(i) Over the field of complex numbers C, the vector space of complex numbers has
dimension 1. (A basis is {1}.)
(ii) Over the field of real numbers R, the vector space of complex numbers has
dimension 2. (A basis is {1, i}.)
Note.
1. In a finite dimensional vector space V with basis A = {u₁, u₂, ..., uₙ},
every vector u ∈ V is uniquely expressible as a linear combination of the vectors in
A; that is, u = Σᵢ₌₁ⁿ aᵢuᵢ for some aᵢ ∈ F.
2. There exists a basis for each finite dimensional vector space.
3. Let V be a finite dimensional vector space which is spanned by a finite set S of
vectors u₁, u₂, ..., uₘ. Then any linearly independent set of vectors in V
contains not more than m elements.
4. Let V be a finite dimensional vector space over a field F. Then any two bases of V
have the same number of elements.
5. Let V be a finite dimensional vector space over a field F, and let W be a subspace
of V. Then dim(V/W) = dim(V) − dim(W).
6. If A and B are two subspaces of a finite dimensional vector space V over a field
F, then dim (A+B) = dim (A) + dim (B) – dim (A∩B).
Example- 1. Let F be the family of all subsets of a nonempty set S. (This family F is
called the power set of S.) The set S is easily seen to be a maximal element of F.
Example -2. Let S and T be disjoint nonempty sets, and let F be the union of their power
sets. Then S and T are both maximal elements of F.
Our next result shows that the converse of this statement is also true.
Theorem 1. 7. 1. Let V be a vector space and S a subset that generates V. If A is a
maximal linearly independent subset of S, then A is a basis for V.
Proof. We claim that S ⊆ span(A), for otherwise there exists a v ∈ S such that v ∉ span(A).
Recall that if A is a linearly independent subset of a vector space V and v ∈ V is not in A,
then A ∪ {v} is linearly dependent if and only if v ∈ span(A). This implies that
A ∪ {v} is linearly independent, and we have contradicted the maximality of A.
Therefore, S ⊆ span(A). Because span(S) = V, and the span of any subset S
of a vector space V is a subspace of V
and any subspace of V that contains S must also contain the span of S, it follows that
span(A) = V. Since A is also linearly independent, A is a basis for V.
Theorem 1. 7. 2. Let S be a linearly independent subset of a vector space V. There
exists a maximal linearly independent subset of V that contains S.
Proof. Let F denote the family of all linearly independent subsets of V that contain S.
In order to show that F contains a maximal element, we must show that if C is a chain in
F, then there exists a member U of F that contains each member of C.
We claim that U, the union of the members of C, is the desired set. Clearly U contains
each member of C, and so it suffices to prove that U ∈ F (That is U is a linearly
independent subset of V that contains S). Because each member of C is a subset of V
containing S, we have S ⊆ U ⊆ V. Thus we need only prove that U is linearly
independent.
Let u₁, u₂, u₃, ..., uₙ be in U and a₁, a₂, a₃, ..., aₙ be scalars such that a₁u₁ + a₂u₂
+ ··· + aₙuₙ = 0. Because uᵢ ∈ U for i = 1, 2, 3, ..., n, there exists a set Aᵢ in C
such that uᵢ ∈ Aᵢ. But since C is a chain, one of these sets, say Aₖ, contains all the others.
Thus uᵢ ∈ Aₖ for i = 1, 2, 3, ..., n. However, Aₖ is a linearly independent set; so a₁u₁ +
a₂u₂ + a₃u₃ + ··· + aₙuₙ = 0 implies that a₁ = a₂ = ··· = aₙ = 0. Hence U is linearly
independent, and so U ∈ F. By the maximal principle, F has a maximal element. This element is
easily seen to be a maximal linearly independent subset of V that contains S.
1. 8. Summary
1. We gave a general definition of a vector space along with examples and established its basic
properties. A vector space is a set of multidimensional quantities, known as vectors,
together with a set of one-dimensional quantities, known as scalars, such that
vectors can be added together and vectors can be multiplied by scalars while
preserving the ordinary arithmetic properties (associativity, commutativity,
distributivity, and so forth).
2. A linear subspace is usually called simply a subspace when the context serves to
distinguish it from other kinds of subspaces.
3. A linear combination is an expression constructed from a set of terms by
multiplying each term by a constant and adding the results. The concept of linear
combinations is central to linear algebra and a system of linear equations (or linear
system) is a collection of linear equations involving the same set of variables.
• The span of any subset S of a vector space V is a subspace of V.
4. A family of vectors is linearly independent if none of them can be written as a linear
combination of finitely many other vectors in the collection. A family of vectors
which is not linearly independent is called linearly dependent.
5. Every vector space has a basis, and all bases of a vector space have the same
number of elements, called the dimension of the vector space. In this unit, we studied
the dimension theorem, which plays a vital role in subsequent units.
6. Finally, the maximal principle guarantees the existence of maximal elements in a
family of sets; with the help of this maximal property we studied maximal
linearly independent subsets.
• Let V be a vector space and S a subset that generates V. If A is a maximal
linearly independent subset of S, then A is a basis for V.
• Let S be a linearly independent subset of a vector space V. There exists a
maximal linearly independent subset of V that contains S.
1. 9. Keywords
Basis, Dimension, Dimension theorem, Finite-dimensional space, Linear combination,
Linearly dependent, Linearly independent, Polynomial, Scalar, Scalar multiplication,
Span of a subset, Spans, Standard basis, Subspace, System of linear equations, Vector,
Vector addition, Vector space
1. 10. Assessment Questions
2. Let S be the set of all elements of the form (x + 2y, y, –x + 3y) in R3, where
x, y ∈ R. Show that S is a subspace of R3.
3. Express the polynomial f(x) = x2+ 4x –3 in the vector space V of all polynomials
over R as a linear combination of the polynomials g(x) = x2 – 2x + 5,
h(x) = 2x2 – 3x and φ(x) = x +3.
Answer. f(x) = –3g(x) +2h(x) +4 φ(x).
5. Verify whether the given set of vectors (0, 1, 0, 1, 1), (0, 1, 0, 1, 1), (0, 1, 0, 1, 1) and
(0, 1, 0, 1, 1) is linearly independent in R⁵.
Answer. b = 0, d = 0, a = 1 and c = −1. This shows that the set of vectors is
linearly dependent.
1. 11. References
1. Gilbert Strang – Linear Algebra and its Applications, Academic Press, New York,
1976.
2. S. Friedberg, A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
3. S. Kumaresan – Linear Algebra: A Geometric Approach, Prentice Hall India, 2000.
4. Michael Artin – Algebra, Prentice Hall India, New Delhi, 2007.
UNIT-2: LINEAR TRANSFORMATION AND MATRIX
STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. Linear Transformation
2. 2. 1. Null Spaces and Ranges
2. 2. 2. Dimension Theorem
2. 3. The Matrix Representation of a Linear Transformation
2. 4. Composition of a Linear Transformation and Matrix Multiplication
2. 4. 1. The relation between multiplication of linear transformation and matrix
2. 5. Invertibility and Isomorphism
2. 5. 1. The Algebra of linear transformation
2. 5. 2. Invertible and Non-invertible
2. 5. 3. Isomorphism
2. 6. The Change of Coordinate Matrix
2. 6. 1. Transition matrix
2. 7. The Dual Space
2. 7. 1. Linear functional
2. 8. Summary
2. 9. Keywords
2. 10. Assessment Questions
2. 11. References
UNIT-2: LINEAR TRANSFORMATION AND MATRIX
2. 0. Objectives
2. 1. Introduction
A linear map, linear mapping, linear transformation, or linear operator (in some
contexts also called a linear function) is a function between two vector spaces that
preserves the operations of vector addition and scalar multiplication. As a result, it
always maps straight lines to straight lines or to 0. A. Cayley formally introduced m × n
matrices in two papers, in 1850 and 1858 (the term "matrix" was coined by Sylvester in
1850). He noted that they "comport themselves as single entities" and recognized their
usefulness in simplifying systems of linear equations and the composition of linear
transformations.
2. 2. Linear Transformation
Definition 2. 2. 1. Let U and V be vector spaces over a field F. Then a map T : U → V is a
linear transformation (or a linear map) if T(u + v) = T(u) + T(v) and T(av) = aT(v).
The two conditions together can be written in the form T(au + bv) = aT(u) + bT(v) for
all a, b ∈ F and u, v ∈ U.
Note.
1. If T is a linear map, then T(0) = 0.
2. T is a linear map if and only if T(au + v) = aT(u) + T(v) for all a ∈ F and u, v ∈ U.
3. If T is a linear map, then T(u − v) = T(u) − T(v) for all u, v ∈ U.
4. T is a linear map if and only if, for u₁, u₂, ..., uₙ ∈ U and a₁, a₂, ..., aₙ ∈ F,
we have T(Σᵢ₌₁ⁿ aᵢuᵢ) = Σᵢ₌₁ⁿ aᵢT(uᵢ).
Note. Let Pₙ(R) denote the vector space that consists of all polynomials in x with degree
n or less and coefficients in R.
Illustrative Example - 2. If the mapping T : P₁(R) → P₂(R) is defined by
T(p(x)) = (1 + x)p(x) for all p(x) in P₁(R), then show that T is a linear transformation.
Solution. Let a, b ∈ R and p(x), q(x) ∈ P₁(R), so that T(p(x)), T(q(x)) ∈ P₂(R). Then
T(a p(x) + b q(x)) = (1 + x)(a p(x) + b q(x))
= a(1 + x)p(x) + b(1 + x)q(x)
= a T(p(x)) + b T(q(x)).
Thus, T is a linear transformation of P₁(R) into P₂(R).
As a check, a specific computation of a value of T is provided by T(3 + 2x) = (1 + x)(3 + 2x) =
3 + 5x + 2x².
Note:
1. For vector spaces V and W over a field F, we define the identity transformation
I : V → V by I(v) = v for all v ∈ V and the zero transformation T₀ : V → W by
T₀(v) = 0 for all v ∈ V. It is clear that both of these transformations are linear.
2. If S and T are linear transformations of a vector space U into V over a field F, then
the sum S + T is defined by (S + T)(u) = S(u) + T(u) for all u in U. Also, for each
linear transformation T of U into V and each scalar a in F, we define the product of
a and T to be the mapping aT of U into V given by (aT)(u) = a(T(u)).
Theorem 2. 2. 1. Let U and V be vector spaces over the same field F. Then the set of all
linear transformations of U into V is a vector space over F.
Proof. Let T₁, T₂ and T₃ denote arbitrary linear transformations of U into V, let u and v
be arbitrary vectors in U, and let a, b and c be scalars in F.
Since, (T 1 +T 2 ) (au+ bv) = T 1 (au+ bv) + T 2 (au+ bv)
= aT 1 (u)+ b T 1 (v) + aT 2 (u)+ b T 2 (v)
= a[T 1 (u)+ T 2 (u)] + b[T 1 (v)+ T 2 (v)]
= a(T 1 + T 2 )(u) + b(T 1 + T 2 )(v),
Therefore, T 1 + T 2 is a linear transformation of U into V.
Addition is associative, since
(T 1 + (T 2 +T 3 ))(u) = T 1 (u) + (T 2 + T 3 )(u)
= T 1 (u) + [T 2 (u) + T 3 (u)]
= [T 1 (u) +T 2 (u)] + T 3 (u)
= (T 1 + T 2 ) (u) + T 3 (u)
= ((T 1 + T 2 ) +T 3 ) (u).
The zero linear transformation Z is an additive identity, since
(T 1 + Z)(u) = T 1 (u) + Z(u) = T 1 (u) + 0 = T 1 (u) for all u in U.
The additive inverse of T 1 is the linear transformation (–T 1 ) of U into V defined by
(–T 1 )(u)= –T 1 (u), since (T 1 + (–T 1 ))(u) = T 1 (u)+ (–T 1 (u)) = 0 for all u in U.
For any u in U, (T 1 + T 2 ) (u) = T 1 (u) + T 2 (u) = T 2 (u) + T 1 (u) = (T 2 + T 1 ) (u),
so T 1 + T 2 = T 2 + T 1 .
Since (cT₁)(au + bv) = c(T₁(au + bv)) = c(aT₁(u) + bT₁(v)) = a(cT₁)(u) + b(cT₁)(v),
cT₁ is a linear transformation of U into V.
Theorem 2. 2. 3. If T is a linear transformation of U into V and A = (u₁, u₂, ..., uₙ) is a
basis of U, then the set T(A) = {T(u₁), T(u₂), ..., T(uₙ)} spans T(U).
Proof. For any vector v in T(U), there is a vector u in U
such that T(u) = v. The vector u can be written as u = Σᵢ₌₁ⁿ aᵢuᵢ since A is a basis of U.
This gives v = T(Σᵢ₌₁ⁿ aᵢuᵢ) = Σᵢ₌₁ⁿ aᵢT(uᵢ), and so T(A) spans T(U).
2. 2. 2. Dimension Theorem
Theorem 2. 2. 4. Let T be a linear transformation of U into V. If U has finite dimension,
then rank(T) + nullity(T) = dim(U).
Proof. Suppose dim(U) = n, and let nullity(T) = k. Choose (u₁, u₂, ..., uₖ) to be a basis
of the kernel T⁻¹(0). This linearly independent set can be extended to a basis
A = {u₁, u₂, ..., uₖ, uₖ₊₁, ..., uₙ} of U. By Theorem 2.2.3, the set T(A) spans T(U).
But T(u₁) = T(u₂) = ··· = T(uₖ) = 0, so this means that the set of (n−k)
vectors {T(uₖ₊₁), T(uₖ₊₂), ..., T(uₙ)} spans T(U).
To show that this set is linearly independent, suppose that
cₖ₊₁T(uₖ₊₁) + cₖ₊₂T(uₖ₊₂) + ··· + cₙT(uₙ) = 0.
Then T(cₖ₊₁uₖ₊₁ + cₖ₊₂uₖ₊₂ + ··· + cₙuₙ) = 0, and Σᵢ₌ₖ₊₁ⁿ cᵢuᵢ is in T⁻¹(0). Thus there are
scalars d₁, d₂, ..., dₖ such that Σᵢ₌₁ᵏ dᵢuᵢ = Σᵢ₌ₖ₊₁ⁿ cᵢuᵢ, that is,
Σᵢ₌₁ᵏ dᵢuᵢ − Σᵢ₌ₖ₊₁ⁿ cᵢuᵢ = 0. Since A is a basis of U, all these coefficients are zero;
in particular cₖ₊₁ = ··· = cₙ = 0. Hence {T(uₖ₊₁), ..., T(uₙ)} is a basis of T(U), so
rank(T) = n − k and rank(T) + nullity(T) = dim(U).
Let U and V be vector spaces of dimension n and m, respectively, over the same
field F, and let T denote a linear transformation of U into V. Suppose that
A = (u₁, u₂, ..., uₙ) is a basis of U. Any u in U can be written uniquely in the form
u = Σⱼ₌₁ⁿ xⱼuⱼ, and T(u) = Σⱼ₌₁ⁿ xⱼT(uⱼ). If B = (v₁, v₂, ..., vₘ) is a basis of V, then each T(uⱼ)
can be written uniquely as T(uⱼ) = Σᵢ₌₁ᵐ aᵢⱼvᵢ. Thus, with each choice of bases A and B, a
unique matrix A = [aᵢⱼ]ₘₓₙ = [T]B,A is determined, for j = 1, 2, ..., n.
Note. The symbols [aᵢⱼ]ₘₓₙ and [T]B,A in the above definition denote the same matrix, but the
first one places notational emphasis on the elements of the matrix, while the second one
places emphasis on T and the bases A and B. This matrix A is also referred to as the
matrix of T with respect to A and B, and we say that T is represented by the matrix A.
Also, the elements aᵢⱼ are uniquely determined by T for given bases A and B. Another
way to describe A is to observe that the jth column of A is the coordinate matrix of T(uⱼ)
with respect to B.
That is, (a₁ⱼ, a₂ⱼ, ..., aₘⱼ)ᵀ = [T(uⱼ)]B and A = [T]B,A = ([T(u₁)]B, [T(u₂)]B, ..., [T(uₙ)]B).
Illustrative Example - 1. Let T : P₂(R) → P₃(R) be the linear transformation defined by
T(a₀ + a₁x + a₂x²) = (2a₀ + 2a₂) + (a₀ + a₁ + 3a₂)x + (a₁ + 2a₂)x² + (a₀ + a₂)x³.
Find the matrix A of T relative to the bases given below.
Solution. Let A = {1, 1−x, x²} be a basis of P₂(R) and B = {1, x, 1−x², 1+x³} be a basis
of P₃(R).
To find the first column of A, we compute T(1) and write it as a linear combination of the
vectors in B:
T(1) = 2 + x + x³ = (1)(1) + (1)(x) + (0)(1−x²) + (1)(1+x³).
Then the first column of A is [T(1)]B = (1, 1, 0, 1)ᵀ.
To find the second column of A, we compute T(1−x) and write it as a linear combination
of the vectors in B:
T(1−x) = 2 − x² + x³ = (0)(1) + (0)(x) + (1)(1−x²) + (1)(1+x³).
Then the second column of A is [T(1−x)]B = (0, 0, 1, 1)ᵀ.
To find the third column of A, we compute T(x²) and write it as a linear combination of
the vectors in B:
T(x²) = 2 + 3x + 2x² + x³ = (3)(1) + (3)(x) + (−2)(1−x²) + (1)(1+x³).
Then the third column of A is [T(x²)]B = (3, 3, −2, 1)ᵀ.
Thus the matrix of T with respect to A and B is given by

    A = [T]B,A = ([T(1)]B, [T(1−x)]B, [T(x²)]B) = [1 0  3]
                                                  [1 0  3]
                                                  [0 1 −2]
                                                  [1 1  1]
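As a numerical cross-check of this example, here is a minimal Python sketch (numpy; the coefficient-vector representation of polynomials and the helper names are this illustration's own):

    import numpy as np

    # T in coefficient form: P2 -> P3, (a0, a1, a2) -> (c0, c1, c2, c3)
    def T(a0, a1, a2):
        return np.array([2*a0 + 2*a2, a0 + a1 + 3*a2, a1 + 2*a2, a0 + a2], dtype=float)

    # Basis B = {1, x, 1 - x^2, 1 + x^3} as columns of coefficient vectors
    B = np.array([[1, 0,  1, 1],
                  [0, 1,  0, 0],
                  [0, 0, -1, 0],
                  [0, 0,  0, 1]], dtype=float)
    # Basis A = {1, 1 - x, x^2} of P2(R) as (a0, a1, a2) triples
    basisA = [(1, 0, 0), (1, -1, 0), (0, 0, 1)]
    # Column j of [T]B,A solves B * col = coefficients of T(u_j)
    cols = [np.linalg.solve(B, T(*u)) for u in basisA]
    print(np.column_stack(cols))   # rows: [1 0 3], [1 0 3], [0 1 -2], [1 1 1]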
    [T]ε₃,ε₄ = [1 0 −1 1]
               [2 1  0 3]
               [1 2  3 3]
Note.
1. If A is any matrix that represents the linear transformation T of U into V relative to
the bases A and B, then A = [aᵢⱼ]ₘₓₙ = [T]B,A, and [T(u)]B = A[u]A for all u in U.
2. Conversely, if A is a matrix such that the equation [T(u)]B = A[u]A is satisfied for
all u in U, then A is the matrix of T relative to A and B.
Definition 2. 4. 1. Let U, V and Z be vector spaces over the same field F, and let S:
U→V and T : V → Z be linear transformations. Then the product TS is the mapping
of U into Z (that is TS : U → Z) defined by TS(u) = (T o S)(u) = T(S(u)) for each u in U.
Theorem 2. 4. 2. Suppose that U, V and Z are finite dimensional vector spaces with
bases A = (u₁, ..., uₙ), B = (v₁, ..., vₘ) and C = (w₁, ..., w_p), respectively. If S has
matrix A = [aᵢⱼ]ₘₓₙ relative to A and B, and T has matrix B = [bᵢⱼ] (of size p × m) relative
to B and C, then TS has matrix BA relative to A and C.
Proof. Let C = [cᵢⱼ] (of size p × n) be the matrix of TS relative to the bases A and C.
For j = 1, 2, ..., n, we have
Σᵢ₌₁ᵖ cᵢⱼwᵢ = TS(uⱼ)
= T(Σₖ₌₁ᵐ aₖⱼvₖ)
= Σₖ₌₁ᵐ aₖⱼT(vₖ)
= Σₖ₌₁ᵐ aₖⱼ(Σᵢ₌₁ᵖ bᵢₖwᵢ)
= Σᵢ₌₁ᵖ (Σₖ₌₁ᵐ bᵢₖaₖⱼ)wᵢ,
and consequently cᵢⱼ = Σₖ₌₁ᵐ bᵢₖaₖⱼ for all values of i and j; that is, C = BA.
Illustrative Example - 1. Let the linear transformations S : M₂ₓ₂(R) → R³ and
T : R³ → P₁(R) be defined by
S([a b; c d]) = (a + 2b, b − 3c, c + d) and T(a₁, a₂, a₃) = (a₁ − 2a₂ − 6a₃) + (a₂ + 3a₃)x.
(i) Find TS. (ii) Find the matrices of S and T relative to the standard bases A of M₂ₓ₂(R),
B of R³ and C = {1, x} of P₁(R). (iii) Verify that the matrix of TS is the product of the
matrix of T times the matrix of S.
(i) We have
TS([a b; c d]) = T(S([a b; c d]))
= T(a + 2b, b − 3c, c + d)
= (a + 2b − 2(b − 3c) − 6(c + d)) + (b − 3c + 3(c + d))x
= (a − 6d) + (b + 3d)x.
(ii) Since S([1 0; 0 0]) = (1, 0, 0), S([0 1; 0 0]) = (2, 1, 0),
S([0 0; 1 0]) = (0, −3, 1) and S([0 0; 0 1]) = (0, 0, 1),
the matrix A of S relative to A and B is

    A = [1 2  0 0]
        [0 1 −3 0]
        [0 0  1 1]

Similarly, the matrix B of T relative to B and C is

    B = [1 −2 −6]
        [0  1  3]

so the matrix of TS relative to A and C is

    C = [1 0 0 −6]
        [0 1 0  3]

(iii) By the usual arithmetic of matrix multiplication, we have

    BA = [1 −2 −6] [1 2  0 0]   [1 0 0 −6]
         [0  1  3] [0 1 −3 0] = [0 1 0  3] = C.
                   [0 0  1 1]
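A one-line numerical confirmation of (iii) (a minimal Python/numpy sketch):

    import numpy as np

    B = np.array([[1, -2, -6],
                  [0,  1,  3]])
    A = np.array([[1, 2,  0, 0],
                  [0, 1, -3, 0],
                  [0, 0,  1, 1]])
    print(B @ A)   # [[ 1  0  0 -6]
                   #  [ 0  1  0  3]]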
Note.
1. In the above example the product AB is not defined, and this is consistent with
the fact that the composition ST is not defined.
2. The operations of addition and multiplication of matrices are connected by the
distributive property A(B + C) = AB + AC, where A = [aᵢⱼ]ₘₓₙ and B, C are
n × p matrices.
3. Let A be an n × n matrix with entries from a field F. The mapping T_A : Fⁿ → Fⁿ
defined by T_A(x) = Ax (the matrix product of A and x) for each column vector x ∈ Fⁿ
is known as the left-multiplication transformation.
Illustrative Example - 2. Let T : R² → R² be a linear transformation defined by
T(a, b) = (2a − 3b, a + b) for all (a, b) ∈ R². Find the matrix of T relative to the bases
B = {v₁ = (1, 0), v₂ = (0, 1)} and B¹ = {v₁¹ = (2, 3), v₂¹ = (1, 2)}.
Solution. In order to find the matrix of T relative to the bases B and B¹, we have to express
T(v₁) and T(v₂) as linear combinations of v₁¹ and v₂¹.
For this, we first find the coordinates of an arbitrary vector (a, b) ∈ R² with respect to
the basis B¹:
(a, b) = x v₁¹ + y v₂¹ = (2x + y, 3x + 2y), so x = 2a − b and y = −3a + 2b.
Now T(v₁) = (2, 1) = 3v₁¹ − 4v₂¹ and T(v₂) = (−3, 1) = −7v₁¹ + 11v₂¹.
Therefore,

    [T]B,B¹ = [ 3 −7]
              [−4 11]
Definition 2. 5. 1. Let V be a vector space over a field F and let Hom(V, V) or A(V) be
the set of all linear transformations of V into itself. This is known as the algebra of linear
transformations.
That is, a map T : U → V is a linear transformation on a vector space with U = V.
For T₁, T₂ ∈ A(V),
(T₁ + T₂)(v) = T₁(v) + T₂(v) ⇒ T₁ + T₂ ∈ A(V), and
a(T(v)) = (aT)(v) ⇒ aT ∈ A(V).
Now, we show that (T₁T₂)(au + bv) = a(T₁T₂)(u) + b(T₁T₂)(v) for all a, b ∈ F, u, v ∈
V and T₁, T₂ ∈ A(V).
Consider (T₁T₂)(au + bv) = T₁(T₂(au + bv))
= T₁(aT₂(u) + bT₂(v))
= aT₁(T₂(u)) + bT₁(T₂(v))
= a(T₁T₂)(u) + b(T₁T₂)(v).
6. If V is finite dimensional over F and if T ∈ A(V) is non-invertible (singular), then
there exists an S ≠ 0 in A(V) such that ST = TS = 0.
Note.
1. Let A be an n × n matrix. Then A is invertible if there exists an n × n matrix B such
that AB = BA = I; such a B is unique (if C is another such matrix, then C = CI =
C(AB) = (CA)B = IB = B). The matrix B is called the inverse of A and is denoted
by A⁻¹.
2. If V is a finite dimensional vector space over a field F with ordered bases A and B,
then a linear transformation T ∈ A(V) is invertible if and only if its matrix [T]B,A
is invertible.
2. 5. 3. Isomorphism
Note.
1. Let U and V be finite dimensional vector spaces (over the same field F). Then V is
isomorphic to U if and only if dim(U) = dim(V).
2. Let V be a finite dimensional vector space over a field F. Then V is isomorphic to
Fⁿ if and only if dim(V) = n.
⇒ Tₓ(e) = 0, where e is the unit element of V
⇒ ex = 0, that is, x = 0, since e ≠ 0.
Therefore Ker(ψ) = {0} and hence ψ is one to one.
So ψ is an isomorphism of A into A(V).
Theorem 2. 5. 3. Let A be an algebra, with unit element, over a field F, and suppose that
A is of dimension m over F. Then every element in A satisfies some nontrivial
polynomial in F[x] (or P n (F)) of degree at most m.
Proof. Let e be the unit element of A. For y ∈ A, we have e, y, y², ..., yᵐ in A.
Since A is m-dimensional over the field F, it follows that the (m+1) elements (vectors)
e, y, y², ..., yᵐ are linearly dependent over F. So there exist elements
a₀, a₁, ..., aₘ in F, not all zero, such that a₀e + a₁y + ··· + aₘyᵐ = 0. Therefore y satisfies
the nontrivial polynomial q(x) = a₀ + a₁x + ··· + aₘxᵐ in F[x], of degree at most m.
Thus, T(x, y) = (0,0) ⇔ (x, y) = (0, 0).
To find the formula for T– 1.
Let T(x, y) = (a, b). Then, T – 1(a, b) = (x, y)
Now, T(x, y) = (a, b) ⇒ (2x + y, 3x + 2y) = (a, b)
⇒ 2x + y =a and 3x + 2y=b
⇒ x = 2a – b, y = –3a + 2b
Therefore, T – 1(a, b) = (2a – b, –3a + 2b).
We now describe the relation between matrices that represent the same linear transformation.
This description is found by examining the effect that a change in the bases of U and V
has on the matrix of T.
Theorem 2. 6. 1. Let C = (w₁, w₂, ..., wₖ) and C¹ = (w₁¹, w₂¹, ..., wₖ¹) be two bases
of the vector space W over F. For an arbitrary vector w in W, let
[w]C = C = (c₁, c₂, ..., cₖ)ᵀ and [w]C¹ = C¹ = (c₁¹, c₂¹, ..., cₖ¹)ᵀ
denote the coordinate matrices of w relative to C and C¹, respectively. If P is the matrix
of transition from C to C¹, then C = PC¹.
Proof. Let P = [pᵢⱼ]ₖₓₖ, and assume that the hypotheses of the theorem are satisfied.
Then w = Σᵢ₌₁ᵏ cᵢwᵢ = Σⱼ₌₁ᵏ cⱼ¹wⱼ¹ and wⱼ¹ = Σᵢ₌₁ᵏ pᵢⱼwᵢ. Combining these equalities, we have
w = Σᵢ₌₁ᵏ cᵢwᵢ = Σⱼ₌₁ᵏ cⱼ¹(Σᵢ₌₁ᵏ pᵢⱼwᵢ) = Σᵢ₌₁ᵏ (Σⱼ₌₁ᵏ pᵢⱼcⱼ¹)wᵢ.
Therefore, cᵢ = Σⱼ₌₁ᵏ pᵢⱼcⱼ¹; written out,
(c₁, c₂, ..., cₖ)ᵀ = (p₁₁c₁¹ + p₁₂c₂¹ + ··· + p₁ₖcₖ¹, ..., pₖ₁c₁¹ + pₖ₂c₂¹ + ··· + pₖₖcₖ¹)ᵀ = PC¹.
Example - 1. Consider the bases C = {x, 2+x} and C¹ = {4+x, 4−x} of the vector space
W = P₁(R). Since 4+x = (−1)(x) + (2)(2+x) and 4−x = (−3)(x) + (2)(2+x), the matrix of
transition P from C to C¹ is given by

    P = [−1 −3]
        [ 2  2]

The vector w = 4+3x can be written as 4+3x = 2(4+x) + (−1)(4−x), so [w]C¹ = (2, −1)ᵀ.
By Theorem 2.6.1, the coordinate matrix [w]C may be found from

    [w]C = P[w]C¹ = [−1 −3] [ 2]   [1]
                    [ 2  2] [−1] = [2]

This result can be checked by using the basis vectors in C.
We get (1)(x) + 2(2+x) = 4+3x = w.
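A numeric sketch of this change of coordinates (Python/numpy; polynomials are represented here by their (constant, x) coefficient pairs, an assumption of this check):

    import numpy as np

    P = np.array([[-1, -3],
                  [ 2,  2]], dtype=float)
    w_C1 = np.array([2, -1], dtype=float)      # coordinates of w = 4 + 3x in C1
    w_C = P @ w_C1
    print(w_C)                                  # [1. 2.]
    # Check: 1*(x) + 2*(2 + x) should have coefficients (4, 3)
    x_vec, two_plus_x = np.array([0, 1]), np.array([2, 1])
    print(w_C[0]*x_vec + w_C[1]*two_plus_x)     # [4. 3.]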
2. 6. 1. Transition Matrix
Definition 2. 6. 1. If A = (u₁, u₂, ..., u_r) is a set of vectors in Rⁿ and B = (v₁, v₂, ..., v_r)
is another such set, a matrix of transition from A to B is a matrix P = [pᵢⱼ] such that the
jth column of P is the coordinate matrix of vⱼ relative to A; that is, vⱼ = Σᵢ pᵢⱼuᵢ.
Theorem 2. 6. 2. Suppose that T has matrix A = [aᵢⱼ]ₘₓₙ relative to the bases A of U and
B of V, and matrix A¹ relative to the bases A¹ of U and B¹ of V. If Q is the matrix of
transition from A to A¹ and P is the matrix of transition from B to B¹, then A¹ = P⁻¹AQ.
Proof. Assume the hypotheses of the theorem are satisfied. Let u be an arbitrary vector
in U, let X and X¹ denote the coordinate matrices of u relative to A and A¹,
respectively, and let Y and Y¹ denote the coordinate matrices of T(u) relative to B and
B¹, respectively. Since the matrix of T relative to the bases A and B is the matrix
A = [aᵢⱼ]ₘₓₙ = [T]B,A, we have [T(u)]B = [T]B,A[u]A, that is, Y = AX. By
Theorem 2.6.1, Y = PY¹ and X = QX¹. Substituting for Y and X,
we have PY¹ = AQX¹, and therefore Y¹ = (P⁻¹AQ)X¹. Since a
matrix A such that the equation [T(u)]B = A[u]A is satisfied for all u in U is the
matrix of the linear transformation T relative to A and B, P⁻¹AQ is the matrix of T
relative to A¹ and B¹.
Theorem 2. 6. 3. Two m × n matrices A and B represent the same linear transformation T
of U into V if and only if A and B are equivalent.
Proof. If A and B represent T relative to the sets of bases A, B and A¹, B¹, respectively,
then B = P⁻¹AQ, where Q is the matrix of transition from A to A¹ and P is the matrix of
Two similar results concern row and column equivalence: requiring B¹ = B is the same
as requiring P = Iₘ, and requiring A¹ = A is the same as requiring Q = Iₙ. Thus we have
the following theorem.
Theorem 2. 6. 4. Two m × n matrices A and B represent the same linear transformation T
of U into V if and only if they are row (or column) equivalent.
Illustrative Example - 2. Relative to the basis B = {v 1 , v 2 } = {(1, 1), (2, 3)} of R2, find
the coordinate matrix of (i) v = (4, –3) and (ii) v = (a, b)
Solution. Let x, y ∈ R such that v = xv 1 + yv 2 = x(1, 1) + y(2, 3) = (x + 2y, x + 3y)
(i) If v = (4, −3), then v = xv₁ + yv₂ ⇒ (4, −3) = (x + 2y, x + 3y)
⇒ x + 2y = 4, x + 3y = −3
⇒ x = 18, y = −7.
Hence, the coordinate matrix [v]B of v relative to the basis B is given by [v]B = (18, −7)ᵀ.
(ii) If v = (a, b), then v = xv₁ + yv₂ ⇒ (a, b) = (x + 2y, x + 3y)
⇒ x + 2y = a, x + 3y = b
⇒ x = 3a − 2b, y = −a + b.
Hence, the coordinate matrix [v]B of v relative to the basis B is given by
[v]B = (3a − 2b, −a + b)ᵀ.
2. 7. 1. Linear functional
Definition 2. 7. 1. A linear transformation from a vector space V into its field of scalars F,
which is itself a vector space of dimension 1 over F, is called a linear functional on V.
We generally use the letters f, g, h, ... to denote linear functionals.
Example - 1. Let V be the vector space of continuous real valued functions on the
interval [0, 2π]. Consider a fixed function g ∈ V. The function h : V → R defined by
h(x) = (1/2π) ∫₀^{2π} x(t)g(t) dt
is a linear functional on V. In the case that g(t) equals
sin(nt) or cos(nt), h(x) is often called the nth Fourier coefficient of x.
Definition 2. 7. 2. For a vector space V over F, we define the dual space of V to be the
vector space Hom(V, F), denoted by V*. Thus V* is the vector space consisting of all
linear functionals on V with the operations of addition and scalar multiplication. If V is
finite dimensional, then dim(V*) = dim(Hom(V, F)) = dim(V) · dim(F) = dim(V).
Hence, by the note above, V and V* are isomorphic. We also define the double dual
V** of V to be the dual of V*.
where fᵢ (1 ≤ i ≤ n) is the ith coordinate function with respect to B. Then B* is an
ordered basis for V*, called the dual basis of B.
Therefore f = g, due to the fact that for vector spaces V and W, where V has a finite basis
{v₁, v₂, ..., vₙ}, if U, T : V → W are linear and U(vᵢ) = T(vᵢ) for i = 1, 2, ..., n, then U = T.
Illustrative Example - 3. Let B = {(2, 1), (3, 1)} be an ordered basis for R². Suppose
that the dual basis of B is given by B* = {f₁, f₂}. Determine f₁(x, y) and f₂(x, y).
Solution. We need to consider the equations
1 = f₁(2, 1) = f₁(2e₁ + e₂) = 2f₁(e₁) + f₁(e₂),
0 = f₁(3, 1) = f₁(3e₁ + e₂) = 3f₁(e₁) + f₁(e₂).
Solving these equations, we obtain f₁(e₁) = −1 and f₁(e₂) = 3; that is, f₁(x, y) = −x + 3y.
Similarly, it can be shown that f₂(x, y) = x − 2y.
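Computationally, the dual basis coefficients are the rows of the inverse of the matrix whose columns are the basis vectors; a minimal Python sketch (numpy) of this observation:

    import numpy as np

    M = np.array([[2, 3],
                  [1, 1]], dtype=float)    # columns are the basis vectors (2,1), (3,1)
    D = np.linalg.inv(M)                   # row i holds the coefficients of f_i
    print(D)                               # [[-1.  3.] [ 1. -2.]] -> f1 = -x + 3y, f2 = x - 2y
    print(D @ M)                           # the identity: f_i(v_j) = delta_ij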
2. 8. Summary
2. 9. Keywords
Change of coordinate matrix, Composition of linear transformations, Coordinate vector
relative to a basis, Dual basis, Dual space, Identity matrix, Identity transformation,
Invertible linear transformation, Invertible matrix, Isomorphic vector spaces, Isomorphism,
Kernel of a linear transformation, Left-multiplication transformation, Linear functional,
Linear transformation, Matrix multiplication, Non-invertible linear transformation,
Nullity of a linear transformation, Null space, Product of matrices, Range,
Rank of a linear transformation
2. 10. Assessment Questions
1. Define the range, rank, kernel and nullity of a linear transformation with an
example.
Hint. See the section 2.2.1
2. Show that the mapping J: R2 →R3 given by J(a, b) = (a+b, a–b, b) for all
(a, b) ∈ R2, is a linear transformation. Find the range, rank, kernel and nullity of
transformation J.
Answers. rank(J) = dim(I m (J))= dim(range(J)) = 2 ,
nullity(J) = dim R2 – rank(J) = 2 – 2= 0 and Kernel (J) = Null Space = {(0,0)},
where I m (J) = { J(1,0), J(0,1)}= {(1,1,0), (1, -1, 1)}.
Answer. b₁ = 2 − 2t, b₂ = −1/2 + t.
4. Define algebra of linear transformation. Explain the properties of algebra of linear
transformation.
Hint. See the section 2. 5. 1.
5. Give an example to show that an element of A(V) can be right-invertible but not invertible.
Hint. See the section 2. 5. 2.
6. Find the dual basis of the basis set B = {(1, −1, 3), (0, 1, −1), (0, 3, −2)}.
2. 11. References
1. S. Friedberg, A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI,
2009.
2. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier 2010.
3. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
UNIT-3: ELEMENTARY MATRIX OPERATION,
RANK OF A MATRIX, MATRIX INVERSE AND
SYSTEM OF LINEAR EQUATION
STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. Elementary Matrix Operations and Elementary Matrices
3. 2. 1. Elementary row [column] operation
3. 2. 2. Elementary Matrix
3. 3. The Rank of a Matrix and Matrix Inverses
3. 3. 1. Rank of a Matrix
3. 3. 2. Augmented matrix
3. 4. System of linear equations
3. 4. 1. Homogeneous system of equation
3. 5. Summary
3. 6. Keywords
3. 7. Assessment Questions
3. 8. References
UNIT-3: ELEMENTARY MATRIX OPERATION,
RANK OF A MATRIX, MATRIX INVERSE AND
SYSTEM OF LINEAR EQUATION
3. 0. Objectives
3. 1. Introduction
In the previous two units, we have learnt about vector spaces and linear
transformations between two vector spaces U(F) and V(F) defined over the same field F.
In this unit, we shall see that each linear transformation from an n-dimensional vector space
U(F) to an m-dimensional vector space V(F) corresponds to an m × n matrix over the field F.
Here, we see how the matrix representation of a linear transformation changes with a
change of bases of the given vector spaces, with the help of elementary matrix operations.
Example - 1. Let

    A = [1 2  3 4]
        [2 1 −1 3]
        [4 0  1 2]

Interchanging the second row of A with the first row is an example of an elementary row
operation of Type 1. The resulting matrix is

    B = [2 1 −1 3]
        [1 2  3 4]
        [4 0  1 2]

Multiplying the second column of A by 3 is an example of an elementary column
operation of Type 2. The resulting matrix is

    C = [1 6  3 4]
        [2 3 −1 3]
        [4 0  1 2]

Adding 4 times the third row of A to the first row is an example of an elementary row
operation of Type 3. In this case, the resulting matrix is

    M = [17 2  7 12]
        [ 2 1 −1  3]
        [ 4 0  1  2]
If a matrix Q can be obtained from a matrix P by means of an elementary row
operation, then P can be obtained from Q by an elementary row operation of the same
type. In the above example, A can be obtained from M by adding –4 times the third row
of M to the first row of M.
3. 2. 2. Elementary Matrix
Our first theorem shows that performing an elementary row operation on a matrix
is equivalent to multiplying the matrix by an elementary matrix.
Theorem 3. 3. 1. Let A∈M mxn (F), and suppose that B is obtained from A by performing
an elementary row [column] operation. Then there exists an m × m [n × n] elementary
matrix E such that B = EA [B = AE].
Example - 3. Consider the matrices A and B in Example-1. In this case, B is obtained
from A by interchanging the first two rows of A. Performing this same operation on I 3 ,
we obtain the elementary matrix (as in Example-2)
    E = [0 1 0]
        [1 0 0]
        [0 0 1]

Note that EA = B.
In the second part of Example 1, C is obtained from A by multiplying the second
column of A by 3. Performing this same operation on I₄, we obtain the elementary matrix

    E = [1 0 0 0]
        [0 3 0 0]
        [0 0 1 0]
        [0 0 0 1]
Observe that AE = C. It is a useful fact that the inverse of an elementary matrix is also an
elementary matrix.
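These identities are easy to confirm numerically; a minimal Python sketch (numpy; the variable names are illustrative):

    import numpy as np

    A = np.array([[1, 2, 3, 4], [2, 1, -1, 3], [4, 0, 1, 2]])
    E_row = np.eye(3); E_row[[0, 1]] = E_row[[1, 0]]     # swap rows 1 and 2 of I3
    E_col = np.eye(4); E_col[1, 1] = 3                   # scale column 2 of I4 by 3
    B = E_row @ A        # elementary row operation = left multiplication
    C = A @ E_col        # elementary column operation = right multiplication
    print(B[0], C[0])    # [ 2  1 -1  3] and [1. 6. 3. 4.]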
Note.
1. Many results about the rank of a matrix follow immediately from the
corresponding facts about a linear transformation. An important result of this
type is that an n × n matrix is invertible if and only if its rank is n.
2. Every matrix A is the matrix representation of the left-multiplication transformation
T_A with respect to the standard bases, so the rank of T_A is the
same as the rank of one of its matrix representations, namely A.
Theorem 3.3.1 extends this fact to any matrix representation of any linear
transformation defined on finite-dimensional vector spaces.
Theorem 3. 3. 3. The rank of any matrix equals the maximum number of its linearly
independent columns; that is, the rank of a matrix is the dimension of the subspace
generated by its columns.
Proof. For any A ∈ Mₘₓₙ(F), rank(A) = rank(T_A) = dim(R(T_A)), where R(T_A) is the
range of T_A. Let A = {v₁, v₂, ..., vₙ} be the standard basis of Fⁿ. Then
R(T_A) = span(T_A(A)) = span({T_A(v₁), T_A(v₂), ..., T_A(vₙ)}), and T_A(vⱼ) = Avⱼ is the
jth column of A. Hence rank(A) is the dimension of the subspace generated by the columns of A.
Example - 1. Let

    A = [1 0 1]
        [0 1 1]
        [1 0 1]

Observe that the first and second columns of A are linearly independent and that the third
column is a linear combination of the first two.
Thus, rank(A) = dim(span{(1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 1, 1)ᵀ}) = 2.
To compute the rank of a matrix A, it is frequently useful to postpone the use of the
above theorem until A has been suitably modified by means of appropriate elementary
row and column operations, so that the number of linearly independent columns is
obvious. Elementary operations do not change the rank, so the rank of the modified
matrix is the same as the rank of A. One such modification of A can be obtained by
using elementary row and column operations to introduce zero entries. The next example
illustrates this procedure.
Example - 2. Let

    A = [1 2 1]
        [1 0 3]
        [1 1 2]

If we subtract the first row of A from rows 2 and 3 (Type 3 elementary row operations),
the result is

    [1  2 1]
    [0 −2 2]
    [0 −1 1]

If we now subtract twice the first column from the second and subtract the first column
from the third (Type 3 elementary column operations), we obtain

    [1  0 0]
    [0 −2 2]
    [0 −1 1]
It is now obvious that the maximum number of linearly independent columns of this
matrix is 2. Hence the rank of A is 2.
Example - 3.
(i) Let

    A = [1 2  1 1]
        [2 1 −1 1]

Note that the first and second rows of A are linearly independent since one is not a
multiple of the other. Thus rank(A) = 2.
(ii) Let

    A = [1 3 1 1]
        [1 0 1 1]
        [0 3 0 0]

In this case, there are several ways to proceed. Suppose that we begin with an
elementary row operation to obtain a zero in the (2, 1) position. Subtracting the first
row from the second row, we obtain

    [1  3 1 1]
    [0 −3 0 0]
    [0  3 0 0]

Note that the third row is a multiple of the second row, and the first and second rows
are linearly independent. Thus rank(A) = 2.
(iii) Let

    A = [1  2 3 1]
        [2  1 1 1]
        [1 −1 1 0]

Using elementary row operations, we can transform A as follows:

    A → [1  2  3  1]   [1  2  3  1]
        [0 −3 −5 −1] → [0 −3 −5 −1]
        [0 −3 −2 −1]   [0  0  3  0]

It is clear that the last matrix has three linearly independent rows and hence has
rank 3.
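These ranks can be confirmed directly; a minimal Python sketch (numpy):

    import numpy as np

    for A in (np.array([[1, 2, 1, 1], [2, 1, -1, 1]]),
              np.array([[1, 3, 1, 1], [1, 0, 1, 1], [0, 3, 0, 0]]),
              np.array([[1, 2, 3, 1], [2, 1, 1, 1], [1, -1, 1, 0]])):
        print(np.linalg.matrix_rank(A))   # 2, 2, 3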
3. 3. 2. Augmented matrix
Illustrative Example - 1. Find the inverse of

    A = [0 2 4]
        [2 4 2]
        [3 3 1]

Solution. We form the augmented matrix (A | I) and apply elementary row operations,
beginning by interchanging rows 1 and 2.
The result is

    [2 4 2 | 0 1 0]
    [0 2 4 | 1 0 0]
    [3 3 1 | 0 0 1]

In order to place a 1 in the (1, 1) position, we must multiply the first row by 1/2; this
operation yields

    [1 2 1 | 0 1/2 0]
    [0 2 4 | 1  0  0]
    [3 3 1 | 0  0  1]

We now complete work in the first column by adding −3 times row 1 to row 3 to obtain

    [1  2  1 | 0  1/2  0]
    [0  2  4 | 1   0   0]
    [0 −3 −2 | 0 −3/2  1]

In order to change the second column of the preceding matrix into the second column of
I, we multiply row 2 by 1/2 to obtain a 1 in the (2, 2) position. This operation produces

    [1  2  1 |  0   1/2  0]
    [0  1  2 | 1/2   0   0]
    [0 −3 −2 |  0  −3/2  1]

We now complete our work on the second column by adding −2 times row 2 to row 1 and
3 times row 2 to row 3. The result is

    [1 0 −3 | −1   1/2  0]
    [0 1  2 | 1/2   0   0]
    [0 0  4 | 3/2 −3/2  1]

Only the third column remains to be changed. In order to place a 1 in the (3, 3) position,
we multiply row 3 by 1/4; this operation yields

    [1 0 −3 | −1   1/2   0 ]
    [0 1  2 | 1/2   0    0 ]
    [0 0  1 | 3/8 −3/8  1/4]

Adding appropriate multiples of row 3 to rows 1 and 2 completes the process and gives

    [1 0 0 |  1/8 −5/8  3/4]
    [0 1 0 | −1/4  3/4 −1/2]
    [0 0 1 |  3/8 −3/8  1/4]
Thus,

    A⁻¹ = [ 1/8 −5/8  3/4]
          [−1/4  3/4 −1/2]
          [ 3/8 −3/8  1/4]
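A quick numerical verification (Python/numpy sketch; A is the matrix reconstructed from the row reduction above):

    import numpy as np

    A = np.array([[0, 2, 4],
                  [2, 4, 2],
                  [3, 3, 1]], dtype=float)
    A_inv = np.array([[ 1/8, -5/8,  3/4],
                      [-1/4,  3/4, -1/2],
                      [ 3/8, -3/8,  1/4]])
    print(np.allclose(A @ A_inv, np.eye(3)))     # True
    print(np.allclose(np.linalg.inv(A), A_inv))  # True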
Illustrative Example - 2. Determine whether the matrix

    A = [1 2  1]
        [2 1 −1]
        [1 5  4]

is invertible.
Solution. Using a strategy similar to the one used in the above example,
we attempt to use elementary row operations to transform

    (A | I) = [1 2  1 | 1 0 0]
              [2 1 −1 | 0 1 0]
              [1 5  4 | 0 0 1]

into a matrix of the form (I | B). We first add −2 times row 1 to row 2 and −1 times row
1 to row 3; we then add row 2 to row 3. The result,

    [1  2  1 |  1 0 0]       [1  2  1 |  1 0 0]
    [0 −3 −3 | −2 1 0]   ⇒   [0 −3 −3 | −2 1 0]
    [0  3  3 | −1 0 1]       [0  0  0 | −3 1 1]

is a matrix with a row whose first 3 entries are zeros. Therefore A is not invertible.
Definition 3. 4. 2. The m × n matrix

    [a₁₁ a₁₂ ... a₁ₙ]
    [a₂₁ a₂₂ ... a₂ₙ]
    [ ⋮   ⋮       ⋮ ]
    [aₘ₁ aₘ₂ ... aₘₙ]

is called the coefficient matrix of the system.
Note.
1. If we let x = (x₁, x₂, ..., xₙ)ᵀ and b = (b₁, b₂, ..., bₘ)ᵀ, then the system may be
rewritten as a single matrix equation Ax = b.
2. Considering a system of linear equations as a single matrix equation Ax = b, a
solution to the system is an n-tuple
s = (s₁, s₂, ..., sₙ)ᵀ ∈ Fⁿ
such that As = b. The set of all solutions to the system of equations is called the
solution set of the system. A system of equations is called consistent if its solution
set is nonempty; otherwise it is called inconsistent.
Illustrative Example - 2. Consider the system
2x₁ + 3x₂ + x₃ = 1
x₁ − x₂ + 2x₃ = 6.
Solution. In matrix form, the system can be written as

    [2  3 1] [x₁]   [1]
    [1 −1 2] [x₂] = [6]
             [x₃]

This system has many solutions, such as s = (−6, 2, 7)ᵀ and s = (8, −4, −3)ᵀ.
Illustrative Example - 3. Solve the system of equations x₁ + x₂ = 0 and x₁ + x₂ = 1.
Solution. In matrix form, the system can be written as

    [1 1] [x₁]   [0]
    [1 1] [x₂] = [1]

It is evident that this system has no solutions. Thus we see that a system of linear equations
can have one, many, or no solutions.
(i) Consider the homogeneous system
x₁ + 2x₂ + x₃ = 0
x₁ − x₂ − x₃ = 0
of two equations in three unknowns, with coefficient matrix

    A = [1  2  1]
        [1 −1 −1]

It is clear that rank(A) = 2. If K is the solution set of this system, then
dim(K) = 3 − 2 = 1. Thus any nonzero solution constitutes a basis for K.
For example, since (1, −2, 3)ᵀ is a solution to the given system,
{(1, −2, 3)ᵀ} is a basis for K.
Thus any vector in K is of the form t(1, −2, 3)ᵀ = (t, −2t, 3t)ᵀ, where t ∈ R.
(ii) Consider the system x 1 – 2x 2 + x 3 = 0 of one equation in three unknowns.
If A = (1 –2 1) is the coefficient matrix, then rank (A) = 1.
Note that (2, 1, 0)ᵀ and (−1, 0, 1)ᵀ are linearly independent vectors in K.
Thus they constitute a basis for K, so that
K = { t₁(2, 1, 0)ᵀ + t₂(−1, 0, 1)ᵀ : t₁, t₂ ∈ R }.
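A minimal Python sketch computing a basis of K for case (ii) numerically (using scipy's null_space, a choice of this illustration rather than the text's method):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, -2.0, 1.0]])       # coefficient matrix of x1 - 2x2 + x3 = 0
    N = null_space(A)                      # columns form an orthonormal basis of K
    print(N.shape)                         # (3, 2): dim(K) = 3 - rank(A) = 2
    print(np.allclose(A @ N, 0))           # True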
3. 5. Summary
2. In this unit, we defined the rank of a matrix. We then used elementary operations
to compute the rank of a matrix and of a linear transformation. We remarked that
an n × n matrix is invertible if and only if its rank is n. Since we know how to
compute the rank of any matrix, we can always test a matrix to determine whether it
is invertible. The section concludes with a procedure for computing the inverse of an
invertible matrix.
3. Solving a system of linear equations is probably the most important application of
linear algebra. The familiar method of elimination for solving systems of linear
equations involves the elimination of variables so that a simpler system can be
obtained. The technique by which the variables are eliminated utilizes three types of
operations: interchanging any two equations in the system, multiplying any equation
in the system by a nonzero constant, and adding a multiple of one equation to another.
3. 6. Keywords
3. 7. Assessment Questions
1. Define an elementary row operation. Explain the three types of operations with an
example.
Hint. See the section. 3. 2. 1.
2. Explain the role of augmented matrix for determine the inverse matrix of the given
matrix.
Hint. See the section 3. 3. 2
3. Find the inverse of the matrix

    A = [1 2 3]
        [1 3 4]
        [1 4 4]

Answer.

    A⁻¹ = [ 4 −4  1]
          [ 0 −1  1]
          [−1  2 −1]
3. 8. References
UNIT- 4: PROPERTIES OF DETERMINANT, COFACTOR
EXPANSIONS AND CRAMER’S RULE
STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. Properties of determinant
4. 2. 1. Fundamentals of determinants
4. 3. Cofactor Expansions
4. 3. 1. Minors and Cofactors
4. 4. Elementary operations and Cramer’s rule
4. 4. 1. The Adjoint Matrix
4. 5. Summary
4. 6. Keywords
4. 7. Assessment Questions
4. 8. References
UNIT- 4: PROPERTIES OF DETERMINANT, COFACTOR
EXPANSIONS AND CRAMER’S RULE
4. 0. Objectives
4. 1. Introduction
4. 2. 1. Fundamentals of determinants
Definition 4. 2. 1. The determinant of an n × n matrix A = [aᵢⱼ]ₙ, written det(A), is the
sum of all terms of the form (−1)ᵗ a_{1j₁}a_{2j₂}···a_{njₙ}, as j₁, j₂, ..., jₙ assumes all possible
permutations of the numbers 1, 2, ..., n, and the exponent t is the number of interchanges
used to carry j₁, j₂, ..., jₙ into the natural ordering 1, 2, ..., n.
The notations det(A) and |A| are used interchangeably. When the
elements of A = [aᵢⱼ]ₙ are written as a rectangular array, we have

    det(A) = |a₁₁ a₁₂ ... a₁ₙ|
             |a₂₁ a₂₂ ... a₂ₙ|
             | ⋮   ⋮       ⋮ |
             |aₙ₁ aₙ₂ ... aₙₙ|

and det(A) is uniquely determined by A.
We observe that there are n! terms in the sum det(A), since there are n! possible
orderings of 1, 2, ..., n. The determinant of an n × n matrix is referred to as an n × n
determinant, or a determinant of order n.
Note.
1. Determinants of order 1 and 2.
That is, |a₁₁| = a₁₁ and

    |a₁₁ a₁₂|
    |a₂₁ a₂₂| = a₁₁a₂₂ − a₁₂a₂₁.

2. Determinants of order 3.
By the definition,
det(A) = a₁₁a₂₂a₃₃ − a₁₁a₂₃a₃₂ + a₁₂a₂₃a₃₁ − a₁₂a₂₁a₃₃ − a₁₃a₂₂a₃₁ + a₁₃a₂₁a₃₂.
Examples - 1. Let

    A = [2  1  1]      B = [ 3 2  1]      C = [ 1 2 −1]
        [0  5 −2]          [−4 5 −1]          [−2 0  7]
        [1 −3  4]          [ 2 −3 4]          [ 3 7  0]

Then, by Note 2, we have det(A) = 21, det(B) = 81 and det(C) = 7.
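These values can be checked numerically; a minimal Python sketch (numpy), shown here for A and B:

    import numpy as np

    A = np.array([[2, 1, 1], [0, 5, -2], [1, -3, 4]])
    B = np.array([[3, 2, 1], [-4, 5, -1], [2, -3, 4]])
    print(round(np.linalg.det(A)), round(np.linalg.det(B)))   # 21 81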
The determinant can also be computed by columns:
det(A) = Σ₍ᵢ₎ (−1)ˢ a_{i₁1}a_{i₂2}···a_{iₙn},
where Σ₍ᵢ₎ denotes the sum over all possible permutations i₁, i₂, ..., iₙ of 1, 2, ..., n, and s is the
number of interchanges used to carry i₁, i₂, ..., iₙ into the natural ordering.
Proof. Let S = Σ₍ᵢ₎ (−1)ˢ a_{i₁1}a_{i₂2}···a_{iₙn}. Now both S and det(A) have n! terms. Except
possibly for sign, each term of S is a term of det(A), and each term of det(A) is a term of
S. Thus, S and det(A) consist of the same terms, with a possible difference in sign.
Consider a certain term (−1)ˢ a_{i₁1}a_{i₂2}···a_{iₙn} of S, and let (−1)ᵗ a_{1j₁}a_{2j₂}···a_{njₙ} be the corresponding
term in det(A). Then (−1)ˢ a_{i₁1}a_{i₂2}···a_{iₙn} can be carried into (−1)ᵗ a_{1j₁}a_{2j₂}···a_{njₙ} by s
interchanges of factors, since the permutation i₁, i₂, ..., iₙ can be changed into the natural
ordering 1, 2, ..., n by s interchanges of elements. This means that the natural ordering
1, 2, ..., n can be changed into the permutation j₁, j₂, ..., jₙ by s interchanges, since the
column subscripts have been interchanged each time the factors were interchanged. But
j₁, j₂, ..., jₙ can be carried into 1, 2, ..., n by t interchanges, by the definition of det(A).
Thus 1, 2, ..., n can be carried into j₁, j₂, ..., jₙ and then back into itself by (s + t)
interchanges. Since 1, 2, ..., n can be carried into itself only by an even number (zero) of
interchanges, (s + t) is even, because the number of interchanges used to carry a
permutation j₁, j₂, ..., jₙ of {1, 2, ..., n} into the natural ordering is either always odd or
always even. Therefore (−1)^{s+t} = 1 and (−1)ˢ = (−1)ᵗ. Now we have the corresponding terms in
det(A) and S with the same sign, and therefore det(A) = S.
Corollary. If B = A^T, then det(B) = det(A); that is, det(A^T) = det(A).
Proof. We have $\det(B) = \sum_{(j)} (-1)^t b_{1j_1}b_{2j_2}\cdots b_{nj_n}$ by the definition of det(B), and, since $b_{ij} = a_{ji}$,
$$\det(B) = \sum_{(j)} (-1)^t a_{j_1 1}a_{j_2 2}\cdots a_{j_n n} = \det(A),$$
by the preceding theorem.
Example - 2. Let $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 3 & 1 & 2 \end{pmatrix}$ and $A^T = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 3 & 3 & 2 \end{pmatrix}$ be two matrices.
Then det(A^T) = det(A) = 6.
If B results from A by interchanging rows r and s, then
$$\det(B) = \sum_{(j)} (-1)^t a_{1j_1}a_{2j_2}\cdots a_{sj_r}\cdots a_{rj_s}\cdots a_{nj_n} = -\det(A).$$
Example - 3. Let $A = \begin{pmatrix} 2 & -1 \\ 3 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 2 \\ 2 & -1 \end{pmatrix}$. Then det(B) = −det(A) = −7.
Suppose now that A is upper triangular, so that a_ij = 0 whenever i > j. A term $(-1)^t a_{1j_1}a_{2j_2}\cdots a_{nj_n}$ in the expression for det(A) can be nonzero only for 1 ≤ j_1, 2 ≤ j_2, …, n ≤ j_n. Since j_1, j_2, …, j_n must be a permutation, or rearrangement, of {1, 2, …, n}, we must have j_n = n, j_{n−1} = n − 1, …, j_2 = 2, j_1 = 1. Thus the only term of det(A) that can be nonzero is the product of the elements on the main diagonal of A. Since the permutation 1, 2, …, n has no inversions, the sign associated with it is +.
Therefore, det(A) = a_11 a_22 ⋯ a_nn.
Similarly, we can prove the lower triangular case.
4. 3. Cofactor Expansions
Definition 4. 3. 1. Let A = [a_ij]_n be an n × n matrix, and let M_ij be the (n−1) × (n−1) submatrix of A obtained by deleting the ith row and jth column of A. Then det(M_ij) is called the minor of a_ij. The cofactor A_ij of a_ij is defined by A_ij = (−1)^{i+j} det(M_ij).
Example - 1. Let $A = \begin{pmatrix} 3 & -1 & 2 \\ 4 & 5 & 6 \\ 7 & 1 & 2 \end{pmatrix}$. Then
$\det(M_{12}) = \begin{vmatrix} 4 & 6 \\ 7 & 2 \end{vmatrix} = 8 - 42 = -34$,
$\det(M_{23}) = \begin{vmatrix} 3 & -1 \\ 7 & 1 \end{vmatrix} = 3 + 7 = 10$, and
$\det(M_{31}) = \begin{vmatrix} -1 & 2 \\ 5 & 6 \end{vmatrix} = -6 - 10 = -16$.
Also we have A_12 = (−1)^{1+2} det(M_12) = (−1)(−34) = 34.
Theorem (Cofactor Expansion). If A = [a_ij]_n, then for any row number i, det(A) = a_i1 A_i1 + a_i2 A_i2 + ⋯ + a_in A_in.
Proof. Separate the terms of det(A) into groups: those that contain a_i1 as a factor in one group, those that contain a_i2 as a factor in another group, and so on for each column number. This separates the terms in det(A) into n groups with no overlapping, since each term contains exactly one factor from row i. In each of the terms containing a_i1, we factor out a_i1 and let F_i1 denote the remaining factor. Repeating this process for each of a_i1, a_i2, …, a_in in turn, we obtain det(A) = a_i1 F_i1 + a_i2 F_i2 + ⋯ + a_in F_in. To finish the proof, we need only show that F_ij = A_ij = (−1)^{i+j} det(M_ij), where det(M_ij) is the minor of a_ij. Consider first the case where i = 1 and j = 1.
We shall show that a_11 F_11 = a_11 det(M_11). Each term in F_11 was obtained by factoring a_11 from a term $(-1)^{t_1} a_{11}a_{2j_2}\cdots a_{nj_n}$ in the expansion of det(A). Thus each term of F_11 has the form $(-1)^{t_2} a_{2j_2}a_{3j_3}\cdots a_{nj_n}$, where t_2 is the number of interchanges used to carry j_2, …, j_n into 2, 3, …, n. Letting j_2, …, j_n range over all permutations of 2, 3, …, n, we see that each of F_11 and det(M_11) has (n−1)! terms. Now 1, j_2, …, j_n can be carried into the natural ordering by the same interchanges used to carry j_2, …, j_n into 2, 3, …, n; that is, we may take t_1 = t_2. This means that F_11 and det(M_11) have exactly the same terms, yielding F_11 = det(M_11) and a_11 F_11 = a_11 det(M_11).
Consider now an arbitrary a_ij. By (i−1) interchanges of the original row i with the adjacent row above, and then (j−1) interchanges of column j with the adjacent column on the left, we obtain a matrix B that has a_ij in the first row, first column position. Since the order of the remaining rows and columns of A was not changed, the minor of a_ij in B is the same det(M_ij) as it is in A. If a matrix results from another by interchanging two rows (columns), its determinant changes sign. So det(B) = (−1)^{i−1+j−1} det(A) = (−1)^{i+j} det(A), which gives det(A) = (−1)^{i+j} det(B). The sum of all the terms in det(B) that contain a_ij as a factor is a_ij det(M_ij), by our first case. Since det(A) = (−1)^{i+j} det(B), the sum of all the terms in det(A) that contain a_ij as a factor is (−1)^{i+j} a_ij det(M_ij). Thus a_ij F_ij = (−1)^{i+j} a_ij det(M_ij) = a_ij A_ij, and the theorem is proved.
Note. If A = [a_ij]_3, the expansion of det(A) about the 2nd row is given by
$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{21}A_{21} + a_{22}A_{22} + a_{23}A_{23} = -a_{21}(a_{12}a_{33} - a_{13}a_{32}) + a_{22}(a_{11}a_{33} - a_{13}a_{31}) - a_{23}(a_{11}a_{32} - a_{12}a_{31}).$$
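The cofactor expansion translates directly into a recursive procedure. The sketch below (our illustration; `det_by_cofactors` is a name we introduce) expands along the first row, exactly as in the theorem, and compares the result with NumPy's built-in determinant:

```python
import numpy as np

def det_by_cofactors(A):
    # Expand det(A) along the first row: det(A) = sum_j a_1j * A_1j,
    # where A_1j = (-1)**(1+j) det(M_1j) and M_1j deletes row 1, column j.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_by_cofactors(minor)
    return total

A = np.array([[3, -1, 2], [4, 5, 6], [7, 1, 2]])
print(det_by_cofactors(A), round(np.linalg.det(A)))  # both give -84
```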
4. 4. Elementary Operations and Cramer's Rule
6. If A is non-singular, then det(A) ≠ 0 and $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
4. 4. 1. The adjoint matrix
Definition 4. 4. 1. Let A = [a_ij] be an n × n matrix. The n × n matrix adj(A), called the adjoint of A, is the matrix whose (i, j)th element is the cofactor A_ji of a_ji. Thus
$$\operatorname{adj}(A) = \begin{pmatrix} A_{11} & \cdots & A_{n1} \\ \vdots & & \vdots \\ A_{1n} & \cdots & A_{nn} \end{pmatrix}.$$
Note.
1. The adjoint of A is formed by taking the transpose of the matrix of cofactors of
the elements of A.
2. It should be noted that the term adjoint has other meaning in linear algebra in
addition to its use in the above definitions.
3. Let A = [a_ij] be an n × n matrix. Then
(i) A adj(A) = (adj(A))A = det(A) I_n, where I_n is the identity matrix;
(ii) $A^{-1} = \dfrac{1}{\det(A)}\operatorname{adj}(A)$, provided det(A) ≠ 0.
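Property (i) can be demonstrated numerically. A minimal sketch (our addition; the helper `adjoint` follows Definition 4. 4. 1 directly):

```python
import numpy as np

def adjoint(A):
    # adj(A) is the transpose of the matrix of cofactors (Definition 4.4.1).
    n = A.shape[0]
    cof = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[3, -2, 1], [5, 6, 2], [1, 0, -3]], dtype=float)
print(np.round(A @ adjoint(A)))   # det(A) * I = -94 * I
print(round(np.linalg.det(A)))    # -94
```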
Our final results of this unit make an important connection between
determinants and the solutions of certain types of systems of linear equations. This
theorem presents a formula for the unknowns in terms of certain determinants. This
formula is commonly known as Cramer’s rule.
Theorem (Cramer's Rule). Let $\sum_{k=1}^{n} a_{ik}x_k = b_i$, i = 1, 2, …, n, be a system of n linear equations in n unknowns with coefficient matrix A = [a_ij]_n and det(A) ≠ 0. Then the system has the unique solution
$$x_j = \frac{\sum_{k=1}^{n} b_k A_{kj}}{\det(A)}, \qquad j = 1, 2, \ldots, n.$$
Proof. Substituting these values into the left member of the ith equation and using the fact that, for each k, $\sum_{j=1}^{n} a_{ij}A_{kj} = \delta_{ik}\det(A)$, where δ_ik is the Kronecker delta, we obtain
$$\sum_{j=1}^{n} a_{ij}x_j = \frac{1}{\det(A)} \sum_{k=1}^{n} \big( b_k(\delta_{ik}\det(A)) \big) = \frac{1}{\det(A)}\, b_i(\delta_{ii}\det(A)) = b_i.$$
Thus, the values $x_j = \dfrac{\sum_{k=1}^{n} b_k A_{kj}}{\det(A)}$ furnish a solution of the system.
To prove the uniqueness, suppose that x_j = y_j, j = 1, 2, …, n, represents any solution to the system. Then the ith equation $\sum_{k=1}^{n} a_{ik}y_k = b_i$ is satisfied for i = 1, 2, …, n. If we multiply both members of the ith equation by A_ij (j fixed) and form the sum of these equations, we find that
$$\sum_{i=1}^{n}\sum_{k=1}^{n} a_{ik}A_{ij}y_k = \sum_{i=1}^{n} b_i A_{ij}.$$
But, for each k, $\sum_{i=1}^{n} a_{ik}A_{ij} = \delta_{kj}\det(A)$. Thus $\sum_{k=1}^{n} \delta_{kj}\det(A)\,y_k = \sum_{i=1}^{n} b_i A_{ij}$, and $y_j = \dfrac{\sum_{i=1}^{n} b_i A_{ij}}{\det(A)}$.
Hence these y_j's are the same as the solution given in the statement of the theorem.
Note. The sum $\sum_{k=1}^{n} b_k A_{kj}$ is the determinant of the matrix obtained by replacing the jth column of A by the column of constants b_1, b_2, …, b_n.
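Cramer's rule is also straightforward to implement. The following sketch (our illustration, assuming NumPy) replaces the jth column of A by the column of constants, exactly as in the note above:

```python
import numpy as np

def cramer(A, b):
    # x_j = det(B_j) / det(A), where B_j is A with column j replaced by b.
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Bj = A.copy()
        Bj[:, j] = b
        x[j] = np.linalg.det(Bj) / d
    return x

A = np.array([[-2., 3., -1.], [1., 2., -1.], [-2., -1., 1.]])
b = np.array([1., 4., -3.])
print(cramer(A, b))   # [2. 3. 4.]
```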
4. 5. Summary
1. In this unit, the fundamentals of the theory of determinants and their properties are explored. They are necessary for the study of eigenvalues and eigenvectors of linear transformations. For this purpose we discussed some important theorems along with their proofs.
2. We described a method for evaluating the determinant of an n × n matrix which reduces the problem to the evaluation of determinants of matrices of order (n−1). We can then repeat the process for these (n−1) × (n−1) matrices until we reach 2 × 2 matrices. The elementary operations, which are based on the properties of the determinant, can be combined with cofactor expansion.
3. We established the important connection between determinants and the solutions of certain types of systems of linear equations, and presented Cramer's rule for finding the unknowns in terms of certain determinants.
4. 6. Keywords
4. 7. Assessment Questions
4. Let $A = \begin{pmatrix} 3 & -2 & 1 \\ 5 & 6 & 2 \\ 1 & 0 & -3 \end{pmatrix}$. Then compute adj(A) and A^{-1}.
Answer. $\operatorname{adj}(A) = \begin{pmatrix} -18 & -6 & -10 \\ 17 & -10 & -1 \\ -6 & -2 & 28 \end{pmatrix}$ and $A^{-1} = \dfrac{1}{94}\begin{pmatrix} 18 & 6 & 10 \\ -17 & 10 & 1 \\ 6 & 2 & -28 \end{pmatrix}$.
5. If $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$, find the following minors and cofactors:
(i) M 23 and A 23
(ii) M 13 and A 13
6. Use Cramer's rule to solve the system of linear equations
−2x_1 + 3x_2 − x_3 = 1
x_1 + 2x_2 − x_3 = 4
−2x_1 − x_2 + x_3 = −3.
Answer. det (A)= – 2, and x 1 =2, x 2 =3, x 3 = 4.
4. 8. References
BLOCK - II
Diagonalization and Inner Product Spaces
UNIT-1: EIGENVALUES AND EIGENVECTORS,
DIAGONALIZABILITY AND INVARIANT SUBSPACES
STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. Eigenvalues and Eigenvectors
1. 2. 1. Characteristic Polynomial
1. 3. Diagonalizability
1. 3. 1. Polynomial splits
1. 4. Invariant Subspaces
1. 4. 1. Cayley-Hamilton Theorem
1. 5. Summary
1. 6. Keywords
1. 7. Assessment Questions
1. 8. References
UNIT-1: EIGENVALUES AND EIGENVECTORS,
DIAGONALIZABILITY AND INVARIANT SUBSPACES
1. 0. Objectives
1. 1. Introduction
Note.
1. Let A be in M nxn (F). A nonzero vector v∈ F n is called an eigenvector of A if v is
an eigenvector of T A ; that is, if Av = λv for some scalar λ. The scalar λ is called
the eigenvalue of A corresponding to the eigenvector v.
2. A vector is an eigenvector of a matrix A if and only if it is an eigenvector of T A .
Likewise, a scalar λ is an eigenvalue of A if and only if it is an eigenvalue of T A .
Thus λ is an eigenvalue of A if and only if (A − λI_n) is not invertible, which is equivalent to the statement that det(A − λI_n) = 0.
1. 2. 1. Characteristic Polynomial
Definition 1. 2. 2. Let A ∈ M_{n×n}(F) be a matrix. The polynomial f(t) = det(A − tI_n) is called the characteristic polynomial of A.
Note.
1. Let T be a linear operator on an n-dimensional vector space V with ordered basis B. We define the characteristic polynomial f(t) of T to be the characteristic polynomial of A = [T]_B, which is independent of the choice of the ordered basis B.
Illustrative Example - 1. Find the eigenvalues of $A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix} \in M_{2\times 2}(\mathbf{R})$.
Solution. The characteristic polynomial of A is
$$\det(A - tI_2) = \det\begin{pmatrix} 1-t & 1 \\ 4 & 1-t \end{pmatrix} = t^2 - 2t - 3 = (t-3)(t+1).$$
It follows from Theorem 1. 2. 1 that the only eigenvalues of A are 3 and −1.
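The same eigenvalues can be obtained numerically. A minimal sketch (our addition, assuming NumPy):

```python
import numpy as np

A = np.array([[1, 1], [4, 1]], dtype=float)
print(np.roots([1, -2, -3]))    # roots of t^2 - 2t - 3: 3 and -1
print(np.linalg.eigvals(A))     # the eigenvalues 3 and -1 (order may vary)
```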
Illustrative Example - 2. Find the eigenvalues of the matrix $A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}$.
Solution. The characteristic polynomial of A is
$$\det(A - tI_3) = \det\begin{pmatrix} 1-t & 1 & 0 \\ 0 & 2-t & 2 \\ 0 & 0 & 3-t \end{pmatrix} = (1-t)(2-t)(3-t) = -(t-1)(t-2)(t-3).$$
Hence λ is an eigenvalue of A if and only if λ = 1, 2 or 3.
Note. Let T be a linear transformation on a vector space V over a field F. Then a vector v ∈ V is an eigenvector of T corresponding to the eigenvalue λ if and only if v ≠ 0 and v ∈ N(T − λI), the null space of T − λI.
Illustrative Example -3. Show that a square matrix A has 0 as eigenvalues if and only if
A is not invertible.
Solution. Let A be an n × n matrix over a field F.
First, let 0 be an eigenvalue of matrix A and non-zero vector X ∈ F n be the corresponding
eigenvector. Then,
AX = 0X ⇒ AX = 0
If possible, let A be an invertible matrix. Then,
AX = 0 ⇒ A – 1 (AX) = A –1 0
⇒ (A – 1A) X = 0
⇒ IX = 0
⇒ X=0
But X ≠ 0. So, we arrive at a contradiction.
Hence, A must be a non-invertible matrix.
Conversely, let A be a non-invertible matrix. Then the system of equations AX = 0 has non-trivial solutions.
So, there exists a non-zero vector X ∈ Fn such that AX = 0 ⇒AX = 0X ⇒ 0 is an
eigenvalue of A.
1. 3. Diagonalizability
Note that if D = [T]_B is a diagonal matrix, then for each vector v_j ∈ B we have T(v_j) = D_jj v_j; that is, each basis vector v_j is an eigenvector of T.
Illustrative Example - 1. Let $A = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}$, $v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, and $v_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$. Find [T_A]_B.
Solution. Since $T_A(v_1) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix} = -2\begin{pmatrix} 1 \\ -1 \end{pmatrix} = -2v_1$, v_1 is an eigenvector of T_A, and hence of A. Here λ_1 = −2 is the eigenvalue corresponding to v_1.
Further, $T_A(v_2) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 15 \\ 20 \end{pmatrix} = 5\begin{pmatrix} 3 \\ 4 \end{pmatrix} = 5v_2$, and so v_2 is an eigenvector of T_A, and hence of A, with the corresponding eigenvalue λ_2 = 5.
Note that B = {v_1, v_2} is an ordered basis for R² consisting of eigenvectors of both A and T_A, and therefore A and T_A are diagonalizable.
Therefore, by the above remark we have $[T_A]_B = \begin{pmatrix} -2 & 0 \\ 0 & 5 \end{pmatrix}$.
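The diagonal matrix [T_A]_B is exactly the change-of-basis computation P^{-1}AP, where the columns of P are the eigenvectors v_1 and v_2. A minimal sketch (our illustration, assuming NumPy):

```python
import numpy as np

A = np.array([[1, 3], [4, 2]], dtype=float)
P = np.array([[1, 3],
              [-1, 4]], dtype=float)       # columns are v1 = (1,-1), v2 = (3,4)
print(np.round(np.linalg.inv(P) @ A @ P))  # [[-2. 0.] [ 0. 5.]]
```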
Example - 2. Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \in M_{2\times 2}(\mathbf{R})$.
The characteristic polynomial of A (and hence of T_A) is
$$\det(A - tI) = \det\begin{pmatrix} 1-t & 1 \\ 1 & 1-t \end{pmatrix} = t(t-2),$$
and thus the eigenvalues of T_A are 0 and 2.
Since T_A is a linear operator on the two-dimensional vector space R², we conclude from the preceding Theorem 1. 2. 2 that T_A (and hence A) is diagonalizable.
1. 3. 1. Polynomial splits
Definition 1. 3. 2. A polynomial f(t) in P(F) splits over a field F if there are scalars
c, a 1 , a 2 , ……,a n (not necessarily distinct) in a field F such that
f(t) = c(t – a 1 ) (t – a 2 ) . . . . . . . (t – a n ).
Definition 1. 4. 1. Let T be a linear operator on a vector space V. A subspace W of V is
called a T-invariant subspace of V if T(W) ⊆ W. That is, if T(v) ∈ W for all v ∈ W.
Example - 3. Let T be the linear operator on R³ defined by T(a, b, c) = (−b + c, a + c, 3c). We determine the T-cyclic subspace generated by e_1 = (1, 0, 0).
Since T(e_1) = T(1, 0, 0) = (0, 1, 0) = e_2 and T²(e_1) = T(T(e_1)) = T(e_2) = (−1, 0, 0) = −e_1,
it follows that span({e_1, T(e_1), T²(e_1), …}) = span({e_1, e_2}) = {(s, t, 0): s, t ∈ R}.
Let T be a linear operator on a finite-dimensional vector space V, and let W denote the T-cyclic subspace of V generated by a nonzero vector v ∈ V, with dim(W) = k. Then {v, T(v), T²(v), …, T^{k−1}(v)} is a basis for W, and there exist scalars a_0, a_1, …, a_{k−1} such that
a_0 v + a_1 T(v) + ⋯ + a_{k−1} T^{k−1}(v) + T^k(v) = 0.
Moreover, g(t) = (−1)^k (a_0 + a_1 t + ⋯ + a_{k−1} t^{k−1} + t^k) is the characteristic polynomial of T_W. Combining these two equations yields
g(T)(v) = (−1)^k (a_0 I + a_1 T + ⋯ + a_{k−1} T^{k−1} + T^k)(v) = 0.
Now g(t) divides f(t); hence there exists a polynomial q(t) such that f(t) = q(t)g(t).
So f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0.
Example - 4. Let T be the linear operator on R² defined by T(a, b) = (a + 2b, −2a + b), and let B = {e_1, e_2}. Then $A = [T]_B = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$. The characteristic polynomial of T is, therefore,
$$f(t) = \det(A - tI) = \det\begin{pmatrix} 1-t & 2 \\ -2 & 1-t \end{pmatrix} = t^2 - 2t + 5.$$
It is easily verified that f(T) = T² − 2T + 5I is the zero operator T_0. Similarly,
$$f(A) = A^2 - 2A + 5I = \begin{pmatrix} -3 & 4 \\ -4 & -3 \end{pmatrix} + \begin{pmatrix} -2 & -4 \\ 4 & -2 \end{pmatrix} + \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
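This instance of the Cayley-Hamilton theorem is easy to verify by machine. A minimal sketch (our addition, assuming NumPy):

```python
import numpy as np

# f(A) = A^2 - 2A + 5I should be the zero matrix.
A = np.array([[1, 2], [-2, 1]], dtype=float)
print(A @ A - 2 * A + 5 * np.eye(2))   # [[0. 0.] [0. 0.]]
```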
1. 5. Summary
1. In this unit we have discussed, for matrices and determinants, the vectors (called characteristic vectors or eigenvectors) and scalars λ (called characteristic values or eigenvalues) such that Ax = λx.
2. Earlier we found a formula for the reflection of R² about the line y = 2x. The key to our success was to find a basis B₁ for which [T]_{B₁} is a diagonal matrix. This unit is concerned with the so-called diagonalization problem. A solution to the diagonalization problem leads naturally to the concepts of eigenvalue and eigenvector. Aside from the important role that these concepts play in the diagonalization problem, they also prove to be useful tools in the study of many non-diagonalizable operators.
3. An invariant subspace of a linear mapping T: V → V from some vector space V to itself is a subspace W of V such that T(W) is contained in W. An invariant subspace of T is also said to be T-invariant. With the help of an invariant subspace we study the Cayley-Hamilton theorem, which states that every square matrix over a commutative ring (including the real or complex field) satisfies its own characteristic equation.
1. 6. Keywords
Cayley-Hamilton theorem
Characteristic polynomial of a linear operator
Characteristic polynomial of a matrix
Diagonalizable linear operator
Diagonalizable matrix
Eigenspace of a linear operator
Eigenspace of a matrix
Eigenvalue of a linear operator
Eigenvalue of a matrix
Eigenvector of a matrix
Invariant subspace
Splits
Transition matrix
1. 7. Assessment Questions
1. 8. References
1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. P. R. Halmos– Linear Algebra Problem Book, No.16, The Mathematical Association
of America, 1995.
3. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.
UNIT-2: INNER PRODUCT SPACE
STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. Inner Products Space
2. 2. 1. Norms
2. 2. 2. Orthogonal and Orthonormal
2. 3. The Gram-Schmidt Orthogonalization Process
2. 4. Orthogonal Complements
2. 5. Summary
2. 6. Keywords
2. 7. Assessment Questions
2. 8. References
UNIT-2: INNER PRODUCT SPACE
2. 0. Objectives
2. 1. Introduction
Inner product space is a vector space or function space in which an operation for
combining two vectors or functions (whose result is called an inner product) is defined
and has certain properties. In this unit we also study the Gram-Schmidt process,
which takes a finite, linearly independent set S = {v 1 , …, v k } for k ≤ n and generates
an orthogonal set S′ = {u 1 , …, u k } that spans the same k-dimensional subspace of
Rn as S. Finally, in this unit we study the orthogonal complement W⊥ of a subspace W of an inner product space V, which is the set of all vectors in V that are orthogonal to every vector in W.
Definition 2. 2. 1. Let V be a vector space over F. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F, denoted 〈x, y〉, such that for all x, y and z in V and all c in F, the following hold:
1. 〈x + z, y〉 = 〈x, y〉 + 〈z, y〉.
2. 〈cx, y〉 = c〈x, y〉.
3. $\langle x, y\rangle = \overline{\langle y, x\rangle}$, where the bar denotes complex conjugation.
4. 〈x, x〉 > 0 if x ≠ 0.
Note that (3) reduces to 〈x, y〉 = 〈y, x〉 if F = R. Conditions (1) and (2) simply require that the inner product be linear in the first component.
It is easily shown that if a_1, a_2, …, a_n ∈ F and y, v_1, v_2, …, v_n ∈ V, then
$$\Big\langle \sum_{i=1}^{n} a_i v_i,\; y \Big\rangle = \sum_{i=1}^{n} a_i \langle v_i, y\rangle.$$
For example, for x = (a_1, a_2, …, a_n) and y = (b_1, b_2, …, b_n) in Fⁿ, define
$$\langle x, y\rangle = \sum_{i=1}^{n} a_i \overline{b_i}.$$
The verification that 〈⋅ , ⋅〉 satisfies conditions (1) through (4) is easy. For example, if z = (c_1, c_2, …, c_n), we have for (1)
$$\langle x + z, y\rangle = \sum_{i=1}^{n}(a_i + c_i)\overline{b_i} = \sum_{i=1}^{n} a_i\overline{b_i} + \sum_{i=1}^{n} c_i\overline{b_i} = \langle x, y\rangle + \langle z, y\rangle.$$
Thus, for x = (1 + i, 4) and y = (2 − 3i, 4 + 5i) in C²,
〈x, y〉 = (1 + i)(2 + 3i) + 4(4 − 5i) = 15 − 15i = 15(1 − i).
The inner product in the above example is called the standard inner product on Fⁿ. When F = R the conjugations are not needed, and in earlier courses this standard inner product is usually called the dot product and is denoted by x · y instead of 〈x, y〉.
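The conjugation in the second argument is the only point where the complex case differs from the real dot product. A minimal sketch (our illustration; note that NumPy's `vdot` conjugates its first argument, so the arguments are swapped):

```python
import numpy as np

x = np.array([1 + 1j, 4])
y = np.array([2 - 3j, 4 + 5j])
print(np.sum(x * np.conj(y)))   # (15-15j), as computed above
print(np.vdot(y, x))            # same value; vdot conjugates its first argument
```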
Example - 1. If 〈x, y〉 is any inner product on a vector space V and r > 0, we may define
another inner product by the rule 〈x, y〉1 = r 〈x, y〉. If r ≤ 0, then (4) would not hold.
Example - 2. Let V = C([0, 1]), the vector space of real-valued continuous functions on [0, 1]. For f, g ∈ V, define $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$. Since the preceding integral is linear in f, (1) and (2) are immediate, and (3) is trivial. If f ≠ 0, then f² is bounded away from zero on some subinterval of [0, 1], and hence $\langle f, f\rangle = \int_0^1 [f(t)]^2\,dt > 0$.
92
Definition 2. 2. 2. Let A ∈ M_{m×n}(F). We define the conjugate transpose or adjoint of A to be the n × m matrix A* such that $(A^*)_{ij} = \overline{A_{ji}}$ for all i, j.
Example - 3. Let $A = \begin{pmatrix} i & 1+2i \\ 2 & 3+4i \end{pmatrix}$. Then $A^* = \begin{pmatrix} -i & 2 \\ 1-2i & 3-4i \end{pmatrix}$.
A vector space V over a field F endowed with a specific inner product is called an
inner product space. If F = C, we call V a complex inner product space, whereas if
F = R, we call V a real inner product space. It is clear that if V has an inner product
〈x, y〉 and W is a subspace of V, then W is also an inner product space when the same
function 〈x, y〉 is restricted to the vectors x, y ∈ W.
Note. Let V be an inner product space. Then for x, y, z ∈ V and c ∈ F, the following statements are true.
(i) 〈x, y + z〉 = 〈x, y〉 + 〈x, z〉.
(ii) $\langle x, cy\rangle = \bar{c}\langle x, y\rangle$.
(iii) 〈x, 0〉 = 〈0, x〉 = 0.
(iv) 〈x, x〉 = 0 if and only if x = 0.
(v) If 〈x, y〉 = 〈x, z〉 for all x ∈ V, then y = z.
2. 2. 1. Norms
Definition 2. 2. 3. Let V be an inner product space. For x ∈ V, we define the norm or length of x by $\|x\| = \sqrt{\langle x, x\rangle}$.
Theorem 2. 2. 1. Let V be an inner product space over F. Then for all x, y ∈ V and c ∈ F, the following statements are true.
(i) ‖cx‖ = |c| · ‖x‖.
(ii) ‖x‖ = 0 if and only if x = 0. In any case, ‖x‖ ≥ 0.
(iii) (Cauchy–Schwarz Inequality) |〈x, y〉| ≤ ‖x‖ · ‖y‖.
(iv) (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
2. 2. 2. Orthogonal and Orthonormal
Let V be an inner product space. Vectors x and y in V are orthogonal if 〈x, y〉 = 0, and a subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A vector x with ‖x‖ = 1 is called a unit vector; S is orthonormal if it is orthogonal and consists entirely of unit vectors. The process of multiplying a nonzero vector by the reciprocal of its length is called normalizing.
Example - 5. In F³, {(1, 1, 0), (1, −1, 1), (−1, 1, 2)} is an orthogonal set of nonzero vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we obtain the orthonormal set
$$\Big\{ \tfrac{1}{\sqrt{2}}(1, 1, 0),\; \tfrac{1}{\sqrt{3}}(1, -1, 1),\; \tfrac{1}{\sqrt{6}}(-1, 1, 2) \Big\}.$$
Illustrative Example - 6. Let u and v be two vectors in an inner product space V such that ‖u + v‖ = ‖u‖ + ‖v‖. Prove that u and v are linearly dependent vectors. Give an example to show that the converse of this statement is not true.
Illustrative Example - 7. Let V be a real inner product space and u, v ∈ V. Then prove that
(i) ‖u + v‖² − ‖u − v‖² = 4〈u, v〉, and
(ii) ‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²).
Solution. We have
‖𝑢 + 𝑣‖2 = 〈𝑢 + 𝑣 , 𝑢 + 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢 + 𝑣〉 + 〈𝑣 , 𝑢 + 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢〉 + 〈𝑢 , 𝑣〉 + 〈𝑣 , 𝑢 〉 + 〈𝑣 , 𝑣〉
⇒ ‖𝑢 + 𝑣‖2 = 〈𝑢 , 𝑢〉 + 2 〈𝑢 , 𝑣〉 + 〈𝑣 , 𝑣〉 , since 〈𝑢 , 𝑣〉 = 〈𝑣 , 𝑢〉
⇒ ‖𝑢 + 𝑣‖2 = ‖𝑢‖2 + ‖𝑣‖2 + 2 〈𝑢 , 𝑣〉 → (1)
‖𝑢 − 𝑣‖2 = 〈𝑢 − 𝑣 , 𝑢 − 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢 − 𝑣〉 + 〈−𝑣, 𝑢 − 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 + 〈𝑢, −𝑣〉 + 〈 −𝑣, 𝑢〉 + 〈−𝑣, −𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 − 〈𝑢, 𝑣〉 − 〈 𝑣, 𝑢〉 + (−1)2 〈𝑣, 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = 〈𝑢, 𝑢〉 − 2 〈𝑢, 𝑣〉 + 〈𝑣, 𝑣〉
⇒ ‖𝑢 − 𝑣‖2 = ‖𝑢‖2 − 2 〈𝑢, 𝑣〉 + ‖𝑣‖2 → (2)
On subtracting (2) from (1), we get
‖𝑢 + 𝑣‖2 − ‖𝑢 − 𝑣‖2 = 4 〈𝑢, 𝑣〉
On adding (1) and (2), we get
‖𝑢 + 𝑣‖2 + ‖𝑢 − 𝑣‖2 = 2(‖𝑢‖2 + ‖𝑣‖2 )
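Both identities are easy to test numerically for real vectors. A minimal sketch (our addition, assuming NumPy):

```python
import numpy as np

# Check (1) - (2) and (1) + (2) for random real vectors u, v.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(3), rng.standard_normal(3)
lhs_diff = np.linalg.norm(u + v) ** 2 - np.linalg.norm(u - v) ** 2
lhs_sum = np.linalg.norm(u + v) ** 2 + np.linalg.norm(u - v) ** 2
print(np.isclose(lhs_diff, 4 * np.dot(u, v)))                  # True
print(np.isclose(lhs_sum, 2 * (np.dot(u, u) + np.dot(v, v))))  # True
```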
Example -1. The standard ordered basis for F n is an orthonormal basis for F n .
Example - 2. The set $\Big\{ \big(\tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\big), \big(\tfrac{2}{\sqrt{5}}, \tfrac{-1}{\sqrt{5}}\big) \Big\}$ is an orthonormal basis for R².
so $a_j = \dfrac{\langle y, v_j\rangle}{\|v_j\|^2}$, and the result follows.
The next couple of results follow immediately from the above theorem.
Corollary - 1. If, in addition to the hypotheses of the above theorem, S is orthonormal and y ∈ span(S), then $y = \sum_{i=1}^{k} \langle y, v_i\rangle v_i$.
If V possesses a finite orthonormal basis, then Corollary - 1 allows us to compute the coefficients in a linear combination very easily.
If n = 1, then the theorem is proved by taking S′_1 = S_1; that is, v_1 = w_1 ≠ 0. Assume then that the set S′_{k−1} = {v_1, v_2, …, v_{k−1}} with the desired properties has been constructed by the repeated use of (1). We show that the set S′_k = {v_1, v_2, …, v_{k−1}, v_k} also has the desired properties, where v_k is obtained from S′_{k−1} by (1).
If v_k = 0, then (1) implies that w_k ∈ span(S′_{k−1}) = span(S_{k−1}), which contradicts the assumption that S_k is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that
$$\langle v_k, v_i\rangle = \langle w_k, v_i\rangle - \sum_{j=1}^{k-1} \frac{\langle w_k, v_j\rangle}{\|v_j\|^2}\langle v_j, v_i\rangle = \langle w_k, v_i\rangle - \frac{\langle w_k, v_i\rangle}{\|v_i\|^2}\|v_i\|^2 = 0,$$
since 〈v_j, v_i〉 = 0 if i ≠ j by the induction assumption that S′_{k−1} is orthogonal. Hence S′_k is an orthogonal set of nonzero vectors.
Now, by (1), we have that span(S′_k) ⊆ span(S_k). But by Corollary - 2 to Theorem 2. 3. 1, S′_k is linearly independent, so dim(span(S′_k)) = dim(span(S_k)) = k. Therefore span(S′_k) = span(S_k).
The construction of {v 1 , v 2 , ……,v n } by the use of theorem 2. 3. 2 is called the Gram-
Schmidt Orthogonalization Process.
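The process of Theorem 2. 3. 2 can be coded almost verbatim. The sketch below (our illustration; `gram_schmidt` is a name we introduce) reproduces the orthogonal basis of the exercise at the end of this unit:

```python
import numpy as np

def gram_schmidt(ws):
    # v_k = w_k minus its projections onto the previously constructed v_j.
    vs = []
    for w in ws:
        v = w - sum((np.dot(w, u) / np.dot(u, u)) * u for u in vs)
        vs.append(v)
    return vs

ws = [np.array([1., 0., 1.]), np.array([1., 0., -1.]), np.array([0., 3., 4.])]
vs = gram_schmidt(ws)
print(vs)                                    # (1,0,1), (1,0,-1), (0,3,0)
print([v / np.linalg.norm(v) for v in vs])   # the orthonormal basis
```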
2. 4. Orthogonal Complement
Definition 2. 4. 1. Let S be a nonempty subset of an inner product space V. Then S⊥ denotes the set of all vectors in V that are orthogonal to every vector in S; that is, S⊥ = {x ∈ V: 〈x, y〉 = 0 for all y ∈ S}. The set S⊥ is called the orthogonal complement of S.
$$\langle z, v_j\rangle = \Big\langle y - \sum_{i=1}^{k} \langle y, v_i\rangle v_i,\; v_j \Big\rangle = \langle y, v_j\rangle - \sum_{i=1}^{k} \langle y, v_i\rangle\langle v_i, v_j\rangle = \langle y, v_j\rangle - \langle y, v_j\rangle = 0.$$
To show the uniqueness of u and z, suppose that y = u + z = u′ + z′, where u′ ∈ W and z′ ∈ W⊥. Then u − u′ = z′ − z ∈ W ∩ W⊥ = {0}. Therefore u = u′ and z = z′.
Theorem 2. 4. 2. Suppose that S = {v_1, v_2, …, v_k} is an orthonormal set in an n-dimensional inner product space V. Then
(i) S can be extended to an orthonormal basis { v 1 , v 2 , ……,v k , v k + 1 ,……, v n } for
V.
(ii) If W = span(S), then S 1 = { v k + 1 , v k + 2 , ……,v n } is an orthonormal basis for
W⊥.
(iii) If W is any subspace of V, then dim (V) = dim (W) + dim (W⊥).
Proof. (i) Any generating set for the n-dimensional space V contains at least n vectors, and a generating set for V that contains exactly n vectors is a basis for V; hence S can be extended to an ordered basis S′ = {v_1, v_2, …, v_k, w_{k+1}, …, w_n} for V.
Now apply the Gram-Schmidt Orthogonalization Process to S′. The first k vectors resulting from this process are the vectors in S, and this new set spans V.
Normalizing the last (n – k) vectors of this set produces an orthonormal set that spans V.
The result follows.
(ii) Because S_1 is a subset of a basis, it is linearly independent. Since S_1 is clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V, we have $x = \sum_{i=1}^{n}\langle x, v_i\rangle v_i$; if x ∈ W⊥, then 〈x, v_i〉 = 0 for 1 ≤ i ≤ k, so that $x = \sum_{i=k+1}^{n}\langle x, v_i\rangle v_i \in \operatorname{span}(S_1)$.
Example - 1. Let W = span({e_1, e_2}) in F³. Then x = (a, b, c) ∈ W⊥ if and only if 0 = 〈x, e_1〉 = a and 0 = 〈x, e_2〉 = b; thus W⊥ = {(0, 0, c): c ∈ F} = span({e_3}).
Illustrated Example - 2. Let C[−π, π] be the inner product space of all continuous functions defined on [−π, π] with the inner product defined by $\langle f, g\rangle = \int_{-\pi}^{\pi} f(t)g(t)\,dt$. Show that sin t and cos t are orthogonal in this space.
Solution. We have
$$\langle \sin t, \cos t\rangle = \int_{-\pi}^{\pi} \sin t\cos t\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \sin 2t\,dt = \frac{-1}{4}\big[\cos 2t\big]_{-\pi}^{\pi} = \frac{-1}{4}(1 - 1) = 0.$$
Thus, sin t and cos t are orthogonal functions in the inner product space C[–π, π].
Illustrated Example - 3. Let u = (−1, 4, −3) be a vector in the inner product space R³ with the standard inner product. Find a basis of the subspace u⊥ of R³.
Solution. We have
𝑢⊥ = { 𝑣 ∈ 𝑅 3 : 〈𝑢, 𝑣〉 = 0} or , 𝑢⊥ = { 𝑣 = (𝑥, 𝑦, 𝑧)∈ 𝑅 3 : − 𝑥 + 4𝑦 − 3𝑧 = 0}
Thus, u⊥ consists of all vectors v = (x, y, z) such that −x + 4y − 3z = 0. In this equation there are two free variables. Taking y and z as the free variables, we find that
y = 1, z = 1 ⇒ x = 1; y = 0, z = 1 ⇒ x = −3.
Thus, v_1 = (1, 1, 1) and v_2 = (−3, 0, 1) are two independent solutions of −x + 4y − 3z = 0. Hence {v_1 = (1, 1, 1), v_2 = (−3, 0, 1)} forms a basis for u⊥.
Writing $u = v - \sum_{i=1}^{n}\langle v, v_i\rangle v_i$, we have
$$\|u\|^2 = \langle v, v\rangle - \sum_{j=1}^{n}\big\langle v, \langle v, v_j\rangle v_j\big\rangle - \sum_{i=1}^{n}\big\langle \langle v, v_i\rangle v_i, v\big\rangle + \sum_{i=1}^{n}\sum_{j=1}^{n}\big\langle \langle v, v_i\rangle v_i, \langle v, v_j\rangle v_j\big\rangle,$$
which equals 0 under the hypothesis, so that
$$v - \sum_{i=1}^{n}\langle v, v_i\rangle v_i = 0 \quad\Rightarrow\quad v = \sum_{i=1}^{n}\langle v, v_i\rangle v_i.$$
2. 5. Summary
1. Most applications of mathematics are involved with the concept of measurement, and hence with the magnitude or relative size of various quantities. So it is not surprising that the fields of real and complex numbers, which have a built-in notion of distance, should play a special role. We assume that all vector spaces are over the field F, where F denotes either R or C. In this unit we study a special class of vector spaces which is very rich in geometry. Consider V = R³. For any a = (a_1, a_2, a_3) ∈ V, the length is $|a| = \sqrt{a_1^2 + a_2^2 + a_3^2}$. Further, given b = (b_1, b_2, b_3) ∈ V, the dot product (or inner product) a · b = a_1b_1 + a_2b_2 + a_3b_3 is well known. The angle θ between a and b is derived from the equation $\cos\theta = \dfrac{a\cdot b}{\|a\|\,\|b\|}$. Here, we carry the idea of distance or length into vector spaces via a much richer structure, the so-called inner product space structure.
2. In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt
process is a method for orthonormalising a set of vectors in an inner product space,
most commonly the Euclidean space Rn. The Gram–Schmidt process takes a finite,
linearly independent set S = {v 1 , …, v k } for k ≤ n and generates an orthogonal set S′ =
{u 1 , …, u k } that spans the same k-dimensional subspace of Rn as S.
3. Also, the orthogonal complement W⊥ of a subspace W of an inner product space V is
the set of all vectors in V that are orthogonal to every vector in W .
2. 6. Keywords
2. 7. Assessment Questions
3. Let V be an inner product space with 〈 , 〉 as an inner product and u, v in V. Then show that u = v if and only if 〈u, w〉 = 〈v, w〉 for all w in V.
Hint. Use the properties of inner product space.
4. Let V be a finite dimensional inner product space. Show that V has an
orthonormal set as a basis.
Hint. See Theorem 2. 4. 2-(i)
5. Apply the Gram-Schmidt orthogonalization process to the basis B = {(1, 0, 1), (1, 0, −1), (0, 3, 4)} of the inner product space R³ to find an orthogonal and an orthonormal basis of R³.
Answer. $u_1 = \big(\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\big)$, $u_2 = \big(\tfrac{1}{\sqrt{2}}, 0, \tfrac{-1}{\sqrt{2}}\big)$ and u_3 = (0, 1, 0).
2. 8. References
1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Second Edition, PHI, 1978.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.
UNIT-3: THE ADJOINT, NORMAL, SELF-ADJOINT, UNITARY
AND ORTHOGONAL OPERATORS, ORTHOGONAL
PROJECTIONS AND THE SPECTRAL THEOREM
STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. The Adjoint of a linear operator
3. 2. 1. Properties of Adjoint operators and their Matrices
3. 3. The Normal and Self-Adjoint operators
3. 3. 1. Normal linear operator
3. 3. 2. Self-Adjoint operator
3. 4. Unitary and Orthogonal operators
3. 4. 1. Unitary linear operator
3. 4. 2. Orthogonal operator
3. 5. Orthogonal Projections
3. 5. 1. The Spectral Theorem
3. 6. Summary
3. 7. Keywords
3. 8. Assessment Questions
3. 9. References
UNIT-3: THE ADJOINT, NORMAL, SELF-ADJOINT, UNITARY
AND ORTHOGONAL OPERATORS, ORTHOGONAL
PROJECTIONS AND THE SPECTRAL THEOREM
3. 0. Objectives
3. 1. Introduction
This unit investigates the space A(V) of linear operators T on an inner product space V. Adjoints of operators generalize conjugate transposes of square matrices to
(possibly) infinite-dimensional situations. The adjoint of an operator A is also sometimes
called the Hermitian conjugate (after Charles Hermite) of A. So, most of the results on
unitary spaces are identical to the corresponding results on inner product space. In this
unit, we learn about the normal and self - adjoint operators, unitary and orthogonal
operators and their matrices, orthogonal projections and the spectral theorem, bilinear and
quadratic forms.
3. 2. The Adjoint of a linear operator
Theorem 3. 2. 1. Let V be a finite- dimensional inner product space over a field F, and
let g: V → F be a linear transformation. Then there exists a unique vector y ∈ V such
that g(x) = 〈 x, y 〉 for all x ∈ V.
Proof. Let B = {v_1, v_2, …, v_n} be an orthonormal basis for V, and let $y = \sum_{i=1}^{n} \overline{g(v_i)}\, v_i$. Define h: V → F by h(x) = 〈x, y〉, which is linear.
Since g and h both agree on B, we have g = h, due to the fact that for vector spaces V and W, where V has a finite basis {v_1, v_2, …, v_n}, if U, T: V → W are linear and U(v_i) = T(v_i) for i = 1, 2, …, n, then U = T.
To show that y is unique, suppose that g(x) = 〈 x, y1 〉 for all x.
Then 〈 x, y 〉 = 〈 x, y1 〉 for all x, since if 〈 x, y 〉 = 〈 x, z 〉 for all x, y, z ∈ V , then y = z.
Hence, we have y = y1.
Theorem 3. 2. 2. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Then there exists a unique linear operator T*: V → V such that 〈T(x), y〉 = 〈x, T*(y)〉 for all x, y ∈ V.
Proof. Let y ∈ V. Define g: V → F by g(x) = 〈T(x), y〉 for all x ∈ V.
First, we show that g is linear.
Let x 1 , x 2 ∈ V and c ∈ F. Then g( cx 1 + x 2 ) = 〈𝑇(𝑐𝑥1 + 𝑥2 ), 𝑦〉
= 〈𝑐 𝑇(𝑥1 ) + 𝑇(𝑥2 ), 𝑦〉
= 𝑐 〈𝑇 (𝑥1 ), 𝑦〉 + 〈𝑇 (𝑥2 ), 𝑦〉
= 𝑐𝑔(𝑥1 ) + 𝑔(𝑥2 ).
Hence g is linear.
Now, we apply the Theorem 3. 2. 1, to obtain a unique vector y1 ∈ V such that
g(x) = 〈𝑥 , 𝑦1 〉. That is 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑦1 〉 for all x ∈ V.
Defining T*: V → V by T*(y) = y1, we have 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑇 ∗ ( 𝑦)〉.
To show that T* is linear, we consider y 1 , y 2 ∈ V and c ∈ F. Then for any x ∈ V, we have
〈𝑥 , 𝑇 ∗ (𝑐𝑦1 + 𝑦2 )〉 = 〈𝑇(𝑥), 𝑐𝑦1 + 𝑦2 〉
= 𝑐̅〈𝑇(𝑥), 𝑦1 〉 + 〈𝑇(𝑥), 𝑦2 〉
= 𝑐̅〈𝑥, 𝑇 ∗ (𝑦1 )〉 + 〈𝑥, 𝑇 ∗ (𝑦2 )〉
= 〈𝑥, 𝑐 𝑇 ∗ (𝑦1 ) + 𝑇 ∗ (𝑦2 ) 〉
Since x is arbitrary, we have T*(cy 1 + y 2 ) = c T*(y 1 ) + T*(y 2 ).
Recall that in an inner product space, if 〈x, y〉 = 〈x, z〉 for all x ∈ V, then y = z.
Finally, we show that T* is unique.
Suppose that U: V → V is linear and that it satisfies 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥, 𝑈(𝑦)〉 for all
x, y ∈ V. Then 〈𝑥 , 𝑇 ∗ ( 𝑦)〉 = 〈𝑥, 𝑈(𝑦)〉 for all x, y ∈ V, so T* = U, this completes the
proof.
Note. The linear operator T* described in the above theorem is called the adjoint of the
operator T. Thus T* is the unique operator on V satisfying 〈𝑇 (𝑥), 𝑦〉 = 〈𝑥 , 𝑇 ∗ ( 𝑦)〉 for
all x , y ∈ V.
If V is a finite-dimensional inner product space with an orthonormal basis B = {v_1, v_2, …, v_n}, T is a linear operator on V, and A = [T]_B, then [T*]_B = A*; that is, the matrix of T* is the conjugate transpose of the matrix of T.
Note.
1. Let A ∈ M m× n (F), x∈F n , and y ∈ F m . Then 〈𝐴𝑥 , 𝑦〉𝑚 = 〈𝑥 , 𝐴∗ 𝑦〉𝑛 .
2. Let A ∈ M m× n (F). Then rank (A*A) = rank (A).
3. If A is an m× n matrix such that rank (A) = n, then A*A is invertible.
Illustrative Example - 3. Find the adjoint of the linear transformation T: R² → R² given by T(x, y) = (x + 2y, x − y) for all (x, y) ∈ R².
Solution. Clearly B = {e_1, e_2} is an orthonormal basis of R², and
T(e_1) = T(1, 0) = (1, 1) = 1e_1 + 1e_2,
T(e_2) = T(0, 1) = (2, −1) = 2e_1 − 1e_2.
The matrix A that represents T relative to the standard basis B is therefore
$$A = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix} \quad\Rightarrow\quad A^T = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}.$$
Since the field is R, the adjoint T* is represented by the transpose of A. Hence
$$T^*(X) = A^T X = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ 2x - y \end{pmatrix},$$
that is, T*(x, y) = (x + y, 2x − y).
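The defining property 〈T(x), y〉 = 〈x, T*(y)〉 can be spot-checked numerically. A minimal sketch (our addition; over R the conjugate transpose is just the transpose):

```python
import numpy as np

A = np.array([[1, 2], [1, -1]], dtype=float)   # T(x, y) = (x + 2y, x - y)
A_star = A.conj().T                            # matrix of T*
x, y = np.array([1., 2.]), np.array([3., -1.])
print(np.isclose(np.dot(A @ x, y), np.dot(x, A_star @ y)))  # True
```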
Note.
1. A subspace W of a vector space V is said to be T-invariant if T(W) is contained in W; the restriction T_W: W → W is defined by T_W(x) = T(x).
2. A polynomial is said to split if it factors into linear polynomials.
3. Let T be a linear operator on an n-dimensional inner product space V. Suppose that the characteristic polynomial of T splits. Then there exists an orthonormal basis B for V such that the matrix [T]_B is upper triangular. This is known as Schur's Theorem.
Example - 4. Suppose that A is a real skew-symmetric matrix, that is, A^T = −A. Then A is normal, because both AA^T and A^T A are equal to −A².
0 = ‖U(x)‖ = ‖U*(x)‖ = ‖(T* − λ̄I)(x)‖ = ‖T*(x) − λ̄x‖
(iv) Let λ_1 and λ_2 be distinct eigenvalues of T with corresponding eigenvectors x_1 and x_2. Then, using (iii), we have
$$\lambda_1\langle x_1, x_2\rangle = \langle \lambda_1 x_1, x_2\rangle = \langle T(x_1), x_2\rangle = \langle x_1, T^*(x_2)\rangle = \langle x_1, \bar{\lambda}_2 x_2\rangle = \lambda_2\langle x_1, x_2\rangle.$$
Since λ_1 ≠ λ_2, we conclude that 〈x_1, x_2〉 = 0.
Definition 3. 3. 2. Let T be a linear operator on an inner product space V. Then T is
known as self-adjoint (Hermitian) if T = T*.
An n × n real or complex matrix A is self-adjoint (Hermitian) if A = A*.
Proof. (i) Suppose that T(x) = λx for x ≠ 0. Because a self-adjoint operator is also normal, T is a normal operator on the inner product space V, and if x is an eigenvector of T, then x is also an eigenvector of T*: in fact, if T(x) = λx, then T*(x) = λ̄x. Thus λx = T(x) = T*(x) = λ̄x, so λ = λ̄; that is, λ is real.
(ii) Let dim(V) = n, let B be an orthonormal basis for V, and let A = [T]_B. Then A is self-adjoint. Let T_A be the linear operator on Cⁿ defined by T_A(x) = Ax for all x ∈ Cⁿ.
Note that T_A is self-adjoint, because [T_A]_γ = A, where γ is the standard ordered (orthonormal) basis for Cⁿ. So, by (i), the eigenvalues of T_A are real. By the fundamental theorem of algebra, the characteristic polynomial of T_A splits into factors of the form (t − λ). Since each λ is real, the characteristic polynomial splits over R. But T_A has the same characteristic polynomial as A, which has the same characteristic polynomial as T. Therefore the characteristic polynomial of T splits.
But A* = [T]_B* = [T*]_B = [T]_B = A. So A and A* are both upper triangular, and therefore
3. 4. Unitary and Orthogonal Operator
Note. In the infinite-dimensional case, an operator satisfying the preceding norm requirement is generally called an isometry. If, in addition, the operator is onto (the condition guarantees that it is one-to-one), then the operator is called a unitary or orthogonal operator.
Example - 1. Let H be the inner product space of continuous complex-valued functions on [0, 2π] with $\langle f, g\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(t)\overline{g(t)}\,dt$, and let h ∈ H satisfy |h(t)| = 1 for all t. Define the linear operator T on H by T(f) = hf. Then
$$\|T(f)\|^2 = \|hf\|^2 = \frac{1}{2\pi}\int_0^{2\pi} h(t)f(t)\overline{h(t)f(t)}\,dt = \frac{1}{2\pi}\int_0^{2\pi} |h(t)|^2 |f(t)|^2\,dt = \|f\|^2,$$
since |h(t)|² = 1.
Note. Let T be a linear operator on a finite-dimensional inner product space V. Then the following statements are equivalent.
(i) TT* = T*T = I.
(ii) 〈T(x), T(y)〉 = 〈x, y〉 for all x, y ∈ V.
(iii) If B is an orthonormal basis for V, then T(B) is an orthonormal basis for V.
(iv) ‖T(x)‖ = ‖x‖ for all x ∈ V.
In matrix form, the condition AA* = I says that the inner product of the ith and jth rows of A is δ_ij; similarly, the condition A*A = I corresponds to the columns of A.
3. 5. Orthogonal Projections
Note. If V is a finite-dimensional inner product space over a field F and R(T)⊥ = N(T), then R(T) = R(T)⊥⊥ = N(T)⊥, where R(T) and N(T) denote the range and null space of T.
3. 5. 1. Spectral Theorem
Theorem 3. 5. 2. Suppose that T is a linear operator on a finite-dimensional inner product space V over F with the distinct eigenvalues λ_1, λ_2, …, λ_k. Assume that T is normal if F = C and that T is self-adjoint if F = R. For each i (1 ≤ i ≤ k), let W_i be the eigenspace of T corresponding to the eigenvalue λ_i, and let T_i be the orthogonal projection of V on W_i. Then the following statements are true.
(i) V = W_1 ⊕ W_2 ⊕ ⋯ ⊕ W_k.
(ii) If W′_i denotes the direct sum of the subspaces W_j for j ≠ i, then W_i⊥ = W′_i.
(iii) T_i T_j = δ_ij T_i for 1 ≤ i, j ≤ k.
(iv) I = T_1 + T_2 + ⋯ + T_k.
(v) T = λ_1 T_1 + λ_2 T_2 + ⋯ + λ_k T_k.
Fact 1. Let T be a linear operator on a finite -dimensional complex inner product space
V. Then T is normal if and only if there exists an orthonormal basis for V consisting of
eigenvectors of T.
Fact 2. Let T be a linear operator on a finite- dimensional real inner product space V.
Then T is self - adjoint if and only if there exists an orthonormal basis B for V consisting
of eigenvectors of T.
Fact 4. Let V be an inner product space, and let T be a normal operator on V. If λ_1 and λ_2 are distinct eigenvalues of T with corresponding eigenvectors x_1 and x_2, then x_1 and x_2 are orthogonal.
Note. The set {λ 1 , λ 2 , . . . . ., λ k } of eigenvalues of T is called the Spectrum of T, and
the condition (iv) is called a resolution of the identity operator induced by T and the
condition (v) is called a Spectral decomposition of T.
3. 6. Summary
3. 7. Keywords
Spectral decomposition
Spectral Theorem
Unitarily equivalent matrices
Unitary matrix
Unitary operator
3. 8. Assessment Questions
1. Define adjoint operator T*. Show that adjoint operator T* is linear operator.
Hint. See Theorem 3. 2. 2
2. Find the adjoint of the linear operator T: R3→ R3 defined by
T(x, y, z) = (x+2y, 3x–4z, y).
Answer. T*(x, y, z) = (x+3y, 2x+z, –4y).
3. Show that a linear operator T is self-adjoint if and only if 〈T(v), v〉 is real for all v.
Hint. See Lemma 3. 3. 4-(i)
4. If A is an orthogonal matrix, show that AT and A−1 are orthogonal.
Hint. See Definition 3. 4. 2.
5. State and Prove the Spectral Theorem.
Hint. See Theorem 3. 5. 2.
3. 9. References
1. S. Friedberg. A. Insel, and L. Spence – Linear Algebra, Fourth Edition, PHI, 2009.
2. I. N. Herstein – Topics in Algebra, Vikas Publishing House, New Delhi, 2002.
3. Hoffman and Kunze – Linear Algebra, Second Edition, PHI, 1978.
4. Jimmie Gilbert and Linda Gilbert – Linear Algebra and Matrix Theory, Academic
Press, An imprint of Elsevier, 2010.
UNIT- 4: BILINEAR AND QUADRATIC FORMS
STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. Bilinear Form
4. 2. 1. Symmetric Bilinear Form
4. 3. Quadratic Form
4. 3. 1. Sylvester Inertia Theorem
4. 4. Summary
4. 5. Keywords
4. 6. Assessment Questions
4. 7. References
UNIT- 4: BILINEAR AND QUADRATIC FORMS
4. 0. Objectives
In this unit, we will generalize the notion of linear forms. In fact, we will
introduce the notion of a bilinear form on a finite-dimensional vector space. We have
studied linear forms on V(F). Here, we will study bilinear forms as mapping from V×V to
F, which are linear forms in each variable. Bilinear forms also give rise to quadratic and
Hermitian forms.
4. 2. Bilinear Forms
Example -1. Let V be a vector space over F = R. Then the mapping defined by
B(x, y) = x . y (which is the inner product of x and y) for x, y in V, is a bilinear form on V.
The bilinearity of B now follows directly from the distributive property of matrix
multiplication over matrix addition.
Example - 3. Let V = Fn, where the vectors are considered as column vectors.
For any A ∈ M n×n (F), define B : V × V → F by B(x, y) = xT Ay for x, y ∈ V.
Notice that since x and y are n × 1 matrices and A is an n × n matrix, B(x, y) is a 1 × 1 matrix. We identify this matrix with its single entry. The bilinearity of B follows as in Example - 2.
For example, for a ∈ F and x_1, x_2, y ∈ V, we have
$$B(ax_1 + x_2, y) = (ax_1 + x_2)^T A y = (ax_1^T + x_2^T)Ay = ax_1^T Ay + x_2^T Ay = a\,B(x_1, y) + B(x_2, y).$$
Note.
1. By x^T Ay, we understand the product of three matrices. That is,
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad A = [a_{ij}]_{n\times n}, \qquad y = [y_1, y_2, \ldots, y_n]^T.$$
2. For any bilinear form B on a vector space V over a field F, the following properties hold:
(i) If, for any x ∈ V, the functions T_x, R_x: V → F are defined by T_x(y) = B(x, y) and R_x(y) = B(y, x) for all y ∈ V, then T_x and R_x are linear.
(ii) B(0, x) = B(x, 0) = 0 for all x ∈ V.
(iii) For all x, y, z, w ∈ V, B(x + y, z + w) = B(x, z) + B(x, w) + B(y, z) + B(y, w).
(iv) If S: V × V → F is defined by S(x, y) = B(y, x), then S is a bilinear form.
Definition 4. 2. 2. Let V be a vector space, let B_1 and B_2 be bilinear forms on V, and let a be a scalar. We define the sum B_1 + B_2 and the scalar product aB_1 by the equations
(B_1 + B_2)(x, y) = B_1(x, y) + B_2(x, y) and
(aB_1)(x, y) = a(B_1(x, y)) for all x, y ∈ V.
Note.
1. For any vector space V, the sum of two bilinear forms and the product of a scalar and a bilinear form on V are again bilinear forms on V. Furthermore, B(V) is a vector space with respect to these operations.
2. Let B = {v_1, v_2, …, v_n} be an ordered basis for an n-dimensional vector space V, and let B ∈ B(V). We can associate with B an n × n matrix A whose entry in the ith row and jth column is defined by A_ij = B(v_i, v_j) for i, j = 1, 2, …, n.
3. The matrix A above is called the matrix representation of B with respect to the ordered basis B and is denoted by Ψ_B(B); the mapping Ψ_B: B(V) → M_{n×n}(F) is an isomorphism.
Definition 4. 2 .4. A bilinear form B on a finite - dimensional vector space V is called
diagonalizable if there is an ordered basis B for V such that Ψ B (B) is a diagonal matrix.
4. 3. Quadratic Form
Corollary. Let K be a quadratic form on a finite-dimensional real inner product space V. Then there exists an orthonormal basis B = {v_1, v_2, …, v_n} for V and scalars λ_1, λ_2, …, λ_n (not necessarily distinct) such that if $x = \sum_{i=1}^{n} s_i v_i$, then $K(x) = \sum_{i=1}^{n} \lambda_i s_i^2$; with respect to such a basis, Ψ_B(B) is a diagonal matrix.
For proving the Sylvester Inertia Theorem we need the following basic definitions for real quadratic forms:
Definition 4. 3. 3. Let r be the number of positive characteristic roots and s the number of negative characteristic roots of (the matrix of) a real quadratic form Q. The sum r + s is called the rank of the form, that is, rank(Q) = r + s, and the difference r − s is called its signature.
Theorem 4. 3. 1. For a quadratic form Q on an n-dimensional real vector space V there always exists a Sylvester basis. The numbers r and s of positive and negative entries in the diagonal matrix are independent of the choice of the Sylvester basis.
Proof. (i) Existence of the Sylvester basis:
We find such a basis by induction on n, rather as in the principal axes transformation, only this time it is much easier. The theorem is trivial for Q = 0, so let us suppose Q ≠ 0. Then there must be a vector x ∈ V with Q(x) = ±1, and this is all we need for the inductive step.
If B is the symmetric bilinear form of Q, then W: = { y∈ V : B(x, y) = 0} is an
(n−1) – dimensional subspace of V [This follows from the dimension theorem. That is, if
V has finite dimensional vector space, then rank(T) + nullity(T) = dim(V)].
By the inductive assumption, Q_W has a Sylvester basis {v_1, v_2, …, v_{n−1}}, and we only need to add x in the right place in order to obtain a Sylvester basis for all of V.
(ii) r and s are well defined:
The quantity r can be defined independently of bases as the maximum dimension of a subspace of V on which Q is positive definite. In order to see this, take some Sylvester basis and consider the subspaces V₊, V₋ and V₀ spanned by the first r vectors and by the last (n − r) vectors, respectively. Then Q restricted to V₊ is positive definite, but each subspace W of higher dimension must, by the dimension theorem, meet V₋ and V₀ non-trivially; therefore Q restricted to W cannot be positive definite. Analogously, s is the maximum dimension of a subspace on which Q is negative definite.
Note. Let A = [a_ij] be a real symmetric n × n matrix acting on Rⁿ, and suppose that the inner product of (δ_1, δ_2, …, δ_n) and (γ_1, γ_2, …, γ_n) in Rⁿ is the real number δ_1γ_1 + δ_2γ_2 + ⋯ + δ_nγ_n. For an arbitrary vector v = (x_1, x_2, …, x_n) in Rⁿ, a simple calculation shows that
$$Q(v) = \langle A(v), v\rangle = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + 2\sum_{i<j} a_{ij}x_i x_j.$$
On the other hand, given any quadratic function in n variables
$$\gamma_{11}x_1^2 + \gamma_{22}x_2^2 + \cdots + \gamma_{nn}x_n^2 + 2\sum_{i<j}\gamma_{ij}x_i x_j,$$
with real coefficients γ_ij, we can clearly realize it as the quadratic form associated with the real symmetric matrix C = [γ_ij].
Illustrative Example - 1. Determine the rank and signature of the real quadratic form with matrix
$$A = \begin{pmatrix} 4 & 5 & -2 \\ 5 & 1 & -2 \\ -2 & -2 & 3 \end{pmatrix}.$$
Solution. The characteristic polynomial of the matrix A is
$$\det(tI - A) = t^3 - 8t^2 - 14t + 43.$$
It has two positive roots and one negative root, and the matrix A is non-singular. Hence r = 2 and s = 1, so the rank of the form is r + s = 3 and the signature is r − s = 1.
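Since r and s count the positive and negative eigenvalues of the symmetric matrix, they can be read off numerically. A minimal sketch (our illustration; `eigvalsh` is NumPy's eigenvalue routine for symmetric matrices):

```python
import numpy as np

A = np.array([[4, 5, -2], [5, 1, -2], [-2, -2, 3]], dtype=float)
eig = np.linalg.eigvalsh(A)
r, s = np.sum(eig > 0), np.sum(eig < 0)
print(eig)                                  # two positive, one negative
print("rank:", r + s, "signature:", r - s)  # rank: 3 signature: 1
```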
Illustrative Example - 2. Find the rank and signature of the quadratic form x_1² − 4x_1x_2 + x_2².
Solution. The matrix of the form is
$$A = \begin{pmatrix} 1 & -2 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
with characteristic polynomial
$$\det(tI - A) = t^3 - 2t^2 - 3t = t(t-3)(t+1).$$
The roots are t = 0, 3 and −1, so r = 1 and s = 1. Hence the rank of the form is r + s = 2 and the signature is r − s = 0.
4. 4. Summary
1. There is a certain class of scalar-valued functions of two variables defined on a vector space that arises in the study of such diverse subjects as geometry and multivariable calculus. This is the class of bilinear forms.
2. We study the basic properties of this class with a special emphasis on symmetric bilinear forms, and we consider some of its applications to quadratic surfaces. A quadratic form is a homogeneous polynomial of degree two in n variables.
3. The Sylvester Inertia Theorem is a classification theorem for symmetric real n × n matrices; it solves the classification problem for quadratic forms on Rⁿ, and therefore on each n-dimensional real vector space V.
4. 5. Keywords
4. 6. Assessment Questions
1. Define:
(i) Bilinear form,
(ii) Symmetric bilinear form with example.
2. Let B be a diagonalizable bilinear form on a finite - dimensional vector space V.
Show that B is symmetric.
Hint. See Corollary of Theorem 4. 2. 1.
3. State and prove the Sylvester Inertia Theorem
Hint. See Theorem 4. 3. 1.
4. Determine the rank and signature of the real quadratic form with matrix
$$A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ -1 & 1 & 0 \end{pmatrix}.$$
4. 7. References
BLOCK - III
Canonical Forms
UNIT-1: THE DIAGONAL AND TRIANGULAR FORM
STRUCTURE
1. 0. Objectives
1. 1. Introduction
1. 2. The Diagonal Form
1. 2. 1. Similarity Classes
1. 2. 2. Basic Facts of Diagonalization
1. 2. 3. Simultaneous Diagonalizable
1. 3. The Triangular Form
1. 3. 1. Criteria of a subspace
1. 4. Summary
1. 5. Keywords
1. 6. Assessment Questions
1. 7. References
UNIT-1: THE DIAGONAL AND TRIANGULAR FORM
1. 0. Objectives
1. 1. Introduction
1. 2. 1. Similarity Classes
1. Two square matrices A and B are said to be similar if there exists a non-singular matrix C such that B = CAC^{-1}, or A = C^{-1}BC.
2. The linear transformations S, T ∈ A(V) are said to be similar linear transformations if there exists an invertible element C ∈ A(V) such that T = C^{-1}SC.
3. Similarity of linear transformations in A(V) is an equivalence relation, because
(i) T ∼ T, as T = ITI^{-1};
(ii) T ∼ S ⇒ T = CSC^{-1} ⇒ S = C^{-1}T(C^{-1})^{-1} ⇒ S ∼ T;
(iii) T ∼ S, S ∼ U ⇒ T = CSC^{-1} and S = DUD^{-1} ⇒ T = C(DUD^{-1})C^{-1} = (CD)U(CD)^{-1} ⇒ T ∼ U.
The equivalence classes are called similarity classes.
1. The linear operator T is called diagonalizable if there exists a basis for V with
respect to which the matrix for T is a diagonal matrix.
2. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Define E_λ = {x ∈ V: T(x) = λx} = N(T − λI_V). The set E_λ is called the eigenspace of T corresponding to the eigenvalue λ. Analogously, we define the eigenspace of a square matrix A to be the eigenspace of T_A.
Example - 1. Let T be the linear operator on P_2(R) defined by T(f(x)) = f′(x). The matrix representation of T with respect to the standard ordered basis B for P_2(R) is
$$[T]_B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
Consequently, the characteristic polynomial of T is
$$\det([T]_B - tI) = \begin{vmatrix} -t & 1 & 0 \\ 0 & -t & 2 \\ 0 & 0 & -t \end{vmatrix} = -t^3.$$
Thus T has only one eigenvalue (λ = 0), with multiplicity 3. Solving T(f(x)) = f′(x) = 0 shows that E_λ = N(T − λI_V) = N(T) is the subspace of P_2(R) consisting of the constant polynomials. So {1} is a basis for E_λ, and therefore dim(E_λ) = 1.
Consequently, there is no basis for P_2(R) consisting of eigenvectors of T, and therefore T is not diagonalizable.
Illustrative Example - 2. Test whether the matrix $A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix} \in M_{3\times 3}(\mathbf{R})$ is diagonalizable.
Solution. The characteristic polynomial of A is det(A − tI) = −(t − 4)(t − 3)², which splits, and so condition 1 of the test (Fact 6(i)) for diagonalization is satisfied. Also A has eigenvalues λ_1 = 4 and λ_2 = 3 with multiplicities 1 and 2, respectively. Since λ_1 has multiplicity 1, condition 2 is satisfied for λ_1. Thus we need only test condition 2 for λ_2. Because
$$A - \lambda_2 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
has rank 2, we see that n − rank(A − λ_2 I) = 3 − 2 = 1, which is not the multiplicity of λ_2. Thus condition 2 fails for λ_2. Therefore A is not diagonalizable.
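The failing condition is a rank computation, which is easy to reproduce. A minimal sketch (our addition, assuming NumPy):

```python
import numpy as np

A = np.array([[3, 1, 0], [0, 3, 0], [0, 0, 4]], dtype=float)
# Geometric multiplicity of lambda = 3 is n - rank(A - 3I).
geo_mult = 3 - np.linalg.matrix_rank(A - 3 * np.eye(3))
print(geo_mult)   # 1, less than the algebraic multiplicity 2,
                  # so A is not diagonalizable
```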
Remark. However, not every linear operator is diagonalizable, even if its characteristic polynomial splits. This block considers alternative matrix representations for nondiagonalizable operators (see Example - 1 and Illustrative Example - 2). These representations are called canonical forms.
1. 2. 3. Simultaneously diagonalizable
Definition 1. 2. 1. Two linear operators T and U on an n-dimensional vector space V are called simultaneously diagonalizable if there exists some basis B of V such that [T]_B and [U]_B are both diagonal matrices.
Illustrative Example - 4. Let λ_i ≠ 0 for all i and let T satisfy Tⁿ = 1. Show that if T has all its eigenvalues in F, then T is diagonalizable.
Solution. Since T has all its eigenvalues in the field F, the minimal polynomial of T is $q(t) = \prod_i (t - \lambda_i)^{n_i}$. Now we claim that all these roots are simple.
That is, we must show that v_1 + W = v_2 + W ⇒ T̄(v_1 + W) = T̄(v_2 + W).
Let v_1 + W = v_2 + W. Then
v_1 − v_2 ∈ W
⇒ T(v_1 − v_2) ∈ W, because T is linear
⇒ T(v_1) − T(v_2) ∈ W, because T(W) ⊆ W, since W is invariant under T
⇒ T(v_1) + W = T(v_2) + W
⇒ T̄(v_1 + W) = T̄(v_2 + W), that is, T̄(v̄_1) = T̄(v̄_2).
Further, (T̄)²(v + W) = T̄(T̄(v + W)) = T̄(T(v) + W) = T(T(v)) + W = T²(v) + W, so (T̄)² = (T²)‾.
Similarly, (T̄)³ = (T³)‾, …, (T̄)^k = (T^k)‾.
For any polynomial q(x) ∈ F[x], we therefore have q(T̄) = (q(T))‾. Hence q(T) = 0 ⇒ (q(T))‾ = 0̄, that is, q(T̄) = 0̄.
Given that p(x) is the minimal polynomial of T, we have p(T) = 0 ⇒ p(T̄) = 0̄; and p_1(x) is the minimal polynomial of T̄. Therefore, by the definition of the minimal polynomial, we have p_1(x) | p(x).
By induction, there is a basis B̄ = {v̄_2, v̄_3, …, v̄_n} = {v_2 + W, v_3 + W, …, v_n + W} of V̄ = V/W over F such that
T̄(v̄_2) = a_22 v̄_2
T̄(v̄_3) = a_32 v̄_2 + a_33 v̄_3
.......................
T̄(v̄_n) = a_n2 v̄_2 + a_n3 v̄_3 + ⋯ + a_nn v̄_n.
We now verify that B = {v_1, v_2, …, v_n} is a basis for V with respect to which T has a matrix in triangular form.
Let a_1v_1 + a_2v_2 + ⋯ + a_nv_n = 0.  → (1)
⇒ (a_1v_1 + a_2v_2 + ⋯ + a_nv_n) + W = W
⇒ a_1(v_1 + W) + a_2(v_2 + W) + ⋯ + a_n(v_n + W) = W
⇒ W + a_2v̄_2 + a_3v̄_3 + ⋯ + a_nv̄_n = W, since v_1 ∈ W
⇒ a_2v̄_2 + a_3v̄_3 + ⋯ + a_nv̄_n = 0̄ in V/W
⇒ a_2 = a_3 = ⋯ = a_n = 0, since {v̄_2, …, v̄_n} is a basis of V/W; then (1) gives a_1v_1 = 0, so a_1 = 0. Hence B is linearly independent, and being n vectors in the n-dimensional space V, it is a basis.
Since T̄(v̄_2) = a_22v̄_2, we have
(T(v_2) + W) − a_22(v_2 + W) = W
⇒ (T(v_2) − a_22v_2) + W = W
⇒ (T(v_2) − a_22v_2) ∈ W
⇒ T(v_2) − a_22v_2 = a_21v_1 for some a_21 ∈ F, since W is spanned by v_1
⇒ T(v_2) = a_21v_1 + a_22v_2.
Similarly, we can prove that T(v_i) = a_i1v_1 + a_i2v_2 + ⋯ + a_iiv_i. Thus,
T(v_1) = λ_1v_1 = a_11v_1 (where λ_1 = a_11)
T(v_2) = a_21v_1 + a_22v_2
………………………………
T(v_i) = a_i1v_1 + a_i2v_2 + ⋯ + a_iiv_i
………………………………
T(v_n) = a_n1v_1 + a_n2v_2 + ⋯ + a_nnv_n.
Therefore $A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$ is a triangular matrix with respect to the basis B = {v_1, v_2, …, v_n}.
Lemma 1. 3. 3. If V is n-dimensional over F and T ∈ A(V) has the matrix A in the basis A = {u_1, u_2, …, u_n} and the matrix B in the basis B = {v_1, v_2, …, v_n}, then there is an element C ∈ F_n such that B = CAC^{-1}. In fact, if S is the linear transformation of V defined by S(u_i) = v_i for i = 1, 2, …, n, then C can be chosen to be the matrix of S in the basis A.
Proof. Let A = [a_ij] and B = [b_ij]. Then $T(u_i) = \sum_{j=1}^{n} a_{ij}u_j$ and $T(v_i) = \sum_{j=1}^{n} b_{ij}v_j$, the second matrix being taken with respect to the basis B = {v_1, v_2, …, v_n}. Therefore CAC^{-1} = B, by virtue of the fact that the mapping T → [T]_A is an isomorphism of A(V) onto F_n.
Theorem 1. 3. 4. If the matrix A ∈ F_n has all its eigenvalues in F, then there is a matrix C ∈ F_n such that CAC^{-1} is a triangular matrix.
Proof. Suppose that $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \in F_n$ has all its eigenvalues in F.
Now define a linear map T: Fⁿ → Fⁿ on the basis B = {v_1, v_2, …, v_n}, where v_1 = (1, 0, …, 0), v_2 = (0, 1, …, 0), …, v_n = (0, 0, …, 1), as indicated below:
T(v_1) = (a_11, a_12, …, a_1n) = a_11v_1 + a_12v_2 + a_13v_3 + ⋯ + a_1nv_n
T(v_2) = (a_21, a_22, …, a_2n) = a_21v_1 + a_22v_2 + a_23v_3 + ⋯ + a_2nv_n
………………………………………………………………………………
T(v_n) = (a_n1, a_n2, …, a_nn) = a_n1v_1 + a_n2v_2 + a_n3v_3 + ⋯ + a_nnv_n.
Thus A is precisely the matrix of T in this basis, and the eigenvalues of T, being equal to those of A, all lie in F.
By Theorem 1. 3. 2, there is a basis of Fⁿ in which the matrix of T is triangular. This change of basis merely changes the matrix A of the linear transformation T in the first basis into CAC^{-1} for a suitable C ∈ F_n.
By Lemma 1. 3. 3, CAC^{-1} is a triangular matrix for some C ∈ F_n.
Note.
1. Theorem 1. 3. 4 is also known as the alternate form of Theorem 1. 3. 2.
2. In the next theorem, we use λ_i = a_ii for i = 1, 2, …, n.
Theorem 1. 3. 5. If V is n-dimensional vector space over a field F and T∈A(V) has all
its eigenvalues in F, then T satisfies a polynomial of degree n over F.
Proof. Since T∈A(V) has all its eigenvalues in F, there is a basis B ={v 1 ,v 2 ,… …,v n } of
V in which satisfies
T(v_1) = λ_1v_1 = a_11v_1 (where λ_1 = a_11)
T(v_2) = a_21v_1 + λ_2v_2
T(v_3) = a_31v_1 + a_32v_2 + λ_3v_3
……………………………..
T(v_n) = a_n1v_1 + a_n2v_2 + ⋯ + λ_nv_n
Equivalently, (T – λ 1 ) v 1 = 0
(T – λ 2 ) v 2 = a 21 v 1
(T – λ 3 ) v 3 = a 31 v 1 + a 32 v 2
………………………………
(T – λ n ) v n = a n1 v 1 + a n2 v 2 +…….. + a n, n–1 v n–1 .
Note that (T − λ_1)(T − λ_2)v_1 = (T − λ_2)(T − λ_1)v_1 = (T − λ_2)·0 = 0, since (T − λ_1)v_1 = 0.
Also, (T − λ_1)(T − λ_2)v_2 = (T − λ_1)a_21v_1, since (T − λ_2)v_2 = a_21v_1,
= a_21((T − λ_1)v_1) = a_21·0 = 0, since (T − λ_1)v_1 = 0.
Continuing this type of computation, we get
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 1 = 0
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 2 = 0
……………………………………………
(T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v n = 0
Let S = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 )∈ A(V). Then
S (v 1 ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 1 = 0
S (v 2 ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v 2 = 0
……………………………………………..………
S (v n ) = (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) v n = 0
That is S annihilates all the vectors of basis of V. So, S annihilates all the vectors of V.
That is S (v) = 0 for all v∈V.
Therefore S = 0 ⇒ (T – λ n ) (T – λ n–1 ) . . . . . (T – λ 1 ) = 0.
Therefore T satisfies the polynomial (x – λ n ) (x – λ n–1 ) . . . . . (x – λ 1 ) in F[x] of
degree n.
This completes the proof.
1. 4. Summary
1. The diagonal canonical form gives a classification theorem and more, in that it not only classifies every class, but gives a distinguished (canonical) representative.
2. Let us look at what it means for the matrix of T to be diagonal. Recall from Block - II that we get the matrix by choosing a basis B = {v_1, v_2, …, v_n}, and then entering the coordinates of T(v_1) as the first column, the coordinates of T(v_2) as the second column, etc. The matrix is diagonal, with entries λ_1, λ_2, λ_3, …, λ_n, if and only if the chosen basis has the property that T(v_i) = λ_iv_i for 1 ≤ i ≤ n. This leads to the definition of an eigenvalue and its corresponding eigenvectors. So, diagonalizing a matrix is equivalent to finding a basis consisting of eigenvectors.
3. Particularly, in triangular canonical form, we study the following
• If T∈A(V) has all its eigenvalues in F, then there is a basis of V in which the
matrix of T is triangular.
• If V is n-dimensional vector space over a field F and T∈A(V) has all its
eigenvalues in F, then T satisfies a polynomial of degree n over F.
1. 5. Keywords
1. 6. Assessment Questions
Hint. See the section 1. 2. 2.
3. Let A and B be two diagonalizable n × n matrices. Prove that A and B are simultaneously diagonalizable if and only if A and B commute (that is, AB = BA).
Hint. See the section 1. 2. 3.
4. Let A = [[0, 1], [−2, 3]] and B = [[−3, 2], [−4, 3]]. Find the basis which simultaneously
diagonalizes A and B.
Answer. {(1, 1), (1, 2)}.
5. If T∈A(V) has all its eigenvalues in F, then show that there is a basis of V in which the
matrix of T is triangular.
Hint. See the Theorem 1. 3. 2.
1. 8. References
UNIT-2: THE JORDAN CANONICAL FORM
STRUCTURE
2. 0. Objectives
2. 1. Introduction
2. 2. The Jordan Canonical Form
2. 2. 1. Basic Facts of Nilpotent Transformation
2. 2. 2. Minimal Polynomial
2. 2. 3. Basic Jordan Block
2. 3. Summary
2. 4. Keywords
2. 5. Assessment Questions
2. 6. References
UNIT-2: THE JORDAN CANONICAL FORM
2. 0. Objectives
2. 1. Introduction
Fact 1. If V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k, where each subspace V_i is of dimension n_i and is invariant under T ∈ A(V), then a basis of V can be found so that the matrix of T in this basis is of the form

| A_1  0    ⋯  0   |
| 0    A_2  ⋯  0   |
| ⋮    ⋮        ⋮   |
| 0    0    ⋯  A_k |

where each A_i is an n_i × n_i matrix and is the matrix of the linear transformation induced by T on V_i.
Fact 2. If T ∈ A(V) is nilpotent, then a 0 + a 1 T + . . . . . . . + a m Tm , where the a i ∈ F, is
invertible if a 0 ≠ 0.
Notation. M_s will denote the s × s matrix

| 0 1 0 ⋯ 0 0 |
| 0 0 1 ⋯ 0 0 |
| ⋮ ⋮ ⋮     ⋮ ⋮ |
| 0 0 0 ⋯ 0 1 |
| 0 0 0 ⋯ 0 0 |

all of whose entries are 0 except on the superdiagonal, where they are all 1's.
(ii) There is an element z ∈ M such that z, T(z), …, T^(m−1)(z) form a basis of M.
2. 2. 2. Minimal Polynomial
then the minimal polynomial of T over a field F is the least common multiple of
p_1(x), p_2(x), …, p_k(x).
Example - 1. Here are two matrices in Jordan form; the first has two primary blocks, while the second has three. Writing J_k(λ) for the k × k basic Jordan block with λ on the diagonal and 1's on the superdiagonal, the first matrix is

diag( J_3(3), J_1(3), J_1(3), J_2(2), J_2(2) )

and the second is

diag( J_3(3), J_2(3), J_4(2), J_1(5), J_1(5) ).

As a point of interest, the first matrix has characteristic polynomial (x−3)^5 (x−2)^4
and minimal polynomial (x−3)^3 (x−2)^2. The second matrix has characteristic polynomial
(x−3)^5 (x−2)^4 (x−5)^2 and minimal polynomial (x−3)^3 (x−2)^4 (x−5).
Thus the matrix of T is of the form

diag( J_1, J_2, …, J_k ), where each J_i = diag( B_i1, B_i2, …, B_ir_i )

and where B_i1, B_i2, …, B_ir_i are basic Jordan blocks belonging to λ_i. Indeed, on each V_i the transformation induced by T is λ_i plus a nilpotent transformation, so a basis of V_i can be found in which its matrix is

λ_i I + diag( M_n1, M_n2, …, M_nr ) = diag( B_i1, B_i2, …, B_ir_i ),

using the first remark made in this proof about the relation of a basic Jordan block and
the M_m's. This completes the theorem.
Note.
1. In each J_i arrange the blocks so that size of B_i1 ≥ size of B_i2 ≥ ⋯; when this has been done, the matrix diag( J_1, J_2, …, J_k ) is called the Jordan canonical form of T.
2. Two linear transformations in A(V) which have all their eigenvalues in F are
similar if and only if they can be brought to the same Jordan form.
Illustrative Example -2. Compute the Jordan canonical form for

A = | 1 0  0 |
    | 0 0 −2 |
    | 0 1  3 |.
Solution. Write A for the given matrix. The characteristic polynomial of A is
(λ–1)2(λ–2). So the two possible minimal polynomials are (λ–1) (λ–2) or the
characteristic polynomial itself.
We find that (A – I) (A – 2I) = 0 so the minimal polynomial is (λ–1) (λ–2), and hence the
invariant factors are λ–1, (λ–1) (λ–2).
The prime power factors of the invariant factors are the elementary divisors: λ–1, λ–1,
λ–2. Finally the Jordan canonical form of A is diagonal with diagonal entries 1, 1, 2.
Note. After determining that the minimal polynomial has all roots in the ground field and
no repeated roots, we can immediately conclude that the matrix is diagonalizable and
therefore the Jordan canonical form is diagonal.
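For readers who want to experiment, the Jordan canonical form can be computed symbolically; a minimal sketch verifying Illustrative Example-2, assuming sympy is installed:

```python
from sympy import Matrix

A = Matrix([[1, 0, 0],
            [0, 0, -2],
            [0, 1, 3]])

P, J = A.jordan_form()        # A = P * J * P**(-1)
print(J)                      # diagonal with entries 1, 1, 2, as computed above
print(P * J * P.inv() == A)   # True
```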
Illustrative Example -3. Find all possible Jordan forms A for 6 × 6 matrices with
t^2 (1 − t)^2 as minimal polynomial.
Solution. The possible characteristic polynomials of A have the same irreducible factors
t and (1 − t), each with exponent at least 2, and total degree 6. Hence we have the
following cases (again writing J_k(λ) for the k × k basic Jordan block):

Case (i). Characteristic polynomial is t^4 (t − 1)^2. The Jordan form in this case is
diag( J_2(0), J_2(0), J_2(1) ) or diag( J_2(0), J_1(0), J_1(0), J_2(1) ).

Case (ii). Characteristic polynomial is t^3 (t − 1)^3. The Jordan form is
diag( J_2(0), J_1(0), J_2(1), J_1(1) ).

Case (iii). Characteristic polynomial is t^2 (t − 1)^4. The Jordan form is
diag( J_2(0), J_2(1), J_2(1) ) or diag( J_2(0), J_2(1), J_1(1), J_1(1) ).
Illustrative Example -4. Let J be a Jordan block with diagonal entries λ. Then show that
λ is the only eigenvalue, and the associated eigenspace is only 1-dimensional.
Solution. Since J is upper-triangular, it is clear that the only eigenvalue is λ.
Solving Jx = λx gives the equations λx_i + x_(i+1) = λx_i for i < n, and λx_n = λx_n, from
which we see that x_2 = ⋯ = x_n = 0, giving the single eigenvector (1, 0, …, 0) up to scalar multiples.
2. 3. Summary
entries that are non-zero must be equal to 1, be immediately above the main diagonal (on
the superdiagonal), and have identical diagonal entries to the left and below them. The
diagonal form for diagonalizable matrices, for instance normal matrices, is a special case
of the Jordan normal form.
2. 4. Keywords
2. 5. Assessment Questions
3. Define the Jordan canonical form, with a suitable example containing at least
two Jordan blocks.
Hint. See the definition 2. 2. 3 and Example-1.
4. Find the Jordan canonical form of

A = | 2 6 −15 |
    | 1 1  −5 |
    | 1 2  −6 |

Answer. The characteristic polynomial of A is (λ + 1)^3 and (A + I)^2 = 0, so the minimal polynomial is (λ + 1)^2 and the Jordan canonical form is diag( J_2(−1), J_1(−1) ).
5. Determine the Jordan canonical form for the matrix

A = | 1 2 0 0 |
    | 0 1 2 0 |
    | 0 0 1 2 |
    | 0 0 0 1 |.
2. 6. References
UNIT-3: THE MINIMAL POLYNOMIAL
STRUCTURE
3. 0. Objectives
3. 1. Introduction
3. 2. The Minimal Polynomial
3. 3. Summary
3. 4. Keywords
3. 5. Assessment Questions
3. 6. References
UNIT-3: THE MINIMAL POLYNOMIAL
3. 0. Objectives
3. 1. Introduction
The minimal polynomial records the distinct eigenvalues of a transformation and the size of the largest Jordan
block corresponding to each eigenvalue. In this unit we study the minimal polynomial, which
plays a vital role in the theory of canonical forms, particularly for generalized eigenvalues and eigenvectors.
3. 2. Minimal polynomial
Definition 3. 2. 1. The monic polynomial p(x) of minimum degree such that p(T) = 0 is
called the minimal polynomial of T.
Note.
1. Let F be a field. Let p(x) and h(x) ∈ F[x] ( or simply, P(F)). Suppose p(x) ≠ 0.
Then we may find q(x) and r(x) ∈ F[x] such that h(x) = q(x)p(x) + r(x), where
either r(x) = 0 or deg (r(x)) < deg(p(x)). This is known as Division Algorithm.
2. Consider the set I, T, T^2, …, T^(n^2) in the n^2-dimensional vector space A(V). These are n^2 + 1 elements, so they cannot be linearly independent. Thus there is some linear combination a_0 I + a_1 T + ⋯ + a_(n^2) T^(n^2) that equals the zero transformation. Hence every T ∈ A(V) satisfies some polynomial of degree ≤ n^2. Knowing that there is some polynomial that T satisfies, we can find a polynomial of minimal degree that T satisfies, and then we can divide by its leading coefficient to obtain a monic polynomial.
For a more involved example consider the matrix B = B_k(λ) ∈ M_(k,k)(F), the basic Jordan block with the scalar λ on the main diagonal and a sequence of 1's on the superdiagonal.
First consider T = B_k(0) = B_k(λ) − λI_k = B − λI_k.
This matrix is nilpotent: in fact T^k = 0, but T^(k−1) ≠ 0.
So if we set g(x) = (x – λ)k then g(B) = 0.
Once again, the minimal polynomial p(x) of B must divide g(x).
So p(x) = (x − λ)^i for some i ≤ k.
But since T^(k−1) ≠ 0, in fact i = k, and the minimal polynomial of B is precisely
p(x) = (x − λ)^k.
Note.
1. The characteristic and minimal polynomials of a linear transformation have the
same zeros (except for multiplicities).
2. If V is a finite- dimensional vector space over F, then T∈A(V) is invertible if and
only if the constant term of the minimal polynomial for T is not zero.
3. Suppose all the characteristic roots of T ∈ A(V) are in F. Then the minimal
polynomial is q(x) = (x − λ_1)^(l_1) (x − λ_2)^(l_2) ⋯ (x − λ_k)^(l_k) with λ_i ∈ F.
Here q_i(x) = (x − λ_i)^(l_i) and V_i = { v ∈ V : (T − λ_i)^(l_i)(v) = 0 }.
So, if all the distinct characteristic roots λ_1, …, λ_k of T lie in F, then V can be
written as V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k, where V_i = { v ∈ V : (T − λ_i)^(l_i)(v) = 0 } and
where the transformation T_i induced on V_i has only the one characteristic root λ_i.
Example - 2. Let T be the linear operator on R^2 defined by T(a, b) = (2a + 5b, 6a + b) and let B
be the standard ordered basis for R^2. Then [T]_B = [[2, 5], [6, 1]], and hence the characteristic
polynomial of T is

f(t) = det | 2−t   5  | = (t − 7)(t + 4).
           |  6   1−t |

Thus the minimal polynomial of T is also (t − 7)(t + 4).
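A quick numerical check of this example (a minimal sketch, assuming numpy):

```python
import numpy as np

A = np.array([[2.0, 5.0],
              [6.0, 1.0]])
I = np.eye(2)

# (A - 7I)(A + 4I) must vanish, since (t - 7)(t + 4) is the minimal polynomial.
print(np.allclose((A - 7 * I) @ (A + 4 * I), np.zeros((2, 2))))  # True
print(sorted(np.linalg.eigvals(A)))                              # [-4.0, 7.0]
```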
Theorem. T is diagonalizable if and only if the minimal polynomial of T is a product of distinct linear factors.
Proof. If T is diagonalizable, we can compute its minimal polynomial using a diagonal
matrix, and then it is clear that we just need one linear factor for each of the distinct
entries along the diagonal.
Conversely, suppose that the minimal polynomial of T is p(x) = (x − λ_n)(x − λ_(n−1))
⋯ (x − λ_1) with the λ_i distinct. Then V is the direct sum of the null spaces N(T − λ_i I_V), which shows that
there exists a basis for V consisting of eigenvectors.
Illustrative Example-3. Let V = F_3[x] be the space of all polynomials of degree at most
3 and let T : V → V be the linear transformation given by T(f) = f′, the derivative of f. Find the minimal
polynomial of T.
Solution. A basis for V is {1, x, x^2, x^3} and T(1) = 0, T(x) = 1, T(x^2) = 2x and
T(x^3) = 3x^2. Hence the matrix of T is

A = | 0 0 0 0 |
    | 1 0 0 0 |
    | 0 2 0 0 |
    | 0 0 3 0 |.

By direct computation, A^4 = 0 and A^3 ≠ 0. Hence the minimal polynomial of T is x^4.
Illustrative Example-4. Let F be the field of real numbers and let M = [[0, 1], [−1, 0]] ∈ F_2.
Prove that the set µ consisting only of M is irreducible. Further, find the
set D of all matrices commuting with M, where D = { T ∈ A(V) : TM = MT for all
M ∈ µ }.
Solution. Consider [[a, b], [c, d]] · [[0, 1], [−1, 0]] = [[0, 1], [−1, 0]] · [[a, b], [c, d]].
That is, [[−b, a], [−d, c]] = [[c, d], [−a, −b]], so c = −b and d = a.
One checks directly that
[[a, b], [−b, a]] · [[0, 1], [−1, 0]] = [[−b, a], [−a, −b]] and
[[0, 1], [−1, 0]] · [[a, b], [−b, a]] = [[−b, a], [−a, −b]].
Hence D = { [[a, b], [−b, a]] : a, b ∈ R };
that is, D is the set of all matrices which commute with [[0, 1], [−1, 0]].
Define a map φ : D → C by φ( [[a, b], [−b, a]] ) = a + ib; this is a field isomorphism.
To prove it is a field isomorphism: first,
φ( [[a, b], [−b, a]] + [[c, d], [−d, c]] ) = φ( [[a + c, b + d], [−(b + d), a + c]] )
= (a + c) + i(b + d)
= (a + ib) + (c + id)
= φ( [[a, b], [−b, a]] ) + φ( [[c, d], [−d, c]] ).
Let A = [[0, 1], [−1, 0]]; note that det(A) = 1. The eigenvalues satisfy det(A − λI) = 0:
det [[−λ, 1], [−1, −λ]] = 0 ⇒ λ^2 + 1 = 0 ⇒ λ = ±√(−1) = ±i.
Therefore the minimal polynomial is (x + i)(x − i) = x^2 + 1.
Now for M ⊂ A(V), M is an irreducible set if, whenever a subspace W satisfies T(W) ⊆ W for all T ∈ M, then either W = {0} or W = V.
Let T = [[0, 1], [−1, 0]] : R^2 → R^2, and suppose W is a proper nonzero subspace of V invariant under T. Then W is one-dimensional; let {w_1} be a basis of W and extend it to a basis {w_1, w_2} of V.
T(w_1) = a_1 w_1 + 0·w_2, because T(w_1) ∈ W = ⟨w_1⟩.
T(w_2) = b_1 w_1 + b_2 w_2.
Therefore the matrix A = [[a_1, b_1], [0, b_2]] is the matrix of T with respect to {w_1, w_2}, and the
matrix B = [[0, 1], [−1, 0]] is the matrix of T with respect to the standard basis. Then by the
result on change of basis, A and B are similar.
But B has the non-real complex roots ±i, whereas the triangular matrix A has the real roots a_1 and b_2.
This is a contradiction to the assumption that W is invariant under T. That is, there is no proper nonzero
invariant subspace of V. Therefore the set µ is irreducible.
3. 3. Summary
1. The minimal polynomial records the distinct eigenvalues and the size of the largest
Jordan block corresponding to each eigenvalue. While the Jordan normal form
determines the minimal polynomial, the converse is not true. This leads to the
notion of elementary divisors.
2. If the minimal polynomial has all roots in the ground field and no repeated roots,
we can immediately conclude that the matrix is diagonalizable and therefore the
Jordan canonical form is diagonal.
3. The elementary divisors of a square matrix A are the characteristic polynomials of
its Jordan blocks. The factors of the minimal polynomial m are the elementary
divisors of the largest degree corresponding to distinct eigenvalues. The degree of
an elementary divisor is the size of the corresponding Jordan block, therefore the
dimension of the corresponding invariant subspace. If all elementary divisors are
linear, A is diagonalizable.
3. 4. Keywords
3. 5. Assessment Questions
Hint. See Theorem 3. 2. 1 - (ii).
3. If T, S∈A(V) and if S is regular, then show that T and STS–1 have the same
minimal polynomial.
Hint. Use the polynomial expression p(x) and show p(T) = p(STS–1) with the
definition of minimal polynomial conditions.
4. Give an example of two n × n matrices which have the same minimal polynomial,
but are not similar to each other.
Answer. A = diag( J_2(0), J_1(0), J_1(0) ) and B = diag( J_2(0), J_2(0) ) both have minimal polynomial x^2, but rank A = 1 and rank B = 2, so they are not similar.
0 0 0 1
3. 6. References
UNIT- 4: THE RATIONAL CANONICAL FORM
STRUCTURE
4. 0. Objectives
4. 1. Introduction
4. 2. The Rational Canonical Form
4. 2. 1. Basic Facts of Cyclic Module
4. 2. 2. Companion Matrix
4. 3. Summary
4. 4. Keywords
4. 5. Assessment Questions
4. 6. References
UNIT- 4: THE RATIONAL CANONICAL FORM
4. 0. Objectives
4. 1. Introduction
As a generalization of the notions of eigenvalue and eigenspace, this unit develops a
suitable canonical form of a linear operator for this context. The one that we study is
called the rational canonical form.
Theorem. Suppose V is cyclic relative to T ∈ A(V) and the minimal polynomial of T in F[x] is p(x) = γ_0 + γ_1 x + ⋯ + γ_(r−1) x^(r−1) + x^r. Then there is a basis of V in which the matrix of T is the r × r matrix

| 0     1     0    ⋯  0        |
| 0     0     1    ⋯  0        |
| ⋮     ⋮     ⋮        ⋮        |
| 0     0     0    ⋯  1        |
| −γ_0  −γ_1  −γ_2 ⋯  −γ_(r−1) |
Proof. Since V is cyclic relative to T, there exists a vector v∈V such that every element
w∈V, is of the form w = f(T)(v) for some f(x) ∈ F[x].
Now if for some polynomial h(x) ∈ F[x] we have h(T)(v) = 0, then for any w ∈ V,
h(T)(w) = h(T)(f(T)(v)) = f(T)(h(T)(v)) = f(T)(0) = 0; thus h(T) annihilates all of V and so h(T) = 0.
But then p(x) | h(x), since p(x) is the minimal polynomial of T. This remark
implies that v, T(v), T^2(v), …, T^(r−1)(v) are linearly independent over the field F; for
if not, then a_0 v + a_1 T(v) + ⋯ + a_(r−1) T^(r−1)(v) = 0 with a_0, a_1, …, a_(r−1) in F,
not all zero.
But then (a_0 + a_1 T + ⋯ + a_(r−1) T^(r−1))(v) = 0; hence by the above
discussion p(x) | (a_0 + a_1 x + ⋯ + a_(r−1) x^(r−1)), which is impossible,
since p(x) is of degree r, unless a_0 = a_1 = ⋯ = a_(r−1) = 0.
Since T^r = −γ_0 − γ_1 T − ⋯ − γ_(r−1) T^(r−1), we immediately have that T^(r+k),
for k ≥ 0, is a linear combination of 1, T, …, T^(r−1), and so f(T), for any f(x) ∈ F[x], is
a linear combination of 1, T, …, T^(r−1) over F.
Since any w∈V is of the form w = f(T)(v) , we get that w is a linear combination of
{v, T(v), . . . .. , Tr – 1 (v)}.
Thus, we have proved, in the above two paragraphs, that the elements
{v, T(v), . . . . . . . . , Tr – 1 (v) }form a basis of V over F.
In this basis, as is immediately verified, the matrix of T is exactly as claimed.
4. 2. 2. Companion matrix
Definition 4. 2. 1. If f(x) = a_0 + a_1 x + ⋯ + a_(n−1) x^(n−1) + x^n is a monic polynomial in F[x], the n × n matrix with 1's on the superdiagonal and last row (−a_0, −a_1, …, −a_(n−1)), as displayed above, is called the companion matrix of f(x) and is written C(f(x)).
Note.
1. If V is cyclic relative to T and if the minimal polynomial of T in F[x] is p(x), then
for some basis of V the matrix of T is C(p(x)).
2. The matrix C(f(x)), for any monic f(x) in F[x], satisfies f(x) and has f(x) as its
minimal polynomial.
3. If V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k, where each subspace V_i is of dimension n_i and
is invariant under T, an element of A(V), then a basis of V can be found so that
the matrix of T in this basis is of the form diag( A_1, A_2, …, A_k ), where each A_i is
an n_i × n_i matrix and is the matrix of the linear transformation induced by T on V_i.
4. If T ∈ A(V) is nilpotent, of index of nilpotence n_1, then a basis of V can be found
such that the matrix of T in this basis has the form diag( M_n1, M_n2, …, M_nr ),
where n_1 ≥ n_2 ≥ ⋯ ≥ n_r and n_1 + n_2 + ⋯ + n_r = dim(V).
Here, the integers n_1, n_2, …, n_r are called the invariants of T.
Corollary. If T ∈ A(V) has minimal polynomial p(x) = q(x)^e, where q(x) is a monic irreducible polynomial in F[x], then a basis of V can be found in which the matrix of T is of the form

diag( C(q(x)^(e_1)), C(q(x)^(e_2)), …, C(q(x)^(e_r)) ),

where e = e_1 ≥ e_2 ≥ ⋯ ≥ e_r.
Definition 4. 2. 2. The matrix of T in the statement of the above corollary is called the
rational canonical form of T.
Definition 4. 2. 3. The polynomials q_1(x)^(e_11), …, q_k(x)^(e_k r_k) occurring in the rational canonical form
are called the elementary divisors of T. So, if dim(V) = n, then the characteristic
polynomial of T, p_T(x), is the product of its elementary divisors.
Illustrative Example-1. Show that the characteristic polynomial of the companion matrix C(q(x)) is q(λ).
Solution. Here

C(q(x)) = | 0     1     0   ⋯  0        |
          | 0     0     1   ⋯  0        |
          | ⋮     ⋮     ⋮       ⋮        |
          | 0     0     0   ⋯  1        |
          | −a_0  −a_1  ⋯   ⋯  −a_(n−1) |,

where q(x) = a_0 + a_1 x + ⋯ + a_(n−1) x^(n−1) + x^n. Then

det( λI − C(q(x)) ) = det | λ    −1   0  ⋯  0           |
                          | 0    λ    −1 ⋯  0           |
                          | ⋮    ⋮    ⋮      ⋮           |
                          | 0    0    0  ⋯  −1          |
                          | a_0  a_1  ⋯  ⋯  λ + a_(n−1) |.

Add to the first column λ times the second column, λ^2 times the third column, and so on, up to
λ^(n−1) times the last column. Every entry of the first column then vanishes except the last, which becomes
a_0 + a_1 λ + ⋯ + a_(n−1) λ^(n−1) + λ^n = q(λ). Expanding the determinant along the first column,

det( λI − C(q(x)) ) = (−1)^(n−1) q(λ) · det( upper triangular (n−1) × (n−1) matrix with −1's on the diagonal )
= (−1)^(n−1) q(λ) (−1)^(n−1) = q(λ).
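This identity is easy to check symbolically; below is a small sketch (assuming sympy; the helper `companion` and the sample polynomial are illustrative):

```python
from sympy import Matrix, symbols

lam = symbols('lambda')

def companion(coeffs):
    """Companion matrix of q(x) = coeffs[0] + coeffs[1]*x + ... + x**n."""
    n = len(coeffs)
    C = Matrix.zeros(n, n)
    for i in range(n - 1):
        C[i, i + 1] = 1              # 1's on the superdiagonal
    for j in range(n):
        C[n - 1, j] = -coeffs[j]     # last row: -a_0, -a_1, ..., -a_{n-1}
    return C

# q(x) = 6 + 11x + 6x^2 + x^3
C = companion([6, 11, 6])
print(C.charpoly(lam).as_expr())     # lambda**3 + 6*lambda**2 + 11*lambda + 6 = q(lambda)
```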
Illustrative Example-2. Deduce from the previous problem that the characteristic
polynomial of T is the product of all elementary divisors of T.
Solution. Under the rational canonical form, the matrix of T is diag( R_1, …, R_k ),
where each R_i = diag( C(q_i(x)^(e_i1)), …, C(q_i(x)^(e_i n_i)) ).
Then the characteristic polynomial of T is the product of the characteristic polynomials of the blocks
C(q_i(x)^(e_ij)), each of which, by the previous problem, is q_i(x)^(e_ij). Hence the characteristic polynomial of T is
the product of all the elementary divisors.
Illustrative Example-3. If F is the field of rational numbers, find all possible rational
canonical forms and elementary divisors for the 6 × 6 matrices in F_6 having (x − 2)(x + 2)^3 as minimal polynomial.
Solution. Since the matrix is of size 6 × 6 and the minimal polynomial is (x − 2)(x + 2)^3,
there are three cases for the characteristic polynomial.
Case 1. Characteristic polynomial is (x − 2)(x + 2)^5.
In this case, the rational canonical forms are C(x − 2) ⊕ C((x + 2)^3) ⊕ C((x + 2)^2)
or C(x − 2) ⊕ C((x + 2)^3) ⊕ C(x + 2) ⊕ C(x + 2).
Case 2. Characteristic polynomial is (x − 2)^2 (x + 2)^4.
In this case, the rational canonical form is C(x − 2) ⊕ C(x − 2) ⊕ C((x + 2)^3) ⊕ C(x + 2).
Case 3. Characteristic polynomial is (x − 2)^3 (x + 2)^3.
In this case, the rational canonical form is C(x − 2) ⊕ C(x − 2) ⊕ C(x − 2) ⊕ C((x + 2)^3).
All these can be written in block matrix form using the companion matrix C(q(x)) of q(x).
4. 3. Summary
The Jordan canonical form is the one most generally used to prove theorems
about linear transformations and matrices. Unfortunately, it has one distinct, serious
drawback: it puts requirements on the location of the characteristic roots. Thus we
need some canonical form for elements in A(V) (or in F_n) which presumes nothing about
the location of the characteristic roots of its elements, a canonical form and a set of
invariants created in A(V) itself using only its elements and operations. Such a canonical
form, developed in this unit, is the rational canonical form.
4. 4. Key words
4. 5. Assessment Questions
1. Explain the role of R-module in rational canonical form.
Hint. See the section 4. 2. 1.
2. Show that the elements S and T in A(V) are similar in A(V) if and only if they have
the same elementary divisors.
Hint. See the similar linear transformation in unit-1 and definition 4. 2. 3.
3. Find the rational canonical form of

A = | 1  1  1  1 |
    | 0  0  0  0 |
    | 0  0 −1  0 |
    | 0 −1  1  0 |

Hint. The characteristic polynomial of A is (x − 1) x (x^2 + x + 1); then use the
companion matrices of these factors.
4. If F is the field of rational numbers, find all possible rational canonical forms and
elementary divisors for the 6 × 6 matrices in F_6 having (x − 1)(x^2 + 1)^2 as minimal
polynomial.
Hint. Use the method of Illustrative Example-3.
4. 6. References
BLOCK - IV: STATISTICS
UNIT-13: Combinatorics and Descriptive Statistics
STRUCTURE
13.0. Objectives
13.1 Introduction
13.2 Combinatorics & permutation
13.3 Frequency Distribution
13.4 Graphical Representation of Data
13.5 Measures of Central Tendency
13.6 Moments, Skewness, Kurtosis
13.7 Summary
13.8 Keywords
13.9 Questions for self-study
13.10 References
13.0 Objectives
13.1 Introduction
Features of combinatorics
Some of the important features of combinatorics are as follows:
Deciding whether particular criteria can be fulfilled, and analysing the arrangements that meet those
criteria, as in combinatorial designs.
In English we use the word “combination” without asking whether the order is
important. Consider a simple instance: a fruit salad is a combination of grapes, bananas, and apples; the order of the fruits in
the salad does not matter, because it is the same fruit salad.
But suppose the combination of a lock is 475. Here the order matters: 457, 574 and the other
rearrangements will not work; only the combination 4-7-5 can unlock it.
Hence, to be precise:
When the order does not matter, it is a combination.
When the order does matter, it is a permutation.
Combinatorics Formulas
The mathematical form of Permutation and Combination:
Permutation Formula:
Permutation: The act of arranging all the members of a set into some order
or sequence, or rearranging an ordered set, is called permutation.
Mathematically, the number of k-permutations of n is
nPk = n! / (n − k)!
Combination Formula
Combination: A selection of members of the set where the order is disregarded.
The number of k-combinations of n is
C(n, k) = n! / [ (n − k)! k! ]
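Python's standard library exposes both formulas directly; a minimal sketch (assuming Python 3.8 or later):

```python
import math

# k-permutations and k-combinations of n
print(math.perm(9, 3))     # 504 = 9!/6!, used again in Example 1 below
print(math.comb(100, 12))  # C(100, 12), the count used in Example 3 below
print(math.comb(5, 2) == math.factorial(5) // (math.factorial(2) * math.factorial(3)))  # True
```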
Applications of combinatorics
Combinatorics is applied in many areas, such as:
ii. If a certain task can be done in m different ways, and having done it in one of
these ways a second task can be done in n different ways, then the two tasks can
together be done in m × n ways.
Permutations:
A permutation is an arrangement in a definite order of a number of objects taken some
or all at a time.
The number of permutations of n objects taken r at a time is denoted by 𝑛𝑃𝑟 or P(n,r).
To find the number of permutations of n things taken r at a time (which is same as the
number of ways of filling up r blank spaces with n objects) is:
The first space can be filled by any one of the n objects in n ways. After filling the
first space, the second space can be filled by any one of the remaining (n-1) objects in
(n − 1) ways. Proceeding like this, the r-th place can be filled by any one of the
remaining n − (r − 1) (= n − r + 1) objects, i.e. in (n − r + 1) ways. Hence
nPr = n(n − 1)(n − 2) ⋯ (n − r + 1) = n! / (n − r)!, 0 ≤ r ≤ n, (1.1)
and in particular nPn = n!.
Suppose in the arrangement, the repetition is allowed, then each space can be filled
in n ways and hence, r blank spaces can be filled in n.n.....n =nr ways. Therefore the
number of permutations of n different objects taken r at a time when repetition is
allowed is nr .
The number of permutations of n things taken all at a time, when p of them are of one
kind, q of them are of another kind, r of them of a third kind and so on, and the rest (if
any) are all different, is n! / (p! q! r! …).
Example 1
The number of 3-digit numbers that can be formed using the digits 1 to 9, if no
digit is repeated, is 9P3 = 9 × 8 × 7 = 504.
Example 2
The number of different signals that can be transmitted by arranging 3 red, 2 yellow
and 2 green flags on a pole, assuming all flags are used for each signal, is 7! / (3! 2! 2!) = 210.
Combinations:
A combination is a selection of some or all of a number of different objects where the
order of selection is immaterial. The number of ways of selecting r objects out of n
objects is denoted by C(n, r) or nCr and is given by
C(n, r) = n! / ( r! (n − r)! ), (1.2)
where n and r are positive integers with r ≤ n.
The following results hold true:
C(n, r) = C(n, n − r),  C(n, r) + C(n, r − 1) = C(n + 1, r),
n · C(n − 1, r − 1) = (n − r + 1) · C(n, r − 1).
Example 3
A box contains 75 good IC chips and 25 defective chips; 12 chips are selected at
random. The number of ways this can be done is C(100, 12) (the number of samples). The number
of samples in which all 12 selected chips are good is C(75, 12).
Binomial Theorem:
If n is a positive integer, then
(a + b)^n = C(n, 0) a^n + C(n, 1) a^(n−1) b + C(n, 2) a^(n−2) b^2 + ⋯ + C(n, n) b^n. (1.3)
The coefficients C(n, 0), C(n, 1), …, C(n, n) are called binomial coefficients.
The result is proved by mathematical induction.
Using the binomial theorem one can easily evaluate, for example, 11^13 or (1.1)^1000, and expand (1 + x)^n.
A tabular presentation of a frequency distribution is called a frequency table. A frequency
distribution in which class intervals are considered is a continuous (grouped) frequency
distribution; otherwise it is a discrete (ungrouped) frequency distribution. Let us
consider an example to understand frequency distribution.
Example 4
Marks obtained by 20 students from a class in a test are:
23, 13, 26, 11, 18, 09, 21, 23, 13, 30,
22, 11, 17, 22, 19, 13, 14, 22, 15, 16.
This form of data is known as raw data. To understand and interpret such data, it needs to
be ‘organized’. One way of organization of larger data into a concise form is construction
of frequency distribution table. Note that the term frequency refers to the number of times
an observation occurs or appears in a data set.
Marks obtained Frequency
09 1
11 2
13 3
14 1
15 1
16 1
17 1
18 1
19 1
21 1
22 3
23 2
26 1
30 1
This is an ungrouped frequency distribution table. It takes into account ungrouped data
and calculates the frequency for each data value.
Consider the data in the form of class intervals (CI) to tally the frequency for the data that
belongs to that particular class interval.
CI (Marks obtained) Frequency
5-10 1
10-15 6
15-20 5
20-25 6
25-30 1
30-35 1
This is a grouped frequency table. From this table it is clear that the whole range is
divided into mutually exclusive sub-intervals called Class Intervals (CI). Each CI has a
lower class limit (5, 10, 15, 20, 25, 30) and an upper class limit (10, 15, 20, 25, 30, 35). The
difference between the class limits is called the width of the CI. The number of
observations in any class is the class frequency. For a CI, let f be the class frequency,
w the width of the CI, and N the total frequency. Then
d = f/w is the frequency density and p = f/N is the relative frequency. In the above example N is
20; for the CI 20-25, f is 6 and w is 5, hence d = 6/5 = 1.2 and p = 6/20 = 0.3.
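A short Python sketch (assuming plain Python 3; the variable names are illustrative) that reproduces the grouped frequency, frequency density and relative frequency for the marks data of Example 4:

```python
marks = [23, 13, 26, 11, 18, 9, 21, 23, 13, 30,
         22, 11, 17, 22, 19, 13, 14, 22, 15, 16]

edges = [5, 10, 15, 20, 25, 30, 35]   # exclusive-type class intervals
N = len(marks)                        # total frequency, 20

for lo, hi in zip(edges, edges[1:]):
    f = sum(lo <= m < hi for m in marks)          # lower limit included, upper excluded
    print(f"{lo}-{hi}: f={f}  d={f/(hi-lo):.2f}  p={f/N:.2f}")
```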
Inclusive and Exclusive CIs:
If a CI is such that the values of data equal to lower and upper class limits are
included in the same CI, it is inclusive CI. For example, if the class intervals of above
example are to be inclusive type then the CI would be 5 – 9, 10 – 14, 15 – 19,.... If a
CI is such that the value of the variable equal to lower class limit is included in that
class but the variable having a value equal to upper class limit is included in
succeeding CI, it is exclusive CI.
The above frequency table is of exclusive type. Here the data value 15 is
counted in the CI 15-20 but not in 10-15, and similarly the data value 30 is counted
in the CI 30-35 but not in 25-30.
If the CIs are of inclusive type, they need to be converted to exclusive type by
subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit
(see Example 13). From a histogram the mode of the distribution can be read off. Histograms show
which values are more and less common, along with their dispersion.
Example 5
Figure 1 is an example of a histogram that shows the distribution of salary, of the
employees of a corporation.
Salary (in 000s) Number of employees
140-145 4
145-150 10
150-155 18
155-160 20
160-165 15
165-170 8
170-175 5
Figure 1: Histogram of the salary distribution.
Example 6
The distribution of Weekly wages of workers in a factory are as below.
Weekly wages (in Rs.) Number of workers Frequency Density
200-400 40 0.2
400-450 85 1.7
450-500 160 3.2
500-600 280 2.8
600-700 110 1.1
700-800 60 0.6
800-900 10 0.1
Figure 2 is the histogram for this data. As the CIs are of unequal width, the heights of the
rectangles are proportional to the frequency density (f/w).
Figure 2: Histogram with unequal class widths.
To find the mode of the distribution, mark the points A, B, C, D on the highest
rectangle (as shown in Figure 2), join AC and BD. Let these intersect at O. From O
draw perpendicular to X-axis. Let it meet X-axis at P. The value of P is mode. In this
example, mode = 540.
Frequency Polygon:
Variable is taken along X-axis. CIs may be of equal width, unequal width, inclusive
type or exclusive type.
Class mid values are obtained. Class frequencies are plotted against class-mid values.
These points are joined by straight lines. Joining the mid-points of the upper sides of
the rectangles of histogram also, frequency polygon is obtained.
Should not be affected by abnormal/extreme values.
Should be capable of further mathematical treatment so that it could be used in further
analysis of data.
Should be a stable measure.
Arithmetic Mean (A. M.):
It is obtained by dividing the sum of the values by the number of values in the set. The A. M. of x_1,
x_2, …, x_n is
x̄ = (x_1 + x_2 + ⋯ + x_n)/n = ( Σ_(i=1)^n x_i )/n. (1.4)
For a frequency distribution with values x_i and frequencies f_i,
x̄ = ( Σ x_i f_i ) / ( Σ f_i ). (1.6)
Example 7
The pull off force collected from 10 prototype engine connectors are 11.5, 12.3, 10.2,
12.6, 13.4, 11.2, 12.1, 11.8, 10.7, 11.6. The mean pull off force is given by
x̄ = (11.5 + 12.3 + ⋯ + 11.6)/10 = 11.74.
Example 8
The number of IC chips with different numbers of defects are as below.
Number of defects (X) Number of chips(f)
1 25
2 32
3 12
4 48
5 33
6 42
7 29
8 17
9 9
10 6
The mean number of defects per chip is given by
x̄ = (1×25 + 2×32 + ⋯ + 10×6) / (25 + 32 + ⋯ + 6) = 1214/253 = 4.8.
Example 9
Daily expenditure on commutation by 120 students is given below.
Expenditure Number of Mid-values fx
In Rs. Students(f) x
10-20 21 15 315
20-30 40 25 1000
30-40 33 35 1155
40-50 14 45 630
50-60 7 55 385
60-70 3 65 195
70-80 2 75 150
The mean expenditure is
x̄ = (315 + 1000 + ⋯ + 150)/120 = 3830/120 = Rs. 31.92.
Note: The mean is easy to compute and uses all the values; it can be calculated even when
some of the values are zero or negative. However, it is affected by abnormal (extreme) values.
Median:
Median of a set of values is the middle most value when they are arranged in
ascending (or descending) order of magnitude.
For ungrouped Data, arrange the data in ascending or descending order.
Let the total number of values be n.
Median = ((n + 1)/2)-th observation, if n is odd;
Median = [ (n/2)-th value + ((n/2) + 1)-th value ] / 2, if n is even.
Example 10
For the data 50, 67, 24, 34, 78, 43, 55, the median is the ((n + 1)/2)-th = 4th value of
the ordered data 24, 34, 43, 50, 55, 67, 78, which is 50.
Example 11
For the data 46, 83, 55, 26, 65, 56, 65, 37, 22, 73 (n = 10, even), the ordered data are
22, 26, 37, 46, 55, 56, 65, 65, 73, 83 and the median is [(n/2)-th value + ((n/2) + 1)-th value]/2 = (55 + 56)/2 = 55.5.
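These two rules are exactly what Python's statistics.median implements; a quick check of Examples 10 and 11:

```python
import statistics

# Example 10: n = 7 (odd), the 4th ordered value
print(statistics.median([50, 67, 24, 34, 78, 43, 55]))              # 50
# Example 11: n = 10 (even), the mean of the 5th and 6th ordered values
print(statistics.median([46, 83, 55, 26, 65, 56, 65, 37, 22, 73]))  # 55.5
```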
When the data are continuous and in the form of a frequency distribution, the median is
found as
Median = l + ( (N/2 − c)/f ) · h, (1.8)
where l is the lower limit of the median class, f its frequency, c the cumulative frequency
of the class preceding it, h the width of the median class and N the total frequency.
Example 12
Following are the marks scored by 50 students; the median mark is found as follows.

CI (Marks)  Frequency  Cumulative frequency
0-10        2          2
10-20       12         2 + 12 = 14
20-30       22         14 + 22 = 36
30-40       8          36 + 8 = 44
40-50       6          44 + 6 = 50
N/2 = 50/2 = 25
Median Class = (20 - 30)
𝑙= 20, f = 22, c = 14, h = 10
Using Median formula (1.8)
Median= 20 + (25 - 14)/22 × 10
= 20 + (11/22) × 10
= 20 + 5 = 25
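Formula (1.8) translates directly into code; a minimal sketch (the function name grouped_median is illustrative) reproducing Example 12 and the upcoming Example 13:

```python
def grouped_median(l, f, c, h, N):
    """Median = l + ((N/2 - c)/f) * h, formula (1.8)."""
    return l + (N / 2 - c) / f * h

# Example 12: median class 20-30 with l=20, f=22, c=14, h=10, N=50
print(grouped_median(20, 22, 14, 10, 50))                  # 25.0
# Example 13 below: median class 159.5-164.5 with h=5, N=75
print(round(grouped_median(159.5, 21, 36, 5, 75), 2))      # 159.86
```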
Example 13
Following are the heights of students registered for Computer Science core
Course.
Classes:   130-139  140-149  150-154  155-159  160-164  165-175
Frequency: 2        3        11       20       21       18
First convert the CI to exclusive type, find cumulative frequencies.
Classes Frequency Cum. Frequency
129.5-139.5 2 2
139.5-149.5 3 2+3=5
149.5-154.5 11 5+11=16
154.5-159.5 20 16+20=36
159.5-164.5 21 36+21=57
164.5-175.5 18 57+18=75
N = 75
N/2 = 75/2 = 37.5
Median class = 159.5-164.5
l = 159.5, f = 21, c = 36, h = 5 (the width of the median class)
Using the median formula (1.8),
Median = 159.5 + ((37.5 − 36)/21) × 5
= 159.5 + 0.36 = 159.86
Note: The median can be computed even when some extreme values are
missing. It is not affected by abnormal values/outliers, and it can be
used for qualitative (ordinal) data as well. Its computation is not based on all the values.
Mode:
The value which appears most often in the given data i.e. the observation with the
highest frequency is called a mode of data. For ungrouped data, we need to identify the
observation which occurs maximum times. For example in the data: 7, 8, 9, 3, 4, 6, 7,
6, 8, 3, 12, 8 the value 8 appears the
most number of times. Thus, mode = 8. A data may have no mode, 1 mode, or more
than 1 mode. Depending upon the number of modes the data has, it can be called
unimodal, bimodal, trimodal, or multimodal. For example in the data: 7, 8, 9, 3, 4, 6, 7,
6, 8, 3, 12, 8, 7, 13, the values 7 and 8 appear the most number of times. Thus, 7 and
8 are modes, hence the data is bimodal. When the data is continuous and in the form of
a frequency distribution, the mode is found as shown below:
Mode = l + ( (f_m − f_0) / (2 f_m − f_0 − f_1) ) · h, (1.9)
where l is the lower limit of the modal class, f_m the frequency of the modal class, f_0 the
frequency of the preceding class, f_1 the frequency of the succeeding class and h the class width.
CI          Frequency
49.5-59.5   29
59.5-69.5   36
69.5-79.5   25
79.5-89.5   13
89.5-100.5  14

The modal class is 59.5-69.5, with l = 59.5, f_m = 36, f_0 = 29, f_1 = 25, h = 10.
Using (1.9), Mode = 59.5 + (7/18) × 10 = 63.39.
Note: The mode can be calculated even when some extreme values are
missing, and it is not affected by extreme values/outliers. If a data set has more
than one mode, the mode is of limited use for further statistical analysis.
Measures of Dispersion
A measure of dispersion reflects how closely the data clusters around the measure of
central tendency. It helps to judge the reliability of measure of Central tendency, to
obtain correct picture of distribution or dispersion of values in the data, to make a
comparative study of variability of two or more data sets or samples.
The two types of measures of dispersion are - 1. Absolute measures of dispersion 2.
Relative measures of dispersion
Absolute measures of dispersion are:
Range, Quartile Deviation, Mean Deviation, Standard Deviation.
These are expressed in the same unit in which data values or observations are given.
Relative measures of dispersion are:
Co-efficient (Co-eff.) of Range, Co-eff. of Quartile Deviation, Co-eff. of Mean
Deviation, Co-eff. of Variation.
These are expressed as ratios or percentages derived from the absolute
measures of dispersion; therefore relative measures of dispersion are also called
coefficients of dispersion. They are free from units of measurement. Relative
measures are used for comparing the variability of two or more distributions having
different units of measurement.
Mean Deviation and Standard Deviation are popular measures of dispersion.
Mean Deviation:
Mean deviation is the average absolute deviation from the mean (or median) value
of the given data set. For data values x_1, x_2, …, x_n, the mean deviation (MD) is
given by
MD = ( Σ_(i=1)^n |x_i − m| ) / n, (1.10)
where m is the mean or the median of the data set; for a grouped distribution the x_i are the mid-values of the CIs and each |x_i − m| is weighted by its class frequency.
Example 16
The number of syntax errors committed on compilation of a C-program coded by 7
students are, 4,3,5,8,1,11, 6.
To find the mean deviation from the mean, the following computations are used.
x̄ = (4 + 3 + 5 + 8 + 1 + 11 + 6)/7 = 38/7 = 5.43.
|4 − 5.43| = 1.43, |3 − 5.43| = 2.43, |5 − 5.43| = 0.43,
|8 − 5.43| = 2.57, |1 − 5.43| = 4.43, |11 − 5.43| = 5.57,
|6 − 5.43| = 0.57,
MD_mean = (1.43 + 2.43 + 0.43 + 2.57 + 4.43 + 5.57 + 0.57)/7 = 2.49.
To find the mean deviation from the median, the following computations are used.
1, 3, 4, 5, 6, 8, 11 (ascending order arrangement)
The median is the value of the ((7 + 1)/2)-th item = 5.
|4 − 5| = 1, |3 − 5| = 2, |5 − 5| = 0, |8 − 5| = 3, |1 − 5| = 4, |11 − 5| = 6,
|6 − 5| = 1.
MD_med = (1 + 2 + 0 + 3 + 4 + 6 + 1)/7 = 17/7 = 2.4286.
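A short sketch (assuming plain Python 3) reproducing Example 16:

```python
from statistics import mean, median

errors = [4, 3, 5, 8, 1, 11, 6]   # syntax errors of Example 16

def mean_deviation(data, m):
    """Formula (1.10): average absolute deviation about m."""
    return sum(abs(x - m) for x in data) / len(data)

print(round(mean_deviation(errors, mean(errors)), 4))    # 2.4898, MD about the mean
print(round(mean_deviation(errors, median(errors)), 4))  # 2.4286, MD about the median
```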
Example 17
On inspection of 25 systems, the number of defects with respective frequencies is
found to be
No. of defects 6 5 9 12
Frequency 5 7 4 9
Example 18
To find the mean deviation for the following data set,
CI: 0-2 2-4 4-6 6-8 8-10
Freq.: 4 3 5 7 6
The computations are:
Mid value (x_i)  Freq. (f_i)  x_i f_i  |x_i − x̄|  f_i |x_i − x̄|
1                4            4        4.64        18.56
3                3            9        2.64        7.92
5                5            25       0.64        3.20
7                7            49       1.36        9.52
9                6            54       3.36        20.16

x̄ = 141/25 = 5.64, MD_mean = ( Σ f_i |x_i − x̄| ) / 25 = 59.36/25 = 2.3744.
Standard Deviation:
A standard deviation (SD) is a statistic that measures the dispersion of a data set
relative to its mean.
It is the positive square root of the mean of the squared deviations of the values from the
arithmetic mean x̄. It is denoted by σ and is calculated as
σ = √( Σ (x_i − x̄)^2 / n ) = √( Σ x_i^2 / n − ( Σ x_i / n )^2 ). (1.13)
For a frequency distribution,
σ = √( Σ f_i x_i^2 / N − ( Σ f_i x_i / N )^2 ), where N = Σ f_i. (1.14)
For a continuous frequency distribution the SD is the same as in (1.14), with the x_i
being the class mid-values.
Let a and h be constants. If x_1, x_2, …, x_n are the mid-values of the CIs, let
u_1 = (x_1 − a)/h, u_2 = (x_2 − a)/h, …, u_n = (x_n − a)/h. If σ is the SD of the x-values, then the SD of the u-values is
σ/h; equivalently, σ = h × (SD of u).
Square of the SD is called the variance.
Square of SD is called variance.
Example 19
Heights (in cm) of 6 children are 132, 137, 136, 142, 135, and 140. The SD of these
values is computed as below:
x x - 𝑥̅ (𝑥 − 𝑥̅ )2
132 -5 25
137 0 0
136 -1 1
142 5 25
135 -2 4
140 3 9
N = 6, x̄ = 137, Σ(x − x̄)^2 = 64; using (1.13), SD = √(64/6) = 3.266.
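A quick check with numpy (np.std divides by n by default, matching the population formula (1.13)):

```python
import numpy as np

heights = np.array([132, 137, 136, 142, 135, 140])

# Population standard deviation, formula (1.13)
print(round(float(np.std(heights)), 3))   # 3.266
```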
Example 20
Daily sales of computer systems of a brand are given below.
Sales: 12 13 14 15 16 17 18 19
Days: 1 0 4 12 20 15 6 2
The SD of sales is obtained as below:

x    f    fx    x^2   f x^2
12   1    12    144   144
13   0    0     169   0
14   4    56    196   784
15   12   180   225   2700
16   20   320   256   5120
17   15   255   289   4335
18   6    108   324   1944
19   2    38    361   722

N = 60, Σ f_i x_i = 969, Σ f_i x_i^2 = 15749; using (1.14), SD = √(15749/60 − (969/60)^2) = 1.289.
Example 21
The diastolic blood pressures of 80 individuals are given below.
CI: 78-80 80-82 82-84 84-86 86-88 88-90
Freq: 3 15 26 23 9 4
The SD for this data is computed with u = (X − 83)/2:

CI      f    X    u    fu    u^2   f u^2
78-80   3    79   −2   −6    4     12
80-82   15   81   −1   −15   1     15
82-84   26   83   0    0     0     0
84-86   23   85   1    23    1     23
86-88   9    87   2    18    4     36
88-90   4    89   3    12    9     36

Σ fu = 32, Σ f u^2 = 122, h = 2 (class width), and using
SD = √( Σ f_i u_i^2 / N − ( Σ f_i u_i / N )^2 ) · h, we
have SD = √(122/80 − (32/80)^2) × 2 = 2.336.
Note: SD is independent of origin of measurement, but not scale. For any data set,
SD≥ 0.
13.6 Moments, Skewness, Kurtosis
Skewness:
Measures of central tendency and dispersion are not sufficient to describe the nature
of distribution. The concepts of Skewness and Kurtosis are used to study the spread
and concentration of data values around central value respectively.
Skewness means lack of symmetry. A distribution is said to be symmetrical when the
data values are uniformly distributed around mean. In symmetric distribution mean,
median and mode are equal. The Co-efficient of skewness is given by
Sk = 3(mean − median) / SD. (1.16)
Figure 3
Kurtosis:
A frequency distribution may have high concentration of values at the centre
compared to extreme values. Kurtosis indicates degree of peakedness of the
distribution. It is denoted as 𝛽2 and is defined as
𝜇
𝛽2 = 𝜇42 (1.18)
2
204
13.7 Summary
3. A box contains 75 good IC chips and 25 defective chips; 12 chips are selected
at random. The number of ways this can be done is C(100, 12) (the number of samples).
The number of samples in which all 12 selected chips are good is C(75, 12).
12
4. The pull off force collected from 10 prototype engine connectors are 11.5,
12.3, 10.2, 12.6, 13.4, 11.2, 12.1, 11.8, 10.7, 11.6. The mean pull off force is
given by
x̄ = (11.5 + 12.3 + ⋯ + 11.6)/10 = 11.74.
5. For the data 50, 67, 24, 34, 78, 43, 55, the median is the ((n + 1)/2)-th = 4th
value of 24, 34, 43, 50, 55, 67, 78, which is 50.
13.8 Keywords
Combinatorics
Permutation
Histograms
Skewness
Kurtosis
13.9 Questions for Self Study
1. What is Combinatorics?
2. Explain Combinatorics & permutation.
3. A bag has 9 tickets marked with the numbers 1, 2, …, 9. Two tickets are
drawn at random from the bag. Find the number of ways in which
the numbers drawn are: i. both even; ii. both odd; iii. one even and
the other odd.
4. Number of defective chips in 50 lots are observed as below.
22, 17, 7, 11, 24, 12, 8, 21, 14, 4, 3, 16, 24, 23, 15, 19
6. Explain various measures of central tendency. State their merits
and limitations.
13.10 References
UNIT-14 Elementary Probability Theory
STRUCTURE
14.0 Objectives
14.1 Introduction
14.5 Summary
14.6 Keywords
14.8 References
14.0 Objectives
14.1 Introduction
Sample Space:
Event:
An event is simply a collection of certain sample points, that is, a subset of the sample
space. A single performance of the experiment is known as a trial. Let E be an event
defined on a sample space S, that is, E is a subset of S, and let the outcome of a specific
trial be denoted by s, an element of S. If s is an element of E, then it is said that the
event E has occurred. The entire sample space is an event, called the universal event; the null set
is used to denote the impossible event or null event.
Mutually Exclusive Events:
Two or more events are said to be mutually exclusive if they cannot happen
simultaneously in one trial.
Axioms of Probability:
If S is a sample space and E is the set of all events, then to each event A in E a unique
real number P(A), known as the probability of event A, is assigned, satisfying the following axioms:
i. P(S) = 1
ii. For every event A in E, 0 ≤ P(A) ≤ 1
iii. For any countable sequence of events A_1, A_2, …, A_n that are mutually
exclusive, P( ∪_i A_i ) = Σ_i P(A_i).
Addition Rule:
Conditional Probability:
Let A and B be two events. Probability of happening of event B when event A has
already occurred is called conditional probability of B given A and is denoted
byP(B|A).
P(B|A) = P(both B and A)/P(A) = P(A ∩ B)/P(A). (2.2)
Multiplication Rule:
From (2.2), P(A ∩ B) = P(A) P(B|A). (2.3)
If A and B are two independent events, then P(B|A) = P(B), and hence (2.3) can be
written as P(A ∩ B) = P(A) P(B).
Probability Space:
Example 1
A lot of integrated circuit (IC) chips consists of two good chips g1 and g2 and two defective chips d1 and d2.
If three chips are selected at random from this group, what is the probability of the event
E that two of the three selected chips are defective?
Solution:
Writing all possibilities of selecting three chips out of the four, the sample space is
S = {g1 g2 d1, g1 g2 d2, g1 d1 d2, g2 d1 d2}. The event E = {g1 d1 d2, g2 d1 d2}, so P(E) = 2/4 = 1/2.
Example 2
What is the probability that some randomly chosen k-digit decimal number is a valid
k-digit octal number.
Solution
For the event E that the number is a k-digit octal number, the number of favourable cases is 8^k, so
P(E) = 8^k / 10^k = (4/5)^k.
Example 3
The probability that 3 students A, B, C solve a problem are 1/2, 1/3, 1/4 respectively.
If the problem is simultaneously assigned to all of them, what is the probability that
the problem is solved?
Solution:
Let E be the event of solving the problem and Ē the event of not solving it.
Given P(A) = 1/2, P(B) = 1/3, P(C) = 1/4, and assuming the students work independently,
P(Ē) = (1 − 1/2)(1 − 1/3)(1 − 1/4) = (1/2)(2/3)(3/4) = 1/4; hence P(E) = 1 − 1/4 = 3/4.
Example 4
The probability that a team wins match is 3/5. If this team plays 3 matches in a
tournament, what is the probability that the team
Solution
The events are independent, and P(W̄_1) = P(W̄_2) = P(W̄_3) = 2/5.
Example 5
A box has 5000 IC chips, of which 1000 are manufactured by company X and rest by
company Y. 10% of chips manufactured by company X and 5% of chips
manufactured by company Y are defective. If a randomly chosen chip is found to be
defective, find the probability that it is manufactured by company X.
Solution:
Let B be the event that the chip is defective and A the event that it is made by company X.
A defective chip may be manufactured by company X or Y: 10% of the 1000 chips from X (100 chips) and
5% of the 4000 chips from Y (200 chips) are defective, so out of the 5000 chips 300 are
defective and P(B) = 300/5000 = 0.06.
The event A ∩ B is the event that the chip is made by company X and is defective;
P(A ∩ B) = 100/5000 = 0.02.
Hence P(A|B) = P(A ∩ B)/P(B) = 0.02/0.06 = 1/3.
Example 6
In a school 25% of the students failed in first language, 15% of the students failed in
second language and 10% of the students failed in both. If a student is selected at
random, find the probability that
Solution
Let L1 be the set of students failing in first language, L2 be the set of students failing
in second language. We have,
P(L1) = 25/100 = 1/4, P(L2) = 15/100 = 3/20, P(L1∩ L2) = 10/100 = 1/10
i. P(L1|L2) = P(L1 ∩ L2)/P(L2) = (1/10)/(3/20) = 2/3
ii. P(L2|L1) = P(L1 ∩ L2)/P(L1) = (1/10)/(1/4) = 2/5
iii. P(L1 ∪ L2) = P(L1) + P(L2) − P(L1 ∩ L2) = 1/4 + 3/20 − 1/10 = 3/10
Example 7
X={0, 1}.
Example 8
Suppose 2 chips are to be selected at random from a lot, the chip may be good(g) or
defective(d). The sample space is S = {gg, gd, dg, dd} depending on the selected
chips being good or bad. Suppose X denotes the number of good chips out of 2
selected, X = 0, 1, 2.
X = {0, 1, 2}.
Example 9
In a post office, the weights (in gms.) of speed post articles received in a day are: 87,
800, 225, 430, 290, 220, 350, 105, 95 represent r.v. s
If r.v. takes finite or countably infinite number of values then it is called a discrete r.v.
If a r.v. takes infinite number of values (even in small range) then it is called
continuous r.v.
Probability Distribution:
For each value x_i of a discrete random variable X, a real number p(x_i) is defined such
that (i) p(x_i) ≥ 0 and (ii) Σ p(x_i) = 1; p(x) is called the probability mass function (pmf). The
set of pairs {x_i, p(x_i)} is called the probability distribution of the discrete r.v. X, and p(x_i) gives the probability
that the r.v. X takes the value x_i on a performance of the experiment.
Similarly, for a continuous random variable X a real function f(x) is defined such that
(i) f(x) ≥ 0 and (ii) ∫ f(x) dx = 1; f(x) is called the probability density
function (pdf). f(x_i) δx_i gives approximately the probability that X, on a performance of the
experiment, belongs to the small interval (x_i, x_i + δx_i).
The distribution function F(x), defined by F(x) = P(X ≤ x) = ∫_l^x f(t) dt, is the
cumulative distribution function, where l is the lower limit for the r.v. X.
Example 10
In Example 8, suppose the probability that a chip is defective is 0.1. Then the pmf of
X = {0, 1, 2}, the number of good chips, is
P(X = 0) = P(both chips defective) = (0.1)^2 = 0.01
P(X = 1) = P(1 chip good, 1 chip defective) = 2 × (0.9)(0.1) = 0.18
P(X = 2) = P(both chips good) = (0.9)^2 = 0.81

x:    0     1     2
p(x): 0.01  0.18  0.81
Example 11
x: 10 20 30 40
Solution
Example 12
Find k so that the following is a pmf:
x:    −3  −2  −1  0   1   2   3
p(x): k   2k  3k  4k  5k  7k  8k
Solution
Σ p(x) = (1 + 2 + 3 + 4 + 5 + 7 + 8) k = 30k = 1, so k = 1/30.
Example 13
For
p(x) = (1/2)(2/3)^x, x = 1, 2, 3, …,
find whether p(x) is a pmf, and the probability of X being an odd number.
Solution
Σ p(x) = Σ_(x=1)^∞ (1/2)(2/3)^x
= (1/2) ( 2/3 + (2/3)^2 + (2/3)^3 + ⋯ )
= (1/2) · (2/3)/(1 − 2/3) = 1, proving that p(x) is a pmf.
P(X is an odd number) = Σ_(x=1,3,5,…) (1/2)(2/3)^x
= (1/2) ( 2/3 + (2/3)^3 + (2/3)^5 + ⋯ )
= (1/2) · (2/3)/(1 − (2/3)^2) = 3/5.
Example 14
Find whether f(x) = 2x for 0 < x < 1, and 0 otherwise, is a pdf.
Solution
∫_0^1 f(x) dx = ∫_0^1 2x dx = 1, and f(x) ≥ 0, so f(x) is a pdf.
Example 15
The r.v. X has pdf f(x) = x/6 + c for 0 ≤ x ≤ 3, and 0 otherwise. Find c, and P(1 ≤ X ≤ 2).
Solution
For f(x) to be a pdf, we must have ∫_0^3 f(x) dx = 1.
∫_0^3 ( x/6 + c ) dx = [ x^2/12 + cx ]_0^3 = 3/4 + 3c.
∫_0^3 f(x) dx = 1 ⇒ c = 1/12.
P(1 ≤ X ≤ 2) = ∫_1^2 ( x/6 + 1/12 ) dx = [ x^2/12 + x/12 ]_1^2 = 6/12 − 2/12 = 1/3.
Example 16
f(x) = 6x − 6x^2 for 0 ≤ x ≤ 1, and 0 otherwise.
Solution
Example 17
The time t (in years) required to complete a software project has pdf of the form f(t) =
k t(1 − t) for 0 ≤ t ≤ 1, and 0 otherwise. Find k, and also the probability that the
project will be completed in less than 4 months.
Solution
k is the solution of ∫_0^1 f(t) dt = 1, i.e. ∫_0^1 k t(1 − t) dt = 1, implying that k = 6.
The probability that the project will be completed in 4 months (1/3 year) is given by
P(0 < t < 1/3) = ∫_0^(1/3) 6t(1 − t) dt
= [ 3t^2 − 2t^3 ]_0^(1/3) = 3/9 − 2/27 = 7/27.
14.4 Theoretical distributions
Binomial Distribution:
If a random experiment with only two possible outcomes (say success and failure),
which are mutually exclusive and exhaustive, is repeated n times, the probability of x
successes in the n trials (with probability of success p in each trial) is given by
P(x) = C(n, x) p^x q^(n−x), x = 0, 1, 2, …, n, (2.5)
with 0 < p < 1 and p + q = 1. This is called the Binomial Distribution.
Poisson Distribution:
Example 18
Solution
n=100, p=0.01.
Example 19
An airline knows that 5% of the people making reservations on a certain flight will
not turn up. Consequently their policy is to sell 52 tickets for a flight that can only
hold 50 passengers. What is the probability that there will be a seat for every
passenger who turns up?
Solution
Let X be the number of passengers (out of 52) who fail to turn up; then
P(x) = C(52, x) (0.05)^x (0.95)^(52−x), x = 0, 1, 2, …, 52.
A seat is assured for every passenger who turns up if the number of passengers who
fail to turn up is greater than or equal to 2, the probability of which is given by
P(X ≥ 2) = 1 − P(0) − P(1) = 1 − (0.95)^52 − 52(0.05)(0.95)^51 ≈ 0.7405.
Hence, the probability that a seat is available for every passenger who turns up is 0.7405.
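A quick check of Example 19 with Python (a minimal sketch, assuming scipy is installed):

```python
from scipy.stats import binom

n, p = 52, 0.05   # 52 tickets sold, 5% chance a passenger fails to turn up

# Everyone gets a seat when at least 2 passengers fail to turn up.
prob = 1 - binom.pmf(0, n, p) - binom.pmf(1, n, p)
print(round(prob, 4))   # 0.7405
```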
Normal Distribution:
A continuous r.v. X with pdf
f(x) = ( 1/(σ√(2π)) ) e^( −(x − μ)^2 / (2σ^2) ), (2.7)
with −∞ < x < ∞, −∞ < μ < ∞, σ > 0, is said to follow the normal distribution; if a r.v. X
has pdf (2.7), then we write X ~ N(μ, σ^2). μ and σ^2 are the parameters of the
distribution.
If μ = 0 and σ^2 = 1, then
f(x) = ( 1/√(2π) ) e^( −x^2/2 ), −∞ < x < ∞, (2.8)
the standard normal density.
Note:
Writing φ(z) for the standard normal density,
P(−∞ < z ≤ 0) = ∫_(−∞)^0 φ(z) dz = 1/2 = P(0 ≤ z < ∞) = ∫_0^∞ φ(z) dz,
and P(z ≤ z_1) = P(−∞ < z ≤ 0) + P(0 ≤ z ≤ z_1) = 0.5 + P(0 ≤ z ≤ z_1).
To find the area under the standard normal curve between 0 and 1.55, theoretically the
value of ( 1/√(2π) ) ∫_0^1.55 e^(−x^2/2) dx, refer to the table given at the end of this
section: move vertically down the z column to reach 1.5, then move
horizontally along this row to the column headed 5 (regarded as 0.05), to read the
value 0.4394.
Example 20
In a test of electric bulbs, it was found that the life time of a particular brand was
distributed normally with an average life of 2000 hours and standard deviation of 60
hours. If the firm purchases 2500 bulbs, find the number of bulbs that are likely to
last for
Solution
Let X = life of a bulb; X has a normal distribution with μ = 2000, σ = 60.
The standard normal variable is Z = (X − μ)/σ = (X − 2000)/60.
i. P(X > 2100) = P( Z > (2100 − 2000)/60 ) = P(Z > 1.67) = 0.0475
(from standard normal tables).
The number of bulbs that are likely to last for more than 2100 hours is 2500 ×
0.0475 ≈ 119.
ii. P(X < 1950) = P( Z < (1950 − 2000)/60 ) = P(Z < −0.83) = 0.2033
(from standard normal tables).
The number of bulbs that are likely to last for less than 1950 hours is 2500 ×
0.2033 ≈ 508.
iii. P(1900 < X < 2100) = P(−1.67 < Z < 1.67)
= 2 P(0 < Z < 1.67) = 0.905 (from standard normal tables).
The number of bulbs that are likely to last between 1900 and 2100 hours is 2500 ×
0.905 ≈ 2263.
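The same answers can be obtained without tables; a minimal sketch assuming scipy (the results differ slightly from the worked values because z is not rounded to two decimals):

```python
from scipy.stats import norm

mu, sigma, n_bulbs = 2000, 60, 2500

p1 = norm.sf(2100, loc=mu, scale=sigma)    # P(X > 2100), ~0.0478
p2 = norm.cdf(1950, loc=mu, scale=sigma)   # P(X < 1950), ~0.2023
p3 = (norm.cdf(2100, loc=mu, scale=sigma)
      - norm.cdf(1900, loc=mu, scale=sigma))  # P(1900 < X < 2100), ~0.9044

for p in (p1, p2, p3):
    print(round(p, 4), round(n_bulbs * p))    # probability and expected bulb count
```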
Example 21
Solution
Here X is normal with μ = 200 and σ = 16.
P(X > 210) = P( Z > (210 − 200)/16 ) = P(Z > 0.625) = 0.26599
P(X > 240) = P( Z > (240 − 200)/16 ) = P(Z > 2.5) = 0.00621
Mathematical Expectation:
E(X) = Σ_i x_i p(x_i), if X is discrete;
E(X) = ∫ x f(x) dx, if X is continuous. (2.9)
Example 22
x: 0 1 2 3
Find E(3X + 2X^2).
Solution
Example 23
f(x) = 3e^(−3x) for x > 0, and 0 otherwise.
Solution
E(X) = ∫_0^∞ x · 3e^(−3x) dx = ∫_0^∞ e^(−3x) dx = 1/3 (integrating by parts).
14.5 Summary
1. A lot of integrated circuit (IC) chips consists of two good chips g1 and g2 and two defective chips d1 and
d2. If three chips are selected at random from this group, what is the probability of
the event E that two of the three selected chips are defective?
2. What is the probability that some randomly chosen k-digit decimal number is a
valid k-digit octal number.
3. The probability that 3 students A, B, C solve a problem are 1/2, 1/3, 1/4
respectively. If the problem is simultaneously assigned to all of them, what is the
probability that the problem is solved?
4. The probability that a team wins match is 3/5. If this team plays 3 matches in a
tournament, what is the probability that the team
5. A box has 5000 IC chips, of which 1000 are manufactured by company X and rest
by company Y. 10% of chips manufactured by company X and 5% of chips
manufactured by company Y are defective. If a randomly chosen chip is found to be
defective, find the probability that it is manufactured by company X.
14.6 Keywords
Exhaustive Events
Independent Events
Probability Space
b) The incidence of an occupational disease in an industry is such that
a worker has a 20% chance of suffering from it. What is the
probability that out of 5 workers at most two contract the disease?
c) 2.5 percent of the fuses manufactured by a firm are expected to be
defective. Find the probability that a box containing 250 fuses contains
i. no defective fuses; ii. 3 or more defective fuses.
14.8 References
UNIT-15 Correlation and regression analysis
STRUCTURE
15.0 Objectives
15.1 Introduction
15.2 Regression analysis
15.3 Fitting of Second degree parabola
15.4 Inverse regression
15.5 Correlation versus Regression
15.6 Summary
15.7 Keywords
15.8 Question for self study
15.9 References
15.0 Objectives
Given a data set, carry out correlation and regression analyses.
Interpret correlation and regression coefficients.
Decide when to carry out correlation analysis and when to carry out regression analysis.
15.1 Introduction
Figure 1: Scatter diagram.
Linear Correlation
The scatter diagram will give only a vague idea about the presence or absence of
correlation and the nature (positive or negative) of correlation. It will not indicate
about the strength or degree of relationship between two variables. The index of the
degree of relationship between two continuous variables is known as correlation
coefficient. The correlation coefficient is denoted by r in sample and as ρ (read as rho)
in case of population. The correlation coefficient r is known as Pearson’s correlation
coefficient. (It was developed by Karl Pearson). It is also called as Product-moment
correlation.
The correlation coefficient r is used under certain assumptions. They are:
i. The variables under study are continuous random variables and are
normally distributed.
ii. The relationship between the variables is linear.
iii. Each pair of observation is uncorrelated with other pairs.
When the assumptions for the applicability of r are not met, the other
measures of association have to be employed.
Example 1
Following are the agricultural production index (x) of an agricultural product and its
wholesale price index (y) for eight years. Find the correlation coefficient between x and y.
Solution:
Since the correlation coefficient is independent of the origin and scale of measurement
of the variables, the correlation coefficient between x and y is the same as the correlation coefficient
between u = x − 170 and v = y − 160.
x y u v u2 v2 uv
164 158 -6 -2 36 4 12
176 164 6 4 36 16 24
178 165 8 5 64 25 40
184 171 14 11 196 121 154
175 163 5 3 25 9 15
167 156 -3 -4 9 16 12
173 163 3 3 9 9 9
180 169 10 9 100 81 90
Σu = 37, Σv = 29, Σu^2 = 475, Σv^2 = 281, Σuv = 356.
r = ( n Σuv − (Σu)(Σv) ) / √( [n Σu^2 − (Σu)^2][n Σv^2 − (Σv)^2] ) (3.4)
= (8 × 356 − 37 × 29) / √( (8 × 475 − 37^2)(8 × 281 − 29^2) ) = 1775/1849.4 = 0.9598.
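A quick check of Example 1 (a minimal sketch, assuming numpy); shifting the data by (170, 160) does not change r:

```python
import numpy as np

x = [164, 176, 178, 184, 175, 167, 173, 180]
y = [158, 164, 165, 171, 163, 156, 163, 169]

# Pearson's correlation coefficient
print(round(float(np.corrcoef(x, y)[0, 1]), 4))   # 0.9598
```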
Example 2
Find the correlation coefficient for the following data.
x: 10 14 18 22 26 30
y: 18 12 24 6 30 36
Solution
𝑥̅ = 20, 𝑦̅ = 21.
𝑥̃ = (𝑥 − 20), 𝑦̃ = (𝑦 − 21)
x y 𝑥̃ 𝑦̃ 𝑥̃ 2 𝑦̃ 2 𝑥̃𝑦̃
10 18 -10 -3 100 9 30
14 12 -6 -9 36 81 54
18 24 -2 3 4 9 -6
22 6 2 -15 4 225 -30
26 30 6 9 36 81 54
30 36 10 15 100 225 150
Σ x̃^2 = 280, Σ ỹ^2 = 630, Σ x̃ỹ = 30 + 54 − 6 − 30 + 54 + 150 = 252.
Using (3.3), r = 252/√(280 × 630) = 252/420 = 0.6.
Pearson’s correlation coefficient can be calculated only if the characteristics under
study are quantitative (numerically measure). Spearman’s correlation coefficient can
be calculated even if the characteristics under study are qualitative. Here the values of
the variables are ranked in the decreasing or increasing order, correlation coefficient
using these ranks is computed using
ρ = 1 − 6 Σ d^2 / (n^3 − n). (3.5)
Example 3
Marks scored by eight students in C-programming and mathematics are given below.
C-prog: 25 43 27 35 54 61 37 45
Maths: 35 47 20 37 63 54 28 40
Find correlation coefficient between the two marks scored.
Solution
Marks Ranks
C-prog. Maths R1 R2 d = R1 - R2 d2
25 35 8 6 2 4
43 47 4 3 1 1
27 20 7 8 -1 1
35 37 6 5 1 1
54 63 2 1 1 1
61 54 1 2 -1 1
37 28 5 7 -2 4
45 40 3 4 -1 1
∑ 𝑑 2 = 14, n = 8, using (3.5), r = 0.8333.
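scipy computes Spearman's rank correlation directly, handling ties automatically; a quick check of Example 3 (assuming scipy is installed):

```python
from scipy.stats import spearmanr

c_prog = [25, 43, 27, 35, 54, 61, 37, 45]
maths  = [35, 47, 20, 37, 63, 54, 28, 40]

rho, _ = spearmanr(c_prog, maths)   # rank correlation coefficient
print(round(rho, 4))                # 0.8333
```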
For the computation of the correlation coefficient using ranks, two or more values may be
equal while ranking, and so a situation of ties may arise. In such cases,
all those values which are equal are assigned the same average rank, and then
the correlation coefficient is found. Here, corresponding to every repeated rank
(which repeats m times), a correction factor (CF) of (m^3 − m)/12 is added to Σ d^2. If one
rank repeats m_1 times, another rank repeats m_2 times, a third rank repeats m_3 times
and so on, the CF is (m_1^3 − m_1)/12 + (m_2^3 − m_2)/12 + ⋯. The correlation
coefficient is
ρ = 1 − 6 [ Σ d^2 + CF ] / (n^3 − n). (3.6)
regression. When only two variables are involved, the functional relationship is known
as simple regression. If the relationship between the two variables is a straight line, it is
known as simple linear regression; otherwise it is called simple non-linear
regression (e.g. price = a · speed^b).
Figures 2, 3 and 4 respectively show scatter diagrams of linear dependence, nonlinear
dependence, no specific dependence.
Figure 2: Linear dependence.
Figure 3: Nonlinear dependence.
Figure 4: No specific dependence.
When there are more than two variables and one of them is assumed to be dependent
upon the others, the functional relationship between the variables is known as
multiple regression.
The straight line
y = a + bx, (3.7)
with x the independent variable, is called the regression line of y on x.
Consider a set of n given values (𝑥, 𝑦), we have to find a specific relation 𝑦 = 𝑎 + 𝑏𝑥
(determine values of 𝑎 and 𝑏)for the data to satisfy as accurately as possible and such
an equation is called the best fitting equation or the curve of best fit. The residual 𝑅 =
𝑦 − (𝑎 + 𝑏𝑥) is the difference between observed and estimated values of 𝑦. The
parameters 𝑎 and 𝑏 are found such that the sum of squares of the residuals is
minimum (least) which is called the method of least squares. Let
S = Σ_(i=1)^n R^2 = Σ_(i=1)^n ( y − (a + bx) )^2. (3.8)
Treating S as a function of the two parameters a and b, the necessary conditions for S to
be minimum are ∂S/∂a = 0 and ∂S/∂b = 0, which lead to
𝑛𝑎 + 𝑏 ∑ 𝑥 = ∑ 𝑦 (3.9)
𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 = ∑ 𝑥𝑦 (3.10)
Which are called normal equations for fitting a straight line 𝑦 = 𝑎 + 𝑏𝑥, solving
which we obtain the values of 𝑎 and 𝑏. Dividing (3.9) by 𝑛, we have 𝑦̅ = 𝑎 + 𝑏𝑥̅ ,
implying that regression line passes through (𝑥̅ , 𝑦̅ ). We also know that the equation
of straight line passing through (𝑥1 , 𝑦1 ) having slope m is
𝑦 − 𝑦1 = 𝑚(𝑥 − 𝑥1 ) (3.11)
Hence if (𝑥1 , 𝑦1 ) = (𝑥̅ , 𝑦̅ ), we get
𝑦 − 𝑦̅ = 𝑚(𝑥 − 𝑥̅ ) (3.12)
𝑦̃ = 𝑚𝑥̃ , where 𝑦̃ = 𝑦 − 𝑦̅and 𝑥̃ = 𝑥 − 𝑥̅ .
The normal equation for fitting ỹ = mx̃ (using (3.10)) to find m will be

∑x̃ỹ = m∑x̃², or m = ∑x̃ỹ/∑x̃²     (3.13)
Since r = ∑x̃ỹ/(n σx σy), we have

∑x̃ỹ = n r σx σy     (3.14)
nσx² = ∑x̃²     (3.15)

Using (3.14) and (3.15) in (3.13), we have

m = n r σx σy/(nσx²) = r σy/σx     (3.16)

Substituting this slope in (3.12), the regression line of y on x is

y − ȳ = r (σy/σx)(x − x̄)     (3.17)
Similarly, assuming the equation in the form x = a + by and proceeding on the same lines as above, we obtain

x − x̄ = r (σx/σy)(y − ȳ)     (3.18)

as the regression line of x on y. The coefficients of (x − x̄) in (3.17) and of (y − ȳ) in (3.18) are r σy/σx and r σx/σy respectively, and are known as the regression coefficients. The product of the two regression coefficients is r²; hence r is the square root of this product, taken with the common sign of the regression coefficients.
Example 4
8x-10y+66 = 0 and 40x – 18y = 214 are the two regression lines.
i. Find the means of x and y.
ii. Find the correlation coefficient of x and y.
iii. Find σy if σx = 3.
Solution
i. We know that the regression lines pass through (x̄, ȳ).
Hence, 8𝑥̅ − 10𝑦̅ = −66
40𝑥̅ − 18𝑦̅ = 214
Solving these two equations simultaneously, we have 𝑥̅ = 13, 𝑦̅ = 17.
ii. Rewriting the given equations, we have
y = 0.8x + 6.6     (i)
x = 0.45y + 5.35     (ii)
From (i), r σy/σx = 0.8, and from (ii), r σx/σy = 0.45. Multiplying, r² = 0.8 × 0.45 = 0.36, so r = 0.6 (positive, since both regression coefficients are positive).
iii. From r σy/σx = 0.8 with r = 0.6 and σx = 3, we get σy = 0.8 × 3/0.6 = 4.
Example 5
For the following data, find the two regression lines and calculate x for the given value y = 16.
X: 36 23 27 28 28 29 30 31 33 35
Y: 29 18 20 22 27 21 29 27 29 28
Solution
From the data, we have the following:
n = 10, 𝑥̅ = 30, 𝑦̅ = 25, 𝑧̅ = 5, z = x – y.
∑ 𝑥 2 = 9138, ∑ 𝑦 2 = 6414, ∑ 𝑧 2 = 306
𝜎𝑥2 = 13.8, 𝜎𝑥 = 3.715 , 𝜎𝑦2 = 16.4, 𝜎𝑦 = 4.05
𝜎𝑧2 = 5.6
r = (σx² + σy² − σ²x−y)/(2 σx σy) = (13.8 + 16.4 − 5.6)/(2 × 3.715 × 4.05) = 0.82.
Using (3.17) and (3.18) and substituting the above values, the regression line of y on x and the regression line of x on y are respectively given by
y = 0.894x − 1.82
x = 0.752y + 11.2
To find the value of x for y = 16, x = 0.752y + 11.2 is used, and x is obtained as 23.23.
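The computations of this example can be reproduced from the relations behind (3.17) and (3.18); a minimal Python sketch using only the standard library (the small differences from the text's coefficients come from the text rounding r to 0.82 before use):

```python
import math

x = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
y = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
n = len(x)

xb, yb = sum(x) / n, sum(y) / n                            # 30, 25
sx2 = sum((v - xb) ** 2 for v in x) / n                    # 13.8
sy2 = sum((v - yb) ** 2 for v in y) / n                    # 16.4
sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / n   # covariance
r = sxy / math.sqrt(sx2 * sy2)                             # ~0.82

b_yx = r * math.sqrt(sy2 / sx2)   # regression coefficient of y on x
b_xy = r * math.sqrt(sx2 / sy2)   # regression coefficient of x on y

print(round(b_yx, 3), round(yb - b_yx * xb, 2))   # ~0.891, -1.74
print(round(b_xy, 3), round(xb - b_xy * yb, 2))   # 0.75, 11.25
print(round(xb + b_xy * (16 - yb), 2))            # x for y = 16, ~23.25
```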
na + b∑x + c∑x² = ∑y
a∑x + b∑x² + c∑x³ = ∑xy     (3.21)
a∑x² + b∑x³ + c∑x⁴ = ∑x²y

These are the normal equations for fitting the second degree parabola y = a + bx + cx² in the least squares sense. By solving these equations, we obtain the values of a, b, c.
Example 6
Fit a parabola of second degree
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2
for the data given below.
X: 0 1 2 3 4
Y: 1 1.8 1.3 2.5 2.3
Solution
The normal equations for finding the values of 𝑎 , 𝑏, 𝑐 of
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2
are given in (3.21). The calculations are:
x    y     xy    x²y    x²   x³   x⁴
0    1.0   0     0      0    0    0
1    1.8   1.8   1.8    1    1    1
2    1.3   2.6   5.2    4    8    16
3    2.5   7.5   22.5   9    27   81
4    2.3   9.2   36.8   16   64   256

∑x = 10, ∑y = 8.9, ∑xy = 21.1, ∑x²y = 66.3, ∑x² = 30, ∑x³ = 100, ∑x⁴ = 354.
Substituting in (3.21): 5a + 10b + 30c = 8.9, 10a + 30b + 100c = 21.1, 30a + 100b + 354c = 66.3. Solving, a ≈ 1.077, b ≈ 0.416, c ≈ −0.021, so the fitted parabola is y ≈ 1.077 + 0.416x − 0.021x².
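The normal equations (3.21) can also be solved numerically; a minimal sketch assuming NumPy is available (np.polyfit(x, y, 2) would return the same coefficients):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 1.3, 2.5, 2.3])
n = len(x)

# Coefficient matrix and right-hand side of the normal equations (3.21).
A = np.array([[n,            x.sum(),      (x**2).sum()],
              [x.sum(),      (x**2).sum(), (x**3).sum()],
              [(x**2).sum(), (x**3).sum(), (x**4).sum()]])
rhs = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a, b, c = np.linalg.solve(A, rhs)
print(round(a, 3), round(b, 3), round(c, 3))   # 1.077 0.416 -0.021
```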
Example 7
Fit a second degree parabola for the data given below.
x: 2 4 6 8 10
y: 3.07 12.85 31.47 57.38 91.29
Solution
n = 5, ȳ = 39, and let x̃ = (x − 6)/2, ỹ = y − 39.
In some situations the regression of X on Y should not be fitted directly as x = c + dy, as the assumptions of regression are violated. Instead, we re-arrange the linear regression of Y on X to obtain

X = (y − a)/b
15.6 Summary
1. Following are the agricultural production index (x) of an agricultural product and its wholesale price index (y) for eight years. Find the correlation coefficient between x and y, and interpret the results.
2. Find the correlation coefficient for the following data.
3. Marks scored by eight students in C-programming and mathematics are given below.
4. 8x − 10y + 66 = 0 and 40x − 18y = 214 are the two regression lines.
5. For the following data, find the two regression lines and calculate x for the given value y = 16.
15.7 Keywords
Correlation analysis
Correlation coefficient
Regressand
Parabola
15.8 Question for self-study
Find the correlation coefficient between age (X) and HDL level (Y) for the following data.
Age (X): 35 38 41 44 47 50 53 56 59
HDL level (Y): 50 56 44 49 46 49 45 51 48
15.9 References
3. Montgomery, D. C. and Runger, G. C. (2014). Applied Statistics and Probability for Engineers (Sixth Edition). John Wiley and Sons, Singapore.
UNIT-16 Testing of Hypothesis
STRUCTURE
16.0 Objectives
16.1 Introduction
16.2 Large sample and small sample tests
16.3 Testing for population variance
16.4 Tests based on Chi-square distribution
16.5 Introduction to Monte Carlo Methods
16.6 Summary
16.7 Keywords
16.8 Question for self-study
16.9 References
16.0 Objectives
16.1 Introduction
For example, a researcher may wish to know whether there is a difference in the uric acid levels of normal individuals and individuals with Down's syndrome. Another example could be testing whether two vaccines are equal in effectiveness. Similarly, we may be interested in testing that the job arrival rate λ for a certain computer system satisfies λ = λ0. The term null hypothesis is used for any hypothesis set up primarily to see whether it can be rejected. Even in non-statistical thinking this is what is done: in a court of law, an accused is assumed to be innocent unless he is proven guilty beyond a reasonable doubt.
The experimental evidence upon which the test is based will consist of a random sample X1, X2, …, Xn of size n. The hypothesis testing procedure consists of dividing the n-dimensional space of observations into two regions, R(H0) and R(H1). If the observed vector (x1, x2, …, xn) lies in R(H1), the null hypothesis is rejected. The region R(H0) is known as the acceptance region and R(H1) as the critical region or rejection region.
Type I and Type II Errors:
To make a decision on the hypothesis with certainty we would have to examine the
entire population. In many problems the populations are very large. Hence the
decision is made on the basis of sample data. As the decision is based on sample data,
we may make a correct decision on null hypothesis or commit one of the following
two errors. We may reject the null hypothesis when in fact it is true; this is a Type I error. The corresponding probability, denoted by α, is known as the level of significance of the test. Similarly, we may accept the null hypothesis when in fact it is not true; this is a Type II error, whose probability is denoted by β. When we say that P(Type I error) = α and P(Type II error) = β, we mean that if the test is performed a large number of times, in a proportion α of the cases we reject H0 when it is true, and in a proportion β of the cases we fail to reject H0 when in fact it is false. Whenever a null hypothesis is rejected there is always a risk of committing a Type I error (rejecting a true null hypothesis); whenever a null hypothesis is accepted, there is always a risk of committing a Type II error (accepting a false null hypothesis). An error of either type leads to a wrong decision, so we must attempt to minimize these errors. The two errors are related: if one is made smaller, the other increases, so both cannot be controlled simultaneously. It is customary to fix an upper bound for the probability of a Type I error (as it is considered more serious) and then to minimize the probability of a Type II error as far as possible. The probability of correctly rejecting the null hypothesis when it is false is known as the power of the test, which equals (1 − β). An example in which a Type I error is more serious than a Type II error: judging a good quality article to be a bad one is a Type I error, which is more serious than judging a bad article to be a good one. To make a decision on a hypothesis about a population parameter with certainty, the population parameter is estimated through a statistic. If the estimated statistic and the assumed population parameter differ significantly, the discrepancy between the statistic and the parameter is too large to be reasonably attributed to chance; this is a test of significance. The difference between the parameter and the statistic is known as the sampling error.
Based on the sampling error, the sampling distributions are derived. The observed results are then compared with the results expected on the basis of the sampling distribution. If the difference between the observed and expected results is more than a specified multiple of the standard error of the statistic, it is said to be significant at a specified probability level. If the difference is significant, the null hypothesis is rejected; otherwise it is accepted. The process of deciding whether to accept or reject the null hypothesis is known as testing of hypothesis.
The steps involved run from formulating the hypotheses, through data collection and computation of the test statistic, to the conclusion. But most researchers use statistical packages for their analysis; in that case the steps involved run from identifying the data to be analysed to interpretation of the output.
P-values:
The P-value for a test may be defined as the smallest value of α for which the null hypothesis is rejected. When you perform a hypothesis test, a P-value helps you determine the significance of your results. A small P-value indicates strong evidence against the null hypothesis, so you reject the null hypothesis and conclude that a significant difference does exist. Reporting P-values as part of the results of an investigation is more informative to the reader than statements such as "the null hypothesis is rejected at the 0.05 level of significance" or "the results are not significant at the 0.05 level". Since the P-value is the smallest level of significance at which the null hypothesis is rejected, a smaller P-value implies stronger evidence in favour of the alternative hypothesis.

In regression analysis the P-value for each term tests the null hypothesis that the corresponding coefficient is equal to zero (no effect). A low P-value indicates that you can reject that null hypothesis.
Test Statistic:
The test statistic is some statistic (a function of the observations) that may be computed from the sample data. The magnitude of the test statistic decides whether to reject or not to reject the null hypothesis. The relevant statistic is the statistic relevant to the null hypothesis (such as a hypothesis for a mean, variance, proportion, ...). The standard error is the standard deviation of the sampling distribution of the statistic considered.
Sampling Distribution:
All possible samples of size n = 2 are drawn from a population of size N = 5 consisting of the units 6, 8, 10, 12, 14. The sampling distribution of x̄ computed for these 25 samples is given below.

x̄      Frequency   Probability
6       1           1/25
7       2           2/25
8       3           3/25
9       4           4/25
10      5           5/25
11      4           4/25
12      3           3/25
13      2           2/25
14      1           1/25
Total   25          25/25 = 1

Standard error = 2.
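The sampling distribution above can be verified by enumerating all 25 samples in Python (a minimal sketch using only the standard library):

```python
from itertools import product
from collections import Counter
from statistics import pstdev

population = [6, 8, 10, 12, 14]

# All 25 ordered samples of size 2, drawn with replacement.
means = [(a + b) / 2 for a, b in product(population, repeat=2)]
print(sorted(Counter(means).items()))
# frequencies 1, 2, 3, 4, 5, 4, 3, 2, 1 for the means 6, 7, ..., 14

# Standard error of the mean: sigma / sqrt(n).
print(pstdev(means))                   # 2.0
print(pstdev(population) / 2 ** 0.5)   # also 2.0
```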
Degrees of Freedom:
Testing Single Population Mean:
The testing of hypothesis about a population mean is considered under three different conditions: 1) sampling from a normally distributed population with known variance (large sample); 2) sampling from a normally distributed population with unknown variance (small sample); 3) sampling from a population that is not normally distributed.
The assumption will be that the sample comes from a normally distributed population with a known variance σ². The hypothesis H0 to be tested is that the population mean µ = µ0; the alternative hypothesis H1 is µ ≠ µ0. The test statistic is

Z = (x̄ − µ0)/(σ/√n)     (4.1)

The test statistic Z is normally distributed with mean 0 and variance 1 if H0 is true. Reject H0 if the computed value of Z falls in the rejection region; otherwise accept H0.
To specify the rejection region, we should know for what values of the test statistic H0 will be rejected. If the null hypothesis is false, it may be so either because the true mean is less than µ0 or because it is greater than µ0. Therefore sufficiently large or sufficiently small values of Z will cause rejection of the null hypothesis; in other words, extreme values of Z result in the rejection of H0. How extreme must a possible value of the test statistic be to qualify for the rejection region? The answer depends on the significance level we choose, i.e. the size of the probability of committing a Type I error. Suppose the probability of rejecting a true null hypothesis is α = 0.05. Since our rejection region consists of two parts (sufficiently small values and sufficiently large values of the test statistic), part of α will have to be associated with the large values and part with the small values. It is reasonable to divide α equally, i.e. α/2 = 0.025 is associated with small values and α/2 = 0.025 with large values. Now we should know the value of Z to the right of which lies 0.025 of the area under the unit normal distribution.
From the standard normal tables for 0.025, we locate z(α/2) = 1.96, i.e. P(Z ≥ z(α/2)) = 0.025 and P(Z ≤ −z(α/2)) = 0.025. Thus the values Z ≤ −1.96 or Z ≥ 1.96 form the critical region.
For x̄ = 22, µ0 = 25, σ² = 45, n = 10: Z = (22 − 25)/√(45/10) = −1.41.
Since the calculated value of Z is greater than −1.96 and less than 1.96, the null hypothesis is accepted, implying that the computed value of the test statistic is not significant at the 0.05 level.
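A minimal Python sketch of this two-sided Z test, using the standard library's NormalDist for the critical value (the helper function name is our own):

```python
import math
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided Z test for a single mean, using (4.1)."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    return z, abs(z) > z_crit                      # True means reject H0

z, reject = z_test_mean(x_bar=22, mu0=25, sigma=math.sqrt(45), n=10)
print(round(z, 2), reject)   # -1.41 False, so H0 is accepted
```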
One sided hypothesis test: Suppose the hypothesis to be tested is H0 : µ ≥ µ0 against H1 : µ < µ0. To specify the rejection region, we have to consider those values of Z that would cause rejection of the null hypothesis. Looking at the hypothesis, sufficiently small values would cause rejection of the null hypothesis. As it is a one sided test, the whole of α = 0.05 goes into one tail of the distribution. From the normal tables, the value of Z to the left of which lies 0.05 of the area under the standard normal curve is −1.645.

For comparing the means of two populations, the pairs of hypotheses take one of the forms
H0 : µ1 − µ2 = 0, H1 : µ1 − µ2 ≠ 0
H0 : µ1 − µ2 ≥ 0, H1 : µ1 − µ2 < 0
H0 : µ1 − µ2 ≤ 0, H1 : µ1 − µ2 > 0
The test statistic to be used is (population variances known)

Z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2)     (4.3)

If the sample on which we base our hypothesis test about a population mean comes from a population that is not normally distributed, then by the central limit theorem we can use

Z = (x̄ − µ0)/(σ/√n) if the population variance is known, and
Z = (x̄ − µ0)/(S/√n) otherwise.
If the sample size n is small, the distribution of Z may be far from normality and consequently the above tests cannot be used. To deal with small samples, exact sample tests have been developed. From a practical point of view, if n ≤ 30 the sample is termed small. The test statistic is

t = (x̄ − µ)/(S/√n) = (x̄ − µ)/√(S²/n)     (4.5)
where S² = (1/(n − 1))∑(x − x̄)² is an unbiased estimate of σ². The statistic t defined in (4.5) has Student's t distribution with ν = (n − 1) d.f. and probability density function (pdf)

f(t) = [1/(√ν B(1/2, ν/2))] (1 + t²/ν)^(−(ν+1)/2),  −∞ < t < ∞.
The value tν(α) defined by

P(t > tν(α)) = α     (4.6)

is called the upper critical value of t for ν d.f. corresponding to significance level α. The two tailed critical values of t for ν d.f. corresponding to significance level α, with equal tails each of area α/2, are given by tν(α/2) (positive critical value) and −tν(α/2) (negative critical value). These critical values of t have been tabulated for different values of α and ν and are given at the end of this section.

The t test can be used to test a single population mean when the variance of the population is not known; the test statistic for H0 : µ = µ0 is

t = (x̄ − µ0)/(S/√n)     (4.7)
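Assuming SciPy is available, the small-sample t test of (4.7) can be carried out directly; the sample below is hypothetical, chosen only to illustrate the call:

```python
from scipy import stats

# Hypothetical small sample; H0: mu = 50 against H1: mu != 50.
sample = [48.2, 51.0, 49.5, 47.8, 50.4, 49.1, 48.7, 50.9]

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(t_stat, p_value)   # reject H0 at level alpha if p_value < alpha
```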
Paired Comparisons
The difference in two population means was studied when the two samples are independent. A method frequently employed for assessing the effectiveness of a treatment or experimental procedure is one that makes use of related observations resulting from non-independent samples. A hypothesis test based on this type of data is known as a paired comparison test. With d denoting the difference between the paired observations, the test statistic is

t = (d̄ − µd)/(Sd/√n)     (4.8)

with (n − 1) d.f.; when the population standard deviation σd of the differences is known,

Z = (d̄ − µd)/(σd/√n)     (4.9)

For two independent small samples of sizes n and m with a common unknown variance, the test statistic is

t = (x̄ − ȳ)/(S√(1/n + 1/m))     (4.10)

where S² = [∑(x − x̄)² + ∑(y − ȳ)²]/(m + n − 2)     (4.11)

with (m + n − 2) d.f.
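Formulas (4.10) and (4.11) translate directly into Python; the data used in this minimal sketch are those of Example 9 later in this unit:

```python
import math

x = [42, 39, 48, 60, 41]           # weights under medicine A
y = [38, 42, 56, 64, 68, 69, 62]   # weights under medicine B
n, m = len(x), len(y)

xb, yb = sum(x) / n, sum(y) / m
# Pooled variance, formula (4.11).
s2 = (sum((v - xb) ** 2 for v in x) +
      sum((v - yb) ** 2 for v in y)) / (m + n - 2)
# Test statistic, formula (4.10).
t = (xb - yb) / math.sqrt(s2 * (1 / n + 1 / m))
print(round(s2, 1), round(t, 2))   # 121.6 -1.7
```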
16.3 Testing Single Population Variance
When the data available for analysis consist of a simple random sample drawn from a normally distributed population, the test statistic for testing
H0 : σ² = σ0², H1 : σ² ≠ σ0²
is

Y = (n − 1)S²/σ0²     (4.12)

which, under H0, has a chi-square distribution with (n − 1) d.f.
Suppose the data constitute two independent random samples, each drawn from a normally distributed population. The hypothesis to be tested is H0 : σ1² = σ2² against H1 : σ1² ≠ σ2², and the test statistic is

V = S1²/S2²     (4.13)

where S1² and S2² are the sample variances of the two samples. When the null hypothesis is true, V is distributed as F with (n1 − 1) and (n2 − 1) degrees of freedom.
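Assuming SciPy is available for the F critical values, a sketch of this variance-ratio test (the sample figures passed in at the end are hypothetical):

```python
from scipy import stats

def f_test_variances(s1_sq, s2_sq, n1, n2, alpha=0.05):
    """Two-sided F test of H0: sigma1^2 = sigma2^2, using (4.13)."""
    v = s1_sq / s2_sq
    lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)
    hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
    return v, not (lo < v < hi)   # True means reject H0

print(f_test_variances(s1_sq=5.2, s2_sq=2.1, n1=16, n2=21))
```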
Example 1
Milk sachets of 500ml. each were filled by a machine for which standard deviation of
filling is 5 ml. 72 sachets are tested for their contents and the mean content is found to
be 501.1 ml. Test whether the machine is set properly.
Solution
H0 : µ = 500 (machine is set properly), H1 : µ ≠ 500.
Using (4.1), Z_cal = (501.1 − 500)/(5/√72) = 1.87.
From the standard normal tables, at the 5% level of significance the critical values are −1.96 and 1.96 (two sided test). Z_cal ∈ (−1.96, 1.96). Hence, accept H0: the machine is set properly.
Example 2
A firm manufactures resistors. The standard deviation of their resistance is known to be 0.02 ohms. It is required to test whether their mean resistance is 1.4 ohms. A random sample of 64 resistors has a mean of 1.39 ohms. Based on this sample, can we conclude that the mean resistance of the whole lot is 1.4 ohms?
Solution
H0 : µ = 1.4, H1 : µ ≠ 1.4.
Using (4.1), Z_cal = (1.39 − 1.4)/(0.02/√64) = −4. Since |Z_cal| > 1.96, H0 is rejected: the mean resistance of the whole lot is not 1.4 ohms.
Example 3
For a population, it is believed that the average height is greater than 180cms. with
standard deviation of 3.3cms. Randomly, 50 individuals were selected and their
heights are measured. The average height is found to be 181.1cms. Test the belief
regarding height of population at 1% significance level.
Solution
H0 : µ = 180, H1 : µ > 180.
µ0 = 180, σ = 3.3, x̄ = 181.1, n = 50, α = 0.01.
Using (4.1), Z_cal = (181.1 − 180)/(3.3/√50) = 2.36 > 2.33, the one sided critical value at the 1% level. Hence H0 is rejected; it is concluded that the data support the belief of the height being greater than 180 cms.
Example 4
A shop sells on an average 200 pen drives per day with a standard deviation of 50 pen
drives. After an extensive advertising campaign, the management computed the
average sales for next 25 days and is found to be 216. Find whether an improvement
has occurred or not, assuming normal distribution.
Solution
H0 : µ = 200, H1 : µ > 200.
Using (4.1), Z_cal = (216 − 200)/(50/√25) = 1.6, and the one sided critical value is 1.645. Since Z_cal < 1.645, H0 is accepted.
The conclusion is that the sample does not support an improvement in the sales of pen drives due to the advertisement.
Example 5
An insurance agent claims that the average age of policy holders who insure through
him is less than 30.5 years. A random sample of 100 policy holders who had insured
through him had the average age of 28.8 years and standard deviation of 6.35 years.
Test whether the claim of the agent is true at 5% significance level.
Solution
H0 : µ = 30.5, H1 : µ < 30.5.
Since the sample size is large (> 30), we may use the value of the sample standard deviation as the value of σ.
Z_cal = (28.8 − 30.5)/(6.35/√100) = −2.68. |Z_cal| = 2.68 > 1.645. Hence H0 is rejected; the average age of policy holders who insure through him is less than 30.5 years.
Example 6
An investigation of the relative merits of two kinds of flash light batteries showed that
a random sample of 100 batteries of brand A lasted on the average 36.5 hours with a
standard deviation of 1.8 hours, while a random sample of 80 batteries of brand B
lasted on the average 36.8 hours with a standard deviation of 1.5 hours. Test whether
the observed difference between the average life times is significant.
Solution
H0 : µA = µB, H1 : µA ≠ µB.
Under the null hypothesis, the test statistic defined in (4.3) has the N(0, 1) distribution; for the given values, using (4.3),
Z_cal = (36.5 − 36.8)/√(1.8²/100 + 1.5²/80) = −1.22.
Since |Z_cal| < 1.96, H0 is accepted and the data do not provide any evidence against the null hypothesis.
Example 7
The mean yield of wheat from district A was 210kgs with standard deviation of 10kgs
per hectare from sample of 100 plots. In another district B the yield was 220kgs with
standard deviation of 12kgs per hectare from a sample of 150 plots. Assuming that the
standard deviation of the entire state was 11kgs, test whether the mean yields of
districts A and B differ significantly.
Solution
Let α = 0.01, σA = σB = σ = 11.
H0 : µA = µB, H1 : µA ≠ µB.
Under the null hypothesis, the test statistic defined in (4.3) has the N(0, 1) distribution; for the given values, using (4.3),
Z_cal = (210 − 220)/(11√(1/100 + 1/150)) = −7.04.
Since |Z_cal| > 2.58, H0 is rejected and the data provide evidence against the null hypothesis: the mean yields of crops in the two districts differ significantly.
Example 8
Solution
H0 : µ = 0.025, H1 : µ ≠ 0.025.
Under H0, the test statistic given in (4.5) has a t-distribution with 9 d.f. The calculated value for the given data is t_cal = −1.5.
H0 is not rejected since |t_cal| < 2.62; it is concluded that the sample mean does not differ significantly from the population mean.
Example 9
A group of 5 patients treated with medicine A weigh 42, 39, 48, 60 and 41 kgs.
Another group of 7 patients of same hospital treated with medicine B weigh 38, 42,
56, 64, 68, 69 and 62 kgs. Test whether medicine B increases weight significantly?
The table value of t at 5% significance level for 10 d.f. is 2.2281.
Solution
H0 : µA = µB, H1 : µA < µB.
From the data we get x̄A = 46, ȳB = 57, n = 5, m = 7, and using (4.11), S² = 121.6. Using (4.10), t_cal = (46 − 57)/√(121.6(1/5 + 1/7)) = −1.70. Since |t_cal| = 1.70 < 2.2281, H0 is accepted; the data do not show that medicine B increases weight significantly.
Example 10
An IQ test was administered to 5 persons before and after they were trained. The results are given below.
Candidates: 1 2 3 4 5
Solution
H0 : µx = µy, H1 : µx ≠ µy.
The differences between the scores before and after training are
d: −10 2 −2 −4 4
so that d̄ = −2 and Sd² = ∑(d − d̄)²/(n − 1) = 30. Using (4.8),
|t_cal| = |d̄|/√(30/5) = 0.816.
The tabulated value of t for 4 d.f. at the 1% level of significance for a two tailed test is 4.60. Since |t_cal| < 4.60, H0 is accepted and it is concluded that the data do not support the hypothesis of a change in IQ due to training.
16.4 Tests based on Chi-square distribution
Chi-square Distribution
The square of a standard normal variable is called a chi-square variate with 1 d.f. Thus if X is a r.v. having a normal distribution with mean µ and variance σ², then ((X − µ)/σ)² is a chi-square variate with 1 d.f.
Suppose we want to test the hypothesis that a given normal population has a specified variance σ² = σ0². If x1, x2, …, xn is a random sample of size n from the normal population, then under the null hypothesis H0 : σ² = σ0², the statistic

χ² = ∑(xi − x̄)²/σ0² = nS²/σ0²     (4.14)

(where S² = (1/n)∑(x − x̄)²) has a chi-square distribution with (n − 1) d.f.
By comparing the value of 𝜒 2 obtained from (4.14) with the tabulated value of 𝜒 2
with (n-1) d.f. at desired level of significance, the null hypothesis is accepted or
rejected.
Note that if x1, x2, …, xn is not from a normal population, then for large n (n > 30) the statistic in (4.14) may still be used, since its distribution is approximately chi-square.
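A sketch of the test based on (4.14), assuming SciPy is available for the critical value; the figures used are those of Example 11 below:

```python
from scipy import stats

def chi2_variance_test(n, s_sq, sigma0_sq, alpha=0.05):
    """Test H0: sigma^2 = sigma0^2 via (4.14), where S^2 = (1/n) sum (x - xbar)^2."""
    chi2 = n * s_sq / sigma0_sq
    crit = stats.chi2.ppf(1 - alpha, df=n - 1)   # upper critical value
    return chi2, chi2 > crit                     # True means reject H0

# n = 20, S = 3.72, sigma0^2 = 4.35 (Example 11 below).
print(chi2_variance_test(20, 3.72 ** 2, 4.35))   # (63.62..., True)
```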
Example 11
A sample of 20 observations has standard deviation S = 3.72. Can the sample be regarded as drawn from a normal population with variance 4.35?
Solution
n = 20, S = 3.72, σ0² = 4.35.
H0 : σ² = 4.35, H1 : σ² ≠ 4.35.
χ²_cal = nS²/σ0² = 20 × 3.72²/4.35 = 63.62, which follows a χ² distribution with 19 d.f. under H0.
Since χ²_cal > 30.144 (the 5% critical value for 19 d.f.), H0 is rejected and the sample is not from a normal population with variance 4.35.
Example 12
For a sample of 10 observations, x̄ = 47 and nS² = ∑(x − x̄)² = 280. Can we say that the distribution of the sample observations is normal with variance 20?
Solution
H0 : σ² = 20, H1 : σ² ≠ 20.
x̄ = 47, nS² = 280, σ0² = 20.
Using (4.14), χ²_cal = 280/20 = 14.
The χ² table value for 9 d.f. at the 5% significance level is 16.92. Since χ²_cal is less than the tabulated value, we accept the null hypothesis that the population variance is 20.
Let us suppose that a given population consisting of N items is divided into r mutually exclusive classes A1, A2, …, Ar with respect to an attribute A, so that an item selected at random possesses one and only one of the attributes A1, A2, …, Ar. Similarly, let us suppose that the same population is divided into s mutually disjoint and exhaustive classes B1, B2, …, Bs with respect to another attribute B, so that an item selected at random possesses one and only one of the attributes B1, B2, …, Bs. The frequency distribution of the items belonging to the classes A1, …, Ar and B1, …, Bs is given by the r × s contingency table in Table 4.1.
Table 4.1
A \ B   B1         B2         …   Bj         …   Bs         Total
A1      (A1, B1)   (A1, B2)   …   (A1, Bj)   …   (A1, Bs)   (A1)
A2      (A2, B1)   (A2, B2)   …   (A2, Bj)   …   (A2, Bs)   (A2)
…       …          …          …   …          …   …          …
Ai      (Ai, B1)   (Ai, B2)   …   (Ai, Bj)   …   (Ai, Bs)   (Ai)
…       …          …          …   …          …   …          …
Ar      (Ar, B1)   (Ar, B2)   …   (Ar, Bj)   …   (Ar, Bs)   (Ar)
Total   (B1)       (B2)       …   (Bj)       …   (Bs)       N
Under the null hypothesis that the two attributes A and B are independent, the expected frequency for (Ai, Bj) is given by

E[(Ai, Bj)] = N · P(Ai)P(Bj) = N · ((Ai)/N) · ((Bj)/N) = (Ai)(Bj)/N     (4.15)
= (Ai, Bj)0 (say),  i = 1, 2, …, r; j = 1, 2, …, s.

The statistic

χ² = ∑i ∑j [(Ai, Bj) − (Ai, Bj)0]²/(Ai, Bj)0     (4.16)

has, under the null hypothesis, an approximate χ² distribution with (r − 1)(s − 1) d.f. Comparing this calculated value of χ² with the tabulated value of χ² for (r − 1)(s − 1) d.f. at the desired level of significance, the null hypothesis is rejected or accepted.
2 × 2 Contingency Table

Table 4.2
A \ B   B1        B2        Total
A1      a         b         (a + b)
A2      c         d         (c + d)
Total   (a + c)   (b + d)   N = a + b + c + d
Under the null hypothesis of independence of the attributes, the value of χ² for the 2 × 2 contingency table given in Table 4.2 is

χ² = N(ad − bc)²/[(a + c)(b + d)(a + b)(c + d)]     (4.17)

If any cell frequency is less than 5, then for a 2 × 2 contingency table the Yates correction is used, and with this

χ² = N(|ad − bc| − N/2)²/[(a + c)(b + d)(a + b)(c + d)]     (4.18)
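Formulas (4.17) and (4.18) translate directly into Python (a minimal sketch; the function name is our own). The figures used are those of Example 13 below:

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table, (4.17); Yates-corrected, (4.18)."""
    n = a + b + c + d
    num = abs(a * d - b * c)
    if yates:
        num = max(num - n / 2, 0.0)
    return n * num ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# UG: 290 favour, 110 do not; PG: 310 favour, 90 do not (Example 13 below).
print(round(chi2_2x2(290, 110, 310, 90), 2))   # 2.67, compared with 3.84 at 5%, 1 d.f.
```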
Example 13
A sample of 400 under-graduate (UG) and 400 post-graduate (PG) students was taken to know their opinion about autonomous colleges. 290 of the under-graduate and 310 of the post-graduate students favoured autonomous status. Test whether opinion about autonomous status and education level of students are independent at the 5% significance level.
Solution
Class   Favour   Do not favour   Total
UG      290      110             400
PG      310      90              400
Total   600      200             800

H0: opinion about autonomous status and education level are independent.
Using (4.17), χ²_cal = 800(290 × 90 − 110 × 310)²/(600 × 200 × 400 × 400) = 2.66.
d.f. = (2 − 1)(2 − 1) = 1.
Since χ²_cal < 3.84, accept H0: opinion about autonomous status and education level are independent.
Example 14
A movie producer takes a random survey of 1000 persons attending the preview of the movie, classified into four age groups, and obtains the following figures. Test whether liking of the movie and age group are independent.

Liking \ Age group   Group 1   Group 2   Group 3   Group 4   Total
Liked                320       80        110       200       710
Disliked             50        15        70        60        195
Indifferent          30        5         20        40        95
Total                400       100       200       300       1000

Solution
The expected frequencies are computed from (4.15); for instance, the expected frequency corresponding to the observed 40 is E(40) = (300)(95)/1000 = 28.5.
Using (4.16), χ²_cal = 57.987, d.f. = (3 − 1)(4 − 1) = 6.
Since χ²_cal > 12.592, the hypothesis of independence is rejected: liking of the movie and age group are not independent.
16.5 Introduction to Monte Carlo Methods
Monte Carlo methods use repeated random sampling to obtain numerical results. Areas of application include (but are not limited to) the following.

Computer graphics
Path tracing, occasionally referred to as Monte Carlo ray tracing, renders a 3D scene by randomly tracing samples of possible light paths.

In applied statistics, Monte Carlo methods may be used for the following purposes:
To compare competing statistics for small samples under realistic data conditions.
To provide a random sample from the posterior distribution in Bayesian inference; this sample then approximates and summarizes all the essential features of the posterior.
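As a small illustration in the spirit of this section, the two-sided p-value of the Z test computed earlier in this unit (x̄ = 22, µ0 = 25, σ² = 45, n = 10) can be estimated by Monte Carlo simulation, using only the standard library:

```python
import random

random.seed(1)
mu0, sigma, n = 25.0, 45 ** 0.5, 10
z_obs = (22 - mu0) / (sigma / n ** 0.5)   # -1.41, from the earlier example

# Simulate many samples under H0 and count how often |Z| >= |Z_observed|.
trials, extreme = 100_000, 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    xb = sum(sample) / n
    z = (xb - mu0) / (sigma / n ** 0.5)
    if abs(z) >= abs(z_obs):
        extreme += 1

print(extreme / trials)   # close to 0.157, the exact two-sided p-value
```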
16.6 Summary
The following are the important points contained in this unit.
1. Milk sachets of 500ml. each were filled by a machine for which standard
deviation of filling is 5 ml. 72 sachets are tested for their contents and the mean
content is found to be 501.1 ml. Test whether the machine is set properly.
2. A firm manufactures resistors. The standard deviation of their resistance is known to be 0.02 ohms. It is required to test whether their mean resistance is 1.4 ohms. A random sample of 64 resistors has a mean of 1.39 ohms. Based on this sample, can we conclude that the mean resistance of the whole lot is 1.4 ohms?
3. For a population, it is believed that the average height is greater than 180cms. with
standard deviation of 3.3cms. Randomly, 50 individuals were selected and their
heights are measured. The average height is found to be 181.1cms. Test the belief
regarding height of population at 1% significance level.
4. A shop sells on an average 200 pen drives per day with a standard deviation of 50
pen drives. After an extensive advertising campaign, the management computed
the average sales for next 25 days and is found to be 216. Find whether an
improvement has occurred or not, assuming normal distribution.
5. An insurance agent claims that the average age of policy holders who insure
through him is less than 30.5 years. A random sample of 100 policy holders who
had insured through him had the average age of 28.8 years and standard deviation
of 6.35 years. Test whether the claim of the agent is true at 5% significance level.
16.7 Keywords
Hypothesis
Variance
Degrees of freedom
Probabilistic interpretation
16.9 References
4. Rohatgi, V. K. and Saleh, A. K. Md. Ehsanes (2001). An Introduction to Probability and Statistics (Second Edition). John Wiley & Sons, Inc.
STATISTICAL TABLES
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 391.
This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, P{Z > z(α)} = α. The probabilities for negative values of z are obtained by symmetry.
Table ST3. Critical Values Under the Chi-Square Distribution
Degrees of                                  α
Freedom   0.99 0.98 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.02 0.01
1 0.000157 0.000628 0.00393 0.0158 0.0642 0.148 0.455 1.074 1.642 2.706 3.841 5.412 6.635
2 0.0201 0.0404 0.103 0.211 0.446 0.713 1.386 2.408 3.219 4.605 5.991 7.824 9.210
3 0.115 0.185 0.352 0.584 1.005 1.424 2.366 3.665 4.642 6.251 7.815 9.837 11.341
4 0.297 0.429 0.711 1.064 1.649 2.195 3.357 4.878 5.989 7.779 9.488 11.668 13.277
5 0.554 0.752 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 11.070 13.388 15.086
6 0.872 1.134 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 12.592 15.033 16.812
7 1.239 1.564 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 14.067 16.622 18.475
8 1.646 2.032 2.733 3.490 4.594 5.527 7.344 9.524 11.030 13.362 15.507 18.168 20.090
9 2.088 2.532 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 16.919 19.679 21.666
10 2.558 3.059 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 18.307 21.161 23.209
11 3.053 3.609 4.575 5.578 6.989 8.148 10.341 12.899 14.631 17.275 19.675 22.618 24.725
12 3.571 4.178 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 21.026 24.054 26.217
13 4.107 4.765 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 22.362 25.472 27.688
14 4.660 5.368 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 23.685 26.873 29.141
15 5.229 5.985 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 24.996 28.259 30.578
16 5.812 6.614 7.962 9.312 11.152 12.624 15.338 18.418 20.465 23.542 26.296 29.633 32.000
17 6.408 7.255 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.669 27.587 30.995 33.409
18 7.015 7.906 9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 28.869 32.346 34.805
19 7.633 8.567 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 30.144 33.687 36.191
20 8.260 9.237 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 31.410 35.020 37.566
21 8.897 9.915 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 32.671 36.343 38.932
22 9.542 10.600 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 33.924 37.659 40.289
23 10.196 11.293 13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 35.172 38.968 41.638
24 10.856 11.992 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 36.415 40.270 42.980
25 11.524 12.697 14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 37.652 41.566 44.314
26 12.198 13.409 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 38.885 42.856 45.642
27 12.879 14.125 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 40.113 44.140 46.963
28 13.565 14.847 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 41.337 45.419 48.278
29 14.256 15.574 17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 42.557 46.693 49.588
30 14.953 16.306 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 43.773 47.962 50.892
Source: Reproduced from Statistical Methods for Research Workers, 14th ed., 1972, with the permission of the estate of R. A. Fisher, and Hafner Press.
For degrees of freedom greater than 30, the expression √(2χ²) − √(2n − 1) may be used as a normal deviate with unit variance, where n is the number of degrees of freedom.
Table ST4. Critical Values Under the t-Distribution. The first column lists the number of degrees of freedom (n); the headings of the other columns give the probabilities (α) for t to exceed the entry value; use symmetry for negative t values.
Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. Reprinted by permission of John Wiley & Sons, Inc.
Table ST5. F-Distribution: 5% (Lightface Type) and 1% (Boldface Type) Points for the Distribution of F.
Source: Reprinted by permission from George W. Snedecor and William G. Cochran, Statistical Methods, 6th ed., © 1967 by Iowa State University Press, Ames, Iowa.