2.7. LU Decomposition
2.7.1 Definition: An m×n matrix A is said to have an LU-decomposition if there exist matrices L and U with the following properties: (i) L is an m×m lower triangular matrix with all diagonal entries equal to 1; (ii) U is an m×n matrix in some echelon form; (iii) A = LU.
2.7.2 Advantage of LU-decomposition: Suppose we want to solve an m×n system AX = b. If we can find an LU-decomposition of A, then to solve AX = b it is enough to solve the two triangular systems LY = b and UX = Y.
Thus the system LY = b can be solved by the method of forward substitution and the system UX = Y can be solved by the method of backward substitution. To illustrate, we give some examples.
2.7.3 Example: Consider the given system AX = b, where A = LU. Solving LY = b by forward substitution gives Y, and then solving UX = Y by backward substitution gives the solution X of the original system.
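To make this concrete, here is a minimal Python/NumPy sketch (not part of the original notes) of forward and backward substitution for the square, invertible case; the matrices L, U and the vector b below are illustrative choices, not the ones from the example.

import numpy as np

def forward_substitution(L, b):
    # Solve L y = b, where L is lower triangular with unit diagonal.
    m = len(b)
    y = np.zeros(m)
    for i in range(m):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def backward_substitution(U, y):
    # Solve U x = y, where U is square upper triangular with nonzero diagonal.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Illustrative data: a 3x3 system A x = b with A = L U.
L = np.array([[1.0, 0, 0], [2, 1, 0], [-3, 1, 1]])
U = np.array([[2.0, 1, 1], [0, 3, -1], [0, 0, 4]])
b = np.array([4.0, 11, 1])

y = forward_substitution(L, b)    # solve L y = b
x = backward_substitution(U, y)   # solve U x = y
print(np.allclose(L @ U @ x, b))  # True: x solves A x = b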
The above discussion raises the following question: which matrices have an LU-decomposition, and how does one find it? The answer is given by the following.
2.7.4 Theorem: Let A be an m×n matrix and let E_1, E_2, ..., E_k be elementary matrices such that E_k ⋯ E_2 E_1 A = U is in row echelon form. If none of the E_i corresponds to the operation of row interchange, then L := E_1⁻¹ E_2⁻¹ ⋯ E_k⁻¹ is a lower triangular matrix with all diagonal entries 1, and A = LU is an LU-decomposition of A.
Proof
2.7.5 Example:
Let A be the given matrix. Reducing A to a row echelon form U by elementary row operations, none of which is a row interchange, and collecting the corresponding multipliers gives a unit lower triangular matrix L with A = LU.
Note that the entries below the main diagonal of L in the pivotal columns are the negatives of the scalars used in the respective column to make all the entries in that column below the pivot zero. For example, in the first column we operated (−2)R1 + R2 and 3R1 + R3; thus, in the first column of L, the second entry is 2 and the third entry is −3.
2.7.6 Example: Let A be the given matrix.
To reduce A to a row echelon form, we first operate on the first column, to make all the entries below its pivot zero, and then on the second column. Next we operate (−3) times the second row added to the third row, to clear the entry below the pivot in the second column, and get U. This U has only two non-zero rows. Thus L is given by replacing the entries below the diagonal in the pivotal columns by the negatives of the multiples used: in column 1 these are the negatives of the two multipliers used there, and in the second column this is 3. Thus A = LU. Let us check this by multiplying L and U.
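The bookkeeping described in examples 2.7.5 and 2.7.6 (store, below the diagonal, the negatives of the scalars used to clear each pivotal column) can be sketched in Python/NumPy as follows. This is an illustrative implementation under the assumption that no row interchanges are needed; it is not code from the notes, and the test matrix A is made up.

import numpy as np

def lu_no_pivot(A):
    # Return L (unit lower triangular) and U with A = L U,
    # assuming the reduction needs no row interchange.
    A = A.astype(float)
    m, n = A.shape
    L = np.eye(m)
    U = A.copy()
    for k in range(min(m, n)):
        if U[k, k] == 0:
            if np.any(U[k + 1:, k] != 0):
                raise ValueError("a row interchange would be needed here")
            continue  # column already zero below the pivot position
        for i in range(k + 1, m):
            mult = U[i, k] / U[k, k]
            U[i, :] -= mult * U[k, :]  # row operation R_i -> R_i + (-mult) R_k
            L[i, k] = mult             # store the negative of the scalar -mult used
    return L, U

A = np.array([[2.0, 1, 1], [4, 5, 1], [-6, 0, 0]])  # illustrative matrix
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))  # True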
The method followed in examples 2.7.5 and 2.7.6 can in fact be justified mathematically as follows:
2.7.7 Theorem: Let E be the m×m elementary matrix which corresponds to the elementary row operation of multiplying the ith row by a scalar c and adding it to the jth row, where j > i. Then E is lower triangular with all diagonal entries 1, its inverse corresponds to multiplying the ith row by −c and adding it to the jth row, and any product of such matrices is again lower triangular with all diagonal entries 1.
Proof
Till now we have assumed that no row interchanges were required to transform A to upper triangular form. If such row operations are needed, we can proceed as follows.
2.7.8 Theorem: Let A be an m×n matrix which requires row interchanges, in addition to the other elementary row operations, to transform it to a row echelon form U, and let P be the product of the elementary matrices corresponding to these row interchanges. Then the matrix PA requires no row interchanges to reduce it to row echelon form, and hence can be written as PA = LU. Proof
2.7.9 Corollary: The system AX = b is equivalent to the system PAX = Pb, where PA has an LU-decomposition.
2.7.10 Definition: An m×m matrix which arises from the identity matrix by a finite number of row interchanges is called a permutation matrix.
2.7.11 Example: For a matrix A that needs row interchanges, we first apply the required interchanges, collected in a permutation matrix P, to get PA, and then reduce PA to row echelon form without further interchanges. Hence, PA = LU, and the system AX = b can be solved via PAX = Pb.
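When row interchanges are unavoidable, theorem 2.7.8 says a permutation matrix P can be applied first. As an illustration (assuming SciPy is available; the matrix A is made up), scipy.linalg.lu computes such a factorization. Note SciPy's convention: it returns p, l, u with A = p @ l @ u, so the P of theorem 2.7.8 is p.T.

import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1, 2],   # illustrative matrix; its (1,1) entry is 0,
              [3, 4, 5],     # so a row interchange is required
              [6, 7, 9]])

p, l, u = lu(A)              # A = p @ l @ u
P = p.T                      # permutation matrix with P @ A = l @ u
print(np.allclose(P @ A, l @ u))  # True: PA = LU as in theorem 2.7.8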
In theorem 8.1.5 we saw that every set of nonzero orthogonal vectors is linearly independent. This motivates our next definition.
8.2.1 Definition: Let V be an inner product space and W a subspace of V. An orthonormal set S contained in W is called an orthonormal basis of W if S is a basis of W. As an immediate application of theorem 8.1.5, we have the following:
8.2.2 Theorem: Let W be any subspace of V. Then W has an orthonormal basis. Proof
8.2.3 Example: For V = R^n with the usual inner product, the set {e_1, ..., e_n}, where e_i is the vector with 1 at the i-th place and 0 elsewhere, is an orthonormal basis of V.
8.2.4 Gram-Schmidt algorithm: Based on the proof of theorem 8.1.5(iv), the procedure for finding an orthonormal basis for an inner product space V is as follows. Let {v_1, v_2, ..., v_n} be a basis of V.
Step 1: Define u_1 = v_1 / ||v_1||.
Step 2: Having defined u_1, ..., u_{k-1}, define w_k = v_k − <v_k, u_1>u_1 − ... − <v_k, u_{k-1}>u_{k-1}.
Step 3: Define u_k = w_k / ||w_k||, and repeat steps 2 and 3 for k = 2, ..., n.
Then {u_1, u_2, ..., u_n} is an orthonormal basis of V.
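A minimal NumPy sketch of these steps for the standard dot product on R^n (illustrative, not the notes' own code; the tolerance tol is an assumption used to detect vectors that become numerically zero) is:

import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    # Orthonormalize vectors of R^n with the standard dot product.
    # Vectors that become (numerically) zero are discarded, so the
    # input need not be linearly independent.
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w = w - np.dot(w, u) * u   # subtract the component along u
        norm = np.linalg.norm(w)
        if norm > tol:                 # keep only genuinely new directions
            basis.append(w / norm)
    return basis

# The basis from example 8.2.5(i) below:
S = [(1, 1, 1), (-1, 0, -1), (-1, 2, 3)]
for u in gram_schmidt(S):
    print(np.round(u, 4))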
8.2.5 Examples: (i) In example 8.1.6, the basis S = {(1, 1, 1), (−1, 0, −1), (−1, 2, 3)} of R^3 gave the orthonormal basis {(1/√3)(1, 1, 1), (1/√6)(−1, 2, −1), (1/√2)(−1, 0, 1)}.
(ii) Consider the vector space V of all real polynomial functions p(x) of degree at most 3, with the inner product on V given by an integral of the product p(x)q(x). Applying the Gram-Schmidt process to the basis S = {1, x, x², x³} of V, and normalizing the resulting orthogonal polynomials, gives an orthonormal basis of V.
(iii) Let W denote the subspace of R^3 spanned by the set of vectors S = {(1, 1, 1), (2, 2, 3), (0, 0, 1), (1, 2, 3)}. Clearly S is not a basis of W, since S has four elements. Thus, to find an orthonormal basis of W, one way is to first select a basis of W from the vectors in S and then apply the Gram-Schmidt process to it. Another, more straightforward method is to apply the Gram-Schmidt process directly to the set of vectors in S and discard those vectors which become zero; a numerical sketch of this is given below.
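For comparison (an aside, not a method from the notes, and assuming SciPy is available), an orthonormal basis of W = [S] can also be obtained numerically from the singular value decomposition via scipy.linalg.orth, which likewise discards dependent directions:

import numpy as np
from scipy.linalg import orth

# Columns are the four vectors of S from (iii).
S = np.array([[1, 2, 0, 1],
              [1, 2, 0, 2],
              [1, 3, 1, 3]], dtype=float)
Q = orth(S)                      # columns form an orthonormal basis of W = [S]
print(Q.shape[1])                # 3 = dim W; one dependent direction was dropped
print(np.allclose(Q.T @ Q, np.eye(Q.shape[1])))  # columns are orthonormal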
8.2.6 Note: Suppose a unit vector X is given and we want an orthonormal basis of R^n containing X. Of course, one method is to first extend {X} to a basis of R^n and then orthonormalize it using the Gram-Schmidt process. An efficient way of organizing this computation is to assemble the resulting vectors as the columns of a matrix U; it is easy to check that the matrix U has the property UᵗU = I, i.e., the columns of U are orthonormal.
8.3. Advantages of an orthonormal basis
Recall that if B = {v_1, ..., v_n} is an ordered basis for a vector space V, then every vector v ∈ V can be assigned unique coordinates, i.e., there exist unique scalars α_1, ..., α_n such that v = α_1 v_1 + ... + α_n v_n.
However, we do not know, in general, how to compute these scalars for a given v. Suppose V is an inner product space and B is an ordered orthonormal basis of V. Then the coordinates of any vector v ∈ V can be computed easily, as explained in the next theorem.
8.3.1 Theorem: Let V be an inner product space with ordered orthonormal basis B = {u_1, ..., u_n}. Then the following hold.
(i) For every v ∈ V, v = <v, u_1>u_1 + ... + <v, u_n>u_n.   (8.1)
(ii) For every u, v ∈ V, <u, v> = <u, u_1><v, u_1> + ... + <u, u_n><v, u_n>.   (8.2)
(iii) For every v ∈ V, ||v||² = <v, u_1>² + ... + <v, u_n>².   (8.3)
Proof
8.3.2 Note: Formula (8.1) tells us how to compute the coordinates of a vector, and formulae (8.2) and (8.3) allow us to compute the inner product and the norm in terms of these coordinates.
8.3.3 Examples: (i) Consider an orthonormal set in R^n with the standard inner product; by formula (8.1), the coordinates of a vector with respect to it are simply its dot products with the elements of the set.
(ii) Consider the space of all real polynomials of degree at most 4, with an inner product defined for p, q by integrating the product pq. For a subspace with a given orthonormal basis, formula (8.1) likewise gives the coordinates of any element with respect to that basis.
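A small numerical illustration of formulas (8.1) and (8.3) for R^3 with the standard inner product (the vector v is chosen here for illustration; the orthonormal basis is the one from example 8.2.5(i)):

import numpy as np

u1 = np.array([1, 1, 1]) / np.sqrt(3)
u2 = np.array([-1, 2, -1]) / np.sqrt(6)
u3 = np.array([-1, 0, 1]) / np.sqrt(2)
B = [u1, u2, u3]

v = np.array([2.0, -1.0, 3.0])                 # an arbitrary vector
coords = np.array([np.dot(v, u) for u in B])   # formula (8.1): <v, u_i>

print(np.allclose(sum(c * u for c, u in zip(coords, B)), v))  # v = sum <v, u_i> u_i
print(np.isclose(np.dot(v, v), np.dot(coords, coords)))       # formula (8.3)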
Next we show that the matrix representation of a linear transformation is easy to compute with respect to an ordered orthonormal basis.
8.3.4 Theorem: Suppose T is a linear transformation on an inner product space V. Let B = {u_1, ..., u_n} be an ordered orthonormal basis of V. Then the (i, j)-th entry of the matrix [T]_B is <T(u_j), u_i>.
Proof
8.3.5 Example: Let us compute the matrix of the orthogonal projection onto the line along a unit vector u. Since the projection sends a vector v to <v, u>u, theorem 8.3.4 shows that, with respect to the standard orthonormal basis, the (i, j)-th entry of this matrix is <u, e_j><u, e_i>, i.e., the matrix is the outer product u uᵗ.
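As a numerical check (the unit vector u below is chosen for illustration), the projection matrix u uᵗ has entries <P(e_j), e_i>, in line with theorem 8.3.4, and satisfies P² = P:

import numpy as np

u = np.array([2.0, -1.0, 2.0])
u = u / np.linalg.norm(u)        # a unit vector (chosen for illustration)

P = np.outer(u, u)               # matrix of the projection onto span{u}

e = np.eye(3)
entries = np.array([[np.dot(P @ e[j], e[i]) for j in range(3)] for i in range(3)])
print(np.allclose(P, entries))   # entries agree with <P(e_j), e_i>
print(np.allclose(P @ P, P))     # a projection satisfies P^2 = P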
10.1. Diagonalizability
Let A be an n×n matrix with entries from F. In 6.4.4 we raised the question: when is A similar to a diagonal matrix? Equivalently, given a linear transformation T : V → V with dim(V) = n, when can we find an ordered basis B of V such that [T]_B is diagonal? Let us make the following definition.
10.1.1 Definition: (i) A matrix A is said to be diagonalizable if A is similar to a diagonal matrix, i.e., there exists an invertible matrix P such that P⁻¹AP is a diagonal matrix. (ii) Let V be a finite dimensional vector space and T : V → V be a linear transformation. We say T is diagonalizable if there exists a basis B of V such that the matrix [T]_B is a diagonal matrix.
We would like to know when a given matrix A is diagonalizable and, if so, how to find P such that P⁻¹AP is diagonal. The next theorem answers this question.
10.1.2 Theorem: Let A be an n×n matrix. If A is diagonalizable, then there exist scalars λ_1, ..., λ_n and linearly independent vectors X_1, ..., X_n in F^n such that AX_i = λ_i X_i for every i = 1, ..., n.
Proof
Theorem 10.1.2 says that if A is diagonalizable, then not only does A have n eigenvalues, it also has a basis consisting of eigenvectors. In fact, the converse of this is also true.
10.1.3 Theorem: (i) Let A be an n×n matrix. If A has n linearly independent eigenvectors, then A is diagonalizable. (ii) Let A be an n×n matrix. If A is diagonalizable and P is any n×n invertible matrix, then B := P⁻¹AP is also diagonalizable. (iii) Let V be a finite dimensional vector space and T : V → V be a linear transformation. If there exists a basis B of V consisting of eigenvectors of T, then T is diagonalizable. Proof
10.1.4 Note: Theorem 10.1.3 not only tells us when A is diagonalizable, it also gives us a matrix P which will diagonalize A, i.e., P⁻¹AP = D, together with the resulting diagonal matrix D. The column vectors of the matrix P are n linearly independent eigenvectors of A, and the diagonal entries of D are the eigenvalues of A corresponding to these n eigenvectors.
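Numerically, the recipe of note 10.1.4 can be tried with NumPy (an illustrative sketch; the matrix A below is made up): numpy.linalg.eig returns the eigenvalues together with a matrix P whose columns are eigenvectors, and P⁻¹AP is then diagonal whenever those eigenvectors are linearly independent.

import numpy as np

A = np.array([[2.0, 0, 0],
              [1, 3, 0],
              [0, 1, 5]])                  # illustrative matrix with distinct eigenvalues

eigvals, P = np.linalg.eig(A)              # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P               # should be diagonal

print(np.round(eigvals, 6))                # 2, 3, 5 in some order
print(np.allclose(D, np.diag(eigvals)))    # True: P^{-1} A P = D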
10.1.5 Example: Let T be a linear transformation on the space of polynomials of degree at most 1, and let A be its matrix with respect to the standard basis {1, x}. Computing det(A − λI) shows that the eigenvalues of T are λ = 1 and λ = 2. For λ = 1, (A − I)X = 0 has a non-zero solution X, which corresponds to the polynomial −2 + x; thus −2 + x is an eigenvector for λ = 1. Similarly, for λ = 2, solving (A − 2I)X = 0 gives the eigenvector −1 + x. Indeed, T(−2 + x) = −2 + x = 1(−2 + x) + 0(−1 + x) and T(−1 + x) = −2 + 2x = 0(−2 + x) + 2(−1 + x). Thus, with respect to the ordered basis B = {−2 + x, −1 + x} of eigenvectors, [T]_B is the diagonal matrix with diagonal entries 1 and 2.
To apply theorem 10.1.3 to diagonalize a matrix, it is important to know how many eigenvalues the matrix has and the relations between the corresponding eigenvectors.
10.1.6 Theorem: Let A be an n×n matrix.
(i) Let λ be an eigenvalue of A and W_λ := {X ∈ F^n | AX = λX}. Then W_λ is a subspace of F^n, called the eigen subspace for the eigenvalue λ.
(ii) Eigenvectors corresponding to distinct eigenvalues are linearly independent: if λ_1, ..., λ_k are distinct eigenvalues of A with corresponding eigenvectors X_1, ..., X_k, then {X_1, ..., X_k} is a linearly independent set.
Proof
10.1.7 Corollary: Let A be an n×n matrix. If A has n distinct eigenvalues, then A is diagonalizable. Proof
10.1.8 Example: For the given matrix A, the characteristic polynomial det(A − λI) has roots 1, 2 and 3. Thus λ = 1, 2, 3 are the eigenvalues of A; since these are distinct, A is diagonalizable by corollary 10.1.7.
10.1.9 Note: If an n×n matrix A does not have all its eigenvalues distinct, it need not imply that A is not diagonalizable. For example, in 10.1.4(ii), though A had only two distinct eigenvalues, it was diagonalizable, for we were able to find 3 linearly independent eigenvectors. When an n×n matrix fails to have n distinct eigenvalues, this happens because either the characteristic polynomial does not have all its roots in F (which can occur when F = R), or some of the roots of det(A − λI) are repeated. This motivates our next definitions.
10.1.10 Definition:
Let A be an n×n matrix and let λ ∈ F be an eigenvalue of A.
(i) We say a positive integer r ≤ n is the algebraic multiplicity of λ if λ is a root of det(A − tI) of order r.
(ii) The dimension of the eigen subspace W_λ is called the geometric multiplicity of λ.
10.1.11 Example: For the given matrix A, expanding det(A − λI) shows that A has three distinct eigenvalues, λ = 5, 0 and −2. Each has algebraic multiplicity one. In fact, each has geometric multiplicity one as well. Let us verify that. For λ = 5, the system (A − 5I)X = 0 determines X up to a scalar multiple, so dim W_5 = 1. Similarly for λ = 0 and, finally, for λ = −2. Hence, in each case dim W_λ = 1, i.e., each eigenvalue has geometric multiplicity 1.
The notions of algebraic multiplicity, geometric multiplicity and diagonalizability are related as follows:
10.1.12 Theorem: Let A be an n×n matrix with distinct eigenvalues λ_1, ..., λ_k having algebraic multiplicities r_1, ..., r_k and geometric multiplicities d_1, ..., d_k, respectively. Then d_i ≤ r_i for every i, and A is diagonalizable if and only if d_1 + ... + d_k = n, equivalently, if and only if the characteristic polynomial of A has all its roots in F and d_i = r_i for every i.
Proof
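The criterion of theorem 10.1.12 can be checked numerically (a sketch under the assumption that the eigenvalues are real and well separated; the matrix A is illustrative): the geometric multiplicity of λ is n − rank(A − λI).

import numpy as np

A = np.array([[5.0, 0, 0],
              [0, 5, 0],
              [0, 0, 2]])                        # illustrative: eigenvalue 5 repeated

eigvals = np.linalg.eigvals(A)
n = A.shape[0]
for lam in np.unique(np.round(eigvals, 8)):
    alg = int(np.sum(np.isclose(eigvals, lam)))               # algebraic multiplicity
    geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))      # geometric multiplicity
    print(lam, alg, geo)
# A is diagonalizable exactly when geo == alg for every eigenvalue
# and the multiplicities add up to n.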
8.4. Orthogonal decomposition
8.4.1 Definition: Let S be any subset of V and let S^⊥ := {v ∈ V | <v, s> = 0 for every s ∈ S}. The set S^⊥ is called the orthogonal complement of S.
8.4.2 Theorem (Orthogonal complement): Let W be any subspace of V. Then W^⊥ is a subspace of V, and every v ∈ V can be uniquely written as v = w + u for some w ∈ W and u ∈ W^⊥; we write this as V = W ⊕ W^⊥.
Proof
8.4.3 Example: (i) Let V = R^3 with the standard inner product. Let S = {(1, 0, 0), (0, 1, 0)}. Then S^⊥ = {(x, y, z) | (x, y, z)·(1, 0, 0) = 0 = (0, 1, 0)·(x, y, z)} = {(0, 0, z) | z ∈ R}. If we write W = [S] = {(x, y, 0) | x, y ∈ R}, then W^⊥ = {(0, 0, z) | z ∈ R}.
(ii) Let V be the space of all real polynomials of degree at most 3, with an inner product given by an integral of the product of two polynomials. For a subspace W of V spanned by finitely many polynomials, a polynomial p lies in W^⊥ if and only if p is orthogonal to each of the spanning polynomials; writing out these conditions on the coefficients of p gives a system of linear equations, and hence W^⊥ is the solution space of that system.
(iii) Let W be the subspace of R^3 spanned by the vectors (2, 5, −1) and (−2, 1, 1). These vectors are orthogonal, so an orthonormal basis of W is given by {(1/√30)(2, 5, −1), (1/√6)(−2, 1, 1)}. The orthogonal projection of a vector v ∈ R^3 onto W is then given by <v, u_1>u_1 + <v, u_2>u_2, where u_1, u_2 are these two basis vectors. Thus v decomposes as this projection plus a vector of W^⊥.
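A NumPy check of example (iii) (the vector v below is chosen here for illustration): the projection of v onto W plus the remainder reproduces v, and the remainder is orthogonal to W.

import numpy as np

w1 = np.array([2.0, 5.0, -1.0])
w2 = np.array([-2.0, 1.0, 1.0])
u1, u2 = w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)   # orthonormal basis of W

v = np.array([1.0, 2.0, 3.0])                  # an arbitrary vector (chosen here)
w = np.dot(v, u1) * u1 + np.dot(v, u2) * u2    # projection of v onto W
u = v - w                                      # component in W-perp

print(np.allclose(v, w + u))                                       # v = w + u
print(np.isclose(np.dot(u, w1), 0), np.isclose(np.dot(u, w2), 0))  # u is orthogonal to W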
Let V be an inner product space and let W be a subspace of V. Theorem 8.4.2 tells us that V = W ⊕ W^⊥, i.e., the space V decomposes as a sum of two orthogonal subspaces. Decompositions as given by theorem 8.4.2 are very useful in understanding linear transformations on inner product spaces (see proposition 9.3.2 and theorem 10.2.4). This idea can be extended as follows:
8.4.4 Definition: Suppose W_1, W_2, ..., W_k are subspaces of V such that every v ∈ V can be written uniquely as v = w_1 + w_2 + ... + w_k with w_i ∈ W_i, and <w_i, w_j> = 0 whenever i ≠ j. In that case, we say V is an orthogonal sum of the subspaces W_1, ..., W_k, and we write it as V = W_1 ⊕ W_2 ⊕ ... ⊕ W_k. The following is easy to prove:
Proof
8.4.6 Theorem (Best approximation): Let W be any subspace of V and let v ∈ V. Then there exists a unique w_0 ∈ W such that ||v − w_0|| ≤ ||v − w|| for every w ∈ W; moreover, w_0 is precisely the component of v in W in the decomposition of theorem 8.4.2, i.e., the orthogonal projection of v onto W.
Proof
8.4.7 Example: (i) Let V = R^3 and W be the subspace of V generated by {(2, −1, −2), (1, 0, 1)}. These two vectors are orthogonal, so the best approximator for u = (2, 1, 3) from W is given by the projection w_0 = (<u, w_1>/||w_1||²) w_1 + (<u, w_2>/||w_2||²) w_2 = (−1/3)(2, −1, −2) + (5/2)(1, 0, 1) = (11/6, 1/3, 19/6).
(ii) Consider a space V of polynomials with an inner product under which the polynomials 1 and t are orthogonal. Let W denote the subspace of V spanned by {1, t}. Since <1, t> = 0, {1, t} forms an orthogonal basis of W, and the best approximation of a given p ∈ V from W is obtained from the same projection formula as in (i), namely (<p, 1>/||1||²)·1 + (<p, t>/||t||²)·t.
8.5. Least square approximations
To show that theorem 8.4.2 is not only of theoretical interest but has practical applications as well, let us consider the following problem. An experimenter collects data by taking measurements y_1, y_2, ..., y_m at time points t_1, t_2, ..., t_m, respectively. Geometrically, this gives the m points (t_1, y_1), ..., (t_m, y_m) in the plane.
Because of the distribution of these points, or otherwise, the experimenter feels that there exists a linear relation between the time variable t and the observation variable y, say y = ct + d. The problem is to find c and d such that the line y = ct + d best fits the experimental data. Equivalently, one wants to minimize the error E = (y_1 − ct_1 − d)² + ... + (y_m − ct_m − d)².
The error will be zero, i.e., the line will be an exact fit, if and only if y_i = ct_i + d for every i = 1, ..., m. We can reformulate the problem as follows: let A be the m×2 matrix whose i-th row is (t_i, 1), let X = (c, d)ᵗ and let Y = (y_1, ..., y_m)ᵗ.
Then E = ||Y − AX||². Thus, given the matrix A and the vector Y, the line ct + d will be an exact fit (error zero) if and only if AX = Y has a solution. This leads to the problem of solving the system AX = Y. However, in general, the number of equations greatly exceeds the number of variables and it is unrealistic to expect a solution of AX = Y. So, one would like to find X_0 such that it gives the least error, i.e.,
||Y − AX_0|| ≤ ||Y − AX|| for every X.   (8.4)
This leads to the following definition:
8.5.1 Definition: A vector X_0 satisfying (8.4) is called a least square solution of the system AX = Y.
8.5.2 Theorem: The set of least square solutions of AX = Y is the same as the set of solutions of the system AᵗAX = AᵗY (the normal equations).
8.5.3 Note: For a matrix A and a given vector Y, let X_0 be a least square solution of AX = Y. Then AX_0 is the projection of the vector Y onto the column space of A. A least square solution need not be unique; however, if X_0 and X_1 are both least square solutions of AX = Y, then AX_0 = AX_1.
8.5.4 Corollary: The system AX = Y has a unique least square solution if and only if AᵗA is invertible, and in that case the least square solution is given by X_0 = (AᵗA)⁻¹AᵗY.
Proof
The above corollary tells us that the least square approximation can be computed if AᵗA is invertible. Necessary and sufficient conditions for the invertibility of AᵗA are given below.
8.5.5 Example: For the given A and Y, it is easy to check that the system AX = Y is inconsistent. To find a least square approximation, we solve the normal equations AᵗAX = AᵗY. Here AᵗA turns out to be invertible; hence the system has a unique least square approximation, given by the solution of the normal equations.
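A small worked sketch of the normal equations for the line-fitting problem (the data points below are made up for illustration; NumPy assumed):

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])      # time points (illustrative data)
y = np.array([1.1, 1.9, 3.2, 3.9])      # observations

A = np.column_stack([t, np.ones_like(t)])   # rows (t_i, 1); unknowns X = (c, d)
X0 = np.linalg.solve(A.T @ A, A.T @ y)      # solve the normal equations A^t A X = A^t Y

c, d = X0
print(c, d)                                  # slope and intercept of the best-fit line
print(np.allclose(X0, np.linalg.lstsq(A, y, rcond=None)[0]))  # agrees with lstsq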
8.5.6 Theorem: For any m×n matrix A, the matrix AᵗA is invertible if and only if rank(A) = n.
Proof
In the particular situation when the columns of A are linearly independent, it is possible to compute the least square approximation by another method, based on the following decomposition.
8.5.8 Definition: A matrix A is said to have a QR decomposition if there exist matrices Q and R with the following properties: (i) Q is a matrix whose columns form an orthonormal basis for the column space of A. (ii) R is an upper triangular invertible matrix with positive diagonal entries. (iii) A = QR.
8.5.9 Theorem: Let A be an m×n matrix and Y ∈ R^m. If A admits a QR decomposition A = QR, then the least square approximation X_0 for AX = Y is given by X_0 = R⁻¹QᵗY.
Proof
8.5.10 Note: The above theorem tells us that if A = QR, then to find the least square approximation we do not have to compute (AᵗA)⁻¹, as suggested by corollary 8.5.4; instead one can compute R⁻¹, which is easier since R is upper triangular (equivalently, one can solve RX = QᵗY by back substitution), and use the above theorem.
8.5.11 Example: Let A = QR, where Q has orthonormal columns and R is invertible and upper triangular. Then, for the given Y, the system AX = Y has least square approximation given by X_0 = R⁻¹QᵗY.
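Using theorem 8.5.9 with NumPy's QR routine (an illustrative sketch: numpy.linalg.qr in its default "reduced" mode returns Q with orthonormal columns and an upper triangular R, though it does not enforce positive diagonal entries in R, so signs may differ from definition 8.5.8; A and Y are the made-up line-fit data from before):

import numpy as np

A = np.column_stack([np.array([0.0, 1, 2, 3]), np.ones(4)])
Y = np.array([1.1, 1.9, 3.2, 3.9])

Q, R = np.linalg.qr(A)               # reduced QR: A = Q R, columns of Q orthonormal
X0 = np.linalg.solve(R, Q.T @ Y)     # least square solution R^{-1} Q^t Y

print(X0)
print(np.allclose(A.T @ A @ X0, A.T @ Y))   # X0 satisfies the normal equations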
The existence and uniqueness of the QR decomposition of a matrix are given by the next theorem.
8.5.12 Theorem: Let A be a matrix with linearly independent columns. Then A admits a QR decomposition. Further, such a decomposition is unique. Proof
8.5.13 Example: For the given matrix A, it is easy to verify that the columns of A are linearly independent. Application of the Gram-Schmidt process to the column vectors of A gives an orthonormal basis for the column space; these vectors form the columns of Q, and R is obtained from the inner products of the columns of A with them, so that A = QR. We verify by multiplying that QR is indeed equal to A.
8.5.14 Note: The method of least square approximation applies equally well to fitting a polynomial to given data.
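For instance (an illustrative sketch, not from the notes; the data are made up), fitting a quadratic y = a t² + b t + c by least squares just replaces the two-column matrix A of the line fit by a three-column one:

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.4, 1.1, 3.2, 7.1])           # made-up measurements

A = np.column_stack([t**2, t, np.ones_like(t)])   # columns t_i^2, t_i, 1
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)    # least square solution for (a, b, c)
print(coeffs)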