
LU Decomposition

2.7.1 Definition: An m × n matrix A is said to have an LU-decomposition if there exist matrices L and U with the following properties: (i) L is an m × m lower triangular matrix with all diagonal entries equal to 1; (ii) U is an m × n matrix in some row echelon form; (iii) A = LU.

2.7.2 Advantage of LU-decomposition: Suppose we want to solve an m × n system AX = b. If we can find an LU-decomposition for A, then to solve AX = b it is enough to solve the two systems LY = b and UX = Y.

Thus the system LY = b can be solved by the method of forward substitution and the system UX = Y can be solved by the method of backward substitution. To illustrate, we give some examples.

2.7.3 Example: Consider the given system AX = b, where

It is easy to check that A = LU, where

Thus, to solve AX = b, we first solve LY = b by forward substitution

to get

Now, we solve UX = Y by backward substitution:

and get
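Computationally, once L and U are known, the two triangular solves described in 2.7.2 are cheap. The following is a minimal Python/NumPy sketch using a made-up 3 × 3 system (not the matrices of Example 2.7.3); the helper names forward_substitution and backward_substitution are our own.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with unit diagonal."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]          # diagonal entry of L is 1
    return y

def backward_substitution(U, y):
    """Solve U x = y for upper triangular U with nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# A hypothetical 3 x 3 system.
L = np.array([[ 1.0, 0.0, 0.0],
              [ 2.0, 1.0, 0.0],
              [-3.0, 4.0, 1.0]])
U = np.array([[ 2.0, 1.0, -1.0],
              [ 0.0, 3.0,  2.0],
              [ 0.0, 0.0,  5.0]])
A = L @ U
b = np.array([1.0, 2.0, 3.0])

y = forward_substitution(L, b)    # solve LY = b
x = backward_substitution(U, y)   # solve UX = Y
print(np.allclose(A @ x, b))      # True: x solves AX = b
```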

The above discussion raises the following question: which matrices have an LU-decomposition, and how do we find it? The answer is given by the following:

2.7.4 Theorem: Let A be an m × n matrix and let E1, E2, ..., Ek be elementary matrices such that U = Ek ⋯ E2 E1 A is in row echelon form. If none of the Ei's corresponds to the operation of row interchange, then C = Ek ⋯ E2 E1 is a lower triangular invertible matrix. Further, L = C⁻¹ is also a lower triangular matrix with A = LU.

Proof

2.7.5 Example:

Let

Then the row operations −2R1 + R2 and 3R1 + R3 give

The corresponding elementary matrices and their inverses are

Thus

Let us observe that for

Note that the entries below the main diagonal of L in the pivotal columns are the negatives of the scalars used in the respective column to make all the entries in that column below the pivot zero. For example, in the first column we operated −2R1 + R2 and 3R1 + R3; thus, in the first column of L, the second entry is 2 and the third entry is −3.

2.7.6 Example: Let

To reduce A to a row echelon form, we operate

and

for the first pivotal column, namely the second column, to get

Next we operate −3

for the next pivotal column, the

column, to get

This has only two non-zero rows. Thus L is obtained by replacing the entries below the diagonal in the pivotal columns by the negatives of the multipliers: in column 1 these are − , − , and in the second column this is 3. Thus

Let us check
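The bookkeeping used in Examples 2.7.5 and 2.7.6 (eliminate below each pivot and record the negatives of the multipliers in L) can be sketched in code as follows. This is a simplified version that assumes no row interchanges are needed and that the pivots sit on the diagonal; the sample matrix is hypothetical, not the one of the examples.

```python
import numpy as np

def lu_no_pivot(A):
    """LU decomposition without row interchanges (assumes nonzero pivots)."""
    U = A.astype(float).copy()
    m, n = U.shape
    L = np.eye(m)
    for i in range(min(m, n)):
        if U[i, i] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for j in range(i + 1, m):
            c = -U[j, i] / U[i, i]     # operate c*R_i + R_j to create a zero
            U[j, :] += c * U[i, :]
            L[j, i] = -c               # entry of L = negative of the scalar used
    return L, U

# A hypothetical matrix (not the one of Example 2.7.5 or 2.7.6).
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0,  3.0, 3.0],
              [-6.0, -1.0, 0.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))                                   # True
print(np.allclose(L, np.tril(L)), np.allclose(U, np.triu(U)))  # True True
```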

The method followed in examples 2.7.5 and 2.7.6 can in fact be proved mathematically as follows:

2.7.7 Theorem: Let E be the m × m elementary matrix which corresponds to the elementary row operation of multiplying the ith row by a scalar and adding it to the jth row. Then

(iii) If is such that is in row echelon form, then

Proof

Till now we had assumed that no row interchanges were required to transform A to upper triangular form. If such row operations are needed, we can proceed as follows:

2.7.8 Theorem: Let A be an m × n matrix which requires row interchanges, in addition to the other elementary row operations, to transform it to row echelon form U. Let P be the product of the elementary matrices corresponding to these row interchanges. Then the matrix PA requires no row interchanges to reduce it to row echelon form, and hence can be written as PA = LU.

Proof

2.7.9 Corollary: The system AX = b is equivalent to the system PAX = Pb, where PA has an LU-decomposition.

2.7.10 Definition: An m × m matrix which arises from the identity matrix by a finite number of row interchanges is called a permutation matrix.

2.7.11 Example:

To continue we need to interchange

to get

and now U is in r.e.f. Thus, we consider PA and transform it to r.e.f.

Hence,
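Numerically, the PA = LU factorization of Theorem 2.7.8 can be checked with scipy.linalg.lu, which by its own convention returns P, L, U with A = PLU; since P is a permutation matrix, this is equivalent to PᵀA = LU. A minimal sketch with a hypothetical matrix whose first pivot is zero, so that a row interchange is required:

```python
import numpy as np
from scipy.linalg import lu

# Hypothetical matrix whose (1,1) entry is zero, forcing a row interchange.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 3.0, 1.0]])

P, L, U = lu(A)                        # scipy's convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))     # True: (P^T)A = LU, a "PA = LU" factorization
print(np.allclose(L, np.tril(L)))      # L lower triangular with unit diagonal
print(np.allclose(U, np.triu(U)))      # U upper triangular (row echelon form)
```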

8.2. Orthonormal basis

In theorem 8.1.5 we saw that every set of nonzero orthogonal vectors is linearly independent. This motivates our next definition.

8.2.1 Definition: Let V be an inner product space and W a subspace of V. An orthonormal set in W is called an orthonormal basis of W if it is a basis of W. As an immediate application of theorem 8.1.5, we have the following:

8.2.2 Theorem: Let W be any subspace of V. Then W has an orthonormal basis. Proof

8.2.3 Example: For V = R^n with the usual inner product, the set {e_1, e_2, ..., e_n}, where e_i has 1 at the ith place and 0 elsewhere, is an orthonormal basis of V. This is called the standard basis of R^n.

8.2.4 Gram-Schmidt algorithm: Based on the proof of theorem 8.1.5(iv), the procedure for finding an orthonormal basis of an inner product space V is as follows. Let {v_1, v_2, ..., v_n} be a basis of V.

Step 1: Define u_1 = v_1 / ||v_1||.

Step 2: Having found u_1, ..., u_{k-1}, define w_k = v_k − ⟨v_k, u_1⟩u_1 − ... − ⟨v_k, u_{k-1}⟩u_{k-1}.

Step 3: Define u_k = w_k / ||w_k||.

Then {u_1, u_2, ..., u_n} is an orthonormal basis of V.
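A minimal sketch of the algorithm for R^n with the standard inner product; gram_schmidt is our own helper name, and the input used here is the basis S of example 8.2.5(i) below.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize vectors of R^n (standard inner product).

    Vectors whose component orthogonal to the previous ones is numerically
    zero are discarded, so the input may be merely a spanning set.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:                      # subtract the projections <w, u> u
            w = w - np.dot(w, u) * u
        norm = np.linalg.norm(w)
        if norm > tol:                       # normalize; skip zero vectors
            basis.append(w / norm)
    return basis

# The basis S of example 8.2.5(i) below.
B = gram_schmidt([(1, 1, 1), (-1, 0, -1), (-1, 2, 3)])
G = np.array([[np.dot(u, v) for v in B] for u in B])
print(np.allclose(G, np.eye(len(B))))        # True: B is orthonormal
```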

8.2.5 Examples: (i) In example 8.1.6, the basis S = {(1, 1, 1), (−1, 0, −1), (−1, 2, 3)} of R^3 gave the orthonormal basis

(ii) Consider the vector space V of all real polynomial functions p(x) of degree at most 3. Let the inner product on V be given by

Consider the basis S = for V. The Gram-Schmidt process gives the following.

Next, define

Then the resulting set is an orthonormal basis of V.

(iii) Let W denote the subspace of R^3 spanned by the set of vectors S = {(1, 1, 1), (2, 2, 3), (0, 0, 1), (1, 2, 3)}. Clearly S is not a basis of W, since S has four elements. Thus, to find an orthonormal basis of W, one way is to first select a basis of W from among the vectors in S and then apply the Gram-Schmidt process to it. Another, more straightforward, method is to apply the Gram-Schmidt process directly to the set of vectors in S and discard those vectors which become zero. In our case this goes as follows:

The required orthonormal basis is
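Continuing the sketch given after 8.2.4 (and reusing its gram_schmidt helper), the same computation can be checked numerically; the dependent vector in S produces a zero residual and is discarded.

```python
# Gram-Schmidt applied directly to the spanning set S of example 8.2.5(iii).
S = [(1, 1, 1), (2, 2, 3), (0, 0, 1), (1, 2, 3)]
basis_W = gram_schmidt(S)
print(len(basis_W))          # 3: one vector of S was dependent and got discarded
for u in basis_W:
    print(np.round(u, 4))
```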


8.2.6 Note:

Suppose we want to find an orthonormal basis of R^n which includes a given unit vector X. Of course, one method is to first extend {X} to a basis of R^n and then orthonormalize it using the Gram-Schmidt process. An efficient way of doing this is as follows: (i) Write

Assume (ii) Construct

(iii) It is easy to check that the matrix U satisfies UᵀU = I, i.e., the columns of U are orthonormal.
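One generic way to carry this out in code (not necessarily the construction of steps (i) and (ii)) is to orthonormalize X together with the standard basis vectors and keep the n vectors that survive; the function name and the sample vector below are our own.

```python
import numpy as np

def orthonormal_basis_containing(x, tol=1e-12):
    """Return a matrix U with orthonormal columns whose first column is the unit vector x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    candidates = [x] + [np.eye(n)[:, i] for i in range(n)]
    cols = []
    for v in candidates:
        w = v.copy()
        for u in cols:
            w = w - np.dot(w, u) * u         # Gram-Schmidt step
        if np.linalg.norm(w) > tol:
            cols.append(w / np.linalg.norm(w))
        if len(cols) == n:
            break
    return np.column_stack(cols)

x = np.array([1.0, 2.0, 2.0]) / 3.0          # a unit vector in R^3
U = orthonormal_basis_containing(x)
print(np.allclose(U[:, 0], x))               # first column is x
print(np.allclose(U.T @ U, np.eye(3)))       # columns of U are orthonormal
```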

8.3. Advantages of an orthonormal basis

Recall that, if B = {u_1, u_2, ..., u_n} is an ordered basis for a vector space V, then every vector v ∈ V can be assigned unique coordinates, i.e., there exist unique scalars α_1, ..., α_n such that v = α_1 u_1 + ... + α_n u_n.

However, we do not know how to compute these scalars for a given v. Suppose V is an inner product space and B is an ordered orthonormal basis of V. Then the coordinates of any vector v ∈ V can be computed easily, as explained in the next theorem.

8.3.1 Theorem: Let V be an inner product space with ordered orthonormal basis B = {u_1, u_2, ..., u_n}. Then the following hold.

(i) For every v ∈ V, v = ⟨v, u_1⟩u_1 + ⟨v, u_2⟩u_2 + ... + ⟨v, u_n⟩u_n. (8.1)

(ii) For every v, w ∈ V, ⟨v, w⟩ = ⟨v, u_1⟩⟨w, u_1⟩ + ... + ⟨v, u_n⟩⟨w, u_n⟩. (8.2)

(iii) For every v ∈ V, ||v||² = ⟨v, u_1⟩² + ... + ⟨v, u_n⟩². (8.3)

The scalars ⟨v, u_i⟩ are called the Fourier coefficients of v, and (iii) is called Parseval's identity.

Proof

8.3.2 Note: Formula (8.1) tells us how to compute the coordinates of a vector, and formulae (8.2) and (8.3) allow us to compute the inner product and the norm in terms of these coordinates.

8.3.3 Examples: (i) Consider the set

It is easy to verify that B is an orthonormal basis of R^3. For the vector u = (6, 1, −8), since

we have

(ii) Consider the space of all real polynomials of degree at most 4. For p and q in this space, define

It is easy to see that this is an inner product. Consider the subspace and the basis

We can apply the Gram-Schmidt process to this basis to get an orthonormal basis of the subspace as follows. Note that

Now define

Thus, an orthonormal basis of

is given by

For a vector, say

its coordinates with respect to B are given by:
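As a numerical illustration of (8.1) and (8.3): the sketch below takes the vector u = (6, 1, −8) of example 8.3.3(i) together with an orthonormal basis of R³ of our own choosing (not necessarily the basis B of that example), computes the Fourier coefficients, and checks Parseval's identity.

```python
import numpy as np

# An orthonormal basis of R^3 of our own choosing.
B = [np.array([ 1.0,  1.0, 1.0]) / np.sqrt(3),
     np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),
     np.array([-1.0,  1.0, 0.0]) / np.sqrt(2)]

v = np.array([6.0, 1.0, -8.0])

# (8.1): the coordinates of v are its Fourier coefficients <v, u_i>.
coeffs = np.array([np.dot(v, u) for u in B])
print(np.allclose(sum(c * u for c, u in zip(coeffs, B)), v))   # True

# (8.3): Parseval's identity, ||v||^2 equals the sum of squared coefficients.
print(np.isclose(np.dot(v, v), np.sum(coeffs ** 2)))           # True
```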

Next we show that the matrix representation of a linear transformation is easy to compute with respect to an ordered orthonormal basis.

8.3.4 Theorem: Suppose T is a linear transformation on an inner product space V. Let B = {u_1, u_2, ..., u_n} be an ordered orthonormal basis of V. Then the (i, j)th entry of the matrix of T with respect to B is ⟨T(u_j), u_i⟩.

Proof

8.3.5 Example: Let us compute the matrix of the orthogonal projection onto the line along the unit vector u = . Since

and

we have
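A sketch of the same computation with a hypothetical unit vector u in R² of our own choosing: for the projection T(x) = ⟨x, u⟩u, theorem 8.3.4 gives the entries ⟨T(e_j), e_i⟩ = u_i u_j with respect to the standard basis, i.e., the matrix is the outer product u uᵀ.

```python
import numpy as np

u = np.array([3.0, 4.0]) / 5.0     # hypothetical unit vector

# Matrix of T(x) = <x, u> u with respect to the standard basis: u u^T.
P = np.outer(u, u)
print(P)
print(np.allclose(P @ P, P), np.allclose(P @ u, u))   # P is a projection fixing u
```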

10.1. Diagonalizability

Let A be an n × n matrix with entries from F. In 6.4.4 we had raised the question: when is A similar to a diagonal matrix? Equivalently, given a linear transformation T : V → V with dim(V) = n, when can we find an ordered basis B of V such that the matrix of T with respect to B is diagonal? Let us make the following definition:

10.1.1 Definition: (i) A matrix A is said to be diagonalizable if A is similar to a diagonal matrix, i.e., there exists an invertible matrix P such that P⁻¹AP is a diagonal matrix. (ii) Let V be a finite dimensional vector space and T : V → V a linear transformation. We say T is diagonalizable if there exists a basis B of V such that the matrix of T with respect to B is a diagonal matrix.

We would like to know when a given matrix A is diagonalizable and, if so, how to find P such that P⁻¹AP is diagonal. The next theorem answers this question.

10.1.2 Theorem: Let A be an n × n matrix. If A is diagonalizable, then there exist scalars λ_1, λ_2, ..., λ_n and vectors v_1, v_2, ..., v_n such that the following hold: Av_i = λ_i v_i for 1 ≤ i ≤ n, and {v_1, v_2, ..., v_n} is a basis of F^n.

Proof

Theorem 10.1.2 says that if A is diagonalizable, then not only does A have n eigenvalues, it has a basis consisting of eigenvectors. In fact, the converse of this is also true.

10.1.3 Theorem: (i) Let A be an n × n matrix. If A has n linearly independent eigenvectors, then A is diagonalizable. (ii) Let A be an n × n matrix. If A is diagonalizable and P is any n × n invertible matrix, then B := P⁻¹AP is also diagonalizable. (iii) Let V be a finite dimensional vector space and T : V → V a linear transformation. If there exists a basis B of V consisting of eigenvectors of T, then T is diagonalizable.

Proof

10.1.4 Note: Theorem 10.1.3 not only tells us when A is diagonalizable, it also gives us a matrix P which will diagonalize A, i.e., P⁻¹AP = D, together with the resulting diagonal matrix D. The column vectors of P are the n linearly independent eigenvectors of A, and the diagonal entries of D are the eigenvalues of A corresponding to these eigenvectors.
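A minimal numerical sketch of note 10.1.4, using a hypothetical matrix and numpy.linalg.eig, whose returned matrix of eigenvectors plays the role of P:

```python
import numpy as np

# Hypothetical matrix with distinct eigenvalues 2, 3 and 5.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 5.0]])

eigvals, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
D = np.diag(eigvals)

# The eigenvectors are linearly independent, so P is invertible and
# P^{-1} A P is the diagonal matrix of the corresponding eigenvalues.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
```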

10.1.5 Examples: (i) Consider the matrix

Since

and the equation

has no real roots, the matrix is not diagonalizable over R.

(ii) Consider the matrix

The characteristic polynomial of A is

Thus, A has two eigenvalues: λ = 3 and λ = 6. For λ = 3,

The solutions of ( A - 3I )(X) = 0 are given by

Thus,

are both eigenvectors for the eigenvalue λ = 3. Similarly, for λ = 6, since

The solutions of (A − 6I)X = 0 are given by

For example

is a solution. Hence

is an eigenvector for the eigenvalue λ = 6. Let

Since det(P) ≠ 0, P is invertible, i.e., the columns of P form a linearly independent set. Hence, by theorem 10.1.2,

Let us verify this:

(iii) Consider the linear transformation T for which, taking the standard basis B = {1, x}, we have T(1) = x and T(x) = −2 + 3x. Thus,

The characteristic polynomial of T is given by

Thus, T has two eigenvalues, λ = 1 and λ = 2. Further, for the eigenvalue λ = 1,

Thus, ( − I)X = 0 implies

Thus, X =

is an eigenvector for λ = 1.

Similarly, for λ = 2,

Thus,

is an eigenvector for the eigenvalue λ = 2. Thus, if we take

Or, equivalently, if we take the ordered basis {−2 + x, −1 + x}, then

T(−2 + x) = −2 + x = 1(−2 + x) + 0(−1 + x)

and

T(−1 + x) = −2 + 2x = 0(−2 + x) + 2(−1 + x).

Thus

To apply theorem 10.1.2 to diagonalize a matrix, it is important to know how many eigenvalues the matrix has and the relations between the corresponding eigenvectors.

10.1.6 Theorem: Let A be an n × n matrix.

(i) Let λ be an eigenvalue of A and W_λ = {X | AX = λX}. Then W_λ is a subspace, called the eigen subspace for the eigenvalue λ.

(ii) Eigenvectors corresponding to distinct eigenvalues are linearly independent.

(iii) The union of bases of the eigen subspaces for distinct eigenvalues forms a linearly independent subset.

(iv) Let λ_1, ..., λ_k be the distinct eigenvalues of A and W_{λ_1}, ..., W_{λ_k} the corresponding eigen subspaces. Then A is diagonalizable if and only if dim(W_{λ_1}) + ... + dim(W_{λ_k}) = n.

Proof

10.1.7 Corollary: Let A be an n × n matrix. If A has n distinct eigenvalues, then A is diagonalizable.

Proof

10.1.8 Example: Consider the matrix

Since

Thus, λ = 1, 2, 3 are the eigenvalues of A, and hence A is diagonalizable.

10.1.9 Note: If an n × n matrix A does not have all eigenvalues distinct, it need not imply that A is not diagonalizable. For example, in 10.1.5(ii), though A had only two distinct eigenvalues, it was diagonalizable, for we were able to find 3 linearly independent eigenvectors. When an n × n matrix fails to have n distinct eigenvalues, this can happen because either the characteristic polynomial has no roots in F (when F = R), or some of the roots of det(A − λI) are repeated. This motivates our next definitions:

10.1.10 Definition:

Let A be an n × n matrix and let λ ∈ F be an eigenvalue of A.

(i) We say a positive integer r ≤ n is the algebraic multiplicity of λ if λ is a root of det(A − tI) of order r.

(ii) We say λ has geometric multiplicity k, a positive integer, if the eigen subspace for λ has dimension k.

10.1.11 Example: (i) Let

Then

Thus, A has three distinct eigenvalues λ = 5, 0 and −2. Each has algebraic multiplicity one. In fact, each has geometric multiplicity one as well. Let us verify that. For λ = 5,

Thus

For λ = 0,

Thus

Finally, for λ = −2,

Thus

Thus, each eigenvalue also has geometric multiplicity 1.

(ii) Let

Since

A has only one eigenvalue λ = 1 and it has algebraic multiplicity 2. Further, (A − I)X = 0 iff X =

Thus

Hence, λ = 1 has geometric multiplicity also 1.

The notions of algebraic multiplicity, geometric multiplicity and diagonalizability are related as follows:

10.1.12 Theorem: Let A be an n × n matrix with distinct eigenvalues λ_1, ..., λ_k, having algebraic multiplicities r_1, ..., r_k and geometric multiplicities d_1, ..., d_k, respectively. Then d_i ≤ r_i for each i, and A is diagonalizable if and only if d_1 + d_2 + ... + d_k = n.

Proof
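A rough numerical sketch of how the two multiplicities can be compared in practice; the helper name and the sample matrix are our own, and tolerances are needed because eigenvalues are computed in floating point.

```python
import numpy as np

def multiplicities(A, tol=1e-6):
    """Return {eigenvalue: (algebraic multiplicity, geometric multiplicity)}."""
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    out = {}
    for lam in eigvals:
        key = complex(round(lam.real, 6), round(lam.imag, 6))
        if key in out:
            continue
        alg = int(np.sum(np.abs(eigvals - key) < tol))
        # geometric multiplicity = dim ker(A - lam I) = n - rank(A - lam I)
        geo = n - int(np.linalg.matrix_rank(A - key * np.eye(n), tol=tol))
        out[key] = (alg, geo)
    return out

# Hypothetical matrix in the spirit of example 10.1.11(ii): one eigenvalue
# with algebraic multiplicity 2 but geometric multiplicity 1, hence not
# diagonalizable (the geometric multiplicities do not add up to n).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(multiplicities(A))      # {(1+0j): (2, 1)}
```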

8.4. Orthogonal decomposition

8.4.1 Definition: Let S be any subset of V and let

S^⊥ = {v ∈ V | ⟨v, s⟩ = 0 for every s ∈ S}.

It is easy to check that S^⊥ is a subspace of V, called the orthogonal complement of S.

8.4.2 Theorem (Orthogonal complement): Let W be any subspace of V. Then every v ∈ V can be uniquely written as v = w + u for some w ∈ W and u ∈ W^⊥. We write this as V = W ⊕ W^⊥.

Proof

8.4.3 Example: (i) Let V = R^3 with the standard inner product. Let S = {(1, 0, 0), (0, 1, 0)}. Then S^⊥ = {(x, y, z) | (x, y, z)·(1, 0, 0) = 0 = (0, 1, 0)·(x, y, z)} = {(0, 0, z) | z ∈ R}. If we write W = [S] = {(x, y, 0) | x, y ∈ R}, then W^⊥ = {(0, 0, z) | z ∈ R}.

(ii) Let

, the space of all real polynomials of degree at most 3, with inner product :

Let Then This gives

, if and only if

and

i.e.

. This system of equations in variables a, b, c, leads to solutions :

where

are arbitrary. Hence

if and only if

Hence

(iii) Let W be the subspace spanned by the vectors (2, 5, −1) and (−2, 1, 1); these vectors are orthogonal. Then an orthonormal basis of W is given by

Then the projection of any vector

onto W is given by

For example, if u = ( 1, 2, 3 ), then

Thus
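A quick numerical check of this projection, using the orthogonal vectors (2, 5, −1), (−2, 1, 1) and u = (1, 2, 3):

```python
import numpy as np

w1 = np.array([ 2.0, 5.0, -1.0])
w2 = np.array([-2.0, 1.0,  1.0])
u1, u2 = w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)   # orthonormal basis of W

u = np.array([1.0, 2.0, 3.0])
proj = np.dot(u, u1) * u1 + np.dot(u, u2) * u2               # projection of u onto W
print(proj)                                                  # approximately [-0.4  2.   0.2]
# The residual u - proj is orthogonal to W:
print(np.isclose(np.dot(u - proj, w1), 0), np.isclose(np.dot(u - proj, w2), 0))
```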

Let V be an inner product space and let W be a subspace of V. Theorem 8.4.2 tells us that V = W ⊕ W^⊥, i.e., the space V decomposes as a sum of two orthogonal subspaces. Decompositions as given by theorem 8.4.2 are very useful in understanding linear transformations on inner product spaces (see proposition 9.3.2 and theorem 10.2.4). This idea can be extended as follows:

8.4.4 Definition: Let V be an inner product space and let W_1, W_2, ..., W_k be subspaces of V such that

In that case, we say V is an orthogonal sum of the subspaces W_1, ..., W_k, and we write it as V = W_1 ⊕ W_2 ⊕ ... ⊕ W_k.

The following is easy to prove:

8.4.5 Proposition: Let V = W_1 ⊕ W_2 ⊕ ... ⊕ W_k. Then for every v ∈ V there exist unique w_i ∈ W_i, 1 ≤ i ≤ k, such that v = w_1 + w_2 + ... + w_k.

Proof

As another application of projections, we have the following:

8.4.6 Theorem (Best unique approximation): Let W be any subspace of V and let v ∈ V. Then there exists a unique w_0 ∈ W such that ||v − w_0|| ≤ ||v − w|| for every w ∈ W.

Proof

8.4.7 Example: (i) Let V = R^3 and let W be the subspace of R^3 generated by {(2, −1, −2), (1, 0, 1)}. Then the best approximator for u = (2, 1, 3) from W is given by

The distance of u = (2, 1, 3) from W is given by
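A numerical check of example 8.4.7(i); since (2, −1, −2) and (1, 0, 1) happen to be orthogonal, the projection formula for an orthogonal basis applies directly.

```python
import numpy as np

w1 = np.array([2.0, -1.0, -2.0])
w2 = np.array([1.0,  0.0,  1.0])      # orthogonal to w1
u  = np.array([2.0,  1.0,  3.0])

# Best approximator = orthogonal projection of u onto W (theorem 8.4.6).
best = (np.dot(u, w1) / np.dot(w1, w1)) * w1 + (np.dot(u, w2) / np.dot(w2, w2)) * w2
print(best)                            # approximately [1.8333 0.3333 3.1667]
print(np.linalg.norm(u - best))        # distance of u from W, about 0.7071
```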

(ii) Consider

the space of all real polynomials of degree at most 3, with inner-product

Let W denote the subspace of V spanned by {1, t}. Since ⟨1, t⟩ = 0, {1, t} forms an orthogonal basis of W. Since

the orthogonal projection onto W, the best approximator, is given by

8.5. Least square approximations

To show that theorem 8.4.2 is not only of theoretical interest but has practical applications also, let us consider the following problem. An experimenter collects data y_1, y_2, ..., y_n by taking measurements at time points t_1, t_2, ..., t_n, respectively. Geometrically, this gives the points (t_1, y_1), (t_2, y_2), ..., (t_n, y_n).

Because of the distribution of these points or otherwise, the experimenter feels that there exists a linear relation between the time variable t and the observation variable y, say y = ct + d. The problem is to find c and d such that the line y = ct + d best fits the experimental data. Equivalently, one wants to minimize the error:

The error will be zero, i.e., the line will be an exact fit, if and only if y_i = c t_i + d for every i. We can reformulate the problem as follows: let

Then, given the matrix A and the vector Y, the line y = ct + d will be an exact fit (error zero) if and only if AX = Y has a solution. This leads to the problem of solving the system AX = Y. However, in general, the number of equations greatly exceeds the number of variables and it is unrealistic to expect a solution of AX = Y. So, one would like to find X̂ such that it gives the least error, i.e.,

||Y − AX̂|| ≤ ||Y − AX|| for every X. (8.4)

This leads to the following definition:

8.5.1 Definition: For a matrix A and a vector Y, a least square solution of AX = Y is a vector X̂ such that

||Y − AX̂|| ≤ ||Y − AX|| for every X. (8.5)

As an application of theorem 8.4.2, we have the following:

8.5.2 Theorem: The set of least square solutions of AX = Y is the same as the set of solutions of the system AᵀAX = AᵀY.

These are called the normal equations for the matrix A.

Proof
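A minimal sketch of the line-fitting problem above with made-up measurements: form A with rows (t_i, 1) and solve the normal equations AᵀAX = Aᵀy of theorem 8.5.2 (NumPy's built-in least squares routine is shown only as a cross-check).

```python
import numpy as np

# Hypothetical measurements (t_i, y_i) to which the line y = c t + d is fitted.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

A = np.column_stack([t, np.ones_like(t)])    # rows (t_i, 1); unknown X = (c, d)

# Normal equations A^T A X = A^T y.
c, d = np.linalg.solve(A.T @ A, A.T @ y)
print(c, d)

# Cross-check with NumPy's least squares solver.
print(np.linalg.lstsq(A, y, rcond=None)[0])
```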

8.5.3 Note: For a matrix A and a given vector Y, let X̂ be a least square solution of AX = Y. Then AX̂ is the projection of the vector Y onto the column space of A. A least square approximation need not be unique; however, if X̂_1 and X̂_2 are both least square solutions for AX = Y, then AX̂_1 = AX̂_2. As a consequence of theorem 8.5.2, we have the following:

8.5.4 Corollary: The system AX = Y has a unique least square solution if and only if AᵀA is invertible, and in that case the least square solution is given by X̂ = (AᵀA)⁻¹AᵀY.

Proof

The above corollary tells us that the least square approximation can be computed if AᵀA is invertible. A necessary and sufficient condition for the invertibility of AᵀA is given below.

8.5.5 Example: Consider the equation AX = Y where

It is easy to check that the system AX = Y is inconsistent. To find a least square approximation, we solve the normal equations. Here

Since

is invertible, the system has unique least square approximation given by the solution of

Using Gauss elimination, we have

8.5.6 Theorem: For any m × n matrix A, the matrix AᵀA is invertible if and only if rank(A) = n.

Proof

8.5.7 Example: Let

We want to find the least square approximation for AX = Y. We note that

which is invertible. It is easy to compute

Thus

Thus the least square approximation is

In the particular situation when the columns of A are linearly independent, it is possible to compute the least square approximation by another method.

8.5.8 Definition: A matrix A is said to have a QR decomposition if there exist matrices Q and R with the following properties: (i) Q is a matrix whose columns form an orthonormal basis for the column space of A. (ii) R is an upper triangular invertible matrix with positive diagonal entries. (iii) A = QR.

8.5.9 Theorem: Let A be a matrix and Y a vector. If A admits a QR decomposition A = QR, then the least square approximation X̂ for AX = Y is given by X̂ = R⁻¹QᵀY.

Proof

8.5.10 Note: The above theorem tells us that if A = QR, then to find the least square approximation we do not have to compute (AᵀA)⁻¹, as suggested by corollary 8.5.4; instead one can solve RX̂ = QᵀY, which is easier since R is upper triangular, and use the above theorem.
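A minimal sketch of this shortcut for a hypothetical overdetermined system: numpy.linalg.qr supplies Q and R, and the least square solution is obtained by back substitution from RX = QᵀY.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical overdetermined system with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Y = np.array([1.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)                  # reduced QR: Q is 3x2, R is 2x2 upper triangular
X_hat = solve_triangular(R, Q.T @ Y)    # back substitution for R X = Q^T Y
print(X_hat)
print(np.allclose(X_hat, np.linalg.lstsq(A, Y, rcond=None)[0]))   # True
```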

8.5.11 Example: Let A = QR, where

Then Q has orthonormal columns and R is invertible and upper triangular. The system AX = Y, for

has least square approximation given by

The existence and uniqueness of a QR decomposition of a matrix are given by the next theorem.

8.5.12 Theorem: Let A be a matrix with linearly independent columns. Then A admits a QR decomposition. Further, such a decomposition is unique.

Proof

8.5.13 Example: Let

It is easy to verify that the columns of A are linearly independent. Application of the Gram-Schmidt process to the column vectors of A gives an orthonormal basis for the column space.

Thus

where

are the column-vectors of A. In our case,

Thus,

We verify

8.5.14 Note: The method of least square approximation applies equally well to fitting a polynomial to given data.
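For instance, to fit a quadratic y = at² + bt + c to made-up data, the columns of A hold the powers of t and the normal equations are solved exactly as before (numpy.polyfit is shown only as a cross-check).

```python
import numpy as np

# Made-up data to which a quadratic y = a t^2 + b t + c is fitted.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 4.2,  1.1, 0.2, 0.9, 4.1, 9.3])

A = np.column_stack([t**2, t, np.ones_like(t)])   # columns hold the powers of t
coeffs = np.linalg.solve(A.T @ A, A.T @ y)        # normal equations, as before
print(coeffs)                                     # fitted (a, b, c)
print(np.polyfit(t, y, 2))                        # same coefficients via polyfit
```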
