

Applications of Linear Algebra

by Gordon C. Everstine

5 June 2010

Copyright © 1998–2010 by Gordon C. Everstine. All rights reserved.
This book was typeset with LaTeX 2ε (MiKTeX).

Preface

These lecture notes are intended to supplement a one-semester graduate-level engineering course at The George Washington University in algebraic methods appropriate to the solution of engineering computational problems, including systems of linear equations, linear vector spaces, matrices, least squares problems, Fourier series, and eigenvalue problems. In general, the mix of topics and level of presentation are aimed at upper-level undergraduates and first-year graduate students in mechanical, aerospace, and civil engineering.

Gordon Everstine
Gaithersburg, Maryland
June 2010

Contents

1 Systems of Linear Equations
  1.1 Definitions
  1.2 Nonsingular Systems of Equations
  1.3 Diagonal and Triangular Systems
  1.4 Gaussian Elimination
  1.5 Operation Counts
  1.6 Partial Pivoting
  1.7 LU Decomposition
  1.8 Matrix Interpretation of Back Substitution
  1.9 LDU Decomposition
  1.10 Determinants
  1.11 Multiple Right-Hand Sides and Matrix Inverses
  1.12 Tridiagonal Systems
  1.13 Iterative Methods

2 Vector Spaces
  2.1 Rectangular Systems of Equations
  2.2 Linear Independence
  2.3 Spanning a Subspace
  2.4 Row Space and Null Space
  2.5 Pseudoinverses
  2.6 Linear Transformations
  2.7 Orthogonal Subspaces
  2.8 Projections Onto Lines

3 Change of Basis
  3.1 Tensors
  3.2 Examples of Tensors
  3.3 Isotropic Tensors

4 Least Squares Problems
  4.1 Least Squares Fitting of Data
  4.2 The General Linear Least Squares Problem
  4.3 Gram-Schmidt Orthogonalization
  4.4 QR Factorization

5 Fourier Series
  5.1 Example
  5.2 Generalized Fourier Series
  5.3 Fourier Expansions Using a Polynomial Basis
  5.4 Similarity of Fourier Series With Least Squares Fitting

6 Eigenvalue Problems
  6.1 Example 1: Mechanical Vibrations
  6.2 Properties of the Eigenvalue Problem
  6.3 Example 2: Principal Axes of Stress
  6.4 Computing Eigenvalues by Power Iteration
  6.5 Inverse Iteration
  6.6 Iteration for Other Eigenvalues
  6.7 Similarity Transformations
  6.8 Positive Definite Matrices
  6.9 Application to Differential Equations
  6.10 Application to Structural Dynamics

Bibliography

Index

List of Figures

  1  Two Vectors.
  2  Some Vectors That Span the xy-Plane.
  3  90° Rotation.
  4  Reflection in 45° Line.
  5  Projection Onto Horizontal Axis.
  6  Rotation by Angle θ.
  7  Projection Onto Line.
  8  Element Coordinate Systems in the Finite Element Method.
  9  Basis Vectors in Polar Coordinate System.
  10 Change of Basis.
  11 Element Coordinate System for Pin-Jointed Rod.
  12 Example of Least Squares Fit of Data.
  13 The First Two Vectors in Gram-Schmidt.
  14 The Projection of One Vector onto Another.
  15 Convergence of Series in Example.
  16 A Function with a Jump Discontinuity.
  17 The First Two Vectors in Gram-Schmidt.
  18 2-DOF Mass-Spring System.
  19 Mode Shapes for 2-DOF Mass-Spring System. (a) Undeformed shape, (b) Mode 1 (masses in-phase), (c) Mode 2 (masses out-of-phase).
  20 3-DOF Mass-Spring System.
  21 Geometrical Interpretation of Sweeping.
  22 Elastic System Acted Upon by Forces.

1 Systems of Linear Equations

A system of n linear equations in n unknowns can be expressed in the form
\[
\left.
\begin{aligned}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 + \cdots + A_{1n}x_n &= b_1 \\
A_{21}x_1 + A_{22}x_2 + A_{23}x_3 + \cdots + A_{2n}x_n &= b_2 \\
&\ \ \vdots \\
A_{n1}x_1 + A_{n2}x_2 + A_{n3}x_3 + \cdots + A_{nn}x_n &= b_n
\end{aligned}
\right\}
\tag{1.1}
\]
The problem is to find numbers x1, x2, . . . , xn, given the coefficients Aij and the right-hand side b1, b2, . . . , bn. In matrix notation,
\[ \mathbf{A}\mathbf{x} = \mathbf{b}, \tag{1.2} \]
where A is the n × n matrix
\[ \mathbf{A} = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}, \tag{1.3} \]
b is the right-hand side vector
\[ \mathbf{b} = \begin{Bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{Bmatrix}, \tag{1.4} \]
and x is the solution vector
\[ \mathbf{x} = \begin{Bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{Bmatrix}. \tag{1.5} \]

1.1 Definitions

The identity matrix I is the square diagonal matrix with 1s on the diagonal:
\[ \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{1.6} \]
In index notation, the identity matrix is the Kronecker delta:
\[ I_{ij} = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{1.7} \]
The inverse A−1 of a square matrix A is the matrix such that
\[ \mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}. \tag{1.8} \]
The inverse is unique if it exists. For example, let B be the inverse, so that AB = BA = I. If C is another inverse, C = CI = C(AB) = (CA)B = IB = B.

A square matrix A is said to be orthogonal if
\[ \mathbf{A}\mathbf{A}^T = \mathbf{A}^T\mathbf{A} = \mathbf{I}. \tag{1.9} \]
That is, an orthogonal matrix is one whose inverse is equal to the transpose.

An upper triangular matrix has only zeros below the diagonal, e.g.,
\[ \mathbf{U} = \begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\ 0 & A_{22} & A_{23} & A_{24} & A_{25} \\ 0 & 0 & A_{33} & A_{34} & A_{35} \\ 0 & 0 & 0 & A_{44} & A_{45} \\ 0 & 0 & 0 & 0 & A_{55} \end{bmatrix}. \tag{1.10} \]
Similarly, a lower triangular matrix has only zeros above the diagonal, e.g.,
\[ \mathbf{L} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & 0 \\ A_{21} & A_{22} & 0 & 0 & 0 \\ A_{31} & A_{32} & A_{33} & 0 & 0 \\ A_{41} & A_{42} & A_{43} & A_{44} & 0 \\ A_{51} & A_{52} & A_{53} & A_{54} & A_{55} \end{bmatrix}. \tag{1.11} \]
A tridiagonal matrix has all its nonzeros on the main diagonal and the two adjacent diagonals, e.g.,
\[ \mathbf{A} = \begin{bmatrix} x & x & 0 & 0 & 0 & 0 \\ x & x & x & 0 & 0 & 0 \\ 0 & x & x & x & 0 & 0 \\ 0 & 0 & x & x & x & 0 \\ 0 & 0 & 0 & x & x & x \\ 0 & 0 & 0 & 0 & x & x \end{bmatrix}. \tag{1.12} \]

Consider two three-dimensional vectors u and v given by
\[ \mathbf{u} = u_1\mathbf{e}_1 + u_2\mathbf{e}_2 + u_3\mathbf{e}_3, \qquad \mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + v_3\mathbf{e}_3, \tag{1.13} \]
where e1, e2, and e3 are the mutually orthogonal unit basis vectors in the Cartesian coordinate directions, and u1, u2, u3, v1, v2, and v3 are the vector components. The dot product (or inner product) of u and v is
\[ \mathbf{u} \cdot \mathbf{v} = (u_1\mathbf{e}_1 + u_2\mathbf{e}_2 + u_3\mathbf{e}_3) \cdot (v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + v_3\mathbf{e}_3) = u_1 v_1 + u_2 v_2 + u_3 v_3 = \sum_{i=1}^{3} u_i v_i, \tag{1.14} \]
where
\[ \mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}. \tag{1.15} \]

[Figure 1: Two Vectors. (The triangle formed by a, b, and c = a − b, with θ the angle between a and b and legs a − b cos θ and b sin θ indicated.)]

In n dimensions,
\[ \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i, \tag{1.16} \]
where n is the dimension of each vector (the number of components). There are several notations for this product, all of which are equivalent:

  inner product notation:  (u, v)
  vector notation:         u · v
  index notation:          sum of u_i v_i for i = 1, . . . , n
  matrix notation:         u^T v or v^T u

The scalar product can also be written in matrix notation as
\[ \mathbf{u}^T\mathbf{v} = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{Bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{Bmatrix}. \tag{1.17} \]
For u a vector, the length of u is
\[ u = |\mathbf{u}| = \sqrt{\mathbf{u} \cdot \mathbf{u}} = \sqrt{\sum_{i=1}^{n} u_i u_i} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}. \tag{1.18} \]
A unit vector is a vector whose length is unity. Two vectors u and v are orthogonal if u · v = 0.

Consider the triangle defined by three vectors a, b, and c (Fig. 1), where
\[ \mathbf{c} = \mathbf{a} - \mathbf{b}, \tag{1.19} \]
in which case
\[ (a - b\cos\theta)^2 + (b\sin\theta)^2 = c^2, \tag{1.20} \]
where a, b, and c are the lengths of a, b, and c, respectively, and θ is the angle between a and b. Eq. 1.20 can be expanded to yield
\[ a^2 - 2ab\cos\theta + b^2\cos^2\theta + b^2\sin^2\theta = c^2 \tag{1.21} \]
or
\[ c^2 = a^2 + b^2 - 2ab\cos\theta, \tag{1.22} \]
which is the law of cosines.
We can further expand the law of cosines in terms of components to obtain (a1 − b1 )2 + (a2 − b2 )2 + (a3 − b3 )2 = a21 + a22 + a23 + b21 + b22 + b23 − 2ab cos θ (1.23) or a1 b1 + a2 b2 + a3 b3 = ab cos θ. (1.24) Thus, the dot product of two vectors can alternatively be expressed as a · b = |a||b| cos θ = ab cos θ, (1.25) where θ is the angle between the two vectors. 1.2 Nonsingular Systems of Equations If the linear system Ax = b has a unique solution x for every right-hand side b, the system is said to be nonsingular. Usually, one refers to the matrix rather than the system of equations as being nonsingular, since this property is determined by the coefficients Aij of the matrix and does not depend on the right-hand side b. If A is singular (i.e., not nonsingular), then, for some right-hand sides, the system Ax = b will be inconsistent and have no solutions, and, for other right-hand sides, the equations will be dependent and have an infinite number of solutions. For example, the system  2x1 + 3x2 = 5 (1.26) 4x1 + 6x2 = 1 is inconsistent and has no solution. The system 2x1 + 3x2 = 5 4x1 + 6x2 = 10  (1.27) is redundant (i.e., has dependent equations) and has an infinite number of solutions. The only possibilities for a square system of linear equations is that the system have no solution, one solution, or an infinite number of solutions. 1.3 Diagonal and Triangular Systems Several types of systems of equations are particularly easy to solve, including diagonal and triangular systems. For the diagonal matrix   A11   A22     A 33 A= (1.28) ,   . .   . Ann 4 the system Ax = b has the solution x1 = b1 b2 bn , x2 = , · · · , xn = , A11 A22 Ann (1.29) assuming that, for each i, Aii 6= 0. If Akk = 0 for some k, then xk does not exist if bk 6= 0, and xk is arbitrary if bk = 0. Consider the upper triangular system (Aij = 0 if i > j)  A11 x1 + A12 x2 + A13 x3 + ··· + A1n xn = b1     A22 x2 + A23 x3 + ··· + A2n xn = b2    A33 x3 + ··· + A3n xn = b3  (1.30) ..  .    An−1,n−1 xn−1 + An−1,n xn = bn−1    Ann xn = bn .  This system is solved by backsolving: xn = bn /Ann , (1.31) xn−1 = (bn−1 − An−1,n xn )/An−1,n−1 , (1.32) and so on until all components of x are found. The backsolving algorithm can be summarized as follows: 1. Input n, A, b 2. xn = bn /Ann 3. For k = n − 1, n − 2, . . . , 1: [k = row number] x k = bk For i = k + 1, k + 2, . . . , n: xk = xk − Aki xi xk = xk /Akk 4. Output x Since the only impediments to the successful completion of this algorithm are the divisions, it is apparent that an upper triangular system is nonsingular if, and only if, Aii 6= 0 for all i. Alternatively, if any Aii = 0, the system is singular. For example, consider the upper triangular system  x1 − 2x2 + x3 = 1  2x2 + 4x3 = 10 (1.33)  − 2x3 = 8. From the backsolving algorithm, the solution is found as x3 = −4, x2 = [10 − 4(−4)]/2 = 13, x1 = 1 + 2(13) − (−4) = 31. 5 (1.34) A variation on the back-solving algorithm listed above is obtained if we return the solution in the right-hand side array b. That is, array x could be eliminated by overwriting b and storing the solution in b. With this variation, the back-solving algorithm becomes 1. Input n, A, b 2. bn = bn /Ann 3. For k = n − 1, n − 2, . . . , 1: [k = row number] For i = k + 1, k + 2, . . . , n: bk = bk − Aki bi bk = bk /Akk 4. Output b On output, the solution is returned in b, and the original b is destroyed. A lower triangular system Ax = b is one for which Aij = 0 if i < j. 
This system can be solved directly by forward solving in a manner similar to backsolving for upper triangular systems. 1.4 Gaussian Elimination The approach widely used to solve general systems of linear equations is Gaussian elimination, for which the procedure is to transform the general system into upper triangular form, and then backsolve. The transformation is based on the following elementary operations, which have no effect on the solution of the system: 1. An equation can be multiplied by a nonzero constant. 2. Two equations can be interchanged. 3. Two equations can be added together and either equation replaced by the sum. 4. An equation can be multiplied by a nonzero constant, added to another equation, and the result used to replace either of the first two equations. (This operation is a combination of the first and third operations.) We illustrate Gaussian elimination with an example. Consider the system  −x1 + 2x2 + 3x3 + x4 = 1    −3x1 + 8x2 + 8x3 + x4 = 2 x1 + 2x2 − 6x3 + 4x4 = −1    2x1 − 4x2 − 5x3 − x4 = 0. Step 1: Eliminate all coefficients of x1 except for that in the first equation:  −x1 + 2x2 + 3x3 + x4 = 1    2x2 − x3 − 2x4 = −1 4x2 − 3x3 + 5x4 = 0    x3 + x4 = 2. 6 (1.35) (1.36) Step 2: Eliminate all coefficients of x2 except for those in the first two equations:  −x1 + 2x2 + 3x3 + x4 = 1    2x2 − x3 − 2x4 = −1 − x3 + 9x4 = 2    x3 + x4 = 2. Step 3: Eliminate all coefficients of x3 except for those in the first three equations:  −x1 + 2x2 + 3x3 + x4 = 1    2x2 − x3 − 2x4 = −1 − x3 + 9x4 = 2    10x4 = 4. (1.37) (1.38) Ths system is now in upper triangular form, and we could complete the solution by backsolving the upper triangular system. Notice in the above procedure that we work with both sides of the equations simultaneously. Notice also that we could save some effort in the above procedure by not writing the unknowns at each step, and instead use matrix notation. We define a matrix consisting of the left-hand side coefficient matrix augmented by the right-hand side, and apply to this matrix the same elementary operations used in the three steps above:     −1 2 3 1 1 1 −1 2 3 1  −3   2  8 8 1   −→  0 2 −1 −2 −1  (1.39)  1  0 4 −3 0  2 −6 4 −1  5 2 −4 −5 −1 0 0 1 1 0 2     1 1 −1 2 3 1 −1 2 3 1    0 2 −1 −2 −1   −→  0 2 −1 −2 −1  . (1.40) −→   0 0 −1  0 0 −1 2  2  9 9 2 4 0 0 1 1 0 0 0 10 Note that this reduction can be performed “in-place” (i.e., by writing over the matrix and destroying both the original matrix and the right-hand side). Multiple right-hand sides can be handled simultaneously. For example, if we want the solution of Ax1 = b1 and Ax2 = b2 , we augment the coefficient matrix by both right-hand sides:   1 2 −1 2 3 1  −3 2 3  8 8 1 .   1 3  2 −6 4 −1 0 −1 2 −4 −5 −1 The Gaussian elimination algorithm (reduction to upper triangular form) can now be summarized as follows: 1. Input n, A, b 2. For k = 1, 2, . . . , n − 1: [k = pivot row number] For i = k + 1, k + 2, . . . , n: [i = row number below k] 7 m = −Aik /Akk [m = row multiplier] For j = k + 1, k + 2, . . . , n: [j = column number to right of k] Aij = Aij + mAkj bi = bi + mbk [rhs] 3. Output A, b The output of this procedure is an upper triangular system which can be solved by backsolving. Note that each multiplier m could be saved in Aik (which is in the lower triangle and not used in the above algorithm) so that additional right-hand sides could be solved later without having to reduce the original system again. 
Thus, the Gaussian elimination algorithm is modified as follows: 1. Input n, A, b 2. For k = 1, 2, . . . , n − 1: [k = pivot row number] For i = k + 1, k + 2, . . . , n: [i = row number below k] Aik = −Aik /Akk [multiplier stored in Aik ] For j = k + 1, k + 2, . . . , n: [j = column number to right of k] Aij = Aij + Aik Akj bi = bi + Aik bk [rhs] 3. Output A, b Note further that, if the multipliers are saved in Aik , the loops involving the coefficient matrix A can (and should) be separated from those involving the right-hand side b: 1. Input n, A, b 2. Elimination For k = 1, 2, . . . , n − 1: [k = pivot row number] For i = k + 1, k + 2, . . . , n: [i = row number below k] Aik = −Aik /Akk [multiplier stored in Aik ] For j = k + 1, k + 2, . . . , n: [j = column number to right of k] Aij = Aij + Aik Akj 3. RHS modification For k = 1, 2, . . . , n − 1: [k = pivot row number] For i = k + 1, k + 2, . . . , n: [i = row number below k] bi = bi + Aik bk [rhs] 4. Output A, b If the coefficient matrix is banded, the loops above can be shortened to avoid unnecessary operations on zeros. A band matrix, which occurs frequently in applications, is one for which the nonzeros are clustered about the main diagonal. We define the matrix semi-bandwidth w as the maximum “distance” of any nonzero off-diagonal term from the diagonal (including 8 the diagonal). Thus, for a band matrix, the upper limit n in the loops associated with pivot row k above can be replaced by kmax = min(k + w − 1, n). (1.41) The modified algorithm thus becomes 1. Input n, A, b, w [w = matrix semi-bandwidth] 2. Elimination For k = 1, 2, . . . , n − 1: [k = pivot row number] kmax = min(k + w − 1, n) [replaces n in the loops] For i = k + 1, k + 2, . . . , kmax : [i = row number below k] Aik = −Aik /Akk [multiplier stored in Aik ] For j = k + 1, k + 2, . . . , kmax : [j = column number to right of k] Aij = Aij + Aik Akj 3. RHS modification For k = 1, 2, . . . , n − 1: [k = pivot row number] kmax = min(k + w − 1, n) [replaces n in the loop] For i = k + 1, k + 2, . . . , kmax : [i = row number below k] bi = bi + Aik bk [rhs] 4. Output A, b A similar modification is made to the back-solve algorithm. 1.5 Operation Counts Except for small systems of equations (small n), we can estimate the number of operations required to execute an algorithm by looking only at the operations in the innermost loop. We define an operation as the combination of a multiply and add, since they generally are executed together. Gaussian elimination involves a triply-nested loop. For fixed row number k, the inner loops on i and j are executed n − k times each. Hence, Number of multiply-adds ≈ ≈ n−1 X (n − k) = k=1 n−1 Z 2 1 X m=n−1 2 m = n−1 X m2 m=1 m2 dm = (n − 1)3 /3 ≈ n3 /3, 0 (1.42) where the approximations all depend on n’s being large. Thus, for a fully-populated matrix, Gaussian elimination requires about n3 /3 operations. For a band matrix with a semi-bandwidth w ≪ n, w2 operations are performed for each of the n rows (to first order). Thus, for a general band matrix, Gaussian elimination requires about nw2 operations. 9 For backsolving, we consider Step 3 of the backsolve algorithm. In Row k (counting from the bottom), there are approximately k multiply-add operations. Hence, Number of multiply-adds ≈ n X k=1 k = n(n + 1)/2 ≈ n2 /2. (1.43) Thus, backsolving requires about n2 /2 operations for each right-hand side. Notice that, for large n, the elimination step has many more operations than does the backsolve. 
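For reference, the elimination and back-substitution algorithms above translate almost line for line into code. The following is a minimal sketch in Python with NumPy (the function names and layout are mine, not the author's); it stores the multipliers in the lower triangle as described in Section 1.4, omits pivoting (taken up in the next section), and is checked against the 4 × 4 example of Eq. 1.35.

```python
import numpy as np

def lu_in_place(A):
    """Gaussian elimination without pivoting; the multipliers overwrite
    the lower triangle, and U overwrites the upper triangle (Section 1.4)."""
    n = A.shape[0]
    for k in range(n - 1):                       # k = pivot row
        for i in range(k + 1, n):                # rows below the pivot
            A[i, k] = -A[i, k] / A[k, k]         # multiplier stored in A[i, k]
            A[i, k+1:] += A[i, k] * A[k, k+1:]   # update the rest of row i
    return A

def reduce_rhs(A, b):
    """Apply the stored multipliers to the right-hand side."""
    n = A.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            b[i] += A[i, k] * b[k]
    return b

def back_substitute(A, b):
    """Backsolve the upper triangular system (Section 1.3)."""
    n = A.shape[0]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - A[k, k+1:] @ x[k+1:]) / A[k, k]
    return x

# The 4 x 4 example of Eq. 1.35
A = np.array([[-1.,  2.,  3.,  1.],
              [-3.,  8.,  8.,  1.],
              [ 1.,  2., -6.,  4.],
              [ 2., -4., -5., -1.]])
b = np.array([1., 2., -1., 0.])
lu_in_place(A)
reduce_rhs(A, b)
x = back_substitute(A, b)
print(x)   # satisfies the original system of Eq. 1.35
```

For a fully populated matrix, the triple loop in lu_in_place performs the roughly n³/3 multiply-add operations counted above, and back_substitute the roughly n²/2 per right-hand side.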
1.6 Partial Pivoting For Gaussian elimination to work, the (current) diagonal entry Aii must be nonzero at each intermediate step. For example, consider the same example used earlier, but with the equations rearranged:     −1 2 3 1 1 1 −1 2 3 1  2 −4 −5 −1  1 1 0  2    −→  0 0  (1.44)  −3  0 2 −1 −2 −1  . 2  8 8 1 1 2 −6 4 −1 0 4 −3 5 0 In this case, the next step would fail, since A22 = 0. The solution to this problem is to interchange two equations to avoid the zero divisor. If the system is nonsingular, this interchange is always possible. However, complications arise in Gaussian elimination not only when Aii = 0 at some step but also sometimes when Aii is small. For example, consider the system  (1.000 × 10−5 )x1 + 1.000x2 = 1.000 (1.45) 1.000x1 + 1.000x2 = 2.000, the solution of which is (x1 , x2 ) = (1.00001000 . . . , 0.99998999 . . .). Assume that we perform the Gaussian elimination on a computer which has four digits of precision. After the one-step elimination, we obtain  (1.000 × 10−5 )x1 + 1.000x2 = 1.000 (1.46) − 1.000 × 105 x2 = −1.000 × 105 , whose solution is x2 = 1.000, x1 = 0.000. If we now interchange the original equations, the system is 1.000x1 + 1.000x2 = 2.000 (1.000 × 10−5 )x1 + 1.000x2 = 1.000,  (1.47) which, after elimination, yields 1.000x1 + 1.000x2 = 2.000 1.000x2 = 1.000, 10  (1.48) whose solution is x2 = 1.000, x1 = 1.000, which is correct to the precision available. We conclude from this example that it is better to interchange equations at Step i so that |Aii | is as large as possible. The interchanging of equations is called partial pivoting. Thus an alternative version of the Gaussian elimination algorithm would be that given previously except that, at each step (Step i), the ith equation is interchanged with one below so that |Aii | is as large as possible. In actual practice, the “interchange” can alternatively be performed by interchanging the equation subscripts rather than the equations themselves. 1.7 LU Decomposition Define Mik as the identity matrix with the addition of the number mik in Row i, Column k:   1   1     1 (1.49) Mik =  .   .   mik . . 1 For example, for n = 3, M31 is M31  1  0 = m31 0 1 0  0 0 . 1 (1.50) We now consider the effect of multiplying a matrix A by Mik . For example, for n = 3,    A11 A12 A13 1 0 0 1 0   A21 A22 A23  (1.51) M31 A =  0 A31 A32 A33 m31 0 1   A11 A12 A13 . A21 A22 A23 = (1.52) m31 A11 + A31 m31 A12 + A32 m31 A13 + A33 Thus, when A is multiplied by M31 , we obtain a new matrix which has Row 3 replaced by the sum of Row 3 and m31 ×Row 1. In general, multiplication by Mik has the effect of replacing Row i by Row i+mik ×Row k. Thus, each step of the Gaussian elimination process is equivalent to multiplying A by some Mik . For example, for n = 4 (a 4 × 4 matrix), the entire Gaussian elimination can be written as M43 M42 M32 M41 M31 M21 A = U, where (1.53) Aik (1.54) Akk are the multipliers in Gaussian elimination, and U is the upper triangular matrix obtained in the Gaussian elimination. Note that the multipliers mik in Eq. 1.54 are computed using the current matrix entries obtained during Gaussian elimination. mik = − 11 If each M−1 ik exists, then, for n = 4, −1 −1 −1 −1 −1 A = (M43 M42 M32 M41 M31 M21 )−1 U = M−1 21 M31 M41 M32 M42 M43 U, (1.55) since the inverse of a matrix product is equal to the product of the matrix inverses in reverse order. 
We observe, by example, that M−1 ik exists and is obtained from Mik simply by changing the sign of mik (the ik element), since, for n = 4 and M42 ,      1 1 1 0 0 0     0 1 0 0  1 1  = I. (1.56)   =    0  0 1 0  1 1 0 m42 − m42 0 1 −m42 1 m42 1 From Eq. 1.55, for n = 3, we obtain −1 −1 A = M−1 21 M31 M32 U, (1.57) where −1 −1 M−1 21 M31 M32  1  = −m21  1 =  −m21 1 1 1  In general, 1 1 1 =  −m21 −m31 −m32 −1 −1 −1 M−1 21 M31 · · · Mn1 · · · Mn,n−1 and, for any nonsingular matrix A,    1 1 −m31 1  1     =   1 −m31    1 −m32 1 1 1 −m32  1  (1.58) (1.60) 1 −m21 1 −m31 −m32 1 .. .. .. . . . −mn1 −mn2 · · ·  .. . −mn,n−1 A = LU, 12  (1.59) . where L is the lower triangular matrix  1  −m21 1   −m31 −m32 1 L=  .. .. ..  . . . −mn1 −mn2 · · ·  1    ,   (1.61) (1.62)  .. . −mn,n−1 1    ,   (1.63) and U is the upper triangular matrix obtained in Gaussian elimination. A lower triangular matrix with 1s on the diagonal is referred to as a lower unit triangular matrix. The decomposition A = LU is referred to as the “LU decomposition” or “LU factorization” of A. Note that the LU factors are obtained directly from the Gaussian elimination algorithm, since U is the upper triangular result, and L consists of the negatives of the row multipliers. Thus, the LU decomposition is merely a matrix interpretation of Gaussian elimination. It is also useful to show a common storage scheme for the LU decomposition. If A is factored in place (i.e., the storage locations for A are over-written during the Gaussian elimination process), both L and U can be stored in the same n2 storage locations used for A. In particular,     A11 A12 A13 · · · A1n U11 U12 U13 · · · U1n  A21 A22 A23 · · · A2n   L21 U22 U23 · · · U2n        A31 A32 A33 · · · A3n     is replaced by  L31 L32 U33 · · · U3n  ,  ..  .. ..  .. .. .. .. ..  .. ..  .  . . . .  . . . . .  Ln1 Ln2 Ln3 · · · Unn An1 An2 An3 · · · Ann where the Lij entries are the negatives of the multipliers obtained during Gaussian elimination. Since L is a lower unit triangular matrix (with 1s on the diagonal), it is not necessary to store the 1s. Each Lij entry can be stored as it is computed. We have seen how a lower unit triangular matrix can be formed from the product of M matrices. It can also be shown that the inverse of a matrix product is the product of the individual inverses in reverse order. We have also seen that the inverse of an M matrix is another M matrix: the one with the off-diagonal term negated. Thus, one might infer that the inverse of a lower unit triangular matrix is equal to the same matrix with the terms below the diagonal negated. However, such an inference would be incorrect, since, although the inverse of a lower unit triangular matrix is a lower triangular matrix equal to the product of M matrices, the M matrices are in the wrong order. Thus, the inverse of a lower unit triangular matrix is not equal to the same matrix with the terms below the diagonal negated, as the following simple example illustrates. Consider the matrix   1 0 0 A =  5 1 0 . (1.64) 1 8 1 We note that A−1 since  1 0 0 1 0 , 6=  −5 −1 −8 1      1 0 0 1 0 0 1 0 0  5 1 0   −5 0 1 0 = 1 0 = 6 I. −40 0 1 −1 −8 1 1 8 1  13 (1.65) (1.66) To see why we cannot compute A−1 by simply negating the terms of A below the diagonal, consider the three matrices       1 0 0 1 0 0 1 0 0 (1.67) B =  5 1 0 , C =  0 1 0 , D =  0 1 0 . 
0 8 1 1 0 1 0 0 1 Is A = BCD or A = DCB  1 0 BCD =  5 1 0 0 and or both?  1 0 0  0 1 1 Notice that     1 0 0 1 0 0 0 0 1 0   0 1 0  =  5 1 0  = A, 1 8 1 0 8 1 0 1      1 0 0 1 0 0 1 0 0 1 0 0 6 A. DCB =  0 1 0   0 1 0   5 1 0  =  5 1 0  = 41 8 1 0 0 1 1 0 1 0 8 1  (1.68) (1.69) Thus, we conclude that order matters when M matrices are multiplied, and BCD 6= DCB. (1.70) That is, the nice behavior which occurs when multiplying M matrices occurs only if the matrices are multiplied in the right order: the same order as that which the Gaussian elimination multipliers are computed. To complete this discussion, we compute A−1 as       1 0 0 1 0 0 1 0 0 1 0 0 1 0 . 1 0   0 1 0   −5 1 0  =  −5 A−1 = D−1 C−1 B−1 =  0 39 −8 1 0 0 1 −1 0 1 0 −8 1 (1.71) 1.8 Matrix Interpretation of Back Substitution Consider the linear system Ax = b, where A has been factored into A = LU. Then LUx = b. If we now define Ux = y, Eq. 1.72 is equivalent to the pair of equations  Ly = b, Ux = y, (1.72) (1.73) which can be solved in two steps, first to compute y and then to compute x. The combination of Gaussian elimination and backsolve can therefore be interpreted as the sequence of two matrix solutions: Eq. 1.73a is the modification of the original right-hand side b by the multipliers to yield y. Eq. 1.73b is the backsolve, since the backsolve operation involves the coefficient matrix U to yield the solution x. The two steps in Eq. 1.73 are often referred to as forward substitution and back substitution, respectively. The combination is sometimes 14 called the forward-backward substitution (FBS). Since the forward and back substitutions each require n2 /2 operations for a fully populated matrix, FBS requires n2 operations. A good example of why equations are solved with LU factorization followed by FBS is provided by transient structural dynamics. Transient response analysis of structures is concerned with computing the forced response due to general time-dependent loads (e.g., blast or shock). Linear finite element modeling results in the matrix differential equation Mü + Bu̇ + Ku = F(t), (1.74) where u(t) is the vector of unknown displacements, t is time, M is the system mass matrix, B is the viscous damping matrix, K is the system stiffness matrix, F is the time-dependent vector of applied forces, and dots denote differentiation with respect to the time t. M, B, and K are constant (independent of time). The problem is to determine u(t), given the initial displacement u0 and initial velocity u̇0 . There are several numerical approaches for solving this system of equations, one of which integrates the equations directly in the time domain. Another approach, the modal superposition approach, will be discussed as an application of eigenvalue problems in Section 6.10 (p. 88). An example of a direct integrator is the Newmark-beta finite difference method (in time), which results in the recursive relation   1 1 1 1 M + B + K u = (Fn+1 + Fn + Fn−1 ) n+1 ∆t2 2∆t 3 3     (1.75) 1 1 1 1 2 M − K un + − 2 M + B − K un−1 , + ∆t2 3 ∆t 2∆t 3 where ∆t is the integration time step, and un is the solution at the nth step. If the current time solution is un , this matrix system is a system of linear algebraic equations whose solution un+1 is the displacement solution at the next time step. The right-hand side depends on the current solution un and the immediately preceding solution un−1 . 
Since the original differential equation is second-order, it makes sense that two consecutive time solutions are needed to move forward, since a minimum of three consecutive time values is needed to estimate a second derivative using finite differences. Newmark-beta uses the initial conditions in a special start-up procedure to obtain u0 and u−1 to get the process started. There are various issues with integrators (including stability, time step selection, and the start-up procedure), but our interest here is in the computational effort. Notice that, in Eq. 1.75, the coefficient matrices in parentheses that involve the time-independent matrices M, B, and K do not change during the integration unless the time step ∆t changes. Thus, Eq. 1.75 is an example of a linear equation system of the form Ax = b for which many solutions are needed (one for each right-hand side b) for the same coefficient matrix A, but the right-hand sides are not known in advance. Thus, the solution strategy is to factor the left-hand side coefficient matrix once into LU, save the factors, form the new rhs at each step, and then complete the solution with FBS. The computational effort is therefore one matrix decomposition (LU) for each unique ∆t followed by two matrix/vector muliplications (to form the rhs) and one FBS at each step. Since LU costs considerably more than multiplication or FBS, many time steps can be computed for the same effort as one LU. As a result, one should not change the time step size ∆t unnecessarily. 15 1.9 LDU Decomposition Recall that any nonsingular square matrix A can be factored into A = LU, where L is a lower unit triangular matrix, and U is an upper triangular matrix. The diagonal entries in U can be factored out into a separate diagonal matrix D to yield A = LDU, (1.76) where D is a nonsingular diagonal matrix, and U is an upper unit triangular matrix (not the same U as in LU). To verify the correctness of this variation of the LU decomposition, let LU = LDÛ. (1.77) For a 3 × 3 matrix,      U11 U12 U13 1 Û12 Û13 D11 0 0 U =  0 U22 U23  =  0 D22 0   0 1 Û23  = DÛ, 0 0 D33 0 0 U33 0 0 1 (1.78) where Dii = Uii (no sum on i), and Ûij = Uij /Dii (no sum on i). That is,      1 U12 /U11 U13 /U11 U11 0 0 U11 U12 U13  0 U22 U23  =  0 U22 0   0 1 U23 /U22  . 0 0 1 0 0 U33 0 0 U33 (1.79) For example,   1 0 0 2 1 1 1 0 A =  4 −6 0  =  2 −2 7 2 −1 −1 1    2 0 0 1 0 0 1 0   0 −8 0   = 2 0 0 1 −1 −1 1  Note that, if A is symmetric (AT = A),  2 1 1   0 −8 −2  0 0 1  1 1/2 1/2 0 1 1/4  = LDÛ. 0 0 1  (1.80) (1.81) A = LDU = (LDU)T = UT DLT , (1.82) U = LT . (1.83) or Thus, nonsingular symmetric matrices can be factored into A = LDLT . 16 (1.84) 1.10 Determinants Let A be an n × n square matrix. For n = 2, we define the determinant of A as   A11 A12 A11 A12 det = = A11 A22 − A12 A21 . A21 A22 A21 A22 (1.85) For example, 1 2 3 4 = 1 · 4 − 2 · 3 = −2, (1.86) where the number of multiply-add operations is 2. For n = 3, we can expand by cofactors to obtain, for example, 1 2 3 4 5 6 7 8 9 =1 4 6 4 5 5 6 +3 −2 7 9 7 8 8 9 = 0, (1.87) where the number of operations is approximately 3 · 2 = 6. In general, if we expand a determinant using cofactors, the number of operations to expand an n × n determinant is n times the number of operations to expand an (n − 1) × (n − 1) determinant. Thus, using the cofactor approach, the evaluation of the determinant of an n × n matrix requires about n! 
operations (to first order), which is prohibitively expensive for all but the smallest of matrices. Several properties of determinants are of interest and stated here without proof: 1. A cofactor expansion can be performed along any matrix row or column. 2. det AT = det A 3. det(AB) = (det A)(det B) 4. det A−1 = (det A)−1 (a special case of Property 3) 5. The determinant of a triangular matrix is the product of the diagonals. 6. Interchanging two rows or two columns of a matrix changes the sign of the determinant. 7. Multiplying any single row or column by a constant multiplies the determinant by that constant. Consequently, for an n × n matrix A and scalar c, det(cA) = cn det(A). 8. A matrix with a zero row or column has a zero determinant. 9. A matrix with two identical rows or two identical columns has a zero determinant. 10. If a row (or column) of a matrix is a constant multiple of another row (or column), the determinant is zero. 11. Adding a constant multiple of a row (or column) to another row (or column) leaves the determinant unchanged. 17 To illustrate the first property using the matrix of Eq. 1.87, we could also write 1 2 3 4 5 6 7 8 9 = −2 1 3 1 3 4 6 −8 +5 4 6 7 9 7 9 = 0. (1.88) This property is sometimes convenient for matrices with many zeros, e.g., 1 0 3 0 5 0 7 0 9 =5 1 3 7 9 = −60. (1.89) Now consider the LU factorization A = LU: det A = (det L)(det U) = (1 · 1 · 1 · · · 1)(U11 U22 · · · Unn ) = U11 U22 · · · Unn . (1.90) This result implies that a nonsingular matrix has a nonzero determinant, since, if Gaussian elimination results in a U matrix with only nonzeros on the diagonal, U is nonsingular, which implies that A is nonsingular, and det A 6= 0. On the other hand, if U has one or more zeros on the diagonal, U and A are both singular, and det A = 0. Thus, a matrix is nonsingular if, and only if, its determinant is nonzero. Eq. 1.90 also shows that, given U, the determinant can be computed with an additional n operations. Given A, calculating det A requires about n3 /3 operations, since computing the LU factorization by Gaussian elimination requires about n3 /3 operations, a very small number compared to the n! required by a cofactor expansion, as seen in the table below: n 5 10 25 60 n3 /3 42 333 5208 7.2×104 n! 120 3.6×106 1.6×1025 8.3×1081 Determinants are rarely of interest in engineering computations or numerical mathematics, but, if needed, can be economically computed using the LU factorization. 1.11 Multiple Right-Hand Sides and Matrix Inverses Consider the matrix system Ax = b, where A is an n × n matrix, and both x and b are n×m matrices. That is, both the right-hand side b and the solution x have m columns, each corresponding to an independent right-hand side. Each column of the solution x corresponds to the same column of the right-hand side b, as can be seen if the system is written in block form as A[x1 x2 x3 ] = [b1 b2 b3 ], (1.91) or [Ax1 Ax2 Ax3 ] = [b1 b2 b3 ]. (1.92) Thus, a convenient and economical approach to solve a system with multiple right-hand sides is to factor A into A = LU (an n3 /3 operation), and then apply FBS (an n2 operation) to 18 each right-hand side. Unless the number m of right-hand sides is large, additional right-hand sides add little to the overall computational cost of solving the system. Given a square matrix A, the matrix inverse A−1 , if it exists, is the matrix satisfying   1 0 0 ··· 0  0 1 0 ··· 0      (1.93) AA−1 = I =  0 0 1 · · · 0  .  .. .. .. . . ..   . . . . . 
 0 0 0 ··· 1 The calculation of A−1 is equivalent to solving a set of equations with coefficient matrix A and n right-hand sides, each one column of the identity matrix. The level of effort required is one LU factorization and n FBSs; that is, 1 4 Number of operations ≈ n3 + n(n2 ) = n3 . 3 3 (1.94) Thus, computing an inverse is only four times more expensive than solving a system with one right-hand side. In fact, if one exploits the special form of the right-hand side (the identity matrix, which has only one nonzero in each column), it turns out that the inverse can be computed in n3 operations. 1.12 Tridiagonal Systems Tridiagonal systems arise in a variety of engineering applications, including the finite difference solution of differential equations (e.g., the Crank-Nicolson finite difference method for solving parabolic partial differential equations) and cubic spline interpolation. Tridiagonal systems are particularly easy (and fast) to solve using Gaussian elimination. It is convenient to solve such systems using the following notation:  d 1 x1 + u1 x2 = b1     l 2 x1 + d 2 x2 + u2 x 3 = b2   l 3 x2 + d 3 x 3 + u3 x 4 = b3 (1.95)  ..   .   ln xn−1 + dn xn = bn ,  where di , ui , and li are, respectively, the diagonal, upper, and lower matrix entries in Row i. All coefficients can now be stored in three one-dimensional arrays, D(·), U(·), and L(·), instead of a full two-dimensional array A(I,J). The solution algorithm (reduction to upper triangular form by Gaussian elimination followed by backsolving) can now be summarized as follows: 1. For k = 1, 2, . . . , n − 1: [k = pivot row] m = −lk+1 /dk [m = multiplier needed to annihilate term below] dk+1 = dk+1 + muk [new diagonal entry in next row] bk+1 = bk+1 + mbk [new rhs in next row] 19 2. xn = bn /dn [start of backsolve] 3. For k = n − 1, n − 2, . . . , 1: xk = (bk − uk xk+1 )/dk [backsolve loop] To first order, this algorithm requires 5n operations. 1.13 Iterative Methods Gaussian elimination is an example of a direct method, for which the number of operations required to effect a solution is predictable and fixed. Some systems of equations have characteristics that can be exploited by iterative methods, where the number of operations is not predictable in advance. For example, systems with large diagonal entries will converge rapidly with iterative methods. Consider the linear system Ax = b. In Jacobi’s method, one starts by assuming a solution (0) x , say, and then computing a new approximation to the solution using the equations    (1) (0) (0) (0)   x /A11 = b − A x − A x − · · · − A x 1 12 2 13 3 1n n 1        x(1) = b2 − A21 x(0) − A23 x(0) − · · · − A2n x(0) /A22 n 2 1 3 (1.96) .  .  .      (0) (0) (0)   x(1) n = bn − An1 x1 − An2 x2 − · · · − An,n−1 xn−1 /Ann . In general,   (k) (k−1) (k−1) (k−1) (k−1) xi = bi − Ai1 x1 − Ai2 x2 − · · · − Ai,i−1 xi−1 − Ai,i+1 xi+1 − · · · − Ain xn(k−1) /Aii , i = 1, 2, . . . , n. (1.97) or (k) xi = bi − X j6=i (k−1) Aij xj ! /Aii , i = 1, 2, . . . , n. (1.98) In these equations, superscripts refer to the iteration number. If the vector x does not change much from one iteration to the next, the process has converged, and we have a solution. Jacobi’s method is also called iteration by total steps and the method of simultaneous displacements, since the order in which the equations are processed is irrelevant, and the updates could in principle be done simultaneously. Thus, Jacobi’s method is nicely suited to parallel computation. 
Notice that, in this algorithm, two consecutive approximations to x are needed at one time so that convergence can be checked. That is, it is not possible to update the vector x in place. The Jacobi algorithm can be summarized as follows: 1. Input n, A, b, Nmax , x [x = initial guess] 2. k=0 20 3. k=k+1 [k = iteration number] 4. For i = 1, 2, . . . , n: yi = (bi − Ai1 x1 − Ai2 x2 − · · · − Ai,i−1 xi−1 − Ai,i+1 xi+1 − · · · − Ain xn ) /Aii 5. If converged (compare y with x), output y and exit; otherwise . . . 6. x = y [save latest iterate in x] 7. If k < Nmax , go to Step 3; otherwise . . . 8. No convergence; error termination. In the above algorithm, the old iterates at each step are called x, and the new ones are called y. For example, consider the system Ax = b, where     2 1 0  1    0 . (1.99) A= 1 4 1 , b=   0 0 1 3 This system can be written in the form   x1 = (1 − x2 )/2 x2 = (−x1 − x3 )/4  x3 = −x2 /3. As a starting vector, we use x(0) We summarize the iterations in a table: k 0 1 2 3 4 5 6 7 8 9 10 11 12 x1 0 0.5000 0.5000 0.5625 0.5625 0.5755 0.5755 0.5782 0.5782 0.5788 0.5788 0.5789 0.5789    0  0 . =   0 x2 0 0 -0.1250 -0.1250 -0.1510 -0.1510 -0.1564 -0.1564 -0.1576 -0.1576 -0.1578 -0.1578 -0.1579 21 (1.100) (1.101) x3 0 0 0 0.0417 0.0417 0.0503 0.0503 0.0521 0.0521 0.0525 0.0525 0.0526 0.0526 For the precision displayed, this iteration has converged. (k) (k) (k) Notice that, in Jacobi’s method, when x3 is computed, we already know x2 and x1 . (k−1) (k−1) Why not use the new values rather than x2 and x1 ? The iterative algorithm which uses at each step the “improved” values if they are available is called the Gauss-Seidel method. In general, in Gauss-Seidel,   (k) (k) (k) (k) (k−1) (k−1) /Aii , xi = bi − Ai1 x1 − Ai2 x2 − · · · − Ai,i−1 xi−1 − Ai,i+1 xi+1 − · · · − Ain xn i = 1, 2, . . . , n. (1.102) This formula is used instead of Eq. 1.97. The Gauss-Seidel algorithm can be summarized as follows: 1. Input n, A, b, Nmax , x [x = initial guess] 2. k=0 3. k=k+1 [k = iteration number] 4. y = x [save previous iterate in y] 5. For i = 1, 2, . . . , n: xi = (bi − Ai1 x1 − Ai2 x2 − · · · − Ai,i−1 xi−1 − Ai,i+1 xi+1 − · · · − Ain xn ) /Aii 6. If converged (compare x with y), output x and exit; otherwise . . . 7. If k < Nmax , go to Step 3; otherwise . . . 8. No convergence; error termination. We return to the example used to illustrate Jacobi’s method:     2 1 0  1    0 . A= 1 4 1 , b=   0 0 1 3 The system Ax = b can again be written in the form   x1 = (1 − x2 )/2 x2 = (−x1 − x3 )/4  x3 = −x2 /3, (1.103) (1.104) where we again use the starting vector x(0)    0  0 . =   0 We summarize the Gauss-Seidel iterations in a table: 22 (1.105) k 0 1 2 3 4 5 6 x1 0 0.5000 0.5625 0.5755 0.5782 0.5788 0.5789 x2 0 -0.1250 -0.1510 -0.1564 -0.1576 -0.1578 -0.1578 x3 0 0.0417 0.0503 0.0521 0.0525 0.0526 0.0526 For this example, Gauss-Seidel converges is about half as many iterations as Jacobi. In Gauss-Seidel iteration, the updates cannot be done simultaneously as in the Jacobi method, since each component of the new iterate depends upon all previously computed components. Also, each iterate depends upon the order in which the equations are processed, so that the Gauss-Seidel method is sometimes called the method of successive displacements to indicate the dependence of the iterates on the ordering. If the ordering of the equations is changed, the components of the new iterate will also change. It also turns out that the rate of convergence depends on the ordering. 
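Both iteration formulas are short to code. Below is a sketch in Python with NumPy (my own illustration, not from the text) implementing Eqs. 1.98 and 1.102 with a simple absolute-change stopping test; applied to the 3 × 3 example of Eq. 1.99, it reproduces the behavior shown in the two tables above.

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-4, nmax=100):
    """Jacobi iteration (Eq. 1.98): every component uses the previous iterate."""
    x = x0.copy()
    for k in range(nmax):
        y = np.empty_like(x)
        for i in range(len(b)):
            y[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        if np.max(np.abs(y - x)) < tol:
            return y, k + 1
        x = y
    raise RuntimeError("no convergence")

def gauss_seidel(A, b, x0, tol=1e-4, nmax=100):
    """Gauss-Seidel iteration (Eq. 1.102): updated components are used immediately."""
    x = x0.copy()
    for k in range(nmax):
        y = x.copy()                 # previous iterate, kept for the convergence test
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        if np.max(np.abs(x - y)) < tol:
            return x, k + 1
    raise RuntimeError("no convergence")

A = np.array([[2., 1., 0.], [1., 4., 1.], [0., 1., 3.]])
b = np.array([1., 0., 0.])
x0 = np.zeros(3)
print(jacobi(A, b, x0))        # about (0.5789, -0.1579, 0.0526) in roughly a dozen iterations
print(gauss_seidel(A, b, x0))  # same solution in roughly half as many iterations
```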
In general, if the Jacobi method converges, the Gauss-Seidel method will converge faster than the Jacobi method. However, this statement is not always true. Consider, for example, the system Ax = b, where     1 2 −2  1    1 . 1 , b= (1.106) A= 1 1   1 2 2 1 The solution of this system is (−3, 3, 1). For this system, Jacobi converges, and Gauss-Seidel diverges. On the other hand, consider the system Ax = b with     5 4 3  12  15 . (1.107) A =  4 7 4 , b =   11 3 4 4 For this system, whose solution is (1, 1, 1), Jacobi diverges, and Gauss-Seidel converges. Generally, for iterative methods such as Jacobi or Gauss-Seidel, the stopping criterion is either that the absolute change from one iteration to the next is small or that the relative change from one iteration to the next is small. That is, either (k) (k−1) <ε max xi − xi i or (k) (k−1) max xi − xi i (1.108) (k−1) < ε max xi i , (1.109) where ε is a small number chosen by the analyst. We now return to the question of convergence of iterative methods. Consider Jacobi’s method of solution for the system Ax = b. Define M as the iteration matrix for Jacobi’s method; i.e., M is defined as the matrix such that x(k) = Mx(k−1) + d, 23 (1.110) where  A12 − A11  0   A21  − 0  A22 M=  .. ..  . .    A An2 n1 − − Ann Ann A13 − A11 A23 − A22 .. . A1n − A11 A2n − A22 .. . ··· ··· ... ··· − An,n−1 Ann 0       ,      di = bi /Aii . (1.111) Define the error at the kth iteration as e(k) = x(k) − x, (1.112) where x is the exact solution (unknown). By definition (Eq. 1.110), Mx + d = x. (1.113) Thus,  Me(k) = Mx(k) − Mx = x(k+1) − d − (x − d) = x(k+1) − x = e(k+1) . (1.114) That is, if the iteration matrix M acts on the error e(k) , e(k+1) results. Thus, to have the error decrease at each iteration, we conclude that M should be small in some sense. The last equation can be written in terms of components as (k+1) ei =− X Aij j6=i Aii (k) ej (1.115) or, with absolute values, (k+1) ei = X Aij Aii j6=i Since, for any numbers ci , X i it follows that (k+1) ei X Aij (k) X Aij ≤ ej = Aii Aii j6=i j6=i where we define αi = ci ≤ (k) ej X ≤ i (k) ej . (1.116) |ci | , X Aij Aii j6=i (1.117) ! X Aij 1 X = |Aij |. A |A ii ii | j6=i j6=i 24 (k) e(k) max = αi emax . (1.118) (1.119) Thus, Eq. 1.118 implies (k) e(k+1) . ≤ αmax emax max If αmax < 1, the error decreases with each iteration, where αmax < 1 implies X |Aii | > |Aij |, i = 1, 2, . . . , n. (1.120) (1.121) j6=i A matrix which satisfies this condition is said to be strictly diagonally dominant. Therefore, a sufficient condition for Jacobi’s method to converge is that the coefficient matrix A be strictly diagonally dominant. Notice that the starting vector does not matter, since it did not enter into this derivation. It turns out that strict diagonal dominance is also a sufficient condition for the Gauss-Seidel method to converge. Note that strict diagonal dominance is a sufficient condition, but not necessary. As a practical matter, rapid convergence is desired, and rapid convergence can be assured only if the main diagonal coefficients are large compared to the off-diagonal coefficients. These sufficient conditions are weak in the sense that the iteration might converge even for systems not diagonally dominant, as seen in the examples of Eqs. 1.106 and 1.107. 2 Vector Spaces A vector space is defined as a set of objects, called vectors, which can be combined using addition and multiplication under the following eight rules: 1. Addition is commutatative: a + b = b + a. 2. 
Addition is associative: (a + b) + c = a + (b + c). 3. There exists a unique zero vector 0 such that a + 0 = a. 4. There exists a unique negative vector such that a + (−a) = 0. 5. 1 a = a. 6. Multiplication is associative: α(βa) = (αβ)a for scalars α, β. 7. Multiplication is distributive with respect to vector addition: α(a + b) = αa + αb. 8. Multiplication is distributive with respect to scalar addition: (α + β)a = αa + βa. Examples of vector spaces include the following: 1. The space of n-dimensional vectors: a = (a1 , a2 , . . . , an ). 2. The space of m × n matrices. 3. The space of functions f (x). 25 Any set of objects that obeys the above eight rules is a vector space. A subspace is a subset of a vector space which is closed under addition and scalar multiplication. The phrase “closed under addition and scalar multiplication” means that (1) if two vectors a and b are in the subspace, their sum a + b also lies in the subspace, and (2) if a vector a is in the subspace, the scalar multiple αa lies in the subspace. For example, a two-dimensional plane in 3-D is a subspace. However, polynomials of degree 2 are not a subspace, since, for example, we can add two polynomials of degree 2,   −x2 + x + 1 + x2 + x + 1 = 2x + 2, (2.1) and obtain a result which is a polynomial of degree 1. On the other hand, polynomials of degree ≤ 2 are a subspace. The column space of a matrix A consists of all combinations of the columns of A. For example, consider the 3 × 2 system     1 0    b1   5 4  u = b2 . (2.2) v   b3 2 4 Since this system has three equations and two alternative way to write this system is     0  1  4 +v u 5    2 4 unknowns, it may not have a solution. An      b1  b2 = ,    b3 (2.3) where u and v are scalar multipliers of the two vectors. This system has a solution if, and only if, the right-hand side b can be written as a linear combination of the two columns. Geometrically, the system has a solution if, and only if, b lies in the plane formed by the two vectors. That is, the system has a solution if, and only if, b lies in the column space of A. We note that the column space of an m × n matrix A is a subspace of m-dimensional space Rm . For example, in the 3 × 2 matrix above, the column space is the plane in 3-D formed by the two vectors (1, 5, 2) and (0, 4, 4). The null space of a matrix A consists of all vectors x such that Ax = 0. 2.1 Rectangular Systems of Equations For square systems of equations Ax = b, there is one solution (if A−1 exists) or zero or infinite solutions (if A−1 does not exist). The situation with rectangular systems is a little different. For example, consider the 3 × 4 system      x1      1 3 3 2    b1  x2  2 6 9 5  b2 . (2.4) = x3       −1 −3 3 0  b3  x4 26 We can simplify this system using elementary row operations to obtain     1 3 3 2 b1 1 3 3 2 b1  2 6 9 5 b2  −→  0 0 3 1 b2 − 2b1  −1 −3 3 0 b3 0 0 6 2 b 1 + b3   1 3 3 2 b1 , −→  0 0 3 1 b2 − 2b1 0 0 0 0 b3 − 2b2 + 5b1 (2.5) (2.6) which implies that the solution exists if, and only if, b3 − 2b2 + 5b1 = 0. (2.7) Thus, for this system, even though there are more unknowns than equations, there may be no solution. Now assume that a solution of Eq. 2.4 exists. For example, let    1  5 , (2.8) b=   5 in which case the row-reduced system is   1 3 3 2 1  0 0 3 1 3 , 0 0 0 0 0 (2.9) which implies, from the second equation, 3x3 + x4 = 3. (2.10) One of the two unknowns (x4 , say) can be picked arbitrarily, in which case 1 x3 = 1 − x4 . 
3 (2.11) From the first equation in Eq. 2.9, we obtain x1 = 1 − 3x2 − 3x3 − 2x4 = 1 − 3x2 − 3(1 − x4 /3) − 2x4 = −2 − 3x2 − x4 , which implies that the solution can be written in the form         −1 −3 x −2         1                 x2 0 1 0 . + x4 + x2 x= = x3  −1/3  0  1                      x4 1 0 0 (2.12) (2.13) This system thus has a doubly infinite set of solutions, i.e., there are two free parameters. 27 Now consider the corresponding homogeneous system (with zero right-hand side): Axh = 0, where, from Eq. 2.9, the row-reduced system is   1 3 3 2 0  0 0 3 1 0 , 0 0 0 0 0 (2.14) (2.15) which implies 3x3 = −x4 (2.16) x1 = −3x2 − 3x3 − 2x4 = −3x2 − x4 . (2.17) and Thus, the solution of the homogeneous system can be written     −3 −1             1 0 x h = x2 + x4 . 0  −1/3            0 1 (2.18) Therefore, from Eqs. 2.13 and 2.18, the solution of the nonhomogeneous system Ax = b can be written as the sum of a particular solution xp and the solution of the corresponding homogeneous problem Axh = 0: x = xp + xh . (2.19) Alternatively, for the system Ax = b, given a solution x, any solution of the corresponding homogeneous system can be added to get another solution, since Ax = A(xp + xh ) = Axp + Axh = b + 0 = b. The final row-reduced matrix is said to  x x  0 x   0 0 0 0 (2.20) be in echelon form if it looks like this:  x x x x x x x x  , 0 x x x  0 0 0 0 where nonzeros are denoted with the letter “x”. The echelon form of a matrix has the following characteristics: 1. The nonzero rows appear first. 2. The pivots (circled above) are the first nonzeros in each row. 3. Only zeros appear below each pivot. 4. Each pivot appears to the right of the pivot in the row above. The number of nonzero rows in the row-reduced (echelon) matrix is referred to as the rank of the system. Thus, matrix rank is the number of independent rows in a matrix. 28 2.2 Linear Independence The vectors v1 , v2 , . . . , vk are linearly independent if the linear combination c1 v 1 + c2 v 2 + · · · + ck v k = 0 (2.21) implies c1 = c2 = · · · = ck = 0. We note that, if the coefficients ci are not all zero, then one vector can be written as a linear combination of the others. For the special case of two vectors, two vectors (of any dimension) are linearly dependent if one is a multiple of the other, i.e., v1 = cv2 . Geometrically, if two vectors are dependent, they are parallel. Three vectors (of any dimension) are linearly dependent if they are coplanar. For example, let us determine if the three vectors        3   4   2  3 , v2 = 5 , v3 = 7 v1 = (2.22)       3 4 4 are independent. Note that Eq. 2.21 implies  or v1 v2   c   1  v3 c2 =0   c3     3 4 2  c1   0   3 5 7  c2 0 , =     0 c3 3 4 4  (2.23) (2.24) so that v1 , v2 , v3 are independent if c1 = c2 = c3 = 0 is the only solution of this system. Thus, to determine the independence of vectors, we can arrange the vectors in the columns of a matrix, and determine the null space of the matrix:     3 4 2 0 3 4 2 0  3 5 7 0  −→  0 1 5 0  . (2.25) 3 4 4 0 0 0 2 0 This system implies c1 = c2 = c3 = 0, since an upper triangular system with a nonzero diagonal is nonsingular. Thus, the three given vectors are independent. The previous example also shows that the columns of a matrix are linearly independent if, and only if, the null space of the matrix is the zero vector. Let r denote the rank of a matrix (the number of independent rows). 
We observe that the number of independent columns is also r, since each column of an echelon matrix with a pivot has a nonzero component in a new location. For example, consider the echelon matrix   1 3 3 2  0 0 3 1 , 0 0 0 0 where the pivots are circled. Column 1 has one nonzero component. Column 2 is not independent, since it is proportional to Column 1. Column 3 has two nonzero components, 29 (0,1) ✻ ✒ ✲ ✲ (1,0) (1,1) (1,0) Figure 2: Some Vectors That Span the xy-Plane. so Column 3 cannot be dependent on the first two columns. Column 4 also has two nonzero components, but the column can be written uniquely as a linear combination of Columns 1 and 3 by picking first the Column 3 multiplier (1/3) followed by the Column 1 multiplier. Thus we see that, for a matrix, the row rank is equal to the column rank. 2.3 Spanning a Subspace If every vector v in a vector space can be expressed as a linear combination v = c1 w1 + c2 w2 + · · · + ck wk (2.26) for some coefficients ci , then the vectors w1 , w2 , . . . , wk are said to span the space. For example, in 2-D, the vectors (1,0) and (0,1) span the xy-plane (Fig. 2). The two vectors (1,0) and (1,1) also span the xy-plane. Since the number of spanning vectors need not be minimal, the three vectors (1,0), (0,1), and (1,1) also span the xy-plane, although only two of the three vectors are needed. The column space of a matrix is the space spanned by the columns of the matrix. This definition is equivalent to that given earlier (all possible combinations of the columns). A basis for a vector space is a set of vectors which is linearly independent and spans the space. The requirement for linear independence means that there can be no extra vectors. For example, the two vectors (1,0) and (0,1) in Fig. 2 form a basis for R2 . The two vectors (1,0) and (1,1) also form a basis for R2 . However, the three vectors (1,0), (0,1), and (1,1) do not form a basis for R2 , since one of the three vectors is linearly dependent on the other two. Thus, we see that all bases for a vector space V contain the same number of vectors; that number is referred to as the dimension of V . 2.4 Row Space and Null Space The row space of a matrix A consists of all possible combinations of the rows of A. Consider again the 3 × 4 example used in the discussion of rectangular systems:       1 3 3 2 1 3 3 2 1 3 3 2 6 9 5  −→  0 0 3 1  −→  0 0 3 1  = U, (2.27) A= 2 0 0 6 2 0 0 0 0 −1 −3 3 0 where the pivots are circled. A has two independent rows, i.e., rank(A) = 2. Also, since the rows of the final matrix U are simply linear combinations of the rows of A, the row spaces of both A and U are the same. Again we see that the number of pivots is equal to the rank r, 30 and that the number of independent columns is also r. That is, the number of independent columns is equal to the number of independent rows. Hence, for a matrix, Row rank = Column rank. (2.28) We recall that the null space of a matrix A consists of all vectors x such that Ax = 0. With elementary row operations, A can be transformed into echelon form, i.e., Ax = 0 −→ Ux = 0. For the example of the preceding section,  which implies (2.29)  1 3 3 2 U =  0 0 3 1 , 0 0 0 0 (2.30) x1 + 3x2 + 3x3 + 2x4 = 0 (2.31) 3x3 + x4 = 0. (2.32) and With four unknowns and two independent equations, we could, for example, choose x4 arbitrarily in Eq. 2.32 and x2 arbitrarily in Eq. 2.31. 
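As a numerical check of the rank and free-variable count just discussed (a sketch of my own, not part of the original text; it assumes NumPy and SciPy are available), the 3 × 4 matrix of Eq. 2.27 can be examined directly:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1.,  3., 3., 2.],
              [ 2.,  6., 9., 5.],
              [-1., -3., 3., 0.]])

r = np.linalg.matrix_rank(A)
N = null_space(A)                 # columns form an orthonormal basis of the null space

print(r)                          # 2 independent rows (and columns)
print(N.shape[1])                 # 4 - 2 = 2 free variables
print(np.allclose(A @ N, 0.0))    # True: A x = 0 for every null-space vector
```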
In general, for a rectangular system with m equations and n unknowns (i.e., A is an m × n matrix), the number of free variables (the dimension of the null space) is n − r, where r is the matrix rank. We thus summarize the dimensions of the various spaces in the following table. For an m × n matrix A (m rows, n columns): Space Row Column Null Dimension r r n−r Subspace of Rn Rm Rn Note that dim(row space) + dim(null space) = n = number of columns. (2.33) This property will be useful in the discussions of pseudoinverses and least squares. 2.5 Pseudoinverses Rectangular m×n matrices cannot be inverted in the usual sense. However, one can compute a pseudoinverse by considering either AAT or AT A, whichever (it turns out) is smaller. We consider two cases: m < n and m > n. Notice that, for any matrix A, both AAT and AT A are square and symmetric. For m < n, consider AAT , which is a square m × m matrix. If AAT is invertible, AAT  AAT 31 −1 =I (2.34) or h −1 i A AT AAT = I, (2.35) −1 where the matrix C = AT AAT in brackets is referred to as the pseudo right inverse. The reason for choosing the order of the two inverses given in Eq. 2.34 is that this order allows the matrix A to be isolated in the next equation. For the second case, m > n, consider AT A, which is an n×n matrix. If AT A is invertible, AT A or h −1 AT A −1  AT A = I i AT A = I, (2.36) (2.37) −1 T where the matrix B = AT A A in brackets is referred to as the pseudo left inverse. For example, consider the 2 × 3 matrix   4 0 0 , (2.38) A= 0 5 0 in which case AAT =  4 0 0 0 5 0 and the pseudo right inverse is C = AT AA  T −1     4 0 16 0  0 5 = , 0 25 0 0      1/4 0 4 0  1/16 0 =  0 1/5  . = 0 5  0 1/25 0 0 0 0  We check the correctness of this result by noting that      1/4 0  1 0 4 0 0  0 1/5  = AC = = I. 0 1 0 5 0 0 0 (2.39) (2.40) (2.41) One of the most important properties associated with pseudoinverses is that, for any matrix A,   (2.42) rank AT A = rank AAT = rank(A). To show that  rank AT A = rank(A), (2.43) we first recall that, for any matrix A, dim(row space) + dim(null space) = number of columns (2.44) r + dim(null space) = n. (2.45) or 32 Since AT A and A have the same number of columns (n), if suffices to show that AT A and A have the same null space. That is, we want to prove that Ax = 0 ⇐⇒ AT Ax = 0. (2.46) The forward direction (=⇒) is clear by inspection. For the other direction (⇐=), consider AT Ax = 0. (2.47) If we take the dot product of both sides with x, 0 = xT AT Ax = (Ax)T (Ax) = |Ax|2 , (2.48) which implies (since a vector of zero length must be the zero vector) Ax = 0. (2.49) Thus, since AT A and A have both the same null space and the same number of columns (n), the dimension r of the row space of the two matrices is the same, and the first part of Eq. 2.42 is proved. Also, with AT = B,   (2.50) rank AAT = rank BT B = rank(B) = rank(BT ) = rank(A), and the second part of Eq. 2.42 is proved. The implication of Eq. 2.42 is that, if rank(A) = n (i.e., A’s columns are independent), then AT A is invertible. It will also be shown in §4 that, for the overdetermined system Ax = b (with m > n), the pseudoinverse solution using AT A is equivalent to the least squares solution. 2.6 Linear Transformations If A is an m × n matrix, and x is an n-vector, the matrix product Ax can be thought of as a transformation of x into another vector x′ : Ax = x′ , (2.51) where x′ is an m × 1 vector. We consider some examples: 1. 
Stretching Let A= in which case Ax =  c 0 0 c   x1 x2 c 0 0 c  = cI,   cx1 cx2 33 =  (2.52) =c  x1 x2  . (2.53) ❆❑ ❆θ ✯ ✟ ❆ ✟✟ θ ❆✟ (x1 , x2 ) Figure 3: 90◦ Rotation. ✁✁✕ ✁ ✁ ✯ ✟ ✁ ✟✟ ✟ ✁✟✟ ✁ ✟ (x1 , x2 ) Figure 4: Reflection in 45◦ Line. 2. 90◦ Rotation (Fig. 3) Let A= in which case Ax =  0 −1 1 0   0 −1 1 0  x1 x2  0 1 1 0  x1 x2  , = (2.54)  −x2 x1  . (2.55) 3. Reflection in 45◦ Line (Fig. 4) Let A= in which case Ax =  0 1 1 0   , = (2.56)  x2 x1  . (2.57) 4. Projection Onto Horizontal Axis (Fig. 5) Let  1 0 0 0  x1 x2 A= in which case Ax =  1 0 0 0 ✟ ✟✟   , = (2.58)  x1 0  . (x1 , x2 ) ✯ ✟ ✟ ✟✟ r (x1 , 0) Figure 5: Projection Onto Horizontal Axis. 34 (2.59) Note that transformations which can be represented by matrices are linear transformations, since A(cx + dy) = c(Ax) + d(Ay), (2.60) where x and y are vectors, and c and d are scalars. This property implies that knowing how A transforms each vector in a basis determines how A transforms every vector in the entire space, since the vectors in the space are linear combinations of the basis vectors: A(c1 x1 + c2 x2 + · · · + cn xn ) = c1 (Ax1 ) + c2 (Ax2 ) + · · · + cn (Axn ), (2.61) where xi is the ith basis vector, and ci is the ith scalar multiplier. We now continue with the examples of linear transformations: 5. Differentiation of Polynomials Consider the polynomial of degree 4 p(t) = a0 + a1 t + a2 t2 + a3 t3 + a4 t4 . (2.62) p1 = 1, p2 = t, p3 = t2 , p4 = t3 , p5 = t4 , (2.63) The basis vectors are which implies that, in terms of the basis vectors, p(t) = a0 p1 + a1 p2 + a2 p3 + a3 p4 + a4 p5 . (2.64) The coefficients ai become the components of the vector p. If we differentiate the polynomial, we obtain p′ (t) = a1 + 2a2 t + 3a3 t2 + 4a4 t3 = a1 p1 + 2a2 p2 + 3a3 p3 + 4a4 p4 . (2.65) Differentiation of polynomials of degree 4 can thus be written in matrix form as      a0      0 1 0 0 0  a1         0 0 2 0 0   a1   2a2    (2.66)  0 0 0 3 0   a2  =  3a3  ,     a     3  0 0 0 0 4  4a4   a4 so that the matrix which represents differentiation  0 1 0 0  0 0 2 0 AD =   0 0 0 3 0 0 0 0 of polynomials of degree 4 is  0 0  , (2.67) 0  4 where the subscript “D” denotes differentiation. Note that this matrix has been written as a rectangular 4 × 5 matrix, since differentiation reduces the degree of the polynomial by unity. 35 6. Integration of Polynomials Let p(t) = a0 + a1 t + a2 t2 + a3 t3 , (2.68) which can be integrated to obtain Z t a1 a2 a3 a1 a2 a3 p(τ ) dτ = a0 t + t2 + t3 + t4 = a0 p2 + p3 + p4 + p5 . 2 3 4 2 3 4 0 Thus, integration of polynomials of  0 0 0  1 0 0    0 12 0   0 0 13 0 0 0 degree  0    0     0    0   1 4 (2.69) 3 can be written in matrix form as     0      a0       a   0   a1 1 = , (2.70) a 2 1   a2     1   3 a2     a3    1  a 3 4 so that the matrix which represents integration  0 0 0  1 0 0  1 AI =   0 2 01  0 0 3 0 0 0 of polynomials of degree 3 is  0 0   0  , 0  (2.71) 1 4 where the subscript “I” denotes integration. We note that integration followed by differentiation is an inverse operation which restores the vector to its original:      0 0 0 0  0 1 0 0 0  1 0 0 0   0 0 2 0 0   1 01 0 0   0 1 0 0       0 2 0 0  AD AI =  (2.72)  =  0 0 1 0  = I. 0 0 0 3 0  1  0 0  0 3 0 0 0 1 0 0 0 0 4 0 0 0 14 Thus, AD is the pseudo left inverse of AI . 
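A brief sketch (assuming NumPy) of the differentiation and integration matrices of Eqs. 2.67 and 2.71: it applies AD to the coefficient vector of a sample quartic and confirms Eq. 2.72. The particular polynomial used is an arbitrary choice for illustration.

import numpy as np

AD = np.array([[0., 1., 0., 0., 0.],
               [0., 0., 2., 0., 0.],
               [0., 0., 0., 3., 0.],
               [0., 0., 0., 0., 4.]])        # d/dt on degree-4 polynomials (4 x 5)

AI = np.array([[0.,  0.,   0.,   0.  ],
               [1.,  0.,   0.,   0.  ],
               [0.,  1/2., 0.,   0.  ],
               [0.,  0.,   1/3., 0.  ],
               [0.,  0.,   0.,   1/4.]])     # integration of degree-3 polynomials (5 x 4)

p = np.array([1., 2., 0., 4., 3.])           # sample p(t) = 1 + 2t + 4t^3 + 3t^4
print(AD @ p)                                # [2, 0, 12, 12]: coefficients of p'(t)
print(np.allclose(AD @ AI, np.eye(4)))       # True: differentiation undoes integration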
in reverse order (differentiation followed its original:   0 0 0 0   1 0 0 0  0 1 0   0 0 2 1  AI AD =   0 2 01 0   0 0 0  0 0 0  3 0 0 0 0 0 0 14 On the other hand, performing the operations by integation) does not restore the vector to 0 0 3 0   0   0  =   0  4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1     6= I.   Integration after differentiation does not restore the original constant term. 36 (2.73) e′2 e2 ✻ ❑❆❆ θ ′ ❆ ✯ e1 ✟ ✟ ✟ ❆ ❆✟✟ θ ✲ e1 Figure 6: Rotation by Angle θ. 7. Rotation by Angle θ (Fig. 6) Consider the 2-D transformation R=  cos θ − sin θ sin θ cos θ  . If v is a typical vector in 2-D, v can be written       v1 0 1 = v 1 e1 + v 2 e 2 , v= + v2 = v1 1 v2 0 (2.74) (2.75) where e1 and e2 are the basis vectors. We now consider the effect of R on the two basis vectors e1 and e2 :     − sin θ cos θ ′ = e′2 , (2.76) = e1 , Re2 = Re1 = cos θ sin θ as shown in Fig. 6. Since e1 and e2 are both rotated by θ, so is v. The transformation R has the following properties: (a) Rθ and R−θ are inverses (rotation by θ and −θ):      1 0 cos θ − sin θ cos θ sin θ = I, = R−θ Rθ = 0 1 sin θ cos θ − sin θ cos θ (2.77) where we used sin(−θ) = − sin θ and cos(−θ) = cos θ to compute R−θ . Thus, R T is an orthogonal matrix (RRT = I). Note also that R−θ = R−1 θ = R . (b) R2θ = Rθ Rθ (rotation through the double angle):    cos θ − sin θ cos θ − sin θ Rθ Rθ = sin θ cos θ sin θ cos θ   cos2 θ − sin2 θ −2 sin θ cos θ = 2 sin θ cos θ cos2 θ − sin2 θ   cos 2θ − sin 2θ = R2θ = sin 2θ cos 2θ (2.78) (2.79) (2.80) The above equations assume the double-angle trigonometric identities as known, but an alternative view of these equations would be to assume that R2θ = Rθ Rθ must be true, and view these equations as a derivation of the double-angle identities. 37 (c) Rθ+φ = Rφ Rθ (rotation through θ, then φ):    cos θ − sin θ cos φ − sin φ Rφ Rθ = sin θ cos θ sin φ cos φ   cos φ cos θ − sin φ sin θ − cos φ sin θ − sin φ cos θ = sin φ cos θ + cos φ sin θ − sin φ sin θ + cos φ cos θ   cos(φ + θ) − sin(φ + θ) = Rφ+θ = sin(φ + θ) cos(φ + θ) (2.81) (2.82) (2.83) We note that, in 2-D rotations (but not in 3-D), the order of rotations does not matter, i.e., Rφ Rθ = Rθ Rφ . (2.84) Properties (a) and (b) are special cases of Property (c). 2.7 Orthogonal Subspaces For two vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ), we recall that the inner product (or dot product) of u and v is defined as T (u, v) = u · v = u v = n X ui v i . (2.85) i=1 The length of u is v u n q uX √ |u| = u21 + u22 + · · · + u2n = u · u = t ui ui (2.86) i=1 The vectors u and v are orthogonal if u · v = 0. We note first that a set of mutually orthogonal vectors is linearly independent. To prove this assertion, let vectors v1 , v2 , . . . , vk be mutually orthogonal. To prove independence, we must show that c1 v 1 + c2 v 2 + · · · + ck v k = 0 (2.87) implies ci = 0 for all i. If we take the dot product of both sides of Eq. 2.87 with v1 , we obtain v1 · (c1 v1 + c2 v2 + · · · + ck vk ) = 0, (2.88) where orthogonality implies v1 · vi = 0 for i 6= 1. Thus, from Eq. 2.88, c1 v1 · v1 = c1 |v1 |2 = 0. (2.89) Since |v1 | = 6 0, we conclude that c1 = 0. Similarly, c2 = c3 = · · · = ck = 0, and the original assertion is proved. We define two subspaces as orthogonal if every vector in one subspace is orthogonal to every vector in the other subspace. For example, the xy-plane is a subspace of R3 , and the orthogonal subspace is the z-axis. 
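The rotation-matrix properties (a) through (c) established for Eq. 2.74 are also easy to confirm numerically. The sketch below assumes NumPy; the two angles are arbitrary sample values.

import numpy as np

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

th, ph = 0.4, 1.1                               # arbitrary angles in radians
print(np.allclose(R(-th) @ R(th), np.eye(2)))   # (a) R_{-theta} is the inverse of R_theta
print(np.allclose(R(th) @ R(th), R(2*th)))      # (b) rotation through the double angle
print(np.allclose(R(ph) @ R(th), R(ph + th)))   # (c) composition of rotations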
38 3 ✒ ✏ b ✏ ✶✏ ✏ ✏✏ p ✏ θ ✏ ✏✏ ✶ ✏ ✏✏ a 2 1 Figure 7: Projection Onto Line. We now show that the row space of a matrix is orthogonal to the null space. To prove this assertion, consider a matrix A for which x is in the nullspace, and v is in the row space. Thus, Ax = 0 (2.90) and v = AT w (2.91) for some w. That is, v is a combination of the columns of AT or rows of A. Then, vT x = wT Ax = 0, (2.92) thus proving the assertion. 2.8 Projections Onto Lines Consider two n-dimensional vectors a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ). We wish to project b onto a, as shown in Fig. 7 in 3-D. Let p be the vector obtained by projecting b onto a. The scalar projection of b onto a is |b| cos θ, where θ is the angle between b and a. Thus, a p = (|b| cos θ) , (2.93) |a| where the fraction in this expression is the unit vector in the direction of a. Since a · b = |a||b| cos θ, Eq. 2.93 becomes p = |b| a·b a·b a = a. |a||b| |a| a·a (2.94) (2.95) This result can be obtained by inspection by noting that the scalar projection of b onto a is the dot product of b with a unit vector in the direction of a. The vector projection would then be this scalar projection times a unit vector in the direction of a:   a a p= b· . (2.96) |a| |a| 39 y y ✻ ▼❇ ✻ ȳ ❇ ❇ r✏✏✏ ✶ x̄ ✏ r✏✏ ✏ ✏✏ ▼❇ axial member ✲ r❳❳ ✶ x̄ ✏ ❳❳r✏✏ ✏ ✏ ❇ ✡ ✏✏ ❇✏ r✡ ✏ triangular membrane ȳ ✡ ✲ x x Figure 8: Element Coordinate Systems in the Finite Element Method. We can also determine the projection matrix for this projection. The projection matrix is the matrix P that transforms b into p, i.e., P is the matrix such that p = Pb. (2.97) Here it is convenient to switch to index notation with the summation convention (in which repeated indices are summed). From Eq. 2.95, pi = ai aj aj bj ai = bj = Pij bj . a·a a·a (2.98) ai aj a·a (2.99) aaT aaT = T . a·a a a (2.100) Thus, the projection matrix is Pij = or, in vector and matrix notations, P= The projection matrix P has the following properties: 1. P is symmetric, i.e., Pij = Pji . 2. P2 = P. That is, once a vector has been projected, there is no place else to go with further projection. This property can be proved using index notation: P2  ij = (PP)ij = Pik Pkj = ai ak ak aj ai aj (a · a) ai aj = = = Pij . (a · a)(a · a) (a · a)(a · a) a·a (2.101) 3. P is singular. That is, given p, it is not possible to reconstruct the original vector b, since b is not unique. 3 Change of Basis On many occasions in engineering applications, the need arises to transform vectors and matrices from one coordinate system to another. For example, in the finite element method, it is frequently more convenient to derive element matrices in a local element coordinate system and then transform those matrices to a global system (Fig. 8). Transformations 40 y ✻ eθ ❇▼ ❇ r ❇ ✶ er ✏✏ q ✏ ❇✏ θ ✲ x Figure 9: Basis Vectors in Polar Coordinate System. x x̄2 ❇▼ ❇ ✻2 ❇ ✒ ❇ ❇ ❇ ❇ ✏ ❇✏✏ ✏✏ v ✏✏ ✏✏ x̄1 ✶ ✏✏ θ ✲ x1 Figure 10: Change of Basis. are also needed to transform from other orthogonal coordinate systems (e.g., cylindrical or spherical) to Cartesian coordinates (Fig. 9). Let the vector v be given by v = v1 e1 + v2 e2 + v 3 e3 = 3 X v i ei , (3.1) i=1 where ei are the basis vectors, and vi are the components of v. Using the summation convention, we can omit the summation sign and write v = vi ei , (3.2) where, if a subscript appears exactly twice, a summation is implied over the range. An orthonormal basis is defined as a basis whose basis vectors are mutually orthogonal unit vectors (i.e., vectors of unit length). 
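Returning briefly to the projection matrix of Section 2.8, its three listed properties can be verified directly for sample data. The sketch below assumes NumPy; the vectors a and b are arbitrary choices.

import numpy as np

a = np.array([1.0, 2.0, 2.0])            # direction of the line
b = np.array([3.0, 0.0, 4.0])            # vector to be projected

P = np.outer(a, a) / (a @ a)             # P = a a^T / (a . a), Eq. 2.100
p = P @ b                                # projection of b onto a

print(np.allclose(p, (a @ b) / (a @ a) * a))   # agrees with Eq. 2.95
print(np.allclose(P, P.T))                     # 1. P is symmetric
print(np.allclose(P @ P, P))                   # 2. P^2 = P
print(np.linalg.matrix_rank(P))                # 3. rank 1, so P is singular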
If ei is an orthonormal basis,  1, i = j, ei · ej = δij = (3.3) 0, i 6= j, where δij is the Kronecker delta. Since bases are not unique, we can express v in two different orthonormal bases: v= 3 X v i ei = i=1 3 X v̄i ēi , (3.4) i=1 where vi are the components of v in the unbarred coordinate system, and v̄i are the components in the barred system (Fig. 10). If we take the dot product of both sides of Eq. 3.4 41 with ej , we obtain 3 X i=1 vi ei · ej = 3 X i=1 v̄i ēi · ej , (3.5) where ei · ej = δij , and we define the 3 × 3 matrix R as Rij = ēi · ej . (3.6) Thus, from Eq. 3.5, vj = 3 X Rij v̄i = i=1 Since the matrix product 3 X T Rji v̄i . (3.7) i=1 C = AB (3.8) can be written using subscript notation as Cij = 3 X Aik Bkj , (3.9) k=1 Eq. 3.7 is equivalent to the matrix product v = RT v̄. (3.10) Similarly, if we take the dot product of Eq. 3.4 with ēj , we obtain 3 X i=1 vi ei · ēj = 3 X i=1 v̄i ēi · ēj , (3.11) where ēi · ēj = δij , and ei · ēj = Rji . Thus, v̄j = 3 X Rji vi or v̄ = Rv or v = R−1 v̄. (3.12) i=1 A comparison of Eqs. 3.10 and 3.12 yields R −1 =R T T or RR = I or 3 X Rik Rjk = δij , (3.13) k=1 where I is the identity matrix (Iij = δij ):  1 0 0 I =  0 1 0 . 0 0 1  (3.14) This type of transformation is called an orthogonal coordinate transformation (OCT). A matrix R satisfying Eq. 3.13 is said to be an orthogonal matrix. That is, an orthogonal 42 matrix is one whose inverse is equal to the transpose. R is sometimes called a rotation matrix. For example, for the coordinate rotation shown in Fig. 10, in 3-D,   cos θ sin θ 0 (3.15) R =  − sin θ cos θ 0  . 0 0 1 In 2-D, R= and   cos θ sin θ − sin θ cos θ  (3.16) vx = v̄x cos θ − v̄y sin θ (3.17) vy = v̄x sin θ + v̄y cos θ. We recall that the determinant of a matrix product is equal to the product of the determinants. Also, the determinant of the transpose of a matrix is equal to the determinant of the matrix itself. Thus, from Eq. 3.13, det(RRT ) = (det R)(det RT ) = (det R)2 = det I = 1, (3.18) and we conclude that, for an orthogonal matrix R, det R = ±1. (3.19) The plus sign occurs for rotations, and the minus sign occurs for combinations of rotations and reflections. For example, the orthogonal matrix   1 0 0 0  (3.20) R= 0 1 0 0 −1 indicates a reflection in the z direction (i.e., the sign of the z component is changed). Another property of orthogonal matrices that can be deduced directly from the definition, Eq. 3.13, is that the rows and columns of an orthogonal matrix must be unit vectors and mutually orthogonal. That is, the rows and columns form an orthonormal set. We note that the length of a vector is unchanged under an orthogonal coordinate transformation, since the square of the length is given by v̄i v̄i = Rij vj Rik vk = δjk vj vk = vj vj , (3.21) where the summation convention was used. That is, the square of the length of a vector is the same in both coordinate systems. To summarize, under an orthogonal coordinate transformation, vectors transform according to the rule 3 X v̄ = Rv or v̄i = Rij vj , (3.22) j=1 where and Rij = ēi · ej , (3.23) RRT = RT R = I. (3.24) 43 3.1 Tensors A vector which transforms under an orthogonal coordinate transformation according to the rule v̄ = Rv is defined as a tensor of rank 1. A tensor of rank 0 is a scalar (a quantity which is unchanged by an orthogonal coordinate transformation). For example, temperature and pressure are scalars, since T̄ = T and p̄ = p. We now introduce tensors of rank 2. 
Consider a matrix M = (Mij ) which relates two vectors u and v by 3 X v = Mu or vi = Mij uj (3.25) j=1 (i.e., the result of multiplying a matrix and a vector is a vector). Also, in a rotated coordinate system, v̄ = M̄ū. (3.26) Since both u and v are vectors (tensors of rank 1), Eq. 3.25 implies RT v̄ = MRT ū or v̄ = RMRT ū. (3.27) By comparing Eqs. 3.26 and 3.27, we obtain M̄ = RMRT (3.28) or, in index notation, M̄ij = 3 X 3 X Rik Rjl Mkl , (3.29) k=1 l=1 which is the transformation rule for a tensor of rank 2. In general, a tensor of rank n, which has n indices, transforms under an orthogonal coordinate transformation according to the rule 3 X 3 3 X X Āij···k = ··· Rip Rjq · · · Rkr Apq···r . (3.30) p=1 q=1 3.2 r=1 Examples of Tensors 1. Stress and strain in elasticity The stress tensor σ is  σ11 σ12 σ13 σ =  σ21 σ22 σ23  , σ31 σ32 σ33 (3.31)  ε12 ε13 ε22 ε23  , ε32 ε33 (3.32)  where σ11 , σ22 , σ33 are the direct (normal) stresses. The corresponding strain tensor is  ε11  ε = ε21 ε31 stresses, and σ12 , σ13 , and σ23 are the shear 44 where, in terms of displacements, 1 1 εij = (ui,j + uj,i ) = 2 2  ∂ui ∂uj + ∂xj ∂xi  . (3.33) The shear strains in this tensor are equal to half the corresponding engineering shear strains. Both σ and ε transform like second rank tensors. 2. Generalized Hooke’s law According to Hooke’s law in elasticity, the extension in a bar subjected to an axial force is proportional to the force, or stress is proportional to strain. In 1-D, σ = Eε, (3.34) where E is Young’s modulus, an experimentally determined material property. In general three-dimensional elasticity, there are nine components of stress σij and nine components of strain εij . According to generalized Hooke’s law, each stress component can be written as a linear combination of the nine strain components: σij = cijkl εkl , (3.35) where the 81 material constants cijkl are experimentally determined, and the summation convention is being used. We now prove that cijkl is a tensor of rank 4. We can write Eq. 3.35 in terms of stress and strain in a second orthogonal coordinate system as Rki Rlj σ̄kl = cijkl Rmk Rnl ε̄mn . (3.36) If we multiply both sides of this equation by Rpj Roi , and sum repeated indices, we obtain Rpj Roi Rki Rlj σ̄kl = Roi Rpj Rmk Rnl cijkl ε̄mn , (3.37) or, because R is an orthogonal matrix, δok δpl σ̄kl = σ̄op = Roi Rpj Rmk Rnl cijkl ε̄mn . (3.38) Since, in the second coordinate system, σ̄op = c̄opmn ε̄mn , (3.39) c̄opmn = Roi Rpj Rmk Rnl cijkl , (3.40) we conclude that which proves that cijkl is a tensor of rank 4. 3. Stiffness matrix in finite element analysis In the finite element method of analysis for structures, the forces F acting on an object in static equilibrium are a linear combination of the displacements u (or vice versa): Ku = F, 45 (3.41) where K is referred to as the stiffness matrix (with dimensions of force/displacement). In this equation, u and F contain several subvectors, since u and F are the displacement and force vectors for all grid points, i.e.,       Fa  ua         Fb     ub  (3.42) , F = u= Fc  uc           ...     ...  for grid points a, b, c, . . ., where ūa = Ra ua , ūb = Rb ub , · · · . Thus,      ū a       ūb   ū = = ū c         ...  Ra 0 0 0 Rb 0 0 0 Rc .. .. .. . . . where Γ is an orthogonal block-diagonal matrix  Ra 0  0 Rb  Γ= 0 0  .. .. . . and ··· ··· ··· .. .            ua ub uc .. . 
(3.43)          = Γu, consisting of rotation matrices R:  0 ··· 0 ···   , Rc · · ·   .. . . . . (3.44) (3.45) ΓT Γ = I. (3.46) F̄ = ΓF. (3.47) K̄ū = F̄, (3.48) K̄Γu = ΓF (3.49) Similarly, Thus, if or  ΓT K̄Γ u = F. (3.50) That is, the stiffness matrix transforms like other tensors of rank 2: K = ΓT K̄Γ. (3.51) We illustrate the transformation of a finite element stiffness matrix by transforming the stiffness matrix for the pin-jointed rod element from a local element coordinate system to a global Cartesian system. Consider the rod shown in Fig. 11. For this element, the 4 × 4 2-D 46 4 q ✻y θ ȳ ■ ❅ ❅q ✒ DOF x̄ 2 ✲ ■ ❅ x ❅q ✒ ■ ❅ ❅q ✒ 3 θ 1 Figure 11: Element Coordinate System for Pin-Jointed Rod. stiffness matrix in the element coordinate system is  1 0 −1 0 AE  0 0  0 0 K̄ =  −1 0 1 0 L 0 0 0 0   ,  (3.52) where A is the cross-sectional area of the rod, E is Young’s modulus of the rod material, and L is the rod length. In the global coordinate system, K = ΓT K̄Γ, where the 4 × 4 transformation matrix is Γ= and the rotation matrix is R= Thus,   R 0 0 R  (3.53) , cos θ sin θ − sin θ cos θ (3.54)  .   c s 0 1 0 −1 0 c −s 0 0   −s c  0 0 AE  0 0 0 s c 0 0    K= c 1 0  0 0 0 c −s   −1 0 L  0 0 0 −s 0 0 0 0 0 0 s c   2 2 c cs −c −cs  AE  cs s2 −cs −s2  , = 2 c2 cs  L  −c −cs −cs −s2 cs s2  where c = cos θ and s = sin θ. 47 (3.55)  0 0   s  c (3.56) 3.3 Isotropic Tensors An isotropic tensor is a tensor which is independent of coordinate system (i.e., invariant under an orthogonal coordinate transformation). The Kronecker delta δij is a second rank tensor and isotropic, since δ̄ij = δij , and Ī = RIRT = RRT = I. (3.57) That is, the identity matrix I is invariant under an orthogonal coordinate transformation. It can be shown in tensor analysis that δij is the only isotropic tensor of rank 2 and, moreover, δij is the characteristic tensor for all isotropic tensors: Rank 1 2 3 4 odd Isotropic Tensors none cδij none aδij δkl + bδik δjl + cδil δjk none That is, all isotropic tensors of rank 4 must be of the form shown above, which has three constants. For example, in generalized Hooke’s law (Eq. 3.35), the material property tensor cijkl has 34 = 81 constants (assuming no symmetry). For an isotropic material, cijkl must be an isotropic tensor of rank 4, thus implying at most three independent elastic material constants (on the basis of tensor analysis alone). The actual number of independent material constants for an isotropic material turns out to be two rather than three, a result which depends on the existence of a strain energy function, which implies the additional symmetry cijkl = cklij . 4 Least Squares Problems Consider the rectangular system of equations Ax = b. (4.1) where A is an m × n matrix with m > n. That is, we are interested in the case where there are more equations than unknowns. Assume that the system is inconsistent, so that Eq. 4.1 has no solution. Our interest here is to find a vector x which “best” fits the data in some sense. We state the problem as follows: If A is a m × n matrix, with m > n, and Ax = b, find x such that the scalar error measure E = |Ax − b| is minimized, where E is the length of the residual Ax − b (error). 
To solve this problem, we consider the square of the error: E 2 = (Ax − b)T (Ax − b) = (xT AT − bT )(Ax − b) = xT AT Ax − xT AT b − bT Ax + bT b, 48 (4.2) or, in index notation using the summation convention, E 2 = xi (AT )ij Ajk xk − xi (AT )ij bj − bi Aij xj + bi bi = xi Aji Ajk xk − xi Aji bj − bi Aij xj + bi bi . (4.3) We wish to find the vector x which minimizes E 2 . Thus, we set ∂(E 2 ) = 0 for each l, ∂xl (4.4) where we note that ∂xi = δil , ∂xl where δij is the Kronecker delta defined as  1, i = j, δij = 0, i 6= j. (4.5) (4.6) We then differentiate Eq. 4.3 to obtain 0= ∂(E 2 ) = δil Aji Ajk xk + xi Aji Ajk δkl − δil Aji bj − bi Aij δjl + 0 ∂xl = Ajl Ajk xk + xi Aji Ajl − Ajl bj − bi Ail , (4.7) (4.8) where the dummy subscripts k in the first term and j in the third term can both be changed to i. Thus, 0= ∂(E 2 ) = Ajl Aji xi + Aji Ajl xi − Ail bi − Ail bi = (2AT Ax − 2AT b)l ∂xl (4.9) or AT Ax = AT b. (4.10) The problem solved is referred to as the least squares problem, since we have minimized E, the square root of the sum of the squares of the components of the vector Ax − b. The equations, Eq. 4.10, which solve the least squares problem are referred to as the normal equations. Since A is an m × n matrix, the coefficient matrix AT A in Eq. 4.10 is a square n × n matrix. Moreover, AT A is also symmetric, since (AT A)T = AT (AT )T = AT A. (4.11) If AT A is invertible, we could “solve” the normal equations to obtain x = (AT A)−1 AT b. (4.12) However, (AT A)−1 may not exist, since the original system, Eq. 4.1, may not have been over-determined (i.e., rank r < n). 49 f (x) ✻ r r (x1 , f1 ) p(x) x1 Error ✻ ❄ at a point ✒ r r x2 x3 x4 ✲ x Figure 12: Example of Least Squares Fit of Data. We note that Eq. 4.10 could have been obtained simply by multiplying both sides of Eq. 4.1 by AT , but we would not have known that the result is a least-squares solution. We also recall that the matrix (AT A)−1 AT in Eq. 4.12 is the pseudo left inverse of A. Thus, the pseudo left inverse is equivalent to a least squares solution. We also note that, if A is square and invertible, then x = (AT A)−1 AT b = A−1 (AT )−1 AT b = A−1 b, (4.13) as expected. One of the most important properties associated with the normal equations is that, for any matrix A, rank(AT A) = rank(A), (4.14) as proved in §2.5 following Eq. 2.42. We recall that the implication of this result is that, if rank(A) = n (i.e., A’s columns are independent), then AT A is invertible. 4.1 Least Squares Fitting of Data Sometimes one has a large number of discrete values of some function and wants to determine a fairly simple approximation to the data, but make use of all the data. For example, as shown in Fig. 12, suppose we know the points (x1 , f1 ), (x2 , f2 ), . . . , (xm , fm ), and we want to approximate f with the low-order polynomial p(x) = a0 + a1 x + a2 x2 + · · · + an xn , (4.15) where the number m of points is large, and the order n of the polynomial is small. The goal of the fit is for p to agree as closely as possible with the data. For example, we could require that the algebraic sum of the errors would be minimized by p, but positive and negative errors would cancel each other, thus distorting the evaluation of the fit. We could alternatively require that the sum of the absolute errors be minimum, but it turns out that that choice results in a nasty mathematical problem. Hence we pick p(x) such that the sum of the squares of the errors, R = [f1 − p(x1 )]2 + [f2 − p(x2 )]2 + · · · + [fm − p(xm )]2 , (4.16) is minimum. 
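Before continuing with the data-fitting example, the normal equations, Eq. 4.10, can be illustrated numerically. The sketch below (assuming NumPy) builds a small inconsistent system with m > n, solves A^T A x = A^T b, and confirms that the result agrees with a library least-squares routine and with the pseudo left inverse of Eq. 4.12; the data are arbitrary sample values.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])                # 4 equations, 2 unknowns (m > n)
b = np.array([0.1, 0.9, 2.1, 2.9])        # sample right-hand side (inconsistent system)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations, Eq. 4.10
x_lstsq  = np.linalg.lstsq(A, b, rcond=None)[0]   # library least-squares solution
x_pinv   = np.linalg.pinv(A) @ b                  # pseudo left inverse, Eq. 4.12

print(np.allclose(x_normal, x_lstsq), np.allclose(x_normal, x_pinv))   # True True
print(np.linalg.norm(A @ x_normal - b))           # the minimized residual |Ax - b|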
We note that the quantities in brackets are the errors at each of the discrete points where the function is known. R is referred to as the residual. 50 For example, consider the linear approximation p(x) = a0 + a1 x, (4.17) where the measure of error is given by R(a0 , a1 ) = [f1 − (a0 + a1 x1 )]2 + [f2 − (a0 + a1 x2 )]2 + · · · + [fm − (a0 + a1 xm )]2 . For R a minimum, ∂R ∂R = 0 and = 0, ∂a0 ∂a1 (4.18) (4.19) which implies, from Eq. 4.18, 2[f1 − (a0 + a1 x1 )](−1) + 2[f2 − (a0 + a1 x2 )](−1) + · · · + 2[fm − (a0 + a1 xm )](−1) = 0 2[f1 − (a0 + a1 x1 )](−x1 ) + 2[f2 − (a0 + a1 x2 )](−x2 ) + · · · + 2[fm − (a0 + a1 xm )](−xm ) = 0 (4.20) or ma0 + (x1 + x2 + · · · + xm )a1 = f1 + f2 + · · · + fm (x1 + x2 + · · · + xm )a0 + (x21 + x22 + · · · + x2m )a1 = f1 x1 + f2 x2 + · · · + fm xm . (4.21) This is a symmetric system of two linear algebraic equations in two unknowns (a0 , a1 ) which can be solved for a0 and a1 to yield p(x). An alternative approach would be to attempt to find (a0 , a1 ) such that the linear function goes through all m points: a0 + a1 x1 = f1 a0 + a1 x2 = f2 a0 + a1 x3 = f3 (4.22) .. . a0 + a1 xm = fm , which is an overdetermined system if m > 2. In matrix notation, this system can be written Aa = f , where     A=    1 x1 1 x2   1 x3  , .. ..  . .  1 xm    f 1          f 2   f 3 , f=  ..       .     f  m (4.23) a=  a0 a1  . (4.24) This system was solved previously by multiplying by AT : AT Aa = AT f , 51 (4.25) where AT A =  1 1 1 ··· x1 x2 x3 · · · and T A f=     1   xm   1 1 1 ··· x1 x2 x3 · · · Thus, Eq. 4.25 becomes  m X xi  m  i=1  X m m X  x2i xi i=1 i=1     m 1 x1 X  xi  1 x2   m    i=1 1 x3  =  m , m X X    .. .. 2 xi xi . .  i=1 i=1 1 xm    X m  f1          fi   f2     1 i=1 f3 = m X xm   ..        xi f i .      f   i=1 m  X m   fi      a0 i=1 =  m X a   1   xi f i  i=1          ,          . (4.26) (4.27) (4.28) which is identical to Eq. 4.21. That is, both approaches give the same result. The geometric interpretation of minimizing the sum of squares of errors is equivalent to the algebraic interpretation of finding x such that |Aa − f | is minimized. Both approaches yield Eq. 4.28. Hence, the two error measures are also the same; i.e., E 2 = R. 4.2 The General Linear Least Squares Problem Given m points (x1 , f1 ), (x2 , f2 ), . . . , (xm , fm ), we could, in general, perform a least squares fit using a linear combination of other functions φ1 (x), φ2 (x), . . . , φn (x). For example, we could attempt a fit using the set of functions {ex , 1/x, sin x}. The functions used in the linear combination are referred to as basis functions (or basis vectors). Thus, the more general least squares problem is to find the coefficients ai in p(x) = a1 φ1 (x) + a2 φ2 (x) + · · · + an φn (x) (4.29) so that the mean square error, Eq. 4.16, between the function p(x) and the original points is minimized. If we use the algebraic approach to set up the equations, we attempt to force p(x) to go through all m points: a1 φ1 (x1 ) + a2 φ2 (x1 ) + · · · + an φn (x1 ) = f1 a1 φ1 (x2 ) + a2 φ2 (x2 ) + · · · + an φn (x2 ) = f2 .. . a1 φ1 (xm ) + a2 φ2 (xm ) + · · · + an φn (xm ) = fm , 52 (4.30) which is an over-determined system if m > n. Thus, Aa = f or Aij aj = fi , (4.31) where the summation convention is used, and the coefficient matrix is Aij = φj (xi ). 
(4.32) The least squares solution of this problem is therefore found by multiplying by AT : (AT A)a = AT f , (4.33) where (using the summation convention) (AT A)ij = (AT )ik Akj = Aki Akj = φi (xk )φj (xk ) (4.34) (AT f )i = (AT )ij fj = Aji fj = φi (xj )fj . (4.35) and Thus, the normal equations for the general linear least squares problem are (in index notation) [φi (xk )φj (xk )]aj = φi (xj )fj . (4.36) The order of this system is n, the number of basis functions. The straight line fit considered earlier is a special case of this formula. 4.3 Gram-Schmidt Orthogonalization One of the potential problems with the least squares fits described above is that the set of basis functions is not an orthogonal set (even though the functions may be independent), and the resulting system of equations (the normal equations) may be hard to solve (i.e, the system may be poorly conditioned). This situation can be improved by making the set of basis functions an orthogonal set. The Gram-Schmidt procedure is a method for finding an orthonormal basis for a set of independent vectors. Note that, given a set of vectors, we cannot simply use as our orthonormal basis the set {e(1) , e(2) , e(3) , . . .}, since the given vectors may define a subspace of a larger dimensional space (e.g., a skewed plane in R3 ). Consider the following problem: Given a set of independent vectors a(1) , a(2) , a(3) , . . ., find a set of orthonormal vectors q(1) , q(2) , q(3) , . . . that span the same space. (A set of vectors is orthonormal if the vectors in the set are both mutually orthogonal and normalized to unit length.) We start the Gram-Schmidt procedure by choosing the first unit basis vector to be parallel to the first vector in the set: a(1) q(1) = (1) , (4.37) |a | as shown in Fig. 13. We define the second unit basis vector so that it is coplanar with a(1) and a(2) : a(2) − (a(2) · q(1) )q(1) . (4.38) q(2) = (2) |a − (a(2) · q(1) )q(1) | 53 ✒ a(2) − (a(2) · q(1) )q(1) ▼❇ ❇ ❇ ✏✏ a(2) ✶ ✏ ✏✏ ✏ ✶ ✏ ✏✏ a(1) q(1) Figure 13: The First Two Vectors in Gram-Schmidt. Since the third unit basis vector must be orthogonal to the first two, we create q(3) by subtracting from a(3) any part of a(3) which is parallel to either q(1) or q(2) . Hence, q(3) = q (4) and so on. 4.4 a(3) − (a(3) · q(1) )q(1) − (a(3) · q(2) )q(2) , |a(3) − (a(3) · q(1) )q(1) − (a(3) · q(2) )q(2) | a(4) − (a(4) · q(1) )q(1) − (a(4) · q(2) )q(2) − (a(4) · q(3) )q(3) , = (4) |a − (a(4) · q(1) )q(1) − (a(4) · q(2) )q(2) − (a(4) · q(3) )q(3) | (4.39) (4.40) QR Factorization Consider an m × n matrix A whose columns a(1) , a(2) , a(3) , . . . , a(n) are independent. If we have an orthonormal basis q(1) , q(2) , q(3) , . . . , q(n) for the columns, then each column can be expanded in terms of the orthonormal basis:  a(1) = (a(1) · q(1) )q(1) + (a(1) · q(2) )q(2) + · · · + (a(1) · q(n) )q(n)    a(2) = (a(2) · q(1) )q(1) + (a(2) · q(2) )q(2) + · · · + (a(2) · q(n) )q(n)  , (4.41) ..  .   a(n) = (a(n) · q(1) )q(1) + (a(n) · q(2) )q(2) + · · · + (a(n) · q(n) )q(n)  or, in terms of matrices,       A = a(1) a(2) · · · a(n) = q(1) q(2) · · · q(n)    a(1) · q(1) a(2) · q(1) · · · a(1) · q(2) a(2) · q(2) · · ·   .. .. ..  . . . .  (1) (n) (2) (n) a ·q a ·q ··· (4.42) However, if the q vectors were obtained from a Gram-Schmidt process, q(1) is parallel to a(1) , and a(1) · q(2) = a(1) · q(3) = a(1) · q(4) = · · · = 0. (4.43) Also, q(2) is in the plane of a(1) and a(2) , in which case q(3) · a(2) = q(4) · a(2) = q(5) · a(2) = · · · = 0. Hence, Eq. 
4.42 becomes  A= a (1) a (2) ··· a (n)   = q (1) q (2) ··· q (n) 54      a(1) · q(1) a(2) · q(1) a(3) · q(1) 0 a(2) · q(2) a(3) · q(2) 0 0 a(3) · q(3) .. .. .. . . . (4.44) ··· ··· ··· .. .      (4.45) or A = QR, (4.46) where Q is the matrix formed with the orthonormal columns q(1) , q(2) , q(3) , . . . , q(n) of the Gram-Schmidt process, and R is an upper triangular matrix. We thus conclude that every m × n matrix with independent columns can be factored into A = QR. The benefit of the QR factorization is that it simplifies the least squares solution. We recall that the rectangular system Ax = b (4.47) with m > n was solved using least squares by solving the normal equations AT Ax = AT b. (4.48) A = QR (4.49) If A is factored into (where A and Q are m × n matrices, and R is an n × n matrix), Eq. 4.48 becomes RT QT QRx = RT QT b, (4.50) where, in block form, Thus,   q(1)T    q(2)T T Q Q= ..  .    q(n)T          q(1) q(2) · · · q  (n)     =    ··· 0 ··· 0   ··· 0   = I. . . ..  . .  0 0 0 ··· 1 1 0 0 .. . 0 1 0 .. . 0 0 1 .. . RT Rx = RT QT b, (4.51) (4.52) and, since R is upper triangular and invertible, the normal equations become Rx = QT b. (4.53) This system can be solved very quickly, since R is an upper triangular matrix. However, the major computational expense of the least squares problem is factoring A into QR, i.e., performing the Gram-Schmidt process to find Q and R. 5 Fourier Series Consider two functions f (x) and g(x) defined in the interval [a, b] (i.e., a ≤ x ≤ b). We define the inner product (or scalar product) of f and g as Z b f (x)g(x) dx. (5.1) (f, g) = a Note the analogy between this definition for functions and the dot product for vectors. 55 We define the norm Nf of a function f as 1 Nf = (f, f ) 2 = s Z b [f (x)]2 dx. (5.2) a Note that a new function f¯(x) defined as f (x) f¯(x) = Nf (5.3) has unit norm, i.e., (f¯, f¯) = 1. The function f¯ is said to be normalized over [a, b]. Thus, this definition of the norm of a function is analogous to the length of a vector. We define two functions f and g as orthogonal over [a, b] if (f, g) = 0. Orthogonality of functions is analogous to the orthogonality of vectors. We define the set of functions φi (x), i = 1, 2, . . . , as an orthonormal set over [a, b] if (φi , φj ) = δij , where δij is the Kronecker delta defined as  1, i = j, (5.4) δij = 0, i 6= j. For example, the set of functions {1, cos x, sin x, cos 2x, sin 2x, cos 3x, sin 3x, . . .} is an orthogonal set over [−π, π], since, for integers m and n, (cos nx, sin mx) = 0, (sin nx, sin mx) = (cos nx, cos mx) = 0, m 6= n. To compute the norms of these functions, we note that Z π (1)2 dx = 2π, −π Z π Z π 2 cos2 nx dx = π sin nx dx = −π (5.5) (5.6) −π for all integers n. Thus, the set of functions   cos x sin x cos 2x sin 2x 1 √ , √ , √ , √ , √ , ... π π π π 2π is an orthonormal set over [−π, π]. The Fourier series of a function defined over [−L, L] is an expansion of the function into sines and cosines: ∞ ∞ X nπx X nπx + . (5.7) Bn sin f (x) = A0 + An cos L L n=1 n=1 This equation is an expansion in functions orthogonal over [−L, L], analogous to the expansion of a vector in orthogonal basis vectors. The basis functions here are   πx 2πx 2πx 3πx 3πx πx , sin , cos , sin , cos , sin , ... . 1, cos L L L L L L 56 To determine the coefficients An , Bn of the Fourier series, we take the inner product of both sides of Eq. 5.7 with each basis function; that is, we multiply both sides of that equation by a basis function, and integrate. 
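The orthogonality relations quoted above for the trigonometric functions can be checked by simple numerical integration. The sketch below (assuming NumPy) approximates the inner product of Eq. 5.1 on [-pi, pi] with a midpoint rule and confirms that the normalized functions form an orthonormal set; the grid size is an arbitrary choice.

import numpy as np

n  = 2000
x  = np.linspace(-np.pi, np.pi, n, endpoint=False) + np.pi / n   # cell midpoints
dx = 2.0 * np.pi / n

def inner(f, g):                          # numerical version of Eq. 5.1 on [-pi, pi]
    return np.sum(f(x) * g(x)) * dx

basis = [lambda t: np.ones_like(t) / np.sqrt(2.0 * np.pi),
         lambda t: np.cos(t) / np.sqrt(np.pi),
         lambda t: np.sin(t) / np.sqrt(np.pi),
         lambda t: np.cos(2.0 * t) / np.sqrt(np.pi),
         lambda t: np.sin(2.0 * t) / np.sqrt(np.pi)]

G = np.array([[inner(f, g) for g in basis] for f in basis])
print(np.round(G, 8))                     # approximately the 5 x 5 identity matrix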
For example, Z Z Z L ∞ X mπx mπx nπx cos dx + cos dx cos An L L L −L −L n=1  Z L ∞ X mπx nπx A0 (2L), m = 0, + cos dx = Bn sin Am (L), m 6= 0, L L −L n=1 L mπx f (x) cos dx =A0 L −L L since the integrals in Eq. 5.8 are Z L mπx dx = cos L −L Z L nπx mπx cos cos dx = L L −L Z L Z −L  2L, m = 0, 0, m = 6 0, L nπx mπx sin sin dx = L L −L sin (5.8) (5.9)  L, m = n, 0, m = 6 n, nπx mπx cos dx = 0 L L for integers m and n. Similarly, we can evaluate Bn . Thus, the coefficients An , Bn Fourier series, Eq. 5.7, are Z L 1 f (x) dx, A0 = 2L −L Z nπx 1 L f (x) cos An = dx, n > 0, L −L L Z nπx 1 L f (x) sin dx. Bn = L −L L (5.10) (5.11) in the (5.12) (5.13) (5.14) Note that A0 is the average value of f (x) over the domain [−L, L]. The evaluation of the coefficients using integrals could have alternatively been expressed in terms of inner products, in which case, from Eq. 5.7 (with L = π), (f, cos mx) = Am (cos mx, cos mx) or An = (f, cos nx) . (cos nx, cos nx) (5.15) (5.16) It is interesting to note the similarity between this last formula and the formula of the vector projection p of a vector b onto another vector a (Fig. 14):   a a b·a p= b· = a. (5.17) |a| |a| a·a Thus, the nth Fourier coefficient can be interpreted as being the “projection” of the function f on the nth basis function (cos nx). 57 b ✒ ✏ ✏✏ ✏✏ ✶ ✏ ✏✏ ✶ a p Figure 14: The Projection of One Vector onto Another. 5.1 Example We evaluate the Fourier series for the function  0, −5 < x < 0, f (x) = 3, 0 < x < 5. (5.18) From Eqs. 5.12–5.14 (with L = 5), the Fourier coefficients are Z 5 Z 5 1 1 f (x) dx = 3 dx = 3/2, A0 = 10 −5 10 0 Z Z 1 5 1 5 nπx nπx dx = dx = 0, n > 0, f (x) cos 3 cos An = 5 −5 5 5 0 5  Z Z 1 5 1 5 nπx nπx 6/(nπ), n odd, Bn = dx = dx = f (x) sin 3 sin 0, n even. 5 −5 5 5 0 5 (5.19) (5.20) (5.21) Thus the corresponding Fourier series is   nπx 3 6 πx 1 3πx 1 5πx 3 6 X 1 sin = + sin + sin + sin + ··· . f (x) = + 2 π n odd n 5 2 π 5 3 5 5 5 (5.22) The convergence of this series can be seen by evaluating the partial sums f (x) ≈ fN (x) = A0 + N X nπx nπx X + Bn sin L L n=1 N An cos n=1 (5.23) for different values of the upper limit N . Four such representations of f (x) are shown in Fig. 15 for N = 5, 10, 20, and 40. Since half the sine terms and all the cosine terms (except the constant term) are zero, the four partial sums have 4, 6, 11, and 21 nonzero terms, respectively. Notice also in Fig. 15 that, as the number of terms included in the Fourier series increases, the overshoot which occurs at a discontinuity does not diminish. This property of Fourier series is referred to as the Gibbs phenomenom. Notice also that, at a discontinuity of the function, the series converges to the average value of the function at the discontinuity. 5.2 Generalized Fourier Series The expansion of a function in a series of sines and cosines is a special case of an expansion in other orthogonal functions. In the generalized Fourier series representation of the function 58 y ✻ y ✻ 3 3 N =5 N = 10 ✲ -5 5 ✲ x -5 5 y ✻ x y ✻ 3 3 N = 20 N = 40 ✲ -5 5 ✲ x -5 5 x Figure 15: Convergence of Series in Example. f (x), f (x) ≈ N X ci φi (x), (5.24) i=1 we wish to evaluate the coefficients ci so as to minimize the mean square error #2 Z b" N X MN = f (x) − ci φi (x) dx, a (5.25) i=1 where (φi , φj ) = δij (Kronecker delta). Thus, the series representation in Eq. 5.24 is an expansion in orthonormal functions. 
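The convergence behavior described in Section 5.1 is easy to reproduce. The brief sketch below (assuming NumPy) evaluates the partial sums of Eq. 5.22 for the step function of Eq. 5.18 at a few sample points; note that at the jump x = 0 every partial sum returns 3/2, the average of the two one-sided values.

import numpy as np

def f_series(x, N):                      # partial sum of Eq. 5.22 with N sine terms
    s = 1.5 * np.ones_like(x)
    for n in range(1, N + 1, 2):         # only odd n contribute
        s += (6.0 / (n * np.pi)) * np.sin(n * np.pi * x / 5.0)
    return s

x = np.array([-2.5, 0.0, 2.5])
for N in (5, 10, 20, 40):
    print(N, np.round(f_series(x, N), 4))
# The sums approach 0 at x = -2.5 and 3 at x = 2.5, while the value at the
# discontinuity x = 0 is always 1.5.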
To minimize MN , we set, for each j, # Z b " N X ∂MN 0= 2 f (x) − = ci φi (x) [−φj (x)] dx, (5.26) ∂cj a i=1 or Z b f (x)φj (x) dx = a N X i=1 That is, ci Z b φi (x)φj (x) dx = a N X ci δij = cj . (5.27) i=1 ci = (f, φi ). (5.28) Alternatively, had we started with the series, Eq. 5.24, we would have obtained the same result by taking the inner product of each side with φj : (f, φj ) = N X ci (φi , φj ) = i=1 N X ci δij = cj . (5.29) i=1 The expansion coefficients ci are called the Fourier coefficients, and the series is called the generalized Fourier series. The generalized Fourier series thus has the property that the 59 series representation of a function minimizes the mean square error between the series and the function. The formula for the generalized Fourier coefficient ci in Eq. 5.28 can be interpreted as the projection of the function f (x) on the ith basis function φi . This interpretation is analogous to the expansion of a vector in terms of the unit basis vectors in the coordinate directions. For example, in three-dimensions, the vector v is, in Cartesian coordinates, v = v x ex + v y e y + v z e z , (5.30) where the components vx , vy , and vz are the projections of v on the coordinate basis vectors ex , ey , and ez . That is, vx = v · ex , vy = v · ey , v z = v · ez . (5.31) These components are analogous to the Fourier coefficients ci . From Eq. 5.25, we can deduce some additional properties of the Fourier series. From that equation, the mean square error after summing N terms of the series is ! N ! Z b X N N X X MN = (f, f ) − 2 ci (f, φi ) + ci φi cj φj dx a i=1 = (f, f ) − 2 = (f, f ) − 2 = (f, f ) − N X c2i + i=1 j=1 ci cj (φi , φj ) i=1 j=1 i=1 N X N N X X c2i i=1 N X + N X c2i i=1 c2i . (5.32) i=1 Since, by definition, MN ≥ 0, this last equation implies N X i=1 c2i ≤ (f, f ), (5.33) which is referred to as Bessel’s Inequality. Since the right-hand side of this inequality (the square of the norm) is fixed for a given function, and each term of the left-hand side is non-negative, a necessary condition for a Fourier series to converge is that ci → 0 as i → ∞. From Eq. 5.32, the error after summing N terms of the series is MN = (f, f ) − N X c2i , (5.34) i=1 so that the error after N + 1 terms is MN +1 = (f, f ) − N +1 X i=1 c2i = (f, f ) − 60 N X i=1 c2i − c2N +1 = MN − c2N +1 . (5.35) ✻ f (x) q ✲ x Figure 16: A Function with a Jump Discontinuity. That is, adding a new term in the series decreases the error (if that new term is nonzero). Thus, in general, higher order approximations are better than lower order approximations. The set of basis functions φi (x) is defined as a complete set if #2 Z b" N X lim MN = lim f (x) − (5.36) ci φi (x) dx = 0. N →∞ N →∞ a i=1 If this equation holds, the Fourier series is said to converge in the mean. Having a complete set means that the set of basis functions φi (x) is not missing any members. Thus, we could alternatively define the set φi (x) as complete if there is no function with positive norm which is orthogonal to each basis function φi (x) in the set. It turns out that the set of basis functions in Eq. 5.7 forms a complete set in [−L, L]. Note that convergence in the mean does not imply convergence pointwise. That is, it may not be true that N X (5.37) ci φi (x) = 0 lim f (x) − N →∞ i=1 for all x. For example, the series representation of a discontinuous function f (x) will not have pointwise convergence at the discontinuity (Fig. 16). 
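The monotone decrease of MN and Bessel's inequality can be observed numerically. The sketch below (assuming NumPy) expands the sample function f(x) = x on [-pi, pi] in the orthonormal functions sin(nx)/sqrt(pi), computes the coefficients ci = (f, phi_i) by numerical integration, and tracks MN from Eq. 5.34; the choice of f and of the basis is only for illustration.

import numpy as np

m  = 20000
x  = np.linspace(-np.pi, np.pi, m, endpoint=False) + np.pi / m
dx = 2.0 * np.pi / m

f  = x                                      # sample function f(x) = x
ff = np.sum(f * f) * dx                     # (f, f)

M = ff
for i in range(1, 9):
    phi = np.sin(i * x) / np.sqrt(np.pi)    # orthonormal basis function
    ci  = np.sum(f * phi) * dx              # ci = (f, phi_i), Eq. 5.28
    M  -= ci**2                             # Eq. 5.35: each nonzero term lowers the error
    print(i, round(ci, 4), round(M, 4))     # M remains >= 0 (Bessel's inequality)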
The vector analog to completeness for sets of functions can be illustrated by observing that, to express an arbitrary three-dimensional vector in terms of a sum of coordinate basis vectors, three basis vectors are needed. A basis consisting of only two unit vectors is incomplete, since it in incapable of representing arbitrary vectors in three dimensions. 5.3 Fourier Expansions Using a Polynomial Basis Recall, in the generalized Fourier series, we represent the function f (x) with the expansion f (x) = N X ci φi (x), (5.38) i=1 although the equality is approximate if N is finite. Let the basis functions be {φi } = {1, x, x2 , x3 , . . .}, (5.39) which is a set of independent, nonorthogonal functions. If we take the inner product of both sides of Eq. 5.38 with φj , we obtain (f, φj ) = N X i=1 61 ci (φi , φj ), (5.40) where, in general, since the basis functions are nonorthogonal, (φi , φj ) 6= 0 for i 6= j. (5.41) Thus, Eq. 5.40 is a set of coupled equations which can be solved to yield ci : (f, φ1 ) = (φ1 , φ1 )c1 + (φ1 , φ2 )c2 + (φ1 , φ3 )c3 + · · · (f, φ2 ) = (φ2 , φ1 )c1 + (φ2 , φ2 )c2 + (φ2 , φ3 )c3 + · · · (f, φ3 ) = (φ3 , φ1 )c1 + (φ3 , φ2 )c2 + (φ3 , φ3 )c3 + · · · .. . (5.42) Ac = b, (5.43) or where Aij = (φi , φj ),    c1         c  2   c3 c= ,  ..     .      c   N   (f, φ1 )      (f, φ2 ) (f, φ3 ) b=  ..   .    (f, φ ) N and φi (x) = xi−1 . If we assume the domain 0 ≤ x ≤ 1, we obtain Aij = or Z 1 x i−1 j−1 x dx = 0     A=    Z 1 1 2 1 3 1 4 .. . 1 xi+j−2 dx = 0 1 2 1 3 1 4 1 5 .. . 1 3 1 4 1 5 1 6 .. . which is referred to as the Hilbert matrix. There are several problems with this approach: 1 4 1 5 1 6 1 7 .. . xi+j−1 i+j−1 ··· ··· ··· ··· .. .              1 = 0 , 1 i+j−1 (5.44) (5.45)     ,    (5.46) 1. Since the set {φi } is not orthogonal, we must solve a set of coupled equations to obtain the coefficients ci . 2. The coefficients ci depend on the number N of terms in the expansion. That is, to add more terms, all ci change, not just the new coefficients. 3. The resulting coefficient matrix, the Hilbert matrix, is terribly conditioned even for very small matrices. 62 ✒ a(2) − (a(2) · q(1) )q(1) ▼❇ ❇ ❇ ✏✏ a(2) ✶ ✏ ✏✏ ✏ ✶ ✏ ✏✏ a(1) q(1) Figure 17: The First Two Vectors in Gram-Schmidt. These problems can be eliminated if we switch to a set of orthogonal polynomials, which can be found using the Gram-Schmidt process. Let φi (x) denote the nonorthogonal set of basis functions φ0 (x) = 1, φ1 (x) = x, φ2 (x) = x2 , . . . , (5.47) where we have chosen to count the functions starting from zero. We let Pi (x) denote the set of orthogonal basis functions to be determined. These functions will not be required to have unit norm. To facilitate the application of Gram-Schmidt to functions, we recall the Gram-Schmidt algorithm for vectors. Given a set of independent, nonorthogonal vectors a(1) , a(2) , a(3) , . . ., this procedure computes a set of orthonormal vectors q(1) , q(2) , q(3) , . . . that span the same space. For example, the first two orthonormal vectors are q(1) = a(1) , |a(1) | q(2) = a(2) − (a(2) · q(1) )q(1) , |a(2) − (a(2) · q(1) )q(1) | (5.48) as illustrated in Fig. 17. For convenience, we choose the domain −1 ≤ x ≤ 1, in which case (φi , φj ) = 0, i + j = odd, (5.49) since the integral of an odd power of x over symmetrical limits is zero (i.e., the integrand is an odd function of x). 
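The claim above that the Hilbert matrix of Eq. 5.46 is terribly conditioned is easy to quantify. The following sketch (assuming NumPy) builds the n x n Hilbert matrix and prints its condition number for a few small n; the sizes shown are arbitrary.

import numpy as np

def hilbert(n):
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)   # A_ij = 1/(i + j - 1), Eq. 5.46

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))
# The condition number grows roughly exponentially with n, so the normal
# equations built from the monomial basis 1, x, x^2, ... become unreliable
# in floating-point arithmetic even for modest n.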
Thus, by analogy with Gram-Schmidt for vectors, we obtain P0 (x) = φ0 (x) = 1, P1 (x) = φ1 (x) − (φ1 , P0 ) P2 (x) = φ2 (x)−(φ2 , P0 ) where Thus, 1 P0 (x) = x − (x, 1) = x − 0 = x, (P0 , P0 ) (1, 1) (5.51) P0 (x) P1 (x) 1 x −(φ2 , P1 ) = x2 −(x2 , 1) −(x2 , x) , (5.52) (P0 , P0 ) (P1 , P1 ) (1, 1) (x, x) Z 1 2 x dx = , (x , 1) = 3 −1 2 (5.50) 2 (1, 1) = Z 1 dx = 2, (x2 , x) = 0. (5.53) −1 1 P2 (x) = x2 − , (5.54) 3 and so on. These polynomials, called Legendre polynomials, are orthogonal over [-1,1]. Because of their orthogonality, the Legendre polynomials are a better choice for polynomial expansions than the polynomials of the Taylor series, Eq. 5.47. 63 With the basis functions φi given by the Legendre polynomials, Eq. 5.40 becomes (f, φj ) = N X ci (φi , φj ) = cj (φj , φj ) (no sum on j), (5.55) i=1 since each equation in Eq. 5.40 has only one nonzero term on the right-hand side (the term i = j). 5.4 Similarity of Fourier Series With Least Squares Fitting In the least squares problem, given a finite number m of points (xi , fi ) and the n-term fitting function p(x) = ai φi (x), (5.56) the normal equations are, from Eq. 4.36, [φi (xk )φj (xk )]aj = fj φi (xj ), (5.57) where the order of the resulting system is n, the number of basis functions. The summation convention is used, so that a summation is implied over repeated indices. In the Fourier series problem, given a function f (x) and the generalized Fourier series f (x) = ci φi (x), (5.58) (φi , φj )cj = (f, φi ), (5.59) the coefficients are determined from where the summation convention is used again. Note the similarities between Eqs. 5.57 and 5.59. These two equations are essentially the same, except that Eq. 5.57 is the discrete form of Eq. 5.59. Also, the basis functions in Eq. 5.59 are orthogonal, so that Eq. 5.59 is a diagonal system. 6 Eigenvalue Problems If, for a square matrix A of order n, there exists a vector x 6= 0 and a number λ such that Ax = λx, (6.1) λ is called an eigenvalue of A, and x is the corresponding eigenvector. Note that Eq. 6.1 can alternatively be written in the form Ax = λIx, (6.2) (A − λI)x = 0, (6.3) where I is the identity matrix, or 64 ✲ k1 k2 m1 ✁❆❆✁✁❆❆✁✁ ❡ ✲ u1 (t) ✁✁❆❆✁✁❆❆✁✁ ❡ u2 (t) ✲ m2 ❡ f2 (t) ❡ Figure 18: 2-DOF Mass-Spring System. which is a system of n linear algebraic equations. If the coefficient matrix A − λI were nonsingular, the unique solution of Eq. 6.3 would be x = 0, which is not of interest. Thus, to obtain a nonzero solution of the eigenvalue problem, A − λI must be singular, which implies that det(A − λI) = a11 − λ a12 a13 a21 a22 − λ a23 a31 a32 a33 − λ .. .. .. . . . an1 an2 an3 ··· ··· ··· ... a1n a2n a3n .. . ··· ann − λ = 0. (6.4) We note that this determinant, when expanded, yields a polynomial of degree n referred to as the characteristic polynomial associated with the matrix. The equation obtained by equating this polynomial with zero is referred to as the characteristic equation for the matrix. The eigenvalues are thus the roots of the characteristic polynomial. Since a polynomial of degree n has n roots, A has n eigenvalues (not necessarily real). Eigenvalue problems arise in many physical applications, including free vibrations of mechanical systems, buckling of structures, and the calculation of principal axes of stress, strain, and inertia. 6.1 Example 1: Mechanical Vibrations Consider the undamped two-degree-of-freedom mass-spring system shown in Fig. 18. We let u1 and u2 denote the displacements from the equilibrium of the two masses m1 and m2 . 
The stiffnesses of the two springs are k1 and k2 . For the purposes of computation, let k1 = k2 = 1 and m1 = m2 = 1. From Newton’s second law of motion (F = ma), the equations of motion of this system are  ü1 + 2u1 − u2 = 0 , (6.5) ü2 − u1 + u2 = f2 where dots denote differentiation with respect to the time t. In matrix notation, this system can be written         ü1 u1 1 0 2 −1 0 + = (6.6) ü2 u2 0 1 −1 1 f2 or Mü + Ku = F, (6.7) where M=  1 0 0 1  , K=  2 −1 −1 1 65  , F=  0 f2  . (6.8) These three matrices are referred to as the system’s mass, stiffness, and force matrices, respectively. With zero applied force (the free undamped vibration problem), Mü + Ku = 0. (6.9) We look for nonzero solutions of this equation in the form u = u0 cos ωt. (6.10) That is, we look for solutions of Eq. 6.9 which are sinusoidal in time and where both DOF oscillate with the same circular frequency ω. The vector u0 is the amplitude vector for the solution. The substitution of Eq. 6.10 into Eq. 6.9 yields − Mω 2 u0 cos ωt + Ku0 cos ωt = 0. (6.11) Since this equation must hold for all time t, (−ω 2 M + K)u0 = 0 (6.12) Ku0 = ω 2 Mu0 (6.13) M−1 Ku0 = ω 2 u0 . (6.14) or or This is an eigenvalue problem in standard form if we define A = M−1 K and ω 2 = λ: Au0 = λu0 (6.15) (A − λI)u0 = 0. (6.16) or For this problem, −1 A=M K=  1 0 0 1  2 −1 −1 1  =  2 −1 −1 1  . (6.17) Since we want the eigenvalues and eigenvectors of A, it follows from Eq. 6.16 that det(A − λI) = 2 − λ −1 −1 1 − λ = (2 − λ)(1 − λ) − 1 = λ2 − 3λ + 1 = 0. (6.18) Thus, √ 3± 5 ≈ 0.382 and 2.618 , λ= 2 and the circular frequencies ω (the square root of λ) are p p ω1 = λ1 = 0.618, ω2 = λ2 = 1.618. 66 (6.19) (6.20) (a) ✁ ❆❆✁✁❆❆✁✁ ✁ ❆❆✁✁❆❆✁✁ ✲ 0.618 (b) ✲ 1.000 ✁✁❆❆✁✁❆❆✁✁ ✁✁❆❆✁✁❆❆✁✁ ✲ 1.000 (c) ✁✁❆❆✁✁❆❆✁✁ ✛ 0.618 ✁✁❆❆✁✁ Figure 19: Mode Shapes for 2-DOF Mass-Spring System. (a) Undeformed shape, (b) Mode 1 (masses in-phase), (c) Mode 2 (masses out-of-phase). To obtain the eigenvectors, we solve Eq. 6.16 for each eigenvalue found. For the first eigenvalue, λ = 0.382, we obtain      u1 1.618 −1 0 , (6.21) = u2 −1 0.618 0 which is a redundant system of equations satisfied by   0.618 (1) . u = 1 (6.22) The superscript indicates that this is the eigenvector associated with the first eigenvalue. Note that, since the eigenvalue problem is a homogeneous problem, any nonzero multiple of u(1) is also an eigenvector. For the second eigenvalue, λ = 2.618, we obtain      0 u1 −0.618 −1 = , (6.23) u2 −1 −1.618 0 a solution of which is u (2) =  1 −0.618  . (6.24) Physically, ωi is a natural frequency of the system (in radians per second), and u(i) is the corresponding mode shape. An n-DOF system has n natural frequencies and mode shapes. Note that natural frequencies are not commonly expressed in radians per second, but in cycles per second or Hertz (Hz), where ω = 2πf . The lowest natural frequency for a system is referred to as the fundamental frequency for the system. The mode shapes for this example are sketched in Fig. 19. For Mode 1, the two masses are vibrating in-phase, and m2 has the larger displacement, as expected. For Mode 2, the two masses are vibrating out-of-phase. The determinant approach used above is fine for small matrices but not well-suited to numerical computation involving large matrices. 67 6.2 Properties of the Eigenvalue Problem There are several useful properties associated with the eigenvalue problem: Property 1. Eigenvectors are unique only up to an arbitrary multiplicative constant. From the eigenvalue problem, Eq. 
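In practice, one hands the matrix to a library eigensolver instead of expanding the determinant by hand. A minimal sketch (assuming NumPy) for the 2-DOF example above:

import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])              # A = M^{-1} K from Eq. 6.17

lam, X = np.linalg.eig(A)                 # eigenvalues and eigenvectors (columns)
order = np.argsort(lam)
lam, X = lam[order], X[:, order]

print(lam)                                # approximately 0.382 and 2.618, as in Eq. 6.19
print(np.sqrt(lam))                       # circular frequencies 0.618 and 1.618
print(X)                                  # mode shapes; each column may differ from
                                          # Eqs. 6.22 and 6.24 by a nonzero scale factor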
6.1, if x is a solution, αx is also a solution for any nonzero scalar α. We note that Eq. 6.1 is a homogeneous equation. Property 2. The eigenvalues of a triangular matrix are the diagonal entries. For example, let   1 4 5 A =  0 4 7 , (6.25) 0 0 9 for which the characteristic equation is det(A − λI) = 1−λ 4 5 0 4−λ 7 0 0 9−λ = (1 − λ)(4 − λ)(9 − λ) = 0, (6.26) which implies λ = 1, 4, 9. Although the eigenvalues of a triangular matrix are obvious, it turns out that the eigenvectors are not obvious. Property 3. The eigenvalues of a diagonal matrix are the diagonal entries, and the eigenvectors are            1 0 0                          1   0   0   0 , 0 , 1 , etc. (6.27)     ..  ..        0   .   .              0   0    ...   For example, let  11 0 0 A =  0 4 0 . 0 0 9  For λ1 = 11, x(1) since Ax(1) Similarly,    1  0 , =   0 (6.29)       11 0 0  1   11   1  0 = = 11 0 = λ1 x(1) . = 0 4 0  0       0 0 0 0 0 9  Ax(2)    0  =4 1 ,   0 Ax(3) 68 (6.28)    0  =9 0 .   1 (6.30) (6.31) Property 4. The sum of the eigenvalues of A is equal to the trace of A; i.e., for a matrix of order n, n X λi = tr A. (6.32) i=1 (The trace of a matrix is defined as the sum of the main diagonal terms.) To prove this property, we first note, from Eq. 6.4, that det(A − λI) = a11 − λ a12 a13 a21 a22 − λ a23 a31 a32 a33 − λ .. .. .. . . . an1 an2 an3 n ··· ··· ··· .. . a1n a2n a3n .. . ··· ann − λ = (−λ) + (a11 + a22 + · · · + ann )(−λ)n−1 + · · · . (6.33) No other products in the determinant contribute to the (−λ)n−1 term. For example, the cofactor of a12 deletes two matrix terms involving λ, so that the largest power of λ from that cofactor is n − 2. On the other hand, since det(A − λI) is a polynomial of degree n, we can write it in the factored form det(A − λI) = (λ1 − λ)(λ2 − λ) · · · (λn − λ) = (−λ)n + (λ1 + λ2 + · · · + λn )(−λ)n−1 + · · · + λ1 λ2 λ3 · · · λn . (6.34) Since the polynomials in Eqs. 6.33 and 6.34 must be identically equal for all λ, we equate the coefficients of the (−λ)n−1 terms to obtain λ1 + λ2 + · · · + λn = a11 + a22 + · · · + ann , (6.35) which is the desired result. Property 5. The product of the eigenvalues equals the determinant of A, i.e., λ1 λ2 · · · λn = det A. (6.36) This property results immediately from Eq. 6.34 by setting λ = 0. Property 6. If the eigenvectors of A are linearly independent, and a matrix S is formed whose columns are the eigenvectors, then the matrix product   λ1   λ2     −1 λ 3 S AS = Λ =  (6.37)    ...   λn 69 is diagonal. This property is proved in block form as follows:   AS = A x(1) x(2) · · · x(n)   = Ax(1) Ax(2) · · · Ax(n)   = λ1 x(1) λ2 x(2) · · · λn x(n)  λ1  λ2  (1) (2)   (n) λ3 = x x ··· x   ..  .  λn Since the columns of S are independent, S is invertible, and S−1 AS = Λ or A = SΛS−1 .     = SΛ.   (6.38) (6.39) Property 7. Distinct eigenvalues yield independent eigenvectors. To prove this property, we consider two eigensolutions: Ax(1) = λ1 x(1) (6.40) Ax(2) = λ2 x(2) . (6.41) To show that x(1) and x(2) are independent, we must show that c1 x(1) + c2 x(2) = 0 (6.42) implies c1 = c2 = 0. We first multiply Eq. 6.42 by A to obtain c1 Ax(1) + c2 Ax(2) = c1 λ1 x(1) + c2 λ2 x(2) = 0. (6.43) We then multiply Eq. 6.42 by λ2 to obtain c1 λ2 x(1) + c2 λ2 x(2) = 0. 
(6.44) If we subtract these last two equations, we obtain c1 (λ1 − λ2 )x(1) = 0, (6.45) which implies c1 = 0, since the eigenvalues are distinct (λ1 6= λ2 ). Similarly, c2 = 0, and the eigenvectors are independent. From the last two properties, we conclude that a matrix with distinct eigenvalues can always be diagonalized. It turns out that a matrix with repeated eigenvalues cannot always be diagonalized. Property 8. The eigenvalues and eigenvectors of a real, symmetric matrix are real. To prove this property, we consider the eigenvalue problem Ax = λx, 70 (6.46) where A is a real, symmetric matrix. If we take the complex conjugate of both sides of this equation, we obtain Ax∗ = λ∗ x∗ , (6.47) where the asterisk (*) denotes the complex conjugate. The goal is to show that λ = λ∗ . We multiply Eq. 6.46 by x∗T (the conjugate transpose) and Eq. 6.47 by xT to obtain x∗T Ax = λx∗T x, (6.48) xT Ax∗ = λ∗ xT x∗ , (6.49) where the two left-hand sides are equal, since they are the transposes of each other, and both are scalars (1 × 1 matrices). Thus, λx∗T x = λ∗ xT x∗ , (6.50) where x∗T x = xT x∗ 6= 0 implies λ = λ∗ (i.e., λ is real). The eigenvectors are also real, since they are obtained by solving the equation (A − λI)x = 0 (6.51) for x, given λ. In engineering applications, the form of the eigenvalue problem that is of greater interest than Eq. 6.46 is the generalized eigenvalue problem Ax = λBx, (6.52) where both A and B are real and symmetric. Having A and B real and symmetric is not sufficient, as is, to show that the generalized eigenvalue problem has only real eigenvalues, as can be seen from the example     0 1 1 0 , (6.53) , B= A= 1 0 0 −1 √ for which Eq. 6.52 has imaginary eigenvalues ±i, where i = −1. However, it turns out that, if B is not only real and symmetric but also positive definite (a term to be defined later in §6.8, p. 82), the generalized eigenvalue problem has real eigenvalues [10]. Since, in engineering applications, B is usually positive definite, the generalized eigenvalue problems that arise in engineering have real eigenvalues. Matrix formulations of engineering problems generally yield symmetric coefficient matrices, so this property is of great practical significance. As a result, various physical quantities of interest, which must be real, are guaranteed to be real, including the natural frequencies of vibration and mode shapes (the eigenvectors) of undamped mechanical systems, buckling loads in elastic stability problems, and principal stresses. Property 9. The eigenvectors of a real, symmetric matrix with distinct eigenvalues are mutually orthogonal. To prove this property, we consider from the outset the generalized eigenvalue problem Ax = λBx, (6.54) 71 where A and B are both real, symmetric matrices. For two eigenvalues λ1 and λ2 , Ax(1) = λ1 Bx(1) , (6.55) Ax(2) = λ2 Bx(2) , (6.56) from which it follows that the scalar λ1 x(2)T Bx(1) = x(2)T Ax(1) = x(1)T Ax(2) = λ2 x(1)T Bx(2) = λ2 x(2)T Bx(1) . (6.57) Thus, (λ1 − λ2 )x(2)T Bx(1) = 0, (6.58) and, if the eigenvalues are distinct (λ1 6= λ2 ), x(2)T Bx(1) = 0. (6.59) x(2)T Ax(1) = 0. (6.60) Also, since Ax = λBx, The eigenvectors are said to be orthogonal with respect to the matrices A and B. In mechanical vibrations, A represents the stiffness matrix, and B represents the mass matrix. For x an eigenvector, Eq. 6.54 implies the scalar equation xT Ax = λxT Bx (6.61) or xT Ax , (6.62) xT Bx which is referred to as the Rayleigh quotient. 
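These orthogonality relations and the Rayleigh quotient are easy to spot-check numerically. The following is a minimal sketch (Python, with NumPy and SciPy assumed available; scipy.linalg.eigh handles the real symmetric generalized problem) using the two-DOF matrices of Section 6.1, with A playing the role of the stiffness matrix and B the mass matrix:

import numpy as np
from scipy.linalg import eigh

# Two-DOF example of Section 6.1: A is the stiffness matrix, B the mass matrix
A = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])
B = np.eye(2)

lam, X = eigh(A, B)            # generalized symmetric problem A x = lambda B x
print(lam)                     # approximately [0.382, 2.618]
print(X.T @ B @ X)             # identity (eigh normalizes so that x^T B x = 1)
print(X.T @ A @ X)             # diagonal, with the eigenvalues on the diagonal

# The Rayleigh quotient recovers each eigenvalue
for x, l in zip(X.T, lam):
    print((x @ A @ x) / (x @ B @ x), l)

With this normalization, the off-diagonal entries of X.T @ B @ X and X.T @ A @ X vanish (to roundoff), consistent with the orthogonality relations just proved.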
In mechanical vibrations, the numerator and denominator are the generalized stiffness and mass, respectively, for a given vibration mode, and λ = ω 2 is the square of the circular frequency. The orthogonality relations proved here also imply that λ= ST AS = ΛST BS, (6.63) where S is the matrix of eigenvectors (in the columns), Λ is the diagonal matrix of eigenvalues defined in Eq. 6.37, and ST AS and ST BS are both diagonal matrices. To prove Eq. 6.63, we note that, with the matrix S of eigenvectors defined as   S = x(1) x(2) · · · , (6.64) it follows that  (1)T   x x(2)T ST AS =   ...      A  x(1) x(2)  x(1)T Ax(1) x(1)T Ax(2) · · ·   (2)T (1) x(2)T Ax(2) · · ·  · · · =  x Ax , .. .. .. . . .  (6.65) where the off-diagonal terms are zero by orthogonality. A similar expression applies for B. 72 If Ax = λx (i.e., B = I), and the eigenvectors are normalized to unit length, then the matrix S is orthogonal, and ST AS = ΛST IS = Λ (6.66) or A = SΛST . (6.67) This last result, referred to as the spectral theorem, means that a real, symmetric matrix can be factored into A = SΛST , where S has orthonormal eigenvectors in the columns, and Λ is a diagonal matrix of eigenvalues. Conversely, if the eigenvalues of a matrix A are real, and the eigenvectors are mutually orthogonal, A is necessarily symmetric. Since orthogonal eigenvectors are independent, Eq. 6.39 implies that A can be written in the form A = SΛS−1 , (6.68) where S is the matrix whose columns are the eigenvectors. If the orthogonal eigenvectors are normalized to unit length, S is also orthogonal (i.e., S−1 = ST ), and A = SΛST , (6.69) which is a symmetric matrix. 6.3 Example 2: Principal Axes of Stress The matrix of stress components in 3-D elasticity  σ11 σ12 σ =  σ21 σ22 σ31 σ32 is a tensor of rank 2:  σ13 σ23  , σ33 (6.70) where the diagonal components of the matrix are the normal (or direct) stresses, and the off-diagonal components are the shear stresses. This matrix is symmetric. Recall that an orthogonal transformation of coordinates transforms σ according to the rule σ ′ = RσRT , (6.71) where Rij = e′i · ej , (6.72) e′i and ej are unit vectors in the coordinate directions, and R is an orthogonal matrix (RRT = I). A key problem is to determine whether there is an orthogonal transformation of coordinates (i.e., a rotation of axes) x′ = Rx such that the stress tensor σ is diagonalized:     ′ s1 0 0 σ11 0 0 ′ 0  =  0 s2 0  , (6.73) σ ′ =  0 σ22 ′ 0 0 s3 0 0 σ33 where the diagonal stresses in this new coordinate system are denoted s1 , s2 , s3 . 73 From Eq. 6.71, σRT = RT σ ′ . (6.74) We now let vi denote the ith column of RT ; i.e., RT = [v1 v2 v3 ], (6.75) in which case Eq. 6.74 can be written in matrix form as   s1 0 0 σ[v1 v2 v3 ] = [v1 v2 v3 ]  0 s2 0  = [s1 v1 s2 v2 s3 v3 ]. 0 0 s3 (6.76) (The various matrices in this equation are conformable from a “block” point of view, since the left-hand side is, for example, the product of a 1 × 1 matrix with a 1 × 3 matrix.) Each column of this equation is σvi = si vi (no sum on i). (6.77) Thus, the original desire to find a coordinate rotation which would transform the stress tensor to diagonal form reduces to seeking vectors v such that σv = sv. (6.78) Eq. 6.78 is an eigenvalue problem with s the eigenvalue, and v the corresponding eigenvector. The goal in solving Eq. 6.78 (and eigenvalue problems in general) is to find nonzero vectors v which satisfy Eq. 6.78 for some scalar s. Geometrically, the goal in solving Eq. 
6.78 is to find nonzero vectors v which, when multiplied by the matrix σ, result in new vectors which are parallel to v. Eq. 6.78 is equivalent to the matrix system (σ − sI)v = 0. (6.79) For nontrivial solutions (v 6= 0), the matrix σ − sI must be singular, implying that det(σ − sI) = σ11 − s σ12 σ13 σ21 σ22 − s σ23 σ31 σ32 σ33 − s = 0, (6.80) which is referred to as the characteristic equation associated with the eigenvalue problem. The characteristic equation is a cubic polynomial in s, since, when expanded, yields (σ11 − s)[(σ22 − s)(σ33 − s) − σ23 σ32 ] − σ12 [σ21 (σ33 − s) − σ23 σ31 ] + σ13 [σ21 σ32 − σ31 (σ22 − s)] = 0 (6.81) or where − s3 + θ1 s2 − θ2 s + θ3 = 0,   θ1 = σ11 + σ22 + σ33 = σii = tr σ 2 2 2 θ2 = σ22 σ33 + σ33 σ11 + σ11 σ22 − σ31 − σ12 − σ23 = 21 (σii σjj − σij σji )  θ3 = det σ. 74 (6.82) (6.83) ✲ k1 =3 ✁❆ ✁✁❆❆✁✁ m1 =1 ❡ u1 (t) k2 =2 ✁❆❆✁✁❆❆✁✁ ✲ m2 =2 ❡ ❡ u2 (t) k3 =1 ✁❆❆✁✁❆❆✁✁ ✲ u3 (t) m3 =3 ❡ ❡ ❡ Figure 20: 3-DOF Mass-Spring System. The stresses s1 , s2 , s3 , which are the three solutions of the characteristic equation, are referred to as the principal stresses. The resulting stress tensor is   s1 0 0 (6.84) σ =  0 s2 0  , 0 0 s3 and the coordinate axes of the coordinate system in which the stress tensor is diagonal is referred to as the principal axes of stress or the principal coordinates. The principal axes of stress are the eigenvectors of the stress tensor, since the (normalized) eigenvectors are columns of RT (rows of R), where Rij = e′i · ej . (6.85) Thus, Row i of R consists of the direction cosines of the ith principal axis. Since the principal stresses are independent of the original coordinate system, the coefficients of the characteristic polynomial, Eq. 6.82, must be invariant with respect to a coordinate rotation. Thus, θ1 , θ2 , and θ3 are referred to as the invariants of the stress tensor. That is, the three invariants have the same values in all coordinate systems. In principal coordinates, the stress invariants are   θ1 = s1 + s2 + s3 θ2 = s2 s3 + s3 s1 + s1 s2 (6.86)  θ3 = s1 s2 s3 We note that the above theory for eigenvalues, principal axes, and invariants is applicable to all tensors of rank 2, since we did not use the fact that we were dealing specifically with stress. For example, strain is also a tensor of rank 2. A geometrical example of a tensor of rank 2 is the inertia matrix Iij whose matrix elements are moments of inertia. The determination of principal axes of inertia in two dimensions in an eigenvalue problem. 6.4 Computing Eigenvalues by Power Iteration Determinants are not a practical way to compute eigenvalues and eigenvectors except for very small systems (n = 2 or 3). Here we show a classical approach to solving the eigenproblem for large matrices. We will illustrate the procedure with a mechanical vibrations example. Consider the 3-DOF system shown in Fig. 20. For this system, the mass and stiffness matrices are     5 −2 0 1 0 0 3 −1  . (6.87) M =  0 2 0  , K =  −2 0 −1 1 0 0 3 75 The eigenvalue problem is, from Eq. 6.13, Ku = λMu (6.88) Au = λu, (6.89) or where u is the vector of displacement amplitudes, and      1 0 0 5 −2 0 5 −2 0      3 A = M−1 K =  0 21 0   −2 3 −1  =  −1 − 12  . 2 1 0 −1 1 0 − 13 0 0 13 3 (6.90) Notice that the product of two symmetric matrices need not be symmetric, even if the first matrix is diagonal. By definition, a solution (eigenvector) u of Eq. 6.89 is a vector such that Au is a scalar multiple of u, where the scalar multiple is the corresponding eigenvalue λ. 
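As a quick check of Eq. 6.90, the sketch below (NumPy assumed available) forms A = M⁻¹K for this three-DOF system and confirms that, although M and K are both symmetric, A is not:

import numpy as np

# Three-DOF system of Fig. 20
M = np.diag([1.0, 2.0, 3.0])
K = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  3.0, -1.0],
              [ 0.0, -1.0,  1.0]])

A = np.linalg.solve(M, K)      # A = M^(-1) K without forming the inverse explicitly
print(A)                       # matches Eq. 6.90
print(np.allclose(A, A.T))     # False: A is not symmetric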
Geometrically, u is an eigenvector if Au is parallel to u. That is, an eigenvector of a matrix A is a vector which is transformed by A into a new vector with the same orientation but (possibly) different length. Thus, our approach to computing a solution of Eq. 6.89 will be to attempt to guess an eigenvector, check if that vector is transformed by A into a parallel vector, and if, as expected, our guess is wrong, we try to improve upon the guess iteratively until convergence is achieved. For the matrix A given in Eq. 6.90, let our initial guess for an eigenvector be    1  (0) 0 , u = (6.91)   0 in which case Au(0)  5 −2 0   1  1  3 =  −1 −2  0 2   1 0 − 13 0 3        5 = −1     0       1.0 = 5 −0.2     0.0      = λ1 u(1) , (6.92) where, to allow easy comparison of u(1) with u(0) , we factored the largest component from the right-hand side vector. Since u(0) and u(1) are not parallel, u(0) is not an eigenvector. To determine if u(1) is an eigenvector, we repeat this calculation using u(1) :             5.40 1.00 1.0 5 −2 0         (1) 1  3 = = 5.40 = λ2 u(2) . (6.93) Au =  −1 −0.2 − −1.30 −0.24  2 2             1 0 − 13 0.0 0.067 0.01 3 Since u(1) and u(2) are not parallel, we conclude that u(1) is not an eigenvector, and we iterate again. It is useful to summarize all the iterations: 76              1   1.0   1.00   1.000   1.000   1.0000  −0.2 , 0 , −0.24 , −0.249 , −0.252 , · · · , −0.2518             0.0 0 0.01 0.015 0.016 0.0162 5.0 5.40 5.48 5.49 ··· 5.5036 Under each iterate for the eigenvector is shown the corresponding iterate for the eigenvalue. Thus, we conclude that one eigenvalue of A is 5.5036, and the corresponding eigenvector is    1.0000  −0.2518 .   0.0162 This iterative procedure is called the power method or power iteration. We now wish to determine which of the three eigenvalues has been found. A symmetric system of order n has n eigenvectors which are orthogonal. Hence, any vector of dimension n can be expanded in a linear combination of such vectors. Let the n eigenvectors of A be φ(1) , φ(2) , · · · , φ(n) , where |λ1 | < |λ2 | < · · · < |λn |. Then u(0) = c1 φ(1) + c2 φ(2) + · · · + cn φ(n) (6.94) and u(1) = Au(0) = c1 Aφ(1) + c2 Aφ(2) + · · · + cn Aφ(n) = c1 λ1 φ(1) + c2 λ2 φ(2) + · · · + cn λn φ(n) . (6.95) Similarly, u(2) = c1 λ21 φ(1) + c2 λ22 φ(2) + · · · + cn λ2n φ(n) , (6.96) and, in general, after r iterations, u(r) = c1 λr1 φ(1) + c2 λr2 φ(2) + · · · + cn λrn φ(n)   r  r  λ1 c 2 λ2 (1) (2) (n) r c1 = c n λn . φ + φ + ··· + φ c n λn c n λn Since λi <1 λn (6.97) (6.98) for all i < n, if follows that u(r) → cn λrn φ(n) as r → ∞. (6.99) This last equation shows that power iteration converges to the nth eigenvector (the one corresponding to the largest eigenvalue). However, the largest eigenvalue is rarely of interest in engineering applications. For example, in mechanical vibration problems, it is the low modes that are of interest. In fact, discrete element models such as finite element models are necessarily low frequency models, and the higher modes are not physically meaningful if the original problem was a continuum. In elastic stability problems, the buckling load factor turns out to be an eigenvalue, and it is the lowest buckling load which is normally of interest. 77 6.5 Inverse Iteration We recall that iteration with A in Au = λu, (6.100) yields the largest eigenvalue and corresponding eigenvector of A. 
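In code, the power iteration just summarized is only a few lines. The sketch below (NumPy assumed available; the fixed iteration count and the starting vector are arbitrary choices) scales each iterate by its largest-magnitude component, as in the hand calculation of the preceding section, and reproduces the values found there:

import numpy as np

A = np.array([[ 5.0, -2.0    ,  0.0    ],
              [-1.0,  1.5    , -0.5    ],
              [ 0.0, -1.0/3.0,  1.0/3.0]])   # A = M^(-1) K from Eq. 6.90

u = np.array([1.0, 0.0, 0.0])                # initial guess u^(0)
for _ in range(50):
    w = A @ u
    lam = w[np.argmax(np.abs(w))]            # scale factor = largest-magnitude component
    u = w / lam
print(lam)   # approx. 5.5036
print(u)     # approx. [1.0000, -0.2518, 0.0162]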
We could alternatively write the eigenvalue problem as 1 (6.101) A−1 u = u, λ so that iteration with A−1 would converge to the largest value of 1/λ and hence the smallest λ. This procedure is called inverse iteration. In practice, A−1 is not found, since computing A−1 is both expensive and unnecessary. Instead, one solves the system of equations with A as coefficient matrix: 1 u(k) = Au(k+1) , (6.102) λ where k is the iteration number. Since, for each iteration, the same coefficient matrix A is used, one can factor A = LU, save the factors, and repeat the forward-backward substitution (FBS) for each iteration. For the mechanical vibration problem, where A = M−1 K, Eq. 6.102 becomes Mu(k) = 1 Ku(k+1) . λ (6.103) K is factored once, and the LU factors saved. Each iteration requires an additional matrix multiplication and FBS. Consider the generalized eigenvalue problem Ku = λMu, (6.104) where K and M are both real, symmetric matrices. We can shift the origin of the eigenvalue scale with the transformation λ = λ0 + µ, (6.105) where λ0 is a specified shift. Eq. 6.104 then becomes Ku = (λ0 + µ)Mu, (6.106) (K − λ0 M)u = µMu, (6.107) or which is an eigenvalue problem with µ as the eigenvalue. This equation can be arranged for inverse iteration to obtain 1 (6.108) Mu(k) = (K − λ0 M)u(k+1) , µ where k is the iteration number, so that inverse iteration would converge to the smallest eigenvalue µ. Thus, by picking λ0 , we can find the eigenvalue closest to the shift point λ0 . This procedure is referred to as inverse iteration with a shift. The matrix which must be factored is K − λ0 M. A practical application of inverse iteration with a shift would be the mechanical vibration problem in which one seeks the vibration modes closest to a particular frequency. 78 v ✻ ✟ ✟✟ ✛ ✯ ✟ ✟✟ ✟✟ w· w ✲ (1) Mφ |Mφ(1) | ✲ ✲ Mφ(1) Figure 21: Geometrical Interpretation of Sweeping. 6.6 Iteration for Other Eigenvalues We saw from the preceding sections that power iteration converges to the largest eigenvalue, and inverse iteration converges to the smallest eigenvalue. It is often of interest to find other eigenvalues as well. Here we describe the modification to the power iteration algorithm to allow convergence to the second largest eigenvalue. Consider again the eigenvalue problem Au = λu, (6.109) where A = M−1 K, and assume we have already found the largest eigenvalue and its corresponding eigenvector φ(1) . Let φ(2) denote the eigenvector corresponding to the second largest eigenvalue. Thus, Aφ(2) = λφ(2) , (6.110) where, by orthogonality, φ(2)T Mφ(1) = 0 or φ(2) · Mφ(1) = 0. (6.111) (The orthogonality relation has been written in both matrix and vector notations.) To find φ(2) , given φ(1) , we will perform another power iteration but with the added requirement that, at each iteration, the vector used in the iteration is orthogonal to φ(1) . For example, consider the vector w as a trial second eigenvector. Fig. 21 shows the plane of the two vectors w and Mφ(1) . Since a unit vector in the direction of Mφ(1) is Mφ(1) , |Mφ(1) | the length of the projection of w onto Mφ(1) is w· Mφ(1) , |Mφ(1) | and the vector projection of w onto Mφ(1) is Mφ(1) w· |Mφ(1) | ! Mφ(1) . |Mφ(1) | Thus, a new trial vector v which is orthogonal to Mφ(1) is ! Mφ(1) Mφ(1) v̂ = w − w · . |Mφ(1) | |Mφ(1) | 79 (6.112) This procedure of subtracting from each trial vector (at each iteration) any components parallel to Mφ(1) is sometimes referred to as a sweeping procedure. 
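In code, the sweep of Eq. 6.112 amounts to one extra line inside the power iteration loop. The sketch below (NumPy assumed available; phi1 is the first eigenvector found earlier, and the iteration count is arbitrary) reproduces the spreadsheet calculation that follows:

import numpy as np

A = np.array([[ 5.0, -2.0    ,  0.0    ],
              [-1.0,  1.5    , -0.5    ],
              [ 0.0, -1.0/3.0,  1.0/3.0]])
M = np.diag([1.0, 2.0, 3.0])

phi1 = np.array([1.0, -0.2518, 0.0162])   # first mode, from power iteration
q = M @ phi1
q = q / np.linalg.norm(q)                 # unit vector along M phi^(1)

u = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    u = u - (u @ q) * q                   # sweep out the component along M phi^(1)
    w = A @ u
    lam = w[np.argmax(np.abs(w))]
    u = w / lam
print(lam)   # approx. 1.1751
print(u)     # approx. [0.5229, 1.0000, -0.3960]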
This procedure is also equivalent to the Gram-Schmidt orthogonalization procedure discussed previously, except that here we do not require that v be a unit vector. Note that, even though A is used in the iteration, we must use either M or K in the orthogonality relation rather than A or the identity I, since, in general, A is not symmetric (e.g., Eq. 6.90). With a nonsymmetric A, there is no orthogonality of the eigenvectors with respect to A or I. If A happens to be symmetric (a special case of the symmetric generalized eigenvalue problem), then the eigenvectors would be orthogonal with respect to A and I. Thus, even though we may iterate using A, the orthogonality requires M or K weighting. M is chosen, since it usually contains more zeros than K and, for the example to follow, is diagonal. To illustrate the sweeping procedure, we continue the mechanical vibrations example started earlier. Recall that, from power iteration,         5 −2 0 1 0 0 1.0000      (1) −1 3 1  A = M K =  −1 −2  , M =  0 2 0  , φ = −0.2518 , (6.113) 2     1 1 0 −3 0 0 3 0.0162 3 where Mφ(1)     1 0 0  1.0000   1.0000  −0.5036 , = =  0 2 0  −0.2518     0.0162 0.0486 0 0 3  (6.114) a vector whose length is |Mφ(1) | = 1.1207. Thus, the unit vector in the direction of Mφ(1) is   0.8923   (1) Mφ −0.4494 . = (6.115) |Mφ(1) |  0.0434  The iterations are most conveniently displayed with a spreadsheet: 80 i 0 1 2 3 4 5 6                   w(i)    1  0   0 0.5410 1.0401 −0.3655 0.6030 1.1552 −0.4505 0.6130 1.1725 −0.4634 0.6145 1.1747 −0.4651 0.6145 1.1751 −0.4653 0.6145 1.1751 −0.4653 α= Mφ(1) w · |Mφ (1) | (i) 0.8923                   −0.0005 −0.0006 −0.0001 0.0002 0.0000 0.0000 (1) Mφ w(i) − α |Mφ (1) |    0.2038  0.4010    −0.0387   0.5414  1.0399    −0.3655   0.6035  1.1549    −0.4505   0.6131  1.1725    −0.4634   0.6143  1.1748    −0.4651   0.6145  1.1751    −0.4653   0.6145  1.1751   −0.4653 v(i) λi 0.4010 1.0399 1.1549 1.1725 1.1748 1.1751 1.1751                      0.5082 1.0000 −0.0965 0.5206 1.0000 −0.3515 0.5226 1.0000 −0.3901 0.5229 1.0000 −0.3952 0.5229 1.0000 −0.3959 0.5229 1.0000 −0.3960 0.5229 1.0000 −0.3960                      Av(i) = w(i+1)    0.5410  1.0401    −0.3655   0.6030  1.1552    −0.4505   0.6130  1.1725    −0.4634   0.6145  1.1747    −0.4651   0.6145  1.1751    −0.4653   0.6145  1.1751   −0.4653 Thus, the second largest eigenvalue of A is 1.1751, and the corresponding eigenvector is    0.5229  1.0000 .   −0.3960 6.7 Similarity Transformations Two matrices A and B are defined as similar if B = M−1 AM for some invertible matrix M. The transformation from A to B (or vice versa) is called a similarity transformation. The main property of interest is that similar matrices have the same eigenvalues. To prove this property, consider the eigenvalue problem Ax = λx, (6.116) B = M−1 AM or A = MBM−1 . (6.117) and let The substitution of Eq. 6.117 into Eq. 6.116 yields MBM−1 x = λx (6.118) B(M−1 x) = λ(M−1 x), (6.119) or 81 which completes the proof. This equation also shows that, if x is an eigenvector of A, M−1 x is an eigenvector of B, and λ is the eigenvalue. A similarity transformation can be interpreted as a change of basis. Consider, for example, mechanical vibrations, where the eigenvalue problem computes the natural frequencies and mode shapes of vibration. 
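Before pursuing that interpretation, the invariance of the eigenvalues is easy to spot-check numerically. In the sketch below (NumPy assumed available), A and M are arbitrary, hypothetical matrices; a random M, which is almost surely of full rank, serves as the invertible transformation:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))       # arbitrary matrix
M = rng.standard_normal((4, 4))       # arbitrary (almost surely invertible) matrix

B = np.linalg.solve(M, A @ M)         # B = M^(-1) A M

eigA = np.sort_complex(np.linalg.eigvals(A))
eigB = np.sort_complex(np.linalg.eigvals(B))
print(np.allclose(eigA, eigB))        # True (to roundoff): similar matrices share eigenvalues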
If the eigenvalues (natural frequencies) do not change under the transformation, the system represented by A has not changed. However, the eigenvectors are transformed from x to M−1 x, a transformation which can be thought of as simply a change of coordinates. We recall from Property 6 of the eigenvalue problem (page 69) that, if a matrix M is formed with the eigenvectors of A in the columns, and the eigenvectors are independent,   λ1   λ2   −1 M AM = Λ =  (6.120) , . .   . λn where Λ is a diagonal matrix populated with the eigenvalues of A. Thus, the way to “simplify” a matrix (by diagonalizing it) is to find its eigenvectors. The eigenvalue problem is thus the basis for finding principal stresses and principal strains in elasticity and principal axes of inertia. 6.8 Positive Definite Matrices For a square matrix M, the scalar Q = xT Mx = x · Mx is called a quadratic form. In index notation, the quadratic form is Q= = n n X X Mij xi xj i=1 j=1 M11 x21 + (6.121) M22 x22 + · · · + (M12 + M21 )x1 x2 + (M13 + M31 )x1 x3 + · · · . (6.122) A matrix S is said to be symmetric if S = ST (i.e., Sij = Sji ). A matrix A is said to be antisymmetric (or skew symmetric) if A = −AT (i.e., Aij = −Aji ). For example, an antisymmetric matrix of order 3 is necessarily of the form   0 A12 A13 0 A23  . (6.123) A =  −A12 −A13 −A23 0 Any square matrix M can be written as the unique sum of a symmetric and an antisymmetric matrix M = S + A, where 1 1 S = ST = (M + MT ), A = −AT = (M − MT ). 2 2 For example, for a 3 × 3 matrix M,       0 2 −1 3 3 8 3 5 7 0 1  = S + A. M =  1 2 8  =  3 2 7  +  −2 1 −1 0 8 7 4 9 6 4 82 (6.124) (6.125) F1 ✟r ✟ ✙ ✟ r✁ r ✲ ✕✁ F2 F3 Figure 22: Elastic System Acted Upon by Forces. The antisymmetric part of a matrix does not contribute to the quadratic form, i.e., if A is antisymmetric, xT Ax = 0 for all vectors x. To prove this property, we note that xT Ax = (xT Ax)T = xT AT x = −xT Ax = 0, (6.126) since the scalar is equal to its own negative, thus completing the proof. Thus, for any square matrix M, xT Mx = xT Sx, (6.127) where S is the symmetric part of M. A square matrix M is defined as positive definite if xT Mx > 0 for all x 6= 0. M is defined as positive semi-definite if xT Mx ≥ 0 for all x 6= 0. Quadratic forms are of interest in engineering applications, since, physically, a quadratic form often corresponds to energy. For example, consider an elastic system (Fig. 22) acted upon by forces in static equilibrium. Let u denote the vector of displacements, and let F denote the vector of forces. If the forces and displacements are linearly related by generalized Hooke’s law, F = Ku, (6.128) where K is the system stiffness matrix. The work W done by the forces F acting through the displacements u is then 1 1 1 W = u · F = u · Ku = uT Ku, 2 2 2 (6.129) which is a quadratic form. Since the work required to deform a mechanical system in equilibrium (and with sufficient constraints to prevent rigid body motion) is positive for any set of nonzero displacements, 1 W = u · Ku > 0. (6.130) 2 That is, the stiffness matrix K for a mechanical system in equilibrium and constrained to prevent rigid body motion must be positive definite. The main property of interest for positive definite matrices is that a matrix M is positive definite if, and only if, all the eigenvalues of M are positive. To prove this statement, we must prove two properties: (1) positive definiteness implies positive eigenvalues, and (2) positive eigenvalues imply positive definiteness. 
We first prove that positive definiteness implies positive eigenvalues. For the ith eigenvalue λi , Mx(i) = λi x(i) , (6.131) 83 where x(i) is the corresponding eigenvector. If we form the dot product of both sides of this equation with x(i) , we obtain 2 0 < x(i) · Mx(i) = λi x(i) · x(i) = λi x(i) , (6.132) which implies λi > 0, thus completing the first part of the proof. We now prove that positive eigenvalues imply positive definiteness. Since, for a quadratic form, only the symmetric part of M matters, we can assume that M is symmetric. From Property 9 of the eigenvalue problem (page 71), the eigenvectors of M are mutually orthogonal, which implies that the eigenvectors are independent. We can therefore expand any vector x using an eigenvector basis: x= n X ci x(i) , (6.133) i=1 where x(i) is the ith eigenvector. Then, Mx = n X ci Mx(i) = i=1 n X ci λi x(i) , (6.134) i=1 which implies that x · Mx = n X j=1 cj x (j) · n X c i λi x (i) = n n X X i=1 j=1 i=1 ci cj λi x(i) · x(j) . (6.135) However, since the eigenvectors are mutually orthogonal, x(i) · x(j) = 0 for i 6= j. Therefore, x · Mx = n X 2 c2i λi x(i) , (6.136) (6.137) i=1 which is positive if all λi > 0, thus completing the second part of the proof. This important property establishes the connection between positive definiteness and eigenvalues. Two other consequences of positive-definiteness are 1. A positive definite matrix is non-singular, since Ax = 0 implies xT Ax = 0 implies x = 0 implies A−1 exists. 2. Gaussian elimination can be performed without pivoting. 6.9 Application to Differential Equations Consider the ordinary differential equation y ′ = ay, 84 (6.138) where y(t) is an unknown function to be determined, y ′ (t) is its derivative, and a is a constant. All solutions of this equation are of the form y(t) = y(0)eat , (6.139) where y(0) is the initial condition. The generalization of Eq. 6.138 to systems of ordinary differential equations is y1′ = a11 y1 + a12 y2 + · · · + a1n yn y2′ = a21 y1 + a22 y2 + · · · + a2n yn .. . (6.140) yn′ = an1 y1 + an2 y2 + · · · + ann yn , where y1 (t), y2 (t), . . . , yn (t) are the unknown functions to be determined, and the aij are constants. In matrix notation, this system can be written y′ = Ay, where     y 1     y2   , y= ..   .     y   n    A=  a11 a12 · · · a1n a21 a22 · · · a2n .. .. .. ... . . . an1 an2 · · · ann (6.141)    .  (6.142) Our principal interest in this section is in solving systems of coupled differential equations like Eq. 6.141. Note that, if the matrix A is a diagonal matrix, Eq. 6.141 is easy to solve, since each equation in the system involves only one unknown function (i.e., the system is uncoupled). Thus, to solve the system y′ = Ay for which A is not diagonal, we will attempt a change of variable y = Su, (6.143) where S is a square matrix of constants picked to yield a new system with a diagonal coefficient matrix. The substitution of Eq. 6.143 into Eq. 6.141 yields Su′ = ASu or where  u′ = S−1 AS u = Du, D = S−1 AS, (6.144) (6.145) (6.146) and we have assumed that S is invertible. The choice for S is now clear: S is the matrix which diagonalizes A. From Property 6 of the eigenvalue problem (page 69), we see that S is the matrix whose columns are the linearly independent eigenvectors of A, and D is the diagonal matrix of eigenvalues:   λ1   λ2     −1 λ 3 (6.147) D = S AS =   = Λ,   . .   . λn 85 where λi is the ith eigenvalue of A. 
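Numerically, this diagonalization is immediate, since np.linalg.eig returns the eigenvalues together with a matrix whose columns are the corresponding eigenvectors, which is exactly the S required here. A minimal sketch (the helper name decouple is arbitrary, and A is assumed to be diagonalizable):

import numpy as np

def decouple(A):
    """Return (S, Lam) such that S^(-1) A S = Lam is diagonal (A assumed diagonalizable)."""
    lam, S = np.linalg.eig(A)                            # columns of S are eigenvectors of A
    Lam = np.diag(lam)
    assert np.allclose(np.linalg.solve(S, A @ S), Lam)   # sanity check of Eq. 6.147
    return S, Lam

The three-step procedure stated next is exactly this construction, followed by the solution of the uncoupled scalar equations.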
Thus, we can use the following procedure for solving the coupled system of differential equations y′ = Ay: 1. Find the eigenvalues and eigenvectors of A. 2. Solve the uncoupled (diagonal) system u′ = Λu, where Λ is the diagonal matrix of eigenvalues of A. 3. Determine y from the equation y = Su, where S is the matrix whose columns are the eigenvectors of A. We illustrate this procedure by solving the system y1′ = y1 + y2 y2′ = 4y1 − 2y2 (6.148) with the initial conditions y1 (0) = 1, y2 (0) = 6. The coefficient matrix for this system is   1 1 . (6.149) A= 4 −2 Since the characteristic polynomial is det(A − λI) = 1−λ 1 4 −2 − λ = λ2 + λ − 6 = (λ + 3)(λ − 2), (6.150) the eigenvalues of A are λ = 2 and λ = −3. The corresponding eigenvectors are (1, 1) and (1, −4), respectively. Thus, a diagonalizing matrix for A is   1 1 , (6.151) S= 1 −4 and 1 Λ = S AS = 5 −1  4 1 1 −1  1 1 4 −2  1 1 1 −4  =  2 0 0 −3 Therefore, the substitution y = Su yields the new diagonal system   2 0 ′ u u = Λu = 0 −3  . (6.152) (6.153) or u′1 = 2u1 , u′2 = −3u2 , (6.154) u1 = c1 e2t , u2 = c2 e−3t . (6.155) whose general solutions are Thus, y, the original dependent variable of interest, is given by      1 1 u1 c1 e2t + c2 e−3t y = Su = = 1 −4 u2 c1 e2t − 4c2 e−3t 86 (6.156) or  y1 = c1 e2t + c2 e−3t y2 = c1 e2t − 4c2 e−3t . (6.157) The application of the initial conditions yields c1 + c2 = 1 c1 − 4c2 = 6 or c1 = 2, c2 = −1. Thus, the solution of the original system is  y1 = 2e2t − e−3t y2 = 2e2t + 4e−3t . Notice from Eq. 6.156 that the solution can be written in the form   u1 y = [v1 v2 ] = u1 v 1 + u 2 v 2 = c 1 e λ 1 t v 1 + c 2 e λ 2 t v 2 , u2 where the eigenvectors of A are the columns of S:     1 1 . , v2 = v1 = −4 1 (6.158) (6.159) (6.160) (6.161) Thus, in general, if the coefficient matrix A of the system y′ = Ay is diagonalizable, the general solution can be written in the form y = c1 e λ 1 t v 1 + c2 e λ 2 t v 2 + · · · + cn e λ n t v n , (6.162) where λ1 , λ2 , . . . , λn are the eigenvalues of A, and vi is the eigenvector corresponding to λi . A comparison of the matrix differential equation, Eq. 6.141, with the scalar equation, Eq. 6.138, indicates that the solution of Eq. 6.141 can be written in the form y(t) = eAt y(0), (6.163) where the exponential of the matrix At is formally defined [8] by the convergent power series eAt = I + At + (At)2 (At)3 + + ··· . 2! 3! (6.164) From Eq. 6.147, A = SΛS−1 , (6.165) where S is a matrix whose columns are the eigenvectors of A. Notice that, for integers k, Ak = (SΛS−1 )k = (SΛS−1 )(SΛS−1 ) · · · (SΛS−1 ) = SΛk S−1 , (6.166) since S−1 cancels S. Thus, SΛ2 S−1 t2 SΛ3 S−1 t3 + + ··· eAt = I + SΛS−1 t + 2! 3!  (Λt)2 (Λt)3 = S I + Λt + + + · · · S−1 = SeΛt S−1 . 2! 3! 87 (6.167) (6.168) Since Λt is a diagonal matrix, the powers (Λt)k are also diagonal, and eΛt is therefore the diagonal matrix   e λ1 t   e λ2 t   Λt e = (6.169) . . .   . e λn t Thus, the solution of y′ = Ay can be written in the form y(t) = SeΛt S−1 y(0), (6.170) where S is a matrix whose columns are the eigenvectors of A, Λ is the diagonal matrix of eigenvalues, and the exponential is given by Eq. 6.169. For the example of Eq. 6.148, Eq. 6.170 yields         2t   1 1 y1 2e2t − e−3t 1 e 0 4 1 1 = = , (6.171) y2 2e2t + 4e−3t 6 0 e−3t 1 −1 5 1 −4 in agreement with Eq. 6.159. Note that this discussion of first-order systems includes as a special case the nth-order equation with constant coefficients, since a single nth-order ODE is equivalent to a system of n first-order ODEs. 
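Eq. 6.170 is easy to verify numerically for this example. The sketch below (NumPy and SciPy assumed available; the time t = 0.7 is an arbitrary choice) evaluates S e^(Λt) S^(-1) y(0), compares it with the closed-form solution of Eq. 6.159, and, as an independent way of computing e^(At) y(0), with scipy.linalg.expm:

import numpy as np
from scipy.linalg import expm

A  = np.array([[1.0,  1.0],
               [4.0, -2.0]])        # coefficient matrix of Eq. 6.148
y0 = np.array([1.0, 6.0])           # initial conditions

lam, S = np.linalg.eig(A)           # eigenvalues 2 and -3, eigenvectors in the columns of S

t = 0.7                             # any time instant will do
y = S @ np.diag(np.exp(lam * t)) @ np.linalg.solve(S, y0)             # Eq. 6.170
print(y)
print(2*np.exp(2*t) - np.exp(-3*t), 2*np.exp(2*t) + 4*np.exp(-3*t))   # Eq. 6.159
print(expm(A * t) @ y0)             # independent check of e^(At) y(0)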
For example, the n-th order system an y (n) + an−1 y (n−1) + an−2 y (n−2) + · · · + a1 y ′ + a0 y = 0, an 6= 0, (6.172) can be replaced by a first-order system if we rename the derivatives as y (i) = yi+1 (t), in which case   y1′ = y2   ′    y2 = y3 .. (6.173) .    y ′ = yn    yn−1 ′ n = (−a0 y1 − a1 y2 − · · · − an−2 yn−1 − an−1 yn )/an . 6.10 Application to Structural Dynamics We saw in the discussion of mechanical vibrations (§6.1, p. 65) that the equation of motion for a linear, multi-DOF, undamped mechanical system is Mü + Ku = F(t), (6.174) where M is the system mass matrix, K is the system stiffness matrix, F is the time-dependent applied force vector, u(t) is the unknown displacement vector, and dots denote differentiation with respect to the time t. If the system is damped, a viscous damping matrix B is introduced, and the equations of motion become Mü + Bu̇ + Ku = F(t). 88 (6.175) All three system matrices are real and symmetric. Equations like this arise whether the system is a discrete element system such as in Fig. 20 or a continuum modeled with finite elements. The general problem is to integrate these equations in time to determine u(t), given the initial displacement u0 and initial velocity u̇0 . There are several approaches used to solve these equations, one of which is the Newmark method (Eq. 1.75) discussed on p. 15. Another approach expands the solution in terms of vibration modes (the modal superposition approach). We recall that, for the free, undamped vibration problem Ku = λMu, (6.176) the eigenvectors are orthogonal with respect to both the mass M and stiffness K (Property 9, p. 71). Thus, the eigenvectors could be used to form a basis for the solution space, and the unknown displacement vector in Eq. 6.175 could be written as a linear combination of the modes: u(t) = ξ1 (t)x(1) + ξ2 (t)x(2) + · · · + ξm (t)x(m) . (6.177) The number of terms in this expansion would be n (the number of physical DOF) if all eigenvectors were used in the expansion, but, in practice, a relatively small number m of modes is usually sufficient for acceptable accuracy. The modes selected are generally those having lowest frequency (eigenvalue). In this equation, the multipliers ξi (t) are referred to as the modal coordinates, which can be collected into the array     ξ (t) 1      ξ2 (t)  . (6.178) ξ(t) = ..   .      ξ (t)  m Also, like in Property 6 (p. 69), we form the matrix S whose columns are the eigenvectors:   S = x(1) x(2) · · · x(m) , (6.179) in which case Eq. 6.177 can be written in matrix form as u(t) = Sξ(t). (6.180) This equation can be interpreted as a transformation from physical coordinates u to modal coordinates ξ. If this equation is substituted into the equation of motion, Eq. 6.175, and multiplied by ST , we obtain ST MSξ̈ + ST BSξ̇ + ST KSξ = ST F(t) = P(t), (6.181) c = ST MS and K b = ST KS are, by orthogonality, diagonal matrices referred to where M as the modal mass and stiffness matrices, respectively. The diagonal entries in these two matrices are the generalized masses and stiffnesses for the individual vibration modes. The b = ST BS is, in general, not a diagonal matrix, since the eigenvalue modal damping matrix B problem did not involve B. Thus, in modal coordinates, the equations of motion are c ξ̈ + B b ξ̇ + Kξ b = P(t). M 89 (6.182) This last equation is similar to Eq. 
6.175 except that this equation has m DOF (the number of eigenvectors in the modal expansion), which generally is much smaller than the b is non-diagonal, both equations require a similar number n of physical DOF in Eq. 6.175. If B time integration. The main benefit of the modal formulation is that, in structural dynamics, it is common to approximate damping with a mode-by-mode frequency-dependent viscous b is diagonal, in which case the modal equations (Eq. 6.182) damping coefficient so that B uncouple into m scalar equations of the form mi ξ¨i + bi ξ˙i + ki ξi = Pi (t), (6.183) where mi and ki are the generalized mass and stiffness, respectively, for mode i. This is the equation of motion for the ith modal coordinate. Since it is a scalar equation with an arbitrary right-hand side, it has a known analytic solution in terms of an integral. Once all the scalar equations have been solved, the solution u(t) in terms of physical coordinates is obtained from Eq. 6.177. Thus we see that a multi-DOF dynamical system can be viewed as a collection of single-DOF mass-spring-damper systems associated with the modes. Most of the computational expense in the modal expansion method arises from solving the eigenvalue problem. Thus, the modal method is generally advantageous in structural dynamics if a small number of modes gives sufficient accuracy, if many time steps are required in the analysis, or if the structure has no localized damping treatments. Bibliography [1] J.W. Brown and R.V. Churchill. Fourier Series and Boundary Value Problems. McGraw-Hill, Inc., New York, seventh edition, 2006. [2] R.W. Clough and J. Penzien. Dynamics of Structures. McGraw-Hill, Inc., New York, second edition, 1993. [3] K.H. Huebner, D.L. Dewhirst, D.E. Smith, and T.G. Byrom. The Finite Element Method for Engineers. John Wiley and Sons, Inc., New York, fourth edition, 2001. [4] D.C. Kay. Schaum’s Outline of Theory and Problems of Tensor Calculus. Schaum’s Outline Series. McGraw-Hill, Inc., New York, 1988. [5] A.J. Laub. Matrix Analysis for Scientists and Engineers. Society for Industrial and Applied Mathematics, Philadelphia, 2005. [6] S. Lipschutz. 3000 Solved Problems in Linear Algebra. McGraw-Hill, Inc., New York, 1988. Schaum’s Outline Series. [7] S. Lipschutz. Schaum’s Outline of Theory and Problems of Linear Algebra. Schaum’s Outline Series. McGraw-Hill, Inc., New York, second edition, 1991. [8] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1):3–49, 2003. 90 [9] M.R. Spiegel. Schaum’s Outline of Fourier Analysis With Applications to Boundary Value Problems. Schaum’s Outline Series. McGraw-Hill, Inc., New York, 1974. [10] G. Strang. Linear Algebra and Its Applications. Thomson Brooks/Cole, Belmont, CA, fourth edition, 2006. [11] J.S. Vandergraft. Introduction to Numerical Calculations. Academic Press, Inc., New York, second edition, 1983. 