Exercises 03
3.38. Explain how QR factorization with column pivoting can be used to determine the rank of a matrix.

3.39. Explain why column pivoting can be used with the modified Gram-Schmidt orthogonalization procedure but not with the classical Gram-Schmidt procedure.

3.40. In terms of the condition number of the matrix A, compare the range of applicability of the normal equations method and the Householder QR method for solving the linear least squares problem Ax ≈ b [i.e., for what values of cond(A) can each method be expected to break down?].

3.41. Let A be an m × n matrix.
(a) What is the maximum number of nonzero singular values that A can have?
(b) If rank(A) = k, how many nonzero singular values does A have?

3.42. Let a be a nonzero column vector. Considered as an n × 1 matrix, a has only one positive singular value. What is its value?

3.43. Express the Euclidean condition number of a matrix in terms of its singular values.

3.44. List two reliable methods for determining the rank of a rectangular matrix numerically.

3.45. If A is a 2n × n matrix, rank the following methods according to the amount of work required to solve the linear least squares problem Ax ≈ b.
(a) QR factorization by Householder transformations
(b) Normal equations
(c) Singular value decomposition

3.46. List at least two applications for the singular value decomposition (SVD) of a matrix other than solving least squares problems.
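Exercises 3.41–3.44 can be explored numerically via the SVD. Below is a minimal NumPy sketch, assuming the usual tolerance-based notion of numerical rank and the identity cond₂(A) = σmax/σmin from Exercise 3.43; the rank-deficient test matrix is our own choice.

```python
import numpy as np

# A 4 x 3 test matrix of rank 2: row 2 = 2 * row 1, column 3 = column 1 + column 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending order

# Numerical rank: count singular values above a tolerance scaled by the largest.
tol = max(A.shape) * np.finfo(float).eps * sigma[0]
print("numerical rank:", int(np.sum(sigma > tol)))   # expect 2

# Euclidean condition number: sigma_max / sigma_min (cf. Exercise 3.43).
print("cond_2(A):", sigma[0] / sigma[-1])            # huge, since A is rank-deficient
```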
Exercises
3.1. If a vertical beam has a downward force applied at its lower end, the amount by which it stretches will be proportional to the magnitude of the force. Thus, the total length y of the beam is given by the equation

$$y = x_1 + x_2 t,$$

where x1 is its original length, t is the force applied, and x2 is the proportionality constant. Suppose that the following measurements are taken:

t    10     15     20
y    11.60  11.85  12.25

(a) Set up the overdetermined 3 × 2 system of linear equations corresponding to the data collected.
(b) Is this system consistent? If not, compute each possible pair of values for (x1, x2) obtained by selecting any two of the equations from the system. Is there any reason to prefer any one of these results?
(c) Set up the system of normal equations and solve it to obtain the least squares solution to the overdetermined system. Compare your result with those obtained in part b.

3.2. Suppose you are fitting a straight line to the three data points (0,1), (1,2), (3,3).
(a) Set up the overdetermined linear system for the least squares problem.
(b) Set up the corresponding normal equations.
(c) Compute the least squares solution by Cholesky factorization.

3.3. Set up the linear least squares system Ax ≈ b for fitting the model function f(t, x) = x1 t + x2 e^t to the three data points (1,2), (2,3), (3,5).

3.4. In fitting a straight line y = x0 + x1 t to the three data points (ti, yi) = (0,0), (1,0), (1,1), is the least squares solution unique? Why?

3.5. Let x be the solution to the linear least squares problem Ax ≈ b, where

$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}.$$

Let r = b − Ax be the corresponding residual vector. Which of the following three vectors is a possible value for r? Why?

$$(a)\ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad (b)\ \begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \end{bmatrix} \qquad (c)\ \begin{bmatrix} -1 \\ 1 \\ 1 \\ -1 \end{bmatrix}$$
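For Exercise 3.1, parts (a) and (c) translate directly into a few lines of code. A minimal sketch, assuming NumPy and SciPy (the routines cho_factor/cho_solve are one possible choice for the Cholesky step):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Beam data from Exercise 3.1: applied force t and measured total length y.
t = np.array([10.0, 15.0, 20.0])
y = np.array([11.60, 11.85, 12.25])

# (a) Overdetermined 3 x 2 system for the model y = x1 + x2*t:
# each row of A is [1, t_i], so A @ [x1, x2] should approximate y.
A = np.column_stack([np.ones_like(t), t])

# (c) Normal equations A^T A x = A^T y, solved via Cholesky factorization;
# this applies because A^T A is symmetric positive definite (cf. Exercise 3.8).
x = cho_solve(cho_factor(A.T @ A), A.T @ y)
print("normal equations solution (x1, x2):", x)

# Sanity check against a general-purpose library least squares routine.
print("lstsq reference:                   ", np.linalg.lstsq(A, y, rcond=None)[0])
```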
3.6. (a) What is the Euclidean norm of the minimum residual vector for the following linear least squares problem?

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \approx \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$$

(b) What is the solution vector x for this problem?

3.7. Let A be an m × n matrix and b an m-vector.
(a) Prove that a solution to the least squares problem Ax ≈ b always exists.
(b) Prove that such a solution is unique if, and only if, rank(A) = n.

3.8. Suppose that A is an m × n matrix of rank n. Prove that the matrix A^T A is positive definite.

3.9. Prove that the augmented system matrix in Section 3.4.2 cannot be positive definite.

3.10. Let B be an n × n matrix, and assume that B is both orthogonal and triangular.
(a) Prove that B must be diagonal.
(b) What are the diagonal entries of B?
(c) Let A be n × n and nonsingular. Use parts a and b to prove that the QR factorization of A is unique up to the signs of the diagonal entries of R. In particular, there exist unique matrices Q and R such that Q is orthogonal, R is upper triangular with positive entries on its main diagonal, and A = QR.

3.11. Suppose that the partitioned matrix

$$\begin{bmatrix} A & B \\ O & C \end{bmatrix}$$

is orthogonal, where the submatrices A and C are square. Prove that A and C must be orthogonal, and B = O.

3.12. (a) Let A be an n × n matrix. Show that any two of the following conditions imply the other:
1. A^T = A
2. A^T A = I
3. A^2 = I
(b) Give a specific example, other than the identity matrix I or a permutation of it, of a 3 × 3 matrix that has all three of these properties.
(c) Name a nontrivial class of matrices that have all three of these properties.

3.13. If A is both an orthogonal matrix and an orthogonal projector, what can you conclude about A?

3.14. Show that if the vector v ≠ 0, then the matrix

$$H = I - 2\,\frac{vv^T}{v^Tv}$$

is orthogonal and symmetric.

3.15. Let a be any nonzero vector. If v = a − α e1, where α = ±‖a‖₂, and

$$H = I - 2\,\frac{vv^T}{v^Tv},$$

show that Ha = α e1.

3.16. Consider the vector a as an n × 1 matrix.
(a) Write out its reduced QR factorization, showing the matrices Q and R explicitly.
(b) What is the solution to the linear least squares problem ax ≈ b, where b is a given n-vector?

3.17. Determine the Householder transformation that annihilates all but the first entry of the vector [ 1 1 1 1 ]^T. Specifically, if

$$\left(I - 2\,\frac{vv^T}{v^Tv}\right)\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \\ 0 \\ 0 \end{bmatrix},$$

what are the values of α and v?

3.18. Suppose that you are computing the QR factorization of the matrix

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{bmatrix}$$

by Householder transformations.
(a) How many Householder transformations are required?
(b) What does the first column of A become as a result of applying the first Householder transformation?
(c) What does the first column then become as a result of applying the second Householder transformation?
(d) How many Givens rotations would be required to compute the QR factorization of A?
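Exercises 3.15 and 3.17 are easy to verify numerically. A minimal sketch of the Householder construction, using the sign choice α = −sign(a1)‖a‖₂ (either sign satisfies the exercises; this one avoids cancellation when forming v = a − α e1):

```python
import numpy as np

def householder_vector(a):
    """Return (v, alpha) such that H = I - 2 v v^T / (v^T v) maps a to alpha * e1."""
    alpha = np.linalg.norm(a)
    if a[0] > 0:
        alpha = -alpha          # sign opposite a[0], for numerical stability
    v = a.astype(float).copy()
    v[0] -= alpha               # v = a - alpha * e1
    return v, alpha

a = np.array([1.0, 1.0, 1.0, 1.0])     # the vector of Exercise 3.17
v, alpha = householder_vector(a)

H = np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)
print("alpha =", alpha)                # -||a||_2 = -2 with this sign choice
print("v     =", v)                    # [3, 1, 1, 1]
print("H a   =", H @ a)                # approximately [alpha, 0, 0, 0]
```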
3.28. (a) In the Gram-Schmidt procedure of Section 3.5.3, if we define the orthogonal projectors Pk = qk qk^T, k = 1, ..., n, where qk is the kth column of Q in the resulting QR factorization, show that

$$(I - P_k)(I - P_{k-1}) \cdots (I - P_1) = I - P_k - P_{k-1} - \cdots - P_1.$$

(b) Show that the classical Gram-Schmidt procedure is equivalent to

$$q_k = (I - (P_1 + \cdots + P_{k-1}))\,a_k.$$

(c) Show that the modified Gram-Schmidt procedure is equivalent to

$$q_k = (I - P_{k-1}) \cdots (I - P_1)\,a_k.$$

(d) An alternative way to stabilize the classical procedure is to apply it more than once (i.e., iterative refinement), which is equivalent to taking

$$q_k = (I - (P_1 + \cdots + P_{k-1}))^m\,a_k,$$

where m = 2 is typically sufficient. Show that all three of these variations are mathematically equivalent (though they may differ markedly in finite-precision arithmetic).
3.29. Let v be a nonzero n-vector. The hyperplane normal to v is the (n − 1)-dimensional subspace of all vectors z such that v^T z = 0. A reflector is a linear transformation R such that Rx = −x if x is a scalar multiple of v, and Rx = x if v^T x = 0. Thus, the hyperplane acts as a mirror: for any vector, its component within the hyperplane is invariant, whereas its component orthogonal to the hyperplane is reversed.
(a) Show that R = 2P − I, where P is the orthogonal projector onto the hyperplane normal to v. Draw a picture to illustrate this result.
(b) Show that R is symmetric and orthogonal.
(c) Show that the Householder transformation

$$H = I - 2\,\frac{vv^T}{v^Tv}$$

is a reflector.
(d) Show that for any two vectors s and t such that s ≠ t and ‖s‖₂ = ‖t‖₂, there is a reflector R such that Rs = t.
(e) Show that any orthogonal matrix Q is a product of reflectors.
(f) Illustrate the previous result by expressing the plane rotation

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix},$$

where c² + s² = 1, as a product of two reflectors. For some specific angle of rotation, draw a picture to show the mirrors.

3.30. (a) Consider the column vector a as an n × 1 matrix. Write out its reduced singular value decomposition, showing the matrices U, Σ, and V explicitly.
(b) Consider the row vector a^T as a 1 × n matrix. Write out its reduced SVD, showing the matrices U, Σ, and V explicitly.

3.31. If A is an m × n matrix and b is an m-vector, prove that the solution x of minimum Euclidean norm to the least squares problem Ax ≈ b is given by

$$x = \sum_{\sigma_i \neq 0} \frac{u_i^T b}{\sigma_i}\, v_i,$$

where the σi, ui, and vi are the singular values and corresponding singular vectors of A.

3.32. Prove that the pseudoinverse A^+ of an m × n matrix A, as defined using the SVD in Section 3.6.1, satisfies the following four properties, known as the Moore-Penrose conditions.
(a) AA^+A = A.
(b) A^+AA^+ = A^+.
(c) (AA^+)^T = AA^+.
(d) (A^+A)^T = A^+A.

3.33. Prove that the pseudoinverse A^+ of an m × n matrix A, as defined using the SVD in Section 3.6.1, has the value indicated for each of the following special cases.
(a) If m = n and A is nonsingular, then A^+ = A^(−1).
(b) If m > n and A has rank n, then A^+ = (A^T A)^(−1) A^T.
(c) If m < n and A has rank m, then A^+ = A^T (AA^T)^(−1).

3.34. (a) What is the pseudoinverse of the following matrix?

$$A_0 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
(b) If ε > 0, what is the pseudoinverse of the following matrix?

$$A_\varepsilon = \begin{bmatrix} 1 & 0 \\ 0 & \varepsilon \end{bmatrix}$$

(c) What do these results imply about the conditioning of the problem of computing the pseudoinverse of a given matrix?
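Part (c) of Exercise 3.34 can be seen numerically: the pseudoinverse is a discontinuous function of the matrix entries. A small sketch, assuming NumPy's SVD-based pinv:

```python
import numpy as np

A0 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
print("pinv(A0) =\n", np.linalg.pinv(A0))        # equals A0 itself

for eps in (1e-2, 1e-8, 1e-16):
    A_eps = np.array([[1.0, 0.0],
                      [0.0, eps]])
    # Mathematically pinv(A_eps) = [[1, 0], [0, 1/eps]], whose (2,2) entry blows
    # up as eps -> 0, even though A_eps -> A0 and pinv(A0) has (2,2) entry 0.
    # Note: np.linalg.pinv truncates singular values below rcond * sigma_max
    # (rcond ~ 1e-15 by default), so for eps = 1e-16 the computed result snaps
    # back to pinv(A0) -- a numerical-rank decision in action.
    print(f"eps = {eps:g}:\n", np.linalg.pinv(A_eps))
```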
Computer Problems
3.1. For n = 0, 1, ..., 5, fit a polynomial of degree n by least squares to the following data:

t    0.0  1.0  2.0  3.0  4.0  5.0
y    1.0  2.7  5.8  6.6  7.5  9.9

Make a plot of the original data points along with each resulting polynomial curve (you may make separate graphs for each curve or a single graph containing all of the curves). Which polynomial would you say captures the general trend of the data best? Obviously, this is a subjective question, and its answer depends on both the nature of the given data (e.g., the uncertainty of the data values) and the purpose of the fit. Explain your assumptions in answering.

3.2. A common problem in surveying is to determine the altitudes of a series of points with respect to some reference point. The measurements are subject to error, so more observations are taken than are strictly necessary to determine the altitudes, and the resulting overdetermined system is solved in the least squares sense to smooth out errors. Suppose that there are four points whose altitudes x1, x2, x3, x4 are to be determined. In addition to direct measurements of each xi with respect to the reference point, measurements are also taken of each point with respect to all of the others. The resulting measurements are:

x1 = 2.95,        x2 = 1.74,
x3 = −1.45,       x4 = 1.32,
x1 − x2 = 1.23,   x1 − x3 = 4.45,
x1 − x4 = 1.61,   x2 − x3 = 3.21,
x2 − x4 = 0.45,   x3 − x4 = −2.75.

Set up the corresponding least squares system Ax ≈ b and use a library routine, or one of your own design, to solve it for the best values of the altitudes. How do your computed values compare with the direct measurements?
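For Computer Problem 3.2, the ten measurements become the rows of a 10 × 4 overdetermined system. A minimal sketch using one possible library routine (np.linalg.lstsq; any least squares solver would do):

```python
import numpy as np

# Rows 1-4: direct measurements x_i; rows 5-10: differences x_i - x_j,
# in the order the measurements are listed above.
A = np.array([
    [1,  0,  0,  0],
    [0,  1,  0,  0],
    [0,  0,  1,  0],
    [0,  0,  0,  1],
    [1, -1,  0,  0],
    [1,  0, -1,  0],
    [1,  0,  0, -1],
    [0,  1, -1,  0],
    [0,  1,  0, -1],
    [0,  0,  1, -1],
], dtype=float)
b = np.array([2.95, 1.74, -1.45, 1.32,
              1.23, 4.45, 1.61, 3.21, 0.45, -2.75])

x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print("least squares altitudes:", x)
print("direct measurements:    ", b[:4])
```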
3.3. (a) For a series of matrices A of order n, record the execution times for a library routine to compute the QR factorization of A. Using a linear least squares routine, or one of your own design, fit a cubic polynomial to the execution times as a function of n. To obtain reliable results, use a fairly wide range of values for n, say, in increments of 100 from 100 up to several hundred, depending on the speed and available memory of the computer you use. You may obtain more accurate timings by averaging several runs for a given matrix size. The resulting cubic polynomial could be used to predict the execution time for other values of n not tried, such as very large values for n. What is the predicted execution time for a matrix of order 10,000?

(b) Try to determine the basic execution rate (in floating-point operations per second, or flops) for your computer by timing a known computation, such as matrix multiplication. You can then use this information to determine the complexity of QR factorization, based on the polynomial fit to the execution times. After converting to floating-point operations, how does the dominant term compare with the theoretically expected value of 4n³/3 (counting both additions and multiplications)? Try to explain any discrepancy. If you use a system that provides operation counts automatically, try this same experiment fitting the operation counts directly.
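A sketch of the timing experiment in part (a), with the flop-rate estimate of part (b) folded in. The size range, run count, and use of np.polyfit are our own choices; the 4n³/3 operation count for Householder QR of an n × n matrix is the value quoted above:

```python
import time
import numpy as np

sizes = np.arange(100, 1100, 100)   # adjust range to your machine's speed/memory
times = []
rng = np.random.default_rng(0)
for n in sizes:
    A = rng.standard_normal((n, n))
    runs = []
    for _ in range(3):              # average several runs to reduce timing noise
        t0 = time.perf_counter()
        np.linalg.qr(A)
        runs.append(time.perf_counter() - t0)
    times.append(np.mean(runs))

# Fit a cubic t(n) = c3*n^3 + c2*n^2 + c1*n + c0 by least squares.
coeffs = np.polyfit(sizes, times, 3)
print("cubic coefficients (c3..c0):", coeffs)

# Extrapolate to a matrix of order 10,000.
print("predicted time for n = 10000:", np.polyval(coeffs, 10000), "seconds")

# Implied execution rate from the dominant term, assuming ~4n^3/3 operations.
print("implied flop rate:", (4.0 / 3.0) / coeffs[0], "flops/sec")
```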
3.4. (a) Solve the following least squares problem using any method you like:

$$\begin{bmatrix} 0.16 & 0.10 \\ 0.17 & 0.11 \\ 2.02 & 1.29 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \approx \begin{bmatrix} 0.26 \\ 0.28 \\ 3.31 \end{bmatrix}.$$

(b) Now solve the same least squares problem again, but this time use the slightly perturbed