Additional Exercises For ECE133A PDF
Additional Exercises For ECE133A PDF
Additional Exercises For ECE133A PDF
L. Vandenberghe
Contents
1 Vectors 2
2 Matrices 9
3 Linear equations 12
4 Matrix inverses 15
5 Triangular matrices 19
6 Orthogonal matrices 21
7 LU factorization 27
8 Least squares 37
11 Cholesky factorization 62
12 Nonlinear equations 70
13 Unconstrained optimization 72
16 Algorithm stability 92
17 Floating-point numbers 95
1
1 Vectors
1.1 Average and norm. Use the Cauchy–Schwarz inequality to prove that
n
1 1X 1
− √ kxk ≤ xi ≤ √ kxk
n n i=1 n
for all n-vectors x. In other words, − rms(x) ≤ avg(x) ≤ rms(x). What are the conditions on x to have
equality in the upper bound? When do we have equality in the lower bound?
1.2 Use the Cauchy–Schwarz inequality to prove that
n n
!−1
1X 1X 1
xk ≥
n n xk
k=1 k=1
for all n-vectors x with positive elements xk . The left-hand side of the inequality is the arithmetic mean
(average) of the numbers xk ; the right-hand side is called the harmonic mean.
1.3 Cauchy–Schwarz inequality for complex vectors. Extend the proof of the Cauchy–Schwarz inequality on
page 57 of the textbook to complex vectors, i.e., show that |bH a| ≤ kakkbk holds for all complex n-vectors
a, b. What are the conditions on a and b to have equality |aH b| = kakkbk?
Hint. Define γ = arg(bH a) (where arg denotes phase angle), so that bH a = |bH a|ejγ , and start from the
inequality 0 ≤ kβa − αejγ bk2 , where α = kak and β = kβk.
1.4 Decoding using inner products. An input signal is sent over a noisy communication channel. The channel
attenuates the input signal by an unknown factor α and adds noise to it. We represent the input signal as an
n-vector u, the output signal as an n-vector v, and the noise signal as an n-vector w. Therefore v = αu + w,
where the elements uk , vk , wk of the three vectors give the values of the signals at time k. The two plots
below show an example with α = 0.5.
1 1
uk
vk
0 0
−1 −1
Suppose we know that the input signal u was chosen from a set of four possible signals x(1) , x(2) , x(3) , x(4) .
We know these signals x(i) , but we don’t know which one was used as input signal u. A simple method for
estimating the input signal, based on the received signal v, is to calculate the angles between v and x(k) and
pick the signal x(k) that makes the smallest angle with v.
Download the file decoding.m from the class webpage, save it in your working directory, and execute it in
MATLAB using the command [x1, x2, x3, x4, v] = decoding. The first four output arguments are the
possible input signals x(k) , k = 1, 2, 3, 4. The fifth output argument is the received (output) signal v. The
length of the signals is n = 200.
(a) Plot the vectors v, x(1) , x(2) , x(3) , x(4) . Visually, it should be obvious which input signal was used to
generate v.
2
(b) Calculate the angle θk of v with each of the four signals x(k) . Which signal x(k) makes the smallest
angle with v? Does this confirm your conclusion in part (a)?
1.5 Multiaccess communication. A communication channel is shared by several users (transmitters), who use it
to send binary sequences (sequences with values +1 and −1) to a receiver. The following technique allows
the receiver to separate the sequences transmitted by each transmitter. We explain the idea for the case
with two transmitters.
We assign to each transmitter a different signal or code. The codes are represented as n-vectors u and v: u
is the code for user 1, v is the code for user 2. The codes are chosen to be orthogonal (uT v = 0). The figure
shows a simple example of two orthogonal codes of length n = 100.
1 1
0.8
0.5
0.6
uk
vk
0
0.4
−0.5
0.2
0 −1
0 20 40 60 80 100 0 20 40 60 80 100
k k
Suppose user 1 transmits a binary sequence b1 , b2 , . . . , bm (with values bi = 1 or bi = −1), and user 2
transmits a sequence c1 , c2 , . . . , cm (with values ci = 1 or ci = −1). From these sequences and the user
codes, we construct two signals x and y, both of length mn, as follows:
b1 u c1 v
b2 u c2 v
x = . , y = . .
.
. . .
bm u cm v
(Note that here we use block vector notation: x and y consist of m blocks, each of size n. The first block of
x is b1 u, the code vector u multiplied with the scalar b1 , etc.) User 1 sends the signal x over the channel,
and user 2 sends the signal y. The receiver receives the sum of the two signals. We write the received signal
as z:
b1 u c1 v
b2 u c 2 v
z = x + y = . + . .
.. ..
bm u cm v
The figure shows an example where we use the two code vectors u and v shown before. In this example
m = 5, and the two transmitted sequences are b = (1, −1, 1, −1, −1) and c = (−1, −1, 1, 1, −1).
3
1
xk
0
−1
0 100 200 300 400 500
1
yk
0
−1
0 100 200 300 400 500
2
zk 0
−2
0 100 200 300 400 500
k
How can the receiver recover the two sequences bk and ck from the received signal z? Let us denote the first
block (consisting of the first n values) of the received signal z as z (1) : z (1) = b1 u + c1 v. If we make the inner
product of z (1) with u, and use the fact that uT v = 0, we get
uT z (1) v T z (1)
b1 = , c1 = .
kuk2 kvk2
Repeating this for the other blocks of z allows us to recover the rest of the sequences b and c:
uT z (k) v T z (k)
bk = , ck = ,
kuk2 kvk2
4
1.6 Distance between two lines. Consider two lines
L1 = {a + tb | t ∈ R}, L2 = {c + sd | s ∈ R}.
The vectors a, b, c, d are given n-vectors with b 6= 0, d 6= 0. The distance between L1 and L2 is the minimum
of k(a + tb) − (c + sd)k over all values of s, t ∈ R. Derive a formula for this distance in terms of a, b, c, d.
1.7 Regression line. Let a, b be two real n-vectors. To simplify notation we write the vector averages as
1T a 1T b
ma = avg(a) = , mb = avg(b) = ,
n n
and their standard deviations as
1 1
sa = std(a) = √ ka − ma 1k, sb = std(b) = √ kb − mb 1k.
n n
We assume the vectors are not constant (sa 6= 0 and sb 6= 0) and write the correlation coefficient as
1 (a − ma 1)T (b − mb 1)
ρ= .
n sa sb
In lecture 2, we considered the problem of fitting a straight line to the points (ak , bk ), by minimizing
n
1X 1
J= (c1 + c2 ak − bk )2 = kc1 1 + c2 a − bk2 .
n n
k=1
We found that the optimal coefficients are c2 = ρsb /sa and c1 = mb − ma c2 . Show that for those values of
c1 and c2 , we have J = (1 − ρ2 )s2b .
1.8 Orthogonal distance regression. We use the same notation as in exercise 1.7: a, b are non-constant n-vectors,
with means ma , mb , standard deviations sa , sb , and correlation coefficient ρ.
y
(ak , bk ) y = c1 + c2 x
dk
ek
For each point (ak , bk ), the vertical deviation from the straight line defined by y = c1 + c2 x is given by
ek = |c1 + c2 ak − bk |.
P 2
The least squares regression method of the lecture minimizes the sum k ek of the squared vertical deviations.
The orthogonal (shortest) distance of (ak , bk ) to the line is
|c1 + c2 ak − bk |
dk = p .
1 + c22
As an alternative to the least P
squares method, we can find the straight line that minimizes the sum of the
squared orthogonal distances k d2k . Define
n
1X 2 kc1 1 + c2 a − bk2
J= dk = .
n n(1 + c22 )
k=1
5
(a) Show that the optimal value of c1 is c1 = mb − ma c2 , as for the least squares fit.
(b) If we substitute c1 = mb − ma c2 in the expression for J, we obtain
kc2 (a − ma 1) − (b − mb 1)k2
J= .
n(1 + c22 )
Set the derivative of J with respect to c2 to zero, to derive a quadratic equation for c2 :
2 sa sb
ρc2 + − c2 − ρ = 0.
sb sa
H1 = {x ∈ Rn | aT x = b}, H2 = {y ∈ Rn | aT y = c},
where a is a nonzero n-vector, and b and c are scalars. The hyperplanes are parallel because they have the
same coefficient vectors a. The distance between the hyperplanes is defined as
d = min kx − yk.
x∈H1
y∈H2
Which of the following three expressions for d is correct? Explain your answer.
|b − c| |b − c|
(a) d = |b − c|, (b) d= , (c) d= .
kak kak2
1.10 The k-means algorithm. In this exercise we apply the k-means algorithm to the example in §4.4.1 of the
textbook.
Download the file mnist_train.mat from the course website and load it in MATLAB or Octave using the
command load mnist_train. This creates two variables: a 784 × 60000 matrix digits and a 1 × 60000
matrix labels. We will not need labels. Each column of digits is a 28 × 28 grayscale image, stored as
a vector of length 282 = 784 with elements between 0 and 1 (0 denotes a black pixel and 1 a white pixel).
Figure 4.6 in the book shows the first 25 images. To display the image in the ith column of digits you can
use the commands
The first line converts column i of digits to a 28 × 28 matrix. The second command displays the matrix as
an image. To speed up the computations we will use only the first 10000 digits:
6
digits = digits(:, 1:10000);
You are asked to apply the k-means algorithm to this set of N = 10000 vectors, with k = 20 groups, and
starting from a random initial group assignment (as opposed to starting from 20 randomly generated group
representatives, as in Algorithm 4.1 (page 72–73).
In the following list of MATLAB hints and comments we assume that the 20 group representatives are stored
as columns of a 784 × 20 matrix Z, and that the group assignment is represented by a 1 × 10000 matrix (i.e.,
row vector) group. The ith element group(i) is an integer between 1 and 20, with the index of the group
that column i of digits is assigned to.
• You can create an initial group assignment using the randi function:
group = randi(20, 1, 10000);
This generates a pseudorandom 1 × 10000 matrix of integers between 1 and 20.
• Since we start from a random partition, the order of the two steps in algorithm 4.1 is switched. The
first step in each cycle is to compute the group representatives for the current assignment. We can find
the columns of digits that are assigned to group i using the function find. The command
I = find(group == i);
defines an index vector I such that digits(:,I) is the submatrix of digits with the columns assigned
to class i. To find the average of the columns in this matrix, you can use for-loops, or the MATLAB
functions sum or mean. (Be sure to look up what sum or mean do when applied to a matrix; see help sum
and help mean.)
• We evaluate the quality of the clustering using the clustering objective
N
1 X
J= min kxi − zj k2 .
N i=1 j=1,...,k
The algorithm is terminated when J is nearly equal in two successive iterations (e.g., we terminate
when |J − Jprev | ≤ 10−5 J, where Jprev is the value of J after the previous iteration).
• After running the k-means algorithm you can display the representative vectors of the 20 groups as
follows:
for k=1:20
subplot(4,5,k)
imshow(reshape(Z(:,k), 28, 28));
end
This produces a figure similar to figures 4.8 and 4.9 in the textbook. Your results will be different
because figures 4.8 and 4.9 were computed using the full set of 60000 digits and, moreover, the result
of the k-means algorithm depends on the starting point.
• Include the MATLAB code and a figure of a typical set of group representatives with your solution.
1.11 Tomography. Download the file tomography.mat from the class webpage and load it in MATLAB using the
command load tomography. This creates a matrix A of size 576 × 784 and a vector b of length 576. The
matrix A and the vector b describe a toy tomography example and were constructed using the MATLAB
AIR Tools package that can be found at www2.compute.dtu.dk/~pcha/AIRtools. (You do not need this
package for the exercise.) The geometry is shown in the figure below.
7
The image at the center is a black-and-white image of size 28 × 28. The figure shows a random image but
in the actual problem we used one of the images of handwritten digits of exercise 1.10. The green dots are
twelve source locations. For each source location we generate 48 rays emanating from the source. (The figure
shows the rays for two sources only.) For each ray, we calculate the line integral of the pixel intensities along
the ray. This gives 12 · 48 = 576 linear equations
784
X
Aij xj = bi , i = 1, . . . , 576.
j=1
Here xj denotes the intensity in pixel j of the image (images are stored as vectors of length 784, as in
exercise 1.10), Aij is the length of the intersection of ray i with pixel j, and bi is the value of the line integral
along ray i. These equations can be written in matrix form as Ax = b where A and b are the data in
tomography.mat, or as
aTi x = bi , i = 1, . . . , 576,
where aTi is row i of A.
The purpose is to reconstruct the image x from the line integral measurements. We will use Kaczmarz’s
iterative algorithm for this purpose. Note that this is a small problem and easy to solve using the standard
non-iterative methods that are discussed later in the course.
Kaczmarz’s algorithm starts at an arbitrary point x (for example, a zero vector) and then cycles through the
equations. At each iteration we take a new equation, and update x by replacing it with its projection on the
hyperplane defined by the equation. If K is the number of cycles and m = 576 is the number of equations,
the algorithm can be summarized as follows.
Initialize x.
For k = 1, . . . , K:
For i = 1, . . . , m:
Project x on the ith hyperplane:
a T x − bi
x := x − i ai .
kai k2
end
end
Run the algorithm for K = 10 cycles, starting at x = 0 (a black image). Compute the error kAx−bk/kbk after
each cycle. Also display the reconstructed image x (using the command imshow(reshape(x, 28, 28))).
8
2 Matrices
2.1 Projection on a line through the origin. Let y be a nonzero n-vector, and consider the function f : Rn → Rn ,
defined as
xT y
f (x) = y.
kyk2
It can be shown that that f (x) is the projection of x on the line passing through y and the origin (see exercise
3.12 of the textbook). Is f a linear function of x? If your answer is yes, give an n × n matrix A such that
f (x) = Ax for all x. If your answer is no, show with an example that f does not satisfy the definition of
linearity f (αu + βv) = αf (u) + βf (v).
2.2 Reflection of a vector about a line through the origin. As in exercise 2.1, let f (x) be the projection of x on
the line through the origin and a nonzero n-vector y. Define g : Rn → Rn as the reflection of x with respect
to the line. g(x) can be expressed as g(x) = x + 2(f (x) − x). The definition is illustrated in the figure.
y
g(x)
f (x)
2 3 2 3
5 1
1 4 4 5
9
The matrix A is called irreducible if all the elements of the matrix (I + A)n−1 (the (n − 1)st power of I + A)
are positive. Show that A is irreducible if and only if the graph GA is strongly connected, i.e., for every
vertex i and every vertex j there is a directed path from vertex i to vertex j.
The matrix A1 is an irreducible matrix. The matrix A2 is not (for example, there is no directed path from
vertex 1 to 3, or from 2 to 5).
2.5 Circular convolution. The circular convolution of two n-vectors a, b is the n-vector c defined as
X
ck = a i bj , k = 1, . . . , n,
all i and j with
(i + j) mod n =
k+1
where (i + j) mod n is the remainder of i + j after integer division by n. Therefore the sum is over all i, j
with i + j = k + 1 or i + j = n + k + 1. For example, if n = 4,
c1 = a 1 b1 + a 4 b2 + a 3 b3 + a 2 b4
c2 = a 2 b1 + a 1 b2 + a 4 b3 + a 3 b4
c3 = a 3 b1 + a 2 b2 + a 1 b3 + a 4 b4
c4 = a 4 b1 + a 3 b2 + a 2 b3 + a 1 b4 .
We use the notation c = a ⊛ b for circular convolution, to distinguish it from the standard convolution
c = a ∗ b defined in the textbook (p. 136) and lecture (p. 3-32).
Suppose a is given. Show that a ⊛ b is a linear function of b, by giving a matrix Tc (a) such that a ⊛ b = Tc (a)b
for all b.
2.6 Give the number of flops required to evaluate a product of three matrices X = ABC, where A is n × n, B
is n × 10, and C is 10 × n. You can evaluate the product in two possible ways.
2.7 A matrix C is defined as C = Auv T B where A and B are n × n-matrices, and u and v are n-vectors. The
product on the right-hand side can be evaluated in many different ways, e.g., as A(u(v T B)) or as A((uv T )B),
etc. What is the fastest method (requiring the least number of flops) when n is large?
For example,
3 4 9 12
1 3 3 4 −5 6 −15 18
⊗ =
.
2 −1 −5 6 6 8 −3 −4
−10 12 5 −6
10
Suppose the n × n matrices A and B, and an n2 -vector x are given. Describe an efficient method for the
matrix-vector multiplication
A11 B A12 B · · · A1n B x1
A21 B A22 B · · · A2n B x2
y = (A ⊗ B)x = .. .. .. .. .
. . . .
An1 B An2 B · · · Ann B xn
(On the right we partitioned x in subvectors xi of size n.) What is the complexity of your method? How
much more efficient is it than a general matrix-vector multiplication of an n2 × n2 matrix and an n2 -vector?
2.9 Let A and B be two matrices of size m × n. Describe an efficient method for computing C = (I + AB T )3 .
Distinguish two cases, m > n and m < n. Give the complexity of your method, including all cubic terms
(m3 , m2 n, mn2 , n3 ). If you know several methods, choose the most efficient one.
11
3 Linear equations
3.1 Polynomial interpolation. In this problem we construct polynomials p(t) = x1 +x2 t+· · ·+xn−1 tn−2 +xn tn−1
of degree 5, 10, and 15 (i.e., for n = 6, 11, 16), that interpolate points on the graph of the function
f (t) = 1/(1 + 25t2 ) in the interval [−1, 1]. For each value of n, we compute the interpolating polynomial as
follows. We first generate n pairs (ti , yi ), using the MATLAB commands
t = linspace(-1, 1, n)’;
y = 1 ./ (1 + 25*t.^2);
This produces two n-vectors: a vector t with elements ti , equally spaced in [−1, 1], and a vector y with
elements yi = f (ti ). (See ‘help rdivide’ and ‘help power’ for the meaning of the operations ./ and .^.)
We then solve a set of linear equations
1 t1 · · · t1n−2 t1n−1 x1 y1
1 · · · t2n−2 t2n−1
t2 x 2 y2
.. .. .. .. .. = ..
. . (1)
. . . .
1 tn−1 · · · tn−2 tn−1 xn−1 yn−1
n−1 n−1
1 tn · · · tnn−2 tnn−1 xn yn
MATLAB hints.
Type ‘help vander’ for details. This is almost what we need, but you have to ‘flip’ this matrix from
left to right. This operation is also built in in MATLAB (type help fliplr).
• We are interested in the behavior of the interpolating polynomials between the points ti that you used
in the construction. Therefore, when you plot the three polynomials, you should use a much denser
grid of points (e.g., a few hundred points equally spaced in interval [−1, 1]) than the n points that you
used to generate the polynomials.
3.2 Formulate the following problem as a set of linear equations Ax = b. Find two cubic polynomials
p(t) = x1 + x2 t + x3 t2 + x4 t3 , q(t) = x5 + x6 t + x7 t2 + x8 t3
12
• p(t4 ) = q(t4 ), p′ (t4 ) = q ′ (t4 ). This specifies that at t = t4 the polynomials should have the same value
and the same derivative.
The variables in the problem are the coefficients x1 , . . . , x8 . The numbers ti , yi are given, with t1 < t2 <
t3 < t4 < t5 < t6 < t7 .
Test the method in MATLAB on the following problem. We take the 7 points ti equally spaced in the
interval [−0.75, 0.25] (using t = linspace(-0.75, 0.25, 7)’), and
Calculate the two polynomials p(t) and q(t) (using the command x = A \ b to solve the equation Ax = b),
and plot them on the interval [−0.75, 0.25].
3.3 Express the following problem as a set of linear equations. Find a cubic polynomial
f (t) = c1 + c2 (t − t1 ) + c3 (t − t1 )2 + c4 (t − t1 )2 (t − t2 )
that satisfies
f (t1 ) = y1 , f (t2 ) = y2 , f ′ (t1 ) = s1 , f ′ (t2 ) = s2 .
The numbers t1 , t2 , y1 , y2 , s1 , s2 are given, with t1 6= t2 . The unknowns are the coefficients c1 , c2 , c3 , c4 .
Write the equations in matrix-vector form Ax = b, and solve them.
3.4 Express the following problem as a set of linear equations. Find a quadratic function
p11 p12 u1 u1
f (u1 , u2 ) = u1 u2 + q1 q2 +r
p12 p22 u2 u2
and
ρ1 = 18.187, ρ2 = 9.4218, ρ3 = 14.310, ρ4 = 24.955.
3.6 Formulate the following problem as a set of linear equations in the form Ax = b. Give the numerical values
of the elements of the 4 × 4 matrix A.
Find a polynomial p(t) = x1 + x2 t + x3 t2 + x4 t3 that satisfies the four conditions
Z 1 Z 1 Z 1 Z 1
2
p(t)dt = b1 , tp(t)dt = b2 , t p(t)dt = b3 , t3 p(t)dt = b4 .
0 0 0 0
13
3.7 Express the following problem as a set of linear equations Ax = b. Find a rational function
x 1 + x 2 t + x 3 t2
f (t) =
1 + x 4 t + x 5 t2
that satisfies the five conditions
f (2) (0) f (3) (0) f (4) (0)
f (0) = b1 , f (1) (0) = b2 , = b3 , = b4 , = b5 ,
2 6 24
where b1 , . . . , b5 are given. Here f (k) (0) denotes the kth derivative of f at 0. Clearly state what the 5 × 5
matrix A is. Hint. The five conditions specify the first five terms in the series expansion
∞
X ∞
X
f (k) (0) f (k) (0)
f (t) = t k = b1 + b2 t + b3 t 2 + b4 t 3 + b 5 t 4 + tk .
k! k!
k=0 k=5
14
4 Matrix inverses
4.1 Do the following matrices have linearly independent columns?
−1 2
(a) A = 3 −6 .
2 −1
−1 3 2
(b) A = .
2 −6 −1
−9 0 7
(c) A = 4 0 −5 .
−1 0 6
D
(d) A = , where B is m × n and D is a diagonal n × n matrix with nonzero diagonal elements.
B
(e) A = abT where a and b are n-vectors and n > 1.
(f) A = I − abT where a and b are n-vectors with kakkbk < 1.
4.2 Suppose A is a nonsingular n × n matrix, u and v are n-vectors, and v T A−1 u 6= −1. Show that A + uv T is
nonsingular with inverse
1
(A + uv T )−1 = A−1 − A−1 uv T A−1 .
1 + v T A−1 u
4.3 Suppose A is a nonsingular n × n matrix. Consider the 2n × 2n matrix
" #
A A + A−T
M= .
A A
(a) Show that M is nonsingular, by showing that M x = 0 implies x = 0.
(b) Find the inverse of M . To find M −1 , express it as a block matrix
−1 W X
M =
Y Z
with blocks of dimension n×n, and determine the matrices W , X, Y , Z from the condition M M −1 = I.
4.4 Define a sequence of matrices Ak for k = 0, 1, 2 . . ., as follows: A0 = 1 and for k ≥ 1,
Ak−1 Ak−1
Ak = .
−Ak−1 Ak−1
Therefore Ak is a matrix of size 2k × 2k . The first four matrices in the sequence are
1 1 1 1
1 1 −1 1 −1 1
A0 = 1, A1 = , A2 = −1 −1
,
−1 1 1 1
1 −1 −1 1
1 1 1 1 1 1 1 1
−1 1 −1 1 −1 1 −1 1
−1 −1 1 1 −1 −1 1 1
1 −1 −1 1 1 −1 −1 1
A3 = .
−1 −1 −1 −1 1 1 1 1
1 −1 1 −1 −1 1 −1 1
1 1 −1 −1 −1 −1 1 1
−1 1 1 −1 1 −1 −1 1
Show that all matrices Ak in the sequence are nonsingular. What is the inverse of Ak ?
15
4.5 Lagrange interpolation. In exercise 3.1 we considered the problem of finding a polynomial
p(t) = x1 + x2 t + · · · + xn tn−1 (2)
with specified values
p(t1 ) = y1 , p(t2 ) = y2 , ,..., p(tn ) = yn . (3)
The polynomial p is called the interpolating polynomial through the points (t1 , y1 ), . . . , (tn , yn ). Its coeffi-
cients can be computed by solving the set of linear equations
1 t1 · · · t1n−2 t1n−1 x1 y1
1 · · · t2n−2 t2n−1
t2 x 2 y2
.. .. .. .. .. = .. .
. . (4)
. . . .
1 tn−1 · · · tn−2 tn−1 xn−1 yn−1
n−1 n−1
1 tn · · · tnn−2 tnn−1 xn yn
The coefficient matrix is called a Vandermonde matrix. As we have seen in the lecture, a Vandermonde
matrix is nonsingular if the points ti are distinct (ti 6= tj for i 6= j). As a consequence, the interpolating
polynomial is unique: if the points ti are distinct, then there exists exactly one polynomial of degree less
than or equal to n − 1 that satisfies (3), and its coefficients are the solution of the equations (4).
In this problem we describe another method for finding p, known as Lagrange interpolation.
(a) We define n polynomials li : Q
j6=i (t − tj )
li (t) = Q , i = 1, . . . , n.
j6=i (ti − tj )
Verify that li is a polynomial of degree n − 1, and that
1 if k = i
li (tk ) =
0 if k =
6 i.
For example, for n = 3, we have the three polynomials
(t − t2 )(t − t3 ) (t − t1 )(t − t3 ) (t − t1 )(t − t2 )
l1 (t) = , l2 (t) = , l3 (t) = .
(t1 − t2 )(t1 − t3 ) (t2 − t1 )(t2 − t3 ) (t3 − t1 )(t3 − t2 )
(b) Show that the polynomial
p(t) = y1 l1 (t) + y2 l2 (t) + · · · + yn ln (t) (5)
has degree n − 1 or less, and satisfies the interpolation conditions (3). It is therefore equal to the unique
interpolating polynomial through those points.
(c) This provides another method for polynomial interpolation. To find the coefficients xi we express the
polynomial (5) in the form (2) by expanding the polynomials li as weighted sums of powers of t.
As an example, for n = 3, the polynomial (5) is given by
(t − t2 )(t − t3 ) (t − t1 )(t − t3 ) (t − t1 )(t − t2 )
p(t) = y1 + y2 + y3 .
(t1 − t2 )(t1 − t3 ) (t2 − t1 )(t2 − t3 ) (t3 − t1 )(t3 − t2 )
Express this as p(t) = x1 + x2 t + x3 t2 , i.e., give expressions for x1 , x2 , x3 in terms of t1 , t2 , t3 , y1 , y2 ,
y3 . Use the result to prove the following expression for the inverse of a 3 × 3 Vandermonde matrix:
t t t t t t
2 3 1 3 1 2
16
4.6 (a) Formulate the following problem as a set of linear equations. Find a point x ∈ Rn at equal distance to
n + 1 given points y1 , y2 , . . . , yn+1 ∈ Rn :
kx − y1 k = kx − y2 k = · · · = kx − yn+1 k.
is nonsingular.
4.7 We consider the problem of localization from range measurements in 3-dimensional space. The 3-vector y
represents the unknown location. We measure the distances of the location y to five points at known locations
c1 , . . . , c5 . The five distance measurements ρ1 , . . . , ρ5 are exact, except for an unknown systematic error or
offset z (for example, due to a clock offset). We therefore have five equations
ky − ck k + z = ρk , k = 1, . . . , 5,
X = AB, Y = B † A†
where A† and B † are the pseudo-inverses of A and B. Show the following properties.
(a) Y X is symmetric.
(b) XY is symmetric.
(c) Y XY = Y .
(d) XY X = X.
Carefully explain your answers.
4.9 Suppose A is an n × (n − 1) matrix with linearly independent columns, and b is an n-vector with AT b = 0
and kbk = 1.
†
A
(a) Show that the matrix [ A b ] is nonsingular with inverse .
bT
(b) Let C be any left inverse of A. Show that
C(I − bbT ) I 0
A b = .
bT 0 1
(c) Use the results of parts (a) and (b) to show that C(I − bbT ) = A† .
17
4.10 Let A be an n × n matrix with the property that
X
|Aii | > |Aij |, i = 1, . . . , n.
j6=i
(The absolute value of each diagonal element is greater than the sum of the absolute values of the other
elements in the same row.) Such a matrix is called strictly diagonally dominant. Show that A is nonsingular.
Hint. Consider a vector x with Ax = 0. Suppose k is an index with |xk | = maxi=1,...,n |xi |. Consider the
kth equation in Ax = 0.
4.11 Consider the matrix A = I + aJ, where a is a real scalar and J is the reverser matrix
0 0 ··· 0 1
0 0 ··· 1 0
. .. ..
J = ... ..
. . . . .
0 1 ··· 0 0
1 0 ··· 0 0
of size n × n. We assume n > 1.
(a) For what values of a is A singular?
(b) Assuming A is nonsingular, express its inverse as a linear combination of the matrices I and J.
4.12 Consider the n × n matrix A = aI − 11T where a is a real scalar and 1 is the n-vector of ones. We assume
n > 1.
(a) For what values of a is A singular?
(b) Assuming A is nonsingular, express its inverse as a linear combination of I and 11T .
4.13 If A and B are nonsingular matrices of the same size, then (AB)−1 = B −1 A−1 . It is tempting to generalize
this to
(AB)† = B † A† (6)
for matrices A and B with linearly independent columns.
(a) First show that (6) is false for
1 0
1
A = 0 1 , B= .
1
0 1
We conclude that (6) does not necessarily hold for matrices with linearly independent columns.
(b) Next, we make stronger assumptions on A and B. In each of the following two subproblems, either
show that (6) holds, or give a small example for which (AB)† 6= B † A† . We assume the dimensions of
A and B are compatible, so the product AB is defined.
(i) A has linearly independent columns and B is nonsingular.
(ii) A is nonsingular and B has linearly independent columns.
4.14 Let A and B be square matrices with A + B nonsingular. The matrix
C = A(A + B)−1 B
is called the parallel sum of A and B. Show the following identities and properties.
(a) C = A − A(A + B)−1 A.
(b) C = B(A + B)−1 A.
(c) If A and B are nonsingular, then C is nonsingular with inverse C −1 = A−1 + B −1 .
18
5 Triangular matrices
5.1 Let A be a lower triangular n × n matrix. Verify the following properties.
(a) If B is a lower triangular n × n matrix, then the product AB is lower triangular.
(b) The matrix Ak is lower triangular for all positive integers k.
(c) If A is nonsingular, then Ak is lower triangular for all integers k (positive or negative).
5.2 Let A be a nonsingular lower triangular n × n matrix.
(a) What is the complexity of computing A−1 ?
(b) What is the complexity of solving Ax = b by first computing A−1 and then forming the matrix-vector
product x = A−1 b? Compare with the complexity of forward and backward substitution.
5.3 The trace of a square matrix is the sum of its diagonal elements. What is the complexity of computing the
trace of A−1 , if A is lower triangular and nonsingular?
5.4 A lower triangular matrix A is bidiagonal if Aij = 0 for i > j + 1:
A11 0 0 ··· 0 0 0
A21 A22 0 ··· 0 0 0
0 A32 A33 ··· 0 0 0
.. .. .. .. .. ..
A= . . . . . . .
0 0 0 ··· An−2,n−2 0 0
0 0 0 ··· An−1,n−2 An−1,n−1 0
0 0 0 ··· 0 An,n−1 Ann
Assume A is a nonsingular bidiagonal and lower triangular matrix of size n × n.
(a) What is the complexity of solving Ax = b?
(b) What is the complexity of computing the inverse of A?
State the algorithm you use in each subproblem, and give the dominant term (exponent and coefficient) of
the flop count. If you know several methods, consider the most efficient one.
5.5 A lower triangular matrix A is called a lower triangular band matrix with k subdiagonals if Aij = 0 for
i > j + k. The matrix
−0.9 0 0 0 0 0
0.7 −0.7 0 0 0 0
1.4 −2.7 3.7 0 0 0
A= 0
0.6 0.3 −1.2 0 0
0 0 −2.2 1.1 −0.6 0
0 0 0 2.4 2.4 0.7
is a 6 × 6 lower triangular band matrix with 2 subdiagonals.
What is the complexity of solving a set of linear equations Ax = b if A is an n × n lower triangular band
matrix with k subdiagonals and nonzero diagonal elements? Express the complexity as the number of flops
as a function of n and k. You can assume that k ≪ n.
5.6 Describe an efficient method for each of the following two problems and give its complexity.
(a) Solve
DX + XD = B
where D is n × n and diagonal. The diagonal elements of D satisfy Dii + Djj 6= 0 for all i and j. The
matrices D and B are given. The variable is the n × n matrix X.
19
(b) Solve
LX + XLT = B
where L is lower triangular. The diagonal elements of L satisfy Lii + Ljj 6= 0 for all i and j. The
matrices L and B are given. The variable is the n × n matrix X. (Hint: Solve for X column by
column.)
If you know several methods, choose the fastest one (least number of flops for large n).
20
6 Orthogonal matrices
6.1 Show that every 2 × 2 rotation matrix can be written as a product of two reflectors: for every θ, there exist
u, v such that
cos θ − sin θ
= (I − 2uuT )(I − 2vv T ), kuk = kvk = 1.
sin θ cos θ
Without loss of generality, one can choose v = (1, 0), so it remains to find u. You may find the trigonometric
identities cos θ = cos2 (θ/2) − sin2 (θ/2) and sin θ = 2 cos(θ/2) sin(θ/2) useful.
6.2 A square matrix A is called normal if AAT = AT A. Show that if A is normal and nonsingular, then the
matrix Q = A−1 AT is orthogonal.
6.3 Let S be a square matrix that satisfies S T = −S. This is called a skew-symmetric matrix.
(a) Show that I − S is nonsingular. (Hint: first show that xT Sx = 0 for all x.)
(b) Show that (I + S)(I − S)−1 = (I − S)−1 (I + S). (This property does not rely on the skew-symmetric
property; it is true for any matrix S for which I − S is nonsingular.)
(c) Show that the matrix A = (I + S)(I − S)−1 is orthogonal.
6.4 When is a matrix lower-triangular and orthogonal? How many n × n matrices with this property exist?
6.5 Let Q be an n × n orthogonal matrix, partitioned as
Q = Q1 Q2
where Q1 has size n × m and Q2 has size n × (n − m), with 0 < m < n. Consider the n × n matrix
A = Q1 QT1 − Q2 QT2 .
(a) Show that A can also be written in the following two forms: A = 2Q1 QT1 − I and A = I − 2Q2 QT2 .
(b) Show that A is orthogonal.
(c) Describe an efficient method for solving Ax = b and give its complexity. If you know several methods,
give the method with the lowest complexity when m < n/2.
6.6 Let A be a tall m × n matrix with linearly independent columns. Define P = A(AT A)−1 AT .
(a) Show that the matrix 2P − I is orthogonal.
(b) Use the Cauchy–Schwarz inequality to show that the inequalities
−kxkkyk ≤ xT (2P − I)y ≤ kxkkyk
hold for all m-vectors x and y.
(c) Take x = y in part (b). Show that the right-hand inequality implies that kP xk ≤ kxk for all m-vectors x.
6.7 Let B be an m × n matrix.
(a) Prove that the matrix I + B T B is nonsingular. Since we do not impose any conditions on B, this also
shows that the matrix I + BB T is nonsingular.
(b) Show that the matrix
I BT
A=
−B I
is nonsingular and that the following two expressions for its inverse are correct:
−1 I 0 −B T
A = + (I + BB T )−1 B I ,
0 0 I
−1 0 0 I
A = + (I + B T B)−1 I −B T .
0 I B
21
(c) Now assume B has orthonormal columns. Use the result in part (b) to formulate a simple method for
solving Ax = b. What is the complexity of your method? If you know several methods, give the most
efficient one.
6.8 (a) For what property of the matrix B is a matrix of the form
1 I BT
A= √
2 −B I
orthogonal? Give the necessary and sufficient conditions on B.
(b) What are the properties of B needed to make the matrix A nonsingular?
6.9 Circulant matrices and discrete Fourier transform. A circulant matrix is a square matrix of the form
a1 an an−1 · · · a3 a2
a2 a1 an · · · a4 a3
a3 a 2 a1 · · · a5 a4
T (a) = . .. .. .. .. .. . (7)
.. . . . . .
an−1 an−2 an−3 · · · a1 an
an an−1 an−2 · · · a2 a1
We use the notation T (a) for this matrix, where a = (a1 , a2 , . . . , an ) is the n-vector in the first column. Each
of the other columns is obtained by a circular downward shift of the previous column. In matrix notation,
T (a) = a Sa S 2 a · · · S n−1 a
where S is the n × n circular shift matrix
0 0 ··· 0 1
1 0 ··· 0 0
0 1
S= 0 1 ··· 0 0 = .
.. .. . . .. .. I n−1 0
. . . . .
0 0 ··· 1 0
(a) Let W be the n × n DFT matrix:
1 1 1 ··· 1
1 ω −1 ω −2 ··· ω −(n−1)
ω −2 ω −4 ω −2(n−1)
W = 1 ··· (8)
.. .. .. ..
. . . .
1 ω −(n−1) ω −2(n−1) ··· ω −(n−1)(n−1)
22
(c) The matrix-vector product y = W x is the discrete Fourier transform of x. The matrix-vector product
y = W −1 x = (1/n)W H x is the inverse discrete Fourier transform. The DFT and its inverse can be
computed in order n log n operations using the Fast Fourier Transform algorithm.
Use the factorization (9) to formulate a fast algorithm, with a complexity of order n log n, for computing
matrix-vector products T (a)x. The product T (a)x is known as the circular convolution of the vectors
a and x.
(d) Use the factorization in part (b) to formulate a fast algorithm, with an order n log n complexity, for
solving a set of linear equations Ax = b with variable x and coefficient matrix A = T (a) (assuming T (a)
is nonsingular). Compare the speed of your algorithm with the standard method (A \ b), for randomly
generated a and b. You can use the following code to generate a and b, and construct the circulant
matrix A = T (a).
a = randn(n, 1);
b = randn(n, 1);
A = toeplitz(a, [a(1), flipud(a(2:n))’]);
Use the MATLAB functions fft and ifft to implement the fast algorithm. y = fft(x) evaluates
y = W x using the Fast Fourier Transform algorithm; y = ifft(x) evaluates y = W −1 x.
6.10 Refer to the factorization (9) of a circulant matrix (7) with the n-vector a as its first column. In (9), W is
the n × n discrete Fourier transform matrix and diag(W a) is the diagonal matrix with the vector W a (the
discrete Fourier transform of a) on its diagonal.
(a) Suppose T (a) is nonsingular. Show that its inverse T (a)−1 is a circulant matrix. Give a fast method
for computing the vector b that satisfies T (b) = T (a)−1 .
(b) Let a and b be two n-vectors. Show that the product T (a)T (b) is a circulant matrix. Give a fast method
for computing the vector c that satisfies T (c) = T (a)T (b).
6.11 A diagonal matrix with diagonal elements +1 or −1 is called a signature matrix. The matrix
1 0 0
S = 0 −1 0
0 0 −1
is an example of a 3 × 3 signature matrix. If S is a signature matrix, and A is a square matrix that satisfies
AT SA = S, (10)
(a) Suppose S is an n × n signature matrix, and u is an n-vector with uT Su 6= 0. Show that the matrix
2
A=S− uuT
uT Su
is pseudo-orthogonal with respect to S.
(b) Show that pseudo-orthogonal matrices are nonsingular. In other words, show that any square matrix
A that satisfies (10) for some signature matrix S is nonsingular.
(c) Describe an efficient method for solving Ax = b when A is pseudo-orthogonal. ‘Efficient’ here means
that the complexity is at least an order of magnitude less than the (2/3)n3 complexity of the standard
method for a general set of linear equations. Give the complexity of your method.
(d) Show that if A satisfies (10) then ASAT = S. In other words, if A is pseudo-orthogonal with respect
to S, then AT is also pseudo-orthogonal with respect to S.
23
6.12 The Kronecker product of two n × n matrices A and B is the n2 × n2 matrix
A11 B A12 B · · · A1n B
A21 B A22 B · · · A2n B
A⊗B = .. .. .. .. .
. . . .
An1 B An2 B ··· Ann B
For example,
3 4 0 0
1 0 3 4 −5 6 0 0
⊗ =
.
2 −1 −5 6 6 8 −3 −4
−10 12 5 −6
Suppose A and B are orthogonal. Is A ⊗ B orthogonal? Explain your answer.
6.13 Let a be an n-vector with kak = 1. Define the 2n × 2n matrix
aaT I − aaT
A= .
I − aaT aaT
0
line through a and the origin
6.14 Let A be an m × n matrix with linearly independent columns. The Householder algorithm for the QR
factorization of A computes an orthogonal m × m matrix Q such that
R
QT A =
0
where R is upper triangular with nonzero diagonal elements. The matrix Q is computed as a product
Q = Q1 Q2 · · · Qn−1 of orthogonal matrices. In this problem we discuss the first step, the calculation of Q1 .
This matrix has the property that
R11 × · · · ×
0 × ··· ×
QT1 A = . .. .. .
.. . .
0 × ··· ×
The ‘×’ symbols denote elements that may or may not be zero.
24
Let a = (A11 , A21 , . . . , Am1 ) be the first column of A. Define an m-vector
1 1
v=p a + se1
1 + |A11 |/kak kak
where s = 1 if A11 ≥ 0 and s = −1 if A11 < 0. The vector e1 is the first unit vector (1, 0, . . . , 0).
√
(a) Show that v has norm 2.
(b) Define Q1 = I − vv T . Show that Q1 is orthogonal.
(c) Show that QT1 a = R11 e1 , where R11 = −skak.
(d) Give the complexity of computing the matrix-matrix product QT1 A = (I − vv T )A.
6.15 We use the notation In for the identity matrix of size n × n and Jn for the reverser matrix of size n × n.
(The reverser matrix is the identity matrix with the column order reversed.)
(a) Verify that the 2n × 2n reverser matrix J2n can be written as
In 0 T 1 In In
J2n = Q Q where Q = √ .
0 −In 2 J n −Jn
Also show that Q is orthogonal.
(b) Let A be a 2n × 2n matrix with the property that
J2n A = AJ2n . (11)
An example is the 4 × 4 matrix
1 2 3 4
5 6 7 8
.
8 7 6 5
4 3 2 1
Use the factorization of J2n in part (a) to show that if A satisfies (11) then the matrix QT AQ is
block-diagonal:
T B 0
Q AQ = ,
0 C
where B and C are n × n matrices.
(c) The complexity of solving a general linear equation Ax = b of size 2n × 2n is (2/3)(2n)3 = (16/3)n3 .
Suppose A has the property defined in part (b). By how much can the dominant term in the complexity
of solving Ax = b be reduced if we take advantage of the factorization property in part (b)? Explain
your answer.
6.16 Let A be an m × n matrix with linearly independent columns. Suppose Aij = 0 for i > j + 1. In other words,
the elements of A below the first subdiagonal are zero:
A11 A12 · · · A1,n−1 A1n
A21 A22 · · · A2,n−1 A2n
0 A 32 · · · A3,n−1 A3n
0 0 · · · A4,n−1 A4n
.. .. .. ..
. . . .
A= .
0 0 · · · An,n−1 Ann
0 0 ··· 0 An+1,n
0 0 ··· 0 0
. .. .. ..
.. . . .
0 0 ··· 0 0
Show that the Q-factor in the QR factorization of A has the same property: Qij = 0 for i > j + 1.
25
6.17 What is the QR factorization of the matrix
2 8 13
A= 4 7 −7 ?
4 −2 −13
You can use MATLAB to check your answer, but you must provide the details of all intermediate steps on
paper.
6.18 Suppose A and B are left-invertible matrices of the same size that satisfy
AAT = BB T .
26
7 LU factorization
7.1 (a) For what values of a1 , a2 , . . . , an is the n × n matrix
a1 1 0 ··· 0 0
a2 0 1 ··· 0 0
.. .. .. .. .. ..
. . .
A= . . .
an−2 0 0 ··· 1 0
an−1 0 0 ··· 0 1
an 0 0 ··· 0 0
nonsingular?
(b) Assuming A is nonsingular, what is the complexity of solving an equation Ax = b?
(c) Assuming A is nonsingular, what is the inverse A−1 ?
7.2 Consider the set of linear equations
(D + uv T )x = b
where u, v, and b are given n-vectors, and D is a given diagonal matrix. The diagonal elements of D are
nonzero and uT D−1 v 6= −1. (This implies that the matrix D + uv T is nonsingular.)
(a) What is the complexity of solving these equations using the following method?
• First calculate A = D + uv T .
• Then solve Ax = b using the standard method (via LU factorization, which costs (2/3)n3 flops).
(b) In exercise 4.2, the following expression for the inverse of D + uv T is derived:
1
(D + uv T )−1 = D−1 − D−1 uv T D−1 .
1 + v T D−1 u
This means we can also solve the equations by evaluating x = (D + uv T )−1 b, using the expression for
the inverse. What is the complexity of this method?
7.3 For each subproblem, we give a naive but correct algorithm, in MATLAB notation. Derive the complexity,
assuming that matrix inverses and solutions of linear equations are computed using the LU factorization. Use
the values f = (2/3)n3 , s = 2n2 for the flop counts of the factorization and solve steps (for a set of n linear
equations in n variables). Assume the cost of computing the inverse of an n × n matrix is f + ns = (8/3)n3
flops.
If possible, give a more efficient method. You do not have to provide any MATLAB code, as long as the
description of your method is clear. If you know several methods, give the most efficient one.
27
(d) Solve the set of equations
A B b
x= ,
C I c
where A ∈ Rn×n , B ∈ Rn×10n , C ∈ R10n×n , b ∈ Rn , and c ∈ R10n are given, and I is the identity
matrix of dimension 10n × 10n. The matrix
A B
C I
is nonsingular.
x = [A, B; C, eye(10*n)] \ [b; c]
(e) Solve the set of equations
I B b
x= ,
C I c
where B is m × n, C is n × m, and n > m. The matrix
I B b
x=
C I c
is nonsingular.
x = [eye(m), B; C, eye(n)] \ [b; c]
You can check your result in MATLAB, but you have to provide the details of your calculation.
7.5 You are given a nonsingular n × n matrix A and an n-vector b. You are asked to evaluate
where A−2 = (A2 )−1 and A−3 = (A3 )−1 . Describe in detail how you would compute x, and give the
complexity of the different steps in your algorithm. If you know several methods, give the most efficient one.
7.6 Consider K sets of linear equations
AD1 Bx1 = b1
AD2 Bx2 = b2
..
.
ADK BxK = bK .
The n × n matrices A and B are nonsingular and given. The matrices Dk are diagonal with nonzero diagonal
elements. The right-hand sides bk ∈ Rn are also given. The variables in the problem are the K n-vectors
xk , k = 1, . . . , K. Describe an efficient method for computing the vectors xk . Compare with the complexity
of solving K sets of linear equations of size n × n, using a standard method.
7.7 What is the most efficient way to compute each of the following n × n matrices X? The vectors u and v
have size n and the matrix A is n × n. A is nonsingular.
28
(a) X = vuT AvuT .
(b) X = vuT A−1 vuT .
(c) X = vuT (A + A−1 )vuT .
7.8 Suppose A is an n × n matrix, and u and v are n-vectors. What is the complexity of computing the matrix
B = A−1 uv T A−1 in each of the following cases?
(a) A is diagonal with nonzero diagonal elements.
(b) A is lower-triangular with nonzero diagonal elements.
(c) A is orthogonal.
(d) A is a general nonsingular matrix.
Explain your answers by describing the main steps in an efficient method for computing B. Give the flop
count of each step and the total flop count (keeping only dominant terms).
Ax1 = b1 , AT x2 = b2
where A is n × n, b1 and b2 are n-vectors, and A is nonsingular. The unknowns are the n-vectors x1 and x2 .
What is the most efficient way to solve the two problems? Clearly state the different steps in your algorithm
are and give their complexity.
Ax = b, (A + uv T )y = b,
where A is n × n, and u, v, and b are n-vectors. The variables are x and y. We assume that A and A + uv T
are nonsingular. Give an efficient method, based on the expression
1
(A + uv T )−1 = A−1 − A−1 uv T A−1 .
1 + v T A−1 u
Clearly state the different steps in your algorithm and give their complexity.
7.12 Assume A is a nonsingular n × n matrix. Show that thee inverse of the matrix
" #
A A + A−T
M= (12)
A A
is given by " #
−1
−AT A−1 + AT
M = . (13)
AT −AT
29
(a) Compare the complexity of the following two methods for solving a set of linear equations M x = b,
given A and b.
(i) Calculate A−1 , build the matrix M as defined in equation (12), and solve M x = b using the
standard method This method would correspond to the MATLAB code
x = [A, A+inv(A)’; A, A] \ b.
(ii) Calculate A−1 , build the matrix M −1 as defined in equation (13), and form the matrix-vector
product x = M −1 b. This method would correspond to the MATLAB code
x = [-A’ inv(A)+A’; A’ -A’]*b.
(b) Can you improve the fastest of the two methods described in part (a)?
7.13 Consider the set of linear equations with a 3 × 3 block coefficient matrix
A11 A12 A13 x1 b1
0 A22 A23 x2 = b2 . (14)
0 0 A33 x3 b3
The m × m matrices Aij and the m-vectors b1 , b2 , b3 are given. The variables are the three m-vectors x1 ,
x2 , x3 . In other words we have n = 3m equations in 3m variables. We assume that the matrices A11 , A22 ,
A33 are nonsingular.
where Aij is m × m and bi is an m-vector. The variables are the m-vectors xi . The diagonal blocks Aii
are nonsingular. Compare the complexity of your algorithm with the complexity of a standard method
for solving Km equations in Km variables.
The nine blocks in the coefficient matrix have size n × n. The matrix A is nonsingular, and the matrix D is
diagonal with nonzero diagonal elements. The vectors b, c, and d in the right-hand side are n-vectors. The
variables are the n-vectors x, y, z. If you know several methods, give the most efficient one. Clearly state
the different steps in your algorithm, give the complexity of each step, and the total complexity.
7.15 We define a 2m × 2m matrix
−2A 4A
B= ,
3A −5A
where A is a nonsingular m × m matrix.
30
(b) The complexity of solving the linear equations
−2A 4A x1 b1
=
3A −5A x2 b2
with variables x1 ∈ Rm , x2 ∈ Rm , using the standard method (i.e., using the command
[-2*A 4*A; 3*A -5*A] \ [b1; b2]
in MATLAB), is (2/3)(2m)3 = (16/3)m3 flops for large m.
Formulate a more efficient method. Clearly state the different steps in your algorithm and give the
complexity of each step, as well as the total complexity. If you know several methods, give the most
efficient one.
7.16 For each subproblem, describe an efficient method to evaluate the expression, and give its complexity. A is
a nonsingular n × n matrix, and vi , wi , i = 1, . . . , m, are n-vectors.
P
m
(a) viT A−1 wi
i=1
Pm
(b) viT (A + A−1 )wi
i=1
Pm Pm
(c) viT A−1 wj
i=1 j=1
m P
P m
(d) vi wjT A−1 wi vjT
i=1 j=1
If you know several methods, choose the most efficient one. Include only the dominant terms in the flop
counts, assuming m and n are large.
7.17 Let A be a nonsingular n × n matrix and b an n-vector. In each subproblem, describe an efficient method
for computing the vector x, and give the complexity of the method, including terms of order two (n2 ) and
higher. If you know several methods, give the most efficient one.
(a) x = (A−1 + A−2 )b.
(b) x = (A−1 + A−T )b.
(c) x = (A−1 + JA−1 J)b where J is the n × n matrix
0 0 ··· 0 1
0 0 ··· 1 0
.. .. .
J = ... ... . .
0 1 ··· 0 0
1 0 ··· 0 0
(J is the reverser matrix, i.e., the identity matrix with its columns reversed: Jij = 1 if i + j = n + 1
and Jij = 0 otherwise.)
7.18 Suppose A and B are n × n matrices with A nonsingular, and b, c and d are n-vectors. Describe an efficient
algorithm for solving the set of linear equations
A B 0 x1 b
0 AT B x2 = c
0 0 A x3 d
with variables x1 ∈ Rn , x2 ∈ Rn , x3 ∈ Rn . Give the complexity of your algorithm, including all terms that
are cubic or quadratic in n. If you know several methods, give the most efficient one.
31
7.19 Consider the linear equation
(A + ǫB)x = b,
where A and B are given n × n matrices, b is a given n-vector, and ǫ is a scalar parameter. We assume that
A is nonsingular, and therefore A + ǫB is nonsingular for sufficiently small ǫ. The solution of the equation is
x(ǫ) = (A + ǫB)−1 b,
a complicated nonlinear function of ǫ. In order to find a simple approximation of x(ǫ), valid for small ǫ, we
can expand x(ǫ) = (A + ǫB)−1 b in a series
x(ǫ) = x0 + ǫx1 + ǫ2 x2 + ǫ3 x3 + · · · ,
where x0 , x1 , x2 , x3 , . . . are n-vectors, and then truncate the series after a few terms. To determine the
coefficients xi in the series, we examine the equation
(A + ǫB)(x0 + ǫx1 + ǫ2 x2 + ǫ3 x3 + · · · ) = b.
We see that if this holds for all ǫ in a neighborhood of zero, the coefficients xi must satisfy
Describe an efficient method for computing the first k + 1 coefficients x0 , . . . , xk from (15). What is the
complexity of your method (number of flops for large n, assuming k ≪ n)? If you know several methods,
give the most efficient one.
7.20 Suppose A is a nonsingular n × n matrix.
are nonsingular. Describe an efficient method for solving the two equations. The variables are the two
n-vectors x1 and x2 , and the two scalars y1 and y2 .
If you know several methods, give the most efficient one. Take advantage of the fact that the 1,1 blocks
of the two coefficient matrices are the same. What is the complexity of your method?
7.21 Let A be a nonsingular n × n matrix and let u, v be two n-vectors that satisfy v T A−1 u 6= 1.
32
(b) Describe an efficient method for solving the two equations
A u y b
Ax = b, = .
vT 1 z c
The variables are the n-vectors x and y, and the scalar z. Describe in detail the different steps in your
algorithm and give the complexity of each step. If you know several methods, choose the most efficient
one.
7.22 Explain how you can solve the following problems using a single LU factorization and without computing
matrix inverses. The matrix A is a given nonsingular n × n matrix. For each problem, carefully explain the
different steps in your algorithm, give the complexity of each step, and the total complexity (number of flops
for large n, excluding the LU factorization itself). If you know several methods, give the most efficient one.
2
(a) Compute A−1 + A−T b, where b is a given n-vector.
(b) Solve the equation AXA = B for the unknown X, where B is a given n × n matrix.
Pn Pn
(c) Compute (A−1 )ij , the sum of all the entries of A−1 .
i=1 j=1
7.23 Explain how you can solve the following problem using an LU factorization of A. Given a nonsingular n × n
matrix A and two n-vectors b and c, find an n-vector x and a scalar y such that
Ax + yb = c and kxk2 = 1.
We assume that b 6= 0 and kA−1 ck < 1. Clearly state every step in your algorithm. How many solutions
(x, y) are there?
7.24 The pseudo-inverse of a matrix B with linearly independent rows is the matrix B † = B T (BB T )−1 . Note
that B † B is a symmetric matrix. It can be shown that B † is the only right inverse X of B with the property
that XB is symmetric.
(a) Assume A is a nonsingular n × n matrix and b is an n-vector. Show that the n × (n + 1) matrix
B= A b
(c) What is the complexity of computing the vector y in part (b) using an LU factorization of A? Give the
complexity, including all cubic and quadratic terms. If you know several methods, consider the most
efficient one.
7.25 Let A be an n × m matrix and B an m × n matrix. We compare the complexity of two methods for solving
(I + AB)x = b.
33
(a) In the first method we compute the matrix C = I + AB and then solve Cx = b using the standard
method (LU factorization). Give the complexity of this method.
(b) Suppose the matrix I + BA is nonsingular. Show that
(c) This suggests a second method for solving the equation: compute the solution via the formula
x = I − A(I + BA)−1 B b.
Describe an efficient method for evaluating this formula and give the complexity. Which of the two
methods is faster when m ≪ n? Explain your answer.
7.26 Let A be an n × n matrix that has an LU factorization A = LU (i.e., the general LU factorization A = P LU
with P = I). Suppose you are given L and U . Show that the ith diagonal element of the inverse, (A−1 )ii ,
can be computed in 2(n − i)2 flops if we ignore terms that are linear in n or constant.
7.27 The solution A−1 b of a set of linear equations is a nonlinear function of the elements of A. It is often useful
to know the derivatives of the solution with respect to parameters in A. This is important, for example, in
sensitivity analysis or when optimizing over the parameters.
Consider the function f : Rm → Rn defined as
f (y) = (A + y1 B1 + · · · + ym Bm )−1 b
where A, B1 , . . . , Bm are n × n matrices, with A nonsingular, and b is an n-vector. The function value at
y = 0 is f (0) = A−1 b. It can be shown that the derivative matrix of f (y) at y = 0 is
Df (0) = −A−1 B1 A−1 b −A−1 B2 A−1 b · · · −A−1 Bm A−1 b .
34
Pn
(b) Compute the sum of the columns of A−1 , i.e., compute j=1 (A−1 )1:n,j .
Pn
(c) Compute the sum of the rows of A−1 , i.e., compute i=1 (A−1 )i,1:n .
Describe each step in your algorithm and give its complexity (number of flops as a function of n). Include
in the flop counts terms that are cubic, quadratic, and linear in n. If you know several methods, give the
most efficient one (of lowest order, if we exclude the cost of the LU factorization of A).
7.30 Let A be a nonsingular n × n matrix, with columns a1 , . . . , an . Suppose we replace column i with a vector c
and call this new matrix Ã. We can write this matrix as à = A + (c − ai )eTi where ei is the ith unit vector.
(a) Suppose that (A−1 c)i = 0. Show that à is singular.
(b) Suppose that (A−1 c)i 6= 0. Show that à is nonsingular, with inverse
1
Ã−1 = A−1 − (A−1 c − ei )eTi A−1 .
(A−1 c) i
(c) Suppose that (A−1 c)i 6= 0. Use the formula in part (b) to describe an efficient algorithm, based on the
LU factorization of A, for solving the two equations
Ax = b, Ãy = b,
where b is a given n-vector. Give the complexity of each step in your algorithm and the total complexity.
7.31 Consider an m × m block matrix A of the form
A11 0 0 ··· 0 0
0 A 22 0 ··· 0 0
0 0 A 33 ··· 0 0
A= . .. .. .. .. .. .
.. . . . . .
0 0 0 ··· Am−1,m−1 0
Am1 Am2 Am3 ··· Am,m−1 Amm
Each block Aij and all the zero matrices in the definition are matrices of size n × n, so the matrix A itself
has size nm × nm. What is the complexity of solving Ax = b in each of the following cases?
Include in the complexity all terms that are cubic (n3 , n2 m, nm2 , m3 ) or higher-order. Explain your answers.
7.32 Let A be a nonsingular n × n matrix. Explain how each of the following problems can be solved with a single
LU factorization of A. Describe the steps in your algorithms and give the complexity of each step, including
all quadratic and cubic terms in the flop count. If you know several methods, give the most efficient one.
35
(c) Find the n-vectors x, y that satisfy
A A x b
= .
−A A y c
36
8 Least squares
8.1 Formulate the following problems as least squares problems. For each problem, give a matrix A and a vector
b such that the problem can be expressed as
(a) Minimize x21 + 2x22 + 3x23 + (x1 − x2 + x3 − 1)2 + (−x1 − 4x2 + 2)2 .
(b) Minimize (−6x2 + 4)2 + (−4x1 + 3x2 − 1)2 + (x1 + 8x2 − 3)2 .
(c) Minimize 2(−6x2 + 4)2 + 3(−4x1 + 3x2 − 1)2 + 4(x1 + 8x2 − 3)2 .
(d) Minimize xT x + kBx − dk2 where the p × n matrix B and the p-vector d are given.
(e) Minimize kBx − dk2 + 2kF x − gk2 . The p × n matrix B, the l × n matrix F , the p-vector d and the
l-vector g are given.
(f) Minimize xT Dx + kBx − dk2 . D is a n × n diagonal matrix with positive diagonal elements, B is p × n,
and d is a p-vector. D, B and D are given.
8.2 The figure shows a planar spiral inductor, implemented in CMOS, for use in RF circuits. The inductor is
characterized by four key parameters:
• n, the number of turns (which is a multiple of 1/4, but that needn’t concern us)
• w, the width of the wire
• d, the inner diameter
• D, the outer diameter
D
The inductance L of such an inductor is a complicated function of the parameters n, w, d, and D. It can
be found by solving Maxwell’s equations, which takes considerable computer time, or by fabricating the
inductor and measuring the inductance. In this problem you will develop a simple approximate inductance
model of the form
L̂ = αnβ1 wβ2 dβ3 Dβ4 ,
where α, β1 , β2 , β3 , β4 ∈ R are constants that characterize the approximate model. (Since L is positive, we
have α > 0, but the constants β2 , . . . , β4 can be negative.) This simple approximate model, if accurate
enough, can be used for design of planar spiral inductors.
The file inductordata.m contains data for 50 inductors, obtained from measurements. Download the file,
and execute it in MATLAB using [n, w, d, D, L] = inductordata. This generates 5 vectors n, w, d, D,
L of length 50. The ith elements of these vectors are the parameters ni , wi (in µm), di (in µm), Di (in µm)
and the inductance Li (in nH) for inductor i.. Thus, for example, w13 gives the wire width of inductor 13.
37
Your task is to find α, β1 , . . . , β4 so that
b i = αnβ1 wβ2 dβ3 Dβ4 ≈ Li
L for i = 1, . . . , 50.
i i i i
Your solution must include a clear description of how you found your parameters, as well as their actual
numerical values. Note that we have not specified the criterion that you use to judge the approximate model
b i and Li ); we leave that to your judgment.
(i.e., the fit between L
We can define the percentage error between L b i and Li as
b i − Li |
|L
ei = 100 .
Li
Find the average percentage error for the 50 inductors, i.e., (e1 + · · · + e50 )/50, for your model. (We are only
asking you to find the average percentage error for your model; we do not require that your model minimize
the average percentage error.)
Remark. The MATLAB command to solve a least squares problem
minimize kAx − bk2
is x = A \ b, i.e., the same command as for solving a set of linear equations. The meaning of the backslash
operator therefore depends on the context. If A is a square matrix, then A \ b solves the linear set of
equations Ax = b; if A is tall, it solves the least squares problem.
8.3 The figure shows m = 50 points (ti , yi ) as circles. These points are well approximated by a function of the
form
eαt+β
f (t) = .
1 + eαt+β
(An example, for two specific values of α and β, is shown in dashed line).
0.8
0.6
f (t)
0.4
0.2
−1 0 1 2 3 4 5
t
Formulate the following problem as a least squares problem. Find values of the parameters α, β such that
eαti +β
≈ yi , i = 1, . . . , m, (16)
1 + eαti +β
You can assume that 0 < yi < 1 for i = 1, . . . , m.
Clearly state the error function you choose to measure the quality of the fit in (16), and the matrix A and
the vector b of the least squares problem. Test your method on the example data in the file logistic_fit.m.
(The command [t, y] = logistic_fit; creates arrays with the points ti , yi .)
38
8.4 We have N points in R2 , and a list of pairs of points that must be connected by links. The positions of
some of the N points are fixed; our task is to determine the positions of the remaining points. The objective
is to place the points so that some measure of the total interconnection length of the links is minimized. As
an example application, we can think of the points as locations of plants or warehouses, and the links as the
routes over which goods must be shipped. The goal is to find locations that minimize the total transportation
cost. In another application, the points represent the position of modules or cells on an integrated circuit,
and the links represent wires that connect pairs of cells. Here the goal might be to place the cells in such a
way that the the total length of wire used to interconnect the cells is minimized.
The problem can be described in terms of a graph with N nodes, representing the N points. With each free
node we associate a variable (ui , vi ) ∈ R2 , which represents its location or position.
In this problem we will consider the example shown in the figure below.
(0.5, 1)
l4
(u2 , v2 )
(u1 , v1 ) l7 (1, 0.5)
l1
l3 l2 l6
(−1, 0)
(u3 , v3 )
l5
(0, −1)
Here we have 3 free points with coordinates (u1 , v1 ), (u2 , v2 ), (u3 , v3 ). We have 4 fixed points, with coordi-
nates (−1, 0), (0.5, 1), (0, −1), and (1, 0.5). There are 7 links, with lengths l1 , l2 , . . . , l7 . We are interested
in finding the coordinates (u1 , v1 ), (u2 , v2 ) and (u3 , v3 ) that minimize the total squared length
where the 6-vector x contains the six variables u1 , u2 , u3 , v1 , v2 , v3 . Give the coefficient matrix A and
the vector b.
(b) Show that you can also obtain the optimal coordinates by solving two smaller least squares problems
where u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ). Give the coefficient matrices Ā, Â and the vectors b̄ and b̂.
What is the relation between Ā and Â?
(c) Solve the least squares problems derived in part (a) or (b) using MATLAB.
8.5 Least squares model fitting. In this problem we use least squares to fit several different types of models to a
given set of input-output data. The data set consists of a scalar input sequence u(1), u(2), . . . , u(N ), and
a scalar output sequence y(1), y(2), . . . , y(N ), with N = 100. The signals are shown in the following plots.
39
2 3
2
1
u(t) 1
y(t)
0 0
−1
−1
−2
−2 −3
0 20 40 60 80 100 0 20 40 60 80 100
t t
We will develop and compare seven different models that relate the signals u and y. The models range in
complexity from a simple constant to a nonlinear dynamic model:
The first four models are memoryless. In a memoryless model the output at time t, i.e., y(t), depends only
the input at time t, i.e., u(t). Another common term for such a model is static.
In a dynamic model, y(t) depends on u(s) for some s 6= t. Models (e), (f), and (g) are dynamic models, in
which the current output depends on the current input and the previous input. Such models are said to
have a finite memory of length one. Another term is 2-tap system (the taps refer to taps on a delay line).
Each of the models is specified by a number of parameters, i.e., the scalars α, β, etc. You are asked to find
least squares estimates (α̂, β̂, . . . ) for the parameters, i.e., the values that minimize the sum-of-squares of
the errors between predicted outputs and actual outputs. Your solutions should include:
For example, the affine 2-tap model (part (f)) depends on three parameters α, β1 , and β2 . The least squares
estimates α̂, β̂1 , β̂2 are found by minimizing
N
X 2
(y(t) − α − β1 u(t) − β2 u(t − 1)) .
t=2
(Note that we start at t = 2 so u(t − 1) is defined). You are asked to formulate this as a least squares
problem, solve it to find α̂, β̂1 , and β̂2 , plot the predicted output
40
and the residual r(t) = ŷ(t) − y(t), for t = 2, . . . , N , and give the value of the RMS residual
N
!1/2
1 X
Rrms = (y(t) − ŷ(t))2 .
N − 1 t=2
The data for the problem are available from the class webpage in the m-file systemid.m. The command is
[u, y] = systemid.
8.6 In this problem we use least squares to fit a circle to given points (ui , vi ) in a plane, as shown in the figure.
We use (uc , vc ) to denote the center of the circle and R for its radius. A point (u, v) is on the circle if
(u − uc )2 + (v − vc )2 = R2 . We can therefore formulate the fitting problem as
P
m 2
minimize (ui − uc )2 + (vi − vc )2 − R2
i=1
with variables uc , vc , R. Show that this can be written as a least squares problem if we make a change of
variables and use as variables uc , vc , and w = u2c + vc2 − R2 .
Test your formulation on the problem data in the file circlefit.m on the course website. The commands
[u,v] = circlefit;
plot(u, v, ’o’);
axis equal
will create a plot of the m = 50 points (ui , vi ) in the figure. The following code plots the 50 points and the
computed circle.
(assuming your MATLAB variables are called uc, vc, and R).
41
8.7 Suppose you are given m (m ≥ 2) straight lines
Li = {pi + ti qi | ti ∈ R}, i = 1, . . . , m
in Rn . Each line is defined by two n-vectors pi , qi . The vector pi is a point on the line; the vector qi specifies
the direction. We assume that the vectors qi are normalized (kqi k = 1) and that at least two of them are
linearly independent. (In other words, the vectors qi are not all scalar multiples of the same vector, so the
lines are not all parallel.) We denote by
with variable y. Express the least squares problem in the standard form
is nonsingular.
(b) Show that the solution x̄, ȳ of the set of linear equations
I A x̄ b
=
AT 0 ȳ 0
is given by x̄ = b − Axls and ȳ = xls , where xls is the solution of the least squares problem
The p × q matrix A, the p-vector b, and the q-vector c are given. The variables are the q-vector x̄ and the
p-vector ȳ.
42
(b) From part (a) we know that the solution x̄, ȳ is unique. Show that x̄ minimizes kAx − bk2 + kx + ck2 .
8.10 Explain how you can solve the following problems using the QR factorization.
(a) Find the vector x that minimizes kAx − b1 k2 + kAx − b2 k2 . The problem data are the m × n matrix
A and two m-vectors b1 and b2 . The matrix A has linearly independent columns. If you know several
methods, give the most efficient one.
(b) Find x1 and x2 that minimize kAx1 − b1 k2 + kAx2 − b2 k2 . The problem data are the m × n matrix A,
and the m-vectors b1 and b2 . The matrix A has linearly independent columns.
8.11 Solving normal equations versus QR factorization. In this problem we compare the accuracy of the two
methods for solving a least squares problem
We take
1 1 −10−k
A = 10−k 0 , b = 1 + 10−k ,
0 10−k 1 − 10−k
for k = 6, k = 7 and k = 8.
(a) Write the normal equations, and solve them analytically (i.e., on paper, without using MATLAB).
(b) Solve the least squares problem in MATLAB, for k = 6, k = 7 and k = 8, using the recommended
method x = A \ b. This method is based on the QR factorization.
(c) Repeat part (b), using x = (A’*A) \ (A’*b). Compare the results of this method with the results of
parts (a) and (b).
Remark. Type format long to make MATLAB display more than five digits.
8.12 Least squares updating. Suppose x̂ is the solution of the least squares problem
d − cT x̂
ŷ = x̂ + (AT A)−1 c.
1 + cT (AT A)−1 c
(b) Describe an efficient method for computing x̂ and ŷ, given A, b, c and d, using the QR factorization
of A. Clearly describe the different steps in your algorithm. Give the complexity of each step and the
overall complexity. In your total flop count, include all terms that are cubic (n3 , mn2 , m2 n, m3 ) and
quadratic (m2 , mn, n2 ). If you know several methods, give the most efficient one.
8.13 Least squares downdating. Let A be an m × n matrix and b and m-vector. We assume that A has linearly
independent columns and define x̂ as the solution of the least squares problem
m
X
minimize kAx − bk2 = (aTi x − bi )2 (17)
i=1
43
(aTi is the ith row of A). We also define ŷk as the solution of the least squares problem (17) with the kth
term in the sum, corresponding to row k of A, removed:
X k−1
X m
X
minimize (aTi y − bi )2 = (aTi y − bi )2 + (aTi y − bi )2 . (18)
i6=k i=1 i=k+1
We assume that for each k = 1, . . . , m, the matrix formed by removing row k from A has linearly independent
columns, so the solution ŷk of (18) is unique.
with variables y ∈ Rn and z ∈ R. Give a simple argument why the solution is equal to y = ŷk and
z = bk − aTk ŷk . In other words, explain why the y-component of the solution of (19) is also the solution
of (18).
(b) Use the normal equations for (19) to show that
bk − aTk x̂
ŷk = x̂ − (AT A)−1 ak .
1 − aTk (AT A)−1 ak
(c) Give an efficient algorithm, based on the QR factorization of A and the expression in part (b), for
computing x̂ and the m vectors ŷ1 , . . . , ŷm . Clearly explain the steps in your algorithm and give the
overall complexity (dominant term in the flop count as a function of m and n). If you know several
algorithms, give the most efficient one.
where A is an m × n matrix with linearly independent columns. Suppose a is an m-vector, not in the range
of A. Let (ŷ, ẑ) be the solution of the least squares problem
2
y
minimize
A a − b
z
.
The variables in this problem are the n-vector y and the scalar z.
(b) Formulate an efficient algorithm for computing x̂, ŷ, ẑ, using the QR factorization of A. Give the
complexity of the algorithm (including all cubic and quadratic terms in the flop count).
8.15 Let A be an m × n matrix with linearly independent columns, and b an m-vector not in the range of A.
44
(b) Suppose we partition the matrices in the QR factorization of part (a) as
R11 R12
Q = Q1 Q2 , R= ,
0 R22
−1
where Q1 is m × n, Q2 is m × 1, R11 is n × n, R12 is n × 1 and R22 is a scalar. Show that xls = R11 R12
is the solution of the least squares problem
minimize kAx − bk2
and that R22 = kAxls − bk.
8.16 Let x̂ and ŷ be the solutions of the least squares problems
minimize kAx − bk2 , minimize kAy − ck2
where A is an m × n matrix with linearly independent columns, and b and c are m-vectors. We assume that
Ax̂ 6= b.
(a) Show that the m × (n + 1) matrix A b has linearly independent columns.
(b) Show that the solution of the least squares problem
2
u
minimize
A b − c
v
,
We compare the complexity of two methods for computing the K least squares solutions x̂(1) , . . . , x̂(K) .
(This question arises in K-fold cross-validation, for example.) The first method solves the problems as K
independent least squares problems. The second method takes advantage of the common blocks in the
matrices Ck .
45
(a) What is the complexity of solving the K least squares problems (20), using the standard method (QR
factorization) for each problem?
(b) A more efficient method is based on the following idea.
• First compute the matrices
“Efficient” here means that the complexity is substantially less than the complexity of the standard method
based on the QR factorization. What is the complexity of your algorithm (number of flops for large m and
n)? (Note: we assume that U , D, V are given; you are not asked to include the complexity of computing
these matrices.)
8.19 Let A be an m × n matrix with linearly independent columns. Explain how you can solve each of the
following problems using the QR factorization of A. Give the complexity of your algorithm, including in the
flop count all cubic and quadratic terms (as a function of m and n). If you know several methods, give the
most efficient one.
(a) Compute bT AA† b, where b is an m-vector.
(b) Solve the linear equation
I A x b
=
AT 0 y c
with variables x, y. The m-vector b and the n-vector c are given.
(c) Solve the matrix least squares problem
where B is an m × k matrix,
P with
P k ≪ n. The variable X is an n × k matrix. (Recall the definition of
Frobenius norm: kY kF = ( i j Yij2 )1/2 .)
46
9 Multi-objective least squares
9.1 Formulate the following problem as a least squares problem. Find a polynomial
p(t) = x1 + x2 t + x3 t2 + x4 t3
• The values p(ti ) at 4 given points ti in the interval [0, 1] is approximately equal to given values yi :
p(ti ) ≈ yi , i = 1, . . . , 4.
The points ti are given and distinct (ti 6= tj for i 6= j). The values yi are also given.
• The derivatives of p at t = 0 and t = 1 are small:
p′ (0) ≈ 0, p′ (1) ≈ 0.
• The average value of p over the interval [0, 1] is approximately equal to the value at t = 1/2:
Z 1
p(t) dt ≈ p(1/2).
0
Give A and b such that E(x) = kAx − bk2 . Clearly state the dimensions of A and b, and what their elements
are.
9.2 The figure shows an illumination system of n lamps illuminating m flat patches. The variables in the problem
are the lamp powers x1 , . . . , xn , which can vary between 0 and 1.
lamp j
rij
θij
patch i
The illumination intensity at (the midpoint of) patch i is denoted Ii . We use a simple linear model for the
illumination intensities Ii as a function of the lamp powers xj : for i = 1, . . . , m,
n
X
Ii = Aij xj .
j=1
The matrix A (with coefficients Aij ) is available from the class webpage (see below), and was constructed
as follows. We take
−2
Aij = rij max{cos θij , 0},
47
where rij denotes the distance between lamp j and the midpoint of patch i, and θij denotes the angle between
the upward normal of patch i and the vector from the midpoint of patch i to lamp j, as shown in the figure.
This model takes into account “self-shading” (i.e., the fact that a patch is illuminated only by lamps in the
halfspace it faces) but not shading of one patch caused by another. Of course we could use a more complex
illumination model, including shading and even reflections. This just changes the matrix relating the lamp
powers to the patch illumination levels.
The problem is to determine lamp powers that make the illumination levels Ii close to a given desired level
Ides . In other words, we want to choose the n-vector x such that
n
X
Aij xj ≈ Ides , i = 1, . . . , m,
j=1
but we also have to observe the power limits 0 ≤ xj ≤ 1. This is an example of a constrained optimization
problem. The objective is to achieve an illumination level that is as uniform as possible; the constraint is that
the elements of x must satisfy 0 ≤ xj ≤ 1. Finding the exact solution of this minimization problem requires
specialized numerical techniques for constrained optimization. However, we can solve it approximately using
least squares.
In this problem we consider two approximate methods that are based on least squares, and compare them
for the data generated using [A, Ides] = illumdata, with the MATLAB file illumdata.m from the course
website. The elements of A are the coefficients Aij . In this example we have m = 11, n = 7 so A is 11 × 7,
and Ides = 2.
(a) Saturate the least squares solution. The first method is simply to ignore the bounds on the lamp powers.
We solve the least squares problem
m X
X n
minimize ( Aij xj − Ides )2
i=1 j=1
ignoring the constraints 0 ≤ xj ≤ 1. If we are lucky, the solution will satisfy the bounds 0 ≤ xj ≤ 1,
for j = 1, . . . , n. If not, we replace xj with zero if xj < 0 and with one if xj > 1.
Apply this method to Pm the problem data generated by illumdata.m, and calculate the resulting value
of the cost function i=1 (Ii − Ides )2 .
(b) Weighted least squares. The second method is to solve the problem
m X
X n n
X
minimize ( Aij xj − Ides )2 + λ (xj − 0.5)2 ,
i=1 j=1 j=1
where the constant λ ≥ 0 is used to attach a cost to the deviation of the powers from the value 0.5,
which lies in the middle of the power limits. For λ = 0, this is the same least squares problem as in
part (a). If we take λ large enough, the solution of this problem will satisfy 0 ≤ xj ≤ 1.
Formulate this problem as a least squares problem in the variables x, and solve it for λ = 1, λ = 2,
λ = 3, etc., until you find a value of λ such that
Pmall components of the solution x satisfy 0 ≤ xj ≤ 1. For
that solution x, calculate the cost function i=1 (Ii − Ides )2 and compare with the value you obtained
in part (a).
9.3 De-noising using least squares. The figure shows a signal of length 1000, corrupted with noise. We are asked
to estimate the original signal. This is called signal reconstruction, or de-noising, or smoothing. In this
problem we apply a smoothing method based on least squares.
48
0.5
xcorr
0
−0.5
i
We will represent the corrupted signal as a vector xcor of size 1000. (The values can be obtained as
xcor = lsdenoising using the file lsdenoising.m.) The estimated signal (i.e., the variable in the problem)
will be represented as a vector x̂ of size 1000.
The idea of the method is as follows. We assume that the noise in the signal is the small and rapidly varying
component. To reconstruct the signal, we decompose xcor in two parts xcor = x̂ + v where v is small and
rapidly varying, and x̂ is close to xcor (x̂ ≈ xcor ) and slowly varying (x̂i+1 ≈ x̂i ). We can achieve such a
decomposition by choosing x̂ as the solution of the least squares problem
P
999
minimize kx − xcor k2 + λ (xi+1 − xi )2 , (21)
i=1
where λ is a positive constant. The first term kx − xcor k2 measures how much x deviates from xcor . The
P999
second term, i=1 (xi+1 − xi )2 , penalizes rapid changes of the signal between two samples. By minimizing
a weighted sum of both terms, we obtain an estimate x̂ that is close to xcor (i.e., has a small value of
P999
kx̂ − xcor k2 ) and varies slowly (i.e., has a small value of i=1 (x̂i+1 − x̂i )2 ). The parameter λ is used to
adjust the relative weight of both terms.
Problem (21) is a least squares problem, because it can be expressed as
where
A= √I , b=
xcor
,
λD 0
and D is a 999 × 1000 matrix defined as
−1 1 0 0 ··· 0 0 0 0
0 −1 1 0 ··· 0 0 0 0
0 0 −1 1 ··· 0 0 0 0
.
. .. .. .. .. .. .. ..
D= . . . . . . . ..
0 0 0 0 ··· −1 1 0 0
0 0 0 0 ··· 0 −1 1 0
0 0 0 0 ··· 0 0 −1 1
The matrix A is quite large (1999 × 1000), but also very sparse, so we will solve the least squares problem
by solving the normal equations
(I + λDT D)x = xcor . (22)
MATLAB provides special routines for solving sparse linear equations, and they are used as follows. There
are two types of matrices: full (or dense) and sparse. If you define a matrix, it is considered full by default,
49
unless you specify that it is sparse. You can convert a full matrix to sparse format using the command
A = sparse(A), and a sparse matrix to full format using the command A = full(A).
When you type x = A \ b where A is n × n, MATLAB chooses different algorithms depending on the type
of A. If A is full it uses the standard methods for general matrices. If A is sparse, it uses an algorithm that
takes advantage of sparsity. In our application, the matrix I + λDT D is sparse (in fact tridiagonal), so if we
make sure to define it as a sparse matrix, the normal equations will be solved much more quickly than if we
ignore the sparsity.
The command to create a sparse zero matrix of dimension m × n is A = sparse(m,n). The command
A = speye(n) creates a sparse n × n-identity matrix. If you add or multiply sparse matrices, the result
is automatically considered sparse. This means you can solve the normal equations (22) by the following
MATLAB code (assuming λ and xcor are defined):
D = sparse(999,1000);
D(:,1:999) = -speye(999);
D(:,2:1000) = D(:,2:1000) + speye(999);
xhat = (speye(1000) + lambda*D’*D) \ xcor;
Solve the least squares problem (21) with the vector xcor defined in lsdenoising.m, for three values of λ:
λ = 1, λ = 100, and λ = 10000. Plot the three reconstructed signals x̂. Discuss the effect of λ on the quality
of the estimate x̂.
9.4 Regularized least squares image deblurring. This exercise is on the image deblurring problem in §15.3.3 of
the textbook. The purpose is to develop a fast method for solving the regularized least-squares problem on
page 321:
minimize kAx − yk2 + λ(kDv xk2 + kDh xk2 ). (23)
Notation. A black-and-white image of size n×n is represented as an n×n matrix X with Xij the intensity
of pixel i, j, or as an n2 -vector x. We use column-major order when converting a matrix X to a vector x:
X1:n,1
X1:n,2
x= .. .
.
X1:n,n
In MATLAB the conversion can be done by the command x = X(:) or, equivalently, x = reshape(X, n^2, 1).
To convert an n2 -vector x to an n × n matrix X we use X = reshape(x, n, n).
In (23), the vectors x and y have length n2 . The vector y is given and represents an observed, noisy and
blurred image Y of size n × n. The variable x is the reconstructed n × n image X in vector form.
The following notation will be used to express the matrices A, Dv , Dh in (23). As in exercise 6.9, we use
T (v), with v an n-vector, to denote the n × n circulant matrix with v as its first column:
v1 vn vn−1 · · · v3 v2
v2 v1 vn · · · v4 v3
v3 v2 v1 · · · v5 v4
T (v) = . .. .. .. .. .. .
.. . . . . .
vn−1 vn−2 vn−3 · · · v1 vn
vn vn−1 vn−2 · · · v2 v1
50
We also define T (U ) for an n × n matrix U . If the columns of U are u1 , u2 , . . . , un , then T (U ) is the n2 × n2
matrix
T (u1 ) T (un ) T (un−1 ) · · · T (u3 ) T (u2 )
T (u2 ) T (u 1 ) T (un ) · · · T (u4 ) T (u3 )
T (u3 ) T (u 2 ) T (u1 ) · · · T (u5 ) T (u4 )
T (U ) = .. .. .. .. .. .. .
. . . . . .
T (un−1 ) T (un−2 ) T (un−3 ) · · · T (u1 ) T (un )
T (un ) T (un−1 ) T (un−2 ) · · · T (u2 ) T (u1 )
This structure is called block-circulant with circulant blocks (BCCB). Each block T (ui ) is a circulant n × n
matrix. The matrix T (U ) is block-circulant because each block column is a circular downward shift of the
previous block column.
Image blurring. A linear, spatially invariant blurring operation is defined by a set of coefficients Pkl ,
with k and l integers ranging from −r to r. We call P (or, more accurately, the function that maps a pair
(k, l) to the coefficient Pkl ) the point spread function (PSF). The integer r is the width of the PSF. We will
assume that 2r + 1 ≤ n (usually, r ≪ n). The blurring operation defined by the PSF P is a two-dimensional
convolution. It transforms an n × n image X into a blurred n × n image Y defined by
r
X r
X
Yij = Pkl Xi−k,j−l , i = 1, . . . , n, j = 1, . . . , n. (24)
k=−r l=−r
Note that the sum in this definition references some values of Xi−k,j−l with indices outside the interval [1, n].
To fully specify the blurring operation we therefore need to make assumptions about the image X outside
the frame. These assumptions are called boundary conditions. In this exercise we will use periodic boundary
conditions. Periodic boundary conditions assume that the image X is repeated periodically outside the
frame. One way to write this is to replace the convolution (24) with
r
X r
X
Yij = Pkl X̄i−k,j−l , i = 1, . . . , n, j = 1 . . . , n, (25)
k=−r l=−r
X X X
X̄ = X X X
X X X
and the indices i, j in X̄ij run from −n + 1 to 2n. For example, if n = 3, r = 1, the convolution with periodic
boundary conditions gives
Y P P−1,0 P1,0 P0,−1 P−1,−1 P1,−1 P01 P−1,1 P11 X
11 00 11
Y21 P10 P00 P−1,0 P1,−1 P0,−1 P−1,−1 P11 P01 P−1,1 X21
Y31 P−1,0 P10 P00 P−1,−1 P1,−1 P0,−1 P−1,1 P11 P01 X31
Y12 P01 P−1,1 P11 P00 P−10 P10 P0,−1 P−1,−1 P1,−1 X12
Y22 = P11 P01 P−1,1 P10 P00 P−1,0 P1,−1 P0,−1 P−1,−1 X22 .
Y32 P−1,1 P11 P01 P−1,0 P10 P00 P−1,−1 P1,−1 P0,−1 X32
Y13 P0,−1 P−1,−1 P1,−1 P01 P−1,1 P11 P00 P−1,0 P10 X13
Y23 P1,−1 P0,−1 P−1,−1 P11 P01 P−1,1 P10 P00 P−1,0 X23
Y33 P−1,−1 P1,−1 P0,−1 P−1,1 P11 P01 P−1,0 P10 P00 X33
51
where B is the following n × n matrix constructed from the PSF coefficients.
P00 P01 · · · P0r 0 ··· 0 P0,−r ··· P0,−1
P10 P 11 · · · P 1r 0 ··· 0 P1,−r ··· P1,−1
.. .. .. .. .. .. ..
. . . . . . .
Pr0 P r1 · · · P rr 0 ··· 0 Pr,−r ··· Pr,−1
0 0 ··· 0 0 ··· 0 0 ··· 0
B= .. .. .. .. .. .. .. .
. . . . . . .
0 0 · · · 0 0 ··· 0 0 ··· 0
P−r,0 P−r,1 · · · P−r,r 0 ··· 0 P−r,−r ··· P−r,−1
.. .. .. .. .. .. ..
. . . . . . .
P−1,0 P−1,1 · · · P−1,r 0 ··· 0 P−1,−r ··· P−1,−1
As we will see, periodic boundary conditions are very convenient mathematically. Although the assumption
of periodicity is not realistic, it is an acceptable approximation if r ≪ n.
Vertical and horizontal differencing. Next we discuss the matrices Dv and Dh in (23). The vector
Dv x is the image obtained by subtracting from X the image X shifted up over one pixel, so that
n X
X n
kDv xk2 = (Xij − Xi+1,j )2 .
i=1 j=1
Here too, the sum references values of Xij outside the frame (namely, the values Xn+1,j are needed when
i = n). To be consistent with the boundary conditions used for the blurring operation, we use periodic
boundary conditions and assume Xn+1,j = X1j . With this assumption, Dv is an n2 × n2 matrix
D 0 ··· 0
0 D ··· 0
Dv = . .. . . .. ,
.. . . .
0 0 ··· D
(The matrix E is zero, except for the elements 1, −1 in the first column.)
52
The last term kDh xk2 in (23) is a penalty on the horizontal differences in the image: Dh is defined by
n X
X n
kDh xk2 = (Xij − Xi,j+1 )2 .
i=1 j=1
We again use periodic boundary conditions and define Xi,n+1 = Xi1 . This gives
I −I 0 ··· 0 0 0
0 I −I ··· 0 0 0
0 0 I ··· 0 0 0
.
.. .
.. .. .. .. .. .
Dh = . . . .
0 0 0 · · · I −I 0
0 0 0 ··· 0 I −I
−I 0 0 ··· 0 0 I
The matrix W f is n2 × n2 and has n block rows and columns. The i, j block of W
f is Wij W where Wij is the
f
i, j element of W . (The operation that constructs W from W is known as a Kronecker product and written
f = W ⊗ W .) Using (29) and the property W H W = (1/n)I one can show that
W
fH W
W f = n2 I.
f −1 = (1/n2 )W
Therefore W f H and W
fWf H = n2 I.
It is important to keep in mind that we never need the matrix W f explicitly, since we use the fft2 and
f
ifft2 functions for the matrix-vector multiplications with W . The product v = W f u can be evaluated by
combining fft2 and the reshape function:
53
Finally, we need the two-dimensional counterpart of the factorization result for circulant matrices proved in
exercise 6.9. The result is as follows. Suppose U is an n × n matrix and u is the corresponding n2 -vector
(with the elements of U in column-major order). Then the n2 × n2 BCCB matrix T (U ) can be factored as
1 fH f u)W
f.
T (U ) = W diag(W (30)
n2
Assignment.
Assume A = T (B), Dv = T (E), Dh = T (E T ) are BCCB matrices, where B is a given n × n matrix and
E is the n × n matrix defined in (26). Use the factorization (30) to derive a fast algorithm (with order
n2 log n complexity) for solving the normal equations. Implement the fast algorithm in MATLAB or
Octave.
(b) Download the file deblur.mat on the course website and load it in MATLAB or Octave (load deblur).
The file contains two matrices Y and B. The matrix Y is the blurred image and can be displayed using
the command imshow(Y). The matrix B defines the blurring matrix A = T (B). (The image is from
the USC-SIPI Image Database at http://sipi.usc.edu/database.)
Test your deblurring code with several values of λ, plot the reconstructed image for each value (using
imshow(X)), and determine the value of λ that gives the best result (visually, in your judgment). It is
best to search for a good value of λ over a large range, by using a series of values evenly spaced on a
logarithmic scale, for example, λ = 10−6 , 10−5 , 10−4 , . . . .
54
10 Constrained least squares
10.1 Minimum-energy optimal control. A simple model of a vehicle moving in one dimension is given by
s1 (t + 1) 1 1 s1 (t) 0
= + u(t), t = 0, 1, 2, . . . .
s2 (t + 1) 0 0.95 s2 (t) 0.1
s1 (t) is the position at time t, s2 (t) is the velocity at time t, and u(t) is the actuator input. Roughly
speaking, the equations state that the actuator input affects the velocity, which in turn affects the position.
The coefficient 0.95 means that the velocity decays by 5% in one sample period (for example, because of
friction), if no actuator signal is applied. We assume that the vehicle is initially at rest at position 0:
s1 (0) = s2 (0) = 0.
We will solve the minimum energy optimal control problem: for a given time horizon N , choose inputs
u(0), . . . , u(N − 1) so as to minimize the total energy consumed, which we assume is given by
N
X −1
E= u(t)2 .
t=0
In addition, the input sequence must satisfy the constraint s1 (N ) = 10, s2 (N ) = 0. Our task therefore is to
bring the vehicle to the final position s1 (N ) = 10 with final velocity s2 (N ) = 0, as efficiently as possible.
(a) Formulate the minimum energy optimal control problem as a least norm problem
minimize kxk2
subject to Cx = d.
Clearly state what the variables x, and the problem data C and d are.
(b) Solve the problem for N = 30. Plot the optimal u(t), the resulting position s1 (t), and velocity s2 (t).
(c) Solve the problem for N = 2, 3, . . . , 29. For each N calculate the energy E consumed by the optimal
input sequence. Plot E versus N . (The plot looks best if you use a logarithmic scale for E, i.e.,
semilogy instead of plot.)
(d) Suppose we allow the final position to deviate from 10. However, if s1 (N ) 6= 10, we have to pay a
penalty, equal to (s1 (N ) − 10)2 . The problem is to find the input sequence that minimizes the sum of
the energy E consumed by the input and the terminal position penalty,
N
X −1
u(t)2 + (s1 (N ) − 10)2 ,
t=0
Remark. If C has linearly independent rows, then the MATLAB command x = C \ d computes a solution
to Cx = d, but it is not the least norm solution. We can use the command x = C’ * ((C*C’) \ d) to
compute the least norm solution. We can also use the QR factorization method, using the code
[Q, R] = qr(C’, 0);
x = Q * (R’ \ d);
10.2 Two vehicles are moving along a straight line. For the first vehicle we use the same model as in exercise 10.1:
s1 (t + 1) 1 1 s1 (t) 0
= + u(t), t = 0, 1, 2, . . . ,
s2 (t + 1) 0 0.95 s2 (t) 0.1
55
s1 (t) is the position at time t, s2 (t) is the velocity at time t, and u(t) is the actuator input. We assume that
the vehicle is initially at rest at position 0: s1 (0) = s2 (0) = 0. The model for the second vehicle is
p1 (t + 1) 1 1 p1 (t) 0
= + v(t), t = 0, 1, 2, . . . ,
p2 (t + 1) 0 0.8 p2 (t) 0.2
p1 (t) is the position at time t, p2 (t) is the velocity at time t, and v(t) is the actuator input. We assume that
the second vehicle is initially at rest at position 1: p1 (0) = 1, p2 (0) = 0.
Formulate the following problem as a least norm problem, and solve it in MATLAB (see the remark at the
end of exercise 10.1). Find the control inputs u(0), u(1), . . . , u(19) and v(0), v(1), . . . , v(19) that minimize
the total energy
X19 X19
u(t)2 + v(t)2
t=0 t=0
In other words, at time t = 20 the two vehicles must have velocity zero, and be at the same position. (The
final position itself is not specified, i.e., you are free to choose any value as long as s1 (20) = p1 (20).)
Plot the positions s1 (t) and p1 (t) of the two vehicles, for t = 1, 2, . . . , 20.
10.3 Explain how you would solve the following problems using the QR factorization.
Pn
(a) Find the solution of Cx = d with the smallest value of i=1 wi x2i :
P
n
minimize wi x2i
i=1
subject to Cx = d.
The problem data are the p × n matrix C, the p-vector d, and the n vector w. We assume that A has
linearly independent rows, and wi > 0 for all i.
(b) Find the solution of Cx = d with the smallest value of kxk2 − cT x:
minimize kxk2 − cT x
subject to Cx = d.
The problem data are the n-vector c, the p × n matrix C, and the p-vector d. We assume that C has
linearly independent rows.
10.4 Show how to solve the following problems using the QR factorization of A. In each problem A is an m × n
matrix with linearly independent columns. Clearly state the different steps in your method. Also give the
complexity, including all terms that are quadratic (order m2 , mn, or n2 ), or cubic (order m3 , m2 n, mn2 ,
n3 ). If you know several methods, give the most efficient one.
56
(c) Solve the least norm problem
minimize kxk2 + kyk2
subject to AT x − 2AT y = b.
The variables are the m-vectors x and y.
(d) Solve the quadratic minimization problem
minimize xT AT Ax + bT x + c.
10.5 If A is an m × n matrix with linearly independent columns, and D is an m × m diagonal matrix with positive
diagonal elements, then the coefficient matrix of the equation
2
D A x̂ b
=
AT 0 ŷ c
exists. The matrix Q1 has size n × m, Q2 has size n × 1, R11 has size m × m, R12 is m × 1, and R22 is a
scalar. Show that x̂ = Q2 R22 solves the optimization problem
minimize kx − bk2
subject to Ax = 0.
10.7 Suppose A is an m × n matrix with linearly independent columns. Let x̂ be the solution of the optimization
problem
minimize kAx − bk2 + 2cT x
with b ∈ Rm and c ∈ Rn , and let ŷ be the solution of
minimize ky − bk2
subject to AT y = c.
(a) Show that ŷ = b − Ax̂. (Hint. To find an expression for x̂, set the gradient of kAx − bk2 + 2cT x equal
to zero.)
57
(b) Describe an efficient method for calculating x̂ and ŷ using the QR factorization of A. Clearly state the
different steps in your algorithm and give the complexity, including all terms that are quadratic (m2 ,
mn, n2 ) or cubic (m3 , m2 n, mn2 , n3 ) in m and n.
10.8 Consider the underdetermined set of linear equations
Ax + By = b
where the p-vector b, the p × p matrix A, and the p × q matrix B are given. The variables are the p-vector x
and the q-vector y. We assume that A is nonsingular, and that B has linearly independent columns (which
implies q ≤ p). The equations are underdetermined, so there are infinitely many solutions. For example, we
can pick any y, and solve the set of linear equations Ax = b − By to find x.
Below we define four solutions that minimize some measure of the magnitude of x, or y, or both. For each
of these solutions, describe a method for computing x and y using a QR or LU factorization. Clearly specify
the matrices that you factor, and the type of factorization. If you know several methods, give the most
efficient one.
(a) The solution x, y with the smallest value of kxk2 + kyk2
(b) The solution x, y with the smallest value of kxk2 + 2kyk2 .
(c) The solution x, y with the smallest value of kyk2 .
(d) The solution x, y with the smallest value of kxk2 .
10.9 Let A be a real m × n matrix with linearly independent columns, and let b be a real m-vector. We consider
two least squares problems. The first problem is the standard
Here ei denotes the ith unit vector of length n (an n-vector with all its elements zero, except the ith element,
which is one).
(a) Let x̂ be the solution of (32). Show that the solution of (33) is
x̂i
x = x̂ − (AT A)−1 ei . (34)
((A A)−1 )
T
ii
The denominator in the second term is the ith diagonal element of the inverse (AT A)−1 .
(b) Describe an efficient algorithm, based on the QR factorization of A, to calculate x̂ and the vector x
in (34). Carefully state the different steps in your algorithm, and give the complexity of each step
(number of flops for large m, n).
10.10 Let A be an m × n matrix with linearly independent rows, and let d be a nonzero n-vector that satisfies
Ad = 0. Denote by x
b the solution of the least norm problem
minimize kxk
subject to Ax = b
and by yb the solution of the least norm problem
minimize kyk
subject to Ay = b
dT y = c.
58
Show that
c
yb − x
b= d.
dT d
10.11 Least squares fit of piecewise-polynomial function. This exercise is on the piecewise-polynomial fitting prob-
lem of page 340 of the textbook. Download the file splinefit.m and run it in MATLAB/Octave to create
two arrays t and y of length 100. The vector t contains 100 points in the interval [−1, 1]. The first 50 points
are in [−1, 0], the last 50 in [0, 1]. You are asked to find two cubic polynomials
p(x) = θ1 + θ2 x + θ3 x2 + θ4 x3 , q(x) = θ5 + θ6 x + θ7 x2 + θ8 x3 ,
that give the best least squares fit on the intervals [−1, 0] and [0, 1], respectively, and satisfy the continuity
constraints p(0) = q(0), p′ (0) = q ′ (0):
P
50 P
100
minimize (p(ti ) − yi )2 + (q(ti ) − yi )2
i=1 i=51
subject to p(0) = q(0), p′ (0) = q ′ (0).
The variables are the coefficients θ1 , . . . , θ8 . Formulate this as a constrained least squares problem
In MATLAB/Octave:
sol = [ A’*A, C’; C, zeros(p,p) ] \ [ A’*b; d ];
x = sol(1:n);
10.12 Suppose A is an m × n matrix with linearly independent rows aT1 , . . . , aTm , and b is an m-vector.
minimize kx − x̂k2
subject to Ax = b
f (x̂) = A† b + (I − A† A)x̂.
59
The index j counts the number of updates (projections) we have made.
(i) Show that f (x(j+1) ) = f (x(j) ). This implies that f (x(j) ) = f (x(0) ) for all j.
(ii) Show that
kx(j+1) − f (x(0) )k2 = kx(j) − f (x(0) )k2 − (bi − aTi x(j) )2 .
10.13 For each of the following constrained least squares problems, give the optimality conditions and explain how
to solve the problem using the QR factorization of A. Clearly state the different steps in your method and
give the complexity, including all terms that are quadratic (order m2 , mn, or n2 ) and cubic (order m3 , m2 n,
mn2 , n3 ). Also include the cost of the QR factorization. If you know several methods, give the most efficient
one.
(a)
minimize kAx − bk2
subject to cT x = d.
A is an m × n matrix with linearly independent columns. b is an m-vector. c is a nonzero n-vector. d
is a scalar. The variable x is an n-vector.
(b)
minimize kx − bk2 + ky − ck2
subject to AT x = AT y.
A is an m × n matrix with linearly independent columns. b and c are m-vectors. The variables x and
y are m-vectors.
10.14 Let A be an m × n matrix with linearly independent columns. For i = 1, . . . , n, define x̂(i) as the solution of
the constrained least squares problem
minimize kAxk2
(35)
subject to eTi x = −1.
(ei denotes the ith unit vector, i.e., column i of the n × n identity matrix.)
(b) Describe an algorithm for computing the n vectors x̂(1) , . . . , x̂(n) from the QR factorization of A. Give
a flop count for each step and the total flop count (keeping only dominant terms).
10.15 The figure shows a resistor circuit and the corresponding directed graph, which has five vertices and eight
edges. (However, note that the details of the circuit do not matter. The important features are that each
branch contains a resistor Rk > 0 and a voltage source Ek with the orientation shown.)
60
E2 R2
x2
+
−
2
x3 x8 1 3
x1 x5
E3 E8
3 8
−+
+−
R3 R8
E1 + + E5
− −
1 5 5
R1 R7 R6 R5
E7 E6 7 6
−+
−+
x7
x6 4
2 4
+
−
x4
E4 R4
(a) Show that the currents x1 , . . . , x8 in the circuit are the solution of the constrained least squares problem
P
8
minimize Rk (xk + Ek /Rk )2
k=1
subject to Cx = 0.
(Since the sum of the rows of C is zero, we can also remove one of the five equations in Cx = 0 and
obtain an equivalent constrained least squares problem.)
(b) Is the solution x of the constrained least squares problem in part (a) unique?
61
11 Cholesky factorization
11.1 Are the following matrices positive definite?
−1 2 3
(a) A = 2 5 −3 .
3 −3 2
(b) A = I − uuT where u is an n-vector with kuk < 1.
I B
(c) A = where B is an m × n matrix.
BT I + BT B
1 uT
(d) A = where u is an n-vector with kuk < 1.
u I
11.2 Show that the inverse of a positive definite matrix is positive definite.
11.4 A square matrix P is called a symmetric projection matrix if P = P T and P 2 = P . Show that a symmetric
projection matrix P satisfies the following properties.
The mapping f (x) = Ax is the orthogonal projection of x on the line through a. The mapping g(x) = Bx =
x − Ax is the difference between x and its projection on the line through a.
11.6 The elementwise product C = A ◦ B of two n × n matrices A, B is defined as the n × n matrix C with
elements Cij = Aij Bij . (In MATLAB notation: C = A .* B .)
where diag(d) is the diagonal matrix with the vector d on its diagonal. Use this observation to show
that the matrix A ◦ (ddT ) is positive semidefinite if A is positive semidefinite.
(b) Suppose A is n × n and positive semidefinite, and D is an n × m matrix. Show that A ◦ (DDT ) is
positive semidefinite.
Pm
(Hint. Write DDT as DDT = k=1 dk dTk where dk is the kth column of D.)
(c) Suppose A is n × n and positive definite, and D is an n × m matrix with at least one nonzero element
in every row. Show that A ◦ (DDT ) is positive definite.
62
11.7 Compute the Cholesky factorization of
4 6 2 −6
6 34 3 −9
A=
2
3 2 −1
−6 −9 −1 38
You can use MATLAB to verify the result, but you have to provide the details of your calculations.
11.8 For what values of the scalar a are the following matrices positive definite? To derive the conditions, factor
A using a Cholesky factorization and collect the conditions on a needed for the factorization to exist.
1 a
(a) A = .
a 1
1 a 0
(b) A = a 1 a .
0 a 1
1 0 1
(c) A = 0 1 1 .
1 1 a
1 0 a
(d) A = 0 1 0 .
a 0 1
a 1 0
(e) A = 1 −a 1 .
0 1 a
I aI
(f) A = . I is the n × n identity matrix.
aI I
I au
(g) A = . I is the n×n identity matrix and u = (1, 1, . . . , 1), the n-vector with all its elements
auT 1
equal to one.
1 1 1
(h) A = 1 a a .
1 a 2
11.9 Suppose A is an n × n positive definite matrix. For what values of the scalar β is the matrix
A −A
−A βA
positive definite?
11.10 Let A be a positive definite matrix of size n × n. For what values of the scalar a are the following matrices
positive definite?
A ae1 A e1 A ae1
(a) (b) (c) .
aeT1 1 eT1 a aeT1 a
(e1 = (1, 0, . . . , 0) denotes the first unit vector of length n.) Give your answer for each of the three problems
in the form of upper and/or lower bounds (amin < a < amax , a > amin , or a < amax ). Explain how you can
compute the limits amin and amax using the Cholesky factorization A = RT R.
63
11.11 In each subproblem D is a diagonal n × n matrix. For what values of D11 , . . . , Dnn is the matrix A positive
definite? Also give the Cholesky factorization of A (if the factorization exists).
D I
(a) A = .
I −D
I D
(b) A = .
D D
D 1
(c) A = .
1T 1
11.12 Suppose u is a nonzero n-vector. For what values of the scalar a are the following matrices positive definite?
Express your answer as upper and/or lower bounds on a that depend on the elements of u.
11.14 You are given the Cholesky factorization A = RT R of a positive definite matrix A of size n × n, and an
n-vector u.
64
11.15 A matrix A is tridiagonal if Aij = 0 for |i − j| > 1, i.e., A has the form
A11 A12 0 ··· 0 0 0
A21 A22 A23 · · · 0 0 0
0 A32 A33 · · · 0 0 0
A = ... ..
.
..
.
..
.
..
.
..
.
..
. .
0 0 0 · · · A A 0
n−2,n−2 n−2,n−1
0 0 0 · · · An−1,n−2 An−1,n−1 An−1,n
0 0 0 ··· 0 An,n−1 Ann
What is the complexity of computing the Cholesky factorization of a tridiagonal positive definite matrix of
size n × n? Count square roots as one flop and keep only the leading term in the total number of flops.
The dots indicate the positions of the nonzero elements; all the other elements are zero.
The Cholesky factor R of A has one of the following upper-triangular nonzero patterns. Which one is correct?
For each R, explain why R is or is not the Cholesky factor of A.
• • • • • •
• • • • •
(a) R = • •
(b) R =
• •
• •
• •
• • • • • •
• • • • •
(c) R = • •
(d) R =
• • •
• • • •
• •
where R11 and R22 are upper triangular matrices with positive diagonal elements. The blocks R11 , R12 ,
R22 , and the two identity matrices on the right-hand side, all have size n × n. What is the complexity of
computing this factorization?
65
(b) Suppose a > −1/kuk2 . It can be shown that the triangular factor in the Cholesky factorization I +
auuT = RT R can be written as
β1 γ1 u 2 γ1 u 3 ··· γ1 un−1 γ1 un
0 β2 γ2 u3 ··· γ2 un−1 γ2 un
0 0 β3 ··· γ3 un−1 γ3 un
R= . .. .. .. .. .. .
.. . . . . .
0 0 0 ··· βn−1 γn−1 un
0 0 0 ··· 0 βn
The entries to the right of the diagonal in row k are the entries uk+1 , . . . , un of the vector u, multiplied
with γk . Give an algorithm, with a complexity linear in n, for calculating the parameters β1 , . . . , βn
and γ1 , . . . , γn−1 , given the vector u and the scalar a.
where a is a real scalar. The i, j element is Aij = a|i−j| . The matrix A is a Toeplitz matrix, i.e., constant
along the diagonals. (Note, however, that A is not circulant; the problem does not require any of the
properties of circulant matrices used in exercise 6.9).
We will use the following property (which you are not asked to show): if A is positive definite, then its
Cholesky factor has the form
D11 0 0 ··· 0 1 a a2 ··· an−1
0 D 22 0 ··· 0 0 1 a ··· an−2
0 0 D ··· 0 0 0 1 ··· an−3
R= 33 , (36)
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
0 0 0 ··· Dnn 0 0 0 ··· 1
(a) Suppose A is positive definite, so its Cholesky factor can be written as (36). Give an expression for the
diagonal elements Dii . (Hint. Consider the diagonal elements in the equality A = RT R.)
(b) For what values of a is A positive definite?
(c) Give a simple explicit formula for the inverse of R in (36).
(d) Use part (c) to formulate a fast algorithm, with order n complexity, for solving Ax = b, when A is
positive definite.
11.20 Multi-class classification. In this exercise we implement the handwritten digit classification method of the
lecture on Cholesky factorization on a smaller data set (of 5000 examples) than used in the lecture. The
data are available in the file mnist.mat. The file contains four variables: Xtrain, Xtest, labels_train,
labels_test.
The variable Xtrain is a 5000×784 matrix Xtrain containing 5000 images. Each row is a 28×28 image stored
as a vector of length 784. (To display the image in row i, use imshow(reshape(Xtrain(i,:), 28, 28)’).)
66
The array labels_train is a vector of length 5000, with elements from {0, 1, . . . , 9}. The ith element is the
digit shown in row i of Xtrain . The 5000 images in Xtrain will be used to compute the classifier.
The matrix Xtest and vector labels_test are defined similarly, and give 5000 examples that will be used
to test the classifier.
• Binary classifiers. The multi-class classification method is based on 10 binary classifiers. Each of the
binary classifiers is designed to distinguish one of the digits versus the rest. We will use the polynomial
kernel of degree 3, as in the lecture.
To compute the classifier for digit k versus the rest, we solve a linear equation
(Q + λI)w = y,
Qij = (1 + xTi xj )3 , i, j = 1, . . . , N,
and xTi is the ith row of the matrix Xtrain . The coefficient λ is a positive regularization parameter. The
right-hand side y is a vector of length N , with yi = 1 if the image in row i of Xtrain is an example of
digit k, and yi = −1 otherwise.
In MATLAB the coefficients w for all ten binary classifiers can be computed as follows.
load mnist;
[N, n] = size(Xtrain);
Y = -ones(N, 10);
for j = 1:10
I = find(labels_train == j-1);
Y(I, j) = 1;
end;
W = ( (1 + Xtrain * Xtrain’).^3 + lambda * eye(N) ) \ Y;
Here we first create an N × 10 matrix Y with Yij = 1 if image i is an example of digit j − 1 and Yij = −1
otherwise. The computed matrix W has size N × 10 and contains in its jth column the coefficients w
of the classifier for digit j − 1 versus the rest. The binary classifier for digit j − 1 can be evaluated at
a new image z by computing
N
X
f˜(j) (z) = Wij (1 + z T xi )3
i=1
˜(j)
and assigning z to class j if f (x) is greater than or equal to zero.
• Multi-class classifier. We combine the ten binary classifiers into a multi-class classifier by taking the
maximum of the ten functions f˜(j) (z):
N
!
X
ˆ ˜(j)
f (z) = argmax f (z) = argmax T 3
Wij (1 + z xi ) .
j=1,...,10 j=1,...,10
i=1
In MATLAB, if z is an image stored as a column vector of length 784, the prediction fˆ(z) can be
computed as
[val, prediction] = max((( 1 + z’ * Xtrain’ ).^3) * W);
On the right-hand side we compute the maximum of a row vector of length 10. The first output
argument on the left-hand side is the maximum value; the second output argument is the column index
of the maximum (an integer between 1 and 10).
The predictions for all examples in the training set can be computed using
[val, prediction] = max((( 1 + Xtrain * Xtrain’).^3) * W, [], 2);
67
On the right-hand side we have a matrix of size 5000 × 10 as the first argument of max. The second
and third arguments are needed to indicate that we are taking the maximum over the elements in each
row. The first output argument on the left-hand side is a column vector of length N with the maximum
value in each row; the second output argument is a column vector of length N with the column indices
of the maximum elements in each row. This vector therefore contains the class predictions for the N
rows of Xtrain . The command
I = find( prediction - 1 ~= labels_train )
returns the indices of the rows of Xtrain that are misclassified by the multi-class classifier.
Similarly, we can compute the predictions for all examples in the test set using
[val, prediction] = max((( 1 + Xtest * Xtrain’).^3) * W, [], 2);
Implement this method for a range of regularization parameters λ = 1, 10, . . . , 107 . Plot the error rate for
training set and test set as a function of λ. Choose the λ that (approximately) gives the smallest error on
the test set, and give the confusion matrix of the classifier for this value of λ.
minimize xT Ax
subject to cT x = 1
is given by
1
x= A−1 c.
cT A−1 c
Hint. By writing xT Ax = xT RT Rx = kRxk2 , where R is the Cholesky factor of A, we can interpret
the problem as a constrained least squares problem.
(b) Suppose λ > 0. Show that the solution of the optimization problem
is given by
λ
x= A−1 c.
1 + λcT A−1 c
11.22 Let a and b be two nonzero n-vectors.
is always positive semidefinite. (The vector 1 is the n-vector with all its elements equal to one.)
(b) Assume n ≥ 3. Use the Cholesky factorization of A to derive the necessary and sufficient conditions
for A to be positive definite. Express the conditions in terms of the averages avg(a), avg(b), standard
deviations std(a), std(b), and correlation coefficient ρ of the vectors a and b. What do the conditions
mean on a scatterplot of a and b?
11.23 Let A be an m × n matrix, B a positive definite n × n matrix, and C a positive definite m × m matrix.
68
(a) Show that the matrix
B AT
A −C
is nonsingular, and that the following two expressions for its inverse are correct:
−1
B AT B −1 0 B −1 AT
= − (C + AB −1 AT )−1 AB −1 −I
A −C 0 0 −I
and −1
B AT 0 0 I
= + (B + AT C −1 A)−1 I AT C −1 .
A −C 0 −C −1 C −1 A
(b) Suppose B = RT R and C = U T U are the Cholesky factorizations of B and C. Define (x, y) as the
solution of the equation
B AT x b
= .
A −C y c
Show that x is the minimizer of the function
11.24 Let A be a positive definite matrix with negative off-diagonal entries (Aij < 0 for i 6= j), and let R be the
Cholesky factor of A.
(a) Show that the entries of R above the diagonal are negative (Rij < 0 for j > i).
(b) Use the result of part (a) to show that the entries of R−1 above the diagonal are positive ((R−1 )ij > 0
for j > i).
(c) Use the result of part (b) to show that all the entries of A−1 are positive.
69
12 Nonlinear equations
12.1 The nonlinear resistive circuit shown below is described by the nonlinear equation
f (x) = g(x) − (E − x)/R = 0.
The function g(x) gives the current through the nonlinear resistor as a function of the voltage x across its
terminals.
R
y = g(x)
E x
Use Newton’s method to find all the solutions of this nonlinear equation, assuming that g(x) = x3 −6x2 +10x.
Consider three cases:
(a) E = 5, R = 1.
(b) E = 15, R = 3.
(c) E = 4, R = 0.5.
Select suitable starting points by plotting f over the interval [0, 4], and visually selecting a good starting
point. You can terminate the Newton iteration when |f (x(k) )| < 10−8 . Compare the speed of convergence
for the three problems, by plotting |f (x(k) )| versus k, or by printing out |f (x(k) )| at each iteration.
12.2 The figure shows the function n(t) = 10te−2t + e−t .
2.5
1.5
n(t)
0.5
0
0 0.5 1 1.5 2 2.5 3
t
Use Newton’s method to answer the following questions.
(a) Find the two values of t at which n(t) = 2.
(b) Find the value of t at which n(t) reaches its maximum.
12.3 Find all values of x for which k(A + xI)−1 bk = 1 where
−3 0 0 1
A= 0 1 0 , b = 1 .
0 0 2 1
Use Newton’s method, applied to the equation g(x) = 1, where
1 1 1
g(x) = k(A + xI)−1 bk2 = 2
+ 2
+ .
(−3 + x) (1 + x) (2 + x)2
First plot g(x) versus x to determine the number of solutions and to select good starting points.
70
12.4 Use Newton’s method to find all solutions of the two equations
in the two variables x1 , x2 . Each equation defines an ellipse in the (x1 , x2 )-plane. You are asked to find the
points where the ellipses intersect.
12.5 Explain how you would solve the following problem using Newton’s method. We are given three distinct
points a = (a1 , a2 ), b = (b1 , b2 ), c = (c1 , c2 ) in a plane, and two positive numbers α and β. Find a point
x = (x1 , x2 ) that satisfies
kx − ak = αkx − bk, kx − ak = βkx − ck. (37)
Clearly state the equations f (x) = 0 to which you apply Newton’s method (these equations can be the two
equations (37) or an equivalent set of equations), and the linear equations you solve at each iteration of
the algorithm. You do not have to discuss the selection of the starting point, the stopping criterion, or the
convergence of the method.
71
13 Unconstrained optimization
13.1 Derive the gradient and Hessian of the following functions g. Show that ∇2 g(x) is positive definite.
p p
(a) g(x) = x21 + x22 + · · · + x2n + 1 = kxk2 + 1.
Hint. Express ∇2 g(x) as
1
∇2 g(x) = (I − uuT )
g(x)
where u is an n-vector with norm less than one. Then prove that I − uuT is positive definite.
p
(b) g(x) = kCxk2 + 1 where A is an m × n matrix with linearly independent columns.
Hint. Use the result of part (a) and the expression
13.2 The figure shows 42 data points (ui , vi ). The points are approximately on a straight line, except for the first
and the last point.
20
15
10
5
vi
−5
−10
−15
−10 −5 0 5 10
ui
(a) Fit a straight line v = α + βu to the points by solving a least squares problem
P
42
minimize g(α, β) = (α + βui − vi )2 ,
i=1
with variables α, β.
(b) Fit a straight line v = α + βu to the points by solving the unconstrained minimization problem
42 p
P
minimize g(α, β) = (α + βui − vi )2 + 25
i=1
using Newton’s method. Use as initial points the values of α and β computed in part (a). (With
this starting point, no line search should be necessary; for other starting points, a line search may be
needed.) Terminate the iteration when k∇g(α, β)k ≤ 10−6 .
72
(c) Fit a straight line v = α + βu to the points by solving the unconstrained minimization problem
P
42
minimize g(α, β) = (α + βui − vi )4
i=1
using Newton’s method. Use as initial points the values of α and β computed in part (a). (With this
starting point, no line search should be necessary.) Terminate the iteration when k∇g(α, β)k ≤ 10−6 .
(d) Plot the three functions α + βu computed in parts (a), (b) and (c) versus u, and compare the results.
13.3 Consider the problem of fitting a quadratic function f (t) = α+βt+γt2 to m given points (ti , yi ), i = 1, . . . , m.
Suppose we choose to estimate the parameters α, β, γ by minimizing the function
m
X
g(α, β, γ) = − log 1 − (α + βti + γt2i − yi )2
i=1
What is the complexity of your method? It is sufficient to give the exponent of the dominant term in
the flop count.
13.6 Define
n−1
Xp
2
g(x) = kx − ak + (xk+1 − xk )2 + ρ,
k=1
where a is a given n-vector and ρ is a given positive scalar. The variable is the n-vector x.
(a) Give expressions for the gradient and Hessian of g. Show that the Hessian is positive definite at all x.
73
(b) Describe an efficient method for computing the Newton step
13.7 Let A be an m × n matrix with linearly independent columns and QR factorization A = QR. Consider the
problem of minimizing
1
g(x) = kAx − bk2 − (cT x)2
2
where b is an m-vector and c is an n-vector. (Note the minus sign in front of the second term, so this is not
a least squares problem.)
cT y
x
b=y+ z
1 − cT z
13.8 We consider the problem of fitting a model fˆ(x) = θT F (x) to N data points (x1 , y1 ), . . . , (xN , yN ), with
xi ∈ Rn and yi ∈ R. The model is parameterized by a p-vector of parameters θ. The function F (x) : Rn →
Rp is a vector of basis functions. We formulate the model fitting problem as a minimization problem, and
determine θ by minimizing the function
N
X 1
g(θ) = h(F (xi )T θ − yi ) + kθk2
i=1
2
where h(z) is the function h(z) = log(ez + e−z ) − log 2. This function is shown in the figure.
ez + e−z
h(z) = log( )
2
z
−3 −2 −1 1 2 3
(a) Suppose we use Newton’s method to minimize g(θ). Describe in detail the linear equation that needs
to be solved at each iteration.
(b) Show that the Hessian of g(θ) is positive definite.
74
(c) Assume p ≫ N . Describe an efficient method, based on the Cholesky factorization, for solving the
Newton equation in part (a). You can use the following fact: if A is an N × p matrix and D a diagonal
N × N matrix with positive diagonal elements, then
What is the complexity of your algorithm (number of flops per Newton iteration) as a function of N
and p? In the flop count, include all terms of order three (N 3 , N 2 p, N p2 , p3 ) and higher. Do not
include the complexity of evaluating the vectors F (xi ).
75
14 Nonlinear least squares
14.1 A cell phone at location (x1 , x2 ) in a plane (we assume that the elevation is zero for simplicity) transmits an
emergency signal at time x3 . This signal is received at m base stations, located at positions (p1 , q1 ), (p2 , q2 ),
. . . , (pm , qm ). Each base station can measure the time of arrival of the emergency signal, within a few tens
of nanoseconds. The measured times of arrival are
1p
τi = (x1 − pi )2 + (x2 − qi )2 + x3 + vi , i = 1, . . . , m,
c
where c is the speed of light (0.3 meters/nanosecond), and vi is the noise or error in the measured time of
arrival. The problem is to estimate the cell phone position x = (x1 , x2 ), as well as the time of transmission
x3 , based on the time of arrival measurements τ1 , . . . , τm .
The m-file e911.m, available on the course web site, defines the data for this problem. It is executed as
[p,q,tau] = e911. The 9 × 1-arrays p and q give the positions of the 9 base stations. The 9 × 1-array tau
contains the measured times of arrival. Distances are given in meters and times in nanoseconds.
Determine an estimate x̂1 , x̂2 , x̂3 of the unknown coordinates and the time of transmission, by solving the
nonlinear least squares problem
P9
minimize fi (x)2
i=1
where
1p
(x1 − pi )2 + (x2 − qi )2 + x3 − τi .
fi (x1 , x2 , x3 ) =
c
Use the Levenberg–Marquardt method, with x1 = x2 = x3 = 0 as starting point. Your solution should
include:
• a description of the least squares problems that you solve at each iteration
• the computed estimates x̂1 , x̂2 , x̂3
P9
• a plot of g(x(k) ) − g(x̂) versus k, where g(x) = i=1 fi (x)2 .
14.2 In exercise 8.2 we developed a simple approximate model for the inductance of a planar CMOS inductor.
The model had the form L̂ = αnβ1 wβ2 dβ3 Dβ4 , and depended on five model parameters: α, β1 , β2 , β3 , β4 .
We calculated the five parameters by minimizing the error function
50
X
(log Li − log L̂i )2 ,
i=1
76
14.3 The figure shows m = 20 points (ti , yi ) as circles. These points are well approximated by a function of the
form
f (t) = αtβ eγt .
(An example is shown in dashed line.)
0.8
0.7
0.6
0.5
f (t)
0.4
0.3
0.2
0.1
0
0 1 2 3 4
t
Explain how you would compute values of the parameters α, β, γ such that
αtβi eγti ≈ yi , i = 1, . . . , m, (38)
using the following two methods.
(a) The Levenberg–Marquardt method applied to the nonlinear least squares problem
m
X 2
minimize αtβi eγti − yi
i=1
with variables α, β, γ. Your description should include a clear statement of the linear least squares
problems you solve at each iteration. You do not have to include a line search.
(b) Solving a single linear least squares problem, obtained by selecting a suitable error function for (38)
and/or making a change of variables. Clearly state the least squares problem, the relation between its
variables and the parameters α, β, γ, and the error function you choose to measure the quality of fit
in (38).
14.4 We have discussed the problem of fitting a polynomial
p(t) = c1 + c2 t + c3 t2 + · · · + cn tn−1
to observations (t1 , y1 ), . . . , (tN , yN ). The simplest method is to solve the least squares problem
N
X
minimize (c1 + c2 ti + c3 t2i + · · · + cn tin−1 − yi )2 (39)
i=1
77
As a variation, one can seek to minimize the sum of the squares of the distances of the points (ti , yi ) to the
graph of the polynomial, as in the figure on the right. To formulate this as an optimization problem, we first
note that the squared distance between the point (ti , yi ) and the graph of the polynomial is given by
min (p(ui ) − yi )2 + (ui − ti )2 .
ui
The problem of finding a polynomial that minimizes the sum of the squared distances can therefore be
written as an optimization problem
N
X N
X
2
minimize c1 + c2 ui + c3 u2i + · · · + cn uin−1 − yi + (ui − ti )2 , (40)
i=1 i=1
ρ4
(u2 , v2 )
(u1 , v1 ) ρ1 ρ7 (ū3 , v̄3 ) = (1, 0.5)
ρ3 ρ2 ρ6
(ū1 , v̄1 ) = (−1, 0)
(u3 , v3 )
ρ5
78
We will formulate the problem in terms of a K × N matrix B and three K-vectors c, d, ρ, defined as follows.
The rows in B and the elements of b, c, and ρ correspond to the different edges in the graph.
The other elements of the kth row of B are zero. ρk is a (noisy) measurement of the distance ((ui −
uj )2 + (vi − vj )2 )1/2 .
• If row k corresponds to an edge between free point i and anchor point j, we take
The other elements in row k of B are zero. ρk is a measurement of the distance ((ui − ûj )2 +(vi −v̂j )2 )1/2 .
The problem is to find values of u, v that satisfy lk (u, v) ≈ ρk for k = 1, . . . , K. This can be posed as a
nonlinear least squares problem
XK
g(u, v) = fk (u, v)2 (41)
k=1
(Here we define fk as lk (u, v)2 − ρ2k rather than lk (u, v) − ρk to simplify the calculation of the derivates.)
The file networkloc.m on the class webpage contains the data that we will use in the problem. It can be
executed in MATLAB using the command
This creates the problem data B, c, d, ρ for the network shown in the figure below.
79
1
0.8
0.6
0.4
0.2
0
0 0.2 0.4 0.6 0.8 1
There are 50 free nodes (N = 50), 9 anchor nodes shown as squares (M = 9), and 389 edges (K = 389).
Find estimates of u and v by solving (41) using the Levenberg–Marquardt method. When testing your code,
try several randomly generated starting points in the square [0, 1] × [0, 1] (using the MATLAB commands u
= rand(N,1), v = rand(N,1)). Terminate the iteration when k∇g(u, v)k2 ≤ 10−6 .
14.6 As in exercise 8.6 we consider the problem of fitting a circle to m points in a plane, but with a different error
function. In this exercise we formulate the problem as a nonlinear least squares problem
m p
P 2
minimize g(uc , vc , R) = (ui − uc )2 + (vi − vc )2 − R
i=1
p
with variables uc , vc , R. Note that | (ui − uc )2 + (vi − vc )2 − R| is the distance of the point (ui , vi ) to the
circle, so in this formulation we minimize the sum of the squared distances of the points to the circle.
Apply the Levenberg–Marquardt method to this nonlinear least squares problem with the data in circlefit.m.
Terminate the iteration when k∇g(uc , vc , R)k ≤ 10−5 . You can select a good starting point from the plot of
the 50 points (ui , vi ) or from the solution of exercise 8.6. Include in your solution:
• a description of the least squares problem you solve at each iteration (the matrix A and the vector b)
• the MATLAB code
• the computed solution uc , vc , R
• a plot of the computed circle, created with the commands
t = linspace(0, 2*pi, 1000);
plot(u, v, ’o’, R * cos(t) + uc, R * sin(t) + vc, ’-’);
axis square
(assuming your MATLAB variables are called uc, vc, and R).
14.7 Suppose you are asked to fit the following three functions f (v, w) to experimental data:
Each function has two variables (v, w), and depends on several parameters (α, β, γ, δ). Your task is to
calculate values of these parameters such that
f (vi , wi ) ≈ ti , i = 1, . . . , N.
80
The problem data ti , vi , wi are given. We assume that N > 4, and vi > 0, wi > 0, ti > 0. In the third
subproblem, we also assume that ti < 1. The exponents of v, w, and v + w in the definitions of f are allowed
to be positive, negative, or zero, and do not have to be integers.
For each of the three functions, explain how you would formulate the model fitting problem as a linear least
squares problem
minimize kAx − bk2
or as a nonlinear least squares problem
P
m
minimize ri (x)2 .
i=1
You are free to use any reasonable error criterion to judge the quality of the fit between the experimental
data and the model. Your solution must include a clear statement of the following:
• the variables x in the linear or nonlinear least squares problem
• the error criterion that you minimize
• if you use linear least squares: the matrix A and the vector b
• if you use nonlinear least squares: the functions ri (x), and the matrix A(k) and the vector b(k) in the
linear least squares problem
minimize kA(k) x − b(k) k2
that you solve at iteration k of the basic Gauss–Newton method.
14.8 We revisit the data fitting problem of exercise 8.3. In that problem we used (linear) least squares to fit a
function
eαt+β
f (t) =
1 + eαt+β
to 50 points (ti , yi ). To formulate the problem as a least squares problem we applied a nonlinear transfor-
mation log(y/(1 − y)) (the inverse of the function f ) to the points yi , and minimized the function
m
X 2
yi
αti + β − log( ) .
i=1
1 − yi
An example is shown in the following figure (for a different set of points than in exercise 8.3.
1
6
0.8 4
log(y/(1 − y))
2
0.6
0
y
0.4
−2
0.2 −4
−6
0
−8
−2 −1 0 1 2 3 4 −2 −1 0 1 2 3 4
t t
As can be seen in the left-hand figure, the quality of the fit is not uniform (better for yi near 1 or 0 than
at the other points). A second problem with this approach is that it requires that 0 < yi < 1 for all data
points.
81
In this exercise we compute the true least squares fit on the original scale, by solving the optimization
problem
Xm 2
eαti +β
minimize − y i . (42)
i=1
1 + eαti +β
This is a nonlinear least squares problem with two variables α, β.
Download the file logistic_gn.m, and execute in MATLAB as [t, y] = logistic_gn;. This is the set
of 50 points used in the figures above. Solve the nonlinear least squares problem (42) using the Levenberg–
Marquardt method. You can use as starting point the solution of the linear least squares method of exer-
cise 8.3 (which applies because 0 < yi < 1 at all points), or simply take α = β = 0. Terminate the iteration
when k∇g(α, β)k ≤ 10−6 , where g(α, β) is the cost function in (42). Compare the solution with the result
of the linear least squares method.
14.9 The image taken by a camera can be described by a projective transformation F from R3 to R2 , i.e., a
transformation
1
F (x) = T (Cx + d)
f x+g
where C is a 2 × 3 matrix, d is a 2-vector, f is a 3-vector, and g is a scalar. The 2-vector F (x) is the
projection of the point x ∈ R3 on the image plane of the camera.
Suppose a small (point) object at unknown location x ∈ R3 is viewed by l cameras, each described by a
projective transformation Fi (x) = (Ci x + di )/(fiT x + gi ). This gives l measurements
1
yi = (Ci x + di ) + vi , i = 1, . . . , l,
fiT x + gi
where vi is unknown measurement error. To estimate the position x from the l camera views, we solve the
nonlinear least squares problem
l
X
2
1
minimize
(C x + d ) − y
i
. (43)
fT x + g i i
i=1 i i
The variable is the vector x ∈ R3 . The vectors yi ∈ R2 , and the camera parameters Ci ∈ R2×3 , di ∈ R2 ,
fi ∈ R3 , gi ∈ R are given.
Suppose we solve problem (43) using the basic Gauss–Newton method. Describe in detail the linear least
squares problem that is solved at the iteration that updates x(k) to x(k+1) . You can assume that fiT x(k) +gi >
0 for i = 1, . . . , l.
with variables θ1 , θ2 , θ3 , θ4 . Describe one iteration of the Levenberg–Marquardt method for this problem.
Your description should consist of a detailed statement of the (linear) least squares problem solved at each
iteration.
82
14.11 Consider the optimization problem
n
X
minimize kAx − bk2 + ρ (x2i − 1)2 ,
i=1
where A is an m × n matrix, b is an m-vector, and ρ is a positive constant. The first term in the cost function
is the standard least squares objective. The second term forces the variables xi to be close to ±1.
(a) Write the problem as a nonlinear least squares problem (minimize kf (x)k2 ). Clearly state how f is
defined.
(b) Describe in detail the (linear) least squares problem solved in each iteration of the Levenberg–Marquardt
algorithm for solving the nonlinear least squares problem in part (a).
83
15 Matrix norm and condition number
15.1 For each of the following matrices A, give the 2-norm kAk2 and the Frobenius norm kAkF .
(a) A matrix with one row:
A= A11 A12 ··· A1n .
(b) A matrix of the form A = uuT where u is a given n-vector.
(c) A matrix of the form A = uv T where u and v are given n-vectors.
15.2 Compute the 2-norm of each of the following matrices, without using MATLAB.
1 1 0 1 −1 0
1 1 1 −1 1 1
, , 0 , 1 1 0 .
1 1 1 1
0 0 −3/2 0 0 −3/2
(c) Use the result of part (b) to show that kAk2 = kAT k2 .
15.4 Let U and V be tall m × n matrices (i.e., m > n) with orthonormal columns. Define A = U V T . For each
of the following three statements, either show that it is true, or give a small example (i.e., a specific U , V )
for which it is false.
(a) A is nonsingular.
(b) A is orthogonal.
(c) kAk2 = 1.
15.5 Let P be a nonzero symmetric projection matrix (exercise 11.4). Show that kP k2 = 1.
15.6 Let A be an m × n matrix with kAk2 < 1.
(a) Show that the matrix I − AT A is positive definite.
(b) Show that the matrix
I A
AT I
is positive definite.
15.7
0 0 −104 0
0 0 0 −10
A=
0
.
10−3 0 0
10−2 0 0 0
84
(c) What is the 2-norm of the inverse of A?
(d) What is the condition number of A?
Explain your answers, without referring to any MATLAB results. (Of course, you are free to check the
answers in MATLAB.)
15.8 Give the 2-norm kAk2 of each of the following matrices A, without using MATLAB. If A is nonsingular, also
give kA−1 k2 and κ(A).
1 1
(a) A =
1 1
1 −1
(b) A =
1 1
1 1 0
(c) A = 1 1 0
0 0 −3/2
1 −1 0
(d) A = 1 1 0
0 0 −3/2
0 −1 0
(e) A = 2 0 0
0 0 −3
2 −1 0
(f) A = 2 1 0
0 0 −3
2 −1 −3
(g) A = −2 1 3
2 −1 −3
15.9 Suppose Q is orthogonal. For each of the following matrices A, give kAk2 and, if A is invertible, also kA−1 k2 .
Explain your answers.
Q −Q
(a) A = .
Q Q
Q Q
(b) A = .
Q Q
15.10 The table shows kAx(i) k and kx(i) k for four vectors x(1) , x(2) , x(3) , x(4) , where A is a nonsingular n × n
matrix.
x kxk kAxk
(1)
x 1 100
x(2) 100 1
x(3) 103 104
x(4) 10−3 102
What are the best (i.e., greatest) lower bounds on kAk2 , kA−1 k2 and κ(A) that you can derive based on
this information?
85
15.11 Consider a set of linear equations Ax = b with
1 + 10−8 1 1
A= , b= .
1 1 1
15.12 Sort the following matrices in order of decreasing condition number (without using MATLAB):
105 1 105 1
A1 = , A2 = ,
1 −105 1 −10−5
10−5 1 105 1
A3 = , A4 = .
1 −10−5 1 10−5
If any of the matrices is singular, take ∞ as its condition number. Explain your answer.
15.13 Condition numbers and diagonal scaling. The matrix A that represents a linear function y = Ax depends
on the units we choose for x and y. Suppose for example that the components of x are currents and the
components of y are voltages, and that we have y = Ax with x in amperes and y in volts. Now suppose x̃1
is x1 expressed in milliamperes, i.e., x̃1 = 1000x1 . Then we can write
y1 A11 /1000 A12 · · · A1n x̃1 x
e1
y2 A21 /1000 A22 · · · A2n x2 x2
.. = .. .. . . .
. .
. = AD .
.
. . . . . . .
yn An1 /1000 An2 · · · Ann xn xn
where D is a diagonal matrix with diagonal elements D11 = 1/1000, D22 = · · · = Dnn = 1.
Likewise, if ỹ1 is y1 expressed in millivolts, we have
ỹ1 1000 A11 1000 A12 ··· 1000 A1n x1
y2 A21 A22 ··· A2n x2
.. = .. .. .. .. .. = DAx
. . . . . .
yn An1 An2 ··· Ann xn
where D is a diagonal matrix with diagonal elements D11 = 1000, D22 = · · · = Dnn = 1.
In general, changing the units for x corresponds to replacing A with AD where D is positive diagonal matrix;
changing the units for y corresponds to replacing A with DA where D is positive and diagonal.
In this problem we examine the effect of scaling columns or rows of a matrix on its condition number.
(a) Prove the following properties of the matrix norm. We assume A is n × n with columns ai :
A = a1 a2 · · · an .
86
(i) kAk2 ≥ maxi=1,...,n kai k (i.e., the norm of A is greater than or equal to the norm of each column
ai )
(ii) kA−1 k2 ≥ 1/ mini=1,...,n kai k (i.e., 1/kA−1 k is less than or equal to the norm of each column ai )
(iii) The condition number satisfies the lower bound
maxi=1,...,n kai k
κ(A) ≥ .
mini=1,...,n kai k
In other words, if the columns of A are very different in norm, (i.e., maxi kai k ≫ mini kai k), then A will
certainly have a large condition number. In practice, it is therefore recommended to scale the columns
of a matrix so that they are approximately equal in norm. Similar comments apply to row scaling.
(b) As an example, consider the matrix
1 3 · 10−3 11
A = −2 · 10−2 105 −4 · 102 .
1 104 108
15.14 The figure below shows three possible experiments designed to estimate the magnitudes of signals emitted
by four sources. The location of the sources is indicated by the empty circles. The solid circles show the
location of the sensors. The output yi of sensor i is given by
2 2 2 2
yi = x1 /ri1 + x2 /ri2 + x3 /ri3 + x4 /ri4
where xj is the (unknown) magnitude of the signal emitted by source j and rij is the (given) distance from
source j to sensor i.
3 4
1234 1234 1234
3 4
For each of the three configurations, we can determine the distances rij and write these equations as
Ax = y
where
2
1/r11 2
1/r12 2
1/r13 2
1/r14
x1 y1
2
1/r21 2 2 2 x2 y2
1/r22 1/r23 1/r24
A=
1/r2 2 2 2
,
x=
x3 , y=
y3 . (44)
31 1/r32 1/r33 1/r34
2 2 2 2 x4 y4
1/r41 1/r42 1/r43 1/r44
From the measured sensor outputs y we can then determine x by solving the equations Ax = y.
87
There is a measurement error ∆y in the sensor readings, which can be as large as 0.01%, i.e., k∆yk/kyk ≤
10−4 . From the analysis in section 6.3 of the EE133A Lecture Notes, we have the following bound on the
relative error in x:
k∆xk k∆yk
≤ κ(A) ≤ κ(A) · 10−4
kxk kyk
where κ(A) is the condition number of A.
(a) Download the MATLAB file expdesign.m from the class webpage and execute it in MATLAB as
[A1, A2, A3] = expdesign. The three matrices A1 , A2 , A3 are the values of the matrix A in (44) for
each of the three experiments.
Compute the condition numbers of A1 , A2 , A3 using the MATLAB command cond(A).
(b) Based on the results of part (a), which configuration would you prefer?
(c) Can you give an intuitive argument for your conclusion in part (b)?
t1 = 1, t2 = 2, t3 = 3, ..., tn−1 = n − 1, tn = n.
(a) Show that A is singular for ǫ = 0. Verify that for ǫ 6= 0, the inverse is given by
1 1 −2
1
A−1 = 1 1 − ǫ −2 .
ǫ
−1 −1 2 + ǫ
is badly conditioned when ǫ is small (and nonzero). More specifically, give a ∆b 6= 0 such that
k∆xk 1 k∆bk
≥
kxk |ǫ| kbk
88
15.17 Suppose A is a nonsingular n × n matrix. We denote by αk the (Euclidean) norm of the kth column of A
and by βi the (Euclidean) norm of the ith row of A:
v v
u n u n
uX uX
αk = t 2
aik , βi = t a2ik .
i=1 k=1
Show that
max{αmax , βmax }
κ(A) ≥ ,
min{αmin , βmin }
where
15.18 Let L be a nonsingular n × n lower triangular matrix with elements Lij . Show that
maxi=1,...,n |Lii |
κ(L) ≥ .
minj=1,...,n |Ljj |
15.19 The graph shows the condition number of one of the following matrices as a function of t for t ≥ 0.
t 1 t t t 0 t −t
A1 = , A2 = , A3 = , A4 = .
1 −t −t 1 0 1+t −t 1
10
8
condition number
0
0 1 2 3 4 5
t
Which of the four matrices was used in the figure? Carefully explain your answer.
15.20 We define A as the n × n lower triangular matrix with diagonal elements 1, and elements −1 below the
diagonal:
1 0 0 ··· 0 0
−1 1 0 · · · 0 0
−1 −1 1 · · · 0 0
A= .. .. .. . . .. ...
. . . . . .
−1 −1 −1 · · · 1 0
−1 −1 −1 · · · −1 1
89
(c) Show with an example that small errors in the right-hand side of Ax = b can produce very large errors
in x. Take b = (0, 0, . . . , 0, 1) (the n-vector with all its elements zero, except the last element, which is
one), and find a nonzero ∆b for which
k∆xk k∆bk
≥ 2n−2 ,
kxk kbk
15.22 Let A and E be two n × n matrices, with A nonsingular and kA−1 Ek2 < 1.
(a) Show that A + E is nonsingular.
(b) Let b be a nonzero n-vector. Define x and y as the solutions of the linear equations
Ax = b, (A + E)y = b.
90
(b) Let A be a nonsingular matrix. Use properties of the 2-norm to show the two inequalities
(c) Let A be a square matrix that satisfies kI − Ak2 < 1. Combine the inequalities in part (b) to show
1
kA−1 k2 ≤ .
1 − kI − Ak2
91
16 Algorithm stability
16.1 If we evaluate
1 − cos x
(45)
sin x
at x = 10−2 , rounding the cosine and sine to 4 significant digits using the chop command1 , we obtain
>> (1 - chop(cos(1e-2), 4)) / chop(sin(1e-2), 4)
ans =
0
The first four digits of the correct answer are 5.000 10−3 . The large error is due to cancellation in the
numerator 1 − cos(x): cos 10−2 = 0.99995000 . . ., so rounding to four digits yields one, and subtracting from
one yields zero.
Rewrite the expression (45) in a form that is mathematically equivalent, but avoids cancellation. Evaluate
the stable formula (still rounding cosines and sines to 4 significant digits) and compare with the result above.
16.2 Recall the definition of average and standard deviation of a vector: if x is an n-vector, then
n
1X 1
avg(x) = xi , std(x) = √ kx − avg(x)1k. (46)
n i=1 n
(See page 53 of the textbook.) The following MATLAB code evaluates the expressions (46) and (47), using
the chop function to simulate a machine with a precision of 6 decimal digits.
>> n = length(x);
>> sum1 = 0;
>> sum2 = 0;
>> for i=1:n
sum1 = chop(sum1 + x(i)^2, 6);
sum2 = chop(sum2 + x(i), 6);
end;
>> a = chop(sum2/n, 6)
>> s = chop((sum1 - sum2^2/n) / n, 6)
If we run this code with input vector
x = [1002; 1000; 1003; 1001; 1002; 1002; 1001; 1004; 1002; 1001];
it returns
a =
1.0018e+03
s =
-3.2400
1 The MATLAB command chop(x,n) rounds the number x to n decimal digits. We use it to artificially introduce rounding
92
This is clearly wrong, because the variance must be a nonnegative number. (The correct answer is avg(x) =
1001.8, std(x)2 = 1.1600.)
Suppose we evaluate the first 3000 terms of the sum in MATLAB, rounding the result of each addition to
four digits:
>> sum = 0;
>> for i = 1:3000
sum = chop(sum + 1/i^2, 4);
end
>> sum
sum =
1.6240
>> sum = 0;
>> for i = 3000:-1:1
sum = chop(sum + 1/i^2, 4);
end
>> sum
sum =
1.6450
The result has four correct significant digits. Explain the difference. (Note that the calculation does not
involve any subtractions, so this is an example of a large error that is not caused by cancellation.)
16.4 The length of one side of a triangle (c in the figure) can be calculated from the lengths of the two other sides
(a and b) and the opposing angle (θ) by the formula
p
c = a2 + b2 − 2ab cos θ. (48)
a
c
θ
b
Two equivalent expressions are p
c= (a − b)2 + 4ab (sin(θ/2))2 (49)
and p
c= (a + b)2 − 4ab (cos(θ/2))2 . (50)
93
(The equivalence of the three formulas follows from the identities cos θ = 1 − 2 (sin(θ/2))2 and cos θ =
−1 + 2 (cos(θ/2))2 .)
Which of the three formulas gives the most stable method for computing c if a ≈ b and θ is small? For
simplicity you can assume that the calculations are exact, except for a small error in the evaluation of the
cosine and sine functions. Explain your answer.
16.5 Consider the two nonlinear equations in the two (scalar) variables x, y
x2 − y 2 = a, 2xy = b.
The right-hand sides a and b can be positive, negative, or zero. It is easily verified that
s √ s √
a+ a +b2 2 −a + a2 + b2
x̂ = , ŷ = sign(b)
2 2
Give a numerically stable algorithm for computing x̂ and ŷ when |a| ≫ |b|.
16.6 The slope of the least squares straight-line fit to points (a1 , b1 ), . . . , (an , bn ) is given by
P
n P
n
(ai − ā)(bi − b̄) ai bi − nāb̄
i=1 i=1
c= P
n = Pn ,
(ai − ā)2 a2i − nā2
i=1 i=1
where ā = avg(a), b̄ = avg(b). We evaluate the second expression for c for the vectors
bi
3
1
ai
108 1 + 108 2 + 108
and obtained c = 0.5000. Explain the large error in the result. (Your explanation can be qualitative; you
are not expected to explain the exact value 0.5.)
94
17 Floating-point numbers
17.1 Evaluate the following expressions in MATLAB and explain the results.
(a) (1 + 1e-16) - 1
(b) 1 + (1e-16 - 1)
(c) (1 - 1e-16) - 1
(d) (2 + 2e-16) - 2
(e) (2 - 2e-16) - 2
(f) (2 + 3e-16) - 2
(g) (2 - 3e-16) - 2
17.2 Answer the following questions, assuming IEEE double precision arithmetic.
17.3 How many IEEE double precision floating-point numbers are contained in the following intervals?
>> x = 2;
>> for i=1:54
x = sqrt(x);
end;
>> for i=1:54
x = x^2;
end;
>> x
17.5 Explain the following results in MATLAB. (Note log(1 + x)/x ≈ 1 for small x so the correct result is very
close to 1.)
√
17.6 The figure shows 1 + x around x = 0 (solid line) and its first-order Taylor approximation 1 + x/2 (dashed
line).
95
2
1.5
0.5
0
−1 −0.5 0 0.5 1 1.5 2
x
√
It is clear that 1 + x ≈ 1 + (1/2)x for small x. Therefore the function
√
1+x−1
f (x) =
x
is approximately equal to 1/2 for x around zero. We evaluated f in MATLAB, using the command
y = (sqrt(1 + x) - 1) / x
x y
−16
2 · 10 0
3 · 10−16 0
4 · 10−16 0.5551
5 · 10−16 0.4441
e = lim (1 + 1/n)n .
n→∞
This suggests a method for evaluating e: we pick a large n, and evaluate (1 + 1/n)n . One would expect that
this yields better approximations as n increases.
Evaluate (1 + 1/n)n in MATLAB for n = 104 , n = 108 , n = 1012 , and n = 1016 . How many correct digits
do you obtain? Explain briefly.
17.8 The plot shows the function
1 − cos(x)
f (x) = ,
x2
evaluated in MATLAB using the command (1 - cos(x)) / x^2. We plot the function between x = 10−9
and x = 10−6 , with a logarithmic scale for the x-axis.
96
1
(b)
(d)
f (x)
0.5
(c)
0
(a)
−9 −8 −7 −6
10 10 10 10
x
We notice large errors: the correct value of f (x) in this interval is very close to 1/2, because cos x ≈ 1 − x2 /2
for small x. We also note that the computed function is not continuous.
(a) At what value of x does the first discontinuity (from point (a) to (b)) occur? Why is the computed
value zero to the left of point (a)?
(b) At what point does the second discontinuity occur (from point (c) to (d))?
(c) What are the computed values at points (c) and (d)?
(d) Give a more stable method for computing f (x) for small x.
Be as precise as possible in your answers for parts (a), (b), and (c). However, you can use the approximation
cos x ≈ 1 − x2 /2.
17.9 The inequality
ka − bk ≥ kak − kbk
holds for all vectors a and b of the same length. The MATLAB code below evaluates the difference of the
two sides of the inequality for
1 10−16
a= , b = .
2 · 10−8 0
ans =
-2.2204e-16
The result is negative, which contradicts the inequality. Explain the number returned by MATLAB, assuming
IEEE double precision arithmetic was used.
√
(You can use the√linear approximation 1 + x ≈ 1 + x/2 for small x. The linear function 1 + x/2 is also an
upper bound on 1 + x for all x ≥ −1.)
17.10 The figure shows the function
(1 + x) − 1
f (x) =
1 + (x − 1)
evaluated in IEEE double precision arithmetic in the interval [10−16 , 10−15 ], using the MATLAB command
((1 + x) - 1) / (1 + (x - 1)) to evaluate f (x).
97
2
0
10−16 10−15
x
2 (b)
(d)
(e)
(c)
0
(a)
0 0.5 1
x −15
x 10
(a) Which of the two expressions (exp(log(x))/x or log(exp(x))/x) was used for the graph in solid line,
and which one for the graph in dashed line? You can assume that the MATLAB functions log(u) and
exp(v) return the exact values of log u and exp v, rounded to the nearest floating-point number.
98
(b) Explain the graph in solid line up to point (e). What are (approximately) the horizontal and vertical
values at the labeled points (a)–(e)? Why are the segments between (b) and (c), and between (d) and
(e) nonlinear?
To analyze the effect of rounding error, you can use the first-order Taylor approximations log(a + b) ≈
log(a) + b/a for a > 0 and |b| ≪ a, and exp(a + b) ≈ exp(a)(1 + b) for |b| ≪ |a|.
f (x̂ + h) − f (x̂ − h)
f ′ (x̂) ≈
2h
for small positive h. The right-hand side is known as a finite-difference approximation of the derivative.
The figure shows the finite-difference approximation of the derivative of f (x) = exp(x) at x̂ = 0, for values
of h in the interval (0, 5 · 10−16 ]. The finite-difference approximation
exp(h) − exp(−h)
g(h) =
2h
was computed using the MATLAB command
(4)
(6)
(2)
1
(5)
g(h)
(7)
(3)
0
(1)
0 5
h −16
x 10
Explain the first four segments of the graph, assuming IEEE double precision arithmetic was used in the
calculation. You can make the approximation exp(t) ≈ 1 + t for small t.
Your explanation should include the numerical values of the seven points marked on the graph, and an
expression for the curve between points (2) and (3), (4) and (5), and (6) and (7).
99
f = (1 - 1 / (1 + x)) / (-1 + 1 / (1 - x)).
(4) (5)
1/2 (2) (3)
0 (1)
Explain the graph up to the point labeled (5). What are the x-values of the five points? Why are the
computed values on the first five segments 0, 1, 1/2, 1, and 2/3?
Assume that the result of each operation (addition, subtraction, division) is the exact value, rounded to the
nearest floating-point number.
Hint. The linear approximation of 1/y around 1 is 1 − (y − 1) (the dashed line in the next figure).
4.5
3.5
3
1/y
2.5
1.5
0.5
0
0 0.5 1 1.5 2 2.5 3 3.5 4
y
100
2
1.5
f
1
0.5
0
0.1 1 2 3 4 5
x x 10
−16
x f
5e-17 0
1e-16 1.1102
2e-16 0.55511
3e-16 0.74015
4e-16 1.1102
√ √
You can use the approximation 1 + x ≈ 1 + 12 x for x ≈ 0, and the fact that 1 + x < 1 + 12 x for
x 6= 0.
(b) Give a more stable method for evaluating f (x).
f = (1 + x)^2 - 1, g = x * (2 + x),
−15
x 10
2
1.5
0.5
−0.5
−1
−1.5
−2
−5 0 5
x x 10
−16
101
(a) Which expression (f or g) was used for the solid line and which one for the dashed line? Briefly explain
your answer.
(b) Explain in detail the graph in solid line. At what values of x do the steps occur, and what is their
height?
17.16 A basic step in the Householder QR algorithm is the calculation of a reflector matrix H = I − (2/kwk2 )wwT
that maps a given nonzero n-vector y to a multiple of the first unit vector e1 . The following two choices for
w accomplish this:
y1 + kyk y1 − kyk
y2 y2
w
b = y + kyke1 = .. , w
e = y − kyke1 = .. .
. .
yn yn
If we choose w = w,
b the reflection H maps y to Hy = −kyke1 . With w = w,
e it maps y to Hy = kyke1 .
17.17 Explain the results of the following calculations with IEEE double precision numbers.
(a) For each of the four values of x in the table, we evaluate the function f (x) = (exp(x)−1)/x in MATLAB
as f = (exp(x) - 1) / x.
x f = (exp(x)-1) / x
5e-16 0.8882
8e-16 1.1102
-5e-16 1.1102
-8e-16 0.9714
Explain the numbers in column 2. You can use the approximation exp(x) ≈ 1 + x for small x.
(b) We repeat the same calculations using the equivalent expression f (x) = (exp(x) − 1)/ log(exp(x)) and
the MATLAB code f = (exp(x) - 1) / log(exp(x)).
x f = (exp(x)-1) / log(exp(x))
5e-16 1.0000
8e-16 1.0000
-5e-16 1.0000
-8e-16 1.0000
Explain why these results are more accurate. You can use the approximation log y ≈ y − 1 for y
around 1.
17.18 Simple trigonometric identities show that the following two functions are equal:
We evaluate the two formulas in MATLAB (i.e., double precision arithmetic), using the expressions
f = 1 - cos(x), g = 2*sin(x/2)^2,
102
for values x in the interval [0, 3 · 10−8 ], and obtain the two graphs shown with solid and dashed lines. The
dashed line is much more accurate than the solid line. (For small x, f (x) = g(x) ≈ x2 /2.)
4 · 10−16
3 · 10−16
2 · 10−16
1 · 10−16
103