Vector Spaces
Our first technical topic for this book is linear algebra, which is one of the foundation stones of applied mathematics in general, and econometrics and statistics in particular. Data ordered by observation are naturally stored in vectors. Related vectors are naturally grouped into matrices. Once our data are organized this way, we need to perform basic arithmetic operations or solve equations or quadratic minimization problems. In this chapter we cover the foundations of vector operations used in linear algebra. As we'll see, the conceptual aspects of linear algebra are clearest if we begin by stripping away details such as matrices and look instead at linear operations from a more abstract perspective.
2.1.1 Vectors
For arbitrary N ∈ ℕ, the symbol R^N represents the set of all N-vectors, or vectors of length N. A typical element has the form

$$
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}
\qquad \text{where } x_n \in \mathbb{R} \text{ for each } n
$$
[Figure 2.1: Three vectors in R², drawn as arrows from the origin to the points (2, 4), (−3, 3), and (−4, −3.5).]
(R = R¹ represents the set of all real numbers, which is the union of the rational and irrational numbers.) While x has been written vertically, as a column of numbers, we could also write it horizontally as x = (x1 , . . . , xN ). For now vectors are just sequences of numbers, and it makes no difference whether we write them vertically or horizontally. (Only when we work with matrix multiplication does it become necessary to distinguish between column and row vectors.)
The vector of ones will be denoted 1 and the vector of zeros will be denoted 0:
$$
\mathbf{1} := \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix},
\qquad
\mathbf{0} := \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}
$$
Although vectors are points in R^N, they are often represented visually as arrows from the origin to the point itself. Figure 2.1 gives an illustration for the case N = 2.
In vector space theory there are two fundamental algebraic operations: vector addition and scalar multiplication. For x, y ∈ R^N, their vector sum is

$$
x + y =
\begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}
+
\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}
:=
\begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_N + y_N \end{pmatrix}
$$
[Figure 2.2: the vector sum x + y in R².]
For α ∈ R, the scalar multiple αx is defined by

$$
\alpha x = \alpha \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}
:=
\begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_N \end{pmatrix}
$$
Thus addition and scalar multiplication are defined in terms of ordinary addition and multiplication in R, computed element by element: adding for vector addition and multiplying for scalar multiplication.¹ Figures 2.2 and 2.3 show examples of vector addition and scalar multiplication in the case N = 2.
Subtraction of two vectors is performed element by element, just like addition.
Subtraction is not a primitive operation because the definition can be given in terms of
addition and scalar multiplication: x − y := x + (−1)y. An illustration of subtraction
is given in figure 2.4. One way to remember this operation is to draw a line from y to
x and then shift it to the origin.
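To make these operations concrete, here is a minimal NumPy sketch; the two vectors are those drawn in figure 2.1, and the scalar is arbitrary:

```python
import numpy as np

x = np.array([2.0, 4.0])      # a vector in R^2 (see figure 2.1)
y = np.array([-3.0, 3.0])     # another vector in R^2
alpha = -2.0                  # an arbitrary scalar

print(x + y)        # vector addition:         [-1.  7.]
print(alpha * x)    # scalar multiplication:   [-4. -8.]
print(x - y)        # subtraction, i.e. x + (-1) * y:  [ 5.  1.]
```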
[Figure 2.3: the scalar multiple −2x of a vector x. Figure 2.4: the vector difference x − y.]

1. In some instances, the notion of scalar multiplication includes multiplication of vectors by complex numbers. In what follows we will work almost entirely with real scalars. Hence scalar multiplication means real scalar multiplication unless otherwise stated.

The inner product of two vectors x and y in R^N is denoted by ⟨x, y⟩ and defined as

$$
\langle x, y \rangle := \sum_{n=1}^{N} x_n y_n
\qquad (2.1)
$$
Fact 2.1.1 For any α, β ∈ R and any x, y, z ∈ R^N, the following statements are true:
(i) ⟨x, y⟩ = ⟨y, x⟩,
(ii) ⟨αx, βy⟩ = αβ ⟨x, y⟩, and
(iii) ⟨x, αy + βz⟩ = α ⟨x, y⟩ + β ⟨x, z⟩.
These properties are easy to check from (2.1). For example, regarding the second equality, pick any α, β ∈ R and any x, y ∈ R^N. By the definitions of scalar multiplication and inner product respectively, we have

$$
\langle \alpha x, \beta y \rangle = \sum_{n=1}^{N} \alpha x_n \, \beta y_n = \alpha \beta \sum_{n=1}^{N} x_n y_n = \alpha \beta \, \langle x, y \rangle
$$
The (Euclidean) norm of x ∈ R^N is defined as ‖x‖ := √⟨x, x⟩ and represents the length of the vector x. (In the arrow representation of vectors in figures 2.2–2.4, the norm of the vector is equal to the length of the arrow.)
Fact 2.1.2 For any α ∈ R and any x, y ∈ R^N, the following statements are true:
(i) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0,
(ii) ‖αx‖ = |α| ‖x‖,
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖, and
(iv) |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
Properties (i) and (ii) you can verify yourself without difficulty. Proofs for (iii) and (iv) are a bit harder. Property (iii) is called the triangle inequality, while (iv) is called the Cauchy–Schwarz inequality. The proof of the Cauchy–Schwarz inequality is given as a solved exercise after we've built up some more tools (see ex. 3.5.33). If you're prepared to accept the Cauchy–Schwarz inequality for now, then the triangle inequality follows, because, by the properties of the inner product given in fact 2.1.1,

$$
\| x + y \|^2 = \langle x + y, x + y \rangle
= \| x \|^2 + 2 \langle x, y \rangle + \| y \|^2
\le \| x \|^2 + 2 \| x \| \| y \| + \| y \|^2
= \left( \| x \| + \| y \| \right)^2
$$

Taking square roots gives ‖x + y‖ ≤ ‖x‖ + ‖y‖.
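These properties are also easy to check numerically. The following NumPy sketch, with arbitrary example vectors, evaluates (2.1) and the norm, and verifies the Cauchy–Schwarz and triangle inequalities for one particular pair:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

inner = x @ y                   # <x, y> as in (2.1)
norm_x = np.sqrt(x @ x)         # ||x|| = sqrt(<x, x>)
norm_y = np.linalg.norm(y)      # the same norm, via NumPy's helper

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
assert abs(inner) <= norm_x * norm_y + 1e-12

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= norm_x + norm_y + 1e-12
```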
[Figure 2.5: four linear combinations y = α1 x1 + α2 x2 of two fixed vectors x1, x2 in R², one panel for each choice of (α1, α2).]
A linear combination of vectors x1 , . . . , xK in R^N is any vector of the form y = α1 x1 + · · · + αK xK for some collection of scalars α1 , . . . , αK (i.e., with αk ∈ R for all k). Figure 2.5 shows four different linear combinations y = α1 x1 + α2 x2 , where x1 , x2 are fixed vectors in R² and the scalars α1 and α2 are varied.
Given any nonempty X ⊂ R N , the set of all vectors that can be made by (finite)
linear combinations of elements of X is called the span of X, and denoted by span X.
For example, the set of all linear combinations of X := {x1 , . . . , xK } is
$$
\operatorname{span} X := \left\{ \text{all vectors } \sum_{k=1}^{K} \alpha_k x_k \ \text{ such that } \ \alpha := (\alpha_1, \ldots, \alpha_K) \in \mathbb{R}^K \right\}
$$
As will be discussed below, the span of certain collections of vectors turns out to have
an intimate connection with existence of solutions to linear equations.
Example 2.1.1 By construction, the four vectors labeled y in figure 2.5 lie in the span
of X = {x1 , x2 }. Looking at this picture might lead you to wonder whether any vector
in R2 could be created as a linear combination of x1 , x2 . The answer is affirmative.
We’ll prove this in §2.1.5.
Example 2.1.2 Let X = {1} = {(1, 1)} ⊂ R2 . The span of X is all vectors of the form
α1 = (α, α) with α ∈ R. This constitutes a line in the plane. Since we can take α = 0,
it follows that the origin 0 is in span X. In fact span X is the unique line in the plane
that passes through both 0 and the vector 1 = (1, 1).
Example 2.1.3 Let x1 = (3, 4, 2) and let x2 = (3, −4, 0.4). The span of {x1 , x2 } is
a plane in R3 that passes through both of these vectors and the origin, as shown in
figure 2.6.
[Figure 2.7: the canonical basis vectors e1 = (1, 0) and e2 = (0, 1) of R², together with a vector y = y1 e1 + y2 e2.]
Example 2.1.4 Consider the vectors {e1 , . . . , e N } ⊂ R N , where en has all zeros except
for a 1 as the nth element:
$$
e_1 := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
e_2 := \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad
\cdots, \quad
e_N := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
$$
The case of R2 is illustrated in figure 2.7. The vectors e1 , . . . , e N are called the canon-
ical basis vectors of R N . One reason is that {e1 , . . . , e N } spans all of R N . Here’s a
proof for N = 2: Observe that for any y ∈ R², we have

$$
y := \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} y_1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ y_2 \end{pmatrix}
= y_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix}
= y_1 e_1 + y_2 e_2
$$
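The same decomposition is easy to check numerically. The sketch below builds the canonical basis of R^N with np.eye and reconstructs an arbitrary vector from it (the values of y are just an example):

```python
import numpy as np

N = 4
y = np.array([2.0, -1.0, 0.5, 3.0])      # an arbitrary vector in R^N
e = np.eye(N)                            # row n of e is the canonical basis vector e_{n+1}

reconstructed = sum(y[n] * e[n] for n in range(N))   # y_1 e_1 + ... + y_N e_N
assert np.allclose(reconstructed, y)
```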
[Figure 2.8: the lines α2 = (1.2/2.2)α1 and α2 = −(1.1/1.4)α1 in the (α1, α2)-plane; they intersect only at the origin.]
A set of vectors X := {x1 , . . . , xK } ⊂ R^N is called linearly independent if

$$
\alpha_1 x_1 + \cdots + \alpha_K x_K = 0 \implies \alpha_1 = \cdots = \alpha_K = 0
\qquad (2.4)
$$
Example 2.1.6 In figure 2.5 on page 14, the two vectors are x1 = (1.2, 1.1) and x2 =
(−2.2, 1.4). Suppose that α1 and α2 are scalars with
$$
\alpha_1 \begin{pmatrix} 1.2 \\ 1.1 \end{pmatrix} + \alpha_2 \begin{pmatrix} -2.2 \\ 1.4 \end{pmatrix} = 0
$$
This translates to the linear, two-equation system shown in figure 2.8, where the un-
knowns are α1 and α2 . The only solution is α1 = α2 = 0. Hence {x1 , x2 } is linearly
independent.
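The same conclusion can be checked numerically by stacking x1 and x2 as the columns of an array and testing whether the homogeneous system has only the zero solution; a minimal sketch:

```python
import numpy as np

x1 = np.array([1.2, 1.1])
x2 = np.array([-2.2, 1.4])
X = np.column_stack((x1, x2))     # 2 x 2 array whose columns are x1 and x2

# Full column rank means that alpha_1 x1 + alpha_2 x2 = 0 has only the
# trivial solution, i.e., {x1, x2} is linearly independent.
print(np.linalg.matrix_rank(X) == X.shape[1])   # True
```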
Example 2.1.7 The set of N canonical basis vectors {e1 , . . . , eN } is linearly independent in R^N. To see this, let α1 , . . . , αN be coefficients such that ∑_{n=1}^{N} αn en = 0. Equivalently,

$$
\alpha_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ \alpha_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ \alpha_N \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
=
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
$$

Equality of the last two vectors forces αn = 0 for every n, which is exactly condition (2.4).
Example 2.1.8 Consider the pair x1 = (1, 0) and x2 = (−2, 0) in R². This pair fails to be linearly independent, since x2 = −2x1, and hence 2x1 + x2 = 0.
Theorem 2.1.1 Let X := {x1 , . . . , xK } ⊂ R^N. For K > 1, the following statements are equivalent:
(i) X is linearly independent.
(ii) X0 is a proper subset² of X =⇒ span X0 is a proper subset of span X.
(iii) No vector in X can be written as a linear combination of the others.
Exercise 2.4.15 asks you to check these equivalences. For now let’s just step through
them in the context of two examples. First consider the pair of canonical basis vectors
{e1 , e2 } in R2 , as depicted in figure 2.7. As we saw in examples 2.1.4 and 2.1.7, this
pair is linearly independent, and its span is all of R2 . Both vectors contribute to the
span, since removing either one reduces the span to just a line in R2 . (For example, the
span of {e1 } is just the horizontal axis in R2 .) Neither one of this pair can be written
as a linear combination of the other.
2. A is a proper subset of B if A ⊂ B and A ≠ B.
Next, consider instead the pair {x1 , x2 } in example 2.1.8. These vectors fail to be
linearly independent, as shown in that example. It is also clear that dropping either
one does not change the span—it remains the horizontal axis in any case. Finally, we
saw in example 2.1.8 that x2 = −2x1 , which means that each vector can be written as
a linear combination of the other.
Theorem 2.1.2 Let X := {x1 , . . . , xK } ⊂ R^N. The following statements are equivalent:
(i) X is linearly independent.
(ii) For each y ∈ R^N, there is at most one set of scalars α1 , . . . , αK such that

$$
y = \alpha_1 x_1 + \cdots + \alpha_K x_K
\qquad (2.5)
$$
Proof. ((i) =⇒ (ii)) Let X be linearly independent and pick any y ∈ R^N. Suppose that there are two sets of scalars such that (2.5) holds. In particular, suppose that y = ∑_{k=1}^{K} αk xk = ∑_{k=1}^{K} βk xk. It follows from the second equality that ∑_{k=1}^{K} (αk − βk) xk = 0. By linear independence, we then have αk = βk for all k. In other words, the representation is unique.
((ii) =⇒ (i)) If (ii) holds, then there exists at most one set of scalars such that 0 = ∑_{k=1}^{K} αk xk. Because α1 = · · · = αK = 0 has this property, we conclude that no nonzero scalars yield 0 = ∑_{k=1}^{K} αk xk. In other words, X is linearly independent.
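When X consists of N linearly independent vectors in R^N, the coefficients in (2.5) can be recovered numerically by solving a square linear system; a small sketch using the two vectors from example 2.1.6 and an arbitrary target y:

```python
import numpy as np

x1 = np.array([1.2, 1.1])
x2 = np.array([-2.2, 1.4])
y = np.array([1.0, 2.0])            # an arbitrary target vector

A = np.column_stack((x1, x2))       # columns are x1 and x2
alpha = np.linalg.solve(A, y)       # the unique coefficients in (2.5)

assert np.allclose(alpha[0] * x1 + alpha[1] * x2, y)
print(alpha)
```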
A nonempty set S ⊂ R^N is called a linear subspace of R^N if

$$
x, y \in S \ \text{ and } \ \alpha, \beta \in \mathbb{R} \implies \alpha x + \beta y \in S
\qquad (2.6)
$$
Example 2.1.9 It follows from the preceding discussion that if X is any nonempty
subset of R N , then span X is a linear subspace of R N . For this reason, span X is often
called the linear subspace spanned by X.
Example 2.1.10 Given any a ∈ R^N, the set A := {x ∈ R^N : ⟨a, x⟩ = 0} is a linear subspace of R^N. To see this let x, y ∈ A and let α, β ∈ R. We claim that z := αx + βy ∈ A, or, equivalently, that ⟨a, z⟩ = 0. This is true because, by fact 2.1.1,

$$
\langle a, z \rangle = \langle a, \alpha x + \beta y \rangle = \alpha \langle a, x \rangle + \beta \langle a, y \rangle = 0 + 0 = 0
$$
Example 2.1.11 The entire space R N is a linear subspace of RN because any linear
combination of N-vectors is an N-vector.
To visualize subspaces in R3 , think of lines and planes that pass through the origin.
Here are some elementary facts about linear subspaces:
Fact 2.1.5 If S is a linear subspace of R N , then
(i) 0 ∈ S,
(ii) X ⊂ S =⇒ span X ⊂ S, and
(iii) span S = S.
There’s also one deep result about linear subspaces we need to cover, which forms
a cornerstone of many foundational results:
Theorem 2.1.3 Let S be a linear subspace of R N . If S is spanned by K vectors, then any
linearly independent subset of S has at most K vectors.
In other words, if there exists a set X = {x1 , . . . , xK } with S ⊂ span X, then any
subset of S with more than K vectors will be linearly dependent. The proof can be
found in most texts on linear algebra (e.g., §3.5 of Jänich 1994).
Example 2.1.12 We saw in example 2.1.4 that R2 is spanned by the pair {e1 , e2 }, where
ei is the ith canonical basis vector in R2 . (See also figure 2.7.) It follows immediately
from this fact and theorem 2.1.3 that the three vectors in R2 shown in figure 2.1 are
linearly dependent.
We saw in example 2.1.6 that the pair {x1 , x2 } from figure 2.5 is linearly independent (see figure 2.8 on page 17), and any pair of linearly independent vectors in R² spans R². Here's a statement of this result for the general case:

Theorem 2.1.4 Let X := {x1 , . . . , xN } be a set of exactly N vectors in R^N. The following statements are equivalent:
(i) span X = R^N.
(ii) X is linearly independent.
Proof. ((i) =⇒ (ii)) Suppose that span X = R^N but X is not linearly independent. Then, by theorem 2.1.1, there exists a proper subset X0 of X with span X0 = span X. Since X0 is a proper subset of X it contains K < N elements. We now have K vectors spanning R^N. In particular, the span of K vectors contains the N > K linearly independent vectors e1 , . . . , eN. This contradicts theorem 2.1.3.
((ii) =⇒ (i)) Suppose that X is linearly independent and yet there exists an x ∈ R^N with x ∉ span X. By fact 2.1.4, it follows that the N + 1 element set X ∪ {x} ⊂ R^N is linearly independent. Since R^N is spanned by the N canonical basis vectors, this statement also contradicts theorem 2.1.3.
Example 2.1.15 The pair {x1 , x2 } from figure 2.5 (page 14) forms a basis of R2 .
Example 2.1.16 The pair {e1 , e2 } is a basis for the set P defined in example 2.1.5.
The proof of part (i) is not particularly hard. See, for example, section 3.2 of Jänich
(1994). Part (ii) follows from theorem 2.1.3 and is left as an exercise (ex. 2.4.17).
If S is a linear subspace of R N , then the common number identified in theorem 2.1.5
is called the dimension of S, and written as dim S.
Part (i) of fact 2.1.6 implies that the only N-dimensional linear subspace of R N is
RN itself.
A function T : R^K → R^N is called linear if T(αx + βy) = αTx + βTy for all x, y ∈ R^K and all scalars α, β. (Following a common convention, we'll write linear functions with uppercase letters and omit the parentheses around the argument where no confusion arises. This convention has come about because the action of linear maps is essentially isomorphic to multiplication of vectors by matrices. More on that topic soon.)
If x ∈ R^K is written in terms of the canonical basis as x = ∑_{k=1}^{K} αk ek, then, by linearity,

$$
Tx = T\!\left[ \sum_{k=1}^{K} \alpha_k e_k \right] = \sum_{k=1}^{K} \alpha_k \, T e_k
$$

Hence the range of T is the set of all linear combinations of Te1 , . . . , TeK that arise as we vary α1 , . . . , αK over all scalar combinations. (See §15.2 for the definition of range.) In other words, the range of a linear map is the span of the image of the canonical basis vectors. This will prove to be important later on. The next fact summarizes.
Fact 2.1.7 If T : R^K → R^N is a linear map, then

rng T = span V, where V := {Te1 , . . . , TeK }

Soon we'll turn to the topic of determining when linear functions are bijections, an issue that is intimately related to invertibility of matrices. To this end it's useful to note that, for linear functions, the property of being one-to-one can be determined by examining the set of points it maps to the origin. To express this idea, for any T : R^K → R^N, we define the null space or kernel of T as

null T := {x ∈ R^K : Tx = 0}
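Fact 2.1.7 and the null space are easy to explore numerically once the images Te1 , . . . , TeK are recorded as the columns of an array. The map below is an arbitrary example, and the null space is read off from a singular value decomposition:

```python
import numpy as np

# A linear map T : R^3 -> R^2, recorded through the images of the canonical
# basis vectors (the particular numbers are an arbitrary example).
T_e = np.column_stack((
    np.array([1.0, 0.0]),   # T e_1
    np.array([0.0, 1.0]),   # T e_2
    np.array([1.0, 1.0]),   # T e_3
))

# dim rng T = dimension of span{T e_1, T e_2, T e_3}  (fact 2.1.7)
print(np.linalg.matrix_rank(T_e))   # 2: T is onto R^2, but K > N so it cannot be one-to-one

# A basis for null T = {x : Tx = 0}, from the right singular vectors
# associated with (numerically) zero singular values.
_, s, vt = np.linalg.svd(T_e)
s_padded = np.concatenate([s, np.zeros(T_e.shape[1] - len(s))])
null_basis = vt[s_padded < 1e-12]
print(null_basis)                   # one vector, proportional to (1, 1, -1)
```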
[Figure: a model F(x) = y, illustrating the inverse problem of asking which x led to a given outcome y. Figure: the map Tx = αx for a fixed scalar α.]
Theorem 2.1.8 For a linear map T from R^K to R^N, the following statements are true:
(i) If K < N, then T is not onto.
(ii) If K > N, then T is not one-to-one.
The most important implication is that if N ≠ K, then we can forget about bijections. The proof of theorem 2.1.8 is a solved exercise (ex. 2.4.22).
[Figure 2.12: two orthogonal vectors x and z, i.e., x ⊥ z.]
2.2 Orthogonality
One of the core concepts in this book is orthogonality, not just of vectors but also of
more complex objects such as random variables. Let’s begin with the vector definition
and some key implications.
Vectors x and z in R^N are said to be orthogonal, written x ⊥ z, if ⟨x, z⟩ = 0.

Fact 2.2.1 (Pythagorean law) If z1 , . . . , zK are pairwise orthogonal vectors in R^N, then

$$
\| z_1 + \cdots + z_K \|^2 = \| z_1 \|^2 + \cdots + \| z_K \|^2
$$
Orthogonal sets and linear independence are closely related. For example,

Fact 2.2.2 If O ⊂ R^N is an orthogonal set and 0 ∉ O, then O is linearly independent.

While not every linearly independent set is orthogonal, an important partial converse to fact 2.2.2 is given in §2.2.4.
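A quick numerical check of the Pythagorean law, with two orthogonal vectors chosen by hand for the example:

```python
import numpy as np

z1 = np.array([1.0, -2.0, 0.0])
z2 = np.array([2.0, 1.0, 3.0])
assert abs(z1 @ z2) < 1e-12        # z1 and z2 are orthogonal

lhs = np.linalg.norm(z1 + z2) ** 2
rhs = np.linalg.norm(z1) ** 2 + np.linalg.norm(z2) ** 2
assert np.isclose(lhs, rhs)        # ||z1 + z2||^2 = ||z1||^2 + ||z2||^2
```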
A vector x is said to be orthogonal to a subset S of R^N, written x ⊥ S, if x ⊥ s for every s ∈ S. [Figure 2.13: a vector x orthogonal to a linear subspace S, i.e., x ⊥ S.]
If {u1 , . . . , uK } is an orthonormal set and x ∈ span{u1 , . . . , uK }, then

$$
x = \sum_{k=1}^{K} \langle x, u_k \rangle \, u_k
\qquad (2.8)
$$
Given a linear subspace S of R^N, the orthogonal complement of S is defined as S⊥ := {x ∈ R^N : x ⊥ S}. [Figure 2.14: a linear subspace S and its orthogonal complement S⊥.]
Given y ∈ R^N and a linear subspace S of R^N, consider the problem of finding the point in S closest to y; that is, finding

$$
\hat{y} = \operatorname*{argmin}_{z \in S} \| y - z \|
\qquad (2.9)
$$
The next theorem tells us that a solution ŷ to this minimization problem always exists,
as well as providing a means to identify it.
Theorem 2.2.1 (Orthogonal Projection Theorem I) Let y ∈ R N and let S be any nonempty
linear subspace of R N . The following statements are true:
(i) The optimization problem (2.9) has exactly one solution.
(ii) ŷ ∈ R N solves (2.9) if and only if ŷ ∈ S and y − ŷ ⊥ S.
The unique solution ŷ is called the orthogonal projection of y onto S.
The intuition is easy to grasp from a graphical presentation. Figure 2.15 illustrates.
Looking at the figure, we can see that the closest point to y in S is the one point ŷ ∈ S
such that y − ŷ is orthogonal to S.
For a full proof see, for example, theorem 5.16 of Çinlar and Vanderbei (2013). Let’s
just cover sufficiency of the conditions in part (ii): Let y ∈ R N and let S be a linear
subspace of R N . Let ŷ be a vector in S satisfying y − ŷ ⊥ S. Let z be any other point
[Figure 2.15: the orthogonal projection ŷ of y onto S; the residual y − ŷ is orthogonal to S.]
in S. We have

$$
\| y - z \|^2 = \| y - \hat{y} + \hat{y} - z \|^2 = \| y - \hat{y} \|^2 + \| \hat{y} - z \|^2 \ge \| y - \hat{y} \|^2
$$

The second equality follows from y − ŷ ⊥ S (why?) and the Pythagorean law. Since z was an arbitrary point in S, we have ‖y − z‖ ≥ ‖y − ŷ‖ for all z ∈ S. Hence (2.9) holds.
Example 2.2.1 Let y ∈ R^N and let 1 ∈ R^N be the vector of ones. Let S be the set of constant vectors in R^N, meaning that all elements are equal. Evidently S is the span of {1}. The orthogonal projection of y onto S is ŷ := ȳ1, where ȳ := (1/N) ∑_{n=1}^{N} yn. To see this, note that ŷ ∈ S clearly holds. Hence we only need to check that y − ŷ is orthogonal to S, for which it suffices to show that ⟨y − ŷ, 1⟩ = 0 (see ex. 2.4.14 on page 36). This is true because

$$
\langle y - \hat{y}, \mathbf{1} \rangle = \langle y, \mathbf{1} \rangle - \langle \hat{y}, \mathbf{1} \rangle
= \sum_{n=1}^{N} y_n - \bar{y} \, \langle \mathbf{1}, \mathbf{1} \rangle
= N \bar{y} - N \bar{y} = 0
$$
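Example 2.2.1 translates directly into code: projecting y onto the constant vectors amounts to replacing every element by the sample mean. A minimal sketch, with an arbitrary data vector:

```python
import numpy as np

y = np.array([2.0, 5.0, -1.0, 6.0])     # an arbitrary data vector
ones = np.ones_like(y)                  # the vector 1

y_bar = y.mean()                        # (1/N) * sum_n y_n
y_hat = y_bar * ones                    # orthogonal projection of y onto span{1}

assert abs((y - y_hat) @ ones) < 1e-12  # the residual is orthogonal to span{1}
```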
[Figure 2.16: the orthogonal projection P = proj S as a mapping, sending each y ∈ R^N to Py ∈ S.]
Fact 2.2.6 If S is a linear subspace of R^N with orthonormal basis {u1 , . . . , uK } and P = proj S, then, for every y ∈ R^N,

$$
Py = \sum_{k=1}^{K} \langle y, u_k \rangle \, u_k
\qquad (2.10)
$$
Fact 2.2.6 is a fundamental result. It's true because the right-hand side of (2.10) clearly lies in S (being a linear combination of basis vectors) and, for any uj in the basis set,

$$
\langle y - Py, u_j \rangle
= \langle y, u_j \rangle - \left\langle \sum_{k=1}^{K} \langle y, u_k \rangle u_k , \, u_j \right\rangle
= \langle y, u_j \rangle - \langle y, u_j \rangle = 0
$$
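Formula (2.10) gives a direct recipe for computing Py once an orthonormal basis of S is available. The sketch below uses a hand-picked orthonormal pair in R^3 and an arbitrary y, and also forms the residual y − Py, which the text turns to next:

```python
import numpy as np

# An orthonormal basis {u1, u2} of a two-dimensional subspace S of R^3
# (chosen by hand for the example), and an arbitrary y.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
y = np.array([3.0, 2.0, -4.0])

Py = (y @ u1) * u1 + (y @ u2) * u2   # formula (2.10)
residual = y - Py                    # the part of y left over after projecting

assert abs(residual @ u1) < 1e-12 and abs(residual @ u2) < 1e-12   # residual ⊥ S
assert abs(Py @ residual) < 1e-12                                  # Py ⊥ residual
```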
Given a linear subspace S of R^N and P = proj S, the corresponding residual projection M is defined by

$$
M := I - P
\qquad (2.11)
$$

where I is the identity map on R^N (so that My = y − Py for each y).
[Figure 2.17: the decomposition y = Py + My, with Py ∈ S and My ∈ S⊥.]
Example 2.2.3 Recall example 2.2.1, where we found that the projection of y ∈ R^N onto span{1} is ȳ1. The residual projection is M_c y := y − ȳ1. In econometric applications, we'll view this as a vector of errors obtained when the elements of a vector are predicted by its sample mean. The subscript reminds us that M_c centers vectors around their mean.
Fact 2.2.8 Let S be a linear subspace of R N , let P = proj S, and let M be the residual
projection as defined in (2.11). The following statements are true:
(i) M = proj S⊥ .
(ii) y = Py + My for any y ∈ R N .
(iii) Py ⊥ My for any y ∈ R N .
(iv) My = 0 if and only if y ∈ S.
(v) P ◦ M = M ◦ P = 0.
Part (v) means that PMy = MPy = 0 for all y ∈ R N . Figure 2.17 illustrates the
action of M. The results in fact 2.2.8 can be seen in the figure.
If S1 and S2 are two subspaces of R N with S1 ⊂ S2 , then S2⊥ ⊂ S1⊥ . This means that
the result in fact 2.2.7 is reversed for M.
Fact 2.2.9 Let S1 and S2 be two subspaces of R N and let y ∈ RN . Let M1 and M2 be
the projections onto S1⊥ and S2⊥ respectively. If S1 ⊂ S2 , then
M1 M2 y = M2 M1 y = M2 y
As an application of the ideas above, let’s now discuss a procedure called Gram–
Schmidt orthogonalization, which provides a fundamental link between the two ma-
jor concepts discussed in this chapter: linear independence and orthogonality. It can
be considered as a partial converse to fact 2.2.2 on page 27.
Theorem 2.2.3 For each linearly independent set {b1 , . . . , bK } ⊂ R^N, there exists an orthonormal set {u1 , . . . , uK } with span{b1 , . . . , bk } = span{u1 , . . . , uk } for each k = 1, . . . , K.
The proof of theorem 2.2.3 provides an important algorithm for generating the orthonormal set {u1 , . . . , uK }. The first step is to construct orthogonal sets {v1 , . . . , vk } with span identical to {b1 , . . . , bk } for each k. The construction of {v1 , . . . , vK } uses the so-called Gram–Schmidt orthogonalization procedure. First, for each k = 1, . . . , K, let
(i) Bk := span{b1 , . . . , bk },
(ii) Pk := proj Bk ,
(iii) v1 := b1 and vk := bk − P_{k−1} bk for k ≥ 2, and
(iv) Vk := span{v1 , . . . , vk }.
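These steps translate directly into a short routine. Here is a sketch of the procedure, assuming the input vectors are stored as the rows of a NumPy array and are linearly independent; the final normalization uk := vk/‖vk‖ is the one used in ex. 2.4.36:

```python
import numpy as np

def gram_schmidt(B):
    """Orthonormalize the linearly independent rows b_1, ..., b_K of B.

    Returns an array whose rows u_1, ..., u_K are orthonormal and satisfy
    span{u_1, ..., u_k} = span{b_1, ..., b_k} for each k.
    """
    U = []
    for b in B:
        v = b.astype(float)
        for u in U:                      # subtract the projection onto the span of previous u's
            v = v - (v @ u) * u
        U.append(v / np.linalg.norm(v))  # normalize: u_k := v_k / ||v_k||
    return np.array(U)

B = np.array([[3.0, 4.0, 2.0],
              [3.0, -4.0, 0.4]])         # the two vectors from example 2.1.3
U = gram_schmidt(B)
print(np.round(U @ U.T, 10))             # 2 x 2 identity matrix: the rows are orthonormal
```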
Good texts on vector spaces include Marcus and Minc (1988) and Jänich (1994).
2.4 Exercises
Ex. 2.4.1 Show that inner products of linear combinations satisfy the following rule:
$$
\left\langle \sum_{k=1}^{K} \alpha_k x_k , \ \sum_{j=1}^{J} \beta_j y_j \right\rangle
= \sum_{k=1}^{K} \sum_{j=1}^{J} \alpha_k \beta_j \, \langle x_k , y_j \rangle
$$
Ex. 2.4.2 Show that the vectors (1, 1) and (−1, 2) are linearly independent.
Ex. 2.4.3 Use fact 2.1.2 on page 13 to show that if y ∈ RN is such that hy, xi = 0 for
every x ∈ R N , then y = 0.
Ex. 2.4.9 Show that if S and S0 are two linear subspaces of RN , then S ∩ S0 is also a
linear subspace of R N .
Ex. 2.4.13 Show that if T : R N → R N is a linear function and λ is any scalar, then
E := {x ∈ R N : Tx = λx} is a linear subspace of R N .
Ex. 2.4.14 Show that if B ⊂ S with span B = S, then x ⊥ S if and only if x ⊥ b for all
b ∈ B.
Ex. 2.4.17 Show that if S is a linear subspace of R N then every basis of S has the same
number of elements.
Ex. 2.4.23 Find two unit vectors (i.e., vectors with norm equal to one) that are orthog-
onal to (1, −2).
Ex. 2.4.24 Prove the Pythagorean law (fact 2.2.1 on page 27). See ex. 2.4.1 if you need
a hint.
Ex. 2.4.26 Prove fact 2.2.8 using theorems 2.2.1 and 2.2.2.
Ex. 2.4.29 Let P be the orthogonal projection described in theorem 2.2.2 (page 31).
Confirm that P is a linear function from R N to R N .
Ex. 2.4.31 Let S be any linear subspace of R N and let P = proj S (see theorem 2.2.2 on
page 31). Is P one-to-one as a function on R N ?
Ex. 2.4.32 Prove the reverse triangle inequality. That is, given two vectors x and y, show that |‖x‖ − ‖y‖| ≤ ‖x − y‖.⁴
In the next three exercises, the notation is as given in theorem 2.2.3 and the discus-
sion immediately afterwards.
Ex. 2.4.36 Show that {u1 , . . . , uk } is an orthonormal set with span equal to Vk for all
k.
$$
\langle y, x \rangle \le | \langle y, x \rangle | \le \| y \| \| x \| = \| x \| = \frac{\langle x, x \rangle}{\| x \|} = \langle \hat{x}, x \rangle
$$
Solution to Ex. 2.4.5. This is a bit of a trick question. To solve it, you need to look
carefully at the definitions (as always). A linear subspace of R3 is a subset of R3 with
certain properties. R3 is a collection of 3-tuples ( x1 , x2 , x3 ) where each xi is a real
number. Elements of R2 are 2-tuples (pairs), and hence not elements of R3 . Therefore
R2 is not a subset of R3 , and, in particular, not a linear subspace of R3 .
Solution to Ex. 2.4.6. Let T be as in the question. We need to show that T0 = 0. Here’s
one proof. We know from the definition of scalar multiplication that 0x = 0 for any
vector x. Let x and y be any vectors in RK and apply the definition of linearity to
obtain
T0 = T (0x + 0y) = 0Tx + 0Ty = 0 + 0 = 0
Solution to Ex. 2.4.7. The answer is yes. Suppose, to the contrary, that {γx1 , γx2 } is
linearly dependent. Then one element can be written as a linear combination of the
others. In our setting with only two vectors, this translates to γx1 = αγx2 for some α.
Since γ 6= 0, we can multiply each side by 1/γ to get x1 = αx2 . This contradicts linear
independence of {x1 , x2 }.
4. Hint: Use the triangle inequality.
Solution to Ex. 2.4.8. There is an easy way to do this: We know that any linearly in-
dependent set of 3 vectors in R3 will span R3 . Since z ∈ R3 , this will include z. So all
we need to do is show that the 3 vectors are linearly independent. To this end, take
any scalars α1 , α2 , α3 with
$$
\alpha_1 \begin{pmatrix} -4 \\ 0 \\ 0 \end{pmatrix}
+ \alpha_2 \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}
+ \alpha_3 \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
= \mathbf{0} := \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$
Written as three equations, this says that −4α1 = 0, 2α2 = 0, and −1α3 = 0. Hence
α1 = α2 = α3 = 0, and therefore the set is linearly independent.
Solution to Ex. 2.4.9. Let S and S0 be two linear subspaces of R N . Fix x, y ∈ S ∩ S0 and
α, β ∈ R. We claim that z := αx + βy ∈ S ∩ S0 . To see this, note that since x, y ∈ S and
S is a linear subspace, we have z ∈ S; and since x, y ∈ S0 and S0 is a linear subspace,
we have z ∈ S0 . It follows that z ∈ S ∩ S0 , as was to be shown.
Solution to Ex. 2.4.11. If a := (1, −1, 1), then Q is all x with ha, xi = 0. This set is a
linear subspace of R3 , as shown in example 2.1.10.
Solution to Ex. 2.4.15. We are asked to verify the equivalences in theorem 2.1.1 on
page 18 for the set X := {x1 , . . . , xK }. We will prove the cycle (i) =⇒ (ii) =⇒ (iii)
=⇒ (i).
((i) =⇒ (ii)) We aim to show that if (i) holds and X0 is a proper subset of X, then
span X0 is a proper subset of span X. To simplify notation let’s take X0 := {x2 , . . . , xK }.
Suppose, to the contrary, that span X0 = span X. Since x1 ∈ span X, we must then
have x1 ∈ span X0 , from which we deduce the existence of scalars α2 , . . . , αK such that
0 = −x1 + α2 x2 + · · · + αK xK . Since −1 ≠ 0, this contradicts part (i).
((ii) =⇒ (iii)) The claim is that when (ii) holds, no vector in X can be written as
a linear combination of the others. Suppose, to the contrary, that x1 = α2 x2 + · · · +
αK xK , say. Let y ∈ span X, so that y = β 1 x1 + · · · + β K xK . If we use the preceding
equality to substitute out x1 , we get y as a linear combination of {x2 , . . . , xK } alone. In
other words, any element of span X is in the span of the proper subset {x2 , . . . , xK }.
Contradiction.
((iii) =⇒ (i)) The final claim is that α1 = · · · = αK = 0 whenever α1 x1 + · · · +
αK xK = 0. Suppose, to the contrary, that there exist scalars with α1 x1 + · · · + αK xK = 0
and yet αk 6= 0 for at least one k. It follows immediately that xk = (1/αk ) ∑ j6=k α j x j .
This contradicts (iii).
Solution to Ex. 2.4.16. The aim is to prove fact 2.1.4 on page 19. Regarding part (i), let's take X as linearly independent and show that the subset X0 := {x1 , . . . , x_{K−1} } is linearly independent. (The argument for more general subsets is similar.) Suppose, to the contrary, that X0 is linearly dependent. Then, by the definition, we can take α1 , . . . , α_{K−1} not all zero with ∑_{k=1}^{K−1} αk xk = 0. Letting αK = 0, we can write this as ∑_{k=1}^{K} αk xk = 0. Since not all coefficients are zero, we have contradicted linear independence of X.
Regarding (ii), let X := {x1 , . . . , xK } be linearly independent and suppose that xj = 0. Then by setting αk = 0 for k ≠ j and αj = 1, we can form scalars not all equal to zero with ∑_{k=1}^{K} αk xk = 0. This contradicts the linear independence of X.
Regarding (iii), let X := {x1 , . . . , xK } ⊂ R^N be linearly independent and let x_{K+1} be any point in R^N such that x_{K+1} ∉ span X. The claim is that X ∪ {x_{K+1}} is linearly independent. Suppose, to the contrary, that there exist α1 , . . . , αK , α_{K+1} not all zero such that ∑_{k=1}^{K+1} αk xk = 0. There are two possibilities for α_{K+1}, both of which lead to a contradiction: First, if α_{K+1} = 0, then, since α1 , . . . , αK , α_{K+1} are not all zero, at least one of α1 , . . . , αK is nonzero, and moreover ∑_{k=1}^{K} αk xk = ∑_{k=1}^{K+1} αk xk = 0. This contradicts our assumption of independence on X. Second, if α_{K+1} ≠ 0, then from ∑_{k=1}^{K+1} αk xk = 0 we can express x_{K+1} as a linear combination of elements of X. This contradicts the hypothesis that x_{K+1} ∉ span X.
Solution to Ex. 2.4.17. Let B1 and B2 be two bases of S, with K1 and K2 elements re-
spectively. By definition, B2 is a linearly independent subset of S. Moreover, S is
spanned by the set B1 , which has K1 elements. Applying theorem 2.1.3, we see that
B2 has at most K1 elements. That is, K2 ≤ K1. Reversing the roles of B1 and B2 gives K1 ≤ K2.
Solution to Ex. 2.4.18. The aim is to prove fact 2.1.6 on page 23. Suppose that S and
S0 are K-dimensional linear subspaces of R N with S ⊂ S0 . We claim that S = S0 . To
see this, observe that by the definition of dimension, S is equal to span B where B is
a set of K linearly independent basis vectors {b1 , . . . , bK }. If S 6= S0 , then there exists
a vector x ∈ S0 such that x ∉ span B. In view of theorem 2.1.1 on page 18, the set
{x, b1 , . . . , bK } is linearly independent. Moreover, since x ∈ S0 and since B ⊂ S ⊂ S0 ,
we now have K + 1 linearly independent vectors inside S0 . At the same time, being K-
dimensional, we know that S0 is spanned by K vectors. This contradicts theorem 2.1.3
on page 20.
Regarding part (ii), suppose that S is an M-dimensional linear subspace of R N
where M < N and yet S = R N . Then we have a space S spanned by M < N vectors
that also contains the N linearly independent canonical basis vectors. We are led to
another contradiction of theorem 2.1.3. Hence S = R N cannot hold.
Solution to Ex. 2.4.19. Regarding part (i), let B be a basis of span X. By definition,
B is a linearly independent subset of span X. Since span X is spanned by K vectors,
theorem 2.1.3 implies that B has no more than K elements. Hence, dim span X ≤ K.
Regarding part (ii), suppose first that X is linearly independent. Then X is a basis
for span X. Since X has K elements, we conclude that dim span X = K. Conversely,
if dim span X = K, then X must be linearly independent. If X were not linearly inde-
pendent, then there would exist a proper subset X0 of X such that span X0 = span X.
By part (i) of this theorem, we then have dim span X0 ≤ #X0 ≤ K − 1. Therefore dim span X ≤ K − 1, a contradiction.
In the proof we will exploit the fact that T is by assumption a linear bijection.
Pick any vectors x, y ∈ R N and any two scalars α, β. Since T is a bijection, we
know that x and y have unique preimages under T. In particular, there exist unique
vectors u and v such that Tu = x and Tv = y. Using these definitions, linearity of T
and the fact that T⁻¹ is the inverse of T, we have

$$
T^{-1}(\alpha x + \beta y) = T^{-1}(\alpha T u + \beta T v) = T^{-1}\big( T(\alpha u + \beta v) \big) = \alpha u + \beta v = \alpha T^{-1} x + \beta T^{-1} y
$$

Hence T⁻¹ is also linear.
Solution to Ex. 2.4.22. Regarding part (i), let K < N and let T : R^K → R^N be linear. T cannot be onto because, if T were onto, then we would have rng T = R^N, in which case the vectors in V = {Te1 , . . . , TeK } in fact 2.1.7 would span R^N, despite having only K < N elements. This is impossible. (Why?)
Regarding part (ii), let T : R^K → R^N be linear and let K > N. Seeking a contradiction, suppose in addition that T is one-to-one. Let α1 , . . . , αK be such that ∑_{k=1}^{K} αk Tek = 0. By linearity, T(∑_{k=1}^{K} αk ek) = 0, and since T is one-to-one and T0 = 0, this in turn implies ∑_{k=1}^{K} αk ek = 0. Since the canonical basis vectors are linearly independent, it must be that α1 = · · · = αK = 0. From this we conclude that {Te1 , . . . , TeK } is linearly independent. Thus R^N contains K linearly independent vectors, despite the fact that N < K. This is impossible by theorem 2.1.3 on page 20.
Solution to Ex. 2.4.25. Let O = {x1 , . . . , xK } ⊂ R^N be an orthogonal set that does not contain 0. Let α1 , . . . , αK be such that ∑_{k=1}^{K} αk xk = 0. We claim that αj = 0 for any j. To see that this is so, fix j and take the inner product of both sides of ∑_{k=1}^{K} αk xk = 0 with respect to xj to obtain αj ‖xj‖² = 0. Since xj ≠ 0, we conclude that αj = 0. The proof is done.
To verify this equality, we need to show that the right-hand side is the orthogonal
projection of αx + βy onto S. Going back to theorem 2.2.1, we need to show that (i)
⟨y − x, e1⟩ = 0 and ⟨y − x, e2⟩ = 0
‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖

It follows that ‖x‖ − ‖y‖ ≤ ‖x − y‖. A similar argument reversing the roles of x and y gives ‖y‖ − ‖x‖ ≤ ‖x − y‖. Combining the last two inequalities gives |‖x‖ − ‖y‖| ≤ ‖x − y‖, which is the desired result.
lie in Bk. Hence vk ∈ Bk. Since spans increase as we add more elements, it follows that vj ∈ Bk for j ≤ k. In other words, {v1 , . . . , vk } ⊂ Bk. Since Bk is a linear subspace, we have Vk ⊂ Bk.
An induction argument shows that Bk ⊂ Vk also holds. Clearly it holds for k =
1. Suppose that it also holds for k − 1. From the definition of vk , we have bk =
Pk−1 bk + vk . The first term on the right-hand side lies in Bk−1 , which, by our induction
hypothesis, satisfies Bk−1 ⊂ Vk−1 ⊂ Vk . The second term on the right-hand side is vk ,
which obviously lies in Vk . Hence both terms are in Vk , and therefore bk ∈ Vk . Using
arguments analogous to the end of the last paragraph leads us to Bk ⊂ Vk .
Solution to Ex. 2.4.36. Since {v1 , . . . , vk } is orthogonal, the family {u1 , . . . , uk } will be orthonormal provided that the norm of each element is 1. This is true by construction, since uk := vk /‖vk‖. The only concern is that ‖vk‖ = 0 might hold for some k. But vk = 0 is impossible because, if it were to hold, then by (vii) of theorem 2.2.2 we would have bk ∈ B_{k−1}, contradicting linear independence of {b1 , . . . , bK }.
The proof that span{u1 , . . . , uk } = Vk is straightforward and left for you.