Inner Product
1. Preliminaries
An inner product space is a vector space $V$ along with a function $\langle \cdot, \cdot \rangle$ called an inner product, which associates each pair of vectors $u, v$ with a scalar $\langle u, v \rangle$, and which satisfies:
(1) $\langle u, u \rangle \ge 0$, with equality if and only if $u = 0$;
(2) $\langle u, v \rangle = \langle v, u \rangle$;
(3) $\langle \alpha u + v, w \rangle = \alpha \langle u, w \rangle + \langle v, w \rangle$.
Combining (2) and (3), we also have $\langle u, \alpha v + w \rangle = \alpha \langle u, v \rangle + \langle u, w \rangle$. Condition (1) is called positive definite, condition (2) is called symmetric, and condition (3) together with the note above is called bilinear. Thus an inner product is an example of a positive definite, symmetric bilinear function or form on the vector space $V$.
Definition 1.0.1. Let $V$ be an inner product space and $u$ and $v$ be vectors in $V$. We make the following definitions:
(1) The norm (or length) of $u$ is $\|u\| = \sqrt{\langle u, u \rangle}$.
(2) The distance between $u$ and $v$ is $\|u - v\|$.
(3) The vectors $u$ and $v$ are orthogonal if $\langle u, v \rangle = 0$.
(4) The angle $\theta$ between $u$ and $v$ is given by $\cos \theta = \dfrac{\langle u, v \rangle}{\|u\| \cdot \|v\|}$.
(5) The orthogonal projection of $u$ onto $v$ is the vector $p = \dfrac{\langle u, v \rangle}{\langle v, v \rangle}\, v$.
Note that, a priori, we do not know that $-1 \le \dfrac{\langle u, v \rangle}{\|u\| \cdot \|v\|} \le 1$, so $\theta$ may not be defined. Later we will show that these bounds are valid and so our definition of $\theta$ is also valid. When referring to (5), we will usually say "the projection of $u$ onto $v$".
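As a quick illustration (a minimal sketch, assuming the standard dot product on $\mathbb{R}^3$, which is introduced in Section 2.1), the quantities of Definition 1.0.1 can be computed numerically:

```python
import numpy as np

# Norm, distance, angle, and projection from Definition 1.0.1,
# using the dot product on R^3 as the inner product.
u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

norm_u = np.sqrt(np.dot(u, u))                    # ||u|| = sqrt(<u, u>)
dist_uv = np.sqrt(np.dot(u - v, u - v))           # distance ||u - v||
cos_theta = np.dot(u, v) / (norm_u * np.sqrt(np.dot(v, v)))
theta = np.arccos(cos_theta)                      # angle between u and v
p = (np.dot(u, v) / np.dot(v, v)) * v             # projection of u onto v

print(norm_u, dist_uv, theta, p)
```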
Now we give some preliminary results:
Theorem 1.0.2 (Pythagorean Theorem). Suppose that $u$ and $v$ are orthogonal vectors. Then
\[
\|u + v\|^2 = \langle u + v, u + v \rangle = \langle u, u \rangle + 2\langle u, v \rangle + \langle v, v \rangle = \langle u, u \rangle + \langle v, v \rangle = \|u\|^2 + \|v\|^2,
\]
since $\langle u, v \rangle = 0$.
Theorem 1.0.3. Suppose that $p$ is the orthogonal projection of $u$ onto the space spanned by $v$. Then $p$ and $u - p$ are orthogonal.
Proof: Since $p = \dfrac{\langle u, v \rangle}{\langle v, v \rangle}\, v$, we have
\[
\langle p, u - p \rangle = \langle p, u \rangle - \langle p, p \rangle
= \frac{\langle u, v \rangle}{\langle v, v \rangle}\langle v, u \rangle - \frac{\langle u, v \rangle^2}{\langle v, v \rangle^2}\langle v, v \rangle
= \frac{\langle u, v \rangle^2}{\langle v, v \rangle} - \frac{\langle u, v \rangle^2}{\langle v, v \rangle}
= 0. \qquad \Box
\]
Theorem 1.0.4 (Cauchy-Schwarz Inequality). Suppose that $u$ and $v$ are vectors in an inner product space. Then
\[
|\langle u, v \rangle| \le \|u\| \cdot \|v\|.
\]
Proof: If $v = 0$ both sides are $0$, so assume $v \ne 0$ and let $p$ be the orthogonal projection of $u$ onto $v$. From the previous result, $p$ and $u - p$ are orthogonal. We may then apply the Pythagorean Theorem to get
\[
\|p\|^2 + \|u - p\|^2 = \|u\|^2.
\]
Subtracting $\|u - p\|^2$ from both sides and noting $0 \le \|u - p\|^2$ gives
\[
\frac{\langle u, v \rangle^2}{\langle v, v \rangle} = \|p\|^2 \le \|u\|^2,
\]
so $\langle u, v \rangle^2 \le \|u\|^2 \|v\|^2$, and taking square roots gives the result. $\Box$
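A hedged numerical spot-check of the last three results (an illustration, not a substitute for the proofs), using random vectors under the dot product:

```python
import numpy as np

# Check that p and u - p are orthogonal, that the Pythagorean identity holds
# for that decomposition, and that Cauchy-Schwarz holds.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)

p = (np.dot(u, v) / np.dot(v, v)) * v       # projection of u onto v
assert abs(np.dot(p, u - p)) < 1e-9         # p and u - p are orthogonal
# Pythagorean Theorem: ||p||^2 + ||u - p||^2 = ||u||^2
assert abs(np.dot(p, p) + np.dot(u - p, u - p) - np.dot(u, u)) < 1e-9
# Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
assert abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-9
```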
2.1. Example: $\mathbb{R}^n$. Just as $\mathbb{R}^n$ is our template for a real vector space, it serves in the same way as the archetypical inner product space. The usual inner product on $\mathbb{R}^n$ is called the dot product or scalar product on $\mathbb{R}^n$. It is defined by:
\[
\langle x, y \rangle = x^T y
\]
That is, if $x = (x_1, \ldots, x_n)^T$ and $y = (y_1, \ldots, y_n)^T$, then
\[
\langle x, y \rangle = x^T y = (x_1, \ldots, x_n) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x_1 y_1 + \cdots + x_n y_n.
\]
****PROOF OF DOT PRODUCT BEING INNER PRODUCT GOES HERE****
****GEOMETRIC PROOF OF ORTHOGONAL PROJECTIONS GOES HERE****
****SPECIFIC EXAMPLE GOES HERE****
In this setting, the orthogonal projection of $x$ onto $y$ is given by:
\[
p = \left( \frac{x^T y}{y^T y} \right) y
\]
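The following sketch spot-checks the inner product conditions for the dot product and the projection formula above on random vectors; it is an illustration, not the omitted proof:

```python
import numpy as np

# Numerical spot-check (not a proof) that <x, y> = x^T y satisfies the three
# inner product conditions, plus the projection formula above.
rng = np.random.default_rng(1)
x, y, z = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(4)
a = 2.5

assert x @ x > 0                                         # positive definite (x != 0)
assert np.isclose(x @ y, y @ x)                          # symmetric
assert np.isclose((a * x + z) @ y, a * (x @ y) + z @ y)  # linear in the first slot

p = (x @ y) / (y @ y) * y                                # projection of x onto y
assert np.isclose((x - p) @ y, 0.0)                      # x - p is orthogonal to y
```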
More generally, we can define other inner products on $\mathbb{R}^n$ by
\[
\langle x, y \rangle = x^T A y
\]
where the right-hand side is just matrix multiplication. The $n \times n$ matrix $A$ must be a type of matrix known as a symmetric, positive definite matrix in order for this to satisfy the conditions of an inner product. For example, if we choose $A$ to be a symmetric matrix with positive entries on the main diagonal which is strictly diagonally dominant (each diagonal entry exceeds the sum of the absolute values of the other entries in its row), then it will be such a matrix. (More generally, a symmetric, positive definite matrix is a symmetric matrix with only positive eigenvalues.)
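A small sketch of such a non-standard inner product, with an assumed symmetric, strictly diagonally dominant matrix $A$:

```python
import numpy as np

# A non-standard inner product on R^2 given by <x, y> = x^T A y. The matrix A
# below is symmetric and strictly diagonally dominant with a positive diagonal,
# hence positive definite (all of its eigenvalues are positive).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
assert np.all(np.linalg.eigvalsh(A) > 0)   # confirm positive definiteness

def inner(x, y, A=A):
    """Inner product <x, y> = x^T A y."""
    return x @ A @ y

x = np.array([1.0, -2.0])
y = np.array([3.0,  0.5])
print(inner(x, y), inner(y, x))            # equal, by symmetry of A
print(inner(x, x))                         # positive, since x != 0
```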
2.2. Example: $\mathbb{R}^{m \times n}$. Suppose $A = (a_{ij})$ and $B = (b_{ij})$ are matrices in $\mathbb{R}^{m \times n}$. The usual inner product on $\mathbb{R}^{m \times n}$ is given by:
\[
\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}
\]
Note that this is the sum of the point-wise products of the elements of $A$ and $B$.
****PROOF OF THIS PRODUCT BEING INNER PRODUCT GOES HERE****
****SPECIFIC EXAMPLE GOES HERE****
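A minimal sketch of this inner product for two assumed $2 \times 3$ matrices (it also checks the identity $\langle A, B \rangle = \operatorname{trace}(A^T B)$, a standard equivalent form):

```python
import numpy as np

# Inner product on R^{m x n}: the sum of the entrywise products of A and B.
A = np.array([[1.0,  2.0, 0.0],
              [3.0, -1.0, 4.0]])
B = np.array([[2.0,  0.0, 1.0],
              [1.0,  5.0, -2.0]])

inner_AB = np.sum(A * B)                   # sum_{i,j} a_ij * b_ij
assert np.isclose(inner_AB, np.trace(A.T @ B))
print(inner_AB)
```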
2.3. Example: $P_n$. Here we will describe a type of inner product on $P_n$ which we will term a discrete inner product on $P_n$. Let $\{x_1, \ldots, x_n\}$ be distinct real numbers. If $p(x)$ is a nonzero polynomial in $P_n$, then it has degree at most $(n-1)$, and therefore has at most $(n-1)$ roots. This means that $p(x_i) \ne 0$ for at least one $i$. We now define an inner product by:
\[
\langle p(x), q(x) \rangle = \sum_{i=1}^{n} p(x_i) q(x_i)
\]
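A small sketch of this discrete inner product, with assumed points $x_1 = -1$, $x_2 = 0$, $x_3 = 1$ and polynomials of degree at most 2:

```python
import numpy as np

# Discrete inner product on P_3: <p, q> = sum_i p(x_i) q(x_i).
xs = np.array([-1.0, 0.0, 1.0])

def inner(p, q, xs=xs):
    """p and q are coefficient arrays (highest degree first), as np.polyval expects."""
    return np.sum(np.polyval(p, xs) * np.polyval(q, xs))

p = np.array([1.0, 0.0, -1.0])   # p(x) = x^2 - 1
q = np.array([0.0, 1.0, 0.0])    # q(x) = x

print(inner(p, q))               # 0 here: p and q are orthogonal for these points
print(inner(p, p))               # positive, since p is nonzero at x_2 = 0
```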
2.4. Example: $C[a, b]$. The usual type of inner product on $C[a, b]$ is given by:
\[
\langle f(x), g(x) \rangle = \int_a^b f(x) g(x) w(x) \, dx
\]
where $w(x)$ is some continuous, positive real-valued function on $[a, b]$. The standard inner product on $C[a, b]$ is the one where $w(x) = 1$ in the above definition.
****PROOF OF THIS PRODUCT BEING INNER PRODUCT GOES HERE****
****SPECIFIC EXAMPLE GOES HERE****
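A numerical sketch of the standard inner product on $C[a, b]$ (with the assumed choices $[a, b] = [-1, 1]$ and $w(x) = 1$), approximating the integral by a simple Riemann sum:

```python
import numpy as np

# Standard inner product on C[-1, 1], with the integral approximated numerically.
def inner(f, g, a=-1.0, b=1.0, num=100000):
    xs = np.linspace(a, b, num, endpoint=False)
    dx = (b - a) / num
    return np.sum(f(xs) * g(xs)) * dx

print(inner(lambda x: x, lambda x: x**2))   # ~0: the integrand x^3 is odd on [-1, 1]
print(inner(lambda x: x, lambda x: x))      # ~2/3, the squared norm of x on [-1, 1]
```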
Definition 3.0.1. Let $U$ and $V$ be subspaces of a vector space $W$ such that $U \cap V = \{0\}$. The direct sum of $U$ and $V$ is the set $U \oplus V = \{u + v \mid u \in U \text{ and } v \in V\}$.
Definition 3.0.2. Let $S$ be a subspace of the inner product space $V$. Then the orthogonal complement of $S$ is the set $S^{\perp} = \{v \in V \mid \langle v, s \rangle = 0 \text{ for all } s \in S\}$.
Theorem 3.0.3.
(1) If $U$ and $V$ are subspaces of a vector space $W$ with $U \cap V = \{0\}$, then $U \oplus V$ is also a subspace of $W$.
(2) If $S$ is a subspace of the inner product space $V$, then $S^{\perp}$ is also a subspace of $V$.
Proof: (1.) Note that $0 + 0 = 0$ is in $U \oplus V$. Now suppose $w_1, w_2 \in U \oplus V$; then $w_1 = u_1 + v_1$ and $w_2 = u_2 + v_2$ with $u_i \in U$ and $v_i \in V$, and $w_1 + w_2 = (u_1 + v_1) + (u_2 + v_2) = (u_1 + u_2) + (v_1 + v_2)$. Since $U$ and $V$ are subspaces, it follows that $w_1 + w_2 \in U \oplus V$. Suppose now that $\alpha$ is a scalar; then $\alpha w_1 = \alpha(u_1 + v_1) = \alpha u_1 + \alpha v_1$. As above, it then follows that $\alpha w_1 \in U \oplus V$. Thus $U \oplus V$ is a subspace of $W$.
For (2.), first note that $0 \in S^{\perp}$. Now suppose that $v_1$ and $v_2 \in S^{\perp}$. Then $\langle v_1, s \rangle = \langle v_2, s \rangle = 0$ for all $s \in S$. So $\langle v_1 + v_2, s \rangle = \langle v_1, s \rangle + \langle v_2, s \rangle = 0 + 0 = 0$ for all $s \in S$. Thus $v_1 + v_2 \in S^{\perp}$. Similarly, if $\alpha$ is a scalar, then $\langle \alpha v_1, s \rangle = \alpha \langle v_1, s \rangle = \alpha \cdot 0 = 0$ for all $s \in S$. Thus $S^{\perp}$ is a subspace of $V$. $\Box$
Theorem 3.0.4. If $U$ and $V$ are subspaces of $W$ with $U \cap V = \{0\}$ and $w \in U \oplus V$, then $w = u + v$ for unique $u \in U$ and $v \in V$.
Definition 3.0.5. Let $S$ be a subspace of the inner product space $V$ and let $\{x_1, \ldots, x_n\}$ be a basis for $S$ such that $\langle x_i, x_j \rangle = 0$ if $i \ne j$; then this basis is called an orthogonal basis. Furthermore, if $\langle x_i, x_i \rangle = 1$ for each $i$, then this basis is called an orthonormal basis.
Definition 3.0.6. Let $S$ be a finite dimensional subspace of the inner product space $V$ and let $\{x_1, \ldots, x_n\}$ be an orthogonal basis for $S$. If $v$ is any vector in $V$, then the orthogonal projection of $v$ onto $S$ is the vector:
\[
p = \sum_{i=1}^{n} \frac{\langle v, x_i \rangle}{\langle x_i, x_i \rangle}\, x_i
\]
Note that if $\{x_1, \ldots, x_n\}$ is an orthonormal basis, then we have the simpler expression:
\[
p = \sum_{i=1}^{n} \langle v, x_i \rangle\, x_i
\]
Also, in the special case where $S$ is spanned by the single vector $x_1$, $p$ is just the usual orthogonal projection of $v$ onto $S$, which is the line spanned by $x_1$.
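A short sketch of this projection in $\mathbb{R}^3$ with the dot product, for an assumed orthogonal basis $\{x_1, x_2\}$ of the $xy$-plane:

```python
import numpy as np

# Orthogonal projection of v onto S = span{x1, x2}, per Definition 3.0.6.
x1 = np.array([1.0,  1.0, 0.0])
x2 = np.array([1.0, -1.0, 0.0])
assert np.isclose(x1 @ x2, 0.0)            # the basis is orthogonal

v = np.array([2.0, 3.0, 5.0])
p = sum((v @ xi) / (xi @ xi) * xi for xi in (x1, x2))

print(p)                                    # the projection lies in the xy-plane
assert np.isclose((v - p) @ x1, 0.0) and np.isclose((v - p) @ x2, 0.0)
```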
Now we can prove the main theorem of this section:
Theorem 3.0.7. Let $S$ be a finite dimensional subspace of the inner product space $V$ and $v$ be some vector in $V$. Moreover, let $\{x_1, \ldots, x_n\}$ be an orthogonal basis for $S$ and $p$ be the orthogonal projection of $v$ onto $S$. Then
(1) $v - p \in S^{\perp}$.
(2) $V = S \oplus S^{\perp}$.
(3) If $y$ is any vector in $S$ with $y \ne p$, then $\|v - p\| < \|v - y\|$.
Note that part (3.) says that $p$ is the vector in $S$ which is closest to $v$. Moreover, an immediate consequence of (2.) is that the orthogonal projection $p$ of $v$ onto $S$ is independent of the choice of orthogonal basis for $S$.
Proof: (1.) We need to show that $p$ and $v - p$ are orthogonal. So consider $\langle p, v - p \rangle = \langle p, v \rangle - \langle p, p \rangle$. Write $p = \sum_{i=1}^{n} c_i x_i$ with $c_i = \dfrac{\langle v, x_i \rangle}{\langle x_i, x_i \rangle}$. Note that $\langle x_i, x_j \rangle = 0$ when $i \ne j$, so that
\[
\langle p, v \rangle = \sum_{i=1}^{n} c_i \langle x_i, v \rangle = \sum_{i=1}^{n} \frac{\langle v, x_i \rangle}{\langle x_i, x_i \rangle}\langle x_i, v \rangle
\quad \text{and} \quad
\langle p, p \rangle = \sum_{i=1}^{n} c_i^2 \langle x_i, x_i \rangle = \sum_{i=1}^{n} \frac{\langle v, x_i \rangle^2}{\langle x_i, x_i \rangle^2}\langle x_i, x_i \rangle.
\]
Thus
\[
\langle p, v \rangle = \sum_{i=1}^{n} \frac{\langle v, x_i \rangle^2}{\langle x_i, x_i \rangle}
\quad \text{and} \quad
\langle p, p \rangle = \sum_{i=1}^{n} \frac{\langle v, x_i \rangle^2}{\langle x_i, x_i \rangle}
\]
and the result follows for this part; indeed, the same computation with $x_j$ in place of $p$ shows $\langle v - p, x_j \rangle = \langle v, x_j \rangle - c_j \langle x_j, x_j \rangle = 0$ for each $j$, so $v - p \in S^{\perp}$. Now let $v$ be any vector in $V$; then $v = p + (v - p)$. Note that $p \in S$ and, from (1.), $v - p \in S^{\perp}$, and $S \cap S^{\perp} = \{0\}$. Therefore we must have $V = S \oplus S^{\perp}$, proving (2.). For part (3.), let $y$ be some vector in $S$ with $y \ne p$. Then $\|v - y\| = \|(v - p) + (p - y)\|$. Since $p - y \in S$ and $v - p \in S^{\perp}$ by (1.), we have
\[
\|v - p\|^2 + \|p - y\|^2 = \|v - y\|^2.
\]
Since $y \ne p$, $\|p - y\| \ne 0$. Therefore $\|v - p\|^2 < \|v - y\|^2$ and $\|v - p\| < \|v - y\|$. $\Box$
Note that by (3.) of the above theorem, if v is actually in S, then p = v.
Definition 3.0.8. Let $S$ be a subspace of the inner product space $V$, $v$ be a vector in $V$ and $p$ be the orthogonal projection of $v$ onto $S$. Then $p$ is called the least squares approximation of $v$ (in $S$) and the vector $r = v - p$ is called the residual vector of $v$.
4. Least squares in $\mathbb{R}^n$
In this section we consider the following situation: Suppose that $A$ is an $m \times n$ real matrix with $m > n$. If $b$ is a vector in $\mathbb{R}^m$, then the matrix equation $Ax = b$ corresponds to an overdetermined linear system. Generally such a system does not have a solution; however, we would like to find an $\hat{x}$ such that $A\hat{x}$ is as close to $b$ as possible. In this case $A\hat{x}$ is the least squares approximation to $b$ and we refer to $\hat{x}$ as the least squares solution to this system. Recall that if $r = b - Ax$, then $r$ is the residual of this system. Moreover, our goal is then to find an $x$ which minimizes $\|r\|$.
Before we continue, we mention a result without proof:
Theorem 4.0.9. Suppose that $A$ is a real matrix. Then $\operatorname{Col}(A)^{\perp} = N(A^T)$ and $\operatorname{Col}(A^T)^{\perp} = N(A)$.
We will use the results of the previous section to find $\hat{x}$, or more precisely $A\hat{x}$. Given $b$, there is a unique vector $p$ in $\operatorname{Col}(A)$ such that $\|b - p\|$ is minimal, by Theorem 3.0.7. Moreover, by the same theorem, $b - p \in \operatorname{Col}(A)^{\perp} = N(A^T)$. Thus:
\[
A^T (b - p) = 0 \implies A^T b - A^T p = 0 \implies A^T p = A^T b
\]
However, $p = A\hat{x}$ for some vector $\hat{x}$ (note: $\hat{x}$ is not necessarily unique, but $A\hat{x}$ is). So
\[
A^T p = A^T b \implies A^T A \hat{x} = A^T b,
\]
and thus $\hat{x}$ is a solution of the system
\[
A^T A \hat{x} = A^T b,
\]
which is necessarily consistent. Note that in this case we did not need to know an orthogonal basis for $\operatorname{Col}(A)$. This is because we never explicitly calculate $p$.
Another general fact about $A$ in this case is that the rank of $A$ is generally $n$. That is, the columns of $A$ will usually be linearly independent. We have the following theorem, which gives us an additional way to solve for $\hat{x}$ in this situation:
Theorem 4.0.10. Suppose that $A$ is an $m \times n$ matrix with rank $n$. Then $A^T A$ is invertible.
Proof: Clearly, $N(A)$ is a subset of $N(A^T A)$. We now wish to show that $N(A^T A)$ is a subset of $N(A)$; this would establish that $N(A) = N(A^T A)$. Let $x \in N(A^T A)$; then $(A^T A)x = 0 \implies A^T (Ax) = 0 \implies Ax \in N(A^T)$. Note also that $Ax \in \operatorname{Col}(A)$, so that $Ax \in N(A^T) \cap \operatorname{Col}(A)$. Since $\operatorname{Col}(A)^{\perp} = N(A^T)$, this forces $Ax = 0$; thus $x \in N(A)$ and $N(A^T A) = N(A)$. By the rank-nullity theorem we see that the rank of $A^T A$ is the same as the rank of $A$, which is assumed to be $n$. As $A^T A$ is an $n \times n$ matrix, it must be invertible. $\Box$
Thus, when $A$ has rank $n$, $A^T A$ is invertible, and
\[
\hat{x} = (A^T A)^{-1} A^T b.
\]
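A hedged sketch of this recipe: solve the system $A^T A \hat{x} = A^T b$ directly (assuming $A$ has full column rank) and compare against numpy's built-in least squares solver. The matrix and vector below are arbitrary illustrative choices:

```python
import numpy as np

def least_squares(A, b):
    """Least squares solution via A^T A x = A^T b (assumes full column rank)."""
    return np.linalg.solve(A.T @ A, A.T @ b)

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

x_hat = least_squares(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference solution
assert np.allclose(x_hat, x_ref)
print(x_hat)
```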
Example 1: Consider the linear system:
\[
\begin{aligned}
x_1 + x_2 &= 10 \\
2x_1 + x_2 &= 5 \\
x_1 - 2x_2 &= 20
\end{aligned}
\]
This system is overdetermined and inconsistent. We would like to find the least squares approximation to $b$ and the least squares solution $\hat{x}$ to this system. We can rewrite this linear system as a matrix system $Ax = b$ where:
\[
A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 1 & -2 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 10 \\ 5 \\ 20 \end{pmatrix}
\]
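A quick numerical computation of $\hat{x}$ and the residual for this example (a sketch via the normal equations):

```python
import numpy as np

# Least squares solution of Example 1 via the normal equations.
A = np.array([[1.0,  1.0],
              [2.0,  1.0],
              [1.0, -2.0]])
b = np.array([10.0, 5.0, 20.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
residual = b - A @ x_hat
print(x_hat, np.linalg.norm(residual))
```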
Example 2: Suppose some system is modeled by a quadratic function $f(x)$, so that $f(x) = ax^2 + bx + c$. Experimental data is recorded in the form $(x, f(x))$ with the following results:
\[
\begin{array}{c|cccc}
x & 1 & 2 & 3 & 4 \\
\hline
f(x) & 1 & 10 & 9 & 16
\end{array}
\]
We would like to find the best approximation for $f(x)$. Using these data points, we see that:
\[
\begin{aligned}
a(1) + b(1) + c &= 1 \\
a(4) + b(2) + c &= 10 \\
a(9) + b(3) + c &= 9 \\
a(16) + b(4) + c &= 16
\end{aligned}
\]
This corresponds to the matrix equation $Ax = b$ where:
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 16 & 4 & 1 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 1 \\ 10 \\ 9 \\ 16 \end{pmatrix}
\]
As in the previous example, the matrix $A$ has full rank, hence $A^T A$ is invertible. Therefore the least squares solution to this system is:
\[
\hat{x} = (A^T A)^{-1} A^T b = \begin{pmatrix} -0.5 \\ 6.9 \\ -4.5 \end{pmatrix}
\]
Therefore $f(x)$ is approximately $-0.5x^2 + 6.9x - 4.5$.
Example 3: The orbit of a comet around the sun is either elliptical, parabolic, or hyperbolic. In particular, the orbit can be expressed by the polar equation:
\[
r = \beta - e(r \cos \theta)
\]
where $\beta$ is some positive constant and $e$ is the eccentricity. Note that the orbit is elliptical if $0 \le e < 1$, parabolic if $e = 1$, and hyperbolic if $e > 1$.
A certain comet is observed over time and the following set of data $(r, \theta)$ was recorded:
Using this data we would like to find the approximate orbital equation of this comet. Plugging these data points into the equation above gives us the linear system:
\[
\begin{aligned}
\beta - e(1.146) &= 1.2 \\
\beta - e(0.761) &= 2.1 \\
\beta - e(-3.513) &= 4.1 \\
\beta - e(-4.983) &= 6.3
\end{aligned}
\]
Once again, the matrix $A$ has full rank, so that the least squares solution is given by:
\[
\hat{x} = (A^T A)^{-1} A^T b = \begin{pmatrix} 2.242 \\ 0.718 \end{pmatrix}
\]
Therefore the orbital equation is approximately $r = 2.242 - 0.718(r \cos \theta)$. This example is similar to one of the
first applications of least squares. Gauss is credited with developing the method of least squares and applying
it to predicting the path of the asteroid Ceres. He did this to a remarkable degree of accuracy in 1801.
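A brief numerical check of this fit, using the values of $r$ and $r\cos\theta$ from the linear system above:

```python
import numpy as np

# Fit beta and e in r = beta - e*(r cos theta) by least squares.
r_cos_theta = np.array([1.146, 0.761, -3.513, -4.983])
r = np.array([1.2, 2.1, 4.1, 6.3])

A = np.column_stack([np.ones_like(r), -r_cos_theta])   # columns for beta and e
x_hat = np.linalg.solve(A.T @ A, A.T @ r)
print(x_hat)    # approximately (2.242, 0.718), matching the text
```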
5. Least squares in $C[a, b]$
In this section we work with an inner product on $C[a, b]$ of the form
\[
\langle f(x), g(x) \rangle = \int_a^b f(x) g(x) w(x) \, dx
\]
where $w(x)$ is some continuous, positive function on $[a, b]$. Suppose that we have a collection of functions $\{f_1(x), \ldots, f_n(x)\}$ which are mutually orthogonal, and let $S$ be the subspace of $C[a, b]$ that they span, so that they form an orthogonal basis for $S$.
Then, given any function $f(x)$ in $C[a, b]$, we can approximate $f(x)$ by a linear combination of the $f_i$. The best such approximation (in terms of least squares) will be given by the orthogonal projection $p(x)$ of $f(x)$ onto $S$.
The most common application of such an approximation is in Fourier Series which will be covered in the next
section.
There is an analytical consideration which has to be made, and that is how good we can make this approximation. In particular, can we enlarge $S$ in a regular way so that the limit of this process is $f(x)$? This question is beyond our scope, but the answer is yes in some cases and no in others.
6. Fourier Series
In this section we consider the function space $C[-\pi, \pi]$, and we wish to know how, given a function $f(x)$ in $C[-\pi, \pi]$, we can approximate this function using functions of the form $\cos mx$ and $\sin mx$. This is obviously useful for periodic functions. Our setup is as follows:
\[
\langle f(x), g(x) \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) g(x) \, dx
\]
and
\[
S_n = \operatorname{Span}\left( \frac{1}{\sqrt{2}}, \sin x, \ldots, \sin nx, \cos x, \ldots, \cos nx \right)
\]
We can check that the following are true:
\[
\langle 1, \cos \alpha x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos \alpha x \, dx = 0
\]
\[
\langle 1, \sin \beta x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \sin \beta x \, dx = 0
\]
\[
\langle \cos \alpha x, \cos \beta x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos \alpha x \cos \beta x \, dx =
\begin{cases} 1 & \text{if } \alpha = \beta \\ 0 & \text{if } \alpha \ne \beta \end{cases}
\]
\[
\langle \sin \alpha x, \sin \beta x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \sin \alpha x \sin \beta x \, dx =
\begin{cases} 1 & \text{if } \alpha = \beta \\ 0 & \text{if } \alpha \ne \beta \end{cases}
\]
\[
\langle \cos \alpha x, \sin \beta x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos \alpha x \sin \beta x \, dx = 0
\]
for positive integers $\alpha$ and $\beta$. Therefore
\[
\left\{ \frac{1}{\sqrt{2}}, \sin x, \ldots, \sin nx, \cos x, \ldots, \cos nx \right\}
\]
is an orthonormal basis for $S_n$.
Given a function $f(x)$ in $C[-\pi, \pi]$, the least squares approximation of $f(x)$ in $S_n$ will be
\[
\frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)
\]
where
\[
a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \, dx, \qquad
a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx, \qquad
b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx.
\]
Note that $a_k$ and $b_k$ are just the inner products of $f(x)$ with the basis vectors of $S_n$, and
\[
\frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)
\]
is the orthogonal projection of $f(x)$ onto $S_n$.
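A sketch of computing these coefficients numerically for the assumed example $f(x) = x$ (which also appears below), approximating each integral with a simple Riemann sum:

```python
import numpy as np

# Fourier coefficients a_k and b_k of f on [-pi, pi], computed numerically.
def fourier_coeffs(f, n, num=100000):
    xs = np.linspace(-np.pi, np.pi, num, endpoint=False)
    dx = 2 * np.pi / num
    a = [np.sum(f(xs) * np.cos(k * xs)) * dx / np.pi for k in range(n + 1)]
    b = [np.sum(f(xs) * np.sin(k * xs)) * dx / np.pi for k in range(1, n + 1)]
    return np.array(a), np.array(b)

a, b = fourier_coeffs(lambda x: x, n=4)
print(np.round(a, 3))   # a_0, ..., a_4 are approximately 0, since f(x) = x is odd
print(np.round(b, 3))   # b_k is close to 2*(-1)^(k+1)/k, matching the series below
```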
It is beyond our scope, but as $n \to \infty$, these approximations do converge to $f(x)$ on the interval $(-\pi, \pi)$. In particular, on that interval, for $f(x) = x$,
\[
x = \sum_{k=1}^{\infty} \frac{2}{k} (-1)^{k+1} \sin kx
\]
and
\[
\frac{x}{2} = \sum_{k=1}^{\infty} \frac{1}{k} (-1)^{k+1} \sin kx.
\]
If we take $x = \pi/2$, then we see that
\[
\frac{\pi}{4} = \sum_{k=1}^{\infty} \frac{1}{k} (-1)^{k+1} \sin \frac{k\pi}{2} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots
\]
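A quick numerical illustration of this series:

```python
import numpy as np

# Partial sums of 1 - 1/3 + 1/5 - 1/7 + ... approach pi/4 (slowly).
k = np.arange(1, 100001)
partial_sums = np.cumsum((-1) ** (k + 1) / (2 * k - 1))
print(partial_sums[-1], np.pi / 4)
```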