Nonex Kap5

Download as pdf or txt
Download as pdf or txt
You are on page 1of 34

Chapter 5

Metric Projection: applications

In this chapter we shall list a problems and applications which may be analyzed by the
results of the last chapters and attacted by the methods presented in the next chapter.
The applications are located in approximation theory, optimization, math finance, signal
and image analysis and mathematical physiscs. Many of the problems my be considered
as inverse problems; see [6, 33, 45].

5.1 Approximation problems


The first approximation problem considerd in this section is the classical Chebyshev ap-
proximation problem. Then we consider the approximation by partial sums of a repre-
sentation system in Hilbert and Banach spaces. Finally we mention a result which is the
basis for discretization methods of elliptic equations.

Chebyshev approximation
A polynomial p is a function which can written in the form

p(t) = a0 + a1 t + · · · + am tm

for some coefficients a0 , . . . , am ∈ R . If am �= 0, then the polynomial is said to be of degree


m . Polynomials are just about the simplest mathematical functions that exist, requiring
only multiplications and additions for their evaluation. Yet they also have the exibility
to represent very general nonlinear relationships. Approximation of more complicated
functions by polynomials is a basic building block for a great many numerical techniques.
Polynomials are used to approximate a difficult to evaluate function, such as a an
interest function, a density or a distribution function, with the aim of fast evaluation on a
computer. Here there is no interest in the polynomial curve itself. Rather the interest is
on how closely the polynomial can follow the special function, and especially on how small
the maximum error can be made. Very high order polynomials may be used here if they
provide accurate approximations. Very often a function is not approximated directly,
but is first transformed or standardized so to make it more amenable to polynomial
approximation.
Here we sketch the best approximation of functions by polynomials in space C[a, b] of
continuous functions on the interval [a, b] . The distance we use is the supremum-norm

119
in C[a, b]:
�f − g�∞ := sup |f(t) − g(t)| , f, g ∈ C[a, b] .
t∈[a,b]

Let Pn be the space of polynomials of degree less than or equal to n . The best approxima-
tion problem in the supremum norm (Chebyshev approximation problem) is the following
one:
Given f ∈ Pn find p0 ∈ Pn such that �f − p0 �∞ = dist(f, Pn ) . (5.1)
This problem has a solution since Pn is proximal in C[a, b] . The proximality is a conse-
quence of the fact that for each f ∈ C[a, b] the finite-dimensional space span({f} ∪ Pn ) is
reflexive.
Theorem 5.1 (Kolmogorov, 1948). Let f ∈ C[a, b] and p0 ∈ Pn . Then the following
statements are equivalent:
(a) p0 is a best approximation of f .
(b) maxt∈A0 (f(t) − p0 ))g(t) ≥ 0 for all g ∈ Pn where A0 := {t ∈ [a, b] : |f(t) − p0 (t)| =
�f − p0 �∞ } is the set of the so called peak points.
Notice that A0 is nonempty and compact.
Proof:
Obviously, A0 is compact and nonempty due to the continuity of f − p0 and the compact-
ness of [a, b] .
(a) =⇒ (b) Let g ∈ Pn and a > 0 . Then there exists ta ∈ [a, b] with �f − p0 + ag�∞ ≥
�f − p0 �∞ . Then

|f(ta ) − p0 (ta ) + ag(ta )| = �f − p0 + ag�∞ ≥ �f − p0 �∞ ≥ |f(ta ) − p0 (ta )|


|f(ta ) − p0 (ta )|2 + 2(f(ta ) − p0 (ta ))g(ta ) + a2 |g(ta )|2 ≥ |f(ta ) − p0 (ta )|2

This implies
1
(f(ta ) − p0 (ta ))g(ta ) ≥ − a2 �g�2∞ (5.2)
2
Taking the limit a → 0 we obtain a cluster point t0 ∈ [0, 1] of the family ta , a > 0, due to
the compactness of [a, b] . Then we obtain from the inequalities above |f(t0 ) − p0 (t0 )| ≥
�f − p0 �∞ , i.e. t0 ∈ A0 and (f(t0 ) − p0 (t0 ))g(t0 ) ≥ 0 .
(b) =⇒ (a) Let g1 ∈ Pn . Then we obtain t0 ∈ A0 with

(f(t0 ) − p0 (t0 ))(p0 (t0 ) − g1 (t0 )) ≥ 0

Since t0 is a peak point of f − p0 we have |f(t0 ) − p0 (t0 )| = �f − p0 �∞ and we obtain

�f − g1 �2∞ ≥ |f(t0 ) − g1 (t0 )|2 = |(f(t0 ) − p0 (t0 )) + (p0 (t0 ) − g1 (t0 ))|2
= |f(t0 ) − p0 (t0 )|2 + 2(f(t0 ) − p0 (t0 ))(p0 (t0 ) − g1 (t0 )) + |p0 (t0 ) − g1 (t0 )|2
≥ |f(t0 ) − p0 (t0 )|2 = �f − p0 �2∞ .

Since g1 ∈ Pn was arbitrary chosen p0 is a best approximation of f . �

The characterization of a best approximation in Theorem 5.1 is proved on a direct way.


Instead, we could refer to Theorem ??, but then we had to analize the duality mapping
in the space C[a, b] ; see Example 3.11. This road to the result 5.1 can be found in [31].

120
Now, we want to derive the uniqueness of the best approxiamtion. Here, an important
role plays the so called Haar condition. Let U ⊂ C[a, b] be a subspace with dim(U) =
n + 1 . We say that U satisfies the Haar condition if and only the property

Each p ∈ Pn , p �= θ, has at most n zeros (5.3)

holds. If the Haar condition holds we call U a Haar space. Clearly the subspace U := Pn
satisfies the Haar condition due to the fundamental theorem of algebra.
Let t0 , t1 , . . . , tn be pairwise different points in [a, b] . If we choose a basis φ0 , . . . , φn
in a Haar space, then the matrix B := (φi (tj ))i,j=0,...,n has a non-zero determinant. A
consequence of this fact is that for n + 1 distinct points t0 , . . . , tn in [a, b] and data
y0 , . . . , yn ∈ R the interpolation problem

g(ti ) = yi , i = 0, . . . , n, (5.4)

has a uniquely determined solution g in the Haar space.

Lemma 5.2. Let p0 ∈ Pn be the best approximation of f ∈ C[a, b]\Pn . Then then there
exist at least n + 1 distinct poits ξ with |f(ξ) − p0 (ξ)| = �f − p0 �∞ .

Proof:
In a first step we prove that there exist at least two distinct points t0 , t1 ∈ [a, b] with

f(t0 ) − p0 (t0 ) = −(f(t0 ) − p0 (t0 )) = �f − p0 �∞ = E := dist(f, Pn ) .

Let t1 ∈ [0, 1] with f(t1 ) − p0 (t1 ) = E . If the assertion is false, then we have e :=
minξ∈[a,b] (f(ξ) − p0 (ξ)) > −E . Hence e + E �= 0 and the polynomial q := p0 + 12 (E + e)
belongs to Pn . We have
1 1 1
E − (E + e) ≥ f(t) − p0 (t) − (E + e) ≥ e − (E + e) for all t ∈ [0, 1]
2 2 2
which implies
1 1
− (E − e) ≤ f(t) − q(t) ≤ (E − e) .
2 2
This shows
1
�f − q�∞ ≤ (E − e) < E = �f − p0 �∞ = dist(f, Pn ) .
2
This is a contradiction.
Suppose that there exist only m ≤ n + 1 distinct poits ξ0 , . . . , ξm ∈ [a, b] with |f(ξi ) −
p0 (ξi )| = �f − p0 �∞ , i = 0, . . . , m . Then we can find a polynomial q ∈ Pn with q(ξi ) =
−(f(ξi ) − p0 (ξi )), i = 0, . . . , m . This implies

max (f(ξ) − p0 (ξi ))q(ξi ) = max (−|f(ξi ) − p0 (ξi )|2 ) < 0 .


i=0,...,m i=0,...,m

This is a contradiction to condition (b) in Theorem 5.1. �

Theorem 5.3. Pn is a Chebyshev subspace of C[a, b] .

121
Proof:
We know already that Pn is proximal. Let f ∈ C[a, b] . If f ∈ Pn then the best approxima-
tion in Pn is given by f and therefore uniquely determined. Let f ∈ C[a, b]\Pn . Suppose
that p0 , p1 ∈ Pn are best approximations of f . Then with p := 12 (p0 + p1 ) we have

�f − p0 �∞ = �f − p1 �∞ = �f − p�∞ = dist(f, Pn ) > 0 .

Hence,
1 1
dist(f, Pn ) ≤ �f − p�∞ ≤ �f − p0 �∞ + �f − p1 �∞ = dist(f, Pn ) .
2 2
This shows that p is also a best approximation. By Lemma 5.2 there exist n + 2 distinct
points t0 , . . . , tn+1 ∈ [a, b] with |f(ti ) − p(ti )| = dist(f, Pn ), i = 0, . . . , n + 1 . For a fixed
tk we have
1
f(tk ) − (p0 (tk ) + p1 (tk )) = f(tk ) − p(tk ) = dist(f, Pn ) = �f − p0 �∞ ≥ f(tk ) − p0 (tk ),
2
which gives p1 (tk ) ≤ p0 (tk ) . Similarly,

1
f(tk ) − (p0 (tk ) + p1 (tk )) = f(tk ) − p(tk ) = dist(f, Pn ) = �f − p1 �∞ ≥ f(tk ) − p1 (tk ),
2
leads to p0 (tk ) ≤ p1 (tk ) . Thus, we have p0 (tk ) = p1 (tk , k = 0, . . . , n + 1 . On the other
hand, if f(tk ) − p(tk ) = −dist(f, Pn ), then

1
f(tk ) − (p0 (tk ) + p1 (tk )) = f(tk ) − p(tk ) = −dist(f, Pn ) = −�f − p0 �∞ ≤ f(tk ) − p0 (tk ),
2
which gives p0 (tk ) ≤ p1 (tk ) . A similar procedure gives then p0 (tk ) ≥ p1 (tk ) . Hence, we
have in this case also p0 (tk ) = p1 (tk ), k = 0, . . . , n + 1 . We onclude that p0 − p1 ∈ Pn
has n + 2 distinct zeroes and hence p0 = p1 . �

5.2 Greedy approximation


Greedy approximation in Hilbert space
A generic problem of mathematical and numerical analysis is to approximately represent a
given function. It is a classical problem that goes back to the first results on Taylor’s and
Fourier’s expansions of a function. The first step in solving the representation problem
is to choose a representation system. Traditionally, a representation system has natu-
ral features such as minimality, orthogonality, simple structure and nice computational
characteristics. The most typical representation systems are the trigonometric system
(eikt )k∈Z , the algebraic system (tk )k∈N , the spline system, the finite element system and
the wavelet system and their various variants. In general we may speak of a basis sequence
(bk )k∈N in a Hilbert space or Banach space. The second step in solving the representation
problem is to choose a form of an approximant that is built on the basis of the chosen
representation system.

122
Let us consider the Hilbert space case. Let H be a separable Hilbert space and let
the representation system (ek )k∈N be an orthonormal basis. As we know, every element
x ∈ H has a representation �
x= �x|ei �ei .
i∈N

In a classical way that was used for centuries, an approximant for an element x ∈ H is a
partial sum
m

Sm (x) := �x|ei �ei (m ∈ N)
i=1

One calls such a construction of an approximant a linear m-term approximation.

In order to put this procedure into the terminology of the last chapter, we set Um :=
span({e1 , . . . , em }) and consider the approximation problem

Find w ∈ Um which satisfies �x − w� = inf �x − u� = dist(x, Um ) .


u∈Um

We know that this problem has a uniquely determined solution which we denote by PUm (x)
again; PUm is the associated metric projection. From the fact that x − Sm (x) is orthogonal
to Um we conclude that PUm (x) = Sm (x) holds true; see (4) in Theorem 3.3.

Experience in signal and image processing has led to the suggestion that it is more
beneficial to use a general m-term approximation with respect to the basis chosen. This
means that for x ∈ H we look for an approximation of the form

SΛm (x) := �x|ei �ei
i∈Λm

where Λm is a subset of m indices. The choice of this set Λm has to be chosen appropriate
with respect to x . Notice, both Sm (x) and SΛm (x), are of the same complexity. From the
Parseval equality we obtain
� � �2 �
� �
�x − SΛm (x)�2 = � �x|ei �ei � = |�x|ei �|2 =: ηΛm (x) .
i∈N,i∈Λ
/ m (x) i∈N,i∈Λ
/ m (x)

This type of approximation is a nonlinear approach since for a fixed m the approximants
come from different subspaces spanned by ei , i ∈ Λm (x) .
The key for an optimal choice of the set Λm (x) is the simple observation that the
approximation error ηΛm (x) will be minimized if the set Λm contains those indices whose
modulus of the coefficients is biggest. This choice is called the greedy choice. This
means that the greedy subset Λm is chosen in such a way that

min |�x|ei �| ≥ max |�x|ei �|


i∈Λm i∈Λ
/ m

Clearly, a greedy subset is not necessarly uniquely determined. We set

σm (x) := �x − SΛm (x)� where Λm is a greedy subset .

123
Since the approximation of x by Sm (x) can be considered as a m-term approximation we
obtain
σm (x) ≤ dist(x, Um ) .
In order to confirm that greedy approximation may decrease the approximation error let
m ∈ N and choose x := em+1 . We obtain Sm (x) = θ with �x − Sm (x)� = 1, wheras
SΛm (x) = x and no error occur.

Algorithm 5.1 Greedy algorithm in Hilbert space


Given a Hilbert space with an orthonormal basis (ei )i∈N . Let x ∈ H . This algorithm
computes a greedy subset Λm and a greedy approximation SΛm (x) .

(1) x0 := x, j := 0, Λ0 := ∅ .

(2) For j = 1, . . . , m repeat

(i) Find nj such that |�xj−1 |enj �| ≥ |�xj−1 |ek �| for all k ∈ N .
(ii) Set Λj := Λj−1 ∪ {nj }, xj := xj−1 − �xj−1 |enj � .

(3) Set SΛm := xm .

Greedy approximation in Banach space


In order to translate the considerations above to the case of Banach spaces we need the
terminology of a basis in a Banach space. The appropriate concept is that of a Schauder
basis. We follow mainly [48, 49] and [12].
Definition 5.4. Let X be a Banach space. A sequence (bi )i∈N is called a Schauder
basis
�if∞ for each x ∈ X there exist uniquely determined coefficients ci , i ∈ N, such that
i
x = i=1 ci b . �
Necessarily, a Banach space with a Schauder basis is separable (choose rational coef-
ficients for the approximating series). Therefore, l∞ posesses no Schauder basis. In the
spaces lp , 1 ≤ p < ∞, the standard unit vectors build a Schauder basis. The Haar sys-
tem is a Schauder basis in Lp [0, 1], 1 ≤ p < ∞ . It was for a long time an open problem
whether each separable Banach space posesses a Schauder basis. A negative answer has
been given by P. Enflo [22] in 1973 when he constructed a Banach space which satisfies not
the approximation property since this property is a necessary condition for the existence
of a Schauder basis. The approximation property may be formulated as follows:
A Banach space X has the approximation property if the following holds:
For each compact subset K of X and for each ε > 0 there exists a linear
bounded operator T : X −→ X with dim ran(T ) < ∞ and �T (x) − x� ≤ ε
for all x ∈ K .

Let X be a separable Banach space with a Schauder basis (bi )i∈N . Again, we denote
the partial sums by Sm (x):
m
� ∞

i
Sm (x) = ci b if x = c i bi .
i=1 i=1

124
Clearly, the mappings


λn : X � x = ci bi �−→ cn ∈ R , n ∈ N,
i=1

are well defined linear functionals due to the uniqueness of the presentation of x as a
series. As a consequence, the mapping x �−→ Sm (x) is linear too. Moreover, one can
show that for each n ∈ N the mapping x �−→ Sm (x) is a bounded linear operator and
the number BC := supm∈N �Sm � is finite. Now, the coefficient functionals λn are linear
and bounded too.

We set Um := span({b1 , . . . , bm }) and consider the approximation problem

Find w ∈ Um which satisfies �x − w� = inf �x − u� = dist(x, Um ) .


u∈Um

Since Um is finite dimensional this problem has a solution but uniqueness is not guaran-
teed. Moreover, it is not clear that Sm (x) is a solution of the problem. Since

�Sm (x)� ≤ BC �x� for all m ∈ N, x ∈ X .

we have for u ∈ Um

�x − Sm (x)� ≤ �x − u + u − Sm (x)� ≤ �x − u� + �Sm (x − u)� ≤ (1 + BC)�x − u� .

Hence
�x − Sm (x)� ≤ (1 + BC)dist(x, Um ) . (5.5)
(5.5) shows that Sm (x) is almost a best approximation of x in Um . Moreover, limm Sm (x) =
x due to the definition of a Schauder basis.

Similar to the Hilbert space case, we can consider the greedy approximation in the
Banach space X with Schauder basis (bi )i∈N . The error of the best greedy approximation
is � � �
σm (x) := inf inf{�x − ci bi � : ci ∈ R, i ∈ Λ}
Λ⊂N,#Λ=m
i∈Λ

Obvously, σm (x) ≤ dist(x, Um ) for all m ∈ N, x ∈ X . Moreover, we have σm (bm+1 ) <


dist(x, Um ) .

A permutation ρ : N −→ N is called decreasing for x if

|λρ(i) (x)| ≥ |λρ(i+1) (x)| for all i ∈ N

holds true. Now, we can define the family of greedy approximants: for x ∈ X and a
decreasing permutation ρ we call
m

Gm (x, ρ) := λρ(i) (x)bρ(i)
i=1

the m-th greedy-approximant. The m-th greedy approximant is a partial sum of a


rearranged series expansion of x . Therefore, convergence of Gm (x, ρ) → x for all x ∈ X
can be expected only if every rearrangement of the basis (bi )i∈N is again a Schauder basis.

125
Definition 5.5. Let X be a Banach space. A sequence (bi )i∈N is called a unconditional
Schauder basis if for each permutation π : N −→ N (bπ(i) )i∈N is a Schauder basis. �

For instance, in [2, 19, 18, 35] one can find results under which additional assumptions
convergence of the greedy-approximants holds. The main source of different behaviour of
greedy-approximants are wavelet bases.

5.3 Cea’s Lemma


Cea’s Lemma is fundamental for error estimates in finite element methods. It describes
the bridge between the discretization error by finite elements in a (partial) differential
equation and the best approximation error by the chosen finite elements.

Let V be a Hilbert space with inner product �·|·� and let a : V × V −→ R be a


mapping with the properties

(1) a is bilinear.

(2) a is continuous, i.e. there exists a constant c1 > 0 such that

|a(u, v)| ≤ c1 �u��v� for all u, v ∈ V .

(3) a is coercive, i.e. there exist a constant c2 such that

a(u, u) ≥ c2 �u�2 for all u ∈ V .

Consider for y ∈ V the problem

Find u ∈ V such that a(u, v) = �y|v� for all v ∈ V . (5.6)

To solve this problem one chooses a finite-dimensional subspace Vh of V (the space of


finite elements“) and considers the discretizized problem

Find uh ∈ Vh such that a(uh , v) = �y|v� for all v ∈ Vh . (5.7)

The parameter h is measure which describes the meshwidth of the finite elements and
determines the dimension of Vh . By the Lax-Milgram lemma, each of the problems above
has exactly one solution; see for instance [51].

Lemma 5.6 (Cea’s Lemma). Let u, uh be the solution of (5.6) and (5.8), respectively.
Under the assumptions above we have

�u − uh � ≤ c1 (c2 )−1 dist(u, Vh ) . (5.8)

Proof:
We have for all v ∈ Vh

c2 �u − uh �2 ≤ a(u − uh , u − uh ) = a(u − uh , u − v) + a(u − uh , v − uh ) .

126
Since
a(u, v) = �y|v� = a(uh , v) for all v ∈ Vh .
we obtain
c2 �u − uh �2 ≤ a(u − uh , u − v) ≤ c1 �u − uh ��u − v� for all v ∈ Vh
and the result follows. �

Lemma 5.6 says that uh is the best approximation“ of u in Vh , up to a constant



c1 (c2 )−1 . If the bilinear form a is in addition symmetrc, i.e. a(v, w)�
= a(w, v) for all
−1
v, w ∈ V, then the constant c1 (c2 ) may be replaced by the constant c1 (c2 )−1 .
Example 5.7. Consider the boundary value problem
−u �� + pu � + u = f , ξ ∈ (0, 1), u(0) = u(1) = 0, (5.9)
where f is a given function and p is a constant. The problem consists in finding a function
u which satisfies the differential equation and the boundary conditions in (5.9).
If multiply the equation by a function
v ∈ C10 (0, 1) := {w : (0, 1) −→ R : w differentiable and w(0) = w(1) = 0}
then we obtain by formal computational steps
�1 �1
� � �
a(u, w) = u w + pu w + uw)dξ = fwdξ .
0 0

In order to obtain a Hilbert space setting we have to consider the completion of C20 (0, 1)
under the norm
�� 1 � 12 1
�w�1 := (w �2 + w2 )dξ = (�w � �20 + �w�20 ) 2
0

where � · �0 is the norm in L2 (0, 1) getting the Hilbert space V := H10 (0, 1) ; H10 (0, 1) is
endowed with the inner product
�1
�v, w� := (v � w � + vw)dξ , v, w ∈ H10 (0, 1) .
0

a is then defined on V × V as above. a is symmetric if p = 0, otherwise non-symmetric.


For convenience, assume 0 ≤ p < 2 . The continuity of a follows from

� 1 � �
|a(u, v)| ≤ |�u|v�| + p � u vdξ� ≤ �u�1 �v�1 + p�u � �0 �v�0 ≤ 3�u�1 �v�1
0

The coercivity of a is implied by


�1
a(v, v) = (v �2 + pv � v + v2 )dξ
0

1 1 �2
≥ (v (2 − p) + v2 (2 − p))dξ
2 0
1
≥ (2 − p)�v�21
2

127
Now, the assumptions introduced above are satisfied.
Next, we sketch the analysis of discretized problem. To discretize the equation choose a
finite-dimensional space Vh ⊂ V and a basis φ1 , . . . , φm of Vh . Usually, m is of order 1/h .
For the solution of the discretized equation one has to solve a linear system of equations
which is uniquely solvable to the coercivity. To exploit Cea’s Lemma we have to estimate
dist(u, Vh ) . To do this one may choose the interpolation operator Ih defined by
m

Ih v := v(ξi )φi
i=1

where 0 = ξ1 < ξ2 < · · · < ξm = 1 is a partition of [0, 1] . Then

dist(u, Vh ) ≤ �u − Ih u� .

To estimate �u − Ih u� is now a standard problem in the analysis of functions. �

5.4 Feasibility problems


Feasibility problems are ubiquitous in mathematical optimization. Given a constraint
set, a solution to the feasibility problem is any point contained in the set. Frequently
the constraint set can be realized as the intersection of a family of sets, each describing
a simpler constraint. Thus, the convex feasibility problem can be formulated in a very
simple way: Let C1 , . . . , Cm be a family of nonempty closed convex sets in a Hilbert space
H . The convex feasibility problem is:

Given C1 , . . . , Cm find a point x ∈ C .

Historically, this problem is related to problem of guessing a starting point for solving a
mathematical programming problem by the Simplex-algorithm.
To solve the feasibility problem one may try to find the projection PC (x0 ) of a given
approximation“ x0 ∈ X . But it may be difficult to compute the metric projection PC .

Instead one may try to find the metric projection PC by using the metric projections
C1 , . . . , Cm which may be easier to compute in a cyclic way; see (??). That this idea may
work migth be seen in the simple example when C1 , C2 are linear subspaces in R2 .

Linear systems
Consider the problem of solving the linear system

Ax = b (5.10)

where A ∈ Rm,n and b ∈ Rm . Let ai be the ith row of A and define

Hi := {x ∈ Rn : �ai |x� = bi } , i = 1, . . . , m .

We assume ai �= θ, i = 1, . . . , m . Then each Hi is a hyperplane. Now, Ax = b if and only


if x ∈ H := ∩m
i=1 Hi . The metric projection onto Hi is given by

bi − �ai |x� i
PHi (x) := x + a (5.11)
�ai �

128
For numerical purposes, the computation of this projection is very pleasant since its
computation requires vector arithmetic only.
The system (5.10) is called consistent if there exists a z ∈ Rn with Az = b . In
practice, the system (5.10) may be inconsistent due to the fact that it is overdetermined
(m >> n). As we will see in the next chapter, both cases can be handled by the cyclic
projection method.
Suppose that instead we want to solve a linear system of inequalities
Ax ≤ b (5.12)
where ≤ is used componentwise. Then Ax ≤ b if and only if x ∈ ∩m ^ ^
i=1 Hi where Hi are the
half-spaces given by
^ i := {x ∈ Rn : �ai |x� ≤ bi } , i = 1, . . . , m .
H
^ i is given by
The metric projection onto H

x if �ai |x� ≤ bi
PH^ i (x) := (5.13)
PHi (x) otherwise

Again, there are many variations. For instance, we may consider the constrained linear
system
Ax = b , x ≥ θ .
Here, in addition to m affine constraints a cone-type constraint has to be met. The metric
projection onto the cone Rn+ := {x ∈ Rn : x ≥ θ} is a special case of the projection onto
half-spaces.

Determining separating hyperplanes


Let C, D closed convex subsets in the Hilbert space H such that C ∩ D = ∅ . A hyperplane
H = Hy,a := {x ∈ H : �y|x� = a} is called a separating hyperplane of C, D if
�y|u� ≤ a ≤ �y|v� for all u ∈ C, v ∈ D .
As we will see in the next chapter, the method of alternating projection may be used
to construct two sequences (un )n∈N and (vn )n∈N in C and D respectively and points
u ∈ C, v ∈ D such that
u = lim un , v = lim vn and �u − v� = dist(C, D) .
n n

We define y := u − v, and a := tu + (1 − t)v for some t ∈ [0, 1] and define the set
H := {x ∈ H : �y|x� = a} .
Assumption: dist(C, D) > 0 .
Then we must have y �= θ and H is a hyperplane. Further, 0 < �y�2 = �y|u� − �y|v�
which implies �y|u� > �y|v� . As we know from Kolmogorov’s criterion we have
�v − u|u � − u� ≤ 0 , �u − v|v � − v� ≤ 0 for all u � ∈ C, v � ∈ D .
This implies
�y|u � � ≤ �y|v� < �y|u� ≤ �y|u � � for all u � ∈ C, v � ∈ D ,
and we conclude that H is a separating hyperplane.

129
Linear programming - a first view
Consider the linear program (P) given by

Maximize �c|x� subject to Ax ≤ b, x ≥ 0 , (5.14)

where A ∈ Rm,n , b ∈ Rm and c ∈ Rn . Then (P) has a (symmetric) dual program (D)
given by
Minimize �b|y� subject to At y ≥ c, y ≥ 0 ; (5.15)
see for instance [7]. If the optimal value of either (P) or (D) is finite, then the optimal
values of both (P) and (D) are finite and strong duality holds. That is, �c|x∗ � = �b|y∗ � if
and only if x∗ is optimal for (P) and y∗ is optimal for (D).

We define the constraints


C1 := {(x, y) ∈ Rn × Rm : �c|x� = �b|y�} (optimality)
C2 := {(x, y) ∈ Rn × Rm : Ax ≤ b} (primal feasibility)
C3 := {(x, y) ∈ Rn × Rm : x ≥ θ} (primal feasibility)
C4 := {(x, y) ∈ Rn × Rm t
: A y ≥ b} (dual feasibility)
C5 := {(x, y) ∈ Rn × Rm : y ≥ θ} (dual feasibility)

Then (x∗ , y∗ ) ∈ ∩5i=1 Ci if and only if x∗ is optimal for (P) and y∗ is optimal for (D). Since
the variables x and y are related only by C1 it is convenient to consider the following
three constraints: C1 , C2 ∩ C4 , C3 ∩ C5 . C1 is a hyperplane, C2 ∩ C4 is an intersection of
halfplanes, C3 ∩ C5 is the cone Rn+ × Rm + .

Semidefinite optimization
Consider the space Rn,n of square matrices. Rn,n can be endowed with the inner product
n

�A|B� := A • B := aij bij = trace(Bt A) , A = (aij ), B = (bij ) .
i.j=1

Since Rn,n is finite-dimensional (dim Rn,n = n2 ) (Rn,n , �·|·�) is a Hilbert space. The inner
product induces the so called Frobenius norm
1
�A�F := �A|A� 2 , A ∈ Rn,n .

The subset of symmetric matrices in Rn,n will be denoted by S n . The subset of positive
semidefinite symmetric matrices is S+n and S++
n
is the set of symmetric postive definite
n
matrices. Clearly, S+ is a nonempty closed convex cone. To abbreviate the notions, we
write occasionally A � Θ for A ∈ S+n .

Consider the primal semidefinite program:

Minimize �C|X subject to �Ai |X� = bi , i = 1, . . . , m , X � Θ , (5.16)

where A ∈ Rm,n , b ∈ Rm and c ∈ Rn .

130
The feasibility problem of this program may be reformulated as follows:

Find X ∈ S+n with �Ai |X� = bi , i = 1, . . . , m .

We set
C := S+n , D := {X ∈ S n : �Ai |X� = bi , i = 1, . . . , m} .
Then we have to find a point in C ∩ D .

The projection
�n of a matrix X onto C := Sn+ can be found from the eigenvalue decom-
t
position X = i=1 λi qi qi of X:
n

PSn+ (X) = max{0, λi }qi qti
i=1

The projection of X onto on the affine set D is also easy to work out:
m

PAi •X=bi ,i=1,...,m (X) = X − ui Ai
i=1

where ui are found from the normal equations

Gu = (�Ai |X� = bi )i=1,...,m , G = (Gij ), Gij = �Ai |Aj �, i, j = 1, . . . , m .

Remark 5.8. As in the case of linear programming, we have a dual problem of semidef-
inite programs. In contrast to the linear programming, if the optimal value of either the
primal or the dual program is finite, then the optimal values of both problems are not
forced to be equal; a duality gap may exist. But there are well known conditions which are
sufficient for the non-existence of a duality gap. If there is no duality gap the feasibility
problem for the primal and dual problem together with the optimality condition may be
considered as in the linear programming context. �

An interesting application for the semidefinite approach is the the so called educa-
tional testing problem; see [16, 1]. Such optimization problems arise in statistics.

Given a matrix A ∈ S+n , find out how much can be subtracted from the
diagonal of A and still retain a matrix in S+n .

This can be expressed as

Maximize �e|d� subject to A − Diag(d) ∈ S+n , d ≥ 0, (5.17)

where e := (1, 1, . . . , 1) ∈ Rn .

Fletcher [23, 24] describes methods to solve the problem by classical“ methods of

optimization. W. Glunt [25] applies successfully the idea described in the following sub-
section.

131
Linear programming - a second view
We consider a variation of the linear programming problem (5.16).
Minimize �e|x� subject to x ∈ ∩m
i=1 Ci , (5.18)
where e = (1, 1, . . . , 1) ∈ Rn and C1 , . . . , Cm are closed convex subsets of Rn . Clearly,
the feasable set C := ∩m i=1 Ci is a generalized constraint set; compare with (5.16). We call
the problem (5.18) a linear convex programming problem. To relate this problem
to alternate projection methods following an idea of W. Glunt [25] we introduce the
hyperplane
Lτ := {y ∈ Rn : �e|y� = τ} .
If τ is chosen such that
τ < min�e|u�
u∈C

then the sets C and Lτ are disjoint. Given a f ∈ Rn one may consider the problem
Minimize �f − x� subject to x ∈ C ∩ Lτ . (5.19)
which has no feasible point. Nevertheless, one may consider the alternate projection
methods to try to compute PC∩Lτ (f) . What we get by this procedure are two sequences
(un )n∈N , (yn )n∈N , belonging to C and Lτ respectively. As we show below following [15],
these sequences converge to points u ∈ C and y ∈ Lτ such that �u − y� attains the
minimum distance between C and Lτ . It can be deduced from the relationsship of Lτ and
�e|·� that u solves the problem (5.18).
Theorem 5.9. Let H be a Hilbert space and let C, D be nonempty closed convex subsets
of H . Then under the assumption
one of the sets C, D is compact (5.20)
there exist u ∈ C and v ∈ D respectively such that
�u − w� = inf �u � − y � �
u � ∈C,y � ∈D

Proof:
Let us assume that C is compact.
Obviously, there exist a sequence (un )n∈N in C and a sequence (vn )n∈N in D with
1
�un − vv � − ≤ d := inf �u − v�, n ∈ N .
n u∈C,v∈D

Since C is compact (un )n∈N contains a convergent subsequence (unk )k∈N ; let u := limk unk .
Since C is compact and therefore closed, u ∈ C . Then (vnk )k∈N is bounded and contains
therefore a weakly convergent subsequence (vnl )l∈N ; let v = w−liml vnl . Since D is closed
and convex, D is weakly closed and therefore v ∈ D . Now (unl − vnl )l∈N converges weakly
to u − v . Since the norm in H is weakly lower semicontinuous, d = �u − v� . �

Cheney and Goldstein have shown in 1959 [15] that under the assumption of Theorem
5.9 the alternate projection method produces a vector w which satisfies
w = u − v, u ∈ C, v ∈ D, �w� = dist(C, D) .

132
5.5 Approximation of correlation matrices
A correlation matrix is a symmetric positive semidefinite matrix with unit diagonal. Cor-
relation matrices occur in several areas of numerical linear algebra, including precondition-
ing of linear systems and error analysis of Jacobi methods for the symmetric eigenvalue
problem. The term correlation matrix comes from statistics, since a matrix whose (i, j)
entry is the correlation coefficient between two random variables xi and xj is a symmet-
ric positive semidefinite matrix with unit diagonal. In stock research sample correlation
matrices constructed from vectors of stock returns are used for predictive purposes. Un-
fortunately, on any day when an observation is made data is rarely available for all the
stocks of interest. One way to deal with this problem is to compute the sample correlations
of pairs of stocks using data drawn only from the days on which both stocks have data
available. The resulting matrix of correlations will be only an approximate correlation
matrix, because it has been built from inconsistent data sets.

The problem we consider is for a matrix A ∈ S := C1 ∩ S+n to compute the best


approximation of A in S with respect to the Frobenius-norm:

Minimize �X − A�F subject to A ∈ C1 ∩ S+n

This leads to the computation PS (A) . Since C1 and S+n are both closed convex subsets of
Rn,n , so is their intersection. Therefore the metric projection PS exists. Moreover, each
PS (A) is a singleton. The alternate projection method may be used to compute PS ; see
[29, 30, ?, 50]

In pratice, it is reasonable to fit the norm in Rn,n to some observations concerning the


n
quality of the correlation data. One can do this by choosing a matrix W ∈ S++ This W
1
induces the inner product �A|B�W := �A|WB�F and the norm �A�W := �W 2 A�F .

5.6 Extremal property of splines


Here we consider the interpolation of difficult to evaluate functions“ by simpler“ func-
” ”
tions. We follow here mainly [4, 9]

The curvature of a curve given by the parametrization

γ : [a, b] � t �−→ (t, f(t)) ∈ R2

with a twice continuous differentiable function f is given by

f �� (t)
κγ (t) := 3 , t ∈ [a, b] . (5.21)
(1 + f � (t)2 ) 2

From a physical point of view, the integral


�b
f �� (t)2
� 2 3
dt (5.22)
a (1 + f (t) )

133
is the strain energy of a flexible thin beam or draftman’s spline1 forced to pass through
predescribed points. Consider a partition τ : a = t0 < t1 < · · · < tm = b . We want to
find a curve of least strain energy which interpolates the given data y0 , . . . , ym ∈ R . The
criterion in (5.22) is a rather nonlinear one. Therefore it is appropriate to approximate
the strain energy. Historically and from the practical point of view a very successful
approximation is the following: � b
f �� (t)2 dt (5.23)
a
If we summarize the interpolationg conditions by

W(y) := {f ∈ C2 [a, b] : f(ti ) = yi , i = 0, . . . , m}

we may state the interpolation problem by


�b
Minimize f �� (t)2 dt subject to f ∈ W(y) (5.24)
a

Unfortunately, C2 [a, b] is not complete in the norm


�b
1
�f�c := ( |f(t)|2 + |f �� (t)|2 dt) 2 .
a

Therefore we consider a completion H2 [a, b] of C2 [a, b] defined as follows:


�t
AC[a, b] := {f ∈ C[a, b] : f(t) = α + g(s)ds, t ∈ [a, b], with g ∈ L1 [a, b], α ∈ R} ,
a
�t
� �
H [a, b] := {f ∈ C [a, b] : f (t) ∈ AC[a, b], f = α + g(s)ds, t ∈ [a, b], g ∈ L2 [a, b]} .
2 1
a

The function g in the definition of H2 [a, b] above is called the second derivative of f in a
generalized sense and we write f �� := g . This space H2 [a, b] is a special case of the family
of Sobolev spaces. It is a Hilbert space under the inner product
�b
�f, h�2,2 := (f(t)h(t) + f �� (t)h �� (t))dt .
a

A half norm in H2 [a, b] is given by


�b
1
ν(f)2,2 := ( |f �� (t)|2 dt) 2 .
a

Now we reformulate the minimization problem above as follows:

Minimize ν(f)2,2 subject to f ∈ V(y) (5.25)

where we have adapted the interpolating conditions to the new context:

V(y) := {f ∈ H2 [a, b] : f(ti ) = yi , i = 0, . . . , m}


1
The term spline“ was introduced by Schoenberg 1946.

134
The problem (5.24) has now a formulation which is appropriate to analyze the problem
by methods prepared in the last chapter. The problem that ν2,2 is not a norm is not very
serious because in general the interpolationg conditions enforces uniqueness of a solution
of (5.25).
One can show that such a solution f is a C2 − function and on each interval [ti , ti+1 ], i =
0, . . . , m − 1, equal to a polynomial of degree at least three. Therefore, a solution of the
minimization problem (5.25) is called a interpolating cubic spline and the fact that
such a spline is a solution of (5.25) is called the extremal property of the spline.
On the boundary of the interval [a, b] we observe the natural boundary conditions
f �� (a) = f �� (b) = 0 . If we count the degrees of freedom which such a function contains
then we obtain the number 4m . On the other hand we have to obbey 3(m − 1) + (m + 1)
requirements (continuity, interpolation conditions). There are two degrees of freedom left
which generate the natural boundary conditions. These natural boundary conditions may
be replaced by two other conditions; see below.

Let us consider a more general problem formulation. We choose Hilbert spaces H, K,


a linear continuous operator R : K −→ H, a convex subset M of K, and an affine closed
subspace V of K . Then we want to solve

Minimize �R(x)�H subject to x ∈ V ∩ M (5.26)

This problem may be considered as an approximation problem:

Find in H the best approximation w of θ in C := R(V ∩ M) . (5.27)

The set C is convex but the existence of a solution is in doubt since it is not clear whether
C is a closed. We introduce now a description of V with the intention to analyze this
question.
Let Y be a Banach space and let A : K −→ Y be a linear continuous mapping
with nullspace ker(A) . Moreover, we assume that A describes the set V as follows: V =
A−1 (y) with a given y ∈ Y . On this way we describe interpolating conditions: x ∈ K is
interpolating the data y“ if Ax = y . With the set M we describe constraints of non-

interpolatory character; for example: M = {x ∈ K : x ≥ θ} where ≥ is a (partial) order in
K.

Lemma 5.10. Let R : K −→ H be a surjective linear operator, let A : K −→ Y be a


linear operator. If ker(R) ∩ ker(A) = {θ} and if ker(R) is finite-dimensional then
1
� · �(R,A) : K � x �−→ (�R(x)�2 + �A(x)�2 ) 2 ∈ R (5.28)

defines a norm in K which is equivalent to the norm in K .

Proof:
Obviously, the expression in (5.28) is a norm due to the assumption ker(R)∩ker(A) = {θ} .
We have
�R(x)�2 + �A(x)�2 ≤ �R�2 �x�2 + �A�2 �x�2
which shows that there exists a constant c1 > 0 with

�x�(R,A) ≤ c1 �x� for all x ∈ K .

135
Consider the mapping L : K � x �−→ (Rx, Ax) ∈ H × Y . Clearly, H × Y is a Banach
space and L is a linear continuous operator. We show that the range ran(L) is closed.
Let (L(xn ))n∈N be a sequence in ran(L) with limn R(xn ) = w, limn A(xn ) = z . We de-
compose each xn as xn = un + vn ∈ ker(R) + ker(R)⊥ . We have limn R(vn ) = w . Since
^ := R|ker(R)⊥ is linear, continuous and bijective we conclude that (vn )n∈N is a Cauchy
R
sequence and therefore convergent. Let v := limn vn ∈ ker(R)⊥ .
Assume that (xn )n∈N is not bounded. Then we conclude from
xn un vn
= +
�xn � �xn � �xn �
n nk
that ( un )n∈N is a bounded sequence in ker(R) and a convergent subsequence ( unk )k∈N
�x � �x �
nk
u nk nk
exists. Clearly, u := limk nk ∈ ker(R) . Since (Ax )k∈N , (Av )k∈N are bounded we
�x �
obtain Au = θ, which implies u ∈ ker(R) ∩ ker(A) and finally u = θ . But this is
nk
contradicting the fact that xnk = 1, k ∈ N .
�x �
Now we know that (xn )n∈N is bounded which implies that (un )n∈N is bounded too. Hence,
(un )n∈N has a convergent subsequence and since (vn )n∈N is convergent, (xn )n∈N has a
convergent subsequence too. Each cluster point x of (xn )n∈N leads to (w, z) = (Rx, Ax) .
Now, the range of L is closed. Then L is a linear, continuous bijective mapping from H
onto ran(L) . Therefore L−1 : ran(L) −→ K exists and is continuous. From
1
�x� = �L−1 (Rx, Ax)� ≤ �L−1 ��(Rx, Ax)� = �L−1 �(�Rx�2 + �Ax�2 ) 2 = �L−1 ��x�(R,A)

we conclude
c−1 −1
1 �x�(R,A) ≤ �x� ≤ �L ��x�(R,A) for all x ∈ K .


Theorem 5.11. Let R : K −→ H be a surjective linear operator, let A : K −→ Y be a
linear operator and let y ∈ Y . We set V := A−1 (y) and let M be a closed convex subset
in K . If ker(R) ∩ ker(A) = {θ} and if ker(R) is finite-dimensional then R(V ∩ M) is closed
and the spline-extremal problem

Minimize �Rx)� subject to x ∈ V ∩ M . (5.29)

has a uniquely determined solution w ∈ V ∩ M .


Proof:
Set S := R(V ∩M) . Obviously, S is convex. We want to show that S is closed. Let (xn )n∈N
a sequence in V ∩ M with limn R(xn ) = u . Then limn (Rxn , Axn ) = (u, y) . Therefore
(Rxn , Axn ))n∈N is a Cauchy sequence in H × Y with respect to the norm � · �(R,A) ; see
Lemma 5.10. Again, by Lemma 5.10 (xn )n∈N is a Cauchy sequence in K and therefore
convergent. Let x = limn xn . Then x ∈ M, Ax = y, Rx = u .
Now, there exists a uniquely determined v ∈ S with

�v� = inf

�v � �
v ∈S

We have v = Rw, w ∈ V ∩ M . Suppose that there is another w � ∈ V ∩ M with v = Rw � .


Then w − w � ∈ ker(R) ∩ ker(A) and by assumption, w = w � . �

136
Next we want to characterize the solution w in Theorem 5.11. We know from Theorem
3.1 (Kolmogorov’s criterion)

�Rw|Rv − Rw� ≥ 0 for all v ∈ V ∩ M . (5.30)

This is a variational inequality in the space H . Using the adjoint R∗ of R we obtain a


variational inequality in K:

�R∗ Rw|v − w� ≥ 0 for all v ∈ V ∩ M . (5.31)

The ideas in the section on variational inequalities can be applied to (5.32).

If we have no constraints M, i.e. if M = K, then the variational inequality (5.32)


becomes
Rw ∈ R(ker(A))⊥ . (5.32)
Lemma 5.12. Let R : K −→ H be a surjective linear operator, let A : K −→ Y
be a linear surjective operator and let y = Ax0 ∈ Y ; we set V(y) := x0 + ker(A) . Let
R(w) ∈ R(ker(A))⊥ with w ∈ V(y) . We assume

dim(ker(R)) < ∞ , dim Y < ∞ , ker(R) ∩ ker(A) = {θ} . (5.33)

Then
(a) R∗ (R(ker(A)))⊥ ) = ker(A) ∩ ker(R) .
(b) m + 1 = dim(R(ker(A)))⊥ ) = dim(ker(A) ∩ ker(R)) .
(c) ker(A)⊥ is finite-dimensional.
(d) dim(ker(A)⊥ ) = dim(ker(R)) + dim(ker(A)⊥ ∩ ker(R)⊥ ) .
(e) If b1 , . . . , bm is a basis of R(ker(A))⊥ then
�Rw|bi � = �y|ηi � , i = 1, . . . , m , (5.34)
where R∗ bi = A∗ ηi , i = 1, . . . , m .
Proof:
Ad (a) This can easily be shown.
Ad (b) Follows from the fact that R∗ is injective.
Ad (c) Clearly, dim(ker(A)⊥ ) = dim(ran(A)) < ∞ .
Ad (d) We have

ker(A)⊥ = ker(R) ∩ ker(A)⊥ ⊕ ker(R)⊥ ∩ ker(A)⊥ = ker(R) ⊕ ker(R)⊥ ∩ ker(A)⊥

since ker(R) ⊂ ker(A)⊥ due to the assumption ker(R) ∩ ker(A) = {θ} .


Ad (e) We have R∗ bi ∈ ker(R)⊥ ∩ ker(A)⊥ . Hence there exist ηi with R∗ bi = A∗ ηi , i =
1. . . . , m . Now, Rw = R(x0 + u) with x0 ∈ V(y) ∩ ker(A)⊥ , u ∈ ker(A) . Then for i =
1, . . . , m

�Rw|bi � = �Rx0 |bi � = �x0 |R∗ bi � = �x0 A∗ ηi � = �Ax0 |ηi � = �y|ηi � .

137
We conclude from Lemma 5.12 that the solution w of the spline-problem can be com-
puted as follows: Solve the linear system

Ba = r with B := (�bi |bj �) ∈ Rm,m , r = (�y|ηj �) ∈ Rm (5.35)

with solution a ∈ Rm and find w as follows:


m

−1 ⊥
w := R g ∈ ker(R) where g := a i bi . (5.36)
i=1

Notice that w ∈ ker(R)⊥ in (5.36) is well-defined due to the assumption ker(R) ∩ ker(A) =
{θ} .

Let us come back to the interpolatory spline at the beginning of this section. Here

K := H2 [a, b] , H = L2 [a, b] , R : K � f �−→ f �� ∈ H,


Y := Rn+4 , A : K � f �−→ (f � (a), f(a), f(t1 ), . . . , f(tn ), f(b), f � (b)) ∈ Y,

where τ : a = t0 < t1 < · · · < tn < tm = b is a partition of [a, b] . Then all the
requirements of the more general problem above are satisfied, especially the assumption
ker(R) ∩ ker(A) = {θ} . A basis b0 , . . . , bm in R(ker(A))⊥ is given by

t1 − t , t ∈ [t1 , t0 ]
b0 (t) := t 1 − t0
0 , else

 t − t i
 ti−1 − ti , t ∈ [ti−1 , ti ]

bi (t) := ti+1 − t
, t ∈ [ti , ti+1 ] , i = 1, . . . , n,

 t i+1 − ti

0 , else
� t−t
n
bn+1 (t) := tn+1 − tn , t ∈ [tn , tn+1 ]
0 , else

Since these basis-functions are piecewise linear functions the resulting solution solution
of the spline-problem is piecewise a polynom of degree three.

In order to find a nonlinear framework for the interpolatory splines one can replace
the optimization criterion as follows:

Minimize φ(Rx) subject to x ∈ V ∩ M , (5.37)

where φ is a functional defined on H . Since the optimization criterion has to be well-


defined one may K, H adapt to this requirement. This can be done in the concrete case
by replacing the space L2 [a, b] by an Orlicz-type space. In this framework the spline
problem has been discussed in [4, 5, 9]. For example, one can characterize on this way
piecewise rational functions by an extremal property.
Another approach is considered by B.J. McCartin [42]. Starting from the analogy of a
cubic spline to a beam a tension term is added to the governing differential equation so
giving rise to the exponential spline.

138
5.7 Variational inequalities
An obstacle problem
Let Ω be a bounded subset in R2 with boundary ∂Ω . Consider the problem

−Δu = f, in Ω , u = 0 in ∂Ω , u ≥ g in Ω (5.38)

Here f, g are given functions in Ω and ∂Ω, respectively. g is called the obstacle. We
reformulate this problem by �introducing the Sobolev space V := H10 (Ω), the �operator
A : V −→ V ∗ , �A(u), v� := Ω ∇u∇vdx and the functional b ∈ V ∗ , �b, v� := Ω fvdx .
The problem (5.38) becomes

�A(u) − b, v − u� ≤ 0 for all v ∈ C , (5.39)

where the obstacle is now described by

C := {w ∈ V : w ≥ g} .

Solving variational inequalities by projection methods


Given a nonempty closed and convex subset C of a Hilbert space H and a mapping
F : C −→ H, the variational inequality problem2 is to find a vector x∗ ∈ C such that

�F(x∗ )|x − x∗ � ≥ 0 for all x ∈ C . (5.40)

Variational inequalities provide a tool to formulate various equilibrium problems. Several


well known problems, such as systems of nonlinear equations, first order conditions for
linear and nonlinear optimization problems, and complementarity problems, are special
cases of variational inequalities.
A geometric interpretation of a solution of a variational inequality is that x∗ is a
solution if and only F(x∗ ) makes a non-obtuse angle with all the feasible directions going
into C from x . Kolmogorov’s criterion is a specific case of this variational inequality. If a
solution x∗ beongs to the interior of C then obviously F(x∗ ) = θ .
Let us just sketch the variational inequality which describes a necessary condition in
optimization. Consider the optimization problem

Minimize f(x) subject x ∈ C .

Here, f is a differentiable function from Rn −→ R and C is a convex subset of Rn . As we


know from analysis a necessary condition for a solution x∗ of the minimization problem
is the variational inequality

�∇f(x∗ )|x − x∗ � ≥ 0 for all x ∈ C

Setting F(x) := ∇f(x) we obtain a variational inequality.


Theorem 5.13. Let H be a Hilbert space and let C be nonempty closed convex subset of
H . Then for x∗ ∈ C the following conditions are equivalent:
2
Variational inequalities were introduced by Hartman and Stampacchia in 1966 for the study of partial
differential equations with applications in the field of mechanics.

139
(a) x∗ is a solution of the variational inequality (5.40).
(b) x∗ is a fixed point of H � x �−→ PC (x − τF(x)) for each τ > 0 .
Proof:
(a) =⇒ (b) Let τ > 0 . Define u := x∗ − F(x∗ ) . Then �x∗ − u|v − x∗ � ≥ 0 for all v ∈ C .
Kolmogorov’s criterion implies x∗ = PC (u) = PC (x∗ − τF(x∗ )) .
(b) =⇒ (a) We have �x∗ − (x∗ − τF(x∗ ))|v − x∗ � ≥ 0 for all v ∈ C . �

Using Theorem 5.13 we may solve the variational inequality by solving the fixed point
equation
x = PC (x − τF(x)) (5.41)
Since PC is nonexpansive we can use the fixed point theory of nonexpansive mappings. If
the constraint set C can be decomposed into a intersection of sets, C = ∩m
i=1 Ci , we may
use a cyclic projection method to solve the fixed point equation.

Common solutions of variational inequalities


Let H be a Hilbert space. Let there be given for each i = 1, . . . , m
• a mapping Ai : H ⇒ H,
• a nonempty closed convex subset Ci .
Then one may consider the problem (CVI):
Find a point x ∈ ∩m
i=1 Ci such that for each i = 1, . . . , m there exists ui ∈ Ai (x)
satisfying
�ui |v − x� ≥ 0 for all v ∈ Ci , i = 1, . . . , m . (5.42)

5.8 A proof of Schauder’s fixed point theorem


Theorem 5.14 (Fixed point Theorem of Schauder, 1930/Second Version). Let X be a
Banach space and let M be a nonempty compact convex subset of X . If F : M −→ M is
continuous then F posesses a fixed point.
Proof:
As the closed linear span of the compact set M is a separable normed space, we may
assume, without loss of generality, that X is separable. Moreover, since a separable
Banach space can be renormed such that the norm is equivalent to the given norm and
strictly convex, we can assume that the norm in X is strictly convex; see Koethe [34].
Notice that the existence of a fixed point is not influenced by renorming. For each n ∈ N,
let Sn be a finite subset of M such that M ⊂ Sn + B1/n . Notice that the choice of Sn is
possible since M is totally bounded3 . Let Cn be the closed convex hull of Sn . Then Cn is
a compact convex subset of M ; see the Lemma 5.15 below. Thus, PCn is single-valued and
continuous. Let Zn be the linear span of Sn . Then dim Zn < ∞. Define Fn : Zn −→ Zn
by Fn (x) := PCn (x) . By Brouwer’s fixed point theorem Fn has a fixed point xn ∈ Cn . Now
�Fn (x) − F(x)� ≤ 1/n for all x ∈ M .
3
A set in a metric space is totally bounded if it can be covered by finitely many balls of any fixed size.

140
Therefore, �F(xn )−xn � ≤ 1/n , n ∈ N . Now, it is clear that every cluster point of (xn )n∈N ,
and such cluster points exist since M is compact, is a fixed point of F . �

Lemma 5.15 (Mazur, 1934). Let X be a normed space and let M ⊂ X be such that M is
compact. Then co(M) is compact too.

Proof:
Let ε > 0 . Since M is compakt there exist x1 , . . . , xn in M with

M ⊂ ∪ni=1 Bε/2 (xi ) .

Let x ∈ co(M) . Then this x has a presentation


m
� m

i 1 m
x= ai y , y , . . . , y ∈ M, a1 , . . . , am ≥ 0, ai = 1 .
i=1 i=1

Now, we have m indices j1 , . . . , jm such that yi − xji ∈ Bε/2 , i = 1, . . . , m . Therefore we


obtain m m
� �
ji
x− ai x = ai (yi − xji ) ∈ Bε/2 .
i=1 i=1

This implies K := co(M) ⊂ co({x1 , . . . , xn }) + Bε/2 . Obviously, K is bounded. Since K is


a convex subset of a finite-dimensional space K is totally bounded. Therefore there exist
z1 , . . . , zk ∈ K with K ⊂ ∪ki=1 Bε/2 (zi ) . This implies

K = co(M) ⊂ ∪ki=1 Bε (zi ) .

This shows co(M) is relatively compact. This implies co(M) is compact too. �

5.9 Some inverse problems


Recovery of signals
A signal is a function in L2 (R) . In communications we often deal with signals which are
presumed to have no very high frequency components. We make this class of signals precise
by use of Fourier transforms and say that a signal L2 (R) is band limited with band
[−Ω, Ω] if its Fourier transform f^ vanishes for |ω| > Ω . With the family of characteristic
function χa , defined by �
1, |ω| ≤ a
χa (ω) := ,
0, otherwise
a band-limited signal f has the property
^
f(ω) ^
= χΩ (ω)f(ω), ω ∈ R.

We define
BΩ := {f ∈ L2 (R) : χΩ f^ = f}
^,
BΩ is a closed subspace of L2 (R) .

141
Let f ∈ L2 (R) be considered as a signal from which we know the segment g :=
f|[−τ,τ] (τ > 0) . Without any further knowledge about f we cannot reconstruct f from
g . But if we have the information that the the signal f is contained in BΩ for some
bounded interval ω then it is possible to reconstruct f from g . This follows from the fact
that f must be an analytic function due to the representations

1 ^
f(t) = √ f(ω)e iωt
dω , t ∈ R . (5.43)
2π [−Ω,Ω]
If we define
Dτ := {f ∈ L2 (R) : χτ f = g} , (5.44)
then our assumptions can be formulated as follows:

f ∈ B Ω ∩ Dτ .

Dτ is a closed affine subspaces of L2 (R) . Therefore, the metric projections of L2 (R) onto
Dτ , BΩ may be considered and used. They are of the following form:

Dτ : L2 (R) � f −
� → χτ f ∈ Dτ , (5.45)
BΩ : L2 (R) � f −� → χΩ f^ ∈ BΩ . (5.46)

Set
C := {f ∈ L2 (R) : (χΩ − 1)f^ = θ} , D := {f ∈ L2 (R) : Dτ f = g} .
Using these notations the reconstruction may be rewritten as a feasibilty problem:
Find a signal f ∈ C ∩ D .
Remark 5.16. A related theme is the problem of phase retrieval of images from data

|�aj |x�| = bj , j = 1, . . . , m .

Consider (see [17])

C := {(x, z) ∈ Cn × Cm : �aj |x� = z, j = 1, . . . , m} ,


D := {(x, z) ∈ Cn × Cm : |zj | = bj , j = 1, . . . , m} .

Notice that �·|·� is an inner product in Cn . �

Reconstruction of radiographs
The reconstruction problem of x-ray tomography (x-ray slice imaging) is used to recon-
struct the internal structure of a solid object from external measurements. Tomography

((τóµoσ = slice, γραφειν=aufzeichnen) means to reconstruct a three-dimensional object
B from its two-dimensional slices. Objects of interest in x-ray imaging are described by
a real-valued function defined on R3 , called the attenuation coefficient. The attenuation
coefficient quantifies the tendency of an object to absorb or scatter x-rays of a given en-
ergy. This function varies from point-to-point within the object and is usually taken to
vanishes outside Ω . The attenuation coefficient, like density, is nonnegative. It is useful
for medical imaging because different anatomical structures have different attenuation

142
coefficients. Bone has a much higher attenuation coefficient than soft tissue and different
soft tissues have slightly different coefficients. For medical applications it is crucial that
normal and cancerous tissues also have slightly different attenuation coefficients.
Using the assumption that x-rays travel along straight lines Iθ,s the x-ray flux is atten-
uated along this line. Here θ (angle) and s (distance) are parameters identifying the line
in a coordinate system. Then a a radiograph is a line integral like this:

f(l)dl
Iθ,s

where f is the density of the object in the slice. The collection of all integrals of f along
the lines in the plane defines a function Rf on the unitsphere in R2 times R . It is called the
Radon transform. To solve the (inverse) problem of tomography is the task to invert
the Radon transform R . J. Radon gave 1917 an inversion formula for R .
To invert the Radon transform in practice there are two main groups of methods in the
focus: analytic methods and iterative methods. Analytic methods use mainly properties
of the transform R to obtain the density f back from Rf by theoretical considerations.
A famous method is the so called filtered back projection. Iterative methods solve
the linear system of equations which one get by discretization of a finite number of line
integrals. As we know from the consideration concerning feasibility such a system may
be solved by successive projection on hyperplanes. A famous method is the so called
algebraic reconstruction technique (ART). We shall consider this method in the next
chapter. As rule, the numerical realization of analytic methods is faster than iterative
methods but it needs a lot of physical knowledge. The iterative methods allow more
physical and geometrical variations.

A Cauchy problem for an elliptic equation


Let Ω ⊂ R2 be an open bounded and simply connected domain. The problem we consider
is the evaluation of the trace of the solution of an elliptic differential equation at that part
of the boundary where no data was described, actually at ∂Ω\Γ . As a compensation, we
observe the Neumann data of the solution on Γ . Such a problem is called a Cauchy
problem. It is well known that an elliptic Cauchy problem is ill posed mainly due to
the lack of continuous dependence of the solution. This can be shown by considering the
following Cauchy problem:

Δu(x1 , x2 ) = 0 , (x1 , x2 ) ∈ Ω := (0, 1) × (0, 1)


u(x1 , 0) = 0 , x1 ∈ (0, 1) (5.47)

∂x2
u(x1 , 0) = g (x
k 1 ) , x1 ∈ (0, 1)

where gk (x1 ) = (πk)−1 sin(πkx1 ), x1 ∈ (0, 1) . It is easily verified that the uniquely deter-
mined solution u = uk is given by

u(x1 , x2 ) = (πk)−2 sin(πkx1 ) sinh(πkx2 ) , (x1 , x2 ) ∈ (0, 1) × (0, 1) . (5.48)

As we see, the sequence (gk )k∈N converges uniformly to zero. Taking the limit k → ∞
we obtain a Cauchy problem with homogeneous data which admits the trivial solution
only. But for every x2 > 0 the solutions uk oscillate stronger and stronger and became

143
unbounded as k → ∞ . Consequentely, the sequence (uk )k∈N does not converge to zero
(in any reasonable topology).
Let us consider the problem (5.48) in a more general setting:

Du = θ , in Ω
u = f , in Γ (5.49)
uν = g , in Γ

where D is a second order elliptic differential operator and f, g are given functions on Γ .
Let Γ1 be such that ∂Ω = Γ ∪ Γ1 and Γ ∩ Γ1 = ∅ . Then we consider the following iteration:
Given φ0 on Γ
Du = θ in Ω Dw = θ in Ω
u = f in Γ w = ψk in Γ1
uν = φk in Γ1 wν = g in Γ

φk+1 := wν |Γ1 .

In the iteration step above, two differential equations are to solved and two trace operators
are applied. Actually, we generate two sequences: the first one of Dirichlet traces and
the second one of Neumann traces, both defined on Γ1 . Using trace operators γd , γn
and solution operators for the Dirichlet- and Neumann-problems we conclude that the
iteration can be reformulated as

φk+1 = T (φk ) = Tl (φk ) + z , k ∈ N0 . (5.50)

with a linear mapping Tl and a function z which depends on the data f, g only. From
the uniqueness of a Cauchy problem we obtain that a fixed point of T is a solution of
the Cauchy problem. The main difficulty is to find the appropriate spaces of functions
defined on the part Γ1 of the boundary. This can be done (see [40], [8] and [36]) and one
obtains that the Tl is a linear bounded selfadjoint nonexpansive mapping. An application
of the results which we present in Chapter 7 leads to the result that the iteration (5.50)
converges to a fixed point of T .

Online parameter identification


The identification of parameters in partial differential equations arises in a variety of ap-
plications. Classical applications are heat conduction problems, population models, seis-
mic explorations and reservoir simulation, computer– and impedance–tomography. The
mathematical models which describe these applications contain (function-) parameters
which cannot, in general, be measured directly. They must be determined by measuring
consequences they produce. Such problems are called inverse problems where the main
difficulty to solve identification problems consists in the fact that there is a lack of stability
in such problems (ill-posedness).

Here we are concerned with adaptive techniques: determination of the parameter par-
allel to the process of gathering the data. They have been developed by engineers for
control problems governed by ordinary differential equations; see [37], [41], [44]. These
techniques can also be used in the case of partial differential equations as it was shown in
[3], [11] and [46].

144
The system we want to consider has the following formulation (in an evolutionary
style):
Dt z = A(p)z + A0 z + f(t), t > 0 , z(0) = z0 (5.51)
p is the parameter we want to determine from the observation of the state t �−→ z(t) .
Lateron we weaken this assumption. The cauchy problem (5.51) is an evolutionary linear
system since A(p), A0 are supposed linear operators. For the moment, we assume that
the state is defined for all t > 0, later on we will consider the case that the process is
known on a finite interval (0, T ) only. We call the system a bilinear system due to the
fact that we assume that the mappimg (p, z) �−→ A(p)z is bilinear.

Now, we formulate the equation for the model reference state x as follows:

Dt x = Cx + A(q)z(t) + A0 z(t) − Cz(t) + f(t), t > 0, x(0) = x0 . (5.52)

Here C is a mapping with strong ellipticity properties.

In order to find an adaptation rule we exploit the bilinearity of the mapping (q, v) ↦
⟨A(q)z(t), v⟩ :

⟨A(h)z(t), v⟩ = ⟨B(z(t), v)|h⟩H , h ∈ H, v ∈ V, t > 0 ,              (5.53)

and we assume the uniform bound

‖B(z(t), ·)‖ ≤ α , t > 0 .                                           (5.54)

Using this "adjoint" B we are able to formulate the following adaptation rule:

Dt q = −B(z(t), x − z(t)), t > 0, q(0) = q0 . (5.55)

Notice that the system (5.52), (5.55) is a nonautonomous system for the evolution of x, q .
We set
e := x − z ,  r := q − p .
Then we obtain from (5.52), (5.55) the following error system:

Dt e = Ce + A(r)z(t) , t > 0 , e(0) = e0 , (5.56)


Dt r = −B(z(t), e) , t > 0 , r(0) = r0 . (5.57)

In a system notation this reads as follows:


Dt (e, r)t = M(t) (e, r)t ,   where   M(t) = [ C , A(·)z(t) ; −B(z(t), ·) , Θ ] .

We define
L(e, r) := ½ (‖e‖H² + ‖r‖H²) ,  e ∈ H , r ∈ H .
The main idea of the analysis is to show that L may play the role of a Lyapunov
function. A first step is to prove output identifiability:

limt→∞ ‖e(t)‖H = 0 .                                                 (5.58)
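The reason why L can serve this purpose is a cancellation built into (5.53) and (5.55). Formally, and adopting here the sign convention ⟨Cv, v⟩ ≤ −c‖v‖V² (c > 0) for the strongly elliptic operator C, one computes along trajectories of (5.56), (5.57)

Dt L(e, r) = ⟨e, Dt e⟩H + ⟨r, Dt r⟩H = ⟨Ce, e⟩ + ⟨A(r)z(t), e⟩ − ⟨B(z(t), e)|r⟩H = ⟨Ce, e⟩ ≤ −c ‖e‖V² ,

since ⟨A(r)z(t), e⟩ = ⟨B(z(t), e)|r⟩H by (5.53). Integrating this differential inequality is the formal basis for (5.58).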

To establish parameter convergence, an additional hypothesis is required. Such a
hypothesis is an extension of the finite-dimensional notion of persistence of excitation; see
for instance [43, 47]. Such a hypothesis may also be considered as a richness condition.
Under such assumptions parameter identifiability may be proved:

limt→∞ ‖r(t)‖H = 0 .                                                 (5.59)
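A scalar toy computation illustrates (5.58) and (5.59). The concrete choices below (C = −c with c > 0, A(r)z = r·z, B(z, e) = z·e, and the persistently exciting signal z(t) = sin t) are ours, made only so that (5.56), (5.57) reduce to a two-dimensional ODE system.

```python
# Scalar sketch of the error system (5.56)-(5.57); all concrete choices
# (c, the bilinear forms, z(t) = sin t) are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

c = 1.0
z = lambda t: np.sin(t)          # observed state, persistently exciting

def rhs(t, y):
    e, r = y
    return [-c * e + r * z(t),   # D_t e = C e + A(r) z(t)
            -z(t) * e]           # D_t r = -B(z(t), e)

sol = solve_ivp(rhs, (0.0, 60.0), [1.0, 1.0], max_step=0.01)
e, r = sol.y
L = 0.5 * (e ** 2 + r ** 2)      # the Lyapunov function from above
print(f"|e(T)| = {abs(e[-1]):.1e}, |r(T)| = {abs(r[-1]):.1e}, L(T)/L(0) = {L[-1]/L[0]:.1e}")
```

If the excitation is switched off (say z ≡ 0 from some time on), one still observes e(t) → 0, but r simply stays constant from that time on; this is exactly the role of the richness condition mentioned below.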

The model reference adaptive method can also be applied in the case when the state
z can be observed in a time interval [0, T ] only. This case is similar to the solution of
linear systems by the idea of Kaczmarz, which is also known as the ART-procedure
in tomography: one uses the state z(t), t ∈ [0, T ], in a cyclic way; see [10, 27, 26, 32]. In
this manner we obtain a sequence (en , rn )n∈N with

en , rn : [0, T ] ∋ t ↦ (en (t), rn (t)) ∈ V × H .

By analogous reasoning one obtains the results

limn ‖en‖H = 0  and  limn ‖rn‖H = 0 ;

see [46].

The problem statement above has an important drawback: the state has to be observed
completely. In the engineering literature this problem is tackled by introducing
a dynamical observer. But this works well only in the "finite-dimensional" case: the
state equation has to be a system of ordinary differential equations. An observer theory
for partial differential equations is difficult to realize. A partial observation suggests
using the projection onto the observable part of the state in place of the state z in the
model reference system. A promising approach for online identification in the case when
only a partial observation of the state is available is presented by Boiger and
Kaltenbacher [13]; there noisy data are also considered.

5.10 Matrix completion – a non-convex feasibility problem
A partial real matrix is a matrix A ∈ Rn,n for which entries in certain locations are not
known. The problem of matrix completion is the following:
Given a partial matrix find a completion belonging to a specified family of
matrices.
This problem can be formulated as a feasibility problem:
Given a partial matrix A find a completion B in a set C := ∩i=1,...,m Ci , where
C1 , . . . , Cm are chosen such that their intersection C is equal to the set of
completions of A within the specified matrix family.
The simplest such case is when C1 is the set of all completions of A and the intersection
of C2 , . . . , Cm equals the desired matrix family. Here we consider the completion problem
for the family of Euclidean distance matrices. This is a subclass of the set S n of symmetric
matrices in Rn,n . Euclidean distance matrices are a useful description of point sets. A
typical task is to retrieve the original point configuration. But Euclidean distance matrices
may be noisy and/or incomplete.
In this section ‖ · ‖ denotes the Euclidean norm (on Rl or Rl,l , as appropriate).
Definition 5.17. Let D ∈ S n , D = (dij )i,j=1,...,n .
(a) D is called a predistance matrix if the following hold:
dii = 0, i = 1, . . . , n ,   dij ≥ 0, i, j = 1, . . . , n, i ≠ j .
(b) A predistance matrix D is called a Euclidean distance matrix if there exist points
x1 , . . . , xn ∈ Rq such that
‖xi − xj‖² = dij , i, j = 1, . . . , n ,
where here (and in the following) ‖ · ‖ is the Euclidean norm in Rq . The points
x1 , . . . , xn are called generating points and we say that D is embeddable in Rq .
The lowest number q such that generating points exist in Rq is called the embedding
dimension emb(D) .

Notice that generating points of a Euclidean distance matrix are not uniquely determined.
This follows from the fact that Euclidean distances are invariant under translations
and rotations. Moreover, one can easily show that the embedding dimension of a Euclidean
distance matrix D satisfies emb(D) ≤ n − 1 .
Suppose we have a Euclidean distance matrix D generated by points x1 , . . . , xn ∈ Rq ;
without loss of generality we may translate the points so that x1 = θ . Then the Gram
matrix G := (⟨xi |xj⟩)i,j=1,...,n has the representation

G = ½ (1(d1 )t − D + d1 1t )  where d1 is the first column of D      (5.60)

and 1 is the vector with all entries 1. This leads to the fact that the point set {x1 , . . . , xn }
can be regained from G by an eigenvalue decomposition of G .
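A minimal sketch of this reconstruction, with helper names of our own choosing: form G via (5.60) (placing x1 at the origin) and read the points off an eigenvalue decomposition of G.

```python
# Sketch: recover generating points from a Euclidean distance matrix D via
# the Gram matrix (5.60); the function name and example are ours.
import numpy as np

def points_from_edm(D, q):
    n = D.shape[0]
    d1, one = D[:, [0]], np.ones((n, 1))
    G = 0.5 * (one @ d1.T - D + d1 @ one.T)    # Gram matrix w.r.t. x1 = origin
    w, V = np.linalg.eigh(G)                   # eigenvalues in ascending order
    w = np.clip(w[-q:], 0.0, None)             # q largest, rounding noise clipped
    return V[:, -q:] * np.sqrt(w)              # rows = recovered points in R^q

X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])       # a square in R^2
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)    # squared distances
Y = points_from_edm(D, 2)
D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
print(np.allclose(D, D2))   # True: same distances, i.e. the same set up to a rigid motion
```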
Euclidean distance matrices have found applications in psychometrics, crystallography,
machine learning, wireless sensor networks, acoustics, and more. For a tutorial see the
article [20] and also [38, 39, 50]. In [39] Euclidean distance matrices are used to study the
fractal dimension of attractors of dynamical systems by determining the embedding dimension
of associated Euclidean distance matrices. One of the most important applications is the
determination of protein conformations. Here methods using the metric projection
are successfully applied; see [14, 21]. Now, we follow [14].
To reformulate the completion problem as a feasibility problem we look at the following
characterization of a Euclidean distance matrix; see [28].
Theorem 5.18. Let D ∈ S n be a predistance matrix. Then D is a Euclidean distance
matrix if and only if the matrix D̂ ∈ Rn−1,n−1 in the representation

Q(−D)Q = [ D̂ , u ; ut , α ]                                          (5.61)

is positive semi-definite. In this case, the embedding dimension of D is emb(D) = rank(D̂) ≤
n − 1 .

The matrix Q in (5.61) is the Householder matrix given by

Q = I − 2vvt ‖v‖−2  where v = (1, 1, . . . , 1, 1 + √n)t ∈ Rn .
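Theorem 5.18 translates directly into a numerical test; the sketch below (with our own function name and tolerance) checks positive semi-definiteness of the block D̂ and reads off emb(D) as its rank.

```python
# Sketch of the test in Theorem 5.18: build Q, form Q(-D)Q and inspect the
# leading (n-1)x(n-1) block; names and the tolerance are our choices.
import numpy as np

def is_edm(D, tol=1e-10):
    n = D.shape[0]
    v = np.ones(n)
    v[-1] += np.sqrt(n)                             # v = (1, ..., 1, 1 + sqrt(n))
    Q = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)
    Dhat = (Q @ (-D) @ Q)[:-1, :-1]                 # the block D^ in (5.61)
    eig = np.linalg.eigvalsh(Dhat)
    return bool(np.all(eig >= -tol)), int(np.sum(eig > tol))   # (EDM?, emb(D))

X = np.random.rand(6, 3)                            # six random points in R^3
D = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
print(is_edm(D))                                    # expected: (True, 3)
```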

For a partial predistance matrix D = (dij ) the associated adjacency matrix
W = (wij ) ∈ Rn,n is defined as follows:

wij = 1 if dij is known ,   wij = 0 if dij is unknown .

We assume that D is embeddable in Rq . Then we define

C1 := {A = (aij ) ∈ S n : aij ≥ 0, aij = dij if wij = 1} ,

C2 := {A ∈ S n : Q(−A)Q = [ Â , u ; ut , α ] , Â ∈ S+n−1 , u ∈ Rn−1 , α ∈ R, rank(Â) ≤ q} .

Then the completion problem can be reformulated as a feasibility problem:

Find A ∈ C1 ∩ C2 (5.62)

Suppose that Rn,n is endowed with the Frobenius norm. Clearly C1 is closed and convex
and the projection PC1 (A) of A ∈ Rn,n can easily be computed. But C2 is non-convex and
does not have the Chebyshev property. Nevertheless, we may consider the projection
PC2 (A) of A ∈ Rn,n ; we refer to [14] for this computation. Several numerical simulations
show that completion of a Euclidean distance matrix is possible by the alternating
projection method.
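A rough sketch of such an alternating projection scheme is given below. The map P_C1 is the exact nearest-point map onto C1 ; for C2 we only mimic the eigenvalue-truncation idea behind Theorem 5.18 (project the block D̂ onto the positive semi-definite matrices of rank at most q) and refer to [14] for the precise projection formula. All function names are ours.

```python
# Sketch of alternating projections for (5.62); P_C2 below is a heuristic
# stand-in based on Theorem 5.18, see [14] for the exact projection.
import numpy as np

def householder(n):
    v = np.ones(n)
    v[-1] += np.sqrt(n)
    return np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)

def P_C1(A, D, W):
    B = 0.5 * (A + A.T)                              # stay symmetric
    B = np.where(W == 1, D, np.maximum(B, 0.0))      # keep known entries, clamp rest
    np.fill_diagonal(B, 0.0)                         # predistance: zero diagonal
    return B

def P_C2(A, Q, q):
    M = Q @ (-A) @ Q
    Ahat, u, alpha = M[:-1, :-1], M[:-1, -1], M[-1, -1]
    w, V = np.linalg.eigh(0.5 * (Ahat + Ahat.T))
    w[:-q] = 0.0                                     # keep the q largest eigenvalues
    w = np.maximum(w, 0.0)                           # and force them nonnegative
    M = np.block([[(V * w) @ V.T, u[:, None]],
                  [u[None, :], np.array([[alpha]])]])
    return -Q @ M @ Q                                # Q is orthogonal and involutory

def complete_edm(D, W, q, iters=500):
    Q = householder(D.shape[0])
    A = np.where(W == 1, D, 0.0)
    for _ in range(iters):
        A = P_C1(P_C2(A, Q, q), D, W)
    return A
```

In practice one monitors, e.g., the Frobenius distance between P_C1(A) and P_C2(A) as a stopping criterion; since C2 is non-convex, convergence is not guaranteed from every starting point, which matches the fact that only numerical evidence is reported above.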

Remark 5.19. The completion of a Euclidean distance matrix may be done by other
approaches: exact completion by graph methods, methods of least squares, approximate
completion by semidefinite optimization using interior point methods; see [38] for a short
survey. �

5.11 Appendix: Variational inequalities


5.12 Conclusions and comments
5.13 Exercises
1.) Let f ∈ C[a, b] . What is the best approximation in P0 ?
2.) Let f ∈ C[−1, 1], f(t) := t3 . Show that p0 (t) := (1 − t2 ) is the best approximation
of f in the subspace span({1, t2 }) of C[−1, 1] .
3.) Consider the subspace span({1, t2 }) of C[−1, 1] . Show that there exists f ∈ C[−1, 1]
such that the best approximation is not uniquely determined.
4.) Let f ∈ C[a, b] and let p ∈ Pn with f(ti ) − p(ti ) = (−1)i εi , sign(εi ) = const, at
n + 2 consecutive points t0 , . . . , tn+1 . Show that dist(f, Pn ) ≥ mini=0,...,n+1 |εi | .

5.) Show that the subspace span({1, t cos(t), t sin(t)}) of C[0, π] satisfies the Haar con-
dition.
6.) Let Rn be endowed with the l2 -norm and let

U := {x = (x1 , . . . , xn ) : x1 + · · · + xn = 0} .

(a) Show U is a linear subspace with dim U = n − 1 .


(b) Compute PU (ei ), i = 1, 2, . . . , n . (Here: ei = (δij ) .)
(c) Compute PU (x) for all x ∈ Rn .
7.) Let Rn be endowed with the l2 -norm and let

C := {x = (x1 , . . . , xn ) : x1 ≤ x2 ≤ · · · ≤ xn } .

(a) Show that C is a closed convex cone.


(b) Show that C is a Chebyshev set.
(c) Compute PC (ei ) for i = 1, . . . , n (Here: ei = (δij ) .).
8.) Show that if a Banach space has a Schauder basis then it is separable.
9.) Let H = R3 with the Euclidean norm and let

U := {(x, y, z) ∈ H : x + 2y = 0} .

Define on U the linear functional λ by ⟨λ, (x, y, z)⟩ := x − y . Find ‖λ‖ and extend
λ to H without changing the norm.
10.) Let X = R3 with the l1 -norm and let

U := {(x, y, 0) ∈ X : x + 2y = 0} .

Define on U the linear functional λ by ⟨λ, (x, y, 0)⟩ := x . Find ‖λ‖ and find two
norm-preserving extensions of λ to X .
11.) Let H be a Hilbert space and let B : H × H −→ R be bilinear. Suppose that
B(·, y) is continuous for all y ∈ H and B(x, ·) is continuous for all x ∈ H . Show that
B is continuous, i.e.

|B(x, y)| ≤ c‖x‖‖y‖ for all x, y ∈ H for some c ≥ 0 .

Hint: Uniform boundedness principle.


12.) Let X be a Banach space and let U, V be linear subspaces with X = U ⊕ V .
For x ∈ X write x = u + v, u ∈ U, v ∈ V, and define Px := v . Then we have a mapping
P : X −→ V . Show that:
(a) P is linear and P ◦ P = P .
(b) U, V are closed if and only if P is continuous.

Bibliography

[1] S. Al-Homidan. Hybrid methods for solving the educational testing problem. J. of
Computational and Applied Mathematics, 91:31–45, 1998.

[2] F. Albiac, L.L. Ansorena, S.J. Dillworth, and D. Kutzarova. Existence and uniqueness
of greedy bases in Banach spaces. https://arxiv.org/abs/1505.08119, x:1–28, 2015.

[3] H.W. Alt, K.-H. Hoffmann, and J. Sprekels. A numerical procedure to solve certain
identification problems. Intern. Ser. Numer. Math., 68:11–43, 1984.

[4] J. Baumeister. Über die Extremaleigenschaft nichtlinearer Splines. Numer. Math.,


25:433–455, 1976.

[5] J. Baumeister. Variationsprobleme in Orliczräumen und Splines. Manuscripta Math.,


20:29–49, 1977.

[6] J. Baumeister. Stable Solution of Inverse Problems. Vieweg, Braunschweig, 1987.


[7] J. Baumeister. Konvexe Analysis, 2014. Skriptum WiSe 2014/15, Goethe–Universität
Frankfurt/Main.

[8] J. Baumeister and A. Leitão. On iterative methods for solving ill-posed problems modeled
by PDE’s. Journal of Inverse and Ill-Posed Problems, 9:13–29, 2001.

[9] J. Baumeister and L. Schumaker. Nonlinear classes of splines and variational problems. J.
of Approximation Theory, 18:63–73, 1976.

[10] J. Baumeister, B. Kaltenbacher and A. Leitão. On Levenberg-Marquardt-Kaczmarz itera-


tive methods for solving systems of nonlinear equations. Inverse Problems and Imaging,
4:335–350, 2010.

[11] J. Baumeister, W. Scondo, M.A. Demetriou and I.G. Rosen. On–line parameter estima-
tion for infinite dimensional dynamical systems. SIAM Journal on Control and Optimiza-
tion, 35:678 – 713, 1997.

[12] D. Bengs. Greedy Approximation im Raum der Funktionen von beschränkter Variation
und Anwendungen auf die Bildverarbeitung. Master’s thesis, Goethe-Universität, 2012.
Fachbereich Informatik und Mathematik.

[13] R. Boiger and B. Kaltenbacher. An online parameter identification method for time
dependent partial differential equations. Inverse Problems, 28 pages, 2015.

[14] J.M. Borwein and M.K. Tam. Reflection methods for inverse problems with applications
to protein conformation determination. pages 1–18.

[15] E.W. Cheney and A. Goldstein. Proximity maps for convex sets. Proc. Amer. Math. Soc.,
10:448–450, 1959.

[16] M.T. Chu and J.W. Wright. The educational testing problem revisited. IMA J. Numer.
Anal., 15:141–160, 1995.

[17] P.L. Combettes. The convex feasibility problem in image recovery. Advances in Imaging
and Electron Physics, 95:155–270, 1996.

[18] S.J. Dillworth, N.J. Kalton, and D. Kutzarova. On the existence of almost greedy bases in
Banach spaces. XXX, pages xx–xx, 2012.

[19] S.J. Dillworth, D. Freeman, E. Odell and T. Schlumprecht. Greedy bases for Besov spaces.
Constr. Approximation, pages xx–xx, 2010.

[20] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli. Euclidean distance matrices.


arXiv:1502.07541v2, 2015.

[21] Q. Dong and Z. Wu. A geometric build-up algorithm for solving the molecular distance
geometric problem with sparse distance data. J. Global Optim., 26:321–333, 2003.

[22] P. Enflo. A counterexample to the approximation problem in Banach spaces. Acta Mathe-
matica, 130:309–317, 1973.

[23] R. Fletcher. A nonlinear programming problem in statistics (educational testing). SIAM


J. Sci. Stat. Comput., 2:257–267, 1981.

[24] R. Fletcher. Semi-definite matrix constraints. SIAM J. Control and Optimization, 23:493–
513, 1985.

[25] W. Glunt. An alternating projections method for certain linear problems in a Hilbert space.
IMA Journal of Numerical Analysis, 15:291–305, 1995.

[26] M. Haltmeier, R. Kowar, A. Leitao, and O. Scherzer. Kaczmarz methods for regularizing
nonlinear ill-posed equations. II. Applications. Inverse Probl. Imaging, 1(3):507–523, 2007.

[27] M. Haltmeier, A. Leitao, and O. Scherzer. Kaczmarz methods for regularizing nonlinear
ill-posed equations. I. convergence analysis. Inverse Probl. Imaging, 1(2):289–298, 2007.

[28] T.L. Hayden and J. Wells. Approximation by matrices positive semidefinite on a subspace.
Linear Algebra and its Applications, 109:115–130, 1988.

[29] N. Higham. Computing a nearest symmetric positive semi-definite matrix. Linear Algebra
and Appl., 103:103–118, 1988.

[30] N. Higham. Computing the nearest correlation matrix – a problem from finance, 2002.

[31] R. Holmes. A Course on Optimization and Best Approximation. Springer, 1971.

[32] S. Kaczmarz. Approximate solution of systems of linear equations. Internat. J. Control,


57:1269–1271, 1993.

[33] A. Kirsch. An Introduction to the Mathematical Theory of Inverse Problems. Springer,


New York, 1996.

[34] G. Koethe. Topological linear spaces. Springer, Berlin, 1960.

[35] S.V. Konyagin and V.N. Temlyakov. A remark on greedy approximation in Banach spaces.
East J. Approx., 5:365–379, 1999.

[36] V.A. Kozlov, V.G. Maz'ya, and A.V. Fomin. An iterative method for solving the Cauchy
problem for elliptic equations. Comput. Maths. Phys., 31:45–52, 1991.

[37] G. Kreisselmeier. Adaptive observers with exponential rate of convergence. IEEE Trans.
Autom. Contr., 22:509 – 535, 1977.

[38] Miriam Kreth. Matrixergänzungsprobleme. Master’s thesis, Goethe-Universität, 2004.


Fachbereich Informatik und Mathematik.

[39] Miriam Kreth. Distanzmatrizen, erzeugende Punkte und Einbettungsdimension. Disserta-


tion, Goethe Universität, 2008.

[40] A. Leitao. An iterative method for solving elliptic Cauchy problems. Numer. Func. Anal.
Optim., 21:715–742, 2000.

[41] P.M. Lion. Rapid identification of linear and nonlinear systems. AIAA J., 5:1835 – 1841,
1987.

[42] B.J. McCartin. Theory of exponential splines. J. of Approximation theory, 66:1–23, 1991.

[43] K.S. Narendra and A.M. Annaswammy. Persistent excitation in adaptive systems. Int. J.
Control, 45:127 – 160, 1987.

[44] K.S. Narendra and P. Kudva. Stable adaptive schemes for system identification and control
I, II. IEEE Trans. Syst., Man, Cybern., SMC-4:542–560, 1974.

[45] F. Natterer. The Mathematics of Computerized Tomography. John Wiley, New York, 1986.

[46] W. Scondo. Ein Modellabgleichungsverfahren zur adaptiven Parameteridentifikation in


Evolutionsgleichungen. Master’s thesis, Fachbereich Mathematik, Goethe–Universität,
Frankfurt/Main, 1987.

[47] N. Shimkin and A. Feuer. Persistency of excitation in continuous-time systems. Systems


& Control Letters, 9:225–233, 1987.

[48] V.N. Temlyakov. The best m-term approximation and greedy algorithms. Advances in
Computational Mathematics, 8:249–265, 1998.

[49] V.N. Temlyakov. Greedy approximation. Acta Mathematika, 17:1069–1083, 2008.

[50] Annika Wacker. Das Matrix-Ergänzungsproblem und die Berechnung einer besten
Approximation einer Korrelationsmatrix. Master’s thesis, Goethe Universität, 2016. Fach-
bereich Informatik und Mathematik.

[51] D. Werner. Funktionalanalysis. Springer, 2002.

