Chapter 3: Performance Surface and Search Methods


CHAPTER THREE

Ch. 3: Eigenanalysis and Performance Surface


Ch. 4: Search Methods
Adaptive Filters: Theory and Applications
Behrouz Farhang-Boroujeny
EIGENVALUES AND
EIGENVECTORS
 Let R be an N-by-N correlation matrix.
 A nonzero N-by-1 vector q is said to be an eigenvector of
R if it satisfies the equation

    R q = λ q                              (4.2)

for some scalar constant λ. The scalar λ is called the
eigenvalue of R associated with the eigenvector q.
 We note that if q is an eigenvector of R, then for any
nonzero scalar a, aq is also an eigenvector of R,
corresponding to the same eigenvalue, λ.
 To find the eigenvalues and eigenvectors of R, we note
that Eq. (4.2) may be rearranged as

    (R − λI) q = 0,

which has a nonzero solution q only if det(R − λI) = 0
(the characteristic equation of R).
Example: Find the eigenvalues and eigenvectors of a given matrix.
Solution:
Solve the characteristic equation det(R − λI) = 0 for the eigenvalues λ.
Then, for each eigenvalue, solve (R − λI)q = 0 to obtain the corresponding
eigenvector q. Proceed in the same manner for each of the remaining
eigenvalues.
Example 1: Find the eigenvalues and the
eigenvectors of the matrix

    A = [ 1  3 ]
        [ 4  2 ]

Solution:
In general, to form A − λI from A one simply
subtracts λ from the diagonal entries:

    A − λI = [ 1 − λ    3   ]
             [   4    2 − λ ]

The characteristic equation is

    det(A − λI) = (1 − λ)(2 − λ) − 12 = λ² − 3λ − 10 = (λ − 5)(λ + 2) = 0

So the eigenvalues are

    λ1 = 5 and λ2 = −2
For 1 = 5
Let is the eigenvector of the matrix A
corresponding to 1 = 5.

So
- 4x + 3y = 0

4x - 3y = 0
Since the second equation is the negative of
the first, any solution to the first equation is
also a solution to the second. So it suffices to
solve the first equation, which gives x = (3/4)y:

    y:  1    4   8  ...
    x:  3/4  3   6  ...

So any multiple of the vector [3  4]^T is an eigenvector for
λ1 = 5.
For 2 = - 2
Let is the eigenvector of the matrix A
corresponding to 2 = -2.

So
3x + 3y = 0

4x +4y = 0

 Since the second equation is a multiple of
the first, any solution to the first equation is
also a solution to the second. So it suffices
to solve the first equation, which gives x = −y:

    y: −1   1   2  ...
    x:  1  −1  −2  ...

So any multiple of the vector [1  −1]^T is an eigenvector for λ2 = −2.
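As a quick numerical check, the eigenvalues and eigenvectors found above can be verified with NumPy. This is only a verification sketch; A is the matrix from Example 1.

import numpy as np

# Matrix from Example 1
A = np.array([[1.0, 3.0],
              [4.0, 2.0]])

# numpy.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                               # expect 5 and -2 (in some order)

# Each eigenpair should satisfy A q = lambda q (up to scaling of q)
for lam, q in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ q, lam * q))       # expect True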
PROPERTIES OF
EIGENVALUES AND EIGENVECTORS
 Some of the properties derived here are directly related
to the fact that the correlation matrix R is Hermitian and
nonnegative definite.
 A matrix A, in general, is said to be Hermitian (or self-
adjoint) if A^H = A, where the superscript H denotes
Hermitian (conjugate) transposition.
 The N-by-N Hermitian matrix A is said to be nonnegative
definite or positive semidefinite if

    v^H A v ≥ 0   for every N-by-1 vector v.

 The fact that A is Hermitian implies that v^H A v is real-valued.
 In practice, the correlation matrix R is almost always
positive definite.
1. The eigenvalues of the correlation matrix R
are all real and nonnegative.
2. If qi and qj are two eigenvectors of the
correlation matrix R that correspond
to two of its distinct eigenvalues, then

    qi^H qj = 0.

In other words, eigenvectors associated with
the distinct eigenvalues of the correlation
matrix R are mutually orthogonal.
3. Assume the eigenvectors q0, q1, . . . , qN-1
are all normalized to have a length of unity,
and define the N-by-N matrix

    Q = [q0  q1  · · ·  qN-1].

Q is then a unitary matrix, i.e., Q^H Q = I.
This implies that the matrices Q and Q^H are the
inverse of each other.


4. For any N-by-N correlation matrix R, one can
always find a set of mutually orthogonal
eigenvectors. Such a set may be used as a
basis to express any vector in the N-
dimensional space of complex vectors.
5. Unitary Similarity Transformation. The
correlation matrix R can always be
decomposed as

    R = Q Λ Q^H,

where Λ = diag(λ0, λ1, . . . , λN−1) is the diagonal
matrix of the eigenvalues of R.
6. Let λ0, λ1, . . . , λN−1 be the eigenvalues of
the correlation matrix R. Then

    λ0 + λ1 + · · · + λN−1 = tr[R],

where tr[R] denotes the trace of R and is defined
as the sum of the diagonal elements of R.
7. Minimax Theorem: each eigenvalue λi, for
i = 1, 2, . . . , N - 1, can be characterized through a
min-max (Courant-Fischer) optimization of the
Rayleigh quotient q^H R q / q^H q over subspaces
of appropriate dimension.
8. The eigenvalues of the correlation matrix R
of a discrete-time stationary stochastic
process {x(n)} are bounded by the
minimum and maximum values of the
power spectral density, Φxx(e^jω), of the process:

    min_ω Φxx(e^jω) ≤ λi ≤ max_ω Φxx(e^jω).
9. Karhunen-Loève expansion: the tap-input vector
x(n) can be expanded in terms of the eigenvectors
of its correlation matrix R as x(n) = Σ κi qi, where
the coefficients κi = qi^H x(n) are uncorrelated.
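Several of these properties are easy to verify numerically. The sketch below uses an assumed 3-by-3 correlation matrix (a placeholder, not taken from the text) to check properties 1, 3, 5, and 6.

import numpy as np

# An assumed correlation matrix (symmetric and positive definite)
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])

# eigh is the appropriate routine for Hermitian matrices;
# it returns orthonormal eigenvectors as the columns of Q
lam, Q = np.linalg.eigh(R)

print(np.all(lam >= 0))                                  # Property 1: eigenvalues are nonnegative
print(np.allclose(Q.conj().T @ Q, np.eye(3)))            # Property 3: Q is unitary
print(np.allclose(Q @ np.diag(lam) @ Q.conj().T, R))     # Property 5: R = Q Lambda Q^H
print(np.isclose(lam.sum(), np.trace(R)))                # Property 6: sum of eigenvalues = tr[R]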
THE CANONICAL FORM OF
THE ERROR-PERFORMANCE SURFACE
 We recall from the last lecture that the
performance function (mean-squared error,
MSE) of a transversal Wiener filter with a
real-valued input sequence x(n) and a
desired output sequence d(n) is

    ξ = E[e²(n)] = E[d²(n)] − 2 w^T p + w^T R w,

where w is the tap-weight vector, R is the correlation
matrix of the tap-input vector, and p is the cross-correlation
vector between the tap inputs and d(n).
 Also, we recall that the optimum value of the
Wiener filter tap-weight vector is obtained
from the Wiener-Hopf equation

    R wo = p.

 The performance function ξ may be
rearranged as follows:

    ξ = ξmin + (w − wo)^T R (w − wo),

where ξmin = E[d²(n)] − wo^T p is the minimum MSE.
 We use the eigen-decomposition to express the
correlation matrix R of the tap-input vector in
terms of its eigenvalues and associated
eigenvectors (see Appendix E in Haykin's textbook):

    R = Q Λ Q^T.

Defining the transformed coordinates v = Q^T (w − wo),
the performance function becomes ξ = ξmin + v^T Λ v.
 This new formulation of the mean-square
error contains no cross-product terms, as
shown by

    ξ = ξmin + Σ_{k=0}^{N−1} λk vk²,

where vk is the kth component of the vector v.
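A minimal numerical sketch of this change of coordinates is given below. The values of R, p, and E[d²(n)] are assumed placeholders (not taken from the text); the point is only that the quadratic form and its canonical form give the same ξ for any w.

import numpy as np

# Assumed example statistics (placeholders)
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
Ed2 = 1.0                          # E[d^2(n)]

wo = np.linalg.solve(R, p)         # Wiener-Hopf solution: R wo = p
xi_min = Ed2 - wo @ p              # minimum MSE
lam, Q = np.linalg.eigh(R)         # R = Q diag(lam) Q^T

def mse(w):
    """xi = E[d^2(n)] - 2 w^T p + w^T R w."""
    return Ed2 - 2 * w @ p + w @ R @ w

def mse_canonical(w):
    """Same surface in the v-coordinates: xi_min + sum_k lam_k v_k^2."""
    v = Q.T @ (w - wo)
    return xi_min + np.sum(lam * v**2)

w = np.array([0.2, -0.4])          # an arbitrary tap-weight vector
print(np.isclose(mse(w), mse_canonical(w)))   # expect True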
 Example 4.3: Consider the case where a two-
tap transversal Wiener filter is characterized
by the following parameters:

We want to explore the performance surface of
this filter for values of α ranging from 0 to 1.
The performance function of the filter is
obtained by substituting the above parameters
in Eq. (4.81). This gives

 Solving the Wiener–Hopf equation to obtain
the optimum tap weights of the filter, we
obtain:

 To convert this to its canonical form, we
should first find the eigenvalues and
eigenvectors of R. To find the eigenvalues of
R, we should solve the characteristic
equation

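Although the example's numerical details are not reproduced above, the eigen-analysis step can be summarized for the typical two-tap case. If the tap-input correlation matrix has the symmetric Toeplitz form

    R = [ r(0)  r(1) ]
        [ r(1)  r(0) ],

then the characteristic equation is

    det(R − λI) = (r(0) − λ)² − r²(1) = 0,

so the two eigenvalues are λ0 = r(0) − r(1) and λ1 = r(0) + r(1). The eigenvalue spread of such a matrix therefore grows directly with the correlation r(1) between adjacent tap inputs, which is presumably what drives the change in the shape of the performance surface as α varies in this example.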
SEARCH METHODS
1. METHOD OF STEEPEST DESCENT

 We recall from Chapter 2 that the optimum tap-weight
vector wo is the one that minimizes the performance
function

    ξ = E[e²(n)],

where e(n) = d(n) - y(n) is the estimation error of the
Wiener filter. Also, we recall that the performance
function ξ can be expanded as

    ξ = E[d²(n)] − 2 w^T p + w^T R w,

whose minimum is found by solving the Wiener-Hopf
equation, i.e., wo = R^{-1} p.
 Here, we assume that R and p are available.
 However, this direct approach needs difficult arithmetic
circuits (it requires the computationally challenging
inversion of the matrix R), which makes it unsuitable for
many applications. Therefore, in the next section we
introduce another approach to find the tap-weight
vector w.
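A minimal sketch of this direct (non-iterative) solution, assuming R and p have already been estimated (the numbers below are placeholders):

import numpy as np

# Assumed example statistics (placeholders)
R = np.array([[1.0, 0.4],
              [0.4, 1.0]])        # tap-input correlation matrix
p = np.array([0.5, 0.2])          # cross-correlation vector

# Direct Wiener solution: solve R wo = p
# (equivalent to wo = R^{-1} p, but without forming the inverse explicitly)
wo = np.linalg.solve(R, p)
print(wo)

The iterative algorithms introduced next reach the same wo without this explicit solve or matrix inversion.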
ITERATIVE SEARCH METHOD
 In this chapter we present a set of algorithms that
iteratively search for the minimum of the cost function.
 They do this using (at least) the gradient of the cost
function, so they are often called deterministic gradient
algorithms.
 In order for the cost function to depend only on the filter
w, the statistics R and p must be given.
 In this way, these algorithms solve the Wiener-Hopf
equation iteratively, most of them without requiring the
(computationally challenging) inversion of the matrix R.
 However, all the information from the environment is
captured in the second-order statistics, and these
algorithms do not have a learning mechanism for
adapting to changes in the environment. In the next
chapter we will see how adaptive filters solve this issue.
STEEPEST DESCENT ALGORITHM

 The method of steepest descent is a general scheme
that uses the following steps to search for the
minimum point of any convex function of a set of
parameters:
1. Start with an initial guess of the parameters whose
optimum values are to be found for minimizing the
function.
2. Find the gradient of the function with respect to these
parameters at the present point.
3. Update the parameters by taking a step in the
opposite direction of the gradient vector obtained in
Step 2. This corresponds to a step in the direction of
steepest descent in the cost function at the present
point. Furthermore, the size of the step taken is
chosen proportional to the size of the gradient vector.
4. Repeat Steps 2 and 3 until no further significant
change is observed in the parameters.
 To implement this procedure in the case of
the transversal filter shown in Figure 5.1, we
recall from Chapter 2 that the gradient of ξ with
respect to the tap-weight vector is

    ∇ξ = 2(R w − p),

so that the steepest-descent update becomes

    w(k+1) = w(k) − μ ∇ξ(k) = w(k) + 2μ (p − R w(k)),

where μ is the step-size parameter and k is the iteration index.
 As we shall soon show, the convergence of
w(k) to the optimum solution wo and the
speed at which this convergence takes place
are dependent on the size of the step-size
parameter μ.
 A large step-size may result in divergence of
this recursive equation, which can also be written as

    w(k+1) = (I − 2μR) w(k) + 2μ p,

where I is the N-by-N identity matrix.
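A minimal sketch of this recursion, using assumed placeholder values for R, p, and μ (the stopping rule follows Step 4 above):

import numpy as np

def steepest_descent(R, p, mu, w0=None, tol=1e-10, max_iter=10000):
    """Iterate w(k+1) = w(k) + 2*mu*(p - R w(k)) until the update is negligible."""
    w = np.zeros_like(p) if w0 is None else w0.astype(float)
    for _ in range(max_iter):
        w_next = w + 2 * mu * (p - R @ w)
        if np.linalg.norm(w_next - w) < tol:     # Step 4: no significant change
            return w_next
        w = w_next
    return w

# Assumed example statistics (placeholders)
R = np.array([[1.0, 0.4],
              [0.4, 1.0]])
p = np.array([0.5, 0.2])

w_sd = steepest_descent(R, p, mu=0.2)              # mu must satisfy 0 < mu < 1/lambda_max
print(np.allclose(w_sd, np.linalg.solve(R, p)))    # converges to the Wiener solution wo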
THE V-AXES
 We substitute for p from Eq. (5.6) (the Wiener-Hopf
equation p = R wo). Also, we subtract wo from both sides
of Eq. (5.11) and rearrange the result to obtain

    v(k+1) = (I − 2μR) v(k),

where v(k) = w(k) − wo.
 This is the tap-weight update equation in
terms of the v-axes.
THE V’-AXES
 Recall that R has the following unitary
similarity decomposition:

    R = Q Λ Q^H.

Defining v'(k) = Q^H v(k), the update becomes
v'(k+1) = (I − 2μΛ) v'(k).
 The vector recursive Eq. (5.18) may be
separated into the scalar recursive equations

    v'_i(k+1) = (1 − 2μλi) v'_i(k),   i = 0, 1, . . . , N − 1,

so that v'_i(k) = (1 − 2μλi)^k v'_i(0).
 For convergence of every mode, the step-size
parameter μ is selected so that

    |1 − 2μλi| < 1 for all i,   i.e., 0 < μ < 1/λmax.
Starting with an initial value w(0) = [w0(0) w1(0)]^T and letting the
recursive equation (5.29) run, we get two sequences of the tap-weight
variables, w0(k) and w1(k).
LEARNING CURVE
 The curve obtained by plotting ξ(k) as a
function of the iteration index, k, is called the
learning curve. A learning curve of the
steepest-descent algorithm, as can be seen
from Eq. (5.31), consists of a sum of N
exponentially decaying terms, each of which
corresponds to one of the modes of
convergence of the algorithm:

    ξ(k) = ξmin + Σ_{i=0}^{N−1} λi (1 − 2μλi)^{2k} v'_i(0)².

 Each exponential term may be characterized
by a time constant, which is obtained as
follows. Writing (1 − 2μλi)^{2k} = e^{−k/τi}, the time
constant of the ith term is

    τi = −1 / (2 ln(1 − 2μλi)) ≈ 1/(4μλi)   for μλi << 1.

The existence of two distinct time constants on the learning curve
in Figure 5.5 is clearly observed.
EFFECT OF EIGENVALUE SPREAD
 Our study in the last two sections shows that the
performance of the steepest-descent algorithm is
highly dependent on the eigenvalues of the
correlation matrix R.
 In general, a wider spread of the eigenvalues results
in a poorer performance of the steepest-descent
algorithm.
 To gain further insight into this property of the
steepest-descent algorithm, we find the optimum
value of the step-size parameter μ, which results in
the fastest possible convergence of the steepest-
descent algorithm.

THE GEOMETRICAL RATIO FACTOR

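As a summary of the standard result, in the notation used above: each mode of the steepest-descent algorithm decays with the geometrical ratio factor (1 − 2μλi). The fastest overall convergence is obtained when the two slowest modes, associated with λmax and λmin, decay at the same rate, i.e.

    |1 − 2μλmax| = |1 − 2μλmin|,

which gives the optimum step-size

    μopt = 1 / (λmax + λmin)

and the corresponding (slowest) geometrical ratio

    (λmax − λmin) / (λmax + λmin) = (χ − 1) / (χ + 1),

where χ = λmax/λmin is the eigenvalue spread. As χ grows, this ratio approaches 1 and the convergence of the steepest-descent algorithm slows down accordingly.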
NEWTON’S METHOD
 Our discussions in the last few sections show
that the steepest-descent algorithm may
suffer from slow modes of convergence,
which arise due to the spread in the
eigenvalues of the correlation matrix R.
 This means that if we can somehow get rid of
the eigenvalue spread, we can get a much
better convergence performance. This is
exactly what Newton's method does.
 To derive Newton's method for the quadratic
case, we start from the steepest-descent
algorithm given in Eq. (5.10). Using p = Rwo,
Eq. (5.10) becomes

    w(k+1) = w(k) + 2μ R (wo − w(k)).
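In Newton's method the gradient is premultiplied by R^{-1}, which removes the dependence on the individual eigenvalues: the recursion becomes

    w(k+1) = w(k) + 2μ R^{-1} (p − R w(k)) = (1 − 2μ) w(k) + 2μ wo,

so every mode decays with the same factor (1 − 2μ), and for μ = 1/2 the minimum point is reached in a single step. A minimal sketch of this recursion (with assumed placeholder statistics):

import numpy as np

def newton_quadratic(R, p, mu, w0, n_iter):
    """Newton recursion for the quadratic MSE surface:
    w(k+1) = w(k) + 2*mu * R^{-1} (p - R w(k))."""
    R_inv = np.linalg.inv(R)        # Newton's method needs R^{-1} (or an estimate of it)
    w = w0.astype(float)
    for _ in range(n_iter):
        w = w + 2 * mu * (R_inv @ (p - R @ w))
    return w

# Assumed example statistics (placeholders)
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.5, 0.2])
wo = np.linalg.solve(R, p)

w1 = newton_quadratic(R, p, mu=0.5, w0=np.zeros(2), n_iter=1)
print(np.allclose(w1, wo))          # True: with mu = 1/2, one Newton step reaches wo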
 Wiener filter and Steepest Descent are both methods used for signal
processing and optimization problems, but they have different approaches
and applications.

 The Wiener filter is a linear filter used to estimate a signal or a system from
a noisy signal. It is based on minimizing the mean square error between
the desired signal and the output of the filter. The Wiener filter takes into
account the statistical properties of the input signal and noise to estimate
the signal of interest. It is often used in applications such as image
processing, speech enhancement, and communication systems.

 On the other hand, Steepest Descent is an iterative optimization algorithm
used to minimize a cost function or maximize a performance metric. It is
based on the gradient descent method, where the algorithm takes steps in
the direction of the negative gradient of the cost function. The step size is
either fixed (as in this chapter) or chosen by a line-search procedure.
Steepest Descent is commonly used in machine learning, optimization, and
control systems.

 In summary, the Wiener filter is used for signal estimation and noise
reduction, while Steepest Descent is used for optimization and control
problems.
