On The Singular Values of The Hankel Matrix With Application in Singular Spectrum Analysis
Rahim Mahmoudvand
Mohammad Zokaei
Abstract
Hankel matrices are an important family of matrices that play a fundamental role in
diverse fields of study, such as computer science, engineering, mathematics and statistics.
In this paper, we study the behavior of the singular values of the Hankel matrix by
changing its dimension. In addition, as an application, we use the obtained results for
choosing the optimal values of the parameters of singular spectrum analysis, a
powerful technique in time series analysis based on the Hankel matrix.
1. Introduction
A Hankel matrix can be finite or infinite and its (i, j) entry is a function of i+j; see Widom
(1966). In other words, a matrix whose entries are the same along the anti-diagonals is
called a Hankel matrix. Specifically, an L × K Hankel matrix H is a rectangular matrix
of the form

H = \begin{pmatrix}
h_1 & h_2 & \cdots & h_K \\
h_2 & h_3 & \cdots & h_{K+1} \\
\vdots & \vdots & \ddots & \vdots \\
h_L & h_{L+1} & \cdots & h_N
\end{pmatrix},   (1)

where K = N − L + 1.
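The construction in Equation (1) translates directly into code; the following is a minimal Python sketch (the function name is ours):

```python
import numpy as np

def hankel_matrix(h, L):
    """Build the L x K Hankel matrix of Equation (1) from h = (h_1, ..., h_N)."""
    h = np.asarray(h, dtype=float)
    K = len(h) - L + 1            # K = N - L + 1
    # Row i (0-based) holds h_{i+1}, ..., h_{i+K}, so entry (i, j) depends on i + j only.
    return np.array([h[i:i + K] for i in range(L)])

H = hankel_matrix([1, 2, 3, 4, 5], L=3)   # N = 5, K = 3
# H = [[1., 2., 3.],
#      [2., 3., 4.],
#      [3., 4., 5.]]
```

Each anti-diagonal of the result is constant, as the definition requires.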
Hankel matrices play many roles in diverse areas of mathematics, such as approximation
and interpolation theory, stability theory, system theory, theory of moments and theory
Corresponding author: Rahim Mahmoudvand. Department of Statistics, Shahid Beheshti University, P.O. Box
1983963113, Evin, Tehran, Iran. Email: [email protected]
http://www.soche.cl/chjs
2. Singular Spectrum Analysis

In this section, we briefly introduce the stages of the SSA method and discuss the importance
of the Hankel matrix in the development of this technique.
1st step: embedding. The series H_N = (h_1, ..., h_N) is mapped into the L × K trajectory matrix

X = (h_{i+j-1})_{i,j=1}^{L,K},   (2)

whose columns are the lagged vectors X_j = (h_j, ..., h_{j+L-1})^T. Note that the matrix given in Equation (2) is a Hankel matrix as defined in Equation (1).
2nd step: singular value decomposition (SVD). In this step, we perform the SVD
of X. Denote by λ_1 ≥ ... ≥ λ_L ≥ 0 the eigenvalues of XX^T arranged in decreasing order
and by U_1, ..., U_L the corresponding eigenvectors. The SVD of X can then be written as
X = X_1 + ... + X_d, where X_i = √λ_i U_i V_i^T, V_i = X^T U_i / √λ_i and d = max{i : λ_i > 0}.
The subsequent grouping and diagonal averaging (hankelization) steps transfer every
matrix Z to the form of a Hankel matrix HZ, which can be subsequently converted to a
time series. If z_{ij} stands for an element of a matrix Z, then the kth term of the resulting
series is obtained by averaging z_{ij} over all i, j such that i + j = k + 1. The Hankelization HZ is
an optimal procedure in the sense that HZ is the Hankel matrix nearest to Z with respect to the matrix norm.
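The diagonal-averaging rule just described can be sketched in Python as follows (the helper name is ours; it implements only the averaging over i + j = k + 1):

```python
import numpy as np

def hankelize(Z):
    """Diagonal averaging: replace each anti-diagonal of Z by its mean, giving
    the Hankel matrix HZ that is nearest to Z in the Frobenius norm."""
    L, K = Z.shape
    N = L + K - 1
    # an anti-diagonal (i + j = const) of Z is a diagonal of the vertically flipped Z
    series = np.array([Z[::-1, :].diagonal(k - L + 1).mean() for k in range(N)])
    return np.array([series[i:i + K] for i in range(L)])

HZ = hankelize(np.array([[1.0, 2.0], [3.0, 4.0]]))
# HZ = [[1. , 2.5],
#       [2.5, 4. ]]
```

Applying `hankelize` to a matrix that is already Hankel leaves it unchanged, which is consistent with the optimality property stated above.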
3. Theoretical Results
Throughout the paper, the matrices considered are over the field of real numbers. In
addition, we consider different values of L, whereas N is supposed to be fixed. Recall that,
for any operator A, the operator AA^T is always positive, and its unique positive square
root is denoted by |A|. The eigenvalues of |A|, counted with multiplicities, are called the
singular values of A. In this section, we provide the main results of the paper.
3.1 On the sum of squares of the singular values of a Hankel matrix
Let λ_j^{L,N} denote the jth ordered eigenvalue of HH^T. Then, for a fixed value of L, the trace
of HH^T is given by

T_H^{L,N} = tr(HH^T) = \sum_{j=1}^{L} λ_j^{L,N}.   (3)
The behavior of T_H^{L,N} given in Equation (3), with respect to different values of L, is
considered in the following theorem.
Theorem 3.1 Consider the Hankel matrix H as defined in Equation (1). Then,

T_H^{L,N} = \sum_{j=1}^{N} w_j^{L,N} h_j^2,   (4)

where w_j^{L,N} = \min\{j, L, K, N − j + 1\}.

Proof Summing the squares of the entries of H row by row gives

tr(HH^T) = \sum_{i=1}^{L} \sum_{j=i}^{N-L+i} h_j^2.

Collecting the coefficient of each h_j^2 yields \sum_{j=1}^{N} C_{j,L,N} h_j^2, where, for L ≤ K,

C_{j,L,N} = \begin{cases} j, & 1 \le j \le L; \\ L, & L + 1 \le j \le K; \\ N - j + 1, & K + 1 \le j \le N, \end{cases}

which is exactly equal to w_j^{L,N}. Similarly, for the second case L > K, we get

C_{j,L,N} = \begin{cases} j, & 1 \le j \le K; \\ K, & K + 1 \le j \le L; \\ N - j + 1, & L + 1 \le j \le N, \end{cases}

which again equals w_j^{L,N}.
The weight w_j^{L,N} defined in Theorem 3.1 can be written in the functional form

w_j^{L,N} = \min\left\{ \frac{N+1}{2} - \left| j - \frac{N+1}{2} \right|,\; \frac{N+1}{2} - \left| L - \frac{N+1}{2} \right| \right\}.   (5)

In particular, w_j^{L,N} is a symmetric function around the line (N + 1)/2 with respect to both j and L.
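Theorem 3.1 together with Equation (5) is easy to verify numerically; below is a short Python check on random data (the function and variable names are ours):

```python
import numpy as np

def weight(j, L, N):
    """w_j^{L,N} from Equation (5); j is 1-based."""
    half = (N + 1) / 2
    return min(half - abs(j - half), half - abs(L - half))

rng = np.random.default_rng(0)
N = 12
h = rng.normal(size=N)
for L in range(2, N):
    K = N - L + 1
    H = np.array([h[i:i + K] for i in range(L)])   # Hankel matrix of Equation (1)
    lhs = np.trace(H @ H.T)                        # T_H^{L,N}
    rhs = sum(weight(j, L, N) * h[j - 1] ** 2 for j in range(1, N + 1))
    assert np.isclose(lhs, rhs)                    # Equation (4) with weights (5)
```

The check passes for every window length L, covering both the L ≤ K and L > K cases of the proof.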
The above-mentioned results imply that the behavior of the quantity T_H^{L,N} is similar on the
two intervals 2 ≤ L ≤ [(N + 1)/2] and [(N + 1)/2] + 1 ≤ L ≤ N − 1, where, as usual, [x]
denotes the integer part of the number x. Therefore, we only need to consider one of these
intervals.
Theorem 3.2 Let T_H^{L,N} be defined as in Equation (3). Then, T_H^{L,N} is an increasing function of L on {2, ..., [(N + 1)/2]}, a decreasing function on {[(N + 1)/2] + 1, ..., N − 1},
and

\max_L T_H^{L,N} = T_H^{[(N+1)/2], N}.
Proof First, we show that w_j^{L,N} is an increasing function of L on {2, ..., [(N + 1)/2]}.
Let L_1 and L_2 be two arbitrary values with L_1 < L_2 ≤ [(N + 1)/2]. From the definition
of w_j^{L,N}, we have

w_j^{L_2,N} − w_j^{L_1,N} = \begin{cases}
0, & 1 \le j \le L_1; \\
j − L_1, & L_1 + 1 \le j \le L_2; \\
L_2 − L_1, & L_2 + 1 \le j \le N − L_2 + 1; \\
N − j + 1 − L_1, & N − L_2 + 2 \le j \le N − L_1 + 1; \\
0, & N − L_1 + 2 \le j \le N.
\end{cases}

Therefore, w_j^{L_2,N} − w_j^{L_1,N} ≥ 0 for all j, and the inequality is strict for some j. Thus,

T_H^{L_2,N} − T_H^{L_1,N} = \sum_{j=1}^{N} \left( w_j^{L_2,N} − w_j^{L_1,N} \right) h_j^2 > 0.   (6)

This confirms that T_H^{L,N} is an increasing function of L on {2, ..., [(N + 1)/2]}. A similar
approach for the set {[(N + 1)/2] + 1, ..., N − 1} implies that T_H^{L,N} is a decreasing function
of L on this interval. Note also that T_H^{L_2,N} − T_H^{L_1,N} in Equation (6) increases as the value
of L_2 increases, again showing that T_H^{L,N} is increasing on {2, ..., [(N + 1)/2]}.
Therefore, the maximum value of T_H^{L,N} is attained at the maximum value of L on this interval, which is
[(N + 1)/2].
Corollary 3.3 Let L_max denote the value of L such that T_H^{L,N} ≤ T_H^{L_max,N}, for all L, with
the inequality strict for some values of L. Then,

L_max = \begin{cases} \dfrac{N+1}{2}, & \text{if } N \text{ is odd}; \\[4pt] \dfrac{N}{2} \text{ and } \dfrac{N}{2} + 1, & \text{if } N \text{ is even}. \end{cases}
Corollary 3.3 shows that L = median{1, ..., N} maximizes the sum of squares of the
Hankel matrix singular values for a fixed value of N. Applying Corollary 3.3 and Equation
(5), we can show that

w_j^{L_max,N} = \frac{N+1}{2} - \left| j - \frac{N+1}{2} \right|.   (7)

Equation (7) shows that h_{[(N+1)/2]} has the maximum weight in T_H^{L,N}.
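Corollary 3.3 can also be confirmed by brute force over all window lengths; a small Python sketch on random data (names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
for N in (9, 10):                        # one odd and one even series length
    h = rng.normal(size=N)
    T = {}
    for L in range(2, N):                # T_H^{L,N} = tr(HH^T)
        K = N - L + 1
        H = np.array([h[i:i + K] for i in range(L)])
        T[L] = np.trace(H @ H.T)
    best = max(T, key=T.get)
    # Corollary 3.3: the maximizer is (N+1)/2 for odd N and {N/2, N/2+1} for even N
    assert best in {(N + 1) // 2, N // 2 + 1}
```

For even N the two middle window lengths give the same trace, matching the two-valued L_max in the corollary.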
Theorem 3.4 Let λ_j^{L,N} be defined as above and let 1 ≤ m ≤ L − 1. Then,

λ_j^{L,N} ≥ λ_j^{L−m,N−m},   j = 1, ..., L − m.

Proof Partition HH^T as

HH^T = \begin{pmatrix} HH^T_{(1)} & HH^T_{(3)} \\ HH^T_{(2)} & HH^T_{(4)} \end{pmatrix},

where the (L − m) × (L − m) leading block is

HH^T_{(1)} = \begin{pmatrix}
\sum_{j=1}^{K} h_j^2 & \sum_{j=1}^{K} h_j h_{j+1} & \cdots & \sum_{j=1}^{K} h_j h_{j+L-m-1} \\
\sum_{j=1}^{K} h_{j+1} h_j & \sum_{j=1}^{K} h_{j+1}^2 & \cdots & \sum_{j=1}^{K} h_{j+1} h_{j+L-m-1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{K} h_{j+L-m-1} h_j & \sum_{j=1}^{K} h_{j+L-m-1} h_{j+1} & \cdots & \sum_{j=1}^{K} h_{j+L-m-1}^2
\end{pmatrix}.

Using this partitioned form, we can say that the sub-matrix HH^T_{(1)} is obtained from the
Hankel matrix corresponding to the sub-series H_{N−m} = (h_1, ..., h_{N−m}), and its eigenvalues
are λ_1^{L−m,N−m} ≥ λ_2^{L−m,N−m} ≥ ... ≥ λ_{L−m}^{L−m,N−m} ≥ 0. Therefore, the proof is completed
using Cauchy's interlacing theorem.
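This interlacing property is easy to check numerically; a small Python sketch (the helper name is ours):

```python
import numpy as np

def eigs(h, L):
    """Decreasing eigenvalues of HH^T for the Hankel matrix built from h."""
    K = len(h) - L + 1
    H = np.array([h[i:i + K] for i in range(L)])
    return np.sort(np.linalg.eigvalsh(H @ H.T))[::-1]

rng = np.random.default_rng(3)
h = rng.normal(size=10)                   # N = 10
L, m = 5, 2
lam_full = eigs(h, L)                     # lambda_j^{L,N}
lam_sub = eigs(h[:len(h) - m], L - m)     # lambda_j^{L-m,N-m}
# interlacing: lambda_j^{L,N} >= lambda_j^{L-m,N-m}, j = 1, ..., L - m
assert np.all(lam_full[:L - m] >= lam_sub - 1e-9)
```

The small tolerance only guards against floating-point round-off in `eigvalsh`.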
T_H^{L,N} = \sum_{j=1}^{L} λ_j^{L,N} = tr(HH^T) = \sum_{l=1}^{L} \sum_{j=l}^{K+l-1} h_j^2.
λ^2 − \left( \sum_{j=1}^{N-1} h_j^2 + \sum_{j=2}^{N} h_j^2 \right) λ + \sum_{j=1}^{N-1} h_j^2 \sum_{j=2}^{N} h_j^2 − \left( \sum_{j=1}^{N-1} h_j h_{j+1} \right)^2 = 0.   (8)
Equation (8) has two real solutions, so we have two real eigenvalues. The first (larger) eigenvalue is given by

λ_1^{2,N} = \frac{1}{2} \left[ \sum_{j=1}^{N-1} h_j^2 + \sum_{j=2}^{N} h_j^2 + \sqrt{ \left( h_1^2 − h_N^2 \right)^2 + 4 \left( \sum_{j=1}^{N-1} h_j h_{j+1} \right)^2 } \right].   (9)
Comparing with the case L = 1, we obtain

λ_1^{2,N} \begin{cases} \ge λ_1^{1,N}, & \text{if } \left( \sum_{j=1}^{N-1} h_j h_{j+1} \right)^2 \ge h_1^2 h_N^2; \\[4pt] \le λ_1^{1,N}, & \text{if } \left( \sum_{j=1}^{N-1} h_j h_{j+1} \right)^2 \le h_1^2 h_N^2, \end{cases}   (10)

where λ_1^{1,N} = \sum_{j=1}^{N} h_j^2 is the single nonzero eigenvalue when L = 1. Practically, it seems that the first condition of
Equation (10) is satisfied for a wide class of models. For example, it can be
seen that the condition is implied by monotonicity of the sequence {h_j, j = 1, ..., N}:
for a non-negative (or non-positive) monotone sequence, we have \sum_{j=1}^{N-1} h_j h_{j+1} ≥ h_1 h_N.
Applying Equation (9), it then follows that λ_1^{2,N} ≥ \sum_{j=1}^{N} h_j^2 = λ_1^{1,N}. A larger class is obtained
if we consider positive data where all observations are bigger than the first one and h_1 ≥
h_N/(N − 1). Under this condition, it is easy to show that \sum_{j=1}^{N-1} h_j h_{j+1} ≥ h_1 h_N and
therefore λ_1^{2,N} ≥ λ_1^{1,N}. In the next section, we see some examples of models that do not satisfy
these conditions but still have λ_1^{2,N} ≥ λ_1^{1,N}.
It is worth mentioning that we can give a geometric interpretation of Equation (8) as

λ^2 − \left( ||h_{1:N-1}||^2 + ||h_{2:N}||^2 \right) λ + ||h_{1:N-1}||^2 ||h_{2:N}||^2 (\sin θ_{1,2})^2 = 0,   (11)

where h_{1:N−1} and h_{2:N} denote the first and second rows of H, ||·|| the Euclidean norm, and
θ_{1,2} the angle between the two rows of H. Notice that the last expression in Equation (11) is the
squared magnitude of the cross product of the first two rows of H. Since (\sin θ_{1,2})^2 ≤ 1, it is
easy to obtain from the characterization in Equation (11) the inequality λ_1^{2,N} ≥ λ_1^{1,N−1}, which is also a direct result of Theorem 3.4.
3.2.3 Case 3: L > 2, rank of H = 2

In this case, HH^T has two positive eigenvalues. To obtain the eigenvalues, first of all note
that, since rank(HH^T) = 2, all principal minors of HH^T of order greater than two vanish, so that

det(HH^T − λI) = (−1)^L λ^{L−2} \left( λ^2 − s_1 λ + s_2 \right) = 0,   (12)

where s_1 and s_2 are the sums of the 1 × 1 and 2 × 2 principal minors of HH^T, respectively. This is a consequence of the following lemma.
Lemma 3.5 (Horn and Johnson, 1985, Theorem 1.2.12) Let A be an n × n real or complex
matrix with eigenvalues λ_1, ..., λ_n. Then, for 1 ≤ k ≤ n,
(i) s_k(λ_1, ..., λ_n) = (−1)^k c_k, and
(ii) s_k(λ_1, ..., λ_n) is the sum of all the k × k principal minors of A,
where s_k denotes the kth elementary symmetric function and c_k the coefficient of λ^{n−k} in the characteristic polynomial of A.
Equation (12) shows that the eigenvalues of HH^T in this case are the solutions of

λ^2 − \left( \sum_{l=1}^{L} \sum_{j=l}^{K+l-1} h_j^2 \right) λ + \sum_{l=1}^{L-1} \sum_{i=1}^{L-l} \left[ \sum_{j=l}^{K+l-1} h_j^2 \sum_{j=l}^{K+l-1} h_{j+i}^2 − \left( \sum_{j=l}^{K+l-1} h_j h_{j+i} \right)^2 \right] = 0.   (13)
Solving Equation (13) gives

λ_{1,2}^{L,N} = \frac{1}{2} \left( \sum_{l=1}^{L} \sum_{j=l}^{K+l-1} h_j^2 \pm \sqrt{Δ_L} \right),   (14)
where Δ_L is the discriminant of the quadratic expression given in Equation (13). According
to Equation (14), it is easy to see that, for L ≤ [(N + 1)/2],

λ_j^{L,N} − λ_j^{L-1,N} = \frac{1}{2} \left( \sum_{l=L}^{N-L+1} h_l^2 \pm \left( \sqrt{Δ_L} − \sqrt{Δ_{L-1}} \right) \right),   j = 1, 2,

with the plus sign for j = 1 and the minus sign for j = 2.
Similar to the previous case, Equation (13) may be reformulated in the language of multivariate geometry for the L-lagged vectors as

λ^2 − \left( \sum_{j=1}^{L} ||h_{j:K+j-1}||^2 \right) λ + \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} ||h_{i:K+i-1}||^2 ||h_{j:K+j-1}||^2 (\sin θ_{i,j})^2 = 0,

where h_{j:K+j−1} denotes the jth row of H and θ_{i,j} the angle between rows i and j.
Let C_j^{L,N} = λ_j^{L,N}/T_H^{L,N} denote the relative contribution of the jth eigenvalue. The above results suggest the inequality

C_j^{L,N} ≤ C_j^{L−m,N}.   (15)

Since the inequality given in Equation (15) is satisfied for all values of m belonging to
{1, ..., L − 1}, it appears that C_1^{L,N} is decreasing on L ∈ {2, ..., [(N + 1)/2]}. In the
next section, we see examples that show whether such a behavior is true for polynomial
models or not.
4. Examples and Application
In this section, we discuss some examples related to the theoretical results obtained in
Section 3. Also, we provide an application of these results.
4.1 Examples
Example 4.1 Let h_t = exp(β_0 + β_1 t), for t = 1, ..., N. It is easy to see that the corresponding Hankel matrix H has rank one. Figure 1 shows the first singular value of H for this
model with β_0 = 0.1, β_1 = 0.2 and N = 20; it is unimodal with respect to L and attains its
maximum value at L = 10, 11, i.e., the median of {1, ..., 20}.

Figure 1. Plot of the first singular value of H for different values of L: Example 4.1.
Now, we consider two different examples whose corresponding Hankel matrices have
rank two. The first is a simple linear model and the second is a cosine model. As we
will see, for both models the results are broadly similar to Example 4.1.
Figure 2. Plots of the first (left) and second (right) singular values of H for different values of L: Example 4.2.
Example 4.3 Let h_t = cos(t/12), for t = 1, ..., N. The first and second singular values of
H are depicted in Figure 3 for series length N = 100. If we overlook some small fluctuations in
the plots, we can say that the behavior of the singular values of H is similar to Example 4.2.

Figure 3. Plots of the first (left) and second (right) singular values of H for different values of L: Example 4.3.
Figure 4. Plots of the three largest singular values of H for different values of L: Example 4.4.
Example 4.5 Let h_t = log(t), for t = 1, ..., N. Then, it can be seen that the rank of the
corresponding Hankel matrix H is four. The singular values of H are shown in Figure 5 for
N = 20. The results of this example are in concordance with Example 4.4.

Figure 6 shows the first singular value for the models h_t = cos(t/12) (left) and h_t = log(t)
(right), for N = 5, ..., 100. Solid and dashed lines in Figure 6 denote the singular values
for L = 2 and L = 1, respectively. Both plots confirm our expectation for the
discrepancy between the two singular values. Notice that the cosine model is not monotone,
but λ_1^{2,N} ≥ λ_1^{1,N} nevertheless.
Figure 5. Plots of the four largest singular values of H with respect to different values of L: Example 4.5.
Figure 6. Plots of the first singular value for values of L and N in the cosine (left) and logarithm (right) models.
Figure 7. Plots of C_j^{L,N} with respect to L for N = 20 and j = 1 (left), j = 2 (center), j = 3 (right): Example 4.6.
Next, we examine cases where the degree of the polynomial is greater than two, and
different coefficients are considered. The results are similar to Example 4.6
and thus we do not report them here. As a general result, we can say that the inequality given
in Equation (15) is satisfied for j = 1 in the polynomial models. Now, we consider the
ratio C_{1:r}^{L,N} = \sum_{j=1}^{r} C_j^{L,N}. Since C_1^{L,N} is bigger than C_j^{L,N}, for j > 1, and the discrepancy
between them is usually large (see the polynomial model of Example 4.6), we expect
the ratio C_{1:r}^{L,N} to behave like C_1^{L,N}. In the following example, the behavior
of this ratio is depicted for a polynomial model of degree four.
Figure 8. Plot of C_{1:r}^{L,N} with respect to L for N = 20 and r = 1 (left), r = 2 (center), r = 3 (right): Example 4.7.
4.2 Application

From Theorem 3.2, the gain in T_H^{L,N} from increasing the window length by one unit is

T_H^{L,N} − T_H^{L-1,N} = \sum_{j=L}^{K} h_j^2.   (16)
Equation (16) is the rate of change in tr(HH^T) for each unit change in the window length.
This rate is large for small values of the window length and decreases to attain its minimum
value at L = K, which is equivalent to L = L_max = median{1, ..., N}. This motivates
us to choose values smaller than L_max when the rate given in Equation (16) is small. Supporting
this motivation, Golyandina et al. (2001) note that, for a series with a complex structure,
a too large window length L can produce an undesirable decomposition of the series
components of interest, which may lead, in particular, to their mixing with other series
components. Sometimes, in these circumstances, even a small variation in the value of L
can reduce mixing and lead to a better separation of the components, i.e., it provides a
transition from weak to strong separability.
Another important parameter to be chosen is the number r of singular values needed for
grouping in the reconstruction step. Selecting this parameter is similar to the procedure
for obtaining the cutoff value in principal component analysis. It is known that there is
no general way to choose an optimal value of the cutoff number, as it depends on the
data; for a complete description and review of this topic, see Jolliffe (2002). Perhaps the
most obvious criterion for choosing the cutoff value is to select a (cumulative) percentage
of the total variation that the selected singular values should contribute, say
80% or 90%. The required number of singular values is then the smallest value of r for
which this chosen percentage is exceeded. This criterion is equivalent to the ratio C_{1:r}^{L,N}
previously defined.
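The cumulative-percentage rule described above amounts to a few lines of code; a minimal Python sketch (the function name, test series, and 90% threshold are ours):

```python
import numpy as np

def choose_r(h, L, threshold=0.9):
    """Smallest r with cumulative contribution C_{1:r}^{L,N} >= threshold."""
    K = len(h) - L + 1
    H = np.array([h[i:i + K] for i in range(L)])
    lam = np.sort(np.linalg.eigvalsh(H @ H.T))[::-1]   # eigenvalues, decreasing
    cum = np.cumsum(lam) / lam.sum()                   # C_{1:r}^{L,N}, r = 1, 2, ...
    return int(np.argmax(cum >= threshold)) + 1

t = np.arange(1, 101)
h = np.cos(t / 12) + 0.05 * np.log(t)     # a low-rank signal plus a small trend
r = choose_r(h, L=50, threshold=0.9)
```

For a low-rank signal such as this one, the selected r is small, since the leading eigenvalues carry almost all of the total variation.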
5. Conclusions
We have considered one of the most important issues in singular spectrum
analysis, namely, the selection of its parameters. As stated, the singular values of the trajectory
matrix play an important role in singular spectrum analysis. Specifically, the selection of the
window length (L) and of the number of singular values needed for reconstruction of the series (r)
depends on the behavior of the singular values of the trajectory matrix. In this paper, we have
studied the behavior of the singular values of a Hankel matrix (H) with respect to its dimension.
We have shown that, for a wide class of time series, the singular values of HH^T (λ_j^{L,N})
increase with L on {1, ..., [(N + 1)/2]} and decrease on {[(N + 1)/2] + 1, ..., N}. In addition,
we have investigated the behavior of the sum of squares and the contribution of each singular
value. The results based on these criteria show that a choice of L close to one-half of the time
series length is suitable for the decomposition stage of singular spectrum analysis in most cases.
Acknowledgements
The authors would like to thank the anonymous referees for their valuable comments, which improved
the exposition of the paper.
References
Bhatia, R., 1997. Matrix Analysis. Springer, New York.
Elsner, J.B., Tsonis, A.A., 1996. Singular Spectrum Analysis: A New Tool in Time Series
Analysis. Plenum Press, New York.
Golyandina, N., Nekrutkin, V., Zhigljavsky, A., 2001. Analysis of Time Series Structure:
SSA and Related Techniques. Chapman & Hall/CRC, New York.
Hassani, H., 2007. Singular spectrum analysis: methodology and comparison. Journal of
Data Science, 5, 239-257.
Hassani, H., Heravi, S., Zhigljavsky, A., 2009. Forecasting European industrial production
with singular spectrum analysis. International Journal of Forecasting, 25, 103-118.
Hassani, H., Mahmoudvand, R., Zokaei, M., 2011. Separability and window length in singular spectrum analysis. Comptes Rendus Mathematique (in press).
Hassani, H., Thomakos, D., 2010. A review on singular spectrum analysis for economic
and financial time series. Statistics and Its Interface, 3, 377-397.
Horn, R.A., Johnson, C.R., 1985. Matrix Analysis. Cambridge University Press, Cambridge.
Jolliffe, I.T., 2002. Principal Component Analysis. Springer, New York.
Peller, V., 2003. Hankel Operators and Their Applications. Springer, New York.
Widom, H., 1966. Hankel matrices. Transactions of the American Mathematical Society,
121, 1-35.