TESTING STATIONARITY WITH SURROGATES - A ONE-CLASS SVM APPROACH

Jun Xiao, Pierre Borgnat, Patrick Flandrin

École Normale Supérieure de Lyon
46 allée d'Italie, 69364 Lyon Cedex 07, France

Cédric Richard

Université de Technologie de Troyes
12 rue Marie Curie, 10010 Troyes Cedex, France
ABSTRACT
An operational framework is developed for testing stationarity relative to an observation scale,
in both stochastic and deterministic contexts. The proposed method is based on a comparison
between global and local time-frequency features. Its originality is to use a family of stationary
surrogates to define the null hypothesis, and to build on them a statistical test implemented as a
one-class Support Vector Machine. The time-frequency features extracted from the surrogates are
considered as a learning set and used to detect departures from stationarity. The principle of the
method is presented, and some results are shown on typical models of signals that can be thought
of as stationary or nonstationary, depending on the observation scale used.

Index Terms: Stationarity Test, Time-Frequency Analysis, Support Vector Machines, One-Class
Classification
1. REVISITING STATIONARITY
Considering stationarity is central in many signal processing applications, either because its
assumption is a prerequisite for applying most standard algorithms devoted to steady-state
regimes, or because its breakdown conveys specific information in evolutive contexts. Testing for
stationarity is therefore an important issue, but addressing it raises some difficulties. The main
reason is that the very concept of stationarity, while uniquely defined in theory, is often
interpreted in different ways. Indeed, whereas the standard definition of stationarity refers only
to stochastic processes and concerns the invariance of statistical properties over time,
stationarity is also usually invoked for deterministic signals whose spectral properties are
time-invariant. Moreover, while the underlying invariances (be they stochastic or deterministic)
are supposed to hold in theory for all times, common practice allows them to be restricted to some
finite time interval [1, 2, 3], possibly with abrupt changes in between [4, 5]. As an example, we
can think of speech that is routinely "segmented into stationary frames", the "stationarity" of
voiced segments relying in fact on periodicity structures within restricted time intervals. Those
remarks call for a better framework aimed at dealing with stationarity in an operational sense,
with a definition that would encompass both stochastic and deterministic variants, and include the
possibility of testing it relative to a given observation scale. This is the purpose of the present
study.
2. FRAMEWORK
2.1. A time-frequency approach
As far as only second order evolutions are to be tested, time-frequency (TF) distributions and
spectra are natural tools [6]. Well-established theories exist for justifying the choice of a
given TF representation. In the case of stationary processes, the Wigner-Ville Spectrum (WVS) is
not only constant as a function of time but also equal to the Power Spectral Density (PSD) at each
instant. From a practical point of view, the WVS is a quantity that has to be estimated. In this
study, we choose to make use of multitaper spectrograms [7] defined as

$$S_{x,K}(t,f) = \frac{1}{K} \sum_{k=1}^{K} S_x^{(h_k)}(t,f),$$

where the $\{S_x^{(h_k)}(t,f),\ k = 1, \ldots, K\}$ stand for the K spectrograms computed with the K
first Hermite functions as short-time windows $h_k(t)$:

$$S_x^{(h_k)}(t,f) = \left| \int x(s)\, h_k(s-t)\, e^{-i 2\pi f s}\, ds \right|^2.$$
The reason for this choice is that spectrograms can be interpreted both as estimates of the WVS
for stochastic processes and as reduced interference distributions for deterministic signals. The
multitaper approach is furthermore adopted in order to reduce estimation variance without any
extra time-averaging, which would be inappropriate in a nonstationary context. In practice, the
multitaper spectrogram is evaluated only at N time positions $\{t_n,\ n = 1, \ldots, N\}$, with a
spacing $t_{n+1} - t_n$ which is an adjustable fraction of the temporal width $T_h$ of the K
windows $h_k(t)$.
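As a rough illustration of the estimator above, the following NumPy sketch builds the K first Hermite functions with their three-term recurrence and averages the corresponding spectrograms at regularly spaced time positions. The function names, the truncation of the Hermite functions to [-6, 6], the hop size and the FFT length are assumptions made for this example; only K = 5 and the taper length 387 echo the values quoted in the caption of Fig. 4.

```python
import numpy as np

def hermite_windows(K, length, half_width=6.0):
    """First K orthonormal Hermite functions sampled on `length` points.

    `half_width` (assumed) sets the support [-half_width, half_width]
    on which the functions are sampled before truncation.
    """
    t = np.linspace(-half_width, half_width, length)
    dt = t[1] - t[0]
    h = np.empty((K, length))
    h[0] = np.pi ** -0.25 * np.exp(-t ** 2 / 2)
    if K > 1:
        h[1] = np.sqrt(2.0) * t * h[0]
    # Stable recurrence for normalized Hermite functions.
    for k in range(1, K - 1):
        h[k + 1] = (np.sqrt(2.0 / (k + 1)) * t * h[k]
                    - np.sqrt(k / (k + 1)) * h[k - 1])
    # Rescale so that each sampled window has (approximately) unit energy.
    return h * np.sqrt(dt)

def multitaper_spectrogram(x, K=5, length=387, hop=48, nfft=512):
    """Average of K Hermite-window spectrograms at positions spaced by `hop`."""
    x = np.asarray(x, dtype=float)
    h = hermite_windows(K, length)
    half = length // 2
    centers = np.arange(half, len(x) - half, hop)   # time positions t_n
    S = np.zeros((len(centers), nfft // 2 + 1))     # positive frequencies only
    for n, c in enumerate(centers):
        seg = x[c - half : c - half + length]
        for k in range(K):
            # Squared modulus of the short-time Fourier transform with h_k.
            S[n] += np.abs(np.fft.rfft(seg * h[k], nfft)) ** 2
    return centers, S / K
```

Here `hop` plays the role of the spacing $t_{n+1} - t_n$, to be chosen as a fraction of the window length. The frequency axis matching the columns of `S` is `np.fft.rfftfreq(nfft)`, in cycles per sample.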
2.2. Relative stationarity
Since the TF interpretation suggests that suitable representations should undergo no evolution in
stationary situations, stationarity tests can be envisioned on the basis of some comparison
between local and global features. Relaxing the assumption that stationarity would be some
absolute property, the basic idea underlying the approach proposed here is that, when considered
over a given duration, a process will be referred to as stationary relative to this observation
scale if its time-varying spectrum undergoes no evolution or, in other words, if the local spectra
at all different time instants are statistically similar to the global spectrum obtained by
marginalization.

[Figure 1: two panels, "original" (left) and "surrogate" (right).]
Fig. 1. Signal and surrogates. This figure displays a Frequency Modulated signal with a Gaussian
amplitude, which is a nonstationary signal (left), and one of its surrogates obtained by
replacement of the phase of the Fourier transform by an i.i.d. uniform phase (right).

1-4244-1198-X/07/$25.00 ©2007 IEEE. SSP 2007, p. 720.
3. TEST
3.1. Surrogates
Revisiting stationarity within the TF perspective has already been pushed forward [2], but the
novelty here is to address the significance of the difference "local vs. global" by elaborating
from the data itself a stationarized reference serving as the null hypothesis for the test.
Indeed, distinguishing between stationarity and nonstationarity would be made easier if, besides
the analyzed signal itself, we had at our disposal some reference having the same marginal
spectrum while being stationary. Since such a reference is generally not available, one
possibility is to create it from the data: this is the rationale behind the idea of "surrogate
data", a technique which has been introduced and widely used in the physics literature, mostly for
testing linearity [8, 9] (apart from a proposal reported in [10], it seems never to have been used
for testing stationarity).
For an identical marginal spectrum over the same observation interval, nonstationary processes are
expected to differ from stationary ones by some structured organization in time, hence in their
time-frequency representation. A set of J surrogates is thus computed from a given observed signal
x(t), so that each of them has the same PSD as the original signal while being stationarized. In
practice, this is achieved by destroying the organized phase structure controlling the
nonstationarity of x(t), if any. To this end, x(t) is first Fourier transformed to X(f), and the
modulus of X(f) is then kept unchanged while its phase is replaced by a random one, uniformly
distributed over $[-\pi, \pi]$. This modified spectrum is then inverse Fourier transformed, leading
to as many stationary surrogate signals as phase randomizations are operated.

[Figure 2: three spectrograms (time vs. frequency) titled "original", "1 surrogate" and "mean over
40 surrogates", each with its marginal in time below; the common marginal in frequency ("marg.")
is on the far right.]
Fig. 2. Surrogates. This figure compares the TF structure of the nonstationary FM signal of Fig. 1
(1st column), of one of its surrogates (2nd column) and of the mean over J = 40 surrogates (3rd
column). The spectrogram is represented in each case on the 1st line, with the corresponding
marginal in time on the 2nd line. The marginal in frequency, which is the same for the three
spectrograms, is displayed on the far right of the 1st line.

Fig. 1 shows a nonstationary signal and one surrogate resulting from this operation. The effect of
the surrogate procedure is further illustrated in Fig. 2, displaying both signal and surrogate
spectrograms, together with their marginals in time and frequency. It clearly appears from this
figure that, while the original signal undergoes a structured evolution in time, the recourse to
phase randomization in the Fourier domain ends up with stationarized (i.e., time unstructured)
surrogate data with identical spectrum.
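The phase-randomization step described in this section can be sketched in a few lines of NumPy. This is a minimal illustration for a real-valued, uniformly sampled signal, not the authors' implementation: the function name `make_surrogates` and the `seed` argument are hypothetical, and keeping the zero-frequency and Nyquist bins untouched is an assumption made here so that the surrogate stays real with exactly the same spectrum modulus.

```python
import numpy as np

def make_surrogates(x, J=40, seed=None):
    """Return J phase-randomized surrogates of the real signal x, shape (J, len(x))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)                 # spectrum on nonnegative frequencies
    modulus = np.abs(X)                # kept unchanged
    surrogates = np.empty((J, len(x)))
    for j in range(J):
        # i.i.d. phases, uniformly distributed over [-pi, pi]
        phase = rng.uniform(-np.pi, np.pi, size=X.shape)
        Xs = modulus * np.exp(1j * phase)
        # Assumption: DC and (for even lengths) Nyquist bins must stay real,
        # so they are copied from the original spectrum.
        Xs[0] = X[0]
        if len(x) % 2 == 0:
            Xs[-1] = X[-1]
        surrogates[j] = np.fft.irfft(Xs, n=len(x))
    return surrogates
```

Using the one-sided transform (`rfft`/`irfft`) enforces the Hermitian symmetry of the randomized spectrum implicitly, which is why a single phase draw per nonnegative frequency is enough.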
3.2. One-class SVM
Once a collection of stationarized surrogate data has been synthesized, different possibilities
are offered. The first one is to extract from them some features such as distances between local
and global spectra, and to characterize the null hypothesis of stationarity by the statistical
distribution of their variation in time. This approach is the subject of current investigations
that will be reported elsewhere [11]. We will here rather focus on an alternative viewpoint rooted
in statistical learning theory: the collection of surrogates will be considered as a learning set
and used to detect departure from stationarity. In this context, the classification task is
fundamentally a one-class classification problem and differs from conventional two-class pattern
recognition problems in the way the classifier is trained. The latter uses only target data to
estimate a boundary which encloses most of them. The machinery of one-class Support Vector
Machines (1-class SVM), which was introduced for outlier detection [12], can be used. This
technique has been successfully applied to a number of problems, including audio and biomedical
signal segmentation [4, 13].

[Figure 3: labels $w$, $\rho/\|w\|$, $\xi_j/\|w\|$, "sample classified as an outlier", "minimum
volume hypersphere", hyperplane $\langle w, \phi(z) \rangle_H - \rho = 0$, origin O.]
Fig. 3. One-class SVM with kernel $\kappa(z_i, z_j)$ depending only on $z_i - z_j$.
Let $Z = \{z_1, \ldots, z_J\}$ be a set of J surrogate signals (or a collection of features derived
from them). Let $\kappa : Z \times Z \to \mathbb{R}$ be a kernel function that satisfies Mercer
conditions. The latter can be used to map the $z_j$'s into a feature space denoted by $H$ via
$\phi : Z \to H$ defined as $\phi(z) = \kappa(z, \cdot)$. The space $H$ is shown to be a
reproducing kernel Hilbert space of functions with dot product $\langle \cdot, \cdot \rangle_H$.
The reproducing kernel property states that
$\langle \kappa(z_i, \cdot), \kappa(z_j, \cdot) \rangle_H = \kappa(z_i, z_j)$, which means that
$\kappa(z_i, z_j)$ can be interpreted as the dot product between $z_i$ and $z_j$ mapped to $H$ by
$\phi(\cdot)$. A classic example of Mercer kernel is the Gaussian kernel defined as
$\kappa(z_i, z_j) = \exp(-\|z_i - z_j\|^2 / 2\sigma_0^2)$, where $\sigma_0$ is a bandwidth
parameter. Note that it maps any data point onto a hypersphere of radius 1 since
$\kappa(z_j, z_j) = 1$ for all $z_j$.
The learning strategy adopted by 1-class SVM is to map the data into the feature space
corresponding to the kernel function, and to determine the hyperplane
$\langle w, \phi(z) \rangle_H - \rho = 0$ which separates them from the origin with maximum margin.
The decision function $d(z) = \mathrm{sgn}(\langle w, \phi(z) \rangle_H - \rho)$ then gives the
side of the hyperplane on which any new point z falls in feature space, and determines whether it
may be considered as an outlier. For kernels $\kappa(z_i, z_j)$ depending only on $z_i - z_j$ such
as the Gaussian kernel, which map data onto a hypersphere, this strategy is equivalent to finding
the minimum volume hypersphere enclosing the data [14]; see Fig. 3. Now, let us focus on the
optimization problem solved to get the hyperplane parameters $w$ and $\rho$. On the one hand, the
distance $\rho/\|w\|$ that separates the hyperplane from the origin must be maximized. On the other
hand, the number of target samples wrongly classified as outliers must be minimized. Such samples
$z_j$ satisfy inequalities of the form $\langle w, \phi(z_j) \rangle_H \geq \rho - \xi_j$ with
$\xi_j > 0$. Based on these considerations, the decision function is found by minimizing the
weighted sum of a regularization term $\|w\|^2$ and an empirical error term depending on the
margin variable $\rho$ and the individual errors $\xi_j$:

$$\min_{w, \rho, \xi} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu J} \sum_{j=1}^{J} \xi_j - \rho$$
$$\text{subject to } \langle w, \phi(z_j) \rangle_H \geq \rho - \xi_j, \quad \xi_j \geq 0,$$

with $\nu \in [0, 1]$. Basic properties of 1-class SVM are reported in [12]. An important result is
that the parameter $\nu$ may be used to incorporate prior information about the frequency of
novelty occurrences.
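To make the optimization problem above more concrete, here is a minimal NumPy sketch that solves its standard dual form from [12], namely minimizing $\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j \kappa(z_i, z_j)$ subject to $0 \leq \alpha_j \leq 1/(\nu J)$ and $\sum_j \alpha_j = 1$, so that $\langle w, \phi(z) \rangle_H = \sum_j \alpha_j \kappa(z_j, z)$. The solver is a simple pairwise (SMO-style) update chosen here for brevity; it is not the toolbox used in the paper, and the function names, tolerances and iteration cap are assumptions. It requires $0 < \nu < 1$.

```python
import numpy as np

def gaussian_gram(A, B, sigma0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma0 ** 2))

def fit_one_class_svm(Z, sigma0, nu, tol=1e-6, max_iter=10000):
    """Return dual weights alpha and offset rho for the training set Z (J x d)."""
    Z = np.asarray(Z, dtype=float)
    J = len(Z)
    C = 1.0 / (nu * J)                 # upper bound on each alpha_j
    K = gaussian_gram(Z, Z, sigma0)
    alpha = np.full(J, 1.0 / J)        # feasible start: sums to 1, below C
    g = K @ alpha                      # gradient of the dual objective
    for _ in range(max_iter):
        up = alpha < C - 1e-12         # weights that can still increase
        low = alpha > 1e-12            # weights that can still decrease
        i = np.flatnonzero(up)[np.argmin(g[up])]
        j = np.flatnonzero(low)[np.argmax(g[low])]
        if g[j] - g[i] < tol:          # optimality reached (within tol)
            break
        # Move mass delta from alpha_j to alpha_i (keeps the sum equal to 1).
        denom = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        delta = min((g[j] - g[i]) / denom, C - alpha[i], alpha[j])
        alpha[i] += delta
        alpha[j] -= delta
        g += delta * (K[:, i] - K[:, j])
    rho = 0.5 * (g[i] + g[j])          # offset taken between the two extremes
    return alpha, rho

def decide(z_new, Z, alpha, rho, sigma0):
    """Decision d(z): +1 inside the learned region, -1 for outliers."""
    k = gaussian_gram(np.atleast_2d(np.asarray(z_new, dtype=float)),
                      np.asarray(Z, dtype=float), sigma0)
    return np.sign(k @ alpha - rho)
```

In the setting of this paper, the rows of `Z` would be the feature vectors of the J surrogates, and `decide` would be applied to the feature vector of the analyzed signal. A production-grade solver would normally be preferred over this pairwise scheme.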
We shall now use 1-class SVM with (stationary) surrogate signals. The resulting decision rules
will allow us to distinguish between stationary and nonstationary processes.
4. TWO EXAMPLES
Two test signals are used as simple illustrations of possible nonstationary evolutions: amplitude
modulation (AM) of a random noise and deterministic frequency modulation (FM). The models are
expressed, for $t \in [0, T]$, as

(AM) $x(t) = (1 + \alpha \sin 2\pi t/T_0)\, e(t)$;
(FM) $x(t) = \sin 2\pi (f_0 t + \alpha \sin 2\pi t/T_0) + e(t)$,

where e(t) is white Gaussian noise and (FM case) $f_0$ is the central frequency. For both models,
$T_0$ is the period of the modulation and $0 \leq \alpha \leq 1$ is the modulation factor. Not all
nonstationarities can be subsumed under these two categories, but they are believed to give
meaningful examples.
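For experimentation, the two models can be simulated as follows. This is only a sketch of the formulas as printed: the modulation factor is called `alpha`, time is taken in samples, and the value of `f0`, the unit-variance noise in the AM case, and the definition of the SNR (mean power of the sinusoid over noise variance) are assumptions, since the text does not specify them.

```python
import numpy as np

def am_signal(T=1600, T0=1600.0, alpha=0.5, seed=None):
    """Amplitude-modulated white Gaussian noise, sampled at t = 0, ..., T-1."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    e = rng.standard_normal(T)                      # white Gaussian noise e(t)
    return (1.0 + alpha * np.sin(2 * np.pi * t / T0)) * e

def fm_signal(T=1600, T0=1600.0, alpha=0.02, f0=0.25, snr_db=10.0, seed=None):
    """Frequency-modulated sinusoid in additive white Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    s = np.sin(2 * np.pi * (f0 * t + alpha * np.sin(2 * np.pi * t / T0)))
    # Assumed SNR convention: mean signal power divided by noise variance.
    noise_std = np.sqrt(np.mean(s ** 2) / 10 ** (snr_db / 10))
    return s + noise_std * rng.standard_normal(T)
```

Sweeping `T0` over `T / 20`, `T` and `20 * T` reproduces the three observation regimes considered in the experiments.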
Given x(t), TF features are extracted from a multitaper spectrogram. Introducing first the local
average $\langle \cdot \rangle_{\tilde{S}_n}$ computed with the normalized time-frequency
distribution (for f > 0 only):

$$\tilde{S}_n(f) := \frac{S_{x,K}(t_n, f)}{\int_0^{\infty} S_{x,K}(t_n, f)\, df}$$

at times $t_n$ (n = 1, ..., N), temporal features are obtained so as to describe the time evolution
of the local power $P_n$ of the signal and of its local frequency content $F_n$:

$$P_n = \langle 1 \rangle_{\tilde{S}_n}; \quad F_n = \langle f \rangle_{\tilde{S}_n}; \quad F_n^2 = \langle f^2 \rangle_{\tilde{S}_n},$$

where $F_n^2$ denotes the second-order moment, as opposed to the squared mean $(F_n)^2$. From this
(keeping a small number of features for the sake of clarity), we retain the following two
characteristics comparing local TF features to global ones:

$$P = \mathrm{std}(\{P_n\}_{n=1..N}) \,/\, \mathrm{mean}(\{P_n\}_n)$$
$$F = \mathrm{std}(\{F_n\}_{n=1..N}) \,/\, \mathrm{mean}(\{\sqrt{F_n^2 - (F_n)^2}\}_n)$$
[Figure 4: for each of the AM (left) and FM (right) cases, three rows ($T_0 = T/20$, $T_0 = T$,
$T_0 = 20\,T$) showing the signal (vs. time), its spectrogram (time vs. frequency) and the 1-class
SVM output in the (P, F) plane.]
Fig. 4. Two examples. Signals, spectrograms and space (P, F) of the TF features in AM (left) and
FM (right) situations. From top to bottom, $T_0 = T/20$, $T$ and $20\,T$, with T = 1600. In each
case, the red circle corresponds to the (P, F) pair of one test signal used to draw the
surrogates. Those surrogates (J = 40 in the experiments reported here) are plotted as green dots
which, with 1-class SVM, define the domain of stationarity represented here as the gray shaded
region, the blue circles corresponding to the support vectors. Magenta dots are independent
realizations of the same test model. Other parameters are as follows. Number of tapers: K = 5;
length of tapers: $T_h = 387$; modulation indices: $\alpha = 0.5$ (AM) and 0.02 (FM);
signal-to-noise ratio: SNR = 10 dB; SVM kernel is Gaussian, $\sigma_0 = 0.07$, $\nu = 0.05$.
The first one (P) is a measure of the fluctuations in time of the local power of the signal,
whereas the second one (F) operates the same way with respect to the local mean frequency. These
characteristics are used as features, z = (P, F), for the 1-class SVM whose output is displayed in
Fig. 4. The SVM toolbox proposed in [15] was used for this illustration.
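The two features can be computed from a sampled spectrogram along the following lines. One point of interpretation is assumed here and should be read as such: taken literally, the average of 1 under a distribution normalized at each instant is identically 1, so this sketch takes $P_n$ to be the unnormalized local power (the integral of the spectrogram over f > 0 at time $t_n$), while $F_n$ and the second-order moment use the normalized local spectrum. The function name `tf_features` and the array layout are hypothetical.

```python
import numpy as np

def tf_features(S, freqs):
    """Return the pair (P, F) from a spectrogram S of shape (N, M).

    S[n, m] is the spectrogram at time t_n and positive frequency freqs[m];
    freqs is assumed to be uniformly spaced, and every row of S is assumed
    to have nonzero total power.
    """
    S = np.asarray(S, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    df = freqs[1] - freqs[0]
    P_n = S.sum(axis=1) * df                        # local power (assumed unnormalized)
    S_norm = S / S.sum(axis=1, keepdims=True)       # normalized local spectrum
    F_n = S_norm @ freqs                            # local mean frequency
    F2_n = S_norm @ freqs ** 2                      # local second-order moment
    bandwidth = np.sqrt(np.maximum(F2_n - F_n ** 2, 0.0))
    P = P_n.std() / P_n.mean()                      # relative fluctuation of power
    F = F_n.std() / bandwidth.mean()                # frequency fluctuation vs. bandwidth
    return P, F
```

A spectrogram whose rows are all identical gives P = F = 0 up to rounding, while a pure amplitude modulation of the rows changes P only, which matches the interpretation of the two features given above.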
The results are shown for $T_0 = T/20$, $T$ and $20\,T$, which allows stationarity to be considered
relative to the ratio between the observation time $T$ and the modulation period $T_0$. They can
be summarized as follows:

Macroscale. For a small modulation period (or a large observation time, i.e., when $T_0 \ll T$),
the situation can be considered as stationary, due to the observation of many similar oscillations
over the observed time scale. This is reflected by a test signal (P, F) feature (red circle, see
caption) which lies inside the region defined by the 1-class SVM for the stationary surrogates.

Mesoscale. For a medium observation time ($T \approx T_0$), the local evolution due to the
modulation is prominent and the red circle for the modulated signal is well outside the stationary
region, in accordance with a situation that can be referred to as nonstationary.

Microscale. Finally, if $T_0 \gg T$, the result turns back to stationarity because no significant
change in the amplitude or the frequency is observed over the considered time scale.
Two remarks can be made with respect to these results. First, although both AM and FM signals are
seen as nonstationary in the mesoscale regime, in the AM case the nonstationarity manifests itself
through a deviation of the local power P, whereas in the FM case, it is the local frequency F that
is mostly different from the stationary class. Second, it turns out that, in the microscale
regime, the deterministic stationarity (FM case) naturally ends up with a much larger dispersion
in P than in F since, by construction, the spectrum is narrowband. Moreover, the randomization
which underlies the construction of surrogates necessarily ends up with more power fluctuations in
the stationarized data than in the original test signal, and hence with a (P, F) pair which, at
best, lies on the border of the support of the stationary class. This suggests that, in some
sense, the position of the (P, F) pair with respect to the stationary region gives information not
only about a possible nonstationarity but also about its type.
5. CONCLUSION
Testing for stationarity in signal processing and data analysis has already received some
attention, but maybe not as much as might be expected from its ubiquitous nature. In this paper,
we proposed an operational framework to measure and test departures from stationarity. Its
originality is that it makes use of a family of stationarized realizations of the analyzed signal,
called surrogates, for defining the null hypothesis. Time-frequency features are then extracted
from the surrogates, and used as a learning set to train a 1-class SVM which encompasses what may
be considered stationary.

A number of extensions to the present work are possible. Here, we use in Section 4 specific
bidimensional features, but one can think of directly using general TF representations emerging
from the use of SVM machinery for Time-Frequency, such as in [16]. Another possibility is to
generalize the present approach to other forms of stationarity. This requires defining new
specific stationarizing tools and signal representations, in the spirit of [17, 18].
6. REFERENCES
[1] S. Mallat, G. Papanicolaou, and Z. Zhang, "Adaptive covariance estimation of locally
stationary processes," Ann. of Stat., vol. 24, no. 1, pp. 1-47, 1998.
[2] W. Martin and P. Flandrin, "Detection of changes of signal structure by using the Wigner-Ville
spectrum," Signal Proc., vol. 8, pp. 215-233, 1985.
[3] R.A. Silverman, "Locally stationary random processes," IRE Trans. on Info. Theory, vol. 3, pp.
182-187, 1957.
[4] M. Davy and S. Godsill, "Detection of abrupt signal changes using Support Vector Machines: An
application to audio signal segmentation," in Proc. IEEE ICASSP-02, Orlando (FL), 2002.
[5] H. Laurent and C. Doncarli, "Stationarity index for abrupt changes detection in the
time-frequency plane," IEEE Signal Proc. Lett., vol. 5, no. 2, pp. 43-45, 1998.
[6] P. Flandrin, Time-Frequency / Time-Scale Analysis, Academic Press, 1999.
[7] M. Bayram and R.G. Baraniuk, "Multiple window time-varying spectrum estimation," in Nonlinear
and Nonstationary Signal Processing, W.J. Fitzgerald et al., Eds., 2000, Cambridge Univ. Press.
[8] J. Theiler et al., "Testing for nonlinearity in time series: the method of surrogate data,"
Physica D, vol. 58, no. 1-4, pp. 77-94, 1992.
[9] T. Schreiber and A. Schmitz, "Improved surrogate data for nonlinearity tests," Phys. Rev.
Lett., vol. 77, no. 4, pp. 635-638, 1996.
[10] C.J. Keylock, "Constrained surrogate time series with preservation of the mean and variance
structure," Phys. Rev. E, vol. 73, pp. 030767.1-030767.4, 2006.
[11] J. Xiao, P. Borgnat, and P. Flandrin, "Testing stationarity with time-frequency surrogates,"
in Proc. EUSIPCO-07, Poznan (PL), Sept. 2007, to appear.
[12] B. Schölkopf, J.C. Platt, J.S. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the
support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1471,
2001.
[13] A.B. Gardner, A.M. Krieger, G. Vachtsevanos, and B. Litt, "One-class novelty detection for
seizure analysis from intracranial EEG," Journal of Machine Learning Research, vol. 7, pp.
1025-1044, 2006.
[14] D.M.J. Tax and R.P.W. Duin, "Support vector data description," Machine Learning, vol. 54, pp.
45-66, 2004.
[15] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, "SVM and Kernel Methods Matlab
Toolbox," Perception Systèmes et Information, INSA de Rouen,
http://asi.insa-rouen.fr/arakotom/toolbox/index.html, 2005.
[16] P. Honeine, C. Richard, and P. Flandrin, "Time-frequency learning machines," IEEE Trans. on
Signal Proc., vol. 55, no. 7 (Part 2), pp. 3930-3936, 2007.
[17] P. Flandrin, P. Borgnat, and P.-O. Amblard, "From stationarity to self-similarity, and back:
Variations on the Lamperti transformation," in Processes with Long-Range Correlations: Theory and
Applications, G. Rangarajan and M. Ding, Eds., June 2003, vol. 621 of Lecture Notes in Physics,
pp. 88-117, Springer-Verlag.
[18] P. Borgnat, P.-O. Amblard, and P. Flandrin, "Stochastic invariances and Lamperti
transformations for stochastic processes," J. Phys. A: Math. Gen., vol. 38, no. 10, pp. 2081-2101,
Feb. 2005.