Independent Component Analysis For Time Series Separation

ICA
Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the
identification & separation of mixtures of sources with little prior
information.
Applications include:
Audio Processing
Medical data
Finance
Array processing (beamforming)
Coding
and most applications where Factor Analysis and PCA are currently used.
While PCA seeks directions that represent the data best in a least-squares ||x0 - x||^2 sense,
ICA seeks directions that are most independent of each other.
We will concentrate on time-series separation of multiple targets.
The simple Cocktail Party Problem

Sources: s1, s2
Observations: x1, x2
Mixing matrix A: x = As
n sources, m = n observations
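The mixing model x = As above can be sketched numerically. A minimal numpy sketch; the two source waveforms and the entries of A are made-up illustration values, not from the slides:

```python
import numpy as np

# Two hypothetical source signals (n = 2): a sine and a square wave.
t = np.linspace(0, 1, 500)
s1 = np.sin(2 * np.pi * 5 * t)
s2 = np.sign(np.sin(2 * np.pi * 3 * t))
s = np.vstack([s1, s2])              # sources, shape (n, T)

# A made-up square mixing matrix A (m = n = 2 observations).
A = np.array([[0.8, 0.3],
              [0.4, 0.7]])

# Each "microphone" records a weighted sum of the sources: x = A s.
x = A @ s                            # observations, shape (2, 500)
print(x.shape)
```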
Motivation

Two Independent Sources, Mixture at two Mics:

x1(t) = a11 s1 + a12 s2
x2(t) = a21 s1 + a22 s2

The a_ij depend on the distances of the microphones from the speakers.
Motivation
Get the Independent Signals out of the Mixture
ICA Model (Noise Free)

Use a statistical latent-variables model: random variables s_k instead of time signals.

x_j = a_j1 s_1 + a_j2 s_2 + ... + a_jn s_n, for all j

x = As
The ICs s are latent variables and are unknown, AND the mixing matrix A is also unknown.
Task: estimate A and s using only the observable random vector x.
Let's assume that the no. of ICs = no. of observable mixtures,
and that A is square and invertible.
So after estimating A, we can compute W = A^-1 and hence
s = Wx = A^-1 x
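With a known (or already estimated) square A, unmixing is a plain matrix inverse. A minimal numpy sketch with made-up uniform sources:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1, 1, size=(2, 1000))   # two made-up independent sources
A = np.array([[2.0, 3.0],                # square, invertible mixing matrix
              [2.0, 1.0]])
x = A @ s                                # observed mixtures

# In real ICA, A must first be estimated; here it is known, so:
W = np.linalg.inv(A)
s_rec = W @ x                            # s = W x = A^-1 x
print(np.allclose(s_rec, s))             # True
```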
Illustration

2 ICs with uniform distribution

p(s_i) = 1 / (2 sqrt(3)) if |s_i| <= sqrt(3), 0 otherwise

i.e. zero mean and variance equal to 1. The mixing matrix A is

A = [ 2  3
      2  1 ]

The edges of the parallelogram (the support of the joint density of the mixtures) are in the
direction of the cols of A. So if we can estimate the joint pdf of x1 & x2 and then
locate the edges, we can estimate A.
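This illustration can be reproduced numerically: uniform unit-variance ICs, pushed through the example mixing matrix, fill a parallelogram whose edges point along the columns of A. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
# Uniform ICs on [-sqrt(3), sqrt(3)]: zero mean, unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

A = np.array([[2.0, 3.0],        # the example mixing matrix
              [2.0, 1.0]])
x = A @ s

print(np.var(s, axis=1))         # both close to 1
# The mixture cloud is the parallelogram with corners A @ (+-sqrt(3), +-sqrt(3)):
corners = A @ (np.sqrt(3) * np.array([[1, 1, -1, -1],
                                      [1, -1, 1, -1]]))
print(corners.T)
```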
Restrictions

The s_i are statistically independent:

p(s1, s2) = p(s1) p(s2)
Nongaussian distributions

The joint density of unit-variance gaussian s1 & s2 is rotationally symmetric. So it doesn't contain any information about the directions of the cols of the mixing matrix A, and A cannot be estimated. If only one IC is gaussian, the estimation is still possible.
p(x1, x2) = (1 / (2 pi)) exp( -(x1^2 + x2^2) / 2 )
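The rotational symmetry of the gaussian case can be checked numerically: rotating two independent gaussians leaves their joint statistics unchanged, so no mixing direction can be identified. A numpy sketch (the rotation angle is made up):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal((2, 200000))    # two independent unit-variance gaussians

# Rotate by an arbitrary angle; the joint density (1/2pi) exp(-(x1^2+x2^2)/2)
# is rotationally symmetric, so the rotated data looks statistically identical.
th = 0.9
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
y = R @ x

print(np.cov(x))   # ~ identity
print(np.cov(y))   # ~ identity too: the rotation R leaves no trace
```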
Ambiguities

Can't determine the variances (energies) of the ICs:
Both s & A are unknown, so any scalar multiple of one of the sources can always be cancelled by dividing the corresponding col of A by it. Fix the magnitudes of the ICs by assuming unit variance: E{s_i^2} = 1. Only the ambiguity of sign remains.

Can't determine the order of the ICs:
Terms can be freely exchanged, because both s and A are unknown. So we can call any IC the first one.
ICA Principle (Non-Gaussian is Independent)

The key to estimating A is nongaussianity. By the CLT, the distribution of a sum of independent random variables tends toward a gaussian distribution.

(figure: densities f(s1), f(s2) and f(x1) = f(s1 + s2), with the sum visibly closer to gaussian)

Consider

y = w^T x = w^T A s = z^T s

where w is one of the rows of matrix W and z = A^T w. So y is a linear combination of the s_i, with weights given by the z_i. Since a sum of independent r.v.s is more gaussian than the individual r.v.s, z^T s is more gaussian than either of the s_i, AND becomes least gaussian when it equals one of the s_i. So we take w as a vector which maximizes the non-gaussianity of w^T x. Such a w corresponds to a z with only one nonzero component, so we get back one of the s_i.
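The principle "maximize the nongaussianity of w^T x" can be tested directly. In this numpy sketch the mixing matrix is a made-up rotation (so the data is already white), and |kurt(w^T x)| is swept over unit vectors w:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two sub-gaussian (uniform, unit-variance) ICs, mixed by a rotation.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 200000))
theta0 = 0.6                              # made-up mixing angle
A = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
x = A @ s

def kurt(y):
    # Sample kurtosis E{y^4} - 3 (E{y^2})^2 (zero for a gaussian).
    return np.mean(y**4) - 3 * np.mean(y**2)**2

# Sweep unit vectors w = (cos t, sin t); |kurt(w^T x)| peaks when w^T x
# equals one of the ICs (up to the sign/order ambiguity).
thetas = np.linspace(0, np.pi, 181)
vals = [abs(kurt(np.array([np.cos(t), np.sin(t)]) @ x)) for t in thetas]
best = thetas[int(np.argmax(vals))]
print(best)   # near theta0 or theta0 + pi/2
```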
Measures of Non-Gaussianity

We need a quantitative measure of non-gaussianity for ICA estimation.

Kurtosis (gauss = 0, but sensitive to outliers):
kurt(y) = E{y^4} - 3 (E{y^2})^2

Entropy (gauss = largest):
H(y) = - Integral f(y) log f(y) dy

Neg-entropy (gauss = 0, but difficult to estimate):
J(y) = H(y_gauss) - H(y)

Approximations:
J(y) ~ (1/12) E{y^3}^2 + (1/48) kurt(y)^2
J(y) ~ [ E{G(y)} - E{G(v)} ]^2

where v is a standard gaussian random variable and, e.g.:
G1(y) = (1/a) log cosh(a y)
G2(y) = -exp(-y^2 / 2)
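These measures are easy to evaluate on samples. A numpy sketch comparing kurtosis and the logcosh negentropy approximation on gaussian, sub-gaussian (uniform) and super-gaussian (Laplacian) data; sample sizes and seeds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100000

def kurt(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2
    return np.mean(y**4) - 3 * np.mean(y**2)**2

def negentropy_approx(y):
    # J(y) ~ [E{G(y)} - E{G(v)}]^2 with G(y) = log cosh(y) (a = 1);
    # E{G(v)} for a standard gaussian v is estimated by sampling here.
    v = rng.standard_normal(y.size)
    return (np.mean(np.log(np.cosh(y))) - np.mean(np.log(np.cosh(v))))**2

g = rng.standard_normal(N)                        # gaussian
u = rng.uniform(-np.sqrt(3), np.sqrt(3), N)       # sub-gaussian, unit variance
l = rng.laplace(scale=1/np.sqrt(2), size=N)       # super-gaussian, unit variance

print(kurt(g), kurt(u), kurt(l))   # ~0, ~-1.2, ~+3
print(negentropy_approx(g))        # ~0: a gaussian has zero negentropy
```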
Data Centering & Whitening

Centering:
x <- x - E{x}
This doesn't mean that ICA cannot estimate the mean; it just simplifies the algorithm. The ICs are then also zero mean, because E{s} = W E{x}. After ICA, add W E{x} back to the zero-mean ICs.

Whitening:
We transform the x's linearly so that the x~ are white (uncorrelated, unit variance). This is done by EVD:
x~ = (E D^-1/2 E^T) x = E D^-1/2 E^T A s = A~ s
where E{x x^T} = E D E^T.
So we only have to estimate the orthonormal matrix A~. An orthonormal matrix has n(n-1)/2 degrees of freedom, so for a large-dimensional A we have to estimate only about half as many parameters. This greatly simplifies ICA. Reducing the dimension of the data (keeping the dominant eigenvalues) while whitening also helps.
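Centering and EVD whitening as described above, in a short numpy sketch (sources and mixing matrix are the made-up example values used earlier):

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 50000))  # unit-variance ICs
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s

# Centering: x <- x - E{x}.
x = x - x.mean(axis=1, keepdims=True)

# Whitening via EVD of the covariance E{x x^T} = E D E^T.
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d**-0.5) @ E.T       # V = E D^-1/2 E^T
x_white = V @ x                      # now E{x~ x~^T} = I

# The remaining mixing matrix A~ = V A is (approximately) orthonormal.
print(np.cov(x_white))               # ~ identity
print((V @ A) @ (V @ A).T)           # ~ identity as well
```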
Noisy ICA Model

x = As + n

A ... mxn mixing matrix
s ... n-dimensional vector of ICs
n ... m-dimensional random noise vector

Same assumptions as for the noise-free model, if we use measures of nongaussianity which are immune to gaussian noise. So gaussian moments are used as contrast functions, i.e.

J(y) ~ [ E{G(y)} - E{G(v)} ]^2, with G a gaussian kernel such as G(y) = (1 / (sqrt(2 pi) c)) exp(-y^2 / (2 c^2))

However, in pre-whitening the effect of noise must be taken into account:

x~ = (E{x x^T} - Sigma)^-1/2 x

where Sigma is the covariance matrix of the noise, and the quasi-whitened data obeys x~ = B s + n~.
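Quasi-whitening for the noisy model can be sketched the same way: subtracting the noise covariance Sigma (assumed known here) before the EVD makes the noise-free part of the transformed data orthogonally mixed. All numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 400000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))  # unit-variance ICs
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
sigma2 = 0.1                                  # assumed known noise variance
x = A @ s + np.sqrt(sigma2) * rng.standard_normal((2, N))

# Quasi-whitening: use E{x x^T} - Sigma instead of E{x x^T}.
Sigma = sigma2 * np.eye(2)
d, E = np.linalg.eigh(np.cov(x) - Sigma)
V = E @ np.diag(d**-0.5) @ E.T                # (E{x x^T} - Sigma)^-1/2
x_qw = V @ x                                  # x~ = B s + n~

# B = V A is (approximately) orthogonal, as in the noise-free whitened case.
B = V @ A
print(B @ B.T)                                # ~ identity
```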