ICINCO2010

FEATURE EXTRACTION AND SELECTION FOR AUTOMATIC
SLEEP STAGING USING EEG
Hugo Simões, Gabriel Pires

Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
[email protected], [email protected]
Urbano Nunes, Vitor Silva

Department of Electrical Engineering, University of Coimbra – Polo II, Coimbra, Portugal
Keywords: Feature extraction, feature selection, EEG sleep staging, Bayesian classifier.
Abstract: Sleep disorders affect a large percentage of the population. These disorders are usually diagnosed by polysomnography, which requires the patient's hospitalization. Low-cost ambulatory diagnostic devices can be used in certain cases, especially when a full or rigorous sleep staging is not needed. In this paper, several methods to extract features from 6 EEG channels are described and their performance is evaluated. The features are selected using the R-square Pearson correlation coefficient (Guyon and Elisseeff, 2003), which provides a Bayesian classifier with the most discriminative features. The results demonstrate the effectiveness of the methods in discriminating several sleep stages and rank the feature extraction methods. The best discrimination was achieved with relative spectral power, slow wave index, harmonic parameters and Hjorth parameters.
1 INTRODUCTION

About a third of the population suffers from sleep disorders, including the obstructive sleep apnea syndrome (Doroshenkov et al., 2007). Such disorders are diagnosed by polysomnography (PSG), which requires the patient's hospitalization, with the associated costs and discomfort. Ambulatory diagnostic devices may play an important role in mitigating these factors. The PSG consists of the acquisition of various electrical biosignals, including the electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG). The signals are segmented into epochs of 30 seconds and each epoch is assigned to a sleep stage by an expert (Iber et al., 2007). This is a tedious and time-consuming task. Automatic sleep stage classification (ASSC) is therefore an attractive solution. However, the general opinion is that most experts do not rely on ASSC software because it usually performs poorly, i.e. it shows a high level of disagreement with the expert. One of the main reasons is the high variability between subjects, which makes it difficult to obtain robust classification models. In addition, the expert sometimes uses heuristics that are difficult to implement in algorithms and combines a macro and a micro perspective of the overall epochs. It should be highlighted that there is also some level of disagreement between experts.

This work describes part of an apnea detection system to be used in ambulatory situations by patients at home. It is not intended to replace the PSG, but only to determine, first, whether the patient is asleep when an apnea episode occurs and, second, in which sleep stage the episode occurs. The stage classification relies only on EEG signals. This paper investigates several feature extraction methods and compares their performance, aiming at improved results in the following sleep stage discriminations: wake (W) vs. sleep (S), NREM (NR) sleep vs. REM (R) sleep, NREM N1 vs. NREM N2 + NREM N3, NREM N1 + NREM N2 vs. NREM N3, NREM N1 vs. NREM N2, NREM N2 vs. NREM N3 and NREM N1 vs. REM sleep (Iber et al., 2007). Moreover, a feature selection method based on the squared Pearson correlation coefficient (Guyon and Elisseeff, 2003), henceforth designated the R-square criterion, is applied to find a reduced set of discriminative features. These features are used to provide additional information to the expert and to classify each sleep stage automatically.
Each stage is classified with some degree of certainty by a Bayesian classifier using 2-class detection. Sleep is scored according to the rules of the American Academy of Sleep Medicine (AASM) Manual for Scoring Sleep (Iber et al., 2007), an update of the rules of Rechtschaffen and Kales (Rechtschaffen and Kales, 1968). According to the AASM Manual, sleep is divided into five stages: wake, NREM (Non Rapid Eye Movement) sleep (N1, N2 and N3) and REM (Rapid Eye Movement) sleep. Considering only EEG signals, the wake stage is characterized by low-amplitude alpha activity (8-13 Hz); N1 by low-amplitude theta activity (3-7 Hz); in N2 the predominant frequencies are in the 0.7-4 Hz range and sleep spindles and K-complexes arise; N3 presents at least 20% of the epoch with delta activity (<2 Hz) of amplitude greater than 75 µV; REM is characterized by low-amplitude activity mostly between 2 and 6 Hz. Sleep staging based only on EEG presents some difficulties because different stages, such as wake, REM and NREM N1, present similar patterns.

ASSC has been addressed by many research groups. Tang et al. (2007) applied the Hilbert-Huang transform and the wavelet transform to extract harmonic parameters from EEG signals. Hese et al. (2001) implemented a semi-automatic method based on the k-means clustering algorithm. Ebrahimi et al. (2008) used neural networks and wavelet packet coefficients to discriminate between different sleep stages. Doroshenkov et al. (2007) developed a classification algorithm based on Hidden Markov Models using only EEG signals. Zoubek et al. (2007) used feature selection algorithms to find the relevant features extracted from PSG signals. Schwaibold et al. (2003) implemented a neuro-fuzzy algorithm to model the rules of Rechtschaffen and Kales. Although some studies show good performance, they are limited to specific groups of patients, and it has not yet been possible to create generalized models that provide results accepted by the experts. Moreover, it remains difficult to discriminate between certain sleep stages using only EEG signals.

2 DATABASE

Data from all-night PSG records were provided by the Laboratory of Sleep of Centro Hospitalar de Coimbra. The PSG was recorded with a Somnostar Pro system from Viasys at a sampling frequency of 200 Hz. The database comprises seven patients (five males and two females) aged between 27 and 64 years (mean = 50 years; standard deviation = 12.88 years). Only six EEG channels were used: F3-A2, C3-A2, O1-A2, F4-A1, C4-A1 and O2-A1. All recordings were segmented into epochs of 30 seconds and labelled by an expert.

The dataset initially comprised 6558 epochs. To avoid over-fitting in the learning and testing of the algorithms, the number of epochs in the database was reduced to 3000, balancing the distribution of epochs across sleep stages according to a normal night sleep distribution, as presented in Table 1. Since N2 and N1 are the stages with the highest and lowest occurrence during a normal night of sleep, respectively, they were set as the stages with the largest and smallest number of epochs in the dataset, and the other sleep stages have a number of epochs between these limits.

Table 1: Full and reduced datasets.

Sleep stage       Wake   NREM N1   NREM N2   NREM N3   REM
Full dataset      1293   784       2431      1154      896
Reduced dataset   560    410       760       520       750

3 AUTOMATIC SLEEP SCORING

The classification methodology is illustrated in the block diagram presented in Figure 1. The EEG signals are filtered and segmented. Different types of feature extraction are then applied. The extracted features are selected using the R-square correlation measure before being passed to the classification stage.
[Figure 1: Classification methodology. Blocks of the diagram: EEG Signal; Patient Database; Patient Testing; Pre-Processing (notch filter, Butterworth filter, segmentation); Feature Extraction (RSP, SWI, harmonic parameters, parameters of Hjorth, entropy, skewness and kurtosis); Feature Selection (R-square); Classification (Bayesian classifier, LDA, decision tree).]

The classification stage is a Bayesian-based classifier that receives the most discriminative features. The training process uses data from a pool of patients and some data from the patient being monitored, namely the wake epochs recorded before the patient falls asleep. In this way, the wake model can be improved. Moreover, the wake epochs can be used for calibration of the sleep stages.

The performance of the feature extraction algorithms was analysed through ten-fold cross validation. The patients' database is partitioned into ten groups with the same number of epochs from each sleep stage. Nine of them are used to build the classification models and one is used for testing. This process is repeated 10 times, using a different group for testing each time.

4 FEATURE EXTRACTION AND SELECTION

In ASSC, the EEG is traditionally analysed in the frequency domain because, according to the AASM Manual, each sleep stage is essentially distinguished by some spectral properties. However, temporal analysis also provides useful information. For each EEG channel, 34 features were extracted using the methods described in the following.

Spectral analysis provides some of the most important features. For each sleep epoch, an autoregressive method solved by the Yule-Walker algorithm was applied to estimate the power spectral density (PSD) (Yilmaz et al., 2007). The spectrum is divided into ten frequency sub-bands, as shown in Table 2. For each sub-band, the relative spectral power (RSP) was computed. This parameter is given by the ratio between the sub-band spectral power (BSP) and the total spectral power, i.e. the sum of the BSP of all 10 sub-bands. This normalization is important to increase classification robustness during the recording session.

Table 2: Spectral sub-bands used in RSP computation.

Band    Sub-band   Bandwidth {fL, fH} (Hz)
Delta   Delta 1    {0.5, 2.0}
Delta   Delta 2    {2.0, 4.0}
Theta   Theta 1    {4.0, 6.0}
Theta   Theta 2    {6.0, 8.0}
Alpha   Alpha 1    {8.0, 10.0}
Alpha   Alpha 2    {10.0, 12.0}
Sigma   Sigma 1    {12.0, 14.0}
Sigma   Sigma 2    {14.0, 16.0}
Beta    Beta 1     {16.0, 25.0}
Beta    Beta 2     {25.0, 35.0}

Some spectral bands can be highlighted over the slow wave bands by means of the slow wave index (SWI), defined by the following ratios:

$$\mathrm{DSI} = \frac{\mathrm{BSP}_{\mathrm{Delta}}}{\mathrm{BSP}_{\mathrm{Theta}} + \mathrm{BSP}_{\mathrm{Alpha}}} \tag{1}$$

$$\mathrm{TSI} = \frac{\mathrm{BSP}_{\mathrm{Theta}}}{\mathrm{BSP}_{\mathrm{Delta}} + \mathrm{BSP}_{\mathrm{Alpha}}} \tag{2}$$

$$\mathrm{ASI} = \frac{\mathrm{BSP}_{\mathrm{Alpha}}}{\mathrm{BSP}_{\mathrm{Delta}} + \mathrm{BSP}_{\mathrm{Theta}}} \tag{3}$$

where DSI, TSI and ASI stand for delta-slow-wave index, theta-slow-wave index and alpha-slow-wave index, respectively (Agarwal et al., 2001).

Harmonic parameters allow the analysis of a specific band in the EEG spectrum. They include three parameters: center frequency (f_c), bandwidth (f_σ) and spectral value at the center frequency (S_fc), defined as follows (Tang et al., 2007):

$$f_c = \frac{\sum_{f=f_L}^{f_H} f\,P_{xx}(f)}{\sum_{f=f_L}^{f_H} P_{xx}(f)} \tag{4}$$

$$f_\sigma = \left[\frac{\sum_{f=f_L}^{f_H} (f - f_c)^2\,P_{xx}(f)}{\sum_{f=f_L}^{f_H} P_{xx}(f)}\right]^{1/2} \tag{5}$$

$$S_{f_c} = P_{xx}(f_c) \tag{6}$$

where P_xx(f) denotes the PSD, which is calculated for the frequency bands {fL, fH} (see Table 2).

The Hjorth parameters provide dynamic temporal information about the EEG signal. Considering the epoch x, the Hjorth parameters are computed from the variance of x, var(x), and of its first and second derivatives x' and x'', according to (Ansari-Asl et al., 2007):

$$\mathrm{Activity} = \mathrm{var}(x) \tag{7}$$

$$\mathrm{Mobility} = \sqrt{\frac{\mathrm{var}(x')}{\mathrm{var}(x)}} \tag{8}$$

$$\mathrm{Complexity} = \sqrt{\frac{\mathrm{var}(x'')\,\mathrm{var}(x)}{\mathrm{var}(x')^{2}}} \tag{9}$$

The entropy gives a measure of signal disorder and can provide relevant information for the detection of some sleep disturbances. It is computed from the histogram of the EEG samples of each sleep epoch, according to (Zoubek et al., 2007):

$$\mathrm{Entropy} = -\sum_{i=1}^{N} \frac{n_i}{n}\,\ln\!\left(\frac{n_i}{n}\right). \tag{10}$$
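The spectral and Hjorth features of Eqs. (1)-(9) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the PSD is taken as already estimated on a uniform frequency grid (the Yule-Walker estimation itself is not shown), band powers are plain sums of PSD samples, the Delta, Theta and Alpha powers in the SWI ratios are assumed to cover the full 0.5-4, 4-8 and 8-12 Hz bands, derivatives are approximated by first differences, and all function names are hypothetical.

```python
import math

# Sub-band edges in Hz, copied from Table 2.
SUB_BANDS = {
    "Delta 1": (0.5, 2.0), "Delta 2": (2.0, 4.0),
    "Theta 1": (4.0, 6.0), "Theta 2": (6.0, 8.0),
    "Alpha 1": (8.0, 10.0), "Alpha 2": (10.0, 12.0),
    "Sigma 1": (12.0, 14.0), "Sigma 2": (14.0, 16.0),
    "Beta 1": (16.0, 25.0), "Beta 2": (25.0, 35.0),
}


def band_power(freqs, psd, f_lo, f_hi):
    """Band spectral power (BSP): sum of the PSD samples with f_lo <= f < f_hi."""
    return sum(p for f, p in zip(freqs, psd) if f_lo <= f < f_hi)


def relative_spectral_power(freqs, psd):
    """RSP of each sub-band: its BSP divided by the sum of the ten BSP values."""
    bsp = {name: band_power(freqs, psd, lo, hi)
           for name, (lo, hi) in SUB_BANDS.items()}
    total = sum(bsp.values())
    return {name: value / total for name, value in bsp.items()}


def slow_wave_indices(freqs, psd):
    """DSI, TSI and ASI of Eqs. (1)-(3); the full-band powers are an assumption."""
    delta = band_power(freqs, psd, 0.5, 4.0)
    theta = band_power(freqs, psd, 4.0, 8.0)
    alpha = band_power(freqs, psd, 8.0, 12.0)
    return {
        "DSI": delta / (theta + alpha),
        "TSI": theta / (delta + alpha),
        "ASI": alpha / (delta + theta),
    }


def harmonic_parameters(freqs, psd, f_lo, f_hi):
    """Center frequency, bandwidth and PSD value at the center, Eqs. (4)-(6)."""
    band = [(f, p) for f, p in zip(freqs, psd) if f_lo <= f <= f_hi]
    total = sum(p for _, p in band)
    fc = sum(f * p for f, p in band) / total
    f_sigma = math.sqrt(sum((f - fc) ** 2 * p for f, p in band) / total)
    # The PSD is only known on the grid, so take the sample closest to fc.
    s_fc = min(band, key=lambda fp: abs(fp[0] - fc))[1]
    return fc, f_sigma, s_fc


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def hjorth_parameters(x):
    """Activity, mobility and complexity, Eqs. (7)-(9), using first differences."""
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) * activity) / variance(dx)
    return activity, mobility, complexity


if __name__ == "__main__":
    # Toy flat spectrum on a 0.5 Hz grid, used only to exercise the functions.
    freqs = [0.5 * k for k in range(1, 70)]
    psd = [1.0] * len(freqs)
    print(relative_spectral_power(freqs, psd))
    print(slow_wave_indices(freqs, psd))
    print(harmonic_parameters(freqs, psd, 8.0, 12.0))
    print(hjorth_parameters([0, 1, 0, -1, 0, 1, 0, -1]))
```

Working from a precomputed PSD keeps the sketch independent of the spectral estimator, so the same functions apply whether the PSD comes from an autoregressive model or from another estimator.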
In the entropy expression of Eq. (10), n is the number of samples within the sleep epoch, N is the number of bins used to compute the histogram and n_i is the number of samples within the i-th bin.

The skewness is a measure of symmetry. The kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Defining the k-th order moment m_k as (Zoubek et al., 2007)

$$m_k = \frac{1}{n}\sum_{i=1}^{n}\left(y(i) - \bar{y}\right)^{k}, \tag{11}$$

where n is the number of samples of an epoch and ȳ is the mean of these samples, the skewness and kurtosis are given by

$$\mathrm{skewness} = \frac{m_3}{m_2\sqrt{m_2}} \tag{12}$$

and

$$\mathrm{kurtosis} = \frac{m_4}{m_2\,m_2}. \tag{13}$$

Features are usually selected by wrapper or filter methods using sequential approaches. The results of wrapper methods depend on the choice of the classification algorithm. Our option was an R-square filter approach, which is independent of the classifier and is based on the Pearson correlation coefficient, defined as (Guyon and Elisseeff, 2003):

$$\Re = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}, \tag{14}$$

where X and Y represent two random distributions of samples, and cov and var designate covariance and variance, respectively. Considering x_i and y_i as the sample values of feature i labelled with class 1 and class 2, respectively, the value R(i) for feature i is given by:

$$R(i) = \frac{\sum_{k=1}^{m}(x_{i,k} - \bar{x}_i)(y_{i,k} - \bar{y}_i)}{\sqrt{\sum_{k=1}^{m}(x_{i,k} - \bar{x}_i)^{2}\,\sum_{k=1}^{m}(y_{i,k} - \bar{y}_i)^{2}}}, \tag{15}$$

where x̄_i and ȳ_i represent the mean values of x_i and y_i over the m samples. The R-square, computed as R(i)², provides a level of discrimination between the two classes. High values of R-square indicate large inter-class separation and small within-class variance. The R-square therefore provides a feature discrimination ranking.

5 BAYESIAN CLASSIFICATION

The conditional density function of class i is modelled as a multivariate distribution under the Gaussian assumption

$$P(Y \mid \mu_i, \Sigma_i) = K \exp\!\left(-\tfrac{1}{2}(Y - \mu_i)^{T}\,\Sigma_i^{-1}\,(Y - \mu_i)\right), \tag{16}$$

where

$$K = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}}, \tag{17}$$

Y is the feature vector resulting from the concatenation of the extracted features, n in Eq. (17) is the dimension of Y, and µ_i and Σ_i are, respectively, the mean vector and the covariance matrix computed for each class w_i from the training data. The Bayes decision function is written as:

$$\hat{w}(Y) = \arg\max\left\{\Delta_2\,p(Y \mid w_1)\,P(w_1),\; \Delta_1\,p(Y \mid w_2)\,P(w_2)\right\}, \tag{18}$$

where P(w_i) is the prior probability of the i-th class and ∆_i is an adjustment parameter to control the rate of false positives and false negatives (Heijden et al., 2004).

6 RESULTS AND DISCUSSION

The feature extraction process provides a vector of 204 features, 34 features for each EEG channel: 10 RSP, 3 SWI, 15 harmonic parameters, 3 Hjorth parameters, 1 entropy feature, 1 skewness and 1 kurtosis. Next, the features are sorted in decreasing order of level of discrimination by applying the R-square based selection approach. Figure 2 shows the percentage of disagreement for wake/sleep detection between our ASSC system and the expert classification (i.e. the percentage of epochs for which the automatic classification differs from the manual classification made by the expert) as a function of the number of features, i.e. the n most discriminative features with n = 1,…,52. The disagreement values are obtained from a ten-fold cross validation. The lowest disagreement value was reached using the first 19 ranked features. Table 3 presents the results for each binary classifier, using the 1, 2, 3 and 19 most discriminative features and all 204 features. Selecting the relevant features reduces the number of features used in the ASSC, leading to an increased robustness of the classifiers.

Feature selection also makes it possible to identify the types of features and the channels that lead to higher discrimination results for each 2-class discriminator.
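As an illustration of the ranking step, the sketch below computes the R-square of each feature and sorts the features in decreasing order. It is a minimal sketch rather than the implementation used in this work: it assumes the common filter formulation in which each feature column is correlated with a 0/1 class label, and the names `pearson_r` and `rank_features_by_r_square` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant feature (or a single class) carries no discrimination.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_features_by_r_square(features, labels):
    """Return (feature index, R-square) pairs sorted by decreasing R-square.

    features: list of epochs, each a list of feature values.
    labels:   list of 0/1 class labels, one per epoch (assumed encoding).
    """
    n_features = len(features[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in features]
        r = pearson_r(column, labels)
        scores.append((j, r * r))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: feature 0 follows the class label, feature 1 does not.
    toy_features = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
    toy_labels = [0, 0, 1, 1]
    ranking = rank_features_by_r_square(toy_features, toy_labels)
    top_n = [index for index, _ in ranking[:1]]  # keep the n best features
    print(ranking, top_n)
```

Keeping the first n entries of the ranking gives the n most discriminative features, which is the quantity varied on the horizontal axis of Figure 2.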
[Figure 2: Percentage of disagreement vs. number of features used in wake vs. sleep classification. Panel title: Wake vs. Sleep; x-axis: number of features (5 to 50); y-axis: percentage of disagreement (7 to 11).]

Table 3: Percentage of disagreement obtained using the 1, 2, 3 and 19 most discriminative features and all 204 features.

Number of features   1      2      3      19     204
W vs. S              11.4   10.7   8.8    7.0    16.7
R vs. NR             22.5   21.4   19.5   15.6   30.8
N1 vs. N2/N3         15.1   15.7   15.7   10.6   72.5
N1/N2 vs. N3         15.7   14.7   14.6   15.5   30.3
N1 vs. N2            21.9   22.6   18.5   15.6   63.9
N2 vs. N3            19.0   18.2   16.7   17.7   39.8
N1 vs. R             25.5   24.7   24.4   25.0   64.7
Mean                 18.7   18.3   16.9   15.3   45.5

Table 4 lists, for each 2-class discriminator, the feature types and channels found among the 20 most discriminative features. As can be seen, entropy (Ent), skewness (Skw) and kurtosis (Krt) never appear among the 20 most discriminative features. On the other hand, the most frequent are the RSP and the harmonic parameters (HP). Analysing the origin of the 20 most discriminative features for each case, the parameters of Hjorth (PHj) are most evident in N1/N2 vs. N3 and N2 vs. N3, but they have no weight in R vs. NR and N1 vs. R. According to the counts in Table 4, the harmonic parameters are most frequent in W vs. S, R vs. NR, N1 vs. N2/N3 and N1 vs. N2, and are less relevant in N1/N2 vs. N3, N2 vs. N3 and N1 vs. R. The RSP and SWI have a similar number of features in all discriminations, except for N1 vs. R, where the RSP has several features with good discrimination, and for N1 vs. N2, where SWI does not assume any importance. Analysing the EEG channels, it can be seen that O1-A2 (O1) and O2-A1 (O2) are the most relevant in discriminating wake vs. sleep; F3-A2 (F3) and F4-A1 (F4) in REM vs. NREM; and C3-A2 (C3) and C4-A1 (C4) in N2 vs. N3. In the remaining discriminations, the channels have a relatively uniform distribution, except in N1 vs. R, in which the channels O1-A2 and O2-A1 make no contribution. Figure 3 shows the types of features and the channels that lead to higher discrimination results, taking all discriminators together.

In summary, the best ranked discriminative features never include entropy, skewness or kurtosis. These parameters are related to the signal shape. However, since the EEG signal patterns are highly random, it is difficult to obtain useful information from these parameters. Instead, the set of most discriminative features between sleep stages was composed mainly of RSP and harmonic parameters. This result emphasizes that spectral analysis carries more discriminative information than temporal signal analysis, as already concluded in (Hese et al., 2001; Tang et al., 2007).

Table 4: Number of features of each type and from each channel within the 20 most discriminative features.

           W vs. S   R vs. NR   N1 vs. N2/N3   N1/N2 vs. N3   N1 vs. N2   N2 vs. N3   N1 vs. R   Total
Features
RSP        5         4          6              6              5           6           13         45
SWI        3         2          2              2              0           4           4          17
HP         9         14         8              3              12          2           3          51
PHj        3         0          4              9              3           8           0          27
Ent        0         0          0              0              0           0           0          0
Skw        0         0          0              0              0           0           0          0
Krt        0         0          0              0              0           0           0          0
Channels
F3         1         6          6              3              4           2           5          27
C3         1         3          4              5              4           5           5          27
O1         6         1          3              4              2           4           0          20
F4         2         5          3              2              4           1           5          22
C4         5         3          3              4              4           5           5          29
O2         5         2          1              2              2           3           0          15

[Figure 3: Number of times that each group of features and each channel appears in the 20 most discriminative features. x-axis: features (RSP, SWI, HP, PHj, Ent, Skw, Krt) and channels (F3, C3, O1, F4, C4, O2); y-axis: number of times it appears (0 to 50).]

On the other hand, all six EEG channels provide useful features for sleep stage discrimination. Analysing the results for each of the binary classifiers, the greatest disagreement occurs for N1 vs. R sleep. This is because, in terms of EEG, the patterns presented in these two stages are very similar.
A decision tree was also implemented based on 2-class detection, as represented in Figure 4. At each step, a new level was introduced, going from a wake/sleep classification to a classification of all stages. The results were compared with and without feature selection (Table 5). The improvements from feature selection are evident. The results obtained with our ASSC system are comparable to those obtained with other EEG-only methods described in the literature (Zoubek et al., 2007; Doroshenkov et al., 2007).

[Figure 4: Decision tree based on 2-class detection. Level 1: Epoch is classified as Wake or Sleep; Level 2: Sleep as REM or NREM; Level 3: NREM as N1 or N2/N3; Level 4: N2/N3 as N2 or N3.]

Table 5: Disagreement obtained using the 19 most discriminative features and all 204 features in 2, 3, 4 and 5 sleep stages classification.

                  Disagreement (%)
Classification    All features   19 features
2 Class           36             7
3 Class           62             18
4 Class           83             22
5 Class           83             29

7 CONCLUSIONS

In this paper, the use of several feature extraction methods was investigated in the context of EEG-based sleep staging. The first conclusion was that the most discriminative features were provided by RSP, SWI, harmonic parameters and parameters of Hjorth. All six EEG channels provide useful information. In addition, the application of the feature selection method improved, in general, the process of discrimination by selecting the set of features that provided a lower percentage of disagreement. One of the biggest problems in automatic sleep staging based on EEG is the similarity between the patterns of different sleep stages, such as REM and NREM N1. This can be improved by resorting to other biosignals, such as EOG and EMG. Another problem in ASSC is the high level of variability between patients. Using an ambulatory system, the patient can perform periodic recordings at home. In this way, the first session can be fully analysed by the expert. The labelled data can then be used to obtain classification models specific to the patient. Further sessions can then use these robust user-dependent models. This approach is presently under research.

REFERENCES

Ansari-Asl, K., Chanel, G., Pun, T., 2007, A Channel Selection Method for EEG Classification in Emotion Assessment Based on Synchronization Likelihood. In EUSIPCO'07, 1241-1245.
Doroshenkov, L., Konyshev, V., Selishchev, S., 2007, Classification of Human Sleep Stages Based on EEG Processing Using Hidden Markov Models. Biomedical Engineering, 41(1), 25-28.
Ebrahimi, F., Mikaeili, M., Estrada, E., Nazeran, H., 2008, Automatic Sleep Stage Classification Based on EEG Signals by Using Neural Networks and Wavelet Packet Coefficients. In IEEE EMBS'08, 1, 1151-1154.
Guyon, I., Elisseeff, A., 2003, An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
Heijden, F., Duin, R., Ridder, D., Tax, D., 2004, Classification, Parameter Estimation and State Estimation. John Wiley & Sons.
Hese, P., Philips, W., Koninck, J., Walle, R., Lemahieu, I., 2001, Automatic Detection of Sleep Stages Using the EEG. In IEEE EMBS'01, Proc. of, 1994-1947.
Iber, C., Ancoli-Israel, S., Chesson, A., Quan, S., 2007, The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications (1st ed.). Westchester, Illinois: American Academy of Sleep Medicine.
Rechtschaffen, A., Kales, A., 1968, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, US Government Printing Office, National Institute of Health Publications, Washington DC.
Schwaibold, M., Harms, R., Scholer, B., Pinnow, I., Cassel, W., Penzel, T., Becker, H., Bolz, A., 2003, Knowledge-Based Automatic Sleep-Stage Recognition – Reduction in the Interpretation Variability. Somnologie, 7, 59-65.
Tang, W. C., Lu, S. W., Tsai, X. M., Kao, C. Y., Lee, H. H., 2007, Harmonic Parameters with HHT and Wavelet Transform for Automatic Sleep Stages Scoring. Proc. of World Academy of Science, Engineering and Technology, 22, 414-417.
Yilmaz, A., Alkan, A., Asyali, M., 2007, Applications of parametric spectral estimation methods on detection of power system harmonics. Electric Power Systems Research (2008), 78, 683-693.
Zoubek, L., Charbonnier, S., Lesecq, S., Buguet, A., Chapolot, F., 2007, Feature selection for sleep/wake stages classification using data driven methods. Biomedical Signal Processing and Control, 2, 171-179.
