

Journal of Clinical Neurophysiology

19(2):136–143, Lippincott Williams & Wilkins, Inc., Philadelphia

© 2002 American Clinical Neurophysiology Society

Digital Tools in Polysomnography

*Rajeev Agarwal and †Jean Gotman

*Stellate Systems; and †Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University,
Montreal, Quebec, Canada

Summary: Recent advances in computing power and storage devices have made computer-based recording of polysomnograms (PSGs) very attractive. Digital PSGs offer the possibility of automating many tedious and time-consuming tasks of identifying sleep-related events. Automation not only alleviates these laborious tasks, but also introduces a measure of objectivity in the scoring of various discrete events. In this paper we briefly review some automatic methods that have been previously developed by the authors. Automatic sleep staging is emphasized, with some illustrative results on interscorer variations. We also discuss the leg movement event, respiratory event, sleep spindle, and rapid eye movement detection methods. Key Words: Polysomnography; Automatic sleep staging; Segmentation; Self-organization.

The replacement of paper tracings by computer-based recordings in polysomnography (PSG) has saved not only many trees and a lot of storage space, but it has provided flexibility in visual review, allowing the display of traces with varying characteristics such as filters, gains, and time scale. The processing power of computers also allows the automation of many tasks that are tedious and time-consuming, and introduces a degree of objectivity in decisions that are often quite subjective. The automation deals particularly with sleep staging and the detection of discrete events such as respiratory events, leg movements, eye movements, and spindles.

It must be remembered, however, that it is rare that these methods can be applied without human validation. This is for two reasons: The first is that the methods are not perfect and can sometimes err because of the presence of artifacts or an unexpected configuration of the event. The second, and probably the most important, relates to a fundamental problem of computer-based analysis: A computer program must have a precise definition of a problem to encode it into an algorithm. However, many of the definitions related to sleep staging or PSG events are imprecise; for instance, the universally accepted rules (Rechtschaffen and Kales, 1968, p. 5; hereafter referred to as the R&K rules) state: “Stage 1 is defined by a relatively low voltage . . . The transition from an alpha record to stage 1 is characterized by a decrease in the amount, amplitude and frequency of alpha activity.” The use of terms such as “low voltage” or “decrease in amplitude” requires subjective interpretation on the part of the human scorer. They must be transformed for the computer program into a fixed rule that may result in sleep staging that will agree with some scorers and not with others. Each user must therefore review the results of computer scoring or detection and make sure they are acceptable.

We review a recently developed method of computer-assisted sleep staging that incorporates the preference of each scorer before completing the analysis, and a series of methods for detecting discrete events that are of interest in sleep studies. Although automatic detection of spikes and seizures can also be useful during sleep studies, we do not review them here because they have been reviewed relatively recently (Gotman, 1999).

Address correspondence and reprint requests to Dr. Rajeev Agarwal, Stellate Systems, 345 Victoria Avenue, Suite 300, Montreal, Quebec, Canada, H3Z 2N2.


AUTOMATIC SLEEP STAGING

The diagnosis and treatment of patients with sleep-related problems require the classification of sleep into various sleep states. The current practice throughout the world is to use the classification scheme defined by the committee headed by Rechtschaffen and Kales (1968). This standard recommends scoring (more commonly referred to as staging) 20- or 30-second epochs of PSG consisting of EEG, electromyographic (EMG), and electro-oculographic (EOG) data into one of six key sleep stages: wake, light sleep (consisting of stages 1 and 2), slow-wave sleep (consisting of stages 3 and 4), and rapid eye movement (REM) stage according to the defined rules. The R&K rules were designed according to the sleep architecture of normal young adult subjects and are now used for all types of PSGs. The state of sleep is a continuum from light sleep to deep sleep that is interrupted periodically by REM sleep. Thus, the R&K rules are in fact a simplification to serve as a standardization for comparison. As discussed earlier, many of these rules are qualitative and are not defined precisely. Any criterion that attempts to characterize PSG epochs with distinct features is bound to lead to considerable variations in interpretation by different scorers. The level of variations in scoring is reflected by the interscorer agreement of 67 to 91% (Gaillard and Tissot, 1973; Kim et al., 1992; Kubicki et al., 1989; Stanus et al., 1987). Consequently, serious objection has been raised against the idea of manual scoring (Kubicki and Herrmann, 1996). Moreover, scoring of all-night PSG is tedious, time-consuming, and can become very difficult for abnormal records.

To combat some of the problems with manual staging of PSGs, during the last several decades many techniques of automating the staging process have been proposed. During the early years, automatic schemes based on period analysis (Itil et al., 1969), EEG spectra with multiple discriminant analysis (Larsen and Walter, 1970), and hybrid techniques that combine analog and digital techniques (Smith and Karacan, 1971; Smith et al., 1969) were proposed with varying performance. Various scoring criteria were used in the development of automatic methods and their performance was evaluated using varying techniques. More recently, the ideas of pattern recognition (Martin et al., 1972), wave detection and the Bayesian approach (Stanus et al., 1987), the interval histogram method (Kuwahara et al., 1988), expert system (Ray et al., 1986), and neural network approaches (Schaltenbrand, 1996) have also been proposed. With these methods the agreement with manual scoring ranges from 70 to 85% with specific datasets and testing conditions.

Many of these methods are rule based, for which the staging rules are translated by the developer based on experience and the data at hand. The often subjective nature of the rules, as in the terms “relatively low amplitude” or “decrease in amplitude,” must be quantified, which may not meet the preference of all users or all data types. Many rules require the setting of thresholds, which will clearly work for the data with which they are developed; however, as new data are tested these thresholds may not be appropriate. In the case of neural network-based methods, training data are required. The performance of automatic staging may not be acceptable on a dataset from a population other than the one used for training. Although there have been numerous attempts at automating the sleep staging process, none of them have found credibility in clinical practice, in part for the reasons mentioned previously. Many of the authors of these works readily admit that the performance of their method may deteriorate as the technical quality decreases or the population varies. It is clear that there is room for an automatic staging method using evolving or adaptive schemes that can adjust depending on the type and quality of PSGs. The various thresholds should be self-adjusting.

COMPUTER-ASSISTED SLEEP STAGING: SEGMENTATION AND SELF-ORGANIZATION

In a very simple definition of sleep architecture, REM and non-REM sleep cycle every 60 to 90 minutes throughout the night. The normal adult sleep architecture starts through the non-REM state, which consists of the four substates (R&K stages 1 to 4) describing the depth of sleep followed by the REM state. Thereafter, the non-REM and REM states cycle throughout the night. Within the non-REM state, each of the four R&K stages persists for a finite time and the transition between stages is subjective. An important aspect of sleep staging is the idea that a pattern of the PSG is assumed to exist for a finite time until a new pattern emerges, signaling the change of state. These patterns are recurrent throughout the night and their differentiation is based on primitive sleep-related patterns such as frequency, amplitude, sleep spindles, k-complexes, and so forth.

To address some of the problems discussed in the previous section, in a recent publication we presented the Computer-Assisted Sleep Staging (CASS) method (Agarwal and Gotman, 2001a), which takes advantage of
the notion that patterns are recurrent throughout the night. The method finds pseudonatural patterns in the PSG, which can be labeled subsequently by the user as one of the predefined R&K sleep stages or as a stage in any other sleep staging classification scheme of preference. We refer to these patterns as pseudonatural because of the ill-defined (requiring subjective interpretation) transitory epochs (for example, from stage wake to stage 1 or stage 1 to stage 2). In reality, the exact change of stage is only a methodological concept because sleep is a continuum. This sleep continuum can be split into any number of patterns defined by the user; hence the pseudonatural patterns. The method is based on the ideas of segmentation and self-organization to cluster the different PSG patterns. The underlying idea is to group staging epochs, in which the epochs in a group contain relatively similar patterns. Subsequently, each group can be labeled with any name to specify any given stage in the classification scheme of preference (Agarwal and Gotman, 2001b). The following summarizes the key steps in the method.

1. The PSG data channels relevant to staging (EEG from central and occipital derivation, submental EMG, and left and right EOGs) are broken down simultaneously into variable-length epochs in which the signals are relatively stationary (Agarwal and Gotman, 1999). During the course of the night, this yields several thousand five-channel, variable-length segments.
2. Each segment is parameterized by a set of basic sleep-related features (amplitude, frequency, presence of spindles, presence of REM, various spectral measures, etc.).
3. Based on the parameterization in step 2, the segments are clustered into a maximum of 8 to 10 groups using a self-organization scheme based on the k-means clustering method.
4. Segment clustering information is translated onto conventional, fixed-size staging epochs. Note that we have created clusters of epochs where each cluster represents, in theory, one type of pattern existing in the PSG.
5. The user is shown a few sample epochs from each cluster and is asked to associate a valid R&K stage with it. Using this process, we have incorporated user preference in staging.
6. The method uses the staged sample epochs to stage the remaining epochs in the PSG to generate a preliminary hypnogram.
7. Postprocessing. We use sleep-related primitive features and apply specific decisional criteria for each stage type to justify the stage assignment of each epoch in the preliminary hypnogram. During this step, it may be necessary to reclassify an epoch if, in the current context, the properties of the considered epoch clearly allow such a reclassification. For example, some stage 2 epochs may be reclassified as REM, wake, or stage 1 if the properties of the considered epoch clearly warrant such a change and the context favors the change. The thresholds needed for such rules are patient specific, and the algorithm learns them from the preliminary hypnogram.
8. For some of the reasons discussed at the beginning of this article, it is rare that results of automatic methods can be used without visual review. The final step requires the visual review of the automatically generated hypnogram, during which the reviewer can change the score of any epoch with which he disagrees. This process will yield a second hypnogram that we have termed the rescored hypnogram.

Agarwal and Gotman (2001a) showed that on a dataset with varying hypnogram types, the CASS method achieved relatively good performance (more than 80% agreement when compared with manual scoring). In this article, we selected subject C of that dataset to get some indication of interscorer differences. Five scorers (four registered PSG technologists with substantial clinical experience and one with notable research experience in scoring PSGs) were asked to score the PSG as described in Agarwal and Gotman (2001a). For each scorer, this generated three hypnograms: (1) scored manually (MANUAL), (2) scored automatically using CASS (AUTO; each scorer has a different AUTO score because the CASS system follows the preference of that scorer), and (3) an automatically scored hypnogram that has been corrected or rescored to the scorer’s satisfaction (RESCORED). The RESCORED set is essentially another set of MANUAL scoring. Table 1 shows the results of interscorer agreement for the three sets of hypnograms.

TABLE 1. Interscorer agreement (%) for the three sets of hypnograms

Scorer      Manual   Automatic   Rescored
A vs. B      74.6      91.6        77.1
A vs. C      84.4      94.2        84.3
A vs. D      80.1      93.3        77.0
A vs. E      78.5      93.3        79.3
B vs. C      75.5      95.0        82.9
B vs. D      71.1      95.8        77.6
B vs. E      71.2      95.8        78.8
C vs. D      82.5      96.7        85.6
C vs. E      80.5      96.7        82.3
D vs. E      78.6     100.0        76.7
Average      77.7      95.2        80.2
SD            4.5       2.3         3.3

SD, standard deviation.
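The pairwise figures in Table 1 are percentages of agreement between two hypnograms compared epoch by epoch. As an illustration only, the following Python sketch shows one way such figures could be computed. The function name `percent_agreement`, the stage labels, and the three toy hypnograms are hypothetical and are not taken from the study. The last lines recompute the Average and SD rows of the Manual column from the tabulated values, assuming the SD reported is the sample standard deviation.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_agreement(hyp_a, hyp_b):
    """Percentage of epochs given the same stage in two hypnograms."""
    if len(hyp_a) != len(hyp_b) or not hyp_a:
        raise ValueError("hypnograms must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(hyp_a, hyp_b))
    return 100.0 * matches / len(hyp_a)


# Toy hypnograms (hypothetical): one stage label per fixed-size epoch.
hypnograms = {
    "A": ["W", "1", "2", "2", "3", "4", "REM", "REM"],
    "B": ["W", "1", "1", "2", "3", "3", "REM", "REM"],
    "C": ["W", "W", "2", "2", "3", "4", "2", "REM"],
}

# Agreement for every pair of scorers, as in the rows of Table 1.
pairwise = {
    f"{x} vs. {y}": percent_agreement(hypnograms[x], hypnograms[y])
    for x, y in combinations(sorted(hypnograms), 2)
}
for pair, value in pairwise.items():
    print(f"{pair}: {value:.1f}%")

# Manual column of Table 1; the summary rows follow from the pairwise values.
manual = [74.6, 84.4, 80.1, 78.5, 75.5, 71.1, 71.2, 82.5, 80.5, 78.6]
print(f"Average {mean(manual):.1f}, SD {stdev(manual):.1f}")
```

With five scorers the same loop yields the ten pairs listed in the table; the sketch treats every epoch equally and makes no correction for agreement expected by chance.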

Average interscorer agreement for the MANUAL set is 77.7% (range, 71.1 to 84.4%), for the AUTO set it is 95.2% (range, 91.6 to 100%), and for the RESCORED set it is 80.2% (range, 76.7 to 85.6%). As expected, the interscorer agreement for the AUTO set is very high, because differences come only from the small sample of epochs shown to each scorer. The interscorer agreement for the RESCORED set is a little better than the MANUAL set because each scorer starts from hypnograms that are similar (the AUTO set). Comparison of AUTO and MANUAL hypnograms across reviewers showed an average agreement of 65.7 ± 7.2% (standard deviation), and for the AUTO versus the RESCORED hypnograms the average agreement was 78.0 ± 5.7%. Considering that the expected interscorer agreement of the MANUAL set is 77.7% (column 1 in Table 1), the average 65.7% agreement of MANUAL versus AUTO scoring is very acceptable, particularly if the AUTO-generated hypnograms are considered as a first pass. In practice, however, the user is more likely to score the PSG automatically and then correct it. Thus, the 78% agreement between the AUTO and the RESCORED sets across reviewers provides a more meaningful indication of performance of the CASS method. The 78% agreement is approximately the same as the average interscorer agreements for both the MANUAL and the RESCORED sets in Table 1. It is also interesting to note that the reviewers found that it took, on average, twice as long to score the PSG manually compared with using the CASS method to generate the hypnogram and then correct it (i.e., to obtain the RESCORED results).

Figure 1 is an example of the three hypnograms generated by one scorer. Although there are differences in an epoch-by-epoch comparison, it is clear that the overall profile (thick light-gray line) in each of the three hypnograms is very similar.

FIG. 1. Example of three different hypnograms generated by one scorer. (A) Manually scored hypnogram. (B) Automatically scored using the Computer-Assisted Sleep Staging method. (C) Automatically generated hypnogram that has been corrected to the scorer’s preference. The automatically scored hypnogram in B shows more transitions between stages compared with the two manually staged hypnograms (A, C). This is to be expected because during automatic staging, unlike manual staging, only minimal contextual information is used. Note, however, that in all three cases the overall profile (thick gray outline in each hypnogram) is very similar.

Computer-assisted sleep staging has several intrinsic advantages over manual scoring. In contrast to manual scoring, the CASS method analyzes the signal properties in the complete recording before staging, thus taking advantage of patient-specific properties. Unlike most other automatic staging methods, CASS does not rely on fixed or hard threshold settings. All thresholds required during the postprocessing step are learned from the recording and are specific to the PSG under consideration. For example, not all subjects demonstrate the same level of muscle activity (e.g., REM behavior disorder); therefore, REM stage classification using patient-specific thresholds (as would be done in manual scoring) for determining atonia will clearly provide for an improved staging of REM. Hard thresholds in the rule-based methods may miss the REM stage for patients with REM behavior disorder.

The method attempts to find naturally occurring patterns in the recording based on primitive features, which are subsequently classified manually by the user (steps 5 and 6 listed earlier). This interactive participation allows the user to impart his staging preferences according to any staging criterion. Because of the overwhelming popularity of the R&K classification, the last step (postprocessing) of the method is designed to meet this need. This, however, can be modified easily for any other staging scheme of preference. Because of its generic nature, the CASS method is amenable to any staging criterion and it provides a tool to find a more natural staging scheme.

Most important, the CASS method allows some measure of objectivity to be introduced in the staging process, as reflected by the high interscorer agreement for the automatic staging (column 2 in Table 1). For example, the separation of slow-wave sleep into stages 3 and 4 is more objective and consistent than visual scoring, as shown by Agarwal and Gotman (2001a).

AUTOMATIC DETECTION OF SLEEP SPINDLES

Sleep spindles are well-known to sleep investigators as evidence of onset of light sleep and are typically
dominant in stage 2 of the R&K sleep classification. These transient patterns are known as sigma waves and they are defined as groups of rhythmic waves that last from 0.5 to 3 seconds in the 12 to 14-Hz range. These potentials can show marked variation with regard to morphology, frequency, spatial distribution, and stage of sleep. Age and central nervous system disorders can also influence the appearance of spindles.

Typical spindles are identified relatively easily by visual inspection of overnight PSGs. However, there are often spindles that are obscured by superimposed activity. Regardless of whether they are easy to identify, the manual detection of spindles is a very tedious and time-consuming task. Some spindle detection algorithms using zero-crossing and phase-locked loops (Pivik et al., 1982) have been presented in the literature. These techniques require the use of specific models and require the selection of thresholds. We propose a robust algorithm to detect sleep spindles based on a strategy similar to the previously discussed CASS method. The method attempts to segment bandpass-filtered EEGs (11.5 to 15 Hz), extract relevant spindle features from each segment, and cluster the segments based on these features. Clusters most likely to contain spindle segments are extracted to yield candidate spindles. Similar to the CASS method, postprocessing based on statistical measures is used to filter the candidate spindles to remove the segments that are least likely to contain spindles. As with the CASS method, the relevant postprocessing thresholds are not fixed, but are adapted from the PSG under consideration. Figure 2 shows some examples of automatically detected spindles. The method achieves a high rate of sensitivity of approximately 90% and a specificity of approximately 65% with respect to manual scoring. The method allows for a parameter that can control the specificity/sensitivity tradeoff.

FIG. 2. Examples of spindles detected by both manual and automatic methods. (A) Clear spindles. (B) Marginal spindles. (C) Spindles obscured by background slow activity. Note that the two sets of markings do not coincide in the start and duration of the spindles.

AUTOMATIC DETECTION OF RESPIRATORY EVENTS

To assess respiratory disturbances properly, it is necessary to monitor airflow (nasal and/or oral) as well as respiratory efforts (chest and/or abdomen). One channel of airflow and one channel of respiratory effort are required in standard PSG. Cessation of airflow and effort are used to define apneic events and their classifications. Although there are rules that define apneas and hypopneas, there is always the issue of translation of qualitative descriptions to computer methods. For example, hypopneas are defined to be typically less than 50% cessation in airflow for a minimum of 10 seconds. Words like “typically” create a qualitative interpretation that can lead to interscorer variations. This is particularly so with borderline events. The automatic method uses a moving background window in which the inspiratory reference amplitude is evaluated. Comparing the peak amplitudes in a test window against the inspiratory reference amplitude allows the detection of respiratory events. The inspiratory reference amplitude is defined to be the average peak amplitude in a given window. Figure 3 shows examples of automatically detected obstructive apnea, central apnea, and hypopnea. Overall, the method achieves an event detection sensitivity of 96.5% and a specificity of 85.9% when compared with manual scoring of these events.

FIG. 3. Examples of respiratory events detected by manual and automatic methods. (A) Hypopnea. (B) Obstructive apnea (OA). (C) Central apnea (CA). The exact start and end of the automatically detected events do not coincide with the manually marked events. In polysomnography, the number of these events during the course of the night is the important factor. More important, automatic detection of respiratory events will be consistent and will not vary with different scorers.

AUTOMATIC DETECTION OF LEG MOVEMENTS

Periodic movement in sleep is defined as short-lasting periods of brief, repetitive bursts of surface EMG, tonic
in nature, that occur during quiet wakefulness and sleep. Most commonly they involve both legs and are referred to as periodic leg movements. They can occur during various pathologic conditions, such as respiratory disorders (apnea), restless leg syndrome, and narcolepsy. Excessive periodic leg movements, particularly when associated with repeated arousals, can lead to excessive daytime sleepiness. Standardized rules are used to score and quantify the individual leg movements visually, which are converted to periodic leg movements according to set rules. The visual detection of leg movements is a very tedious and time-consuming task, and considerable training is required to achieve high interscorer agreement. The level of EMG activity required for a leg movement to occur is ill-defined, and thus substantial subjectivity is introduced in visual scoring.

To combat some of these issues, we propose an automatic leg movement detection method in which the need for the EMG amplitude threshold is removed. Like the spindle detection method, automatic leg movement detection is based on the segmentation and clustering procedure. The method can be separated into four steps:

1. Segmentation. The EMG data are broken into variable-length segments in which the level of EMG activity is relatively unchanging.
2. Feature extraction. Features describing the level of EMG activity in each segment are evaluated.
3. Classification. The two-tiered clustering strategy is applied to group segments with similar properties.
4. Postprocessing. Clusters containing leg movements are extracted to derive leg movement detection thresholds, which are used in the final detection of leg movements.

Note during the last step that the necessary detection thresholds are relative to the PSG under consideration; hence, there is no need for a priori threshold selection. Figure 4 shows some examples of manually marked and automatically detected leg movements. The results obtained on five files (45 hours of data with 1,813 manually marked leg movement events) show a sensitivity of 84.7% and a specificity of 73.1%. The method allows for an input parameter that can be used to control the sensitivity/specificity tradeoff.

FIG. 4. Example leg movement events marked automatically and manually (two scorers) in a single recording of one subject. (A) Clear leg movement detected by all. (B) Leg movement with clearly lower amplitude detected by all. (C) Leg movement detected by the automatic method but not marked manually by the two scorers. (D) Leg movement marked by two scorers but not detected automatically. From B and C, it is clear that the manual scoring of leg movement events appears to be inconsistent, unlike automatic detection. More important, this is achieved without setting any hard detection thresholds.

AUTOMATIC DETECTION OF RAPID EYE MOVEMENTS

REM sleep is characterized by EEG activation, muscle atonia, and episodic bursts of eye movements. Even in the absence of atonia, REMs recorded with standard EOG electrodes have become the cardinal sign for the detection of REM sleep. Moreover, there is continuing evidence that suggests that REM count abnormalities exist in several psychiatric syndromes when compared with normal subjects (Foster et al., 1976; Zarcone et al., 1987). This information has been largely ignored primarily because of the tedious and time-consuming nature of visual identification of REM events.

Several methods based on eye movement models that require threshold selection have been proposed in the literature (see, for example, Degler et al. [1975], Hatzilabrou et al. [1994], and Ktonas and Smith [1978]). We propose a REM detection method that incorporates various eye movement features without explicitly using an eye movement model. The root of the method relies on the idea that REMs have a rapid change in amplitude as well as phase-reversed synchrony in the left and right EOG channels. A two-step process is used for REM detections:

1. Candidate REMs are detected using a detection criterion based on the previous two features.
2. Candidate REMs are filtered using a series of rules that can control the sensitivity/specificity tradeoff.

Detection results on the training data indicate the sensitivity and specificity to be on the order of 84% when compared with manual scoring of individual REM events (Agarwal et al., 2002). Figure 5 shows the basic idea in REM detection.

CONCLUSION

Polysomnographic studies involve the monitoring of the temporal evolution of different sleep states that are referred to more commonly as sleep stages, as well as

dles, leg movement events, and REM events in which the preselection of thresholds is practically eliminated. Although the performance of these methods appears to be quite good, it must be remembered that because of the high variability in the data types and the varying interpretation of the many ambiguous standards, manual review of any automatically generated results is still essential in the proper assessment of PSGs.

Acknowledgment: The authors thank Suzie Laroche, Anthony Ferreli, Michael Sauve, Sherrye Hague, and Jim Davis for their assistance in the PSG scoring.
FIG. 5. Examples of automatic detection of individual rapid eye
movements (REMs). (A) Criterion to generate a set of candidate REM
