Digital Tools in Polysomnography
Rajeev Agarwal and Jean Gotman
*Stellate Systems; and †Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University,
Montreal, Quebec, Canada
Summary: Recent advances in computing power and storage devices have made computer-based recording of polysomnograms (PSGs) very attractive. Digital PSGs offer the possibility of automating many tedious and time-consuming tasks of identifying sleep-related events. Automation not only alleviates these laborious tasks, but also introduces a measure of objectivity in the scoring of various discrete events. In this paper we briefly review some automatic methods that have been previously developed by the authors. Automatic sleep staging methods are emphasized, with some illustrative results on inter-scorer variations. We also discuss methods for detecting leg movements, respiratory events, sleep spindles, and rapid eye movements. Key Words: Polysomnography, Automatic sleep staging, Segmentation, Self-organization.
Address correspondence and reprint requests to Dr. Rajeev Agarwal, Stellate Systems, 345 Victoria Avenue, Suite 300, Montreal, Quebec, Canada, H3Z 2N2.

The replacement of paper tracings by computer-based recordings in polysomnography (PSG) has saved not only many trees and a lot of storage space, but it has provided flexibility in visual review, allowing the display of traces with varying characteristics such as filters, gains, and time scale. The processing power of computers also allows the automation of many tasks that are tedious and time-consuming, and introduces a degree of objectivity in decisions that are often quite subjective. The automation deals particularly with sleep staging and the detection of discrete events such as respiratory events, leg movements, eye movements, and spindles.

It must be remembered, however, that it is rare that these methods can be applied without human validation. This is for two reasons. The first is that the methods are not perfect and can sometimes err because of the presence of artifacts or an unexpected configuration of the event. The second, and probably the most important, relates to a fundamental problem of computer-based analysis: a computer program must have a precise definition of a problem to encode it into an algorithm. However, many of the definitions related to sleep staging or PSG events are imprecise; for instance, the universally accepted rules of state (Rechtschaffen and Kales, 1968, p. 5) (hereafter referred to as R&K rules): “Stage 1 is defined by a relatively low voltage . . . The transition from an alpha record to stage 1 is characterized by a decrease in the amount, amplitude and frequency of alpha activity.” The use of terms such as “low voltage” or “decrease in amplitude” requires subjective interpretation on the part of the human scorer. They must be transformed for the computer program into a fixed rule that may result in sleep staging that will agree with some scorers and not with others. Each user must therefore review the results of computer scoring or detection and make sure they are acceptable.

We review a recently developed method of computer-assisted sleep staging that incorporates the preference of each scorer before completing the analysis, and a series of methods for detecting discrete events that are of interest in sleep studies. Although automatic detection of spikes and seizures can also be useful during sleep studies, we do not review them here because they have been reviewed relatively recently (Gotman, 1999).
AUTOMATIC SLEEP STAGING

The diagnosis and treatment of patients with sleep-related problems require the classification of sleep into various sleep states. The current practice throughout the world is to use the classification scheme defined by the committee headed by Rechtschaffen and Kales (1968). This standard recommends scoring (more commonly referred to as staging) 20- or 30-second epochs of PSG consisting of EEG, electromyographic (EMG), and electro-oculographic (EOG) data into one of six key sleep stages: wake, light sleep (consisting of stages 1 and 2), slow-wave sleep (consisting of stages 3 and 4), and rapid eye movement (REM) stage according to the defined rules. The R&K rules were designed according to the sleep architecture of normal young adult subjects and are now used for all types of PSGs. The state of sleep is a continuum from light sleep to deep sleep that is interrupted periodically by REM sleep. Thus, the R&K rules are in fact a simplification to serve as a standardization for comparison. As discussed earlier, many of these rules are qualitative and are not defined precisely. Any criterion that attempts to characterize PSG epochs with distinct features is bound to lead to considerable variations in interpretation by different scorers. The level of variations in scoring is reflected by the interscorer agreement of 67 to 91% (Gaillard and Tissot, 1973; Kim et al., 1992; Kubicki et al., 1989; Stanus et al., 1987). Consequently, serious objection has been raised against the idea of manual scoring (Kubicki and Herrmann, 1996). Moreover, scoring of all-night PSG is tedious, time-consuming, and can become very difficult for abnormal records.

To combat some of the problems with manual staging of PSGs, during the last several decades many techniques of automating the staging process have been proposed. During the early years, automatic schemes based on period analysis (Itil et al., 1969), EEG spectra with multiple discriminant analysis (Larsen and Walter, 1970), and hybrid techniques that combine analog and digital techniques (Smith and Karacan, 1971; Smith et al., 1969) were proposed with varying performance. Various scoring criteria were used in the development of automatic methods and their performance was evaluated using varying techniques. More recently, the ideas of pattern recognition (Martin et al., 1972), wave detection and the Bayesian approach (Stanus et al., 1987), the interval histogram method (Kuwahara et al., 1988), expert systems (Ray et al., 1986), and neural network approaches (Schaltenbrand, 1996) have also been proposed. With these methods the agreement with manual scoring ranges from 70 to 85% with specific datasets and testing conditions.

Many of these methods are rule based, for which the staging rules are translated by the developer based on experience and the data at hand. The often subjective nature of the rules, as in the terms “relatively low amplitude” or “decrease in amplitude,” must be quantified, which may not meet the preference of all users or all data types. Many rules require the setting of thresholds, which will clearly work for the data with which they are developed; however, as new data are tested these thresholds may not be appropriate. In the case of neural network-based methods, training data are required. The performance of automatic staging may not be acceptable on a dataset from a population other than the one used for training. Although there have been numerous attempts at automating the sleep staging process, none of them has found credibility in clinical practice, in part for the reasons mentioned above. Many of the authors of these works readily admit that the performance of their method may deteriorate as the technical quality decreases or the population varies. It is clear that there is room for an automatic staging method using evolving or adaptive schemes that can adjust depending on the type and quality of PSGs. The various thresholds should be self-adjusting.

COMPUTER-ASSISTED SLEEP STAGING: SEGMENTATION AND SELF-ORGANIZATION

In a very simple definition of sleep architecture, REM and non-REM sleep cycle every 60 to 90 minutes throughout the night. The normal adult sleep architecture starts through the non-REM state, which consists of the four substates (R&K stages 1 to 4) describing the depth of sleep, followed by the REM state. Thereafter, the non-REM and REM states cycle throughout the night. Within the non-REM state, each of the four R&K stages persists for a finite time and the transition between stages is subjective. An important aspect of sleep staging is the idea that a pattern of the PSG is assumed to exist for a finite time until a new pattern emerges, signaling the change of state. These patterns are recurrent throughout the night and their differentiation is based on primitive sleep-related patterns such as frequency, amplitude, sleep spindles, k-complexes, and so forth.

To address some of the problems discussed in the previous section, in a recent publication we presented the Computer-Assisted Sleep Staging (CASS) method (Agarwal and Gotman, 2001a), which takes advantage of
the notion that patterns are recurrent throughout the night. The method finds pseudonatural patterns in the PSG, which can be labeled subsequently by the user as one of the predefined R&K sleep stages or as a stage in any other sleep staging classification scheme of preference. We refer to these patterns as pseudonatural because of the ill-defined (requiring subjective interpretation) transitory epochs (for example, from stage wake to stage 1 or stage 1 to stage 2). In reality, the exact change of stage is only a methodological concept because sleep is a continuum. This sleep continuum can be split into any number of patterns defined by the user, hence the pseudonatural patterns. The method is based on the ideas of segmentation and self-organization to cluster the different PSG patterns. The underlying idea is to group staging epochs so that the epochs in a group contain relatively similar patterns. Subsequently, each group can be labeled with any name to specify any given stage in the classification scheme of preference (Agarwal and Gotman, 2001b). The following summarizes the key steps in the method.

1. The PSG data channels relevant to staging (EEG from central and occipital derivation, submental EMG, and left and right EOGs) are broken down simultaneously into variable-length epochs in which the signals are relatively stationary (Agarwal and Gotman, 1999). During the course of the night, this yields several thousand five-channel, variable-length segments.
2. Each segment is parameterized by a set of basic sleep-related features (amplitude, frequency, presence of spindles, presence of REM, various spectral measures, etc.).
3. Based on the parameterization in step 2, the segments are clustered into a maximum of 8 to 10 groups using a self-organization scheme based on the k-means clustering method.
4. Segment clustering information is translated onto conventional, fixed-size staging epochs. Note that we have now created clusters of epochs where each cluster represents, in theory, one type of pattern existing in the PSG.
5. The user is shown a few sample epochs from each cluster and is asked to associate a valid R&K stage with it. Using this process, we have incorporated user preference in staging.
6. The method uses the staged sample epochs to stage the remaining epochs in the PSG to generate a preliminary hypnogram.
7. Postprocessing. We use sleep-related primitive features and apply specific decisional criteria for each stage type to justify the stage assignment of each epoch in the preliminary hypnogram. During this step, it may be necessary to reclassify an epoch if, in the current context, the properties of the considered epoch clearly allow such a reclassification. For example, some stage 2 epochs may be reclassified as REM, wake, or stage 1 if the properties of the considered epoch clearly warrant such a change and the context favors the change. The thresholds needed for such rules are patient specific, and the algorithm learns them from the preliminary hypnogram.
8. For some of the reasons discussed at the beginning of this article, it is rare that results of automatic methods can be used without visual review. The final step requires the visual review of the automatically generated hypnogram, during which the reviewer can change the score of any epoch with which he or she disagrees. This process will yield a second hypnogram that we have termed the rescored hypnogram.

Agarwal and Gotman (2001a) showed that on a dataset with varying hypnogram types, the CASS method achieved relatively good performance (more than 80% agreement) when compared with manual scoring. In this article, we selected subject C of that dataset to get some indication of interscorer differences. Five scorers (four registered PSG technologists with substantial clinical experience and one with notable research experience in scoring PSGs) were asked to score the PSG as described in Agarwal and Gotman (2001a). For each scorer, this generated three hypnograms: (1) scored manually (MANUAL), (2) scored automatically using CASS (AUTO; each scorer has a different AUTO score because the CASS system follows the preference of that scorer), and (3) an automatically scored hypnogram that has been corrected or rescored to the scorer’s satisfaction (RESCORED). The RESCORED set is essentially another set of MANUAL scoring. Table 1 shows the results of interscorer agreement for the three sets of hypnograms.

TABLE 1. Interscorer agreement (%) for the three sets of hypnograms

Scorer     Manual   Automatic   Rescored
A vs. B    74.6     91.6        77.1
A vs. C    84.4     94.2        84.3
A vs. D    80.1     93.3        77.0
A vs. E    78.5     93.3        79.3
B vs. C    75.5     95.0        82.9
B vs. D    71.1     95.8        77.6
B vs. E    71.2     95.8        78.8
C vs. D    82.5     96.7        85.6
C vs. E    80.5     96.7        82.3
D vs. E    78.6     100.0       76.7
Average    77.7     95.2        80.2
SD         4.5      2.3         3.3

SD, standard deviation.
