DE GRUYTER
Current Directions in Biomedical Engineering 2020;6(3): 20203091
Martin Golz*, Sebastian Thomas and Adolf Schenka
EEG-Based Classification of the
Driver Alertness State
The value of the Generalized Matrix Relevance Learning Vector Quantization method
Abstract: GMLVQ (Generalized Matrix Relevance Learning
Vector Quantization) is a method of machine learning with an
adaptive metric. While training, the prototype vectors as well
as the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with
other machine learning methods employing a fixed metric. It
was investigated how accurately the methods can assign the
6-channel EEG of 25 young drivers, who drove overnight in
the simulation lab, to the two classes of mild and severe
drowsiness. Results of cross-validation show that GMLVQ
reaches a mean classification accuracy of 81.7 ± 1.3 %. It is not as accurate as support-vector machines (SVM) and gradient boosting
machines (GBM) and cannot exploit the potential of learning
adaptive metrics in the case of EEG data. However, information is provided on the relevance of each signal feature from
the weighting matrix.
Keywords: Electroencephalogram, EEG, driving simulation,
drowsiness, classification, machine learning, generalized matrix relevance learning vector quantization, support-vector machine, gradient boosting machine.
https://doi.org/10.1515/cdbme-2020-3091
1 Introduction
Due to their phylogeny, humans are not prepared for some
occupations in modern society, such as keeping awake at the
wheel of a car for long periods of time. As the length of the
waking period and the time on task increase, and as the circadian trough approaches, alertness may decrease and fluctuate
dangerously despite a high risk of accidents [1]. Further important influencing factors of maintaining alertness are the sensation of monotony and the decrease of responsibility awareness, both of which depend on the personality type [2].
*Corresponding author: Martin Golz, University of Applied Sciences, Schmalkalden, Germany, e-mail:
[email protected]
Sebastian Thomas, Adolf Schenka: University of Applied Sciences, Schmalkalden, Germany
Open Access. © 2020 Martin Golz et al., published by De Gruyter.
A reliable assessment of the alertness state would be very helpful for sleep theory, because there are still many open questions regarding the transition from waking to sleeping state.
For practice it is important as a reference method to validate
devices for monitoring driver alertness [1]. EEG is the most
promising and practicable candidate for a reliable assessment.
Many authors have investigated this field and corroborated
various hypotheses, none of which has been widely accepted so far. High accuracies could be achieved with methods of
computational intelligence, also known as machine learning
methods. We are here investigating whether a relatively new
method, the GMLVQ [3], has a particular advantage in the
analysis of EEG with its fluctuating character. The method
has a learning rule for an adaptive metric and could therefore
outperform other methods with fixed metrics.
Sleepiness detection by machine learning based on EEG
data has been the concern of several authors in the last two
decades. Using artificial neural networks (ANN), accuracies
of up to 95 % were achieved on features extracted from wavelet analysis and descriptive statistics of the EEG of 30 individuals [4]. Power spectral densities within 5 EEG bands of 20
professional and 35 casual drivers were classified with ANN
to detect emerging fatigue; accuracies of up to 83.1 % were
achieved [5]. After ICA-based removal of EOG artefacts in
the EEG of 40 drivers, a maximum accuracy of 86 % was demonstrated with SVM based on spectral features [6]. A comparison of 10 machine learning methods, which processed 4
entropy measures of the centro-parietal EEG, yielded accuracies of up to 96.6 % [7]. The CP4 channel was empirically
selected out of 32-channel EEG of 40 partially sleep-deprived
subjects. The study was conducted in a driving simulator with
a moving base. In [8] the 11-channel EEG of 16 young adults
was recorded under continuous cognitive load. Extreme learning machines employed on four non-linear wavelet features
yielded accuracies of up to 96.8 %.
In this contribution, we will not only deal with the question of accuracy and stability of machine learning but also
with the explainability of the results. With the extraction of
feature relevancies, a first path is paved for this, where the
methods additionally learn the importance of individual components of the input vectors.
This work is licensed under the Creative Commons Attribution 4.0 License.
Martin Golz et al., EEG-Based Classification of the Driver Alertness State — 2
2 Material
EEG data were recorded during night-time in our real car driving simulation lab. 25 young adults (23.8 ± 3.1 years old, 12
females, 13 males) drove in 7 sessions from 1:00 to 7:40 am,
each lasting 40 minutes. A vigilance test was then carried out,
but this is not considered here. After a short break, the next session started, so that a session began every hour. Time since sleep (TSS) was at least
16 hours and was checked by wrist actometry before investigations. The EEG was recorded with a portable Somnoscreen
system (SomnoMedics GmbH, Kist, Germany) and a sampling rate of 256 Hz. Collodion-fixed Ag/AgCl electrodes at
standard positions Fp1, Fp2, A1, C3, C4, A2, O1, O2 (common average reference) were used. Neutral channels (A1, A2)
and the recorded ECG and EOG will not be considered here.
The EEG was processed in 6 s long non-overlapping segments. Due to movement artefacts, a few segments were dropped. Segment length of 6 s proved to be empirically optimal.
Subsequently, the spectral power densities were directly estimated using the periodogram method. Logarithmic scaling
and averaging in 1 Hz broad bands within the interval 0 to 30
Hz resulted in 180 EEG features for 6 channels.
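The feature extraction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name, the periodogram scaling, the small offset inside the logarithm, and the channel-major feature layout are assumptions, as is taking the logarithm before averaging within each band. Only the sampling rate (256 Hz), the segment length (6 s), the 1 Hz bands from 0 to 30 Hz, and the resulting 180 features come from the text.

```python
import numpy as np

FS = 256          # sampling rate in Hz (from the text)
SEG_SECONDS = 6   # segment length in s (from the text)

def log_band_power_features(segment, fs=FS, f_max=30):
    """Log-scaled periodogram averaged in 1 Hz wide bands.

    segment: array of shape (n_channels, n_samples).
    Returns a 1-D array with n_channels * f_max features.
    """
    segment = np.asarray(segment, dtype=float)
    n_samples = segment.shape[1]
    # Periodogram: squared magnitude of the DFT, scaled to a power density.
    spectrum = np.fft.rfft(segment, axis=1)
    psd = np.abs(spectrum) ** 2 / (fs * n_samples)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Assumed offset 1e-12 avoids log(0) for empty frequency bins.
    log_psd = np.log(psd + 1e-12)
    # Average the log spectrum within each band [lo, lo + 1) Hz.
    bands = [log_psd[:, (freqs >= lo) & (freqs < lo + 1)].mean(axis=1)
             for lo in range(f_max)]
    # Channel-major layout (assumed): all bands of channel 1, then channel 2, ...
    return np.stack(bands, axis=1).reshape(-1)

# Synthetic stand-in for one 6-channel EEG segment
rng = np.random.default_rng(0)
demo_segment = rng.standard_normal((6, FS * SEG_SECONDS))
demo_features = log_band_power_features(demo_segment)
```

With 6 s segments the frequency resolution is 1/6 Hz, so each 1 Hz band averages six periodogram bins.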
The independent target variable required for supervised
learning was generated as follows. During driving, the subjectively perceived level of fatigue on the Karolinska sleepiness scale (KSS) [9] was assessed every 5 min by the subject
and two observers at monitors. As the three values had similar progression, the mean was calculated and finally binarized: If the mean value was smaller than the 40th percentile
of the distribution, target value 1 (not severely fatigued) was
chosen; if it was larger than the 60th percentile, target value 2
(severely fatigued) was chosen. EEG segments assigned to a
KSS value between the 40th and 60th percentile were omitted.
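The binarization rule can be sketched as follows. The function name and the example ratings are hypothetical, and the percentiles are computed with NumPy's default linear interpolation, which the text does not specify. How each 6 s EEG segment is matched to a KSS rating is not covered here.

```python
import numpy as np

def binarize_kss(mean_kss, lower_pct=40, upper_pct=60):
    """Map mean KSS ratings to targets 1 or 2; return (targets, keep_mask)."""
    mean_kss = np.asarray(mean_kss, dtype=float)
    lo, hi = np.percentile(mean_kss, [lower_pct, upper_pct])
    targets = np.zeros(mean_kss.shape, dtype=int)  # 0 marks omitted segments
    targets[mean_kss < lo] = 1   # not severely fatigued
    targets[mean_kss > hi] = 2   # severely fatigued
    keep = targets != 0          # ratings between the percentiles are dropped
    return targets, keep

# Hypothetical mean ratings (subject and two observers averaged)
demo_kss = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 6.5, 7.0, 8.0, 8.5, 9.0])
demo_targets, demo_keep = binarize_kss(demo_kss)
```

Dropping the middle 20 % of the distribution widens the gap between the two classes at the cost of discarding data.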
3 Methods
GMLVQ (Generalized Matrix Relevance Learning Vector
Quantization) [3] is a classification method with adaptively
weighted Euclidean metrics based on the concept of vector
quantization, where prototype vectors 𝒘 are adapted to a data
distribution so that they represent a larger set of feature vectors 𝒙. Let 𝑆 be a dataset consisting of data pairs (𝒙, 𝑡) with
𝑡 being a discrete target variable (class label):
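$$S = \{(\boldsymbol{x}_i, t_i) \mid \boldsymbol{x}_i \in \mathbb{R}^d,\ t_i \in \{1, 2, \dots, n_c\},\ i \in \{1, 2, \dots, n_s\}\} \tag{1}$$
$d$ is the dimensionality of the feature space, $n_c$ the number of
classes (here: $n_c = 2$) and $n_s$ the number of data pairs (here:
$n_s = 45{,}214$).
An optimal function $h^*(\boldsymbol{x})$ is sought, which maps any feature
vector $\boldsymbol{x} \in \mathbb{R}^d$ to an output variable $y$, so that the classification accuracy is maximised:
$$h^*(\boldsymbol{x}): \boldsymbol{x} \mapsto y, \quad y \in \{1, 2, \dots, n_c\} \tag{2}$$
GMLVQ starts with the initialization of a set of prototype vectors
with the same class labels as in equ. (1):
$$S_w = \{(\boldsymbol{w}_j, \tau_j) \mid \boldsymbol{w}_j \in \mathbb{R}^d,\ \tau_j \in \{1, 2, \dots, n_c\},\ j \in \{1, 2, \dots, n_w\}\} \tag{3}$$
with $n_w$ being the number of prototype vectors, which should
always be lower than the number of data pairs: $n_w < n_s$.
In each iteration step a training example $(\boldsymbol{x}_i, t_i) \in S$ is
randomly selected and the closest prototype vector $(\boldsymbol{w}^+, \tau^+) \in S_w$
with the same class label $\tau^+ = t_i$ is searched for:
$$\boldsymbol{w}^+ = \arg\min_{\boldsymbol{w}_j} d(\boldsymbol{x}_i, \boldsymbol{w}_j), \quad \tau_j = t_i \tag{4}$$
In addition, the closest prototype vector $(\boldsymbol{w}^-, \tau^-) \in S_w$ with unequal class label $\tau^- \neq t_i$ is searched for:
$$\boldsymbol{w}^- = \arg\min_{\boldsymbol{w}_j} d(\boldsymbol{x}_i, \boldsymbol{w}_j), \quad \tau_j \neq t_i \tag{5}$$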
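GMLVQ employs as distance measure $d(\boldsymbol{x}, \boldsymbol{w})$ the squared,
weighted Euclidean metric:
$$d(\boldsymbol{x}, \boldsymbol{w}) = (\boldsymbol{x} - \boldsymbol{w})^\top \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{w}) = \|\boldsymbol{B}(\boldsymbol{x} - \boldsymbol{w})\|^2 \geq 0 \tag{6}$$
Here, $\boldsymbol{A} = \boldsymbol{B}^\top \boldsymbol{B}$ must be symmetric and positive (semi-)definite
to preserve the norm property, which the factorization guarantees. The difference vector $(\boldsymbol{x} - \boldsymbol{w})$
is linearly transformed by $\boldsymbol{B}$ and then the squared Euclidean
distance is calculated. Let $\boldsymbol{w}^+$ and $\boldsymbol{w}^-$ denote the closest prototypes with the same and with a different class label, as found in equ. (4) and (5). Using equ. (6) and the abbreviations
$d^+ = d(\boldsymbol{x}_i, \boldsymbol{w}^+)$, $d^- = d(\boldsymbol{x}_i, \boldsymbol{w}^-)$, the following learning rule
can be derived [3] for adapting these two prototypes:
$$\Delta\boldsymbol{w}^- = -2\eta\, \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \frac{2 d^+}{(d^+ + d^-)^2}\, \boldsymbol{A}(\boldsymbol{x}_i - \boldsymbol{w}^-)$$
$$\Delta\boldsymbol{w}^+ = +2\eta\, \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \frac{2 d^-}{(d^+ + d^-)^2}\, \boldsymbol{A}(\boldsymbol{x}_i - \boldsymbol{w}^+)$$
$\mathrm{sgd}'$ denotes the first derivative of the sigmoid function. The closest prototype of a different class is thus pushed away from $\boldsymbol{x}_i$, while the closest prototype of the same class is pulled towards it. The
learning rate $\eta$ should lie within the interval (0,1) and decrease monotonically with the number of iterations.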
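As an illustration of the distance of equ. (6) and the prototype learning rule, the following NumPy sketch performs one update for a single training example. It is a sketch under stated assumptions rather than the implementation used in this study: all function names, the logistic form of the sigmoid, the fixed learning rate, and the toy data are hypothetical.

```python
import numpy as np

def sgd_prime(z):
    """First derivative of the logistic sigmoid (assumed form of sgd)."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def distance(x, w, B):
    """Squared weighted Euclidean distance of equ. (6): ||B (x - w)||^2."""
    diff = B @ (x - w)
    return float(diff @ diff)

def closest_prototypes(x, t, W, tau, B):
    """Indices of the closest prototype with label t and with a label != t."""
    d = np.array([distance(x, w, B) for w in W])
    same = np.flatnonzero(tau == t)
    other = np.flatnonzero(tau != t)
    return same[np.argmin(d[same])], other[np.argmin(d[other])]

def update_prototypes(x, t, W, tau, B, eta=0.05):
    """One in-place prototype update for the training example (x, t)."""
    jp, jm = closest_prototypes(x, t, W, tau, B)
    dp, dm = distance(x, W[jp], B), distance(x, W[jm], B)
    A = B.T @ B
    g = sgd_prime((dp - dm) / (dp + dm))
    # Both steps are computed before any prototype is modified.
    step_p = 2 * eta * g * (2 * dm / (dp + dm) ** 2) * (A @ (x - W[jp]))
    step_m = 2 * eta * g * (2 * dp / (dp + dm) ** 2) * (A @ (x - W[jm]))
    W[jp] += step_p   # same class: attracted towards x
    W[jm] -= step_m   # different class: repelled from x
    return jp, jm

# Toy example with two prototypes in two dimensions
W = np.array([[0.0, 0.0], [2.0, 0.0]])
tau = np.array([1, 2])
B = np.eye(2)
x, t = np.array([0.5, 0.5]), 1
d_before = (distance(x, W[0], B), distance(x, W[1], B))
update_prototypes(x, t, W, tau, B)
d_after = (distance(x, W[0], B), distance(x, W[1], B))
```

In the toy example the distance to the same-class prototype shrinks and the distance to the other prototype grows, which is the intended effect of one step.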
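The matrix $\boldsymbol{B} = (b_{qp})$ is adapted during training and thus also
$\boldsymbol{A} = \boldsymbol{B}^\top \boldsymbol{B}$. The learning rule of $\boldsymbol{B}$ was derived in [3] with
$q, p \in \{1, 2, \dots, d\}$ being indices of vector components and $d^\pm = d(\boldsymbol{x}_i, \boldsymbol{w}^\pm)$ being the distances to the closest prototypes of the same ($+$) and of a different ($-$) class:
$$\Delta b_{qp} = -2\eta\, \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \left[ \frac{2 d^-}{(d^+ + d^-)^2}\, (x_{i,p} - w^+_p)\, [\boldsymbol{B}(\boldsymbol{x}_i - \boldsymbol{w}^+)]_q - \frac{2 d^+}{(d^+ + d^-)^2}\, (x_{i,p} - w^-_p)\, [\boldsymbol{B}(\boldsymbol{x}_i - \boldsymbol{w}^-)]_q \right]$$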
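The learning rule for the matrix can be checked numerically. The sketch below assumes the GLVQ-type cost sgd((d⁺ − d⁻)/(d⁺ + d⁻)) of [3] with a logistic sigmoid, and compares the update with a finite-difference gradient of that cost. Function names, the random test data, and the step size are hypothetical.

```python
import numpy as np

def sgd(z):
    """Logistic sigmoid (assumed form of sgd)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(x, w_plus, w_minus, B):
    """Assumed per-example cost: sgd((d+ - d-) / (d+ + d-))."""
    dp = np.sum((B @ (x - w_plus)) ** 2)
    dm = np.sum((B @ (x - w_minus)) ** 2)
    return sgd((dp - dm) / (dp + dm))

def delta_B(x, w_plus, w_minus, B, eta=0.01):
    """Update of B; entry [q, p] corresponds to delta b_qp."""
    ep, em = x - w_plus, x - w_minus
    Bp, Bm = B @ ep, B @ em
    dp, dm = Bp @ Bp, Bm @ Bm
    s = sgd((dp - dm) / (dp + dm))
    g = s * (1.0 - s)                      # sgd'
    denom = (dp + dm) ** 2
    # np.outer(Bp, ep)[q, p] = [B (x - w+)]_q * (x_p - w+_p)
    bracket = (2 * dm / denom) * np.outer(Bp, ep) \
        - (2 * dp / denom) * np.outer(Bm, em)
    return -2 * eta * g * bracket

rng = np.random.default_rng(1)
d = 4
x, w_plus, w_minus = rng.standard_normal((3, d))
B = np.eye(d) + 0.1 * rng.standard_normal((d, d))
step = delta_B(x, w_plus, w_minus, B, eta=1.0)  # eta = 1: negative gradient

# Central finite differences of the cost as an independent check
eps = 1e-6
num_grad = np.zeros_like(B)
for q in range(d):
    for p in range(d):
        E = np.zeros_like(B)
        E[q, p] = eps
        num_grad[q, p] = (cost(x, w_plus, w_minus, B + E)
                          - cost(x, w_plus, w_minus, B - E)) / (2 * eps)
```

If the rule is a gradient step on this cost, `step` should agree with `-num_grad` up to numerical error, and a small move along `step` should lower the cost.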
4 Results
To compare the machine learning methods, cross-validation
of the type repeated random subsampling was performed. The
data set was randomly split into a training and a test set (validation set) with 30 repetitions in a ratio of 4:1 and then the
following three steps were performed:
1. Training
2. Empirical estimation of the classification accuracy $a_{Tr}$ on the training set
3. Empirical estimation of the classification accuracy $a_{Te}$ on the test set
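The three steps, repeated over random 4:1 splits, can be sketched as follows. A nearest-class-mean classifier serves as a simple stand-in for the methods compared in this study. It, the function names, the synthetic data, and the use of the sample standard deviation are assumptions made for illustration only.

```python
import numpy as np

def fit_class_means(X, y):
    """Stand-in classifier: one mean vector per class."""
    labels = np.unique(y)
    return labels, np.array([X[y == c].mean(axis=0) for c in labels])

def predict(model, X):
    """Assign each row of X to the class with the nearest mean."""
    labels, means = model
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d, axis=1)]

def repeated_random_subsampling(X, y, n_rep=30, train_frac=0.8, seed=0):
    """Return (mean, sd) of training and of test accuracy in percent."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(y)))   # 4:1 split
    acc_tr, acc_te = [], []
    for _ in range(n_rep):
        perm = rng.permutation(len(y))
        tr, te = perm[:n_train], perm[n_train:]
        model = fit_class_means(X[tr], y[tr])                    # step 1
        acc_tr.append(np.mean(predict(model, X[tr]) == y[tr]))   # step 2
        acc_te.append(np.mean(predict(model, X[te]) == y[te]))   # step 3
    acc_tr = 100 * np.array(acc_tr)
    acc_te = 100 * np.array(acc_te)
    return ((acc_tr.mean(), acc_tr.std(ddof=1)),
            (acc_te.mean(), acc_te.std(ddof=1)))

# Synthetic two-class data in place of the EEG features
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 5)), rng.normal(1.5, 1.0, (200, 5))])
y = np.repeat([1, 2], 200)
(train_mean, train_sd), (test_mean, test_sd) = repeated_random_subsampling(X, y)
```

Because the splits are drawn independently, a sample may appear in the test set of several repetitions. This distinguishes repeated random subsampling from k-fold cross-validation.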
Finally, mean value and standard deviation were estimated
over 30 repetitions of the three steps (Table 1). With an increasing number of repetitions and a larger partitioning ratio,
the estimation error would decrease, but the numerical effort
would increase considerably.

Table 1: Mean and standard deviation of the classification accuracy on training and test sets for six machine learning methods.

| ML method | $a_{Tr}$ [%] | $a_{Te}$ [%] |
|-----------|--------------|--------------|
| LVQ1      | 90.8 ± 0.1   | 88.8 ± 0.4   |
| OLVQ1     | 93.7 ± 0.1   | 90.2 ± 0.3   |
| GRLVQ     | 75.2 ± 1.2   | 75.1 ± 1.5   |
| SVM       | 98.3 ± 0.0   | 90.9 ± 0.2   |
| GBM       | 100.0 ± 0.0  | 91.1 ± 0.3   |
| GMLVQ     | 83.3 ± 0.8   | 82.8 ± 0.6   |

The first method (LVQ1) is the standard LVQ version. The
result was generated with 920 neurons. The optimized version
(OLVQ1) uses the same learning rule, but with an individual
learning rate for each neuron, which decreases in case of
training success and slightly increases in case of failure. It
shows a higher mean accuracy and similar standard deviation
compared to LVQ1; OLVQ1 training is thus more precise.
The optimal result was achieved with 960 neurons.
The generalised relevance LVQ (GRLVQ) uses a weighted Euclidean metric, similar to GMLVQ. The metric is also
adapted similarly, but not identically to GMLVQ. The adaptive weights of the metric are only applied to isolated individual features. This corresponds to a diagonal form of the matrix 𝑨 in equation (6). It turned out that GRLVQ is much less
accurate in training and testing when applied to our data set.
In contrast to LVQ1 and OLVQ1, the support vector machine (SVM) has a strict mathematical framework (as do
GRLVQ and GMLVQ) and has three basic concepts: large
margin, soft margin, and kernel function substitution. Together,
these make the method very powerful and difficult to
outperform. Here the Gaussian radial basis function was used
as kernel function. The method is characterized by very stable
and precise training and classifies the data unknown during
training, i.e. test data, 0.7 % more accurately than OLVQ1.
The most successful method proved to be the gradient
boosting machine (GBM) with two concepts for increasing
efficiency [10]. Each training run gained 100 % accuracy and
the mean accuracy on unknown test data was 91.1 %.
For LVQ1, OLVQ1, GRLVQ and GMLVQ, our own implementations were used; for GRLVQ and GMLVQ, implementations of the original authors [3] were used as well.

Figure 1: Weighting matrix 𝑨 of the squared Euclidean metric of
equ. (6). The matrix was averaged over 30 training runs.
As mentioned above, 30 training repetitions were performed with different sub-samples during cross-validation to
estimate training as well as test accuracy. At the end of the 30
GMLVQ training runs, the resulting weighting matrices were
averaged (Fig.1). The main diagonal elements of the averaged
weighting matrix do not have outstanding values; a particular
structure of the matrix could not be identified. From the progression of the main diagonal elements, the relevance of the
features can be seen. This is because these elements scale the
features and a strongly scaled feature influences the distance
calculation more than a weakly scaled one; thus, scaling is
proportional to relevance. The progression of the 180 main
diagonal elements is shown in Fig. 2, whereby the features
were grouped to one of the 6 EEG channels and within each
channel section the frequencies were depicted. These are the
starting frequencies of each 1 Hz wide spectral band. Thus,
we have a plot of relevancies versus starting frequencies.
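Reading relevancies from the main diagonal can be sketched as follows. Random matrices stand in for the trained 𝑩 of the 30 runs. The square shape of 𝑩, the channel order, and the channel-major feature layout are assumptions not stated in the text.

```python
import numpy as np

N_CHANNELS, N_BANDS = 6, 30
# Channel order within the feature vector is assumed.
CHANNELS = ["Fp1", "Fp2", "C3", "C4", "O1", "O2"]

def mean_relevances(B_runs):
    """Average A = B^T B over runs; return its diagonal per channel and band."""
    A_mean = np.mean([B.T @ B for B in B_runs], axis=0)
    return np.diag(A_mean).reshape(N_CHANNELS, N_BANDS)

# Random stand-ins for the matrices of 30 training runs
rng = np.random.default_rng(0)
n_features = N_CHANNELS * N_BANDS
B_runs = [rng.standard_normal((n_features, n_features)) / np.sqrt(n_features)
          for _ in range(30)]
rel = mean_relevances(B_runs)

# Row index = channel, column index = starting frequency of the 1 Hz band
ch, band = np.unravel_index(np.argmax(rel), rel.shape)
print(f"largest relevance: {CHANNELS[ch]}, band starting at {band} Hz")
```

Each diagonal element of 𝑨 = 𝑩ᵀ𝑩 is the squared norm of one column of 𝑩, so the relevancies are non-negative by construction.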
Figure 2: Mean relevancies estimated by the main diagonal elements of the GMLVQ weighting matrix 𝑨 = 𝑩 𝑩. The plot is divided into
6 intervals for the EEG channels. In each interval the relevancies are shown versus the start frequencies of the 1 Hz wide bands.
The results are mean values over 30 training runs, in which the weighting matrix was adapted parallel to the prototype vectors.
5 Conclusions
Achieved classification accuracies are in the range of other
authors who used machine learning on EEG of mildly versus
severely fatigued persons. The relatively high mean accuracy
of 91 % with a relatively small standard deviation of 0.2 % is
remarkable because the independent criterion for the separation of both classes is by no means strict. It is subjective and
thus not a reliably reproducible criterion. The results are also
remarkable because of the nature of the EEG. It is a signal
mixture of an extreme multi-process in which several hundred million signal sources work independently of each other. However, the changes caused by the transition from high to low
alertness level are apparently global and affect enough brain
processes to allow the analysis to find a decision hypothesis
relatively accurately.
However, these conclusions apply only to SVM and GBM.
These methods do not lead to hypotheses that humans can readily interpret, as
they are sub-symbolic and extremely multicriterial. GMLVQ
proved to be less accurate on our data with almost 83 %. This
result is below that of the basic variant LVQ1 and its optimized version OLVQ1. We suspect that the adaptation of the
weight matrix may have to be regularised in a more elaborate
way to prevent it from becoming increasingly singular during
training [11]. A further indication for the failure of GMLVQ
was found when the number of neurons was increased: no significant changes towards higher or lower accuracy were observed. In addition, the relevance values are difficult to interpret and do not agree with previous findings of other authors.
Author Statement
The authors state that no funding was involved. The authors state no conflict of interest. Informed consent has been obtained from all
individuals included in this study. The research related to
human use complies with all the relevant national regulations,
institutional policies and was performed in accordance with
the tenets of the Helsinki Declaration, and has been approved
by the authors' institutional ethical review board.
References
[1] Golz M, Sommer D, et al. (2010) Evaluation of fatigue monitoring technologies. Somnologie 14(3):187-99.
[2] Thackray RI, Jones KN, Touchstone RM (1974) Personality
and physiological correlates of performance decrement on a
monotonous task requiring sustained attention. British J Psychology 65(3):351-8.
[3] Schneider P, Biehl M, Hammer B (2009) Adaptive relevance
matrices in learning vector quantization. Neural computation
21(12):3532-61.
[4] Subasi A (2005) Automatic recognition of alertness level from
EEG by using neural network and wavelet coefficients. Exp
Syst Applic 28(4):701-11.
[5] King LM, Nguyen HT, Lal SKL (2006) Early driver fatigue
detection from electroencephalography signals using artificial
neural networks. Proc IEEE EMBS Conf:2187-90.
[6] Hu S, Zheng G, Peters B (2013) Driver fatigue detection from
EEG spectrum after EOG artefact removal. IET Intell Transp
Syst 7(1):105-13.
[7] Hu J (2017) Comparison of different features and classifiers
for driver fatigue detection based on a single EEG channel.
Comput Math Meth Med, ID5109530: 9 pages.
[8] Chen L, Zhao Y, Zhang J, Zou JZ (2015) Automatic detection
of alertness using wavelet-based nonlinear features and machine learning. Exp Syst Applic 42(21):7344-55.
[9] Kaida K, Takahashi M, Åkerstedt T, et al. (2006) Validation of
the Karolinska sleepiness scale against performance & EEG
variables. Clin Neurophysiol 117(7):1574-81.
[10] Ke G, Meng Q, Finley T, et al. (2017) LightGBM: A highly
efficient gradient boosting decision tree. In: Adv Neural Inf
Proc Syst:3146-54.
[11] Biehl M, Hammer B, Schleif FM et al. (2015) Stationarity of
matrix relevance LVQ. Proc Int Joint Conf Neural Netw:1-8.