EEG-Based Classification of the Driver Alertness State

2020, Current Directions in Biomedical Engineering

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a method of machine learning with an adaptive metric. While training, the prototype vectors as well as the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods employing a fixed metric. It was investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Results of cross-validation show that GMLVQ is at 81.7 ± 1.3 % mean classification accuracy. It is not as accurate as support-vector machines (SVM) and gradient boosting machines (GBM) and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, information is provided on the relevance of each signal feature from the weighting matrix.

DE GRUYTER
Current Directions in Biomedical Engineering 2020;6(3): 20203091

Martin Golz*, Sebastian Thomas and Adolf Schenka

EEG-Based Classification of the Driver Alertness State
The value of the Generalized Matrix Relevance Learning Vector Quantization method

Keywords: Electroencephalogram, EEG, driving simulation, drowsiness, classification, machine learning, generalized matrix relevance learning vector quantization, support-vector machine, gradient boosting machine.

https://doi.org/10.1515/cdbme-2020-3091

1 Introduction

Due to their phylogeny, humans are not prepared for some occupations in modern society, such as staying awake at the wheel of a car for long periods of time. As the length of the waking period and the time on task increase, and as the circadian trough approaches, alertness may decrease and fluctuate dangerously despite the high risk of accidents [1]. Further important factors influencing the maintenance of alertness are the sensation of monotony and a decreasing awareness of responsibility, both of which depend on the personality type [2].
*Corresponding author: Martin Golz, University of Applied Sciences, Schmalkalden, Germany, e-mail: [email protected]
Sebastian Thomas, Adolf Schenka: University of Applied Sciences, Schmalkalden, Germany
Open Access. © 2020 Martin Golz et al., published by De Gruyter.

A reliable assessment of the alertness state would be very helpful for sleep theory, because there are still many open questions regarding the transition from the waking to the sleeping state. In practice, it is important as a reference method to validate devices for monitoring driver alertness [1]. EEG is the most promising and practicable candidate for a reliable assessment. Many authors have investigated this field and corroborated various hypotheses, none of which have been widely accepted so far. High accuracies could be achieved with methods of computational intelligence, also known as machine learning methods. Here we investigate whether a relatively new method, GMLVQ [3], has a particular advantage in the analysis of EEG with its fluctuating character. The method has a learning rule for an adaptive metric and could therefore outperform other methods with fixed metrics.

Sleepiness detection by machine learning based on EEG data has been the concern of several authors in the last two decades. Using artificial neural networks (ANN), accuracies of up to 95 % were achieved on features extracted from wavelet analysis and descriptive statistics of the EEG of 30 individuals [4]. Power spectral densities within 5 EEG bands of 20 professional and 35 casual drivers were classified with ANN to detect emerging fatigue; accuracies of up to 83.1 % were achieved [5]. After ICA-based removal of EOG artefacts in the EEG of 40 drivers, a maximum accuracy of 86 % was demonstrated with SVM based on spectral features [6]. A comparison of 10 machine learning methods, which processed 4 entropy measures of the centro-parietal EEG, yielded accuracies of up to 96.6 % [7].
The CP4 channel was empirically selected from the 32-channel EEG of 40 partially sleep-deprived subjects. The study was conducted in a driving simulator with a moving base. In [8], the 11-channel EEG of 16 young adults was recorded under continuous cognitive load. Extreme learning machines applied to four non-linear wavelet features yielded accuracies of up to 96.8 %.

In this contribution, we will deal not only with the question of accuracy and stability of machine learning but also with the explainability of the results. The extraction of feature relevancies is a first step in this direction: the methods additionally learn the importance of individual components of the input vectors.

This work is licensed under the Creative Commons Attribution 4.0 License.

2 Material

EEG data were recorded during night-time in our real car driving simulation lab. 25 young adults (23.8 ± 3.1 years old, 12 females, 13 males) drove in 7 sessions from 1:00 to 7:40 am, each lasting 40 minutes. Each session was followed by a vigilance test, which is not considered here. After a short break, the next session started, so that one session began every hour. Time since sleep (TSS) was at least 16 hours and was checked by wrist actometry before the investigations. The EEG was recorded with a portable Somnoscreen system (SomnoMedics GmbH, Kist, Germany) at a sampling rate of 256 Hz. Collodion-fixed Ag/AgCl electrodes at the standard positions Fp1, Fp2, A1, C3, C4, A2, O1, O2 (common average reference) were used. The neutral channels (A1, A2) and the recorded ECG and EOG will not be considered here.

The EEG was processed in 6 s long non-overlapping segments. Due to movement artefacts, a few segments were dropped. A segment length of 6 s proved to be empirically optimal. Subsequently, the spectral power densities were directly estimated using the periodogram method.
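As a rough illustration, the segmentation and periodogram step, together with the logarithmic scaling and 1 Hz band averaging that complete the feature extraction, might be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the helper name `band_features`, the rectangular window, the base-10 logarithm with a small offset, the order "logarithm first, then averaging" and the channel-by-channel ordering of the features are all assumptions.

```python
import numpy as np

FS = 256         # sampling rate in Hz (from the text)
SEG_SECONDS = 6  # segment length in seconds (from the text)
N_BANDS = 30     # 1 Hz wide bands covering 0 to 30 Hz (from the text)


def band_features(eeg, fs=FS, seg_seconds=SEG_SECONDS, n_bands=N_BANDS):
    """Hypothetical helper: log band-power features per segment.

    eeg: array of shape (n_channels, n_samples).
    Returns an array of shape (n_segments, n_channels * n_bands).
    """
    n_channels, n_samples = eeg.shape
    seg_len = fs * seg_seconds
    n_segments = n_samples // seg_len
    # Cut into non-overlapping segments; a trailing partial segment is dropped.
    segments = eeg[:, :n_segments * seg_len].reshape(n_channels, n_segments, seg_len)
    # Raw periodogram (rectangular window): one-sided power spectral density.
    spectrum = np.fft.rfft(segments, axis=-1)
    psd = np.abs(spectrum) ** 2 / (fs * seg_len)
    psd[..., 1:-1] *= 2.0  # fold negative frequencies; DC and Nyquist stay single
    # Assumption: base-10 logarithm with a small offset to avoid log(0).
    log_psd = np.log10(psd + 1e-12)
    # The frequency resolution is 1 / seg_seconds Hz, so each 1 Hz band spans
    # seg_seconds consecutive bins: band b covers [b, b + 1) Hz.
    bins_per_band = seg_seconds
    used = log_psd[..., :n_bands * bins_per_band]
    bands = used.reshape(n_channels, n_segments, n_bands, bins_per_band).mean(axis=-1)
    # Order the features channel by channel: (n_segments, n_channels * n_bands).
    return bands.transpose(1, 0, 2).reshape(n_segments, n_channels * n_bands)
```

For a 6-channel recording, each 6 s segment yields 6 × 30 = 180 values. Selecting bands by bin index rather than by comparing floating-point frequencies avoids rounding issues at the band edges.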
Logarithmic scaling and averaging in 1 Hz wide bands within the interval 0 to 30 Hz resulted in 180 EEG features for 6 channels.

The independent target variable required for supervised learning was generated as follows. During driving, the subjectively perceived level of fatigue on the Karolinska sleepiness scale (KSS) [9] was assessed every 5 min by the subject and by two observers at monitors. As the three values progressed similarly, their mean was calculated and finally binarized: if the mean value was smaller than the 40th percentile of the distribution, target value 1 (not severely fatigued) was chosen; if it was larger than the 60th percentile, target value 2 (severely fatigued) was chosen. EEG segments assigned to a KSS value between the 40th and 60th percentile were omitted.

3 Methods

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) [3] is a classification method with adaptively weighted Euclidean metrics. It is based on the concept of vector quantization, where prototype vectors $\boldsymbol{w}$ are adapted to a data distribution so that they represent a larger set of feature vectors $\boldsymbol{x}$. Let $S$ be a dataset consisting of data pairs $(\boldsymbol{x}, t)$ with $t$ being a discrete target variable (class label):

$$S = \{(\boldsymbol{x}_i, t_i) \mid \boldsymbol{x}_i \in \mathbb{R}^d,\ t_i \in \{1, 2, \dots, n_C\},\ i \in \{1, 2, \dots, n_S\}\} \tag{1}$$

$d$ is the dimensionality of the feature space, $n_C$ the number of classes (here: $n_C = 2$) and $n_S$ the number of data pairs (here: $n_S = 45{,}214$). An optimal function $h^*(\boldsymbol{x})$ is sought, which maps any feature vector $\boldsymbol{x} \in \mathbb{R}^d$ to an output variable $y$, so that the classification accuracy is maximised:

$$h^*(\boldsymbol{x}): \boldsymbol{x} \mapsto y,\quad y \in \{1, 2, \dots, n_C\} \tag{2}$$

GMLVQ starts with the initialization of a set of prototype vectors with the same class labels as in equ. (1):

$$S_W = \{(\boldsymbol{w}_j, \tau_j) \mid \boldsymbol{w}_j \in \mathbb{R}^d,\ \tau_j \in \{1, 2, \dots, n_C\},\ j \in \{1, 2, \dots, n_W\}\} \tag{3}$$

with $n_W$ being the number of prototype vectors, which should always be lower than the number of data pairs: $n_W < n_S$.
In each iteration step a training example $(\boldsymbol{x}_i, t_i) \in S$ is randomly selected, and the closest prototype vector $(\boldsymbol{w}^+, \tau^+)$ from the prototype set $S_W$ with the same class label, $\tau^+ = t_i$, is searched for:

$$\boldsymbol{w}^+ = \arg\min_{\boldsymbol{w}_j:\ \tau_j = t_i} d(\boldsymbol{x}_i, \boldsymbol{w}_j) \tag{4}$$

In addition, the closest prototype vector $(\boldsymbol{w}^-, \tau^-) \in S_W$ with unequal class label, $\tau^- \neq t_i$, is searched for:

$$\boldsymbol{w}^- = \arg\min_{\boldsymbol{w}_j:\ \tau_j \neq t_i} d(\boldsymbol{x}_i, \boldsymbol{w}_j) \tag{5}$$

GMLVQ employs the squared, weighted Euclidean metric as distance measure $d(\boldsymbol{x}, \boldsymbol{w})$:

$$d(\boldsymbol{x}, \boldsymbol{w}) = (\boldsymbol{x} - \boldsymbol{w})^T \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{w}) = \lVert \boldsymbol{B}(\boldsymbol{x} - \boldsymbol{w}) \rVert^2 \geq 0 \tag{6}$$

Here, $\boldsymbol{A} = \boldsymbol{B}^T \boldsymbol{B}$ must be both symmetric and positive definite to preserve the norm property. The difference vector $(\boldsymbol{x} - \boldsymbol{w})$ is linearly transformed by $\boldsymbol{B}$ and then the squared Euclidean distance is calculated. Using equ. (6) and the abbreviations $d^+ = d(\boldsymbol{x}_i, \boldsymbol{w}^+)$ and $d^- = d(\boldsymbol{x}_i, \boldsymbol{w}^-)$, the following learning rule can be derived [3] for the adaptation of the prototype vectors of equ. (4) and (5):

$$\Delta \boldsymbol{w}^+ = +2\eta\ \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \frac{2 d^-}{(d^+ + d^-)^2}\ \boldsymbol{A}(\boldsymbol{x}_i - \boldsymbol{w}^+)$$

$$\Delta \boldsymbol{w}^- = -2\eta\ \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \frac{2 d^+}{(d^+ + d^-)^2}\ \boldsymbol{A}(\boldsymbol{x}_i - \boldsymbol{w}^-)$$

$\mathrm{sgd}'$ denotes the first derivative of the sigmoid function. The learning rate $\eta$ should decrease monotonically with the number of iterations within the interval (0, 1). The matrix $\boldsymbol{B} = (b_{qp})$ is adapted during training, and with it $\boldsymbol{A} = \boldsymbol{B}^T \boldsymbol{B}$. The learning rule of $\boldsymbol{B}$ was derived in [3], with $q, p \in \{1, 2, \dots, d\}$ being indices of vector components:

$$\Delta b_{qp} = -2\eta\ \mathrm{sgd}'\!\left(\frac{d^+ - d^-}{d^+ + d^-}\right) \left[ \frac{2 d^-}{(d^+ + d^-)^2}\,(x_{i,p} - w^+_p)\,[\boldsymbol{B}(\boldsymbol{x}_i - \boldsymbol{w}^+)]_q - \frac{2 d^+}{(d^+ + d^-)^2}\,(x_{i,p} - w^-_p)\,[\boldsymbol{B}(\boldsymbol{x}_i - \boldsymbol{w}^-)]_q \right]$$

4 Results

To compare the machine learning methods, cross-validation of the type repeated random subsampling was performed. The data set was randomly split into a training and a test set (validation set) in a ratio of 4:1. This was repeated 30 times, and each time the following three steps were performed:
1. Training
2. Empirical estimation of the classification accuracy $a_{Tr}$ on the training set
3. Empirical estimation of the classification accuracy $a_{Te}$ on the test set

Finally, mean value and standard deviation were estimated over the 30 repetitions of the three steps (Table 1). With an increasing number of repetitions and a larger partitioning ratio, the estimation error would decrease, but the numerical effort would increase considerably.

Table 1: Mean and standard deviation of the classification accuracy on training and test sets for six machine learning methods.

ML method    a_Tr [%]       a_Te [%]
LVQ1          90.8 ± 0.1    88.8 ± 0.4
OLVQ1         93.7 ± 0.1    90.2 ± 0.3
GRLVQ         75.2 ± 1.2    75.1 ± 1.5
SVM           98.3 ± 0.0    90.9 ± 0.2
GBM          100.0 ± 0.0    91.1 ± 0.3
GMLVQ         83.3 ± 0.8    82.8 ± 0.6

The first method (LVQ1) is the standard LVQ version. The result was generated with 920 neurons. The optimized version (OLVQ1) uses the same learning rule, but with an individual learning rate for each neuron, which decreases in case of training success and slightly increases in case of failure. It shows a higher mean accuracy and a similar standard deviation compared to LVQ1; OLVQ1 training is thus more precise. The optimal result was achieved with 960 neurons.

The generalised relevance LVQ (GRLVQ) uses a weighted Euclidean metric, similar to GMLVQ. The metric is also adapted similarly, but not identically, to GMLVQ. The adaptive weights of the metric are only applied to isolated individual features. This corresponds to a diagonal form of the matrix $\boldsymbol{A}$ in equation (6). It turned out that GRLVQ is much less accurate in training and testing when applied to our data set.

In contrast to LVQ1 and OLVQ1, the support vector machine (SVM) has a strict mathematical framework (as do GRLVQ and GMLVQ) and rests on three basic concepts: large margin, soft margin, and kernel function substitution. Together, these make the method very powerful and difficult to outperform. Here the Gaussian radial basis function was used as kernel function. The method is characterized by very stable and precise training and classifies the data unknown during training, i.e. the test data, 0.7 percentage points more accurately than OLVQ1.

The most successful method proved to be the gradient boosting machine (GBM) with two concepts for increasing efficiency [10]. Each training run reached 100 % accuracy, and the mean accuracy on unknown test data was 91.1 %.

For LVQ1, OLVQ1, GRLVQ and GMLVQ, own implementations were used; for GRLVQ and GMLVQ, implementations of the original authors [3] were used as well.

Figure 1: Weighting matrix $\boldsymbol{A}$ of the squared Euclidean metric of equ. (6). The matrix was averaged over 30 training runs.

As mentioned above, 30 training repetitions were performed with different sub-samples during cross-validation to estimate training as well as test accuracy. At the end of the 30 GMLVQ training runs, the resulting weighting matrices were averaged (Fig. 1). The main diagonal elements of the averaged weighting matrix do not have outstanding values; a particular structure of the matrix could not be identified.

The relevance of the features can be seen from the progression of the main diagonal elements. This is because these elements scale the features, and a strongly scaled feature influences the distance calculation more than a weakly scaled one; thus, scaling is proportional to relevance. The progression of the 180 main diagonal elements is shown in Fig. 2, where the features are grouped by the 6 EEG channels and, within each channel section, ordered by frequency. The frequencies shown are the starting frequencies of each 1 Hz wide spectral band. Thus, we have a plot of relevancies versus starting frequencies.

Figure 2: Mean relevancies estimated by the main diagonal elements of the GMLVQ weighting matrix $\boldsymbol{A} = \boldsymbol{B}^T \boldsymbol{B}$. The plot is divided into 6 intervals for the EEG channels. In each interval the relevancies are shown versus the start frequencies of the 1 Hz wide bands. The results are mean values over 30 training runs, in which the weighting matrix was adapted in parallel to the prototype vectors.
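To make the link between the adaptive metric and the relevance profile of Fig. 2 more concrete, one stochastic GMLVQ update and the read-out of the main diagonal of A = BᵀB might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names, the logistic function as sigmoid, the separate learning rates `eta_w` and `eta_b`, and the channel-by-channel feature ordering are hypothetical, and the update signs are chosen so that the step performs gradient descent on sgd((d⁺ − d⁻)/(d⁺ + d⁻)).

```python
import numpy as np


def sgd_prime(mu):
    """Derivative of the logistic sigmoid (assumed form of sgd)."""
    s = 1.0 / (1.0 + np.exp(-mu))
    return s * (1.0 - s)


def gmlvq_step(x, t, W, tau, B, eta_w=0.01, eta_b=0.001):
    """One stochastic GMLVQ update for the sample (x, t).

    W (n_prototypes, d) and B (d, d) are float arrays modified in place;
    tau (n_prototypes,) holds the prototype class labels.
    Returns the indices of the closest correct and closest wrong prototype.
    """
    diff = x - W                       # rows are x - w_j
    proj = diff @ B.T                  # rows are B (x - w_j)
    dist = np.sum(proj ** 2, axis=1)   # squared weighted distances, equ. (6)
    same = tau == t
    jp = np.flatnonzero(same)[np.argmin(dist[same])]    # w+, equ. (4)
    jm = np.flatnonzero(~same)[np.argmin(dist[~same])]  # w-, equ. (5)
    dp, dm = dist[jp], dist[jm]
    g = sgd_prime((dp - dm) / (dp + dm))
    mu_p = 2.0 * dm / (dp + dm) ** 2
    mu_m = 2.0 * dp / (dp + dm) ** 2
    A = B.T @ B
    # Matrix update, computed from the differences before the prototypes move.
    dB = -2.0 * eta_b * g * (mu_p * np.outer(proj[jp], diff[jp])
                             - mu_m * np.outer(proj[jm], diff[jm]))
    W[jp] += 2.0 * eta_w * g * mu_p * (A @ diff[jp])  # attract correct prototype
    W[jm] -= 2.0 * eta_w * g * mu_m * (A @ diff[jm])  # repel wrong prototype
    B += dB
    return jp, jm


def relevance_profile(B, n_channels=6, n_bands=30):
    """Main diagonal of A = B^T B, arranged as (channel, band).

    Assumes that the features are ordered channel by channel.
    """
    A = B.T @ B
    return np.diag(A).reshape(n_channels, n_bands)
```

In a full training run, such steps would be repeated over randomly drawn samples with decreasing learning rates, and the profiles of the individual runs would then be averaged as described above.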
5 Conclusions

The achieved classification accuracies are in the range reported by other authors who applied machine learning to the EEG of mildly versus severely fatigued persons. The relatively high mean accuracy of 91 % with a relatively small standard deviation of 0.2 % is remarkable because the independent criterion for the separation of both classes is by no means strict. It is subjective and thus not a reliably reproducible criterion. The results are also remarkable because of the nature of the EEG. It is a signal mixture of an extreme multi-process in which several hundred million signal sources work independently of each other. However, the changes caused by the transition from a high to a low alertness level are apparently global and affect enough brain processes to allow the analysis to find a decision hypothesis relatively accurately.

These conclusions, however, apply to SVM and GBM. Both methods do not lead to hypotheses that are explainable to humans, as they are sub-symbolic and extremely multicriterial. GMLVQ proved to be less accurate on our data, with almost 83 %. This result is below that of the basic variant LVQ1 and its optimized version OLVQ1. We suspect that the adaptation of the weight matrix may have to be regularised in a more elaborate way to prevent it from becoming increasingly singular during training [11]. A further indication of the failure of GMLVQ was found when the number of neurons was increased: no significant changes towards higher or lower accuracy were observed. In addition, the relevance values are difficult to interpret and do not agree with previous findings of other authors.

Author Statement

The authors state no funding involved. The authors state no conflict of interest. Informed consent has been obtained from all individuals included in this study.
The research related to human use complies with all the relevant national regulations and institutional policies, was performed in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors' institutional ethical review board.

References

[1] Golz M, Sommer D, et al. (2010) Evaluation of fatigue monitoring technologies. Somnologie 14(3):187-99.
[2] Thackray RI, Jones KN, Touchstone RM (1974) Personality and physiological correlates of performance decrement on a monotonous task requiring sustained attention. British J Psychology 65(3):351-8.
[3] Schneider P, Biehl M, Hammer B (2009) Adaptive relevance matrices in learning vector quantization. Neural Computation 21(12):3532-61.
[4] Subasi A (2005) Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients. Exp Syst Applic 28(4):701-11.
[5] King LM, Nguyen HT, Lal SKL (2006) Early driver fatigue detection from electroencephalography signals using artificial neural networks. Proc IEEE EMBS Conf:2187-90.
[6] Hu S, Zheng G, Peters B (2013) Driver fatigue detection from EEG spectrum after EOG artefact removal. IET Intell Transp Syst 7(1):105-13.
[7] Hu J (2017) Comparison of different features and classifiers for driver fatigue detection based on a single EEG channel. Comput Math Meth Med, ID5109530: 9 pages.
[8] Chen L, Zhao Y, Zhang J, Zou JZ (2015) Automatic detection of alertness using wavelet-based nonlinear features and machine learning. Exp Syst Applic 42(21):7344-55.
[9] Kaida K, Takahashi M, Åkerstedt T, et al. (2006) Validation of the Karolinska sleepiness scale against performance & EEG variables. Clin Neurophysiol 117(7):1574-81.
[10] Ke G, Meng Q, Finley T, et al. (2017) LightGBM: A highly efficient gradient boosting decision tree. In: Adv Neural Inf Proc Syst:3146-54.
[11] Biehl M, Hammer B, Schleif FM, et al. (2015) Stationarity of matrix relevance LVQ. Proc Int Joint Conf Neural Netw:1-8.