Emotion Recognition
a, b, c Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
d Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Abstract
Recognizing emotional states would make human-computer interaction (HCI) systems more effective in practice. Correlations between electroencephalography (EEG) signals and emotions have been demonstrated in the literature; hence, EEG-based methods are among the most accurate and informative. In this study, a Convolutional Neural Network (CNN) model is optimized to recognize emotions from EEG signals, and a Raspberry Pi minicomputer is used to implement the optimized, lightweight model. The emotional states were recognized for every three-second epoch of the received signals on the embedded system. Average classification accuracies of 99.11% for valence and 99.19% for arousal were achieved on the DEAP dataset. Comparison with related works shows that the resulting model is both highly accurate and implementable in practice.
Keywords
1. Introduction
Nowadays, human-computer interaction (HCI) systems are a large part of human life. Such interactions arguably need to follow the same social and natural principles as human-to-human interactions. In many related applications, emotional information is required to build more effective systems. For example, in some diseases, understanding the emotions of patients affects the manner of therapy. Some patients, for example those with autism spectrum disorder, cannot express their emotions; therefore, the ability to understand users' emotions is of interest (J. Zhang et al., 2020). Recent research has highlighted the lack of emotional information in HCI. To improve this ability in HCI systems, machines need to understand and interpret human emotions. The aim is to have adaptive and personalized means of emotion recognition, which requires research in different fields of science, e.g., artificial intelligence, psychology, computer science and neuroscience (Egger et al., 2019).
Humans may have different emotions such as happiness, sadness, joy and satisfaction. In the literature, different models have been proposed for emotional states (Al-Nafjan et al., 2017). One of the most popular is Russell's circumplex model, which defines emotions in a two-dimensional space of valence and arousal: valence indicates the level of pleasure, and arousal indicates the level of excitation (Russell, 1980). Although in Russell's model and some studies, e.g. (Javidan et al., 2021), emotions have been treated as continuous variables, in most related works they are treated as discrete states.
Emotions can be recognized from speech, behavior, motion, facial expression or physiological signals. Physiological
data that have been used for this purpose are Electrocardiography (ECG), Heart Rate Variability (HRV),
Electroencephalography (EEG), Facial Recognition (FR), Forehead Bio-Signal (FBS), Speech Recognition (SR),
Skin Temperature (SKT), Blood Volume Pulse (BVP), and Respiration (RSP) (Goshvarpour & Goshvarpour, 2020a;
Ko, 2018; Lieskovská et al., 2021; Nikolova et al., 2019; Villarejo et al., 2012). In (Q. Zhang et al., 2017),
respiration signals were studied to recognize emotions. The model was developed using the DEAP dataset (Koelstra et al., 2012) and the Augsburg University dataset. In some studies, Galvanic Skin Response (GSR) signals are used for emotion recognition. For example, in (Ayata et al., 2016), valence and arousal were categorized using GSR. In
(Villarejo et al., 2012), GSR was used to build a stress sensor. In (Domínguez-Jiménez et al., 2020), information
about heart rate as well as GSR was considered to recognize three target emotions. In some works, ECG signals
were decoded to detect the emotional states. For example, a deep neural network in (Keren et al., 2017) and a
scattering wavelet algorithm in (Sepúlveda et al., 2021) were employed to detect emotion from ECG signals.
To improve the accuracy in emotion recognition, some studies use both physical signs and physiological signals. In
(Tarnowski et al., 2018), an experiment was designed with 22 subjects using a movie as the stimulus meanwhile
GSR and EEG signals of each subject were extracted and processed. Frequency domain features were extracted and
two classifiers, SVM and KNN, were implemented. In (Goshvarpour et al., 2017), ECG and GSR signals were also used to recognize emotions. An experiment was designed with 11 subjects, with music clips as the stimuli. Features were extracted using wavelet and discrete cosine transforms. After reducing the dimension of the features, a Probabilistic Neural Network (PNN) was used to detect four classes of the valence-arousal plane. The results of this paper showed that ECG features yield higher accuracy than GSR features. Facial expression data, ECG, skin temperature and conductance, breathing signal, mouth length and pupil size were used in (Tan et al., 2020) to classify emotions.
Although research on emotion recognition is very extensive, some methods are subject-based, and in some cases the external reaction to a stimulus depends on the personality of the subject. For example, if a subject decides to conceal his or her feelings, the performance of some methods would be affected. Overall, methods based on physiological signals are more reliable. Among them, since the brain is the source of human reactions to external stimuli, EEG-based methods are among the most accurate and informative. Correlations between EEG signals and emotions have been demonstrated in the literature; the frontal scalp seems to carry more emotional activation than other regions of the brain (Wang et al., 2014). Furthermore, processing EEG signals has advantages over some other techniques: providing immediate medical care at low cost, and ease of use for patients who cannot respond or move, make EEG signals favorable for detecting some diseases as well as emotional states.
Research on emotion recognition using EEG signals is extensive. Studies differ in the categories of extracted features, classifiers, the number of channels used, datasets and experiments. In (Y. Zhang et al., 2016), EEG signals of only two channels were employed with an EMD strategy and an SVM classifier. Two neural models, CNN and DNN, were employed in (Tripathi et al., 2017) on the DEAP dataset, and results in 2-class and 3-class modes were compared. In some studies, different categories of features have been considered in order to find the most informative EEG features for emotion recognition. In (Khateeb et al., 2021), time, frequency and wavelet domain features were extracted, and nine classes of emotions were identified using SVM. In (Moon et al., 2018), the power spectrum and the correlation between pairs of electrodes were extracted and fed to a CNN for classification. In (Gannouni et al., 2021), multi-class emotion recognition was studied on the DEAP dataset; considering nine emotional states, the authors achieved a higher accuracy rate using QDC and RNN.
In the literature, most deep learning algorithms have achieved higher accuracy than classical machine learning ones. On the other hand, such algorithms are usually too complicated for practical implementation. In this paper, our aim is to develop an emotion recognition model using EEG signals that is both highly accurate and implementable on an embedded system. We use different state-of-the-art CNN models and assess them to find the most accurate one for recognizing emotional states. Each network is assessed in two settings: subject-dependent and subject-independent. Next, we optimize the selected CNN model to be lightweight and implementable on a Raspberry Pi processor. Using EEG signals from the DEAP dataset (Koelstra et al., 2012), we investigate the model while the processing is done on the embedded board in real time. The results show that this optimized model achieves high accuracy.
The rest of this paper is organized as follows. The method is explained in Section 2. The simulation results are
presented in Section 3. The implementation on the hardware is explained in Section 4. Section 5 concludes the
paper.
2. Method
In this study, a deep learning model is used to detect emotions using the DEAP dataset. The study is performed in
both subject-dependent and subject-independent settings. We have included preprocessing in our technique to
remove artifacts from the EEG data. Then, the baseline signal is removed, the data is segmented and finally passed to a convolutional network. EEGNet, the Shallow Convolutional Network (ShallowConvNet) and the Deep Convolutional Neural Network (DeepConvNet) are employed to recognize emotions in both subject-dependent and subject-
independent settings. Finally, the accuracy and F-score of the three convolutional networks are compared. We also
implement real-time emotion recognition process on an embedded system using a Raspberry Pi board. The steps
applied in this paper are shown in Fig. 1. These steps are described more precisely in the rest of the section.
Fig 1. The workflow diagram applied in this paper.
2.1. Dataset
The well-known DEAP dataset (Koelstra et al., 2012) is used in this study, which includes the electroencephalogram and
other peripheral physiological signals of 32 subjects aged between 19 and 37 while watching 40 one-minute music
videos as the stimuli. EEG signals of 32 channels are available in the DEAP dataset. The level of arousal and
valence of the subjects' emotions after each experiment were assessed using Self-Assessment Manikins (SAM), with values from 1 to 9 for each dimension. The emotional states were represented in the two-dimensional valence-arousal model, in which valence ranges from sad to joyful and arousal ranges from bored to excited (X. Li et al., 2018). We split each dimension of the valence-arousal space into two parts: values greater than 5 are labeled high valence/arousal, and values of 5 or less are labeled low valence/arousal.
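As a concrete illustration, this binarization of the SAM ratings takes a couple of lines of Python; the array contents below are made up for the example:

```python
import numpy as np

# Hypothetical SAM ratings (one per trial), each in [1, 9].
valence_ratings = np.array([7.1, 3.0, 5.0, 8.2])

# Binarize as described above: > 5 -> high (1), otherwise low (0).
valence_labels = (valence_ratings > 5).astype(int)
print(valence_labels)  # [1 0 0 1]
```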
2.2. Preprocessing
In the DEAP dataset, EEG signals were recorded with the international standard 10-20 electrode system at a sampling rate of 512 Hz. In the preprocessing step, the signals were down-sampled to 128 Hz, and a band-pass filter from 4.0 Hz to 45.0 Hz was applied to reduce EMG (electromyography) and ECG effects in the signals. Eye-movement artifacts and interference from other sources were removed using blind source separation techniques.
The duration of each EEG signal in the DEAP dataset is 63 seconds: a 3-second pre-trial baseline followed by 60 seconds of emotional information. The 3-second pre-trial signal, during which the video has not yet started playing, was repeated 20 times to form a 60-second signal, which was then subtracted from the 60-second trial. The pre-trial interval was then removed from the signals. Next, each 60-second signal was segmented into 3-second epochs, yielding 20 epochs per trial.
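A minimal NumPy sketch of this baseline-removal and segmentation step is given below; the shapes follow the DEAP description above (32 channels, 63 s at 128 Hz), while the function and variable names are ours:

```python
import numpy as np

FS = 128          # sampling rate after down-sampling (Hz)
BASELINE_S = 3    # pre-trial baseline length (s)
TRIAL_S = 60      # emotional-content length (s)
EPOCH_S = 3       # epoch length (s)

def remove_baseline_and_segment(trial):
    """trial: (channels, samples) array covering one 63-second recording."""
    base = trial[:, :BASELINE_S * FS]          # 3-s pre-trial baseline
    signal = trial[:, BASELINE_S * FS:]        # 60-s emotional part
    # Repeat the 3-s baseline 20 times to match the 60-s trial, then subtract.
    base_rep = np.tile(base, (1, TRIAL_S // BASELINE_S))
    signal = signal - base_rep
    # Split into non-overlapping 3-s epochs: (20, channels, 384).
    n_epochs = TRIAL_S // EPOCH_S
    return signal.reshape(signal.shape[0], n_epochs, EPOCH_S * FS).transpose(1, 0, 2)

# Example on random data shaped like one DEAP trial.
epochs = remove_baseline_and_segment(np.random.randn(32, 63 * 128))
print(epochs.shape)  # (20, 32, 384)
```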
2.3. Processing
After preprocessing, the signals are ready to be processed. In this work, three networks, namely EEGNet, ShallowConvNet and DeepConvNet, were used for emotion recognition. For each network, two learning approaches were conducted: subject-dependent and subject-independent. In the subject-dependent approach, a model was trained and its parameters extracted for each subject; here we had 800 samples (40 experiments × 20 three-second epochs) per subject. In the subject-independent approach, one model was trained over all subjects, so 32×800 samples were available. We adopted 5-fold cross-validation for both approaches.
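The two evaluation settings can be sketched as follows, assuming the epochs and labels are stacked into arrays of shape (n_subjects, 800, 32, 384) and (n_subjects, 800); scikit-learn's KFold is used here as one way to realize the 5-fold scheme:

```python
import numpy as np
from sklearn.model_selection import KFold

def subject_dependent_folds(X, y, subject):
    """5-fold CV over the 800 epochs of a single subject."""
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for tr, te in kf.split(X[subject]):
        yield X[subject][tr], y[subject][tr], X[subject][te], y[subject][te]

def subject_independent_folds(X, y):
    """5-fold CV over the pooled epochs of all subjects (32 x 800 samples)."""
    X_all = X.reshape(-1, *X.shape[2:])
    y_all = y.reshape(-1)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for tr, te in kf.split(X_all):
        yield X_all[tr], y_all[tr], X_all[te], y_all[te]
```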
The test results of the three networks under the two mentioned approaches were compared in terms of emotion recognition accuracy for arousal and valence, and the best method was determined on this basis. In the following, we introduce the three networks and the parameters we set in this study.
2.3.1. EEGNet
EEGNet (Lawhern et al., 2018) is a compact convolutional network that can be applied in different Brain-Computer
Interface (BCI) models and can be trained using limited data. The structure of this network is shown in Table 1.
The input of this network has shape (C, T), in which C is the number of channels (here C = 32) and T is the number of time samples (here T = 384 = 3 s × 128 Hz). Signals are passed through eight 2D convolutional filters; the output of this layer represents the EEG signals in 8 frequency bands. Next, the signals are fed to a DepthwiseConv2D layer acting as a spatial filter. To prevent over-fitting, we use a dropout layer. Average pooling is applied to reduce the feature size. After a separable convolution block, the last block is a softmax classifier with N units, where N is the number of classes, here set to 2. The model is trained using the Adam optimizer with a batch size of 64, and we run it for 50 and 30 training epochs in the two learning approaches.
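A minimal Keras sketch of this configuration is shown below. The filter count, input shape, class count, optimizer and batch size follow the description above; kernel lengths, pooling sizes and the dropout rate are taken from the original EEGNet paper (Lawhern et al., 2018) and should be treated as assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

C, T, N = 32, 384, 2   # channels, time samples (3 s x 128 Hz), classes

def build_eegnet(dropout=0.5):
    inp = layers.Input(shape=(C, T, 1))
    # Eight temporal 2D convolution filters -> 8 band-like feature maps.
    x = layers.Conv2D(8, (1, 64), padding='same', use_bias=False)(inp)
    x = layers.BatchNormalization()(x)
    # Depthwise convolution acts as a per-feature-map spatial filter.
    x = layers.DepthwiseConv2D((C, 1), depth_multiplier=2, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 4))(x)   # reduce feature size
    x = layers.Dropout(dropout)(x)           # guard against over-fitting
    # Separable convolution block.
    x = layers.SeparableConv2D(16, (1, 16), padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 8))(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(N, activation='softmax')(x)  # 2-unit softmax classifier
    return models.Model(inp, out)

model = build_eegnet()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train[..., None], y_train, batch_size=64, epochs=50)
```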
2.3.2. DeepConvNet
DeepConvNet (Schirrmeister et al., 2017) is an EEG decoding network that is compatible with any type of feature. The structure of this network is shown in Table 2. The network consists of five convolution layers and one dense classification layer.
2.3.3. ShallowConvNet
ShallowConvNet (Schirrmeister et al., 2017) has a shallower architecture than DeepConvNet and is designed to decode band-power features of signals. The structure of this network is shown in Table 3. This model consists of a temporal and then a spatial convolutional layer, a mean pooling layer and finally a classification layer.
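A corresponding Keras sketch of ShallowConvNet is given below; the squaring and log non-linearities around the mean pooling, which emulate band-power extraction, and the filter count of 40 follow Schirrmeister et al. (2017) rather than the description above:

```python
from tensorflow.keras import layers, models, backend as K

def build_shallow_convnet(C=32, T=384, N=2, dropout=0.5):
    inp = layers.Input(shape=(C, T, 1))
    x = layers.Conv2D(40, (1, 25))(inp)               # temporal convolution
    x = layers.Conv2D(40, (C, 1), use_bias=False)(x)  # spatial convolution
    x = layers.BatchNormalization()(x)
    x = layers.Activation(K.square)(x)                # squaring non-linearity
    x = layers.AveragePooling2D((1, 75), strides=(1, 15))(x)  # mean pooling
    x = layers.Activation(lambda v: K.log(K.clip(v, 1e-7, 1e4)))(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(N, activation='softmax')(x)    # classification layer
    return models.Model(inp, out)
```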
3. Results
As explained in Section 2, in this study, EEG signals from the DEAP dataset were used for emotion detection on an
embedded system. After preprocessing, baseline removal, and segmentation, the signals were fed to EEGNet,
DeepConvNet and ShallowConvNet using subject-dependent and subject-independent approaches to recognize the
emotional states. For evaluation of the model, 5-fold cross-validation was used and for comparing the results,
accuracy (Acc) and F-score (Yin et al., 2021) were utilized. The Acc parameter is defined as:
Acc = (TP + TN) / (TP + TN + FP + FN),    (1)

in which TP and TN are correctly classified cases (low arousal/negative valence, named positive emotion, and high arousal/positive valence, named negative emotion) and FN and FP are falsely identified ones.
The F-score combines the precision (Pre) and recall (Rec) rates as follows:

F-score = (2 × Rec × Pre) / (Rec + Pre),    (2)

Pre = TP / (TP + FP),    Rec = TP / (TP + FN).    (3)
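For concreteness, Eqs. (1)-(3) translate directly into a small Python helper (ours, with made-up confusion counts in the example):

```python
def emotion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion counts, Eqs. (1)-(3)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    return acc, pre, rec, 2 * rec * pre / (rec + pre)

print(emotion_metrics(tp=90, tn=88, fp=12, fn=10))
```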
Table 4 compares the results of the three networks, EEGNet, DeepConvNet and ShallowConvNet, in the subject-independent mode. As it turns out, the ShallowConvNet model outperforms the other two models in the subject-independent approach: we achieved the best accuracy of 90.49% for valence and 90.97% for arousal using ShallowConvNet.

Table 4. Subject-independent results of EEGNet, DeepConvNet and ShallowConvNet.
Model        Valence Accuracy   Valence F-score   Arousal Accuracy   Arousal F-score
EEGNet       70.85 ± 0.85       72.12 ± 1.26      73.30 ± 0.85       75.91 ± 1.41
Tables 5 and 6 show the accuracy and F-score results of the subject-dependent method for valence and arousal, respectively. To compare the results, the valence and arousal accuracies acquired with this method are also presented in Figs. 3 and 4. As can be seen in Tables 5 and 6 and in Figs. 3 and 4, ShallowConvNet is more accurate in the subject-dependent approach for both the arousal and valence dimensions. Furthermore, the results in Table 4 show that ShallowConvNet is also more accurate in the subject-independent approach. Therefore, we used this network in the embedded system.
To compare our results with other studies, we present Table 7, in which studies on the DEAP dataset are selected so that the comparison is meaningful. As the table shows, our method is more accurate in both the arousal and valence dimensions than the other listed works.
4. Hardware Implementation
In this study, a Raspberry Pi 4 processor was used to design the embedded system. This hardware has a quad-core 64-bit ARM Cortex-A72 (ARMv8) CPU running at 1.5 GHz, 2 GB of LPDDR4 RAM, and various communication interfaces. The board and the connections are shown in Fig 4 (Raspberry Pi Ltd, n.d.).
Fig 4. Hardware of the embedded system
In the implementation step of this study, an operating system (Armbian) was installed on the SD card. Commands were fed to the board over the SSH (Secure Shell) protocol, and socket programming was used to feed the data to the processor. The emotional states are recognized for every three-second epoch of the received signals.
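A minimal sketch of this data-feeding path is shown below, assuming epochs are sent as raw float32 bytes to a listener on the board; the port number and framing are our assumptions, not details given in the paper:

```python
import socket
import numpy as np

EPOCH_SHAPE = (32, 384)  # 32 channels x (3 s x 128 Hz)
PORT = 5000              # hypothetical port

def send_epoch(host, epoch):
    """Client side: stream one epoch to the Raspberry Pi."""
    with socket.create_connection((host, PORT)) as sock:
        sock.sendall(epoch.astype(np.float32).tobytes())

def recv_epoch(conn):
    """Board side: read exactly one epoch from an accepted connection."""
    n_bytes = int(np.prod(EPOCH_SHAPE)) * 4   # float32 = 4 bytes
    buf = b''
    while len(buf) < n_bytes:
        chunk = conn.recv(n_bytes - len(buf))
        if not chunk:
            raise ConnectionError('stream closed mid-epoch')
        buf += chunk
    return np.frombuffer(buf, dtype=np.float32).reshape(EPOCH_SHAPE)
```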
To reduce the size of the model and to increase the speed, TensorFlow Lite tools were used. The TensorFlow Lite version of the model executes efficiently on devices with limited resources. In this work, to shrink the model further, different quantization techniques were applied. Quantization reduces the precision of the numbers used to represent a model's parameters. Optimization and conversion reduce the size of the model and the latency, with little or no loss of accuracy.
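The conversion and post-training dynamic range quantization use the standard TensorFlow Lite API; a sketch, assuming the trained Keras model is held in `model`, is:

```python
import numpy as np
import tensorflow as tf

# Convert the trained Keras model to TensorFlow Lite with
# post-training dynamic range quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
with open('shallowconvnet_drq.tflite', 'wb') as f:
    f.write(converter.convert())

# On the Raspberry Pi, run the quantized model on each 3-second epoch.
interpreter = tf.lite.Interpreter(model_path='shallowconvnet_drq.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_epoch(epoch):  # epoch: (32, 384) float32 array
    interpreter.set_tensor(inp['index'], epoch[None, ..., None].astype(np.float32))
    interpreter.invoke()
    return int(np.argmax(interpreter.get_tensor(out['index'])))
```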
The results of applying two quantization techniques are shown in Tables 8 and 9 for the subject-dependent and subject-independent settings, respectively. Optimization and conversion result in a significant reduction in model size and faster computation (in most cases) without loss of accuracy. According to the results, the best model for implementation is the subject-dependent ShallowConvNet compressed with post-training dynamic range quantization.
Tables 8 and 9. Model size (bytes) and accuracy with different quantization techniques (recoverable values: valence 99.09 and 99.12 with dynamic range quantization; valence 91.35 and 92.21; valence 91.37 and 92.24).
5. Conclusion
In this study, we recognized emotional states by implementing three convolutional networks, EEGNet, ShallowConvNet and DeepConvNet, using EEG signals. For every network, we used two approaches: subject-dependent and subject-independent. The best average classification accuracies, 99.11% for valence and 99.19% for arousal, were achieved using ShallowConvNet with the subject-dependent approach. Furthermore, since the models did not need feature extraction and selection steps, the processing pipeline was shortened, which makes it possible to implement the algorithm on embedded systems. We used a Raspberry Pi processor in our embedded system. After optimization and quantization, we obtained a lightweight model that recognizes the emotional state of every three-second epoch of received signals. Such hardware could be used in practical devices such as emotion-aware systems.
Future studies aim to use mobile and wearable sensors to collect facial expressions and other physiological signals such as ECG and EEG, and to combine them in an appropriate framework to improve accuracy in real-time emotion detection. In this study, as in most papers in the literature, emotions were identified in the two dimensions of valence and arousal. Expanding the model to include additional dimensions may also be considered in future work; for example, by analyzing situational information, the subject's state could be predicted in a three-dimensional emotion model.
6. ACKNOWLEDGMENTS
This work is supported by Isfahan University of Medical Sciences, Grant No. 2400207.
7. REFERENCES
Al-Nafjan, A., Hosny, M., Al-Ohali, Y., & Al-Wabil, A. (2017). Review and Classification of Emotion Recognition
Based on EEG Brain-Computer Interface System Research: A Systematic Review. Applied Sciences, 7(12),
1239. https://doi.org/10.3390/app7121239
Ayata, D., Yaslan, Y., & Kamaşak, M. (2016). Emotion recognition via random forest and galvanic skin response:
Comparison of time based feature sets, window sizes and wavelet approaches. 2016 Medical Technologies
National Congress (TIPTEKNO), 1–4. https://doi.org/10.1109/TIPTEKNO.2016.7863130
Cui, H., Liu, A., Zhang, X., Chen, X., Wang, K., & Chen, X. (2020). EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowledge-Based Systems, 205, 106243. https://doi.org/10.1016/j.knosys.2020.106243
Domínguez-Jiménez, J. A., Campo-Landines, K. C., Martínez-Santos, J. C., Delahoz, E. J., & Contreras-Ortiz, S. H. (2020). A machine learning model for emotion recognition from physiological signals. Biomedical Signal Processing and Control.
Egger, M., Ley, M., & Hanke, S. (2019). Emotion Recognition from Physiological Signal Analysis: A Review.
Gannouni, S., Aledaily, A., Belwafi, K., & Aboalsamh, H. (2021). Emotion detection using electroencephalography signals and a zero-time windowing-based epoch estimation and relevant electrode identification. Scientific Reports.
Goshvarpour, A., Abbasi, A., & Goshvarpour, A. (2017). An accurate emotion recognition system using ECG and
GSR signals and matching pursuit method. Biomedical Journal, 40(6), 355–368.
https://doi.org/10.1016/j.bj.2017.11.001
Goshvarpour, A., & Goshvarpour, A. (2020a). The potential of photoplethysmogram and galvanic skin response in
emotion recognition using nonlinear features. Physical and Engineering Sciences in Medicine, 43(1), 119–
134. https://doi.org/10.1007/s13246-019-00825-7
Goshvarpour, A., & Goshvarpour, A. (2020b). A Novel Approach for EEG Electrode Selection in Automated
Emotion Recognition Based on Lagged Poincare’s Indices and sLORETA. Cognitive Computation, 12(3),
602–618. https://doi.org/10.1007/s12559-019-09699-z
Huang, D., Chen, S., Liu, C., Zheng, L., Tian, Z., & Jiang, D. (2021). Differences first in asymmetric brain: A bi-
hemisphere discrepancy convolutional neural network for EEG emotion recognition. Neurocomputing, 448,
140–151. https://doi.org/10.1016/j.neucom.2021.03.105
Javidan, M., Yazdchi, M., Baharlouei, Z., & Mahnam, A. (2021). Feature and channel selection for designing a regression-based continuous-variable emotion recognition system with two EEG channels. Biomedical Signal Processing and Control.
Khateeb, M., Anwar, S. M., & Alnowami, M. (2021). Multi-Domain Feature Fusion for Emotion Classification Using DEAP Dataset. IEEE Access.
Ko, B. C. (2018). A Brief Review of Facial Emotion Recognition Based on Visual Information. Sensors, 18(2), 401. https://doi.org/10.3390/s18020401
Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., & Patras, I. (2012). DEAP: A Database for Emotion Analysis Using Physiological Signals. IEEE Transactions on Affective Computing, 3(1), 18–31. https://doi.org/10.1109/T-AFFC.2011.15
Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., & Lance, B. J. (2018). EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering, 15(5), 056013. https://doi.org/10.1088/1741-2552/aace8c
Li, R., Liang, Y., Liu, X., Wang, B., Huang, W., Cai, Z., Ye, Y., Qiu, L., & Pan, J. (2021). MindLink-Eumpy: An
Open-Source Python Toolbox for Multimodal Emotion Recognition. Frontiers in Human Neuroscience, 15.
https://doi.org/10.3389/fnhum.2021.621493
Li, X., Song, D., Zhang, P., Zhang, Y., Hou, Y., & Hu, B. (2018). Exploring EEG Features in Cross-Subject Emotion Recognition. Frontiers in Neuroscience.
Lieskovská, E., Jakubec, M., Jarina, R., & Chmulík, M. (2021). A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism. Electronics, 10(10), 1163. https://doi.org/10.3390/electronics10101163
Raspberry Pi Ltd. (n.d.). Raspberry Pi 4 Model B specifications. Retrieved March 2, 2022, from https://www.raspberrypi.com/products/raspberry-pi-4-model-b/
Moon, S.-E., Jang, S., & Lee, J.-S. (2018). Convolutional Neural Network Approach for EEG-Based Emotion Recognition Using Brain Connectivity and its Spatial Information. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/ICASSP.2018.8461315
Nath, D., Anubhav, Singh, M., Sethia, D., Kalra, D., & Indu, S. (2020). A Comparative Study of Subject-Dependent
and Subject-Independent Strategies for EEG-Based Emotion Recognition using LSTM Network.
Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis, 142–147.
https://doi.org/10.1145/3388142.3388167
Nikolova, D., Mihaylova, P., Manolova, A., & Georgieva, P. (2019). ECG-Based Human Emotion Recognition Across Multiple Subjects. In V. Poulkov (Ed.), Future Access Enablers for Ubiquitous and Intelligent Infrastructures. Springer.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–
1178. https://doi.org/10.1037/h0077714
Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., & Ball, T. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730
Sepúlveda, A., Castillo, F., Palma, C., & Rodriguez-Fernandez, M. (2021). Emotion Recognition from ECG Signals
Using Wavelet Scattering and Machine Learning. Applied Sciences, 11(11), 4945.
https://doi.org/10.3390/app11114945
Suhaimi, N. S., Mountstephens, J., & Teo, J. (2020). EEG-Based Emotion Recognition: A State-of-the-Art Review
of Current Trends and Opportunities. Computational Intelligence and Neuroscience, 2020, 1–19.
https://doi.org/10.1155/2020/8875426
Tan, C., Ceballos, G., Kasabov, N., & Puthanmadam Subramaniyam, N. (2020). FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network. Sensors.
Tarnowski, P., Kołodziej, M., Majkowski, A., & Rak, R. J. (2018). Combined analysis of GSR and EEG signals for emotion recognition. 2018 International Interdisciplinary PhD Workshop (IIPHDW). https://doi.org/10.1109/IIPHDW.2018.8388342
Tripathi, S., Acharya, S., Sharma, R., Mittal, S., & Bhattacharya, S. (2017). Using Deep and Convolutional Neural Networks for Accurate Emotion Classification on DEAP Data. Proceedings of the AAAI Conference on Artificial Intelligence.
Villarejo, M. V., Zapirain, B. G., & Zorrilla, A. M. (2012). A Stress Sensor Based on Galvanic Skin Response (GSR) Controlled by ZigBee. Sensors, 12(5), 6075–6101.
Yin, Y., Zheng, X., Hu, B., Zhang, Y., & Cui, X. (2021). EEG emotion recognition using fusion model of graph
convolutional neural networks and LSTM. Applied Soft Computing, 100, 106954.
https://doi.org/10.1016/j.asoc.2020.106954
Zhang, J., Yin, Z., Chen, P., & Nichele, S. (2020). Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review. Information Fusion. https://doi.org/10.1016/j.inffus.2020.01.011
Zhang, Q., Chen, X., Zhan, Q., Yang, T., & Xia, S. (2017). Respiration-based emotion recognition with deep learning. Computers in Industry.
Zhang, Y., Ji, X., & Zhang, S. (2016). An approach to EEG-based emotion recognition using combined feature extraction method. Neuroscience Letters.