Emotion Recognition
a, b, c Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
d Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Abstract
Recognizing emotional states would make human-computer interaction (HCI) systems more effective in practice. Correlations between electroencephalography (EEG) signals and emotions have been demonstrated in the literature; hence, EEG-based methods are among the most accurate and informative. In this study, a Convolutional Neural Network (CNN) model is optimized to recognize emotions from EEG signals, and a Raspberry Pi minicomputer is used to implement the optimized, lightweight model. The emotional states were recognized for every three-second epoch of the received signals on the embedded system. Average classification accuracies of 99.11% for valence and 99.19% for arousal were achieved on the DEAP dataset. Comparison with related works shows that the resulting model is both highly accurate and implementable in practice.
Keywords
1. Introduction
Nowadays, human-computer interaction (HCI) systems are a large part of human life. Such interactions arguably need to follow the same social and natural principles as human-to-human interactions. In many related applications, emotional information is required to build more effective systems. For example, in some diseases, understanding the emotions of patients affects the manner of therapy. Some patients, for example those with autism spectrum disorder, cannot express their emotions; therefore, the ability to understand users' emotions is of interest (J. Zhang et al., 2020). Recent research has highlighted the lack of emotional information in HCI. To improve this ability in HCI systems, machines need to understand and interpret human emotions. The aim is to have adaptive and personalized means of emotion recognition, which requires research in different fields of science, e.g., artificial intelligence, psychology, computer science and neuroscience (Egger et al., 2019).
Humans may have different emotions such as happiness, sadness, joy and satisfaction. In the literature, different models have been proposed for emotional states (Al-Nafjan et al., 2017). One of the most popular is Russell's circumplex model, which defines emotions in a two-dimensional space of valence and arousal: valence indicates the level of pleasure, and arousal indicates the level of excitation (Russell, 1980). Although in Russell's model and some studies, e.g. (Javidan et al., 2021), emotions have been treated as continuous variables, in most related works they are treated as discrete states.
Emotions can be recognized from speech, behavior, motion, facial expression or physiological signals. Physiological
data that have been used for this purpose are Electrocardiography (ECG), Heart Rate Variability (HRV),
Electroencephalography (EEG), Facial Recognition (FR), Forehead Bio-Signal (FBS), Speech Recognition (SR),
Skin Temperature (SKT), Blood Volume Pulse (BVP), and Respiration (RSP) (Goshvarpour & Goshvarpour, 2020a;
Ko, 2018; Lieskovská et al., 2021; Nikolova et al., 2019; Villarejo et al., 2012). In (Q. Zhang et al., 2017),
respiration signals were studied to recognize emotions. The model was developed using the DEAP dataset (Koelstra et al., 2012) and the Augsburg University dataset. In some studies, Galvanic Skin Response (GSR) signals are used for emotion recognition. For example, in (Ayata et al., 2016), valence and arousal were categorized using GSR. In
(Villarejo et al., 2012), GSR was used to build a stress sensor. In (Domínguez-Jiménez et al., 2020), information
about heart rate as well as GSR was considered to recognize three target emotions. In some works, ECG signals
were decoded to detect the emotional states. For example, a deep neural network in (Keren et al., 2017) and a
scattering wavelet algorithm in (Sepúlveda et al., 2021) were employed to detect emotion from ECG signals.
To improve the accuracy in emotion recognition, some studies use both physical signs and physiological signals. In
(Tarnowski et al., 2018), an experiment was designed with 22 subjects using a movie as the stimulus meanwhile
GSR and EEG signals of each subject were extracted and processed. Frequency domain features were extracted and
two classifiers, SVM and KNN, were implemented. In (Goshvarpour et al., 2017), ECG and GSR signals were also used to recognize emotions. An experiment was designed with 11 subjects, with music clips as the stimuli. Features were extracted using wavelet and discrete cosine transforms. After reducing the dimension of the features, a Probabilistic Neural Network (PNN) was used to detect four classes of the valence-arousal plane. The results of this paper showed that ECG features yield higher accuracy than GSR features. Facial expression data, ECG, skin temperature and conductance, breathing signal, mouth length and pupil size were used in (Tan et al., 2020) to classify emotions.
Although research on emotion recognition is very extensive, some methods are subject-based, and in some cases the external reaction to a stimulus depends on the personality of the subject. For example, if a subject decides to conceal his or her feelings, the performance of some methods would be affected. Overall, methods based on physiological signals are more reliable. Among them, since the brain is the source of human reactions to external stimuli, EEG-based methods are among the most accurate and informative. Correlations between EEG signals and emotions have been demonstrated in the literature; the frontal scalp seems to carry more emotional activation than other regions of the brain (Wang et al., 2014). Furthermore, processing EEG signals has advantages over some other techniques: providing immediate medical care at low cost, and ease of use for patients who cannot respond or move, make EEG signals favorable for detecting some diseases as well as emotional states.
Research on emotion recognition using EEG signals is extensive. Studies differ in the categories of extracted features, classifiers, the number of channels used, datasets and experiments. In (Y. Zhang et al., 2016), EEG signals of only two channels were employed with an EMD strategy and an SVM classifier. Two neural models, CNN and DNN, were employed in (Tripathi et al., 2017) on the DEAP dataset, and results in 2-class and 3-class modes were compared. In some studies, different categories of features have been considered in order to find the most informative EEG features for emotion recognition. In (Khateeb et al., 2021), time, frequency and wavelet domain features were extracted, and nine classes of emotions were identified using SVM. In (Moon et al., 2018), the power spectrum and the correlation between pairs of electrodes were extracted and fed to a CNN for classification. In (Gannouni et al., 2021), multi-class emotion recognition was studied on the DEAP dataset; considering nine emotional states, the authors achieved a higher accuracy rate using QDC and RNN.
In the literature, most deep learning algorithms have achieved higher accuracy than classical machine learning ones. On the other hand, such algorithms are usually too complicated for practical implementation. In this paper, our aim is to develop an emotion recognition model using EEG signals that is both highly accurate and implementable on an embedded system. We use different state-of-the-art CNN models and assess them to find the most accurate one for recognizing emotional states. Each network is assessed in two settings: subject-dependent and subject-independent. Next, we optimize the selected CNN model to be lightweight and implementable on a Raspberry Pi processor. Using EEG signals from the DEAP dataset (Koelstra et al., 2012), we investigate the model while the processing is done on the embedded board in real time. The results show that this optimized model achieves high accuracy.
The rest of this paper is organized as follows. The method is explained in Section 2. The simulation results are
presented in Section 3. The implementation on the hardware is explained in Section 4. Section 5 concludes the
paper.
2. Method
In this study, a deep learning model is used to detect emotions using the DEAP dataset. The study is performed in
both subject-dependent and subject-independent settings. We have included preprocessing in our technique to
remove artifacts from the EEG data. Then, the baseline signal is removed, the data is segmented and finally passed to a convolutional network. EEGNet, the Shallow Convolutional Network (ShallowConvNet) and the Deep Convolutional Neural Network (DeepConvNet) are employed to recognize emotions in both subject-dependent and subject-
independent settings. Finally, the accuracy and F-score of the three convolutional networks are compared. We also
implement real-time emotion recognition process on an embedded system using a Raspberry Pi board. The steps
applied in this paper are shown in Fig. 1. These steps are described more precisely in the rest of the section.
Fig 1. The workflow diagram applied in this paper.
2.1. Dataset
The well-known DEAP dataset (Koelstra et al., 2012) is used in this study, which includes the electroencephalogram and
other peripheral physiological signals of 32 subjects aged between 19 and 37 while watching 40 one-minute music
videos as the stimuli. EEG signals of 32 channels are available in the DEAP dataset. The level of arousal and
valence of the subjects' emotions after each experiment were assessed using Self-Assessment Manikins (SAM), with values from 1 to 9 for each dimension. The emotional states were represented in the two-dimensional valence-arousal model, in which valence ranges from sad to joyful and arousal ranges from bored to excited (X. Li et al., 2018). We split each dimension of the valence-arousal space into two parts: values greater than 5 are labeled high valence/arousal, and values of 5 or less are labeled low valence/arousal.
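As a concrete illustration, this binarization of the SAM ratings takes a couple of lines of Python; the array contents below are made up for the example:

```python
import numpy as np

# Hypothetical SAM ratings (one per trial), each in [1, 9].
valence_ratings = np.array([7.1, 3.0, 5.0, 8.2])

# Binarize as described above: > 5 -> high (1), otherwise low (0).
valence_labels = (valence_ratings > 5).astype(int)
print(valence_labels)  # [1 0 0 1]
```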
2.2. Preprocessing
In the DEAP dataset, EEG signals were recorded with the international standard 10-20 electrode system at a sampling rate of 512 Hz. In the preprocessing step, the signals were down-sampled to 128 Hz, and a band-pass filter from 4.0 Hz to 45.0 Hz was applied to reduce EMG (electromyography) and ECG effects in the signals. Eye-movement artifacts and interference from other sources were removed using blind source separation techniques.
The duration of each EEG signal in the DEAP dataset is 63 seconds: a 3-second pre-trial baseline followed by 60 seconds of emotional information. The 3-second pre-trial signal, during which the video has not yet started playing, was repeated 20 times to form a 60-second signal, which was then subtracted from the 60-second trial. The pre-trial interval was then removed from the signals. Next, each 60-second signal was segmented into 3-second epochs, yielding 20 epochs per trial.
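A minimal NumPy sketch of this baseline-removal and segmentation step is given below; the shapes follow the DEAP description above (32 channels, 63 s at 128 Hz), while the function and variable names are ours:

```python
import numpy as np

FS = 128          # sampling rate after down-sampling (Hz)
BASELINE_S = 3    # pre-trial baseline length (s)
TRIAL_S = 60      # emotional-content length (s)
EPOCH_S = 3       # epoch length (s)

def remove_baseline_and_segment(trial):
    """trial: (channels, samples) array covering one 63-second recording."""
    base = trial[:, :BASELINE_S * FS]          # 3-s pre-trial baseline
    signal = trial[:, BASELINE_S * FS:]        # 60-s emotional part
    # Repeat the 3-s baseline 20 times to match the 60-s trial, then subtract.
    base_rep = np.tile(base, (1, TRIAL_S // BASELINE_S))
    signal = signal - base_rep
    # Split into non-overlapping 3-s epochs: (20, channels, 384).
    n_epochs = TRIAL_S // EPOCH_S
    return signal.reshape(signal.shape[0], n_epochs, EPOCH_S * FS).transpose(1, 0, 2)

# Example on random data shaped like one DEAP trial.
epochs = remove_baseline_and_segment(np.random.randn(32, 63 * 128))
print(epochs.shape)  # (20, 32, 384)
```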
2.3. Processing
After preprocessing, the signals are ready to be processed. In this work, three networks, namely EEGNet, ShallowConvNet and DeepConvNet, were used for emotion recognition. For each network, two learning approaches were conducted: subject-dependent and subject-independent. In the subject-dependent approach, a model was trained and its parameters extracted for each subject; here we had 800 samples (40 experiments × 20 three-second epochs) per subject. In the subject-independent approach, one model was trained over all subjects, so 32×800 samples were available. We adopted 5-fold cross-validation for both approaches.
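The two evaluation settings can be sketched as follows, assuming the epochs and labels are stacked into arrays of shape (n_subjects, 800, 32, 384) and (n_subjects, 800); scikit-learn's KFold is used here as one way to realize the 5-fold scheme:

```python
import numpy as np
from sklearn.model_selection import KFold

def subject_dependent_folds(X, y, subject):
    """5-fold CV over the 800 epochs of a single subject."""
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for tr, te in kf.split(X[subject]):
        yield X[subject][tr], y[subject][tr], X[subject][te], y[subject][te]

def subject_independent_folds(X, y):
    """5-fold CV over the pooled epochs of all subjects (32 x 800 samples)."""
    X_all = X.reshape(-1, *X.shape[2:])
    y_all = y.reshape(-1)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for tr, te in kf.split(X_all):
        yield X_all[tr], y_all[tr], X_all[te], y_all[te]
```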
The test results of the three networks under the two mentioned approaches were compared in terms of emotion recognition accuracy for arousal and valence, and the best method was determined on this basis. In the following, we introduce the three networks and the parameters we set in this study.
2.3.1. EEGNet
EEGNet (Lawhern et al., 2018) is a compact convolutional network that can be applied in different Brain-Computer
Interface (BCI) models and can be trained using limited data. The structure of this network is shown in Table 1.
The input of this network has shape (C, T), in which C is the number of channels (here C = 32) and T is the number of time samples (here T = 384 = 3 s × 128 Hz). Signals are passed through eight 2D convolutional filters; the output of this layer represents the EEG signals in 8 frequency bands. Next, the signals are fed to a DepthwiseConv2D layer acting as a spatial filter. To prevent over-fitting, we use a dropout layer. Average pooling is applied to reduce the feature size. After a separable convolution block, the last block is a softmax classifier with N units, where N is the number of classes, here set to 2. The model is trained using the Adam optimizer with a batch size of 64, and we run it for 50 and 30 training epochs in the two learning approaches.
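A minimal Keras sketch of this configuration is shown below. The filter count, input shape, class count, optimizer and batch size follow the description above; kernel lengths, pooling sizes and the dropout rate are taken from the original EEGNet paper (Lawhern et al., 2018) and should be treated as assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

C, T, N = 32, 384, 2   # channels, time samples (3 s x 128 Hz), classes

def build_eegnet(dropout=0.5):
    inp = layers.Input(shape=(C, T, 1))
    # Eight temporal 2D convolution filters -> 8 band-like feature maps.
    x = layers.Conv2D(8, (1, 64), padding='same', use_bias=False)(inp)
    x = layers.BatchNormalization()(x)
    # Depthwise convolution acts as a per-feature-map spatial filter.
    x = layers.DepthwiseConv2D((C, 1), depth_multiplier=2, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 4))(x)   # reduce feature size
    x = layers.Dropout(dropout)(x)           # guard against over-fitting
    # Separable convolution block.
    x = layers.SeparableConv2D(16, (1, 16), padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 8))(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(N, activation='softmax')(x)  # 2-unit softmax classifier
    return models.Model(inp, out)

model = build_eegnet()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train[..., None], y_train, batch_size=64, epochs=50)
```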
2.3.2. DeepConvNet
DeepConvNet (Schirrmeister et al., 2017) is an EEG decoding network that is compatible with any type of feature. The structure of this network is shown in Table 2. The network consists of five convolution layers and one dense classification layer.
2.3.3. ShallowConvNet
ShallowConvNet (Schirrmeister et al., 2017) has a shallower architecture than DeepConvNet and is designed to decode band-power features of signals. The structure of this network is shown in Table 3. This model consists of a temporal and then a spatial convolutional layer, a mean pooling layer and finally a classification layer.
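A corresponding Keras sketch of ShallowConvNet is given below; the squaring and log non-linearities around the mean pooling, which emulate band-power extraction, and the filter count of 40 follow Schirrmeister et al. (2017) rather than the description above:

```python
from tensorflow.keras import layers, models, backend as K

def build_shallow_convnet(C=32, T=384, N=2, dropout=0.5):
    inp = layers.Input(shape=(C, T, 1))
    x = layers.Conv2D(40, (1, 25))(inp)               # temporal convolution
    x = layers.Conv2D(40, (C, 1), use_bias=False)(x)  # spatial convolution
    x = layers.BatchNormalization()(x)
    x = layers.Activation(K.square)(x)                # squaring non-linearity
    x = layers.AveragePooling2D((1, 75), strides=(1, 15))(x)  # mean pooling
    x = layers.Activation(lambda v: K.log(K.clip(v, 1e-7, 1e4)))(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(N, activation='softmax')(x)    # classification layer
    return models.Model(inp, out)
```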
3. Results
As explained in Section 2, in this study, EEG signals from the DEAP dataset were used for emotion detection on an
embedded system. After preprocessing, baseline removal, and segmentation, the signals were fed to EEGNet,
DeepConvNet and ShallowConvNet using subject-dependent and subject-independent approaches to recognize the
emotional states. For evaluation of the model, 5-fold cross-validation was used and for comparing the results,
accuracy (Acc) and F-score (Yin et al., 2021) were utilized. The Acc parameter is defined as:
Acc = (TP + TN) / (TP + TN + FP + FN),    (1)

in which TP and TN are correctly classified cases (low arousal/negative valence, named positive emotion, and high arousal/positive valence, named negative emotion) and FN and FP are falsely identified ones.
The F-score combines the precision (Pre) and recall (Rec) rates as follows:

F-score = (2 × Rec × Pre) / (Rec + Pre),    (2)

Pre = TP / (TP + FP),    Rec = TP / (TP + FN).    (3)
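For concreteness, Eqs. (1)-(3) translate directly into a small Python helper (ours, with made-up confusion counts in the example):

```python
def emotion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion counts, Eqs. (1)-(3)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    return acc, pre, rec, 2 * rec * pre / (rec + pre)

print(emotion_metrics(tp=90, tn=88, fp=12, fn=10))
```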
Table 4 compares the results of the three networks, EEGNet, DeepConvNet and ShallowConvNet, in the subject-independent mode. As it turns out, the ShallowConvNet model outperforms the other two models in the subject-independent approach: we achieved the best accuracy of 90.49% for valence and 90.97% for arousal using ShallowConvNet.

Table 4. Subject-independent results of EEGNet, DeepConvNet and ShallowConvNet.
Model        Valence Accuracy   Valence F-score   Arousal Accuracy   Arousal F-score
EEGNet       70.85 ± 0.85       72.12 ± 1.26      73.30 ± 0.85       75.91 ± 1.41
Tables 5 and 6 show the accuracy and F-score results of the subject-dependent method for valence and arousal, respectively. To compare the results, the valence and arousal accuracies acquired with this method are also presented in Figs. 3 and 4. As can be seen in Tables 5 and 6 and in Figs. 3 and 4, ShallowConvNet is more accurate in the subject-dependent approach for both the arousal and valence dimensions. Furthermore, the results in Table 4 show that ShallowConvNet is also more accurate in the subject-independent approach. Therefore, we used this network in the embedded system.
To compare our results with other studies, we present Table 7, in which studies on the DEAP dataset are selected so that the comparison is meaningful. As the table shows, our method is more accurate in both the arousal and valence dimensions than the other listed works.
4. Hardware Implementation
In this study, a Raspberry Pi 4 processor was used to design the embedded system. This hardware has a quad-core 64-bit ARM Cortex-A72 (ARMv8) CPU running at 1.5 GHz, 2 GB of LPDDR4 RAM, and various communication interfaces. The board and the connections are shown in Fig 4 (Raspberry Pi Ltd, n.d.).
Fig 4. Hardware of the embedded system
In the implementation step of this study, an operating system (Armbian) was installed on the SD card. Commands were fed to the board over the SSH (Secure Shell) protocol, and socket programming was used to feed the data to the processor. The emotional states are recognized for every three-second epoch of the received signals.
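A minimal sketch of this data-feeding path is shown below, assuming epochs are sent as raw float32 bytes to a listener on the board; the port number and framing are our assumptions, not details given in the paper:

```python
import socket
import numpy as np

EPOCH_SHAPE = (32, 384)  # 32 channels x (3 s x 128 Hz)
PORT = 5000              # hypothetical port

def send_epoch(host, epoch):
    """Client side: stream one epoch to the Raspberry Pi."""
    with socket.create_connection((host, PORT)) as sock:
        sock.sendall(epoch.astype(np.float32).tobytes())

def recv_epoch(conn):
    """Board side: read exactly one epoch from an accepted connection."""
    n_bytes = int(np.prod(EPOCH_SHAPE)) * 4   # float32 = 4 bytes
    buf = b''
    while len(buf) < n_bytes:
        chunk = conn.recv(n_bytes - len(buf))
        if not chunk:
            raise ConnectionError('stream closed mid-epoch')
        buf += chunk
    return np.frombuffer(buf, dtype=np.float32).reshape(EPOCH_SHAPE)
```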
To reduce the size of the model and to increase the speed, TensorFlow Lite tools were used. The TensorFlow Lite version of the model executes efficiently on devices with limited resources. In this work, to shrink the model further, different quantization techniques were applied. Quantization reduces the precision of the numbers used to represent a model's parameters. Optimization and conversion reduce the size of the model and the latency, with little or no loss of accuracy.
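The conversion and post-training dynamic range quantization use the standard TensorFlow Lite API; a sketch, assuming the trained Keras model is held in `model`, is:

```python
import numpy as np
import tensorflow as tf

# Convert the trained Keras model to TensorFlow Lite with
# post-training dynamic range quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
with open('shallowconvnet_drq.tflite', 'wb') as f:
    f.write(converter.convert())

# On the Raspberry Pi, run the quantized model on each 3-second epoch.
interpreter = tf.lite.Interpreter(model_path='shallowconvnet_drq.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_epoch(epoch):  # epoch: (32, 384) float32 array
    interpreter.set_tensor(inp['index'], epoch[None, ..., None].astype(np.float32))
    interpreter.invoke()
    return int(np.argmax(interpreter.get_tensor(out['index'])))
```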
The results of applying two quantization techniques are shown in Tables 8 and 9 for the subject-dependent and subject-independent settings, respectively. Optimization and conversion result in a significant reduction in model size and faster computation (in most cases) without loss of accuracy. According to the results, the best model for implementation is the subject-dependent ShallowConvNet compressed with post-training dynamic range quantization.
Tables 8 and 9. Model size (bytes) and accuracy with different quantization techniques (recoverable values: valence 99.09 and 99.12 with dynamic range quantization; valence 91.35 and 92.21; valence 91.37 and 92.24).
5. Conclusion
In this study, we recognized emotional states by implementing three convolutional networks, EEGNet, ShallowConvNet and DeepConvNet, using EEG signals. For every network, we used two approaches: subject-dependent and subject-independent. The best average classification accuracies, 99.11% for valence and 99.19% for arousal, were achieved using ShallowConvNet with the subject-dependent approach. Furthermore, since the models did not need feature extraction and selection steps, the processing pipeline was shortened, which makes it possible to implement the algorithm on embedded systems. We used a Raspberry Pi processor in our embedded system. After optimization and quantization, we obtained a lightweight model that recognizes the emotional state of every three-second epoch of received signals. Such hardware could be used in practical devices such as emotion-aware systems.
Future studies aim to use mobile and wearable sensors to collect facial expressions and other physiological signals such as ECG and EEG, and to combine them in an appropriate framework to improve accuracy in real-time emotion detection. In this study, as in most papers in the literature, emotions were identified in the two dimensions of valence and arousal. Expanding the model to include additional dimensions may also be considered in future work; for example, by analyzing situational information, the subject's state could be predicted in a three-dimensional emotion model.
6. ACKNOWLEDGMENTS
This work is supported by Isfahan University of Medical Sciences, Grant No. 2400207.
7. REFERENCES
Al-Nafjan, A., Hosny, M., Al-Ohali, Y., & Al-Wabil, A. (2017). Review and Classification of Emotion Recognition
Based on EEG Brain-Computer Interface System Research: A Systematic Review. Applied Sciences, 7(12),
1239. https://doi.org/10.3390/app7121239
Ayata, D., Yaslan, Y., & Kamaşak, M. (2016). Emotion recognition via random forest and galvanic skin response:
Comparison of time based feature sets, window sizes and wavelet approaches. 2016 Medical Technologies
National Congress (TIPTEKNO), 1–4. https://doi.org/10.1109/TIPTEKNO.2016.7863130
Cui, H., Liu, A., Zhang, X., Chen, X., Wang, K., & Chen, X. (2020). EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowledge-Based Systems, 205, 106243. https://doi.org/10.1016/j.knosys.2020.106243
Domínguez-Jiménez, J. A., Campo-Landines, K. C., Martínez-Santos, J. C., Delahoz, E. J., & Contreras-Ortiz, S. H. (2020). A machine learning model for emotion recognition from physiological signals. Biomedical Signal Processing and Control.
Egger, M., Ley, M., & Hanke, S. (2019). Emotion Recognition from Physiological Signal Analysis: A Review.
Gannouni, S., Aledaily, A., Belwafi, K., & Aboalsamh, H. (2021). Emotion detection using electroencephalography signals and a zero-time windowing-based epoch estimation and relevant electrode identification. Scientific Reports.
Goshvarpour, A., Abbasi, A., & Goshvarpour, A. (2017). An accurate emotion recognition system using ECG and
GSR signals and matching pursuit method. Biomedical Journal, 40(6), 355–368.
https://doi.org/10.1016/j.bj.2017.11.001
Goshvarpour, A., & Goshvarpour, A. (2020a). The potential of photoplethysmogram and galvanic skin response in
emotion recognition using nonlinear features. Physical and Engineering Sciences in Medicine, 43(1), 119–
134. https://doi.org/10.1007/s13246-019-00825-7
Goshvarpour, A., & Goshvarpour, A. (2020b). A Novel Approach for EEG Electrode Selection in Automated
Emotion Recognition Based on Lagged Poincare’s Indices and sLORETA. Cognitive Computation, 12(3),
602–618. https://doi.org/10.1007/s12559-019-09699-z
Huang, D., Chen, S., Liu, C., Zheng, L., Tian, Z., & Jiang, D. (2021). Differences first in asymmetric brain: A bi-
hemisphere discrepancy convolutional neural network for EEG emotion recognition. Neurocomputing, 448,
140–151. https://doi.org/10.1016/j.neucom.2021.03.105
Javidan, M., Yazdchi, M., Baharlouei, Z., & Mahnam, A. (2021). Feature and channel selection for designing a regression-based continuous-variable emotion recognition system with two EEG channels. Biomedical Signal Processing and Control.
Khateeb, M., Anwar, S. M., & Alnowami, M. (2021). Multi-Domain Feature Fusion for Emotion Classification Using DEAP Dataset. IEEE Access.
Ko, B. C. (2018). A Brief Review of Facial Emotion Recognition Based on Visual Information. Sensors, 18(2), 401. https://doi.org/10.3390/s18020401
Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., & Patras, I. (2012). DEAP: A Database for Emotion Analysis Using Physiological Signals. IEEE Transactions on Affective Computing, 3(1), 18–31. https://doi.org/10.1109/T-AFFC.2011.15
Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., & Lance, B. J. (2018). EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering, 15(5), 056013. https://doi.org/10.1088/1741-2552/aace8c
Li, R., Liang, Y., Liu, X., Wang, B., Huang, W., Cai, Z., Ye, Y., Qiu, L., & Pan, J. (2021). MindLink-Eumpy: An
Open-Source Python Toolbox for Multimodal Emotion Recognition. Frontiers in Human Neuroscience, 15.
https://doi.org/10.3389/fnhum.2021.621493
Li, X., Song, D., Zhang, P., Zhang, Y., Hou, Y., & Hu, B. (2018). Exploring EEG Features in Cross-Subject Emotion Recognition. Frontiers in Neuroscience.
Lieskovská, E., Jakubec, M., Jarina, R., & Chmulík, M. (2021). A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism. Electronics, 10(10), 1163. https://doi.org/10.3390/electronics10101163
Raspberry Pi Ltd. (n.d.). Raspberry Pi 4 Model B specifications. Retrieved March 2, 2022, from https://www.raspberrypi.com/products/raspberry-pi-4-model-b/
Moon, S.-E., Jang, S., & Lee, J.-S. (2018). Convolutional Neural Network Approach for EEG-Based Emotion Recognition Using Brain Connectivity and its Spatial Information. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/ICASSP.2018.8461315
Nath, D., Anubhav, Singh, M., Sethia, D., Kalra, D., & Indu, S. (2020). A Comparative Study of Subject-Dependent
and Subject-Independent Strategies for EEG-Based Emotion Recognition using LSTM Network.
Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis, 142–147.
https://doi.org/10.1145/3388142.3388167
Nikolova, D., Mihaylova, P., Manolova, A., & Georgieva, P. (2019). ECG-Based Human Emotion Recognition Across Multiple Subjects. In V. Poulkov (Ed.), Future Access Enablers for Ubiquitous and Intelligent Infrastructures. Springer.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–
1178. https://doi.org/10.1037/h0077714
Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., & Ball, T. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730
Sepúlveda, A., Castillo, F., Palma, C., & Rodriguez-Fernandez, M. (2021). Emotion Recognition from ECG Signals
Using Wavelet Scattering and Machine Learning. Applied Sciences, 11(11), 4945.
https://doi.org/10.3390/app11114945
Suhaimi, N. S., Mountstephens, J., & Teo, J. (2020). EEG-Based Emotion Recognition: A State-of-the-Art Review
of Current Trends and Opportunities. Computational Intelligence and Neuroscience, 2020, 1–19.
https://doi.org/10.1155/2020/8875426
Tan, C., Ceballos, G., Kasabov, N., & Puthanmadam Subramaniyam, N. (2020). FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network. Sensors.
Tarnowski, P., Kołodziej, M., Majkowski, A., & Rak, R. J. (2018). Combined analysis of GSR and EEG signals for emotion recognition. 2018 International Interdisciplinary PhD Workshop (IIPHDW). https://doi.org/10.1109/IIPHDW.2018.8388342
Tripathi, S., Acharya, S., Sharma, R., Mittal, S., & Bhattacharya, S. (2017). Using Deep and Convolutional Neural Networks for Accurate Emotion Classification on DEAP Data. Proceedings of the AAAI Conference on Artificial Intelligence.
Villarejo, M. V., Zapirain, B. G., & Zorrilla, A. M. (2012). A Stress Sensor Based on Galvanic Skin Response (GSR) Controlled by ZigBee. Sensors, 12(5), 6075–6101.
Yin, Y., Zheng, X., Hu, B., Zhang, Y., & Cui, X. (2021). EEG emotion recognition using fusion model of graph
convolutional neural networks and LSTM. Applied Soft Computing, 100, 106954.
https://doi.org/10.1016/j.asoc.2020.106954
Zhang, J., Yin, Z., Chen, P., & Nichele, S. (2020). Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review. Information Fusion. https://doi.org/10.1016/j.inffus.2020.01.011
Zhang, Q., Chen, X., Zhan, Q., Yang, T., & Xia, S. (2017). Respiration-based emotion recognition with deep learning. Computers in Industry.
Zhang, Y., Ji, X., & Zhang, S. (2016). An approach to EEG-based emotion recognition using combined feature extraction method. Neuroscience Letters.