4D Attention-Based Neural Network for EEG Emotion Recognition
Abstract
Electroencephalograph (EEG) emotion recognition is a significant task in the brain-computer
interface field. Although many deep learning methods have been proposed recently, it is still
challenging to make full use of the information contained in different domains of EEG signals.
In this paper, we present a novel method, called four-dimensional attention-based neural
network (4D-aNN) for EEG emotion recognition. First, raw EEG signals are transformed into
4D spatial-spectral-temporal representations. Then, the proposed 4D-aNN adopts spectral and
spatial attention mechanisms to adaptively assign the weights of different brain regions and
frequency bands, and a convolutional neural network (CNN) is utilized to deal with the spectral
and spatial information of the 4D representations. Moreover, a temporal attention mechanism
is integrated into a bidirectional Long Short-Term Memory (LSTM) to explore temporal
dependencies of the 4D representations. Our model achieves state-of-the-art performance on
the SEED dataset under intra-subject splitting. The experimental results have shown the
effectiveness of the attention mechanisms in different domains for EEG emotion recognition.
Keywords: EEG, emotion recognition, attention mechanism, convolutional recurrent neural network
1 Department of Electronics, Peking University, Beijing, China
2 School of Electrical Engineering, Beijing Jiaotong University, Beijing, China
*Corresponding author: Quansheng Ren (Email: [email protected])
Introduction

Emotion plays an important role in daily life and is closely related to human behavior and cognition (Dolan 2002). As one of the most significant research topics of affective computing, emotion recognition has received increasing attention in recent years for its applications in disease detection (Bamdad et al. 2015; Figueiredo et al. 2019), human-computer interaction (Fiorini et al. 2020; Katsigiannis and Ramzan 2017), and workload estimation (Blankertz et al. 2016). In general, emotion recognition methods can be divided into two categories (Mühl et al. 2014). One is based on external emotion responses, including facial expressions and gestures (Yan et al. 2016), and the other is based on internal emotion responses, including the electroencephalograph (EEG) and electrocardiography (ECG) (Zheng et al. 2017). Neuroscientific research has shown that some major brain cortex regions are closely related to emotions, making it possible to decode emotions based on EEG (Britton et al. 2006; Lotfi and Akbarzadeh-T 2014). EEG is non-invasive, portable, and inexpensive, so it has been widely used in the field of brain-computer interfaces (BCIs) (Pfurtscheller et al. 2010). Besides, EEG signals contain rich spatial, spectral, and temporal information about emotions evoked by specific stimulation patterns. Therefore, more and more researchers have concentrated on EEG emotion recognition in recent years (Alhagry et al. 2017; Li and Lu 2009).

Traditional EEG emotion recognition methods usually extract hand-crafted features from EEG signals first and then adopt shallow models to classify the emotion features. EEG emotion features can be extracted from the time domain, frequency domain, and time-frequency domain. Jenke et al. conduct a comprehensive survey of EEG feature extraction methods using machine learning techniques on a self-recorded dataset (Jenke et al. 2014). For classifying the extracted emotion features, many researchers have adopted machine learning methods over the past few years (Kim et al. 2013). Li et al. apply a linear support vector machine (SVM) to classify emotion features extracted from the gamma frequency band (Li and Lu 2009). Duan et al. extract differential entropy (DE) features, which are superior for representing emotion states in EEG signals (Shi et al. 2013), from multichannel EEG data and combine a k-Nearest Neighbor (KNN) classifier with an SVM to classify the DE features (Duan et al. 2013). However, shallow models require considerable expert knowledge to design and select emotion features, limiting their performance on EEG emotion classification.

Deep learning methods have been demonstrated to outperform traditional machine learning methods in many fields such as computer vision, natural language processing, and biomedical signal processing (Abbass et al. 2018; Craik et al. 2019) for their ability to learn high-level features from data automatically (Krizhevsky et al. 2012). Recently, some researchers have applied deep learning to EEG emotion recognition. Zheng et al. introduce a deep belief network (DBN) to investigate the critical frequency bands and EEG signal channels for EEG emotion recognition (Zheng and Lu 2015). Yang et al. propose a hierarchical network to classify the DE features extracted from different frequency bands (Yang et al. 2018b). Song et al. use a graph convolutional neural network to classify the DE features (Song et al. 2020). Ma et al. propose a multimodal residual Long Short-Term Memory model (MMResLSTM) for emotion recognition, which shares temporal weights across the multiple modalities (Ma et al. 2019). To learn the bi-hemispheric discrepancy for EEG emotion recognition, Li et al. propose a novel bi-hemispheric discrepancy model (BiHDM) (Li et al. 2020). All those deep learning methods outperform the shallow models.

Although deep learning emotion recognition models have achieved higher accuracy than shallow models, it is still challenging to fuse the important information in different domains and to capture discriminative local patterns in EEG signals. In the past decades, many researchers have investigated the critical frequency bands and channels for EEG emotion recognition. Zheng et al. demonstrate that the β (14-31 Hz) and γ (31-51 Hz) bands are more related to emotion recognition than other bands, and their model achieves the best performance when combining all frequency bands. They also conduct experiments to select critical channels and propose the minimum pools of electrode sets for emotion recognition (Zheng and Lu 2015). To utilize the spatial information of EEG signals, Li et al. propose a 2D sparse map to maintain the information hidden in the electrode placement (Li et al. 2018). Zhong et al. introduce a regularized graph neural network (RGNN) to capture both local and global relations among different EEG channels for emotion recognition (Zhong et al. 2020). The temporal dependencies in EEG signals are also important to emotion recognition. For example, Ma et al. (2019) apply LSTMs in their models to extract temporal features for emotion recognition. Shen et al. transform the DE features of different channels into 4D structures to integrate the spectral, spatial, and temporal information simultaneously and then use a four-dimensional convolutional recurrent neural network (4D-CRNN) to recognize different emotions (Shen et al. 2020). However, the differences among brain regions and frequency bands are not fully utilized in their work. To adaptively capture discriminative patterns in EEG signals, attention mechanisms have been applied to EEG emotion recognition. For instance, Tao et al. introduce a channel-wise attention mechanism, assigning the weights of different channels adaptively, along with an extended self-attention to explore the temporal dependencies of EEG signals (Tao et al. 2020). Jia et al. propose a two-stream network with attention mechanisms to adaptively focus on important patterns (Jia et al. 2020). From the above, it can be observed that it is critical to make full use of the information contained in different domains of EEG signals and to adaptively capture the discriminative patterns for EEG emotion recognition.
4D spatial-spectral-temporal representation

For each T s segment, 2T feature slices are derived by the 0.5 s window without overlapping. To utilize the spatial information of electrodes, we organize all the $c$ channels as a 2D sparse map so that the 3D feature tensor $F_n$ is transformed into a 4D representation $X_n \in \mathbb{R}^{h \times w \times 2f \times 2T}$, where $h$ and $w$ are the height and width of the 2D sparse map, respectively. The 2D sparse map of all the $c$ channels with zero-padding is shown in Fig. 3, which preserves the topology of different electrodes. In this paper, we set $h = 19$, $w = 19$, and $f = 5$.
Fig. 2 The generation of the 4D spatial-spectral-temporal representation. For each T s EEG signal segment, we extract DE and PSD features from different channels and frequency bands with a 0.5 s window. Then, the features are transformed into a 4D representation which consists of 2T temporal slices.
Fig. 3 The 2D sparse map with zero-padding of 62 channels. The purpose of the organization is to preserve the positional relationships among
different electrodes.
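To make the construction of the 4D representation concrete, the sketch below (NumPy/SciPy) computes per-band DE features over 0.5 s windows and scatters them onto the sparse electrode grid. It is a minimal illustration, not the authors' code: only the DE part is shown (the PSD features would be stacked along the same spectral axis to give the 2f dimension), and the sampling rate, exact band limits, and the CHANNEL_GRID mapping are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Illustrative values; the paper only specifies f = 5 bands and a 19x19 grid.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 51)}      # f = 5 frequency bands
CHANNEL_GRID = {0: (0, 9), 1: (1, 5)}              # hypothetical channel -> (row, col)

def de_feature(x):
    # Differential entropy of an (approximately Gaussian) window:
    # DE = 0.5 * log(2 * pi * e * variance); small epsilon for stability.
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x) + 1e-8)

def band_filter(x, low, high, fs):
    # 4th-order Butterworth band-pass filter applied forward and backward.
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def build_4d_sample(segment, fs=200, h=19, w=19, win=0.5):
    """segment: (c, T*fs) array -> X_n of shape (h, w, f, 2T) with DE features."""
    c, n = segment.shape
    step = int(win * fs)
    slices = n // step                             # 2T temporal slices
    x4d = np.zeros((h, w, len(BANDS), slices))
    for ch in range(c):
        row, col = CHANNEL_GRID.get(ch, (0, 0))    # sparse 2D electrode position
        for bi, (lo, hi) in enumerate(BANDS.values()):
            filtered = band_filter(segment[ch], lo, hi, fs)
            for t in range(slices):
                x4d[row, col, bi, t] = de_feature(filtered[t * step:(t + 1) * step])
    return x4d
```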
Attention-based CNN
For a 4D spatial-spectral-temporal representation $X_n$, we extract the spatial and spectral information from each temporal slice $S_i \in \mathbb{R}^{h \times w \times 2f}$, $i = 1, 2, \dots, 2T$, with a CNN, explore the
discriminative local patterns in spatial and spectral domains
with a convolutional attention module, and finally get its
spatial and spectral representation. The attention module here
is similar to what Woo et al. propose (Woo et al. 2018), which was originally used to improve the representation power of CNNs.
The structure of the attention-based CNN is shown in Fig.
4. It contains four convolutional layers, four convolutional
attention modules, one max-pooling layer, and one fully-
connected layer. The four convolutional layers have 64, 128,
256, and 64 feature maps with filter sizes of 5×5, 5×5, 5×5, and 3×3, respectively. Specifically, a convolutional
attention module is used after each convolutional layer to
utilize the spatial and spectral attention mechanisms, and the
details will be given later. We only use one max-pooling layer
with a filter size of 2×2 after the last convolutional attention
module to preserve more information and enhance the
robustness of the network. Finally, outputs of the max-pooling
layer are flattened and fed to the fully-connected layer with
150 units. Thus, for each temporal slice $S_i$, we take the final output $P_i \in \mathbb{R}^{150}$ as its spatial and spectral representation.
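As a rough illustration of the backbone just described, the following PyTorch sketch stacks four convolutional layers (64, 128, 256, and 64 feature maps with 5×5, 5×5, 5×5, and 3×3 kernels), applies an attention module after each of them, and finishes with a single 2×2 max-pooling layer and a 150-unit fully-connected layer. Padding, the ReLU activations, and the 19×19×10 input size are our assumptions; the attention argument stands in for the convolutional attention module detailed in the next subsection.

```python
import torch
import torch.nn as nn

class AttentionCNN(nn.Module):
    """Sketch of the attention-based CNN branch applied to one temporal slice."""
    def __init__(self, in_ch=10, h=19, w=19, attention=None):
        super().__init__()
        chans = [in_ch, 64, 128, 256, 64]          # 2f = 10 input maps (DE + PSD)
        kernels = [5, 5, 5, 3]
        blocks = []
        for i, k in enumerate(kernels):
            blocks += [nn.Conv2d(chans[i], chans[i + 1], k, padding=k // 2),
                       nn.ReLU(inplace=True),
                       # convolutional attention module after every conv layer
                       attention(chans[i + 1]) if attention else nn.Identity()]
        self.features = nn.Sequential(*blocks, nn.MaxPool2d(2))
        self.fc = nn.Linear(64 * (h // 2) * (w // 2), 150)

    def forward(self, s):                # s: (batch, 2f, h, w) temporal slice
        v = self.features(s)
        return self.fc(v.flatten(1))     # P_i in R^150
```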
Convolutional attention module

The convolutional attention module is applied after each convolutional layer to adaptively capture important brain regions and frequency bands. The structure of the convolutional attention module is shown in Fig. 5. It consists of two sub-modules, i.e. the spectral attention module and the spatial attention module.

For each convolutional layer above, its output is a 3D feature tensor $V \in \mathbb{R}^{h_v \times w_v \times c_v}$, where $h_v$, $w_v$, and $c_v$ are the height of the 2D feature maps of $V$, the width of the 2D feature maps of $V$, and the number of the 2D feature maps of $V$, respectively. We take $V$ as the input of the convolutional attention module.

The spectral attention module is applied to identify valuable frequency bands for emotion recognition. Average pooling has been widely used to aggregate spatial information, and maximum pooling has been commonly adopted to gather distinctive features. Therefore, we shrink the spatial dimension of $V$ by a spatial-wise average pooling and a spatial-wise maximum pooling, which are defined as:

$C_{\mathrm{avg},i} = \frac{1}{h_v \times w_v}\sum_{h=1}^{h_v}\sum_{w=1}^{w_v} V_i(h, w), \quad i = 1, 2, \dots, c_v$ (4)

$C_{\mathrm{max},i} = \max(V_i), \quad i = 1, 2, \dots, c_v$ (5)

where $V_i \in \mathbb{R}^{h_v \times w_v}$ denotes the 2D feature map in the i-th channel of $V$, $C_{\mathrm{avg},i}$ represents the element in the i-th channel of the spatial average representation $C_{\mathrm{avg}} \in \mathbb{R}^{c_v}$, $\max(Z)$ returns the largest element in $Z$, and $C_{\mathrm{max},i}$ is the element in the i-th channel of the spatial maximum representation $C_{\mathrm{max}} \in \mathbb{R}^{c_v}$. Subsequently, we implement the spectral attention by two fully-connected layers, a Relu activation function, and a sigmoid activation function, which is defined as:

$A_{\mathrm{spectral,avg}} = W_2^S\,\mathrm{Relu}(W_1^S C_{\mathrm{avg}})$ (6)

$A_{\mathrm{spectral,max}} = W_2^S\,\mathrm{Relu}(W_1^S C_{\mathrm{max}})$ (7)

$A_{\mathrm{spectral}} = \mathrm{sigmoid}(A_{\mathrm{spectral,avg}} \oplus A_{\mathrm{spectral,max}})$ (8)

$\mathrm{Relu}(x) = \max(x, 0)$ (9)

$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$ (10)

where $W_1^S$ and $W_2^S$ are learnable parameters, $\oplus$ denotes the element-wise addition, and $A_{\mathrm{spectral}} \in \mathbb{R}^{1 \times 1 \times c_v}$ is the spectral attention. The elements of $A_{\mathrm{spectral}}$ represent the importance of the corresponding 2D feature maps of the spectral domain. After generating the spectral attention $A_{\mathrm{spectral}}$, the output of the spectral attention module can be defined as:

$V' = A_{\mathrm{spectral}} \otimes V$ (11)

where $V'$ denotes the refined 3D feature tensor, and $\otimes$ represents the element-wise multiplication.

The spatial attention module is applied to identify valuable brain regions for emotion recognition. Firstly, we shrink the spectral dimension of $V'$ by spectral-wise average pooling and spectral-wise maximum pooling, which are defined as:

$SPA_{\mathrm{avg},(h,w)} = \frac{1}{c_v}\sum_{c=1}^{c_v} S'_{h,w}(c), \quad h = 1, 2, \dots, h_v;\; w = 1, 2, \dots, w_v$ (12)

$SPA_{\mathrm{max},(h,w)} = \max(S'_{h,w}), \quad h = 1, 2, \dots, h_v;\; w = 1, 2, \dots, w_v$ (13)

where $S'_{h,w} \in \mathbb{R}^{c_v}$ denotes the channel in the h-th row and w-th column of $V'$, $SPA_{\mathrm{avg},(h,w)}$ represents the element in the h-th row and w-th column of the spectral average representation $SPA_{\mathrm{avg}} \in \mathbb{R}^{h_v \times w_v \times 1}$, and $SPA_{\mathrm{max},(h,w)}$ is the element in the h-th row and w-th column of the spectral maximum representation $SPA_{\mathrm{max}} \in \mathbb{R}^{h_v \times w_v \times 1}$. In the following, we implement the spatial attention with a convolutional layer and a sigmoid activation function, which is defined as:

$SPA = \mathrm{Cat}(SPA_{\mathrm{avg}}, SPA_{\mathrm{max}})$ (14)

$A_{\mathrm{spatial}} = \mathrm{Sigmoid}(\mathrm{Conv}(SPA))$ (15)

where $\mathrm{Cat}(SPA_{\mathrm{avg}}, SPA_{\mathrm{max}})$ denotes the concatenation of $SPA_{\mathrm{avg}}$ and $SPA_{\mathrm{max}}$ along the spectral dimension, $\mathrm{Conv}(SPA)$ represents the convolutional layer for $SPA$, and $A_{\mathrm{spatial}} \in \mathbb{R}^{h_v \times w_v \times 1}$ is the spatial attention. The elements of $A_{\mathrm{spatial}}$ represent the importance of the corresponding regions of the spatial domain. Subsequently, the output of the spatial attention module can be defined as:

$V'' = A_{\mathrm{spatial}} \otimes V'$ (16)

where $V'' \in \mathbb{R}^{h_v \times w_v \times c_v}$ denotes the final output 3D feature tensor of the convolutional attention module.
Fig. 5 The top block is the overall structure of the convolutional attention module, which consists of the spectral attention module and the spatial attention module. The middle block represents the generation of the spectral attention. The bottom block denotes the generation of the spatial attention.
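For illustration, here is a compact PyTorch sketch of the convolutional attention module described by Eqs. (4)-(16), in the spirit of the CBAM-style block of Woo et al. (2018): spectral (channel-wise) attention from spatially pooled descriptors, followed by spatial attention from spectrally pooled maps. The reduction ratio r and the 7×7 convolution kernel are our assumptions, not values given in the paper.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """Sketch of the spectral + spatial attention module (Eqs. 4-16)."""
    def __init__(self, channels, r=8, spatial_kernel=7):
        super().__init__()
        # Shared MLP (W1_S, W2_S) for the spectral attention, Eqs. (6)-(8)
        self.mlp = nn.Sequential(nn.Linear(channels, channels // r),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(channels // r, channels))
        # Convolutional layer for the spatial attention, Eq. (15)
        self.conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, v):                       # v: (batch, c_v, h_v, w_v)
        b, c, h, w = v.shape
        # Spectral attention: spatial-wise average/max pooling, Eqs. (4)-(5)
        c_avg = v.mean(dim=(2, 3))
        c_max = v.amax(dim=(2, 3))
        a_spectral = torch.sigmoid(self.mlp(c_avg) + self.mlp(c_max))     # Eq. (8)
        v1 = v * a_spectral.view(b, c, 1, 1)                              # Eq. (11)
        # Spatial attention: spectral-wise average/max pooling, Eqs. (12)-(14)
        spa = torch.cat([v1.mean(dim=1, keepdim=True),
                         v1.amax(dim=1, keepdim=True)], dim=1)
        a_spatial = torch.sigmoid(self.conv(spa))                         # Eq. (15)
        return v1 * a_spatial                                             # Eq. (16)
```

Such a module could be passed as the attention argument of the AttentionCNN sketch above, so that it follows every convolutional layer.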
Attention-based bidirectional LSTM

For each temporal slice $S_i \in \mathbb{R}^{h \times w \times 2f}$, $i = 1, 2, \dots, 2T$, the final output of the attention-based CNN is $P_i \in \mathbb{R}^{150}$. Since the variation between different temporal slices contains temporal information for emotion recognition, we utilize an attention-based bidirectional LSTM to explore the importance of different slices, as shown in Fig. 6.

A bidirectional LSTM connects two unidirectional LSTMs with opposite directions to the same output. Compared with a unidirectional LSTM, a bidirectional LSTM preserves information from both the past and the future, making it understand the context better. In this paper, the bidirectional LSTM comprises two unidirectional LSTMs with 36 memory cells. The unidirectional LSTM for the positive time direction, LSTM$^P$, takes the output sequence of the attention-based CNN $P^P = (P_1, P_2, \dots, P_{2T})$ as the input sequence, while the other for the negative time direction, LSTM$^N$, takes the reverse sequence $P^N = (P_{2T}, P_{2T-1}, \dots, P_1)$ as the input sequence. The outputs of the i-th node of the unidirectional LSTMs are $Y_i^P \in \mathbb{R}^{36}$ and $Y_i^N \in \mathbb{R}^{36}$, $i = 1, 2, \dots, 2T$, respectively. Then, we concatenate $Y_i^P$ and $Y_{2T+1-i}^N$ as the output of the i-th node of the bidirectional LSTM, $Y_i \in \mathbb{R}^{72}$. Different from traditional ways that only use the output of the last node of an LSTM for classification or other applications, we take the outputs of all the bidirectional LSTM nodes $Y \in \mathbb{R}^{2T \times 72}$ into consideration and explore the importance of different temporal slices by the temporal attention mechanism.

The temporal attention mechanism is implemented with two fully-connected layers, a Relu activation function, and a softmax activation function, which is defined as:

$Tem_i = W_2^T\,\mathrm{Relu}(W_1^T Y_i + b_1^T) + b_2^T$ (17)

$A_{\mathrm{temporal}} = \mathrm{softmax}(Tem)$ (18)

$\mathrm{softmax}(x) = \frac{\exp(x)}{\sum \exp(x)}$ (19)

where $W_1^T$, $W_2^T$, $b_1^T$, and $b_2^T$ are learnable parameters, $Tem_i$ represents the i-th element of $Tem \in \mathbb{R}^{2T \times 1}$, which projects $Y \in \mathbb{R}^{2T \times 72}$ to a lower dimension, and $A_{\mathrm{temporal}} \in \mathbb{R}^{2T \times 1}$ is the temporal attention. The elements of $A_{\mathrm{temporal}}$ represent the importance of the corresponding temporal slices. Subsequently, the high-level representation of the 4D sample $X_n$ can be defined as:

$L_n(e) = \sum (A_{\mathrm{temporal}} \odot Y_e), \quad e = 1, 2, \dots, 72$ (20)

where $Y_e \in \mathbb{R}^{2T \times 1}$ denotes the e-th column of $Y \in \mathbb{R}^{2T \times 72}$ and $L_n(e)$ is the e-th element of the high-level representation $L_n \in \mathbb{R}^{72}$, which integrates the spatial, spectral, and temporal information of $X_n$. Finally, $L_n$ is fed into a fully-connected layer with a softmax activation function to predict the label of the 4D sample $X_n$, which can be defined as follows:

$Pre = \mathrm{softmax}(W^p L_n + b^p)$ (21)

where $W^p$ and $b^p$ are learnable parameters and $Pre \in \mathbb{R}^C$ denotes the probability of $X_n$ belonging to all the $C$ classes. Specifically, the class with the largest probability is the predicted label of 4D-aNN.
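To make the temporal branch concrete, below is a rough PyTorch sketch of the attention-based bidirectional LSTM with temporal attention and the softmax classifier (Eqs. 17-21). The hidden size of the attention MLP and the three-class output (the three SEED emotion categories) are our assumptions; the module consumes the sequence of slice representations P_i produced by the CNN sketch above.

```python
import torch
import torch.nn as nn

class TemporalAttentionLSTM(nn.Module):
    """Sketch of the attention-based bidirectional LSTM head (Eqs. 17-21)."""
    def __init__(self, in_dim=150, hidden=36, att_hidden=16, n_classes=3):
        super().__init__()
        # BiLSTM with 36 memory cells per direction -> 72-dim node outputs
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        # Temporal attention MLP (W1_T, W2_T and biases), Eq. (17)
        self.att = nn.Sequential(nn.Linear(2 * hidden, att_hidden),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(att_hidden, 1))
        self.classifier = nn.Linear(2 * hidden, n_classes)   # W_p, b_p, Eq. (21)

    def forward(self, p_seq):                  # p_seq: (batch, 2T, 150)
        y, _ = self.lstm(p_seq)                # y: (batch, 2T, 72)
        tem = self.att(y)                      # (batch, 2T, 1), Eq. (17)
        a_temporal = torch.softmax(tem, dim=1)                # Eq. (18)
        l_n = (a_temporal * y).sum(dim=1)      # weighted sum over slices, Eq. (20)
        return self.classifier(l_n)            # logits; softmax applied in the loss

# Composing the two sketches for a batch of 4D samples x of shape
# (batch, 2T, 2f, h, w): run the CNN on every slice, then the LSTM head.
# p_seq = cnn(x.flatten(0, 1)).view(x.size(0), x.size(1), -1)   # (batch, 2T, 150)
# logits = lstm_head(p_seq)
```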
Experiment

In this section, we first introduce a widely used dataset. Then, the experiment settings are described. Finally, the results on the dataset are reported and discussed.
SEED Dataset
Settings
Fig. 7 The performance of 4D-aNN on each subject. In the SEED dataset, 3 experiments are conducted for each subject. We evaluate the
performance of each experiment and also present the average classification accuracy for each subject.
To verify the importance of the attention mechanisms in our model, we conduct an additional experiment for ablation studies on the SEED dataset. The experiment ablates the spatial, spectral, and temporal attention mechanisms. We evaluate the performance of 4D-aNN when the spatial, spectral, temporal, and all the attention mechanisms are ablated, respectively. As shown in Fig. 8, when one of the attention mechanisms is ablated, the classification accuracy decreases. 4D-aNN without the spectral attention mechanism decreases by 0.63%, 4D-aNN without the spatial attention mechanism decreases by 0.47%, and 4D-aNN without the temporal attention mechanism decreases by 1.19%. Specifically, 4D-aNN without all the attention mechanisms decreases by 2.17%, which is the worst among the models used for comparison. In conclusion, the results indicate that the attention mechanisms contribute to EEG emotion recognition through their ability to capture the discriminative local patterns in the spatial, spectral, and temporal domains. The critical brain regions could vary with different subjects, time, and emotions, so the attention mechanisms that enable 4D-aNN to adaptively capture discriminative patterns make sense for EEG emotion recognition.
Discussion

We conduct several experiments to investigate the use of 4D-aNN, which fuses the spatial-spectral-temporal information, and the effectiveness of the attention mechanisms on different domains for EEG emotion classification. In this section, we discuss three noteworthy points.

First, to deal with the spatial-spectral information, we apply an attention-based CNN which consists of a CNN network, a spectral attention module, and a spatial attention module. The CNN network extracts the spatial-spectral representation from the inputs first. Then, the spectral attention mechanism is applied to each spectral feature to explore the importance of different frequency bands and features. Besides, the spatial attention mechanism is applied to each 2D feature map to adaptively capture the critical brain regions. The critical brain regions and frequency bands could vary with different individuals, emotions, and time, so the ability of the attention modules to capture discriminative patterns improves the performance of 4D-aNN.

Second, to explore the temporal dependencies in 4D spatial-spectral-temporal representations, we utilize an attention-based bidirectional LSTM. The bidirectional LSTM extracts high-level representations from the outputs of the attention-based CNN. Different from traditional ways that only use the output of the last node of an LSTM for classification or other applications, we consider the outputs of all the nodes with the temporal attention mechanism. The temporal attention mechanism adaptively assigns the weights of different temporal slices so that the dynamic content of emotions in 4D representations can be captured better.

Third, to address the importance of the attention mechanisms, we conduct ablation studies on different attention modules. 4D-aNN without the spatial, spectral, and temporal attention mechanism decreases by 0.47%, 0.63%, and 1.19% in classification accuracy, respectively. In particular, 4D-aNN without all the attention mechanisms decreases by 2.17%, which is the worst among the models in comparison. The experimental results demonstrate the effectiveness of the attention mechanisms in adaptively capturing discriminative patterns.

Conclusion

In this paper, we propose the 4D-aNN model for EEG emotion recognition. The 4D-aNN takes 4D spatial-spectral-temporal representations containing the spatial, spectral, and temporal information of EEG signals as inputs. We integrate the attention mechanisms into the CNN module and the bidirectional LSTM module. The CNN module deals with the spatial and spectral information of EEG signals, while the spatial and spectral attention mechanisms capture critical brain regions and frequency bands adaptively. The bidirectional LSTM module extracts temporal dependencies from the outputs of the CNN module, while the temporal attention mechanism explores the importance of different temporal slices. The experiments on the SEED dataset demonstrate better performance than all baselines. In particular, the ablation studies on different attention modules show the effectiveness of the attention mechanisms in our model for EEG emotion recognition.

References

Abbass SKGHA, Tan KC, Al-Mamun A, Thakor N, Bezerianos A, Li J (2018) Spatio-spectral representation learning for electroencephalographic gait-pattern classification. IEEE T Neur Sys Reh 26:1858-1867. doi:10.1109/TNSRE.2018.2864119
Alhagry S, Fahmy AA, El-Khoribi RA (2017) Emotion recognition based on EEG using LSTM recurrent neural network. International Journal of Advanced Computer Science and Applications 8:335-358. doi:10.14569/IJACSA.2017.081046
Bamdad M, Zarshenas H, Auais MA (2015) Application of BCI systems in neurorehabilitation: a scoping review. Disability and Rehabilitation: Assistive Technology 10:355-364. doi:10.3109/17483107.2014.961569
Blankertz B et al. (2016) The Berlin brain-computer interface: progress beyond communication and control. Front Neurosci-Switz 10:530. doi:10.3389/fnins.2016.00530
Britton JC, Phan KL, Taylor SF, Welsh RC, Berridge KC, Liberzon I (2006) Neural correlates of social and nonsocial emotions: an fMRI study. Neuroimage 31:397-409. doi:10.1016/j.neuroimage.2005.11.027
Chattopadhay A, Sarkar A, Howlader P, Balasubramanian VN (2018) Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. Paper presented at the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12-15 March 2018
Craik A, He Y, Contreras-Vidal JL (2019) Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng 16:031001. doi:10.1088/1741-2552/ab0ab5
Dolan RJ (2002) Emotion, cognition, and behavior. Science 298:1191-1194. doi:10.1126/science.1076358
Duan R-N, Zhu J-Y, Lu B-L (2013) Differential entropy feature for EEG-based emotion classification. Paper presented at the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6-8 Nov. 2013
Figueiredo GR, Ripka WL, Romaneli EFR, Ulbricht L (2019) Attentional bias for emotional faces in depressed and nondepressed individuals: an eye-tracking study. Paper presented at the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23-27 July 2019
Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F (2020) Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowledge-Based Systems 190. doi:10.1016/j.knosys.2019.105217
Jenke R, Peer A, Buss M (2014) Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing 5:327-339. doi:10.1109/TAFFC.2014.2339834
Jia Z, Lin Y, Cai X, Chen H, Gou H, Wang J (2020) SST-EmotionNet: spatial-spectral-temporal based attention 3D dense network for EEG emotion recognition. In: Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 2020. Association for Computing Machinery, pp 2909-2917. doi:10.1145/3394171.3413724
Katsigiannis S, Ramzan N (2017) DREAMER: a database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J Biomed Health 22:98-107. doi:10.1109/JBHI.2017.2688239
Kim M-K, Kim M, Oh E, Kim S-P (2013) A review on the computational methods for emotional state estimation from the human EEG. Comput Math Method M 2013. doi:10.1155/2013/573734
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012. Curran Associates, Inc., pp 1097-1105
Li J, Zhang Z, He H (2018) Hierarchical convolutional neural networks for EEG-based emotion recognition. Cogn Comput 10:368-380. doi:10.1007/s12559-017-9533-x
Li M, Lu B-L (2009) Emotion classification based on gamma-band EEG. Paper presented at the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3-6 Sept. 2009
Li Y et al. (2020) A novel bi-hemispheric discrepancy model for EEG emotion recognition. IEEE T Cogn Dev Syst:1-1. doi:10.1109/TCDS.2020.2999337
Lotfi E, Akbarzadeh-T M-R (2014) Practical emotional neural networks. Neural Networks 59:61-72. doi:10.1016/j.neunet.2014.06.012
Ma J, Tang H, Zheng W-L, Lu B-L (2019) Emotion recognition using multimodal residual LSTM network. In: Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 2019. Association for Computing Machinery, New York, NY, USA, pp 176-183. doi:10.1145/3343031.3350871
Mühl C, Nijholt BAA, Chanel G (2014) A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges. Brain-Computer Interfaces 1:66-84. doi:10.1080/2326263X.2014.912881
Pfurtscheller G et al. (2010) The hybrid BCI. Front Neurosci-Switz 4:3. doi:10.3389/fnpro.2010.00003
Shen F, Dai G, Lin G, Zhang J, Kong W, Zeng H (2020) EEG-based emotion recognition using 4D convolutional recurrent neural network. Cogn Neurodynamics 14:815-828. doi:10.1007/s11571-020-09634-1
Shi L-C, Jiao Y-Y, Lu B-L (2013) Differential entropy feature for EEG-based vigilance estimation. Paper presented at the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3-7 July 2013
Song T, Zheng W, Song P, Cui Z (2020) EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing 11:532-541. doi:10.1109/TAFFC.2018.2817622
Tao W, Li C, Song R, Cheng J, Liu Y, Wan F, Chen X (2020) EEG-based emotion recognition via channel-wise attention and self attention. IEEE Transactions on Affective Computing:1-1. doi:10.1109/TAFFC.2020.3025777
Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Computer Vision - ECCV 2018. Springer International Publishing, Cham. doi:10.1007/978-3-030-01234-2_1
Yan J, Zheng W, Xu Q, Lu G, Li H, Wang B (2016) Sparse kernel reduced-rank regression for bimodal emotion recognition from facial expression and speech. IEEE Transactions on Multimedia 18:1319-1329. doi:10.1109/TMM.2016.2557721
Yang Y, Wu Q, Fu Y, Chen X (2018a) Continuous convolutional neural network with 3D input for EEG-based emotion recognition. In: Cheng L, Leung ACS, Ozawa S (eds) Neural Information Processing, 2018. Springer International Publishing, pp 433-433. doi:10.1007/978-3-030-04239-4_39
Yang Y, Wu QMJ, Zheng W-L, Lu B-L (2018b) EEG-based emotion recognition using hierarchical network with subnetwork nodes. IEEE T Cogn Dev Syst 10:408-419. doi:10.1109/TCDS.2017.2685338
Zheng W-L, Lu B-L (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development 7:162-175. doi:10.1109/TAMD.2015.2431497
Zheng W-L, Zhu J-Y, Lu B-L (2017) Identifying stable patterns over time for emotion recognition from EEG. IEEE Transactions on Affective Computing 10:417-429. doi:10.1109/TAFFC.2017.2712143
Zhong P, Wang D, Miao C (2020) EEG-based emotion recognition using regularized graph neural networks. IEEE Transactions on Affective Computing:1-1. doi:10.1109/TAFFC.2020.2994159