SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis
ABSTRACT
Deep learning has unlocked new paths towards the emulation of the peculiarly human capability of learning from examples. While this kind of bottom-up learning works well for tasks such as image classification or object detection, it is not as effective when it comes to natural language processing. Communication is much more than learning a sequence of letters and words: it requires a basic understanding of the world and social norms, cultural awareness, commonsense knowledge, etc.; all things that we mostly learn in a top-down manner. In this work, we integrate top-down and bottom-up learning via an ensemble of symbolic and subsymbolic AI tools, which we apply to the interesting problem of polarity detection from text. In particular, we integrate logical reasoning within deep learning architectures to build a new version of SenticNet, a commonsense knowledge base for sentiment analysis.

KEYWORDS
Knowledge representation and reasoning; Sentiment analysis

ACM Reference format:
Erik Cambria, Yang Li, Frank Z. Xing, Soujanya Poria, and Kenneth Kwok. 2020. SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19–23, 2020 (CIKM ’20), 10 pages.
https://doi.org/10.1145/3340531.3412003

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CIKM ’20, October 19–23, 2020, Virtual Event, Ireland
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6859-9/20/10...$15.00
https://doi.org/10.1145/3340531.3412003

1 INTRODUCTION
The AI gold rush has become increasingly intense for the huge potential AI offers for human development and growth. Most of what is considered AI today is actually subsymbolic AI, i.e., machine learning: an extremely powerful tool for exploring large amounts of data and, for instance, making predictions, suggestions, and categorizations based on them. All such classifications are made by transforming real items that need to be classified into numbers or features in order to later calculate distances between them. While this is good for comparing such items and clustering them accordingly, it does not tell us much about the items themselves. Thanks to machine learning, we may find out that apples are similar to oranges but this information is only useful to cluster oranges and apples together: it does not actually tell us what an apple is, what it is usually used for, where it is usually found, how it tastes, etc. Throughout the span of our lives, we learn a lot of things by example but many others are learnt via our own personal (kinaesthetic) experience of the world and taught to us by our parents, mentors, and friends. If we want to replicate human intelligence in a machine, we cannot avoid implementing this kind of top-down learning.

Integrating logical reasoning within deep learning architectures has been a major goal of modern AI systems [19, 61, 65]. Most such systems, however, merely transform symbolic logic into a high-dimensional vector space using neural networks. In this work, instead, we do the opposite: we employ subsymbolic AI for recognizing meaningful patterns in natural language text and, hence, represent these in a knowledge base, termed SenticNet 6, using symbolic logic. In particular, we use deep learning to generalize words and multiword expressions into primitives, which are later defined in terms of superprimitives. For example, expressions like shop_for_iphone11, purchase_samsung_galaxy_S20 or buy_huawei_mate are all generalized as BUY(PHONE) and later reduced to smaller units thanks to definitions such as BUY(x) = GET(x) ∧ GIVE($), where GET(x), for example, is defined in terms of the superprimitive HAVE as !HAVE(x) → HAVE(x).

While this does not solve the symbol grounding problem, it helps reduce it to a great degree and, hence, improves the accuracy of natural language processing (NLP) tasks for which statistical analysis alone is usually not enough, e.g., narrative understanding, dialogue systems and sentiment analysis. In this work, we focus on sentiment analysis, where this ensemble application of symbolic and subsymbolic AI is superior to either symbolic representations or subsymbolic approaches alone.
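The generalization and reduction steps mentioned in the introduction can be illustrated with a minimal sketch. The lookup table below is hypothetical and hard-coded for illustration only (in the paper the generalization is learnt by deep learning); the two definitions are the ones quoted in the text, and the function names `generalize` and `define` are assumptions of this sketch rather than part of SenticNet.

```python
# Hypothetical, hand-written stand-in for the learnt generalization step:
# each multiword expression maps to a (primitive, argument) pair.
GENERALIZATION = {
    "shop_for_iphone11": ("BUY", "PHONE"),
    "purchase_samsung_galaxy_S20": ("BUY", "PHONE"),
    "buy_huawei_mate": ("BUY", "PHONE"),
}

# Definitions quoted from the text, stored as templates over the argument x.
DEFINITIONS = {
    "BUY": "GET({x}) ∧ GIVE($)",
    "GET": "!HAVE({x}) → HAVE({x})",
}


def generalize(expression):
    """Return the primitive form of a multiword expression, e.g. BUY(PHONE)."""
    primitive, argument = GENERALIZATION[expression]
    return f"{primitive}({argument})"


def define(primitive, argument):
    """Instantiate the symbolic definition of a primitive for one argument."""
    return DEFINITIONS[primitive].format(x=argument)


if __name__ == "__main__":
    for expression in GENERALIZATION:
        print(expression, "->", generalize(expression))
    print("BUY(PHONE) =", define("BUY", "PHONE"))
    print("GET(PHONE) :", define("GET", "PHONE"))
```

The point of the sketch is only the two-level structure: many surface expressions collapse into one primitive, and each primitive unfolds into a definition over a smaller vocabulary of superprimitives such as HAVE.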
Full Paper Track CIKM '20, October 19–23, 2020, Virtual Event, Ireland
3.1 biLSTM
To extract the contextual features from these subsentences, we use the biLSTM model on L and C independently. Given that we represent the word vector for the t-th word in a sentence as $x_t$, the LSTM transformation can be performed as specified in the LSTM procedure of Algorithm 1.
Figure 2: Overall framework for context and word embedding generation.
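The LSTM transformation, as written in the LSTM procedure of Algorithm 1, can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the weight shapes follow Table 1 ($d \times (d + d_w)$), while the random initialization, the helper names (`init_params`, `run_lstm`, `bilstm`) and the explicit passing of the cell state are assumptions of the sketch. The bidirectional wrapper uses the standard construction (a forward and a backward pass concatenated per position), which yields the $2d$-dimensional hidden states used later by the attention module.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(h_prev, c_prev, x_t, params):
    """One LSTM step following lines 31-36 of Algorithm 1."""
    X = np.concatenate([h_prev, x_t])                  # X = [h_{t-1}; x_t]
    f_t = sigmoid(params["W_f"] @ X + params["b_f"])   # forget gate
    i_t = sigmoid(params["W_i"] @ X + params["b_i"])   # input gate
    o_t = sigmoid(params["W_o"] @ X + params["b_o"])   # output gate
    c_t = f_t * c_prev + i_t * np.tanh(params["W_c"] @ X + params["b_c"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def init_params(d, d_w, seed=0):
    """Random parameters with the shapes of Table 1 (illustrative values)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "fioc":
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(d, d + d_w))
        params[f"b_{gate}"] = np.zeros(d)
    return params


def run_lstm(word_vectors, params, d):
    """Run the LSTM over a sequence; returns hidden states of shape (d, l)."""
    h, c = np.zeros(d), np.zeros(d)
    states = []
    for x_t in word_vectors:
        h, c = lstm_step(h, c, x_t, params)
        states.append(h)
    return np.stack(states, axis=1)


def bilstm(word_vectors, fwd_params, bwd_params, d):
    """Concatenate forward and backward states; result has shape (2d, l)."""
    forward = run_lstm(word_vectors, fwd_params, d)
    backward = run_lstm(word_vectors[::-1], bwd_params, d)[:, ::-1]
    return np.concatenate([forward, backward], axis=0)
```

Calling `bilstm` on an array of word vectors of shape `(l, d_w)` returns a matrix with one $2d$-dimensional column per word.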
It generates a vector which provides weights corresponding to the relevance of the underlying context across the sentence. Below, we describe the attention formulation applied on the left context sentence. $H_{LC}$ can be represented as a sequence of $[h_t]$ where $t \in [1, l]$. Let $A$ denote the attention network for this sentence. The attention mechanism of $A$ produces an attention weight vector $\alpha$ and a weighted hidden representation $r$ as follows:

$$P = \tanh(W_h \cdot H_{LC}) \tag{9}$$
$$\alpha = \mathrm{softmax}(w^T \cdot P) \tag{10}$$
$$r = H_{LC} \cdot \alpha^T \tag{11}$$

where $P \in \mathbb{R}^{d \times l}$, $\alpha \in \mathbb{R}^{l}$, $r \in \mathbb{R}^{2d}$. And, $W_h \in \mathbb{R}^{d \times 2d}$, $w \in \mathbb{R}^{d}$ are projection parameters (Table 1). Finally, the sentence representation is generated as:

$$r^* = \tanh(W_p \cdot r) \tag{12}$$

Here, $r^* \in \mathbb{R}^{2d}$ and $W_p \in \mathbb{R}^{d \times 2d}$ is the weight to be learnt while training. This generates the overall sentential context representation for the left context sentence: $E_{LC} = r^*$. Similarly, attention is also applied to the right context sentence to get the right context representation $E_{RC}$. To get a comprehensive feature representation of the context for a particular concept, we fuse the two sentential context representations, $E_{LC}$ and $E_{RC}$, using an NTN [52]. It involves a neural tensor $T \in \mathbb{R}^{2d \times 2d \times k}$ which performs a bilinear fusion across $k$ dimensions. Along with a single layer neural model, the overall fusion can be shown as:

$$v = \tanh\left(E_{LC}^T \cdot T^{[1:k]} \cdot E_{RC} + W \cdot \begin{bmatrix} E_{LC} \\ E_{RC} \end{bmatrix} + b\right) \tag{13}$$

Here, the tensor product $E_{LC}^T \cdot T^{[1:k]} \cdot E_{RC}$ is calculated to get a vector $v^* \in \mathbb{R}^{k}$ such that each entry in the vector $v^*$ is calculated as $v^*_i = E_{LC}^T \cdot T^{[i]} \cdot E_{RC}$, where $T^{[i]}$ is the $i$-th slice of the tensor $T$. $W \in \mathbb{R}^{k \times 4d}$ and $b \in \mathbb{R}^{k}$ are the parameters (Table 1). The tensor fusion network thus finally provides the sentential context representation $v$.

3.4 Negative Sampling
To learn the appropriate representation of sentential context and target word, we use word2vec’s negative sampling objective function. Here, a positive pair is described as a valid context and word pair and the negative pairs are created by sampling random words from a unigram distribution. Formally, our aim is to maximize the following objective function:

$$Obj = \sum_{c,v} \left( \log(\sigma(c \cdot v)) + \sum_{i=1}^{z} \log(\sigma(-c_i \cdot v)) \right) \tag{14}$$

Here, the overall objective is calculated across all the valid word and context pairs. We choose $z$ invalid word-context pairs where each $-c_i$ refers to an invalid word with respect to a context.

Algorithm 1 Context and target word embedding generation
procedure TrainEmbeddings
    Given sentence S = [w_1, w_2, ..., w_n] s.t. w_i is target word.
    L ← E([w_1, w_2, ..., w_{i−1}])        ▷ E(): word2vec embedding
    R ← E([w_{i+1}, w_{i+2}, ..., w_n])
    C ← E(w_i)
    c ← TargetWordEmbedding(C)
    v ← ContextEmbedding(L, R)
    NegativeSampling(c, v)
procedure TargetWordEmbedding(C)
    C* = tanh(W_a · C + b_a)
    c = tanh(W_b · C* + b_b)
    return c
procedure ContextEmbedding(L, R)
    H_LC ← ϕ
    h_{t−1} ← 0
    for t:[1, i − 1] do
        h_t ← LSTM(h_{t−1}, L_t)
        H_LC ← H_LC ∪ h_t
        h_{t−1} ← h_t
    H_RC ← ϕ
    h_{t−1} ← 0
    for t:[i + 1, n] do
        h_t ← LSTM(h_{t−1}, R_t)
        H_RC ← H_RC ∪ h_t
        h_{t−1} ← h_t
    E_LC ← Attention(H_LC)
    E_RC ← Attention(H_RC)
    v ← NTN(E_LC, E_RC)
    return v
procedure LSTM(h_{t−1}, x_t)
    X = [h_{t−1}; x_t]
    f_t = σ(W_f · X + b_f)
    i_t = σ(W_i · X + b_i)
    o_t = σ(W_o · X + b_o)
    c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c · X + b_c)
    h_t = o_t ⊙ tanh(c_t)
    return h_t
procedure Attention(H)
    P = tanh(W_h · H)
    α = softmax(w^T · P)
    r = H · α^T
    return r
procedure NTN(E_LC, E_RC)
    v = tanh(E_LC^T · T^{[1:k]} · E_RC + W · [E_LC; E_RC] + b)
    return v
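The Attention and NTN procedures of Algorithm 1 and the negative sampling objective of Eq. 14 can be sketched in NumPy as below. This is a sketch under stated assumptions, not the released code: function and argument names are illustrative, and the projection of Eq. 12 is implemented with a $2d \times 2d$ matrix (an assumption made here so that $r^*$ keeps the stated dimension $2d$, whereas Table 1 lists $W_p \in \mathbb{R}^{d \times 2d}$). The objective is evaluated for a single valid pair $(c, v)$; summing it over all valid pairs gives Eq. 14.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def attention(H, W_h, w):
    """Eqs. 9-11. H: (2d, l), W_h: (d, 2d), w: (d,). Returns r of shape (2d,)."""
    P = np.tanh(W_h @ H)      # (d, l)
    alpha = softmax(w @ P)    # (l,) attention weights over positions
    return H @ alpha          # weighted sum of the hidden states


def project(r, W_p):
    """Eq. 12. W_p is assumed to be (2d, 2d) here (see the note above)."""
    return np.tanh(W_p @ r)


def ntn(E_lc, E_rc, T, W, b):
    """Eq. 13. T: (2d, 2d, k), W: (k, 4d), b: (k,). Returns v of shape (k,)."""
    bilinear = np.einsum("i,ijk,j->k", E_lc, T, E_rc)   # one value per slice T[i]
    linear = W @ np.concatenate([E_lc, E_rc])
    return np.tanh(bilinear + linear + b)


def log_sigmoid(s):
    # Numerically stable log(sigma(s)) = -log(1 + exp(-s)).
    return -np.logaddexp(0.0, -s)


def neg_sampling_objective(c, v, negatives):
    """Inner term of Eq. 14 for one valid pair (c, v) and z invalid words c_i."""
    positive = log_sigmoid(c @ v)
    negative = sum(log_sigmoid(-(c_i @ v)) for c_i in negatives)
    return positive + negative
```

Maximizing this quantity pushes the target word embedding `c` towards its context embedding `v` and away from the sampled invalid words.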
3.5 Context embedding using BERT
We leverage the BERT architecture [16] to obtain the sentential context embedding of a word. BERT utilizes a transformer network to pre-train a language model for extracting contextual word embeddings. Unlike ELMo and OpenAI-GPT, BERT uses different pre-training tasks for language modeling. In one of the tasks, BERT randomly masks a percentage of words in the sentences and only predicts those masked words. In the other task, BERT predicts the next sentence given a sentence. This task, in particular, tries to model the relationship between two sentences, which is supposedly not captured by traditional bidirectional language models.
Consequently, this particular pre-training scheme helps BERT to outperform state-of-the-art techniques by a large margin on key NLP tasks such as question answering and natural language inference, where understanding the relation between two sentences is very important. In SenticNet 6, we utilize BERT as follows:
• First, we fine-tune the pre-trained BERT network on the ukWaC corpus [4].
• Next, we calculate the embedding for the context v. For this, we first remove the target word c, i.e., either the verb or noun from the sentence. The remainder of the sentence is then fed to the BERT architecture which returns the context embedding.
• Finally, we adopt a new similarity measure in order to find the replacement of the word. For this, we need the embedding of the target word, which we obtain by simply feeding the word to the pre-trained BERT network. Given a target word c and its sentential context v, we calculate the cosine distance of all the other words in the embedding hyperspace with both c and v. If b is a candidate word, the distance is then calculated as:
$$dist(b, (c, v)) = \cos(b, c) + \cos(b, v) + \cos(BERT(v, b), BERT(v, c)) \tag{15}$$
where $BERT(v, b)$ is the BERT-produced embedding of the sentence formed by replacing word c with the candidate word b in the sentence. Similarly, $BERT(v, c)$ is the embedding of the original sentence which consists of word c.
A stricter rule to ensure high similarity between the target and candidate word is to apply multiplication instead of addition:
$$dist(b, (c, v)) = \cos(b, c) \cdot \cos(b, v) \cdot \cos(BERT(v, b), BERT(v, c)) \tag{16}$$
We rank the candidates as per their cosine distance and generate the list of possible lexical substitutes.

First, we extract all the concepts of the form verb-noun and adjective-noun present in ConceptNet 5 [54]. An example sentence for each of these concepts is also extracted. Then, we take one word from the concept (either a verb/adjective or a noun) to be the target word and the remaining sentence serves as the context.

The goal now is to find a substitute for the target word having the same part of speech in the given context. To achieve this, we obtain the context and target word embeddings (v and c) from the joint hyperspace of the network. For all possible substitute words b, we then calculate the cosine similarity using equation 16 and rank them using this metric for possible substitutes. This substitution leads to new verb-noun or adjective-noun pairs which bear the same conceptual meaning in the given context. The context2vec code for primitive discovery is available on our GitHub [1].
[1] http://github.com/senticnet/context2vec

Table 1: Summary of notations used in Algorithm 1. Note: $d_w$ is the word embedding size. All the hyperparameters were set using random search [5].
Parameters
    Weights: $W_i, W_f, W_o, W_c \in \mathbb{R}^{d \times (d + d_w)}$; $W_p \in \mathbb{R}^{d \times 2d}$; $W_b \in \mathbb{R}^{k \times d}$; $W_a \in \mathbb{R}^{d \times d_w}$; $T \in \mathbb{R}^{2d \times 2d \times k}$; $W_h \in \mathbb{R}^{d \times 2d}$; $W \in \mathbb{R}^{k \times 4d}$; $w \in \mathbb{R}^{d}$
    Bias: $b_i, b_f, b_o \in \mathbb{R}^{d}$; $b_a \in \mathbb{R}^{d}$; $b \in \mathbb{R}^{k}$; $b_b \in \mathbb{R}^{k}$
Hyperparameters
    $d$: dimension of LSTM hidden unit
    $k$: NTN tensor dimension
    $z$: negative sampling invalid pairs

4 PRIMITIVE SPECIFICATION
The deep learning framework described in the previous section allows for the automatic discovery of concept clusters that are semantically related and share a similar lexical function. The label of each such cluster is a primitive and it is assigned by selecting the most typical of the terms. In the verb cluster {increase, enlarge, intensify, grow, expand, strengthen, extend, widen, build_up, accumulate...}, for example, the term with the highest occurrence frequency in text (the one people most commonly use in conversation) is increase.

Hence, the cluster is named after it, i.e., labeled by the primitive INCREASE and later defined either via symbolic logic, e.g., INCREASE(x) = x + a(x), where a(x) is an undefined quantity related to x, or in terms of polar transitions, e.g., INCREASE: LESS → MORE (Fig. 3). Symbolic logic is usually used to define superprimitives or neutral primitives. Polar transitions are used to define polarity-bearing verb primitives in terms of polar state change (from positive to negative and vice versa) via a ying-yang kind of clustering [64].

Figure 3: An example of primitive specification.

In both cases, the goal is to define the connotative information associated with primitives and, hence, associate a polarity to them (explained in the next section). Such a polarity is later transferred to words and multiword expressions via a four-layered knowledge representation (Fig. 4).
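The two steps described so far, ranking candidate substitutes with the multiplicative measure of Eq. 16 and naming a discovered cluster after its most frequent term, can be sketched as follows. Everything concrete here is an assumption made for illustration: the embeddings are passed in as plain NumPy vectors (in the paper they come from the biLSTM or BERT models), the function names are hypothetical, and the frequency counts used in the usage example are invented.

```python
import numpy as np


def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def dist_mul(b, c, v, sent_b, sent_c):
    """Eq. 16: product of the three cosine terms.

    b, c   : embeddings of the candidate and the target word
    v      : embedding of the sentential context
    sent_b : embedding of the sentence with c replaced by b, i.e. BERT(v, b)
    sent_c : embedding of the original sentence, i.e. BERT(v, c)
    """
    return cos(b, c) * cos(b, v) * cos(sent_b, sent_c)


def rank_substitutes(c, v, sent_c, candidates):
    """Rank candidate words by Eq. 16, best first.

    candidates maps each word to a pair (b, sent_b) of precomputed embeddings.
    """
    scores = {
        word: dist_mul(b, c, v, sent_b, sent_c)
        for word, (b, sent_b) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def label_cluster(cluster, frequency):
    """Name a cluster after its most frequent term, written as a primitive."""
    return max(cluster, key=lambda term: frequency.get(term, 0)).upper()
```

For example, `label_cluster(["increase", "enlarge", "grow"], {"increase": 30, "enlarge": 5, "grow": 20})` selects the term with the largest (made-up) count and returns it in upper case, mirroring how the verb cluster in the text is labeled INCREASE.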
In this four-layered representation, in particular, named entities are linked to commonsense concepts by IsA relationships from IsaCore [11], a large subsumption knowledge base mined from 1.68 billion webpages. Commonsense concepts are later generalized into primitives by means of deep learning (as explained in the previous section). Primitives are finally deconstructed into superprimitives, basic states and actions that are defined by means of first order logic, e.g., HAVE(subj,obj) = ∃ obj @ subj.

4.1 Key Polar State Specification
In order to automatically discover words and multiword expressions that are both semantically and affectively related to key polar states such as EASY versus HARD or STABLE versus UNSTABLE, we use AffectiveSpace [7], a vector space of affective commonsense knowledge built by means of semantic multidimensional scaling. By exploiting the information sharing property of random projections, AffectiveSpace maps a dataset of high-dimensional semantic and affective features into a much lower-dimensional subspace in which concepts conveying the same polarity and similar meaning fall near each other. In past works, this vector space model has been used to classify concepts as positive or negative by calculating the dot product between new concepts and prototype concepts.

In this case, rather than a distance, we need a discrete path between a key polar state and its opposite (e.g., CLEAN and DIRTY) throughout the vector space manifolds. While the shortest path (in a k-means sense) between two polar states in AffectiveSpace risks including many irrelevant concepts, a path that follows the topological structure of the vector space from one state to its antithetic partner is more likely to contain concepts that are both semantically and affectively relevant. To calculate such a path, we use regularized k-means (RKM) [20], a novel algorithm that finds a morphism between a given point set and two reference points in a vector space $X \in \mathbb{R}^{d}$, where $d \in \mathbb{N}^{+}$, by exploiting the information provided by the available data.

Such a morphism is described as a discrete path, composed of a set of prototypes selected based on the data manifolds. Consider a set of points $X = \{x_j \in \mathbb{R}^{d}\}$, $j = 1, ..., N$ and two points $w_0$ and $w_{N_c+1} \in \mathbb{R}^{d}$. The path connecting the two points $w_0$ and $w_{N_c+1}$ is described as an ordered set $W$ of $N_c$ prototypes $w \in \mathbb{R}^{d}$. Such a path is found by minimizing the standard k-means cost function with the addition of a regularization term that considers the distance between ordered centroids.

The cost function can be formalized as:

$$\min_{W} \; \frac{\gamma}{2} \sum_{i=1}^{N} \sum_{j=1}^{N_c} \|x_i - w_j\|^2 \, \delta(u_i, j) + \frac{\lambda}{2} \sum_{i=0}^{N_c} \|w_{i+1} - w_i\|^2 \tag{17}$$

where $u_i$ is the datum cluster.

The novel cost function is composed of two terms weighted by the hyper-parameters $\gamma$ and $\lambda$:

$$\Omega(W, u, X, \gamma, \lambda) = \gamma \, \Omega_X(W, u, X) + \lambda \, \Omega_W(W). \tag{18}$$

The first term coincides with the standard k-means cost function while the second one induces a path topology based on the centroids ordering and controls the level of smoothness of the path.
[...] the center of the space, however, there are many low-intensity (al[...]
[...]sign the first 20 concepts of the path (e.g., cleaned, spotless, and [...] filthy, stained, and soiled) to [...]. We also use this morphism to assign emotion labels to key polar states, based on the average distance (dot product) between the concepts of the path (the first 20 and the last 20, respectively) and the [...]
[Figure: emotion diagram; legible labels include disgust, fear, sadness, anger, loathing, terror, grief, and rage.]

5 EXPERIMENTS
In this section, we evaluate the performance of both the subsymbolic and symbolic segments of SenticNet 6 (the former being the deep learning framework for primitive discovery, the latter being the logic framework for primitive specification) on 9 different datasets.
Model LJ-5k
K-means 77.91%
Sentic medoids 82.76%
RKM 91.54%
Table 3: Comparison between RKM and two baselines on a
dataset for concept polarity detection.
Model                  Year  SST Dataset [53]  STS Dataset [50]  SemEval-2013 [40]  SemEval-2015 [47]  SemEval-2016 [39]  Sanders [2]
ANEW [6]               1999  31.21%            36.77%            42.72%             33.13%             42.20%             27.70%
WordNet-Affect [55]    2004  04.51%            11.98%            03.82%             03.27%             03.53%             05.64%
Opinion Lexicon [22]   2004  54.21%            60.72%            41.00%             43.15%             37.83%             54.33%
Opinion Finder [63]    2005  53.60%            55.71%            47.50%             43.97%             46.75%             46.98%
Micro WNOp [12]        2007  15.45%            18.94%            19.13%             16.97%             17.85%             15.36%
Sentiment140 [21]      2009  55.75%            67.69%            45.67%             50.92%             41.70%             64.95%
SentiStrength [59]     2010  36.76%            51.53%            37.28%             41.51%             33.97%             44.85%
SentiWordNet [3]       2010  50.19%            48.75%            50.15%             50.31%             49.62%             43.55%
General Inquirer [57]  2011  25.91%            11.14%            16.06%             12.47%             16.78%             10.29%
AFINN [41]             2011  44.81%            58.50%            43.82%             44.99%             40.13%             53.19%
EmoLex [38]            2013  46.94%            47.63%            45.12%             42.33%             42.38%             44.12%
NRC HS Lexicon [67]    2014  47.90%            49.86%            28.56%             42.54%             25.28%             54.33%
VADER [23]             2014  50.72%            64.90%            50.36%             49.08%             45.93%             57.27%
MPQA [15]              2015  53.71%            55.43%            46.75%             43.97%             45.42%             46.57%
SenticNet 5 [10]       2018  53.61%            55.71%            68.17%             56.03%             70.80%             48.37%
SenticNet 6            2020  75.43%            83.82%            81.79%             80.19%             82.23%             77.62%
Table 4: Comparison with 15 popular lexica on 6 benchmark datasets for sentiment analysis (top 3 results in bold).
Since most of the datasets we used are for Twitter sentiment analysis, initially we also wanted to apply microtext normalization to all sentences before processing them through the lexica. Had we done that, however, we would also have had to apply many other NLP tasks required for proper polarity detection [9], e.g., anaphora resolution and sarcasm detection, so eventually we refrained from doing so. Classification results are shown in Table 4. SenticNet 6 was the best-performing lexicon mostly because of its bigger size (200,000 words and multiword expressions). Most of the classification errors made by other lexica, in fact, were due to a missing entry in the knowledge base. Most of the sentences misclassified by SenticNet 6, instead, were using sarcasm or contained microtext.

Figure 8: Sentiment data flow for the sentence “The car is very old but rather not expensive” using linguistic patterns.

6 CONCLUSION
In the past, SenticNet has been employed for many different tasks other than polarity detection, e.g., recommendation systems [24], stock market prediction [31], political forecasting [46], irony detection [60], drug effectiveness measurement [42], depression detection [14], mental health triage [1], vaccination behavior detection [27], psychological studies [29], and more.

To enhance the accuracy of all such tasks, we propose a new version of SenticNet built using an approach to knowledge representation that is both top-down and bottom-up: top-down because it leverages symbolic models (i.e., logic and semantic networks) to encode meaning; bottom-up because it uses subsymbolic methods (i.e., biLSTM and BERT) to implicitly learn syntactic patterns from data. We believe that coupling symbolic and subsymbolic AI is key for stepping forward in the path from NLP to natural language understanding. Machine learning is only useful to make a ‘good guess’ based on past experience because it simply encodes correlation and its decision-making process is merely probabilistic. As professed by Noam Chomsky, natural language understanding requires much more than that: “you do not get discoveries in the sciences by taking huge amounts of data, throwing them into a computer and doing statistical analysis of them: that’s not the way you understand things, you have to have theoretical insights”.

ACKNOWLEDGMENTS
This research is supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046).

REFERENCES
[1] Hayda Almeida, Marc Queudot, and Marie-Jean Meurs. 2016. Automatic triage of mental health online forum posts: CLPsych 2016 system description. In Workshop on Computational Linguistics and Clinical Psychology. 183–187.
[2] Sanders Analytics. 2015. Sanders Dataset. (2015). http://sananalytics.com/lab
[3] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In LREC. 2200–2204.
[4] Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation 43, 3 (2009), 209–226.
[5] James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13, 1 (2012), 281–305.
[6] Margaret Bradley and Peter Lang. 1999. Affective Norms for English Words (ANEW): Stimuli, Instruction Manual and Affective Ratings. Technical Report. The Center for Research in Psychophysiology, University of Florida.
[7] Erik Cambria, Jie Fu, Federica Bisio, and Soujanya Poria. 2015. AffectiveSpace 2: Enabling Affective Intuition for Concept-Level Sentiment Analysis. In AAAI. 508–514.
[8] Erik Cambria, Thomas Mazzocco, Amir Hussain, and Chris Eckl. 2011. Sentic Medoids: Organizing Affective Common Sense Knowledge in a Multi-Dimensional Vector Space. In LNCS 6677. 601–610.
[9] Erik Cambria, Soujanya Poria, Alexander Gelbukh, and Mike Thelwall. 2017. Sentiment Analysis is a Big Suitcase. IEEE Intelligent Systems 32, 6 (2017), 74–80.
[10] Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In AAAI. 1795–1802.
[11] Erik Cambria, Yangqiu Song, Haixun Wang, and Newton Howard. 2014. Semantic Multi-Dimensional Scaling for Open-Domain Sentiment Analysis. IEEE Intelligent Systems 29, 2 (2014), 44–51.
[12] Sabrina Cerini, Valentina Compagnoni, Alice Demontis, Maicol Formentelli, and Caterina Gandini. 2007. Micro-WNOp: A gold standard for the evaluation of automatically compiled lexical resources for opinion mining. Language resources and linguistic theory: Typology, Second Language Acquisition, English linguistics (2007), 200–210.
[13] Zhuang Chen and Tieyun Qian. 2019. Transfer Capsule Network for Aspect Level Sentiment Classification. In ACL. 547–556.
[14] Ting Dang, Brian Stasak, Zhaocheng Huang, Sadari Jayawardena, Mia Atcheson, Munawar Hayat, Phu Le, Vidhyasaharan Sethu, Roland Goecke, and Julien Epps. 2017. Investigating word affect features and fusion of probabilistic predictions incorporating uncertainty in AVEC 2017. In Workshop on Audio/Visual Emotion Challenge. 27–35.
[15] Lingjia Deng and Janyce Wiebe. 2015. MPQA 3.0: An entity/event-level sentiment corpus. In NAACL. 1323–1328.
[16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171–4186.
[17] Cícero Nogueira dos Santos and Maíra Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In COLING. 69–78.
[18] Umberto Eco. 1984. Semiotics and Philosophy of Language. Indiana University Press.
[19] Richard Evans and Edward Grefenstette. 2018. Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research 61 (2018), 1–64.
[20] Marco Ferrarotti, Sergio Decherchi, and Walter Rocchia. 2019. Finding Principal Paths in Data Space. IEEE Transactions on Neural Networks and Learning Systems 30, 8 (2019), 2449–2462.
[21] Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N project report, Stanford 1, 12 (2009).
[22] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In SIGKDD. 168–177.
[23] Clayton J Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In ICWSM. 216–225.
[24] Muhammad Ibrahim, Imran Sarwar Bajwa, Riaz Ul-Amin, and Bakhtiar Kasi. 2019. A neural network-inspired approach for improved and true movie recommendations. Computational intelligence and neuroscience (2019), 4589060.
[25] Ray Jackendoff. 1976. Toward an explanatory semantic representation. Linguistic Inquiry 7, 1 (1976), 89–150.
[26] Ray Jackendoff. 1983. Semantics and cognition. MIT Press.
[27] Aditya Joshi, Xiang Dai, Sarvnaz Karimi, Ross Sparks, Cecile Paris, and C Raina MacIntyre. 2018. Shot or not: Comparison of NLP approaches for vaccination behaviour detection. In SMM4H@EMNLP. 43–47.
[28] Jerrold Katz and Jerry Fodor. 1963. The structure of a Semantic Theory. Language 39 (1963), 170–210.
[29] Megan O Kelly and Evan F Risko. 2019. The Isolation Effect When Offloading Memory. Journal of Applied Research in Memory and Cognition 8, 4 (2019), 471–480.
[30] Gerhard Kremer, Katrin Erk, Sebastian Padó, and Stefan Thater. 2014. What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus. In EACL. 540–549.
[31] Xiaodong Li, Haoran Xie, Raymond YK Lau, Tak-Lam Wong, and Fu-Lee Wang. 2018. Stock prediction via sentimental transfer learning. IEEE Access 6 (2018), 73110–73118.
[32] Qiao Liu, Haibin Zhang, Yifu Zeng, Ziqi Huang, and Zufeng Wu. 2018. Content Attention Model for Aspect Based Sentiment Analysis. In WWW. 1023–1032.
[33] Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In AAAI. 5876–5883.
[34] Diana McCarthy and Roberto Navigli. 2007. SemEval-2007 task 10: English lexical substitution task. In SemEval. 48–53.
[35] Oren Melamud, Omer Levy, Ido Dagan, and Israel Ramat-Gan. 2015. A Simple Word Embedding Model for Lexical Substitution. In VS@HLT-NAACL. 1–7.
[36] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
[37] Marvin Minsky. 1975. A framework for representing knowledge. In The psychology of computer vision, Patrick Winston (Ed.). McGraw-Hill, New York.
[38] Saif M Mohammad and Peter D Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence 29, 3 (2013), 436–465.
[39] Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 Task 4: Sentiment Analysis in Twitter. In SemEval.
[40] Preslav Nakov, Sara Rosenthal, Zornitsa Kozareva, Veselin Stoyanov, Alan Ritter, and Theresa Wilson. 2013. SemEval-2013 Task 2: Sentiment Analysis in Twitter. In SemEval. 312–320.
[41] Finn Nielsen. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. CoRR abs/1103.2903 (2011).
[42] Samira Noferesti and Mehrnoush Shamsfard. 2015. Using Linked Data for polarity classification of patients’ experiences. Journal of biomedical informatics 57 (2015), 6–19.
[43] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In EMNLP. 79–86.
[44] Soujanya Poria, Erik Cambria, and Alexander Gelbukh. 2016. Aspect Extraction for Opinion Mining with a Deep Convolutional Neural Network. Knowledge-Based Systems 108 (2016), 42–49.
[45] Soujanya Poria, Erik Cambria, Alexander Gelbukh, Federica Bisio, and Amir Hussain. 2015. Sentiment Data Flow Analysis by Means of Dynamic Linguistic Patterns. IEEE Computational Intelligence Magazine 10, 4 (2015), 26–36.
[46] Lei Qi, Chuanhai Zhang, Adisak Sukul, Wallapak Tavanapong, and David Peterson. 2016. Automated coding of political video ads for political science research. In IEEE International Symposium on Multimedia. 7–13.
[47] Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif Mohammad, Alan Ritter, and Veselin Stoyanov. 2015. SemEval-2015 Task 10: Sentiment Analysis in Twitter. In SemEval. 451–463.
[48] David Rumelhart and Andrew Ortony. 1977. The representation of knowledge in memory. In Schooling and the acquisition of knowledge. Erlbaum, Hillsdale, NJ.
[49] Ivan Sag, Timothy Baldwin, Francis Bond, Ann Copestake, and Dan Flickinger. 2002. Multiword Expressions: A Pain in the Neck for NLP. In CICLing. 1–15.
[50] Hassan Saif, Miriam Fernandez, Yulan He, and Harith Alani. 2013. Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold. In AI*IA.
[51] Roger Schank. 1972. Conceptual dependency: A theory of natural language understanding. Cognitive Psychology 3 (1972), 552–631.
[52] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS. 926–934.
[53] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP. 1631–1642.
[54] Robert Speer and Catherine Havasi. 2012. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In Theory and Applications of Natural Language Processing. Chapter 6.
[55] Carlo Strapparava and Alessandro Valitutti. 2004. WordNet-Affect: An Affective Extension of WordNet. In LREC. 1083–1086.
[56] Yosephine Susanto, Andrew Livingstone, Bee Chin Ng, and Erik Cambria. 2020. The Hourglass Model Revisited. IEEE Intelligent Systems 35, 5 (2020).
[57] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational linguistics 37, 2 (2011), 267–307.
[58] Duyu Tang, Furu Wei, Bing Qin, Ting Liu, and Ming Zhou. 2014. Coooolll: A deep learning system for Twitter sentiment classification. In SemEval. 208–212.
[59] Mike Thelwall, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas. 2010. Sentiment strength detection in short informal text. Journal of the American society for information science and technology 61, 12 (2010), 2544–2558.
[60] Cynthia Van Hee, Els Lefever, and Véronique Hoste. 2018. We usually don’t like going to the dentist: Using common sense to detect irony on Twitter. Computational Linguistics 44, 4 (2018), 793–832.
[61] Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. 2019. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In ICML. 6545–6554.
[62] Anna Wierzbicka. 1996. Semantics: Primes and Universals. Oxford University Press.
[63] Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. 2005. OpinionFinder: A system for subjectivity analysis. In HLT/EMNLP. 34–35.
[64] Lei Xu. 1997. Bayesian Ying–Yang machine, clustering and number of clusters. Pattern Recognition Letters 18, 11 (1997), 1167–1178.
[65] Fan Yang, Zhilin Yang, and William Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In NIPS. 2319–2328.
[66] Wei Zhao, Haiyun Peng, Steffen Eger, Erik Cambria, and Min Yang. 2019. Towards scalable and reliable capsule networks for challenging NLP applications. In ACL. 1549–1559.
[67] Xiaodan Zhu, Svetlana Kiritchenko, and Saif Mohammad. 2014. NRC-canada-2014: Recent improvements in the sentiment analysis of tweets. In SemEval. 443–447.