A Hybrid Model of Sentimental Entity Recognition
Abstract
With new forms of media such as Twitter becoming increasingly popular, the Internet is now the main conduit of individual and interpersonal messages. A considerable number of people express their personal opinions about news-related subjects through Twitter, a popular SNS platform based on human relationships. This provides a data source from which we can extract people's opinions, which are important for product review and public opinion monitoring. In this paper, a hybrid sentimental entity recognition model (HSERM) is designed. Using 100 million messages collected from Twitter, hashtags are regarded as the labels for sentiment classification. Meanwhile, features such as emoji and N-grams are extracted, and the collected topic messages are classified into four sentiment categories based on the circumplex sentiment model. Finally, machine learning methods are used to classify the sentiment data set, achieving 89 % precision. Furthermore, the entities behind the emotions can be obtained with the help of the SENNA deep learning model.
Keywords: Feature selection, Sentiment analysis, Sentiment classification, Entity recognition
© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made.
neutral, positive, and negative. This is of limited help in hearing the real voice and emotion of society.

The study of social media is relatively new. As early as 2005, Park et al. [3] began to analyze emotion on Twitter. They labeled more than 20,000 tweets and created an emotion polarity dataset by manually assigning neutral, positive, or negative tags. Next, they developed an emotion classifier using machine learning methods based on Naïve Bayes, support vector machines (SVM), and conditional random fields (CRF). Read et al. [4] used the Twitter application program interface (API) to collect a great number of emoticons and demonstrated these icons' effect on emotion classification in detail. Go et al. [5] developed three machine learning classifiers based on Naïve Bayes, maximum entropy, and SVM using an unsupervised machine learning algorithm. They added the emoticons to the selected features, which raised the accuracy of emotional tendency discrimination above 80 %. This research has been applied to many business fields such as online shopping, online film reviews, and online 4S shop messages. For instance, Fei HongChao analyzed review text from Yahoo English Sports; through that, the attitude of investors toward the stock market could be discovered. Ghose et al. applied LingPipe to emotion classification. They tried to increase the accuracy of classifiers by labeling the training set manually and then recognized the emotional tendency of the original text. The amount of research on emotion-based text mining is growing, and the related research fields are extending at the same time. R. Pavitra [6] established an analysis model based on the weakly supervised joint sentiment-topic model and created a sentiment thesaurus with positive and negative lexicons to find the sentiment polarity of bigrams. Wang and Cui [7, 8] worked on group events and disease surveillance to research sentiment analysis. They also extended the data source to multimedia for research on sentiment analysis [9].

Recently, with the development of computer technology for information searching and search engines, named-entity recognition has become a hot topic in the field of natural language processing. Asahara [10] performed automatic identification of names and organizations with SVM and got good results. Tan used an error-driven transfer method to obtain contextual rules for place-name entities and then applied those rules to identify place names automatically; according to the data test, the accuracy of this method can reach 97 %. Huang et al. [11] gathered a large amount of statistical data from vast real text data and calculated the reliability of each consecutive-word construction and word construction. Finally, combining some rules, the names could be recognized automatically. Turkish scholars [12] performed named-entity recognition on their domestic Twitter. In their article, a new named-entity-annotated tweet corpus was presented and various tweet-specific linguistic phenomena were analyzed. After that, Derczynski and his group [13] also worked in a similar field. Xu et al. [14] even hold a patent on named-entity recognition queries.

This paper researches some relevant techniques in data mining for fine-grained sentiment analysis of tweets, including the methods of tweet collection, tweet pre-processing, and the construction of knowledge: based on a tweet emotional dictionary, sentiment analysis based on weighted emotional word meanings and sentiment analysis based on multi-feature fusion.

Tweet text comes in large amounts, covers a wide range, and arrives rapidly, so it is impossible to monitor hot events and analyze the guidance of public opinion manually. For processing this huge amount of unstructured text data, machine learning and deep learning have made certain breakthroughs in the field of text processing. For sentiment analysis, we will build a circumplex sentiment model by using hashtags as the classification tags and capturing N-gram and emoji features. Then, the emotion will be classified through the processing of a SENNA model. It is possible to classify the four kinds of emotions which we described in advance.

3 Definition of question
We aim to deduce users' emotions by analyzing their tweet text. To give a formalized definition of this question, we predefine some notions as below:

Definition 1 (Tweet words w): Since each word in a tweet is possibly related to the user's emotion, we add up all the words in the tweet text and use a two-tuple to represent each of them, w = {t, a}, where t is the text form of w and a is the frequency of w in a tweet.

Definition 2 (Sentiment dictionary D): For each sentiment, we can design a dictionary which represents it sharply, called a sentiment dictionary. The dictionaries of different sentiments can include the same words, since a dictionary exerts influence on the sentiment analysis as a whole. We use a two-tuple to represent each sentiment dictionary: Di = {d, v}, where d is each word in the dictionary and v is the central vector of this sentiment. The closer the user's vector model is to the central vector, the more likely the user is to hold this sentiment. The words in a dictionary can also be represented as two-tuples: d = {t, c}, where t is the text form of d and c is the relevancy of the word d to the sentiment.

In the same way, we can also get the concept of the N-gram model. If there is a large amount of data then, according to the Markov assumption, the probability of a word's appearance is associated only with the probability of the word in front of it, and the problem becomes simple. Therefore, the uni-gram model changes to the binary model, the bi-gram:

P(W) ≈ P(w1)P(w2|w1)P(w3|w2)…P(wn|wn−1)  (2)

In the same way, we can get the tri-gram, in which the probability of a word's appearance is related only to the probabilities of the two words in front of it:

P(W) ≈ P(w1)P(w2|w1)P(w3|w2w1)…P(wn|wn−1wn−2)  (3)

In our research, we used an improved version of the N-gram model, namely adding padding characters (generally spaces or whitespace) at the beginning of each bi-gram and tri-gram to increase the number of grams, improving the prediction accuracy of the models, as shown in Fig. 1. Sometimes a tweet contains only a few words, so the tri-gram model can extract only a few characteristics, but the feature quantity improves significantly after adding the padding characters.
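As a concrete illustration of the padding idea, here is a minimal sketch; the function name and the space padding token are our choices, not the paper's:

```python
def ngrams(tokens, n, pad=" "):
    """Extract n-grams, prefixing (n - 1) padding characters so that
    short tweets still yield one gram per token."""
    padded = [pad] * (n - 1) + list(tokens)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

words = "so happy today".split()
bigrams = ngrams(words, 2)   # 3 padded bi-grams
trigrams = ngrams(words, 3)  # 3 padded tri-grams
```

Without padding, a three-word tweet yields only a single tri-gram; with the two leading padding characters it yields three, which is the feature-quantity gain described above.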
S: sentimental category table included in the system
Output: the sentiment E of the target tweet t

1. For each word w of tweet T do
2. For each α1 that does not satisfy the KKT condition
3. Select the α2 that minimizes distance(α1, α2)
4. Check Σ(i=1..m) αi · labeli = 0
5. Calculate α1, α2
6. Check labeli · (WᵀXi + b) ≥ 1.0
7. Calculate b1, b2
8. Return the α matrix and b matrix

Table 6 Label table of sentiment classification in Twitter

Emotion type | Label
Happy-active | 1
Happy-inactive | 2
Unhappy-active | 3
Unhappy-inactive | 4

4.5 Entity detection
Named-entity recognition, also called entity recognition, identifies the entities with specific meanings in a stretch of text; it mainly covers the names of people, places, organizations, and proper nouns. There are four main techniques for named-entity recognition:

(1) The statistics-based recognition method. The main statistical models for named-entity recognition include the hidden Markov model, the decision tree model, the support vector machine (SVM) model, the maximum entropy model, and the conditional random fields model.
(2) The rule-based recognition method. It mainly uses two kinds of information: restrictive clauses and named-entity words.
(3) The recognition method combining rules and statistics. Some mainstream named-entity recognition systems combine rules and statistics: first they use statistical methods to recognize the entities, and then they correct and filter them with the rules.
(4) The recognition method based on machine learning. This technology is well developed for English. Classifying English words with SVM methods can achieve an accuracy of more than 99 % when places or people's names are recognized.

Deep learning is a new branch in the field of machine learning and a kind of algorithm that simulates functions of the brain. Deep learning originated from deep belief nets, which in turn originated from the Boltzmann machine presented in the Hinton paper. The basic idea of deep learning is that, for a system S with N levels (S1, S2, …, SN), input I, and output O, the system can be expressed as I => S1 => S2 => S3 => … => SN => O. This system should automatically learn features that help people make decisions. By adjusting the parameters in each layer of the system, the output of a lower level becomes the input of the higher level, and by stacking layers, a hierarchical representation of the input information is obtained. The deep learning training model of the system is shown in Fig. 3.

Natural language is the direct language humans use to communicate, but for computers to perform computational identification, natural language needs to be converted into computer-usable symbols, usually called the digitization of natural language. In deep learning, word embeddings are used to represent words. The word embedding method was proposed by Bengio more than a dozen years ago. The words in the language are mapped into high-dimensional vectors with 200 to 500 dimensions. By training the word vectors with deep learning, each word gets its corresponding spatial coordinates in the high-dimensional space. A sample of the space coordinate map is shown in Fig. 4.

At the beginning of the training process of the word vectors, each word is given a random vector. For example, deep learning is used to predict whether a quintet phrase is true, such as "Benjamin likes play the basketball". If any one of the words in this sentence is replaced, for example replacing "the" with "theory", then "Benjamin likes play theory basketball" is obviously not grammatical. Using models trained by deep learning, it is possible to predict whether changed quintet phrases are true or not.

SENNA not only proposed the method for building word embeddings but also solved the natural language
Table 5 A sample table of sentiment classification in Twitter

Tweets | Type
… | Happy-active
… | Unhappy-active
… | Happy-active
… | Unhappy-inactive

Table 7 Sample labeling of Twitter

Tweets | Label
… | 1
… | 3
… | 1
… | 4
processing tasks (POS, Chunking, NER, SRL) from the perspective of a neural network language model system. In SENNA, each word can be looked up directly in a lookup table.

The word vectors of HLBL in SENNA, which differ from each other, are used to depict different semantics and grammar usages. The word vectors of a word are eventually combined from various forms of the word vector. SENNA directly concatenates the vectors to represent the words. Then, the emotion is classified through the processing of a SENNA model. It is possible to classify the four kinds of emotions which we described in advance.

tweets as the tag, which are automatically classified by using a machine method. A hashtag, as a kind of tag in tweets, is used to record a certain topic. This paper believes that the tweets carrying the label of a certain sentiment category belong to that category, so as to implement automatic classification by the machine.

5.2 Data preprocessing
Unlike traditional news or media data, tweets are a kind of everyday expression, so they contain a lot of errors and "noise". Before classification, the "noisy" data should be deleted as follows:
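A minimal sketch of this kind of tweet cleanup; the concrete rules below (URL, @mention, and retweet-marker removal) are our assumptions, not the paper's own list:

```python
import re

def clean_tweet(text):
    """Remove typical tweet noise before classification.
    Hashtags and emoticons are deliberately kept, since they
    serve as labels and features in this paper's pipeline."""
    text = re.sub(r"http\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)     # drop @mentions
    text = re.sub(r"\bRT\b", "", text)   # drop retweet markers
    return " ".join(text.split())        # collapse extra whitespace

clean_tweet("RT @bob check this http://t.co/x #happy :)")
# → "check this #happy :)"
```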
Table 8 Four kinds of emotion and the entities extracted from them

Happy-active | Happy-inactive | Unhappy-active | Unhappy-inactive
Ganada | Netflix | Mad Men | Pocket Full Of Gold
NDSU | youtube | LiveYourTruth | AmnestyOnline
Sensation | Levis | HolyBible | AhmedahliCom
Game | Newspaper | KingLikeAQueen | EdgarAllanPoe
Filmphotography | VallartaGV | NTUWFC | Elena
KimFCoates | yoga | backstreetboys | JonnyValleyBoy
Ft. Beyonce | CandyCrushSaga | BLOOMparenting | GinyTonicBlog
Drake | ICandlelighters | STFU Louise | Havasupai
StuartPWright | HillCountry | YL train | Bethany
Longley_Farm | SLU | Rebecca De Mulder | David Letterman
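Per-category rankings like those in Table 8 can be produced by counting entity mentions and keeping the most frequent ones. A minimal sketch; the sample mentions are illustrative, not the paper's data:

```python
from collections import Counter

# Illustrative entity mentions for one emotion category (not the paper's data)
mentions = ["Netflix", "youtube", "Netflix", "youtube", "Levis", "Netflix"]

# Count occurrences and keep the most frequent entities (top 10 in the paper)
top = Counter(mentions).most_common(2)
# → [("Netflix", 3), ("youtube", 2)]
```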
5.4 Classifier training and result analysis
From the graph, we can see that with different ways of feature extraction, the precision and recall rates of Naïve Bayes differ. The precision rate is around 86.5 % (86.3~86.9 %), and the recall rate is around 86.5 % (86.3~87 %) as well. This proves that with all of the uni-gram, emoji, and punctuation features, the precision rate can reach a maximum of 86.9 % (Fig. 5).

From the graph, we can see that with different ways of feature extraction, the precision and recall rates of logistic regression differ. The precision rate is around 85 % (84.2~85.9 %), and the recall rate is around 85 % (84.9~86 %) as well. This proves that with all of the uni-gram, emoji, and punctuation features, the precision rate can reach a maximum of 85.9 % (Fig. 6).

From the graph, we can see that with different ways of feature extraction, the precision and recall rates of KNN differ. The precision rate is around 88.5 % (88.1~89.1 %), and the recall rate is around 88.5 % (88.3~89 %) as well. This proves that with all of the uni-gram, emoji, and punctuation features, the precision rate can reach a maximum of 89.1 % (Fig. 7).

From the graph, we can see that with different ways of feature extraction, the precision and recall rates of SVM differ. The precision rate is around 89 % (88.9~89.8 %), and the recall rate is around 89 % (88.6~90 %) as well. This proves that with all of the uni-gram, emoji, and punctuation features, the precision rate can reach a maximum of 89.8 % (Fig. 8).

Comparing the data in this project, it is obvious that by using uni-gram, emoji, and punctuation as characteristics and SVM as the emotional classifier, the classification accuracy can reach 89.8 %. SVM is the best sentiment classification method for our experiment.

5.5 Results and analysis of named-entity recognition
After the training of the emotional classifiers, an automatic classifier was applied to 5000 new data values, and named-entity recognition was performed for each type of data. In this section, the SENNA deep learning toolkit is adopted for entity extraction for each type of data, and at the same time all of these words are sorted. Table 8 shows the top 10 results. For each type of emotional entity, visual graphics are displayed as follows (Figs. 9, 10, 11, and 12).

By using SENNA, we extracted the emotional entities from 5000 new data values. Actually, these entities are the reasons why users show these types of emotions. The result shows that Netflix could make users feel "Happy-inactive". At the same time, when we read the