Classifying mood in plurks
Mei-Yu Chen*
[email protected]
Hsin-Ni Lin*
[email protected]
Chang-An Shih**
[email protected]
Yen-Ching Hsu*
[email protected]
Pei-Yu Hsu*
[email protected]
Shu-Kai Hsieh*
[email protected]
*Department of English, National Taiwan Normal University
** Institute of Linguistics, Academia Sinica
Abstract
In this paper, we present a simple but efficient approach to the automatic mood classification
of microblogging messages from the Plurk platform. In contrast with Twitter, Plurk has become
the most popular microblogging service in Taiwan and other countries1; however, no previous
research has addressed emotion and mood recognition on Plurk, nor are Chinese affective
term lists or corpora available. Following the line of mashup programming, we therefore construct a
dynamic plurk corpus by pipelining the Plurk API, the Yahoo! Chinese segmentation API, and other
services to preprocess and annotate the corpus data. Based on the corpus, we conduct experiments
that combine textual statistics and emoticon data, and our method yields results with
high performance. This work can be further extended by combining it with an affective ontology
designed according to appraisal theories of emotion.
Keywords: mood classification, plurks, keyness, emotion paradox
1 According to Alvin, the co-founder of Plurk, the number of plurkers in Taiwan had reached
approximately 1 million by October 2009, one-third of all plurkers. Another statistic, collected from
Google Trends for Websites, shows that Taiwan is the top-ranked region in visits to the Plurk website
(August 2010).
1. Introduction
Recently, with the emergence of Web 2.0 content, sentiment and opinion mining in social
media and network domains such as blogs has gained great attention from computational
linguistics and other fields. Mood/emotion classification is related to sentiment
classification, whose goal is to tag a text according to whether it expresses positive or
negative sentiment, and has been studied, for example, on lyrics in the music domain.
Mood classification is useful for various applications; a mood intelligence module can
be incorporated into (affective) dialogue systems, learning tutors, etc. to improve the
naturalness of human-machine interaction.
Figure 1. Number of Plurkers
This paper addresses the task of classifying microblogging posts by mood. Our motivation
for this research is the fact shown in the figure above: Plurk2 is the most commonly used
microblogging platform in Asia, and, in contrast with other social networks, it has also
become one of the most common forms of computer-mediated communication in Taiwan.
In this paper, we aim to propose a simple and effective method for automatically classifying
the emotion of plurk messages.
The rest of the paper is organized as follows. Section 2 introduces the emoticons in plurk
messages. Section 3 describes the construction of the plurk corpus. Section 4 presents the
experiment results and evaluation. Finally, Section 5 concludes the paper and summarizes our
future work.
2. Emoticon and Emotional Expression in Social Media
In previous studies, computer systems have been designed to perform automatic
sentiment analysis using machine learning techniques such as latent semantic analysis
(LSA), Naive Bayes, and support vector machines (SVM) [1][2], with different sets of
features. For example, Mishne [1] uses many features extracted from the LiveJournal
weblog service to train an SVM binary classifier for sentiment analysis.
2 http://www.plurk.com
The task becomes even harder when identifying (multiple) emotions. Jung et al. [3]
show that there are some idiosyncrasies of mood expression in plurk messages; for
example, the initial mood may not be maintained all the way to the end (also known as
the fluctuation of moods). In addition, some posts are so intertwined that even human readers
would have difficulty identifying the mood, let alone a machine.
By scrutinizing the plurk data, we found that emoticons3 can be utilized as useful
features representing the emotional content of a conversation, especially in
computer-mediated communication (CMC). As graphic representations of facial expressions,
emoticons are used to remedy the lack of non-verbal cues such as head nodding and smiles;
this is why they are often embedded in textual messages where we want to set different
tones, such as humor, irony, sarcasm, cuteness, flippancy, or other emotions that our face is
not there to deliver [4].
Taking advantage of the fact that plurk messages are augmented with rich plugin
support for emoticons, we propose to extract affective terms as a 'bag of words'
bootstrapped from the emoticons (non-verbal expressions) widely used in the plurk domain, and
to use them to determine the overall emotion embedded in a plurk message.
3. Plurking a Social Web Corpus
For this research, we constructed a Plurk social web-as-corpus (SocialWaC) for Taiwan. Plurk
is a social journal service that allows users to showcase the events that make up their lives and
to follow the events of the people who matter to them, in messages called plurks. Plurkers
simply post a new plurk and it appears on the timeline.
Figure 2. Snapshot of plurk timeline
3.1 Crawling and Preprocessing
3.1.1 Crawling Data
We use the official Plurk API4 to crawl the Plurk data. These include plurk data (with
attributes of plurk_id, qualifier, date when the plurk was posted, content, response, emoticons,
etc.), user data (with attributes of id, location, date_of_birth, gender, karma, relationship, etc.),
and other social network data (friends and fans, blocks, cliques, etc.).

3 Note that in this paper we do not consider smileys, which are often used in other CMC settings such as email.
4 http://www.plurk.com/API
We adhere to Plurk's policy regarding which technical, conversational, and
sociological meta-information should be captured during data collection, and which
ethical aspects should be considered when designing our plurk SocialWaC. For instance,
private data were not collected. Currently, our corpus contains 3,000 plurkers, 1,723 female
and 1,277 male. The total number of plurks reached 38,629.
3.1.2 Format
The API returns JSON-encoded data. We use the simplejson Python library to decode the
returned data. We decided to crawl the data without layout information (such as typeface, font,
color, size, etc.), on the one hand for technical reasons, and on the other hand because we found
that, within communicative exchanges, such layout features mostly serve as text decoration rather
than pragmatic functions; the exception is emoticons, whose visual properties are functional in
the original texts.
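As a minimal sketch of this decoding step (the raw string and its field values are illustrative; the attribute names follow those listed in Section 3.1.1):

```python
import simplejson  # the library used in this work; Python's built-in json module would also work

# Illustrative raw response string; real data comes back from the Plurk API.
raw = '{"plurk_id": 12345, "qualifier": "says", "content": "today was fun", "emoticons": 1}'

plurk = simplejson.loads(raw)  # decode the JSON string into a Python dict
print(plurk["plurk_id"], plurk["qualifier"], plurk["content"])
```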
3.1.3 Segmentation
One of the most crucial and difficult tasks in preprocessing a Chinese corpus is word
segmentation. Among the various segmentation algorithms, the lexicon-based approach is
widely adopted; it segments Chinese sentences into distinct words by dictionary lookup.
However, the word identification ability of the lexicon-based scheme is highly dependent
on a well-prepared lexicon with a sufficient number of lexical entries. Hybrid approaches
have thus been proposed that combine the lexicon with statistical information to detect
out-of-vocabulary (OOV) words.
To our knowledge, most Chinese word segmentation systems, such as the Sinica CKIP
segmenter and CScanner, were designed to segment general-domain Chinese texts based on a
manually prepared lexicon. For our social-domain corpus, with its huge amount of lexical and
syntactic variety (new words, phrases, and sentence structures), these tools did not work well.
For instance, they cannot identify trendy words like luo2 li4 (蘿莉) 'a little girl' and xing2 nan2
(型男) 'a stylish man', let alone orthographically code-mixed words like A ka (A 咖) 'the most
popular and famous guy in a certain domain', Q ban3 (Q 版) 'cute version', niou2 B (牛 B) 'great',
etc. Based on these considerations, we adopt the segmentation system provided by Yahoo!5, for
the sake of its powerful lexicon extension covering new words emerging in social media.

5 http://tw.developer.yahoo.com/cas/
3.2 Annotation
In contrast with traditional corpus annotation, King [5] pointed out that responses to
spam (in the form of 'adbots'), cyber-orthography, the ubiquity of names, and overlapping
conversations all pose challenges for the annotation of CMC.
In addition to these common properties, 'plurks' also behave as short sentences and
fragments, with abundant use of emoticons and with users' meta-information available. As
emoticons convey the sentiment that plurkers use to express their particular state of mind, the
annotation is done at the plurk level, with POS information and emoticons tagged.
One particularly interesting property that can be observed in our corpus data is the
availability of users' meta-information, which can be coupled with lexical information for
computational sociolinguistic surveys. For instance, Figure 3 shows a socialized lexical
network extracted from our corpus, visualized with the vister software. The colored areas
demonstrate the shared lexical patterns among certain social communities.
Figure 3. Socialized lexical network
In addition, through the meta-information we can also learn when people of a certain gender or
from a certain location use a lexical item, and how the accompanying emotion is distributed. This
is demonstrated on the website introduced in Section 3.3.
3.3 Availability
We have built a website for querying our corpus, which can be accessed via
http://140.122.83.235/plurk/.
Figure 4. Plurk search website
4. Experiments
In this section we describe the mood classification experiments we conducted on the plurk corpus.
The aim was to assess the affective and attitudinal interpretation of plurks based on our proposed
method.
4.1 Algorithm: Keyness and Manual Validation
The prevalent SVM machine learning algorithm works well for text classification, but has
only low precision and recall for sentence-level classification. Various techniques need to be
used to deal with the issue of imbalanced data distribution [6].
In this paper, we use a simple but effective method that ties back to a well-known textual
statistical measure called keyness. Given a set of evaluative plurks P, our mood classifier
classifies each plurk in P into one of three affective valences: positive, interrogative, and
negative. The algorithm makes use of the Yahoo! Chinese segmenter and part-of-speech (POS)
tagger, together with a stop-word list compiled from the Academia Sinica Balanced Corpus. It
consists of the four steps explained in the following:
[Step one] Extract plurks containing emoticons from the corpus and divide them into 3
emoticon categories based on the emoticons they contain, so as to generate the
keyword and keyness lists in Step two.
[Step two] Generate a keyword and keyness list for each emoticon category,
i.e., the Positive (P), Interrogative (I), and Negative (N) categories,6 using
the log-likelihood feature selection method in AntConc [7][8].7 In addition, stop words8
are removed from the keyword lists.
We define a Saliency Score (SS) function, calculated as a keyword's keyness value divided by
the keyness value of the first-ranked word in the keyword list, as shown in (1).

    SS_C(w) = keyness_C(w) / keyness_C(w_1)    (1)

where w_1 is the first-ranked keyword in the keyword list of category C.
[Step three] For each emoticon category, calculate the sum of the saliency scores of the
keywords occurring in the plurk, taking the whole sentence as the score domain; the category
with the highest sum is assigned, i.e., Emoticon assigned (X) = Max(Score(P), Score(I), Score(N)),
as in (2). Unknown words are mapped to their most similar known words using pointwise mutual
information (PMI) over Chinese Wordnet. (A Python sketch of Steps two and three is given after
Step four.)

    Score(C) = Σ_{w in plurk} SS_C(w),    X = argmax_{C in {P, I, N}} Score(C)    (2)
6 Note that emoticons themselves are not counted as words when we generate the keywords. Emoticons
are used only to categorize plurks as positive, interrogative, or negative.
7 When generating the keyword and keyness list of the Positive category, we take the other two categories
as the reference corpus, and similarly when we generate the lists for the Interrogative and Negative categories.
8 Compiled from the Academia Sinica Balanced Corpus. The stop list contains deictics, pronouns, and quantity words.
[Step four] Manual validation: after the automatic emoticon assignment, a manual
validation is conducted. Two evaluators are asked to validate the emoticon assignment.
This step aims not only to validate the results but also to recover accurate judgments missed
by the automatic emotion classification.
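The following is a minimal Python sketch of Steps two and three. The keyness lists are assumed to be given as dictionaries (e.g., exported from AntConc); the words and numbers are illustrative, and the PMI back-off for unknown words and the manual validation of Step four are omitted.

```python
# Hypothetical keyness lists per category (P, I, N); in the paper these are
# produced by AntConc's log-likelihood keyword extraction over the training plurks.
keyness = {
    "P": {"開心": 812.4, "喜歡": 655.0},
    "I": {"嗎": 903.1, "為什麼": 541.7},
    "N": {"累": 770.2, "討厭": 512.9},
}

def saliency(category, word):
    """Saliency Score (1): the word's keyness divided by the top keyness in the list."""
    lst = keyness[category]
    if word not in lst:
        return 0.0
    return lst[word] / max(lst.values())

def classify(tokens):
    """Step three: assign the category with the highest summed saliency score."""
    scores = {c: sum(saliency(c, w) for w in tokens) for c in keyness}
    return max(scores, key=scores.get), scores

# tokens as returned by the segmenter, with stop words already removed
label, scores = classify(["今天", "超", "累"])
print(label, scores)  # -> 'N' with the illustrative lists above
```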
4.2 Data and Resource Description
To examine emoticons and emotional expression, we extracted the plurks in our corpus. Of
436,487 plurks, 103,171 (24%) were found to contain emoticons, and 32,851 of these contain
default emoticons. To simplify the classification of emoticon categories, only default
emoticons9 are considered in the current research.

9 Default emoticons can be accessed by all users regardless of their Karma value. In Plurk, the Karma value
is a mechanism used to evaluate a user's level.

We randomly divide the data containing default emoticons into an 80/20 split for training and
testing, respectively. The keyword and keyness lists of the 3 emoticon categories are
generated from the 80% portion, and the testing data are constituted by the 20% portion.
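A minimal sketch of this random 80/20 split (the list of plurk records here is a placeholder):

```python
import random

plurks = ["plurk %d" % i for i in range(100)]  # placeholder for plurks with default emoticons

random.seed(0)                # fix the split so it can be reproduced
random.shuffle(plurks)
cut = int(len(plurks) * 0.8)  # 80% for training, 20% for testing
train, test = plurks[:cut], plurks[cut:]
print(len(train), len(test))  # 80 20
```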
4.3 Experiments with heuristic rules (H-rules)
All test plurks are assigned to one of the three valences by the method described in
Section 4.1. The second and third experiments are conducted with the following heuristic rules,
respectively.
4.3.1 H1-Rule: Decomposing the plurks
The focus of our proposed method is similar to sentence-level sentiment classification.
However, according to pragmatic information structure, new information, which is the purpose
of communication, is conveyed later than old information in a sentence. Since emoticons possess
a communicative function, their regular sentence-final position corresponds to this information
structure and is closely related to the later fragment of a sentence, especially in disjunctive
sentences. For example: "Fragment 1 [I won the lottery this morning,] Fragment 2 [but I lost
the ticket in the afternoon.]" Fragment 1 conveys a proposition with positive emotion, but the
real information of the sentence lies in the second fragment, which conveys a negative emotion
and is denoted by the emoticon in sentence-final position. We thus examine whether the
sentence-final fragment is most likely to contain the affective content by restricting the
score domain to the sentence-final fragment only.
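A minimal sketch of the H1-rule, assuming the plurk has already been segmented; the set of clause-break markers is an illustrative assumption, and the scoring itself is the classifier sketched in Section 4.1.

```python
# Illustrative clause-break markers: contrastive conjunctions and commas.
BREAK_MARKERS = {"但", "但是", "不過", "可是", ",", ","}

def final_fragment(tokens):
    """H1-rule: keep only the tokens after the last clause break (the sentence-final fragment)."""
    last = -1
    for i, tok in enumerate(tokens):
        if tok in BREAK_MARKERS:
            last = i
    return tokens[last + 1:]

tokens = ["我", "中", "樂透", ",", "但是", "下午", "弄丟", "了", "彩券"]
print(final_fragment(tokens))  # ['下午', '弄丟', '了', '彩券'] -- only this part is scored
```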
4.3.2 H2-Rule: Deleting inappropriate keywords
A few words cannot serve as Negative keywords due to the properties of their meanings;
we therefore delete those content words from our keyword lists. Three words are deleted.
The first one is hao3 (好) 'good (adj.)/very (adv.)'. As the English translation shows,
hao3 (好) can be a word of positive emotion (i.e., "good") or an adverb of degree (i.e., "very");
the latter conveys neutral emotion but is usually used in negative contexts to amplify the
negative degree.
The second deleted keyword is le (了), a Chinese aspectual particle. The particle
provides no lexical meaning, only aspectual information; therefore le (了), like the stop words,
should not be a keyword.
The other deleted word is shu1 fu2 (舒服) 'comfortable', a word that is positive per se
but is misanalyzed as a keyword of negative emotion. This misanalysis occurs for several
positive words in the experiment because those words usually follow bu4 (不) 'not', a
negation adverb, to form a negative meaning. A problem arises here because the Yahoo! Chinese
segmentation system segments the bu4-plus-word(s) pattern inconsistently: sometimes bu4 is
separated from the following adjective or verb, and sometimes bu4 and the adjective or verb
are segmented together as a single word. To avoid the problem caused by this inconsistency, we
examined the 30 most frequent separated words following bu4 among the negative keywords.
Among these 30 keywords, only shu1 fu2 'comfortable' is a word with positive emotion, so we
delete only this word here.
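A minimal sketch of the H2-rule, using the same keyness-dictionary layout as the sketch in Section 4.1; the keyness values are illustrative.

```python
# The three words discussed above, removed from the Negative keyword list (H2-rule).
INAPPROPRIATE_NEGATIVE = {"好", "了", "舒服"}

def apply_h2_rule(keyness):
    """Drop the inappropriate entries from the Negative (N) keyword list."""
    keyness["N"] = {w: k for w, k in keyness["N"].items() if w not in INAPPROPRIATE_NEGATIVE}
    return keyness

keyness = {"P": {"開心": 812.4}, "I": {"嗎": 903.1},
           "N": {"好": 1020.3, "了": 950.8, "累": 770.2, "舒服": 388.6}}
print(apply_h2_rule(keyness)["N"])  # {'累': 770.2}
```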
4.4 Baseline
Given that no similar experiments have been conducted before, the baseline of our
experiments is set as the result of randomly classifying each plurk in the testing data as one
of the three valences (i.e., positive, interrogative, or negative). The random classification is
performed automatically by a program written in Python.
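The baseline amounts to the following sketch (this is our reconstruction of the idea, not the original program; the test items are placeholders):

```python
import random

VALENCES = ["positive", "interrogative", "negative"]

def random_baseline(test_plurks, seed=None):
    """Randomly assign one of the three valences to every test plurk."""
    rng = random.Random(seed)
    return [rng.choice(VALENCES) for _ in test_plurks]

print(random_baseline(range(10)))
```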
4.5 Results and Analysis
The results of the three experiments described in Sections 4.1 and 4.3 are shown in
Table 1.
Table 1. The results of 3 experiments with and without heuristic rules

            Experiment 1          Experiment 2     Experiment 3
            (without H-rules)     (H1-rule)        (H2-rule)
Accuracy    59.13 %               56.35 %          61.80 %
Recall      95.05 %               91.15 %          94.55 %
From the numbers in Table 1, it is observed that Experiment 3 has the best performance
overall. For a more convincing result, we replicate Experiment 3 to examine whether
the heuristic keyword rule is the essential factor affecting the result. The replicated
experiment (Experiment 4) is conducted by randomly choosing another 20% split of testing
data. The results are shown in Table 2, together with the results after manual validation (MV)
and PMI.
Table 2. The results of 2 experiments, both with H2-rule

               Experiment 3 (H2-rule)        Experiment 4 (H2-rule)
               Original      MV+PMI          Original      MV+PMI
Accuracy       61.80 %       70.89 %         60.64 %       69.55 %
Recall         94.55 %       95.3 %          94.2 %        94.98 %
Ambiguous10    -             11.02 %         -             10.90 %
The detailed accuracy rates of the distinct emoticon categories in Experiments 3 and 4 are
shown in Table 3 and Table 4.
Table 3. The accuracy rate of distinct emoticon categories in Experiment 3

            Experiment 3 (H2-rule)
            Positive (n=3122)       Interrogative (n=460)    Negative (n=2779)
            Original   MV+PMI       Original   MV+PMI        Original   MV+PMI
Accuracy    56.95 %    59.39 %      25.65 %    60.22 %       73.22 %    85.50 %
Recall      93.69 %    94.59 %      98.04 %    98.7 %        94.93 %    95.54 %
Table 4. The accuracy rate of distinct emoticon categories in Experiment 4

            Experiment 4 (H2-rule)
            Positive (n=3563)       Interrogative (n=516)    Negative (n=3093)
            Original   MV+PMI       Original   MV+PMI        Original   MV+PMI
Accuracy    56.55 %    59.05 %      19.38 %    52.71 %       72.23 %    84.45 %
Recall      93.49 %    94.5 %       95.16 %    95.54 %       94.86 %    95.44 %
In addition, the testing data of Experiment 3 and Experiment 4 are also used as the data for
the baseline in this study; to produce convincing results, we perform the baseline experiments
twice. The results are presented in the following table.
10 Cases on which the two evaluators disagree in the manual validation; that is, for sentences with ambiguous
emotions the evaluators make different emoticon assignments. The consistency rate of the two evaluators is
67.26 %.
Table 5. The results of baseline experiments

            Baseline of Experiment 3      Baseline of Experiment 4
            1st time      2nd time        1st time      2nd time
Accuracy    34.73 %       32.86 %         32.54 %       33.73 %
Failure     65.27 %       67.14 %         67.46 %       66.27 %
Comparing the accuracy in Table 5 with that in Table 2 shows that the results of our
algorithm are much higher than those of the baseline, especially after the recall by manual
validation and PMI.
4.6 Evaluation
From the results in Table 1, heuristic rule 1, decomposing the plurk, did not perform
as well as we expected. Instead, the accuracy rate decreases from 59.13% to 56.35% and the
recall rate decreases from 95.05% to 91.15%. This result can be attributed to the shorter
length of the fragments, which do not provide enough keywords to be extracted. However,
heuristic rule 2, deleting inappropriate keywords, increases the accuracy rate from 59.13% to
61.80% and does not affect the recall rate much. We propose that some special categories of
keywords should be removed. The first is homographic keywords, which can ambiguously serve
as keywords of two polar emotions, such as hao3 (好). The second is positive adjectives which
can combine with a negation adverb to denote the opposite meaning. Like the homographic
keywords, these positive terms can serve as keywords of two polar emotions: as positive terms
on their own, or as negative ones when appearing after negation terms. A plausible solution is
to retag the negative expression, combining the negation term and the following positive term
into a single word, to avoid an undetermined semantic orientation [9].
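A minimal sketch of the proposed retagging; the negation adverb set and the example tokens are illustrative assumptions.

```python
NEGATION_ADVERBS = {"不", "沒", "沒有"}

def retag_negation(tokens):
    """Merge a negation adverb with the following word (e.g. 不 + 舒服 -> 不舒服)
    so that the merged token carries a single, negative semantic orientation."""
    merged, i = [], 0
    while i < len(tokens):
        if tokens[i] in NEGATION_ADVERBS and i + 1 < len(tokens):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(retag_negation(["今天", "很", "不", "舒服"]))  # ['今天', '很', '不舒服']
```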
It is worth noting that researchers have found that "in relationship with simultaneous verbal
behavior, nonverbal behavior may emphasize, repeat, substitute, or contradict verbal
messages, yet CMC commentators discuss emoticons in terms of their emphatic function or
signaling function, not mere repetition or substitution of otherwise-conveyable verbally
transmitted meaning". So basically, "positively valenced emoticons should enhance positively
valenced verbal messages, and negative emoticons make negatively valenced messages
more negative". However, there exists an interesting phenomenon of emoticon paradox,
which suggests an intentionally conflicted or ambiguous state and which is thus less
predictable: that is, "positive verbal messages with a negative emotion or vice versa". For
example, "I really enjoy the meal with my boss [negative emoticon]." Verbally, the speaker says
that he enjoys the meal, but he conveys his real feeling through the negative emoticon. Such
inconsistencies between verbal meanings and the nonverbal cues conveyed by emoticons produce
classification errors. This phenomenon can be observed in the accuracy of the Interrogative
and Negative categories in Table 3 and Table 4; after manual validation, the accuracy of the
two categories improves greatly due to the hedge-of-speech-act phenomenon. We observe
that when conveying a negative expression, people like to hedge the speech act by adopting an
interrogative format, for example, "How come there are such rude people in the world?"
A negative expression could also be hedged or modified by using an interrogative
emoticon, for example, "I don't want to take the exam. :-o". However, it is rare to see a hedge
in a positive expression, and thus the accuracy rate of the Positive category did not improve
much after manual validation.
We propose that emoticons do not only serve as decoration or amplification of the
sentence mood; they also function as verbal tokens conveying their own arbitrary meanings, as
observed in the emotion paradox. Even without any accompanying content, they convey
information by themselves. The emotion paradox phenomenon can be used to automatically
detect the pragmatic hedge-of-speech-act usage: when analyzing a sentence like "I really enjoy
the meal with my boss [negative emoticon]", detecting the inconsistency between the computed
emotion and the given emoticon reveals the hedge-of-speech-act usage and contributes to the
real meaning of the sentence, which cannot be recovered by analyzing the plain verbal meaning
alone.
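A minimal sketch of this check; the verbal category is the output of the keyness scorer sketched in Section 4.1, and the emoticon-to-category mapping here is an illustrative assumption rather than the actual Plurk default emoticon inventory.

```python
# Hypothetical mapping from emoticon codes to the three categories.
EMOTICON_CATEGORY = {":-)": "P", ":-(": "N", ":-o": "I"}

def is_emotion_paradox(computed_category, emoticon_code):
    """Flag a plurk whose computed verbal valence conflicts with its attached emoticon."""
    given = EMOTICON_CATEGORY.get(emoticon_code)
    return given is not None and given != computed_category

# e.g. the verbal content is classified as Positive, but a negative emoticon is attached
print(is_emotion_paradox("P", ":-("))  # True -> a likely hedge / ironic speech act
```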
5. Conclusion
The present study proposed a simple but effective way, by computing the keyness of
keywords in a sentence, of both assigning emotions to sentences and detecting emotion
paradox sentences. The method can also be used to compile a dynamic list of (Chinese)
affective terms (the keywords) and applied to subfields such as cognitive linguistics and
Chinese language teaching. Pragmatics will also strongly benefit from the detection of
hedge-of-speech-act sentences: in the traditional methodology, the presuppositions or
implicatures in utterances are known to speakers only, whereas an emoticon provides a cue to
the speaker's real emotion and intention, and thus reflects the speaker's underlying
presuppositions and implicatures.
Future work can aim at using LSA to broaden the linkage of the keywords, to see whether
closely related words share the same emotion, and whether this approach can expand the
coverage of keywords to achieve a lower unknown-word rate.
References
[1] G. Mishne, “Experiments with Mood Classification in Blog Posts,” in Proceedings of the
1st Workshop on Stylistic Analysis of Text For Information Access, 2005.
[2] A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant
supervision,” Dec 2009. [Online]. Available:
http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf
[3] Y. Jung, Y. Choi, and S.H. Myaeng, “Determining mood for a blog by combining multiple
sources of evidence,” in Proceedings of IEEE/WIC/ACM International Conference on
Web Intelligence, pp. 271-274, 2007.
[4] J. B. Walther and K. P. D'Addario, “The Impacts of Emoticons on Message Interpretation
in Computer-Mediated Communication,” Social Science Computer Review, vol. 19, no. 3, pp.
324-347, 2001.
[5] B. King, “Building and Analysing Corpora of Computer-Mediated Communication,” in
Contemporary Corpus Linguistics, P. Baker, Ed. New York: Continuum International
Publishing Group, 2009, pp. 301-320.
[6] E. Spyropoulou, S. Buchholz, and S. Teufel, “Sentence-based Emotion classification for
text-to-speech synthesis,” presented at Computational Aspects of Affectual and Emotional
Interaction-2008, Patras, Greece, 2008.
[7] L. Anthony, “AntConc: A Learner and Classroom Friendly, Multi-Platform Corpus
Analysis Toolkit,” in Proceedings of IWLeL 2004: An Interactive Workshop on Language
e-Learning, pp. 7-13, 2004.
[8] A. Kilgarriff, “Comparing corpora,” International Journal of Corpus Linguistics, vol. 6,
no. 1, pp. 97-133, 2001.
[9] P. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation applied to Unsupervised
Classification of Reviews,” in Proceedings of the Meeting of the Association for
Computational Linguistics, pp. 417-424, 2002. [Online]. Available:
http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf