
Classification and Pattern Discovery of Mood in Weblogs

2010, Lecture Notes in Computer Science

Thin Nguyen, Dinh Phung, Brett Adams, Truyen Tran, and Svetha Venkatesh
Curtin University of Technology
[email protected], {d.phung,b.adams,t.tran2,s.venkatesh}@curtin.edu.au

Abstract. Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets experimented with. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, in particular the dipoles pleasure–displeasure and activation–deactivation.

1 Introduction

Mood is a state of the mind, such as being happy, sad or angry. It is a complex cognitive process whose nature and structure have received extensive research effort, and debate, among psychologists [9,10,6]. But a better scientific understanding of what constitutes a ‘mood’ has ramifications beyond psychology alone: for neuroscientists, it might offer insight into the functioning of the human brain; for medical professionals working in the domain of mental health, it might enable better monitoring and intervention for individuals and communities.

Research like that cited above aims to understand the psychological drives and structures behind human mental states, and typically does so with expensive methodologies involving questionnaires or interviews that limit the number of participants. By contrast, our work aims to classify and cluster mood based on pre-existing content generated by users, which is collected unobtrusively – a sub-problem known as mood analysis in sentiment analysis [8]. Text-based mood classification and clustering, as a sub-problem of opinion and sentiment mining, have many potential applications identified in [8].

However, text-based mood analysis poses additional challenges beyond standard text categorization and clustering. The complex cognitive processes of mood formation make it dependent on the specific social context of the user, their idiosyncratic associations between mood and vocabulary, the syntax and style reflected in their language usage, and the specific genre of the text. In the case of weblogs, these challenges are heightened by bloggers' diverse styles of expression, relatively short post lengths, and informal language, such as jargon, abbreviations, and grammatical errors. This leads us to investigate whether machine learning-based feature selection methods for general text classification are still effective for blog text.
Feature selection methods available in machine learning are often computationally expensive, relying on labeled data to learn discriminative features; but the blogosphere is vast (reaching almost 130 million blogs, according to the State of the Blogosphere 2008 report at http://technorati.com) and continuing to grow, making desirable a feature set that can be used for mood classification without supervised feature training. To this end, we turn our attention to the result of a study at the intersection of psychology and linguistics known as the affective norms for English words (ANEW) [1], and propose its use for mood classification.

In addition to classification, clustering mood into patterns is also an important task, as it might provide vital clues about the structure of human emotion and has implications for sentiment-aware applications. While the structure of mood organization has been investigated from a psychological perspective for some time [9], to our knowledge it has not been investigated from a data-driven and computational point of view. We provide an analysis of mood patterns using an unsupervised clustering approach and a dataset of more than 17 million blog posts manually groundtruthed with users' moods.

Our contribution is twofold. First, we provide a comparative study of machine learning-based text feature selection for the specific problem of mood classification, elucidating what can be transferred from generic text categorization to mood classification. We then formulate a novel use of a psychology-inspired set of features for mood classification which does not require supervised feature learning, and is thus very useful for large-scale mood classification. Second, we provide empirical results on mood organization in the blogosphere using the largest dataset with mood groundtruth available today. To our knowledge, we are the first to consider the problem of data-driven mood pattern discovery at this scale.

The rest of this paper is organized as follows. Work related to feature selection methods for classification and clustering tasks in general text and in sentiment analysis is presented in Section 2. Work related to emotion measures in psychology is also examined in this section. Next, in Section 3 we present machine learning-based feature selection schemes, together with the proposed ANEW feature set and linguistic analysis, applied to mood classification on two large datasets. Section 4 presents the results of mood pattern discovery using an unsupervised learning technique, and is followed by some concluding remarks.

2 Related Background

For generic text categorization, a wide range of feature selection methods in machine learning has been studied. Most notably, Yang and Pedersen [14] conduct a comparative study of different feature selection schemes including information gain (IG), mutual information (MI), and the χ2 statistic (CHI). Besides such term-class interaction-based methods, another approach to selecting features is to consider term statistics. Thresholds on term frequency (TF) or document frequency (DF) are commonly used for feature reduction in data mining. A combination of the two, the term frequency–inverse document frequency (TF.IDF) scheme, is also popular in text mining and often outperforms TF and DF. Some works search for features in sets narrower than the entire vocabulary, such as linguistic groups defined by part of speech (POS).
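To make the term-statistic schemes above concrete, the following is a minimal sketch (not code from the paper) of DF- and average-TF.IDF-based term selection over a toy corpus; the tokenizer and the threshold values are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy corpus of blog posts; in the paper the corpus holds hundreds of thousands of posts.
posts = [
    "feeling so happy today, great day",
    "sad and tired, such a long day",
    "angry about the traffic, so angry",
]

def tokenize(text):
    """Naive whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()

docs = [tokenize(p) for p in posts]
num_docs = len(docs)

# Document frequency DF(v): number of posts containing term v.
df = Counter()
for doc in docs:
    for term in set(doc):
        df[term] += 1

# Average TF.IDF(v, d) across documents, with IDF(v) = |B| / DF(v) as in Section 3.1.
avg_tfidf = defaultdict(float)
for doc in docs:
    tf = Counter(doc)
    for term, count in tf.items():
        avg_tfidf[term] += count * (num_docs / df[term])
for term in avg_tfidf:
    avg_tfidf[term] /= num_docs

# Keep terms whose DF or average TF.IDF exceeds a (hypothetical) threshold.
DF_THRESHOLD = 2
TFIDF_THRESHOLD = 1.0
selected = {t for t in df if df[t] >= DF_THRESHOLD or avg_tfidf[t] >= TFIDF_THRESHOLD}
print(sorted(selected))
```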
Related work making use of emotion-bearing lexicons for sentiment analysis includes [2], where Dodds and Danforth use the valence values of ANEW [1] to estimate happiness levels in song lyrics, blogs, and State of the Union addresses. Work related to mood clustering includes [5], where Leshed and Kaye group blog posts based on their moods to find mood synonymy.

Generally, emotions have been represented from dimensional and discrete perspectives. In the former, emotion states are coded as combinations of factors such as valence and arousal. In contrast, the latter argues that each emotion has a unique coincidence of experience, psychology, and behaviour [6]. We base our work on the dimensional approach for estimating the emotion sphere of the blogosphere. Specifically, we use the circumplex model of affect [9,10], since it conceptualizes emotion states simply via the valence and arousal dimensions, which can be computed using ANEW.

3 Textual-Based Mood Classification

3.1 Feature Selection Methods

Denote by B the corpus of all blogposts and by M = {sad, happy, ...} the set of all mood categories. In a standard feature selection setting, each blogpost d ∈ B is labeled with a mood category l_d ∈ M, and the objective is to extract from d a feature vector x^(d) that is as discriminative as possible for d to be classified as l_d. For example, if we further denote by V = {v_1, ..., v_|V|} the set of all terms, then the feature vector x^(d) = [..., x_i^(d), ...] might take a simple counting form, with its i-th component x_i^(d) representing the number of times the term v_i appears in blogpost d, a scheme widely known as the bag-of-words representation.

Term-based selection. These are features derived with respect to a term v. Two common statistics are term and document frequency: the term frequency TF(v, d) is the number of times the term v appears in document d, whereas the document frequency DF(v) is the number of blogposts containing the term v. It is also well known in text mining that the TF.IDF(v, d) weighting scheme can potentially improve discriminative power, where TF.IDF(v, d) = TF(v, d) × IDF(v), with IDF(v) = |B|/DF(v) the inverse document frequency. In this work, a term v is selected if its DF(v) value, or its average TF(v, d) or TF.IDF(v, d) value across all documents d, exceeds a threshold.

Term-class interaction-based selection. The essence of these methods is to capture the dependence between terms and the corresponding class labels during the feature selection process. Three common selection methods falling into this category are information gain IG(v), mutual information MI(v, l), and the χ2 statistic CHI(v, l) [14]. IG(v) captures the information gained (measured in bits) from knowing whether a term v is present or absent; MI(v, l) measures the mutual information between a term v and a class label l; and CHI(v, l) measures the dependence between a term and a class label by comparison against the χ2 distribution with one degree of freedom.

Affective Norms for English Words (ANEW). Apart from feature sets learned from data, sentiment analysis can also draw on emotion-bearing lexicons compiled manually by human raters. Among them is ANEW [1], a set of 1034 sentiment-conveying English words, each rated in terms of the valence, arousal, and dominance it conveys. We use the 1034 ANEW words exclusively as the feature set, meaning that each blogpost is represented as a sparse counting vector over these ANEW words.
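As an illustration of the ANEW-based representation (and of the valence/arousal computation used later for the dimensional analysis), here is a minimal sketch under assumed inputs: the entries and ratings shown are placeholders rather than the published norms, and the real lexicon [1] contains 1034 rated words.

```python
from collections import Counter

# Placeholder ANEW entries: word -> (valence, arousal) on a 1-9 scale.
# These values are illustrative only, not the published ANEW ratings.
ANEW = {
    "happy":   (8.2, 6.5),
    "sad":     (1.6, 4.1),
    "angry":   (2.9, 7.2),
    "relaxed": (7.0, 2.4),
}
ANEW_WORDS = sorted(ANEW)                # fixed ordering defines the feature vector
INDEX = {w: i for i, w in enumerate(ANEW_WORDS)}

def anew_count_vector(post):
    """Counting vector over ANEW words (returned densely here for clarity)."""
    counts = Counter(post.lower().split())
    return [counts.get(w, 0) for w in ANEW_WORDS]

def valence_arousal(post):
    """Count-weighted average valence and arousal of the ANEW words in a post."""
    counts = Counter(post.lower().split())
    hits = [(w, c) for w, c in counts.items() if w in ANEW]
    total = sum(c for _, c in hits)
    if total == 0:
        return None                      # no ANEW word present in the post
    val = sum(ANEW[w][0] * c for w, c in hits) / total
    aro = sum(ANEW[w][1] * c for w, c in hits) / total
    return val, aro

post = "so happy and relaxed after a long sad week"
print(anew_count_vector(post))           # [0, 1, 1, 1] for the placeholder lexicon
print(valence_arousal(post))
```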
3.2 Mood Classification Results

It has been shown that linguistic components, such as the specific use of adverbs, adjectives or verbs, can be a strong indicator for mood inference [8]. Therefore, in this paper we further run a part-of-speech tagger to identify all terms tagged as verbs, adjectives and adverbs. The tagger used is the SS-Tagger [12] ported to the Antelope NLP framework (www.proxem.com), which gives reasonable accuracy. Three term weighting-based (TF, DF, TF.IDF) and three term-class interaction-based (IG, MI, CHI) selection methods are employed in this experiment. These feature selection methods are applied either to all terms (unigrams) or to the subset of terms tagged with a specific POS. Our experimental design is to compare and contrast which feature selection methods work best and to examine the effect of specific linguistic components in the context of mood classification. For classification, we experimented with many off-the-shelf classifiers such as SVM, IBk and C4.5; however, the naive Bayes classifier (NBC) consistently outperformed these methods, so we report only the results for NBC. For each experiment, we use ten-fold cross-validation, repeat 10 runs, and report the average result. To evaluate the results, we report two commonly used measures: accuracy and F-score (computed from recall and precision).

Effect of feature selection schemes and linguistic components. We use two datasets, namely WSM09 and IR05, for the task of mood classification. The first, WSM09, is provided by Spinn3r as the benchmark dataset for the ICWSM 2009 conference (http://www.icwsm.org/2009/data/) and contains 44 million blogposts crawled between August and October 2008. We extract a subset of this dataset consisting only of blogposts from LiveJournal, and query LiveJournal to obtain the mood groundtruth entered by the user when each post was composed. We only consider the moods predefined by LiveJournal and discard others, resulting in approximately 600,000 blogposts. To validate the generalization of a feature selection scheme, we also run it on another dataset (IR05), created in [7], which contains 535,844 posts tagged with the predefined moods. To enable comparison with previous results in [11], we examine three popular moods {sad, happy, angry} in this experiment. The full set of 132 mood categories is considered in the next section. We run the experiment over combinations of feature selection methods and linguistic subsets and report the ten best results in Table 1.
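Before turning to Table 1, a minimal sketch of this classification protocol, not the authors' pipeline: it assumes scikit-learn and a toy stand-in corpus, uses χ2 (CHI) term selection as a stand-in for the schemes above, and evaluates a multinomial naive Bayes classifier with ten-fold cross-validation scored by accuracy and macro F-score.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

# Toy stand-in for LiveJournal posts with mood groundtruth.
posts = [
    "what a wonderful sunny day with friends",
    "i cried all night, everything feels heavy",
    "they cancelled my flight again, furious",
    "got the job, celebrating tonight",
    "missing home so much lately",
    "stop lying to me, this is infuriating",
] * 5   # repeated so ten-fold CV has enough samples per class
moods = ["happy", "sad", "angry", "happy", "sad", "angry"] * 5

pipeline = make_pipeline(
    CountVectorizer(),          # bag-of-words counting vectors
    SelectKBest(chi2, k=20),    # term-class selection (CHI here; IG/MI are analogous)
    MultinomialNB(),            # naive Bayes classifier (NBC)
)

scores = cross_validate(pipeline, posts, moods, cv=10,
                        scoring=["accuracy", "f1_macro"])
print("accuracy:", scores["test_accuracy"].mean())
print("F-score :", scores["test_f1_macro"].mean())
```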
Table 1. Mood classification results for different feature selection schemes and different part-of-speech subsets. Many combinations of feature selection methods and POS subsets were run, but only the top ten results are reported, sorted in ascending order of F-score.

WSM09
Selection method   Linguistic subset   Accuracy   F-score
ANEW               –                   0.713      0.697
IG                 Verb                0.714      0.700
TF.IDF             unigram             0.744      0.738
DF                 AdjVbAdv            0.750      0.745
TF                 AdjVbAdv            0.751      0.745
TF.IDF             AdjVbAdv            0.754      0.748
DF                 unigram             0.753      0.752
TF                 unigram             0.753      0.752
IG                 AdjVbAdv            0.762      0.756
IG                 unigram             0.776      0.774

IR05
Selection method   Linguistic subset   Accuracy   F-score
IG                 Adjective           0.738      0.709
ANEW               –                   0.734      0.712
TF.IDF             AdjVbAdv            0.759      0.749
TF                 AdjVbAdv            0.760      0.750
DF                 AdjVbAdv            0.760      0.750
TF.IDF             unigram             0.765      0.756
DF                 unigram             0.765      0.762
TF                 unigram             0.765      0.762
IG                 AdjVbAdv            0.773      0.763
IG                 unigram             0.791      0.788

With respect to feature selection schemes, information gain (IG) is observed to be the best selection scheme. The other term-class interaction-based methods do not perform well; noticeably, mutual information (MI) does not appear in any of the top ten results. These observations are consistent with what is reported in [14] for the text categorization problem. However, differing from the conclusions in [14], we find that CHI performs badly for the mood classification task and also does not appear in any of the top ten results. Surprisingly, both TF and DF perform better than TF.IDF in the all-term (unigram) cases, whereas in text mining TF.IDF is generally known to be superior, although much more computationally expensive. Thus, TF or DF are suitable alternatives to IG when trading off computational cost. The performance of the feature selection schemes is also in good agreement across the two datasets, as can be seen in Table 1, except for the first few rows. Our best result stands at 77.4% F-score for WSM09 and 78.8% for IR05, which is higher than the result reported in [11] (66.1%). With respect to the effect of linguistic components (not examined in [11] or [5]), a combination of adjectives, verbs and adverbs (AdjVbAdv) dominates the top ten results and performs very close to using all terms; notably, using verbs or adjectives alone also gives good performance.

Performance of ANEW. Despite requiring no supervised feature selection stage, the ANEW feature set performs very encouragingly, appearing in the top ten results for both datasets. Its results are also consistent across the two datasets, standing at approximately 70% F-score (still better than the best result reported in [11]).

4 Mood Pattern Discovery

While most existing work has focused on supervised classification of mood, we are interested in discovering intrinsic patterns in mood structure using unsupervised learning approaches. Using a large, groundtruthed dataset of more than 17 million posts introduced in [5], we aim to find empirical evidence to answer various questions that have often been posed in psychological studies. For example, does mood follow a continuum in its transition from ‘pleasure’ to ‘displeasure’, or from ‘activation’ to ‘deactivation’? Is ‘excited’ closer to ‘aroused’ or ‘happy’? Does ‘depressed’ transition to ‘calm’ before reaching ‘happy’? We use the full set of 132 moods predefined by livejournal.com (viewable at http://www.livejournal.com) for the clustering task.
Given a corpus of more than 17 million posts, the feature selection schemes presented in Section 3.1 are very expensive to perform; for example, computing MI(v, l) for each (term, mood label) pair takes O(|M| × |V|) operations, where |M| = 132 (the number of moods) and |V| is the number of unique terms, which can be in the order of hundreds of thousands. Since our results in Section 3.2 have shown that the proposed ANEW feature set gives comparable classification results, marginally lower (∼8%) than the best result, while entirely avoiding the expensive feature selection step, we employ ANEW as the feature vector in this section.

We choose a multidimensional scaling approach, in particular the self-organizing map (SOM) [3], for clustering. We use the SOM-PAK package [4] and the SOM Toolbox for Matlab [13] to train and visualize the map. For training, a 9 × 7 map is used, which amounts to nearly half the number of mood classes. Following the recommendations in [3], the horizontal axis is roughly 1.3 times the vertical axis, the node topology is hexagonal, and the number of training steps is 32,000 (about 500 times the number of nodes). Due to space restrictions, we omit coarse-level results and present in Figure 1 the structure of the clusters discovered, in which the top six moods of each cluster are shown.

Fig. 1. Discovered mood structure map. Each cluster is annotated with the top six mood categories (best viewed in colour).

Several interesting patterns emerge from this analysis. At the highest level, one can observe a general transition of mood from an extreme of pleasure (clusters II, III, and V) to displeasure (clusters IV, VI, and VII). On the pleasure pole we observe moods with very high valence values (as measured in the ANEW study [1]), such as good (7.47), loved (8.64) or relaxed (7.00), whereas on the displeasure end we observe moods with low valence values, such as enraged (2.46) or stressed (2.33). Certain mood transitions are also evident; for example, the cluster path IV–II–III presents a transition pattern from infuriated to relaxed and then to good. Though not emerging as strongly as the pleasure ↔ displeasure pattern, a global pattern of activation ↔ deactivation is also observed based on analysis of the arousal measure, as shown in Figure 1. Our results indeed favour the core affect model of human emotion structure studied in psychology [9,10], and are generally in agreement with the global mood structure proposed there.
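As a rough illustration of the clustering setup above, and not the SOM-PAK/SOM Toolbox pipeline actually used, here is a minimal sketch using MiniSom as a stand-in SOM implementation; the ANEW-based post vectors are assumed to be available as a NumPy array (random placeholder data below), and the map size and step count mirror the values reported above.

```python
import numpy as np
from minisom import MiniSom   # stand-in SOM library; the paper used SOM-PAK [4]

# Assume X is an (n_posts x n_anew_words) matrix of ANEW counting vectors,
# e.g. built with anew_count_vector() from the earlier sketch. Random data here.
rng = np.random.default_rng(0)
X = rng.poisson(0.1, size=(1000, 1034)).astype(float)

# 9 x 7 map, roughly a 1.3:1 aspect ratio as recommended in [3].
# (The paper used a hexagonal node topology; MiniSom's default grid is rectangular.)
som = MiniSom(9, 7, input_len=X.shape[1], sigma=1.5, learning_rate=0.5,
              random_seed=0)
som.random_weights_init(X)
som.train_random(X, 32000)            # about 500 training steps per node

# Assign each post to its best-matching unit (a node on the map).
assignments = [som.winner(x) for x in X]
print(assignments[:5])
```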
5 Conclusion

We addressed the problem of mood classification and pattern discovery in weblogs. While the problem of machine learning-based feature selection for text categorization has been intensively investigated, little work exists on text-based mood classification, which is often more challenging. Our first contribution is a comprehensive comparison of different selection schemes across two large datasets. In addition, we propose a novel use of ANEW features, which do not require a supervised selection phase and can thus be applied to mood analysis at a much larger scale. Our results recall similar findings in previous work, but also bring to light discoveries peculiar to the problem of mood classification. Our newly proposed feature set also performs comparatively well, at a fraction of the computational cost of supervised schemes, and was further validated by the results of an unsupervised clustering exercise, which clustered 17 million blog posts and provided a unique view of mood patterns in the blogosphere.

In particular, this study manifests global patterns of mood organization that are analogous to the pleasure–displeasure and activation–deactivation dimensions proposed independently in the psychology literature, such as in the core affect model of the structure of human emotion. This data-driven organization of mood could be of interest to a wide range of practitioners in the humanities, and has many potential uses in sentiment-aware applications.

References

1. Bradley, M.M., Lang, P.J.: Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings. Technical report, The Center for Research in Psychophysiology, University of Florida (1999)
2. Dodds, P.S., Danforth, C.M.: Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies, 1–16 (2009)
3. Kohonen, T.: Self-Organizing Maps. Springer, Heidelberg (2001)
4. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: SOM-PAK: The self-organizing map program package. Technical report, Helsinki University of Technology (1996)
5. Leshed, G., Kaye, J.J.: Understanding how bloggers feel: recognizing affect in blog posts. In: Proc. of ACM Conf. on Human Factors in Computing Systems, CHI (2006)
6. Mauss, I.B., Robinson, M.D.: Measures of emotion: A review. Cognition & Emotion 23(2), 209–237 (2009)
7. Mishne, G.: Experiments with mood classification in blog posts. In: Proc. of ACM SIGIR Workshop on Stylistic Analysis of Text for Information Access (2005)
8. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), 1–135 (2008)
9. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology 39(6), 1161–1178 (1980)
10. Russell, J.A.: Emotion, core affect, and psychological construction. Cognition & Emotion 23(7), 1259–1283 (2009)
11. Sara, S., Lucy, V.: SentiSearch: Exploring mood on the web. In: Proc. of Workshop on Weblogs and Social Media, ICWSM (2009)
12. Tsuruoka, Y.: Bidirectional inference with the easiest-first strategy for tagging sequence data. In: Proc. of HLT/EMNLP, pp. 467–474 (2005)
13. Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: SOM Toolbox for Matlab. Technical report, Helsinki University of Technology (2000)
14. Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proc. of ICML, pp. 412–420 (1997)