Social media is gaining importance among business houses, academicians, medical practitioners, politicians, and others because of its role in creating awareness about products, services, and socio-political views. The end users of these products, services, and views provide their feedback in the form of comments. An accurate determination of the sentiments of end users is crucial in designing future policies and plans for products and services. As the processing power and storage capacities of computers have increased severalfold, researchers can focus more on the accuracy of sentiment detection than on the consumption of computational resources. In this paper, we apply a set of heuristics to analyse sentiments using freely available dictionary resources and open source tools. We have tested these heuristics over a large data set collected from standard sources. The experimental results are promising and open new research directions in dictionary-based sentiment analysis.
I. INTRODUCTION
With the growing importance of social media in all walks of life, including education, business, healthcare, and politics, increasing attention is being given to users' views about entities and products. Internet penetration is increasing due to the availability of electronic devices such as desktops, laptops, tablets and smart phones. Easy access to the Internet encourages users of these devices to browse, view content, perform transactions, and, most importantly, express their opinions about the content presented on e-business portals and social media. The views expressed by the users accumulate into a large amount of data, which, if analysed properly, can provide interesting observations. In the social media research lexicon, users' comments are termed sentiments, and their deep analysis is considered an important task from both academic and industrial points of view. Some applications of sentiment analysis include book and movie reviews, recommender systems, and political campaign analysis. A careful analysis of the textual data generated by users on social media gives useful insight into products, entities and events. This helps other stakeholders, such as management and end users, to take informed decisions. As these opinions are highly unstructured, analysing them is a daunting task for the research community.
Keeping in view the importance of users' sentiments, several sentiment analysis tools have been created. Some of the freely available online sentiment analysis tools are Social Mention (http://socialmention.com), for searching a term in blogs, microblogs, images, videos, etc., and Twitter Sentiment (http://www.sentiment140.com), for discovering Twitter sentiment about any product, brand or person. In addition, some commercial tools such as Conversation Miner (http://converseon.com/miner), Attensity Analyze (http://www.attensity.com/attensity-analyze), and Factiva (http://new.dowjones.com/products/factiva/) are also available. Some of these tools rely on a limited set of emotions and determine the sentiment based on a set of keywords. Therefore, these tools are not able to capture the implicitly expressed views of the users. On the other hand, many other tools use sophisticated data mining or natural language processing techniques. These tools have been tested on users' views about movies or on their tweets on the microblogging website Twitter. Their performance is rarely evaluated against more complicated posts such as Facebook updates in political and social contexts. This gap motivates us to conduct comparative research to judge their performance over big data collected from movie reviews and data collected from Facebook status updates. The research presented in this paper studies the utility of open source dictionary-based sentiment analysis tools in predicting sentiments in complex user feedbacks on social media. This research determines the shortcomings of the existing resources and identifies novel uses of these tools in order to make them more economical and more usable for the business and academic communities. The analysis of the experimental results opens new research directions in the field of sentiment analysis.
The remainder of the paper is organized as follows: Section 2 describes some important research work done in the last decade. Section 3 describes the method and tools used to conduct this research. Section 4 tabulates and analyses the results. Section 5 discusses conclusions and future directions in sentiment analysis.
II. RELATED WORK
The most basic task in sentiment analysis is to classify opinions as positive or negative. This task can be performed at three levels: document, sentence and phrase level [21]. In document level analysis, the sentimental polarity of the overall document is computed [6][30] [39]. Sentence level analysis is based on the fact that a document may contain several sentiments, and hence individual sentences should be examined for positivity or negativity [12][15] [17]. Some researchers [8] [23] refined the analysis to the phrase level, where importance is assigned to individual words or phrases. Cesarano et al. [5] focused on sentiment classification based on adjective phrases only, and proposed a scale ranging from -1 to +1 for measuring the degree of polarity in sentiments. Later, Benamara et al. [2] proposed that a combination of adjectives and adverbs gives more accurate results than adjectives only. Subrahmanian and Reforgiato [35] extended this concept to include verbs along with adjectives and adverbs for sentiment analysis, on the same scale as in [5], to get better results. There have also been attempts to predict sentimental polarity at one level of granularity and use it to predict sentimental polarity at another level [25]. For instance, Zhang et al. [41] used a rule-based approach to determine document level sentiment classification by aggregating the outcomes of sentence level analysis for Chinese documents.
The inclusion of one or more sentiment dictionaries has been central to sentiment analysis research [9][10] [27] [34]. Dictionary-based approaches to sentiment analysis require the development of a well-defined and comprehensive dictionary. Young and Soroka [40] used a dictionary-based approach consisting of a simple count of the frequency of keywords in a text from a predefined dictionary. They designed a sentiment dictionary (called the Lexicoder Sentiment Dictionary) and tested it against nine other dictionaries as well as against a body of human-coded news content in a political context. We use the Lexicoder Sentiment Dictionary for the research presented in this paper.
The sources of user reviews may vary. They may come from the feedback section of product-selling websites such as Amazon [7] [11], where the language of feedback is very clean. Other sources may be social networking sites, such as the Facebook pages or Twitter handles of celebrities or political parties [18] [38]. In this case, the language is not clean at all, and the task is more challenging. Several researchers have taken Twitter as a data source to design and test sentiment analysis systems [1][13][28] [42]. The popularity of Twitter is perhaps due to the short length of tweets, which pose challenges because of their colloquial nature but are very precise in the polarity of their sentiments [20].
The alternatives to purely dictionary-based approaches are approaches based on learning models, which are particularly useful for cross-domain sentiment analysis. Bollegala et al. [3] proposed a cross-domain sentiment classifier that generates an automatically extracted sentiment-sensitive thesaurus during the analysis. There have also been attempts to predict the sentimental inclination of a user's review by analysing the context or pinpointing the target of the opinion (also called opinion features). Opinion extraction can be done using supervised learning models [16] [19], which are more suitable for domain-specific analysis, or unsupervised learning models [32] [33]. Zhen et al. [14] proposed a method for opinion feature extraction from online reviews by exploiting the differences in statistical features across domain-specific and domain-independent corpora. Some opinions may not contain strong emotions or may contain views on more than one issue. They require subjectivity analysis or opinion target identification [4]. To handle such opinions, Pang and Lee [29] used a machine-learning method based on text categorization techniques. They extract only the subjective sentences in the documents (based on the minimum cut principle in graph theory) and discard the objective sentences in order to prevent the polarity classifier from considering potentially irrelevant sentences. They tested their algorithm by classifying movie reviews as positive or negative over a data set consisting of 1000 positive and 1000 negative reviews written by 312 authors. The authors concluded that extracting only the most subjective sentences (about 22% of the total review) and analysing them for the polarity of the movie reviews gives results comparable to, and sometimes better than, those of a full-text review.
III. SENTIMENT ANALYSIS
This section describes the heuristics applied to conduct the research described in this paper. It also presents the nature and sources of the data collected. Experimental results of sentiment analysis research are more easily compared with each other when they rely on publicly available datasets and tools. For this reason, we have used some freely available tools to determine the sentiments of user comments acquired from publicly available datasets. They are briefly introduced in this section.
A. Data Collection
Collecting data for sentiment analysis and labelling them according to sentiments is a daunting task. Fortunately, Bo Pang and Lillian Lee [29] have collected and classified 1000 feedbacks each for positive and negative sentiments. This collection is drawn from IMDB's archive of rec.arts.movies.reviews, and it is freely available to the research community from the website aliasi.com/lingpipe/demos/tutorial/sentiment/read-me.html.
The overall average size of feedbacks in the negative data set is 610.6 words and 33.42 sentences. The overall average size of feedbacks in the positive data set is 684.29 words and 34.55 sentences.
We prepared another dataset consisting of Facebook status updates. We tried to collect Facebook posts that include information as heterogeneous as possible. Six hundred Facebook status updates, posted between July 10, 2014 and January 29, 2015, were downloaded from the Facebook walls of 100 unique users. The average size of the Facebook status updates was 68.82 words, with a minimum size of 14 words and a maximum size of 616 words. The topics included politics, sports, movies, and personal accomplishments, among others.
B. Sentiment Analysis Tools
We used three freely available common tools to conduct the research discussed in this paper. A brief description of these tools is given below.
RIOTScan:
RIOTScan is freely available software (http://riot.ryanb.cc/) designed for calculating meaningful indices from texts. It supports 35 dictionary schemes, including a financial sentiment dictionary [24] and a social ties dictionary [31], among others. We used the Lexicoder Sentiment Dictionary [40] and Opinion Lexicon [15] [22]. Opinion Lexicon consists of a list of around 6800 English opinion words or sentiment words. RIOTScan offers the processing options of stemming using the Porter algorithm and lemmatization.
SentiStrength: SentiStrength estimates the strength of positive and negative sentiments in short texts, even for informal language [36] [37]. Besides its online trial version (http://sentistrength.wlv.ac.uk/), SentiStrength provides an executable version for Windows as well as a Java implementation on request for academic research. SentiStrength rates negative sentiment on a scale ranging from -1 to -5 and positive sentiment on a scale ranging from 1 to 5, based on keywords in the feedbacks. The overall sentiment value of a sentence is calculated by comparing the total positive value of the sentence with the total negative value of the sentence: if the magnitude of the total negative value is higher than the total positive value, the sentence is considered a negative sentence.
Sentiment.vivekn.com: Sentiment.vivekn.com is a free online tool for sentiment analysis for research and academic purposes [26]. This tool works by examining individual words and short sequences of words (n-grams) and comparing them with a probability model. The probability model is built on a pre-labelled test set of IMDB movie reviews. It can also detect negations in phrases, i.e., the phrase "not bad" will be classified as positive despite having two individual words with a negative sentiment.
C. Sentiment Analysis Heuristics
We applied some heuristics to analyse the sentiments in feedbacks and posts. This section describes these heuristics. In the first heuristic, we tried to determine the effectiveness of dictionaries, in their basic forms, in the analysis of sentiments. That is, we tested what percentage of the words used in a review come from the negative sentiment dictionary and what percentage from the positive sentiment dictionary. If the percentage of words from the negative dictionary exceeds that from the positive dictionary, the view is considered a negative sentiment; otherwise, it is considered a positive sentiment. The percentage of positive or negative words can be determined using standard sentiment dictionaries. We used the Lexicoder Sentiment Dictionary and Opinion Lexicon to determine the percentage of negative and positive words in feedbacks.
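The first heuristic can be illustrated with a minimal Python sketch. The two word sets below are tiny hypothetical stand-ins for a real sentiment dictionary such as the Lexicoder Sentiment Dictionary or Opinion Lexicon, and the function name and tokenization rule are assumptions made for illustration; the sketch does not reproduce the processing performed by RIOTScan.

```python
import re

# Hypothetical toy word lists; a real run would load a full sentiment dictionary.
POSITIVE = {"good", "great", "excellent", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful", "dull"}


def classify_by_word_share(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (label, positive %, negative %) for one review.

    The review is labelled negative only when the share of negative
    dictionary words exceeds the share of positive ones; otherwise
    (including ties) it is labelled positive, as in the first heuristic.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return "positive", 0.0, 0.0
    pos_pct = 100.0 * sum(t in positive for t in tokens) / len(tokens)
    neg_pct = 100.0 * sum(t in negative for t in tokens) / len(tokens)
    label = "negative" if neg_pct > pos_pct else "positive"
    return label, pos_pct, neg_pct


label, pos_pct, neg_pct = classify_by_word_share(
    "A great film with an excellent cast but a dull ending"
)
print(label, round(pos_pct, 1), round(neg_pct, 1))
```

Because the decision depends only on which share is larger, a dictionary with weak coverage of one polarity will bias every prediction towards the other polarity.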
The second heuristic calculates sentiments for each sentence in the user views and adds them up to determine the overall sentiment of the feedback document. For every document manually marked as a positive review, the sentiment of the document equals the difference between the total of positive sentiments summed over all sentences in the document and the total of negative sentiments summed over all sentences in the document. This difference must be a positive number for the document to be marked as a positive sentiment. Similarly, for every document manually marked as a negative review, the sentiment of the document equals the difference between the total of negative sentiments summed over all sentences in the document and the total of positive sentiments summed over all sentences in the document. This difference must be a positive number for the document to be marked as a negative sentiment. We used SentiStrength to test this heuristic.
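A minimal Python sketch of the second heuristic follows. The word strengths and the per-sentence scorer are hypothetical stand-ins for SentiStrength (they are not its API or its lookup table); the scorer assumes each sentence receives a positive strength from 1 to 5 and a negative strength from -1 to -5, taken from its strongest words. Only the summation over sentences and the sign test on the difference follow the description above.

```python
import re

# Hypothetical word strengths: positive words > 0, negative words < 0.
STRENGTHS = {"love": 4, "great": 3, "good": 2, "bad": -2, "hate": -4, "terrible": -4}


def sentence_scores(sentence):
    """Return (positive, negative) strengths of one sentence; neutral is (1, -1)."""
    pos, neg = 1, -1
    for word in re.findall(r"[a-z']+", sentence.lower()):
        strength = STRENGTHS.get(word, 0)
        if strength > 0:
            pos = max(pos, strength)
        elif strength < 0:
            neg = min(neg, strength)
    return pos, neg


def document_totals(text):
    """Sum positive strengths and negative magnitudes over all sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_scores(s) for s in sentences]
    pos_total = sum(pos for pos, _ in scores)
    neg_total = sum(-neg for _, neg in scores)
    return pos_total, neg_total


def prediction_is_correct(text, gold_label):
    """A document counts as correctly predicted only if the difference is > 0."""
    pos_total, neg_total = document_totals(text)
    if gold_label == "positive":
        return pos_total - neg_total > 0
    return neg_total - pos_total > 0


review = "I love this film. The ending was bad."
print(document_totals(review), prediction_is_correct(review, "positive"))
```

Under this rule a document whose positive and negative totals are equal is counted as a wrong prediction for either label, since neither difference is strictly positive.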
The third and last heuristic tests the documents by examining each word and comparing it with a probability model defined over pre-labelled movie data. In this case, the decision is taken on the basis of the confidence level generated by the probability model. We have used the online tool available at http://sentiment.vivekn.com/.
IV. RESULTS
This section describes the results obtained by applying the heuristics and tools described in Section 3 to the data collected from movie reviews and Facebook. We describe these results in two phases. In the first phase, we describe the results over the movie data. As seen from Table I, the percentage of correct predictions using RIOTScan with the Lexicoder Sentiment Dictionary is 63.4% and 68.1% for positive and negative documents respectively, which is consistent across both types of feedbacks (overall average 65.8%). However, the prediction fluctuates heavily when we use RIOTScan with Opinion Lexicon. The prediction accuracy (Table II) for positive documents is as high as 96.6%, which can be a benchmark performance by any standard. However, it falls to an abysmally low 15.2% for negative documents. This underlines the need to add more relevant negative words to the Opinion Lexicon, which is a relatively older dictionary.
We applied the second heuristic over the movie review data using the Java version of SentiStrength. We changed the default policy of assigning 1.5 times higher weight to negative words in SentiStrength to equal weight (1.0) for both positive and negative words. We also used the SentenceCombineTot option instead of the default "Maximum" option of SentiStrength. As shown in Table III, the performance improves for negative documents compared to the first heuristic with the Lexicoder Sentiment Dictionary, but it goes down for positive documents. This opens a research direction that requires experimenting with and reassigning the weights assigned to negative and positive words in the EmotionLookup Table in SentiStrength. As shown in Table IV, the performance of the third heuristic is satisfactorily high and consistent. The good and consistent performance can be attributed to two factors: first, this tool is modelled and tested over a similar kind of data, and second, the labelled dataset is relatively new.
In the second phase of evaluation, we tested all heuristics using document sets prepared by collecting Facebook status updates. For the first heuristic with the Lexicoder Sentiment Dictionary, the performance went up for positive documents (from 63.4% to 72%), but went down sharply from 68.1% to 48% for negative documents (Table I and Table V). For the Opinion Lexicon dictionary, the gap between the performance over positive and negative documents decreased, but it is still too wide (Table II and Table VI).
The performance of the second heuristic follows the same pattern as that of the first heuristic using the Lexicoder Sentiment Dictionary: performance for positive documents goes up and that for negative documents goes down (Table III and Table VII). On the other hand, the performance of the third heuristic goes down for both positive and negative documents (Table IV and Table VIII). This behaviour of the third heuristic is consistent with our reasoning for its good performance over movie review data.
The overall performance of all heuristics drops when we use data from Facebook, with the exception of the Opinion Lexicon dictionary, where performance on negative documents is very low for both types of data. The drop can be attributed to the different style in which negative comments are written on social websites like Facebook. Facebook users quite often write sarcastic posts. These posts are composed mostly of positive words, but they are negative in their true sense. For example, consider the Facebook post, "Someone takes blessings of his mother before filing nomination (the main agenda of the day. All other activities are just to support this main agenda). Starts from residence to file nomination with great fanfare. He is busy whole day doing everything except filing nomination. The day ends and the filing nomination is yet to be done. You decide how focused he (is) will be in his work. Does it give some clue on his working style? Ok. Remember the days when he was CM. He was doing everything (dharna, protest) except what he is supposed to do as a CM". This post contains more positive terms than negative terms. However, a human will easily judge that this post is a negative remark about the person under consideration. In addition, there are many negative posts on Facebook which, if presented to humans without the details of past events and the political or social inclination of the user, will be judged as positive comments. For example, consider the post "Tough competition between Sagarika & Shekhar to claim the position of "India's most Progressive Intellectual Sanctimonious Secular"". A reader without prior knowledge of the background of the names mentioned in this post would judge it to be a positive comment. However, in its real context the user posted it as a sarcastic comment. This aspect also opens a new research direction in sentiment analysis.
V. DISCUSSION, CONCLUSION AND FUTURE WORK
The growth of social media and the increasing participation of vocal users in expressing their views make sentiment analysis a crucial task for the success of businesses. Although there are several online tools for the analysis of sentiments, business houses cannot use them for privacy and security reasons. Using free sentiment analysis tools instead of costly commercial products makes good economic sense. Therefore, we have applied some heuristics over freely available tools to test their suitability with real-life data from social media. We have presented the results and highlighted the comparative performance of these tools for both negative and positive documents. This research identified the shortcomings of the existing resources and tools, and underlined the potential for further improvement of these tools instead of developing an entirely new set of tools. It also opens several new research directions.
There is considerable scope for research in sentiment analysis. As this paper highlights, there is an urgent need to conduct intensive studies to update sentiment dictionaries. There can be several ways to populate a sentiment dictionary. We recommend assigning dynamic weights to the terms in the dictionary instead of assigning fixed values. To achieve this, these tools can be redesigned to update the weights of the terms in the dictionaries dynamically with each usage of the tool.
Keeping track of the details of the users may require a lot of memory and computing resources, but it can help in deciphering the context, and as a result, the prediction of sentiments will be better. Research can be conducted to predict the behaviour of users based on the frequency and nature of their posts on social media. This will help in handling sarcastic posts in a better way. It will also help business houses to understand and serve users better by providing customized products and services to them.
We need to conduct research to develop a continuous scale for analysing sentiments instead of analysing them on a discrete scale or as simply positive or negative. The posts or feedbacks of users can range from extremely negative through neutral to extremely positive, defined over a continuous scale. This will help in assessing user sentiments more accurately.
Users from countries where English is not a native language post their comments using a combination of English and their native languages. For example, a typical user in India mixes Hindi (written in either Devanagari script or Roman script) and English when writing posts on social media. Intensive research should be conducted to integrate language detection and language translation tools with sentiment analysis tools for in-depth and accurate analysis of sentiments.
Finally, the effective utilization of the semantic web may significantly help research in sentiment analysis.