2016 Ann
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 2, Ver. V (Mar-Apr. 2016), PP 64-69
www.iosrjournals.org
DOI: 10.9790/0661-1802056469
Abstract : Sentiment analysis is the process of identifying whether the opinion expressed in a piece of text is positive,
negative or neutral. It is useful in social media monitoring, where it automatically characterizes the overall feeling or
mood of consumers toward a specific brand or company, as reflected in social media, and determines whether that brand
or company is viewed positively or negatively on the web. Sentiment analysis has been widely used to classify product
reviews and movie review ratings. This paper reviews machine learning-based approaches to sentiment analysis and brings
out the salient features of the techniques in place. The prominent techniques in machine learning-based sentiment
analysis include Naïve Bayes, Maximum Entropy, Support Vector Machine (SVM) and K-nearest neighbour classification.
Naïve Bayes has a very simple representation but does not allow for rich hypotheses, and its assumption of independence
between attributes is too constraining. Maximum Entropy estimates the probability distribution from data, but it performs
well only with dependent features. SVM may provide the right kernel, but it lacks a standardized way of dealing with
multi-class problems. To improve performance with respect to correlation and dependencies between variables, an approach
combining neural networks and fuzzy logic is often used.
Keywords - Machine Learning, Maximum Entropy, Naïve Bayes, Neural Network, Sentiment analysis, Support
Vector Machine.
I. INTRODUCTION
Sentiment mainly refers to feelings, emotions, opinions or attitudes. With the rapid growth of the World Wide Web,
people frequently express their sentiments over the internet through social media, blogs, ratings and reviews.
Because of this increase in textual data, there is a need to analyze how sentiments are expressed and to derive
insights for business. Business owners and advertising companies often employ sentiment analysis to plan new
business strategies and advertising campaigns.
Sentiment analysis can be used in different fields for various purposes. For example, in online commerce,
sentiment analysis is extensively incorporated in e-commerce activities. Websites allow their users to record
their shopping experience and their views on product quality. They provide a summary of the product and of its
different features by assigning ratings or scores. Customers can easily view opinions and recommendation
information on the whole product as well as on specific product features. Voice-of-the-Market (VOM) is about
determining what customers feel about the products or services of competitors. Voice-of-the-Customer
(VOC) is concerned with what individual customers are saying about products or services, which means analyzing
customer reviews and feedback. Brand Reputation Management (BRM) is concerned with managing
reputation in the market. Opinions from customers or any other parties can damage or strengthen the reputation of
a business.
Machine learning algorithms are very helpful for classifying and predicting whether a particular document has
positive or negative sentiment. Machine learning algorithms fall into two categories: supervised and
unsupervised. A supervised learning algorithm uses a labelled dataset, in which each document of the training
set is labelled with the appropriate sentiment, whereas unsupervised learning uses an unlabelled dataset, in
which the text is not labelled with sentiments.
This paper primarily focuses on applying supervised learning techniques to a labelled dataset. Sentiment
analysis is usually implemented at three levels, namely sentence level, document level and aspect level.
Document-level sentiment classification aims at classifying the entire document or topic as positive or negative.
Sentence-level sentiment classification considers the polarity of each individual sentence of a document, whereas
aspect-level sentiment classification first identifies the different aspects of a corpus and then, for each
document, calculates the polarity with respect to the obtained aspects [18].
Sentiment analysis plays an important role in opinion mining. It is generally used when consumers have to make
a decision or a choice regarding a product, taking into account its reputation, which is derived from the
opinions of others. Sentiment analysis can reveal what other people think about a product. In line with the
wisdom of the crowd, sentiment analysis gives indications and recommendations for the choice of a product. A
single global rating could change the perspective on that product. Another application of sentiment analysis is
for companies that want to know what customers think of their products. Sentiment analysis can also determine
which features are more important to customers. Knowing what people think provides numerous possibilities in the
human/machine interface domain. Determining the opinion of a customer on a product through sentiment analysis is
a non-trivial phase in analyzing business activities such as brand management and product planning. Figure 1
shows the general process flow.
II. BACKGROUND
Pang, Lee and Vaithyanathan [1] performed sentiment classification, categorizing sentiments as positive or
negative using three different machine learning algorithms: Naïve Bayes classification, Support Vector Machine,
and Maximum Entropy classification. These techniques are augmented with the use of n-grams. Their experiments
reveal that SVMs perform better than the Naïve Bayes technique.
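As an illustration of Naïve Bayes classification augmented with n-grams, the following is a minimal sketch in
Python. It is not the implementation used in [1]: the toy training sentences, the helper names (ngrams,
NaiveBayes) and the choice of add-one smoothing over unigram and bigram counts are assumptions made for this
example only.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all n-grams up to n_max (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for feat in ngrams(doc):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of log likelihoods (independence assumption)
            score = math.log(count / total_docs)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for feat in ngrams(doc):
                if feat in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled data, invented for illustration only.
train_docs = [
    "a great and moving film",
    "great acting and a good plot",
    "a boring and bad film",
    "bad acting and a dull plot",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)

print(model.predict("a good film with great acting"))
print(model.predict("a dull and boring plot"))
```

The sum of log likelihoods in predict is where the independence assumption enters: each n-gram contributes to the
score separately, regardless of which other n-grams occur in the document.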
Structured reviews are used for training, testing and identifying features. This is followed by scoring
methods to determine whether the reviews are positive or negative. NB and SVM classifiers are used to classify
sentences obtained from web searches that use the product name as the search condition. When operating on
individual sentences collected from web searches, performance is limited by noise and ambiguity. In the context
of a complete web-based tool, however, and helped by a simple method for grouping sentences into attributes, the
results are qualitatively quite useful [2]. Among the SVM, NB and ME classification techniques for sentiment,
Naïve Bayes has been found to achieve better performance than SVM [5].
K-nearest neighbour (kNN) classification is based on the assumption that the class of an instance is most
similar to the classes of other instances that are nearby in the vector space. Compared with other text
classification methods such as Naïve Bayes, kNN does not depend on prior probabilities and is computationally
efficient [6].
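The nearest-neighbour idea can be sketched as follows. This is an illustrative example only, not the
weight-adjusted variant of [6]: the bag-of-words vectors, the cosine similarity measure, the majority vote, the
value k = 3 and the toy training data are all assumptions made here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k most similar training documents."""
    q = vectorize(query)
    ranked = sorted(train, key=lambda pair: cosine(vectorize(pair[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled data, invented for illustration only.
train = [
    ("great film", "pos"),
    ("good plot great acting", "pos"),
    ("loved this film", "pos"),
    ("bad film", "neg"),
    ("dull plot bad acting", "neg"),
    ("hated this film", "neg"),
]

print(knn_predict(train, "great acting good film"))
print(knn_predict(train, "bad dull film"))
```

No model is fitted in advance and no class priors are estimated; all the work happens at query time, when the
query is compared with every training document.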
An approach based on artificial neural networks that divides documents into positive, negative and fuzzy tone
has been proposed by Zhu, Xu and Wang [7]. The approach uses a recursive least squares back-propagation
training algorithm. In the research, sentiment analysis was performed on a large data set of tweets using
Hadoop, and performance was measured in terms of speed and accuracy. The results show that the technique
handles large sentiment data sets more efficiently than small ones [7].
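To make the back-propagation idea concrete, the following is a minimal sketch of a neural network with one
hidden layer, trained on binary word-presence vectors. It is not the method of [7]: it uses plain stochastic
gradient descent rather than the recursive least squares variant, it has two classes rather than three, and it
does not involve Hadoop. The vocabulary, the toy data, the network size and the learning rate are assumptions
chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Return hidden activations and the output probability of the positive class."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out


def train(X, y, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    n_in = len(X[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            hidden, out = forward((w1, b1, w2, b2), x)
            # Output error for a sigmoid unit with cross-entropy loss.
            d_out = out - target
            for j in range(n_hidden):
                # Propagate the error back through hidden unit j (using the old w2[j]).
                d_hidden = d_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2


# Hypothetical four-word vocabulary and toy labelled data (1 = positive, 0 = negative).
VOCAB = ["good", "great", "bad", "dull"]


def encode(text):
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in VOCAB]


texts = ["good great", "good", "great", "bad dull", "bad", "dull"]
y = [1, 1, 1, 0, 0, 0]
params = train([encode(t) for t in texts], y)

for t in texts:
    print(t, round(forward(params, encode(t))[1], 3))
```

An output close to 1 indicates positive sentiment and an output close to 0 indicates negative sentiment.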
Chen, Liu and Chiu have proposed a neural network based approach to classifying sentiment in blogospheres that
combines the advantages of the back-propagation neural network (BPN) and semantic orientation (SO) indexes.
Compared with traditional techniques such as BPN and SO indexes used alone, the proposed approach delivers more
accurate results. It is found to improve classification accuracy and also to reduce training time [8].
4.2 Pre-Processing
The preprocessing steps are as follows.
1) Stop word removal
2) Symbol removal
3) POS (Part Of Speech) tagging
Stanford POS tagging is used for our study.
This method finds the actual parts of speech using the English parser model.
POS tagging is applied to the input sentence, and only verbs, adverbs and adjectives are used.
It uses the standard Penn Treebank POS tag set.
For example: The movie was not quite good. After the removal of stop words, the output is [Movie,
not, quite, good]; after POS tagging, the result is [Movie/NN, not/RB, quite/JJ, good/JJ].
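The preprocessing steps above can be sketched as follows. This is a simplified illustration under stated
assumptions: the stop word list is a small hypothetical sample, symbols are stripped before stop words are
removed (so that punctuation attached to a word does not hide a stop word), tokens are lower-cased, and the POS
tags are supplied by hand from the example above rather than produced by the Stanford tagger. Only the final
filter relies on the Penn Treebank convention that verb, adverb and adjective tags begin with VB, RB and JJ.

```python
import re

# Hypothetical, deliberately small stop word list for illustration.
STOP_WORDS = {"the", "was", "a", "an", "is", "are", "of", "to"}


def remove_symbols(text):
    """Replace every character that is not a letter or whitespace with a space."""
    return re.sub(r"[^A-Za-z\s]", " ", text)


def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def preprocess(sentence):
    tokens = remove_symbols(sentence).lower().split()
    return remove_stop_words(tokens)


def keep_opinion_words(tagged_tokens):
    """Keep only verbs (VB*), adverbs (RB*) and adjectives (JJ*)."""
    return [word for word, tag in tagged_tokens if tag.startswith(("VB", "RB", "JJ"))]


print(preprocess("The movie was not quite good."))

# Tags copied from the worked example above, not computed here.
tagged = [("Movie", "NN"), ("not", "RB"), ("quite", "JJ"), ("good", "JJ")]
print(keep_opinion_words(tagged))
```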
V. CONCLUSION
Applying sentiment analysis to mine the large amount of unstructured data has become an important research
problem. Business organizations and individuals are now putting effort into finding the best system for
sentiment analysis. Some algorithms used in sentiment analysis give good results, but no technique can resolve
all the challenges. Most researchers report that Support Vector Machines (SVM) have higher accuracy than other
algorithms, but they also have limitations. To overcome the limitations of some techniques, our study focuses on
machine learning approaches and the use of artificial neural networks (ANN) in sentiment classification and
analysis. Our study suggests that ANN implementations would result in improved classification, combining the
best of artificial neural networks with fuzzy logic.
References
[1] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of
the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics,
2002, pp.79–86.
[2] K. Dave, S. Lawrence, and D. M. Pennock, “Mining the peanut gallery: Opinion extraction and semantic classification of product
reviews,” in Proceedings of the 12th international conference on World Wide Web. ACM, 2003, pp. 519–528.
[3] Neethu M. S. and Rajasree R., “Sentiment Analysis in Twitter using Machine Learning Techniques,” 4th ICCCNT 2013, July 4-6,
2013, Tiruchengode, India.
[4] S. B. Kotsiantis, I. D. Zaharakis and P. E. Pintelas, “Machine Learning: a review of classification and combining
techniques,” Springer, 2006, pp. 159–190.
[5] G. Vinodhini and RM. Chandrasekaran, “Sentiment Analysis and Opinion Mining: A survey,” International Journal of Advanced
Research in Computer Science and Software Engineering, 2012, pp. 282–291.
[6] Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, “Text Categorization Using Weight Adjusted k-Nearest Neighbour
Classification,” 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2001, pp. 53–65.
[7] Zhu Jian, Xu Chen, and Wang Han-shi, “Sentiment classification using the theory of ANNs,” The Journal of China Universities
of Posts and Telecommunications, July 2010, 17(Suppl.), pp. 58–62.
[8] Long-Sheng Chen, Cheng-Hsiang Liu and Hui-Ju Chiu, “A neural network based approach for sentiment classification in the
blogosphere”, Journal of Informetrics 5 (2011) 313–322.
[9] Vikrant Hole and Mukta Takalikar, “A Survey on Sentiment Analysis And Summarization For Prediction”, IJECS Volume 3 Issue
12 December, 2014 Page No.9503-9506
[10] Geetika Gautam and Divakar Yadav, “Sentiment Analysis of Twitter Data Using Machine Learning Approaches and Semantic
Analysis”, 978-1-4799-5173-4/14/$31.00 ©2014 IEEE.
[11] Neha S. Joshi and Suhasini A. Itkat, “A Survey on Feature Level Sentiment Analysis”, (IJCSIT) International Journal of Computer
Science and Information Technologies, Vol. 5 (4) , 2014, 5422-5425
[12] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and Trends in Information Retrieval 2(1-2), 2008, pp. 1–
135.
[13] A. Abbasi, H. Chen and A. Salem, “Sentiment analysis in multiple languages: Feature selection for opinion classification in web
forums,” In ACM Transactions on Information Systems, vol. 26 Issue 3, pp. 1-34, 2008.
[14] P. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews”, Proceedings of the
Association for Computational Linguistics (ACL), 2002, pp. 417–424.
[15] A. Harb, M. Plantié, G. Dray, M. Roche, F. Trousset and P. Poncelet, “Web opinion mining: how to extract opinions from
blogs?”, presented at the Proceedings of the 5th international conference on Soft computing as trans-disciplinary science and
technology, Cergy-Pontoise, France, 2008.
[16] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu, “Combining Lexicon-based and Learning-based Methods for Twitter Sentiment
Analysis”, Technical report, HP Laboratories, 2011.
[17] Ji Fang and Bi Chen, “Incorporating Lexicon Knowledge into SVM Learning to Improve Sentiment Classification”, In Proceedings
of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP), pages 94–100, 2011.
[18] Abinash Tripathy, Ankit Agrawal and Santanu Kumar Rath, “Classification of Sentimental Reviews Using Machine Learning
Techniques”, 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015).