Sentiment Analysis of Survey Comments: Animesh Tilak
“SENTIMENT ANALYSIS OF SURVEY COMMENTS”
BY
ANIMESH TILAK
(CSE)
ABSTRACT
Introduction
The volume of unstructured data grows every day, and it must be classified
before meaningful insight can be drawn from it. Sentiment analysis can be
applied in many fields: analyzing product performance in the market,
training chatbots to respond with a specific sentiment, rating content such
as blogs, posts and videos, and summarizing stories. Sentiment analysis is
also used in page ranking systems for search engines.
1. Anaconda Navigator
2. Jupyter Notebook
1. Python 3.6
2. Numpy
3. Pandas
4. Sklearn
Project Planning & Implementations
Converting unstructured data into structured data
Survey comments are imported from a text file in which the comments are
line-separated and labeled as negative or positive. A dictionary is then
created from all the words that occur in the negative and positive comments.
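The import and dictionary steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the report does not give the file layout, so the tab-separated "label, then comment" line format, the helper names load_comments and build_dictionary, and the sample comments are all assumptions made for the example.

```python
import io
import re


def load_comments(handle):
    """Read 'label<TAB>comment' lines (assumed format); return (comments, labels)."""
    comments, labels = [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, text = line.split("\t", 1)
        comments.append(text)
        labels.append(1 if label == "positive" else 0)  # 1 = positive, 0 = negative
    return comments, labels


def build_dictionary(comments):
    """Collect every distinct lowercase word across all comments."""
    words = set()
    for text in comments:
        words.update(re.findall(r"[a-z']+", text.lower()))
    return sorted(words)


# In-memory stand-in for the survey text file (invented sample data).
sample_file = io.StringIO(
    "positive\tThe course was great\n"
    "negative\tThe course was too long\n"
)
comments, labels = load_comments(sample_file)
dictionary = build_dictionary(comments)
```

With a real file, the StringIO object would be replaced by an opened file handle; the rest of the sketch stays the same.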
Using Count-Vectorizer
A Count-Vectorizer object is initialized so that it removes English stop
words, and it is then fed the created dictionary. The Count-Vectorizer
object assigns a unique index to each remaining word in the dictionary.
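A minimal sketch of this step with Sklearn's CountVectorizer is given below. The sample comments are invented, and fitting directly on the raw comments (rather than on a separately built word list) is an assumption about how the dictionary is fed in.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented sample comments standing in for the survey data.
comments = [
    "The training was very helpful",
    "The training was boring and slow",
]

# stop_words="english" drops common English words such as "the" and "was".
vectorizer = CountVectorizer(stop_words="english")
vectorizer.fit(comments)

# vocabulary_ maps each remaining word to its unique column index.
for word, index in sorted(vectorizer.vocabulary_.items(), key=lambda kv: kv[1]):
    print(index, word)
```

The indices are assigned in alphabetical order of the remaining words, so the same training text always yields the same mapping.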
Single-line comments are then passed as a parameter to the Count-Vectorizer
object, which converts each comment from unstructured English text into a
1-D vector with one position per dictionary word. If a comment contains a
word, the position at the index assigned to that word by the
Count-Vectorizer object holds the frequency of that word in the comment;
every other position is 0. For short comments the vector therefore consists
mostly of 0’s and 1’s.
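The conversion of one comment into a count vector might look like the sketch below. The training phrases and the comment are invented for illustration; only the CountVectorizer calls reflect the library itself.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Fit on a tiny invented corpus so that every word has an index.
vectorizer = CountVectorizer(stop_words="english")
vectorizer.fit(["excellent lecture excellent teacher", "boring lecture"])

# transform expects a list of documents, so the single comment is wrapped in a list.
# toarray() turns the sparse result into a dense array; [0] selects the 1-D row.
vector = vectorizer.transform(["excellent excellent teacher"]).toarray()[0]

# Each position holds the frequency of the word that owns that index.
excellent_count = vector[vectorizer.vocabulary_["excellent"]]
teacher_count = vector[vectorizer.vocabulary_["teacher"]]
boring_count = vector[vectorizer.vocabulary_["boring"]]
```

Here "excellent" occurs twice, so its position holds 2 rather than 1, which is why the vector is a count vector and not strictly binary.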
K-NEAREST NEIGHBORS
K-Nearest Neighbors is one of the most basic yet essential classification
algorithms in Machine Learning. It belongs to the supervised learning
domain and is widely applied in pattern recognition, data mining and
intrusion detection. It classifies a new sample by a majority vote among
the K training samples closest to it.
It is widely applicable in real-life scenarios because it is non-parametric:
it does not make any underlying assumptions about the distribution of the
data (as opposed to some other algorithms, which assume, for example, a
Gaussian distribution of the given data).
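As an illustration, K-Nearest Neighbors can be applied to count vectors with Sklearn's KNeighborsClassifier. The sketch below is not the project's training code: the six training comments are invented, and n_neighbors=3 is an assumed value, since the report does not state the K that was used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Invented training comments; 1 = positive, 0 = negative.
train_comments = [
    "excellent helpful session",
    "helpful and excellent trainer",
    "excellent content",
    "boring slow session",
    "slow and boring trainer",
    "boring content",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Turn the comments into count vectors.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_comments)

# K = 3 is an assumption made for this sketch.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, train_labels)

# New comments must be vectorized with the same fitted vectorizer.
new_comments = ["excellent helpful session", "boring slow trainer"]
predictions = knn.predict(vectorizer.transform(new_comments))
```

No model parameters are learned during fit; the classifier stores the training vectors and compares each new vector against them at prediction time.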
ACCURACY SCORE
To measure the performance of each classifier we use the accuracy score,
which is calculated by comparing the predicted labels with the actual
labels: it is the fraction of comments whose predicted label matches the
actual label.
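The calculation can be sketched both by hand and with Sklearn's accuracy_score. The label lists below are invented for the example and are not results from the project.

```python
from sklearn.metrics import accuracy_score

# Invented labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]

# Accuracy = number of matching labels / total number of labels.
matches = sum(a == p for a, p in zip(actual, predicted))
manual_accuracy = matches / len(actual)

# The library function performs the same comparison.
library_accuracy = accuracy_score(actual, predicted)
```

Four of the five predictions match, so both values come out to 0.8.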
DATA PRE-PROCESSING
MODEL TRAINING
MODEL TRAINING & ACCURACY SCORE
Conclusion and Future Scope
Sentiment analysis works better when the unstructured data is first
converted into structured data, because machine learning models handle
numerical data better than categorical or language data. After applying
different classifiers, it is observed that Logistic Regression, Multinomial
Naive Bayes and Support Vector Machine perform very well in classifying
data into two classes. An accuracy of 75% is a good result given that the
dataset was small: it is hard for classifiers to classify well when only a
small data set is given for training.