Sentiment Analysis Report
Elections 2016
Operations Research Project
Introduction
People's opinions change over time. Being able to detect this change
accurately enables us to analyse its potential causes. It is of great
importance to know the main reasons for the dynamics behind the general
public's opinion: for example, whether a change of sentiment is driven by
how an event is represented in the mass media or by the actual event
itself, how influential different networks are in the distribution of
news, and whether some of them present biased information. Being able to
provide quantitative analysis for these questions will help us gain deeper
insight into the system of social information transfer within a society.
I therefore decided to study Twitter as a representative of the mass
audience, and used the text of tweets to find the crowd's dominant
sentiment.
of public opinion than surveys or focus groups do, because the data
is created by the customer.
Abstract
Elections empower citizens to choose their leaders. They give everyone an
opportunity for an equal voice and representation in our government.
Democracy is government for the people and by the people, which means
government leaders are determined by participation in elections. As we
approach the November 2016 US presidential election, public sentiment
towards the candidates will influence who becomes the future leader of the
USA. I am interested in how the public views the top election candidates,
namely Donald Trump, Hillary Clinton, Ted Cruz and Ben Carson. Feelings
towards candidates fluctuate quickly as interviews, debates, responses to
global events, and other issues come to the fore. To obtain a large,
diverse dataset of current public opinions on the candidates, I decided to
use Twitter. Twitter provides us with live access to opinions about the
election from across the globe. The project reports the percentage of
tweets expressing positive, negative and neutral sentiment, and presents a
word cloud for each candidate showing the words used about that candidate
during the period in which the tweets were extracted from Twitter. The
code for the project is written in Python and leans towards the machine
learning paradigm.
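The two outputs described above, sentiment percentages and the word counts behind a word cloud, can be illustrated with a minimal sketch. It is not the project's actual code: the function names `sentiment_shares` and `word_frequencies`, the stopword set, and the sample labels and tweets are all assumptions made up for illustration, and the sketch assumes each tweet has already been given a sentiment label.

```python
import re
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of each sentiment label in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def word_frequencies(tweets, stopwords=frozenset()):
    """Count lowercase word tokens across tweets, skipping stopwords.

    These counts are the kind of input a word cloud is drawn from.
    """
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


# Hypothetical labels and tweets, invented purely for illustration.
labels = ["positive", "negative", "neutral", "positive"]
shares = sentiment_shares(labels)

tweets = ["Great debate tonight", "The debate was a great letdown"]
freqs = word_frequencies(tweets, stopwords={"the", "was", "a"})
```

Here `shares` maps each label to its share of all labelled tweets, and `freqs.most_common(n)` would give the `n` most frequent words for one candidate's tweets.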
In the project, the input data comprises around four thousand tweets in
all. Data collection was the most important step, because the data must be
cleaned before it can be analysed and studied further. The text of the
collected tweets was used to study people's sentiments towards the
presidential candidates. Because the data set was large, it was important
to split it into training and test data. The model was trained on the
training data and then evaluated on the test data. A random 80% of the
data was selected for the training set and the remaining 20% for the test
set. The tweets obtained from Twitter are compiled in CSV format (a file
that can be opened in Excel) and then loaded into Python using various
open source libraries.
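The loading and splitting steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the report does not name the libraries it used, so the sketch relies only on Python's standard `csv` and `random` modules, and the column names, the helper names `load_tweets` and `train_test_split`, and the in-memory sample data are hypothetical.

```python
import csv
import io
import random


def load_tweets(csv_file):
    """Read rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows` and split it into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# In-memory stand-in for the tweet CSV file. The real column names are not
# given in the report, so "candidate" and "text" are assumptions.
sample = io.StringIO(
    "candidate,text\n"
    + "".join(f"candidate_a,tweet {i}\n" for i in range(10))
)
rows = load_tweets(sample)
train, test = train_test_split(rows)
```

With a real file, `load_tweets` would be called on a handle opened with `open(path, newline="", encoding="utf-8")`. Fixing the seed keeps the random 80/20 split reproducible between runs.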
Literature Review
The following were read to get an idea of the research going on in this
area:
1. Probabilistic relational models
2. Sentiment Analysis in Social Networks, by Federico Alberto Pozzi et al.
3. Bayesian networks and the naive Bayes classifier