Text Mining With R PDF
11/4/2019
Assumptions:
The more addressees in a tweet, the harsher its words
Longer tweets are also less likely to contain favorable language
Sources:
https://www.kaggle.com/mrisdal/d/crowdflower/twitter-airline-sentiment/exploring-audience-text-length
https://www.kaggle.com/solegalli/d/crowdflower/twitter-airline-sentiment/airline-sentiment-part-1/notebook
https://www.kaggle.com/solegalli/d/crowdflower/twitter-airline-sentiment/airline-sentiment-part-2
R packages needed:
See Twitter.Visualization.USAirlines.R
Required R packages
install.packages("readr")
install.packages("ggplot2")
install.packages("ggthemes")
install.packages("dplyr")
install.packages("stringr")
install.packages("gridExtra")
install.packages("tm")
install.packages("SnowballC")
install.packages("wordcloud")
install.packages("fpc")
install.packages("cluster")
install.packages("maps")
str(tweets)
Source: https://www.kaggle.com/mrisdal/d/crowdflower/twitter-airline-sentiment/exploring-audience-text-length
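The slides do not show how `tweets` is loaded before `str(tweets)` is called. A minimal sketch, assuming the data is read from a CSV file with readr: the temporary file, its two rows, and the column subset below are invented for illustration and stand in for the real dataset.

```r
library(readr)

# Write a tiny stand-in CSV (hypothetical rows, not the real data).
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "tweet_id,airline_sentiment,airline,text",
  "1,negative,United,\"@united my bag is lost\"",
  "2,positive,Delta,\"@Delta great crew today\""
), demo_path)

# Read it back; col_types = cols() lets readr guess the types quietly.
tweets_demo <- read_csv(demo_path, col_types = cols())

# Inspect column names, types and the first values.
str(tweets_demo)
```

With the real file, `demo_path` would be replaced by the path to the downloaded CSV.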
The stringr package is used to count the number of @ symbols in each tweet. If
a tweet contains only one, it is presumably the airline's handle.
The same package is used to count the number of characters in each tweet,
where the maximum length should be 170.
Source: https://www.kaggle.com/mrisdal/d/crowdflower/twitter-airline-sentiment/exploring-audience-text-length
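The two counts described above can be sketched with `str_count()` and `str_length()` from stringr. The example tweets and the column names `text`, `at_count` and `text_length` are assumptions made for illustration; the slides only name the derived column `at_countD`.

```r
library(stringr)

# Toy stand-in for the real data (invented tweets).
tweets_demo <- data.frame(
  text = c(
    "@united thanks for the upgrade!",
    "@AmericanAir @USAirways two hours late, again",
    "@JetBlue @united @SouthwestAir who flies to Austin?"
  ),
  stringsAsFactors = FALSE
)

# Number of @ symbols per tweet.
tweets_demo$at_count <- str_count(tweets_demo$text, "@")

# Number of characters per tweet.
tweets_demo$text_length <- str_length(tweets_demo$text)

tweets_demo[, c("at_count", "text_length")]
```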
# Show counts
table(tweets$at_countD)
##
##     1     2    3+
## 12995  1420   225
Source: https://www.kaggle.com/mrisdal/d/crowdflower/twitter-airline-sentiment/exploring-audience-text-length
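The slides do not show how the grouped variable `at_countD` is built from the raw @ counts. One possible way, shown here as a sketch on invented counts, is to bin them with base R's `cut()` into the three groups 1, 2 and 3+.

```r
# Hypothetical raw @ counts for six tweets.
at_count <- c(1, 2, 3, 1, 5, 2)

# Bin into "1", "2" and "3+"; intervals are (0,1], (1,2], (2,Inf].
at_countD <- cut(
  at_count,
  breaks = c(0, 1, 2, Inf),
  labels = c("1", "2", "3+")
)

# Show counts per group.
table(at_countD)
```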
Tweets containing 1, 2, and 3+ @ symbols have roughly the same proportion of
positive tweets, but as the number of @ symbols rises, the share of negative
tweets goes down and the share of neutral tweets goes up.
This is probably because the share of text that is useful for sentiment analysis
decreases as the number of addressees increases, resulting in greater
uncertainty and therefore more neutral labels.
Source: https://www.kaggle.com/mrisdal/d/crowdflower/twitter-airline-sentiment/exploring-audience-text-length
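The comparison of sentiment proportions across @ count groups can be reproduced with a row-normalized contingency table. This is a sketch on invented rows; the column name `airline_sentiment` is assumed, and the numbers below say nothing about the real data.

```r
# Toy data: @ count group and sentiment label per tweet (invented).
demo <- data.frame(
  at_countD = c("1", "1", "1", "1", "2", "2", "3+", "3+"),
  airline_sentiment = c("negative", "negative", "negative", "positive",
                        "negative", "neutral", "neutral", "positive"),
  stringsAsFactors = FALSE
)

# Counts of each sentiment within each @ count group.
counts <- table(demo$at_countD, demo$airline_sentiment)

# margin = 1 normalizes each row, so every group sums to 1.
props <- prop.table(counts, margin = 1)
round(props, 2)
```

Normalizing by row rather than by the whole table matters here because the groups differ greatly in size, so raw counts would hide the shift from negative to neutral.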
Source: https://www.kaggle.com/solegalli/d/crowdflower/twitter-airline-sentiment/airline-sentiment-part-1/notebook
Conclusions
Most tweets have negative sentiment (>60%).
Most tweets are directed at United, followed by American and US Airways.
Most of the tweets directed at American, United and US Airways contain negative
sentiment.
Tweets directed at Delta, Virgin and Southwest contain roughly the same proportions
of negative, neutral and positive sentiment.
The main reasons for negative sentiment are Customer Service Issues and Late Flights.
Negative tweets about Delta are based mostly on late flights, and less on
Customer Service Issues than for the other airlines.
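Conclusions of this kind can be checked with two small tables: sentiment share per airline, and negative reasons per airline. The sketch below uses invented rows and assumes the columns are named `airline`, `airline_sentiment` and `negativereason`; it is not the code from the cited notebooks.

```r
# Toy data (invented), one row per tweet.
demo <- data.frame(
  airline = c("United", "United", "United", "Delta", "Delta", "Delta"),
  airline_sentiment = c("negative", "negative", "neutral",
                        "negative", "neutral", "positive"),
  negativereason = c("Customer Service Issue", "Late Flight", NA,
                     "Late Flight", NA, NA),
  stringsAsFactors = FALSE
)

# Share of each sentiment within each airline (rows sum to 1).
sentiment_share <- prop.table(
  table(demo$airline, demo$airline_sentiment),
  margin = 1
)

# Counts of each stated reason among negative tweets, per airline.
negative <- demo[demo$airline_sentiment == "negative", ]
reason_counts <- table(negative$airline, negative$negativereason)

round(sentiment_share, 2)
reason_counts
```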
Source: https://www.kaggle.com/solegalli/d/crowdflower/twitter-airline-sentiment/airline-sentiment-part-2