Hate Speech On Social Media

The paper proposes an approach to automatically detect hate expressions on Twitter. It collects unigrams and patterns from a training set to later use as features along with others to train a machine learning algorithm for classification.

Hate speech refers to the use of aggressive, violent or offensive language targeting a specific group. With the growth of social media, hate speech has become a serious problem. The paper looks at detecting hate speech automatically on Twitter to help filter hateful content.

Received December 25, 2017, accepted January 31, 2018, date of publication February 15, 2018, date of current version March 28, 2018.


Digital Object Identifier 10.1109/ACCESS.2018.2806394

Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection
HAJIME WATANABE, MONDHER BOUAZIZI, AND TOMOAKI OHTSUKI
Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan
Corresponding author: Mondher Bouazizi ([email protected])

ABSTRACT With the rapid growth of social networks and microblogging websites, communication between
people from different cultural and psychological backgrounds has become more direct, resulting in more
and more ‘‘cyber’’ conflicts between these people. Consequently, hate speech is used more and more, to the
point where it has become a serious problem invading these open spaces. Hate speech refers to the use of
aggressive, violent or offensive language, targeting a specific group of people sharing a common property,
whether this property is their gender (i.e., sexism), their ethnic group or race (i.e., racism) or their beliefs
and religion. While most of the online social networks and microblogging websites forbid the use of hate
speech, the size of these networks and websites makes it almost impossible to control all of their content.
Therefore, the need arises to detect such speech automatically and filter any content that presents hateful
language or language inciting to hatred. In this paper, we propose an approach to detect hate expressions on
Twitter. Our approach is based on unigrams and patterns that are automatically collected from the training set.
These patterns and unigrams are later used, among others, as features to train a machine learning algorithm.
Our experiments on a test set composed of 2010 tweets show that our approach reaches an accuracy equal
to 87.4% on detecting whether a tweet is offensive or not (binary classification), and an accuracy equal
to 78.4% on detecting whether a tweet is hateful, offensive, or clean (ternary classification).

INDEX TERMS Twitter, hate speech, machine learning, sentiment analysis.

I. INTRODUCTION

Online social networks (OSN) and microblogging websites are attracting internet users more than any other kind of website. Services such as those offered by Twitter, Facebook and Instagram are more and more popular among people from different backgrounds, cultures and interests. Their contents are rapidly growing, constituting a very interesting example of the so-called big data. Big data have been attracting the attention of researchers, who have been interested in the automatic analysis of people's opinions and the structure/distribution of users in the networks, etc.

While these websites offer an open space for people to discuss and share thoughts and opinions, their nature and the huge number of posts, comments and messages exchanged make it almost impossible to control their content. Furthermore, given the different backgrounds, cultures and beliefs, many people tend to use aggressive and hateful language when discussing with people who do not share the same backgrounds. King and Sutton [1] reported that 481 hate crimes with an anti-Islamic motive occurred in the year following 9/11, 58% of which were perpetrated within two weeks after the event. However, nowadays, with the rapid growth of OSN, more conflicts are taking place, following each big event or other.

Nevertheless, while the censorship of content remains a controversial topic with people divided into two groups, one supporting it and one opposing it [2], in OSN, such language still exists. It is even easier to spread among young people as well as older ones than other “cleaner” speeches.

For these reasons, Burnap and Williams [3] claimed that collecting and analyzing temporal data allows decision makers to study the escalation of hate crimes following “trigger” events. However, “official” information regarding such events is scarce given that hate crimes are often unreported to the police. Social networks in this context present a better and richer, yet less reliable and noisier, source of information.

To overcome this noise and the non-reliability of data, we propose in this work an efficient way to detect both offensive posts and hate speech on Twitter. Our approach relies

2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018, page 13825
H. Watanabe et al.: Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions

on writing patterns and unigrams, along with sentimental features, to perform the detection.

The remainder of this paper is structured as follows: in Section II we present our motivations and describe some of the related work. In Section III we formally define the aim of our work and describe in detail our proposed method for hate speech detection and how features are extracted. In Section IV we detail and discuss our experimental results. Section V concludes this paper and proposes possible directions for future work.

II. MOTIVATIONS AND RELATED WORK

A. MOTIVATIONS

Hate speech is a particular form of offensive language where the person using it bases his opinion either on a segregative, racist or extremist background or on stereotypes. Merriam-Webster (http://www.merriam-webster.com/dictionary/) defines hate speech as a “speech expressing hatred of a particular group of people.” From a legal perspective, it defines it as a “speech that is intended to insult, offend, or intimidate a person because of some trait (as race, religion, sexual orientation, national origin, or disability).” This being the case, hate speech is considered a world-wide problem that many countries and organizations have been standing up against. With the spread of the internet and the growth of online social networks, this problem becomes even more serious, since the interactions between people have become indirect, and people's speech tends to be more aggressive when they feel physically safer, not to mention that many hate groups see the internet as an “unprecedented means of communication of recruiting” [2].

In the context of the internet and social networks, not only does hate speech create tension between groups of people, its impact can also influence businesses or start serious real-life conflicts. For such reasons, websites such as Facebook, Youtube and Twitter prohibit the use of hate speech. However, it is always difficult to control and filter all the contents. Therefore, in the research field, hate speech has been the subject of some studies trying to detect it automatically. Most of these works on hate speech detection have goals such as the construction of dictionaries of hate words and expressions [4] or the binary classification into “hate” and “non-hate” [5]. However, it is always difficult to clearly decide whether a sentence contains hate or not, in particular if the hate speech is hiding behind sarcasm or if no clear words showing hate, racism or stereotyping exist.

Furthermore, OSN are full of ironic and joking content that might sound racist, segregative or offensive, but which in reality is not. An example is given in the following two tweets:
• “Hey dummy. It has been a while since we last read one of your useless comments.”
• “If we want the opinion of a WOMAN, we'll ask you dear... For now keep quiet.”

The first tweet sounds offensive and demeaning to the person it targets. However, given the mutual follow of both users, the tweet is actually a joke between two friends. The second presents the same problem: even though the user seems to be offending women, given the context of the message (i.e., a small discussion between a group of friends), the tweet in itself was not posted to offend women, or even the person targeted by the tweet.

Such expressions, and others that include a reference to a particular gender, race, ethnic group or religion, are widely used in a joking context, and have to be clearly distinguished from hate speech. Therefore, the use of dictionaries, and n-grams in general, might not be the optimal option to distinguish between expressions showing hate and those that do not.

It is arguable that sentiment analysis techniques can be used to perform hate speech detection. However, this is a different task, which requires more sophisticated techniques. In sentiment analysis, the main task is the detection of the sentiment polarity of the tweet, which goes back to the detection of any existing positive/negative word or expression. This makes it easy to rely on the direct meaning of words: words usually have the same sentiment polarity regardless of the context or the actual meaning, with very few exceptions (e.g., the word “bad” cannot be interpreted, under any circumstance, in a positive way). However, in the case of hate speech, some words might be negative, might even have the meaning of hate, but the context makes them not hate speech-related. This can be seen in the following two examples:
- “I hate seeing them losing every time! It's just unfair!”: Even though the word “hate” has been employed here, the given sentence does not fall under the category of hate speech, simply because the context is not one of offending a person, let alone offending him for his gender, race, etc.
- “I hate these neggers, they keep making life much painful”: This is obviously hate speech towards a specific ethnic group.

This makes the task of hate speech detection quite different from, and more challenging than, sentiment analysis: not only is it context-dependent, but also, we should not rely on simple words or even n-grams to detect it.

In a related context, writing patterns have proven to be effective in text classification tasks such as sarcasm detection [6], [7], multi-class sentiment analysis [8] or sentiment quantification [9]. The types of patterns, and the way they are built and extracted, depend on the application. Therefore, during this work, we try to extract patterns of hate speech and offensive texts using a pragmatic approach, and use these, along with other features, to detect hate speech in short text messages on Twitter.

Therefore, in this work, we propose different sets of features including writing patterns and hate speech unigrams. We use these features together to perform the classification of texts collected from Twitter (i.e., tweets) into three classes


we refer to as “Clean,” “Offensive” and “Hateful.” Further description of the different classes will be given in the next section.

The main contributions of this paper are as follows:
1) We propose a pattern-based approach to detect hate speech on Twitter: patterns are extracted in a pragmatic way from the training set and we define a set of parameters to optimize the collection of patterns.
2) In addition to patterns, we propose an approach that collects, also in a pragmatic way, words and expressions showing hate and offense, and use them with patterns, along with other sentiment-based features, to detect hate speech.
3) The proposed sets of unigrams and patterns can be used as already-built dictionaries for future works related to hate speech detection.
4) We classify tweets into three different classes (instead of only two) where we make a distinction between tweets showing hate and those being just offensive.

B. RELATED WORK

The analysis of subjective language on OSN has been deeply studied and applied in different fields varying from sentiment analysis [10]–[12] to sarcasm detection [6], [7] or detection of rumors [13], etc. However, relatively fewer works (compared to the aforementioned topics) have addressed hate speech detection. Some of these works targeted sentences in the world wide web, such as the work of Warner and Hirschberg [5] and Djuric et al. [14]. The first work reached a classification accuracy equal to 94% with an F1 score equal to 63.75% in the task of binary classification, and the second reached an accuracy equal to 80%.

Gitari et al. [15] extracted sentences from some major “hate sites” in the United States. They annotated each of the sentences into one of three classes: “strongly hateful (SH),” “weakly hateful (WH),” and “non-hateful (NH).” They used semantic features and grammatical pattern features, ran the classification on a test set and obtained an F1-score equal to 65.12%.

Nobata et al. [16] used lexicon features, n-gram features, linguistic features, syntactic features, pretrained features, “word2vec” features and “comment2vec” features to perform the classification task into two classes, and obtained an accuracy equal to 90%.

Nevertheless, some other works targeted the detection of hateful sentences on Twitter. Kwok and Wang [17] targeted the detection of hateful tweets against black people. They used unigram features, which gave an accuracy equal to 76% for the task of binary classification. Obviously, the focus on hate speech toward a specific gender, ethnic group, race or other makes the collected unigrams related to that specific group. Therefore, the built dictionary of unigrams cannot be reused to detect hate speech towards other groups with the same efficiency. Burnap and Williams [3] used typed dependencies (i.e., the relation between words) along with bag of words (BoW) features to distinguish hate speech utterances from clean speech ones.

III. PROPOSED APPROACH

Given a set of tweets, the aim of this work is to classify each of them into one of three classes, which are:
• Clean: this class consists of tweets which are neutral, non-offensive and present no hate speech.
• Offensive: this class contains tweets that are offensive, but do not present any hate or segregative/racist speech.
• Hateful: this class includes tweets which are offensive, and present hate, racist and segregative words and expressions.

We use machine learning to perform the classification: we extract a set of features from each tweet, we refer to a training set and perform the classification.

A. DATA

For the sake of this work, we have collected and combined 3 different data sets:
• A first data set publicly available on Crowdflower (https://www.crowdflower.com/data-for-everyone/): this data set contains more than 14 000 tweets that have been manually classified into one of the following classes: “Hateful,” “Offensive” and “Clean.” All the tweets in this data set have been manually annotated by three people.
• A second data set, also publicly available on Crowdflower (https://data.world/crowdflower/hate-speech-identification), which has been used previously in [19] and which has also been manually annotated into one of the three classes: “Hateful,” “Offensive” and “Neither,” the last referring to the “Clean” class mentioned previously.
• A third data set, which has been published on github (https://github.com/ZeerakW/hatespeech) and used in the work [18]: tweets in this data set are classified into one of the following three classes: “Sexism,” “Racism” and “Neither.” Since the first two (“Sexism,” “Racism”) refer to specific forms of hate speech, they have been included as a part of the class “Hateful,” whereas the tweets of the class “Neither” have been discarded because there is no indication whether they are clean or offensive (several tweets were manually checked, and they have been identified as belonging to both classes).

As stated above, the three data sets were combined to make a bigger data set, which we split as described later in this section.

To perform the task of classification, the data set is split into three subsets as follows:
• A training set: this set contains 21 000 tweets, distributed evenly among the three classes (i.e., “Clean,” “Offensive” and “Hateful”): each class has 7 000 tweets.


This set will be referred to as the “training set” in the rest of this work.
• A test set: this set contains 2 010 tweets: each class has 670 tweets. This set will be referred to as the “test set” and will be used to optimize our proposed approach.
• A validation set: this set contains 2 010 tweets: each class has 670 tweets. This set will be referred to as the “validation set” and will be used to evaluate our proposed approach.

To get a fair result, we use the same number of tweets for each class. Given that the number of tweets in the “Hateful” class was 8 340, the lowest among the three classes, we set the number of training tweets for each class to 7 000, that of the test tweets to 670 and that of the validation tweets to 670.

B. DATA PRE-PROCESSING

In this section, we briefly describe how the tweets were preprocessed. Fig. 1 shows the different steps done during this phase.

FIGURE 1. Pre-processing phases of the tweets.

In a first step, we clean up the tweets. This includes the removal of URLs (which start either with “http://” or “https://”), tags (i.e., “@user”) and irrelevant expressions (words written in languages that are not supported by ANSI coding). This is because these do not add any information on whether the tweet might express hate or not. In particular, for the case of tags, if the relationship between the author of the tweet and the person tagged is known, this information might be valuable. However, since no background is given regarding the author and the tagged person, we believe that the use of tags is not useful for our work.

The second step consists of the tokenization, Part-of-Speech (PoS) tagging, and lemmatization (using both tokens and PoS tags) of the different words. For this sake, we used OpenNLP (https://opennlp.apache.org) to perform the Natural Language Processing (NLP) tasks of tokenization and lemmatization. However, to perform the PoS tagging, we rely on the Gate Twitter PoS Tagger [20]. This is because OpenNLP performs poorly on PoS tagging of informal and noisy texts such as tweets.

Afterwards, we generate what we qualify as a negation vector: we detect the position of negation words (e.g., “not,” “never,” etc.) and detect the coverage of these words. The approach we used is quite simple and inspired by the work of Das and Chen [21]: basically, a negation word covers all the words that follow it until the next punctuation mark or the occurrence of a contrast word (e.g., “but,” “however,” etc.). Words covered by a negation word are given a negation score equal to −1 while the rest of the words are given a score equal to 1. This will be used later in the count of positive and negative words: a positive word (negative word) having a negation score equal to −1 will be considered as a negative word (positive word), and it is attributed the opposite of its original score (this will be explained in the next subsection).

In a separate step, we extract all the hashtags, and use a small tool we developed to decompose each of them into the words that compose it (e.g., the hashtag “#ihateyou” will give the expression “I hate you”). The resulting expressions are kept aside to be used when needed.

C. FEATURES EXTRACTION

In this subsection, we describe how the features that we will use later to perform the classification are extracted from the tweets. However, we first explain the choice of our sets of features.

Hate is basically a sentiment among others, a negative sentiment to be precise. Therefore, we believe that the sentiment polarity of the tweet is an important indicator of whether or not it can be a potentially hateful tweet.

In addition, punctuation marks and the use of all-capitalized words can significantly change the meaning of the tweet, or make explicit some intention hidden in a text. Therefore, such features need to be extracted along with sentiment features to detect hate.

However, hate manifests mainly in the words and expressions a person uses. Therefore, the content of the words itself is even more important than the aforementioned features. For this, we extract from the training set, in a pragmatic way, a set of words (to which we refer as unigrams) and expressions (to which we refer as patterns) that are most likely to be related to hate, and use them as extra features for hate detection.

As explained earlier in this work (Section II-A), unlike sentiment analysis, it is not very useful to rely only on the sentiment polarity of the words to detect hate speech: not only do the words' meanings change according to the context, but also hate speech has different manifestations. Patterns, in such cases, are useful to detect longer hateful expressions. Therefore, we extract patterns referring to words, as well as part-of-speech tags, to make sure that we do not get exclusive patterns that apply to only very specific situations, but general ones that reflect hate regardless of the content. In other words, we make sure that an extracted expression that shows hate is a general one that applies to different contexts of hate.


This will be elaborated later in this work, when we set some parameters to make sure that a certain expression occurs enough times in a given class (i.e., it is not specific to a single case or scenario) and does not occur in the other classes (i.e., it is not a general expression that has nothing to do with that class).

To conclude, four main sets of features are extracted, which we qualify as “sentiment-based features,” “semantic features,” “unigram features,” and “pattern features.” By combining these sets, we believe it is possible to detect hate speech: “sentiment features” allow us to extract the polarity of the tweet, a very essential component of hate speech (given that hateful speeches are mostly negative ones). “Semantic features” allow us to find any emphasized expression. “Unigram features” allow us to detect any explicit form of hate speech, whereas patterns allow the identification of any longer or implicit forms of hate speech. In the rest of this subsection, we describe how these features are extracted.

1) SENTIMENT-BASED FEATURES

Although the task of detection of hate speech differs drastically from that of sentiment analysis and polarity detection, it still makes sense to use sentiment-based features as the most basic features that allow the detection of hate speech. This is because hate speech is most likely to be present in a “negative” tweet, rather than a “positive” one.

Consequently, we first extract features that would help to determine whether a tweet is positive, negative or neutral. As mentioned above, the detection of the polarity in itself is not the purpose of this work, but an extra step to facilitate the main task, which is the detection of hate speech.

Therefore, from each tweet $t$ we extract the following features:
• the total score of positive words ($PW$),
• the total score of negative words ($NW$),
• the ratio of emotional (positive and negative) words $\rho(t)$ defined as: $\rho(t) = \frac{PW - NW}{PW + NW}$; $\rho(t)$ is set to 0 if the tweet has no emotional words,
• the number of positive slang words,
• the number of negative slang words,
• the number of positive emoticons,
• the number of negative emoticons,
• the number of positive hashtags,
• the number of negative hashtags.

The total score of positive words and that of negative words are extracted using SentiStrength (http://sentistrength.wlv.ac.uk/), a tool that attributes sentiment scores to sentences as well as to the words of which they are composed. The scores range from -5 to -1 for negative words, and from 1 to 5 for positive words. Given a tweet $t$, we sum the scores of the individual words that have a positive polarity and attribute the obtained sum to the first feature; we do the same for the negative words and attribute the absolute value of the obtained sum to the second feature (i.e., both features take positive values).

To detect the polarity of emoticons and slang words, we rely on two manually-built dictionaries containing the emoticons/slang words along with their polarity. As for hashtags, we developed our own tool that splits a hashtag into the words that compose it and used SentiStrength scores to decide on its polarity.

Sentiment-related features are good indicators of whether or not a text is negative. As mentioned above, a negative text is most likely to present hate speech. However, not all negative texts do. Therefore, more features need to be extracted for the sake of detection of hate speech.

2) SEMANTIC FEATURES

Semantic features are ones that describe how an internet user uses punctuation, capitalized words, interjections, etc. Although hate speech on social networks and microblogging websites does not have a specific and common use of punctuation or capitalization, in some cases, some of these reflect some sort of segregation or others, such as in the following example:

“Why don't you simply go back to YOUR COUNTRY and leave us in peace?”

The tweet is obviously offensive and shows some hate. However, there is no explicit use of hate words, or of any sentimental word (except the word “peace,” which is obviously a positive word).

Therefore, we believe that punctuation features, including the capitalization, the existence of question and exclamation marks, etc. help in detecting hateful speech, and they cannot be simply discarded. In our work, we make use of the following features:
• the number of exclamation marks,
• the number of question marks,
• the number of full stop marks,
• the number of all-capitalized words,
• the number of quotes,
• the number of interjections,
• the number of laughing expressions,
• the number of words in the tweet.

3) UNIGRAM FEATURES

Unigram features are simply unigrams collected from the training set in a pragmatic way, each used as an independent feature which can take one of two values: “true” and “false.”

All unigrams that have a part-of-speech (PoS) tag of a noun, verb, adjective or adverb are extracted from the training set and stored in three different lists (one list for each class) along with their number of occurrences in the corresponding class. We keep only words that occur at least $min^{u}_{occ}$ times (a threshold that represents the minimal number of occurrences of unigrams to be taken into account).

Given a word $w$ that appeared in one of the three lists (for convenience we call it $C_1$), we measure two ratios we refer to


as $\rho_{12}$ and $\rho_{13}$, defined as follows:

$$\rho_{12}(w) = \frac{N_1(w)}{N_2(w)} \tag{1}$$

$$\rho_{13}(w) = \frac{N_1(w)}{N_3(w)} \tag{2}$$

where $N_i(w)$ is the number of occurrences of the word $w$ in a class $i$. If the denominator of the ratio is 0, the value is set to 2.

This is done for all the words of the three classes that satisfy the condition mentioned above regarding the number of occurrences. We keep only words that satisfy a second condition defined as follows:

$$\rho_{ij}(w) \geq Th_u \tag{3}$$

where $Th_u$ is a threshold we set for the ratios, that needs to be tuned to maximize the accuracy.

As mentioned above, each of the resulting words will be used as a unique feature: for a word $w$, in each tweet, we check whether it is employed or not. If the tweet contains the word, the value of the corresponding feature is set to “true,” otherwise, it is set to “false.”
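The unigram selection described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the class labels and the input format (tweets already reduced to lists of lemmatized nouns, verbs, adjectives and adverbs) are hypothetical, and the sketch assumes that a word is kept only when both of its ratios reach the threshold, which the text does not state explicitly.

```python
from collections import Counter

CLASSES = ("clean", "offensive", "hateful")  # hypothetical class labels


def ratio(numerator, denominator):
    """Occurrence ratio; set to 2 when the denominator is 0, as in the text."""
    return 2.0 if denominator == 0 else numerator / denominator


def select_unigrams(tokens_by_class, min_occ, threshold):
    """Return {class: set of words} kept as unigram features.

    tokens_by_class maps a class name to a list of tweets, each tweet being
    a list of content words (assumed already PoS-filtered and lemmatized).
    """
    counts = {c: Counter(w for tweet in tokens_by_class[c] for w in tweet)
              for c in CLASSES}
    selected = {c: set() for c in CLASSES}
    for c in CLASSES:
        others = [o for o in CLASSES if o != c]
        for word, n in counts[c].items():
            if n < min_occ:
                continue  # first condition: minimal number of occurrences
            # Assumption: both ratios must reach the threshold.
            if all(ratio(n, counts[o][word]) >= threshold for o in others):
                selected[c].add(word)
    return selected


def unigram_features(tweet_tokens, vocabulary):
    """One boolean feature per selected word, in a fixed (sorted) order."""
    present = set(tweet_tokens)
    return [word in present for word in sorted(vocabulary)]
```

With a toy corpus, a word that is equally frequent in two classes is rejected by the ratio test, while a word concentrated in one class is kept and then becomes a true/false feature for every tweet.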
Given the optimal values of the two parameters $min^{u}_{occ}$ and $Th_u$ (we will describe the optimization process of the different parameters later in this section), the top occurring words extracted from the tweets of the class “hateful” are given in Fig. 2 and those of the class “offensive” are given in Fig. 3.

FIGURE 2. Hateful class top words.

FIGURE 3. Offensive class top words.

While most of the used words from both classes are just general words that people use when insulting or demeaning someone, some of them have a racist content or content that refers to a specific gender, ethnic group or others (e.g., “muslims,” “islamic,” “faggot,” “spic,” etc.). We believe that, using a bigger training set, we can use the approach we proposed above for “unigram features” to build a dictionary of hate-related words that can be used for future works.

In total, we extracted 1 373 words. Consequently, 1 373 unigram features are defined.

4) PATTERN FEATURES

Pattern features are extracted the same way we extract unigrams. However, before we describe how pattern features are attributed their values and are extracted from the training set, we first introduce what a pattern is in our context.

In a first step, we divide the words of a tweet into two categories based on whether or not they can be sentimental: a category “SW” (i.e., sentimental word) and a category “NSW” (i.e., non-sentimental word). Words that can be sentimental are simply nouns, verbs, adjectives and adverbs. Therefore, any word in the tweet that has a PoS tag that refers to a noun, verb, adjective or adverb is qualified as belonging to “SW.” A word that has another PoS tag is qualified as belonging to “NSW.”

A pattern is extracted from a tweet as follows: for each word, if it belongs to “SW,” it is replaced by its simplified PoS tag as described in TABLE 1 along with its polarity. For example, the word “coward” will be replaced by the expression “Negative_ADJECTIVE.” Otherwise, if the word belongs to “NSW,” it is simply replaced by its simplified PoS tag as described in TABLE 1.

The resulting vectors extracted from different tweets have different lengths. Therefore, we define a pattern as a vector of consecutive words having a fixed length $L$, where $L$ is a parameter to optimize. If a tweet has more than $L$ words, we extract all possible patterns. If it has fewer words than $L$, it is simply discarded.
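As an illustration of the pattern construction just described, the following Python sketch maps a tagged tweet to pattern tokens and then slides a window of length L over them. It is a minimal sketch under stated assumptions rather than the authors' code: the tag names other than those quoted in the text (for instance "PRONOUN" and "DETERMINER"), the "Neutral" polarity label, the input triples and the function names are all hypothetical, and a tweet of exactly L words is assumed to yield a single pattern.

```python
# Simplified tags treated as "SW" (sentimental words), following the text.
SENTIMENTAL_TAGS = {"NOUN", "VERB", "ADJECTIVE", "ADVERB"}


def to_pattern_tokens(tagged_words):
    """Map (word, simplified_tag, polarity) triples to pattern tokens.

    SW words become "<Polarity>_<TAG>"; NSW words become "<TAG>" alone.
    The polarity label is assumed to be supplied by an earlier step.
    """
    tokens = []
    for _word, tag, polarity in tagged_words:
        if tag in SENTIMENTAL_TAGS:
            tokens.append(f"{polarity}_{tag}")
        else:
            tokens.append(tag)
    return tokens


def extract_patterns(tokens, length):
    """All windows of `length` consecutive tokens; none if the tweet is shorter."""
    if len(tokens) < length:
        return []  # tweets with fewer than L words are discarded
    return [tuple(tokens[i:i + length])
            for i in range(len(tokens) - length + 1)]
```

Representing each pattern as a tuple makes it hashable, so the per-class occurrence counts needed for the later filtering step can be kept in an ordinary dictionary or counter.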


TABLE 1. List of PoS tags and their corresponding simplified tags.

We extract different patterns as described from the training set and
save them in three different lists along with their number of
occurrences. We filter out the ones that appear less than
$min^{p}_{occ}$ times. Afterwards, given a pattern p that appeared in
one of the three lists (we call it C1), we measure two ratios we refer
to as $\rho_{12}$ and $\rho_{13}$, defined as follows:

\[ \rho_{12}(p) = \frac{N_1(p)}{N_2(p)} \tag{4} \]

\[ \rho_{13}(p) = \frac{N_1(p)}{N_3(p)} \tag{5} \]

where $N_i(p)$ is the number of occurrences of the pattern p in a
class i. If the denominator of the ratio is 0, the value is set to 2.
Only patterns that satisfy the condition

\[ \rho_{ij}(p) \geq Th_p \tag{6} \]

are kept, where $Th_p$ is a threshold we define and tune.
Using the optimal values of the two parameters $min^{p}_{occ}$ and
$Th_p$, 1,875 pattern features are extracted in total.
Given a pattern p, the corresponding feature is attributed a numeric
value measuring the resemblance of the tweet to that pattern.
Therefore, given a tweet t and a pattern p, we define the following
resemblance function [6]:

\[
res(p, t) =
\begin{cases}
1, & \text{if the pattern appears in the tweet as it is,} \\
\alpha \cdot n/N, & \text{if the tweet contains } n \text{ out of the } N \text{ tags of the pattern in the correct order,} \\
0, & \text{if the tweet doesn't contain any of the tags of the pattern,}
\end{cases}
\]

where $\alpha$ is a parameter to optimize.

D. PARAMETERS OPTIMIZATION
The proposed sets of features present different parameters that need
to be optimized to obtain the maximum accuracy of classification.
The parameters to be optimized are the following:
• the minimal occurrence of words $min^{u}_{occ}$
• the word ratios threshold $Th_u$
• the minimal occurrence of patterns $min^{p}_{occ}$
• the pattern ratios threshold $Th_p$
• the pattern length $L$
• the coefficient $\alpha$
To tune these parameters, each time we fix all the parameters except
one, and look for its optimal value. Therefore, to determine the best
value of the parameter $min^{u}_{occ}$, we set the values of the other
parameters as follows:
• $Th_u = Th_p = 1.4$,
• $min^{p}_{occ} = 3$,
• $L = 7$,
• $\alpha = 0.1$.
The choice of these values was based on an earlier set of experiments
in which we tried to limit the intervals of the values of the
parameters: we ran our experiments on each family of features
independently using the values of similar parameters that we
introduced in a previous work [6]. Then we adjusted the features to
get the current values.
We try different values of the parameter $min^{u}_{occ}$. The results
are given in Fig. 4. The optimal value was obtained for
$min^{u}_{occ} = 9$.

FIGURE 4. Classification accuracy (right axis) and number of words
collected (left axis) for different values of the parameter $min^{u}_{occ}$.

We then keep the values of the different parameters as they are, set
$min^{u}_{occ}$ to 9, and adjust the parameter $Th_u$. Different
values from 1.1 to 2 have been checked, and the optimal value was
obtained for $Th_u = 1.4$ as shown in Fig. 5. In total, 1,373 words
are collected.
To determine the best length of patterns (i.e., L), we set the values
of the parameters related to unigram features to their optimal values
and try different values of the parameter L as shown in Fig. 6. We
kept the other parameters as we set them initially. The optimal value
was obtained for L = 5, and the total number of patterns obtained
is 1,875.
We proceed the same way to obtain the optimal values of
$min^{p}_{occ}$ and $Th_p$. The optimal values of the parameters are
7 and 1.3 ∼ 1.9 (in the rest of this work the value 1.4 is
considered), respectively, as shown in Figs. 7 and 8.
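To make the roles of the tuned parameters $min^{p}_{occ}$, $Th_p$ and $\alpha$ concrete, the following Python sketch illustrates the pattern filter of Eqs. (4) to (6) and the resemblance function res(p, t) defined earlier in this section. It rests on assumptions the text leaves open: a pattern is kept only if both of its ratios reach the threshold, "appears as it is" is read as a contiguous match, and n is taken as the length of the longest common subsequence between the pattern and the tweet's tag sequence. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def keep_pattern(p: Pattern, own: Dict[Pattern, int],
                 others: List[Dict[Pattern, int]],
                 min_occ: int, threshold: float) -> bool:
    """Apply the occurrence filter and the ratio condition of Eq. (6).

    `own` holds the counts N1 for the pattern's class; `others` holds
    the count tables of the two remaining classes (N2 and N3).
    Assumption: every ratio must reach the threshold.
    """
    n1 = own.get(p, 0)
    if n1 < min_occ:
        return False
    for counts in others:
        denominator = counts.get(p, 0)
        # The ratio is set to 2 when the denominator is 0.
        ratio = 2.0 if denominator == 0 else n1 / denominator
        if ratio < threshold:
            return False
    return True


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (order-preserving)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def resemblance(pattern: Pattern, tweet_tokens: Sequence[str], alpha: float) -> float:
    """res(p, t): 1 for an exact match, alpha * n / N for a partial one, else 0."""
    size = len(pattern)
    for i in range(len(tweet_tokens) - size + 1):
        if tuple(tweet_tokens[i:i + size]) == pattern:
            return 1.0
    n = lcs_length(pattern, tweet_tokens)
    return alpha * n / size  # equals 0.0 when no tag of the pattern is found


# Hypothetical occurrence counts for three classes.
own = {("A", "B"): 9, ("C", "D"): 2, ("E", "F"): 8}
others = [{("A", "B"): 3, ("E", "F"): 7}, {("E", "F"): 1}]
kept = [p for p in own if keep_pattern(p, own, others, min_occ=7, threshold=1.4)]
```

Because $\alpha$ is small (0.01 in the final setting), a partial match contributes far less than an exact one, which is consistent with the remark that $\alpha$ should have a low value.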


FIGURE 5. Classification accuracy (right axis) and number of words
collected (left axis) for different values of the parameter $Th_u$.

FIGURE 6. Classification accuracy (right axis) and number of patterns
collected (left axis) for different values of the parameter $L$.

FIGURE 7. Classification accuracy (right axis) and number of patterns
collected (left axis) for different values of the parameter $min^{p}_{occ}$.

FIGURE 8. Classification accuracy (right axis) and number of patterns
collected (left axis) for different values of the parameter $Th_p$.

We finally set the values of the four parameters to their optimal
values and tried different values of $\alpha$. The obtained results
did not differ much (keeping in mind that $\alpha$ should have a low
value). The optimal value of this parameter is equal to 0.01.
Therefore, for the rest of this work, we considered the first case
and keep the values of the parameters as follows:

\[
\begin{cases}
min^{u}_{occ} = 9, \\
Th_u = 1.4, \\
min^{p}_{occ} = 7, \\
Th_p = 1.4, \\
L = 5, \\
\alpha = 0.01.
\end{cases}
\]

IV. EXPERIMENTAL RESULTS
After the extraction of features and optimization of parameters, we
proceed to our final experiments. The classification is done using
the toolkit Weka [22]. Weka presents a variety of classifiers
organized into groups based on the type of the algorithm (e.g.,
decision tree-based, rule-based, etc.).
To evaluate the performance of classification, we use 4 different
key performance indicators (KPIs), which are the percentage of true
positives, the precision, the recall and the F1-score, defined as:

\[ \text{F1-score} = 2 \times \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7} \]

For the sake of our work, to perform the classification, we use the
machine learning algorithm “J48graft” [23]. This is because this
classifier presents better performances than other classifiers (even
powerful ones such as Support Vector Machine (SVM) and Random
Forest, etc.). The algorithm “J48graft” presents a main parameter to
tune, which is the confidence threshold for pruning (C). The optimal
value of this parameter, obtained during this work, is C = 0.04.
The fact that “J48graft” outperforms SVM might be due to the
existence of hundreds of binary features (that take the values
“true” or “false”), since SVM is better at dealing with numeric
features. However, this does not explain why “J48graft” outperforms
Random Forest.
Table 2 shows the performances of classification using “J48graft”
compared to that using some other classifiers.

TABLE 2. Accuracy, precision, recall and F1-score of classification
using different classifiers.

Initially, we perform the classification on the test set, which has
been used to optimize the features' parameters we defined. This is to
optimize also the parameters of the classifier used (i.e.,
“J48graft”). Once the parameters are optimized, we re-run the
classification on the validation set. This is to make sure that the
features as well as the classifier parameters were not overfitting to
the current test set, and that they perform well for a completely
different set.
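As a worked illustration of the KPIs and of Eq. (7), the Python sketch below derives per-class precision, recall and F1-score, plus the overall accuracy, from a confusion matrix. The matrix values are hypothetical toy numbers, not results from this paper, and the function name `per_class_kpis` is an assumption; Weka reports these quantities itself.

```python
from typing import Dict, List, Tuple


def per_class_kpis(matrix: List[List[int]],
                   labels: List[str]) -> Tuple[Dict[str, Tuple[float, float, float]], float]:
    """Compute (precision, recall, F1) per class and the overall accuracy.

    matrix[i][j] is the number of tweets of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    kpis = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum: tp + fp
        actual_k = sum(matrix[k])                    # row sum: tp + fn
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        kpis[label] = (precision, recall, f1)
    accuracy = correct / total if total else 0.0
    return kpis, accuracy


# Hypothetical toy confusion matrix (rows: true class, columns: predicted class).
toy_labels = ["hateful", "offensive", "clean"]
toy_matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
toy_kpis, toy_accuracy = per_class_kpis(toy_matrix, toy_labels)
```

The same function covers the binary setting by passing a 2 × 2 matrix with the labels “offensive” and “clean.”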


A. BINARY CLASSIFICATION
In a first step, we combined the tweets of the two classes “hateful”
and “offensive” under one class we refer to as “offensive” (since
hateful tweets are indeed offensive and aggressive). This is to make
the classification a binary classification task. In the training set,
in total we have 14,000 tweets for the class “offensive” and 7,000
tweets for the class “clean.” As for the test set, the number of
tweets of the class “offensive” is 2,680 while that of the class
“clean” is 1,340. Using these sets, we run the classification. The
obtained results are given in TABLE 3, while the confusion matrix is
given in TABLE 4.

TABLE 3. Binary classification performances on the test set.

TABLE 4. Binary classification confusion matrix.

We then perform the binary classification on the validation set
(which, as a reminder, has not been involved in any of the
optimization steps). The results of the classification are given in
TABLE 5 and the confusion matrix is given in TABLE 6.

TABLE 5. Binary classification performances on the validation set.

TABLE 6. Binary classification confusion matrix of the validation set.

The overall accuracy obtained when all the features are used is equal
to 87.4% with a precision equal to 93.2% for the class “offensive.”
The performances per family of features show that the unigram
features as well as the pattern features present the highest accuracy
with values respectively equal to 82.1% and 70%. This is because the
way these features are extracted (pragmatic approach) made them
highly related to the different classes. In other words, while
punctuation-based and sentiment-based marks have not been selected to
reflect any specific aspect, and have been extracted from the
different tweets as they are, patterns and top words are polarized
features and the existence of any of them in a tweet has a high
influence on the decision whether it is offensive or not.
Semantic features, on the other hand, do not have a good
classification accuracy. This is because, when they are used alone,
these features cannot tell whether a text is hateful, offensive or
clean. In other words, these features need to be combined with the
other sets of features to make sense. The same goes for
sentiment-based features: even though offensive language is more
likely to appear in negative tweets, this information alone (whether
the tweet is positive or negative) is not enough to judge the content
of the tweet and the language used.

B. TERNARY CLASSIFICATION
The classification on the test set presents a clearly lower accuracy,
precision and recall as shown in TABLE 7. The overall accuracy of
classification reaches 79.7% with almost 10% drop after splitting the
class previously referred to as “offensive” into two sub-classes
(i.e., “offensive” and “hateful”). These two classes have obviously
lower precision and recall compared to the other class “clean,”
because tweets of these two classes are close in terms of content,
and tend to be confused with each other as shown in TABLE 8.
Again, we run the classification on the validation set, to confirm
the efficiency of the features and the parameters used. The results
of the classification on the validation set are given in TABLE 9 and
the confusion matrix is given in TABLE 10.
While the binary classification discussed in the previous subsection
is important since it allows us to automatically detect offensive,
aggressive and hateful speeches with a precision equal to 93.2%, it
is a more challenging task to go deeper in


the classification, and separate tweets containing hate speech from
those that are just offensive. Hate speech as discussed in the
motivation usually targets groups of people based on their
backgrounds, while an offensive text might just target the one person
to whom the message is sent. Even for humans with no background
about the speaker, it is usually very hard to judge whether a tweet
is hateful or just offensive.

TABLE 7. Ternary classification performances on the test set.

TABLE 8. Ternary classification confusion matrix of the test set.

Using the same sets of features we ran the classification. The
classification results obtained are given in TABLE 9. Obviously, the
accuracy dropped remarkably compared to the binary classification for
the simple reason that hateful and offensive speeches are hard to
distinguish from each other. In TABLE 10, we also observed that many
of the tweets of the class “hateful” were misclassified as belonging
to the class “clean.” This also explains the low recall of the class
“hateful,” and the low precision of the class “clean.” This is
because we can't distinguish some “hateful” and “clean” tweets. We
can also see this from the fact that more “clean” tweets are
misclassified as “hateful” than as “offensive” in TABLE 10.

TABLE 9. Ternary classification performances on the validation set.

TABLE 10. Ternary classification confusion matrix of the validation set.

The overall accuracy obtained reaches 78.4%. In addition, the same
sets of features that performed well during the binary classification
are the ones that performed well during the ternary classification,
for the same reasons mentioned above. In particular, hate-related
unigrams are very close to the offensive ones. As shown in Fig. 2,
words highly related to hate are almost the same as those usually
used to offend people, demean them or insult them (i.e., offensive
speech). That being the case, even features qualified as “Unigram”
present lower accuracy when we split the class “offensive” from the
previous subsection (binary classification) into two classes, which
are “hateful” and “offensive.”
Even though performing such a comparison on patterns is quite
challenging (since patterns do not show a direct relation to a
specific class), we believe that the same kind of problem occurs and
the patterns extracted from both classes are very close and related
to one another.

V. CONCLUSION
In this work, we proposed a new method to detect hate speech on
Twitter. Our proposed approach automatically detects hate speech
patterns and most common unigrams and uses these along with
sentiment-based and semantic features to classify tweets into
hateful, offensive and clean. Our proposed approach reaches an
accuracy equal to 87.4% for the binary classification of tweets into
offensive and non-offensive, and an accuracy equal to 78.4% for the
ternary classification of tweets into hateful, offensive and clean.
In a future work, we will try to build a richer dictionary of hate
speech patterns that can be used, along with a unigram dictionary, to
detect hateful and offensive online texts. We will make a
quantitative study of the presence of hate speech among the different
genders, age groups and regions, etc.


ACKNOWLEDGMENT
The research results have been achieved by “Cognitive Security: A New
Approach to Securing Future Large Scale and Distributed Mobile
Applications,” the Commissioned Research of National Institute of
Information and Communications Technology (NICT), JAPAN.

REFERENCES
[1] R. D. King and G. M. Sutton, “High times for hate crimes: Explaining the temporal clustering of hate-motivated offending,” Criminology, vol. 51, no. 4, pp. 871–894, 2013.
[2] J. P. Breckheimer, “A haven for hate: The foreign and domestic implications of protecting Internet hate speech under the first amendment,” South California Law Rev., vol. 75, no. 6, p. 1493, Sep. 2002.
[3] P. Burnap and M. L. Williams, “Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making,” Policy Internet, vol. 7, no. 2, pp. 223–242, Jun. 2015.
[4] A. H. Razavi, D. Inkpen, S. Uritsky, and S. Matwin, “Offensive language detection using multi-level classification,” Advances in Artificial Intelligence, vol. 6085. Ottawa, ON, Canada: Springer, Jun. 2010, pp. 16–27.
[5] W. Warner and J. Hirschberg, “Detecting hate speech on the world wide Web,” in Proc. 2nd Workshop Lang. Social Media, Jun. 2012, pp. 19–26.
[6] M. Bouazizi and T. O. Ohtsuki, “A pattern-based approach for sarcasm detection on Twitter,” IEEE Access, vol. 4, pp. 5477–5488, 2016.
[7] D. Davidov, O. Tsur, and A. Rappoport, “Semi-supervised recognition of sarcastic sentences in Twitter and Amazon,” in Proc. 14th Conf. Comput. Natural Lang. Learn., Jul. 2010, pp. 107–116.
[8] M. Bouazizi and T. Ohtsuki, “Sentiment analysis: From binary to multi-class classification: A pattern-based approach for multi-class sentiment analysis in Twitter,” in Proc. IEEE ICC, May 2016, pp. 1–6.
[9] M. Bouazizi and T. Ohtsuki, “Sentiment analysis in Twitter: From classification to quantification of sentiments within tweets,” in Proc. IEEE GLOBECOM, Dec. 2016, pp. 1–6.
[10] J. M. Soler, F. Cuartero, and M. Roblizo, “Twitter as a tool for predicting elections results,” in Proc. IEEE/ACM ASONAM, Aug. 2012, pp. 1194–1200.
[11] S. Homoceanu, M. Loster, C. Lofi, and W.-T. Balke, “Will I like it? Providing product overviews based on opinion excerpts,” in Proc. IEEE CEC, Sep. 2011, pp. 26–33.
[12] U. R. Hodeghatta, “Sentiment analysis of Hollywood movies on Twitter,” in Proc. IEEE/ACM ASONAM, Aug. 2013, pp. 1401–1404.
[13] Z. Zhao, P. Resnick, and Q. Mei, “Enquiring minds: Early detection of rumors in social media from enquiry posts,” in Proc. Int. Conf. World Wide Web, May 2015, pp. 1395–1405.
[14] N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati, “Hate speech detection with comment embeddings,” in Proc. WWW Companion, May 2015, pp. 29–30.
[15] N. D. Gitari, Z. Zuping, H. Damien, and J. Long, “A lexicon-based approach for hate speech detection,” Int. J. Multimedia Ubiquitous Eng., vol. 10, no. 4, pp. 215–230, Apr. 2015.
[16] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang, “Abusive language detection in online user content,” in Proc. WWW, Apr. 2016, pp. 145–153.
[17] I. Kwok and Y. Wang, “Locate the hate: Detecting tweets against blacks,” in Proc. AAAI, Jul. 2013, pp. 1621–1622.
[18] Z. Waseem and D. Hovy, “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter,” in Proc. Student Res. Workshop (NAACL), Jun. 2016, pp. 88–93.
[19] T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” in Proc. ICWSM, May 2017, pp. 1–4.
[20] L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, “Twitter part-of-speech tagging for all: Overcoming sparse and noisy data,” in Proc. Int. Conf. RANLP, Sep. 2013, pp. 198–206.
[21] S. Das and M. Chen, “Yahoo! for Amazon: Extracting market sentiment from stock message boards,” in Proc. 8th Asia Pacific Finance Assoc. Annu. Conf., vol. 35. Jul. 2001, p. 43.
[22] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: An update,” ACM SIGKDD Explorations Newslett., vol. 11, no. 1, pp. 10–18, Jun. 2009.
[23] G. I. Webb, “Decision tree grafting from the all-tests-but-one partition,” in Proc. IJCAI, San Francisco, CA, USA, Aug. 1999, pp. 702–707.

HAJIME WATANABE received the Bachelor's degree from Keio University,
Japan, in 2015. He is currently working on the Master's degree at
Keio University.

MONDHER BOUAZIZI received the Bachelor Engineering Diploma in
communications from SUPCOM, Carthage University, Tunisia, in 2010,
and the master's degree from Keio University in 2017, where he is
currently pursuing the Ph.D. degree. He was a Telecommunication
Engineer (access network quality and optimization) for three years
with Ooredoo Tunisia.

TOMOAKI OHTSUKI (OTSUKI) received the B.E., M.E., and Ph.D. degrees
in electrical engineering from Keio University, Yokohama, Japan,
in 1990, 1992, and 1994, respectively.
From 1994 to 1995, he was a Post-Doctoral Fellow and a Visiting
Researcher in electrical engineering at Keio University. From 1993
to 1995, he was a Special Researcher of Fellowships of the Japan
Society for the Promotion of Science for Japanese Junior Scientists.
From 1995 to 2005, he was with the Tokyo University of Science.
From 1998 to 1999, he was with the Department of Electrical
Engineering and Computer Sciences, University of California at
Berkeley, Berkeley, CA, USA. In 2005, he joined Keio University,
where he is currently a Professor. He has authored or co-authored
over 140 journal papers and 340 international conference papers. He
is involved in research on wireless communications, optical
communications, signal processing, and information theory.
Dr. Ohtsuki is a Fellow of the IEICE. He was a recipient of the 1997
Inoue Research Award for Young Scientist, the 1997 Hiroshi Ando
Memorial Young Engineering Award, the Ericsson Young Scientist
Award 2000, the 2002 Funai Information and Science Award for Young
Scientist, the IEEE 1st Asia–Pacific Young Researcher Award 2001,
the 5th International Communication Foundation (ICF) Research Award,
the 2011 IEEE SPCE Outstanding Service Award, the 27th TELECOM System
Technology Award, the ETRI Journal's 2012 Best Reviewer Award, and
the 9th International Conference on Communications and Networking in
China 2014 (CHINACOM '14) Best Paper Award. He gave tutorials and
keynote speeches at many international conferences, including IEEE
VTC, IEEE PIMRC, and so on. He was a Vice President of the
Communications Society of the IEICE. He served as a Chair of the IEEE
Communications Society, Signal Processing for Communications and
Electronics Technical Committee. He served as a Technical Editor of
the IEEE Wireless Communications Magazine and an Editor of Physical
Communications (Elsevier). He is currently an Area Editor of the IEEE
TRANSACTIONS ON VEHICULAR TECHNOLOGY and an Editor of the IEEE
COMMUNICATIONS SURVEYS AND TUTORIALS. He has served as general
co-chair and symposium co-chair of many conferences, including IEEE
GLOBECOM 2008, SPC, IEEE ICC2011, CTS, IEEE GCOM2012, SPC, and IEEE
SPAWC.
