Comparing Bag of Words and TF-IDF with different models for hate speech
detection from live tweets
Article in International Journal of Information Technology · September 2022

DOI: 10.1007/s41870-022-01096-4
3 authors: Akuma Stephen, Tyosar Lubem, and Isaac Terngu Adom (Benue State University, Makurdi)

Int. j. inf. tecnol.
https://doi.org/10.1007/s41870-022-01096-4
ORIGINAL RESEARCH
Comparing Bag of Words and TF-IDF with different models for hate speech detection from live tweets
Stephen Akuma1 · Tyosar Lubem1 · Isaac Terngu Adom1
Received: 2 June 2022 / Accepted: 9 September 2022

© The Author(s), under exclusive licence to Bharati Vidyapeeth’s Institute of Computer Applications and Management 2022
Abstract Social media platforms such as Twitter have revolutionized online communication and interactions but often contain components of disdain for their growing user base. This discomforting feed creates instability leading to mental breakdown, and loss of human lives and properties among other results of misuse. Even though the problem posed by the content of social media is obvious, the challenge of detecting hateful content persists. Several algorithms and techniques have been used in the past for detecting hateful content on social media but there is room for improvement. The goal of this paper is to detect hate speech from live tweets on Twitter via a combination of mechanisms. The comparison results of Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW) with the machine learning models Logistic Regression, Naïve Bayes, Decision Tree, and K-Nearest Neighbour (KNN) are used to select the best performing model. This model, which is integrated into a web system developed with the Twitter Application Programming Interface (API), is used to identify whether live tweets are hateful or not. The outcome of the comparative study showed that Decision Tree performed better than the other three models, with an accuracy of 92.43% using TF-IDF, which gives better results than BoW.

Keywords Social media · Sentiment analysis · Hate speech · Twitter · Machine learning algorithm · Bag of Words · TF-IDF

* Stephen Akuma
[email protected]
Tyosar Lubem
[email protected]
Isaac Terngu Adom
[email protected]
1 Department of Mathematics, Computer Science and Statistics, Benue State University, Makurdi, Nigeria

1 Introduction

There has been an exponential growth of users of online forums like Facebook, Twitter, and Instagram. About 350,000 tweets are generated on Twitter and 50,000 comments are generated on Facebook per second [11]. Participants of these forums come from different races, cultures and educational backgrounds and their opinions, criticism, and personal feelings are all expressed through these platforms. Lack of regulation of freedom of speech on the web often leads some users of social media platforms to hide their identity and use derogatory and offensive words on people. These derogatory words, meant to cause psychological damage to users, are often referred to as hate speech.

There is no consensus definition for hate speech. Attempts have been made to define hate speech based on acts of violence or a prejudiced environment that may promote a violent act on a person or group [15]. Ref. [18] opined that hate speech is offensive language which can be nasty, disrespectful, or harmful to online and offline individuals, communities, or society as a whole. Ref. [7] presented hate speech as a type of communication that criticizes or dismisses groups based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity, or other factors, and it can take many forms, including subtle forms such as humour and jokes. According to Twitter, hate speech is communication that incites violence against others and directly targets or threatens them because of their ethnicity, race, nationality, age, gender, religious affiliation, sexual
orientation, handicap, or major illness. Ref. [10] defined hate speech as “any statement that promotes violent crime, attacks or seeks to silence a minority, uses a racial or sexist slur; criticizes a minority irrationally; contains stereotyping of a particular minority; and defends sexism, racism, xenophobia, or any other dangerous extremism”.

This uncensored behaviour has increased within the last decade, and manually detecting and removing such injurious messages and comments from social media is a tedious task. When automated techniques are used, they can quickly classify hate speech and protect online users from social hate speech harassment [18]. Research has been conducted in the last decade on how to use automated systems to detect hate speech [8, 11, 18]. For instance, artificial intelligence technology is already being used by companies like Facebook and Google but there are still challenges in detecting some hate words. The difficulty with automatically detecting hate tweets is that the language and expressions have a format that is difficult to annotate [17]. For instance, character redundancies (e.g., kiiiiiind, caaaaar) and unnecessary intonation and exclamations (e.g., Coming…!!!!, yes????) are some of the overused word abbreviations and expressions. As a result, to obtain a format that keeps the original meaning, the source text must be changed through a vital preprocessing process that is compatible with other comparable posts [8]. This makes detecting hate speech from tweets problematic for both robots and humans, as it is exceedingly difficult to distinguish hate speech from other potentially harmful content that does not fall into the hate tweet category [17]. Other research has used traditional machine learning techniques and surface characteristics such as word, deep learning, Term Frequency-Inverse Document Frequency (TF-IDF), Doc2Vec, and character n-grams for hate speech detection [11].

In this research, we present a mechanism for identifying hate speech on Twitter that can effectively differentiate between a hate word and a non-hate word. We used a publicly available Twitter dataset to train our classifier model using Bag of Words and TF-IDF and evaluated the system with standard evaluation metrics. A comparative analysis of the results obtained from several classifiers was also carried out and it was found that Decision Tree performed better than other classifiers for TF-IDF. The strategy that comes closest to ours is proposed by [11], which uses TF-IDF and weighted n-gram values. This paper’s main contributions are as follows: (1) Analysis of the efficiency of BoW with TF-IDF using various machine learning methods in detecting hate speech; (2) Hate speech identification on Twitter using machine learning techniques; (3) Evaluation of the model using live tweets. The rest of the paper is organized as follows: Sect. 2 presents related work summarizing past research in detecting hate speech. Section 3 provides a detailed description of our approach. The approach is tested and the result is discussed in Sect. 4. Section 5 is the conclusion, and it summarizes our findings and makes recommendations for future research.

2 Related work

A lot of research work on sentiment analysis has been conducted as users of online forums increase [2]. Ref. [14] researched the polarization of Twitter sentiments and classified sentiments using emotions from Plutchik’s wheel of emotion. A Long Short-Term Memory (LSTM) model was used for sentiment analysis in social data by [21]. They obtained an accuracy of 87% in reviewing e-commerce products in Hindi language. The same LSTM approach was fused with an attention encoder to analyze sentiments [19]. They inferred that their aggregated method outperformed the baseline model used. Deep learning approaches are also used for sentiment analysis through a paragraph2vec and Convolution Neural Network (CNN) approach for the classification of hate words [9].

Several works on hate speech detection, some of which are addressed in this study, use more than two classification techniques for computational comparison to determine which method has the best detection performance and accuracy. Ref. [15] built a model to identify and detect hate speech using a Linear Support Vector Machine with three parameters, including Brown groups, surface n-grams, and word skip-grams. Ref. [1] worked on the hate speech dataset and conducted a performance evaluation of feature extraction techniques with several machine learning models. Their study divided the tweets under study into the following groups: hate speech, offensiveness, and neither. Data gathering, data preprocessing, feature engineering, data splitting, classification model design, and classification model evaluation were all part of the methodology they used to obtain reasonable results.

Ref. [17] described HaterNet, a smart system set up by the Spanish government to combat hate crimes and identify and track the rise of antagonistic relations on Twitter. This measure provides the first intelligent system that uses social network analysis methods to track and show hate speech on social networking sites. It evaluates and contrasts many classification algorithms based on different document representation strategies and text classification methods. They obtained an Area Under the Curve (AUC) of 0.828. Refs. [5, 13] employed TF-IDF vectorization and word embeddings to extract features in their studies, using various classification algorithms to compare their performance. A hybridization approach of deep learning and TF-IDF was used by [13] to improve document classification. Their result outperformed the traditional classifiers, depicting high accuracy in classifying documents based on the texts.
Ref. [20] developed a system for detecting hate speech from comments and posts from major social media platforms, as well as remarks and stories from a list of internet sites. The researchers’ major goal was to develop software for searching, evaluating, and saving multi-source media and social media posts, with a focus on anti-migrant and anti-refugee hate speech. They created a Natural Language Processing (NLP) script as well as a web service that allows for friendly user interaction.

To identify offensive words in tweets, [8] utilized Linear SVM and Naive Bayes classifiers. The Linear SVM was shown to be quite sensitive to the data used in the training procedure. Data normalization with tags was discovered to make the parameter regulating process more challenging. Ref. [11] suggested using n-gram traits weighted with TF-IDF values, as well as three prominent machine learning algorithms (SVM, Naive Bayes, and Logistic Regression) to detect hate speech and provocative language on Twitter. They used a grid search for all possible feature parameter combinations and tenfold cross-validation to train each model. The average cross-validation score for each combination of feature parameters was used to evaluate the performance of each algorithm. Their results show that for L2 normalization, SVM performs badly when compared to Naive Bayes and Logistic Regression. However, Logistic Regression performs better with the appropriate n-gram range of 1 to 3 for the L2 normalization of TF-IDF, which has 95.6% accuracy.

To minimize the data’s dimensionality, [6] created a method that used Logistic Regression with L1 regularization. They used Logistic Regression, Naive Bayes, Decision Tree, Random Forests, and Linear SVMs to test a variety of models. They found that Logistic Regression and Linear SVM performed much better than other models. Ref. [12] developed a model for detecting hate speech in Amharic language using a combination of RNN, LSTM and Gated Recurrent Unit (GRU) techniques. Data labelling, cleaning, normalization and tokenization were carried out on the Facebook posts used as the primary source. Word2vec word embeddings and a feature extraction method were used, and they obtained an improved accuracy. Other researchers have employed word representations or embeddings for detecting hate speech [16]. Table 1 is the tabulated review of some of the literature used in this work.

3 Methodology

We used a machine learning model to distinguish between hateful or toxic words and non-hateful language. Preprocessing and feature extraction are carried out on a dataset that has been annotated and made public through Kaggle [3], and a comparative analysis is carried out using BoW and TF-IDF. The accuracy, F-measure, recall, confusion matrix, and precision of the results generated from the four selected models (Logistic Regression, Naive Bayes, Decision Tree, and K-Nearest Neighbor) were analyzed. The Twitter API is used to query or retrieve user tweets from Twitter to detect hate speech or non-hate speech to assess the accuracy of the chosen model. Figure 1 shows the system architecture.
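The evaluation metrics named in the methodology (accuracy, precision, recall and F-measure) can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: it assumes the F-measure is the usual harmonic mean of precision and recall, treats "hate" as the positive class, and uses invented toy labels; the function names `confusion_counts` and `evaluate` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="hate"):
    """Count true/false positives and negatives for the `positive` class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="hate"):
    """Return accuracy, precision, recall and F-measure as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: F-measure is the harmonic mean of precision and recall (F1).
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels, invented for illustration only.
y_true = ["hate", "hate", "non-hate", "non-hate", "hate"]
y_pred = ["hate", "non-hate", "non-hate", "hate", "hate"]
scores = evaluate(y_true, y_pred)
print(scores)
```

A recall of 0.95 under this definition means that 5% of the truly hateful tweets were labelled non-hateful, which is the reading used when the results are discussed.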
Table 1 Tabulated literature review

S/N | Author(s) and Year | Methodology | Result | Limitations
1 | [14], 2022 | Plutchik’s wheel of emotions and Rule-Based Classification Algorithm | Classification of sentiments from tweets | Plutchik’s wheel of emotion has limited dictionary words and this might have affected the prediction accuracy
2 | [21], 2022 | LSTM and BoW used for sentiment analysis | 87% accuracy of Hindi-based sentiment analysis | The domain was limited to E-commerce product reviews
3 | [1], 2020 | Combined feature engineering and ML algorithms | 79% accuracy of hate speech detection | Lack of real-time prediction and less training data
4 | [15], 2017 | Linear SVM n-grams, word skip-grams and brown clusters | 78% accuracy of the result | Limited classifiers, Low-scoped features
5 | [17], 2019 | LSTM, MLP and frequency features | The area under the curve is 0.828 | Limited domain of application
6 | [5], 2020 | Encoder-decoder, LSTM, GRU, 1D convolutional layers, TF-IDF and word embeddings | 77% accuracy | Limited Bangla dataset and testing mechanism for the model
7 | [8], 2020 | Naive Bayes and SVM | 90% accuracy of Naive Bayes and 92% of SVM | Few classifiers used
8 | [11], 2018 | Logistic Regression, Naive Bayes, SVM, n-gram and TF-IDF | 95.6% accuracy | The distancing of words required for n-gram takes a longer time to search through
9 | [12], 2020 | LSTM, GRU, n-gram and word2vec | 97.9% accuracy | Model limited to Amharic text data
Fig. 1 The System architecture
The algorithm capturing the methodology is presented below:

Step 1 Start
Step 2 Preprocess dataset
Step 3 Build the model using BoW and TF-IDF as features
Step 4 Evaluate with ML algorithms and select the best fit model
Step 5 Use Twitter API to read live tweet t
Step 6 Test t with the model to classify as “hate” or “non-hate”
Step 7 Stop

3.1 Dataset

This study utilizes the Kaggle Hate Speech and Offensive Language dataset [3], which was created by Andriy Samoshyn. It contains 24,784 tweets, each of which has been classified by CrowdFlower contributors. The dataset was gathered and annotated to detect hate speech. It distinguishes between tweets containing hate speech, tweets containing offensive language, and tweets that do not contain any offensive or hateful language.

3.2 Data processing

Stop-word removal, stemming, tokenization, and lemmatization were among the data preprocessing techniques employed in the study. Because Twitter user communication is occasionally informal, the data is inconsistent and noisy, necessitating its cleaning and transformation into a format that the classification model can understand.

3.3 Feature extraction techniques

Machine learning classification techniques necessitate the correct presentation of tweets, with each tweet turned into a feature vector containing only distinct words. The feature vector is used as an input to the classifier, implying that a good feature vector leads to improved classification results. The feature extraction methods used are Bag of Words (BoW) and TF-IDF as explained in Sects. 3.3.1 and 3.3.2.

3.3.1 Bag of Words

This method extracts features from textual expressions so that they can be used in modelling, such as in machine learning models. Since all information about the sequence or structure of words in a document is removed, it is described as a "bag" of words. This model just cares about whether or not known terms appear in a document, not where they appear. The goal is to turn each document into a vector that can be used as input to a machine learning model. The simplest scoring approach is to assign a Boolean value to the presence of words, with 0 indicating absence and 1 indicating presence. For this work, this feature extraction technique is combined with other techniques and algorithms.

3.3.2 Term Frequency–Inverse Document Frequency (TF-IDF)

The Term Frequency–Inverse Document Frequency is a statistical method that measures how important a word is in a set of documents. This is calculated by multiplying two metrics over a series of texts: the total number of times a word appears in a document (TF) and the word’s inverse document frequency (IDF). The TF-IDF is useful in machine learning models and Natural Language Processing (NLP) tasks for text analysis where the count of the occurrence of words is of paramount importance. Thus, the formula for computing the TF-IDF of term t present in document d is given in Eq. 1.

$$\text{tf-idf}(d, t) = \text{tf}(t) \times \text{idf}(d, t) \qquad (1)$$
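To make the two feature extraction methods concrete, the sketch below builds a Boolean Bag of Words vector and a TF-IDF vector for a toy corpus. It is an illustration under stated assumptions rather than the implementation used in the study: TF is the raw count of a term in the document, and IDF is taken as the natural log of N / df (number of documents over number of documents containing the term), which is one common textbook form that the paper does not spell out; libraries such as scikit-learn use a smoothed variant by default. The corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(docs):
    """Sorted list of the distinct terms seen across all documents."""
    return sorted({term for doc in docs for term in tokenize(doc)})


def bow_vector(doc, vocab):
    """Boolean Bag of Words: 1 if the term occurs in the document, else 0."""
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocab]


def tfidf_vector(doc, docs, vocab):
    """TF-IDF with tf = raw count and idf = ln(N / df) (assumed form)."""
    counts = Counter(tokenize(doc))
    doc_sets = [set(tokenize(d)) for d in docs]
    n_docs = len(docs)
    vector = []
    for term in vocab:
        df = sum(1 for terms in doc_sets if term in terms)
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector


# Toy corpus, invented for illustration only.
docs = ["good morning everyone", "good night", "morning run"]
vocab = build_vocabulary(docs)
bow = bow_vector(docs[1], vocab)
tfidf = tfidf_vector(docs[1], docs, vocab)
print(vocab)
print(bow)
print([round(value, 3) for value in tfidf])
```

In this toy corpus "night" occurs in one document and "good" in two, so "night" receives the larger weight in the second document; the Boolean BoW vector cannot express that difference.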
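Steps 2 to 6 of the algorithm can be sketched end to end in a few lines. This is a hedged illustration only: it uses a from-scratch multinomial Naïve Bayes with Laplace smoothing because it is short to write, not because it is the model the study selected; the stop-word list, the training tweets and the "live" tweet are invented; stemming and lemmatization are omitted; and the tweet is a plain string standing in for one read from the Twitter API.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"a", "the", "are", "is", "i", "what", "have"}


def preprocess(text):
    """Lowercase, drop URLs/mentions, squeeze stretched letters, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # "caaaaar" -> "car"
    tokens = re.findall(r"[a-z]+", text)
    return [tok for tok in tokens if tok not in STOP_WORDS]


def train(labelled_tweets):
    """Collect per-class tweet counts and per-class word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_tweets:
        class_docs[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in preprocess(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
training = [
    ("I hate you people", "hate"),
    ("You people are awful", "hate"),
    ("Have a lovely day", "non-hate"),
    ("What a lovely morning", "non-hate"),
]
model = train(training)
live_tweet = "You are awwwwful!!!"  # stand-in for a tweet read from the Twitter API
print(classify(live_tweet, model))
```

Swapping in another classifier or a TF-IDF weighting would change only `train` and `classify`; the preprocessing and the final labelling step stay the same.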
4 Results and discussion

There are numerous machine learning algorithms accessible for use in machine learning projects; however, for each dataset, there is a corresponding algorithm that fits better; so, several algorithms were examined to determine which algorithm has the highest performance accuracy. In light of this, four (4) machine learning algorithms, namely Naïve Bayes, Decision Tree, Logistic Regression, and K-Nearest Neighbor, were put to the test. The results are presented in Tables 2 and 3 in terms of precision, recall, F-measure, accuracy, and confusion matrix from the study conducted on the chosen machine learning models using TF-IDF and BoW as feature extraction methods. The experiment was run twice with multiple classifiers using the TF-IDF and BoW techniques.

4.1 Experimental Setup

The experiment was run on a PC with the Windows 10 operating system, 8 GB RAM and a 4 GHz processor; a GPU would be quicker. The Jupyter notebook in Anaconda 3 and Google Colab were used for analysis. The data was divided into two categories: training data and test data. The models were created using TensorFlow and scikit-learn. Packages for natural language processing were also used. For the Naïve Bayes and Logistic Regression models, default parameters were used. The random state parameter for K-Nearest Neighbour is set at 3. The Decision Tree was given a random state of 1 instead of the default value of 0.

4.2 Experiment 1 results

The classifiers’ performance in Table 2 shows that Logistic Regression obtained the highest accuracy with 74.79% compared to KNN: 66.21%, Decision Tree: 67.16% and Naïve Bayes: 25.45%. It is further observed that the Decision Tree obtained a recall of 0.76, a precision of 0.79 and the highest F-measure of 0.58. Comparing Logistic Regression and KNN, Logistic Regression performed better than KNN in recall with 1.00 while KNN got 0.79. In precision, Logistic Regression obtained 0.75 while KNN obtained 0.76. The F-measure is 0.43 for Logistic Regression and 0.53 for KNN. On the whole, Naïve Bayes performs poorly, with the lowest accuracy of 25.45%, while Logistic Regression obtained the highest accuracy of 74.79% using the BoW approach. Figure 2 shows the confusion matrices for the algorithms tested with BoW.

Table 2 The first experiment conducted with a Bag of Words and the models

S/N | Algorithm | Accuracy | Recall | Precision | F-Measure
1 | Naïve Bayes | 25.45% | 0.003 | 0.8 | 0.20
2 | KNN | 66.21% | 0.79 | 0.76 | 0.53
3 | Logistic Regression | 74.79% | 1.00 | 0.75 | 0.43
4 | Decision Tree | 67.16% | 0.76 | 0.79 | 0.58

Fig. 2 Confusion matrices for Experiment 1 result

4.3 Experiment 2 results

From the performance of the classifiers as shown in Table 3, it was observed that the Decision Tree obtained the highest accuracy with 92.43% compared to Naïve Bayes with 75.27%. Logistic Regression obtained the second-highest accuracy after the Decision Tree with 90.46%, and KNN obtained 85.75%, making it the third-best performing classifier. It is further observed that the Decision Tree obtained a recall of 0.95, with a precision of 0.95 and an F-measure of 0.84. The recall value means that only 5% of hateful tweets were misclassified by the system. Figure 3 shows the confusion matrices for the algorithms tested with TF-IDF.

Table 3 The second experiment conducted with TF-IDF and the models

S/N | Algorithm | Accuracy | Recall | Precision | F-Measure
1 | Naïve Bayes | 75.27% | 0.79 | 0.86 | 0.69
2 | KNN | 85.76% | 0.88 | 0.91 | 0.81
3 | Logistic Regression | 90.46% | 0.99 | 0.88 | 0.85
4 | Decision Tree | 92.43% | 0.95 | 0.95 | 0.84

Fig. 3 Confusion matrices for Experiment 2 result

4.3.1 Discussion

As the experimental results obtained from the BoW and TF-IDF comparison with the models show, it can be concluded that the Decision Tree achieved higher accuracy than the other algorithms. The Decision Tree classifier outperformed the KNN, Naïve Bayes and Logistic Regression classifiers, and obtained higher experimental results based on the evaluation metrics used in this work. In addition, for the feature extraction methods used in the study, the Decision Tree classifier obtained better experimental results when combined with TF-IDF compared to BoW. Furthermore, the Logistic Regression classifier was the second-best classifier compared with the rest of the classifiers. When both BoW and TF-IDF were used as feature extraction techniques, the result revealed that TF-IDF outperformed BoW, allowing the Decision Tree classifier to
reach the maximum accuracy, while BoW performed best with the Logistic Regression classifier.

Comparing our work with similar work shows an improvement over prior studies. Some of these attempts include [11], where n-grams and TF-IDF were used for hate speech detection. The drawback of their approach is the long distance between related words, which is not computationally efficient. [4]’s system for detecting cyber hate on Twitter based on limited characteristics lacks the generalization that our model provides. Limiting the categorization to race, social orientation and disability is not comprehensive enough. [8]’s approach for detecting offensive Twitter posts using Machine Learning and feature selection achieved an accuracy of 92%, which is lower than ours. Comparability is also limited as they used only Naive Bayes and SVM for their classification, which affected their result.

5 Conclusion

This research evaluated four supervised machine learning algorithms to track hateful posts or tweets on Twitter. An experimental study was carried out and the results showed that the machine learning models yielded considerably better results when tested using the TF-IDF approach than BoW, with the Decision Tree yielding 92.43% when tested with TF-IDF, outperforming the other algorithms. Logistic Regression obtained the highest accuracy among the four classifiers tested with BoW, with an accuracy of 74.79%. The developed model used the technique with the highest accuracy to determine the presence or absence of hateful connotations in a given tweet. An intriguing look at how hateful words are detected and how they are manifested in intolerance, religion, gender, racism, and misinformation gives the motivation for this research.

This research used tweets in text format on Twitter to detect hate speech. Future research will explore the use of emotions like emojis, optical character recognition and video images in detecting hate speech. Larger datasets with other feature extraction techniques will be used with machine learning methods for optimal results.

References

1. Abro S, Shaikh S, Hussain Z, Ali Z, Khan S, Mujtaba G (2020) Automatic hate speech detection using machine learning: a comparative study. Int J Adv Comput Sci Appl 11(8):484–491
2. Akuma S, Obilikwu P, Ahar E (2021) Sentiment analysis of social media content for music recommendation. Nigerian Ann Pure Appl Sci 4(1):95–107
3. Andrii S (2019) Kaggle. Dataset, 2022
4. Burnap P, Williams ML (2016) Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Sci 5(11):1–15. https://doi.org/10.1140/epjds/s13688-016-0072-6
5. Das AK, Asif AA, Paul A, Hossain N (2020) Bangla hate speech detection on social media using attention-based recurrent neural network. J Intell Syst 30(1):578–591
6. Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. ICWSM
7. Fortuna P (2017) Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes. Thesis, Faculdade de Engenharia da Universidade do Porto
8. De Souza GA, Da Costa-Abreu M (2020) Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata. In: 2020 international joint conference on neural networks (IJCNN), pp 1–6
9. Gambäck B, Sikdar UK (2017) Using convolutional neural networks to classify hate-speech. In: Proceedings of the first workshop on abusive language online (Vancouver, BC, Canada). Association for Computational Linguistics, pp 85–90
10. Gao L (2018) Detecting online hate speech using both supervised and weakly-supervised approaches. Master’s thesis, Texas A & M University
11. Gaydhani A, Doma V, Kendre S, Bhagwat L (2018) Detecting hate speech and offensive language on twitter using machine learning: an N-gram and TFIDF based approach. Arxiv Abs/1911.02989, abs/1809.08651
12. Getachew S, Kakeba K (2020) Department of software engineering, big data and HPCCoE. Addis Ababa Science and Technology University, Addis Ababa
13. Kalra V, Kashyap I, Kaur H (2022) Improving document classification using domain-specific vocabulary: hybridization of deep learning approach with TFIDF. Int J Inf Technol 14:2451–2457
14. Kumar P, Vardhan M (2022) PWEBSA: Twitter sentiment analysis by combining Plutchik wheel of emotion and word embedding. Int J Inf Technol 14:69–77
15. Malmasi S, Zampieri M (2017) Detecting hate speech in social media. In: Advances in natural language processing (RANLP), pp 467
16. Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations in vector space. ICLR
17. Pereira-Kohatsu JC, Quijano-Sánchez L, Liberatore F, Camacho-Collados M (2019) Detecting and monitoring hate speech in Twitter. Sens J 19(12):4654
18. Salminen J, Hopf M, Chowdhury SA, Jung S, Almerekhi H, Jansen BJ (2020) Developing an online hate classifier for multiple social media platforms. Hum-centric Comput Inf Sci 10(1):1–34. https://doi.org/10.1186/s13673-019-0205-6
19. Soni J, Mathur K (2022) Sentiment analysis based on aspect and context fusion using attention encoder with LSTM. Int J Inf Technol
20. Vrysis L, Vryzas N, Kotsakis R, Saridou T, Matsiola M, Veglis A, Arcila-Calderón C, Dimoulas C (2021) Web interface for analyzing hate speech. Future Internet 13(3):80. https://doi.org/10.3390/fi13030080
21. Yadav V, Verma P, Katiyar V (2022) Long short term memory (LSTM) model for sentiment analysis in social data for e-commerce products reviews in Hindi languages. Int J Inf Technol. https://doi.org/10.1007/s41870-022-01010-y

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.