
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI – 590018, Karnataka


INTERNSHIP REPORT
ON

“SOCIAL MEDIA SENTIMENT ANALYSIS”
Submitted in partial fulfilment for the award of the degree (18CSI85)
BACHELOR OF ENGINEERING IN
INFORMATION SCIENCE AND ENGINEERING
Submitted by :
AISHWARYA C KADAKOLE

1JB20IS004

Conducted at
VARCONS TECHNOLOGIES

S J B INSTITUTE OF TECHNOLOGY
Department of Information Science And Engineering
Accredited by NBA, New Delhi
BGS HEALTH AND EDUCATION CITY, KENGERI, BENGALURU-560060,
KARNATAKA, INDIA.

Internship report 2023-2024 1


S J B INSTITUTE OF TECHNOLOGY
Department of Information Science And Engineering
Accredited by NBA, New Delhi
BGS HEALTH AND EDUCATION CITY, KENGERI, BENGALURU-560060,
KARNATAKA, INDIA.

CERTIFICATE

This is to certify that the Internship titled “Social Media Sentiment Analysis” carried out by
Aishwarya C Kadakole (1JB20IS004), a bonafide student of SJB Institute of Technology,
in partial fulfillment for the award of Bachelor of Engineering in INFORMATION
SCIENCE AND ENGINEERING under Visvesvaraya Technological University, Belagavi,
during the year 2022-2023. It is certified that all corrections/suggestions indicated have been
incorporated in the report.
The project report has been approved as it satisfies the academic requirements in respect of
Internship prescribed for the course Internship / Professional Practice (18CSI85).

Signature of Guide Signature of HOD Signature of Principal

External Viva:

Name of the Examiner Signature with Date

1)

2)



DECLARATION

I, Aishwarya C Kadakole, a final year student of Information Science and Engineering, S J B
Institute of Technology - 560082, declare that the Internship has been successfully completed
at VARCONS TECHNOLOGIES. This report is submitted in partial fulfillment of the
requirements for the award of the Bachelor's Degree in Information Science and Engineering
during the academic year 2022-2023.

Date : 14/09/2023
Place : Bangalore

USN : 1JB20IS004
NAME : Aishwarya C Kadakole



OFFER LETTER

Date: 11th August, 2023

Name: Aishwarya C Kadakole


USN: 1JB20IS004

Dear Student,

We would like to congratulate you on being selected for the Machine Learning With Python
(Research Based) Internship position with Varcons Technologies, effective from the start date of
11th August, 2023. All of us are excited about this opportunity provided to you!

This internship is viewed as being an educational opportunity for you, rather than a part-time job.
As such, your internship will include training/orientation and focus primarily on learning and
developing new skills and gaining a deeper understanding of concepts of Machine Learning
With Python (Research Based) through hands-on application of the knowledge you learn while
you train with the senior developers. You will be bound to follow the rules and regulations of the
company during your internship duration.

Again, congratulations and we look forward to working with you!

Sincerely,

Spoorthi H C
Director
Varcons Technologies
213, 2nd Floor,
18 M G Road, Ulsoor,
Bangalore-560001



ACKNOWLEDGEMENT

This Internship is a result of accumulated guidance, direction and support of several important
persons. We take this opportunity to express our gratitude to all who have helped us to
complete the Internship.

I express my sincere thanks to Dr. K. V. Mahendra Prashanth, Principal, for providing us
adequate facilities to undertake this Internship.

I extend my sincere thanks to Dr. Shashidhara H.R, Head of the Department, Information
Science and Engineering, for providing us an opportunity to carry out Internship and for his
valuable guidance and support.

I would like to thank Rekha (Lab Assistant), Software Services, for guiding us during the period
of internship.

I express my deep and profound gratitude to my guide, Dr. Pavitra Bai S, Associate Professor, for
her keen interest and encouragement at every step in completing the Internship.

I would like to thank all the faculty members of our department for the support extended during
the course of Internship.

I would like to thank the non-teaching members of our dept, for helping us during the
Internship.

Last but not the least, I would like to thank my parents and friends, without whose constant
help the completion of the Internship would not have been possible.

AISHWARYA C KADAKOLE
1JB20IS004



ABSTRACT

In the era of digital communication, social media platforms have become indispensable for
expressing opinions, emotions, and sentiments on a wide range of topics. Understanding
and analyzing the sentiment expressed in social media data is crucial for various applications,
including brand reputation management, public opinion monitoring, and market trend
prediction. Hate speech detection is a critical research area at the intersection of natural
language processing and social responsibility. In the digital age, hate speech has proliferated
across various online platforms, causing harm to individuals and communities by perpetuating
discrimination, inciting violence, and fostering division. The objective of hate speech detection
is to develop automated systems that can identify and mitigate hate speech in text, speech, or
multimedia content. These systems employ a range of techniques, including machine learning,
deep learning, and natural language processing, to analyze and classify content as either hate
speech or non-hate speech. Researchers aim to create robust and accurate models that can
distinguish between different forms of hate speech, adapt to evolving online hate speech trends,
and operate across multiple languages and dialects. The ongoing development of hate speech
detection technologies holds the potential to enhance online safety, promote healthy digital
discourse, and protect vulnerable communities from the harmful effects of hate speech.
However, it also raises complex ethical questions related to censorship, bias, and freedom of
expression, underscoring the need for interdisciplinary collaboration and responsible AI
deployment in this field. Hate speech detection remains a necessary and evolving field, one that
strives to make the digital world safer and more inclusive for all individuals, while also raising
fundamental questions about the intersection of technology and human values.



TABLE OF CONTENTS

Sl no Description

1 Company Profile

2 About the Company

3 Introduction

4 System Analysis

5 Requirement Analysis

6 Design & Analysis

7 Implementation

8 Snapshots

9 Conclusion

10 References



CHAPTER 1
COMPANY PROFILE



COMPANY PROFILE

A Brief History of the Company


Varcons Technology was incorporated with the goal “To provide high quality and optimal
Technological Solutions to business requirements of our clients”. Every business is different
and has a unique business model, and so are its technological requirements. They understand
this, and hence the solutions provided to these requirements are different as well. They focus on
their clients' requirements and provide them with tailor-made technological solutions. They also
understand that the reach of a product to its targeted market, or the automation of an existing
process into an efficient and simple process, are the key features that their clients desire from
the technological solution they are looking for, and these are the features that they focus on while
designing the solutions for their clients.

Varcons Technology is a technology organization providing solutions for all web design and
development, MySQL, Python programming, HTML, CSS, ASP.NET and LINQ. Meeting
the ever-increasing automation requirements, Sarvamoola Software Services specializes in
ERP, Connectivity, SEO Services, Conference Management, effective web promotion and
tailor-made software products, designing solutions best suiting clients' requirements.

They strive to be the front runner in creativity and innovation in software development through
their well-researched expertise and to establish themselves as an out-of-the-box software
development company in Bangalore, India. As a software development company, they translate this
software development expertise into value for their customers through their professional solutions.

They understand that the best desired output can be achieved only by understanding the clients'
demands better. At the company, they work with their clients and help them to define their exact
solution requirements. Sometimes clients even find that they have completely redefined their
solution or new application requirements during the brainstorming session, and here the company
positions itself as an IT solutions consulting group comprising high-caliber consultants.

They believe that technology, when used properly, can help any business to scale and achieve
new heights of success. It helps improve efficiency, profitability and reliability; to put it in one
sentence, “Technology helps you to Delight your Customers”, and that is what they want to
achieve.



CHAPTER 2

ABOUT THE COMPANY



ABOUT THE COMPANY
Varcons Technologies is a leading provider of cutting-edge technologies and services,
offering scalable solutions for businesses of all sizes. Founded by a group of friends who
started by scribbling their ideas on a piece of paper, VCT today offers smart, innovative services
to dozens of clients. It develops SaaS products and provides corporate seminars, industrial
trainings and much more. Smart solutions are at the core of all that VCT does. Its main
goal is to find smart ways of using technology that will help build a better tomorrow for
everyone, everywhere. SaaS offers a variety of advantages over traditional software licensing
models, and VCT aims to include the key features of SaaS in everything it builds.

Services provided by Varcons Technologies:

• Website as Software

• Analytics and Research

• Comprehensive Customer Support

• Smart Automation Tools

• Research and Development / Improvement of ML Models

• Python

• Conference / Event Management Service

• Academic Project Guidance

• On The Job Training

• Software Training



CHAPTER 3
INTRODUCTION



INTRODUCTION
Introduction to ML
Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and models that enable computers to learn and make
predictions or decisions based on data without explicit programming. It is a rapidly evolving
field that has transformed various industries, from healthcare and finance to entertainment and
autonomous vehicles. Machine learning has become an essential tool for solving complex
problems and extracting valuable insights from vast datasets.

Key Concepts in Machine Learning:

1. Data: Data is the foundation of machine learning. ML algorithms learn patterns and make
predictions by analyzing historical data. Data can be structured (e.g., tables, databases) or
unstructured (e.g., text, images, audio).

2. Algorithm: Machine learning algorithms are mathematical models that learn
from data. They can be categorized into supervised, unsupervised, and reinforcement
learning, depending on the nature of the learning process.

3. Training: In supervised learning, a model is trained using labeled data, where the
algorithm learns to map inputs to outputs. Unsupervised learning involves finding
patterns or structures in unlabeled data. Reinforcement learning focuses on learning
optimal actions in a given environment through trial and error.

4. Features: Features are the attributes or characteristics extracted from data that help the
model understand and make predictions. Feature engineering involves selecting and
transforming relevant features.

5. Model Evaluation: The performance of ML models is assessed using various metrics like
accuracy, precision, recall, F1-score, and more. Cross-validation helps estimate how well a
model generalizes to new, unseen data.



6. Overfitting and Underfitting: Overfitting occurs when a model learns the training data too
well but fails to generalize to new data. Underfitting, on the other hand, is when the model is
too simplistic to capture the underlying patterns.

7. Hyperparameters: Machine learning models often have hyperparameters that are set before
training begins. Tuning these hyperparameters is crucial to optimize a model's
performance.
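The cross-validation idea from item 5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the report's code: `k_fold_indices`, `cross_validated_accuracy` and the majority-class `majority_baseline` are hypothetical names, and a real project would more likely use a library such as scikit-learn.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold and fold sizes differ by at
    most one. Indices are taken in order, so shuffle data sorted by label first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def majority_baseline(train_texts: Sequence[str], train_labels: Sequence[int],
                      test_texts: Sequence[str]) -> List[int]:
    """Hypothetical stand-in model: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * len(test_texts)


def cross_validated_accuracy(
    texts: Sequence[str],
    labels: Sequence[int],
    train_and_predict: Callable[[Sequence[str], Sequence[int], Sequence[str]], List[int]],
    k: int = 5,
) -> float:
    """Train on k-1 folds, score accuracy on the held-out fold, average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        predictions = train_and_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

With labels sorted as five 0s followed by five 1s and `k=2`, each training fold contains only one class, so the baseline scores 0.0 on every held-out fold. Shuffling the data (or using stratified folds) avoids this kind of misleading estimate.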

Problem Statement

The rapid proliferation of hate speech in online environments, particularly on social media platforms,
poses a significant and growing societal concern. Hate speech, characterized by offensive,
discriminatory, or threatening language targeting individuals or groups based on attributes such as
race, ethnicity, religion, gender, or sexual orientation, has severe consequences for individuals and
communities. It fosters toxicity, exacerbates social divisions, and can even incite violence. Thus, the
central challenge is to develop robust and efficient hate speech detection methods that can
automatically identify and mitigate hate speech instances in digital content.

The key aspects of this problem include the following challenges:

Context Sensitivity: Hate speech detection is challenging due to its context-dependent nature. What
may be considered hate speech in one context might not be in another, making it difficult
to create universally applicable detection algorithms.

Evolving Language and Tactics: Hate speech constantly adapts and evolves, incorporating subtle
linguistic changes and new tactics to avoid detection. This necessitates the continuous updating of
detection models.

Ethical Dilemmas: Striking a balance between curbing hate speech and preserving freedom of
expression is a complex ethical challenge. Overly aggressive detection systems can suppress legitimate
speech, raising concerns about censorship.

Data Quality and Labeling: Developing accurate hate speech detection models requires high-quality
training data, often obtained by human annotators. Ensuring that this data is representative and free from
bias is a significant challenge.



CHAPTER 4
SYSTEM ANALYSIS



SYSTEM ANALYSIS

1. Existing System

Rule-Based Systems: These systems use predefined rules and patterns to identify hate speech. They
often rely on keywords, phrases, and regular expressions. While simple and interpretable, they may
struggle with context-dependent hate speech.
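A rule-based detector of the kind described above can be sketched with Python's standard `re` module. The blocklist terms are hypothetical placeholders rather than a real lexicon. The last usage line shows the context problem the paragraph mentions: a sentence that merely quotes a term in order to condemn it is flagged too.

```python
import re
from typing import Iterable, List

# Hypothetical placeholder terms; a real system would load a curated lexicon.
BLOCKLIST = ["badword", "slur_a", "slur_b"]


def build_pattern(terms: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # re.escape keeps punctuation in terms literal; \b stops a term from
    # matching inside a longer token (e.g. "badword" inside "notbadwordy").
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERN = build_pattern(BLOCKLIST)


def rule_based_flag(text: str) -> bool:
    """Flag the text if any blocklisted term appears in it."""
    return PATTERN.search(text) is not None


def matched_terms(text: str) -> List[str]:
    """Return the blocklisted terms found, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in PATTERN.finditer(text)]


print(rule_based_flag("You are a BADWORD!"))                   # True
print(rule_based_flag("notbadwordy"))                          # False
print(rule_based_flag('Calling someone "badword" is wrong'))   # True: context is ignored
```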

Pretrained Language Models: Pretrained language models, such as BERT and GPT, have gained
popularity for hate speech detection. Fine-tuning these models on labeled hate speech datasets can lead
to state-of-the-art performance.

2. Proposed System

The proposed hate speech detection system aims to leverage advanced natural language processing (NLP)
and machine learning techniques to automatically identify and mitigate hate speech in various forms of
digital content. The system will be designed to operate across multiple online platforms, including social
media, forums, and messaging apps, with a focus on text-based content.

Here is an outline of the key components and features of the proposed system:
Data Collection and Preprocessing: Gather a diverse dataset of text and multimedia content containing
examples of hate speech and non-hate speech.
Preprocess the data to clean and standardize it, including text normalization, tokenization, and removal of
irrelevant content.
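For tweet-style text, the preprocessing step could look like the following sketch. The exact cleaning rules are assumptions for illustration: it drops URLs, @mentions, HTML entities such as `&amp;`, a leading retweet marker, digits and punctuation. Stop-word removal and lemmatization are left out, and because only the letters a to z are kept, the rules would need changing for non-English text.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HTML_ENTITY_RE = re.compile(r"&\w+;")
RETWEET_RE = re.compile(r"^rt\b\s*")     # applied after lowercasing
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, '#' and non-ASCII letters


def preprocess(text: str) -> List[str]:
    """Normalize a raw post and split it into lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)              # remove links
    text = MENTION_RE.sub(" ", text)          # remove @user mentions
    text = HTML_ENTITY_RE.sub(" ", text)      # remove entities like &amp;
    text = RETWEET_RE.sub("", text.strip())   # remove a leading "rt"
    text = NON_LETTER_RE.sub(" ", text)       # keep only a-z; "#hate" becomes "hate"
    return text.split()                       # whitespace tokenization


print(preprocess("RT @user: Check this http://t.co/abc &amp; #Hate123 NOW!!"))
# ['check', 'this', 'hate', 'now']
```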

Feature Extraction: Utilize state-of-the-art NLP techniques to extract meaningful features from textual
content, such as word embeddings and contextual embeddings (e.g., BERT). For multimedia content,
extract relevant features from images, audio, or video, as applicable.



3. Objective of the System

The objectives of hate speech detection are multifaceted, encompassing both technical and societal
aims. These objectives aim to address the growing challenges posed by hate speech in digital spaces and
promote a safer and more inclusive online environment. Here are the key objectives of hate speech
detection:

Identification and Mitigation: The primary objective is to automatically identify and mitigate hate
speech in digital content, including text, images, audio, and video. This involves the development of
algorithms and models capable of accurately classifying content as hate speech or non-hate speech.

Enhanced Online Safety: To enhance the safety of online communities, platforms, and individuals by
reducing the prevalence of harmful content, thereby protecting users from the psychological and
emotional harm caused by hate speech.

Maintaining Civil Discourse: Hate speech can disrupt constructive conversations and debates. By
detecting and moderating hate speech, platforms and communities can encourage civil discourse and
productive discussions.

Compliance with Laws and Policies: In many jurisdictions, hate speech is illegal and can lead to legal
consequences. Detecting hate speech helps online platforms comply with local laws and their own
content moderation policies.

Improving Algorithmic Fairness: Many social media platforms use algorithms to curate content.
Detecting and addressing hate speech is crucial to ensure these algorithms do not inadvertently promote
or amplify hateful content.



CHAPTER 5

REQUIREMENT ANALYSIS



5. REQUIREMENT ANALYSIS

Hardware Requirement Specification

The most common set of requirements defined by any operating system or software
application is the physical computer resources, also known as hardware.

• Processor : Intel Core i3 or higher

• RAM : 4 GB

• Hard Disk : 500 GB

• Input Device : Standard Keyboard and Mouse

• Compact Disk : 650 MB

• Output Device : High Resolution Monitor

Software Requirement Specification

Software requirements deal with defining software resource requirements and
prerequisites that need to be installed on a computer to provide optimal functioning
of an application.

The following are the software requirements for the application:

• OS : Windows 7 and above

• Back end : Python 3.7.5

• Dataset : Twitter

• IDE : Google Colaboratory and Jupyter Notebook



CHAPTER 6
DESIGN ANALYSIS



DESIGN ANALYSIS

A design analysis of a hate speech detection system involves breaking down the system into its key
components and analyzing the design choices made for each component. Here's a high-level design
analysis of a hate speech detection system:

Data Collection and Preprocessing:

Design Choice: Collect a diverse dataset of hate speech and non-hate speech content from various sources
and preprocess it to prepare it for model training.
Analysis: Ensure that the dataset is representative of the content found on the targeted platforms and that
preprocessing techniques do not inadvertently remove important information.

Feature Extraction:

Design Choice: Extract relevant features from textual, audio, or visual content to represent the data in a
format suitable for machine learning models.
Analysis: Select appropriate feature extraction techniques, such as word embeddings for text, audio
spectrograms for audio, and convolutional neural networks (CNNs) for images. Ensure that the chosen
features capture the nuances of hate speech effectively.



Machine Learning Models:

Design Choice: Develop machine learning models, including deep learning architectures, to classify content
as hate speech or non-hate speech.
Analysis: Evaluate the choice of model architecture (e.g., LSTM, CNN, BERT) based on performance
metrics and computational resources. Consider ensembling or transfer learning for improved results.

Multi-Modal Detection:

Design Choice: Extend the system to support multi-modal detection, allowing it to analyze textual, audio,
and visual content simultaneously.
Analysis: Ensure that the integration of different modalities is seamless and that the model can effectively
combine information from multiple sources to make accurate classifications.

Bias Mitigation and Fairness:

Design Choice: Implement techniques to mitigate bias in the training data and model predictions, ensuring
fairness in content classification.
Analysis: Regularly assess the system's performance across different demographic groups to detect and
address bias. Use fairness-aware machine learning methods as needed.
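One simple way to assess performance across groups, as the analysis above suggests, is to compare per-group false positive rates, that is, how often non-hateful posts are wrongly flagged. The sketch below assumes each evaluation record carries a group tag. The group names are hypothetical, and the false positive rate is only one of several fairness measures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def false_positive_rate_by_group(
    records: Iterable[Tuple[str, int, int]],
) -> Dict[str, float]:
    """records are (group, true_label, predicted_label) with 1 = hate speech.

    FPR = FP / (FP + TN), computed over each group's non-hateful examples only.
    Groups with no non-hateful examples are omitted.
    """
    false_positives: Dict[str, int] = defaultdict(int)
    negatives: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:
            negatives[group] += 1
            if y_pred == 1:
                false_positives[group] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


records = [
    ("dialect_a", 0, 1), ("dialect_a", 0, 0),   # hypothetical group labels
    ("dialect_b", 0, 0), ("dialect_b", 0, 0), ("dialect_b", 1, 1),
]
rates = false_positive_rate_by_group(records)
print(rates)                                       # {'dialect_a': 0.5, 'dialect_b': 0.0}
print(max(rates.values()) - min(rates.values()))   # 0.5: a large gap suggests bias
```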

User Interface:

Design Choice: Develop a user-friendly interface that integrates with online platforms for reporting and
moderating hate speech content.
Analysis: Ensure that the user interface is intuitive, accessible, and provides real-time feedback and alerts to
platform administrators and users when necessary.

Transparency and Explainability:

Design Choice: Implement features that provide users with explanations for content classifications and
maintain logs for auditing.
Analysis: Ensure that the explanations provided are clear and informative, and that the auditing logs are
secure and comprehensive for transparency and accountability.



Evaluation and Metrics:

Design Choice: Define and measure performance metrics, such as accuracy, precision, recall, and F1 score,
to assess the system's effectiveness.
Analysis: Continuously evaluate the system's performance using benchmark datasets and real-world data to
fine-tune the model and ensure it meets performance goals.
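The metrics named above follow directly from confusion-matrix counts. This plain-Python sketch assumes label 1 is the hate speech class; libraries such as scikit-learn provide equivalent functions. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. F1 is usually more informative than accuracy here because hateful posts are typically a minority class, so a model that never flags anything can still score a high accuracy.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Confusion-matrix counts plus accuracy, precision, recall and F1 for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

On these labels, a model that predicts 0 for every post still reaches 0.625 accuracy while its recall and F1 are both 0.0.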

Deployment and Scaling:

Design Choice: Ensure the system can be deployed at scale to handle a large volume of content and users.
Analysis: Consider cloud-based solutions and load balancing strategies to handle increasing traffic and
ensure system reliability.

Ethical Considerations:

Design Choice: Establish clear guidelines for content moderation and user privacy.
Analysis: Continuously engage with stakeholders to address ethical concerns and adapt the system's design
to evolving ethical standards and regulations.

Monitoring and Feedback Loop:

Design Choice: Implement a feedback loop to collect user feedback and continuously improve the system's
performance and fairness.
Analysis: Regularly review user feedback and iterate on the system's design to address user concerns and
improve overall effectiveness.

Scalability and Future Enhancements:

Design Choice: Plan for future enhancements and scalability as the system evolves.
Analysis: Consider the potential for incorporating emerging technologies and techniques to further enhance
hate speech detection capabilities.



Model Selection:
Choose a machine learning or deep learning model for hate speech detection, such as LSTM, CNN,
BERT, or a combination of these.
Train multiple models and evaluate their performance using appropriate metrics like F1-score, precision,
and recall.

Regular Updates:
Continuously collect and annotate new data containing examples of hate speech. This data should reflect the
latest forms of hate speech, slang, and context-specific expressions. Gathering diverse datasets is important
to ensure the detection models generalize well.

User-Friendly Reporting Mechanism:


Make it easy for users to report hate speech or offensive content. Implement clear and accessible reporting
buttons or forms on your platform or website.

Legal and Privacy Compliance:


Understand and comply with relevant data privacy regulations, such as the European Union's General Data
Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA). Ensure that data collection
and processing practices adhere to these laws.

Feature Extraction:

Bag of Words (BoW) : BoW represents a document by the counts of the words it contains, ignoring word
order. Each word in the vocabulary is treated as a feature, and the frequency of that word in the text is
used as its value. BoW can be extended to include n-grams (sequences of n words) to capture local context.
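A bag of n-grams can be built with only the standard library, as in the sketch below. The function names are illustrative, not from a specific library. The example shows why bigrams help: "not good" becomes its own feature, so the negation is not lost as it would be with single words alone.

```python
from collections import Counter
from typing import Dict, List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> List[str]:
    """Contiguous n-word sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(tokens: Sequence[str], max_n: int = 2) -> Counter:
    """Count every n-gram for n = 1..max_n (unigrams plus, by default, bigrams)."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def build_vocabulary(documents: Sequence[Sequence[str]], max_n: int = 2) -> Dict[str, int]:
    """Map each n-gram seen in the training documents to a column index."""
    terms = sorted({term for doc in documents for term in bag_of_ngrams(doc, max_n)})
    return {term: index for index, term in enumerate(terms)}


def vectorize(tokens: Sequence[str], vocabulary: Dict[str, int], max_n: int = 2) -> List[int]:
    """Count vector over the vocabulary; n-grams not in the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for term, count in bag_of_ngrams(tokens, max_n).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector


docs = [["not", "good"], ["good", "good"]]
vocab = build_vocabulary(docs)
print(vocab)                      # {'good': 0, 'good good': 1, 'not': 2, 'not good': 3}
print(vectorize(docs[0], vocab))  # [1, 0, 1, 1]
print(vectorize(docs[1], vocab))  # [2, 1, 0, 0]
```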

TF-IDF (Term Frequency-Inverse Document Frequency) : TF-IDF is a technique that assigns a weight to
each word in a document based on its frequency in the document (TF) and its rarity across all documents
(IDF). It helps identify important words in a document.
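One common form of the weighting is tfidf(t, d) = tf(t, d) * log(N / df(t)), where tf is the term's share of the tokens in document d, N is the number of documents and df(t) is the number of documents containing t. The sketch below implements exactly that variant. Libraries differ in the details (scikit-learn's `TfidfVectorizer`, for instance, uses raw counts, a smoothed IDF and L2 normalization by default), so its numbers will not match theirs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    df: Counter = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["you", "are", "vile"], ["you", "are", "kind"]]
for row in tf_idf(docs):
    print(row)
```

Here "you" and "are" occur in every document, so their weight is log(1) = 0. "vile" and "kind" each get (1/3) * ln 2, roughly 0.231, which is what makes them the distinguishing words.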

Word Embeddings : Word embeddings like Word2Vec, GloVe, or fastText represent words as dense, fixed-
length vectors. These embeddings capture semantic relationships between words and can be used as features
in machine learning models.
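A simple, common way to turn word embeddings into a fixed-length feature vector for a whole post is to average the vectors of its words. The three-dimensional vectors below are made up purely for illustration. Real Word2Vec, GloVe or fastText vectors are learned from large corpora and have far more dimensions (often 100 to 300). Averaging also discards word order, which is one reason contextual models such as BERT are used as well.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy vectors, not real pretrained embeddings.
EMBEDDINGS: Dict[str, List[float]] = {
    "awful": [0.9, 0.1, 0.0],
    "terrible": [0.8, 0.2, 0.1],
    "lovely": [-0.7, 0.6, 0.1],
}


def document_vector(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: near 1 for similar directions, negative for opposing ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(cosine(EMBEDDINGS["awful"], EMBEDDINGS["terrible"]))  # close to 1
print(cosine(EMBEDDINGS["awful"], EMBEDDINGS["lovely"]))    # negative
print(document_vector(["so", "awful", "lovely"], EMBEDDINGS))
```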

Character-level Features : Extract features based on character-level information, such as character n-grams
or the presence of specific characters or symbols. This can help capture unique patterns in hate speech.
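Character n-grams let a model connect deliberately obfuscated spellings to the original word. A variant such as "h4te" or "haaate" shares no whole word with "hate" but still shares some character sequences with it. The sketch below pads each text with spaces so that n-grams also capture where a word starts and ends. The overlap score is a simple illustrative measure, not a standard library function.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "  # spaces mark the start and end of the text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def shared_ngram_ratio(a: str, b: str, n: int = 3) -> float:
    """Fraction of character n-grams two strings have in common (0.0 to 1.0)."""
    grams_a, grams_b = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return overlap / max(sum(grams_a.values()), sum(grams_b.values()), 1)


print(char_ngrams("hate"))                   # [' ha', 'hat', 'ate', 'te ']
print(shared_ngram_ratio("hate", "haaate"))  # 0.5
print(shared_ngram_ratio("hate", "love"))    # 0.0
```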



CHAPTER 7

IMPLEMENTATION



7. IMPLEMENTATION

Implementation is the stage where the theoretical design is turned into a working system. It is the
most crucial stage in achieving a successful new system and in giving users confidence that the new
system will work efficiently and effectively.

The system can be implemented only after thorough testing is done and it is found to work
according to the specification. Apart from planning, implementation involves careful investigation
of the current system and its constraints on implementation, design of methods to achieve the
changeover, and an evaluation of changeover methods.

Two major tasks of preparing the implementation are education and training of the users and
testing of the system. The more complex the system being implemented, the more involved
will be the system analysis and design effort required just for implementation.

The implementation phase comprises several activities. The required hardware and
software are acquired. The system may require some software to be developed; for this,
programs are written and tested. The user then changes over to the new, fully tested
system and the old system is discontinued.

TESTING
The testing phase is an important part of software development. A systematic testing process
helps find errors and missing operations, and provides a complete verification to determine
whether the objectives are met and the user requirements are satisfied. Software testing is
carried out in three steps:

1. The first step is unit testing, wherein each module is tested to verify its correctness and
validity, to detect any missing operations and to verify whether the objectives
have been met. Errors are noted down and corrected immediately.

2. Unit testing is an important and major part of the project, because errors can be rectified
easily within a particular module and program clarity is increased. In this project, the entire
system is divided into several modules that are developed individually, so unit testing is
conducted on the individual modules.

3. The second step is integration testing. Software whose modules show perfect results when
run individually will not necessarily show perfect results when run as a whole, so the
modules are also tested together.
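Unit testing of an individual module can be illustrated with Python's built-in `unittest` framework. The function under test, `label_from_score`, is a hypothetical helper that maps a model's hate speech probability to a label using an assumed 0.5 threshold; it is not part of the report's code. Each test checks one behaviour, including an edge case and invalid input.

```python
import unittest


def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Hypothetical helper: map a hate speech probability to a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    return "hate" if score >= threshold else "not_hate"


class LabelFromScoreTest(unittest.TestCase):
    def test_high_score_is_hate(self):
        self.assertEqual(label_from_score(0.91), "hate")

    def test_low_score_is_not_hate(self):
        self.assertEqual(label_from_score(0.12), "not_hate")

    def test_threshold_is_inclusive(self):
        # Edge case: a score exactly at the threshold counts as hate speech.
        self.assertEqual(label_from_score(0.5), "hate")

    def test_invalid_score_is_rejected(self):
        with self.assertRaises(ValueError):
            label_from_score(1.7)
```

Saved as, for example, `test_labels.py`, the tests can be run with `python -m unittest test_labels`. Integration tests would then exercise several such modules together, for example preprocessing, feature extraction and the classifier end to end.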



CHAPTER 8
SNAPSHOTS



8. SNAPSHOTS

1. HATE SPEECH DETECTION DATASET

2. CLASS DISTRIBUTION IN RAW DATASET



3. CONFUSION MATRIX FOR TF-IDF

4. PERFORMANCE METRICS FOR CLASSIFICATION MODELS

5. CONFUSION MATRIX FOR SENTIMENT ANALYSIS

6. PERFORMANCE METRICS OF CLASSIFICATION MODELS (SENTIMENT ANALYSIS)



CHAPTER 9
CONCLUSION



CONCLUSION

Significance: Hate speech detection is of paramount importance in the digital age due to the widespread
proliferation of harmful content on online platforms.

Protection of Individuals and Communities: The primary goal of hate speech detection is to safeguard
individuals and communities from the detrimental effects of hate speech, including psychological harm,
discrimination, and violence.

Technological Advancements: The development of hate speech detection systems relies on cutting-edge
technologies such as natural language processing and machine learning, which have the potential to
automatically identify and mitigate hate speech at scale.

Ongoing Challenges: The field faces ongoing challenges, including the need to adapt to ever-evolving
language trends, mitigate algorithmic biases, and address ethical concerns related to freedom of expression
and censorship.

Balancing Act: Striking a balance between content moderation and preserving open discourse is a complex
ethical dilemma inherent in hate speech detection.

Positive Impact: Hate speech detection systems, when deployed responsibly, can promote healthier and
more inclusive online environments, fostering constructive digital discourse.

Ethical Considerations: The ethical considerations surrounding these systems underscore the importance of
transparency, fairness, and continuous improvement to uphold core values of respect and diversity.

Responsible AI: Hate speech detection exemplifies the need for responsible AI deployment, where
technology is harnessed to combat harm while respecting fundamental human rights.

Collective Responsibility: Addressing hate speech is a collective responsibility that involves collaboration
between technology developers, online platforms, policymakers, and society at large.



CHAPTER 10
REFERENCES



REFERENCES

1. Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech
Detection and the Problem of Offensive Language. arXiv preprint arXiv:1703.04009.
This research paper discusses the challenges of hate speech detection and presents an
approach using machine learning techniques.
2. Waseem, Z., & Hovy, D. (2016). Hateful Symbols or Hateful People? Predictive Features for
Hate Speech Detection on Twitter. In Proceedings of NAACL-HLT (pp. 88-93).
This paper explores predictive features for hate speech detection on Twitter, offering insights
into the nature of hateful content.
3. Fortuna, P., Nunes, S., & Rodrigues, P. (2018). A survey on automatic detection of hate speech
in text. ACM Computing Surveys (CSUR), 51(4), 1-30.
This survey provides a comprehensive overview of various techniques and approaches for
automatic hate speech detection in text.
4. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language
detection in online user content. In Proceedings of NAACL-HLT (pp. 99-105).
The paper discusses the problem of abusive language detection, a subset of hate speech
detection, and presents techniques for identifying such content.
5. Hate Speech Detection Kaggle Challenge.
Kaggle often hosts challenges related to hate speech detection, and the associated datasets and
solutions provided by participants can be valuable resources for research.
6. Basile, V., Fersini, E., Nozza, D., & Patti, V. (2019). Overview of the Evalita 2018 Hate
Speech Detection Task. In Proceedings of Evalita (pp. 1-7).
This paper provides an overview of a hate speech detection task in the Evalita competition,
which can be a useful reference for benchmark datasets and evaluation metrics.
