Machine Learning Based Sarcasm Detection On Twitter Data
Abstract: Sarcasm is a subtle form of irony that is widely used in social networks. It is usually used to convey hidden meaning or to criticize and ridicule a person, and it is difficult to recognize. A sarcasm recognition system is therefore very helpful for improving automatic sentiment analysis of data collected from social networks and microblogging sites. Sentiment analysis refers to identifying and aggregating the attitudes and opinions expressed by internet users of a particular community. In this paper, a pattern-based approach is proposed to detect sarcasm in Twitter data. Four sets of features that cover many specific forms of sarcasm are proposed and used to classify tweets as sarcastic or non-sarcastic. The proposed feature sets are studied, and the added value of each to the classification is evaluated.
Keywords: Sarcasm detection, Twitter, Sentiment analysis, Machine learning.
I. INTRODUCTION
Today Twitter is one of the largest social networks, where people share their opinions and thoughts. Twitter has 288 million active users, who send 500 million tweets daily [1]. However, because of the informal language used on Twitter and its character limit (140 characters per tweet), it is very difficult to understand users' opinions and to analyze them. The presence of sarcasm makes this even harder: a person is sarcastic when they say something different from what they mean [2].
The Oxford dictionary defines sarcasm as "the use of irony to mock or convey contempt". The Free Dictionary also describes sarcasm as irony intended to convey contempt [3]. Detecting sarcasm is very difficult even in real life.
As a rule, people use sarcasm in everyday life not only in jokes and humor but also to criticize or comment on ideas, people, and events. Sarcasm is therefore widely used in social networks, in particular on microblogging sites such as Twitter. As a result, modern approaches to sentiment analysis and opinion mining usually perform worse when analyzing data collected from such sites. Maynard and Greenwood [4] show that the effectiveness of sentiment analysis can be significantly improved when sarcasm is detected in sarcastic statements. Effective means of detecting sarcasm are therefore required, since identifying sarcasm directly helps sentiment analysis performed on microblogging sites such as Twitter.
Sentiment analysis and opinion mining rely on emotional words to detect the polarity of a text (that is, whether it leans toward "positivity" or "negativity"). However, the literal wording of a sarcastic text can be misleading [5], [6]. The aim of this paper is to propose a system that automatically detects sarcastic tweets.
II. RELATED WORK
In recent years, researchers have paid growing attention to sentiment analysis on Twitter, and a number of works have addressed the classification of tweets. Sriram [7] classifies tweets into a predefined set of generic classes, including events, opinions, deals and private messages, using non-contextual features such as the presence of slang, phrases about time-sensitive events, opinionated words, and information about the Twitter user. The authors of [8], [9] proposed a method for identifying emotional and verbal patterns in Twitter data. However, most of this work classifies tweets according to the polarity of the user's sentiment toward specific topics, focusing on the content of the tweet. Various features have been proposed, including the presence of bigrams [10], word frequency, and non-textual features such as emoticons [11], [12]. The authors of [13] define a framework that learns to classify words and the emotions they carry in context.
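As a rough illustration of polarity classification from emotional words and emoticons, and of why sarcasm defeats it, the following minimal Python sketch scores a tweet by counting positive and negative cues. The word lists, emoticon sets, and function names are hypothetical and chosen only for illustration; this is not the method of any cited work.

```python
import re
from typing import List

# Hypothetical toy lexicons, for illustration only; real systems use
# large curated sentiment resources.
POSITIVE_WORDS = {"love", "great", "funny", "wonderful", "happy"}
NEGATIVE_WORDS = {"hate", "awful", "sad", "terrible", "ignored"}
POSITIVE_EMOTICONS = {":)", ":-)"}
NEGATIVE_EMOTICONS = {":(", ":-(", "-_-"}

# Emoticon alternatives come before the word pattern so each emoticon is
# matched as a single token; text is lowercased before matching.
TOKEN_RE = re.compile(r":-?\)|:-?\(|-_-|[a-z']+")


def tokenize(text: str) -> List[str]:
    """Split a tweet into lowercase words and known emoticons."""
    return TOKEN_RE.findall(text.lower())


def polarity_score(text: str) -> int:
    """Number of positive cues minus number of negative cues."""
    score = 0
    for tok in tokenize(text):
        if tok in POSITIVE_WORDS or tok in POSITIVE_EMOTICONS:
            score += 1
        elif tok in NEGATIVE_WORDS or tok in NEGATIVE_EMOTICONS:
            score -= 1
    return score


def polarity_label(text: str) -> str:
    """Map the cue count to a positive/negative/neutral label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this phone :)",
                  "I just love waiting in line for hours",
                  "You are incredibly funny -_-"]:
        print(tweet, "->", polarity_label(tweet))
```

A sarcastic tweet such as "I just love waiting in line for hours" is labelled positive because its only cue word is "love", which is exactly the failure mode that motivates a dedicated sarcasm detector.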
Sarcasm has been used in everyday conversation for a very long time, and it has therefore been the subject of extensive research from psychological [14] and neurobiological [15] perspectives.
It has also been studied as a linguistic behavior that characterizes a person. In [16], Burfoot and Baldwin introduced a set of features, including profanity and slang usage and what they call "semantic validity", and
Authorized licensed use limited to: Texas Tech University. Downloaded on May 24,2022 at 13:52:13 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)
IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1
Fig. 2. Flow of handling of Hindi tweets
When the algorithm is applied to the data, the machine learns from the type of data it is given: tweets are provided as input, and the output is positive, negative, or neutral. Because the machine learns from the data itself, the language of the input tweets does not matter; only the output does.
B. Feature Extraction
Next, features are extracted from the data. The four extracted features are as follows:
E.g.: "You are incredibly funny -_-"
4) Pattern-related features
The patterns selected in the previous subsection, qualified as "general ironic expressions", are very common, even in conversation. However, they are few in number and not unique, and our training and test sets largely do not include them.
In this approach, words are classified into two categories, high-frequency words and content words, based on their frequency in the data. A pattern is then defined as an ordered sequence of high-frequency words and slots for content words.
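The split into high-frequency words and content words, and the resulting patterns, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, since exact thresholds are not given here: a word counts as high-frequency when its relative corpus frequency reaches a hypothetical threshold `min_ratio`, every other word becomes a content-word slot `CW`, and patterns are fixed-length windows containing at least one word of each kind. The function names, the threshold, the window length, and the simple match-count feature are all illustrative choices, not the paper's exact definitions.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

CW = "CW"  # content-word slot marker; uppercase, so it never equals a lowercased token


def tokenize(text: str) -> List[str]:
    """Lowercase a tweet and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def high_frequency_words(corpus: Iterable[str], min_ratio: float) -> Set[str]:
    """Words whose relative frequency in the corpus is at least min_ratio.

    min_ratio is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(tok for tweet in corpus for tok in tokenize(tweet))
    total = sum(counts.values())
    return {w for w, c in counts.items() if c / total >= min_ratio}


def abstract(tweet: str, hfws: Set[str]) -> List[str]:
    """Keep high-frequency words; replace every content word with a CW slot."""
    return [tok if tok in hfws else CW for tok in tokenize(tweet)]


def extract_patterns(tweet: str, hfws: Set[str],
                     length: int = 3) -> Set[Tuple[str, ...]]:
    """Ordered windows that mix at least one HFW with at least one CW slot."""
    seq = abstract(tweet, hfws)
    patterns = set()
    for i in range(len(seq) - length + 1):
        window = tuple(seq[i:i + length])
        if CW in window and any(t != CW for t in window):
            patterns.add(window)
    return patterns


def pattern_feature(tweet: str, hfws: Set[str],
                    known: Set[Tuple[str, ...]]) -> int:
    """Simplified feature: how many known (e.g. sarcastic) patterns the tweet contains."""
    return len(extract_patterns(tweet, hfws) & known)


if __name__ == "__main__":
    corpus = ["I love being ignored", "I love waiting for the bus",
              "I love the weather today", "The bus is late again"]
    hfws = high_frequency_words(corpus, min_ratio=0.12)
    print(sorted(hfws))
    sarcastic = extract_patterns("I love being ignored", hfws)
    print(pattern_feature("I love waiting for the bus", hfws, sarcastic))
```

With this toy corpus, "i", "love" and "the" are the high-frequency words, so "I love being ignored" abstracts to `i love CW CW`. Counting exact pattern matches keeps the sketch short; a fuller approach might also credit partial or sparse matches between a tweet and a learned pattern.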