Fake News Synopsis 1
SESSION: 2022-2023
In the modern era, where the internet is ubiquitous, most people rely on online
resources for news. With the growing use of social media platforms such as Facebook
and Twitter, news spreads among millions of users within a very short span of time.
The spread of fake news has far-reaching consequences, from the creation of biased
opinions to the swaying of election outcomes in favour of certain candidates.
Moreover, spammers use appealing news headlines as clickbait to generate
advertising revenue. In this paper, we aim to perform binary classification of news
articles available online using concepts from Artificial Intelligence, Natural
Language Processing and Machine Learning. We aim to give the user the ability to
classify a news item as fake or real and to check the authenticity of the website
publishing it.
The goal of the research is to examine how deception detection based on support
vector machines and the Naive Bayes classifier performs on this problem, given a
manually labelled news dataset, and to support (or not) the idea of using AI for
fake news detection.
2. Introduction
Fake news detection is a subtask of text classification and is often defined as the task
of classifying news as real or fake. The term ‘fake news’ refers to false or
misleading information that is presented as real news with the aim of deceiving or
misleading people.
3. Problem definition
Given a multi-source news dataset and social contexts of news consumers (social media
users), the task of fake news detection is to determine if a news item is fake or real.
Formally, we define the problem of fake news detection as:
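One way to write this down, in the spirit of the data-mining formulation in [1], is as
a binary prediction function F over news items a. Both symbols are notation assumed
here for illustration and are not fixed elsewhere in this synopsis.

```latex
\[
F(a) =
\begin{cases}
1, & \text{if } a \text{ is a piece of fake news},\\
0, & \text{otherwise.}
\end{cases}
\]
```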
Now that news reaches us from so many sources, it is each person’s responsibility
not to share misleading information. We consider the spread of fraudulent news, such
as spam messages, funding news or any other false information, to be a serious issue.
It is, however, extremely difficult to tell which social media profiles or users are
fraudulent, because they replicate information so that it looks like the original. As
technology has evolved and machine intelligence has emerged, anyone can use the
available tools to create and disseminate fraudulent news. People who are new to
digital media, including those who are illiterate, are inexperienced and therefore
more likely to believe fraudulent news and act on it in their lives. To minimise
this, we have devised a simple web application which statistically detects false
information as well as real news.
4. Related work
(i) Mykhailo Granik et al. [2] present a simple approach to fake news
detection using a naive Bayes classifier. The approach was implemented as a
software system and tested against a dataset of Facebook news posts. The posts
were collected from three large Facebook pages each from the right and from the
left, as well as three large mainstream political news pages (Politico, CNN, ABC
News). They achieved a classification accuracy of approximately 74%.
Classification accuracy for fake news alone is slightly worse, which may be
caused by the skewness of the dataset: only 4.9% of it is fake news.
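To illustrate why a skewed dataset can hide weak performance on the minority class,
the following minimal sketch computes overall and per-class accuracy. The 1,000-post
test set and the "label everything as real" baseline are hypothetical and chosen only
to mirror the 4.9% share mentioned above; they are not taken from the cited paper.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and the accuracy within each true class."""
    correct = Counter()
    total = Counter()
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    overall = sum(correct.values()) / len(y_true)
    by_class = {label: correct[label] / total[label] for label in total}
    return overall, by_class


# Hypothetical skewed test set: 49 fake posts out of 1000 (4.9%).
y_true = ["fake"] * 49 + ["real"] * 951
# A degenerate baseline that labels every post as real.
y_pred = ["real"] * 1000

overall, by_class = per_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {overall:.3f}")
print(f"accuracy on fake: {by_class['fake']:.3f}")
print(f"accuracy on real: {by_class['real']:.3f}")
```

The baseline scores well overall while detecting no fake posts at all, which is why
per-class figures are worth reporting alongside a single accuracy number.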
(ii) Himank Gupta et al. proposed a framework based on several machine learning
approaches that addresses problems including low accuracy, time lag
(BotMaker) and the high processing time needed to handle thousands of tweets
per second. First, they collected 400,000 tweets from the HSpam14 dataset, which
they then characterised as 150,000 spam tweets and 250,000 non-spam tweets.
They also derived some lightweight features along with the top 30 words that
provide the highest information gain from a Bag-of-Words model. They were able
to achieve an accuracy of 91.65% and surpassed the existing solution by
approximately 18%.
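Selecting words by information gain can be sketched as follows. This is an
illustrative sketch only: it assumes a binary "word present / word absent" feature per
tweet, and the toy tweets, the labels and the helper names (entropy, information_gain,
top_k_words) are hypothetical rather than taken from the cited work.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs in a document."""
    with_word = [y for d, y in zip(docs, labels) if word in d.lower().split()]
    without_word = [y for d, y in zip(docs, labels) if word not in d.lower().split()]
    n = len(labels)
    # Weighted entropy of the two groups; empty groups contribute nothing.
    remainder = sum(
        len(part) / n * entropy(part) for part in (with_word, without_word) if part
    )
    return entropy(labels) - remainder


def top_k_words(docs, labels, k):
    """Rank the vocabulary by information gain and keep the k best words."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)
    return ranked[:k]


# Toy, hand-written tweets (hypothetical data).
tweets = [
    "win free prize now",
    "free followers click now",
    "claim your free gift",
    "meeting moved to monday",
    "great match last night",
    "lunch at noon today",
]
labels = ["spam", "spam", "spam", "not_spam", "not_spam", "not_spam"]

top_words = top_k_words(tweets, labels, k=3)
print(top_words)
```

With a real corpus the same ranking would be run with k=30 over the full vocabulary.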
(iii) Marco L. Della Vedova et al. first proposed a novel ML fake news detection
method which, by combining news content and social context features,
outperforms existing methods in the literature, increasing accuracy up to
78.8%. Second, they implemented their method within a Facebook Messenger
chatbot and validated it with a real-world application, obtaining a fake news
detection accuracy of 81.7%. Their goal was to classify a news item as reliable or
fake; they first described the datasets they used for their tests, then presented the
content-based approach they implemented and the method they proposed to
combine it with a social-based approach available in the literature. The resulting
dataset is composed of 15,500 posts coming from 32 pages (14 conspiracy
pages, 18 scientific pages), with more than 2,300,000 likes by 900,000+ users.
8,923 (57.6%) posts are hoaxes and 6,577 (42.4%) are non-hoaxes.
5. Methodology
This paper describes a system that will be developed in three parts. The first part is
static and is built on a machine learning classifier: we will study and train the model
with four different classifiers and choose the best one for final execution. The
second part is dynamic: it takes a keyword or text from the user and searches online
for the truth probability of the news. The third part assesses the authenticity of the
URL entered by the user. In this paper, we will use Python and its scikit-learn
library. Python has a large set of libraries and extensions that can easily be used
for machine learning. The scikit-learn library provides ready-made implementations
of most common machine learning algorithms for Python, which makes the evaluation of
ML algorithms quick and easy. We will use Django for the web-based deployment of the
model, with the client side implemented using HTML, CSS and JavaScript. We will also
use Beautiful Soup (bs4) and requests for online scraping.
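The static part (train several classifiers, keep the best) could look roughly like the
sketch below. It is an assumption-laden illustration, not the final system: the
abstract names only support vector machines and Naive Bayes, so logistic regression
and random forest are placeholders for the remaining two classifiers, TF-IDF features
are assumed, and the twelve headlines are hand-written toy data standing in for the
manually labelled dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, hand-written headlines (hypothetical); a real run would load the labelled dataset.
texts = [
    "miracle cure melts fat overnight doctors shocked",
    "aliens secretly control the world bank insider reveals",
    "celebrity death hoax shocks fans click to see",
    "secret pill makes you a millionaire overnight",
    "shocking truth the moon landing was staged",
    "government hides miracle cure for all diseases",
    "city council approves budget for new library",
    "central bank keeps interest rates unchanged",
    "local team wins regional football championship",
    "university publishes study on air quality",
    "parliament passes bill on road safety",
    "weather service forecasts rain for the weekend",
]
labels = ["fake"] * 6 + ["real"] * 6

# Four candidate classifiers; the last two are assumed placeholders.
candidates = {
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy for each candidate on TF-IDF features.
scores = {}
for name, clf in candidates.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(model, texts, labels, cv=3).mean()

# Refit the best-scoring candidate on all data for final use.
best_name = max(scores, key=scores.get)
best_model = make_pipeline(TfidfVectorizer(), candidates[best_name]).fit(texts, labels)

print(best_name, round(scores[best_name], 3))
print(best_model.predict(["shocking miracle cure doctors hate"])[0])
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary
from leaking across cross-validation folds, and the fitted pipeline is a single object
that a Django view could load and call.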
A. System Design
6. Timeline
Our first major task is to survey the research literature to establish the extent
and domain of existing detection techniques; this first phase is estimated to be
completed in December 2022. In the second phase, after finalising the algorithm, we
will begin coding the frontend of our system, which will be done by January. The
third phase is the core phase, covering the algorithm and scraping; this core part
of our system will be completed by March. The testing phase will then begin, in
which we continuously test our system; it will be completed before April, after
which our product will be ready for deployment and use.
References
[1] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake news detection on social
media: A data mining perspective," arXiv:1708.01967v3 [cs.SI], 3 Sep. 2017.
[2] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," 2017
IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON),
Kiev, 2017, pp. 900-903.