Mini Project II Report
Submitted By
MANORANJANI L N(1805054)
MITHRA V(1805057)
MONAL P(1805058)
Certified that this Mini Project - II Report “Fake News Detector” is the
bonafide work of “Manoranjani L N (1805054), Mithra V (1805057),
Monal P (1805058)” who carried out the project work under my
supervision.
SIGNATURE SIGNATURE
Vattamalaipalayam Vattamalaipalayam
Coimbatore-22 Coimbatore-22
ABSTRACT
Machine learning gives systems the ability to learn and improve automatically from
experience without being explicitly programmed. The volume of information on social media
networks is growing rapidly, which makes it difficult to classify that information as true
or false. People may accept fake news as true and pass it on to people who do not use
social media, which has a negative impact on the public. In this project we aim to create
a model that classifies a news text as fake or true. Such a model helps readers identify
fake news and disregard it, which can prevent some controversies among the public.
ACKNOWLEDGEMENT
We put forth our hearts and souls to thank the Almighty for being with us through
our achievements and success. We would like to express our unfathomable thanks to our
esteemed and Honorable Managing Trustee Thiru.D.Lakshminarayanaswamy and
Joint Managing Trustee Thiru.R.Sundar for giving us the chance to be a part of this elite
team at Sri Ramakrishna Engineering College, Coimbatore.
We would like to express our sincere thanks to our honorable Principal Dr.N.R.
Alamelu, for the facilities provided to complete this project.
We wish to convey our special thanks to our academic coordinator, Dr. K. Deepa,
Professor, Information Technology, for her consistent support, timely help and valuable
suggestions during the entire period of our project work.
We would like to express our sincere thanks to our project guide Dr. J. Angel Ida
Chellam, Assistant Professor (Sr. Gr), Information Technology, for his valuable support in
the completion of this project.
We extend our sincere gratitude to all the teaching and non-teaching staff of our
department who helped us during our project.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
1 INTRODUCTION
1.3 APPLICATIONS
2 LITERATURE SURVEY
3.2 DESCRIPTION
3.4.3.4 ADABOOST
3.4.4 TESTING
4 RESULT
5.1 CONCLUSION
5.2 FUTURE SCOPE
6 REFERENCES
7 APPENDICES
INTRODUCTION
The advent of the World Wide Web and the rapid adoption of social media
platforms paved the way for a spread of information faster than any witnessed before in
human history. Among other use cases, news outlets benefitted from the
widespread use of social media platforms by providing updated news in near real time to
their subscribers. The traditional way of reading news from newspapers, tabloids, and
magazines has moved to digital forms such as online news platforms, blogs, social media
feeds, and other digital media formats. It has become easier for consumers to get the
latest news at their fingertips.
There has been a rapid increase in the spread of fake news in the last decade. This
proliferation of online articles that do not conform to facts has led to many
problems. The problem is not limited to politics; it also covers other domains such as
sports, health, and science. One area affected by fake news is the financial
markets, where a rumor can have disastrous consequences and may bring the market to a
halt.
Fake news on social media has become a major problem because it misleads social media
users. An increase in the spread of fake news has a negative impact on
every individual. The main aim of this project is to increase the trustworthiness of
online news among people.
APPLICATIONS
LITERATURE SURVEY
Several studies have been carried out on fake news detection. This survey was done
before taking up the project in order to understand the methods that were used previously.
It helped to identify the benefits and drawbacks of the existing systems.
1. “Survey on Automated System for Fake News Detection using NLP & Machine
Learning Approach” by Subhadra Gurav, Swati Sase, Supriya Shinde, Prachi
Wabale, and Sumit Hirve. The authors take a news event, analyze related data from data
sources, and then use various classification algorithms to classify the news as legitimate
or fake.
2. “A Tool for Fake News Detection” by Bashar Al Asaad and Madalina Erascu.
In this paper, the authors propose machine learning techniques for fake news detection. They
used a dataset of fake and real news to train a machine learning model using the
Scikit-learn library in Python and extracted features from the dataset using text
representation models. They tested two classification approaches, namely probabilistic
classification and linear classification, on the title and the content, checking whether it
is clickbait or non-clickbait, i.e. fake or real.
In this project, a model is built on either a count vectorizer or a TF-IDF matrix, that is,
word tallies weighted relative to how often the words are used in other articles in our
dataset. Since the task is binary text classification, binomial logistic regression is
used, as it is expected to give good accuracy.
The main design decisions in developing the model were the text transformation (count
vectorizer vs TF-IDF vectorizer) and the type of text to use (headlines vs full
text). The next step is to extract the most suitable features for the count vectorizer or
TF-IDF vectorizer. This is done by keeping the n most frequent words and/or phrases,
choosing whether to lowercase the text, removing stop words (common words such as
“the”, “when”, and “there”), and keeping only words that appear at least a given
number of times in the text dataset.
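The feature choices described above can be sketched with scikit-learn's two vectorizers. This is a minimal illustration, not the project's actual configuration: the toy sentences and every parameter value are assumptions chosen only to show which option corresponds to which choice. Note that min_df counts the number of documents a term appears in, which is one way to read "appears at least a given number of times".

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus standing in for the article text (hypothetical examples).
docs = [
    "The president said the economy is growing",
    "Shocking secret the government is hiding",
    "The economy is growing, the report said",
    "Shocking report reveals secret plan",
]

# Illustrative settings; the values are assumptions, not tuned results.
options = dict(
    lowercase=True,        # fold case before tokenizing
    stop_words="english",  # drop common words such as "the"
    ngram_range=(1, 2),    # single words and two-word phrases
    min_df=2,              # keep terms found in at least 2 documents
    max_features=1000,     # cap the vocabulary at the most frequent terms
)

count_vec = CountVectorizer(**options)
tfidf_vec = TfidfVectorizer(**options)

counts = count_vec.fit_transform(docs)  # raw term counts
tfidf = tfidf_vec.fit_transform(docs)   # counts reweighted by rarity

vocabulary = sorted(count_vec.vocabulary_)
print(vocabulary)
print(counts.shape, tfidf.shape)
```

Both vectorizers produce the same vocabulary here; they differ only in how each entry of the matrix is weighted.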
LOGISTIC REGRESSION
As the text is classified on the basis of a wide feature set, with a binary
output (true news/fake news), a logistic regression (LR) model is used, since it provides
an intuitive equation for classifying problems into binary or multiple classes. We performed
hyperparameter tuning to get the best result for each individual dataset, and multiple
parameters were tested before obtaining the maximum accuracy from the LR model.
Mathematically, the logistic regression hypothesis function can be defined as follows:
Logistic regression uses a sigmoid function to transform the output to a probability value;
the objective is to minimize the cost function to achieve an optimal probability. The cost
SOFTWARE REQUIREMENTS
PYTHON 3.7
GOOGLE COLAB
DATASET DESCRIPTION
Two datasets have been used in this project, namely fake.csv and true.csv. These
datasets have a shape of (23481, 4) and (21417, 4) respectively. The first column contains
the title of the article, the second column contains the text, the third column contains the
subject of the article and the fourth column contains the date on which the article was
published. The fake news dataset and the true news dataset are labeled 1 and 0 respectively.
Both datasets are merged into a single dataset. The merged dataset is divided into a
train set and a test set, where the test set is 20% of the total data and the train set is
80% of the total data.
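The labeling, merging, and 80/20 split described above could look roughly like the sketch below. The helper name build_dataset, the random seed, and the stratified split are assumptions made for illustration. Small in-memory tables stand in for the real files, which would be loaded with pd.read_csv("fake.csv") and pd.read_csv("true.csv").

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def build_dataset(fake_df, true_df, test_size=0.2, seed=42):
    """Label, merge, and split the two news tables (hypothetical helper)."""
    fake = fake_df.copy()
    true = true_df.copy()
    fake["label"] = 1  # fake news
    true["label"] = 0  # true news
    merged = pd.concat([fake, true], ignore_index=True)
    # Stratifying keeps the fake/true ratio equal in both parts (an assumption).
    return train_test_split(
        merged,
        test_size=test_size,
        random_state=seed,
        stratify=merged["label"],
    )


# Toy stand-ins with the four columns described in the text.
columns = ["title", "text", "subject", "date"]
fake_df = pd.DataFrame(
    [[f"fake title {i}", f"fake text {i}", "news", "2017-01-01"] for i in range(10)],
    columns=columns,
)
true_df = pd.DataFrame(
    [[f"true title {i}", f"true text {i}", "politics", "2017-01-01"] for i in range(10)],
    columns=columns,
)

train_df, test_df = build_dataset(fake_df, true_df)
print(train_df.shape, test_df.shape)
```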
WORKING PRINCIPLE
Input values (x) are combined linearly using weights or coefficient values to
predict an output value (y). A key difference from linear regression is that the output
value being modeled is a binary value (0 or 1) rather than a numeric value. Below is an
equation for logistic regression:
y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))
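The equation above can be evaluated directly. The sketch below computes it in a numerically stable way and turns the resulting probability into a 0/1 label. The coefficient values and the 0.5 decision threshold are assumptions for illustration only; in the project the coefficients are learned from the data.

```python
import math


def logistic(x, b0, b1):
    """Return e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) without overflow."""
    z = b0 + b1 * x
    if z >= 0:
        # Algebraically identical form: 1 / (1 + e^(-z)).
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(x, b0, b1, threshold=0.5):
    """Map the probability to a binary label (1 = fake, 0 = true)."""
    return 1 if logistic(x, b0, b1) >= threshold else 0


# Hypothetical coefficients, not values learned in the project.
b0, b1 = -1.0, 2.0
for x in (-1.0, 0.5, 2.0):
    print(x, round(logistic(x, b0, b1), 4), classify(x, b0, b1))
```

With these coefficients the boundary lies at x = 0.5, where b0 + b1*x = 0 and the probability is exactly 0.5.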
MODULE DESCRIPTION
DATA COLLECTION
The datasets have been taken from Kaggle, which provides the true and fake news
as separate files.
FAKE DATASET
TRUE DATASET
DATA ANALYSIS
Libraries such as NumPy, pandas, seaborn and Matplotlib are used for data analysis.
NLTK is the library used for data cleaning, in particular for removing stop words.
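The stop word removal step can be sketched as follows. To keep the example runnable without a download, a small hand-written set stands in for the NLTK list; in the project the list would come from nltk.corpus.stopwords.words("english") after calling nltk.download("stopwords"). The function name clean_text and the tokenizing rule are assumptions.

```python
import re

# Small stand-in for the NLTK English stop word list (assumption).
STOPWORDS = {
    "the", "a", "an", "is", "are", "was", "when", "there",
    "of", "to", "in", "and",
}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


cleaned = clean_text("There is a RUMOR in the market!")
print(cleaned)
```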
MODEL TRAINING
The regression model is trained using scikit-learn (sklearn), an efficient library
for machine learning in Python. TfidfVectorizer is used for extracting the features.
SGDClassifier is used to train the logistic regression model by minimizing the loss.
The GridSearchCV function reports the score for every combination of
hyperparameters, so that the combination with the best performance can be chosen.
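A minimal sketch of this training setup is given below, assuming a pipeline of TfidfVectorizer and SGDClassifier tuned with GridSearchCV. The toy texts, the parameter grid, and the 2-fold cross-validation are assumptions chosen to keep the example small; they are not the settings used in the project. The logistic loss is named "log_loss" from scikit-learn 1.1 onward and "log" in earlier versions, so the sketch picks the name by version.

```python
import re

import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Pick the logistic loss name that matches the installed version.
major, minor = map(int, re.match(r"(\d+)\.(\d+)", sklearn.__version__).groups())
LOG_LOSS = "log_loss" if (major, minor) >= (1, 1) else "log"

# Toy data (hypothetical): 1 = fake, 0 = true.
texts = [
    "shocking secret cure doctors hate",
    "you will not believe this shocking trick",
    "aliens secretly control the shocking election",
    "miracle pill shocking secret revealed",
    "senate passes budget bill after debate",
    "central bank holds interest rates steady",
    "city council approves new budget",
    "court rules on interest rates case",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", SGDClassifier(loss=LOG_LOSS, random_state=0)),
])

# Illustrative grid; every combination is scored by cross-validation.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [1e-4, 1e-3],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

After fitting, search behaves like the best pipeline refitted on all the training data, so search.predict can be called on new text directly.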
MODEL TESTING
A confusion matrix is created to evaluate the trained model. It is created using the
sklearn library.
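The evaluation step can be sketched with sklearn.metrics as shown below. The label lists are made-up values for illustration and are not results from the project. With labels=[0, 1], the rows of the matrix are the actual classes and the columns are the predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1 = fake, 0 = true.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Layout with labels=[0, 1]: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()
accuracy = accuracy_score(y_true, y_pred)

print(matrix)
print("accuracy:", accuracy)
```

Here a false positive is a true article flagged as fake, and a false negative is a fake article that passes as true.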
RESULT
● People can place more trust in social media platforms and are protected from being
misled by fake news.
● Fake news can create disputes among people. This model can help to reduce
such disputes.
● Highly accurate.
DISADVANTAGES
● As our model is created using a machine learning algorithm, it may not correctly
classify news that differs from the dataset it was trained on.
● This model works on text only; photos and videos cannot be classified.
CONCLUSION