
RUMOUR DETECTION FROM SOCIAL MEDIA

BY:
SOHAM NANDY
SAHIL MAHAJAN
VARDAAN BAJAJ
RUMOURS?
What is a Rumour?

 A rumour is a story or piece of information that may or may not be
true, but that people are talking about.

 Two types of rumours:-
1) long-standing rumours
2) newly emerging rumours
Problem Definition

 Rumours, fortunately or unfortunately, affect us all in more ways than we care to
remember.

 Despite the increasing use of social media platforms for information and news gathering,
their unmoderated nature often leads to the emergence and spread of rumours.

 At the same time, the openness of social media platforms provides opportunities to study
how users share and discuss rumours, and to explore how to automatically assess their
veracity, using natural language processing and data mining techniques.
Problem Definition

 We provide an overview of research into social media rumours with the ultimate goal of
developing a rumour classification system that consists of four components:
1. Rumour detection,
2. Rumour tracking,
3. Rumour stance classification, and
4. Rumour veracity classification.

 We delve into the approaches presented in the scientific literature for the development of
each of these four components.
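As a sketch of how the four components above could fit together, the toy pipeline below chains them with simple keyword heuristics. All class, method, and label names are our own illustrative choices, not a prescribed design, and the heuristics stand in for the learned models discussed in the literature.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Tweet:
    tweet_id: str
    text: str

@dataclass
class Rumour:
    claim: str
    tweets: list = field(default_factory=list)   # tweets discussing the claim
    stances: dict = field(default_factory=dict)  # tweet_id -> stance label
    veracity: str = "unverified"

class RumourPipeline:
    """Toy four-stage rumour classification pipeline."""

    def detect(self, tweets):
        """1. Rumour detection: flag tweets that read like unverified claims."""
        markers = ("unconfirmed", "reportedly", "rumour", "rumor")
        return [t for t in tweets if any(m in t.text.lower() for m in markers)]

    def track(self, rumour, stream):
        """2. Rumour tracking: collect further tweets discussing the claim."""
        key = rumour.claim.lower()
        rumour.tweets += [t for t in stream if key in t.text.lower()]

    def classify_stance(self, rumour):
        """3. Stance classification: label each tweet's position on the claim."""
        for t in rumour.tweets:
            words = set(re.findall(r"\w+", t.text.lower()))
            if "false" in words or "not true" in t.text.lower():
                rumour.stances[t.tweet_id] = "deny"
            elif "confirmed" in words:
                rumour.stances[t.tweet_id] = "support"
            else:
                rumour.stances[t.tweet_id] = "comment"

    def classify_veracity(self, rumour):
        """4. Veracity classification: aggregate stances into a verdict."""
        denies = sum(s == "deny" for s in rumour.stances.values())
        supports = sum(s == "support" for s in rumour.stances.values())
        if supports > denies:
            rumour.veracity = "true"
        elif denies > supports:
            rumour.veracity = "false"
        return rumour.veracity
```

In practice each stage would be a trained classifier rather than keyword rules; the point here is only the data flow from detection through tracking and stance to veracity.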
Aim
 This project aims to investigate the characteristics of rumours found on online social
networks. The characteristics could be: Size and frequency of messages, message
propagation through the social network, and sentence structure of the messages.

 This study seeks to identify the key traits of rumours on online social networks such as
Twitter. Automating the identification of rumours is becoming increasingly important,
given the rise of the internet’s popularity as a source of news and the ever-growing
amount of information on the internet.
Methodology
Project Planning:-
The project plan must comprehensively cover all tasks required to complete the project,
taking into account its research direction and the dependencies between tasks.

Information Harvester Development:-
The information harvester must be able to collect tweets from Twitter in an automated and
consistent manner.

Literature Review:-
A literature review will be undertaken across the academic fields of Computer Science and the
Social Sciences, to gain a mix of insights into how rumours are detected.
Methodology
Feature Selection & Engineering:-
This project will be exploring datasets gathered through the collection of tweets from the Twitter
API. Investigative work will be performed to engineer additional features based on existing tweet
data, such as tweet type and tweet text. This section will also include manual labelling of tweets to
indicate each tweet’s nature (e.g. is news, is rumour), which will be used as the target label for
classification purposes.
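The feature engineering described above might look like the sketch below: deriving a tweet-type feature from the raw payload and simple numeric features from the text. The payload field names follow Twitter's v1.1 API; the exact features chosen here are illustrative, not the project's final feature set.

```python
import re

def tweet_type(tweet):
    """Derive a coarse tweet-type feature from a raw tweet payload (v1.1 fields)."""
    if tweet.get("retweeted_status") or tweet.get("text", "").startswith("RT @"):
        return "retweet"
    if tweet.get("quoted_status"):
        return "quote"
    if tweet.get("in_reply_to_status_id"):
        return "reply"
    return "original"

def text_features(text):
    """Engineer simple numeric features from the tweet text."""
    return {
        "length": len(text),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "has_question": "?" in text,
    }
```

Features like these are cheap to compute per tweet and can be joined with the manually assigned labels to form the training table for classification.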

Sentiment Analysis using Machine Learning Techniques:-
Work will be performed to engineer more features via the usage of sentiment libraries. Lastly,
Machine Learning classifiers will be used to detect key trends in the dataset.
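As a sketch of the classification step, the minimal multinomial Naive Bayes below learns word distributions per label from a handful of labelled tweets. It is a from-scratch illustration only; the project itself would likely rely on established sentiment libraries and off-the-shelf classifiers.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label in self.class_counts:
            # log prior plus Laplace-smoothed log likelihood of each word
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

Laplace smoothing (the +1 in the likelihood) keeps unseen words from zeroing out a class, which matters on short, noisy texts like tweets.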

Testing, Results, and Discussion:-
The testing phase will report characteristics of the datasets collected and elaborate on the impact
of the findings generated.

Full System Integration:-
The full system integration seeks to provide an easy-to-use web interface through which the user
can discover insights from the datasets and experiment results generated.
Workflow
The general data workflow consists of the following four elements:-

1. Twitter
Twitter is a social network platform where participants can make posts and interact with fellow
participants using hashtags, quote retweets, retweets, and comments. The datasets used in this project are
based on tweets collected from Twitter.

2. Information Harvester
The Information Harvester collects tweets from Twitter based on search queries by the user.

3. MongoDB Database
The MongoDB Database stores tweets from the information harvester. Tweets are put through a data
cleaning process and are imported into the MongoDB Database.

4. Analysis & Development Platform
The Analysis & Development Platform is where all further in-depth analysis and work are performed.
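The hand-off from harvester to database (elements 2 and 3) centres on the cleaning step, which might look like the sketch below: deduplicate by tweet id and keep only the fields the analysis needs. The field selection is an assumption for illustration; the actual database import would use a MongoDB client such as pymongo.

```python
def clean_tweets(raw_tweets):
    """Deduplicate by tweet id and keep only the fields the analysis needs."""
    seen = set()
    docs = []
    for tw in raw_tweets:
        if tw["id"] in seen:
            continue  # drop duplicates picked up by overlapping searches
        seen.add(tw["id"])
        docs.append({
            "_id": tw["id"],  # reuse the tweet id as MongoDB's primary key
            "text": tw["text"].strip(),
            "user": tw["user"]["screen_name"],
            "created_at": tw["created_at"],
        })
    return docs

# The cleaned documents would then be imported with pymongo, e.g.:
#   from pymongo import MongoClient
#   MongoClient()["rumours"]["tweets"].insert_many(clean_tweets(raw))
```

Using the tweet id as `_id` makes MongoDB itself reject any duplicate that slips past the cleaning step.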
TWITTER
2nd Largest Social Networking Site

1,300,000,000 Twitter Accounts
5,000,000 Tweets per Day
INFORMATION HARVESTER
» Automated 24/7 tweet collection
» Network optimizations
» Duplicate tweet reduction
» Gzipped archives for 90% space savings
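The automated collection loop could be sketched as below. `fetch(query, since_id)` is a stand-in for whichever Twitter-API client the harvester uses (not a real library call), and the `since_id` bookkeeping is what keeps repeated polls from re-downloading tweets already seen.

```python
import time

def harvest(query, fetch, store, interval_s=60, max_batches=None):
    """Repeatedly poll a search endpoint and hand new tweets to `store`.

    `fetch(query, since_id)` should return only tweets newer than `since_id`;
    `store(tweets)` persists a batch (e.g. to a gzipped archive).
    """
    since_id, batches = None, 0
    while max_batches is None or batches < max_batches:
        tweets = fetch(query, since_id)
        if tweets:
            since_id = max(t["id"] for t in tweets)  # resume point for next poll
            store(tweets)
        batches += 1
        if max_batches is None or batches < max_batches:
            time.sleep(interval_s)  # throttle to respect API rate limits
    return since_id
```

Returning the last `since_id` lets a restarted harvester resume where it left off, which is what makes 24/7 collection consistent across interruptions.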
DATA PREPROCESSING
1. Decompress archives
2. Remove tweet duplicates
3. Label tweets with tweet types
4. Generate tweet relationship data
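The four steps above could be run as one pass over the gzipped archives, as in the sketch below. Field names follow Twitter's v1.1 payloads, and the reply-only relationship edges are a simplification for illustration.

```python
import gzip
import json

def preprocess(archive_paths):
    """Run the four preprocessing steps over gzipped JSON-lines archives."""
    seen, tweets = set(), []
    for path in archive_paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:  # 1. decompress
            for line in f:
                tw = json.loads(line)
                if tw["id"] in seen:                        # 2. drop duplicates
                    continue
                seen.add(tw["id"])
                if tw.get("retweeted_status"):              # 3. label tweet type
                    tw["tweet_type"] = "retweet"
                elif tw.get("in_reply_to_status_id"):
                    tw["tweet_type"] = "reply"
                else:
                    tw["tweet_type"] = "original"
                tweets.append(tw)
    edges = [(t["id"], t["in_reply_to_status_id"])          # 4. relationship data
             for t in tweets if t.get("in_reply_to_status_id")]
    return tweets, edges
```

The relationship edges are what later allow the message-propagation characteristics of a rumour to be traced through the social network.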
Future Scope
As this is only a preliminary and broad study of rumours on online social networks, improvements can be
made in the following ways:-

1) The existing workflow can be enhanced by:
- Leveraging GPU acceleration to speed up calculations
- Utilizing a distributed database for greater scale-up capability
- Real-time importing and visualization of data

2) Further testing can be done through:
- Evaluation of existing models on public datasets (e.g. news datasets)
- Evaluation of existing models on other types of texts (e.g. articles)
Some Snapshots of the App
Questions?
Thank You
