Report
RUMOUR DETECTION FROM SOCIAL MEDIA
BY:
SOHAM NANDY
SAHIL MAHAJAN
VARDAAN BAJAJ
What is a Rumour?
Rumours, fortunately or unfortunately, affect us all, and in more ways than we care to
remember.
Despite the increasing use of social media platforms for information and news gathering,
its unmoderated nature often leads to the emergence and spread of rumours.
At the same time, the openness of social media platforms provides opportunities to study
how users share and discuss rumours, and to explore how to automatically assess their
veracity, using natural language processing and data mining techniques.
Problem Definition
We provide an overview of research into social media rumours with the ultimate goal of
developing a rumour classification system that consists of four components:
1. Rumour detection,
2. Rumour tracking,
3. Rumour stance classification, and
4. Rumour veracity classification.
We delve into the approaches presented in the scientific literature for the development of
each of these four components.
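To make the relationship between the four components concrete, the following is a minimal sketch of how they could be chained. It is an illustration only: the `Rumour` structure, the `classify_rumours` function and the four callables passed into it are hypothetical names, and each callable stands in for whatever model the literature proposes for that component.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rumour:
    """A candidate rumour plus the posts discussing it (hypothetical structure)."""
    claim: str
    posts: List[str] = field(default_factory=list)
    stances: List[str] = field(default_factory=list)
    veracity: Optional[str] = None


def classify_rumours(
    stream: List[str],
    detect: Callable[[str], bool],            # 1. rumour detection
    is_related: Callable[[str, str], bool],   # 2. rumour tracking
    stance_of: Callable[[str, str], str],     # 3. stance classification
    veracity_of: Callable[[Rumour], str],     # 4. veracity classification
) -> List[Rumour]:
    """Run the four components in order over a list of posts."""
    rumours: List[Rumour] = []
    for post in stream:
        # Tracking: attach the post to every known rumour it relates to.
        matched = [r for r in rumours if is_related(r.claim, post)]
        for rumour in matched:
            rumour.posts.append(post)
        # Detection: an unmatched post that looks like a rumour opens a new one.
        if not matched and detect(post):
            rumours.append(Rumour(claim=post, posts=[post]))
    for rumour in rumours:
        # Stance: how each collected post orients towards the claim.
        rumour.stances = [stance_of(rumour.claim, p) for p in rumour.posts]
        # Veracity: a final judgement using everything gathered so far.
        rumour.veracity = veracity_of(rumour)
    return rumours
```

Passing the components in as functions keeps the sketch independent of any particular classifier, so each one can be swapped out as the literature review identifies better approaches.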
Aim
This project aims to investigate the characteristics of rumours found on online social
networks. The characteristics could be: size and frequency of messages, message
propagation through the social network, and sentence structure of the messages.
This study seeks to identify the key traits of rumours on online social networks such as
Twitter. Automating the identification of rumours is becoming increasingly important,
given the rise of the internet’s popularity as a source of news and the ever-growing
amount of information on the internet.
Methodology
Project Planning:-
The project plan must comprehensively account for all tasks required to be completed for the
project, accounting for the research direction of the project and the dependencies between
tasks.
Literature Review:-
A literature review will be undertaken across both Computer Science and the Social
Sciences, to gain a mix of insights into how rumours are detected.
Feature Selection & Engineering:-
This project will explore datasets gathered by collecting tweets through the Twitter
API. Investigative work will be performed to engineer additional features from existing tweet
data, such as tweet type and tweet text. This step also includes manually labelling tweets to
indicate each tweet’s class (e.g. is news, is rumour), which will be used as the target label for
classification purposes.
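As an illustration of the kind of features that could be engineered from tweet text, here is a minimal sketch. The feature names, the `text_features` and `make_example` helpers, and the `text` and `tweet_type` dictionary keys are assumptions made for this example; the project's actual feature set is not specified here.

```python
import re
from typing import Dict, Tuple

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def text_features(text: str) -> Dict[str, int]:
    """Derive simple size and structure features from the tweet text."""
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "has_question": int("?" in text),
        "has_exclamation": int("!" in text),
    }


def make_example(tweet: dict, label: str) -> Tuple[dict, str]:
    """Pair engineered features with a manually assigned target label."""
    features = dict(text_features(tweet["text"]))
    # "tweet_type" is a hypothetical key assumed to be set during preprocessing.
    features["tweet_type"] = tweet.get("tweet_type", "unknown")
    return features, label
```

A labelled example would then be built with a call such as `make_example(tweet, "rumour")`, where the label comes from the manual annotation step.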
1. Twitter
Twitter is a social network platform where participants can make posts and interact with fellow
participants using hashtags, quote retweets, retweets, and comments. The datasets used in this project are
based on tweets collected from Twitter.
2. Information Harvester
The Information Harvester collects tweets from Twitter based on search queries by the user.
3. MongoDB Database
The MongoDB Database stores tweets from the Information Harvester. Tweets are put through a data
cleaning process and are imported into the MongoDB Database.
1,300,000,000 Twitter Accounts
5,000,000 Tweets per Day
INFORMATION HARVESTER
» Automated 24/7 tweet collection
» Network optimizations
» Duplicate tweet reduction
» Gzipped archives for 90% space savings
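The duplicate reduction and gzipped archiving listed above could look roughly like the sketch below. It assumes each tweet is a dictionary with an `id` field and that archives are stored as gzip-compressed JSON Lines; `archive_tweets` and `seen_ids` are hypothetical names, and the harvester's real storage format may differ.

```python
import gzip
import json
from typing import Iterable, Set


def archive_tweets(tweets: Iterable[dict], path: str, seen_ids: Set[str]) -> int:
    """Append unseen tweets to a gzipped JSON Lines archive.

    `seen_ids` is shared across calls so that a tweet returned by several
    search queries is only stored once. Returns the number of tweets written.
    """
    written = 0
    # Mode "at" appends text, so repeated calls extend the same archive.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for tweet in tweets:
            tweet_id = str(tweet["id"])
            if tweet_id in seen_ids:
                continue  # duplicate: already archived
            seen_ids.add(tweet_id)
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            written += 1
    return written
```

The compression ratio achieved depends on the data; JSON tweet payloads are highly repetitive, which is consistent with the large savings reported above.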
DATA PREPROCESSING
1. Decompress archives
2. Remove tweet duplicates
3. Label tweets with tweet types
4. Generate tweet relationship data
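The four preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions: archives are taken to be gzip-compressed JSON Lines, and tweets are taken to follow the Twitter API v1.1 payload shape (`retweeted_status`, `is_quote_status`, `quoted_status_id`, `in_reply_to_status_id`). The function names and the `tweet_type` key are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator, List, Optional, Tuple

Edge = Tuple[int, int, str]  # (source tweet id, target tweet id, relation)


def read_archive(path: str) -> Iterator[dict]:
    """Step 1: decompress a gzipped JSON Lines archive."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def deduplicate(tweets: Iterable[dict]) -> List[dict]:
    """Step 2: keep the first occurrence of each tweet id."""
    unique = {}
    for tweet in tweets:
        unique.setdefault(tweet["id"], tweet)
    return list(unique.values())


def tweet_type(tweet: dict) -> str:
    """Step 3: label a tweet as retweet, quote, reply or original."""
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("is_quote_status"):
        return "quote"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"


def relationship(tweet: dict) -> Optional[Edge]:
    """Step 4: link a tweet to the tweet it retweets, quotes or replies to."""
    kind = tweet_type(tweet)
    if kind == "retweet":
        target = tweet["retweeted_status"]["id"]
    elif kind == "quote":
        target = tweet.get("quoted_status_id")
    elif kind == "reply":
        target = tweet["in_reply_to_status_id"]
    else:
        return None
    return None if target is None else (tweet["id"], target, kind)


def preprocess(path: str) -> Tuple[List[dict], List[Edge]]:
    """Run all four steps and return labelled tweets plus relationship edges."""
    tweets = deduplicate(read_archive(path))
    for tweet in tweets:
        tweet["tweet_type"] = tweet_type(tweet)
    edges = [e for e in map(relationship, tweets) if e is not None]
    return tweets, edges
```

The resulting edge list is one possible form for the relationship data, and it is suitable for studying how messages propagate through the network.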
Future Scope
As this is only a preliminary and broad study of rumours on online social networks, it can be
improved in the following ways:-