Aishwarya Pendyala Fall2019

FAKE CONSUMER REVIEW DETECTION
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of

the requirements for the degree of
MASTER OF SCIENCE
in
Software Engineering
by
Aishwarya Pendyala
FALL
2019
© 2019
Aishwarya Pendyala
ALL RIGHTS RESERVED
ii
A Project
by
Aishwarya Pendyala
Approved by:
__________________________________, Committee Chair

Dr. Jingwei Yang
__________________________________, Second Reader

Dr. Jinsong Ouyang
____________________________
Date
iii
Student: Aishwarya Pendyala
I certify that this student has met the requirements for format contained in the University
format manual, and that this project is suitable for electronic submission to the library
and credit is to be awarded for the project.
__________________________, Graduate Coordinator ____________________

Dr. Jinsong Ouyang Date
Department of Computer Science
iv
Abstract
of
by
Aishwarya Pendyala
Consumers’ reviews on ecommerce websites, online services, ratings and
experience stories are useful for the user as well as the vendor. The reviewer can increase
their brand’s loyalty and help other customers understand their experience with the
product. Similarly reviews help the vendors gain more profiles by increasing their sale of
products, if consumers leave positive feedback on their product review. But
unfortunately, these review mechanisms can be misused by vendors.
For example, one may create fake positive reviews to promote brand’s reputation
or try to demote competitor’s products by leaving fake negative reviews on their product.
Existing solutions with supervised include application of different machine learning
algorithms and different tools like Weka.
Unlike the existing work, instead of using a constrained dataset I chose to have a
wide variety of vocabulary to work on such as different subjects of datasets combined as
one big data set. Sentiment analysis has been incorporated based on emojis and text
content in the reviews. Fake reviews are detected and categorized. The testing results are
obtained through the application of Naïve Bayes, Linear SVC, Support Vector Machine
and Random forest algorithms.

v
The implemented (proposed) solution is to classify these reviews into fake or
genuine. The highest accuracy is obtained by using Naïve Bayes by including sentiment
classifier.
_______________________, Committee Chair

Dr. Jingwei Yang
_______________________
Date
vi
DEDICATION
To My Family
vii
ACKNOWLEDGEMENTS
I whole heartedly show sincere gratitude to my project guide, Dr. Jingwei Yang
and Dr. Jinsong Ouyang for guiding me with their technical expertise, providing me
feedback and suggestions for improving this project and giving me an opportunity to gain
and learn through my project experience. I thank Dr. Jagannadha Chidella for constantly
supporting and helping me with the research work.
I am also thankful my family for their love, support and trust in me throughout
my masters.
viii
TABLE OF CONTENTS
Page
Dedication…………………………………………………...………………...…….vii
Acknowledgments………………………………………………………………......viii
List of Tables…………………………………………………………………….…...xi
List of Figures……………………………………………………………………..…xii
Chapter
1. INTRODUCTION…………………………………………………………………..1
2. PROBLEM STATEMENT………………………………………………………… 4
3. MOTIVATION AND RELATED WORK ............................................................... 6
4.PROPOSED SOLUTION .......................................................................................... 8
4.1 Data Collection ........................................................................................... 9
4.2 Data Preprocess......................................................................................... 10
4.3 Feature Extraction ..................................................................................... 10
4.4 Sentiment Analysis ................................................................................... 11
4.5 Fake Review Detection ............................................................................. 12
4.6 Performance Evaluation and Results ........................................................ 12
5.EXPERIMENTAL CONFIGURATION ..................................................................13
5.1 NLP based Textblob classifier ................................................................. 18
5.2 SKlearn based classifiers ……………………………...……... ............... 20
5.3 Technologies Used ................................................................................... 21
ix
5.3.1 Hardware Configuration .............................................................21
5.3.2 Software Configuration ...............................................................22
6. ACCURACY ENHANCEMENTS ..........................................................................23
6.1 Enhancement 1……………………………...……....................................23
6.2 Enhancement 2 ..........................................................................................25
6.3 Enhancement 3…………………………………………………………... 27
6.4 Enhancement 4…………………………………………………………... 27
7. RESULTS………………………………………………………………………....31
8. CONCLUSION…………………………………………………………………...40
9. FUTURE WORK…………………………………………………………………41
References………………….…………………………...…………………………....42
x
LIST OF TABLES
Tables Page
1. Results ................................................................................................................ 37
xi
LIST OF FIGURES
Figures Page
1. Implementation Architecture………………………………...………………....9
2. Frequent word list sample...…………………………………………………...10
3. Data Exploration...…………………………………………………………….13
4. Data Labeled ..................................................................................................... 14
5. Dataset Load ..................................................................................................... 14
6. Preprocessing .................................................................................................... 15
7. Feature Extraction ............................................................................................. 16
8. Verified Purchase Review Count ...................................................................... 16
9. Rating Review count ......................................................................................... 17
10. F-Train.txt (Fake Review training dataset) ....................................................... 19
11. T-Train.txt (True Review training dataset) ....................................................... 19
12. TestingData.txt (Fake Review testing dataset) ................................................. 20
13. Reviews.txt........................................................................................................ 21
14. Sentiment word list ........................................................................................... 24
15. Sentiment word list code snippet ...................................................................... 25
16. POS tagging code snippet ................................................................................. 26
17. Emojis classification ......................................................................................... 28
18. Including Emoticons in the data ....................................................................... 28

xii
19. Emoji Sentiment ranking………………………………………………………29
20. Emoji Sentiment score code snippet .................................................................. 29
21. Label vs Product Category code snippet............................................................ 31
22. Label vs Rating code snippet ............................................................................. 32
23. Label vs Rating .................................................................................................. 32
24. Label vs Emoji count code snippet .................................................................... 33
25. Label vs Emoji count ......................................................................................... 33
26. Label vs Stopwords Count code snippet ............................................................ 34
27. Label vs Stopwords Count ................................................................................. 34
28. Label vs Verified Purchase code snippet ........................................................... 35
29. Label vs Verified Purchase ................................................................................ 36
30. Output.txt (Classified testing output dataset) .................................................... 36
31. Results graph…………………………………………………….……………. 38
xiii
1
Chapter 1: Introduction
Everyone can freely express his/her views and opinions anonymously and without
the fear of consequences. Social media and online posting have made it even easier to
post confidently and openly. These opinions have both pros and cons while providing the
right feedback to reach the right person which can help fix the issue and sometimes a con
when these get manipulated These opinions are regarded as valuable. This allows people
with malicious intentions to easily make the system to give people the impression of
genuineness and post opinions to promote their own product or to discredit the competitor
products and services, without revealing identity of themselves or the organization they
work for. Such people are called opinion spammers and these activities can be termed as
opinion spamming.
There are few different types of opinion spamming. One type is giving positive
opinions to some products with intention to promote giving untrue or negative reviews to
products to damage their reputation. Second type consists of advertisements with no
opinions on product. There is lot of research work done in field of sentiment analysis and
created models while using different sentiment analysis on data from various sources, but
the primary focus is on the algorithms and not on actual fake review detection. One of
many other research works by E. I. Elmurngi and A. Gherbi [1] used machine learning
algorithms to classify the product reviews on Amazon.com dataset [2] including customer
usage of the product and buying experiences. The use of Opinion Mining, a type of
language processing to track the emotion and thought process of the people or users about
a product which can in turn help research work.

2
Opinion mining, which is also called sentiment analysis, involves building a
system to collect and examine opinions about the product made in social media posts,
comments, online product and service reviews or even tweets. Automated opinion
mining uses machine learning, a component of artificial intelligence. An opinion mining
system can be built using a software that can extract knowledge from dataset and
incorporate some other data to improve its performance.
One of the biggest applications of opinion mining is in the online and e-commerce
reviews of consumer products, feedback and services. As these opinions are so helpful
for both the user as well as the seller the e-commerce web sites suggest their customers
to leave a feedback and review about their product or service they purchased. These
reviews provide valuable information that is used by potential customers to know the
opinions of previous or current users before they decide to purchase that product from
that seller. Similarly, the seller or service providers use this information to identify any
defects or problems users face with their products and to understand the competitive
information to know the difference about their similar competitors’ products.
There is a lot of scope of using opinion mining and many applications for different
usages:
Individual consumers: A consumer can also compare the summaries with
competing products before taking a decision without missing out on any other better
products available in the market.
Businesses/Sellers: Opinion mining helps the sellers to reach their audience and
understand their perception about the product as well as the competitors. Such reviews
3
also help the sellers to understand the issues or defects so that they can improve later
versions of their product. In today’s generation this way of encouraging the consumers to
write a review about a product has become a good strategy for marketing their product
through real audience’s voice. Such precious information has been spammed and
manipulated. Out of many researches one fascinating research was done to identify the
deceptive opinion spam [3].

4
Chapter 2: Problem Statement
People write unworthy positive reviews about products to promote them. In some
cases malicious negative reviews to other (competitive) products are given in order to
damage their reputation. Some of these consists of non-reviews (e.g., ads and promotions)
which contain no opinions about the product.
The first challenge here is, a word can be positive in one situation while being
negative in any other situation. For e.g. the word "long" in terms of a laptop’s battery life
being long is a positive opinion while the same word about the start time is long is a
negative opinion. This shows that the opinion mining system trained about words from
opinions cannot understand this nature of the word, giving a different meaning in different
situations.
Another challenge is that people don't always express opinions the same way.
Most of the traditional text processing techniques assume that small difference in text
don't change the meaning much. However, in opinion mining, e.g. the service was great,
and the service wasn’t great does make a huge difference.
Finally, in some cases, people give contradictory statements which were difficult
to anticipate the nature of the opinion. There could be a hidden positive sense in a negative
review. And sometimes there is both positive and negative opinion about the product. An
emotion factor can add a lot to what a person says or expresses. Adding a negative emoji
to a positive comment or vice versa. In the millennial world of texting people have
replaced long sentences with short forms and emoticons. These emoticons when used in
5
text format are composed of punctuations and there is a good chance that they will be lost
in data cleaning process while preprocessing the text in opinion mining.
After all these challenges, detecting the reviews that are not genuine or which are
used to deviate the consumers opinion in a certain direction becomes even more difficult.
Opinion spamming or fake review detection is thus significant problem for ecommerce
sites and other service providers as the consumer these days rely highly on such opinions
or reviews.
6
Chapter 3: Motivation and Related work
Lack of genuine feedback, creating fake reviews and ratings for supporting the
products on their website to improve their reputation and sales is unfair and misleading.
This is a common practice these days which increases the need for a fake review detector.
In a recent study a method was proposed by E.I Elmurngi and A. Gherbi [1] using
an open source software tool called ‘Weka tool’ to implement machine learning
algorithms using sentiment analysis to classify fair and unfair reviews from amazon
reviews based on three different categories positive, negative and neutral words. In this
research work, the spam reviews are identified by only including the helpfulness votes
voted by the customers along with the rating deviation are considered which limits the
overall performance of the system. Also, as per the researcher’s observations and
experimental results, the existing system uses Naive Bayes classifier for spam and non-
spam classification where the accuracy is quite low which may not provide accurate
results for the user.
Initially N. O’Brien [4] and J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F.
Benevenuto [5] have proposed solutions that depends only on the features used in the data
set with the use of different machine learning algorithms in detecting fake news on social
media. Though different machine learning algorithms the approach lacks in showing how
accurate the results are.
B. Wagh, J.V.Shinde, P.A.Kale [6] worked on twitter to analyze the tweets posted
by users using sentiment analysis to classify twitter tweets into positive and negative.
They made use of K-Nearest Neighbor as a strategy to allot them sentiment labels by
7
training and testing the set using feature vectors. But the applicability of their approach
to other type of data has not been validated.

8
Chapter 4: Proposed solution
To solve the major problem faced by online websites due to opinion spamming,
this project proposes to identify any such spammed fake reviews by classifying them into
fake and genuine. The method attempts to classify the reviews obtained from freely
available datasets from various sources and categories including service based, product
based, customer feedback, experience based and the crawled Amazon dataset with a
greater accuracy using Naïve Bayes [7], Linear SVC, SVM, Random forest, Decision
Trees algorithm. In order to improve the accuracy, the additional features like comparison
of the sentiment of the review, verified purchases, ratings, emoji count, product category
with the overall score are used in addition to the review details.
A classifier is built based on the identified features. And those features are
assigned a probability factor or a weight depending on the classified training sets. This is
a supervised learning technique applying different Machine learning algorithms to detect
the fake or genuine reviews,
The high-level architecture of the implementation can be seen in Figure:1 and the problem
is solved in the following six steps:

9
Figure 1: Implementation Architecture
4.1 Data Collection

Consumer review data collection- Raw review data was collected from different sources,
such as Amazon, websites for booking Airlines, Hotel and Restaurant, CarGurus, etc.
reviews. Doing so was to increase the diversity of the review data. A dataset of 21000
was created.
10
4.2 Data Preprocess
Processing and refining the data by removal of irrelevant and redundant information as
well as noisy and unreliable data from the review dataset.
Step 1 Sentence tokenization
The entire review is given as input and it is tokenized into sentences using NLTK
package.
Step 2 Removal of punctuation marks
Punctuation marks used at the starting and ending of the reviews are removed along
with additional white spaces.
Step 3 Word Tokenization
Each individual review is tokenized into words and stored in a list for easier retrieval.
Step 4 Removal of stop words
Affixes are removed from the stem. For example, the stem of "cooking" is "cook", and
the stemming algorithm knows that the "ing" suffix can be removed. A few words from
the frequent word list is shown below in Figure: 2.
Figure 2: Frequent word list sample
4.3 Feature extraction

The preprocessed data is converted into a set of features by applying certain
parameters. The following features are extracted:
Normalized length of the review-Fake reviews tend to be of smaller length.
Reviewer ID- A reviewer posting multiple reviews with the same Reviewer ID.
11
Rating-Fake reviews in most scenarios have 5 out of 5 stars to entice the customer or have
the lowest rating for the competitive products thus it plays an important role in fake
detection.
Verified Purchase-Purchase reviews that are fake have lesser chance of it being verified
purchase than genuine reviews.
Thus these combination of features are selected for identifying the fake reviews.
This in turn improves the performance of the prediction models.
4.4 Sentiment Analysis
Classifying the reviews according to their emotion factor or sentiments being
positive, negative or neutral. It includes predicting the reviews being positive or negative
according to the words used in the text, emojis used, ratings given to the review and so
on. Related research [8] shows that fake reviews has stronger positive or negative
emotions than true reviews. The reasons are that, fake reviews are used to affect people
opinion, and it is more significant to convey opinions than to plainly describe the facts.
The Subjective vs Objective ratio matters: Advertisers post fake reviews with more
objective information, giving more emotions such as how happy it made them than
conveying how the product is or what it does. Positive sentiment vs negative sentiment:
The sentiment of the review is analyzed which in turn help in making the decision of it
being a fake or genuine review.

12
4.5 Fake Review Detection

Classification assigns items in a collection to target categories or classes. The goal
of classification is to accurately predict the target class for each case in the data. Each
data in the review file is assigned a weight and depending upon which it is classified into
respective classes - Fake and Genuine.
4.6 Performance Evaluation and Results
Comparison of the accuracies of various models and classifiers with enhancements for
better results, as discussed in Accuracy Enhancements [chapter 6]

13
Chapter 5: Experimental Configuration

The implementation of this project uses supervised learning technique on the
datasets and the fake and genuine labels help us to cross validate the classification results
of the data.
Collection of data is done by choosing appropriate dataset. Datasets for such reviews with
labels is found from different sources like hotel reviews, amazon product reviews, and
other free available review datasets and combined into Reviews.txt file. Firstly, the
dataset is explored by loading it as csv format as shown in Figure 3.
Figure 3: Data exploration
Then to make it readable, the labels in the dataset are clearly labelled as fake or genuine
as shown in Figure 4.
14
Figure 4: Data Labeled
The dataset created from multiple sources of information has many forms of redundant
and unclean values. Such type of data is neither useful nor easy to model.
Preprocessing: Data has been cleaned by removing all the null values, white spaces and
punctuations. This raw dataset is loaded in the form of <ID, Review text, Label> tuple
using the code as shown in Figure 5 allowing to only focus on the textual review content.
Figure 5: Dataset Load
Then the raw data is preprocessed by applying tokenization, removal of stop words and
lemmatization. The code snippet used is shown in Figure 6.

15
Figure 6: Preprocessing
Feature Extraction: The text reviews have different features or peculiarities that can
help to solve the classification problem. For e.g. Length of reviews (fake reviews tend to
be smaller in length with less facts revealed about the product) and repetitive words (fake
reviews have smaller vocabulary with words repeated). Apart from the just the review
text there are other features that can contribute towards the classification of reviews as
fake. Some of the significant ones that were used as additional features inclusion are
Ratings, verified purchase and product category. The code snippet used to extract them
is shown in Figure 7.
16
Figure 7: Feature Extraction
Figure 8 and Figure 9 show the count of the reviews for each feature.
Figure 8: Verified Purchase Review Count

17
Figure 9: Rating Review count
Sentiment Analysis: This processed data is now analyzed for emotions or sentiment, if
the review is positive or negative. The significant factors for doing the sentiment analysis
of the reviews are use of emoticons sentiment scores and the rating of the reviews. Note
that while removing the punctuation marks a list of emoticons is parsed to be exception,
so we do not remove or discard them by accident, while cleaning the dataset. This is
explained in more detail in chapter 6 of Accuracy Enhancements under Enhancement 4
section. Sentiment analysis is performed with use of different classification algorithms
such as Naïve Bayes, Linear SVC, Non-linear SVM and Random forest to obtain better
results and compare the accuracies.
Fake review Detection: This is the final goal of the project to classify these reviews into
fake or genuine. The preprocessed dataset is thus classified using different classification
algorithms to analyze variety of data to classify it.

18
5.1 NLP based Text blob Classifier:
The two classifiers used in this configuration are:
a. Naive Bayes classifier
b. Decision Tree classifier
The experimental configuration for both classifiers was kept the same, and this section
consists of the configurations used to set up the models for training the Python Client.
Naïve Bayes [7] and Decision Tree Classifier are used for detecting the genuine(T) and
fake(F) reviews across a wide range of data set. The probability for each word is
calculated is given by the ratio of (sum of frequency of each word of a class to the total
words for that class). The dataset is split into 80% training 20% testing, 16800 for training
and 4200 for testing. Finally, for testing the data using a test set where the probability of
each review is calculated for each class. The class with the highest probability value using
which the review is assigned the label i.e. true/genuine (T) or fake (F) Review. The
datasets used for training are F-train.txt and T-train.txt. They include Review ID (for e.g.
ID-1100) as well as the Review text (Great product) shown below in Figure 10 and Figure
11 respectively.
19
Review
ID
Figure 10: F-train.txt: (Fake review training dataset)
Review
ID
Figure 11: T-train.txt: (True review training dataset)

20
Review
ID
Figure 12: TestingData.txt: (Fake review testing dataset)
Figure 12 contains the testing dataset which has only the ID and text for the review and
the output of this after running the model is stored in output.txt which contains the result
after prediction as fake or True review alias F / T.
5.2 SKlearn Based Classifiers:
The Sklearn based classifiers were also used for classification and compared
which algorithm to get better and accurate results.
a. Multinomial Naïve Bayes: Naive Bayes classifier [7] is used in natural language
processing (NLP) problems by predicting the tag of text, calculate probability of each tag
of a text and then output choose the highest one.
b. LinearSVC: This classifier classifies data by providing the best fit hyper plane that
can be used to divide the data into categories.

21
c. SVC: Different studies have shown If you use the default kernel in SVC (), the Radial
Basis Function (rbf) kernel, then you probably used a more nonlinear decision boundary
on the case of the dataset, this will vastly outperform a linear decision boundary
d. Random Forest: This algorithm has also been used for classifying which is provided
by sklearn library by creating multiple decision trees set randomly on subset of training
data.
For these classifiers Reviews.txt dataset is used. Figure 13 shows the dataset.
Figure 13: Reviews.txt file
After the application of all these classifiers, accuracies for each of them is compared and
their performance is evaluated for classification of the fake reviews. There are some more
enhancements also made to the models as discussed in the upcoming chapter 6. This
provided even better accuracy results for classification of these fake reviews.
5.3 Technologies Used:
5.3.1 Hardware configuration
The machine on which this project was built, is a personal computer with the
following configuration:
 Processor: Intel(R) Core i5-7200U @ 2.7GHz

22
 RAM: 8GB
 System: 64bit OS, x64 processor
 512 SSD Storage
5.3.2 Software Configuration

 Windows 10
 Python 3.5.2
 Different libraries are available in Python that helps in machine learning,
classification projects. Several of those libraries have improved the performance
of this project. Few of them are mentioned in this section.
 First, “Numpy” that provides with high-level math function collection to support
multi-dimensional matrices and arrays. This is used for faster computations over
the weights (gradients) in neural networks.
 Second, “scikit-learn” is a machine learning library for Python which features
different algorithms and Machine Learning function packages.
 NLTK, natural language toolkit is helpful in word processing and tokenization.
The project makes use of Anaconda Environment which is an open source distribution
for Python which simplifies package management and deployment. It is best for large
scale data processing.

23
Chapter 6: Accuracy Enhancements
The biggest challenge was generalizing the behavior for the datasets which it was
never trained for. In a real-life situation, we can never train a model with every scenario
possible. Also, it is not possible to gather the dataset for all kinds of reviews as it all
depends on varied dialects. Here are a few techniques or strategies that have significantly
improved the model accuracy to classify the reviews as fake or genuine. They are applied
in different phases of the project, making them more efficient. These will be discussed in
the following section.
6.1 Enhancement 1
Using a predefined sentiment word list to count the sentiment words in each
review. This is based on the research where the results have shown that the more the
number of sentiment words in a review, the more chances of it being fake. There is a list
of sentiment words that the review text is compared against and ratio of words that match
from the list to the total number of words. This ratio is considered as one of the factors
while determining the fake reviews and is applied during the preprocessing as well as
sentiment analysis phase of the experiment.
The predefined sentiment list can be glanced in following picture. B. Liu and M.
Hu Sentiment Lexicon is used for sentiment words [8]. It consists of 2 sections:
a. Positive Words
b. Negative Words
These are included in the sentimentwordlist.txt file for further reference. A glimpse of
which is shown in Figure 14 below.

24
Figure 14: Sentiment word list
The code snippet to solve make use of this sentiment list is shown in Figure 15.
25
Figure 15: Sentiment word list code snippet
6.2 Enhancement 2
Compared the number of verbs and nouns in each review. This is based on the
research where the results have shown that the more the number of verbs in a review than
number of nouns, the more chances of it being fake. One of the more powerful aspects of
NLTK for Python is the part of speech tagger that is built in. This can be used in the
preprocessing phase of the project. Use of NLTK part of speech tagging is done using the
following POS tag list:
 NN noun, singular 'desk'

26
 NNS noun plural 'desks'
 NNP proper noun, singular 'Harrison'
 NNPS proper noun, plural 'Americans'
 VB verb, base form takes
 VBD verb, past tense took
 VBG verb, gerund/present participle taking
 VBN verb, past participle taken
 VBP verb, sing. present, non-3d take
 VBZ verb, 3rd person sing. present takes
The review text can be tagged as verbs and nouns with use of NLTK and thus the count
can be compared [9]. The code snippet to do that is shown below in Figure 16
Figure 16: POS tagging code snippet

27
6.3 Enhancement 3
Discount Deception Reviews:
There are reviews by some users which involves the discount prices or sale at
some store to distract the buyers to buy from certain sites. These are mostly for
promotional purposes, done intentionally by sellers mostly. For considering them in the
fake classification, the keywords that are common in such reviews are used to identify.
Some of the words on the list are:
1.profit
2.sale
3.percent
4.dollars
Use of such words are flagged as fake reviews on the testing dataset. Though it is
a bit debatable to directly discard them, in this scenario it is considered as fake.
6.4 Enhancement 4
Emoticons based sentiment Classification:
Classification of reviews in datasets according to the set of emoticons used in the
reviews by the reviewer, demonstrating the sentiment of the reviewer. The list of
emoticons [10] that can be included in as positive negative or neutral is shown in Figure
17 below
28
Positive emojis:
😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀
Negative emojis:
😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀😀
Neutral emojis:
😀😀😀😀😀😀😀
Reference:
https://li.st/jesseno/positive-negative-and-neutral-
emojis-6EGfnd2QhBsa3t6Gp0FRP9
Figure 17: Emojis Classification
When cleaning the dataset while preprocessing all punctuations are removed from
the reviews text just like sentiment research on emojis [11]. The emojis are kept as an
exception by making another list ‘items_to_keep[]’ in the review text and the snippet of
that code is included in Figure18.
Figure 18: Including the emoticons in the data

29
Studies which focused their experiments on emoticons mainly distributed the
intensity of an emotion as an integer polarity. Some of the most commonly used emojis
are selected from a list of 751 emojis with respect to their frequency and distinction in
the emoji Scores [12]. These scores are referred for finding the sentiment of the reviews
that contains emojis. The scores are mentioned below in Figure 19.
Figure 19: Emoji Sentiment Ranking
The sentiment scores can be assigned according to the UTF-8 code of the
emoticons to recognize the emojis in the reviews.

30
Figure 20: Emoji Sentiment score code snippet
The emoticons scores recognition using UTF-8 and code snippet is included in
Figure 20. This probability will be used to determine the sentiment of the review which
in turn will help determining the genuineness of the reviews. Sentiment classification is
done using all same classification algorithms but before actual fake review detection step.
31
Chapter 7: Results
Data visualization:
The following visualizations show the kind of data that was used and each depicts
how many product categories are there for each label in the Reviews.txt. Here label means
fake and genuine. For e.g. for category Instruments there are 350 reviews with label fake
as seen in the code snippet is in Figure 21.
Figure 21: Label vs Product Category code snippet
Observing the number of occurences of reviews with ratings vs the label they
have. For eg. Number of occurnaces of reviews with a fake label and rated as 5 out of 5
is more than reviews with a fake label and rated 3. The following Figure 22 shows Label
vs Rating code snippet and the comparison Label vs Rating is shown in Figure 23.
32
Figure 22: Label vs Rating code snippet
Figure 23: Label vs Rating
Observing the number of occurences of reviews with emojis vs the label they
have. For eg. Number of occurnaces of reviews with a fake label and have emojis is less
than reviews with a genuine label. The following Figure 24 shows the Label vs Emojis
count code snippet and comparison Label vs Emojis is shown in Figure 25.
33
Figure 24: Label vs Emoji Count code snippet
Figure 25: Label vs Emoji Count
Observing the number of occurences of reviews with stop words counts vs the
label they have. For eg. Number of occurnaces of reviews with a fake label have
stopwords is less than reviews with a genuine label. The following Figure 26 shows the
Label vs Stopwords count code snippet and the comparison Label vs Stopwords count in
Figure 27
34
Figure 26: Label vs Stopwords Count code snippet
Figure 27: Label vs Stopwords Count
Observing the number of occurences of reviews with verified purchases or not vs
the label they have. For eg. Number of occurnaces of reviews with a fake label have way
less verified purchases than reviews with a genuine label. The following Figure 28 shows
the Label vs Verified Purchases code snippet and comparison Label vs Verified Purchases
in Figure 29.
35
Figure 28: Label vs Verified Purchase
Figure 29: Label vs Verified Purchase
These snippets of code can be observed in the DataVisualization.ipynb file for further
reference. The following output.txt file is the result generated by textblob Naïve Bayes
Classifier. It can be shown in Figure 30.

36
Figure 30: Output.txt: (Classified testing output dataset)
The accuracy scores obtained for this dataset are shown as follows:
Accuracy-80.542
F1 score-77.888
Precision Score-80.612
Recall-79.001
37
The following results were observed for each of the previously described
experimental setups. The results show how the accuracy has improved after each
enhancement to the model in Table 1
Raw data w/ Preprocessing & Feature Testing Sentiment

Tokenization Lemmatization inclusion data classifier
Multinomial 72% 77% 81% 80% 84%

Naïve Bayes
Linear SVC 67% 70% 74% 73% 83%
SVM 69% 75% 77% 81% 81%
Random Forest 68% 70% 72% 71% 79%
Table 1: Results
Another plotting of the results is shown in Figure 31 which depicts the bar chart
for each classifier with a different color for a data of 21000 in total.
38
Accuracies
81 77 80 81 84 81 83 79
72 69 67 68 77 75 74 72 73 71
70 70
RAW DATA PREPROCESSING FEATURE INCLUSION TESTING DATASET SENTIMENT

LEMMATIZATION CLASSIFIER
Multinomial NB SVM Linear SVC Random Forest
Figure 31: Results graph
Raw data is loaded from Reviews.txt file and by just parsing it and tokenizing,
accuracy of each model is calculated to predict the reviews being fake or genuine. The
best results were obtained using Naïve Bayes classifier as evident in the figure.
Preprocessing and lemmatization of the review text is done, accuracy of each
model is calculated to predict the reviews being fake or genuine. The best results were
obtained using Naïve Bayes classifier.
Additional feature inclusion works on including additional features like verified
purchase, ratings, product category of the review. Previously, the data features used were
only in an ID, Text, Label tuple from each review in the dataset. After utilizing these
other features, the accuracy of the models increased and can see the improvements in the
results for each of the classifiers.

39
Testing Dataset covers the classification accuracy for the reviews in the testing
dataset. Here as you observed the non-linear SVM classifier performed the best and could
give 81% accuracy. This shows it could generalize and predict the fake reviews more
accurately compared to Naïve Bayes classification which outperformed pretty much in
all the other scenarios.
Sentiment classifier includes predicting the reviews being positive or negative
according to the emojis used, the count of positive or negative word ratio, ratings given
to the review. This sentiment classification is in turn used in predicting the reviews being
fake or genuine. The accuracy results show how each model performed on sentiment
prediction of the reviews in the dataset.
Enhancement 1 is used in predicting the sentiment of the reviews using the list
of positive and negative words in the review.
Enhancement 2 compares the number of verbs and nouns in each review and
included in the preprocessing and lemmatization step.
Enhancement 3 is discount deceptive reviews predicted, it has increased the
accuracy, but this can be regarded as infinitesimally small to be included in the results.
Enhancement 4 is using emojis has added to the overall performance of the
model that helped in most accurate measure. It has improved the sentiment analysis of
the reviews, and in turn helped the performance of the models to predict whether the
review is fake or genuine.

40
Chapter 8: Conclusion
The fake review detection is designed for filtering the fake reviews. In this
research work SVM classification provided a better accuracy of classifying than the
Naïve Bayes classifier for testing dataset. On the other hand, the Naïve Bayes classifier
has performed better than other algorithms on the training data. Revealing that it can
generalize better and predict the fake reviews efficiently. This method can be applied over
other sampled instances of the dataset. The data visualization helped in exploring the
dataset and the features identified contributed to the accuracy of the classification. The
various algorithms used, and their accuracies show how each of them have performed
based on their accuracy factors.
Also, the approach provides the user with a functionality to recommend the most
truthful reviews to enable the purchaser to make decisions about the product. Various
factors such as adding new vectors like ratings, emojis, verified purchase have affected
the accuracy of classifying the data better.

41
Chapter 9: Future Work
1. To use a real time/ time based datasets which will allow us to compare the user’s
timestamps of the reviews to find if a certain user is posting too many reviews in a
short period of time.
2. To use and compare other machine learning algorithms like logistic regression to
extend the research to deep learning techniques.
3. To develop a similar process for unsupervised learning for unlabeled data to detect
fake reviews.
42
References
1. E. I. Elmurngi and A.Gherbi, “Unfair Reviews Detection on Amazon Reviews using

Sentiment Analysis with Supervised Learning Techniques,” Journal of Computer
Science, vol. 14, no. 5, pp. 714–726, June 2018.
2. J. Leskovec, “WebData Amazon reviews,” [Online]. Available:

http://snap.stanford.edu/data/web-Amazon-links.html [Accessed: October 2018].
3. J. Li, M. Ott, C. Cardie and E. Hovy, “Towards a General Rule for Identifying Deceptive
Opinion Spam,” in Proceedings of the 52nd Annual Meeting of the Association for
Computational Linguistics, Baltimore, MD, USA, vol. 1, no. 11, pp. 1566-1576,
November 2014.
4. N. O’Brien, “Machine Learning for Detection of Fake News,” [Online]. Available:

https://dspace.mit.edu/bitstream/handle/1721.1/119727/1078649610-MIT.pdf
[Accessed: November 2018].
5. J. C. S. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, “Supervised Learning

for Fake News Detection,” IEEE Intelligent Systems, vol. 34, no. 2, pp. 76-81, May 2019.
6. B. Wagh, J. V. Shinde and P. A. Kale, “A Twitter Sentiment Analysis Using NLTK and
Machine Learning Techniques,” International Journal of Emerging Research in
Management and Technology, vol. 6, no. 12, pp. 37-44, December 2017.
7. A. McCallum and K. Nigam, “A Comparison of Event Models for Naive Bayes Text
Classification,” in Proceedings of AAAI-98 Workshop on Learning for Text
Categorization, Pittsburgh, PA, USA, vol. 752, no. 1, pp. 41-48, July 1998.
8. B. Liu and M. Hu, “Opinion Mining, Sentiment Analysis and Opinion Spam Detection,”
[Online]. Available: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
[Accessed: January 2019].
9. C. Hill, “10 Secrets to Uncovering which Online Reviews are Fake,” [Online]. Available:
https://www.marketwatch.com/story/10-secrets-to-uncovering-which-online-reviews-
are-fake-2018-09-21 [Accessed: March 2019].
10. J. Novak, “List archive Emojis,” [Online]. Available: https://li.st/jesseno/positive-

negative-and-neutral-emojis-6EGfnd2QhBsa3t6Gp0FRP9 [Accessed: June 2019].
11. P. K. Novak, J. Smailović, B. Sluban and I. Mozeti, “Sentiment of Emojis,” Journal of

Computation and Language, vol.10, no. 12, pp. 1-4, December 2015.
43
12. P. K. Novak, “Emoji Sentiment Ranking,” [Online]. Available:

http://kt.ijs.si/data/Emoji_sentiment_ranking/ [Accessed: July 2019]
44

Aishwarya Pendyala Fall2019

Uploaded by

Copyright:

Available Formats

Aishwarya Pendyala Fall2019

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Aishwarya Pendyala Fall2019

Uploaded by

Copyright:

Available Formats

FAKE CONSUMER REVIEW DETECTION

Presented to the faculty of the Department of Computer Science

California State University, Sacramento

Submitted in partial satisfaction of

__________________________________, Committee Chair

__________________________________, Second Reader

and credit is to be awarded for the project.

__________________________, Graduate Coordinator ____________________

Department of Computer Science

FAKE CONSUMER REVIEW DETECTION

Consumers’ reviews on ecommerce websites, online services, ratings and

products, if consumers leave positive feedback on their product review. But

unfortunately, these review mechanisms can be misused by vendors.

Existing solutions with supervised include application of different machine learning

algorithms and different tools like Weka.

wide variety of vocabulary to work on such as different subjects of datasets combined as

and Random forest algorithms.

_______________________, Committee Chair

supporting and helping me with the research work.

3. MOTIVATION AND RELATED WORK ............................................................... 6

4.PROPOSED SOLUTION .......................................................................................... 8

4.1 Data Collection ........................................................................................... 9

4.2 Data Preprocess......................................................................................... 10

4.3 Feature Extraction ..................................................................................... 10

4.4 Sentiment Analysis ................................................................................... 11

4.5 Fake Review Detection ............................................................................. 12

4.6 Performance Evaluation and Results ........................................................ 12

5.EXPERIMENTAL CONFIGURATION ..................................................................13

5.1 NLP based Textblob classifier ................................................................. 18

5.2 SKlearn based classifiers ……………………………...……... ............... 20

5.3 Technologies Used ................................................................................... 21

5.3.2 Software Configuration ...............................................................22

6. ACCURACY ENHANCEMENTS ..........................................................................23

6.1 Enhancement 1……………………………...……....................................23

6.2 Enhancement 2 ..........................................................................................25

6.3 Enhancement 3…………………………………………………………... 27

6.4 Enhancement 4…………………………………………………………... 27

2. Frequent word list sample...…………………………………………………...10

4. Data Labeled ..................................................................................................... 14

5. Dataset Load ..................................................................................................... 14

7. Feature Extraction ............................................................................................. 16

8. Verified Purchase Review Count ...................................................................... 16

9. Rating Review count ......................................................................................... 17

10. F-Train.txt (Fake Review training dataset) ....................................................... 19

11. T-Train.txt (True Review training dataset) ....................................................... 19

12. TestingData.txt (Fake Review testing dataset) ................................................. 20

14. Sentiment word list ........................................................................................... 24

15. Sentiment word list code snippet ...................................................................... 25

16. POS tagging code snippet ................................................................................. 26

17. Emojis classification ......................................................................................... 28

18. Including Emoticons in the data ....................................................................... 28

20. Emoji Sentiment score code snippet .................................................................. 29

21. Label vs Product Category code snippet............................................................ 31

22. Label vs Rating code snippet ............................................................................. 32

23. Label vs Rating .................................................................................................. 32

24. Label vs Emoji count code snippet .................................................................... 33

25. Label vs Emoji count ......................................................................................... 33

26. Label vs Stopwords Count code snippet ............................................................ 34

27. Label vs Stopwords Count ................................................................................. 34

______, Graduate Coordinator