Dr. A.P.J. Abdul Kalam Technical University: ODD Semester 2021-22 Examination Admit Card

Roll No. : 1900680100081

Name of Candidate : ATUL SHARMA
Gender : M
Course : B.Tech
Branch : Computer Science and Engineering
Semester : 5
Examination Center : (128) Bharat Institute of Technology, Meerut

Subject Code  Subject Name                      Exam Date*  Timings         AnsBookNo.

KCS503        Design and Analysis of Algorithm  30/12/2021  9.30AM-12.30PM
KCS501        Database Management System        03/01/2022  9.30AM-12.30PM
KCS051        Data Analytics                    05/01/2022  9.30AM-12.30PM
KCS055        Machine Learning Techniques       10/01/2022  9.30AM-12.30PM
KCS502        Compiler Design                   12/01/2022  9.30AM-12.30PM

Session: 2021-2022



Amazon Food Product Review




I owe sincere thanks to all the faculty members in the Department of Computer Science and
Engineering for their kind guidance and encouragement from time to time.

Date: 21/12/21 PRIYANSHU NEGI

Amazon Food Product Review Analysis

What is the Amazon Fine Food Reviews dataset?

The goal here is to classify food reviews as positive or negative based on
the customers' text, so the first step is to download the dataset. Suppliers
could use reviews from their customers to provide better service to them.
Each review includes several features such as ‘ProductId’, ‘UserId’,
‘Score’ and ‘Text’.

The Amazon Fine Food Reviews dataset consists of reviews of fine foods
from Amazon. The data span a period of more than 10 years, including all
~500,000 reviews up to October 2012. Reviews include product and user
information, ratings, and a plain-text review. The source of the dataset
also has reviews from all other Amazon categories.

Amazon reviews are often the most publicly visible reviews of consumer
products. As a frequent Amazon user, I was interested in examining the
structure of a large database of Amazon reviews and visualizing this
information so as to be a smarter consumer and reviewer.

• Number of reviews: 568,454
• Number of users: 256,059
• Number of products: 74,258
• Timespan: Oct 1999 to Oct 2012
• Number of Attributes/Columns in data: 10
Attribute Information:

1. Id - row identifier of the review
2. ProductId - unique identifier for the product
3. UserId - unique identifier for the user
4. ProfileName - profile name of the user
5. HelpfulnessNumerator - number of users who found the review helpful
6. HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
7. Score - rating between 1 and 5
8. Time - timestamp for the review
9. Summary - brief summary of the review
10. Text - text of the review

Q - How do we determine whether a review is positive or negative?

Ans - We could use the Score/Rating as an approximate proxy for the
polarity (positivity/negativity) of a review. Since we only want the global
sentiment of the recommendations (positive or negative), we purposefully
ignore all reviews with a Score of 3, which are treated as neutral. A
rating of 4 or 5 is labelled "positive", and a rating of 1 or 2 is labelled
"negative".
Import libraries

I imported several libraries for the project:

Loading Data from Reviews.CSV file in Google Drive:

Download the file based on its file id:

Eliminating neutral reviews:
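
The original code for this step is not reproduced here, so the following is only a minimal sketch of the rule described earlier: reviews with a Score of 3 are dropped, scores above 3 become "positive" and scores below 3 become "negative". It assumes each review is a plain dictionary with a "Score" field (as `csv.DictReader` would yield); the "Polarity" key and the sample rows are hypothetical.

```python
def label_reviews(rows):
    """Drop neutral reviews (Score == 3) and label the rest by polarity."""
    labelled = []
    for row in rows:
        score = int(row["Score"])
        if score == 3:
            continue  # neutral review: ignored on purpose
        polarity = "positive" if score > 3 else "negative"
        # "Polarity" is a hypothetical field name added by this sketch
        labelled.append({**row, "Polarity": polarity})
    return labelled


# Hypothetical sample rows, not taken from the real dataset
sample = [
    {"Score": "5", "Text": "Great taste"},
    {"Score": "3", "Text": "It was okay"},
    {"Score": "1", "Text": "Stale on arrival"},
]
labelled = label_reviews(sample)
```

With a DataFrame library the same rule would normally be a single filter plus a mapped column; the loop form is used here only to keep the sketch dependency-free.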

Data cleaning: DeDuplication
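
The de-duplication code itself is missing from this report, so the sketch below only illustrates one plausible approach. It assumes that two rows are duplicates when the same user posted the same text at the same timestamp (for example, one review attached to several variants of a product). That definition, the `keys` parameter and the sample rows are assumptions of this sketch, not something the report states.

```python
def deduplicate(rows, keys=("UserId", "Time", "Text")):
    """Keep only the first row for each combination of the given keys."""
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique


# Hypothetical rows: the first two differ only in ProductId
rows = [
    {"ProductId": "B001", "UserId": "U1", "Time": "1199577600", "Text": "Crunchy and fresh"},
    {"ProductId": "B002", "UserId": "U1", "Time": "1199577600", "Text": "Crunchy and fresh"},
    {"ProductId": "B003", "UserId": "U2", "Time": "1199577600", "Text": "Crunchy and fresh"},
]
unique_rows = deduplicate(rows)
```

Which row of a duplicate group survives depends on the input order, so sorting the data first (for example by ProductId) makes the result reproducible.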

Text Preprocessing: Stemming, stop-word removal and

Code for removing HTML tags and punctuation:
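
Since the original code is not included at this point, here is a small sketch of what such a cleaning function could look like using only the standard `re` and `string` modules. The function name `clean_review` and the choice to lower-case the text and collapse whitespace are assumptions of this sketch.

```python
import re
import string

# Anything that looks like an HTML tag, e.g. <br /> or <p>
TAG_RE = re.compile(r"<[^>]+>")
# Any single ASCII punctuation character
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def clean_review(text):
    """Strip HTML tags and punctuation, lower-case, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)    # remove tags before their '<' and '>' are lost
    text = text.lower()
    text = PUNCT_RE.sub(" ", text)  # replace punctuation with spaces
    return " ".join(text.split())   # collapse runs of whitespace


cleaned = clean_review("I <br />LOVED it!!! Best chips, ever.")
```

Tags are removed first because the punctuation pass would otherwise delete the angle brackets and leave the tag names behind as words.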

Review after pre-processing:


RNNs are a powerful and robust type of neural network. They are among the most
promising algorithms in use because they keep an internal memory of earlier inputs.

Like many other deep learning algorithms, recurrent neural networks are
relatively old. They were initially created in the 1980s, but only in recent years
have we seen their true potential. An increase in computational power, the
massive amounts of data that we now have to work with, and the invention of
long short-term memory (LSTM) in the 1990s have brought RNNs to the
foreground.

Because of their internal memory, RNNs can remember important things about
the input they received, which helps them predict what is coming next. This is
why they are a preferred choice for sequential data like time series, speech,
text, financial data, audio, video, weather and much more. Compared with models
that treat each input independently, recurrent neural networks can form a much
deeper understanding of a sequence and its context.

About LSTM:

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN)
architecture used in the field of deep learning. Unlike standard feedforward neural networks,
LSTM has feedback connections. It can process not only single data points (such as
images), but also entire sequences of data (such as speech or video). For example, LSTM
is applicable to tasks such as unsegmented, connected handwriting recognition, speech
recognition and anomaly detection in network traffic or IDSs (intrusion detection systems).
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget
gate. The cell remembers values over arbitrary time intervals and the three gates regulate
the flow of information into and out of the cell.
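
To make the role of the cell and the three gates concrete, the sketch below runs one step of an LSTM cell in the standard textbook formulation, reduced to scalar values so that it needs nothing beyond the `math` module. It is an illustration only and not the model trained in this project; the weights in `params` are arbitrary placeholder values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each of "forget", "input", "output" and "candidate" to a
    (w, u, b) triple: input weight, recurrent weight and bias.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))       # how much new information to write
    o = sigmoid(pre_activation("output"))      # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate")) # candidate value to write

    c = f * c_prev + i * g                     # updated cell state (the memory)
    h = o * math.tanh(c)                       # updated hidden state (the output)
    return h, c


# Arbitrary placeholder weights, chosen only for illustration
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:  # a short input sequence
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is carried from step to step and is only ever scaled by the forget gate and added to, which is what lets the cell hold a value over many time steps.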
Final model: RNN with 4 LSTM layers:
Adding the four layers and the output layer:

Final Evaluation:
Batch Normalization:

Table (Different models with their train and test accuracy):

Procedure Followed:
> Upload the Amazon Fine Food Reviews dataset to Google Drive to run it on Google Colab.
> Load the Amazon Fine Food Reviews dataset from Google Drive.
> Perform text pre-processing on the text data.
> Perform the following 3 tasks to convert it into the IMDB data format:

a. Sort the dataset by time, then find the vocabulary of all the reviews in the dataset.

b. Compute the frequency of each word in the vocabulary.

c. Index each word in decreasing order of frequency (the word with the maximum frequency gets rank 1, i.e. index 1).

> Split the whole dataset randomly 50-50 into training_data and test_data.
> Pad or truncate each review into a sequence of length 100.
> Implement RNNs with 1, 2, 3, 4, 5 and 6 LSTM layers, the 6th also with batch normalisation.
> Find the accuracy of each of the above models.
> Draw the binary cross-entropy loss vs. number of epochs plot.
> Compare the models.
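
The frequency-rank encoding and the pad-or-truncate step from the procedure above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: the reviews are already pre-processed and sorted by time, words are separated by whitespace, padding is added at the front with the value 0, and long reviews keep their first `length` tokens. The function names and those padding choices belong to this sketch, not to the original code.

```python
from collections import Counter


def build_word_index(reviews):
    """Map each word to its frequency rank (most frequent word gets index 1)."""
    counts = Counter(word for review in reviews for word in review.split())
    # most_common() orders by decreasing count; ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(review, word_index):
    """Replace each known word by its rank; unknown words are skipped."""
    return [word_index[w] for w in review.split() if w in word_index]


def pad_or_truncate(sequence, length=100, pad_value=0):
    """Return a list of exactly `length` integers."""
    sequence = sequence[:length]  # keep the first `length` tokens
    return [pad_value] * (length - len(sequence)) + sequence


# Hypothetical, already-cleaned reviews
reviews = ["good tea good price", "bad tea"]
word_index = build_word_index(reviews)
encoded = [pad_or_truncate(encode(r, word_index), length=5) for r in reviews]
```

Index 0 is reserved for padding, which is why the ranks start at 1.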
Software/Hardware Requirement Specification:

Software Requirements:
Windows XP, Windows 7 (Ultimate, Enterprise)
Android Studio

Hardware Components:
Processor i3
Hard Disk 5 GB
Memory 1GB RAM
Android phone with KitKat or higher.

The system should have an active internet connection.
The user has to log in to use the system, which keeps the data secure.
The system helps the user find the best product using many resources.
Minimizes the user's time.
Users can rate and review a product.

It requires an active internet connection; otherwise errors may occur.
Wrong reviews and ratings will affect the overall rating of a product.

This system can be used by e-commerce companies to help their users
get the best product.


