F13 Final
Presentation Outline
Introduction
Overview
Motivations and Uniqueness of the work
Literature Survey
Existing System
Problem Identification
Model Diagram
Methods, Tools, Algorithms and Technologies used
Experimentation and Results
System Specifications
Datasets Description
Parameters used
Experimental outcomes
Result Analysis and Validation
Conclusion and Future Scope
Bibliography
Introduction
Overview
• Sentiment analysis of IMDb reviews helps filmmakers and marketers understand audience
responses and preferences and make better decisions. It also shows how movies are
received, which benefits recommendation systems and consumer behavior analysis.
• The project uses machine learning and deep learning techniques to analyze IMDb movie
reviews to classify them as positive or negative. It compares various models to identify the
best-performing approach, with a focus on exploring Graph Convolutional Networks (GCN).
• The project also compares two embedding schemes, Word2Vec and TF-IDF, for converting
text into numerical representations.
• We implemented a Graph Convolutional Network (GCN) model, which captures relationships
in text data and achieved the highest accuracy among the models we compared (88.37%).
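To make the text-to-numbers step concrete, here is a minimal sketch of TF-IDF weighting in plain Python. The toy reviews and the function name `tfidf_vectors` are illustrative assumptions, and the smoothed IDF formula is one common convention, not necessarily the exact variant used in the project.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency (relative count) scaled by the term's IDF weight.
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

# Hypothetical toy reviews, already lowercased and tokenized.
reviews = [
    "a great movie with a great cast".split(),
    "a boring movie".split(),
]
vocab, vectors = tfidf_vectors(reviews)
```

Each review becomes a vector with one entry per vocabulary term; terms that are frequent in a review but rare across the corpus get the largest weights.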
Introduction contd..
Literature Survey
Authors | Title | Year | Techniques
A. Hassan and A. Mahmood [1] | Deep learning approach for sentiment analysis of short texts | 2017 | RNN, LSTM
A. Narayanan, M. Arora, and A. Bhatia [3] | A Study of Sentiment Analysis: Concepts, Techniques, and Challenges | 2019 | Lexicon-based approach, NLP
N. Ahmed, R. A. Michelin, W. Xue, S. Ruj, R. Malaney, S. S. Kanhere, A. Seneviratne, W. Hu, H. Janicke, and S. Jha [4] | Sentiment Analysis with NLP on Twitter Data | 2020 | TF-IDF, NLP
Kumar, S.; Gahalawat, M.; Roy, P.P.; Dogra, D.P.; Kim [5] | Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning | 2020 | Naive Bayes, SVM, LSTM
Ashwini Patil and Sneha M. Bharamgonda [6] | A Review on Sentiment Analysis Approaches | 2021 | Lexicon-based approach, ML approach
N. A. Buchan, N. P. Richardson, and R. M. Gorsuch [7] | A natural language processing based technique for sentiment analysis of college English corpus | 2023 | Cluster analysis, TF-IDF
Existing System
o Original work on this dataset was done by researchers at Stanford University:
• Used unsupervised learning to cluster words with close semantics
• Created word vectors from these clusters
• Ran various classification models on the word vectors to understand the polarity of
reviews
• This approach is particularly useful for data with rich sentiment content and subjectivity
in word semantics and intended meanings
o Additional work by Bo Pang and Peter Turney includes:
• Polarity detection of movie reviews and product reviews
• Creating a multi-class classification of reviews
• Predicting the reviewer rating of the movie/product
Literature Survey Contd..
Problem Identification
• These works discussed the use of Random Forest classifiers and SVMs for the
classification of reviews, as well as various feature extraction techniques.
• A significant point noted in these papers was the exclusion of a neutral category in
classification:
o Neutral texts are assumed to lie close to the boundary of the binary classifiers.
o Neutral texts are disproportionately hard to classify.
• Many sentiment analysis tools and software packages are available today, both free and commercial.
• With the advent of microblogging, sentiment analysis is widely used to analyze public sentiment and
draw inferences.
• One notable application was the use of Twitter to understand political sentiment during the German
Federal elections.
• Further exploration is needed to compare the performance of various embedding techniques effectively.
Literature Survey Contd..
Problem Identification
Performance-enhancing techniques:
• Incorporating context-aware models: Developing models that can capture the context and flow of
sentiment throughout a long review, rather than relying solely on individual words or phrases.
• Employing hierarchical architectures: Using models that can process reviews at different levels
of granularity, from individual sentences to the entire review, to better understand the overall
sentiment.
• Leveraging transfer learning: Fine-tuning pre-trained models on datasets containing long
reviews to improve their performance on complex language.
• Exploring ensemble methods: Combining multiple models, each specialized in handling
different aspects of long reviews, to improve overall performance.
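As one illustration of the ensemble idea, the sketch below combines the label predictions of several classifiers by majority vote. The three prediction lists and the function name `majority_vote` are hypothetical; the slides do not say which combination rule, if any, was tried in the project.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per review by majority vote."""
    combined = []
    # zip(*predictions) groups the labels that all models gave to the same review.
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models on four reviews.
model_a = ["pos", "neg", "pos", "neg"]
model_b = ["pos", "pos", "neg", "neg"]
model_c = ["neg", "pos", "pos", "neg"]
ensemble = majority_vote([model_a, model_b, model_c])
```

With an odd number of models and two classes, every review has a strict majority, so no tie-breaking rule is needed.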
Model Diagram :
Methods and Algorithms used contd..
Classification
• Logistic Regression: LogisticRegression from sklearn.linear_model
• Random Forest: RandomForestClassifier from sklearn.ensemble
• LSTM: LSTM layer from tensorflow.keras.layers
• ANN: MLPClassifier from sklearn.neural_network
• GCN: torch and torch_geometric, installed with precompiled binaries
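The project's GCN was built with torch_geometric; the dependency-free sketch below only illustrates the standard GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), on a tiny graph. The graph, features, weights, and the name `gcn_layer` are all hypothetical, and the slides do not describe how the review graph was actually constructed.

```python
import math

def gcn_layer(adj, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbor features, then apply the linear transform.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    out = [[sum(agg[i][f] * weights[f][o] for f in range(n_in))
            for o in range(n_out)] for i in range(n)]
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical 3-node graph: node 0 is linked to nodes 1 and 2.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 1.0]]  # 2 input features -> 2 output features
hidden = gcn_layer(adj, features, weights)
```

Each node's new representation mixes its own features with those of its neighbors, which is how a GCN propagates information between related nodes.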
Evaluation Metrics
• accuracy_score, precision_score, f1_score, recall_score, confusion_matrix, and classification_report
from sklearn.metrics: Used to evaluate the models by computing accuracy, precision, recall, and F1,
generating a confusion matrix, and producing a per-class classification report.
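The project computed these metrics with sklearn.metrics; the plain-Python sketch below only shows what they measure for a binary task. The label lists and the helper name `binary_metrics` are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute the confusion matrix and the four headline metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        # Rows are true classes, columns are predicted classes (negative first).
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```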
Visualization
• Seaborn library: Used to create heatmaps of model accuracies and bar charts of the sentiment
distribution, word frequencies, and model performance. Word clouds were also produced.
Experimentation and Results
System Specifications :
The IMDB movie review sentiment analysis system operates on standard desktop computers with the
following system specifications:
Datasets Description
The dataset used in the IMDB movie review sentiment analysis project is a CSV file containing
movie reviews from the Internet Movie Database (IMDB). Here is a brief description of the
dataset:
• Dataset shape: 54565 rows and 2 columns
• Source: The dataset is sourced from IMDB, a popular platform for movie reviews and ratings.
• Format: The dataset is in CSV format, which is commonly used for storing tabular data.
• Columns: The dataset contains two columns: 'Review' and 'Rating'.
Parameters used
Word2Vec Parameters:
• sentences=texts: The list of tokenized reviews.
• vector_size=100: The dimensionality of the word vectors.
• window=5: The maximum distance between the current and predicted word within a sentence.
• min_count=1: Ignores all words with a total frequency lower than this; with a value of 1, no words are dropped.
• workers=4: The number of worker threads to train the model.
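The parameters above map directly onto gensim's `Word2Vec` constructor (gensim 4.x naming). The sketch below is a hedged illustration: the sample reviews, the whitespace tokenization, and the averaging helper `review_vector` are assumptions, since the slides do not state how word vectors were pooled into one vector per review.

```python
# Parameter values taken from the slide above.
W2V_PARAMS = {"vector_size": 100, "window": 5, "min_count": 1, "workers": 4}

def train_word2vec(texts):
    """Train Word2Vec on tokenized reviews (requires gensim >= 4.0)."""
    from gensim.models import Word2Vec  # imported lazily; optional dependency
    return Word2Vec(sentences=texts, **W2V_PARAMS)

def review_vector(model, tokens):
    """Average the vectors of known tokens (an assumed pooling strategy)."""
    known = [model.wv[t] for t in tokens if t in model.wv]
    if not known:
        return [0.0] * model.vector_size
    return [sum(vec[i] for vec in known) / len(known)
            for i in range(model.vector_size)]

# Hypothetical raw reviews, tokenized by lowercasing and splitting on spaces.
raw_reviews = ["A great movie", "A boring movie"]
texts = [review.lower().split() for review in raw_reviews]
# model = train_word2vec(texts)              # uncomment when gensim is installed
# features = [review_vector(model, t) for t in texts]
```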
Train-Test Split Parameters:
• test_size: Specifies the proportion of the dataset to be used as the test set. In this case, 30% of
the data is used for testing, and the remaining 70% is used for training.
• random_state: Specifies the seed value for the random number generator to ensure
reproducibility of the results.
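The parameter names match scikit-learn's `train_test_split`, which is presumably what the project used. The plain-Python sketch below only illustrates the idea of a seeded 70/30 split over row indices; the seed value 42 and the function name are assumptions (the slides do not give the seed), and the shuffle will not reproduce scikit-learn's exact partition.

```python
import math
import random

def train_test_split_indices(n_rows, test_size=0.3, random_state=42):
    """Shuffle row indices with a fixed seed and split them into train and test."""
    rng = random.Random(random_state)  # fixed seed makes the split repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = math.ceil(test_size * n_rows)  # test-set size is rounded up
    return indices[n_test:], indices[:n_test]

# 54565 is the number of rows reported in the dataset description.
train_idx, test_idx = train_test_split_indices(54565)
```

Calling the function again with the same `random_state` returns the same partition, which is what makes the reported results reproducible.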
Experimentation and Results Contd ..
Table 2. Accuracy, F1-Score, Recall, and Precision (%) of all the models
Model | Accuracy | F1-Score | Recall | Precision
Logistic Regression | 88 | 88 | 88 | 88
Random Forest | 81.0 | 80 | 82 | 81
ANN | 87.16 | 87 | 87 | 87
LSTM | 85.76 | 83 | 83 | 83
GCN | 88.37 | 88 | 88 | 88
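The comparison in Table 2 can be checked with a few lines of Python. The values are copied from the table; the dictionary layout and variable names are only illustrative.

```python
# Accuracy, F1-score, recall, and precision (%) as reported in Table 2.
results = {
    "Logistic Regression": {"accuracy": 88.0, "f1": 88, "recall": 88, "precision": 88},
    "Random Forest": {"accuracy": 81.0, "f1": 80, "recall": 82, "precision": 81},
    "ANN": {"accuracy": 87.16, "f1": 87, "recall": 87, "precision": 87},
    "LSTM": {"accuracy": 85.76, "f1": 83, "recall": 83, "precision": 83},
    "GCN": {"accuracy": 88.37, "f1": 88, "recall": 88, "precision": 88},
}

# Rank the models from highest to lowest accuracy.
ranked = sorted(results, key=lambda name: results[name]["accuracy"], reverse=True)
best, runner_up = ranked[0], ranked[1]
margin = round(results[best]["accuracy"] - results[runner_up]["accuracy"], 2)
```

The ranking shows how small the gap is between the two best models, which is worth keeping in mind when reading the conclusion.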
Experimentation and Results Contd..
Figure 3. Accuracy, F1-Score, Precision, and Recall of all the models
Figure 4. Accuracy of models using two different embeddings
Conclusion and Future Scope
Conclusion :
• The project applies NLP and machine learning techniques to classify the sentiment of movie
reviews. The models reach accuracies between 81% and 88.37%, indicating that the approach is
effective.
• The project compares LR, Random Forest, ANN, LSTM, and GCN models and finds that the
GCN has the highest accuracy on the IMDB dataset, narrowly ahead of Logistic Regression (88%).
• Trained on TF-IDF embeddings, the GCN model reaches an accuracy of about 88.37% and F1,
recall, and precision scores of 88%.
Future Scope:
• The project opens up avenues for future research in sentiment analysis, including the exploration of
deep learning techniques and the analysis of sentiment in other domains.
• Embedding techniques such as FastText and Word2Vec can be explored further with the GCN model.
Bibliography
[1] A. Hassan and A. Mahmood, "Deep learning approach for sentiment analysis of short texts," in
Proceedings of the Third International Conference on Control, Automation and Robotics (ICCAR),
Nagoya, Japan, Apr. 2017, pp. 705-710.
[2] A. Kiritchenko and S. M. Mohammad, "Gender Bias in Sentiment Analysis," ResearchGate,
Nov. 2017.
[3] A. Narayanan, M. Arora, and A. Bhatia, "A Study of Sentiment Analysis: Methods and Tools,"
ResearchGate, Apr. 2019.
[4] N. Ahmed, R. A. Michelin, W. Xue, S. Ruj, R. Malaney, S. S. Kanhere, A. Seneviratne, W. Hu, H.
Janicke, and S. Jha, "A Survey of COVID-19 Contact Tracing Apps," IEEE Access, vol. 8, pp.
134577-134601, 2020, doi: 10.1109/ACCESS.2020.3016145.
[5] S. Kumar, M. Gahalawat, P. P. Roy, D. P. Dogra, and B.-G. Kim, "Exploring Impact of Age and
Gender on Sentiment Analysis Using Machine Learning," Electronics, vol. 9, p. 374, 2020.
[6] A. Patil and S. M. Bharamgonda, "A Review on Sentiment Analysis Approaches,"
ResearchGate, Feb. 2021.
[7] N. A. Buchan, N. P. Richardson, and R. M. Gorsuch, "Determination of critical thresholds in social
networks," Proc. Natl. Acad. Sci. U.S.A., vol. 120, no. 23, pp. e10280647, Jun. 2023.