CU6051NA - Artificial Intelligence: Student Name: Renish Gautam
2019-20 Autumn
I confirm that I understand my coursework needs to be submitted online via Google Classroom under the relevant
module page before the deadline in order for my assignment to be accepted and marked. I am fully aware that late
1. Introduction
Artificial intelligence (AI) is the simulation of human intelligence processes by machines,
especially computer systems. It is the ability of a digital computer to perform tasks commonly
associated with intelligent beings. The term is frequently applied to the project of developing
systems endowed with the intellectual processes characteristic of humans, such as the ability to reason,
discover meaning, generalize, or learn from past experience. Despite continuing advances in
computer processing speed and memory capacity, there are as yet no programs that can match
human flexibility over wider domains or in tasks requiring much everyday knowledge. On the
other hand, some programs have attained the performance levels of human experts and
professionals in performing certain specific tasks, so that artificial intelligence in this limited sense
is found in applications as diverse as medical diagnosis, computer search engines, and voice or
handwriting recognition. While the huge volume of data that’s being created on a daily basis would
bury a human researcher, AI applications that use machine learning can take that data and quickly
turn it into actionable information. (Cambria, 2017) AI has become so widespread that we often do
not realize we are using it, for example on social networking sites such as Facebook, YouTube and
Instagram, which show content based on our interests. Moreover, Google's AI helps us with image
recognition and provides the voice assistant on Android devices, among other things. Hence, AI is
a wide-ranging branch of computer science concerned with building smart machines. (Pozzi, 2016)
Supervised learning: Here, the data sets are labeled so that patterns can be detected and
used to label new data sets.
Unsupervised learning: Here, data sets are not labeled and are sorted according to their
similarities and differences.
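Here is a minimal Python sketch on made-up one-dimensional data that makes the difference concrete. The values and function names are hypothetical. The supervised part learns an average value (centroid) for each label and labels a new point by the nearest centroid. The unsupervised part receives no labels. It splits the points into two groups by similarity, using k-means with two clusters.

```python
from statistics import mean

def train_nearest_centroid(points, labels):
    """Supervised: learn one centroid (the mean value) per label from labeled data."""
    return {
        label: mean(p for p, l in zip(points, labels) if l == label)
        for label in set(labels)
    }

def predict_label(centroids, point):
    """Label a new point with the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - point))

def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two groups by similarity
    (k-means with k = 2). Assumes at least two distinct values."""
    low, high = min(points), max(points)  # start the two centres at the extremes
    for _ in range(iterations):
        group_low = [p for p in points if abs(p - low) <= abs(p - high)]
        group_high = [p for p in points if abs(p - low) > abs(p - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high

# Hypothetical toy data: one numeric feature per example.
labeled_points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]
centroids = train_nearest_centroid(labeled_points, labels)
print(predict_label(centroids, 2.2))               # label a new point
print(two_means([1.0, 9.0, 1.5, 8.5, 2.0, 8.0]))   # group unlabeled points
```

The supervised function needs the `labels` list, while `two_means` never sees any labels. That is the distinction drawn in the two definitions above.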
1|Page
Renish Gautam
CU6051NI Artificial Intelligence
However, machine learning remains a relatively ‘hard’ problem, particularly when existing
algorithms and models must be adapted to work well for a new application.
Sentiment analysis is valuable because it can be automated, so decisions can be based on large
amounts of data rather than on plain intuition, which is not always right. (Hardy, 2020)
Basic sentiment analysis of text follows a straightforward process. First, the text document is
broken down into its component parts, such as sentences, phrases, tokens and parts of speech.
Next, each sentiment-bearing phrase and component is identified. Each identified phrase and
component is then assigned a sentiment score. Optionally, these scores can be combined for
multi-layered sentiment analysis. (lexalytics, 2020)
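The steps above can be sketched in Python as follows. This is an illustrative assumption, not the Lexalytics implementation. The lexicon and its scores are invented, and sentences and tokens are split with simple regular expressions. Part-of-speech tagging is left out. The optional combining step simply averages the sentence scores.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score in [-1, +1].
LEXICON = {"love": 0.9, "great": 0.8, "good": 0.6, "boring": -0.6, "bad": -0.7, "terrible": -1.0}

def split_sentences(text):
    """Step 1a: break the document into sentences (naively, on . ! ?)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Step 1b: break a sentence into lower-case word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def score_sentence(sentence):
    """Steps 2-3: find the sentiment-bearing tokens and average their scores."""
    scores = [LEXICON[token] for token in tokenize(sentence) if token in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def score_document(text):
    """Optional step 4: combine sentence scores into one document score (mean)."""
    sentence_scores = [score_sentence(s) for s in split_sentences(text)]
    overall = sum(sentence_scores) / len(sentence_scores) if sentence_scores else 0.0
    return sentence_scores, overall

print(score_document("I love this video. The ending was boring!"))
```

Keeping the per-sentence scores shows that this comment is mixed: one clearly positive sentence and one negative sentence. A single overall score would hide that.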
For many people, YouTube is a place to watch music videos, comedy shows, how-to guides, recipes,
hacks and more, and it can be a great space for teens to discover things they like. It has become
one of the growing platforms, offering a simple video-sharing service on which users can watch,
like, share and comment on videos, and upload their own. A main challenge for YouTubers is to
collect all the relevant comments on a video and summarize the overall response to it, which is
very time-consuming. With sentiment analysis, a YouTuber can quickly learn how viewers have
reacted without spending a lot of time reading every comment. However, comments are not all
alike, and they carry different kinds of emotion: some people react badly to any kind of
disagreement, while others even thrive on it. Sentiment analysis is used to determine the
sentiment of each comment.
At times, YouTube comments can be so toxic that they attack people personally or target their
religion or gender. About 500 million comments are deleted. Many YouTubers have complained about
the effect hate comments have had on their videos. This toxicity seems to have a serious impact on
how many people engage in conversation, and it discourages some from engaging in online
conversation altogether. As a result, online platforms struggle to facilitate connections
effectively, resulting in many small groups
2. Background
2.1. Sentiment Analysis and its approaches
Various factors determine the sentiment of speech or text, so sentiment analysis is not a
straightforward procedure. Text information can typically be divided into two main types: facts
and opinions. Opinions are of two types: direct and comparative. Direct opinions express an
opinion about an entity directly, while comparative opinions express a similarity or difference
between two or more entities. (Jadav, 2017)
There are numerous types of sentiment analysis. Important types include systems that focus on
polarity (positive, negative, neutral) and systems that detect feelings and emotions or identify
intentions. Emotions such as disappointment, frustration or anxiety (negative feelings) or joy,
affection or excitement (positive feelings) are correlated with the polarity of a text. Both
machine learning algorithms and lexicons are used to detect emotions and feelings in text. When a
system relies on lexicons, detection becomes tricky, because the way people express their emotions
varies greatly, and so do the words they use.
2.1.1. Approaches
Many methods and algorithms have been introduced to extract sentiment from text. Computational
linguistics is such a large field that research is still ongoing to improve the accuracy of these
methods. The main approaches to sentiment analysis are:
Rule-based: In this approach, a set of manually crafted rules, written in some form of scripting
language, identifies subjectivity, polarity or the subject of an opinion. The rules can draw on
classic NLP techniques such as tokenization, part-of-speech tagging, stemming and parsing, and on
other resources such as lexicons (lists of words and expressions). (Monkey Learn, 2020)
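Here is a minimal sketch of a rule-based classifier. It assumes a small hand-written lexicon (the word lists below are hypothetical) and applies one illustrative rule: a negator such as "not" directly before a sentiment word flips that word's polarity.

```python
import re

# Hypothetical hand-written word lists; a real system would use a curated lexicon.
POSITIVE = {"good", "great", "love", "helpful", "amazing"}
NEGATIVE = {"bad", "boring", "hate", "useless", "awful"}
NEGATORS = {"not", "never", "no"}

def rule_based_sentiment(text):
    """Sum word polarities, applying the negation rule, and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        # Rule: a negator directly before a sentiment word flips its polarity.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for comment in ["This tutorial is great", "This is not helpful", "Not bad at all"]:
    print(comment, "->", rule_based_sentiment(comment))
```

All of the behaviour comes from the word lists and the rule written by hand. No training data is involved.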
Automatic: This approach learns from data using machine learning techniques. The task is
modeled as a classification problem: a classifier is fed a text and returns the corresponding
sentiment, e.g. negative, positive or neutral. The classifier is built in two steps. In the
training step, a model learns to associate a specific input with the corresponding output: pairs
of feature vectors and tags (e.g. positive, negative or neutral) are fed into the machine
learning algorithm to generate a model. In the prediction step, the feature extractor transforms
unseen text inputs into feature vectors, and these vectors are fed into the model to generate the
predicted tags. Naïve Bayes, Logistic Regression, Support Vector Machines and Neural Networks are
the supervised classification algorithms most commonly used. (Monkey
Learn, 2020)
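The training and prediction steps share a feature-extraction step. Here it is sketched as a bag-of-words extractor in plain Python. The function names and the three training comments are hypothetical. The point is that training texts and unseen texts go through the same extractor, so their vectors line up column by column.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Assign every word seen in the training texts a fixed column index."""
    vocabulary = {}
    for text in texts:
        for token in tokenize(text):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary

def extract_features(text, vocabulary):
    """Bag-of-words vector: count of each vocabulary word; unseen words are ignored."""
    vector = [0] * len(vocabulary)
    for token in tokenize(text):
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector

training_texts = ["great video", "bad audio", "great great tips"]
training_tags = ["positive", "negative", "positive"]
vocabulary = build_vocabulary(training_texts)

# Training step: (feature vector, tag) pairs are what the learning algorithm receives.
training_pairs = [(extract_features(t, vocabulary), tag)
                  for t, tag in zip(training_texts, training_tags)]
# Prediction step: an unseen text passes through the same extractor before reaching the model.
unseen_vector = extract_features("great audio overall", vocabulary)
print(training_pairs)
print(unseen_vector)
```

The word "overall" was never seen in training, so it has no column and is dropped from the unseen vector.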
Hybrid: The concept of hybrid methods is very intuitive: combine the best of both worlds, the
rule-based and the automatic approaches. By combining both, these methods can often improve
accuracy and precision. (Monkey Learn, 2020)
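Here is one possible hybrid, sketched under stated assumptions. The score from a small rule-based lexicon is blended with the probability that a trained model assigns to the positive class. The word lists, the 50/50 weighting, the neutral band of 0.1 and the stub model are all hypothetical. In practice the model would be a trained classifier such as Naïve Bayes.

```python
import re

# Hypothetical word lists for the rule-based half.
POSITIVE = {"good", "great", "love", "helpful"}
NEGATIVE = {"bad", "boring", "hate", "useless"}

def rule_score(text):
    """Rule-based half: (positive hits - negative hits) / number of tokens, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return hits / len(tokens)

def hybrid_sentiment(text, model_positive_probability, rule_weight=0.5, neutral_band=0.1):
    """Blend the rule score with a model's P(positive), rescaled from [0, 1] to [-1, 1]."""
    model_score = 2 * model_positive_probability(text) - 1
    combined = rule_weight * rule_score(text) + (1 - rule_weight) * model_score
    if combined > neutral_band:
        return "positive"
    if combined < -neutral_band:
        return "negative"
    return "neutral"

# Stand-in for a trained classifier (hypothetical): always returns P(positive) = 0.9.
def stub_model(text):
    return 0.9

print(hybrid_sentiment("great tips", stub_model))
```

Changing `rule_weight` shifts how much the hand-written rules count compared with the learned model.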
In the journal article "Sentiment Analysis of YouTube Comments on Koha Open Source Software
Videos", Lambodara Parabhoi and Payel Saha conducted sentiment analysis on a total of 404
comments on Koha ILS videos on YouTube. The main objective of the project was to determine
whether the comments were positive, negative or neutral. The article discusses the use of the
Naïve Bayes algorithm for sentiment analysis. The authors used the ParallelDots API, and Google
Sheets with the AYLIEN Text Analysis API. The analysis covered categories such as intention,
subjectivity and sentiment, emotion, and word frequency. (Parabhoi & Saha, 2018)
In another study, Joe Timoney, Adarsh Raj and Brian Davis conducted sentiment analysis on
comments extracted from YouTube song videos. In all, 250 song titles were gathered and a total of
100 comments were extracted from these videos. Classification approaches such as Naïve Bayes and
decision trees were discussed, along with cross-validation techniques and evaluation metrics. Two
machine learning algorithms were tested, Naïve Bayes and decision trees: the accuracy obtained was
79% with Naïve Bayes and 86.09% with the decision tree. (Timoney et al., 2019)
In a third study, the authors proposed a Natural Language Processing (NLP) based sentiment
analysis approach for YouTube user comments. They demonstrated the effectiveness of the scheme
through data-driven experiments, measured by the accuracy of finding popular, high-quality videos.
The process consisted of four steps: comment collection and preprocessing, data set generation,
sentiment measurement, and video rating. (Bhuiyan et al., 2017)
The application shown above performs sentiment analysis comparing McDonald's and Burger King. It
shows a massive spike in positive sentiment for Burger King, while at the same time McDonald's was
hit with a wave of negative sentiment.
3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
Taking into account the research and explanations above, it is clear that sentiment analysis can be
applied in areas such as:
Brand Monitoring
Customer Support
Customer Feedback
Product Analytics, etc.
Among the many approaches to sentiment analysis, supervised learning is preferred for the
proposed problem of predicting the sentiment of YouTube comments. Among the many supervised
classification algorithms, Naïve Bayes is chosen to predict the sentiment. The training dataset of
YouTube comments is gathered from Kaggle. Naïve Bayes offers the following advantages:
Fast
Requires less training data
Highly scalable
It can make probabilistic prediction
It is easy to implement
It works more efficiently than other algorithms if the independence assumption holds.
(educba, 2020)
A Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated
to the presence of any other feature. For example, a fruit may be considered an apple if it is
red, round and about 3 inches in diameter. Even if these features depend on each other or on the
existence of other features, all of them contribute independently to the probability that this
fruit is an apple, which is why the method is called 'naive'.
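The independence assumption in the fruit example can be sketched as follows. The likelihood values are invented for illustration. The class priors are left out so that only the likelihood term P(x|c) is shown. Each feature's probability is looked up separately and the results are multiplied.

```python
from math import prod

# Hypothetical per-feature likelihoods P(feature | class), each estimated on its own.
LIKELIHOODS = {
    "apple": {"red": 0.7, "round": 0.9, "about_3_inches": 0.6},
    "cherry": {"red": 0.9, "round": 0.95, "about_3_inches": 0.01},
}

def naive_likelihood(features, fruit):
    """Naive assumption: P(x1, ..., xn | c) = P(x1 | c) * ... * P(xn | c)."""
    return prod(LIKELIHOODS[fruit][f] for f in features)

observed = ["red", "round", "about_3_inches"]
for fruit in LIKELIHOODS:
    print(fruit, naive_likelihood(observed, fruit))
```

With these made-up numbers, the observed features fit an apple far better than a cherry, mainly because of the size feature.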
The Naive Bayes model is easy to build and particularly useful for very large data sets. Along
with its simplicity, Naive Bayes is known to outperform even highly sophisticated classification
methods in some cases. (Ray, 2017)
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and
P(x|c). Look at the equation below:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
Here,
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of class.
P(x|c) is the likelihood which is the probability of predictor given class.
P(x) is the prior probability of predictor.
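Here is a worked example with hypothetical numbers. Let c be "the comment is positive" and x be "the comment contains the word great". Suppose P(c) = 0.6, P(x|c) = 0.2 and P(x|not c) = 0.05. By the law of total probability, P(x) = 0.2 × 0.6 + 0.05 × 0.4 = 0.14. Then P(c|x) = P(x|c) P(c) / P(x) = 0.12 / 0.14 ≈ 0.857. The same calculation in Python:

```python
def posterior(prior_c, p_x_given_c, p_x_given_not_c):
    """Bayes' theorem for two classes: P(c|x) = P(x|c) * P(c) / P(x),
    where P(x) = P(x|c) * P(c) + P(x|not c) * P(not c)."""
    p_x = p_x_given_c * prior_c + p_x_given_not_c * (1 - prior_c)
    return p_x_given_c * prior_c / p_x

# Hypothetical numbers: c = "positive comment", x = "comment contains the word great".
print(posterior(prior_c=0.6, p_x_given_c=0.2, p_x_given_not_c=0.05))
```

Seeing the word raises the probability of a positive comment from the prior of 0.6 to about 0.857.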
3.3. Pseudocode
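Import the necessary libraries (pandas, scikit-learn).
Read the dataset and separate the comment text from its sentiment label.
Remove stopwords.
Tokenize the text.
Split the comments and labels into training and test sets (train_texts, my_test_data, y_train, y_test).
Fit the vectorizer on the training comments: X_train = vectorizer.fit_transform(train_texts)
model = naive_bayes.MultinomialNB()
model.fit(X_train, y_train)
my_vectorizer = vectorizer.transform(my_test_data)
predicted = model.predict(my_vectorizer)
Compare the predicted labels (predicted) with the actual labels (y_test) to measure accuracy.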
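The pseudocode relies on scikit-learn's MultinomialNB. To show what happens inside each step, the sketch below reimplements the same pipeline in plain Python: stopword removal, tokenization, word counts, training, prediction and accuracy. It is an illustration, not the scikit-learn code. The stopword list and the six training comments are hypothetical stand-ins for the Kaggle dataset. The sketch assumes add-one (Laplace) smoothing, which matches MultinomialNB's default alpha=1.0.

```python
import math
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to"}

def preprocess(text):
    """Tokenize into lower-case words and remove stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes on word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)        # used for the priors P(c)
        self.word_counts = defaultdict(Counter)    # per-class word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens, label):
        """log P(c) + sum of log P(word | c); P(x) is dropped as it is equal for all classes."""
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        score = math.log(self.label_counts[label] / self.n_docs)
        for token in tokens:
            if token in self.vocabulary:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, texts):
        results = []
        for text in texts:
            tokens = preprocess(text)
            results.append(max(self.label_counts, key=lambda c: self.log_score(tokens, c)))
        return results

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical stand-in for the Kaggle training data.
train_texts = ["I love this video", "great tips, very helpful", "amazing and helpful tutorial",
               "this is boring", "terrible audio and bad editing", "I hate the ending"]
train_labels = ["positive", "positive", "positive", "negative", "negative", "negative"]
test_texts = ["helpful and amazing", "bad and boring"]
test_labels = ["positive", "negative"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
predicted = model.predict(test_texts)
print(predicted, accuracy(test_labels, predicted))
```

The model adds logarithms instead of multiplying raw probabilities. This keeps scores for long comments from underflowing to zero.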
3.4. Flowchart
4. Conclusion
5. Bibliography
Bhuiyan, H., Ara, J., Bardhan, R. & Islam, R. (2017) Retrieving YouTube Video by Sentiment
Analysis on User Comment. Proc. of the 2017 IEEE International Conference on Signal and Image
Processing Applications, p.478.
Jadav, S. (2017) Sentiment Analysis: A Review. Scientific Journal of Impact Factor (SJIF): 4.72,
p.962.
Parabhoi, L. & Saha, P. (2018) Sentiment Analysis of YouTube Comments on Koha Open Source
Software Videos. International Journal of Library and Information Studies, 8, p.102.
Pozzi, F.A. (2016) Sentiment Analysis in Social Networks. In Sentiment Analysis in Social
Networks. 1st ed. Morgan Kaufmann. p.284.
Ray, S. (2017) 6 Easy Steps to Learn Naive Bayes Algorithm with codes in Python and R [Online].
Available from: https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/?fbclid=IwAR1-5mSCWS8WwOHc3B6OJPy8-R73G3OqTxDWn42c528CoOZO2jw5BQYXmSM
[Accessed 11 September 2017].
Timoney, J., Raj, A. & Davis, B. (2019) Nostalgic Sentiment Analysis of YouTube Comments for Chart
Hits of the 20th Century. Maynooth: Dept. of Computer Science, Maynooth University.