Sentiment Analysis on Cyanide Case After 'Ice Cold' Aired with NLP Method
using Naïve Bayes Algorithm
Rahmatika Hizria1)*, Sarwadi2), Rabiatul Adawiyah Hasibuan 3), Ramadhani Ritonga4),
Rika Rosnelly 5)
1,2,3,4,5) Potensi Utama University, Indonesia
1) [email protected], 2) [email protected], 3) [email protected], 4) [email protected], 5) [email protected]
ABSTRACT
Information technology is developing rapidly, and the reach of the Internet has expanded even to remote areas.
The public increasingly uses social media as a source of information on all aspects of people's lives. Social
media plays a vital role in how most people receive news, and the cyanide coffee case is one example. Netizens
began discussing the Cyanide Coffee case again after Netflix covered it in a documentary film entitled Ice Cold,
which made the public even more convinced of irregularities in the case. Sentiment analysis is therefore needed
to extract public opinion from the comments, and it aims to build a sentiment model of public comments on this
case. This research was conducted to find out and classify public sentiment on the Cyanide Coffee case using the
Natural Language Processing (NLP) method, which consists of text preprocessing beginning with tokenization. The
data were filtered using Indonesian stopwords and then normalized using the Porter stemmer. The data were
collected from public comments on Ice Cold content on the TikTok platform using TikTok Comments Scraper. The
test results show that classification using naïve Bayes yielded 22 negative comments, 4052 neutral comments and
34 positive comments, with 87% accuracy, 97.6% precision, 87% recall, and a 91.9% F-Score.
Keywords: Natural Language Processing; Sentiment Analysis; Jessica Wongso; Cyanide Coffee; Ice Cold;
Netflix; Naïve Bayes
INTRODUCTION
Information technology is developing rapidly, and the reach of the Internet has expanded even to remote areas.
In this millennial era, the public increasingly uses social media as a source of information on all aspects of
life, from social, cultural, economic, and criminal matters to the community's lifestyle. Social media plays a
vital role in how most people receive news, and the cyanide coffee case is one example.
Social media also plays a vital role in raising an issue, making it viral, and drawing the wider community's
attention to it, so that the relevant institutions resolve the issue more quickly.
The Jessica Wongso Cyanide Coffee case has returned to the spotlight since it was aired as a documentary on
Netflix titled Ice Cold. Since being found guilty of premeditated murder in the death of Wayan Mirna Salihin 7
years ago, Jessica Kumala Wongso has been a hot topic of conversation on social media. Many people's speculations
changed when the documentary aired.
In January 2016, the Indonesian public was shocked by the death of a woman named Wayan Mirna Salihin, who died
after drinking Vietnamese iced coffee at Olivier Cafe, Grand Indonesia Mall, Jakarta, with two friends, Hani and
Jessica. Mirna's death was allegedly caused by a corrosive substance found in the coffee she drank.
News of Mirna's death after drinking coffee containing cyanide went viral and became a trending topic throughout
Indonesia; the media paid close attention to the cyanide coffee case and broadcast news about it every day.
Seven years after the 20-year sentence was handed down to Jessica Kumala Wongso, Netflix turned the story of the
cyanide coffee case into a documentary called Ice Cold. After the documentary aired, social stigma arose and
people's views of the defendant, Jessica Kumala Wongso, changed. This change in view affects people's belief
that Jessica is Mirna's true killer.
* Corresponding author
This is a Creative Commons License. This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Journal of Computer Networks, Architecture and High Performance Computing
Volume 6, Number 1, January 2024
https://doi.org/10.47709/cnahpc.v6i1.3408
Submitted: Jan 6, 2024; Accepted: Jan 13, 2024; Published: Jan 16, 2024
Therefore, this research was conducted to find out and classify public sentiment in the Cyanide Coffee case using
the Natural Language Processing method. Sentiment analysis is a computational field that studies public opinion
on a topic by classifying text data containing opinions as positive, negative, or neutral. The goal is to find
out the opinion of a group of people or the public on a particular topic, product, service, or agency. The
information is extracted from text, that is, through text mining (Liu, 2012). There are currently many studies
related to sentiment analysis; one trending direction is sentiment analysis of opinions found on social media
(Suryani, Linawati, & Saputra, 2019).
This research uses the Natural Language Processing method following the research journal written by Nico
Munasatya and Sendi Novianto with the title "Natural Language Processing for Sentiment Analysis of President
Jokowi Using Multi-Layer Perceptron", which shows that sentiment analysis is commonly used for opinion mining,
that is, for giving a label (Positive, Negative, or Neutral) to the data/corpus. NLP (Natural Language
Processing) is used to process the data/corpus so that it can be understood by machines; in other words, it
serves as data preprocessing or text cleaning. The preprocessed text is then entered into a classification
model, a multi-layer perceptron, which produces predictions with an accuracy above 90% (Munasatya & Novianto,
2020). Sentiment analysis using the Naïve Bayes algorithm has also been carried out on news comments on Twitter;
that research classified tweets containing positive and negative comments and produced an accuracy rate of
55.80% (Pandhu & Diki, 2020).
METHOD
This research uses the Orange Data Mining application, version 3.36, for opinion mining. According to Turney,
opinion mining, or sentiment analysis, is the process of automatically understanding, extracting, and processing
textual data to obtain the sentiment information in an opinion sentence. Sentiment analysis is done to see
whether the opinion someone expresses on a problem or object tends to be negative or positive (Pisceldo,
Adriani, & Manurung, 2009). Applied to a problem or news topic, opinion mining shows whether a person's opinion
tends to be negative, positive, or neutral, so that the opinions collected can become helpful information. The
information contained in online news is unstructured digital text data (Pang, Lee, & Vaithyanathan, 2002). The
sentiment analysis workflow used in this study can be seen in Picture 1. Following (Lisangan, Gormantara, &
Carolus, 2022), after the dataset is collected it goes through data preprocessing, feature extraction, and
classification using Naive Bayes; the classifier is then evaluated with a confusion matrix, with attention to
the accuracy, precision, and recall values.
Research Scenario
This research was conducted by utilizing Orange Data Mining Tools. Data from TikTok social media is used as a
sample in this study, as shown in Picture 2.
Data Crawling
In this process, comments and test annotations are collected after Ice Cold aired. Public comments on the
Cyanide Coffee case on the TikTok platform, including sarcastic ones, are scraped using the TikTok Comments
Scraper tool with the search keyword "Jessica Wongso". The collected dataset consists of 3774 comments from
October 2023.
Text Preprocessing
Preprocessing changes the structure of a corpus, a collection of text, into tokens (words) through the
tokenization stage. The tokens are then cleaned: case folding converts the text to lowercase, and stopword
removal drops words that carry no value for the analysis, such as "which", "and", "in", and "on". Symbols,
emoticons, numbers and punctuation marks are also removed during the cleaning process. In the last stage, before
the corpus is entered into the classification model, the tokens are normalized using the Porter stemmer, which
reduces each word to a base form so that variants of the same word are not counted as different tokens.
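The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Orange workflow used in the study: the function name `preprocess` is hypothetical, the stopword set is a tiny sample of Indonesian stopwords chosen for the example, and stemming is left as a pluggable function (identity by default) at the point where a Porter stemmer would be applied.

```python
import re

# Hypothetical sample of Indonesian stopwords ("which", "and", "in", "on");
# the study itself relies on a full Indonesian stopword list.
STOPWORDS = {"yang", "dan", "di", "pada"}


def preprocess(comment, stopwords=STOPWORDS, stem=lambda token: token):
    """Turn one raw comment into a list of cleaned tokens."""
    text = comment.lower()                # case folding
    tokens = re.findall(r"[a-z]+", text)  # tokenization; drops digits, punctuation, emoticons
    tokens = [t for t in tokens if t not in stopwords]  # stopword removal
    return [stem(t) for t in tokens]      # normalization hook (stemmer goes here)


print(preprocess("Kasus kopi sianida yang viral di TikTok!!! 😱 2023"))
# ['kasus', 'kopi', 'sianida', 'viral', 'tiktok']
```

Keeping the stemmer as a parameter makes it easy to swap in a real stemming function without changing the rest of the pipeline.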
Topic Modeling
At this stage, the comments are grouped into topics, that is, categories or themes, to understand how sentiment
varies and is distributed.
Topic modelling is one of the techniques in Natural Language Processing (NLP) for analyzing text (Ting, Ip, &
Tsang, 2011). It is an algorithm that identifies hidden patterns in a set of words based on how the words are
distributed across a set of documents. The output of topic modelling is a set of topics, each consisting of a
cluster of words that appear together in the documents according to specific patterns (Aggarwal & Zhai, 2013).
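To make the idea of topics as clusters of co-occurring words concrete, the following sketch extracts a single topic from a toy corpus. It is an assumption-laden simplification: it builds a term-document count matrix and finds only the leading singular direction by power iteration, which corresponds to a rank-1 version of Latent Semantic Indexing. The function names and the example documents are hypothetical, and a real analysis would use a full truncated SVD with more topics.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return the sorted vocabulary and a terms x documents count matrix."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({token for doc in docs for token in doc})
    rows = [[c[term] for c in counts] for term in vocab]
    return vocab, rows


def leading_topic(rows, iterations=100):
    """Approximate the leading left singular vector by power iteration."""
    n_terms, n_docs = len(rows), len(rows[0])
    u = [1.0] * n_terms
    for _ in range(iterations):
        # v = A^T u (project term weights onto documents)
        v = [sum(rows[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]
        # u = A v (project document weights back onto terms)
        u = [sum(rows[i][j] * v[j] for j in range(n_docs)) for i in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def top_terms(vocab, weights, k):
    """Return the k terms with the largest topic weights."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized comments
docs = [["kopi", "sianida"], ["kopi", "sianida", "kasus"], ["netflix", "film"]]
vocab, rows = term_document_matrix(docs)
print(top_terms(vocab, leading_topic(rows), 2))
```

In this toy corpus the words that co-occur most often receive the largest weights in the leading topic, which is the pattern topic modelling is meant to surface.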
P(A|B) = (P(B|A) * P(A)) / P(B) (Jacobi, Atteveldt, & Welbers, 2015)
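As an illustration of how Bayes' rule drives a naïve Bayes classifier, the sketch below reads A as a sentiment class and B as the tokens observed in a comment. This reading, the function names, and the toy training data are assumptions made for the example; the study itself runs naïve Bayes inside Orange on topic features rather than raw word counts. The sketch uses word counts with add-one (Laplace) smoothing and omits P(B), since it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {token for tokens in docs for token in tokens}
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log P(A) + sum of log P(B_i | A)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior, log P(A)
        denominator = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # unseen words are ignored
                # log likelihood with add-one smoothing, log P(B_i | A)
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled comments, already tokenized
docs = [["bagus", "adil"], ["buruk", "janggal"], ["kasus", "kopi"]]
labels = ["positive", "negative", "neutral"]
model = train(docs, labels)
print(predict(model, ["janggal"]))  # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.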
Confusion Matrix
The confusion matrix is used to measure the classification results of the Naïve Bayes classifier. A confusion
matrix is a method for calculating accuracy in data mining; the evaluation produces accuracy, precision and
recall values (Pang, Lee, & Vaithyanathan, 2002).
Classification accuracy is the percentage of data records that are correctly classified when the classifier is tested.
Precision is the proportion of cases predicted positive that are actually positive.
Recall is the proportion of actual positive cases that are correctly predicted positive.
Equation (Zunic, Corcoran, & Spasic, 2020), where TP, TN, FP, and FN are the numbers of true positives, true
negatives, false positives, and false negatives:
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
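The formulas above translate directly into code. The sketch below is illustrative only: the function names and the confusion-matrix counts are hypothetical, and the F-Score is assumed to be the usual F1 measure, the harmonic mean of precision and recall, which the text reports but does not define.

```python
def f_score(precision, recall):
    """F1 measure: harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-Score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_score(precision, recall)


# Hypothetical counts, not taken from the study
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))

# Plugging in the reported precision (97.6%) and recall (87%)
print(round(f_score(0.976, 0.87), 3))
```

With the reported precision and recall, this definition gives a value of roughly 0.92, close to the reported F-Score of 91.9%; exact agreement is not expected if the tool averages each metric over the classes.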
The word cloud output shows ten tokens with the highest weight:
After the tokens are obtained, topic modelling using Latent Semantic Indexing is carried out to obtain 10 topics.
Using these 10 topics, sentiment analysis is performed with the multilingual sentiment method set to Indonesian
to obtain sentiment values (<0, 0, >0). The sentiment values are then normalized into categories, where <0 is the
negative category, 0 is the neutral category, and >0 is the positive category.
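The mapping from sentiment values to categories is a simple sign rule, sketched below. The function name and the example scores are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def sentiment_category(score):
    """Map a numeric sentiment value to its category by sign."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


# Hypothetical sentiment values for four comments
scores = [-0.5, 0, 0, 1.2]
print(Counter(sentiment_category(s) for s in scores))
```

Counting the categories in this way yields the kind of negative, neutral, and positive totals reported in the results.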
From the results of this sentiment analysis, there are 10 features and one target (10 topics and sentiment value
categories).
Testing the sentiment model with the naïve Bayes classifier algorithm gives an accuracy of 87%, precision of
97.6%, recall of 87% and F-Score of 91.9%.
The test results also show that classification using naïve Bayes yielded 22 negative comments, 4052 neutral
comments and 34 positive comments, as shown in Picture 4.
CONCLUSION
Based on the research that has been done, it is concluded that AUC (Area Under the ROC Curve) has the lowest value
among the evaluation results, which is attributed to the imbalance in the classification results. The naïve Bayes
classifier algorithm nevertheless produces good accuracy in the classification process.
REFERENCES
Aggarwal, C. C., & Zhai, C. (2013). Mining Text Data, 1–522. doi: 10.1007/978-1-4614-3223-4.
Br Ginting, S. L., & Trinanda, R. P. (2013). Teknik Data Mining Menggunakan Metode Bayes Classifier untuk
Optimalisasi Pencarian pada Aplikasi Perpustakaan (Studi Kasus: Perpustakaan Universitas Pasundan –
Bandung). Jurnal Teknologi dan Informasi. doi: 10.34010/jati.v3i2.794.
Jacobi, C., van Atteveldt, W., & Welbers, K. (2015). Quantitative analysis of large amounts of journalistic texts
using topic modelling. Digital Journalism, 89–106. https://doi.org/10.1080/21670811.2015.1093271.
Lisangan, E. A., Gormantara, A., & Carolus, R. Y. (2022). Implementasi Naive Bayes pada Analisis Sentimen Opini
Masyarakat di Twitter Terhadap Kondisi New Normal di Indonesia. KONSTELASI Konvergensi Teknologi
dan Sistem Informasi, 2(1).
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers.
Munasatya, N., & Novianto, S. (2020). Natural Language Processing untuk Sentimen Analisis Presiden Jokowi
Menggunakan Multi Layer Perceptron. Techno Com, 19(3), 237–244.
Pandhu, A., & Diki, W. (2020). Analisa sentimen dan Klasifikasi Komentar Positif Pada Twitter dengan Naïve Bayes
Classification. BRITech (Jurnal Ilmiah Ilmu Komputer, Sains dan Teknologi Terapan), 1(2), 32–40.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning
Techniques. Association for Computational Linguistics, 79–86.
Pisceldo, F., Adriani, M., & Manurung, R. (2009). Probabilistic Part Of Speech Tagging for Bahasa Indonesia. Third
International MALINDO Workshop, colocated Event ACL-IJCNLP.
Suryani, N. S., Linawati, & Saputra, K. O. (2019). Penggunaan Metode Naïve Bayes Classifier pada Analisis Sentimen
Facebook Berbahasa Indonesia. Majalah Ilmiah Teknologi Elektro, 22.
Ting, J. S., Ip, W. H., & Tsang, A. H. (2011). Is Naïve Bayes a Good Classifier for Document Classification?
International Journal of Software Engineering and Its Applications, 5(3).
Zunic, A., Corcoran, P., & Spasic, I. (2020). Sentiment Analysis in Health and Well-Being: Systematic Review. JMIR
Medical Informatics, 8(1), 1–22. doi: 10.2196/16023.