
Building of Informatics, Technology and Science (BITS)

Volume 5, No 1, June 2023 Page: 144−152


ISSN 2684-8910 (media cetak)
ISSN 2685-3310 (media online)
DOI 10.47065/bits.v5i1.3598

Sentiment Analysis of Practo App Reviews using KNN and Word2Vec


Muhammad Farhan*, Mahendra Dwifebri, Widi Astuti
Faculty of Informatics, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], [email protected],
[email protected]

Correspondence Author Email: [email protected]


Submitted: 08/06/2023; Accepted: 26/06/2023; Published: 29/06/2023
Abstract−Developments in technology and communication are used by the community to facilitate daily activities, including in
the field of health services. Although health services are generally adequate, some common obstacles remain, such as being unable
to leave the house or the limited schedules of doctor consultations. Health service applications such as Practo make it easier for
people to consult a doctor online, and this convenience has generated many reviews of the Practo healthcare application. Because
opinions on the internet are diverse, Practo app reviews vary widely; therefore, sentiment analysis of Practo app reviews is
necessary. In this study, the algorithm used was KNN, chosen because it is easy to implement and very effective when the amount
of data is large. The feature extraction used in this study is Word2Vec, chosen because it represents each word as a vector. The
best model was built when using stemming with a Word2Vec dimension of 300 and K = 3 in the KNN parameter, producing an
f1-score of 77.30%.
Keywords: Sentiment Analysis, Practo Application Review, KNN, Word2Vec

1. INTRODUCTION
Technology, communication, and information are growing rapidly along with the times. Technological advancements
are designed to make everyday life easier, both in doing work and getting information. Among the various sectors
affected by the 4.0 era, it seems that the healthcare sector has the most to gain from the merging of physical, digital,
and biological systems, although this field may be the least prepared to welcome it [1]. Digital health services have
grown rapidly over the past two years, and some digital services have been integrated with health institutions [2]. Of
course, going to a health institution to see a doctor can treat the illnesses we suffer, but this is difficult if we do not
have enough free time, if the health institution is far away, or especially under current conditions that force us not to
visit crowded places [3].
Sentiment analysis is a type of analysis that uses linguistic computing, text mining and natural language
processing with the aim of analyzing sentiment or ratings on a particular product or service [4]. In sentiment analysis,
opinions in a text are categorized as positive, negative, or neutral [5]. A positive category means that the comments
about the application are favorable, whereas a negative category means that the comments about the application are
unfavorable. There is also a neutral category, which means the application is rated neither particularly good nor bad.
Through sentiment analysis, people get help in choosing the right health app: by analyzing positive or negative
sentiments in reviews, people can gain insights into other users' experiences. This research also aims to understand
the extent to which sentiment analysis algorithms and
techniques are effective in processing health app review data.
Sentiment analysis can use many different algorithms. In this research, the feature extraction used is
Word2Vec, chosen because it represents each word as a vector; as a result, the polarity score of each word plays an
important role in the sentiment analysis results [6]. Research [6] by Ardhian Fahmi Sabani in 2022 likewise described
Word2Vec as a very good feature extraction method for this reason.
Several classifiers can be used for sentiment analysis; in this study, the classifier used is K-Nearest Neighbor
(KNN). This classifier is used because, according to research [7], the KNN method is considered a high-quality
approach in behavior analysis, especially sentiment analysis. Research [7], conducted by Imam Prayoga in 2023,
explained that the KNN method applied to sentiment analysis of Indonesian movie reviews achieved an f1-score of
86.98%, which means it gives accurate results for sentiment analysis.
Research [8] by Widi Widayat in 2021 discusses sentiment analysis of movie reviews using Word2Vec and
the Long Short-Term Memory (LSTM) deep learning method. That study went through several preprocessing stages,
including converting the data to lowercase, cleaning characters in reviews that carry no sentiment meaning, deleting
URLs, and tokenizing the dataset. The best accuracy, 88.17%, was obtained with a dimension size of 100, and the
lowest, 85.86%, with a dimension size of 500, which means the method is quite good at sentiment classification. Thus
the Long Short-Term Memory method with Word2Vec can be an option for research on sentiment analysis with large
amounts of data.
Research conducted by Dwi Intan Af'Idah in 2021 [9] investigates how Word2Vec parameters impact deep
learning performance in sentiment classification. The preprocessing consisted of case folding, filtering, tokenization,
and stopword removal. The study analyzed the influence of the Word2Vec architecture on model accuracy, with a
CBOW accuracy of 97.12% and a Skip-gram accuracy of 96.62%; it also examined the influence of the Word2Vec
evaluation method on model accuracy, with Hierarchical Softmax

Copyright © 2023 Muhammad Farhan, Page 144


This Journal is licensed under a Creative Commons Attribution 4.0 International License

achieving an accuracy of 97.16% and Negative Sampling 96.58%. In addition, the study reviewed the influence of
the Word2Vec dimension on model accuracy. The results show that dimension 100 performs better than dimensions
200 and 300: dimension 100 scored 97.10%, while dimension 200 scored 96.77% and dimension 300 scored 96.73%.
Research [10], conducted by Syarifuddin in 2020, used the Naïve Bayes, Decision Tree, and KNN algorithms
to discuss public opinion on the impact of the PSBB on Twitter. That study had several preprocessing stages, including
negation conversion, cleansing, tokenization, case folding, Indonesian stemming, and stopword removal. The
accuracy of the Decision Tree was 83.3%, of KNN 80.80%, and of Naïve Bayes 80.03%. The precision of the Decision
Tree was 81.06%, of KNN 82.72%, and of Naïve Bayes 87.54%. The Decision Tree classification obtained a recall
of 87.17%, the KNN classification 74.41%, and the Naïve Bayes classification the lowest recall at 62.71%.
Other research was conducted by Puji Astuti in 2022 [11], describing the application of the KNN algorithm to
sentiment analysis of Peduli Lindungi application reviews. That study used two preprocessing stages, namely
tokenization and stopword filtering, and used the RapidMiner software for data processing when calculating the
accuracy value. K-fold cross validation was used, accuracy was calculated with a confusion matrix, and the ROC
curve was used to measure the AUC value; cross validation finds the accuracy of each method by dividing the data
into test data and train data. The study used 200 reviews, divided into 100 negative and 100 positive reviews. The
accuracy obtained was 81.72% with an AUC of 0.856, while changing the K value to K = 20 gave an accuracy of
81.74% with an AUC of 0.861. More data is needed to increase the accuracy value.
Research [12], conducted by Abdul Rozaq in 2022, discusses sentiment analysis of the implementation of the
Merdeka Belajar Kampus Merdeka program using Decision Tree, KNN, and Naïve Bayes classifiers. The feature
extraction used in that study was Term Frequency and TF-IDF, and the preprocessing stages included case folding,
tokenization, and stopword removal. The 475 records collected were divided into test and train data at a ratio of 80:20,
where 20% of the data is test data and the other 80% is train data. The Naïve Bayes accuracy was 99.22%, the KNN
accuracy 96.90%, and the Decision Tree accuracy 37.21%.

2. RESEARCH METHODOLOGY
In this study, the system built was a sentiment analysis model for Practo application reviews using KNN and
Word2Vec. The system built in this study is shown in Figure 1:

Figure 1. System Design Flow


2.1 Dataset
This study uses a dataset of Practo application reviews obtained from the Kaggle website. The data collected
consists of 7,156 Practo app reviews, all in English. This dataset does not include all reviews (there are about 250
thousand reviews in the app store). The dataset has been labeled with the positive and negative classes, whose
distribution is shown in Figure 2. An example of the labeled dataset is shown in Table 1.
Table 1. Research Dataset
Label Sentence
Positive “Good app and available at all the times. Have made doctors available at some very critical times of emergency.”


Negative “The 'medical records' feature is not smooth. When I select the option to upload photos from gallery, it does not show the content of galary rather it goes back to main window after few seconds..”

Figure 2. Label Distribution


2.2 Preprocessing

Figure 3. Preprocessing Stage


The preprocessing process begins after all the data has been collected and prepared. Preprocessing is done to
clean and normalize the raw review text before feature extraction. The preprocessing stages carried out in this study
are depicted in Figure 3. This study divides preprocessing into five processes, namely Cleansing, Case Folding,
Tokenization, Stopword Removal, and Stemming. At the preprocessing stage, the data type in the review_type column
is also changed: the positive class becomes 1 and the negative class becomes 0.
2.2.1 Cleansing
Cleansing is the process of cleaning up attributes that are not needed in input data such as symbols and punctuation
marks [10]. Table 2 is the result of the cleansing stage.
Table 2. Cleansing Result
Text: “Good app and available at all the times. Have made doctors available at some very critical times of emergency.”
Cleansing Result: “Good app and available at all the times Have made doctors available at some very critical times of emergency”
2.2.2 Case Folding
During the case folding process, all characters in the data are converted into lowercase letters [13]. The results of case
folding can be seen in Table 3.
Table 3. Case Folding Result
Cleansing Result: “Good app and available at all the times Have made doctors available at some very critical times of emergency”
Case Folding Result: “good app and available at all the times have made doctors available at some very critical times of emergency”


2.2.3 Tokenization
Tokenization splits text on spaces, breaking a sentence into the individual words that compose it [14]. Table 4
is the result of the tokenization stage.
Table 4. Tokenization Result
Case Folding Result Tokenization Result
Case Folding Result: “good app and available at all the times have made doctors available at some very critical times of emergency”
Tokenization Result: “[good], [app], [and], [available], [at], [all], [the], [times], [have], [made], [doctors], [available], [at], [some], [very], [critical], [times], [of], [emergency]”
2.2.4 Stopword Removal
Stopword removal removes common words that carry little meaning and are unimportant for sentiment [15].
Table 5 is the result of the stopword removal stage.
Table 5. Stopword Removal Result
Tokenization Result Stopword Removal Result
Tokenization Result: “[good], [app], [and], [available], [at], [all], [the], [times], [have], [made], [doctors], [available], [at], [some], [very], [critical], [times], [of], [emergency]”
Stopword Removal Result: “[good], [app], [available], [times], [made], [doctors], [available], [critical], [times], [emergency]”
2.2.5 Stemming
Turning words into root words is known as stemming. This process removes word affixes, namely suffixes, prefixes,
and a combination of both [7]. Table 6 is the result of the stemming stage.
Table 6. Stemming Result
Stopword Removal Result Stemming Result
Stopword Removal Result: “[good], [app], [available], [times], [made], [doctors], [available], [critical], [times], [emergency]”
Stemming Result: “good app avail time made doctor avail critic time emerg”
After completing all preprocessing stages, the resulting clean sentence is shown in Table 7.
Table 7. Preprocessing Stages Result
Text: “Good app and available at all the times. Have made doctors available at some very critical times of emergency.”
Preprocessed Text: “good app avail time made doctor avail critic time emerg”
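The five preprocessing steps above can be sketched as a small pipeline. This is a minimal illustration, not the authors' exact implementation: the stopword list below is a tiny stand-in for a full stopword corpus, and the suffix stripper is a toy stand-in for a Porter-style stemmer (such as NLTK's PorterStemmer).

```python
import re

# Toy stand-ins (assumptions): a real pipeline would use a full stopword
# list and a proper stemmer instead of these minimal versions.
STOPWORDS = {"and", "at", "all", "the", "of", "some", "very", "have"}

def cleansing(text):
    # Remove symbols and punctuation, keeping letters, digits, and spaces
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)

def case_folding(text):
    # Convert all characters to lowercase
    return text.lower()

def tokenize(text):
    # Split on whitespace into individual words
    return text.split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    # Toy suffix stripper standing in for a Porter-style stemmer
    for suffix in ("ing", "ency", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = remove_stopwords(tokenize(case_folding(cleansing(text))))
    return [stem(t) for t in tokens]

print(preprocess("Good app and available at all the times."))
```

Applied to the Table 1 sentence fragment, this yields tokens analogous to the "Preprocessed Text" column, though the exact stems differ from a real Porter stemmer.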
2.3 Split Data
After preprocessing is completed, the next step is to split the dataset into train data and test data. In this study,
the data is divided into 80% train data and 20% test data. The results of the split are shown in Table 8:
Table 8. Split Data Result
Split Data Data Train Data Test
Total Data 5724 1432
Positive 2063 910
Negative 3661 522
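The 80:20 split can be reproduced with a simple shuffled partition. The paper does not state which splitting tool or random seed it used, so the shuffle and seed below are assumptions; the point is only that 7,156 reviews divide into 5,724 train and 1,432 test records, matching Table 8's totals.

```python
import random

def split_data(data, labels, test_ratio=0.2, seed=42):
    # Shuffle indices reproducibly, then carve off the last 20% as test data
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train_idx, test_idx = idx[:cut], idx[cut:]
    X_train = [data[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [data[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# 7,156 records split 80:20 gives 5,724 train and 1,432 test, as in Table 8
data = [f"review {i}" for i in range(7156)]
labels = [i % 2 for i in range(7156)]
X_train, X_test, y_train, y_test = split_data(data, labels)
print(len(X_train), len(X_test))
```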
2.4 Word2Vec
After the preprocessing and data-splitting stages are complete, the next step is to weight the words with Word2Vec.
Word2Vec is a deep-learning-based tool that represents words in a context as vectors with N dimensions [6]. This
research uses Word2Vec as feature extraction. There are two types of Word2Vec models: the first is the Continuous
Bag of Words (CBOW) and the second is Skip-Gram. CBOW predicts the middle word from its surrounding context
words, while Skip-Gram predicts the context words to the left and right of a given middle word [9]. In this research,
the model used is Skip-Gram, because Skip-Gram is an effective method for learning vector representations of words
from unstructured text [6]. The Skip-Gram model maximizes the objective in Equation (1):
(1/T) ∑_{t=1}^{T} ∑_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)    (1)
Description:


c = training context size
w_{t+j} = the context word at offset j from the center word
w_t = the center word
p(w_{t+j} | w_t) = the probability of the context word given the center word
Figure 4 shows the implementation of the above equations in the Skip-Gram model structure.

Figure 4. Skip-Gram Model [8]
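The core of Equation (1) is that every center word w_t is paired with each context word w_{t+j} inside the window; the training-pair generation can be sketched as below. In practice the model would be trained with a library such as gensim, e.g. `Word2Vec(sentences, vector_size=300, window=5, sg=1)` (sg=1 selects Skip-Gram) — the exact tooling is an assumption, since the paper does not state it.

```python
def skipgram_pairs(tokens, window=2):
    # For each center word w_t, emit (w_t, w_{t+j}) for every offset j with
    # -c <= j <= c and j != 0, mirroring the double sum in Equation (1)
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j == 0:
                continue  # skip the center word itself
            k = t + j
            if 0 <= k < len(tokens):  # stay inside the sentence
                pairs.append((center, tokens[k]))
    return pairs

print(skipgram_pairs(["good", "app", "available"], window=1))
```

To feed the 300-dimension word vectors into KNN, each review must be reduced to a single vector; averaging the review's word vectors is one common choice, though the paper does not state its aggregation method.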


2.5 K-Nearest Neighbor
After the feature extraction process using Word2Vec is done, the classification process is carried out. The
classifier used in this analysis is K-Nearest Neighbor. The K-Nearest Neighbor algorithm classifies new data based
on the distance of the new data to its closest neighbors [16]. The purpose of this algorithm is to use attributes and
training samples to classify new objects [17]. K-Nearest Neighbor classification is used because it is easy to
understand, performs well, and makes it easy to tune parameters to the needs of the research to achieve better results
[7]. K-Nearest Neighbor uses the Euclidean distance to calculate the distance from one test data point to all training
data [18]. The Euclidean distance formula is shown in Equation (2):

D(x, y) = √( ∑_{i=1}^{n} (x_i − y_i)² )    (2)

Description:
D : distance between the two points
x : training data
y : testing data
n : number of features
The K-Nearest Neighbor method proceeds as follows [17]:
a. Determine the value of K.
b. Calculate the distance from the new data to all training data, using the Euclidean distance.
c. Sort the distances from the closest.
d. Check the classes of the K nearest neighbors.
e. Assign the new data to the majority class among its K nearest neighbors.
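Steps a–e above amount to a majority vote over the K nearest training points under the Euclidean distance of Equation (2). A minimal sketch follows, using toy two-dimensional vectors for readability; in this study the actual inputs would be the 300-dimension Word2Vec document vectors.

```python
import math
from collections import Counter

def euclidean(x, y):
    # Equation (2): D(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(X_train, y_train, x_new, k=3):
    # Steps b-e: compute all distances, sort ascending, take the K nearest,
    # and return the majority class among those neighbors
    dists = sorted(zip((euclidean(x, x_new) for x in X_train), y_train))
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy training data: two points per class (labels 0 and 1)
X_train = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [0.2, 0.0], k=3))
```

With k=3, the query point's three nearest neighbors carry labels 0, 0, and 1, so the majority vote assigns class 0.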
2.6 Evaluation
In this study, the confusion matrix was used to conduct the evaluation process. The Confusion Matrix is used to find
the value of several points needed for this stage such as precision, accuracy, F1-Score and recall [19]. The Confusion
Matrix can be seen in Table 9:


Table 9. Confusion Matrix

Confusion Matrix            Factual Positive   Factual Negative
Predicted Value: Positive   TP                 FP
Predicted Value: Negative   FN                 TN
Description:
TP = positive predicted and positive factual (true positive)
FN = negative predicted and positive factual (false negative)
FP = positive predicted and negative factual (false positive)
TN = negative predicted and negative factual (true negative)
To measure the performance of the classification process, precision, recall and f1-score are calculated. The formula
for performance evaluation is as follows:
F1-Score is a weighted average that combines recall and precision to measure the performance of a classification
method [20]. The formula for the F1-Score is given in Equation (3):
F1-Score = (2 × Recall × Precision) / (Recall + Precision)    (3)
Precision is the ratio of the number of items correctly identified as positive to the total number of items identified
as positive [20]. The formula for Precision is given in Equation (4):
Precision = TP / (TP + FP)    (4)
Recall, or True Positive Rate (TPR), is the ratio of the number of relevant items correctly identified to all actually
positive items [20]. The formula for Recall is given in Equation (5):
Recall = TP / (TP + FN)    (5)
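Equations (3)–(5) can be computed directly from the four confusion-matrix counts. The TP/FP/FN/TN values below are illustrative only, not the study's actual counts:

```python
def evaluate(tp, fp, fn, tn):
    # Precision (Eq. 4), Recall (Eq. 5), F1-Score (Eq. 3), plus accuracy,
    # all derived from the confusion matrix of Table 9
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative counts only (not from this study)
p, r, f1, acc = evaluate(tp=70, fp=10, fn=20, tn=100)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
```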

3. RESULT AND DISCUSSION


In this study, 7156 records were collected and the preprocessing process was successfully carried out. The data was
then divided at a ratio of 80:20, giving 5724 training records and 1432 test records. After the data split, feature
extraction is performed using Word2Vec, in which each word is represented as a vector. After feature extraction,
classification is performed using the K-Nearest Neighbor method. A total of three test scenarios were conducted in
this research. The first scenario compares the preprocessing stage with and without stemming, to find out how the use
of stemming affects performance results. The second scenario compares dimension 100 and dimension 300 in
Word2Vec feature extraction, to find out how the choice of dimension impacts performance results. The third scenario
searches for the best K in the K-Nearest Neighbor method with a maximum limit of K ≤ 11, to find out whether the K
value affects the performance of the KNN model.
Table 10. Experiment Scenario
Scenario Experiment
1 Compare performance results at the preprocessing stage with and without stemming.
2 Compare Word2Vec feature extraction using dimension 100 and dimension 300.
3 Compare K values in the KNN algorithm up to K ≤ 11.
3.1 The Effect of Stemming
Tests are performed in the first scenario to find out whether the stemming process in preprocessing affects the built
model. In this scenario, the preprocessing test is performed twice: with stemming, as shown in Figure 5, and without
stemming, as shown in Figure 6.


Figure 5. Preprocessing with stemming

Figure 6. Preprocessing without stemming


Tests were conducted using Word2Vec with a dimension of 300 and the K-Nearest Neighbor method. Table
11 below shows the test results from scenario 1:
Table 11. Scenario 1 Result
Preprocessing Precision Recall F1-Score
With Stemming 83.71% 73.92% 75.63%
Without Stemming 83.65% 73.03% 74.68%
According to the test results shown in Table 11 above, tests performed using stemming in the preprocessing stage
produce better precision, recall, and f1-score values than tests carried out without stemming. Testing with stemming
produces a precision of 83.71%, recall of 73.92%, and f1-score of 75.63%, while testing without stemming produces
a precision of 83.65%, recall of 73.03%, and f1-score of 74.68%. Research [21] shows that stemming can improve
model performance when used in the preprocessing stage. Model performance can be disrupted by features that are
not very relevant to the KNN method, which shows that stemming is useful because it reduces the number of features
by cutting word affixes.
3.2 The Effect of Dimension on Word2Vec
In test scenario 2, tests were conducted to compare the impact of the Word2Vec dimension when used with the KNN
method. Testing is done by comparing different Word2Vec dimensions, using data that has gone through the stemming
process during preprocessing. The dimensions compared are 100 and 300. Table 12 below shows the test results from
scenario 2:
Table 12. Scenario 2 Result
Word2Vec Dimension Precision Recall F1-Score
100 82.66% 72.24% 73.78%
300 83.71% 73.92% 75.63%
According to the test results shown in Table 12 above, the test with 300 dimensions in Word2Vec shows better
performance and stability than the test with 100 dimensions. This is because a larger dimension provides


more space to show the relationship between words. This richer representation of words can help find more relevant
nearest neighbors when using the KNN method.
3.3 Find Best K in KNN
The third test scenario searches for the best K in the K-Nearest Neighbor method with a maximum limit of K ≤ 11. In
this scenario, the feature extraction used is Word2Vec with a dimension of 300, with stemming applied during
preprocessing. Figure 7 shows the statistics from scenario 3, and the test results can be seen in Table 13
below:

Figure 7. Statistical Diagram Scenario 3


Table 13. Scenario 3 Result
K Value F1-Score
1 75.84%
2 71.40%
3 77.30%
4 72.60%
5 75.63%
6 72.44%
7 75.20%
8 71.47%
9 74.19%
10 71.32%
11 73.28%
According to the test results shown in Figure 7 and Table 13 above, the highest f1-score is obtained at K = 3, with a
value of 77.30%, while the lowest f1-score is obtained at K = 10, with a value of 71.32%. The graph shows that a
smaller K value tends to produce a better f1-score than a larger one, as the f1-score tends to decrease as K grows. From
this study, it can be concluded that the K value can affect the performance of the K-Nearest Neighbor model.

4. CONCLUSION
Based on the results of this study, a system for sentiment analysis of Practo application reviews using the KNN and
Word2Vec methods can be built. The first test scenario compared performance with and without stemming in the
preprocessing process; the second compared Word2Vec feature extraction with dimension 100 and dimension 300;
and the third searched for the best K value for the K-Nearest Neighbor method with a maximum limit of K ≤ 11. From
the scenario tests, it can be concluded that the use of stemming during preprocessing affects performance: in the first
scenario, the model with stemming obtained better recall, precision, and f1-score values than the model without
stemming. From the second test scenario, it can be concluded that the Word2Vec dimension affects performance: with
dimension 300, the recall, precision, and f1-score values are better than with dimension 100. The third test scenario
proves that the K value affects the performance of the KNN model: K = 3 obtained the best result, with an f1-score of
77.30%, compared to the other K values up to K ≤ 11. Suggestions for further research are to replace stemming with
lemmatization in the preprocessing process, to combine other feature extraction methods to obtain varied performance
results, and to compare more K values to get more diverse results.


REFERENCES
[1] N. R. Wardani and A. Erfina, “Konsultasi Dokter Menggunakan Algoritma Naive,” SISMATIK (Seminar Nas. Sist. Inf. dan
Manaj. Inform., pp. 11–18, 2021.
[2] A. Hendra and F. Fitriyani, “Analisis Sentimen Review Halodoc Menggunakan Naïve Bayes Classifier,” JISKA (Jurnal
Inform. Sunan Kalijaga), vol. 6, no. 2, pp. 78–89, 2021, doi: 10.14421/jiska.2021.6.2.78-89.
[3] N. Nuris, E. R. Yulia, and K. Solecha, “Implementasi Particle Swarm Optimization (PSO) Pada Analysis Sentiment Review
Aplikasi Halodoc Menggunakan Algoritma Naïve Bayes,” J. Teknol. Inf., vol. 7, no. 1, pp. 17–23, 2021, doi:
10.52643/jti.v7i1.1330.
[4] R. N. CIKANIA, “Implementasi Algoritma Naïve Bayes Classifier Dan Support Vector Machine Pada Klasifikasi Sentimen
Review Layanan Telemedicine Halodoc,” Jambura J. Probab. Stat., vol. 2, no. 2, pp. 96–104, 2021, doi:
10.34312/jjps.v2i2.11364.
[5] A. Andreyestha and A. Subekti, “Analisa Sentiment Pada Ulasan Film Dengan Optimasi Ensemble Learning,” J. Inform.,
vol. 7, no. 1, pp. 15–23, 2020, doi: 10.31311/ji.v7i1.6171.
[6] A. Fahmi Sabani, Adiwijaya, and W. Astuti, “Analisis Sentimen Review Film pada Website Rotten Tomatoes Menggunakan
Metode SVM Dengan Mengimplementasikan Fitur Extraction Word2Vec,” e-Proceeding Eng., vol. 9, no. 3, p. 1800, 2022.
[7] I. Prayoga and M. D. P, “Sentiment Analysis on Indonesian Movie Review Using KNN Method With the Implementation of
Chi-Square Feature Selection,” vol. 7, pp. 369–375, 2023, doi: 10.30865/mib.v7i1.5522.
[8] W. Widayat, “Analisis Sentimen Movie Review menggunakan Word2Vec dan metode LSTM Deep Learning,” J. Media
Inform. Budidarma, vol. 5, no. 3, p. 1018, 2021, doi: 10.30865/mib.v5i3.3111.
[9] D. I. Af’idah, Dairoh, S. F. Handayani, and R. W. Pratiwi, “Pengaruh Parameter Word2Vec terhadap Performa Deep Learning
pada Klasifikasi Sentimen,” J. Inform. Jurunal Pengemb. IT, vol. 6, no. 3, pp. 156–161, 2021.
[10] M. Syarifuddin, “Analisis Sentimen Opini Publik Terhadap Efek Psbb Pada Twitter Dengan Algoritma Decision Tree, KNN,
Dan Naïve Bayes,” INTI Nusa Mandiri, vol. 15, no. 1, pp. 87–94, 2020, doi: 10.33480/inti.v15i1.1433.
[11] P. Astuti and N. Nuris, “Penerapan Algoritma KNN Pada Analisis Sentimen Review Aplikasi Peduli Lindungi,” Comput.
Sci., vol. 2, no. 2, pp. 137–142, 2022, doi: 10.31294/coscience.v2i2.1258.
[12] A. Rozaq, Y. Yunitasari, K. Sussolaikah, E. R. N. Sari, and R. I. Syahputra, “Analisis Sentimen Terhadap Implementasi
Program Merdeka Belajar Kampus Merdeka Menggunakan Naïve Bayes, K-Nearest Neighboars Dan Decision Tree,” J.
Media Inform. Budidarma, vol. 6, no. 2, p. 746, 2022, doi: 10.30865/mib.v6i2.3554.
[13] N. D. Kusumawati, S. Al Faraby, and M. Dwifebri, “Analisis Sentimen Komentar Beracun pada Media Sosial Menggunakan
Word2Vec dan Support Vectore Machine ( SVM ),” e-Proceeding Eng., vol. 8, no. 5, pp. 10038–10050, 2021.
[14] V. Kevin, S. Que, A. Iriani, and H. D. Purnomo, “Analisis Sentimen Transportasi Online Menggunakan Support Vector
Machine Berbasis Particle Swarm Optimization ( Online Transportation Sentiment Analysis Using Support Vector Machine
Based on Particle Swarm Optimization ),” vol. 9, no. 2, pp. 162–170, 2020.
[15] M. F. El Firdaus, N. Nurfaizah, and ..., “Analisis Sentimen Tokopedia Pada Ulasan di Google Playstore Menggunakan
Algoritma Naïve Bayes Classifier dan K-Nearest Neighbor,” JURIKOM (Jurnal …, vol. 9, no. 5, pp. 1329–1336, 2022, doi:
10.30865/jurikom.v9i5.4774.
[16] A. Baita, Y. Pristyanto, and N. Cahyono, “Analisis Sentimen Mengenai Vaksin Sinovac Menggunakan Algoritma Support
Vector Machine (SVM) dan K-Nearest Neighbor (KNN),” Inf. Syst. J., vol. 4, no. 2, pp. 42–46, 2021, [Online]. Available:
https://jurnal.amikom.ac.id/index.php/infos/article/view/687
[17] N. Faridhotun, E. Haerani, and R. M. Candra, “Analisis Sentimen Ulasan Aplikasi WeTV Untuk Peningkatan Layanan
Menggunakan Metode K-Nearst Neighbor,” vol. 4, no. 3, pp. 855–864, 2023, doi: 10.47065/josh.v4i3.3349.
[18] N. Octaviani Faomasi Daeli, “Sentiment Analysis on Movie Reviews Using Information Gain and K-Nearest Neighbor,”
Open Access J Data Sci Appl, vol. 3, no. 1, pp. 1–007, 2020, doi: 10.34818/JDSA.2020.3.22.
[19] M. A. A. Jihad, Adiwijaya, and W. Astuti, “Analisis sentimen terhadap ulasan film menggunakan algoritma random forest,”
e-Proceeding Eng., vol. 8, no. 5, pp. 10153–10165, 2021.
[20] I Wayan Budi Suryawan, Nengah Widya Utami, and Ketut Queena Fredlina, “Analisis Sentimen Review Wisatawan Pada
Objek Wisata Ubud Menggunakan Algoritma Support Vector Machine,” J. Inform. Teknol. dan Sains, vol. 5, no. 1, pp. 133–
140, 2023, doi: 10.51401/jinteks.v5i1.2242.
[21] A. W. Pradana and M. Hayaty, “The Effect of Stemming and Removal of Stopwords on the Accuracy of Sentiment Analysis
on Indonesian-language Texts,” Kinet. Game Technol. Inf. Syst. Comput. Network, Comput. Electron. Control, vol. 4, no. 3,
pp. 375–380, 2019, doi: 10.22219/kinetik.v4i4.912.
