Article
Deep Learning Approach for Predicting the Therapeutic Usages
of Unani Formulas towards Finding Essential Compounds
Sony Hartono Wijaya 1, * , Ahmad Kamal Nasution 2 , Irmanida Batubara 3 , Pei Gao 2 , Ming Huang 2 ,
Naoaki Ono 2 , Shigehiko Kanaya 2 and Md. Altaf-Ul-Amin 2, *
1 Department of Computer Science, Faculty of Mathematics and Natural Sciences, IPB University,
Bogor 16680, Indonesia
2 Computational Systems Biology Lab, Graduate School of Science and Technology,
Nara Institute of Science and Technology, Nara 630-0101, Japan
3 Department of Chemistry, Faculty of Mathematics and Natural Sciences, IPB University,
Bogor 16680, Indonesia
* Correspondence: [email protected] (S.H.W.); [email protected] (M.A.-U.-A.)
Abstract: The use of herbal medicines in recent decades has increased because their side effects are
considered lower than those of conventional medicine. Unani herbal medicines are often used in Southern
Asia. These herbal medicines are usually composed of several types of medicinal plants to treat
various diseases. Research on herbal medicine usually focuses on insight into the composition of
plants used as ingredients. However, in the present study, we extended to the level of metabolites that
exist in the medicinal plants. This study aimed to develop a predictive model of the Unani therapeutic
usage based on its constituent metabolites using deep learning and data-intensive science approaches.
Furthermore, the best prediction model was then utilized to extract important metabolites for each
therapeutic usage of Unani. In this study, it was observed that the deep neural network approach
provided a much better prediction model than other algorithms including random forest and support
vector machine. Moreover, according to the best prediction model using the deep neural network, we
identified 118 important metabolites for nine therapeutic usages of Unani.
Keywords: Unani; herbal medicine; metabolomics; deep learning; prediction

Citation: Wijaya, S.H.; Nasution, A.K.; Batubara, I.; Gao, P.; Huang, M.; Ono, N.; Kanaya, S.;
Altaf-Ul-Amin, M. Deep Learning Approach for Predicting the Therapeutic Usages of Unani Formulas
towards Finding Essential Compounds. Life 2023, 13, 439. https://doi.org/10.3390/life13020439
Academic Editor: Tao Huang
Received: 19 December 2022
Revised: 20 January 2023
Accepted: 1 February 2023
Published: 3 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Herbal medicines are plant-based medicines made from different combinations of
medicinal plant parts, e.g., leaves, flowers, or roots. Each part can have different medicinal
uses, and many types of chemical constituents require different extraction methods.
Both fresh and dried plants are used, depending on the herb
(https://www.nimh.org.uk/whats-herbal-medicine, accessed on 4 June 2021). Herbal medicines have
become popular drugs in the last three decades, and no less than 80% of people worldwide depend on
herbal medicines. People tend to choose herbal medicines mainly because they provide better
efficacy and relatively fewer side effects than conventional drugs [1]. The use of herbal
medicines throughout the world reached USD 60 billion in 2010 and is expected to reach
USD 5 trillion by 2050 [2,3]. This information shows that the use of herbal medicines is
prevalent throughout the world. Some examples of herbal medicine systems around the world are
Traditional Chinese Medicine (TCM) from China; Kampo from Japan; Jamu from Indonesia; and
Ayurvedic, Siddha, or Unani from Southern Asia.
Unani Tibb, commonly known as Unani medicine, is practiced widely in South and
Central Asia. The Arabic term “Tibb” means “medicine,” while the name “Unani” is assumed
to have its roots in the Greek word “Ionan” [4]. Later on, it was also influenced by
Indian and Chinese traditional systems. The Unani herbal medicines mostly utilize medicinal
plants as their ingredients, and this system follows ancient concepts and principles
of drug management. Little research has been carried out on building a scientific foundation
for Unani, yet such a foundation is needed to explain why a Unani formula is useful for a
particular disease. Unani medicines are made by the extraction of medicinal plants that are
used as drugs against various diseases [3]. According to [5], the Unani System of Medicine
was invented in Greece and refined by Arabs into a sophisticated medical discipline using
the framework of the teachings of Hippocrates and Jalinoos (Galen). Unani medicine has since
been referred to as Greco-Arab medicine. The Hippocratic notion of the four humors comprises
blood, phlegm, yellow bile, and black bile. According to this approach, these humors govern
the health and composition of the body and its pathological states. The Unani System of
Medicine (USM) has been acknowledged by the World Health Organization (WHO) as an
alternative system to meet the health care demands of the human population. The practice of
alternative medicine is widespread.
One approach in building Unani’s scientific foundation is supervised learning by
utilizing data science. Supervised learning is the machine learning task of learning a
function that maps an input to an output based on examples of input–output pairs [6]. It
infers a function from labeled training data consisting of a set of training examples [7]. A
supervised learning algorithm utilizes the training data and produces an inferred function,
which can be used for mapping new examples. An optimal scenario will allow the algorithm
to determine the class labels for unseen instances correctly.
Deep learning is commonly applied as a supervised learning process, although it can
also be used for unsupervised and semi-supervised learning. In this work, we utilize
deep learning and other machine learning algorithms to find the relationship between the
therapeutic usage of Unani and its constituent metabolites (compounds). Deep learning
allows computers to model complicated data concepts and is considered effective for
complex data because its learning principles emulate the work of human neurons. This study
uses one deep learning architecture: the deep neural network (DNN). The DNN is an artificial
neural network that has more than one hidden layer [8]. The DNN is considered capable of
solving complex problems because its layered architecture makes it possible to learn from
data at multiple levels of abstraction. According to [9], this method is beneficial in
visual object recognition, object detection, drug discovery, and genomics.
In this study, we utilized supervised learning to predict the therapeutic usage of Unani
formulas from their metabolites using the deep neural network. We also compared the
prediction performance of the DNN with other machine learning methods. Moreover, we
determined significant metabolites for each target disease/therapeutic usage of the Unani
formulas according to the best prediction model, and validated the results against journal
references and by calculating the structural similarity with relevant metabolites [10].
Hence, these results can be used as a reference for other research and as basic knowledge
for drug discovery.
2. Materials and Methods

The methods adopted in the present work are illustrated in the flowchart in Figure 1.
The major steps were (1) data acquisition and preprocessing, (2) model development and
comparison, and (3) the prediction of effective metabolites.
In the preliminary step, we collected the metabolite information of the medicinal plants
used as ingredients of the Unani formulas. The Unani data we utilized in this work are
the same as the data utilized in previous work [3]. These data were collected from
the following book: BANGLADESH: National Formulary of Unani Medicine
(ISBN 978-984-33-3253-0). The initial data for this study included Unani formulas, medicinal
plants, and therapeutic usage information. The dataset consisted of 609 Unani formulas and
369 medicinal plants, and the formulas were grouped into 18 efficacy groups (Figure 2a). The
efficacy classes were as follows: (1) Blood and Lymph Diseases, (2) Cancers, (3) The Digestive
System, (4) Ear, Nose, and Throat, (5) Diseases of the Eye, (6) Female-Specific Diseases,
(7) Glands and Hormones, (8) The Heart and Blood Vessels, (9) Diseases of the Immune System,
(10) Male-Specific Diseases, (11) Muscle and Bone, (12) Neonatal Diseases, (13) The Nervous
System, (14) Nutritional and Metabolic Diseases, (15) Respiratory Diseases, (16) Skin and
Connective Tissue, (17) The Urinary System, and (18) Mental and Behavioral Disorders.
Figure 1. Schematic diagram of the prediction of Unani efficacy and identification of metabolites
for each efficacy group.
Figure 2. Representation of the dataset including therapeutic usage for each Unani formula.
(a) Unani–plant relations; (b) Unani–metabolite relations.
The initial Unani formulas consisted of plants as ingredients. Unani compounds were
collected according to the corresponding plants by using the following databases:
KNApSAcK Family Databases (http://www.knapsackfamily.com/KNApSAcK_Family,
accessed on 25 June 2021), IJAH Analytics (http://ijah.apps.cs.ipb.ac.id, accessed on
3 July 2021), KEGG (https://www.genome.jp/kegg/, accessed on 10 July 2021), and
ChEBI (https://www.ebi.ac.uk/chebi/, accessed on 11 September 2021). The distribution
of metabolites collected for each medicinal plant is shown in Figure 3. The number of
compounds belonging to a medicinal plant varies a lot: some plants are associated with a
few metabolites, whereas some are associated with many.
Figure 3. The distribution of compounds for selected medicinal plants from the top of the list.
The KNApSAcK database (DB) contains information on 101,500 species–metabolite
relationships, encompassing 20,741 species and 50,048 metabolites. This database
also contains information on accurate mass, molecular formula, metabolite name, and
mass spectra in several ionization modes [10]. IJAH Analytics is an open-access database
specifically for Jamu data. This database provides the plant–metabolite relations, and we
assume some metabolites might be common between Jamu and Unani because both are
classified as traditional medicine. The Kyoto Encyclopedia of Genes and Genomes (KEGG)
is also an open-access database containing cell, organism, and molecular information with
specific large-scale molecular datasets. The Chemical Entities of Biological Interest
(ChEBI) database contains molecular entities focusing on small chemical compounds.
The minimum and maximum number of compounds associated with a formula corre-
sponding to 18 disease classes are shown in Table 1. Finally, we represented the collected
data as a two-dimensional table, in which the rows represent the Unani formulas and
columns represent metabolites. Figure 2b illustrates the data representation of herbal
medicine–metabolite relations. The number of metabolites associated with 369 medicinal
plants is 4688. Therefore, the dimension of the matrix indicating relations between Unani
formulas and metabolites is 609 × 4688.
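The construction of such a formula-by-metabolite table can be sketched as follows. This is a minimal illustration, assuming a binary (presence/absence) encoding, which the text does not state explicitly; the function name and the toy formulas, plants, and metabolites are hypothetical and are not taken from the Unani dataset.

```python
def build_formula_metabolite_matrix(formula_plants, plant_metabolites):
    """Build a binary formula-by-metabolite matrix.

    formula_plants:    dict mapping formula name -> iterable of plant names
    plant_metabolites: dict mapping plant name -> iterable of metabolite names
    Returns (formulas, metabolites, matrix); matrix[i][j] is 1 when formula i
    contains at least one plant associated with metabolite j, else 0.
    """
    formulas = sorted(formula_plants)
    # Columns: every metabolite reachable from any plant used in any formula.
    metabolites = sorted({
        m
        for plants in formula_plants.values()
        for p in plants
        for m in plant_metabolites.get(p, ())
    })
    column = {m: j for j, m in enumerate(metabolites)}
    matrix = []
    for f in formulas:
        row = [0] * len(metabolites)
        for p in formula_plants[f]:
            for m in plant_metabolites.get(p, ()):
                row[column[m]] = 1
        matrix.append(row)
    return formulas, metabolites, matrix


# Hypothetical toy data (assumption: presence/absence encoding).
toy_formula_plants = {"formula_A": ["plant_1", "plant_2"], "formula_B": ["plant_2"]}
toy_plant_metabolites = {"plant_1": ["m1", "m2"], "plant_2": ["m2", "m3"]}
formulas, metabolites, matrix = build_formula_metabolite_matrix(
    toy_formula_plants, toy_plant_metabolites
)
```

With the real data, the same procedure would yield 609 rows and 4688 columns, matching the dimensions reported above.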
Table 1. The minimum, maximum, and mean number of metabolites for each disease class.
                                          Number of        Number of Metabolites
ID  Therapeutic Usage                     Unani Formulas   Minimum  Maximum  Mean
3   The Digestive System                  103              3        518      87.63
16  Skin and Connective Tissue            40               16       240      75.97
15  Respiratory Diseases                  32               15       379      107.9
17  The Urinary System                    31               17       366      110.5
10  Male-Specific Diseases                26               10       545      148
6   Female-Specific Diseases              22               6        309      122
13  The Nervous System                    22               33       265      104.5
8   The Heart and Blood Vessels           19               8        355      84.4
11  Muscle and Bone                       19               11       392      102
18  Mental and Behavioral Disorders       16               25       350      121.68
4   Ear, Nose, and Throat                 15               4        161      34.35
1   Blood and Lymph Diseases              13               1        308      64.7
5   Diseases of the Eye                   10               5        287      87
14  Nutritional and Metabolic Diseases    6                14       148      79.83
2   Cancers                               3                2        19       7.66
7   Glands and Hormones                   3                33       117      65
9   Diseases of the Immune System         1                64       64       64
2.1. Data Preprocessing

We initially eliminated Unani formulas with missing values and Unani formulas with
multiple therapeutic usages because we focused only on determining compounds for a
specific efficacy. One way to overcome the problems of imbalanced data, multiple
classification, and inconsistent data is to apply filtering methods. We used a single
filtering method in this research. The filtering approach creates a model using the
entire dataset as training data, then predicts the class of every sample and eliminates
the misclassified samples. According to [11], random forest and other classifiers can be
used to remove inconsistent data and increase the performance of the classification model.
We used two machine learning methods to filter the dataset. The first dataset was created
using random forest as a filter, whereas the second dataset was created using deep learning.
Two types of filtering were applied so that the results could be compared and the
better option used for the final prediction.
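The single-filtering idea (train on the whole dataset, predict the same samples, drop the misclassified ones) can be sketched as below. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for the random forest and deep neural network filters used in the study, and all function names and the toy data are hypothetical.

```python
def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest (squared Euclidean)."""
    def sqdist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    # sorted() makes tie-breaking deterministic.
    return min(sorted(centroids), key=sqdist)


def filter_misclassified(X, y):
    """Train on all samples, predict them again, keep only correct ones.

    Returns (indices of kept samples, training-set accuracy).
    """
    centroids = fit_centroids(X, y)
    keep = [i for i, (row, label) in enumerate(zip(X, y))
            if predict(centroids, row) == label]
    return keep, len(keep) / len(y)


# Hypothetical toy data: the last sample carries a label that is
# inconsistent with its features and is removed by the filter.
X_toy = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
y_toy = ["A", "A", "B", "B", "B"]
kept, accuracy = filter_misclassified(X_toy, y_toy)
```

Any classifier with a fit/predict interface could replace the stand-in; the filter itself only depends on comparing predicted and given labels.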
2.2. Model Generation and Comparison

We generated a prediction model by utilizing the deep learning method. Deep learning
is a form of machine learning that allows computers to learn from experience and to
represent what they learn in terms of concepts. Techniques and algorithms in deep learning
can be used for supervised learning, unsupervised learning, and semi-supervised learning
in various applications. The architecture used in this study was the deep neural network [8].
Deep learning allows a computational model consisting of several layers of processing
to learn representations of data at various levels of abstraction. These representations
are obtained by composing simple non-linear modules. For classification, higher layers of
representation amplify the aspects of the input that are important for discrimination and
suppress irrelevant variations. The deep learning method can be used to find complex
structures in high-dimensional data [9]. In this study, the network used consisted of more
than one hidden layer. Figure 4 shows the input layer, hidden layer, and output layer
components in deep learning.
Figure 4. The architecture of deep learning [7].
Initially, we tuned the DNN to obtain the optimal parameter values. The DNN is
an advanced artificial neural network that has more than one hidden layer between the
input and output layers. Each hidden layer has an activation function such as a sigmoid,
rectified linear unit (ReLU), or hyperbolic tangent (tanh) function to map the input from
the previous layer to the output that will be sent to the layer afterward.
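As a small illustration of the three activation functions named above and of how a hidden layer maps its input to its output, consider the following sketch. The `dense_layer` helper and the example weights are hypothetical and are not the network used in this study.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def tanh(x):
    """Hyperbolic tangent: maps any real number into (-1, 1)."""
    return math.tanh(x)


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical example: two inputs feeding a hidden layer with two units.
hidden = dense_layer(
    inputs=[1.0, -2.0],
    weights=[[1.0, 0.5], [-1.0, 0.0]],
    biases=[0.5, 0.0],
    activation=relu,
)
```

Stacking several such layers, each feeding its output to the next, gives the multi-layer structure described in the text.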
The DNN can be discriminatively trained with backpropagation, using the derivatives of a
cost function that measures the difference between the target output and the actual output.
For large training data, backpropagation is performed on a small portion of the data taken
at random so that it is more efficient than considering all data together.
A DNN with a large number of hidden layers is challenging to optimize. Gradient descent
from a randomly generated starting point may not produce a good set of weights unless the
weight scales are carefully initialized. Therefore, the initialization of weights becomes
essential to improve the DNN modeling performance. We also compared the performance of the
DNN with other supervised learning methods, such as random forest [12] and support
vector machine [13].
2.3. Extracting Important Metabolites

According to the best prediction model, we extracted important metabolites from
each class by considering the weight of variable importance in the DNN. We selected the
top-15 important metabolites for each disease class and examined their weights. Among
the top-15 selected metabolites, we discarded the metabolites whose weights were less than
the threshold.
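The selection rule described above can be sketched as follows: rank the metabolites by importance weight, keep the top 15, and discard those below the threshold. The default threshold of 0.01 is the value reported in Section 3.3; the function name and the toy weights are hypothetical.

```python
def select_important(importances, top_k=15, threshold=0.01):
    """Return the top_k (name, weight) pairs whose weight >= threshold.

    importances: dict mapping metabolite name -> importance weight.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, weight) for name, weight in ranked[:top_k]
            if weight >= threshold]


# Hypothetical toy weights for one disease class.
toy_importances = {"m1": 0.30, "m2": 0.005, "m3": 0.12, "m4": 0.02}
selected = select_important(toy_importances)
```

Because the threshold is applied after the top-15 cut, a class can end up with fewer than 15 selected metabolites.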
3. Results
3.1. Filtering Dataset
First, we removed 33 Unani formulas for fever because this symptom can be found in
many disease classes. Then, we eliminated 195 Unani formulas that have more than one
therapeutic usage, and also eliminated unrelated metabolites after the reduction of Unani
formulas. We applied single filtering using random forest and the deep neural network,
separately. The filtering process was conducted by using the entire dataset as both training
data and testing data, and misclassified formulas were deleted. Therefore, we obtained two
datasets from two different types of filtering, namely dataset 1 as the dataset after filtering
using random forest, and dataset 2 as the dataset after filtering using the deep neural
network. The dimensions of the data after filtering can be seen in Table 2.
Table 2. Summary of filtering dataset.
Type of Dataset                  Accuracy (%)  Data Dimension  Number of Efficacy Classes
Dataset before filtering         -             [609 × 4688]    17
Dataset filtering random forest  80.83         [307 × 4688]    16
Dataset filtering deep learning  70.00         [268 × 4688]    13
Next, we examined the distribution of formulas across the efficacy classes after filtering.
Each class in both datasets should have enough Unani samples to generate good
prediction models. Therefore, we eliminated efficacy classes 1, 2, 4, 5, 7, 9, 14, and 18
because too few Unani formulas remained in them; the counts (dataset 1, dataset 2) were,
respectively, (8, 4), (1, 0), (10, 5), (7, 1), (3, 3), (0, 0), (3, 0), and (13, 0). After this
removal, the distribution of the Unani formulas in dataset 1 and dataset 2 is shown in Figure 5.
Figure 5. Comparison of Unani data for each therapeutic usage after filtering.
3.2. Performance of Prediction
The datasets obtained from the previous process were used to develop a model for
the prediction of therapeutic usages of Unani using machine learning approaches (Table 2).
We adopted several methods, namely deep neural networks (DNN), random forest (RF),
support vector machine (SVM), XGBoost, and K-nearest neighbors (KNN). The deep neural
network was chosen as a recommended classifier because this method is robust for imbalanced
and multi-class problem data. The DNN model that was built for this study was completed
according to the method proposed by [14]. This method is considered to be able to model
complex data.
Tuning parameters are important factors for forming a prediction model. In terms of
the deep neural network, several parameters affected the accuracy value of the DNN
model, such as the activation function, the dropout value, the number of k in the validation
process (k-fold cross-validation), the number of hidden layers, and the number of epochs.
Each parameter was tuned by considering a range of values as follows: activation functions
(“relu”, “tanh”, “sigmoid”) [15], the dropout value (0.15, 0.25, 0.40, 0.50), the value of k
concerning cross-validation (4, 5, 6, 7, 8, 9, 10), the number of hidden layers (4, 6, 8, 12), and
the number of epochs (30, 50, 100, 500). Then, the best DNN parameters were determined
using a grid search for both datasets. The optimal parameters for both datasets were the
same as follows: activation function = “relu”, dropout value = 0.40, k value = 5, number of
hidden layers = 4, and number of epochs = 30. The prediction results for each fold using
the DNN with the best parameters can be seen in Figure 6.
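The tuning ranges listed above define a grid of 3 × 4 × 7 × 4 × 4 = 1344 candidate configurations. The sketch below enumerates that grid with the standard library. The dictionary keys, `iter_grid`, and `grid_search` are hypothetical names, and the scoring function (which in practice would train and cross-validate a DNN) is left to the caller.

```python
from itertools import product

# Value ranges as listed in the text; the key names are illustrative.
dnn_grid = {
    "activation": ["relu", "tanh", "sigmoid"],
    "dropout": [0.15, 0.25, 0.40, 0.50],
    "k_folds": [4, 5, 6, 7, 8, 9, 10],
    "hidden_layers": [4, 6, 8, 12],
    "epochs": [30, 50, 100, 500],
}


def iter_grid(grid):
    """Yield every combination of parameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, score_fn):
    """Return the configuration with the highest score.

    score_fn is supplied by the caller; here it is an assumption standing in
    for 'train the model with these parameters and return its accuracy'.
    """
    return max(iter_grid(grid), key=score_fn)


n_configurations = sum(1 for _ in iter_grid(dnn_grid))

# The optimum reported in the text, expressed with the same keys.
reported_best = {"activation": "relu", "dropout": 0.40, "k_folds": 5,
                 "hidden_layers": 4, "epochs": 30}
```

An exhaustive search over this grid trains one model per configuration and per fold, which is why the size of the grid matters in practice.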
In the random forest, there are several parameters that should be tuned when making
the RF model, such as n_estimators as the number of trees formed by RF, max_features,
max_depth, min_samples_split, min_samples_leaf, and bootstrap. For each parameter
used in the tuning processes, we utilized the range of values as follows: {‘n_estimators’:
(200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000); ‘max_features’: (‘auto’, ‘sqrt’);
‘max_depth’: (10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None); ‘min_samples_split’: (2,5,10);
‘min_samples_leaf’: (1, 2, 4); and ‘bootstrap’: (True, False)}. The results obtained after a grid
search for the RF model for dataset 1 and 2 were as follows: dataset 1 (n_estimators = 1000,
min_samples_split = 10, min_samples_leaf = 2, max_features = ‘sqrt’, max_depth = 10), and
dataset 2 (n_estimators = 400, min_samples_split = 10, min_samples_leaf = 4,
max_features = ‘sqrt’, max_depth = 90, bootstrap = True). After obtaining the best
parameters, the prediction model was evaluated using 5-fold cross-validation. The
prediction results for each fold using RF with the best parameters can be seen in Figure 6.
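For readers unfamiliar with k-fold cross-validation, the following sketch shows how samples can be divided into five folds and how per-fold accuracies are averaged. It assumes contiguous, unshuffled folds for simplicity; whether the study shuffled or stratified the folds is not stated, and the function names and accuracy values are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Fold sizes differ by at most one; every sample is tested exactly once.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def mean_accuracy(fold_accuracies):
    """Average the accuracy obtained on each held-out fold."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Hypothetical example: 12 samples split into 5 folds.
folds = k_fold_indices(12, k=5)
test_sizes = [len(test) for _, test in folds]
```

Each model is trained on the `train` indices of a fold and scored on its `test` indices, and the reported figure is typically the mean over the folds.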
In the SVM, the parameters that need to be tuned to form the best SVM prediction model
are the type of kernel, the gamma value, and C. The SVM parameters were tuned using a
grid search according to this configuration: {‘kernel’: (“rbf”, “linear”); ‘gamma’: (0.001,
0.0001, “auto”); and ‘C’: (1, 10, 100, 1000)}. The best parameters for both datasets were as
follows: dataset 1 (kernel: “linear”, C: 1, and gamma: 0.001) and dataset 2 (kernel: “rbf”, C:
1, and gamma: “auto”). Then, the prediction accuracies obtained using those parameters
and 5-fold cross-validation are shown in Figure 6. Similarly, we performed parameter
tuning for XGBoost and K-nearest neighbors (KNN) algorithms, and the results are shown
in Figure 6.
Figure 6. Comparison of prediction accuracy of deep neural network, random forest, support vector
machine, XGBoost, and K-nearest neighbors algorithms using both datasets.
The comparison of the classifier performances is shown in Figure 7. For the random
forest, support vector machine, and XGBoost, the averages of prediction accuracy were
below 40%, and for the KNN it was around 60% but still much less than the deep learning
method. In this study, the DNN achieved 87.4% accuracy. The results imply that the
prediction models based on RF and SVM are not able to make a good efficacy prediction
using Unani’s compounds as features.
[Bar chart. Vertical axis: Accuracy (0 to 100). Horizontal axis: Deep Learning, Support Vector
Machine, Random Forest, XGBoost, K Nearest Neighbors. Series: Dataset type 1, Dataset type 2.]
Figure 7. Comparison of prediction accuracy between deep neural network, random forest, and
support vector machine using both datasets.
One of the reasons that influenced the results was the imbalanced number of Unani
formulas belonging to different efficacy classes. It is noteworthy that the prediction
model based on the DNN increased the accuracy by about 50 percentage points
compared to RF and SVM.
3.3. Identification of Important Metabolites

After obtaining the best prediction model, we extracted essential features, in this
case metabolites, for each therapeutic usage. The potential compounds for each disease
class were obtained based on variable importance from the best deep neural network
model using the KerasRegressor and PermutationImportance packages. First, we selected
the top-15 compounds and then discarded the compounds with the weight of variable
importance lower than the threshold. In this study, we set the threshold equal to 0.01. In
total, we selected 118 unique compounds for 9 efficacy groups. The statistics of the selected
compounds can be seen in Table 3, and the details of the selected compounds for each
disease class are available in Supplementary Table S1.
Table 3. Statistics of the selected compounds from each disease class.
              Weight of Variable Importance
No  ID Class  Min     Max     Mean    Num. of Selected Compounds
1   3         0.1470  0.5150  0.2437  15
2   6         0.0110  0.3400  0.1212  13
3   8         0.0119  0.5910  0.1829  7
4   10        0.1420  0.5350  0.2961  15
5   11        0.0175  0.1870  0.0782  15
6   13        0.0314  0.2240  0.0951  8
7   15        0.0208  0.4300  0.1209  15
8   16        0.1010  0.4880  0.2172  15
9   17        0.1110  0.3450  0.1993  15
3.4. Validation of Important Metabolites

We used three approaches to validate the metabolites for each therapeutic group:
(1) searching supporting journals/articles; (2) searching for the same
metabolites in other traditional medicine systems, in this case Jamu and TCM; and (3) searching
the PubChem database for metabolites with similar structures (using the Simpson
similarity). Equation (1) shows the formula for calculating the Simpson similarity between
two compounds.
$$S = \frac{a}{\min\{(a + b),\ (a + c)\}} \qquad (1)$$
where a is the number of features common to the two compounds, b is the number of
features present only in the first compound, and c is the number of features present only in the
second compound. A list of validated metabolites/compounds for different disease classes is
shown in Table 4.
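Equation (1) can be checked with a short sketch that treats each compound as a set of structural features. The function name and the integer feature identifiers are hypothetical; how features are actually derived from PubChem structures is not specified here.

```python
def simpson_similarity(features_x, features_y):
    """Simpson similarity S = a / min(a + b, a + c) for two feature collections."""
    x, y = set(features_x), set(features_y)
    a = len(x & y)   # features common to both compounds
    b = len(x - y)   # features present only in the first compound
    c = len(y - x)   # features present only in the second compound
    smaller = min(a + b, a + c)
    # Assumption: similarity is defined as 0 when either compound has no features.
    return a / smaller if smaller else 0.0

# Hypothetical feature identifiers for two compounds.
compound_x = {1, 2, 3}
compound_y = {2, 3, 4, 5}
print(simpson_similarity(compound_x, compound_y))
```

Because the denominator is the size of the smaller feature set, a compound whose features are entirely contained in those of another scores 1 regardless of how much larger the other compound is.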
Table 4. List of validated metabolites.
No | Feature | Metabolites | Weight | Validation

Class 3: The Digestive System
1 | 3450 | 6H-dibenzo[b,d]pyran-6-one | 0.2980 | Enterophytoestrogens [16]
2 | 2809 | lyratol C | 0.2450 | Colorectal neoplasms [17]
3 | 3813 | epithienamycin E | 0.2190 | Validated [18]
4 | 2356 | 9(S)-HOTrE | 0.2030 | Liver neoplasms [19]
5 | 1557 | cimifoetiside A | 0.1950 | Validated [20]
6 | 1835 | Gymnemic acid XII | 0.1750 | Diabetes Mellitus [21]
7 | 3045 | quercetin 7,4′-di-O-β-D-glucoside | 0.1730 | Flatulence [22]
8 | 1836 | Phenethylamine | 0.1470 | Validated [23]

Class 6: Female-Specific Diseases
1 | 2739 | D-myo-inositol 1,2,5,6-tetrakisphosphate | 0.3400 | Validated in a medical article [24]
2 | 582 | butin | 0.1550 | Albizia glaberrima (TCM)
3 | 1041 | Delphin | 0.1190 | Inflammation [25]
4 | 2603 | Malvidin | 0.0277 | Validated based on Jamu data
5 | 2634 | (R)-4-hydroxy-1-methyl-L-proline | 0.0155 | Aglaia andamanica

Class 8: The Heart and Blood Vessels
1 | 2311 | kaempferol 3-O-[α-L-rhamnopyranosyl(1→2)-β-D-galactopyranosyl]-7-O-α-L-rhamnopyranoside | 0.5910 | Cardiovascular diseases [26]; Cardiomyopathies [27]
2 | 626 | Succinic acid | 0.5160 | Validated based on Jamu data
3 | 40 | Linaloyl acetate | 0.0367 | Validated [28]
4 | 2949 | Betamethasone valerate | 0.0167 | Synthetic glucocorticoid

Class 10: Male-Specific Diseases
1 | 333 | Obtusifoliol | 0.5230 | Validated using Simpson similarity (0.9706) with Euphadienol [29]
2 | 1362 | Methyl 4-hydroxy cinnamate | 0.4600 | Prostate cancer [30]
3 | 4415 | 3-O-Acetyloleanolic acid | 0.3610 | Prostate cancer [31]
4 | 2853 | Butiin | 0.2890 | Bacterial infections [32]
5 | 603 | Gibberellin A12 | 0.2700 | Infertility, Male [33]
6 | 2253 | ∆6-protoilludene | 0.2220 | Cancer [34]
7 | 4534 | erythrodiol | 0.1420 | Rhododendron ferrugineum (TCM)

Class 11: Muscle and Bone
1 | 4078 | 14-deoxo-3-O-propionyl-5,15-di-O-acetyl-7-O-benzoylmyrsinol-14beta-nicotinoate | 0.1870 | Validated using Simpson similarity (0.9523) with perfluorooctyl iodide
2 | 1804 | Euphorbiaproliferin I | 0.1250 | Validated using Simpson similarity (0.9523) with cesium
3 | 4570 | Euphorbiaproliferin G | 0.1070 | Simpson similarity (0.974) with moli001259
4 | 2146 | Euphorbiaproliferin D | 0.0641 | Euphorbia prolifera

Class 13: The Nervous System
1 | 434 | pterostilbene | 0.1290 | Validated [35] NCBI
2 | 75 | Trapain | 0.0396 | Alzheimer’s disease [36]
3 | 1610 | cyanidin 3-O-(6-O-acetyl-β-D-glucoside) | 0.0314 | Neuroprotective effects [37]

Class 15: Respiratory Diseases
1 | 4624 | 6-epi-guttiferone J | 0.2250 | Simpson similarity (0.902) with Sesquiterpene lactone
2 | 848 | 2(3H)-Furanone | 0.0858 | Lung neoplasms [38,39]
3 | 2133 | 2-(3,4-dihydroxyphenyl)-ethyl-O-β-D-glucopyranoside | 0.0395 | Cornus mas L., Cornus alba L. (TCM)

Class 16: Skin and Connective Tissue
1 | 2846 | Taxifolin 3′-glucoside | 0.4520 | Dermatitis [40]
2 | 4306 | Oleanolic acid | 0.2560 | Skin neoplasms [41]
3 | 1970 | Oleandrin | 0.1900 | Melanoma [42]
4 | 2461 | Himaphenolone | 0.1360 | Cedrus deodara (Roxb.) Loud (TCM)
5 | 2316 | Coniferyl aldehyde | 0.1320 | Simpson similarity (0.9087) with Nalco L
6 | 2866 | Cedrin | 0.1010 | Simpson similarity (0.9370) with Dihydroquercetin

Class 17: The Urinary System
1 | 908 | Glyoxylic acid | 0.3450 | Kidney calculi [43,44]
2 | 322 | Biochanin A | 0.2750 | Validated based on Jamu data
3 | 3752 | Pyruvic acid | 0.2010 | Validated [45]
4 | 1526 | Oxalic acid | 0.1840 | Validated [46]
5 | 4026 | Soyasaponin I | 0.1630 | Polycystic kidney diseases [47]
6 | 1067 | 2-(methyldithio)pyridine-N-oxide | 0.1520 | Neoplasms [48]
7 | 2898 | Liquiritigenin | 0.1490 | Validated based on Jamu data
8 | 3934 | Garbanzol | 0.1270 | Neoplasms [49]
9 | 1266 | Medicagol | 0.1200 | Validated based on Jamu data [50]
Table 4 lists the predicted compounds for which we could find validation.
For the disease category ‘The Digestive System’, there were eight validated
compounds. Among them, 6H-dibenzo[b,d]pyran-6-one is related to enterophytoestrogens [16].
Lyratol C is used as a drug to treat colorectal neoplasms [17]. Epithienamycin
E is a substance that kills or slows the growth of microorganisms, including bacteria, viruses,
fungi, and protozoans [18]. 9(S)-HOTrE enhances reverse cholesterol transport (RCT) by
increasing apoA-I transcription in human hepatocellular carcinoma (HepG2) cells [19].
Cimifoetiside A is an active ingredient of Cimicifuga spp., which are used to relieve diarrhea
in TCM [20]. Gymnemic acid XII possesses a high binding affinity to PPARγ, a promising
drug target for diabetes [21]. Quercetin 7,4′-di-O-β-D-glucoside is an active ingredient
of Delonix elata, which is used as a purgative and to relieve flatulence in Saudi Arabia [22].
Furthermore, as a therapeutic agent, phenethylamine acts as an appetite suppressant [23].
For the ‘Female-Specific Diseases’ category, we validated five compounds. D-myo-inositol
1,2,5,6-tetrakisphosphate inhibits fibroma. This process can also block chloride channels
resulting in epithelial calcium activation [24]. Delphin has been reported to inhibit
inflammation in some gynecological infections [25]. Butin is an active ingredient of
Albizia glaberrima, which is used in TCM, and (R)-4-hydroxy-1-methyl-L-proline is found in Aglaia
andamanica. Additionally, Malvidin is used as a medicinal component in Jamu.
For the category ‘The Heart and Blood Vessels’, we found four validated compounds.
Among them, kaempferol 3-O-[α-L-rhamnopyranosyl(1→2)-β-D-galactopyranosyl]-7-O-α-L-rhamnopyranoside
is a candidate agent for the treatment of cardiovascular diseases [26].
Succinic acid is an active component used in Jamu. Linalyl acetate prevents
hypertension-related ischemic injury and can prevent the production of reactive oxygen species (ROS) [28].
In the case of ‘Male-Specific Diseases’, there were seven validated compounds.
According to the Simpson similarity, Obtusifoliol resembles Euphadienol, which has
anti-inflammatory effects [29]. Methyl 4-hydroxy cinnamate and 3-O-Acetyloleanolic
acid are active against prostate cancer [30,31]. Butiin inhibits the growth
of Gram-positive and Gram-negative bacteria that cause male-specific infections [32].
Gibberellin A12 is implicated in the treatment of male infertility [33]. ∆6-Protoilludene
is a precursor for the synthesis of both melleolides and armillyl orsellinates, whose
cytotoxicity reflects their ability to induce apoptosis [34]. In addition, erythrodiol is an active
ingredient of the herb Rhododendron ferrugineum, which is used in TCM.
For the category ‘Muscle and Bone’, four compounds were validated.
Among them, 14-deoxo-3-O-propionyl-5,15-di-O-acetyl-7-O-benzoylmyrsinol-14beta-nicotinoate
shows similarity to perfluorooctyl iodide. These metabolites are useful as
organocatalysts through the activation of substrates with halogen bonds. Euphorbiaproliferin
I resembles cesium, and Euphorbiaproliferin G is similar to moli001259. Structural
similarity was measured using the Simpson similarity. Furthermore, Euphorbiaproliferin D
can be isolated from a TCM ingredient, Euphorbia prolifera, which is used in TCM to
treat various diseases.
For the disease category ‘The Nervous System’, the validated compounds
are pterostilbene, Trapain, and cyanidin 3-O-(6-O-acetyl-β-D-glucoside). The antioxidant
activity of pterostilbene has been implicated in the modulation of neurological disease [35].
Trapain is a promising agent for the treatment of Alzheimer’s disease as an inhibitor of cholinesterase
and β-site amyloid precursor protein-cleaving enzyme 1 [36]. Finally, cyanidin
3-O-(6-O-acetyl-β-D-glucoside) has been reported to have a neuroprotective effect [37].
In the case of ‘Respiratory Diseases’, 6-epi-guttiferone J, 2(3H)-Furanone, and
2-(3,4-dihydroxyphenyl)-ethyl-O-β-D-glucopyranoside were validated. Based on the Simpson
similarity (0.902), 6-epi-guttiferone J resembles sesquiterpene lactone, a moderate
antinociceptive agent. In addition, 2(3H)-Furanone is reported to show anticancer and
DNA-damaging activities in A549 lung cancer cells [38,39]. Furthermore,
2-(3,4-dihydroxyphenyl)-ethyl-O-β-D-glucopyranoside is a component of the TCM herbs Cornus mas L. and Cornus alba L., which are
used in practice as anti-inflammatory and antibacterial drugs.
For the category ‘Skin and Connective Tissue’, Taxifolin 3′-glucoside, Oleanolic acid,
Oleandrin, Himaphenolone, Coniferyl aldehyde, and Cedrin were the validated
metabolites. Taxifolin 3′-glucoside is effective in preventing the production of inflammatory
cytokines and reducing atopic dermatitis [40]. Oleanolic acid can inhibit skin tumor
promotion [41]. Oleandrin has been shown to induce apoptosis of malignant cells in melanoma
cell lines [42]. Himaphenolone is an active ingredient of the herb Cedrus deodara (Roxb.)
Loud, which can be used for the treatment of carbuncle sores, eczema, traumatic
bleeding, burns, and scalds. Coniferyl aldehyde is similar to a drug, Nalco L, and Cedrin
resembles dihydroquercetin.
For the ‘Urinary System’ category, we validated Glyoxylic acid, Biochanin
A, pyruvic acid, oxalic acid, Soyasaponin I, 2-(methyldithio)pyridine-N-oxide,
Liquiritigenin, Garbanzol, and Medicagol. Glyoxylic acid and oxalic acid are involved in the
formation of kidney stones [43,44]. Pyruvic acid can prevent oxalate urolithiasis in mice [45].
Soyasaponin I inhibited kidney enlargement and cyst growth in a murine model of
polycystic kidney disease [47]. 2-(Methyldithio)pyridine-N-oxide and Garbanzol were both
shown to inhibit renal neoplasm [48,49]. Lastly, Liquiritigenin, Biochanin A, and Medicagol
are effective components used in Jamu [50].
4. Discussion
We collected as many metabolites as possible for each Unani plant from
various resources. Medicinal metabolites are of greater interest to researchers and are usually
the first to be identified in a plant. Therefore, we assumed that the currently
available plant–metabolite relations could produce good results to a certain extent.
The approach adopted in the current work can be considered a top-down approach
because we started with a global set of Unani formulas in terms of plants, then
moved down to the metabolite level and used state-of-the-art machine learning
techniques to identify significant compounds. Hence, the approach is also
computational. The results we obtained are promising, showing the strength and usefulness
of computational approaches in drug discovery. Our input data cover diverse
types of diseases. In this work, we considered disease classes at an upper level of the hierarchy, and
each class contained diseases with some differences. Interestingly, our results also
show compounds corresponding to different types of diseases under each category. This
was made possible by using efficient algorithms to identify significant compounds within formulas
that show bias toward specific disease classes/categories. Therefore,
these are the results of a systems-level investigation.
Also of interest are the other compounds (not validated)
extracted from the best model of this study. The validation results show that around 42% of
the compounds (49 of 118) are directly or indirectly related to the therapeutic group of diseases. The
remaining 69 compounds are potential candidates for further research, for example, in
the fields of biochemistry, pharmacy, and medicine. Finally, the simple binary representation
of metabolites performed well in this study. However, other approaches can
be explored to improve the results.
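A binary representation of this kind can be sketched as a formula-by-metabolite matrix. The function, the formula and plant names, and the mapping below are hypothetical placeholders; the sketch assumes a formula is marked 1 for a metabolite when any of its constituent plants is reported to contain it.

```python
def binary_feature_matrix(formula_plants, plant_metabolites):
    """Return (metabolite_names, rows); rows[i][j] is 1 if formula i contains
    metabolite j through any of its plants, else 0."""
    metabolites = sorted({m for ms in plant_metabolites.values() for m in ms})
    index = {m: j for j, m in enumerate(metabolites)}
    rows = []
    for plants in formula_plants.values():
        row = [0] * len(metabolites)
        for plant in plants:
            # Plants without known metabolites contribute nothing.
            for m in plant_metabolites.get(plant, ()):
                row[index[m]] = 1
        rows.append(row)
    return metabolites, rows

# Hypothetical toy mappings (placeholder names, not data from this study).
formula_plants = {
    "formula_A": ["plant_1", "plant_2"],
    "formula_B": ["plant_2"],
}
plant_metabolites = {
    "plant_1": ["quercetin", "succinic acid"],
    "plant_2": ["oleanolic acid", "quercetin"],
}

names, rows = binary_feature_matrix(formula_plants, plant_metabolites)
print(names)
print(rows)
```

Each row can then serve as the input vector for a classifier, with the efficacy class of the formula as the label.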
5. Conclusions
A prediction of the therapeutic usage of Unani formulas based on their constituent
metabolites using the deep neural network showed the highest accuracy compared to other
algorithms, e.g., random forest and support vector machine. The best prediction
accuracies for DNN, KNN, Xgboost, RF, and SVM were 87.4%, 63.2%, 39.3%,
37.9%, and 38.6%, respectively. These results indicate that the DNN
performed much better than the other algorithms. In this work, two datasets were
prepared using filtering techniques, namely, dataset 1 and dataset 2. In the case of the
DNN, the best accuracy was obtained from dataset 1, while RF and SVM obtained their best
accuracy from dataset 2. In general, the filtering process improves prediction accuracy, but
our results were mainly influenced by the type of classifier algorithm.
Based on the best classification model, we extracted important metabolites by making
use of the variable importance of the DNN. For the nine therapeutic uses of the Unani
formulas, we extracted 118 essential metabolites, 49 of which were validated using the
following methods: searching supporting health-related journals/articles, searching for the
same metabolites in Jamu or TCM, and searching for metabolites with a similar structure and
activity in the PubChem database.
For future work, we need to consider increasing the number of Unani
formulas; by doing this, the number of plants and metabolites will increase simultaneously.
We will look for more sources of plant–metabolite relations, such as open-source
databases, books, and journals, so that our dataset is closer to actual conditions and
is also acceptable to industry. The authors also recommend using artificially generated
data in testing to support and strengthen the accuracy results of the prediction model.
Supplementary Materials: The following supporting information can be downloaded at: https://
www.mdpi.com/article/10.3390/life13020439/s1, Table S1: List of important metabolites for each
disease class extracted from best prediction model using variable importance of Deep Neural Network.
Author Contributions: Conceptualization, S.H.W., A.K.N., S.K. and M.A.-U.-A.; methodology,
A.K.N., S.H.W., S.K., M.A.-U.-A., N.O., I.B. and M.H.; dataset preparation, S.H.W., A.K.N. and
M.A.-U.-A.; machine learning implementation, S.H.W., A.K.N. and M.A.-U.-A.; validation, P.G.,
A.K.N. and M.A.-U.-A., writing—original draft preparation, A.K.N. and M.A.-U.-A.; writing—review
and editing, S.H.W., A.K.N. and M.A.-U.-A.; supervision, M.H., N.O., S.K. and M.A.-U.-A. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Ministry of Education, Culture, Sports, Science, and
Technology of Japan (20K12043) and NAIST Big Data Project and was partially supported by the
National Bioscience Database Center in Japan.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available on request from the corresponding authors.
Acknowledgments: The authors would like to thank the Ministry of Education Science and Technol-
ogy Japan, which has financially supported the authors to continue the study in Japan.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ekor, M. The growing use of herbal medicines: Issues relating to adverse reactions and challenges in monitoring safety. Front.
Pharmacol. 2014, 4, 177. [CrossRef]
2. Wijaya, S.H.; Tanaka, Y.; Altaf-Ul-Amin, M.; Morita, A.H.; Afendi, F.M.; Batubara, I.; Kanaya, S. Utilization of KNApSAcK family
databases for developing herbal medicine systems. J. Comput. Aided Chem. 2006, 17, 1–7. [CrossRef]
3. Hossain, S.F.; Wijaya, S.H.; Huang, M.; Batubara, I.; Kanaya, S.; Farhad, M.A.U.A. Prediction of Plant-Disease Relations Based on
Unani Formulas by Network Analysis. In Proceedings of the 2018 IEEE 18th International Conference on Bioinformatics and
Bioengineering (BIBE), Taichung, Taiwan, 29–31 October 2018.
4. Itrat, M. Methods of health promotion and disease prevention in Unani medicine. J. Educ. Health Promot. 2020, 9, 168. [CrossRef]
5. Husain, A.; Sofi, G.D.; Tajuddin, T.; Dang, R.; Kumar, N. Unani system of medicine-introduction and challenges. Med. J. Islam.
World Acad. Sci. 2010, 18, 27–30.
6. Rani, M.; Nayak, R.; Vyas, O.P. An ontology-based adaptive personalized e-learning system, assisted by software agents on cloud
storage. Knowl.-Based Syst. 2015, 90, 33–48. [CrossRef]
7. Cortes, C.; Mohri, M.; Rostamizadeh, A. Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res. 2012,
13, 795–828.
8. Kang, M.J.; Kang, J.W. Intrusion detection system using deep neural network for in-vehicle network security. PLoS ONE 2016,
11, e0155781. [CrossRef] [PubMed]
9. Patel, H.; Thakkar, A.; Pandya, M.; Makwana, K. Neural network with deep learning architectures. J. Inf. Optim. Sci. 2018, 39,
31–38. [CrossRef]
10. Nasution, A.K.; Wijaya, S.H.; Gao, P.; Islam, R.M.; Huang, M.; Ono, N.; Altaf-Ul-Amin, M. Prediction of Potential Natural
Antibiotics Plants Based on Jamu Formula Using Random Forest Classifier. Antibiotics 2022, 11, 1199. [CrossRef] [PubMed]
11. Wijaya, S.H.; Batubara, I.; Nishioka, T.; Altaf-Ul-Amin, M.; Kanaya, S. Metabolomic studies of Indonesian jamu medicines:
Prediction of jamu efficacy and identification of important metabolites. Mol. Inform. 2017, 36, 1700050. [CrossRef]
12. Jackins, V.; Vimal, S.; Kaliappan, M.; Lee, M.Y. AI-based smart prediction of clinical disease using random forest classifier and
Naive Bayes. J. Supercomput. 2021, 77, 5198–5219. [CrossRef]
13. Gunn, S.R. Support vector machines for classification and regression. ISIS Tech. Rep. 1998, 1, 5–16.
14. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
[CrossRef]
15. Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for
deep learning. arXiv 2018, arXiv:1811.03378.
16. Larrosa, M.; González-Sarrías, A.; García-Conesa, M.T.; Tomás-Barberán, F.A.; Espín, J.C. Urolithins, ellagic acid-derived
metabolites produced by human colonic microflora, exhibit estrogenic and antiestrogenic activities. J. Agric. Food Chem. 2006, 54,
1611–1620. [CrossRef]
17. Ren, Y.; Shen, L.; Zhang, D.W.; Dai, S.J. Two new sesquiterpenoids from Solanum lyratum with cytotoxic activities. Chem. Pharm.
Bull. 2009, 57, 408–410. [CrossRef]
18. Pan, R.L.; Chen, D.H.; Si, J.Y.; Zhao, X.H.; Li, Z.; Cao, L. Immunosuppressive effects of new cyclolanostane triterpene diglycosides
from the aerial part of Cimicifuga foetida. Arch. Pharmacal Res. 2009, 32, 185–190. [CrossRef]
19. van der Krieken, S.E.; Popeijus, H.E.; Bendik, I.; Böhlendorf, B.; Konings, M.C.; Tayyeb, J.; Plat, J. Large-Scale Screening of Natural
Products Transactivating Peroxisome Proliferator-Activated Receptor α Identifies 9S-Hydroxy-10E,12Z,15Z-Octadecatrienoic
Acid and Cymarin as Potential Compounds Capable of Increasing Apolipoprotein A-I Transcription in Human Liver Cells. Lipids
2018, 53, 1021–1030. [PubMed]
20. Sanders, B.; Lankenau, S.E.; Bloom, J.J.; Hathazi, D. “Research chemicals”: Tryptamine and phenethylamine use among high-risk
youth. Subst. Use Misuse 2008, 43, 389–402. [CrossRef] [PubMed]
21. Tiwari, P.; Sharma, P.; Khan, F.; Singh Sangwan, N.; Nath Mishra, B.; Singh Sangwan, R. Structure activity relationship studies of
gymnemic acid analogues for antidiabetic activity targeting PPARγ. Curr. Comput.-Aided Drug Des. 2015, 11, 57–71. [CrossRef]
22. Al-Taweel, A.M.; Abdel-Kader, M.S.; Fawzy, G.A.; Perveen, S.; Maher, H.M.; Al-Zoman, N.Z.; Al-Showiman, H. Isolation of
flavonoids from Delonix elata and determination of its rutin content using capillary electrophoresis. Pak. J. Pharm. Sci. 2015, 28,
1897–1903.
23. Pawar, R.S.; Grundel, E. Overview of regulation of dietary supplements in the USA and issues of adulteration with phenethy-
lamines (PEAs). Drug Test. Anal. 2017, 9, 500–517. [CrossRef] [PubMed]
24. Mattingly, R.R.; Stephens, L.R.; Irvine, R.F.; Garrison, J.C. Effects of transformation with the v-src oncogene on inositol phosphate
metabolism in rat-1 fibroblasts. D-myo-inositol 1,4,5,6-tetrakisphosphate is increased in v-src-transformed rat-1 fibroblasts and
can be synthesized from D-myo-inositol 1,3,4-trisphosphate in cytosolic extracts. J. Biol. Chem. 1991, 266, 15144–15153.
25. Abdin, M.; Hamed, Y.S.; Akhtar, H.M.S.; Chen, D.; Chen, G.; Wan, P.; Zeng, X. Antioxidant and anti-inflammatory activities of
target anthocyanins di-glucosides isolated from Syzygium cumini pulp by high speed counter-current chromatography. J. Food
Biochem. 2020, 44, 1050–1062. [CrossRef]
26. Oh, S.M.; Kim, Y.P.; Chung, K.H. Biphasic effects of kaempferol on the estrogenicity in human breast cancer cells. Arch. Pharmacal
Res. 2006, 29, 354–362. [CrossRef]
27. Xiao, J.; Sun, G.B.; Sun, B.; Wu, Y.; He, L.; Wang, X.; Sun, X.B. Kaempferol protects against doxorubicin-induced cardiotoxicity
in vivo and in vitro. Toxicology 2012, 292, 53–62. [CrossRef] [PubMed]
28. Hsieh, Y.S.; Kwon, S.; Lee, H.S.; Seol, G.H. Linalyl acetate prevents hypertension-related ischemic injury. PLoS ONE 2018,
13, e0198082. [CrossRef]
29. Wang, S.; Guan, X.; Zhong, X.; Yang, Z.; Huang, W.; Jia, B.; Cui, T. Simultaneous determination of cucurbitacin IIa and cucurbitacin
IIb of Hemsleya amabilis by HPLC–MS/MS and their pharmacokinetic study in normal and indomethacin-induced rats. Biomed.
Chromatogr. 2016, 30, 1632–1640. [CrossRef] [PubMed]
30. Acquaviva, R.; Di Giacomo, C.; Sorrenti, V.; Galvano, F.; Santangelo, R.; Cardile, V.; Vanella, L. Antiproliferative effect of
oleuropein in prostate cell lines. Int. J. Oncol. 2012, 41, 31–38.
31. Acharya, N.; Acharya, S.; Shah, U.; Shah, R.; Hingorani, L. A comprehensive analysis on Symplocos racemosa Roxb.: Traditional
uses, botany, phytochemistry and pharmacological activities. J. Ethnopharmacol. 2016, 181, 236–251. [CrossRef]
32. Kulikova, V.; Morozova, E.; Rodionov, A.; Koval, V.; Anufrieva, N.; Revtovich, S.; Demidkina, T. Non-stereoselective decomposi-
tion of (±)-S-alk (en) yl-l-cysteine sulfoxides to antibacterial thiosulfinates catalyzed by C115H mutant methionine γ-lyase from
Citrobacter freundii. Biochimie 2018, 151, 42–44. [CrossRef]
33. Sakata, T.; Oda, S.; Tsunaga, Y.; Shomura, H.; Kawagishi-Kobayashi, M.; Aya, K.; Higashitani, A. Reduction of gibberellin by low
temperature disrupts pollen development in rice. Plant Physiol. 2014, 164, 2011–2019. [CrossRef] [PubMed]
34. Engels, B.; Heinig, U.; McElroy, C.; Meusinger, R.; Grothe, T.; Stadler, M.; Jennewein, S. Isolation of a gene cluster from Armillaria
gallica for the synthesis of armillyl orsellinate–type sesquiterpenoids. Appl. Microbiol. Biotechnol. 2020, 105, 211–224. [CrossRef]
[PubMed]
35. McCormack, D.; McFadden, D. A review of pterostilbene antioxidant activity and disease modification. Oxidative Med. Cell.
Longev. 2013, 2013, 575482. [CrossRef] [PubMed]
36. Bhakta, H.K.; Park, C.H.; Yokozawa, T.; Tanaka, T.; Jung, H.A.; Choi, J.S. Potential anti-cholinesterase and β-site amyloid precursor
protein cleaving enzyme 1 inhibitory activities of cornuside and gallotannins from Cornus officinalis fruits. Arch. Pharmacal Res.
2017, 40, 836–853. [CrossRef] [PubMed]
37. Zhang, J.; Wu, J.; Liu, F.; Tong, L.; Chen, Z.; Chen, J.; Huang, C. Neuroprotective effects of anthocyanins and its major component
cyanidin-3-O-glucoside (C3G) in the central nervous system: An outlined review. Eur. J. Pharmacol. 2019, 858, 172500. [CrossRef]
[PubMed]
38. Calderón-Montano, J.M.; Burgos-Morón, E.; Orta, M.L.; Pastor, N.; Austin, C.A.; Mateos, S.; López-Lázaro, M. Alpha, beta-
unsaturated lactones 2-furanone and 2-pyrone induce cellular DNA damage, formation of topoisomerase I- and II-DNA complexes
and cancer cell death. Toxicol. Lett. 2013, 222, 64–71. [CrossRef]
39. Xin, X.Q.; Chen, Y.; Zhang, H.; Li, Y.; Yang, M.H.; Kong, L.Y. Cytotoxic seco-cytochalasins from an endophytic Aspergillus sp.
harbored in Pinellia ternata tubers. Fitoterapia 2019, 132, 53–59. [CrossRef] [PubMed]
40. Ahn, J.Y.; Choi, S.E.; Jeong, M.S.; Park, K.H.; Moon, N.J.; Joo, S.S.; Seo, S.J. Effect of taxifolin glycoside on atopic dermatitis-like
skin lesions in NC/Nga mice. Phytother. Res. 2010, 24, 1071–1077. [CrossRef]
41. Cho, J.; Tremmel, L.; Rho, O.; Camelio, A.M.; Siegel, D.; Slaga, T.J.; DiGiovanni, J. Evaluation of pentacyclic triterpenes found
in Perilla frutescens for inhibition of skin tumor promotion by 12-O-tetradecanoylphorbol-13-acetate. Oncotarget 2015, 6, 39292.
[CrossRef]
42. Lin, Y.; Dubinsky, W.P.; Ho, D.H.; Felix, E.; Newman, R.A. Determinants of human and mouse melanoma cell sensitivities to
oleandrin. J. Exp. Ther. Oncol. 2008, 7, 195–205.
43. Umekawa, T.; Yamate, T.; Amasaki, N.; Kohri, K.; Kurita, T. Osteopontin mRNA in the kidney on an experimental rat model of
renal stone formation without renal failure. Urol. Int. 1995, 55, 6–10. [CrossRef] [PubMed]
44. Kohri, K.E.A.; Nomura, S.; Kitamura, Y.; Nagata, T.; Yoshioka, K.; Iguchi, M.; Sinohara, H. Structure and expression of the mRNA
encoding urinary stone protein (osteopontin). J. Biol. Chem. 1993, 268, 15180–15184. [CrossRef] [PubMed]
45. Robitaille, L.; Mamer, O.A.; Miller, W.H., Jr.; Levine, M.; Assouline, S.; Melnychuk, D.; Hoffer, L.J. Oxalic acid excretion after
intravenous ascorbic acid administration. Metabolism 2009, 58, 263–269. [CrossRef] [PubMed]
46. Kropp, H.; Sundelof, J.G.; Hajdu, R.; Kahan, F.M. Metabolism of thienamycin and related carbapenem antibiotics by the renal
dipeptidase, dehydropeptidase-I. Antimicrob. Agents Chemother. 1983, 22, 62–70. [CrossRef]
47. Philbrick, D.J.; Bureau, D.P.; Collins, F.W.; Holub, B.J. Evidence that soyasaponin Bb retards disease progression in a murine
model of polycystic kidney disease. Kidney Int. 2003, 63, 1230–1239. [CrossRef] [PubMed]
48. O’Donnell, G.; Poeschl, R.; Zimhony, O.; Gunaratnam, M.; Moreira, J.B.; Neidle, S.; Gibbons, S. Bioactive pyridine-N-oxide
disulfides from Allium stipitatum. J. Nat. Prod. 2009, 72, 360–365. [CrossRef]
49. Stahlhut, S.G.; Siedler, S.; Malla, S.; Harrison, S.J.; Maury, J.; Neves, A.R.; Forster, J. Assembly of a novel biosynthetic pathway for
production of the plant flavonoid fisetin in Escherichia coli. Metab. Eng. 2015, 31, 84–93. [CrossRef] [PubMed]
50. IJAH Analytics. Available online: http://ijah.apps.cs.ipb.ac.id/#/home (accessed on 13 December 2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Life 2023, 13, 439. https://doi.org/10.3390/life13020439 https://www.mdpi.com/journal/life
