Life 13 00439 v2
Article
Deep Learning Approach for Predicting the Therapeutic Usages
of Unani Formulas towards Finding Essential Compounds
Sony Hartono Wijaya 1, * , Ahmad Kamal Nasution 2 , Irmanida Batubara 3 , Pei Gao 2 , Ming Huang 2 ,
Naoaki Ono 2 , Shigehiko Kanaya 2 and Md. Altaf-Ul-Amin 2, *
1 Department of Computer Science, Faculty of Mathematics and Natural Sciences, IPB University,
Bogor 16680, Indonesia
2 Computational Systems Biology Lab, Graduate School of Science and Technology,
Nara Institute of Science and Technology, Nara 630-0101, Japan
3 Department of Chemistry, Faculty of Mathematics and Natural Sciences, IPB University,
Bogor 16680, Indonesia
* Correspondence: [email protected] (S.H.W.); [email protected] (M.A.-U.-A.)
Abstract: The use of herbal medicines in recent decades has increased because their side effects are
considered lower than those of conventional medicine. Unani herbal medicines are often used in Southern
Asia. These herbal medicines are usually composed of several types of medicinal plants to treat
various diseases. Research on herbal medicine usually focuses on insight into the composition of
plants used as ingredients. However, in the present study, we extended to the level of metabolites that
exist in the medicinal plants. This study aimed to develop a predictive model of the Unani therapeutic
usage based on its constituent metabolites using deep learning and data-intensive science approaches.
Furthermore, the best prediction model was then utilized to extract important metabolites for each
therapeutic usage of Unani. In this study, it was observed that the deep neural network approach
provided a much better prediction model than other algorithms including random forest and support
vector machine. Moreover, according to the best prediction model using the deep neural network, we
identified 118 important metabolites for nine therapeutic usages of Unani.
of drug management. Research on building Unani’s scientific foundation has not been
carried out much by researchers. This is needed to provide a foundation and knowledge
as to why an Unani formula is useful for a particular disease. Unani medicines are made
by the extraction of medicinal plants that are used as drugs against various diseases [3].
Based on [5], the Unani System of Medicine was invented in Greece and refined by Arabs
into a sophisticated medical discipline using the framework of Hippocrates’ and Jalinoos’
teachings (Galen). Unani medicine has since been referred to as Greco-Arab medicine.
The four humors of Hippocratic doctrine are blood, phlegm, yellow bile, and black bile.
According to this approach, these principles govern the health and composition of the
body and pathological states. The Unani System of Medicine (USM) has been acknowl-
edged by the World Health Organization (WHO) as an alternative system to meet the
demands of the human population in terms of health care. The practice of alternative
medicine is widespread.
One approach in building Unani’s scientific foundation is supervised learning by
utilizing data science. Supervised learning is the machine learning task of learning a
function that maps an input to an output based on examples of input–output pairs [6]. It
infers a function from labeled training data consisting of a set of training examples [7]. A
supervised learning algorithm utilizes the training data and produces an inferred function,
which can be used for mapping new examples. An optimal scenario will allow the algorithm
to determine the class labels for unseen instances correctly.
Deep learning is commonly applied as a supervised learning process. In this work, we utilize
deep learning and other machine learning algorithms to find the relationship between the
therapeutic usage of Unani and its constituent metabolites (compounds). Deep learning
allows computers to model complicated concepts in data. The approach is considered
effective for complex data because its learning principles emulate the way human neurons
work. In addition to supervised learning, this method can also be used for unsupervised
and semi-supervised learning. This study uses one type of deep learning architecture: the
deep neural network (DNN). The DNN is an artificial neural network that has more than one
hidden layer [8]. The DNN is considered capable of solving complex problems because its
layered architecture makes it possible to learn representations of the data at several
levels of abstraction. According to [9], this method is beneficial in visual object
recognition, object detection, drug discovery, and genomics.
In this study, we utilized supervised learning to predict the therapeutic usage of Unani
formulas from their metabolites using the deep neural network. We also compared the
prediction performance of the DNN with other machine learning methods. Moreover, we
determined significant metabolites for each target disease/therapeutic usage of the Unani
formulas according to the best prediction model, and validated the result against journal
references and by computing the structural similarity with relevant metabolites [10]. Hence,
these results can be used as a reference for other research and as basic knowledge for
drug discovery.
Figure 1. Schematic diagram of the prediction of Unani efficacy and identification of metabolites
for each efficacy group.
The initial Unani formulas consisted of plants as ingredients. Unani compounds were
collected according to the corresponding plants by using the following databases: KNAp-
SAcK Family Databases (http://www.knapsackfamily.com/KNApSAcK_Family, accessed
on 25 June 2021), IJAH Analytics (http://ijah.apps.cs.ipb.ac.id, accessed on 3 July 2021),
KEGG (https://www.genome.jp/kegg/, accessed on 10 July 2021), and ChEBI
(https://www.ebi.ac.uk/chebi/, accessed on 11 September 2021). The distribution of me-
tabolites collected for each medicinal plant is shown in Figure 3. The number of com-
pounds belonging to a medicinal plant varies a lot: some plants are associated with a few
metabolites, whereas some are associated with many.
The KNApSAcK database (DB) contains information on species–metabolite relationships
(101,500), encompassing 20,741 species and 50,048 metabolites. This database also
contains information on accurate mass, molecular formula, metabolite name, and mass
spectra in several ionization modes [10]. IJAH Analytics is an open-access database
specifically for Jamu data. This database provides the plant–metabolite relations, and we
assume some metabolites might be common between Jamu and Unani because both are
classified as traditional medicine. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is
also an open-access database containing cell, organism, and molecular information with
the specific large-scale molecular datasets. The Chemical Entities of Biological Interest
(ChEBI) database contains molecular entities focusing on small chemical compounds.
Life 2023, 13, 439 4 of 15
Figure 2. Representation of the dataset including therapeutic usage for each Unani formula. (a)
Unani–plant relations; (b) Unani–metabolite relations.
Figure 3. The distribution of compounds for selected medicinal plants from the top of the list.
Table 1. The minimum and maximum metabolites of each disease’s class.

ID   Therapeutic Usage            Number of Unani Formula   Number of Metabolites
                                                            Minimum   Maximum   Mean
3    The Digestive System         103                       3         518       87.63
16   Skin and Connective Tissue   40                        16        240       75.97
15   Respiratory Diseases         32                        15        379       107.9
17   The Urinary System           31                        17        366       110.5
10   Male-Specific Diseases       26                        10        545       148
6    Female-Specific Diseases     22                        6         309       122
13   The Nervous System           22                        33        265       104.5
The minimum and maximum number of compounds associated with a formula corre-
sponding to 18 disease classes are shown in Table 1. Finally, we represented the collected
data as a two-dimensional table, in which the rows represent the Unani formulas and
columns represent metabolites. Figure 2b illustrates the data representation of herbal
medicine–metabolite relations. The number of metabolites associated with 369 medicinal
plants is 4688. Therefore, the dimension of the matrix indicating relations between Unani
formulas and metabolites is 609 × 4688.
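The two-dimensional representation described above can be sketched in a few lines. This is only an illustration, assuming the formula–plant and plant–metabolite relations are available as plain dictionaries; the function name `build_matrix`, the variable names, and the toy entries are hypothetical and are not taken from the study's data.

```python
def build_matrix(formula_plants, plant_metabolites):
    """Return (row labels, column labels, binary matrix as a list of rows)."""
    formulas = sorted(formula_plants)
    # Columns: every metabolite associated with at least one plant.
    metabolites = sorted({m for mets in plant_metabolites.values() for m in mets})
    column = {m: j for j, m in enumerate(metabolites)}
    matrix = []
    for formula in formulas:
        row = [0] * len(metabolites)
        for plant in formula_plants[formula]:
            for m in plant_metabolites.get(plant, ()):
                row[column[m]] = 1  # the formula contains this metabolite via the plant
        matrix.append(row)
    return formulas, metabolites, matrix


# Toy relations (hypothetical names, for illustration only).
formula_plants = {"F1": ["plant_a", "plant_b"], "F2": ["plant_b"]}
plant_metabolites = {"plant_a": ["m1", "m2"], "plant_b": ["m2", "m3"]}

rows, cols, X = build_matrix(formula_plants, plant_metabolites)
```

With the full data, the same construction would yield one row per Unani formula and one column per metabolite, that is, the 609 × 4688 matrix mentioned above.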
Figure 4. The architecture of deep learning [7].
Initially, we tuned the DNN to obtain the optimal parameter values. The DNN is
an advanced artificial neural network that has more than one hidden layer between the
input and output layers. Each hidden layer has an activation function such as a sigmoid,
rectified linear unit (ReLU), or hyperbolic tangent (tanh) function to map the input from
the previous layer to the output that will be sent to the layer afterward.
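The layer-by-layer mapping described above can be illustrated with a small forward pass. This is a minimal sketch using only the Python standard library; the layer sizes, weights, and the names `dense` and `forward` are hypothetical and do not reproduce the network trained in this study.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}


def dense(inputs, weights, biases, activation):
    """One layer: apply the activation to (W x + b); weights has one row per unit."""
    act = ACTIVATIONS[activation]
    return [
        act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs


# Toy network: 2 inputs -> 2 hidden units (ReLU) -> 1 output (sigmoid).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], "relu"),
    ([[1.0, 1.0]], [0.0], "sigmoid"),
]
output = forward([1.0, 0.0], layers)
```

A network counts as deep in the sense used here once the `layers` list contains more than one hidden layer before the output layer.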
The DNN can be discriminatively trained with backpropagation, using the derivatives of a cost
function that measures the difference between the target output and actual output. For large
training data, backpropagation is performed on a small portion of data taken at random,
which is more efficient than considering all data together.
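One common way to take "a small portion of data at random" is to shuffle the sample indices once per pass and cut them into mini-batches. The sketch below assumes this scheme; the batch size, the seed, and the function name `minibatches` are illustrative choices, since the batch size used in the study is not stated.

```python
import random


def minibatches(n_samples, batch_size, seed=0):
    """Yield lists of sample indices that together cover the data once."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # random order, so each batch is a random subset
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


batches = list(minibatches(n_samples=10, batch_size=4))
```

Each gradient update would then use only the samples in one batch rather than the whole training set.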
The DNN, with a large number of hidden layers, is challenging to optimize. Gradient
descent started from randomly generated initial weights cannot produce a good set of
weights unless careful weight-scale initialization is performed. Therefore, the initialization
of weights becomes essential to improve the DNN modeling performance. We also compared
the performance of the DNN with other supervised learning methods, such as random forest
[12] and support vector machine [13].
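As an illustration of weight-scale initialization, the sketch below uses the Glorot (Xavier) uniform scheme, in which weights are drawn from a range that shrinks as the layer gets wider. This scheme is an assumption chosen for the example and is not necessarily the one used in this study; the hidden-layer width of 64 and the name `glorot_uniform` are also hypothetical.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, seed=0):
    """Draw a fan_out x fan_in weight matrix from U(-limit, limit)."""
    rng = random.Random(seed)
    # Glorot uniform bound: sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


# First layer of a hypothetical network: 4688 metabolite inputs, 64 hidden units.
W = glorot_uniform(fan_in=4688, fan_out=64)
```

Scaling the range by the layer width keeps the variance of the activations roughly comparable from layer to layer, which is the practical purpose of careful initialization.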
3. Results
3.1. Filtering Dataset
First, we removed 33 Unani formulas for fever because this symptom can be found in
many disease classes. Then, we eliminated 195 Unani formulas which have more than one
therapeutic usage, and also eliminated unrelated metabolites after the reduction of Unani
formulas. We applied single filtering using random forest and the deep neural network,
separately. The filtering process was conducted by using the entire dataset as both training data and
testing data, and misclassified formulas were deleted. Therefore, we obtained two
datasets from two different types of filtering, namely dataset 1 as the dataset after filtering
using random forest, and dataset 2 as the dataset after filtering using the deep neural
network. The dimensions of the data after filtering can be seen in Table 2.
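The filtering step can be sketched as follows: fit a classifier on all formulas, predict the same formulas, and keep only those whose predicted class matches the label. To stay self-contained, the sketch uses a nearest-centroid classifier as a stand-in for the random forest and deep neural network used in the study; the toy data and all function names are hypothetical.

```python
def fit_centroids(X, y):
    """Mean feature vector per class (stand-in for training RF or DNN)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Class whose centroid is closest in squared Euclidean distance."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=distance)


def filter_misclassified(X, y):
    """Train and test on the same data; return indices of correctly classified rows."""
    centroids = fit_centroids(X, y)
    return [i for i, (row, label) in enumerate(zip(X, y))
            if predict(centroids, row) == label]


# Toy binary formula-metabolite rows with class labels (illustrative only).
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
y = ["A", "A", "B", "B", "B"]
kept = filter_misclassified(X, y)
```

In this toy case the last row looks like class A but is labeled B, so it is the only one removed; running the same procedure with two different classifiers gives two filtered datasets, as described above.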
Next, we examined the distribution of formulas to each efficacy class after filtering.
Each class in both datasets should have had enough Unani samples to generate good
prediction models. Therefore, we eliminated efficacy classes 1, 2, 4, 5, 7, 9, 14, and 18
because only a few Unani formulas were available in both datasets as follows (dataset 1,
dataset 2): (8, 4), (1, 0), (10, 5), (7, 1), (3, 3), (0, 0), (3, 0), and (13, 0). After this removal, the
distribution of the Unani formulas in dataset 1 and dataset 2 is shown in Figure 5.
Type of Dataset Accuracy (%) Data Dimension Number of Efficacy
Figure 5. Comparison of Unani data for each therapeutic usage after filtering.
3.2. Performance of Prediction
The datasets obtained from the previous process were used to develop a model for
the prediction of therapeutic usages of Unani using machine learning approaches (Table 2).
We adopted several methods, namely deep neural networks (DNN), random forest (RF),
and support vector machine (SVM), etc. The deep neural network was chosen as a
recommended classifier because this method is robust for imbalanced and multi-class
problem data. The DNN model that was built for this study was completed according to
the method proposed by [14]. This method is considered to be able to model complex data.
Tuning parameters are important factors for forming a prediction model. In terms of
the deep neural network, several parameters affected the accuracy value of the DNN
model, such as the activation function, the dropout value, the number of k in the validation
process (k-fold cross-validation), the number of hidden layers, and the number of epochs.
Each parameter was tuned by considering a range of values as follows: activation functions
(“relu”, “tanh”, “sigmoid”) [15], the dropout value (0.15, 0.25, 0.40, 0.50), the value of k
concerning cross-validation (4, 5, 6, 7, 8, 9, 10), the number of hidden layers (4, 6, 8, 12), and
the number of epochs (30, 50, 100, 500). Then, the best DNN parameters were processed
using a grid search for both datasets. The optimal parameters for both datasets were the
same as follows: activation function = “relu”, dropout value = 0.40, k value = 5, number of
hidden layers = 4, and number of epochs = 30. The prediction results for each fold using
the DNN with the best parameters can be seen in Figure 6.
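To give a sense of what a grid search over the DNN ranges listed above involves, the following sketch enumerates every combination and keeps the best-scoring one. The helper `grid_search` and the scoring callback are hypothetical; in particular, `dummy_score` is a placeholder that returns a constant, because training and cross-validating the actual DNN is outside the scope of this sketch.

```python
from itertools import product

# Tuning ranges as listed in the text.
dnn_grid = {
    "activation": ["relu", "tanh", "sigmoid"],
    "dropout": [0.15, 0.25, 0.40, 0.50],
    "k": [4, 5, 6, 7, 8, 9, 10],
    "hidden_layers": [4, 6, 8, 12],
    "epochs": [30, 50, 100, 500],
}


def grid_search(grid, score):
    """Evaluate score(params) on every combination; return best params and count."""
    names = list(grid)
    best_params, best_score, n_configs = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        n_configs += 1
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, n_configs


def dummy_score(params):
    # Placeholder only: a real score would be the mean cross-validated accuracy.
    return 0.0


_, n_configs = grid_search(dnn_grid, dummy_score)
```

The grid above contains 3 × 4 × 7 × 4 × 4 = 1344 configurations, each of which requires a full cross-validated training run, which is why the candidate values per parameter are kept few.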
In the random forest, there are several parameters that should be tuned when making
the RF model, such as n_estimators as the number of trees formed by RF, max_features,
max_depth, min_samples_split, min_samples_leaf, and bootstrap. For each parameter
used in the tuning processes, we utilized the range of values as follows: {‘n_estimators’:
(200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000); ‘max_features’: (‘auto’, ‘sqrt’);
‘max_depth’: (10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None); ‘min_samples_split’: (2,5,10);
‘min_samples_leaf’: (1, 2, 4); and ‘bootstrap’: (True, False)}. The results obtained after a grid
search for the RF model for dataset 1 and 2 were as follows: dataset 1 (n_estimators = 1000,
min_samples_split = 10, min_samples_leaf = 2, max_features = ‘sqrt’, max_depth = 10), and
dataset 2 (n_estimators = 400, min_samples_split = 10, min_samples_leaf = 4,
max_features = ‘sqrt’, max_depth = 90, bootstrap = True). After obtaining the best
parameters, the prediction model was evaluated using 5-fold cross-validation. The
prediction results for each fold using RF with the best parameters can be seen in Figure 6.
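The 5-fold cross-validation mentioned above partitions the formulas into five disjoint test folds, with the remaining formulas used for training in each round. The sketch below shows a plain (unshuffled, unstratified) split; whether the study shuffled or stratified the folds is not stated, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs with disjoint test folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(n_samples=12, k=5)
```

The per-fold accuracies are then averaged to obtain the accuracy reported for a classifier.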
In the SVM, the parameters needed to be tuned to form the best SVM prediction model
are the type of kernel, gamma value, and C. The SVM parameters were tuned using the
search grid according to this configuration: {‘kernel’: (“rbf”, “linear”); ‘gamma’: (0.001,
0.0001, “auto”); and ‘C’: (1, 10, 100, 1000)}. The best parameters for both datasets were as
follows: dataset 1 (kernel: “linear,” C: 1, and gamma: 0.001) and dataset 2 (kernel: “rbf”, C:
1 and gamma: “auto”). Then, the prediction accuracies obtained using those parameters
and 5-fold cross-validation are shown in Figure 6. Similarly, we performed parameter-
tuning for XGBoost and K-nearest neighbors (KNN) algorithms, and the results are shown
in Figure 6.
Figure 6. Comparison of prediction accuracy of deep neural network, random forest, support vector
machine, XGBoost, and K-nearest neighbors algorithms using both datasets.
The comparison of the classifier performances is shown in Figure 7. For the random
forest, support vector machine, and XGBoost, the averages of prediction accuracy were
below 40%, and for the KNN it was around 60% but still much less than the deep learning
method. In this study, the DNN achieved 87.4% accuracy. The results imply that the
prediction models based on RF and SVM are not able to make a good efficacy prediction
using Unani’s compounds as features.
Figure 7. Comparison of prediction accuracy between deep neural network, random forest, and
support vector machine using both datasets.
[Bar chart; y-axis: Accuracy (0 to 100); x-axis: Deep Learning, Support Vector Machine, Random
Forest, XGBoost, K Nearest Neighbors.]
One of the reasons that influenced the results was the imbalanced number of Unani
formulas belonging to different efficacy classes. It is noteworthy that the prediction
model based on the DNN increased the accuracy by about 50 percentage points
when compared to RF and SVM.
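The size of this improvement can be checked with simple arithmetic on the best accuracies reported in the Conclusions. The sketch below computes both the absolute gap (in percentage points) and the relative increase; the function name `gap_vs` is hypothetical.

```python
# Best accuracies (%) as reported in the Conclusions.
accuracy = {"DNN": 87.4, "KNN": 63.2, "XGBoost": 39.3, "RF": 37.9, "SVM": 38.6}


def gap_vs(baseline, reference="DNN"):
    """Return (absolute gap in percentage points, relative increase in percent)."""
    diff = accuracy[reference] - accuracy[baseline]
    relative = 100.0 * diff / accuracy[baseline]
    return round(diff, 1), round(relative, 1)


rf_gap = gap_vs("RF")
svm_gap = gap_vs("SVM")
```

The absolute gaps are 49.5 and 48.8 percentage points for RF and SVM, respectively, which matches the "about 50" figure quoted above when it is read as a difference in accuracy rather than a relative increase.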
Table 4 shows the list of predicted compounds for which we could find validations.
Corresponding to the disease category ‘The Digestive System’, there were eight validated
compounds. Out of them, 6H-dibenzo[b,d]pyran-6-one is effective against
Enterophytoestrogens [16]. Lyratol C is used as a drug to treat colorectal neoplasms [17].
Epithienamycin E is a substance that kills or slows the growth of microorganisms, including
bacteria, viruses, fungi, and protozoans [18]. 9(S)-HOTrE enhances reverse cholesterol
transport (RCT) by increasing the apoA-I transcription in human hepatocellular carcinoma
(HepG2) cells [18]. Cimifoetiside A is the active ingredient in Cimicifuga spp., which is used
to relieve diarrhea in TCM [20]. Gymnemic acid XII possesses a higher binding affinity to
PPARγ, a promising drug target for diabetes [21]. Quercetin 7,4′-di-O-β-D-glucoside is the
active ingredient in Delonix elata, which is used to relieve flatulence and as a purgative in
Saudi Arabia [22]. Furthermore, as a therapeutic agent, phenethylamine acts as an appetite
suppressant [23].
For the ‘Female-Specific’ category, we have validated five compounds. D-myo-inositol
1,2,5,6-tetrakisphosphate inhibits fibroma. This process can also block chloride channels
resulting in epithelial calcium activation [24]. Delphin has been reported to inhibit
inflammation in some gynecological infections [25]. Butine is an active ingredient of the
TCM ingredient Albizia glaberrima, and (R)-4-hydroxy-1-methyl-L-proline is an active
ingredient of Aglaia andamanica. Additionally, malvidin is used as a medicinal component
in Jamu.
For the category ‘The Heart and Blood Vessels’, we found four validated compounds.
Out of them, kaempferol 3-O-[α-L-rhamnopyranosyl(1→2)-β-D-galactopyranosyl]-7-O-α-
L-rhamnopyranoside is a candidate agent for the treatment of cardiovascular diseases [26].
Succinic acid is an active component that is applied in Jamu. Linalyl acetate prevents
hypertension-related ischemic injury and can prevent the production of ROS [28].
In the case of ‘Male-Specific Diseases’, there were seven validated compounds. Ac-
cording to the Simpson similarity, Obtusifoliol resembles Euphadienol, which has anti-
inflammatory effects [29]. Methyl 4-hydroxy cinnamate, ∆6-protoilludene, and 3-O-Acetylo-
leanolic acid are active against prostate cancer [30]. Butiin demonstrates the growth inhibi-
tion of Gram-positive and Gram-negative bacteria that cause male-specific infections [32].
Gibberellin A12 is implicated in the treatment of male infertility [33]. The ∆-6-protoilludene
is a precursor for the synthesis of both melleolides and armillyl orsellinates, whose cyto-
toxicity reflects their ability to induce apoptosis [34]. In addition, erythrodiol is an active
ingredient from the herb, Rhododendron ferrugineum, which is used in TCM.
For the category ‘Muscle and Bone’, four compounds were validated. Among them,
14-deoxo-3-O-propionyl-5,15-di-O-acetyl-7-O-benzoylmyrsinol-14beta-nicotinoate
shows similarities with perfluorooctyl iodide. These metabolites are useful as
organocatalysts through the activation of substrates with halogen bonds.
Euphorbiaproliferin I resembles cesium and Euphorbiaproliferin G is similar to moli001259.
Structural similarity is measured based on Simpson’s similarity. Furthermore,
Euphorbiaproliferin D can be isolated from a TCM ingredient, namely Euphorbia prolifera,
which is used in TCM to cure various diseases.
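For reference, the Simpson similarity (also known as the overlap coefficient) between two sets is the size of their intersection divided by the size of the smaller set. The sketch below applies it to sets of structural features; how the compounds were converted into feature sets in this study is not described here, so the feature labels are purely hypothetical.

```python
def simpson_similarity(a, b):
    """Overlap coefficient: |A intersect B| / min(|A|, |B|); 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical substructure features of two compounds.
compound_1 = {"f1", "f2", "f3", "f4"}
compound_2 = {"f2", "f3", "f9"}
similarity = simpson_similarity(compound_1, compound_2)
```

Because the denominator is the smaller set, a compound whose features are entirely contained in those of a larger compound scores 1.0, which makes the measure tolerant of size differences between molecules.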
Corresponding to the disease category ‘The Nervous System’, the validated compounds
are pterostilbene, Trapain, and cyanidin 3-O-(6-O-acetyl-β-D-glucoside). The antioxidant
activity of pterostilbene has been implicated in the modulation of neurological disease [35].
Trapain is a promising agent for the treatment of Alzheimer’s disease as an inhibitor of cholinesterase
and β-site amyloid precursor protein-cleaving enzyme 1 [36]. Finally, cyanidin
3-O-(6-O-acetyl-β-D-glucoside) has been verified to have a neuroprotective effect [36].
In the case of ‘Respiratory Diseases’, 6-epi-guttiferone J, 2(3H)-Furanone, and
2-(3,4-dihydroxyphenyl)-ethyl-O-β-D-glucopyranoside were validated. Based on the Simpson
similarity, 6-epi-guttiferone J resembles (0.902) sesquiterpene lactone, a moderate
antinociceptive agent. In addition, 2(3H)-Furanone is reported to show anticancer and
DNA-damaging activities in A549 lung cancer cells [38,39]. Furthermore,
2-(3,4-dihydroxyphenyl)-ethyl-O-β-D-glucopyranoside is a component of the TCM herb
Cornus mas/alba L., which is applied in practice as an anti-inflammatory and antibacterial drug.
For the category ‘Skin and Connective Tissue’, Taxifolin 3′-glucoside, Oleanolic acid,
Oleandrin, Himaphenolone, Coniferyl aldehyde, and Cedrin were the validated metabolites.
Taxifolin 3′-glucoside is effective for preventing the production of inflammatory
cytokines and reducing atopic dermatitis [40]. Oleanolic acid can inhibit skin tumor
promotion [41]. Oleandrin is shown to induce the apoptosis of malignant cells in melanoma
cell lines [42]. Himaphenolone is the active ingredient of the herb, Cedrus deodara (Roxb.)
Loud, which can be used for the treatment of carbuncle sores, eczema, traumatic bleeding,
burns, and scalds. Coniferyl aldehyde is similar to the drug Nalco L., and Cedrin
resembles dihydroquercetin.
In terms of the ‘Urinary System’ category, we have validated Glyoxylic acid, Biochanin
A, pyruvic acid, oxalic acid, Soyasaponin I, 2-(methyldithio)pyridine-N-oxide, Liquir-
itigenin, Garbanzol, and Medicagol. Glyoxylic acid and oxalic acid are involved in the
formation of kidney stones [43,44]. Pyruvic acid can prevent oxalate urolithiasis in mice [45].
Soyasaponin I inhibited kidney enlargement and cyst growth in a murine model of polycys-
tic kidney disease [46]. Then, 2-(methyldithio)pyridine-N-oxide and Garbanzol were both
shown to inhibit renal neoplasm [48,49]. Lastly, Liquiritigenin, Biochanin A, and Medicagol
are effective components used in Jamu [50].
4. Discussion
We tried our best to collect as many metabolites as possible for each Unani plant from
various resources. Medicinal metabolites are of more importance to researchers and usually
they are the first identified for various plants. Therefore, we assumed that the currently
available plant–metabolite relation could produce good results up to a certain extent.
The approach adopted in the current work can be considered as a top-down approach
because we started with a global set of Unani formulas in terms of plants, and then
we moved down to the metabolite level and utilized state-of-the-art machine learning
techniques to identify significant compounds. Hence, the approach is also a computational
approach. The results we obtained are promising, showing the strength and usefulness
of computational approaches in drug discovery. Our input data correspond to versatile
types of diseases. In this work, we considered disease classes at an upper hierarchy, and
under each class, there were diseases with some differences. Interestingly, our results also
show compounds corresponding to different types of diseases under each category. This
has been possible by investigating and identifying significant compounds within formulas
showing bias to specific disease classes/categories using efficient algorithms. Therefore,
these are the results of the systems-level investigation.
Another thing that is interesting to discuss is the other compounds (not validated)
extracted from the best model of this study. The validation results show around 43% of
compounds are directly or indirectly related to the therapeutic group of diseases. The
remaining 69 compounds are potential candidates for further research, for example, in
the fields of biochemistry, pharmacy, medicine, and so on. Last, the simple binary data to
represent metabolites have performed well in this study. However, other approaches can
be explored to improve the results.
5. Conclusions
A prediction of the therapeutic usage of the Unani formulas based on their constituent
metabolites using the deep neural network showed the highest accuracy compared to other
algorithms, such as the random forest and support vector machine. The best prediction
accuracies corresponding to DNN, KNN, XGBoost, RF, and SVM were 87.4%, 63.2%, 39.3%,
37.9%, and 38.6%, respectively. The results of this prediction indicate that the DNN
performed much better compared to other algorithms. In this work, two datasets were
prepared using filtering techniques, namely, dataset 1 and dataset 2. In the case of the
DNN, the best accuracy was obtained from dataset 1, while RF and SVM obtained the best
accuracy from dataset 2. In general, the filtering process improves prediction accuracy, but
our results were mainly influenced by the type of classifier algorithms.
Based on the best classification model, we extracted important metabolites by making
use of the DNN variable importance. Corresponding to the nine therapeutic uses of the Unani
formula, we extracted 118 essential metabolites, 49 of which were validated using the
following methods: searching in supporting health-related journals/articles, searching the
same metabolites in Jamu or TCM, and searching metabolites with a similar structure and
activity in the PubChem database.
For future work, we need to consider increasing the number of Unani
formulas; by doing this, the number of plants and metabolites will increase as well.
We will seek more sources of plant–metabolite relations, such as open-source
databases, books, and journals, so that our dataset is closer to actual conditions and
also acceptable in industry. The authors also recommend using artificially generated
data in testing to support and strengthen the reported model accuracy.
Supplementary Materials: The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/life13020439/s1, Table S1: List of important metabolites for
each disease class extracted from the best prediction model using the variable importance of the
deep neural network.
Author Contributions: Conceptualization, S.H.W., A.K.N., S.K. and M.A.-U.-A.; methodology,
A.K.N., S.H.W., S.K., M.A.-U.-A., N.O., I.B. and M.H.; dataset preparation, S.H.W., A.K.N. and
M.A.-U.-A.; machine learning implementation, S.H.W., A.K.N. and M.A.-U.-A.; validation, P.G.,
A.K.N. and M.A.-U.-A., writing—original draft preparation, A.K.N. and M.A.-U.-A.; writing—review
and editing, S.H.W., A.K.N. and M.A.-U.-A.; supervision, M.H., N.O., S.K. and M.A.-U.-A. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Ministry of Education, Culture, Sports, Science, and
Technology of Japan (20K12043) and NAIST Big Data Project and was partially supported by the
National Bioscience Database Center in Japan.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available on request from the corresponding authors.
Acknowledgments: The authors would like to thank the Ministry of Education Science and
Technology Japan, which has financially supported the authors to continue the study in Japan.
Conflicts of Interest: The authors declare no conflict of interest.
Life 2023, 13, 439 14 of 15
References
1. Ekor, M. The growing use of herbal medicines: Issues relating to adverse reactions and challenges in monitoring safety. Front.
Pharmacol. 2014, 4, 177. [CrossRef]
2. Wijaya, S.H.; Tanaka, Y.; Altaf-Ul-Amin, M.; Morita, A.H.; Afendi, F.M.; Batubara, I.; Kanaya, S. Utilization of KNApSAcK family
databases for developing herbal medicine systems. J. Comput. Aided Chem. 2006, 17, 1–7. [CrossRef]
3. Hossain, S.F.; Wijaya, S.H.; Huang, M.; Batubara, I.; Kanaya, S.; Farhad, M.A.U.A. Prediction of Plant-Disease Relations Based on
Unani Formulas by Network Analysis. In Proceedings of the 2018 IEEE 18th International Conference on Bioinformatics and
Bioengineering (BIBE), Taicung, Taiwain, 29–31 October 2018.
4. Itrat, M. Methods of health promotion and disease prevention in Unani medicine. J. Educ. Health Promot. 2020, 9, 168. [CrossRef]
5. Husain, A.; Sofi, G.D.; Tajuddin, T.; Dang, R.; Kumar, N. Unani system of medicine-introduction and challenges. Med. J. Islam.
World Acad. Sci. 2010, 18, 27–30.
6. Rani, M.; Nayak, R.; Vyas, O.P. An ontology-based adaptive personalized e-learning system, assisted by software agents on cloud
storage. Knowl.-Based Syst. 2015, 90, 33–48. [CrossRef]
7. Cortes, C.; Mohri, M.; Rostamizadeh, A. Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res. 2012,
13, 795–828.
8. Kang, M.J.; Kang, J.W. Intrusion detection system using deep neural network for in-vehicle network security. PLoS ONE 2016,
11, e0155781. [CrossRef] [PubMed]
9. Patel, H.; Thakkar, A.; Pandya, M.; Makwana, K. Neural network with deep learning architectures. J. Inf. Optim. Sci. 2018, 39,
31–38. [CrossRef]
10. Nasution, A.K.; Wijaya, S.H.; Gao, P.; Islam, R.M.; Huang, M.; Ono, N.; Altaf-Ul-Amin, M. Prediction of Potential Natural
Antibiotics Plants Based on Jamu Formula Using Random Forest Classifier. Antibiotics 2022, 11, 1199. [CrossRef] [PubMed]
11. Wijaya, S.H.; Batubara, I.; Nishioka, T.; Altaf-Ul-Amin, M.; Kanaya, S. Metabolomic studies of Indonesian jamu medicines:
Prediction of jamu efficacy and identification of important metabolites. Mol. Inform. 2017, 36, 1700050. [CrossRef]
12. Jackins, V.; Vimal, S.; Kaliappan, M.; Lee, M.Y. AI-based smart prediction of clinical disease using random forest classifier and
Naive Bayes. J. Supercomput. 2021, 77, 5198–5219. [CrossRef]
13. Gunn, S.R. Support vector machines for classification and regression. ISIS Tech. Rep. 1998, 1, 5–16.
14. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
[CrossRef]
15. Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for
deep learning. arXiv 2018, arXiv:1811.03378.
16. Larrosa, M.; González-Sarrías, A.; García-Conesa, M.T.; Tomás-Barberán, F.A.; Espín, J.C. Urolithins; ellagic acid-derived
metabolites produced by human colonic microflora; exhibit estrogenic and antiestrogenic activities. J. Agric. Food Chem. 2006, 54,
1611–1620. [CrossRef]
17. Ren, Y.; Shen, L.; Zhang, D.W.; Dai, S.J. Two new sesquiterpenoids from Solanum lyratum with cytotoxic activities. Chem. Pharm.
Bull. 2009, 57, 408–410. [CrossRef]
18. Pan, R.L.; Chen, D.H.; Si, J.Y.; Zhao, X.H.; Li, Z.; Cao, L. Immunosuppressive effects of new cyclolanostane triterpene diglycosides
from the aerial part of Cimicifuga foetida. Arch. Pharmacal Res. 2009, 32, 185–190. [CrossRef]
19. van der Krieken, S.E.; Popeijus, H.E.; Bendik, I.; Böhlendorf, B.; Konings, M.C.; Tayyeb, J.; Plat, J. Large-Scale Screening of Natural
Products Transactivating Peroxisome Proliferator-Activated Receptor α Identifies 9S-Hydroxy-10E; 12Z; 15Z-Octadecatrienoic
Acid and Cymarin as Potential Compounds Capable of Increasing Apolipoprotein A-I Transcription in Human Liver Cells. Lipids
2018, 53, 1021–1030. [PubMed]
20. Sanders, B.; Lankenau, S.E.; Bloom, J.J.; Hathazi, D. “Research chemicals”: Tryptamine and phenethylamine use among high-risk
youth. Subst. Use Misuse 2008, 43, 389–402. [CrossRef] [PubMed]
21. Tiwari, P.; Sharma, P.; Khan, F.; Singh Sangwan, N.; Nath Mishra, B.; Singh Sangwan, R. Structure activity relationship studies of
gymnemic acid analogues for antidiabetic activity targeting PPARγ. Curr. Comput.-Aided Drug Des. 2015, 11, 57–71. [CrossRef]
22. Al-Taweel, A.M.; Abdel-Kader, M.S.; Fawzy, G.A.; Perveen, S.; Maher, H.M.; Al-Zoman, N.Z.; AI-Showiman, H. Isolation of
flavonoids from Delonix data and determination of its rutin content using capillary electrophoresis. Pak. J. Pharm. Sci. 2015, 28,
1897–1903.
23. Pawar, R.S.; Grundel, E. Overview of regulation of dietary supplements in the USA and issues of adulteration with phenethy-
lamines (PEAs). Drug Test. Anal. 2017, 9, 500–517. [CrossRef] [PubMed]
24. Mattingly, R.R.; Stephens, L.R.; Irvine, R.F.; Garrison, J.C. Effects of transformation with the v-src oncogene on inositol phosphate
metabolism in rat-1 fibroblasts. D-myo-inositol 1; 4; 5; 6-tetrakisphosphate is increased in v-src-transformed rat-1 fibroblasts and
can be synthesized from D-myo-inositol 1; 3; 4-trisphosphate in cytosolic extracts. J. Biol. Chem. 1991, 266, 15144–15153.
25. Abdin, M.; Hamed, Y.S.; Akhtar, H.M.S.; Chen, D.; Chen, G.; Wan, P.; Zeng, X. Antioxidant and anti-inflammatory activities of
target anthocyanins di-glucosides isolated from Syzygium cumini pulp by high speed counter-current chromatography. J. Food
Biochem. 2020, 44, 1050–1062. [CrossRef]
26. Oh, S.M.; Kim, Y.P.; Chung, K.H. Biphasic effects of kaempferol on the estrogenicity in human breast cancer cells. Arch. Pharmacal
Res. 2006, 29, 354–362. [CrossRef]
Life 2023, 13, 439 15 of 15
27. Xiao, J.; Sun, G.B.; Sun, B.; Wu, Y.; He, L.; Wang, X.; Sun, X.B. Kaempferol protects against doxorubicin-induced cardiotoxicity
in vivo and in vitro. Toxicology 2012, 292, 53–62. [CrossRef] [PubMed]
28. Hsieh, Y.S.; Kwon, S.; Lee, H.S.; Seol, G.H. Linalyl acetate prevents hypertension-related ischemic injury. PLoS ONE 2018,
13, e0198082. [CrossRef]
29. Wang, S.; Guan, X.; Zhong, X.; Yang, Z.; Huang, W.; Jia, B.; Cui, T. Simultaneous determination of cucurbitacin IIa and cucurbitacin
IIb of Hemsleya amabilis by HPLC–MS/MS and their pharmacokinetic study in normal and indomethacin-induced rats. Biomed.
Chromatogr. 2016, 30, 1632–1640. [CrossRef] [PubMed]
30. Acquaviva, R.; Di Giacomo, C.; Sorrenti, V.; Galvano, F.; Santangelo, R.; Cardile, V.; Vanella, L. Antiproliferative effect of
oleuropein in prostate cell lines. Int. J. Oncol. 2012, 41, 31–38.
31. Acharya, N.; Acharya, S.; Shah, U.; Shah, R.; Hingorani, L. A comprehensive analysis on Symplocos racemosa Roxb.: Traditional
uses; botany; phytochemistry and pharmacological activities. J. Ethnopharmacol. 2016, 181, 236–251. [CrossRef]
32. Kulikova, V.; Morozova, E.; Rodionov, A.; Koval, V.; Anufrieva, N.; Revtovich, S.; Demidkina, T. Non-stereoselective decomposi-
tion of (±)-S-alk (en) yl-l-cysteine sulfoxides to antibacterial thiosulfinates catalyzed by C115H mutant methionine γ-lyase from
Citrobacter freundii. Biochimie 2018, 151, 42–44. [CrossRef]
33. Sakata, T.; Oda, S.; Tsunaga, Y.; Shomura, H.; Kawagishi-Kobayashi, M.; Aya, K.; Higashitani, A. Reduction of gibberellin by low
temperature disrupts pollen development in rice. Plant Physiol. 2014, 164, 2011–2019. [CrossRef] [PubMed]
34. Engels, B.; Heinig, U.; McElroy, C.; Meusinger, R.; Grothe, T.; Stadler, M.; Jennewein, S. Isolation of a gene cluster from Armillaria
gallica for the synthesis of armillyl orsellinate–type sesquiterpenoids. Appl. Microbiol. Biotechnol. 2020, 105, 211–224. [CrossRef]
[PubMed]
35. McCormack, D.; McFadden, D. A review of pterostilbene antioxidant activity and disease modification. Oxidative Med. Cell.
Longev. 2013, 2013, 575482. [CrossRef] [PubMed]
36. Bhakta, H.K.; Park, C.H.; Yokozawa, T.; Tanaka, T.; Jung, H.A.; Choi, J.S. Potential anti-cholinesterase and β-site amyloid precursor
protein cleaving enzyme 1 inhibitory activities of cornuside and gallotannins from Cornus officinalis fruits. Arch. Pharmacal Res.
2017, 40, 836–853. [CrossRef] [PubMed]
37. Zhang, J.; Wu, J.; Liu, F.; Tong, L.; Chen, Z.; Chen, J.; Huang, C. Neuroprotective effects of anthocyanins and its major component
cyanidin-3-O-glucoside (C3G) in the central nervous system: An outlined review. Eur. J. Pharmacol. 2019, 858, 172500. [CrossRef]
[PubMed]
38. Calderón-Montano, J.M.; Burgos-Morón, E.; Orta, M.L.; Pastor, N.; Austin, C.A.; Mateos, S.; López-Lázaro, M. Alpha; beta-
unsaturated lactones 2-furanone and 2-pyrone induce cellular DNA damage; formation of topoisomerase I-and II-DNA complexes
and cancer cell death. Toxicol. Lett. 2013, 222, 64–71. [CrossRef]
39. Xin, X.Q.; Chen, Y.; Zhang, H.; Li, Y.; Yang, M.H.; Kong, L.Y. Cytotoxic seco-cytochalasins from an endophytic Aspergillus sp.
harbored in Pinellia ternata tubers. Fitoterapia 2019, 132, 53–59. [CrossRef] [PubMed]
40. Ahn, J.Y.; Choi, S.E.; Jeong, M.S.; Park, K.H.; Moon, N.J.; Joo, S.S.; Seo, S.J. Effect of taxifolin glycoside on atopic dermatitis-like
skin lesions in NC/Nga mice. Phytother. Res. 2010, 24, 1071–1077. [CrossRef]
41. Cho, J.; Tremmel, L.; Rho, O.; Camelio, A.M.; Siegel, D.; Slaga, T.J.; DiGiovanni, J. Evaluation of pentacyclic triterpenes found
in Perilla frutescens for inhibition of skin tumor promotion by 12-O-tetradecanoylphorbol-13-acetate. Oncotarget 2015, 6, 39292.
[CrossRef]
42. Lin, Y.; Dubinsky, W.P.; Ho, D.H.; Felix, E.; Newman, R.A. Determinants of human and mouse melanoma cell sensitivities to
oleandrin. J. Exp. Ther. Oncol. 2008, 7, 195–205.
43. Umekawa, T.; Yamate, T.; Amasaki, N.; Kohri, K.; Kurita, T. Osteopontin mRNA in the kidney on an experimental rat model of
renal stone formation without renal failure. Urol. Int. 1995, 55, 6–10. [CrossRef] [PubMed]
44. Kohri, K.E.A.; Nomura, S.; Kitamura, Y.; Nagata, T.; Yoshioka, K.; Iguchi, M.; Sinohara, H. Structure and expression of the mRNA
encoding urinary stone protein (osteopontin). J. Biol. Chem. 1993, 268, 15180–15184. [CrossRef] [PubMed]
45. Robitaille, L.; Mamer, O.A.; Miller, W.H., Jr.; Levine, M.; Assouline, S.; Melnychuk, D.; Hoffer, L.J. Oxalic acid excretion after
intravenous ascorbic acid administration. Metabolism 2009, 58, 263–269. [CrossRef] [PubMed]
46. Kropp, H.; Sundelof, J.G.; Hajdu, R.; Kahan, F.M. Metabolism of thienamycin and related carbapenem antibiotics by the renal
dipeptidase; dehydropeptidase-I. Antimicrob. Agents Chemother. 1983, 22, 62–70. [CrossRef]
47. Philbrick, D.J.; Bureau, D.P.; Collins, F.W.; Holub, B.J. Evidence that soyasaponin Bb retards disease progression in a murine
model of polycystic kidney disease. Kidney Int. 2003, 63, 1230–1239. [CrossRef] [PubMed]
48. O’Donnell, G.; Poeschl, R.; Zimhony, O.; Gunaratnam, M.; Moreira, J.B.; Neidle, S.; Gibbons, S. Bioactive pyridine-N-oxide
disulfides from Allium stipitatum. J. Nat. Prod. 2009, 72, 360–365. [CrossRef]
49. Stahlhut, S.G.; Siedler, S.; Malla, S.; Harrison, S.J.; Maury, J.; Neves, A.R.; Forster, J. Assembly of a novel biosynthetic pathway for
production of the plant flavonoid fisetin in Escherichia coli. Metab. Eng. 2015, 31, 84–93. [CrossRef] [PubMed]
50. IJAH Analytics. Available online: http://ijah.apps.cs.ipb.ac.id/#/home (accessed on 13 December 2022).