ML and Biochemistry


Chemico-Biological Interactions 358 (2022) 109888


Chemico-Biological Interactions
journal homepage: www.elsevier.com/locate/chembioint

Artificial intelligence approaches to the biochemistry of oxidative stress: Current state of the art
Igor Pantic a, b, c, *, Jovana Paunovic d, Snezana Pejic e, Dunja Drakulic e, Ana Todorovic e,
Sanja Stankovic f, g, Danijela Vucevic d, Jelena Cumic h, Tatjana Radosavljevic d
a University of Belgrade, Faculty of Medicine, Institute of Medical Physiology, Laboratory for Cellular Physiology, Visegradska 26/II, RS-11129, Belgrade, Serbia
b University of Haifa, 199 Abba Hushi Blvd, Mount Carmel, Haifa, IL, 3498838, Israel
c Ben-Gurion University of the Negev, Faculty of Health Sciences, Department of Physiology and Cell Biology, 84105 Be’er Sheva, Israel
d University of Belgrade, Faculty of Medicine, Institute of Pathological Physiology, Dr Subotica 9, RS-11129, Belgrade, Serbia
e University of Belgrade, Vinca Institute of Nuclear Sciences, Department of Molecular Biology and Endocrinology, Mike Petrovica Alasa 12-14, RS-11351, Belgrade, Serbia
f University Clinical Centre of Serbia, Centre for Medical Biochemistry, Visegradska 26, RS-11000, Belgrade, Serbia
g University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovica 69, RS-34000, Kragujevac, Serbia
h University of Belgrade, Faculty of Medicine, University Clinical Centre of Serbia, Dr. Koste Todorovića 8, RS-11129, Belgrade, Serbia

ARTICLE INFO

Keywords: Reactive oxygen species; Machine learning; Oxidative damage; Toxicity; Signal analysis

ABSTRACT

Artificial intelligence (AI) and machine learning models are today frequently used for classification and prediction of various biochemical processes and phenomena. In recent years, numerous research efforts have focused on developing such models for assessment, categorization, and prediction of oxidative stress. Supervised machine learning can successfully automate the evaluation and quantification of oxidative damage in biological samples, as well as extract useful data from the abundance of experimental results. In this concise review, we cover the possible applications of neural networks, decision trees and regression analysis as three common strategies in machine learning. We also review recent works on the various weaknesses and limitations of artificial intelligence in biochemistry and related scientific areas. Finally, we discuss innovative ways in which AI could in the future contribute to the automation of oxidative stress measurement and the diagnosis of diseases associated with oxidative damage.

* Corresponding author. University of Belgrade, Faculty of Medicine, Institute of Medical Physiology, Visegradska 26/II, RS, 11129, Belgrade, Serbia.
E-mail addresses: [email protected], [email protected] (I. Pantic).
URL: http://www.igorpantic.com (I. Pantic).

https://doi.org/10.1016/j.cbi.2022.109888
Received 31 January 2022; Received in revised form 4 March 2022; Accepted 9 March 2022
Available online 13 March 2022
0009-2797/© 2022 Elsevier B.V. All rights reserved.

1. Introduction

There are numerous definitions of artificial intelligence (AI), and probably none fully explains this complex phenomenon, which is today the subject of numerous research efforts not only in information technology but in many other scientific disciplines as well. Artificial intelligence represents the ability of machines to perceive a problem and find an appropriate solution by applying techniques that resemble human cognitive functions. Using artificial intelligence methods, the machine learns how to effectively predict the value of a variable, classify a group of data or perform some other complex task [1,2]. There are numerous techniques of machine learning, but when it comes to the application of artificial intelligence in biomedicine, they are usually divided into two major groups: supervised learning and unsupervised learning. During supervised learning, the machine is presented with a series of examples in which, for one or more inputs, it is provided with the correct solution (output). Through repetition, the appropriate algorithm is optimized until the machine can give the correct output for each new entry with a certain level of accuracy. Unlike supervised learning, in unsupervised learning there is only input information, which the machine organizes according to a pattern. This classification of data into categories or groups, or logical structuring of data, helps to draw appropriate conclusions about a function or a rule [2–4].

In recent years, there have been many attempts to apply machine learning techniques and artificial intelligence in the fields of cell biology, biochemistry and molecular medicine [5,6]. The need for automation in these areas has led to the development of numerous
models of machine learning that are able to recognize and classify cell and tissue damage, as well as to predict various biochemical processes and physiological mechanisms. Most of these models have not yet become part of contemporary research and clinical practice, and it is assumed that this will take a long time to achieve. However, a large number of authors believe that artificial intelligence has a bright future when it comes to certain methodological approaches in toxicology [7,8].

Recently, many new and innovative machine learning models have been developed and implemented for prediction and classification of oxidative stress in cells and tissues. Oxidative stress usually refers to the generation of reactive oxygen species such as superoxide, peroxides, hydroxyl radical or singlet oxygen [9–11]. Changes in oxidative status in tissues and cell cultures have been reported as a possible result of exposure to toxic chemical agents. Also, oxidative stress may be a contributing factor to numerous diseases and conditions in internal medicine, neurology and oncology. For example, some neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease may in certain conditions be associated with the generation of reactive oxygen species [12]. Certain substances with antioxidant properties are thought to be beneficial for human health and may contribute to the prevention of a large number of chronic noncommunicable diseases. The process of physiological aging is also sometimes linked to cellular and DNA damage associated with oxidative stress [13].

This concise narrative review focuses on the development and use of contemporary AI and machine learning models for detection, prediction, and classification of oxidative stress. We cover the applications of neural networks, decision trees and regression analysis as three common strategies in machine learning. We also discuss innovative ways in which AI could in the future contribute to the automation of oxidative stress measurement and the diagnosis of diseases associated with oxidative damage.

2. Biomarkers of oxidative stress that can be used as input data for machine learning

Today, there are many methods for assessment of oxidative stress in biological samples. Direct measurement of reactive oxygen species is possible, although relatively rare when compared to indirect approaches. Superoxide anion radical can, for example, be directly determined using electron paramagnetic resonance (EPR) spectroscopy and so-called “spin trapping”. Spin trapping with 5,5-dimethyl-1-pyrroline-N-oxide (DMPO) is a common way to monitor superoxide levels in a cell population in in vitro conditions, and sometimes it can additionally be used for ex vivo assessments [14]. Apart from superoxide, this nitrone spin trap approach can be used for quantification of hydroxyl radical. An alternative way of spin trapping for determination of superoxide and hydroxyl radical levels is the use of 5-(diethoxyphosphoryl)-5-methyl-1-pyrroline-N-oxide (DEPMPO) for in vitro measurements. Another common way to directly measure superoxide is fluorescence analysis of hydrocyanines. Less frequently we see studies done with spin probes based on cyclic hydroxylamines or other hydroxylamines (for superoxide anion), or strategies involving PBN (α-phenyl-N-t-butylnitrone) or POBN [α-(4-pyridyl-1-oxide)-N-tert-butylnitrone] (superoxide anion measurements, in vivo experiments). Also, in some cases, acetylated cytochrome c reduction or fluorescence analysis of hydroethidine (dihydroethidium) might be used. Intracellular hydrogen peroxide can be directly quantified with the HyPer fluorescent sensing system, while extracellularly, the Amplex Red assay is today successfully used [14–16].

Free radicals in biological samples are sometimes short-lived, so their direct measurement can be difficult and impractical. Therefore, a number of indirect methods exist and are frequently used in biomedical research. These include detection of various biomarkers such as products of lipid peroxidation, oxidized DNA, oxidized proteins and others. Lipid peroxidation is associated with the oxidative damage of cell membranes and is relatively easy to determine. One of the most important products of this process is malondialdehyde (MDA), which can be detected with the thiobarbituric acid reactive substances (TBARS) assay. Even relatively small ROS generation in pathological conditions may in some circumstances result in high levels of MDA. Malondialdehyde is also a relatively sensitive indicator of ROS concentrations in tissue homogenates such as liver. Another, less common way to evaluate the consequences of lipid peroxidation is to measure 4-hydroxynonenal (4-HNE or HNE) by using high-performance liquid chromatography and 2,4-dinitrophenylhydrazine and 1,3-cyclohexanedione probes. Finally, prostaglandin-like compounds such as isoprostanes can be determined to achieve the same effect [14,17].

Of all DNA damage markers of oxidative stress, 8-hydroxydeoxyguanosine (8-OHdG) is probably the most frequently used in biological research. This is essentially a DNA base modification that occurs when guanine is exposed to reactive oxygen species (mainly hydroxyl radical). Today various assays exist for 8-OHdG determination in biological samples, including ones based on ELISA kits. DNA damage that takes place due to exposure to various toxic environmental factors is sometimes assessed using this methodological approach [18].

The third set of techniques for indirect measurement of oxidative stress refers to the quantification of protein oxidation and nitration. These techniques rely on the fact that ROS and ROS-associated compounds can induce structural modifications in proteins such as the formation of protein carbonyl groups, advanced oxidation protein products (AOPP) or advanced glycation end products [19]. Protein carbonyl content is probably the most commonly measured biomarker, and there is a variety of commercially available assays and kits designed for this purpose. These include methods based on Western blotting, ELISA or relatively simple spectrophotometric analysis [19].

Finally, it should be noted that oxidative stress can also be indirectly assessed by quantifying compounds that contribute to antioxidant defense [14,20–22]. These may include enzymes such as superoxide dismutase, catalase or glutathione peroxidase. Non-enzymatic antioxidant defenses consisting of glutathione, lipoic acid, transferrin, and vitamins C and E may also be evaluated. For example, a common methodological approach for establishing oxidative status in tissue homogenates would be to measure MDA as an indicator of lipid peroxidation in combination with glutathione, superoxide dismutase and catalase.

The above-mentioned biomarkers are just a few in a wide spectrum of oxidative stress indicators that can today be determined in cells, tissues or other biological samples. Most, if not all, of these indicators can be used as inputs for training and testing contemporary machine learning models for prediction or classification of oxidative damage (Table 2). In certain experimental conditions, a single marker may be sensitive enough to distinguish between damaged and intact specimens; frequently, however, this is not the case, and a variety of parameters need to be measured. Artificial intelligence and machine learning strategies may offer a fast and affordable route to increasing the sensitivity and specificity of these parameters.

3. Artificial neural networks and deep learning

An artificial neural network (ANN) is basically a composition of nodes connected in a way that may partially resemble the biological organization of brain synapses. The nodes are referred to as artificial neurons and are often clustered in layers. The first layer of neurons is the input layer and the last is the output layer [23–25]. A neural network may have a large number of so-called hidden layers between the input and the output layer. The neurons and their connections (synapses) are associated with so-called “weights” which influence the probability of signal transmission. The weights are modified during the learning process, in which the network is exposed to a series of examples of inputs and (the correct) outputs. Generally, the objective of learning here is to minimize the observed errors by weight adjustment. During the network training, a cost function is defined in order to evaluate the reduction of the error


rate [23–25].

Several types of ANNs are today widely used in biological research, and they can all theoretically be used for prediction and classification of oxidative stress (Table 1). Probably the simplest ANN is the perceptron, a feedforward architecture based on binary McCulloch–Pitts neurons. Perceptrons can be single-layer (SLP) or multilayer (MLP), and multilayer ones usually use the backpropagation learning technique. Apart from perceptrons, more complex deep learning approaches may also be applied in oxidative stress research. Typical examples of these complex deep ANNs are recurrent neural networks (RNNs) and convolutional neural networks (CNNs). While MLPs usually use flattened vectors as inputs, CNNs use tensors and are often more suitable for complicated image classification. Furthermore, in MLPs, layers are often fully connected, while in CNNs they are sparsely connected [26–28].

One of the earliest works to specifically apply neural networks to the prediction of oxidative status in humans is the study by de la Villehuchet et al. (2009). The authors mainly used MLPs (“feedforward neural networks”) with different numbers of hidden neurons. For prediction of glutathione concentrations, a whole range of candidate variables were considered, including oxidized DNA, oxidized LDL, vitamins C and E, protein thiol, Cu/Zn ratio and selenium. The study also predicted other important indicators of oxidative stress such as oxidized LDL and the ratio 8-OH-dG/creatinine. The value of the study lies in the fact that it used training data from a large and versatile database of patients suffering from a variety of inflammatory and other disorders [29].

In 2021, a set of different machine learning methods was proposed for estimating the antioxidant action of polyphenols, plant micronutrients that are part of various over-the-counter supplements [30]. As input data, various molecular descriptors associated with in vivo antioxidant activity were considered, including factors related to metabolism (e.g. cytochrome P metabolism or organic anion transporting polypeptide-mediated hepatic uptake) and hydrogen atom transfer (e.g. bond dissociation energy or ionization potential). Among the proposed outputs were data/index values based on lipid peroxidation (e.g. F2-isoprostanes), oxidative damage to proteins (e.g. protein carbonyl levels) and oxidative damage to nucleic acids (e.g. 8-hydroxydeoxyguanosine). Although artificial neural networks and deep learning were discussed as possible approaches, that article covers a wide range of other AI-based models such as Bayesian probabilistic learning and support vector machines [30].

Another example of successful application of feedforward artificial neural networks is the work of Liu et al. (2021), where the model was used to classify heart valve tissue samples previously treated with chemical agents that induce oxidative stress. Decellularized porcine aortic heart valves were treated with hydrogen peroxide and iron(III) chloride (both potent inducers of oxidative damage), and ROS generation was evaluated using Fourier transform infrared spectroscopy and nitroblue tetrazolium staining. In this case, input data consisted of vector-normalized absorbance intensities for a specific spectral range of an infrared spectrum [31]. The output of the network was the classification probability. The model showed relatively high classification accuracy and could be used as a basis for future development of AI-based sensing systems in this area of medical research.

Table 1
Machine learning models that are potentially applicable in oxidative stress research.

Machine learning model | Type of machine learning | Description/Use
Linear regression | Supervised, based on linear relationship between variables | Can be used for prediction of the value of a variable (e.g. a parameter of oxidative stress) based on the previously calculated line of best fit.
Logistic regression | Supervised | Often used with categorical dependent variables (binomial logistic regression).
Support vector machine (SVM) | Supervised, often based on a non-probabilistic binary linear classifier. | Often used for construction of a hyperplane maximizing the distance between two sets of data points (functional margin).
Random forest (RF) | Supervised, based on decision trees. | Classification and prediction of data based on specific rules and decisions. Creation of a number of uncorrelated decision trees (forest).
Multilayer perceptron (MLP) | Supervised – neural network, flattened vectors as inputs. | Prediction and classification of data. Neurons and their connections (synapses) are associated with so-called “weights” which influence the probability of signal transmission. The weights are modified during the learning process in which the network is exposed to a series of examples of inputs and (the correct) outputs.
Convolutional neural networks (CNNs) | Supervised – neural network, tensors as inputs. | Somewhat similar to MLP but often more advanced and suitable for complex image classification.
Principal component analysis (PCA) | Unsupervised – a type of dimensionality reduction | Linear transformation of data in order to create different types of data components (principal components). The outcome depends on the number of data dimensions.

Table 2
Some important biomarkers of oxidative stress that can be used as input data for machine learning.

Marker | Type | Possible way of measuring
Superoxide anion radical | ROS | Spin trapping with 5,5-dimethyl-1-pyrroline-N-oxide (DMPO); fluorescence analysis of hydrocyanines
Hydroxyl radical | ROS | Nitrone spin trap approach
Malondialdehyde | Lipid peroxidation product | Thiobarbituric acid reactive substances (TBARS) method
4-hydroxynonenal (4-HNE or HNE) | Lipid peroxidation product | High-performance liquid chromatography and 2,4-dinitrophenylhydrazine and 1,3-cyclohexanedione probes
8-hydroxydeoxyguanosine (8-OHdG) | DNA damage marker of oxidative stress | ELISA
Protein carbonyl content | Marker of protein oxidation | Western blotting, ELISA, spectrophotometric analysis
Superoxide dismutase, catalase or glutathione peroxidase | Antioxidant enzymes | Western blotting, activity gels, activity assays
Glutathione, lipoic acid, transferrin, vitamins C and E | Non-enzymatic antioxidant defenses | Various biochemical methods

4. Decision trees and random forests

Decision tree learning represents a strategy of predictive modelling in data science where, through a series of decisions, a conclusion is drawn based on initial observations [32,33]. Target variables are represented by leaves, while the branches refer to the outcomes of the attribute tests performed in internal nodes. This creates an algorithm resembling a flowchart that may be linearized into decision rules. Rules are somewhat similar to “if statements” in programming languages (for example: if (A > B): outcome 1 else: outcome 2). Decision trees are often


grouped into two main categories that differ on whether the outcome is a class or a real number: classification trees and regression trees. In biomedicine, contemporary statistical and data mining programs are usually able to perform a variety of tree analyses ranging from the classical classification and regression tree (CART) to Chi-square automatic interaction detection (CHAID) and QUEST (Quick, Unbiased, Efficient Statistical Tree). Random decision forests are composed of decision trees that are constructed during the learning process. In molecular biology, random forests are commonly used for classification of biological samples or other data types, with the class selected by most trees taken as the output. Random forests often have higher accuracy and discriminatory power in comparison to individual decision trees, although this is not always the case [8,34].

One of the possible applications of random forests in label-free discrimination and quantitative analysis of cytotoxicity due to oxidative stress is covered in the recent work of Zhang et al. (2020). Here the authors combined work on cell cultures (A549 cells) treated with toxic diesel exhaust particles, Raman spectroscopy imaging and a number of different machine learning methods (apart from random forests, support vector machine, k nearest neighbors, linear discriminant analysis and many others). The models were used for classification and evaluation of the effects of the toxic particles and antioxidants (resveratrol and mesobiliverdin IXα) on cell behavior [35]. Random forests proved to have relatively good classification accuracy, although their performance was not as good as that of some other models such as k nearest neighbors.

It may be possible that random forest models that use data on genes associated with oxidative stress can be utilized to design a diagnostic strategy for some of the most common diseases and disorders. This includes a diagnostic model of acute myocardial infarction as described by Yifan et al. (2021), where the authors used data representing the expression of different ferroptosis-related genes. Ferroptosis is a type of iron-dependent programmed cell death in which lipid peroxides accumulate as the result of the dysfunction of antioxidant defenses. In this study, the authors were able to create an RF model based on genes in circulating endothelial cells with strong diagnostic performance for infarction and areas under the ROC curve higher than 85% in the validation data set [36].

Random forests can be integrated with support vector machines to achieve good prediction results, at least those related to post-translational modification of proteins during oxidative stress. Such integration was achieved by Hasan et al. (2019), who used this strategy to predict cysteine S-nitrosylation, which is often related to antioxidative defense and redox-based cell signaling. The value of this study lies in the fact that experimental identification of this process demands significant material and other resources, while the machine learning methodology is inexpensive, fast and efficient [37].

Another important recent study on the application of the random forest model in oxidative stress research is the one by Ho Thanh Lam and associates (2020). Here the authors used a benchmark set of sequencing data to develop various models for identifying antioxidant proteins and selected among them based on performance [38]. Random forest is presented as the model with high accuracy and a relatively good balance between specificity and sensitivity during the identification of proteins. These data indicate that in certain conditions RF may be superior to more complex deep learning approaches.

Random forests as well as decision trees have been shown to be capable of predicting the activity of, as well as classifying, various drug molecules associated with some oxidative stress signaling pathways such as the Nrf2-antioxidant response element pathway [39]. One such study used activity information on a total of 10 486 molecules and compared ordinary decision trees, random forests, AdaBoost, a linear model and neural networks. When used for binary classification with oversampling, random forest had the largest area under the ROC curve (86%), and similar results were obtained for binary classification with undersampling.

5. Linear, logistic and other regression approaches

Linear and logistic regression machine learning models are probably among the simplest supervised ML approaches that can be effectively used for prediction and classification of biological phenomena. Linear regression essentially tries to predict a dependent variable (e.g. a parameter of ROS generation) based on the values of an independent variable (e.g. some other biochemical parameter), under the assumption of a linear relationship between the variables. Logistic regression is commonly used for binary classification problems, for example to predict whether a cell is damaged or intact using the available biochemical or other data. To create either a linear or a logistic regression ML model, training data with known inputs and outcomes need to be presented to the machine, and the model is later tested for classification accuracy, discriminatory power or another performance indicator.

One of the recent examples of the application of a linear regression ML model in oxidative stress research is the work by Shemshaki and associates [40]. Here the authors conducted a clinical study on infertile male patients and quantified various biochemical parameters in semen, including reactive oxygen species. A linear regression model was used to predict ROS from citric acid, fructose, BMI (body mass index), BMR (basal metabolic rate), sperm motility and sperm morphology. Linear regression in some circumstances showed relatively good performance, similar to support vector machine, and even better in comparison to artificial neural networks (although worse than random forests). The machine learning approach in this study helped reveal the potential connection between BMI and ROS generation [40].

In 2015, Lavender and associates used a logistic regression approach in combination with multifactor dimensionality reduction to evaluate the impact of oxidative stress response related genetic variants, antioxidants and prooxidants on prostate cancer risk and aggressiveness. The study included a relatively large sample of 2286 subjects. Using this sophisticated statistical analysis, numerous gene-gene and gene-environment interactions were investigated, which made it possible to identify the factors associated with oxidative stress that contribute most to prostate carcinogenesis [41]. Another, more recent example of using a binomial logistic regression model is the study by Zhang et al. (2020), where the model was able to distinguish active chemicals inducing oxidative stress from inactive compounds. Data from a total of 638 active and 3632 inactive chemicals were used to develop this ML strategy, which resulted in a prediction accuracy of almost 70% and a satisfactory area under the receiver operating characteristic curve [42].

A specific machine learning approach based on the least absolute shrinkage and selection operator (LASSO)/elastic net regression algorithm was developed by Kim et al. (2021) and used for quantitative determination of oxidative stress risks in healthy human subjects. In this work, measurement of malondialdehyde (an indicator of lipid peroxidation) was used for evaluation of oxidative stress, although many other biochemical parameters were also used for the creation of the ML model. The model was applied for classification and differentiation between individuals with pathological oxidative status and healthy controls. The authors reported an outstanding discriminatory power of the model, with an area under the ROC curve of 0.949 and excellent sensitivity and specificity for selected confidence intervals [43].

Simple regression models such as the ones relying on linear and binomial logistic regression are often underestimated and avoided by some data scientists who prefer more complex neural network designs. In some circumstances this is a mistake, since regression models can be equally sensitive or even more sensitive in the prediction of biological phenomena. For example, in a recent study done by our laboratory, we compared binomial logistic regression, decision trees and multilayer perceptrons for classification of damaged and intact cells after ethanol-induced toxicity [44]. We found that all three models have similar classification accuracy (tree-based learning algorithm 80.6%, multilayer perceptron 83.3% and binomial logistic regression algorithm 83.2%).
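
As a minimal sketch of the binomial logistic regression workflow described in this section (training on samples with known outcomes, then testing for classification accuracy and discriminatory power), the following Python example fits a logistic model by gradient descent and reports accuracy and the area under the ROC curve. The data are synthetic, the two feature names (MDA, GSH) are illustrative assumptions, and all helper functions are hypothetical; none of this reproduces the data or settings of the cited studies.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_samples(n, seed=0):
    """Hypothetical data: (MDA, GSH) in standardized units; label 1 = damaged."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mda = rng.gauss(1.0 if label else -1.0, 1.0)   # assumed higher when damaged
        gsh = rng.gauss(-1.0 if label else 1.0, 1.0)   # assumed lower when damaged
        samples.append(((mda, gsh), label))
    return samples


def fit_logistic(samples, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(samples[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(model, samples, threshold=0.5):
    hits = sum((predict_proba(model, x) >= threshold) == bool(y) for x, y in samples)
    return hits / len(samples)


def roc_auc(model, samples):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [predict_proba(model, x) for x, y in samples if y == 1]
    neg = [predict_proba(model, x) for x, y in samples if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train = make_synthetic_samples(200, seed=1)   # samples with known outcomes
test = make_synthetic_samples(100, seed=2)    # held-out samples for testing
model = fit_logistic(train)
test_accuracy = accuracy(model, test)
test_auc = roc_auc(model, test)
print(f"accuracy={test_accuracy:.3f}  AUC={test_auc:.3f}")
```

Evaluating on a held-out set rather than on the training data mirrors the train-then-test procedure described above; with real measurements, the synthetic generator would be replaced by the measured biomarker values.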


6. Future applications of AI and machine learning in oxidative stress research

There are reasons to believe that AI and machine learning will in the future become an integral part of the statistical analysis of data obtained from experiments aimed at quantifying the oxidative status of cells and tissues. Even today, there are numerous software platforms that facilitate the training and testing of even the most complex neural networks, and it is probable that in the future, even scientists with very limited IT skills will be able to use them. In other words, the development of automated ML software that does not require programming knowledge (e.g. of the Python language) from its user will inevitably make AI techniques more accessible to a large number of laboratories and research teams. Also, it is estimated that the number of freely available online databases and data repositories containing results from experiments on oxidative stress will substantially increase. This will enable a variety of data scientists and other IT specialists to participate in this area of science even though they were not initially involved in the original research.

In addition to the increased accessibility of AI-related software, we also expect further inclusion of various signal analysis methods that are not directly related to machine learning but whose results can easily be used as input data for model training and testing. Today there is a variety of such methods that can extract useful information from both one-dimensional and two-dimensional signals (e.g. digital micrographs of cells). Examples include fractal analysis, gray-level co-occurrence matrix analysis and wavelet analysis based on the discrete wavelet transform (Fig. 1). These techniques can detect and identify discrete patterns in biological signals and can be very useful in providing an abundance of quantitative features on which an ML model can be developed, as demonstrated recently by our laboratory [44].

Despite numerous advantages, artificial intelligence and machine learning approaches in molecular biology and biochemistry have many limitations [45,46]. When constructing ML models, one should always consider issues related to data quality. A small sample size could severely limit the development of a useful neural network, but even a large sample with biased data can result in a neural network that will not work properly or will produce wrong output. In reality, oxidative stress and its consequences depend on many factors, and usually there are numerous confounding variables that are difficult to include in a learning process. Also, it should be stressed that even when dealing with high-quality input data, the model design (e.g. neural network architecture) itself can be biased to predict a specific outcome that is in line with the initial hypothesis.

Another important limitation of machine learning is the fact that many ML models are today considered to be so-called “black box” algorithms. In other words, the nature of data modeling in many ML approaches reduces the user's ability to interpret the underlying mechanisms of the model. This is especially true for neural networks and decision trees, where it is often very difficult, if not impossible, to adequately understand and explain the inner workings of the produced model and the reasons for the given prediction/classification. Therefore, in the future, quality assurance and other forms of quality testing of AI models, as well as repetition of computations on different samples, will be essential before a model can be implemented in standard research and clinical practice.

Finally, it should be noted that, to the present date, all machine learning approaches used in molecular biology (and in other sciences as well) are related to the development of so-called “narrow artificial intelligence”. This type of AI is basically intended to perform a single task

Fig. 1. The proposed use of an MLP model for prediction of oxidative damage based on contemporary image analysis methods such as textural GLCM (gray-level co-occurrence matrix) and wavelet analysis. For details on the methods, the reader is referred to the recent publication (Davidovic et al., 2022).
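As a rough illustration of the kind of descriptors named in Fig. 1, the sketch below computes three common GLCM texture features (contrast, homogeneity, angular second moment) and a single-level Haar wavelet detail energy for a toy image, and collects them into one feature vector of the sort that could be passed to an MLP. It is not the authors' implementation: the 4x4 image, the horizontal offset of one pixel, the symmetric normalization, the choice of the Haar wavelet, and all function names are assumptions made for this example, and fractal analysis is left out for brevity.

```python
from collections import Counter
from math import sqrt

# Toy 4x4 "micrograph" already quantized to 4 gray levels (assumed data).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

def glcm(image, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix as {(i, j): probability}."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count each pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def glcm_features(p):
    """Contrast, homogeneity and angular second moment (ASM) of a GLCM."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    homogeneity = sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
    asm = sum(prob ** 2 for prob in p.values())
    return {"contrast": contrast, "homogeneity": homogeneity, "asm": asm}

def haar_dwt_1d(signal):
    """Single-level orthonormal Haar DWT of an even-length sequence."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]
    detail = [(a - b) / sqrt(2) for a, b in pairs]
    return approx, detail

def wavelet_detail_energy(image):
    """Sum of squared Haar detail coefficients, computed row by row."""
    energy = 0.0
    for row in image:
        _, detail = haar_dwt_1d(row)
        energy += sum(d * d for d in detail)
    return energy

def feature_vector(image):
    """Texture and wavelet descriptors gathered as one model input row."""
    f = glcm_features(glcm(image))
    return [f["contrast"], f["homogeneity"], f["asm"],
            wavelet_detail_energy(image)]

print(feature_vector(IMAGE))
```

In practice one such vector would be computed per cell or region of interest, and the resulting table, together with the known oxidative damage labels, would form the training data for a supervised model.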


or to achieve a specific goal (e.g. classification or prediction) and is limited solely to this task type. For example, a neural network that is designed to predict the existence of a disease based on oxidative stress indicators cannot provide us with answers on the molecular mechanism of the disease. A more general artificial intelligence is not possible using today’s technology and there is little evidence that it could be developed in the near future. Therefore, unless a substantial technological breakthrough happens in the future, we can expect that most AI and ML approaches in molecular biology will essentially be focused on advanced applied statistical analysis without significant resemblance to true human cognitive processes.

7. Conclusion

Artificial intelligence and machine learning models are today frequently used for classification and prediction of various biochemical processes and phenomena. In recent years, numerous research efforts have been focused on developing such models for assessment, categorization, and prediction of oxidative stress. The obvious examples would include supervised learning models such as the ones based on logistic regression, decision trees and neural networks. Supervised machine learning can successfully automate the process of evaluation and quantification of oxidative damage in biological samples, as well as extract useful data from the abundance of experimental results. Future studies will reveal the true potential, but also the limitations of AI in this and other areas of biochemistry.

Funding

This research was supported by the Science Fund of the Republic of Serbia, grant 7739645 “Automated sensing system based on fractal, textural and wavelet computational methods for detection of low-level cellular damage”, SensoFracTW.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by the Science Fund of the Republic of Serbia, grant 7739645 “Automated sensing system based on fractal, textural and wavelet computational methods for detection of low-level cellular damage”, SensoFracTW. Prof. Igor Pantic is also grateful to NSF Center for Advanced Knowledge Enablement, Miami, FL, USA (I. Pantic is an external research associate).

References

[1] I. Dimitriadis, N. Zaninovic, A.C. Badiola, C.L. Bormann, Artificial intelligence in the embryology laboratory: a review, S1472-6483, Reprod. Biomed. Online (2021), 00557-5.
[2] E. Crigger, K. Reinbold, C. Hanson, A. Kao, K. Blake, M. Irons, Trustworthy augmented intelligence in health care, J. Med. Syst. 46 (2022) 12.
[3] S.M. Shah, R.A. Khan, S. Arif, U. Sajid, Artificial intelligence for breast cancer analysis: trends & directions, Comput. Biol. Med. 142 (2022) 105221.
[4] A.A.H. de Hond, A.M. Leeuwenberg, L. Hooft, I.M.J. Kant, S.W.J. Nijman, H.J.A. van Os, J.J. Aardoom, T.P.A. Debray, E. Schuit, M. van Smeden, J.B. Reitsma, E.W. Steyerberg, N.H. Chavannes, K.G.M. Moons, Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review, NPJ Digit. Med. 5 (2022) 2.
[5] J.X. Wang, Y. Wang, Towards machine learning in molecular biology, Math. Biosci. Eng. 17 (2020) 2822–2824.
[6] I.L. Hudson, Data integration using advances in machine learning in drug discovery and molecular biology, Methods Mol. Biol. 2190 (2021) 167–184.
[7] A.V. Singh, A. Romeo, K. Scott, S. Wagener, L. Leibrock, P. Laux, A. Luch, P. Kerkar, S. Balakrishnan, S.P. Dakua, B.W. Park, Emerging technologies for in vitro inhalation toxicology, Adv. Healthc. Mater. 10 (2021), e2100633.
[8] L.M. Davidovic, D. Laketic, J. Cumic, E. Jordanova, I. Pantic, Application of artificial intelligence for detection of chemico-biological interactions associated with oxidative stress and DNA damage, Chem. Biol. Interact. 345 (2021) 109533.
[9] R. Bardallo, A. Panisello-Rosello, S. Sanchez-Nuno, N. Alva, J. Rosello-Catafau, T. Carbonell, Nrf2 and oxidative stress in liver ischemia/reperfusion injury, FEBS J (2021), https://doi.org/10.1111/febs.16336. In press.
[10] P. Kowalczyk, D. Sulejczak, P. Kleczkowska, I. Bukowska-Osko, M. Kucia, M. Popiel, E. Wietrak, K. Kramkowski, K. Wrzosek, K. Kaczynska, Mitochondrial oxidative stress-A causative factor and therapeutic target in many diseases, Int. J. Mol. Sci. 22 (2021) 13384.
[11] N. Schottlender, I. Gottfried, U. Ashery, Hyperbaric oxygen treatment: effects on mitochondrial function and oxidative stress, Biomolecules 11 (2021) 1827.
[12] A. Jurcau, Insights into the pathogenesis of neurodegenerative diseases: focus on mitochondrial dysfunction and oxidative stress, Int. J. Mol. Sci. 22 (2021) 11847.
[13] I. Liguori, G. Russo, F. Curcio, G. Bulli, L. Aran, D. Della-Morte, G. Gargiulo, G. Testa, F. Cacciatore, D. Bonaduce, P. Abete, Oxidative stress, aging, and diseases, Clin. Interv. Aging 13 (2018) 757–772.
[14] K.K. Griendling, R.M. Touyz, J.L. Zweier, S. Dikalov, W. Chilian, Y.R. Chen, D.G. Harrison, A. Bhatnagar, S. American Heart Association Council on Basic Cardiovascular, Measurement of reactive oxygen species, reactive nitrogen species, and redox-dependent signaling in the cardiovascular system: a scientific statement from the American Heart Association, Circ. Res. 119 (2016) e39–75.
[15] I. Pantic, J. Cumic, S.R. Skodric, S. Dugalic, C. Brodski, Oxidopamine and oxidative stress: recent advances in experimental physiology and pharmacology, Chem. Biol. Interact. 336 (2021) 109380.
[16] J. Paunovic, D. Vucevic, T. Radosavljevic, S. Mandic-Rajcevic, I. Pantic, Iron-based nanoparticles and their potential toxicity: focus on oxidative stress and apoptosis, Chem. Biol. Interact. 316 (2020) 108935.
[17] V. Jakovljevic, M. Zlatkovic, D. Cubrilo, I. Pantic, D.M. Djuric, The effects of progressive exercise on cardiovascular function in elite athletes: focus on oxidative stress, Acta Physiol. Hung. 98 (2011) 51–58.
[18] C. Fenga, S. Gangemi, M. Teodoro, V. Rapisarda, K. Golokhvast, A.O. Docea, A.M. Tsatsakis, C. Costa, 8-Hydroxydeoxyguanosine as a biomarker of oxidative DNA damage in workers exposed to low-dose benzene, Toxicol. Rep. 4 (2017) 291–295.
[19] R. Olowe, S. Sandouka, A. Saadi, T. Shekh-Ahmad, Approaches for reactive oxygen species and oxidative stress quantification in epilepsy, Antioxidants 9 (2020) 990.
[20] J. Arauz, E. Ramos-Tovar, P. Muriel, Redox state and methods to evaluate oxidative stress in liver damage: from bench to bedside, Ann. Hepatol. 15 (2016) 160–173.
[21] M. Gaggini, L. Sabatino, C. Vassalle, Conventional and innovative methods to assess oxidative stress biomarkers in the clinical cardiovascular setting, Biotechniques 68 (2020) 223–231.
[22] M. Katerji, M. Filippova, P. Duerksen-Hughes, Approaches and methods to measure oxidative stress in clinical samples: research applications in the cancer field, 2019, Oxid. Med. Cell. Longev. (2019) 1279250.
[23] A.N. Ramesh, C. Kambhampati, J.R. Monson, P.J. Drew, Artificial intelligence in medicine, Ann. R. Coll. Surg. Engl. 86 (2004) 334–338.
[24] K.H. Yu, A.L. Beam, I.S. Kohane, Artificial intelligence in healthcare, Nat. Biomed. Eng. 2 (2018) 719–731.
[25] D.G. Cheirdaris, Artificial neural networks in computer-aided drug design: an overview of recent advances, Adv. Exp. Med. Biol. 1194 (2020) 115–125.
[26] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[27] S. Min, B. Lee, S. Yoon, Deep learning in bioinformatics, Briefings Bioinf. 18 (2017) 851–869.
[28] S.M. Anwar, M. Majid, A. Qayyum, M. Awais, M. Alnowami, M.K. Khan, Medical image analysis using convolutional neural networks: a review, J. Med. Syst. 42 (2018) 226.
[29] A.M. de la Villehuchet, M. Brack, G. Dreyfus, Y. Oussar, D. Bonnefont-Rousselot, M.J. Chapman, A. Kontush, A machine-learning approach to the prediction of oxidative stress in chronic inflammatory disease, Redox Rep 14 (2009) 23–33.
[30] S.O. Idowu, A.A. Fatokun, Artificial intelligence (AI) to the rescue: deploying machine learning to bridge the biorelevance gap in antioxidant assays, SLAS Technol. 26 (2021) 16–25.
[31] D. Liu, S. Caliskan, B. Rashidfarokhi, H. Oldenhof, K. Jung, H. Sieme, A. Hilfiker, W.F. Wolkers, Fourier transform infrared spectroscopy coupled with machine learning classification for identification of oxidative damage in freeze-dried heart valves, Sci. Rep. 11 (2021) 12299.
[32] P. Zane, H. Gieschen, E. Kersten, N. Mathias, C. Ollier, P. Johansson, A. Van den Bergh, S. Van Hemelryck, A. Reichel, A. Rotgeri, K. Schafer, A. Mullertz, P. Langguth, In vivo models and decision trees for formulation development in early drug development: a review of current practices and recommendations for biopharmaceutical development, Eur. J. Pharm. Biopharm. 142 (2019) 222–231.
[33] A. Sarica, A. Cerasa, A. Quattrone, Random forest algorithm for the classification of neuroimaging data in Alzheimer’s disease: a systematic review, Front. Aging Neurosci. 9 (2017) 329.
[34] J. Tan, Y. Gao, Z. Liang, W. Cao, M.J. Pomeroy, Y. Huo, L. Li, M.A. Barish, A.F. Abbasi, P.J. Pickhardt, 3D-GLCM CNN: a 3-dimensional gray-level Co-occurrence matrix-based CNN model for polyp classification via CT colonography, IEEE Trans. Med. Imag. 39 (2020) 2013–2024.
[35] W. Zhang, J.S. Rhodes, A. Garg, J.Y. Takemoto, X. Qi, S. Harihar, C.W. Tom Chang, K.R. Moon, A. Zhou, Label-free discrimination and quantitative analysis of oxidative stress induced cytotoxicity and potential protection of antioxidants using Raman micro-spectroscopy and machine learning, Anal. Chim. Acta 1128 (2020) 221–230.
[36] C. Yifan, S. Jianfeng, P. Jun, Development and validation of a random forest diagnostic model of acute myocardial infarction based on ferroptosis-related genes in circulating endothelial cells, Front Cardiovasc. Med. 8 (2021) 663509.


[37] M.M. Hasan, B. Manavalan, M.S. Khatun, H. Kurata, Prediction of S-nitrosylation sites by integrating support vector machines and random forest, Mol. Omics 15 (2019) 451–458.
[38] L. Ho Thanh Lam, N.H. Le, L. Van Tuan, H. Tran Ban, T. Nguyen Khanh Hung, N.T.K. Nguyen, L. Huu Dang, N.Q.K. Le, Machine learning model for identifying antioxidant proteins using features calculated from primary sequences, Biology 9 (2020) 325.
[39] N. Verma, H. Singh, D. Khanna, P.S. Rana, S.K. Bhadada, Classification of drug molecules for oxidative stress signalling pathway, IET Syst. Biol. 13 (2019) 243–250.
[40] G. Shemshaki, A.S.N. Murthy, S.S. Malini, Assessment and establishment of correlation between reactive oxidation species, citric acid, and fructose level in infertile male individuals: a machine-learning approach, J. Hum. Reprod. Sci. 14 (2021) 129–136.
[41] N. Lavender, D.W. Hein, G. Brock, C.R. Kidd, Evaluation of oxidative stress response related genetic variants, pro-oxidants, antioxidants and prostate cancer, AIMS Med. Sci. 2 (2015) 271–294.
[42] S. Zhang, W.A. Khan, L. Su, X. Zhang, C. Li, W. Qin, Y. Zhao, Predicting oxidative stress induced by organic chemicals by using quantitative Structure-Activity relationship methods, Ecotoxicol. Environ. Saf. 201 (2020) 110817.
[43] Y. Kim, Y. Kim, J. Hwang, T.J. van den Broek, B. Oh, J.Y. Kim, S. Wopereis, J. Bouwman, O. Kwon, A machine learning algorithm for quantitatively diagnosing oxidative stress risks in healthy adult individuals based on health space methodology: a proof-of-concept study using Korean cross-sectional cohort data, Antioxidants 10 (2021) 1132.
[44] L.M. Davidovic, J. Cumic, S. Dugalic, S. Vicentic, Z. Sevarac, G. Petroianu, P. Corridon, I. Pantic, Gray-level Co-occurrence matrix analysis for the detection of discrete, ethanol-induced, structural changes in cell nuclei: an artificial intelligence approach, Microsc. Microanal. 28 (2022) 265–271.
[45] E.J. Topol, High-performance medicine: the convergence of human and artificial intelligence, Nat. Med. 25 (2019) 44–56.
[46] V. Kaul, S. Enslin, S.A. Gross, History of artificial intelligence in medicine, Gastrointest. Endosc. 92 (2020) 807–812.
