Soumi Dutta
Saikat Gochhait
Information Retrieval
in Bioinformatics
A Practical Approach
Editors
Soumi Dutta
Institute of Engineering & Management
Kolkata, West Bengal, India

Saikat Gochhait
Symbiosis Institute of Digital and Telecom Management
Symbiosis International (Deemed University)
Pune, Maharashtra, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer
Nature Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the
Publisher, whether the whole or part of the material is concerned, specifically the rights
of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc.
in this publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore free for
general use.
The publisher, the authors, and the editors are safe to assume that the advice and informa-
tion in this book are believed to be true and accurate at the date of publication. Neither
the publisher nor the authors or the editors give a warranty, expressed or implied, with
respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
This Palgrave Macmillan imprint is published by the registered company Springer Nature
Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore
189721, Singapore
Contents
1 Bioinformatics Overviews
Ritu Pasrija
2 Artificial Intelligence in Biological Sciences: A Brief Overview
Uma Dutta, Nikhil Danny Babu, and Girish S. Setlur
3 A Review of Recent Advances in Translational Bioinformatics and Systems Biomedicine
Chittaranjan Baruah, Bhabesh Deka, and Saurov Mahanta
4 Application of Bioinformatics in Agricultural Pest Management: An Overview of the Evolving Technologies
Bhabesh Deka, Azariah Babu, and Uma Dutta
5 Application of Bioinformatics in Health Care and Medicine
P. Keerthana and Saikat Gochhait
6 Information Retrieval in Bioinformatics: State of the Art and Challenges
Sunita, Sunny Sharma, Vijay Rana, and Vivek Kumar
Index
Editors and Contributors
Dr. Soumi Dutta received her B.Tech in Information Technology and her M.Tech in Computer Science Engineering from WBUT, securing first position (gold medal) in the latter. She holds a Ph.D. from the Indian Institute of Engineering Science and Technology (IIEST), Shibpur.
Contributors
Dr. Azariah Babu completed his Ph.D. under the guidance of Prof. Dr. T. N. Ananthakrishnan and carried out postdoctoral research under the guidance of Professor (Mrs.) Silvia Dorn at the Swiss Federal Institute of Technology Zurich (ETH), Institute of Plant Sciences, Applied Entomology, Zurich, Switzerland. He has more than 28 years of research experience in basic and applied aspects of entomology. He has been
Chapter 7
Fig. 1 Steps in a classifier model to perform data classification
Fig. 2 Comparison architecture of existing and SVM–GWO
Fig. 3 Linear SVM classifier with a decision plane
Fig. 4 Flowchart of hybrid SVM–GWO classifier
Fig. 5 SVM–GWO accuracy comparison for conventional approaches
Fig. 6 SVM–GWO sensitivity comparison for conventional approaches
Fig. 7 SVM–GWO specificity comparison for conventional approaches
Fig. 8 SVM–GWO time period comparison for conventional approaches
List of Tables
Chapter 3
Table 1 Resources for translational bioinformatics that are open to the public
Chapter 5
Table 1 Bioinformatic tools may be classified as follows:
Chapter 7
Table 1 SVM–GWO accuracy comparison for conventional approaches
Table 2 SVM–GWO sensitivity comparison for conventional approaches
Table 3 SVM–GWO specificity comparison for conventional approaches
Table 4 SVM–GWO time period comparison for conventional approaches
CHAPTER 1
Bioinformatics Overviews
Ritu Pasrija
1 Background
In the 1970s, the Dutch theoretical biologist Paulien Hogeweg, along with Ben Hesper, coined the term bioinformatics. They were interested in accumulating information about biological systems. They observed that, alongside biochemistry and biophysics, bioinformatics deserved recognition as a research area in its own right, with the potential to become the ‘biology of the future’. This has proved true: over the last 50 years, bioinformatics has developed at a very fast pace. For a time, bioinformatics was viewed simply as the development of software tools to store, manipulate, and analyse biological information. Whilst this is indeed a significant application, the field has much more potential than that. Both ‘bioinformatics’ and ‘computational biology’ are instrumental in gathering enormous amounts of information across several branches of natural science, so it is important to understand the difference between the two terms. On the one hand, bioinformatics uses computer science,
R. Pasrija (✉)
Department of Biochemistry, Maharshi Dayanand University, Rohtak, India
e-mail: [email protected]
2 History
During the 1950s, neither DNA nor computers were important research tools, and investigations in biochemistry focused largely on mechanistic models of enzymes. Many scientists in fact thought that proteins were the carriers of genetic information: DNA seemed too simple for the role, whereas proteins show a large number of alternatives and great complexity.
Thus, the major turning point for bioinformatics was the recognition of DNA as the genetic material. The first evidence came from the experiments of Oswald Avery et al. (1944), who revealed that DNA, rather than protein, determines the characters of organisms. This group showed that pure DNA taken from a virulent strain of Streptococcus pneumoniae (S. pneumoniae), which forms smooth round colonies (named S), could bestow virulence on a non-virulent strain (rough colonies, R) (Avery et al., 1944). Subsequent work by Alfred Hershey and Martha Chase (in 1952) validated these findings by showing that it is the DNA of bacteriophages, not their protein, that enters infected bacterial cells and directs the production of new phage (Hershey & Chase, 1952). Later, in 1953, James Watson and Francis Crick proposed the double-helix structure of DNA, supported by the X-ray diffraction work of Rosalind Elsie Franklin (Franklin & Gosling, 1953; Watson & Crick, 1953). It then took a further 13 years to decipher the codons for the amino acids and 24 years to arrive at the first practical DNA sequencing techniques. Thus, 1970–1980 witnessed a paradigm shift from protein to DNA analysis. In 1970, Saul B. Needleman and
>AGTAAAGGAGAAGAACTTTTCACTGGAGTTGTGACAATTCTTGTTGAATTAGATGGTGAT
GTTAATGGTCACAAATTTTCTGTTAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAAC
TTACCCTTAAATTTATTTGTACTACTGGAAAACTACCTGTTCCCTGGCCAACACTTGTTAC
TACTTTGACTTATGGTGTTCAATGTTTTTCAAGATACCCAGATCACATGAAACGGCACGAC
TTTTTCAAGAGTGCAATGCCCGAAGGTTATGTACAAGAAAGAACTATTTTTTTCAAAGATG
ACGGTAACTACAAGACACGTGCTGAAGTTAAGTTTGAAGGTGATACCCTTGTTAATAGAAT
CGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTTGGACACAAATTGGAATAC
AACTATAACTCACACAATGTATACATTATGGCAGACAAACAAAAGAATGGAATCAAAGTTA
ACTTCAAAATTAGACACAACATTGAAGATGGAAGTGTTCAACTAGCAGACCATTATCAACA
AAATACTCCAATTGGCGATGGCCCTGTTCTTTTACCAGACAACCATTACCTGTCCACACAA
TCTGCTCTTTCTAAAGATCCCAACGAAAAGAGAGACCATATGGTGCTTCTTGAGTTTGTAA
CAGCTGCTGGTATTACACACGGTATGGATGAACTATACAAACACCATCACCATCACCATCA
CTAG
>AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGGTCTGTTCCAAGGGCCT
TTGCGTCAGGTGGGCTCAGGATTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAGG
GAGGACGTGGCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACC
TGGCCTTCAGCCTGCCTCAGCCCTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTG
TGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCC
TTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAA
CGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACT
GCCCATTGCTGCCCCTGGCCGCCCCCAGCCACCCCCTGCTCCTGGCGCTCCCACCCAGCATGG
GCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTTTAAAAAGAAGTT
CTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCTGAGGA
CGGTGTTGGCTTCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGC
CCCTCAAACAAATGCCCCGCAGCCCATTTCTCCACCCTCATTTGATGACCGCAGATTCAAGTG
TTTTGTTAAGTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGCCTCTGG
GCGAACACCCCATCACGCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGT
CGCCAGGCCTCACGGCAGCTCCATAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGG
GAGAAGTACTGGGATCACCTGTTCAGGCTCCCACTGTGACCTGCCCCGGGGCGGGGGAAGGAG
GTGG
GACATGTGGGCGTTGGGGCCTGTAGGTCCACACCCAGTGTGGGTGACCCTCCCTCTAACCTGG
GTCCAGCCCGGCTGGAGATGGGTGGGAGTGCGACCTAGGGCTGGCGGGCAGGCGGGCACTGTG
TCTCCCTGACTGTGTCCTCCTGTGTCCCTCTGCCTCGCCGCTGTTCCGGAACCTGCTCTGCGC
GGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGC
CCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCT
GCTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCC
GCCGCCTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAGC
>MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVT
TFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIEL
KGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPI
GDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
>MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
BLAST: The Basic Local Alignment Search Tool finds regions of local similarity between sequences. It was first described in 1990 by Stephen Altschul, David J. Lipman, and colleagues, in one of the most highly cited papers in the field, with more than 65,000 citations (Altschul et al., 1990). The program can compare both nucleotide sequences (BLASTn) and protein primary sequences (BLASTp), and it calculates the statistical significance of the matches it reports. It is freely available online at ‘https://blast.ncbi.nlm.nih.gov/Blast.cgi’. It runs on various operating systems, including UNIX, Linux, Mac, and MS Windows, and is written in C and C++. BLAST can be used to infer functional and evolutionary relationships between linear arrangements of bases or amino acids, and to help identify members of gene families. For example, the haemoglobin gene sequence in humans can be compared with that of the mouse. The input sequence is generally provided in FASTA or GenBank format, whereas the output formats include HTML, plain text, and XML. Besides the online service, the program is also available as a free download (BLAST+). Megablast and discontiguous megablast are other variants with separate applications.
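The idea of a "region of local similarity" can be illustrated with a small dynamic-programming sketch. The code below is not BLAST, which relies on fast heuristics; it is the exact Smith-Waterman local alignment score, shown here only to make the concept concrete. The function name and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between strings a and b.

    Illustrative only: the scoring values are arbitrary assumptions,
    and BLAST itself uses a heuristic search rather than this full table.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share the short region "ACGT"
shared_region_score = local_alignment_score("TTACGTTT", "GGACGTGG")
```

With these scores, a higher value indicates a longer or cleaner shared region; two sequences with nothing in common score 0.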
Clustal W/X: This is an algorithm for multiple sequence alignment (MSA) of DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences: it computes the best matches for the chosen sequences and lines them up so that the resemblances and the disparities can be seen. The relationships can then be understood by viewing the cladograms and phylograms, both of which are part of the Clustal W algorithm.
4 Application of Bioinformatics
Genome Applications: These begin with DNA sequencing and genome assembly and extend to the annotation of genes, prediction of gene function (based on similarity to known genes), sequence analysis for comparative exploration, evolutionary studies, etc. Algorithms such as BLAST, Clustal W, and FASTA support the investigation and interpretation of sequences. This has benefitted proteomics, transcriptomics, and metabolomics studies. Similarly, functional genomics studies involve RNA-sequence alignment and differential expression analysis. Comparative genomics and computational evolutionary biology shed light on major events in evolution and divergence. Besides these, primer design, restriction enzyme map analysis, RNA folding, and dot plots are other genome-based applications.
In predicting protein structure: The RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank) provided the first open-access digital platform for researchers and is available at ‘https://www.rcsb.org/’. It supports retrieval of 3D-structure data for biological molecules, including proteins. Structure prediction typically involves retrieving the protein sequence of interest and then computationally establishing its similarity to sequences of known structure present in the PDB (homology modelling) (Berman et al., 2000).
Biomedical: In the biomedical field, bioinformatics tools have had a profound effect on the understanding of the genome and on molecular, personalised, and preventive medicine. Novel information on the molecular mechanism of an ailment makes it easier to treat and prevent the disease efficiently, and to investigate the genes directly associated with numerous diseases. Patients with the same ailment are generally given the same drug, but different people have different genotypes, so it is important to consider the variations, even subtle but significant ones such as single nucleotide polymorphisms (SNPs), that affect a patient’s response to a drug. It is no exaggeration to say that if the DNA profile of a patient is analysed, the medication can be made as successful and efficient as possible, especially in chemotherapy. For example, tamoxifen (commercially known as Nolvadex) is
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic
local alignment search tool. Journal of Molecular Biology, 215(3), 403–410.
https://doi.org/10.1016/S0022-2836(05)80360-2
Avery, O. T., MacLeod, C. M., & McCarty, M. (1944). Studies on the chemical
nature of the substance inducing transformation of pneumococcal types. The
Journal of Experimental Medicine, 79(2), 137–158.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H.,
Shindyalov, I. N., & Bourne, P. E. (2000). The Protein Data Bank. Nucleic
Acids Research, 28(1), 235–242. https://doi.org/10.1093/nar/28.1.235
Bianchi, L., & Liò, P. (2007). Forensic DNA and bioinformatics. Briefings in
Bioinformatics, 8(2), 117–128. https://doi.org/10.1093/bib/bbm006
Brugada, P., Smeets, J. L., Brugada, J., & Farré, J. (1990). Mechanism of action
of sotalol in supraventricular arrhythmias. Cardiovascular Drugs and Therapy,
4(Suppl 3), 619–623. https://doi.org/10.1007/BF00357040
Dong, X., & Zheng, W. (2008). A new structure-based QSAR method affords
both descriptive and predictive models for phosphodiesterase-4 inhibitors.
Current Chemical Genomics, 2, 29–39.
https://doi.org/10.2174/1875397300802010029
1 Introduction
Artificial intelligence (AI) is the term used to broadly describe intelli-
gence demonstrated by machines. In the natural world, humans and other
animals display intelligent behavior in the sense of navigating their envi-
ronments and solving problems to achieve an end goal like finding a
U. Dutta (✉)
Department of Zoology, Cotton University, Panbazaar, Guwahati, India
e-mail: [email protected]
N. D. Babu · G. S. Setlur
Department of Physics, IIT Guwahati, Guwahati, India
e-mail: [email protected]
G. S. Setlur
e-mail: [email protected]
G. S. Setlur
Mehta Family School of Data Science and Artificial Intelligence, IIT Guwahati,
Guwahati, India
function has the shape of a straight line. Any function that is not a simple
straight line when plotted is termed as a nonlinear function. It is neces-
sary to use nonlinear functions in the neural networks to get the higher
levels of abstraction needed for learning to take place.
The strength of each connection between the neurons in adjacent layers is quantified by assigning a number called a “weight” to the connection. Now consider a neuron in one of the hidden layers. The outputs of the neurons in the preceding layer are multiplied by their assigned weights and summed over all the neurons in that layer; an additional parameter called a “bias” is added to this weighted sum, and the result is fed as input to the activation function of the neuron under consideration. This is how information flows from layer to layer in the neural network, and the process is called a “feedforward” pass through the network. So far we have encountered four parameters that are important for constructing a neural network. We can think of these parameters as knobs that one can adjust to design a neural network of the desired size and configuration. In summary, they are:
Width: The number of neurons in each layer. This can vary from layer to
layer.
Depth: The number of hidden layers, i.e., the number of layers other than
the input and output layer.
Weights: The strengths of the connections between the neurons.
Bias: This is a parameter associated with each neuron. It is basically a
number that is added to the input of the neuron. It is also referred to as
an offset.
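The feedforward pass described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: the function names are hypothetical, the sigmoid is just one possible nonlinear activation function, and the weights and biases are set by hand rather than learned.

```python
import math


def sigmoid(z):
    """One common nonlinear activation function (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the previous layer's outputs, plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def feedforward(inputs, layers):
    """Pass inputs through each layer in turn.

    `layers` is a list of (weight_rows, biases) pairs; each row of weights
    belongs to one neuron, so len(weight_rows) is the width of that layer
    and len(layers) - 1 is the depth (number of hidden layers).
    """
    activations = inputs
    for weight_rows, biases in layers:
        activations = [
            neuron_output(activations, w, b) for w, b in zip(weight_rows, biases)
        ]
    return activations


# A tiny network: 2 inputs, one hidden layer of width 2, one output neuron.
network = [
    ([[0.5, -0.25], [0.1, 0.3]], [0.0, -0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.2]),                      # output layer
]
prediction = feedforward([1.0, 2.0], network)
```

Changing the number of rows in a layer changes its width, and adding more (weight_rows, biases) pairs changes the depth, which mirrors the "knobs" listed above.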
The other two methods are beyond the scope of this chapter. In supervised learning, the labeled dataset is split into a training dataset and a test dataset, the latter being used to benchmark the trained model. During the course of the training, the architecture of the neural network remains the same, i.e., the depth and width parameters of the network are not adjusted, but the weights and the biases (initially assigned randomly) keep changing until their optimum values are obtained. For this reason, the weights and biases are called the trainable parameters of the network, whereas settings fixed before training, such as the depth and width, are called hyperparameters. In simple words, what the neural network learns is the optimal values of all the weights and biases that allow it to make accurate predictions.
The concept of a cost function is needed to understand this process. It is a mathematical function of the set of weights (w) and biases (b), denoted by $C(w, b)$, and it is proportional to the square of the difference between the correct outputs/labels of the training data, $F(y_{\text{train}})$, and the output produced by the neural network, $F_{w,b}(y_{\text{train}})$. That is, $C(w, b) \propto \left(F_{w,b}(y_{\text{train}}) - F(y_{\text{train}})\right)^2$. The idea is to find the set of weights and biases for which the cost function $C(w, b)$ is minimum. The cost function is like a penalty for the NN, and the target is to achieve the minimum possible penalty.
This is accomplished using the “gradient descent” algorithm and backpropagation. The gradient is the derivative of the cost function with respect to the weight, $dC/dw$. If the cost increases with increasing weight, the gradient is positive; if the cost decreases with increasing weight, the gradient is negative. The model needs to know whether to increase or decrease the weights in order to minimize the cost function, and the negative of the gradient ($-dC/dw$) shows exactly this. Now the model knows in which direction to move the weights, but we must specify by what amount it must change them, and this is decided by the learning rate parameter $\eta$. The weights are updated according to the formula $w \leftarrow w - \eta \, dC/dw$ until the minimum of the cost function (where $dC/dw = 0$) is reached. We have covered only the very basics of gradient descent; in practice, several variants are employed, such as ‘stochastic gradient descent,’ ‘batch gradient descent,’ ‘mini batch gradient descent,’ etc. The situation is also more complex than what we have described, because the cost function can have multiple minima and the gradient descent procedure can get stuck around a local minimum, whereas the model should ideally reach the global minimum of the cost function in order to be successfully optimized. This issue can arise if the initially assigned weights are near a local minimum of the cost function.
The gradients themselves are computed by “backpropagation”. The cost function is calculated at the output layer, and this information is propagated backwards to the previous layers so that all the weights associated with the neuron connections, all the way up to the input layer, can be adjusted using the gradient descent formula. This reverse flow of information in the neural network is termed backpropagation. It is what makes gradient descent practical in a multilayer network, although it does not by itself guarantee that the global minimum is reached. The mathematical details of this procedure are beyond the scope of this chapter, and the interested reader can refer to any of the excellent introductory books and articles on machine learning (Alpaydin, 2020; Baştanlar & Ozuysal, 2014; Kubat, 2017).
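The gradient descent update can be made concrete with the simplest possible case: a single neuron with one weight, one bias, and no nonlinear activation, trained on a mean squared-error cost. This is a sketch under those assumptions; the toy data, the learning rate of 0.05, and the function names are all hypothetical and chosen only so that the example converges quickly.

```python
def cost(w, b, xs, ys):
    """Mean squared difference between predictions w*x + b and labels y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b, xs, ys):
    """Derivatives dC/dw and dC/db of the cost above."""
    n = len(xs)
    dC_dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    dC_db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dC_dw, dC_db


def gradient_descent(xs, ys, eta=0.05, steps=2000):
    """Repeatedly move w and b against the gradient, scaled by eta."""
    w, b = 0.0, 0.0  # arbitrary starting values
    for _ in range(steps):
        dC_dw, dC_db = gradients(w, b, xs, ys)
        w -= eta * dC_dw  # w <- w - eta * dC/dw
        b -= eta * dC_db  # b <- b - eta * dC/db
    return w, b


# Toy training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
```

Because this cost has a single minimum, the procedure recovers a weight close to 2 and a bias close to 1. In a multilayer network the same update is applied to every weight and bias, with the derivatives supplied by backpropagation.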
5 Limitations of AI
The field of artificial intelligence has come a long way since its inception
in the 1950s and now has become an indispensable part of technology.
But there is a caveat since we don’t yet understand completely the inner
workings of a deep learning model. There are several pitfalls and limita-
tions to AI that can easily be exploited to make it give completely wrong
predictions even for simple tasks. It seems that artificial intelligence is not
that intelligent after all and it should be used with caution with human
supervision. Blindly following the predictions of AI can be misleading in
some cases.
5.1 Overfitting
If a machine learning model gives correct predictions on training data with good confidence but performs sub-optimally on new data used for validation, then the model is overfitting the data. This happens when the model starts learning the noisy features in the training data in addition to the useful features. When there are more adjustable parameters (weights and biases) than necessary, the model tends to overfit the data and loses its predictive flexibility when supplied with new data. To avoid overfitting, the training process should be stopped just before the cost function evaluated on the validation data starts increasing after an initial decrease; if the training process is continued for longer than necessary, the model becomes highly specialized to the training data and will not generalize to unknown data. Also, selectively dropping some connections (weights) during training reduces the number of adjustable parameters and decreases the complexity of the model, which helps to avoid overfitting.
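The stopping rule described above is commonly called early stopping, and it can be sketched as follows. The function name, the `patience` parameter, and the list of cost values are hypothetical; in a real training run the costs would be measured on held-out validation data after each pass over the training set.

```python
def early_stopping_epoch(validation_costs, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training is considered finished once the validation cost has failed
    to improve for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_cost = float("inf")
    for epoch, current_cost in enumerate(validation_costs):
        if current_cost < best_cost:
            best_cost = current_cost
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # cost has been rising or flat for too long
    return best_epoch


# Cost falls for three epochs and then starts rising again (overfitting).
costs = [0.9, 0.7, 0.5, 0.6, 0.8, 0.4]
stop_at = early_stopping_epoch(costs, patience=2)
```

A small `patience` stops training sooner but may miss a later improvement, as the last value in the example list shows; a larger value tolerates temporary increases at the price of longer training.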
Underfitting occurs when the model is not optimized properly and
performs poorly with the training data. To avoid underfitting more
interest (Tramèr et al., 2020). This drastically reduces error rate due to
more sophisticated adversarial attacks.
6 Conclusions
In this chapter, we have given a brief overview of the impact of artifi-
cial intelligence (AI) in the biological sciences and bioinformatics (Varsha
et al., 2021). Using simple examples and basic terminology, we briefly
described the building blocks of AI and the steps that go into imple-
menting a successful model. Without burdening the reader with mathe-
matical detail we discussed what machine learning is and how the learning
process in a neural network works. We described how AI is being used to
push the frontiers in understanding the working of our brain and also how
it has become an indispensable tool in modern medical diagnosis. Some
of the most difficult scientific problems of the twentieth century like the
protein folding problem have become more tractable with cutting-edge
developments in AI in recent years. Exponential progress is being made in
this field day by day and it is becoming evident that artificial intelligence
together with human curiosity and innovation would be able to tackle the
biggest challenges that humankind is faced with.
References
Albelwi, S., & Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks. Entropy, 19(6). https://doi.org/10.3390/e19060242
Alpaydin, E. (2020). Introduction to machine learning. MIT Press.
Baştanlar, Y., & Ozuysal, M. (2014). Introduction to machine learning. In MiRNomics: MicroRNA biology and computational analysis (pp. 105–128).
Bonnen, T., Yamins, D. L., & Wagner, A. D. (2021). When the ventral visual stream is not enough: A deep learning account of medial temporal lobe involvement in perception. Neuron, 109(17), 2755–2766.e6. https://doi.org/10.1016/j.neuron.2021.06.018
Cohen, T., Weiler, M., Kicanaoglu, B., & Welling, M. (2019). Gauge equivariant convolutional networks and the icosahedral CNN. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97, pp. 1321–1330). PMLR. https://proceedings.mlr.press/v97/cohen19d.html
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Huang, Y., Xu, J., Zhou, Y., Tong, T., Zhuang, X., & ADNI. (2019). Diagnosis of Alzheimer’s disease via multi-modality 3d convolutional neural network. Frontiers in Neuroscience, 13, 509. https://doi.org/10.3389/fnins.2019.00509
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630–644.e16. https://doi.org/10.1016/j.neuron.2018.03.044
Kubat, M. (2017). An introduction to machine learning. Springer.
Sverrisson, F., Feydy, J., Correia, B. E., & Bronstein, M. M. (2021). Fast end-to-end learning on protein surfaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 15272–15281).
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2020). Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204.
Varsha, P. S., Akter, S., Kumar, A., Gochhait, S., & Patagundi, B. (2021). The impact of artificial intelligence on branding: A bibliometric analysis (1982–2019). Journal of Global Information Management (JGIM), 29(4), 221–246. https://doi.org/10.4018/JGIM.20210701.oa10
Wang, P. Y., Sun, Y., Axel, R., Abbott, L., & Yang, G. R. (2021). Evolving the olfactory system with machine learning. bioRxiv. https://doi.org/10.1101/2021.04.15.439917
Winkels, M., & Cohen, T. S. (2019). Pulmonary nodule detection in CT scans with equivariant CNNs. Medical Image Analysis, 55, 15–26. https://doi.org/10.1016/j.media.2019.03.010
Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M. C., DiCarlo, J. J., & Yamins, D. L. K. (2021). Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences, 118(3). https://doi.org/10.1073/pnas.2014196118
CHAPTER 3
1 Introduction
Translational research utilises scientific findings produced in the lab,
clinic, or field and turns them into novel therapies and medical care
methods that directly enhance human health. Translational research
aims to transfer fundamental scientific discoveries into application more
C. Baruah
Bioinformatics Laboratory, Postgraduate Department of Zoology, Darrang
College, Tezpur, India
B. Deka (✉)
North Bengal Regional R and D Centre, Tea Research Association, Jalpaiguri,
India
e-mail: [email protected]
S. Mahanta
National Institute of Electronics and Information Technology (NIELIT),
Guwahati, India
2 Translational Bioinformatics
Translational bioinformatics (TBI) is a new area that applies biological research to patient care and drug discovery. It develops and applies computational methods to analyse clinical and biological data in order to study disease heterogeneity. The search for disease gene(s) requires a thorough understanding of the complex network of biological mechanisms involved in disease progression. This chapter aims to outline the strategy for integrating biological and clinical data. It also explains the key datasets and techniques used in translational bioinformatics to treat illnesses.
Translational bioinformatics focuses on utilising current research to connect biological data with clinical informatics. It now spans the biological and healthcare industries, bridging the gap between bioinformatics and medical informatics. Translational bioinformatics has made several databases available to researchers. These databases are useful for physicians, biologists, clinical researchers, bioinformaticians, and healthcare researchers. They help biologists understand disease management and drug development techniques, which in turn helps them generate novel hypotheses. Gene variation, enzyme, and descriptive genomics databases are examples of translational bioinformatics databases.
3 Bioinformatics Interventions
in Translational Research
The National Institutes of Health (NIH) defines translational research as having two areas of translation. The first is the application of findings acquired during laboratory and preclinical research to the development of clinical trials and human studies. The second involves research aimed at boosting the adoption of best practices in the community. The cost-effectiveness of preventive and treatment methods is equally essential in translational research. Translational research shifts the emphasis from the scientist to the user. User involvement is increasingly important in the translational research paradigm, and users have an impact on the priorities of academics.
7 Systems Biomedicine
Systems biology is a new multidisciplinary field that combines biology, mathematics, computer science, physics, and engineering. Most biological systems are too complicated for even the most sophisticated computer models to capture all system characteristics. A useful model should nevertheless represent the system under investigation correctly and give trustworthy predictions. To do this, a certain degree of abstraction may be needed, focusing on the system behaviours of interest while ignoring other aspects. Systems biology does not study individual genes or proteins one at a time, as has been the practice for the last 30 years. Rather, it studies the interactions of all components of a biological system in action. Systems biomedicine is an emerging approach to biomedical research whose goal is to build formal algorithmic models that predict process outcomes from component inputs. Several important characteristics define the systems approach:
Table 1 (continued)
Resource | URL | Description
 | | on medication response
Phenotype Knowledgebase | http://phekb.org | Electronic phenotypic algorithms and their performance characteristics may be built, validated, and shared via an online collaborative repository
Pharmacogenomic Biomarkers in Drug Labels | http://www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm | Contains a list of FDA-approved medicines that include pharmacogenomic information on their labels
Clinical Pharmacogenetics Implementation Consortium (CPIC) | http://www.pharmgkb.org/page/cpic | Contains a list of the guidelines of CPIC for drug-gene interactions
NHGRI Catalog of GWAS studies | http://www.genome.gov/ | Curated list of phenotypes and key results of GWAS studies
Catalog of PheWAS results | http://phewascatalog.org | Contains the catalogue of EHR PheWAS results
Drug-Gene Interaction database | http://dgidb.genome.wustl.edu | Data from 13 sources is used to provide a search interface for drug-gene interactions
 | | connections between human variants and phenotypes, as well as supporting data
SHARPn | http://phenotypeportal.org | SHARPn developed a compendium of computable phenotypic algorithms
patient was never tested, and the number of affected embryos (if any) was
never revealed.
The ability to simultaneously investigate several genes or the entire
genome brings up new possibilities in genomic medicine. Patients are
challenging doctors about the applicability of genetic and genomic
medicine to their own care, since new technologies promise better diag-
noses and treatments. Others believe that incorporating genetics and
genomics into routine clinical practice will be difficult.
Thus, methods are required to handle and analyse such large, varied,
and complicated information efficiently. Big data analytics, the analysis of
large and complex datasets, is critical in handling enormous volumes of
healthcare data and enhancing patient care. It also has the potential
56 C. BARUAH ET AL.
8 Related Work
Translational bioinformatics, systems biomedicine, clinical informatics,
statistical genetics, and genomic medicine are all being called on to play an
increasingly important role in accelerating the translation of genome-scale
studies to hypothesis-driven biological modelling, effective treatment, and
tailored disease management or prevention. Over the last decade,
technological improvements in high-throughput sequencing have resulted in
a growing global capacity for generating nucleotide sequence data. The
1000 Genomes Project was created to compile comprehensive maps of genetic
variation in individuals from distinct populations (1000 Genomes
Project Consortium, 2015). To integrate genetic data with clinical
information, data from primary care, hospitals, outcomes, registries,
and social care records should first be gathered using controlled clinical
terminologies such as SNOMED Clinical Terms and the Human Pheno-
type Ontology (Köhler et al., 2017). The Global Alliance for Global
9 Conclusion
Translational bioinformatics now spans the biological and healthcare
domains, bridging the gap between bioinformatics and medical informatics.
It connects biological data with clinical informatics and draws on the
sequencing and analysis of an organism’s genetic code, from genes to
transcripts. Translational bioinformatics databases help biologists
learn about disease management and therapeutic development. Systems
biology can be used to study how a signalling pathway operates in a cell,
and a comprehensive view of a biological entity is created by
combining genomic, proteomic, and bioinformatics data. Bioinformatics
methods can be used to model specific human diseases or healthy states.
Translational genomics applies discoveries from the Human Genome Project
(HGP) to improve the diagnosis, prognosis, and therapy of complicated
disorders. These innovations have revolutionised both healthcare
and biomedical research. New tools and methodologies are needed to
turn massive databases into usable knowledge.
References
1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R. M.,
Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S.,
McVean, G. A., & Abecasis, G. R. (2015). A global reference for human
genetic variation. Nature, 526(7571), 68–74.
https://doi.org/10.1038/nature15393
Allmer, J. (2012). Existing bioinformatics tools for the quantitation of
post-translational modifications. Amino Acids, 42(1), 129–138.
https://doi.org/10.1007/s00726-010-0614-3
Altman, R. B. (2012). Translational bioinformatics: Linking the molecular world
to the clinical world. Clinical Pharmacology & Therapeutics, 91(6), 994–1000.
http://doi.wiley.com/10.1038/clpt.2012.49
Aronson, S. J., & Rehm, H. L. (2015). Building the foundation for genomics in
precision medicine. Nature, 526(7573), 336–342.
https://doi.org/10.1038/nature15816
Buccitelli, C., & Selbach, M. (2020). mRNAs, proteins and the emerging
principles of gene expression control. Nature Reviews Genetics, 21(10), 630–644.
https://doi.org/10.1038/s41576-020-0258-4
Burton, J. L., & Underwood, J. (2007). Clinical, educational, and
epidemiological value of autopsy. Lancet (London, England), 369(9571),
1471–1480. https://doi.org/10.1016/S0140-6736(07)60376-6
Little, J., & Hawken, S. (2010). On track? Using the human genome
epidemiology roadmap. Public Health Genomics, 13(4), 256–266.
https://doi.org/10.1159/000279627
Liu, Y., Beyer, A., & Aebersold, R. (2016). On the dependency of cellular
protein levels on mRNA abundance. Cell, 165(3), 535–550.
https://doi.org/10.1016/j.cell.2016.03.014
Liu, Z. X., Cai, Y. D., Guo, X. J., Li, A., Li, T. T., Qiu, J. D., Ren, J., Shi, S.
P., Song, J. N., Wang, M. H., Xie, L., Xue, Y., Zhang, Z. D., & Zhao, X. M.
(2015). Yi chuan = Hereditas, 37(7), 621–634.
https://doi.org/10.16288/j.yczz.15-003
Mohabatkar, H., Rabiei, P., & Alamdaran, M. (2017). New achievements
in bioinformatics prediction of post translational modification of proteins.
Current Topics in Medicinal Chemistry, 17(21), 2381–2392.
https://doi.org/10.2174/1568026617666170328100908
Pagon, R. A., Tarczy-Hornoch, P., Baskin, P. K., Edwards, J. E., Covington,
M. L., Espeseth, M., Beahler, C., Bird, T. D., Popovich, B., Nesbitt, C.,
Dolan, C., Marymee, K., Hanson, N. B., Neufeld-Kaiser, W., Grohs, G. M.,
Kicklighter, T., Abair, C., Malmin, A., Barclay, M., & Palepu, R. D. (2002).
Genetests-geneclinics: Genetic testing information for a growing audience.
Human Mutation, 19(5), 501–509. https://doi.org/10.1002/humu.10069
Ritchie, M. D., Moore, J. H., & Kim, J. H. (2020). Translational bioinformatics:
Biobanks in the precision medicine era. Pacific Symposium on Biocomputing,
25, 743–747.
Rubin, D. L., Thorn, C. F., Klein, T. E., & Altman, R. B. (2005). A statistical
approach to scanning the biomedical literature for pharmacogenetics knowl-
edge. Journal of the American Medical Informatics Association, 12, 121–129.
https://doi.org/10.1197/jamia.M1640
Sanseau, P., Agarwal, P., Barnes, M. R., Pastinen, T., Richards, J. B., Cardon,
L. R., & Mooser, V. (2012). Use of genome-wide association studies for
drug repositioning. Nature Biotechnology, 30(4), 317–320.
https://doi.org/10.1038/nbt.2151
Vamathevan, J., & Birney, E. (2017). A review of recent advances in transla-
tional bioinformatics: Bridges from biology to medicine. Yearbook of Medical
Informatics, 26, 178–187.
Yin, D., Ling, S., Wang, D., Dai, Y., Jiang, H., Zhou, X., Paludan, S. R.,
Hong, J., & Cai, Y. (2021). Targeting herpes simplex virus with CRISPR-
Cas9 cures herpetic stromal keratitis in mice. Nature Biotechnology, 39(5),
567–577. https://doi.org/10.1038/s41587-020-00781-8
Wilson, A. C., Chiles, J., Ashish, S., Chanda, D., Kumar, P. L., Mobley, J. A.,
Neptune, E. R., Thannickal, V. J., & McDonald, M. N. (2022). Integrated
bioinformatics analysis identifies established and novel TGFβ1-regulated genes
1 Introduction
For all living things, securing sustenance is a common challenge.
According to Malthus’ renowned essay, food production grows in an
arithmetic ratio while the population expands geometrically. Humans
have understood from the beginning that they must cultivate crops and
expand output to feed an ever-growing population. Since the beginning
of agriculture, people have attempted to raise the curve of food
production, and, barring famines, it has tended to rise.
place, the effective population size of the originating population may tend
to decrease. A small part of the population is quarantined to ensure that
any undesirable diseases or pest species are eradicated. Because of the small
population size, the process may take several generations, and it may result in
inbreeding (Murty & Banerjee, 2011). In reality, only a small percentage
of individuals released into a new range contribute to the following
generation, reducing the size of the initial population. Thus, even if the
initial collection is large enough to prevent severe bottlenecks, genetic
diversity may decrease with time.
At first, farmers who suffered substantial losses from insect-related crop
damage regarded chemical pesticides as a miracle answer to the growing pest
problem. Demand for chemical pesticides surged as farmers abandoned
conventional pest control techniques in favour of this new strategy.
Incorrect use of pesticides has led to unintended effects in the ecosystem.
In recent years, concerns have been raised about biomagnification, the
build-up of xenobiotics to concentrations higher in the body than in the
environment (Wiratno et al., 2007).
Harmful microorganisms not only threaten our basic needs but also damage
the economic status of a country (Cavicchioli
et al., 2019). Some of the world’s poorer countries, which depend on
agricultural growth, are bearing the brunt of this economic damage.
Insect attacks in India have resulted in similar conditions on several
occasions. States like Andhra Pradesh, Karnataka, Punjab, Uttar Pradesh,
Maharashtra, and Haryana have been particularly hard-hit by drought in
recent years. A few pests are especially prevalent, including stem
borer, fruit and shoot borer, pod borer, and top shoot borer. Several
measures have been taken to protect harvests. Central Plant Protection
Stations (CPPSs) and Central
Surveillance Stations (CSSs) were combined in 1992 to become Central
Integrated Pest Management Centres (CIPMCs).
India has long been making significant attempts to solve the serious
problem of protecting crops from pests. There are
31 CIPMCs in 28 states and one union territory that have been created
to tackle the problem. During 2007 and 2008, a large amount of money
was spent on pest control. Among the important areas where
additional contributions were considered were pest monitoring (8.16
million acres), the bio-control chemical field releases (1900 million acres),
and the area coverage (7.00 million acres).
66 B. DEKA ET AL.
3 Information Technology:
Scope in Biological Sciences
A branch of contemporary science that is multidisciplinary in nature,
bioinformatics, or computational biology, uses information technology
to address biological problems. Mathematical modelling and statistical
techniques were employed for a long time to forecast a wide variety of
biological features, but with the advent of computers, the area saw a
significant transformation. In recent years, methods from the discipline
of bioinformatics have become more popular in all areas of fundamental
biology. Several of the most frequent applications include phylogenetic
analysis, comparative sequencing, structure prediction, and discovery and
validation of drug and pesticide targets (Banerjee et al., 2008; Munjal
et al., 2018). There is a growing demand for data mining to shed light
on previously undiscovered aspects of complicated biological processes
as more data is made available to the general public (Abdel-latief et al.,
2007; Munjal et al., 2018). An additional advantage of this technology
is the ability to create databases with efficient methods for data entry
and retrieval (Gochhait et al., 2021).
Database technology has emerged as a key component of information
technology in recent years. Relational databases are extensively used in
a broad variety of areas. Biology databases are more difficult to maintain
than other databases because the data is so complex, and it requires a high
degree of integration to handle and link a range of input files. Primary and
secondary database systems have been developed and are now widely used
in business and academia, as well as in the armed forces.
As a result, academics and researchers have access to hundreds of useful
databases on the Internet, which are accessible for free or for a nominal
fee. Databases are becoming more useful in pest management as relevant and
helpful pest information is compiled. It’s worth
4 APPLICATION OF BIOINFORMATICS IN AGRICULTURAL … 67
5 Entomo-Informatics: A Prelude
to the Concepts in Bioinformatics
Entomo-informatics is a scientific field that is essential in today’s
entomological studies. Data collected in sequencing facilities has led to a
new field called bioinformatics. Many biologists are currently unaware
of bioinformatics methodologies, tools, and databases, which may lead
to missed opportunities or misinterpretation of data. Because of pesticide
resistance, entomological research has gained relevance. A range of
entomological databases is available, each with its own URL address.
These databases cover proteins, biological reactions, and physiological
processes. The NCBI (Benson et al., 2013), DDBJ (Tateno et al.,
2002), and EMBL (Calabrese, 2019; Stoesser et al., 2002) are important
shared platforms for storing biological data. The NCBI’s Entrez platform
is a powerful biological search engine that helps locate material. With so
much research going on globally, the International Nucleotide Sequence
Database Collaboration (INSDC) was created to minimise data duplication
in the previously stated genome databases (Arita et al., 2021; Cochrane
et al., 2010; Stevens, 2018). NCBI’s sequence databases receive genomic
data from worldwide sequencing efforts and are the backbone of
bioinformatics research. Several nucleotide databases are grouped under the
nucleotide database category. The Entrez search engine returns substantial
data for the term insecta. Researchers interested in individual genes
go further, sequencing them and submitting them to the various
repositories.
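As a rough illustration of the Entrez search described above, the sketch below builds a request URL for the NCBI E-utilities ESearch endpoint for the term insecta. The helper name build_esearch_url and the choice of retmax and retmode values are assumptions made for this example; the code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=5):
    """Return an ESearch URL for `term` in database `db` (hypothetical helper)."""
    params = {
        "db": db,           # Entrez database to search
        "term": term,       # free-text query, e.g. a taxon name
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("insecta"))
```

Fetching the printed URL with any HTTP client would return the matching record identifiers, which could then be passed to other E-utilities calls.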
DroSpeGe is a search engine for Drosophila genomes. Drosophila
genomes are made accessible for comparative investigations on this plat-
form (Gilbert, 2007; Song et al., 2011). DroSpeGe is a tool for biologists
7 Integration of Agricultural
Systems and Pest Biology
There is a dire need for new pest control techniques. Agricultural systems
and pest biology expertise are required to practice sustainable agriculture.
Computer-based models, such as decision support systems (DSS) that
integrate data and human knowledge, assist in agricultural production
management. For example, Colorado State University in the United States
is developing a Pest Management Decision Support System, and
the work has already begun. To achieve this, a system was developed
in cooperation with all other sectors of the food and fibre manufac-
turing industry. Producers, academics, and policymakers now have a
better understanding of how to monitor and sustain an integrated model
11 Quantitative Structure–activity
Relationship (QSAR)
Over the past four decades, QSAR has been widely utilised in agro-
chemistry, chemistry, toxicology, and pharmaceutical chemistry (Hansch
et al., 2001; Kwon et al., 2019). As a result of rigorous testing and inde-
pendent variable fine-tuning, the output of molecular and atom-based
descriptors, as well as those produced by quantum chemical calcula-
tions and spectroscopy, has risen (Cho, 2005). It is now possible to
screen a large number of substances under the same test conditions
using high-throughput screening techniques. This reduces the risk of
combining findings from different sources. It’s time to go back to the
basics. QSAR techniques are now multidimensional, ranging from 0 to
13 Advanced Strategies
MicroRNA-related techniques hold considerable promise in a variety of
emerging technologies (Kim & Nam, 2006). MicroRNAs are highly specific in
degrading target mRNAs. A pest’s unwanted gene
product will be much simpler to target, and pest population management
will become significantly more precise in the foreseeable future.
These techniques are used to study and produce pharmaceuticals (Ford,
2006). They may also be used to kill insects, although the scientific
community has not yet explored this area. Computational biology has the
potential to be very beneficial in miRNA screening (Pla et al., 2018).
Various microRNA screening techniques have been developed in the past;
however, this area still needs a great deal of research and specificity.
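To make the idea of computational miRNA screening concrete, the sketch below uses simple seed matching, a common first-pass heuristic that is not described in this chapter and is far simpler than the deep learning approach of Pla et al. (2018). It assumes the seed is nucleotides 2 to 8 of the miRNA and looks for the reverse complement of that seed in an mRNA sequence. Both sequences are illustrative placeholders, not curated data.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA motif that pairs with the miRNA seed (nucleotides 2-8)."""
    seed = mirna[1:8]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, mrna):
    """Return 0-based start positions of every seed-match site in `mrna`."""
    site = seed_site(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative miRNA, 5' to 3'
    mrna = "AAACUACCUCAGGGCUACCUCUU"   # illustrative mRNA fragment, 5' to 3'
    print(seed_site(mirna), find_seed_matches(mirna, mrna))
```

A seed match alone is only a candidate; real screening pipelines add further evidence such as site conservation or pairing energy before calling a target.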
15 Future Prospects
Although computational biology has made a significant contribution to
pest management via its numerous services, there is still more work to be
done in this area. The number of specialised databases, such as Spodobase
16 Conclusion
This article describes numerous significant applications of
multidisciplinary fields linked to computational biology in pest control,
and future research in pest control should focus more on this important
topic. The interdisciplinary nature of the work lends credence to the idea
that there is still a lot to be done. However, this is slowly changing.
Because of the urgency and importance of the situation, several studies are
presently being performed. Involvement from mathematicians and statisticians
is growing, and there is optimism that a global solution could be found
soon if they work together. These disciplines’ combined efforts may one
day help us find a solution that does not require us to give up our
food, or even a small part of it. For this reason, we must use technical
instruments to position ourselves as the fittest and ensure our survival
where survival of the fittest applies.
References
Abdel-latief, M. (2007). A family of chemoreceptors in Tribolium castaneum
(Tenebrionidae: Coleoptera). PLoS ONE, 2(12), e1319.
Arita, M., Karsch-Mizrachi, I., & Cochrane, G. (2021). The international
nucleotide sequence database collaboration. Nucleic Acids Research, 49(D1),
121–124. https://doi.org/10.1093/nar/gkaa967
Banerjee, A. K., Arora, N., & Murty, U. S. N. (2009). Clustering and classifica-
tion of anopheline spacer sequences using self organizing maps. The Internet
Journal of Genomics and Proteomics, 4(1).
Banerjee, A. K., Kiran, K., Murty, U. S. N., & Venkateswarlu, Ch. (2008). Classi-
fication and identification of mosquito species using artificial neural networks.
Computational Biology and Chemistry, 32, 442–447.
Barker, K. (2010). Biosecure citizenship: Politicising symbiotic associations and
the construction of biological threat. Transactions of the Institute of British
Geographers, 35(3), 350–363. http://www.jstor.org/stable/40890992
Bartlett, A. (2002). ICT and IPM, farmers, FAO and field schools: Bringing IPM
to the grass roots in Asia (pp. 8–9).
Benfenati, E., Gini, G., Piclin, N., Roncaglioni, A., & Vari, M. R. (2003).
Predicting log P of pesticides using different software. Chemosphere, 53,
1155–1164.
Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J.,
Ostell, J., & Sayers, E. W. (2013). GenBank. Nucleic Acids Research, 41(D1), D36–D42.
Bhattacharyya, S., & Bhattacharya, D. K. (2006). Pest control through viral
disease: Mathematical modeling and analysis. Journal of Theoretical Biology,
238, 177–197.
Calabrese, B. (2019). Standards and models for biological data: Common
formats. In S. Ranganathan, M. Gribskov, K. Nakai, & C. Schönbach (Eds.),
Encyclopedia of bioinformatics and computational biology (pp. 130–136).
Academic Press. https://doi.org/10.1016/B978-0-12-809633-8.20418-4.
Cavicchioli, R., Ripple, W. J., Timmis, K. N., et al. (2019). Scientists’ warning to
humanity: Microorganisms and climate change. Nature Reviews Microbiology,
17, 569–586. https://doi.org/10.1038/s41579-019-0222-5
Cho, S. J. (2005). Hologram Quantitative Structure-Activity Relationship
(HQSAR) study of mutagen X. Bulletin of the Korean Chemical Society, 26(1),
85–90.
Cochrane, G., Karsch-Mizrachi, I., & Nakamura, Y. (2010). The international
nucleotide sequence database collaboration. NAR.
https://doi.org/10.1093/nar/gkq1150
da Silva Mesquita, R., Kyrylchuk, A., Grafova, I., Kliukovskyi, D., Bezdudnyy,
A., Rozhenko, A., Tadei, W. P., Leskela, M., & Grafov, A. (2020). Synthesis,
LópezPazos, S. A., & Cerón Salamanca, J. A. (2008). Mini review and hypoth-
esis: Homology modelling of Spodoptera litura (Lepidoptera: Noctuidae)
amino peptidase N receptor. Revista De La Academia Colombiana De Ciencias
Exactas, 32(123), 139–144.
Lorenzen, S., & Zhang, Y. (2007). Monte Carlo refinement of rigid-body protein
docking structures with backbone displacement and side-chain optimization.
Protein Science, 16, 2716–2725.
Marygold, S. J., Leyland, P. C., Seal, R. L., Goodman, J. L., Thurmond, J.
R., Strelets, V. B., & Wilson, R. J. (2013). FlyBase: Improvements to the
bibliography. NAR, 41(D1), D751–D757.
Meyer, D., Leisch, F., & Hornik, K. (2003). The support vector machine under
test. Neurocomputing, 55(1–2), 169–186.
Moné, Y., Nhim, S., Gimenez, S., Legeai, F., Seninet, I., Parrinello, H., Nègre,
N., & d’Alencon, E. (2018). Characterization and expression profiling of
microRNAs in response to plant feeding in two host-plant strains of the
lepidopteran pest Spodoptera frugiperda. BMC Genomics, 19, 804.
https://doi.org/10.1186/s12864-018-5119-6
Munjal, G., Hanmandlu, M., & Srivastava, S. (2018). Phylogenetics algorithms
and applications. Ambient Communications and Computer Systems: RACCCS-
2018, 904, 187–194. https://doi.org/10.1007/978-981-13-5934-7_17
Murty, U. S. N., & Banerjee, A. K. (2011). Bioinformatics with solutions in
pest management science: An insight into the evolving technologies. In D.
Reddy Vudem, N. R. Poduri, & V. R. Khareedu (Eds.), Pests and pathogens:
Management strategies (pp. 521–542). BS Publications, CRC Press.
Murty, U. S. N., Rao, M. S., Arora, N., & Krishna, A. R. (2006). Database
management system for the control of malaria in Arunachal Pradesh, India.
Bioinformation, 1(6), 194–196.
Nègre, V., Hôtelier, T., Volkoff, A. N., Gimenez, S., Cousserans, F., Mita, K.,
Sabau, X., Rocher, J., Miguel, L. F., Emmanuelle, D., Audant, P., Sabourault,
C., Bidegainberry, V., Hilliou, F., & Fournier, P. (2006). SPODOBASE: An
EST database for the lepidopteran crop pest Spodoptera. BMC Bioinformatics,
7 , 322.
Pla, A., Zhong, X., & Rayner, S. (2018). miRAW: A deep learning-based
approach to predict microRNA targets by analyzing whole microRNA
transcripts. PLOS Computational Biology, 14(7), e1006185.
https://doi.org/10.1371/journal.pcbi.1006185
Punia, A., Chauhan, N. S., Singh, D., Kesavan, A. K., Kair, S., & Sohal, S.
K. (2021). Effect of gallic acid on the larvae of Spodoptera litura and its
parasitoid Bracon hebetor. Scientific Reports, 11, 531.
https://doi.org/10.1038/s41598-020-80232-1
1 Introduction
Biology has become a data-driven discipline, and the enormous wealth of
sequence data generated is available for the synthesis of
new information (Bayat, 2002). Thus, there is a need for careful storage,
organization, synthesis and analysis of large data sets for the discovery of
new knowledge. Hence, information technology and computational techniques
are applied to biology, creating the field called bioinformatics.
1. Client Module
2. Core Module
8 COVID-19 Implications
and Sustainable Development Goals
The United Nations has developed an agenda for sustainable development
to provide peace and prosperity in the world. As part of a global
partnership strategy to attain sustainable development, seventeen goals
were prioritized. These Sustainable Development Goals (SDGs) aim at
ending poverty, reducing inequality, improving education, and
increasing economic growth, along with tackling climate change and
preserving forests and oceans.
The widespread infection across the world due to coronavirus disease
has had a huge adverse effect on humankind. The global pandemic
has caused a healthcare crisis and also affected the global economy, thus
hampering the achievement of the Sustainable Development Goals by 2030.
The disruption caused by the COVID-19 pandemic has delayed attainment
of the agenda set by the SDGs by reversing the progress earlier attained
on hunger, poverty, education, health care, etc. (Leal Filho et al., 2020;
Min & Perucci, 2020; Ray et al., 2021).
The primary goal is to end poverty, but COVID-19 has led to an
increase in the global poverty rate. World hunger has been exacerbated
by the global pandemic, with worsening child malnutrition. Progress on the
goal of ensuring healthy lives has reversed, with shorter life
expectancy and disruption of healthcare systems leading to an increase in
morbidity and mortality during the pandemic. Provision of quality
education is hampered by the COVID-19 pandemic, and the closure
of educational institutions has a devastating effect on the learning and
well-being of students (Gochhait et al., 2021; Rimal et al., 2021). Gender
96 P. KEERTHANA AND S. GOCHHAIT
References
Alvis, M. E. (2003). An introduction to software product line development,
26–37.
5 APPLICATION OF BIOINFORMATICS IN HEALTH CARE … 97
Back, J. W., de Jong, L., Muijsers, A. O., & de Koster, C. G. (2003). Chemical
cross-linking and mass spectrometry for protein structural modeling. Journal
of Molecular Biology, 331(2), 303–313.
Bayat, A. (2002). Science, medicine, and the future: Bioinformatics. BMJ
(Clinical Research Ed.), 324(7344), 1018–1022.
Caccia, D., Dugo, M., Callari, M., & Bongarzone, I. (2013). Bioinformatics
tools for secretome analysis. Biochimica Et Biophysica Acta, 1834(11), 2442–
2453.
Cannataro, M., & Harrison, A. (2021). Bioinformatics helping to mitigate the
impact of COVID-19—Editorial. Briefings in Bioinformatics, 22(2), 613–615.
Chen, C., Hou, J., Tanner, J. J., & Cheng, J. (2020). Bioinformatics methods
for mass spectrometry-based proteomics data analysis. International Journal
of Molecular Sciences, 21(8), 2873.
Chen, Q., Luo, H., Zhang, C., & Chen, Y. P. (2015). Bioinformatics in protein
kinases regulatory network and drug discovery. Mathematical Biosciences, 262,
147–156.
Collins, F. S., Green, E. D., Guttmacher, A. E., & Guyer, M. S. (2003). A vision
for the future of genomics research. Nature, 422(6934), 835–847.
Costa, G. C., Braga, R., David, J. M., & Campos, F. (2015). A scientific software
product line for the bioinformatics domain. Journal of Biomedical Informatics,
56, 239–264.
Gochhait, S., Butt, S., De-La-Hoz-Franco, E., Shaheen, Q., Luis, D. M., Piñeres-
Espitia, G., & Mercado-Polo, D. (2021a). A machine learning solution for
bed occupancy issue for smart healthcare sector. Journal of Automatic Control
and Computer Science, 55(6), 546–556. ISSN: 0146-4116.
Hinkson, I. V., Davidsen, T. M., Klemm, J. D., Kerlavage, A. R., Kibbe, W.
A., & Chandramouliswaran, I. (2017). A comprehensive infrastructure for big
data in cancer research: Accelerating cancer research and precision medicine.
Frontiers in Cell and Developmental Biology, 5, 83.
Huang, J., Borchert, G. M., Dou, D., Huan, J., Lan, W., Tan, M., & Wu, B.
(2017). Bioinformatics in microRNA research. In Methods in molecular biology
(Vol. 1617). Humana Press.
Iacobucci, C., Gotze, M., & Sinz, A. (2020). Cross-linking/mass spectrometry
to get a closer view on protein interaction networks. Current Opinion in
Biotechnology, 63, 48–53.
Jamesdaniel, S., Salvi, R., & Coling, D. (2009). Auditory proteomics: Methods,
accomplishments and challenges. Brain Research, 1277 , 24–36.
Leal Filho, W., Brandli, L. L., Lange Salvia, A., Rayman-Bacchus, L., & Platje,
J. (2020). COVID-19 and the UN sustainable development goals: Threat to
solidarity or an opportunity? Sustainability, 12(13), 5343.
Lin, J., Zeng, J., Liu, S., Shen, X., Jiang, N., Wu, Y.S., Li, H., Wang, L., &
Wu, J.-M. (2021). DMAG, a novel countermeasure for the treatment of
thrombocytopenia. Research Square.
Lyon, J., Giuse, N. B., Williams, A., Koonce, T., & Walden, R. (2004). A
model for training the new bioinformationist. Journal of the Medical Library
Association, 92(2), 188–195.
Matzinger, M., & Mechtler, K. (2021). Cleavable cross-linkers and mass
spectrometry for the ultimate task of profiling protein–protein interaction
networks in vivo. Journal of Proteome Research, 20(1), 78–93.
Mehmood, M. A., Sehar, U., & Ahmad, N. (2014). Use of bioinformatics tools
in different spheres of life sciences. Journal of Data Mining in Genomics and
Proteomics, 5(2), 1–13.
Min, Y., & Perucci, F. (2020). Impact of COVID-19 on SDG progress
(UN/DESA Policy Brief 18), 1–5.
Moore, A. C., Winkjer, J. S., & Tseng, T. T. (2015). Bioinformatics resources
for DNA discovery. Biomarker Insights, 10(S4), 53–58.
Mukherjee, P., & Mani, S. (2013). Methodologies to decipher the cell secretome.
Biochimica Et Biophysica Acta, 1834(11), 2226–2232.
Northrop, L. M. (2002). SEI’s software product line tenets. IEEE, 19, 32–40.
Petrotchenko, E. V., & Borchers, C. H. (2010). Crosslinking combined with
mass spectrometry for structural proteomics. Mass Spectrometry Reviews,
29(6), 862–876.
Rai, A., Bhati, J., & Lal, S. B. (2012). Software tools and resources for
bioinformatics research (Vol. 1). New India Publishing Agency.
Rasheed, Z. (2017). Bioinformatics approach: A powerful tool for microRNA
research. International Journal of Health Sciences, 11(3), 1–3.
Ray, M., Sable, M. N., Sarkar, S., & Hallur, V. (2021). Essential interpretations
of bioinformatics in COVID-19 pandemic. Meta Gene, 27 , 100844.
Rimal, Y., Gochhait, S., & Bisht, A. (2021b). Data interpretation and visual-
ization of COVID-19 cases using R programming. Informatics in Medicine
Unlocked, 26(6), 100705. ISSN: 0146-4116.
Saunders, N. F., Brinkworth, R. I., Huber, T., Kemp, B. E., & Kobe, B.
(2008). Predikin and PredikinDB: A computational framework for the predic-
tion of protein kinase peptide specificity and an associated database of
phosphorylation sites. BMC Bioinformatics, 9, 245.
Singh, V., & Mishra, V. (2021). Environmental impacts of coronavirus disease
2019 (COVID-19). Bioresource Technology Reports, 15, 100744.
Stransky, B., & Galante, P. (2010). Application of bioinformatics in cancer
research. In W. Cho (Ed.), An omics perspective on cancer research. Springer.
Tomczak, K., Czerwińska, P., & Wiznerowicz, M. (2015). The Cancer
Genome Atlas (TCGA): An immeasurable source of knowledge. Contemporary
Oncology, 19(1A), A68–A77.
Tran, B. Q., Goodlett, D. R., & Goo, Y. A. (2015). Advances in protein complex
analysis by chemical cross-linking coupled with mass spectrometry (CXMS)
and bioinformatics. Biochimica et Biophysica Acta, 1864(1), 123–129.
CHAPTER 6
1 Introduction
The retrieval of biomedical literature is becoming increasingly complex,
necessitating the development of improved information retrieval systems.
Information Retrieval (IR) tools search unstructured resources, such as
text documents, held in massive data repositories. IR comprises the
representation, storage, and organization of information (Nadkarni, 2002),
and one of its most complex tasks is determining which materials are
relevant to the user's requirements and which are not. With current
systems, users often cannot formulate a search string precisely enough to
retrieve a particular piece of data from enormous data stores, so search
results from basic information systems are of limited value. In this
chapter we propose an approach to refining searches that better captures
the user's information need: it applies different query expansion
strategies and combines them linearly, two expansion results at a time.
Query expansion, for example, discovers synonyms and reweights the
original terms to broaden the search query. It delivers substantially more
targeted and specific search results than standard queries.
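The idea of expanding a query with synonyms, reweighting terms, and combining two expansion results linearly can be sketched as follows. This is a minimal illustration only, not the chapter's system: the synonym table, the weights, the mixing parameter `lam`, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical synonym table; a real system might draw these from a
# thesaurus or an ontology.
SYNONYMS = {
    "tumor": ["neoplasm", "tumour"],
    "gene": ["locus"],
}


def expand_with_synonyms(query_terms, synonym_weight=0.5):
    """Return {term: weight}; original terms keep 1.0, synonyms get less."""
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for synonym in SYNONYMS.get(term, []):
            # A synonym never outweighs a term the user actually typed.
            weights[synonym] = max(weights.get(synonym, 0.0), synonym_weight)
    return weights


def combine_linearly(expansion_a, expansion_b, lam=0.5):
    """Linear combination lam * A + (1 - lam) * B of two weighted expansions."""
    combined = defaultdict(float)
    for term, weight in expansion_a.items():
        combined[term] += lam * weight
    for term, weight in expansion_b.items():
        combined[term] += (1.0 - lam) * weight
    return dict(combined)


query = ["tumor", "gene"]
by_synonyms = expand_with_synonyms(query)
# Second expansion with made-up weights, standing in for terms suggested by
# another strategy (for example, co-occurrence in top-ranked documents).
by_feedback = {"tumor": 1.0, "gene": 1.0, "oncogene": 0.8}
expanded = combine_linearly(by_synonyms, by_feedback, lam=0.6)
```

Terms supported by both strategies keep their full weight, while terms proposed by only one strategy are down-weighted in proportion to `lam`.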
The rest of this chapter is organized as follows. The "Related Work"
section gives an overview of related work. The "State of the Art" section
discusses the terminology used in information retrieval. The "Open
Problems and Challenges" section outlines the existing problems and their
future scope. The "Conclusion" section closes the chapter and touches on
possible future work.
2 Literature
Because of the rapid growth of biological data, good IR systems are
required that offer specific and meaningful responses to complex
queries. One of the key challenges in the information retrieval community
is query expansion. Researchers have developed a variety of strategies for
query expansion. Some methods stress the use of unstructured data
(text documents) to determine expansion terms, while others emphasize
the use of structured data (ontologies).
Pérez-Agüera et al. (2010) compare various methods for query expansion
in unstructured documents. To select expansion terms, they use the
Tanimoto, Dice, and Cosine coefficients to measure the co-occurrence of
terms across documents. They also use the Kullback–Leibler divergence to
compare the distribution of expansion terms in the top-ranked documents
with their distribution in the entire collection.
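The co-occurrence coefficients and the divergence-based score mentioned above can be illustrated with a short sketch. It assumes each term is represented by the set of identifiers of the documents containing it, and that term probabilities in the top-ranked documents and in the whole collection are already estimated; the toy sets and probabilities are invented for illustration.

```python
import math


def tanimoto(docs_a, docs_b):
    """Tanimoto (Jaccard) coefficient: |A and B| / |A or B|."""
    union = len(docs_a | docs_b)
    return len(docs_a & docs_b) / union if union else 0.0


def dice(docs_a, docs_b):
    """Dice coefficient: 2 |A and B| / (|A| + |B|)."""
    total = len(docs_a) + len(docs_b)
    return 2 * len(docs_a & docs_b) / total if total else 0.0


def cosine(docs_a, docs_b):
    """Cosine coefficient for sets: |A and B| / sqrt(|A| * |B|)."""
    denominator = math.sqrt(len(docs_a) * len(docs_b))
    return len(docs_a & docs_b) / denominator if denominator else 0.0


def kl_term_score(p_top, p_collection):
    """One term's contribution to KL(top-ranked || collection).

    Assumes p_collection > 0 (the term occurs somewhere in the collection).
    """
    if p_top == 0.0:
        return 0.0
    return p_top * math.log(p_top / p_collection)


# Toy example: identifiers of the documents in which each term occurs.
kinase = {1, 2, 3, 4}
phosphorylation = {2, 3, 4, 5}

print(tanimoto(kinase, phosphorylation))
print(dice(kinase, phosphorylation))
print(cosine(kinase, phosphorylation))
# A term that is more frequent in the top-ranked documents than in the
# collection as a whole receives a positive score.
print(kl_term_score(0.2, 0.05))
```

Candidate expansion terms would then be ranked by one of these scores and the highest-scoring ones added to the query.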
Buscher et al. (2012) describe how to choose words that are more
pertinent to the query topic from feedback documents, based on the
positions of keywords in those documents. To solve this problem in a
coherent, probabilistic manner, they developed a position-to-importance
model (PRM).
6 INFORMATION RETRIEVAL IN BIOINFORMATICS: STATE … 103
in the data set that meet the data need. Normally, the task must be
completed correctly and quickly.
Index structures and Boolean queries: A Boolean query (Dadheech
et al., 2018; Du et al., 2020) is a basic and common way to express an
information need. The user provides a term (e.g., OLE1) or a Boolean
combination of terms (e.g., OLE1 AND lipid). The biomedical literature
database PubMed, as well as many other text search engines, uses this
query paradigm.
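A Boolean query is typically answered from an inverted index that maps each term to the set of documents containing it. The sketch below is a minimal illustration under simplifying assumptions (whitespace tokenization, lowercasing, invented example sentences); it is not how PubMed or any particular engine is implemented.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents that contain at least one term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set().union(*postings)


# Invented example sentences, used only to exercise the index.
docs = {
    1: "OLE1 encodes a fatty acid desaturase",
    2: "lipid metabolism requires OLE1",
    3: "lipid droplets store neutral fat",
}
index = build_index(docs)
print(boolean_and(index, "OLE1", "lipid"))  # only document 2 has both terms
print(boolean_or(index, "OLE1", "lipid"))
```

Intersecting posting sets implements AND and uniting them implements OR, so the cost of a query depends on the posting lists involved rather than on the size of the whole collection.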
The vector model and similarity queries: The vector model (Bordawekar &
Shmueli, 2017), described in this section, is a widely used alternative to
the Boolean query. In this model, documents are viewed as (algebraic)
vectors over terms, as formally specified below. A search query, q, can
contain a large number of terms and even an entire text. It is also
represented as a vector and is treated as a body of content rather than
just a set of search terms. The retrieval task is reduced to finding the
document vectors in the database that are most similar to the search-query
vector. Several document similarity measures have been developed and
applied.
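The vector model can be sketched with TF-IDF weights and cosine similarity, one common choice among the similarity measures referred to above. The weighting scheme (a smoothed inverse document frequency), the tokenization, and the toy documents are assumptions made for this illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return sparse TF-IDF vectors (dicts) and the idf table for a corpus."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption of this sketch) keeps every weight positive.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def vectorize(text, idf):
    """Represent a query as a vector over the terms known to the corpus."""
    counts = Counter(text.lower().split())
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, texts):
    """Return (score, document index) pairs, most similar document first."""
    vectors, idf = tfidf_vectors(texts)
    q = vectorize(query, idf)
    return sorted(((cosine(q, vec), i) for i, vec in enumerate(vectors)),
                  reverse=True)


texts = [
    "gene expression in yeast",
    "protein folding and protein stability",
    "yeast lipid metabolism",
]
ranking = rank("protein folding", texts)
print(ranking[0])  # the best match is the second document (index 1)
```

Because the query is itself a vector, an entire abstract can be used as the query in exactly the same way as a two-word search string.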
3.3 IR in Biomedical-Informatics
Experimental methods (Hersh, 2020) that allow for the investigation of
genes and proteins across a whole genome are the initial step toward a
molecular understanding of complex biological processes. While
experiments are designed and conducted, the ability to view them against
the background of existing information and prior hypotheses is critical
for both informed planning and interpretation of outcomes (Rimal et al.,
contractions, and abbreviations can be difficult because the same (or
similar) names, contractions, and abbreviations may be used to refer to
different things. In addition, determining where a composite name begins
and ends in a text can be tricky. These issues, in our opinion, are solely
due to a lack of standard terminology and software tools. Fortunately,
the Unified Medical Language System (UMLS), which brings together many
health and biomedical vocabularies and standards to enable
interoperability between computer systems, has been a beneficial effort
toward uniformity. The UMLS has three tools: the Metathesaurus (which
contains terms and codes from many vocabularies), the Semantic Network
(which defines broad categories and the relationships between them), and
the SPECIALIST Lexicon (equipped with language processing tools).
However, due to a lack of standardization, various other issues remain
unsolved. One of the most difficult problems, in our opinion, is the need
to automatically integrate the literature with biological databases.
Bioinformaticians' responsibilities are distinct from those of database
curators. Standard tools capable of extracting information and
relationships from the literature, as well as helping database curators
find relevant material for annotation, would go a long way toward making
the problem less severe or even eliminating it. Other difficult issues are
closely connected to the organization of scientific publishing.
5 Conclusion
Bioinformatics text analysis aims to improve access to unstructured
knowledge by easing searches, supplying auto-generated summaries, linking
publications with structured resources, displaying content more visually,
and assisting analysts in the discovery of novel hypotheses. Research in
bioinformatics text mining has developed over the past few years, from
document retrieval to relationship extraction. There are now a number of
tools that can be used to integrate literature analysis across a range of
life science disciplines, and these tools are being developed at an
increasing pace. In this chapter, we briefly discussed literature
retrieval and text mining in bioinformatics.
In the second part of the chapter, we noted certain issues worth further
investigation, with the aim of advancing bioinformatics literature
retrieval and mining techniques and systems. To summarize, the scientific
community is actively engaged in resolving many issues associated with
literature retrieval and mining, and several solutions have been proposed
and implemented. They will, however, remain largely ineffective until the
scientific community moves toward universal standards for how existing
knowledge is published and shared among researchers, with particular
attention to the structure of scientific publications.
References
Abdou, S., & Savoy, J. (2008). Searching in Medline: Query expansion and
manual indexing evaluation. Information Processing & Management, 44(2),
781–789.
Alipanah, N., Parveen, P., Khan, L., & Thuraisingham, B. (2011, July).
Ontology-driven query expansion using map/reduce framework to facilitate
federated queries. In 2011 IEEE International Conference on Web Services
(pp. 712–713). IEEE.
Blynova, N. (2019). Latent semantic indexing (LSI) and its impact on copy-
writing. Communications and Communicative Technologies, (19), 4–12.
Bordawekar, R., & Shmueli, O. (2017, May). Using word embedding to enable
semantic queries in relational databases. In Proceedings of the 1st Workshop on
Data Management for End-to-End Machine Learning (pp. 1–4).
Buscher, G., Dengel, A., Biedert, R., & Elst, L. V. (2012). Attentive documents:
Eye tracking as implicit feedback for information retrieval and beyond. ACM
Transactions on Interactive Intelligent Systems (TiiS), 1(2), 1–30.
Dadheech, P., Goyal, D., Srivastava, S., & Choudhary, C. M. (2018). An efficient
approach for big data processing using spatial Boolean queries. Journal of
Statistics and Management Systems, 21(4), 583–591.
Dang, V., Bendersky, M., & Croft, W. B. (2013, March). Two-stage learning
to rank for information retrieval. In European Conference on Information
Retrieval (pp. 423–434). Springer.
Dey, A., Jenamani, M., & Thakkar, J. J. (2017, December). Lexical TF-IDF:
An n-gram feature space for cross-domain classification of sentiment reviews.
In International Conference on Pattern Recognition and Machine Intelligence
(pp. 380–386). Springer.
Dogan, R. I., Chatr-aryamontri, A., Kim, S., Wei, C. H., Peng, Y., Comeau,
D. C., & Lu, Z. (2017, August). BioCreative VI precision medicine track:
Creating a training corpus for mining protein–protein interactions affected by
mutations. In BioNLP 2017 (pp. 171–175).
Drost, H. G., & Paszkowski, J. (2017). Biomartr: Genomic data retrieval with
R. Bioinformatics, 33(8), 1216–1217.
Du, L., Li, K., Liu, Q., Wu, Z., & Zhang, S. (2020). Dynamic multi-
client searchable symmetric encryption with support for Boolean queries.
Information Sciences, 506, 234–257.
Hersh, W. (2020). Information retrieval: A biomedical and health perspective.
Health Informatics. https://doi.org/10.1007/978-3-030-47686-1
Jang, H., Jeong, Y., & Yoon, B. (2021). TechWord: Development of a
technology lexical database for structuring textual technology information
based on natural language processing. Expert Systems with Applications, 164,
114042.
Krallinger, M., Rabal, O., Lourenco, A., Oyarzabal, J., & Valencia, A. (2017).
Information retrieval and text mining technologies for chemistry. Chemical
Reviews, 117(12), 7673–7761.
Matos, S., Arrais, J. P., Maia-Rodrigues, J., & Oliveira, J. L. (2010).
Concept-based query expansion for retrieving gene related publications from
MEDLINE. BMC Bioinformatics, 11(1), 1–9.
Nadkarni, P. M. (2002). An introduction to information retrieval: Applications
in genomics. The Pharmacogenomics Journal, 2(2), 96–102.
Pérez-Agüera, J. R., Arroyo, J., Greenberg, J., Iglesias, J. P., & Fresno, V.
(2010, April). Using BM25F for semantic search. In Proceedings of the 3rd
International Semantic Search Workshop (pp. 1–8).
Rimal, Y., Gochhait, S., & Bisht, A. (2021). Data interpretation and visualization
of COVID-19 cases using R programming. Informatics in Medicine Unlocked,
26(6), 100705. Elsevier, ISSN: 0146-4116.
Rivas, A. R., Iglesias, E. L., & Borrajo, L. (2014). Study of query expansion
techniques and their application in the biomedical information retrieval. The
Scientific World Journal, (1), 1–10.
110 SUNITA ET AL.
1 Introduction
Classification is one of the most important aspects of data mining, and
it occurs frequently in everyday life. For example, in a railway
station, tickets are issued and classified by the class required;
in a hospital, patients are classified by the nature of their disease
and their risk level (low, medium, or high); in a school, teachers
classify students' performance by the grade received (first class,
second class, third class, or fail); and in mobile technologies,
(Orriols-Puig et al., 2009) the basic goal of multivariate data
classification is
M. Revathi (B)
Department of Biotechnology, Bharathiar University, Coimbatore, India
e-mail: [email protected]
D. Ramyachitra
Department of Computer Science, Bharathiar University, Coimbatore, India
[Figure: flowchart. Start; multivariate datasets; normalized data; separate 70% of the data for training and 30% for testing; global optimization using GWO; Stop.]
for almost any classification method. The remainder of this chapter is
organised as follows. Section 2 reviews several data classification
methods and studies. Section 3 describes the proposed hybrid support
vector machine with grey wolf optimization (Pham et al., 2018). In
Section 4, the experimental results of the proposed HSVMGWO are compared
with those of the existing support vector machine, random forest,
AdaBoost, and decision tree classifiers. The chapter's final remarks and
future scope are given in Section 5.
2 Literature Review
This section reviews previous studies on data classification using
support vector machine-based classifiers (Emamgholizadeh et al., 2021).
Galatenko et al. (2014) gave a formalised definition of the problem of
choosing features and building a genomic classifier for medical test
systems that relies on mathematical machine learning techniques rather
than biological or medical expertise (Gochhait et al., 2021). Latha
(2014) presented a support vector machine (SVM) with a Radial Basis
Function (RBF) kernel for the automatic analysis of Magnetic Resonance
Images (MRI). Karamizadeh et al. (2014) published a review of the
support vector machine (SVM), a pattern recognition and data
classification algorithm. Using the information supplied by the SVM,
Gürbüz and Kılıç (2014) built a general-purpose, rapid, and adaptive
automatic disease detection system, which improved the success rate and
reduced the decision-making time.
Chen et al. (2016) developed a support vector machine classifier based
on a genetic algorithm (GA–SVM) for lymph disease diagnosis. In the first
stage, the GA is used to reduce the 18 features in the lymph diseases
dataset to six. In the second stage, a support vector machine with
several kernel functions, such as linear, quadratic, and Gaussian, is
used as the classifier. In the field of medical imaging, Lee et al.
(2015) combined support vector machines (SVM) with Active Learning (AL)
to improve prediction for imbalanced classes. Sheng and Xiao (2015)
proposed a Fuzzy Support Vector Machine (FSVM) for the class imbalance
problem (dubbed FSVMCIP), which may be thought of as a modified FSVM
with manifold regularisation for two classes and with two
misclassification costs.
116 M. REVATHI AND D. RAMYACHITRA
build SVMs with grid points, which can be expected to speed up SVMs
in the test phase; another method is to build SVMs with unlabelled
data, which has been shown to improve SVM accuracy when there
is very little labelled data. Kim et al. (2003) suggested using an
SVM ensemble with bagging (bootstrap aggregation) or boosting to
improve on the limited classification performance of a single SVM. To
classify heterogeneous medical data, Kumar and Arasu (2015) employed
Modified Particle Swarm Optimization and Adaptive Fuzzy K-Modes
(MPSO-AFKM). Mishra and colleagues (2015) developed hybrid filter-wrapper
techniques for high-performance classification models.
Nguyen et al. (2015a, 2015b) used wavelet transformation (WT)
and an interval type-2 fuzzy logic system (IT2FLS) for automated medical
data classification. The IT2FLS was trained through a hybrid learning
method, resulting in improved performance and reduced computational
burden. Nguyen et al. (2015a, 2015b) proposed a modified Analytic
Hierarchy Process (AHP)-based gene selection for microarray
classification, with AHP-selected genes being used for cancer
classification using the fuzzy standard additive model (FSAM). In order
to handle the high-dimensional, low-sample nature of microarray data, the
number of fuzzy rules is minimised using a genetic algorithm (GA). In
order to reduce computational burden, Nguyen et al. (2015a, 2015b)
proposed a fuzzy standard additive model (SAM) with a genetic algorithm
(GSAM) for healthcare data classification, in which discriminative
features of high-dimensional datasets are derived by applying wavelet
transformation.
Purwar and Singh (2015) tested a hybrid prediction model with missing
value imputation (HPM-MI) on three medical data sets. Santhanam and
Ephzibah proposed using a genetic algorithm and fuzzy logic to diagnose
heart disease automatically. To increase the performance of the fuzzy
inference system used to generate a classification model for the
GA-selected features, the fuzzy Gaussian membership function and the
centroid technique were applied.
Pourpanah et al. (2019) created a hybrid of fuzzy ARTMAP (FAM) and
Classification And Regression Tree (CART) for medical data classification.
The presented model provides consistent learning, predictions in the
form of a decision tree, and the extraction of valuable explanatory rules
as a decision support tool. Sindhiya and Gunasundari (2014) investigated
how the genetic algorithm (GA) and other heuristic methods could be used
to select features for disease identification in high-dimensional
datasets. Stathopoulos and Kalamboukis (2015) demonstrated how to use
3 System Design
In this research, we describe work done to design and assess ways of
handling missing values, attribute noise, and imbalanced class
distribution in datasets used for prediction. This section gives a brief
overview of the hybrid support vector machine with grey wolf optimization
(HSVMGWO) approach in knowledge discovery. The purpose of this stage is to
select the most appropriate classification method for a particular dataset.
Because no generalisations about the best classification strategy can be
made, this phase requires empirical testing of each prediction and
analysis for a given dataset. Because the dataset under study is limited,
our suggested approach uses unsupervised learning to find the best hybrid
classification methods.
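The chapter's flowchart lists data normalization followed by a split of 70% of the data for training and 30% for testing. The sketch below illustrates those two preprocessing steps only. Min-max scaling is an assumption, since the chapter does not state which normalization is used, and the function names, the random seed, and the toy data are hypothetical.

```python
import random


def min_max_normalize(rows):
    """Scale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle once, then keep train_fraction of the samples for training."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_x = [rows[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [rows[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


# Toy multivariate data: two attributes per sample and a binary label.
rows = [[i, 100 - 3 * i] for i in range(10)]
labels = [i % 2 for i in range(10)]

normalized = min_max_normalize(rows)
train_x, train_y, test_x, test_y = train_test_split(normalized, labels)
print(len(train_x), len(test_x))  # 7 training samples and 3 test samples
```

In practice the scaling bounds would usually be computed on the training portion only and then applied to the test portion, so that no information from the test data leaks into training.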
\[
\min_{w,\,s}\ \frac{1}{2}\, w^{T} w + C\, e^{T} s \tag{1}
\]

\[
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} E_i^{2}
             = \frac{1}{N}\sum_{i=1}^{N}\left(y_k^{i} - d_k^{i}\right)^{2}. \tag{4}
\]
Here y_k and d_k denote the actual and desired outputs of the kth
output neuron for the ith sample, E_i is the difference between them, and
N is the number of training samples. The MSE defines the fitness function
f(x) in this way. To avoid overfitting the classifier model, each
member's fitness is assessed using the mean square error (MSE) on the
validation set only, rather than on the whole training set.
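How a grey wolf optimizer can minimise an MSE-based fitness evaluated on a validation set is sketched below. This is a generic, textbook-style GWO and not the chapter's HSVMGWO: a two-parameter linear model stands in for the SVM, and the toy validation data, population size, iteration count, and search bounds are all assumptions chosen for illustration.

```python
import random


def mse(predicted, desired):
    """Mean squared error between two equal-length sequences."""
    return sum((y - d) ** 2 for y, d in zip(predicted, desired)) / len(desired)


# Toy validation set generated by d = 2x + 1 (an assumption of this sketch).
VALIDATION_X = [0.0, 1.0, 2.0, 3.0]
VALIDATION_D = [1.0, 3.0, 5.0, 7.0]


def validation_mse(params):
    """Fitness of a candidate: MSE of y = w0 * x + w1 on the validation set."""
    w0, w1 = params
    return mse([w0 * x + w1 for x in VALIDATION_X], VALIDATION_D)


def grey_wolf_optimize(fitness, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimise fitness with a basic grey wolf optimizer; return (best, value)."""
    rng = random.Random(seed)
    low, high = bounds
    wolves = [[rng.uniform(low, high) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for it in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) lead the pack.
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_fit = fitness(alpha)
        if alpha_fit < best_fit:
            best_pos, best_fit = alpha, alpha_fit
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for j in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[j] - wolf[j])
                    estimate += leader[j] - A * distance
                # New position: mean of the three leader-guided estimates,
                # clipped to the search bounds.
                wolf[j] = min(high, max(low, estimate / 3.0))
    # Evaluate the final population as well.
    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best_pos, best_fit = list(final), fitness(final)
    return best_pos, best_fit


best_pos, best_fit = grey_wolf_optimize(validation_mse, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_fit)
```

The best position found so far is tracked separately because the pack is not elitist: every wolf moves in each iteration, so the population alone does not guarantee that the best candidate is retained.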
7 HYBRID SUPPORT VECTOR MACHINE WITH GREY WOLF … 121
[Figure: flowchart. Start; initialize population; hybrid GWO-based SVM training; stopping criteria reached?; Stop.]
Accuracy comparison
Dataset | Proposed system | Support vector machine | Random forest | ada boost | decision tree
of SVM, 98.5% of random forest, 86.5% of AdaBoost, and 78.6% of decision
tree on the cleveland-0 vs 4 dataset are lower than that of the proposed
system. On the glass-0-1-4-6 vs 2 dataset, the suggested system
outperforms SVM (91.3%), random forest (93.6%), AdaBoost (88.6%), and
decision tree (81.5%).
Table 2 presents the sensitivity comparison of support vector machine,
random forest, AdaBoost, decision tree, and the proposed support vector
machine with grey wolf optimization (HSVMGWO) on the shuttle-2 vs 5,
yeast-0-3-5-9 vs 7-8, vowel0, cleveland-0 vs 4, and glass-0-1-4-6 vs 2
datasets. Figure 6 indicates that the proposed system's sensitivity
exceeds that of the four existing systems on all five datasets. The
proposed system achieves a sensitivity of 97.6% on shuttle-2 vs 5, 95.6%
on yeast-0-3-5-9 vs 7-8, 97.3% on vowel0, 99.4% on cleveland-0 vs 4, and
95.5% on glass-0-1-4-6 vs 2. The overall average sensitivity was 97.08%,
although other traditional approaches achieved maximum sensitivity.
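For reference, the evaluation measures compared in this section follow from the counts of a binary confusion matrix. The sketch below states the standard definitions and checks the reported average sensitivity against the five per-dataset values quoted above; the confusion-matrix counts in the example are invented for illustration.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Invented counts, used only to exercise the definitions.
print(sensitivity(tp=45, fn=5))
print(specificity(tn=90, fp=10))
print(accuracy(tp=45, tn=90, fp=10, fn=5))

# Sensitivity of the proposed system as reported in the text (in %).
reported_sensitivity = {
    "shuttle-2 vs 5": 97.6,
    "yeast-0-3-5-9 vs 7-8": 95.6,
    "vowel0": 97.3,
    "cleveland-0 vs 4": 99.4,
    "glass-0-1-4-6 vs 2": 95.5,
}
average = sum(reported_sensitivity.values()) / len(reported_sensitivity)
print(round(average, 2))  # 97.08, matching the average quoted in the text
```

On imbalanced datasets such as those used here, sensitivity and specificity are more informative than accuracy alone, because a classifier can reach high accuracy while missing most of the minority class.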
The existing systems achieve 97.6% (SVM), 96.6% (random forest),
88.48% (AdaBoost), and 83.30% (decision tree) on the shuttle-2 vs 5
dataset, which is lower than the proposed method. SVM, 89.58% random
forest, 84.15% AdaBoost, and 88.21% decision tree are all lower than
[Figure: Accuracy comparison. Y-axis: accuracy in %; x-axis: dataset.]
Sensitivity comparison
Dataset | Proposed system | Support vector machine | Random forest | ada boost | decision tree
[Figure: Sensitivity comparison. Y-axis: sensitivity in %; x-axis: dataset.]
Specificity comparison
Dataset | Proposed system | Support vector machine | Random forest | ada boost | decision tree
[Figure: Specificity comparison. Y-axis: specificity in %; x-axis: dataset; series: proposed system, support vector machine, random forest, ada boost, decision tree.]
forest, 7.11 M.sec of AdaBoost, and 11.18 M.sec of decision tree on
the cleveland-0 vs 4 dataset. On the glass-0-1-4-6 vs 2 dataset, the
suggested system outperforms SVM (7.7 M.sec), random forest (6.54 M.sec),
AdaBoost (5.86 M.sec), and decision tree (9.18 M.sec).
Tables 1, 2, 3 and 4 show the results of the hybrid support vector
machine with grey wolf optimization (HSVMGWO) for classification of
the five selected multivariate datasets. Reducts derived from forward
feature selection for the vowel0, cleveland-0 vs 4, and glass-0-1-4-6 vs 2
datasets and from the backward feature removal approach for shuttle-2 vs 5,
yeast-0-3-5-9
[Figure: Time comparison. Y-axis: time in Msec; x-axis: dataset.]
5 Conclusion
This chapter describes a hybrid support vector machine with grey wolf
optimization (HSVMGWO) technique for classifying five different datasets.
The simulated results of the suggested classifier are superior; however,
it should be highlighted that when used to solve a complicated data
classification problem, the HSVMGWO is easily trapped in local optima
during the search for effective classification features. This can be seen
in the datasets shuttle-2 vs 5, yeast-0-3-5-9 vs 7-8, vowel0,
cleveland-0 vs 4, and glass-0-1-4-6 vs 2, where the suggested algorithm
got stuck in local minima numerous times and had to be elevated to give
classification results. In the future, knowledge engineers may be able to
create efficient decision support systems in real-world scenarios
employing hybrid classification methodologies that combine two or more
classifiers. Hybrid optimization techniques and bio-inspired artificial
intelligence approaches will generate stronger classifier models in the
future, which can be employed in the design and development of decision
support systems to increase efficiency.
References
Akay, M. F. (2009). Support vector machines combined with feature selection for
breast cancer diagnosis. Expert Systems with Applications, 36(2), 3240–3247.
Beckett, C., Eriksson, L., Johansson, E., & Wikström, C. (2017). Multivariate
data analysis (MVDA). Pharmaceutical quality by design: A practical approach,
201–225.
Chang, Y., Kim, N., Lee, Y., Lim, J., Seo, J. B., & Lee, Y. K. (2012). Fast and
efficient lung disease classification using hierarchical one-against-all support
vector machine and cost-sensitive feature selection. Computers in Biology and
Medicine, 42(12), 1157–1164.
Chen, Z., Lin, T., Tang, N., & Xia, X. (2016). A parallel genetic algorithm based
feature selection and parameter optimization for support vector machine.
Scientific programming, 2016.
Emamgholizadeh, S., & Mohammadi, B. (2021). New hybrid nature-based
algorithm to integration support vector machine for prediction of soil cation
exchange capacity. Soft computing, 25, 13451–13464.
https://doi.org/10.1007/s00500-021-06095-4
Galatenko, V. V., Lebedev, A. E., Nechaev, I. N., Shkurnikov, M. Y., Tonevitskii,
E. A., & Podolskii, V. E. (2014). On the construction of medical test systems
using greedy algorithm and support vector machine. Bulletin of Experimental
Biology and Medicine, 156(5), 706–709.
Gochhait, S. et al. (2021). Data interpretation and visualization of COVID-
19 cases using R programming. Informatics in Medicine Unlocked, 26(6),
Elsevier, ISSN: 0146–4116.
Gudadhe, M., Wankhade, K., & Dongre, S. (2010, September). Decision support
system for heart disease based on support vector machine and artificial neural
network. In 2010 International Conference on Computer and Communication
Technology (ICCCT) (pp. 741–745). IEEE.
Gürbüz, E., & Kılıç, E. (2014). A new adaptive support vector machine for
diagnosis of diseases. Expert Systems, 31(5), 389–397.
Kalimuthu, S. (2021). Sentiment analysis on social media for emotional predic-
tion during COVID-19 pandemic using efficient machine learning approach.
Computational intelligence and healthcare informatics, 215.
Kalimuthu, S., Naït-Abdesselam, F., & Jaishankar, B. (2021). Multimedia data
protection using hybridized crystal payload algorithm with chicken swarm
optimization. In Multidisciplinary approach to modern digital steganography
(pp. 235–257). IGI Global.
Karamizadeh, S., Abdullah, S. M., Halimi, M., Shayan, J., & javadRajabi, M.
(2014, September). Advantage and drawback of support vector machine func-
tionality. In 2014 international Conference on Computer, Communications,
and Control Technology (I4CT) (pp. 63–65). IEEE.
Kim, H. C., Pang, S., Je, H. M., Kim, D., & Bang, S. Y. (2003). Constructing
support vector machine ensemble. Pattern Recognition, 36(12), 2757–2767.
Kumar, R. S., & Arasu, G. T. (2015). Modified particle swarm optimization
based adaptive fuzzy k-modes clustering for heterogeneous medical databases.
Latha, P. (2014). SVM based automatic medical decision support system for
medical image. Journal of Theoretical & Applied Information Technology,
66(3).
Lee, S. H., Bang, M., Jung, K. H., & Yi, K. (2015, June). An efficient selec-
tion of HOG feature for SVM classification of vehicle. In 2015 International
Symposium on Consumer Electronics (ISCE) (pp. 1–2). IEEE.
Leng, Y., Sun, C., Xu, X., Yuan, Q., Xing, S., Wan, H., … Li, D. (2016).
Employing unlabeled data to improve the classification performance of SVM,
and its application in audio event classification. Knowledge-based systems, 98,
117–129.
Liu, R., Peng, J., Leng, Y., Lee, S., Panahi, M., Chen, W., & Zhao, X. (2021).
Hybrids of support vector regression with grey wolf optimizer and firefly
algorithm for spatial prediction of landslide susceptibility. Remote Sensing,
13(24), 4966. https://doi.org/10.3390/rs13244966
Nguyen, T., Khosravi, A., Creighton, D., & Nahavandi, S. (2015a). Classifica-
tion of healthcare data using genetic fuzzy logic system and wavelets. Expert
Systems with Applications, 42(4), 2184–2197.
Nguyen, T., Khosravi, A., Creighton, D., & Nahavandi, S. (2015b). Medical data
classification using interval type-2 fuzzy logic system and wavelets. Applied soft
computing, 30, 812–822.
Orriols-Puig, A., & Bernadó-Mansilla, E. (2009). Evolutionary rule-based
systems for imbalanced data sets. Soft Computing, 13(3), 213–225.
Petrich, J., Gobert, C., Phoha, S., Nassar, A. R., & Reutzel, E. W. (2017,
August). Machine learning for defect detection for PBFAM using high reso-
lution layerwise imaging coupled with post-build CT scans. In Proceedings of
the 27th international Solid Freeform Fabrication Symposium.
Pham, B. T., Tien Bui, D., & Prakash, I. (2018). Bagging based support vector
machines for spatial prediction of landslides. Environment and earth science,
77, 146.
Pourpanah, F., Lim, C. P., & Hao, Q. (2019). A reinforced fuzzy ARTMAP
model for data classification. International Journal of Machine Learning and
Cybernetics, 10(7), 1643–1655.
Purwar, A., & Singh, S. K. (2015). Hybrid prediction model with missing value
imputation for medical data. Expert Systems with Applications, 42(13), 5621–
5631.
Qinan, J., Lei, M., Jianfeng, H., QingQing, Y., & Jun, Z. (2014). A primary
study for cancer prognosis based on classification and regression using support
vector machine. In Frontier and future development of information technology
in medicine and education (pp. 909–920). Springer, Dordrecht.
Qiu, J., Wu, Q., Ding, G., Xu, Y., & Feng, S. (2016). A survey of machine
learning for big data processing. EURASIP Journal on Advances in Signal
Processing, 2016(1), 1–16.
Rokach, L. (2010). Ensemble-Based Classifiers. Artificial Intelligence Review,
33(1), 1–39.
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The
synthia dataset: A large collection of synthetic images for semantic segmenta-
tion of urban scenes. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 3234–3243).
Samanta, B., & Nataraj, C. (2009). Use of particle swarm optimization for
machinery fault detection. Engineering Applications of Artificial Intelligence,
22(2), 308–316.
Sheng, H., & Xiao, J. (2015). Electric vehicle state of charge estimation:
Nonlinear correlation and fuzzy support vector machine. Journal of Power
Sources, 281, 131–137.
Sindhiya, S., & Gunasundari, S. (2014, February). A survey on genetic algorithm
based feature selection for disease diagnosis system. In Proceedings of IEEE
International Conference on Computer Communication and Systems ICCCS14
(pp. 164–169). IEEE.
Stathopoulos, S., & Kalamboukis, T. (2015). Applying latent semantic analysis
to large-scale medical image databases. Computerized medical imaging and
graphics, 39, 27–34.
Vapnik, V., & Chapelle, O. (2000). Bounds on error expectation for support
vector machines. Neural Computation, 12(9), 2013–2036.
Wu, C. H., Chou, H. J., & Su, W. H. (2008). Direct transformation of coordi-
nates for GPS positioning using the techniques of genetic programming and
symbolic regression. Engineering Applications of Artificial Intelligence, 21(8),
1347–1359.
Zhan, Y., & Shen, D. (2005). Design efficient support vector machine for fast
classification. Pattern Recognition, 38(1), 157–161.
CHAPTER 8
1 Introduction
Bioinformatics is an interdisciplinary field that develops methods and
software tools for analysing biological data, especially large and
complex data sets. It combines biology, information engineering,
computer science, mathematics, and statistics, and draws on
reconstruction, pattern recognition, simulation, machine learning,
iterative approaches, and molecular algorithms or modelling to analyse
and interpret biological data. Bioinformatics has been used for
statistical and mathematical in silico analyses of biological questions.
One use of bioinformatics is the examination of molecular sequences and
genomic data. The goal of bioinformatics, which is a combination of life
sciences branches,
S. Patil (B)
Department of Bioanalytical Sciences, B. K. Birla College, Kalyan, Maharashtra,
India
e-mail: [email protected]
A. D. Gupta
Department of Biotechnology, B. K. Birla College, Kalyan, Maharashtra, India
2 Goals
Bioinformatics encompasses both biological research that incorporates
computer programming and a collection of regularly used analysis
"pipelines", particularly in the field of genomics. Bioinformatics is
commonly used to identify candidate genes and single nucleotide
polymorphisms (SNPs). Such identification is usually done in order to
better understand unique adaptations, the genetic basis of disease,
desirable characteristics (in agricultural fields), or differences
between populations. Bioinformatics also includes proteomics, which seeks
to understand the organisational principles within nucleic acid and
protein sequences.
Major research efforts in the field include gene discovery, drug design,
sequence alignment, genome assembly, gene expression analysis, protein
structure alignment, drug discovery, protein structure prediction,
prediction of protein–protein interactions, evolutionary modelling,
genome-wide association studies, and the modelling of cell
division/mitosis (Frantzi et al., 2019).
Featured sub-disciplines within computational biology and bioinfor-
matics include:
4 Databases
Bioinformatics provides different databases and tools for analysing biolog-
ical data.
Galaxy, UGENE, Kepler, Taverna, HIVE, and Anduril are some of the
platforms that provide this service.
8 BIOINFORMATICS AND ITS APPLICATION IN COMPUTING … 137
5 Applications of Bioinformatics
Bioinformatics is used in fields such as microbial genome applications,
medicine, agriculture, and veterinary science.
5.1.2 Biotechnology
Advances in molecular modelling, pharmaceutical discovery, disease
characterisation, forensics, clinical health care, and agriculture within
the biotechnology industry are helping to address global economic and
social challenges. As biotechnology has advanced and gained public trust,
bioinformatics has become increasingly prominent among the biological
disciplines. Its applications, which speed up research in biotechnology,
include automatic gene identification, genome sequencing, prediction of
gene function, phylogeny, drug design and development, vaccine
development, organism identification, protein structure prediction, and
the study of genome and gene complexity and of protein function,
structure, and folding. Bioinformatics allows researchers to complete
projects that would otherwise take a long time, such as genome mapping,
much more quickly, and innovation in bioinformatics is expected to meet
the future demands of biotechnology. The role of bioinformatics has been
discussed for many biotechnology disciplines, including genomics, drug
design, proteomics, and environmental biotechnology.
Genomics
Genomics is the study of genomes: the genes an organism carries, their
expression, and their interrelationships. The subject generates a large
amount of data on gene sequences, their interrelationships, and their
functions, and bioinformatics plays a crucial role in managing it. As
complete genome sequences become available for more organisms, it is
becoming easier to detect systemic functional behaviour through
bioinformatics. Thompson et al. (1994) assert that bioinformatics is
critical in structural, functional, and nutritional genomics.
Proteomics
Proteomics is the study of the function, structure, and interactions of
the proteins produced by a cell, tissue, or organism. It draws on methods
from biochemistry, genetics, and molecular biology. Advanced biological
techniques have generated massive volumes of data on protein profiles,
protein activity patterns, protein–protein interactions, and organelle
composition, and bioinformatics software, databases, and tools are used
to manage and access these data. Techniques established in proteomics so
far include image analysis of 2D gels, peptide mass fingerprinting, and
peptide fragmentation fingerprinting (Hanash, 2003).
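Peptide mass fingerprinting, mentioned above, can be sketched in a few lines:
a candidate protein is digested in silico, the mass of each resulting peptide
is computed, and the theoretical masses are compared with observed peaks. The
sketch below assumes trypsin as the protease (cleaving after K or R unless the
next residue is P), no missed cleavages, no modifications, and neutral
monoisotopic masses; the function names, the tolerance, and the example
sequence are hypothetical.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein: str) -> list[str]:
    """Split after K or R, except before P (assumes no missed cleavages)."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [piece for piece in pieces if piece]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def matched_peptides(protein: str, observed_masses: list[float],
                     tolerance: float = 0.5) -> list[str]:
    """Peptides whose theoretical mass lies within `tolerance` Da of a peak."""
    hits = []
    for peptide in tryptic_digest(protein):
        mass = peptide_mass(peptide)
        if any(abs(mass - peak) <= tolerance for peak in observed_masses):
            hits.append(peptide)
    return hits


# Hypothetical sequence and peak list, for illustration only.
print(tryptic_digest("MKWVTFRPLLK"))
print(matched_peptides("GKAAR", [203.1]))
```

In practice a search engine scores many candidate proteins from a database in
this way and reports the one whose predicted peptides explain the most peaks.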
Comparative Genomics
In comparative genomics, bioinformatics is used to determine the genetic
structural and functional relationships between various biological species.
Transcriptomics
Transcriptomics is the study of the complete set of messenger RNA (mRNA)
molecules in a cell (Marini et al., 2021). It is also known as expression
profiling, and it uses DNA microarrays to measure the level of mRNA
expression in a specific cell population. Microarray technology produces
thousands of data values in a single run, and a single experiment can
require hundreds of runs, so a variety of software packages are used to
analyse the resulting data. In this way, bioinformatics makes it possible
to determine mRNA expression levels in transcriptome analysis. RNA
sequencing (RNA-seq) also falls under transcriptomics (Eagles et al.,
2021): the presence and quantity of RNA in a sample at a given moment are
determined using next-generation sequencing.
140 S. PATIL AND A. D. GUPTA
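To make the idea of determining expression levels concrete, the sketch below
normalises raw read counts to counts per million (CPM) and then computes a
log2 fold change between two samples. CPM is only one of several common
normalisations and is not prescribed by the text; the gene names, counts, and
pseudocount are hypothetical.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_fold_change(control: dict[str, float], treated: dict[str, float],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


# Hypothetical raw read counts for three genes in two samples.
control_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
treated_counts = {"geneA": 400, "geneB": 300, "geneC": 300}

changes = log2_fold_change(counts_per_million(control_counts),
                           counts_per_million(treated_counts))
for gene, value in changes.items():
    print(f"{gene}: {value:+.2f}")
```

A positive value indicates higher expression in the treated sample and a
negative value lower expression; dedicated packages add statistical testing
across replicates, which this sketch omits.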
Cheminformatics
Cheminformatics (chemical informatics) is the study of the storage,
indexing, searching, retrieval, and application of information about
chemical substances. It involves the logical organisation of chemical
data so that chemical structures, properties, and interactions are more
accessible. Using computer algorithms, it is in principle possible to
design a chemical with the required properties, to identify and
structurally modify a natural product, and to test its therapeutic
effectiveness. Cheminformatics analysis includes procedures such as
clustering, similarity searching, QSAR modelling, and virtual screening.
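A similarity search of the kind listed above is often based on the Tanimoto
coefficient between molecular fingerprints. The sketch below assumes each
compound is already represented as a set of "on" fingerprint bit positions; the
compound names, bit sets, and threshold are hypothetical, and a real workflow
would derive the fingerprints from chemical structures with a cheminformatics
toolkit.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return library compounds scoring at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints given as sets of "on" bit positions.
library = {
    "compound_1": {1, 2, 3, 4},
    "compound_2": {1, 2, 7, 8},
    "compound_3": {9, 10},
}
query = {1, 2, 3, 5}
print(similarity_search(query, library))
```

The same pairwise score can feed a clustering step or a first-pass virtual
screen, since both rely on ranking compounds by structural similarity.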
Gene Expression
Understanding the regulation of gene expression allows researchers to use
genetic data to build molecular technologies (e.g. gene expression
arrays) that can count the genes being transcribed in each cell at any
given time. This is the basis of functional genomics.
5.2 Medicines
In medicine, bioinformatics has several uses, including gene research,
drug development, and disease prevention. Medical applications of
bioinformatics are:
to those of the same species. The second is the flow of experimental data
from observed biological phenomena to explanatory models, followed by
further experiments that put those models to the test. One of the first
initiatives in bioinformatics, in the 1960s and 1970s, was the
organisation of collections of DNA sequence data and protein 3D
structural data. With the development of biological investigations that
quickly produce vast volumes of data (such as the multiple genome
sequencing projects and the large-scale analyses of gene expression and
of protein–protein interactions), it has grown into a thriving academic
and corporate sector. Clinical medicine (including clinical medical
information systems) has long been influenced by basic biological
science, and a new generation of epidemiologic, prognostic, diagnostic,
and therapeutic tools is emerging. Over the next decade, bioinformatics
activities that appear to be focused solely on basic research are
expected to become increasingly relevant to clinical informatics. DNA
sequence information and annotations, for example, will become more
common in medical records, and clinical information systems will soon
incorporate bioinformatics technologies originally developed for
research. The focus of genetic disorder research is shifting away from
single genes and towards uncovering networks of genes at the cellular
level, unravelling their intricate connections, and establishing their
role in disease. As a result, a new era of individually personalised
treatment will emerge. Bioinformatics will guide clinical researchers and
molecular biologists in exploiting the advantages of computational
biology. Clinical research teams that can move seamlessly from clinical
practice to the laboratory bench to these powerful computational tools
will be the most productive in the coming decades.
Personalised Medicine
Personalised medicine is a form of medical care in which each patient's
treatment is tailored to their genetic composition and specific needs. It
is possible because individuals differ genetically from one another. Two
points are key to the concept. The first is the set of goals that medical
research on personalised medicine pursues: shifting the focus of medicine
from reaction to prevention, selecting the optimal therapy, reducing the
length and cost of clinical trials, lowering the overall estimated cost
of health care, and lowering the likelihood of adverse drug reactions
(Zhang and Hong, 2015).
Forensic Analysis
Biomolecular data are becoming increasingly important in forensic
research, and several European countries are building forensic databases
that store the DNA profiles of known offenders and of crime scene samples
and support DNA testing. Statistical and technological developments, such
as TFT biosensors, DNA microarray sequencing, and machine learning
algorithms, provide an effective way of organising evidence and drawing
inferences from it, and have strengthened the field (Bianchi & Lio,
2007). Homology modelling is now used to build 3D models with which
results can be analysed or justified. Bioinformatics has changed the face
of molecular research by allowing researchers to determine gene structure
or sequence, protein structure, and molecular markers, as well as tie them to
5.3 Agriculture
5.3.1 Development of Drought Resistant Varieties
Drought stress caused by unpredictable precipitation is a major threat to
the global food supply, and its impact is likely to grow as climate
change progresses. Understanding how crops and other plants respond to
drought is crucial for developing superior varieties with consistently
high yields, which are needed to feed a growing population that relies on
diminishing land and water resources. The recent introduction of “-omics”
technologies, such as genomics, proteomics, and metabolomics, makes it
possible to investigate and discover the genetic elements that underpin
system complexity. The main challenge in this genomics era is storing and
managing the large amounts of data contained in the transcriptomic data
sets, and even genome scaffolds, that are available for most plant
species; it is no exaggeration to say that bioinformatics is well
integrated into modern -omics research. Tools for sequence analysis, de
novo genome assembly, similarity searching, genome sequencing, genome
annotation, and transcriptome, proteome, and metabolome analysis,
together with visualisation tools, help to analyse biological data and
provide new insights into the organisation of biological systems
(Dahiya & Lata, 2017). This -omics knowledge can then be applied to
improve crop quality and production, as well as disease resistance and
abiotic stress tolerance. In the post-genomics age, bioinformatics is
changing the way molecular biology research is designed, contributing
significantly to scientific knowledge while also opening new roles and
perspectives for genetic engineering programmes that aim to improve
stress tolerance.
6 Conclusion
Bioinformatics has become an important part of many biological fields. In
experimental molecular biology, bioinformatics methods such as image and
signal processing allow useful conclusions to be extracted from large
amounts of raw data. In genetics, it aids in sequencing and annotating
genomes and their reported mutations. Through text mining of the
biological literature and the development of biological and gene
ontologies, it aids in organising and querying biological data. It can
also be used to analyse the expression and regulation of genes and
proteins. Bioinformatics tools help in analysing, comparing, and
interpreting genetic and genomic data, and in understanding the
evolutionary aspects of molecular biology. At a more integrative level,
it aids in investigating and cataloguing the biological pathways and
networks that are crucial to systems biology. In structural biology, it
aids in the modelling and simulation of DNA, RNA, proteins, and
biomolecular interactions.
References
Allaby, R. G., & Woodwark, M. (2004). Phylogenetics in the bioinformatics
culture of understanding. Comparative and Functional Genomics, 5, 128–146.
Anderson, A. C. (2003). The process of structure-based drug design.
Chemistry & Biology, 10, 787–797.
Arora, P. K., Kumar, M., Chauhan, A., Raghava, G. P., & Jain, R. K.
(2009). OxDBase: A database of oxygenases involved in biodegradation. BMC
Research Notes, 2, 67.
Benson, D. A., Boguski, M. S., Lipman, D. J., Ostell, J., & Ouellette, B. F.
(1998). GenBank. Nucleic Acids Research, 26(1), 1–7.
Bianchi, L., & Lio, P. (2007). Forensic DNA and bioinformatics. Briefings in
Bioinformatics, 8(2), 117–128.
Breton, G., Johansson, A. C. V., Sjödin, P., Schlebusch, C. M., & Jakobsson,
M. (2021). Comparison of sequencing data processing pipelines and
application to underrepresented African human populations. BMC
Bioinformatics, 22, 488. https://doi.org/10.1186/s12859-021-04407-x
Cantor, C. R. (1998). How will the Human Genome Project improve our quality
of life? Nature Biotechnology, 16(3), 212–213.
Caspi, R., Altman, T., Dreher, K., Fulcher, C. A., Subhraveti, P., Keseler, I. M.,
Kothari, A., Kubo, A., Krummenacker, M., Latendresse, M., Mueller, L. A.,
Ong, Q., Paley, S., Subhraveti, P., Weaver, D. S., Weerasinghe, D., Zhang,
P., & Karp, P. D. (2012). The MetaCyc database of metabolic pathways and
enzymes and the BioCyc collection of pathway/genome databases. Nucleic
Acids Research, 40(D1), D742–D753.
Cello, J., Paul, A. V., & Wimmer, E. (2002). Chemical synthesis of poliovirus
cDNA: Generation of infectious virus in the absence of natural template.
Science, 297, 1016–1018.
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G., &
Thompson, J. D. (2003). Multiple sequence alignment with the Clustal series
of programs. Nucleic Acids Research, 31, 3497–3500.
Dahiya, B. L., & Lata, M. (2017). Bioinformatics impacts on medicine, microbial
genome and agriculture. Journal of Pharmacognosy and Phytochemistry, 6(4),
1938–1942.
Eagles, N. J., Burke, E. E., Leonard, J., et al. (2021). SPEAQeasy: A
scalable pipeline for expression analysis and quantification for
R/bioconductor-powered RNA-seq analyses. BMC Bioinformatics, 22, 224.
https://doi.org/10.1186/s12859-021-04142-3
Frantzi, M., Latosinska, A., & Mischak, H. (2019). Proteomics in drug devel-
opment: The dawn of a new era? Proteomics Clinical Applications, 5,
e1800087.
Gochhait, S. et al. (2021). Data Interpretation and Visualization of COVID-
19 Cases using R Programming. Informatics in Medicine Unlocked, 26(6).
Elsevier. ISSN: 0146-4116.
Greene, N. (2002). Computer systems for the prediction of toxicity: An update.
Advanced Drug Delivery Reviews, 54(3), 417–431.
Hanash, S. (2003). Disease proteomics. Nature, 422, 226–232.
Imming, P., Sinning, C., & Meyer, A. (2006). Drugs, their targets and the nature
and number of drug targets. Nature Reviews Drug Discovery, 5, 821–834.
Kumar, S., Nei, M., Dudley, J., & Tamura, K. (2008). MEGA: A biologist-centric
software for evolutionary analysis of DNA and protein sequences. Briefings in
Bioinformatics, 9(4), 299–306.
Mann, L., Seibt, K. M., Weber, B., et al. (2021). ECCsplorer: A pipeline
to detect extrachromosomal circular DNA (eccDNA) from next-generation
sequencing data. BMC Bioinformatics, 23, 40.
https://doi.org/10.1186/s12859-021-04545-2
Marini, F., Ludt, A., Linke, J., & Strauch, K. (2021). GeneTonic: An
R/Bioconductor package for streamlining the interpretation of RNA-seq
© The Editor(s) (if applicable) and The Author(s), under exclusive
license to Springer Nature Singapore Pte Ltd. 2022
S. Dutta and S. Gochhait (eds.), Information Retrieval in Bioinformatics,
https://doi.org/10.1007/978-981-19-6506-7