CH12


BIOINFORMATICS

Introduction to bioinformatics
▪ Bioinformatics is an interdisciplinary field of science that develops methods and software
tools for understanding biological data, especially when the data sets are large and complex.
▪ Bioinformatics uses biology, chemistry, physics, computer science, computer programming,
information engineering, mathematics and statistics to analyze and interpret biological data.
• It uses computational power, algorithms and software to extract knowledge from
biological data for analysis, prediction, imaging, and visualization.
Need for bioinformatics:
• Microbial analysis and computing.
• Recognizing and modelling protein structure.
• Developing treatments for contagious and dangerous diseases.
• Data storage and retrieval related to biotechnology.
• Searching for new medicines.
• Understanding agricultural trends, pest management, and crop management.
• Supporting research in evolutionary theory.
• Understanding the function of genes and gene therapy.
• Studying cell organization and function.
• Analysis of drug targets.
• Examining the characteristics of various diseases.
• Integration and development of various tools for the management of biological databases.

An introduction to global bioinformatics databases:
BIOLOGICAL DATABASES store and organize biological data for easy retrieval of information. These centralized
resources contain DNA and protein sequences and their associated information.
NUCLEOTIDE DATABASES are a type of biological database containing genetic information: DNA and
RNA sequences that come from a variety of sources, including whole genomes, transcriptomes, and
individual genes.
Applications of nucleotide databases
• Nucleotide databases are used to identify the gene or the function of a particular nucleotide sequence by
comparing an unknown sequence with the known sequences in the database.
• Nucleotide databases can be used to study and examine gene expression by using the sequence
information stored in the databases.
• Nucleotide databases are also used to identify potential drug targets and develop new therapies for genetic
diseases.
• Nucleotide databases also help in identifying genetic variations that may be linked to diseases, which
ultimately helps in the development of diagnostic tools and treatments.
• Nucleotide databases can be used in phylogenetic analysis to analyze the evolutionary relationships
between organisms, by comparing and examining their DNA or RNA sequences.
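The first application above, identifying an unknown sequence by comparing it with known sequences, can be illustrated with a toy sketch. The gene names, the sequences, and the `percent_identity` and `best_match` helpers are invented for illustration. Real database searches use alignment tools rather than this simple position-by-position comparison.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions over the shorter sequence (no gaps, no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def best_match(query: str, database: dict) -> tuple:
    """Return (name, identity) for the database entry most similar to the query."""
    scored = ((name, percent_identity(query, seq)) for name, seq in database.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical mini "database": names and sequences are made up.
KNOWN = {
    "gene_A": "ATGGCCATTGTAATGGGCCGC",
    "gene_B": "ATGAAACGCATTAGCACCACC",
    "gene_C": "TTGACCGATCCGGATTACGAA",
}

unknown = "ATGGCCATTGTTATGGGCCGC"
name, identity = best_match(unknown, KNOWN)
print(name, round(identity, 1))
```

The query differs from `gene_A` at a single position, so that entry is reported as the closest match.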
National Center for Biotechnology Information (NCBI)
• The National Center for Biotechnology Information (NCBI) is part of the United States
National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is
approved and funded by the government of the United States.
• The NCBI houses a series of databases relevant to biotechnology and biomedicine and is an
important resource for bioinformatics tools and services. Major databases include GenBank for
DNA sequences and PubMed, a bibliographic database for biomedical literature. Other databases
include the NCBI Epigenomics database. All these databases are available online through the
Entrez search engine.
• https://www.ncbi.nlm.nih.gov/

There are several nucleotide databases. Some of the most popular nucleotide databases are:
GenBank
• GenBank is a sequence database that contains a collection of annotated nucleic acid
sequence data.
• It includes various types of genetic material, such as genomic DNA, messenger RNA (mRNA),
complementary DNA (cDNA), expressed sequence tags (ESTs), high-throughput raw sequence data,
and sequence polymorphisms.
• GenBank and its collaborators receive sequences produced in laboratories throughout the
world from more than 500,000 formally described species.
• GenBank has become an important database for research in biological fields and has grown in
recent years at an exponential rate by doubling roughly every 18 months.
• https://www.ncbi.nlm.nih.gov/genbank/
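To make the "doubling roughly every 18 months" figure concrete, the short sketch below computes the implied growth factor over a few time spans. It assumes an idealized, perfectly regular doubling, N(t) = N0 x 2^(t/18) with t in months. The `growth_factor` helper is a name chosen for illustration, and no actual database sizes are used.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Multiplicative growth after `months` if the size doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# Idealized growth after 3, 6 and 9 years under an 18-month doubling time.
for years in (3, 6, 9):
    print(years, "years:", growth_factor(12 * years), "times the starting size")
```

Under this assumption the database is 4 times larger after 3 years and 16 times larger after 6 years.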

European Molecular Biology Laboratory (EMBL)
• The European Molecular Biology Laboratory (EMBL) maintains another nucleotide database,
which is part of the INSDC (International Nucleotide Sequence Database Collaboration).
• It is focused on the storage and distribution of nucleotide and protein sequences.
• EMBL also develops tools to help researchers analyze and interpret this data.
• https://www.ebi.ac.uk/
DNA Data Bank of Japan (DDBJ)
• The DNA Data Bank of Japan (DDBJ) is another nucleotide database that exchanges data with
GenBank and EMBL as a member of INSDC.
• DDBJ collects and exchanges nucleotide sequence data and manages bioinformatics tools for
data submission and retrieval. It also develops tools for biological data analysis and
organizes Bioinformatics Training Courses in Japanese.
• https://www.ddbj.nig.ac.jp/services/index-e.html?tag=search,DDBJ

Protein database
• Protein databases are biological databases that collect information about proteins.
• The information contained in protein databases includes the amino acid sequence, the
domain structure, the biological function of the protein, its three-dimensional structure, and
its interactions with other proteins.
• Several protein databases are publicly available. Based on the type of information stored,
protein databases can be classified into several categories. Some of the most common
categories of protein databases are as follows:
• Sequence Databases, Structure Databases, Interaction Databases, Functional Annotation
Databases, Disease-Associated Databases, Expression Databases
Protein Sequence Databases
• A protein sequence database contains the amino acid sequences of proteins and related
information. The amino acid sequence of a protein is important because it determines the
protein’s three-dimensional structure and function, as well as its identity.
Some of the most popular protein sequence databases are:
Swiss-Prot
• Swiss-Prot is a protein sequence database that provides a high level of
annotation, including information on the protein’s function, domain
structure, post-translational modifications, and variants.
• Swiss-Prot is jointly managed by the SIB (Swiss Institute of Bioinformatics)
and the EBI (European Bioinformatics Institute).
• The database distinguishes itself from other protein sequence databases by
three criteria: (i) annotations, which cover a broad range of information, (ii)
minimal redundancy, which ensures that each sequence is represented only
once, and (iii) integration with other databases, which enables cross-
referencing and retrieval of information from related databases.
• https://www.uniprot.org/

TrEMBL
• TrEMBL is a computer-annotated supplement of Swiss-Prot.
TrEMBL entries follow the Swiss-Prot format.
• It contains all the translations of EMBL (European
Molecular Biology Laboratory) nucleotide sequence entries
that have not yet been integrated into Swiss-Prot.
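Since TrEMBL entries are translations of nucleotide sequence entries, a minimal sketch of what "translation" means computationally may help. It uses the standard genetic code. The `translate` helper and the input sequence are invented for illustration, and the sketch ignores reading-frame detection, alternative genetic codes, and ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base until the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up coding sequence ending in the stop codon TGA.
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # prints MAIVMGR
```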
Applications of protein databases
Protein databases have numerous applications. Some of the applications are:
• Protein databases can be used in sequence analysis to identify homologous
sequences and predict protein functions based on sequence similarity.
• Protein databases can also be used for predicting protein structure by
comparing the amino acid sequence of a protein with known structures in the
database.
• Protein databases also include tools to study protein-protein interactions.
• Protein pattern and profile databases can be used for protein family
identification by identifying conserved motifs.
• Protein databases such as metabolic pathway databases can be used in drug
discovery and disease research by studying the metabolic pathways involved
in diseases.
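Identifying conserved motifs, as mentioned in the list above, amounts to pattern matching on the amino acid sequence. The sketch below searches for a pattern modelled on the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P}, rewritten as a regular expression. The `find_motifs` helper and the test sequence are made up for illustration. Pattern and profile databases hold many such signatures and use more sophisticated scoring.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# Wrapping the pattern in a lookahead lets overlapping occurrences be reported too.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motifs(protein: str) -> list:
    """Return (0-based start position, matched residues) for every occurrence."""
    return [(m.start(), m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]


# Made-up sequence: "NGTA" matches, "NPSA" does not because of the proline.
print(find_motifs("MNGTAANPSA"))  # prints [(1, 'NGTA')]
```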

DATA RETRIEVAL TOOLS:
• In databases, data retrieval is the process of identifying and extracting data from a
database, based on a query provided by the user or application.
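As a small illustration of query-based retrieval, the sketch below builds an in-memory SQLite database and extracts the records matching a user-supplied query. The table layout, accession numbers, and sequences are hypothetical and do not reflect the schema of any real biological database.

```python
import sqlite3

# In-memory database with a hypothetical "sequences" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, seq TEXT)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("X0001", "Escherichia coli", "ATGGCC"),  # made-up records
        ("X0002", "Homo sapiens", "ATGAAA"),
        ("X0003", "Homo sapiens", "TTGACC"),
    ],
)


def retrieve(organism: str) -> list:
    """Return (accession, sequence) rows whose organism matches the query."""
    cursor = conn.execute(
        "SELECT accession, seq FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return cursor.fetchall()


print(retrieve("Homo sapiens"))  # prints [('X0002', 'ATGAAA'), ('X0003', 'TTGACC')]
```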
Entrez
• Entrez is an integrated search engine which allows
users to search and retrieve different data from the
National Center for Biotechnology Information
(NCBI).
• It can be accessed from the
site www.ncbi.nlm.nih.gov/Entrez/.
• Entrez is NCBI’s major text search and retrieval system, which integrates the PubMed
database with 39 other databases covering scientific literature, nucleotide and protein
sequences, protein domain data, population study datasets, expression data, pathways and
systems of interacting molecules, complete genome details and taxonomic information into a
tightly interlinked system.
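Entrez can also be queried programmatically through NCBI's E-utilities web interface, which the slides do not cover. The sketch below only builds an ESearch request URL and does not contact the server. The `esearch_url` helper, the example search term, and the chosen `retmax` value are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an ESearch URL that would list record IDs matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# Example query: an illustrative search term, not taken from the slides.
url = esearch_url("nucleotide", "BRCA1[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

Opening the printed URL in a browser, or fetching it with an HTTP client, would return the matching record identifiers.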

TAXONOMY BROWSER:
• The Taxonomy Browser is a synthetic database that allows users to
examine the progress of DNA barcoding by browsing through the
different levels of the taxonomic hierarchy available on BOLD
(Barcode of Life Data System).
• Within the Taxonomy Browser, users can select phyla in the Animal,
Plant, Fungus, or Protist kingdoms to navigate from phylum to
species level. Statistics on the progress of DNA barcoding at each
taxon are generated from both public and private data while
protecting private user-owned data.
• The database allows browsing of the taxonomy tree, which contains a
classification of organisms.
• https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi
• https://v3.boldsystems.org/index.php/resources/handbook?chapter=2_databases.html&section=tax_browser
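Browsing a taxonomy tree from the top level down to species can be pictured as walking a nested data structure. The sketch below stores a tiny, hand-written fragment of the animal classification as nested dictionaries and prints every full lineage. The `lineages` helper and the choice of taxa are illustrative and do not reflect how either browser stores its data.

```python
# Hand-written fragment of a taxonomy tree: each key is a taxon, each value its children.
TAXONOMY = {
    "Animalia": {
        "Chordata": {
            "Mammalia": {"Primates": {"Hominidae": {"Homo": {"Homo sapiens": {}}}}},
        },
        "Arthropoda": {
            "Insecta": {
                "Diptera": {"Drosophilidae": {"Drosophila": {"Drosophila melanogaster": {}}}}
            },
        },
    },
}


def lineages(tree: dict, path: tuple = ()):
    """Yield the path from the root to every leaf (species) in the tree."""
    for name, children in tree.items():
        current = path + (name,)
        if children:
            yield from lineages(children, current)
        else:
            yield current


for lineage in lineages(TAXONOMY):
    print(" > ".join(lineage))
```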
