Bioinformatics
Bioinformatics is a scientific discipline that deals with the retrieval, storage, processing,
analysis, and management of biological information through computational techniques.
Bioinformatics is the field of science in which biology, computer science, and information
technology merge into a single discipline.
Basically, bioinformatics has three components:
1. The creation of databases allowing the storage and management of large biological data sets.
2. The development of algorithms and statistics to determine relationships among members of
large data sets.
3. The use of these tools for the analysis and interpretation of various types of biological data,
including DNA, RNA and protein sequences, protein structures, gene expression profiles, and
biochemical pathways.
Use of computers:
Bioinformatics is largely, although not exclusively, a computer-based discipline.
Computers are important in bioinformatics for two reasons: First, many bioinformatics
problems require the same task to be repeated millions of times.
Examples include comparing a new sequence to every other sequence stored in a database, or
comparing a group of sequences systematically to determine evolutionary relationships.
Second, computers are required for their problem-solving power. Typical problems that might
be addressed using bioinformatics include solving the folding pathway of a protein given
its amino acid sequence, or deducing a biochemical pathway given a collection of RNA
expression profiles.
Computers can help with such problems, but it is important to note that expert input and
robust original data are also required.
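The first reason above, repeating one comparison across a whole database, can be sketched as a simple loop. This is a minimal illustration, not how production search tools work: the toy database and the function name rank_database are hypothetical, and Python's generic difflib.SequenceMatcher.ratio() stands in for a proper alignment score.

```python
from difflib import SequenceMatcher


def rank_database(query, database):
    """Score the query against every entry; return (name, score) pairs, best first."""
    scores = []
    for name, seq in database.items():
        # ratio() is a generic string-similarity score in [0, 1]. It is used
        # here only as a stand-in for a real alignment score.
        score = SequenceMatcher(None, query, seq, autojunk=False).ratio()
        scores.append((name, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database; real databases hold millions of entries,
# which is why the same comparison must be automated.
toy_db = {
    "seq_a": "ATGGCGTACGTTAGC",
    "seq_b": "ATGGCGTACGATAGC",
    "seq_c": "TTTTTTTTTTTTTTT",
}

ranking = rank_database("ATGGCGTACGTTAGC", toy_db)
for name, score in ranking:
    print(name, round(score, 3))
```

The loop body is the same for every entry, so the cost grows with the size of the database. That growth is the motivation for the faster heuristic search programs described later.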
The Internet plays an important role in retrieving biological information. Bioinformatics is an
emerging dimension of biological science that draws on computer science, mathematics, and life
science. The computational part of bioinformatics is used to address biological problems such as
metabolic disorders and genetic disorders.
Databases:
There are many different types of database but for routine sequence analysis, the following are
initially the most important.
1. Primary databases: EMBL, GenBank, DDBJ, SWISS-PROT, TrEMBL, PIR.
2. Secondary databases: PROSITE, Pfam.
3. Composite databases: combine different sources of primary databases. Examples are NRDB
and OWL.
GenBank: GenBank (Genetic Sequence Databank) is one of the fastest growing repositories of
known genetic sequences. In addition to sequence data, GenBank files contain information like
gene names, phylogenetic classification and references to published literature.
EMBL: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and
RNA sequences collected from the scientific literature and patent applications and directly
submitted from researchers and sequencing groups.
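Swiss-Prot: This is a protein sequence database.
PDB: The Protein Data Bank (PDB) stores three-dimensional structures of proteins and other
macromolecules, many of them determined by X-ray crystallography.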
GDB: The GDB Human Genome Data Base supports biomedical research, clinical medicine,
and professional and scientific education by providing for the storage and dissemination of data
about genes and other DNA markers, map location, genetic disease and locus information, and
bibliographic information.
PIR-PSD: PIR (Protein Information Resource) produces and distributes the PIR-International
Protein Sequence Database (PSD). It is the most comprehensive and expertly annotated protein
sequence database.
Database Searching Algorithms
FASTA (European Bioinformatics Institute) [28-30]
BLAST (NCBI) [31]
Smith-Waterman
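Smith-Waterman, the last entry in the list above, finds the best-scoring local alignment between two sequences by dynamic programming. The sketch below computes only the alignment score, not the alignment itself. The scoring values (match=2, mismatch=-1, gap=-2) and the simple linear gap penalty are assumptions chosen for illustration, not values taken from any particular tool.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay at zero: an alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor of 0 is what makes the alignment local: a poorly
            # scoring prefix is dropped instead of carried forward.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the table is filled, so the running time is proportional to the product of the two sequence lengths. This is why heuristic programs such as FASTA and BLAST are usually preferred for searching large databases.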
FASTA: A suite of programs for database searching by homology. Each program is launched by
typing its name.
With these programs one can search:
1. a nucleotide sequence database with a nucleotide query sequence
2. a protein sequence database with a protein query sequence
One can also compare:
1. a DNA sequence to a DNA sequence database, or a protein sequence to a protein
sequence database
2. a protein sequence to a DNA sequence database
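Comparing a protein sequence to a DNA sequence database, the last case above, implies that the DNA is first translated into protein in each possible reading frame. The sketch below shows that translation step only. It assumes the standard genetic code and uses hypothetical helper names (translate, reverse_complement, six_frame_translations); it is not the code used by the FASTA suite.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse frames."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand, frame) for strand in strands for frame in range(3)]


for protein in six_frame_translations("ATGGCCTAA"):
    print(protein)
```

Each of the six translated frames can then be compared with the protein query in the same way as an ordinary protein database entry.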
BLAST: The Basic Local Alignment Search Tool uses a heuristic search algorithm based on word
matching.
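The word-matching idea can be sketched as follows: index every short word in a database sequence, then look up each word of the query to find candidate match positions (often called seeds). This is a simplified illustration under stated assumptions. It uses exact matches only, a hypothetical word size of 3, and hypothetical function names; the real BLAST program goes further, for example by extending seeds into longer alignments and scoring them.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map every word of length word_size to its start positions in sequence."""
    index = defaultdict(list)
    for pos in range(len(sequence) - word_size + 1):
        index[sequence[pos:pos + word_size]].append(pos)
    return index


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where an exact word is shared."""
    subject_index = index_words(subject, word_size)
    seeds = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        # .get() avoids adding empty entries to the index for unseen words.
        for s_pos in subject_index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


print(find_seeds("ACGTA", "TTACGT"))
```

Because word lookup is a dictionary access, most of the database can be skipped without a full alignment, which is where the speed of a heuristic search comes from. The trade-off is that a match with no shared word is missed.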
Applications:
One application of bioinformatics in drug design is to understand the connection between the
amino acid sequence of a protein and its 3D structure.
The structure of a protein gives an overview of how the protein will function. As a result,
identifying and classifying the protein is the most vital step, because its 2D and 3D
structure needs to be visualized.
The process of drug design is facilitated by understanding the structure of the target protein.
Prediction starts by identifying the gene and its amino acid sequence before moving on to the
purified protein, which results in more accurate prediction of the protein's structure.
Thanks to bioinformatics, various databases now list the 3D structures of
proteins and other macromolecules. Examples of such databases are the Molecular Modeling
Database (MMDB) and the Protein Data Bank (PDB).
The methods by which protein structure is predicted fall into three standard categories:
1. Ab initio (de novo) prediction is used when no protein of known structure has a sequence
similar to the target. It is based on the chemistry and physics of protein structure.
2. Homology modeling predicts the structure by comparison with a homologous sequence of
known structure, since homologous sequences tend to produce similar structures. However,
not every homologous sequence produces a structure similar enough to be useful.
3. Threading, or fold recognition, is used to predict the protein structure when two proteins
have similar three-dimensional structures but distinct primary sequences. Hence, this
method can help verify a structural alignment that is otherwise unknown. MAMMOTH is a
program used for structural alignment, and SCOP is a database that classifies proteins
by structure.