Bioinformatics Day1


Bioinformatics

Bioinformatics is the discipline of quantitative analysis of information relating to
biological macromolecules with the aid of computers. The development of bioinformatics
as a field is the result of advances in both molecular biology and computer science over the
past 30–40 years. The earliest bioinformatics efforts can be traced back to the 1960s,
although the word bioinformatics did not exist then. Probably the first major bioinformatics
project was undertaken by Margaret Dayhoff, who in 1965 developed the first protein sequence
database, called the Atlas of Protein Sequence and Structure. Subsequently, in the early 1970s, the
Brookhaven National Laboratory established the Protein Data Bank for archiving
three-dimensional protein structures. The first sequence alignment algorithm was developed by
Needleman and Wunsch in 1970. This was a fundamental step in the development of the field
of bioinformatics, which paved the way for the routine sequence comparisons and
database searching practiced by modern biologists. The first protein structure prediction
algorithm was developed by Chou and Fasman in 1974. The 1980s saw the establishment of
GenBank and the development of fast database searching algorithms such as FASTA by
William Pearson, followed in 1990 by BLAST from Stephen Altschul and coworkers. The start of the human
genome project in the late 1980s provided a major boost for the development of
bioinformatics. The development and the increasingly widespread use of the Internet in the
1990s made instant access to, and exchange and dissemination of, biological data possible.

Definition: Bioinformatics involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.

Goals: The ultimate goal of bioinformatics is to better understand a living cell and how it
functions at the molecular level. By analyzing raw molecular sequence and structural data,
bioinformatics research can generate new insights and provide a “global” perspective of the
cell. Analyzing sequence data helps explain how a cell functions because the flow of
genetic information is dictated by the “central dogma” of biology, in which DNA is
transcribed to RNA, which is translated to proteins. Cellular
functions are mainly performed by proteins whose capabilities are ultimately determined by
their sequences.
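
The flow from DNA to RNA to protein described above can be sketched in a few lines of Python. This is a toy illustration only: the input is assumed to be a coding-strand DNA sequence, the codon table is a small subset of the standard genetic code, and the function names `transcribe` and `translate` are chosen for this example.

```python
# Toy illustration of the central dogma: DNA -> RNA -> protein.
# CODON_TABLE is a deliberately incomplete subset of the standard genetic
# code; a codon missing from it would raise a KeyError.
CODON_TABLE = {
    "AUG": "M",                                       # methionine (start)
    "UUU": "F", "UUC": "F",                           # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine
    "AAA": "K", "AAG": "K",                           # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",               # stop codons
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCAAATAA"      # hypothetical sequence, not from a real gene
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

Because the protein string is computed entirely from the DNA string, the sketch mirrors the point made above: a protein's sequence, and ultimately its capabilities, are determined by the sequence that encodes it.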

Scope

Bioinformatics consists of two subfields:

1. The development of computational tools and databases.
2. The application of these tools and databases in generating biological knowledge to
better understand living systems.

Tool development includes writing software for sequence, structural, and functional
analysis, as well as the construction and curation of biological databases. These tools are used
in three areas of genomic and molecular biological research:
1. Molecular sequence analysis: The areas of sequence analysis include sequence
alignment, sequence database searching, motif and pattern discovery, gene and
promoter finding, reconstruction of evolutionary relationships, and genome assembly
and comparison.
2. Molecular structural analysis: Structural analyses include protein and nucleic acid
structure analysis, comparison, classification, and prediction.
3. Molecular functional analysis: The functional analyses include gene expression
profiling, protein-protein interaction prediction, protein subcellular localization
prediction, and metabolic pathway reconstruction and simulation.

Applications

1. Bioinformatics has a major impact on many areas of biotechnology and biomedical sciences.
2. It has applications in knowledge-based drug design, forensic DNA analysis, and
agricultural biotechnology.
3. Computational studies of protein–ligand interactions provide a rational basis for the
rapid identification of novel leads for synthetic drugs.
4. Knowledge of the three-dimensional structures of proteins allows molecules to be
designed that are capable of binding to the receptor site of a target protein with great
affinity and specificity.
5. This informatics-based approach significantly reduces the time and cost necessary to
develop drugs with higher potency, fewer side effects, and less toxicity compared with the
traditional trial-and-error approach.
6. In forensics, results from molecular phylogenetic analysis have been accepted as
evidence in criminal courts.
7. Sophisticated Bayesian statistics and likelihood-based methods for the analysis of
DNA have been applied in the analysis of forensic identity.
8. Genomics and bioinformatics are now poised to
revolutionize our healthcare system by developing personalized and customized
medicine.
9. High-speed genomic sequencing coupled with sophisticated informatics
technology will allow a doctor in a clinic to quickly sequence a patient’s genome,
easily detect potentially harmful mutations, and engage in early diagnosis and
effective treatment of diseases.
10. Bioinformatics tools are being used in agriculture as well. Plant genome databases
and gene expression profile analyses have played an important role in the
development of new crop varieties that have higher productivity and more resistance
to disease.

Biological Databases

A database is a computerized archive used to store and organize data in such a way that
information can be retrieved easily via a variety of search criteria. Databases are
composed of computer hardware and software for data management. The chief objective of
the development of a database is to organize data in a set of structured records to enable easy
retrieval of information. Each record, also called an entry, should contain a number of fields
that hold the actual data items, for example, fields for names, phone numbers, addresses,
and dates. To retrieve a particular record from the database, a user can specify a particular piece
of information, called a value, to be found in a particular field and expect the computer to
retrieve the whole data record. This process is called making a query. Although data retrieval
is the main purpose of all databases, biological databases often have a higher level of
requirement, known as knowledge discovery, which refers to the identification of
connections between pieces of information that were not known when the information was
first entered. For example, databases containing raw sequence information can perform extra
computational tasks to identify sequence homology or conserved motifs. These features
facilitate the discovery of new biological insights from raw data.
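
The idea of records, fields, values, and queries can be made concrete with a minimal Python sketch. The records, field names, and the helper `query` are all hypothetical and exist only for illustration; a real database would use a database management system rather than a list in memory.

```python
# Each record (entry) is a dict; its keys are the field names.
# All data below is invented for illustration.
records = [
    {"name": "A. Rao",   "phone": "555-0101", "city": "Pune"},
    {"name": "B. Singh", "phone": "555-0102", "city": "Delhi"},
    {"name": "C. Mehta", "phone": "555-0103", "city": "Pune"},
]


def query(records, field, value):
    """Return every whole record whose `field` holds exactly `value`."""
    return [record for record in records if record.get(field) == value]


# The user specifies a value ("Pune") to be found in a field ("city")
# and gets back the complete matching records.
for record in query(records, "city", "Pune"):
    print(record)
```

The query names only one field and one value, yet the whole record comes back, which is the retrieval behavior described above.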

TYPES OF DATABASES

Originally, databases all used a flat file format, which is a long text file that contains many
entries separated by a delimiter, a special character such as a vertical bar (|). Within each entry
are a number of fields separated by tabs or commas. This is manageable for a small database.
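
A flat file of this kind can be read with a few lines of Python. The sketch below assumes a vertical bar between entries and commas between fields, as in the description above; the field names, the sample entries, and the functions `parse_flat_file` and `find` are hypothetical.

```python
# A tiny flat file: entries separated by "|", fields separated by ",".
# The content is invented for illustration.
flat_file = (
    "P001,Hemoglobin alpha,Homo sapiens|"
    "P002,Lysozyme C,Gallus gallus|"
    "P003,Insulin,Bos taurus"
)

FIELD_NAMES = ("id", "name", "organism")  # assumed field order


def parse_flat_file(text, entry_delim="|", field_delim=","):
    """Split the text into entries, then each entry into named fields."""
    entries = []
    for raw_entry in text.split(entry_delim):
        raw_entry = raw_entry.strip()
        if not raw_entry:
            continue  # skip empty pieces, e.g. after a trailing delimiter
        values = [value.strip() for value in raw_entry.split(field_delim)]
        entries.append(dict(zip(FIELD_NAMES, values)))
    return entries


def find(entries, field, value):
    """Linear scan: every entry is examined for every search."""
    return [entry for entry in entries if entry[field] == value]


entries = parse_flat_file(flat_file)
print(find(entries, "organism", "Gallus gallus"))
```

Every search walks through the whole file, which is acceptable for a handful of entries and is one reason the format is described as manageable only for a small database.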

To facilitate the access and retrieval of data, sophisticated computer software programs for
organizing, searching, and accessing data have been developed. They are called database
management systems. Depending on the types of data structures, these database management
systems can be classified into two types: relational database management systems and
object-oriented database management systems. Consequently, databases employing these
management systems are known as relational databases or object-oriented databases,
respectively.

Relational Databases: Instead of using a single table as in a flat file database, relational
databases use a set of tables to organize data. Each table, also called a relation, is made up of
columns and rows. Columns represent individual fields. Rows represent values in the fields
of records. The columns in a table are indexed according to a common feature called an
attribute, so they can be cross-referenced in other tables. To execute a query in a relational
database, the system selects linked data items from different tables and combines the
information into one report. Therefore, specific information can be found more quickly from a
relational database than from a flat file database. Relational databases can be created and
queried using a special programming language called structured query language (SQL).

Figure 1: Example of constructing a relational database for five students’ course information
originally expressed in a flat file. By creating three different tables linked by common fields,
data can be easily accessed and reassembled.
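
A relational layout of the kind the caption describes can be sketched with Python's built-in `sqlite3` module. The figure's actual data is not reproduced here, so the table names, column names, and the five students below are assumptions made for illustration.

```python
import sqlite3

# In-memory database with three tables linked by common fields.
# Table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses  (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id  TEXT    REFERENCES courses(course_id)
);
""")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ben"), (3, "Chen"), (4, "Dana"), (5, "Eli")])
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [("BIO101", "Biology"), ("CHM201", "Biochemistry"),
                  ("MTH110", "Statistics")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [(1, "BIO101"), (1, "CHM201"), (2, "CHM201"), (3, "MTH110"),
                  (4, "BIO101"), (5, "CHM201"), (5, "MTH110")])


def students_in_course(conn, title):
    """Join the three tables on their shared fields and return student names."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.student_id
        JOIN courses AS c ON c.course_id = e.course_id
        WHERE c.title = ?
        ORDER BY s.name
        """,
        (title,),
    ).fetchall()
    return [name for (name,) in rows]


print(students_in_course(conn, "Biochemistry"))
```

Each student name and course title is stored once; the `enrollments` table holds only the links, and the query reassembles the information into one report.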

Object-Oriented Databases: One of the problems with relational databases is that the tables
used do not describe complex hierarchical relationships between data items. To overcome the
problem, object-oriented databases have been developed that store data as objects. In an
object-oriented programming language, an object can be considered as a unit that combines
data and mathematical routines that act on the data. The database is structured such that the
objects are linked by a set of pointers defining pre-determined relationships between the
objects. Searching the database involves navigating through the objects with the aid of the
pointers linking different objects. Programming languages like C++ are used to create object-
oriented databases.
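
The object model can be illustrated in Python, used here only as a convenient object-oriented language. The sketch shows objects that bundle data with routines and are linked by references that a search follows; it is not an object-oriented database management system, and the class names, gene symbol, and protein names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protein:
    name: str
    length: int  # number of amino acid residues

    def label(self):
        """A routine that acts on the object's own data."""
        return f"{self.name} ({self.length} aa)"


@dataclass
class Transcript:
    identifier: str
    protein: Optional[Protein] = None  # reference ("pointer") to a Protein


@dataclass
class Gene:
    symbol: str
    transcripts: List[Transcript] = field(default_factory=list)

    def protein_labels(self):
        """Navigate gene -> transcripts -> proteins by following references."""
        return [t.protein.label() for t in self.transcripts
                if t.protein is not None]


# Hypothetical hierarchy: one gene, two coding transcripts, one non-coding.
gene = Gene("geneX", [
    Transcript("tx1", Protein("ProteinA", 120)),
    Transcript("tx2", Protein("ProteinB", 95)),
    Transcript("tx3"),
])
print(gene.protein_labels())
```

The gene, transcript, and protein hierarchy is expressed directly through the links between objects, rather than being spread across several tables.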
BIOLOGICAL DATABASES

Current biological databases use all three types of database structures: flat files, relational,
and object oriented. Despite the obvious drawbacks of using flat files in database
management, many biological databases still use this format. The justification for this is that
this system involves a minimum amount of database design and the search output can be easily
understood by working biologists. Based on their contents, biological databases can be
roughly divided into three categories: primary databases, secondary databases, and
specialized databases. Primary databases contain original biological data. They are archives
of raw sequence or structural data submitted by the scientific community. GenBank and the
Protein Data Bank (PDB) are examples of primary databases. Secondary databases contain
computationally processed or manually curated information, based on original information
from primary databases. Translated protein sequence databases containing functional
annotation belong to this category. Examples are Swiss-Prot and the Protein Information
Resource (PIR). Specialized databases are those that cater to a particular research
interest. For example, FlyBase, the HIV sequence database, and the Ribosomal Database Project are
databases that specialize in a particular organism or a particular type of data.
