Bioinformatics Day1
Definition: Bioinformatics involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information related to biological
macromolecules such as DNA, RNA, and proteins.
Goals: The ultimate goal of bioinformatics is to better understand a living cell and how it
functions at the molecular level. By analyzing raw molecular sequence and structural data,
bioinformatics research can generate new insights and provide a “global” perspective of the
cell. The functions of a cell can be better understood by analyzing sequence data because
the flow of genetic information follows the “central dogma” of biology, in which DNA is
transcribed to RNA, which is translated to proteins. Cellular
functions are mainly performed by proteins whose capabilities are ultimately determined by
their sequences.
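The flow from DNA to RNA to protein described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example sequence, and the deliberately partial codon table are assumptions made for this sketch, and the DNA is assumed to be the coding strand read in frame from the first base.

```python
# Partial codon table (RNA codon -> one-letter amino acid code).
# Only the codons used in the example are listed; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
}


def transcribe(dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCAAATGGTAA"  # hypothetical example sequence
rna = transcribe(dna)
protein = translate(rna)
print(rna)      # AUGGCCAAAUGGUAA
print(protein)  # MAKW
```

A real analysis would use the full 64-codon table and handle reading frames and both strands; the sketch only shows why a protein sequence can be derived from sequence data.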
Scope
Tool development includes writing software for sequence, structural, and functional
analysis, as well as the construction and curation of biological databases. These tools are used
in three areas of genomic and molecular biological research:
1. Molecular sequence analysis: The areas of sequence analysis include sequence
alignment, sequence database searching, motif and pattern discovery, gene and
promoter finding, reconstruction of evolutionary relationships, and genome assembly
and comparison.
2. Molecular structural analysis: Structural analyses include protein and nucleic acid
structure analysis, comparison, classification, and prediction.
3. Molecular functional analysis: The functional analyses include gene expression
profiling, protein-protein interaction prediction, protein subcellular localization
prediction, and metabolic pathway reconstruction and simulation.
Applications
Biological Databases
A database is a computerized archive used to store and organize data in such a way that
information can be retrieved easily via a variety of search criteria. Databases are
composed of computer hardware and software for data management. The chief objective of
the development of a database is to organize data in a set of structured records to enable easy
retrieval of information. Each record, also called an entry, should contain a number of fields
that hold the actual data items, for example, fields for names, phone numbers, addresses, and
dates. To retrieve a particular record from the database, a user can specify a particular piece
of information, called a value, to be found in a particular field and expect the computer to
retrieve the whole data record. This process is called making a query. Although data retrieval
is the main purpose of all databases, biological databases often have a higher level of
requirement, known as knowledge discovery, which refers to the identification of
connections between pieces of information that were not known when the information was
first entered. For example, databases containing raw sequence information can perform extra
computational tasks to identify sequence homology or conserved motifs. These features
facilitate the discovery of new biological insights from raw data.
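The ideas of record, field, value, and query can be illustrated with a minimal Python sketch. The records, field names, and sequences below are invented for illustration, and the motif scan is a plain substring search standing in for the far more sophisticated methods real databases use.

```python
# Each record (entry) is a dict; its keys are the fields.
# All identifiers and sequences here are hypothetical.
records = [
    {"id": "seq1", "organism": "E. coli", "sequence": "ATGTATAAAGGC"},
    {"id": "seq2", "organism": "S. cerevisiae", "sequence": "GGCTATAAACCA"},
    {"id": "seq3", "organism": "E. coli", "sequence": "ATGCCCGGGTTT"},
]


def query(records, field, value):
    """Return every whole record whose given field holds the given value."""
    return [record for record in records if record.get(field) == value]


def records_with_motif(records, motif):
    """Return the ids of records whose sequence contains the motif."""
    return [record["id"] for record in records if motif in record["sequence"]]


# A query: find the value "E. coli" in the field "organism".
print([r["id"] for r in query(records, "organism", "E. coli")])  # ['seq1', 'seq3']

# An extra computational task: which entries share a conserved motif?
print(records_with_motif(records, "TATAAA"))  # ['seq1', 'seq2']
```

The second call hints at knowledge discovery: the link between seq1 and seq2 was not entered by anyone, but emerges from computing over the stored data.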
TYPES OF DATABASES
Originally, databases all used a flat file format, which is a long text file that contains many
entries separated by a delimiter, a special character such as a vertical bar (|). Within each entry
are a number of fields separated by tabs or commas. This is manageable for a small database,
but searching becomes slow as the file grows, because every entry has to be read in turn.
To facilitate the access and retrieval of data, sophisticated computer software programs for
organizing, searching, and accessing data have been developed. They are called database
management systems. Depending on the types of data structures, these database management
systems can be classified into two types: relational database management systems and
object-oriented database management systems. Consequently, databases employing these
management systems are known as relational databases or object-oriented databases,
respectively.
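A flat file of the kind described above can be read with a few lines of Python. This is a sketch under stated assumptions: entries are separated by a vertical bar, fields by tabs, and the student data and function names are made up for illustration.

```python
# Hypothetical flat file: three entries separated by "|",
# each with three tab-separated fields (id, name, course).
flat_file = "S001\tAlice\tBiology|S002\tBob\tChemistry|S003\tCarol\tBiology"


def parse_flat_file(text, entry_delimiter="|", field_delimiter="\t"):
    """Split the text into entries, then each entry into its fields."""
    entries = []
    for raw_entry in text.split(entry_delimiter):
        raw_entry = raw_entry.strip("\n")
        if raw_entry:  # skip empty pieces, e.g. after a trailing delimiter
            entries.append(raw_entry.split(field_delimiter))
    return entries


def search(entries, field_index, value):
    """Linear scan: every entry is inspected, whatever the file size."""
    return [entry for entry in entries if entry[field_index] == value]


entries = parse_flat_file(flat_file)
print(search(entries, 2, "Biology"))
# [['S001', 'Alice', 'Biology'], ['S003', 'Carol', 'Biology']]
```

The search visits every entry in turn, which is acceptable for a handful of entries and is one reason dedicated database management systems are preferred for large collections.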
Relational Databases: Instead of using a single table as in a flat file database, relational
databases use a set of tables to organize data. Each table, also called a relation, is made up of
columns and rows. Columns represent individual fields. Rows represent values in the fields
of records. The columns in a table are indexed according to a common feature called an
attribute, so they can be cross-referenced in other tables. To execute a query in a relational
database, the system selects linked data items from different tables and combines the
information into one report. Therefore, specific information can be found more quickly from a
relational database than from a flat file database. Relational databases can be created using a
special programming language called structured query language (SQL).
Figure 1: Example of constructing a relational database for five students’ course information
originally expressed in a flat file. By creating three different tables linked by common fields,
data can be easily accessed and reassembled.
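The idea of several tables linked by common fields can be tried out with Python's built-in sqlite3 module. The sketch below does not reproduce the contents of Figure 1; the table names, column names, and the small set of students and courses are assumptions chosen for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Three tables (relations) linked by the common fields student_id and course_id.
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        course_id INTEGER REFERENCES courses(course_id)
    );
""")

# Hypothetical data.
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [(10, "Biology"), (20, "Chemistry")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 10), (1, 20)])

# The query combines linked rows from all three tables into one report.
rows = conn.execute("""
    SELECT s.name
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.student_id
    JOIN courses AS c ON c.course_id = e.course_id
    WHERE c.title = ?
    ORDER BY s.name
""", ("Biology",)).fetchall()

names = [row[0] for row in rows]
print(names)  # ['Alice', 'Carol']
conn.close()
```

Each student name and course title is stored once, and the enrollments table records only the links between them, which is what allows the data to be reassembled on demand.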
Object-Oriented Databases: One of the problems with relational databases is that the tables
used do not describe complex hierarchical relationships between data items. To overcome the
problem, object-oriented databases have been developed that store data as objects. In an
object-oriented programming language, an object can be considered as a unit that combines
data and mathematical routines that act on the data. The database is structured such that the
objects are linked by a set of pointers defining pre-determined relationships between the
objects. Searching the database involves navigating through the objects with the aid of the
pointers linking different objects. Programming languages like C++ are used to create object-
oriented databases.
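The notion of objects that bundle data with routines and point to related objects can be sketched as follows. Python is used here only to stay consistent with the other examples in these notes; the classes, attribute names, and example values are hypothetical and do not correspond to any particular object-oriented database system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protein:
    name: str
    sequence: str

    def length(self) -> int:
        """A routine that acts on the object's own data."""
        return len(self.sequence)


@dataclass
class Transcript:
    name: str
    protein: Optional[Protein] = None  # pointer to the encoded protein, if any


@dataclass
class Gene:
    name: str
    transcripts: List[Transcript] = field(default_factory=list)  # pointers

    def proteins(self) -> List[Protein]:
        """Navigate gene -> transcripts -> proteins by following pointers."""
        return [t.protein for t in self.transcripts if t.protein is not None]


# Hypothetical hierarchy: one gene, two transcripts, one of them coding.
protein = Protein(name="protA", sequence="MAKW")
gene = Gene(
    name="geneA",
    transcripts=[
        Transcript(name="geneA-t1", protein=protein),
        Transcript(name="geneA-t2"),  # non-coding transcript
    ],
)

print([p.name for p in gene.proteins()])  # ['protA']
print(gene.proteins()[0].length())        # 4
```

The hierarchical relationship is expressed directly by the references between objects, and a search amounts to walking those references rather than joining tables.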
BIOLOGICAL DATABASES
Current biological databases use all three types of database structures: flat files, relational,
and object-oriented. Despite the obvious drawbacks of using flat files in database
management, many biological databases still use this format. The justification for this is that
this system involves a minimal amount of database design and the search output can be easily
understood by working biologists. Based on their contents, biological databases can be
roughly divided into three categories: primary databases, secondary databases, and
specialized databases. Primary databases contain original biological data. They are archives
of raw sequence or structural data submitted by the scientific community. GenBank and
Protein Data Bank (PDB) are examples of primary databases. Secondary databases contain
computationally processed or manually curated information, based on original information
from primary databases. Translated protein sequence databases containing functional
annotation belong to this category. Examples are Swiss-Prot and the Protein Information
Resource (PIR). Specialized databases are those that are related to a particular research
interest. For example, FlyBase, the HIV sequence database, and the Ribosomal Database Project are
databases that specialize in a particular organism or a particular type of data.