Unit 1
Introduction to Bioinformatics
Syllabus
Five modules
– Introduction to Bioinformatics
– Sequence Alignment and Dynamic
Programming
– Sequence Databases and Uses
– Evolutionary Trees and Phylogeny
– Special Topics in Bioinformatics
Data & Information
• Data is a representation of a fact, figure,
or idea.
• In computer science – data are numbers,
words, images, etc.
• Information is an ordered sequence of
symbols.
Information Technology
• Information technology (IT) is "the
acquisition, processing, storage and
dissemination of vocal, pictorial, textual
and numerical information by a
microelectronics-based combination of
computing and telecommunications"
– Dennis Longley, Michael Shain (1985)
Dictionary of Information Technology
Informatics
• Informatics is the study of the application of
computer and statistical techniques to the
management of information.
• Biology in the 21st century is being
transformed from a purely lab-based
science to an information science as well.
Bioinformatics
• Bioinformatics is the field of science in
which biology, computer science, and
information technology merge to form a
single discipline.
– National Center for Biotechnology Information (NCBI)
Bioinformatics
• Bioinformatics is the marriage
of biology and information
technology.
• Bioinformatics is the application of
statistics and computer science to the field
of molecular biology.
Bioinformatics
• The term bioinformatics
was coined by Paulien
Hogeweg and Ben Hesper
at Utrecht University,
Netherlands, in 1978 for
the study of informatic
processes in biotic systems.
Bioinformatics
• Bioinformatics encompasses any computational
tools and methods used to manage, analyze and
manipulate large sets of biological data. Three
components:
– Creation of databases allowing the storage and
management of large biological data sets.
– Development of algorithms and statistics to
determine relationships among members of large
data sets.
– Use of tools for the analysis and interpretation of
various types of biological data, including DNA, RNA
and protein sequences, protein structures, gene
expression profiles and biochemical pathways.
• The actual process of analyzing and interpreting data
is referred to as computational biology.
Computers in Bioinformatics
• Repeat same task millions of times
• Problem solving
Central Dogma of
Molecular Biology
Why genetics is important
[Figure: diagram relating Genetics, Environment, G×E interaction, and Health]
[Figure: ISI Web of Science topic search for "genetic AND disease"; number of journal records per year (axis 0 to 8000), 1991 to 2005]
How genes work
What is a gene?
• A gene is a stretch of DNA whose
sequence determines the structure and
function of a specific functional molecule
(usually a protein)
[Figure: DNA sequence fragments (…GAATTCTAATCTCCCTCTC, AACCCTACAGTCACCCATTT, GGTATATTAAAGATGTGTTG, TCTACTGTCTAGTATCC…) leading to a function, shown alongside a computer program fragment (sf(){document.f.q.focus()}…)]
mRNA
Transcription movie
Translation
Translation movie
Gene expression movie
Summary
• A gene is a length of DNA that contains
instructions for making a specific protein
• In humans, genes are arranged along 23 pairs of
chromosomes in the cell nucleus
• Genes work by specifying the amino acid
sequence of a protein
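The summary above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and a coding-strand DNA input that is read from its first base; the function names transcribe and translate and the example sequence are made up for this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")  # hypothetical three-codon gene
print(mrna, translate(mrna))
```

Each group of three bases selects one amino acid, which is the sense in which a gene specifies the amino acid sequence of a protein.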
Summary
• Post-genomic genetics has enormous
promise for tracking down the genes
involved in common complex diseases
• Currently our ability to exploit this potential
is limited by
– study size
– difficulty of correcting for confounding factors
Components of a Digital Computer System
Bioinformatics and Internet
• Biological information is stored on many different
computers around the world.
• The easiest way to access this information is for
the computers to be joined together in a
network.
• A computer network is a collection of computers
and devices interconnected by communications
channels that facilitate communication among
users and allow users to share resources.
WORLD INTERNET USAGE AND POPULATION STATISTICS
World region: Latin America/Caribbean
– Population (2010 est.): 592,556,972
– Internet users, Dec. 31, 2000: 18,068,919
– Internet users, latest data (June 30, 2010): 204,689,836
– Penetration (% of population): 34.5 %
– Growth 2000-2010: 1,032.8 %
Source: http://www.internetworldstats.com/stats.htm
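The penetration and growth percentages for Latin America/Caribbean follow from the three counts given above. The sketch below recomputes them, assuming penetration means users divided by population and growth means the relative increase in users from 2000 to 2010; the function names are illustrative.

```python
population_2010 = 592_556_972
users_2000 = 18_068_919
users_2010 = 204_689_836


def penetration_percent(users: int, population: int) -> float:
    """Share of the population that uses the Internet, in percent."""
    return 100 * users / population


def growth_percent(old_users: int, new_users: int) -> float:
    """Relative increase in users between two dates, in percent."""
    return 100 * (new_users - old_users) / old_users


print(round(penetration_percent(users_2010, population_2010), 1))
print(round(growth_percent(users_2000, users_2010), 1))
```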
Internet
• The internet is an international network of computers
derived from an earlier system, ARPAnet, developed by
the US military.
• The foundations of the Internet were formed when packet-
switching networks came into operation in the 1960s.
• Transmitted data is broken up into small packets of data,
sent to its destination, and reassembled at the other side.
• This means that a single signal can be routed to multiple
users, and an interrupted packet may be re-sent without
loss of transmission.
• Packets can be compressed for speed and encrypted for
security.
• Internet Access – Hardware (network card and/or
modem), Software, Permission for network access
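The packet idea described above can be illustrated with a toy model. This is not how real network hardware works; it only assumes that each packet carries a sequence number so the receiver can restore the original order. The function names and the packet size are made up for the sketch.

```python
import random


def split_into_packets(data: bytes, size: int) -> list:
    """Cut data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list) -> bytes:
    """Put the chunks back in sequence-number order and join them."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)


message = b"GAATTCTAATCTCCCTCTCAACCCTACAGTCACCCATTT"
packets = split_into_packets(message, 8)
random.shuffle(packets)  # packets may arrive in any order
print(reassemble(packets) == message)
```

Because every packet is numbered, a lost packet could be requested again by its number without resending the whole message.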
TCP/IP
• Information transfer over the internet is
governed by a set of protocols (procedures for
handling data packets) called TCP/IP.
• TCP is the Transmission Control Protocol, which
determines how data is broken into packets
and reassembled.
• IP is the Internet Protocol, which determines
how the packets of information are addressed
and routed over the network.
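From a program's point of view, TCP/IP is usually reached through sockets: the IP address and port say where the data goes, and TCP delivers the bytes complete and in order. The sketch below sends a short message over the local loopback address using Python's standard socket module. It assumes a small payload, since a single-threaded sender and receiver would stall on a very large one, and the function name is illustrative.

```python
import socket


def send_over_loopback(payload: bytes) -> bytes:
    """Send payload through a TCP connection on 127.0.0.1 and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the system pick a free port
        server.listen(1)
        address = server.getsockname()  # (IP address, port) used for routing
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(address)
            connection, _ = server.accept()
            with connection:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal that sending is finished
                received = bytearray()
                while chunk := connection.recv(4096):
                    received.extend(chunk)
    return bytes(received)


print(send_over_loopback(b"ATGGCCTAA"))
```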
FTP
• File Transfer Protocol (FTP) is a standard
network protocol used to copy a file from one host
to another over a TCP-based network, such as
the Internet. FTP is built on a client-server
architecture and utilizes separate control and data
connections between the client and server. FTP
users may authenticate themselves using a clear-
text sign-in protocol but can connect anonymously
if the server is configured to allow it.
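An anonymous FTP download can be sketched with Python's standard ftplib module. The host and file paths are parameters here because no particular server is assumed; the function name fetch_file is made up for the example, and the call only succeeds if the server allows anonymous access.

```python
from ftplib import FTP


def fetch_file(host: str, remote_path: str, local_path: str) -> None:
    """Download one file from an FTP server using an anonymous login."""
    with FTP(host) as ftp:  # opens the control connection
        ftp.login()  # no arguments: anonymous sign-in
        with open(local_path, "wb") as out:
            # The file contents travel over a separate data connection.
            ftp.retrbinary(f"RETR {remote_path}", out.write)


# Example call with a hypothetical server and path:
# fetch_file("ftp.example.org", "pub/README", "README")
```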
Telnet
• Telnet is a network protocol used on the Internet or local
area networks to provide a bidirectional interactive text-
oriented communications facility using a
virtual terminal connection.
• Telnet is a user command and an underlying TCP/IP
protocol for accessing remote computers. Through
Telnet, an administrator or another user can
access someone else's computer remotely.
• On the Web, HTTP and FTP protocols allow one to
request specific files from remote computers, but not to
actually be logged on as a user of that computer.
• With Telnet, one can log on as a regular user with
whatever privileges may have been granted to the
specific application and data on that computer.
Telnet
• A Telnet command request looks like this:
telnet the.libraryat.whatis.edu
The result of this request would be an invitation
to log on with a userid and a prompt for a
password. If accepted, one would be logged on
like any user who used this computer every day.
• Telnet is most likely to be used by program
developers and anyone who has a need to use
specific applications or data located at a
particular host computer.
WWW
• The World Wide Web is a way of
exchanging information over the Internet
using a program called a browser.
• The WWW was proposed in 1989 by Tim Berners-Lee
at CERN and became publicly available in 1991. It
allows the display of information pages
containing multimedia objects in a special
format called hypertext.
– URL or Hyperlink
Gateway sites for Bioinformatics on WWW
• http://www.ncbi.nlm.nih.gov/
• http://www.ebi.ac.uk/
• http://www.expasy.ch/
• http://www.genome.jp/kegg/
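A URL such as the ones above has a fixed structure: a scheme (the protocol), a host name, and a path on that host. As a small illustration, Python's standard urllib.parse module can split the gateway addresses into these parts; the helper name describe is made up for this sketch, and no network access is needed.

```python
from urllib.parse import urlparse

GATEWAYS = [
    "http://www.ncbi.nlm.nih.gov/",
    "http://www.ebi.ac.uk/",
    "http://www.expasy.ch/",
    "http://www.genome.jp/kegg/",
]


def describe(url: str) -> dict:
    """Split a URL into its scheme, host name, and path."""
    parts = urlparse(url)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


for url in GATEWAYS:
    print(describe(url))
```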