Pairwise Sequence Allignment

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 108

Bioinforma

tics

Biological Research
Traditional biology
Small team working on a specialized topic
Small team working on limited numbers of
targets
( gene, enzyme) in a well defined
experiment to answer precise questions
New high-throughput biology
Large teams working on many targets
(thousand of
genes, enzymes) using cutting edge
technology
Large amount of information is being
created every day.
HelixInfoSystems

Paradigm Shift in
Biology
To use the flood of knowledge, which will
pour across the
computer networks of the world, biologists
not only must
become computer literate, but also change
their approach
to the problem ofHelixInfoSystems
understanding life.

Whats Really Next


Bridging the gap between the real
world of biology
and precise logical nature of
computers with an
interdisciplinary perspective.
HelixInfoSystems

The Problem
How to store and use
these
datas in a useful way?

HelixInfoSystems

The Solution
Leads to a new
insight
called
Bioinformatics.

HelixInfoSystems

What is Bioinformatics?
Using information technology to help solve
biological problems by designing novel and
incisive algorithms and methods of analysis
Bioinformatics = Data Mgmt + Knowledge
Discovery
Data Mgmt =Integration + Transformation +
Cleansing
Knowledge Discovery = Statistics +
Algorithms + Databases
HelixInfoSystems

Where is Bioinformatics
Used?
Genome Projects need to store and
organize DNA sequences Database
Development
How do we find protein coding regions,
introns and exons in genomic DNA
sequences? Gene Finding
Under which conditions is a certain gene
transcribed? Gene Expression Analysis
HelixInfoSystems

Where is Bioinformatics
Used?
What do we know about a specific
protein?
How can we compare protein
sequences? Sequence Alignment
Can we predict protein structures?
Protein Structure Prediction
HelixInfoSystems

The Aims of
Bioinformatics are threefold
The First
To organise data in a way that allows
researchers
to access existing information and
to submit new
entries as they are produced
HelixInfoSystems

10

The Second

To develop tools,algorithms and statistics


that aid
in the analysis of data.

HelixInfoSystems

11

The Third
To use these tools to analyse the data
and interpret the
results in a biologically meaningful
manner.
HelixInfoSystems

12

Challenges in
Bioinformatics

Explosion of information

Need for faster, automated analysis to process


large amounts of data
Need for integration between different types of
information (sequences, literature, annotations,
protein levels, RNA levels etc)
Need for smarter software to identify
interesting relationships in very large data sets

HelixInfoSystems

13

Bioinformatics software
By combining computational
methods with
experimental biology, major
discoveries can be
made faster and more efficiently.
HelixInfoSystems

14

Software Development
Flow Chart
Analyze the
Problem
Refine
Develop Algorithm
Implementation
using
Programming

Further Experiment
Define

HelixInfoSystems

15

Lack of
Bioinformaticians
Software needs to be easier to access, use
and
understand
Biologists need to learn about the software,
its
limitations, and how to interpret its results
HelixInfoSystems

16

A Bioinformatician
should be a
Database user
Tool user
Database developer
Tool developer
Training, practicing or developing
Doing bioinformatics experiments

HelixInfoSystems

17

The future of
Bioinformaticians
In the future, bioinformatics is likely to
become more
central to the way biology is done
Bioinformatics we are woefully short in
terms of having a critical mass of people who
understand both biology and computational
approaches
HelixInfoSystems

18

Representing and
Retrieving Sequences

Definition
A sequence is a linear set of characters
(sequence elements) representing
nucleotides or amino acids
DNA composed of four nucleotides or
"bases": A,C,G,T
RNA composed of four also: A,C,G,U (T
transcribed as U)
proteins are composed of amino acids
(20)
HelixInfoSystems

20

Representation of
Sequences
characters
simplest
easy to read, edit, etc.
bit-coding
more compact, both on disk and
in memory
comparisons more efficient
HelixInfoSystems

21

Character representation
of sequences
DNA or RNA
use 1-letter codes (e.g., A,C,G,T)
protein
use 1-letter codes
can convert to/from 3-letter codes
(e.g., A = Ala = Alanine
C = Cys = Cysteine)
HelixInfoSystems

22

Representing uncertainty
in nucleotide sequences
It is often the case that we would like to
represent uncertainty in a nucleotide
sequence, i.e., that more than one base is
possible at a given position
to express ambiguity during sequencing
to express variation at a position in a
gene during evolution
to express ability of an enzyme to
tolerate more than one base at a given
position of a recognition site
HelixInfoSystems

23

Representing uncertainty in
nucleotide sequences
To do this for nucleotides, we use a
set of single character codes that
represent all possible combinations
of bases
This set was proposed and adopted
by the International Union of
Biochemistry and is referred to as
the I.U.B. code
HelixInfoSystems

24

The I.U.B. Code

A, C, G, T, U
R = A, G (puRine)
Y = C, T (pYrimidine)
S = G, C (Strong hydrogen
bonds)
W = A, T (Weak hydrogen bonds)
M = A, C (aMino group)
K = G, T (Keto group)
B = C, G, T (not A)
D = A, G, T (not C)
H = A, C, T (not G)
V = A, C, G (not T/U)
N = A, C, G, T/U (iNdeterminate)
X or - are sometimes used

HelixInfoSystems

25

Representing
uncertainty in protein
sequences

Given the size of the amino acid


alphabet, it is not practical to design a
set of codes for ambiguity in protein
sequences
Fortunately, ambiguity is less common
in protein sequences than in nucleic acid
sequences
Could use bit-coding as for nucleic acids
but rarely done
HelixInfoSystems

26

Sequence
Analysis

- Exploring the unknown information


about the
Biological Sequences (DNA,RNA &
Protein) by
using various statistical
methods .
HelixInfoSystems

27

What kinds of analyses


are there?

HelixInfoSystems

28

Frequency
analysis

Determine the frequency of occurrence


of sequence elements.
Applications
using oligomer frequency to
distinguish coding and noncoding
regions.
frequency of amino acids for predicting
3D structure of protein, functionality,
location in the cell.
HelixInfoSystems

29

Sequence
Alignment
Common mistake:
sequence similarity is not
homology!
Homologous sequences: derived
from a
common ancestor

Relation of
sequences

Homologs: similar sequences in 2 different


organisms
derived from a common ancestor sequence.

Orthologs: Similar sequences in 2 different


organisms
that have arisen due to a speciation event
Functionality Retained.

Relation of
sequences
Paralogs: Similar sequences within a single
organism

that have arisen due to a gene duplication event.

Xenologs: similar sequences that have arisen


out of

horizontal transfer events (symbiosis, viruses,


etc)

Measuring homology by
pairwise alignment
Similar

protein sequence means


similar function.
Infer

the function of the new gene


from the known one.
Needs

alignment of nucleic acid


sequences or HelixInfoSystems
amino acid

33

Database
searching
One

of the most common


sequence alignment application is
querying of sequence databases.
Develop

database query system


for sequence querying.(that is,
making good sequence alignment
algorithm)
HelixInfoSystems

34

Multiple
alignment
Simultaneously aligning a number
of sequences to produce optimal
global or local alignments (of similar
subsequences).
Applications
global alignment : measurement
of evolutionary distance between
species
local alignment : identifying
subdomains HelixInfoSystems
within sequences(ex.35

Finding regions of interest in


nucleic acid sequence
Identification of gene
components.

exon/intron boundaries,

promoters,
HelixInfoSystems

36

Gene finding
Parts

of a gene is scattered over

DNA.
An approach - identify exactly
where four specific signal type can
be found.
The start codon. (ATG)
Beginning of each intron.(GT-)
End of each intron.(-AG)
The stop codon.
HelixInfoSystems

37

Gene finding
The

two major types : eukaryotic


and prokaryotic
eukaryotic gene finding is much
harder
presence of introns.
Coding density is low.
Underlying technologies : hidden
Markov models, decision trees,
HelixInfoSystems
38
neural networks...

Analyzing protein
sequence

Proteins

3D-structure determines
structural and functional properties.
But we only know its 1D-sequence.
Prediction is needed.
Proteins structure
primary structure : sequence of
amino acids
secondary structure : folding of
amino sequence
HelixInfoSystems
39

Whole genome
analysis
Looking at genomes
in their entirety.
Still

growing and needs explorations.


Examples
characterization of genes that can be
assigned to functional roles by homology
examination of specific oligomer content
across the whole genome
identify laterally transferred genes
prediction of events which caused
evolution of two genomes from a common
HelixInfoSystems
40
ancestor

Pairwise Sequence
Similarity

HelixInfoSystems

41

Why compare
sequences?
Understanding the function, role, and/or structure of
biomolecular sequences (DNA, RNA, proteins)
Evolution reuses, builds
modifies
successful structures.

on,

duplicates,

High sequence similarity usually


significant
functional or structural similarity.

HelixInfoSystems

and

implies

42

Protein vs.
Amino acids
DNA
20 letter alphabet

Each with specific chemical


properties essential for protein
function
Nucleotides
4 letter alphabet
Chemical properties of bases
not directly relevant to
function
Also need to search reverse
complement
HelixInfoSystems

43

Fundamental
Concepts
&
Definitions

Question?
Given two strings

what is their distance?

HelixInfoSystems

45

Answer
One way of measuring their distance is to look
at what will take to transform X into Y
Edit distance
Edit operations performed on X to change to Y
I ) insertion, D ) deletion, S ) substitution
M ) match (no edit operation is performed)
Example
X = applied
Y = prince
HelixInfoSystems

46

Example
6 edit operations
Edit transcript DMSDMIIMD
appli__ed
_pr_ince_

Is a sequence of edits
unique?
HelixInfoSystems

47

Example continued
SSSSSMD
applied
prince_

DDMIIIIDDMD
app____lied
__princ__e_

DDDDDDDIIIIII
applied______
_______prince

6 edit operations
9 edit operations

13 edit operati

Which one is better than


the other?
HelixInfoSystems

48

Hamming Distance
The number of mismatching
position between the two strings
Example: The Hamming distance
between "toned" and "roses" is 3.

HelixInfoSystems

49

Levenshtein Distance
The minimum number of edit
operation required to convert one
string to another.
Similar to Edit Distance

HelixInfoSystems

50

Alignments
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Three elements:
Perfect matches
Mismatches
Gaps
HelixInfoSystems

51

Choosing
Alignments
There are many possible alignments
For example, compare:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

to
------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--

Which one is better?


HelixInfoSystems

52

Scoring
Rule
Example Score =
(# matches) (# mismatches)
(# gaps) x 2

HelixInfoSystems

53

Example
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A

Score: (+1x13) + (-1x2) + (-2x4) = 3


------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA

Score: (+1x5) + (-1x6) + (-2x11) = -23


HelixInfoSystems

54

The optimal
alignment
An
optimal There are two possible linear alignments
1.
GDVEKGKKIFIMKCSQ
alignment is one
| |||||
GCVEKGKIFINWCSQ
that exhibits the
2.
GDVEKGKKIFIMKCSQ
most
||| |||
correspondences,
GCVEKGKIFINWCSQ
and
the
least
GDVEKGKKIFIMKCSQ
differences. | ||||| ||| |||
GCVEKGK-IFINWCSQ

Insertion of one break maximizes the identities.


HelixInfoSystems

55

Methods of
Sequence Alignment
Dot Matrix
Dynamic
Programming
Word or K-tuple
HelixInfoSystems

56

Dot
Matrix
Established in 1970 by A.J. Gibbs and
G.A.McIntyre
Method for comparing two amino acid
or nucleotide sequences

HelixInfoSystems

57

Dot Matrix
Displays any possible sequence
alignments as diagnols in a matrix
Reveals the presence of any
Insertion/deletion
Direct and Inverted repeats
Limitation:does not show actual
region of alignment
HelixInfoSystems

58

Dynamic
Programming
DP is a method for solving certain kind of problems
DP can be applied when the solution of a
problem includes solutions to sub problems
We need to find a recursive formula for the
solution
We can recursively solve subproblems, starting
from the trivial case, and save their solutions in
memory
In the end well get the solution of the whole
problem
HelixInfoSystems

59

Word or KTuple
Align two sequences very quickly
Searches for identical short stretches
of sequences and joining them

HelixInfoSystems

60

Scoring Matrix

HelixInfoSystems

61

Why need Scoring


Matrix?
To find the best alignment one needs to
examine all
possible alignments
To reflect the quality of the possible
alignments one
needs to score them
There can be different alignments with the
same highest score
Variations in the scoring scheme may
change the ranking of alignments
HelixInfoSystems

62

Scoring
Matrices

Table of values that describe


the probability
of a residue pair occurring in an
alignment
HelixInfoSystems

63

Unitary
Matrix
Simplest scoring scheme
Amino acids pairs are classified into
Identical
A
Non-identical
A 1
Identical pairs are scored 1
Non-identical pairs are scored 0R 0
Less effective for detection of N 0
weak similarities
HelixInfoSystems

2 types:

R
0
1
0
D 0 0

N
0
0
1
0

D
0
0
0
1
64

Percent Accepted
Mutation
PAM (Percent Accepted Mutation) /
MDM (Mutation Data Matrix ) / Dayhof
Similar sequences organized into
phylogenetic trees
Number of amino acid changes counted
Relative mutabilities evaluated
20 x 20 amino acid substitution matrix
calculated
HelixInfoSystems

65

Percent Accepted
Mutation
PAM 1: 1 accepted mutation event per
100 amino acids; PAM 250: 250
mutation events per 100
PAM 1 matrix can be multiplied by itself
N times to give transition matrices for
sequences that have undergone N
mutations
PAM 250: 20% similar; PAM 120: 40%;
PAM 80: 50%; PAM 60: 60%
HelixInfoSystems

66

BLOSUM
Larger set of sequences considered
Sequences organized into signature
blocks
Consensus sequence formed
60% identical: BLOSUM 60
80% identical: BLOSUM 80

HelixInfoSystems

67

Amino acids: chemical


properties
Aliphatic - G, A, V, L, I, P
Aromatic - F, Y, W
Uncharged polar - S, T, N,
Charged - D, E, H, K, R
Sulfur-containing - C, M

HelixInfoSystems

68

BLOSUM 62

Positive for similar amino acid

Common amino acid have low score


Rare amino acid
have high score

HelixInfoSystems

69

Nucleic Acid Scoring


Matrices
Two mutation models:
Uniform mutation rates (Jukes-Cantor)
Two separate mutation rates (Kimura)
Transitions
Transversions

HelixInfoSystems

70

DNA
Mutations
A

PURINES: A, G
PYRIMIDINES C, T
Transitions:
AG, CT
Transversions:
AC, AT
CG, GT

HelixInfoSystems

71

PAM1 DNA odds


matrices
A. Model of uniform mutation rates among nucleotides.
A
G
T
A
0.99
G 0.00333
0.99
T
0.00333
0.00333
0.99
C
0.00333
0.00333
0.00333
B. Model of 3-fold higher transitions than transversions.
A
G
T
A
0.99
G 0.006
0.99
T
0.002
0.002
0.99
C
0.002
0.002
0.006

HelixInfoSystems

0.99

0.99

72

Use of scoring
matrices
Deciding which scoring matrix you
should use in order of obtain the best
alignment results is a difficult task.
Selection of a "wrong" scoring matrix
will most probable strongly influence
on the outcome of the analysis

HelixInfoSystems

73

Correlations between the


PAM and BLOSUM
matrices
The BLOSUM matrices with low numbers
correspond to PAM matrices with high
numbers

HelixInfoSystems

74

Selection of
Scoring Matrices
For closely related sequences choose
BLOSUM matrices created for highly
similar alignments, like BLOSUM80. You
can also select low PAM matrices such
as PAM1.
For distant related sequences, select
low BLOSUM matrices (for example
BLOSUM45) or high PAM matrices such
as PAM250.
HelixInfoSystems

75

Gap
Penalties
The cost of opening a gap is higher than
the gap extension penalty
This reflects the tendency for insertions
and deletions to occur over several
residues at a time
Scoring scheme should penalize new gaps
more
BLOSUM 62: -11 for gap opening; -1 for
gap extension
BLOSUM 50: -12/-1 penalty
HelixInfoSystems

76

Gap
Penalties

Gap penalties are always a problem:


There is no formal way to calculate
their values.
you can follow recommended settings,
but these are based on trial and error
and not on a formal framework.

HelixInfoSystems

77

DOTPLO
A dotplot gives
T an overview of all possible al
A
T
T
C

Sequence 2
A
C

A
T
A

A C

A T

A C

Sequence 1

HelixInfoSystems

A C

78

In a dotplot each diagonal


corresponds to a
possible (ungapped) alignment
A
T
T
C

Sequence 2
A
C

A
T
A

A C

A T T A C1 G
Sequence

A C

T A C A T T A C G T A C
One possible alignment:
A T A C A C T T 79
A
HelixInfoSystems

Window-based
approaches
Word Size
Window / Stringency

HelixInfoSystems

80

Word size
algorithm
Word
Size = 3

T A C G G T A T G
A C A G T A T C
C
T
A
T
G
A
C
A

T A C G G T A T G
A C A G T A T C
T A C G G T A T G
A C A G T A T C
T A C G G T A T G

T A C G G T A T G

A C A G T A T C

HelixInfoSystems

81

Window / stringency
T A C G G T A T G
Window = 5 / Stringency = 4

T C A G T A T C
T A C G G T A T G

T C A G T A T C
T A C G G T A T G

T C A G T A T C
T A C G G T A T G
T C A G T A T C

C
T
A
T
G
A
C
A

T A C G G T A T G

HelixInfoSystems

82

Considerati
ons method is
The window/stringency
more sensitive than the wordsize
method (ambiguities are permitted).
The smaller the window, the larger
the weight of statistical (unspecific)
matches.
With large windows the sensitivity for
short sequences is reduced.
HelixInfoSystems

83

Insertions / Deletions
inT a Dotplot

Sequence 2

A
C
T
G
T
C
A
T
T

Sequence 1

T A C T G - T C A T
| | | | |
| | | |
T A C T G T T C A T

HelixInfoSystems

84

Hemoglobin
-chain

Dotplot
(Window =
130 /
Stringency
= 9)

HelixInfoSystems

85

Hemoglobin
-chain

Dotplot
(Window
= 18 /
Stringency
= 10)

HelixInfoSystems

86

Conclusions
What about these alike areas?
Whats the best path through the dot
matrix?
How long do I extend it?
How can I zoom-in on it to see exactly
whats happening?
Where, specifically,
is this alignment; how
HelixInfoSystems
87

Dynamic
Programming

Dynamic programming
Dynamic
programming
guaranteed to

algorithms

are

find the optimal scoring alignment or set of


alignments.
Dynamic programming algorithms are core to
computational sequence analysis.
HelixInfoSystems

89

Dynamic Programming
A computational method that works well
for some types of optimization problems.
It works well for problems that can be
solved in a stepwise manner, always
saving just the best solution so far at
each step. (Its efficient because you
disregard loser paths.)
Works well for pair-wise sequence
alignment.of proteins or nucleic acids.
HelixInfoSystems

90

Dynamic Programming

omatic procedure that finds the best align

optimal score depending on the chosen par

Needleman and Wunsch Algorithm


- Global Alignment Smith and Waterman Algorithm
- Local Alignment HelixInfoSystems

91

Needleman Wunsch
Three
steps
programming

in

dynamic

1. Initialization
2. Matrix fill (scoring)
3. Traceback (alignment)

HelixInfoSystems

92

Needleman and
Wunsch
(global alignment)
Sequence 1:
EE
Sequence 2:

HEAGAWGH
PAWHEAE

Scoring parameters:
Gap penalty:
penalty of 8

BLOSUM50

Linear gap

HelixInfoSystems

93

Step1: Initializaton
The

first step in the global alignment


dynamic programming approach is
to create a matrix with M + 1
columns and N + 1 rows

and N correspond to the size of


the sequences to be aligned.
HelixInfoSystems

94

Score Matrix
(BLOSUM 50)
H

-2

-1

-1

-2

-1

-4

-2

-2

-1

-1

-2

-1

-3

-2

-1

-1

-3

-3

-3

-3

-3

15

-3

-3

-3

-3

10

-2

-2

-2

-3

-2

10

-1

-3

-1

-3

-3

-2

-1

-3

-2

-1

-1

-1

-3

-1

-3

-3

HelixInfoSystems

95

Basic principles of dynamic


programming

- Creation of an alignment path mat

- Stepwise calculation of score values

- Backtracking (evaluation of the opt


HelixInfoSystems

96

Creation of an alignment
path
matrix
If F(i-1,j-1),
F(i-1,j) and F(i,j-1) are known
we can calculate F(i,j)
Three possibilities:
xi and yj are aligned, F(i,j) = F(i-1,j1) + s(xi ,yj)
xi is aligned to a gap, F(i,j) = F(i1,j) - d
yj is aligned to a gap, F(i,j) = F(i,j1) - d

HelixInfoSystems

97

Creation of an
alignmentF(i,path
matrix
j) = F(i-1, j-1) + s(x ,y )
i

F(i, j) = max

F(i, j) = F(i-1, j) - d
F(i, j) = F(i, j-1) - d

F(i-1, j-1)

F(i, j-1)

s(xi ,yj)

F(i-1,j)

-d
-d

F(i, j)

HelixInfoSystems

98

Creation of an alignment
H
Ematrix
A
G
A
W
G
H
E
E
path
0
-8
-16 -24 -32 -40 -48 -56 -64 -72 -80
P

-8

A
-16
W

Boundary conditions

-24

F(i, 0) = -i d

-32

F(j, 0) = -j d

H
E
A

-40

E
-48

HelixInfoSystems

99

Creation of an
H
E
A
G
A
W
G
H
E
alignment
path matrix
0
-8
-16 -24 -32 -40 -48 -56 -64 -72
-2
P

-8

F(i, j) = max

-10
A

-16

-24

-9
-3

E
-80

F(i, j) = F(i-1, j-1) + s(xi ,yj)

P-H=-2

F(i, j) = F(i-1, j) - d

E-P=-1

F(i, j) = F(i, j-1) - d

H-A=-2

F(0,0) + s(xi ,yj) = 0 -2 = -2


F(1,1) = max F(0,1) - d

= -8 -8= -16

F(1,0) - d

= -8 -8= -16

= -2

E-A=-1

-32
F(1,0) + s(xi ,yj) = -8 -1 = -9

-40

F(2,1) = max F(1,1) - d


F(2,0) - d

-48

-56

= -9

= -16 -8= -24

-2 -1 = -3

-8 -2 = -10
F(1,2) = max -16 -8 = -24 = -10
-2 -8 = -10

= -2 -8 = -10

F(2,2) = max

-10 -8 = -18 = -3

HelixInfoSystems-9

-8 = -17

100

Backtrac
king

HelixInfoSystems

101

E
-16

A
-24

G
-32

A
-40

W
-48

G
-56

H
-64

E
-72

E
-80

-2

-9

-17
-17

-25
-25

-42

-49

-57

-65

-73

-12

-33
-20
-20

A -16

-10

-3

-4

-52

-60

-7

-15

-36
-13
-13

-44

-6

-28
-5
-5

W -24

-18

-11

-29

-37

-18

-13

-8

-9

-13

-7

-21
-3
-3

H -32

-14

-19

-22

-8

-16

-16

-9

-12

-15

-7

A -48

-30

-16

-3

-11

-11

-12

-12

-15

-11
3
3
-5
-5

E -40

E -56

-38

-24 -11 -6
-12 -14 -15 -12
Optimal global alignment: HEAG AWGHE-E
--P- AW-HEAE

0
P

-8

H
-8

HelixInfoSystems

-5

-9

102

2
1
1

Smith and
Waterman
(local alignment)

Two differences:

0
1. F(i, j) = max

F(i, j) = F(i-1, j-1) + s(xi ,yj)


F(i, j) = F(i-1, j) - d
F(i, j) = F(i, j-1) - d

2. An alignment can now end anywhere in the matrix


Example:
Sequence 1
Sequence 2

H EAGAW G H E E
PAW H EAE

Scoring parameters:
Gap penalty:

BLOSUM
Linear gap penalty of 8

HelixInfoSystems

103

H
0

E
0

A
0

G
0

A
0

W
0

G
0

H
0

E
0

E
0

0
12
12

0
6

0
0

0
5
5

0
20
20

10

12

18

4
22
22

16

10

18

14
28
28

21

13

10

20

27

6
13
18
12
4
0
A WGH E
Optimal local alignment:
A W-H E

16

26

HelixInfoSystems

20

104

Extended Smith &


Waterman

To get multiple local alignments:


Delete regions around best path
Repeat backtracking

HelixInfoSystems

105

Extended Smith & Waterman


0

H
0

E
0

A
0

G
0
0

H
E
A
E

0
0
0
0

10
2
0
0

2
16
8
6

0
8
21
13

0
0
13
18

A
0

W
0

G
0

H
0

E
0

0
5

0
20

0
12

0
4

12

18

22

0
14

0
6

10

18

28

20

10

20

27

16

26

0
0
0
5
12

0
4

HelixInfoSystems

E
0

106

H
0

E
0

A
0

G
0

A
0

W
0

G
0

H
0

E
0

E
0

0
0
0

0
10
10

2
16
16

8
21
21

13

13
18
12
4
Second best local alignment:

HelixInfoSystems

0
HEA
HEA

107

Dynamic programming
Methods

Is computational intensive, for 2 sequence


comparison, requiring O(n2) space and
time.

So, not suitable for searching the best


aligned sequence in databases.

Cannot be used for >3 sequences.


Many heuristic methods are developed
e.g.,BLAST.
HelixInfoSystems

108

You might also like