Bio 3
(3)
ALIGNMENT AND MATCHING
DR. IBRAHIM ZAGHLOUL
DNA Sequencers
DNA Sequencing
• DNA sequencing refers to the general laboratory technique for determining the exact
sequence of nucleotides, or bases, in a DNA molecule.
Sequence conservation implies function
Homology
GCTAGTCAGATCTGACGCTA
  | |||| ||||| |||
  TGGTCACATCTGCCGC
Sequence Alignment
VLSPADKTNVKAAWGKVGAHAGYEG
|||  |   |   | || |    ||
VLSEGDWQLVLHVWAKVEADVAGEG
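The bar line between two aligned sequences marks the columns where the residues are identical. A minimal Python sketch of how such a line could be produced for two equal-length, ungapped sequences is shown here; the function name `match_line` is illustrative and not part of the lecture.

```python
def match_line(a: str, b: str) -> str:
    """Return '|' under every column where a and b agree, ' ' elsewhere."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join("|" if x == y else " " for x, y in zip(a, b))


top = "VLSPADKTNVKAAWGKVGAHAGYEG"
bottom = "VLSEGDWQLVLHVWAKVEADVAGEG"
bars = match_line(top, bottom)

print(top)
print(bars)
print(bottom)
print(f"identical positions: {bars.count('|')} of {len(top)}")
```

Counting the bars gives a simple measure of identity between the two sequences, which is a common first indication of possible homology.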
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
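Definition
Given two strings x = x1x2...xM and y = y1y2...yN, an alignment is an assignment of gaps to positions in x and in y so that each letter in one sequence lines up with either a letter or a gap in the other sequence.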
Sources of variation
• Nucleotide substitution
– Replication error
– Chemical reaction
• Insertions or deletions (indels)
– Unequal crossing over
– Replication slippage
• Duplication
– a single gene (complete gene duplication)
– part of a gene (internal or partial gene duplication)
• Domain duplication
• Exon shuffling
– part of a chromosome (partial polysomy)
A simple alignment
Scoring the alignments
• We need to have a scoring mechanism to evaluate alignments
– match score
– mismatch score
• We can have the total score as:
  $$\text{total score} = \sum_{i=1}^{n} (\text{match or mismatch score at position } i)$$
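The sum over alignment columns can be sketched in Python as follows. The score values (+1 for a match, -1 for a mismatch) and the separate gap penalty are assumptions chosen for illustration; the slides do not fix particular numbers, and the name `alignment_score` is hypothetical.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Sum per-column scores of two aligned strings of equal length.

    '-' denotes a gap. The default score values are illustrative assumptions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap        # column containing a gap
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


# Ungapped example: the overlapping region of the homology slide.
ungapped = alignment_score("TAGTCAGATCTGACGC", "TGGTCACATCTGCCGC")

# Gapped example: three matching columns and one gap column.
gapped = alignment_score("AC-T", "ACGT")

print(ungapped, gapped)
```

With these assumed values the first pair has 13 matching and 3 mismatching columns, so different choices of match and mismatch scores would rank candidate alignments differently.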
BLOSUM 62 matrix
String Definitions
Exact matching
Exact matching: naïve algorithm
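A minimal Python sketch of the naïve approach, assuming the usual formulation: try every offset of the pattern P in the text T and compare character by character until a mismatch or a full match. The function name `naive_match` is illustrative.

```python
def naive_match(p: str, t: str) -> list:
    """Return all 0-based offsets at which p occurs in t."""
    occurrences = []
    for i in range(len(t) - len(p) + 1):   # every possible alignment of p against t
        matched = True
        for j in range(len(p)):            # compare left to right
            if t[i + j] != p[j]:
                matched = False
                break                      # stop at the first mismatch
        if matched:
            occurrences.append(i)
    return occurrences


t = "There would have been a time for such a word"
p = "word"
print(naive_match(p, t))
```

In the worst case every one of the roughly len(T) alignments costs len(P) comparisons, which is what motivates looking for ways to skip alignments.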
Can we improve on the naïve algorithm?
P: word
T: There would have been a time for such a word
         word
          word  skip!
           word  skip!
            word
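One way to realize the skipping idea is sketched below in Python. It is a simplified heuristic in the spirit of the bad-character rule, not a full Boyer-Moore implementation: when the comparison fails on a text character that does not occur anywhere in P, every alignment that would still cover that character is skipped. The function name `match_with_skips` and the returned skip counter are illustrative additions.

```python
def match_with_skips(p: str, t: str):
    """Return (offsets of p in t, number of alignments skipped)."""
    alphabet = set(p)          # characters that occur in the pattern
    occurrences = []
    skipped = 0
    i = 0
    while i + len(p) <= len(t):
        j = 0
        while j < len(p) and t[i + j] == p[j]:
            j += 1             # extend the match left to right
        if j == len(p):
            occurrences.append(i)
            i += 1
        elif t[i + j] not in alphabet:
            # t[i + j] cannot match any pattern character, so no alignment
            # overlapping that position can succeed: jump past it.
            skipped += j
            i += j + 1
        else:
            i += 1
    return occurrences, skipped


t = "There would have been a time for such a word"
p = "word"
occ, skipped = match_with_skips(p, t)
print(occ, skipped)
```

On this example the alignment at "would" matches "wo" and then fails on "u", which does not occur in "word", so the next two alignments are skipped while the reported occurrences stay the same as for the naïve algorithm.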