Suffix Trees, Suffix Arrays, and Their Applications
Suffix Trees, Suffix Arrays, and Their Applications
Suffix Trees, Suffix Arrays, and Their Applications
their applications
Fall 2019
November 7, 2019
1
Trie
e
f
f e g
Trie
f e g
Compressed Trie
Compress nodes with single outgoing edge, label edges by
strings
c è
c
a a
b
e
d b bbf
d
e eef
f
f e g e g
Suffix tree
Given a string T, a suffix tree of T is a compressed
trie of all suffixes of T
$
a b
b
$
a
a $ b
b $
$
7
Naïve algorithm to build suffix tree
Constructing suffix tree for T=abab$
a
Insert the longest suffix b
a
b
$
a b
b a
Next longest a b
b $
$
a b
b a
a b
b $
$
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1
Exact string matching
Is aba a substring of T=abab?
Traverse the tree using the string.
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1
19
Running time
q Build the suffix tree: O(m)
q Overall O(m+n+k).
20
A collection of strings
#
{ $
a b
$ # 5 4
b$ b# #
ab$ ab# b a a $ 3
bab$ aab# b b 4
# $
abab$
}
a $ # 1 2
b
$ 3 2
1
Generalized suffix tree
Given a set of strings T a generalized suffix tree of
T is a compressed trie of all suffixes of t ∈ T.
• Repeats/tandem repeats
• Maximal palindromes
24
Drawbacks
Suffix trees consume a lot of space:
It is O(m) but the constant is quite big. Cannot
fit a large mammalian genome in memory.
Let T = abab.
3 1 4 2
Example
T = mississippi
L 11 i
8 ippi
s = issa 5 issippi
2 ississippi
1 mississippi
M 10 pi
9 ppi
7 sippi
4 sisippi
6 ssippi
R 3 ssissippi
Searching a suffix array
If s occurs in T then all its occurrences are
consecutive in the suffix array.