KMP Skip Search Algorithm: Advisor: Prof. R. C. T. Lee Speaker: Z. H. Pan
KMP Skip Search Algorithm: Advisor: Prof. R. C. T. Lee Speaker: Z. H. Pan
KMP Skip Search Algorithm: Advisor: Prof. R. C. T. Lee Speaker: Z. H. Pan
Very Fast String Matching Algorithm for Small Alphabets and Long
Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in
Computer Science, Vol. 1448, 1998, pp. 55-64
Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
T c a b c d a b a b d a d a d c d a b c d
:P d a b a b d a d a d
:
The occurrences of P in T : T5
2
• The KMP Skip Search algorithm consists two
phases which are processing and searching.
• KMP Skip Search algorithm uses KMP table to
improve the Skip Search algorithm.
3
Preprocessing
• The preprocessing phase computes the buckets for all
characters of the alphabet , list table , MP table and KMP table.
Example:
Text string T=GCATCGCAGAGAGTATACAGTACG
0 12 3 4 5 6 7 P=GCAGAGAG
Pattern string P=GCAGAGAG 01 234 5 6 7
c A C G T i 0 1 2 3 4 5 6 7
Z
[c] 6 1 7 -1 List
-1 -1 -1 0 2 3 4 5
[i]
0 1 2 3 4 5 6 7 8
mpNext -1 0 0 0 1 0 1 0 1
kmpNext -1 0 0 -1 1 -1 1 -1 1
4
A general situation for the search phase
i
T
j
P
start wall
i
T k
X j
P k
start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
P = ACTACGT
0 1 2 3 4 5 6
k=5
ACTACGT (kmp’s shift) kmpstart = 3
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 7
Example: step 1-1
start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
0 1 2 3 4 5 6
k=2
ACTACGT (kmp’s shift) kmpstart = 5
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 8
Example: step 1-2
start = 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT
start = 9
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 9
Example: step 2
start = 9
wall = 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=1 ACTACGT
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 10
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 12
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 10
Example: step 3
start = 12
wall = 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
match, k=7 0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 19 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 16 ACTACGT
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 11
Example: step 3-1
start = 12
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT
start = 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 12
Example: step 4
start = 19
wall = 21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=2
ACTACGT
0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 21 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 21 ACTACGT
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 13
Example: step 5
start = 21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT
start = 25
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 14
Example: step 6
start = 25
wall = 26
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
k=1 ACTACGT
0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 26 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 28 ACTACGT
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT
15
Example: step 7
start = 28
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
match, k=7
ACTACGT
16
Time Complexity
• The preprocessing phase of kmp Skip Search is O(m+σ)(σ is t
he number of alphabet.)
17
References
[BM77] A Fast String Searching Algorithm , Boyer, R. S. and Moore, J. S. , Communication of the ACM , Vol. 20
, 1977 , pp. 762-772 .
[HS91] Fast String Searching , Hume, A. and Sundy, D. M. , Software, Practice and Experience , Vol. 21 , 1991 , p
p. 1221-1248 .
[MTALSWW92] Speeding Up Two String-Matching Algorithms, Maxime C., Thierry L., Artur C., Leszek G., Stef
an J., Wojciech P. and Wojciech R., Lecture Notes In Computer Science, Vol. 577, 1992, pp. 589-600 .
[MW94] Text algorithms, M. Crochemore and W. Rytter, Oxford University Press, 1994.
[KMP77] Fast Pattern Matching in Strings, D.E. Knuth, J.H. Morris and V.R. Pratt, SIAM Journal on Computing,
Vol. 6, No.2, 1977, pp 323-350 .
[T92] A variation on the Boyer-Moore algorithm, Thierry Lecroq, Theoretical Computer Science archive, Vol. 92
, No.1, 1992, pp 119-144 .
[T98] Experiments on string matching in memory structures, Thierry Lecroq, Software—Practice & Experience arc
hive, Vol. 28, No.5, 1998, pp 561-568
[T92] Tuning the Boyer-Moore-Horspool string searching algorithm, Timo Raita, Software—Practice & Experienc
e archive, Vol. 22, No.10, 1992, pp. 879-884 .
[G94] String searching algorithms, G.A. Stephen, World Scientific Lecture Notes Series On Computing, Vol.
18 3, 1
994, pp. 243 .