KMP Skip Search Algorithm: Advisor: Prof. R. C. T. Lee Speaker: Z. H. Pan

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 18

KMP Skip Search Algorithm

Very Fast String Matching Algorithm for Small Alphabets and Long
Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in
Computer Science, Vol. 1448, 1998, pp. 55-64

Advisor: Prof. R. C. T. Lee


Speaker: Z. H. Pan
1
Definition
• String Matching Problem:
Input: a text string T of length n and a pattern string P of lengt
h m.

Output: Find all occurrence of P in T.

Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
T c a b c d a b a b d a d a d c d a b c d
:P d a b a b d a d a d
:
The occurrences of P in T : T5

2
• The KMP Skip Search algorithm consists two
phases which are processing and searching.
• KMP Skip Search algorithm uses KMP table to
improve the Skip Search algorithm.

3
Preprocessing
• The preprocessing phase computes the buckets for all
characters of the alphabet , list table , MP table and KMP table.

Example:
Text string T=GCATCGCAGAGAGTATACAGTACG
0 12 3 4 5 6 7 P=GCAGAGAG
Pattern string P=GCAGAGAG 01 234 5 6 7
c A C G T i 0 1 2 3 4 5 6 7
Z
[c] 6 1 7 -1 List
-1 -1 -1 0 2 3 4 5
[i]

0 1 2 3 4 5 6 7 8
mpNext -1 0 0 0 1 0 1 0 1
kmpNext -1 0 0 -1 1 -1 1 -1 1
4
A general situation for the search phase
i
T
j
P

start wall
i
T k
X j
P k

First it uses skip search algorithm which makes T[i]=P[j].


wall is the first mismatch position of T when T align with P.
start is the first position of T when T align with P.
k is a small string when the substring of P equal to the substring of T.
KmpStart is the next shift position of kmp.
Skipstart is the next shift position of skip.
5
If k=0, that there is not the prefix of P which equals the substring of T, it us
es skip search algorithm; otherwise, when k>0, that there is not the prefix o
f P which equals the substring of T, we have to find out Kmpstart 、 wall a
nd Skipstart to compare its four cases.

Case1. skipStart < kmpStart


then a shift according to the skip algorithm is applied which gives a new valu
e for skipStart, and we have to compare again skipStart and kmpStart.

Case2. kmpStart < skipStart < wall


then a shift according to the shift table of Morris-Pratt is applied. This gives
a new value for kmpStart. We have to compare again skipStart and kmpStart.

Case3. skipStart = kmpStart


then another step can be performed with start = skipStart.

Case4. kmpStart < wall < skipStart


then another step can be performed with start = skipStart. 6
Example: step 1 First it uses the Skip Search algorithm to align T and P.

start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

P = ACTACGT
0 1 2 3 4 5 6
k=5
ACTACGT (kmp’s shift) kmpstart = 3
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4

wall =5 Case2. kmpStart < skipStart < wall


then a shift according to the shift table of Morris-Pratt is applied. This gives
kmpstart = 3
a new value for kmpStart. We have to compare again skipStart and kmpStart
skipstart = 4 .

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 7
Example: step 1-1
start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
0 1 2 3 4 5 6
k=2
ACTACGT (kmp’s shift) kmpstart = 5
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4

wall =5 Case1. skipStart < kmpStart


then a shift according to the skip algorithm is applied which gives a new val
kmpstart = 5
ue for skipStart, and we have to compare again skipStart and kmpStart.
skipstart = 4

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 8
Example: step 1-2
start = 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT

start = 9
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT 9
Example: step 2
start = 9
wall = 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=1 ACTACGT
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 10
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 12

wall = 10 Case4. kmpStart < wall < skipStart


then another attempt can be performed with start = skipStart.
kmpstart = 10
skipstart = 12
start = 12
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 10
Example: step 3
start = 12
wall = 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
match, k=7 0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 19 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 16 ACTACGT

wall = 19 Case1. skipStart < kmpStart


then a shift according to the skip algorithm is applied which gives a new val
kmpstart = 19
ue for skipStart, and we have to compare again skipStart and kmpStart.
skipstart = 16

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 11
Example: step 3-1
start = 12

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT

start = 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 12
Example: step 4
start = 19
wall = 21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=2
ACTACGT
0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 21 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 21 ACTACGT

wall = 21 Case3. skipStart = kmpStart


then another attempt can be performed with start = skipStart.
kmpstart = 21
skipstart = 21
start = 21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 13
Example: step 5
start = 21
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=0 ACTACGT
∴ uses skip search algorithm
0 1 2 3 4 5 6
ACTACGT

start = 25
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT 14
Example: step 6
start = 25
wall = 26
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

k=1 ACTACGT
0 1 2 3 4 5 6
(kmp’s shift) kmpstart = 26 ACTACGT
0 1 2 3 4 5 6
(skip’s shift) skipstart = 28 ACTACGT

wall = 26 Case4. kmpStart < wall < skipStart


then another attempt can be performed with start = skipStart.
kmpstart = 26
skipstart = 28
start = 28
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT 0 1 2 3 4 5 6
ACTACGT
15
Example: step 7
start = 28
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6

match, k=7
ACTACGT

16
Time Complexity
• The preprocessing phase of kmp Skip Search is O(m+σ)(σ is t
he number of alphabet.)

• The Searching Phase of Kmp Skip Search algorithm is O(n).

17
References
[BM77]    A Fast String Searching Algorithm , Boyer, R. S. and Moore, J. S. , Communication of the ACM , Vol. 20
, 1977 , pp. 762-772 .

[HS91]    Fast String Searching , Hume, A. and Sundy, D. M. , Software, Practice and Experience , Vol. 21 , 1991 , p
p. 1221-1248 .

[MTALSWW92] Speeding Up Two String-Matching Algorithms, Maxime C., Thierry L., Artur C., Leszek G., Stef
an J., Wojciech P. and Wojciech R., Lecture Notes In Computer Science, Vol. 577, 1992, pp. 589-600 .

[MW94] Text algorithms, M. Crochemore and W. Rytter, Oxford University Press, 1994.

[KMP77] Fast Pattern Matching in Strings, D.E. Knuth, J.H. Morris and V.R. Pratt, SIAM Journal on Computing,
Vol. 6, No.2, 1977, pp 323-350 .

[T92] A variation on the Boyer-Moore algorithm, Thierry Lecroq, Theoretical Computer Science archive, Vol. 92
, No.1, 1992, pp 119-144 .

[T98] Experiments on string matching in memory structures, Thierry Lecroq, Software—Practice & Experience arc
hive, Vol. 28, No.5, 1998, pp 561-568

[T92] Tuning the Boyer-Moore-Horspool string searching algorithm, Timo Raita, Software—Practice & Experienc
e archive, Vol. 22, No.10, 1992, pp. 879-884 .

[G94] String searching algorithms, G.A. Stephen, World Scientific Lecture Notes Series On Computing, Vol.
18 3, 1
994, pp. 243 .

You might also like