Boyer

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 3

BOYER-MOORE ALGORITHM The Boyer-Moore algorithm is consider the most efficient string-matching algorithm in usual applications, for example,

in text editors and commands substitutions. The reason is that it woks the fastest when the alphabet is moderately sized and the pattern is relatively long. The algorithm scans the characters of the pattern from right to left beginning with the rightmost character. During the testing of a possible placement of pattern P against textT, a mismatch of text character T[i] = c with the corresponding pattern character P[j] is handled as follows: If c is not contained anywhere in P, then shift the pattern Pcompletely past T[i]. Otherwise, shift P until an occurrence of character c in P gets aligned with T[i]. This technique likely to avoid lots of needless comparisons by significantly shifting pattern relative to text.

BOYER_MOORE_MATCHER (T, P) Input: Text with n characters and Pattern with m characters Output: Index of the first substring of T matching P 1. Compute function last 2. i m-1 3. j m-1 4. Repeat 5. 6. 7. 8. 9. 10. 11. else If P[j] = T[i] then if j=0 then return i else i i -1 j j -1 // we have a match

12. 13. 14. 15.

i i + m - Min(j, 1 + last[T[i]]) j m -1 until i > n -1 Return "no match"

For example consider

T: P:

0 1 2 3 4 5 6 7 8 9 a b a c a a b a c c a b a c a b 0 1 2 3 4 5

last(a) is the index of the last (rightmost) occurrence of 'a' in P, which is 4. last(c) is the index of the last occurrence of c in P, which is 3
'd' does not exist in the pattern there we have last

(d) = -1.

c a last(c) 4

c 3

d -1

Now, for 'b' notice

T: a b a c a a b a c c P: a b a c a b
Therefore, last(b) is the index of last occurrence of b in P, which is 5 The complete last(c) function

last(c) 4

-1

Analysis

The computation of the last function takes O(m+||) time and actual search takes O(mn) time. Therefore the worst case running time of Boyer-Moore algorithm isO(nm + ||). Implies that the worst-case running time is quadratic, in case of n = m, the same as the nave algorithm.

You might also like