A Two Way Pattern Matching Algorithm Using Sliding Patterns
A Two Way Pattern Matching Algorithm Using Sliding Patterns
A Two Way Pattern Matching Algorithm Using Sliding Patterns
Ahstract- The string matching problem has attracted a lot of Although data is memorized in various ways the text
interest throughout the history of computer science and is remains the main form to exchange information. This is
becoming still crucial to computing industry. This problem particularly evident in the literature or linguistics where data
arises in many computer packages in the form of spell checkers, is composed of huge corpus and dictionaries. This apply as
search engines on the internet, matching of DNA strands well to computer science where large amount of data is
firewalls etc. Also as the quality of data available in these fields
stored in linear files and this is also the case in molecular
tends to double every 18 months, this seeds the need of more
biology because biological molecules can often be
efficient algorithms even though the speed and storage capacity
approximated as sequence of nucleotides or amino acids.
of computers increase regularly. In this paper we present an
Further as the quality of data available in these fields tends to
efficient pattern matching algorithm based on preprocessing of
double every 18 months, this seeds the need of more efficient
the pattern string by considering three consecutive characters
algorithms even though the speed and storage capacity of
of the text that immediately follow the aligned pattern window
in an event of mismatch between pattern and text character.
computers increase regularly.
The algorithm makes use of two sliding patterns. Both patterns Applications require two kinds of solution depending on
slide in parallel over the text until the first occurrence of the whether pattern or text is given first. Algorithms based on
pattern is found or a failure occurs. the use of automata or combinatorial properties of strings are
commonly implemented to preprocess the pattern and solve
Keywords- pattern matching, preprocessing, string matching the first kind of problem. The notion of indexes realized by
automata or trees is used by second class of problems.
In this paper the idea is to preprocess the pattern string
I. INTRODUCTION and hence this falls under the problem of first class. Section
The string matching problem has attracted a lot of II in this paper describes the related works carried out in the
interest throughout the history of computer science and is literature, Section III and IV describes the proposed
becoming still crucial to computing industry. String algorithm with working example.
matching is finding occurrence of a pattern string in the huge
II. RELATED WORKS
text string. This problem occurs in many computer packages
in the form of spell checkers, search engines on the internet, Many promising data structures and algorithms
matching of DNA strands firewalls etc. discovered by theoretical community are never implemented
String matching algorithms works as follows. Initially the or tested at all. This is because the actual performance of the
pattern string p[l...m] of length m to be searched for is algorithm is not analyzed as only the working of the
aligned to the left most end of the text string T[l...n] of algorithm in practice is taken care. There have been several
length n. The search process begins by comparing the pattern pattern matching algorithms designed in the literature till
characters with the characters of the text string and if a date as discussed below.
mismatch occurs the pattern is shifted by an amount defined In the naive or Brute Force technique [6] the string to be
by the algorithm being used. This is the key criteria where searched is aligned to the left end of the text and each
behaviors of most of the pattern algorithms differ. The speed pattern character is compared with the corresponding text
at which a mismatch between a pattern and the text is character. In this process if a mismatch occurs or the pattern
detected and the amount of shift defined on an event of character exhausts the pattern is shifted by one unit towards
mismatch is the key issue for judging the criteria of any right. The search is again resumed from the start of the
pattern matching algorithm and decides the efficiency of a pattern until the text is exhausted or match is found.
pattern matching algorithm. Naturally the number of comparisons being performed is
Monitoring number of comparisons performed by each very more as each time the pattern string is shifted right by
algorithm was chosen as a way to compare the pattern search only one unit towards right. The worst case comparison of
algorithms [3]. the algorithm is O(mn).
The number of comparisons can be reduced if we can are computed based on the mathematical formulation shown
move the pattern string by more than one unit. This was the below.
idea of KMP algorithm [4]. The KMP algorithm compares
the pattern from left to right with the text just as BF Bad Character of shift right
algorithm. When a mismatch occurs the KMP algorithm
moves the pattern to the right by maintaining the longest
m+2 if p[m-l]=a
overlap of a prefix of the pattern with a suffix of a part of
m+l if p[m-l]p[m-2]=ba
the text that matched the pattern so far. The KMP algorithm
does almost 2n text comparisons and the worst case Bad char of i+3 if p[i]p[i+l]p[i+2]=abc
complexity of the algorithm is 0 (m+n). Shiftr [a, b, c] =min 1 if p[O]p[l] =bc
Boyer-Moore algorithm[2] differs from other algorithms 2 if prO] =c
in one feature. Instead of comparing the pattern characters m+3 Otherwise
from left to right the comparison is done from the right
towards left by starting comparison from the rightmost
character of the pattern. In case of a mismatch it uses two
Begin
functions last occurrence function and good suffix function.
Shift=shiftr=m+3
If the text character does not exist in the pattern then the last
for (each character pi where Pj=O to m-3)
occurrence function returns m where m is the length of the
{Next[i) =m-i, next[i) =i+2}
pattern string. So the maximum shift possible was m.The
if P [m-l)=a {shift=1,shiftr=m+2}
worst case running time of the algorithm is 0 (mn).
else if p[m-l)p[m-2)=ba {shift=2,shiftr=m+l}
In [7] the authors propose the pattern matching based on
else if p[i)p[i+l]p[i+2]=abc {shift=next[i],shiftr=next[i]}
two sliding windows known as TSW algorithm.
else if p[O)p[l)=bc {shift=m+l,shiftr=l}
In this paper the idea is to reduce the number of
else if prO] =c {shiftl=m+2, shiftr=2}
comparisons being performed by obtaining as much shift as
End
possible. A shift value of more than length of the pattern is
obtained when the pattern gets mismatch with the text for
Figure 1: Algorithm for shift left and shifts right function
one instance.
B. Searching Phase
III. PROPOSED ALGORITHM
In this phase the pattern string to be searched for in the
The proposed algorithm consist of two phases given text string is aligned at both ends of the text and
a) Preprocessing phase scanned simultaneously from left and right ends for the
b) Searching phase occurrence of the pattern string.
A. Preprocessing phase
Initially the search process begins by searching for the
pattern string from left end of the text. In case of a mismatch
In this phase the pattern to be searched for in the given text while searching from left, the pattern window is shifted to
string is preprocessed i.e. it mainly deals with the right by a displacement defined by shift-left function. Then
computation of the shift-left and shift-right values which are the search turns to right end of the text and when a
later used in the searching process. To calculate shift-left mismatch occurs while scanning for the occurrence of the
and shift-right, the algorithm uses three consecutive pattern from right end of the text, the shift-right function is
characters a, b, c which are aligned immediately after the used to shift the pattern towards left. This process is
pattern window. Initially the indices of three consecutive continued until the pattern is found or the left and right
characters in the text string from the left are indices over cross each other.
(m+1),(m+2),(m+3) for a,b,c respectively represented
mathematically as shown below. Step1: Initially the search process begins by aligning the
pattern to both ends of the text i.e. by aligning p [0] to T [0]
Bad Character of shift left at the left end and p[m] to T[n] at the right end respectively.
Instead of searching all the characters of the pattern
1 if p[m-I]=a sequentially through p[O] ....p[m-I], only the dead end
characters of the pattern are compared and if satisfied
2 if p[m-I]p[m-2]=ba
remaining characters are compared. i.e. P[O] is compared
Bad char of m-i if p[i]p[i+1]p[i+2] =abc
with corresponding text character if it satisfies then p[m-I]
Shiftl [a, b, c] = min m+I if p[O]p[1]=bc
is compared with the corresponding text character aligned
m+2 if p[O]=c
and in an event of success the remaining pattern characters
m+3 Otherwise say p[m-2] ....p[I] are compared to reduce the number of
comparisons.
In the same way, the indices of three successive characters When a mismatch occurs during comparison from both
a,b,c from right are (n-m-3),(n-m-2),(n-m-I) respectively sides the algorithm goes to step2 otherwise the search
V2-667
2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE)
Tnt
The detailed algorithm for the same is given below.
-
V2-668
2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE)
V2-669
2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE)
REFERENCES
200610A3.pdf
[2] BOYER, R. S. AND MOORE, J. S. 1977. A fast string
searching algorithm. Commun. ACM 20, 10,762-772.
[ 3] T.BERRY AND S.RAVINDRAN A Fast String Matching Algorithm
And Experimental Results, Proceedings of Prague Stringology
Club, workshop'99
[4] Knuth, D'£', J.H. Morris and V.R. Pratt, 1977. Fast pattern matching
in strings. SIAM J. Comput., 6: 323-350.7.
[5 ] Horspool, R.N., 1980. Practical fast searching in strings. Software
Practice Experience, 10: 501-506.
[6] Handbook of Exact String Matching Algorithms
[7] A Fast Pattern Matching Algorithm Using Two Sliding Windows,
Journal of computer science, Vol 6 2008
V2-670