
Analogic Preprocessing and Segmentation Algorithms for Off-line Handwriting Recognition

Gergely Tímár, Kristóf Karacs and Csaba Rekeczky


Analogic and Neural Computing Systems Laboratory, Computer and Automation Institute of the Hungarian Academy of Sciences, Kende u. 13-17, 1111 Budapest, Hungary
e-mail: [email protected]; Tel: (36)-1-209-5357; Fax: (36)-1-209-5264

Abstract: This report describes analogic algorithms used in the preprocessing and segmentation phases of off-line handwriting recognition tasks. A segmentation-based handwriting recognition approach is discussed, i.e. the system attempts to segment the words into their constituent letters. In order to improve their speed, the CNN algorithms use dynamic, wave-front-propagation-based methods wherever possible instead of relying on morphologic operators embedded in iterative algorithms. The system first locates the handwritten lines in the page image, then corrects their skew as necessary. It then searches for the words within the lines and corrects the skew at the word level as well. A novel trigger-wave-based word segmentation algorithm is presented which operates on the skeletons of words. Sample results of experiments conducted on a database of 25 handwritten pages are presented, along with suggestions for future development.

Key words: cellular nonlinear/neural networks (CNN), analogic spatiotemporal algorithms, CNN Universal Machine (CNN-UM), off-line handwriting recognition, segmentation, handwriting preprocessing

INTRODUCTION

In many industries there is significant demand for the automatic processing of different types of handwritten material. Obvious applications include the automatic indexing and processing of archived documents, forms, postal envelopes, notes etc. Even though human handwriting processing (HHP) has gone through substantial development during the past decades, relatively well-performing systems are available only for vertical applications where the vocabulary is narrow and well defined, such as the recognition of postal addresses on envelopes or of medical prescriptions, or where the writer has to learn a new writing style (e.g. the Graffiti alphabet on PDAs) so that the machine can interpret it.

The area of handwriting recognition comprises two quite different problems: online and offline recognition. Offline recognition is the reading of handwritten text some time after it was written; the input of the recognition engine is a binary or grayscale image containing the handwriting. Online handwriting recognition is the interpretation of handwriting in real time, as it is created. Input usually comes from a special pen on an electronic notepad, which provides temporal information and trajectory data. The substantially larger amount of input data available makes online recognition an easier task than offline recognition. The offline handwriting recognition task contains character recognition as a subproblem, which has been studied using CNN algorithms [15] and Gabor filters for handprinted letters [14].

Producing reasonably well-performing offline handwriting recognition systems has been an elusive goal for two reasons: firstly, handwriting is a form of self-expression, and the same words can be written in many different styles; secondly, many handwritten words are ambiguous and can only be recognized in context. This is especially true for the recognition of unconstrained texts, where the vocabulary may be arbitrarily large. The complete area of offline handwriting recognition is very broad, as it comprises the preprocessing of the input, recognition, and postprocessing of the results, and each of these tasks is an actively researched subject in its own right. This paper is concerned with solving the preprocessing and segmentation tasks of an offline handwriting system without the use of a grammatical model or input from the character recognizer.

1.1 The Basic Structure of an Offline Handwriting Recognition System

Even though many recognition systems try to solve the problem in different ways, there is a generally applicable module structure used by segmentation-based systems that stems from the properties of the problem itself [4]. This structure is shown in Figure 1.1.
[Flowchart: Preprocessing (Thresholding, Line localization, Word localization) -> Letter segmentation -> Feature extraction -> Recognition -> Linguistic postprocessing]
Figure 1.1 Block diagram of an off-line handwriting recognition system

For a thorough overview of handwriting recognition systems, the reader is referred to [2, 3, 13]. The first step is the preprocessing of the input picture. The aim is to maximize the signal-to-noise ratio of the input and to suppress irrelevant data. This phase contains the thresholding and filtering of the input, because the subsequent operations work on binary images. The localization of the lines and words in the image is also performed here. This information permits the subsequent analysis of the words in context, which may boost the recognition rate considerably. The letter segmentation and feature extraction steps may be performed either serially or in parallel. Some systems try to segment the located words into letters and then build feature vectors from them, which are then processed by the recognition engine; others begin to build the feature vectors from the words before segmenting them into letters.

The next step is the recognition of letters or words based on the feature vectors obtained in the previous steps. Many methods exist to accomplish this task, such as neural networks, hidden Markov models (HMMs) [5, 7] and dynamic programming techniques. The output of the recognition engine is a list of [word, confidence] pairs that contains the recognized words and the confidence level of the recognition. The linguistic postprocessor takes this list as its input and, using grammatical and context information, chooses the most appropriate word from the list. The postprocessor also corrects syntactic mistakes of the recognition.

1.2 Computational Complexity of the Preprocessing Algorithms

In almost every procedure of the preprocessing phase, tens or even hundreds of image-processing operations are performed on the input images. This is one of the most computationally intensive parts of off-line handwriting recognition, since the input images must be transformed into a feature space suitable for the recognition engine. The need to perform so many image-processing steps motivated the use of CNNs, one of the fastest parallel hardware architectures for such tasks [19, 20, 22].

1.3 Description of the Test Handwriting Database

The algorithms described below were tested on a database of 25 handwritten pages collected by Andrew Senior [8, 9]. The database contains 7000 words from the LOB (Lancaster-Oslo/Bergen) corpus written by a single writer. An excerpt is shown in Figure 1.2. There is a segmentation file for every page that specifies the words and their locations in the image, in order to ease verification of the developed algorithms. We have also created a word-count file that specifies the number of lines and the number of words for each page. Since the segmentation algorithms do not distinguish punctuation marks from real words, and the segmentation files do not contain punctuation information either, we erased the punctuation marks from the page images.

Figure 1.2 An excerpt from the handwritten text database

THE PREPROCESSING TASKS AND THEIR SOLUTIONS

These algorithms were designed to take advantage of the dual digital and analog processing capabilities of the target Ace4K platform [23]. The experiments were performed with the MatCNN simulator in Matlab; we chose this approach because it allowed us to test the capabilities of the algorithms relatively quickly. We tried to devise algorithms that could be run entirely on the CNN chip to further explore the possibilities of analogic programming, taking care to use only linear and nearest-neighbor templates that can be run efficiently on existing hardware implementations. The resolution requirements of the algorithms vary, but all single-word algorithms require less than 128x128 pixels, not exceeding the resolution of the next-generation ACE16K chip. Where the resolution requirements of the task exceed the chip's capabilities, the image is processed in tiles (mostly during the line and word segmentation algorithms). In all flowcharts, the steps designed for CNNs are shown in italics on a gray background, while the ones in normal type on a white background use traditional digital methods. The utilized CNN templates can be found in the Appendix.

2.1 Locating the Lines

Our algorithm is similar to the ones used in OCR systems [6], where lines are localized by computing the horizontal histograms of the entire image at a couple of relevant skew angles; the angle and positions where the histograms have local minima are then chosen as the locations between lines. Calculating horizontal histograms requires non-linear templates on CNNs and is not supported on the VLSI CNN implementations currently available to the authors, so we used the traditional histogram calculation executable on DSPs.

We refined the line-finding algorithm in a number of ways. In our early experiments the histograms were quite noisy, so we adopted a method that blurs the words without affecting their location. Based on an idea from [16], we compute the pseudo convex hull of each word using the HOLLOW template. This template computes the convex hull of the objects in the image if it is run sufficiently long (i.e. until there are no changes in the image); if it is stopped earlier, the hull is only pseudo convex. The running time of the template (approximately 37τ) was found by experimentation and appears to be fairly consistent across the different images. The horizontal histogram computed on the pseudo convex hulls is smoothed further via sliding-window averaging with a window size (p1) of 10; it is effectively low-pass filtered. The window size was found experimentally, but as a rule of thumb, the smaller the average character size of the handwriting, the smaller the window needed. The next step of the algorithm is to find the local maxima of the histogram, since these correspond to the locations of the lines. Sometimes more than one closely spaced local maximum corresponds to the same line, usually because the skew of the line is substantial. To correct this we introduced a second parameter (p2) that specifies a threshold within which we associate all maxima with one line. This parameter is currently held at 80% of the largest local maximum, since this level is sufficiently low to correct the most likely skew angles. Finally, we drop those maxima that are smaller than a given percentage of the average local maximum (p3 = 25%), because these are due to ligatures of letters that descend below the lower baseline (like g and y). No matter how many such letters are on one line, this threshold level is sufficiently high to exclude the resulting spurious maximum from further processing. The execution of the algorithm is illustrated in Figure 2.1. By adjusting the parameters p1, p2 and p3, the sensitivity of the algorithm can be varied widely and tuned to the given handwriting.
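To make the histogram analysis concrete, the following is a minimal digital sketch of the procedure just described (on CNN hardware the hulls would come from the HOLLOW template and the histogram from a DSP). The peak-merging rule for p2 is one plausible reading of the description above, and all function and parameter names are ours:

```python
import numpy as np

def locate_lines(hulls, p1=10, p2=0.8, p3=0.25):
    """Locate text-line rows from a binary image of pseudo convex hulls."""
    # Horizontal projection histogram: hull-pixel count per image row.
    hist = hulls.sum(axis=1).astype(float)

    # Sliding-window averaging with window size p1 (low-pass filtering).
    smooth = np.convolve(hist, np.ones(p1) / p1, mode="same")

    # Local maxima of the smoothed histogram correspond to line positions.
    peaks = [i for i in range(1, len(smooth) - 1)
             if smooth[i - 1] < smooth[i] >= smooth[i + 1]]
    if not peaks:
        return []

    # Merge closely spaced maxima belonging to one (skewed) line: if the
    # valley between two consecutive peaks stays above p2 times the weaker
    # peak, keep only the stronger of the two.
    merged = [peaks[0]]
    for i in peaks[1:]:
        j = merged[-1]
        if smooth[j:i + 1].min() >= p2 * min(smooth[i], smooth[j]):
            if smooth[i] > smooth[j]:
                merged[-1] = i
        else:
            merged.append(i)

    # Drop weak maxima caused by descender ligatures (g, y, ...).
    limit = p3 * np.mean([smooth[i] for i in merged])
    return [i for i in merged if smooth[i] >= limit]
```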
Figure 2.1 The results of the line localization algorithm (shown: the raw histogram and the histogram after smoothing)

2.2 Correcting the Line Skew

Once the lines in a given image have been localized, it is possible to correct their skew to somewhat normalize the word images. This is necessary because skewed line images introduce extra noise from the point of view of the subsequent algorithms and the word recognizer. The skew correction algorithm first finds the lowest points of the pseudo convex hulls using the LOCAL SOUTHERN ELEMENT (LSE) detector template. These points follow the baseline of the lines closely, but contain some noise due to letters such as g, j, y, p and q. To eliminate this noise, the PRUNE and FIGREC templates are applied in succession. The PRUNE template keeps black pixels that have a black neighbor (in the 4-connected sense), while the FIGREC template recalls only those parts of the original LSE-filtered image that remained after the application of PRUNE. The 4-connected neighborhood of a pixel contains its northern, southern, eastern and western neighbors, while the 8-connected neighborhood also contains the neighbors along the diagonals (NE, SE, NW and SW). The last step of the algorithm uses linear regression to fit a line to the remaining black pixels, calculates the angle of that line and rotates the original image by that angle. The flowchart of the algorithm is shown in Figure 2.2.

[Flowchart: PatchMaker & LSE -> Linear regression fitting with outlier rejection -> Calculate skew angle, Rotate]

Figure 2.2 The line skew correction algorithm
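Assuming the LSE/PRUNE/FIGREC steps have already produced the baseline points, the final fit-and-rotate step admits a compact digital sketch (names are ours; the rotation sign convention may need flipping depending on how coordinates are extracted):

```python
import numpy as np
from scipy.ndimage import rotate

def deskew_line(line_img, baseline_pts):
    """Rotate a line image so its fitted baseline becomes horizontal.

    line_img: 2-D binary array holding one text line.
    baseline_pts: (row, col) coordinates of the pruned LSE points.
    """
    rows, cols = np.asarray(baseline_pts, dtype=float).T
    # Least-squares fit row = a * col + b; a is the baseline slope.
    a, b = np.polyfit(cols, rows, 1)
    angle = np.degrees(np.arctan(a))
    # Rotate the image by the estimated skew angle.
    return rotate(line_img, angle, reshape=False, order=0)
```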

2.3 Segmentation of Lines into Words

To aid further grammatical processing, context analysis and the later reconstruction of each run of the algorithm, the lines are stored separately and in the same sequence as they appear on a page. Relevant information is also stored with each line, such as its original image, the de-skewed image etc. The data about the constituent words are also stored along with each line.

Once the lines have been processed, the next step is to locate the words on each line. This could easily be achieved by calculating vertical histograms and identifying their local minima, but we looked for an analogic approach that provides almost the same results in this particular application. The conceived algorithm is as follows: for every line, we compute the pseudo convex hull and then apply the VCCD template. This template detects the vertically connected components in an image by shifting them downwards until they disappear. For these types of images the result is that horizontal lines appear in the last row of the image (since a word is usually a single vertically connected component), corresponding to the largest horizontal extent of the word images. This makes it possible to extract the word images from the line very easily. The steps of the algorithm are illustrated in Figure 2.3. One may ask what happens if the pseudo convex hulls of the words overlap. This is not nearly as big a problem as it may first seem (in fact there are no instances of such a configuration in our database), and these sites can be identified because two parallel horizontal lines appear where the overlap occurs.
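On a digital processor the effect of the VCCD step can be reproduced by projecting the hulls onto the bottom row: each run of covered columns then marks one word, mirroring the horizontal lines the template leaves in the last row. A sketch (function names are ours):

```python
import numpy as np

def locate_words(hull_img):
    """Find word extents in one line image of pseudo convex hulls.

    Returns a list of (first_col, last_col) pairs, one per word.
    """
    occupied = hull_img.any(axis=0)          # column covered by some hull
    words, start = [], None
    for x, on in enumerate(occupied):
        if on and start is None:
            start = x                        # run begins: word start
        elif not on and start is not None:
            words.append((start, x - 1))     # run ends: word end
            start = None
    if start is not None:
        words.append((start, len(occupied) - 1))
    return words
```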

Figure 2.3 Locating the words within a line. Red lines mark the location of words

Figure 2.4 Word and line location results for a sample page. Red lines mark the start of words and green lines mark the end

SEGMENTATION OF WORDS INTO LETTERS

Once the words within the lines have been located, all subsequent operations proceed at the word level, since larger context information is only used at the linguistic postprocessing stage of recognition. In segmentation-based systems, where the basic units of recognition are letters, it is crucial to segment the words into letters as accurately as possible. Unfortunately, this is almost impossible to do correctly, as illustrated by the Sayre paradox [11]: to recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first.

There are two basic approaches used to overcome this paradox. One tries to recognize words in their entirety, without segmenting them into letters. This is effective and viable only when the set of possible words is small and known in advance, as in the recognition of bank checks and postal addresses. The other strategy is over-segmentation, where one tries to identify the smallest possible word segments (primitive segments) that may be smaller than letters but surely cannot be segmented further. Later in the recognition process these primitive segments are assembled into letters based on input from the character recognizer. The advantage of the first strategy is that it is robust and quite straightforward, but it is not very flexible. Over-segmentation is flexible, but requires more computational resources and is more error-prone. We chose the over-segmentation approach so that our system will not be limited to small lexicons, but we try to segment the words into letters as accurately as possible in order to minimize the computational cost associated with the assembly and re-recognition of compound segments. Under-segmentation must also be avoided, because such errors cannot be corrected easily later and may severely degrade the performance of the recognizer.

There is one more problem to overcome, the so-called coarticulation effect: the distortion of letter shapes when they are written next to each other, which is very frequent in human handwriting. Sometimes it is so severe that the word image in itself does not contain enough information to segment it correctly into letters, as illustrated in Figure 3.1.

Figure 3.1 An example of a word that cannot be correctly segmented based on the word image alone (the word is Conservative)

3.1 Localization of the Upper and Lower Baselines, Skew Correction

The vertical location of a character within a word is important for its recognition; even human readers rely on it to discern letters [1]. This location is best described relative to the upper and lower baselines of the word. The parts of letters that descend below the lower baseline are called descenders, while parts that ascend above the upper baseline are called ascenders. Ascenders and descenders are distinguishing features of many letters such as p, q, b, l, j and k. Encoding the location and number of descenders in the feature vectors passed to the character recognizer could aid the accurate recognition of the letters. For this to be possible, the baselines of the words must be calculated.

The algorithm for determining the location of word baselines found in the literature [8] relies on calculating the vertical histogram of the word and then analyzing it. Our algorithm is somewhat similar to the one used to calculate the line skew, but there are important differences. The pseudo convex hulls are created as before, but a mask needs to be generated to exclude the parts of the word where only erroneous elements would be located (the upper half when computing the lower baseline, and the lower half for the upper baseline). This is unnecessary when calculating the baseline for a whole line, because there are so many local southern elements on a line (since it is much longer) that the regression fitting will still be accurate. We generate this mask by letting the HOLLOW template run longer (for approximately 58τ), running the FINDCENTER CNN algorithm [21] on the result, and then generating a shadow from the center point using the SHADOWUP or SHADOWDOWN templates. The result is a mask that specifies where the following templates should be applied. Once this is done, the local northern elements (LNEs) have to be calculated in addition to the southern elements, because the upper baselines are also needed; the LNEs are calculated with the LNE template. Linear regression line fitting with outlier rejection is used to estimate the baselines. Outliers are points that are farther from the word center than a given threshold; this threshold can be estimated from the writing and should be about half of the average word height. After the baselines are found, the image is rotated so that the lower baseline is horizontal. The whole process is illustrated in Figure 3.2.

[Flowchart, two parallel branches: PatchMaker & FindCenter -> ShadowDown -> LSE -> linear regression fitting with outlier rejection, and PatchMaker & FindCenter -> ShadowUp -> LNE -> linear regression fitting with outlier rejection; the branches join in Calculate skew angle, Rotate]

Figure 3.2 The word baseline locator algorithm
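The regression step with outlier rejection admits a direct digital sketch. Following the description above, outliers are points farther from the word's vertical center than a threshold of about half the average word height (names are ours):

```python
import numpy as np

def fit_baseline(points, threshold):
    """Least-squares baseline fit with outlier rejection.

    points: (row, col) coordinates of the LSE (or LNE) pixels.
    threshold: points farther than this from the word's vertical
               center are rejected before fitting.
    Returns (a, b) of the fitted line row = a * col + b.
    """
    pts = np.asarray(points, dtype=float)
    rows, cols = pts[:, 0], pts[:, 1]
    # Reject outliers: points too far from the word's vertical center.
    center = rows.mean()
    keep = np.abs(rows - center) <= threshold
    a, b = np.polyfit(cols[keep], rows[keep], 1)
    return a, b
```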

3.2 Images Aiding Segmentation

The letter segmentation algorithm we developed is based on the same ideas as the segmentation algorithm described for touching numerals in [10]. The basic idea is that we can gain meaningful information about a word's structure by skeletonizing both the word and its background, and locating the junction points and endpoints of the skeletons. Using these special points, we are (mostly) able to construct the segmentation boundaries.

3.3 Skeletons of the Background

While pondering the problem of letter segmentation, we noticed an interesting property of skeletonized word-image backgrounds and their correct segmentation boundaries: each correct boundary line should start above the foreground skeleton (i.e. the word) and end below it, and it may cross the foreground skeleton only once. This is a necessary condition. If we skeletonize the whole background at once, identifying which endpoints are above and below the word skeleton becomes an extra and not so trivial task. If we take a different approach, however, this step can be skipped.

Our main idea was that if the foreground skeleton is 4-connected, then it can be used as a barrier between the upper and lower parts of the background. This can be exploited by flood-filling the background from the upper and lower baselines at the same time, which ensures that even if there are breaks in the word, the floods won't overflow. The two floods can then be skeletonized separately, and the characteristic points of the skeletons identified. Flood filling can be accomplished very efficiently with trigger waves on CNNs; a thorough description of trigger waves on CNNs, their properties and some applications can be found in [18]. The template used for filling the background is the BPROP template, and the filling process is shown in Figure 3.3.
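A digital stand-in for the BPROP filling, assuming the 4-connected foreground skeleton is already available: label the background with 4-connectivity and take the components touching each baseline row. Note that the CNN version launches both trigger waves simultaneously so that they meet at word breaks; in this sketch a break simply makes the two floods share a component:

```python
import numpy as np
from scipy.ndimage import label

def fill_background(skeleton, upper_row, lower_row):
    """Flood-fill the background from the two baselines, with the
    4-connected foreground skeleton acting as the barrier.

    skeleton: 2-D bool array, True on foreground skeleton pixels.
    upper_row, lower_row: row indices of the upper and lower baselines.
    """
    background = ~skeleton
    # 4-connected labelling so the flood cannot leak through diagonals.
    four = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    labels, _ = label(background, structure=four)

    # Every background component touching a seed row joins that flood.
    top_ids = np.unique(labels[upper_row][labels[upper_row] > 0])
    bot_ids = np.unique(labels[lower_row][labels[lower_row] > 0])
    return np.isin(labels, top_ids), np.isin(labels, bot_ids)
```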

Figure 3.3 The background filling process using the BPROP template

The skeletonization algorithm described in [21] is suitable for general purposes, but in this particular application the structure of the resulting skeleton is crucial, so we experimented with different orderings of the skeletonization templates. The ordering has an enormous impact on the final skeleton, as shown in Figure 3.4. After much debate, we performed all later experiments with method f), because the skeletons obtained with this ordering are easier to analyze later, since they contain relatively long straight segments.
a) SKELE1-2-3-4-5-6-7-8

b) SKELE8-7-6-5-4-3-2-1

c) SKELE1-5-2-6-3-7-4-8

d) SKELE5-1-6-2-7-3-8-4

e) SKELE1-5-3-7, SKELE2-6-4-8, SKELE1-2-3-4-5-6-7-8


f) SKELE5-7-1-3, SKELE6-8-2-4, SKELE1-2-3-4-5-6-7-8

g) SKELE1-3-5-7, SKELE2-4-6-8, SKELE1-2-3-4-5-6-7-8

h) SKLHV1-2-3-4-5-6-7-8

Figure 3.4 The effects of changing the ordering of the skeletonization templates on the skeleton. The number sequences after the template base name represent the order of the individual templates used in the iterative skeletonization algorithm (these templates can be found in the Appendix)

3.3.1 Locating the Endpoints of the Background Skeleton

The endpoints of the background skeleton must be found, since they form the basis of the letter segmentation algorithm. By definition, a point of an 8-connected skeleton is an endpoint if and only if it has exactly one neighbor. This definition closely follows our intuitive notion of an endpoint, except in two special cases. These exceptions occur when there are branches on the skeleton that are exactly one pixel long (shown in Figure 3.5). Even though their interpretation is ambiguous (they could be considered mere slight curves), we mark them as endpoints, since we do not want to lose an endpoint of the skeleton under any circumstances.
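The basic endpoint test (exactly one 8-neighbor) is a small convolution; the two special one-pixel-branch cases of Figure 3.5, handled on the CNN by the TOPEP and BOTEP templates, would need additional pattern checks on top of this sketch:

```python
import numpy as np
from scipy.ndimage import convolve

def skeleton_endpoints(skel):
    """Endpoints of an 8-connected skeleton: pixels with exactly one
    8-neighbour (digital equivalent of the GETEP-based step)."""
    skel = skel.astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],     # count neighbours, not the pixel
                       [1, 1, 1]])
    neighbours = convolve(skel, kernel, mode="constant", cval=0)
    return (skel == 1) & (neighbours == 1)
```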

Figure 3.5 Special endpoint morphologies relevant for letter segmentation (cases a and b). The grey pixels may be black or white

To find the endpoints, we first erase those points that have only one neighbor with the GETEP template and subtract the result from the original image. Afterwards, we find the special cases with the TOPEP and BOTEP templates and merge these points with the previous ones. We use the pseudo convex hulls as a mask to constrain the search area to near the writing, because skeletons sometimes have stray endpoints.

3.4 Skeletons of the Foreground

The foreground skeleton must be 4-connected for the above algorithm to work, and skeletonizing with the SKLHV templates (applying them in ascending order) can ensure this. If the original word image itself is not 4-connected, however, then the resulting skeleton won't be 4-connected either, therefore the original image has to be preprocessed to ensure 4-connectedness. This can be done using the CONN4SE, CONN4SW, CONN4NE and CONN4NW templates, which fill a given pixel according to local rules.

3.5 Segmenting Words into Letters

After studying the skeleton images of handwritten words, we found only two requirements that have to be met by a correct segmentation boundary: a) no endpoint can be part of two boundaries at the same time, and b) a segmentation boundary must start at a top endpoint and end at a bottom endpoint. Unfortunately, these are not enough to filter and match the already identified skeleton endpoints, so some other heuristics are needed, which are illustrated in Figure 3.6.

Figure 3.6 Segmentation heuristics using skeleton endpoints

At location 1, the top and bottom endpoints are located right next to each other, so it is certainly a segment boundary: this can only happen where the top and bottom floods have met because of a word break (in this case it signifies the start of the first letter). Location 2 shows an instance where the skeleton needs postprocessing to unite short branches that only add clutter to the problem; a novel approach to solving this is presented in the next section. Location 3 shows the average case, where the top and bottom endpoints are located close to each other (within a specified distance d), but not right next to each other. There is a boundary point where the shortest path connecting these points intersects the word skeleton. Finally, location 4 shows a quite common special case, when there is only a top endpoint with a horizontal (or nearly horizontal) line segment on the lower background skeleton. This is usually a segmentation point, but it has to be found separately from the other cases.

3.5.1 Postprocessing the Endpoints of Skeletons

The objective of this algorithm is to replace the endpoints of two short branches of a skeleton with one endpoint located approximately halfway between the original points, on the same side of the word skeleton. The problem can be solved efficiently using trigger waves with isotropic templates. Trigger waves generated by an isotropic template (CPATCH) propagate with the same speed in all directions, so they can be used to measure distance on an image. In order to measure the distance between two points, we must be able to tell whether the two wave fronts met while the waves propagated for a specified time (say t). We can detect whether the wave fronts merged by running the inverse template (CWPATCH) for the same amount of time (t). This has the effect that in all places where the patches stayed convex (which is the same as saying that the circles did not merge) only the original starting pixels remain, while where the wave fronts did merge, because the perfect isotropy was broken, multi-pixel patches remain visible. This is illustrated in Figure 3.7. We can give an upper estimate of the distance based on the running time of the templates: the wave fronts produced by the templates presented here have a radial propagation speed of 1 pixel/τ according to simulations.

Figure 3.7 Distance approximation with trigger waves. The black points are the starting points; the gray circles show the maximal extent of the wave fronts (after CPATCH) and the light gray patches the result after applying the CWPATCH template

The skeleton short-branch merger algorithm proceeds as follows (a digital sketch of this procedure is given after the second list below):

1. Initialize a white image with the branch endpoints as the only black pixels.
2. Run the CPATCH template for xτ. This generates black circles with a radius r around the initial points. Where the starting points are too close to each other, the circles merge.
3. Run the inverse template CWPATCH for xτ. This shrinks the circles back to a point wherever the circles did not merge (they stayed convex), and to more than one pixel where they did.
4. Remove single pixels from the image with the FIGEXT template.
5. Find the center pixels of the remaining patches with the FINDCENTER CNN algorithm.

3.5.2 The Letter Segmentation Algorithm

1. Find the skeleton endpoints.
2. Merge the endpoints on short branches, separately for the top half of the background skeleton and for the bottom half.
3. Find the word breaks, i.e. the locations where the top and bottom endpoints are right next to each other (location 1 in Figure 3.6). Store these as segmentation boundaries and remove them from further processing.
4. Find those skeleton endpoint pairs where one is on the top skeleton, the other on the bottom skeleton, and the two are closer to each other than some predefined distance d (location 3). This can be accomplished with the algorithm described in Section 3.5.1, running FINDCENTER on the intersection of the foreground skeleton and the patches remaining after CWPATCH. Add the center points to the segmentation boundaries and remove them from further processing.
5. Generate vertical lines at most n pixels long from the remaining top endpoints using the DPROP template. If these reach the bottom skeleton and the lines intersect the foreground skeleton, add the lowest intersection points to the segment boundaries.
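A digital analogue of the short-branch merger (steps 1-5 of the first list above): morphological dilation followed by erosion with the same disk plays the role of the CPATCH/CWPATCH wave pair, and surviving multi-pixel patches mark endpoints that were closer than about 2r. The scipy-based realization and the names are ours:

```python
import numpy as np
from scipy.ndimage import (binary_dilation, binary_erosion,
                           center_of_mass, label)

def merge_close_endpoints(endpoints, r):
    """Replace pairs of endpoints closer than ~2*r with their midpoint.

    endpoints: 2-D bool array with the branch endpoints set to True.
    """
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    disk = x * x + y * y <= r * r            # circular structuring element

    grown = binary_dilation(endpoints, structure=disk)   # CPATCH stand-in
    shrunk = binary_erosion(grown, structure=disk)       # CWPATCH stand-in

    labels, n = label(shrunk)
    merged = []
    for i in range(1, n + 1):
        patch = labels == i
        if patch.sum() > 1:                  # single pixels = unmerged points
            r_, c_ = center_of_mass(patch)   # FINDCENTER stand-in
            merged.append((int(round(r_)), int(round(c_))))
    return merged
```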


[Flowchart: word image with baseline data -> skeletonize the foreground (4-connected), the top half of the background (8-connected) and the bottom half of the background (8-connected) -> detect background skeleton endpoints -> find word breaks; filter close top and bottom endpoints -> match endpoints -> list of segmentation points]

Figure 3.8 Flowchart of the letter segmentation algorithm

DISCUSSION OF THE RESULTS

We have run these line, word and letter segmentation algorithms on 10 pages of the 25-page database. The line segmentation algorithm found every line on every page correctly. Statistics for the word segmentation algorithm are shown in Figure 4.1.
Total number of lines processed:        169
Correctly segmented lines:              146 (86.39%)
Incorrectly segmented lines:             23 (13.61%)
Lines with more words than expected:     12 (7.10%)
Lines with fewer words than expected:    11 (6.51%)

Figure 4.1 Word segmentation results for 10 pages

These results are quite good: more than 86% of the lines were segmented correctly into words, and among the 23 erroneous lines only 11 contained fewer words than expected. This is important because merging words afterwards is easier than splitting them up. It means that 93.49% of the lines were either correctly segmented or can be easily corrected.

Assessing the results of the letter segmentation algorithm is not as straightforward as for the other two. The problem is that there are many equivalent segmentation boundaries, which cannot be identified easily by a computer program; one has to inspect the word images manually, one by one, which requires an enormous amount of time and is error-prone. Using a statistical approach, we can evaluate the algorithm differently: since the goal was over-segmentation, we can assess the algorithm by comparing the number of segmentation boundaries with the number of letters specified in the segmentation file, which gives a rough estimate of the effectiveness of the algorithm. These statistics are shown in Figure 4.2.
Total number of words found:                 1201
Correctly (over)segmented words:              890 (74.10%)
Incorrectly (under)segmented words:           311 (25.90%)

Figure 4.2 Letter segmentation results for 7 pages

We also inspected the segmentation of random words visually to further grade the algorithm. Results for a few words can be seen in Figure 4.3. Note that there are a couple of problem sites (encircled), some of which are difficult to resolve even for a human reader. It is also evident why automatic evaluation of the segment boundaries is next to impossible.

Figure 4.3 Letter segmentation results for a few sample words (real, judge, difficult, about, executive, segregation). The segment boundaries have been manually enlarged to be clearly visible; the circles indicate problem spots

CONCLUSIONS & FUTURE WORK

We have demonstrated that analogic algorithms can be utilized effectively in the preprocessing and segmentation problems of off-line handwriting recognition. By avoiding iterative methods and using propagating wave-based approaches wherever possible, the inherent parallel processing capabilities of CNN arrays can be greatly exploited.

It is also evident from the segmentation efforts that a linear feed-forward approach (segmentation, then recognition, then postprocessing) may not be the best and most robust way to tackle the problem. Studies conducted on human subjects indicate that readers do not read a word linearly from left to right, letter by letter; rather, they try to identify the most distinctive letters first, such as the ones at the beginning and the end of the word [1] and those with prominent features. If this limited information, together with context information, is sufficient to recognize the word, they move on; otherwise they attempt to recognize more letters from the word. We are designing an experimental framework that would enable us to imitate this recognition process.

It is evident that the use of a lexicon-based linguistic system, one that is able to generate all grammatically correct forms of the words of a language, can provide additional information from the earliest stages of processing to support the segmentation and recognition phases. This information could enable the dynamic reduction of the lexicon size, selecting only candidate words that fit a given feature set. Since segmentation is so tightly coupled to recognition, an approach that utilizes recognition information during segmentation is expected to work better than a feed-forward mechanism [12]. The structure of the proposed handwriting recognition system is shown in Figure 5.1.

[Flowchart: Preprocessing (Thresholding, Line localization, Word localization) -> Feature extraction -> Lexicon lookup; path 1 continues through Letter segmentation and Recognition to Linguistic postprocessing, path 2 leads directly to Linguistic postprocessing]

Figure 5.1 Block diagram of the proposed handwriting recognition system. The numbered lines represent alternative paths to recognition. Path 1 shows the case when the lexicon cannot help in reducing the number of candidate words and the word must be further segmented and recognized. Path 2 illustrates the case when the lexicon lookup narrows down the number of candidate words so effectively that recognition is trivial.

Human subjects rely heavily on so-called perceptual features, such as ascenders, descenders, t-crossings, loops and word length, to identify words and letters when reading. We are also considering the use of these features as the basis of recognition, since we think that the detection of these features lends itself well to CNN algorithms.
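As an illustration of the proposed lexicon reduction, the sketch below matches words against a coarse ascender/descender signature. In a real system the observed signature would come from detected image features, not from the spelled word, and all names here are hypothetical:

```python
def perceptual_signature(word):
    """Map a word to an ascender/descender signature, e.g.
    'judge' -> 'dxadx' (a: ascender, d: descender, x: x-height).
    """
    ascenders, descenders = set("bdfhklt"), set("gjpqy")
    return "".join("a" if c in ascenders else
                   "d" if c in descenders else "x"
                   for c in word.lower())

def reduce_lexicon(lexicon, observed_signature):
    """Keep only candidate words whose signature matches the
    ascender/descender pattern observed in the word image."""
    return [w for w in lexicon
            if perceptual_signature(w) == observed_signature]

# Example: an observed a-x-x-x-x pattern narrows the lexicon quickly.
words = ["judge", "horse", "wedge", "house"]
print(reduce_lexicon(words, perceptual_signature("house")))
# -> ['horse', 'house']
```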

REFERENCES

1. L. Schomaker and E. Segers: Finding features used in the human reading of cursive handwriting, International Journal on Document Analysis and Recognition, Vol. 2, pp. 13-18, 1999.
2. G. Lorette: Handwriting recognition or reading? What is the situation at the dawn of the 3rd millennium?, International Journal on Document Analysis and Recognition, Vol. 2, pp. 2-12, 1999.
3. T. Steinherz, E. Rivlin and N. Intrator: Offline cursive script recognition: a survey, International Journal on Document Analysis and Recognition, Vol. 2, pp. 90-110, 1999.
4. G. Kim, V. Govindaraju and S. Shrihari: An architecture for handwritten text recognition systems, International Journal on Document Analysis and Recognition, Vol. 2, pp. 37-44, 1999.
5. A. El-Yacoubi, M. Gilloux, R. Sabourin and C. Y. Suen: An HMM-based approach for offline unconstrained handwritten word modeling and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 8, pp. 752-760, 1999.
6. S. N. Shrihari et al.: Analysis of Textual Images Using the Hough Transform, Machine Vision and Applications, Vol. 2, pp. 141-153, 1989.
7. M. Mohamed and P. Gader: Generalized Hidden Markov Models, Part II: Application to Handwritten Word Recognition, IEEE Transactions on Fuzzy Systems, Vol. 8, pp. 82-94, 2000.
8. A. Senior and A. J. Robinson: An Off-line Cursive Handwriting Recognition System, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, pp. 309-321, 1998.
9. A. Senior: LOB Database, ftp://svr-ftp.eng.cam.ac.uk/pub/reports/Senior_tr105.ps.Z
10. Y. Chen and J. Wang: Segmentation of Single- or Multiple-Touching Handwritten Numeral String Using Background and Foreground Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1304-1317, 2000.
11. K. M. Sayre: Machine Recognition of Handwritten Words: A Project Report, Pattern Recognition, Vol. 5, No. 3, pp. 213-228, 1973.
12. S. Madhvanath and V. Govindaraju: The Role of Holistic Paradigms in Handwritten Word Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, pp. 149-164, 2001.
13. R. Plamondon and S. Shrihari: On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 63-84, 2000.
14. E. Saatci and V. Tavsanoglu: Multiscale Handwritten Character Recognition Using CNN Image Filters, Proceedings of the 13th International Joint Conference on Neural Networks (IJCNN 2002), Honolulu, 2002, in print.
15. T. Szirányi and J. Csicsvári: High-Speed Character Recognition Using a Dual Cellular Neural Network Architecture (CNND), IEEE Transactions on Circuits and Systems, Vol. 40, pp. 223-231, 1993.
16. M. Morita, F. Bortolozzi, J. Facon and R. Sabourin: Morphological approach of handwritten word skew correction, X SIBGRAPI'98, International Symposium on Computer Graphics, Image Processing and Vision, Rio de Janeiro, Brazil, Oct. 1998.
17. R. L. Graham and F. F. Yao: Finding the Convex Hull of a Simple Polygon, Journal of Algorithms, Vol. 4, pp. 324-331, 1983.
18. Cs. Rekeczky and L. O. Chua: Computing with Front Propagation: Active Contour and Skeleton Models in Continuous-Time CNN, Journal of VLSI Signal Processing, Vol. 23, pp. 373-402, 1999.
19. L. O. Chua and T. Roska: The CNN Paradigm, IEEE Transactions on Circuits and Systems, Vol. 40, pp. 147-156, 1993.
20. L. O. Chua and T. Roska: Cellular Neural Networks: Foundations and Primer, Lecture Notes for the course EE129 at the University of California, Berkeley, 1997.
21. T. Roska and L. Kék: CNN Software Library (Templates and Algorithms), Version 1.1, Analogic Computers Ltd., Budapest, 2000.
22. T. Roska and L. O. Chua: The CNN Universal Machine: An Analogic Array Computer, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 40, pp. 163-173, March 1993.
23. G. Liñán, S. Espejo, R. Domínguez-Castro and A. Rodríguez-Vázquez: ACE4k: An analog I/O 64 x 64 visual microprocessor chip with 7-bit analog accuracy, International Journal of Circuit Theory and Applications, Vol. 30, pp. 89-116, May-June 2002.

APPENDIX: THE UTILIZED CNN TEMPLATES

Linear, isotropic templates. These share the general form

A = [a2 a1 a2; a1 a0 a1; a2 a1 a2],  B = [b2 b1 b2; b1 b0 b1; b2 b1 b2],  bias z

with the parameters below (BCond is the boundary condition, X(0) the initial state; P denotes the input image, M the marker image):

Template   a0    a1    a2     b0   b1   b2   z      BCond   X(0)
FIGREC      4    0.5   0.5     4    0    0    3      -1      M
FIGEXT      1    0     0       8    1    1   -1      -1      P
PRUNE       3    0.5   0       0    0    0   -1.5     0      P
BPROP       3    0.25  0.25    0    0    0    3.75   -1      P
WPROP       3    0.25  0.25    0    0    0   -3.75    1      P
CPATCH      3    0.25  0.2     0    0    0    3.65   -1      P
CWPATCH     3    0.25  0.2     0    0    0   -3.65   -1      P
HOLLOW      3    0.25  0.25    0    0    0    2.25   -1      P
GETEP       3    0.25  0.25    0    0    0   -1.25   -1      P
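For reference, the templates above plug into the standard Chua-Yang CNN state equation [19]. The following is a toy forward-Euler simulator (a stand-in for MatCNN; the function name and numeric settings are ours) that applies any 3x3 template pair for a given number of time constants:

```python
import numpy as np

def run_template(u, x0, A, B, z, bcond=-1.0, t_end=10.0, dt=0.1):
    """Forward-Euler simulation of the Chua-Yang CNN state equation
        x' = -x + A * y + B * u + z,   y = 0.5 * (|x + 1| - |x - 1|),
    where '*' denotes spatial correlation over the 3x3 neighbourhood.
    Time is measured in CNN time constants (tau); black = +1, white = -1.
    """
    x = x0.astype(float).copy()
    u = u.astype(float)

    def corr3(img, T):
        # Pad with constant boundary cells, then sum weighted neighbours.
        p = np.pad(img, 1, constant_values=bcond)
        out = np.zeros_like(img)
        for i in range(3):
            for j in range(3):
                out += T[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
        return out

    bu = corr3(u, B)                          # the input term is constant
    for _ in range(int(t_end / dt)):
        y = np.clip(x, -1.0, 1.0)             # piecewise-linear output
        x += dt * (-x + corr3(y, A) + bu + z)
    return np.clip(x, -1.0, 1.0)
```

Using the HOLLOW row of the table above (B = 0, z = 2.25, X(0) = P) with t_end = 37.0 would, under these assumptions, reproduce the pseudo convex hull computation of Section 2.1.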

Linear, non-isotropic templates:

DPROP:
X(0) = P, A = [0 1.75 0; 0 3 0; 0 0 0], B = 0, z = 3.75

LSE:
A = [0 0 0; 0 1 0; 0 0 0], B = [0 0 0; 0 1 0; 0 -1 0]

LNE: Same as LSE, but the B matrix must be rotated around the center element by 180°.

VCCD:
X(0) = P, A = [0 1 0; 0 2 0; 0 -1 0], B = 0, z = 0

CONN4SE:
X(0) = 0, A = [0 0 0; 0 3 0; 0 0 0], B = [0 0 0; 0 1 1; 0 1 1], z = 3

CONN4SW, CONN4NW, CONN4NE: Same as CONN4SE, but the B matrix must be rotated around the center element by 90°, 180° and 270°, respectively.

TOPEP:
A = [0 0 0; 0 1 0; 0 0 0], B = [0 0 0; 1 1 1; 1 1 1], z = 5.5

BOTEP:
A = [0 0 0; 0 1 0; 0 0 0], B = [1 1 1; 1 1 1; 0 0 0], z = 5.5

SKELE1: A = [0 0 0; 0 1 0; 0 0 0], B = [1 1 0; 1 7 1; 0 1 0], z = 3
SKELE2: A as SKELE1, B = [1 1 1; 0 7 0; 0.5 1 0.5], z = 3.4
SKELE3: A as SKELE1, B = [0 1 1; 1 7 1; 0 1 0], z = 3
SKELE4: A as SKELE1, B = [0.5 0 1; 1 7 1; 0.5 0 1], z = 3.4
SKELE5: A as SKELE1, B = [0 1 0; 1 7 1; 0 1 1], z = 3
SKELE6: A as SKELE1, B = [0.5 1 0.5; 0 7 0; 1 1 1], z = 3.4
SKELE7: A as SKELE1, B = [0 1 0; 1 7 1; 1 1 0], z = 3
SKELE8: A as SKELE1, B = [1 0 0.5; 1 7 1; 1 0 0.5], z = 3.4

SKLHV1:
A = [0 0 0; 0 1 0; 0 0 0], B = [1 1 0; 1 7 1; 0 1 1], z = 2

SKLHV2-SKLHV8: Same as SKLHV1, but the B matrix must be rotated around the center element one step at a time.
