2018 24th International Conference on Pattern Recognition (ICPR), 2018
Water image classification is challenging because water images of ocean or river share the same properties with images of polluted water containing fungus, waste and rubbish. In this paper, we present a method for classifying clean and polluted water images. The proposed method explores Fourier transform based features for extracting texture properties of clean and polluted water images. The Fourier spectrum of each input image is divided into several sub-regions based on angle and spatial information. For each region over the spectrum, the proposed method extracts mean and variance features from the intensity values, which results in a feature matrix. The feature matrix is then passed to an SVM classifier for the classification of clean and polluted water images. Experimental results on classes of clean and polluted water images show that the proposed method is effective. Furthermore, a comparative study with the state-of-the-art method shows that the proposed method outperforms the existing method in terms of classification rate, recall, precision and F-measure.
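As a rough illustration of the feature extraction described above, the sketch below divides the centred Fourier spectrum into angular-and-radial sub-regions and collects the mean and variance of the magnitudes in each; the sector and ring counts here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fourier_sector_features(img, n_sectors=8, n_rings=2):
    """Mean/variance of the Fourier magnitude over angular-and-radial
    sub-regions of the centred spectrum (a sketch of the paper's idea;
    the sector/ring counts are assumed for illustration)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = spec.shape
    cy, cx = h / 2.0, w / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    angle = np.arctan2(yy - cy, xx - cx)       # in [-pi, pi]
    radius = np.hypot(yy - cy, xx - cx)
    r_max = radius.max() + 1e-9
    feats = []
    for s in range(n_sectors):
        a_lo = -np.pi + 2 * np.pi * s / n_sectors
        a_hi = -np.pi + 2 * np.pi * (s + 1) / n_sectors
        for r in range(n_rings):
            r_lo, r_hi = r_max * r / n_rings, r_max * (r + 1) / n_rings
            mask = (angle >= a_lo) & (angle < a_hi) & \
                   (radius >= r_lo) & (radius < r_hi)
            vals = spec[mask]
            feats.append([vals.mean(), vals.var()])
    return np.array(feats)                      # (n_sectors * n_rings, 2)

feat = fourier_sector_features(np.random.rand(64, 64))
print(feat.shape)  # (16, 2)
```

The flattened feature matrix would then be fed to an SVM classifier, e.g. scikit-learn's `SVC`.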
Scene text binarization and recognition is a challenging task due to the varied appearance of text against cluttered backgrounds and uneven illumination in natural scene images. In this paper, we present a new method based on adaptive histogram analysis for each sliding window over a word of a text line detected by the text detection method. The histogram analysis rests on the observation that the intensity values of text pixels in each sliding window have uniform color. The method segments the words based on region growing, which studies the spacing between words and characters. Then we propose to use existing OCRs such as ABBYY and Tesseract (Google) to recognize the text line at the word and character levels to validate the binarization results. The method is compared with a well-known global thresholding technique of binarization to show its effectiveness.
2018 24th International Conference on Pattern Recognition (ICPR), 2018
Text line segmentation from handwritten documents is challenging when a document image contains severe touching. In this paper, we propose a new idea based on Weighted-Gradient Features (WGF) for segmenting text lines. The proposed method finds the number of zero crossing points for every row of the Canny edge image of the input image, which is considered as the weight of the respective row. The weights are then multiplied with the gradient values of the respective rows of the image to widen the gap between pixels in the middle portion of text and the other portions. Next, k-means clustering is performed on the WGF to classify middle and other pixels of text. The method performs morphological operations on the clustering result to obtain word components as patches. The patches in both clusters are matched to find common patch areas, which helps in reducing the touching effect. Then the proposed method checks linearity and non-linearity iteratively based on patch direction to segment text lines. The method is tested on our own and standard datasets, namely, Alaei, the ICDAR 2013 robust competition on handwriting context and ICDAR 2015-HTR, to evaluate the performance. Further, the method is compared with state-of-the-art methods to show its effectiveness and usefulness.
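A minimal sketch of the row-weighting step described above, assuming a binary Canny edge map and a gradient-magnitude image of the same shape as inputs:

```python
import numpy as np

def weighted_gradient_rows(edge_map, gradient):
    """Weighted-Gradient Features sketch: the number of edge transitions
    (zero crossings) in each row of the edge map weights that row of the
    gradient image, widening the gap between the middle portion of a
    text line and its surroundings."""
    # 0->1 or 1->0 transitions along each row approximate the
    # zero-crossing count for that row
    transitions = np.abs(np.diff(edge_map.astype(int), axis=1)).sum(axis=1)
    return gradient * transitions[:, None]

edges = np.array([[0, 1, 0, 1],
                  [0, 0, 0, 0],
                  [1, 1, 0, 0]])
grad = np.ones((3, 4))
wgf = weighted_gradient_rows(edges, grad)
print(wgf[:, 0])  # row weights: [3. 0. 1.]
```

The resulting WGF matrix would then go to k-means clustering to separate middle-of-line pixels from the rest.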
Automatic script identification in archives of documents is essential for searching for a specific document in order to choose an appropriate Optical Character Recognizer (OCR) for recognition. Besides, identification of one of the oldest classes of historical documents, such as Indus scripts, is challenging and interesting because of inter-script similarities. In this work, we propose a new robust script identification system for Indian scripts that includes Indus documents and other scripts, namely, English, Kannada, Tamil, Telugu, Hindi and Gujarati, which helps in selecting an appropriate OCR for recognition. The proposed system explores the spatial relationship between dominant points, namely, intersection points, end points and junction points of the connected components in the documents to extract the structure of the components. The degree of similarity between the scripts is studied by computing the variances of the proximity matrices of the dominant points of the respective scripts. The method is evaluated on 700 scanned document images. Experimental results show that the proposed system outperforms the existing methods in terms of classification rate.
Text detection and recognition in poor quality video is a challenging problem due to unpredictable blur and distortion effects caused by camera and text movements. This affects the overall performance of text detection and recognition methods. This paper presents a combined quality metric for estimating the degree of blur in a video/image. The proposed method then introduces a blind deconvolution model that enhances the edge intensity by suppressing blurred pixels. The proposed deblurring model is compared with other state-of-the-art models to demonstrate its superiority. In addition, to validate the usefulness and effectiveness of the proposed model, we conducted text detection and recognition experiments on blurred images classified by the proposed model from standard video databases, namely, ICDAR 2013, ICDAR 2015 and YVT, and then standard natural scene image databases, namely, ICDAR 2013, SVT and MSER. Text detection and recognition results on both blurred and deblurred videos/images illustrate that the proposed model improves the performance significantly.
2013 12th International Conference on Document Analysis and Recognition, 2013
This paper presents a two-stage method for multi-oriented video character segmentation. Words segmented from video text lines are considered for character segmentation in the present work. Words can contain isolated or non-touching characters, as well as touching characters. Therefore, the character segmentation problem can be viewed as a two-stage problem. In the first stage, the text cluster is identified and isolated (non-touching) characters are segmented. The orientation of each word is computed and the segmentation paths are found in the direction perpendicular to the orientation. Candidate segmentation points computed using the top distance profile are used to find the segmentation path between the characters, considering the background cluster. In the second stage, the segmentation results are verified and a check is performed to ascertain whether the word component contains touching characters or not. The average width of the components is used to find the touching character components. For segmentation of the touching characters, segmentation points are then found using average stroke width information, along with the top and bottom distance profiles. The proposed method was tested on a large dataset and was evaluated in terms of precision, recall and F-measure. A comparative study with existing methods reveals the superiority of the proposed method.
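The second-stage touching check above can be illustrated as follows; the 1.5 threshold factor is an assumed value for the sketch, not taken from the paper:

```python
import numpy as np

def flag_touching(widths, factor=1.5):
    """Sketch of the second-stage check: components much wider than the
    average component width are flagged as likely touching characters
    (the threshold factor is an illustrative assumption)."""
    widths = np.asarray(widths, dtype=float)
    return widths > factor * widths.mean()

# three normal-width components and one abnormally wide one
print(flag_touching([10, 12, 11, 30]))  # [False False False  True]
```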
International Journal on Document Analysis and Recognition (IJDAR), 2015
Thinning that preserves the visual topology of characters in video is challenging in the fields of document analysis and video text analysis due to low resolution and complex backgrounds. This paper proposes to explore the ring radius transform (RRT) to generate a radius map from the Canny edges of each input image to obtain its medial axis. A radius value in the radius map is the nearest distance to the edge pixels on the contours. From the radius map, the method proposes a novel idea for identifying the medial axis (the middle pixels between two strokes) for arbitrary orientations of the character. Iterative maximal growing is then proposed to connect missing medial axis pixels at junctions and intersections. Next, we perform histogram analysis on the color information of the medial axes, with clustering, to eliminate false medial axis segments. The method finally restores the shape of the character through the radius values of the medial axis pixels for the purpose of recognition with the Google open source OCR (Tesseract). The method has been tested on video, natural scene and handwritten images.
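The radius map described above can be approximated with a Euclidean distance transform, as in this sketch (a simplification of the ring radius transform, using `scipy` for the distance computation):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def radius_map(edge_map):
    """Ring-radius-transform sketch: for every pixel, the nearest
    distance to an edge (contour) pixel. Medial-axis pixels then appear
    as local maxima midway between two parallel stroke contours."""
    # distance_transform_edt measures distance to the nearest zero,
    # so invert the binary edge map first
    return distance_transform_edt(1 - edge_map)

# two vertical "stroke contours" at columns 1 and 5
edges = np.zeros((5, 7), dtype=int)
edges[:, 1] = 1
edges[:, 5] = 1
rmap = radius_map(edges)
print(rmap[2])  # [1. 0. 1. 2. 1. 0. 1.]  -> medial axis at column 3
```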
There are situations where it is not possible to capture or scan a large document with a given imaging medium such as a Xerox machine or scanner as a single image in a single exposure because of inherent limitations. This results in capturing or scanning a large document as a number of split components. Hence, there is a lot of scope for mosaicing the several split images into a single large document image. In this work, we present a novel Fourier Transform (FT) based Column-Block (CB) and Row-Block (RB) matching procedure to mosaic the two split images of a large document in order to rebuild the original single large document image. The FT is rarely used in the analysis of documents since it provides only global information about the document. The global information does not help in analyzing split images, since the mosaicing of split document images requires local rather than global information. Hence, in this work, we explore a novel idea to obtain local values of the split documents by applying the FT to smaller sized portions of the split document images. The proposed method assumes that the overlapping region is present at the right end of split image 1 and the left end of split image 2. The overlapping region is a common region, which helps in mosaicing.
Text detection in real-world images captured in unconstrained environments is an important yet challenging computer vision problem due to the great variety of appearances, cluttered backgrounds, and character orientations. In this paper, we present a robust system based on the concepts of Mutual Direction Symmetry (MDS), Mutual Magnitude Symmetry (MMS) and Gradient Vector Symmetry (GVS) properties to identify text pixel candidates regardless of orientation, including curves (e.g. circles, arc shapes), from natural scene images. The method works on the basis that the text patterns in both the Sobel and Canny edge maps of the input images exhibit similar behavior. For each text pixel candidate, the method proposes to explore SIFT features to refine the text pixel candidates, which results in text representatives. Next, an ellipse growing process is introduced based on a nearest neighbor criterion to extract the text components. The text is verified and restored based on text direction and a spatial study of the pixel distribution of the components to filter out non-text components. The proposed method is evaluated on three benchmark datasets, namely, ICDAR2005 and ICDAR2011 for horizontal text evaluation and MSRA-TD500 for non-horizontal straight text evaluation, and on our own dataset (CUTE80), consisting of 80 images, for curved text evaluation, to show its effectiveness and superiority over existing methods.
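A hedged sketch of the direction-symmetry idea underlying MDS: two stroke-boundary pixels are mutually symmetric when their gradient directions are roughly opposite. The tolerance value below is an assumption for illustration, not the paper's parameter:

```python
import numpy as np

def direction_symmetric(theta_a, theta_b, tol=np.pi / 8):
    """Mutual-direction-symmetry sketch: True when the two gradient
    directions (radians) are approximately opposite, as expected for
    pixels on the two sides of a stroke."""
    # wrapped angular difference in [0, pi]
    diff = np.abs((theta_a - theta_b + np.pi) % (2 * np.pi) - np.pi)
    return np.abs(diff - np.pi) < tol

print(direction_symmetric(0.0, np.pi))  # True: opposite directions
print(direction_symmetric(0.0, 0.1))    # False: nearly the same direction
```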
Detection of both scene text and graphic text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding of video. In this paper, we explore a new idea of classifying low contrast and high contrast video images in order to detect accurate boundaries of the text lines in video images. In this work, high contrast refers to sharpness, while low contrast refers to dim intensity values in the video images. The method introduces heuristic rules based on a combination of filters and edge analysis for the classification purpose. The heuristic rules are derived from the fact that the number of Sobel edge components exceeds the number of Canny edge components in high contrast video images, and vice versa for low contrast video images. In order to demonstrate the use of this classification in video text detection, we implement a method based on Sobel edges and texture features for detecting text in video images. Experiments are conducted using video images containing both graphic text and scene text with different fonts, sizes, languages and backgrounds. The results show that the proposed method outperforms existing methods in terms of detection rate, false alarm rate, misdetection rate and inaccurate boundary rate.
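The contrast heuristic above reduces to comparing connected-component counts of the two edge maps; a minimal sketch, assuming binary Sobel and Canny edge maps as inputs:

```python
import numpy as np
from scipy.ndimage import label

def is_high_contrast(sobel_edges, canny_edges):
    """Heuristic sketched from the abstract: a frame is classified as
    high contrast when its Sobel edge map yields more connected
    components than its Canny edge map, and low contrast otherwise."""
    n_sobel = label(sobel_edges)[1]  # label() returns (labels, count)
    n_canny = label(canny_edges)[1]
    return n_sobel > n_canny

sobel = np.array([[1, 0, 1, 0, 1]])  # 3 components
canny = np.array([[1, 1, 0, 0, 0]])  # 1 component
print(is_high_contrast(sobel, canny))  # True
```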
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median-moment features with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a Mutual Nearest Neighbor based Symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block or not. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.
IEEE Transactions on Circuits and Systems for Video Technology, 2010
In this paper, we propose new Fourier-statistical features (FSF) in RGB space for detecting text in video frames with unconstrained backgrounds, different fonts, different scripts, and different font sizes. This paper consists of two parts, namely, automatic classification of text frames from a large database of text and non-text frames, and FSF in RGB space for text detection in the classified text frames. For text frame classification, we present novel features based on three visual cues, namely, sharpness in filter-edge maps, straightness of the edges, and proximity of the edges, to identify a true text frame. For text detection in video frames, we present new Fourier transform based features in RGB space together with statistical features; the computed FSF features from the RGB bands are subjected to K-means clustering to separate text pixels from the background of the frame. Text blocks of the classified text pixels are determined by analyzing the projection profiles. Finally, we introduce a few heuristics to eliminate false positives from the frame. The robustness of the proposed approach is tested by conducting experiments on a variety of frames with low contrast, complex backgrounds, and different fonts and sizes of text in the frame. Both our own test dataset and a publicly available dataset are used for the experiments. The experimental results show that the proposed approach is superior to existing approaches in terms of detection rate, false positive rate, and misdetection rate.
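The pixel-clustering step might look like the following sketch: K-means with k=2 on per-pixel feature vectors, with the higher-energy cluster taken as text (an assumption for this illustration, not necessarily the paper's rule):

```python
import numpy as np
from sklearn.cluster import KMeans

def split_text_pixels(features):
    """Sketch of the clustering step: K-means (k=2) on per-pixel
    Fourier-statistical feature vectors separates text pixels from
    background. Picking the higher-mean cluster as text is an
    illustrative assumption."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
    means = [features[km.labels_ == c].mean() for c in (0, 1)]
    text_cluster = int(np.argmax(means))
    return km.labels_ == text_cluster

# toy features: 50 background pixels near 0, 10 "text" pixels near 10
feats = np.vstack([np.zeros((50, 3)), 10 + np.zeros((10, 3))])
mask = split_text_pixels(feats)
print(mask.sum())  # 10
```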
There are situations where it is not possible to capture a large document with a given imaging medium such as a scanner or copying machine in a single stretch because of inherent limitations. This results in capturing a large document in terms of split components of a ...
Skew angle estimation is an important component of optical character recognition (OCR) systems and document analysis systems (DAS). In this paper, a novel and efficient method to estimate the skew angle of a scanned document image is proposed. The proposed method has ...
2012 International Conference on Digital Image Computing Techniques and Applications (DICTA), 2012
Word segmentation has become a research topic for improving OCR accuracy in video text recognition, because a video text line suffers from arbitrary orientation, complex background and low resolution. Therefore, for word segmentation from arbitrarily-oriented video text lines, in this paper, we extract four new gradient directional features for each Canny edge pixel of the input text line image to produce four respective pixel candidate images. The union of the four pixel candidate images is taken to obtain a text candidate image. The sequence of the components in the text candidate image along the text line is determined using nearest neighbor criteria. Then we propose a two-stage method for segmenting words. In the first stage, we apply K-means clustering with K=2 to the distances between the components to get probable word and non-word spacing clusters. The words are segmented based on probable word spacing, and all other components are passed to the second stage for segmenting the correct words. For each segmented and un-segmented word passed to the second stage, the method repeats all the steps up to the K-means clustering step to find probable word and non-word spacing clusters. Then the method considers the cluster nature, height and width of the components to identify the correct word spacing. The method is tested extensively on video curved text lines, non-horizontal straight lines, horizontal straight lines and text lines from the ICDAR-2003 competition data. Experimental results and a comparative study show that the results are encouraging and promising.
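The first-stage K=2 clustering of inter-component distances can be sketched with a simple 1-D two-means; initialising the centres at the minimum and maximum gap is an assumption for illustration:

```python
import numpy as np

def split_spacings(gaps, iters=20):
    """First-stage sketch: cluster inter-component distances into
    probable word spacing (large gaps) and non-word spacing (small
    gaps) with a 1-D two-means, mirroring the abstract's K=2 step."""
    gaps = np.asarray(gaps, dtype=float)
    c = np.array([gaps.min(), gaps.max()])  # assumed initial centres
    for _ in range(iters):
        word = np.abs(gaps - c[1]) < np.abs(gaps - c[0])
        c = np.array([gaps[~word].mean(), gaps[word].mean()])
    return word  # True = probable word spacing

gaps = [2, 3, 2, 15, 3, 18]
print(split_spacings(gaps))  # [False False False  True False  True]
```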
2011 International Conference on Document Analysis and Recognition, 2011
This paper presents a new method based on Fourier and moments features to extract words and characters from a video text line in any direction for recognition. Unlike existing methods, which output the entire text line to the ensuing recognition algorithm, the ...
2018 24th International Conference on Pattern Recognition (ICPR), 2018
Water image classification is challenging because water images of ocean or river share the same p... more Water image classification is challenging because water images of ocean or river share the same properties with images of polluted water such as fungus, waste and rubbish. In this paper, we present a method for classifying clean and polluted water images. The proposed method explores Fourier transform based features for extracting texture properties of clean and polluted water images. Fourier spectrum of each input image is divided into several sub-regions based on angle and spatial information. For each region over the spectrum, the proposed method extracts mean and variance features using intensity values, which results in a feature matrix. The feature matrix is then passed to an SVM classifier for the classification of clean and polluted water images. Experimental results on classes of clean and polluted water images show that the proposed method is effective. Furthermore, a comparative study with the state-of-the-art method shows that the proposed method outperforms the existing method in terms of classification rate, recall, precision and F-measure.
Scene text binarization and recognition is a challenging task due to different appearance of text... more Scene text binarization and recognition is a challenging task due to different appearance of text in clutter background and uneven illumination in natural scene images. In this paper, we present a new method based on adaptive histogram analysis for each sliding window over a word of a text line detected by the text detection method. The histogram analysis works on the basis that intensity values of text pixels in each sliding window have uniform color. The method segments the words based on region growing which studies spacing between words and characters. Then we propose to use existing OCRs such as ABBYY and Tesseract (Google) to recognize the text line at word and character levels to validate the binarization results. The method is compared with well-known global thresholding technique of binarization to show its effectiveness.
2018 24th International Conference on Pattern Recognition (ICPR), 2018
Text line segmentation from handwritten documents is challenging when a document image contains s... more Text line segmentation from handwritten documents is challenging when a document image contains severe touching. In this paper, we propose a new idea based on Weighted-Gradient Features (WGF) for segmenting text lines. The proposed method finds the number of zero crossing points for every row of Canny edge image of the input one, which is considered as the weights of respective rows. The weights are then multiplied with gradient values of respective rows of the image to widen the gap between pixels in the middle portion of text and the other portions. Next, k-means clustering is performed on WGF to classify middle and other pixels of text. The method performs morphological operation to obtain word components as patches for the result of clustering. The patches in both the clusters are matched to find common patch areas, which helps in reducing touching effect. Then the proposed method checks linearity and non-linearity iteratively based on patch direction to segment text lines. The method is tested on our own and standard datasets, namely, Alaei, ICDAR 2013 robust competition on handwriting context and ICDAR 2015-HTR, to evaluate the performance. Further, the method is compared with the state of art methods to show its effectiveness and usefulness.
Automatic script identification in archives of documents is essential for searching a specific do... more Automatic script identification in archives of documents is essential for searching a specific document in order to choose an appropriate Optical Character Recognizer (OCR) for recognition. Besides, identification of one of the oldest historical documents such as Indus scripts is challenging and interesting because of inter script similarities. In this work, we propose a new robust script identification system for Indian scripts that includes Indus documents and other scripts, namely, English, Kannada, Tamil, Telugu, Hindi and Gujarati which helps in selecting an appropriate OCR for recognition. The proposed system explores the spatial relationship between dominant points,namely, intersection points, end points and junction points of the connected components in the documents to extract the structure of the components. The degree of similarity between the scripts is studied by computing the variances of the proximity matrices of dominant points of the respective scripts. The method is evaluated on 700 scanned document images. Experimentalresults show that the proposed system outperforms the existing methods in terms of classification rate.
Text detection and recognition in poor quality video is a challenging problem due to unpredictabl... more Text detection and recognition in poor quality video is a challenging problem due to unpredictable blur and distortion effects caused by camera and text movements. This affects the overall performance of the text detection and recognition methods. This paper presents a combined quality metric for estimating the degree of blur in the video/image. Then the proposed method introduces a blind deconvolution model that enhances the edge intensity by suppressing blurred pixels. The proposed deblurring model is compared with other stateof-the-art models to demonstrate its superiority. In addition, to validate the usefulness and the effectiveness of the proposed model, we conducted text detection and recognition experiments on blurred images classified by the proposed model from standard video databases, namely, ICDAR 2013, ICDAR 2015, YVT and then standard natural scene image databases, namely, ICDAR 2013, SVT, MSER. Text detection and recognition results on both blurred and deblurred video/images illustrate that the proposed model improves the performance significantly.
2013 12th International Conference on Document Analysis and Recognition, 2013
This paper presents a two-stage method for multioriented video character segmentation. Words segm... more This paper presents a two-stage method for multioriented video character segmentation. Words segmented from video text lines are considered for character segmentation in the present work. Words can contain isolated or non-touching characters, as well as touching characters. Therefore, the character segmentation problem can be viewed as a two stage problem. In the first stage, text cluster is identified and isolated (nontouching) characters are segmented. The orientation of each word is computed and the segmentation paths are found in the direction perpendicular to the orientation. Candidate segmentation points computed using the top distance profile are used to find the segmentation path between the characters considering the background cluster. In the second stage, the segmentation results are verified and a check is performed to ascertain whether the word component contains touching characters or not. The average width of the components is used to find the touching character components. For segmentation of the touching characters, segmentation points are then found using average stroke width information, along with the top and bottom distance profiles. The proposed method was tested on a large dataset and was evaluated in terms of precision, recall and f-measure. A comparative study with existing methods reveals the superiority of the proposed method.
International Journal on Document Analysis and Recognition (IJDAR), 2015
Thinning that preserves visual topology of characters in video is challenging in the field of doc... more Thinning that preserves visual topology of characters in video is challenging in the field of document analysis and video text analysis due to low resolution and complex background. This paper proposes to explore ring radius transform (RRT) to generate a radius map from Canny edges of each input image to obtain its medial axis. A radius value contained in the radius map here is the nearest distance to the edge pixels on contours. For the radius map, the method proposes a novel idea for identifying medial axis (middle pixels between two strokes) for arbitrary orientations of the character. Iterative-maximal-growing is then proposed to connect missing medial axis pixels at junctions and intersections. Next, we perform histogram on color information of medial axes with clustering to eliminate false medial axis segments. The method finally restores the shape of the character through radius values of medial axis pixels for the purpose of recognition with the Google Open source OCR (Tesseract). The method has been tested on video, natural scene and handwrit
There are situations where it is not possible to capture or scan a large document with given imag... more There are situations where it is not possible to capture or scan a large document with given imaging media such as Xerox machine or scanner as a single image in a single exposure because of their inherent limitations. This results in capturing or scanning of large document into number of split components of a document. Hence, there is a lot of scope for mosaicing the several split images into a single large document image. In this work, we present a novel technique Fourier Transform (FT) based Column-Block (CB) and Row-Block (RB) matching procedure to mosaic the two split images of a large document in order to build an original and single large document image. The FT is rarely used in the analysis of documents since it provides only the global information of the document. The global information doesn't help in analyzing the documents of split images since the mosaicing of split document images requires local information rather than global information. Hence, in this work, we explore a novel idea to obtain local values of the split documents by applying FT for smaller sized split documents of split document images. The proposed method assumes that the overlapping region is present at the right end of split image 1 and the left end of split image 2. The overlapping region is a common region, which helps in mosaicing.
Text detection in the real world images captured in unconstrained environment is an important yet... more Text detection in the real world images captured in unconstrained environment is an important yet challenging computer vision problem due to a great variety of appearances, cluttered background, and character orientations. In this paper, we present a robust system based on the concepts of Mutual Direction Symmetry (MDS), Mutual Magnitude Symmetry (MMS) and Gradient Vector Symmetry (GVS) properties to identify text pixel candidates regardless of any orientations including curves (e.g. circles, arc shaped) from natural scene images. The method works based on the fact that the text patterns in both Sobel and Canny edge maps of the input images exhibit a similar behavior. For each text pixel candidate, the method proposes to explore SIFT features to refine the text pixel candidates, which results in text representatives. Next an ellipse growing process is introduced based on a nearest neighbor criterion to extract the text components. The text is verified and restored based on text direction and spatial study of pixel distribution of components to filter out non-text components. The proposed method is evaluated on three benchmark datasets, namely, ICDAR2005 and ICDAR2011 for horizontal text evaluation, MSRA-TD500 for non-horizontal straight text evaluation and on our own dataset (CUTE80) that consists of 80 images for curved text evaluation to show its effectiveness and superiority over existing methods.
Detection of both scene text and graphic text in video images is gaining popularity in the area o... more Detection of both scene text and graphic text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding the video. In this paper, we explore a new idea of classifying low contrast and high contrast video images in order to detect accurate boundary of the text lines in video images. In this work, high contrast refers to sharpness while low contrast refers to dim intensity values in the video images. The method introduces heuristic rules based on combination of filters and edge analysis for the classification purpose. The heuristic rules are derived based on the fact that the number of Sobel edge components is more than the number of Canny edge components in the case of high contrast video images, and vice versa for low contrast video images. In order to demonstrate the use of this classification on video text detection, we implement a method based on Sobel edges and texture features for detecting text in video images. Experiments are conducted using video images containing both graphic text and scene text with different fonts, sizes, languages, backgrounds. The results show that the proposed method outperforms existing methods in terms of detection rate, false alarm rate, misdetection rate and inaccurate boundary rate.
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, and event boundary detection. We propose a new text frame classification method that combines wavelet and median-moment features with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min Clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a Mutual Nearest Neighbor based Symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.
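The block-selection step can be sketched as follows. As a stand-in for the paper's wavelet/median-moment combination (an assumption on our part), we use mean absolute gradient energy per block, then run plain 2-means over the 16 block features and keep the higher-energy cluster as the probable text blocks:

```python
import numpy as np

def block_features(frame):
    """Split a grayscale frame into a 4x4 grid (16 equal blocks) and
    compute one texture feature per block: mean absolute first-order
    difference energy, a hypothetical stand-in for the paper's
    wavelet/median-moment features."""
    h, w = frame.shape
    bh, bw = h // 4, w // 4
    energy = (np.abs(np.diff(frame, axis=0, prepend=frame[:1]))
              + np.abs(np.diff(frame, axis=1, prepend=frame[:, :1])))
    feats = np.empty(16)
    for i in range(4):
        for j in range(4):
            feats[4 * i + j] = energy[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean()
    return feats

def probable_text_blocks(feats, iters=20):
    """Two-cluster k-means on the 16 block features; blocks in the
    higher-mean cluster are returned as probable text blocks."""
    c = np.array([feats.min(), feats.max()], dtype=float)
    assign = np.zeros(16, dtype=int)
    for _ in range(iters):
        assign = np.abs(feats[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                c[k] = feats[assign == k].mean()
    return np.nonzero(assign == c.argmax())[0]
```

Blocks with busy texture (text-like high-frequency content) land in the high-energy cluster; flat background blocks do not.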
IEEE Transactions on Circuits and Systems for Video Technology, 2010
In this paper, we propose new Fourier-statistical features (FSF) in RGB space for detecting text in video frames with unconstrained backgrounds and different fonts, scripts, and font sizes. The paper consists of two parts, namely, automatic classification of text frames from a large database of text and non-text frames, and FSF in RGB space for text detection in the classified text frames. For text frame classification, we present novel features based on three visual cues, namely, sharpness in filter-edge maps, straightness of the edges, and proximity of the edges, to identify a true text frame. For text detection in video frames, we present new Fourier transform based features in RGB space combined with statistical features; the computed FSF features from the RGB bands are subjected to K-means clustering to separate text pixels from the background of the frame. Text blocks of the classified text pixels are then determined by analyzing projection profiles. Finally, we introduce a few heuristics to eliminate false positives from the frame. The robustness of the proposed approach is tested by conducting experiments on a variety of frames with low contrast, complex backgrounds, and different fonts and sizes of text. Both our own test dataset and a publicly available dataset are used for the experiments. The experimental results show that the proposed approach is superior to existing approaches in terms of detection rate, false positive rate, and misdetection rate.
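The FSF-plus-K-means pipeline can be sketched as follows. This is a block-level approximation under our own assumptions: per tile and per RGB band we take the mean and variance of the FFT magnitude spectrum (3 bands x 2 statistics = 6 features), then 2-means separates text-like from background tiles. The paper itself works per pixel; tiles keep the sketch short.

```python
import numpy as np

def fourier_stat_features(frame_rgb, block=8):
    """Per (block x block) tile and per RGB band, compute mean and
    variance of the 2-D FFT magnitude: 6 features per tile."""
    h, w, _ = frame_rgb.shape
    th, tw = h // block, w // block
    feats = np.empty((th * tw, 6))
    for i in range(th):
        for j in range(tw):
            tile = frame_rgb[i*block:(i+1)*block, j*block:(j+1)*block]
            f = []
            for b in range(3):
                mag = np.abs(np.fft.fft2(tile[:, :, b]))
                f += [mag.mean(), mag.var()]
            feats[i * tw + j] = f
    return feats

def kmeans2(feats, iters=25):
    """Plain 2-means; returns a boolean label per row, True for the
    cluster with the larger center norm (taken here as 'text')."""
    rng = np.linalg.norm(feats, axis=1)
    c = feats[[rng.argmin(), rng.argmax()]].astype(float)
    lab = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        d = ((feats[:, None, :] - c[None]) ** 2).sum(-1)
        lab = d.argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = feats[lab == k].mean(0)
    return lab == np.linalg.norm(c, axis=1).argmax()
```

Tiles with strong spectral content (textured, text-like regions) end up in the high-norm cluster; flat tiles in the other.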
There are situations where it is not possible to capture a large document with a given imaging device, such as a scanner or copying machine, in a single stretch because of its inherent limitations. This results in capturing a large document in terms of split components of a ...
Skew angle estimation is an important component of optical character recognition (OCR) systems and document analysis systems (DAS). In this paper, a novel and efficient method to estimate the skew angle of a scanned document image is proposed. The proposed method has ...
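For context, a classic projection-profile baseline for skew estimation (explicitly not the paper's method, whose details are truncated above) projects foreground pixels onto the axis perpendicular to each candidate angle and keeps the angle whose row profile has maximal variance, since text lines aligned with the projection axis give the sharpest peaks:

```python
import numpy as np

def estimate_skew(binary_img, angles=np.arange(-15, 15.1, 0.5)):
    """Projection-profile skew baseline: for each candidate angle,
    project foreground pixel coordinates onto the rotated row axis,
    histogram them, and score the angle by profile variance."""
    ys, xs = np.nonzero(binary_img)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = np.deg2rad(a)
        proj = ys * np.cos(t) - xs * np.sin(t)   # rotated row coordinate
        hist, _ = np.histogram(proj, bins=binary_img.shape[0])
        score = hist.var()
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

Working on pixel coordinates avoids rotating the image itself; the angular resolution is set by the candidate grid (0.5 degrees here).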
2012 International Conference on Digital Image Computing Techniques and Applications (DICTA), 2012
Word segmentation has become a research topic for improving OCR accuracy in video text recognition, because a video text line suffers from arbitrary orientation, complex background and low resolution. For word segmentation from arbitrarily-oriented video text lines, we therefore extract four new gradient directional features for each Canny edge pixel of the input text line image to produce four respective pixel candidate images. The union of the four pixel candidate images gives a text candidate image. The sequence of components in the text candidate image along the text line is determined using a nearest neighbor criterion. We then propose a two-stage method for segmenting words. In the first stage, K-means clustering with K=2 is applied to the distances between components to obtain probable word-spacing and non-word-spacing clusters. Words are segmented at the probable word spacings, and all other components are passed to the second stage for segmenting the remaining words. For each segmented and un-segmented word passed to the second stage, the method repeats all steps up to the K-means clustering step to find probable word and non-word spacing clusters, and then considers the cluster nature and the height and width of the components to identify the correct word spacings. The method is tested extensively on video curved text lines, non-horizontal straight lines, horizontal straight lines and text lines from the ICDAR-2003 competition data. Experimental results and a comparative study show that the results are encouraging and promising.
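The first-stage gap clustering can be sketched as follows, assuming components are already ordered along the text line as axis-aligned (x, y, w, h) boxes (a simplification of the paper's arbitrarily-oriented components): gaps between consecutive boxes are split by 1-D 2-means, and a word boundary is placed at every gap in the wider cluster.

```python
def segment_words(boxes):
    """First-stage word segmentation sketch. boxes: (x, y, w, h)
    components ordered along the text line. Returns a list of words,
    each a list of boxes."""
    if len(boxes) < 2:
        return [list(boxes)]
    gaps = [boxes[i + 1][0] - (boxes[i][0] + boxes[i][2])
            for i in range(len(boxes) - 1)]
    c = [min(gaps), max(gaps)]                     # 1-D 2-means init
    for _ in range(20):
        groups = ([g for g in gaps if abs(g - c[0]) <= abs(g - c[1])],
                  [g for g in gaps if abs(g - c[0]) > abs(g - c[1])])
        c = [sum(gr) / len(gr) if gr else c[k]
             for k, gr in enumerate(groups)]
    words, cur = [], [boxes[0]]
    for i, g in enumerate(gaps):
        if abs(g - c[1]) < abs(g - c[0]):          # word-spacing cluster
            words.append(cur)
            cur = []
        cur.append(boxes[i + 1])
    words.append(cur)
    return words
```

The paper's second stage re-runs this clustering within each segment and adds height/width checks; only the first stage is shown here.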
2011 International Conference on Document Analysis and Recognition, 2011
This paper presents a new method based on Fourier and moments features to extract words and characters from a video text line in any direction for recognition. Unlike existing methods which output the entire text line to the ensuing recognition algorithm, the ...
Papers by P Shivakumara