A Robust and Fast Text Extraction in Images and Video Frames


Anubhav Kumar1, Awanish Kr. Kaushik1, R.L. Yadav1, and Anuradha2


1
Department of Electronics and Communication Engineering,
Galgotias College of Engineering and Technology, Greater NOIDA, India
2
Department of Electronics and Communication Engineering,
Laxmi Devi Institute of Engineering and Technology, Alwar, Rajasthan, India
[email protected], [email protected]

Abstract. Text detection in color images is a very challenging problem. This
paper presents an algorithm for detecting text in images. Experimental results
on indoor, outdoor, captcha, and video-frame images show that the method can
detect text characters accurately. The proposed algorithm unites the advantages
of several previous approaches to locate text. Our experimental results on four
different image types show that the technique based on line edge detection
performs noticeably better than existing approaches: the algorithm achieves a
95.29% recall rate with an average computation time of 3.645 seconds for
English text. It is faster than other existing methods and is robust to language,
font color, and size.

Keywords: Line detection masks, Text detection, Text localization, Text
extraction.

1 Introduction
Text detection in video and images has attracted researchers' attention for many
years. As a result, hundreds of thousands of hours of archival video are being
stored and shared. Three types of text appear in images: indoor/outdoor scene
text, which occurs naturally in the camera's field of view; caption/graphic/artificial
text, which is superimposed on the video at editing time; and animated text, which
appears in the field of view of internet content such as captchas. Text extraction
from images is a tough task because of complicated backgrounds, ambiguous text
character colors, and varying stroke specifications.
Two common methods are used to compute the spatial connections of text: one
based on edge features and one based on connected-component features. Edge-based
methods [2] focus on areas with high contrast between text and background; edges
belonging to letters are identified and merged. Connected-component methods [3]
use a bottom-up approach, iteratively merging sets of connected pixels under a
homogeneity criterion, leading to the creation of flat zones, or connected
components.
Our proposed method for an image text extraction system (shown in Fig. 1)
extracts text regions from an image and can be broadly divided into three basic stages:

S. Unnikrishnan, S. Surve, and D. Bhoir (Eds.): ICAC3 2011, CCIS 125, pp. 342–348, 2011.
© Springer-Verlag Berlin Heidelberg 2011

(1) detection of the text region in the image, (2) localization of the region, and
(3) extraction of the output character image. Detection identifies any possible
text in the image; localization further refines the text regions by eliminating
non-text regions; finally, the text extraction process generates an output image
with white text against a black background.

Fig. 1. Text image extraction flow chart

In this paper we focus on text extraction from four types of images with the help
of a line edge detector. The paper is organized as follows: Section 2 presents the
proposed algorithm, Section 3 explains the experimental results, and Section 4
gives the conclusions.

2 Proposed Algorithm
In this section, the processing steps of the proposed text extraction method are
presented. Our aim is to build a fast and robust text detection system that can
handle still images with complex backgrounds. As Figure 1 shows, the proposed
algorithm consists of three main stages, which are described below.

2.1 Text Detection

In our proposed method, images are convolved with directional filters using
orientation masks for line edge detection in the horizontal (0° or 180°) and
vertical (90° or 270°) directions [1]. Text regions can therefore be expected to
have higher edge strength in these directions. The line detection masks used are
shown in Figure 2. They enhance the text edges, after which a threshold is
computed; if the threshold on the detected edges is set to an appropriate value,
the weaker spurious edges can be filtered out.

Fig. 2. Line detection masks in Horizontal and Vertical directions
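As a sketch of this convolution stage, the kernels below are the standard 3×3 horizontal and vertical line-detection masks; the exact coefficients of Fig. 2 are not reproduced here, so treat these values as an assumption:

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 line-detection kernels (a common choice; the paper's
# exact masks in Fig. 2 may differ in scaling).
H_MASK = np.array([[-1, -1, -1],
                   [ 2,  2,  2],
                   [-1, -1, -1]])   # responds to horizontal lines
V_MASK = H_MASK.T                   # responds to vertical lines

def line_edge_maps(gray):
    """Convolve a grayscale image with both masks and return the
    absolute responses plus their average edge image (cf. Fig. 3)."""
    g = gray.astype(float)
    h = np.abs(convolve(g, H_MASK, mode='nearest'))
    v = np.abs(convolve(g, V_MASK, mode='nearest'))
    return h, v, (h + v) / 2.0
```

A horizontal stroke produces a strong response in the horizontal map and almost none in the vertical one, which is what makes the two directional maps useful for text.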



The basic steps of our algorithm are given below.


Step-1. Create line detection masks to detect edges at 0°/180° and 90°/270°
orientations, yielding directional edge maps that represent the edge density and
edge strength in the horizontal and vertical directions. Figures 3-b and 3-c show
the edge images in the horizontal and vertical directions; Figure 3-d shows the
average edge image.

a. Original image b. 0°edge image

c. 90°edge image d. average edge image

Fig. 3. Line Edge based Figure

Step-2. Convert the edge image to a binary image using the Otsu threshold.
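Otsu's method chooses the threshold that maximizes the between-class variance of the gray-level histogram. A minimal self-contained sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def otsu_threshold(img):
    """Compute Otsu's threshold on an 8-bit image by maximizing the
    between-class variance over all 256 candidate thresholds."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability
    mu = np.cumsum(p * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                         # global mean
    # between-class variance; guard against empty classes
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def binarize(edge_img):
    """Step-2: binarize the edge image at the Otsu threshold."""
    return edge_img > otsu_threshold(edge_img)
```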


Step-3. After Otsu thresholding, a morphological operation is applied to the
image; morphological operations are generally used on binary images. The
morphological result is shown in Figure 4.

Fig. 4. Morphological result
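The morphological step can be sketched as a dilation that merges the edges of neighbouring characters into solid word blobs; the 3×7 structuring element is an assumed size, since the paper does not specify it:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dilate_edges(binary_edges, size=(3, 7)):
    """Step-3 sketch: dilate the binary edge map so that edges of
    adjacent characters merge into word/line blobs.  The structuring
    element size is an assumption (wider than tall, to join
    horizontally neighbouring characters)."""
    return binary_dilation(binary_edges, structure=np.ones(size, dtype=bool))
```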

2.2 Text Localization


In this section, the processing steps of the proposed text localization approach are
presented.
Step-4. The horizontal and vertical projection profiles of the text region are
analyzed. These profiles are simply histograms in which every bin counts the
number of pixels in a given row or column. The vertical and horizontal projection
profiles for the sharpened edge image of Figure 4 are shown in Figure 5 (a) and (b).
Step-5. Compute the horizontal and vertical projection profiles of the dilated
image using a histogram with an appropriate threshold value, and create a refined
image by multiplying the binary image with the median-filtered image.

(a) (b)

Fig. 5. (a) Horizontal Projection profile (b) Vertical Projection profile for image in Figure 4
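The projection profiles of Step-4 are plain row/column sums and can be computed directly; `text_bands` is a hypothetical helper illustrating the profile thresholding of Step-5:

```python
import numpy as np

def projection_profiles(binary_img):
    """Row and column histograms: the count of ON pixels in each
    row (horizontal profile) and each column (vertical profile)."""
    horizontal = binary_img.sum(axis=1)   # one bin per row
    vertical = binary_img.sum(axis=0)     # one bin per column
    return horizontal, vertical

def text_bands(profile, threshold):
    """Indices where the profile exceeds the threshold, i.e. the
    candidate text rows or columns (hypothetical helper)."""
    return np.where(profile > threshold)[0]
```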

Step-6. Obtain a weak refined image in the 0° and 90° orientations using a
morphological structuring element, and create the final refined image by
subtracting the weak refined image from the refined image.
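One way to realize Step-6 is to estimate the long, thin "weak" structures with morphological openings in the 0° and 90° orientations and subtract them; the structuring-element length used here is an assumption:

```python
import numpy as np
from scipy.ndimage import binary_opening

def final_refined(refined, length=15):
    """Step-6 sketch: remove long thin structures (e.g. table rules,
    window frames) that survive the refinement.  Openings with long
    1-pixel-wide elements keep only such structures; subtracting them
    leaves the text.  The element length is an assumption."""
    weak_h = binary_opening(refined, structure=np.ones((1, length), bool))
    weak_v = binary_opening(refined, structure=np.ones((length, 1), bool))
    return refined & ~(weak_h | weak_v)
```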
Step-7. In this step, long edges in the final refined image are removed with the
help of a connected-component labeling operator, using 4-neighbor connectivity.
Finally, every edge is uniquely labeled as a single connected component with its
own component number.
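The 4-neighbor labeling of Step-7 corresponds to labeling with a cross-shaped connectivity structure:

```python
import numpy as np
from scipy.ndimage import label

def label_components(refined):
    """Label 4-connected components; the cross-shaped structure
    restricts connectivity to the 4 neighbours, as in Step-7."""
    four_conn = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])
    labels, count = label(refined, structure=four_conn)
    return labels, count
```

Under 4-connectivity, two pixels touching only diagonally fall into separate components, which keeps unrelated strokes apart.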
Step-8. Segment out non-text regions using the major-to-minor axis ratio with the
help of heuristic filtering. Only those regions whose area is at least 1/20 of the
maximum region area are retained, and regions with a Width/Height ratio below 0.1
are removed. The retained image is shown in Figure 6.

Fig. 6. Retained image
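The Step-8 heuristics (area at least 1/20 of the largest region, width/height at least 0.1) can be sketched as a per-component filter; using the bounding box for the aspect ratio is our simplification of the major/minor-axis test:

```python
import numpy as np
from scipy.ndimage import label, find_objects

def heuristic_filter(mask, min_area_frac=1/20, min_aspect=0.1):
    """Keep components whose area is at least min_area_frac of the
    largest component and whose width/height ratio is >= min_aspect,
    following the Step-8 rules (bounding-box aspect is a
    simplification of the axis-ratio test)."""
    labels, n = label(mask)
    if n == 0:
        return mask.copy()
    areas = np.bincount(labels.ravel())[1:]   # per-component pixel count
    max_area = areas.max()
    keep = np.zeros_like(mask, dtype=bool)
    for i, sl in enumerate(find_objects(labels), start=1):
        height = sl[0].stop - sl[0].start
        width = sl[1].stop - sl[1].start
        if areas[i - 1] >= min_area_frac * max_area and width / height >= min_aspect:
            keep[labels == i] = True
    return keep
```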

2.3 Text Extraction


The purpose of this stage is to extract accurate binary characters from the
localized text region.
Step-9. A gap image is generated by refining the localization of the detected
text region, and then a gap-filling process is applied. The result is shown in
Figure 7.

Fig. 7. Gap filling image
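Gap filling amounts to filling the holes enclosed by the character strokes, for example:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def fill_gaps(localized):
    """Step-9 sketch: fill the interior gaps of the localized
    character strokes so each character becomes a solid region."""
    return binary_fill_holes(localized)
```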



Step-10. Text segmentation takes place next. It starts with extraction of the
text image from the gray image. The segmentation process then concludes with a
procedure that enhances the text-to-background contrast in the text image.
Step-11. Common OCR systems require easily recognizable characters in their input
images. This step therefore produces an output image with white text against a
black background. The final text image is shown in Figure 8.

Fig. 8. Final text image

3 Result and Analysis

In order to evaluate the performance of the proposed method, 28 distinct test
images are used, with distinct font sizes, perspectives, and alignments under
varying conditions. The results shown in Figures 9-13 demonstrate that our
proposed method can detect text of different font sizes, perspectives, and
alignments, and can detect text string characters under varying conditions.
Testing the algorithms against changes of scale, lighting, and orientation
reveals the robustness of each technique under these changes and shows where
each technique succeeds and where it fails.
Figures 9-13 show that our proposed method performs very well on a wide variety
of images, so we can say that it is a strong and effective approach for finding
text in images. The performance of each method is measured by the obtained
recall rate and average time.

Recall Rate = Correctly Detected Words / (Correctly Detected Words + False Negatives) * 100 (1)
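Equation (1) is straightforward to compute; the word counts below are hypothetical, chosen only to illustrate the formula:

```python
def recall_rate(correct, false_negatives):
    """Eq. (1): recall = correctly detected words divided by all
    ground-truth words (detected + missed), as a percentage."""
    return correct / (correct + false_negatives) * 100
```

For example, with hypothetical counts of 81 correctly detected words and 4 missed words, `recall_rate(81, 4)` gives about 95.29.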

(a) (b)

Fig. 9. Captcha image (a) Original image (b) Extracted image



(a) (b)

Fig. 10. Indoor Image (a) Original image (b) Extracted image

(a) (b)

Fig. 11. Outdoor Image (a) Original image (b) Extracted image

The test set for this evaluation consists of 28 single images selected randomly
from the internet (Google search engine). The experiment is carried out on the
Matlab 7.0 software platform, on a personal laptop with an Intel P4 2.4 GHz
processor and 2 GB of memory. The processing time per image, including read-in
and write-out, is less than 4 seconds for all 28 images.

(a) (b)

Fig. 12. News video frame (a) Original image (b) Extracted image

(a) (b)

Fig. 13. Video frame (a) Original image (b) Extracted image



Table 1. Performance Comparison

Approach                Recall rate (%)    Average Time (s)
Proposed Approach            95.3                3.64
Xiaoqing et al. [2]          86.2               14.18
Gllavata et al. [3]          88.4               16.25
Li et al. [5]                91.1               12.9
Ye et al. [6]                90.8               10.1
Liu et al. [7]               91.3               11.7

Table 1 shows the performance comparison of our proposed method with several
existing methods; our method performs better in both average time and recall
rate. The speed advantage comes from the line-based edge approach, which costs
less time.

4 Conclusions
In this paper, a fast and robust approach to text extraction from images is
proposed. The line-detection edge-based method in two directions is able to
better represent the intrinsic characteristics of text. Experimental results show
that our method obtains a 95.3% recall rate with an average text detection time
of 3.64 seconds, which is superior to existing text detection methods without a
large increase in computational cost.

References
1. Al-Eidan, R.B., Al-Braheem, L., El-Zaart, A.: Line Detection Based on the Basic Masks
and Image Rotation. In: 2nd International Conference on Computer Engineering and Tech-
nology, pp. 465–469. IEEE, Chengdu (2010)
2. Liu, X., Samarabandu, J.: Multiscale Edge-Based Text Extraction from Complex Images. In:
International Conference on Multimedia and Expo., ICME 2006, pp. 1721–1724. IEEE, Los
Alamitos (2006)
3. Gllavata, J., Ewerth, R., Freisleben, B.: A robust algorithm for text detection in images. In:
Proceedings of the 3rd International Symposium on Image and Signal Processing and
Analysis, ISPA, vol. 2, pp. 611–616. IEEE, Los Alamitos (September 2003)
4. Liu, X., Samarabandu, J.: An edge-based text region extraction algorithm for indoor mobile
robot navigation. In: International Conference on Mechatronics and Automation, pp. 701–
706. IEEE, Los Alamitos (2005)
5. Li, X., Wang, W., Jiang, S., Huang, Q., Gao, W.: Fast and effective text detection. In: 15th
IEEE International Conference on Image Processing, pp. 969–972 (October 2008)
6. Ye, Q., Huang, Q., Gao, W., Zhao, D.: Fast and robust text detection in images and video
frames. Image and Vision Computing 23, 565–576 (2005)
7. Liu, Q., Jung, C., Kim, S., Moon, Y., Kim, J.: Stroke filter for text localization in video im-
ages. In: Proc. Int. Conf. Image Process., Atlanta, GA, USA, pp. 1473–1476 (October 2006)
