HLTR Using Machine Learning
Mansi Mayekar, Punit Mestha, Shoaib Asif
Department of Electronics and Telecommunication
SIES Graduate School of Technology
Navi Mumbai, India
Abstract— Handwritten Line Text Recognition (HLTR) in Machine Learning is one of the emerging fields within computer vision. Every language has many different handwriting styles, and all of them need to be identified correctly. An HTR system is expected to identify and process text the way humans do when they read it, although recognizing handwritten text is difficult for a machine. Handwritten line text recognition involves several steps, including pre-processing the input image, extracting features from it, and transcription. During these processes the system is trained, and the similarities and differences between many handwritten samples are learned. The application then takes pictures of handwritten text and converts them into digital text, which is the desired output.

Keywords— HLTR (Handwritten Line Text Recognition), NN (Neural Network), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), CTC (Connectionist Temporal Classification), TF (TensorFlow), LSTM (Long Short-Term Memory)

I. INTRODUCTION
With advances in technology, processing photographs has become much easier, and handwritten line text recognition plays an integral part in this. Handwritten line text recognition is a challenging problem because different people have different handwriting and writing styles, yet it is a very useful technology. Text recognition is a hard task for a machine, so the system has to be trained for it. Character recognition involves multiple steps: image acquisition, pre-processing, feature extraction, sequence labeling, and output decoding. Handwritten text recognition enables a machine to detect, interpret, and recognize handwritten text from an external source such as an image or a scanned document. In this project, neural networks play a vital role: convolutional neural networks and recurrent neural networks are combined with Connectionist Temporal Classification, so that the model needs neither lexical segmentation nor manual feature extraction. Moreover, training on the datasets is essential for obtaining the desired results. Handwritten line text recognition converts all the text in the picture into character codes, and the result is considered a representation of the image.

With these processes, HTR methods mainly suffer from unstable segmentation, which reduces recognition accuracy. Such a model consists of many trained modules, so achieving the desired output from the system is tedious. In this project, algorithms for word images are combined with algorithms for line images, helping the user easily convert handwritten documents into digital format. The main problem still arises during classification, and the solution is to avoid segmenting the lines. HTR has diverse uses, including banking applications, signature verification, and courier services.

II. LITERATURE SURVEY
Handwritten line text recognition has been an active field of research and has seen much progress. It is a challenging field because a person's writing depends on various factors, such as the instrument used to write the text, the speed of writing, and the pressure applied while writing.
In previous research, a sliding window is moved over the input image, each patch is passed to a convolutional feature extractor, and the features are finally given to an encoder-decoder bidirectional long short-term memory. A drawback of this approach is that it adds weight parameters to the model, which increases the training time [5]. Work in document analysis and recognition mixes datasets with small sets of real images, which can improve accuracy. However, it has two disadvantages: 1) images must first be classified as printed or handwritten to improve the prediction accuracy; 2) it was applied to the Latin language only [6]. In work on computer applications and industrial electronics, datasets were segmented and tested using various feature-based classification techniques, which was its advantage; its disadvantage was that separate classifiers were required for uppercase and lowercase English characters to improve recognition accuracy [7]. Work in Transactions on Pattern Analysis and Machine Intelligence proposes separable multidimensional long short-term memory RNN modules, which extract contextual information in various directions and require much less computation. Its disadvantage is that context overfitting problems must be solved to improve system performance; another disadvantage is that it was applied to the Latin language only [8]. The proposed model uses 7 CNN layers and 2 RNN layers with CTC to predict the line text without pre-segmentation. The LSTM enables the network to retain useful information over longer time periods and to make predictions based on previous results. It also enables more robust training and lets the network exploit more context. The Connectionist Temporal Classification (CTC) loss removes the alignment problem, which arises because every individual has a different handwriting [1].

Table I: Summary of Dataset
Dataset    Words      Text Lines    Writers
IAM        115,320    13,353        657
B. METHODOLOGY
a) Description of Dataset:
We use the IAM dataset, a large database of handwritten text. It contains many forms of handwritten text that are used for training and testing and for various experiments. Over 600 writers contributed handwriting samples to the IAM dataset, amounting to over 1,500 pages of scanned text, so each writer contributed more than two pages on average. In total, the dataset contains 5,685 sentences, 13,353 isolated and labeled text lines, and over 115,000 isolated and labeled words. All this data is arranged into one training set, one testing set, and one validation set. The dataset contains many images of the same type with particular dimensions, together with their corresponding text. The database includes text written with various writing instruments [2]. Table I gives the exact amount of text in the dataset. Fig. 2 shows a sample image from the IAM line dataset.

Fig. 3: Architecture of Proposed Model

c) Operations:

Input Data:
The input image to the Neural Network (NN) has a size of 800×64. Because most images in the IAM dataset are not of this size, each input image is first resized so that either its width becomes 800 or its height becomes 64. The resized image is then placed in a white target image of size 800×64, and its pixel values are normalized to ease the work of the NN. Fig. 4 shows the output obtained after the preprocessing stage.
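The input step above can be illustrated in plain Python. This is a minimal sketch, not the authors' code. The text fixes only the 800×64 target size and the white background. Everything else here is an assumption: the image is an 8-bit grayscale list of rows, interpolation is nearest-neighbour, a single scale factor preserves the aspect ratio, the resized image is pasted at the top-left corner, and normalization divides by 255.

```python
TARGET_W, TARGET_H = 800, 64  # NN input size stated in the text
WHITE = 255                   # assumption: 8-bit grayscale, 255 = white


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def preprocess(img):
    """Scale img to fit 800x64, paste it on a white canvas, normalize to [0, 1]."""
    h, w = len(img), len(img[0])
    # one factor for both axes: either the width reaches 800 or the height reaches 64
    f = min(TARGET_W / w, TARGET_H / h)
    new_w = max(1, min(TARGET_W, round(w * f)))
    new_h = max(1, min(TARGET_H, round(h * f)))
    resized = resize_nearest(img, new_w, new_h)
    # white 800x64 target image; the resized line goes in the top-left corner
    canvas = [[WHITE] * TARGET_W for _ in range(TARGET_H)]
    for y in range(new_h):
        canvas[y][:new_w] = resized[y]
    # assumption: normalization is a plain division by 255
    return [[p / 255.0 for p in row] for row in canvas]


if __name__ == "__main__":
    line = [[0] * 100 for _ in range(20)]  # dummy 100x20 all-black "line image"
    out = preprocess(line)
    print(len(out), len(out[0]))
```

For example, a 100×20 line image is scaled by 3.2 to 320×64, so columns 320 to 799 of the canvas stay white (1.0 after normalization). A real pipeline would use an image library such as OpenCV or Pillow for the resize.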
Images from the dataset are divided in a 95%:5% ratio into training and validation sets for the NN. Table II shows the number of images fed to the NN for each set.

Table II: Splitting the Dataset

RNN:
The RNN operates over 100 time steps. The LSTM enables the network to retain useful information over longer time periods and to make predictions based on previous results. It also enables more robust training and lets the network exploit more context [1]. Two LSTM layers are stacked to form a bidirectional LSTM, which produces an output of size 100×80 that is passed to the CTC. Long short-term memory (LSTM) is a type of RNN. Like other RNNs it has feedback connections, and its gated memory cells let it retain information over long sequences, which makes it better suited than a plain RNN. It can process not only single images but also entire sequences of data.

CTC:
The output of the BLSTM layer is given to the CTC, which calculates the loss value by comparing this output with the ground-truth line text. The Connectionist Temporal Classification (CTC) loss function removes the need to align each ground-truth character with a position in the image, an alignment that is hard to obtain because every individual has a different handwriting. The loss function is then used to train the Neural Network to predict the correct output.

Fig. 5: Learning curve of HTR model

● Removing input images with cursive handwriting.

Also, Multi-Dimensional LSTM (MDLSTM) can be employed to recognize a whole paragraph at once. For full-paragraph text recognition, line segmentation can be added.
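The CTC layer described under "CTC:" provides the training loss, but the text does not say how the 100×80 output matrix is turned into text at inference time. A common simple choice is best-path (greedy) decoding: take the most likely class at each of the T = 100 time steps, merge consecutive repeats, and drop the CTC blank. The sketch below assumes that decoder. It also assumes the blank is the last class index, and it takes a hypothetical character set from the caller. It is an illustration, not the authors' decoder.

```python
def best_path_decode(scores, charset):
    """Greedy (best-path) CTC decoding.

    scores:  T x C matrix (list of rows), one row of class scores per time step.
    charset: string of the C - 1 real characters; class index C - 1 is the blank.
    """
    blank = len(charset)  # assumption: the blank is the last class index
    # most likely class at every time step
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars, prev = [], None
    for k in best:
        # merge consecutive repeats and skip blanks
        if k != prev and k != blank:
            chars.append(charset[k])
        prev = k
    return "".join(chars)


if __name__ == "__main__":
    # hypothetical 3-class example: 'a', 'b', blank
    path = [0, 0, 2, 0, 1, 1, 2]  # a a - a b b -
    scores = [[1.0 if i == k else 0.0 for i in range(3)] for k in path]
    print(best_path_decode(scores, "ab"))
```

The blank between the two runs of `a` lets the decoder emit a double letter ("aab"). Without it, the repeats would merge into one. Beam-search decoders, optionally with a language model, usually give higher accuracy than this greedy scheme.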
the handwritten text line by loading the previously created and saved model. The final text recognition output is shown in Fig. 7.
V. CONCLUSION
Our HTR model achieves a character accuracy of 90.02%, which equals 100% minus the validation CER of 9.98% in Table III, and a word accuracy of 73%. The handwritten text line recognition approach proposed in this paper is efficient. The model combines a CNN with RNN layers and CTC. To recognize characters reliably, the network must be trained on a large dataset; hence, an efficient network requires more memory as well as faster processing. Both efficient and effective results are achieved by using this algorithm. Text with less noise gives the best accuracy, and the accuracy depends heavily on the dataset. Various additional processing techniques, such as de-slanting, word segmentation, and removal of background noise, can be used to improve the accuracy. Increasing the amount of data can further improve accuracy, and avoiding cursive writing also yields better results.

Fig. 6: Address accuracy vs. Epoch

Table III: Performance
                     Proposed Model
Evaluation Metric    Training    Validation
CER                  10.54%      9.98%
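Table III reports the character error rate (CER), but the text does not define it. A standard definition, assumed here, is the total Levenshtein (edit) distance between predicted and ground-truth lines divided by the total number of ground-truth characters. Under that definition, character accuracy is 1 − CER, and 1 − 0.0998 = 0.9002 matches the 90.02% character accuracy stated in the conclusion. The word error rate (WER) is computed the same way on word lists. Reading the 73% word accuracy as 1 − WER is also an assumption, since the text does not say how it was measured.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(predictions, truths):
    """Total character edits divided by total ground-truth characters."""
    edits = sum(edit_distance(p, t) for p, t in zip(predictions, truths))
    return edits / sum(len(t) for t in truths)


def word_error_rate(predictions, truths):
    """Same computation on whitespace-separated words."""
    edits = sum(edit_distance(p.split(), t.split()) for p, t in zip(predictions, truths))
    return edits / sum(len(t.split()) for t in truths)


if __name__ == "__main__":
    preds, truths = ["helo wrld"], ["hello world"]  # hypothetical example line
    cer = character_error_rate(preds, truths)
    print(f"CER = {cer:.4f}, character accuracy = {1 - cer:.4f}")
    print(f"WER = {word_error_rate(preds, truths):.2f}")
```

Summing edits over the whole set before dividing, rather than averaging per-line rates, keeps long lines from being underweighted.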