Cnn
All content following this page was uploaded by Asherl Bwatiramba on 02 March 2023.
1. Introduction
1.1 Problem Domain
Even though we live in a digitized world, many day-to-day activities still require information to be written on
paper and then entered into the relevant computer system. For example, when withdrawing or depositing money or a
cheque at a bank, we need to fill in a form at the cashier with the details of the transaction. The bank clerk then
enters these data into the bank's system, which is time-consuming and error-prone. Similar situations exist
elsewhere, such as at insurance companies, airports, and schools, where data must be filled in manually and then
entered into a computer system.
For some time now, handwriting recognition systems have been developed to help people in their day-to-day
data input activities. Biological neural networks, which enable people and animals to learn and model nonlinear
and complicated interactions, can provide inspiration for handwriting recognition systems. The human brain can
recognize many handwritten objects, including digits, letters, and characters. However, because people are
biased, they can read handwritten letters and numbers differently (Vadapalli, 2021). Computerized systems,
on the other hand, are impartial and capable of performing highly difficult tasks that would require humans to
expend a great deal of effort and time. It is therefore necessary to comprehend how humans interpret handwriting.
Document analysis, mailing address interpretation, bank cheque processing, signature verification, and postal
address verification are just a few of the practical problems that a handwriting recognition system may assist
with in the field of information technology. Hence, handwriting recognition technologies are used to facilitate
communication between humans and machines.
Based on their biological counterparts, artificial neural networks have been developed in computing to simulate
some aspects of the work of the human brain. A neural network is the most suitable technology for the proposed
system because of its ability to draw meaning from complicated data and to spot trends that are difficult to
notice using other techniques or by humans themselves. The primary goal of this study is to construct models
that interpret handwritten numbers and characters from an image. A general review of related work, the
theoretical basis, the architecture, the methodology, the experimental results, and the conclusion is provided
in the following sections (IBM Education, 2020).
Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.14, No.1, 2023
2. Theoretical Background
Artificial intelligence is the theory and development of computer systems able to perform tasks normally
requiring human intelligence such as visual perception. Machine learning is a subfield of artificial intelligence
(AI) that focuses on the use of data and algorithms to emulate how humans learn while gradually enhancing its
accuracy. Deep learning, on the other hand, is a subfield of machine learning that consists of creating neural
networks with three or more layers.
Handwriting recognition has been the subject of many studies (Hamad & Kaya, 2016). A key feature of
most handwriting recognition systems is the use of neural networks (Figure 1.1).
A neural network consists of an input layer, one or more hidden layers, and an output layer. Each node, or
artificial neuron, is connected to others and has a weight and threshold associated with it. If the output of any
particular node exceeds the threshold value, that node is activated and sends data to the subsequent network
layer. Otherwise, no data is transmitted to the subsequent network layer.
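As an illustration of the activation rule described above, the behaviour of a single node can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in this study: the function name `neuron_output`, the input values, the weights, and the threshold are all assumptions chosen for the example.

```python
def neuron_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, otherwise None.

    None models a node that is not activated and passes no data to the next layer.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else None


# Illustrative values only (assumptions, not taken from the paper).
active = neuron_output([1.0, 0.5], [0.8, 0.4], threshold=0.5)  # weighted sum above threshold
silent = neuron_output([0.1, 0.1], [0.8, 0.4], threshold=0.5)  # weighted sum below threshold
```

In practice, libraries replace the hard threshold with a smooth activation function so that the network can be trained by gradient descent.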
High data uncertainty, due to the varying qualities of each person's handwriting, is one of the challenges in
handwriting recognition (Surya Nath et al., 2015). The efficacy of handwriting recognition depends on the
feature extraction and classification approach. Nowadays, this is achieved by using deep learning,
that is, using neural network algorithms with more layers (Techopedia, 2022). A Convolutional Neural
Network (CNN) is a form of neural network designed primarily for processing data that has a grid-like
structure, such as images (Goodfellow et al., 2016). The CNN is often described as one of the best models for
object detection and recognition. On particular datasets, digital image recognition accuracy can rival that of
humans (Coates et al., 2011). Trnovszky et al. (2017) found that when a CNN was compared to other classification
algorithms, it produced the best results, with a 98 percent accuracy rate. This demonstrates that the CNN
approach is particularly suitable for image classification. Our aim in this research is to construct a neural
network first and then develop it into a CNN to perform handwriting recognition.
3. Methodology
This section explains the design and architecture of the proposed neural network-based handwritten character
recognition system.
The MNIST dataset contains images of handwritten digits from 0 to 9. The images of handwritten numerals are
encoded as a 28x28 matrix with a grayscale pixel value in each cell (LeCun et al., 2022). Additionally, we will
explore the Kaggle AZ dataset, which consists of 26 folders (A-Z) containing handwritten images, each stored as
a 28x28 grayscale matrix. Figure 1.2 shows a screenshot of the dataset.
Before recognition, all images from the dataset were resized and converted from grayscale to black and
white. This is because grayscale images contain a variety of shades of grey and permit a range of pixel
values, whereas black-and-white images contain only two pixel values (0 and 1) and nothing in between. All images
were also checked to confirm that they are 28x28 pixel matrices. Figure 1.3 displays the image pre-processing.
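The black-and-white conversion and the size check described above could be sketched as follows. This is a hedged illustration using NumPy only: the function name `binarize` and the cut-off value of 128 are assumptions, since the text does not state which threshold was used, and the resizing step is omitted.

```python
import numpy as np


def binarize(image, threshold=128):
    """Map a grayscale image (values 0-255) to black and white (values 0 and 1)."""
    image = np.asarray(image)
    # Check that the image is a 28x28 pixel matrix before converting it.
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {image.shape}")
    # The threshold of 128 is an assumption made for this sketch.
    return (image >= threshold).astype(np.uint8)


# Synthetic grayscale image standing in for a real dataset sample.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(28, 28))
bw = binarize(gray)
```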
based on certain parameters, data, and training periods. The size of a layer in a deep neural network depends on
the predicted workload of the system.
The input layer will have 784 nodes (because our images are of size 28x28 = 784 pixels).
Hidden layers 1 and 2 will each have 512 nodes.
The output layer for digits will have 10 nodes (to recognize 0 to 9).
Figure 1.4 displays the Neural Network for Character Recognition Systems. This neural network was
implemented and trained as well as tested with the MNIST dataset to check for accuracy and speed.
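To give a sense of the size of this network, the number of trainable parameters implied by the layer sizes above can be worked out directly. The sketch below assumes that every layer is fully connected and has one bias per node, which is the usual default but is not stated explicitly in the text; the helper name `dense_parameter_count` is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Count the weights and biases of a fully connected network.

    A layer with n_in inputs and n_out nodes has n_in * n_out weights
    plus n_out biases (one bias per node is an assumption of this sketch).
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


layer_sizes = [784, 512, 512, 10]  # input, hidden layer 1, hidden layer 2, output
total = dense_parameter_count(layer_sizes)
print(total)  # 669706
```

Most of these parameters sit in the first hidden layer (784 x 512 weights), which is one reason why convolutional layers, which share a small filter across the image, are attractive for image inputs.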
The convolution layer is where most of the computation occurs. It requires an input layer, a filter, and a feature
map.
Pooling layer: The pooling layer replaces the network output at certain locations with a summary statistic
of the surrounding outputs. This helps reduce the spatial size of the representation, which reduces the
amount of computation and the number of weights required (Mishra, 2020).
Fully connected layer: This comprises the neurons, together with the weights and biases, and is used to connect
the neurons between two separate layers. These layers are usually placed before the output layer and constitute
the final few layers of a CNN architecture. The input image from the preceding layers is flattened and fed to the
FC layer in this step. The flattened vector then passes through a few additional FC layers, where the
mathematical function operations are usually performed. The classification process begins at this point
(Gurucharan, 2020).
Activation layer: The activation function is a critical element in the CNN model. It is used to learn and
approximate any type of continuous and complex relationship between network variables. There are various popular
activation functions, including the ReLU, Softmax, TanH, and Sigmoid functions. Each of these functions has a
distinct purpose. For a binary classification CNN model, the sigmoid and softmax functions are preferred, whereas
softmax is typically used for multi-class classification (Gurucharan, 2020).
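The layer types described above can be illustrated with a small forward pass written in plain NumPy. This is a didactic sketch under stated assumptions and is not the TensorFlow/Keras model built in this study: the 3x3 filter size, the 2x2 max pooling, the single filter, and the randomly initialised weights are all chosen only for the example.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide a 2D filter over a 2D image (stride 1, no padding) to get a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


def relu(x):
    """ReLU activation: negative values are set to zero."""
    return np.maximum(x, 0)


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


def softmax(logits):
    """Turn a vector of scores into probabilities that sum to 1."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


# Random stand-ins for an input image, a filter, and FC weights (assumptions).
rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernel = rng.standard_normal((3, 3))
fc_weights = rng.standard_normal((13 * 13, 10)) * 0.01

feature_map = relu(conv2d(image, kernel))    # convolution + activation: 26x26
pooled = max_pool(feature_map)               # pooling: 13x13
flat = pooled.reshape(-1)                    # flattening: 169 values
probabilities = softmax(flat @ fc_weights)   # fully connected layer + softmax: 10 classes
```

A real CNN learns many filters per layer and trains all weights from data; the sketch only shows how the shapes change from layer to layer.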
of neural network, train the model, test & evaluate the model.
A. Loading & Preparing Dataset
This step involves acquiring the dataset and loading it into the model. The dataset is then divided into a
training dataset and a test dataset. For the MNIST dataset, 60000 images of digits are used for training and
10000 images are used for testing.
B. Image Pre-Processing
After the dataset has been loaded, the images are converted from color to grayscale. The data are then
normalized for training, which means converting the value range from [0-255] to [0-1]. The resulting images
are then flattened so that the dataset can be represented as a matrix for faster processing.
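The normalization and flattening steps described above can be sketched as follows. The example uses NumPy and synthetic data; the function name `normalize_and_flatten` is hypothetical, and the code is an illustration rather than the code shown in the figures of this paper.

```python
import numpy as np


def normalize_and_flatten(images):
    """Scale pixel values from [0, 255] to [0, 1] and flatten each image into a row."""
    images = np.asarray(images, dtype=np.float32)
    scaled = images / 255.0
    # One row per image: a 28x28 image becomes a vector of 784 values.
    return scaled.reshape(len(images), -1)


# Five synthetic 28x28 images standing in for loaded samples.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(5, 28, 28))
x = normalize_and_flatten(batch)
print(x.shape)  # (5, 784)
```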
C. Creation of Neural Network
TensorFlow was used to build the neural network in the Python programming language, using Anaconda as the
main development environment (Vaughan, 2018). TensorFlow is an open-source AI library that enables the
construction of data flow graphs for model creation. Keras, a Python-based API for deep learning that runs on
top of TensorFlow, was also utilized. In addition, Matplotlib and Seaborn were used to plot the graphs and
generate images from the datasets. The source code for all this is provided in Notes.
D. Training the model
After creating the neural network, the model was trained with the dataset that was loaded earlier. Of the
training dataset, 30% was used for validation, and training ran for up to 25 epochs (one epoch means that all
the data are processed once). The aim was to increase accuracy and decrease processing time for each model.
After training, the models need to be evaluated.
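To make the 30% validation split concrete, the sketch below holds out the last 30% of the training samples. How the split was actually made is not stated in the text; taking the last fraction of the samples is an assumption made here, and it mirrors how the `validation_split` argument of Keras `fit` selects validation data. The function name is hypothetical.

```python
def split_for_validation(samples, validation_fraction=0.3):
    """Hold out the last `validation_fraction` of the samples for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    n_validation = round(len(samples) * validation_fraction)
    split_at = len(samples) - n_validation
    return samples[:split_at], samples[split_at:]


indices = list(range(60000))  # stand-in for the 60000 MNIST training images
train, validation = split_for_validation(indices, 0.3)
print(len(train), len(validation))  # 42000 18000
```

With this split, 42000 images are used to fit the weights in each epoch and 18000 images are used to monitor accuracy on data the model has not been fitted to.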
E. Test the model
After training the model, images both from inside and from outside the dataset are used to test the model. This
is done to check whether the model can accurately predict the character from any kind of image supplied.
Figure 1.7 displays the flowchart for the architecture of the system.
4. Implementation
Two different models were developed. The first model used a Simple Neural Network (SNN) and the second used a
Convolutional Neural Network (CNN). The working of the models is shown in this section.
4.1 Code for loading the data set
In Figure 4.1, the NumPy, TensorFlow and Keras libraries were imported, and then the MNIST dataset was
loaded from its source. The subsets to be used for training and testing were also set.
The pixel distribution in each image could also be of interest for optimizing our model further. The code for
this is given in Figure 4.5, and the output, as a histogram for the number 5, is given in Figure 4.6.
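A pixel distribution of this kind can be computed with `numpy.histogram`. The sketch below is an illustration on a synthetic image, not the code of Figure 4.5; the choice of 16 bins and the function name `pixel_histogram` are assumptions.

```python
import numpy as np


def pixel_histogram(image, bins=16):
    """Count how many pixels fall into each of `bins` equal-width intensity ranges."""
    counts, edges = np.histogram(np.asarray(image), bins=bins, range=(0, 256))
    return counts, edges


# Synthetic 28x28 grayscale image standing in for a sample of the digit 5.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))
counts, edges = pixel_histogram(image)
```

The resulting `counts` can be passed to a plotting library such as Matplotlib to draw the histogram.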
The model summary for this simple neural network is provided in Figure 4.8.
Figure 4.11 displays the code used to evaluate the model. An accuracy of 93% on the test data (Figure 4.12)
was achieved.
The output is given in Figure 4.14, and it is further plotted graphically with code in Figure 4.15 and output
in Figure 4.16.
The accuracy of the trained model is provided in Figure 4.19. We can see that we have achieved
a higher accuracy of 98% with the CNN model.
We have developed additional code (Figure 4.20) to estimate the accuracy and loss of our models based
on the number of runs.
Furthermore, the confusion matrices for all digits [0-9] provided in Figures 5.3 and 5.4 also confirm that
CNN performs much better than SNN. For example, the number “3” was correctly predicted 1000 times by CNN, but
only 925 times by SNN. Also, the SNN wrongly predicted “3” to be “2” 10 times, whereas CNN did so only once.
So, the overwhelming conclusion is to use CNN for handwriting recognition.
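For readers unfamiliar with confusion matrices, the sketch below shows how one is built from true and predicted labels. The labels are a tiny made-up example, not the results of Figures 5.3 and 5.4, and the function is a plain NumPy illustration rather than the code used in this study.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Build a matrix whose rows are true digits and whose columns are predicted digits."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix


# Made-up labels for illustration only.
true_labels = [3, 3, 3, 2, 7, 7]
predicted_labels = [3, 2, 3, 2, 7, 1]
cm = confusion_matrix(true_labels, predicted_labels)

correct_threes = cm[3, 3]            # "3" predicted as "3"
threes_read_as_two = cm[3, 2]        # "3" wrongly predicted as "2"
accuracy = np.trace(cm) / cm.sum()   # diagonal entries are the correct predictions
```

The diagonal holds the correct predictions, so a better model concentrates more of its counts on the diagonal.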
With a less powerful CPU, or if we had trained the model with more epochs or more images, it would have
taken more time. As for accuracy, it is around 98.5%. Figures 5.7 and 5.8 confirm that with GPU enabled, our
model took 108 seconds (1 minute and 48 seconds) to execute all 25 epochs, with an average of 2 ms per step.
The accuracy is unchanged at 98.5%. So, we can conclude that better hardware, such as the addition of a GPU or
multiprocessing, is bound to reduce the training time of a CNN.
A summary of the results is presented in Table 5.1. We can therefore conclude that the number of runs
conducted on a neural network model has an impact on its accuracy and hence on its suitability for
business applications.
We also performed other types of tests, such as writing digits in MS Paint, converting the data in Photoshop
(Figure 5.14), loading and transforming the data to prepare it (Figure 5.15), and predicting the result
(Figure 5.16), and found that the model works well.
6. Critical Analysis
One of the main drawbacks of this research paper is that it has focused essentially on the use of neural networks
for handwriting recognition. It has not looked at other possibilities such as Hidden Markov Models or Support
Vector Machines to achieve the same objectives.
Another aspect to consider is the accuracy we have obtained. 98.72% (in Table 5.1) seems a very good result.
But is it good enough to be used in a banking environment, which is one of the application areas
discussed in our introduction? Whereas this accuracy could be good for general-purpose applications
such as object detection or recognition, it could be largely insufficient in a banking environment. Imagine a
cheque is drawn for Rs 2000: our CNN-based handwriting system could interpret this as Rs 7000 in 36 out of 1000
cases (based on our results in Figure 5.4), with a total potential loss of up to Rs 180,000 for 36 bank customers.
Clearly, this is not acceptable. Our handwriting recognition system requires further refinement and validation
before it can be used in commercial environments.
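The figure of Rs 180,000 follows from simple arithmetic: each misread cheque overstates the amount by Rs 7000 - Rs 2000 = Rs 5000, and 36 such cheques give 36 x 5000. A short sketch of the calculation is given below; the function name is hypothetical.

```python
def potential_loss(true_amount, misread_amount, misreads):
    """Total overpayment if `misreads` cheques are read as `misread_amount`."""
    return (misread_amount - true_amount) * misreads


loss = potential_loss(true_amount=2000, misread_amount=7000, misreads=36)
print(loss)  # 180000
```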
Another issue is that our recognition model works only on the MNIST dataset and was not
implemented for the Kaggle dataset of Latin letters. However, it must be pointed out that an approach similar
to the one we used for numbers could be used for letters. In the same way, recognition systems for other
languages such as Arabic or Hindi could be implemented.
Furthermore, our recognition system works only for individually scanned numbers. We have not developed a
system that takes a document, identifies the numbers, breaks them down into individual images and then passes
them to the model for recognition. Developing such functions could be a way forward to enhance the system.
Another drawback is that our model takes into account only the recognition of static images. With the use of
mobile phones, tablets and lately touch screens on laptops, there will be an increase in
dynamic recognition of handwriting where the system will also need to follow hand movements to identify the
letter or number being written. This would be a future area of research.
The CNN model we have built could be further fine-tuned by using different types of activation functions,
adding more internal layers or increasing the number of epochs in order to increase the accuracy obtained. Also,
we could have developed models based on other libraries such as PyTorch. The dataset could have been extended to
contain many more than 60000 images of numbers, given the various ways in which human beings write them.
Finally, our CNN model is an example of deep learning. The main criticism of deep learning is that it is not
sufficiently transparent (a black box). One might want to understand how a decision was reached and the
extent to which there is bias, especially if it could lead to important decisions. Deep learning is also not well
integrated with prior knowledge. Furthermore, problems that require common sense are not yet
within reach of deep learning. For example, if a customer makes a written bank transfer of Rs 2000 every
month, the CNN will not find anything unusual if it wrongly recognizes a transfer of Rs 70000 for a specific
month, whereas common sense would indicate otherwise.
7. Conclusion
In this research work, we studied different learning models and designed, developed, implemented, deployed and
tested a convolutional neural network to recognize handwriting. We used the MNIST dataset, explored various
models and concluded that a CNN running on well-tuned hardware with a GPU, and with sufficient training data,
can deliver an accuracy of up to 98.7% in recognizing numbers. The model can be further enhanced in terms of
accuracy and speed if we increase the dataset, increase the number of epochs and execute it on parallel
hardware. Using such techniques, some researchers have been able to achieve up to 99.89% accuracy. There is
therefore further scope for improvement.
8. References
1. Vadapalli, P. (2021, February 9). Biological Neural Network: Importance, Components & Comparison.
https://www.upgrad.com/blog/biological-neural-network/
2. IBM Cloud Education. (2020, August 17). What are Neural Networks? IBM.
https://www.ibm.com/cloud/learn/neural-networks/
3. Hamad, K., & Kaya, M. (2016). A Detailed Analysis of Optical Character Recognition Technology.
International Journal of Applied Mathematics, Electronics and Computers, 4(Special Issue-1),
244–249. https://doi.org/10.18100/ijamec.270374/
4. Surya Nath, R. S., & Afseena, S. (2015). Handwritten Character Recognition – A Review.
International Journal of Scientific and Research Publications, 5(3), 1–6.
http://www.ijsrp.org/research-paper-0315/ijsrp-p3996.pdf
5. Techopedia. (2022, February 23). What is Deep Learning? - Definition from Techopedia.
https://www.techopedia.com/definition/30325/deep-learning/
6. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning (Adaptive Computation and Machine
Learning series) (Illustrated ed.). The MIT Press.
7. Coates, A., Lee, H., & Ng, A. Y. (2011). An Analysis of Single-Layer Networks in Unsupervised Feature
Learning, 1–9. https://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf
8. Trnovszky, T., Kamencay, P., Orjesek, R., Benco, M., & Sykora, P. (2017). Animal Recognition System
Based on Convolutional Neural Network. Digital Image Processing and Computer Graphics, 15(3), 1–9.
https://pdfs.semanticscholar.org/68ce/c9c6572abae2fddde2ed47785df24eba1713.pdf
9. LeCun, Y., Cortes, C., & Burges, C. J. C. (2022). The MNIST Database of Handwritten Digits.
http://yann.lecun.com/exdb/mnist/
10. Yamashita, R., Nishio, M., Gian Do, R. K., & Togashi, K. (2018, June 22). Convolutional neural
networks: an overview and application in radiology. Insights into Imaging.
https://link.springer.com/article/10.1007/s13244-018-0639-9
11. Mishra, M. (2020, September 2). Convolutional Neural Networks, Explained. Towards Data Science.
https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
12. Gurucharan, M. K. (2020, December 7). Basic CNN Architecture: Explaining 5 Layers of Convolutional
Neural Network. upGrad Blog.
https://www.upgrad.com/blog/basic-cnn-architecture/#3_Fully_Connected_Layer
13. Vaughan, J. (2018, February 1). What is TensorFlow? - Definition from
https://www.techtarget.com/searchdatamanagement/definition/TensorFlow