Convolutional Neural Network (CNN) for Image Detection and Recognition
Abstract- Deep Learning algorithms are designed in such a way that they mimic the function of the human cerebral cortex. These algorithms are representations of deep neural networks, i.e. neural networks with many hidden layers. Convolutional neural networks are deep learning algorithms that can train on large datasets with millions of parameters, taking 2D images as input and convolving them with filters to produce the desired outputs. In this article, CNN models are built to evaluate their performance on image recognition and detection datasets. The models are implemented on the MNIST and CIFAR-10 datasets and their performance is evaluated. The accuracy of the models is 99.6% on MNIST and 80.17% on CIFAR-10, using real-time data augmentation and dropout on a CPU unit.

Keywords- Deep Learning, Handwritten Digit Recognition, Object Detection, Convolutional Neural Networks, MNIST, CIFAR-10, Dropout, Overfitting, Data Augmentation, ReLU
I. INTRODUCTION
Image recognition and detection is a classic machine learning problem. It is a very challenging task to detect an object or to recognize an image from a digital image or a video. Image recognition has applications in various fields of computer vision, some of which include facial recognition, biometric systems, self-driving cars, emotion detection, image restoration, robotics and many more [1]. Deep learning algorithms have achieved great progress in the field of computer vision. Deep learning is an implementation of artificial neural networks with multiple hidden layers, designed to mimic the functions of the human cerebral cortex. The layers of a deep neural network extract multiple features and hence provide multiple levels of abstraction, which shallow networks cannot extract or work on.

The convolutional neural network is a powerful deep learning algorithm capable of dealing with millions of parameters while saving computational cost, by taking a 2D image as input, convolving it with filters/kernels and producing output volumes.

The MNIST dataset is a dataset of handwritten digits used to test the performance of classification algorithms. Handwritten digit recognition has many applications such as OCR (optical character recognition), signature verification, and interpretation and manipulation of texts [2, 3]. Handwritten digit recognition is an image classification and recognition problem, and there have been recent advancements in this field [4]. Another dataset is CIFAR-10, an object detection dataset that classifies objects into 10 classes and detects the objects in the test sets. It contains natural images and helps in implementing image detection algorithms.

In this paper, convolutional neural network models are implemented for image recognition on the MNIST dataset and object detection on the CIFAR-10 dataset. The implementation of the models is discussed and the performance is evaluated in terms of accuracy. The models are trained on a CPU unit only, and real-time data augmentation is used on the CIFAR-10 dataset. Along with that, dropout is used to reduce overfitting on the datasets.

The remaining sections of the paper are organized as follows: Section 2 presents a brief literature survey; Section 3 describes the classifier models with details of the techniques implemented; Section 4 evaluates the performance of the models and describes the results; Section 5 summarizes the work and outlines future work.

II. LITERATURE SURVEY

In recent years there have been great strides in building classifiers for image detection and recognition on various datasets using various machine learning algorithms. Deep learning, in particular, has shown improvements in accuracy on various datasets. Some of these works are described below.

Norhidayu binti Abdul Hamid et al. [3] evaluated performance on the MNIST dataset using three different classifiers: SVM (support vector machines), KNN (K-nearest neighbor) and CNN (convolutional neural networks). The multilayer perceptron did not perform well on that platform, as it did not reach the global minimum but remained stuck in a local optimum and could not recognize the digits 9 and 6 accurately. The other classifiers performed correctly, and it was concluded that the performance of the CNN could be improved by implementing the model on the Keras platform. Mahmoud M. Abu Ghosh et al. [5] implemented DNN (deep neural networks), DBN (deep belief networks) and CNN (convolutional neural networks) on the MNIST dataset and performed a comparative study. According to that work, DNN performed the best with an accuracy of 98.08%, while the others had higher error rates as well as differences in execution time. Youssouf Chherawala et al. [6] built a vote-weighted RNN (recurrent neural network) model to determine the significance of feature sets. The significance is determined by weighted votes and their combination, and the model is an application of RNN. It extracts features from the Alex word images and then uses them to recognize handwriting. Alex Krizhevsky [7] uses a 2-layer convolutional deep belief network on the CIFAR-10 dataset.
Underfitting means the model is not generalizing or fitting the data points well. In the case of high bias, i.e. underfitting on the training set, the network needs to be trained longer or a much bigger network with more hidden layers is needed. In the case of overfitting, the model has high variance, i.e. it fits the training set too closely and does not generalize well to the test set. To reduce high variance, regularization techniques and data augmentation techniques can be implemented.
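For illustration, the sketch below shows one way such a diagnosis could be automated from the training history of a Keras model; the helper function and its thresholds are assumptions made here for illustration, not part of this work.

def diagnose_fit(history, bias_threshold=0.90, variance_gap=0.05):
    """Suggest a remedy from the final training/validation accuracy of a Keras History."""
    train_acc = history.history['accuracy'][-1]
    val_acc = history.history['val_accuracy'][-1]
    if train_acc < bias_threshold:
        # High bias (underfitting): train longer or use a bigger network.
        return 'Underfitting: train for more epochs or add hidden layers/units.'
    if train_acc - val_acc > variance_gap:
        # High variance (overfitting): apply regularization or data augmentation.
        return 'Overfitting: add dropout or data augmentation.'
    return 'Bias and variance look acceptable.'

# usage (assuming the model is compiled with metrics=['accuracy']):
# print(diagnose_fit(model.fit(x_train, y_train, validation_split=0.1, epochs=10)))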
Fig. 5. ReLU non-linearity

G. Dropout to reduce overfitting
Dropout is a regularization technique which is used to reduce overfitting [7]. In dropout, the network deactivates some of its nodes randomly based on a probability parameter, which determines whether each node should remain in the network or not. Keep_prob is the probability parameter for keeping a hidden node in the network. The activation function is unaffected during this process, as the parameter only determines whether a node is kept in the network or not.
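For illustration, a minimal sketch of dropout in Keras is given below; this is an assumed implementation rather than the exact code used in this work, and keep_prob here corresponds to Keras' Dropout rate via rate = 1 - keep_prob.

from tensorflow.keras import layers, models

keep_prob = 0.8               # probability of keeping a hidden node (assumed value)
drop_rate = 1.0 - keep_prob   # Keras' Dropout expects the fraction of nodes to drop

model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),   # e.g. a 28x28 MNIST image
    layers.Dense(128, activation='relu'),
    layers.Dropout(drop_rate),               # randomly deactivates ~20% of the nodes during training
    layers.Dense(10, activation='softmax'),
])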
H. Data Augmentation

Another technique to reduce overfitting is to train on large datasets. If the dataset is limited, more data can be created artificially by data augmentation techniques [7]. Data augmentation techniques include distorting and altering the images to obtain more data. Some of the techniques are (a short sketch follows the list):
• Mirroring – the images are flipped and laterally inverted.
• Random cropping – cropping some parts of the image and creating subsets from the main image.
• Rotation – rotating the images in any direction at various angles and generating new images.
• Color shifting – shifting the RGB pixel values of the image to get a new coloured image.
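For illustration, real-time data augmentation of the kind listed above can be configured in Keras with ImageDataGenerator; the sketch below is an assumed configuration, and the parameter values are illustrative rather than the ones used in this work.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each option corresponds roughly to one of the techniques above (values are assumed).
datagen = ImageDataGenerator(
    horizontal_flip=True,       # mirroring
    width_shift_range=0.1,      # random cropping approximated by random shifts
    height_shift_range=0.1,
    rotation_range=15,          # rotation, in degrees
    channel_shift_range=30.0,   # color shifting of the RGB values
)

# datagen.flow(x_train, y_train, batch_size=64) yields augmented batches on the fly,
# e.g. model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=50)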
I. RMS prop optimizer and learning rate

RMSprop, or root mean square propagation, is an optimizer which works on the root mean square value of the change in gradients; the updates to the weights and biases are scaled with the help of this RMS value. The learning rate determines the size of the steps the algorithm takes to converge towards the global minimum. If the learning rate is too high, the algorithm takes larger steps, overshoots and fails to converge to the minimum. If the learning rate is too small, the steps become so small that the changes in weights are insignificant and learning stalls. Thus the learning rate is a hyperparameter that needs to be finely tuned, which can be done with the help of learning rate decay. In learning rate decay, the learning rate decays exponentially after every epoch.
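For illustration, the sketch below pairs the Keras RMSprop optimizer with an exponential per-epoch learning rate decay; the initial learning rate and decay rate are assumed values, not the settings used in this work.

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import LearningRateScheduler

initial_lr = 1e-3   # assumed starting learning rate
decay_rate = 0.9    # assumed per-epoch multiplicative decay

optimizer = RMSprop(learning_rate=initial_lr)

def exponential_decay(epoch, lr):
    # lr(epoch) = initial_lr * decay_rate ** epoch, i.e. exponential decay after every epoch
    return initial_lr * (decay_rate ** epoch)

lr_schedule = LearningRateScheduler(exponential_decay)

# model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=50, callbacks=[lr_schedule])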
IV. RESULT ANALYSIS

The results of the experiments are as shown below:
• CNN model for the MNIST dataset: accuracy of 99.6%, shown in figure 6. The results show that the accuracy improves as the number of epochs increases, and the best accuracy achieved in recognizing the digits on the MNIST dataset is 99.6% with 10 epochs.
• CNN model for the CIFAR-10 dataset: accuracy of 80.17% on the test set, as shown in figure 7.

Fig. 7. Accuracy of the CNN model in 50 epochs
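For context, a run of this kind can be sketched as follows; the small CNN below is a generic assumed architecture (the actual models are described in Section 3), with the 10 training epochs matching the MNIST experiment reported above.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A generic small CNN (illustrative only, not the paper's exact model)
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)   # test accuracy on the 10,000 held-out digits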
V. CONCLUSION

The article discusses various aspects of deep learning, CNN in particular, and performs image recognition and detection on the MNIST and CIFAR-10 datasets using a CPU unit only. The accuracy on MNIST is good, but the accuracy on CIFAR-10 can be improved by training for more epochs and on a GPU unit. The calculated accuracy on MNIST is 99.6% and on CIFAR-10 is 80.17%.
The training accuracy on CIFAR-10 is 76.57% after 50 epochs. The accuracy on the training set may also be improved further by adding more hidden layers. This system can be implemented as an assistance system for machine vision, for example for detecting natural language symbols.
REFERENCES
[1] Yann LeCun, Yoshua Bengio, Geoffrey Hinton, “Deep Learning”,
Nature, Volume 521, pp. 436-444, Macmillan Publishers, May
2015.
[2] Norhidayu binti Abdul Hamid, Nilam Nur Binti Amir Sjarif,
“Handwritten Recognition Using SVM, KNN and Neural
Network”, www.arxiv.org/ftp/arxiv/papers/1702/1702.00723
[3] Cheng-Lin Liu, Kazuki Nakashima, Hiroshi Sako, Hiromichi Fujisawa, “Handwritten digit recognition: benchmarking of state-of-the-art techniques”, ELSEVIER, Pattern Recognition 36 (2003), pp. 2271–2285.
[4] Ping Kuang, Wei-na Cao and Qiao Wu, “Preview on Structures and Algorithms of Deep Learning”, 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE, 2014.
[5] Mahmoud M. Abu Ghosh, Ashraf Y. Maghari, “A Comparative Study on Handwriting Digit Recognition Using Neural Networks”, IEEE, 2017.
[6] Youssouf Chherawala, Partha Pratim Roy and Mohamed Cheriet,
“Feature Set Evaluation for Offline Handwriting Recognition
Systems: Application to the Recurrent Neural Network,” IEEE
Transactions on Cybernetics, VOL. 46, NO. 12, DECEMBER
2016.
[7] Alex Krizhevsky, “Convolutional Deep belief Networks on
CIFAR-10”. Available: https://www.cs.toronto.edu/~kriz/conv-
cifar10-aug2010.pdf.
[8] Yehya Abouelnaga, Ola S. Ali, Hager Rady, Mohamed Moustafa, “CIFAR-10: KNN-based Ensemble of Classifiers”, IEEE, March 2017.
[9] Caifeng Shan, Shaogang Gong, Peter W. McOwan, “Facial expression recognition based on Local Binary Patterns: A comprehensive study”, ELSEVIER, Image and Vision Computing 27, pp. 803-816, 2009.
[10] Li Deng, “A tutorial survey of architectures, algorithms, and
applications of Deep Learning”, APSIPA Transactions on Signal
and Information Processing (SIP), Volume 3, 2014.
[11] Yann LeCun, Corinna Cortes and Christopher J.C. Burges, “The
MNIST Database of handwritten digits”. Available:
http://yann.lecun.com/exdb/mnist/ - MNIST database
[12] The CIFAR-10 dataset. Available:
https://www.cs.toronto.edu/~kriz/cifar.html
[13] MNIST dataset introduction, 2017. Available:
http://corochann.com/mnist-dataset-introduction-1138.html
[14] Robust Vision Benchmark. Available:
https://robust.vision/benchmark/about/
[15] Neural Network and Deep Learning. Available:
http://neuralnetworksanddeeplearning.com/chap6.html
[16] Convolutional Neural Networks for Visual Recognition. Available:
http://cs231n.github.io/neural-networks-1/
[17] Krizhevsky, Sutskever and Hinton, “ImageNet classification with
deep convolutional neural networks”, Advances in Neural
Information Processing Systems 25 (NIPS 2012), pp. 1106–1114,
2012.
[18] Zeiler, M. D. and Fergus, “Visualizing and understanding
convolutional networks”. European Conference on Computer
Vision, vol 8689. Springer, Cham, pp. 818-833, 2014.
[19] Zhengwei Huang, Min Dong, Qirong Mao and Yongzhao Zhan, “Speech Recognition using CNN”, IEEE/ACM Transactions on Audio, Speech and Language Processing, pp. 1533-1545, Volume 22, Issue 10, 2014, http://dx.doi.org/10.1145/2647868.2654984.
[20] Shima Alizadeh and Azar Fazel, “Convolutional Neural Networks for Facial Expression Recognition”, Computer Vision and Pattern Recognition, Cornell University Library, arXiv:1704.06756v1, 22 April 2017.
[21] Christian Szegedy, Wei Liu, Yangqing Jia et al., “Going Deeper with Convolutions”, Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Xplore, Boston, MA, USA, 2015.