Revue D'intelligence Artificielle: Received: 5 September 2021 Accepted: 13 October 2021
https://doi.org/10.18280/ria.350503

Keywords: CNN, hybrid deep learning, mangosteen fruit, Resnet, SGD, transfer learning, Xception, VGG16

ABSTRACT

Fruits come in different variants and subspecies. While some subspecies of fruits can be easily differentiated, others require expertise to tell apart. Although farmers rely on traditional methods to identify and classify fruit types, these methods are prone to many challenges. Training a machine to identify and classify fruit types in place of traditional methods can enable precise fruit classification. Taking advantage of state-of-the-art image recognition techniques, we approach fruit classification from another perspective by proposing a high-performing hybrid deep learning approach for precise mangosteen fruit classification. A proposed optimized Convolutional Neural Network (CNN) model is compared with optimized Xception, VGG16, and ResNet50 models using the Adam, RMSprop, Adagrad, and Stochastic Gradient Descent (SGD) optimizers with specified numbers of dense layers and filters. The proposed CNN model is built from three types of layers: 1) convolutional layers, 2) pooling layers, and 3) fully connected (FC) layers. Its architecture is formed by stacking these layers. The first convolutional layer uses 3x3 convolution filters whose weights are initialized and then updated toward better values at each iteration. Our self-acquired dataset is composed of four types of Malaysian mangosteen fruit: Manggis Hutan, Manggis Mesta, Manggis Putih and Manggis Ungu. It was used to train and test the proposed CNN model. The proposed CNN model achieved 94.99% classification accuracy. This is higher than the second-best model, the optimized Xception model, which achieved 90.62%.
The rest of the work is organized as follows: Section 2 presents the related work, Section 3 describes the materials and methods, Section 4 presents the results and discussion, and Section 5 concludes the work.

2. RELATED WORK

Deep learning can be applied to build image classification networks using convolutional neural networks [1, 2]. It has become a core technology for artificial intelligence (AI) applications, including fruit classification. The convolutional neural network (CNN) is one of the most popular deep learning algorithms. It is useful for finding patterns in images in order to recognize objects. CNNs learn directly from image data, use the learned patterns to classify images, and eliminate the need for manual feature extraction. This makes them suitable for identifying fruits and classifying them by type.

Transfer learning, also referred to as knowledge transfer, transfers what a model has already learned to a newly developed model for problem solving. It starts from a previously trained model. A new classification layer is attached to the frozen base of that model, and some of the top layers of the base are unfrozen and trained together with the new classifier [3]. Adapting these high-level feature representations makes the model suitable for specific tasks. This study used an optimized Xception transfer learning model to validate the performance accuracy of the proposed CNN model.

Several works in the literature address fruit recognition as an image segmentation problem. Azizah et al. [4] built a system that detects mangosteen fruits based on pixel color. They studied mangosteen fruit detection problems for outcome prediction. Bello et al. [5] proposed an enhanced Mask R-CNN for segmenting individual cows in a herd. The method employed:
- a CNN-based ResNet for feature extraction,
- a region proposal network (RPN) for object region proposals in the image, and
- fully connected layers (FCL) for classifying the individual cow objects by type.

Femling et al. [6] proposed a classification system for a grocery store that can classify ten types of fruit. They used datasets consisting of images from ImageNet and camera-captured images. Xception [7], VGG16 [8] and ResNet50 [9] are among the models most commonly used for transfer learning in detection and classification problems.

Alkan et al. [10] reported a study that uses deep learning for the automated detection of disease symptoms on vine leaves. They aimed to improve disease detection accuracy for vine leaves. They also aimed to develop a system that helps Syrian and Turkish farmers and agricultural engineers maintain the quality of grape production. The acquired images were processed with MATLAB R2018b and its Deep Learning Toolbox, using the CNNs AlexNet, GoogleNet and ResNet18. A standard transfer learning algorithm was used with the CNNs, and a multiclass support vector machine (SVM) was used with AlexNet. GPU and CUDA were used to accelerate disease detection.

Paranavithana and Kalansuriya [11] proposed a CNN-based approach to develop a model that identifies and predicts the suitability of tea buds for plucking. They offered it as a solution to the wide range of problems associated with tea picking. This followed their study of the proper procedure for selecting tea leaves, in which they found that no proper procedure existed for this task. Moreover, the study revealed the risks of trusting tea buds picked using conventional standards.

Nasir et al. [12] combined a deep neural network with a contour feature-based approach to classify fruits and the diseases that affect them. They used features extracted from a plant dataset to pretrain and fine-tune a VGG19 deep learning model. They then applied the pyramid histogram of oriented gradients (PHOG) to extract contour features, which were fused with the deep features using a serial-based approach. Their work is similar to that of Palakodati et al. [13], who proposed CNN and transfer learning for classifying fresh and rotten fruits. In our work, however, we approach fruit classification from another perspective. We introduce a high-performing hybrid deep learning approach that can classify mangosteen fruits for precision agriculture applications.

3. MATERIALS AND METHODS

As no off-the-shelf mangosteen fruit datasets are available, we acquired our own dataset. It contains 250 color images for each of the 4 mangosteen subspecies, for a total of 1000 images used in the experiment. The images were manually cropped and resized to 224x224 pixels (width and height) while keeping all 3 color channels. Figure 2 shows samples of mangosteen fruits in their respective categorical folders. For each of the 4 subspecies, 200 images (80%) were assigned to the training dataset and 50 images (20%) to the testing dataset. Data augmentation was employed to increase the number of images. Figure 3 shows samples from the subspecies that were augmented to reach the 250-image collection threshold; every subspecies except Manggis Putih required augmentation. TensorFlow and Keras were used as Python libraries for both training and testing the proposed models on Google Colab.

Figure 2. Sample of mangosteen fruits in their respective categorical folders

Figure 3. From left to right: augmentation samples showing image rotation in Manggis Hutan, horizontal axis flip in Manggis Mesta, and vertical axis flip in Manggis Ungu
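The per-class 80/20 split and the three augmentations shown in Figure 3 can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' pipeline, which used TensorFlow/Keras.

Several details are assumptions:
- The file names and helper functions (`split_per_class`, `flip_left_right`, `flip_top_bottom`, `rotate90_clockwise`) are hypothetical.
- Images are represented as plain lists of pixel rows so each transformation can be checked by hand.
- Because "horizontal axis flip" is named differently across conventions, the flips are given explicit left-right and top-bottom names.
- The source does not say whether augmentation happened before or after the split.

```python
import random

# The four subspecies (class labels) from the dataset description.
SUBSPECIES = ["Manggis Hutan", "Manggis Mesta", "Manggis Putih", "Manggis Ungu"]


def split_per_class(files_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class's files and split them into train/test (stratified split)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)  # 250 * 0.8 -> 200
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


# Images are lists of pixel rows; each function returns a new image.
def flip_left_right(img):
    """Mirror each row (flip about the vertical axis)."""
    return [row[::-1] for row in img]


def flip_top_bottom(img):
    """Reverse the row order (flip about the horizontal axis)."""
    return [list(row) for row in img[::-1]]


def rotate90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Usage with hypothetical file names: 250 images per subspecies.
files = {s: [f"{s}_{i:03d}.jpg" for i in range(250)] for s in SUBSPECIES}
train, test = split_per_class(files)
print({s: (len(train[s]), len(test[s])) for s in SUBSPECIES})
```

Splitting inside each class keeps the 200/50 proportion per subspecies, rather than letting a random global split leave one subspecies under-represented in the test set.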
3.1 Proposed hybrid deep learning model

We optimized Xception [7], VGG16 [8] and ResNet50 [9] and compared them with the proposed CNN model. The four optimizers used are Adam, RMSprop, Adagrad, and SGD. These models were first trained on the acquired mangosteen fruit dataset in their base configuration and then in their optimized state. They were then compared with each other to determine the best-performing model for our use case. The other models were optimized by varying their optimizers, dense layers and epochs. The CNN model was optimized by varying the same settings, as well as its number of filters and learning rate.

Figure 4 shows the architecture of the proposed CNN model and its application to manggis fruit classification. During computation, the image is represented as a matrix of pixel values. To detect a pattern, a filter is multiplied element-wise with a subset of the pixel matrix that has the same size as the filter, and the products are summed; the filter size may vary. The convolution then moves to the next pixel, and this process is repeated until all pixels in the matrix have been covered.

The Rectified Linear Unit (ReLU) activation function was applied to each convolution. Before the dense layer, we flattened the feature map of the third convolutional layer. We used the categorical cross-entropy loss function and the Adam optimizer with a learning rate of 0.0001. Adam computes adaptive learning rates for the individual parameters. The model was trained for 20, 50 and 80 epochs. The loss function computes the loss by comparing the actual value with the value predicted by the neural network. Using categorical cross-entropy instead of a sum-of-squares loss for our classification problem leads to improved generalization and faster training. As a multi-class classification model, our proposed model must predict one of more than two class labels for a given example. We can therefore compute the cross-entropy for a single prediction as follows:

$$H(P, Q) = -\sum_{x \in X} P(x) \log Q(x)$$

Here, each $x \in X$ is a class label that could be assigned to the example. $P(x)$ is 1 for the known label and 0 for all other labels, and $Q(x)$ is the probability that the model predicts for label $x$.
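The convolution, ReLU, flattening and cross-entropy steps described above can be sketched in plain Python for one channel and one filter. This is an illustration of the technique, not the authors' model.

All of the following are hypothetical:
- the 6x6 toy image,
- the hand-picked vertical-edge filter,
- the 2x2 max pooling (pooling layers are part of the model's layer types), and
- the dense-layer weights.

In the actual TensorFlow/Keras model, the filter and dense weights are learned and the input is 224x224x3. As in CNN libraries, the "convolution" below is computed as a cross-correlation (the kernel is not flipped).

```python
import math


def conv2d_valid(image, kernel):
    """Slide a k x k kernel over the image (stride 1, no padding). At each
    position, multiply the kernel element-wise with the covered pixels and sum."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap):
    """Rectified linear unit, applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def flatten(fmap):
    """Turn a 2-D feature map into a 1-D vector for the dense layer."""
    return [v for row in fmap for v in row]


def dense(x, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits to class probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(p_true, q_pred, eps=1e-12):
    """H(P, Q) = -sum_x P(x) log Q(x); eps guards against log(0)."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))


# Toy 6x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked 3x3 vertical-edge filter (a trained CNN learns these weights).
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
features = flatten(max_pool2x2(relu(conv2d_valid(image, kernel))))
# Hypothetical dense-layer weights mapping 4 features to 4 subspecies.
weights = [[0.1] * 4, [0.3] * 4, [0.0] * 4, [-0.1] * 4]
probs = softmax(dense(features, weights, [0.0] * 4))
one_hot = [0, 1, 0, 0]  # P(x): the true label is the second class
print(features, [round(p, 3) for p in probs], categorical_cross_entropy(one_hot, probs))
```

With a one-hot $P$, the sum reduces to $-\log Q(x_{\text{true}})$. The loss therefore depends only on the probability assigned to the correct subspecies, and it shrinks as that probability approaches 1.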
3.2 Optimized transfer learning models
The VGG16 network consists of 13 convolutional layers followed by three fully connected layers. Its final layer is the softmax classifier, and ReLU activation is applied to all hidden layers. Moreover, the VGG16 network performs well even with small image datasets, owing to the extensive training it has undergone [8]. In this study, VGG16 was used to create a transfer learning model for manggis recognition. The model was imported along with its weights. However, its top layers, namely the two fully connected layers (FCL) and the output layer, were dropped. The input dimension was set to 224x224 pixels, the same as that of the acquired dataset. Figure 6 shows the schematic architecture of the VGG16 network.

The models were trained and tested on the training and testing datasets, respectively, to evaluate their performance. The parameters and results obtained are presented and discussed in Section 4.

4. RESULTS AND DISCUSSION

The effects of the optimizer and of the learning rate are shown in Figure 8 and Figure 9, respectively. The learning rate controls how much the network weights are updated at each step during training. It is an important hyperparameter of the CNN model, with a value between 0.0 and 1.0 [6]. In our CNN model, we used three learning rates, 0.1, 0.01 and 0.001, and observed their influence on accuracy. Figure 9 shows that lowering the learning rate from 0.1 to 0.0001 improves the accuracy. The lowest learning rate gives our CNN model its highest accuracy.
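The role of the learning rate, and the difference between an SGD step and an Adam step, can be illustrated on a one-parameter toy problem. This sketch is not the paper's training setup.

The following are assumptions:
- the loss f(w) = (w - 3)^2,
- the starting point w = 0,
- the step counts, and
- the Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, the commonly used defaults).

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def sgd(lr, steps, w=0.0):
    """Plain gradient descent: each update is the gradient scaled by lr."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def adam(lr, steps, w=0.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: the step is lr times a bias-corrected running mean of the gradient,
    divided by the root of a running mean of its square (per-parameter scaling)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: SGD -> {sgd(lr, 50):.4f}, Adam -> {adam(lr, 50):.4f}")
print("SGD with lr=1.0 after 1 and 2 steps:", sgd(1.0, 1), sgd(1.0, 2))
```

Observations on this toy problem:
- The SGD update is proportional to the gradient. With lr=0.001, w barely moves in 50 steps.
- With lr=1.0, SGD bounces between 0 and 6 indefinitely.
- Adam's first step has a magnitude close to lr regardless of the gradient's size.

Which learning rate works best for a real CNN is an empirical question, which is what Figure 9 examines.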
For both epoch settings, the training and testing accuracies obtained with the SGD optimizer are lower than those obtained with the Adam optimizer for the same number of epochs. Besides this positive influence of the Adam optimizer, increasing the number of epochs also stabilizes the model's accuracy and loss. In the 5-epoch experiment, both optimizers show an incomplete trend, with accuracy still increasing and loss still decreasing. In the 50-epoch experiment, by contrast, both accuracy and loss plateau after about epoch 10. Nevertheless, for this experiment we prefer an epoch value of 50 in order to be more accurate.

The 1-layer, 1024-neuron model produced a lower testing accuracy than the 2-layer (1024, 1024) model (90.62% vs. 91.41%). However, it has a higher training accuracy (89.84% vs. 78.12%), and it is therefore considered the best optimized model from the Xception transfer learning experiment. This implies that the overfitting of the base Xception model (93.75% training vs. 86.72% testing accuracy) is mitigated. Figure 11 shows the results of comparing the SGD and Adam optimizers at 5 and 50 epochs. The Adam optimizer yields a higher training score than SGD.
Optimizer: Adam    Epochs: 50

Dense layers   Neuron configuration   Training accuracy   Testing accuracy
0              0                      93.75%              86.72%
1              128                    96.88%              85.16%
1              1024                   89.84%              90.62%
2              1024, 1024             78.12%              91.41%
3              2048, 1024, 1024       100.00%             83.59%
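The head configurations in the table above differ mainly in how many trainable parameters sit on top of the pretrained base. The text discusses these configurations for the Xception experiment. The sketch below counts those parameters under assumptions the source does not state:
- the Xception base is frozen,
- its output is globally average-pooled to a 2048-dimensional feature vector, and
- the head ends in a 4-class softmax layer.

The function names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected (dense) layer."""
    return n_in * n_out + n_out


def head_trainable_params(feature_dim, hidden_units, n_classes=4):
    """Trainable parameters of a new classifier head on a frozen base:
    the hidden dense layers followed by an n_classes softmax output layer."""
    total, n_in = 0, feature_dim
    for units in hidden_units:
        total += dense_params(n_in, units)
        n_in = units
    return total + dense_params(n_in, n_classes)


# Assumption: the frozen Xception base, followed by global average pooling,
# produces a 2048-dimensional feature vector per image.
XCEPTION_FEATURES = 2048

# Head configurations from the table (empty tuple = no hidden dense layer).
CONFIGS = [(), (128,), (1024,), (1024, 1024), (2048, 1024, 1024)]

for cfg in CONFIGS:
    label = ", ".join(map(str, cfg)) or "none"
    print(f"{label:>18}: {head_trainable_params(XCEPTION_FEATURES, cfg):,} trainable parameters")
```

Under these assumptions, the head grows from 8,196 trainable parameters (no hidden layer) to 7,348,228 for the (2048, 1024, 1024) head. By comparison, the training set has 800 images. A larger head has far more capacity to memorize the training set. This is one plausible reading of the overfitting seen for the deepest configuration (100.00% training vs. 83.59% testing accuracy), not a proof of it.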
Optimizer: Adam    Epochs: 50

Dense layers   Neuron configuration   Training accuracy   Testing accuracy
0              0                      84.38%              62.5%
1              100                    90.62%              65.62%
2              1000, 500              90.62%              71.88%
2              1024, 1024             87.50%              71.88%
3              1000, 1000, 1000       93.75%              69.75%
Table 3 compares the results of each optimization. Adding fully connected layers increases the models' training accuracy. However, adding more layers and neurons than necessary leads to overfitting. Figure 12 shows the comparison of the Adam and SGD optimizers. Table 4 compares the effects of varying the number of epochs and of switching between the Adam and SGD optimizers. Based on it, the Adam optimizer at 5 epochs gave a better result than the SGD optimizer at the same 5 epochs. This makes Adam at 5 epochs suitable for the experiments.

Table 4 also shows that adding dense layers tends to degrade performance: the more layers, the lower the testing accuracies. Moreover, raising the number of neurons from 128 to 1024 gives an immediate performance gain among the 1-layer models. However, overfitting appears as the number of layers increases. Given these findings, the 1-layer, 1024-neuron model produced an acceptably high accuracy. It is therefore considered the best optimized model from the ResNet50 transfer learning experiment. Table 5 shows the performance across models. Out of all the models tested, the proposed CNN model achieved 94.99% accuracy, higher than the Xception model, which achieved 90.62%. Figure 13 shows the CNN model at epoch 50 obtaining 94.99% accuracy, the highest accuracy obtained. The CNN model takes considerable effort to build and optimize. The Xception model, by contrast, was simple to import and optimize, and computationally fast because it is already pretrained.

Figure 12. Comparison of epoch and optimizer
Optimizer: Adam    Epochs: 50

Dense layers   Neuron configuration   Model accuracy
0              0                      27.34%
1              128                    25.00%
1              1024                   51.56%
2              1024, 1024             34.38%
3              100, 100, 100          45.5%
3              2048, 1024, 1024       25.00%
REFERENCES
[3] … network and transfer learning. Computers and Electronics in Agriculture, 164: 104906. https://doi.org/10.1016/j.compag.2019.104906
[4] Azizah, L.M.R., Umayah, S.F., Riyadi, S., Damarjati, C., Utama, N.A. (2017). Deep learning implementation using convolutional neural network in mangosteen surface defect detection. 7th IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia, pp. 242-246. https://doi.org/10.1109/ICCSCE.2017.8284412
[5] Bello, R.W., Mohamed, A.S.A., Talib, A.Z. (2021). Enhanced mask R-CNN for herd segmentation. International Journal of Agricultural and Biological Engineering, 14(4): 238-244. https://doi.org/10.25165/j.ijabe.20211404.6398
[6] Femling, F., Olsson, A., Alonso-Fernandez, F. (2018). Fruit and vegetable identification using machine learning for retail applications. In: 14th International Conference on Signal-Image Technology & Internet-Based Systems, Las Palmas de Gran Canaria, Spain, pp. 9-15. https://doi.org/10.1109/SITIS.2018.00013
[7] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 1251-1258. https://doi.org/10.1109/CVPR.2017.195
[8] Simonyan, K., Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
[9] Gupta, U. (2017). Detailed guide to understand and implement ResNets. https://cv-tricks.com/keras/understand-implement-resnets/
[10] Alkan, A., Abdullah, M.Ü., Abdullah, H.O., Assaf, M., Zhou, H. (2021). A smart agricultural application: automated detection of diseases in vine leaves using hybrid deep learning. Turkish Journal of Agriculture and Forestry, 45: 1-13. https://doi.org/10.3906/tar-2007-105
[11] Paranavithana, I.R., Kalansuriya, V.R. (2021). Deep convolutional neural network model for tea bud(s) classification. IAENG International Journal of Computer Science, 48(3): 1-6.
[12] Nasir, I.M., Bibi, A., Sha, J.H., Khan, M.A., Sharif, M., Iqbal, K., Nam, Y., Kadry, S. (2021). Deep learning-based classification of fruit diseases: An application for precision agriculture. CMC-Computers Materials and Continua, 66(2): 1949-1962. https://doi.org/10.32604/cmc.2020.012945
[13] Palakodati, S.S.S., Chirra, V.R., Yakobu, D., Bulla, S. (2021). Fresh and rotten fruits classification using CNN and transfer learning. Revue d'Intelligence Artificielle, 34(5): 617-622. https://doi.org/10.18280/ria.340512