Received 14 Mar. 2021, Revised 8 Dec. 2021, Accepted 9 Jan. 2022, Published 31 Mar. 2022
Abstract: In the agriculture domain, plant disorder identification and classification is one of the emerging problems to study. If a timely and correct diagnosis is not made, it may have adverse effects on agricultural productivity and crop yield. The first signs of disease appear on the leaves, so diseases can be detected from the symptoms appearing there. Aiming at tomato, this paper presents a novel disease recognition convolutional neural network architecture based on the Squeeze-and-Excitation network (SeNet) and the ResNet architecture. The main research gaps identified were the use of lab-controlled standard images, the consideration of only biotic disorders, and low accuracy on unseen test datasets. The main contribution of this work is to increase generalization. Therefore, to reduce generalization error, augmentation is applied and images are captured in a manner where the leaf is surrounded by occlusion areas. To capture minute lesion and spot details, multiscale feature extraction with a dilated kernel is applied. Our collected real-world dataset consists of 11 types of biotic and abiotic disorders. Various experiments are carried out to verify the proposed method's effectiveness. The proposed method has a recognition accuracy of 81.19% on the real-world validation dataset using a 75-10-15 (train-validation-test) division ratio on augmented data, and an average recognition accuracy of 91.76% with the 10-fold cross-validation technique. The comparative analysis with state-of-the-art techniques exhibited improvements in computation time and classification accuracy. The results are used to classify tomato biotic and abiotic diseases in a real-world complex environment, and the novelty lies in the fact that both biotic and abiotic elements are taken into account.
Keywords: Tomato leaf disease recognition, Image enhancement, Multiscale feature extraction, Residual block
efficiency, the PlantVillage standard dataset (http://www.plantvillage.org) [20] is also used to test recognition accuracy. Compared to real-world images, the PlantVillage dataset has images with a simple background. It contains a total of 54,305 diseased leaf images for 13 plant species. The input image size for the model is 224 × 224; to achieve this, the input dataset is normalized with the Bi-Cubic interpolation resizing method [21]. The collected plant leaf images are in JPEG format. Figure 1 shows sample dataset images.
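As a minimal sketch of this preprocessing step, the resizing can be done with OpenCV's bicubic interpolation; the file path below is hypothetical.

```python
import cv2

# Load a JPEG leaf image (path is hypothetical) and resize it to the
# 224 x 224 network input size using bicubic interpolation.
img = cv2.imread("dataset/tomato/leaf_0001.jpg")            # BGR, uint8, 0-255
img_224 = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
print(img_224.shape)                                         # (224, 224, 3)
```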
B. Data Augmentation
When classes are unbalanced or when there is a paucity of data for training and validation, data augmentation is applied [22]. In the proposed work, the following augmentations are applied using Keras: (1) random rotation, by a given angle; (2) zoom; (3) horizontal and vertical flips, which toggle the horizontal and vertical orientations of the image; (4) fill, where points outside the boundaries of the input are filled according to the given mode; and (5) width and height shifts, which move the entire picture horizontally or vertically by a certain distance. Both biotic and abiotic disorders are considered in this work for evaluation. Figure 2 shows a PlantVillage dataset image after augmentation.
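A minimal Keras sketch of this augmentation pipeline is shown below; the numeric ranges, fill mode and directory name are assumptions, since the paper lists the augmentation types but not their exact values.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,       # (1) random rotation by a given angle
    zoom_range=0.2,          # (2) zoom
    horizontal_flip=True,    # (3) horizontal flip
    vertical_flip=True,      #     vertical flip
    fill_mode="nearest",     # (4) fill points outside the input boundaries
    width_shift_range=0.1,   # (5) width shift
    height_shift_range=0.1,  #     height shift
)

# Stream augmented 224 x 224 batches from a directory of leaf images
# (directory name is hypothetical; batch size 30 follows the training setup).
train_iter = datagen.flow_from_directory(
    "dataset/tomato/train", target_size=(224, 224),
    batch_size=30, class_mode="categorical",
)
```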
The proposed model uses a ResNet residual module as a baseline for building a lighter network that includes multi-scale feature extraction with a dilated kernel and a ResNet-SeNet combination to improve classification accuracy [23].

Various convolution kernels that extract features on different scales are used, based on the features and textures of the different tomato illnesses; this helps to extract local features. The large-scale 7 × 7 convolution kernel is used to extract contours. The use of varied kernel sizes is justified by the fact that disease spots and lesions are relatively small, while the textures of different diseases, such as early blight, late blight and septoria leaf spot, are similar. Conclusively, to address these issues both fine-grained and coarse-grained features need to be taken into consideration. Therefore, dilated kernels of different sizes are utilized, with 32 kernels of small size (1 × 1 and 3 × 3) and 16 kernels of large size (5 × 5 and 7 × 7). All the output feature maps are combined and passed to the next layer.

The dilated convolution technique has been used extensively in image segmentation [24]. In a traditional CNN architecture, pooling layers are used after convolution; they reduce overfitting but also diminish the spatial information of the feature maps. In dilated convolution, a filter is dilated before the convolution is computed: the convolution filter size is increased, and zeroes are placed at all empty positions to obtain the desired width and height of the kernel. In other words, dilated convolution is a type of convolution in which holes are inserted between the elements of a kernel to inflate it, unlike a traditional standard kernel, for which the dilation rate l is 1. Technically, a 2D dilated convolution kernel can be represented as (Eq. 1):

$(F \ast_{l} k)(p) = \sum_{s + lt = p} F(s)\,k(t)$   (1)

where l is the dilation rate, which indicates the degree to which the kernel is widened.
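A minimal Keras sketch of such a multi-scale feature extraction module with dilated kernels is given below; the dilation rate, padding and activation are assumptions not specified in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def multiscale_dilated_block(x, dilation_rate=2):
    """32 small kernels (1x1, 3x3) and 16 large kernels (5x5, 7x7) applied
    as dilated convolutions; their feature maps are concatenated."""
    branches = []
    for filters, size in [(32, 1), (32, 3), (16, 5), (16, 7)]:
        branches.append(
            layers.Conv2D(filters, size, padding="same",
                          dilation_rate=dilation_rate,
                          activation="relu")(x))
    return layers.Concatenate()(branches)   # 32 + 32 + 16 + 16 = 96 feature maps

# A 224 x 224 RGB input yields 96 feature maps at the same spatial resolution.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = multiscale_dilated_block(inputs)
print(features.shape)   # (None, 224, 224, 96)
```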
In 2015, ResNet won first place in the ImageNet image classification competition [7]. It was primarily designed to solve the vanishing gradient problem, which it does by introducing residual blocks and skip connections, as shown in Figure 3. It is considered simpler than earlier counterparts such as VGG.

ResNet uses residual blocks and skip connections. Residual blocks can be considered a special case of gated networks in which the gates are absent. In a neural network, a gate serves as a threshold for determining when the network should employ standard stacked layers versus an identity connection. In an identity connection, the output of lower layers is added to the output of subsequent layers. In a nutshell, this allows the network's layers to learn in small steps rather than building transformations from scratch. Gates allow the flow of memory from initial layers to final layers; since gates are missing in the residual block's skip connections, they provide very good performance.

Formally, the underlying mapping is denoted as H(x), and another mapping, fit by the stacked non-linear layers, is denoted by F(x) = H(x) − x. The original mapping is then recast as F(x) + x, and the recast mapping is called the residual mapping. The main intuition is that optimizing the residual mapping is much easier than optimizing the original mapping, and the presence of skip connections makes identity mappings easy to learn.

The basic aim of the Squeeze-and-Excitation network (SeNet) is to boost representation quality, which is achieved by modelling the interdependencies between convolution feature channels. The central idea is feature recalibration, which uses global information to concentrate on the most relevant features while suppressing the less important ones. The structure is represented in Figure 4, where Ftr is the transformation that maps an input X to features U ∈ R^{H×W×C}. These features are passed through a squeeze operation to generate a channel descriptor by aggregating the feature maps.
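The sketch below shows, under stated assumptions, how a residual block and a squeeze-and-excitation block of the kind described above can be written in Keras; the reduction ratio, kernel sizes and the exact wiring of the Residual Module → Self Excitation → Add → Activation stage are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Stacked non-linear layers learn F(x); the skip connection adds x back,
    so the block outputs F(x) + x."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([x, y]))

def se_block(x, reduction=4):
    """Squeeze: global average pooling produces a channel descriptor.
    Excitation: two dense layers produce per-channel weights in (0, 1),
    which recalibrate the feature maps."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

# One stage in the spirit of the architecture table
# (Residual Module -> Self Excitation -> Add -> Activation).
inputs = tf.keras.Input(shape=(56, 56, 16))
out = layers.Activation("relu")(
    layers.Add()([inputs, se_block(residual_block(inputs, 16))]))
print(out.shape)   # (None, 56, 56, 16)
```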
t-Distributed Stochastic Neighbour Embedding (t-SNE) is one of the popular methods for visualization [25]. The technique is used to create two-dimensional maps from data with hundreds of dimensions, by mapping the multidimensional data down to two dimensions. The algorithm is non-linear and transforms the underlying data using different operations. Perplexity is a measure of information defined as 2 raised to the power of the Shannon entropy; the perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbours. The original paper on t-SNE visualization states, “The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50”. The t-SNE optimization depends on its hyperparameters, and it will not produce identical outputs on consecutive runs.

C. Proposed Model architecture
Plant leaf image classification involves various steps to be performed. The flow of steps for classification is illustrated in Figure 5. The model comprises a multi-scale feature extraction module with the expanded kernel, 2

The loss increases when the predicted probability differs from the actual probability, so the ideal case is zero loss. The cross-entropy loss for binary classification, when the number of classes is two, is calculated as (Eq. 2):

$-\left( y \log(q) + (1 - y)\log(1 - q) \right)$   (2)

where q is the predicted probability. When M > 2, the loss is calculated per class label per instance, and summing the result gives the loss for a multiclass classification problem (Eq. 3):

$-\sum_{c=1}^{M} y_{o,c} \log(q_{o,c})$   (3)
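A minimal numeric sketch of Eqs. (2) and (3) is shown below; the toy label vectors and probabilities are purely illustrative.

```python
import numpy as np

def binary_cross_entropy(y, q):
    """Eq. (2): loss for one instance with true label y in {0, 1} and
    predicted probability q of the positive class."""
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))

def categorical_cross_entropy(y_onehot, q):
    """Eq. (3): -sum over the M classes of y_{o,c} * log(q_{o,c})."""
    return -np.sum(y_onehot * np.log(q))

# A confident correct prediction gives near-zero loss; a poor one does not.
print(binary_cross_entropy(1, 0.99))                           # ~0.01
print(categorical_cross_entropy(np.array([0.0, 1.0, 0.0]),
                                np.array([0.1, 0.8, 0.1])))    # ~0.22
```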
Figure 1. Sample dataset images of tomato leaves: (a) Internet dataset, (b) PlantVillage dataset and (c) real-world dataset
Model Layer | Layer Type Used | Kernel size | Stride | Neuron size | Feature Maps
Multiscale | Multiscale Feature Extraction | – | – | 224×224 | 96
max pooling2d | MaxPooling2D | 3×3 | 2 | 112×112 | 96
conv2d 4 | Conv2D | 3×3 | 1 | 112×112 | 192
batch normalization 4 | Batch Normalization | – | – | 112×112 | 192
max pooling2d 1 | MaxPooling2D | 3×3 | 2 | 56×56 | 192
conv2d 5 | Conv2D | 3×3 | 1 | 56×56 | 16
batch normalization 5 | Batch Normalization | – | – | 56×56 | 16
Resnet1 | Residual Module | – | – | 56×56 | 16
SeNet1 | Self Excitation | – | – | 56×56 | 16
Add | Add | – | – | 56×56 | 16
Activation 2 | Activation | – | – | 56×56 | 16
conv2d 10 | Conv2D | 3×3 | 2 | 28×28 | 32
batch normalization 8 | Batch Normalization | – | – | 28×28 | 32
ResNet2 | Residual Module | – | – | 28×28 | 32
SeNet2 | Self Excitation | – | – | 28×28 | 32
Add 1 | Add | – | – | 28×28 | 32
Activation 5 | Activation | – | – | 28×28 | 32
conv2d 15 | Conv2D | 3×3 | 2 | 14×14 | 64
batch normalization 11 | Batch Normalization | – | – | 14×14 | 64
ResNet3 | Residual Module | – | – | 14×14 | 64
SeNet3 | Self Excitation | – | – | 14×14 | 64
Add 2 | Add | – | – | 14×14 | 64
average pooling2d | AveragePooling2D | 7×7 | 7 | 2×2 | 64
Dropout | Dropout | – | – | 2×2 | 64
Flatten | Flatten | – | – | 256 | –
Dense | Softmax regression classifier | – | – | 1×1×4 | no. of classes
Name | Parameter
Memory (RAM) | 12.0 GB
Processor | Intel Core i5 @ 2.6 GHz
Graphics card | NVIDIA GeForce 920MX
Language | Python
section 2, the training set is used to train the model. Numerous experiments are carried out.
5) Validation data is used to assess the performance of the model and, finally, testing is carried out on unseen test data. The actual results are compared with the predicted categories, and various performance evaluators are assessed to check the model's effectiveness.

Figure 9 shows the validation loss, validation accuracy, training loss and training accuracy. Parts (a), (b), (c) and (d) of Figure 8 respectively represent the curves of recognition accuracy and loss rate obtained by the proposed and the other models on the training and validation datasets for the 11 real-world disorders of the tomato plant. It can be noticed that in the proposed model there is not much difference between training accuracy and validation accuracy; as a result, there is no overfitting. When it comes to the training and validation data sets, the suggested model works well, as shown in Figure 6. DenseNet is a model used for disease recognition through a transfer learning strategy. Chen et al. developed their model by combining the VGG model with Inception blocks. Their training and validation accuracy differ dramatically, as seen in Figure 8(b); as a result, the system overfits. There is an improvement in training accuracy after the tenth epoch, but the validation accuracy appears steady. There is no consistent improvement in the validation loss, as seen in Figure 8(c); as a result, validation accuracy behaves similarly.
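As a small sketch of how the curves in Figure 9 can be produced from a Keras training run, assuming the history dictionary returned by model.fit() (key names are the Keras defaults):

```python
import matplotlib.pyplot as plt

def plot_curves(history):
    """Plot accuracy and loss for training and validation, one panel each."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history["accuracy"], label="training accuracy")
    ax1.plot(history["val_accuracy"], label="validation accuracy")
    ax1.set_xlabel("epoch")
    ax1.legend()
    ax2.plot(history["loss"], label="training loss")
    ax2.plot(history["val_loss"], label="validation loss")
    ax2.set_xlabel("epoch")
    ax2.legend()
    plt.tight_layout()
    plt.show()

# Usage: plot_curves(model.fit(...).history)
```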
TABLE III. Hyperparameters

Hyperparameter | Value
Solver type | Adam
Learning rate | Initial value 0.1
Batch size | 30 for training and validation
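A minimal sketch of this training configuration (Adam, initial learning rate 0.1, batch size 30) with early stopping is given below; the placeholder model, the early-stopping patience and the maximum epoch count are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder classifier standing in for the proposed network.
model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(11, activation="softmax"),   # 11 disorder classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=10,
                                              restore_best_weights=True)

# history = model.fit(train_iter, validation_data=val_iter, epochs=500,
#                     batch_size=30, callbacks=[early_stop])
```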
Pre-trained model | Training accuracy (%) | Training loss | Validation accuracy (%) | Validation loss | Early stopping epoch number
Dense-Net [9] | 100 | 0.00033 | 94.44 | 0.2682 | 311
Karlekar Model (without background removal) [17] | 99.70 | 0.0126 | 96.35 | 0.1497 | 127
Chen et al. [27] | 100 | 0.00033 | 94.16 | 0.2687 | 34
VGG-GAP | 67.69 | 0.9564 | 63.45 | 0.9861 | 238
Proposed Model | 99.20 | 0.0268 | 97.27 | 0.0835 | 340
Figure 8(d) depicts the results when the system converges more quickly. When its accuracy is tested on unseen test data, it degrades dramatically. Table V shows the performance comparison with other models. Karlekar et al. designed their model from scratch and applied pre-processing for background removal using the Hue, Saturation and Value (HSV) colour space. However, when the same background removal technique is applied to real, complex images, it does not produce effective results because of its fixed thresholds. They tested their findings on the PDDB database, which has a simple background. PDDB can be accessed at https://www.digipathos-rep.cnptia.embrapa.br/. Other notable work discussed here for comparison purposes is the Visual Geometry Group (VGG) network with Global Average Pooling (GAP), where a VGG model pre-trained on ImageNet weights is combined with global average pooling.

Table VI shows how the proposed model performs on real-world images with 50 epochs for each of 10 folds. The average recognition accuracy achieved is 91.76% with a loss of 0.26126; for folds 1 to 9, the accuracy is higher than 90%. For the third set of experiments, all the images are downloaded from the Internet, acquired using the disease names of tomato plants. The images are categorized into three disease classes and one healthy class, and all of them have a cluttered, complex background that resembles real-world characteristics. The next step was to enrich the dataset with augmented images. The main objective of the network design is that it should be able to distinguish the various classes. Before training, images are resized for processing
and the 0-255 range is selected for normalization for training, using a script written in Python with the OpenCV framework.

The t-SNE visualization technique (Figure 10) is adopted to show the feature representation of the validation data for the MF-SE-RT model.

Figure 11(a) shows the confusion matrix on the test data for the 11 biotic and abiotic disorder classes. From Figure 11(b) it can be observed that class 6 (insect attack) and class 7 (late blight) have low F1 scores compared to the other classes on the test data. The trainable parameters in a model are related to its time and space complexity; therefore, apart from accuracy, it is important to consider the time and space requirements of the proposed model.
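The confusion matrix and the per-class precision/recall/F1 summary in Figure 11 can be produced with scikit-learn; a minimal sketch with stand-in label arrays (in practice these would come from the model's test-set predictions) is:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# In practice: probs = model.predict(x_test); y_pred = probs.argmax(axis=1);
# y_true = y_test.argmax(axis=1) for one-hot labels. Tiny stand-ins here:
y_true = np.array([0, 6, 7, 6, 1, 7])
y_pred = np.array([0, 6, 6, 7, 1, 7])

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, zero_division=0))
```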
Table VII shows the comparative results, and it is observable from the experimental results that the proposed method performs better than the other techniques on the Internet dataset. The primary intuition behind this is that distinguishable characteristics are extracted for the different classes. Even though DenseNet and VGG+GAP were trained using ImageNet weights, whereas the proposed model employs a random initialization strategy, optimal results were not achieved by these models.

All assessment metrics are used for comparison purposes, as shown in Figure 12.
Figure 10. t-SNE visualization of the validation data
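A minimal sketch of such a plot with scikit-learn is shown below; the feature matrix is a random stand-in for the validation features extracted from the network, and the perplexity value is an assumption within the 5-50 range quoted earlier.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

features = np.random.rand(500, 256)        # stand-in: (n_validation_samples, feature_dim)
labels = np.random.randint(0, 11, 500)     # stand-in: 11 disorder classes

embedded = TSNE(n_components=2, perplexity=30,
                init="pca", random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=8)
plt.title("t-SNE of validation features")
plt.show()
```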
Furthermore, a confusion matrix is created for the analysis of the validation data; it is depicted in Figure 13.
Figure 9. Validation loss, validation accuracy, training loss and training accuracy: (a) Proposed model, (b) DenseNet, (c) Chen et al., (d) VGG+GAP
We have compared the number of trainable parameters in our model to other state-of-the-art models; the comparison is depicted in Table VIII.

4. Conclusions and Future Work
This work has presented an architecture based on residual blocks with SeNet for the classification of tomato leaf diseases. For the verification of the architecture's robustness, a new dataset of real-world tomato leaf images is produced, and many existing results are shown for comparative analysis. It is verified that the combination of residual blocks with SeNet and multiscale feature extraction gives good performance benefits.

One limitation of this work is the limited real-world dataset. However, the architecture has proven efficient and consistent in performance, having been tested on varied datasets with both train-test split and cross-validation strategies. In the future, we intend to expand the dataset by adding more crops and their respective diseases. We also plan to record the age factor at the time of capturing images, so that multimodal analysis can
Figure 11. Classification report for unseen test data: (a) confusion matrix, (b) classification summary
Pre-trained model | Training accuracy (%) | Training loss | Validation accuracy (%) | Validation loss | Early stopping epoch number
Dense-Net [9] | 92.99 | 0.2696 | 84.62 | 0.3896 | 356
Karlekar Model (without background removal) [17] | 29.19 | 1.3814 | 26.56 | 1.3868 | 112
Chen et al. [27] | 96.86 | 0.1214 | 87.91 | 0.2473 | 11
VGG+GAP [28] | 68.79 | 0.8584 | 64.47 | 0.8961 | 237
Proposed Model | 98.92 | 0.0456 | 95.97 | 0.1179 | 340
be carried out by considering text and image data together. Severity estimation that considers age will bolster this whole setup and will be effective for farmers for efficient
decision-making in the real-world environment. Visualization strategies, including Class Activation Maps (CAM) and Gradient Class Activation Maps (Grad-CAM), can be utilized to examine interclass and intraclass performance characteristics as well as the role of the different filters used in the architecture (a minimal Grad-CAM sketch is given below).

The trainable-parameters table shows that there is a significant difference between the models in terms of the number of trainable parameters. This eventually affects model performance when it is to be operated in a real-world environment, where speed and accuracy have to be balanced to achieve good performance.

There could be further improvement by guiding the farmer to automatically adjust the orientation of the camera angle, which could prevent shadows at the time of capturing images. Besides image capturing practices, there could be improvements in the segmentation method and the reliability of its usage in the real world.
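A minimal Grad-CAM sketch in Keras is shown below, assuming a trained classifier and the name of its last convolutional layer (both are placeholders here; the preprocessing of the input image must match the model's own).

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a heat map in [0, 1] highlighting regions that drive the
    predicted (or given) class score."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)      # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```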
[5] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.

[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[8] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

[9] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.

[10] S. H. Lee, H. Goëau, P. Bonnet, and A. Joly, “New perspectives on plant disease characterization based on deep learning,” Computers and Electronics in Agriculture, vol. 170, p. 105220, 2020.

[11] A. Picon, M. Seitz, A. Alvarez-Gila, P. Mohnke, A. Ortiz-Barredo, and J. Echazarra, “Crop conditional convolutional neural networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions,” Computers and Electronics in Agriculture, vol. 167, p. 105093, 2019.

[12] P. K. Sethy, N. K. Barpanda, A. K. Rath, and S. K. Behera, “Deep feature based rice leaf disease identification using support vector machine,”

[16] W. Zeng and M. Li, “Crop leaf disease recognition based on self-attention convolutional neural network,” Computers and Electronics in Agriculture, vol. 172, p. 105341, 2020.

[17] A. Karlekar and A. Seal, “SoyNet: Soybean leaf diseases classification,” Computers and Electronics in Agriculture, vol. 172, p. 105342, 2020.

[18] Q. Liang, S. Xiang, Y. Hu, G. Coppola, D. Zhang, and W. Sun, “PD2SE-Net: Computer-assisted plant disease diagnosis and severity estimation network,” Computers and Electronics in Agriculture, vol. 157, pp. 518–529, 2019.

[19] U. Barman, R. D. Choudhury, D. Sahu, and G. G. Barman, “Comparison of convolution neural networks for smartphone image based real time classification of citrus leaf disease,” Computers and Electronics in Agriculture, vol. 177, p. 105661, 2020.

[20] S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-based plant disease detection,” Frontiers in Plant Science, vol. 7, p. 1419, 2016.

[21] R. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 6, pp. 1153–1160, 1981.

[22] M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” Neural Networks, vol. 106, pp. 249–259, 2018.

[23] L. M. De Carvalho, F. W. Acerbi, J. G. Clevers, L. M. Fonseca, and S. M. De Jong, “Multiscale feature extraction from images using wavelets,” in Remote Sensing Image Analysis: Including the Spatial Domain. Springer, 2004, pp. 237–270.
[24] R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, “Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018, pp. 1442–1450.

[25] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.

[26] U. Shafi, R. Mumtaz, H. Anwar, A. M. Qamar, and H. Khurshid, “Surface water pollution detection using internet of things,” in 2018 15th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT (HONET-ICT). IEEE, 2018, pp. 92–96.

[27] J. Chen, D. Zhang, Y. A. Nanehkaran, and D. Li, “Detection of rice plant diseases based on deep transfer learning,” Journal of the Science of Food and Agriculture, vol. 100, no. 7, pp. 3246–3256, 2020.

[28] Q. Yan, B. Yang, W. Wang, B. Wang, P. Chen, and J. Zhang, “Apple leaf diseases recognition based on an improved convolutional neural network,” Sensors, vol. 20, no. 12, p. 3535, 2020.

[29] J. Ma, K. Du, F. Zheng, L. Zhang, Z. Gong, and Z. Sun, “A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network,” Computers and Electronics in Agriculture, vol. 154, pp. 18–24, 2018.

[30] J. Ma, K. Du, F. Zheng, L. Zhang, and Z. Sun, “A segmentation method for processing greenhouse vegetable foliar disease symptom images,” Information Processing in Agriculture, vol. 6, no. 2, pp. 216–223, 2019.

Saiqa Khan is a PhD candidate in the Department of Computer Engineering at DJ Sanghvi College of Engineering. She completed her masters in Computer Engineering at Thadomal Shahani Engineering College, Mumbai. She has authored more than 50 publications. Her research interest areas are computer vision, machine learning and deep learning.

Dr. Meera Narvekar received the Ph.D. degree in Computer Science and Technology from SNDT University, Mumbai. She is currently professor and Head of the Department of Computer Engineering, DJSCE, Mumbai. She is a member of the board of studies of Mumbai University. Her research interests are in mobile computing, data science and machine learning.

Dr. M. S. Joshi is currently heading the plant pathology department of Dr. B. S. Konkan Krishi Vidyapeeth, Dapoli, Dist. Ratnagiri, India. His areas of interest are Mycology and Plant Pathology. His current research work focusses on Rice Pathology. He has over 22 years of experience in agriculture. He is a recipient of the ‘Baliraja’ award for writing a book in Marathi on Mango diseases. He is an editor/co-editor for various journals.