Received 14 Mar. 2021, Revised 8 Dec. 2021, Accepted 9 Jan. 2022, Published 31 Mar. 2022
Abstract: In the agriculture domain, plant disorder identification and classification is one of the emerging problems to study. If a timely and correct diagnosis is not made, it may have adverse effects on agricultural productivity and crop yield. The first signs of disease appear on the leaves, so diseases can be detected from the symptoms appearing there. Aiming at tomato, this paper presents a novel disease recognition convolutional neural network architecture based on the Squeeze-and-Excitation network (SeNet) and the ResNet architecture. The main research gaps identified were the use of lab-controlled standard images, the consideration of only biotic disorders, and low accuracy on unseen test datasets. The main contribution of this work is to increase generalization. Therefore, to reduce generalization error, augmentation is applied and images are captured in a manner where the leaf is surrounded by occlusion areas. To capture minute lesion and spot details, multiscale feature extraction with a dilated kernel is applied. Our collected real-world dataset consists of 11 types of biotic and abiotic disorders. Various experiments are carried out to verify the proposed method's effectiveness. The proposed method has a recognition accuracy of 81.19% on the real-world validation dataset using a 75-10-15 (train-validation-test) division ratio on augmented data, and an average recognition accuracy of 91.76% with the 10-fold cross-validation technique. The comparative analysis with state-of-the-art techniques exhibited improvements in computation time and classification accuracy. The results are used to classify tomato biotic and abiotic diseases in a real-world complex environment, and the novelty lies in the fact that both biotic and abiotic elements are taken into account.
Keywords: Tomato leaf disease recognition, Image enhancement, Multiscale feature extraction, Residual block
efficiency, the PlantVillage standard dataset (http://www.plantvillage.org) [20] is also used to test recognition accuracy. Compared to real-world images, the PlantVillage dataset has images with a simple background. It contains a total of 54,305 diseased leaf images for 13 plant species. The input image size for the model is 224 × 224; to achieve this, the input dataset is normalized with the Bi-Cubic interpolation resizing method [21]. The collected plant leaf images are in JPEG format. Figure 1 shows sample dataset images.
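As a minimal sketch of this preprocessing step, the resizing can be done with OpenCV's bicubic interpolation; the file path below is hypothetical.

```python
import cv2

# Load a JPEG leaf image (path is hypothetical) and resize it to the
# 224 x 224 network input size using bicubic interpolation.
img = cv2.imread("dataset/tomato/leaf_0001.jpg")            # BGR, uint8, 0-255
img_224 = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
print(img_224.shape)                                         # (224, 224, 3)
```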
B. Data Augmentation
When classes are unbalanced or when there is a paucity of data for training and validation, data augmentation is applied [22]. In the proposed work, the following augmentations are applied using Keras: (1) random rotation, by a given angle; (2) zoom; (3) horizontal and vertical flips, which toggle the horizontal and vertical orientations of the image; (4) fill, where points outside the boundaries of the input are filled according to the given mode; and (5) width and height shifts, which move the entire picture horizontally or vertically by a certain distance. Both biotic and abiotic disorders are considered in this work for evaluation. Figure 2 shows a PlantVillage dataset image after augmentation.
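A minimal Keras sketch of this augmentation pipeline is shown below; the numeric ranges, fill mode and directory name are assumptions, since the paper lists the augmentation types but not their exact values.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,       # (1) random rotation by a given angle
    zoom_range=0.2,          # (2) zoom
    horizontal_flip=True,    # (3) horizontal flip
    vertical_flip=True,      #     vertical flip
    fill_mode="nearest",     # (4) fill points outside the input boundaries
    width_shift_range=0.1,   # (5) width shift
    height_shift_range=0.1,  #     height shift
)

# Stream augmented 224 x 224 batches from a directory of leaf images
# (directory name is hypothetical; batch size 30 follows the training setup).
train_iter = datagen.flow_from_directory(
    "dataset/tomato/train", target_size=(224, 224),
    batch_size=30, class_mode="categorical",
)
```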
The proposed model uses a ResNet residual module as a baseline for building a lighter network that includes multi-scale feature extraction with a dilated kernel and a ResNet-SeNet combination to improve classification accuracy [23].

Various convolution kernels that extract features on different scales are used, based on the features and textures of the different tomato illnesses; this helps to extract local features. The large-scale 7 × 7 convolution kernel is used to extract contours. The use of varied kernel sizes is justified by the fact that disease spots and lesions are relatively small, while the textures of different diseases, such as early blight, late blight and septoria leaf spot, are similar. Conclusively, to address these issues both fine-grained and coarse-grained features need to be taken into consideration. Therefore, dilated kernels of different sizes are utilized, with 32 kernels of small size (1 × 1 and 3 × 3) and 16 kernels of large size (5 × 5 and 7 × 7). All the output feature maps are combined and passed to the next layer.

The dilated convolution technique has been used extensively in image segmentation [24]. In a traditional CNN architecture, pooling layers are used after convolution; they reduce overfitting but also diminish the spatial information of the feature maps. In dilated convolution, a filter is dilated before the convolution is computed: the convolution filter size is increased, and zeroes are placed at all empty positions to obtain the desired width and height of the kernel. In other words, dilated convolution is a type of convolution in which holes are inserted between the elements of a kernel to inflate it, unlike a traditional standard kernel, for which the dilation rate l is 1. Technically, a 2D dilated convolution kernel can be represented as (Eq. 1):

$(F \ast_{l} k)(p) = \sum_{s + lt = p} F(s)\,k(t)$   (1)

where l is the dilation rate, which indicates the degree to which the kernel is widened.
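A minimal Keras sketch of such a multi-scale feature extraction module with dilated kernels is given below; the dilation rate, padding and activation are assumptions not specified in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def multiscale_dilated_block(x, dilation_rate=2):
    """32 small kernels (1x1, 3x3) and 16 large kernels (5x5, 7x7) applied
    as dilated convolutions; their feature maps are concatenated."""
    branches = []
    for filters, size in [(32, 1), (32, 3), (16, 5), (16, 7)]:
        branches.append(
            layers.Conv2D(filters, size, padding="same",
                          dilation_rate=dilation_rate,
                          activation="relu")(x))
    return layers.Concatenate()(branches)   # 32 + 32 + 16 + 16 = 96 feature maps

# A 224 x 224 RGB input yields 96 feature maps at the same spatial resolution.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = multiscale_dilated_block(inputs)
print(features.shape)   # (None, 224, 224, 96)
```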
In 2015, ResNet won first place in the ImageNet image classification competition [7]. It was primarily designed to solve the vanishing gradient problem, which it does by introducing residual blocks and skip connections, as shown in Figure 3. It is considered simpler than earlier counterparts such as VGG.

ResNet uses residual blocks and skip connections. Residual blocks can be considered a special case of gated networks in which the gates are absent. In a neural network, a gate serves as a threshold for determining when the network should employ standard stacked layers versus an identity connection. In an identity connection, the output of lower layers is added to the output of subsequent layers. In a nutshell, this allows the network's layers to learn in small steps rather than building transformations from scratch. Gates allow the flow of memory from initial layers to final layers; since gates are missing in the residual block's skip connections, they provide very good performance.

Formally, the underlying mapping is denoted as H(x), and another mapping, fit by the stacked non-linear layers, is denoted by F(x) = H(x) − x. The original mapping is then recast as F(x) + x, and the recast mapping is called the residual mapping. The main intuition is that optimizing the residual mapping is much easier than optimizing the original mapping, and the presence of skip connections makes identity mappings easy to learn.

The basic aim of the Squeeze-and-Excitation network (SeNet) is to boost representation quality, which is achieved by modelling the interdependencies between convolution feature channels. The central idea is feature recalibration, which uses global information to concentrate on the most relevant features while suppressing the less important ones. The structure is represented in Figure 4, where Ftr is the transformation that maps an input X to features U ∈ R^{H×W×C}. These features are passed through a squeeze operation to generate a channel descriptor by aggregating the feature maps.
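The sketch below shows, under stated assumptions, how a residual block and a squeeze-and-excitation block of the kind described above can be written in Keras; the reduction ratio, kernel sizes and the exact wiring of the Residual Module → Self Excitation → Add → Activation stage are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Stacked non-linear layers learn F(x); the skip connection adds x back,
    so the block outputs F(x) + x."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([x, y]))

def se_block(x, reduction=4):
    """Squeeze: global average pooling produces a channel descriptor.
    Excitation: two dense layers produce per-channel weights in (0, 1),
    which recalibrate the feature maps."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

# One stage in the spirit of the architecture table
# (Residual Module -> Self Excitation -> Add -> Activation).
inputs = tf.keras.Input(shape=(56, 56, 16))
out = layers.Activation("relu")(
    layers.Add()([inputs, se_block(residual_block(inputs, 16))]))
print(out.shape)   # (None, 56, 56, 16)
```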
t-Distributed Stochastic Neighbour Embedding (t-SNE) is one of the popular methods for visualization [25]. The technique is used to create two-dimensional maps from data with hundreds of dimensions, by mapping the multidimensional data down to two dimensions. The algorithm is non-linear and transforms the underlying data using different operations. Perplexity is a measure of information defined as 2 raised to the power of the Shannon entropy; the perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbours. The original paper on t-SNE visualization states, “The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50”. The t-SNE optimization depends on its hyperparameters, and it will not produce identical outputs on consecutive runs.

C. Proposed Model architecture
Plant leaf image classification involves various steps to be performed. The flow of steps for classification is illustrated in Figure 5. The model comprises a multi-scale feature extraction module with the expanded kernel, 2

The loss increases when the predicted probability differs from the actual probability, so the ideal case is zero loss. The cross-entropy loss for binary classification, when the number of classes is two, is calculated as (Eq. 2):

$-\left( y \log(q) + (1 - y)\log(1 - q) \right)$   (2)

where q is the predicted probability. When M > 2, the loss is calculated per class label per instance, and summing the result gives the loss for a multiclass classification problem (Eq. 3):

$-\sum_{c=1}^{M} y_{o,c} \log(q_{o,c})$   (3)
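A minimal numeric sketch of Eqs. (2) and (3) is shown below; the toy label vectors and probabilities are purely illustrative.

```python
import numpy as np

def binary_cross_entropy(y, q):
    """Eq. (2): loss for one instance with true label y in {0, 1} and
    predicted probability q of the positive class."""
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))

def categorical_cross_entropy(y_onehot, q):
    """Eq. (3): -sum over the M classes of y_{o,c} * log(q_{o,c})."""
    return -np.sum(y_onehot * np.log(q))

# A confident correct prediction gives near-zero loss; a poor one does not.
print(binary_cross_entropy(1, 0.99))                           # ~0.01
print(categorical_cross_entropy(np.array([0.0, 1.0, 0.0]),
                                np.array([0.1, 0.8, 0.1])))    # ~0.22
```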
Figure 1. Sample dataset images of tomato leaves: (a) Internet dataset, (b) PlantVillage dataset and (c) real-world dataset
Model Layer | Layer Type Used | Kernel size | Stride | Neuron size | Feature Maps
Multiscale | Multiscale Feature Extraction | – | – | 224×224 | 96
max pooling2d | MaxPooling2D | 3×3 | 2 | 112×112 | 96
conv2d 4 | Conv2D | 3×3 | 1 | 112×112 | 192
batch normalization 4 | Batch Normalization | – | – | 112×112 | 192
max pooling2d 1 | MaxPooling2D | 3×3 | 2 | 56×56 | 192
conv2d 5 | Conv2D | 3×3 | 1 | 56×56 | 16
batch normalization 5 | Batch Normalization | – | – | 56×56 | 16
Resnet1 | Residual Module | – | – | 56×56 | 16
SeNet1 | Self Excitation | – | – | 56×56 | 16
Add | Add | – | – | 56×56 | 16
Activation 2 | Activation | – | – | 56×56 | 16
conv2d 10 | Conv2D | 3×3 | 2 | 28×28 | 32
batch normalization 8 | Batch Normalization | – | – | 28×28 | 32
ResNet2 | Residual Module | – | – | 28×28 | 32
SeNet2 | Self Excitation | – | – | 28×28 | 32
Add 1 | Add | – | – | 28×28 | 32
Activation 5 | Activation | – | – | 28×28 | 32
conv2d 15 | Conv2D | 3×3 | 2 | 14×14 | 64
batch normalization 11 | Batch Normalization | – | – | 14×14 | 64
ResNet3 | Residual Module | – | – | 14×14 | 64
SeNet3 | Self Excitation | – | – | 14×14 | 64
Add 2 | Add | – | – | 14×14 | 64
average pooling2d | AveragePooling2D | 7×7 | 7 | 2×2 | 64
Dropout | Dropout | – | – | 2×2 | 64
Flatten | Flatten | – | – | 256 | –
Dense | Softmax regression classifier | – | – | 1×1×4 | no. of classes
Name | Parameter
Memory (RAM) | 12.0 GB
Processor | Intel Core i5 @ 2.6 GHz
Graphics card | NVIDIA GeForce 920MX
Language | Python
section 2, the training set is used to train the model. Numerous experiments are carried out.
5) Validation data is used to assess the performance of the model and, finally, testing is carried out on unseen test data. The actual results are compared with the predicted categories, and various performance evaluators are assessed to check the model's effectiveness.

Figure 9 shows the validation loss, validation accuracy, training loss and training accuracy. Parts (a), (b), (c) and (d) of Figure 8 respectively represent the curves of recognition accuracy and loss rate obtained by the proposed and the other models on the training and validation datasets for the 11 real-world disorders of the tomato plant. It can be noticed that in the proposed model there is not much difference between training accuracy and validation accuracy; as a result, there is no overfitting. When it comes to the training and validation data sets, the suggested model works well, as shown in Figure 6. DenseNet is a model used for disease recognition through a transfer learning strategy. Chen et al. developed their model by combining the VGG model with Inception blocks. Their training and validation accuracy differ dramatically, as seen in Figure 8(b); as a result, the system overfits. There is an improvement in training accuracy after the tenth epoch, but the validation accuracy appears steady. There is no consistent improvement in the validation loss, as seen in Figure 8(c); as a result, validation accuracy behaves similarly.
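As a small sketch of how the curves in Figure 9 can be produced from a Keras training run, assuming the history dictionary returned by model.fit() (key names are the Keras defaults):

```python
import matplotlib.pyplot as plt

def plot_curves(history):
    """Plot accuracy and loss for training and validation, one panel each."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history["accuracy"], label="training accuracy")
    ax1.plot(history["val_accuracy"], label="validation accuracy")
    ax1.set_xlabel("epoch")
    ax1.legend()
    ax2.plot(history["loss"], label="training loss")
    ax2.plot(history["val_loss"], label="validation loss")
    ax2.set_xlabel("epoch")
    ax2.legend()
    plt.tight_layout()
    plt.show()

# Usage: plot_curves(model.fit(...).history)
```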
TABLE III. Hyperparameters

Hyperparameter | Value
Solver type | Adam
Learning rate | Initial value 0.1
Batch size | 30 for training and validation
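A minimal sketch of this training configuration (Adam, initial learning rate 0.1, batch size 30) with early stopping is given below; the placeholder model, the early-stopping patience and the maximum epoch count are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder classifier standing in for the proposed network.
model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(11, activation="softmax"),   # 11 disorder classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=10,
                                              restore_best_weights=True)

# history = model.fit(train_iter, validation_data=val_iter, epochs=500,
#                     batch_size=30, callbacks=[early_stop])
```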
Pre-trained model | Training accuracy (%) | Training loss | Validation accuracy (%) | Validation loss | Early stopping epoch number
Dense-Net [9] | 100 | 0.00033 | 94.44 | 0.2682 | 311
Karlekar Model (without background removal) [17] | 99.70 | 0.0126 | 96.35 | 0.1497 | 127
Chen et al. [27] | 100 | 0.00033 | 94.16 | 0.2687 | 34
VGG-GAP | 67.69 | 0.9564 | 63.45 | 0.9861 | 238
Proposed Model | 99.20 | 0.0268 | 97.27 | 0.0835 | 340
Figure 8(d) depicts the results when the system converges more quickly. When its accuracy is tested on unseen test data, it degrades dramatically. Table V shows the performance comparison with other models. Karlekar et al. designed their model from scratch and applied pre-processing for background removal using the Hue, Saturation and Value (HSV) colour space. However, when the same background removal technique is applied to real, complex images, it does not produce effective results because of its fixed thresholds. They tested their findings on the PDDB database, which has a simple background. PDDB can be accessed at https://www.digipathos-rep.cnptia.embrapa.br/. Other notable work discussed here for comparison purposes is the Visual Geometry Group (VGG) network with Global Average Pooling (GAP), where a VGG model pre-trained on ImageNet weights is combined with global average pooling.

Table VI shows how the proposed model performs on real-world images with 50 epochs for each of 10 folds. The average recognition accuracy achieved is 91.76% with a loss of 0.26126; for folds 1 to 9, the accuracy is higher than 90%. For the third set of experiments, all the images are downloaded from the Internet, acquired using the disease names of tomato plants. The images are categorized into three disease classes and one healthy class, and all of them have a cluttered, complex background that resembles real-world characteristics. The next step was to enrich the dataset with augmented images. The main objective of the network design is that it should be able to distinguish the various classes. Before training, images are resized for processing
and the 0-255 range is selected for normalization for training, using a script written in Python with the OpenCV framework.

The t-SNE visualization technique (Figure 10) is adopted to show the feature representation of the validation data for the MF-SE-RT model.

Figure 11(a) shows the confusion matrix on the test data for the 11 biotic and abiotic disorder classes. From Figure 11(b) it can be observed that class 6 (insect attack) and class 7 (late blight) have low F1 scores compared to the other classes on the test data. The trainable parameters in a model are related to its time and space complexity; therefore, apart from accuracy, it is important to consider the time and space requirements of the proposed model.
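The confusion matrix and the per-class precision/recall/F1 summary in Figure 11 can be produced with scikit-learn; a minimal sketch with stand-in label arrays (in practice these would come from the model's test-set predictions) is:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# In practice: probs = model.predict(x_test); y_pred = probs.argmax(axis=1);
# y_true = y_test.argmax(axis=1) for one-hot labels. Tiny stand-ins here:
y_true = np.array([0, 6, 7, 6, 1, 7])
y_pred = np.array([0, 6, 6, 7, 1, 7])

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, zero_division=0))
```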
Table VII shows the comparative results, and it is observable from the experimental results that the proposed method performs better than the other techniques on the Internet dataset. The primary intuition behind this is that distinguishable characteristics are extracted for the different classes. Even though DenseNet and VGG+GAP were trained using ImageNet weights, whereas the proposed model employs a random initialization strategy, optimal results were not achieved by these models.

All assessment metrics are used for comparison purposes, as shown in Figure 12.
Figure 10. t-SNE visualization of the validation data
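A minimal sketch of such a plot with scikit-learn is shown below; the feature matrix is a random stand-in for the validation features extracted from the network, and the perplexity value is an assumption within the 5-50 range quoted earlier.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

features = np.random.rand(500, 256)        # stand-in: (n_validation_samples, feature_dim)
labels = np.random.randint(0, 11, 500)     # stand-in: 11 disorder classes

embedded = TSNE(n_components=2, perplexity=30,
                init="pca", random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=8)
plt.title("t-SNE of validation features")
plt.show()
```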
Furthermore, a confusion matrix is created for the analysis of the validation data; it is depicted in Figure 13.
Figure 9. Validation loss, validation accuracy, training loss and training accuracy: (a) Proposed model, (b) DenseNet, (c) Chen et al., (d) VGG+GAP
We have compared the number of trainable parameters in our model to other state-of-the-art models; the comparison is depicted in Table VIII.

4. Conclusions and Future Work
This work has presented an architecture based on residual blocks with SeNet for the classification of tomato leaf diseases. For the verification of the architecture's robustness, a new dataset of real-world tomato leaf images is produced, and many existing results are shown for comparative analysis. It is verified that the combination of residual blocks with SeNet and multiscale feature extraction gives good performance benefits.

One limitation of this work is the limited real-world dataset. However, the architecture has proven efficient and consistent in performance, having been tested on varied datasets with both train-test split and cross-validation strategies. In the future, we intend to expand the dataset by adding more crops and their respective diseases. We also plan to record the age factor at the time of capturing images, so that multimodal analysis can
Figure 11. Classification report for unseen test data: (a) confusion matrix, (b) classification summary
Pre-trained model | Training accuracy (%) | Training loss | Validation accuracy (%) | Validation loss | Early stopping epoch number
Dense-Net [9] | 92.99 | 0.2696 | 84.62 | 0.3896 | 356
Karlekar Model (without background removal) [17] | 29.19 | 1.3814 | 26.56 | 1.3868 | 112
Chen et al. [27] | 96.86 | 0.1214 | 87.91 | 0.2473 | 11
VGG+GAP [28] | 68.79 | 0.8584 | 64.47 | 0.8961 | 237
Proposed Model | 98.92 | 0.0456 | 95.97 | 0.1179 | 340
be carried out by considering text and image data together. Severity estimation that considers age will bolster this whole setup and will be effective for farmers for efficient
decision-making in the real-world environment. Visualization strategies, including Class Activation Maps (CAM) and Gradient Class Activation Maps (Grad-CAM), can be utilized to examine interclass and intraclass performance characteristics as well as the role of the different filters used in the architecture (a minimal Grad-CAM sketch is given below).

The trainable-parameters table shows that there is a significant difference between the models in terms of the number of trainable parameters. This eventually affects model performance when it is to be operated in a real-world environment, where speed and accuracy have to be balanced to achieve good performance.

There could be further improvement by guiding the farmer to automatically adjust the orientation of the camera angle, which could prevent shadows at the time of capturing images. Besides image capturing practices, there could be improvements in the segmentation method and the reliability of its usage in the real world.
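A minimal Grad-CAM sketch in Keras is shown below, assuming a trained classifier and the name of its last convolutional layer (both are placeholders here; the preprocessing of the input image must match the model's own).

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a heat map in [0, 1] highlighting regions that drive the
    predicted (or given) class score."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)      # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```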
[5] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.

[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[8] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

[9] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.

[10] S. H. Lee, H. Goëau, P. Bonnet, and A. Joly, “New perspectives on plant disease characterization based on deep learning,” Computers and Electronics in Agriculture, vol. 170, p. 105220, 2020.

[11] A. Picon, M. Seitz, A. Alvarez-Gila, P. Mohnke, A. Ortiz-Barredo, and J. Echazarra, “Crop conditional convolutional neural networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions,” Computers and Electronics in Agriculture, vol. 167, p. 105093, 2019.

[12] P. K. Sethy, N. K. Barpanda, A. K. Rath, and S. K. Behera, “Deep feature based rice leaf disease identification using support vector machine,”

[16] W. Zeng and M. Li, “Crop leaf disease recognition based on self-attention convolutional neural network,” Computers and Electronics in Agriculture, vol. 172, p. 105341, 2020.

[17] A. Karlekar and A. Seal, “SoyNet: Soybean leaf diseases classification,” Computers and Electronics in Agriculture, vol. 172, p. 105342, 2020.

[18] Q. Liang, S. Xiang, Y. Hu, G. Coppola, D. Zhang, and W. Sun, “PD2SE-Net: Computer-assisted plant disease diagnosis and severity estimation network,” Computers and Electronics in Agriculture, vol. 157, pp. 518–529, 2019.

[19] U. Barman, R. D. Choudhury, D. Sahu, and G. G. Barman, “Comparison of convolution neural networks for smartphone image based real time classification of citrus leaf disease,” Computers and Electronics in Agriculture, vol. 177, p. 105661, 2020.

[20] S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-based plant disease detection,” Frontiers in Plant Science, vol. 7, p. 1419, 2016.

[21] R. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 6, pp. 1153–1160, 1981.

[22] M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” Neural Networks, vol. 106, pp. 249–259, 2018.

[23] L. M. De Carvalho, F. W. Acerbi, J. G. Clevers, L. M. Fonseca, and S. M. De Jong, “Multiscale feature extraction from images using wavelets,” in Remote Sensing Image Analysis: Including the Spatial Domain. Springer, 2004, pp. 237–270.
[24] R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, “Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018, pp. 1442–1450.

[25] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.

[26] U. Shafi, R. Mumtaz, H. Anwar, A. M. Qamar, and H. Khurshid, “Surface water pollution detection using internet of things,” in 2018 15th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT (HONET-ICT). IEEE, 2018, pp. 92–96.

[27] J. Chen, D. Zhang, Y. A. Nanehkaran, and D. Li, “Detection of rice plant diseases based on deep transfer learning,” Journal of the Science of Food and Agriculture, vol. 100, no. 7, pp. 3246–3256, 2020.

[28] Q. Yan, B. Yang, W. Wang, B. Wang, P. Chen, and J. Zhang, “Apple leaf diseases recognition based on an improved convolutional neural network,” Sensors, vol. 20, no. 12, p. 3535, 2020.

[29] J. Ma, K. Du, F. Zheng, L. Zhang, Z. Gong, and Z. Sun, “A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network,” Computers and Electronics in Agriculture, vol. 154, pp. 18–24, 2018.

[30] J. Ma, K. Du, F. Zheng, L. Zhang, and Z. Sun, “A segmentation method for processing greenhouse vegetable foliar disease symptom images,” Information Processing in Agriculture, vol. 6, no. 2, pp. 216–223, 2019.

Saiqa Khan is a PhD candidate in the Department of Computer Engineering at DJ Sanghvi College of Engineering. She completed her masters in Computer Engineering at Thadomal Shahani Engineering College, Mumbai. She has authored more than 50 publications. Her research interest areas are computer vision, machine learning and deep learning.

Dr. Meera Narvekar received the Ph.D. degree in Computer Science and Technology from SNDT University, Mumbai. She is currently professor and Head of the Department of Computer Engineering, DJSCE, Mumbai. She is a member of the board of studies of Mumbai University. Her research interests are in mobile computing, data science and machine learning.

Dr. M. S. Joshi is currently heading the plant pathology department of Dr. B. S. Konkan Krishi Vidyapeeth, Dapoli, Dist. Ratnagiri, India. His areas of interest are Mycology and Plant Pathology. His current research work focusses on Rice Pathology. He has over 22 years of experience in agriculture. He is a recipient of the ‘Baliraja’ award for writing a book in Marathi on Mango diseases. He is an editor/co-editor for various journals.