b3 Plant Leaf Disease Detection

A
Project Report
On
“PLANT LEAF DISEASE DETECTION BY USING MACHINE AND
DEEP LEARNING APPROACH”
Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY,

ANANTAPURAMU.
In partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY In
COMPUTER SCIENCE AND ENGINEERING
N S REVATHI 17F41A0574
S V KARTHIKEYAN 17F41A0597
POOLA PUJITHA 17F41A0583
SAURABH SINGH 17F41A0599
SRINIVASULU 17F41A0596
Under the Esteemed Guidance of

R.MYTHELI ,M.TECH ,2.,
ASSISTANT PROFESSOR
2020-2021
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KUPPAM ENGINEERING COLLEGE
(Approved by AICTE and Affiliated to JNTUA, Anantapuramu)
KES Nagar, Kuppam-517425,
Chittoor District
KUPPAM ENGINEERING COLLEGE
(Approved by AICTE and Affiliated to JNTUA, Anantapuram)
Accredited by NAAC & ISO 9001- 2008 Certified
KES Nagar, Kuppam-517425, Chittoor District
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that this project report entitled “PLANT LEAF DISEASE
DETECTION BY USING MACHINE LEARNING AND DEEP LEARNING
APPROACH” is being submitted by N. S. REVATHI (17F41A0574),
S. V. KARTHIKEYAN (17F41A0597), POOLA PUJITHA (17F41A0583),
SAURABH SINGH (17F41A0599), and SRINIVASULU (17F41A0596), in partial
fulfillment of the requirements for the award of BACHELOR OF TECHNOLOGY
in COMPUTER SCIENCE AND ENGINEERING during the academic year
2020-2021.
Internal Guide: R. MYTHELI, M.TECH, Assistant Professor, Dept. of CSE, KEC, Kuppam
Head of the Department: DR. K. LOGESH, M.Tech., PHD, HOD, Dept. of CSE, Kuppam Engineering College, Kuppam
Submitted for viva voce Examination held on __________________
Internal Examiner External Examiner
DECLARATION
We have prepared this project report on the topic “PLANT LEAF DISEASE
DETECTION BY USING MACHINE AND DEEP LEARNING APPROACH”. We have
tried our best to elucidate all the relevant details of the topic in the report,
and we begin by giving a general view of the topic.
We hereby declare that the information furnished in this project report
is true to the best of our knowledge and belief. We apologize for any mistakes.
NAME OF THE STUDENT REGISTER NUMBER SIGNATURE
N S REVATHI 17F41A0574
S V KARTHIKEYAN 17F41A0597
POOLA PUJITHA 17F41A0583
SAURABH SINGH 17F41A0599

SRINIVASULU 17F41A0596
Date :
Place :
ACKNOWLEDGEMENT
First and foremost, we thank our beloved parents, who have been a constant source of
encouragement throughout our journey.
We express our gratitude and heartfelt thanks to our guide R.MYTHELI M.TECH,2.,
Assistant Professor, Department of Computer Science and Engineering for his inspiring and
esteemed guidance and support in every aspect of the project work, without which the
report would not have been completed.
We would like to express our sincere thanks to Dr. K. LOGESH, M.E., Ph.D.,
Head of the Department of Computer Science and Engineering, for his timely suggestions and
motivation.
We take this opportunity to express our profound sense of sincere & deep gratitude
to Dr. Sudhakar Babu, M.Tech, Ph.D. Principal, Kuppam Engineering College.
We would like to thank the Management of Kuppam Engineering College for
providing the facilities to carry out this Project report work.
We thank all the faculty members, lab instructors and attenders of Computer Science
and Engineering Department, Kuppam Engineering College for their co-operation and
support.
With Regards
N S REVATHI
S V KARTHIKEYAN
POOLA PUJITHA
SAURABH SINGH
SRINIVASULU
ABSTRACT
In India, agriculture plays an essential role because of the rapid growth
of population and the increased demand for food. Crop yield therefore needs to
increase. One major cause of low crop yield is disease caused by bacteria,
viruses and fungi. Such losses can be reduced by using plant disease detection
techniques. Machine learning methods can be used for disease identification
because they operate mainly on the data themselves and give priority to the
outcome of a given task. These techniques help in identifying plant diseases,
thereby increasing the yield of plants. This survey describes plant disease
identification using machine learning and deep learning approaches, and studies
in detail various techniques for disease identification and classification.
TABLE OF CONTENTS
SL. NO INDEX

ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
2 LITERATURE SURVEY
2.1 CLASSIFICATION OF CROPS AND WEEDS FROM DIGITAL IMAGES: A SVM APPROACH
2.2 COTTON LEAF DISEASE IDENTIFICATION USING PATTERN RECOGNITION TECHNIQUES
2.3 MACHINE LEARNING FOR HIGH-THROUGHPUT STRESS PHENOTYPING IN PLANTS
2.4 DETECTION OF POTATO DISEASES USING IMAGE SEGMENTATION AND MULTICLASS SUPPORT VECTOR MACHINE
2.5 PLANT DISEASE DETECTION USING IMAGE PROCESSING
3 EXISTING SYSTEM
3.1 DISADVANTAGES OF EXISTING SYSTEM
4 PROPOSED SYSTEM
4.1 ADVANTAGES OF PROPOSED SYSTEM
5 PROJECT DESCRIPTION
5.1 INTRODUCTION
5.1.1 DATASET PREPARATION AND PREPROCESSING
5.1.2 IMAGE PREPROCESSING
5.1.3 DATA AUGMENTATION
5.1.4 DATA SPLITTING
5.1.5 MODELING EVALUATION
5.2 SOFTWARE SPECIFICATION
5.2.1 GENERAL
6 IMPLEMENTATION AND RESULT
6.1 INTRODUCTION
6.2 IMPLEMENTATION
6.3 BACKEND CODE
6.4 RESULT
7 CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION AND FUTURE SCOPE
7.2 REFERENCES
LIST OF FIGURES
SL NO INDEX
Table 1.A Non-Comprehensive List of IPython Magic Functions
1.1 Machine Learning
1.2 Clustering
1.3 First Principal Component
1.4 Neural Network and Deep Learning
4 CNN Architecture
5.1.A System Architecture
5.1.B Block Diagram
5.1.C Back End Module Diagram
5.1.D Use Case Diagram
5.1.E Use Case Diagram
5.1.F State Diagram
5.1.G Activity Diagram
5.1.H Architecture Diagram
5.1.I ER Diagram
5.5.1 VGG 16 Model
5.5.2 VGG 16 Model
5.5.3 Support Vector Machine (SVM)
5.6.1 Anaconda
5.6.2 Visual Studio
6.4.1 Login Page
6.4.2 Uploading the Image for Detecting the Disease
6.4.3 Detects the Disease
6.4.4 Detects the Healthy Leaf
6.4.5 Comparing the Algorithms for Disease Detection
PLANT LEAF DISEASE DETECTION USING DEEP LEARNING AND MACHINE LEARNING APPROACH
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Glossary and Key Terms
This section provides a quick reference for several algorithms that are not explicitly
mentioned in this chapter, but may be of interest to the reader. This should provide the reader
with some keywords or useful points of reference for other similar libraries to those discussed
in this chapter.
BIDMach GPU accelerated machine learning library for algorithms that are not necessarily
neural network based.
Caret provides a standardised API for many of the most useful machine learning packages for
R. For readers who are more comfortable with R, Caret provides a good substitute for Python’s
SciKit-Learn.
Mathematica is a commercial symbolic mathematical computation system, developed since
1988 by Wolfram, Inc. It provides powerful machine learning techniques “out of the box” such
as image classification.
MATLAB is short for MATrix LABoratory, which is a commercial numerical computing
environment, and is a proprietary programming language by MathWorks. It is very popular at
universities where it is often licensed. It was originally built on the idea that most computing
applications in some way rely on storage and manipulations of one fundamental object, the
matrix, and this is still a popular approach.
R is used extensively by the statistics community. The software package Caret provides a
standardised API for many of R’s machine learning libraries.
WEKA is short for the Waikato Environment for Knowledge Analysis and has been a very
popular open source tool since its inception in 1993. In 2005 Weka received the SIGKDD Data
Mining and Knowledge Discovery Service
Award: it is easy to learn and simple to use, and provides a GUI to many machine learning
algorithms.
Vowpal Wabbit Microsoft’s machine learning library. Mature and actively developed, with an
emphasis on performance.
Requirements and Installation
The most convenient way of installing the Python requirements for this tutorial is by
using the Anaconda scientific Python distribution. Anaconda is a collection of the most
commonly used Python packages preconfigured and ready to use.
Approximately 150 scientific packages are included in the Anaconda installation.
Install the version of Anaconda for your operating system.
All Python software described here is available for Windows, Linux, and Macintosh. All code
samples presented in this tutorial were tested under Ubuntu Linux 14.04 using Python 2.7.
Some code examples may not work on Windows without slight modification (e.g. file paths in
Windows use \ and not / as in
UNIX type systems).
The main software used in a typical Python machine learning pipeline can consist of
almost any combination of the following tools:
1. NumPy, for matrix and vector manipulation
2. Pandas for time series and R-like DataFrame data structures
3. The 2D plotting library matplotlib
4. SciKit-Learn as a source for many machine learning algorithms and utilities
5. Keras for neural networks and deep learning
Managing Packages
Anaconda comes with its own built in package manager, known as Conda. Using the
conda command from the terminal, you can download, update, and delete Python packages.
Conda takes care of all dependencies and ensures that packages are preconfigured to work with
all other packages you may have installed.
Keeping your Python distribution up to date and well maintained is essential in this fast moving
field. However, Anaconda makes it particularly easy to manage and keep your scientific stack up
to date. Once Anaconda is installed you can manage your Python distribution, and all the
scientific packages installed by Anaconda using the conda application from the command line.
To list all packages currently installed, use conda list. This will output all packages and their
version numbers. Updating all Anaconda packages in your system is performed using the conda
update --all command. Conda itself can be updated using the conda update conda command,
while Python can be updated using the conda update python command. To search for packages,
use the search parameter, e.g. conda search stats, where stats is the name or partial name of the
package you are searching for.
Table 1.A Non-Comprehensive List of IPython Magic Functions
Jupyter
Jupyter, previously known as IPython Notebook, is a web-based, interactive development
environment. Originally developed for Python, it has since expanded to support over 40 other
programming languages including Julia and R.
Jupyter allows for notebooks to be written that contain text, live code, images, and equations.
These notebooks can be shared, and can even be hosted on GitHub for free.
For each section of this tutorial, you can download a Jupyter notebook that allows you to edit and
experiment with the code and examples for each topic. Jupyter is part of the Anaconda
distribution; it can be started from the command line using the jupyter notebook command.
Machine Learning
We will now move on to the task of machine learning itself. In the following sections we will
describe how to use some basic algorithms, and perform regression, classification, and clustering
on some freely available medical datasets concerning breast cancer and diabetes, and we will
also take a look at a DNA microarray dataset.
Fig 1.1 Machine Learning
SciKit-Learn
SciKit-Learn provides a standardised interface to many of the most commonly used
machine learning algorithms, and is the most popular and frequently used library for machine
learning for Python. As well as providing many learning algorithms, SciKit-Learn has a large
number of convenience functions for common preprocessing tasks (for example, normalisation
or k-fold cross validation).
SciKit-Learn is a very large software library.
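The two convenience functions mentioned above can be illustrated with a minimal sketch. It assumes the copy of the Wisconsin breast cancer dataset bundled with SciKit-Learn; the choice of 5 folds and the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Normalisation: rescale every feature to zero mean and unit variance.
# (In a real experiment the scaler would be fitted on the training folds only.)
X_scaled = StandardScaler().fit_transform(X)

# k-fold cross validation: split the sample indices into k train/test folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5 is an assumption
fold_sizes = [len(test_idx) for _, test_idx in kfold.split(X_scaled)]
print(X_scaled.shape, fold_sizes)
```

Each sample appears in exactly one test fold, so the fold sizes add up to the number of samples.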
Clustering
Clustering algorithms focus on ordering data together into groups. In general, clustering
algorithms are unsupervised: they require no y response variable as input. That is to say, they
attempt to find groups or clusters within data where you do not know the label for each sample.
SciKit-Learn has many clustering algorithms, but in this section we will demonstrate
hierarchical clustering on a DNA expression microarray dataset using an algorithm from the
SciPy library.
We will plot a visualisation of the clustering using what is known as a dendrogram, also
using the SciPy library.
The goal is to cluster the data properly in logical groups, in this case into the cancer types
represented by each sample’s expression data. We do this using agglomerative hierarchical
clustering, using Ward’s linkage method:
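A minimal sketch of agglomerative clustering with Ward's linkage in SciPy is given here. The microarray data is not included in this report, so the sketch assumes a small synthetic matrix (three groups of ten samples, 50 features each) as a stand-in; only the SciPy calls are meant to carry over.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic stand-in for expression data (an assumption, not the real dataset):
# three well separated groups of 10 samples with 50 features each.
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(10, 50)) for c in centers])

# Agglomerative hierarchical clustering using Ward's linkage method.
Z = linkage(X, method="ward")

# Cut the tree so that at most three flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout; drop no_plot=True to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)
print(labels)
```

The linkage matrix `Z` has one row per merge, so 30 samples produce 29 rows.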
Fig 1.2 Clustering

Classification
In the previous section we analysed data that was unlabelled: we did not know to what class a
sample belonged (known as unsupervised learning). In contrast to this, a supervised problem deals
with labelled data where we are aware of the discrete classes to which each sample belongs. When
we wish to predict which class a sample belongs to, we call this a classification problem.
SciKit-Learn has a number of algorithms for classification; in this section we will look at the
Support Vector Machine.
We will work on the Wisconsin breast cancer dataset, split it into a training set and a test
set, train a Support Vector Machine with a linear kernel, and test the trained model on an
unseen dataset. The Support Vector Machine model should be able to predict if a new sample is
malignant or benign based on the features of a new, unseen sample:
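The experiment described above can be sketched as follows. This is an illustrative version rather than the original listing: the 70/30 split, the fixed random seed, and the feature scaling step are assumptions added to keep the example fast and reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data, data.target

# Hold out 30% of the samples as an unseen test set (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Support Vector Machine with a linear kernel; scaling is added for stability.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_train, y_train)

# Predict malignant/benign for the unseen samples and summarise the result.
y_pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
report = classification_report(y_test, y_pred, target_names=data.target_names)
print(report)
```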
You will notice that the SVM model performed very well at predicting the malignancy of
new, unseen samples from the test set. This can be quantified nicely by printing a number of
metrics using the classification_report function. Here, the precision, recall, and F1 score
(F1 = 2 · (precision · recall) / (precision + recall)) for each class are shown. The support
column is a count of the number of samples for each class.
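As a worked example of these metrics, the sketch below computes precision, recall, and F1 from raw counts. The counts (90 true positives, 10 false positives, 5 false negatives) are made up for illustration and are not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts for one class.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled towards the smaller of the two.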
Support Vector Machines are a very powerful tool for classification. They work well in
high dimensional spaces, even when the number of features is higher than the number of
samples. However, their running time is quadratic in the number of samples, so training on large
datasets can become slow. Quadratic means that if you increase a dataset in size by 10 times, it
will take roughly 100 times longer to train.
Last, you will notice that the breast cancer dataset consisted of 30 features. This makes it
difficult to visualize or plot the data. To aid in visualization of highly dimensional data, we can
apply a technique called dimensionality reduction.
Dimensionality Reduction
Another important method in machine learning, and data science in general, is
dimensionality reduction. For this example, we will look at the Wisconsin breast cancer dataset
once again. The dataset consists of over 500 samples, where each sample has 30 features. The
features relate to images of a fine needle aspirate of breast tissue, and the features describe the
characteristics of the cells present in the images. All features are real values. The target variable
is a discrete value (either malignant or benign) and is therefore a classification dataset.
You will recall from the Iris example that we plotted a scatter matrix of the data, where
each feature was plotted against every other feature in the dataset to look for potential
correlations. By examining this plot you could probably find features which would separate the
dataset into groups. Because the dataset only had 4 features we were able to plot each feature
against each other relatively easily. However, as the number of features grows, this becomes less
and less feasible, especially if you consider the gene expression example, which had over 6000
features. One method that is used to handle data that is highly dimensional is Principal
Component Analysis, or PCA. PCA is an unsupervised algorithm for reducing the number of
dimensions of a dataset. For example, for plotting purposes you might want to reduce your data
down to 2 or 3 dimensions, and PCA allows you to do this by generating components, which
are combinations of the original features that you can then use to plot your data. You supply it
with your data, X, and you specify the number of components you wish to reduce its
dimensionality to. This is known as transforming the data:
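A minimal sketch of this transformation with SciKit-Learn is shown here. Standardising the features before PCA is an assumption added because the 30 features are measured on very different scales; the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Put all 30 features on a comparable scale before computing components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the data to two components, suitable for a 2D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Share of the total variance captured by each of the two components.
explained = pca.explained_variance_ratio_
print(X_2d.shape, explained)
```

The two columns of `X_2d` can be plotted against each other and coloured by `y` to see how well the classes separate along the first components.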
Fig 1.3 First Principal Component

Again, you would not use this model for new data. In a real-world scenario, you would,
for example, perform a 10-fold cross validation on the dataset, choosing the model parameters
that perform best on the cross validation. This model would be much more likely to perform well
on new data. At the very least, you would randomly select a subset, say 30% of the data, as a test
set and train the model on the remaining 70% of the dataset. You would evaluate the model
based on the score on the test set and not on the training set.
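The evaluation procedure described above can be sketched as follows. The model (a linear SVM), the candidate values of C, and the random seed are assumptions chosen for illustration; the 70/30 split and the 10 folds follow the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Keep 30% of the data aside as a test set; train on the remaining 70%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Candidate model parameters (hypothetical grid for the SVM penalty C).
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
param_grid = {"svc__C": [0.01, 0.1, 1.0, 10.0]}

# 10-fold cross validation on the training data picks the best parameter.
search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(X_train, y_train)

best_C = search.best_params_["svc__C"]
cv_score = search.best_score_
# Final evaluation uses the held-out test set, not the training set.
test_score = search.score(X_test, y_test)
print(best_C, round(cv_score, 3), round(test_score, 3))
```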
NEURAL NETWORKS AND DEEP LEARNING

While a proper description of neural networks and deep learning is far beyond the scope
of this chapter, we will however discuss an example use case of one of the most popular
frameworks for deep learning: Keras. In this section we will use Keras to build a simple neural
network to classify the Wisconsin breast cancer dataset that was described earlier. Often, deep
learning algorithms and neural networks are used to classify images; convolutional neural
networks are especially used for image related classification. However, they can of course be
used for text or tabular-based data as well. In this section we will build a standard feed-forward,
densely connected neural network and classify a text-based cancer dataset in order to
demonstrate the framework’s usage.
In this example we are once again using the Wisconsin breast cancer dataset, which
consists of 30 features and 569 individual samples. To make it more challenging for the neural
network, we will use a training set consisting of only 50% of the entire dataset, and test our
neural network on the remaining 50% of the data.
Note that Keras is not installed as part of the Anaconda distribution; to install it, use pip
(pip install keras).
Keras additionally requires either Theano or TensorFlow to be installed. In the examples
in this chapter we are using Theano as a backend; however, the code will work identically for
either backend. You can install Theano using pip, but it has a number of dependencies that must
be installed first. Refer to the Theano and TensorFlow documentation for more information.
Keras is a modular API. It allows you to create neural networks by building a stack of modules,
from the input of the neural network to its output, piece by piece until you have a complete
network. Also, Keras can be configured to use your Graphics Processing Unit, or GPU. This
makes training neural networks far faster than if we were to use a CPU. We begin
by importing Keras:
We may want to view the network’s accuracy on the test set (or its loss on the training set)
over time (measured at each epoch), to get a better idea of how well it is learning. An epoch is
one complete cycle through the training data.
Fortunately, this is quite easy to plot as Keras’ fit function returns a history object which we can
use to do exactly this.
This will result in a plot similar to that shown in Fig 1.4. Often you will also want to plot the
loss on the test set and training set, and the accuracy on the test set and training set. Plotting the
loss and accuracy can be used to see if you are overfitting (you experience tiny loss on the
training set, but large loss on the test set) and to see when your training has plateaued.
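As a rough, framework-independent sketch of the experiment described in this section, the block below trains a small densely connected feed-forward network on a 50/50 split and records the training loss per epoch. Note that it swaps in SciKit-Learn's MLPClassifier instead of Keras so that it runs without a deep learning backend; the layer sizes, seed, and iteration limit are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Train on only 50% of the samples and test on the remaining 50%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Scale features using statistics from the training half only.
scaler = StandardScaler().fit(X_train)

# Feed-forward, densely connected network with two hidden layers (sizes assumed).
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
net.fit(scaler.transform(X_train), y_train)

# loss_curve_ holds the training loss after each epoch, similar in spirit
# to the history object described in the text.
train_loss_per_epoch = net.loss_curve_
test_accuracy = net.score(scaler.transform(X_test), y_test)
print(len(train_loss_per_epoch), round(test_accuracy, 3))
```

Plotting `train_loss_per_epoch` against the epoch index gives the kind of curve used to judge when training has plateaued.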
Fig 1.4 Neural Network And Deep Learning
PROBLEM STATEMENT:
Agriculture is one of the most important sources of income for farmers. Farmers can grow a
variety of plants, but diseases hamper their growth. Disease attack is one of the major factors
leading to the destruction of plants, and it may reduce plant productivity by 10% to 95%.
Classifying healthy and diseased plants with a machine learning approach can help to control the
spread of disease by applying pesticides only in the quantity needed, so that excess use of
pesticides is avoided. Automatic identification of plant diseases is an important task, as it
allows a farmer to monitor a large field of plants and identify diseases. This report surveys the
machine learning methods used by researchers to identify and classify plant diseases. These
methods will help the system to identify the disease present on a plant by image processing,
inform the farmer about the disease in detail, and specify the treatment needed to get rid of the
disease and increase productivity.
CHAPTER 2
LITERATURE SURVEY
2.1 TITLE: CLASSIFICATION OF CROPS AND WEEDS FROM DIGITAL IMAGES: A
SVM APPROACH
AUTHOR: F. Ahmed, H. A. Al-Mamun, A. S. M. H. Bari, E. Hossain
DESCRIPTION:
In most agricultural systems, one of the major concerns is to reduce the growth of weeds. In most
cases, removal of the weed population in agricultural fields involves the application of chemical
herbicides, which has had successes in increasing both crop productivity and quality. However,
concerns regarding the environmental and economic impacts of excessive herbicide applications
have prompted increasing interests in seeking alternative weed control approaches. An
automated machine vision system that can distinguish crops and weeds in digital images can be a
potentially cost-effective alternative to reduce the excessive use of herbicides. In other words,
instead of applying herbicides uniformly on the field, a realtime system can be used by
identifying and spraying only the weeds. This paper investigates the use of a machine-learning
algorithm called support vector machine (SVM) for the effective classification of crops and
weeds in digital images. Our objective is to evaluate if a satisfactory classification rate can be
obtained when SVM is used as the classification model in an automated weed control system. In
our experiments, a total of fourteen features that characterize crops and weeds in images were
tested to find the optimal combination of features that provides the highest classification rate.
Analysis of the results reveals that SVM achieves above 97% accuracy over a set of 224 test
images. Importantly, there is no misclassification of crops as weeds and vice versa.
2.2. TITLE: COTTON LEAF DISEASE IDENTIFICATION USING PATTERN
RECOGNITION TECHNIQUES
AUTHOR: P. R. Rothe and R. V. Kshirsagar
DESCRIPTION:
Leaf diseases on cotton plants must be identified early and accurately, as they can prove
detrimental to the yield. The proposed work presents a pattern recognition system for
identification and classification of three cotton leaf diseases i.e. Bacterial Blight, Myrothecium
and Alternaria. The images required for this work are captured from the fields at Central Institute
of Cotton Research Nagpur, and the cotton fields in Buldana and Wardha district. Active contour
model is used for image segmentation and Hu's moments are extracted as features for the training
of adaptive neuro-fuzzy inference system. The classification accuracy is found to be 85 percent.
2.3. TITLE: MACHINE LEARNING FOR HIGH-THROUGHPUT STRESS
PHENOTYPING IN PLANTS
AUTHOR: Singh Arti
DESCRIPTION:
Advances in automated and high-throughput imaging technologies have resulted in a
deluge of high-resolution images and sensor data of plants. However, extracting patterns and
features from this large corpus of data requires the use of machine learning (ML) tools to enable
data assimilation and feature identification for stress phenotyping. Four stages of the decision
cycle in plant stress phenotyping and plant breeding activities where different ML approaches
can be deployed are (i) identification, (ii) classification, (iii) quantification, and (iv) prediction
(ICQP). We provide here a comprehensive overview and user-friendly taxonomy of ML tools to
enable the plant community to correctly and easily apply the appropriate ML tools and best-
practice guidelines for various biotic and abiotic stress traits.
2.4. TITLE: DETECTION OF POTATO DISEASES USING IMAGE SEGMENTATION AND
MULTICLASS SUPPORT VECTOR MACHINE
AUTHOR: Monzurul Islam, Anh Dinh and Khan Wahid
DESCRIPTION:
Modern phenotyping and plant disease detection provide a promising step towards food
security and sustainable agriculture. In particular, imaging and computer vision based
phenotyping offers the ability to study quantitative plant physiology. In contrast, manual
interpretation requires a tremendous amount of work, expertise in plant diseases, and
excessive processing time. In this work, we present an approach that integrates image processing
and machine learning to allow diagnosing diseases from leaf images. This automated method
classifies diseases (or absence thereof) on potato plants from a publicly available plant image
database called 'Plant Village'. Our segmentation approach and utilization of support vector
machine demonstrate disease classification over 300 images with an accuracy of 95%. Thus, the
proposed approach presents a path toward automated plant disease diagnosis on a massive scale.
2.5. TITLE: PLANT DISEASE DETECTION USING IMAGE PROCESSING
AUTHOR: Khirade, Sachin D., and A. B. Patil
DESCRIPTION:
Identification of the plant diseases is the key to preventing the losses in the yield and
quantity of the agricultural product. The studies of the plant diseases mean the studies of visually
observable patterns seen on the plant. Health monitoring and disease detection on plants is very
critical for sustainable agriculture. It is very difficult to monitor plant diseases manually: it
requires a tremendous amount of work, expertise in plant diseases, and excessive processing
time. Hence, image processing is used for the detection of plant diseases.
Disease detection involves steps such as image acquisition, image pre-processing, image
segmentation, feature extraction and classification. This paper discusses the methods used for the
detection of plant diseases using images of their leaves. It also discusses some
segmentation and feature extraction algorithms used in plant disease detection.
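The five steps listed above can be sketched end to end on toy data. Everything in this block is an illustrative assumption: the "leaf" images are synthetic green patches with brown square lesions, segmentation is a simple red-versus-green threshold, and the features and classifier are not those of the surveyed paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def acquire(diseased, size=32):
    """Image acquisition stand-in: synthesise an 8-bit RGB leaf patch."""
    img = np.zeros((size, size, 3))
    img[..., 1] = 0.6                                # healthy tissue is mostly green
    img += rng.normal(scale=0.03, size=img.shape)    # sensor noise
    if diseased:
        for _ in range(5):                           # paint five brown 4x4 lesions
            r, c = rng.integers(0, size - 4, size=2)
            img[r:r + 4, c:c + 4] = [0.5, 0.3, 0.1]
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)

def preprocess(img):
    """Pre-processing: convert 8-bit pixels to floats in [0, 1]."""
    return img.astype(np.float64) / 255.0

def segment(img):
    """Segmentation: mark pixels where red exceeds green as lesion."""
    return img[..., 0] > img[..., 1]

def extract_features(img, mask):
    """Feature extraction: lesion area fraction plus mean R, G, B values."""
    return np.array([mask.mean(), *img.reshape(-1, 3).mean(axis=0)])

def featurise(diseased):
    img = preprocess(acquire(diseased))
    return extract_features(img, segment(img))

# Build a small labelled set: 0 = healthy, 1 = diseased.
labels = np.array([0, 1] * 30)
features = np.array([featurise(bool(lbl)) for lbl in labels])

# Classification: linear SVM trained on 40 samples, tested on the other 20.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(features[:40], labels[:40])
accuracy = clf.score(features[40:], labels[40:])
print(accuracy)
```

Real leaf images need far more robust segmentation and features, but the structure of the pipeline stays the same.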
CHAPTER 3
EXISTING SYSTEM
The identification of plant disease is the premise of the prevention of plant disease
efficiently and precisely in the complex environment. With the rapid development of the smart
farming, the identification of plant disease becomes digitalized and data-driven, enabling
advanced decision support, smart analyses, and planning. This paper proposes a mathematical
model of plant disease detection and recognition based on deep learning, which improves
accuracy, generality, and training efficiency. Firstly, the region proposal network (RPN) is
utilized to recognize and localize the leaves in complex surroundings. Then, the images are
segmented based on the RPN results, and the Chan–Vese (CV) algorithm is used to retain the
symptom features. Finally, the segmented leaves are input into the transfer learning model, which
is trained on a dataset of diseased leaves against a simple background. Furthermore, the model is examined
with black rot, bacterial plaque, and rust diseases. The results show that the accuracy of the
method is 83.57%, which is better than the traditional method, thus reducing the influence of
disease on agricultural production and being favorable to sustainable development of agriculture.
Therefore, the deep learning algorithm proposed in the paper is of great significance in
intelligent agriculture, ecological protection, and agricultural production.
3.1 DISADVANTAGES OF EXISTING SYSTEM
1. In the current work, image preprocessing steps like image augmentation and color masking
are used before the images are passed to the CNN model. Here MobileNet is used as the base model.
2. Vision loss has a significant impact on the lives of those who experience it as well as on
their families, their friends, and society.
3. Vision loss can affect one's quality of life (QOL), independence, and mobility and has
been linked to falls, injury, and worsened status in domains spanning mental health,
cognition, social function, employment, and educational attainment
CHAPTER 4
PROPOSED SYSTEM
In the proposed system, convolutional neural networks (CNNs) are used, as they have achieved
impressive results in the field of image classification. This paper is concerned with a new
approach to the development of a plant disease recognition model, based on leaf image
classification, by the use of deep convolutional networks. The novel way of training and the
methodology used facilitate a quick and easy system implementation in practice. The developed
model is able to recognize 13 different types of plant diseases out of healthy leaves, with the
ability to distinguish plant leaves from their surroundings. According to our knowledge, this
method for plant disease recognition has been proposed for the first time. All essential steps
required for implementing this disease recognition model are fully described throughout the
paper, starting from gathering images in order to create a database, assessed by agricultural
experts. Caffe, a deep learning framework developed by Berkeley Vision and Learning Center,
was used to perform the deep CNN training. The experimental results on the developed model
achieved precision between 91% and 98% for separate class tests, and 96.3% on average.
CNN Architecture:-
Fig 4 CNN Architecture
4.1 ADVANTAGES OF PROPOSED SYSTEM:

1. In the proposed model, pre-trained VGG16 is used as the base model for transfer learning.
2. Because transfer learning is used, the number of trainable parameters is reduced, which
reduces the training time and improves the performance.
3. The proposed system will accurately distinguish the affected area from the rest of the leaf.
4. The system will efficiently mark the affected area on the original image.
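The reduction in trainable parameters mentioned in point 2 can be illustrated with a short calculation. The sketch counts the weights in the standard VGG16 convolutional base, which stays frozen during transfer learning, and compares them with a small classification head. The head design (global average pooling followed by one dense layer) and the number of classes are assumptions, not details taken from this project.

```python
# Output channels of each 3x3 convolution in VGG16 ("M" marks 2x2 max pooling).
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def conv_base_parameters(layout, in_channels=3, kernel=3):
    """Count weights and biases of all convolution layers in the layout."""
    total = 0
    for item in layout:
        if item == "M":          # pooling layers have no parameters
            continue
        total += (kernel * kernel * in_channels + 1) * item
        in_channels = item
    return total

def dense_parameters(n_inputs, n_outputs):
    """Count weights and biases of one fully connected layer."""
    return (n_inputs + 1) * n_outputs

num_classes = 10  # hypothetical number of leaf classes

frozen = conv_base_parameters(VGG16_LAYOUT)
# Assumed head: global average pooling (512 values) feeding one dense layer.
trainable_head = dense_parameters(512, num_classes)
print(frozen, trainable_head)  # prints: 14714688 5130
```

With the base frozen, only the head's few thousand weights are updated during training, which is why transfer learning trains much faster than fitting the whole network from scratch.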
SYSTEM SPECIFICATION:
HARDWARE REQUIREMENTS:
PROCESSOR : Intel I5
RAM : 4GB
HARD DISK : 500 GB
SOFTWARE REQUIREMENTS:
PYTHON IDE : Anaconda Jupyter Notebook
PROGRAMMING LANGUAGE : Python
CHAPTER 5
PROJECT DESCRIPTION
5.1 INTRODUCTION
The problem of efficient plant disease protection is closely related to the problems of
sustainable agriculture and climate change. In India, farmers grow a great diversity of crops.
Various pathogens present in the environment severely affect the crops and the soil in which
they are planted, thereby affecting crop production. Various diseases are observed on plants
and crops, and the main indicator of an affected plant or crop is its leaves. The colored spots
and patterns on a leaf are very useful in detecting the disease.
In the past, plant disease detection involved direct observation by eye and remembering the
particular set of diseases associated with the climate, season, etc. These methods were
inaccurate and very time consuming. Current methods of plant disease detection involve various
laboratory tests, skilled people, well-equipped laboratories, etc. These are not available
everywhere, especially in remote areas.
Detecting disease through an automatic technique is helpful because it reduces the large
amount of monitoring work required on big farms and detects the symptoms of disease at a very
early stage, as soon as they appear on the plant leaves.
In this changing environment, appropriate and timely disease identification, including
early prevention, has never been more important. In order to achieve accurate plant disease
diagnostics, a plant pathologist should possess good observation skills so that characteristic
symptoms can be identified [8].
MODULE DIAGRAMS:
SYSTEM ARCHITECTURE
[Diagram labels: data collection, data pre-processing, data cleaning, database, training datasets, applying algorithm, prediction, result]
Fig 5.1.A System Architecture
BLOCK DIAGRAM
[Diagram labels: input image, image preprocessing, image segmentation, feature extraction, classification, accuracy]
Fig 5.1.B Block Diagram
Back End Module Diagrams:
Fig 5.1.C Back End Module Diagram
Fig 5.1.D Use Case Diagram
Use Case Diagram: A use case diagram in the Unified Modelling Language is a type of
behavioural diagram defined by and created from a use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their goals, and
any dependencies.
Fig 5.1.E Use Case Diagram
State Diagram: A state diagram is a type of diagram used in computer science and related
fields to describe the behavior of a system. State diagrams require that the system described is
composed of a finite number of states.
Fig 5.1.F State Diagram
Activity Diagram: Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice. An activity diagram can be used to describe the
business and operational step-by-step workflow of components in a system.
Fig 5.1.G Activity Diagram
ARCHITECTURE DIAGRAM:
Fig 5.1.H Architecture Diagram
ER DIAGRAM: An entity-relationship model describes interrelated things of interest in a
specific domain of knowledge. It specifies the relationships that can exist between entities.
Fig 5.1.I ER Diagram
MODULES
5.1.1 Dataset preparation and preprocessing
5.1.2 Image Preprocessing
5.1.3 Data Augmentation
5.1.4 Data splitting
5.1.5 Modeling and Evaluation
5.1.1 Dataset preparation and preprocessing:
Data is the foundation for any machine learning project. The second stage of project
implementation is complex and involves data collection, selection, preprocessing, and
transformation. Each of these phases can be split into several steps.
Data collection:-
It’s time for a data analyst to pick up the baton and lead the way to machine learning
implementation. The job of a data analyst is to find ways and sources of collecting relevant and
comprehensive data, interpreting it, and analyzing results with the help of statistical techniques.
The type of data depends on what you want to predict.
There is no exact answer to the question “How much data is needed?” because each
machine learning problem is unique. In turn, the number of attributes data scientists will use
when building a predictive model depends on the attributes’ predictive value.
‘The more, the better’ approach is reasonable for this phase. Some data scientists suggest
considering that less than one-third of collected data may be useful. It’s difficult to estimate
which part of the data will provide the most accurate results until the model training begins.
That’s why it’s important to collect and store all data — internal and open, structured and
unstructured.
The tools for collecting internal data depend on the industry and business infrastructure.
For example, those who run an online-only business and want to launch a personalization
campaign can try out web analytics tools such as Mixpanel, Hotjar, CrazyEgg, and the well-known
Google Analytics. A web log file can also be a good source of internal data. It stores
data about users and their online behavior: time and length of visit, viewed pages or objects, and
location.
Companies can also complement their own data with publicly available datasets. For
instance, Kaggle, GitHub contributors, and AWS provide free datasets for analysis.
Data preprocessing:-
The purpose of preprocessing is to convert raw data into a form that fits machine
learning. Structured and clean data allows a data scientist to get more precise results from an
applied machine learning model. The technique includes data formatting, cleaning, and
sampling.
Data formatting: - The importance of data formatting grows when data is acquired from various
sources by different people. The first task for a data scientist is to standardize record formats. A
specialist checks whether variables representing each attribute are recorded in the same way.
Titles of products and services, prices, date formats, and addresses are examples of variables.
The principle of data consistency also applies to attributes represented by numeric ranges.
Data cleaning: - This set of procedures allows for removing noise and fixing inconsistencies in
data. A data scientist can fill in missing data using imputation techniques, e.g. substituting
missing values with the mean of the attribute. A specialist also detects outliers, that is,
observations that deviate significantly from the rest of the distribution. If an outlier indicates
erroneous data, a data scientist deletes or corrects it if possible. This stage also includes
removing incomplete and useless data objects.
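The two cleaning steps described above can be illustrated with a small sketch. It is not part of the project code: the function names, the sample values and the 1.5 x IQR outlier rule are assumptions chosen only for illustration.

```python
import numpy as np

def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def impute_mean(values):
    """Replace missing entries (NaN) with the mean of the observed entries."""
    values = np.asarray(values, dtype=float)
    return np.where(np.isnan(values), np.nanmean(values), values)

# Hypothetical attribute with one missing value and one erroneous reading
raw = np.array([4.0, 5.0, np.nan, 6.0, 5.0, 40.0])
outliers = find_outliers(raw)      # only the value 40.0 is flagged
kept = raw[~outliers]              # delete the erroneous observation
cleaned = impute_mean(kept)        # fill the gap with the mean of the rest
print(cleaned)
```

Removing the outlier before imputing keeps the erroneous reading from inflating the mean that is used to fill the gap.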
Data anonymization: - Sometimes a data scientist must anonymize or exclude attributes
representing sensitive information (e.g. when working with healthcare and banking data).
Data sampling: - Big datasets require more time and computational power for analysis. If a
dataset is too large, applying data sampling is the way to go. A data scientist uses this technique
to select a smaller but representative data sample to build and run models much faster, and at the
same time to produce accurate outcomes.
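As an illustration of representative sampling, the sketch below draws at most a fixed number of items per class so that small classes are not lost. The function name, the class names and the class sizes are hypothetical; this is one possible sampling scheme, not necessarily the one used in the project.

```python
import numpy as np

def stratified_sample(labels, per_class, seed=0):
    """Return sorted indices holding at most `per_class` items of every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)      # positions of this class
        take = min(per_class, idx.size)          # small classes are kept whole
        chosen.extend(rng.choice(idx, size=take, replace=False))
    return np.sort(np.array(chosen))

# Hypothetical imbalanced label list
labels = ["healthy"] * 50 + ["bacterial_spot"] * 30 + ["late_blight"] * 5
sample = stratified_sample(labels, per_class=10)
print(len(sample))
```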
5.1.2 Image Preprocessing:-
Image processing is divided into analogue image processing and digital image processing.
Digital image processing is the use of computer algorithms to perform image processing
on digital images. As a subfield of digital signal processing, digital image processing has many
advantages over analogue image processing. It allows a much wider range of algorithms to be
applied to the input data. The aim of digital image processing is to improve the image data
(features) by suppressing unwanted distortions and/or enhancing important image features,
so that our AI computer vision models can benefit from the improved data.
Read images: - In this step, we store the path to our image dataset in a variable and then
create a function to load folders containing images into arrays.
Resize images: - In this step, in order to visualize the change, we create two functions
to display the images: the first displays one image and the second displays two images.
After that, we create a function called processing that receives the images as a
parameter. The reason for resizing is that images captured by a camera and fed to our AI
algorithm vary in size; therefore, we establish a base size for all images fed into our AI
algorithms.
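The idea of bringing every image to one base size can be sketched as follows. This is a minimal illustration with NumPy only; the helper names are hypothetical, nearest-neighbour sampling is an assumption made for brevity, and the 224 x 224 base size is taken from the implementation code in Chapter 6.

```python
import numpy as np

def resize_nearest(img, size=(224, 224)):
    """Resize an H x W x C image to `size` using nearest-neighbour sampling."""
    height, width = img.shape[:2]
    new_h, new_w = size
    rows = np.arange(new_h) * height // new_h    # source row of each output row
    cols = np.arange(new_w) * width // new_w     # source column of each output column
    return img[rows[:, None], cols]

def preprocess(images, size=(224, 224)):
    """Resize every image to the base size and scale pixel values to [0, 1]."""
    resized = [resize_nearest(img, size) for img in images]
    return np.stack(resized).astype("float32") / 255.0

# Two synthetic "photos" of different sizes stand in for camera images
rng = np.random.default_rng(0)
photos = [rng.integers(0, 256, (300, 400, 3), dtype=np.uint8),
          rng.integers(0, 256, (128, 96, 3), dtype=np.uint8)]
batch = preprocess(photos)
print(batch.shape)
```

After this step all images share one shape, so they can be stacked into a single array and passed to a model.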
5.1.3 Data Augmentation:-
Amongst the popular deep learning applications, computer vision tasks such as image
classification, object detection, and segmentation have been highly successful. Data
augmentation can be effectively used to train the DL models in such applications. Some of the
simple transformations applied to the image are geometric transformations such as flipping,
rotation, translation, cropping, and scaling, and color space transformations such as color
casting, varying brightness, and noise injection. Figure 1 shows the original image and the
images after applying some of these transformations. The Python code used for applying the
transformations is shown in appendix-1.
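A few of the transformations listed above can be sketched with NumPy alone. This is an illustrative example and not the appendix code; the function names, the brightness factor and the noise level are assumptions.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]

def rotate_90(img):
    """Rotate the image by 90 degrees in the image plane."""
    return np.rot90(img)

def adjust_brightness(img, factor):
    """Scale pixel intensities and keep them inside [0, 1]."""
    return np.clip(img * factor, 0.0, 1.0)

def add_noise(img, rng, sigma=0.05):
    """Inject Gaussian noise and keep the result inside [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

rng = np.random.default_rng(42)
leaf = rng.random((224, 224, 3))       # synthetic stand-in for a leaf image
augmented = [horizontal_flip(leaf), rotate_90(leaf),
             adjust_brightness(leaf, 1.3), add_noise(leaf, rng)]
print(len(augmented))
```

Each transformed copy keeps the label of the original image, which is how augmentation enlarges a small training set.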
5.1.4 Data splitting:-
A dataset used for machine learning should be partitioned into three subsets: training,
test, and validation sets.
Training set: - A data scientist uses a training set to train a model and define its optimal
parameters, that is, the parameters it has to learn from data.
Test set: - A test set is needed for an evaluation of the trained model and its capability for
generalization. The latter means a model’s ability to identify patterns in new, unseen data after
having been trained on the training data. It’s crucial to use different subsets for training and
testing to avoid model overfitting, which is the incapacity for generalization mentioned
above.
Validation set: - A validation set is used to tune settings such as the number of epochs and to
monitor the model while it is being trained.
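A three-way partition can be sketched as below. The 70/15/15 proportions, the seed and the function name are assumptions made for illustration; the project code itself uses a single 70/30 train/test split.

```python
import numpy as np

def split_dataset(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_dataset(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the three index sets are disjoint, no image used for training is ever used to measure generalization.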
5.1.5 Modeling:-
During this stage, a data scientist trains numerous models to define which one of them provides
the most accurate predictions.
Model training:-
It’s time to train the model with this limited number of images. fast.ai offers many
architectures to use which makes it very easy to use transfer learning.
We can create a convolutional neural network (CNN) model using the pre-trained models
that work for most of the applications/datasets.
We are going to use ResNet architecture, as it is both fast and accurate for many datasets
and problems. The 18 in the resnet18 represents the number of layers in the neural network.
We also pass the metric to measure the quality of the model’s predictions using the validation set
from the dataloader. We are using error_rate which tells us how frequently the model is making
incorrect predictions.
The fine_tune method is analogous to the fit() method in other ML libraries. Now, to
train the model, we need to specify the number of times (epochs) we want to train the model on
each image.
Applying Deep Learning modules for disease detection:
CNN classifier
In this project, a CNN (Convolutional Neural Network) is implemented for plant leaf
disease detection. The system is trained and tested with images of plant leaves and is used
to detect whether a leaf is diseased or not.
CNN is a type of neural network widely used for image recognition and image
classification. CNN uses supervised learning. A CNN consists of filters (neurons) that have
learnable weights and biases. Every filter takes some inputs and performs a convolution on
them. The CNN classifier has four types of layers: convolutional, pooling, Rectified Linear
Unit (ReLU), and fully connected layers.
i. Convolutional layer
This layer extracts features from the image that is applied as input. The neurons
convolve the input image and produce a feature map as output, and this feature map
is fed as input to the next layer.
ii. Pooling layer
This layer is used to decrease the dimensions of the feature map while still maintaining all
the important features. This layer is usually placed between two convolutional layers.
iii. ReLU layer
ReLU is a non-linear operation which replaces all the negative values in the feature map
by zero. It is an element-wise operation.
iv. Fully connected layer
A fully connected (FC) layer means that each neuron in the previous layer is connected to each
neuron in the next layer. This is used to classify the input image into various classes based
on the training dataset.
Building the classifier has four phases:
1. Model construction
2. Model training
3. Model testing
4. Model evaluation
Model construction depends on the machine learning algorithm. In this project’s case, it was
Convolutional Neural Networks. After model construction it is time for model training. Here, the
model is trained using training data and the expected output for this data. Once the model has
been trained it is possible to carry out model testing. During this phase a second set of data is
loaded. This data set has never been seen by the model and therefore its true accuracy can be
verified. After the model training is complete, the saved model can be used in the real world.
The name of this phase is model evaluation.
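The four layer types described above can be sketched in plain NumPy to show what each one computes. This is a didactic single-channel example with random weights, not the Keras model used in the project; all function names, the edge filter and the sizes are assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Replace every negative value by zero, element by element."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Keep the largest value of every non-overlapping size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Flatten the features, apply a dense layer and a softmax."""
    logits = x.ravel() @ weights + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                          # synthetic grey-scale input
kernel = np.array([[1.0, 0.0, -1.0]] * 3)           # simple vertical-edge filter
features = max_pool(relu(conv2d(image, kernel)))    # 8x8 -> 6x6 -> 3x3
weights = rng.normal(size=(features.size, 2))       # untrained weights, two classes
probs = fully_connected(features, weights, np.zeros(2))
print(probs)
```

In a trained network the kernel and the dense weights are learned from the training data instead of being fixed or random.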
VGG16 model:
Transfer learning generally refers to a process where a model trained on one problem is
used in some way on a second related problem. In deep learning, transfer learning is a technique
whereby a neural network model is first trained on a problem similar to the problem that is being
solved. One or more layers from the trained model are then used in a new model trained on the
problem of interest.
Transfer learning has the benefit of decreasing the training time for a neural network
model and can result in lower generalization error.
The weights in re-used layers may be used as the starting point for the training process
and adapted in response to the new problem. This usage treats transfer learning as a type of
weight initialization scheme. This may be useful when the first related problem has a lot more
labeled data than the problem of interest and the similarity in the structure of the problem may be
useful in both contexts.
Fig 5.5.1 VGG16 Model
Fig 5.5.2 VGG16 Model
Support Vector Machine (SVM):
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a
separating hyperplane. In other words, given labeled training data (supervised learning), the
algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional
space this hyperplane is a line dividing a plane into two parts, with each class
lying on either side.
Algorithm
1. Define an optimal hyperplane: maximize the margin.
2. Extend the above definition to non-linearly separable problems: add a penalty term for misclassifications.
3. Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces:
reformulate the problem so that the data is mapped implicitly to this space.
Fig 5.5.3 Support Vector Machine (SVM)
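Steps 1 and 2 of the algorithm above can be sketched as a linear soft-margin SVM trained by subgradient descent on the objective 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b))). This is an illustrative sketch, not the scikit-learn SVC used in the project; the toy data, the learning rate and the function name are assumptions, and step 3 (the implicit kernel mapping) is not covered.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise the soft-margin objective by subgradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1      # points inside the margin or misclassified
        # The margin term pulls w towards zero; the penalty term pushes back
        grad_w = w - C * (y[violators, None] * X[violators]).sum(axis=0)
        grad_b = -C * y[violators].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, linearly separable data with labels +1 and -1
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.5], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print(pred)
```

The constant C controls the trade-off: a large C punishes misclassifications more strongly, while a small C favours a wider margin.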
5.2 SOFTWARE SPECIFICATION

5.2.1 GENERAL
ANACONDA
It is a free and open-source distribution of the Python and R programming languages for
scientific computing (data science, machine learning applications, large-scale data processing,
predictive analytics, etc.), that aims to simplify package management and deployment.
The Anaconda distribution comes with more than 1,500 packages as well as
the Conda package and virtual environment manager. It also includes a GUI, Anaconda
Navigator, as a graphical alternative to the Command Line Interface (CLI).
The big difference between Conda and the pip package manager is in how package dependencies
are managed, which is a significant challenge for Python data science and the reason Conda
exists. Pip installs all Python package dependencies required, whether or not those conflict with
other packages you installed previously.
So your working installation of, for example, TensorFlow can suddenly stop
working when you pip install a different package that needs a different version of the NumPy
library. More insidiously, everything might still appear to work but now you get different results
from your data science, or you are unable to reproduce the same results elsewhere because you
didn't pip install in the same order.
Conda analyzes your current environment, everything you have installed, and any version
limitations you specify (e.g. you only want tensorflow >= 2.0) and figures out how to install
compatible dependencies. Or it will tell you that what you want can't be done. Pip, by contrast,
will just install the thing you wanted and any dependencies, even if that breaks other things.
Open source packages can be individually installed from the Anaconda repository, Anaconda
Cloud (anaconda.org), or your own private repository or mirror, using the conda install command.
Anaconda Inc compiles and builds all the packages in the Anaconda repository itself, and
provides binaries for Windows 32/64 bit, Linux 64 bit and macOS 64-bit. You can also install
anything on PyPI into a Conda environment using pip, and Conda knows what it has installed
and what pip has installed. Custom packages can be made using the conda build command, and
can be shared with others by uploading them to Anaconda Cloud, PyPI or other repositories.
The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.7.
However, you can create new environments that include any version of Python packaged with
conda.
Fig 5.6.1 Anaconda
Anaconda Navigator is a desktop Graphical User Interface (GUI) included in the Anaconda
distribution that allows users to launch applications and manage conda packages, environments
and channels without using command-line commands. Navigator can search for packages on
Anaconda Cloud or in a local Anaconda Repository, install them in an environment, run the
packages and update them. It is available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
• JupyterLab
• Jupyter Notebook
• QtConsole
• Spyder
• Glueviz
• Orange
• RStudio
• Visual Studio Code
Microsoft .NET is a set of Microsoft software technologies for rapidly building and
integrating XML Web services, Microsoft Windows-based applications, and Web solutions.
The .NET Framework is a language-neutral platform for writing programs that can easily and
securely interoperate. There’s no language barrier with .NET: there are numerous languages
available to the developer including Managed C++, C#, Visual Basic and Java Script. The .NET
framework provides the foundation for components to interact seamlessly, whether locally or
remotely on different platforms. It standardizes common data types and communications
protocols so that components created in different languages can easily interoperate.
“.NET” is also the collective name given to various software components built upon
the .NET platform. These will be both products (Visual Studio.NET and Windows.NET Server,
for instance) and services (like Passport, .NET My Services, and so on).
Microsoft Visual Studio is an Integrated Development Environment (IDE) from
Microsoft. It is used to develop computer programs, as well as websites, web apps, web services
and mobile apps.
Fig 5.6.2 Visual Studio

Python is a powerful multi-purpose programming language created by Guido van Rossum. It has
simple easy-to-use syntax, making it the perfect language for someone trying to learn computer
programming for the first time. Python features are:
• Easy to code
• Free and Open Source
• Object-Oriented Language
• GUI Programming Support
• High-Level Language
• Extensible feature
• Python is Portable language
• Python is Integrated language
• Interpreted
• Large Standard Library
• Dynamically Typed Language
PYTHON:
Features Of Python :
1. Easy to code: Python is a high-level programming language. It is very easy to learn
compared to other languages such as C, C#, JavaScript and Java. It is very easy to code in
Python, and anybody can learn the basics in a few hours or days. It is also a
developer-friendly language.
2. Free and Open Source: Python is freely available from its official website. Since
it is open source, the source code is also available to the public, so you can
download it, use it and share it.
3. Object-Oriented Language: One of the key features of Python is object-oriented programming.
Python supports object-oriented concepts such as classes, objects and encapsulation.
4. GUI Programming Support: Graphical user interfaces can be made using a module such as
PyQt5, PyQt4, wxPython or Tk. PyQt5 is the most popular option for creating
graphical apps with Python.
5. High-Level Language: Python is a high-level language. When we write programs in Python,
we do not need to remember the system architecture, nor do we need to manage the memory.
6. Extensible feature: Python is an extensible language. We can write some of our Python code
in C or C++ and compile that code with a C/C++ compiler.
7. Portable Language: Python is also a portable language. For example, if we
have Python code for Windows and want to run it on another platform such as Linux,
Unix or Mac, we do not need to change it; the same code runs on any platform.
8. Integrated Language: Python is also an integrated language because we can easily
integrate Python with other languages such as C and C++.
9. Interpreted Language: Python is an interpreted language because Python code is executed
line by line. Unlike languages such as C, C++ and Java, there is no separate compilation
step, which makes it easier to debug our code. The source code of Python is converted into an
intermediate form called bytecode.
10. Large Standard Library: Python has a large standard library which provides a rich set of
modules and functions, so you do not have to write your own code for every single thing. There
are many libraries in Python for tasks such as regular expressions, unit testing and web browsers.
11. Dynamically Typed Language: Python is a dynamically typed language. That means the type
(for example int, float, etc.) of a variable is decided at run time, not in advance. Because
of this feature we don’t need to specify the type of a variable.
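Dynamic typing, the last feature in the list above, can be seen in a few lines. The variable and function names are chosen only for illustration.

```python
# The same name can refer to values of different types at run time
value = 42
kinds = [type(value).__name__]

value = 3.14
kinds.append(type(value).__name__)

value = "leaf"
kinds.append(type(value).__name__)

def describe(x):
    """Works for any argument because no type is declared in advance."""
    return f"{x!r} is a {type(x).__name__}"

print(kinds)
print(describe([1, 2, 3]))
```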
APPLICATIONS OF PYTHON:
WEB APPLICATIONS
• You can create scalable web apps using frameworks and CMSs (Content Management
Systems) that are built on Python. Some of the popular platforms for creating web apps
are Django, Flask, Pyramid, Plone and Django CMS.
• Sites like Mozilla, Reddit, Instagram and PBS are written in Python.
SCIENTIFIC AND NUMERIC COMPUTING
• There are numerous libraries available in Python for scientific and numeric computing.
There are libraries like SciPy and NumPy that are used in general-purpose computing,
and there are specific libraries like EarthPy for earth science, AstroPy for astronomy
and so on.
• Also, the language is heavily used in machine learning, data mining and deep learning.
CREATING SOFTWARE PROTOTYPES
• Python is slow compared to compiled languages like C++ and Java. It might not be a
good choice if resources are limited and efficiency is a must.
• However, Python is a great language for creating prototypes. For example, you can use
Pygame (a library for creating games) to create your game's prototype first. If you like the
prototype, you can use a language like C++ to create the actual game.
GOOD LANGUAGE TO TEACH PROGRAMMING
• Python is used by many companies to teach programming to kids.
• It is a good language with a lot of features and capabilities. Yet it is one of the easiest
languages to learn because of its simple, easy-to-use syntax.
CHAPTER 6
IMPLEMENTATION AND RESULT

6.1 INTRODUCTION
The system is implemented in Python. With libraries such as NumPy, Python simplifies the
implementation of numerical linear algebra routines, and it is used to implement numerical
algorithms for a wide range of applications. Array code written this way is very similar to
standard linear algebra notation, although a few extensions may cause some problems at first.
6.2 IMPLEMENTATION CODE
#list of useful imports that I will use
%matplotlib inline
import os
import tqdm
import matplotlib.pyplot as plt
import pandas as pd
import cv2
import numpy as np
from glob import glob
import seaborn as sns
import random
from keras.preprocessing import image
import tensorflow as tf
from keras.utils import to_categorical # convert to one-hot-encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, GlobalMaxPooling2D
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
# Run this cell to mount your Google Drive.
from google.colab import drive
drive.mount('/content/drive')
file = '/content/drive/My Drive/Copy of 611716_1094714_bundle_archive.zip'
import zipfile as zf
data_zip = zf.ZipFile(file)
data_zip.extractall()
!ls
6.3 BACKEND CODE:
import os
import random
import h5py
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.preprocessing import LabelEncoder, LabelBinarizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from skimage.color import rgb2gray
from skimage.feature import hog
from keras.utils import to_categorical
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Softmax, Activation, Dropout, BatchNormalization
from keras.models import Sequential, load_model
from keras.applications.vgg16 import VGG16
from keras.callbacks import ModelCheckpoint
from google.colab import drive
# Mount Google Drive so that the dataset is reachable from Colab
drive.mount('/content/drive')
image_path='/content/drive/MyDrive/Dataset/PlantVillage'
# Collect every image path, skipping files that sit in the dataset root itself
images=[]
i=0
for F in os.walk(image_path):
    if(i!=0):
        for f in F[2]:
            images.append(os.path.join(F[0],f))
    i+=1

print(len(images))
images[0]
# The parent folder name of each image is its class label
labels=[]
disease_type=[]
for im in images:
    labels.append(im.split('/')[-2])
    disease_type.append(im.split('/')[-2])
le = LabelEncoder()
labels=le.fit_transform(labels)
print(labels)
le.classes_
sns.countplot(x=labels[:500])
classes=np.unique(labels)
disease=np.unique(disease_type)
n_classes=classes.shape[0]
Y=to_categorical(np.array(labels),n_classes)
print(Y.shape)
n_classes
print(disease)
def random_sample(X,no_of_samples):
    # Take an equal share of images per disease class, capped at 1000 images overall
    image_sample=[]
    new_labels=[]
    smple=int(np.ceil(no_of_samples/classes.shape[0]))
    #smple=5
    print(smple)
    smple_cnt=0
    for d in disease:
        #print(d)
        c=0
        for x in X:
            if(x.split('/')[-2]==d and c<smple and smple_cnt<1000 ):
                #print(x)
                image_sample.append(x)
                new_labels.append(d)
                c+=1
                smple_cnt+=1
        #print('----------------------')
    print(len(image_sample))
    return image_sample,new_labels

Image_sample,Labels=(random_sample(np.array(images),5000))
# Shuffle image paths and labels together
data=list(zip(Image_sample,Labels))
random.shuffle(data)
X,Y=zip(*data)
print(len(Labels))
len(Image_sample)
len(Y)
Y[:10]
# Augmentation applied on the fly while training
aug=ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
                       height_shift_range=0.1, shear_range=0.2,
                       zoom_range=0.2,horizontal_flip=True,
                       fill_mode="nearest")
# Load every sampled image at 224x224 and scale pixel values to [0, 1]
input_image=[]
for imgs in X:
    img=image.load_img(imgs,target_size=(224,224))
    img=image.img_to_array(img)
    img=img.astype('float32')/255
    input_image.append(img)
Input_image=np.array(input_image)
Input_image.shape
# One-hot binary target: [0,1] for pepper bell bacterial spot, [1,0] otherwise
y = []
for i in list(Y):
    if i == "Pepper__bell___Bacterial_spot":
        y.append([0,1])
    else:
        y.append([1,0])
y = np.array(y)
# Assumption: the listing does not show how x_train/x_test were created;
# a 70/30 split with the same settings as the SVM experiment is used here
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(Input_image, y, test_size=0.3, random_state=42)
# Baseline CNN: two convolution blocks followed by a dense classifier
model = Sequential()
model.add(Conv2D(32, kernel_size = (3, 3),activation='relu', input_shape=(224,224,3)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
#model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
#model.add(Dropout(0.3))
model.add(Dense(2, activation = 'softmax'))
INIT_LR=0.001
EPOCHS=10
BS=32
opt = Adam(learning_rate=INIT_LR)
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])
# Assumption: the training call is not shown in the listing;
# it is written here to mirror the VGG-16 training call
history=model.fit(aug.flow(x_train, y_train, batch_size=BS),epochs=EPOCHS,
                  validation_data=(x_test,y_test),verbose=1)
import tensorflow as tf
#File path
file_name = '/content/drive/MyDrive/Dataset/plant_disease.h5'
#Save the model
tf.keras.models.save_model(model,file_name)
model=load_model(file_name)
# Class index with the highest predicted probability for each test image
y_pred=np.argmax(model.predict(x_test),axis=-1)
y_pred
#print the test accuracy
score = model.evaluate(x_test, y_test, verbose=0)
print('Test Accuracy Score:', score[1])
import pandas as pd
results = pd.DataFrame(columns = ['Model', 'Accuracy'])
new = ['CNN ',0.91]
results.loc[1] = new
# Assumption: the VGG16 base is not defined in the listing; ImageNet weights
# without the top classifier are assumed, as is usual for transfer learning
vgg=VGG16(weights='imagenet',include_top=False,input_shape=(224,224,3))
vgg.trainable=False
model=Sequential()
model.add(vgg)
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(2, activation = 'softmax'))
INIT_LR=0.001
EPOCHS=25
BS=32
opt = Adam(learning_rate=INIT_LR)
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])
# Keep only the weights with the best validation accuracy
filepath="weights-improvementplantdiseasevgg16-{epoch:02d}-{val_accuracy:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1,
                             save_best_only=True, mode='max')
callbacks_list = [checkpoint]
history=model.fit(aug.flow(x_train, y_train, batch_size=BS),epochs=10,
                  callbacks=callbacks_list,validation_data=(x_test,y_test),verbose=1)
new = ['VGG-16 ',0.93]
results.loc[2] = new
model.save('/content/drive/MyDrive/Dataset/plant_disease_vgg16.h5')
# plot the accuracy curves
plt.plot(history.history['accuracy'], 'r')
plt.plot(history.history['val_accuracy'], 'b')
plt.legend(['Train accuracy curve', 'Test accuracy curve'])
plt.show()
# plot the loss curves
plt.plot(history.history['loss'], 'r')
plt.plot(history.history['val_loss'], 'b')
plt.legend(['Train loss curve', 'Test loss curve'])
plt.show()
MACHINE LEARNING
import pickle
from skimage import feature
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def image2hog(path):
    # Load the image at 224x224, scale it to [0, 1] and return HOG features of the first channel
    img=image.load_img(path,target_size=(224,224))
    img=image.img_to_array(img)
    img=img.astype('float32')
    img_grey=img/255
    return feature.hog(img_grey[:,:,0], pixels_per_cell=(12, 12))

# HOG feature vector for every sampled image
images = [image2hog(img) for img in X]
images[5].shape
images = np.array(images)
images.shape
enc=LabelEncoder()
y_t = enc.fit_transform(Y)
X_train, X_test, y_train, y_test = train_test_split(images, y_t, test_size=0.3, random_state=42)
svm = SVC(kernel='rbf')
svm.fit(X_train,y_train)
# Save the trained classifier
filename = 'finalized_model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(svm, f)
svm.score(X_test,y_test)
print("Accuracy on test set: %0.3f%%"%(accuracy_score(y_test, svm.predict(X_test))*100))
new = ['SVM ',0.80]
results.loc[3] = new
results
6.4 RESULT
Figure 6.4.1: Login Page
Figure 6.4.2 Uploading The Image For Detecting The Disease
Figure 6.4.3 Detects The Disease
Fig 6.4.4 Detects The Healthy Leaf
Fig 6.4.5 Comparing The Algorithms For Disease Detection
CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION & FUTURE SCOPE:

This report examined deep learning (DL) approaches for detecting plant diseases and summarized
several visualization techniques and mappings for recognizing disease symptoms. Although
significant progress has been made over the last three to four years, the following research
gaps remain:
• Most studies (as described in the previous sections) evaluate the accuracy and performance of
their DL models and architectures on the PlantVillage dataset. Although this dataset contains many
images of several plant species and their diseases, the images have a simple, plain background. A
practical system must also handle images taken in a real field environment.
• Hyperspectral/multispectral imaging is an emerging technology that has been used in many areas
of research. It should be combined with efficient DL architectures to detect plant diseases even
before their symptoms are clearly apparent.
• A more efficient way of visualizing disease spots on plants should be introduced, as it would
save costs by avoiding unnecessary application of fungicides, pesticides, and herbicides.
• The severity of plant diseases changes over time, so DL models should be improved or modified
to detect and classify diseases throughout their complete cycle of occurrence.
• DL models and architectures should remain effective under many illumination conditions, so
datasets should not only reflect the real environment but also contain images taken in different
field scenarios.
• A comprehensive study is required to understand the factors affecting the detection of plant
diseases, such as the classes and size of the datasets, the learning rate, and illumination.
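One common way to expose a model to varied illumination when field images are scarce is to rescale pixel brightness during training. The following is a minimal sketch under that assumption. `adjust_brightness` is a hypothetical helper that works on pixel values already scaled to [0, 1]; it is not part of this project's code.

```python
def adjust_brightness(pixels, factor):
    """Scale every pixel value by `factor` and clip the result to [0, 1].

    `pixels` is a nested list: rows of pixel values in the range [0, 1].
    """
    return [[min(1.0, max(0.0, value * factor)) for value in row]
            for row in pixels]


# A tiny 2x2 greyscale patch, used only for illustration.
patch = [[0.2, 0.5],
         [0.8, 1.0]]

darker = adjust_brightness(patch, 0.5)    # simulate low light
brighter = adjust_brightness(patch, 1.5)  # simulate strong sunlight
print(darker)
print(brighter)
```

Keras' `ImageDataGenerator` offers a `brightness_range` argument for the same purpose, which may be the more convenient option when an augmentation generator is already in use.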
7.2 REFERENCES:
[1] Al-Bashish D, M. Braik and S. Bani-Ahmad, 2011. Detection and classification of leaf
diseases using K-means-based segmentation and neural networks based classification. Inform.
Technol. J., 10: 267-275. DOI:10.3923/itj.2011.267.275, January 2011.
[2] Armand M.Makowski "Feature Extraction of diseased leaf images", Fellow, IEEE
Transactions on information theory Vol.59, no.3 March-2013
[3] H.Al-Hiary, S. Bani-Ahmad, M.Reyalat, M.Braik and Z.AlRahamneh, Fast and Accurate
Detection and Classification of Plant Diseases, International Journal of Computer Applications
(0975-8887), Volume 17-No.1.March 2011.
[4] DaeGwan Kim, Thomas F. Burks, Jianwei Qin, Duke M.Bulanon, Classification of grapefruit
peel diseases using color texture feature analysis, International Journal on Agriculture and
Biological Engineering, Vol:2, No:3,September 2009. Open access at http://www.ijabe.org.
[5] Jasmeet Kaur, Dr.Raman Chadha, Shvani Thakur, Er.Ramanpreet Kaur. A Review Paper on
Plant Disease Detection using Image Processing and Neural Network Approach. International
Journal of Engineering Sciences & Research Technology. April 2016. ISSN: 2277-9655.
[6] Diptesh Majumdar, Dipak Kumar Kole, Aruna Chakraborty, Dwijesh Dutta Majumder.
REVIEW: DETECTION & DIAGNOSIS OF PLANT LEAF DISEASE USING INTEGRATED
IMAGE PROCESSING APPROACH. International Journal of Computer Engineering and
Applications. June 2014. Volume VI; Issue- III.
[7] S. S. Sannakki, V. S. Rajpurohit. An Approach for Detection and Classification of Leaf Spot
Diseases Affecting Pomegranate Crop. International Journal of Advance Foundation and
Research in Computer. January 2015, Volume 2, Special Issue (NCRTIT 2015), ISSN 23484853.
[8] Davoud Ashourloo, Hossein Aghighi, Ali Akbar Matkan, Mohammad Reza Mobasheri, and
Amir Moeini Rad," An Investigation Into Machine Learning Regression Techniques for the Leaf
Rust Disease Detection Using Hyperspectral Measurement" 2016 IEEE.
[9] Mr. Melike Sardogan “Plant Leaf Disease Detection and Classification based on CNN with
LVQ Algorithm” 2018 3rd International Conference on Computer Science and Engineering
(UBMK) 2018 IEEE.
[10] K. P. Ferentinos, “Deep learning models for plant disease detection and diagnosis”,
Computers and Electronics in Agriculture, vol. 145, pp. 311-318, 2018
