
PLANT LEAF IDENTIFICATION – THESIS
A REPORT
SUBMITTED IN PARTIAL FULFILMENT OF THE
REQUIREMENTS
FOR THE AWARD OF THE DEGREE
OF
BACHELOR OF TECHNOLOGY
IN
DEPARTMENT OF INFORMATION TECHNOLOGY
Submitted by:
Rohit Pathak (2020UIN3314)
Divangi Choudhary (2020UIN3318)
Sahil (2020UIN3349)
Under the supervision of

Dr. Nisha Kandhoul
DEPARTMENT OF INFORMATION TECHNOLOGY

NETAJI SUBHAS UNIVERSITY OF TECHNOLOGY
DECLARATION
Department of Information Technology
Delhi-110078, India
We, Rohit Pathak (2020UIN3314), Divangi Choudhary (2020UIN3318) and Sahil (2020UIN3349) of
B.Tech., Department of Information Technology, hereby declare that the Project II – Thesis titled
“PLANT LEAF IDENTIFICATION”, which is submitted by us to the Department of Information
Technology, Netaji Subhas University of Technology, in partial fulfilment of the requirement for
the award of the degree of Bachelor of Technology, is original and not copied from any source
without proper citation. This work has not previously formed the basis for the award of any
Degree.
Place: Delhi Rohit Pathak Divangi Choudhary Sahil
Date: 5th May, 2024

CERTIFICATE
Department of Information Technology
Delhi - 110078, India
This is to certify that the work embodied in the Project II – Thesis titled “Plant Leaf
Identification” has been completed by Rohit Pathak (2020UIN3314), Divangi Choudhary
(2020UIN3318) and Sahil (2020UIN3349) of B.Tech., Department of Information Technology, under
my guidance towards fulfilment of the requirements for the award of the degree of Bachelor of
Technology. This work has not been submitted for any other diploma or degree of any University.
Place: Delhi
Date: 5th May, 2024

ABSTRACT
Plant Leaf Identification is the identification of plants from their leaves on the basis of shape,
colour and texture features. It is a widely explored research topic that has gained importance
with the ongoing development of agriculture and technology. As processes become increasingly
digital, similar developments have emerged to help botanists and researchers evaluate plant
species more effectively using digital tools.
In this work we integrate digital image processing techniques with Machine Learning to identify
plant leaf species automatically and with high accuracy, building on algorithms that improve upon
previously used methods such as CNNs, k-nearest neighbours (KNN) and Random Forests.
We use the Support Vector Machine classifier, which outperformed the other classifiers we
evaluated and classified leaves with 90.05% accuracy. It provides a robust framework for further
improvement and extension of the same model.
INDEX
DECLARATION
CERTIFICATE
ABSTRACT
INDEX
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
1.2 MOTIVATION
1.3 OBJECTIVE
CHAPTER 2
LITERATURE REVIEW
2.1 MACHINE LEARNING MODELS
2.2 DEVELOPMENT LIBRARIES, FRAMEWORKS, TOOLS
2.3 EVALUATION METRICS
2.4 LITERATURE SURVEY
2.5 RELATED WORKS
CHAPTER 3
METHODOLOGY
3.1 GAPS IDENTIFIED
3.2 DATASET BACKGROUND AND DESCRIPTION
3.3 METHODOLOGICAL IDEA
3.4 IMPLEMENTATION
3.5 CODE SNIPPETS
3.6 RESULTS AND DISCUSSION
3.7 MOBILE IMAGE VERSION IMPLEMENTATION
CONCLUSION
FUTURE WORK THAT CAN BE DONE
REFERENCES
PLAGIARISM REPORT
CHAPTER 1: INTRODUCTION
1.1 BACKGROUND
● Plant leaf identification is an active and fascinating area of research that various ML works
are currently exploring for improved industry applications.
● Traditional knowledge of plants can be built into a framework that detects leaves of
different species in real time with high precision.
● As the agriculture and medicinal industries advance technologically, innovations like
these help reduce the time and effort that traditional identification demands.
● The plant leaf identification system in this work uses digital image processing techniques
to classify leaves from 32 distinct plant species.
● Following is a general workflow of this process:
o Pre-Processing: Images are converted to a single format with uniform scaling and
prepared for processing using greyscaling, a Gaussian filter and boundary extraction.
o Extraction of Features: The pre-processed image is used to extract several kinds of
leaf attributes, including texture-based, color-based, and shape-based information.
o Model Design and Testing: The images are classified using a Support Vector
Machine model along with other ML models such as KNN and Decision Tree, and
GridSearchCV is then used to tune the hyperparameters.
● Advantages:
o Improved Classification Accuracy: Comparing several classifiers and selecting the
most competent one has been shown to improve classification accuracy.
o Operability: Previous researchers have used SVM methods to achieve high
throughput, and these methods provide a solid base for experiments on the same model.
SVM is also a general model that allows many variations to be adapted to different
datasets.
o Adaptability to Data Characteristics: SVM algorithms can adapt to the
characteristics of the input data, such as noise levels, class imbalance, and varying
complexity. By dynamically selecting classifiers based on the current data
characteristics, SVM methods can achieve better performance across diverse datasets.
o Open to more research domains: SVM offers broad scope for later extensions,
including multi-dimensional SVM (MSVM), to improve the training of the algorithm
and generate more accurate results. Deep learning can also be integrated later.
● Disadvantages:
o Computational Cost: Training and maintaining models can be computationally
expensive, particularly for complicated or very large datasets. Because plant leaves and
their characteristics vary so widely, this type of study requires a sizable dataset.
o Increased Complexity: As the research is extended, more computational resources
are needed to manage the training and evaluation of datasets.
● Methods in Selection of Classifiers:
1. Sensitivity to Hyperparameters (KNN, Random Forest): KNN's performance is highly
sensitive to the number of neighbours (k), and selecting an inappropriate k-value
can lead to suboptimal predictions. Similarly, Random Forest performance is
affected by the number of trees and other hyperparameters, making tuning crucial
but challenging.
2. Overfitting (KNN, Logistic Regression, Random Forest): All three algorithms are
susceptible to overfitting, especially when dealing with noisy or complex datasets.
Logistic Regression and Random Forest may overfit if the model is too complex or the
dataset is not representative.
3. Difficulty in Handling Missing Data (KNN, Logistic Regression): KNN is
sensitive to missing data, and imputing missing values can introduce bias. Logistic
Regression also requires missing data to be handled appropriately, which can be
challenging and may introduce inaccuracies.
All these methods were explored and gave accuracies close to one another; however, SVM
works best for this purpose, and we explore its performance further.
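The model design and testing step of the workflow above can be sketched with scikit-learn as follows. This is only a minimal illustration and not the configuration used in this work: the synthetic features produced by make_classification stand in for the extracted leaf features, and the class count, feature count and parameter grid values are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a table of leaf features (assumption: 4 classes, 12 features).
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale the features, then fit an SVM; the scaler is refit inside every CV fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical search grid; the values are illustrative, not tuned results.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
}

# 5-fold cross-validated grid search over the SVM hyperparameters.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Placing the scaler inside the pipeline keeps the held-out fold of each cross-validation split out of the scaling statistics, so the reported scores are not inflated by leakage.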
1.2 MOTIVATION
• A project focused on plant identification through machine learning is driven by diverse
motivations, spanning environmental conservation, biodiversity monitoring, agriculture, and
education.
• It's pivotal for biodiversity conservation, enabling quick identification and protection of
endangered species. In ecological research, machine learning analyses large datasets, deepening
our understanding of ecosystems.
• Agriculture benefits from rapid plant identification for crop management, disease detection,
and resource optimization. Detecting invasive species early is crucial for ecosystem health.
• Machine learning aids in educational tools, fostering biodiversity appreciation. Citizen science
engagement supports data collection and raises awareness.
• Accurate plant identification informs conservation policies and contributes to global
environmental monitoring, making the project a comprehensive solution for ecological
challenges.
1.3 OBJECTIVE
• This study's main goal was to evaluate and compare the effectiveness of several machine
learning algorithms in order to determine which would be best for moving on with the creation
of a mobile application that would recognize fruit, vegetable, and herbal plants.
• This project's primary objective is to use machine learning (ML) to create an accurate and
effective predictive model for plant leaf identification. This entails using digital image
processing techniques to identify various plant species based on distinctive traits of their leaves,
such as shape, color, and texture. Our goal is to increase plant recognition accuracy and
efficacy by applying machine learning (ML) techniques, which will enable more accurate
classification and placement of plants in different situations.
• Primary focus: to apply the Support Vector Machine (SVM) algorithm and compare its
effectiveness with other ML algorithms.
• Secondary goal: to tune the hyper-parameters of the model using GridSearchCV so that plant
leaves are classified more accurately and efficiently, and to create a function that extracts
features from any image taken by a mobile phone using a background removal function.
CHAPTER 2: LITERATURE REVIEW
2.1 MACHINE LEARNING MODELS
• K Nearest Neighbours (KNN)

◦ Data points are represented as features (attributes) in a multidimensional space
◦ K is a user-defined positive integer that determines the number of nearest
neighbours to consider
◦ A distance metric (e.g., Euclidean distance, Manhattan distance) is used to
calculate the similarity between data points. This determines how "close" two
points are in the feature space
◦ The class label is assigned to the new point based on the majority vote of its
neighbours
• Decision Tree
◦ Hierarchical tree structures having nodes and branches
◦ Decisions made by recursively splitting the feature space into regions based on
feature values.
◦ Splits the data at each internal node based on a feature that best separates the data
points into distinct categories
◦ Continues splitting until reaching a stopping criterion (e.g. maximum depth).
• Support Vector Machine

◦ Finds optimal hyperplane in a high dimensional space that best separates the data
points belonging to different classes
◦ The key idea is to maximize the margin between the hyperplane and the closest
data points from each class
◦ It can handle linear and non-linear decision boundaries using different kernel
functions (e.g., linear, polynomial, radial basis function).
◦ By maximizing the margin, SVMs prioritize finding a hyperplane that generalizes
well to unseen data and avoids overfitting the training data.
• Random Forest
◦ An ensemble learning method that builds multiple decision trees and combines
their predictions.
◦ Each tree is trained on a random subset of the training data (bagging) and uses a
random subset of features at each node
◦ Reduces overfitting and improves generalisation by averaging the predictions of
individual trees
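As an illustration of the KNN description above (distance metric, choice of k, majority vote), the following is a small sketch in plain Python. The two-dimensional feature values and the class names are hypothetical and are not taken from the dataset used in this work.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Keep the k closest neighbours and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors (hypothetical values, e.g. aspect ratio and circularity).
points = [(1.0, 0.9), (1.1, 0.8), (0.9, 1.0), (3.0, 0.3), (3.2, 0.2), (2.9, 0.4)]
labels = ["round", "round", "round", "narrow", "narrow", "narrow"]

print(knn_predict(points, labels, (1.05, 0.85), k=3))  # round
print(knn_predict(points, labels, (3.1, 0.25), k=3))   # narrow
```

With an odd k and two classes the vote cannot tie; for more classes a tie-breaking rule would be needed, which this sketch leaves to the ordering of the sorted neighbours.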
2.2 DEVELOPMENT LIBRARIES, FRAMEWORKS AND TOOLS
1. NumPy (Numerical Python)

a. NumPy is a fundamental library for numerical computing in Python.
b. It supports large, multi-dimensional arrays and matrices and provides a number of
mathematical operations for processing these arrays efficiently.
c. NumPy arrays are used to represent datasets in machine learning. NumPy's
random module is used to generate random numbers for tasks like initializing
model parameters, creating synthetic datasets, or introducing randomness in
algorithms like stochastic gradient descent.
d. NumPy is used for data preprocessing tasks like scaling, normalization, and
handling missing values. For instance, mean centering and scaling of features is
easily achieved using NumPy operations
2. pandas
a. Developed on top of NumPy, Pandas is an open-source Python framework meant
to make data analysis and manipulation easier.
b. For processing organized data, like CSV files, Excel sheets, SQL databases, and
more, it offers powerful data structures and tools.
c. Pandas is extensively used in machine learning workflows for data preprocessing,
exploration, and manipulation.
d. Pandas can be used to split datasets into training, validation, and test sets, as well
as to shuffle and sample data for model evaluation.
e. Pandas facilitates the evaluation of machine learning models by organizing
predictions and ground truth labels, calculating evaluation metrics, and generating
performance reports.
f. After model training, Pandas can be used to analyze model outputs, interpret
feature importances, and visualize model behavior.
3. Scikit-learn
a. Based on NumPy, SciPy, and Matplotlib, Scikit-learn is an open-source machine
learning package for Python.
b. For preparing data, selecting features, choosing models, and evaluating them, it
offers a large array of machine learning tools and techniques.
c. Many supervised and unsupervised learning methods, including clustering,
regression, and classification, are implemented in Scikit-learn.
d. Tools for choosing and assessing models are offered by Scikit-learn, including
performance measurements, hyperparameter adjustment, and cross-validation.
e. Data manipulation and preprocessing are made simple by Scikit-learn's seamless
integration with NumPy and Pandas data structures.
4. Kaggle
a. Kaggle is an online community and platform for notebooks, datasets, and contests
related to data science and machine learning.
b. It hosts a wide range of datasets, challenges, and competitions, allowing data scientists
and machine learning practitioners to showcase their skills and collaborate with
others.
c. Kaggle Kernels (Jupyter Notebooks on Kaggle) are commonly used for:
i. Exploratory Data Analysis (EDA): Analyzing datasets, visualizing data
distributions, and gaining insights into the data.
ii. Model Development: Writing and testing machine learning models, tuning
hyperparameters, and evaluating model performance.
iii. Documentation and Sharing: Documenting the entire data science workflow,
including code, visualizations, and explanations, and sharing the analysis with
others
iv. Collaboration: Collaborating with team members by sharing notebooks and
working together on projects.
5. Flavia
a. Extensive Dataset: The website provides researchers with an extensive dataset
that has been carefully selected, covering a wide range of characteristics and
factors pertinent to the area of study. For conducting in-depth studies and inquiries
across a variety of study disciplines, this dataset is an invaluable resource.
b. User-Friendly and Accessible Interface: Researchers may easily navigate and
retrieve data from the site thanks to its user-friendly and accessible interface.
Because of its user-friendly design, the dataset may be explored and used
effectively, expediting the research process and increasing productivity.
c. Good Research Opportunities: Using the site's dataset makes it possible for
scholars to tackle important research topics and explore new fields of study.
Whether performing predictive modeling, undertaking exploratory analysis, or
finding hidden patterns within the
2.3 EVALUATION METRICS
● Accuracy: The proportion of correctly predicted observations to all observations.
● Precision: The proportion of correctly predicted positive observations to the total number
of positive predictions.
● Recall (Sensitivity): The proportion of correctly predicted positive observations to all
observations that actually belong to the positive class.
● F1 Score: The harmonic mean of precision and recall.
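The four metrics can be computed directly from counts of true positives, false positives and false negatives. The sketch below does this for a single positive class in plain Python; the label vectors are made-up values for illustration, and a multi-class problem such as leaf identification would compute these per class and then average them.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 4 positives and 6 negatives, with 2 misses and 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For these labels the accuracy is 0.70, the precision is 2/3, the recall is 0.50 and the F1 score is 4/7, which shows how accuracy alone can hide a low recall.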
2.4 LITERATURE SURVEY
i. Plant Leaf Identification based on Machine Learning Algorithms
a. The conventional method of plant identification is both time-consuming and
complex, compounded by a dwindling knowledge base of plant species across
generations.
b. This study addresses these challenges by comparing the effectiveness of various
machine learning algorithms to develop a mobile application aimed at identifying
herbal, fruit, and vegetable plants found in Sri Lanka through leaf analysis.
c. The article primarily focuses on preprocessing steps such as noise reduction,
image enhancement, and transformation, followed by feature extraction based on
shape, texture, and color.
d. After normalization, the dataset is subjected to five machine learning algorithms
for classification.
ii. Leaf identification using radial basis function neural networks and SSA
based support vector machine.
a. The leaf's edge points are smoothed into a continuous curve using a special type of
neural network called a Radial Basis Function Neural Network (RBFNN).
b. This allows for calculating the leaf shape's centroid (centre of mass).
c. Distances between specific points on the leaf and the centroid are measured and
normalized.
d. This process extracts features based solely on the leaf's shape, making it resistant
to changes in position, rotation, and size (scale).
e. Different methods for classifying leaf types based on the extracted shape features
are compared.
f. Two popular techniques, RBFNN and SVM, are used alongside a third method that
combines SVM with an optimization algorithm called Salp Swarm Algorithm
(SSA).
g. This research shows that the SSA-optimized SVM achieves significantly better
accuracy than the other two approaches, despite using simpler features compared
to other studies.
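The centroid-distance idea summarised in this entry can be illustrated with a short NumPy sketch. It is a simplification under stated assumptions: the centroid is taken as the mean of the contour points rather than from an RBFNN-smoothed curve, the function name and the number of samples are hypothetical, and the ellipse is a synthetic stand-in for a leaf outline.

```python
import numpy as np

def centroid_distance_features(contour, n_samples=8):
    """Normalised distances from the contour centroid to sampled contour points."""
    contour = np.asarray(contour, dtype=float)
    # Simplification: centroid of the boundary points, not of the filled region.
    centroid = contour.mean(axis=0)
    # Sample contour points at evenly spaced indices along the boundary.
    idx = np.linspace(0, len(contour), n_samples, endpoint=False).astype(int)
    distances = np.linalg.norm(contour[idx] - centroid, axis=1)
    # Dividing by the largest distance removes the dependence on scale.
    return distances / distances.max()

# Synthetic outline: an ellipse sampled at 64 boundary points.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
contour = np.stack([2.0 * np.cos(t), np.sin(t)], axis=1)

# The same outline scaled by 3, rotated by 40 degrees and shifted.
theta = np.deg2rad(40.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
moved = 3.0 * contour @ rotation.T + np.array([10.0, -4.0])

features_original = centroid_distance_features(contour)
features_moved = centroid_distance_features(moved)
print(np.allclose(features_original, features_moved))
```

The two feature vectors agree because distances to the centroid are unchanged by translation and rotation and the normalisation cancels the scale. The sketch assumes both contours start at the same boundary point; a real system would also need to handle a shifted starting point.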
iii. Plant leaf identification using moment invariants & General Regression Neural
Network
a. This research demonstrates that TMI features, combined with a GRNN classifier,
can be a powerful tool for automated plant identification based on leaf images.
This finding has the potential to contribute to the development of practical tools
for plant classification.
b. Identifying plant species from leaf images is a challenging task in computer vision.
c. This study compares three moment-invariant methods for extracting features from leaf
images: Zernike (ZMI), Legendre (LMI), and Tchebichef (TMI) moment invariants.
d. TMI outperforms the other methods, extracting features with minimal error.
e. The extracted TMI features are used by a neural network to classify the leaf image
and identify the plant species.
iv. Recognition of Leaf Images Based on Shape Features Using a Hypersphere
Classifier
a. Recognizing plant species from leaf images is a challenging task due to the diverse
shapes and variations within leaves.
b. This research proposes a system that identifies plants solely based on their leaf
shape features.
c. The system first isolates the leaf from its background through image segmentation.
d. Fifteen shape features are then extracted, including geometric properties like
rectangularity and circularity, alongside moment invariants capturing global shape
characteristics.
e. Finally, a unique "moving centre hypersphere classifier" analyzes these features to
successfully identify over 20 different plant leaf classes.
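Two of the geometric properties mentioned in this entry, rectangularity and circularity, can be sketched in plain Python for a shape given as polygon vertices. The definitions used here (area divided by bounding-box area, and 4πA/P²) are common textbook forms and are an assumption; the cited paper may define them differently, and the axis-aligned bounding box is a simplification.

```python
import math

def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a closed polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter

def shape_features(vertices):
    """Rectangularity and circularity under the assumed definitions."""
    area, perimeter = polygon_area_perimeter(vertices)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return {
        "rectangularity": area / bbox_area,                  # 1.0 for a rectangle
        "circularity": 4.0 * math.pi * area / perimeter ** 2  # 1.0 for a circle
    }

rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
circle = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
          for d in range(360)]

rect_features = shape_features(rectangle)
circle_features = shape_features(circle)
print(rect_features)
print(circle_features)
```

The rectangle scores 1.0 on rectangularity and clearly below 1 on circularity, while the near-circular polygon scores close to 1 on circularity and about π/4 on rectangularity, so the two features separate elongated from round outlines.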
v. Recognizing plant species by leaf shapes - A case study of the Acer family
a. This research identifies plant species based on detailed leaf shape analysis.
b. It breaks down leaf shapes into their individual components, focusing on critical
points like curvature peaks.
c. Polygons are used to approximate the overall structure and component shapes.
d. The method considers natural variations in leaf shapes within the same species.
e. Applying this approach to Maple (Acer) species, both overall leaf structure and
detailed component shapes prove reliable for plant identification.
vi. An improved image segmentation algorithm based on Otsu method

a. Isolating objects (foreground) from their background in images is crucial for
various applications in computer vision. However, this task, known as image
segmentation, can be challenging due to image complexity and variations in
lighting and object properties.
b. Thresholding is a simple yet effective technique for image segmentation. It
converts grayscale images into binary (black and white) images by assigning
pixels to either foreground or background based on a single intensity threshold
value.
c. The Otsu method is a popular thresholding technique known for its efficiency. It
automatically finds the optimal threshold by maximizing the between-class variance
of the foreground and background pixel intensities, leading to a clear separation
between the two regions.
d. While effective for images with distinct foreground and background (bimodal
histograms), the Otsu method struggles when the intensity distribution has only
one peak (unimodal) or is very close to it. This can lead to inaccurate segmentation
with misclassified pixels.
e. To address this limitation, improved thresholding algorithms based on Otsu's method have
been developed. These methods ensure the threshold falls between distinct peaks or at the
bottom of a single peak, even for unimodal distributions, leading to more accurate
segmentation for a wider range of image types.
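The basic Otsu threshold described in this entry can be sketched with NumPy as follows. This shows only the standard method, not the improved algorithm of the cited paper, and the bimodal test image is synthetic (two groups of pixel intensities drawn around 60 and 180).

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold t maximising between-class variance (pixels <= t vs. pixels > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega0 = np.cumsum(prob)            # weight of the class "<= t"
    mu_cum = np.cumsum(prob * levels)   # cumulative first moment up to t
    mu_total = mu_cum[-1]
    omega1 = 1.0 - omega0               # weight of the class "> t"
    # Between-class variance for every candidate threshold at once.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega0 - mu_cum) ** 2 / (omega0 * omega1)
    # Thresholds that leave one class empty give 0/0; treat them as zero variance.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

# Synthetic bimodal image: dark background pixels and bright foreground pixels.
rng = np.random.default_rng(0)
dark = rng.normal(60, 10, 2000)
bright = rng.normal(180, 10, 2000)
pixels = np.clip(np.concatenate([dark, bright]), 0, 255).astype(np.uint8)
image = pixels.reshape(40, 100)

threshold = otsu_threshold(image)
mask = image > threshold  # True marks foreground
print(threshold, int(mask.sum()))
```

On a histogram with two well-separated peaks the selected threshold falls in the gap between them; for a unimodal histogram the same code still returns a value, which is the situation the improved methods in this entry aim to handle better.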
vii. A study of image processing using morphological opening and closing
processes
a. Opening and closing, built on erosion and dilation, enhance images in computer
vision. These techniques manipulate pixel values based on a "structuring element,"
a small template that defines the shape of the manipulation.
b. Opening removes small foreground objects (smaller than the structuring element)
by performing erosion followed by dilation. This smooths object boundaries and
eliminates isolated, noise-like foreground pixels.
c. Closing fills small background holes (smaller than the structuring element) by
performing dilation followed by erosion. This connects nearby foreground regions
and removes isolated background pixels.
d. Both opening and closing effectively reduce noise and improve image quality.
Opening smooths object edges and removes small specks, while closing fills small
holes and eliminates isolated background pixels.
e. These techniques are fundamental tools in morphological image processing, a
versatile branch of image processing used for various applications. They offer
powerful tools for image enhancement, noise reduction, feature extraction, and
more.
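The relationship between erosion, dilation, opening and closing can be illustrated with a small NumPy sketch on binary masks. It assumes a fixed 3x3 square structuring element and treats pixels outside the image as background; the masks are toy examples, and a library such as OpenCV or SciPy would normally be used in practice.

```python
import numpy as np

def erode(mask):
    """3x3 binary erosion: a pixel stays True only if its whole neighbourhood is True."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def dilate(mask):
    """3x3 binary dilation: a pixel becomes True if any neighbour is True."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def opening(mask):
    return dilate(erode(mask))   # erosion followed by dilation

def closing(mask):
    return erode(dilate(mask))   # dilation followed by erosion

# A clean 5x5 square object inside a 9x9 image.
square = np.zeros((9, 9), dtype=bool)
square[2:7, 2:7] = True

noisy = square.copy()
noisy[0, 0] = True     # isolated foreground speck
holed = square.copy()
holed[4, 4] = False    # one-pixel hole inside the object

print(np.array_equal(opening(noisy), square))  # True: speck removed
print(np.array_equal(closing(holed), square))  # True: hole filled
```

Opening removes the speck because it does not survive the erosion, and closing fills the hole because the dilation covers it before the erosion restores the object's outline.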
viii. Boundary Extraction in Natural Images Using Ultrametric Contour Maps
a. This research proposes a novel system for boundary extraction and image
segmentation in natural images.
b. It utilizes a hierarchical classification framework, representing the image structure
as a collection of nested segmentations.
c. The system defines specific distances based on local contour information and
regional attributes, capturing both edge cues and regional characteristics.
d. Quantitative evaluation demonstrates significant improvements over existing
methods in both boundary extraction and image segmentation.
e. This system excels in both tasks, offering a more comprehensive solution for
accurate image segmentation.
ix. Combining contour and region for closed boundary extraction of a shape
a. This research investigated how humans extract closed boundaries of shapes in
noisy images using global processing strategies.
b. It focused on the contributions of both contour-based (edge detection) and region-
based (color) processing, and their interaction.
c. Performance was significantly better when fixating inside the shape compared to
outside.
d. With internal fixation, combining contour and color information led to better
boundary extraction than relying on just one cue.
e. The study proposes a model inspired by biological vision to mimic human
boundary extraction. This model calculates the "shortest path" in a specific image
representation, similar to the retina-to-visual cortex mapping.
f. The model considers four factors when calculating path costs: distance, turning
angle, color similarity, and color contrast.
g. When tested on similar conditions as the human experiment, the model's
performance closely matched human performance.
x. Salient Closed Boundary Extraction with Ratio Contour
a. This study presents "ratio contour," a technique for identifying
important closed boundaries in noisy images.
b. It operates on edge fragments detected in the image and identifies a subset to
connect into a closed boundary with the highest saliency.
c. The saliency measure considers Gestalt principles like proximity and continuity,
penalizing large gaps and sharp turns when connecting fragments.
d. This method aims to overcome the bias towards shorter boundaries present in other
approaches.
e. An efficient algorithm finds the most salient closed boundary in polynomial time.
f. Preprocessing steps are included to facilitate the application of ratio contour to
real-world images.
g. The study contrasts ratio contour with two related techniques: the spectral
analysis method by Williams and Thornber and the shortest-path approach by
Elder and Zucker.
h. This comparison uses both simulated and real images for practical evaluation in
addition to theoretical analysis.
xi. Comparison of Gaussian and Median Filters to Remove Noise in Images
a. Images often contain noise and unwanted artifacts that hinder analysis.
b. Pre-processing techniques are crucial to improve image clarity before further
processing.
c. This study compares Gaussian and median filters for noise reduction in images.
d. The research concludes that the Gaussian filter achieves superior results in terms
of image clarity.
e. Pre-processing is essential for accurate analysis and better interpretation of image
data.
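To make the two filters compared in this entry concrete, the sketch below applies a 3x3 Gaussian kernel and a 3x3 median filter to a toy image using NumPy. It only illustrates how each filter operates and does not reproduce the cited study's comparison; the kernel weights, the replicate padding and the single-impulse test image are assumptions chosen for the example.

```python
import numpy as np

def filter3x3(image, reducer):
    """Apply reducer to every 3x3 neighbourhood (borders use replicate padding)."""
    padded = np.pad(image, 1, mode="edge")
    # windows has shape (H, W, 3, 3): one 3x3 patch per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return reducer(windows)

# A common 3x3 approximation of a Gaussian kernel; weights sum to 1.
GAUSS = np.array([[1, 2, 1],
                  [2, 4, 2],
                  [1, 2, 1]], dtype=float) / 16.0

def gaussian3x3(image):
    # Weighted average of each neighbourhood.
    return filter3x3(image, lambda w: (w * GAUSS).sum(axis=(-2, -1)))

def median3x3(image):
    # Middle value of each neighbourhood.
    return filter3x3(image, lambda w: np.median(w, axis=(-2, -1)))

# Flat grey image with a single bright noise pixel in the centre.
img = np.full((5, 5), 100.0)
img[2, 2] = 255.0

smoothed = gaussian3x3(img)
cleaned = median3x3(img)
print(smoothed[2, 2], cleaned[2, 2])
```

The Gaussian filter averages the outlier into its neighbours, while the median filter replaces it with the neighbourhood's middle value; which behaviour gives the clearer image depends on the kind of noise present, so a single toy image like this one is not evidence for either filter in general.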
xii. Extracting Salient Curves from Images: An Analysis of the Saliency Network
a. The Saliency Network excels at extracting prominent curves, favoring long,
smooth curves and handling noise and gaps.
b. However, the analysis reveals limitations: the most salient element may not align
with the perceptually most prominent curve, the saliency measure can be sensitive
to scaling, and it may prefer large gaps over smaller ones.
c. Time complexity analysis shows quadratic steps for serial implementations and
linear steps for parallel implementations, both relative to network size.
d. Coarse sampling of possible curve orientations could introduce issues.
e. While the network efficiently identifies the most salient curve, it struggles to
identify other prominent curves within the image.
xiii. Geometric models for active contours
a. This research proposes a geometric approach for active contours in 2D and 3D
boundary detection and motion tracking.
b. Active contours evolve based on intrinsic geometric image features, allowing them
to naturally split and merge for multi-object detection and both interior and
exterior boundary identification.
c. The method connects active contours to minimal distance curves in a Riemannian
space derived from the image, improving upon previous geometric active contour
models.
d. This approach enhances stability in boundary detection even when gradients have
significant variations or gaps.
e. Numerical experiments show the effectiveness of the proposed method.
xiv. Review on Techniques for Plant Leaf Classification and Recognition

a. Plant leaf morphology plays a crucial role in plant classification and recognition.
b. Classifiers such as ANNs, PNNs, CNNs, KNN, and SVMs are popular
tools for plant leaf classification.
c. Preprocessing techniques and feature extraction significantly impact the
performance of plant leaf classification systems.
d. This study compares the accuracy achieved by these different techniques
in plant classification tasks.
e. High-quality leaf image databases are essential for training and validating machine
learning algorithms used for leaf recognition.
xv. Leaf shape identification based plant biometrics
a. This study proposes an easy-to-use and effective technique for identifying plant
species from leaf photos, particularly for broad, flat leaves.
b. The method involves acquiring leaf images, user-guided leaf base and reference
point selection, image segmentation, horizontal alignment, and morphological
feature extraction.
c. Unique features are extracted by slicing the leaf along specific axes and
normalizing the measurements.
d. These features are fed into a probabilistic neural network trained with a dataset of
1200 leaves from around 30 plant species.
e. The system achieves an average recognition accuracy of 81.41% through ten-fold
cross-validation.
xvi. Machine learning techniques for ontology-based leaf classification
a. Using machine learning for automation, this research proposes an integrated
strategy for an ontology-based leaf categorization system.
b. In accordance with botanical taxonomy, leaf contour classification uses a scaled
CCD code system to classify fundamental form and margin type. A trained neural
network recognizes detailed tooth patterns on the leaf.
c. Unlobed leaf measurements are conducted automatically following botanical
methods.
d. For precise vein extraction, leaf vein recognition uses a neural network and
thresholding technique.
e. This method integrates low-level image features with botanical domain knowledge
(ontology), enhancing user comprehension and system practicality.
f. Initial experiments demonstrate promising results and validate the proposed
system's feasibility.
xvii. A similarity-based leaf image retrieval scheme: Joining shape and venation
features
a. This study proposes a novel method for retrieving images of leaves based on
shape and venation features.
b. To represent shape similarity between leaf images, a matrix of interest points is
created.
c. An adaptive grid-based matching method using Nearest Neighbor search narrows the
search space and efficiently determines the minimal weight from the matrix as the
similarity score.
d. Venation similarity is modeled using an adjacency matrix built from vein
intersection and end points.
e. A prototype mobile leaf image retrieval system is implemented and tested on a
database of 1,032 images.
f. Experiments demonstrate significant performance improvement compared to
existing methods.
xviii. Support Vector Machine as a Supervised Learning for the Prioritization of Novel
Potential SARS-CoV-2 Main Protease Inhibitors
a. One machine learning technique that was employed was Support Vector Machine
(SVM) classification.
b. The model analyzed two million commercially available compounds and identified
200 novel chemotypes potentially active against Mpro.
c. These compounds were further evaluated through docking simulations and
compared to known protease inhibitors.
d. The top five compounds were subjected to molecular dynamics simulations to
analyze binding interactions.
e. Notably, the SVM-selected compounds displayed key interactions known for
effective Mpro inhibition.
xix. Plant classification using SVM classifier

a. Plants are essential components of the ecosystem, crucial for maintaining
ecological balance.
b. This research investigates automated plant identification and classification.
c. Color distribution, edge detection, and direction features are used to categorize
plants.
d. Features like color histogram (analyzing color distribution) and edge histogram
(capturing edge directions) are extracted.
e. A database stores these features, and a Support Vector Machine classifies plants
with 78% accuracy.
xx. A Method of Plant Classification Based on Wavelet Transforms and Support
Vector Machines
a. This research focuses on utilizing leaf morphology, the shape and structure of
leaves, for plant classification. Leaf characteristics play a critical role in
distinguishing between different plant species.
b. The study proposes a new approach for classifying plants based on their leaf
images. This method aims to efficiently and accurately categorize plants based on
their leaf features.
c. The proposed method utilizes wavelet transforms, a mathematical tool, to convert
leaf images into the time-frequency domain. This transformation allows for the
extraction of valuable features from the images without the need for additional
preprocessing steps like image enhancement or texture thinning.
d. The extracted features are then used to train a Support Vector Machine (SVM)
classifier. SVMs are powerful machine learning algorithms known for their ability
to efficiently learn complex patterns and achieve high classification accuracy.
e. Experimental results demonstrate that the proposed method using wavelet
transforms and SVMs achieves high recognition accuracy in classifying plant
species from leaf images. Additionally, the method boasts faster processing speed
compared to other potential approaches.
xxi. A computerized plant species recognition system

a. Web-based Plant Identification: This research introduces CPSRS, a web-based
system for identifying plant species.
b. Platform Independence: CPSRS leverages Java Web infrastructure for
compatibility across different platforms.
c. Client-Server Interaction: Java applets and servlets balance computational load
between client and server for efficiency.
xxii. Combined thresholding and neural network approach for vein pattern extraction
from leaf images
a. Leaf Vein Significance: Leaf veins carry crucial information for plant
identification, but extracting them accurately is a complex challenge.
b. Proposed Method: This research introduces a novel approach combining
thresholding and Artificial Neural Networks (ANNs) for leaf vein extraction.
c. Initial Segmentation: Analyzing the leaf image's intensity histogram allows for a
rough separation of potential vein regions from the background.
d. Fine-Grained Classification: A trained ANN with ten input features, extracted from
a localized window around each pixel, precisely classifies individual pixels as vein
or background.
CHAPTER 3 : METHODOLOGY
3.1 Drawbacks of base paper algorithm :

● Sensitivity to hyperparameters (KNN, Random Forest): KNN's performance depends on the
choice of the number of neighbours (k), and an inappropriate k-value leads to suboptimal
predictions. Similarly, Random Forest performance depends on the number of trees and
other hyperparameters, which makes tuning crucial but challenging.
● Lack of suitable image processing techniques: boundary extraction was done with Sobel
filters, which proved ineffective because gaps persisted in the extracted boundary.
● No background subtraction: the base paper does not investigate a background subtraction
function for photos of leaves taken with a mobile camera. Such a function is needed for a
reliable algorithm that can extract features from a picture taken with an arbitrary mobile
phone.
● Difficulty in handling missing data (KNN, Logistic Regression): KNN is sensitive to
missing data, and imputing missing values can introduce bias. Logistic Regression also
requires missing data to be handled appropriately, which can be challenging and may
introduce inaccuracies.
● The F1-score of a classifier is fixed, and in most cases more than one classifier in the pool
predicts the output correctly, since it is very rare that only one of them does. Therefore,
most of the time the classifier with the highest F1-score on the dataset is assigned as the
best classifier. This skews the dataset, and a model trained on the augmented dataset
develops a bias.
3.2 DATASET BACKGROUND AND DESCRIPTION
The dataset used is the Flavia leaves dataset, which also provides the breakpoints and names
for the leaf classes.
All image file names in the dataset consist of four-digit integers followed by the ".jpg"
suffix. The table below lists the plants along with the names of the accompanying image
files. The leftmost column contains the plant classification labels used in our software. The
rightmost column lists classification details from Wikipedia, USDA webpages, and other
sources.
Number of instances in the dataset: 1908
Number of attributes in the dataset: 17
● 7 shape based attributes
● 6 colour based attributes
● 4 texture based attributes
SNo  Name of attribute           Data type  Description
1.   area                        numeric    Area of the plant leaf
2.   perimeter                   numeric    Perimeter of the plant leaf
3.   Physiological_length        numeric    Length of the leaf, a spatial dimension expressed in measurable units
4.   Physiological_width         numeric    Width of the leaf, a spatial dimension expressed in measurable units
5.   Aspect_Ratio                numeric    Ratio of width to height of the bounding rectangle of the object
6.   rectangularity              numeric    Ratio of the bounding rectangle area (w × h) to the leaf area
7.   circularity                 numeric    Similarity of the shape to a circle
8.   mean_r                      numeric    Mean of Red channel
9.   mean_g                      numeric    Mean of Green channel
10.  mean_b                      numeric    Mean of Blue channel
11.  stddev_r                    numeric    Standard deviation of Red channel
12.  stddev_g                    numeric    Standard deviation of Green channel
13.  stddev_b                    numeric    Standard deviation of Blue channel
14.  Contrast                    numeric    Texture-based feature
15.  Correlation                 numeric    Texture-based feature
16.  inverse_difference_moments  numeric    Texture-based feature
17.  entropy                     numeric    Texture-based feature
3.3 METHODOLOGICAL IDEA
1. The data from the Flavia dataset was pre-processed using the procedures listed below.
• Conversion of the image from RGB to grayscale.
• Smoothing of the image with a Gaussian filter.
• Adaptive image thresholding using Otsu's thresholding method.
• Closing of holes using morphological transformation.
• Extraction of boundaries by contour analysis.
2. The pre-processed image was used to extract a variety of leaf attributes, which are
described below:
a. Colour-based features: means and standard deviations of the R, G and B channels.
b. Shape-based features: area, perimeter, physiological length, physiological width,
aspect ratio, rectangularity and circularity.
c. Texture-based features: contrast, correlation, inverse difference moments and
entropy.
3. Model formation and testing
a. The model employed to classify the plant species was the Support Vector
Machine classifier.
b. StandardScaler was used to scale the features.
c. GridSearchCV was used to perform parameter tuning in order to determine
suitable hyperparameters for the model.
3.4 IMPLEMENTATION
1. Conversion of RGB to grayscale image
a. Converting an image from another colour space, such as RGB, CMYK or HSV, to
shades of gray is known as grayscaling. The gray levels range between total
black and total white.
b. We used the cv2.cvtColor() function for this conversion. It helps in
dimensionality reduction, as RGB images have three colour channels whereas
grayscale images have a single channel.
c. Furthermore, many algorithms are designed to work on grayscale images, so
the conversion was also needed for the later steps.
2. Smoothing image using Gaussian filter of size (25,25).
a. The image was smoothed using a Gaussian filter of size (25,25). It is similar to
an average filter, but it weights pixels near the centre of the kernel more
heavily, so the result is blurred more naturally than with other filters.
b. It uses a Gaussian kernel of size M x N, where both M and N are odd integers;
we used (25,25).
c. The equation of a Gaussian filter (in one direction) is
$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$
The blur is applied using the function cv2.GaussianBlur, whose first argument is
the image to blur and whose second argument is the kernel size tuple, in this
case (25,25).
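To make the Gaussian equation concrete, the sketch below samples it into a 1-D kernel and applies it separably (rows, then columns) with NumPy. It illustrates the idea behind cv2.GaussianBlur and is not the thesis code; the function names and the value sigma = 4.1 are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Sample G(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma), then normalise to sum 1."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    x = np.arange(size) - size // 2
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return g / g.sum()

def gaussian_blur(gray, size=25, sigma=4.1):
    """Blur a 2-D image by convolving rows and then columns with the 1-D kernel."""
    k = gaussian_kernel_1d(size, sigma)
    padded = np.pad(gray.astype(np.float64), size // 2, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# A single bright pixel is spread over its neighbourhood by the blur.
impulse = np.zeros((40, 40))
impulse[20, 20] = 255.0
blurred = gaussian_blur(impulse)
print(blurred.shape, round(blurred.max(), 3))
```

A larger kernel or sigma spreads each pixel further, which removes more fine texture before thresholding but also softens the leaf boundary.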
3. Adaptive image thresholding using Otsu's threshold method
a. The technique exhaustively searches for the threshold that minimizes the
within-class variance, which is calculated as the weighted sum of the variances
of the two classes.
b. It thus automatically calculates the optimal threshold value, which minimizes
the intra-class variance of the two pixel classes (foreground and background)
or, equivalently, maximizes their inter-class variance. The algorithm goes
through all available thresholds, evaluates the criterion for each one, and
returns the threshold with the minimum criterion value.
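The exhaustive search described above can be written directly in NumPy. This is a sketch for illustration, assuming an 8-bit grayscale input; the function name `otsu_threshold` is hypothetical and not part of the thesis code.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t (pixels < t vs. pixels >= t) with minimal within-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, np.inf
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()      # class weights
        if w0 == 0 or w1 == 0:
            continue                                 # one class is empty
        mu0 = (levels[:t] * prob[:t]).sum() / w0     # class means
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var0 = (((levels[:t] - mu0) ** 2) * prob[:t]).sum() / w0
        var1 = (((levels[t:] - mu1) ** 2) * prob[t:]).sum() / w1
        within = w0 * var0 + w1 * var1               # weighted sum of class variances
        if within < best_var:
            best_t, best_var = t, within
    return best_t

# Two-level test image: half the pixels are dark (50), half are bright (200).
gray = np.concatenate([np.full(500, 50), np.full(500, 200)]).astype(np.uint8).reshape(20, 50)
t = otsu_threshold(gray)
mask = gray >= t
print(t, int(mask.sum()))
```

In OpenCV the same search is available by passing the cv2.THRESH_OTSU flag to cv2.threshold.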
4. Closing of holes using Morphological Transformation

a. This step closes any holes present in the leaf region. Morphological
transformations reshape the objects in an image by probing it with structuring
elements. These structuring elements are shape primitives designed to represent
a specific property of the data or noise.
b. Morphological transformations are performed on the data by applying various
algebraic combinations of operations with these structuring elements.
Morphological image processing methods are typically applied to binary images.
5. Boundary extraction
a. Using Sobel Filter
i. The boundary of the leaf was first extracted using Sobel filters. The image
after edge extraction was thresholded using Otsu's method, and the gaps
were then closed using the closing operation of morphological
transformation.
ii. This method is not effective, as gaps persist even after the morphological
transformation.
# im_bw_sobel is the Otsu-thresholded Sobel edge image
kernel_edge = np.ones((15,15),np.uint8)
closing_edge = cv2.morphologyEx(im_bw_sobel, cv2.MORPH_CLOSE, kernel_edge)
plt.imshow(closing_edge,cmap='Greys_r')
b. Using Contours.
i. Leaf boundaries are extracted using contours. The boundary pixels are
sharp and continuous, with no gaps between them.
ii. The code used is as follows. cv2.findContours returns three values in
OpenCV 3.x and two values in OpenCV 4.x, so the last two values are taken
to work with both.
contours, hierarchy = cv2.findContours(closing, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]
6. Calculation of shape-based features.

a. Calculation of moments
i. Image moments help in determining several properties such as the
object's area and centre of mass. The function cv2.moments() returns all
calculated moment values as a dictionary.
b. The contour area was also calculated using the function cv2.contourArea(), along
with the arc length, also known as the perimeter, using cv2.arcLength(). In
perimeter = cv2.arcLength(cnt,True), the second argument specifies whether the
shape is a closed contour (if True is passed) or just a curve.
c. Fitting the best-fit rectangle and ellipse
i. The best-fit rectangle is chosen rather than the ellipse, as the ellipse leaves
out some portion at the extreme ends of the leaf image.
ii. The function cv2.fitEllipse allows us to fit an ellipse to an object. It
returns the rotated rectangle in which the ellipse is inscribed. The rotated
rectangle with the least area enclosing the object serves as the bounding
rectangle.
[Figure: best-fit ellipse and best-fit rectangle]
d. Calculation of other shape-based features such as aspect ratio, rectangularity
and circularity.
i. Aspect ratio: the ratio of width to height of the bounding rectangle of the
object.
$$\text{Aspect Ratio} = \frac{\text{Width}}{\text{Height}}$$
AS= 1.6183035714285714
ii. Rectangularity: the ratio of the area of the bounding rectangle (width w,
height h) to the area of the leaf.
$$\text{Rectangularity} = \frac{w \times h}{\text{area}}$$
Rectangularity= 1.390620057917527
iii. Circularity indicates how similar the shape is to a circle. It is measured
here as the ratio of the squared perimeter to the area, as shown in the
equation below; a perfect circle gives the smallest possible value, 4π ≈ 12.57.
$$\text{Circularity} = \frac{\text{perimeter}^2}{\text{area}}$$
Circ= 16.116570088620524
7. Calculation of colour-based features.
a. Mean: the mean of each of the three RGB channels was calculated by dividing the
sum of the pixel values in that channel by the total number of pixels.
b. Standard deviation: the standard deviation of each channel was also calculated.
Red_mean= 46.304093229166668
Green_mean= 99.320648958333337
Blue_mean= 27.545344791666668
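A minimal sketch of the colour features, assuming the image is an H x W x 3 array in RGB order (an image loaded with OpenCV would first need converting from BGR). The function name `colour_features` is hypothetical.

```python
import numpy as np

def colour_features(rgb):
    """Return per-channel means and standard deviations of an H x W x 3 image."""
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    means = pixels.mean(axis=0)   # sum of channel values / number of pixels
    stds = pixels.std(axis=0)     # population standard deviation per channel
    return means, stds

# Tiny 2 x 2 example: red varies, green is constant, blue is zero.
rgb = np.array([[[0, 100, 0], [0, 100, 0]],
                [[10, 100, 0], [10, 100, 0]]], dtype=np.uint8)
means, stds = colour_features(rgb)
print(means, stds)
```

Whether the statistics are taken over the whole image or only over the leaf pixels changes the values, so the same choice should be used for every image in the dataset.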
8. Calculation of texture-based features.

a. Haralick moments were used to calculate texture-based features such as contrast,
correlation, inverse difference moment and entropy.
b. Haralick texture features are calculated from the normalized gray-level
co-occurrence matrix (GLCM), which measures the relationship between pairs of
pixel values in grayscale images.
c. The values printed below are, in order, the contrast, the correlation, the
inverse difference moment and the entropy.
i. print(ht_mean[1]) #contrast = 127.751947589
ii. print(ht_mean[2]) #correlation= 0.983134320165
iii. print(ht_mean[4]) #inverse difference moments= 0.717340327458
iv. print(ht_mean[8]) #entropy= 5.63677579242
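To show what these features measure, the sketch below builds a normalized, symmetric GLCM for horizontally adjacent pixels and derives contrast, inverse difference moment and entropy from it. It is a simplified NumPy illustration with hypothetical function names, not the library routine that produced `ht_mean`; it uses a single direction and a base-2 logarithm as assumptions.

```python
import numpy as np

def glcm_horizontal(gray, levels):
    """Normalized, symmetric co-occurrence matrix of horizontally adjacent pixels."""
    gray = gray.astype(np.intp)
    glcm = np.zeros((levels, levels), dtype=np.float64)
    left, right = gray[:, :-1].ravel(), gray[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)   # count each (left, right) pair
    glcm += glcm.T                      # count pairs in both directions
    return glcm / glcm.sum()

def texture_features(p):
    """Contrast, inverse difference moment and entropy of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    idm = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return contrast, idm, entropy

# An alternating pattern has high contrast; a flat image has none.
striped = np.array([[0, 1, 0, 1]])
flat = np.zeros((4, 4), dtype=np.uint8)
print(texture_features(glcm_horizontal(striped, 2)))
print(texture_features(glcm_horizontal(flat, 2)))
```

Smooth regions concentrate the GLCM on its diagonal, giving low contrast and a high inverse difference moment, while rough textures do the opposite.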
9. Import the dataset and libraries. numpy, pandas, seaborn, matplotlib are used in this
implementation. “df” variable is a pandas dataframe containing the dataset.
10. Clean the dataset by removing all the rows containing missing values.
11. Creation of target labels: breakpoints are used alongside the image file names to
create a vector of target labels. The breakpoints are specified on the Flavia leaves dataset
website.
a. Each consecutive pair of breakpoints gives the first and last image number of one
class.
breakpoints = [1001,1059,1060,1122,1552,1616,1123,1194,1195,1267,1268,1323,1324,1385,
               1386,1437,1497,1551,1438,1496,2001,2050,2051,2113,2114,2165,2166,2230,
               2231,2290,2291,2346,2347,2423,2424,2485,2486,2546,2547,2612,2616,2675,
               3001,3055,3056,3110,3111,3175,3176,3229,3230,3281,3282,3334,3335,3389,
               3390,3446,3447,3510,3511,3563,3566,3621]
target_list = []
for file in img_files:
    # File names look like "1001.jpg"; the integer part identifies the image
    target_num = int(file.split(".")[0])
    target = None  # stays None if the number falls outside every range
    for i in range(0, len(breakpoints), 2):
        if breakpoints[i] <= target_num <= breakpoints[i + 1]:
            target = i // 2  # index of the matching (start, end) pair
            break
    target_list.append(target)
12. By this stage the dataset was preprocessed and contained attributes such as area,
perimeter, physiological_length, physiological_width, aspect_ratio etc.
13. A train-test split of the data was then performed, which helps to evaluate the ability of
machine learning models to generalize to new, unseen samples. It also helps to detect
overfitting, where the model performs well on the training data but not on new data.
Using the validation process, we recalibrate the model to achieve better performance on
unseen data.
14. Feature scaling was then performed using StandardScaler to normalize the data, so that
features measured on very different scales contribute comparably to the model.
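The split and scaling steps can be sketched with scikit-learn as below. The random feature matrix, the four placeholder classes, the 85/15 ratio and the random seed are assumptions for the example; the point shown is that the scaler is fitted on the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data: 200 samples with 17 features and 4 balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 17))
y = np.repeat(np.arange(4), 50)

# Stratified split keeps the class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# Fit the scaler on training data only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the full dataset would let statistics of the test samples influence training, which makes the test score optimistic.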
15. A model is now trained on the augmented/preprocessed dataset to predict the best
classifier for each row, keeping the best-classifier column as the dependent variable (Y
variable/target variable).
a. The “Leaf_Detection” column is kept as the dependent variable (Y variable/target
variable) and assigned to a variable named “augmentedDF_Y”. The rest of the
columns except “defects” are kept as independent variables (X variables) and
assigned to a variable named “augmentedDF_X”.
b. The augmented dataset (“augmentedDF_X”, “augmentedDF_Y”) is split into
training and testing sets using stratified random sampling, with 85% for training
and 15% for testing.
c. Feature scaling is done using Robust Scaling. The “ADF_X_train_scaled” and
“ADF_X_test_scaled” variables hold the scaled independent train and test
variables respectively, i.e. scaled(“ADF_X_train”) and scaled(“ADF_X_test”).
d. The ML models chosen for training are Support Vector Machine, Random Forest,
KNN and Decision Tree.
e. The results are given in the comparison table in Section 3.6.
f. As evident from the results, SVM outperforms the rest of the models significantly;
hence, SVM is chosen for the classification of plants through plant leaves on the
basis of their features extracted using digital image processing techniques. The
trained Support Vector Machine is assigned to a variable named
“leaf_prediction_model”.
16. Finally, the proposed plant leaf identifier model is evaluated on the original cleaned,
pre-processed test dataset (“X_test_scaled”, “Y_test”).
a. Initialize an empty list named “predictions”. This list will contain the
prediction for each row of the dataset.
b. Iterate over the length of the original cleaned, pre-processed test dataset
(“X_test_scaled”).
i. The “instance” variable is assigned the independent variables of the ith row,
i.e. X_test_scaled[i].
ii. The “true_value” variable is assigned the actual target value of the ith row,
i.e. Y_test[i].
iii. Predict the best classifier for the ith row using “plant_leaf_identifier” and
assign the prediction to a variable named “leaf_prediction”.
iv. Predict the target value for the ith row using “leaf_prediction” and assign
the target value prediction to a variable named “prediction”.
v. Append “prediction” to the “predictions” list.
c. Calculate the accuracy, precision, F1-score and recall metrics from “predictions” and
“Y_test” using the accuracy_score, precision_score, f1_score and recall_score methods
from sklearn respectively.
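The metric calculation in step c can be sketched as follows. The label vectors are made up for the example, and macro averaging is an assumption: for a multi-class problem scikit-learn requires an explicit `average` argument, and the report does not state which one was used.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true labels and predictions for a three-class problem.
y_true = [0, 1, 2, 2]
predictions = [0, 1, 2, 1]

accuracy = accuracy_score(y_true, predictions)
# Macro averaging: compute the metric per class, then take the unweighted mean.
precision = precision_score(y_true, predictions, average="macro")
recall = recall_score(y_true, predictions, average="macro")
f1 = f1_score(y_true, predictions, average="macro")

print(accuracy, precision, recall, f1)
```

Macro averaging treats every species equally regardless of how many test images it has, whereas `average="weighted"` would weight each class by its support.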
17. Parameter tuning of the model was then performed with GridSearchCV over the
following parameter grid:
parameters = [{'kernel': ['rbf'], 'gamma': [1e-4, 1e-3, 0.01, 0.1, 0.2, 0.5], 'C': [1, 10, 100, 1000]},
{'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
18. The mean cross-validation scores were stored in “mean_test_score”, while their
standard deviations were stored in “std_test_score”.
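A minimal sketch of this tuning step is given below, using the parameter grid from step 17. The Iris dataset stands in for the leaf features and 5-fold cross-validation is an assumption; neither is taken from the report.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; in a real run the scaler would be fitted on the training split only.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

parameters = [
    {"kernel": ["rbf"], "gamma": [1e-4, 1e-3, 0.01, 0.1, 0.2, 0.5], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# Every parameter combination is scored by cross-validation.
search = GridSearchCV(SVC(), parameters, cv=5)
search.fit(X, y)

means = search.cv_results_["mean_test_score"]  # mean CV score per combination
stds = search.cv_results_["std_test_score"]    # spread of the score across folds
print(search.best_params_, round(search.best_score_, 3))
```

The grid contains 6 x 4 RBF combinations and 4 linear ones, so 28 candidate models are compared in total.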
The accuracy, precision, recall and F1-score were then calculated again for the tuned model.
19. Dimensionality reduction using PCA was then applied to further improve performance.
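The idea behind PCA can be sketched in a few lines of NumPy through the singular value decomposition of the centred feature matrix. This is an illustration with a hypothetical function name and random stand-in data, not the routine used in the report; the choice of five components is an assumption.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its first n_components principal directions."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)   # variance ratio per component
    return X_centred @ Vt[:n_components].T, explained[:n_components]

# Stand-in for the scaled feature matrix: 50 samples with 17 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 17))
Z, explained = pca_reduce(X, 5)
print(Z.shape, explained.round(3))
```

The explained variance ratios indicate how much information the retained components keep, which is the usual basis for choosing their number.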
3.5. CODE SNIPPETS
Fig 3.5.1 Converting image to grayscale
Fig 3.5.2 : Smoothing image using Gaussian filter of size (25,25)
Fig 3.5.3 : Creation of Target Labels

Fig 3.5.4 : Performing parameter tuning of the model
Fig 3.5.4 & 3.5.6 : Performing parameter tuning of the model

Fig 3.5.7: Calculation of deviations
Fig 3.5.7 : Dimensionality Reduction using PCA

Fig 3.5.8 : Finding the correct leaf contour from the list of contour
Fig 3.5.9 : Applying Masking Operation(s) on the image
Fig 3.5.10 : calculation of texture based features using Haralick Moment

3.6 RESULTS AND DISCUSSION
1. We evaluated the proposed plant leaf identification model on four metrics, namely
accuracy, F1-score, precision and recall.
The results are as follows:
Accuracy: 0.91
Precision: 0.90
Recall: 0.89
F1-score: 0.415
2. For comparison, we also evaluated traditional and ensemble machine learning models on
the same metrics.
Model                        Accuracy  Precision  Recall  F1-score
Support Vector Machine       0.91      0.901      0.893   0.415
k-nearest neighbors (KNN)    0.86      0.645      0.559   0.246
Random Forest                0.81      0.833      0.024   0.046
Decision Tree                0.807     0.625      0.012   0.023
3. It can be observed that SVM performs better than all the other machine learning models
in the above table, giving better accuracy, precision, recall and F1-score.
The algorithm is successful in the classification of plants through plant leaves on the basis of
their features extracted using digital image processing techniques.
3.7 IMPLEMENTATION OF METHODS FOR DEVELOPING A
BACKGROUND SUBTRACTION ABILITY TO REMOVE BACKGROUND
FROM PHOTOS TAKEN WITH A MOBILE CAMERA THAT INCLUDE
LEAVES
The process starts with reading the image from the path stored in “test_img_path”.
1. The image was resized to (1600,1200), as all the images in the Flavia dataset are of size
(1600,1200).
resized_image = cv2.resize(img, (1600, 1200))
plt.imshow(resized_image,cmap="Greys_r")
2. The image was converted to grayscale so that the subsequent methods could be applied to it.
3. The image was smoothed using a Gaussian filter of size (55,55), again using the
cv2.GaussianBlur function.
4. Dynamic image thresholding using Otsu's method was done by exhaustively searching
for the threshold that minimizes the within-class variance, defined as the weighted sum of
the variances of the two classes. Holes in the thresholded image (“im_bw_otsu”) were then
closed using a morphological closing with a (50,50) kernel:
kernel = np.ones((50,50),np.uint8)
closing = cv2.morphologyEx(im_bw_otsu, cv2.MORPH_CLOSE, kernel)
plt.imshow(closing,cmap="Greys_r")
5. Finding the correct leaf contour from the list of contours

The following function finds the correct leaf contour by taking a coordinate point of the
leaf (by default, the centre point of the image) and checking whether each contour contains
that point. It returns the index of the correct contour.
def find_contour(cnts):
    # Test point: centre of the resized image, given as (x, y)
    y_ri, x_ri, _ = resized_image.shape
    centre = (x_ri / 2.0, y_ri / 2.0)
    for index, cc in enumerate(cnts):
        # pointPolygonTest returns +1 inside, 0 on the edge and -1 outside the contour
        if cv2.pointPolygonTest(cc, centre, False) > 0:
            return index
    raise ValueError("no contour contains the centre point")
6. A mask image for background subtraction was created using the leaf contour.
[Figure: mask image, before and after]
7. Performing masking operation on the original image

The masking operation was done using a function created by us, which also fills any gaps
present in the image.
8. Finally, the background-subtracted image was obtained.

CONCLUSION
1. The system was built on a collection of leaf features that have shown promise on leaf
datasets. An accuracy of more than 90% was attained. Because of the accuracy of the
identification, this programme is helpful to both experts and non-experts interested in
learning about plants.
2. We began by preprocessing the images using various methods and then extracted the
shape-based, colour-based and texture-based features. This was followed by model
building and testing, where SVM, KNN and other models were used to classify the plant
species.
3. After that, the features were scaled using StandardScaler, and GridSearchCV was used
to tune the model's parameters and determine suitable hyperparameters.
FUTURE WORK TO BE DONE
1. Adoption of multiclass SVM to improve accuracy and the range of plant species covered.
2. Scope of deep learning algorithms to improve results up to 99 percent in
leaf detection.
3. Scope of making a dynamic classifier selection (DCS) model to achieve better
performance than a normal Multi-Layer Perceptron (MLP).
4. SVM's efficiency in high-dimensional spaces, ability to handle non-linearity,
robustness against overfitting, and effectiveness in handling imbalanced data
make it a favorable choice for placement prediction, especially when dealing
with complex and diverse datasets commonly encountered in educational and
career domains. However, the choice of the most suitable algorithm
ultimately depends on the specific characteristics of the dataset and the
particular goals of the prediction task.
5. More digital imaging techniques could be explored to obtain higher-resolution
pictures, leading to more efficient output.
REFERENCES
[1] J. X. Du, et al. "Computer-aided plant species identification (CAPSI) based on leaf shape
matching technique". Transaction of the Institute of Measurement and Control. Vol 28, 2006.
pp275-284.
[2] X. F. Wang, et al. "Recognition of Leaf Images Based on Shape Features Using a
Hypersphere Classifier". International Conference on Intelligent Computing, 2005. pp87-96.
[3] J.-X. Du, X.-F. Wang, and G.-J. Zhang, "Leaf shape based plant species recognition," Applied
Mathematics and Computation, vol. 185, 2007.
[4] P. Panchal, V. C. Raman and S. Mantri, "Plant Diseases Detection and Classification using
Machine Learning Models", (CSITSS).
[5] R. G. de Luna, E. P. Dadios and A. Bandala, "Automated Image Capturing System for Deep
Learning-based Tomato Plant Leaf Disease Detection and Recognition", IEEE Region 10
Conference.
[6] S. Kumar, K. Prasad, A. Srilekha, T. Suman, B. P. Rao and J. N. Vamshi Krishna, "Leaf
Disease Detection and Classification based on Machine Learning", (ICST CEE).
[7] J. W. Tan, S. Chang, S. Binti, Abdul Kareem, H. J. Yap and K. Yong, Deep learning for plant
species classification using leaf vein morpho metric, pp. 1-1, 2018.
[8] I. Motoyoshi, S. Nishida, L. Sharan, and E. H. Adelson, “Image statistics and the perception
of surface qualities,” Nature, vol. 447, May 2007.
[9] B. C. Heymans, J. P. Onema, and J. O. Kuti, “A neural network for opuntia leaf-form
recognition,” in Proceedings of IEEE International Joint Conference on Neural Networks, 1991.
[10] F. Gouveia, V. Filipe, M. Reis, C. Couto, and J. Bulas-Cruz, “Biometry: the characterisation
of chestnut-tree leaves using computer vision,” in Proceedings of IEEE International Symposium
on Industrial Electronics, Guimarães, Portugal, 1997.
[11] R. Janani and A. Gopal, "Identification of selected medicinal plant leaves using image
features and ann", 2013 International Conference on Advanced Electronic Systems (ICAES), pp.
238-242, Sept 2013.
[12] João Camargo Neto, George E. Meyer, David D. Jones and Ashok K. Samal, "Plant species
identification using Elliptic Fourier leaf shape analysis", Computers and electronics in
agriculture, vol. 50, no. 2, pp. 121-134, 2006.
[13] Shanwen Zhang, Yingke Lei, Tianbao Dong and Xiao-Ping Zhang, "Label propagation based
supervised locality projection analysis for plant leaf classification", Pattern Recognition, vol. 46,
no. 7, pp. 1891-1897, 2013.
[14] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang and Yihong Gong,
"Locality-constrained linear coding for image classification", Computer Vision and Pattern
Recognition (CVPR) 2010 IEEE Conference, pp. 3360-3367, 2010.
[15] Sushil R. Kamlapurkar, "Detection Plant Leaf Disease Using Image Processing Approach",
International Journal of Scientific and Research Publications, vol. 6, no. 2, pp. 73-76, February
2016.
[16] Jyoti and Prince Kumar, "A Brief Review on Plant Disease Detection Using Image
Processing Techniques", International Journal of Computer Science and Engineering, vol. 7, no.
9, pp. 112-114, September 2019.
[17] Rima Herlina S. Siburian, Rahmi Karolina, Phong Thanh Nguyen, E. Laxmi Lydia and K.
Shankar, "Leaf Disease Classification using Advanced SVM Algorithm", International Journal of
Engineering and Advanced Technology, vol. 8, no. 6S, pp. 712-718, August 2019.
[18] Khairnar, Khushal, and Rahul Dagade. "Disease Detection and Diagnosis on Plant using
Image Processing–A Review." International Journal of Computer Applications 108, no. 13
(2014): 36-38.
[19] Monalisa Saha, Sasikala, "Identification of plant disease using machine learning
algorithm", International Journal of Advanced Science and Technology, vol. 29, no. 9s, (2020).
[20] Syafiqah Ishak, Mohd Hafiz Fazalul Rahiman, "Leaf disease classification using
artificial neural network", 77:17 (2015) 109–114, www.jurnalteknologi.utm.my, eISSN 2180–3722.
[21] P. N. Belhumeur, D. Chen, S. Feiner et al., “Searching the world’s herbaria: a system for
visual identification of plant species,” in Proceedings of the European Conference on Computer
Vision, pp. 116–129, Springer, Marseille, France, October 2008.
[22] M. Sulc and J. Matas, “Texture-based leaf identification,” in Proceedings of the European
Conference on Computer Vision, pp. 185–200, Springer, Zurich, Switzerland, September 2014.
[23] J. Chaki, R. Parekh, and S. Bhattacharya, “Plant leaf recognition using texture and shape
features with neural classifiers,” Pattern Recognition Letters, vol. 58, pp. 61–68, 2015.
[24] S. Arivazhagan, S. Newlin, S. Ananthi and V Vishnu, "Detection of unhealthy region of
plant leaves and classification of plant leaf diseases using texture features", Agric Eng. Int CIGR,
vol. 15, pp. 211-217, 2013.
[25] V. Singh and A. Misra, "Detection of unhealthy region of plant leaves using Image Processing
and Genetic Algorithm", in Computer Engineering and Applications (ICACEA), 2015.
