Fruit Detection in Mango Orchards

Abstract
Optimising crop management practices through accurate, information-based decision making is crucial for precision agriculture. An accurate fruit detection framework, complemented by yield estimation and ripeness analysis, can help growers efficiently target various in-farm operations, including crop quality, efficient use of resources and the labour force, and harvest management. Traditionally, these tasks have relied on manual assessments of a sample of the crop, which tend to be inaccurate and prone to subjective bias.
With advancements in agri-robotics, precise measurements can be obtained for each square foot of cultivable land, down to the level of individual plants. This revolution in data availability, in contrast to the past, can help growers capture detailed representations of their farms and act wisely. This thesis brings robust and accurate data processing techniques to farm operations through data-driven approaches, presenting the state-of-the-art object detection frameworks YOLO v4 and YOLO v4 tiny in the context of fruit detection in mango orchards. The results show a high F1-score of 81.13% and an mAP of 82.38% for YOLO v4. An analysis of fruit ripeness using Gaussian Mixture Models (GMM) is also presented, reflecting on its potential implications for harvest management.
Acknowledgements
I would like to express my heartfelt thanks to the Centre for Machine Vision, part of the Bristol Robotics Laboratory, for accepting my co-creation proposal on agri-technology. To my supervisor, Dr. Mark Hansen: you have been an excellent mentor, and I greatly admire your proficiency in providing guidance and support through all aspects of this work. I have also always appreciated your constructive feedback on my work, writing and approach to research. The skills that I have learnt under your supervision will continue to play a big role well into the future.
Declaration
I hereby declare that the work in this dissertation was carried out in accordance with the
requirements of the University’s Regulations and Code of Practice for Research Degree
Programmes and that it has not been submitted for any other academic award. Except
where indicated by specific reference in the text, the work is the candidate’s own work.
Work done in collaboration with, or with the assistance of, others, is indicated as such.
Any views expressed in the dissertation are those of the author.
Table of Contents

List of Tables

1 Introduction
1.1 Background and Motivation
1.2 Thesis Objectives
1.3 Research Questions
1.4 Principal Contributions
1.5 Scope and Limitations
1.6 Thesis Structure

2 Background Research
2.1 Fruit Detection
2.1.1 Sensing Modalities
2.1.2 Feature Engineering Techniques in Orchards
2.1.3 Classification and Detection
2.2 Feature Learning
2.3 Ripeness Analysis
2.4 Summary

3 Research Methodology
3.1 Research Design
3.2 Research Methods
3.3 Technical Background
3.3.1 The Anatomy of Yolov4 and Yolov4 tiny
3.3.2 Gaussian Mixture Models
3.4 Experimental Setup
3.6 Implementation and Running
3.7 Evaluation Metrics
3.9 Ripeness Methodology

4 Experiments and Results
4.1 RQ1
4.2 RQ2
4.3 Discussion

5 Conclusion
5.1 Summary
5.2 Recommendations for Field Deployment
5.2.1 Sensor Data
5.2.2 Timing Data Acquisition
5.3 Future Work

A Appendix
A.1 CLAHE
A.2 Ripeness Analysis

Bibliography
List of Tables

4.1 Table showing the pixel probabilities before dropping the mask-pixels cluster
4.2 Table showing the updated pixel probabilities

List of Figures

2.1 A view of a typical mango orchard with raw image data collected
5.1 Figure showing the significance of the depth of scene and size of the fruit for it to be well detected
5.2 An example of the multi-view approach in mango orchards, referenced in [61]
5.3 A variation in the detection count observed on the same image captured at different times

Nomenclature

List of Acronyms

2D two dimensional
3D three dimensional
AP average precision
WS watershed segmentation
Chapter 1
Introduction
The aim of this thesis is to develop machine vision and AI frameworks to optimise a variety of in-field and in-farm operations, facilitating decision-making for orchard growers to meet their demand-supply needs and to estimate the time to harvest. This work utilises state-of-the-art image analysis algorithms for fruit detection, to estimate the fruit count, and for ripeness analysis, to plan the harvest time.
crop storage and marketing. Precision agriculture enables better control over the farm, agricultural productivity and environmental implications [26].
Currently, farmers and agronomists manually inspect the field and take key measurements on yield, quality and crop health, which tends to be labour-intensive and inaccurate. Unmanned autonomous or semi-autonomous robots equipped with LIDARs and 2D, 3D and thermal vision camera sensors are ideal for perceiving on-field data by traversing between the orchard rows and performing qualitative and quantitative analysis on fruits. This analysis should help growers make better decisions at various steps in the supply chain. Robust and accurate machine vision algorithms are required to develop such support systems.
However, in unconstrained and unstructured agricultural settings, the challenges in bringing these systems into reality are countless. Heavy leaf occlusion, dynamic illumination, and myriad shapes, colors, textures and sizes mark this area of research as a field with a high degree of uncertainty and complexity. Hence present-day success is still limited, leaving agriculture as an important frontier of applied computer vision [26].
While a myriad of precision agriculture applications are being researched, this thesis focuses specifically on mango fruit detection, yield estimation and ripeness analysis in orchards. It therefore incorporates the algorithms and frameworks that decipher sensor information to address the fruit detection problem in orchards with heavy leaf occlusion.
2. Counting the detected fruits in all the images (from the test set) to provide a rough estimate of the yield.
• RQ1: How can we develop a robust and accurate object detector suited to highly dynamic agricultural settings for fruit detection and yield estimation tasks?
• RQ2: How can we assess the probability of ripeness for the mangos detected in RQ1?
1. Fruit detection and counting is performed merely on raw images and does not associate those images with individual orchard rows and trees, because the orientation of image capture is not recorded. This limits the ability to estimate the yield of an orchard accurately.

1 Accessible at https://github.com/PavanproJack/Fruit-Detection-in-Orchards
3. Ripeness is not labelled in the dataset, so we limit our ripeness assessment methodology to unsupervised learning tasks.
4. Images are not geo-tagged to a tree, so the ripeness analysis methods give a global view of fruit maturity without associating it with fruits on individual trees.
Chapter 2
Background Research
Given the geometry of mango orchards, a ground-based perspective is best suited for fruit detection and counting tasks [3]. Unmanned autonomous or semi-autonomous vehicles equipped with vision and range sensors can capture data across the orchards, enabling growers to plan and organise many high-level agricultural tasks such as quality assessment and yield estimation. Yield is typically estimated by counting the fruits found by detection algorithms running on the raw images captured by the sensors.
Considering the latest developments in the computer vision community, the reviews presented in surveys of methods for detecting fruits on trees, although very comprehensive, are now outdated [20, 25, 26, 45]. This chapter bridges that gap by not just introducing the most recent advancements in fruit detection but also incorporating the state-of-the-art object detection frameworks from the computer vision literature.
Harvesting is a ceaseless cycle, often done 3 to 4 times in a season as the time of fruit maturity is uncertain. Harvesting at the proper stage of maturity is essential for optimum quality and often for the maintenance of this quality after harvest [31]. Of the many factors that can affect the taste quality of a product, ripeness, maturity, cultivar, irrigation and fertilisation are especially important. The aim of this part of the research is to develop a decision-support system that lets a grower make valid decisions on fruit maturity.
Section 2.1 of this chapter reports the relevant and recent literature necessary to address RQ1 of Chapter 1, presenting recent advances on the various feature engineering techniques and sensing modalities that a fruit detection system depends upon, and gives a detailed view of the frameworks and associated algorithms that use these techniques. Section 2.2 presents the state-of-the-art machine learning object detection models that are extendable to orchards. Section 2.3 presents the techniques used in the literature to classify ripe and unripe fruits, which make use of both physical and chemical properties of fruits. We finish in Section 2.4 with an outline of the current advancements that answer the research questions in Section 1.3, highlighting the constraints that in turn motivate the future work required in the field.
2.1 Fruit Detection
Figure 2.1: A view of a typical mango orchard with raw image data collected
based representation of the scene, such as images, or a three dimensional (3D) model, such as point clouds. A combination of these has been used in the literature.
Barnea and Ben-Shahar [5] accurately localised and detected red and green sweet peppers in 3D space by exploiting the depth information from RGB-D cameras. RGB-D cameras are digital cameras equipped with a standard CMOS sensor that capture color information along with the depth of the scene. Shape-based local features are used as they are partially invariant to translation. Bulanon and Burks [10] proposed an interesting technique to detect fruits in orchards by fusing a thermal image and a visible spectrum image, captured by a thermal camera and a digital color camera respectively at different fields of view and spatial resolutions, and showed improved fruit detection results in orange canopies compared with using digital images alone.
Payne and Walsh [46] proposed a new data collection scheme that captures mango orchard images at night to allow for sufficient contrast between foreground and background, and used texture filtering and Hessian filtering to rule out trunks, leaves and stems. By capturing data with a focused narrow light beam on a tree at night, the majority of non-fruit pixels from the sky, other trees, the ground etc. can be eliminated in any given image. This improves the quality of the training data, thus increasing detection accuracy and reducing the training time required.
because raw mangos are hardly distinguishable from the leaves, while ripe mangos appear very close to the color of twigs and branches, as is evident from Figure 2.1. The developments in this context are discussed in detail below by presenting the feature representations commonly used to date.
Color
Color images are commonly represented in the RGB color space. The components R, G, and B are sensitive to illumination, which can affect classification and detection performance: brightness and spectrum are not orthogonal in RGB, making it hard to distinguish between colors [13]. A variety of color transformations have been proposed in the literature to address this problem. Akin and Kirci [1] used a color based method to detect red pomegranate fruit against a background of green leaves. The Hue, Saturation and Value (HSV) color space is widely used for orchard scenes to segment fruits, as Hue alone can represent the color. Annamalai and Lee [2] used color vision alone to estimate the yield of a citrus grove in real time by segmenting out the citrus fruit with a threshold determined by the pixel scatter plot of the collected images in the HSI plane.
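As an illustration of this kind of Hue-based segmentation, the sketch below thresholds an orchard image in HSV space with OpenCV; the file name and hue band are hypothetical assumptions, not values taken from the cited works.

import cv2
import numpy as np

# Hypothetical input image of an orchard scene (BGR, as OpenCV loads it).
img = cv2.imread("orchard.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep pixels whose hue falls in an assumed fruit-colored band.
# OpenCV hue ranges over 0-179; the saturation/value floors reject
# dark shadows and washed-out sky pixels.
lower = np.array([5, 80, 80])
upper = np.array([25, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# Zero out everything except the candidate fruit pixels.
fruit_only = cv2.bitwise_and(img, img, mask=mask)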
In addition to these, Adaptive Histogram Equalisation techniques are used for contrast adjustment using histograms and can be applied to badly illuminated images to enhance the color for detection. CLAHE, a variant of Adaptive Histogram Equalisation, has been used to reduce noise amplification when capturing color retinal images [57]. Further, the ripeness analysis in Section 2.3 details the adoption of this idea in calculating the probability of ripeness for a detected fruit.
Another important color space is L*a*b*, which was developed to more closely model human perception by representing color as lightness from black to white plus two opponent channels, green to red and blue to yellow. The channels of a color space are often normalised to reduce the effect of illumination changes on the raw images. This technique was adopted to identify lemons in [22] and red apples in [60].
For orchards where the fruit is similar in color to the background leaves, trunks and twigs, color alone is not a sufficient factor to discriminate the background from the fruit. The following section discusses the extraction of further features along with color. Features like smoothness, regularity and coarseness often differ between the fruit and the background. Adapting to the changing color of fruit in different growth stages, Stajnko and Cmelik [60] developed a robust and adjustable algorithm to encompass these variations
Transform (RST) in [38, 40, 41, 51]. These combinations of features are preferred over SIFT or SURF for their low computational cost.
In recent literature, Pan and Yang [43] put forward a survey of transfer learning techniques, which learn features of data outside the feature space of the data a model was originally trained on. Knowledge transfer has been shown to increase performance when training data is scarce. YOLO (You Only Look Once), a state-of-the-art deep learning based object detector, builds on such techniques. A Yolov3 model was used to assess flowering levels based on the number of panicles produced by mango trees [28]; trees produce hundreds of panicles, out of which only a few flower. Mahto et al. [36] tracked vehicles using Yolov4 by optimising the anchor box predictions in the architecture.
Although CNNs are an old technique [32], their recent success is due to advancements in GPU computing. They were first tested on the MNIST dataset [33]. CNNs with deep residual learning [23] have improved object detection performance on the PASCAL VOC 2007 dataset [17], with an mAP of 85.6% compared to 29% for established, leading hand-engineered approaches [18].
frequencies [39], as the stiffness of the texture differs between ripe and unripe fruits. This work is based explicitly on the idea that agricultural products exhibit natural frequencies that change from maturation to ripening. The idea seems promising but lacks a concrete argument that it extends to other fruits. Besides this, techniques like artificial olfactory systems that assess ripeness quality by odour have also been explored [9].
2.4 Summary
After a careful study of the literature, we understood that there is limited work reported or published on fruit detection in mango orchards. Most fruit detection systems base their analysis on fruits like apples, sweet peppers, citrus fruits, pomegranates, grape vineyards, strawberries etc., using a variety of techniques to address the problem of occlusion. Very few of them [3, 4] have applied transfer learning techniques to orchard data for detection and yield estimation tasks, and those limited their scope to fruit detection and counts. This thesis addresses these gaps by presenting a research design that couples transfer learning techniques for fruit detection with ripeness assessment of the detected fruits, which enables growers to have more control over their farm operations.
Further, the sensing modalities addressed in Section 2.1.1 inspire the use of a range of vision sensors for data collection [5, 45] and data fusion techniques [10] to improve model performance on orchard data. With time constraints in place, we limit our data collection to the secondary sources of data presented in Section 3.4.1 and look forward to exploring these techniques in the near future.
Chapter 3
Research Methodology
The main purpose of this chapter is to discuss the research methodology followed while conducting this study. This research used quantitative methods and secondary sources of data to address the key research objectives mentioned in Chapter 1. Open-source object detection frameworks available on GitHub are used for experimentation and evaluation.
Orchard image data is often subject to external factors that introduce undesirable variability into the images. Variable illumination across seasons and irregular shadows due to tree geometry are the major sources of this variability, making it challenging for a detection architecture presented in Section 2.1 to engineer generalisable feature representations that cater for all conditions. This chapter documents feature learning based state-of-the-art object detection approaches that are robust to such external factors.
To identify a fruit from the background, the detection algorithms must be invariant to illumination. Data pre-processing techniques are introduced to enhance the contrast and quality of images without any loss of information. Contrast Limited Adaptive Histogram Equalisation (CLAHE) is applied to images to enhance the visibility of cluttered or unclear scenes [67]. CLAHE is reported to have improved the F1-scores of a model used to estimate age from the RSNA Bone Age dataset [66].
Section 3.1 presents the research design followed for this project by introducing the methods chosen for this study. Section 3.2 develops a comprehensive argument to justify the necessity and suitability of the chosen methods over others in the current context. Section 3.3 gives a comprehensive breakdown of these methods at the architectural level. Section 3.4 then details the step-by-step procedure carried out while experimenting on the methods, from data collection to training the models. Following the experimental setup, Section 3.7 briefly introduces the metrics used in the computer vision community to evaluate the performance of the models. Finally, Section 3.8 details the procedure for carrying out inference on the test image set.
Figure 3.1: Comparison of Yolo v4 and other state-of-the-art object detectors, referenced in [7]
Gaussian Mixture Model clustering is a soft-clustering technique that gives the probability of a pixel belonging to each cluster. This notion of belonging probability becomes more concrete in Section 4.2.
Object detection models that are trained and evaluated on the COCO dataset are assumed to generalise to new object detection tasks with new training data, without training from scratch [58]. Yolov4 and Yolov4 tiny are built on Darknet, which makes them remarkably fast. The following sections detail the background of Darknet and give a comprehensive overview of the architectures of Yolov4 and Yolov4 tiny.
Darknet
Darknet is a custom object detection framework written in C and CUDA by Joseph
Redmon that outperforms state-of-the-art object detection results. It offers better com-
putational speed with bot CPU and GPU [52]. Yolov4 and Yolov4 tiny are implemented
using Darknet.
Yolov4 and Yolov4 tiny have similar architectures, with the primary differences in network size. From the config files of Yolov4 and Yolov4 tiny, we can see a remarkable decrease in network size for tiny, as the number of convolutional layers in the backbone (CSPDarknet53) is reduced. Besides, tiny has only two YOLO layers instead of three, and fewer anchor boxes for prediction. Between Yolov4 and Yolov4 tiny, the latter suits tasks requiring faster inference, faster training and low computational power. The comparison of Yolov4 and Yolov4 tiny is illustrated in Figure 3.2.
Yolo is a single-stage detector that does both object localisation and classification simultaneously. Traditional object detectors take an image as input and compress the features using convolutional backbone layers. In a classification problem, these layers sit at the end of the network and perform the prediction. In object detection, however, bounding boxes need to be drawn on images while also classifying fruit and non-fruit pixels; hence the feature layers of the convolutional backbone should be blended and held up in consideration of each other [58]. This combination of the backbone's feature layers happens in the neck. Candidate backbone networks for Yolov4 include:
• CSPDarknet53
• EfficientNet-B3
• CSPResNext50
These networks are pre-trained on ImageNet classification [15]. Yolov4 uses CSPDarknet53, an edited version of DenseNet. The main idea is to create a copy of the feature map from the base layer, pass one copy through the partial dense block while passing the other straight to the partial transition layer (the next block); this achieves improved learning [7].
The neck step combines the features formed in the backbone convolutional layers. Yolov4 uses PANet in the neck stage for feature aggregation. Each node P_i is a feature layer of the Yolov4 backbone network. The figure below shows the PANet architecture.
Head - Detection
Yolov4 adopts the Yolo head from Yolov3, which uses anchor-based detection and three levels of detection granularity [58].
each with its own distribution. Hence, a GMM is a probability distribution that consists of multiple probability distributions. It models the data as generated by multiple n-dimensional Gaussian distributions, each representing a different cluster.
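In standard notation (not the thesis's own), the mixture density for $K$ Gaussian components with mixing weights $\pi_k$ is

\[ p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \]

and the soft assignment of a point $\mathbf{x}$ to cluster $k$, the belonging probability used later, is the posterior responsibility

\[ \gamma_k(\mathbf{x}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}. \]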
Because a fruit passes through many stages of maturity, a soft clustering method that outputs the probability of a fruit being in a particular stage makes more sense for a grower planning operations according to market needs.
3.4 Experimental Setup
Figure 3.8: Image tiles showing different augmentation techniques applied on the dataset
Out of a total of 1,964 mango orchard images, only about 70% contain at least one mango. The performance of a detector is proportional to the number of training examples shown to it during training. Chart 3.9 below shows the distribution of data from the source. With only 1,964 images, we must address the problem of over-fitting to develop a robust detection model. The next section introduces techniques for avoiding this problem.
[Chart 3.9: Distribution of the collected data: 1,399 images with mangos, 565 without mangos.]
Data Augmentation
Introducing diversity into the training examples makes the model more robust on the test dataset. Without actually collecting new data in the field, data augmentation techniques expand the data using label-preserving transformations and improve the model's ability to resist over-fitting. The techniques used and their significance are listed below, followed by a sketch of how they can be reproduced in code:
Flipping to be robust to image orientations.
Rotation to be insensitive to camera roll.
Shear to be insensitive to camera and subject pitch and yaw.
Exposure to be robust towards lighting and camera setting changes.
Noise to be robust to camera artifacts.
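As a sketch, these five transformations could be reproduced with the albumentations library; the parameter values below are illustrative assumptions, not the settings actually used for this dataset.

import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),            # flipping
        A.Rotate(limit=15, p=0.5),          # rotation
        A.Affine(shear=10, p=0.5),          # shear
        A.RandomBrightnessContrast(p=0.5),  # exposure
        A.GaussNoise(p=0.3),                # noise
    ],
    # Keep the Yolo-format bounding boxes consistent with each transform.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
# out = augment(image=image, bboxes=bboxes, class_labels=class_labels)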
Data Splitting
After augmenting the data from 1964 to 4016 images, the data is split in 3099 : 442 :
475 ratio for training, validation and testing tasks. Table 3.1 shows the partitions of the
data after augmentation.
Figure 3.10: Image tiles showing different augmentation techniques applied on the dataset
Each image is converted to the L*a*b* color space and split into three separate channels L*, a* and b*. The OpenCV createCLAHE function, with a clip limit of 3.0 and a tile grid size of 8x8, is then applied to the L* channel. Finally, the three channels are merged and converted back to the original RGB color space. The script carrying out these steps is detailed in Appendix A.1.
Experiments on the models are conducted in the Google Colaboratory (Colab) environment, which runs entirely in the cloud. Colab offers a free GPU, which is essential for training Yolov4 and Yolov4 tiny. Table 3.2 lists the hardware specifications used when running these experiments.
Table 3.2: Table showing the hardware specifications used for experiments

# Hardware Specification
1 GPU 0 name Tesla T4
2 CPU model name Intel(R) Xeon(R) CPU @ 2.00GHz
3 RAM 13 GB
4 Hard disk 35 GB
5 CPU clock 2000.176 MHz
3.6 Implementation and Running
Google Colab offers a free GPU and comes with the OpenCV library pre-installed, so all that remains is to configure the build for GPU and cuDNN. The following commands edit the Makefile accordingly:

!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
Start by enabling the GPU: change the runtime type to Hardware accelerator: GPU. Using Git, clone the darknet repository from GitHub2 to download the source code of the architecture. Then run the !make command to build the executable programs and libraries from source. Then comes the data part: although the data is annotated, the labels are not yet in a format that a Yolo model can digest, so a custom routine has to be written to achieve the conversion (a sketch of such a routine is given after the list below). Here is the format that Yolo accepts:
Labels Formatting
Yolo expects the labels and images to conform to the following standards:
1. There should be one .txt label file for each .jpg or .png image file, in the same directory and with the same name [7].
2. Each object of a class should correspond to a new line in the .txt file in the format: <object-class> <x> <y> <width> <height> [7].
2 https://github.com/AlexeyAB/darknet
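A minimal sketch of such a conversion routine, assuming the source annotations give absolute corner coordinates (the function and argument names are illustrative):

def to_yolo_label(left, top, right, bottom, img_w, img_h, class_id=0):
    # Yolo wants the box centre and size, normalised by the image dimensions.
    x = (left + right) / 2.0 / img_w
    y = (top + bottom) / 2.0 / img_h
    w = (right - left) / img_w
    h = (bottom - top) / img_h
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"

# One such line per object is written to image_name.txt next to image_name.jpg.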
2. subdivisions is the number of pieces a batch is broken into, so that it fits in GPU memory.
# Parameter Value
1 batch 64
2 subdivisions 32
3 width 416
4 height 416
5 max_batches 6000
6 steps 4800, 5400
7 classes 1 (in the three YOLO layers)
8 filters 18 (in the three convolutional layers before the YOLO layers)
Yolo expects the parameters in the configuration file to follow these calculations: width and height must be multiples of 32 (416 is the standard); max_batches = (number of classes) × 2000, but not less than 6000; steps = 80% and 90% of max_batches; and filters = (number of classes + 5) × 3.
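As a quick check of these rules for our single Mangos class:

classes = 1                                   # one class: Mangos
max_batches = max(classes * 2000, 6000)       # -> 6000
steps = (int(0.8 * max_batches),
         int(0.9 * max_batches))              # -> (4800, 5400)
filters = (classes + 5) * 3                   # -> 18

These match the values listed in the table above.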
obj.data file. Furthermore, we can find the Average Loss vs Iterations chart showing how
the model performed while training.
3.7 Evaluation Metrics

IoU measures the overlap between a predicted bounding box B_p and a ground-truth box B_g:

\[ \mathrm{IoU} = \frac{\mathrm{Area}(B_p \cap B_g)}{\mathrm{Area}(B_p \cup B_g)} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
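A minimal sketch of the IoU computation for axis-aligned boxes given as (left, top, right, bottom) tuples:

def iou(a, b):
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)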
Recall. Recall is also called the sensitivity of the model and gives the probability that the model successfully detects objects: of all the ground-truth objects to predict, how many were predicted as true positives. Recall increases as the confidence threshold is lowered.

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]

The sum TP + FN gives the total number of positive samples in the dataset.
Precision-recall curve. If we plot precision on the y-axis and recall on the x-axis for all values of the confidence threshold between 0 and 1, the result is the precision-recall curve. PR-AUC is the area under the precision-recall curve, which summarises the curve in a single number:

\[ \mathrm{PR\text{-}AUC} = \int_0^1 \mathrm{prec}(\mathrm{rec}) \, \mathrm{d}(\mathrm{rec}) \]
F1-Score. Another measure that summarises precision and recall is the F1-score, calculated as their harmonic mean. Through this average, the F1-score takes into account both false positives and false negatives:

\[ F_1 = 2 \cdot \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]
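These three metrics follow directly from the confusion counts; the sketch below uses illustrative counts, not the thesis's exact evaluation output.

def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp)  # share of detections that are correct
    recall = tp / (tp + fn)     # share of ground-truth objects found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only:
p, r, f1 = detection_metrics(tp=1600, fp=500, fn=260)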
Average Precision (AP). AP is calculated by averaging the precision values at each recall value on a precision-recall curve. It is an important measure for calculating the mean Average Precision.

Mean Average Precision (mAP). For calculating mAP, we plot the precision-recall curves at multiple IoU thresholds and take the mean of the Average Precisions (APs) calculated for all the curves.
Log average miss rate. The log average miss rate was used to evaluate an object detection model by Dollár and Wojek [16], who preferred this metric to precision-recall curves in their work. It is obtained by plotting the miss rate against the number of False Positives Per Image (FPPI) on log axes while varying the confidence threshold on the detections. The miss rate is defined as the ratio of false negatives to the actual ground-truth positives.
Average loss vs number of iterations chart. This chart is an indicator of the model's performance during training.
3.9 Ripeness Methodology
3. Run the script convert_gt_yolo.py, which converts the ground-truth labels into the desired format.
4. Copy the result.txt file to the extra folder under the scripts directory.

The python script convert_dr_yolo.py parses result.txt into the desired format. New text files are created in the detection-results directory in <class_name> <left> <top> <right> <bottom> format, where the class name is Mangos and left, top, right, bottom are the absolute coordinates of the bounding boxes. To be more specific, the script run in this step de-normalises the normalised detection results from Yolo (a sketch of this inverse mapping follows below). Finally, run the main.py script, which outputs the mAP measure and creates a folder named output containing the precision-recall curve, total false positives, true positives and a graph of the log average miss rate.
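The de-normalisation mentioned above is just the inverse of the Yolo label encoding; a minimal sketch (names are illustrative, not those of the actual script):

def yolo_to_corners(x, y, w, h, img_w, img_h):
    # Yolo reports a normalised box centre (x, y) and size (w, h);
    # convert back to absolute <left> <top> <right> <bottom> pixels.
    left = int((x - w / 2) * img_w)
    top = int((y - h / 2) * img_h)
    right = int((x + w / 2) * img_w)
    bottom = int((y + h / 2) * img_h)
    return left, top, right, bottom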
Figure 3.13: A flow chart linking the outputs of RQ1 to inputs of RQ2
3.9.2 Clustering
Images after masking are plotted on a scatter plot to see the distribution of pixels in RGB color space. Representations in color spaces like L*a*b*, YCrCb and HSV are reviewed to find the one that best represents the fruits at different levels of ripeness. The images are then clustered using Gaussian Mixture Models.
Then images are clustered using the Gaussian Mixture Models.
To implement the GMM, we use P ython3.0 [63] and Scikit − learn library [47]
which is an open-source machine learning software for python to implement clustering,
regression, classification algorithms etc. For reading images and masking them, we use
OpenCV [8] which is also an open-source library with state-of-the-art computer vision
and machine learning algorithms. For creating dataframes from ripeness data, we use
32
3.9. RIPENESS METHODOLOGY
P andas [44], a fast and powerful data analysis tool for python developers . M atplotlib
[24], a python visualisation tool is used for plotting the scatter plots and color histograms.
For numerical computing with image arrays we use N umpy, an efficient and fast python
library.
The following steps are followed while clustering the masked images:
• Read the images in RGB color space using the imread function of OpenCV.
• Instantiate the GaussianMixture class from the scikit-learn library with the number of clusters.
• Pass the reshaped data to the fit_predict function to estimate the model parameters and predict the label for each data point.
• Create a scatter plot visualising the data with their corresponding labels.
• Now that every data point has a cluster label associated with a probability, create a pandas dataframe grouping all the pixels by their labels. This makes more sense with the output shown in Section 4.2.
• Create a pandas dataframe with all the pixels and their associated cluster labels.
• Generate a new dataframe by grouping all the pixels by their cluster labels. Append the columns Mean R, Mean G and Mean B, which are the mean values of the red, green and blue channels over all pixels of a specific cluster. Also append a probabilities column with the list of probabilities obtained from the predict_proba function.
• Drop the cluster of masked (black) pixels; after this, we will be analysing only the fruit pixels. We use the mean channel values to color the sections of the pie plot drawn later.
• Update the probabilities in the new dataframe and plot a pie chart using the pandas library. This shows the probability of pixels belonging to each colored cluster.

With these pie plots, we leave the decision on ripeness to human perception; automating this with labelled training data is left to future work.
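The steps above can be condensed into a minimal scikit-learn sketch (the file name and cluster count are illustrative):

import cv2
import pandas as pd
from sklearn.mixture import GaussianMixture

img = cv2.cvtColor(cv2.imread("masked_mango.png"), cv2.COLOR_BGR2RGB)
pixels = img.reshape(-1, 3)                    # (N, 3) array of RGB pixels

gmm = GaussianMixture(n_components=6, random_state=0)
labels = gmm.fit_predict(pixels)               # hard cluster label per pixel
probs = gmm.predict_proba(pixels)              # soft belonging probabilities

df = pd.DataFrame(pixels, columns=["R", "G", "B"])
df["Cluster Labels"] = labels
# Mean R, G, B and pixel count per cluster, as in the dataframe described above.
summary = df.groupby("Cluster Labels").mean()
summary["#Pixels"] = df.groupby("Cluster Labels").size()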
Chapter 4

Experiments and Results

This chapter answers the two research questions by presenting the experimental results in two separate sections, 4.1 and 4.2, and concludes with an analysis drawn from the individual sections.
4.1 RQ1
This section evaluates the performance of the two object detectors discussed in Chapter 3. Experiments were run entirely in the cloud in the Google Colab environment; Yolov4 and Yolov4 tiny were each run for 6000 iterations. The partitions of the data after augmentation are Train : Validation : Test = 3099 : 442 : 475. Evaluating these detectors in the light of the metrics mentioned in that chapter, this section also answers RQ1 proposed in Section 1.3.
Observing the chart below, we can infer that, out of the 1,860 ground-truth mangos, Yolov4 got 7% more of its predictions correct than the Yolov4 tiny model.
[Chart: TP, FP and FN counts against the 1,860 ground-truth mangos. Yolo V4: TP 1,611, FP 597, FN 249. Yolo V4 Tiny: TP 1,473, FP 494, FN 387.]
However, on false positives, Yolov4 tiny performed better, with about 7% fewer false detections than Yolov4. In addition, with a high true positive rate, Yolov4 records a relatively low false negative rate. To summarise these quantitative results, the harmonic mean of precision and recall, the F1-score, is calculated; Yolov4 outperforms the tiny model by a small margin of 2.22%.
Figure 4.3: A badly illuminated image before and after CLAHE with 13 Mangos on it.
Figures 4.3, 4.4 and 4.5 show a sample of a badly illuminated image and the models' detections on it before and after the CLAHE transformation. With a ground-truth count of 13 mangos in the image, Yolov4 made 14 detections on the transformed image, up from only 11 before. This is not the case with Yolov4 tiny, which actually underestimated the count after the transformation.
Figure 4.6 reports the percentage improvement in performance of the models after
applying CLAHE on both training and test partitions.
[Figure 4.6: Percentage change in the key metrics after applying CLAHE for both models.]
While a clear improvement in the metrics for Yolov4 can be observed from the percentage changes in Figure 4.6, the same is not true of Yolov4 tiny, where applying histogram equalisation diminished the key metrics by about 2%. Figure 4.7 below breaks these results down in an intuitive way.
[Figure 4.7: TP, FP and FN counts after CLAHE, against the 1,860 ground-truth mangos. Yolo V4: TP 1,603, FP 489, FN 257. Yolo V4 Tiny: TP 1,453, FP 528, FN 407.]
For the tiny model, a slight decrease in the true positive count, accompanied by a 6.8% rise in false positives and a 5.16% rise in false negatives, caused the decrease in precision, recall and therefore F1-score.
These curves are drawn by plotting precision and recall values at multiple confidence thresholds. From Figure 4.8, the area under the curve is larger for Yolov4, making it the better performing model compared with Yolov4 tiny.
Here is the comparison of mean Average Precision for the models before and after the transformation. While v4 tiny reports a very small decline in its precision and recall values, its mAP rises from 72.59% to 73.17%, a gain of 0.58 percentage points. Yolov4 also reports a small increase in mAP of about 0.19 percentage points, from 82.19% to 82.38%.
[Chart: mAP of the models before and after CLAHE; Yolo V4 Tiny rises from 72.59% to 73.17%.]
[Chart: Total detection counts against the 1,860 ground-truth mangos: 2,092 for Yolo V4 and 1,981 for Yolo V4 Tiny.]
the other side of the analysis shows that 28% of this count corresponds to false positive detections, whereas false positives are only 26% of the overall count for Yolov4. This shows that Yolov4 is the better performer in the yield estimation task. We have also calculated the mean detection time for both models on the test dataset; Figure 4.11 portrays these results.
[Figure 4.11: Mean detection time of Yolo V4 and Yolo V4 Tiny on the test dataset.]

Figure 4.12: Comparison chart of Log Average Miss Rate with and without CLAHE
4.2 RQ2
For the analysis of ripeness, we clustered mangos at three different maturity stages: (1) Raw, (2) Half ripe, (3) Fully ripe. A visually green mango is considered to be in the Raw stage; a half-ripe mango has pixel colors ranging from green through orange to yellow; and a fully ripe mango has a large accumulation of red pixels. These decisions are based on inherent human perception and may vary from person to person.
The remainder of this section discusses the analysis of these mango stages. We experimented with our methods on 10 such mangos; for generality, we present the analysis of 3 mangos at different stages. The flow of this section follows the flowchart presented in Section 3.9, assuming the detections of the model are saved to the result_img folder as required in Section 3.9.1.
4.2.1 Masking
This section presents the images that are hand-masked. These masked images are then used for clustering. Figure 4.14 shows the masked fruits from the original detections.
Figure 4.16 illustrates the GMM clustering of a Raw mango on a scatter plot, with cluster centres represented as dark red circles. Table 4.1 shows the pixel belonging probabilities for each of the 6 clusters, where we observed that GMM clustering produces every cell of the Pixel belonging probability column according to the following equation:

\[ \text{Pixel belonging probability} = \frac{\#\text{Pixels}}{\sum_{\text{Cluster } 0}^{\text{Cluster } n} \#\text{Pixels}} \]
Out of the six clusters in Figure 4.16 and Table 4.1, only five belong to fruit pixels; the other is the cluster of masked (black) pixels, which is misleading because we want to analyse only fruit pixels. So we eliminate the cluster of mask pixels (those that are visually black, with Mean R = Mean G = Mean B = 0) and update the associated cluster probabilities following the same formula used earlier. We are thus left with only five clusters instead of six, making it more intuitive for the grower to assess the maturity stage of a fruit. Table 4.2 quantifies these steps.

Table 4.1: Table showing the pixel probabilities before dropping the mask-pixels cluster

Figure 4.16: GMM clusters representation on a scatter plot for a Raw mango
Table 4.2: Table showing the updated pixel probabilities

Cluster Labels #Pixels Mean R Mean G Mean B Outdated Probabilities Updated Probabilities
Cluster 0 504 59 52 40 0.160091 0.268085
Cluster 2 431 86 66 49 0.134001 0.229255
Cluster 3 508 73 60 44 0.164100 0.270213
Cluster 4 361 46 42 35 0.116039 0.192021
Cluster 5 76 30 27 23 0.028318 0.040426
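The updated probabilities in Table 4.2 can be re-derived directly from the pixel counts once the mask cluster is dropped:

# Pixel counts of the five fruit clusters from Table 4.2.
pixels = {"Cluster 0": 504, "Cluster 2": 431, "Cluster 3": 508,
          "Cluster 4": 361, "Cluster 5": 76}
total = sum(pixels.values())  # 1880 fruit pixels remain
updated = {k: n / total for k, n in pixels.items()}
# e.g. updated["Cluster 0"] == 504 / 1880 ≈ 0.268085, matching the table.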
Finally, the visual representation of the dominant clusters before and after dropping the non-fruit pixel cluster is shown in Figure 4.17. The sub-figure on the right shows the distribution of cluster colors and their associated pixel probabilities after dropping the mask pixels; the one on the left shows the distribution before dropping. The right sub-figure is more intuitive: one can read from it that about 23% of the pixels of this fruit are close to ripeness.
Figure 4.17: Pie chart of fruit color and probabilities distribution for a Raw mango
In this section we present the analysis of a Half ripe mango. Figure 4.18 shows the pixel distributions on a scatter plot, and Figure 4.19 shows the probability and color distribution of the pixels.
This section presents the same analysis for a Ripe mango. Figure 4.20 shows the scatter plot and Figure 4.21 shows the pie plot of the probability distribution with updated probabilities.
Figure 4.18: GMM clusters representation on a scatter plot for a Half ripe mango
Figure 4.19: Pie chart of fruit color and probabilities distribution for a Half ripe mango
Figure 4.20: GMM clusters representation on a scatter plot for a Ripe mango
Figure 4.21: Pie chart of fruit color and probabilities distribution for a Ripe mango
4.3 Discussion
4.3.1 Answering RQ1
How can we develop a robust and accurate object detector suited to highly dynamic agricultural settings for fruit detection and yield estimation tasks?
From the experimental results for RQ1 in Section 4.1, we can conclude that Yolov4 is the better performing model for fruit detection in orchards. Yolov4 tiny suffers from higher false positive and false negative counts and may require more training data, which is not ideal for our scope.
While there is no pronounced difference in the performance of Yolov4 with CLAHE, Yolov4 tiny sees a negative effect on its F1-score, precision and recall. With CLAHE, Yolov4 considerably dropped its false positive count, raising precision by 3.66%. Overall, it performed better on F1-score, mAP and area under the precision-recall curve, maintaining a good balance between precision and recall.
When it comes to yield estimation, Yolov4 is again the better performer, with more true positive results; with Yolov4 we can at least be more confident in the predictions. Yolov4 tiny has shorter inference and training times than Yolov4, but in the context of orchard fruit detection, accuracy matters more than detection speed, so we conclude that Yolov4 is the better performer.
Chapter 5
Conclusion
The purpose of this thesis was to develop an object detection, counting and ripeness analysis framework for mango orchards by interpreting raw digital images. This chapter starts with a synopsis of the various chapters of this thesis, followed by recommendations on field deployment of the models drawing upon lessons learned from the literature, and finishes with a discussion of the future implications of this project.
5.1 Summary
Starting with a strong motivation behind the thesis, Chapter 1 defined the aims and objectives, which demand that the two research questions be addressed to achieve them completely. It then discussed the key contributions in light of the inevitable limitations.
Sticking to computer vision based object detection approaches, Chapter 2 presented a coherent review of the literature spanning the choice of sensors, feature engineering and feature learning techniques, and machine learning frameworks for orchards. Various established methodologies for image based fruit detection in orchards, for different fruits and conditions, were studied and reported. Despite the advances in object detection within computer vision, adoption of these techniques for vision based tasks in orchards is still limited. This thesis addresses that gap by exploiting state-of-the-art object detection algorithms and customising them to suit the dynamic conditions in orchards, thus motivating the subsequent chapters.
Chapter 3 devised a research design keeping in mind the key gaps to fill in the literature. Along the way, it proposed the research methods and justified them with a reflection on the literature. Sources of data and collection methods were presented, and augmentation and pre-processing techniques were introduced to address the problems of over-fitting and low contrast respectively. It then detailed every step that revolved around these methods, from setting up the environment and installing the sources to running the models. Following this, relevant metrics for evaluating object detection models were briefly introduced, and the steps to be followed for testing the models were set out.
Chapter 4 searched for the better performing model by presenting a comparative study of the chosen methods, following the procedures and metrics set out in Chapter 3 to evaluate the models. It concluded with a comprehensive discussion of the better performing models to answer the research questions in Section 1.3.
This thesis developed frameworks for fruit detection, counting and ripeness analysis that are crucial for growers to efficiently plan in-farm operations such as yield estimation and choosing the proper time to harvest. The proposed research methods are among the first of their kind in orchard fruit detection tasks.
5.2 Recommendations for Field Deployment
Figure 5.1: Figure showing the significance of the depth of scene and size of the fruit for it to be well detected
approach helps mitigate the false negative count, but comes with the additional complexity of avoiding multi-view re-registration of a fruit (counting the same fruit multiple times).
imaging artefacts such as lens flares and over-exposure occur on bright days [3]. Figure 5.3 depicts this with detections on the same image captured at different elevations of the sun. This is not desirable, as the detection count varies with the time of data acquisition. Hence, techniques that are robust to these changes and allow data to be gathered at any time of day need to be explored.
5.3 Future Work
Figure 5.3: A variation in the detection count observed on the same image captured at different times.
Appendix A
Appendix
A.1 CLAHE
Below is the script for applying CLAHE on an image.
import cv2

def histogram_equalization(img):
    lab = cv2.cvtColor(img, cv2.COLOR_RGB2LAB)
    # Split the LAB image into its channels: L for lightness,
    # a and b for the color opponents green-red and blue-yellow.
    l, a, b = cv2.split(lab)
    # Apply CLAHE to the L channel.
    # clipLimit: threshold for contrast limiting.
    # tileGridSize: the image is divided into tiles of this size
    # for histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    cl = clahe.apply(l)
    # Merge the CLAHE-enhanced L channel with the a and b channels.
    cl_limited = cv2.merge((cl, a, b))
    # Convert the image from the LAB color model back to RGB.
    transformed_image = cv2.cvtColor(cl_limited, cv2.COLOR_LAB2RGB)
    return transformed_image
A.2 Ripeness Analysis
plt.figure(figsize=(16, 8))
ax1 = plt.subplot(121, aspect='equal')
# Pie chart of the cluster colors weighted by their updated probabilities.
image_newdf.plot(kind='pie', y='Updated Probabilities', ax=ax1,
                 autopct='%1.1f%%', colors=colors, startangle=90,
                 shadow=False, labels=image_newdf['Cluster Labels'],
                 legend=True, fontsize=14)
return colors


def GenerateProb(prob):
    # Mean probability of pixel belongingness to each Gaussian cluster.
    meanProb = prob.mean(axis=0)
    return meanProb


def createDataFrame(image_df, probabilities):
    image_newdf = pd.DataFrame()
image_df = pd.DataFrame(mng_img_re)
# Add the Cluster Labels column to the reshaped image dataframe.
image_df['Cluster Labels'] = labels
# Pick the colors of the pixels for the scatter plot.
colors_ = colorPixels(imagePath)

scatterPlot(centers, mng_img_re, colors_)
# Generate the mean probability that a pixel belongs to each cluster.
probability_list = GenerateProb(probs)
# Create a new dataframe with #Pixels per cluster and their mean R, G, B values.
updated_image_df = createDataFrame(image_df, probability_list)
# Plot the n (= clusters) dominant colors in the image.
plotColorHist(updated_image_df)


# Ripe Mango Clustering Analysis

from matplotlib import colors

imagePath = "fullyRipe.png"
clusters = 6

mng_img = cv2.imread(imagePath)
mng_img = cv2.cvtColor(mng_img, cv2.COLOR_BGR2RGB)  # Convert to RGB color space.

plot(mng_img)

# Reshape the 3-channel image into an (N, 3) pixel array for clustering.
mng_img_re = mng_img.reshape((mng_img.shape[0] * mng_img.shape[1], 3))

labels, centers, probs = GMM_Cluster_Prob(mng_img_re, clusters)

image_df = pd.DataFrame(mng_img_re)
# Add the Cluster Labels column to the reshaped image dataframe.
image_df['Cluster Labels'] = labels
# Generate the mean probability that a pixel belongs to each cluster.
probability_list = GenerateProb(probs)
# Create a new dataframe with #Pixels per cluster and their mean R, G, B values.
updated_image_df = createDataFrame(image_df, probability_list)
# Plot the n (= clusters) dominant colors in the image.
plotColorHist(updated_image_df)


# Raw Mango Clustering Analysis

from matplotlib import colors

imagePath = "Raw.png"
clusters = 6

mng_img = cv2.imread(imagePath)
mng_img = cv2.cvtColor(mng_img, cv2.COLOR_BGR2RGB)  # Convert to RGB color space.

# Reshape the 3-channel image into an (N, 3) pixel array for clustering.
mng_img_re = mng_img.reshape((mng_img.shape[0] * mng_img.shape[1], 3))

labels, centers, probs = GMM_Cluster_Prob(mng_img_re, clusters)

image_df = pd.DataFrame(mng_img_re)
# Add the Cluster Labels column to the reshaped image dataframe.
image_df['Cluster Labels'] = labels
# Pick the colors of the pixels for the scatter plot.
colors_ = colorPixels(imagePath)

# Generate the mean probability that a pixel belongs to each cluster.
probability_list = GenerateProb(probs)

# Create a new dataframe with #Pixels per cluster and their mean R, G, B values.
updated_image_df = createDataFrame(image_df, probability_list)
Bibliography
[2] P. Annamalai, W. Lee, and T. Burks, Color vision system for estimating citrus
yield in real-time, in ASAE Annual International Meeting., 2004.
[3] S. Bargoti, Fruit Detection and Tree Segmentation for Yield Mapping in Orchards, PhD thesis, 2017.
[4] S. Bargoti and J. Underwood, Deep fruit detection in orchards, in 2017 IEEE
International Conference on Robotics and Automation (ICRA), IEEE, 2017,
pp. 3626–3633.
[5] E. Barnea and O. Ben-Shahar, Depth based fruit detection from viewer-based
pose, in Proceedings of the AgEng’14 Conference (Zurich, Switzerland). Paper,
volume 137., 2014.
[7] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, Yolov4: Optimal speed and
accuracy of object detection, arXiv preprint arXiv:2004.10934, (2020).
[8] G. Bradski, The OpenCV Library, Dr. Dobb’s Journal of Software Tools, (2000).
[10] D. Bulanon, T. Burks, and V. Alchanatis, Image fusion of visible and thermal
images for fruit detection, Biosystems Engineering, 103 (2009), pp. 12 – 22.
[13] H. Cheng, X. Jiang, Y. Sun, and J. Wang, Color image segmentation: advances
and prospects, Pattern Recognition, 34 (2001), pp. 2259 – 2281.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: A
Large-Scale Hierarchical Image Database, in CVPR09, 2009.
[21] C. Guo, F. Liu, W. Kong, Y. He, B. Lou, et al., Hyperspectral imaging analysis
for ripeness evaluation of strawberry with support vector machine, Journal of
Food Engineering, 179 (2016), pp. 11–18.
[23] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recog-
nition, in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2016, pp. 770–778.
[31] H. Lalel, Z. Singh, S. Tan, and M. Agustí, Maturity stage at harvest affects
fruit ripening, quality and biosynthesis of aroma volatile compounds in ‘kensington
pride’mango, The Journal of Horticultural Science and Biotechnology, 78 (2003),
pp. 225–233.
[32] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and
time series, The handbook of brain theory and neural networks, 3361 (1995),
p. 1995.
[36] P. Mahto, P. Garg, P. Seth, and J. Panda, Refining yolov4 for vehicle detec-
tion, International Journal of Advanced Research in Engineering and Technology
(IJARET), 11 (2020).
[45] A. Payne and K. Walsh, Machine vision in estimation of fruit crop yield, Plant
Image Analysis: Fundamentals and Applications; CRC Press: Boca Raton, FL,
USA, (2014), pp. 329–374.
[46] A. Payne, K. Walsh, P. Subedi, and D. Jarvis, Estimating mango crop yield
using image analysis using fruit at ‘stone hardening’ stage and night time imaging,
Computers and Electronics in Agriculture, 100 (2014), pp. 160 – 167.
[50] Z. S. Pothen and S. Nuske, Texture-based fruit detection via images using the
smooth patterns on the fruit, in 2016 IEEE International Conference on Robotics
and Automation (ICRA), 2016, pp. 5171–5176.
[51] Z. S. Pothen and S. Nuske, Texture-based fruit detection via images using the smooth patterns on the fruit, in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 5171–5176.
[53] M. Regunathan and W. S. Lee, Citrus fruit identification and size determination
using machine vision and ultrasonic sensors, in 2005 ASAE Annual Meeting,
American Society of Agricultural and Biological Engineers, 2005, p. 1.
[56] R. Serraj and P. Pingali, Agriculture Food Systems to 2050: Global Trends,
Challenges and Opportunities, 01 2019.
[61] M. Stein, Improving image based fruitcount estimates using multiple view-points,
2016.
[62] M. Tan, R. Pang, and Q. V. Le, Efficientdet: Scalable and efficient object
detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2020, pp. 10781–10790.
[65] C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and
I.-H. Yeh, Cspnet: A new backbone that can enhance learning capability of cnn,
in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops, 2020, pp. 390–391.
[66] E. Westerberg, AI-based Age Estimation using X-ray Hand Images: A comparison
of Object Detection and Deep Learning models, PhD thesis, 2020.