Autonomous Bot Using Machine Learning and Computer Vision: SN Computer Science July 2021
ORIGINAL RESEARCH
Abstract
Self-driving vehicles have the potential to revolutionize urban mobility by providing sustainable, safe, and convenient
transportation. In recent years, several companies have identified automation as a major area of research and are investing substantial financial resources in automating vehicles. Autonomous vehicles are now close to being capable of transporting us to our destinations without the aid of a driver. The current focus is on making vehicles more automated to provide a better driving experience. These vehicles are designed to drive with little or no human assistance by sensing their environment, which is achieved by combining sensors with data processing based on computer vision technology and machine learning.
Vehicle automation needs to be carried out with care, keeping in mind the challenges that can be faced during the process. Recognizing traffic signals, understanding signs, and identifying lane markings are some of the basic functions that the vehicle needs to perform. After gathering all this information, the next task is to follow the predefined protocols without any fault.
without any fault. This problem can be solved stepwise using some functions from image processing and computer vision
technology such as Haar transform, perspective mapping, perspective transformation, canny edge detection, and histogram
equalization. This solution is further enhanced by including machine learning, which improves performance with experience,
making it more reliable. It should be noted that, although the vehicles promoted by companies claim roughly 80% reliability, we are not yet ready to adopt the idea of automated vehicles completely. This paper therefore focuses on the shortcomings of the current approach and on making it reliable enough to pave the way for immediate implementation. In this paper the authors have used a microcontroller and a microprocessor: an Arduino Uno serves as the microcontroller and a Raspberry Pi B+ as the microprocessor. To detect the lanes, the authors have used image processing with the OpenCV library. To detect the traffic signs, the authors have used a supervised machine learning technique: images are captured with a Raspberry Pi camera (version 2), and cascade training is used to classify the positive images against the negative images.
exploring the concepts and effects of some basic functionalities in machine learning, computer vision, and other such fields. The final vision is to ideally accomplish all the necessary tasks that a basic self-driving vehicle needs to perform. Ideally here refers to a methodology that is uncomplicated to understand, easy to modify, and open to improvisation. The report details how computer vision technology, together with some image processing functions and machine learning, helps to study the environment of the vehicle and enables the vehicle to find a path and travel through it in the prescribed way.

Literature Survey

The process of automating vehicles is carried out by various methods like sensor fusion, computer vision, path planning, actuation, deep learning, and localization [1]. Computer vision deals with the process of making computers acquire a high level of understanding from digital images [2]. Sensor fusion deals with the process of combining sensors and analyzing the obtained sensory data as a combined result of two or more sensors, which yields a better understanding of the environment under observation [3]. Deep learning can be seen as a wider family of machine learning that includes various types of data representation, unlike task-specific algorithms [4, 5]. Path planning is a primitive step that identifies the path through which the vehicle is allowed to pass; efficient path planning can be done by plotting the shortest path between two points [6]. An actuator helps in moving or controlling the vehicle [7]. Navigation is the vehicle's capability to determine its position within its frame of reference and to plan the most effective path towards the destination. To navigate in its environment, the vehicle requires a representation of its surroundings, i.e. a map of the environment, and the capability to interpret that representation. Edge detection comprises a set of mathematical equations that identify the points within a digital image where there is a rapid change, i.e. where the image brightness has discontinuities [8]. Since the usual color of the road is black, which is the least-intensity color, and that of the lane markings is either white or yellow, both of which fall in the region of high-intensity colors, it is easy to differentiate the two regions, making the task of identifying the region of interest easier. The points where the image brightness changes sharply are grouped together and stored as a set of curved line segments [9, 10]. These line segments compose the edges of the region of interest [11]. Edge detection is a fundamental tool in image processing, machine learning, and computer vision, after which further functions are performed on the image [12].

The process of detection is done in three simple steps, since all traffic signals are designed in a common way so as to be easily understood by everyone. Since all three signal colors (red, yellow, and green) have high contrast ratios, this feature itself is used to separate the traffic signal from the rest of the objects in the image frame [13]. This separates the region of interest from the rest of the surroundings. Then, to identify the colors individually, their RGB pixel values are considered and the colors are classified [14]. For more precise performance, this process is carried out in six steps: taking an input frame from the video, color filtering, edge detection, contour detection, detecting the bounding rectangles of the contours, and saving candidate images for recognition. The data is then sent to the processor, which performs data set exploration, and the required action is taken [15].
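A minimal Python/OpenCV sketch of this six-step candidate-extraction pipeline is given below. The HSV color bounds, Canny thresholds, and minimum-area filter are illustrative assumptions, not values from the surveyed work.

```python
import cv2
import numpy as np

# Hypothetical HSV bounds for a red signal lamp (tune per camera/lighting).
LOWER_RED = np.array([0, 120, 120])
UPPER_RED = np.array([10, 255, 255])

def extract_candidates(frame):
    """Steps 1-6: input frame, color filter, edges, contours, boxes, save."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)              # 1. input frame
    mask = cv2.inRange(hsv, LOWER_RED, UPPER_RED)             # 2. color filtering
    edges = cv2.Canny(mask, 50, 150)                          # 3. edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)   # 4. contour detection
    candidates = []
    for i, cnt in enumerate(contours):
        x, y, w, h = cv2.boundingRect(cnt)                    # 5. bounding rectangle
        if w * h > 100:                                       # ignore tiny regions
            cv2.imwrite(f"candidate_{i}.png", frame[y:y+h, x:x+w])  # 6. save
            candidates.append((x, y, w, h))
    return candidates
```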
The process of detection is followed by the process of recognition, which involves recognizing the various regions of interest on which the functions need to be performed. The first step in recognition is data set exploration. The data set used for training is GTSRB: approximately 1000 images are taken for each class, from different perspectives and in different sizes. Twenty percent of the training data set is held out for the validation process. The data set size is then increased artificially by a method called augmentation: random images are chosen from the existing images and random rotations and translations are performed on them [16]. The transformed set of pixels is then added to the original set of pixels. A sketch of this step follows.
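The following function applies one random rotation and translation to a training image, as the augmentation step above describes; the parameter ranges are assumptions for illustration.

```python
import cv2
import numpy as np

def random_augment(img, max_shift=5, max_angle=15):
    """Return a randomly rotated and translated copy of a training image."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)          # rotation in degrees
    tx, ty = np.random.uniform(-max_shift, max_shift, size=2) # pixel shifts
    # Rotate about the image center, then add a small translation.
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[:, 2] += (tx, ty)
    return cv2.warpAffine(img, m, (w, h))
```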
The next step is training and model performance. Stochastic gradient descent is used as the optimizer; other optimizers could also be used to increase performance, since the work does not focus only on the optimizer. Another important performance factor is batch-size tuning, because small batch sizes result in slow convergence whereas large batch sizes cause memory problems; moderate batch sizes are therefore usually preferred. To conclude, the paper includes two main phases: detection and recognition [17]. First, the sign is detected from the real-time video stream using a CNN model. The detected sign is classified with an accuracy of 97.42%. However, when the video obtained by the RC car is streamed online, the accuracy instantly decreases to 87.36%. The reason behind this rapid fall is the low sensitivity of the color filtering method to lighting and other objects. Based on these results, the classic image processing methods are eliminated and recurrent neural networks are used for both the detection and recognition phases; the result then covers each object in the whole picture [18]. In this way, the decrease in performance can be counteracted. The theory of neural networks, autonomous vehicles, and the process by which a prototype with a camera as its only input can be used to design, test, and evaluate algorithm capabilities are covered in [19, 20]. The ANN is an efficient algorithm that helps in recognizing
We have used a Raspberry Pi as our microprocessor. To perform image processing on the objects we have used an open platform called OpenCV; there are many other platforms for image processing, but we have used OpenCV as it is an open-source platform. We have used the Pi camera (version 2) to capture video, because we perform processing at an image resolution of 480 × 360, which the Pi camera fully supports while being cheaper than other cameras. After capturing the required frame, we first convert the image to grayscale; before that, we have to change the image from the Raspberry Pi's default BGR format to RGB format. To detect the lanes, we then apply a perspective warp to the image.

To apply the perspective warp, we first create a region of interest around the working region. A perspective transform is then taken over that region to obtain a bird's-eye view of the image. A fresh frame of the same image is taken and the Canny edge detection algorithm is applied to it. Let f(x, y) denote the input image and G(x, y) the Gaussian function. By convolving G and f we form a smoothed image, denoted f_s. This is followed by computing the gradient magnitude and direction at every point, which estimate the edge strength and orientation and are together called the edge gradient. The equations of the process involved in Canny edge detection are given in Eqs. (1)–(7):

G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}},  (1)

f_s = G(x, y) \otimes f(x, y),  (2)

G_x = \frac{\partial f_s}{\partial x},  (3)

G_y = \frac{\partial f_s}{\partial y},  (4)

\text{Edge Gradient } (G) = \sqrt{G_x^2 + G_y^2},  (5)

\text{Angle } (\theta) = \tan^{-1}\!\left(\frac{G_y}{G_x}\right),  (6)

\text{Edge Gradient} = |G_x| + |G_y|.  (7)

Thresholding of the image is then done, and the result is added to the Canny edge-detected output. Thresholding is done to extract or enhance the image: one way to extract an object from an image is to separate the object and the background by using a threshold. Any point (x, y) at which f(x, y) > T is called an object point; otherwise it is called a background point. Equation (8) gives the mathematical form of the process:

t(x, y) = \begin{cases} 1, & \text{if } f(x, y) > 0.5 \\ 0, & \text{if } f(x, y) \le 0.5. \end{cases}  (8)

As our image is a grayscale image, we set T = 0.5. The warped image is now added to this frame and the lanes can be detected accurately. The actual positions of the lanes are found by dividing the region of interest equally and finding the maximum intensity level of each element. The array is then divided into two parts to detect the left and right parts of the lane; the array elements having the maximum intensity correspond to the lane positions. After finding the positions of the lanes, their midpoint is taken. Taking the center of the camera frame as a reference, the bot has to adjust its position: if the value of the distance is negative, the bot has to move left. The magnitude of the turn depends on the distance from the lane center.
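A compact Python/OpenCV sketch of this lane-detection pipeline, assuming the 480 × 360 resolution used above, is shown below; the region-of-interest corner points, Canny thresholds, and the 8-bit binary threshold of 127 (the analogue of T = 0.5) are illustrative assumptions rather than the authors' exact parameters.

```python
import cv2
import numpy as np

FRAME_W, FRAME_H = 480, 360  # processing resolution used in the paper

# Hypothetical region-of-interest corners and their bird's-eye destinations.
SRC = np.float32([[100, 220], [380, 220], [0, 340], [480, 340]])
DST = np.float32([[0, 0], [480, 0], [0, 360], [480, 360]])
M = cv2.getPerspectiveTransform(SRC, DST)

def lane_offset(frame):
    """Signed distance of the detected lane center from the frame center."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    warped = cv2.warpPerspective(gray, M, (FRAME_W, FRAME_H))  # bird's-eye view
    edges = cv2.Canny(warped, 50, 150)                         # Eqs. (1)-(7)
    _, thresh = cv2.threshold(warped, 127, 255, cv2.THRESH_BINARY)  # Eq. (8)
    combined = cv2.bitwise_or(edges, thresh)   # add threshold and edge outputs
    # Column-wise intensity sums over the lower half of the image.
    hist = np.sum(combined[FRAME_H // 2:, :], axis=0)
    mid = FRAME_W // 2
    left_lane = int(np.argmax(hist[:mid]))           # left-lane column
    right_lane = mid + int(np.argmax(hist[mid:]))    # right-lane column
    lane_center = (left_lane + right_lane) // 2
    return lane_center - mid  # negative -> steer left, positive -> steer right
```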
Master/Slave Communication

Here, parallel communication is set up between the microcontroller and the Raspberry Pi using the GPIO pins of the Raspberry Pi and four digital pins of the microcontroller. Conditions are applied for different distances between the frame center and the lane center; depending on the condition, the bot is moved left or right towards the frame center (Figs. 2, 3).

Fig. 2 Master–slave communication setup
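The paper does not spell out the pin mapping or the command encoding, so the following Raspberry Pi (master) sketch is an assumption: it drives four GPIO pins as a 4-bit parallel command bus that the Arduino reads on its four digital pins.

```python
import RPi.GPIO as GPIO

# Four GPIO pins wired to four Arduino digital inputs (pin numbers assumed).
COMMAND_PINS = [17, 27, 22, 23]

GPIO.setmode(GPIO.BCM)
for pin in COMMAND_PINS:
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

def send_command(offset):
    """Encode the steering decision as a 4-bit code on the parallel bus."""
    if offset == 0:
        code = 0b0001                              # forward
    elif offset < 0:
        code = 0b0010 if offset > -20 else 0b0011  # slight / hard left
    else:
        code = 0b0100 if offset < 20 else 0b0101   # slight / hard right
    for i, pin in enumerate(COMMAND_PINS):
        GPIO.output(pin, (code >> i) & 1)
```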
Machine Learning

To detect the traffic signals, obstacles, and traffic signs, labelled machine learning is used, i.e. the data set used by the authors is labelled, and these labelled images are compared with the real-time images taken by the camera. To classify the images we need a machine learning model, and to implement one we require a data set sufficient for the model to discriminate between the images. The authors take 400 samples of the object to be detected, called the positive images, and 300 negative images, i.e. regions that do not belong to the object to be detected. Histogram equalization is applied to all the images after converting each RGB image to a grayscale image. If we consider continuous intensity values and let r be the intensities of the image to
be processed, we focus attention on intensity mappings of the form s = T(r). The purpose of using histogram equalization is to distribute the gray values uniformly, by making the probability distribution function of the image intensity uniform. By creating an info file of the images, we store the exact location of the object in each image and also the number of objects in each image. The info file is created using the OpenCV integrated annotation tool. Using these images, a training system is developed to recognize the object. This training method involves cascading, after which an XML file of the learned model is created; this file has to be loaded into the program to apply the remaining operations. After creating the info file, cascade training is done on the images using opencv_traincascade. Given training examples (x1, y1), …, (xn, yn), where yi = 0, 1 for negative and positive examples respectively, the cascade training initializes the weights for each yi. For training examples from 0 to N, it normalizes the weights so that they form a probability distribution. For each feature, a classifier restricted to using a single feature is trained; the errors are evaluated, the classifier with the lowest error is selected, and the weights are updated. Finally, a classifier strong enough to discriminate between the two classes is built, as given by Eqs. (9)–(11), where h_t is a weak classifier and α and β are used for updating the weights; these values are chosen randomly:

h(x) = \begin{cases} 1, & \text{if } \sum_{t=1}^{N} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{N} \alpha_t \\ 0, & \text{otherwise,} \end{cases}  (9)

\text{where } \alpha_t = \log \frac{1}{\beta_t}.  (10)

The best results of the cascade classifier consist of 38 stages.
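At runtime, the XML produced by cascade training can be loaded and applied with OpenCV's cascade detector. A minimal sketch follows; the file name and the detectMultiScale parameters are illustrative assumptions, and the preprocessing mirrors the grayscale conversion and histogram equalization applied to the training set.

```python
import cv2

# XML produced by opencv_traincascade (file name is an assumption).
sign_cascade = cv2.CascadeClassifier("stop_sign_cascade.xml")

def detect_signs(frame):
    """Return bounding boxes (x, y, w, h) of detected signs in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # same preprocessing as the training images
    return sign_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```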
To train the detector with an image size of 240 × 240, a total of 30 min was taken. After detecting the image, the next step is to stop before the sign; for this, the distance from the bot to the detected sign must be known. The authors have used the Haar cascade transformation to implement the model. To find the distance we use a linear equation of the form y = mx + c, where the weight m and the intercept c are found manually. Once the sign is detected and the distance from it is obtained, a threshold is set to stop the bot when that distance is reached. The whole working flow of the system is shown in Fig. 4.
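A small sketch of this manually fitted linear distance model follows; the weight, intercept, and stopping threshold are hypothetical values, since the paper states only that they are found by hand.

```python
# Hypothetical hand-fitted parameters mapping detected sign width (pixels)
# to distance; the paper fits its weight and intercept manually.
M_WEIGHT = -0.8
C_INTERCEPT = 120.0
STOP_DISTANCE = 20.0

def distance_to_sign(box_width_px):
    """Linear estimate y = m*x + c of the distance to a detected sign."""
    return M_WEIGHT * box_width_px + C_INTERCEPT

def should_stop(box_width_px):
    return distance_to_sign(box_width_px) <= STOP_DISTANCE
```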
Results and Discussion

The results obtained in this paper are as follows. The first result is lane detection: in Fig. 5 the lane has been detected and the lane center is at a distance of − 18 from the frame center, which indicates a left turn, while in Fig. 6 the value is zero, which is the condition for the forward direction.

The next part of our paper was to detect the signs at the sides of the roads; we have taken 500 negative images of the stop
Table 1 Evaluation table for different samples/sample sizes. Columns: Video_Samples; Number of positive samples; Number of negative samples; Accuracy; Precision; Recall. The bold numbers specify the highest accuracy for a particular number of positive and negative samples.
References

1. Ballard DH, Brown CM. Computer vision. 1st ed. Prentice Hall; 1982.
15. Hough PVC. Machine analysis of bubble chamber pictures. In: Proc. Int. Conf. High Energy Accelerators and Instrumentation; 1959.

16. Nunes E, Conci A, Sanchez A. Robust background subtraction on traffic videos. In: 2011 18th international conference on systems, signals and image processing (IWSSIP); 2011. pp. 1–4.

17. Lucas BD, Kanade T. An iterative image registration technique with an application to stereo vision. In: IJCAI81; 1981. pp. 674–679.

18. Pang CCC, Lam WWL, Yung NHC. A novel method for resolving vehicle occlusion in a monocular traffic-image sequence. IEEE Trans Intell Transp Syst. 2004;5:129–41.

19. Chiu C, Ku M, Wang C. Automatic traffic surveillance system for vision-based vehicle recognition and tracking. J Inf Sci Eng. 2010;26:611–29.

20. Gordon RL, Tighe W. Traffic control systems handbook. Washington, DC, USA: U.S. Department of Transportation Federal Highway Administration; 2005.

21. Hsieh J-W, Yu S-H, Chen Y-S, Hu W-F. Automatic traffic surveillance system for vehicle tracking and classification. IEEE Trans Intell Transp Syst. 2006;7(2):175–87.

22. Jung Y-K, Ho Y-S. Traffic parameter extraction using video based vehicle tracking. In: 1999 IEEE/IEEJ/JSAI international conference on intelligent transportation systems, proceedings; pp. 764–769.

23. Cheung S-CS, Kamath C. Robust background subtraction with foreground validation for urban traffic video. EURASIP J Appl Signal Process. 2005;2005:2330–40.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.