Final Destination: A Computer Vision Guided Robotic System
Abstract
Computer Vision and Autonomous Robot Control are active fields of research that have
immense possibilities and applications in today’s world. Their successful integration into
an intelligent robotic system has been a challenge that researchers have tackled using
several different paradigms. We present here a wireless mobile robot design and an
intelligent software system to control it. The system is primarily based on Computer
Vision as its sensory input, and is capable of several navigational behaviours. The unique
feature of our design is a multi-threaded approach that facilitates parallel execution of
sensory processing, decision-making and robot control.
Keywords
computer vision, autonomous robot control, mobile robots, navigation, obstacle
avoidance, path planning
1. Introduction
Current approaches towards autonomous robots deal with several different aspects.
DeSouza and Kak [1] provide a good insight into the techniques being used for robot
navigation using Computer Vision. Barnes and Liu [2] deal with more complex aspects
of vision including Embodiment, Perception and important issues arising out of the
physical constraints that robots face. The FINALE system developed at Purdue, described
in Kosaka and Kak [3], shows the importance of Model-based Reasoning in autonomous
robot systems, and that dealing with the inherent Uncertainty in the real world helps
make them more adaptive and robust. A successful implementation of a Stereo-vision
guided robot is described in Kobashi et al [4]. Further information about the state-of-the-
art of industrial robots and current vision research projects can be found in
Automation Magazine [6] and Vision Systems [7], among other sources.
The system described in this paper was developed primarily for two autonomous robotics
events: the Pixel event at Techfest 2008, Indian Institute of Technology Bombay, and
the IMAGE event at Kshitij 2008, Indian Institute of Technology Kharagpur. These
were part of the annual technological festivals of the respective institutes, which
attract thousands of participants from national and international educational and research
institutions. Although built for these events, our system illustrates several problems faced
while developing any generic vision-guided robot, and demonstrates effective solutions.
We begin by presenting the problem statements and specifications issued for the two
events. Then we provide a detailed description of our robot system: Final Destination.
2. Problem Statements
This section outlines the two events that we designed our system for. We present below
the problem statements obtained from the event websites:
Pixel at IIT Bombay [8]: Build a computer controlled navigator that, with the help of
an overhead/on-board camera and a computer, moves 6 balls of three different specified
colours into three distinct coloured holes.
Arena Specifications:
1. The arena consists of a table top of inner dimension 160cm X 120cm, coloured
'blue'. The table will have a boundary of height 8cm of the same colour 'blue'.
2. The holes will be surrounded by a coloured ring of 10mm thickness.
Balls: There will be six balls in all with three different colours namely red, green and
yellow, two balls of each colour. The balls will be placed randomly in the arena. The
balls have an approximate diameter of 5.4 cm and weigh approximately 68 grams.
IMAGE at IIT Kharagpur: Build a computer controlled robot that, with the help of an
overhead camera and a computer, travels from the starting zone to the end point, passing
through all the checkpoints while avoiding the obstacles on the way.
Arena Specifications:
1. The arena will be a 2.5 m x 2.5m wooden surface painted with black blackboard
paint. There would be a buffer area of 20 cm on each side of the arena.
2. The arena would consist of a starting zone, checkpoints, obstacles and an end point.
Obstacles:
1. The obstacles will be WHITE in color and made of wood.
2. The number and dimensions of obstacles will not be disclosed and may change
during different rounds.
Checkpoints: They are basically circular patches of RED color (10 cm diameter)
stuck to the arena floor. There may be 2 - 4 checkpoints in the arena.
End Point: Denoted by a RED square of dimension 30cm x 30cm. This again would
be stuck to the ground. The color would be same as that of the checkpoints.
3. System Overview
The Robot: This is the main operational hardware consisting of the Chassis, Drive
Motors, Wireless Signal Reception Circuit (WSRC) with antenna, On-board Power Source
(rechargeable batteries), and robot body. In this document we have referred to the robot
by the name ‘Nick’ or simply as ‘bot’. Section 4 describes the robot hardware in detail.
Software System: This consists of the different modules for Image Processing,
Planning and Task Execution. We implemented most of the modules in C++ on the
Microsoft® Visual Studio 6.0 platform, with the Intel® OpenCV library for image
processing. For programming the microcontroller we used WINAVR™, which included
Programmers’ Notepad (an IDE to compile C code to a microcontroller program) and
AVR Dude (to deploy/upload instructions to the microcontroller).
Section 6 includes a detailed description of the major software components, with
algorithms and illustrations.
We used a Personal Computer with the following configuration for test runs as well as
the actual performance runs:
Processor : Intel® Pentium 4 (x86 architecture) – 2.4 GHz
RAM : 512MB DDR
Graphics Card : NVIDIA® GeForce2 MX
Operating System : Microsoft® Windows XP – Service Pack 2
We begin our detailed system description with the robot hardware components in the
following section.
4. Robot Hardware
Brushed DC motors have been used for locomotion. Two motors with gearboxes have
been extracted from toy cars for this purpose. Each motor drives one wheel on either side
of Nick, and steering is accomplished using Differential Drive. A single ball-castor has
been fixed near the rear end to act as a third wheel.
Figure 1 and Figure 2 show Nick's chassis, made from Masonite board, along with the
motors, gearboxes and wheels.
Figure 3 shows Nick’s chassis with the WSRC installed and connected to the motors. A
metal structure can be seen in front of Nick, which is the Ball-Grabbing Mechanism
described in the next section.
Initial testing confirmed the validity of the mechanism. The force with which Nick
moved towards a ball was enough to swing the metal loops inward and trap the ball
inside. The problem statement specified that the holes would be present at the same
surface level on which the robot would move. Hence, there was no need for a separate
dropping mechanism – a trapped ball would automatically drop into a hole when brought
directly above it.
This simple mechanism, though very crude, solved a very important issue. A controlled
grabbing and dropping mechanism would require precise hardware, additional
computation within the software system, and most importantly, a separate radio channel
for control signals. As our WSRC was extracted from a toy car, it had only the two
channels necessary for navigational control, and it would have added considerable
complexity to the system if we introduced another WSRC for the grabber.
Figure 4 and Figure 5 show the Ball-Grabbing Mechanism from different view angles,
and Figure 6 shows a ball trapped inside.
To address the first two points, we made the outer covering of the body from non-glossy
black chart paper. This effectively prevented any internal parts from coming into view.
The rectangular roof merged well with the background after applying some simple
contrast and saturation filters to the camera input images.
To help locate Nick, we affixed two circular patches of different colours longitudinally
on the roof. While a single spot of colour is enough for locating, the second spot enables
us to also compute the current direction of the robot. This is unlike certain other designs
in which a single spot is used for locating the robot, and the difference between frames
while the robot is moving is analyzed to determine its direction. Thus, Nick’s current
direction is updated in every processed video frame and does not depend on movement.
The colours used during Pixel were Pink and White, and during IMAGE were Cyan and
Yellow. This was decided based on the colours of balls, obstacles and checkpoints which
varied across the two events.
Figure 7 shows Nick’s body with the two circular Spots for identification.
The Parallel Port is the easiest to program. We can send control signals directly
from the computer for up to 8 different actuators. This is especially helpful for a wired
control system. But ours is a 2-channel wireless system, hence the advantage of 8 parallel
lines would be lost here. Moreover, the signals coming from the computer need to be
amplified and transmitted, which would require an intermediate digital-to-analog
circuit. This also meant foregoing the advantage of direct actuator control. So, the Parallel
Port did not have a special use for us.
On the other hand, Serial Ports and USB Ports necessitate the use of a digital circuit to
decipher the signals sent from a computer. Hence, they also require slightly more
complex programming. USB Ports add even more complexity due to the protocol used
for establishing a connection with the computer. So we decided on the use of a Serial Port
for sending control signals from the computer.
LEDs indicate if a signal is being received from the serial line, and if the microcontroller
is being programmed. There is also a two-line LCD screen that is used to display what
command has been received from the computer and is being transmitted to the robot.
Certain error control measures have been deployed in both the computer and the
microcontroller programs to counter noise in the serial line.
Figure 8 shows the microcontroller circuit board with the above mentioned components,
and Figure 9 shows the entire transmission setup including the microcontroller board and
the RF transmission module.
Figure 8: Microcontroller circuit with LCD display
Figure 9: Entire transmission setup
The unique characteristic of our setup is that the microcontroller is placed at the
transmission end purely for signal interpretation and transmission. The robot itself has no
onboard processing – it receives direct actuator control signals wirelessly.
The other big advantage is the ability to incorporate a level of feedback to synchronize
the computer and the robot. The rate of processing and command generation at the
computer is much faster than the speed at which the robot can complete issued
instructions such as moving and turning – due to physical speed limitations and response
characteristics of motors. The microcontroller in our setup acts as a means to enforce the
required synchronization by sending an acknowledgement to the computer after a
predetermined time interval to allow the robot to complete the given instruction. The
different lengths of time intervals for different commands have been arrived at
experimentally. The computer software does not issue the next command until it receives
an acknowledgement for the previous one.
The LCD display is exceptionally helpful in debugging. As our system uses a wireless
communication medium, it is essential (and challenging) for us to identify the location of
problems when they occur. Finding out whether there is trouble in the serial line from the
computer, or in transmission (e.g. robot going out of range), or on the robot itself would
have been quite an uphill task without the LCD.
Other smaller but still important advantages of our setup include the fact that the
microcontroller circuit, being at the transmission end, does not draw upon the
limited battery power of the robot. This also allows for frequent reprogramming and
hence more extensive testing and improvement of the microcontroller program.
6. Software System
6.1 Overview
The software system deployed on the computer comprises modules for image processing,
planning, obstacle avoidance, and command execution. The system always exists in one
of several predefined states that determine its current behaviour. This state or mode is
used concurrently by the several modules, but is updated by only one module at a time:
the image processing module updates the mode when a transition depends on a visual cue;
the planning module updates it when changing from one planning mode to another; and
the execution module updates it after a particular command has been executed. This
mode is thus a means to transfer control between modules.
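A minimal sketch of how such a shared mode variable can be represented is given below. The mode names follow the sub-sections of Section 6.3; the use of std::atomic for cross-thread visibility is illustrative, not a statement of the exact synchronization primitive used in our implementation.

```cpp
// Shared mode variable used to pass control between modules (sketch).
// Mode names follow the sub-sections of Section 6.3; std::atomic is used here
// only to illustrate safe cross-thread access.
#include <atomic>

enum class Mode {
    BIRTH,              // initialization, calibration, invariant features
    LONG_TERM_PLAN,     // ball-sequence / checkpoint path planning
    MEDIUM_TERM_PLAN,   // task dispatch
    MOVE, GRAB, DROP,   // short term planning / task execution
    FIN                 // all tasks complete
};

std::atomic<Mode> g_mode{Mode::BIRTH};

// Each module polls the mode and acts only when it owns the current state,
// e.g. a planner thread:
//   while (g_mode.load() != Mode::FIN)
//       if (g_mode.load() == Mode::LONG_TERM_PLAN) {
//           /* build the plan ... */
//           g_mode.store(Mode::MEDIUM_TERM_PLAN);   // hand control onwards
//       }
```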
The following sub-sections describe in detail the various modules of the system.
6.2 Image Processing
Colour format conversions are necessary because of the different processing algorithms
and functions that we use. Each camera frame is received in 24-bit RGB (Red-Green-Blue)
colour format. It is converted to a grayscale image for Hough Transform (described next),
and to the HSV (Hue-Saturation-Value) colour space for Saturation-Value based filtering
and Hue-based colour classification. Using only the Hue component for colour
comparison makes it independent of the ambient lighting conditions.
The single most important piece of image processing used in our system is the Hough
Transform for detecting circles. Every frame is processed to extract the location and radii
of circles detected in it using the OpenCV library function cvHoughCircles(). This
information is used by many of the detection algorithms - most importantly, for Robot
Detection.
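The sketch below outlines this per-frame preprocessing using the modern OpenCV C++ API (cv::cvtColor, cv::HoughCircles) rather than the legacy C functions named above; the numeric parameters are placeholders that would need tuning for the actual camera.

```cpp
// Per-frame preprocessing and circle extraction (sketch, modern OpenCV API).
// Frames arrive in 24-bit colour (BGR channel order in OpenCV); all numeric
// parameters are placeholders to be tuned for the actual camera and arena.
#include <opencv2/opencv.hpp>
#include <vector>

struct FrameData {
    cv::Mat hsv;                     // Hue-Saturation-Value copy of the frame
    std::vector<cv::Vec3f> circles;  // (x, y, radius) of each detected circle
};

FrameData processFrame(const cv::Mat& frameBGR)
{
    FrameData out;

    // HSV copy: Hue-based colour classification is largely independent of
    // ambient lighting when only the Hue component is compared.
    cv::cvtColor(frameBGR, out.hsv, cv::COLOR_BGR2HSV);

    // Grayscale, smoothed copy for the circle Hough Transform.
    cv::Mat gray;
    cv::cvtColor(frameBGR, gray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(9, 9), 2.0);

    cv::HoughCircles(gray, out.circles, cv::HOUGH_GRADIENT,
                     /*dp=*/1, /*minDist=*/20,
                     /*param1=*/100, /*param2=*/30,
                     /*minRadius=*/3, /*maxRadius=*/60);
    return out;
}
```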
Key elements of the image processing module are presented below.
6.2.1 Robot Detection
For each frame, after applying the Hough Transform and obtaining a list of circle co-
ordinates and corresponding radii, we execute the following steps to detect the Spots:
1. Initialization:
Assume a global boolean flag DetectedNick, set it to FALSE
Initialize two boolean flags DetectedSpotA and DetectedSpotB to FALSE
2. For each circle detected, iterate Steps 3 to 7 until both DetectedSpotA and
DetectedSpotB are TRUE
3. If the Radius is greater than a predefined maximum threshold for Spots OR
Saturation and Value components of the center pixel are below predefined levels
Then skip to Step 7
4. Average out the Hue components of pixels in a 9x9 area at the center of the circle
5. If DetectedSpotA is FALSE Then:
If the Average Hue is within a predefined range for SpotA Then:
Store the location of SpotA as the center of the current circle
Set DetectedSpotA to TRUE
Skip to Step 7
End If
End If
6. If DetectedSpotB is FALSE Then:
If the Average Hue is within a predefined range for SpotB Then:
Store the location of SpotB as the center of the current circle
Set DetectedSpotB to TRUE
End If
End If
7. Loop to Step 2
8. If both DetectedSpotA and DetectedSpotB are TRUE Then:
Let (x1,y1) be the location of SpotA, and (x2,y2) the location of SpotB
Store SpotA as the location of Nick
Calculate CurrentHeading = Arctangent( (y1 – y2) / (x1 – x2) )
Set DetectedNick to TRUE
End If
Note that Step 3 is required to filter out false positives. And the DetectedNick flag is
required because in some frames both Spots may not be detected. This can be caused by
several reasons such as blurring of the Spots due to robot movement, varying lighting
conditions, shadows etc. Whether Nick was detected by the above steps or not is
important for the decision making code to determine the correct course of action. Hence
DetectedNick has a global scope.
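A compact C++ rendering of these steps is sketched below. The radius, Saturation/Value and Hue thresholds are placeholders, and std::atan2 is used in place of a plain arctangent so that the heading remains unambiguous over the full circle.

```cpp
// C++ rendering of the Spot-detection steps (sketch). Thresholds and hue
// ranges are placeholders; the HSV image and circle list come from the
// preprocessing shown in the previous sketch.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

struct RobotPose { cv::Point2f position; double heading; bool detected; };

const int   MAX_SPOT_RADIUS = 15;                 // placeholder (pixels)
const int   MIN_SAT = 60, MIN_VAL = 60;           // placeholder filter levels
const float HUE_A_LO = 140, HUE_A_HI = 170;       // SpotA hue range (placeholder)
const float HUE_B_LO = 20,  HUE_B_HI = 40;        // SpotB hue range (placeholder)

// Step 4: average the Hue channel over a 9x9 window around the circle centre.
static float averageHue(const cv::Mat& hsv, cv::Point c)
{
    cv::Rect roi(c.x - 4, c.y - 4, 9, 9);
    roi &= cv::Rect(0, 0, hsv.cols, hsv.rows);    // clip to the image
    return static_cast<float>(cv::mean(hsv(roi))[0]);
}

RobotPose detectNick(const cv::Mat& hsv, const std::vector<cv::Vec3f>& circles)
{
    RobotPose pose = { cv::Point2f(), 0.0, false };
    bool haveA = false, haveB = false;
    cv::Point2f a, b;

    for (size_t i = 0; i < circles.size() && !(haveA && haveB); ++i) {
        cv::Point centre(cvRound(circles[i][0]), cvRound(circles[i][1]));
        int radius = cvRound(circles[i][2]);

        cv::Vec3b px = hsv.at<cv::Vec3b>(centre);
        if (radius > MAX_SPOT_RADIUS || px[1] < MIN_SAT || px[2] < MIN_VAL)
            continue;                             // Step 3: reject false positives

        float hue = averageHue(hsv, centre);
        if (!haveA && hue >= HUE_A_LO && hue <= HUE_A_HI) { a = centre; haveA = true; continue; }
        if (!haveB && hue >= HUE_B_LO && hue <= HUE_B_HI) { b = centre; haveB = true; }
    }

    if (haveA && haveB) {                         // Step 8
        pose.position = a;                        // SpotA marks Nick's location
        pose.heading  = std::atan2(a.y - b.y, a.x - b.x);  // atan2 keeps the quadrant
        pose.detected = true;
    }
    return pose;
}
```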
6.2.2 Ball / Checkpoint Detection
For the PIXEL event, we needed to detect Balls, and for IMAGE we had to detect
circular Checkpoints. As the balls appeared like circles to our overhead camera, their
detection was very similar to checkpoints. Hence, we describe here the common method
for detecting both. We shall use the name Artifact to designate ball/checkpoint.
Similar to the Spot detection method, we detect Artifacts using the following Steps:
1. Initialization:
Set NumArtifactsDetected to zero
2. For each circle detected, iterate Steps 3 to 6 while NumArtifactsDetected is less than
MAX_ARTIFACTS (a predefined constant)
3. If the Radius is outside a predefined range for Artifacts OR Saturation and Value
components of the center pixel are below predefined levels Then skip to Step 6
4. Average out the Hue components of pixels in a 9x9 area at the center of the circle
5. If Average Hue is within the range for Artifacts (see note below) Then:
Store the location of the new Artifact as the center of the current circle
Increment NumArtifactsDetected by 1
End If
6. Loop to Step 2
Note: Step 5 by itself is valid for a single type of Artifact (same hue range) only. The
checkpoints in IMAGE were of the same colour (red), hence this code was sufficient. For
detecting balls of three different colours in PIXEL, we had multiple If...Then blocks in
place of Step 5 – one for each colour. This was very similar to detecting Spots of two
different colours in the Spot detection code earlier.
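For reference, the multiple-range test used in place of Step 5 can be sketched as a small classifier over the averaged Hue; the ranges below are placeholders on OpenCV's 0-179 hue scale, with the two-sided test accounting for red wrapping around zero.

```cpp
// Multiple hue-range test replacing Step 5 when three ball colours must be
// distinguished (sketch). Ranges are placeholders on OpenCV's 0-179 hue scale.
enum class BallColour { NONE, RED, GREEN, YELLOW };

BallColour classifyHue(float h)
{
    if (h < 10.0f || h > 170.0f)      return BallColour::RED;     // red wraps around 0/180
    if (h >= 25.0f && h <= 35.0f)     return BallColour::YELLOW;
    if (h >= 45.0f && h <= 75.0f)     return BallColour::GREEN;
    return BallColour::NONE;                                      // not a ball colour
}
```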
6.2.3 Obstacle Detection
We did not detect the obstacles as individual objects (or artifacts) because that was
not necessary. Our approach was to detect all the ‘obstacle pixels’ in the image, and
avoid these pixels while navigating.
This posed a new problem: Each obstacle pixel is a very tiny point in reality, while our
robot is a physical object of much larger dimensions. Simply checking whether the
robot’s location (its center) coincides with an obstacle pixel is not enough. The robot’s
edge would have hit an obstacle much earlier, and besides, the robot’s center would never
really coincide with an obstacle pixel. So what we needed was a way to detect when any
of the robot’s edges would hit any of the obstacle’s edges. Our solution made use of the
Dilation technique.
The first step in detecting obstacles is to store only the white pixels (pixels with very high
Value component and very low Saturation component) in a separate binary image -
Obstacles. An appropriate metric NickRadius is predefined as half the width of our robot
plus a certain clearance (the desired separation between the robot’s and obstacles’ edges).
Now the Obstacles image is processed through a Dilation filter – OpenCV function
cvDilate() – with the degree of dilation equal to NickRadius. This expands each white
pixel in the Obstacles image in all directions by NickRadius pixels.
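The sketch below shows how such an obstacle map can be built with the modern OpenCV C++ API (cv::dilate with an elliptical structuring element) in place of the legacy cvDilate call; the Saturation/Value thresholds and the NickRadius value are placeholders.

```cpp
// Building the dilated obstacle map (sketch, modern OpenCV API). Threshold
// levels and nickRadius are placeholders to be tuned on the actual arena.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat buildObstacleMap(const cv::Mat& hsv, int nickRadius /* half width + clearance, px */)
{
    // Keep only the white pixels: very high Value, very low Saturation.
    std::vector<cv::Mat> ch;
    cv::split(hsv, ch);                               // ch[0]=H, ch[1]=S, ch[2]=V
    cv::Mat obstacles = (ch[2] > 200) & (ch[1] < 40); // placeholder levels

    // Expand every obstacle pixel by nickRadius in all directions, so that a
    // collision test reduces to checking the single pixel under Nick's centre.
    cv::Mat kernel = cv::getStructuringElement(
        cv::MORPH_ELLIPSE, cv::Size(2 * nickRadius + 1, 2 * nickRadius + 1));
    cv::dilate(obstacles, obstacles, kernel);
    return obstacles;                                 // 255 = blocked, 0 = free
}
```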
The channels in the original image (Figure 10) which have been closed up in the dilated
image (Figure 11) reveal an important point: If we keep too much clearance then some
areas may become inaccessible as per the dilated map. Hence the exact value of clearance
was arrived at experimentally after testing with several different obstacle configurations.
6.3 Planning and Task Execution Modes
6.3.1 Birth
This mode, as the name suggests, is in effect when the system is first activated. It is
meant for performing all initialization tasks, including camera initialization and
calibration, memory allocation (esp. for image data), and detection of invariant features
in the field of view (Holes in Pixel; Obstacles and Checkpoints in IMAGE). Once the
required tasks have been completed, the system goes into Long Term Planning mode.
6.3.2 Long Term Planning
For the Pixel event, all the balls on the arena are first identified and their coordinates
stored. Then all possible paths for picking and dropping balls one by one are investigated.
Each path consists of several Waypoints. The first waypoint is the current position of the
bot, followed by waypoints representing the positions of balls and holes. Each waypoint
representing the position of a ball is followed by one for its corresponding hole. When all
the balls (each followed by its corresponding hole) have been included, the total cost of
traversing the path is calculated. This is done by adding up the distances between each
pair of consecutive waypoints, and also the measure of the angles by which the bot must
turn at each waypoint. Note that the distance and angle measures are multiplied by two
different cost coefficients before being added together to give the total path cost. In our
final implementation, we used coefficients that made one angle unit (radian) equal to
about 10 distance units (pixels). These values were based upon experimental observations
from test runs. Among the several possible paths, the one with the least total cost is
chosen. It is worth mentioning here that we used a simple heuristic to reduce the time
required to find the optimum path: While computing the cost, if the value exceeds the
current minimum total cost before all waypoints have been considered, then that path is
abandoned immediately. Once the Long Term Plan (path with minimum cost) has been
selected, the system moves to Medium Term Planning mode.
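The cost computation and the early-abandonment heuristic can be sketched as follows; the 1 radian to roughly 10 pixels weighting follows the text, while the data types and the initial-heading handling are illustrative.

```cpp
// Cost of one candidate waypoint ordering with the early-abandonment
// heuristic (sketch). The 1 radian ~ 10 pixels weighting follows the text;
// the data types and the initial-heading handling are illustrative.
#include <cmath>
#include <vector>

struct Waypoint { double x, y; };

const double PI = 3.14159265358979;
const double ANGLE_COST_COEFF = 10.0;    // pixels of cost per radian of turning

// Returns the total cost, or a value >= bestSoFar if the path was abandoned early.
double pathCost(const std::vector<Waypoint>& path, double initialHeading, double bestSoFar)
{
    double cost = 0.0, prevHeading = initialHeading;
    for (size_t i = 1; i < path.size(); ++i) {
        double dx = path[i].x - path[i - 1].x;
        double dy = path[i].y - path[i - 1].y;
        double heading = std::atan2(dy, dx);

        double turn = std::fabs(heading - prevHeading);
        if (turn > PI) turn = 2.0 * PI - turn;        // smaller of the two turn directions

        cost += std::sqrt(dx * dx + dy * dy)          // distance term (pixels)
              + ANGLE_COST_COEFF * turn;              // turning term (weighted radians)
        if (cost >= bestSoFar) return cost;           // abandon this ordering early
        prevHeading = heading;
    }
    return cost;
}
```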
For the IMAGE event, our primary task is to calculate a path that follows all the
checkpoints and avoids obstacles on the way. This task is performed in the Long Term
Planning mode. Similar to the above steps, a path is generated that contains several
waypoints (called PathNodes in this case to avoid ambiguity, which are actually
Checkpoints selected in a particular order along with detour points inserted to avoid
obstacles). This path is generated using a special Obstacle Avoidance Algorithm, which
is described later in section 6.4. Once the path is ready, the Medium Term Planning mode
is activated.
6.3.3 Medium Term Planning
The following 3 modes correspond to the task types discussed above. Medium Term
Planning dispatches each task by activating the relevant mode, and gets back control after
each task execution is complete. Note that these 3 modes make up the Short Term
Planning/Task Execution module of the system.
6.3.4 Move
This mode executes a simple source to destination journey of the bot (the source and
destination waypoints are obtained from the current task dispatched by the Medium Term
Planning mode). The bot orients itself towards the destination, moves in large steps at
first and then uses smaller steps when the destination is near. After each small movement,
the bot position and orientation information is extracted by the image processing module.
If the difference between the current orientation and the desired orientation is beyond a
certain threshold, then a turn command is executed to re-orient the bot. Otherwise a
forward movement command is executed to go towards the destination. The following
algorithm was implemented to carry out this behaviour:
1. Assumptions:
ActiveTask is the current task dispatched by Medium Term Planning
A boolean flag Complete (initially FALSE) is associated with ActiveTask
Source and Destination waypoint locations are extracted from ActiveTask
Bot location and CurrentHeading are obtained from image processing
2. Let (Dest.x, Dest.y) be the Destination waypoint location
3. Let (Nick.x, Nick.y) be the current location of the bot
4. Calculate Distance = SquareRoot( (Dest.x – Nick.x)^2 + (Dest.y – Nick.y)^2 )
5. If Distance < DIST_THRESHOLD Then:
Set the Complete flag of ActiveTask to TRUE
Set Mode to MEDIUM_TERM_PLAN
Else
Calculate DesiredHeading = Arctangent( (Dest.y – Nick.y) / (Dest.x – Nick.x) )
Calculate Delta = AbsoluteValue( DesiredHeading – CurrentHeading )
If Delta < DELTA_THRESHOLD Then
If Distance < DIST_CLOSE Then
Move Forward SMALL
Else
Move Forward BIG
End If
Else
If Delta < DELTA_CLOSE Then
Turn SMALL
Else
Turn BIG
End If
End If
End If
Note that this piece of code is executed repeatedly as long as the system is in Move state.
The state changes back to Medium Term Planning only when the
distance between the bot and the destination is negligible, i.e. less than
DIST_THRESHOLD (Step 5, If clause). The SMALL and BIG movements are executed
by the robot hardware by varying the duration for which the drive motors are fired. The
exact durations have been arrived at experimentally. Also, the direction in which the bot
should turn can be obtained from some simple geometric calculations, which have not been
explicitly shown above.
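As a side note, the heading error can be computed unambiguously with atan2 and a wrap into (-pi, pi], so that its sign directly gives the turn direction; the sketch below is illustrative, with placeholder threshold names.

```cpp
// Heading error between Nick's current orientation and the bearing to the
// destination (sketch). atan2 plus a wrap into (-pi, pi] keeps the error
// unambiguous; its sign gives the turn direction, its magnitude the turn size.
#include <cmath>

const double PI = 3.14159265358979;

double headingError(double nickX, double nickY, double currentHeading,
                    double destX, double destY)
{
    double desired = std::atan2(destY - nickY, destX - nickX);
    double delta = desired - currentHeading;
    while (delta >  PI) delta -= 2.0 * PI;    // wrap into (-pi, pi]
    while (delta <= -PI) delta += 2.0 * PI;
    return delta;
}

// Usage inside the Move loop (DELTA_THRESHOLD / DELTA_CLOSE are the
// placeholder thresholds from the pseudocode above):
//   double delta = headingError(Nick.x, Nick.y, CurrentHeading, Dest.x, Dest.y);
//   if (std::fabs(delta) < DELTA_THRESHOLD)       move forward (SMALL or BIG);
//   else if (std::fabs(delta) < DELTA_CLOSE)      turn SMALL towards sign(delta);
//   else                                          turn BIG towards sign(delta);
```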
After a Move task is complete, Medium Term Planning dispatches another task (if
available). Note that this Move mode is used for the IMAGE event only, in which there
are no other types of tasks. Hence, the Medium Term Planner activates the Move mode
again with a new destination waypoint (the previous destination becomes source).
6.3.5 Grab
The Grab mode is very similar to the Move mode. The main difference in its algorithm is
that the condition to check if the bot should move forward in BIG or SMALL steps (If
Distance < DIST_CLOSE …) is not required. The bot always needs to move in BIG
steps because even when it is close to a ball, it requires a certain momentum to push the
ball into its trap. Another difference is that Distance is calculated between the ball and
the center of the trap, instead of the center of the bot (location of the bot). This results in
more accurate tracking of whether the ball has entered the trap or not.
6.3.6 Drop
This mode is also very similar to the Move mode; in fact, it is more or less identical. The
condition to check if the bot should move forward in BIG or SMALL steps (If Distance <
DIST_CLOSE …) is present as is. This is because the bot needs to slow down and move
accurately while dropping the ball into the hole. If it is too fast in its approach, it might
cross the hole without dropping the ball, or worse, get its wheel(s) stuck in the hole. The
Drop mode also requires that the currently active ball be tracked constantly. We exit this
mode on a special condition that the bot is present in the correct position directly above
the hole and the ball ‘disappears’. We infer the ball to have disappeared when it has been
visible for at least N frames and is no longer visible for the next N frames (N is typically
10). Tests have shown that this condition is effective in correctly judging whether the ball
has been dropped or not.
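This visible-then-invisible test can be sketched as a small per-frame counter, shown below with N = 10 as in the text; the class and function names are illustrative.

```cpp
// 'Ball disappeared' test used to leave Drop mode (sketch): the ball must be
// seen for at least N consecutive frames and then not seen for the next N.
class BallDisappearanceDetector {
public:
    explicit BallDisappearanceDetector(int n = 10) : N(n) {}   // N = 10 as in the text

    // Call once per processed frame; returns true once the drop is confirmed.
    bool update(bool ballVisible)
    {
        if (ballVisible) {
            ++visibleRun;
            invisibleRun = 0;
        } else if (visibleRun >= N) {
            ++invisibleRun;          // ball was properly visible, now missing
        } else {
            invisibleRun = 0;        // never properly visible yet - ignore
        }
        return invisibleRun >= N;    // inferred: ball has dropped into the hole
    }

private:
    int N;
    int visibleRun = 0;
    int invisibleRun = 0;
};
```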
6.3.7 Fin
The Fin mode is a special mode to indicate that the bot’s job is complete. This would
typically happen when all the balls have been dropped (in Pixel), or when the bot has
reached the end-point (in IMAGE). When Medium Term Planning mode determines that
all tasks have been successfully completed, it activates this mode instead of dispatching
control to Move/Grab/Drop mode. The bot would typically be programmed to stop all
operations when the Fin mode is entered.
The resulting linked list of PathNodes represents a path along which the bot would not
encounter any obstacles. Note that we can obtain an optimal path by studying all possible
sequences of checkpoints and calculating the total path cost in each case, and then
selecting the one with the minimum cost (similar to Section 6.3.2). In our implementation
we added two more features:
Firstly, in order to quickly find a path and not get stuck in a deep recursion tree, we
limited our recursion depth to 20 levels. We could easily find a suitable path for all test
runs within this limit.
Secondly, in case of complex obstacles such as those with rounded features, this
algorithm generates a lot of PathNodes in very close proximity along the obstacle’s edge.
We discovered that, when two consecutive PathNodes are too close to each other, the bot
overshoots the distance and tries to get back, and again overshoots – getting stuck in an
undesirable loop. Hence, we merged together consecutive PathNodes which were less
than a threshold distance apart. This prevented the bot from oscillating and also
optimized the path to a certain extent.
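The merging of near-coincident PathNodes can be sketched as follows; a std::vector stands in for the linked list and the threshold value is a placeholder.

```cpp
// Merging consecutive PathNodes that lie closer than a threshold, to stop the
// bot from oscillating between near-coincident nodes (sketch). A std::vector
// stands in for the linked list; minSeparation is a placeholder value.
#include <cmath>
#include <vector>

struct PathNode { double x, y; };

std::vector<PathNode> mergeClosePathNodes(const std::vector<PathNode>& path,
                                          double minSeparation = 15.0 /* pixels */)
{
    std::vector<PathNode> merged;
    for (size_t i = 0; i < path.size(); ++i) {
        if (!merged.empty()) {
            double dx = path[i].x - merged.back().x;
            double dy = path[i].y - merged.back().y;
            if (std::sqrt(dx * dx + dy * dy) < minSeparation) {
                // Replace the previous node by the midpoint of the close pair.
                merged.back().x = (merged.back().x + path[i].x) / 2.0;
                merged.back().y = (merged.back().y + path[i].y) / 2.0;
                continue;
            }
        }
        merged.push_back(path[i]);
    }
    return merged;
}
```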
In the following section, we describe our findings from test runs as well as the actual
performance runs at Pixel and IMAGE.
7. Results
At the Pixel event, we put up a competitive performance. The system was able to
successfully navigate its way to a ball, grab it and drop it in the correct hole. During test
runs, the system remained in a static state until we introduced a ball into its view. Once a
ball was visible, it jumped into action after planning a path to the ball and from there to
its corresponding hole. When initiated with multiple balls already present in view, it
correctly identified the sequence of balls that resulted in an optimal overall plan. The
main issue with our system was its limited mechanical performance. With better
hardware such as more accurate drive motors and a controlled grabbing mechanism, our
runs would have been much smoother. The corrective path planning and execution
system worked very well, compensating for the shortcomings of the hardware to a large
extent. Figure 13 is a snapshot of a smaller arena built by us for testing purposes.
At IMAGE, we had even better success. Our system managed to reach the finals of this
highly competitive nationwide event. Checkpoints and obstacles were correctly
identified, the path was optimally planned, and execution was quite precise. The
shortcomings of the hardware were felt here too. It was clear that the dilation technique
for creating the obstacle map worked very well, although it took many test runs to arrive
at an optimal value for the dilation (clearance) radius.
Some general limitations of the system were observed in both events including the low
frame rate at which the images were being captured and processed (about 6 to 7 frames
per second), and the resulting delay which forced the robot to move in bursts. Certain
specific enhancements, such as a camera with a higher frame rate and resolution and a
computer system better suited to image processing, could result in a significant
improvement in performance. On the whole, the hardware and software systems worked
satisfactorily, giving us a positive overall outcome.
8. Conclusion and Future Work
We conclude from our experience in designing and building this autonomous system that:
1. A wireless control setup is very effective for frequent and extensive testing, and
can provide a competitive edge. It is also more directly applicable to practical
systems. Although difficult to build initially, a wireless setup turns out to be very
advantageous – including the fact that it requires much less maintenance and can
work in more challenging environments.
2. Moving complex digital equipment to the transmission end and using a very basic
navigational unit allows us to lower the cost of the overall robotic system. In our
case, we kept all the command interpretation circuit (including microcontroller)
separate from the mobile robot, and directly connected to the computer. This kind
of a setup can help us design practical systems with multiple inexpensive robots
controlled by a central computer system.
3. Simple Hough Transform is good enough for detecting circles or round features
(or any regular geometrical feature) in input images, and is also comparatively
fast. Interleaving the separate tasks of identifying different classes of objects
(balls, holes, robot detection spots, etc.) into a single pass further improves the
image processing module.
4. Parallel execution of image processing and decision making modules can greatly
improve performance by cutting down on the delay experienced between video
input and command generation.
5. The Recursive Obstacle Avoidance algorithm proved to be an effective method of
determining an obstacle-free path. It still has a large scope for improvement.
Re-designing the present system for efficient operation in such applications would be an
interesting line of research.
9. References
[1] DeSouza, Guilherme N.; and Kak, A. C. "Vision for Mobile Robot Navigation: A
Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24,
no. 2, pp. 237-267; February, 2002
[2] Barnes, N.; and Liu, Z. "Embodied Categorisation for Vision-Guided Mobile
Robots," Pattern Recogition, vol. 37, no. 2, pp. 299-312; February, 2004
[3] Kosaka, A.; and Kak, A. C. "Fast Vision-Guided Mobile Robot Navigation Using
Model-Based Reasoning and Prediction of Uncertainties," Proceedings of IEEE/RSJ
International Conference on Intelligent Robots and Systems, Raleigh; 1992
[ http://cobweb.ecn.purdue.edu/RVL/Projects/Finale/index.htm ]
[4] Kobashi, M.; Kobashi, A.; and Kobashi, A. "Design of a Fully Vision-Guided
Mobile Robot," Association for the Advancement of Artificial Intelligence (AAAI);
2007
[5] Se, S.; Lowe, D. G.; and Little, J. J. "Vision-Based Global Localization and Mapping
for Mobile Robots," IEEE Transactions on Robotics, vol. 21, no. 3, pp. 364-375;
2005
[8] Pixel – A vision-guided robotics event held during Techfest 2008 at IIT Bombay:
[ http://www.techfest.org/competitions/analogic/pixel ]
[ http://www.techfest.org ]