Objects detection and recognition system using artificial neural networks and drones

Dymitr PIETROW, MSc
Military University of Technology
2, S. Kaliski St., 00-908 Warsaw, Poland
[email protected]

Jan MATUSZEWSKI, PhD
Military University of Technology
2, S. Kaliski St., 00-908 Warsaw, Poland
[email protected]

Abstract—The paper presents a digital image object detection and recognition system using artificial neural networks and drones. It contains a description based on the example of a person identification system where the face is the key object of processing. It describes the structure of the system and the components of the learning sub-system as well as the processing sub-system (detection, recognition). It consists of a description and examples of the learning and processing algorithms and the technologies applied. The results of calculations of the efficiency and speed of each algorithm are presented in tables and appropriate characteristics. The article also describes possibilities for further development of the system.

Keywords — neural networks, detection, recognition, drones.

I. INTRODUCTION

Image recognition and detection is a complex process that demands a lot of calculations. It requires using complex decision systems. Their speed depends on the processing algorithms applied and the implementation method of these algorithms. The implementation should efficiently use the hardware platform on which the system runs.

Using AI methods and machine learning in processing allows greater system effectiveness to be gained than using algorithms and rules established by the programmer at the time of system implementation.

Fusing such a system with movable sensors, which can provide a stream of digital images, allows us to observe (in real time) a selected area searched for requested objects. Additionally, if our system has base data of the object types of the requested object, we can classify our object more accurately.

The system proposed in this article is used for identification of people who are located in an area of special security. The system contains a learning sub-system and a processing sub-system. The next chapters describe each component of the system and the most significant algorithms which have a particular influence on the system's performance.

II. SYSTEM STRUCTURE

A. Learning Sub-system

The learning sub-system is a software platform consisting of two applications. The first application is used for generating the patterns needed for the learning process, whereas the second application, on the basis of the patterns created in the program mentioned above and static patterns prepared separately, teaches our system to detect and recognize the requested objects.

The patterns generator was implemented in C++ using the Unreal Engine environment in version 4.0. The object which needs to be detected by the system is loaded as a 3D model. A skeleton and a description of the parameters allow the patterns generator to present the object in many variations. For example, in a gesture detection system, the program will load a hand model where the following will be modifiable at runtime: the hand's size, the location of the fingers, the hand's color, and the size of each finger. These changes can be made randomly or by parameters set previously.

Fig. II.1. Example of dynamically generated learning patterns on the basis of a 3D model.

The learning application was implemented in C++ in the Visual Studio 2015 and Qt environment. The result of its work is the creation of two neural networks. The first neural network is designed for detecting an object in a digital image, whereas the second neural network compares the detected image with images of objects in the base data with the intention of getting more specific information about the detected object. For example, we detect a new car with the first neural network, but if we want to know the car's brand, we have to compare that car with images of cars of different brands, which are located in the base data. Both the first and the second neural network have one output which gives a value in the range from -1 to 1. The object is accepted as detected or recognized if the output value of the neural network is greater than zero.

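As a rough illustration of this decision rule, the sketch below accepts a fragment as detected only when the first network's output is greater than zero, and then keeps the best positive response obtained by comparing the fragment with the base-data images. The interfaces (networks passed as std::function callbacks) and the base-data structure are illustrative assumptions, not the authors' actual code.

```cpp
// Minimal sketch of the two-stage decision: detect with the first network,
// then compare against base-data images with the second network.
// All names and interfaces here are illustrative assumptions.
#include <functional>
#include <string>
#include <vector>

struct BaseDataEntry {
    std::string label;            // e.g. a person's identity or a car brand
    std::vector<float> image;     // preprocessed reference image
};

using DetectNet  = std::function<float(const std::vector<float>&)>;                             // output in [-1, 1]
using CompareNet = std::function<float(const std::vector<float>&, const std::vector<float>&)>;  // output in [-1, 1]

// Returns the best-matching label, or an empty string when nothing is detected or recognized.
std::string detectAndRecognize(const DetectNet& detect, const CompareNet& compare,
                               const std::vector<float>& fragment,
                               const std::vector<BaseDataEntry>& baseData) {
    // Stage 1: the object is accepted as detected only if the output is greater than zero.
    if (detect(fragment) <= 0.0f)
        return {};

    // Stage 2: compare the detected fragment with the base-data images and keep
    // the best response that is still greater than zero.
    std::string bestLabel;
    float bestScore = 0.0f;
    for (const BaseDataEntry& entry : baseData) {
        const float score = compare(fragment, entry.image);
        if (score > bestScore) {
            bestScore = score;
            bestLabel = entry.label;
        }
    }
    return bestLabel;
}
```
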



B. Processing Sub-system

Besides the software component, the processing sub-system also has a movable sensor which provides a stream of digital images and an operator who is responsible for the sensor's movements.

The software part of the sub-system consists of the base data for object recognition and an application for image processing which executes the detection and recognition of the requested object. The thread which provides the stream of digital images is an important element of the application. The stream can be sent over the TCP/IP or UDP protocol. The sending device can be an LTE module (in this case the area of processing is the whole area where we have communication coverage, for example a city) or a WI-FI module (in the case of a small area).

The movable sensor in this article is a quadcopter drone. The drone is controlled by a human operator or by an algorithm (which at this moment is in the implementation phase). Apart from the parts that are vital for flying (battery, motors, microcontroller for flight control), the drone has a mounted camera and a Raspberry Pi ver. 2.0 microcontroller, which provides full control and programmability of the drone. The camera provides the stream of digital images which is used for processing.

III. ALGORITHMS USED IN THE LEARNING SUB-SYSTEM

The most important algorithms used in the process of learning the neural network, together with the algorithms for initial preparation and generation of the learning data, are as follows.

A. Preparing learning data

All input data (input images) is subjected to the following operations:
- resizing (to a smaller resolution);
- conversion to shades of grey;
- histogram equalization;
- normalization.

Fig. III.1. Histogram equalization and normalization of input images (axes: number of pixels with a given brightness vs. pixel brightness).

Changing the size of the input image to a smaller size speeds up the learning process (the architecture of the neural network is simplified, having fewer inputs and connections between layers, which results in fewer floating-point calculations). In the case of the face detecting neural network a sufficient image resolution is 20x20 pixels. The face recognizing neural network uses an image resolution of 40x40 pixels, which is sufficient for face comparison. Image conversion to shades of grey and histogram equalization allow the system to act independently of the lighting type and the requested object's color. It means that the neural network concentrates on learning the general features of the object and not its special cases. Normalization is vital in the learning process because it prevents overflow and saturation of the modified weight values [1].

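A minimal sketch of this preprocessing chain is given below. It uses OpenCV (listed in the references) and assumes the 20x20 detection resolution, a BGR colour frame, and a normalization of pixel values into [-1, 1]; these choices, and the function itself, are illustrative rather than the authors' implementation.

```cpp
// Sketch of the preprocessing chain: resize, grayscale conversion, histogram
// equalization and normalization into the range used by the network inputs.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<float> prepareInput(const cv::Mat& image, int side = 20) {
    cv::Mat small, grey, equalized, normalized;

    // 1. Resize to the small resolution accepted by the network input.
    cv::resize(image, small, cv::Size(side, side));

    // 2. Convert to shades of grey (assumes a BGR colour frame).
    cv::cvtColor(small, grey, cv::COLOR_BGR2GRAY);

    // 3. Histogram equalization (independence from lighting).
    cv::equalizeHist(grey, equalized);

    // 4. Normalization: map pixel values 0..255 into [-1, 1] (assumed range).
    equalized.convertTo(normalized, CV_32F, 2.0 / 255.0, -1.0);

    // Flatten to a vector of network inputs.
    return std::vector<float>(normalized.begin<float>(), normalized.end<float>());
}
```
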

B. Learning of detecting neural network

The detecting neural network has a three-layered, feedforward structure with a non-linear activation function. The learning process is conducted using the backpropagation algorithm with momentum and an adaptive learning factor. Selection of the learning patterns is realized by a bootstrap algorithm. Bootstrapping decreases the number of patterns needed for the learning process. The initial set of learning patterns is small. Further learning patterns are added periodically during the learning process, after some iterations of the backpropagation algorithm. A pattern is added only when it is wrongly recognized by the neural network, because there is no reason for adding correctly recognized patterns to the training set. Thanks to this, the neural network learns new features of the analyzed object instead of duplicating what it has already learned. It also speeds up a single iteration of the learning algorithm, because it eliminates the processing of patterns that do not contribute to the learning process [1].

Fig. III.2. Example of adding negative patterns during the time of learning the neural network for face detection.

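The bootstrap selection described above can be summarized by the following sketch: after a round of backpropagation on the current training set, only the candidate patterns that are still wrongly recognized are moved into the training set. The network and training callbacks and the fixed number of rounds are illustrative assumptions, not the authors' implementation.

```cpp
// Sketch of bootstrap pattern selection: after each training round, only the
// candidate patterns that the network still gets wrong are moved into the
// training set. Interfaces and the round structure are illustrative.
#include <cstddef>
#include <functional>
#include <vector>

struct Pattern {
    std::vector<float> input;
    float target;                                // +1 for the object, -1 for background
};

void bootstrapTraining(std::vector<Pattern>& trainingSet,
                       std::vector<Pattern>& candidates,
                       const std::function<float(const std::vector<float>&)>& propagate,
                       const std::function<void(const std::vector<Pattern>&, int)>& backpropagate,
                       int rounds, int iterationsPerRound) {
    for (int round = 0; round < rounds; ++round) {
        // Some iterations of backpropagation on the current (initially small) set.
        backpropagate(trainingSet, iterationsPerRound);

        // Add only the patterns that are still wrongly recognized: output > 0
        // means "detected", so a sign mismatch with the target marks an error.
        for (std::size_t i = 0; i < candidates.size(); ) {
            const bool detected = propagate(candidates[i].input) > 0.0f;
            const bool shouldDetect = candidates[i].target > 0.0f;
            if (detected != shouldDetect) {
                trainingSet.push_back(candidates[i]);
                candidates.erase(candidates.begin() + static_cast<std::ptrdiff_t>(i));
            } else {
                ++i;
            }
        }
    }
}
```
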
C. Learning image comparing neural network

The neural network used in the process of comparing images has a similar architecture to the detecting neural network. The difference is that the connections between the first and second layer are not of the each-to-each type. The connections were selected so that one half of the neurons processes one image and the second half processes the other image. This allowed the learning process to be sped up in contrast to using each-to-each connections: it decreased the number of executed floating-point calculations, which means a speed-up of the image propagation process in the neural network. This learning process also uses the bootstrap algorithm. Additionally, alongside the learning process, a probability thread is executed which provides information about the effectiveness of comparing images.

D. Generation and processing of static patterns

In the initial stage of the neural network learning process, a learning process takes place to identify which elements are constant and unchangeable. In order to increase the generalization capabilities of the neural network, elements of the data patterns are subjected to the following operations:
- mirror reflection (horizontally);
- rotation (in the range of 5-10 degrees);
- size changes (an increase or a decrease) in the range of 5-10% of the base size.

Fig. III.3. Window displayed during the learning process with processed patterns of the training set.

The size and rotation degree are randomly selected [1]. In the next step, the elements of the data patterns are converted to the form accepted by the input of the neural network. The conversion is conducted by the operations described in point A of this chapter [1].

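For illustration, the three operations above could be combined as in the following OpenCV-based sketch; the random-number handling and the single combined warp are assumptions, not the authors' pattern generator.

```cpp
// Sketch of the static-pattern operations: horizontal mirror reflection,
// a random rotation of 5-10 degrees and a random size change of 5-10%,
// combined into one warp for brevity. OpenCV-based, illustrative only.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <random>

cv::Mat augmentPattern(const cv::Mat& pattern, std::mt19937& rng) {
    std::uniform_real_distribution<double> angleMag(5.0, 10.0);   // degrees
    std::uniform_real_distribution<double> scaleMag(0.05, 0.10);  // 5-10 % of the base size
    std::bernoulli_distribution sign(0.5);

    // Mirror reflection (horizontally).
    cv::Mat mirrored;
    cv::flip(pattern, mirrored, 1);

    // Random rotation and size change, either direction.
    const double angle = (sign(rng) ? 1.0 : -1.0) * angleMag(rng);
    const double scale = 1.0 + (sign(rng) ? 1.0 : -1.0) * scaleMag(rng);

    const cv::Point2f center(mirrored.cols / 2.0f, mirrored.rows / 2.0f);
    const cv::Mat rotation = cv::getRotationMatrix2D(center, angle, scale);

    cv::Mat augmented;
    cv::warpAffine(mirrored, augmented, rotation, mirrored.size());
    return augmented;
}
```
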

E. Generation and processing of dynamically selected patterns

The dynamic patterns data is created during learning time and depends on the current learning progress of the neural network. In contrast to the static training set, dynamic patterns are generated on the basis of the 3D model. Thus, we have the possibility to modify the object's location in relation to the camera, the object's texture, the background, the modification method (randomly, by steps) and noise insertion. There are practically unlimited possibilities, and they depend on the parameter definitions of the pattern generation process.

IV. ALGORITHMS USED IN THE PROCESSING SUB-SYSTEM

This chapter describes the algorithms used for processing images, detecting and recognizing objects.

A. Searching in image space

To make it possible for the neural network to detect objects of different sizes, every image frame provided by the drone has to be subjected to a subsampling operation. On the basis of the original frame, we create images of progressively diminishing size. Every one of these created images is processed by the shifting window whose resolution can be handled by the input of the neural network. Obviously, before presenting an image fragment from the shifting window to the input of the neural network, it has to be normalized and subjected to the operations described in chapter III.A.

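A sketch of this search over image space is shown below: a subsampling pyramid of progressively smaller frames, each scanned by a fixed-size shifting window, with detections mapped back to the coordinates of the original frame. The window size, scale step, stride and the detectAt callback are illustrative assumptions.

```cpp
// Sketch of the search over image space: a subsampling pyramid of
// progressively smaller frames, each scanned with a 20x20 shifting window.
// detectAt stands in for preprocessing plus the detecting network.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <functional>
#include <vector>

std::vector<cv::Rect> searchImageSpace(const cv::Mat& frame,
                                       const std::function<bool(const cv::Mat&)>& detectAt,
                                       int window = 20, double scaleStep = 0.8, int stride = 1) {
    std::vector<cv::Rect> detections;
    cv::Mat level = frame.clone();
    double scale = 1.0;   // accumulated subsampling factor

    while (level.cols >= window && level.rows >= window) {
        // Scan the current pyramid level with the shifting window.
        for (int y = 0; y + window <= level.rows; y += stride) {
            for (int x = 0; x + window <= level.cols; x += stride) {
                if (detectAt(level(cv::Rect(x, y, window, window)))) {
                    // Map the hit back to the coordinates of the original frame.
                    detections.emplace_back(cvRound(x / scale), cvRound(y / scale),
                                            cvRound(window / scale), cvRound(window / scale));
                }
            }
        }
        // Subsample: create the next, smaller image of the pyramid.
        cv::Mat next;
        cv::resize(level, next, cv::Size(), scaleStep, scaleStep);
        level = next;
        scale *= scaleStep;
    }
    return detections;
}
```
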
B. Efficient histogram equalization

To allow the system to operate in real time, it was necessary to speed up the histogram calculation of every image fragment from the shifting window. In contrast to a standard algorithm, the complete histogram is calculated only once, at the beginning. Histograms for the next shifting windows are calculated on the basis of the histogram from the previous shifting window. For example, for detection purposes our shifting window has a resolution of 20x20 pixels, so if we want to calculate a histogram we have to process 400 elements (pixels) of the analyzed image fragment. However, by using the efficient histogram, for every next window shift (by moving left, right or down by one pixel) we only have to process the 20 elements of the entering column (add these values to the histogram) and the 20 elements of the outgoing column (subtract these values from the histogram). As we can see, the efficient histogram remarkably decreases the number of elements necessary to process and changes search operations on matrix values into simple adding and subtracting operations on small vectors [4].

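The incremental update can be sketched as follows: the full histogram of a window is computed once, and a one-pixel shift to the right touches only the outgoing and entering columns. This is a plain C++ illustration of the idea, not the authors' implementation.

```cpp
// Efficient (incremental) histogram for a sliding window over a greyscale
// image: a full pass once, then only 2 x window pixels per one-pixel shift.
#include <opencv2/core.hpp>
#include <array>

using Histogram = std::array<int, 256>;

// Full histogram of a window-sized fragment, computed once at the start.
Histogram fullHistogram(const cv::Mat& grey, int x, int y, int window) {
    Histogram h{};   // zero-initialized bins
    for (int r = 0; r < window; ++r)
        for (int c = 0; c < window; ++c)
            ++h[grey.at<unsigned char>(y + r, x + c)];
    return h;
}

// Update in place when the window at (x, y) shifts right by one pixel:
// subtract the column leaving the window, add the column entering it.
void shiftRight(Histogram& h, const cv::Mat& grey, int x, int y, int window) {
    for (int r = 0; r < window; ++r) {
        --h[grey.at<unsigned char>(y + r, x)];              // outgoing column (old left edge)
        ++h[grey.at<unsigned char>(y + r, x + window)];     // entering column (new right edge)
    }
}
```
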
V. USED TECHNOLOGIES

This chapter describes the hardware technologies in use, which were required to achieve real-time detection and object recognition and to accelerate the learning process.

A. Processing on the graphics card (GPU)

Thanks to its shader units, the graphics card processor is able to process simultaneously a lot of threads operating on large matrices of data. The basic requirement for accelerating an algorithm with the use of a GPU is the possibility to rewrite the algorithm into its parallel version. The number of threads which we can execute at the same time is about 10000 and more, and this efficiency is not possible to achieve on a CPU, where the current limit allows us to run the program using eight threads simultaneously.

The backpropagation algorithm used in the learning sub-system was modified and adjusted for execution on a GPU. All patterns are processed simultaneously in one epoch of the learning algorithm. It accelerated the learning process by 2-3 times in contrast to the algorithm executed on the central processing unit (CPU), where the learning patterns are processed sequentially.

There is a possibility to speed up the learning even 5-10 times relative to the CPU, but in this case it is necessary to use the shared memory of the thread blocks.

The technology which allowed the learning algorithm to be implemented on the GPU is C++ AMP [3].

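As an illustration of this approach, the C++ AMP sketch below propagates every pattern of an epoch through one layer in parallel, with one GPU thread per (pattern, neuron) pair. The layer layout, the tanh activation and the function interface are assumptions; the real learning algorithm also has to compute errors and update the weights.

```cpp
// C++ AMP sketch: propagate all patterns of an epoch through one layer in
// parallel on the GPU. Data layouts and the activation are assumptions.
#include <amp.h>
#include <amp_math.h>
#include <vector>

using namespace concurrency;

// inputs: numPatterns x numInputs (row-major); weights: numOutputs x numInputs.
std::vector<float> forwardLayerAmp(const std::vector<float>& inputs,
                                   const std::vector<float>& weights,
                                   int numPatterns, int numInputs, int numOutputs) {
    std::vector<float> outputs(static_cast<std::size_t>(numPatterns) * numOutputs);

    array_view<const float, 2> in(numPatterns, numInputs, inputs);
    array_view<const float, 2> w(numOutputs, numInputs, weights);
    array_view<float, 2> out(numPatterns, numOutputs, outputs);
    out.discard_data();

    // One GPU thread per (pattern, output neuron) pair: all patterns of the
    // epoch are processed simultaneously.
    parallel_for_each(out.extent, [=](index<2> idx) restrict(amp) {
        const int p = idx[0];
        const int n = idx[1];
        float sum = 0.0f;
        for (int i = 0; i < numInputs; ++i)
            sum += in(p, i) * w(n, i);
        out[idx] = fast_math::tanhf(sum);   // non-linear activation (assumed)
    });

    out.synchronize();   // copy the results back to host memory
    return outputs;
}
```
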
B. Multi-thread processing

Depending on the capabilities of the available CPU, there is a theoretical possibility of an n-fold speed-up of the program, where n is the number of processor cores.

The image processing algorithm was modified by dividing it into 4 threads, which means that the image is divided into four processing areas. Every thread is responsible for one of the four parts of the image. Additionally, for calculating the outputs of the neural networks and for histogram equalization, SSE (Streaming SIMD Extensions) instructions were used, which allow up to four math operations to be processed simultaneously.

By using multi-threading (4 threads) and SSE instructions it was possible to reduce the average processing time from about 250 to 10-20 milliseconds. That is an improvement of about 25 times.

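A sketch of these two mechanisms is given below: the frame is split into four horizontal strips, each handled by its own thread, and an SSE helper accumulates four multiplications at a time (as used, for example, in a neuron's weighted sum). The strip layout and the callback interface are illustrative assumptions, not the authors' code.

```cpp
// Sketch of the 4-thread frame split plus an SSE accumulation step.
#include <immintrin.h>
#include <functional>
#include <thread>
#include <vector>

// Dot product of two float arrays (length divisible by 4) using SSE:
// four multiplications and additions are processed per instruction.
float dotSse(const float* a, const float* b, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

// Divide the frame into four horizontal strips and run one thread per strip.
void processFrameMultithread(int frameHeight,
                             const std::function<void(int firstRow, int lastRow)>& processRegion) {
    const int threads = 4;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int firstRow = t * frameHeight / threads;
        const int lastRow = (t + 1) * frameHeight / threads;
        workers.emplace_back(processRegion, firstRow, lastRow);
    }
    for (auto& w : workers)
        w.join();
}
```
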

VI. RESULTS

This chapter shows the results of calculations for the algorithms described in this article.

A. Comparison of learning speed on the GPU and the CPU

The figures below present the number of iterations done by the learning algorithm in 3 hours' time for the GPU and the CPU.

Fig. VI.1. General error at the CPU for 3 hours of learning.

Fig. VI.2. General error at the GPU for 3 hours of learning.

Figure VI.2 shows that the speed of learning on the GPU is much greater than in the case of executing this algorithm on the CPU. If we calculate the ratio of the number of iterations performed on the GPU to the number performed on the CPU, we get 15311 / 7531 ≈ 2, i.e. about a 2-times faster execution of the algorithm.

B. Comparison of multi-thread with single-thread processing, with and without SSE

The table below shows the results of the comparison of multi-thread with single-thread processing.

Table 1. Comparison of multi-thread with single-thread processing
(speed of processing of a single frame of the video stream)

                       1 thread, without SSE    4 threads, with SSE
Standard histogram     250 ms                   130 ms
Efficient histogram    100 ms                   16 ms

The results of multi-thread processing with SSE show a multiple speed-up of the processing sub-system. If we divide one second by the processing time of a single frame (1000 ms / 16 ms), we get a theoretical processing speed of about 62 fps. The achieved speed-up is sufficient for the sub-system to be used in real-time processing.

VII. CONCLUSIONS

The results confirm that nowadays not only the knowledge about the rules of operating efficient algorithms but also the knowledge about the hardware platform which will be used for the implementation is very important. Familiarity with multi-thread processing and skill in converting seemingly serial algorithms into their parallel versions is fundamental for creating efficient and fast algorithms which we can use in real-time processing systems.

By using AI, we allow our system to acquire knowledge about features of the analyzed object which the designer or programmer is not able to predict while designing. In a certain way, in the course of the learning process, the system becomes better than its designer. The neural networks allowed the system to gain general knowledge about the requested object. It means that the artificial neural network, similarly to its biological counterparts, can detect an object in different lighting, in the presence of interference, etc. The difference is that digital neurons never get tired and, in contrast to biological neurons, their quality of processing is always constant.

VIII. REFERENCES

[1] Wikipedia The Free Encyclopedia, Digital Image: https://en.wikipedia.org/wiki/Digital_image (access 19.10.2015).
[2] Wikipedia The Free Encyclopedia, Bitmap: https://en.wikipedia.org/wiki/Bitmap (access 19.10.2015).
[3] Osowski S.: Sieci neuronowe do przetwarzania informacji. Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2006.
[4] Neural Network-Based Face Detection: http://www.informedia.cs.cmu.edu/documents/rowley-ieee.pdf (access 19.10.2015).
[5] Rozpoznawanie twarzy za pomocą sieci neuronowych: http://www.michalbereta.pl/dydaktyka/KPO/Rozpoznawanie%20twarzy.pdf (access 28.05.2017).
[6] Megatutorial – Od zera do gier kodera: http://xion.org.pl/productions/texts/coding/megatutorial/ (access 28.05.2017).
[7] Microsoft Software Developer Network (MSDN), Visual Studio IDE User's Guide: https://msdn.microsoft.com/en-us/library/jj620919(v=vs.120).aspx (access 19.10.2015).
[8] Gregory K., Miller A.: C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++. O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, California 95472.
[9] Tutorials for modern OpenGL (3.3+): http://www.opengl-tutorial.org (access 28.05.2017).
[10] Efficient Histogram-Based Sliding Window: http://msr-waypoint.com/en-us/people/yichenw/cvpr10_ehsw.pdf (access 05.11.2015).
[11] Osowski S.: Sieci neuronowe w ujęciu algorytmicznym. Wydawnictwo Naukowo-Techniczne, Warszawa, 1996.
[12] Learning to Compare Image Patches via Convolutional Neural Networks: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zagoruyko_Learning_to_Compare_2015_CVPR_paper.pdf (access 28.05.2017).
[13] Raspberry Pi: https://www.raspberrypi.org/ (access 28.05.2017).
[14] TCP/IP Python Communication: https://wiki.python.org/moin/TcpCommunication (access 28.05.2017).
[15] Open Source Computer Vision (OpenCV): http://opencv.org/ (access 28.05.2017).
