Pietrow 2017
Abstract—This paper presents a system for detecting and recognizing objects in digital images using artificial neural networks and drones. The description is based on the example of a person identification system in which the face is the key object of processing. The paper describes the structure of the system and the components of the learning sub-system and the processing sub-system (detection, recognition). It includes descriptions and examples of the learning and processing algorithms and of the technologies applied. The measured efficiency and speed of each algorithm are presented in tables and corresponding characteristics. The article also describes possibilities for further development of the system.

Keywords — neural networks, detection, recognition, drones.

I. INTRODUCTION
Image recognition and detection is a complex process that demands many calculations and requires complex decision systems. The speed of such systems depends on the processing algorithms applied and on how these algorithms are implemented. The implementation should make efficient use of the hardware platform on which the system runs.
Using AI methods and machine learning in processing makes it possible to achieve greater system effectiveness than using algorithms and rules fixed by the programmer at the time of system implementation.
Fusing such a system with movable sensors that provide a stream of digital images allows us to observe a selected area in real time while searching for requested objects. Additionally, if the system has a database of object types for the requested object, the object can be classified more accurately.
The system proposed in this article is used to identify people located in an area of special security. The system contains a learning sub-system and a processing sub-system. The following chapters describe each component of the system and the most significant algorithms, which have a particular influence on the system's performance.

II. SYSTEM STRUCTURE

A. Learning Sub-system

The learning sub-system is a software platform consisting of two applications. The first application is used to generate the patterns needed for the learning process, whereas the second application uses the patterns created by the first application, together with static patterns prepared separately, to teach the system to detect and recognize the requested objects.

The patterns generator was implemented in C++ using the Unreal Engine environment, version 4.0. The object that the system needs to detect is loaded as a 3D model. A skeleton and a description of the parameters allow the patterns generator to present the object in many variations. For example, in a gesture detection system, the program loads a hand model in which the following can be modified at run time: the hand's size, the location of the fingers, the hand's color, and the size of each finger. These changes can be made randomly or according to previously set parameters.

Fig. II.1. Example of dynamically generated learning patterns on the basis of a 3D model.

The learning application was implemented in C++ in the Visual Studio 2015 and Qt environments. Its result is two neural networks. The first neural network is designed to detect an object in a digital image, whereas the second neural network compares the detected image with images of objects in the database in order to obtain more specific information about the detected object. For example, the first neural network detects a new car, but to determine the car's brand, that car has to be compared with images of cars of different brands stored in the database. Both the first and the second neural network have one output, which gives a value in the range from -1 to 1. The object is accepted as detected or recognized if the output value of the neural network is greater than zero.
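The acceptance rule for the two neural networks of the learning sub-system (a single output in the range from -1 to 1, accepted when greater than zero) can be illustrated with a minimal Python sketch. The names `detect_net`, `compare_net` and `database` are hypothetical stand-ins introduced only for this illustration; the paper does not specify the interfaces of the actual C++ implementation.

```python
def accept(output):
    """Decision rule from the text: the network output lies in [-1, 1]
    and the object is accepted only if the output is greater than zero."""
    return output > 0


def identify(patch, detect_net, compare_net, database):
    """Two-stage decision, assuming callable stand-ins for both networks.

    detect_net(patch)       -> value in [-1, 1] (hypothetical detector)
    compare_net(patch, ref) -> value in [-1, 1] (hypothetical comparator)
    database                -> list of (label, reference_image) pairs

    Returns (detected, labels of all database entries accepted as a match).
    """
    # Stage 1: is the requested object present in the patch at all?
    if not accept(detect_net(patch)):
        return False, []
    # Stage 2: compare the detected patch with every stored reference image.
    matches = [label for label, ref in database
               if accept(compare_net(patch, ref))]
    return True, matches
```

The second stage runs only for patches accepted by the first, which mirrors the car example: the object is detected first and only then compared with the stored images of different brands.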
[Figure labels: histogram equalization and normalization]

Fig. III.3. Window displayed during the learning process with processed patterns of training set.

Size and rotation degree are randomly selected [1]. In the next step, the elements of the data patterns are converted to the form accepted by the input of the neural network. The conversion is performed by the operations described in point A of this chapter [1].

E. Generation and processing of dynamically selected patterns

The dynamic pattern data is created during learning and depends on the current learning progress of the neural network. In contrast to the static training set, dynamic patterns are generated from the 3D model. Thus, it is possible to modify the object's location in relation to the camera, the object's texture, the background, the modification method (random or stepwise), and noise insertion. The possibilities are practically unlimited and depend on the parameter definitions of the pattern generation process.

IV. ALGORITHMS USED IN THE PROCESSING SUB-SYSTEM
This chapter describes the algorithms used for processing images and for detecting and recognizing objects.

A. Searching in image space
To make it possible for the neural network to detect objects of different sizes, every image frame provided by the drone has to be subjected to a subsampling operation. On the basis of the original frame, we create images of progressively diminishing size. Each of these created images is processed by the shifting window

The neural network used in the process of comparing images has a similar architecture to the detecting neural network. The

V. USED TECHNOLOGIES
This chapter describes the hardware technologies that were required to achieve real-time detection and object recognition and to accelerate the learning process.

A. Processing on the graphics card (GPU)

Thanks to its shader units, a graphics card processor is able to process many threads simultaneously, operating on large matrices of data. The basic requirement for accelerating an algorithm with a GPU is that the algorithm can be rewritten in a parallel version. The number of threads that can be executed at the same time is about 10000 or more. This level of parallelism cannot be achieved on a CPU, where the current limit allows the program to run using eight threads simultaneously.

The back propagation algorithm used in the learning sub-system was modified and adjusted for execution on a GPU. All patterns are processed simultaneously in one epoch of the learning algorithm. This accelerated the learning process 2-3 times in comparison with the algorithm run on the central processing unit (CPU), where the learning patterns are processed sequentially.

A speed-up of 5-10 times relative to the CPU is possible, but this requires the use of the shared memory of thread blocks.

The technology that made it possible to implement the learning algorithm on the GPU is C++ AMP [3].
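As a rough illustration of the search in image space described in Section IV.A (subsampling each frame into images of diminishing size and moving a shifting window over each of them), the following Python sketch may help. The subsampling factor of 2, the window size, the step and the `detector` callable are assumptions made for this example only; the paper does not state the values used in the actual system.

```python
def downsample(image, factor=2):
    """Subsample by keeping every `factor`-th pixel in both directions."""
    return [row[::factor] for row in image[::factor]]


def pyramid(image, min_size):
    """Yield (scale, image) pairs of diminishing size, stopping once the
    image is smaller than the window (min_size must be at least 1)."""
    scale = 1
    while len(image) >= min_size and len(image[0]) >= min_size:
        yield scale, image
        image = downsample(image)
        scale *= 2


def sliding_windows(image, size, step):
    """Yield (x, y, patch) for every window position inside the image."""
    height, width = len(image), len(image[0])
    for y in range(0, height - size + 1, step):
        for x in range(0, width - size + 1, step):
            yield x, y, [row[x:x + size] for row in image[y:y + size]]


def search(image, detector, size=4, step=2):
    """Run a hypothetical `detector` (output in [-1, 1]) on every window of
    every pyramid level; report hits in original-frame coordinates."""
    hits = []
    for scale, level in pyramid(image, size):
        for x, y, patch in sliding_windows(level, size, step):
            if detector(patch) > 0:  # accepted when the output exceeds zero
                hits.append((x * scale, y * scale, size * scale))
    return hits
```

Because the window has a fixed size, an object that is too large for the window in the original frame can still fit it at a coarser pyramid level, which is why the frame is subsampled before the window search.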
B. Multi-thread processing
Depending on the capabilities of the available CPU, there is a theoretical possibility of an n-fold speed-up of the program, where n is the number of processor cores.
The image processing algorithm was modified by dividing it into 4 threads, which means that the image is now divided into four processing areas. Each thread is responsible for one of the four parts of the image. Additionally, SSE (Streaming SIMD Extensions) instructions, which allow up to four math operations to be processed simultaneously, were used for calculating the outputs of the neural networks and for histogram equalization.
By using multi-threading (4 threads) and SSE instructions, it was possible to reduce the average processing time from about 250 to 10-20 milliseconds. That is an improvement of up to about 25 times.

B. Comparison of multi-thread processing with single-thread processing, with and without SSE

The table below shows the results of comparing multi-thread with single-thread processing.

Table 1. Comparison of multi-thread with single-thread processing

Processing time of a single frame of the video stream

                       1 thread, without SSE    4 threads, with SSE
Standard histogram     250 ms                   130 ms
Efficient histogram    100 ms                   16 ms
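The division of the image into four processing areas, one per thread, can be sketched as follows. This is only an illustration of the partitioning idea in Python: the split into horizontal bands and the per-pixel operation in `process_band` (a simple gray-level inversion) are assumptions, since the paper does not say how the four areas are laid out, and the original system is implemented in C++ with SSE instructions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, parts=4):
    """Return (start, end) row ranges that cover 0..height in `parts` bands."""
    base, extra = divmod(height, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def process_band(image, rows):
    """Hypothetical per-pixel work: invert 8-bit gray levels in one band."""
    start, end = rows
    return [[255 - p for p in image[r]] for r in range(start, end)]


def process_image(image, parts=4):
    """Process each band in its own worker thread and merge the results."""
    bands = split_rows(len(image), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # pool.map returns results in the order of `bands`,
        # so the bands can be concatenated directly.
        results = list(pool.map(lambda rows: process_band(image, rows), bands))
    return [row for band in results for row in band]
```

In standard CPython builds the global interpreter lock keeps pure-Python threads from running in parallel on several cores, so this sketch shows the partitioning and merging only, not the speed-up reported for the native implementation.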