Ding 2018 IOP Conf. Ser. Mater. Sci. Eng. 322 062024
Abstract. With the rapid development of deep learning, great breakthroughs have been made in the field of object detection. In this article, deep learning algorithms are applied to the detection of daily objects, and some progress has been made in this direction. Compared with traditional object detection methods, the deep-learning-based daily object detection method is faster and more accurate. The main work of this article is: 1. collecting a small data set of daily objects; 2. building different object detection models in the TensorFlow framework and training them on this data set; 3. improving the training process and performance of the models by fine-tuning the model parameters.
1. Introduction
Object detection is a very popular research direction in the computer vision field. It emerged in the 1970s but only got on track in the 1990s, when computers became powerful enough and applications plentiful. It is easy for us as humans to recognize objects in images; for computers, however, it is difficult. With the varied postures of objects and the complex environments around them, object detection becomes even more ambiguous.
As we know, the evolution of detection algorithms can be divided into two stages: the first is based on traditional hand-crafted features, and the second is based on deep learning. Before 2013, most research relied on detection methods built on traditional feature engineering; after that, both academia and industry turned to deep learning algorithms.
As the amount of detection data increases, the performance of traditional detection methods becomes saturated: it improves gradually at first, but the gains shrink after a certain amount of data. Deep learning methods behave differently: as data covering the scene distribution accumulates, detection performance keeps improving.
In this article, a data set of daily objects is collected, and different object detection models are trained on it. By comparing direct training with training after parameter adjustment, it is shown that adjusting the parameters improves both the convergence speed and the accuracy of object detection.
2. Literature Survey
In ILSVRC 2014, deep learning increased the average object detection rate to 43.933%.
R-CNN, proposed by Ross Girshick et al. [1], introduced CNNs into the object detection field for the first time. It replaces traditional sliding-window candidate extraction with the Selective Search [5] window-extraction
algorithm. Girshick then proposed the Fast R-CNN model [2], which integrates feature extraction and classification into a single framework; in the original R-CNN, the deep convolutional network used for feature extraction and the support vector machines used for classification are trained separately. Training Fast R-CNN is about nine times faster than training R-CNN. In Faster R-CNN [3], region proposal extraction and part of Fast R-CNN are merged into one network, the Region Proposal Network (RPN); this makes the detection stage very convenient, with accuracy similar to Fast R-CNN. YOLO (You Only Look Once) [6], proposed by Joseph Redmon et al., treats object detection as a regression problem: a single convolutional network predicts multiple candidate box positions and their classes in one pass, so detection and recognition are performed end to end, directly from the original image to the positions and categories of the objects. R-FCN [7] is an accurate and efficient method for object detection. Compared with earlier per-region computation, R-FCN performs the convolutional computation on the whole image; to achieve this, the model uses position-sensitive score maps to balance the translation invariance needed for image classification against the translation variance needed for object detection.
3. Proposed Method
An object detection algorithm usually contains three parts: the design of features, the choice of detection windows, and the design of the classifier. Feature design methods include hand-crafted feature design and neural-network feature extraction. Detection-window selection methods mainly include Exhaustive Search [4], Selective Search [5], and the deep-learning-based RPN. This article adopts a deep convolutional neural network (CNN) for image feature extraction, uses the state-of-the-art RPN as the detection-window selection method, applies bounding-box regression, performs classification with softmax, and outputs the detection result. The model structure is shown with blocks in Figure 1.
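As an illustration only (not the authors' code), the following minimal TensorFlow/Keras sketch mirrors the block structure described above: a small convolutional backbone extracts a shared feature map, and RPN heads on top of it output an objectness score and box offsets for k anchors at every feature-map position. The layer sizes, channel counts, and the omitted per-RoI softmax classifier and box refinement stage are all assumptions.

```python
# Schematic two-stage detector: CNN backbone -> RPN heads (illustrative only).
import tensorflow as tf

def build_backbone():
    # Small convolutional feature extractor standing in for the real backbone.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu'),
        tf.keras.layers.MaxPooling2D(2),
    ])

def build_rpn(k=9):
    # RPN heads: an objectness score and 4 box offsets for each of k anchors
    # at every feature-map position.
    inp = tf.keras.Input(shape=(None, None, 128))
    shared = tf.keras.layers.Conv2D(256, 3, padding='same', activation='relu')(inp)
    objectness = tf.keras.layers.Conv2D(k, 1, activation='sigmoid')(shared)
    box_deltas = tf.keras.layers.Conv2D(4 * k, 1)(shared)
    return tf.keras.Model(inp, [objectness, box_deltas])

# Shared feature map for a dummy 224x224 RGB image, then the RPN outputs.
features = build_backbone()(tf.zeros([1, 224, 224, 3]))
scores, deltas = build_rpn()(features)
print(features.shape, scores.shape, deltas.shape)
```

Running the snippet on the dummy image prints the shapes of the shared feature map and of the two RPN outputs; the per-RoI classifier described in the text would operate on regions cropped from that feature map.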
Each convolution layer computes
$$y = f(w \ast x + b) \qquad (1)$$
where $x$ represents the input vector, $w$ the parameters of a convolution kernel, $b$ the bias term, $f$ the activation function, and $y$ the output.
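As a hedged numerical illustration of equation (1), the snippet below applies a single convolution layer to a random input with tf.nn.conv2d; the kernel size and channel counts are arbitrary choices, not the ones used in the paper.

```python
# Numerical check of y = f(w * x + b) for one convolution layer:
# correlate the input with the kernel, add the bias, apply ReLU.
import tensorflow as tf

x = tf.random.normal([1, 8, 8, 1])   # input feature map (NHWC layout)
w = tf.random.normal([3, 3, 1, 4])   # 3x3 kernel, 1 input and 4 output channels
b = tf.zeros([4])                    # bias term

y = tf.nn.relu(tf.nn.conv2d(x, w, strides=1, padding='SAME') + b)
print(y.shape)  # (1, 8, 8, 4)
```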
A pooling layer is placed after each convolution layer to reduce the dimensionality. Generally, the output of the convolution layer is reduced to half its original size, which is convenient for the later operations. In addition, the pooling layer increases the robustness of the system, turning an exact description into a coarser one and avoiding overfitting to some extent.
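The halving effect of pooling can be seen with a short illustrative example: a 2x2 max-pooling layer applied to an 8x8 feature map returns a 4x4 map.

```python
# 2x2 max pooling halves the spatial size of a convolution output (8x8 -> 4x4).
import tensorflow as tf

feature_map = tf.random.normal([1, 8, 8, 4])
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(feature_map)
print(feature_map.shape, '->', pooled.shape)  # (1, 8, 8, 4) -> (1, 4, 4, 4)
```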
The anchors are fixed-scale windows in the original design. Each window is assigned a positive or negative label according to its Intersection-over-Union (IoU) with the ground truth, and the network learns whether there is an object inside; in this way a Region Proposal Network is trained.
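A minimal sketch of this IoU-based labelling is given below. The 0.7 positive threshold appears later in this section; the 0.3 negative threshold and the box format are assumptions that follow the common Faster R-CNN setting.

```python
# Illustrative anchor labelling by IoU: positive above 0.7, negative below 0.3
# (the 0.3 value is an assumed, conventional choice), otherwise ignored.
import numpy as np

def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; returns intersection-over-union.
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best >= pos_thr:
        return 1        # positive: the anchor contains an object
    if best < neg_thr:
        return 0        # negative: background
    return -1           # ignored during training

print(label_anchor([0, 0, 10, 10], [[1, 1, 10, 10], [50, 50, 60, 60]]))  # 1
```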
Only the approximate location needs to be found, because the precise position and size can be determined by the later stages. Consequently, the anchors can be fixed in three respects: fixed scale variations (three scales), fixed aspect-ratio variations (three ratios), and a fixed sampling scheme, sampling only the Region of Interest (RoI) in the original image that corresponds to each point of the feature map; the later stages can then make adjustments. This reduces the complexity of the task.
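The fixed anchor design can be sketched as follows; only the three-scales-by-three-ratios structure comes from the text, while the concrete scale and ratio values are illustrative assumptions.

```python
# Sketch of the fixed anchor design: at each feature-map point, k = 3 scales x
# 3 aspect ratios = 9 anchors centred on the corresponding image position
# (the scale/ratio values below are assumed for illustration).
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # width and height chosen so that w * h = s * s
            h = s / np.sqrt(r)   # and w / h = r
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

# The 9 anchors for the feature-map point that maps back to image location (100, 100).
print(anchors_at(100, 100).shape)  # (9, 4)
```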
After proposals are extracted on the feature map, the convolutional computation in the earlier part of the network can be shared. The output of this network is that every point of the convolution layer produces predictions for the k anchor boxes: whether each one contains an object, and how the corresponding box position should be adjusted. The RPN's overall loss function can be defined as:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\, L_{reg}(t_i, t_i^*) \qquad (2)$$
where $i$ denotes the $i$-th anchor, $p_i$ is the predicted probability that anchor $i$ contains an object and $t_i$ the predicted box offsets, and $L_{cls}$ and $L_{reg}$ are the classification and regression losses, weighted by $\lambda$ and normalised by $N_{cls}$ and $N_{reg}$. $p_i^* = 1$ when the anchor is positive and $p_i^* = 0$ when the anchor is negative; $t_i^*$ represents the ground-truth box coordinates associated with the positive anchor (each positive anchor corresponds to at most one ground-truth box: an anchor is positive when it has the highest IoU with that ground-truth box among all anchors, or an IoU greater than 0.7).
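A minimal TensorFlow sketch of equation (2) is shown below. The smooth-L1 regression loss, the lambda = 10 weight, and the simplified normalisation constants are assumptions that follow the usual Faster R-CNN formulation rather than settings stated in this paper.

```python
# Minimal sketch of the RPN loss in equation (2): a classification term over all
# sampled anchors plus a regression term counted only for positive anchors
# (p_i* = 1). Smooth-L1 and lambda = 10 are assumed conventional choices.
import tensorflow as tf

def smooth_l1(x):
    absx = tf.abs(x)
    return tf.where(absx < 1.0, 0.5 * tf.square(x), absx - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0):
    # p: predicted objectness in [0, 1]; p_star: 0/1 anchor labels
    # t, t_star: predicted and ground-truth box offsets, shape [N, 4]
    n_cls = tf.cast(tf.size(p), tf.float32)
    n_reg = tf.maximum(tf.reduce_sum(p_star), 1.0)
    l_cls = tf.reduce_sum(tf.keras.losses.binary_crossentropy(
        p_star[:, None], p[:, None])) / n_cls
    l_reg = tf.reduce_sum(p_star[:, None] * smooth_l1(t - t_star)) / n_reg
    return l_cls + lam * l_reg

p = tf.constant([0.9, 0.2]); p_star = tf.constant([1.0, 0.0])
t = tf.zeros([2, 4]); t_star = tf.constant([[0.1, 0.1, 0.0, 0.0], [0.0] * 4])
print(float(rpn_loss(p, p_star, t, t_star)))
```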
4. Experimental Results
4.3. Experiment
In this article, different object detection models are trained on the daily object detection data set, and the detection results of the different models are obtained. The network structure is shown in the following figure:
[Figure: network structure]
The network structure and parameter settings affect the accuracy of object detection. Among the parameter settings, the number and size of the convolution kernels, the learning rate, the regularization mode, and the type of gradient descent algorithm play a key role in the final convergence rate and performance of the model, and experimentation is necessary to obtain good results. Through repeated experiments and fine-tuning of these parameters, the accuracy of the model is improved compared with the original model.
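To make this kind of tuning concrete, the following illustrative TensorFlow snippet exposes the knobs mentioned above (learning rate, regularization mode, and gradient-descent variant); the specific values and the toy classifier are assumptions, not the configuration used in the experiments.

```python
# Illustrative hyper-parameter setup: learning rate, L2 regularization (weight
# decay) and the gradient-descent variant are the values typically fine-tuned.
import tensorflow as tf

def build_classifier(weight_decay=1e-4):
    reg = tf.keras.regularizers.l2(weight_decay)
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', kernel_regularizer=reg),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax', kernel_regularizer=reg),
    ])

model = build_classifier()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
# Swapping SGD for Adam, or lowering the learning rate, changes how quickly the
# loss converges; comparing such runs is the kind of fine-tuning described above.
```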
5. Conclusion
This article mainly establishes a small data set for daily object detection. Different object detection models are then trained on this data set and achieve good results in daily object detection. These trained models can be used on mobile platforms, the Nao robot platform, or other intelligent devices to achieve accurate detection of daily objects. In the future, a better model can be obtained by increasing the size of the data set, optimizing the model structure, and fine-tuning the parameters.
References
[1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. 2013: 580-587.
[2] Girshick R. Fast R-CNN. Computer Science, 2015.
[3] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 39(6): 1137.
[4] Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. TPAMI, 2010, 32: 1627-1645.
[5] van de Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition. In: IEEE International Conference on Computer Vision, 2012: 1879-1886.
[6] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[7] Dai J, Li Y, He K, et al. R-FCN: object detection via region-based fully convolutional networks. 2016.
[8] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.