

A Deep Learning Approach for Road Damage Detection from Smartphone Images


Abdullah Alfarrarjeh*, Dweep Trivedi*, Seon Ho Kim, Cyrus Shahabi
Integrated Media Systems Center, University of Southern California, Los Angeles, CA 90089, USA
{alfarrar, dtrivedi, seonkim, shahabi} [email protected]
*These authors contributed equally to this work.

Abstract—With recent advances in technology, it is feasible to conveniently monitor urban roads using various cameras, such as surveillance cameras, in-vehicle cameras, or smartphones, and to recognize their conditions by detecting specific types of road damage so that maintenance resources can be planned efficiently based on the identified spots. This paper describes a road damage type detection and classification solution submitted to the IEEE BigData Cup Challenge 2018. Our solution is based on state-of-the-art deep learning methods for an object detection task. In particular, our approach utilizes an object detection algorithm to detect various types of road damage by training the detector on image examples categorized into a set of damage types defined by the Japan Road Association. We evaluated our approach thoroughly using different versions of trained models. Our experiments show that our approach was able to achieve an F1 score of up to 0.62.

Index Terms—Deep Learning, Road Damage Detection and Classification, Object Detection, Urban Street Analysis

I. INTRODUCTION

The economy of cities is essentially affected by their public facilities and infrastructures. One fundamental element of such infrastructures is the road. Many factors (e.g., rain and aging) cause different types of road damage that seriously impact road efficiency, driver safety,1 and the value of vehicles.2 Therefore, countries devote a large annual budget to road maintenance and rehabilitation. For example, according to statistics released by the Federal Highway Administration in the United States in 2013, the road network reached 4.12 million miles and the government allocated $30.1 billion for constructing new roads and maintaining the existing ones [3].

Efficient road maintenance requires a reliable monitoring system, and a straightforward method is human visual inspection; however, this is infeasible because it is expensive, laborious, and time-consuming. Therefore, researchers have developed various solutions for automatic road damage inspection, including vibration-based [4], laser-scanning-based [5], and image-based [6]–[9] methods. While detection by vibration methods is limited to the parts of the road in contact with the sensor, laser-scanning methods provide accurate information about the status of roads; however, such methods are expensive and require a road closure. Meanwhile, image processing methods are inexpensive but may suffer from a lack of accuracy. In spite of this immaturity, recent advancements in image analysis techniques have been producing impressive results, increasing their usage in various applications (e.g., street cleanliness [10], traffic flow analysis [11], situation awareness of disasters [12], and image search [13]).

A few researchers have developed image-based approaches for road surface inspection using state-of-the-art deep learning methods. In particular, some works focus on detecting only the existence of damage regardless of its type [6]. Other works focus on classifying road damage into a few types. For example, Zhang et al. [7] devised an approach for detecting two directional cracks (i.e., horizontal and vertical), while Akarsu et al. [8] developed another approach for detecting three categories of damage, namely horizontal, vertical, and crocodile. Because differentiating among damage types is critical for proper road maintenance planning, Maeda et al. [9] implemented an approach for a thorough classification of road damage types.

The focus of this study is automating the detection of different types of road damage (as proposed by Maeda et al. [9]) using smartphone images crowdsourced by city crews or the public.3 Our approach uses one of the state-of-the-art deep learning algorithms (i.e., YOLO [16]4) for an object detection task. The source code of our solution is available at https://github.com/dweeptrivedi/road-damage-detection.

The remainder of this paper is organized as follows. Section II introduces the classification of road damages, reviews image-based object detection algorithms, and presents our solution. In Section III we report our experimental results. Finally, in Section IV, we conclude.

1 In Europe, 50 million people are injured in traffic crashes annually [1]; bad road conditions are a primary factor in traffic crashes.
2 A study from the American Automobile Association (AAA) reports that road damage costs U.S. drivers about $3 billion annually [2].
3 Spatial crowdsourcing mechanisms [14] can be used for collecting more images at locations that are not sufficiently covered by visual information [15] for road inspection.
4 YOLO is chosen due to its feasibility to work on both server and edge devices (e.g., detection and classification can be done on a smartphone).

II. ROAD DAMAGE DETECTION SOLUTION

A. Image Dataset

The images provided by the IEEE BigData Cup Challenge capture scenes of urban streets located in seven geographical areas in Japan: Ichihara city, Chiba city, Sumida ward, Nagakute city, Adachi city, Muroran city, and Numazu city. Each image is annotated with one or more regions of interest (referred to as ground-truth boxes), and each box is labeled with one of the road damage classes proposed by Maeda et al. [9] (which is adopted from the Japan Road Association [17]).
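Such per-image annotations are commonly distributed with this dataset as PASCAL VOC-style XML files; assuming that format, a minimal sketch of reading one image's ground-truth boxes and class labels might look as follows (the file path is illustrative):

```python
import xml.etree.ElementTree as ET

def load_ground_truth(xml_path):
    """Parse a PASCAL VOC-style annotation file into (class, box) pairs.

    Each <object> element holds a damage class name (e.g., "D00") and a
    ground-truth box given by its top-left and bottom-right pixel corners.
    """
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        label = obj.find("name").text          # one of D00..D44 (Table I)
        bb = obj.find("bndbox")
        box = tuple(int(float(bb.find(k).text))
                    for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, box))
    return boxes

# Illustrative usage; the annotation path is hypothetical.
# print(load_ground_truth("annotations/road_00001.xml"))
```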
TABLE I: Road Damage Types [9]

Damage Type                              Detail                                Class Name
Crack / Linear Crack / Longitudinal      Wheel mark part                       D00
Crack / Linear Crack / Longitudinal      Construction joint part               D01
Crack / Linear Crack / Lateral           Equal interval                        D10
Crack / Linear Crack / Lateral           Construction joint part               D11
Crack / Alligator Crack                  Partial pavement, overall pavement    D20
Other Corruption                         Rutting, bump, pothole, separation    D40
Other Corruption                         Crosswalk blur                        D43
Other Corruption                         White/Yellow line blur                D44

Fig. 1: Image Examples for the Road Damage Types [9]; panels (a)–(h) show classes D00, D01, D10, D11, D20, D40, D43, and D44.

As illustrated in Table I, the classification of road damage includes eight types, which can be generalized into two categories: cracks and other corruptions. The crack category comprises linear cracks and alligator cracks, and linear cracks can be longitudinal or lateral. Meanwhile, the other-corruption category includes three sub-categories: potholes and rutting, white line blur, and crosswalk blur. Examples of these types of road damage are shown in Fig. 1.

B. Background on Object Detection Algorithms

An object detection algorithm analyzes the visual content of an image to recognize instances of a certain object category, then outputs the category and location of the detected objects. With the emergence of deep convolutional networks, many CNN-based object detection algorithms have been introduced. The first was the Regions with CNN features (R-CNN) method [18], which tackles object detection in two steps: object region proposal and classification. The object region proposal step employs a selective search to generate multiple regions, which are then processed and fed to a CNN classifier. R-CNN is slow due to the repetitive CNN evaluation; hence, many other algorithms have been proposed to optimize it (e.g., Fast R-CNN [19]). Unlike the R-CNN-based algorithms, the “You Only Look Once” (YOLO) method [16] merges the two steps of the R-CNN pipeline into one by developing a neural network which internally divides the image into regions and predicts categories and probabilities for each region. Applying the prediction once gives YOLO a significant speedup compared to the R-CNN-based algorithms; hence, it can be used for real-time prediction.
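To make the single-pass behavior concrete, the following is a minimal inference sketch using OpenCV's DNN module to run a YOLOv3-style network over one image; the cfg/weights/image file names are placeholders rather than the challenge model, and the 416x416 input size is the common YOLOv3 default.

```python
import cv2
import numpy as np

# Placeholder file names; any YOLOv3-style cfg/weights pair would work here.
net = cv2.dnn.readNetFromDarknet("yolov3-road.cfg", "yolov3-road.weights")

img = cv2.imread("street.jpg")
h, w = img.shape[:2]

# A single forward pass over the whole image -- no region-proposal stage.
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, scores, labels = [], [], []
for out in outputs:                    # one array per YOLO detection layer
    for det in out:                    # det = [cx, cy, bw, bh, objectness, p(c0), ...]
        class_probs = det[5:]
        c = int(np.argmax(class_probs))
        conf = float(det[4] * class_probs[c])
        if conf >= 0.01:               # cf. the smallest confidence threshold C in Sec. III
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(conf)
            labels.append(c)
# Overlapping boxes are then reduced with non-maximum suppression (see Sec. III).
```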
C. Deep Learning Approach

To solve the road damage type detection problem, we consider road damage as a unique object to be detected. In particular, each of the different road damage types is treated as a distinguishable object. Then, we train one of the state-of-the-art object detection algorithms (i.e., YOLO) on the road damage dataset to learn the visual patterns of each road damage type (see Fig. 2).

Fig. 2: A Deep Learning Approach for Road Damage Detection and Classification
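In practice, treating the eight damage types as detector objects reduces to listing them as the class names the framework trains against. A small setup sketch, assuming the usual darknet obj.names/obj.data conventions (all paths are illustrative):

```python
from pathlib import Path

# The eight damage types from Table I become the detector's object classes.
CLASSES = ["D00", "D01", "D10", "D11", "D20", "D40", "D43", "D44"]

Path("data").mkdir(exist_ok=True)
Path("data/obj.names").write_text("\n".join(CLASSES) + "\n")

# Darknet-style metadata file pointing at train/valid image lists.
Path("data/obj.data").write_text(
    f"classes = {len(CLASSES)}\n"
    "train = data/train.txt\n"
    "valid = data/valid.txt\n"
    "names = data/obj.names\n"
    "backup = backup/\n"
)
# Training then proceeds with the darknet CLI against a pretrained
# darknet53 backbone, e.g.:
#   ./darknet detector train data/obj.data cfg/yolov3.cfg darknet53.conv.74
```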
III. EXPERIMENTS

A. Dataset and Settings

The dataset provided by the BigData Cup Challenge consists of two sets: training images (7,231) and testing images (1,813). Every image in the training set is annotated with one or more ground-truth boxes, each corresponding to one of the eight types of road damage. The distribution of the training dataset among the road damage classes is shown in Table II.
TABLE II: The Distribution of Training Datasets (Original, Augmented, and Cropped) among the Road Damage Classes

Dataset    D00    D01    D10    D11    D20    D40    D43    D44
D         1747   2856    660    683   1801    371    587   2856
Da        1984   3267   1349   1372   1916    756    613   4060
Dc        1747   2856    660    683   1801    371    587   2856

TABLE III: Parameter Values for Experiments

Parameter    Values
T            20k, 25k, 30k, 35k, 40k, 45k, 50k, 55k, 60k, 65k, 70k
C            0.01, 0.05, 0.1, 0.15, 0.2
NMS          0.45, 0.75, 0.85, 0.95, 0.999

TABLE IV: F1 Scores for the Model Trained using D

T \ C     0.01      0.05      0.1       0.15      0.2
20k     0.53956   0.57488   0.57683   0.57637   0.57750
25k     0.53839   0.56616   0.57343   0.57532   0.57229
30k     0.56553   0.58418   0.58734   0.58585   0.58477
35k     0.56888   0.58526   0.58899   0.58668   0.58703
40k     0.55349   0.57198   0.57502   0.57968   0.58046
45k     0.57658   0.58770   0.59411   0.58977   0.58588
50k     0.57203   0.57988   0.58059   0.58083   0.57342
55k     0.58432   0.59115   0.59297   0.59325   0.59151
60k     0.57205   0.58154   0.58407   0.57814   0.57637
65k     0.56402   0.57580   0.57536   0.57717   0.57653
70k     0.55960   0.57423   0.57286   0.57781   0.57496

TABLE V: F1 Scores for the Model Trained using Da

T \ C     0.01      0.05      0.1       0.15      0.2
20k     0.54979   0.56940   0.57063   0.57303   0.57393
25k     0.57178   0.58858   0.58433   0.58639   0.58759
30k     0.55135   0.56225   0.57268   0.57013   0.56740
35k     0.56149   0.57861   0.58590   0.58163   0.58304
40k     0.57018   0.58282   0.58410   0.58062   0.57663
45k     0.56432   0.57993   0.58056   0.57528   0.57189
50k     0.57097   0.58674   0.58371   0.57791   0.57927
55k     0.57441   0.58498   0.58734   0.58853   0.58749
60k     0.57993   0.59094   0.59211   0.58587   0.58485
65k     0.57102   0.57671   0.58116   0.58791   0.58644
70k     0.56117   0.56640   0.57049   0.56692   0.56735
It is evident that the distribution of the dataset among the road damage classes is not balanced: some classes (i.e., D10, D11, D40, and D43) have smaller numbers of images compared to the other classes. Hence, we used the Python Augmentor library (https://augmentor.readthedocs.io/en/master/) to generate synthesized images for training images that contain such classes.5 During augmentation, we carefully selected the image processing techniques (e.g., brightening, gray-scale) provided by the Augmentor tool to ensure that the road damage scenes were not affected.6
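A minimal sketch of such an augmentation pipeline with the Augmentor library is shown below; the source directory and sample count are illustrative, and only photometric operations are applied so that the annotated damage regions stay valid (cf. footnote 6).

```python
import Augmentor

# Pipeline over the images of an under-represented class (illustrative path).
p = Augmentor.Pipeline("training/D40")

# Photometric operations only: they alter appearance without moving the
# damage regions, so the original ground-truth boxes remain usable.
p.random_brightness(probability=0.5, min_factor=0.7, max_factor=1.3)
p.greyscale(probability=0.3)

# Write e.g. 500 synthesized variants into training/D40/output/.
p.sample(500)
```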
Another technique that we considered in processing the training dataset is cropping. We cropped every image to create a smaller version of the original image while making sure that the cropped image contains the annotated regions. Such a dataset enables the training model to focus on learning the features of the regions of interest properly by discarding irrelevant scenes (e.g., the sky view). Consequently, we had three training datasets: original (D), augmented (Da), and cropped (Dc).
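The cropping step can be sketched as follows: keep the smallest window that still covers every annotated box, padded by a margin, so irrelevant upper-image content (e.g., sky) is discarded without cutting any ground-truth region. The margin value and the use of Pillow are our own illustrative choices, not necessarily the exact procedure used to build Dc.

```python
from PIL import Image

def crop_to_annotations(image_path, boxes, margin=32):
    """Crop an image to the tightest window covering all ground-truth
    boxes, padded by `margin` pixels, without cutting any annotation.

    `boxes` is a list of (xmin, ymin, xmax, ymax) tuples.
    """
    img = Image.open(image_path)
    xmin = max(0, min(b[0] for b in boxes) - margin)
    ymin = max(0, min(b[1] for b in boxes) - margin)
    xmax = min(img.width,  max(b[2] for b in boxes) + margin)
    ymax = min(img.height, max(b[3] for b in boxes) + margin)
    # Annotation coordinates must be shifted by (-xmin, -ymin) afterwards.
    return img.crop((xmin, ymin, xmax, ymax)), (xmin, ymin)
```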
To create an object detector, we fine-tuned the darknet53 model using the YOLO framework (version 3). The model was trained on the road damage classes. To evaluate our solution thoroughly, we generated different versions of the trained models using each dataset (D, Da, and Dc) by varying three parameters: the number of iterations for model training (T), the minimum confidence threshold (C), and non-maximum suppression (NMS).7 Every model was trained for up to 70k iterations, and we preserved a snapshot of the trained model at certain numbers of iterations. For every test image, YOLO generates a set of predicted boxes, and each box is tagged with a prediction confidence score and a predicted label. To avoid reporting boxes tagged with very low prediction confidence scores, we discarded the boxes whose confidence scores were below C. Furthermore, increasing the value of NMS (default value 0.45) increases the number of overlapping predicted boxes, thereby increasing the chances of correctly predicting a ground-truth box. The values of these three parameters are listed in Table III. In what follows, we report the F1 score of our results8 to evaluate our solution.

5 Since original training images may contain multiple ground-truth boxes, the augmented images changed the number of images for all classes.
6 Some image processing techniques, such as rotating, may lead to confusion (e.g., vertical vs. horizontal cracks) among road damage types.
7 Another parameter is the intersection over union (IoU); however, it was fixed at 0.5 per the challenge rules.
8 Since the provided test images do not have ground-truth boxes, we only reported F1 scores that were calculated using the website of the road damage detection challenge (https://bdc2018.mycityreport.net/).
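The roles of C and NMS can be made concrete with a small post-processing sketch (our own illustration of the standard greedy procedure, not the framework's internal code): boxes scoring below C are dropped, and a remaining box is suppressed only when it overlaps an already-kept, higher-scoring box by more than the NMS threshold, so a threshold near 1.0 suppresses almost nothing.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def filter_predictions(preds, conf_thresh, nms_thresh):
    """preds: list of (box, score) pairs for one class.

    Drop boxes below the confidence threshold C, then keep a box unless
    it overlaps an already-kept box by more than the NMS threshold.
    """
    preds = sorted((p for p in preds if p[1] >= conf_thresh),
                   key=lambda p: p[1], reverse=True)
    kept = []
    for box, score in preds:
        if all(iou(box, k[0]) <= nms_thresh for k in kept):
            kept.append((box, score))
    return kept
```

With nms_thresh = 0.999, nearly all overlapping boxes survive, which is the behavior exploited in Section III-B.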
B. Evaluation Results

Table IV shows the F1 scores of the detection model trained on D while varying the values of T and C. In general, increasing T (i.e., training the model for a larger number of iterations) or C (i.e., discarding a larger number of the predicted boxes with the lowest confidence scores) does not necessarily improve the performance of the model. Using D, the model achieved its best F1 score (i.e., 0.59411) when T = 45k and C = 0.1. Similarly, we evaluated our solution by training another model on Da. In general, the model using Da (see Table V) was slightly better than the one using D in some cases; however, the best F1 score using D was higher than the one using Da. Furthermore, the evaluation of the model trained on Dc is shown in Table VI. In general, the model trained on Dc outperformed neither of the other models. The model using Dc was not optimized because any object detection algorithm during the training phase uses only the regions of interest of each training image (i.e., the ground-truth boxes) rather than the entire image. The model using Dc achieved its best F1 score (i.e., 0.57688) when T = 60k and C = 0.1.
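Although the official scores were computed by the challenge server (see footnote 8), the metric itself is standard; the sketch below assumes a prediction counts as a true positive when its label matches a ground-truth box with IoU >= 0.5 (the challenge's fixed IoU), using a simple greedy one-to-one matching of our own choosing, and reusing the iou() helper from the preceding sketch.

```python
def f1_score(predictions, ground_truths, iou_thresh=0.5):
    """predictions / ground_truths: lists of (label, (xmin, ymin, xmax, ymax))
    pairs pooled over the test set; iou() is defined in the previous sketch."""
    tp, unmatched = 0, list(ground_truths)
    for label, box in predictions:
        hit = next((g for g in unmatched
                    if g[0] == label and iou(box, g[1]) >= iou_thresh), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)       # each ground-truth box matches once
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```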
We noticed that the training image dataset contains some images that have overlapping ground-truth boxes with multiple classes, as shown in Fig. 4. Therefore, to enable reporting multiple overlapping predicted boxes, we enlarged the value of the NMS parameter. Experimentally, we noticed that increasing the value of NMS improved the F1 score slightly. As shown in Fig. 3, the F1 score of all models is generally highest when NMS = 0.999. Thus, we exhaustively experimented with different combinations of the other parameters using NMS = 0.999 to obtain the best F1 score. The detection models trained on D, Da, and Dc achieved F1 scores of up to 0.61, 0.62, and 0.60, as shown in Figs. 3a, 3b, and 3c, respectively.

Fig. 3: The Impact of Varying the Value of NMS using Different Models: (a) with the model (D, T = 45k, C = 0.01); (b) with the model (Da, T = 25k, C = 0.05); (c) with the model (Dc, T = 55k, C = 0.15)

TABLE VI: F1 Scores for the Model Trained using Dc

T \ C     0.01      0.05      0.1       0.15      0.2
20k     0.52322   0.55720   0.56551   0.56611   0.56354
25k     0.54508   0.56571   0.56494   0.56590   0.56752
30k     0.53448   0.56157   0.56372   0.56292   0.56517
35k     0.55065   0.57252   0.57662   0.57326   0.57088
40k     0.53651   0.55193   0.54901   0.55348   0.55078
45k     0.52906   0.55767   0.55419   0.55642   0.55144
50k     0.54435   0.55956   0.56084   0.55968   0.55880
55k     0.54814   0.56924   0.57683   0.57304   0.57525
60k     0.55776   0.57562   0.57688   0.57339   0.57208
65k     0.54976   0.57010   0.57081   0.56882   0.56877
70k     0.54665   0.56253   0.56369   0.55575   0.55151

Fig. 4: An Image Containing Overlapped Ground-truth Boxes

IV. CONCLUSION

In this study, we developed an image-based solution for monitoring urban streets. Our solution uses YOLO to train a model that detects various types of road damage as distinguishable objects in the analyzed images. The solution achieved an F1 score of up to 0.62 by augmenting the low-cardinality classes of the training set with synthesized images (to optimize learning at the training phase) and by using a high value of non-maximum suppression at prediction time (to increase the overlapping predicted boxes and enhance the prediction).

ACKNOWLEDGMENT

This research has been supported in part by NSF grants IIS-1320149 and CNS-1461963, the USC Integrated Media Systems Center, and unrestricted cash gifts from Oracle. The opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any of the sponsors.

REFERENCES

[1] G. Malkoc, "The importance of road maintenance," 2015. [Online]. Available: http://www.worldhighways.com/categories/maintenance-utility/features/the-importance-of-road-maintenance/
[2] "Study: Pothole damage costs U.S. drivers $3B a year," 2016. [Online]. Available: https://www.insurancejournal.com/magazines/mag-features/2016/03/21/401900.htm
[3] American Road & Transportation Builders Association, 2018.
[4] B. X. Yu and X. Yu, "Vibration-based system for pavement condition evaluation," in AATT, 2006, pp. 183–189.
[5] Q. Li, M. Yao, X. Yao, and B. Xu, "A real-time 3D scanning system for pavement distortion inspection," MST, vol. 21, no. 1, p. 015702, 2009.
[6] A. Zhang, K. C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen, "Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network," CACAIE, vol. 32, no. 10, pp. 805–819, 2017.
[7] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, "Road crack detection using deep convolutional neural network," in ICIP. IEEE, 2016, pp. 3708–3712.
[8] B. Akarsu, M. Karaköse, K. Parlak, A. Erhan, and A. Sarimaden, "A fast and adaptive road defect detection approach using computer vision with real time implementation," IJAMEC, vol. 4, no. Special Issue-1, pp. 290–295, 2016.
[9] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, "Road damage detection and classification using deep neural networks with smartphone images," CACAIE, 2018.
[10] A. Alfarrarjeh, S. H. Kim, S. Agrawal, M. Ashok, S. Y. Kim, and C. Shahabi, "Image classification to determine the level of street cleanliness: A case study," in BigMM. IEEE, 2018.
[11] S. H. Kim, J. Shi, A. Alfarrarjeh, D. Xu, Y. Tan, and C. Shahabi, "Real-time traffic video analysis using Intel Viewmont coprocessor," in DNIS. Springer, 2013, pp. 150–160.
[12] A. Alfarrarjeh, S. Agrawal, S. H. Kim, and C. Shahabi, "Geo-spatial multimedia sentiment analysis in disasters," in DSAA. IEEE, 2017, pp. 193–202.
[13] A. Alfarrarjeh, C. Shahabi, and S. H. Kim, "Hybrid indexes for spatial-visual search," in ACM MM Thematic Workshops. ACM, 2017, pp. 75–83.
[14] A. Alfarrarjeh, T. Emrich, and C. Shahabi, "Scalable spatial crowdsourcing: A study of distributed algorithms," in MDM, vol. 1. IEEE, 2015, pp. 134–144.
[15] A. Alfarrarjeh, S. H. Kim, A. Deshmukh, S. Rajan, Y. Lu, and C. Shahabi, "Spatial coverage measurement of geo-tagged visual data: A database approach," in BigMM. IEEE, 2018, pp. 1–8.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in CVPR, 2016, pp. 779–788.
[17] Maintenance and Repair Guide Book of the Pavement 2013, 1st ed. Tokyo, Japan: Japan Road Association, Apr. 2017.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014, pp. 580–587.
[19] R. Girshick, "Fast R-CNN," in ICCV, 2015, pp. 1440–1448.
