A Deep Learning Approach For Road Damage Detection From Smartphone Images
Abstract—With recent advances in technology, it is feasible to conveniently monitor urban roads using various cameras, such as surveillance cameras, in-vehicle cameras, or smartphones, and to recognize their conditions by detecting specific types of road damage, so that maintenance resources can be planned efficiently based on the identified spots. This paper describes a road damage type detection and classification solution submitted to the IEEE BigData Cup Challenge 2018. Our solution is based on state-of-the-art deep learning methods for object detection. In particular, our approach uses an object detection algorithm to detect various types of road damage by training the detector on image examples categorized into a set of damage types defined by the Japan Road Association. We evaluated our approach thoroughly using different versions of trained models. Our experiments show that our approach achieves an F1 score of up to 0.62.

Index Terms—Deep Learning, Road Damage Detection and Classification, Object Detection, Urban Street Analysis

I. INTRODUCTION

The economy of cities is substantially affected by their public facilities and infrastructure. One fundamental element of such infrastructure is the road network. Many factors (e.g., rain and aging) cause different types of road damage that seriously impact road efficiency, driver safety¹, and the value of vehicles². Therefore, countries devote a large annual budget to road maintenance and rehabilitation. For example, according to statistics released by the Federal Highway Administration in the United States in 2013, the road network reached 4.12 million miles, and the government allocated $30.1 billion for constructing new roads and maintaining the existing ones [3].

Efficient road maintenance requires a reliable monitoring system. A straightforward method is human visual inspection; however, it is impractical because it is expensive, laborious, and time-consuming. Therefore, researchers have developed various solutions for automatic road damage inspection, including vibration-based [4], laser-scanning-based [5], and image-based [6]–[9] methods. While vibration-based detection is limited to the parts of the road that are physically contacted, laser-scanning methods provide accurate information about the status of roads; however, such methods are expensive and require a road closure. Meanwhile, image processing methods are inexpensive but may suffer from a lack of accuracy. In spite of this immaturity, recent advances in image analysis techniques have been producing impressive results, and their use is therefore increasing in various applications (e.g., street cleanliness [10], traffic flow analysis [11], situation awareness of disasters [12], and image search [13]).

A few researchers have developed image-based approaches for road surface inspection using state-of-the-art deep learning methods. In particular, some works focus on detecting only the existence of damage regardless of its type [6]. Other works focus on classifying road damage into a few types. For example, Zhang et al. [7] devised an approach for detecting cracks in two directions (i.e., horizontal and vertical), while Akarsu et al. [8] developed another approach for detecting three categories of damage, namely horizontal, vertical, and crocodile cracks. Because differentiating among damage types is critical for proper road maintenance planning, Maeda et al. [9] implemented an approach for a thorough classification of road damage types.

The focus of this study is automating the detection of the different types of road damage proposed by Maeda et al. [9] using smartphone images crowdsourced by city crews or the public³. Our approach uses one of the state-of-the-art deep learning algorithms (i.e., YOLO [16]⁴) for an object detection task. The source code of our solution is available at https://github.com/dweeptrivedi/road-damage-detection.

The remainder of this paper is organized as follows. Section II introduces the classification of road damage, reviews image-based object detection algorithms, and presents our solution. In Section III, we report our experimental results. Finally, in Section IV, we conclude.

* These authors contributed equally to this work.
¹ In Europe, 50 million people are injured in traffic crashes annually [1]. Poor road conditions are a primary factor in traffic crashes.
² A study from the American Automobile Association (AAA) reports that road damage has cost U.S. drivers about $3 billion annually [2].
³ Spatial crowdsourcing mechanisms [14] can be used to collect more images at locations that are not sufficiently covered by visual information [15] for road inspection.
⁴ YOLO was chosen because it can run on both servers and edge devices (e.g., detection and classification can be performed on a smartphone).

II. ROAD DAMAGE DETECTION SOLUTION

A. Image Dataset

The images provided by the IEEE BigData Cup Challenge capture scenes of urban streets located in seven geographical areas in Japan: Ichihara city, Chiba city, Sumida ward, Nagakute city, Adachi city, Muroran city, and Numazu city. Each image is annotated with one or more regions of interest (referred to as ground-truth boxes), and each box is labeled with one of the road damage classes proposed by Maeda et al. [9] (which are adopted from the Japan Road Association [17]). As illustrated in Table I, the classification of road damage includes eight types, which can be generalized into two categories: cracks and other corruptions. The crack category comprises linear and alligator cracks, and linear cracks can be either longitudinal or lateral. Meanwhile, the other-corruptions category includes three subcategories: potholes and rutting, white line blur, and crosswalk blur. Examples of these types of road damage are shown in Fig. 1.

TABLE I: Road Damage Types [9]
Damage Type | Detail | Class Name
Crack, linear, longitudinal | Wheel mark part | D00
Crack, linear, longitudinal | Construction joint part | D01
Crack, linear, lateral | Equal interval | D10
Crack, linear, lateral | Construction joint part | D11
Crack, alligator | Partial pavement, overall pavement | D20
Other corruption | Rutting, bump, pothole, separation | D40
Other corruption | Crosswalk blur | D43
Other corruption | White/Yellow line blur | D44

B. Background on Object Detection Algorithms

An object detection algorithm analyzes the visual content of an image to recognize instances of a certain object category, then outputs the category and location of the detected objects. With the emergence of deep convolutional networks, many CNN-based object detection algorithms have been introduced. The first of these is the Regions with CNN features (R-CNN) method [18], which tackles object detection in two steps: object region proposal and classification. The region proposal step employs selective search to generate multiple candidate regions, which are then processed and fed to a CNN classifier. R-CNN is slow because the CNN is evaluated repeatedly, once for each proposed region. Hence, many other algorithms have been proposed to optimize R-CNN (e.g., Fast R-CNN [19]). Unlike the R-CNN-based algorithms, the "You Only Look Once" (YOLO) method [16] takes a different approach: it merges the two steps of R-CNN into one by using a single neural network that internally divides the image into regions and predicts bounding boxes and class probabilities for each region. Because the network is applied to the image only once, YOLO achieves a significant speedup compared to R-CNN-based algorithms and can therefore be used for real-time prediction.

C. Deep Learning Approach

To solve the road damage type detection problem, we consider road damage as a unique object to be detected. In particular, each road damage type is treated as a distinguishable object. We then train one of the state-of-the-art object detection algorithms (i.e., YOLO) on the road damage dataset so that it learns the visual patterns of each road damage type (see Fig. 2).

Fig. 2: A Deep Learning Approach for Road Damage Detection and Classification

III. EXPERIMENTS

A. Dataset and Settings

The dataset provided by the BigData Cup Challenge consists of two sets: training images (7,231) and testing images (1,813). Every image in the training set is annotated with one or more ground-truth boxes, each of which corresponds to one of the eight types of road damage. The distribution of the training dataset among the road damage classes is given in Table II.
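Because each of the eight damage classes is treated as its own object category, training a YOLO-style detector starts by turning every labeled ground-truth box into the detector's label format. The following is a minimal sketch, not the authors' pipeline: it assumes boxes are stored as pixel corner coordinates (xmin, ymin, xmax, ymax), uses hypothetical helper names (`to_yolo_label`, `responsible_cell`, `yolo_v1_output_shape`), and takes the grid size S = 7 and B = 2 boxes per cell from the original YOLO paper [16] rather than from any configuration reported here.

```python
# Minimal sketch (assumptions flagged below): turn one labeled ground-truth box
# into the normalized "class cx cy w h" line that YOLO-style trainers read.
# ASSUMPTION: boxes are pixel corners (xmin, ymin, xmax, ymax); all helper names
# are hypothetical and not taken from the authors' code.

# The eight damage classes of Table I, mapped to integer ids 0..7.
CLASS_NAMES = ["D00", "D01", "D10", "D11", "D20", "D40", "D43", "D44"]
CLASS_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def to_yolo_label(name, box, img_w, img_h):
    """Return (class_id, cx, cy, w, h), with coordinates normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / img_w  # box center as a fraction of image width
    cy = (ymin + ymax) / 2.0 / img_h  # box center as a fraction of image height
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return CLASS_ID[name], cx, cy, w, h


def format_label(label):
    """Serialize a label tuple as one text line."""
    class_id, cx, cy, w, h = label
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def responsible_cell(cx, cy, s=7):
    """(row, col) of the S x S grid cell containing the box center.

    In YOLO v1 [16], this cell is the one responsible for predicting the object.
    """
    col = min(int(cx * s), s - 1)  # clamp so cx == 1.0 stays inside the grid
    row = min(int(cy * s), s - 1)
    return row, col


def yolo_v1_output_shape(s=7, b=2, num_classes=len(CLASS_NAMES)):
    """Shape of the YOLO v1 prediction tensor: S x S x (B * 5 + C)."""
    return s, s, b * 5 + num_classes


if __name__ == "__main__":
    # A made-up alligator crack (D20) box in a 600 x 600 image.
    label = to_yolo_label("D20", (120, 300, 360, 480), 600, 600)
    print(format_label(label))                   # 4 0.400000 0.650000 0.400000 0.300000
    print(responsible_cell(label[1], label[2]))  # (4, 2)
    print(yolo_v1_output_shape())                # (7, 7, 18)
```

With eight classes, the YOLO v1 output tensor would be 7 × 7 × 18 under these defaults. Later YOLO versions use anchor-based prediction heads with a different layout, so the shape is illustrative only.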
TABLE II: The Distribution of Training Datasets (Original, Augmented, and Cropped) among the Road Damage Classes
Dataset | D00 | D01 | D10 | D11 | D20 | D40 | D43 | D44
D | 1747 | 2856 | 660 | 683 | 1801 | 371 | 587 | 2856
D_a | 1984 | 3267 | 1349 | 1372 | 1916 | 756 | 613 | 4060
D_c | 1747 | 2856 | 660 | 683 | 1801 | 371 | 587 | 2856

TABLE III: Parameter Values for Experiments
Parameter | Values
T | 20k, 25k, 30k, 35k, 40k, 45k, 50k, 55k, 60k, 65k, 70k
C | 0.01, 0.05, 0.1, 0.15, 0.2
NMS | 0.45, 0.75, 0.85, 0.95, 0.999

TABLE IV: F1 Scores for the Model Trained using D
T \ C | 0.01 | 0.05 | 0.1 | 0.15 | 0.2
20k | 0.53956 | 0.57488 | 0.57683 | 0.57637 | 0.57750
25k | 0.53839 | 0.56616 | 0.57343 | 0.57532 | 0.57229
30k | 0.56553 | 0.58418 | 0.58734 | 0.58585 | 0.58477
35k | 0.56888 | 0.58526 | 0.58899 | 0.58668 | 0.58703
40k | 0.55349 | 0.57198 | 0.57502 | 0.57968 | 0.58046
45k | 0.57658 | 0.58770 | 0.59411 | 0.58977 | 0.58588
50k | 0.57203 | 0.57988 | 0.58059 | 0.58083 | 0.57342
55k | 0.58432 | 0.59115 | 0.59297 | 0.59325 | 0.59151
60k | 0.57205 | 0.58154 | 0.58407 | 0.57814 | 0.57637
65k | 0.56402 | 0.57580 | 0.57536 | 0.57717 | 0.57653
70k | 0.55960 | 0.57423 | 0.57286 | 0.57781 | 0.57496
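Table III varies C and the NMS threshold applied at prediction time. The sketch below treats C as the minimum confidence score a detection needs in order to be kept (our reading; the definition is not stated in this section) and shows greedy non-maximum suppression followed by an F1 computation. The matching rule (same class, IoU ≥ 0.5, one-to-one) is an assumption about the challenge metric, and the function names are hypothetical. The toy example mirrors the situation in Fig. 4: when ground-truth boxes themselves overlap, a low NMS threshold discards a correct prediction, while a threshold near 1 keeps it.

```python
# Minimal sketch, not the authors' evaluation code. ASSUMPTIONS: boxes are
# (xmin, ymin, xmax, ymax); NMS is applied per class; a prediction is a true
# positive if it matches an unmatched ground-truth box of the same class with
# IoU >= 0.5. Parameter names mirror Table III (C and NMS) for readability only.


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, conf_thresh, nms_thresh):
    """Greedy class-wise NMS over detections (class_name, score, box).

    Detections scoring below conf_thresh are dropped. A remaining detection is
    suppressed if it overlaps an already kept, higher-scoring detection of the
    same class with IoU > nms_thresh, so nms_thresh = 0.999 keeps nearly all.
    """
    cands = sorted((d for d in dets if d[1] >= conf_thresh),
                   key=lambda d: d[1], reverse=True)
    kept = []
    for cls, score, box in cands:
        if all(k_cls != cls or iou(k_box, box) <= nms_thresh
               for k_cls, _, k_box in kept):
            kept.append((cls, score, box))
    return kept


def f1_score(preds, gts, iou_thresh=0.5):
    """F1 = 2TP / (2TP + FP + FN) with greedy one-to-one box matching."""
    matched = set()
    tp = 0
    for cls, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, iou_thresh
        for idx, (g_cls, g_box) in enumerate(gts):
            if idx in matched or g_cls != cls:
                continue
            overlap = iou(box, g_box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy image with two overlapping D00 ground-truth boxes (IoU = 2/3).
    gts = [("D00", (0, 0, 100, 100)), ("D00", (20, 0, 120, 100))]
    dets = [("D00", 0.9, (0, 0, 100, 100)), ("D00", 0.8, (20, 0, 120, 100))]
    for nms_thresh in (0.45, 0.999):
        kept = nms(dets, conf_thresh=0.1, nms_thresh=nms_thresh)
        print(nms_thresh, len(kept), round(f1_score(kept, gts), 3))
    # 0.45 1 0.667
    # 0.999 2 1.0
```

In this contrived case, raising the NMS threshold from 0.45 to 0.999 lifts F1 from 0.667 to 1.0; on real data the effect is much smaller, consistent with the slight improvement reported in Section III.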
TABLE VI: F1 Scores for the Model Trained using D_c
T \ C | 0.01 | 0.05 | 0.1 | 0.15 | 0.2
20k | 0.52322 | 0.55720 | 0.56551 | 0.56611 | 0.56354
25k | 0.54508 | 0.56571 | 0.56494 | 0.56590 | 0.56752
30k | 0.53448 | 0.56157 | 0.56372 | 0.56292 | 0.56517
35k | 0.55065 | 0.57252 | 0.57662 | 0.57326 | 0.57088
40k | 0.53651 | 0.55193 | 0.54901 | 0.55348 | 0.55078
45k | 0.52906 | 0.55767 | 0.55419 | 0.55642 | 0.55144
50k | 0.54435 | 0.55956 | 0.56084 | 0.55968 | 0.55880
55k | 0.54814 | 0.56924 | 0.57683 | 0.57304 | 0.57525
60k | 0.55776 | 0.57562 | 0.57688 | 0.57339 | 0.57208
65k | 0.54976 | 0.57010 | 0.57081 | 0.56882 | 0.56877
70k | 0.54665 | 0.56253 | 0.56369 | 0.55575 | 0.55151

Fig. 4: An Image Containing Overlapped Ground-truth Boxes

multiple overlapped predicted boxes, we enlarged the values of the NMS parameter. Experimentally, we noticed that increasing the value of NMS improved the F1 score slightly. As shown in Fig. 3, in general, all models achieve their highest F1 score when NMS = 0.999. Thus, we exhaustively tested different combinations of the other parameters with NMS = 0.999 to obtain the best F1 score. The detection models trained using D, D_a, and D_c achieved F1 scores of up to 0.61, 0.62, and 0.60, as shown in Figs. 3a, 3b, and 3c, respectively.

IV. CONCLUSION

In this study, we developed an image-based solution for monitoring urban streets. Our solution uses YOLO to train a model that detects various types of road damage as distinguishable objects in the analyzed images. The solution achieved an F1 score of up to 0.62 by adding synthesized images for the low-cardinality classes of the training set (to improve learning during the training phase) and by using a high non-maximum suppression (NMS) threshold at prediction time (which retains more overlapping predicted boxes and thus improves the prediction).

ACKNOWLEDGMENT

This research has been supported in part by NSF grants IIS-1320149 and CNS-1461963, the USC Integrated Media Systems Center, and unrestricted cash gifts from Oracle. The opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any of the sponsors.

REFERENCES

[1] G. Malkoc, “The importance of road maintenance,” 2015. [Online]. Available: http://www.worldhighways.com/categories/maintenance-utility/features/the-importance-of-road-maintenance/
[2] “Study: Pothole damage costs U.S. drivers $3B a year,” 2016. [Online]. Available: https://www.insurancejournal.com/magazines/mag-features/2016/03/21/401900.htm
[3] “American Road & Transportation Builders Association,” 2018. [Online]. Available: https://www.insurancejournal.com/magazines/mag-features/2016/03/21/401900.htm
[4] B. X. Yu and X. Yu, “Vibration-based system for pavement condition evaluation,” in AATT, 2006, pp. 183–189.
[5] Q. Li, M. Yao, X. Yao, and B. Xu, “A real-time 3D scanning system for pavement distortion inspection,” MST, vol. 21, no. 1, p. 015702, 2009.
[6] A. Zhang, K. C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen, “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network,” CACAIE, vol. 32, no. 10, pp. 805–819, 2017.
[7] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in ICIP. IEEE, 2016, pp. 3708–3712.
[8] B. Akarsu, M. Karaköse, K. Parlak, A. Erhan, and A. Sarimaden, “A fast and adaptive road defect detection approach using computer vision with real time implementation,” IJAMEC, vol. 4, no. Special Issue-1, pp. 290–295, 2016.
[9] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, “Road damage detection and classification using deep neural networks with smartphone images,” CACAIE.
[10] A. Alfarrarjeh, S. H. Kim, S. Agrawal, M. Ashok, S. Y. Kim, and C. Shahabi, “Image classification to determine the level of street cleanliness: A case study,” in BigMM. IEEE, 2018.
[11] S. H. Kim, J. Shi, A. Alfarrarjeh, D. Xu, Y. Tan, and C. Shahabi, “Real-time traffic video analysis using Intel Viewmont coprocessor,” in DNIS. Springer, 2013, pp. 150–160.
[12] A. Alfarrarjeh, S. Agrawal, S. H. Kim, and C. Shahabi, “Geo-spatial multimedia sentiment analysis in disasters,” in DSAA. IEEE, 2017, pp. 193–202.
[13] A. Alfarrarjeh, C. Shahabi, and S. H. Kim, “Hybrid indexes for spatial-visual search,” in ACM MM Thematic Workshops. ACM, 2017, pp. 75–83.
[14] A. Alfarrarjeh, T. Emrich, and C. Shahabi, “Scalable spatial crowdsourcing: A study of distributed algorithms,” in MDM, vol. 1. IEEE, 2015, pp. 134–144.
[15] A. Alfarrarjeh, S. H. Kim, A. Deshmukh, S. Rajan, Y. Lu, and C. Shahabi, “Spatial coverage measurement of geo-tagged visual data: A database approach,” in BigMM. IEEE, 2018, pp. 1–8.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in CVPR, 2016, pp. 779–788.
[17] Maintenance and Repair Guide Book of the Pavement 2013, 1st ed. Tokyo, Japan: Japan Road Association, Apr. 2017.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014, pp. 580–587.
[19] R. Girshick, “Fast R-CNN,” in ICCV, 2015, pp. 1440–1448.