Fall Prevention From Ladders Utilizing a Deep Learning-Based Height Assessment Method
April 8, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3164676
ABSTRACT According to the Center for Construction Research and Training (CPWR) and the Korea
Occupational Safety & Health Agency (KOSHA), falls from ladders are a leading cause of fatalities. The
current safety inspection process to enforce height-related rules is manual and time-consuming. It requires
the physical presence of a safety manager, for whom it is sometimes impossible to monitor an entire area in
which ladders are being used. Deep learning-based computer vision technology has the potential to capture a
large amount of useful information from a digital image. Therefore, this paper presents a deep learning-based
height assessment method using a single known value in an image to measure working height, monitor
compliance with safety rules, and ensure worker safety. The proposed method comprises (1) extraction of safety
rules related to the A-type ladder from the KOSHA database; (2) object detection using the Single Shot MultiBox
Detector (SSD); (3) a height-computing module (HCM) to estimate the working height of the worker (how
high a worker is from the ground); and (4) classification of worker behavior (using the developed SSD-based
HCM) based on the best practices derived from the KOSHA database. The developed algorithm has been
tested on four different scenarios based on KOSHA safety rules, with heights ranging from under 1.2 m to
over 2 m. Additionally, the proposed method was evaluated on 300 images for binary classification (safe and
unsafe) and achieved an overall accuracy of 85.33%, verifying its feasibility for intelligent height estimation
and compliance monitoring.
INDEX TERMS Falls from ladders, vision intelligence-based monitoring, construction safety rules, vision-
based height estimation, deep learning.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 36725
S. Anjum et al.: Fall Prevention From Ladders Utilizing Deep Learning-Based Height Assessment Method
accounting for more than one-third of all industrial accidents. According to an industrial accident survey conducted by the Korea Occupational Safety and Health Agency (KOSHA) from 2009 to 2017, fall-related accidents accounted for 47.7–52.1% of all deaths in the construction industry [6]. Likewise, a 2017 report by Sim and Kang [7] stated that ladder fatalities accounted for 31% of all industrial fatalities between 2005 and 2014. These statistics illustrate that falls from ladders (FFL) are a severe problem in the construction industry in terms of worker fatalities and injuries, and they should be addressed effectively to ensure worker safety.
FFLs are caused by a defective ladder, climbing a ladder while carrying material, climbing a free ladder, or contravening safety rules [8], [9]. The safety rules of the Occupational Safety and Health Administration (OSHA), the International Organization for Standardization (ISO), and KOSHA [6], [10], [11] are designed to reduce FFHs and FFLs. Accidents caused by rule violations can result in workers being laid off, a lower quality of life, and health issues, which in turn harm a company's production, finances, and reputation [12]. Unsafe worker behavior has been recognized as a key factor in construction-related accidents [13], so monitoring it is important for reducing risks at construction sites. Therefore, safety managers should constantly monitor workers to mitigate the factors mentioned above. However, this monitoring process is manual (safety managers must be physically present at construction workplaces to identify potential hazards or non-compliance with safety rules) and time-consuming, and it relies on the safety managers' personal experience and competence [14]. Safety training programs are another effective and standard approach to reducing unsafe behavior; countries with strong occupational safety rules require workers to hold a training certification before working at construction sites. Trained workers have adequate knowledge to understand the consequences of unsafe behavior while working at heights; however, they do not take safety rules (such as wearing proper personal protective equipment) seriously, and unsafe behavior sometimes stems from a lack of concern for their own safety [15]. To deal with the underestimation and inadequate awareness of risks, researchers have aimed to improve risk management by developing risk management methods for worker safety, which can be categorized as proactive or reactive [16]. Proactive risk analysis methods are preferred for worker protection owing to their ability to collect and analyze data in real time from cameras and sensors. Owing to the complexity of construction sites, however, the use of sensor technology is limited. Computer vision (CV)-based technology, which analyzes risk from data collected by on-site cameras, is an ideal solution for construction sites congested with material, workers, and equipment, and it is preferable for workers who do not want to attach sensors to their bodies [14].
CV-based object detection follows two approaches: (1) traditional and (2) deep learning. Before the emergence of deep learning technology, traditional object detection algorithms such as feature descriptors (BRIEF, SIFT, and SURF) were used to extract useful information from digital images [17]. The main problem with traditional approaches is the selection of important (handcrafted) features from images; as the number of classes increases, the feature extraction process becomes more difficult. Thus, researchers introduced convolutional neural network (CNN)-based deep learning technology, which can automatically extract and recognize features from a static image by stacking multiple convolutional and pooling layers.
In recent years, CNNs have played a positive role in CV and pattern recognition [18], [19]. Building on this, researchers have incorporated deep learning-based technology in the construction domain for automated documentation, safety monitoring, hazard detection, and defect detection [14], [20], [21]. For instance, Thakar et al. [22] used an object detector (SSD) with non-maximum suppression as a base model for asset monitoring at construction sites. In addition, they used affinity propagation clustering to enhance the performance of SSD, offering an optimum balance between speed and accuracy. Similarly, Ma et al. [23] proposed a quality inspection framework that combined an object detector (SSD), used to detect five different types of defects, with building information modeling to improve productivity and avoid unnecessary deviations caused by human judgment. Zhong et al. [24] proposed a deep learning-based text classification approach for accident reports that integrated natural language processing and a CNN, and they used latent Dirichlet allocation to understand the factors contributing to construction accidents. Their approach helped safety managers improve safety at construction sites by investigating accident reports.
Researchers have also extended the application of CV technology to worker safety to address serious FFH problems at construction sites. For instance, Khan et al. [25] proposed a mask region-based convolutional neural network (Mask R-CNN)-based detection algorithm to monitor workers' behavior while working on top of mobile scaffolds. Tang et al. [26] applied the two-stage object detection algorithm Faster R-CNN to monitor workers' behavior and ensure the safety of their faces, eyes, hands, and feet. Fang et al. [27] developed a Mask R-CNN model to identify the unsafe behavior of workers crossing structural supports during the construction of deep-pit foundations. Similarly, Wang et al. [28] proposed a vision-based system for worker safety that identifies and analyzes worker–equipment interactions to identify danger zones and generate safety alarms. Likewise, Khan et al. [29] developed a tag- and Internet of Things (IoT)-based safety hook to prevent falls from scaffolding while working at height.
However, few studies have focused on worker safety while working on ladders. For example, Seo et al. [30] conducted a study to understand the risks of falls and work-related musculoskeletal disorders while working on ladders by estimating musculoskeletal stress. Ding et al. [31] introduced a deep learning-based hybrid model comprising a CNN
and long short-term memory that automatically identified unsafe behavior of workers on a ladder. Piao et al. [32] proposed a dynamic fall risk assessment framework for construction workers that combined CV and a Bayesian network to reduce FFH by automatically detecting risk factors and improving risk assessment efficiency, using ladder work as a case study. Chen et al. [33] introduced a proactive worker safety risk evaluation framework that uses worker position and posture as quantitative indicators to classify workers' behavior; the authors combined IMU sensors with a vision-based 3D skeleton and ultra-wideband (UWB) to classify workers' behavior as safe or unsafe. Likewise, Han et al. [34] collected motion data with a Kinect® depth sensor and investigated motion analysis approaches to automatically recognize unsafe worker behavior while climbing a ladder. These approaches to detecting unsafe behavior on ladders use motion capture-based activity recognition models that can distinguish safety-related behaviors on a ladder.
Table 1 compares existing studies on worker safety with the proposed study in terms of the methods used, working height estimation, enforcement of safety rules at a specified working height, and targeted objects.
As Table 1 shows, despite extensive and in-depth research, previous studies exhibit two major limitations. First, they do not check the safety rules associated with a specific working height, which is particularly important because workers perform various tasks at different working heights at construction sites, and negligence in complying with safety rules can lead to hazardous situations. Second, motion capture-based activity recognition models are more computationally complex than object detection models. Furthermore, they have a higher false detection rate (lower accuracy) when the scenes to be monitored are closely correlated [34]. Therefore, an automated and less computationally intensive method to monitor workers' safety while they work on ladders is required. In light of these findings, we used vision-based technology to enhance worker safety. The contributions of this study are as follows:
• Occupational safety rules related to the A-type ladder were manually extracted from the KOSHA expert knowledge database (constituted under ISO 45001) [35] and incorporated into CV technology. The integrated technology replaces the manual safety inspection process with real-time monitoring of worker safety.
• A dataset for safety behavior detection was created from 21 videos of work on an A-type ladder; 1,825 frames were extracted and labeled for deep learning-based object detection.
• A height-computing module (HCM) leveraging a deep learning-based object detection approach (SSD) was developed to estimate the working height of a worker, using the coordinates of the detected objects as its source of information. Estimating the working height and checking the occupational rules relating to that height could significantly help prevent FFLs.
• The proposed method was validated on workers' safe and unsafe behavior using four cases derived from the KOSHA rules. The developed algorithm uses object detection as the base model for the HCM, which estimates the working height and compares it with the corresponding occupational safety rule.
II. RESEARCH METHODOLOGY
Various ladders are used during construction work; however, this research focuses on A-type ladders. These ladders are primarily used indoors for short-duration work, and falls from them are a particularly severe concern in the industry. Therefore, this study aimed to develop a vision-based HCM that automates worker safety monitoring (rule compliance) on an A-type ladder. This section outlines the research process used to develop the algorithm, which measures the working height from a vision sensor and recognizes unsafe worker behavior on an A-type ladder.
The systematic approach deployed in this work comprises four steps (Fig. 1): (1) problem identification and objective, (2) development of the algorithm, (3) experimental setup and results, and (4) evaluation and future work. In the first step, we identified FFL as the problem and concluded that the construction industry needs an automated solution to compute the working height and monitor workers' behavior when they use an A-type ladder. During the development stage (Step 2), dataset preparation, model training, and HCM development were performed. In Step 3, four scenarios were evaluated to verify the performance of the SSD and HCM modules. Finally, in Step 4, the SSD object detector and HCM were evaluated and discussed.
A. PROCESS FLOW
Fig. 2 depicts the workflow of the proposed method, which is described in detail in this section. Frames are extracted from the video, and for each frame, the algorithm performs visual recognition to detect a worker and an A-type ladder. Their pixel values are then compared, and the intersection between the two bounding boxes is calculated to determine whether the bounding box representing the worker is inside or outside the box representing the A-type ladder. If the two bounding boxes intersect and the bottom-left y-coordinate of the worker's box is less than that of the ladder's box (image y-coordinates increase downward), the worker is standing on the ladder, and the frame proceeds through the remaining stages of the algorithm; otherwise, the visual recognition process continues. If a worker is on the ladder, the next step is to determine the working height (how high the worker is from the ground). The corresponding KOSHA regulation is then examined to classify the worker's behavior based on the computed height.
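The on-ladder test described above can be sketched in a few lines. This is a minimal Python sketch, not the authors' MATLAB implementation; it assumes boxes in [x, y, width, height] pixel format with y increasing downward (so a box's bottom edge is y + height), and the names `Box`, `intersection_area`, and `is_on_ladder` are hypothetical. The overlap threshold of 1 follows the HCM description in Section II-D(3).

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box in image pixels: top-left (x, y), width w, height h.
    Image y grows downward, so the bottom edge of the box is at y + h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def bottom(self) -> float:
        return self.y + self.h


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes (0.0 if disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return float(max(dx, 0) * max(dy, 0))


def is_on_ladder(person: Box, ladder: Box, min_area: float = 1.0) -> bool:
    """Treat the worker as 'on the ladder' when the boxes overlap by more than
    min_area and the person's bottom edge lies above the ladder's bottom edge."""
    return intersection_area(person, ladder) > min_area and person.bottom < ladder.bottom


if __name__ == "__main__":
    ladder = Box(100, 200, 120, 300)    # ladder base at y = 500
    worker = Box(110, 50, 60, 350)      # feet at y = 400, above the base
    print(is_on_ladder(worker, ladder))  # True
```

Comparing bottom edges is what separates a worker standing on the rungs from one standing on the floor next to the ladder, since both can produce overlapping boxes.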
KOSHA defines specific rules for workers working at different heights on A-type ladders, which can be illustrated with an example. A construction worker on an A-type ladder at a height of up to 1.2 m exhibits safe behavior when wearing a helmet; otherwise, the behavior is considered unsafe. However, the rules change when the worker works on an A-type ladder at a height greater than 1.2 m but less than or equal to 2 m. In this case, the corresponding KOSHA rule states that the worker must wear a helmet and that two workers should work together; otherwise, the behavior is unsafe.
B. PROPOSED METHOD
To estimate the working height from the ground, we need to select a reference point in all frames of an entire video sequence and then compute the working height with respect to that reference point. Therefore, this study used a fixed A-type ladder height (1.7 m) as the reference point [37]. The A-type ladder is a modified version of the simple ladder that does not require additional support, as shown in Fig. 3. Fig. 4 depicts the coordinate system used by the CV-based approach to compute height from the pixel values of the detected objects.
Fig. 4(a) shows the objects detected in a sample image: a worker with a safety belt and a ladder with outriggers. Fig. 4(b) illustrates the coordinates of interest for these objects. For the person, $P_{l,t} = (X_l^p, Y_t^p, W_1^p, H_1^p)$ gives the top-left corner together with the box width and height, and $P_{l,b} = H_1^p + Y_t^p$ is the bottom-left y-coordinate. For the ladder, $L_{l,t} = (X_l^l, Y_t^l, W_2^l, H_2^l)$ and $L_{l,b} = H_2^l + Y_t^l$ are defined in the same way.
After extracting the coordinates of the detected objects in the digital image, the next step is to determine the working height of the worker on the A-type ladder. The SSD
FIGURE 1. Research approach for safety risk identification on ladders utilizing the Deep Learning-based height assessment
method.
$Y_t^p$ and $Y_t^l$ are the top-left y-coordinates of the bounding boxes corresponding to the worker and ladder, respectively. A fixed A-type ladder height is used as the reference point. The height of the A-type ladder and its corresponding pixel value are given as follows:
$$\text{Ladder\_Height} = 5.5\ \text{ft} \approx 1.7\ \text{m} \tag{5}$$
$$\text{Ladder\_Pixel\_value} = 100\% \tag{6}$$
The height of the ladder bounding box $H_2^l$ can be written as
$$H_2^l = \mathit{lbbh}_1 + \mathit{lbbh}_2 \tag{7}$$
where $\mathit{lbbh}_1$ and $\mathit{lbbh}_2$ are the lower and upper parts of the ladder bounding box height (in pixels), split at the worker's feet; and
$$\mathit{lbbh}_1 = L_{l,b} - P_{l,b} \tag{8}$$
Using Equation (8), the value of $\mathit{lbbh}_2$ can be determined as
$$\mathit{lbbh}_2 = H_2^l - \mathit{lbbh}_1 \tag{9}$$
The working height in pixels (i.e., the height at which the worker stands on the ladder, expressed relative to the ladder bounding box) is given as follows:
$$\text{Working\_height\_pixels} = \frac{\mathit{lbbh}_1}{H_2^l} \times \text{Ladder\_Pixel\_value} \tag{10}$$
The actual working height in feet can be obtained from Equations (5), (6), and (10) as follows:
$$\text{Working\_height\_feet} = \frac{\text{Working\_height\_pixels}}{\text{Ladder\_Pixel\_value}} \times \text{Ladder\_Height} \tag{11}$$
The working height in meters is given as follows:
$$\text{Working\_Height} = \text{Working\_height\_feet} \times 0.3048 \tag{12}$$
After obtaining the working height, the algorithm tests it against the corresponding KOSHA regulation to predict worker behavior.
C. ANALYSIS OF KOSHA RULES CORRELATED WITH A-TYPE LADDER
The KOSHA regulations consist of 13 chapters with multiple sections, comprising 657 standards, of which 277 are associated with the construction industry [38]. The Labor Standards Act of 1953 created the framework for Korea's industrial safety and health standards, which hold ISO 45001 accreditation, for establishing safety and health management systems (KOSHA 18001) at work [35]. Prompted by the rapid growth of industry between 1970 and 1980, KOSHA was founded in 1987. KOSHA amended the corresponding rules and regulations to meet the mandatory safety and health requirements of numerous industries with toxic and complex working environments. Since then, KOSHA has compiled and examined many cases, resulting in an expert knowledge database. Significant changes have been made to keep its policies in line with modern industry practices in the face of global competition. However, despite the strict implementation of occupational safety regulations, certain practices, such as working with A-type ladders, still follow traditional approaches. The ban on the use of mobile/portable ladders imposed by the Ministry of Employment and Labor in Korea and KOSHA was lifted in March 2019, and mobile/portable ladders can now be used following defined safety rules. Furthermore, it is recommended that a fall prevention device be installed when an A-type/mobile ladder is used.
Several safety measures have been proposed for the use of ladders. Typically, all ladders must be used on flat, solid, and non-slip floors [39]. The rules employed in the developed SSD-based HCM are listed below; the four case scenarios were designed based on them.
• If the working height is less than or equal to 1.2 m, the worker must wear a helmet.
• If the working height is greater than 1.2 m but less than 2 m, the worker must wear a helmet and work in a group of two workers. The use of the topmost rung is prohibited.
A CV algorithm is considered a better approach than manual inspection for implementing expert knowledge (safety regulations) at construction sites. Therefore, we manually analyzed and extracted the KOSHA rules related to A-type ladders and developed a vision-intelligence-based HCM to prevent FFL. Four case scenarios, listed in Table 2, are considered to classify a worker's behavior as safe or unsafe.
Case 1: If a worker ($W$) works at a height less than or equal to 1.2 m on a ladder without outriggers ($L_w$) and is wearing a helmet ($h$), the behavior is classified as safe ($B_s$) and represented as
$$\text{if } W \cap L_w \,\&\, h \in B_s \tag{13}$$
Case 2: If a worker ($W$) works at a height less than or equal to 1.2 m on a ladder without outriggers ($L_w$) and is not wearing a helmet ($h$), the behavior is classified as unsafe ($B_u$) and represented as
$$\text{if } W \cap L_w \,\&\, \neg h \in B_u \tag{14}$$
Case 3: If a worker ($W_1$) works at a height greater than 1.2 m but less than 2 m on a ladder without outriggers ($L_w$) while wearing a helmet ($h$), and two workers are working as a group ($W_1$, $W_2$), the behavior is classified as safe and represented as
$$\text{if } W_1 \cap L_w \,\&\, h \,\&\, W_2 \in B_s \tag{15}$$
Case 4: If a worker ($W_1$) works at a height greater than 1.2 m but less than or equal to 2 m on a ladder without outriggers ($L_w$) without the support of a co-worker ($W_2$) or without wearing a helmet ($h$), the behavior is classified as unsafe and represented as
$$\text{if } W_1 \cap L_w \,\&\, (\neg h \mid \neg W_2) \in B_u \tag{16}$$
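Equations (5)–(12) in Section II-B reduce to scaling the fraction $\mathit{lbbh}_1 / H_2^l$ by the known ladder height. The following is a minimal Python sketch of that computation, assuming the bottom y-coordinates and the ladder box height are supplied in pixels by the detector with y increasing downward; the function and constant names are hypothetical and are not taken from the authors' code.

```python
# Constants from Eqs. (5), (6) and (12)
LADDER_HEIGHT_FT = 5.5       # Eq. (5): reference A-type ladder height (about 1.7 m)
LADDER_PIXEL_VALUE = 100.0   # Eq. (6): full ladder box height expressed as 100 %
FEET_TO_METERS = 0.3048      # Eq. (12)


def working_height_m(person_bottom_y: float, ladder_bottom_y: float,
                     ladder_box_height: float) -> float:
    """Estimate how high the worker's feet are above the ladder base, in metres.

    person_bottom_y   -> P_{l,b}, bottom y of the person box (pixels)
    ladder_bottom_y   -> L_{l,b}, bottom y of the ladder box (pixels)
    ladder_box_height -> H_2^l, height of the ladder box (pixels)
    """
    if ladder_box_height <= 0:
        raise ValueError("ladder box height must be positive")
    # Eq. (8): lower part of the ladder box, from the base up to the worker's feet.
    # Eq. (9), lbbh2 = H_2^l - lbbh1, is not needed for the height itself.
    lbbh1 = ladder_bottom_y - person_bottom_y
    pct = lbbh1 / ladder_box_height * LADDER_PIXEL_VALUE    # Eq. (10)
    feet = pct / LADDER_PIXEL_VALUE * LADDER_HEIGHT_FT      # Eq. (11)
    return feet * FEET_TO_METERS                            # Eq. (12)


if __name__ == "__main__":
    # 300-px ladder box with its base at y = 500; worker's feet at y = 350
    print(round(working_height_m(350, 500, 300), 4))  # 0.8382
```

Because Eq. (10) multiplies by Ladder_Pixel_value and Eq. (11) divides by it, the percentage cancels: the estimate is simply $(\mathit{lbbh}_1 / H_2^l) \times 5.5$ ft. In the example, $\mathit{lbbh}_1 = 150$ px is 50% of the ladder box, giving 2.75 ft, or about 0.84 m.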
FIGURE 4. CV-based approach. (a) Examples of the detected objects. (b) Bounding box coordinates of a person and an A-type ladder.
TABLE 2. Selected case scenarios for predicting the behavior of workers on A-type ladders.
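The four cases in Table 2 (Equations (13)–(16)) can be encoded as a small rule function. This is a minimal Python sketch, assuming the helmet state comes from the detector's class label ("worker with helmet" vs. "worker without helmet"), that the presence of a co-worker has already been established as described in Section II-D(3), and that the on-ladder condition has already been checked; the names `Behavior` and `classify_behavior` are hypothetical. Because Case 3 is stated for "< 2 m" and Case 4 for "≤ 2 m", the sketch assumes the band 1.2 m < H ≤ 2 m for both.

```python
from enum import Enum
from typing import Optional


class Behavior(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"


def classify_behavior(height_m: float, wears_helmet: bool,
                      has_coworker: bool) -> Optional[Behavior]:
    """Apply the four Table 2 cases to a worker already found on the ladder.

    Returns None for heights above 2 m, which the four cases do not cover.
    """
    if height_m <= 1.2:
        # Cases 1 and 2 (Eqs. 13-14): a helmet is the only requirement.
        return Behavior.SAFE if wears_helmet else Behavior.UNSAFE
    if height_m <= 2.0:
        # Cases 3 and 4 (Eqs. 15-16): helmet AND a second worker are required;
        # missing either one (not h OR not W2) is unsafe.
        if wears_helmet and has_coworker:
            return Behavior.SAFE
        return Behavior.UNSAFE
    return None


if __name__ == "__main__":
    print(classify_behavior(1.01, wears_helmet=True, has_coworker=False))  # Behavior.SAFE
    print(classify_behavior(1.5, wears_helmet=True, has_coworker=False))   # Behavior.UNSAFE
```

Keeping the rules in one function separate from detection and height estimation means a revised regulation only changes this table, not the vision pipeline.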
D. DEVELOPMENT OF THE ALGORITHM
This section summarizes the steps involved in developing the algorithm: dataset preparation, deep learning-based model selection, the HCM, and model training.
1) DATASET PREPARATION
Many digital images covering a wide range of patterns are required to train vision-based object detection models. However, labeled datasets are difficult to obtain from open-source websites because vision technology is relatively new in the construction industry. The dataset for this study consists of 21 videos collected at the Construction Technology Innovation Laboratory (ConTil) of Chung-Ang University, Seoul, South Korea. To prepare an image recognition dataset from these videos, the authors used the command-line tool Fast Forward MPEG (ffmpeg) to extract frames from each video. After extracting the frames, they performed data cleaning to obtain an appropriate and useful dataset for training a deep learning-based SSD model. Each frame was manually reviewed during data cleaning to determine whether it was suitable for training, and improper or unsuitable frames, such as those with incorrect exposure or duplicates, were removed. A total of 1,825 images were obtained after the cleaning process. These images were imported into an image labeling application in MATLAB, which was used to label the ground truth. The dataset is annotated with the following five class labels:
1. Ladder with outriggers
2. Ladder without outriggers
3. Worker with helmet
4. Worker without helmet
5. Worker with safety belt
The labeled dataset was randomly shuffled and split into training and evaluation datasets in an 80:20 ratio, such that
the training and testing sets contained 1,460 and 365 images, respectively. The resolution of the images is 404 × 720 pixels. Fig. 5 shows the labeled dataset.
2) DEEP LEARNING MODEL SELECTION
This study used a deep learning algorithm as the backbone model for object detection. Object detectors are of two types: one-stage and two-stage detectors. The primary difference is that one-stage detectors use a single CNN to predict classes and offsets from anchor boxes without requiring a proposal generator, whereas two-stage detectors predict in two stages: the first generates high-scoring region proposals, and the second provides the final prediction. One-stage detectors are preferred for real-time object detection because their inference time is fixed, unlike two-stage detectors, whose inference time varies [40]. State-of-the-art deep learning object detectors such as Faster R-CNN [41], YOLO [42], and SSD [43] can detect objects effectively under varying illumination and occlusion. This study used an SSD with a ResNet-50 backbone to detect workers and A-type ladders at construction sites, as SSD outperformed the state-of-the-art object detector Faster R-CNN.
Moreover, SSD exhibits much better accuracy than other single-stage models, as claimed by Wei et al. [41]. An input image passes through a single CNN, which extracts the relevant features and detects the target objects, as shown in Fig. 6. The HCM post-processes the CNN predictions to compute the working height of the worker on the A-type ladder, and the algorithm then determines whether the worker's behavior is safe or unsafe based on the scenarios discussed in Section II-C.
3) HEIGHT COMPUTING MODULE (HCM)
Although this study used a ResNet-50 CNN in the SSD for object detection, the detector does not output the working height on the ladder directly; it must be determined by post-processing the detector output. Therefore, an additional post-processing module (HCM) is introduced to measure the working height. The SSD serves as the base model for the HCM, and the algorithm checks the corresponding KOSHA safety rule and determines whether the worker's behavior is safe or unsafe. The computational steps are as follows. First, the SSD detects the target objects in the construction site images, as described in Section II-D(1). Next, the bounding box values and labels of the detected objects are passed to the HCM, which checks whether the worker is on a ladder. Then, the HCM computes the working height using the equations derived in Section II-B. Finally, the result of the SSD-based HCM is cross-checked against the corresponding safety rule to classify the worker's behavior.
In the HCM, class labels are assigned after the working height is computed from the bounding boxes labeled as person and ladder. To determine whether a person stands on an A-type ladder, the HCM checks the intersection between the bounding boxes of the detected objects; this takes two position vectors (defined as "Person" and "Ladder") as input and returns the intersection area (a scalar value). For better accuracy, the HCM also compares the bottom-left y-coordinates of the person and ladder bounding boxes. When the intersection area is greater than 1 and the bottom-left y-coordinate of the person bounding box is less than that of the ladder bounding box, the worker is judged to be on the ladder. The bounding box coordinates of a person standing at different heights on the ladder are shown in Fig. 7. Note that the proposed HCM can handle multiple workers and ladders simultaneously; it assigns a unique ID to a worker when the person bounding box intersects a ladder bounding box and otherwise skips further processing.
For the cases represented in Fig. 7(a) and (b), if the person bounding box is inside the ladder bounding box, the HCM
determines the intersection between the two bounding boxes and obtains the corresponding area. If the area is greater than 1, the HCM compares the bottom-left y-coordinate of the person bounding box $P_{l,b}$ with the bottom-left y-coordinate of the ladder bounding box $L_{l,b}$. If $P_{l,b} < L_{l,b}$, the person is standing on the ladder. If the computed working height $H$ is less than or equal to 1.2 m, the HCM checks the class label of the person: when the label is "worker with helmet," the behavior is classified as safe; otherwise, it is classified as unsafe.
The HCM follows a similar procedure for the case shown in Fig. 7(c). First, it determines whether person $P1$ is on a ladder. Next, it checks whether a second person $P2$ is present by comparing the bottom-left y-coordinate of the bounding box representing the second person, $P2_{l,b}$, with that of the ladder bounding box, $L_{l,b}$. If $P2_{l,b} = L_{l,b}$, the second person is working in the group, holding the A-type ladder. If 1.2 m < $H$ < 2 m, the HCM checks the class label of the person: if the label is "worker with helmet" and two workers are present, the behavior is classified as safe; otherwise, it is classified as unsafe.
FIGURE 7. Representation of bounding box coordinates at different working heights. (a) Person on an A-type ladder at a working height of less than 1.2 m. (b) Person on an A-type ladder at a working height of 1.2 m. (c) Person on an A-type ladder at a working height greater than 1.2 m and less than 2 m.
Algorithm I presents the pseudocode of the proposed method, which identifies and classifies objects to recognize unsafe behavior when an A-type ladder is used. The pseudocode outlines the main steps of the algorithm: it takes video from the IP camera as input and predicts whether the behavior is safe or unsafe. Lines 1 to 4 extract frames from the input video and pass them to the trained model. Lines 5 and 6 extract the coordinates of the person and ladder bounding boxes. The intersection between the person and ladder
TABLE 3. Model raining parameters. ladder, as shown in Fig. 9(a). In Fig. 9(b), a worker wear-
ing a safety belt and helmet correctly identified as working
on an A-type ladder with outriggers. These results demon-
strate that our trained model successfully identified various
objects irrespective of the viewing angle. The detected objects
were post-processed by the HCM to determine the height.
The HCM determines the working height on the ladder and
cross-checks the converted corresponding KOSHA safety
bounding boxes is obtained in lines 7 and 8. The working rule (Section II-C), and categorizes the worker behavior as
height of the ladder is computed from lines 9 (i) to 9 (ix). safe (if no rule violation) and unsafe (in case of violation
Finally, lines 11 and 12 performed a rule-based comparison of safety rules) on the local system. The safety manager is
between the computed height and converted KOSHA safety notified about workers’ visual safety status and behavior on
rules (Section II-C) to determine unsafe behavior. an A-type ladder. We detail the results obtained from the
SSD-based HCM for the four scenarios in real-time.
4) MODEL TRAINING
Training an SSD requires three inputs: pre-processed data, a layer graph, and training options. The pre-processed (i.e., modified) input data must match the prerequisites of the selected model; in this study, the input images and their bounding boxes are resized accordingly. The SSD layers are provided as a layer graph, which requires the image size, the number of classes, and the network architecture as input parameters. The input image size is set to 300 × 300 × 3, and the number of classes is set to 5. ResNet50 (a pre-trained CNN) is used as the base network. The default training options, such as the momentum, initial learning rate, mini-batch size, learning rate schedule, learning rate drop factor, and maximum number of epochs, were modified. The initial learning rate and mini-batch size were set to 0.001 and 16, respectively, and stochastic gradient descent with momentum (SGDM) was used with a momentum of 0.9. The execution environment was set to a GPU for fast training. In addition, a piecewise learning rate schedule was used, with the learning rate drop period and maximum number of epochs set to 30 and 300, respectively. Training was performed on a Windows 10 Pro workstation with a 10th-generation Intel® Core™ i9 processor (3.30 GHz) and 256 GB of RAM, and the proposed algorithm was trained, tested, and evaluated using MATLAB R2020b. The model training parameters are listed in Table 3.
III. RESULTS
The Android mobile application “IP Webcam” was used as an IP camera to stream real-time video from a smartphone camera to MATLAB over the Hypertext Transfer Protocol (HTTP) on a wireless network. The developed model was deployed on a local system (10th-generation Core i9), which received the real-time video from the IP camera and fed it to the developed algorithm.
All objects were identified using the SSD-based deep learning model. Fig. 8(a) shows the accurate detection of a worker wearing a safety belt and helmet working on a ladder, a worker wearing a helmet holding the ladder, and a ladder with outriggers. In contrast, the object identified in Fig. 8(b) is a ladder without outriggers. The model accurately detected a worker without any safety equipment working on an A-type
A. CASE 1: WORKER ON A-TYPE LADDER FOR SAFE BEHAVIOR (H ≤ 1.2 m)
This experimental scenario demonstrates the safe behavior of a worker on an A-type ladder working at a height less than or equal to 1.2 m. Fig. 10(a) shows a worker wearing a helmet working on a ladder with outriggers at a height of 1.01 m (safe behavior). Fig. 10(b) depicts a worker wearing a helmet working on a ladder with outriggers at a height of 1.2 m (safe behavior). This scenario is deemed safe because it fulfils the worker safety requirement of the KOSHA rule in equation 13 (Section II-C).
B. CASE 2: WORKER ON A-TYPE LADDER FOR UNSAFE BEHAVIOR (H ≤ 1.2 m)
This experimental scenario demonstrates the unsafe behavior of a worker on an A-type ladder working at a height less than or equal to 1.2 m. Fig. 11(a) depicts a worker without a helmet working on a ladder without outriggers at a height of 0.52 m (unsafe behavior). Fig. 11(b) shows a similar example to Fig. 11(a), except that the working height is 1.2 m (unsafe behavior). This scenario is classified as unsafe because it contravenes the safety rule in equation 14 (Section II-C).
C. CASE 3: WORKER ON A-TYPE LADDER FOR SAFE BEHAVIOR (1.2 m < H < 2 m)
This experimental scenario demonstrates the safe behavior of a worker on an A-type ladder working at a height greater than 1.2 m and less than 2 m. Fig. 12(a) depicts two workers performing work together at a height of 1.42 m: worker 1, wearing a safety belt and helmet, stands on the ladder while the second worker holds it. Fig. 12(b) shows a similar example, except that the working height is 1.31 m (safe behavior). This scenario is classified as safe because it fulfils the worker safety requirement of the KOSHA rule in equation 15 (Section II-C).
D. CASE 4: WORKER ON A-TYPE LADDER FOR UNSAFE BEHAVIOR (1.2 m < H < 2 m)
This experimental scenario demonstrates the unsafe behavior of a worker on an A-type ladder working at a height greater
than 1.2 m and less than 2 m. Fig. 13(a) depicts a worker wearing a helmet but working alone at a height of 1.68 m (unsafe behavior according to the corresponding safety rule). Fig. 13(b) shows a worker on a ladder without outriggers working at a height of 1.7 m. Although this worker is wearing a helmet and a safety belt, the behavior is classified as unsafe because the safety rule (equation 16, Section II-C) states that two workers should be working in a group.
FIGURE 8. Object detection in real time. (a) Two workers and a ladder with outriggers. (b) Ladder without outriggers.
FIGURE 9. Object detection in real time. (a) Worker without a helmet on an A-type ladder without outriggers. (b) Worker with helmet and safety belt on an A-type ladder with outriggers.
IV. EVALUATION METRICS
We evaluated the performance of the trained model using precision, recall, F1-score, accuracy, true positive rate (TPR), and false positive rate (FPR) (Equations (17)–(22)), together with average precision. Precision is the fraction of predicted positives that are true positives, whereas recall (equivalently, TPR) is the fraction of actual positives that are correctly identified. The FPR is the fraction of actual negatives that the model incorrectly classifies as positive. Average precision is an important evaluation metric because it summarizes the overall usefulness of the detector, as captured by its precision-recall curve, in a single numerical value.
$$\text{Precision}\ (P) = \frac{TP}{TP + FP} \tag{17}$$
$$\text{Recall}\ (R) = \frac{TP}{TP + FN} \tag{18}$$
$$F_1\text{-score} = 2\,\frac{P \cdot R}{P + R} \tag{19}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{20}$$
$$\text{TPR} = \frac{TP}{TP + FN} \tag{21}$$
$$\text{FPR} = \frac{FP}{FP + TN} \tag{22}$$
where TP, FN, FP, and TN denote the numbers of true positives, false negatives, false positives, and true negatives, respectively.
A. EVALUATION OF SSD
The trained model was evaluated using average precision on a test dataset comprising 365 images. The five classes considered in this study are ladder without outriggers, ladder with outriggers, worker with helmet, worker with safety belt, and worker without a helmet. Figures 14(a)–14(e) illustrate the precision-recall curves, with recall on the X-axis and precision on the Y-axis, evaluated at a threshold of 0.3. Fig. 14(a) shows an average precision of 98% for the class ladder without outriggers (confirming the ability to detect the object). Similarly, Figs. 14(b) and 14(d) show average precisions of 99% for the class ladder with outriggers and 90% for the class worker with safety belt (confirming the ability to recognize these objects). The average precisions of the classes worker without helmet and worker with helmet were 84% and 70%, respectively. These values are lower than those of the other classes, which is attributed to the imbalanced class distribution in the dataset.
FIGURE 10. Examples of predicted safe behavior at a working height H ≤ 1.2 m on an A-type ladder with outriggers. (a) Worker with a helmet at H = 1.01 m. (b) Worker with safety belt and helmet at H = 1.2 m.
FIGURE 11. Examples of predicted unsafe behavior at a working height H ≤ 1.2 m on an A-type ladder without outriggers. (a) Worker without a helmet at H = 0.52 m. (b) Worker without a helmet at H = 1.2 m.
B. EVALUATION OF HCM
We evaluated the proposed algorithm using four performance indicators on a set of 300 images, divided into class 1 (160 images) for safe behaviors and class 2 (140 images) for unsafe behaviors. The classes are assigned binary labels (0 for safe and 1 for unsafe) so that the ground truth can be compared with the prediction. Fig. 15 shows an (n × n)
confusion matrix, where n is the number of classes; in this study, n = 2 (safe and unsafe behavior). The columns represent the ground truth, and the rows represent the predictions. The SSD-based HCM correctly identified 137 scenes of safe behavior (TP), while 21 scenes of actual unsafe behavior were classified as safe (FP). Similarly, the algorithm correctly predicted 119 scenes as unsafe behavior (TN), whereas 23 scenes of actual safe behavior were predicted as unsafe (FN). Table 4 summarises the performance indicators of the proposed algorithm, which achieved a precision, recall, F1-score, and overall accuracy of 86.7%, 85.6%, 86.4%, and 85.33%, respectively.
Additionally, to validate the effectiveness of the HCM in classifying behavior, the receiver operating characteristic (ROC) curves and the area under the curve (AUC) are shown in Fig. 16. The 300 images were divided into five folds (k-fold with k = 5), and prediction was performed on each fold to determine the TPR and FPR. An ROC curve, with values ranging from 0 to 1, was plotted for each fold from the calculated TPR and FPR. The green ROC curve shows the average over all folds, with a mean AUC of 0.84, demonstrating that the HCM can effectively identify unsafe behavior.
TABLE 4. Performance results of SSD-based HCM.
FIGURE 12. Examples of predicted safe behavior at a working height of 1.2 m < H < 2 m on an A-type ladder with outriggers. (a) Two workers with helmets (worker with a safety belt on ladder) work in a group at H = 1.42 m. (b) Two workers with helmets (worker with a safety belt on ladder) work in a group at H = 1.31 m.
FIGURE 13. Examples of predicted unsafe behavior at a working height of 1.2 m < H < 2 m on an A-type ladder. (a) Single worker with helmet working on a ladder with outriggers at H = 1.68 m. (b) Single worker with safety belt and helmet working on a ladder without outriggers at H = 1.70 m.
V. DISCUSSION AND FUTURE WORK
The proposed method can compute the working height using vision sensors (cameras) and proactively identify unsafe behavior arising from negligence in complying with the rules. Because it was developed for safety monitoring, the method can also serve as a safety intervention and as a means of highlighting unsafe behavior at different working heights on the ladder. When workers are aware that they are being monitored
unsafe behavior of a worker on an A-type ladder working at a height greater than 1.2 m but less than 2 m. The proposed method can be deployed at a construction site as a safety management and intervention system that recognizes unsafe behavior in real time. This research not only provides an easy and automatic way to recognize unsafe behavior using CV and safety regulations but also provides insights into determining height from imaging data. The HCM could also be adapted to other engineering domains.
Despite the effectiveness of this algorithm, it has several limitations. Because we used 2D images obtained from 2D CCTV cameras, workers standing behind an A-type ladder were misidentified as working on the ladder: a 2D camera captures no depth, so it cannot determine the actual position of an object along the viewing direction. This limitation can be overcome by using stereo vision cameras, which can collect depth and distance information for workers and A-type ladders at construction sites and thus enable accurate computation of the distance between a worker and an A-type ladder. Currently, the proposed algorithm can only predict the behavior of workers working at heights of up to 2.0 m on A-type ladders. In future work, we plan to extend this algorithm to heights of up to 3.5 m to cover all KOSHA regulations associated with the A-type ladder. We also plan to develop an early risk assessment framework with a safety risk index that considers risks and their severity to classify them as low, medium, or high, for more practical use in managing risks while working at height on a ladder [32]. Furthermore, we plan to create a larger dataset by collecting images from various construction sites, so that relevant objects can be detected for a more practical application of CV-based safety monitoring. Finally, this method requires a reference point (the ladder height) to estimate the working height. We are trying to overcome this limitation by using depth information on the objects’ distance from the camera to estimate the reference point automatically.
FIGURE 14. Precision-Recall curves for the classes: (a) ladder without outriggers, (b) ladder with outriggers, (c) worker with helmet, (d) worker with safety belt, and (e) worker without helmet.
VI. CONCLUSION
This research focuses on a deep learning-based height estimation method for real-time worker safety monitoring that predicts worker safety status on A-type ladders. This paper presents an automated solution that facilitates safety management and overcomes the limitations of manual safety monitoring, with the aim of reducing FFLs at construction sites. The main aspects of this study are as follows:
1) KOSHA Rules Analysis: We manually extracted the safety rules correlated with the A-type ladder from the KOSHA expert database, converted them into a computer program, and integrated them with CV technology to provide workers with an effective and automated safety method while working on A-type ladders.
2) Dataset: A dataset for safety behavior detection was prepared from 21 videos of work on an A-type ladder. Frames were extracted (1825 images) and labeled for deep learning-based object detection (SSD with ResNet-50).
3) HCM: Workers may put their lives in danger by breaching safety rules while working at different heights on ladders. We sought to create a safer working environment by integrating the safety rules correlated with the A-type ladder from the KOSHA expert database with a deep learning-based object detection system. The coordinates of the detected objects were used to estimate the working height.
4) Experiments: The proposed approach was validated on workers’ safe and unsafe behaviors, following the KOSHA rules, using four different cases. The developed algorithm uses object detection as the base model for the HCM, which estimates the working height and compares it with the corresponding occupational safety rules. The case scenarios demonstrated that the proposed height assessment method performed well in all four cases in classifying worker behavior as safe or unsafe.
5) Evaluation of Model: The detection model was tested on a dataset of 365 images covering five classes (ladder without outriggers, ladder with outriggers, worker with helmet, worker with safety belt, and worker without helmet) and achieved average precision values of 98%, 99%, 70%, 90%, and 84%, respectively.
6) Evaluation of HCM: We evaluated the performance of the HCM on a set of 300 images covering all case scenarios for binary classification (safe and unsafe). The proposed algorithm achieved a precision, recall, F1-score, and overall accuracy of 86.7%, 85.6%, 86.4%, and 85.33%, respectively.
The findings of this study show that the proposed approach can accurately classify worker behavior as safe or unsafe at a specified working height on an A-type ladder. The proposed SSD-based HCM provides convincing evidence that this algorithm could help estimate the working height and automate the current safety monitoring process. The proposed method can help protect workers from injuries and fatalities and improve productivity, quality, and worker determination. Furthermore, it has the potential to improve the return on investment by reducing FFLs, which lead to high insurance costs and fines from occupational safety agencies. Moreover, the HCM module can be used in other engineering domains for height estimation with a vision camera, and it can be generalized with minor changes to assess safety conditions according to different occupational safety measures. However, future research should focus on overcoming the limitations of the current method, as discussed in Section V.
ACKNOWLEDGMENT
Chansik Park would like to express his gratitude to Junsung Park and Dr. Doyeop Lee, who assisted in the extraction of safety rules.
[42] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” 2020, arXiv:2004.10934.
[43] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot MultiBox detector,” in Proc. Eur. Conf. Comput. Vis., in Lecture Notes in Computer Science: Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 9905, 2016, pp. 21–37, doi: 10.1007/978-3-319-46448-0_2.
SHARJEEL ANJUM received the bachelor's degree in computer science (BSCS) from the University Institute of Information Technology (UIIT), Arid Agriculture University, Pakistan. He is currently pursuing the master's degree from the School of Architecture and Building Science, Chung-Ang University, Seoul, South Korea. His research interests include machine/deep learning, the IoT, smart applications for construction management, and worker safety.
NUMAN KHAN received the B.S. degree in civil engineering and the M.S. degree in engineering management from the Ghulam Ishaq Khan Institute of Science and Technology (GIKI), Topi, Pakistan, in 2015, and the Ph.D. degree in architectural engineering from Chung-Ang University, Seoul, South Korea. He currently holds a postdoctoral position with the Department of Construction Engineering, École de Technologies Supérieure, Montreal, Canada. His research interests include BIM, computer vision, construction quality, rule-based modeling, construction fire safety, human safety in construction, and construction 4.0.
MUHAMMAD KHAN received the master's degree in civil engineering from Dong-A University. He is currently pursuing the Ph.D. degree with the Civil, Construction and Environmental Engineering Department, The University of Alabama, USA. His research mainly focuses on workers' safety at the construction site utilizing sensors and computer vision technologies. In this context, his current research will identify the different accident risk factors that trigger fall accidents at the construction site and propose a safety framework by integrating different technologies to mitigate risks. In addition, he will be working to develop a digital twin model for PM dust emission, control, and monitoring in real time.
DONGMIN LEE received the B.E. and Ph.D. degrees in civil and architectural engineering from Korea University, Seoul, South Korea. He has been an Assistant Professor with the School of Architecture and Building Science, Chung-Ang University, since 2021. His research interests include the integration of construction equipment, method, planning, scheduling, and control to support a better human–robot collaborative working environment. In this context, his current research focuses on improving project performance (e.g., cost, schedule, quality, safety, and sustainability) in the built environment by developing and testing a digital twin of physical assets (e.g., robots, workers, and materials), which can be used to simulate “what-if” scenarios using AI-based techniques (e.g., deep reinforcement learning).