Semantic Video Mining For Accident Detection
ISSN No:-2456-2165
Abstract:- This paper describes the efficient use of CCTV for traffic monitoring and accident detection. The proposed system can classify accidents and give alerts when necessary. Nowadays we have CCTVs on most of the roads, but their capabilities are underused, and no efficient system exists to detect and classify accidents in real time; many deaths occur because of undetected accidents, which are especially hard to detect in remote places and at night. The proposed system can identify accidents, classify them as major or minor, and automatically alert the authorities when it detects a major accident. Using this system, the response time to an accident can be decreased by processing the CCTV visuals.

In this system different image processing and machine learning techniques are used. The training dataset is extracted from footage of accidents that have already occurred. Accidents mainly occur because of careless driving, alcohol consumption and over-speeding, and another major cause of death is the delay in reporting accidents, since no automated reporting system exists: accidents are mainly reported by the public or by the traffic authorities. Many lives can be saved by detecting and reporting accidents quickly. In this system, live video captured from the CCTVs is processed to detect accidents, with the YOLOv3 algorithm used for object detection. Traffic monitoring now has great significance: CCTVs are present on most roads and can be used to detect accidents, yet today they are used only for traffic monitoring. Accidents can normally be classified into two classes, major and minor, and not every accident needs emergency support; only major accidents must be handled quickly. The proposed system captures the video and applies object detection algorithms to identify the different objects, such as vehicles and people. After the detection phase the system extracts the features of the vehicles. Features such as length, width and centroid are extracted to classify each vehicle accordingly. The vehicle count is also detected, which can be used for traffic congestion control.

Keywords:- YOLOv3, SSD, Faster R-CNN, R-CNN.

I. INTRODUCTION

The population is increasing day by day, and along with it the number of vehicles is also increasing. It is known that the present traffic management system is not efficient: millions of people die in road accidents every year. This is not only because of the increase in the number of vehicles; there is also no proper system to detect accidents and alert the authorities. The long response time before emergency services arrive costs many precious lives. Normally road accidents are reported by the people near the accident, but in many cases those who witness an accident are not willing to alert the authorities and are instead busy taking selfies. This kind of negligence costs precious lives. We also have CCTVs installed on most of the roads, but they are not used efficiently. In a modern era of fast-growing technology, we still depend on human effort for traffic monitoring. Since the number of traffic authorities is low and the number of vehicle users is high, it is difficult to monitor them all, and many people lose their lives because of undetected accidents. Monitoring vehicles at all times is difficult for humans, but it is easy and feasible using CCTVs. The proposed system uses CCTV for traffic monitoring and accident detection with little human intervention.

The system captures live video from CCTV and processes it to detect accidents in real time. Surveillance cameras are installed on most of the roads, mounted on poles that give a clear view of the vehicles on the road. The present system uses these visuals to monitor and control the traffic manually.
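The detection-then-feature-extraction step described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the `Detection` record and the function names are assumptions, and detections are taken to be axis-aligned pixel boxes produced by the object detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # detector class, e.g. "car", "truck", "person"
    x: float     # top-left corner of the bounding box, pixels
    y: float
    w: float     # box width (a proxy for vehicle length in a side view)
    h: float     # box height

def extract_features(det: Detection) -> dict:
    """Geometric features used to classify the vehicle: length, width, centroid."""
    return {
        "length": det.w,
        "width": det.h,
        "centroid": (det.x + det.w / 2.0, det.y + det.h / 2.0),
    }

def vehicle_count(detections, vehicle_labels=("car", "bus", "truck", "bike")):
    """Count vehicle-class detections in a frame, for congestion monitoring."""
    return sum(1 for d in detections if d.label in vehicle_labels)
```

Each frame's detections would come from the YOLOv3 pass; the extracted centroids are what later stages track across frames.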
C. Feature Extraction Module
In the feature extraction module the required features are extracted from the full feature set. Information such as the overlap between vehicles, stopping, velocity, the differential motion vector of the vehicles, and the direction of each vehicle is mainly considered. If more than one vehicle is detected in a frame, single-object features are not enough; in that case each pair of objects is considered, the factors mentioned above are examined for the pair, and the probability of an accident is calculated.

Table 1:- Comparison of Yolov3 With Other State-Of-The-Art-Models [35]

YOLOv3 predicts objects faster than SSD, but SSD stands a bit higher in terms of accuracy. Compared with the other algorithms, YOLOv3 ranks higher in terms of speed. A comparison of the performance of the different algorithms on the COCO dataset is shown in Table 1.
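The pairwise factors used in the feature extraction module (overlap and the differential motion vector) reduce to simple geometry on the tracked boxes and centroids. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) tuples and centroids are tracked per frame; the function names are illustrative, and the paper does not specify its exact formulas:

```python
def iou(a, b):
    """Overlap between two axis-aligned boxes (x1, y1, x2, y2),
    as intersection-over-union in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def differential_motion(prev_a, cur_a, prev_b, cur_b):
    """Difference of the frame-to-frame motion vectors of two tracked
    centroids; an abrupt change can indicate a collision."""
    va = (cur_a[0] - prev_a[0], cur_a[1] - prev_a[1])
    vb = (cur_b[0] - prev_b[0], cur_b[1] - prev_b[1])
    return (va[0] - vb[0], va[1] - vb[1])
```

In the module these quantities would be computed for every pair of vehicles in a frame and combined into an accident probability.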
Faster R-CNN is a network in which object detection is faster; it provides a solution to the region proposal problem. Compared to the earlier models its computation time is very low, and the image is processed at a resolution lower than that of the original input image.

YOLO (You Only Look Once) is another object detection approach. What objects are in an image, and where they are present, can be detected with only one look at the image. Instead of classification YOLO uses regression, and it predicts the bounding boxes and class probabilities for every part of the image within a single analysis; only a single network is used for this process. A single CNN predicts multiple bounding boxes, and confidence weights are given for these bounding boxes.

YOLOv3 is different in that it uses logistic regression to score the object within each bounding box. If a bounding box overlaps the ground truth more than any other bounding box, its object score must be one. If a bounding box overlaps the ground truth by more than a threshold but is not the best match, that prediction is disregarded.

If the user does not cancel the notification, the user's location coordinates and the timestamp of the crash are sent to the server. The server accepts these coordinates, assuming the user is unconscious. The server can be in any remote location and provide the appropriate service, but it must be available at all times. Since the amount of data sent and received at any given time is very small, no additional cost should be required to ensure this availability.

The server contains a database of the IP addresses of all hospitals, together with a mechanism to identify the nearest hospital based on the coordinates obtained. After identifying the nearest hospital, the server notifies that hospital of the user's coordinates (location).

The hospital receives the notification from the server about the user's location and uses a graphical user interface to display the location coordinates; hospital operators can easily plot the coordinates on a map. This way the victim can be provided with emergency medical services in a short time, reducing both the response time and the mortality rate.

IV. EXPERIMENTAL EVALUATION
We propose a model that detects accidents from video footage and informs the authorities about the accident. Here we extract the accident images from the CCTV footage. The extraction of images falls under the field of computer vision, along with image processing. Object detection is mainly used to locate an object based on its size and coordinates and to categorize it; through object detection we obtain two-dimensional images that provide more details about space, size, orientation, etc.

A. CNN vs YOLO
Both Faster R-CNN and YOLO have a convolutional neural network at their core. YOLO partitions the image into grids before using the CNN to process it, whereas Faster R-CNN keeps the whole image as such, and the division into proposals takes place later.
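The grid partitioning that separates YOLO from the R-CNN family can be made concrete. As a hedged sketch (the 7x7 grid of the original YOLO is assumed; `grid_cell` is not a function from the paper), each object's centre is mapped to the one grid cell responsible for predicting it:

```python
def grid_cell(cx, cy, img_w, img_h, s=7):
    """Map an object centre (cx, cy) in an img_w x img_h image to the
    (row, col) of the s x s grid cell that predicts it, YOLO-style."""
    col = min(int(cx / img_w * s), s - 1)  # clamp centres on the far edge
    row = min(int(cy / img_h * s), s - 1)
    return row, col
```

Faster R-CNN, by contrast, runs the CNN over the whole image first and defers the spatial division to its region proposals.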