Master Thesis
Author: Ravikiran Venkatesh MURTHY BIDURGA
First Examiner: Prof. Dr.-Ing. Werner Huber
Second Examiner: Prof. Dr. techn. Priv.-Doz. Andreas Riener
I hereby declare that this thesis is my own work, that I have not presented it elsewhere
for examination purposes and that I have not used any sources or aids other than those
stated. I have marked verbatim and indirect quotations as such.
Abstract
Vision Zero is a multi-national initiative that aims to eliminate fatalities in road traffic. However, many technical challenges must be mastered if Vision Zero is to become reality. The main challenge is to ensure that automated driving functions in vehicles can be used safely. On the German Autobahn, the average distance between two fatal accidents is around 7 × 10⁸ km [1]. According to Wachenfeld and Winner [2], roughly ten times this reference distance would have to be driven to prove an automated vehicle's safe operation. Conventional test drives are not suited for this purpose, since critical traffic situations cannot be tested on a normal road with traffic, and real road testing represents a high risk for other road users. Therefore, it is necessary to transfer part of the test cases to a safe laboratory environment.
To ensure that automated driving functions can be tested as reliably as possible, the complete chain of components must be available in the laboratory. These components are the hardware and software of the environmental sensors, the ECU and the required interfaces. In such a configuration with all components available, hardware-in-the-loop test methods should be improved by including the real hardware of the environmental sensors. For this purpose, synthetic sensor data can be used to verify and validate the tests to be conducted under laboratory conditions.
The two sensor types most widely used in the automotive industry are the radar sensor and the camera; in this thesis, only the camera sensor is considered. The purpose of this work is to compare over-the-air and direct data injection test methods for camera-based algorithms in a hardware-in-the-loop setup. The over-the-air data injection method injects camera sensor data into an ECU using an LCD monitor and an automotive camera: the camera is placed in front of the LCD monitor so that it captures the data being displayed on the monitor. The direct data injection method uses a device called a video interface box (VIB), which emulates a camera: it takes raw camera sensor data as input and injects the camera sensor data into an ECU.
Camera data received by the ECU via the over-the-air and direct data injection methods are compared to the reference camera data using full-reference and no-reference image quality metrics. A further purpose of this thesis is to observe the influence of these two injection methods on an image-based algorithm, namely an object detection algorithm. It is observed that the direct data injection method achieves higher image quality than the over-the-air data injection method in terms of colour perception, similarity to the reference and focus of the image. On the other hand, over-the-air data injection shows better object detection performance. Nevertheless, both methods constitute a sound methodology for testing camera-based algorithms with injected synthetic camera data.
Acknowledgement
I would like to extend my thanks to Prof. Dr.-Ing. Werner Huber of the Technische Hochschule Ingolstadt for his time, support and guidance throughout my work; his knowledge and ideas encouraged me to explore the topic further. I would also like to thank Prof. Dr. techn. Priv.-Doz. Andreas Riener of the Technische Hochschule Ingolstadt for accepting to be my second examiner.
I would like to extend my thanks to Dipl.-Ing. Fabio Reway for mentoring me and providing moral support throughout the period of my thesis. Finally, I must express my very profound gratitude to my parents and to my beloved friends for providing me with unfailing support and continuous encouragement throughout my years of study. This accomplishment would not have been possible without them. Thank you.
Contents
1 Introduction
1.1 Motivation
1.2 Problem Statement
2 Literature Review
2.1 Test Environments
2.1.1 Model-In-Loop (MIL)
2.1.2 Software-In-Loop (SIL)
2.1.3 Hardware-In-Loop (HIL)
2.2 State of the Art Methodology for Camera-In-the-Loop Simulation
2.3 Image Quality Measurement
2.3.1 Full-Reference Image Quality Metrics
2.3.2 No-Reference Image Quality Metrics
2.3.3 Metric to Compare Edge Pixels
3 Methodology
3.1 Software and Hardware Description
3.1.1 Robot Operating System (ROS)
3.1.2 Nvidia Drive PX2 Platform
3.1.3 Automotive Camera
3.1.4 LCD Monitor
3.1.5 Camera-Box
3.1.6 Video Interface Box (VIB)
3.2 Data Injection Methods
3.2.1 Over the Air Data Injection
3.2.2 Challenges in Over the Air Injection Test Setup
3.2.3 Direct Data Injection
3.2.4 Challenges in Direct Injection Test Setup
3.3 Robot Operating System (ROS) in Nvidia Drive PX2
3.3.1 GMSL Driver Package
3.3.2 Object Detection Package
3.3.3 Custom ROS Package
3.4 Camera Calibration
3.4.1 ROS Camera Calibration
3.4.2 Simulation-based Camera Calibration
3.5 MTF Measurement for OTA and DDI Methods
3.5.1 Image Acquisition from the Automotive Camera for MTF Measurement
3.5.2 Image Acquisition from OTA and DDI for MTF Measurement
3.6 Image Sorting
3.6.1 Need for Image Sorting
3.6.2 Perceptual Hashing Algorithm - dHash
3.7 Image Quality Measurements
3.7.1 Area Under the Curve (AUC) and Root Mean Squared Error (RMSE) of R, G and B Channels
3.7.2 Structural Similarity Index (SSIM)
3.7.3 Absolute Central Moment (ACM)
3.7.4 Metric to Compare Edge Pixels
3.8 Object Detection Algorithm
3.8.1 Ground-Truth for Object Detection
3.8.2 Object Detection Algorithm in ROS using YOLO V3
3.8.3 Object Detection Algorithm offline using YOLO V4
3.8.4 Evaluation of Object Detection Performance
3.9 Data Acquisition for OTA and DDI Methods
3.9.1 Scenario Generation
3.9.2 Workflow
4 Results and Discussion
4.1 MTF Measurement Results
4.1.1 MTF Measurement for Automotive Camera with Real Target
4.1.2 MTF Measurement for OTA Data Injection Method
4.1.3 MTF Measurement for DDI Data Injection Method
4.2 Image Quality Measurement Results
4.2.1 Color Comparison of Images
4.2.2 Structural Similarity Index (SSIM)
4.2.3 Metric for Image Focus
4.2.4 Metric for Edge Pixels
4.3 Object Detection Results
4.3.1 Selection of Scenarios for Object Detection
4.3.2 Result of Object Detection for the Selected Scenarios
List of Figures
4.1 MTF curve measured for automotive camera using a real printed target
4.2 MTF curve measured for OTA method
4.3 MTF curve measured for DDI method
4.4 Area under the curve of RMSE values of the red channel, comparison of OTA and DDI for 14 scenarios
4.5 Area under the curve of RMSE values of the green channel, comparison of OTA and DDI for 14 scenarios
4.6 Area under the curve of RMSE values of the blue channel, comparison of OTA and DDI for 14 scenarios
4.7 Box plot of the Structural Similarity Index, comparison of OTA and DDI for 14 scenarios
4.8 Absolute Central Moment value comparison of OTA, DDI and CarMaker for 14 scenarios
4.9 Average value of Eg, comparison of OTA and DDI for 14 scenarios
4.10 Object detection evaluation result of CarMaker, Scenario 7
4.11 Object detection evaluation result of OTA method, Scenario 7
4.12 Object detection evaluation result of DDI method, Scenario 7
4.13 Object detection evaluation result of CarMaker, Scenario 8
4.14 Object detection evaluation result of OTA method, Scenario 8
4.15 Object detection evaluation result of DDI method, Scenario 8
List of Tables
D.5 Results of evaluation of object detection performance in scenario 8 with threshold 70
D.6 Results of evaluation of object detection performance in scenario 8 with threshold 80
List of Abbreviations
Chapter 1
Introduction
1.1 Motivation
The continuous evolution of automotive technology has paved the way for more and more advanced driver assistance systems. Advanced driver assistance systems are one of the most popular topics for research and development, improving both safety and comfort by assisting the driver with the driving task. These systems are being improved and new automated driving functionalities are being developed to make the driving experience even better and the roads even safer than before.
Lidar, camera and radar are the three primary sensors used in recent automated driving functions. Many automated functions depend on cameras mounted on the front, rear, left and right sides of the vehicle; together, they provide a 360-degree view of the surroundings. Some automated functions also include fisheye cameras to gain a panoramic view.
The cameras provide an accurate view of the surroundings, but they have limitations of their own. Cameras can identify distinct details about nearby pedestrians, cyclists, street signs, road markings and traffic signals; however, the distance to those objects needs to be known as well. It is also difficult for camera-based vehicle functions to detect objects in demanding visibility conditions such as rain, fog or night.
This rapid development of automated driving requires equally strong and reliable testing methods. Testing is an integral part of development; neither can be improved without the other. For a camera-dependent technology, testing is one of the main hurdles to be passed: these technologies and their functionality need to be tested readily and thoroughly before being deployed on the road.
There are a variety of testing methods for evaluating automated-driving systems and functionalities. MIL, SIL, HIL and VIL methods are used nowadays as part of verifying these functionalities. The availability of numerous testing methods makes it difficult to choose a particular method for a given test, so a proper comparative study of the methods is needed. This is the reason why this thesis explores two different testing methods related to camera- and vision-based functionalities.
The goal of the thesis is to evaluate two methods of data injection, namely over-the-air data injection (OTA) and direct data injection (DDI), that can be used to test camera-sensor-data-based algorithms in the automotive industry. This thesis attempts to answer the following questions:
1. How much does the camera sensor data injected into an ECU using the OTA and DDI methods differ from the original data?
2. How well suited are the OTA and DDI methods for testing an algorithm using synthetic camera data?
Furthermore, the objectives of the thesis include creating a proper method and tool-chain for conducting the tests with the OTA and DDI methods. The main objectives of the thesis are divided as follows:
1. Create a proper pipeline for data transfer into the ECU using the two data injection methods.
2. Establish real-time transfer of camera data between the host PC and the ECU.
3. Configure the tool-chain on the ECU using ROS and Python to receive and save the injected data for further evaluation.
4. Create test simulations in IPG CarMaker for data injection and algorithm testing.
5. Compare and evaluate the images from the OTA and DDI methods using MATLAB.
Chapter 2
Literature Review
major reason why the automotive industry follows the V-model, which links the early development processes to their corresponding testing phases in the later stages of development. A simple visualisation of the V-model with respect to testing is shown in Figure 2.1: the testing methods relate to the development process from the concept phase on the left side of the V-model to the series development phase on its right side. From the concept phase to production and operation, testing is an integral part of the development process.
The model-in-the-loop technique can be used in testing and simulation. It is used to test, simulate and verify the behaviour of a system or a function under certain defined parameters and conditions. In MIL, the model and its environment are simulated in the modelling framework without any physical components. In many automotive model-based developments, the functional models are abstract and do not consider all aspects of performance and robustness; these models meet the requirements from a software development point of view. In the early stages of the development process, functions are represented through mathematical models. These mathematical models include vehicle dynamics, sensors, actuators and even environmental scenarios such as traffic. MIL uses virtual signals to connect the model under test and the ECU. MIL is mainly applied to verify the basic implementation of the function and the architecture of the system without using any hardware [4].
Software-in-the-loop techniques are used to test the functionality of program code or an algorithm without the actual target hardware. SIL is a system-level testing method in which the software, for example embedded system software, is tested within a simulated environment model. The program code or algorithm is compiled for the target hardware, and virtual input and output interfaces are implemented in the loop. Software such as MATLAB Simulink or CarMaker can be added to the loop to verify the functionality of an algorithm, which can then be executed without the actual hardware on which the function will later run. This method is very effective when software testing and proof of concept are needed in the early stage of algorithm development, since hardware availability is not required for the test.
Hardware-in-the-loop is a testing method in which a particular physical component is placed in the loop and the software runs on the final ECU. However, the environment around the ECU is a simulated environment. The ECU and the component under test interact via the digital and analog connectors of the ECU. Real-time behaviour of the environment is one of the key requirements of HIL: the environment model communicates with the ECU just as it would in the intended real application.
HIL simulation requires the development of a real-time simulation that models the parts of the operational environment with which the embedded system under test (SUT) has significant interactions. The simulation keeps track of the SUT's output signals and injects synthetically generated input signals at the required points in the simulation. The output of the SUT can be actuator commands and operational information; the input of the SUT might be signals and commands from an operator. In general, the output of the SUT can also serve as input to the simulation, and the simulation in turn produces output that becomes input to the SUT [5]. Figure 2.2 shows a generic high-level view of an HIL simulation.
There has been much research and development work on methods for testing a camera in the loop. One such study was conducted by Bernhard and Steffen (2012) [6]. The authors propose a methodology that uses a sophisticated camera model called 'VideoDataStream' to generate video data which can be used for image processing evaluation and for testing sensor data fusion algorithms under virtual test driving conditions. In the study, the authors also test a lane departure warning (LDW) system using a camera and a monitor in a black box. A view of the test setup for this method can be seen in Figure 2.3: a synthetic traffic video is captured by the camera, which has the monitor in its field of view. The authors mention that this test methodology has problems in constructing the test bench and suffers from missing synchronization between the display and the camera, which leads to asynchronous capture and interference in the images. As an alternative, a new method is proposed to transfer the video data into the ECU using a device called a Video Interface Box (VIB). The VIB is designed to convert the input video signals from the computer into four separate camera signals to emulate a real camera.
In a recent study by Zhao et al. (2020) [7], a comparison between two smart cameras was conducted. The authors simulated and tested different camera models using a HIL system, with synthetic data from IPG CarMaker generating the video signals. They compared the Maxeye and Mobileye smart cameras using a setup consisting of a camera and a display screen, and mention that a black box was used to isolate the external environment so that the camera placed in front of the display screen could stably receive the video signal.
In [8], the authors present a new spectral HDR camera model in the simulation platform PreScan and compare an RGB-based camera model with the PreScan spectral camera model. PreScan allows generating broad-spectrum, high-colour-depth and physically correct images for simulation. An injection HIL setup is used to transfer the image data directly from PreScan at a very high frame rate. In another study [9], the authors Pfeffer and Haselhoff illustrate a video injection approach into the image processing unit of a real component using a Video Interface Box (VIB), called "Monitor HiL", which is considered a fast and very effective way of injecting synthetic data into a camera ECU.
Figure 2.3: View of a test of a Lane Departure Warning System using a real camera [6]
In a study conducted by F. Reway et al. (2018) [10], multiple cameras were used in the loop to evaluate an object detection algorithm under different environmental conditions. The study used three self-built units called camera-boxes, in which a camera is placed in front of an LCD monitor to capture the camera sensor data being displayed. A test scenario representing a safety-critical situation was created and generated with environmental conditions such as daytime, dusk, deep night and dense fog; each of these environmental conditions comprised 500 test cases. The study concluded that the probability of a correct object detection decreased as the environmental complexity increased: the daytime condition showed the highest sensitivity and accuracy, whereas the dense fog condition showed the lowest.
Colour or digital images are made of pixels, and pixels are made of combinations of a set of real colorants or coloured lights represented by a series of code values. A digital image from a camera is generally composed of a red, green and blue channel. When comparing two images, comparing their red, green and blue channels gives an idea of how differently the images perceive colour. Typical machine vision algorithms mainly use grey-scale intensity; red-channel colour information is used for traffic light, traffic sign and rear light detection, and most sensor manufacturers design Red Clear Clear Clear (RCCC) colour filter arrays for automotive use. In some situations, such as low and high lighting conditions, the perception of an automotive camera sensor is insufficient in dynamic range and precision. Synthetic, simulated camera sensor data therefore needs to reproduce a quality similar to that of real images.
Root mean squared error (RMSE) is a standard way of measuring the error between two sets of data. RMSE is defined as the square root of the average squared error and is always a non-negative value; a value of 0 indicates that there is no difference between the data sets, and a lower RMSE value is always better than a higher one. When comparing two images, the RMSE is calculated from the pixel-by-pixel difference between the reference image and the image under test. As mentioned in Section 2.3.1, a colour image consists of three channels, so the RMSE of two images is calculated after separating the images into their red, green and blue channels, giving three RMSE values per colour image. For two images X and Y of the same resolution, the RMSE is calculated using Equation 2.1.
RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n} (X_{i,j} - Y_{i,j})^2 }    (2.1)

where: X, Y = images
m, n = size of the (two-dimensional) images
N = m × n, the total number of pixels
X_{i,j} = pixel value at (i, j) of image X
Y_{i,j} = pixel value at (i, j) of image Y
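As an illustration of Equation 2.1, the following is a minimal sketch (assuming two equally sized 8-bit RGB images loaded as NumPy arrays, e.g. with OpenCV) of the per-channel RMSE computation; the function name is a placeholder, not part of the thesis tool-chain.

import numpy as np

def channel_rmse(reference, test):
    """Return the RMSE of each colour channel of two equally sized images."""
    ref = reference.astype(float)
    tst = test.astype(float)
    n = ref.shape[0] * ref.shape[1]                      # N = m x n pixels
    return [np.sqrt(np.sum((ref[:, :, c] - tst[:, :, c]) ** 2) / n)
            for c in range(3)]                           # one RMSE value per channel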
The structural similarity index (SSIM) is used for measuring the similarity between two images. SSIM is a full-reference image quality metric, i.e. it is calculated with respect to a reference image, and was proposed by Z. Wang et al. in [11]. SSIM compares local patterns of pixel intensities that have been normalized for luminance and contrast. The reference image is assumed to have perfect quality and serves as the quantitative baseline for measuring the quality of the image under test. The SSIM metric compares luminance, contrast and structure: the luminance comparison is a function of the mean pixel intensities of the two images, the contrast comparison is a function of their standard deviations, and for the structure comparison the luminance is subtracted and the variance is normalized, making it a function of the correlation coefficient between the two images.
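A hedged sketch of an SSIM computation is given below, using the structural_similarity function of scikit-image, which implements the metric of Wang et al. [11]; the channel_axis parameter applies to scikit-image 0.19 or newer (older versions use multichannel=True), and the file names are placeholders.

import cv2
from skimage.metrics import structural_similarity

ref = cv2.imread('carmaker_frame.png')      # reference image (hypothetical path)
test = cv2.imread('ota_frame.png')          # image under test (hypothetical path)
score, ssim_map = structural_similarity(ref, test, channel_axis=2, full=True)
print('SSIM:', score)                        # 1.0 means structurally identical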
The absolute central moment (ACM) is a statistical measure developed for camera systems. ACM is a histogram-based measure, as opposed to a local gradient or sharpness measure. The author in [12] proposed and validated the ACM image quality metric for measuring focus and sharpness using experimental test patterns. The test pattern has a light grid on a dark background, which appears as black filled rectangles placed on a white background. Three further patterns were generated that are increasingly out of focus, and three more test patterns with increasingly poor exposure. The experimental results revealed that ACM is an excellent measure of image quality, taking its best values when the image has the best contrast and sharpness.
Edges in images are created by differences between neighbouring pixels. F. Cutzu et al. [13] present an image classification system that discriminates paintings from photographs. In the study, a quantitative criterion was developed based on two hypotheses: first, that edges in photographs are caused largely by changes in pixel intensity; second, that edges in paintings are caused by changes in colour that are not accompanied by intensity changes. A criterion Eg was proposed to represent the fraction of pure intensity edge pixels; Eg is expected to be higher for photographs. The results of the study showed that edges are sharper in photographs and that pure colour edges are more frequent in paintings.
Spatial resolution is the ability of an imaging system to distinguish the smallest details of two objects in an image; it characterizes how closely spaced lines can still be resolved. Spatial resolution can be assessed through the visualisation of objects of known size and by measuring the modulation of the system as a function of spatial frequency. Spatial frequency is measured in units of line pairs per unit length, where a line pair is a white line next to a black line. Figure 2.4 shows how two black and white regions (a line pair) can be distinguished.
Contrast is the measure of separation between dark and light regions in an image. At the highest contrast, a black region is perceived as truly black and a white region as truly white; as the contrast is reduced, the distinction between black and white begins to blur and both appear grey. More precisely, contrast is the change in intensity from one point to another. Figure 2.5 depicts how contrast and pixels are related. In short, contrast determines how well shades of grey can be differentiated. Contrast is measured in percent and can be calculated from the maximum and minimum intensity [14] using Equation 2.2.
Michelson Contrast = \frac{I_{max} - I_{min}}{I_{max} + I_{min}}    (2.2)
The Modulation Transfer Function (MTF) is the relative contrast at a given spatial frequency and can be seen as a bridge between contrast and spatial resolution. MTF measures the ability of an imaging lens to transfer contrast from the object plane to the image plane at a specific spatial resolution, where the object plane is the spatial area in which the object resides and the image plane is where the image resides. MTF is expressed in terms of image spatial resolution (line pairs per unit length) and contrast percentage (%). When designing a lens system, there is always a trade-off between spatial resolution and contrast: as the spatial resolution increases, the contrast decreases until the image becomes indistinguishable and grey.
MTF quantifies how well brightness variations in the object plane are preserved when they pass through a camera lens. An MTF value of 1 means that the contrast is completely preserved. As the value decreases, the contrast also decreases until it reaches its lowest value of 0, at which point line pairs can no longer be distinguished.
Figure 2.6: Slanted Edge Method: (a) Edge target and (b) projection, binning, and
averaging before forming a one-dimensional edge profile. [16]
The edge location is estimated after taking a one-dimensional derivative of each data
line and finding the centroid to a subpixel accuracy. A simple linear regression is then
performed on the collected line-to-line edge locations to estimate the edge angle. The
pixels in the ROI are then projected along the direction of the estimated edge onto the
horizontal axis, which is divided into bins with widths equal to a quarter of the sampling
pitch to reduce the influence of signal aliasing on the estimated MTF. The values of
the pixels collected in each bin are averaged, which generates a one-dimensional edge
profile with 4× oversampling. The pixel count in each bin must be large enough to
obtain a reasonable average pixel value. The derivative of the edge profile yields the
line spread function (LSF), and after applying a smoothing Hamming window to the
LSF, performing a discrete Fourier transform, and normalizing, the MTF over a range
of horizontal spatial frequencies can be estimated [16].
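The following is a simplified sketch of the slanted-edge procedure described above (a near-vertical edge inside a NumPy region of interest is assumed, and every row is assumed to contain the edge); it illustrates the projection, quarter-pixel binning, LSF and DFT steps, but it is not the ISO 12233 reference implementation used by common MTF tools.

import numpy as np

def slanted_edge_mtf(roi, oversample=4):
    roi = roi.astype(float)
    rows, cols = roi.shape
    x = np.arange(cols)
    # 1. Estimate the edge position in every row from the centroid of the derivative.
    centroids = np.empty(rows)
    for r in range(rows):
        d = np.abs(np.diff(roi[r]))
        centroids[r] = np.sum(x[:-1] * d) / np.sum(d)
    # 2. Fit a straight line through the per-row edge positions (edge angle).
    slope, intercept = np.polyfit(np.arange(rows), centroids, 1)
    # 3. Project the pixels onto the horizontal axis along the fitted edge and
    #    bin them with a quarter-pixel pitch -> 4x oversampled edge profile (ESF).
    bins = {}
    for r in range(rows):
        for c in range(cols):
            key = int(np.round((c - (slope * r + intercept)) * oversample))
            bins.setdefault(key, []).append(roi[r, c])
    esf = np.array([np.mean(bins[k]) for k in sorted(bins)])
    # 4. LSF = derivative of the ESF, smoothed with a Hamming window.
    lsf = np.diff(esf) * np.hamming(len(esf) - 1)
    # 5. MTF = magnitude of the DFT of the LSF, normalised to 1 at zero frequency.
    mtf = np.abs(np.fft.rfft(lsf))
    mtf /= mtf[0]
    freqs = np.fft.rfftfreq(len(lsf), d=1.0 / oversample)  # cycles per pixel
    return freqs, mtf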
Chapter 3
Methodology
This chapter explains the methodology used for implementing the objectives stated in Section 1.2. In Section 3.1, the important hardware and software components are described. In Section 3.2, the two methods of data injection into an ECU are explained. In Section 3.3, the implementation of the ROS packages on the Nvidia Drive PX2 is discussed. In Section 3.4, the different methods implemented for camera calibration are explained. In Section 3.5, the methodology for measuring contrast and spatial resolution using the MTF is explained. In Section 3.6, the necessity of image sorting and the methodology based on the dHash algorithm are explained. In Section 3.7, the different image quality metrics and their implementation are explained. In Section 3.8, the object detection algorithm and the ground truth are explained. In Section 3.9, the data acquisition for the two data injection methods is explained.
Robot Operating System (ROS) [17] is an open-source system that runs on Unix-based platforms. Even though ROS is not an operating system, it provides the services expected of one and can be seen as a collection of software frameworks. It provides services for implementing commonly used functionality in C++ and Python, message passing between processes, and package management. It also provides libraries and tools for building, writing and running code on multiple computers.
The primary goal of ROS is to support code reuse in research and development. ROS is organized as a distributed network of processes known as nodes, which allows executables to be designed separately and coupled only when necessary. ROS is helpful for large run-time systems and development processes. A basic ROS computation graph is represented in Figure 3.1.
A simple representation of the ROS mechanism is shown in Figure 3.1. A 'Node' is a process that performs computation, and a ROS system typically contains many nodes. Each node can communicate and exchange data with the others. The data with which nodes communicate is called a 'Message', and a 'Topic' is the name that identifies the content of a Message. A Topic can also be seen as the communication route or transport system over which nodes exchange Messages. A node can be a 'Publisher', a 'Subscriber' or both: a Publisher publishes Messages under a particular Topic name, and a Subscriber receives a Message from a Publisher whenever a Message is published on the Topic it has subscribed to. The ROS 'Master' provides the name service and registration required for the nodes to operate in the system; without the Master, no nodes can communicate with each other. The Master also keeps track of the publishers and subscribers of a particular topic and its services, and one of its main roles is to enable nodes to find one another in the system. Every Topic has a unique name registered with the ROS Master under which nodes can publish or subscribe.
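As a minimal illustration of the Publisher/Subscriber mechanism, the rospy sketch below creates one node that both publishes and subscribes to the same topic; the node and topic names and the String message type are examples only, not the ones used in this thesis.

import rospy
from std_msgs.msg import String

def callback(msg):
    rospy.loginfo('received: %s', msg.data)     # the Subscriber side

if __name__ == '__main__':
    rospy.init_node('example_node')             # register the Node with the Master
    pub = rospy.Publisher('/example_topic', String, queue_size=10)
    rospy.Subscriber('/example_topic', String, callback)
    rate = rospy.Rate(1)                        # publish at 1 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data='hello'))       # the Publisher side
        rate.sleep()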
A 2.3-megapixel camera with a resolution of 1928 × 1208 pixels has been used for the camera-in-the-loop simulation in this thesis. A pictorial view of the camera can be seen in Figure 3.2, and further details of the camera are listed in Table 3.1.
The LCD monitor has a resolution of 1920 × 1200 pixels, a brightness of 500 cd/m² and a contrast ratio of 1000:1. It accepts a 4K input (4096 × 2160) at a 24 Hz refresh rate via HDMI. When the display is connected to a Windows 10 system, latency can be reduced and display performance improved by enabling the "Hardware-accelerated GPU scheduling" option in the system display graphics settings. The LCD monitor needs a power adapter with a 5.5 × 2.5 mm plug and 12 V / 1.5 A (18 W) output power.
3.1.5 Camera-Box
The camera box serves as the transfer medium for sensor data between the host PC running CarMaker and the Nvidia Drive PX2. In the camera box, the LCD monitor described in Section 3.1.4 is placed in front of the real automotive camera described in Section 3.1.3. A previous version of the camera box was used in [10] for evaluating and validating a proprietary algorithm for multi-class object detection of an ADAS functionality. The new version of the camera box was built with improvements over the previous version.
The new version of the camera box is constructed in such a way that ambient light is blocked by a wooden box. The camera is placed so that it captures only the LCD monitor and is held in place by a 3D-printed fixture, which sits on a longitudinally movable platform that allows the camera placement to be adjusted and corrected. An outlet for the cables has been created behind the camera and is closed with a black sponge to prevent even small amounts of light from entering the camera box. In addition to the LCD monitor and the real automotive camera, a lens has been placed between the monitor and the camera; the lens is 62 mm in diameter and has a power of 6.25 dioptres.
The Video Interface Box (VIB) is a device from IPG Automotive GmbH. The VIB can be used in place of an image sensor in testing procedures such as hardware-in-the-loop testing and can emulate four camera channels simultaneously. Data can be fed into the VIB from the computer graphics output via a DisplayPort connection at a frame rate of 30 Hz.
Figure 3.4: Structure of a typical test system using a video interface box
A typical system setup using a video interface box is shown in Figure 3.4. The VIB is connected to the host system via a graphics port, and the host PC provides the frames that should be injected into an ECU. The VIB behaves like an additional monitor connected to the host PC: the data displayed on this VIB monitor is processed by the VIB, which can distribute it to up to four output channels. Each output channel can be read as a single camera input by the ECU. The readily available camera output channel interfaces are the Gigabit Multimedia Serial Link (GMSL) and Flat Panel Display Link (FPDL) formats. Internally, the VIB performs colour format conversion, manipulation of sync signals and timing, and frame buffering of the input image data. The synchronization of the channel outputs of the VIB can be triggered internally or externally.
Figure 3.5: Front View of Video Interface Box with operation elements
Figure 3.6: Back View of Video Interface Box with operation elements
Table 3.2: Short description of operating elements shown in Figure 3.5 and 3.6
Index Description
1 Power Switch
2 Control panel with up/down/left/right and OK button
3 LCD Monitor
4 Power supply jack
5 USB connector
6 Ethernet Port
7 Display Port Input connector
8 Video Output connector for Ch0 - Ch3
Figures 3.5 and 3.6 show the front and back views of the Video Interface Box with all operating elements; see Table 3.2 for a short description of the elements. The VIB provides the following settings, which can be configured using the control panel shown in Figure 3.5. For any further settings or information about the VIB, the VIB User Guide can be consulted.
• Input Config :
For setting up the VIB properly, a set of instructions is given in Appendix A. Using this appendix, the VIB can be configured as a one-channel camera with a 1920 × 1232 raw camera sensor data input.
2. Display limitations of the LCD monitor in terms of refresh rate, color, resolution,
temperature, brightness, saturation, sharpness and contrast.
5. Necessity of a lens to adjust the focal length and the field of view of the camera.
This method of data injection involves injecting sensor data directly into the ECU; no real sensor is used. One way of injecting the sensor data directly into the ECU is to use a device that emulates a camera, such as the Video Interface Box (VIB) from IPG Automotive GmbH. The raw camera sensor data from IPG CarMaker is sent to the VIB via a DisplayPort cable, and the VIB feeds the sensor data to the Nvidia Drive PX2 via a FAKRA Z-type connector.
1. Necessity of equipment capable of emulating the camera and injecting data into
an ECU.
As explained in Section 3.1.1, a system of nodes is used to read and process the camera sensor data on the Nvidia Drive PX2. Since the Nvidia Drive PX2 runs Ubuntu 16.04, ROS Kinetic Kame is installed, and all packages and files used are for the ROS Kinetic version [20]. The important ROS packages used in this thesis are explained in the following subsections.
The camera connected to the Nvidia Drive PX2 needs to be read properly in the ROS environment. Since the camera described in Section 3.1.3 is a GMSL type, a ROS GMSL driver package is used to read the camera frames. This package is capable of publishing raw images and compressed images, and of performing image down-sampling and image rectification with the calibration parameters. Two parameters of this package need particular attention: the required frames per second (FPS) and the type of camera to be read. Once these parameters are defined, the package creates a node called 'gmsl_n_cameras_node' which publishes the camera image topics.
As explained in Section 3.1.5, the camera box uses a real automotive camera (Figure 3.2) to inject the data into the Nvidia Drive PX2. The frames from the real automotive camera can be read using the camera type 'ar0231-rccb-ae-ss3322', and the frames from the VIB using the camera type 'ar0231-grbg-ae-sd3321'. Different launch files have been created in this package for the camera box and VIB setups.
Two launch files have been created for the camera box. The first launch file is used when a topic with compressed image messages at 30 FPS is needed, without down-sampling and without any image rectification. The second launch file is used when the camera image needs to be rectified with the camera calibration parameters; it publishes a topic with the rectified image messages at 30 FPS without compression and without down-sampling.
For the VIB, two launch files have likewise been created. The first, similar to the camera-box case, publishes a topic with compressed image messages at 30 FPS without down-sampling and without image rectification. The second publishes the camera image messages at 30 FPS without compression, without down-sampling and without any rectification.
A ROS package of Darknet for object detection in camera images [21] has been used as the object detection functionality in this thesis. The author of this package has implemented YOLO V3 [22], which uses the GPU capabilities. The pre-trained convolutional neural network can detect pre-trained classes, including those of the COCO and VOC data sets. When launched, the ROS Darknet node of this package subscribes to the camera node.
A custom package for the other required processes has been created in the ROS workspace on the Nvidia Drive PX2. This package helps in collecting the data, saving the ROS bag file and receiving UDP packets from the host PC, which is the source of the camera sensor data. The following paragraphs discuss a few of the important nodes that are necessary to run a simulation and save the data.
With over-the-air or direct data injection alone, there is no trigger or other way of knowing when the simulation has started. To overcome this problem, a custom UDP packet is sent from IPG CarMaker to the IP address of the Nvidia Drive PX2. With the help of ROS, a node has been created that acts as the UDP server and receives any incoming packets on the Nvidia Drive PX2. In any given simulation, two UDP packets are received: one at the start of the simulation in IPG CarMaker with value 1 and a second at the end of the simulation with value 0. The UDP packets received by the Nvidia Drive PX2 serve as the flag value for most of the custom nodes.
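A sketch of such a UDP flag server is shown below; the port number, the plain-text encoding of the flag and the topic name are assumptions, not the actual configuration used in the thesis.

import socket
import rospy
from std_msgs.msg import Int8

if __name__ == '__main__':
    rospy.init_node('udp_flag_server')
    pub = rospy.Publisher('/simulation_flag', Int8, queue_size=1)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', 5005))                      # hypothetical UDP port on the Drive PX2
    while not rospy.is_shutdown():
        data, _ = sock.recvfrom(1024)          # blocks until CarMaker sends a packet
        flag = int(data.decode().strip())      # 1 = simulation started, 0 = stopped
        pub.publish(Int8(data=flag))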
The bag file is a ROS file format that stores ROS Message data. In normal operation, a bag file can be recorded from the terminal by running the appropriate command [23]. Since IPG CarMaker provides a flag value indicating the status of the simulation, this flag can be used to save only the required data in the ROS bag. Hence, a node has been created that saves image data to the bag file only from the start of the simulation until its end and stores it at a custom location. This node can subscribe to any topic that publishes images, for example the image published by the GMSL driver node or by the ROS Darknet node.
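The flag-gated recording could look like the following sketch (topic names and the output path are assumptions); images are written to the bag only while the CarMaker flag is 1.

import rosbag
import rospy
from std_msgs.msg import Int8
from sensor_msgs.msg import Image

bag = rosbag.Bag('/home/nvidia/recordings/run.bag', 'w')   # hypothetical location
recording = False

def flag_callback(msg):
    global recording
    recording = (msg.data == 1)
    if not recording:
        rospy.signal_shutdown('simulation finished')       # stop after the 0 flag

def image_callback(msg):
    if recording:
        bag.write('/gmsl/image_raw', msg, msg.header.stamp)

if __name__ == '__main__':
    rospy.init_node('flagged_bag_recorder')
    rospy.Subscriber('/simulation_flag', Int8, flag_callback)
    rospy.Subscriber('/gmsl/image_raw', Image, image_callback)
    rospy.spin()
    bag.close()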
Saving JSON File with the Object List from ROS Darknet
Since ROS Darknet is used for object detection, the output of this node needs to be saved for later evaluation. Hence, a custom node has been created that subscribes to the ROS Darknet node and receives the details of the objects detected in the images. The JSON-saving node writes the class of the object, the probability of the object and the bounding box details (x, y, width and height) into a text file at a custom location. This node also subscribes to the UDP publisher node so that it starts and stops writing the JSON file at the moment the simulation is started and ended from IPG CarMaker.
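A sketch of such a logger is given below. The BoundingBoxes message layout follows the public darknet_ros package, but the field names should be checked against the installed message definition; the topic name and output path are assumptions, and the start/stop gating by the UDP flag is omitted for brevity.

import json
import rospy
from darknet_ros_msgs.msg import BoundingBoxes

out = open('/home/nvidia/recordings/detections.txt', 'a')   # hypothetical location

def boxes_callback(msg):
    for box in msg.bounding_boxes:
        record = {'class': box.Class,
                  'probability': box.probability,
                  'x': box.xmin, 'y': box.ymin,
                  'width': box.xmax - box.xmin,
                  'height': box.ymax - box.ymin}
        out.write(json.dumps(record) + '\n')                # one JSON object per line

if __name__ == '__main__':
    rospy.init_node('detection_logger')
    rospy.Subscriber('/darknet_ros/bounding_boxes', BoundingBoxes, boxes_callback)
    rospy.spin()
    out.close()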
The bag file is a ROS file format whose contents cannot be accessed without special software, so it is important to extract the data from the bag file using ROS itself. Since the recorded bag files mainly contain images, a separate node converts these bag files into easily accessible image formats such as JPG, JPEG or PNG.
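A sketch of the extraction step using the rosbag and cv_bridge APIs is shown below; the bag path, topic name and output directory are placeholders.

import cv2
import rosbag
from cv_bridge import CvBridge

bridge = CvBridge()
with rosbag.Bag('/home/nvidia/recordings/run.bag', 'r') as bag:
    for i, (topic, msg, t) in enumerate(
            bag.read_messages(topics=['/gmsl/image_raw'])):
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        cv2.imwrite('/home/nvidia/recordings/frames/%06d.png' % i, frame)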
ROS camera calibration can be performed using a checkerboard calibration target; the complete calibration in this method is performed in the ROS environment. The camera shown in Figure 3.2 is calibrated using a 9 × 8 checkerboard with a square size of 136 mm. Since the camera data is read by the Nvidia Drive PX2 using the ROS GMSL driver, the published topic name of the raw camera is given as an argument to the ROS camera calibrator. For a good overall calibration, the checkerboard needs to be moved around the camera's field of view: once the camera calibrator node is running, the checkerboard is moved from right to left, top to bottom, towards and away from the camera, and tilted to the left, right, top and bottom. A detailed explanation and implementation of the ROS camera calibration node used in this thesis can be found at [25].
Figure 3.9: Raw camera image stream with image processing in ROS
Once the camera calibration parameters have been generated, an image processing node is added after the GMSL driver in ROS. This image processing node uses the calibration parameters to rectify the camera input and then publishes the rectified camera data; a simple representation of this process is shown in Figure 3.9. An advantage of the image processing node is that all processing is performed on demand: the rectification of the camera image happens only if there is a subscriber to the corresponding rectified topic. The image processing layer requires the camera info and the raw image from the camera driver to work properly. Using these two pieces of information, a new launch file and a corresponding YAML file are created in the GMSL driver package (Section 3.3.1). The output of the image processing layer provides the rectified colour image whenever the GMSL driver publishes camera messages.
Once the images of the checkerboard have been collected, they are read into the MATLAB Camera Calibrator app. With the help of MATLAB, the calibration parameters of this method are extracted and used to rectify the images.
The MTF is measured using the slanted-edge method explained in Section 2.3.3. For this purpose, a target has been selected and imaged with the VIB and with the automotive camera, i.e. the OTA and DDI methods are used to capture an image with the target in the field of view. The target is made to cover the complete field of view of the camera and of the VIB in order to avoid interference from other objects or from the lighting of the scenario, which could greatly affect the MTF measurement. The selected target is shown in Figure 3.10.
The OTA method uses a real automotive camera in the camera box. In addition to the procedure described in Section 3.5.2, the camera is taken out of the camera box and used to capture a real printed MTF target, analogous to the simulation-based image capture of Section 3.5.2. The MTF target is printed on A1 white paper and fixed to a wall. The camera is placed centred in front of the A1 sheet at a distance such that the complete target lies in its field of view. Figure 3.11 shows the setup for capturing the image of the real MTF target.
Figure 3.11: Setup for capturing an image with the camera for MTF measurement using a printed MTF target
3.5.2 Image Acquisition from OTA and DDI for MTF Measurement
The MTF is a measurement related to the optics of the imaging system. Since the camera sensor data to be injected is generated by IPG CarMaker, it is consistent to capture the MTF target with IPG CarMaker as well. The MTF target of Figure 3.10 is placed on a sign plate in the IPG CarMaker scenario; Figure 3.12 shows how the target is placed in the simulation so that it lies fully in the field of view of the front camera of the vehicle. This approach is used because of the presence of an external lens in the camera box. For the VIB, the only possible way is to inject the target directly into it. Hence, to keep both methods identical, the same IPG CarMaker scenario is used to capture the MTF target.
Figure 3.12: MTF Target placed in the field of view of the camera in the IPG CarMaker
The video interface box is capable of working at a frame rate of 30 Hz, whereas the LCD monitor used in this thesis only supports a frame rate of 24 Hz at 4K (4096 × 2160) resolution. The automotive camera can output at a maximum frame rate of 30 Hz. In the OTA method, the LCD monitor therefore displays the camera sensor data at 24 Hz while the automotive camera captures it at 30 Hz; the sampling rate of the LCD monitor is thus lower than the Nyquist frequency required by the automotive camera, which causes an aliasing effect in the OTA method. In addition, when the camera sensor data is displayed on the LCD monitor, it is cropped because the resolution of the LCD monitor does not match the resolution of the input camera sensor data, i.e. 1920 × 1208.
During initial experiments with data injection into the ECU, it was found that the frame rates of OTA and DDI were not the same: because the OTA path includes an additional image processing layer in ROS, the frame rate of the OTA method is lower than that of the DDI method. Hence, to avoid the effects of aliasing and of mismatched frames due to the different frame rates of OTA and DDI, the images extracted from OTA, DDI and CarMaker need to be sorted so that similar images are available for the image quality comparison and for evaluating object detection on the images.
Perceptual hashing is the process of generating a hash value based on the visual content of an image. One such algorithm is dHash, which tracks gradients in an image: it works on the differences between adjacent pixels, thereby identifying the gradient direction, and computes a 64-bit hash for an image [27]. The four steps involved in generating the dHash value of an image are listed below.
1. Reducing Size: the image is scaled down to a small, fixed size so that only the coarse gradient structure remains.
2. Reducing Colour: the image is reduced from a three-channel colour image to a one-channel greyscale image.
3. Computing Row and Column Hashes: the differences between adjacent pixels along the rows and along the columns are calculated separately; a bit is set to 1 if the pixel intensity increases in the given direction and to 0 if it decreases.
4. Computing the dHash Value: the row and column hash values are combined to obtain the dHash value.
Using their dHash values, images can be compared with one another; the Hamming distance is used to measure how different they are. Once the dHash values of two images have been computed, the Hamming distance between them decides how similar the images are. According to the author in [27], a Hamming distance between one and ten indicates images that are potentially variations of the same image.
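A minimal dHash and Hamming-distance sketch following the four steps above is given below (the 9 × 8 resize producing 64 row bits is the common convention of [27]; combining the row and column hashes as in step 4 yields a 128-bit hash here).

import cv2
import numpy as np

def dhash(image, size=8):
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)           # step 2: reduce colour
    row_img = cv2.resize(grey, (size + 1, size))             # step 1: reduce size
    col_img = cv2.resize(grey, (size, size + 1))
    row_bits = (row_img[:, 1:] > row_img[:, :-1]).flatten()  # step 3: row gradients
    col_bits = (col_img[1:, :] > col_img[:-1, :]).flatten()  #         column gradients
    bits = np.concatenate([row_bits, col_bits])              # step 4: combine
    return int(''.join('1' if b else '0' for b in bits), 2)

def hamming_distance(hash_a, hash_b):
    return bin(hash_a ^ hash_b).count('1')                   # number of differing bits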
In this thesis, the dHash algorithm is used to sort the images from CarMaker, OTA and DDI. The original sensor data from CarMaker is compared with the images extracted from the Nvidia Drive PX2 via the DDI and OTA data injection methods in order to obtain similar images for further analysis and evaluation.
The extracted images of the OTA and DDI methods are compared with the injected images exported from CarMaker. First, the dHash value is calculated for the Nth image from CarMaker. Then, the dHash values of images N−20 to N+20 of OTA and DDI are calculated, and the Hamming distances of these OTA and DDI images with respect to the Nth CarMaker image are computed. Within the window N−20 to N+20, the image with the lowest Hamming distance is selected from OTA and from DDI, provided it satisfies the threshold; if the threshold is not satisfied, that particular CarMaker image is not included in the sorted image set. Figure 3.13 shows how this is implemented in Python in the form of a flowchart.
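Complementing the flowchart in Figure 3.13, the sketch below shows the N ± 20 window matching in Python, reusing the dhash() and hamming_distance() helpers from the previous sketch; the threshold value and the in-memory frame lists are placeholders.

def sort_images(cm_frames, test_frames, threshold=10, window=20):
    """Return index pairs (n, m): CarMaker frame n matched to OTA/DDI frame m."""
    matches = []
    for n, cm in enumerate(cm_frames):
        ref_hash = dhash(cm)
        best_m, best_dist = None, None
        lo, hi = max(0, n - window), min(len(test_frames), n + window + 1)
        for m in range(lo, hi):
            dist = hamming_distance(ref_hash, dhash(test_frames[m]))
            if best_dist is None or dist < best_dist:
                best_m, best_dist = m, dist
        if best_dist is not None and best_dist <= threshold:
            matches.append((n, best_m))          # otherwise frame n is dropped
    return matches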
Figure 3.13: Flowchart of the dHash-based sorting of the CarMaker, OTA and DDI images
The images from the OTA and DDI methods are extracted from the Nvidia Drive PX2 and sorted using the dHash algorithm described in Section 3.6.2 to obtain similar images that can be compared with each other. Along with these images, the original input images from IPG CarMaker are also extracted. These three sets of images, CarMaker, OTA and DDI, are subjected to the image quality measurements.
3.7.1 Area Under the Curve (AUC) and Root Mean Squared Error (RMSE) of R, G and B Channels
The R, G and B channels are the three main constituents of a colour image. The images are separated into their R, G and B channels and compared with the original input image from CarMaker. The RMSE of an image for a given colour channel is a single value, but in a video each frame has its own RMSE value. To evaluate the RMSE of a given video, the RMSE value is plotted over the frame number; calculating the area under this curve (AUC) yields the overall cumulative RMSE value of the video.
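As a sketch of this evaluation (assuming the per-frame RMSE values of a scenario are already available in a list), the area under the RMSE-over-frames curve can be approximated with the trapezoidal rule:

import numpy as np

def rmse_auc(per_frame_rmse):
    """Cumulative RMSE of a scenario: AUC of the per-frame RMSE curve."""
    return np.trapz(per_frame_rmse, dx=1.0)    # frame index serves as the x axis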
The absolute central moment (ACM) is a statistical measure extracted from a digital image to quantify image quality. The author in [12] proposed this histogram-based image quality metric to measure the sharpness, focus and exposure of an imaging system. The ACM of an image can be calculated using Equation 3.1; for the calculation, the image needs to be converted to unsigned integers. The calculation of the ACM in this thesis is implemented in MATLAB based on the work of S. Pertuz [29].
In this thesis, the ACM is used as a metric to compare the focus of the OTA and DDI images with their reference images from CarMaker. The comparison between the OTA- and DDI-based images is carried out using the average ACM of the images of a particular scenario: first, the ACM of the OTA images, the DDI images and the CarMaker reference images of the scenario is calculated, and these three sets of ACM values are then averaged per scenario. In the results section, Equation 3.3 is used to calculate the RMSE of the ACM values of a given scenario for the selection of scenarios for object detection.
ACM = \sum_{i=0}^{N-1} |i - \mu| \, p(i)    (3.1)

\mu = \sum_{i=0}^{N-1} i \, p(i)    (3.2)

RMSE of ACM values = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} ((ACM_X)_i - (ACM_Y)_i)^2 }    (3.3)

where: (ACM_X)_i = ACM value of the reference image with frame number i
(ACM_Y)_i = ACM value of the test image with frame number i
N = number of frames in a given scenario
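The sketch below computes the ACM of Equations 3.1 and 3.2 for an 8-bit greyscale image, with p(i) taken as the normalised grey-level histogram; it follows the published definition rather than the referenced MATLAB implementation of S. Pertuz [29].

import numpy as np

def absolute_central_moment(grey_u8):
    hist, _ = np.histogram(grey_u8, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # p(i): grey-level probabilities
    i = np.arange(256)
    mu = np.sum(i * p)                         # Equation 3.2
    return np.sum(np.abs(i - mu) * p)          # Equation 3.1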
Colour is one of the main features of digital images, but removing the colour from a digital image does not mean that all visual information is lost. In particular, removing the colour from a digital image still leaves the edges present in the image. Based on these observations, the author in [13] developed a quantitative criterion called Eg which uses the pure intensity edge pixels and the pure colour edge pixels.
To calculate Eg, a colour image is used as input. The intensity edges are obtained by converting the input colour image to a grey-scale image and applying a Canny edge detector [30]. In addition, the red, green and blue channels of the image are separated and normalized by dividing each pixel by the intensity at that pixel; the intensity at a pixel is calculated using Equation 3.4. The three normalized colour channel images are passed through the Canny edge detector and the resulting edge maps are fused to obtain an intensity-free image with pure colour edge pixels. Finally, Eg is calculated using Equation 3.5. It is observed that Eg is larger for photographs [31].
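Since Equations 3.4 and 3.5 are not reproduced here, the sketch below only illustrates the described pipeline under two assumptions: the intensity of a pixel is taken as the mean of its R, G and B values, and Eg is taken as the fraction of intensity edge pixels among all detected edge pixels; the Canny thresholds are placeholders.

import cv2
import numpy as np

def edge_criterion(bgr):
    grey = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    intensity_edges = cv2.Canny(grey, 100, 200) > 0           # pure intensity edges
    b, g, r = cv2.split(bgr.astype(float))
    intensity = (r + g + b) / 3.0 + 1e-6                      # avoid division by zero
    colour_edges = np.zeros_like(intensity_edges)
    for channel in (r, g, b):
        norm = cv2.normalize(channel / intensity, None, 0, 255,
                             cv2.NORM_MINMAX).astype(np.uint8)
        colour_edges |= (cv2.Canny(norm, 100, 200) > 0)       # fuse colour edge maps
    total = np.count_nonzero(intensity_edges | colour_edges)
    return np.count_nonzero(intensity_edges) / total if total else 0.0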
Ground truth refers to the true information about the objects present in an image. For object detection, the ground truth contains the bounding boxes of the objects that are actually present in the image; this information is needed for the evaluation of the object detection. In this thesis, the MATLAB Ground Truth Labeler app is used to label the images for object detection. The labelled objects are cars, pedestrians, bicycles and traffic signs. Since there are three image sets, i.e. CarMaker, OTA and DDI, the ground truth is labelled separately for all three.
The object detection using YOLO V3 is implemented on the Nvidia Drive PX2 using the ROS
object detection package mentioned in section 3.3.2. This package utilizes the input from the
GMSL camera driver to receive the images in real-time and is integrated with the
camera-in-the-loop. To save the bounding box output of the algorithm, the custom ROS node
mentioned in section 3.3.3 is used. This node saves the bounding box data, class and
confidence of each detected object to a text file.
The object detection using YOLO V4 [32] is implemented in the Linux operating system of the
Host PC [33]. This object detection runs offline; it is not real-time detection since it is
not integrated with the camera-in-the-loop system. First, the images from the Nvidia Drive
PX2 are extracted. Then, the images are subjected to object detection using the YOLO V4
implementation [33]. The result of the object detection is saved in a text file, which is
then used to evaluate the performance of the object detection.
Figure 3.14: Intersection over union of ground-truth (red outline) and predicted (green
outline) bounding boxes
For measuring the performance of object detection, the concept of intersection over union
(IoU) is used. IoU is the ratio of the intersection to the union of the ground-truth bounding
box and the predicted bounding box. An example of IoU is shown in Figure 3.14: the red
outline is the ground-truth bounding box and the green outline is the predicted bounding box.
An IoU of 1 implies that the predicted and the ground-truth bounding boxes overlap perfectly.
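A minimal Python sketch of the IoU computation is shown below, assuming that the bounding boxes are given in corner format (x_min, y_min, x_max, y_max); the label files of the actual tool-chain may use a different box format.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0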
The IoU threshold considered in this thesis is 0.5. If the IoU is greater than 0.5, the
detection is classified as a True Positive (TP). If the IoU is less than 0.5, it indicates a
false object detection, known as a False Positive (FP). When an object is present in the
ground-truth but there is no corresponding predicted object, it is called a False Negative
(FN). To evaluate the performance, metrics such as precision and recall are used. Precision
can be calculated using Equation 3.6 and recall using Equation 3.7. Ideally, both precision
and recall have a value of 1, but inaccuracies of the object detection reduce them. Precision
indicates how accurate the predictions are, and recall indicates how many of the relevant
objects are correctly detected. To obtain an optimal blend of precision and recall, they can
be combined in the F1 score, the harmonic mean of precision and recall, calculated using
Equation 3.8. The F1 score gives equal weight to precision and recall and can therefore be
used to balance them when optimizing the object detection algorithm. False Positives Per
Image (FPPI) is a metric to evaluate the falsely predicted objects and can be calculated
using Equation 3.9. A lower FPPI indicates a higher chance that the detected objects are true
positives.
Precision = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}        (3.6)

Recall = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}           (3.7)

F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}   (3.8)

FPPI = \frac{\text{False Positive}}{\text{Number of Frames}}                                 (3.9)
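A minimal Python sketch of Equations 3.6 to 3.9 is given below, assuming the TP, FP and FN counts have already been accumulated over all frames of a scenario.

def detection_metrics(tp, fp, fn, num_frames):
    """Precision, recall, F1 score and FPPI from accumulated counts (Equations 3.6 to 3.9)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    fppi = fp / num_frames                      # false positives per image/frame
    return precision, recall, f1, fppi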
The camera sensor data required for injection into the Nvidia Drive PX2 is generated using
IPG CarMaker. CarMaker provides a visual interface to create scenarios, which can be used to
feed the camera testing methods with synthetic sensor data. Various weather and traffic
conditions can be selected for testing. The built-in camera sensor option makes CarMaker well
suited for generating sensor data from the point of view of a car's front camera.
For evaluating the two data injection methods in this thesis, 14 different scenarios have
been created in IPG CarMaker. Each scenario differs from the others in aspects such as
traffic intensity, surrounding view, traffic junctions, buildings, vehicles, pedestrians and
cyclists. These scenarios are used to conduct the tests with the over-the-air and direct data
injection methods.
3.9.2 Workflow
All the scenarios mentioned in section 3.9.1 need to be tested with both the OTA (section
3.2.1) and the DDI (section 3.2.3) setup for the image quality comparison. In this section,
the workflow for testing these 14 scenarios using the OTA and DDI methods is discussed. The
workflow for conducting the test for a single scenario involves the following steps:
2. Start the GMSL camera driver on the Nvidia Drive PX2 using the ROS package mentioned
in section 3.3.1.
3. Verify the image read by the GMSL camera driver in Nvidia Drive PX2.
4. Establish the UDP connection between the Host PC and the Nvidia Drive PX2 using the
UDP publisher node from the custom ROS package from section 3.3.3.
5. Start the node for saving the bag file of the injected camera data on the Nvidia Drive
PX2.
6. Start the simulation in IPG CarMaker on the Host PC. Wait until the simulation is stopped
by IPG CarMaker and the bag file is saved on the Nvidia Drive PX2.
7. Run the node to convert the bag files to images on the Nvidia Drive PX2 (a minimal sketch
of such a conversion is shown after this list).
8. Run the object detection package along with the node to save the JSON file, only if
object detection using YOLO V3 in ROS is required.
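As referenced in step 7, the following is a minimal Python sketch of such a bag-to-image conversion; the topic name, image encoding and output paths are assumptions and depend on the GMSL camera driver configuration used in this setup.

import os
import cv2
import rosbag
from cv_bridge import CvBridge

def bag_to_images(bag_path, image_topic="/camera/image_raw", out_dir="frames"):
    """Extract all images recorded on one topic of a rosbag and save them as numbered PNG files."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    bridge = CvBridge()
    with rosbag.Bag(bag_path, "r") as bag:
        for idx, (_, msg, _) in enumerate(bag.read_messages(topics=[image_topic])):
            frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
            cv2.imwrite(os.path.join(out_dir, "frame_%06d.png" % idx), frame)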
The above-mentioned steps are used to test a single scenario with either the OTA or the DDI
method. Using these steps, the 14 scenarios can be tested with OTA and DDI to extract the
corresponding image sets. The injected images from IPG CarMaker are exported using its
built-in image export option. Once the images of all 14 scenarios for OTA, DDI and CarMaker
are extracted, they are sorted to obtain corresponding images using the dHash algorithm
mentioned in section 3.6.2. The sorted images are used to calculate the image quality metrics
explained in section 3.7 and to evaluate object detection with the offline YOLO V4 algorithm.
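The dHash comparison used for this image sorting can be sketched in Python as follows; the hash size of 8 follows the common description of the algorithm [27], while the pairing logic and distance threshold of the actual sorting step are not reproduced here.

import cv2
import numpy as np

def dhash(bgr_img, hash_size=8):
    """Difference hash: resize to (hash_size + 1) x hash_size and compare horizontal neighbours."""
    gray = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (hash_size + 1, hash_size))
    return (small[:, 1:] > small[:, :-1]).flatten()      # 64 bits for hash_size = 8

def hamming_distance(hash_a, hash_b):
    """Number of differing bits; a small distance indicates visually similar images."""
    return int(np.count_nonzero(hash_a != hash_b))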
Chapter 4
Results and Discussion
This chapter discusses the results of the implementation explained in Chapter 3. Section 4.1
presents the results and discussion of the MTF measurement for the over-the-air (OTA) and
direct data injection (DDI) methods. Section 4.2 contains the results and discussion
regarding the image quality measurement and comparison. Section 4.3 contains the results and
discussion of the object detection.
The Modulation Transfer Function (MTF) determines how much contrast of the object plane is
perceived at the image plane with a given optical system; it relates the contrast and the
spatial resolution of an image. The MTF value at 50 % contrast, i.e. MTF50, is a commonly
used summary value for the spatial resolution with respect to pixels and contrast. As
explained in section 2.3.3, a higher MTF value represents good contrast preservation, while a
lower MTF value represents a loss of contrast. The MTF is measured using the slanted edge
method mentioned in section 2.3.3. The methodology for measuring the MTF of the OTA and DDI
methods is described in section 3.5. The results of these MTF measurements are discussed in
the following sections.
An MTF target is necessary to measure the MTF using the slanted edge method. For measuring
the MTF of the automotive camera, a real printed target is used. The result of the
measurement is shown in Figure 4.1. The Y-axis represents the measured MTF value and the
X-axis the spatial resolution of the camera. In Figure 4.1, a red point marks MTF50; it
indicates the MTF value at approximately 50 % contrast and the corresponding spatial
resolution. The measurement shows that at MTF50 the automotive camera attains a spatial
resolution of 0.14 cycles per pixel.
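For reference, the MTF50 values quoted in this chapter can be read off a measured curve with a small interpolation step; the following Python sketch assumes the curve is available as (spatial frequency, MTF) samples sorted by increasing frequency.

import numpy as np

def mtf50(frequencies, mtf_values):
    """Spatial frequency (cycles/pixel) at which the MTF curve first drops to 0.5,
    linearly interpolated between the two samples around the crossing."""
    freqs = np.asarray(frequencies, dtype=float)
    mtf = np.asarray(mtf_values, dtype=float)
    below = np.where(mtf <= 0.5)[0]
    if below.size == 0:
        return None                       # curve never drops to 50 % contrast
    k = below[0]
    if k == 0:
        return freqs[0]
    f0, f1, m0, m1 = freqs[k - 1], freqs[k], mtf[k - 1], mtf[k]
    return f0 + (0.5 - m0) * (f1 - f0) / (m1 - m0)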
Figure 4.1: MTF curve measured for automotive camera using a real printed target
Since measuring the MTF requires an MTF target, a digital MTF target has been placed in the
field of view of the front camera of the car inside the CarMaker simulation. Using this
method, the MTF for OTA, where the automotive camera is used, is measured. Figure 4.2 shows
the measurement result for OTA. The MTF curve at MTF50 yields a spatial resolution of 0.15
cycles per pixel, indicated by a red point on the curve.
In the OTA setup, a lens has been placed between the LCD monitor and the automotive camera.
In section 4.1.1 the automotive camera alone has a spatial resolution of 0.14 cycles per
pixel, whereas the MTF measured for OTA yields 0.15 cycles per pixel. Even though the same
automotive camera is used in both measurements, the OTA setup shows a 7.14 % increase in
spatial resolution. The main reason for the increased spatial resolution is the presence of
the lens in addition to the automotive camera.
The Video Interface Box (VIB) is the optical system in the DDI method. Hence, the MTF is
measured using a digital MTF target inside the CarMaker simulation. Figure 4.3 shows the
result of the MTF measurement for DDI. A red point is marked on the curve at MTF50 to read
off the corresponding spatial resolution. The spatial resolution of DDI at MTF50 is 0.27
cycles per pixel.
Between the MTF measurements of the OTA and the DDI method, the DDI method shows the better
result. The DDI measurement has an 80 % higher spatial resolution than the OTA method's value
of 0.15 cycles per pixel. The higher spatial resolution of DDI at MTF50 is a clear indication
that the DDI method preserves contrast better and provides a better spatial resolution than
the OTA method.
As described in section 3.7.1, the RMSE values of the images of each of the 14 scenarios are
calculated. These RMSE values are plotted against the respective frame numbers for each
scenario. The area under the curve of these plots is then calculated to measure the overall
cumulative RMSE of a single scenario.
In Figure 4.4, the area under the curve of the RMSE values of the red channel is plotted for
each scenario. It represents the overall RMSE of the images in the red channel compared to
the red channel of the reference image. In all scenarios, the area under the curve of the
RMSE values of the red channel is higher for the OTA method than for the DDI method. This
means that the pixel error of the OTA method, calculated with respect to the reference image
from CarMaker that is injected into the camera via the LCD monitor, is higher. The lower
error value of the DDI method indicates that its pixel values in the red channel are much
closer to the reference image from CarMaker that is injected into the VIB device.
Figure 4.4: Area under the curve of RMSE values of Red Channel comparison of OTA
and DDI for 14 scenarios
Subsequently, in Figure 4.5 the area under the curve of the RMSE values of the green channel
is plotted for each scenario. As for the red channel, the OTA method shows a higher area
under the curve of the RMSE for the green channel than the DDI method. Furthermore, Figure
4.6 shows that the area under the curve of the RMSE values of the blue channel behaves
similarly to the other two color channels.
Figure 4.5: Area under the curve of RMSE value of Green Channel comparison of OTA
and DDI for 14 scenarios
Figure 4.6: Area under the curve of RMSE value of Blue Channels comparison of OTA
and DDI for 14 scenarios
The structural similarity index of each image is calculated with respect to its reference
image as described in section 3.7.2. The SSIM value is calculated for all images of a given
scenario for the OTA and DDI methods. To visualise these results, a box plot is shown in
Figure 4.7. The Y-axis represents the SSIM values of the images, the primary X-axis
distinguishes the OTA- and DDI-based test cases of a particular scenario and the secondary
X-axis lists the test case scenarios. The box plot is divided into 14 parts by solid vertical
lines separating the scenarios. For each scenario there are two box plots, the first for the
OTA method and the second for the DDI method. The higher the SSIM value, the higher the
similarity between the test image and the reference image.
Figure 4.7: Box plot for structural similarity index comparison of OTA and DDI for 14
scenarios
The horizontal line inside each box indicates the mean SSIM value of the images in a given
scenario. In every scenario, the mean SSIM value of DDI is higher than the mean SSIM of the
OTA method. The maximum observed SSIM is 0.7484 in scenario 12 with the DDI method; the
lowest observed SSIM value is 0.5454 in scenario 6 with the OTA method. Figure 4.7 shows that
the SSIM values of the frames in all scenarios are much higher for the DDI method than for
the OTA method. It can be inferred that the DDI method shows a higher similarity to the
reference image, i.e. the CarMaker image.
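The thesis uses the MATLAB ssim function [28] for this comparison. As an illustration only, an equivalent per-scenario mean SSIM can be sketched in Python with scikit-image on grayscale versions of the frames; the default parameters of the two implementations may differ.

import cv2
from skimage.metrics import structural_similarity

def mean_scenario_ssim(test_frames, ref_frames):
    """Mean SSIM of a scenario, computed frame by frame on grayscale versions of the images."""
    scores = []
    for test_img, ref_img in zip(test_frames, ref_frames):
        test_gray = cv2.cvtColor(test_img, cv2.COLOR_BGR2GRAY)
        ref_gray = cv2.cvtColor(ref_img, cv2.COLOR_BGR2GRAY)
        scores.append(structural_similarity(test_gray, ref_gray, data_range=255))
    return sum(scores) / len(scores)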
From the results of the ACM for the 14 scenarios, the average ACM value of the CarMaker (CM)
images is always higher than the averages of the OTA and DDI methods. The CM images of
scenario 7 have the highest average ACM value, 55.5806, of all scenarios. Since the CM images
are the IPG CarMaker images used for camera data injection, they are expected to have the
highest ACM values compared to the OTA and DDI methods. Between the OTA and DDI methods, the
average ACM value of DDI is higher in all scenarios except scenario 8; the DDI method also
has the lowest average ACM value of all, 25.5222, in scenario 8.
Figure 4.8: Absolute Central Moment value comparison of OTA, DDI and CarMaker for
14 scenarios
The metric used to evaluate the edges in an image is described in section 3.7.4. Figure 4.9
shows the average value of Eg over the 14 scenarios for the OTA and DDI methods. The average
Eg of the DDI method is higher than that of the OTA method in every scenario. The highest Eg
is observed in scenario 12 for the DDI method with a value of 0.9386; in the same scenario,
the OTA method is 9.33 % lower. The lowest Eg value, 0.5631, is observed with the DDI method,
and in the same scenario the OTA method is 33.17 % lower than DDI.
As described in section 3.7.4, the Eg value is a quantitative measure based on pure intensity
edge pixels and pure color edge pixels. The higher average Eg value of the DDI method
indicates that the images obtained using DDI contain more edge pixels, i.e. more of the
pixels that constitute edges in the image are preserved.
Figure 4.9: Average value of Eg comparison of OTA and DDI for 14 scenarios
To measure how well the OTA and DDI methods perform in testing a camera-based algorithm, an
object detection algorithm has been tested in this thesis. Two approaches were used to
implement the object detection; they are explained in sections 3.8.2 and 3.8.3. Using the
YOLO V3 object detection algorithm in ROS posed the difficulty of keeping the FPS of OTA and
DDI the same. Because the automotive camera needs to be calibrated, an image processing layer
has been added to rectify the image. In ROS, the output of the GMSL camera driver is fed to
this image processing layer; once the input image has been rectified, the layer publishes the
rectified image. Because of this rectify-then-publish process, images arrive at the YOLO V3
algorithm in ROS at a much lower rate. In the case of the DDI method, there is no image
processing layer between the GMSL camera driver and the YOLO V3 algorithm. Since the DDI
method has a higher frame rate, more images are evaluated for object detection than with the
OTA method and its lower frame rate.
Since the object list is generated from the input images of the YOLO V3 algorithm, the
results of the OTA and DDI methods are difficult to compare when their frame rates differ.
For the evaluation of the object detection, the method mentioned in section 3.8.3 is
therefore used. In this method, the object detection is performed offline after extracting
the images from the OTA and DDI methods at the same frame rate of 30 FPS. The object
detection is conducted with an IoU threshold of 65 %. The extracted images are first sorted
using the dHash algorithm (section 3.6.2). After the image sorting, the ground-truth for the
image sets of OTA, DDI
and CarMaker are labelled. Finally, the object detection algorithm is performed locally
with the YOLO V4 algorithm for images from OTA, DDI and CarMaker.
In this thesis, a total of 14 different scenarios are simulated, and all image quality
metrics are measured for the images of these 14 scenarios. For object detection, however,
only two of these scenarios are selected. The aim is to select the scenarios with the highest
and the lowest image quality. To determine for which scenarios to conduct object detection, a
selection criterion is created based on the image quality measurements. The considerations
for the selection criterion are listed below.
RMSE of ACM values
The RMSE of the ACM values of OTA and DDI with respect to their CarMaker reference is
calculated as described in section 2.3 using Equation 3.3. This results in a single RMSE
value per scenario for OTA and DDI, i.e. 14 RMSE values for OTA and 14 for DDI. These 28
values are then normalized between 0 and 1 using the min-max normalization method. The
resulting normalized values are shown in Table 4.1 in columns 10 and 11 for DDI and OTA
respectively.
RMSE of Eg values
The edge pixel metric (Eg) is a no-reference image quality metric. Hence, the Eg value is
calculated for the OTA, DDI and CarMaker images of a single scenario. Then, the RMSE of the
Eg values of OTA and DDI with respect to their CarMaker reference is calculated. This
procedure is followed for all 14 scenarios to obtain 14 RMSE values for DDI and 14 for OTA.
Finally, the RMSE values of OTA and DDI are normalized between 0 and 1 using the min-max
normalization method. The resulting normalized values are shown in Table 4.1 in columns 12
and 13 for DDI and OTA respectively.
The scenario selection is based on a statistical representation of the error between the
expected and the real images of a given scenario for OTA and DDI. In Table 4.1, values nearer
to 0 indicate a lower error and values nearer to 1 a higher error for the respective image
quality measurement. Hence, the average of the normalized values of a particular scenario can
be used as a quantity representing the total image quality error of OTA or DDI for that
scenario. The overall average value of a single scenario for OTA is the mean of all
normalized values in the row of that scenario under the OTA columns; the overall average
value for DDI is calculated in the same way. The resulting overall average values for DDI and
OTA are shown in columns 14 and 15 respectively.
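A minimal Python sketch of the min-max normalization and of the per-scenario overall average is given below, assuming each metric is available as an array with one value per scenario and method; the names are illustrative.

import numpy as np

def min_max_normalize(values):
    """Scale a set of values to the range [0, 1] using min-max normalization."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def overall_average(normalized_columns):
    """Row-wise mean over the normalized metric columns of one method (OTA or DDI),
    i.e. the overall average value per scenario used for the scenario selection."""
    return np.mean(np.column_stack(normalized_columns), axis=1)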
For the selection of scenarios, the highest and the lowest value in the overall average value columns of Table 4.1 are used. Scenario 8 has the lowest value, 0.2152, for the DDI method, and scenario 7 has the highest value, 0.7316, for the OTA method. This implies that, among all scenarios, scenario 7 with the OTA method has the lowest overall image quality and scenario 8 with the DDI method has the highest. Based on these results, scenario 7 and scenario 8 are selected for object detection.
Scenario 7
The object detection evaluation results with an IoU threshold of 65 % for the images from IPG CarMaker, OTA and DDI are shown in Figures 4.10, 4.11 and 4.12 respectively. These figures show True Positives (TP) in blue, False Negatives (FN) in red and False Positives (FP) in yellow. The frames are plotted on the X-axis and the Y-axis gives the absolute number of TP, FN or FP. The results are summarized using precision, recall, F1 score and False Positives Per Image (FPPI) in Table 4.2.
Since the CarMaker images are the reference images, the differences in the performance of OTA and DDI are given with respect to the CarMaker result. The DDI method shows a 0.6 % decrease in precision, while the OTA method shows a 0.26 % increase. The recall of the DDI method decreases by 4.7 % and that of OTA by 6.9 %. The F1 score of the DDI method decreases by 2.9 %, whereas that of the OTA method decreases by 3.8 %. The FPPI of the DDI method almost doubles (an increase of 99.6 %) while that of the OTA method is halved (a decrease of 50.1 %).
Scenario 8
Similar to the scenario 7 results, the object detection evaluation results with an IoU threshold of 65 % for the images from IPG CarMaker, OTA and DDI are shown in Figures 4.13, 4.14 and 4.15 respectively. In scenario 8, the precision of the DDI method increases by 4 % and that of the OTA method by 3.8 %. The recall of DDI decreases by 3.8 % while that of OTA increases by 13.2 %. The F1 score of DDI shows a 0.31 % decrease and that of OTA an 8.6 % increase. The FPPI scores are relatively similar, with an 83.3 % decrease for the DDI method and a 75 % decrease for the OTA method.
The scenarios for object detection were selected based on the image quality metrics in section 4.3.1. Scenario 7 was selected because it showed the lowest overall image quality and scenario 8 because it showed the highest. The precision of the object detection for both the OTA and the DDI method is higher in scenario 8 than in scenario 7: in scenario 7 the precision of OTA increases by 0.26 % and in scenario 8 by 3.8 %, while the precision of DDI decreases by 0.6 % in scenario 7 and increases by 4 % in scenario 8. The higher precision in scenario 8 indicates that the object detection is more accurate, with fewer false positives.
The recall of OTA in scenario 8 is better than in scenario 7: in scenario 7 the recall of OTA is reduced by 6.9 %, whereas in scenario 8 it increases by 13.2 %. For DDI, the recall decreases by 4.7 % in scenario 7 and by 3.8 % in scenario 8, so the recall of DDI is also higher in scenario 8 than in scenario 7. A higher recall means fewer false negatives.
The F1 score of OTA in scenario 8 is higher than in scenario 7: it decreases by 3.8 % in scenario 7 and increases by 8.6 % in scenario 8. For DDI, the F1 score is also higher in scenario 8 than in scenario 7: it decreases by 2.9 % in scenario 7 and by 0.31 % in scenario 8. The F1 score is the harmonic mean of precision and recall, so the precision and recall results for scenarios 7 and 8 are also reflected in the F1 score, which improves in scenario 8 compared to scenario 7.
The FPPI in scenario 8 decreases compared to scenario 7 for both the OTA and the DDI method. For OTA, the FPPI decreases by 50 % in scenario 7 and by 75 % in scenario 8. For DDI, the FPPI increases by 99.6 % in scenario 7 but decreases by 83.3 % in scenario 8. A higher FPPI indicates more false positives per frame; the FPPI therefore improves in scenario 8 for both the OTA and the DDI method.
As explained earlier in this section, scenario 8 exhibits the best overall image quality and scenario 7 the lowest. With regard to object detection performance, scenario 8 shows better results than scenario 7. When comparing the object detection performance of the OTA and DDI methods, OTA shows better results than DDI, except for the FPPI in scenario 8. In several cases, these results are even better than the object detection performance on the CarMaker reference images.
Table 4.1: Normalized values of image quality measurements for object detection
5 0.5522 0.0000 0.3525 0.6552 0.2142 0.4914 0.5280 0.6007 0.1131 0.1444 0.5052 0.1439 0.3775 0.3393
6 0.4649 0.9748 0.2345 0.8677 0.2498 0.4998 0.4488 0.5755 0.3191 0.7560 0.1371 0.3056 0.3090 0.6632
7 0.5179 0.8641 0.3191 1.0000 0.3263 0.8444 0.4602 0.5705 0.5586 1.0000 0.0385 0.1105 0.3701 0.7316
8 0.3639 0.5793 0.0617 0.3214 0.0257 0.2598 0.4773 0.5558 0.1324 0.0000 0.2301 0.0812 0.2152 0.2996
9 0.4769 0.9004 0.2846 0.8202 0.1894 0.6038 0.5091 0.5991 0.1500 0.2333 1.0000 0.4057 0.4350 0.5938
10 0.4357 0.8526 0.1755 0.8488 0.1080 0.7353 0.4365 0.5469 0.5947 0.7247 0.0803 0.3572 0.3051 0.6776
11 0.4892 0.8764 0.2481 0.8828 0.1451 0.7507 0.4578 0.5422 0.5817 0.7834 0.2607 0.2082 0.3638 0.6740
12 0.3061 0.8767 0.0000 0.8316 0.0000 0.8787 0.2980 0.4025 0.5321 0.6801 0.3241 0.0848 0.2434 0.6257
13 0.3746 1.0000 0.1468 0.9443 0.0851 0.9737 0.3258 0.4308 0.5582 0.6984 0.6445 0.0622 0.3558 0.6849
14 0.3469 0.9568 0.0660 0.9430 0.0897 1.0000 0.3754 0.4911 0.3787 0.4496 0.1227 0.1142 0.2299 0.6591
Chapter 5
Conclusion and Future Scope
This chapter discusses the accomplished objectives and future work. Section 5.1 summarizes the completed tasks, followed by section 5.2, which describes recommendations on how the functionality, performance and robustness of the work presented in this thesis can be improved further.
5.1 Conclusion
As defined in section 1.2, the goal of the thesis is to evaluate the two methods of camera data injection, over-the-air (OTA) and direct data injection (DDI), used for testing camera-based algorithms, and furthermore to evaluate how effective these two injection methods are in testing an object detection algorithm. The conclusions on these two questions are discussed in detail in the following sections.
The list of accomplished objectives is as follows:
1. A pipeline for injecting real-time camera sensor data into the ECU was created.
2. Using ROS, a tool-chain in ECU to receive, read and save the camera sensor data being injected
was created.
3. 14 different test case scenarios were created using IPG CarMaker.
4. Comparison and evaluation of images from OTA and DDI methods were performed.
5. An object detection algorithm was tested using the OTA and DDI methods.
5.2 Future Scope
The following improvements are recommended for future work:
1. Maintaining the same resolution in both the OTA and DDI methods, mainly with regard to the
LCD monitor, the VIB and the automotive camera.
2. Reducing the distortion caused by the lens inside the camera box and improving its focus.
3. Implementing a lens distortion similar to that of the automotive camera in the video
interface box (VIB) device.
4. Eliminating the need for image sorting when evaluating the same images in the OTA and DDI
methods.
5. Increasing the frame rate of the OTA method and performing object detection in the loop.
6. Maintaining the same ground-truth based on the CarMaker data to evaluate object detection
performance.
7. Creating a feedback loop from ECU to the host PC to transfer real-time object detection data,
thereby utilizing the object detection data in host PC to implement driver assistance systems.
8. Integrating and simulating other sensors such as radar and LIDAR for the same scenario used
for the camera sensor.
Appendix A
VIB Instructions
This appendix provides instructions to set up the VIB, to run a CarMaker (8.0) scenario from
a Windows Host PC and to read camera data on the Nvidia Drive PX2 using DriveWorks.
Appendix B
VIB Configuration File
Views.Fullscreen = 1
Views.Amount = 1
Views.MonitorId = 2
# Camera settings
View.1.Camera.Sensitivity = 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0
View.1.Camera.Position = 2.8 0.0 1.25
View.1.Camera.Direction = 0.0 0.0 0.0
View.1.Camera.Distance = 0.05
View.1.Camera.Lens = direct
View.1.Camera.HFov = 50
View.1.Camera.Output = raw12
Appendix C
Result of Object detection of
Scenario 7
Appendix D
Result of Object detection of
Scenario 8
Bibliography
[13] Florin Cutzu, Riad Hammoud, and Alex Leykin. “Distinguishing paintings from
photographs”. In: Computer Vision and Image Understanding 100.3 (2005), pp. 249–
273.
[14] AA Michelson. Harvey B. Plotnick collection of the history of quantum mechanics
and the theory of relativity. 1927.
[15] The Slanted Edge Method — Strolls with my Dog. https://www.strollswithmydog.com/the-slanted-edge-method/.
[16] Kenichiro Masaoka et al. “Modified slanted-edge method and multidirectional
modulation transfer function estimation”. In: Optics express 22.5 (2014), pp. 6040–
6046.
[17] ROS.org — Powering the world’s robots. https://www.ros.org/. (Accessed on
04/06/2021).
[18] Nvidia Drive - Wikipedia. https://en.wikipedia.org/wiki/Nvidia_Drive.
(Accessed on 04/18/2021).
[19] Tesla Autopilot - Wikipedia. https://en.wikipedia.org/wiki/Tesla_Autopilot.
(Accessed on 04/18/2021).
[20] kinetic - ROS Wiki. http://wiki.ros.org/kinetic.
[21] Marko Bjelonic. YOLO ROS: Real-Time Object Detection for ROS. https://github.com/leggedrobotics/darknet_ros. 2018.
[22] Joseph Redmon and Ali Farhadi. “YOLOv3: An Incremental Improvement”. In:
arXiv (2018).
[23] rosbag/Commandline - ROS Wiki. http://wiki.ros.org/rosbag/Commandline.
[24] ROS Camera calibration. url: http://wiki.ros.org/camera_calibration.
[25] camera_calibration/Tutorials/MonocularCalibration - ROS Wiki. http://wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration.
[26] Zhixiang Wang, Yinqiang Zheng, and Yung-Yu Chuang. “Polarimetric Camera
Calibration Using an LCD Monitor”. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR). June 2019.
[27] Dr. Neal Krawetz. Kind of Like That - The Hacker Factor Blog. http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html.
[28] Structural similarity (SSIM) index for measuring image quality - MATLAB ssim - MathWorks Deutschland. https://de.mathworks.com/help/images/ref/ssim.html. (Accessed on 04/11/2021).
[29] Saif Pertuz. Focus Measure - File Exchange - MATLAB Central. https://de.mathworks.com/matlabcentral/fileexchange/27314-focus-measure. 2021.
[30] John Canny. “A computational approach to edge detection”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1986), pp. 679–698.
[31] Florin Cutzu, Riad Hammoud, and Alex Leykin. “Estimating the photorealism of images: Distinguishing paintings from photographs”. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. Vol. 2. IEEE. 2003, pp. II–305.
[32] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. “YOLOv4: Optimal speed and accuracy of object detection”. In: arXiv preprint arXiv:2004.10934 (2020).
[33] AlexeyAB/darknet: YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet). https://github.com/AlexeyAB/darknet. (Accessed on 04/18/2021).