Journal Pone 0296879
RESEARCH ARTICLE
the CAN captures. Our contributions aim to facilitate appropriate benchmarking and
needed comparability in the CAN IDS research field.
Funding: This manuscript has been authored by UT-Battelle, LLC under Contract No.
DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the
article for publication, acknowledges that the U.S. Government retains a non-exclusive,
paid up, irrevocable, world-wide license to publish or reproduce the published form of the
manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide
public access to these results in accordance with the DOE Public Access Plan
(http://energy.gov/downloads/doe-public-access-plan). This research was sponsored in part
by Oak Ridge National Laboratory’s (ORNL’s) Laboratory Directed Research and Development
program and by the DOE. There was no additional external funding received for this study.
The funders had no role in study design, data collection and analysis, decision to publish,
or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Modern vehicles are increasingly drive-by-wire, relying on continual communication of small
computers called electronic control units (ECUs). Nearly ubiquitous in modern vehicles,
Controller Area Networks (CANs) facilitate the data exchange among ECUs by providing a
common network with a standard protocol. While lightweight and reliable, the CAN standard has
well-known security flaws, lacking authentication, encryption, and other important security
features. Furthermore, attack vectors to intra-vehicle CANs are growing in scope as vehicles
are increasingly offering channels of connectivity. While exploitation of the CAN bus in
previous works is often implemented directly, e.g., by mandatory on-board diagnostics II (OBD-II)
ports [7, 8], successful attacks to vehicle CANs also can occur indirectly/remotely through a
variety of vehicle interfaces, such as wireless communication channels [9, 10].
Consequently, CAN security and vulnerability research has accelerated, with most literature
focused on proving how “hackable” vehicles are [7–14], or proposing novel CAN intrusion
detection systems (IDSs) [5, 15–17]. CAN IDS research has grown rapidly, suffering from an
inability to reproduce or replicate and compare methods. As a result, proposed detection
techniques are often not tested on appropriate data due to lack of availability. For example,
Hossain et al. [18] simulate attacks in real CAN data by adding frames in order to validate an
LSTM-based CAN IDS. In theory, their IDS may detect much more subtle attacks, e.g., masquerade
attacks (see Sec. 2.2), but without available data, this could not be, and was not, tested.
Further, using simulated data or attacks limits fidelity compared to real vehicular CAN data
with real, physically verified attacks.
To address this problem, we introduce the Real ORNL Automotive Dynamometer (ROAD)
dataset, a novel CAN IDS dataset comprised of real automotive CAN data. ROAD contains
CAN data from the vehicle during ambient driving including a wide variety of driver activities.
The dataset has labeled attacks ranging from easy to difficult to detect. More specifically,
ROAD includes multiple variations of real (i.e., non-simulated) fuzzing, fabrication, unique
advanced attacks, and simulated masquerade attacks. The goal is to allow appropriate testing
and comparable benchmarking of CAN IDS.
To this end, we also provide a thorough guide to all publicly available CAN IDS datasets to
aid researchers in selecting the most appropriate dataset for testing their method. Our
survey covers the publicly available datasets suitable for CAN IDS testing. For each dataset,
we provide its reference details, in particular, data characteristics (real/synthetic, raw
CAN and/or signal-translated data, number of vehicles, total time) and attack characteristics
(which types of fabrication, suspension, masquerade, or other attacks are present, whether
they are real or simulated, and whether the attacks are identifiable with simple timing-based
methods). See Tables 1–3. We also include a discussion of the uses of the datasets, and
findings from our in-depth data analytics on each dataset.
The remainder of this paper is organized into the following sections. The introduction
provides the reader with an overview of the state of CAN IDS research. Here, we illustrate two
major roadblocks prohibiting this research from advancing and focus mainly on highlighting
the dearth of quality data. We point out the consequences that scarcity of data is having on
the community and map them directly to this paper’s contributions. Section 2 provides
necessary background on the CAN protocol and vehicle attack terminology. In particular, we
partition attacks into four categories: fabrication attacks (which add frames to the bus,
e.g., DoS and fuzzing), suspension attacks (which prevent frames from being sent, thereby
removing them from the bus), masquerade attacks (which replace legitimate frames with
malicious frames), and other. Specific definitions of the types of these attacks are
discussed. Section 3 comprises the survey, analysis, and discussion of all previous CAN attack
datasets. Section 4 introduces our new CAN attack dataset, and Section 5 concludes this work.
All datasets include ambient data for the given vehicle(s) in addition to attack data.
https://doi.org/10.1371/journal.pone.0296879.t001
Total time of ROAD CAN data is 3h 27m 10s. This does not include the signal translations of the same data. ROAD includes 13 masquerade attack files, which are
identical to the corresponding targeted ID captures but with the ambient frames of the target ID removed during the attack period to simulate the masquerade. ROAD
total CAN data time without the 13 masquerade attacks is 3h 16m 13s.
https://doi.org/10.1371/journal.pone.0296879.t002
https://doi.org/10.1371/journal.pone.0296879.t003
Fig 1. Papers published in peer-reviewed journals: a) yearly trend of CAN IDS research and b) frequency
of each CAN IDS category: 1) frequency/timing-based, 2) payload-based, 3) signal-based, 4) physical side-channel,
and 5) other.
https://doi.org/10.1371/journal.pone.0296879.g001
While the number of publications in the CAN security domain, especially in IDS research,
has grown appreciably in the past few years, IDS research is significantly hindered by two
major issues: (1) proprietary CAN signals (not the focus of this work; this issue has been
approached by CAN reverse engineering frameworks) and (2) lack of high-quality, publicly available, real
CAN data with advanced attacks present (the focus of this work). We detail work on CAN
reverse engineering in Section 1.1.1.
1.1.1 CAN reverse engineering problem. The asymmetric growth in the field—in partic-
ular the disproportionate number of publications on methods that are timing/frequency-based
(and to a lesser extent payload-based) as compared to signal-based—is a direct result of the
proprietary CAN signal encodings issue. Original equipment manufacturers (OEMs, e.g., Sub-
aru, Ford) of passenger vehicles hold secret their proprietary encodings of signals in the CAN
data fields and vary the encodings across models. Consequently, though researchers can easily
add a node to monitor and send CAN messages on most vehicles, the data is not understand-
able. Thus, most have focused on methods that do not require knowledge of signal encodings.
Indeed, a few researchers have paired with OEMs or done some manual reverse engineering to
obtain and develop IDSs based on the decoded CAN signals, though these developments are
not vehicle-agnostic.
Although the proprietary CAN signal problem is not addressed in this paper, it is necessary
context for the issue of availability of real automotive CAN data with advanced attacks present.
Partially reverse engineered signal mappings are emerging online, and a small subfield has
emerged with the goal of automating the reverse engineering of signals from automotive CAN
data [53, 55–58, 61]. Verma et al. [61] provide a comprehensive treatment of the problem as
well as a survey of the previous work. In particular, the signal-reverse engineering problem can
be broken into four sub-problems: (1) tokenization—identifying the signal boundaries within
the data field (e.g., in a 64-bit data field, bits 1-8 may encode wheel speed, bit 9 a binary indica-
tor of cruise control, bit 10 unused, bits 12-16 a temperature signal, . . .); (2) endianness—for
signals crossing a byte boundary the order of the bytes is needed; (3) integer-to-binary encod-
ing—usually base 2 or 2’s complement encoding is used; (4) interpretation—an affine mapping
of each signal to achieve the correct units and labeling the signal with what it communicates
(e.g., speed in mph). Table 4 itemizes previous CAN reverse engineering works and the
portions of the four sub-problems to which they contribute. For a comprehensive survey on CAN
reverse engineering, we refer the reader to Buscemi et al. [62].
Overall, signal reverse engineering can facilitate CAN IDS research and, more generally, may
enable a wide variety of downstream automotive technologies, e.g., the OpenPilot aftermarket
driver assistance technology (https://comma.ai/).
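To make the four sub-problems concrete, the following minimal Python sketch decodes one signal from a data field once its boundaries, byte order, signedness, and affine map are known. The function name `decode_signal`, its parameters, the 0-based bit numbering (bit 0 is the most significant bit of the first byte), and the restriction of byte swapping to byte-aligned signals are all illustrative assumptions, not the conventions of any particular vehicle or of the works cited above.

```python
def decode_signal(data: bytes, start_bit: int, length: int,
                  little_endian: bool = False, signed: bool = False,
                  scale: float = 1.0, offset: float = 0.0) -> float:
    """Decode one signal from a CAN data field (hypothetical helper)."""
    # (1) Tokenization: cut the signal's bits out of the data field.
    bits = "".join(f"{byte:08b}" for byte in data)
    field = bits[start_bit:start_bit + length]
    if len(field) != length:
        raise ValueError("signal extends past the end of the data field")
    # (2) Endianness: for multi-byte signals, optionally reverse byte order.
    if little_endian:
        if length % 8 != 0:
            raise ValueError("this sketch only byte-swaps byte-aligned signals")
        chunks = [field[i:i + 8] for i in range(0, length, 8)]
        field = "".join(reversed(chunks))
    # (3) Integer encoding: unsigned base 2 or two's complement.
    raw = int(field, 2)
    if signed and field[0] == "1":
        raw -= 1 << length
    # (4) Interpretation: affine map from raw integer to physical units.
    return scale * raw + offset


payload = bytes([0x12, 0x34, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00])
speed = decode_signal(payload, start_bit=0, length=16, scale=0.5, offset=10.0)
temp = decode_signal(payload, start_bit=16, length=8, signed=True)
```

The hard part of the reverse engineering problem is recovering these parameters from data alone; the sketch only shows how they are applied once known.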
data with targeted attacks may be viewed unfavorably by OEMs, resulting in lawsuits if not
handled responsibly. In short, developing subtle attacks can be prohibitively expensive as
researchers must have a dedicated modern vehicle for study, appropriate facilities for safety,
access to deep offensive security expertise, and potentially legal support.
To our knowledge, there are currently six publicly available vehicle CAN datasets with
labeled attacks (see Table 1). Likely due to the inherent difficulties in producing real CAN
attack data described above, in these datasets, the only real attacks present in real data are
fabrication attacks (by message injection); all other attack captures are either real data with
simulated attacks or are entirely composed of synthetic data. All have significant limitations when
supporting CAN IDS development. Fabrication attacks are generally simple to detect with tim-
ing-based methods and are thus limited in scope. Due to the complex dynamics of the broad-
cast CAN protocol, the simulated CAN attacks ignore aberrations in message timing, content,
and presence that naturally occur, and therefore change data quality in unknown ways. Fur-
ther, physical verification of the effect of the simulated attack on the vehicle is not possible.
Succinctly, there is no publicly available, real CAN data with labeled attacks that is of sufficient
quality to permit assessment of many CAN IDS methods.
1.3 Consequences
A survey by Loukas et al. [49] classifies 17 automotive CAN IDS papers by evaluation
method: “analytical” (theoretical only, no evaluation on data), “simulation” (evaluated on
synthetic CAN data or real CAN data with simulated attacks), and “experimental” (evaluated
on real CAN data with real attacks). The distribution of the surveyed papers is:
analytical (3), simulation (8), experimental (6).
We believe this number and percentage of IDS evaluations with real CAN data is too small.
A second consequence is that CAN IDS works are not comparable, or at least not compared.
Rajbahadur et al. [50] survey an even larger set of papers (with a much wider scope
of “Anomaly Detection for Connected Vehicle Cybersecurity”). The investigators found that:
Much of the research is performed on simulated data (37 out of the 65 surveyed papers) . . .
much of the research does not evaluate the newly proposed techniques against a baseline
(only 4 out of the 65 surveyed papers do so), which may lead to results that are difficult to
quantify.
This also reinforces the findings of Loukas et al. regarding synthetic data/simulated attacks.
It appears CAN IDS contributions come from researchers with a wide variety of backgrounds.
While this milieu provides a diverse set of approaches, the area suffers by lacking a uniform
body of knowledge, and the lack of depth seems to inhibit the steady development of ideas and
systematic, quantifiable progress. To again quote Rajbahadur et al.,
The varied use and scattered publication of anomaly detection [for connected vehicle cyber-
security] research has given rise to a sprawling literature with many gaps and concerns . . .
we urge researchers to address these identified shortcomings.
To the best our knowledge, there is no standard data set for comparing methods. We try to
close this gap by evaluating our model on both real and synthetic data, and we make the
synthetic data publicly available. We hope that this simplifies the work of future researchers
to compare their work with a baseline.
Finally, we find that IDSs are often evaluated against inappropriate test data. For example,
IDSs promising detection of advanced, subtle attacks are tested only on CAN data with excep-
tionally noisy attacks, or works use attacks that disrupt timing, then ignore timing in evalua-
tion to test payload-based detection. In order to not disparage other IDS works, we cite our
own insufficient evaluation of our proposed CAN IDSs as examples [63, 64]. The consequence
is that many promising IDS methods, which are excessive for the easily detectable attacks in
currently available data, are never truly evaluated on the more advanced attacks they target.
We add to these cries for a more systematic and rigorous progression of CAN IDS research.
1.4 Contributions
To address the problem at hand, we provide a comprehensive guide to the publicly available
CAN datasets that contain labeled attacks. Our treatment includes datasets with simulated
attacks and code bases for simulating attacks in post-processing of ambient data. We itemize
these datasets, their download links, citations, metadata (real or synthetic, types, duration),
and attack metadata (fabrication, suspension, masquerade, other) clearly in Tables 1–3 for ease
of comparison and reference. As most of the public datasets are not accompanied by detailed
descriptions or publications, we performed quality analysis investigations on both the data
and documentation of each previously released CAN dataset. In this work, we provide a
detailed description of the data, a discussion to illuminate the benefits and drawbacks of each
dataset, and recommendations for appropriate use of each dataset when developing a CAN
IDS.
Next, we leverage the vehicle and dynamometer resources of Oak Ridge National Labora-
tory (ORNL) to produce and document the ROAD dataset, collected from a passenger vehicle
with a variety of real and simulated CAN attacks. The dataset provides ample (3 hours) train-
ing data with no attacks collected when driving on actual roads (not dynamometer). This data-
set provides the following real attacks: a fuzzing fabrication attack, many targeted fabrication
attacks that are maximally stealthy (manipulating only the necessary portions of the data field
and sending a single manipulated message per ambient message of the same ID), and two
advanced attacks that include no fabricated (injected) messages. Each attack was physically
verified, that is, we observed the effect it has on the vehicle. For each targeted injection attack,
we also include an augmented CAN capture by deleting the targeted ambient message to simu-
late a masquerade attack.
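The augmentation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used to build ROAD: the `Frame` record, its `injected` flag, and the function `simulate_masquerade` are hypothetical names, and the sketch assumes the capture already distinguishes injected frames from ambient ones.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    timestamp: float   # seconds since start of capture
    arb_id: int        # arbitration ID
    data: bytes        # data field, up to 8 bytes
    injected: bool     # hypothetical label: True if sent by the attacker


def simulate_masquerade(frames: List[Frame], target_id: int,
                        attack_start: float, attack_end: float) -> List[Frame]:
    """Drop ambient frames of the target ID during the attack period.

    The injected frames remain, so the target ID appears to be sent
    only by the attacker, which approximates a masquerade attack.
    """
    return [
        f for f in frames
        if not (f.arb_id == target_id
                and not f.injected
                and attack_start <= f.timestamp <= attack_end)
    ]


capture = [
    Frame(0.000, 0x102, b"\x00", False),
    Frame(0.001, 0x102, b"\xff", True),
    Frame(0.010, 0x45D, b"\x01", False),
    Frame(0.050, 0x102, b"\x00", False),
    Frame(0.051, 0x102, b"\xff", True),
    Frame(0.100, 0x102, b"\x00", False),
]
masquerade = simulate_masquerade(capture, 0x102, attack_start=0.0, attack_end=0.06)
```

Frames of other IDs, and ambient frames of the target ID outside the attack period, are left untouched, so the only simulated element is the suspension of the legitimate sender.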
The ROAD dataset fills gaps in the available CAN IDS datasets. First, the fabrication attacks
isolate a targeted ID and send a single frame just after the ambient frame (see Section 4.1.2).
This is the most stealthy fabrication attack possible and is not present in any of the
fabrication attacks of other datasets. This allows the most thorough testing of detectors of
fabrication attacks (e.g., timing-based or AID-sequence-based detectors). Secondly, by omitting
only the ambient frames of the targeted IDs from these captures, we simulate a masquerade
attack, which is not present in other datasets. Thirdly, we include advanced attacks, which are
unique to ROAD. These latter two attack types allow ROAD to uniquely test payload-based
detectors seeking more sophisticated attacks than fabrication attacks. By design, the ROAD
dataset’s attacks increase in difficulty to detect, a property not present in previous
datasets. Overall, ROAD allows appropriate testing of more advanced detectors and facilitates
head-to-head comparisons of the wide variety of proposed CAN IDS methods. In all, 33 attack
CAN captures are provided. See Table 5.
By using the CAN-D method [61] to convert the raw CAN data to signals, we provide sig-
nal-translated time series alongside many of the CAN captures. Our aim is to provide an open,
realistic, and verified CAN dataset for benchmarking CAN IDSs that take either raw CAN data
or decoded signals’ time series as input. This is perhaps the highest fidelity CAN IDS dataset
currently available in that all data and attacks were captured from a real vehicle and all attacks
are physically verified. It is also the most comprehensive in terms of quantity and diversity of
attacks included. Notably, no other real CAN data has fabrication attacks with the stealth of
ROAD (using only a single injected frame between ambient frames of the targeted ID and
manipulating only the necessary portion of the data field). No other dataset provides both
original and translated signals that correspond to raw CAN data (both ambient and attacks).
Fig 2. CAN data frame: The two primary components are the Arbitration ID used for message identification and
arbitration (prioritizing messages) and the data field, containing up to 8 bytes of message contents.
https://doi.org/10.1371/journal.pone.0296879.g002
The Arbitration ID, or simply ID, is the message header that identifies the frame and is used
for arbitration, the process by which frames are prioritized when multiple ECUs concurrently
transmit—the lower the ID, the higher the priority. The RTR bit is an indicator of a remote
frame. Any ECU can request the data on an ID by sending the ID and the RTR bit indicating
the request. This remote frame would be immediately followed by a response with the
requested ID and data. The Data Field contains the actual message contents of up to 8 bytes,
where each distinct piece of information carried in the message is called a signal. CAN frames
with the same ID encode the same set of signals in the same format and are usually sent with a
fixed frequency to relay updated signal values. In general, each ECU is assigned a set of IDs
that only it transmits. For example, the PCM may transmit: ID 0x102 with data field contain-
ing engine RPM, vehicle speed, and odometer signals every 0.05s, and ID 0x45D with data
field containing signals encoding the angle of the gas and brake pedals every 0.01s.
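As an illustration of why the lowest ID wins arbitration, the following minimal Python sketch models the bus as a wired-AND: each contender transmits its ID most significant bit first, a dominant 0 overrides a recessive 1, and any node that sent a 1 but observes a 0 drops out. The function name `arbitrate` and the 11-bit default are assumptions made for this example, and the sketch assumes the contending IDs are distinct.

```python
from typing import Iterable


def arbitrate(ids: Iterable[int], id_bits: int = 11) -> int:
    """Return the ID that wins bitwise arbitration among distinct IDs."""
    contenders = sorted(set(ids))
    if not contenders:
        raise ValueError("no node is transmitting")
    for bit in range(id_bits - 1, -1, -1):           # most significant bit first
        levels = [(i >> bit) & 1 for i in contenders]
        bus_level = min(levels)                      # dominant 0 overrides recessive 1
        # Nodes whose bit differs from the bus level lose and stop transmitting.
        contenders = [i for i, lvl in zip(contenders, levels) if lvl == bus_level]
    return contenders[0]


winner = arbitrate([0x45D, 0x102, 0x3FF])   # 0x102, the lowest ID, wins
```

This also shows why a flood of frames with ID 0x000 starves all other traffic: no legitimate ID can win arbitration against it.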
The CAN standard also defines a robust error handling mechanism that is designed to pre-
vent erroneous messages from being propagated or faulty nodes from disrupting communica-
tions. For example, if two nodes attempt to concurrently transmit different messages with the
same ID, both nodes will transmit their frame until they send opposing bits simultaneously, at
which point one will incur an error. If a node’s error count gets too high, it will enter a “bus
off” mode, meaning it cannot read or transmit messages on the bus until it is reset. See previ-
ous works [8, 66] for more details on CAN error handling.
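The error counting behind the “bus off” mode can be sketched as follows. This is a simplified model based on the fault confinement rules of the CAN specification as we understand them (transmit error counter raised by 8 on a transmit error, lowered by 1 on a successful transmission, bus off above 255); it ignores the receive error counter and the exceptions to these rules, and the class name `Node` and its methods are hypothetical.

```python
class Node:
    """Simplified transmit-side fault confinement of a CAN node."""

    def __init__(self) -> None:
        self.tec = 0  # transmit error counter

    @property
    def state(self) -> str:
        if self.tec > 255:
            return "bus-off"        # node no longer participates on the bus
        if self.tec > 127:
            return "error-passive"
        return "error-active"

    def on_transmit_error(self) -> None:
        if self.state != "bus-off":
            self.tec += 8

    def on_transmit_success(self) -> None:
        if self.state != "bus-off":
            self.tec = max(0, self.tec - 1)


victim = Node()
for _ in range(32):                 # 32 consecutive transmit errors: 32 * 8 = 256
    victim.on_transmit_error()
state_after_errors = victim.state   # "bus-off"
```

Because errors raise the counter much faster than successes lower it, a run of forced transmit errors can silence a node quickly, which is the mechanism exploited by the bus off attack discussed in Section 2.2.2.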
involves the vehicle ceasing functionality and rolling to a stop. The driver must remove,
then reinsert the key to restart the vehicle.
Fuzzing Attack—(Note that the fuzzing attack here is distinct from the technique of fuzzing
for vulnerability discovery. While indeed random messages are sent, it is not with the intent
to reveal vulnerabilities. Nevertheless, we follow attack terminology of Cho and Shin [8] for
consistency.) Messages with random IDs and arbitrary payloads are injected at a high fre-
quency. The bus becomes occupied with mostly injected messages, displacing real messages,
and resulting in similar behavior to the DoS attack. Unlike the DoS attack, injected mes-
sages may have an ID that appears in normal traffic, so receiver nodes expecting these ID
messages will read and use the information in the malicious payload, causing a wide variety
of unexpected results. To illustrate these effects we reference a video of a fuzzing attack
[68]. There are two slight variations of this attack: some researchers inject only IDs that
appear during normal traffic (e.g., [1]), while others inject arbitrary random IDs (e.g., [3]).
Targeted ID Attack—Messages are injected with a specific target ID and manipulated data
field. When only the bits in a specific signal—that is, a select part of the 64-bit data field—
are modified, we refer to this as targeting a signal, rather than an ID.
Fabrication attacks are characterized by the inherent problem of message confliction,
described by Miller and Valasek [10],
The biggest problem with CAN message injection is that, while attackers can inject arbitrary
messages onto the bus, the original sender of the message (i.e., the legitimate ECU) is still
sending legitimate messages. . .The result of the ECU continuously sending messages along
side our attack messages is message confliction. From the perspective of the receiving ECU,
inconsistent messages are received (and it must) decide what to do with this conflicting
information.
In general, ECUs will regard the last seen data frame on a given ID; thus, to overwrite (in
effect) legitimate messages with the target ID, the injected frames must occur on the bus very
soon after the true frame. Not all data frames trigger an effect; simply reverse engineering a sig-
nal to inform targeted injections will often not result in the desired or any response from the
vehicle. Miller and Valasek [10] provide potential techniques for side-stepping message con-
fliction, but the desired effect for all techniques is the same—de-conflict ambient and fabri-
cated data frames by suspending the ambient messages.
The first two fabrication attacks described (DoS and fuzzing) require almost no under-
standing of or reconnaissance on the target vehicle, nor do they allow for finesse in execution.
On the other hand, the targeted ID attack can be more sophisticated. Targeting manipulation
of specific functionality requires knowledge of at least one of the IDs’ signals and requires the
data field to be designed to have a particular effect based on the given ID’s signal definitions.
Furthermore, targeted ID attacks can, similar to the first two attacks, be accomplished by
flooding the bus, simply meaning that messages are sent at a very high frequency, although this
is blatant and easy to detect. Research hackers Miller and Valasek used this tactic to
successfully attack a Toyota Prius, injecting fabricated collision prevention system messages at a high
frequency, causing the ABS to engage the brakes [7].
The most stealthy targeted ID attack is a flam attack—immediately after each target ID’s
legitimate message, an injected message is sent (with the same ID but manipulated data), so
that the true message state is not physically realized before the spoofed message alters the car
to the target state. Injected frames and true target ID frames are in one-to-one correspondence.
This type of attack was pioneered by Hoppe et al. [13], who essentially disabled a car’s warning
lights by sending a “lights off” frame immediately after any legitimate frame’s “lights on” mes-
sage was sent, resulting in the lights appearing continually off. Provided the attacker can
reverse engineer the target ID’s payload, only the bits involved in the targeted signal need to be
manipulated.
2.2.2 Suspension attacks. An adversary mounting a suspension attack needs a weakly
compromised ECU, preventing it from transmitting some or all messages [67]. For example,
an adversary could suspend all messages on a particular safety-critical ID, thus disrupting
other systems that rely on this constantly updated data. An example from the literature is
the bus off attack of Cho and Shin [8], where error counting is manipulated until an ECU
is disallowed from speaking. After this bus off attack, the ECU will not transmit its messages;
hence, those IDs would be suspended.
2.2.3 Masquerade attacks. Finally, the most sophisticated category, masquerade attacks,
involve an adversary first suspending messages of a specific ID from a weakly compromised
target ECU, and then using a strongly compromised ECU to inject spoofed messages with this
ID at a realistic frequency, thus masquerading as the target ECU [67]. Using this more
advanced strategy, a targeted ID attack can be carried out without message confliction, thus
allowing for a more stealthy attack. Miller and Valasek’s infamous remote Jeep Cherokee hack
[10] employed a masquerade attack. Unlike the Prius [7], the Cherokee ABS system dealt with
message confliction by simply turning off the collision prevention system, and thus they were
unable to mount a similar fabrication attack; instead, they had to first suspend legitimate mes-
sages (in addition to a few other steps) in order to mount an attack.
In another example, Cho and Shin cleverly use a strongly compromised ECU in order to
weakly compromise a target ECU by causing it to go into bus off mode, at which point they
run a masquerade attack [8]. Notably, a very recent paper of Bloom [30] provides stealthier
techniques for exhibiting this attack. Interestingly, if an attacker is not careful when mounting
a fabrication attack, this same mechanism can result in the attacker’s own strongly compro-
mised ECU getting bussed off. In fact, we have done this on many occasions, in essence run-
ning a suspension attack on ourselves!
This previous research has shown that masquerade attacks are indeed possible, but they
require ample CAN hacking expertise. Further, white-hat CAN hackers and CAN intrusion
detection research communities are working independently with seemingly different skill sets
toward a common goal. Thus, no real CAN data in which a masquerade attack is exhibited has
been made publicly available, and the evidence from the CAN IDS research community is that
most defensive researchers likely do not have the skills or resources to create such advanced
attacks.
2.2.4 Timing Transparent vs. Timing Opaque. Unsurprisingly, these attack categories
map to intrusion detection techniques that have matured alongside offensive developments.
Fabrication and suspension attacks are detectable by frequency-based IDSs, which regard the
timing of each ID and/or the sequential nature of IDs. Masquerade attacks require more
sophisticated methods: for example, attempting to identify the sending ECU [15, 67], luring an
added node to reveal itself [1], or inspecting the data field [5, 16, 64].
We define a Timing Transparent (T.T.) attack to be any that is hypothetically detectable
using a frequency-based method: a fabrication attack, detectable by unusually fast message
timing or the appearance of new IDs, or a suspension attack, detectable from unusually slow
message timing or the disappearance of usually present IDs.
An attack that is Timing Opaque (T.O.), on the other hand, is defined as an attack that does
not disrupt normal timing or ID distributions, and thus would not be detected with a fre-
quency-based IDS. Instead, a more sophisticated method, such as a payload-based detector
that uses the data field, or perhaps a side-channel method monitoring the physical layer,
would be needed. In short, developing a comprehensive and robust IDS requires hardening
(and therefore testing) against timing opaque attacks. A masquerade attack is the primary
example, but other attacks that may alter the overall state of the vehicle (e.g., the “accelerator
attack” in the ORNL dataset, Sec. 4.1.3) may also be included in this category.
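The distinction between Timing Transparent and Timing Opaque attacks can be illustrated with a deliberately simple frequency-based check, sketched below in Python. It is not any published IDS: the function names `learn_periods` and `timing_alerts`, the use of the median inter-arrival time per ID, and the 50% tolerance are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

Capture = Iterable[Tuple[float, int]]  # (timestamp in seconds, arbitration ID)


def learn_periods(ambient: Capture) -> Dict[int, float]:
    """Estimate each ID's typical inter-arrival time from attack-free data."""
    last: Dict[int, float] = {}
    gaps: Dict[int, List[float]] = defaultdict(list)
    for t, arb_id in ambient:
        if arb_id in last:
            gaps[arb_id].append(t - last[arb_id])
        last[arb_id] = t
    return {arb_id: median(g) for arb_id, g in gaps.items()}


def timing_alerts(capture: Capture, periods: Dict[int, float],
                  tol: float = 0.5) -> List[Tuple[float, int, str]]:
    """Flag unseen IDs and inter-arrival times far from the learned period."""
    alerts = []
    last: Dict[int, float] = {}
    for t, arb_id in capture:
        if arb_id not in periods:
            alerts.append((t, arb_id, "new-id"))
        elif arb_id in last:
            gap = t - last[arb_id]
            if gap < periods[arb_id] * (1 - tol):
                alerts.append((t, arb_id, "too-fast"))   # fabrication-like
            elif gap > periods[arb_id] * (1 + tol):
                alerts.append((t, arb_id, "too-slow"))   # suspension-like
        last[arb_id] = t
    return alerts


ambient = [(k * 0.05, 0x102) for k in range(20)]
periods = learn_periods(ambient)
# Fabrication: one injected frame 1 ms after each ambient frame.
fabrication = [(k * 0.05 + d, 0x102) for k in range(5) for d in (0.0, 0.001)]
# Masquerade: only the attacker sends the ID, at the normal rate.
masquerade = [(k * 0.05, 0x102) for k in range(5)]
fabrication_alerts = timing_alerts(fabrication, periods)
masquerade_alerts = timing_alerts(masquerade, periods)
```

In this toy setting the fabrication capture raises "too-fast" alerts while the masquerade capture raises none, which is the sense in which a masquerade attack is timing opaque and calls for payload-based or side-channel detection.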
3 Previous datasets
We itemize the publicly available CAN datasets with attacks, including descriptions of the
attacks present, whether they are real or simulated, and the dataset benefits and drawbacks.
We examined each dataset and attempted to verify the accuracy of documentation (refer to
Table 1).
...
...
...
...
...
Benefits— The fuzzing attack provided in this dataset is the slightly stealthier version that
involves only spoofing IDs that appear in normal traffic. This is the only example of this
kind of fuzzing attack in an open dataset. This is also the only open dataset with remote
frames and responses.
Drawbacks— First and foremost, the injected messages are not labeled, and the documenta-
tion on the injection intervals is unclear and possibly incorrect. Authors indicate that in the
DoS attack all 0x00 messages are injections (these take place during the entire capture),
and the fuzzing and impersonation attacks start after *250s. However, our analysis indi-
cates that these attacks take place during the entire capture. Furthermore, this disagrees
with their paper, which depicts an injected message during the fuzzing attack at 0.1565s,
well before 250s. Second, as explained above, the “impersonation attack,” while character-
ized as masquerade attack in their paper, does not seem to be a true masquerade attack
since the message transmission by the legitimate node is not suspended. While it is possible
that we misunderstood their documentation, our confusion on the matter and the various
discrepancies we found are a testament to poor documentation. Finally, the presence of
remote frame requests and responses results in small timing changes that are not usually in
ambient traffic, and may be problematic for testing and training a timing-based detector.
Overall, unless it is used for leveraging remote frames for an IDS, this dataset is not
recommended.
Fig 3. The time gaps between subsequent messages (all messages on the CAN capture of any ID are included) are
plotted over time during fuzzing attacks on four different vehicles, with the top three plots each from a vehicle in the
HCRL Survival Analysis Dataset, and the bottom plot from the ORNL dataset. While the injections (in red) result in
a significant disruption in the overall message timings in the HCRL dataset, the fuzzing attack in the ORNL dataset
does not, and would therefore be slightly more difficult to detect using a timing-based IDS. This also illustrates that the
bus load and overall message frequency distribution varies widely across vehicles.
https://doi.org/10.1371/journal.pone.0296879.g003
Finally, the ambient data and attack data are in differently formatted CSVs, which is
undesirable.
Fig 4. HCRL Car Hacking dataset contains unintentional artifacts of data collection; in particular, in each of the
four attack datasets, right after conclusion of the attack, there is a prolonged period during which no messages
appear on the bus. This depicts the end of the DoS dataset, starting from the last four injection intervals (red),
followed by ~53s of ambient traffic (blue), and a ~22s transmission gap before ambient messages resume again. Note
that the first point after messages resume (with a Δt ≈ 22.4s) has been omitted for scale. We hypothesize that this gap is
due to the CAN bus going into a “stand-by” mode due to inactivity, that is, the vehicle is not being operated and no
messages are being injected.
https://doi.org/10.1371/journal.pone.0296879.g004
source of the issue, namely that the car was not being driven during the attacks (which we
verified by decoding the proprietary signals), poses a problem that cannot be solved in
post-processing. As the car is being driven in the ambient data, the test data fundamentally
differs from the training data outside of the injections, making it an unsuitable test set.
Given these issues, this dataset does not seem like a good choice, even for testing a simple
detector. Similar to the HCRL survival dataset, these attacks are particularly unstealthy with
respect to disrupting overall bus timing (see Fig 4). Finally, ambient and attack data are in
different formats (fixed width format and CSV).
compromised ECU, respectively) and a joystick programmed to replicate the throttle that
sends messages used by the speedometer in the instrument cluster [6]. No original use infor-
mation is provided.
Data—Real data from two cars and synthetic data from a testbed.
Attacks—Simulated except for one targeted ID fabrication attack on the CAN testbed. They
simulate a set of attacks on each CAN, and for all but one attack, simply augmented the
recorded data in post-processing by doing the following: for fabrication attacks, they
“added packets manually and adjusted timestamps accordingly”; for suspension attacks,
they deleted particular frames; and for masquerade attacks, they replaced the data field of
particular frames. The dataset also includes one real attack on their CAN testbed, a targeted
ID fabrication attack. The joystick is used to send messages during normal traffic, and dur-
ing the attack, messages with this ID are injected by the compromised ECU.
Benefits—This dataset includes the only diagnostic protocol attack publicly available, and the
only suspension attack (simulated) in real CAN data. (Recall that SynCAN contains suspen-
sion attacks but is signal data from a simulated CAN.) The same set of attacks is available
for testing on multiple vehicles/CANs.
Drawbacks—First and foremost, adjusting timestamps in post-processing alters data and
diminishes fidelity in a critical way: message timing on the bus is dependent on each ID’s
frequency and priority through the arbitration process. Thus, changing message timings
risks creation of a synthetic dataset that is not realistic. For the DoS attack, they simply
overwrite 10s worth of frames, which is much less noisy than how real DoS attacks appear.
With respect to the testbed, it is unclear how they generated ambient traffic (e.g., were they
recorded and replayed from another car?), and such a testbed is an imperfect proxy for a
real vehicle. Finally, attack labels are in an unstructured text file, so there is no way of
programmatically reading what/when packets were injected.
Benefits—While these authors do not provide any CAN data with attacks, the authors provide
their framework for simulating a wide variety of masquerade attacks; this facilitates the cre-
ation of unlimited masquerade attacks—for example, combining different payload manipu-
lation techniques simultaneously on different IDs, in any CAN data. As this software is
open source, it could be extended to add new attacks. This is the only dataset with real CAN
data that allows for injections that vary continuously over time and is the only dataset
(other than our new dataset, ROAD) that allows for modifying only part of a data field;
thus, it enables attacks targeting signals (select bits of a payload). Other than ROAD, this is
the only dataset furnished with descriptions of the driver’s actions during ambient captures,
which is highly valuable when training and testing an IDS.
Drawbacks—As attacks are added in post-processing, there is no guarantee that these attacks
would actually affect vehicle function. Moreover, these attacks are completely blind to the
function and signal mapping of particular target IDs. There are also logical problems with
Can-Log-Infector’s implementation, most notably that the data field can only be modified
at the byte level (i.e. replacing characters in hex representation) and all selected bytes must
change uniformly. Thus, signals that do not exactly fill a set of bytes cannot be solely tar-
geted (recall that payloads are composed of several signals of varying lengths and positions
whose bits often cross byte boundaries). Furthermore, incrementing whole bytes means
multi-byte signals will vary in a highly discontinuous manner. Finally, the method for speci-
fying the injection interval is rather irksome—rather than specifying a starting timestamp,
the user passes “a value between 0 and 1 (indicating) the ratio when the attack should start
regarding the full length of the capture,” and the attack end point cannot be specified.
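The byte-boundary limitation can be made concrete with a minimal sketch. The payload, the 12-bit signal position, and the helper name set_signal_bits below are hypothetical and chosen only for illustration; the bit numbering (bit 0 is the most significant bit of byte 0) is also an assumption, since real signal encodings vary by vehicle. The sketch contrasts a bit-level overwrite with a whole-byte overwrite of the same region.

```python
def set_signal_bits(payload: bytes, start_bit: int, length: int, value: int) -> bytes:
    """Overwrite `length` bits starting at `start_bit` (assumed: bit 0 = MSB of byte 0)."""
    if not 0 <= value < (1 << length):
        raise ValueError("value does not fit in the signal")
    total_bits = len(payload) * 8
    if start_bit < 0 or start_bit + length > total_bits:
        raise ValueError("signal lies outside the payload")
    as_int = int.from_bytes(payload, "big")
    shift = total_bits - start_bit - length      # distance of the signal's LSB from the payload's LSB
    mask = ((1 << length) - 1) << shift
    as_int = (as_int & ~mask) | (value << shift)  # clear the signal bits, then write the new value
    return as_int.to_bytes(len(payload), "big")


payload = bytes.fromhex("0123456789ABCDEF")

# Hypothetical 12-bit signal occupying bits 4..15 (it starts in the middle of byte 0).
bit_level = set_signal_bits(payload, start_bit=4, length=12, value=0xFFF)

# A byte-level tool must overwrite bytes 0 and 1 entirely, which also changes
# bits 0..3, i.e., whatever neighboring signal shares byte 0.
byte_level = bytes.fromhex("FFFF") + payload[2:]

print(bit_level.hex(), byte_level.hex())
```

In this sketch the bit-level edit leaves the upper nibble of byte 0 untouched, whereas the byte-level edit overwrites it, which is the sense in which such a signal cannot be solely targeted.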
Fig 5. Depiction of six of the targeted ID and masquerade attacks in the ORNL dataset.
https://doi.org/10.1371/journal.pone.0296879.g005
characteristics of the data, with visualizations of three different features (timing, payload, sig-
nals) of six attack captures in our dataset. While ROAD’s fuzzing, fabrication, and suspension
attacks provide T.T. attacks of increasing stealth, the Accelerator Attack—a result of a newly
discovered and disclosed vulnerability—and many simulated masquerade attacks are T.O. For
developing detectors that use payloads, the Correlated Signal, Max Speedometer, and Reverse
Light attacks all entail discontinuities or broken correlations in CAN signals.
Using the fabrication captures we produce simulated masquerade attacks by removing the
legitimate target ID frames preceding each injected frame to provide more advanced versions.
In effect, this removes message confliction in the data, making it appear as though only the
spoofed messages are present during the injection interval. With this masquerade dataset, fre-
quency-based approaches will almost certainly fail to provide accurate detection. It is impor-
tant to note that while the masquerade aspect is simulated through post-processing, this means
of alteration avoids problematic issues with synthetic data. Namely, the effect of the attack on
the vehicle was physically verified; every message appearing in the data was actually seen by
the car in the order it appears in the data; and no aspect of CAN protocol was violated. As dis-
cussed in the introduction, there are no publicly available, real CAN data captures with real
masquerade attacks, and the hacking skill required to implement such an attack on a real vehi-
cle seems to be preventing CAN IDS researchers from implementing such an attack. This pro-
vides the highest fidelity alternative possible.
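The post-processing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the dataset: the Frame record, the per-frame injected flag, and the function name simulate_masquerade are hypothetical, and "preceding" is interpreted here as the most recent legitimate target ID frame before each injected frame.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    time: float      # elapsed seconds
    can_id: int
    data: str        # payload as a hex string
    injected: bool   # hypothetical flag; assumed known for this sketch


def simulate_masquerade(frames: List[Frame], target_id: int) -> List[Frame]:
    """Drop the legitimate target-ID frame that most recently precedes each injected frame."""
    drop = set()
    last_legit = None  # index of the latest legitimate target-ID frame not yet dropped
    for i, frame in enumerate(frames):
        if frame.can_id != target_id:
            continue
        if frame.injected:
            if last_legit is not None:
                drop.add(last_legit)
                last_legit = None
        else:
            last_legit = i
    # Remaining frames keep their original timestamps and order.
    return [f for i, f in enumerate(frames) if i not in drop]


capture = [
    Frame(0.0000, 0x10, "00", False),  # before the attack: kept
    Frame(0.0100, 0x20, "AA", False),
    Frame(0.1000, 0x10, "00", False),  # legitimate, followed by an injection: removed
    Frame(0.1001, 0x10, "FF", True),
    Frame(0.1100, 0x20, "AA", False),
    Frame(0.2000, 0x10, "00", False),  # legitimate, followed by an injection: removed
    Frame(0.2001, 0x10, "FF", True),
]
masquerade = simulate_masquerade(capture, target_id=0x10)
```

Because frames are only removed, never added or retimed, every remaining message keeps the timestamp and order with which it appeared on the bus.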
Considering the top two rows of Fig 5 we can clearly see the difference between the fabrica-
tion attack and corresponding simulated masquerade attack. In the top row (fabrication
attack), all four wheelspeed signals appear to move continuously but are joined by a second
curve—the injected value of that signal—during the attack interval; whereas, in the second row
(masquerade attack) continuity of the signal values is broken and only the injected values
appear during the attack interval.
4.1.3 Accelerator attacks. We have responsibly disclosed this vulnerability to the OEM,
and will not disclose details of how to implement this attack. We do not include the CAN data
during the exploit. After the exploit, the vehicle is in a state in which the driver has less control,
as follows: when put into Drive gear, the vehicle accelerates to a fixed speed
and then holds this speed (regardless of accelerator pedal position or cruise control setting);
when in reverse, the vehicle accelerates to a (different) fixed speed and holds this speed
(regardless of accelerator pedal position or cruise control setting); cruise control is disabled;
touching the brake pedal results in the acceleration ceasing and the brakes engaging normally;
when the brake is released, the vehicle commences accelerating as described above. The Accel-
erator Attack captures have no injected messages, but simply record the CAN data when the
vehicle is in this state. Discrepancies exist between the vehicle’s actions and the driver’s inputs,
e.g., acceleration occurs regardless of the accelerator pedal position.
4.2 Obfuscation
While other public CAN datasets provide information on the make, model, and year of the
vehicles attacked, it would be irresponsible, given our previous disclosure, to release such
information. Furthermore, we have taken steps to obfuscate the CAN data in such a way as to
preserve the characteristics necessary for CAN IDS development, while ideally preventing
users from knowing the make, model, and year of the vehicle. Below we itemize the augmenta-
tions performed on the data to preserve anonymity:
• Absolute timestamps are shifted uniformly by a scalar.
• Arbitration IDs that were constant, aperiodic, or periodic with frequency under 0.1 Hz (less
than one frame per ten seconds) were replaced with the “filler message”
FFF#0000000000000000 (ID#Data in hex) and same relative timestamp.
• Messages on reserved IDs (greater than 0x700: e.g., diagnostic messages) have been
removed.
• IDs have been anonymized in such a way that arbitration order/priority is not preserved.
There is a one-to-one mapping between the original and the anonymized IDs for a given
vehicle (not including the “filler messages” under ID 0xFFF). For example, if ID 0x10 is
converted to ID 0x821 in an anonymized log, the same is true for all logs.
• Data fields have been scrambled in such a way that signals have been preserved, and fields
are scrambled in a consistent way for each ID; e.g., if the first byte is moved to the end of the
field for ID 0x10, it will be shifted this way in all messages from ID 0x10.
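To illustrate the consistency properties of the last two items, the following minimal sketch applies a fixed one-to-one ID mapping and a fixed per-ID byte permutation. It is not the obfuscation procedure used for ROAD, which is not specified here; the function names, the seed, the anonymized ID range, and the choice of whole-byte permutation (taken from the "first byte moved to the end" example) are all assumptions for illustration.

```python
import random


def build_id_map(original_ids, seed=0):
    """One-to-one map from original IDs to randomly drawn IDs (priority order is not preserved)."""
    rng = random.Random(seed)
    ids = sorted(set(original_ids))
    # Draw distinct anonymized IDs; range(0xFFF) excludes 0xFFF, reserved for filler messages.
    new_ids = rng.sample(range(0xFFF), len(ids))
    return dict(zip(ids, new_ids))


def build_byte_perms(original_ids, seed=0):
    """One fixed permutation of the 8 byte positions per original ID."""
    rng = random.Random(seed)
    perms = {}
    for can_id in sorted(set(original_ids)):
        perm = list(range(8))
        rng.shuffle(perm)
        perms[can_id] = perm
    return perms


def obfuscate(can_id, data, id_map, perms):
    """Apply the same ID substitution and byte reordering to every frame of a given ID."""
    scrambled = bytes(data[j] for j in perms[can_id])
    return id_map[can_id], scrambled


ids = [0x10, 0x20, 0x30]
id_map = build_id_map(ids)
perms = build_byte_perms(ids)
first = obfuscate(0x10, bytes(range(8)), id_map, perms)
second = obfuscate(0x10, bytes(range(8)), id_map, perms)
```

Reusing the same id_map and perms across all logs of a vehicle is what makes the transformation consistent between captures. Note that a pure byte permutation would split signals that cross byte boundaries, so a scheme that preserves signals must be more careful than this sketch.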
ROAD allows researchers to compare and contrast different techniques in the CAN IDS
realm for benchmarking, as seen in [23, 70]. Researchers in [70] evaluate a
variety of anomaly-based CAN IDS methods against the ROAD dataset. The authors mention
that the ROAD dataset stands out compared to other datasets because it consists of
observations with the stealthiest targeted ID fabrication. Sharmin et al. evaluated four statistical
(ID sequences, entropy-based, Hamming distances, frequency-based) and two ML-based
(OCSVM, isolation forest) CAN IDS algorithms against ROAD. An in-depth evaluation
of the IDS methods is provided, including training time, testing time on different attacks,
precision, recall, accuracy, F1-score, balanced accuracy (bACC), informedness (BM), markedness
(MK), and Matthews Correlation Coefficient (MCC). From these metrics, the investigators
found that the entropy-based algorithm was most effective against fuzzing and
targeted ID attacks. The authors provide a detailed comparison and analysis of the performance of
each algorithm; the overarching point, however, is that the ROAD dataset enabled this comprehensive
evaluation. Blevins et al. benchmark four time-based IDSs (mean inter-message
time, binning, fitting Gaussian distribution, and kernel density estimation) against the
ROAD dataset [23]. The researchers found that the two distribution-agnostic methods achieved
the highest F1 scores while the distribution-based approaches were limited in comparison,
suggesting that heuristic approaches outperform methods that explicitly rely on p-value
thresholds.
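For readers less familiar with the metrics listed above, the following minimal sketch computes them from a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not results from any of the cited studies; the sketch does not guard against division by zero when a row or column of the confusion matrix is empty.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard detection metrics from confusion-matrix counts (attack = positive class)."""
    tpr = tp / (tp + fn)  # recall / sensitivity
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)  # negative predictive value
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": ppv,
        "recall": tpr,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "bACC": (tpr + tnr) / 2,
        "BM": tpr + tnr - 1,   # informedness
        "MK": ppv + npv - 1,   # markedness
        "MCC": (tp * tn - fp * fn) / mcc_den,
    }


# Hypothetical counts for an imbalanced capture (few attack frames, many benign frames).
metrics = binary_metrics(tp=40, fp=10, tn=940, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With heavily imbalanced CAN captures, accuracy alone can look high even for a weak detector, which is one reason bACC, BM, MK, and MCC are reported alongside it.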
Another key aspect of introducing the research community to the ROAD dataset is that it
provides a quality platform to test novel IDS architectures. In [37, 71–78], researchers use the
ROAD dataset to evaluate novel IDS methods. Jin et al. combine oversampling,
outlier detection, and metric learning for intrusion detection and evaluate their model
on ROAD (and other datasets) [71]. Suhail et al. showcase a gamification (attacker vs defender)
framework for assessing physical cyber security of digital twins [72]. In [73], a study investigates
the use of a context aware IDS for detecting cyberattacks on the CAN bus and uses the
ROAD dataset for training. An embedded IDS that utilizes characteristic
functions is proposed in [79], where researchers evaluate the cybersecurity framework on
ROAD. In [77], researchers use the ROAD dataset to validate a model called “Deep Evolving
Stream Clustering-IDS” or DESC-IDS. Cheng et al. propose the model as a means of
anomaly detection capable of reducing data complexity for constructing spatial-temporal features
and exposing attacks. Shahriar et al. compare the performance of a model called CANShield
on the ROAD and SynCAN datasets [37], showing anomaly scores and ROC curves
for the attacks in the ROAD dataset. Moriano et al. [78] propose a forensic framework
for detecting masquerade attacks; their results indicate high effectiveness
in detecting attacks and the potential of using the framework for
real-time IDS. These works demonstrate the impact that the ROAD dataset can
have in enabling novel model evaluations.
ROAD also serves as a hub for researchers to reference the taxonomy of CAN data. Systematic
reviews and surveys citing the ROAD data have recently been published [51, 80–82].
Some studies claim that the ROAD dataset is the most comprehensive and realistic open
CAN dataset available for evaluating and comparing CAN IDSs against attacks [51, 81]. Other
research has referenced the quality of the dataset, used it to establish definitions within the
CAN IDS research community, or cited the work as establishing research standards
[21, 83–90].
Finally, a few studies plan to use the ROAD dataset in future investigations.
Agbaje et al. plan to use the ROAD dataset for benchmarking in the future [91]. Papadopoulos
et al. argue for the use of Named Data Networking as a solution to automotive network
issues (compared to CAN, local interconnect network, low-voltage differential, etc.) and plan
to implement it on ROAD in the future [92]. The availability of the ROAD dataset is
what enables these works to push the frontier of this research area forward
more easily.
5 Conclusion
In this paper, we identify two troubling trends for the CAN IDS research community: the lack
of comparability of CAN IDS methods and an inability to test IDS approaches targeting subtle,
more advanced attacks. By providing the first comprehensive guide to publicly available CAN
data, we contribute a single source for future researchers to consult when needing to identify
the best public dataset for their developments. Further, we contribute the new ROAD dataset,
containing real CAN data with a wide variety of attacks, designed to allow testing of the multitude
of different techniques arising in the literature. ROAD makes many advancements and bridges
many gaps; see ROAD’s Advantages (Section 4.4). Notably, gaps that still exist in CAN IDS
data include: real masquerade attack data; more real signal-translated CAN data with attacks; and
CAN data with physical-level characteristics (e.g., voltage). Finally, it is outside the scope of this
paper to use the dataset to test IDS methods. Such examples are appearing in the literature,
e.g., [23].
6 Appendix
6.1 Description of masquerade attacks
Fig 5 shows both the Timing Transparent (fabrication, with message confliction) and Timing
Opaque (masquerade, without message confliction) versions for three attack types. The x-axis
of all plots is elapsed time (s), and the red dashed lines demarcate the attack interval. The
three main columns visualize different aspects of each of the six attacks: Message Timing:
Inter-message arrival time (ms) between all messages shown in the Top All Messages subplot,
and between only the target ID messages in the bottom Target ID Messages (Near Attack Start)
subplot, which zooms in on the period from 15s before to 20s after the attack start. Blue dots/red x’s indicate
legitimate/injected messages. Compare the six All Messages (Top) subplots with Fig 3 to see
overall bus timing is nearly undisturbed, whereas previous attacks are blatant. The six Target
ID Messages (bottom) subplots illustrate that the fabrication attacks (using flam injection deliv-
ery) cause unusually short inter-message times for the target ID, while masquerade attacks do
not cause perceptible timing changes. Target ID Data Field: Time series of 64-bit binary data
during the time period near the attack start (black denotes 1s, white denotes 0s). If only part of
the message was altered (i.e., one target signal), the section of altered bits is delimited with
red solid lines. Through visual inspection, fabrication attacks are more obvious due to message
confliction, and both fabrication and masquerade attacks are more noticeable when the entire
message is targeted (e.g., Correlated Signal Attack), rather than just a single signal. Target ID
Signals: The time series of signals in the target ID message are depicted, annotated with signal
names and bit ranges, which are made boldface for target signals (note that not all non-target signals
in the message may be shown). Notice that the Max Speedometer and Reverse Light Off attacks
target different signals in the same ID. While even the masquerade versions of the first two
attack types are somewhat visually identifiable at the signal level due to discontinuities and
extreme values, the Reverse Light Off Attack targeting a 1-bit signal is difficult to discern with-
out understanding more complex signal relationships or by examining signals in other
messages.
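The two message-timing views described above (gaps between all messages, and gaps between messages of the target ID only) can be computed as in the following minimal sketch. The input format, a time-sorted list of (timestamp in seconds, ID) pairs, the function name, and the example values are assumptions for illustration and do not come from the dataset.

```python
from collections import defaultdict


def inter_message_times(frames):
    """Return (overall_gaps_ms, per_id_gaps_ms) for time-sorted (timestamp_s, can_id) pairs."""
    overall = []
    per_id = defaultdict(list)
    prev_time = None
    last_seen = {}  # most recent timestamp per ID
    for t, can_id in frames:
        if prev_time is not None:
            overall.append((t - prev_time) * 1000.0)
        if can_id in last_seen:
            per_id[can_id].append((t - last_seen[can_id]) * 1000.0)
        prev_time = t
        last_seen[can_id] = t
    return overall, dict(per_id)


# Hypothetical capture: ID 0x10 nominally every 10 ms, with one frame injected
# 0.2 ms after a legitimate one (as in an injection right after the real frame).
frames = [
    (0.0000, 0x10),
    (0.0050, 0x20),
    (0.0100, 0x10),
    (0.0102, 0x10),  # injected frame
    (0.0150, 0x20),
    (0.0200, 0x10),
]
overall_gaps, id_gaps = inter_message_times(frames)
print(id_gaps[0x10])
```

In this toy capture the per-ID series for 0x10 contains one gap far below the nominal 10 ms period, while the overall series stays within its usual range, mirroring the contrast between the Target ID Messages and All Messages subplots.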
While timestamps are reported with a precision of 1μs, the hardware used to collect this
data (a Kvaser Leaf Light V2) only guarantees an accuracy of 100μs. Note that all data fields in
these logs contain the full 8 bytes, which we padded with zeros if necessary. The channel is
always can0, so this column can be dropped. We provide metadata (in JSON format) for each
capture, including a general description of driving activities, the length of the capture in sec-
onds, and whether or not the car was on the dynamometer. For attack captures, we also
include whether the capture was modified (i.e., masquerade attacks), the injection ID and data
field, and the interval of injection (start, end) corresponding to the time of the first/last
injected message in elapsed seconds. Importantly, we do not label individual messages as
attack/normal, because the software we used to collect the data did not have that capability. However,
with injection ID, data, and intervals, these can be labeled in post-processing fairly easily.
Examples of the provided metadata are shown in Fig 6.
We use a wildcard character “X” in the injection_data_str field to indicate that the
byte in the given position was not modified in the injection when only one signal in the data
field is targeted. Similarly, “X” in the injection_id field indicates that no particular ID
was targeted, which is only the case in the fuzzing attack. For the accelerator attack, the
injection_id and injection_data_str are null, and the injection interval is just
the start and end time of the capture. (Note that all of these details are included in the full
documentation.)
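Labeling individual messages in post-processing from the injection ID, data, and interval can be sketched as below. The field names injection_id and injection_data_str are those given above; the key injection_interval, the hex-string representation of IDs and payloads, the character-by-character wildcard matching, and the function name is_injected are assumptions for illustration, so the full documentation should be consulted for the actual schema.

```python
def is_injected(time_s, can_id_hex, data_hex, meta):
    """Return True if a frame matches the capture's injection metadata."""
    inj_id = meta["injection_id"]
    inj_data = meta["injection_data_str"]
    start, end = meta["injection_interval"]  # assumed key name: (start, end) in elapsed seconds

    # Assumption: null ID/data (accelerator attack) means there are no injected frames to label.
    if inj_id is None or inj_data is None:
        return False
    if not (start <= time_s <= end):
        return False
    # "X" as the ID means no particular ID was targeted (fuzzing).
    if inj_id != "X" and inj_id.lower() != can_id_hex.lower():
        return False
    if len(inj_data) != len(data_hex):
        return False
    # "X" in the data string marks positions that were not modified by the injection.
    return all(p == "X" or p.lower() == d.lower() for p, d in zip(inj_data, data_hex))


# Hypothetical metadata entry for a single-signal attack on one ID.
meta = {
    "injection_id": "0x10",
    "injection_data_str": "XXXXFFFFXXXXXXXX",
    "injection_interval": [10.0, 20.0],
}
print(is_injected(12.0, "0x10", "0000FFFF00000000", meta))
```

One caveat of this approach is that a legitimate frame that happens to carry the injected payload inside the interval would also be labeled as an attack, so the resulting labels are approximate whenever the legitimate and injected values can coincide.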
The translated time series are represented in CSV format, following a similar schema as
SynCAN [5], the other signal translated dataset. Specifically, the CSV files have the following
columns: Label, ID, Time, and Signal-<i>-of-ID. Labels are either 0 (benign) or 1
(attack), and all the entries in the ambient captures are labeled 0. Each of the signals within an
ID is named based on the index it has when translated, i.e., i ∈ {0, 1, . . ., N_ID − 1}, where N_ID
is the maximum number of signals in a particular ID. We added a metadata file for each of the
logs describing the details of the CSV files.
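A minimal sketch of reading such a file into per-ID time series is given below. The exact header spelling (written here as Signal_0_of_ID, Signal_1_of_ID), the ID strings, the use of empty cells for IDs with fewer signals, and the sample rows are all assumptions for illustration; only the column roles (Label, ID, Time, indexed signals) come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed layout; not taken from the dataset.
SAMPLE = """Label,ID,Time,Signal_0_of_ID,Signal_1_of_ID
0,id1,0.00,0.10,0.50
0,id2,0.01,0.30,
1,id1,0.02,0.90,0.50
"""


def load_signals(handle):
    """Group rows by ID as {ID: [(time, label, [signal values])]}, skipping empty cells."""
    series = defaultdict(list)
    reader = csv.DictReader(handle)
    signal_cols = [c for c in reader.fieldnames if c.startswith("Signal_")]
    for row in reader:
        values = [float(row[c]) for c in signal_cols if row[c] not in ("", None)]
        series[row["ID"]].append((float(row["Time"]), int(row["Label"]), values))
    return dict(series)


series = load_signals(io.StringIO(SAMPLE))
print(series["id1"])
```

Grouping by ID first is convenient because each ID carries a different number of signals, so a single rectangular array over all rows would otherwise contain many empty cells.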
Fig 6. Snippet of metadata for two example captures, with an example of ambient (left) and attack (right) entries.
https://doi.org/10.1371/journal.pone.0296879.g006
Acknowledgments
Thanks to Gedare Bloom for pointing out timestamp issues in an early version of the ROAD
dataset. Thanks to Suzanne Parete-Koon and Ross Miller for assistance in posting the dataset
online. Thanks to Stacy Prowell and John Baston for helping us polish this document.
Author Contributions
Conceptualization: Miki E. Verma, Robert A. Bridges, Pablo Moriano.
Data curation: Miki E. Verma, Michael D. Iannacone, Samuel C. Hollifield.
Formal analysis: Miki E. Verma, Robert A. Bridges, Frank L. Combs.
Funding acquisition: Robert A. Bridges.
Investigation: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C. Hollifield,
Pablo Moriano, Steven C. Hespeler, Bill Kay, Frank L. Combs.
Methodology: Miki E. Verma, Robert A. Bridges, Samuel C. Hollifield, Pablo Moriano, Steven
C. Hespeler, Frank L. Combs.
Project administration: Robert A. Bridges.
Resources: Robert A. Bridges.
Software: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C. Hollifield, Bill
Kay.
Supervision: Robert A. Bridges, Pablo Moriano.
Validation: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C. Hollifield,
Pablo Moriano, Bill Kay, Frank L. Combs.
Visualization: Miki E. Verma, Robert A. Bridges, Pablo Moriano, Steven C. Hespeler, Bill
Kay.
Writing – original draft: Miki E. Verma, Robert A. Bridges, Michael D. Iannacone, Samuel C.
Hollifield, Pablo Moriano, Steven C. Hespeler, Bill Kay, Frank L. Combs.
Writing – review & editing: Miki E. Verma, Robert A. Bridges, Pablo Moriano, Steven C.
Hespeler.
References
1. Lee H, Jeong SH, Kim HK. OTIDS: A novel intrusion detection system for in-vehicle network by using
remote frame. In: PST. IEEE; 2017.
2. Han ML, Kwak BI, Kim HK. Anomaly intrusion detection method for vehicular networks based on sur-
vival analysis. Vehicular Comms. 2018. https://doi.org/10.1016/j.vehcom.2018.09.004
3. Seo E, Song HM, Kim HK. GIDS: GAN based Intrusion Detection System for In-Vehicle Network. In:
PST; 2018.
4. Song HM, Woo J, Kim HK. In-vehicle network intrusion detection using deep convolutional neural net-
work. Vehicular Communications. 2020. https://doi.org/10.1016/j.vehcom.2019.100198
5. Hanselmann M, Strauss T, Dormann K, Ulmer H. CANet: An Unsupervised Intrusion Detection System
for High Dimensional CAN Bus Data. IEEE Access. 2020; 8. https://doi.org/10.1109/ACCESS.2020.
2982544
6. Dupont G, Lekidis A, Den Hartog J, Etalle S. Automotive Controller Area Network (CAN) Bus Intrusion
Dataset v2; 2019.
7. Valasek Miller. Remote exploitation of an unaltered passenger vehicle. Black Hat USA. 2015; 24(S 91).
8. Cho, Shin. Error handling of in-vehicle networks makes them vulnerable. In: SIGSAC. ACM; 2016.
9. Nie S, Liu L, Du Y. Free-Fall: Hacking Tesla from wireless to CAN bus. Black Hat USA. 2017; 25.
10. Valasek Miller. CAN Message Injection. Black Hat USA. 2016; p. 29.
11. Koscher, et al. Experimental Security Analysis of a Modern Automobile. In: 2010 IEEE S&P. IEEE;
2010.
12. Checkoway S, McCoy D, Kantor B, Anderson D, Shacham H, Savage S, et al. Comprehensive experi-
mental analyses of automotive attack surfaces. In: USENIX Security; 2011.
13. Hoppe T, Kiltz S, Dittmann J. Security threats to automotive CAN networks Practical examples and
selected short-term countermeasures. Reliability Engineering & System Safety. 2011; 96. https://doi.
org/10.1016/j.ress.2010.06.026
14. Woo S, Jo HJ, Lee DH. A Practical Wireless Attack on the Connected Car and Security Protocol for In-
Vehicle CAN. IEEE Trans Intel Trans Sys. 2014;. https://doi.org/10.1109/TITS.2014.2351612
15. Choi W, Joo K, Jo HJ, Park MC, Lee DH. VoltageIDS: Low-Level Communication Characteristics for
Automotive Intrusion Detection System. IEEE Trans Info Foren & Sec. 2018; 13. https://doi.org/10.
1109/TIFS.2018.2812149
16. Taylor A, Leblanc S, Japkowicz N. Anomaly Detection in Automobile Control Network Data with Long
Short-Term Memory Networks. In: DSAA. IEEE; 2016.
17. Tomlinson A, Bryans J, Shaikh SA, Kalutarage HK. Detection of Automotive CAN Cyber-Attacks by
Identifying Packet Timing Anomalies in Time Windows; 2018.
18. Hossain MD, Inoue H, Ochiai H, Fall D, Kadobayashi Y. LSTM-based intrusion detection system for in-
vehicle can bus communications. IEEE Access. 2020; 8:185489–185502. https://doi.org/10.1109/
ACCESS.2020.3029307
19. Moore, et al. Modeling inter-signal arrival times for accurate detection of CAN bus signal injection
attacks: a data-driven approach to in-vehicle intrusion detection. In: CISRC. ACM; 2017.
20. Hamada Y, Inoue M, Ueda H, Miyashita Y, Hata Y. Anomaly-Based Intrusion Detection Using the Den-
sity Estimation of Reception Cycle Periods for In-Vehicle Networks. SAE Intern J Trans Cyber & Pri.
2018. https://doi.org/10.4271/11-01-01-0003
21. Rosell J, Englund C. A frequency-based data mining approach to enhance in-vehicle network intrusion
detection. In: Fast Zero 21, Society of Automotive Engineers of Japan, 2021. Society of Automotive
Engineers; 2021.
22. Olufowobi H, Young C, Zambreno J, Bloom G. Saiducant: Specification-based automotive intrusion
detection using controller area network (can) timing. IEEE Transactions on Vehicular Technology.
2019; 69(2):1484–1494. https://doi.org/10.1109/TVT.2019.2961344
23. Blevins DH, Moriano P, Bridges RA, Verma ME, Iannacone MD, Hollifield SC. Time-Based CAN Intru-
sion Detection Benchmark. In: AutoSec; 2021. p. 25.
24. Kang Kang. Intrusion Detection System Using Deep Neural Network for In-Vehicle Network Security.
PLOS ONE. 2016; 11.
25. Marchetti M, Stabili D, Guido A, Colajanni M. Evaluation of anomaly detection for in-vehicle networks
through information-theoretic algorithms. In: RTSI. IEEE; 2016.
26. Zhao Q, Chen M, Gu Z, Luan S, Zeng H, Chakraborty S. CAN bus intrusion detection based on auxiliary
classifier GAN and out-of-distribution detection. ACM Transactions on Embedded Computing Systems
(TECS). 2022; 21(4):1–30. https://doi.org/10.1145/3540198
27. Moulahi T, Zidi S, Alabdulatif A, Atiquzzaman M. Comparative performance evaluation of intrusion
detection based on machine learning in in-vehicle controller area network bus. IEEE Access. 2021;
9:99595–99605. https://doi.org/10.1109/ACCESS.2021.3095962
28. Hossain MD, Inoue H, Ochiai H, Fall D, Kadobayashi Y. Long short-term memory-based intrusion
detection system for in-vehicle controller area network bus. In: 2020 IEEE 44th Annual Computers, Soft-
ware, and Applications Conference (COMPSAC). IEEE; 2020. p. 10–17.
29. Kalkan SC, Sahingoz OK. In-vehicle intrusion detection system on controller area network with machine
learning models. In: 2020 11th International Conference on Computing, Communication and Network-
ing Technologies (ICCCNT). IEEE; 2020. p. 1–6.
30. Bloom G. WeepingCAN: A Stealthy CAN Bus-off Attack. In: AutoSec; 2021. p. 25.
31. Nair Narayanan S, Mittal S, Joshi A. OBD_SecureAlert: An Anomaly Detection System for Vehicles;
2016.
32. Wasicek A, Pesé MD, Weimerskirch A, Burakova Y, Singh K. Context-aware intrusion detection in auto-
motive control systems. In: Proc. 5th ESCAR USA Conf; 2017. p. 21–22.
33. Hanselmann M, Strauss T, Dormann K, Ulmer H. CANet: An unsupervised intrusion detection system
for high dimensional CAN bus data. IEEE Access. 2020; 8:58194–58205. https://doi.org/10.1109/
ACCESS.2020.2982544
34. Nichelini A, Pozzoli CA, Longari S, Carminati M, Zanero S. Canova: a hybrid intrusion detection frame-
work based on automatic signal classification for can. Computers & Security. 2023; 128:103166. https://
doi.org/10.1016/j.cose.2023.103166
35. Tariq S, Lee S, Woo SS. CANTransfer: Transfer learning based intrusion detection on a controller area
network using convolutional LSTM network. In: Proceedings of the 35th annual ACM symposium on
applied computing; 2020. p. 1048–1055.
36. Moriano P, Bridges RA, Iannacone MD. Detecting CAN Masquerade Attacks with Signal Clustering
Similarity. In: Workshop on Automotive and Autonomous Vehicle Security (AutoSec); 2022. p. 1–8.
37. Shahriar MH, Xiao Y, Moriano P, Lou W, Hou YT. CANShield: Deep Learning-Based Intrusion Detection
Framework for Controller Area Networks at the Signal-Level. IEEE Internet of Things Journal. 2023; p.
1–1. https://doi.org/10.1109/JIOT.2023.3303271
38. Cho, Shin. Viden: Attacker Identification on In-Vehicle Networks. In: SIGSAC. CCS’17. ACM; 2017.
39. Jeong W, Choi E, Song H, Cho M, Choi JW. Adaptive Controller Area Network Intrusion Detection Sys-
tem Considering Temperature Variations. IEEE Transactions on Information Forensics and Security.
2022; 17:3925–3933. https://doi.org/10.1109/TIFS.2022.3217389
40. Bhatia R, Kumar V, Serag K, Celik ZB, Payer M, Xu D. Evading Voltage-Based Intrusion Detection on
Automotive CAN. In: NDSS; 2021.
41. Salman N, Bresch M. Design and implementation of an intrusion detection system (IDS) for in-vehicle
networks; 2017. Available from: http://publications.lib.chalmers.se/records/fulltext/251871/251871.pdf.
42. de Faveri Tron A, Longari S, Carminati M, Polino M, Zanero S. Canflict: exploiting peripheral conflicts
for data-link layer attacks on automotive networks. In: Proceedings of the 2022 ACM SIGSAC Confer-
ence on Computer and Communications Security; 2022. p. 711–723.
43. Larson UE, Nilsson DK, Jonsson E. An approach to specification-based attack detection for in-vehicle
networks. In: Intel. Vehic. Symp. IEEE; 2008. p. 220–225.
44. Olufowobi H, Young C, Zambreno J, Bloom G. SAIDuCANT: Specification-Based Automotive Intrusion
Detection Using Controller Area Network (CAN) Timing. IEEE TVT. 2020. https://doi.org/10.1109/TVT.
2019.2961344
45. Tomlinson A, Bryans J, Shaikh SA. Using a One-class Compound Classifier to Detect In-vehicle Net-
work Attacks. ACM; 2018.
46. Kuwahara T, Baba Y, Kashima H, Kishikawa T, Tsurumi J, Haga T, et al. Supervised and unsupervised
intrusion detection based on CAN message frequencies for in-vehicle network. Journal of Info Proc.
2018; 26:306–313.
47. Wu W, Li R, Xie G, An J, Bai Y, Zhou J, et al. A Survey of Intrusion Detection for In-Vehicle Networks.
IEEE Trans Intel Transp Sys. 2019. https://doi.org/10.1109/TITS.2019.2908074
48. Lokman SF, Othman AT, Abu-Bakar MH. Intrusion detection system for automotive Controller Area Net-
work (CAN) bus system: a review. EURASIP J Wireless Comm & Netw. 2019; 2019.
49. Loukas G, Karapistoli E, Panaousis E, Sarigiannidis P, Bezemskij A, Vuong T. A taxonomy and survey
of cyber-physical intrusion detection approaches for vehicles. Ad Hoc Networks. 2019;.
50. Rajbahadur GK, Malton AJ, Walenstein A, Hassan AE. A survey of anomaly detection for connected
vehicle cybersecurity and safety. In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE; 2018.
p. 421–426.
51. Rajapaksha S, Kalutarage H, Al-Kadri MO, Petrovski A, Madzudzo G, Cheah M. Ai-based intrusion
detection systems for in-vehicle networks: A survey. ACM Computing Surveys. 2023; 55(11):1–40.
https://doi.org/10.1145/3570954
52. Jaynes M, Dantu R, Varriale R, Evans N. Automating ECU Identification for Vehicle Security. In:
ICMLA; 2016.
53. Markovitz Wool. Field classification, modeling and anomaly detection in unknown CAN bus networks.
Vehicular Communications. 2017; 9.
54. Huybrechts T, Vanommeslaeghe Y, Blontrock D, Van Barel G, Hellinckx P. Automatic reverse engineer-
ing of CAN bus data using machine learning techniques. In: Advances on P2P, Parallel, Grid, Cloud and
Internet Computing: Proceedings of the 12th International Conference on P2P, Parallel, Grid, Cloud
and Internet Computing (3PGCIC-2017). Springer; 2018. p. 751–761.
55. Nolan BC, Graham S, Mullins B, Kabban CS. Unsupervised time series extraction from controller area
network payloads. In: (VTC-Fall). IEEE; 2018.
56. Marchetti Stabili. READ: Reverse engineering of automotive data frames. IEEE Transactions on Info
Foren & Sec. 2018; 14(4):1083–1097. https://doi.org/10.1109/TIFS.2018.2870826
57. Verma M, Bridges R, Hollifield S. ACTT: Automotive CAN Tokenization & Translation. In: CSCI. IEEE;
2018. Available from: https://american-cse.org/csci2018/info.html.
58. Pesé MD, Stacer T, Campos CA, Newberry E, Chen D, Shin KG. LibreCAN: Automated CAN Message
Translator. In: SIGSAC CCS. ACM; 2019.
59. Young C, Svoboda J, Zambreno J. Towards reverse engineering controller area network messages
using machine learning. In: 2020 IEEE 6th World Forum on Internet of Things (WF-IoT). IEEE; 2020.
p. 1–6.
60. Buscemi A, Turcanu I, Castignani G, Crunelle R, Engel T. CANMatch: a fully automated tool for can bus
reverse engineering based on frame matching. IEEE Transactions on Vehicular Technology. 2021; 70
(12):12358–12373. https://doi.org/10.1109/TVT.2021.3124550
61. Verma ME, Bridges RA, Sosnowski JJ, Hollifield SC, Iannacone MD. CAN-D: A Modular Four-Step
Pipeline for Comprehensively Decoding Controller Area Network Data. IEEE TVT. 2021; 70(10):9685–
9700. https://doi.org/10.1109/TVT.2021.3092354
62. Buscemi A, Turcanu I, Castignani G, Panchenko A, Engel T, Shin KG. A Survey on Controller Area Net-
work Reverse Engineering. IEEE Communications Surveys & Tutorials. 2023;.
63. Tyree Z, Bridges RA, Combs FL, Moore MR. Exploiting the shape of CAN data for in-vehicle intrusion
detection. In: 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall). IEEE; 2018. p. 1–5.
64. Pawelec K, Bridges RA, Combs FL. Towards a CAN IDS Based on a Neural Network Data Field Predic-
tor. In: AutoSec. ACM; 2019.
65. Bosch R. CAN specification 2.0. Robert Bosch GmbH, Postfach. 1991;.
66. Voss W. A Comprehensible Guide to Controller Area Network; 2008.
67. Cho, Shin. Fingerprinting electronic control units for vehicle intrusion detection. In: USENIX Security;
2016.
89. Lee S, Jo HJ, Cho A, Lee DH, Choi W. TTIDS: Transmission-Resuming Time-Based Intrusion Detection
System for Controller Area Network (CAN). IEEE Access. 2022; 10:52139–52153. https://doi.org/10.
1109/ACCESS.2022.3174356
90. Islam MR, Oh I, Yim K. CANTool An In-Vehicle Network Data Analyzer. In: 2022 International Confer-
ence on Information Technology Systems and Innovation (ICITSI). IEEE; 2022. p. 252–257.
91. Agbaje P, Anjum A, Mitra A, Bloom G, Olufowobi H. A Framework for Consistent and Repeatable Con-
troller Area Network IDS Evaluation. In: Fourth International Workshop on Automotive and Autonomous
Vehicle Security; 2022.
92. Papadopoulos C, Shannigrahi S, Afanasyev A. In-vehicle networking with NDN. In: Proceedings of the
8th ACM Conference on Information-Centric Networking; 2021. p. 127–129.