Detection of Unauthorized IoT Devices Using Machine Learning Techniques
Yuval Elovici
Singapore University of Technology and Design
[email protected]
ABSTRACT

Security experts have demonstrated numerous risks imposed by Internet of Things (IoT) devices on organizations. Due to the widespread adoption of such devices, their diversity, standardization obstacles, and inherent mobility, organizations require an intelligent mechanism capable of automatically detecting suspicious IoT devices connected to their networks. In particular, devices not included in a white list of trustworthy IoT device types (allowed to be used within the organizational premises) should be detected. In this research, Random Forest, a supervised machine learning algorithm, was applied to features extracted from network traffic data with the aim of accurately identifying IoT device types from the white list. To train and evaluate multi-class classifiers, we collected and manually labeled network traffic data from 17 distinct IoT devices, representing nine types of IoT devices. Based on the classification of 20 consecutive sessions and the use of majority rule, IoT device types that are not on the white list were correctly detected as unknown in 96% of test cases (on average), and white listed device types were correctly classified by their actual types in 99% of cases. Some IoT device types were identified more quickly than others (e.g., sockets and thermostats were successfully detected within five TCP sessions of connecting to the network). Perfect detection of unauthorized IoT device types was achieved upon analyzing 110 consecutive sessions; perfect classification of white listed types required 346 consecutive sessions, although 110 sessions already yielded 99.49% accuracy. Further experiments demonstrated the successful applicability of classifiers trained in one location and tested on another. In addition, a discussion is provided regarding the resilience of our machine learning-based IoT white listing method to adversarial attacks.

KEYWORDS

Internet of Things (IoT), Cyber Security, Machine Learning, Device Type Identification, White Listing

1 INTRODUCTION

The Internet of Things (IoT) is globally expanding, providing diverse benefits in nearly every aspect of our lives [3, 28, 37, 42, 44]. Unfortunately, the IoT is also accompanied by a large number of information security vulnerabilities and exploits [1, 3, 8, 23, 30, 32, 37, 38, 44]. If we take into account the inherent computational limitations of IoT devices in addition to their typical vulnerabilities, the ease with which hackers can locate them (e.g., Shodan [25]), and their expected proliferation worldwide [17, 34], then both the risks and the projected global impact of connecting IoT devices to the network in any modern environment become clearly evident.

The current research focuses on the risks IoT devices pose to large corporate organizations. IoT security in enterprises is associated with the behavior of the organization itself, as well as its employees. Self-deployed IoT devices may support a variety of enterprise applications. For instance, smart cameras and smoke detectors enhance security; smart thermostats, smart light bulbs and sockets facilitate power savings; and so forth. Given this, care should be taken to make sure that such Web-enabled devices do not contribute to an expansion of the cyber attack surface within the organization. The smart TVs typically installed in conference rooms are a good example. As described in [9], the Skype app can be used by a widget in order to obtain elevated privileges. It is then able to perform rooting, make images of the complete flash memory, and leak them outside to a remote FTP server. In [6] a "Fake-Off mode" is outlined, where although the display is switched off, an implanted malware is still able to capture surrounding voices, and unlawfully transmit them to third parties via a Wi-Fi connection. Additional exploits which involve smart TVs are described in [19, 22]. Accordingly, corporate enterprises should reconsider whether to allow connecting smart TVs to their networks.

Regarding the implications of employee behavior on organizational IoT security, the rapidly emerging concept of employees bringing their own IoT devices (BYOIoT) to the workplace also increases the number of IoT devices connected to enterprise networks. As per [35], this trend, which has been growing for several
years, is associated largely with the use of wearables which have become popular, particularly in the healthcare and business service/consulting industries. Surveying this BYOIoT trend, [13] found that remote employees tend to connect numerous IoT devices to their home networks, while 25-50 percent of them admit they have connected at least one of these IoT devices to their enterprise network as well. The resulting risks outlined, e.g., in [21, 29, 31, 41], include cross-contamination, which can arise when a BYOIoT device (possibly infected earlier with malware from a domestic network) connects to the organizational network. Scenarios of this kind, which are likely to occur frequently, could serve to unintentionally inject malware into enterprise networks, or add entry points for hackers. Once they obtain access, attackers can maintain persistence in the network, and hide their presence inside the organization for long periods of time. Full-fledged attacks can then be launched from the compromised IoT device. Additional negative consequences can arise in cases in which there is inadequate separation between production and guest networks, or in cases in which the app of the BYOIoT device is installed on organizational PCs (possibly asking for too many permissions).

The remainder of the paper is structured as follows: Section 2 outlines the enterprise system we assume, as well as two feasible IoT attack vectors. Then, Section 3 discusses how automated white listing of IoT device types can address such attacks and mitigate their associated risks; this section concludes with an explicit statement of the problem we address. A list of research contributions is provided in Section 4, followed by a detailed description of the method we propose in Section 5. Section 6 contains an empirical evaluation of our method, and the reasons why our method is resilient to adversarial attacks. Aspects of deployment are discussed in Section 7, followed by a review of related work in Section 8. We then summarize our research in Section 9.

2 SYSTEM AND ATTACK MODEL

In this research, the system we assume is a typical large enterprise, facing an ever growing range of IoT-related cyber threats. Unlike heavy-duty DDoS attacks, carried out by vast botnets (e.g., Mirai [30], which exploited weak or default passwords), the current research focuses on advanced attacks which are based on local violations of organizational security policies. Once performed, they enable an attacker to take advantage of people who connect unauthorized types of IoT devices to the enterprise network. This noncompliance with organizational policy could be intentional (e.g., a disgruntled employee) or accidental (e.g., an uninformed guest). Still, it is important to note that in our research scenario the non-compliant user is unaware of the presence of malware on the IoT device, and has no intention of compromising the enterprise network. This person is used by an attacker due to the frequent access he/she provides to the enterprise network. Two associated attack types can be differentiated as follows:

(1) Untargeted: The connected IoT device has been previously infected by malware of an indiscriminate nature, virally spreading among as many devices as possible. Cross-contamination provides a mechanism for this kind of attack.
(2) Specifically targeted: The malware was intentionally implanted on the IoT device by an attacker, based on the assumption that the device would likely be connected to a specific organizational network in the future. Various attacks are possible in this situation, including a supply chain attack, in which the IoT device is contaminated before it reaches the end consumer, e.g., during manufacturing, distribution, or handling. Another more focused and precise attack is one in which the attacker gains control of a specific IoT device belonging to the non-compliant user, while this user is at home, and uses the IoT device to infiltrate the organizational network.

Of these two kinds of cyber attacks, we presume that the latter demands better hacking skills. As such, a skillful attacker might also be aware of an automated white listing system like the one we propose in Section 5. This type of attacker might attempt to bypass it by resorting to adversarial methods. However, as described in Section 6.5, attacks of this kind on our mechanism are practically unattainable.

3 WHITE LISTING FOR IOT SECURITY

The cyber attacks discussed in Section 2 are enabled only when the compromised IoT devices are connected to the organizational network. For mitigation of associated risks, one option is to intentionally control what connects to the network (i.e., refrain from connecting device types that are known to have unacceptable vulnerabilities). In small offices, composing a list of authorized IoT device types as an organizational policy is a feasible option. Enforcement of IoT device type white listing is then achievable by means of employee training, backed by regular physical surveying. In contrast to this small-scale environment, large corporate enterprises are much harder to monitor for unauthorized connected devices. This is mainly due to the large number of employees and guests, as well as the size of the physical premises. Thus, non-technical policy-based solutions will not suffice in large organizations, and advanced automated means are required. Once deployed, an automated IoT device type white listing system can feed a SIEM (security information and event management) system. Subsequently, (near) real-time network segmentation and access control can be implemented, e.g., by using software defined networks (SDN). Constant network monitoring with sufficient resolution may also prove effective in enabling the investigation of security policy violations, and help to identify the specific time and place from which an unauthorized IoT device tries to gain network access.

Note that despite having the same ultimate goal of ensuring that only authorized IoT devices can connect to the network, in this study we opt for white, rather than black listing. For certain use cases, such as email spam filtering, black listing or even a combination of black and white listing [15] may be preferable. However, given the plethora of common IoT vulnerabilities, any organization wishing to protect its data and IT infrastructure would be highly suspicious of all types of IoT devices and exhibit great care before allowing the connection of any IoT device. Consequently, the white list of authorized device types marked as safe would be much smaller than the ever growing list of presumably insecure types, unauthorized by default. As a result, a shorter list may contribute to the increased efficiency of the machine learning (ML) processes underlying the proposed white listing method, including model training, validation, testing, and deployment. Moreover, collecting
data from authorized IoT device types should be more practical than from unauthorized types, for later comparison against unlabeled data in production mode. For example, if we don’t let smart TVs connect to our network, then how can we collect their data to form the basis of a black list?

Our problem statement is as follows: In order to enforce organizational security policies regarding the types of IoT devices authorized to connect to the network, continuous traffic monitoring should be performed. For each stream of traffic data originating from a connected device (i.e., an IP stream), the challenge is to accurately identify the IoT device type. Then, upon determining whether the IoT device type is authorized (i.e., appears on the white list) or not, actions may be taken (e.g., disconnect from the network).

4 CONTRIBUTION

In this work we propose a method for identifying unauthorized types of IoT devices connected to the network, based on the continuous classification of the traffic of individual devices; if the specific device types do not appear on a white list, they are assumed to be unauthorized. The contributions of this work are as follows:

(1) Our method only relies on TCP/IP traffic data for classification. To the best of our knowledge, this is the first attempt to utilize network traffic data for ML in order to detect unauthorized IoT devices connected to a network.
(2) Because it is reliant on Internet traffic data, readily available to any large organization, our method can be easily deployed without requiring any costly specialized equipment. It can be implemented as a software service running in the background, continuously feeding a SIEM system with alerts about unauthorized IoT devices connected to the network.
(3) We implemented the proposed system and demonstrated the performance of our classifiers using 17 different IoT devices, representing nine types of devices. Some of these types are also produced by different vendors, and in some cases we have used more than one model for a vendor. For example, we have four distinct devices of the type "watch" in our lab, which are produced by two vendors: Sony and LG. We have two identical Urban watches and a single G Watch R watch from LG, and one Sony watch. Further details can be found in Appendix A.1.
(4) Traffic data was captured over a long period of time, with most device types accumulating more than 50 recording days (see Table 1). The devices were located and operated in the most ordinary manner (e.g., a refrigerator in the kitchen, watches on the wrists of researchers), and are thus representative of real world usage.
(5) We showed the ability of our method to effectively classify IoT device types. IoT device types not included in the white list were 100% correctly detected as "unknown" upon analyzing a moving window of 110 consecutive sessions. If deployed, our method can issue an alert to the organizational SIEM a few minutes after the unauthorized device is connected to the network.
(6) We demonstrate transferability of findings, such that classifiers learned in one lab reached high classification accuracy when applied to a set of devices in a second lab located in another country. This suggests that given the same list of authorized IoT device types, our method can be used in new settings with new users and devices without additional training.
(7) Our method is itself resilient to adversarial attacks, and we provide an explanation for this.

5 PROPOSED METHOD

Given a set of authorized device types $D$ (i.e., the white list) and a structured set of traffic data, we treat the task of IoT device type identification as a multi-class classification problem. That is, we wish to map each IP stream to the type of IoT device that is most likely to have produced it. We rely on the assumption that every device type $d_i$ on the white list $D$ is sufficiently represented in the (labeled) dataset. This way, a classifier $C$ can be induced by means of supervised ML, which captures the behavior of every authorized device type. In turn, this classifier can be continuously applied to new streams of (unlabeled) traffic data for device type identification and white listing. An overview of the proposed method for IoT white listing, from determination of the white list scope to ongoing application of the trained classifier, is portrayed in Figure 1.

Figure 1: Overview of proposed method for IoT white listing

5.1 Notation

The notation we use to describe our method is summarized below.

$D$: Set $\{d_1, \ldots, d_n\}$ of IoT device types that are on the white list.
$DS_{training}$: Labeled training dataset, used for inducing the multi-class classifier. It includes feature vectors representing sessions of devices whose types are in $D$.
$s$: Single TCP/IP session, represented by a feature vector.
$C$: Multi-class classifier for $D$, induced from $DS_{training}$; classifies a given session as $d_i$ or unknown.
$tr$: Classification threshold for $C$.
$DS_{validation}$: Labeled dataset, sorted in chronological order, used for optimizing classification parameters such as $tr$.
$DS^{i}_{validation}$: Subset of $DS_{validation}$ originating from $d_i$, representing an IP stream from that device type.
$p^{s}_{i}$: Posterior probability that session $s$ originates from $d_i$; derived by applying $C$ to session $s$.
$S^{d}$: Sequence of sessions originating from device type $d$.
$s^{*}$: Smallest sequence of consecutive sessions such that, if its sessions are classified with $C$ and majority voting is applied to the classification results, perfect classification is achieved.
$DS_{test}$: Labeled test dataset, sorted temporally, used to evaluate the proposed method.
$DS^{i}_{test}$: Subset of $DS_{test}$ originating from device type $d_i$, representing an IP stream from that device type.

F-measure, also known as the balanced F-score or $F_1$ score, assumes equal weight for false positives and false negatives.

\[
F_1 = 2 \cdot \frac{1}{\frac{1}{\text{recall}} + \frac{1}{\text{precision}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{1}
\]

A future modification could be to replace $F_1$ with a more general $F_\beta$ measure (see Equation 2). This way, an organization deploying our methodology and wishing to enforce a stricter IoT security policy could choose a $\beta < 1$ (e.g., $F_{0.5}$) to put more emphasis on precision than on recall. Thus, fewer unauthorized IoT devices connected to the organizational network would be incorrectly identified as white listed. Other organizations, wishing to reduce false alarms concerning authorized IoT devices, could use a $\beta > 1$ (e.g., $F_2$). Note that the higher $tr$ is, the more confident we must be before white listing the source of a single session $s$.
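To illustrate the F-measure discussion above, the following minimal Python sketch computes the general F-beta score from precision and recall. The function name `f_beta` and the example operating point are hypothetical and are not taken from the paper. The formula used is the standard definition, F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall), which reduces to Equation 1 for beta = 1.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention: the score is 0 when precision and recall are both 0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical operating point (not from the paper): low precision, perfect recall.
p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={f_beta(p, r, beta):.3f}")
```

At this operating point the score is lowest for beta = 0.5 and highest for beta = 2, which reflects how a smaller beta penalizes low precision more heavily.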
white list while preserving the intended functionality of the rogue device.

We note that in many cases IoT devices will contact their manufacturer’s website for various reasons (e.g., heartbeat, update checking, services). Thus, an adversary that wants to mimic a device W must be able to generate similar requests to its manufacturer’s servers, and more crucially, obtain similar responses. This might be challenging if the protocol between the device and the manufacturer requires reverse engineering or if the manufacturer expects some form of authentication from the device (for instance by using public-key cryptography).

It also must be noted that based on the most important features of our approach for the devices considered in our experiments (summarized in Table 4), an attacker must be able to mimic the average speed at which packets travel in a normal connection or the Alexa Rank of the SSL server used in the connection. Although this is not impossible, an attacker would have to control and fine-tune several aspects of external servers as well as the respective communication channel with them, in order to achieve this level of camouflage. This can be costly and also requires a substantial amount of knowledge, expertise, and hacking abilities.

In addition, consider the scenario in which an attacker wants to connect a device to the network that requires relatively high bandwidth to function, such as smartwatches that broadcast video over the Internet (which can be used to live-spy on a confidential meeting, for instance). In this case, if the devices on the white list do not have a similar bandwidth in normal operation (e.g., a thermostat), then although an attacker can mimic them, he/she will have to throttle down or severely compress the data broadcast of his/her rogue device to avoid suspicion. This might be impossible (compression) or contradict the attacker’s overall goal (the delay caused by throttling might render information outdated by the time it is transmitted outside the organization).

In summary, although we acknowledge that such attacks are possible in principle, a detailed investigation regarding the practical implications of such attacks is left for future work.

7 DEPLOYMENT

The proposed method for IoT device type white listing can be easily integrated into typical organizational environments. It is particularly well suited for integration with a SIEM software service. In this case, the detection of an unauthorized IoT device is considered an event, and the detection of the connection of an unsanctioned device by the SIEM system can trigger an alarm or the immediate isolation of the device from the network. As part of such a system, these actions can be followed by a thorough investigation of the monitored data and the security policy violation.

We implemented the proposed method and assessed the effectiveness of such an implementation in IoT device type white listing. Figure 4 shows the average number of consecutive sessions (i.e., the size of the moving window used for majority voting) required to reach various levels of classification accuracy. Naturally, the wider the window, the more accurate classification is, on average. However, it is evident that the marginal utility of widening the moving window diminishes after approximately 20 sessions. This is true for both correct detection of a device type as unknown and correct classification of a white listed type to its actual type. Moreover, wider moving windows for higher classification accuracy also mean longer periods of time before an alert is sent to the SIEM when a new device connects to the organizational network or is activated on the organizational premises. This trade-off between accuracy and speed can be settled by any organization deploying our method, by setting the moving window size parameter.

In our experiments, perfect detection of unauthorized IoT device types was obtained on the test set with a moving window of 110 consecutive sessions (see Appendix A.4). Five of the device types reached 100% detection accuracy with only 20 or fewer sessions. These encouraging results were obtained for TVs (two models of the same manufacturer), as well as for sockets and motion sensors (each with two unique devices of the same model), demonstrating model generalization across models and specific devices, respectively. For white listed types, perfect classification required a sequence of 346 sessions; however, 110 sessions were enough to obtain 99.49% accuracy.
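The moving-window majority voting discussed in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `classify_session` and `majority_vote_stream` are hypothetical, and the rule that a session is labeled "unknown" whenever its highest posterior probability falls below the threshold tr is an assumed reading of the notation in Section 5.1.

```python
from collections import Counter, deque
from typing import Dict, Iterable, Iterator

UNKNOWN = "unknown"


def classify_session(posteriors: Dict[str, float], tr: float) -> str:
    """Label one session with its most probable white-listed type,
    or UNKNOWN if that probability is below the threshold tr (assumed rule)."""
    best_type = max(posteriors, key=posteriors.get)
    return best_type if posteriors[best_type] >= tr else UNKNOWN


def majority_vote_stream(session_labels: Iterable[str], window: int) -> Iterator[str]:
    """Yield the majority label over a moving window of consecutive sessions.
    Nothing is yielded until the window is full; ties go to the label
    that appeared first in the current window."""
    recent = deque(maxlen=window)
    for label in session_labels:
        recent.append(label)
        if len(recent) == window:
            yield Counter(recent).most_common(1)[0][0]


# Hypothetical per-session labels for a single IP stream.
labels = ["tv", "tv", UNKNOWN, "tv", UNKNOWN, UNKNOWN, UNKNOWN]
print(list(majority_vote_stream(labels, window=3)))
```

A larger `window` smooths out isolated misclassifications but delays the first decision until that many sessions have been observed, which is the accuracy versus speed trade-off described above.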
For translating between the number of sessions and the respective amount of time (in seconds), Appendix A.5 summarizes the mean and standard deviation of the session inter-arrival time for the studied IoT device types. Note that for each type the time required for detection is different, since the number of sessions per unit of time differs among devices, and also varies for the same device over time, even if a fixed size for a moving window is established. Also note that for estimation of the time required for detection we omitted the watches and smoke detectors, because their communication is stimuli-dependent, and thus highly variable.

Figure 4: Classification accuracy on the test set as a function of the size of the moving window used for majority voting

8 RELATED WORK

The automated enforcement of an organizational policy of IoT white listing requires a reliable mechanism for device type identification, and prior work has attempted to do this. However, as elaborated in this section, our work addresses a number of substantial research gaps, since the method we propose (1) is more practical and easily deployable than others, (2) is less costly, (3) offers higher discrimination among multiple types, rather than binary categorization into, e.g., authorized or unauthorized, (4) generalizes the trained classifiers for multiple devices per type, multiple models per manufacturer, and multiple manufacturers per type, (5) enables continuous verification, (6) is evaluated on large actual datasets collected from the ordinary and unrestricted daily usage of heterogeneous IoT devices, in contrast to simulated, constrained, or very limited data, and (7) offers robustness and generalization for the constantly growing IoT domain.

Basing the identification on the device MAC address may not be effective, since skillful attackers are able to forge the MAC address of a compromised IoT device [11]. Additionally, although MAC addresses can be used to identify the manufacturer of a particular device, there is no established standard to identify a device’s brand or type based on its MAC address.

Authentication-based methods have also been investigated as a means of IoT device identification and white listing. This approach has been studied in [16], in which IoT certificate white listing was implemented for industrial automation control systems (IACS). However, as noted by the authors, in this domain equipment is usually engineered such that communication relations are known up front, thus the overall operational complexity remains tractable. In contrast, the large-scale enterprise environment we address (see Section 2) is much more dynamic in nature, where new types or brands of IoT devices are frequently introduced. Hence, authentication-based methods will probably fail to scale. In addition, it cannot be assumed that all vendors implement standardized encryption protocols, and the feasibility of setting a standard for global public key infrastructure is limited, so this approach is impractical for the problem at hand.

Another method for white listing traffic flows in order to defend against cyber attacks, suggested in [4], is more similar to our approach. However, like [16] (previously mentioned), the authors admit that in SCADA networks like the ones used in their research, traffic patterns are somewhat predictable. In contrast, our work assumes no such predictability and is designed to withstand the mix of traffic patterns associated with the diverse and evolving IoT domain. Another gap associated with [4] is that the technique they suggest only differentiates between authorized and unauthorized traffic in a binary manner, while the method we propose is also capable of identifying the specific type of IoT device involved.

The various methods of identifying connected devices proposed in prior studies have utilized a range of data sources. For example, researchers suggested a mechanism to identify and verify mobile devices by extracting features from measured signals and emissions, and classifying them by comparison to a database of labeled clusters [40]. In order to detect specific emitters, they measured various types of signals and extracted different feature sets, primarily from radio frequency (RF) transmissions and acoustics. Unfortunately, as noted by the researchers themselves, these features were occasionally hampered by noise and interference, as well as lost and false data. Our work differs from theirs in several respects: (1) we only extract and maintain a single fixed set of networking features, (2) the features we extract are very hard for adversaries to tamper with (see Section 6.5), and (3) our features are extracted from traffic data normally collected by the organization, without the need for a designated signal recorder, making our method easier to deploy.

In a later work based on RF transmissions, ML techniques (kNN and SVM) were used to leverage minute imperfections of IEEE 802.11 transmissions, and identify the respective wireless sources [11]. Unlike this work, which demonstrated its ability to discern among identically manufactured network interface cards (NICs), we aim at correctly identifying various IoT device types, including those from different vendors. Our method is also free from further limitations imposed by the RF fingerprinting method, such as the need for mission-specific capturing hardware (vector signal analyzers, antennas, amplifiers, etc.), as well as physical requirements (e.g., the need for the hardware to have a line of sight with the NICs and be located at most 25 meters away from the NICs, while enduring fluctuations of RF noise conditions).

As opposed to the MAC, authentication, and emission-based methods outlined above, we propose to use features of network traffic for IoT device type identification and white listing. Since traffic data is readily available within any organization, it has been
used extensively for a variety of security applications in the past. In many cases, analysis was based on ML, and more specifically on clustering, classification, and anomaly detection. For example, in [36] the researchers described a traffic-based technique to identify rogue wireless access points that attempt to mislead victims into connecting to them. Other research addressed the challenge of identifying malware infected clients in a network, as well as the associated command and control servers [20, 39]. In another study aimed at detecting malware-related traffic, network traffic features similar to ours were utilized [5]. Still, despite similarities in data collection and feature extraction methods, our work focuses on the identification of IoT device types and white listing them in organizational settings, rather than on malware detection.

To the best of our knowledge, only a few studies in the literature were based on research objectives similar to ours. One case [24] discussed IoT device type-based access control as a key motivation, like we do, as well as identity management (IdM) challenges. However, this research only aims to discern between two types of IoT devices, namely: expedient (intelligent, high computing power, e.g., sensor nodes) vs. non-expedient (limited computing power, e.g., passive tags). Unlike this study, we propose multi-class classification with much higher resolution (i.e., mapping IoT devices into functional types, such as smart watches, refrigerators, thermostats, TVs, and so forth). Moreover, the prior study only suggested a framework and relied on simulation for proof of concept, while we train and evaluate our models on a large amount of traffic data collected from numerous IoT devices.

In another study motivated by network security, genuine traffic data was collected from several devices and utilized for detecting device types that have potential security vulnerabilities [27]. However, in this research only a limited variety of features was extracted; even more importantly, this study only concentrated on the initial stage of device setup (when it begins communicating with the gateway). Our research was not limited in these ways; in addition, while their data was collected based on repetitive device setups, specifically dictated by device vendors via the installation guides, our data was collected over a period of weeks and months and based on ordinary and unrestricted daily usage in natural surroundings (e.g., a smart refrigerator located in the kitchen). This way, our method enables not only one-time identification, but also continuous verification, performed at any stage of device operation. Consequently, an adversary that somehow manages to bypass identification during setup is likely to eventually be detected by our method.

Motivated by privacy issues, the authors of [2] exemplified how

In [26] the authors presented data collection and analytical techniques that partially overlap with the current work, with IoT device identification set as the research objective. In comparison to this work, in our study (1) identification is leveraged for white listing in large-scale enterprises, (2) much more traffic data was collected, contributing to the trained models’ robustness, (3) multiple vendors per device type are monitored to look for cross-vendor patterns and characteristics of a given IoT device type, (4) multi-class classification is applied, as opposed to binary classification, and (5) model transportability between distant labs is tested. We are not aware of any other study which successfully addresses all these challenges.

9 SUMMARY AND CONCLUSION

This research demonstrated how supervised ML can be applied to analyze network traffic data in order to accurately detect unauthorized IoT devices. To train and evaluate a multi-class classifier, we collected and manually labeled network traffic data from seventeen IoT devices representing nine device types. In order to assess the ability of our method to detect a variety of unauthorized IoT device types, for each device type in turn we trained a multi-class classifier on the remaining eight device types, and examined its ability to correctly detect the ninth type as unknown and classify the other eight as belonging to a specific type on the white list. Throughout the paper, we demonstrated the effectiveness of our method in terms of the following:

• Classification accuracy: The trained classifiers achieved 96% accuracy (on average) in the detection of unauthorized IoT device types on a test set, by performing majority voting over the classifications of no more than 20 consecutive sessions. In fact, six out of the nine unauthorized device types attained 99-100% detection accuracy. At the same time, white listed IoT device types were classified to their specific types with a near perfect average accuracy of 99%.
• Detection speed: On a test set, the classifiers managed to detect unauthorized IoT devices perfectly based on the analysis of 110 consecutive sessions. A sequence of five sessions was enough for sockets and thermostats. The translation from the number of sessions to time (in seconds) varies across the device types studied.
• Classifiers’ transportability: Our method also demonstrated good transportability, by training classifiers on data collected in one lab and testing on data collected in another lab located in a different country. Classification accuracy obtained was high, at levels similar to the accuracy obtained when training and testing
an Internet Service Provider (ISP) can analyze traffic data to infer the were performed on all of the data.
type of connected IoT device. However, in comparison to our study, • Resilience to cyber attacks: Although theoretic rather than ex-
they analyzed only four device types (one device per type and one perimental, we analyzed and showed how our method is resilient
manufacturer per type). Moreover, three out of the four devices are in the face of attempted adversarial attacks.
purpose-limited, steady, and rather "predictable" in terms of traffic In future research we plan to analyze a broader collection of IoT
(socket, camera, and sleeping monitor). They also evaluated their device types, explore additional communication technologies, and
method on data collected over just several hours, with repetitious experiment with the data of IoT devices infected by cyber attacks
scenarios and stimuli. Another limitation for scaling this method and malware.
is their reliance on a single feature, which is the domain of DNS
queries. Although we have such data, we refrain from using it in REFERENCES
order to mitigate concerns about model overfitting. [1] Ioannis Andrea, Chrysostomos Chrysostomou, and George Hadjichristofi. 2016.
Internet of Things: Security vulnerabilities and challenges. In Proceedings - IEEE
Symposium on Computers and Communications, Vol. 2016-Febru. https://doi.org/10.1109/ISCC.2015.7405513
[2] Noah Apthorpe, Dillon Reisman, and Nick Feamster. 2017. A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic. In Workshop on Data and Algorithmic Transparency. arXiv:1705.06805 https://arxiv.org/pdf/1705.06805.pdf http://datworkshop.org/papers/dat16-final37.pdf
[3] Luigi Atzori, Antonio Iera, and Giacomo Morabito. 2010. The Internet of Things: A survey. Computer networks 54, 15 (2010), 2787–2805.
[4] Rafael Ramos Regis Barbosa, Ramin Sadre, and Aiko Pras. 2013. Flow whitelisting in SCADA networks. International Journal of Critical Infrastructure Protection 6, 3-4 (2013), 150–158. https://doi.org/10.1016/j.ijcip.2013.08.003
[5] Dmitri Bekerman, Bracha Shapira, Lior Rokach, and Ariel Bar. 2015. Unknown malware detection using network traffic classification. In Conf. on Communications and Network Security (CNS). IEEE, 134–142.
[6] Sam Biddle. 2017. WikiLeaks Dump Shows CIA Could Turn Smart TVs into Listening Devices. (2017). https://theintercept.com/2017/03/07/wikileaks-dump-shows-cia-could-turn-smart-tvs-into-listening-devices/
[7] Leyla Bilge, Davide Balzarotti, William Robertson, Engin Kirda, and Christopher Kruegel. 2012. Disclosure: detecting botnet command and control servers through large-scale netflow analysis. In Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 129–138.
[8] Riccardo Bonetto, Nicola Bui, Vishwas Lakkundi, Alexis Olivereau, Alexandru Serbanati, and Michele Rossi. 2012. Secure communication for smart IoT objects: Protocol stacks, use cases and practical examples. In Inter. Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM). IEEE, 1–7.
[9] A. Boztas, A. R J Riethoven, and M. Roeloffs. 2015. Smart TV forensics: Digital traces on televisions. Digital Investigation 12, S1 (2015), S72–S80. https://doi.org/10.1016/j.diin.2015.01.012
[10] Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
[11] Vladimir Brik, Suman Banerjee, Marco Gruteser, and Sangho Oh. 2008. Wireless device identification with radiometric signatures. In ACM conf. on Mobile computing and networking. ACM, 116–127.
[12] Anna L Buczak and Erhan Guven. 2016. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials 18, 2 (2016), 1153–1176.
[13] CALERO. 2015. 3 Ways the Internet of Things will Impact Enterprise Security. (2015). https://www.calero.com/mobility-service-support/3-ways-the-internet-of-things-will-impact-enterprise-security/
[14] Gerald Combs and others. 2008. Wireshark-network protocol analyzer. Version 0.99 5 (2008).
[15] David Erickson, Martin Casado, and Nick McKeown. 2008. The Effectiveness of Whitelisting: a User-Study. In CEAS.
[16] Rainer Falk and Steffen Fries. 2015. Using Managed Certificate Whitelisting as a Basis for Internet of Things Security in Industrial Automation Applications. International Journal on Advances in Security 8, 1 & 2 (2015), 89–98. https://www.iariajournals.org/security/sec
[17] Gartner. 2015. Gartner Says 6.4 Billion Connected. (2015). Retrieved April 14, 2016 from http://www.gartner.com/newsroom/id/3165317.
[18] Farnaz Gharibian and Ali A Ghorbani. 2007. Comparative study of supervised machine learning techniques for intrusion detection. In Communication Networks and Services Research, 2007. CNSR'07. Fifth Annual Conference on. IEEE, 350–358.
[19] Aaron Grattafiori and Josh Yavor. 2013. The outer limits: Hacking the samsung Smart TV. Black Hat Briefings (2013).
[20] Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. 2008. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. In USENIX Security Symposium.
[21] Emily Johnson. 2016. 6 IoT Security Dangers To The Enterprise. (2016). http://www.darkreading.com/endpoint/6-iot-security-dangers-to-the-enterprise/d/d-id/1325140
[22] S Lee and Seungjoo Kim. 2013. Hacking, surveilling and deceiving victims on smart tv. Blackhat Briefing 2013 (2013).
[23] John Leyden. 2017. We found a hidden backdoor in Chinese Internet of Things devices. (2017). https://www.theregister.co.uk/2017/03/02/chinese
[24] Parikshit N Mahalle, Neeli Rashmi Prasad, and Ramjee Prasad. 2013. Object classification based context management for identity management in internet of things. International Journal of Computer Applications 63, 12 (2013).
[25] John Matherly. 2009. Shodan. https://www.shodan.io/. (2009). (Accessed on 05/10/2017).
[26] Yair Meidan, Michael Bohadana, Asaf Shabtai, Juan David Guarnizo, Martin Ochoa, Nils Ole Tippenhauer, and Yuval Elovici. 2017. ProfilIoT: A Machine Learning Approach for IoT Device Identification Based on Network Traffic Analysis. In SAC 2017: The 32nd ACM Symposium On Applied Computing.
[27] Markus Miettinen, Samuel Marchal, Ibbad Hafeez, N. Asokan, Ahmad-Reza Sadeghi, and Sasu Tarkoma. 2016. IoT Sentinel: Automated Device-Type Identification for Security Enforcement in IoT. (2016). arXiv:1611.04880v2 https://arxiv.org/abs/1611.04880
[28] Daniele Miorandi, Sabrina Sicari, Francesco De Pellegrini, and Imrich Chlamtac. 2012. Internet of Things: Vision, applications and research challenges. Ad Hoc Networks 10, 7 (2012), 1497–1516.
[29] Bill Morrow. 2012. BYOD security challenges: control and protect your most sensitive data. Network Security 2012, 12 (2012), 5–8.
[30] K. Moskvitch. 2017. Securing IoT: In your smart home and your connected enterprise. Engineering Technology 12, 3 (April 2017), 40–42. https://doi.org/10.1049/et.2017.0303
[31] Morufu Olalere, M. T. Abdullah, Ramlan Mahmod, and Azizol Abdullah. 2015. A Review of Bring Your Own Device on Security Issues. SAGE Open 5, 2 (2015), 1–11. https://doi.org/10.1177/2158244015580372
[32] Matt Olsen, Bruce Schneier, Jonathan Zittrain, Urs Gasser, Matthew G Olsen, Nancy Gertner, Daphna Renan, Jack Goldsmith, Julian Sanchez, Susan Landau, Joseph Nye, Larry Schwartztol, and David R O'Brien. 2016. Don't Panic. Making Progress on the "Going Dark" Debate. Technical Report. Harvard University, Berkman Center for Internet and Society. https://cyber.harvard.edu/pubrelease/dont-panic/Dont
[33] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[34] Paul Roberts. 2016. IDC: 30 billion autonomous devices by 2020. The Security Ledger (oct 2016). https://securityledger.com/2013/10/idc-30-billion-autonomous-devices-by-2020/
[35] Matteson S. 2014. Survey: BYOD, IoT and wearables trends in the enterprise - TechRepublic. http://www.techrepublic.com/article/survey-byod-iot-and-wearables-trends-in-the-enterprise/. (November 2014). (Accessed on 05/10/2017).
[36] Ibrahim Halil Saruhan. 2007. Detecting and Preventing Rogue Devices on the Network. SANS Institute InfoSec Reading Room, sans.org. (2007).
[37] Sachchidanand Singh and Nirmala Singh. 2015. Internet of Things (IoT): Security challenges, business opportunities & reference architecture for E-commerce. In Inter. Conf. on Green Computing and Internet of Things (ICGCIoT). IEEE, 1577–1581.
[38] Arunan Sivanathan, Daniel Sherratt, Hassan Habibi Gharakheili, and Arun Vishwanath. 2016. Low-Cost Flow-Based Security Solutions for Smart-Home IoT Devices. In IEEE Advanced Networks and Telecommunications Systems (ANTS). Bangalore, India.
[39] W Timothy Strayer, David Lapsely, Robert Walsh, and Carl Livadas. 2008. Botnet detection based on network behavior. In Botnet Detection: Countering the Largest Security Threat. Springer, 1–24. https://doi.org/10.1007/978-0-387-68768-1_1
[40] Kenneth I Talbot, Paul R Duley, and Martin H Hyatt. 2003. Specific emitter identification and verification. Technology Review (2003), 113.
[41] John Thielens. 2013. Why APIs are central to a BYOD security strategy. Network Security 2013, 8 (2013), 5–6.
[42] Rolf H Weber. 2010. Internet of Things–New security and privacy challenges. Computer Law & Security Review 26, 1 (2010), 23–30.
[43] Jiong Zhang, Mohammad Zulkernine, and Anwar Haque. 2008. Random-forests-based network intrusion detection systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38, 5 (2008), 649–659.
[44] Kai Zhao and Lina Ge. 2013. A survey on the internet of things security. In 9th Inter. Conf. on Computational Intelligence and Security (CIS). IEEE, 663–667.

A APPENDIX
A.1 IoT Devices Used in Experiments
device #  type of device   manufacturer  model                lab  number of        number of        number of
                                                                   client sessions  server sessions  recorded days
1         baby_monitor     Beseye        Baby_Monitor_Pro     A    51,578           -                9
2         motion_sensor    D_Link        DCH_S150             A    1,199            -                55
3         motion_sensor    D_Link        DCH_S150             A    2,635            7,926            53
4         refrigerator     Samsung       RF30HSMRTSL          A    1,018,921        2,378            74
5         security_camera  Simple_Home   XCS7_1001            A    4,561            -                8
6         security_camera  Simple_Home   XCS7_1001            A    300              7,903            47
7         security_camera  Withings      WBP02_WT9510         B    9,533            -                15
8         smoke_detector   Nest          Nest_Protect         A    369              -                56
9         socket           Simple_Home   XWS7_1001            A    1,309,849        251,401          53
10        socket           Simple_Home   XWS7_1001            A    1,499,027        287,275          61
11        thermostat       Nest          Learning_Ther._3     A    19,015           -                52
12        TV               Samsung       UA40H6300AR          A    135,035          5,143            58
13        TV               Samsung       UA55J5500AKXXS       B    9,170            -                15
14        watch            LG            G_Watch_R            A    2,327            -                11
15        watch            LG            Urban                A    1,090            -                34
16        watch            LG            Urban                A    343              -                5
17        watch            Sony          SmartWatch_3_SWR50   A    631              -                15
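As an illustration of how the seventeen devices in A.1 map onto the nine leave-one-type-out experiments described in the summary, the following minimal Python sketch enumerates the splits. The device-to-type mapping is transcribed from the table; the function name, the returned tuple layout, and the variable names are hypothetical and are not taken from the authors' code.

```python
# Device type per device number, transcribed from the A.1 table.
DEVICE_TYPE_BY_ID = {
    1: "baby_monitor", 2: "motion_sensor", 3: "motion_sensor",
    4: "refrigerator", 5: "security_camera", 6: "security_camera",
    7: "security_camera", 8: "smoke_detector", 9: "socket", 10: "socket",
    11: "thermostat", 12: "TV", 13: "TV", 14: "watch", 15: "watch",
    16: "watch", 17: "watch",
}


def leave_one_type_out(device_type_by_id):
    """Yield one experiment per device type (hypothetical helper).

    Each experiment holds out a single type as "unauthorized" and keeps
    the remaining types as the white list used for training.
    """
    types = sorted(set(device_type_by_id.values()))
    for held_out in types:
        white_list = [t for t in types if t != held_out]
        white_listed_ids = [i for i, t in device_type_by_id.items() if t != held_out]
        unauthorized_ids = [i for i, t in device_type_by_id.items() if t == held_out]
        yield held_out, white_list, white_listed_ids, unauthorized_ids


for held_out, white_list, white_listed_ids, unauthorized_ids in leave_one_type_out(DEVICE_TYPE_BY_ID):
    print(held_out, len(white_list), white_listed_ids, unauthorized_ids)
```

Each iteration corresponds to one trained classifier: the white-listed types are the training classes, and sessions from the held-out devices should ideally be labeled as unknown.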
A.2 Features Found to be Important for IoT Device White Listing at Least Twice
feature                        brief description                                  occurrences  average
                                                                                  in top-10    importance
ttl_min                        TCP packet time-to-live, minimum                   9            0.045
ttl_B_min                      TCP packet time-to-live sent by server, minimum    9            0.033
ttl_firstQ                     TCP packet time-to-live, quartile 1                9            0.029
ttl_avg                        TCP packet time-to-live, average                   8            0.024
ttl_B_thirdQ                   TCP packet time-to-live sent by server, quartile 3 8            0.021
ttl_B_median                   TCP packet time-to-live sent by server, median     7            0.02
ttl_B_firstQ                   TCP packet time-to-live sent by server, quartile 1 7            0.02
ssl_dom_server_name_alexaRank  Alexa Rank of dominated SSL server                 6            0.021
bytes_A_B_ratio                Ratio between number of bytes sent and received    6            0.019
reset                          Total packets with RST flag                        4            0.019
http_dom_host_alexaRank        Dominated host Alexa rank                          4            0.018
ttl_thirdQ                     TCP packet time-to-live, quartile 3                3            0.019
ttl_max                        TCP packet time-to-live, maximum                   3            0.017
ttl_B_var                      TCP packet time-to-live sent by server, variance   2            0.017

A.3 ROC for the Classifier Trained Without Thermostats

A.5 Distribution of Session Inter-Arrival Times on the Test Set across IoT Device Types (Communicating Regardless of Stimuli)
type of device   mean of                     standard deviation of
                 session inter-arrival time  session inter-arrival time
refrigerator     0 days 00:00:11.784784      0 days 00:00:03.316659
socket           0 days 00:00:04.634451      0 days 00:00:03.408049
TV               0 days 00:00:58.148296      0 days 00:02:23.671879
thermostat       0 days 00:00:09.359719      0 days 00:00:17.572645
motion_sensor    0 days 00:04:08.519480      0 days 00:10:14.900034
baby_monitor     0 days 00:00:01.135635      0 days 00:00:01.287092
security_camera  0 days 00:01:17.907303      0 days 00:01:55.050973

A.6 Confusion Matrices on DS_test Based on a Moving Window of 20 Sessions
actual IoT device type \ classified as 0 1 3 4 5 6 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1962 0 0 0 0 0 0 0 1
3 - watch 0 0 1111 0 0 0 0 0 0 1
4 - smoke_detector 0 0 0 104 0 0 0 0 0 1
5 - motion_sensor 0 0 0 0 1239 0 0 0 0 1
6 - security_camera 0 0 0 0 0 1375 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1981 0 0 1
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - baby_monitor 0 0 0 0 0 0 0 0 1981 1
actual IoT device type \ classified as 0 1 2 3 5 6 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1962 0 0 0 0 0 0 0 1
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
3 - watch 0 0 0 1111 0 0 0 0 0 1
5 - motion_sensor 0 0 0 0 1239 0 0 0 0 1
6 - security_camera 0 0 0 0 0 1375 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1981 0 0 1
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - smoke_detector 0 0 0 0 0 0 0 0 104 1
actual IoT device type \ classified as 1 2 3 4 5 6 7 8 Unknown Accuracy
1 - TV 1954 0 0 0 0 0 0 0 8 0.99
2 - baby_monitor 0 1981 0 0 0 0 0 0 0 1
3 - watch 0 0 1109 0 0 0 0 0 2 0.99
4 - smoke_detector 0 0 0 104 0 0 0 0 0 1
5 - motion_sensor 0 0 0 0 1239 0 0 0 0 1
6 - security_camera 0 0 0 0 0 1375 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1981 0 0 1
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - socket 0 0 0 0 0 0 0 0 1962 1
actual IoT device type \ classified as 0 2 3 4 5 6 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
2 - baby_monitor 0 1981 0 0 0 0 0 0 0 1
3 - watch 0 0 1111 0 0 0 0 0 0 1
4 - smoke_detector 0 0 0 104 0 0 0 0 0 1
5 - motion_sensor 0 0 0 0 1239 0 0 0 0 1
6 - security_camera 0 0 0 0 0 1375 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1981 0 0 1
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - TV 0 0 304 0 0 0 1 0 1657 0.84
actual IoT device type \ classified as 0 1 2 3 4 5 6 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1954 0 0 0 0 0 0 8 0.99
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
3 - watch 0 0 0 1111 0 0 0 0 0 1
4 - smoke_detector 0 0 0 0 104 0 0 0 0 1
5 - motion_sensor 0 0 0 0 0 1239 0 0 0 1
6 - security_camera 0 0 0 0 0 0 1375 0 0 1
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - refrigerator 0 22 0 0 0 0 0 0 1959 0.99
actual IoT device type \ classified as 0 1 2 3 4 5 6 7 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1962 0 0 0 0 0 0 0 1
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
3 - watch 0 0 0 1107 0 0 0 0 4 0.99
4 - smoke_detector 0 0 0 0 104 0 0 0 0 1
5 - motion_sensor 0 0 0 0 0 1239 0 0 0 1
6 - security_camera 0 0 0 0 0 0 1375 0 0 1
7 - refrigerator 0 0 0 0 0 0 0 1981 0 1
Unknown - thermostat 0 0 0 0 0 0 0 0 1981 1
actual IoT device type \ classified as 0 1 2 3 4 6 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1907 0 0 0 0 0 0 55 0.97
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
3 - watch 0 0 0 1064 0 0 0 0 47 0.95
4 - smoke_detector 0 0 0 0 104 0 0 0 0 1
6 - security_camera 0 0 0 0 0 1375 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1954 0 27 0.99
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - motion_sensor 0 0 0 0 0 0 0 0 1239 1
actual IoT device type \ classified as 0 1 2 3 4 5 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1937 0 0 0 0 0 0 25 0.99
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
3 - watch 0 0 0 1097 0 0 0 0 14 0.99
4 - smoke_detector 0 0 0 0 104 0 0 0 0 1
5 - motion_sensor 0 0 0 0 0 1239 0 0 0 1
7 - refrigerator 0 0 0 0 0 0 1979 0 2 0.99
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - security_camera 0 0 0 0 0 77 0 0 1298 0.94
actual IoT device type \ classified as 0 1 2 4 5 6 7 8 Unknown Accuracy
0 - socket 1962 0 0 0 0 0 0 0 0 1
1 - TV 0 1889 0 0 0 0 0 0 73 0.96
2 - baby_monitor 0 0 1981 0 0 0 0 0 0 1
4 - smoke_detector 0 0 0 104 0 0 0 0 0 1
5 - motion_sensor 0 0 0 0 1173 0 0 0 66 0.95
6 - security_camera 0 0 0 0 0 1335 0 0 40 0.97
7 - refrigerator 0 0 0 0 0 0 1945 0 36 0.98
8 - thermostat 0 0 0 0 0 0 0 1981 0 1
Unknown - watch 0 179 0 13 0 0 0 0 0 932 0.84
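
The matrices above report classifications obtained by majority voting over a moving window of 20 consecutive sessions. A minimal Python sketch of such a voting scheme follows. It assumes that each session is first given a single label, and that a session is labeled "unknown" when the classifier's highest class probability falls below a threshold; the threshold rule, the function names, and the example values are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter, deque

UNKNOWN = "unknown"


def label_session(probabilities, threshold):
    """Return the most probable device type, or UNKNOWN if it is not
    confident enough. `probabilities` maps device type -> probability;
    `threshold` is an assumed cut-off, not a value from the paper."""
    best_type = max(probabilities, key=probabilities.get)
    if probabilities[best_type] < threshold:
        return UNKNOWN
    return best_type


def moving_majority_vote(session_labels, window=20):
    """Yield the most frequent label in each window of consecutive sessions.

    Nothing is emitted until `window` sessions have been observed. Ties are
    broken in favor of the label seen first within the window.
    """
    recent = deque(maxlen=window)
    for label in session_labels:
        recent.append(label)
        if len(recent) == window:
            yield Counter(recent).most_common(1)[0][0]


# Toy usage with a window of 5 sessions (the appendix uses 20).
labels = ["TV", "TV", UNKNOWN, "TV", UNKNOWN, UNKNOWN, UNKNOWN]
print(list(moving_majority_vote(labels, window=5)))
```

A longer window smooths out isolated misclassified sessions at the cost of a longer wait before the first decision, which is why the number of sessions needed translates into different detection times for different device types.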