Analyzing Variation Among IoT Botnets Using Medium Interaction Honeypots
All content following this page was uploaded by Iman Vakilinia on 01 January 2020.
Abstract—Through analysis of sessions in which files were created and downloaded on three Cowrie SSH/Telnet honeypots, we find that IoT botnets are by far the most common source of malware on connected systems with weak credentials. We detail our honeypot configuration and describe a simple method for listing near-identical malicious login sessions using edit distance. A large number of IoT botnets attack our honeypots, but the malicious sessions which download botnet software to the honeypot are almost all nearly identical to one of two common attack patterns. It is apparent that the Mirai worm is still the dominant botnet software, but it has been expanded and modified by other hackers. We also find that the same loader devices deploy several different botnet malware strains to the honeypot over the course of a 40-day period, suggesting multiple botnet deployments from the same source. We conclude that Mirai continues to be adapted but can be effectively tracked using medium-interaction honeypots such as Cowrie.

Index Terms—Botnet, Honeypot, Internet of Things, Mirai

I. INTRODUCTION

Shortly after the Mirai malware made headlines in 2016 due to its usage in a 620 Gbps attack against krebsonsecurity.com [1], the botnet code was publicly released as open source software. The original version of Mirai targeted vulnerable Internet of Things (IoT) devices using a short wordlist based on default username and password combinations, amassing an army of hundreds of thousands of connected devices using just these credentials. The release of Mirai’s source code has resulted in numerous clones and modifications by other malicious actors which have expanded upon the attacks present in the original version of Mirai. Despite receiving much attention after the 2016 attacks, IoT security is still a major issue and new botnets appear regularly. As a result, there is a need to keep track of developments in IoT botnets so current attacks can be appropriately dealt with.

Honeypots are a popular technique for capturing malicious activity. They provide a sandbox environment for malicious actors to attack, recording each action for later analysis. Honeypots have been used for a wide range of tasks, such as attack pattern comparison, root cause identification, risk assessment, attack frequency analysis, and attack origins analysis [2]. Medium-interaction honeypots such as Cowrie [3] provide a fake shell and virtual file system to the attacker, such that the attacker can enter commands and see believable output without having access to a real system [4]. This is an ideal environment for tracking the state of IoT botnets, from which samples are easy to capture because the bots are automated and cannot perform honeypot detection as thoroughly as a human attacker. We collect data using the open source Cowrie honeypot, which is a continuation of the Kippo honeypot. Kippo/Cowrie honeypots have previously been used to cluster login sessions based on session time and attacker skill [5], and to visualize attacker location and common commands [6].

In this paper we describe a honeypot configuration based on the Cowrie SSH/Telnet honeypot for capturing remote login sessions and downloaded files, expanding on our previous work analyzing password attempts [7]. For our analysis, we look at sessions which created or downloaded files. Login session analysis has previously been done using high-interaction honeypots by classifying each command as part of a specific state [8]. We instead describe a simple method for clustering these sessions using edit distance to enumerate identical attack patterns. Because these sessions are almost universally associated with the Mirai malware, we also discuss Mirai’s functionality. Previous work provides a complete description of the original malware’s behavior, architecture, and attack methods [9], [10]; we instead focus on how much Mirai has been modified. There is some existing work in this area. Y. Liu and H. Wang [11] use branch name, configuration, IP addresses, attack methods, and credential dictionaries to distinguish between binary files from different Mirai variants. Other work describing Mirai has identified versions which target different ports and devices [10].

Using our clustering results, we note how many different Mirai variants attack our honeypots and how much they differ from one another and from the publicly available Mirai source code in terms of commands entered, choice of filler words, and downloaded files. We aim to give an idea of the variation present in Mirai attacks, both in the number of distinct variants and in how much those variants differ. We also analyze the variation in files delivered by apparently identical malware loaders and by identical IP addresses, and find that many IP addresses serve as loader servers for multiple strains of Mirai-based botnets. Finally, we provide information on other attacks observed by the honeypot to give some perspective regarding the prevalence of Mirai.

This research is supported by the National Science Foundation (NSF), USA, Award #1739032.
II. METHODOLOGY

A. Honeypot Configuration

Malware data was collected from three honeypots running the Cowrie SSH/Telnet honeypot [3], the continuation of the popular Kippo honeypot. Cowrie is a medium-interaction honeypot that presents the attacker with a dummy shell implemented in Python, backed by a virtual filesystem and prefab command outputs. Cowrie can be used to emulate an IoT system by setting the prefab outputs to be consistent with actual IoT devices, and is therefore a good environment for observing IoT botnets without the overhead of a high-interaction honeypot. Cowrie has also been modified multiple times during its development specifically in response to issues where sessions from the Mirai malware did not result in a file download, making it particularly good for capturing samples of Mirai.

We made minimal alterations to Cowrie’s default configuration as of February 2019. Cowrie’s userdb was modified slightly to allow login for the three default Cowrie users (root, tomcat, and oracle) with any password. Additionally, the honeypot was configured to accept both SSH and Telnet connections, on ports 22/2222 and 23/2223 respectively, to capture as much traffic as possible. Each honeypot had an Apache webserver running on port 80 associated with a registered domain name and a static web page. Finally, each honeypot had a Postfix mailserver running on port 25 with several valid email accounts set up. Each portion of the honeypot (login, web, and mail) ran on a separate virtual machine and received traffic through port forwarding. For the analysis in this paper, only the Cowrie component is necessary.

Fig. 1. Configuration of the honeypot used for data collection. Three honeypots were deployed in this manner.

To aggregate logs from the web and SSH servers, we used Filebeat to ship logs to a server running the ELK stack, composed of Elasticsearch, Logstash, and Kibana. Logstash is used to transform the JSON data generated by Cowrie and the Combined Log Format data generated by Apache into the format used by the Elasticsearch search and analytics engine. We make an SSH server available on the Postfix machine by forwarding port 22222 on the pfSense router to port 22 on the mail server. We also make the ELK machine accessible through SSH over port 22 from the internal network, allowing access to Elasticsearch and Kibana via an SSH tunnel through the mail server. This allows us to remotely access ELK without needing to expose the server publicly.

B. Dataset

Every event Cowrie adds to its log has a structure defined by the event type, such as cowrie.login.success and cowrie.command.input. Each event, regardless of type, has a session id field that can be used to identify events generated by the same session. To collect the data for this paper, we enumerated every unique session id associated with a cowrie.session.file_download event and used Elasticsearch to collect all other events for these ids. The file download event is used both for downloads from remote servers and for files created during the login session, so we consider both in our analysis. The data was collected over a period of 40 days, from 4 February 2019 to 15 March 2019, and consists of 84,602 unique login sessions. These sessions account for 21.2% of the 398,233-session dataset; that is, roughly 20% of the connections initiated with a honeypot resulted in a successful login followed by a file creation or download. These login sessions resulted in a total of 312 unique files collected by the honeypot.

C. Attack Pattern Comparison

To identify identical attack patterns, each session was encoded as a list of the first argument of each command entered by the attacker. For example, if an attacker logged in, typed cat /proc/mounts followed by echo test, then ended the session, the encoded list would be ["cat", "echo"]. A command containing ; (which most shells treat as the end of a command) was split into multiple commands, but commands joined with && or || were not. If the first argument was a shell such as /bin/busybox or /bin/bash, we instead used the second argument, which is the first argument to the shell being called.

The Levenshtein distance between lists of arguments was used as a similarity metric. Levenshtein distance counts the minimum number of edits (substitutions, insertions, and deletions) required to transform one sequence into another, including sequences of different lengths, providing a rough estimate of the similarity between login sessions. Edit distance has previously been used for applications such as root cause analysis [12]. The metric was normalized by dividing by its upper bound, the largest number of arguments in either session. This metric does not account for state or for commands that differ but are functionally identical, but it is sufficient for identifying connections from bots using the same code. Based on the results shown in Figure 2, we consider attack patterns identical if their normalized distance is less than 0.25.

III. SESSION ANALYSIS

The two most common attack patterns seen on the honeypot are closely related to publicly available Mirai loader code. These two attack patterns account for 97.7% of the dataset, as well as 244 of the 312 unique files. Of these, 236 files were identified as strains of the Mirai IoT botnet malware by at least
one antivirus engine used by the VirusTotal antivirus report aggregator [13]. Of the remaining eight, six were identified by at least one source as Gafgyt, another, earlier IoT botnet malware with behavior very similar to Mirai [14], [15], and the other two were not flagged as malware by any antivirus engine at the time we submitted them for analysis. The two attacks are entirely accounted for by 125 unique IPs, of which 16 are common to both attacks.

Fig. 2. Histogram of distance measures for the two most common attack patterns. For both patterns, there is a clear distinction between distances less than 0.25 and distances greater than 0.25.

To better understand the observed botnet loader sessions, we provide a brief description of Mirai’s spreading functionality. The Mirai botnet grows as infected devices scan randomly generated IP addresses. When an infected device detects another vulnerable device, it sends the credentials it used to access that vulnerable device to a report server. After a certain period of time, a loader device connects to the compromised device using the reported credentials, then downloads and executes the Mirai malware to add the device to the botnet. A visual depiction of this behavior is provided in Figure 3.

the system’s CPU architecture by reading the header of the echo system binary using cat, and checking for wget and tftp, the loader downloads a file using wget and then attempts to execute it. In the publicly released Mirai loader, the downloaded file is a variant of Mirai built for the compromised system’s CPU architecture. We note that all malware collected by the honeypot was compiled for x86 systems, which is consistent with the x86_64 architecture displayed by Cowrie.

The most common attack pattern follows the same logic as the Mirai source code, but does not use the same commands. Instead of using echo to save raw hex to a file, this pattern uses > to create empty files in several directories, then attempts to cd to those directories if the redirection was successful. Unlike in the original Mirai loader, the attacker in this case does not remove any of the created files. Additionally, this pattern uses read to obtain a binary file header if cat fails. However, once a writable directory is found and the CPU architecture is determined, the loaders are identical. This pattern is far more common than the previous one, and accounts for 74.3% of the observed sessions.

Most of the differences within each attack pattern are related to the first commands entered by the loader. We define these commands as everything before the first > for the most common pattern, and everything before the first echo for the second most common pattern. Loaders generally begin with enable, shell, and sh, with which the attacker attempts to gain access to privileged-mode commands and run a Bourne shell [17]. Minor variations on these commands show up in different loaders, as shown in Table I. It appears that different attackers use the same loader structure, but make minor modifications to the first commands run by the loader.
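The initial-command prefix defined above can be computed from the encoded argument lists of Section II-C. The Python sketch below is illustrative only: the helper names (initial_commands, is_redirect, is_echo, prefix_counts) are hypothetical, the session data is made up, and it assumes that a redirection such as >/tmp/.x shows up in the encoded list as an argument beginning with >. Counting distinct prefixes yields a tally of loader variations of the kind summarized in Table I.

```python
from collections import Counter
from itertools import takewhile


def initial_commands(encoded, stop):
    """Return the encoded arguments that come before the first one matching stop."""
    return list(takewhile(lambda arg: not stop(arg), encoded))


def is_redirect(arg):
    # Most common pattern: the prefix ends at the first redirection.
    # Assumption: the redirection appears as an argument starting with '>'.
    return arg.startswith(">")


def is_echo(arg):
    # Second most common pattern: the prefix ends at the first echo.
    return arg == "echo"


def prefix_counts(sessions, stop):
    """Count how often each distinct initial-command prefix occurs."""
    return Counter(tuple(initial_commands(s, stop)) for s in sessions)


# Made-up encoded sessions for illustration.
example_sessions = [
    ["enable", "shell", "sh", ">/tmp/.x", "cd"],
    ["enable", "shell", "sh", ">/var/.x", "cd"],
    ["shell", "sh", ">/tmp/.x", "cd"],
]
print(prefix_counts(example_sessions, is_redirect))
```

Using takewhile keeps the whole session when no stop argument is present, so sessions that never reach a redirection or echo still contribute a (longer) prefix rather than being dropped.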
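The session encoding and normalized edit distance described in Section II-C can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the paper: arguments are split on whitespace only (shell quoting is ignored), only /bin/busybox and /bin/bash are treated as wrapper shells, and group_sessions is a hypothetical greedy way of applying the 0.25 threshold, since the text does not say how sessions were assigned to patterns.

```python
# Wrapper binaries whose first argument is the command actually being run
# (the paper names /bin/busybox and /bin/bash as examples).
SHELLS = {"/bin/busybox", "/bin/bash"}
THRESHOLD = 0.25  # normalized distance below which two sessions match


def encode_session(commands):
    """Encode a session as the first argument of each command.

    Lines are split on ';' but not on '&&' or '||'. Arguments are split on
    whitespace only, so shell quoting is ignored (a simplification).
    """
    encoded = []
    for line in commands:
        for cmd in line.split(";"):
            args = cmd.split()
            if not args:
                continue
            if args[0] in SHELLS and len(args) > 1:
                encoded.append(args[1])  # use the argument passed to the shell
            else:
                encoded.append(args[0])
    return encoded


def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by its upper bound, the longer list's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def same_pattern(a, b, threshold=THRESHOLD):
    return normalized_distance(a, b) < threshold


def group_sessions(sessions, threshold=THRESHOLD):
    """Hypothetical greedy grouping: each encoded session joins the first group
    whose first member is within the threshold, otherwise it starts a new group."""
    groups = []
    for s in sessions:
        for g in groups:
            if same_pattern(s, g[0], threshold):
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


print(encode_session(["cat /proc/mounts", "echo test"]))  # ['cat', 'echo']
```

Because the grouping is greedy, its result can depend on the order in which sessions are processed; comparing every session against one fixed reference session per pattern avoids that dependence.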
TABLE I
INITIAL COMMANDS FOR MOST COMMON ATTACK PATTERN
Fig. 8. Mapping between TOKEN_QUERY and FN_BINARY for the second pattern.
Fig. 7. Mapping between IP addresses and file hashes for the second most
common attack pattern. The first column is an empty file hash (no file was
downloaded).
It appears common for attackers to change the botnet connected to a loader device, resulting in the same IP attacking the honeypot with several different strains of the Mirai malware. In contrast, it is far rarer for different IP addresses to download the same malware to the honeypot. While many unique files are presumably minor updates to existing files, it is clear that many IPs are downloading highly distinct variants of Mirai. One IP address, for example, downloaded five unique files to the honeypot. Three of the five were signed in ASCII by one of two authors. One was packed with UPX, and two contain strings indicating an attack attempting to exploit CVE-2017-17215. Another IP downloaded 29 unique files, demonstrating similar variation in packing and attack methods. We surmise that attackers deploy different strains over time to expand their botnets further, or that different hackers take over the same loader device. We again note that 16 of the 125 unique IPs associated with the two most common attacks launched both the most common and the second most common attack pattern, indicating that the same IP addresses are used for entirely different botnet architectures.

An entirely different relationship between file hash and IP address is apparent in the second most common attack pattern. Unlike the first pattern, in which every session results in a file being downloaded to the system, file downloads frequently fail to be captured in the second attack. This can be seen
Fig. 9. Mapping between IP address and TOKEN_QUERY variable for the most common attack.
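The session selection of Section II-B and the IP-to-file-hash mappings discussed above (Fig. 7) can be approximated without Elasticsearch by reading Cowrie’s JSON-lines log directly. This Python sketch is a substitute for the Elasticsearch queries used in the paper, not a reproduction of them. It assumes each log line is a JSON object with eventid, session, and src_ip fields and that download events carry a shasum field; the function names are hypothetical.

```python
import json
from collections import defaultdict

DOWNLOAD = "cowrie.session.file_download"


def load_download_sessions(lines):
    """Group Cowrie JSON events by session id and keep only sessions that
    contain at least one file download (or file creation) event."""
    events = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:
            event = json.loads(line)
            events[event["session"]].append(event)
    return {
        sid: evs
        for sid, evs in events.items()
        if any(e.get("eventid") == DOWNLOAD for e in evs)
    }


def ip_to_hashes(sessions):
    """Map each source IP to the set of file hashes it delivered.
    A missing shasum is recorded as None."""
    mapping = defaultdict(set)
    for evs in sessions.values():
        for e in evs:
            if e.get("eventid") == DOWNLOAD:
                mapping[e["src_ip"]].add(e.get("shasum"))
    return dict(mapping)


def shared_ips(ips_a, ips_b):
    """IPs seen in both attack patterns (cf. the 16 of 125 IPs noted in the text)."""
    return set(ips_a) & set(ips_b)
```

In practice, lines could be an open file handle on Cowrie’s JSON log (the log path depends on the installation), and the per-pattern IP lists passed to shared_ips would come from labeling each selected session with its attack pattern.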
REFERENCES
[1] B. Krebs. (2016, September) KrebsOnSecurity hit with record DDoS. [Online]. Available: https://krebsonsecurity.com/2016/09/krebsonsecurity-hit-with-record-ddos/
[2] M. Nawrocki, M. Wählisch, T. C. Schmidt, C. Keil, and J. Schönfelder, “A survey on honeypot software and data analysis,” CoRR, vol. abs/1608.06249, 2016. [Online]. Available: http://arxiv.org/abs/1608.06249
[3] M. Oosterhof. Cowrie. [Online]. Available: https://github.com/cowrie/cowrie
[4] G. Wicherski, “Medium interaction honeypots,” in German Honeynet Project, 2006.
[5] D. Fraunholz, D. Krohmer, S. D. Anton, and H. Dieter Schotten, “Investigation of cyber crime conducted by abusing weak or default passwords with a medium interaction honeypot,” in 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber Security), June 2017, pp. 1–7.
[6] I. Koniaris, G. Papadimitriou, and P. Nicopolitidis, “Analysis and visualization of ssh attacks using honeypots,” in Eurocon 2013, July 2013, pp. 65–72.
[7] I. Vakilinia, S. Cheung, and S. Sengupta, “Sharing susceptible passwords as cyber threat intelligence feed,” in MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM). IEEE, 2018, pp. 1–6.