Paper 3
ABSTRACT
Tracking user activities inside an enterprise network has been a fundamental building block for today’s security infrastructure, as it provides accurate user profiling and helps security auditors make informed decisions based on insights derived from the abundant log data. Towards more accurate user tracking, we propose a novel paradigm named UTrack that leverages rich system-level audit logs. From a holistic perspective, we bridge the semantic gap between user accounts and real users, tracking a real user’s activities across different user accounts and different network hosts based on the causal relationships among processes. To achieve better scalability and a more salient view, we apply a variety of data reduction and compression techniques to process the large amount of data. We implement UTrack in a real enterprise environment consisting of 111 hosts, which generate more than 4 billion events in total during the one-month experiment. Through our evaluation, we demonstrate that UTrack is able to accurately identify the events that are relevant to user activities. Our data reduction and compression modules largely reduce the output data size, producing an overview of a user session profile that is both accurate and salient.

CCS CONCEPTS
• Security and privacy → Distributed systems security.

KEYWORDS
Audit Logs; Forensics Analysis; User Tracking

ACM Reference Format:
Yue Li, Zhenyu Wu, Haining Wang, Kun Sun, Zhichun Li, Kangkook Jee, Junghwan Rhee, and Haifeng Chen. 2021. UTrack: Enterprise User Tracking Based on OS-Level Audit Logs. In Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy (CODASPY ’21), April 26–28, 2021, Virtual Event, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3422337.3447831

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CODASPY ’21, April 26–28, 2021, Virtual Event, USA
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8143-7/21/04...$15.00
https://doi.org/10.1145/3422337.3447831

1 INTRODUCTION
Nowadays, cyber-attacks have become more sophisticated and stealthy. In an Advanced Persistent Threat (APT) attack, an attacker may lurk in the target network for more than half a year on average, escalating and maintaining the access privilege without being caught [40]. As a result, there is an increasing demand for user tracking inside an enterprise network, in order to improve visibility for network monitoring and help security analysts make informed decisions on the detection of insider attacks and targeted APT attacks. A recently enabled paradigm in the security industry, called User Behavior Analytics (UBA) [38, 39], is built upon this foundation. UBA covers a range of techniques that continuously monitor user activities and identify those that deviate from normal user sessions. While UBA is a rather broad concept that can be applied to many scenarios at different levels, granularities, and scopes, its fundamental building block is to accurately identify and model user activities. Capturing user activities with an inaccurate or incomplete view could result in incorrect detection or analysis.

Towards more accurate user modeling and verification, contemporary UBA approaches attempt to fuse data from different data sources to create a more comprehensive risk profile [33, 41]. Though they are useful in many scenarios [33, 39, 41], an inherent limitation is that they all lack a holistic view of systems, since data are collected from only a couple of security-sensitive applications, such as firewalls and proxies. Under such a setting, many meaningful events could be missed, not to mention the difficulties of correlating data with different syntax and semantics from a variety of sources. A natural approach would be to leverage log data at the operating system (OS) level, which can record data for all applications under homogeneous syntax and comprehensible semantics. Such an audit log system is widely deployed in many security infrastructures [22–24, 26, 27], mainly for forensics purposes. SLEUTH [18] and HOLMES [29] leverage system logs to identify APT attacks based on abstracted security-sensitive activities: “tags” for SLEUTH and “Tactics, Techniques and Procedures” (TTPs) for HOLMES.
In this paper, we present a novel user tracking system, named UTrack, which leverages the rich system log data to universally monitor user session activities. In addition to focusing on identifying, consolidating, and scrutinizing security-sensitive events [18, 29], UTrack is user-centric. UTrack does not pay extra attention to pre-defined “sensitive” reads or writes. Instead, the goal of UTrack is to present a user’s activity profile accurately and concisely, such that more domain-specific behavior can be audited. For example, an employee copying a large amount of digital assets from the company should be captured by UTrack.

We identify and tackle two major challenges. The first is to bridge the semantic gap between user accounts and human users in both in-host and cross-host scenarios. This is done by tracking causal relationships among processes through the user session root and correlating network events to identify network control channels. The second challenge is to address the “needle in a haystack” problem stemming from the huge volume of log data, which we handle through a variety of data reduction techniques. Unlike many previous works on log data reduction [25, 46] that aim at information-lossless reduction, our data pruning approach prunes data that may carry meaningful information but are out of the scope of user activity tracking.

We deploy UTrack in an enterprise network that comprises more than 100 hosts running either Windows or Linux operating systems with real users. The users are well aware of the setup. This is a common arrangement in many enterprises, where company devices are actively monitored. We manage to process log data from all the hosts on a single machine, and demonstrate that UTrack is able to accurately identify and concisely present the events that represent activities of a real user inside the network in a human-consumable fashion.

In summary, we make the following contributions.

(1) We develop a new universal user tracking mechanism (UTrack) based on OS-level audit logs. UTrack aims to bridge the semantic gap between human users and computer user accounts by identifying and associating system events that appear in different user accounts and different hosts but belong to a single user session.
(2) We apply effective data reduction methods on user session profiles to achieve a scalable and salient presentation. The reduction mainly involves detecting interactive processes and modeling common data patterns.
(3) We implement UTrack in a real enterprise environment, with data collected from more than 100 hosts. Our evaluation results show that UTrack is accurate and concise in presenting user activities. UTrack scales well with low resource consumption.

The rest of the paper is organized as follows. Section 2 describes the motivation and challenges of an OS-level log based universal user tracking system. Section 3 presents the system overview. Section 4 elaborates how UTrack tracks user sessions across different accounts and different hosts. Section 5 presents various techniques UTrack adopts to pinpoint relevant events. Section 6 details the implementation and evaluation of UTrack, and Section 7 discusses more use cases. Section 8 surveys related works, and finally, Section 9 concludes the paper.

2 MOTIVATIONS AND CHALLENGES

2.1 Motivations
Contemporary user behavior monitoring is mostly done on disparate applications and services. However, such a methodology has multiple drawbacks that limit the usability of the monitoring system. The first drawback is the lack of completeness. In these systems, only a small portion of user activities are recorded and analyzed, since the logs are only generated from applications that are usually perceived to be strongly security-relevant, for instance, a firewall, a web proxy, or a sensitive database service. All other user activities are not actively monitored. However, a successful attack, especially an APT attack, usually comprises many individual steps. The traces of each step may be buried in seemingly less interesting events that are not recorded by applications. By connecting these dots, one may detect an intrusion that cannot be identified by conventional user behavior analytics. In contemporary user tracking schemes, the auditor lacks this holistic view of the entire system.

The other limitation is the difficulty of correlating log data. Data collected from different services and applications may be of different formats, granularity, and semantic levels. Parsing and correlating data from different sources is very challenging. As a result, data from individual sources are independently handled and analyzed in many cases. Shashanka et al. [33] attempted to associate subjects from different data sources, such as different IP addresses and user accounts. However, the capability of such an association is limited to a small scope, where the subjects are tightly bounded. Therefore, the inspector lacks a view of the connections among critical pieces of the puzzle from all data sources.

System Opportunities: A universal user activity tracking system, which monitors activities of all users inside an enterprise network, is very useful to resolve or mitigate the aforementioned problems. However, recording all activities of individual users in the entire network may incur significant system overhead. To balance the trade-off between system overhead and data granularity, we leverage an OS-level log system to collect data from each host inside a network. The OS-level log system collects low-level system objects, such as processes, files, and network connections, which largely preserve the running states of a computer at a certain time. Thus, it can be used to accurately reconstruct the causality among objects with clean semantics. Meanwhile, the data volume is at a manageable level. Nowadays, many enterprises have deployed such a log system for forensics purposes [22–24, 26, 27].

2.2 Challenges
2.2.1 Accurate Modeling of User Behaviors. When processing audit logs, a user account is often considered equivalent to the user itself. This is mostly true in some high-level applications, such as Facebook and Twitter. However, the assumption no longer holds when it comes to low-level OS events.

Unlike application-specific logging that is clearly defined and has much higher semantic awareness, a generic OS-level log system monitors events with respect to individual user accounts. In an enterprise network, a user may have multiple user accounts, and a user account could be accessible by multiple users. For instance, a network administrator could access both their personal account and the root account on a web server. The web server may also
In general, there exists a semantic gap between user accounts
and human users. Relying solely on user accounts to track the
behavior of a user is unreliable, as it lacks proper linkage of user
account transitions and service account delegation. We recognize this
semantic gap and discard this intuitive but invalid assumption in
our user tracking system. To clearly set the boundary between the
two concepts, hereafter the term “user” always indicates a real
human user, while the term “account” always indicates a user (or
system) account in a computer system.
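
To make the missing linkage concrete, the following minimal sketch follows process lineage from a session root instead of grouping events by account. It is only an illustration under stated assumptions: the `(parent_pid, child_pid, account)` record layout, the function name, and the sample PIDs are hypothetical and are not UTrack's actual schema, and PID reuse is ignored.

```python
from collections import defaultdict, deque


def session_processes(events, session_root):
    """Return {pid: account} for every process descended from session_root.

    `events` is an iterable of (parent_pid, child_pid, account) process-creation
    records. This schema is an assumption made for illustration only.
    """
    children = defaultdict(list)
    account_of = {}
    for parent, child, account in events:
        children[parent].append(child)
        account_of[child] = account

    seen = {}
    queue = deque([session_root])
    while queue:  # breadth-first walk over the process tree
        pid = queue.popleft()
        for child in children[pid]:
            if child not in seen:
                seen[child] = account_of[child]
                queue.append(child)
    return seen


# Hypothetical records: one user logs in as "alice" and then switches to root.
events = [
    (100, 101, "alice"),  # login shell started by the session root (pid 100)
    (101, 102, "root"),   # account transition, e.g. via su
    (102, 103, "root"),   # command run under the root account
    (200, 201, "root"),   # unrelated daemon activity, also under root
]
profile = session_processes(events, 100)
print(profile)
```

Grouping by account alone would merge process 201 with the user's root activity, whereas following lineage keeps processes 102 and 103 in the session and leaves 201 out.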
Figure 6: File and IP Abstraction

5.4 Presentation Simplification
Unlike conventional backtracking or forward tracking of an attack incident, the session profile produced by UTrack describes a user session. Thus, the session profile becomes unavoidably larger and cannot be further compressed, since all data points carry meaningful information. To better visualize the data for user tracking, we use graphs to present all processes, files, and network connections in a user session. We visualize the session profile generated by UTrack using the dot language, and then apply different levels of simplification to the graph.

A fundamental challenge of presenting the session profile on a single graph is that the graph could be very large due to processes accessing a large number of files or network connections in a long session. To alleviate this issue, we aggregate similar files and network connections when visualizing the session profile graph. For instance, the activities of a process can be represented as in Figure 6. For the files, we find common prefixes of the file names and abstract them with the same prefix. The network connections are aggregated using the host name of the IP address. The details of the abstraction can be found in Section 6.5. Note that the essential difference between the presentation abstraction and the data reduction/compression techniques is that the actual data model is not changed in the presentation simplification process. In other words, the abstraction does not save any resources; it is only used to help security auditors get a better view of the data.

6 IMPLEMENTATION AND EVALUATION

6.1 Experiment Environment
We deploy UTrack on 111 hosts of a real enterprise environment: 21 Linux hosts and 90 Windows hosts. An agent is installed on each host to collect and report system events. UTrack itself is written in Java and contains 8.3K LoC. We evaluate the performance of UTrack based on one month of data. Within this period, more than 4 billion events are generated, of which 1.65 billion come from Windows hosts and 2.41 billion come from Linux hosts. To facilitate the use of historical data, we implement a data replayer to replay the data recorded and stored in the database with their original timestamps. With the assistance of the replayer, we are able to replay the one-month data within 30 hours.

6.2 User Tracking
In our one-month experiment, we identify 507 user sessions across 111 hosts. Note that the login screen itself is counted as a user session and excluded from our data. Among the 507 user sessions, only 61 are Linux sessions. One reason is that Linux users are less likely to log off or restart their computers than Windows users. Besides, there are 4 Linux hosts that do not have any user sessions, which means that they are used as servers and no one logs on to them through the Linux desktop environment. However, the activities on those servers may be correlated to user sessions on other hosts. On average, each user session lasts 4.6 days. We also observe that Linux sessions are significantly longer (9.1 days) than Windows sessions (3.9 days). More than 100 sessions last beyond the one-month period, so they are excluded when we compute the average session lifespans.

For cross-host tracking, we first identify the communication channels. We correlate network events from all hosts by matching 5-tuple attributes, which include local IP, remote IP, local port, remote port, and the network protocol. However, due to port or IP recycling, two network events might be wrongly matched. To avoid such a situation, we add a constraint that two matching events should happen within a small time window. This window should account for the possible errors caused by asynchronous clocks on different hosts and by network resource recycling. In our implementation, we set the time window to 60 seconds, and we recycle the unpaired events after this time window.

In our environment, the number of all ready-to-pair network events stabilizes at around 20,000 to 25,000. We observe that only around 12.3% of network events can eventually be paired, and most of the matched network events (82.4%) are localhost channels. This is reasonable because any communication to the outside world cannot be paired. Even internal communication may not be identified, since not all computers host an agent in our environment. Another case is broadcast network events, which have multiple receivers. When the server is working in the worker-pool mode, it may take a non-negligible time to determine the delegated worker, since it needs to go through a network channel matching process. If a worker is found, a virtual process will be created for the requester. However, before the virtual process is created, the network request may have already been partially or entirely handled, because most requests are handled very quickly. Thus, one should record the mapping between the virtual node and the actual node, and migrate the stand-out events to the virtual node once the delegation relation is established.

During the one-month experiment, we observe more than 186 programs that accept network connections, and the top 5 programs are listed in Table 1. The “Number of Instances” column shows the total number of request processing instances we observed. In our environment, since a server frequently runs “ssh” to localhost for system backup, we observe a large number of ssh events. We also find a Postgres database that constantly stores new data from network connections. An Apache server runs the default pre-fork Multi-Processing Module (MPM) to support a worker pool. The “User Instance” column indicates the instances that belong to a user
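
The cross-host channel matching described in this section (5-tuple matching constrained by a 60-second window, with unpaired events recycled afterwards) can be sketched roughly as follows. This is a hedged illustration, not UTrack's Java implementation: the `NetEvent` fields, the function names, and the sample addresses are assumptions; only the 5-tuple attributes and the 60-second window come from the text.

```python
from collections import namedtuple

# Hypothetical event record; field names are assumptions for illustration.
NetEvent = namedtuple(
    "NetEvent", "ts host local_ip local_port remote_ip remote_port proto"
)

WINDOW = 60.0  # seconds, the window size stated in the text


def channel_key(ev):
    """Direction-independent key: both ends of one channel map to the same key."""
    a = (ev.local_ip, ev.local_port)
    b = (ev.remote_ip, ev.remote_port)
    return (min(a, b), max(a, b), ev.proto)


def pair_events(events, window=WINDOW):
    """Pair events whose endpoints mirror each other within `window` seconds."""
    pending = {}  # channel key -> unpaired event still inside the window
    pairs = []
    for ev in sorted(events, key=lambda e: e.ts):
        # Recycle unpaired events that have fallen out of the time window.
        expired = [k for k, old in pending.items() if ev.ts - old.ts > window]
        for k in expired:
            del pending[k]
        key = channel_key(ev)
        other = pending.get(key)
        mirrored = other is not None and (
            (other.local_ip, other.local_port) == (ev.remote_ip, ev.remote_port)
        )
        if mirrored:
            pairs.append((other, ev))
            del pending[key]
        else:
            pending[key] = ev
    return pairs


events = [
    NetEvent(0.0, "h1", "10.0.0.1", 50000, "10.0.0.2", 22, "tcp"),
    NetEvent(0.5, "h2", "10.0.0.2", 22, "10.0.0.1", 50000, "tcp"),
    NetEvent(10.0, "h1", "10.0.0.1", 50001, "10.0.0.2", 22, "tcp"),
    NetEvent(100.0, "h2", "10.0.0.2", 22, "10.0.0.1", 50001, "tcp"),  # 90 s later
]
pairs = pair_events(events)
print(len(pairs))
```

In this sample, the first two events form a channel, while the last two share a 5-tuple but are 90 seconds apart, so the earlier one is recycled before its counterpart arrives. A real deployment would also need to tolerate clock skew between hosts, which this sketch folds into the single window parameter.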
Table 2: Classification Results

Figure 9: Example User Profile
Path1: C:/Users/X/appdata/local/microsoft/windows/temporaryInternetfiles/content.IE5
Path2: C:/Users/X/appdata/local/microsoft/outlook
Path3: C:/Users/X/appdata/local/TEMP
File1: C:/program files/common files/system/ado/msadox.dll

and the number of total files inside the process node. The case of a process without a number indicates that the files are completely modeled. Meaningful files are preserved as nodes on the graph. The timing information is not included in this figure due to the space limit. Besides, timing is not critical in understanding the figure. The number of abstracted branches is shown in brackets on edges.

From the figure, we can easily find the user’s activities inside the enterprise network. The user logs in to the system on a Windows host and the session lasts for six hours. The session spans two hosts through interactive ssh connections using putty. On the Windows host, the user browses the Internet via Chrome and uses Outlook for emailing. The user then logs on to a remote Linux host to edit and run a Python program “X.py” (the file name is anonymized), which further runs the “ls” program many times. In general, a graph-based session profile presentation can be easily understood by a human auditor, and provides important insights into the activities of a user.

7 USE CASES
Many more UBA features can be directly applied to UTrack for anomaly detection. For example, one can audit the roles (user accounts) that a user has been playing in the network from the user profiles and identify higher-level inconsistencies. For instance, one cannot be both “Alice” and “Bob” in the same session profile. Besides providing a foundation for UBA systems, there are many other use cases that can be built on top of UTrack. For example, it can be used in forensics analysis to study the behavior of attackers (such that the attacker becomes the POI) and reveal more seemingly benign

8 RELATED WORKS
User Tracking and UBA User or user activity tracking has been extensively studied in different contexts and various techniques have been proposed. One typical scenario is web user tracking through different measures [1, 3, 28]. User behavior tracking for security purposes drives UBA, where user accounts are no longer the single indicator of who performed an incident. Nowadays, many security companies have announced UBA tool integration or plan to develop UBA in their systems [4, 19, 20, 35, 42].

UBA consists of two steps. The first is to model normal user behaviors, and the second is to detect abnormal users by examining how far they deviate from normal users. Many metrics, algorithms, or machine learning models can be used to identify an abnormal user [33, 35]. Contemporary UBA mostly models users based on basic patterns or statistics, such as the total upload bytes and total download bytes of a user [33]. However, to detect more sophisticated attacks, it is vital to ensure high accuracy and descriptiveness of user activities.

Log Audit Log audit has been used in many fields of security research, such as forensics analysis [22, 23, 27], intrusion recovery [14, 21], and intrusion detection [10]. One of the most widely adopted log levels is the OS level, where the basic units are processes, files, sockets, etc. The reason is that the OS level maintains high fidelity of the states of the entire system while incurring acceptable CPU and storage overhead [22]. There are previous works focusing on the reduction of the storage overhead while not losing much information [25, 46]. Besides, there are also previous works that attempt to increase data granularity based on OS-level logs [24, 26].

One important use of log audit is to understand an attack, especially more sophisticated attacks (APT attacks) or unknown attacks. Security experts rely on the logs to determine how an attack happens [22, 24, 27], as well as its impact on the system [23]. They capture the causal relationships among processes, files, or sockets, and reconstruct the provenance of an attack and its ramifications. HERCULE [30] leverages community discovery algorithms to identify attacks based on the fact that the attack activities belong to the same community in a graph. Bates et al. [6] log events at the proxy and focus on parsing traffic from application protocols like SQL.

User Interaction Detection Distinguishing bot-generated data (system-triggered) from human-generated data (user-triggered) is a long-studied subject that has applications in many fields. Generally, there are two types of detection. One is active detection, such as CAPTCHA [44], which is easy to implement, (arguably) more accurate, but intrusive. The other type is passive detection, which relies on processing log events to detect abnormal behaviors. Related previous works include detecting game cheaters through Human Observational Proof [13], bots in online social networks [7–9, 43], detecting malicious web bots/crawlers, Google reCaptcha [34], and malicious crawler detection [45]. There are some significant differences between these techniques and ours. A
major one is that they have specially tailored data input. For example, user agent, cookie lifetime in Google’s reCaptcha [34], a user account favored access log system in [13, 45], or side information such as social graph [7, 9, 13, 43].

9 CONCLUSION
This paper presents UTrack, a novel user tracking system that connects events under different user accounts and from different hosts to form a novel holistic user session profile. UTrack enables a system auditor to easily find out the activities of users inside enterprise networks. UTrack associates the activities of a user effectively by identifying a session root and then following both the local process lineage and the network control flow of the session root. To achieve scalability and salient description, UTrack employs an interaction detection module to sift out the most relevant events that result from users’ interactions, and models common file and activity patterns. Our evaluation in a real enterprise environment of 111 hosts shows UTrack’s effectiveness on producing accurate and concise user session profiles for system auditors to use.

REFERENCES
[1] Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, and Claudia Diaz. 2014. The web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of the 2014 ACM CCS.
[2] Animashree Anandkumar, Chatschik Bisdikian, and Dakshi Agrawal. 2008. Tracking in a spaghetti bowl: monitoring transactions using footprints. In ACM SIGMETRICS Performance Evaluation Review, Vol. 36. 133–144.
[3] Richard Atterer, Monika Wnuk, and Albrecht Schmidt. 2006. Knowing the user’s every move: user activity tracking for website usability evaluation and implicit interaction. In WWW.
[4] BALABIT. 2015. Privileged Account Analytics - User Behavior Analytics Security Solution. https://www.balabit.com/privileged-account-analytics.
[5] Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. 2004. Using Magpie for Request Extraction and Workload Modelling. In USENIX OSDI.
[6] Adam Bates, Wajih Ul Hassan, Kevin Butler, Alin Dobra, Bradley Reaves, Patrick Cable, Thomas Moyer, and Nabil Schear. 2017. Transparent Web Service Auditing via Network Provenance Functions. In Proceedings of the 26th International Conference on World Wide Web. 887–895.
[7] Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 2012. Aiding the detection of fake accounts in large scale social online services. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. 15–15.
[8] Zi Chu, Steven Gianvecchio, Haining Wang, and Sushil Jajodia. 2010. Who is tweeting on Twitter: human, bot, or cyborg?. In Proceedings of the 26th ACM Annual Computer Security Applications Conference. 21–30.
[9] George Danezis and Prateek Mittal. 2009. SybilInfer: Detecting Sybil Nodes using Social Networks. In NDSS. San Diego, CA.
[10] Dorothy E Denning. 1987. An intrusion-detection model. IEEE Transactions on software engineering 2 (1987), 222–232.
[11] David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, and Peter M Chen. 2014. Eidetic Systems. In USENIX OSDI. 525–540.
[12] Steven Gianvecchio and Haining Wang. 2007. Detecting covert timing channels: an entropy-based approach. In Proceedings of the 14th ACM Conference on Computer and Communications Security. 307–316.
[13] Steven Gianvecchio, Zhenyu Wu, Mengjun Xie, and Haining Wang. 2009. Battle of botcraft: fighting bots in online games with human observational proofs. In Proceedings of the 16th ACM Conference on Computer and Communications Security. 256–268.
[14] Ashvin Goel, Kenneth Po, Kamran Farhadi, Zheng Li, and Eyal De Lara. 2005. The taser intrusion recovery system. In ACM SOSP. 163–176.
[15] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter (2009).
[16] Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In ACM SIGMOD Record, Vol. 29. 1–12.
[17] Wajih Ul Hassan, Mark Lemay, Nuraini Aguse, Adam Bates, and Thomas Moyer. 2018. Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs. In NDSS.
[18] Md Nahid Hossain, Sadegh M Milajerdi, Junao Wang, Birhanu Eshete, Rigel Gjomemo, R Sekar, Scott Stoller, and VN Venkatakrishnan. 2017. SLEUTH: Real-time attack scenario reconstruction from COTS audit data. In 26th USENIX Security Symposium. 487–504.
[19] IBM. 2016. IBM QRadar User Behavior Analytics. https://www.ibm.com/cz-en/marketplace/qradar-user-behavior-analytics.
[20] Johna Till Johnsons. 2015. User behavioral analytics tools can thwart security attacks. http://searchsecurity.techtarget.com/feature/User-behavioral-analytics-tools-can-thwart-security-attacks.
[21] Taesoo Kim, Xi Wang, Nickolai Zeldovich, and M Frans Kaashoek. 2010. Intrusion Recovery Using Selective Re-execution. In USENIX OSDI. 89–104.
[22] Samuel T King and Peter M Chen. 2003. Backtracking intrusions. ACM SOSP (2003), 223–236.
[23] Samuel T King, Zhuoqing Morley Mao, Dominic G Lucchetti, and Peter M Chen. 2005. Enriching Intrusion Alerts Through Multi-Host Causality. In NDSS.
[24] Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu. 2013. High Accuracy Attack Provenance via Binary-based Execution Partition. In NDSS.
[25] Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu. 2013. LogGC: garbage collecting audit log. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security. 1005–1016.
[26] Shiqing Ma, Kyu Hyung Lee, Chung Hwan Kim, Junghwan Rhee, Xiangyu Zhang, and Dongyan Xu. 2015. Accurate, low cost and instrumentation-free security audit logging for windows. In ACM ACSAC. 401–410.
[27] Shiqing Ma, Xiangyu Zhang, and Dongyan Xu. 2016. ProTracer: towards practical provenance tracing by alternating between logging and tainting. In Proceedings of NDSS, Vol. 16.
[28] Jonathan R Mayer and John C Mitchell. 2012. Third-party web tracking: Policy and technology. In IEEE Symposium on Security and Privacy 2012. 413–427.
[29] Sadegh M Milajerdi, Rigel Gjomemo, Birhanu Eshete, R Sekar, and VN Venkatakrishnan. 2019. Holmes: real-time apt detection through correlation of suspicious information flows. In 2019 IEEE Symposium on Security and Privacy. 1137–1152.
[30] Kexin Pei, Zhongshu Gu, Brendan Saltaformaggio, Shiqing Ma, Fei Wang, Zhiwei Zhang, Luo Si, Xiangyu Zhang, and Dongyan Xu. 2016. Hercule: Attack story reconstruction via community discovery on correlated log graph. In Proceedings of the 32nd Annual Conference on Computer Security Applications. ACM, 583–595.
[31] Patrick Reynolds, Janet L Wiener, Jeffrey C Mogul, Marcos K Aguilera, and Amin Vahdat. 2006. WAP5: black-box performance debugging for wide-area systems. In Proceedings of the 15th International Conference on World Wide Web. 347–356.
[32] Bo Sang, Jianfeng Zhan, Gang Lu, Haining Wang, Dongyan Xu, Lei Wang, Zhihong Zhang, and Zhen Jia. 2012. Precise, scalable, and online request tracing for multitier services of black boxes. IEEE Transactions on Parallel and Distributed Systems 23, 6 (2012), 1159–1167.
[33] Madhu Shashanka, Min-Yi Shen, and Jisheng Wang. 2016. User and entity behavior analytics for enterprise security. In 2016 IEEE Big Data. 1867–1874.
[34] Suphannee Sivakorn, Jason Polakis, and Angelos D Keromytis. 2016. I’m not a human: Breaking the Google reCAPTCHA. Black Hat,(i) (2016), 1–12.
[35] Splunk. 2015. Splunk User Behavior Analytics. https://www.splunk.com/en_us/products/premium-solutions/user-behavior-analytics.html.
[36] Byung-Chul Tak, Chunqiang Tang, Chun Zhang, Sriram Govindan, Bhuvan Urgaonkar, and Rong N Chang. 2009. vPath: Precise Discovery of Request Processing Paths from Black-Box Observations of Thread and Network Activities. In USENIX ATC.
[37] Eno Thereska, Brandon Salmon, John Strunk, Matthew Wachs, Michael Abd-El-Malek, Julio Lopez, and Gregory R Ganger. 2006. Stardust: tracking activity in a distributed storage system. In ACM SIGMETRICS Performance Evaluation Review, Vol. 34. 3–14.
[38] Mike Tierney. 2015. The Rise of User Behavior Analytics. http://www.veriato.com/company/blog/veriato-blog/2015/12/15/the-rise-of-user-behavior-analytics.
[39] Roy Hodgman Tod Beardsley. 2015. RAPID 7 Research Report: Understanding User Behavior Analytics.
[40] Trustwave. 2015. Trustwave global security report. https://www2.trustwave.com/rs/815-RFM-693/images/2015_TrustwaveGlobalSecurityReport.pdf.
[41] Melissa Turcotte and Juston Shane Moore. 2017. Technical Report LA-UR-17-21663: User Behavior Analytics.
[42] VARONIS. 2016. User Behavior Analytics. https://www.varonis.com/user-behavior-analytics/.
[43] Bimal Viswanath, Ansley Post, Krishna P Gummadi, and Alan Mislove. 2010. An analysis of social network-based sybil defenses. ACM SIGCOMM Computer Communication Review 40, 4 (2010), 363–374.
[44] Luis Von Ahn, Benjamin Maurer, Colin McMillen, David Abraham, and Manuel Blum. 2008. recaptcha: Human-based character recognition via web security measures. Science 321, 5895 (2008), 1465–1468.
[45] Shengye Wan, Yue Li, and Kun Sun. 2017. Protecting Web Contents against Persistent Distributed Crawlers. In IEEE ICC.
[46] Zhang Xu, Zhenyu Wu, Zhichun Li, Kangkook Jee, Junghwan Rhee, Xusheng Xiao, Fengyuan Xu, Haining Wang, and Guofei Jiang. 2016. High fidelity data reduction for big data security dependency analyses. In ACM CCS.