Detecting Data Exfiltration by Integrating Information Across Layers

Puneet Sharma, Anupam Joshi and Tim Finin

Computer Science and Electrical Engineering
University of Maryland, Baltimore County
{tc56339, joshi, finin}@umbc.edu
IEEE IRI 2013, August 14-16, 2013, San Francisco, California, USA
978-1-4799-1050-2/13/$31.00 ©2013 IEEE

Abstract

Data exfiltration is the unauthorized leakage of confidential data from a system. Unlike intrusions that seek to overtly disable or damage a system, it is particularly hard to detect because it uses a variety of low/slow vectors and advanced persistent threats (APTs). It is often assisted (intentionally or not) by an insider who might be an employee who downloads a trojan or uses a hardware component that has been tampered with or acquired from an unreliable source. Conventional scan and test based detection approaches work poorly, especially for hardware with embedded trojans. We describe a framework to detect potential exfiltration events that actively monitors a set of key parameters that cover the entire stack, from hardware to the application layer. An attack alert is generated only if several monitors detect suspicious activity within a short temporal window. The cross-layer monitoring and integration helps ensure accurate alerts with fewer false positives and makes designing a successful attack more difficult.

1. Introduction

Data exfiltration is a key goal of many of the more sophisticated attacks today. It is typically engaged in by state actors and transnational crime syndicates and uses a variety of advanced persistent threats and low/slow vectors based on novel (zero day) exploits. Such attacks are much harder to detect than those seeking to bring down a system or deny access to it. Concerns have increased that such attacks can use trojans embedded in commodity hardware that is manufactured in a global supply chain with limited control.

Consider a scenario in which an employee uses an infected USB flash drive on a machine connected to his organization’s network. The USB carries malware that automatically runs on insertion as a background process that gains root privileges by exploiting vulnerabilities in popular software installed on the computer. The malicious process then hides behind a legitimate one via code injection and evades detection. Once root privileges are gained, the attack designer has several options. One is to install the payload on the host followed by opening a remote shell on the attacker’s machine. Another is to reduce its footprint by not installing any payload, but adding functionality that enables it to scan the victim’s machine on its own to discover the information sought and relay it back to a remote machine before removing traces of the attack and deleting itself. Many incidents have been reported (e.g., [20, 21]) in which data exfiltration took place in a manner similar to this scenario. There is concern that this is not limited to USB sticks, but can be done using compromised components and their firmware (e.g., network cards, disks, ...).

Our scenario and many reported incidents have several common features: they are triggered by the use of a new hardware device; most are designed to communicate with a remote system after the host has been compromised; and a trusted insider is an unwitting contributor to the infiltration by providing the attacker with initial access to the host.

Ensuring tamper-free hardware has become extremely difficult due to the increased use of international supply chains by vendors of commodity hardware components. Most have supply chains with components coming from several countries, assembled in others, and re-branded and marketed in dozens more. While this has lowered costs, it has also made implementing comprehensive security checks much more difficult. Since the security infrastructure now has to be deployed over a much wider scale and over multiple organizational jurisdictions, it has become relatively easy for attackers to sabotage a small portion of the supply chain and tamper with the final product with minimal chance of being detected.

Since most of the current hardware testing tests only for cases where the hardware is supposed to perform a
list of operations producing expected outputs, it fails to take into consideration cases where the hardware may be tampered with so that it not only passes all the tests, but also has a malicious circuit which executes additional sabotaging functionality on top of the expected activities [25]. Since most intrusion detection and prevention software tries to protect its users by actively monitoring inbound data from the network or by looking for known attack signatures, very few of these products can detect the aforementioned attack scenarios.

We describe a novel detection system that monitors a set of system and network level features of a host system and flags alerts based on temporally-related anomalous behavior detected in multiple monitored modules. It is well known that building a behavioral model of the system under normal usage and detecting deviations from this model when under an attack can provide strong hints of an attack [3, 7, 26]. The individual alerts produced by each module are then expressed as Resource Description Framework (RDF) assertions. These assertions, when processed by semantic rules, produce highly effective intrusion alerts that have a low false positive rate.

2. Related work

Fisk et al. [2] propose a global vault to prevent unauthorized data breaches by separating the employee machines from the ones that contain sensitive information. They implement this strict isolation between the user machines and the servers by placing limits such as a whitelist of allowed inter-machine processes and a maximum allowed bandwidth. This is an impractical approach when applied to large organizations as it puts stringent conditions on what a user can or cannot do.

Liu et al. [9] describe a framework to actively monitor and react in cases of intrusions and their possible detection. Their proposed intrusion detection engine is placed at the network edge, scans outbound traffic, and decides if it should forward the data to the outside node or not. The main drawback in the system is the live monitoring and intrusion prevention approach that must mine a large amount of data and decide whether or not to forward it without affecting outbound bandwidth speeds. For even a medium sized corporation, a single module deployed at the lone egress point of a corporate network would require tremendous processing power to monitor and analyze each outgoing packet at runtime.

Ramachandran et al. [23] claim that their behavior-based model can catch most network data exfiltration scenarios. They first learn the normal behavior of a system by using kernel density estimation methods on system features like memory consumption, CPU utilization and disk usage. The base values are used to derive correlation coefficients on test data which came from real attacks. This approach analyzes overall system features like the total memory and total CPU consumption level. A drawback of this detection technique is that it can be easily evaded by trojans whose memory and CPU footprint is too small to cause a significant deviation in the overall numbers for the host machine when observed as a whole. The false positive rate of their system is not documented.

There has also been some work building on the philosophy of using multiple sensing modules to detect attacks. However, these multiple sensors are typically all at the same level of the stack: just the host, or just the network. Such a narrow feature set can reduce the accuracy of alerts produced by increasing the false positive rates of the intrusion detection system. Kerschbaum et al. [5] discuss the use of multiple sensors embedded into the operating systems, but only describe in detail sensors that specifically pertain to network based attacks.

Process profiling is proposed by Okazaki et al. [18], who derive a normal usage pattern based on system call sequences and compare this to the profile of a system under an attack. A similar approach based on a system calls profile is proposed by Eskin et al. [1]. Various machine learning approaches have been applied to selected system feature sets to classify attacks with good results, starting from the seminal work of Forrest et al. [3] and Lee et al. [7]. Undercoffer [26] created a model of a running system under normal usage, and then used that model to detect attacks in the future using machine learning algorithms. Mathews et al. [10] also took a machine learning approach in identifying a network-based feature set which was able to produce good classification results in identifying malicious network data.

3. System Design

We describe a prototype intrusion detection system (IDS) that is highly modular and has in place multiple sensing modules across multiple layers of the system. Each alert from the individual monitoring sensor is represented as a set of RDF assertions. Producing RDF assertions allows our system to fit into a larger semantic integration and reasoning framework being developed in our laboratory [10, 4] that uses traditional and non-traditional sensors to form a collaborative approach to cybersecurity. The assertions from our system can be integrated with other information and the results augmented using various reasoners, including description-logic theorem provers and rule-based systems.

In this paper, we focus on a simple reasoning approach in which a collection of alerts denoting the quick succession of anomalous events across multiple layers indicates that an attack may have allowed data to be exfiltrated from a victim’s machine. These multiple modules sensing different system parameters in tandem are crucial in reducing false positives, as a large number of system parameters behaving in an anomalous manner is most likely to be a strong indicator of an attack.

To build a normal profile of the system we ran the profiling module for a substantial amount of time, logging the measured parameters such as memory patterns, DLLs called, and network usage. Once this profile is built, the data is stored and serves as a model of what our IDS defines as “normal behavior”. Each sensor module is then run independently to monitor specific system level parameters. The monitoring module for that sensor continuously compares live data with the normal-behavior profile. If the module catches a deviation from the norm, it produces an alert that denotes an abnormality for that particular system parameter. Next, an RDF generator module runs on top of all the sensing modules, takes as input the alerts produced by the earlier sensors, and creates a graph of RDF assertions. These RDF assertions describe the situation using ontologies developed by our research group [27, 28, 4]. Figure 1 shows an overview of our system.

Figure 1. Our prototype system detects anomalous behavior at different layers and encodes the events as RDF assertions, which are integrated and reasoned over to recognize potential exfiltration events.

In the remainder of this section we describe some of the modules we have implemented and that are used in the example exfiltration scenario.

3.1 New hardware detection

This module produces an alert each time a new hardware device is inserted in the system. We maintain a host-based data resource of identifiers of devices that have been seen with the host to detect ones that are inserted for the first time. A sample rule that takes advantage of this classification could raise a stronger alert when a completely new USB flash drive is inserted into the system than when a frequently seen one is inserted. We use the hardware device’s UUID values as unique identifiers representing the device.

3.2 Memory usage by a process

As soon as an attacker is able to gain access to a victim’s system, the immediate next task in most attacks is to hide the malicious process from the user. A commonly used method to do this is code injection into an existing running process. In doing so, the memory usage of the process will likely change [26]. Based on this intuition we monitor the heap, stack and private data sections of a list of profiled processes and log alerts if the memory footprint of any memory type deviates significantly from the mean observed value.

3.3 Network data

Monitoring network data can also produce patterns that provide indicators of exfiltration attacks. A sudden burst of outgoing data or communication with a never-before-seen IP address, especially one that is in the DHCP range of an ISP, can be a good hint of data being exfiltrated. While monitoring IP addresses to detect communication with a new IP address is straightforward, detecting “bursts” of outgoing data is more complex.

The primary difficulty arises in defining a “burst of data” compared to normal variability in traffic. Second, we must keep track of all deviations and their occurrence patterns in order to avoid extreme results that could lead to an unacceptable number of false positives (false alarms) and/or false negatives (missed attacks). Our model of an abnormal “burst of data” is derived from an analysis of data collected from multiple TCP sessions of every outbound communication from the host over a significant period. Our decision to
analyze TCP sessions is based on earlier work done in our group [10] that had shown good results in detecting malicious network traffic by analyzing TCP sessions on inbound data.

We use two features to model the outbound network flow characteristics: the mean inter-departure packet time and the number of packets in a single TCP session. The first feature denotes the rate at which packets leave the host and the second denotes the sheer quantity of outbound data. Taken together, these two features give a good picture of how much data is leaving a system in a short span of time. Both the packet rate and the packet count are expected to be high when an attacker infiltrates a victim and tries to maximize his information theft by exfiltrating the data as quickly as possible.

3.4 Dynamic link libraries

We profile the list of dynamic link library (DLL) calls a process makes during its normal execution. It is a fair assumption that for an extensively profiled process, one can gather a finite list of all DLL files that the process typically opens for its regular use. A process making a DLL call that is not among its normal set may indicate that it has been compromised, and an alert is generated.

3.5 Registry keys

Similar to the list of DLLs, we maintain a list of all registry keys a Windows process usually accesses. Any new registry key being accessed is another indicator on our list that gets flagged as a sign of a process possibly executing maliciously. A trojaned process can have multiple reasons to access registry keys it has never accessed before. A simple process like notepad, for example, should not have to access a network configuration registry entry. If it does, there is a high probability that a malicious process pretending to be notepad is accessing network information in order to connect to a remote server.

3.6 System calls

Past work [18, 1, 8] has shown that system call monitoring can produce good indicators of an attack. One of the process characteristics that we monitor to detect any deviations from the norm is the system calls being made by that process. We assume that a trojan hiding underneath an existing process is likely to call a distinct set of system calls which, if monitored, can be used to raise an alert. We use a fairly simple approach, essentially only looking at the number of system calls made, not their pattern.

4. Profile building and live monitoring

Most of our development and testing was on systems running the Windows operating system due to the high number of publicly available attacks specifically targeted towards these. Our profiling and process monitoring module is currently limited to Windows-based processes, though similar routines can be easily written (and in some cases already exist) for Linux. We successfully profiled and monitored a list of nine common Windows processes: calc.exe, conhost.exe, explorer.exe, firefox.exe, msinfo32.exe, mspaint.exe, notepad.exe, powershell.exe and wmplayer.exe.

The decision to select these nine was based on three factors. First, we wanted a list of processes that are either pre-installed in a standard configuration or are part of very popular software packages. Second, we wanted a wide range of processes in terms of their memory consumption pattern to avoid biasing our results. The final selection criterion was the amount of user interaction each of the monitored processes witnesses in its lifetime. We wanted a broad variety of processes, ranging from background processes such as explorer.exe or conhost.exe that do not involve user interaction to processes like firefox.exe and wmplayer.exe that do.

We ran the profiling module for three to four days with intermittent use of each process to produce average values of memory consumed by the heap, stack and private data sections. We also calculated the standard deviation of these three respective mean values for each process. Once the profile was built, we monitored these processes live and raised alerts if the memory consumption for any of the three memory types went over three standard deviations of the averaged value.

We implemented a simplistic non-statistical approach to profile the list of DLLs, Windows registry keys and system calls that the processes called under normal usage. During the profiling phase of these processes, a whitelist of all DLLs, system calls and registry keys was prepared, which was essentially a list of all calls the processes made under normal use. If any DLL, system call or registry key outside this whitelist is accessed, an alert is raised.

For our networking module we used the libpcap [15] libraries to implement packet sniffing for all outbound traffic. The splitcap [14] tool was used to extract TCP session based information from the network packets being monitored. The system was run for a few days and all IP addresses that the host communicated with were logged. This list of IP addresses served as a whitelist of all destinations that were deemed safe to communicate with. A network packet sent to any IP
address outside this list would raise an alert. Packet sniffing sessions were initiated on five machines in our lab used by multiple users who had volunteered. The data collected from these volunteers was aggregated to produce overall network flow characteristics: an average value of the inter-packet departure time per TCP session and the average number of packets sent in a single TCP session.

The hardware monitoring module had a simple implementation. All connected hardware devices were profiled using their manufacturer UUID as their identification number and alerts were raised for any new hardware introduced in the system. In the case of USB flash drives, the produced alerts also recorded whether the USB device had been seen in the past or not. This allows highly flexible rules to run over our RDF assertions, such as a sample rule that raises no alert if the USB drive inserted in the system had been frequently used in the past. This approach can be extended to other devices, for instance by logging the MAC address of a network card or a disk serial number.

5. Testing our system

We used the Metasploit [17] open source penetration-testing framework to create and apply attacks in order to test our intrusion detection system. Within Metasploit, we extensively used the social engineering toolkit (SET) [22]. Social engineering based attacks are among the most common forms used today for data exfiltration. SET is popular, with over two million downloads, for two reasons: (1) it offers a large number of easy to run attacks that do not require much experience or background knowledge, and (2) it is tightly integrated with Metasploit, allowing pen-testers and white hat hackers to develop custom exploits by combining SET based attack options with custom payloads. The list of past attacks that used social engineering to infiltrate their victims includes highly sophisticated APTs like Stuxnet [6], which was spread using USB drives, and the Aurora attack on Google [19], which is believed to have been initiated by sending malicious URLs to Google employees. The social engineering toolkit under Metasploit allows us to test our system against similar attacks that can be launched by using malicious hardware to directly transfer the trojan payloads on to a known system.

We ran the following five attacks available in Metasploit’s SET:
1. PowerShell attack using shellcode injection
2. Metasploit executable
3. Applet based attack
4. Remote Administration using HTTP tunneling (RATTE)
5. Tab nabbing attack

Once the victim’s machine was successfully compromised and complete access gained, we tried to mimic a real attack resulting in data exfiltration. The first step was to hide our malicious process behind an existing one using code injection. We then downloaded files from the victim’s machine, took screen shots of the victim’s screen, and captured key strokes. We also executed remote processes and extracted network configuration information from the victim.

We ran the same set of attacks against six different commercially available security software systems. These covered traditional anti-virus systems, firewalls and pure intrusion detection systems. The list included Microsoft forefront endpoint, Spyware terminator, Windows defender, Snort, AVG and Comodo firewall.

6. Results

Every time a new USB flash drive was inserted, our hardware monitoring module was able to produce an alert with the additional information of whether the flash drive had been seen before or not. Results from the memory monitoring module (Table 1) show that all three memory types can potentially be good features to be monitored to detect an attack. For the nine sample processes, however, heap and stack turned out to be less accurate indicators when compared to the private data memory type.

We observed that for most of the profiled processes, the private data memory type witnessed a significant jump whenever we tried to hide our malware behind a particular process using code injection. The three processes for which the jump was about one standard deviation or less were Microsoft paint (mspaint.exe), Windows media player (wmplayer.exe), and Firefox (firefox.exe). This was largely due to these processes having a highly variable memory consumption pattern dependent on their usage, which leads to a high standard deviation value. Firefox, for example, can start as a small process with a memory footprint of a few hundred kilobytes, but can reach a value more than ten times that due to heavy graphic content of the websites being viewed or simply by the number of concurrent tabs opened by the user. In the case of Windows media player, memory usage surged while the player streamed high definition videos when
Process          Priv data   Stack     Heap
calc.exe         554σ        11.14σ    3.72σ
conhost.exe      1964σ       32σ       428σ
explorer.exe     30.8σ       0.96σ     2.32σ
firefox.exe      0.47σ       2.1σ      15.6σ
msinfo.exe       31σ         0.047σ    0.89σ
mspaint.exe      1.08σ       0.38σ     0.24σ
notepad.exe      42.58σ      0.01σ     2σ
powershell.exe   1972σ       21σ       15.9σ
wmplayer.exe     0.65σ       0.9σ      0.82σ
Table 1. Memory deviations for attacked processes

Process          DLL   Registry   System call
calc.exe         17    31         4
conhost.exe      27    233        3
explorer.exe     22    34         3
firefox.exe      5     40         0
msinfo.exe       21    45         0
mspaint.exe      14    280        0
notepad.exe      16    31         0
powershell.exe   34    310        0
wmplayer.exe     84    2175       9
Table 2. The number of new DLL calls, registry keys accessed and system calls are indicators of compromised processes.
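
The entries in Table 1 are multiples of each process's profiled standard deviation, and the counts in Table 2 are accesses that fall outside a profiled whitelist. Both checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names MemoryProfile, build_profile, sigma_deviation, memory_alert and new_accesses are hypothetical, the sample numbers are invented, and measuring the deviation as an absolute value is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class MemoryProfile:
    """Mean and standard deviation of one memory type for one process."""
    mean: float
    std: float


def build_profile(samples):
    """Summarize profiling-phase samples (for example, private data size)."""
    return MemoryProfile(mean=mean(samples), std=stdev(samples))


def sigma_deviation(profile, observed):
    """Return how many standard deviations `observed` lies from the mean.

    Using the absolute difference is an assumption of this sketch.
    """
    if profile.std == 0:
        return 0.0 if observed == profile.mean else float("inf")
    return abs(observed - profile.mean) / profile.std


def memory_alert(profile, observed, threshold=3.0):
    """Alert when the deviation exceeds the threshold (3 sigma in the text)."""
    return sigma_deviation(profile, observed) > threshold


def new_accesses(whitelist, observed):
    """Return DLLs, registry keys or system calls not seen while profiling."""
    return set(observed) - set(whitelist)


# Illustrative values only; they are not measurements from the paper.
profile = build_profile([100.0, 102.0, 98.0, 101.0, 99.0])
quiet = memory_alert(profile, 101.0)       # within normal variation
injected = memory_alert(profile, 150.0)    # far outside the profile
unknown = new_accesses({"kernel32.dll", "user32.dll"},
                       ["kernel32.dll", "ws2_32.dll"])
```

A process with highly variable memory use has a large `std`, so even a sizeable jump yields a small multiple of sigma; that is consistent with the low private data values reported for firefox.exe and wmplayer.exe.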
compared to simple music playing or image viewing operations.

Although the memory monitoring module was ineffective for a small set of our profiled processes, it worked extremely well for processes that have a low memory footprint and run in the background without much user interaction. These background processes are generally the first choice for most attackers. The Explorer process, for example, is one of the most popular choices for hiding malware and is recommended in many hacking tutorials on the Web [12, 11, 16, 13]. Processes like Firefox and Wmplayer are poor choices because they have short lifetimes, since they are often killed by users after their use.

Table 2 shows that during an attack, affected processes accessed a number of new DLLs and registry keys that they had never accessed during normal operation. Our alert sensitivity for this sub-module was such that a single new DLL or registry key access produced an alert. While monitoring system calls, however, we observed that six out of nine processes did not show any new system call being accessed during an attack. We believe that this is largely due to the simplistic model of system call access that we used, as prior work has shown that detecting complex usage patterns of system calls will detect subverted processes [3].

For the networking module, the network data before and after an attack did not vary by a scale of tens or hundreds, as was the case with the memory monitoring module. Therefore, we needed to come up with an allowed standard deviation number to use as a benchmark when differentiating between normal network data flow and data exfiltration due to an attack. After running our module over test data and analyzing the results, we selected +4σ as the maximum allowed deviation for the number of packets sent in a TCP session. The mean inter-departure time of packets is more varied in terms of its sample set, with a large standard deviation. After more analysis of the sample test data, we chose 0.01×γ, where γ is the mean inter-departure time per packet per TCP session, as the minimum allowable value for a TCP session to not be suspected of belonging to an attack.

We monitored 1154 TCP sessions, out of which twelve were part of the illegitimate intrusions leading to data exfiltration. Since multiple TCP sessions are often created for a short, one-time communication between two nodes in a network, the number of actual attacks run was much less than twelve. Out of these twelve malicious sessions, only three were sessions that involved the attacker exfiltrating small files (<1 Mb) from the victim’s machine. The inter-departure packet time sub-module flagged 114 TCP sessions, which amounts to a high number of false positive alerts. The packet count sub-module was relatively better but still generated a substantial number of false positives. However, requiring a conjunction of both significantly increased the alert accuracy, with the overall network module detecting all three exfiltration sessions and producing only one false positive (Table 3).

When the six commercial security software systems mentioned earlier were run over our sample attacks, none performed well. The majority could not detect most of our attacks, although several of them caught a few of the attacks. Table 4 compares the performance of our intrusion detection system and the others for each of the five attacks tested. For all five, our modules were able to trigger alerts warning us of anomalous behavior.

For all of the attacks studied except tab nabbing, all of our modules succeeded in flagging anomalous system behavior, thus providing enough information to produce an alert with reasonably high confidence. Since the tab nabbing attack involved neither remote code execution, process migration, nor substantial network
Total TCP sessions monitored                                 1154
Malicious sessions                                             12
MIDPT (mean inter-departure packet time) module alerts        114
Packet count alerts                                            34
Combined alerts                                                 4
True positives                                                  3
False positives                                                 1
Table 3. Combined detection rate for attacks
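
The "Combined alerts" row of Table 3 comes from requiring both network sub-modules to flag the same TCP session. A minimal Python sketch of that conjunction follows, using the two thresholds stated in the text (a packet count above the mean plus 4σ, and a mean inter-departure time below 0.01×γ). The class and function names and all baseline and session numbers are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Aggregated profile of outbound TCP sessions (values are assumed)."""
    mean_packets: float   # average number of packets per session
    std_packets: float    # standard deviation of packets per session
    mean_idt: float       # gamma: mean inter-departure time in seconds


@dataclass(frozen=True)
class Session:
    """Features of one observed outbound TCP session."""
    packets: int
    mean_idt: float       # mean inter-departure time of this session


def packet_count_alert(baseline, session, k=4.0):
    """Flag sessions with more packets than mean + k standard deviations."""
    limit = baseline.mean_packets + k * baseline.std_packets
    return session.packets > limit


def idt_alert(baseline, session, factor=0.01):
    """Flag sessions whose packets leave faster than factor * gamma."""
    return session.mean_idt < factor * baseline.mean_idt


def burst_alert(baseline, session):
    """Raise a network alert only when both sub-modules agree."""
    return (packet_count_alert(baseline, session)
            and idt_alert(baseline, session))


# Hypothetical baseline and sessions, chosen only to exercise each case.
baseline = Baseline(mean_packets=40.0, std_packets=15.0, mean_idt=0.5)
sessions = [
    Session(packets=35, mean_idt=0.4),     # ordinary traffic
    Session(packets=500, mean_idt=0.3),    # large but slow
    Session(packets=20, mean_idt=0.001),   # fast but tiny
    Session(packets=900, mean_idt=0.002),  # large and fast
]
flags = [burst_alert(baseline, s) for s in sessions]

# Precision of the combined rule, from the counts reported in Table 3.
combined_precision = 3 / 4
```

Only the last session trips both checks. The second and third sessions would each raise an alert in one sub-module alone, which illustrates how the conjunction reduces the 114 and 34 single-module alerts to 4 combined alerts.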
traffic, some of our modules were unable to detect any anomalous behavior and raise alerts.

Our IDS was able to detect and log a sequence of two major events as they happened over the course of tab nabbing attacks: the insertion of a foreign device followed by a connection to a never-before-seen IP address. Even though the individual alerts raised were low in confidence, our system was still able to produce some alert for an attack that was missed by all of the commercial software systems we tested.

We also tested our system to see how prevalent false positives were. It was run under normal usage for about six hours without executing any attacks on it in order to observe the number of false alarms our individual modules produced. One of the original goals of our work was to reduce the false positive rate while detecting attacks. We aimed to achieve this by integrating information from multiple detection modules and treating their alerts as events that are temporally close. We varied the number of individual modules that need to declare an intrusion before the system as a whole would. Figure 2 shows the results. They confirm that our approach performs well, since there were no false positives when three or more of our monitoring modules raised alerts concurrently. Also, the high number of false positives where a single module raised an alert was primarily due to the networking module, which flagged an alert every time the host communicated with a new IP address. We believe that with a longer profiling duration than ours, this number can go down in the future.

Figure 2. False positive rate

7. Conclusion

We implemented and evaluated a prototype system that is effective in detecting attacks leading to data exfiltration from a compromised computer running the Windows operating system. In our evaluation six popular commercial software products performed poorly on this task. Our approach is based on integrating information from a collection of monitoring systems that operate at different conceptual layers of a computing environment, from hardware up to applications. The information is encoded as RDF assertions supported by several ontologies designed to support representing and reasoning over information about security-related entities, relations, events and concepts. Our cross-layer and temporally-aware approach was able to minimize the number of false positive alerts, which plague most of the current security solutions. We plan to build on our prototype by adding additional modules, incorporating additional background knowledge, using more sophisticated techniques to model and discriminate normal and malicious behavior, incorporating machine learning algorithms where appropriate and conducting a more comprehensive and larger-scale evaluation.

Acknowledgment

This research was partially supported by AFOSR award FA9550-08-1-0265 and a gift from Northrop Grumman. Joshi’s work was supported by funds from the Oros Professorship endowment.

References

[1] E. Eskin, W. Lee, and S. J. Stolfo. Modeling system calls for intrusion detection with dynamic window sizes. In DARPA Information Survivability Conf. & Expo. II, volume 1, pages 165–175, 2001.
[2] M. Fisk, S. Miller, and A. Kent. Global virtual vault: Preventing unauthorized physical disclosure by the insider. In Military Communications Conf., pages 1–7. IEEE, 2008.
[3] S. Forrest, S. Hofmeyr, A. Somayaji, and T. Longstaff. A sense of self for unix processes. In Symposium on Security and Privacy. IEEE, 1996.
                              PowerShell  Metasploit  Applet  tunneling  Tab nabbing
Microsoft Forefront Endpoint  Missed      Caught      Caught  Missed     Missed
Spyware Terminator            Missed      Missed      Missed  Missed     Missed
Windows Defender              Missed      Missed      Missed  Missed     Missed
Snort                         Caught      Caught      Missed  Missed     Missed
AVG                           Missed      Caught      Missed  Caught     Missed
Comodo Firewall               Missed      Caught      Missed  Caught     Missed
Our system                    Caught      Caught      Caught  Caught     Caught

Table 4. Our system performed well compared to others on experiments with several common types of attacks.
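The evaluation varies how many monitoring modules must raise an alert before the system as a whole declares an intrusion. A minimal sketch of that kind of k-of-n alert fusion is given below. The function name, the module names and the snapshot values are hypothetical and chosen only for illustration; the default threshold of three reflects the point at which the evaluation reported no false positives, not a setting taken from the prototype.

```python
from typing import Mapping


def fuse_alerts(module_alerts: Mapping[str, bool], threshold: int = 3) -> bool:
    """Declare an intrusion when at least `threshold` modules are alerting.

    `module_alerts` maps a (hypothetical) module name to whether that
    module is currently raising an alert.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    raised = sum(1 for is_raised in module_alerts.values() if is_raised)
    return raised >= threshold


# Hypothetical snapshot: only the networking module fires, as happens
# when the host contacts an IP address it has not seen before.
snapshot = {
    "new_hardware": False,
    "network": True,
    "process": False,
    "file_access": False,
}

single_module_policy = fuse_alerts(snapshot, threshold=1)  # alert is raised
three_module_policy = fuse_alerts(snapshot, threshold=3)   # alert is suppressed
```

With a threshold of one, a lone networking alert is reported as an intrusion; with a threshold of three, the same snapshot is suppressed, which matches the trade-off the evaluation describes.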
[4] A. Joshi, R. Lal, T. Finin, and A. Joshi. Extracting cybersecurity related linked data from text. In Seventh IEEE International Conference on Semantic Computing. IEEE Computer Society, September 2013.
[5] F. Kerschbaum, E. Spafford, and D. Zamboni. Using embedded sensors for detecting network attacks. In ACM Workshop on Intrusion Detection Systems, 2000.
[6] R. Langner. Stuxnet: Dissecting a cyberwarfare weapon. Security & Privacy, 9(3):49–51, 2011.
[7] W. Lee and S. J. Stolfo. A framework for constructing features and models for intrusion detection systems. ACM Transactions on Information and System Security, 3(4):227–261, Nov. 2000.
[8] W. Lee, S. J. Stolfo, and P. K. Chan. Learning patterns from unix process execution traces for intrusion detection. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, 1997.
[9] Y. Liu, C. Corbett, K. Chiang, R. Archibald, B. Mukherjee, and D. Ghosal. Sidd: A framework for detecting sensitive data exfiltration by an insider attack. In 42nd Hawaii Int. Conf. on System Sciences, pages 1–10. IEEE, 2009.
[10] M. L. Mathews, P. Halvorsen, A. Joshi, and T. Finin. A collaborative approach to situational awareness for cybersecurity. In 8th Int. Conf. on Collaborative Computing: Networking, Applications and Worksharing, pages 216–222. IEEE, 2012.
[11] Metasploit Commands. http://hacking-tutorial.com/tips-and-trick/7-metasploit-meterpreter-core-commands-you-should-know/. (accessed 2013-05-29).
[12] Metasploit Tutorial. http://offensive-security.com/metasploit-unleashed/Meterpreter Basics. (accessed 2013-05-29).
[13] MeterpreterClient. http://wikibooks.org/wiki/Metasploit/MeterpreterClient. (accessed 2013-05-29).
[14] SplitCap. http://netresec.com/?page=SplitCap. (accessed 2013-05-29).
[15] TcpDump and LibPcap. http://tcpdump.org/. (accessed 2013-05-29).
[16] Using Metasploit Meterpreter Keylogger. http://hacking-tutorial.com/hacking-tutorial/5-step-using-metasploit-meterpreter-keylogger-keylogging/. (accessed 2013-05-29).
[17] J. O’Gorman, D. Kearns, and M. Aharoni. Metasploit: The Penetration Tester’s Guide. No Starch Press, 2011.
[18] Y. Okazaki, I. Sato, and S. Goto. A new intrusion detection method based on process profiling. In Symposium on Applications and the Internet, pages 82–90. IEEE, 2002.
[19] Operation aurora. http://wikipedia.org/wiki/Operation Aurora. (accessed 2013-05-29).
[20] IBM distributes infected USB drives at conference. http://scmagazine.com/ibm-distributed-infected-usb-drives-at-conference/article/170862/. (accessed 2013-05-29).
[21] Netbook comes with factory-sealed malware. http://scmagazine.com/netbook-comes-with-factory-sealed-malware/article/137147/. (accessed 2013-05-29).
[22] N. Pavkovic and L. Perkov. Social Engineering Toolkit: a systematic approach to social engineering. In MIPRO 2011, 34th International Convention, pages 1485–1489. IEEE, 2011.
[23] R. Ramachandran, S. Neelakantan, and A. Bidyarthy. Behavior model for detecting data exfiltration in network environment. In Conf. on Internet Multimedia Systems Architecture and Application. IEEE, 2011.
[24] P. Sharma. A multilayer framework to catch data exfiltration. Master’s thesis, University of Maryland, Baltimore County, August 2013.
[25] M. Tehranipoor and F. Koushanfar. A survey of hardware trojan taxonomy and detection. Design & Test of Computers, IEEE, 27(1):10–25, 2010.
[26] J. Undercoffer. Intrusion Detection: Modeling System State to Detect and Classify Aberrant Behavior. PhD thesis, University of Maryland, Baltimore County, Feb. 2004.
[27] J. Undercoffer, A. Joshi, T. Finin, and J. Pinkston. Using DAML+OIL to classify intrusive behaviours. Knowledge Engineering Review, 18(3):221–241, 2003.
[28] J. Undercoffer, A. Joshi, and J. Pinkston. Modeling computer attacks: An ontology for intrusion detection. In 6th Int. Symp. on Recent Advances in Intrusion Detection, pages 113–135. Springer, 2003.