On detecting Internet-based criminal threats with XplicoAlerts: Current design and next steps



Carlos Gacimartín, José Alberto Hernández, Manuel Urueña, David Larrabeiti
Universidad Carlos III de Madrid, Spain
Avda. Universidad 30, E-28911 Leganés, Madrid, Spain
Contact author: carlos.gacimartin@uc3m.es

Abstract: Criminals increasingly use the Internet to plan their crimes. Hence, modern police forces must be provided with powerful threat detection tools, both to prevent crimes before they actually occur and to provide evidence against criminals in court. In light of this, when monitoring suspect traffic generated by criminals, Deep Packet Inspection (DPI) tools must be combined with automatic threat detection techniques in order to filter out non-relevant information while triggering alarms when potential threats are detected. This work presents an architecture, and its implementation, for such a combination of DPI tools with automatic threat detection techniques, and further proposes the next steps towards a high-performance threat detection tool to be used by police officers.

I. INTRODUCTION

Criminals increasingly use modern communication technologies and the Internet to plan their crimes. The police of the future will therefore need high-performance traffic monitoring tools provided by IT companies and experts. Essentially, such monitoring tools must provide a means to identify the potential threats that the traffic generated by suspects may carry. This poses both technological challenges and legal issues. On the technological side, the traffic monitoring tools must in some cases operate at very high speeds, and they should also provide an interface that is easy to use by IT non-experts, as police officers often are. On the legal side, most EU countries require a court authorisation to monitor the traffic of a given suspect.

The work reported in this article has been done within the framework of the European FP7-SEC Project INDECT (http://www.indect-project.eu).

A large number of open-source monitoring tools are available on the web, with Wireshark being the most typical example. However, most of these tools only capture IP packets as they traverse a given link and display their raw contents without any processing. Furthermore, for privacy reasons, such tools often anonymise both source and destination addresses, and sometimes even remove the application-layer contents. Hence, such tools are not valid for identifying potential threats and criminals, since the IP addresses of both source and destination computers are important parameters to be stored by the police.

Additionally, when the goal is to identify whether or not a given communication flow carries any potential threat, it is necessary to inspect the layer-five contents of all packets from that flow. Recall that a communication flow is the unidirectional set of packets characterised by a particular five-tuple (source address, source port, destination address, destination port, protocol). In the literature, Deep Packet Inspection (DPI) tools are those which process the full contents of the packets of a flow, from the link layer up to the application layer, and optionally extract the application contents from the packets of that flow.

A number of open-source DPI tools are publicly available on the Internet; however, the number of application-layer protocols they support is currently very limited. For instance, MSNshadow [3] only supports MSN traffic, whereas tcpick [7] does not support email decoding. Essentially, the challenges in designing and developing DPI applications are manifold:

- The number of protocols and applications in the Internet is so high, and changes so quickly, that it is impossible to decode all captured traffic.
- The application-layer contents are split into a variable number of IP packets, each including layer-two to layer-five headers. Reconstructing them requires the decoder to keep track of a large number of communication aspects, such as sequence numbers, IP fragmentation, TCP error control, etc.
- Applying decoding rules to a particular traffic flow requires accurately determining the flow's application-layer protocol. This is often done through port numbers (for instance, port 80 is often web traffic, port 21 is FTP traffic, and so on), but the relationship between port number and application protocol is not always a one-to-one mapping.
- The information bytes may belong to different media (text, images, video, audio, programming code, etc.), and each medium may have been encoded with different techniques, for instance base64 for MIME attachments in an email, ISO-8859-1 for text, etc.
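The flow definition and the port-based identification mentioned above can be illustrated with a minimal Python sketch. It is not taken from Xplico: the names `FlowKey`, `PORT_HINTS`, `group_into_flows` and `guess_protocol` are hypothetical, packets are assumed to be already parsed into a five-tuple plus payload, and the port table is only a hint, since the port-to-protocol mapping is not one-to-one.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    """The five-tuple that characterises a unidirectional flow."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str  # transport protocol, e.g. "TCP" or "UDP"


# Hypothetical hint table: a port number only suggests an application protocol.
PORT_HINTS = {80: "HTTP", 21: "FTP", 25: "SMTP"}


def group_into_flows(packets):
    """Group (FlowKey, payload) pairs into per-flow payload lists, in arrival order."""
    flows = defaultdict(list)
    for key, payload in packets:
        flows[key].append(payload)
    return dict(flows)


def guess_protocol(key: FlowKey) -> Optional[str]:
    """Return a port-based guess, or None when neither port is in the hint table."""
    return PORT_HINTS.get(key.dst_port) or PORT_HINTS.get(key.src_port)


if __name__ == "__main__":
    request = FlowKey("10.0.0.5", 40000, "192.0.2.1", 80, "TCP")
    response = FlowKey("192.0.2.1", 80, "10.0.0.5", 40000, "TCP")
    flows = group_into_flows([(request, b"GET / HTTP/1.1\r\n"),
                              (response, b"HTTP/1.1 200 OK\r\n")])
    for key, payloads in flows.items():
        print(key, guess_protocol(key), len(payloads))
```

Because a flow is unidirectional, the request and the response of the same web session fall into two different flows; a real decoder would have to confirm the guessed protocol by inspecting the payload itself.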

Despite these many challenges, the Xplico tool [1], currently at version 0.5.5, has shown good performance and supports a wide range of layer-five protocols, and its developers have promised support for more applications in forthcoming versions. In addition, Xplico provides a very intuitive web-based interface, well suited to use, configuration and administration by IT non-experts, as Police Officers often are.

However, Xplico was designed to monitor and decode traffic, not to generate alarms when the decoded traffic matches a given set of parameters associated with criminal threats. When the amount of decoded traffic becomes large, manual inspection of each decoded flow is impractical, and threat identification must be automated. To this end, the authors of this work have developed an extension of the original Xplico, called XplicoAlerts, which automatically detects whether or not a given communication flow contains any suspicious file registered in a database, for instance a terrorism- or paedophilia-related image.

This work reviews the XplicoAlerts extension to Xplico, its design criteria and its operation concerning the detection of suspicious traffic, and its applicability to the FP7 INDECT project.

II. DEEP PACKET INSPECTION TECHNIQUES AND XPLICO

A. A comparison of DPI techniques

Table I compares a number of popular DPI tools together with their most important features, such as the application protocols that they support and the platforms on which they can operate. As shown, the current version of Xplico decodes many more protocols than any other tool, and subsequent versions are expected to also decode packets collected in 802.11 wireless networks (as long as the keys are provided) as well as IRC and XMPP.

TABLE I. COMPARISON OF OPEN-SOURCE DPI TOOLS

| Tool | Version | Platform | Language | Protocols |
|---|---|---|---|---|
| Xplico [1] | 0.5.5 | GNU/Linux | C, Python, PHP, JavaScript | HTTP, SMTP, POP3, IMAP, DNS, FTP, SIP, TFTP, IPP, PJL, MMSE, Telnet, NNTP, Facebook chat, Webmail (AOL, Hotmail, Yahoo) |
| PacketoMatic [2] | 20100227 | GNU/Linux, FreeBSD, Solaris | C | HTTP, POP3, MSN, IRC, RTP |
| MSNShadow [3] | 0.3beta | GNU/Linux | C | MSN |
| Pyag [4] | 0.87 | GNU/Linux | Python, C | HTTP, SMTP, POP3, DNS, SIP, MSN, IRC, Yahoo Chat, Webmail (Gmail), Google Image Search |
| tftpgrab [5] | 0.2 | GNU/Linux | C | TFTP |
| Chaosreader [6] | 0.94 | GNU/Linux, Windows98 | Perl | HTTP, SMTP, FTP, IRC, Telnet, VNC, ICMP, SSH |
| tcpick [7] | 0.21 | GNU/Linux, FreeBSD, Mac OSX | | HTTP, FTP |
| Yahsnarf [8] | | GNU/Linux | Ruby | Yahoo chat |

B. Inside Xplico

Xplico has been developed modularly in order to ease its maintenance and cooperation with other developers. Xplico comprises four macro-components, forming the modular architecture shown in Fig. 1:

- A Decoder Manager, called DeMa. This module is in charge of organising the incoming packets into flows, and of launching and controlling the execution of the IP decoder and manipulators.
- An IP/network decoder, comprising a set of data manipulators (dissectors), one per application protocol, to increase the modularity of the system.
- A dispatcher, which outputs the results to several output formats and storage systems (directory tree, SQLite, MySQL, etc.).
- A web-based visualisation system, called XI, which displays the decoded data using an easy-to-use PHP-based web interface.

Fig. 1. Architecture of the Xplico tool and XplicoAlerts

The operation of Xplico is as follows. Incoming traffic feeds the DeMa, which organises the individual packets into flows. The DeMa then stores the packets of each flow in a separate .pcap file, identified by source and destination IP addresses, port numbers and protocol, and creates an instance of the IP/network decoder to process each flow separately.

The IP/network decoder comprises a number of data manipulators or dissectors, one per protocol. For instance, there are layer-2 dissectors such as PPP and Ethernet, IPv4 and IPv6 dissectors, TCP and UDP dissectors, and finally a number of dissectors for the application-layer protocols, including SMTP, HTTP, Telnet, etc. The IP/network decoder tries the dissectors from bottom to top of the TCP/IP layer stack. For instance, in a capture of web traffic collected in an Ethernet LAN, the first dissector tried is Ethernet, then IPv4, then TCP and finally HTTP, producing as output the downloaded web page, which is then passed to the dispatcher. If none of the dissectors is suitable for a particular flow, no output (other than the original .pcap flow) is passed to the dispatcher. In the example of Fig. 1, the DeMa has generated three instances of the IP/network decoder: one for processing an email (thus using the eth, IPv4, TCP and SMTP dissectors), another for processing a file transmitted through FTP, and a final one for processing a downloaded web page.

After the flow is decoded, the dispatcher stores the results in a previously configured format (SQLite and directory tree, for instance). Finally, the web-based visualisation system, which is based on PHP, displays the decoded data in a friendlier interface. This interface provides a main menu on the left-hand side with the decoded emails, SIP conversations, web sites visited and images transferred, among others. Clicking on each menu entry displays a list with the date and time when each flow was captured, and other relevant information that summarises its contents. For instance, in Fig. 2, where all the emails captured and decoded are shown, this information comprises the subject of the email, the sender and the receiver, and the email size. Clicking on a given email displays its contents (see Fig. 3).
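The bottom-to-top dissector chain described in this subsection can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Xplico's actual code (which is written in C): a "packet" is represented as nested `(layer_name, payload)` pairs instead of raw bytes, and the names `make_dissector`, `STACK` and `decode` are hypothetical.

```python
# Toy model: a packet is a nested pair (layer_name, payload), where payload is
# either another such pair or the final application content as bytes.

def make_dissector(name):
    """Build a dissector that accepts only its own layer and returns the inner payload."""
    def dissect(data):
        if isinstance(data, tuple) and len(data) == 2 and data[0] == name:
            return data[1]
        return None  # this dissector is not suitable for the data
    dissect.__name__ = name
    return dissect


# Dissectors grouped per layer, ordered from the bottom of the stack to the top.
STACK = [
    [make_dissector("eth"), make_dissector("ppp")],
    [make_dissector("ipv4"), make_dissector("ipv6")],
    [make_dissector("tcp"), make_dissector("udp")],
    [make_dissector("smtp"), make_dissector("http"), make_dissector("ftp")],
]


def decode(packet):
    """Try the dissectors layer by layer.

    Returns (list of dissector names used, application content), or None when
    some layer has no suitable dissector, in which case nothing is dispatched.
    """
    path, data = [], packet
    for layer in STACK:
        for dissector in layer:
            inner = dissector(data)
            if inner is not None:
                path.append(dissector.__name__)
                data = inner
                break
        else:
            return None
    return path, data


if __name__ == "__main__":
    web = ("eth", ("ipv4", ("tcp", ("http", b"<html>...</html>"))))
    print(decode(web))
```

In this sketch a flow with an unsupported application protocol simply yields `None`, mirroring the case where only the original .pcap file is kept.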

Fig. 2. Xplico screenshot: Emails decoded

Fig. 3. Xplico screenshot: Email contents

The next subsection explains how XplicoAlerts operates.

C. XplicoAlerts

The authors at Universidad Carlos III de Madrid have extended the original features of Xplico, providing an interface to quickly detect specific files within the flows collected from a given suspect. As shown in Fig. 1, XplicoAlerts combines information from two different databases: one provides the monitored and decoded traffic of the suspect; the other provides the hashes of a set of suspicious files. XplicoAlerts triggers an alarm when a decoded file matches any suspect file. To do so, XplicoAlerts provides an interface to work with:

- Crime categories: new categories may be added by the Police Officer.
- Suspicious files in each category: the officer may include the files themselves, the file hashes only, or a directory tree with multiple files to be hashed and added.
- Alarms: each alarm carries information including the flow that triggered it, the date and time, and a description field for the Police Officer to type extra comments.

A screenshot of the index page of XplicoAlerts is given in Fig. 4. Fig. 5 shows how to add file hashes of new suspicious files.

Fig. 4. XplicoAlerts screenshot: Index
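The matching step described in this subsection (hash every decoded file, look it up in the suspicious-hash database, raise an alarm on a match) can be sketched as follows. This is a minimal illustration, not the XplicoAlerts implementation: the paper does not state which hash function is used, so SHA-256 is an assumption, and the names `HashDatabase`, `Alarm` and `check_flow` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def file_hash(content: bytes) -> str:
    """Hash a file's content (SHA-256 is an assumption, not confirmed by the paper)."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class Alarm:
    flow_id: str          # the flow that triggered the alarm
    category: str         # crime category of the matched file
    digest: str           # hash of the matched file
    raised_at: datetime   # date and time of the alarm
    note: str = ""        # free-text description typed by the officer


class HashDatabase:
    """Stores only hashes of suspicious files, grouped by crime category."""

    def __init__(self):
        self._category_by_hash = {}

    def add_hash(self, category: str, digest: str) -> None:
        self._category_by_hash[digest] = category

    def add_file(self, category: str, content: bytes) -> None:
        self.add_hash(category, file_hash(content))

    def lookup(self, digest: str):
        return self._category_by_hash.get(digest)


def check_flow(flow_id: str, decoded_files, db: HashDatabase):
    """Return one Alarm per decoded file whose hash is in the database."""
    alarms = []
    for content in decoded_files:
        digest = file_hash(content)
        category = db.lookup(digest)
        if category is not None:
            alarms.append(Alarm(flow_id, category, digest,
                                datetime.now(timezone.utc)))
    return alarms


if __name__ == "__main__":
    db = HashDatabase()
    db.add_file("example-category", b"suspicious image bytes")
    for alarm in check_flow("flow-1", [b"harmless page", b"suspicious image bytes"], db):
        print(alarm)
```

Exact hashing only matches byte-identical files: changing a single byte of the content produces a different digest and therefore no alarm.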

III. A CASE EXAMPLE: DETECTING A THREAT

A. Design requirements

Before designing a tool for detecting criminal behaviour with Deep Packet Inspection techniques, it is necessary to take into account a number of requirements and constraints. The following were all taken into consideration in the design of XplicoAlerts:

- The analyser must store all information from the suspect. For example, if the suspect sends an email which could be identified as a threat, he may later claim that he sent another email cancelling the previous one. The Police must be able to check whether or not he sent that second email.
- The Police Officer should not need to check all decoded traffic manually, since this would be impractical when the amount of transferred information is large. Instead, the analyser must generate alarms only for those flows which it considers suspicious. The Police Officer would then examine the flows that generated an alarm.
- Additionally, the Police Officer must have a means to type notes on the flows that generated alarms.
- Detecting suspicious flows shall be based on whether or not the flow contains a given suspicious file, e.g. an image or another type of object transmitted over email, FTP, etc.
- The actual suspicious files do not need to be stored locally on the analyser, only their hashes. The files transferred by the suspect will be checked against the hash database. This increases the security of the analyser in case of loss.
- All decoded information must be stored locally, in an encrypted database, in order to increase security in case of loss.
- The suspicious hash database may be updated remotely.
- The analyser must be able to communicate safely and securely with the Police Office.

Fig. 5. XplicoAlerts screenshot: Adding file hashes

B. Example of operation

In a typical scenario, a Police Officer places the XplicoAlerts monitoring system somewhere near the house of a person suspected of some criminal activity. Xplico then collects all traffic from the suspect and checks whether or not the decoded objects match any of the suspicious files, whose hashes are stored in a local (encrypted) database. If there is a match, an alert is sent to the Police Officer, who must check it manually.

In this scenario, the suspect is browsing a website which contains a suspect image (Fig. 6(a)), whose hash is included in the suspicious-file database. Xplico then decodes the full web session (Fig. 6(b), top-right). XplicoAlerts then hashes each decoded object and compares the results with the hash database of suspicious files. In case of a positive, XplicoAlerts triggers an alarm (Fig. 6(c)), which must be analysed and annotated by the Police Officer (Fig. 6(d)).

IV. NEXT STEPS

The following development enhancements are planned for future versions of XplicoAlerts:

- Fuzzy hashing. With conventional hashing, a slight change to a file changes its hash completely. Thus, a suspect could slightly modify the files to make them undetectable by the analyser. Fuzzy hashing provides a means to produce similar hashes for similar files (images), thus making it possible to detect the transmission of images similar to those stored in the suspicious database.
- Natural language analysis. A further extension is to analyse not only the files transmitted but also to do some intelligent processing on the message contents (i.e. the text of websites, emails, etc.). To do so, it is necessary to store a list of words in the suspicious database such that, if some of them are identified within the same context (a 20- or 50-word distance), an alarm is triggered to be manually inspected by the Police Officer.
- Host location. It is interesting to correlate the suspicious database with geographical information about the source and destination IP addresses, such that the Police Officer may identify the country or region that the suspect communicates with.
- Fast detection with Bloom filters. As the number of suspicious items is expected to grow dramatically, an easy way to speed up the processing of files is to use Bloom filters. Bloom filters are compact data structures that can be used to quickly check the membership of an item in a set.

V. SUMMARY AND DISCUSSION

This work has shown the architectural design of Xplico and XplicoAlerts and their potential to detect criminal activity by monitoring and decoding the traffic of suspects and then comparing the results against a hash database of suspicious files. At present, the tool is capable of decoding a wide variety of protocols, but its detection features are still limited to the detection of suspicious files. However, a number of enhancements are planned for future versions, including natural language processing, threat location and fuzzy-hashing-based detection.
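As a rough illustration of the Bloom-filter enhancement listed in Section IV, the sketch below shows how a Bloom filter could act as a fast pre-check in front of the exact hash database. It is only a sketch under stated assumptions: the class `BloomFilter`, the function `is_suspicious`, the filter size and the use of salted SHA-256 to derive bit positions are all illustrative choices, not part of XplicoAlerts.

```python
import hashlib


class BloomFilter:
    """Compact set-membership test: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_suspicious(file_digest: bytes, bloom: BloomFilter, exact_hashes: set) -> bool:
    """Cheap Bloom pre-check first; confirm in the exact set to rule out false positives."""
    if not bloom.might_contain(file_digest):
        return False  # definitely not in the suspicious set
    return file_digest in exact_hashes


if __name__ == "__main__":
    suspicious = {hashlib.sha256(b"suspicious image bytes").digest()}
    bloom = BloomFilter()
    for digest in suspicious:
        bloom.add(digest)
    print(is_suspicious(hashlib.sha256(b"suspicious image bytes").digest(), bloom, suspicious))
    print(is_suspicious(hashlib.sha256(b"harmless page").digest(), bloom, suspicious))
```

The design choice here is that most decoded files are expected to be harmless, so the filter rejects them without touching the (possibly large or remote) exact database; only the rare positives pay for the exact lookup.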

ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of the EU-funded project INDECT, grant number FP7-SEC-218086, in the development of this work. The authors would also like to thank Gianluca Costa and Andrea De Franceschi, developers of Xplico, for their support of this work.

REFERENCES

[1] The Xplico Internet decoder, current version 0.5.5. Available at http://www.xplico.org
[2] PacketoMatic, current version 20100227. Available at http://www.packet-o-matic.org
[3] MSN Shadow, current version 0.3beta. Available at http://msnshadow.blogspot.com/
[4] Pyag, current version 0.87. Available at http://www.pyag.net
[5] TFTPgrab, current version 0.2. Available at http://pseudoflaw.net/content/tftpgrab/
[6] Chaosreader, current version 0.94. Available at http://chaosreader.sourceforge.net/
[7] Tcpick, current version 0.21. Available at http://tcpick.sourceforge.net
[8] Yahsnarf. Available at http://writequit.org/projects/yahsnarf/

Fig. 6. A suspect browsing the web (a), all traffic captured by Xplico (b), suspicious image detected by XplicoAlerts (c) and manual annotation by the Police Officers (d)
