Building a Dynamic Reputation System for DNS

Manos Antonakakis, Roberto Perdisci, David Dagon, Wenke Lee, and Nick Feamster
College of Computing, Georgia Institute of Technology

Abstract

The Domain Name System (DNS) is an essential protocol used by both legitimate Internet applications and cyber attacks. For example, botnets rely on DNS to support agile command and control infrastructures. An effective way to disrupt these attacks is to place malicious domains on a “blocklist” (or “blacklist”) or to add a filtering rule in a firewall or network intrusion detection system. To evade such security countermeasures, attackers have used DNS agility, e.g., by using new domains daily to evade static blacklists and firewalls. In this paper we propose Notos, a dynamic reputation system for DNS. The premise of this system is that malicious, agile use of DNS has unique characteristics and can be distinguished from legitimate, professionally provisioned DNS services. Notos uses passive DNS query data and analyzes the network and zone features of domains. It builds models of known legitimate domains and malicious domains, and uses these models to compute a reputation score for a new domain indicative of whether the domain is malicious or legitimate. We have evaluated Notos in a large ISP’s network with DNS traffic from 1.4 million users. Our results show that Notos can identify malicious domains with high accuracy (true positive rate of 96.8%) and low false positive rate (0.38%), and can identify these domains weeks or even months before they appear in public blacklists.

1 Introduction

The Domain Name System (DNS) [12, 13] maps domain names to IP addresses, and provides a core service to applications on the Internet. DNS is also used in network security to distribute IP reputation information, e.g., in the form of DNS-based Block Lists (DNSBLs) used to filter spam [18, 5] or block malicious web pages [26, 14].

Internet-scale attacks often use DNS as well because they are essentially Internet-scale malicious applications. For example, spyware uses anonymously registered domains to exfiltrate private information to drop sites. Disposable domains are used by adware to host malicious or false advertising content. Botnets make agile use of short-lived domains to evasively move their command-and-control (C&C) infrastructure. Fast-flux networks rapidly change DNS records to evade blacklists and resist take downs [25]. In an attempt to evade domain name blacklisting, attackers now make very aggressive use of DNS agility. The most common example of an agile malicious resource is a fast-flux network, but DNS agility takes many other forms including disposable domains (e.g., tens of thousands of randomly generated domain names used for spam or botnet C&C), domains with dozens of A records or NS records (in excess of levels recommended by RFCs, in order to resist takedowns), or domains used for only a few hours of a botnet’s lifetime. Perhaps the best example is the Conficker.C worm [15]. After Conficker.C infects a machine, it will try to contact its C&C server, chosen at random from a list of 50,000 possible domain names created every day. Clearly, the goal of Conficker.C was to frustrate blacklist maintenance and takedown efforts. Other malware that abuse DNS include Sinowal (a.k.a. Torpig) [9], Kraken [20], and Srizbi [22]. The aggressive use of newly registered domain names is seen in other contexts, such as spam campaigns and malicious flux networks [25, 19]. This strategy delays takedowns, degrades the effectiveness of blacklists, and pollutes the Internet’s name space with unwanted, discarded domains.

In this paper, we study the problem of dynamically assigning reputation scores to new, unknown domains. Our main goal is to automatically assign a low reputation score to a domain that is involved in malicious activities, such as malware spreading, phishing, and spam campaigns. Conversely, we want to assign a high reputation score to domains that are used for legitimate purposes. The reputation scores enable dynamic domain name blacklists to counter cyber attacks much more effectively. For example, with static blacklisting, by the time one has sufficient evidence to put a domain on a blacklist, it typically has been involved in malicious activities for a significant period of time. With dynamic blacklisting our goal is to decide, even for a new domain, whether it is likely used for malicious purposes. To this end, we propose Notos, a system that dynamically assigns reputation scores to domain names. Our work is based on the observation that agile malicious uses of DNS have unique characteristics, and can be distinguished from legitimate, professionally provisioned DNS services. In short, network resources used for malicious and
fraudulent activities inevitably have distinct network characteristics because of their need to evade security countermeasures. By identifying and measuring these features, Notos can assign appropriate reputation scores.

Notos uses historical DNS information collected passively from multiple recursive DNS resolvers distributed across the Internet to build a model of how network resources are allocated and operated for legitimate, professionally run Internet services. Notos also uses information about malicious domain names and IP addresses obtained from sources such as spam-traps, honeynets, and malware analysis services to build a model of how network resources are typically allocated by Internet miscreants. With these models, Notos can assign reputation scores to new, previously unseen domain names, therefore enabling dynamic blacklisting of unknown malicious domain names and IP addresses.

Previous work on dynamic reputation systems mainly focused on IP reputation [24, 31, 1, 21]. To the best of our knowledge, our system is the first to create a comprehensive dynamic reputation system around domain names. To summarize, our main contributions are as follows:

• We designed Notos, a dynamic, comprehensive reputation system for DNS that outputs reputation scores for domains. We constructed network and zone features that capture the characteristics of resource provisioning, usages, and management of domains. These features enable Notos to learn models of how legitimate and malicious domains are operated, and compute accurate reputation scores for new domains.

• We implemented a proof-of-concept version of our system, and deployed it in a large ISP’s DNS network in Atlanta, GA and San Jose, CA, USA, where we observed DNS traffic from 1.4 million users. We also used passive DNS data from the Security Information Exchange (SIE) project [3]. This extensive real-world evaluation shows Notos can correctly classify new domains with a low false positive rate (0.38%) and high true positive rate (96.8%). Notos can detect and assign a low reputation score to malware- and spam-related domain names several days or even weeks before they appear on public blacklists.

Section 2 provides some background on DNS and related work. Readers familiar with this may skip to Section 3, where we describe our passive DNS collection strategy and other whitelist and blacklist inputs. We also describe three feature extraction modules that measure key network, zone and evidence-based features. Finally, we describe how these features are clustered and incorporated into the final reputation engine. To evaluate the output of Notos, we gathered an extensive amount of network trace data. Section 4 describes the data collection process, and Section 5 details the sensitivity of each module and final output.

2 Background and Related Work

DNS is the protocol that resolves a domain name to its corresponding IP address. To resolve a domain, a host typically needs to consult a local recursive DNS server (RDNS). A recursive server iteratively discovers which Authoritative Name Server (ANS) is responsible for each zone. The typical result of this iterative process is the mapping between the requested domain name and its current IP addresses.

By aggregating all unique, successfully resolved A-type DNS answers at the recursive level, one can build a passive DNS database. This passive DNS (pDNS) database is effectively the DNS fingerprint of the monitored network and typically contains unique A-type resource records (RRs) that were part of monitored DNS answers. A typical RR has the following format: {<domain name> 78366 IN A <IP address>}, which lists the domain name, TTL, class, type, and rdata. For simplicity, we will refer to an RR in this paper as just a tuple of the domain name and IP address.

Passive DNS data collection was first proposed by Florian Weimer [27]. His system was among the first that appeared in the DNS community with its primary purpose being the conversion of historic DNS traffic into an easily accessible format. Zdrnja et al. [29] with their work in “Passive Monitoring of DNS Anomalies” discuss how pDNS data can be used for gathering security information from domain names. Although they acknowledge the possibility of creating a DNS reputation system based on passive DNS measurement, they do not quantify a reputation function. Our work uses the idea of building passive DNS information only as a seed for computing statistical DNS properties for each successful DNS resolution. The analysis of these statistical properties is the basic building block for our dynamic domain name reputation function. Plonka et al. [17] introduced Treetop, a scalable way to manage a growing collection of passive DNS data and at the same time correlate zone and network properties. Their cluster zones are based on different classes of networks (class A, class B and class C). Treetop differentiates DNS traffic based on whether it complies with various DNS RFCs and based on the resolution result. Plonka’s proposed method, despite being novel and highly efficient, offers limited DNS security information and cannot assign reputation scores to records.

Several papers, e.g., Sinha et al. [24], have studied the effectiveness of IP blacklists. Zhang et al. [31] showed that the hit rate of highly predictable blacklists (HBLs) decreases significantly over a period of time. Our work addresses the dynamic DNS blacklisting problem, which makes it significantly different from the highly predictable blacklists. Importantly, Notos does not aim to create IP blacklists. By using properties of the DNS protocol, Notos can rank a domain name as potentially malicious or not. Garera et al. [8] discussed “phishing” detection predominately using properties of the URL and not statistical observations about the domains or the IP address. The statistical features used by Holz et al. [10] to detect fast flux networks are similar to the ones we used in our work; however, Notos utilizes a more complete collection of network statistical features and is not limited to fast flux networks detection.

Researchers have attempted to use unique characteristics of malicious networks to detect sources of malicious activity. Anderson et al. [1] proposed Spamscatter as the first system to identify and characterize spamming infrastructure by utilizing layer 7 analysis (i.e., web sites and images in spam). Hao et al. [21] proposed SNARE, a spatio-temporal reputation engine for detecting spam messages with very high accuracy and low false positive rates. The SNARE reputation engine is the first work that utilized statistical network-based features to harvest information for spam detection. Notos is complementary to SNARE and Spamscatter, and extends both to not only detect spam, but also identify other malicious activity such as phishing and malware hosting. Qian et al. [28] present their work on spam detection using network-based clustering. In this work, they show that network-based clusters can increase the accuracy of spam-oriented blacklists. Our work is more general, since we try to identify various kinds of malicious domain names. Nevertheless, both works leverage network-based clustering for identifying malicious activities.

Felegyhazi et al. [7] proposed a DNS reputation blacklisting methodology based on WHOIS observations. Our system does not use WHOIS information, making our approaches complementary by design. Sato et al. [23] proposed a way to extend current blacklists by observing the co-occurrence of IP address information. Notos is a more generic approach than the proposed system by Sato and is not limited to botnet related domain name detection. Finally, Notos builds the reputation function mainly based upon passive information from DNS traffic observed in real networks, not traffic observed from honeypots.

No previous work has tried to assign a dynamic domain name reputation score for any domain that traverses the edge of a network. Notos harvests information from multiple sources: the domain name, its effective zone, the IP address, the network the IP address belongs to, the Autonomous System (AS) and honeypot analysis. Furthermore, Notos uses short-lived passive DNS information. Thus, it is difficult for a malicious domain to dilute its passive DNS footprint.

3 Notos: A Dynamic Reputation System

The goal of the Notos reputation system is to dynamically assign reputation scores to domain names. Given a domain name d, we want to assign a low reputation score if d is involved in malicious activities (e.g., if it has been involved with botnet C&C servers, spam campaigns, malware propagation, etc.). On the other hand, we want to assign a high reputation score if d is associated with legitimate Internet services.

Notos’ main source of information is a passive DNS (pDNS) database, which contains historical information about domain names and their resolved IPs. Our pDNS database is constantly updated using real-world DNS traffic from multiple geographically diverse locations as shown in Figure 1. We collect DNS traffic from two ISP recursive DNS servers (RDNS) located in Atlanta and San Jose. The ISP nodes witness 30,000 DNS queries/second during peak hours. We also collect DNS traffic through the Security Information Exchange (SIE) [3], which aggregates DNS traffic received by a large number of RDNS servers from authoritative name servers across North America and Europe. In total, the SIE project processes approximately 200 Mbit/s of DNS messages, several times the total volume of DNS traffic in a single US ISP.

Another source of information we use is a list of known malicious domains. For example, we run known malware samples in a controlled environment and we classify as suspicious all the domains contacted by malware samples that do not match a pre-compiled whitelist. In addition, we extract suspicious domain names from spam emails collected using a large spam-trap. Again, we discard the domains that match our whitelist and consider the rest as potentially malicious. Furthermore, we collect a large list of popular, legitimate domains (we discuss our data collection and analysis in more detail in Section 4). The set of known malicious and legitimate domains represents our knowledge base, and is used to train our reputation engine, as we discuss in Section 4.

Intuitively, a domain name d can be considered suspicious when there is evidence that d or its IP addresses are (or were in previous months) associated with known malicious activities. The more evidence of “bad associations” we can find about d, the lower the reputation score we will assign to it. On the other hand, if there is evidence that d is (or was in the past) associated with legitimate, professionally run Internet services, we will assign it a higher reputation score.

3.1 System Overview

Before describing the internals of our reputation system, we introduce some basic terminology. A domain name d consists of a set of substrings or labels separated by a period; the rightmost label is called the top-level domain, or TLD. The second-level domain (2LD) represents the two rightmost labels separated by a period; the third-level domain (3LD) analogously contains the three rightmost labels, and so on. As an example, for a domain name d whose rightmost label is “com”, TLD(d)=“com”, while 2LD(d) and 3LD(d) are formed by the two and three rightmost labels of d, respectively.

Let s be a domain name. We define Zone(s) as the set of domains that include s and all domain names that end with a period followed by s.
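The terminology of Section 3.1 maps onto simple string operations. The following Python sketch is illustrative only: the function names (`labels`, `nld`, `tld`, `in_zone`) and the sample name `a.b.example.com` are assumptions introduced here, not part of Notos. The TLD is taken literally as the rightmost label, as in the definition above, rather than through a public-suffix lookup.

```python
def labels(domain: str) -> list[str]:
    """Split a domain name into its period-separated labels."""
    return domain.rstrip(".").lower().split(".")


def nld(domain: str, n: int) -> str:
    """Return the n rightmost labels of `domain` (n=1: TLD, n=2: 2LD, n=3: 3LD)."""
    return ".".join(labels(domain)[-n:])


def tld(domain: str) -> str:
    """Return the rightmost label of `domain`."""
    return nld(domain, 1)


def in_zone(domain: str, s: str) -> bool:
    """True if `domain` is in Zone(s): it equals s or ends with "." + s."""
    d = ".".join(labels(domain))
    z = ".".join(labels(s))
    return d == z or d.endswith("." + z)


# Hypothetical sample name, used only for illustration.
d = "a.b.example.com"
print(tld(d), nld(d, 2), nld(d, 3))   # com example.com b.example.com
print(in_zone(d, "example.com"))      # True
print(in_zone("notexample.com", "example.com"))  # False
```

Matching on "." + s rather than on a bare suffix is what keeps `notexample.com` out of Zone(`example.com`).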
Figure 1. System overview. (Diagram labels: S.I.E., ISP Recursive DNS Server (Atlanta), ISP Recursive DNS Server (SJC), Passive DNS Database, Honeypot Data, Black List, Reputation Engine, Notos.)

Figure 2. Computing network-based, zone-based, evidence-based features. (Diagram labels: Resource Record (RR) Domain Name - IP; Network Based Features Vector, F1 ... F18; Zone Based Features Vector, F1 ... F17; Evidence Based Features Vector, F1 ... F6.)

Let D = {d_1, d_2, ..., d_m} be a set of domain names. We call A(D) the set of IP addresses ever pointed to by any domain name d ∈ D.

Given an IP address a, we define BGP(a) to be the set of all IPs within the BGP prefix of a, and AS(a) as the set of IPs located in the autonomous system in which a resides. In addition, we can extend these functions to take as input a set of IPs: given IP set A = {a_1, a_2, ..., a_N}, BGP(A) = ∪_{k=1..N} BGP(a_k); AS(a) is similarly extended.

To assign a reputation score to a domain name d we proceed as follows. First, we consider the most current set A_c(d) = {a_i}_{i=1..m} of IP addresses to which d points. Then, we query our pDNS database to retrieve the following information:

• Related Historic IPs (RHIPs), which consist of the union of A(d), A(Zone(3LD(d))), and A(Zone(2LD(d))). In order to simplify the notation we will refer to A(Zone(3LD(d))) and A(Zone(2LD(d))) as A_3LD(d) and A_2LD(d), respectively.

• Related Historic Domains (RHDNs), which comprise the entire set of domain names that ever resolved to an IP address a ∈ AS(A(d)). In other words, RHDNs contain all the domains d_i for which A(d_i) ∩ AS(A(d)) ≠ ∅.

After extracting the above information from our pDNS database, we measure a number of statistical features. Specifically, for each domain d we extract three groups of features, as shown in Figure 2:

• Network-based features: The first group of statistical features is extracted from the set of RHIPs. We measure quantities such as the total number of IPs historically associated with d, the diversity of their geographical location, the number of distinct autonomous systems (ASs) in which they reside, etc.

• Zone-based features: The second group of features we extract are those from the RHDNs set. We measure the average length of domain names in RHDNs, the number of distinct TLDs, the occurrence frequency of different characters, etc.

• Evidence-based features: The last set of features includes the measurement of quantities such as the number of distinct malware samples that contacted the domain d, the number of malware samples that connected to any of the IPs pointed to by d, etc.

Once extracted, these statistical features are fed to the reputation engine. Notos’ reputation engine operates in two modes: an off-line “training” mode and an on-line “classification” mode. During the off-line mode, Notos trains the reputation engine using the information gathered in our knowledge base, namely the set of known malicious and legitimate domain names and their related IP addresses. Afterwards, during the on-line mode, for each new domain d, Notos queries the trained reputation engine to compute a reputation score for d (see Figure 3). We now explain the details about the statistical features we measure, and how the reputation engine uses them during the off-line and on-line modes to compute a domain name’s reputation score.

3.2 Statistical Features

In this section we identify key statistical features and the intuition behind their selection.

3.2.1 Network-based Features

Given a domain d we extract a number of statistical features from the set RHIPs of d, as mentioned in Section 3.1. Our network-based features describe how the operators who own d, and the IPs that domain d points to, allocate their network resources.
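As a rough illustration of the RHIPs set that these features are computed over, and of its RHDNs counterpart from Section 3.1, the following Python sketch evaluates both definitions on a toy pDNS table. Everything concrete in it is an assumption made for illustration: the table contents (reserved documentation names and addresses), the IP-to-AS mapping, and the helper names. AS(A(d)) is approximated by the IPs present in the toy mapping, whereas a real deployment would derive it from BGP data.

```python
# Hypothetical toy pDNS table of (domain name, IP) tuples; all names and
# addresses are reserved documentation values, not real measurements.
PDNS = [
    ("www.example.com", "192.0.2.10"),
    ("mail.example.com", "192.0.2.20"),
    ("cdn.www.example.com", "198.51.100.5"),
    ("other.example.net", "192.0.2.10"),
    ("far.example.org", "203.0.113.7"),
]
# Hypothetical IP -> AS number mapping (a real system would use BGP data).
IP_TO_AS = {
    "192.0.2.10": 64500,
    "192.0.2.20": 64500,
    "198.51.100.5": 64501,
    "203.0.113.7": 64502,
}


def nld(domain, n):
    """Return the n rightmost labels of a domain name."""
    return ".".join(domain.split(".")[-n:])


def in_zone(domain, s):
    """True if `domain` equals s or ends with a period followed by s."""
    return domain == s or domain.endswith("." + s)


def A(domains):
    """A(D): every IP ever pointed to by any domain in the set `domains`."""
    return {ip for name, ip in PDNS if name in domains}


def A_zone(s):
    """A(Zone(s)): IPs of all observed domains that fall in Zone(s)."""
    return {ip for name, ip in PDNS if in_zone(name, s)}


def rhips(d):
    """RHIPs: union of A(d), A(Zone(3LD(d))) and A(Zone(2LD(d)))."""
    return A({d}) | A_zone(nld(d, 3)) | A_zone(nld(d, 2))


def rhdns(d):
    """RHDNs: domains that ever resolved to an IP in AS(A(d))."""
    ases = {IP_TO_AS[ip] for ip in A({d})}
    as_ips = {ip for ip, asn in IP_TO_AS.items() if asn in ases}
    return {name for name, ip in PDNS if ip in as_ips}


print(sorted(rhips("www.example.com")))
# ['192.0.2.10', '192.0.2.20', '198.51.100.5']
print(sorted(rhdns("www.example.com")))
# ['mail.example.com', 'other.example.net', 'www.example.com']
```

In this toy table, `other.example.net` enters the RHDNs of `www.example.com` only because it shares an autonomous system with it, which is the kind of association the zone-based features are later computed over.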
Figure 3. Off-line and on-line modes in Notos.

Figure 4. (a) Network profile modeling in Notos. (b) Network and zone based clustering in Notos.

Internet miscreants often abuse DNS to operate their malicious networks with a high level of agility. Namely, the domain names and IPs that are used for malicious purposes are often short-lived and are characterized by a high churn rate. This agility avoids simple blacklisting or removals by law enforcement. In order to measure the level of agility of a domain name d, we extract eighteen statistical features that describe d’s network profile. Our network features fall into the following three groups:

• BGP features. This subset consists of a total of nine features. We measure the number of distinct BGP prefixes related to BGP(A(d)), the number of countries in which these BGP prefixes reside, and the number of organizations that own these BGP prefixes; the number of distinct IP addresses in the sets A_3LD(d) and A_2LD(d); the number of distinct BGP prefixes related to BGP(A_3LD(d)) and BGP(A_2LD(d)), and the number of countries in which these two sets of prefixes reside.

• AS features. This subset consists of three features, namely the number of distinct autonomous systems related to AS(A(d)), AS(A_3LD(d)), and AS(A_2LD(d)).

• Registration features. This subset consists of six features. We measure the number of distinct registrars associated with the IPs in the A(d) set; the diversity in the registration dates related to the IPs in A(d); the number of distinct registrars associated with the IPs in the A_3LD(d) and A_2LD(d) sets; and the diversity in the registration dates for the IPs in A_3LD(d) and A_2LD(d).

While most legitimate, professionally run Internet services have a very stable network profile, which is reflected in low values of the network features described above, the profiles of malicious networks (e.g., fast-flux networks) usually change relatively frequently, thus causing their network features to be assigned higher values. We expect a domain name d from a legitimate zone to exhibit small values in its AS features, mainly because the IPs in the RHIPs should belong to the same organization or a small number of different organizations. On the other hand, if a domain name d participates in malicious activities (e.g., botnet activities, flux networks), then it could reside in a large number of different networks. The list of IPs in the RHIPs that correspond to the malicious domain name will produce AS features with higher values. In the same sense, we measure the homogeneity of the registration information for benign domains. Legitimate domains are typically linked to address space owned by organizations that acquire and announce network blocks in some order. This means that a legitimate domain name d owned by a single organization will typically produce a list of IPs in the RHIPs with small registration-feature values. If this set of IPs exhibits high registration-feature values, it means that they very likely reside in different registrars and were registered on different dates. Such registration-feature properties are typically linked with fraudulent domains.

3.2.2 Zone-based Features

The network-based features measure a number of characteristics of IP addresses historically related to a given domain name d. On the other hand, the zone-based features measure the characteristics of domain names historically associated with d. The intuition behind the zone-based features is that while legitimate Internet services may be associated with many different domain names, these domain names usually have strong similarities. For example, the many domain names related to Internet services provided by Google contain the string “google” in their name. On the other hand, malicious domain names related to the same spam campaign, for example, often look randomly generated and share few common characteristics.
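One way to quantify the contrast between related and random-looking names is to summarize string statistics over a set of domain names. The Python sketch below is a minimal illustration under stated assumptions: “occurrence frequency” is interpreted as the relative frequency of each distinct character across the whole set (periods excluded), the population standard deviation is used, and both the function names and the two sample name sets are hypothetical.

```python
from collections import Counter
from statistics import mean, median, pstdev


def char_frequency_stats(names):
    """Mean, median and std. dev. of per-character relative frequencies."""
    counts = Counter(ch for name in names for ch in name if ch != ".")
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    return mean(freqs), median(freqs), pstdev(freqs)


def length_stats(names):
    """Number of distinct names, plus mean and std. dev. of their length."""
    distinct = sorted(set(names))
    lengths = [len(n) for n in distinct]
    return len(distinct), mean(lengths), pstdev(lengths)


# Hypothetical name sets, for illustration only.
related = ["mail.example.com", "maps.example.com", "news.example.com"]
random_like = ["qzx7kd.example", "p0wv3m.example", "t8rjy2.example"]
for label, names in (("related", related), ("random-like", random_like)):
    print(label, length_stats(names), char_frequency_stats(names))
```

The same pattern extends to 2-grams and 3-grams by counting adjacent character pairs or triples instead of single characters.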
Therefore, our zone-based features aim to measure the level of diversity across the domain names in the RHDNs set. Given a domain name d, we extract seventeen statistical features that describe the properties of the set RHDNs of domain names related to d. We divide these seventeen features into two groups:

• String features. This group consists of twelve features. We measure the number of distinct domain names in RHDNs, and the average and standard deviation of their length; the mean, median, and standard deviation of the occurrence frequency of each single character in the domain name strings in RHDNs; the mean, median and standard deviation of the distribution of 2-grams (i.e., pairs of characters); the mean, median and standard deviation of the distribution of 3-grams.

• TLD features. This group consists of five features. For each domain d_i in the RHDNs set, we extract its top-level domain TLD(d_i) and we count the number of distinct TLD strings that we obtain; we measure the ratio between the number of domains d_i whose TLD(d_i)=“.com” and the total number of TLD different from “.com”; also, we measure the mean, median, and standard deviation of the occurrence frequency of the TLD strings.

It is worth noting that whenever we measure the mean, median and standard deviation of a certain property, we do so in order to summarize the shape of its distribution. For example, by measuring the mean, median, and standard deviation of the occurrence frequency of each character in a set of domain name strings, we summarize what the distribution of the character frequency looks like.

3.2.3 Evidence-based Features

We use the evidence-based features to determine to what extent a given domain d is associated with other known malicious domain names or IP addresses. As mentioned above, Notos collects a knowledge base of known suspicious, malicious, and legitimate domain names and IPs from public sources. For example, we collect malware-related domain names by executing large numbers of malware samples in a controlled environment. Also, we check IP addresses against a number of public IP blacklists. We elaborate on how we build Notos’ knowledge base in Section 4. Given a domain name d, we measure six statistical features using the information in the knowledge base. We divide these features into two groups:

• Honeypot features. We measure three features, namely the number of distinct malware samples that, when executed, try to contact d or any IP address in A(d); the number of malware samples that contact any IP address in BGP(A(d)); and the number of samples that contact any IP address in AS(A(d)).

• Blacklist features. We measure three features, namely the number of IP addresses in A(d) that are listed in public IP blacklists; the number of IPs in BGP(A(d)) that are listed in IP blacklists; and the number of IPs in AS(A(d)) that are listed in IP blacklists.

Notos uses the blacklist features from the evidence vector so it can identify the re-use of known malicious network resources like IPs, BGP prefixes or even ASs. Domain names are significantly cheaper than IPv4 addresses, so malicious users tend to reuse address space with new domain names. We should note that the evidence-based features represent only part of the information we used to compute the reputation scores. The fact that a domain name was queried by malware does not automatically mean that the domain will receive a low reputation score.

3.3 Reputation Engine

Notos’ reputation engine is responsible for deciding whether a domain name d has characteristics that are similar to either legitimate or malicious domain names. In order to achieve this goal, we first need to train the engine to recognize whether d belongs (or is “close”) to a known class of domains. This training can be repeated periodically, in an off-line fashion, using historical information collected in Notos’ knowledge base (see Section 4). Once the engine has been trained, it can be used in on-line mode to assign a reputation score to each new domain name d.

In this section, we first explain how the reputation engine is trained, and then we explain how a trained engine is used to assign reputation scores.

3.3.1 Off-Line Training Mode

During off-line training (Figure 3), the reputation engine builds three different modules. We briefly introduce each module and then elaborate on the details.

• Network Profiles Model: a model of how well known networks behave. For example, we model the network characteristics of popular content delivery networks (e.g., Akamai, Amazon CloudFront), and large popular websites. During the on-line mode, we compare each new domain name d to these models of well-known network profiles, and use this information to compute the final reputation score, as explained below.

• Domain Name Clusters: we group domain names into clusters sharing similar characteristics. We create these clusters of domains to identify groups of domains that contain mostly malicious domains, and groups that contain mostly legitimate domains. In the on-line mode,
given a new domain d, if d (more precisely, d’s projec-,,,,
tion into a statistical feature space) falls within (or close, and
to) a cluster of domains containing mostly malicious do-
mains, for example, this gives us a hint that d should be • CDN Domains. In this class we include domain
assigned a low reputation score. names related to CDNs other than Akamai. For ex-
ample, we collect domain names under the follow-
• Reputation Function: for each domain name di , i = 1..n, ing zones:,,,
in Notos’ knowledge base, we test it against the trained, and We chose not
network profiles model and domain name clusters. Let to aggregate these CDN domains and Akamai’s domains
N M (di ) and DC(di ) be the output of the Network Pro- in one class, since we observed that Akamai’s domains
files (NP) module and the Domain Clusters (DC) mod- have a very unique network profile, as we discuss in Sec-
ule, respectively. The reputation function takes in input tion 4. Therefore, learning two separate models for the
N M (di ), DC(di ), and information about whether di and classes of Akamai Domains and CDN Domains allows
its resolved IPs A(di ) are known to be legitimate, suspi- use to achieve better classification accuracy during the
cious, or malicious (i.e., if they appeared in a domain on-line mode, compared to learning only one model for
name or IP blacklist), and builds a model that can assign both classes (see Section 3.3.5).
a reputation score between zero and one to d. A repu-
tation score close to zero signifies that d is a malicious • Dynamic DNS Domains. This class includes a large set
domain name while a score close to one signifies that d of domain names registered under two of the largest dy-
is benign. namic DNS providers, namely No-IP ( and
DynDNS (
We now describe each module in detail.
For each class of domains, we train a statistical classifier
to distinguish between one of the classes and all the others.
3.3.2 Modeling Network Profiles
Therefore, we train five different classifiers. For example,
During the off-line training mode, the reputation engine builds we train a classifier that can distinguish between the class of
a model of well-known network behaviors. An overview of the Popular Domains and all other classes of domains. That is,
network profile modeling module can be seen in Figure 4(a). given a new domain name d, this classifier is able to recog-
In practice we select five sets of domain names that share simi- nize whether d’s network profile looks like the profile of a
lar characteristics, and learn their network profiles. For exam- well-known popular domain or not. Following the same logic,
ple, we identify a set of domain names related to very popular we can recognize network profiles for the other classes of do-
websites (e.g.,,, and for mains.
each of the related domain names we extract their network fea-
tures, as explained in Section 3.2.1. We then use the extracted 3.3.3 Building Domain Name Clusters
feature vectors to train a statistical classifier that will be able
to recognize whether a new domain name d has network char- In this phase, the reputation engine takes the domain names
acteristics similar to the popular websites we modeled. collected in our pDNS database during a training period, and
In our current implementation of Notos we model the fol- builds clusters of domains that share similar network and zone
lowing classes of domain names: based features. The overview of this module can be seen
in Figure 4(b). We perform clustering in two steps. In the
• Popular Domains. This class consists of a large first step we only use the network-based features to create
set of domain names under the following DNS coarse-grained clusters. Then, in the second step, we split
zones:,,,, each coarse-grained cluster into finer clusters using only the,,, and zone-based features, as shown in Figure 5.

• Common Domains. This class of domains includes do-

Network-based Clustering The objective of network-based
main names under the top one hundred zones, accord-
clustering is to group domains that share similar levels of
ing to We exclude from this group all the
agility. This creates separate clusters of domains with “sta-
domain names already included in the Popular Domains
ble” network characteristics and “non-stable” networks (like
class (which we model separately).
CDNs and malicious flux networks).
• Akamai Domains. Akamai is a large content deliv-
ery network (CDN), and the domain names related to Zone-based Clustering After clustering the domain names
this CDN have very peculiar network characteristics. To according to their network-based features, we further split the
model the network profile of Akamai’s domain names, network-based clusters of domain names into finer groups.
we collect a set of domains under the following zones: In this step, we group domain names that are in the same
Figure 5. Network & zone based clustering process in Notos, in the case of an Akamai [A] and a malicious [B] domain name.

[Figure 6 diagram: for a domain name d, the evidence features (F1 ... F6), zone features (F1 ... F5) and network features (F1 ... F5) yield EV(d), DC(d) via the Domain Clustering Module, and NM(d) via the Network Profiling Module; the Reputation Function in the Reputation Engine takes F1 ... F16 and outputs the score S.]

Figure 6. The output from the network profiling module, the domain clustering module and the evidence vector will assist the reputation function to assign the reputation score to the domain d.

network-based cluster and also share similar zone-based when we consider the set of RHDNs for d1 and d2 , we can
features. To better understand how the zone-based clustering notice that the zone-based features of d1 are much more “sta-
works, consider the following examples of zone-based clus- ble” than the zone-based features of d2 . In other words, while
ters: the RHDNs of d1 share strong domain name similarities (e.g.,
they all share the substring “akamai”) and have low variance of
Cluster 1: the string features (see Section 3.2.2), the strong zone agility
properties of d2 affect the zone-based features measured on,,,,
d2 ’s RHDNs and make d2 look very different from d1 .,,,,
One of the main advantages of Notos is the reliable as-,,,,
signment of low reputation scores to domain names partici-,,,,
pating in “agile” malicious campaigns. Less agile malicious, ... campaigns, e.g., Fake AVs campaigns may use domain names
structured to resemble CDN related domains. Such strate-
Cluster 2: gies would not be beneficial for the FakeAV campaign, since
domains like,,
...,,,,,,, etc., can be trivially blocked by using simple regular expres-,,,,, sions [16]. In other words, the attackers need to introduce,,,,,,, more “agility” at both the network and domain name level in,,,, ... order to avoid simple domain name blacklisting. Notos would
only require a few labeled domain names belonging to the ma-
Each element of the cluster is a domain name - IP ad- licious campaign for training purposes, and the reputation en-
dress pair. These two groups of domains belonged to the gine would then generalize to assign a low reputation score to
same network cluster, but were separated into two different the remaining (previously unknown) domain names that be-
clusters by the zone-based clustering phase. Cluster 1 con- long to the same malicious campaign.
tains domain names belonging to Akamai’s CDN, while the
domains in Cluster 2 are all related to malicious websites that 3.3.4 Building the Reputation Function
distribute malicious software. The two clusters of domains
share similar network characteristics, but have significantly Once we build a model of well-known network profiles (see
different zone-based features. For example, consider domain Section 3.3.2) and the domain clusters (see Section 3.3.3), we
names d1 =“” from the first cluster, and can build the reputation function. The reputation function will
d2 =“” from the second cluster. The reason why d1 and assign a reputation score in the interval [0, 1] to domain names,
d2 were clustered in the same network-based cluster is that with 0 meaning low reputation (i.e., likely malicious) and 1
the set of RHIPs (see Section 3.1) for d1 and d2 have similar meaning high reputation (i.e., likely legitimate). We imple-
characteristics. In particular, the network agility properties of ment our reputation function as a statistical classifier. In order
d2 make it look as if it were part of a large CDN. However, to train the reputation function, we consider all the domain
names di , i = 1, .., n in Notos’ knowledge base, and we feed in Section 3.3.2, and Spam Domains, Flux Domains, and Mal-
each domain di to the network profiles module and to the do- ware Domains.
main clusters module to compute two output vectors N M (di ) In order to compute the output vector DC(d), we compute
and DC(di ), respectively. We explain the details of how the following five statistical features: the majority class label
N M (di ) and DC(di ) are computed later in Section 3.3.5. For L (e.g., L may be equal to Malware Domain), i.e., the label
now it is sufficient to consider NM(di) and DC(di) as two fea- that appears the most among the vectors vi ∈ Vd ; the stan-
ture vectors. For each di we also compute an evidence fea- dard deviation of label frequencies, i.e., given the occurrence
tures vector EV(di), as described in Section 3.2.3. Let v(di) frequency of each label among the vectors vi ∈ Vd we com-
be a feature vector that combines the NM(di), DC(di), and pute their standard deviation; given the subset Vd(L) ⊆ Vd of
EV(di) feature vectors. We train the reputation function us- vectors in Vd that are associated with label L, we compute
ing the labeled dataset L = {(v(di), yi)}i=1..n , where yi = 0 the mean, median and standard deviation of the distribution
if di is a known malicious domain name, otherwise yi = 1. of distances between zd and the vectors vj ∈ Vd(L) .
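
As a rough illustration of how the five DC(d) features could be computed, the following Python sketch selects Vd from the closest zone-based cluster (vectors that are within radius R of zd and also among its k nearest neighbors) and then summarizes it. The function name dc_features, the (vector, label) input layout, the use of the population standard deviation, and the choice to compute label frequencies only over labels that actually occur in Vd are assumptions made for this example; they are not taken from the Notos implementation. Euclidean distance is used because the evaluation refers to Euclidean distances between vectors.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dc_features(z_d, cluster, radius, k):
    """Sketch of the DC(d) feature computation (hypothetical helper).

    z_d     -- zone-based feature vector of the domain d
    cluster -- list of (vector, label) pairs from the closest cluster C_d
    radius  -- predefined radius R
    k       -- number of nearest neighbors
    Returns (majority label, std of label frequencies, mean, median and
    std of distances to the majority-label vectors), or None if V_d is empty.
    """
    # Distance from z_d to every labeled vector, nearest first.
    scored = sorted(
        ((euclidean(z_d, v), label) for v, label in cluster),
        key=lambda pair: pair[0],
    )
    # V_d: among the k nearest neighbors AND within the radius R.
    v_d = [(dist, label) for dist, label in scored[:k] if dist < radius]
    if not v_d:
        return None

    counts = Counter(label for _, label in v_d)
    majority, _ = counts.most_common(1)[0]

    # Assumption: frequencies are taken over the labels present in V_d.
    freqs = [count / len(v_d) for count in counts.values()]

    # Distances between z_d and the vectors carrying the majority label.
    dists = [dist for dist, label in v_d if label == majority]
    return (majority, pstdev(freqs), mean(dists), median(dists), pstdev(dists))


if __name__ == "__main__":
    toy_cluster = [
        ((0.0, 0.0), "Malware"),
        ((3.0, 4.0), "Malware"),
        ((0.0, 1.0), "Akamai"),
        ((100.0, 0.0), "Akamai"),
    ]
    print(dc_features((0.0, 0.0), toy_cluster, radius=10.0, k=3))
```

The majority label is categorical, so a real classifier input would need it encoded numerically; that step is left out of the sketch.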

3.3.5 On-Line Mode 3.3.6 Assigning Reputation Scores

After training is complete, the reputation engine can be used Given a domain d, once we compute the vectors NM(d) and
in on-line mode (Figure 3) to assign a reputation score to new DC(d) as explained above, we also compute the evidence
domain names. For example, given an input domain name vector EV (d) as explained in Section 3.2.3. At this point, we
d, the reputation engine computes a score S ∈ [0, 1]. Val- concatenate these three feature vectors into a sixteen dimen-
ues of S close to zero mean that d appears to be related to sional feature vector v(d), and we feed v(d) in input to our
malicious activities and therefore has a low reputation. On trained reputation function (see Section 3.3.4). The reputa-
the other hand, values of S close to one signify that d ap- tion function computes a score S = 1 − f (d), where f (d) can
pears to be associated with benign Internet services, and there- be interpreted as the probability that d is a malicious domain
fore has a high reputation. The reputation score is computed name. S varies in the [0, 1] interval, and the lower the value of
as follows. First, d is fed into the network profiles module, S, the lower d’s reputation.
which consists of five statistical classifiers, as discussed in
Section 3.3.2. The output of the network profiles module is 4 Data Collection and Analysis
a vector N M (d) = {c1 , c2 , ..., c5 }, where c1 is the output of
the first classifier, and can be viewed as the probability that This section summarizes observations from passive DNS
d belongs to the class of Popular Domains, c2 is the proba- measurements, and how professional, legitimate DNS services
bility that d belongs to the class of Common Domains, etc. are distinguished from malicious services. These observations
At the same time, d is fed into the domain clusters module, provided the ground truth for our dynamic domain name rep-
which computes a vector DC(d) = {l1 , l2 , ..., l5 }. The ele- utation system. We also provide an intuitive example to illus-
ments li of this vector are computed as follows. Given d, we trate these properties, using a few major Internet zones like
first extract its network-based features and identify the closest Akamai and Google.
network-based cluster to d, among the network-based clusters
computed by the domain clusters module during the off-line 4.1 Data Collection
mode (see Section 3.3.3). Then, we extract the zone-based
statistical features and identify the zone-based cluster closest The basic building block for our dynamic reputation rating
to d. Let this closest domain cluster be Cd . At this point, we system is the historical or “passive” information from success-
consider all the zone-based feature vectors vj ∈ Cd , and we ful A-type DNS resolutions. We use the DNS traffic from
select the subset of vectors Vd ⊆ Cd for which the two fol- two ISP-based sensors, one located on the US east coast (At-
lowing conditions are verified: i) dist(zd , vj ) < R, where zd lanta) and one located on the US west coast (San Jose). Addi-
is the zone-based feature vector for d, and R is a predefined tionally we use the aggregated DNS traffic from the different
radius; ii) vj ∈ KN N (zd ), where KN N (zd ) is the set of k networks covered by the SIE [3]. In total, our database col-
nearest-neighbors of zd . lected 27,377,461 unique resolutions from all these sources
The feature vectors in Vd are related to domain names ex- over a period of 68 days, from 19th of July 2009 to 24th
tracted from Notos’ knowledge base. Therefore, we can assign September 2009.
a label to each vector vi ∈ Vd , according to the nature of the Simple measurements performed on this large data set
domain name d from which vi was computed. The domains in demonstrate a few important properties leveraged by our se-
Notos’ knowledge base belong to different classes. In particu- lected features. After just a few days the rate of new, unique
lar, we distinguish between eight different classes of domains, pDNS entries leveled off. The graph in Figure 7(b) shows
namely Popular Domains, Common Domains, Akamai, CDN, only about 100,000 to 150,000 new domains/day (with a brief
and Dynamic DNS, which have the same meaning as explained outage issue on the 53rd day), despite very large numbers of
[Figure 7 panels: (a) Unique RRs in the two ISP sensors (per day); (b) New RRs growth in pDNS DB for all zones; (c) Akamai class growth over time (days); (d) CDN class growth over time (days); (e) Pop class growth over time (days); (f) Dyn. DNS class growth over time (days); (g) Common class growth over time (days); (h) CDF of RR growth for all classes. Panels (c) through (g) plot unique domain names, unique IPs and new RRs.]

Figure 7. Various RRs growth trends observed in the pDNS DB over a period of 68 days

RRs arriving each day (shown in Figure 7(a)). This suggests the domain, instead of the URI), which explains the growth
that most RRs are duplicates and that, after approximately the first in domains shown in Figure 7(e). These popular sites use a
few days, on average 94.7% of the unique RRs ob- very small number of IPs, however, and after a few weeks of
served on a daily basis at the sensor level are already recorded by training our pDNS database identified all of them. Since these
the passive DNS database. Therefore, even a relatively small popular domains make up a large portion of traffic in any trace,
pDNS database may be used to deploy Notos. In Section 5, we our intuition is that simple whitelisting would significantly re-
measure the sensitivity of our system to traffic collected from duce the workload of a classifier.
smaller networks. Figure 7(f) shows the rate of pDNS growth for zones in
The remaining plots in Figure 7 show the daily growth of Dynamic DNS providers. These services, sometimes used by
our passive DNS database, from the point of view of five dif- botmasters, demonstrate a nearly matched ratio of new IPs to
ferent zone classes. Figure 7(c) and (d) show the growth rate new domains. The data excludes non-routable answers (e.g.,
associated with CDN networks (Akamai, and all other CDNs). dynamic DNS domains pointing to, since this con-
The number of unique IPs stays nearly constant with the num- tains no unique network information. Intuitively, one can think
ber of unique domains (meaning that each new RR is a new of dynamic DNS as a nearly complete bijection of domains to
IP and a new child domain of the CDN). In a few weeks, most IPs. Figure 7(g) shows the growth of RRs for
of the IPs became known, suggesting that one can fully map top 100 domains. Unlike dynamic DNS domains, these point
CDNs in a modest training set. This is because CDNs, al- to a small set of unique addresses, and most can be identified
though large, always have a fixed number of IP addresses used in a few weeks’ worth of training.
for hosting their high-availability services. Intuitively, we be- A comparison of all the zone classes appears in Figure 7(h),
lieve this would not be the case with malicious CDNs (e.g., which shows the cumulative distribution of the unique RRs de-
flux networks), which use randomly spreading infections to tailed in Figure 7(c) through (g). The different rates of change
continually recruit new IPs. illustrate how each zone class has a distinct pattern of RR use:
The ratio of new IPs to domains diverges in Figure 7(e), some have a small IP space and highly variable domain names;
a plot of the rate of newly discovered RRs for popular web- some pair nearly every new domain with a new IP. Learning
sites (e.g., Google, Facebook). Facebook notably uses unique approximately 90% of all the unique RRs in each zone class,
child domains for their Web-based chat client, and other top however, only requires (at most) tens of thousands of distinct
Internet sites use similar strategies (encoding information in RRs. The intuition from this plot is that, despite the very large
data set we used in our study, Notos could potentially work ecution. After excluding all domain names that belong to the
with data observed from much smaller networks. top 500 most popular zones, we assemble the
main corpus of our “honeypot data”. We automated the crawl-
4.2 Building The Ground Truth ing and collection of black list information and honeypot exe-
To establish ground truth, we use two different labeling The reader should note that we chose to label our data in
processes. First, we assigned labels to RRs at the time of their as transparent a way as possible. We used public blacklisting
discovery. This provided an initial static label for many do- information to label our training dataset before we build our
mains. Blacklists, of course, are never complete and always models and train the reputation function. Then we assigned
dynamic. So our second labeling process took place during the reputation scores and validated the results again using the
evaluation, and monitored several well-known domain black- same publicly available blacklist sources. It is safe to as-
lists and whitelists. sume that private IP and DNS blacklists will contain significantly
The data we used for labeling came from several sources. more complete information with lower FP rates than the public
Our primary source of blacklisting came from services blacklists. By using such private blacklists, the accuracy
such as and malwaredo- of Notos’ reputation function should improve significantly. In order to label IP addresses in our pDNS
database we also used the Spamhaus Block List (SBL) from 5 Results
Spamhaus [18]. Such IPs are either known to send spam or
distribute malware. We also collected domain name and IP In this section, we present the experimental results of our
blacklisting information from the Zeus tracker [30]. All this evaluation. We show that Notos can identify malicious domain
blacklisting information was gathered before the first day of names sooner than public blacklists, with a low false posi-
August 2009 (during all the 15 days in which we collected tive rate (FP%) of 0.38% and high true positive rate (TP%)
passive DNS data). Since blacklists traditionally lag behind of 96.8%. As a first step, we computed vectors based on
the active threat, we continued to collect all new data until the the statistical features (described in Section 3.2) from 250,000
end of our experiments. unique RRs. This volume corresponds to the average volume
Our limited whitelisting was derived from the top 500- of new – previously unseen – RRs observed at two recursive domain names, as of the 1st of August 2009. We DNS servers in a major ISP in one day, as noted in Section 4,
reasoned that, although some malicious domains become pop- Figure 7(b). These vectors were computed based on historic
ular, they do not stay popular (because of remediation), and passive DNS information from the last two weeks of DNS traf-
never break into the top tier of domain rankings. Likewise, we fic observed on the same two ISP recursive resolvers in Atlanta
used a list of the 18 most common 2LDs from various CDNs, and San Jose.
which composed the main corpus of our CDN labeled RRs.
Finally a list of 464 dynamic DNS second level domains al- 5.1 Accuracy of Network Profile Modeling
lowed us to identify and label domain name and IPs coming
from zones under dynamic DNS providers. We label our eval- The accuracy of the Meta-Classification system (Fig-
uation (or testing) data-set by aggregating updated blacklist ure 4(a)) in the network profile module is critical for the over-
information for new malicious domain names and IPs from all performance of Notos. This is because, in the on-line mode,
the same lists. Notos will receive unlabeled vectors which must be classified
To compute the honeypot features (presented in Sec- and correlated with what is already present in our knowledge
tion 3.2.3) we need a malware analysis infrastructure that can base. For example, if the classifier receives a new RR and as-
process as many “new” malware samples as possible. Our signs to it the label Akamai with very high confidence, that
honeypot infrastructure is similar to “Ether” [4] and is capa- implies the RR which produced this vector will be part of a
ble of processing malware samples in a queue. Every malware network similar to Akamai. However, this does not necessar-
sample was analyzed in a controlled environment for a time ily mean that it is part of the actual Akamai CDN. We will see
period of five minutes. This process was repeated during the in the next section how we can draw conclusions based on the
last 15 days of July 2009. After 15 days of executions we proximity between labeled and unlabeled RRs within the same
obtained a set of successful DNS resolutions (domain names zone-based clusters. Furthermore, we discuss the accuracy
and IPs) that each malware looked up. We chose to execute of the Meta-Classifier when modeling each different network
malware and collect DNS evidence through the same period profile class (profile classes are described in Section 3.3.2).
of time in which we aggregate the passive DNS database. Our Our Meta-Classifier consists of five different classifiers,
virtual machines are equipped with five popular commercial one for each different class of domains we model. We chose to
anti-virus engines. If one of the engines identifies an exe- use a Meta-Classification system instead of a traditional sin-
cutable as malicious, we capture all domain names and the gle classification approach because Meta-Classification sys-
corresponding IP mappings that the malware used during ex- tems typically perform better than a single statistical classi-
[Figures 8 and 9: ROC plots of False Positive Rate (x-axis) vs True Positive Rate (y-axis), each with an inset of TP over All Pos. vs Threshold. The Figure 8 legend includes Akamai, Popular, Common and Dynamic.]

Figure 8. ROC curves for all network profile classes show the Meta-Classifier’s accuracy.

Figure 9. The ROC curve from the reputation function indicating the high accuracy of Notos.

fier [11, 2]. Throughout our experiments this proved to be 5.2 Network and Zone-Based Clustering Results
also true. The ROC curve in Figure 8 shows that the Meta- In the domain name clustering process (Section 3.3.3, Fig-
Classifier can accurately classify RRs for all different network In the domain name clustering process (Section 3.3.3, Fig-
profile classes. ure 4(b)) we used X-Means clustering in series, once for the
The training dataset for the Meta-Classifier is composed network-based clustering and again for the zone-based clus-
of sets of 2,000 vectors from each of the five network profile tering. In both steps we set the minimum and maximum num-
classes. The evaluation dataset is composed of 10,000 vectors, ber of clusters to one and the total number of vectors in our
2,000 from each of the five network profile classes. The classi- dataset, respectively. We run these two steps using different
fication results for the domains in the Akamai, CDN, dynamic numbers of zone and network vectors. Figure 11 shows that
DNS and Popular classes showed that the supervised learn- after the first 100,000 vectors are used, the number of network
ing process in Notos is accurate, with the exception of a small and zone clusters remains fairly stable. This means that by
number of false positives related to the Common class (3.8%). computing at least 100,000 network and zone vectors—using
After manually analyzing these false positives, we concluded a 15-day old passive DNS database—we can obtain a stable
that some level of confusion between the vectors produced by population of zone and network based clusters for the moni-
Dynamic DNS domain names and the vectors produced by tored network. We should note that reaching this network and
domain names in the Common class still remains. However, cluster equilibrium does not imply that we do not expect to
this minor misclassification between network profiles does not see any new type of domain names in the ISP’s DNS recur-
significantly affect the reputation function. This is because sive. This just denotes that based on the RRs present in our
the zone profiles of the Common and Dynamic DNS domain passive DNS database, and the daily traffic at the ISP’s recur-
names are significantly different. This difference in the zone sive, 100,000 vectors are enough to reflect the major network
profiles will drive the network-based and zone-based cluster- profile trends in the monitored networks. Figure 11 indicates
ing steps to group the RRs from Dynamic DNS class and Com- that a sample set of 100,000 vectors may represent the major
mon class in different zone-based clusters. trends in a DNS sensor. It is hard to safely estimate the exact
minimum number of unique RRs that is sufficient to identify
Despite the fact that the network profile modeling process all major DNS trends. An answer to this should be based upon
provides accurate results, it doesn’t mean this step can inde- the type, size and utilization of the monitored network. With-
pendently designate a domain as benign or malicious. The out data from smaller corporate networks it is difficult for us
clustering steps will assist Notos to group vectors not only to make a safe assessment about the minimum number of RRs
based on their network profiles but also based on their zone prop- necessary for reliably training Notos.
erties. In the following section we show how the network and The evaluation dataset we used consisted of 250,000 unique
zone profile clustering modules can better associate similar domain names and IPs. The cluster overview is shown in Fig-
vectors, due to properties of their domain name structure. ure 10 and in the following paragraphs we discuss some in-
[Figure 11 plot: Number of Vectors Used (x-axis, 0 to 250,000) vs Number of Clusters Produced (y-axis), for 1st Level (Network Based) Clusters and 2nd Level (Zone Based) Clusters.]

Figure 10. With the 2-step clustering step, Notos is able to cluster large trends of DNS behavior.

Figure 11. By using different numbers of network and zone vectors we observe that after the first 100,000, there is no significant variation in the absolute number of produced clusters during the 1st and 2nd level clustering steps.

teresting observations that can be made from these network- of these pilot experiments, we decided to set k equal to 50 and
based and zone-based cluster assignments. As an example, the radius distance equal to 100.
network clusters 0 and 1 are predominantly composed of zones Figures 12 and 13 show the effect of this radius selection
participating in fraudulent activities like spam campaigns (yel- on two different types of clustering problems. In Figure 12,
low) and malware dropping or C&C zones (red). On the other unknown RRs for are clustered with a
hand, network clusters 2 to 5 contain Akamai, dynamic DNS, labeled vector As noted in Section 4, CDNs
and popular zones like Google, all labeled as benign (green). such as Akamai tended to have new domain names with each
We included the unlabeled vectors (blue) based on which we RR, but to also reuse their IPs. By training with only a small
evaluated the accuracy of our reputation function. We have a set of labeled RRs, our classifier put the new,
sample of unlabeled vectors in almost all network and zone unknown RRs for into the existing Aka-
clusters. We will see how already labeled vectors will assist mai class. IP-specific features therefore brought the new RRs
us to characterize the unlabeled vectors in close proximity. close to the existing labeled class. Figure 12 compresses all
Before we describe two sample cases of dynamic charac- of the dimensions into a two-dimensional plot (for easier vi-
terization within zone-based clusters, we need to discuss our sual representation), but it is clear the unknown RRs were all
radius R and k value selection (see Section 3.3.5). In Sec- within a distance of 100 to the labeled set.
tion 3.3.5, we discuss how we build domain name clusters. This result validates the design used in Section 4, where
At that point we introduced the dynamic characterization pro- just a few weeks’ worth of labeled data was necessary for
cess that gives Notos the ability to utilize already labeled vectors training. Thus, one does not have to exhaustively discover all
in order to characterize a newly obtained unlabeled vector by whitelisted domains. Notos is resilient to changes in the zone
leveraging our prior knowledge. After looking into the distri- classes we selected. Services like CDNs and major web sites
bution of Euclidean distances between unlabeled and labeled can add new IPs or adjust domain formats, and these will be
vectors within the same zone clusters, we concluded that in the automatically associated with a known labeled class.
majority of these cases the distances were between 0 and 1000. The ability of Notos to associate new RRs based on lim-
We tested different values of the radius R and the value of k ited labeled inputs is demonstrated again in Figure 13. In
for the K-nearest neighbors (KNN) algorithm. We observed this case, labeled Zeus domains (approximately 2,900 RRs
that the experiments with radius values between 50 and 200 from three different Zeus-related BLs) were used to clas-
provided the most accurate reputation rating results, which we sify new RRs. Figure 13 plots the distance between the la-
describe in the following sections. We also observed that if beled Zeus-related RRs and new (previously unknown) RRs
k > 25 the accuracy of the reputation function is not affected that are also related to Zeus botnets. As we can see from
for all radius values between 50 and 200. Based on the results Section 4, most of the new (unlabeled) Zeus RRs lie very
[Figures 12 and 13: two-dimensional scatter plots with axes CMD Scale (1) and CMD Scale (2). The Figure 13 panel is titled "Clustering The Zeus Botnet" and its legend lists Labeled Zeus and Unlabeled Zeus.]

Figure 12. An example of characterizing the aka[...] unknown vectors as benign based on the already labeled vectors ([...]) present in the same cluster.

Figure 13. An example of how the Zeus botnet clusters during our experiments. All vectors are in the same network cluster and in two different zone clusters.

close to, and often even overlap with, known Zeus RRs. This names. We experimented with both the top 10,000 and top
is a good result, because Zeus botnets are notoriously hard 100,000 Alexa domain names. The detection results for these
to track, given the botnet’s extreme agility. Tracking sys- experiments are as follows. When using the top 10,000 Alexa
tems such as and malware- domains, we obtained a true positive rate of 93.6% and a false have limited visibility into the botnet, positive rate of 0.4% (again using 10-fold cross-validation and
and often produce disjoint blacklists. Notos addresses this a detection threshold equal to 0.5). As we can see, these results
problem, by leveraging a limited amount of training data to are not very different from the ones we obtained using only
correctly classify new RRs. During our evaluation set, Notos the top 500 Alexa domains. However, when we extended our
correctly detected 685 new (previously unknown) Zeus RRs. list of known good domains to include the top 100,000 Alexa
domain names, we observed a significant decrease of the true
5.3 Accuracy of the Reputation Function positive rate and an increase in the false positives. Specifically,
we obtained a TP% of 80.6% and a FP% of 0.6%. We believe
The first thing that we address in this section is our deci- this degradation in accuracy may be due to the fact that the
sion to use a Decision Tree using Logit-Boost strategy (LAD) top 100,000 Alexa domains include not only professionally
as the reputation function. Our decision is motivated by the run domains and network infrastructures, but also include less
time complexity, the detection results and the precision (true good domain names, such as file-sharing, porn-related web-
positives over all positives) of the classifier. We compared sites, etc., most of which are not run in a professional way and
the LAD classifier to several other statistical classifiers using have disputable reputation1 .
a typical model selection procedure [6]. LAD was found to We also wanted to evaluate how well Notos performs, com-
provide the most accurate results in the shortest training time pared to static blacklists. To this end, we performed a number
for building the reputation function. As we can see from the of experiments as follows. Given an instance of Notos trained
ROC curve in Figure 9, the LAD classifier exhibits a low false with data collected up to July 31, 2009, we fed Notos with
positive rate (FP%) of 0.38% and true positive rate (TP%) of 250,000 distinct RRs found in DNS traffic we collected on
96.8%. It is was noting that these results were obtained using August 1, 2009. We then computed the reputation score for
10-fold cross-validation, and the detection threshold was set each of these RRs. First, we set the detection threshold to 0.5,
to 0.5. The dataset using for the evaluation contained 10,719 and with this threshold we identified 54,790 RRs that had a
RRs related to 9,530 known bad domains. The list of known low reputation (lower than the threshold). These RRs where
good domains consisted of the top 500 most popular domains
1 A quick analysis of the top 100,000 Alexa domains reported that about
according to Alexa.
5% of the domains appeared in the SURBL ( blacklist, at
We also benchmarked the reputation function on other two certain point in time. A more rigorous evaluation of these results is left to
datasets containing a larger number of known good domain future work.
related to a total of 10,294 distinct domain names (notice that a domain name may map to more than one IP, and this explains the higher number of RRs). Of these 10,294 domains, 7,984 (77.6%) appeared in at least one of the public blacklists we used for comparison (see Section 4) within 60 days after August 1, and were therefore confirmed to be malicious. Figure 14(a) reports the number and date in which RRs classified as having low reputation by Notos appeared in the public blacklists. The remaining three plots (Figure 14(b), (c) and (d)) report the same results organized according to the type of malicious domains. In particular, it is worth noting that Notos is able to detect never-before-seen domain names related to the Zeus botnet several days or even weeks before they appeared in any of the public blacklists.

[Figure 14 plots: (a) Overall Volume of Malicious RRs; (b) Flux and Spam RRs Identified; (c) Malware/Trojans, Exploits and Rogue AV RRs Identified; (d) Botnet RRs Identified. Axes: Volume Of RRs versus Days After Training. Legend entries: Rogue AV, Zeus, R.F.I, Flux, Spam, Koobface, Various Bots.]

Figure 14. Dates in which various blacklists confirmed that the RRs were malicious after Notos assigned low reputation to them on the 1st of August.

For the remaining 22.4% of the 10,294 domains we considered, we were not able to draw a definitive conclusion. However, we believe many of those domains are involved in some kind of more or less malicious activities. We also noticed that 7,980 of the 7,984 confirmed bad domain names were assigned a reputation score lower than or equal to 0.15, and that none of the other non-confirmed suspicious domains received a score lower than this threshold. In practice, this means that an operator who would like to use Notos as a stand-alone dynamic blacklisting system while limiting the false positives to a negligible (or even zero) amount may fine-tune the detection threshold and set it around 0.15.

5.4 Discussion

This section discusses the limits of Notos, and the potential for evasion in real networks. One of the main limitations is the fact that Notos is unable to assign reputation scores for domain names with very little historic (passive DNS) information. Sufficient time and a relatively large passive DNS collection are required to create an accurate passive DNS database. Therefore, if an attacker always buys new domain names and new address space, and never reuses either resource for any other malicious purposes, Notos will not be able to accurately assign a reputation score to the new domains. In the IPv4 space, this is very unlikely to happen due to the impending exhaustion of the available address space. Once IPv6 becomes the predominant protocol, however, this may represent a problem for the statistical features we extract based on IP granularity. However, we believe the features based on BGP prefixes and AS numbers would still be able to capture the agility typical of malicious DNS hosting behavior.

As long as newly generated domain names share some network properties (e.g., IPs or BGP prefixes) with already labeled RRs, Notos will be able to assign an accurate reputation score. In particular, since network resources are finite and more expensive to renew or change, even if the domain properties change, Notos can still identify whether a domain name may be associated with malicious behavior. In addition, if a given domain name for which we want to know the reputation is not present in the passive DNS DB, we can actively probe it, thus forcing a related passive DNS entry. However, this is possible only when the domain successfully maps to a non-empty set of IPs.

Our experimental results using the top 10,000 Alexa domain names as known good domains report a false positive rate of 0.4%. While low in percentage, the absolute number of false positives may become significant in those cases in which very large numbers of new domain names are fed to Notos on a daily basis (e.g., in case of deployment in a large ISP network). However, we envision our Notos reputation system to be used not as a stand-alone system, but rather in cooperation with other defense mechanisms. For example, Notos may be used in collaboration with a spam-filtering system. If an email contains a link to a website whose domain name has a low reputation score according to Notos, the spam filter can increase the total spam-score of the email. However, if the rest of the email appears to be benign, the spam filter may still decide to accept the email.

During our manual analysis of (a subset of) the false positives encountered in our evaluations we were able to draw some interesting observations. We found that a number of legitimate sites are being hosted in networks that host large volumes of malicious domain names. In these cases Notos will tend to penalize the reputation of these legitimate domains because they reside in a bad neighborhood. In time, the reputation score assigned to these domains may change, if the administrators of the network in which the benign domain names are hosted take actions to “clean up” their networks and stop hosting bad domain names within their address space.
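The trade-off between the detection threshold and the volume of false positives discussed above can be illustrated with a minimal Python sketch. It assumes that each evaluated RR carries a reputation score in [0, 1] and a ground-truth label, and that an RR is flagged when its score is lower than the threshold, as in the experiments. The function names, the toy scores, and the daily volume of one million benign domains are hypothetical and are not taken from our implementation or data.

```python
from typing import Iterable, Tuple


def rates_at_threshold(
    scored: Iterable[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (TP%, FP%) when every score below `threshold` is flagged.

    `scored` holds (reputation_score, is_known_bad) pairs. The labels are
    assumed to come from a ground-truth list of known good and bad domains.
    """
    tp = fp = pos = neg = 0
    for score, is_bad in scored:
        flagged = score < threshold  # low reputation means suspicious
        if is_bad:
            pos += 1
            tp += int(flagged)
        else:
            neg += 1
            fp += int(flagged)
    tp_rate = 100.0 * tp / pos if pos else 0.0
    fp_rate = 100.0 * fp / neg if neg else 0.0
    return tp_rate, fp_rate


def expected_false_positives(benign_per_day: int, fp_rate_percent: float) -> float:
    """Expected number of benign domains flagged per day at a given FP%."""
    return benign_per_day * fp_rate_percent / 100.0


if __name__ == "__main__":
    # Toy data (hypothetical): three known bad and four known good domains.
    toy = [(0.05, True), (0.10, True), (0.60, True),
           (0.90, False), (0.40, False), (0.95, False), (0.99, False)]
    for t in (0.5, 0.15):
        tp_rate, fp_rate = rates_at_threshold(toy, t)
        print(f"threshold={t}: TP%={tp_rate:.1f} FP%={fp_rate:.1f}")
    # Hypothetical volume: one million benign domains per day at FP% = 0.4.
    print(expected_false_positives(1_000_000, 0.4))
```

On the toy data, lowering the threshold from 0.5 to 0.15 removes the single false positive without losing any detection, which mirrors the stricter operating point an operator might choose for stand-alone use. The last line shows why a small FP% can still translate into thousands of flagged benign domains per day at ISP scale.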
Table 1. Sample cases from Zeus domains detected by Notos and the corresponding days that appeared in the public BLs. All evidence information in this table was harvested from

Columns: Domain Name, IP, Date (Domain Name and IP entries are missing).
Date: 08-15, 08-15, 08-15, 08-15, 08-15, 08-15, 08-15, 08-19, 08-19, 09-02, 09-19, 09-27, 09-02

Table 2. Anecdotal cases of malicious domain names detected by Notos and the corresponding days that appeared in the public BLs. [1]: , [2]: , [3] , [4] , [5] , [6] malwaredo-

Columns: Domain Name, IP, Type, Src, Date (Domain Name and IP entries are missing).
Type  Src  Date
MAL   [1]  08-26
MAL   [2]  08-30
RAV   [3]  09-05
CWS   [2]  09-05
CWS   [4]  09-05
RAV   [2]  09-05
MAL   [2]  09-09
BOT   [2]  09-13
KBF   [5]  09-19
EXP   [6]  09-22
RAV   [2]  10-06
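The lead time that the dates in Tables 1 and 2 suggest can be computed with a short Python sketch. It assumes that the table dates are month-day values in 2009 and that the RRs were scored on August 1, 2009, as in the blacklist comparison experiment. Both points are assumptions about how to read the tables, and the function name is hypothetical.

```python
from datetime import date

# Day on which the RRs were scored, assumed from the blacklist comparison.
SCORING_DATE = date(2009, 8, 1)


def lead_time_days(blacklist_mm_dd: str, year: int = 2009) -> int:
    """Days between the scoring date and a blacklist date given as 'MM-DD'."""
    month, day = (int(part) for part in blacklist_mm_dd.split("-"))
    return (date(year, month, day) - SCORING_DATE).days


if __name__ == "__main__":
    for listed in ("08-15", "08-26", "10-06"):
        print(listed, lead_time_days(listed))
```

Under these assumptions, a domain listed on 08-15 was flagged two weeks before any public blacklist confirmed it, and one listed on 10-06 more than two months before.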

6 Conclusion

In this paper, we presented Notos, a dynamic reputation system for DNS. To the best of our knowledge, Notos is the first system that can assign a dynamic reputation score to any domain name in a DNS query that traverses the edge of a monitored network. Notos harvests information from multiple sources such as the DNS zone a domain name belongs to, the related IP addresses, BGP prefixes, AS information and honeypot analysis to maintain up-to-date DNS information about legitimate and malicious domain names. Based on this information, Notos uses automated classification and clustering algorithms to model network and zone behaviors of legitimate and malicious domains, and then applies these models to compute a reputation score for a (new) domain name.

Our evaluation using real-world data, which includes traffic from large ISP networks, demonstrates that Notos is highly accurate in identifying new malicious domains in the monitored DNS query traffic, with a true positive rate of 96.8% and false positive rate of 0.38%. In addition, Notos is capable of identifying these malicious domains weeks or even months before they appear in public blacklists, thus enabling proactive security countermeasures against cyber attacks.

7 Acknowledgments

We thank Steven Gribble, our shepherd, for helping us to improve the quality of the final version of this paper, and the anonymous reviewers for their constructive comments. We also thank Gunter Ollmann and Robert Edmonds for their valuable comments. Additionally, we thank the Internet Systems Consortium Security Information Exchange project (SIE@ISC) for providing a portion of the DNS data used in our experiments.

This material is based upon work supported in part by the National Science Foundation under grant no. 0831300, the Department of Homeland Security under contract no. FA8750-08-2-0141, and the Office of Naval Research under grants no. N000140710907 and no. N000140911042. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Department of Homeland Security, or the Office of Naval Research.

References

[1] D. Anderson, C. Fleizach, S. Savage, and G. Voelker. Spamscatter: Characterizing internet scam hosting infrastructure. In Proceedings of the USENIX Security Symposium, 2007.
[2] L. Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.
[3] Internet Systems Consortium. SIE@ISC: Security Information Exchange, 2004.
[4] A. Dinaburg, R. Royal, M. Sharif, and W. Lee. Ether: malware analysis via hardware virtualization extensions. In ACM CCS, 2008.
[5] SORBS DNSBL. Fighting spam by finding and listing Exploitable Servers. net/, 2007.
[6] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, 2000.
[7] M. Felegyhazi, C. Kreibich, and V. Paxson. On the potential of proactive domain blacklisting. In Third USENIX LEET Workshop, 2010.
[8] S. Garera, N. Provos, M. Chew, and A. Rubin. A framework for detection and measurement of phishing attacks. In Proceedings of the ACM WORM. ACM, 2007.
[9] B. Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna. Your botnet is my botnet: analysis of a botnet takeover. In ACM CCS 09, New York, NY, USA, 2009. ACM.
[10] T. Holz, C. Gorecki, K. Rieck, and F. Freiling. Measuring and detecting fast-flux service networks. In Proceedings of NDSS, 2008.
[11] T. Hothorn and B. Lausen. Double-bagging: Combining classifiers by bootstrap aggregation. Pattern Recognition, 36(6):1303–1309, 2003.
[12] P. Mockapetris. Domain names - concepts and facilities. txt, 1987.
[13] P. Mockapetris. Domain names - implementation and specification. rfc1035.txt, 1987.
[14] OPENDNS. OpenDNS - Internet Navigation And Security, 2010.
[15] P. Porras, H. Saidi, and V. Yegneswaran. An Analysis of Conficker’s Logic and Rendezvous Points. http://, 2009.
[16] R. Perdisci, W. Lee, and N. Feamster. Behavioral clustering of http-based malware and signature generation using malicious network traces. In USENIX NSDI, 2010.
[17] D. Plonka and P. Barford. Context-aware clustering of DNS query traffic. In Proceedings of the 8th IMC, Vouliagmeni, Greece, 2008. ACM.
[18] The Spamhaus Project. ZEN - Spamhaus DNSBLs, 2004.
[19] R. Perdisci, I. Corona, D. Dagon, and W. Lee. Detecting malicious flux service networks through passive analysis of recursive DNS traces. In Proceedings of ACSAC, Honolulu, Hawaii, USA, 2009.
[20] P. Royal. Analysis of the kraken botnet. r_pubs/KrakenWhitepaper.pdf, 2008.
[21] S. Hao, N. Syed, N. Feamster, A. Gray, and S. Krasser. Detecting spammers with SNARE: Spatio-temporal network-level automatic reputation engine. In Proceedings of the USENIX Security Symposium, 2009.
[22] S. Shevchenko. Srizbi Domain Generator Calculator. srizbis-domain-calculator.html, 2008.
[23] K. Sato, K. Ishibashi, T. Toyono, and N. Miyake. Extending black domain name list by using co-occurrence relation between dns queries. In Third USENIX LEET Workshop, 2010.
[24] S. Sinha, M. Bailey, and F. Jahanian. Shades of grey: On the effectiveness of reputation-based blacklists. In 3rd International Conference on MALWARE, 2008.
[25] The Honeynet Project & Research Alliance. Know Your Enemy: Fast-Flux Service Networks. http://old. , 2007.
[26] URIBL. Real time URI blacklist. http://uribl.com.
[27] F. Weimer. Passive DNS replication. In Proceedings of FIRST Conference on Computer Security Incident Handling, Singapore, 2005.
[28] Z. Qian, Z. Mao, Y. Xie, and F. Yu. On network-level clusters for spam detection. In Proceedings of the USENIX NDSS Symposium, 2010.
[29] B. Zdrnja, N. Brownlee, and D. Wessels. Passive monitoring of DNS anomalies. In Proceedings of DIMVA Conference, 2007.
[30] Zeus Tracker. Zeus IP & domain name block list, 2009.
[31] J. Zhang, P. Porras, and J. Ullrich. Highly predictive blacklisting. In Proceedings of the USENIX Security Symposium, 2008.

