IEEE INFOCOM 2005
ClassBench: A Packet Classification Benchmark
David E. Taylor, Jonathan S. Turner
Applied Research Laboratory
Washington University in Saint Louis
{det3,jst}@arl.wustl.edu

This work was supported by the National Science Foundation, grant ANI-9813723.
Abstract— Packet classification is an enabling technology for
next generation network services and often the primary bottleneck
in high-performance routers. The performance and capacity of
many algorithms and classification devices, including TCAMs, depend upon properties of the filter set and query patterns. Despite
the pressing need, no standard filter sets or performance evaluation tools are publicly available. In response to this problem, we
present ClassBench, a suite of tools for benchmarking packet classification algorithms and devices. ClassBench includes a Filter Set
Generator that produces synthetic filter sets that accurately model
the characteristics of real filter sets. Along with varying the size of
the filter sets, we provide high-level control over the composition
of the filters in the resulting filter set. The tool suite also includes a
Trace Generator that produces a sequence of packet headers to exercise packet classification algorithms with respect to a given filter
set. Along with specifying the relative size of the trace, we provide
a simple mechanism for controlling locality of reference. While we
have already found ClassBench to be very useful in our own research, we seek to eliminate the significant access barriers to realistic test vectors for researchers and initiate a broader discussion
to guide the refinement of the tools and codification of a formal
benchmarking methodology. The ClassBench tools are publicly
available at the following site:
http://www.arl.wustl.edu/~det3/ClassBench/
I. INTRODUCTION
DEPLOYMENT of next generation network services
hinges on the ability of Internet infrastructure to perform
packet classification at physical link speeds. A packet classifier
must compare header fields of every incoming packet against a
set of filters in order to assign a flow identifier that is used to
apply security policies, application processing, and quality-of-service guarantees. Typical packet classification filter sets have
fewer than a thousand filters and reside in enterprise firewalls
or edge routers. As network services continue to migrate into
the network core, it is anticipated that filter sets could swell to
tens of thousands of filters or more. The most common type
of packet classification examines the packet header fields comprising the standard IP 5-tuple. A packet classifier searches for
the highest priority filter or set of filters matching the packet
where each filter specifies a prefix of the IP source and destination addresses, an exact match or wildcard for the transport
protocol number, and ranges for the source and destination port
numbers for TCP and UDP packets.
As reported in Section III, it has been observed that real filter sets exhibit a considerable amount of structure. In response,
several algorithmic techniques have been developed which exploit filter set structure to accelerate search time or reduce storage requirements [1]. Consequently, the performance of these approaches is subject to the structure of filter sets. Likewise, the capacity and efficiency of the most prominent packet
classification solution, Ternary Content Addressable Memory
(TCAM), is also subject to the characteristics of filter sets [1].
Despite the influence of filter set composition on the performance of packet classification search techniques and devices,
no publicly available benchmarking tools or filter sets exist for
standardized performance evaluation. Due to security and confidentiality issues, access to large, real filter sets has been limited to a small subset of the research community. Some researchers in academia have gained access to filter sets through
confidentiality agreements, but are unable to distribute those filter sets. Furthermore, performance evaluations using real filter
sets are restricted by the size and structure of the sample filter
sets.
In order to facilitate future research and provide a foundation
for a meaningful benchmark, we present ClassBench, a publicly
available suite of tools for benchmarking packet classification
algorithms and devices. As shown in Figure 1, ClassBench consists of three tools: a Filter Set Analyzer, Filter Set Generator,
and Trace Generator. The general approach of ClassBench is to
construct a set of benchmark parameter files that specify the relevant characteristics of real filter sets, generate a synthetic filter
set from a chosen parameter file and a small set of high-level
inputs, and generate a sequence of packet headers to probe the
synthetic filter set using the Trace Generator. Parameter files
contain various statistics and probability distributions that guide
the generation of synthetic filter sets. The Filter Set Analyzer
tool extracts the relevant statistics and probability distributions
from a seed filter set and generates a parameter file. This provides the capability to generate large synthetic filter sets which
model the structure of a seed filter set. In Section IV we discuss the statistics and probability distributions contained in the
parameter files that drive the synthetic filter generation process.
The Filter Set Generator takes as input a parameter file and
a few high-level parameters. In addition to the filter set size parameter, the smoothing and scope parameters provide high-level
control over the composition of the filter set, abstracting the
user from the low-level statistics and distributions contained in
the parameter files. The smoothing adjustment provides a structured mechanism for introducing new address aggregates which
is useful for modeling filter sets significantly larger than the filter set used to generate the parameter file. The scope adjustment provides a biasing mechanism to favor more or less specific filters during the generation process. These adjustments
and their effects on the resulting filter sets are discussed in Section V. Finally, the Trace Generator tool examines the syn-
Fig. 1. Block diagram of the ClassBench tool suite. The synthetic Filter
Set Generator has size, smoothing, and scope adjustments which provide high-level, systematic mechanisms for altering the size and composition of synthetic
filter sets. The set of benchmark parameter files model real filter sets and may
be refined over time. The Trace Generator provides adjustments for trace size
and locality of reference.
thetic filter set, then generates a sequence of packet headers to
exercise the filter set. Like the Filter Set Generator, the trace
generator provides adjustments for scaling the size of the trace
as well as the locality of reference of headers in the trace. These
adjustments are described in detail in Section VI.
We highlight previous performance evaluation efforts by the
research community as well as related benchmarking activity
of the IETF in Section II. It is our hope that this work initiates a broader discussion which will lead to refinement of the
tools, compilation of a standard set of parameter files, and codification of a formal benchmarking methodology. Its value will
depend on its perceived clarity and usefulness to the interested
community:
• Researchers seeking to evaluate new classification algorithms relative to alternative approaches and commercial
products.
• Classification product vendors seeking to market their
products with convincing performance claims over competing products.
• Classification product customers seeking to verify and
compare classification product performance on a uniform
scale.
In order to facilitate broader discussion, we make the ClassBench tools and 12 parameter files publicly available at the following site:
http://www.arl.wustl.edu/~det3/ClassBench/
II. RELATED WORK
Extensive work has been done in developing benchmarks for
many applications and data processing devices. Benchmarks
are used extensively in the field of computer architecture to
evaluate microprocessor performance. In the field of computer
communications, the Internet Engineering Task Force (IETF)
has several working groups exploring network performance
measurement. Specifically, the IP Performance Metrics (IPPM)
working group was formed with the purpose of developing standard metrics for Internet data delivery [2]. The Benchmarking
Methodology Working Group (BMWG) seeks to make measurement recommendations for various internetworking technologies [3]. These recommendations address metrics and performance characteristics as well as collection methodologies.
The BMWG specifically attacked the problem of measuring the performance of Forwarding Information Base (FIB)
routers [4] and also produced a methodology for benchmarking firewalls [5]. The methodology contains broad specifications such as: the firewall should contain at least one rule for
each host, tests should be run with various filter set sizes, and
test traffic should correspond to rules at the “end” of the filter
set. ClassBench complements efforts by the IETF by providing
the necessary tools for generating test vectors with high-level
control over filter set and input trace composition. The Network Processor Forum (NPF) has also initiated a benchmarking effort [6]. Currently, the NPF has produced benchmarks for
switch fabrics and route lookup engines. To our knowledge,
there are no current efforts by the IETF or the NPF to provide a
benchmark for multiple field packet classification.
In the absence of publicly available packet filter sets, researchers have exerted much effort in order to generate realistic performance tests for new algorithms. Several research
groups obtained access to real filter sets through confidentiality agreements. Gupta and McKeown obtained access to 40
real filter sets and extracted a number of useful statistics which
have been widely cited [7]. Feldmann and Muthukrishnan composed filter sets based on NetFlow packet traces from commercial networks [8]. Several groups have generated synthetic two-dimensional filter sets consisting of source-destination address
prefix pairs by randomly selecting address prefixes from publicly available route tables [9], [8], [10]. Baboescu and Varghese also generated synthetic two-dimensional filter sets by randomly selecting prefixes from publicly available route tables,
but added refinements for controlling the number of zero-length
prefixes (wildcards) and prefix nesting [11], [12]. A simple
technique for appending randomly selected port ranges and protocols from real filter sets in order to generate synthetic five-dimensional filter sets is also described [11]. Baboescu and
Varghese also introduced a scheme for using a sample filter set
to generate a larger synthetic five-dimensional filter set [13].
This technique replicates filters by changing the IP prefixes
while keeping the other fields unchanged. While these techniques address some aspects of scaling filter sets in size, they
lack high-level mechanisms for adjusting filter set composition
which is crucial for evaluating algorithms that exploit filter set
characteristics.
Woo provided strong motivation for a packet classification
benchmark and initiated the effort by providing an overview
of filter characteristics for different environments (ISP Peering
Router, ISP Core Router, Enterprise Edge Router, etc.) [14].
Based on high-level characteristics, Woo generated large synthetic filter sets, but provided few details about how the filter
sets were constructed. The technique also does not provide controls for varying the composition of filters within the filter set.
Nonetheless, his efforts provide a good starting point for constructing a benchmark capable of modeling various application
environments for packet classification. Sahasranaman and Buddhikot used the characteristics compiled by Woo in a comparative evaluation of a few packet classification techniques [15].
TABLE I
Distribution of filters over the five port classes for source and destination port range specifications; values given as percentage (%) of filters in the filter set.

Port          WC      HI     LO     AR     EM
Source        78.08   6.60   0.92   0.42   13.99
Destination   40.39   6.18   0.06   4.33   49.04
III. ANALYSIS OF REAL FILTER SETS
Recent efforts to identify better packet classification techniques have focused on leveraging the characteristics of real
filter sets for faster searches. While lower bounds for the general multi-field searching problem have been established, observations made in recent packet classification work offer enticing new possibilities to provide significantly better performance. The focus of this section is to identify and understand
the impetus for the observed structure of filter sets and to develop metrics and characterizations of filter set structure that
aid in generating synthetic filter sets. We performed a battery
of analyses on 12 real filter sets provided by Internet Service
Providers (ISPs), a network equipment vendor, and other researchers working in the field. The filter sets range in size from
68 to 4557 entries and utilize one of the following formats: Access Control List (ACL), Firewall (FW), and IP Chain (IPC).
Due to confidentiality concerns, the filter sets were provided
without supporting information regarding the types of systems
and environments in which they are used. We are unable to
comment on “where” in the network architecture the filter sets
are used. Nonetheless, the following analyses provide useful
insight into the structure of real filter sets. We observe that
various useful properties hold regardless of filter set size or format. Due to space constraints, we are unable to fully elaborate
on our analysis, but a more complete discussion of this work is
available in technical report form [16].
A. Understanding Filter Composition
Many of the observed characteristics of filter sets arise due
to the administrative policies that drive their construction. The
most complex packet filters typically appear in firewall and
edge router filter sets due to the heterogeneous set of applications supported in these environments. Firewalls and edge
routers typically implement security filters and network address
translation (NAT), and they may support additional applications
such as Virtual Private Networks (VPNs) and resource reservation. Typically, these filter sets are created manually by a system administrator using a standard management tool such as
CiscoWorks VPN/Security Management Solution (VMS) [17]
and Lucent Security Management Server (LSMS) [18]. Such
tools conform to a model of filter construction which views a
filter as specifying the communicating subnets and the application or set of applications. Hence, we can view each filter as
having two major components: an address prefix pair and an
application specification. The address prefix pair identifies the
communicating subnets by specifying a source address prefix
and a destination address prefix. The application specification
identifies a specific application session by specifying the transport protocol, source port number, and destination port number.
A set of applications may be identified by specifying ranges for
the source and destination port numbers.
B. Application Specifications
We analyzed the application specifications in the 12 filter sets
in order to corroborate previous observations as well as extract
new, potentially useful characteristics.
1) Protocol: For each of the filter sets, we examined the
unique protocol specifications and the distribution of filters over
the set of unique values. Filters specified one of nine protocols or the wildcard. The most common protocol specification was TCP (49%), followed by UDP (27%), the wildcard
(13%), and ICMP (10%). The following protocols were specified by less than 1% of the filters: General Routing Encapsulation (GRE), Open Shortest Path First (OSPF) Interior Gateway
Protocol (IGP), Enhanced Interior Gateway Routing Protocol
(EIGRP), IP Encapsulating Security Payload (ESP) for IPv6, IP
Authentication Header (AH) for IPv6, IP Encapsulation within
IP (IPE).
2) Port Ranges: Next, we examined the port ranges specified by filters in the filter sets and the distribution of filters over
the unique values. In order to observe trends among the various
filter sets, we define five classes of port ranges:
• WC, wildcard
• HI, ephemeral user port range [1024 : 65535]
• LO, well-known system port range [0 : 1023]
• AR, arbitrary range
• EM, exact match
Motivated by the allocation of port numbers, the first three
classes represent common specifications for a port range. The
last two classes may be viewed as partitioning the remaining
specifications based on whether or not an exact port number is
specified. We computed the distribution of filters over the five
classes for both source and destination ports for each filter set.
Table I shows the combined distribution for all filter sets. We
observe some interesting trends in the raw data. With rare exception, the filters in the ACL filter sets specify the wildcard for
the source port. A majority of filters in the ACL filter sets specify
an exact port number for the destination port. Source port specifications in the other filter sets are also dominated by the wildcard, but a considerable portion of the filters specify an exact
port number. Destination port specifications in the other filter
sets share the same trend, however the distribution between the
wildcard and exact match is a bit more even. Only one filter
set contained filters specifying the LO port class for either the
source or destination port range.
Fig. 2. Port Pair Class Matrix for TCP, filter set fw4.
3) Port Pair Class: As previously discussed, the structure
of source and destination port range pairs is a key point of interest for both modeling real filter sets and designing efficient
search algorithms. We can characterize this structure by defining a Port Pair Class (PPC) for every combination of source
and destination port class. For example, WC-WC if both source
and destination port ranges specify the wildcard, AR-LO if the
source port range specifies an arbitrary range and the destination port range specifies the set of well-known system ports. As
shown in Figure 2, a convenient way to visualize the structure
of Port Pair Classes is to define a Port Pair Class Matrix where
rows share the same source port class and columns share the
same destination port class. For each filter set, we examined
the PPC Matrix defined by filters specifying the same protocol.
For all protocols except TCP and UDP, the PPC Matrix is trivial – a single spike at WC/WC. Figure 2 shows the PPC Matrix
defined by filters specifying the TCP protocol in filter set fw4.
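To make the five classes and the Port Pair Class Matrix concrete, here is a small Python sketch. It is our own illustration, not part of the ClassBench tools, and the encoding of each filter's port fields as ((sp_lo, sp_hi), (dp_lo, dp_hi)) tuples is an assumption made for the example:

    from collections import Counter

    def port_class(lo, hi):
        # Map a port range [lo, hi] to one of the five classes.
        if (lo, hi) == (0, 65535):
            return "WC"   # wildcard
        if (lo, hi) == (1024, 65535):
            return "HI"   # ephemeral user port range
        if (lo, hi) == (0, 1023):
            return "LO"   # well-known system port range
        if lo == hi:
            return "EM"   # exact match
        return "AR"       # arbitrary range

    def ppc_matrix(filters):
        # Tally filters of one protocol by (source class, destination class).
        return Counter((port_class(*sp), port_class(*dp)) for sp, dp in filters)

    # One WC-EM filter (any source port to port 80) and one AR-LO filter:
    print(ppc_matrix([((0, 65535), (80, 80)), ((5000, 6000), (0, 1023))]))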
C. Address Prefix Pairs
A filter identifies communicating hosts or subnets by specifying a source and destination address prefix, or address prefix pair. The speed and efficiency of several longest prefix
matching and packet classification algorithms depend upon the
number of unique prefix lengths and the distribution of filters
across those unique values. We find that a majority of the filter sets specify fewer than 15 unique prefix lengths for either
source or destination address prefixes. The number of unique
source/destination prefix pair lengths is typically less than 32,
which is small relative to the filter set size and the number
of possible combinations, 1024. For example, the largest filter set contained 4557 filters, 11 unique source address prefix
lengths, 3 unique destination address lengths, and 31 unique
source/destination prefix pair lengths.
Next, we examine the distribution of filters over the unique
address prefix pair lengths. Note that this study is unique in that
previous studies and models of filter sets utilized independent
distributions for source and destination address prefixes. Real
filter sets have unique prefix pair distributions that reflect the
types of filters contained in the filter set. For example, fully
specified source and destination addresses dominate the distribution for filter set ipc1 shown in Figure 3. There are very few
filters specifying a 24-bit prefix for either the source or destination address, a notable difference from backbone route tables
which are dominated by class C address prefixes (24-bit network address) and their aggregates. Finally, we observe that
while the distributions for different filter sets are sufficiently different from each other, a majority of the filters in the filter sets specify prefix pair lengths around the "edges" of the distribution. This implies that, typically, one of the address prefixes is either fully specified or wildcarded.

Fig. 3. Prefix length distribution for address prefix pairs in filter set ipc1.
By considering the prefix pair distribution, we characterize
the size of the communicating subnets specified by filters in the
filter set. Next, we would like to characterize the relationships
among address prefixes and the amount of address space covered by the prefixes in the filter set. Consider a binary tree constructed from the IP source address prefixes of all filters in the
filter set. From this tree, we could completely characterize the
data structure by determining a conditional branching probability for each node. For example, assume that an address prefix
is generated by traversing the tree starting at the root node. At
each node, the decision to take the 0 path or the 1 path exiting
the node depends upon the branching probability at the node.
As shown in Figure 4, p{0|11} is the probability that the 0 path
is chosen at level 2 given that the 1 path was chosen at level 0
and the 1 path was chosen at level 1. Such a characterization is
overly complex, hence we employ suitable metrics that capture
the important characteristics while providing a more concise
representation.

Fig. 4. Example of complete statistical characterization of address prefixes.
We begin by constructing two binary tries from the source
and destination prefixes in the filter set. Note that there is one
level in the tree for each possible prefix length 0 through 32 for
a total of 33 levels. For each level in the tree, we compute the
probability that a node has one child or two children. Nodes
with no children are excluded from the calculation. We refer to
this distribution as the Branching Probability. For nodes with
two children, we compute skew, which is a relative measure of
the “weights” of the left and right subtrees of the node. Subtree
weight is defined to be the number of filters specifying prefixes
in the subtree, not the number of prefixes in the subtree. This
definition of weight accounts for popular prefixes that occur in
many filters. Let heavy be the subtree with the largest weight
and let light be the subtree with equal or less weight, thus:
    skew = 1 − weight(light) / weight(heavy)    (1)
Consider the following example: given a node k with two children at level m, assume that 10 filters specify prefixes in the 1-subtree of node k (the subtree visited if the next bit of the address is 1) and 25 filters specify prefixes in the 0-subtree of node k. The 1-subtree is the light subtree, the 0-subtree is the heavy subtree, and the skew at node k is 0.6. We compute the average skew for all nodes with two children at level m, record it in the distribution, and move on to level (m + 1). We provide an example of computing skew for the first four levels of an address trie in Figure 5.

Fig. 5. Example of skew computation for the first four levels of an address trie; shaded nodes denote a prefix specified by a single filter; subtrees denoted by triangles with associated weight.

Fig. 6. Source address branching probability and skew for filter set acl5: (a) branching probability, average per level; (b) skew, average per level for nodes with two children.
The result of this analysis is two distributions for each address trie, a branching probability distribution and a skew distribution. We plot these distributions for the source address prefixes in filter set acl5 in Figure 6. In Figure 6(a), note that a
significant portion of the nodes in levels zero through five have
two children, but the amount generally decreases as we move
down the trie. The increase at levels 16 and 17 is a notable exception. This implies that there is a considerable amount of
branching near the “top” of the trie, but the paths generally remain contained as we move down the trie. In Figure 6(b), we
observe that skew for nodes with two children hovers around 0.5, thus the one subtree tends to contain prefixes specified by
twice as many filters as the other subtree. Note that skew is not
defined at levels where all nodes have one child. Also note that
levels containing nodes with two children may have an average
skew of zero (completely balanced subtrees), but this is rare. Finally, this definition of skew provides an anonymous measure
of address prefix structure, as it does not preserve address prefix
values.
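The following Python sketch illustrates both metrics; it is our own, with prefixes represented as bit strings paired with the number of filters specifying them (none of this encoding comes from the ClassBench code):

    from collections import defaultdict

    def branching_and_skew(prefixes):
        # prefixes: list of (bit_string, filter_count) pairs; the weight of
        # a subtree is the number of filters specifying prefixes in it.
        weight = defaultdict(lambda: defaultdict(int))
        for prefix, count in prefixes:
            for level in range(len(prefix) + 1):
                weight[level][prefix[:level]] += count
        stats = {}
        for level in range(32):
            one = two = 0
            skews = []
            for path in weight[level]:
                w0 = weight[level + 1].get(path + "0", 0)
                w1 = weight[level + 1].get(path + "1", 0)
                if w0 and w1:
                    two += 1
                    light, heavy = sorted((w0, w1))
                    skews.append(1.0 - light / heavy)   # Equation (1)
                elif w0 or w1:
                    one += 1
            if one + two:   # nodes with no children are excluded
                stats[level] = (two / (one + two),
                                sum(skews) / len(skews) if skews else None)
        return stats

    # Example from the text: 10 filters in the 1-subtree and 25 in the
    # 0-subtree of the root give skew 1 - 10/25 = 0.6 at level 0.
    print(branching_and_skew([("0", 25), ("1", 10)])[0])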
Branching probability and skew characterize the structure of
the individual source and destination address prefixes; however,
they do not capture their interdependence. It is possible that
some filters in a filter set match flows contained within a single subnet, while others match flows between different subnets.
In order to capture this characteristic of a seed filter set, we
measure the “correlation” of source and destination prefixes. In
this context, we define correlation to be the probability that the
source and destination address prefixes continue to be the same
for a given prefix length. This measure is only valid within the
range of address bits specified by both address prefixes. Additional details regarding the “correlation” metric and results
from real filter sets may be found in the technical report [16].
D. Scope
Next we seek to characterize the specificity of the filters in
the filter set. Filters that are more specific cover a small set of
possible packet headers while filters that are less specific cover
a large set of possible packet headers. The number of possible
packet headers covered by a filter is characterized by its tuple
specification. To be specific, we consider the standard 5-tuple
as a vector containing the following fields:
• t[0], source address prefix length, [0...32]
• t[1], destination address prefix length, [0...32]
• t[2], source port range width, the number of port numbers covered by the range, [0...2^16]
• t[3], destination port range width, the number of port numbers covered by the range, [0...2^16]
• t[4], protocol specification, Boolean value denoting
whether or not a protocol is specified, [0, 1]
We define a new metric, scope, to be the logarithmic measure
of the number of possible packet headers covered by the filter.
Using the definition above, we define a filter’s 5-tuple scope as
follows:
    scope = lg{(2^(32−t[0])) × (2^(32−t[1])) × t[2] × t[3] × (2^(8(1−t[4])))}
          = (32 − t[0]) + (32 − t[1]) + lg t[2] + lg t[3] + 8(1 − t[4])    (2)
Thus, scope is a measure of filter specificity on a scale from 0
to 104. The average 5-tuple scope for our 12 filter sets ranges
from 56 to 24. We note that filters in the ACL filter sets tend to
have narrower scope, while filters in the FW filter sets tend to
have wider scope.
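Equation (2) drops directly into a few lines of Python; the sketch below is ours (not from the ClassBench tools), with a filter encoded as the vector t defined above:

    from math import log2

    def scope(t):
        # t = (src_prefix_len, dst_prefix_len, src_port_width,
        #      dst_port_width, protocol_specified); port widths count the
        #      port numbers covered (1 = exact match, 65536 = wildcard).
        return ((32 - t[0]) + (32 - t[1])
                + log2(t[2]) + log2(t[3])
                + 8 * (1 - t[4]))

    assert scope((0, 0, 65536, 65536, 0)) == 104   # all wildcards
    assert scope((32, 32, 1, 1, 1)) == 0           # fully specified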
E. Additional Fields
An examination of real filter sets reveals that additional fields
beyond the standard 5-tuple are relevant. In 10 of the 12 filter
sets that we studied, filters contain matches on TCP flags or
ICMP type numbers. In most filter sets, a small percentage of
the filters specify a non-wildcard value for the flags, typically
less than two percent. There are notable exceptions, as approximately half the filters in filter set ipc1 contain non-wildcard
flags. We argue that new services and administrative policies
will demand that packet classification techniques scale to support additional fields beyond the standard 5-tuple. Matches
on ICMP type number and other higher-level header fields are
likely to be exact matches. There may be other types of matches
that more naturally suit the application, such as arbitrary bit
masks on TCP flags.
IV. PARAMETER FILES
Given a real filter set, the Filter Set Analyzer generates a parameter file that contains statistics and probability distributions
that allow the Filter Set Generator to produce a synthetic filter
set that retains the relevant characteristics of the original filter set. We chose the statistics and distributions to include in
the parameter file based on thorough analysis of 12 real filter
sets and several iterations of the Filter Set Generator design.
Note that parameter files also provide complete anonymity of
addresses in the original filter set. By reducing confidentiality
concerns, we seek to remove the significant access barriers to
realistic test vectors for researchers and promote the development of a benchmark set of parameter files. There still exists a
need for a large sample space of real filter sets from various application environments. We have generated a set of 12 parameter files which are publicly available along with the ClassBench
tool suite.
Parameter files include the following entries1:
• Protocol specifications and the distribution of filters over those values
• Port Pair Class Matrix for each unique protocol specification in the filter set
• Flags specifications for each protocol and a distribution of filters over those values
• Arbitrary port range specifications and a distribution of filters over those values for both the source and destination port fields
• Exact port number specifications and a distribution of filters over those values for both the source and destination port fields
• Prefix pair length distribution for each Port Pair Class Matrix
• Address prefix branching and skew distributions for both source and destination address prefixes
• Address prefix correlation distribution
• Prefix nesting thresholds for both source and destination address prefixes
Parameter files represent prefix pair length distributions using a combination of a total prefix length distribution and source
prefix length distributions for each specified total length2 as
shown in Figure 7. The total prefix length is simply the sum
of the prefix lengths for the source and destination address prefixes. As we will demonstrate in Section V-B, modeling the
total prefix length distribution allows us to easily bias the generation of more or less specific filters based on the scope input parameter. The source prefix length distributions associated
with each specified total length allow us to model the prefix pair
length distribution, as the destination prefix length is simply the
difference of the total length and the source length.
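A minimal Python sketch of this decomposition (ours; the dictionary encoding of the two distributions is an assumption for illustration, not the parameter file format):

    import random

    def sample_prefix_pair(total_dist, src_dists):
        # total_dist: {total_length: probability}
        # src_dists: {total_length: {source_length: probability}}
        totals = list(total_dist)
        total = random.choices(totals, weights=[total_dist[t] for t in totals])[0]
        sources = list(src_dists[total])
        source = random.choices(sources,
                                weights=[src_dists[total][s] for s in sources])[0]
        return source, total - source   # destination length = total - source

    # Example: filters are either 16/16 or 32/0 prefix pairs.
    print(sample_prefix_pair({32: 1.0}, {32: {16: 0.5, 32: 0.5}}))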
The number of unique address prefixes that match a given
packet is an important property of real filter sets and is often
referred to as prefix nesting. We found that if the Filter Set
Generator is ignorant of this property, it is likely to create filter sets with significantly higher prefix nesting, especially when
the synthetic filter set is larger than the filter set used to generate the parameter file. Given that prefix nesting remains relatively constant for filter sets of various sizes, we place a limit
on the prefix nesting during the filter generation process. The
Filter Set Analyzer computes the maximum prefix nesting for
both the source and destination address prefixes in the filter set
and records these statistics in the parameter file. The Filter
Set Generator retains these prefix nesting properties in the synthetic filter set, regardless of size. We discuss the process of
generating address prefixes and retaining prefix nesting properties in Section V.
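Stated directly, prefix nesting is the length of the longest chain of nested prefixes in the set. A small Python illustration (ours; prefixes as bit strings, with the empty string denoting the wildcard):

    def max_prefix_nesting(prefixes):
        # Maximum number of unique prefixes that can match one address.
        ps = set(prefixes)
        return max(sum(1 for i in range(len(p) + 1) if p[:i] in ps) for p in ps)

    # "" (wildcard), "1", and "101" all match an address beginning 101...
    print(max_prefix_nesting(["", "1", "101", "000"]))   # prints 3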
1 We avoid an exhaustive discussion of parameter file contents and format details; interested readers and potential users of ClassBench may find a discussion
of parameter file format in the documentation provided with the tools.
2 We do not need to store a source prefix distribution for total prefix lengths
that are not specified by filters in the filter set.
Fig. 7. Parameter files represent prefix pair length distributions using a combination of a total prefix length distribution and source prefix length distributions for each non-zero total length.
V. SYNTHETIC FILTER SET GENERATION
The Filter Set Generator is the cornerstone of the ClassBench
tool suite. Perhaps the most succinct way to describe the synthetic filter set generation process is to walk through the pseudocode shown in Figure 8. The first step in the filter generation process is to read the statistics and distributions from the
parameter file. Next, we get the four high-level input parameters:
• size: target size for the synthetic filter set
• smoothing: controls the number of new address aggregates
(prefix lengths)
• port scope: biases the tool to generate more or less specific
port range pairs
• address scope: biases the tool to generate more or less
specific address prefix pairs
We refer to the size parameter as a “target” size because the generated filter set may have fewer filters. This is due to the fact
that it is possible for the Filter Set Generator to produce a filter
set containing redundant filters, thus the final step in the process
removes the redundant filters. The generation of redundant filters stems from the way the tool assigns source and destination
address prefixes that preserve the properties specified in the parameter file. This process will be described in more detail in a
moment.
Before we begin the generation process, we apply the
smoothing adjustment to the prefix pair length distributions3 (lines 6 through 10). In order to apply the smoothing adjustment, we must iterate over all Port Pair Classes (line 7), apply the adjustment to each total prefix length distribution (line
8) and iterate over all total prefix lengths (line 9), and apply the
adjustment to each source prefix length distribution associated
with the total prefix length (line 10). We discuss this adjustment
and its effects on the generated filter set in Section V-A.
FilterSetGenerator()
// Read input file and parameters
1  read(parameter file)
2  get(size)
3  get(smoothing)
4  get(port scope)
5  get(address scope)
// Apply smoothing to prefix pair lengths
6  If smoothing > 0
7    For i : 1 to MaxPortPairClass
8      TotalLengths[i]→smooth(smoothing)
9      For j : 0 to 64
10       SALengths[i][j]→smooth(smoothing)
// Allocate temporary filter array
11 FilterType Filters[size]
// Generate partial filters
12 For i : 1 to size
   // Choose an application specification
13   rv = Random()
14   Filters[i].Prot = Protocols→choose(rv)
15   rv = Random()
16   Filters[i].Flags = Flags[Filters[i].Prot]→choose(rv)
17   rv = RandomBias(port scope)
18   PPC = PPCMatrix[Filters[i].Prot]→choose(rv)
19   rv = Random()
20   Filters[i].SP = SrcPorts[PPC.SPClass]→choose(rv)
21   rv = Random()
22   Filters[i].DP = DstPorts[PPC.DPClass]→choose(rv)
   // Choose an address prefix pair length
23   rv = RandomBias(address scope)
24   TotalLength = TotalLengths[PPC]→choose(rv)
25   rv = Random()
26   Filters[i].SALength = SrcLengths[PPC][TotalLength]→choose(rv)
27   Filters[i].DALength = TotalLength − Filters[i].SALength
// Assign address prefix pairs
28 AssignSA(Filters)
29 AssignDA(Filters)
// Remove redundant filters
30 RemoveRedundantFilters(Filters)
// Prevent filter nesting
31 OrderNestedFilters(Filters)
32 PrintFilters(Filters)

Fig. 8. Pseudocode for Filter Set Generator.

3 Note that the scope adjustments do not add any new prefix lengths to the distributions; they only change the likelihood that longer or shorter prefix lengths in the distribution are chosen.

The next set of steps (lines 12 through 27) generate a partial filter for each entry in the Filters array. Essentially, we assign all filter fields except the address prefix values. Note that the prefix lengths for both source and destination address
are assigned. The reason for this approach will become clear
when we discuss the assignment of address prefix values in a
moment. The first step in generating a partial filter is to select
a protocol from the Protocols distribution (line 14) using a
uniform random variable, rv (line 13). We chose to select the
protocol first because we found that the protocol specification
dictates the structure of the other filter fields. Next, we select
the protocol flags4 from the Flags distribution associated with
the chosen protocol (line 16).
After choosing the protocol and flags, we select a Port Pair
Class, PPC, from the Port Pair Class Matrix, PPCMatrix,
associated with the chosen protocol (line 18). Note that the selection of the PPC is performed with a random variable that is
biased by the port scope parameter (line 17). This adjustment
allows the user to bias the Filter Set Generator to produce a filter set with more or less specific PPCs, where WC-WC (both
port ranges wildcarded) is the least specific and EM-EM (both
port ranges specify an exact match port number) is the most
specific. We discuss this adjustment and its effects on the generated filter set in Section V-B. Given the PPC, we can select
the source and destination port ranges from their respective port
range distributions associated with each port class (lines 20 and
22). Note that the distributions for port classes WC, HI, and LO
are trivial as they define single ranges.
Selecting the address prefix pair lengths is the last step in
generating a partial filter. We select a total prefix pair length
from the distribution associated with the chosen PPC (line 24)
using a random variable biased by the address scope parameter
(line 23). We select a source prefix length from the distribution
associated with the chosen PPC and total length (line 26) using
a uniform random variable (line 25). Finally, we calculate the
destination address prefix length using the chosen total length
and source address prefix length (line 27).
After we generate all the partial filters, we must assign the
source and destination address prefix values. The AssignSA
routine recursively constructs a binary trie using the set of
source address prefix lengths in Filters and the source address branching probability and skew distributions specified by
the parameter file (line 28). The recursive process first examines all of the entries in FilterList. If an entry has a source
prefix length equal to the level of the node, it assigns the node’s
address to the entry and removes the entry from FilterList.
The process then distributes the remaining filters to child nodes
according to the branching probability and skew for the node’s
level. Note that we also keep track of the number of prefixes
that have been assigned along a path and ensure that the prefix
nesting threshold is not exceeded.
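The following Python sketch conveys the flavor of this recursion. It is a simplification of our own devising: the branching and skew lookups are stubbed with constants rather than the parameter file distributions, and the prefix nesting check is omitted:

    import random

    def branching(level):
        return 0.5   # stub: probability of two children; from parameter file

    def skew(level):
        return 0.5   # stub: average skew at this level; from parameter file

    def assign_sa(filters, level=0, address=""):
        # Assign this node's address to filters whose prefix ends here.
        remaining = []
        for f in filters:
            if f["sa_len"] == level:
                f["sa"] = address
            else:
                remaining.append(f)
        if not remaining:
            return
        if random.random() < branching(level):     # node has two children
            s = skew(level)
            p_heavy = 1.0 / (2.0 - s)    # from skew = 1 - light/heavy
            heavy = random.choice("01")             # which child is heavier
            light = "1" if heavy == "0" else "0"
            children = {"0": [], "1": []}
            for f in remaining:
                bit = heavy if random.random() < p_heavy else light
                children[bit].append(f)
            for bit, subset in children.items():
                if subset:
                    assign_sa(subset, level + 1, address + bit)
        else:                                       # single child
            assign_sa(remaining, level + 1, address + random.choice("01"))

    # Two partial filters wanting 2-bit prefixes; identical partial filters
    # may receive the same address, which is how redundant filters arise.
    fs = [{"sa_len": 2}, {"sa_len": 2}]
    assign_sa(fs)
    print([f["sa"] for f in fs])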
Assigning destination address prefix values is symmetric to
the process for source address prefixes with one extension. In
order to preserve the relationship between source and destination address prefixes in each filter, the AssignDA process (line
29) also considers the correlation distribution specified in the
parameter file. In order to preserve the correlation, AssignDA
employs a two-phase process of constructing the destination
address trie. The first phase recursively distributes filters according to the correlation distribution. When the address prefixes of a particular filter cease to be correlated, it stores the filter in a temporary StubList associated with the current tree
node. The second phase recursively walks down the tree and
completes the assignment process in the same manner as the
AssignSA process, with the exception that the StubList is
appended to the FilterList passed to the AssignDA process prior to processing. Additional details regarding the address prefix assignment process are included in the technical report [16].

4 Note that the protocol flags field is typically the wildcard unless the chosen protocol is TCP or ICMP.
Note that we do not explicitly prevent the Filter Set Generator from generating redundant filters. Identical partial filters
may be assigned the same source and destination address prefix values by the AssignSA and AssignDA functions. In
essence, this preserves the characteristics specified by the parameter file because the number of unique filter field values allowed by the various distributions is inherently limited. Consider the example of attempting to generate a large filter set
using a parameter file from a small filter set. If we are forced to
generate the number of filters specified by the size parameter,
we face two unfavorable results: (1) the resulting filter set may
not model the parameter file because we are repeatedly forced
to choose values from the tails of the distributions in order to
create unique filters, or (2) the Filter Set Generator never terminates because it has exhausted the distributions and cannot
create any more unique filters. With the current design of the
Filter Set Generator, a user can produce a larger filter set by
simply increasing the size target beyond the desired size. While
this does introduce some variability in the size of the synthetic
filter set, we believe this is a tolerable trade-off to make for
maintaining the characteristics in the parameter file and achieving reasonable execution times for the Filter Set Generator.
Thus, after generating a list of size synthetic filters, we remove any redundant filters from the list via the RemoveRedundantFilters function (line 30). A naïve implementation of this function would require O(N^2) time, where N is
equal to size. We discuss an efficient mechanism for removing
redundant filters from the set in Section V-C. After removing
redundant filters from the filter set, we sort the filters in order
of increasing scope (line 31). This allows the filter set to be
searched using a simple linear search technique, as nested filters
will be searched in order of decreasing specificity. An efficient
technique for performing this sorting step is also discussed in
Section V-C.
A. Smoothing Adjustment
As filter sets scale in size, we anticipate that new address
prefix pair lengths will emerge due to subnet aggregation and
segregation. In order to model this behavior, we provide for
the introduction of new prefix lengths in a structured manner.
Injecting purely random address prefix pair lengths during the
generation process neglects the structure of the filter set used to
generate the parameter file. Using scope as a measure of distance, subnet aggregation and segregation results in new prefix
lengths that are “near” to the original prefix length. Consider
the address prefix pair length distribution where all filters in the
filter set have 16-bit source and destination address prefixes;
thus, the distribution is a single “spike”. In order to model
aggregation and splitting of subnets, new prefix pair lengths
should be clustered around the existing spike in the distribution. This structured approach translates “spikes” in the distribution into smoother “hills”; hence, we refer to the process as
smoothing.
In order to control the injection of new prefix lengths, we define a smoothing parameter which limits the maximum radius
of deviation from the original prefix pair length, where radius is measured in the number of bits specified by the prefix pair. Geometrically, this measurement may be viewed as the Manhattan distance from one prefix pair length to another. For convenience, let the smoothing parameter be equal to r. We chose to model the clustering using a symmetric binomial distribution. Given the parameter r, a symmetric binomial distribution is defined on the range [0 : 2r], and the probability at each point i in the range is given by:

    p_i = C(2r, i) × (1/2)^(2r)    (3)

where C(2r, i) is the binomial coefficient. Note that r is the median point in the range with probability p_r, and r may assume values in the range [0 : 64]. Once we generate the symmetric binomial distribution from the smoothing parameter, we apply this distribution to each specified prefix pair length. The smoothing process involves scaling each "spike" in the distribution according to the median probability p_r, and binomially distributing the residue to the prefix pair lengths within the r-bit radius. When prefix lengths are at the "edges" of the distribution, we simply truncate the binomial distribution. This requires us to normalize the prefix pair length distribution as the last step in the smoothing process.

In order to demonstrate this process, Figure 9 shows the prefix pair length distribution for a synthetic filter set generated with a parameter file specifying 16-bit prefix lengths for all addresses and a smoothing parameter r = 8. In practice, we expect that the smoothing parameter will be limited to at most 8. In order to demonstrate the effect of smoothing on a real filter set, Figure 10 shows the prefix pair length distribution for a synthetic filter set of 64000 filters generated using the ipc1 parameter file and smoothing parameter r = 4. Note that this synthetic filter set retains the structure of the original filter set shown in Figure 3 while modeling a realistic amount of address prefix aggregation and segregation.

Fig. 9. Prefix pair length distributions for a synthetic filter set of 64000 filters generated with a parameter file specifying 16-bit prefix lengths for all addresses and smoothing parameter r = 8.

Fig. 10. Prefix pair length distribution for a synthetic filter set of 64000 filters generated with the ipc1 parameter file with smoothing parameter r = 4.
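A compact Python rendering of this smoothing step (our own one-dimensional sketch; ClassBench applies the same operation to the total and conditional source prefix length distributions in the parameter file):

    from math import comb

    def smooth(dist, r):
        # dist: list of probabilities indexed by prefix length. Scale each
        # spike by the binomial weights of Equation (3), spread the residue
        # within an r-bit radius, and renormalize to correct edge truncation.
        kernel = [comb(2 * r, i) * 0.5 ** (2 * r) for i in range(2 * r + 1)]
        out = [0.0] * len(dist)
        for pos, p in enumerate(dist):
            for i, w in enumerate(kernel):
                j = pos + (i - r)              # offsets -r .. +r
                if 0 <= j < len(dist):         # truncate at the edges
                    out[j] += p * w
        total = sum(out)
        return [x / total for x in out] if total else out

    # A single spike at length 32 becomes a binomial "hill" of radius 2.
    spike = [0.0] * 65
    spike[32] = 1.0
    print([round(x, 4) for x in smooth(spike, 2)[30:35]])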
B. Scope Adjustment
As filter sets scale in size and new applications emerge, it is likely that the average scope of the filter set will change. As the number of flow-specific filters in a filter set increases, the average scope decreases. If the number of explicitly blocked ports for all packets in a firewall filter set increases, then the average scope may increase.5 In order to explore the performance effects of filter scope, we provide high-level adjustments of the average scope of the synthetic filter set. Two input parameters, address scope and port scope, allow the user to bias the Filter Set Generator to create more or less specific address prefix pairs and port pairs, respectively.

In order to sample from a cumulative distribution, we typically choose a random number uniformly distributed between zero and one, rv_uni, then choose the value covering rv_uni in the cumulative distribution. Graphically, this amounts to projecting a horizontal line from the random number on the y-axis. The chosen value is the x-coordinate of the intersection of the cumulative distribution and the y-projection of the random number. In Figure 11, we show an example of sampling from a cumulative total prefix pair length distribution with rv_uni = 0.5 to choose the total prefix pair length of 44. The scope adjustments bias the sampling process to select more or less specific Port Pair Classes and prefix pair lengths. We can realize this in two ways: (1) apply the adjustment to the cumulative distribution, or (2) bias the random variable used to sample from the cumulative distribution. Consider the case of selecting prefix pair lengths. The first option requires that we recompute the cumulative distribution to make longer or shorter total prefix lengths more or less probable, as dictated by the address scope parameter. The second option provides a conceptually simpler alternative. Returning to the example in Figure 11, if we want to bias the Filter Set Generator to produce more specific address prefix pairs, then we want the random variable used to sample from the distribution to be biased to values closer to 1. The reverse is true if we want less specific address prefix pairs. Thus, in order to apply the scope adjustment we simply use a random number generator to choose a uniformly distributed random variable, rv_uni, apply a biasing function to generate a biased random variable, rv_bias, and sample from the cumulative distribution using rv_bias.

5 We are assuming a common practice of specifying an exact match on the blocked port number and wildcards for all other filter fields.
Fig. 11. Example of sampling from a cumulative distribution using a uniform random variable and a biased random variable. Distribution is for the total prefix pair length associated with the WC-WC port pair class of the acl2 filter set.
While there are many possible biasing functions, we limit
ourselves to a particularly simple class of functions. Our chosen biasing function may be viewed as applying a slope, s, to
the uniform distribution as shown in Figure 12(a). When the
slope s = 0, the distribution is uniform. The biased random
variable corresponding to a uniform random variable on the xaxis is equal to the area of the rectangle defined by the value and
a line intersecting the y-axis at one with a slope of zero. Thus,
the biased random variable is equal to the uniform random variable. We can bias the random variable by altering the slope
of the line. In order for the biasing function to have a range
of [0 : 1] for random variables in the range [0 : 1], the slope
adjustment must be in the range [−2 : 2]. For convenience,
we define the scope adjustments to be in the range [−1 : 1],
thus the slope is equal to two times the scope adjustment. For
non-zero slope values, the biased random variable corresponding to a uniform random variable on the x-axis is equal to the
area of the trapezoid defined by the value and a line intersecting
the point (0.5, 1) with a slope of s. The expression for the biased random variable, rvbias , given a uniform random variable,
rvuni , and a scope parameter in the range [−1 : 1] is:
    rv_bias = rv_uni × (scope × rv_uni − scope + 1)    (4)
Figure 12(b) shows a plot of the biasing function for scope values of 0, −1, and 1, as well as an example of computing the biased random variable given a uniform random variable of 0.5 and a scope parameter of 1. In this case, rv_bias is 0.25. Let us return to the example of choosing the total address prefix length from the cumulative distribution. In Figure 11, we also show an example of sampling the distribution using the biased random variable, rv_bias = 0.25, resulting from applying the biasing function with scope = 1. The biasing results in the selection of a less specific address prefix pair, a total length of 35 as opposed to 44.

Fig. 12. Scope applies a biasing function to a uniform random variable: (a) the biased random variable is defined by the area under a line with slope s = 2 × scope; (b) plot of the scope biasing function.
Positive values of address scope bias the Filter Set Generator to choose less specific address prefix pairs, thus increasing
the average scope of the filter set. Likewise, negative values
of address scope bias the Filter Set Generator to choose more
specific address prefix pairs, thus decreasing the average scope
of the filter set. The same effects are realized by the port scope
adjustment by biasing the Filter Set Generator to select more
or less specific Port Pair Classes.
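Both the biasing function of Equation (4) and the sampling step translate directly into Python; the sketch below is ours, with the cumulative distribution encoded as parallel lists of outcomes and cumulative probabilities:

    import random
    from bisect import bisect_left

    def biased_rv(scope_param):
        # Equation (4), scope_param in [-1, 1]: positive values push the
        # variable toward 0 (less specific choices), negative toward 1.
        rv = random.random()
        return rv * (scope_param * rv - scope_param + 1)

    def choose(values, cum_probs, scope_param):
        # Pick the first outcome whose cumulative probability covers rv.
        return values[bisect_left(cum_probs, biased_rv(scope_param))]

    # With rv_uni = 0.5 and scope = 1, Equation (4) gives
    # 0.5 * (0.5 - 1 + 1) = 0.25, selecting a smaller, less specific
    # total prefix pair length, as in the Figure 11 example.

The same choose routine serves both the Port Pair Class selection (biased by port scope) and the total prefix length selection (biased by address scope).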
Finally, we show the results of tests assessing the effects of
the address scope and port scope parameters on the synthetic
filter sets generated by the Filter Set Generator in Figure 13.
Each data point in the plot is from a synthetic filter set containing 16000 filters generated from a parameter file from filter
sets acl3, fw5, or ipc1. For these tests, both scope parameters were set to the same value. Over their range of values, the
scope parameters alter the average filter scope by ±6 to ±7.5.
We also measured the individual effects of the address scope
and port scope parameters. Over its range of values, the address scope alters the average address pair scope by ±4 to ±6.
Over its range of values, the port scope alters the average port
pair scope by ±1.5 to ±2.5. These scope adjustments provide
a convenient high-level mechanism for exploring the effects of
filter specificity on the performance of packet classification algorithms and devices.

Fig. 13. Average scope of synthetic filter sets consisting of 16000 filters generated with parameter files extracted from filter sets acl3, fw5, and ipc1, and various values of the scope parameters.
C. Filter Redundancy & Priority
The final steps in synthetic filter set generation are removing
redundant filters and ordering the remaining filters in order of
increasing scope. The removal of redundant filters may be realized by simply comparing each filter against all other filters
in the set; however, this naïve implementation requires O(N^2)
time. Such an approach makes execution times of the Filter Set
Generator prohibitively long for filter sets with more than a few
thousand filters. In order to accelerate this process, we first sort
the filters into sets according to their tuple specification. We
perform this sorting efficiently by constructing a binary search
tree of tuple set pointers, using the scope of the tuple as the
key for the node. When adding a filter to a tuple set, we search
the set for redundant filters. If no redundant filters exist in the
set, then we add the filter to the set. If a redundant filter exists
in the set, we discard the filter. The time complexity of this
search technique depends on the number of tuples created by
filters in the filter set and the distribution of filters across the
tuples. In practice, we find that this technique provides acceptable performance. Generating a synthetic filter set of 10k filters
requires approximately five seconds, while a filter set of 100k
filters requires approximately five minutes with a Sun Ultra 10
workstation.
In order to support the traditional linear search technique,
filter priority is often inferred by placement in an ordered list.
In such cases, the first matching filter is the best matching filter.
This arrangement could obviate a filter f_i if a less specific filter f_j ⊃ f_i occupies a higher position in the list. To prevent this,
we order the filters in the synthetic filter set according to scope,
where filters with minimum scope occur first. The binary search
tree of tuple set pointers makes this ordering task simple. Recall
that we use scope as the node key. Thus, we simply perform an
in-order walk of the binary search tree, appending the filters in
each tuple set to the output list of filters.
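A simplified Python rendering of these two steps (ours; a hash set stands in for the paper's binary search tree of tuple set pointers, and a stable sort by scope reproduces the output ordering):

    def remove_redundant_and_order(filters, scope):
        # filters: hashable filter tuples; scope: function implementing
        # Equation (2) for a filter.
        seen = set()
        unique = []
        for f in filters:
            if f not in seen:        # discard redundant filters
                seen.add(f)
                unique.append(f)
        # Minimum scope (most specific) first, so a linear search visits
        # nested filters in order of decreasing specificity.
        unique.sort(key=scope)
        return unique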
VI. TRACE GENERATION
When benchmarking a particular packet classification algorithm or device, many of the metrics of interest such as storage
efficiency and maximum decision tree depth may be garnered
using the synthetic filter sets generated by the Filter Set Generator. In order to evaluate the throughput of techniques employ-
ing caching or the power consumption of various devices under
load, we must exercise the algorithm or device using a sequence
of synthetic packet headers. The Trace Generator produces a
list of synthetic packet headers that probe filters in a given filter set. Note that we do not want to generate random packet
headers. Rather, we want to ensure that a packet header is covered by at least one filter in the FilterSet in order to exercise
the packet classifier and avoid default filter matches. We experimented with a number of techniques to generate synthetic
headers. One possibility is to compute all the d-dimensional
polyhedra defined by the intersections of the filters in the filter
set, then choose a point in the d-dimensional space covered by
the polyhedra. The point defines a packet header. The best-matching filter
for the packet header is simply the highest-priority filter associated
with the chosen polyhedron. If we generate at least one header
corresponding to each polyhedron, we fully exercise the filter set.
However, the number of polyhedra defined by filter intersections grows
rapidly, as large as O(N^d) for N filters on d fields, and thus fully
exercising the filter
set quickly becomes intractable. As a result, we chose a method
that partially exercises the filter set and allows the user to vary
the size and composition of the headers in the trace using high-level input parameters. These parameters control the scale of
the header trace relative to the filter set, as well as the locality
of reference in the sequence of headers. As we did with the
Filter Set Generator, we discuss the Trace Generator using the
pseudocode shown in Figure 14.
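To make the intersection structure concrete: two axis-aligned filters
overlap exactly when their ranges intersect in every field. A minimal
check under the illustrative range-tuple representation used above
follows; it is the enumeration of all distinct regions carved out by
such overlaps, up to O(N^d) for N filters on d fields, that becomes
intractable.

    # Hedged sketch: overlap test for two filters, each a tuple of
    # per-field (low, high) ranges. Two d-dimensional boxes intersect
    # iff their ranges intersect in every one of the d dimensions.
    def filters_overlap(f, g):
        return all(f_lo <= g_hi and g_lo <= f_hi
                   for (f_lo, f_hi), (g_lo, g_hi) in zip(f, g))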
We begin by reading the FilterSet (line 1) and getting the
input parameters scale, ParetoA, and ParetoB (lines 2 through
4). The scale parameter is used to set a threshold for the size of
the list of headers relative to the size of the FilterSet (line 5). In
this context, scale specifies the ratio of the number of headers
in the trace to the number of filters in the filter set. The next set
of steps continues to generate synthetic headers as long as the
size of Headers does not exceed the Threshold defined by
the product of scale and the number of filters in FilterSet.
Each iteration of the header generation loop begins by selecting a random filter in the FilterSet (line 8). Next, we must
choose a packet header covered by the filter. In the interest of
exercising priority resolution mechanisms and providing conservative
performance estimates for algorithms relying on filter
overlap properties, we would like to choose headers matching a
large number of filters. In the course of our analyses, we found that
the number of overlapping filters is large for packet headers
representing the “corners” of filters. Each field of a filter covers
a range of values. Choosing a packet header corresponding to
a “corner” translates to choosing a value for each header field
from one of the extrema of the range specified by each filter
field. The RandomCorner function chooses a random “corner” of the filter identified by RandFilt and stores the header
in NewHeader.
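A minimal rendering of this step, again assuming the illustrative
range-tuple filter representation (the real RandomCorner operates on the
tool's internal filter structure):

    import random

    # Hedged sketch of RandomCorner (Fig. 14, line 9): choose one
    # extremum of each field's range, yielding a header the filter
    # covers that tends to lie on many filter boundaries.
    def random_corner(filt):
        return tuple(random.choice((lo, hi)) for (lo, hi) in filt)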
The last steps in the header generation loop append a variable
number of copies of NewHeader to the trace. The number of
copies, Copies, is chosen by sampling from a Pareto distribution controlled by the input parameters, ParetoA and ParetoB
(line 10). In doing so, we provide a simple control point for
the locality of reference in the header trace. The Pareto distribution
(a power-law distribution named after the Italian economist Vilfredo
Pareto, also known as the Bradford distribution) is one of the
heavy-tailed distributions commonly used to model the burst size of
Internet traffic flows as well as the file size distribution for traffic
using the TCP protocol [19]. For
convenience, let $a = ParetoA$ and $b = ParetoB$. The probability
density function for the Pareto distribution may be expressed as:

$$ P(x) = \frac{a\,b^{a}}{x^{a+1}} \qquad (5) $$

where the cumulative distribution is:

$$ D(x) = 1 - \left(\frac{b}{x}\right)^{a} \qquad (6) $$

The Pareto distribution has a mean of:

$$ \mu = \frac{ab}{a-1} \qquad (7) $$
Expressed in this way, a is typically called the shape parameter
and b is typically called the scale parameter, as the distribution
is defined on values in the interval [b, ∞). The following are
some examples of how the Pareto parameters are used to control
locality of reference:
• Low locality of reference, short tail: (a = 10, b = 1) most
headers will be inserted once
• Low locality of reference, long tail: (a = 1, b = 1) many
headers will be inserted once, but some could be inserted
over 20 times
• High locality of reference, short tail: (a = 10, b = 4) most
headers will be inserted four times
Once the size of the trace exceeds the threshold, the header generation loop terminates. Note that a large burst near the end of
the process will cause the trace to be larger than Threshold.
After generating the list of headers, we write the trace to an
output file (line 13).
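Putting the loop together: the paper does not specify how
Pareto(ParetoA, ParetoB) is sampled, so the sketch below assumes standard
inverse-transform sampling from the CDF in Equation (6), solving
u = D(x) for x = b / (1 - u)^(1/a) with u uniform on [0, 1), and reuses
the hypothetical range-tuple filters and random_corner helper from above.

    import random

    def random_corner(filt):
        # As sketched earlier: one extremum per field.
        return tuple(random.choice((lo, hi)) for (lo, hi) in filt)

    def sample_pareto(a, b):
        # Inverse-transform sample of Pareto(shape=a, scale=b).
        u = random.random()                  # u in [0, 1), so 1 - u is in (0, 1]
        return b / ((1.0 - u) ** (1.0 / a))

    def generate_trace(filter_set, scale, pareto_a, pareto_b):
        threshold = scale * len(filter_set)            # line 5 of Fig. 14
        headers = []
        while len(headers) < threshold:                # line 7
            filt = random.choice(filter_set)           # line 8: pick a random filter
            header = random_corner(filt)               # line 9: a header it covers
            copies = max(1, int(sample_pareto(pareto_a, pareto_b)))  # line 10
            headers.extend([header] * copies)          # lines 11-12: bursts give locality
        return headers                                 # Fig. 14 prints; we return the list

As the text notes, the size test happens only at the top of the loop, so
a large final burst can leave the trace somewhat larger than the
threshold.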
VII. BENCHMARKING WITH CLASSBENCH
We have already found ClassBench to be tremendously valuable in our own research [20]. In order to provide value for
the broader community, a packet classification benchmark must
provide meaningful measurements that cover the spectrum of
application environments. It is with this in mind that we designed the suite of ClassBench tools to be flexible while hiding
the low-level details of filter set structure. While it is unclear whether
real filter sets will vary as specified by the smoothing and scope
parameters, we believe that the tool provides a useful mechanism for measuring the effects of filter set composition on classifier performance. It is our hope that ClassBench will enjoy
broader use by researchers in need of realistic test vectors; it
is also our intention to initiate and frame a broader discussion
within the community that results in a larger set of parameter
files that model real filter sets as well as the formulation of a
standard benchmarking methodology.
ACKNOWLEDGMENTS
We would like to thank Ed Spitznagel for contributing his
insight to countless discussions on packet classification and assisting in the debugging of the ClassBench tools. We also would
like to thank Venkatachary Srinivasan and Will Eatherton for
making real filter sets available for study.
REFERENCES
[1] D. E. Taylor, "Survey & Taxonomy of Packet Classification Techniques," Tech. Rep. WUCSE-2004-24, Department of Computer Science & Engineering, Washington University in Saint Louis, May 2004.
[2] V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, "Framework for IP Performance Metrics," RFC 2330, May 1998.
[3] S. Bradner and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices," RFC 2544, March 1999.
[4] G. Trotter, "Methodology for Forwarding Information Base (FIB) based Router Performance," Internet Draft, January 2002.
[5] B. Hickman, D. Newman, S. Tadjudin, and T. Martin, "Benchmarking Methodology for Firewall Performance," RFC 3511, April 2003.
[6] P. Chandra, F. Hady, and S. Y. Lim, "Framework for Benchmarking Network Processors," Network Processing Forum, 2002.
[7] P. Gupta and N. McKeown, "Packet Classification on Multiple Fields," in ACM Sigcomm, August 1999.
[8] A. Feldmann and S. Muthukrishnan, "Tradeoffs for Packet Classification," in IEEE Infocom, March 2000.
[9] P. Gupta and N. McKeown, "Packet Classification using Hierarchical Intelligent Cuttings," in Hot Interconnects VII, August 1999.
[10] P. Warkhede, S. Suri, and G. Varghese, "Fast Packet Classification for Two-Dimensional Conflict-Free Filters," in IEEE Infocom, 2001.
[11] F. Baboescu and G. Varghese, "Scalable Packet Classification," in ACM Sigcomm, August 2001.
[12] F. Baboescu and G. Varghese, "Fast and Scalable Conflict Detection for Packet Classifiers," in Proceedings of IEEE International Conference on Network Protocols (ICNP), 2002.
[13] F. Baboescu, S. Singh, and G. Varghese, "Packet Classification for Core Routers: Is there an alternative to CAMs?," in IEEE Infocom, 2003.
[14] T. Y. C. Woo, "A Modular Approach to Packet Classification: Algorithms and Results," in IEEE Infocom, March 2000.
[15] V. Sahasranaman and M. Buddhikot, "Comparative Evaluation of Software Implementations of Layer 4 Packet Classification Schemes," in Proceedings of IEEE International Conference on Network Protocols, 2001.
[16] D. E. Taylor and J. S. Turner, "ClassBench: A Packet Classification Benchmark," Tech. Rep. WUCSE-2004-28, Department of Computer Science & Engineering, Washington University in Saint Louis, May 2004.
[17] Cisco, "CiscoWorks VPN/Security Management Solution," tech. rep., Cisco Systems, Inc., 2004.
[18] Lucent, "Lucent Security Management Server: Security, VPN, and QoS Management Solution," tech. rep., Lucent Technologies Inc., 2004.
[19] Wikipedia, "Pareto distribution," Wikipedia, The Free Encyclopedia, April 2004. http://en.wikipedia.org/wiki/Pareto_distribution.
[20] D. E. Taylor and J. S. Turner, "Scalable Packet Classification using Distributed Crossproducting of Field Labels," Tech. Rep. WUCSE-2004-38, Department of Computer Science and Engineering, Washington University in Saint Louis, June 2004.