Spotlight
Internet Traffic Measurement
Carey Williamson • University of Calgary
The Internet’s evolution over the past 30
years has been accompanied by the development of various network applications.
These applications range from early text-based
utilities such as file transfer and remote login to
the more recent advent of the Web, electronic commerce, and multimedia streaming.
For most users, the Internet is simply a connection to these applications. They are shielded from the
details of how the Internet works, through the information-hiding principles of the Internet protocol
stack, which dictates how user-level data is transformed into network packets for transport across the
network and put back together for delivery at the
receiving application. For many networking researchers, however, the protocols themselves, rather
than the information they carry, are of interest.
Using specialized network measurement hardware or software, these researchers collect information about network packet transmissions,
including their timing structure and contents. With
detailed packet-level measurements and some
knowledge of the IP stack, they can use reverse
engineering to gather significant information about
both the application structure and user behavior,
which can be applied to a variety of tasks like network troubleshooting, protocol debugging, workload characterization, and performance evaluation
and improvement.
From humble beginnings in local area networks,
traffic measurement technologies have scaled up
over the past 15 years to provide insight into fundamental behavior properties of the Internet, its
protocols, and its users. In this overview, I introduce the tools and methods for measuring Internet
traffic and offer highlights from research results.
Measurement Tools
Network measurement tools include hardware and
software approaches. Software tools are typically
much less expensive than hardware, but the latter
usually offer better functionality and performance.
Hardware Approaches
Network traffic analyzers are special-purpose tools
designed to collect and analyze network data. Such
equipment is widely available and often expensive,
with the cost depending on the number of network
interfaces, types of network cards, storage capacity, and protocol processing capabilities.
As an example of the hardware-based traffic
analysis process, I’ll use measurements we collected in 1998 at an ISP running IP over an Asynchronous Transfer Mode (ATM) backbone. To analyze
network traffic, we used a NavTel IW95000.1
The ATM network analyzer provides nonintrusive
capture of cell-level ATM traffic streams, including
packet headers and payloads. The analyzer timestamps each ATM cell with 1-microsecond resolution and records the captured traffic
into memory in a compressed proprietary binary
data format. The size of the memory capture buffer
and the volume of network traffic determine the
maximum time interval for trace collection (typically several seconds at 155 Mbps OC-3 rates, and several minutes at 1.5 Mbps T1 rates).
Once the capture buffer is full, traces can be saved
to disk or copied to another machine for offline trace
analysis. In this case, we used a custom C program
to decode the NavTel IW95000’s recorded data. The
program converts the binary data file into an ASCII
format with TCP/IP protocol information.
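Decoding is conceptually simple once IP packets have been reassembled from the captured cells. The C fragment below sketches the idea; the header structs are simplified stand-ins for illustration (not the NavTel’s actual proprietary record layout), and real code would also validate lengths, honor IP options, and avoid unaligned access.

```c
#include <stdio.h>
#include <stdint.h>
#include <netinet/in.h>  /* struct in_addr */
#include <arpa/inet.h>   /* ntohs, ntohl, inet_ntoa */

/* Simplified IPv4 and TCP headers (options and padding ignored). */
struct ip_hdr  { uint8_t vhl, tos; uint16_t len, id, frag;
                 uint8_t ttl, proto; uint16_t csum;
                 uint32_t src, dst; };
struct tcp_hdr { uint16_t sport, dport; uint32_t seq, ack; };

/* Print one Figure 1-style ASCII line for a reassembled packet. */
void print_record(uint32_t usec, const uint8_t *pkt)
{
    const struct ip_hdr *ip = (const struct ip_hdr *)pkt;
    if (ip->proto != 6)                       /* decode TCP only */
        return;
    int ihl = (ip->vhl & 0x0f) * 4;           /* IP header bytes */
    const struct tcp_hdr *tcp = (const struct tcp_hdr *)(pkt + ihl);

    /* inet_ntoa reuses a static buffer, hence two printf calls */
    struct in_addr s, d;
    s.s_addr = ip->src;
    d.s_addr = ip->dst;
    printf("%u IP TCP %s %u ", usec, inet_ntoa(s), ntohs(tcp->sport));
    printf("%s %u %u %u %u\n", inet_ntoa(d), ntohs(tcp->dport),
           ntohs(ip->len), ntohl(tcp->seq), ntohl(tcp->ack));
}
```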
Figure 1 shows an example of the human-readable trace format (I’ve “sanitized” IP addresses throughout to conceal user identities). The format includes a timestamp (in microseconds,
relative to the trace’s start time), recognized protocol types, and selected fields from the IP and
TCP packet headers.
Given this trace format, you can easily construct
customized scripts to process a trace file and
extract the desired information, such as timestamp,
packet size, and IP and TCP protocol information.
In this example, we used offline trace analyses to
study ISP Web traffic characteristics.1
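As an illustration of such a script (a sketch in C, and the field layout assumed is exactly the whitespace-separated format of Figure 1), the following program reads a trace from standard input and computes the fraction of bytes attributable to Web traffic on port 80:

```c
#include <stdio.h>

/* Tally bytes per trace record (fields as in Figure 1) and
   report the share of bytes sent to or from TCP port 80. */
int main(void)
{
    char proto1[8], proto2[8], src[40], dst[40];
    unsigned long ts, sport, dport, size, seq, ack;
    double total = 0.0, web = 0.0;

    while (scanf("%lu %7s %7s %39s %lu %39s %lu %lu %lu %lu",
                 &ts, proto1, proto2, src, &sport, dst,
                 &dport, &size, &seq, &ack) == 10) {
        total += size;
        if (sport == 80 || dport == 80)
            web += size;
    }
    if (total > 0.0)
        printf("Web share of bytes: %.1f%%\n", 100.0 * web / total);
    return 0;
}
```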
TIMESTAMP  PROTOCOL  SOURCE IP_ADDRESS  SRC PORT  DESTINATION IP_ADDRESS  DST PORT  IP_PKT SIZE  TCP SEQ  TCP ACK
0          IP TCP    307.246.129.64     1060      427.86.12.704           80        40           920641   412791
14966      IP TCP    561.877.104.57     7410      427.86.12.704           80        508          410104   32779
15015      IP TCP    391.82.374.90      1105      891.82.59.75            80        40           2816846  7726
22090      IP TCP    719.327.502.359    1140      526.837.913.44          80        40           1010185  14762
22126      IP TCP    582.127.755.91     1291      419.74.87.6             80        40           9557082  50482
29960      IP TCP    561.877.104.57     3741      427.86.12.704           80        40           985526   58006
29960      IP TCP    419.74.87.6        80        582.127.755.91          1291      1500         653402   57082
31724      IP TCP    419.74.87.6        80        582.127.755.91          1291      1500         654862   57082
36055      IP TCP    512.84.9.317       1125      419.74.87.628           80        311          857517   89873
36279      IP TCP    512.84.9.317       1126      419.74.87.628           80        271          857661   3293
37181      IP TCP    407.84.92.183      1207      398.54.73.39            5190      40           64202    9407
41731      IP TCP    399.81.77.33       80        342.406.374.91          1116      40           1062629  68778

Figure 1. TCP/IP packet trace file of ISP network measurements. IP and TCP packet headers include IP source and destination address, IP packet size, TCP source and destination port numbers, and TCP sequence-number information for data and acknowledgment packets.
Software Approaches
Software-based measurement tools typically extend a commodity workstation’s operating-system kernel to give its network interface a packet-capture capability. One
widely used tool is tcpdump, which uses the Berkeley Packet Filter architecture to capture TCP/IP packets. Tcpdump lets you capture a network’s IP packets
and filter the captured traffic streams based on specific host addresses, port numbers, or protocol types.
Tcpdump is widely used to study Internet applications and growth trends in Internet traffic over time.2
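Tcpdump itself is built on the libpcap packet-capture library, so a few lines of C reproduce its core loop. The following is a minimal sketch: the interface name and filter expression are illustrative choices, and error handling is pared down.

```c
#include <stdio.h>
#include <pcap.h>

/* Called by libpcap for each captured packet. */
static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes)
{
    (void)user; (void)bytes;
    printf("%ld.%06ld  %u bytes captured (%u on wire)\n",
           (long)h->ts.tv_sec, (long)h->ts.tv_usec,
           h->caplen, h->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    bpf_u_int32 net = 0, mask = 0;
    struct bpf_program prog;

    /* "eth0" and the filter string below are example values. */
    pcap_t *p = pcap_open_live("eth0", 96, 1 /* promiscuous */,
                               1000, errbuf);
    if (p == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

    pcap_lookupnet("eth0", &net, &mask, errbuf);
    if (pcap_compile(p, &prog, "tcp port 80", 1, mask) == -1 ||
        pcap_setfilter(p, &prog) == -1) {
        fprintf(stderr, "%s\n", pcap_geterr(p)); return 1;
    }
    pcap_loop(p, -1, on_packet, NULL);  /* capture indefinitely */
    pcap_close(p);
    return 0;
}
```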
Figure 2 shows an example of a tcpdump trace, which includes a timestamp for each
packet and the IP and TCP headers, which carry
address and control information. In post-processing,
you can extract application-level behaviors, such as
the Web document transfer shown in the figure.
Another software-based approach relies on the
access logs recorded by Web servers and proxies.
These logs record each client request for Web site
content, including the time of day, client IP
address, URL requested, and document size. Post-processing access logs can offer useful insight into
Web server workloads3 without having to collect
detailed network-level packet traces.
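As a sketch of such post-processing, the short C program below assumes the widely used Common Log Format and tallies request counts and bytes transferred; production scripts must also cope with malformed entries (for example, sizes logged as “-”), which this version simply skips.

```c
#include <stdio.h>

/* Summarize a Web access log in Common Log Format, e.g.:
   host - - [10/Oct/2001:13:55:36 -0600] "GET /a.html HTTP/1.0" 200 2326
   Lines that do not parse (missing size, odd quoting) are skipped. */
int main(void)
{
    char line[1024], host[64], url[256];
    int status;
    long bytes, requests = 0;
    double total_bytes = 0.0;

    while (fgets(line, sizeof line, stdin) != NULL) {
        if (sscanf(line,
                   "%63s %*s %*s %*s %*s \"%*s %255s%*[^\"]\" %d %ld",
                   host, url, &status, &bytes) == 4) {
            requests++;
            total_bytes += bytes;
        }
    }
    printf("%ld requests, %.0f bytes, %.1f bytes/request\n",
           requests, total_bytes,
           requests ? total_bytes / requests : 0.0);
    return 0;
}
```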
Measurement Methods
Four major axes characterize the measurement
approaches that researchers use to study network
behavior generally and the Internet specifically.
Passive versus Active Measurement
A passive network monitor records packet traffic on
a network without creating additional traffic. Most
network measurement tools fall into this category.
An active approach uses packets that the measurement device itself generates to probe the Internet and measure its characteristics. Examples of
this approach include
■ the ping utility, which estimates network latency to a particular Internet destination;
■ the traceroute utility, which determines Internet routing paths; and
■ the pathchar tool, which estimates link capacities and latencies along an Internet path.
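As a crude illustration of active measurement, the C sketch below times a TCP connection setup to estimate round-trip latency. This is not how ping works (ping uses ICMP echo packets, which require raw sockets); the address and port are placeholders.

```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Estimate latency by timing TCP connection establishment
   (roughly one round trip plus end-host overhead). */
int main(void)
{
    struct sockaddr_in addr;
    struct timeval t0, t1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                        /* Web port */
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);  /* example */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    gettimeofday(&t0, NULL);
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) == 0) {
        gettimeofday(&t1, NULL);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_usec - t0.tv_usec) / 1e3;
        printf("TCP connect time: %.2f ms\n", ms);
    } else {
        perror("connect");
    }
    close(s);
    return 0;
}
```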
Online versus Offline Analysis
Some network traffic analyzers support real-time
data collection and analysis, often with graphical
displays of live traffic data. Most hardware-based
analyzers support this feature. Other measurement
devices, such as tcpdump, are intended only for
real-time data collection and storage. Once the
device collects and stores the traffic data, you can
analyze it offline.
LAN versus WAN Measurement
Early network traffic research focused on local-area network (LAN) environments, such as Ethernet LANs. LANs are easier to measure than wide-area networks (WANs) for two main reasons.
First, a LAN is typically administered by a single well-known organization, so obtaining security clearance for traffic analysis is relatively straightforward. Second, an Ethernet LAN is a broadcast medium, so all hosts see all packets. To measure
traffic in this context, you simply configure a network interface into promiscuous mode; that is, the interface receives and records (rather than ignores) packets destined for other network hosts.
TIME          SOURCE.PORT > DESTINATION.PORT:         FLAG  SEQNUM / ACKNUM
19:52.731470  406.17.8.12.64826 > 723.65.19.6.www:    S 4256930:4256930(0)
19:52.731889  723.65.19.6.www > 406.17.8.12.64826:    S 768500:768500(0) ack 4256931
19:52.732200  406.17.8.12.64826 > 723.65.19.6.www:    . ack 768501 win 17520
19:52.738205  406.17.8.12.64826 > 723.65.19.6.www:    P 4256931:4257101(170) ack 768501
19:52.743248  723.65.19.6.www > 406.17.8.12.64826:    P 768501:5769840(1339) ack 4257101
19:52.758535  406.17.8.12.64826 > 723.65.19.6.www:    F 4257101:4257101(0) ack 5769840
19:52.758862  723.65.19.6.www > 406.17.8.12.64826:    . ack 4257102
19:52.759700  723.65.19.6.www > 406.17.8.12.64826:    F 5769840:5769840(0) ack 4257102
19:52.759935  406.17.8.12.64826 > 723.65.19.6.www:    . ack 5769841

Figure 2. A tcpdump packet trace file. This example shows a Web document transfer, including a timestamp for each packet and the IP and TCP headers.
Researchers later extended measurement work
to WAN environments.2,4,5 These environments
present challenges in administrative control of the
network, including security and privacy. Organizations with a single Internet access point can
install measurement devices inline, on an Internet
link near the organization’s default router.
Recently, Barford and Crovella discussed
deploying a wide-area Web measurement infrastructure that collects simultaneous measurements
of client, server, and network behaviors.6 By coordinating time between these measurements, it’s
possible to achieve a more complete picture of
end-to-end network performance.
Protocol-Level Analysis
Measurement tools collect data and analyze traffic
at different protocol levels. Many network traffic
analyzers support multilayer protocol analysis, but
require a specialized network card for each network
type. For example, specialized network cards exist
for Ethernet, Frame Relay, ATM, and wireless networks, but IP and higher-layer protocols can use
the same back-end protocol analysis engines.
Research Highlights
The past 15 years of Internet traffic measurement
have produced key observations; I’ve selected 10 that
I think best summarize and highlight this research.
1. Internet traffic continues to change.
Longitudinal studies show that Internet traffic continues to grow and change over relatively short time
scales.2 This change is not simply one of traffic volume, but also of traffic mix, protocols, applications,
and users. Despite the value of Internet traffic measurement as a research methodology, any data set
collected from an operational network represents but
one snapshot at one point in time in the Internet’s
evolution. Trying to identify invariants in traffic
structure is one way to cope with the unending battle of measuring and understanding Internet traffic.
2. Aggregate network traffic is multifractal.
Characterizing aggregate network traffic is difficult for many reasons:
■ the Internet’s heterogeneous nature,
■ network application diversity,
■ variable link speeds and network-access technologies, and
■ changing user behaviors.
Nevertheless, networking researchers have identified
a significant degree of long-range dependence (LRD)
in network traffic, which they refer to as “self-similar,” “fractal,” or “multifractal” behavior.7,8 This LRD
property appears to be ubiquitous; it is present in
LAN, WAN, video, data, Web, ATM, Frame Relay, and
SS7 signaling traffic. Researchers attribute this LRD
property in part to users’ heavy-tailed on-off behaviors, which are perhaps exacerbated by the Internet’s
TCP/IP protocols.9 More recent research addresses
Internet traffic’s “non-stationarity” and suggests that
the multifractal traffic structure evident at a large
network’s edges diminishes within the core.10
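To make these terms concrete, here are the standard textbook definitions (not specific to any one study cited above). For a stationary traffic process X with autocorrelation function r(k), LRD means the correlations decay as a power law rather than exponentially, and the Hurst parameter H summarizes how slowly the variance of the m-aggregated process X^(m) (averages over blocks of m time slots) shrinks:

```latex
r(k) \sim c\,k^{-\beta}, \quad 0 < \beta < 1, \qquad
H = 1 - \frac{\beta}{2} \in \left(\tfrac{1}{2},\, 1\right), \qquad
\mathrm{Var}\!\left(X^{(m)}\right) \sim \sigma^{2} m^{-\beta}.
```

For independent arrivals the aggregated variance would decay as m^{-1}; the slower m^{-β} decay is why burstiness persists across many time scales.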
Despite this complex multifractal structure, researchers have developed surprisingly concise mathematical models to characterize and analyze Internet traffic, with the aim of improving Internet infrastructure design.
3. Network traffic exhibits locality properties.
Network traffic structure is far from random. Traffic
structure is imposed implicitly by users’ application-layer tasks (such as file transfers or Web page downloads), and is reinforced by the TCP/IP data transfer
protocols. Packets are not independent and isolated
entities; rather, they are part of a logical information flow in the higher protocol layers. This flow
manifests at the network layer in recognizable —
though not necessarily predictable — patterns of
packet timing and source and destination addresses.
This structure is often referred to in terms of temporal locality (time-based correlation of information)
or spatial locality (geography-based correlation).
4. Packet traffic is distributed nonuniformly.
Analysis of TCP/IP packets’ source and destination
addresses typically shows that the distribution of
packet traffic among hosts is highly nonuniform. A
common observation is that 10 percent of hosts
account for 90 percent of traffic. In some sense, this
observation is not surprising, given the client-server
paradigm for many network applications. However,
the presence of this property in many network measurement studies suggests a fundamental power-law
structure in many aspects of Internet traffic3,9,11 and
even in certain aspects of Internet topology.12
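One common formalization of this concentration is a Zipf-like law (the form examined for Web requests by Breslau et al.11): ranking hosts or documents from most to least popular, the ith-ranked item receives a traffic share of roughly

```latex
P(i) \propto \frac{1}{i^{\alpha}},
```

with the exponent α typically estimated at or somewhat below 1 for Web request streams, so a handful of top-ranked items dominates the total.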
5. Packet sizes are distributed bimodally.
The sizes (in bytes) of the network packets traversing the Internet have a “spiky” distribution.4 About
half the packets carry the maximum number of data
bytes permitted by the maximum transmission unit
(MTU) parameter defined for a network interface.
About 40 percent of packets are small (40 bytes)
because of the prevalence of (header-only) TCP
acknowledgment packets for data received. The
remaining 10 percent of packets are somewhat randomly scattered between the two extremes, based
on how much user data remains in the “last” packet of a multipacket transfer. Occasionally, secondary
spikes occur in the distribution due to IP fragmentation between networks with different MTU sizes.
6. The packet arrival process is bursty.
Much of the classical work in queuing theory and
communication network design is based on the
assumption that the packet arrival process is a Poisson process. In simple terms, the Poisson arrival
process means that events — such as earthquakes,
telephone calls, and, in this case, packet arrivals —
occur independently at random times, with a welldefined average rate. More formally, the interarrival
times between events in a Poisson process are exponentially distributed and independent, and no two
events happen at exactly the same time.
Traffic Measurement Resources

■ Internet Traffic Archive (ITA) is a public-domain repository of traces and data sets collected by networking researchers. http://ita.ee.lbl.gov
■ Internet Traffic Report (ITR) offers hourly statistics on global Internet traffic trends. http://www.InternetTrafficReport.com
■ National Laboratory for Applied Network Research (NLANR) is a U.S.-based initiative on high-performance networking. http://nlanr.net/
■ NLANR Measurement and Operations Analysis Team (MOAT) offers online Internet traffic statistics, traces, and tools from a NLANR subgroup specializing in Internet traffic measurement. http://moat.nlanr.net
■ National Internet Measurement Infrastructure (NIMI) is an NLANR initiative that provides ubiquitous measurement capability for Internet traffic, topology, routing, and quality of service. http://www.ncne.nlanr.net/nimi/
■ Tcpdump is public-domain software for collecting network-level packet traces. http://www.tcpdump.org/ (All current as of Oct. 2001)

Poisson models are attractive mathematically
because the exponential distribution has a “memoryless” property: even if we know the time
elapsed since the last event, we have no hint when
the next event will occur. Poisson models are often
amenable to elegant mathematical analysis, leading, for example, to closed-form expressions for
the mean waiting time (and variance) in queuing
network models.
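For reference, the memoryless property and one such closed form (both standard results) can be stated compactly. If interarrival times T are exponential with rate λ, then for any elapsed time s, and for the classic M/M/1 queue with service rate μ > λ:

```latex
P(T > s + t \mid T > s) = P(T > t) = e^{-\lambda t}, \qquad
\bar{W}_{M/M/1} = \frac{1}{\mu - \lambda},
```

where W̄ is the mean time a packet spends in the system (queueing delay plus service).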
Detailed studies of Internet traffic show that the
packet arrival process is bursty, rather than Poisson.
That is, rather than having independent and exponentially distributed interarrival times, Internet
packets arrive in clumps.13 This bursty structure is
due in part to the data transmission protocols. The
result is that queuing behavior can be much more
variable than that predicted by a Poisson model.
Given this finding, the value of the simple (Poisson) network traffic models used in network performance studies is doubtful. This realization has motivated recent research on network traffic modeling.5
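To see why the distinction matters, compare two synthetic interarrival-time generators built by inverse-transform sampling, a standard construction; the rate and Pareto parameters below are arbitrary illustrative choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double uniform01(void)            /* u in (0,1) */
{
    return (rand() + 1.0) / (RAND_MAX + 2.0);
}

static double exp_gap(double rate)       /* exponential: Poisson arrivals */
{
    return -log(uniform01()) / rate;
}

static double pareto_gap(double k, double alpha)  /* heavy-tailed */
{
    /* P(X > x) = (k/x)^alpha for x >= k; infinite variance if alpha < 2 */
    return k / pow(uniform01(), 1.0 / alpha);
}

int main(void)
{
    double t_poisson = 0.0, t_bursty = 0.0;
    for (int i = 0; i < 10; i++) {
        t_poisson += exp_gap(1.0);          /* mean gap 1.0 */
        t_bursty  += pareto_gap(0.2, 1.2);  /* comparable mean, but clumpy */
        printf("%2d  poisson %8.3f   bursty %8.3f\n",
               i + 1, t_poisson, t_bursty);
    }
    return 0;
}
```

Runs of the Pareto stream show occasional enormous gaps separating dense clumps of arrivals, precisely the burstiness a Poisson model cannot reproduce.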
7. The session arrival process is Poisson.
Although the packet arrival process is not Poisson,
there is strong evidence that the session arrival
process is Poisson. That is, Internet users seem to
operate independently and at random when initiating access to certain Internet resources. This
observation has been noted for several network
applications. For example, in their studies of telnet
traffic, Paxson and Floyd have found that a Poisson process effectively models the session arrival
process when they use a time-varying rate (such as
hourly).13 Similarly, we found that a Poisson arrival
process is effective for modeling user requests for
individual Web pages on a Web server.3
8. Most TCP conversations are brief.
In a 1991 study, more than 90 percent of TCP conversations exchanged less than 10 Kbytes of data
and lasted at most two or three seconds.4 This
prevalence of short-lived connections was somewhat surprising at the time, particularly for file
transfer and remote login applications. However,
the Web’s advent has significantly reinforced this
conversation paradigm. The literature suggests that
approximately 80 percent of Web document transfers are less than 10 Kbytes, though the distribution has a significant heavy tail.3,9
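In this context, “heavy tail” has a precise meaning: the probability of very large transfers falls off polynomially rather than exponentially. The canonical example is the Pareto distribution,

```latex
P(X > x) = \left(\frac{k}{x}\right)^{\alpha}, \qquad x \ge k, \quad 0 < \alpha < 2,
```

for which the variance is infinite (and, for α ≤ 1, even the mean), so a small number of very long transfers carries a disproportionate share of the bytes.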
9. Traffic flows are bi-directional, but often
asymmetric.
Many Internet applications generate a bi-directional
data exchange, though the data volume sent in each
direction often differs greatly. This observation was
true in the early 1990s (for example, see Caceres et
al.4 and Paxson5), and it is even truer today because
of the Web’s download-intensive nature. We don’t
yet know how large-scale peer-to-peer networking
paradigms (such as Napster and grids) will impact
Internet traffic asymmetry.
10. TCP accounts for most Internet traffic.
Since the early 1990s, TCP has dominated Internet
packet traffic,4,14 and will likely continue to do so
for the foreseeable future. The primary reason is
the Web’s advent. Because the Web relies on TCP
for reliable data transfer, the growing number of
Internet users, the widespread availability of easy-to-use Web browsers, and the proliferation of Web
sites with rich multimedia content have combined
to create an exponential growth of TCP traffic.
Although Web caching and content distribution
networks have softened TCP’s impact (for examples, see Breslau et al.11 and my forthcoming article15), its overall growth is still dramatic. That said,
several recent (and popular) Internet applications
— including video streaming, Napster, IP telephony, and multicast — rely predominantly on the user
datagram protocol (UDP), and might gradually
shift the traffic balance away from TCP.
Conclusion
Network measurement research has grown in scope
and magnitude to match the Internet. Recent initiatives (see the “Traffic Measurement Resources” sidebar) are striving to provide a practical and scalable
infrastructure for wide-scale operational measurement of today’s Internet. Among the challenges
ahead are to establish an adequate measurement
infrastructure across heterogeneous, multivendor
networks, and ensure the infrastructure’s scalability
with increased traffic and network speeds.
References
1. R. Epsilon, J. Ke, and C. Williamson, “Analysis of ISP
IP/ATM Network Traffic Measurements,” ACM Performance
Evaluation Review, vol. 27, no. 2, Sept. 1999, pp. 15-24.
2. V. Paxson, “Growth Trends in Wide Area TCP Connections,”
IEEE Network, vol. 8, no. 4, July-Aug. 1994, pp. 8-17.
3. M. Arlitt and C. Williamson, “Internet Web Servers: Workload
Characterization and Performance Implications,” IEEE/ACM
Trans. Networking, vol. 5, no. 5, Oct. 1997, pp. 815-826.
4. R. Caceres et al., “Characteristics of Wide-Area TCP/IP Conversations,” Proc. ACM Special Interest Group Data Comm.
(SIGCOMM’91), ACM Press, New York, 1991, pp. 101-112.
5. V. Paxson, “Empirically-Derived Analytic Models of Wide-Area TCP Connections,” IEEE/ACM Trans. Networking, vol.
2, no. 4, Aug. 1994, pp. 316-336.
6. P. Barford and M. Crovella, “Measuring Web Performance
in the Wide Area,” ACM Performance Evaluation Review,
vol. 27, no. 2, Sept. 1999, pp. 37-48.
7. A. Feldmann, A. Gilbert, and W. Willinger, “Data Networks
as Cascades: Explaining the Multi-Fractal Nature of Internet Traffic,” Proc. ACM Special Interest Group Data Comm.
(SIGCOMM’98), ACM Press, New York, 1998, pp. 42-55.
8. W. Leland et al., “On the Self-Similar Nature of Ethernet
Traffic (Extended Version),” IEEE/ACM Trans. Networking,
vol. 2, no. 1, Feb. 1994, pp. 1-15.
9. M. Crovella and A. Bestavros, “Self-Similarity in World Wide
Web Traffic: Evidence and Possible Causes,” IEEE/ACM
Trans. Networking, vol. 5, no. 6, Dec. 1997, pp. 835-846.
10. J. Cao et al., “On the Nonstationarity of Internet Traffic,”
Proc. ACM Special Interest Group Metrics (SIGMETRICS’01), ACM Press, New York, 2001, pp. 102-112.
11. L. Breslau et al., “Web Caching and Zipf-Like Distributions:
Evidence and Implications,” Proc. Int’l Joint Conf. IEEE Computer and Comm. Societies (INFOCOM’99), IEEE Computer Soc. Press, Los Alamitos, Calif., 1999, pp. 126-134.
12. M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology,” Proc. ACM
Special Interest Group Data Comm. (SIGCOMM’99), ACM
Press, New York, 1999, pp. 251-262.
13. V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of
Poisson Modeling,” IEEE/ACM Trans. Networking, vol. 3,
no. 3, June 1995, pp. 226-244.
14. K. Thompson, G. Miller, and R. Wilder, “Wide-Area Internet Traffic Patterns and Characteristics,” IEEE Network, vol.
11, no. 6, Nov.-Dec. 1997, pp. 10-23.
15. C. Williamson, “On Filter Effects in Web Caching Hierarchies,” to be published in ACM Trans. Internet Technology,
vol. 2, no. 1, Feb. 2002.
Carey Williamson is a professor in the Department of Computer Science at the University of Calgary in Calgary,
Alberta, Canada, where he holds an iCORE senior research
fellowship in broadband wireless networks, applications,
protocols, and performance. His research interests include
Internet protocol performance, network traffic measurement, and network simulation.
Readers can contact Williamson via e-mail at carey@cpsc.ucalgary.ca.