SCALABILITY OF HTTP PACING WITH INTELLIGENT BURSTING
Kevin J. Ma, Radim Bartoš, and Swapnil Bhatia
Department of Computer Science, University of New Hampshire, Durham, NH 03824
{kjma, rbartos, sbhatia}@cs.unh.edu
ABSTRACT
While streaming protocols like RTSP/RTP have continued to evolve,
HTTP has remained a primary method for Web-based video retrieval. The ubiquity and simplicity of HTTP make it a popular
choice for many applications. However, HTTP was not designed
for retrieving data with just-in-time tolerances; HTTP servers have
always taken an as-fast-as-possible approach to data delivery. For
media with known bandwidth constraints (e.g., audio/video files),
HTTP servers can be enhanced and optimized by taking these
constraints into account. For these data types, we present our architecture for an HTTP streaming server using paced output. We
discuss the scalability advantages of our HTTP streaming server
architecture and compare it with traditional HTTP server response
times and bandwidth usage. We also introduce an intelligent bursting
mechanism and consider its effects on end user experience.
1. INTRODUCTION
The popular definition of multimedia has evolved over time, as advances in technology have enabled new types of media interaction.
The current focus of mainstream multimedia is undoubtedly streaming video. Over the past few years, Internet-based streaming video
has become a commoditized fixture of modern culture. Schemes
have been proposed to combat network congestion [1]; however, network bandwidth has largely increased to meet the needs of video,
and the performance bottleneck has moved back to the streaming
servers. While server networking has been studied [2], streaming
server architecture requires additional scrutiny.
From a networking perspective, video delivery breaks down into
streaming vs. download. Streaming is typically associated with
RTSP [3] and RTP [4]. Download is typically associated with HTTP
[5], but divided into two categories: straight download and progressive download, with the latter using range requests to retrieve
data in chunks. A third HTTP option, which we propose as HTTP
streaming, uses paced data output. (Client side pacing, using range
requests, is another HTTP option [6], however, it requires custom
client software, which negates the ubiquity advantages of HTTP.)
Our results show resource efficiency advantages for our HTTP
streaming architecture over straight HTTP download. HTTP
streaming also maintains the ubiquity advantage of HTTP over
protocols like RTSP/RTP or RTMP [7]. (While the ubiquity of RTMP in Web-based video distribution could easily be argued, the proprietary nature of RTMP makes it inaccessible to server and protocol developers.) Ubiquity is a key factor as
we consider the future of multimedia. As browser-based dominance
continues, HTTP will continue to play a crucial role in multimedia’s
evolution, and HTTP needs to evolve to meet the changing needs.
While RTSP/RTP have a number of streaming-related enhancements, two of the key features are network resilience and bandwidth
management. The former relies on frame-based packetization
and unreliable UDP transport to allow graceful degradation during
packet loss. However, with the advances in network infrastructure,
failures are less frequent, and most content providers would prefer
to ensure high quality, with no packet loss, using TCP. For the latter, the pacing employed by our HTTP streaming architecture brings
bandwidth and quality management to HTTP. RTSP/RTP also has
the disadvantage of requiring the creation of multiple UDP connections between server and client, which many firewalls do not allow.
Performance evaluations exist for traditional Web servers at a
network level [8], but not at the architectural level. Other reports
confirm the performance of the download model compared to other
streaming schemes [9]. In this paper we dive a step deeper and compare different HTTP architectures (i.e., HTTP streaming vs. HTTP
download) from a video delivery perspective. HTTP download
is still widely used for video retrieval on both desktop and mobile platforms. Most RTMP and RTSP/RTP servers support HTTP
tunneling for its firewall traversability. Windows Media™ and
QuickTime® desktop clients support both RTSP/RTP and HTTP.
The Windows Media™ mobile player, until recently, supported only
HTTP, and the QuickTime® mobile player continues to support only
HTTP download. (While the iPhone™/iPod® Touch use range requests, the default is to request the entire file, in a degenerate case that mimics straight download.) In this paper, we compare the characteristics of
our own HTTP streaming server implementation with that of the de
facto standard Apache HTTP server. We also introduce intelligent
bursting and detail its effect on HTTP streaming performance.
2. APACHE HTTP SERVER ARCHITECTURE
Traditional HTTP servers are optimized for delivering web page content, which typically consists of many small files. Small files typically imply short-lived connections. For video, however, file sizes
are large and connections are much longer lived. With short-lived
connections, the number of concurrent connections is much smaller.
Servers that are optimized for fewer concurrent connections may
suffer from head-of-line blocking in the request queue when traffic
is primarily composed of long-lived video streams.
The Apache Web Server is the de facto standard in open source
Web serving. Apache spawns a bounded number of new processes
to handle incoming requests, where each request is assigned to its
own process in a run-to-completion model. Output data is sent as
fast as possible, with fairness between active connections (processes)
managed by the underlying OS.
3. ZIPPY HTTP SERVER ARCHITECTURE

Fig. 1. Comparison of Apache and Zippy architectures.
The Zippy streaming server we developed uses a single thread for
managing all sessions, rather than a process per connection. Session
state is managed by our session pacer rather than by individual
processes. Fig. 1 shows the difference between the Apache multiprocess architecture and the Zippy single thread architecture. With
Zippy, connection fairness is explicitly enforced by the session pacer,
rather than the OS scheduler. While other single-threaded HTTP
server options exist, Zippy is focused on using fair access to resources, via the session pacer, to better support large numbers of
concurrent long-lived connections. The large number of parallel sessions provides an advantage over servers capped by process limits.
The session pacer maintains sessions S_i in a heap, ordered by
the sessions’ next absolute send time. Absolute send times are
calculated as current wall clock time plus the pacing delay minus any
overhead: T_send(i) = T_now + δ_pace(i) − k_i. The pacing delay is
calculated using Zippy’s fixed chunk size c and a known constant bit
rate (assumed for simplicity) for the session: δ_pace(i) = c / r_i. The
constant bit rate is derived from the file size divided by the file
duration: r_i = s_i / d_i. The overhead k_i includes processing latency
and catch-up delays as described below.
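Zippy’s source is not reproduced in this paper; the following is a minimal Python sketch, of our own construction, of the pacing discipline just described. The chunk size value and the send_chunk callback are illustrative assumptions, not the actual implementation.

    import heapq
    import time

    CHUNK_SIZE = 64 * 1024  # c: fixed chunk size in bytes (assumed value)

    class Session:
        def __init__(self, sid, file_size, duration, overhead=0.0):
            self.sid = sid
            self.rate = file_size / duration          # r_i = s_i / d_i (bytes/s)
            self.pace_delay = CHUNK_SIZE / self.rate  # delta_pace(i) = c / r_i
            self.overhead = overhead                  # k_i: processing/catch-up credit (s)

        def next_send_time(self):
            # T_send(i) = T_now + delta_pace(i) - k_i
            return time.monotonic() + self.pace_delay - self.overhead

    # Min-heap of (T_send, sid, session): the pacer always services the
    # session with the earliest absolute send time.
    heap = []

    def schedule(session):
        heapq.heappush(heap, (session.next_send_time(), session.sid, session))

    def run_once(send_chunk):
        t_send, _, session = heapq.heappop(heap)
        delay = t_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # the single thread sleeps until the next send is due
        send_chunk(session)    # write one fixed-size chunk to the socket
        schedule(session)      # re-arm with the session's next send time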
Zippy employs intelligent bursting mechanisms: one to catch
up sessions when network latency inhibits chunk sends; another to
decrease playback latency for media files. During periods of network
congestion, full or near-full TCP buffers may cause partial sends or
send failures. In such an event, future pacing delays are shortened to
help the sessions catch up. (While a larger chunk size, rather than a
shorter delay, could be used for catch-up, if network congestion is the
cause of the failure, then the TCP window is most likely limiting how
much data can be sent.) This adaptive bursting is used to prevent
network issues from causing buffer underrun.
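Continuing the sketch above (the on_send_result hook and its accounting are illustrative assumptions), a partial send can be folded into the overhead term k_i so that subsequent pacing delays shrink:

    def on_send_result(session, requested, sent):
        # Partial send or failure: carry the shortfall forward as catch-up
        # credit, shortening the session's future pacing delays.
        if sent < requested:
            deficit = requested - sent                  # bytes still owed
            session.overhead += deficit / session.rate  # seconds owed at rate r_i
        else:
            session.overhead = 0.0  # caught up; resume normal pacing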
User experience can also be enhanced through client buffer
preload. To combat jitter and prevent underrun, video file playback
typically will not commence until a sufficient amount of video has
been buffered. With paced output, the playback buffering latency
negatively affects user experience. This can be avoided by bursting
the initial portion of the media file.
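In terms of the earlier sketch, the preload amount follows directly from the buffering requirement (the 4 second target is the value assumed in Section 4):

    BUFFER_SECONDS = 4.0  # assumed client preload target, as in Section 4

    def initial_burst_bytes(session):
        # Amount to send as fast as thresholds allow before switching to
        # paced output: enough to fill the client's playback buffer.
        return int(BUFFER_SECONDS * session.rate)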
Zippy manages bursting by monitoring bandwidth thresholds to
prevent bursting sessions from interfering with the minimum bandwidth requirements of paced sessions. Bursting uses only excess
system bandwidth, divided evenly among all bursting sessions or
among only the high-priority sessions.
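A minimal sketch of such a threshold policy, assuming the server tracks its total capacity and the paced sessions’ aggregate demand (both names are illustrative):

    def burst_share(capacity_bps, paced_demand_bps, n_bursting):
        # Bursting sessions may only consume bandwidth left over after the
        # paced sessions' minimum requirements are met, split evenly.
        excess = max(0.0, capacity_bps - paced_demand_bps)
        return excess / n_bursting if n_bursting else 0.0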
4. EXPERIMENTAL RESULTS
For our experiments, Zippy and Apache 2.2.6 were installed on a
server with a 2.2 GHz Core2 Duo CPU and 2 GB RAM, running
FC8. (The Apache version corresponds to the default httpd installation for the FC8 distribution used.) To ensure that the test client is not
a limiting factor, test client software was installed on a machine with
dual 2.5 GHz quad core Xeons and 4 GB RAM, running RHEL5.1.
The machines were connected via a Gigabit Ethernet network.
The tests were performed using a 1 MB data file. A constant
bit rate of 400 kbps was assumed, which gives the file a duration
of 20 seconds. The client buffering requirement was assumed to
be 4 seconds (or 200 KB). Client underrun checks were performed
against the known constant bit rate, for each packet after the initial
buffer load (i.e., first 200 KB).
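These parameters are mutually consistent; assuming the paper’s round units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes):

    FILE_SIZE = 1_000_000  # bytes: the 1 MB test file
    BIT_RATE = 400_000     # bits/s: the assumed 400 kbps CBR

    duration = FILE_SIZE * 8 / BIT_RATE  # 8,000,000 bits / 400,000 bps = 20 s
    preload = 4 * BIT_RATE // 8          # 4 s of media = 200,000 bytes (200 KB)

    wire_time_preload = preload * 8 / 1e9    # 200 KB over GigE: ~1.6 ms ("about 2 ms")
    wire_time_file = FILE_SIZE * 8 / 1e9     # 1 MB over GigE: 8 ms ("about 10 ms")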
The test client is a multithreaded application which spawns a
thread per connection. The connections are initiated as a flash crowd,
with 500 microseconds between each request. Timestamps were
recorded relative to the start of the test, as well as relative to the
first TCP connection attempt. A sniffer monitored actual bandwidth
used. As scalability was our main focus, for each test, we examined
the performance of 100 and 1000 concurrent connections.
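Our test client is not reproduced here; a Python sketch of the flash crowd pattern it implements (host, port, and path are placeholders) might look like:

    import socket
    import threading
    import time

    def fetch(host, port, path, latencies, idx):
        # One thread per connection; record time from connect attempt
        # until the full response body has been drained.
        start = time.monotonic()
        with socket.create_connection((host, port)) as sock:
            request = (f"GET {path} HTTP/1.1\r\n"
                       f"Host: {host}\r\nConnection: close\r\n\r\n")
            sock.sendall(request.encode())
            while sock.recv(65536):
                pass
        latencies[idx] = time.monotonic() - start

    def flash_crowd(n, host="127.0.0.1", port=8080, path="/test.bin"):
        latencies = [None] * n
        threads = []
        for i in range(n):
            t = threading.Thread(target=fetch,
                                 args=(host, port, path, latencies, i))
            t.start()
            threads.append(t)
            time.sleep(0.0005)  # 500 microseconds between request launches
        for t in threads:
            t.join()
        return latencies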
4.1. Playback Latency
We consider playback latency as the amount of time required to send
enough data to fill the client buffer. Given our assumption of a 4 second buffer, streamed output without bursting should take less than 4
seconds to send the 200 KB. For a single straight download, over Gigabit Ethernet, 200 KB should take about 2 milliseconds, plus overhead. Figs. 2-3 show the playback latencies for each of 100/1000
sessions, respectively. The latencies are offsets from the first TCP
connection request, in seconds, sorted from low to high.

Fig. 2. Playback latency for 100 concurrent sessions.
In Fig. 2 the Zippy no-burst line, as expected, is consistently just
below 4 seconds. The Zippy burst line shows a much lower latency,
but with similar consistency across all sessions. The first 20 Apache
connections are much faster than Zippy (burst or no-burst), taking
about 60 milliseconds in total (∼3 milliseconds per connection).
Taking into account overhead, the Apache
results are as expected. The rest of the Apache plot, however, looks
like a step function. The steps represent the head-of-line blocking
and latency of run-to-completion download.
In Fig. 3 we can see that, with 1000 sessions, Apache performance
is noticeably worse. At a certain point, Apache’s head-of-line blocking
delays begin to cause TCP timeouts, and the TCP backoff causes
more significant latency penalties.
Fig. 3. Playback latency for 1000 concurrent sessions.

Fig. 4. Download time for 100 concurrent sessions.

With 1000 sessions, the total bandwidth requirement goes up
significantly, which inhibits Zippy’s ability to burst. We can see this
in Fig. 3 as the playback latency for Zippy burst and Zippy no-burst
converge. However, the worst case for both bursting and not bursting
is still significantly better than Apache.
Apache is faster than Zippy (burst or no-burst), for 20 or fewer
connections. This is due to the default Apache process limit for
the given machine. The Apache process limit may be manually increased; however, the strain on system resources is great when managing 1000 processes. Zippy consumes far fewer resources at 1000
concurrent sessions, and its consistency in processing all sessions in
parallel gives it a noticeable advantage in response time.
4.2. Download Time
We consider download time as the relative time at which the entire
file download completes. Given our assumption of a 20 second file
duration, streamed output without bursting should take less than 20
seconds from the time the HTTP connection is accepted. For a single
straight download, over Gigabit Ethernet, 1 MB should take about 10
milliseconds, plus overhead, from the time the HTTP connection is
accepted. Figs. 4-5 show the download start and end times for each
of 100/1000 sessions, respectively. The times are offsets from the
start of the test in seconds, sorted from low to high.
In Fig. 4 the Zippy no-burst line, as expected, is consistently just
below 20 seconds. The Zippy burst line is consistently at about 16
seconds, which takes into account the 4 second burst, followed by
pacing thereafter. The Apache download times are dwarfed by the
paced completion times. In the worst case it takes little more than 1
second to complete the straight download, which is as expected.
In Fig. 5 we can see again that for 1000 sessions, Zippy performance is about the same, but Apache does noticeably worse. The last
100 or so bursted sessions did not have enough excess bandwidth to
really burst; however, we can see that those sessions still beat the
no-burst deadlines. Apache, on the other hand, due to the exponential
backoff in TCP, takes significantly longer to download the last 200
or so connections. Even though the actual transfer time, once started,
is short, the user-perceived time is quite high.
For larger files, straight download latency gets worse, and more
TCP timeouts occur. Compounding this is that many types of clients
(esp. mobile) are unable to buffer entire files, which causes TCP
back pressure. This only exacerbates head-of-line blocking issues.
Fig. 5. Download time for 1000 concurrent sessions.
4.3. Bandwidth Usage
We consider bandwidth usage as an aggregate for the entire server.
Given our assumption of a 400 kbps constant bit rate,
streamed output without bursting should require 400 kbps per active
connection. Figs. 6-7 show the bandwidth used in the 100/1000
sessions cases, respectively. The bandwidth (in Mbps) is calculated,
over time, as an offset (in seconds) from the start of the test.
In Fig. 6 the Apache plot is clustered within the first second and
close to the practical capacity of the Gigabit Ethernet network and
the OS protocol stack. The Zippy burst plot also has a marker close
to the network limits, at the very beginning, representing its burst,
then periodic bursts of data are seen. A similar pattern of periodic
bursts is seen for the Zippy no-burst plot, but shifted to the right,
given the longer duration. The end times for the Zippy burst
and no-burst plots are at the expected 16 and 20 seconds, respectively, and the calculated average bandwidth used, over the full 16/20
seconds, is close to the expected 40 Mbps.
Fig. 6. Bandwidth usage for 100 concurrent sessions.

Fig. 7. Bandwidth usage for 1000 concurrent sessions.

The irregular burstiness of the Zippy plots is an artifact caused by
data send clustering and offset sampling. Data send clustering occurs
when all sessions are initiated at the same time, as with our flash
crowd scenario. This synchronization manifests itself as bursty
bandwidth usage. Average bandwidth used is actually much lower
due to the pacing delays. Offset sampling is the difference between
pacing rate and bandwidth sampling rate. When the burst crosses
a sampling boundary, high and low bandwidth measurements are
seen; the offset sampling rate ensures that boundaries will be crossed
at different points within the burst.
In Fig. 7 the Apache plot is again always at maximum bandwidth, with holes representing the TCP backoff. The Zippy burst
plot shows the burst at the beginning and tails off at about 16 seconds. The Zippy no-burst plot is relatively evenly distributed.
5. CONCLUSIONS AND FUTURE WORK
We have shown the scalability value of a single-threaded, paced architecture for HTTP streaming. This architecture enables connection fairness for a larger number of concurrent connections, while still
maintaining the ability to use greedy delivery for a smaller number
of concurrent connections. Traditional HTTP servers are optimized
to service short-lived connection requests, but they are suboptimal
for long-lived connections (e.g., live and on-demand audio/video, as
well as other emerging live data streams). The browser continues
to be the preferred medium for distributing all types of streaming
media and, as such, HTTP servers need to evolve to incorporate
optimizations for these new classes of realtime media. We believe
that our architecture is a step in that direction.

We continue to explore new aspects of HTTP streaming scalability.
We believe that combining some of the streaming advantages of
RTSP/RTP with the ubiquity, simplicity, and robustness of HTTP
provides an optimal solution for practical deployments, especially
in the case of mobile devices. We are evaluating different greedy
bursting schemes for maximizing bandwidth usage, as well as
investigating their effects on different traffic profiles (including
different types of audio/video files, as well as alternative types of
streaming media, e.g., microblogging or ticker data). We are also
looking into different jitter injection mechanisms (similar to BFD
[10]) to combat data send clustering without impacting user
experience. The future of multimedia is going to include more
categories of realtime data and new modes of interactivity. With
HTTP the likely transport mechanism, these new traffic patterns
need to be studied so that HTTP server architecture can be properly
optimized for the future.

6. REFERENCES

[1] C. Chen, Z. Li, and Y. Soh, “TCP-friendly source adaptation
for multimedia applications over the Internet,” Journal of Zhejiang
University - Science A (JZUS-A), pp. 1–6, February 2006.

[2] D. Freimuth, E. Hu, J. LaVoie, R. Mraz, E. Nahum, P. Pradhan,
and J. Tracey, “Server Network Scalability and TCP Offload,”
in Proceedings of the 2005 Annual USENIX Technical Conference,
April 2005, pp. 209–222.

[3] H. Schulzrinne, A. Rao, and R. Lanphier, “Real Time Streaming
Protocol (RTSP),” RFC 2326, Internet Engineering Task Force
(IETF), April 1998.

[4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
“RTP: A Transport Protocol for Real-Time Applications,” RFC
3550, Internet Engineering Task Force (IETF), July 2003.
[5] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter,
P. Leach, and T. Berners-Lee, “Hypertext Transfer Protocol
– HTTP/1.1,” RFC 2616, Internet Engineering Task Force
(IETF), June 1999.
[6] N. Färber, S. Döhla, and J. Issing, “Adaptive Progressive
Download Based on the MPEG-4 File Format,” Journal of Zhejiang University - Science A (JZUS-A), pp. 106–111, February
2006.
[7] L. Larson-Kelley, “Overview of streaming with Flash Media
Server 3,” February 2008, http://www.adobe.com/devnet/
flashmediaserver/articles/overview_streaming_fms3_02.html.
[8] T. Shinozaki, E. Kawai, S. Yamaguchi, and H. Yamamoto,
“Performance Anomalies of Advanced Web Server Architectures
in Realistic Environments,” in Proceedings of the IEEE International
Conference on Advanced Communication Technology (ICACT
2006), February 2006, pp. 169–174.
[9] Y. Won, J. Hong, M. Choi, C. Hwang, and J. Yoo, “Measurement of Download and Play and Streaming IPTV Traffic,”
IEEE Communications Magazine, pp. 154–161, October 2008.
[10] D. Katz and D. Ward, “Bidirectional Forwarding Detection,”
Internet Draft Version 8 (draft-ietf-bfd-base-08), Internet Engineering Task Force (IETF), March 2008.