User-specified Adaptive Scheduling in a Streaming Media
Network∗
Michael Hicks, Adithya Nagarajan
Department of Computer Science
Univ. of Maryland, College Park, MD
[email protected],
[email protected]
Robbert van Renesse
Department of Computer Science
Cornell Univ., Ithaca, NY
[email protected]

∗ This work was funded in part by DARPA grant F30620-98-2-0198, DARPA/AFRL-IFGA grant F30602-99-1-0532, a grant under NASA's REE program administered by JPL, NSF-CISE grant 9703470, and by the AFRL/Cornell Information Assurance Institute. The first author was also supported by the AFRL-IFGA Information Assurance Institute under grant AFOSR F49620-01-1-0312.
Abstract
In disaster and combat situations, mobile cameras
and other sensors transmit real-time data, used by
many operators and/or analysis tools. Unfortunately,
in the face of limited, unreliable resources and varying demands, not all users may be able to get the fidelity they require. This paper describes MediaNet, a distributed multimedia processing system designed with such scenarios in mind. Unlike past
approaches, MediaNet’s users can intuitively specify how the system should adapt based on their individual needs. MediaNet uses both local and online global resource scheduling to improve user performance and network utilization, and adapts without requiring underlying support for resource reservations. Performance experiments show that our
scheduling algorithm is reasonably fast, and that user
performance and network utilization are both significantly improved.
1 Introduction

Consider a dangerous setting, such as collapsed buildings caused by an earthquake.
Novel
recording devices, such as cameras carried by
Uninhabited Aerial Vehicles (UAVs) or by robots
that crawl through rubble, may be deployed to
explore the area. The output of these devices
can be of interest to many operators. Operators
may include rescue workers working in the rubble itself, people overseeing the work in a station
somewhere, the press, or software that creates,
say, a 3-dimensional model of the scene.
Different operators may require different views
of the area, and may have different fidelity requirements or user priorities. Although the operators may work independently of one another,
they share many resources, such as the recording
devices themselves, compute servers, and networks. These resources have limited capacity,
and thus it is necessary to allocate them carefully. Without resource reservation, adaptivity
is essential.
The conditions present in this disaster situation are not unique. That is, many applications consist of multiple operators interested in streaming data from multiple sources that must adapt to limited resources, potentially in application-specific ways. Examples include the
exchange and aggregation of sensor reports [19],
the distribution of media on a home network [30],
the performance of reconnaissance and deployment in a military setting [21], and so on.
A number of projects have explored how to
provide improved quality of service (QoS) for
streaming data in resource-limited conditions.
These systems place computations in the network, either within routers themselves (e.g.,
[5, 11, 34]) or at the application-level using
an overlay network (e.g., [1, 32]), and employ
system-determined, local adaptations, such as
priority-based video frame dropping. While such
adaptations impose little overhead, they can be
inefficient because they do not take into account
global information. Also, existing schemes typically do not consider user preferences in making
QoS decisions.
To study whether these problems can be
overcome, we are developing a system called
MediaNet that takes a comprehensive view of
streaming data delivery. MediaNet mainly differs from past approaches in two ways. First,
rather than making QoS adaptation system-determined, MediaNet allows users to specify
how it should adapt under overload conditions.
Each user contributes a list of alternative specifications, and associates a utility value with each
specification. To some users, color depth may
be more important than frame rate, while for
other users the preference may be the other way
around. The primary goal of MediaNet is to
maximize each user’s utility.
Second, in addition to using local scheduling,
MediaNet employs a global scheduling service to
divide tasks and flows among network components. This global point of view benefits both fairness and performance, because the
service can consider specifications from multiple
users while accounting for priority and overall
network efficiency; the challenge is to do this in
a scalable manner. Unlike other projects
that use global schedulers (e.g., [11, 15, 30]),
MediaNet’s global scheduling service is continuously looking for improvements based on monitoring feedback. MediaNet employs a completely
adaptive overlay network; it does not rely on resource reservations, and adapts to the presence
of loads not under its control.
Experimental measurements with our prototype implementation are promising. When using
a single global scheduler to implement the global
scheduling service, users achieve better performance and the network is more efficiently utilized than without any or with only local adaptations. On the other hand, our system does
exact a higher cost for its global adaptations,
in terms of scalability and implementation complexity. We consider our work a step toward exploring how to apply adaptations synergistically
from various levels in a scheduling hierarchy.
In this paper, we present the MediaNet architecture (Section 2) and our prototype implementation (Sections 3 and 4). We focus
on the challenges of implementing a globallyreconfigurable stream-processing system, and
show experimental evidence of its costs and benefits (Section 5). We conclude by comparing our approach to related work (Section 6) and proposing future research directions (Section 7).
2 MediaNet
MediaNet’s architecture defines a computational
network, consisting of compute nodes and network links. These elements are responsible for
receiving streaming data from various sources,
computing on that data, and delivering it to the
end-applications. As shown in Figure 1, compute nodes are highly heterogeneous, consisting of cameras, sensors, workstations, and compute servers; as such, they have different computational power, available memory, hardware support for video operations, etc. Network links between nodes may be either wired or wireless; as such, the underlying network topology may change at run time as components physically move around or new parts of the infrastructure are deployed.

[Figure 1: MediaNet architecture. Cameras, sensors, and wireless user devices, workstations, and compute servers are coordinated by the scheduling service.]

The user's interface to this computational network is via a global scheduling service; the architecture leaves the implementation of this service abstract. Users communicate their requirements to the service using specifications that consist of what we call continuous media networks (CMNs). A CMN is simply a directed acyclic graph (DAG) representing a computational dataflow. The job of the global scheduling service is to combine the CMNs of individual users into a single CMN, and then partition this CMN into subgraphs to be executed on the various compute nodes, based on the current state of the network. The act of combining the CMNs and partitioning them among nodes takes into account issues of fairness, performance, and user-specified adaptation. We elaborate on user specifications next, and follow with a discussion of scheduling.

2.1 Specifications
Each node in a CMN represents an operation
mapping zero or more input frames (stream-specific packets of data such as video frames, audio clips, etc.) to zero or more output frames.
Operations can be simple, e.g., data forwarding,
frame prioritizing, and frame dropping; or they
can be more complex, e.g., video frame cropping,
changing the resolution or color depth, “picturein-picture” effects, compression, encryption, and
audio volume control. We also need operations
to receive input from and send output to components external to the DAG, to perform I/O
with devices like video cameras and players. The
global scheduling service takes into account the
bandwidth, latency, and processing requirements
of operations.
Operations have a number of associated attributes. One important attribute is the interval
that indicates the minimum time between operations on subsequent frames (i.e., the inverse
of the maximum rate). For better performance,
operations can either process input frames immediately, or they can be forced to execute at
the specified intervals on queued data. In either
case, the interval effectively specifies a soft realtime constraint on the processing of frames; if
frames arrive faster than the specified interval,
or if the node cannot process them at that interval (perhaps because of downstream congestion),
then either backpressure must be applied to the
incoming flow or frames must be dropped. How
to handle these situations adaptively is considered in the next subsection.
A CMN node can be fixed at a certain location in the actual network (e.g., to indicate the
network location of a particular video source), or
left unspecified. Moreover, a node can be considered transitional, meaning that it is only inserted between mandatory nodes when the CMN
is scheduled across multiple compute nodes. Operations can maintain internal soft-state1 and
need not actually operate on packets. Requiring
soft-state is important for allowing operations to
relocate during a reconfiguration.
A user specification is a list of CMNs, where
each CMN’s relative preference is indicated by a
corresponding utility value, which is a real number between 0 and 1, where 1 means most desirable. An example is shown in Figure 2(a)2 ,
where the user specifies three CMNs, having decreasing utility. In each CMN, an MPEG video
stream originates at location pcS, the frames are
prioritized for intelligent dropping by the transitional (as indicated by the *) Prio operation,
and they are finally delivered to the user’s player
on pcD. In the second CMN, the frame rate is reduced by proactively dropping B frames, while in the third CMN the P frames are dropped as well.3 The MediaNet scheduler can decide which of these specifications to run, and where to run the operations with unspecified locations.

We do not expect users to author CMNs directly, but rather to provide higher-level preferences, such as the general adaptation methodology and the streams of interest. For example, a user might specify (in some declarative format) "I want MPEG stream X from location Y, and I want to adapt using frame dropping." A weaving tool, which is part of the global scheduling service, would index these preferences into a database that contains template CMNs and stream specifications. The template CMNs are basically like the CMNs we have shown, but without any stream-specific data, like the stream location, resource usage characteristics, etc., while the stream specifications would include this missing information. The weaver then merges the template and the stream specification of the requested movie together to create an almost-complete CMN; only the utility values have not been filled in. This idea is shown in Figure 2(b).

The weaver should set utility values to share resources fairly among users of potentially differing priority. Utility values have both relative and absolute effect. That is, a user's alternative specifications are prioritized relatively by the ordering of their utilities, while the particular magnitude of a utility value relates globally to the utility values of other users. For example, a higher-priority user might have the same specification as in Figure 2(a), but have utility values 1.0, 0.2, and 0.1, respectively. When resources become limited, this user would be forced to adapt only after a user having the utilities assigned in Figure 2(a). We expect to report on the implementation of this aspect of the MediaNet architecture in future work; in the meantime, our implementation assumes utilities are set fairly by hand.

1 Soft state is state not strictly required for correctness, e.g., caches.
2 Though not shown here, we encode user specifications, and consequently CMNs, as XML documents.
3 In MPEG streams, I frames are essentially JPEG images, while P frames and B frames exploit temporal locality, encoding "deltas" from adjacent frames. P frames rely on the most temporally recent P or I frame, and B frames rely on both the prior I or P frame and the next appearing I or P frame. Therefore, I frames are more important than P frames, while B frames are the least important.
[Figure 2: User specification APIs. (a) An example user CMN: three alternatives with utilities 1.0, 0.4, and 0.1, each carrying video from Vid at pcS through a transitional Prio* operation to the User at pcD; the 0.4 CMN adds a "drop B" operation and the 0.1 CMN a "drop PB" operation. (b) The MediaNet user API: high-level user preferences are woven with template CMNs and published high-level stream specifications from a stream database into a user CMN, which is subscribed to the GS.]
2.2 Scheduling

Once a user provides a specification, the global scheduling service schedules it on the network in conjunction with all existing user specifications. An example schedule generated by our prototype implementation is depicted in Figure 3. Here we have combined five user specifications of equal priority, each varying from that in Figure 2(a) only in the user and video locations, and scheduled them on a sample network. In Figure 3, the v1 and v2 nodes correspond to the video sources, the Pr nodes correspond to the frame priority-setting Prio operations, and the dB nodes correspond to the drop B operations. The empty circles are send and receive operations inserted by the global scheduling service to transport data between nodes. The available link bandwidths are 300 KB/s in general, with 200 KB/s on link L4, and 100 KB/s on links L3 and L7. For both videos, the utility 1.0 configuration requires roughly 145 KB/s, and the utility 0.4 configuration requires roughly 90 KB/s.

[Figure 3: A global configuration. (a) A sample network: hosts pc1–pc8 connected by links L1–L8. (b) A sample schedule on the above network; link bandwidths are L*=300, L3=100, L4=200, L7=100 KB/s.]

The quality of a schedule can be evaluated by the provided per-user utility and the network-wide utilization, in terms of CPU, memory, and bandwidth usage. Evaluating schedules in an absolute sense is difficult because the scheduling problem is almost certainly NP-complete, so generating an optimal schedule for comparison is not feasible in general. Therefore, we must assess schedules manually (if possible), or compare them with schedules from different algorithms.
For the example schedule, users 1, 3, and 5 are
receiving the best possible performance: users 1
and 3 have utility 1.0 since they require no intervening dB nodes, while user 5 must have its
B frames dropped, resulting in 0.4 utility, due
to the 100 KB/s limitation on its last hop link
L7. Users 2 and 4 also receive utility 0.4, having
their B frames dropped at pc2; this is the best
that can be expected given the requirements of
users 1, 3, and 5 and assuming that striping is
not supported.4 Given the user utilities it is supporting, the network utilization is good, as it is
not wasting bandwidth. For example, video 2’s
dB node is scheduled at pc2 rather pc3, which
is connected to the congested link; this avoids
wasting bandwidth across link L2.
4 It would be possible for users 2 and 4 to receive utility 1.0 rather than users 1 and 3, but this is an arbitrary decision given that all users are of equal priority.

While other architectures with similar global scheduling services either set up only the initial computational flow [30] or reschedule very rarely (such as when compute nodes or network links fail), MediaNet's global scheduling service operates on-line, performing continuous scheduling. As such, the service needs regular reports of current conditions, including changes to link and CPU/memory loads, and changes to topology. Because of delays in detecting and reporting changing information, changes to the schedule necessarily occur on the order of seconds. To mitigate these delays, user specifications can employ local adaptations, like intelligent packet dropping or upstream backpressure.
3 Global Scheduling
The MediaNet architecture leaves the implementation of the global scheduling service abstract,
admitting the possibility of a variety of implementations. The most straightforward implementation would be as a single global scheduler
(GS) which computes a CMN subgraph for each
node and sends it to the local scheduler (LS) running on that node. The LS implements the CMN
and periodically reports local resource usage to
the GS, which can periodically recompute and
redistribute its schedules, as necessary. This
approach has the benefit that since the scheduler can consider the entire network and all of
its users, it can likely achieve better fairness
and performance, but at the cost of scalability.
Conversely, a completely distributed approach
would improve scalability but likely degrade performance.
We believe that the best approach will be to
use a hierarchy of GSs, each responsible for subcomponents of the network and combined user
CMNs. The users will provide their specifications to a top-level GS, which will aggregate all
of the specifications and disseminate partitions
of them to its child schedulers. These will do
likewise, until ultimately a single CMN is provided to the LS for implementation on a compute node. Conversely, each LS will report its
available resources to its parent GS, which will
report aggregated resource amounts to its parent, and so on. Moreover, the hierarchy will be
best created on-the-fly, depending on the size of
the network, or its current state. For small networks (e.g. 5-15 nodes, with 5-10 users, as might
be expected in the motivating disaster scenario),
a single GS will likely be ideal, while for larger
networks, more hierarchy will reduce the systemwide effects of reconfiguration, reduce monitoring overhead, etc.
In this work, we describe a plausible algorithm
for the GS in this hierarchical arrangement. Our
current implementation uses only a single GS,
and so we omit the details of how the hierarchy
might be created on-the-fly, how the schedulers
might partition user specifications, etc. These
important details will be left to future work. In
this section, we describe our global scheduling algorithm and characterize its running time. The
next section describes our prototype implementation of this algorithm and the associated infrastructure.
3.1 Overview
The goal of any MediaNet global scheduling algorithm is to maximize the minimum utility across its users while utilizing network resources
efficiently. In the case of our particular algorithm, given U user specifications, each with one
or more CMNs having utility values between 0
and 1, the goal is to find a minimum utility sufficient for all users, and assign higher utility values
for as many users as possible, possibly preferring higher-priority users. Secondarily, given the
maximal aggregate utility it is able to achieve,
it will choose the schedule that least taxes the
network’s resources.
The algorithm works as follows. Each user
is assigned a utility, and the operations at that
utility are combined with the other users’ operations (at their respective utility) to create a
single, global CMN of user operations. It then
considers possible assignments of user operations
to network hosts, along with the necessary intervening send and receive operations. For each assignment, it calculates a score, based on how effectively the user specifications are met and how
efficiently the network is utilized. Each score is
either negative, in which case the network does
not have the resources to schedule the given operations, or the score is between 0 and 1, where 1
indicates the network is untaxed, and 0 indicates
that at least some part of its resources (whether
CPU, network bandwidth, etc.) is completely
utilized. The assignment chosen is the one with
(1) the highest minimum utility for all users, (2)
the highest aggregate utility (i.e. the sum of all
user utilities considered) above this minimum,
and (3) the maximal score for that aggregate utility.
Our algorithm is not guaranteed to generate
an optimal solution, but it is reasonably fast and
works well on the examples we have considered
(as we demonstrate in Section 5). In particular,
we show in Section 3.3.1 that the algorithm is
polynomial in the size of the network and the
number of users, which is faster than a brute-force enumeration of possible schedules, which
would be exponential in the size of the network.
This speed is important since the algorithm will
be run on-line, as network conditions and user
specifications change over time. We first discuss
our scoring algorithm, and then present how assignments are chosen.
3.2 Calculating the Score

To calculate the score, the scheduler maintains a model of the network and its available resources, as well as costs for user operations. The network is described as a graph (V, E), where V is the set of hosts h in the network, and E is the set of links l connecting the hosts; a single link can connect more than two hosts (i.e., they can be broadcast). We abuse notation and sometimes use V and E to refer to the cardinality of V and E (rather than |V| and |E|).

The model assigns a capacity to each host and link, where capacity(h) is in terms of instructions per second, and capacity(l) is in terms of bytes per second. In addition, network links are assigned a latency latency(l) (in seconds). Each operation o has associated cost functions cost(o) and fsize(o), which give the approximate number of instructions the operation takes and the average size of any output frames, respectively. In our implementation, cost functions are parameterized by frame inputs, architecture type, etc.

In addition, the scheduler uses information supplied in the user CMN during its calculation. For example, we use interval(o) and maxdelay(o) to denote the minimum interval and maximum acceptable delay, as indicated by the user CMN, where interval(o) is the minimum time between subsequent invocations of operation o, and maxdelay(o) is the maximum acceptable delay between the time a frame enters the CMN and the time operation o processes it. These and other predicates, as well as some notational conventions, are summarized in Table 1.

Using this information, the scheduler can approximate how long it takes for any operation to compute on any host, and how long it takes to propagate the output over any network link. In the future we intend to use more detailed monitoring so that the GS can update its model over time.

The total score is the minimum of three separate scores: the host score, the network score, and the operation score. These scores measure, respectively, the leftover ratio of computational capacity, the leftover network capacity, and the leftover ratio of acceptable delay; we make these notions precise below. The larger the scores, the less loaded the system is, and thus the more preferable the assignment.

Each score is calculated in the same way, using the following methodology. We first calculate a local score ls(x) for each of n entities x. For example, for the host score, the entities are the hosts h ∈ V, and the local scores are the computational loads L(h) on each host. We then determine the scaled leftover capacity slc(x) by subtracting the local score from the local capacity c(x) and dividing by that capacity:

    slc(x) = (c(x) − ls(x)) / c(x)
Metavariables
  o, p ∈ Operations            (some operation)
  O, U, S, R ∈ P(Operations)   (a set of operations)
  h ∈ V                        (a host)
  l ∈ E                        (a link)

Predicates on the network (V, E) and its operation model
  hosts(l)      The set of hosts connected via link l
  latency(l)    The time required to send one bit across link l
  capacity(l)   The bandwidth available on link l
  capacity(h)   The computational cycles per second available on host h
  cost(o)       The computational cost of operation o
  fsize(o)      The average size of frames output from operation o

Predicates derived from the global CMN
  inputs(o)     The set of operations inputting into operation o
  interval(o)   The minimum interval of operation o
  maxdelay(o)   The maximum acceptable delay for operation o

Predicates on the schedule being considered
  link(o, p)    Assuming o and p are connected send and receive operations, the link between them
  host(o)       The host on which operation o is scheduled
  ops(h)        The set of operations scheduled on host h
  rops(h)       The set of receive operations scheduled on host h (⊆ ops(h))

Table 1: Definitions for Global Scheduling Algorithm
When the load exceeds the capacity, slc(x) will
be negative; otherwise it will be between 0 and
1 (higher is better). Finally, we aggregate the
scaled leftover capacities into a single value. To
favor assignments that avoid overloading a single entity, we use the harmonic mean, which
strongly weights lower values, when all slc(x)
are positive. If any slc(x) is ≤ 0, we use
the smallest individual value:
    m = 1 / ( Σ_{all x} 1 / (n · slc(x)) )    if slc(x) > 0 for all x
    m = min_{all x} slc(x)                    otherwise
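As a concrete illustration, this aggregation rule fits in a few lines of C. The following is a minimal sketch of our own; the names and signatures are not taken from the MediaNet sources:

    #include <stddef.h>

    /* Aggregate the scaled leftover capacities of n entities, given
     * their local scores ls[] and capacities c[].  If every slc(x)
     * is positive, return the harmonic mean, which strongly weights
     * low values; otherwise return the smallest individual value,
     * signaling that some entity is overloaded. */
    double aggregate_score(const double *ls, const double *c, size_t n)
    {
        double sum = 0.0, min_slc = 1.0;
        int all_positive = 1;
        for (size_t i = 0; i < n; i++) {
            double slc = (c[i] - ls[i]) / c[i];  /* scaled leftover capacity */
            if (slc < min_slc) min_slc = slc;
            if (slc <= 0.0) all_positive = 0;
            else sum += 1.0 / ((double)n * slc);
        }
        return all_positive ? 1.0 / sum : min_slc;
    }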
Using this technique, we calculate the three
scores as follows:
• Host Score. For the host score, the entities
x are the hosts h, the local score is the computational load on the host L(h), and the
capacity is the host’s total computational
capacity capacity(h). The computational
load L(h) is, for every operation o scheduled on host h, the cost of the operation o
divided by its specified minimum interval:
    L(h) = Σ_{o ∈ ops(h)} cost(o) / interval(o)
• Network Score. For the network score, the
entities are the network links l, the local
score is the required bandwidth on the link
B(l), and the capacity is the link’s total
capacity capacity(l). The required bandwidth B(l) is calculated by determining, for
all hosts h connected by link l, the receive
operations whose data arrives over link l,
and summing their relevant output frame
sizes (which match their input frame sizes)
divided by their intervals:
    B(l) = Σ_{h ∈ hosts(l)} Σ_{o ∈ rops(h)} fsize(o) / interval(o)
• Operation Score. Finally, for the operation
score, the entities are the operations o, the
local score is the operational delay D(o),
and the capacity is the user’s maximum acceptable delay maxdelay(o). The operational delay D(o) is intuitively the maximum time the operation must wait from the
time the CMN first receives a frame to the
time the operation in question can operate
on it. To calculate this, we first determine
the maximum delay on each host md(h) as
the sum of the costs of all operations that
run on that host:
    md(h) = Σ_{o ∈ ops(h)} cost(o)
The idea is that md(h) is the maximum time
an operation could be delayed due to other
operations running on the same host; we
assume no operation o will run twice once
an operation p becomes runnable. We then
determine the maximum delay nd(o, p) on
each pair of connected operations in a similar manner:
    nd(o, p) = latency(link(o, p))   if link(o, p) is defined
    nd(o, p) = 0                     otherwise
That is, nd(o, p) is the delay of the network connecting operations p and o, which
is latency(link(o, p)) when link(o, p) is defined (i.e. when o and p are paired send and
receive operations), and 0 otherwise. Finally, we calculate the delay D(o) as the sum
of the maximum delay on the local host and of the delays of o's upstream neighbors and the intervening network links (if any):

    D(o) = md(host(o)) + Σ_{p ∈ inputs(o)} (D(p) + nd(o, p))

When operations take inputs from multiple operations, we also add the minimum interval, as the operation may have to wait that long to be scheduled. When aggregating D(o), we only consider operations o for which the user has specified a maximum acceptable delay maxdelay(o).

Note that there is a tension between latency and bandwidth: a path with minimal latency may not be the most bandwidth-plentiful path. Similarly, there are other tensions, say between minimizing CPU load and minimizing network usage. Our approach to specification allows these issues to be resolved, as the scheduler tries to optimize the worst utility achieved by all its users.

3.2.1 Cost of Scoring

The cost of scoring an assignment is a function of the network and the user specification assignment. The user assignment consists of the mapping of O operations to network nodes, where O is broken down into S send operations, R receive operations, and U (other) user operations. Furthermore, we define

    degin(O) = Σ_{o ∈ O} |inputs(o)|

This is the total number of inputs for all operations in O, which corresponds to the number of edges in the CMN graph defined by the O operations. If each operation has only a single input, then degin(O) = |O|.

Given these definitions, the times to compute the host score, network score, and operation score are as follows:

    host score        O(V + O)
    network score     O(E + R)
    operation score   O(O + degin(O))

For the host score, the V component comes from taking the mean over all hosts h in the network, and the O component is due to calculating the local scores L(h). For the network score, the E component comes from taking the mean over all network links l, and the R component is due to calculating the required bandwidth B(l) on each link by looking at all receive operations. Finally, for the operation score, the O component comes from taking the mean over all operations, as well as calculating the maximum delay md(h) for each host h. The degin(O) component comes from summing the upstream delays for each operation.

This leads to a total cost of O(V + E + O + degin(O)). When degin(O) is roughly equal to O (e.g., for multicast-style applications), we have O(V + E + O).
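The delay calculation above is naturally expressed as a memoized recursion over the CMN DAG. The following C sketch is our own illustration (the Op structure and its fields are hypothetical, and the extra interval term for multiple-input operations is omitted):

    /* Compute D(o) = md(host(o)) + sum over inputs p of (D(p) + nd(o,p)),
     * memoizing results since the CMN is a DAG. */
    struct Op {
        struct Op **inputs;  /* upstream operations, inputs(o) */
        int n_inputs;
        double md_host;      /* md(host(o)): summed costs of co-located ops */
        double *nd_in;       /* nd(o,p): link latency per input, or 0 */
        double delay;        /* memoized D(o); negative if not yet computed */
    };

    double op_delay(struct Op *o)
    {
        if (o->delay >= 0.0)
            return o->delay;                    /* already computed */
        double d = o->md_host;                  /* local host's delay */
        for (int i = 0; i < o->n_inputs; i++)   /* upstream delays + links */
            d += op_delay(o->inputs[i]) + o->nd_in[i];
        return o->delay = d;
    }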
3.3 Choosing an Assignment

To pick an assignment that maximizes user utility, we must consider user specifications at various levels of utility, pick possible assignments, score them, and choose the best one. Even when ignoring multiple utility levels and the need to place intervening send and receive operations, it is easy to see that for a particular network and CMN, there are U^V possible assignments. This means a brute-force enumeration of assignments is infeasible. Therefore, a reasonable algorithm
requires a way to prune the assignment space while still arriving at a good schedule.

We present an algorithm here that is roughly O(V²U²(U²V + E)).5 The algorithm works well in the scenarios we have considered, but could certainly be improved; e.g., ideas from related work could be applied [12, 30]. The basic idea that recurs throughout the algorithm is that rather than consider an entire search space (such as assignments of user operations to nodes, or the combinations of users' CMNs at different utility levels), we break a space into more coarse-grained pieces and make locally beneficial decisions.

We create an assignment of operations to hosts in two nested phases. In the outermost phase, the scheduler does a binary search on the utility space, trying to find the best utility assignment. When evaluating utility u, the scheduler picks for each user the CMN that has utility u or the closest one below u. It then merges the CMNs and executes the inner phase, described next, to find the best-scoring assignment. If the score of the assignment is nonnegative, the scheduler tries a higher utility value; otherwise a lower one. This process continues until the remaining utility space becomes smaller than some pre-chosen ǫ. The utility chosen is the lower bound of this space. If 0, the algorithm could not find an assignment that works.

As an optimization, after we have arrived at a lower bound, we try to improve the utility of some (but not all) users, by increasing the utility of each individual user one at a time. When at some point this fails because not enough resources are available, the algorithm finishes. The order in which users are considered in this phase depends on their priority.
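The outer phase's binary search might be rendered as follows. This is our own sketch, with inner_phase() standing in for the inner phase described next (it returns the best score found for the merged CMN at utility u, negative if unschedulable):

    extern double inner_phase(double u);  /* hypothetical: merge CMNs at
                                             utility u, assign, and score */

    double choose_utility(double epsilon)
    {
        double lo = 0.0, hi = 1.0;
        while (hi - lo >= epsilon) {
            double u = (lo + hi) / 2.0;
            if (inner_phase(u) >= 0.0)
                lo = u;        /* schedulable: try a higher utility */
            else
                hi = u;        /* not schedulable: back off */
        }
        return lo;             /* 0 means no workable assignment found */
    }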
The inner phase tries to find a reasonable assignment for a given global CMN (that is, the CMN resulting from merging the user CMNs of various utilities). The first thing it does is to discover the connected components of the global CMN. These are essentially the multicast trees of user operations that originate from a particular video source. Then, for each of the c connected components, the inner phase does the following:

1. It calculates the all-pairs "shortest" paths of the network, using maximum bottleneck bandwidth (also referred to as the path bandwidth)6 as the metric of optimality; this takes time O(V³). We could also use the maximum-weight spanning tree, rather than all-pairs shortest paths, to speed up the computation to O(E + V log V) when using Prim's algorithm with Fibonacci heaps.

2. It assigns all operations to default locations (either their assigned location, or on one particular node). Next, it inserts send and receive operations in the CMN to connect operations adjacent in the CMN but assigned to different hosts. Using the most bandwidth-plentiful links, determined by the shortest-path computation just completed, the GS creates a tree of paths originating from each operation to its immediate downstream neighbors in the CMN. At each intermediate (physical) host in this tree, the scheduler inserts receive and send operations to forward the data. It can then calculate the score.

5 Recall that R ≤ O, so it gets folded into the O portion of the equation.
6 The maximum bottleneck bandwidth of a path is the bandwidth of the link along the path with the smallest available bandwidth.
      Shortest paths for all CCs        O(cV³) = O(UV³)                     since c ≤ U
    + Each possible assignment          UV
      × Inserting send/receive ops      O(degin(U) · V)
      × Calculating the score           O(V + E + U + S + R + degin(O))
                                        = O(V + E + U + S + R + degin(U))   since degin(S ∪ R) = |S ∪ R|
                                        = O(V + E + U + UV + degin(U))      since (|S| + |R|) ≤ |U| · |V|
                                        = O(UV + E + degin(U))
    + Updating graph for all CCs        O(c(V + E)) = O(U(V + E))           since c ≤ U

      O(U(V³ + V + E)) + O(UV · degin(U) · V · (UV + E + degin(U)))
    = O(UV³ + UV + UE + V²U · degin(U) · (UV + E + degin(U)))
    = O(V²U · degin(U) · (UV + E + degin(U)))
    = O(V²U² · (UV + E + U))                                                assuming degin(U) = U
    = O(V²U² · (UV + E))

Table 2: Breakdown of Global Scheduling Running Time, Inner Phase
The cost of inserting the send and receive
operations is roughly O(degin (U )·V ). This
cost arises from the fact that for each user
operation input o, if its predecessor is located on a different node, we determine the
shortest path between the two (using information already calculated), and merge this
path with the tree already rooted at o. The
cost of merging the path is essentially the
length of the path itself when using a predecessor matrix implementation. No path will
ever be greater than the number of nodes in
the network, so this length is bounded by
V.
3. At this point, it tries to greedily improve the score by relocating each (movable) user operation to each possible host, remembering for each operation the location that improves the score most. Send and receive operations must be inserted for every change. This process continues until no more operations can be moved. If no operations are fixed, U · V assignments of operations to nodes will be considered; a sketch of this greedy loop follows the list.

4. Finally, the connected component under consideration is fixed at its best scheduling, and the loads on the links in the GS network model are updated; this takes time O(V + E). The scheduler then moves on to the next connected component.

Note that the order in which the c connected components are considered will affect the resulting schedule, favoring the components considered first. This order should be consistent, once chosen, to avoid frequent reconfigurations. Ordering could be determined randomly, by per-user priority, or by other metrics.
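The following C sketch of step 3 is our own illustration; movable() and score_with() are hypothetical helpers, the latter inserting the necessary send and receive operations before scoring a trial placement:

    struct Host;
    struct GOp { struct Host *host; };    /* a user operation's placement */

    extern int movable(struct GOp *o);
    extern double score_with(struct GOp *o, struct Host *h);

    void greedy_relocate(struct GOp **ops, int n_ops,
                         struct Host **hosts, int n_hosts)
    {
        for (int improved = 1; improved; ) {
            improved = 0;
            for (int i = 0; i < n_ops; i++) {
                if (!movable(ops[i]))
                    continue;
                struct Host *best = ops[i]->host;
                double best_score = score_with(ops[i], best);
                for (int j = 0; j < n_hosts; j++) {   /* try every host */
                    double s = score_with(ops[i], hosts[j]);
                    if (s > best_score) { best_score = s; best = hosts[j]; }
                }
                if (best != ops[i]->host) {           /* keep the best move */
                    ops[i]->host = best;
                    improved = 1;
                }
            }
        }
    }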
3.3.1 Cost of choosing an assignment

The running time of the inner phase of the algorithm is O(V²U · degin(U) · (UV + E + degin(U))), as broken down in Table 2. In the case of a multicast-style application, degin(U) = U, yielding a running time of O(V²U² · (UV + E)). This phase will be run for each combination of user utilities considered by the outer phase.

While performing binary search, the outer phase will consider log(1/ǫ) possible schedules. During the optimization phase it will consider n · U additional schedules, where U is the total number of users and n is the maximum number of distinct utilities in each user specification. For each schedule, the U user operations at the utilities being considered will have to be combined into the global CMN. This yields a total running time of O(U + V²U · degin(U) · (UV + E + degin(U)) · (log(1/ǫ) + Un)).

We are most interested in how the algorithm scales as we increase the size of the network (V, E) and the number of users. With this in mind, we can simplify the characterization of the running time by relating other parameters to these variables. In particular, we can assume that U, which is the sum total of all user operations for a particular combination of user utilities, is equal to u · x for some constant x, where u is the number of individual users. This basically assumes that all user CMNs have fewer than x nodes. Moreover, we can hold n and log(1/ǫ) constant. This results in a simplification of the running time to O(V²U² · (UV · degin(U) + E + degin(U)²)), broken down in Table 3. When holding degin(U) = U for multicast-style applications, we arrive at O(V²U² · (U²V + E)). This basically illustrates that the dominating cost of the algorithm is the number of users being considered, and secondarily the number of nodes in the network. Being polynomial in V, E, and U makes the algorithm scale better than the exponential, brute-force approach.

      O(V²U · degin(U) · (UV + E + degin(U)) · (log(1/ǫ) + Un))
    = O((V³U² · degin(U) + V²UE + V²U · degin(U)²) · (log(1/ǫ) + Un))
    = O((V³U² · degin(U) + V²UE + V²U · degin(U)²) · (Un))         log(1/ǫ) constant
    = O(V³U³ · degin(U) + V²U²E + V²U² · degin(U)²)                n constant
    = O(V²U² · (UV · degin(U) + E + degin(U)²))
    = O(V²U² · (U²V + E + U²))                                     assuming degin(U) = U
    = O(V²U² · (U²V + E))

Table 3: Simplifying the Running Time Characterization of the Algorithm

3.4 Discussion

Here we discuss some aspects of the global scheduling algorithm.

Utility and Resource Usage. A key tenet of the algorithm is that for a given user, a lower utility CMN will require fewer resources to schedule. Furthermore, as described in Section 2.1, the absolute magnitude of user utility values needs to be set based on a user's priority and the total resources consumed. At the moment, we assume that users only have access to adaptation templates that can, in aggregate, meet these requirements.

In future work, we plan to develop a generic notion of resource usage that combines the CPU and bandwidth requirements7 of a CMN as if it were scheduled on a virtual topology, which should in some way approximate the actual one.

Relating utilities to resource usage depends on a well-defined generic notion of resource usage, which depends on the available resources. For example, in a CPU-plentiful environment we would expect more weight on the bandwidth. If, for the next lower user utility, the bandwidth requirement drops by 100 KB/s while required CPU time goes up by 1 ms, then the overall result would be a significant decrease in resource use. On the other hand, for CPU-poor environments, this situation may actually signal a resource increase. Our algorithm can weight single resource scores (i.e., host score, network score, and/or operation score) by taking each to a particular power. (For a negative score s and power x, use −(−s)^x.) After doing so, each score will still be negative for exceeding a capacity, 0 for reaching it exactly, and 1 for being completely unloaded.

7 Other resource requirements are likely to be useful, particularly memory requirements. To keep things simple, we expect to combine CPU and memory use as a single metric. On the other hand, our current algorithm could be extended to include memory counters without much trouble.

Implementation. As part of our prototype, we have implemented the GS in C, consisting of about 10,000 lines of code. We measured the performance of the scheduler on-line for the experiments presented in Section 5.3, in which we had an eight-node network with five users. We measured running times of between 1 ms and 90 ms, with the longer running times occurring when the network was more loaded, and thus more possibilities were considered. Much of the running time is due to "constant factors" in our implementation that we have yet to tune; for example, we use an excessive amount of allocation when inserting send and receive operations for each configuration.

    struct Operation {
      string_t id;
      double interval;
      fn_t<streambuff_t,int> ? inports;
      fn_t<streambuff_t,int> ? outports;
      fn_t<fn_t<streambuff_t,int> ?,int> schedule_f;
    };

Figure 4: A user operation in Cyclone

4 Local Scheduling

The LS is responsible for implementing the CMN provided by its parent GS, reporting back resource consumption information (monitoring), and safely reconfiguring to use a new CMN when directed to do so; we describe all of these tasks in this section.

In our prototype, the LSs are written in the type-safe systems language Cyclone [22], which is based on C, comprising roughly 13,000 lines of code. Cyclone is simply C at its core, but with both restrictions to ensure type safety and enhancements for greater flexibility (e.g., exceptions, tagged unions, garbage collection, a variety of safe pointer types, etc.).

4.1 Implementing a CMN
When a compute node receives a reconfiguration message, it translates the CMN into a graph
of data structures implementing its operations.
The Cyclone implementation of an operation is
shown in Figure 4. Each operation consists of
a name and a compute interval, which map di-
rectly from the user specification. It also contains 0 or more input ports and 0 or more output ports, where the output ports refer to the
inports of the downstream operations. We
use an upcall model where each input port is a
closure that expects a frame, along with some
operation-specific state, encoded as the environment of the closure.8 An inports closure is
invoked by the upstream operation when it invokes its corresponding outports closure. In the
figure, the syntax fn t<streambuff t,int> indicates the type of a closure specialized to inputting streambuff t structures (our definition
of frames), and returning integers. The ? syntax
indicates a dynamically-sized (i.e., malloc’ed)
array.
In addition to being invoked by receiving input data, an operation can be scheduled to run at
the requested interval, by calling its schedule f
function. This function is also a closure, having some hidden state relating to the operation,
and additionally the list of the operation’s output ports, so that it can send any generated
data downstream. For example, we implement
monitor operations that wake up at specific intervals and generate monitoring messages; these
messages are sent out by invoking the provided
outports.
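In plain C terms, a Cyclone closure corresponds roughly to a function pointer paired with an environment. The following sketch of a B-frame-dropping operation in that style is our own illustration, not the prototype's Cyclone code:

    /* A port is a closure: a function pointer plus its environment. */
    struct streambuff { char frame_type; /* 'I', 'P', or 'B' */
                        const void *data; unsigned len; };

    struct port {
        int (*fn)(void *env, struct streambuff *b);
        void *env;
    };

    /* An inports closure for "drop B": forward all but B frames to the
     * downstream inports (our single outports entry, kept in env). */
    int drop_b_input(void *env, struct streambuff *b)
    {
        struct port *out = env;
        if (b->frame_type == 'B')
            return 0;                    /* drop the frame */
        return out->fn(out->env, b);     /* invoke downstream inports */
    }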
In essence, the local scheduling algorithm is as
follows: after creating components that implement the operations, it sorts them topologically
based on their data-flow. Next, it uses deadlines to ensure that operations are run as soon
as possible after the prescribed interval elapses
(if one is given), following the topological ordering. When operations are data-driven, the LS simply runs the operations when frames arrive.

8 Closures are not supported directly in Cyclone, but via abstractions for existential types, which can be used to encode them [27].
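A minimal C rendering of this local scheduling loop might look like the following (our own sketch; the types and helpers are hypothetical):

    struct LOp { double interval, deadline; };      /* one operation */

    extern double current_time(void);
    extern void run_scheduled(struct LOp *o);       /* invokes schedule_f */
    extern void wait_for_next_event(struct LOp **topo, int n);

    void ls_loop(struct LOp **topo, int n)          /* topological order */
    {
        for (;;) {
            double now = current_time();
            for (int i = 0; i < n; i++) {
                struct LOp *o = topo[i];
                if (o->interval > 0.0 && now >= o->deadline) {
                    run_scheduled(o);               /* run at its interval */
                    o->deadline = now + o->interval;
                }
            }
            /* Sleep until the earliest deadline or until a frame arrives;
             * data-driven operations run directly from their inports. */
            wait_for_next_event(topo, n);
        }
    }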
4.2 Transport
To allow legacy applications to use MediaNet
seamlessly, MediaNet’s transport protocol, implemented by its inserted send and receive operations, needs to meet the API expected by the
application. For example, if the application uses
TCP to receive its data, then MediaNet must
not only connect to that application via TCP
on its last hop, but also needs to ensure that
data is delivered to the receiver reliably, in order, and without duplication by that point. A
UDP-based application would impose fewer requirements. MediaNet should support a variety
of transport protocols between send and receive
operations to maximize the performance of the
system while still meeting the minimal requirements of the application. For expedience, our
prototype implementation uses TCP exclusively;
we plan to support other transport protocols,
such as UDP, RTP [38] over UDP, and possibly
others.
One benefit of using TCP within MediaNet
is that it readily communicates bandwidth limitations, mitigating the need for external available bandwidth detection facilities. In particular, when the TCP send buffer fills up, the
application receives an EWOULDBLOCK error and
therefore queues its frames until more bandwidth is available. Once the application queue
is filled, the consequent action depends on the
application semantics. For streams that can tolerate dropped frames, like video streams, MediaNet will start dropping frames based on priority. User-supplied operations are used to set the
priority (see Figure 2), supporting local adaptation. If a stream cannot tolerate lost data, then
MediaNet will exert backpressure on the sending application, effectively throttling its rate (until a reconfiguration can take place). Choosing a reasonable queue size is important for reconfigurations, and we return to it below.
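The send-side behavior just described might look roughly like this in C (our own sketch; the frame queue type and its helpers are hypothetical):

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    struct frame { const void *data; size_t len; int priority; };
    struct frame_queue;
    extern int queue_full(struct frame_queue *q);
    extern int queue_push(struct frame_queue *q, struct frame *f);
    extern int queue_push_drop_lowest(struct frame_queue *q, struct frame *f);

    /* Try to hand a frame to TCP without blocking.  On EWOULDBLOCK the
     * frame is queued; once the queue fills, loss-tolerant streams drop
     * the lowest-priority frame, while others signal backpressure. */
    int send_frame(int sock, struct frame *f, struct frame_queue *q,
                   int loss_tolerant)
    {
        ssize_t n = send(sock, f->data, f->len, MSG_DONTWAIT);
        if (n >= 0)
            return 0;                     /* delivered to the TCP buffer */
        if (errno == EWOULDBLOCK || errno == EAGAIN) {
            if (!queue_full(q))
                return queue_push(q, f);
            if (loss_tolerant)
                return queue_push_drop_lowest(q, f);
            return -1;   /* backpressure: caller must throttle upstream */
        }
        return -1;                        /* genuine socket error */
    }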
4.3 Monitoring
For adaptive reconfiguration to be profitable, the
GS must be reasonably well-informed of changes
to the network, particularly those to its topology
and to the loads on nodes and links. We have
been focusing on bandwidth limitations in our
experiments, and therefore on available bandwidth reporting; we have yet to implement a
CPU monitor.
Available bandwidth detection is an ongoing
area of research with no clear, general solutions as yet [13]. In particular, various techniques trade off accuracy, overhead, and measurement time. For example, packet-pair-based
estimates [13, 4, 17] can quickly predict available
bandwidth with extremely low overhead (just a
few packets), but only reliably so for single hop
links [20, 4], including wireless links [4]. On the
other hand, Jain and Dovrolis’ approach using
one-way delays [20] works for multi-hop paths
with reasonably low overhead (on the order of a
few hundred packets), but the estimation time
is typically between 10–30s and is often within a
couple Mbps of the “actual” value. These limits
to accuracy and speed constrain the timescales
and magnitude of the changes made by the global
scheduling service.
In our implementation, each LS notes how
much data is sent and dropped (at the application level) for a given link, and sends this
information periodically to the GS. These reports provide a low-overhead, highly relevant
way of assessing the available bandwidth. It is
low-overhead because the information is piggybacked on the actual stream being sent, and it
is most relevant because it directly reports the
value of interest to the global scheduler: how
much data can a node send across a particular
link using the appropriate transport protocol?
We observe that each link report will indicate
that either the link can support the bandwidth
imposed on it, or it cannot. That is, either all the
data intended to be sent was sent, or else frames
were dropped or backpressure was applied. For
the latter case, the GS knows that the link is at
peak capacity, so it sets its estimate to the reported sent bandwidth (for broadcast links, reports must be aggregated). In the former case,
it knows the capacity is at least the reported
amount.
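Concretely, the GS's handling of a link report might be sketched as follows (our own illustration; the report and estimate structures are hypothetical):

    struct LinkEstimate {
        double est_bw;           /* current available-bandwidth estimate */
        double last_peak_time;   /* time of the last peak-capacity report */
    };

    struct link_report {
        double sent_bw;          /* bandwidth actually delivered */
        int at_peak;             /* frames dropped or backpressure applied */
    };

    void fold_in_report(struct LinkEstimate *le,
                        const struct link_report *r, double now)
    {
        if (r->at_peak) {
            le->est_bw = r->sent_bw;      /* the link is at capacity */
            le->last_peak_time = now;
        } else if (r->sent_bw > le->est_bw) {
            le->est_bw = r->sent_bw;      /* capacity is at least this much */
        }
    }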
Unfortunately, this approach only provides an
upper bound on the available bandwidth when
a link is overloaded; this is the main drawback
of the technique. To compensate, if the GS
does not receive a link peak capacity report for
some time, it assumes that additional bandwidth
might be available and so begins “creeping” its
bandwidth estimate for that link at regular intervals in the spirit of TCP’s additive increase. The
net effect is that eventually a reconfiguration will
take place that uses the assumed-available bandwidth; if the estimation is incorrect then the new
configuration will fail and another will take place
to compensate. We currently increase the estimate by a constant w = 3% each second beginning t = 5 seconds after a peak capacity report.
The choice of values for w and t essentially determines how rapidly the GS tries to find unused
bandwidth.
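In terms of the LinkEstimate sketch above, the creeping increase amounts to something like the following (again our own illustration):

    #define CREEP_RATE   0.03   /* w: grow the estimate by 3% per second */
    #define CREEP_DELAY  5.0    /* t: quiet seconds after a peak report */

    /* Called once per second for each link.  If no peak-capacity report
     * has arrived recently, optimistically assume more bandwidth may be
     * available, in the spirit of TCP's additive increase. */
    void creep_estimate(struct LinkEstimate *le, double now)
    {
        if (now - le->last_peak_time >= CREEP_DELAY)
            le->est_bw *= 1.0 + CREEP_RATE;
    }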
There is a tension between monitoring and
accuracy: the more frequently that monitoring
information is sent, the more accurate the GS
network model will be, but the more overhead
there will be on network links. To reduce traffic
but maintain accuracy, each LS sends non-peak
reports only when the reported bandwidth increases by ∆ = 10%. Peak reports are sent every
r = 1 seconds. We have found that this approach
(rather than sending all reports every r = 1 seconds) reduces monitoring traffic by roughly 75%
in our experiments.
As future work, we plan to incorporate other
forms of feedback and estimation into our link
estimates to improve the accuracy of the GS’s
network view. For example, Jain and Dovrolis’
technique of finding an increasing trend in the
latencies of sent packets could be incorporated
into our measurements to determine an upper
bound before a link becomes overloaded. In general, we wish to associate “confidence” measures
with link bandwidth estimates, so that mostly
estimated links are not weighted as highly as
those with recent measurements during scheduling. We also imagine that “link profiles” could
be used to estimate unmeasured links based on
past usage. Finally, we could consider testing
unmeasured links after a new schedule is determined but before it is used to configure the network.
4.4 Reconfiguration
When a global reconfiguration is initiated, it
should take effect quickly and safely, without
negatively impacting perceived user quality. We
do this by defining a protocol that allows old and
new configurations to run in parallel until the old
configuration can be removed. We use a number
of mechanisms to ensure the old configuration is
removed as quickly as possible, while preserving
the application’s expected stream semantics.
The protocol works as follows. Whenever the
GS calculates a new schedule, it sends a new
CMN to each LS. The LS schedules this CMN
immediately upon receipt, in parallel with its
existing configuration. So that the two configurations do not interfere, the GS assigns different TCP port numbers to its inserted send and
receive operations. As such, these operations
will establish connections, but connections to the
video source and receiver outside of MediaNet
(which are still using the same ports) will be delayed until they are closed by the old configuration. Next, all video source applications are
notified that a reconfiguration has taken place
(we do this using out-of-band TCP data from
the downstream MediaNet node). They each
close their connections to MediaNet and reconnect, this time connecting to the new configuration. In the meantime, the old configuration will
continue to forward any data it has toward the
destination; when a LS’s old queues are flushed
the old configuration is removed. When the last
bit of old data is sent to the video receiver, the
new configuration will be able to connect to it
and forward its data.
Using this protocol, we minimize the time during which the video source and receiver are disconnected from MediaNet; for our experiments
this time averages 1 ms, which is far less than a
typical video inter-frame interval of 33 ms. Even
so, to reduce the total switching time we must
reduce the time that the old configuration stays
connected to the receiving application; during
this time, frames from the new configuration
queue while waiting to connect to the receiver.
In the case that frames can be dropped, we reduce the old configuration’s lifetime by quickly
clearing its application queues via priority-based
frame dropping, as described next. Otherwise,
the queues must clear naturally; this suggests
that when frames cannot be dropped, application queue lengths should be relatively short, so
as to permit quicker reconfigurations.
We initiate frame dropping in two ways. First,
we tie together the queue lengths of connections
using the same link, so that higher priority new
frames can force the dropping of lower-priority
old ones sharing the same link. Second, for those
cases in which the old and new paths are not
shared, we set a drop timer (currently going off
every 0.5 s) that proactively drops increasingly
higher priority frames from the old queues. Using these methods, our average reconfiguration
time in the larger experiment in Section 5.3 is
0.3 s, with a maximum time of about 1.1 s; these
times are easily within the buffering window of
most video players.
For stability, global reconfigurations are initiated at most once per reconfiguration window
w, currently with w = 5 seconds. The larger
this window, the less adaptive, but the more stable the system. We are currently experimenting
with different kinds of windows for limiting reconfigurations, based on the quality of the network model, rather than on a fixed timeout.
5 Experiments

In this section, we present experiments that measure MediaNet while delivering an MPEG video stream under various topologies and load conditions. We show that MediaNet consistently delivers good performance and efficient network utilization by effectively utilizing redundant paths, by exploiting commonality in user specifications, and by carefully locating CMN operations.

5.1 Configuration

The experiments were performed on Emulab [14, 41], configured to use the topology shown in Figure 5. Each node is an 850 MHz Intel Pentium III running RedHat Linux 7.1, with 512 MB RAM and 5 Intel EtherExpress Pro 10/100 Mbps Ethernet cards. Emulab supports "dynamic events scheduling" to inject traffic-shaping events on-the-fly, implemented by Dummynet nodes [36]; we use this to increase and decrease the available bandwidth on various links during our experiments. In all experiments we ran a LS on every node and the GS on pc3.

[Figure 5: Experimental topology on Emulab: hosts pc1–pc8 connected by links L1–L8.]

For the source video, we loop an MPEG video stream with the following frame distribution:

    Frame Type    Size (B)    Frequency (Hz)
    I             13500       2
    P             7625        8
    B             2850        20

This video requires about 145 KB/s to send at its full rate, about 88 KB/s to send only I and P frames, and about 27 KB/s to send only I frames.

5.2 Exploiting Global Adaptation

To demonstrate the benefit of local adaptation under network load, and then the added benefit of global adaptation, we compare four different configurations:
• The “no adaptation” configuration consists
of streaming the data at the desired play
rate, oblivious to network conditions. We
implement this with the MediaNet LSs only.
• The “priority-based frame dropping” configuration consists of tagging P, B, and I
frames with successively higher priority, and
during overload the lowest priority frames
are dropped. This approximates some past
approaches to intelligent frame dropping [5, 23].
• The “proactive frame dropping” configuration also consists of intelligently dropping video frames during overload. In this case, the LS observes when frames start getting dropped for a particular link, and then adapts by proactively dropping all B frames, and later all P frames as well. When dropping frames, the LS will occasionally attempt to improve the configuration; i.e., if it is dropping both P and B frames, it will try dropping only B frames (see the sketch after this list). This configuration approximates past approaches to intelligent, in-network frame dropping, as well as end-to-end layered approaches [26] (where each frame type essentially defines a layer). In particular, the path of the data never changes, only what data is sent along that path. For this experiment, we implement this approach by using the GS but preventing it from choosing alternate paths.
• Finally, the “global adaptation” configuration uses MediaNet's GS with the user specification depicted in Figure 2.
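The proactive adaptation above can be sketched roughly as follows; this is our own Python illustration of the idea, not MediaNet's implementation, and the probe interval is an assumed parameter:

    # Hypothetical sketch of proactive frame dropping at a local scheduler (LS).
    # Drop levels: 0 = send all frames, 1 = drop B frames, 2 = drop B and P frames.
    DROPPED_TYPES = [set(), {"B"}, {"B", "P"}]

    class ProactiveDropper:
        def __init__(self, probe_interval=50):
            self.level = 0
            self.frames_seen = 0
            self.probe_interval = probe_interval

        def on_link_drop(self):
            """The link dropped a frame: proactively drop the next layer of frames."""
            self.level = min(self.level + 1, len(DROPPED_TYPES) - 1)

        def should_send(self, frame_type):
            """Decide whether to forward a frame; occasionally probe a better level."""
            self.frames_seen += 1
            if self.level > 0 and self.frames_seen % self.probe_interval == 0:
                self.level -= 1  # optimistically try sending more frames again
            return frame_type not in DROPPED_TYPES[self.level]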
For each configuration, we ran an experiment that uses the diamond portion (pc3 to pc6) of our topology (Figure 5), with a single video sender on pc3 and a receiver on pc4. The experiment measures the video player's performance, in terms of the received bandwidth and the decodable frames, as we lower both link L3's and link L4's available bandwidth over time.
Each of the graphs shown in this section has the same format. Each light gray circle in the figure is a correctly-decoded frame, while each black × is an incorrectly decoded one. The figure plots time versus bandwidth, so the x-location is the time the frame is received, and the y-location is the bandwidth seen by the player at that time (aggregated over the previous second). The available bandwidth, as set by Dummynet, is shown as dashed and/or solid lines. Dropped frames are not shown.
Figure 6(a) shows the no adaptation case. At
the start, the route to the receiver is fixed along
L3, and as the available bandwidth on the link
drops the video quality degrades. The application cannot decode the majority of the received
frames because temporally important frames (I
and P frames) are being dropped. During playback, each undecodable frame manifests as a
“glitch” noticeable by the user. In this case, the
large and constant clumping of glitches is quite
disruptive. In the players we have used, these result in a checkerboard pattern momentarily appearing and corrupting the playback; corrupted
playback persists until a frame can be correctly
decoded.
Figure 6(b) shows the priority-based dropping case. In this case, playback improves when dropping B frames, but remains poor under highly loaded conditions. Until roughly time 50, the player can decode all of its received frames, but after that a large fraction of frames cannot be decoded properly. This is because by this time we are only sending I or P frames, so any dropped P frame could prevent downstream P frames from being decoded.
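The dependency structure behind these cascading failures can be modeled simply; the following is our own simplified Python sketch (it ignores B-frame bidirectionality) in which an I frame decodes on its own while a P frame needs the most recent reference frame:

    # Simplified model of MPEG decodability (I and P frames only).
    def decodable_frames(frames, received):
        """frames: list of 'I'/'P' types in stream order; received: parallel bools.
        Returns which frames the player can actually decode."""
        ok = []
        ref_ok = False  # is the most recent reference (I or P) frame decodable?
        for ftype, got in zip(frames, received):
            if ftype == "I":
                ref_ok = got  # I frames decode independently
                ok.append(got)
            else:  # a P frame depends on the previous reference frame
                ref_ok = got and ref_ok
                ok.append(ref_ok)
        return ok

    # One dropped P frame spoils the rest of the group of pictures:
    print(decodable_frames(["I", "P", "P", "P"], [True, True, False, True]))
    # -> [True, True, False, False]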
In contrast, when using local adaptation along the same path, performance improves significantly, as shown in Figure 6(c). The few glitches that occur result from sudden drops in available bandwidth, and from attempts to obtain a better configuration when no resources are available. By dropping all B and/or P frames, we avoid dropping frames that could lead to temporal glitches.
While the proactive frame-dropping adaptation significantly improves playback along a congested path, it fails to use alternative paths that could further improve playback. In contrast, MediaNet's global scheduler reconfigures the network to utilize redundant paths. (Such paths occur frequently in the wide area [37], and mobile hosts often have multiple networks available; e.g., many laptops have cellular, 802.11b, and Ethernet interfaces.) Figure 6(d) shows how MediaNet's GS reroutes traffic through pc6 when L3 becomes congested at roughly time 30, utilizing the idle L4. Later, L4's bandwidth is reduced as well, which causes MediaNet to start dropping frames until it reaches the same level as the local case.
A number of times in this experiment, the GS
optimistically assumes that more bandwidth is
available on unmaximized links and attempts to
improve the total utility. At time 105 when link
L4’s bandwidth drops, it tries to reroute the flow
through link L3. However, L3 has even lower
available bandwidth, and so after the reconfiguration window expires (here set to 5 seconds), the
GS returns the configuration to link L4, at utility
0.4 (dropping B frames). Similar failed attempts
occur at times 120 and 155. Our user configuration mitigates the negative effects of such reconfigurations by intelligently dropping frames until the network is reconfigured. Ideally we could prevent these spurious reconfigurations without becoming so conservative as to degenerate to local adaptation only; possible approaches are discussed in Section 4.3.
We should emphasize that MediaNet's contribution is not simply multi-path routing or local adaptation, since each has been explored in prior contexts. Rather, MediaNet's global scheduling service encapsulates a more general way of performing adaptation on a network-wide basis, based on individually-specified adaptation preferences and metrics. In so doing, it in effect employs both local adaptations (e.g., proactive frame dropping) and global adaptations (e.g., path rerouting), among others, to meet the needs of users and the network.
5.3 Multi-user Sharing
To examine how resources are shared among
users, we configured MediaNet with two video
sources and five clients. Video v1 on pc1 has
three clients: users user1 on pc5, user3 on
pc7, and user5 on pc8; video v2 on pc2 has
two clients: users user2 on pc4 and user4 on
pc7. Each user specification differs from the one shown in Figure 2 only in its video source and user locations.
If all links are fully available, the GS assigns
the operations as shown in Figure 8(a). The unlabeled operations are TCP sends and receives,
and the Pr operations assign frame priorities for
intelligent dropping. In combining the five user
specifications, the GS has essentially created two
multicast dissemination trees, and uses L3 for the
v1 stream and L4 for v2.
[Figure 7: User-perceived performance for multiple user scenario. Two panels plot bandwidth (KB/s) against time (s), showing decoded and undecoded frames and the available bandwidth on L3 and L4: (a) users 1, 3, 5; (b) users 2, 4.]

The performance as seen at the two sets of receivers is shown in Figure 7. At time 20, the bandwidth on link L3 is reduced to 100 KB/s,
and so the GS reconfigures the network to be
as in Figure 8(b) where all flows go along link
L4 so as to maintain utility 1.0 for all users. At
time 40, the bandwidth on link L4 is dropped
to 200 KB/s, making it impossible to carry both streams along that link. As such, the GS reconfigures the network as in Figure 8(c), in which v2
is sent along link L3 with its B frames dropped
(as indicated by the dB node on pc2), using the
utility 0.4 CMN, while v1 goes along link L4 at
utility 1.0 for all users. Notice that the GS has
scheduled the dB (dropping B frames) node at
the source pc2 rather than at the node connected
to the congested link, for better network utilization. At time 60, L4’s bandwidth also drops to
100 KB/s, which results in all flows now operating at utility 0.4, as shown in Figure 8(d). Here
we can see that this is essentially the same as the
unloaded configuration in Figure 8(a) but with
dB nodes on both of the video source hosts.
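The intuition behind placing the drop operation at the source can be illustrated with a toy calculation; this is our own illustration of the placement trade-off, not the GS's actual algorithm, using the video rates from Section 5.1:

    # Hypothetical illustration: total bandwidth consumed when placing a
    # B-frame-dropping operator at different hops along a 3-link path.
    FULL_RATE = 145.0      # KB/s, all frames
    DROPPED_B_RATE = 88.0  # KB/s, I and P frames only

    def total_link_usage(path_len, drop_at):
        """Links before the drop operator carry the full stream;
        links at and after it carry the reduced stream."""
        return drop_at * FULL_RATE + (path_len - drop_at) * DROPPED_B_RATE

    for hop in range(4):
        print(f"drop at hop {hop}: {total_link_usage(3, hop):.0f} KB/s total")
    # Dropping at hop 0 (the source) uses the least aggregate bandwidth
    # (264 KB/s) versus 435 KB/s when dropping only at the final link.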
During the run, the GS guesses that additional bandwidth might be available on various links, and so attempts to improve the configuration. This occurs at time 50 (trying to improve to Figure 8(a)), but the attempt fails and the GS reverts at time 55. A similar attempt is made at time 80 (to move up to Figure 8(c)).
6 Related Work
Although distributed multi-media research has
been popular for decades, the idea of multimedia processing in the network was first inspired by the problems of digital video broadcasting in heterogeneous networks [40, 33]. The
goal of providing adaptive QoS for streaming
data is shared by a number of systems, including Active Networks applications [39, 34, 5],
QoS middleware substrates [23, 24, 25], and
application-layer in-network processors [43, 1, 2].
Other projects have targeted the dissemination
to mobile, wireless workstations, such as Quasar
[18] and Odyssey [31]. None of these systems focuses on sharing resources among many users with differing adaptation preferences, though their adaptivity mechanisms and resource models are quite relevant.
[Figure 8: Scheduling under varying conditions for multiple users. Each panel shows the operator placement chosen by the GS (video sources v1 and v2, Pr priority-tagging operations, and dB drop-B-frames operations on nodes pc1 through pc8): (a) no congestion; (b) L3 = 100 KB/s; (c) L3 = 100, L4 = 200 KB/s; (d) both L3 and L4 at 100 KB/s.]
A few systems have considered efficient stream
adaptations shared among many users. Layered
multicast [26, 34, 35] shares resources efficiently
among many users, and Degas [32] contains decentralized protocols for task distribution and
load balancing of streaming data operations.
The layers in layered multicast are coarse-grained abstractions, however, and do not support more “computational” adaptations, such as transcoding.
Degas similarly fails to account for user preferences in scheduling adaptations.
MediaNet shares some mechanisms with certain overlay networks (e.g., RON [3]) that, in
addition to constructing a flexible virtual network on top of physical networks, can provide
improved network performance via alternative
paths in conjunction with bandwidth probing
and failure detection [3, 37, 10]. To date, these
systems have not been concerned with QoS (i.e.,
real-time constraints) of streaming data, or sharing of resources among many users.
An alternative approach to adaptive QoS is
reservation-based QoS, in which resources like
CPU and bandwidth are allocated for applications in advance [9, 7, 8]. The drawbacks of
reservations are that underlying support is not
widely available, and allocated resources can
be underutilized, resulting in inefficiency. A number of systems have examined application-specific scheduling in reservation-capable environments, for example, the OMEGA end-system architecture [28, 29].
A number of systems share our goal of
supporting user-specified, adaptive streaming
data applications, including CANS [15], Conductor [42], Darwin [11], End-to-end Media
Paths [30], Ninja [16], PATHS [6], and [12]. Central to all of these systems is the notion of paths
of stream transformers that must be scheduled
on the network, and the presence of a centralized
plan manager to schedule paths across the network, similar to MediaNet’s GS. However, these
systems use the plan manager only at initialization or rarely, while MediaNet's GS runs continuously. Less attention has been paid to exploring fast, on-line scheduling algorithms that are nonetheless effective, which would be needed in a scalable on-line system. As such, these systems cannot take advantage of path-based, user-specified adaptation at run time. In addition, plan managers
appear to consider scheduling only for a particular application or flow, as opposed to the combination of many or all existing applications or
flows, and therefore miss opportunities to improve both per-user and network performance,
for example, by aggregation and/or re-routing.
7 Conclusions
MediaNet is an architecture for user-specified,
globally-adaptive QoS in distributed streaming data applications. It has two clear benefits. First, adaptations are user-specified rather than system-determined. Second, MediaNet's
global and local scheduling approach in effect
employs both global and local adaptations; our
experiments demonstrate better user and system
performance in three ways:
1. The GS aggregates users’ continuous media networks, removing redundancy in a
multicast-like fashion.
2. It utilizes redundant resources, such as alternative, unloaded routing paths.
3. It adapts proactively to prevent wasted resources, for example by dropping frames
close to the source when there is downstream congestion.
While our work is a promising first step,
many questions remain. Three important areas are scalability, accuracy, and applications.
For the first, we are interested in examining
hindrances to growth, such as monitoring message overhead, GS running times, and network instability, to understand possible tradeoffs. For example, by increasing the reconfiguration window we limit the effects of configuration flapping, but could be stuck with an ill-advised
configuration. Or, by reducing the frequency of
monitoring messages we reduce monitoring overhead, but increase the possibility of a bad schedule. We are particularly interested in devising a
hierarchical system, such as used in Darwin [11].
A central requirement for an on-line adaptive system is an accurate view of its resources. As mentioned earlier (Section 4.3), we
are interested in employing additional monitoring techniques and better heuristics for weighing
information.
Another possible enhancement would be to
consider splitting (or ‘striping’) data across multiple paths between a single source and destination. Doing so would require that the split data
be properly resequenced upon reaching the destination, unless the receiving operation could tolerate out-of-order arrival. While striping would
allow greater efficiency, it could significantly increase the scheduling overhead.
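As a minimal sketch of the resequencing a receiver would need, assuming each frame carries a sequence number (the buffering policy here is our own illustration, not a MediaNet mechanism):

    # Hypothetical receive-side resequencer for striped streams.
    import heapq

    class Resequencer:
        """Buffers out-of-order frames and releases them in sequence order."""
        def __init__(self):
            self.next_seq = 0
            self.pending = []  # min-heap of (seq, frame) pairs

        def push(self, seq, frame):
            """Accept a frame from any path; return frames now deliverable in order."""
            heapq.heappush(self.pending, (seq, frame))
            ready = []
            while self.pending and self.pending[0][0] == self.next_seq:
                ready.append(heapq.heappop(self.pending)[1])
                self.next_seq += 1
            return ready

    # Frames 0 and 2 arrive before frame 1 (e.g., via different paths):
    r = Resequencer()
    print(r.push(0, "f0"))  # -> ['f0']
    print(r.push(2, "f2"))  # -> []
    print(r.push(1, "f1"))  # -> ['f1', 'f2']

An operation tolerant of out-of-order arrival could skip this buffering entirely, at the cost of pushing the complexity into the receiving application.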
Finally, we have just scratched the surface of
MediaNet’s possibilities by experimenting with
only network-limited (i.e., video plus frame
dropping) applications. We believe that MediaNet’s generality will be quite useful when considering CPU-limited cases; for example, when
streaming data to embedded devices, and/or
while performing computationally-intense transformations, such as digital facial recognition or
motion analysis.
Acknowledgements
Thanks to Michael Marsh for helping us discover
the usefulness of the harmonic mean. Thanks to
Cyclone development team members Greg Morrisett and Dan Grossman for their rapid response
to our Cyclone-related problems. Thanks also to
Scott Nettles, Jonathan T. Moore, Bobby Bhattacharjee, Trevor Jim, and Jonathan Smith for
helpful comments on earlier versions of this paper.
References
[1] Elan Amir, Steve McCanne, and Hui Zhang. An application level video gateway. In Proceedings of the Third ACM International Multimedia Conference and Exhibition (MULTIMEDIA), pages 255–266, November 1995.
[2] Elan Amir, Steve McCanne, and Robert Katz.
An Active Service framework and its application
to real-time multimedia transcoding. In Proceedings of the ACM SIGCOMM Conference, pages
178–189, September 1998.
[3] David G. Andersen, Hari Balakrishnan,
M. Frans Kaashoek, and Robert Morris. Resilient overlay networks.
In Proceedings of
the Eighteenth ACM Symposium on Operating
Systems Principles. ACM Press, October 2001.
[4] Benjamin Atkin and Kenneth P. Birman. Evaluation of an adaptive transport protocol. In
Proceedings of the IEEE INFOCOM Conference,
April 2003. To appear.
[5] Samrat Bhattacharjee, Ken Calvert, and Ellen
Zegura. On Active Networking and congestion. Technical Report GIT-CC-96-02, College
of Computing, Georgia Tech, 1996.
[6] John Markus Bjørndalen, Otto J. Anshus, Tore
Larsen, L.A. Bongo, and B. Vinter. Scalable
processing and communication performance in
a multi-media related context. In Proceedings of
the EUROMICRO Conference, September 2002.
[7] S. Blake, D. Black, M. Carlson, E. Davies,
Z. Wang, and W. Weiss. An architecture for
differentiated services. Technical Report RFC
2475, IETF, December 1998.
[8] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource reservation protocol (RSVP). Technical Report RFC 2205, IETF, September 1997.

[9] Robert Braden, David Clark, and Scott Shenker. Integrated services in the Internet architecture: an overview. Technical Report RFC 1633, IETF, June 1994.

[10] John Byers, Jeffrey Considine, Michael Mitzenmacher, and Stanislav Rost. Informed content delivery across adaptive overlay networks. In Proceedings of the ACM SIGCOMM Conference, 2002.

[11] Prashant Chandra, Allan Fisher, Corey Kosak, T. S. Eugene Ng, Peter Steenkiste, Eduardo Takahashi, and Hui Zhang. Darwin: Customizable resource management for value-added network services. In Proceedings of the IEEE International Conference on Network Protocols (ICNP), pages 177–188, October 1998.

[12] Sumii Choi, Jonathan Turner, and Tillman Wolf. Configuring sessions in programmable networks. In Proceedings of the IEEE INFOCOM Conference, April 2001.

[13] James Curtis and Tony McGregor. Review of bandwidth estimation techniques. In New Zealand Computer Science Research Students’ Conference, volume 8, April 2001.

[14] Emulab.net, 2001. http://www.emulab.net.

[15] Xiaodong Fu, Weisong Shi, Anatoly Akkerman, and Vijay Karamcheti. CANS: Composable, adaptive network services infrastructure. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), March 2001.

[16] Steven D. Gribble, Matt Welsh, Rob von Behren, Eric A. Brewer, David Culler, N. Borisov, S. Czerwinski, R. Gummadi, J. Hill, A. Joseph, R.H. Katz, Z.M. Mao, S. Ross, and B. Zhao. The Ninja architecture for robust Internet-scale systems and services. Computer Networks, 35(4):473–497, March 2001.

[17] Ningning Hu and Peter Steenkiste. Estimating available bandwidth using packet pair probing. Technical Report CMU-CS-02-166, School of Computer Science, Carnegie Mellon University, September 2002.

[18] Jon Inouye, Shanwei Cen, Calton Pu, and Jonathan Walpole. System support for mobile multimedia applications. In Proceedings of the Workshop on Network and Operating System Support for Digital Audio and Video, pages 143–154, May 1997.

[19] Chalermek Intanagonwiwat, Deborah Estrin, Ramesh Govindan, and John S. Heidemann. Impact of network density on data aggregation in wireless sensor networks. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS), July 2002.

[20] Manish Jain and Constantinos Dovrolis. End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput. In Proceedings of the ACM SIGCOMM Conference, August 2002.

[21] JBI - Joint Battlespace Infosphere. http://www.rl.af.mil/programs/jbi/default.cfm.

[22] Trevor Jim, Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney, and Yanling Wang. Cyclone: A safe dialect of C. In Proceedings of the USENIX Annual Technical Conference, June 2002.

[23] David A. Karr, Craig Rodrigues, Joseph P. Loyall, Richard E. Schantz, Yamuna Krishnamurthy, Irfan Pyarali, and Douglas C. Schmidt. Application of the QuO quality-of-service framework to a distributed video application. In Proceedings of the International Symposium on Distributed Objects and Applications, September 2001.
[24] Baochun Li and Klara Nahrstedt. Dynamic reconfiguration for complex multimedia applications. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pages 165–170, June 1999.

[25] Baochun Li and Klara Nahrstedt. QualProbes: Middleware QoS profiling services for configuring adaptive applications. In Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware), pages 256–262, 2000.

[26] Steven McCanne, Van Jacobson, and Martin Vetterli. Receiver-driven layered multicast. In Proceedings of the ACM SIGCOMM Conference, Stanford, CA, August 1996.

[27] Yasuhiko Minamide, Greg Morrisett, and Robert Harper. Typed closure conversion. In Twenty-Third ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 271–283, St. Petersburg, Florida, January 1996.

[28] Klara Nahrstedt and Jonathan M. Smith. Design, implementation and experiences of the OMEGA end-point architecture. Technical Report MS-CIS-95-22, Department of Computer and Information Science, University of Pennsylvania, 1995.

[29] Klara Nahrstedt and Jonathan M. Smith. The QoS broker. IEEE Multimedia, 2(1):53–67, 1995.

[30] Akihiro Nakao, Larry Peterson, and Andy Bavier. Constructing end-to-end paths for playing media objects. In Proceedings of the IEEE Conference on Open Architectures (OPENARCH), April 2001.

[31] B.D. Noble and M. Satyanarayanan. Experience with adaptive mobile applications in Odyssey. Mobile Networks and Applications, 4:245–254, 1999.

[32] Wei Tsang Ooi, Robbert van Renesse, and Brian Smith. Design and implementation of programmable media gateways. In Proceedings of the Workshop on Network and Operating System Support for Digital Audio and Video, June 2000.

[33] Joseph C. Pasquale, George C. Polyzos, Eric W. Anderson, and Vachaspathi P. Kompella. Filter propagation in dissemination trees: Trading off bandwidth and processing in continuous media networks. Lecture Notes in Computer Science, 846:259–269, 1994.

[34] Ranga S. Ramanujan and Kenneth J. Thurber. An active network-based design of a QoS adaptive video multicast service. In Proceedings of the Workshop on Network and Operating System Support for Digital Audio and Video, pages 29–40, July 1998.

[35] Reza Rejaie, Mark Handley, and Deborah Estrin. Quality adaptation for congestion controlled video playback over the Internet. In Proceedings of the ACM SIGCOMM Conference, pages 189–200, 1999.

[36] Luigi Rizzo. Dummynet: A simple approach to the evaluation of network protocols. ACM Computer Communication Review, 27(1), January 1997.

[37] Stefan Savage, Andy Collins, Eric Hoffman, John Snell, and Thomas Anderson. The end-to-end effects of Internet path selection. In Proceedings of the ACM SIGCOMM Conference, pages 289–299, September 1999.

[38] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications. Internet RFC 1889, 1996.

[39] David L. Tennenhouse and David J. Wetherall. Towards an Active Network architecture. ACM Computer Communication Review, 26(2):5–18, April 1996.

[40] Thierry Turletti and Jean-Chrysotome Bolot. Issues with multicast video distribution in heterogeneous packet networks. In Proceedings of the Packet Video Workshop, pages F3.1–3.4, September 1994.
[41] Brian White, Jay Lepreau, Leigh Stoller, Robert
Ricci, Shashi Guruprasad, Mac Newbold, Mike
Hibler, Chad Bard, and Abhijeet Joglekar. An
integrated experimental environment for distributed systems and networks. In Proceedings of
the USENIX Symposium on Operating Systems
Design and Implementation, December 2002.
[42] Mark Yarvis, Peter Reiher, and Gerald J. Popek.
Conductor: A framework for distributed adaptation. In Proceedings of the IEEE Workshop on
Hot Topics in Operating Systems (HOTOS),
pages 44–49, March 1999.
[43] Nicholas Yeadon, Andrew Mauthe, David
Hutchison, and Francisco Garcia. QoS filters:
Addressing the heterogeneity gap. Lecture Notes
in Computer Science, 1045:2271–243, 1996.