1 Introduction
The MapReduce framework [5, 6] is a programming paradigm for designing parallel and
distributed algorithms. It provides a simple programming interface that is specifically
designed to make it easy for a programmer to design a parallel program that can effi-
ciently perform a data-intensive computation. Moreover, it is a framework that allows
for parallel programs to be directly translated into computations for cloud computing
environments and server clusters (e.g., see [16]). This framework is gaining wide-spread
interest in systems domains, in that this framework is being used in Google data cen-
ters and as a part of the open-source Hadoop system [19] for server clusters, which have
been deployed by a wide variety of enterprises³, including Yahoo!, IBM, The New York
Times, eHarmony, Facebook, and Twitter.
Building on pioneering work by Feldman et al. [9] and Karloff et al. [14], our
interest in this paper is in studying the MapReduce framework from an algorithmic
standpoint, by designing and analyzing MapReduce algorithms for fundamental sort-
ing, searching, and simulation problems. Such a study could be a step toward ultimately
putting the MapReduce framework on an equal theoretical footing with the well-known
PRAM and BSP parallel models.
⋆ MADALGO is the Center for Massive Data Algorithmics, a center of the Danish National Research Foundation.
³ See http://en.wikipedia.org/wiki/Hadoop.
Still, we would be remiss if we did not mention that this framework is not without
its detractors. DeWitt and Stonebraker [7] mention several issues they feel are short-
comings of the MapReduce framework, including that it seems to require brute-force
enumeration instead of indexing for performing searches. Naturally, we feel that this
criticism is a bit harsh, as the theoretical limits of the MapReduce framework have yet
to be fully explored; hence, we feel that further theoretical study is warranted. Indeed,
this paper can be viewed as at least a partial refutation of the claim that the MapRe-
duce framework disallows indexed searching, in that we show how to perform fast and
efficient multi-search in the MapReduce framework.
A MapReduce computation operates on a set of input values, x1, x2, . . ., in a sequence of rounds, each consisting of three steps:
– A map step applies a function, µ, to each value, xi, to produce a finite set of key-value pairs (k, v). To allow for parallel execution, the computation of the function µ(xi) must depend only on xi.
– A shuffle step collects all the key-value pairs produced in the previous map step,
and produces a set of lists, Lk = (k; v1 , v2 , . . .), where each such list consists of
all the values, vj , such that kj = k for a key k assigned in the map step.
– A reduce step applies a function, ρ, to each list Lk = (k; v1 , v2 , . . .), formed in
the shuffle step, to produce a set of values, y1 , y2 , . . . . The reduction function, ρ, is
allowed to be defined sequentially on Lk, but should be independent of other lists Lk′ where k′ ≠ k.
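As a concrete illustration, the three steps above can be sketched in Python. The function names and the word-count choice of µ and ρ below are illustrative assumptions, not part of any particular MapReduce implementation:

```python
from collections import defaultdict

def map_step(values, mu):
    """Apply mu to each input value independently, emitting (key, value) pairs."""
    pairs = []
    for x in values:
        pairs.extend(mu(x))
    return pairs

def shuffle_step(pairs):
    """Group all values by key into lists L_k."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_step(groups, rho):
    """Apply rho to each list L_k, independently of other lists."""
    return {k: rho(k, vs) for k, vs in groups.items()}

# Word counting: mu emits (word, 1) for each word; rho sums the ones.
words = "the quick the lazy the".split()
pairs = map_step(words, lambda w: [(w, 1)])
counts = reduce_step(shuffle_step(pairs), lambda k, vs: sum(vs))
```

Note that `map_step` touches each xi in isolation and `reduce_step` touches each list in isolation, which is exactly what permits both steps to run on separate processors.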
The parallelism of the MapReduce framework comes from the fact that each map or
reduce operation can be executed on a separate processor independently of others. Thus,
the user simply defines the functions µ and ρ, and the system automatically schedules
map-shuffle-reduce steps and routes data to available processors, including provisions
for fault tolerance.
The outputs from a reduce step can, in general, be used as inputs to another round
of map-shuffle-reduce steps. Thus, a typical MapReduce computation is described as a
sequence of map-shuffle-reduce steps that perform a desired action in a series of rounds
that produce the algorithm’s output after the last reduce step.
We can make a crude calibration of a MapReduce algorithm using the following addi-
tional parameters:
– L: the latency L of the shuffle network, which is the number of steps that a mapper
or reducer has to wait until it receives its first input in a given round.
– B: the bandwidth of the shuffle network, which is the number of elements in a
MapReduce computation that can be delivered by the shuffle network in any time
unit.
Given these parameters, a lower bound for the total running time, T, of an implementation of a MapReduce algorithm can be characterized as follows:

T = Ω( Σ_{r=0}^{R−1} (t_r + L + C_r/B) ) = Ω(t + RL + C/B),

where t_r is the internal running time of round r, C_r is the communication complexity of round r, t = Σ_r t_r, and C = Σ_r C_r.
For example, given a document D of n words, the simple word-counting MapReduce al-
gorithm given above has a worst-case performance of R = 1, C = Θ(n), and t = Θ(n);
hence, its worst-case time performance T = Θ(n), which is no faster than sequential
computation. Unfortunately, such performance can be quite common in natural-language documents. For instance, in the Brown Corpus [15], the word “the” accounts for 7% of all word occurrences.
Note, therefore, that focusing exclusively on R, the number of rounds in a MapRe-
duce algorithm, can actually lead to an inefficient algorithm. For example, if we focus
only on the number of rounds, R, then the most efficient algorithm would always be the
trivial one-round algorithm, which maps all the inputs to a single key and then has the
reducer for this key perform a standard sequential algorithm to solve the problem. This
approach would run in one round, but it would not use any parallelism; hence, it would
be relatively slow compared to an algorithm that was more “parallel.”
Memory-Bound and I/O-Bound MapReduce Algorithms. So as to steer algorithm de-
signers away from the trivial one-round algorithm, recent algorithmic formalizations of
the MapReduce paradigm have focused primarily on optimizing the round complexity
bound, R, while restricting the memory size or input/output size for reducers. Karloff
et al. [14] define their MapReduce model, MRC, so that each reducer’s I/O size is restricted to be O(N^{1−ε}) for some small constant ε > 0, and Feldman et al. [9] define their model, MUD, so that reducer memory size is restricted to be O(log^c N), for some constant c ≥ 0, and reducers are further required to process their inputs in a single pass.
These restrictions limit the feasibility of the trivial one-round algorithm for solving a
problem in the MapReduce framework and instead compel algorithm designers to make
better utilization of parallelism.
In this paper, we follow the I/O-bound approach, as it seems to correspond better
to the way reducer computations are specified, but we take a somewhat more general
characterization than Karloff et al. [14], in that we do not bound the I/O size for reducers
explicitly to be O(N^{1−ε}), but instead allow it to be an arbitrary parameter:
– We define M to be an upper bound on the I/O-buffer memory size for all reducers
used in a given MapReduce algorithm. That is, we predefine M to be a parameter and require that n_{r,i} ≤ M for all rounds r and reducers i, where n_{r,i} denotes the I/O size of reducer i in round r.
We then can use M in the design and/or analysis of each of our MapReduce algorithms.
For instance, if each round of an algorithm has a reducer with an I/O size of at most M ,
then we say that this algorithm is an I/O-memory-bound MapReduce algorithm with
parameter M . In addition, if each round has a reducer with an I/O size proportional to
M (whose processing probably dominates the reducer’s internal running time), then we
can give a simplified lower bound on the time, T , for such an algorithm as
T = Ω(R(M + L) + C/B).
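As a quick sanity check on this bound, one can evaluate R(M + L) + C/B for sample parameter values. The values below are arbitrary assumptions for illustration, and constant factors hidden by the Ω are ignored:

```python
def mapreduce_time_lower_bound(R, M, L, C, B):
    """Evaluate the simplified lower bound T = Omega(R*(M + L) + C/B),
    up to constant factors: R rounds, reducer I/O-buffer size M,
    shuffle latency L, total communication C, shuffle bandwidth B."""
    return R * (M + L) + C / B

# Assumed sample values: 3 rounds, M = 1024 items, latency 10,
# C = 10^6 items communicated, bandwidth 100 items per time unit.
T = mapreduce_time_lower_bound(3, 1024, 10, 1_000_000, 100)
```

With these values the R(M + L) term contributes 3102 time units while the C/B term contributes 10000, showing how communication volume can dominate even when the round count is tiny.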
This approach therefore can characterize the limits of parallelism that are possi-
ble in a MapReduce algorithm and it also shows that we should concentrate on the
round complexity and communication complexity of a MapReduce algorithm in characterizing its performance. Of course, such bounds for R and C may depend on M,
but that is fine, for similar characterizations are common in the literature on external-
memory algorithms (e.g., see [1, 3, 4, 18]). In the rest of the paper, when we talk about
the MapReduce model, we always mean the I/O-memory-bound MapReduce model.
– create an item; delete an item; modify an item; keep item x (that is, the item x will be sent to v itself by creating an item (v, x) ∈ B_v(r)); send an item x to node w (create an item (w, x) ∈ B_v(r)).
– create an edge; delete an edge. This is essentially the same as creating and deleting an item, since explicit edges are just maintained as items at nodes. These operations will simplify the exposition when dealing with explicitly defined graphs G on which computation is performed.
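One round of this node-based framework can be sketched as follows. Each node applies its sequential function f to the items it currently holds and emits (destination, item) messages, which the shuffle then routes; the function and variable names here are illustrative assumptions:

```python
from collections import defaultdict

def run_round(state, f):
    """state maps node -> list of items; f(v, items) returns a list of
    (destination, item) messages. Returns the state after one round."""
    outbox = defaultdict(list)
    for v, items in state.items():
        for dest, x in f(v, items):  # "keep" is modeled as sending to v itself
            outbox[dest].append(x)
    return dict(outbox)

# Example f (an assumption for illustration): each node keeps its items
# and also forwards a copy of each item to node v + 1, a simple pipeline.
def f(v, items):
    msgs = [(v, x) for x in items]        # keep item x at v
    msgs += [(v + 1, x) for x in items]   # send item x to node w = v + 1
    return msgs

state = {0: ["a"], 1: ["b"]}
state = run_round(state, f)
```

Each call f(v, items) depends only on v’s own items, mirroring the independence requirement on reducers.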
The following theorem shows that the above framework captures the essence of computation in the MapReduce framework.⁶
⁶ Due to space constraints, all omitted proofs can be found in Appendix A.
Theorem 1. Let G = (V, E) and f be defined as above, such that in each round each node v ∈ V sends, keeps, and receives at most M items. Then a computation on G with round complexity R and communication complexity C can be simulated in the I/O-memory-bound MapReduce model with the same round and communication complexities.
The above theorem gives an abstract way of designing MapReduce algorithms. More precisely, to design a MapReduce algorithm, we define a graph G and a sequential function f to be performed at each node v ∈ V. This is akin to designing BSP algorithms and is more intuitive than defining map and reduce functions directly.
Note that in the above framework we can easily implement a global loop primitive
spanning over multiple rounds: each item maintains a counter that is updated at each
round. We can also implement parallel tail recursion by defining the labels of nodes to
include the recursive call stack identifiers.
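A minimal sketch of the global loop primitive, assuming each item carries an explicit round counter that every node increments (all names here are illustrative):

```python
def loop_round(items, limit):
    """items: list of (counter, payload) pairs held across the system.
    Advance one round of a global loop: each item's counter is incremented,
    and items are dropped once the counter reaches the loop limit."""
    out = []
    for counter, payload in items:
        if counter + 1 < limit:
            out.append((counter + 1, payload))  # item survives to the next round
    return out

items = [(0, "x"), (0, "y")]
rounds = 0
while items:                       # the loop terminates when no items remain
    items = loop_round(items, limit=3)
    rounds += 1
```

Since every item updates its counter locally, the loop needs no global coordination beyond the rounds themselves.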
3 Simulation Results
BSP simulation. The reader may observe that the generic MapReduce model of the pre-
vious section is very similar to the BSP model of Valiant [17], leading to the following
conclusion.
Theorem 2. Given a BSP algorithm A that runs in R super-steps with a total memory
size N using P ≤ N processors, we can simulate A using R rounds and C = O(RN )
communication in the I/O-memory-bound MapReduce framework with reducer memory
size bounded by M = ⌈N/P ⌉.
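The simulation behind Theorem 2 can be sketched as follows: each BSP processor p becomes a reducer keyed by p, its local memory (of size at most ⌈N/P⌉) is that reducer’s I/O buffer, and the messages of a super-step become key-value pairs keyed by the destination processor. The code below is an illustrative sketch of one super-step under these assumptions, not the paper’s construction verbatim:

```python
from collections import defaultdict

def bsp_superstep_as_round(memories, step):
    """memories: dict p -> local memory (list of values) of processor p.
    step(p, mem) -> (new_mem, msgs), where msgs is a list of
    (destination_processor, value) pairs. One super-step = one round."""
    inbox = defaultdict(list)
    for p, mem in memories.items():
        new_mem, msgs = step(p, mem)
        inbox[p].extend(new_mem)      # reducer for key p keeps its own memory
        for q, v in msgs:
            inbox[q].append(v)        # the shuffle delivers messages by key
    return dict(inbox)

# Example step (an assumption): every processor sends its local maximum
# to processor 0, which will then hold the global maximum.
def step(p, mem):
    return mem, ([(0, max(mem))] if p != 0 else [])

memories = {0: [3, 1], 1: [7, 2], 2: [5]}
memories = bsp_superstep_as_round(memories, step)
```

One super-step thus maps to exactly one map-shuffle-reduce round, which is where the R rounds and O(RN) communication in Theorem 2 come from.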