3.6 Backup Tasks 4.2 Ordering Guarantees: To Appear in OSDI 2004

Download as pdf or txt
Download as pdf or txt
You are on page 1of 1

Furthermore, R is often constrained by users because the intermediate key.

A default partitioning function is


the output of each reduce task ends up in a separate out- provided that uses hashing (e.g. hash(key) mod R).
put file. In practice, we tend to choose M so that each This tends to result in fairly well-balanced partitions. In
individual task is roughly 16 MB to 64 MB of input data some cases, however, it is useful to partition data by
(so that the locality optimization described above is most some other function of the key. For example, sometimes
effective), and we make R a small multiple of the num- the output keys are URLs, and we want all entries for a
ber of worker machines we expect to use. We often per- single host to end up in the same output file. To support
form MapReduce computations with M = 200, 000 and situations like this, the user of the MapReduce library
R = 5, 000, using 2,000 worker machines. can provide a special partitioning function. For example,
using hash(Hostname(urlkey)) mod R as the par-
titioning function causes all URLs from the same host to
3.6 Backup Tasks end up in the same output file.
One of the common causes that lengthens the total time
taken for a MapReduce operation is a straggler: a ma- 4.2 Ordering Guarantees
chine that takes an unusually long time to complete one
of the last few map or reduce tasks in the computation. We guarantee that within a given partition, the interme-
Stragglers can arise for a whole host of reasons. For ex- diate key/value pairs are processed in increasing key or-
ample, a machine with a bad disk may experience fre- der. This ordering guarantee makes it easy to generate
quent correctable errors that slow its read performance a sorted output file per partition, which is useful when
from 30 MB/s to 1 MB/s. The cluster scheduling sys- the output file format needs to support efficient random
tem may have scheduled other tasks on the machine, access lookups by key, or users of the output find it con-
causing it to execute the MapReduce code more slowly venient to have the data sorted.
due to competition for CPU, memory, local disk, or net-
work bandwidth. A recent problem we experienced was 4.3 Combiner Function
a bug in machine initialization code that caused proces-
sor caches to be disabled: computations on affected ma- In some cases, there is significant repetition in the inter-
chines slowed down by over a factor of one hundred. mediate keys produced by each map task, and the user-
We have a general mechanism to alleviate the prob- specified Reduce function is commutative and associa-
lem of stragglers. When a MapReduce operation is close tive. A good example of this is the word counting exam-
to completion, the master schedules backup executions ple in Section 2.1. Since word frequencies tend to follow
of the remaining in-progress tasks. The task is marked a Zipf distribution, each map task will produce hundreds
as completed whenever either the primary or the backup or thousands of records of the form <the, 1>. All of
execution completes. We have tuned this mechanism so these counts will be sent over the network to a single re-
that it typically increases the computational resources duce task and then added together by the Reduce function
used by the operation by no more than a few percent. to produce one number. We allow the user to specify an
We have found that this significantly reduces the time optional Combiner function that does partial merging of
to complete large MapReduce operations. As an exam- this data before it is sent over the network.
ple, the sort program described in Section 5.3 takes 44% The Combiner function is executed on each machine
longer to complete when the backup task mechanism is that performs a map task. Typically the same code is used
disabled. to implement both the combiner and the reduce func-
tions. The only difference between a reduce function and
a combiner function is how the MapReduce library han-
4 Refinements dles the output of the function. The output of a reduce
function is written to the final output file. The output of
Although the basic functionality provided by simply a combiner function is written to an intermediate file that
writing Map and Reduce functions is sufficient for most will be sent to a reduce task.
needs, we have found a few extensions useful. These are Partial combining significantly speeds up certain
described in this section. classes of MapReduce operations. Appendix A contains
an example that uses a combiner.
4.1 Partitioning Function
4.4 Input and Output Types
The users of MapReduce specify the number of reduce
tasks/output files that they desire (R). Data gets parti- The MapReduce library provides support for reading in-
tioned across these tasks using a partitioning function on put data in several different formats. For example, text

To appear in OSDI 2004 6

You might also like