CS 525 Advanced Distributed Systems Spring 2010: Ravenshaw Management Centre, Cuttack
Advanced Distributed Systems
Spring 2010
Yeah! That’s what I’d like to know.
2
“A Cloudy History of Time” © IG 2010
[Timeline figure, 1940–2010: the first datacenters!; clusters; grids; PCs (not distributed!); peer-to-peer systems; clouds and datacenters]
3
“A Cloudy History of Time” © IG 2010
• First large datacenters: ENIAC, ORDVAC, ILLIAC
  – Many used vacuum tubes and mechanical relays
• Berkeley NOW Project
• Supercomputers
• Server Farms (e.g., Oceano)
6
Single site Cloud: to Outsource or Own? [OpenCirrus paper]
• Medium-sized organization: wishes to run a service for M months
  – Service requires 128 servers (1024 cores) and 524 TB
  – Same as UIUC CCT cloud site
• Outsource (e.g., via AWS): monthly cost (see the sketch below)
  – S3 costs: $0.12 per GB-month. EC2 costs: $0.10 per CPU-hour
  – Storage ~ $62 K
  – Total ~ $136 K
• Own: monthly cost
  – Storage ~ $349 K / M
  – Total ~ $1555 K / M + 7.5 K (includes 1 sysadmin / 100 nodes)
  – Using a 0.45 : 0.4 : 0.15 split for hardware : power : network
7
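A quick sanity check of the outsourcing estimate above, as a Python sketch (assuming a 30-day month, 1 TB ≈ 1000 GB, and all 1024 cores billed around the clock):

```python
# Back-of-the-envelope check of the AWS estimate above.
# Assumptions: 30-day month, 1 TB ~= 1000 GB, all 1024 cores billed 24x7.
storage_tb = 524
cores = 1024
s3_per_gb_month = 0.12     # $ per GB-month
ec2_per_cpu_hour = 0.10    # $ per CPU-hour
hours_per_month = 24 * 30

storage = storage_tb * 1000 * s3_per_gb_month            # ~ $62.9 K
compute = cores * hours_per_month * ec2_per_cpu_hour     # ~ $73.7 K
print(f"S3 storage : ${storage / 1000:.1f} K / month")
print(f"EC2 compute: ${compute / 1000:.1f} K / month")
print(f"Total      : ${(storage + compute) / 1000:.1f} K / month")  # ~ $136 K
```

The storage and total lines reproduce the ~$62 K and ~$136 K figures on the slide.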
Single site Cloud: to Outsource or Own?
• Breakeven analysis: more preferable to own if (see the sketch below):
  – $349 K / M < $62 K (storage)
  – $1555 K / M + 7.5 K < $136 K (overall)
• Breakeven points
  – M > 5.55 months (storage)
    • Not surprising: Cloud providers benefit monetarily most from storage
  – M > 12 months (overall)
    • Assumes hardware lasts for 3 years (typical lifetime)
• Systems are typically underutilized
  – With a system utilization of x%, it is still preferable to own if x > 33.3%
  – Even with a CPU utilization of 20%, storage utilization > 47% makes owning preferable
8
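The breakeven points follow directly from these inequalities; a minimal check, using the unrounded monthly costs (~$62.9 K storage, ~$136.6 K total) from the sketch above:

```python
# Own is cheaper than outsourcing once its amortized monthly cost drops
# below AWS's:  349/M < 62.9  and  1555/M + 7.5 < 136.6  (all in $K).
storage_breakeven = 349 / 62.9            # ~ 5.55 months
overall_breakeven = 1555 / (136.6 - 7.5)  # ~ 12.0 months
print(f"Owning wins on storage after {storage_breakeven:.2f} months")
print(f"Owning wins overall after   {overall_breakeven:.1f} months")
```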
I want to do research in this area. I am sure there are no grand challenges in cloud computing!
9
10 Challenges [Above the Clouds]
(Index: Performance, Data-related, Scalability, Logistical)
12
New Parallel Programming Paradigms: MapReduce
• Highly-parallel data processing
• Originally designed by Google (OSDI 2004 paper)
• Open-source version called Hadoop, by Yahoo!
  – Hadoop written in Java. Your implementation could be in Java, or any executable
• Google (MapReduce)
  – Indexing: a chain of 24 MapReduce jobs
  – ~200K jobs processing 50 PB/month (in 2006)
• Yahoo! (Hadoop + Pig)
  – WebMap: a chain of 100 MapReduce jobs
  – 280 TB of data, 2500 nodes, 73 hours
• Annual Hadoop Summit: 2008 had 300 attendees, 2009 had 700 attendees
13
What is MapReduce?
• Terms are borrowed from functional languages (e.g., Lisp)
• Sum of squares:
  – (map square ‘(1 2 3 4))
    • Output: (1 4 9 16)
  – [processes each record sequentially and independently]
• Map example (word count)
  – Input <filename, file text>: “Welcome Everyone”, “Hello Everyone”
  – Map output: (Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)
15
Reduce
[Figure: Reduce phase of the word-count example; see the sketch below]
16
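To make the word-count example concrete, here is a minimal Python sketch of the two user-supplied functions; the function names and the direct calls below are illustrative, not Hadoop’s actual API:

```python
# Map: called once per input record <filename, file text>; emits (word, 1) pairs.
def map_fn(filename, text):
    for word in text.split():
        yield (word, 1)

# Reduce: called once per key with every value emitted for that key; emits the total.
def reduce_fn(word, counts):
    yield (word, sum(counts))

print(list(map_fn("doc1", "Welcome Everyone")))   # [('Welcome', 1), ('Everyone', 1)]
print(list(map_fn("doc2", "Hello Everyone")))     # [('Hello', 1), ('Everyone', 1)]
print(list(reduce_fn("Everyone", [1, 1])))        # [('Everyone', 2)]
```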
Some Applications
• Distributed Grep
  – Map: emits a line if it matches the supplied pattern
  – Reduce: copies the intermediate data to output
• Count of URL access frequency
  – Map: processes a web log and outputs <URL, 1>
  – Reduce: emits <URL, total count>
• Reverse Web-Link Graph (see the sketch below)
  – Map: processes web pages and outputs <target, source>
  – Reduce: emits <target, list(source)>
17
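As one concrete instance, a sketch of the Reverse Web-Link Graph pair above; the input format (one record per source page with its outgoing links) and the names are assumptions:

```python
# Map: for each outgoing link on a page, emit <target, source>.
def map_fn(source, targets):
    for target in targets:
        yield (target, source)

# Reduce: gather every source that links to a given target.
def reduce_fn(target, sources):
    yield (target, list(sources))

print(list(map_fn("pageA", ["pageB", "pageC"])))     # [('pageB', 'pageA'), ('pageC', 'pageA')]
print(list(reduce_fn("pageC", ["pageA", "pageB"])))  # [('pageC', ['pageA', 'pageB'])]
```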
Programming MapReduce
• Externally: for the user (see the sketch after this slide)
  1. Write a Map program (short), write a Reduce program (short)
  2. Submit job; wait for result
  3. Need to know nothing about parallel/distributed programming!
• Internally: for the cloud (and for us distributed systems researchers)
  1. Parallelize Map
  2. Transfer data from Map to Reduce
  3. Parallelize Reduce
  4. Implement storage for Map input, Map output, Reduce input, and Reduce output
18
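From the user’s side, the whole job amounts to handing the two short programs to the framework and waiting. The sketch below uses run_mapreduce as a hypothetical stand-in for a real job-submission API; it simply runs everything sequentially:

```python
from collections import defaultdict

# The two short programs the user writes (word count again).
def map_fn(filename, text):
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))

# Hypothetical job-submission call: the user's code would be unchanged if the
# framework instead ran the Map and Reduce tasks across thousands of machines.
def run_mapreduce(map_fn, reduce_fn, records):
    groups = defaultdict(list)                 # stands in for the shuffle
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    return [out for k in groups for out in reduce_fn(k, groups[k])]

print(run_mapreduce(map_fn, reduce_fn,
                    [("doc1", "Welcome Everyone"), ("doc2", "Hello Everyone")]))
# [('Welcome', 1), ('Everyone', 2), ('Hello', 1)]
```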
Inside MapReduce
• For the cloud (and for us distributed systems researchers)
  1. Parallelize Map: easy! Each Map task is independent of the others!
  2. Transfer data from Map to Reduce:
     • All Map output records with the same key are assigned to the same Reduce task
     • Use a partitioning function (more soon; see the sketch after this slide)
  3. Parallelize Reduce: easy! Each Reduce task is independent of the others!
  4. Implement storage for Map input, Map output, Reduce input, and Reduce output
     • Map input: from distributed file system
     • Map output: to local disk (at Map node); uses local file system
     • Reduce input: from (multiple) remote disks; uses local file systems
     • Reduce output: to distributed file system
local file system = Linux FS, etc.
distributed file system = GFS (Google File System), HDFS (Hadoop Distributed File System)
19
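A sketch of the partitioning step referenced above: a deterministic function of the key picks the Reduce task, so all records with the same key meet at the same reducer. The toy hash here is for illustration only; real systems use a stable hash such as hash(key) mod R:

```python
R = 4  # number of Reduce tasks

# Toy deterministic partitioning function.
def partition(key, num_reduces=R):
    return sum(key.encode()) % num_reduces

map_output = [("Welcome", 1), ("Everyone", 1), ("Hello", 1), ("Everyone", 1)]
buckets = {r: [] for r in range(R)}
for key, value in map_output:
    buckets[partition(key)].append((key, value))

# Every occurrence of a given key lands in exactly one bucket (Reduce task).
print(buckets)
```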
Internal Workings of MapReduce
20
Fault Tolerance
• Worker Failure (see the sketch below)
  – Master keeps 3 states for each task: (idle, in-progress, completed)
  – Master sends periodic pings to each worker to keep track of it (central failure detector)
    • If a worker fails while its task is in-progress, mark the task as idle
    • If a Map worker fails after its tasks have completed, mark those tasks as idle too (their output lives only on the failed worker’s local disk)
    • A Reduce task does not start until all Map tasks are done and all of its (the Reduce’s) data has been fetched
• Master Failure
  – Checkpoint
21
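A sketch of the master’s bookkeeping on a worker failure, under the task-state model above (the names and table layout are assumptions): in-progress tasks go back to idle, and so do completed Map tasks, since their output sat on the failed worker’s local disk; completed Reduce output is already in the distributed file system.

```python
IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

# Master's task table (hypothetical layout): (kind, id) -> state + assigned worker.
tasks = {
    ("map", 0):    {"state": COMPLETED,   "worker": "w1"},
    ("map", 1):    {"state": IN_PROGRESS, "worker": "w2"},
    ("map", 2):    {"state": COMPLETED,   "worker": "w2"},
    ("reduce", 0): {"state": COMPLETED,   "worker": "w2"},
}

def on_worker_failure(failed_worker):
    for (kind, _), t in tasks.items():
        if t["worker"] != failed_worker:
            continue
        # Re-execute in-progress tasks, and completed Map tasks (their output
        # lived only on the failed worker's local disk).
        if t["state"] == IN_PROGRESS or (kind == "map" and t["state"] == COMPLETED):
            t["state"], t["worker"] = IDLE, None

on_worker_failure("w2")
print(tasks)   # map 1 and map 2 are idle again; reduce 0 stays completed
```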
Locality and Backup Tasks
• Locality
  – Since the cloud has a hierarchical topology
  – GFS stores 3 replicas of each 64 MB chunk
    • Maybe on different racks
  – Attempt to schedule a Map task on a machine that contains a replica of the corresponding input data: why?
• Stragglers (slow nodes)
  – Due to bad disk, network bandwidth, CPU, or memory
  – Perform backup (replicated) execution of the straggler task: the task is done when the first replica completes (see the sketch below)
22
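A sketch of the backup-task idea referenced above: near the end of the job, launch a second copy of a still-running task and take whichever copy finishes first. The timings below are made up to simulate a straggling primary copy:

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task(task_id, copy):
    # Simulated work: the primary copy may straggle (bad disk, contended CPU, ...).
    delay = random.uniform(0.01, 0.5) if copy == "primary" else random.uniform(0.01, 0.05)
    time.sleep(delay)
    return task_id, copy

with ThreadPoolExecutor(max_workers=2) as pool:
    copies = [pool.submit(run_task, "map-7", "primary"),
              pool.submit(run_task, "map-7", "backup")]
    done, _ = wait(copies, return_when=FIRST_COMPLETED)
    # The task counts as done as soon as either copy completes.
    print("task done, winning copy:", next(iter(done)).result())
```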
Testbed: 1800 servers, each with 4 GB RAM, dual 2 GHz Xeons, dual 160 GB IDE disks, and gigabit Ethernet per machine (~100 Gbps aggregate bandwidth)
[Figure: Grep performance]
25
Administrative Announcements
Student-led paper presentations (see instructions on website)
• Start from February 11th
• Groups of up to 2 students present each class, responsible for a set of 3 “Main Papers” on a topic
  – 45 minute presentations (total) followed by discussion
  – Set up an appointment with me to show slides by 5 pm the day prior to the presentation
  – Select your topic by Jan 31st
• List of papers is up on the website
• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one to two page review of any two of the main set of papers (summary, comments, criticisms, and possible future directions)
  – Email the review and bring in a hardcopy before class
26
Announcements (contd.)
Projects
• Groups of 2 (need not be the same as the presentation groups)
• We’ll start detailed discussions “soon” (a few classes into the student-led presentations)
27