Cloud Computing For ML Sys Class
Machine Learning
in the Cloud
Joseph E. Gonzalez
Co-director of the RISE Lab
[email protected]
What is cloud computing?
“The interesting thing about Cloud
Computing is that we’ve redefined
Cloud Computing to include
everything that we already do. . . . I
don’t understand what we would do
differently in the light of Cloud
Computing other than change the
wording of some of our ads.”
-- Larry Ellison,
Wall Street Journal, 2008
Quote from “Above the Clouds: A Berkeley View of Cloud Computing”
8 years later …
2016
“If ‘cloud computing’ has a meaning,
it is not a way of doing computing, but
rather a way of thinking about
computing: a devil-may-care
approach which says, ‘Don't ask
questions. Don't worry about who
controls your computing or who holds
your data. Don't check for a hook
hidden inside our service before you
swallow it. Trust companies without
hesitation.’ In other words, ‘Be a
sucker.’ ”
-- Richard Stallman,
Boston Review, 2010
https://bostonreview.net/articles/richard-stallman-free-software-drm/
2006-2011
Economics of the Cloud
CapEx to OpEx: transition from large up-front capital expenditures
to operational expenditures
More money to spend on launching your business
Economies of scale
Negotiate lower hardware prices
Spread management costs
Leverage existing investments
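The CapEx-to-OpEx trade-off above can be made concrete with a small break-even calculation. This is a minimal sketch, not a pricing model: the function name `breakeven_hours` and all dollar figures are hypothetical and chosen only for illustration.

```python
def breakeven_hours(purchase_price: float,
                    hourly_ownership_cost: float,
                    cloud_hourly_rate: float) -> float:
    """Hours of use after which buying hardware is cheaper than renting it.

    purchase_price: up-front capital expenditure (CapEx).
    hourly_ownership_cost: power, cooling, and admin cost per hour of owning.
    cloud_hourly_rate: operational expenditure (OpEx) per rented hour.
    """
    saving_per_hour = cloud_hourly_rate - hourly_ownership_cost
    if saving_per_hour <= 0:
        # Renting never costs more per hour, so buying never pays off.
        return float("inf")
    return purchase_price / saving_per_hour


# Hypothetical numbers: a $30,000 server, $0.50/hour to run, $3.00/hour to rent.
hours = breakeven_hours(30_000.0, 0.5, 3.0)
print(f"Buying pays off after {hours:.0f} hours of use")
```

Under these assumed numbers, a team that needs the machine for fewer hours than the break-even point keeps more money for launching the business by renting.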
The Cloud Enabled Academic Research
Access to the latest hardware
Ability to burst experiments near conference deadlines
Usually…
industrial adoption
Companies can evaluate open-source (academic) big data tools without big
upfront investment in hardware.
What about the Cloud?
Access to latest GPUs and TPUs drove AI research
used a LOT OF CREDITS (thank you AWS, Azure, & Google!)
Conference
Deadlines
The Elasticity of the cloud drove us to rethink our approach to AI Research
[Figure: Accuracy vs. Time plots, each marked with a Deadline]
[Figure: schedules of GPUs 1-8 over Time (mins), split into Exploration and Exploitation phases. Model hyperparameters: learning rate l, weight decay d. One panel runs 00:00 to 60:00; the panels labeled "Linear Scaling" of Exploitation run 00:00 to 45:00.]
Fixed Cluster Cloud
Resources = Machines Resources = Money
GPU-mins
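The contrast between "Resources = Machines" and "Resources = Money" can be sketched as follows: when you pay per GPU-minute, the bill depends only on total GPU-minutes, so using more GPUs for less time costs the same. This is a minimal sketch that assumes perfectly linear scaling; the workload size and the price per GPU-minute are hypothetical.

```python
def time_to_finish(total_gpu_minutes: float, num_gpus: int) -> float:
    """Wall-clock minutes to finish, assuming perfectly linear scaling."""
    return total_gpu_minutes / num_gpus


def cloud_cost(num_gpus: int, minutes: float, price_per_gpu_minute: float) -> float:
    """Cloud bill: you pay for GPU-minutes, not for machines."""
    return num_gpus * minutes * price_per_gpu_minute


WORK_GPU_MINUTES = 360.0  # hypothetical workload size
PRICE = 0.05              # hypothetical dollars per GPU-minute

for n in (1, 4, 8):
    t = time_to_finish(WORK_GPU_MINUTES, n)
    cost = cloud_cost(n, t, PRICE)
    print(f"{n} GPU(s): {t:.0f} min wall-clock, ${cost:.2f}")
```

On a fixed cluster the number of machines caps how fast the job can finish; in the cloud the same budget buys a shorter wall-clock time, as long as the workload actually scales.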
[Figure: GPU allocation over Time (mins) up to the Deadline, shown in successive frames using 4, then 5, then 6 GPUs. Liaw et al.]
When it comes to machine-time allocation
[Figure: two GPU allocations over Time (mins) up to the Deadline, one using GPUs 1-4 and one using GPUs 1-6]
[Figure: Cluster Computing vs. Cloud Computing, with Compute and Storage plotted over Time]
©2019 RISELab
Canonical example
[Figure labels: FaaS (AWS Lambda), BaaS]
A Layer to Simplify Using the Cloud
Three essential qualities of serverless computing
Hides servers and the complexity of programming and operating them
Offers a pay-per-use cost model with no charge for idle resources
Has excellent autoscaling so resources match demand closely
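To illustrate what "hides servers" looks like to the programmer, here is a minimal sketch of a function written in the `handler(event, context)` style that AWS Lambda uses for Python. The event field `features`, the averaging logic, and the `statusCode`/`body` response shape are assumptions made for illustration, not part of the lecture.

```python
import json


def handler(event, context):
    """Stateless function: all input arrives in `event`.

    There is no server setup, port binding, or scaling logic here; the
    platform decides where and how many copies of this function run.
    """
    features = event.get("features", [])  # hypothetical input field
    score = sum(features) / len(features) if features else 0.0
    return {"statusCode": 200, "body": json.dumps({"score": score})}


if __name__ == "__main__":
    # Local invocation with a fake event; in the cloud the platform calls
    # handler() once per request and bills only for the time it runs.
    print(handler({"features": [1.0, 2.0, 3.0]}, None))
```

Because the function keeps no state between calls, the platform is free to run zero copies when idle and many copies under load, which is what makes pay-per-use billing and autoscaling possible.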
Airport Analogy
When you arrive at the destination airport and need to get to your hotel
you could:
1. Buy a car and drive [Legacy on-premises systems]
Long term investment and you are responsible for everything
[Figure labels: Sky; Secure; Data proc, Training, Serving]
Readings This Week
Pollux: Co-adaptive Cluster Scheduling for
Goodput-Optimized Deep Learning
Published in OSDI’21 – Best Paper Award
Why did we choose it?
Good example of work exploring scheduling for ML
Addresses ML and Systems concerns: throughput, improvements in
accuracy, fairness?
The Sky Above the Clouds [Unpublished]
Draft of the vision paper describing Sky research agenda
Do Not Distribute
Feedback will help the paper (be critical!)
Makes a case for both the inevitability and need for research in
“Sky Computing”
Things to think about:
Presentation of premise [what is proposed?]
Role of data
Role of research
ML Systems Research Case?
FrugalML: How to Use ML Prediction APIs
More Accurately and Cheaply
Published in NeurIPS’20
Example of a “Sky Computing” ML research direction
Combining competing prediction services to improve accuracy and reduce
costs.
Potentially exciting new research direction!
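One simple way to combine competing prediction services is a confidence-thresholded cascade: call a cheap service first and pay for an expensive one only when the cheap answer looks unreliable. The sketch below is a simplified illustration of that idea, not the algorithm from the FrugalML paper. The two stub services, their labels, confidences, costs, and the threshold are all hypothetical.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


def cascade_predict(x: str,
                    cheap: Callable[[str], Prediction],
                    expensive: Callable[[str], Prediction],
                    threshold: float,
                    cheap_cost: float,
                    expensive_cost: float) -> Tuple[str, float]:
    """Return (label, money spent) for a single input."""
    label, confidence = cheap(x)
    spent = cheap_cost
    if confidence < threshold:
        # The cheap service is unsure, so pay for a second opinion.
        label, _ = expensive(x)
        spent += expensive_cost
    return label, spent


# Hypothetical stand-ins for two commercial prediction APIs.
def cheap_stub(x: str) -> Prediction:
    return ("cat", 0.9) if "cat" in x else ("dog", 0.4)


def expensive_stub(x: str) -> Prediction:
    return ("cat", 0.99) if "cat" in x else ("bird", 0.95)


for query in ("a cat photo", "a blurry photo"):
    label, spent = cascade_predict(query, cheap_stub, expensive_stub,
                                   threshold=0.8, cheap_cost=1.0,
                                   expensive_cost=10.0)
    print(f"{query!r}: label={label}, cost={spent}")
```

Raising the threshold sends more queries to the expensive service, trading cost for accuracy; choosing that trade-off well under a budget is the kind of question this research direction studies.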