Bay Learn 2015 Deep Mind
Across many products/areas:
Android
Apps
GMail
Image Understanding
Maps
NLP
Photos
Robotics
Speech
Translation
YouTube
many research uses ...
many others ...
Outline
Two generations of deep learning software systems:
1st generation: DistBelief [Dean et al., NIPS 2012]
2nd generation: TensorFlow (unpublished)
An overview of how we use these in research and products
Plus, ...a new approach for training (people, not models)
Text Understanding
This movie should have NEVER been made. From the poorly
done animation, to the beyond bad acting. I am not sure at what
point the people behind this movie said "Ok, looks good! Lets
do it!" I was in awe of how truly horrid this movie was.
1-4 days: Tolerable. Interactivity replaced by running many experiments in parallel
1-4 weeks:
>1 month
Model Parallelism
Data Parallelism
[Figure: Parameter Servers hold the parameters p. Model Replicas, each reading its own Data, send updates Δp; the servers apply p' = p + Δp and the replicas fetch the new p'.]
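The Data Parallelism slides show model replicas exchanging parameters with parameter servers. The following is a minimal single-process sketch of that loop, assuming the update rule is p' = p + Δp. It is not DistBelief code: the `ParameterServer` class, the `replica_step` helper, the linear model, and the learning rate are all illustrative assumptions.

```python
# Minimal single-process sketch of data parallelism with a parameter server.
# All names here (ParameterServer, replica_step) are hypothetical.

class ParameterServer:
    """Holds the shared parameters p and applies updates p' = p + dp."""

    def __init__(self, params):
        self.params = list(params)

    def fetch(self):
        # A replica pulls a copy of the current parameters.
        return list(self.params)

    def apply_update(self, delta):
        # p' = p + dp, applied element-wise.
        self.params = [p + d for p, d in zip(self.params, delta)]


def replica_step(params, shard, learning_rate=0.1):
    """Compute dp for the model y = w * x + b on one data shard (squared error)."""
    w, b = params
    grad_w = grad_b = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        grad_w += 2 * err * x / len(shard)
        grad_b += 2 * err / len(shard)
    # dp is the negative gradient scaled by the learning rate.
    return [-learning_rate * grad_w, -learning_rate * grad_b]


# Toy data from y = 2x + 1, split across two model replicas.
data = [(float(x), 2.0 * x + 1.0) for x in range(-2, 3)]
shards = [data[0::2], data[1::2]]

server = ParameterServer([0.0, 0.0])
for _ in range(200):
    for shard in shards:  # replicas take turns here; a real system runs them concurrently
        p = server.fetch()
        server.apply_update(replica_step(p, shard))

w, b = server.params
print(round(w, 3), round(b, 3))
```

In this sketch the replicas run one after another, so every update is computed from fresh parameters. With truly concurrent replicas, an update may be computed from parameters that another replica has already changed.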
Certain model structures reuse each parameter many times within each example:
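One structure with this property is a convolution, where the same small filter is applied at every position of the input. The sketch below is illustrative only: `conv1d` and the example values are assumptions, not code from the talk. It counts how often each filter weight is used for a single example.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation; returns the outputs and per-weight use counts."""
    uses = [0] * len(kernel)
    out = []
    for start in range(len(signal) - len(kernel) + 1):
        total = 0.0
        for k, weight in enumerate(kernel):
            total += weight * signal[start + k]
            uses[k] += 1  # the same weight is reused at every position
        out.append(total)
    return out, uses


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.0, -0.5]
outputs, uses = conv1d(signal, kernel)
print(outputs)  # one output per valid position
print(uses)     # how many times each of the three weights was read
```

Here three parameters are each used four times on one six-element input, and the reuse count grows with the input length.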
High-dimensional representation of a sequence: [0.1, 0.5, 1.0, 0.0, 2.4]
It works well
WMT14 BLEU: state-of-the-art 37.0, this approach 37.3
or a chatbot.
I'm fine, thank you!
or a parser.
n:(S.17 n:(S.17 n:(NP.11 p:NNP.53 n:) ...
It works well
Completely learned parser with no parsing-specific code
State-of-the-art results on the WSJ 23 parsing task
Grammar as a Foreign Language, Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav
Petrov, Ilya Sutskever, and Geoffrey Hinton (to appear in NIPS 2015)
http://arxiv.org/abs/1412.7449
input: collection of points
cat
Image Models
cat
Module with 6 separate convolutional layers
24 layers deep
Good Generalization
Sensible Errors
            Previous SOTA    ...    Human
MS COCO     N/A              67     69
FLICKR      49               63     68
...         25               59     68
...         11               27     N/A
TensorFlow:
Second Generation Deep Learning System
Motivations
DistBelief (1st system) was great for scalability
Not as flexible as we wanted for research purposes
Better understanding of problem space allowed us to
make some dramatic simplifications
Core in C++
Very low overhead
Different front ends for specifying/driving the computation
Python and C++ today, easy to add more
...
GPU
Android
iOS
...
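import tensorflow.compat.v1 as tf  # TF1-style graph API

# Assumed to be defined elsewhere (not shown on the slide): train_dataset and
# train_labels (float32, one-hot labels), num_features, num_labels, num_steps, accuracy().
graph = tf.Graph()
with graph.as_default():
  examples = tf.constant(train_dataset)
  labels = tf.constant(train_labels)
  # Variables (the initializers here are an assumption)
  W = tf.Variable(tf.truncated_normal([num_features, num_labels]))
  b = tf.Variable(tf.zeros([num_labels]))
  # Training computation
  logits = tf.matmul(examples, W) + b
  loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
  # Optimizer to use
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  # Predictions for training data
  prediction = tf.nn.softmax(logits)

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  for step in range(num_steps):
    _, l, predictions = session.run([optimizer, loss, prediction])
    if step % 100 == 0:
      print('Loss at step', step, ':', l)
      print('Training accuracy: %.1f%%' % accuracy(predictions, train_labels))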
[Figure: dataflow graph with nodes examples, weights, biases, labels and ops MatMul, Add, Relu, Xent]
with tensors
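The node names in the figure (examples, weights, biases, MatMul, Add, Relu) suggest a small layer expressed as a dataflow graph. The sketch below evaluates such a graph in plain Python. The `Node` class, the `evaluate` function, and the exact wiring are assumptions made for illustration, and the Xent node is left out to keep it short.

```python
# Tiny dataflow-graph sketch: nodes are operations, edges carry values (tensors).
# The Node class and the wiring below are illustrative assumptions.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op              # name of the operation, e.g. "MatMul"
        self.inputs = list(inputs)
        self.value = value        # only used by "Const" nodes


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    return [[x + bv for x, bv in zip(row, bias)] for row in m]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


KERNELS = {"MatMul": matmul, "Add": add_bias, "Relu": relu}


def evaluate(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    if node.op == "Const":
        return node.value
    args = [evaluate(n) for n in node.inputs]
    return KERNELS[node.op](*args)


examples = Node("Const", value=[[1.0, 2.0], [3.0, 4.0]])
weights = Node("Const", value=[[1.0, -1.0], [0.5, -0.5]])
biases = Node("Const", value=[0.5, 0.5])

hidden = Node("Relu", [Node("Add", [Node("MatMul", [examples, weights]), biases])])
print(evaluate(hidden))
```

Building the graph and running it are separate steps: nothing is computed until `evaluate` is called on the node whose value is wanted.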
with state
'Biases' is a variable
-= updates biases
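A variable differs from other nodes in that its value persists between steps and can be updated in place. The sketch below illustrates an update of the form biases -= learning_rate * gradient. The `Variable` class and `mul` helper are hypothetical stand-ins, and the gradient values are made up for the example.

```python
class Variable:
    """A graph node that keeps its value between steps (hypothetical sketch)."""

    def __init__(self, initial):
        self.value = list(initial)

    def assign_sub(self, delta):
        # In-place update: value -= delta
        self.value = [v - d for v, d in zip(self.value, delta)]
        return self.value


def mul(scalar, vector):
    return [scalar * v for v in vector]


biases = Variable([0.5, -0.25])
learning_rate = 0.5
gradient = [1.0, -2.0]  # stand-in for the output of a gradient op

biases.assign_sub(mul(learning_rate, gradient))
print(biases.value)
```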
distributed
[Figure: the graph (biases, learning rate, Add, Mul, ...) partitioned across Device A and Device B]
What is in a name?
Tensor: N-dimensional array
1 dimension: vector
2 dimensions: matrix
Represents multi-dimensional data flowing through the graph
e.g. an image represented as a 3-d tensor: rows, cols, color
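To make the terminology concrete, the sketch below represents tensors as nested Python lists and reports their rank and shape. The `shape` helper and the example values are illustrative assumptions; it expects regular, non-empty nesting.

```python
def shape(tensor):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


vector = [1.0, 2.0, 3.0]                        # 1-d tensor
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 2-d tensor
# A 2x2 RGB image: rows x cols x color channels.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    print(name, "rank", len(shape(t)), "shape", shape(t))
```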
Flexible
General computational infrastructure
Deep Learning support is a set of libraries on top of the core
Also useful for other machine learning algorithms
Possibly even for high performance computing (HPC) work
Abstracts away the underlying devices/computational hardware
Extensible
Core system defines a number of standard operations
and kernels (device-specific implementations of
operations)
Easy to define new operators and/or kernels
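The split between operations and kernels can be pictured as a lookup from an operation name and a device type to an implementation. The sketch below is a toy registry in plain Python, not TensorFlow's actual registration mechanism: `KERNEL_REGISTRY`, `register_kernel`, and `run_op` are hypothetical names.

```python
# Hypothetical registry mapping (operation name, device type) to a kernel function.
KERNEL_REGISTRY = {}


def register_kernel(op_name, device):
    """Decorator that registers a device-specific implementation of an operation."""
    def decorator(fn):
        KERNEL_REGISTRY[(op_name, device)] = fn
        return fn
    return decorator


@register_kernel("Relu", "CPU")
def relu_cpu(values):
    return [max(0.0, v) for v in values]


@register_kernel("Relu", "GPU")
def relu_gpu(values):
    # Stand-in: a real GPU kernel would launch device code instead.
    return [v if v > 0.0 else 0.0 for v in values]


# Defining a new operator is just another registration.
@register_kernel("Square", "CPU")
def square_cpu(values):
    return [v * v for v in values]


def run_op(op_name, device, values):
    """Look up the kernel for this op on this device and run it."""
    try:
        kernel = KERNEL_REGISTRY[(op_name, device)]
    except KeyError:
        raise NotImplementedError("no %s kernel for %s" % (device, op_name))
    return kernel(values)


print(run_op("Relu", "CPU", [-1.0, 2.0]))
print(run_op("Square", "CPU", [3.0]))
```

Keeping the lookup keyed on both the operation and the device is one way to let the same graph run on different hardware without changing the graph itself.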
Auto Differentiation
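As a rough illustration of automatic differentiation on a computation graph, the sketch below implements reverse-mode differentiation for scalar addition and multiplication. It is a teaching sketch under simplifying assumptions, not how TensorFlow implements gradients: the `Value` class and `backward` function are hypothetical.

```python
class Value:
    """Scalar node that records how it was computed so gradients can flow backward."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # pairs of (parent node, local derivative)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))


def backward(output):
    """Reverse-mode differentiation: visit nodes in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule


w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b   # the forward pass builds the graph
backward(y)
print(w.grad, x.grad, b.grad)
```

The user only writes the forward computation; the derivative of y with respect to each input is obtained by walking the recorded graph backward.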
[Figure: three model computation replicas, each sending its own update to the shared parameters]
Synchronous Variant
[Figure: three model computation replicas each produce a gradient; the gradients are combined by an add, and a single update is applied to the parameters]
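The synchronous variant can be sketched as follows: every replica computes a gradient from the same parameter value, the gradients are combined, and one update is applied. This is a minimal sketch with assumed details: the one-parameter model, the `gradient` and `synchronous_step` helpers, and the choice to average (add, then scale) the gradients are all illustrative.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def synchronous_step(w, shards, learning_rate):
    # Every replica computes a gradient from the same parameter value...
    grads = [gradient(w, shard) for shard in shards]
    # ...the gradients are added (here averaged), and one update is applied.
    combined = sum(grads) / len(grads)
    return w - learning_rate * combined


# Toy data from y = 3x, split across three replicas.
data = [(float(x), 3.0 * x) for x in range(1, 7)]
shards = [data[0:2], data[2:4], data[4:6]]

w = 0.0
for _ in range(100):
    w = synchronous_step(w, shards, learning_rate=0.02)
print(round(w, 4))
```

Unlike the asynchronous arrangement, no replica ever works from stale parameters here, at the cost of waiting for the slowest replica on every step.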
We're always looking for people with the potential to become excellent
machine learning researchers
The resurgence of deep learning in the last few years has caused a surge of
interest from people who want to learn more and conduct research in this area
programming experience
g.co/brainresidency
Contact us:
Questions?