Lab RNN Intro


Machine Learning

- Intro to Recurrent Neural Networks -


RNN Tasks

2/18
RNN Tasks

Vanilla RNNs

Source: CS231n Lecture 10

3/18
RNN Tasks

e.g. Image Captioning


Image → sequence of words
Source: CS231n Lecture 10

4/18
RNN Tasks

e.g. Sentiment Classification


Sequence of words → sentiment
Source: CS231n Lecture 10

5/18
RNN Tasks

e.g. Translation
Sequence of words → sequence of words
Source: CS231n Lecture 10

6/18
RNN Tasks

e.g. Video classification on frame level


Sequence of frames → label per frame
Source: CS231n Lecture 10

7/18
RNN Model

8/18
Vanilla RNN Model

x^(t): input, h^(t): hidden state, y^(t): output, with weights w_ih, w_hh, w_ho


Current state depends on current inputs and previous state

RNNs can yield outputs at each time step

h^(t) = f_whh(h^(t−1), f_wih(x^(t)))

y^(t) = f_who(h^(t)), ∀ t ∈ {1, ..., τ}
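As a rough illustration, a minimal NumPy sketch of this recurrence, assuming the common parameterization with f = tanh for the hidden update and a linear output map (the weight names mirror the diagram; the function itself is not from the slides):

```python
import numpy as np

# Sketch of the vanilla RNN recurrence above (assumed tanh hidden update,
# linear output map). Weight names mirror w_ih, w_hh, w_ho from the diagram.
def rnn_forward(xs, h0, W_ih, W_hh, W_ho, b_h, b_o):
    """xs: sequence of input vectors x^(t); h0: initial hidden state."""
    h = h0
    ys = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_ih @ x + b_h)   # h^(t) from h^(t-1) and x^(t)
        ys.append(W_ho @ h + b_o)                # y^(t) from h^(t)
    return ys, h
```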

9/18
Unfolding RNN in time

Source: NN Lectures, Tudor Berariu, 2016

10/18
Unfolding RNN in time

Source: NN Lectures, Tudor Berariu, 2016

11/18
Unfolding RNN in time

Source: NN Lectures, Tudor Berariu, 2016

12/18
Backpropagation through time

Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.

Source: CS231n Lecture 10 (Fei-Fei Li, Ranjay Krishna, Danfei Xu, April 29, 2021)
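A minimal PyTorch sketch of full-sequence BPTT, assuming a toy per-step classification setup (model, data, and sizes are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# Full BPTT sketch: run the whole sequence forward, sum the per-step losses,
# then call backward() once so gradients flow through every time step.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

x = torch.randn(2, 100, 8)               # dummy data (batch, time, features)
targets = torch.randint(0, 4, (2, 100))  # one label per time step

out, _ = rnn(x)                          # forward through the entire sequence
loss = criterion(head(out).reshape(-1, 4), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()                          # backward through the entire sequence
optimizer.step()
```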


Truncated Backpropagation through time

Run forward and backward through chunks of the sequence instead of the whole sequence.

Source: CS231n Lecture 10


Truncated Backpropagation through time

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.

Source: CS231n Lecture 10


Truncated Backpropagation through time

Source: CS231n Lecture 10


Truncated BPTT


Used in practice

Summary of the algorithm (see the sketch below):
– Present a sequence of k1 timesteps of input and output pairs to the network.
– Unroll the network, then calculate and accumulate errors across k2 timesteps.
– Roll up the network and update the weights.
– Repeat.
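A minimal PyTorch sketch of this loop, assuming a toy setup where k1 = k2 = k (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Truncated BPTT sketch: the hidden state is carried forward across chunks,
# but detached so gradients only flow back through the current k steps.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

x = torch.randn(2, 1000, 8)               # long sequence (batch, time, features)
targets = torch.randint(0, 4, (2, 1000))
k = 50                                    # truncation length
h = None

for start in range(0, x.size(1), k):
    chunk = x[:, start:start + k]
    out, h = rnn(chunk, h)                # carry the hidden state forward
    loss = criterion(head(out).reshape(-1, 4),
                     targets[:, start:start + k].reshape(-1))
    optimizer.zero_grad()
    loss.backward()                       # backprop only through this chunk
    optimizer.step()
    h = h.detach()                        # cut the graph at the chunk boundary
```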

13/18
Teacher Forcing and Warm-start


When training an RNN to generate a sequence, the predictions (outputs y^(t)) of an RNN cell are often used as the input of the cell at the next timestep

Teacher Forcing: at training time, use the targets of the sequence, instead of the RNN's predictions, as inputs to the next step


Warm-start: when using an RNN to predict the next value conditioned on previous predictions, it is sometimes necessary to give the RNN some context (known ground-truth elements) before letting it predict on its own (see the sketch below)
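A minimal sketch of both ideas using an assumed nn.RNNCell regression setup (all names, sizes, and the loss are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# Assumed toy setup: 4-dimensional inputs/outputs, 16-dimensional hidden state.
cell = nn.RNNCell(input_size=4, hidden_size=16)
head = nn.Linear(16, 4)

def train_step_teacher_forcing(first_input, targets, h):
    """Teacher forcing: at each step, feed the ground-truth target as the next input."""
    loss = 0.0
    x = first_input                        # (batch, 4)
    for t in range(len(targets)):
        h = cell(x, h)
        pred = head(h)
        loss = loss + ((pred - targets[t]) ** 2).mean()
        x = targets[t]                     # next input = ground truth, not prediction
    return loss, h

def generate(warmup, steps, h):
    """Warm-start: feed known context first, then let the RNN predict on its own."""
    for x in warmup:                       # ground-truth context elements
        h = cell(x, h)
    preds = []
    x = head(h)                            # first free-running prediction
    for _ in range(steps):
        preds.append(x)
        h = cell(x, h)                     # feed previous prediction back in
        x = head(h)
    return preds
```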

14/18
LSTM

15/18
LSTM Cell

Img source: https://medium.com/@kangeugine/


Input Gate (i in (0, 1) – sigmoid) – scales input to cell (write)

Output Gate (o in (0, 1) – sigmoid) – scales output from cell (read)

Forget Gate (f in (0, 1) – sigmoid) – scales old cell values (reset mem)

16/18
LSTM Cell - Equations

i_t = σ(θ_xi x^(t) + θ_hi h^(t−1) + b_i)

f_t = σ(θ_xf x^(t) + θ_hf h^(t−1) + b_f)

o_t = σ(θ_xo x^(t) + θ_ho h^(t−1) + b_o)

g_t = tanh(θ_xg x^(t) + θ_hg h^(t−1) + b_g)

c_t = f_t ⊙ c_(t−1) + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t), where ⊙ is elementwise multiplication
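A minimal NumPy sketch of one cell step following these equations (the parameter dictionary `theta` and its key names are an assumed layout, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, theta):
    """One LSTM cell step; theta holds the weight matrices and bias vectors."""
    i = sigmoid(theta['xi'] @ x_t + theta['hi'] @ h_prev + theta['b_i'])  # input gate
    f = sigmoid(theta['xf'] @ x_t + theta['hf'] @ h_prev + theta['b_f'])  # forget gate
    o = sigmoid(theta['xo'] @ x_t + theta['ho'] @ h_prev + theta['b_o'])  # output gate
    g = np.tanh(theta['xg'] @ x_t + theta['hg'] @ h_prev + theta['b_g'])  # candidate values
    c = f * c_prev + i * g        # elementwise: forget old memory, write new
    h = o * np.tanh(c)            # gated read of the cell state
    return h, c
```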

17/18
LSTMs in practice


Sutskever et al, Sequence to Sequence Learning with Neural Networks, NIPS 2014
– Models are huge :-)
– 4 layers, 1000 LSTM cells per layer
– Input vocabulary of 160k
– Output vocabulary of 80k
– 1000 dimensional word embeddings

18/18
