15 Unsup+gen PDF
Discriminative Models vs. Generative Models
Supervised Learning vs. Unsupervised Learning

Supervised Learning
  Goal: Learn P(y|x); classify x_new
  Method: Learn a function to map x -> y

Unsupervised Learning
  Goal: Learn P(x, y); classify x_new, generate x_new
  Method: Learn some underlying hidden structure of the data

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Supervised Learning vs. Unsupervised Learning
Discriminative Models vs. Generative Models
Discriminative vs. Generative Models

Discussion
Generative tasks

Learn to generate images from:
• Random noise: noise → G → image
• Noise and a class label: (noise, class) → G → image

Slide inspired by Svetlana Lazebnik (link); figure from: Self-Attention Generative Adversarial Networks
Generative tasks

Learn to generate images from:
• Random noise: noise → G → image
• Conditional generation (e.g., noise and a scalar/one-hot class): (noise, class) → G → image
• Image-to-image generation (conditional, without noise): image → E → f → G → image

[Diagram: an image is encoded (E) into features f and decoded back into an image; with a decoder D in place of the generator G, the pipeline image → E → f → D is an Auto-Encoder]
Autoencoder

𝑥 → E → latent features → D → 𝑥̂
(Input Data → Reconstructed Data)
Autoencoder

𝑥 → E → f → D → 𝑥̂
Loss: ‖𝑥 − 𝑥̂‖²

No labels* required!
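As a concrete toy illustration of the pipeline above, here is a minimal linear autoencoder in numpy trained with the ‖x − x̂‖² reconstruction loss and no labels; the data, layer sizes, and learning rate are all invented for this sketch, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in 8-D that actually lie in a 2-D subspace,
# so a 2-D latent bottleneck can reconstruct them well.
basis = rng.normal(size=(2, 8))
x = rng.normal(size=(100, 2)) @ basis

# Linear encoder E and decoder D (weights only; no biases, for brevity).
W_enc = rng.normal(scale=0.1, size=(8, 2))   # E: 8-D input  -> 2-D latent
W_dec = rng.normal(scale=0.1, size=(2, 8))   # D: 2-D latent -> 8-D output

def reconstruct(x):
    z = x @ W_enc             # latent features f
    return z, z @ W_dec       # reconstruction x_hat

# Train with plain gradient descent on the mean L2 reconstruction loss.
loss_history = []
lr = 0.01
for _ in range(1000):
    z, x_hat = reconstruct(x)
    loss_history.append(float(np.mean((x - x_hat) ** 2)))
    err = (x_hat - x) / len(x)            # dLoss/dx_hat (up to a constant)
    grad_dec = z.T @ err                  # gradient for the decoder weights
    grad_enc = x.T @ (err @ W_dec.T)      # gradient for the encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

The only training signal is the input itself, which is exactly why no labels are required.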
Autoencoder: Unsupervised or Generative?

Drop the encoder: feed a chosen latent code ẑ directly to the decoder.
ẑ → D → Generate New Data
Supervised Learning vs. Unsupervised Learning
Discriminative Models vs. Generative Models
Generative Models for Image Generation
Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Generative Models
Direct
• GAN
Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Pixel-RNN/CNN
• Fully visible belief network
• Explicit density model
• Use chain rule to decompose the likelihood of image x:

  P(x) = ∏_{i=1}^{n} P(x_i | x_1, x_2, …, x_{i−1})

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Pixel-RNN/CNN
With learned model parameters θ:

  P_θ(x) = ∏_{i=1}^{n} P_θ(x_i | x_1, x_2, …, x_{i−1})

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Pixel-RNN
State-to-State Component
Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
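The chain-rule factorization can be made concrete with a toy autoregressive model over a few binary pixels. The lookup-table conditionals below are invented stand-ins for the Pixel-RNN/CNN network; the point is only the structure P(x) = ∏ᵢ P(xᵢ | x₁, …, xᵢ₋₁) and pixel-by-pixel sampling.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" of 4 binary pixels. An autoregressive model assigns each
# pixel a probability conditioned on all previous pixels. Here the
# conditional model is a lookup table keyed by the prefix of pixels
# (a real Pixel-RNN/CNN would compute this with a network).
n_pixels = 4
p_on = {}  # prefix tuple -> P(x_i = 1 | prefix)
for i in range(n_pixels):
    for prefix in itertools.product([0, 1], repeat=i):
        p_on[prefix] = rng.uniform(0.1, 0.9)

def likelihood(x):
    """Chain-rule likelihood P(x) = prod_i P(x_i | x_1..x_{i-1})."""
    p = 1.0
    for i, xi in enumerate(x):
        q = p_on[tuple(x[:i])]
        p *= q if xi == 1 else (1.0 - q)
    return p

def sample():
    """Generate pixels one at a time, each conditioned on the previous ones."""
    x = []
    for _ in range(n_pixels):
        q = p_on[tuple(x)]
        x.append(int(rng.random() < q))
    return x

# The chain rule defines a valid distribution: probabilities sum to 1.
total = sum(likelihood(list(x)) for x in itertools.product([0, 1], repeat=n_pixels))
```

Sampling is inherently sequential, which is why Pixel-RNN/CNN generation is slow: each pixel must wait for all previous ones.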
Generative Adversarial Networks

z → G → G(z) (generated image); D(G(z)) → Fake
x_data → D; D(x_data) → Real

GAN: Goodfellow et al., NIPS 2014; Slide inspired by Svetlana Lazebnik (link)
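A rough numpy sketch of the two objectives in the diagram, using a hypothetical 1-D "image" distribution and a simple logistic discriminator (not the paper's architecture): D is trained to score x_data as real and G(z) as fake, while G uses the non-saturating loss −log D(G(z)).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical 1-D setup: D is a logistic score, G shifts noise by theta.
def D(x, w, b):
    return sigmoid(w * x + b)

def G(z, theta):
    return z + theta

rng = np.random.default_rng(0)
z = rng.normal(size=1000)                 # noise samples
x_data = rng.normal(loc=3.0, size=1000)   # "real" data, centered at 3

w, b, theta = 1.0, 0.0, 0.0               # arbitrary initial parameters

# Discriminator loss: -E[log D(x_data)] - E[log(1 - D(G(z)))]
d_loss = (-np.mean(np.log(D(x_data, w, b)))
          - np.mean(np.log(1.0 - D(G(z, theta), w, b))))

# Generator loss (non-saturating form): -E[log D(G(z))]
g_loss = -np.mean(np.log(D(G(z, theta), w, b)))
```

Training alternates gradient steps on these two losses; the non-saturating generator loss is preferred over the minimax form log(1 − D(G(z))) because it gives stronger gradients when the generator is poor.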
Conditional Generative Adversarial Networks

(z, class) → G → G(z); D(G(z)) → Fake
x_data → D; D(x_data) → Real
Image-to-Image Conditional GANs

[Diagram: input image → E → G → generated image G(z) → D → fake; real target image → D → real]

Examples
pix2pix
Contributions:
• Add an L1 loss to the objective
• U-Net generator
• PatchGAN discriminator
Isola et al., Image-to-Image Translation with Conditional Adversarial Nets, CVPR 2017
pix2pix – L1 Loss
• Compared to L2, L1 loss results in less blurring*
pix2pix – U-Net Encoder
pix2pix – PatchGAN Discriminator
• A traditional discriminator outputs a single value representing the real/fake probability of the whole image.
• PatchGAN outputs N×N values, one real/fake probability per patch of the image.
• The final score is the mean of the N×N patch probabilities.
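A minimal sketch of that scoring scheme, with a random array standing in for the discriminator's N×N output map (N = 30 is an arbitrary choice for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PatchGAN output: instead of one real/fake probability for
# the whole image, the discriminator emits an N x N grid of probabilities,
# one per (overlapping) image patch.
N = 30
patch_probs = rng.uniform(0.0, 1.0, size=(N, N))  # stand-in for D's output map

# Final image-level score: the mean of the N*N per-patch probabilities.
patchgan_score = float(patch_probs.mean())
```

Because each output unit only sees a local patch, the discriminator penalizes high-frequency structure locally, complementing the L1 term that handles low-frequency correctness.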
pix2pix Results
CycleGAN
pix2pix – Paired image-to-image translation model
CycleGAN – Unpaired image-to-image translation model
Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017
CycleGAN
• Adversarial Loss: L_GAN(G, D_Y, X, Y) = E_y[log D_Y(y)] + E_x[log(1 − D_Y(G(x)))]  (and likewise for F, D_X)
• Cycle Loss: L_cyc(G, F) = E_x[‖F(G(x)) − x‖₁] + E_y[‖G(F(y)) − y‖₁]
• CycleGAN Loss: L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F)
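The cycle-consistency term can be sketched with hypothetical 1-D "generators" G and F; here F is deliberately a slightly wrong inverse of G so that the L1 cycle penalty is non-zero (all functions and constants are made up for illustration).

```python
import numpy as np

# Toy domains X and Y, related by an affine map.
def G(x):   # hypothetical generator X -> Y
    return 2.0 * x + 1.0

def F(y):   # hypothetical generator Y -> X; off by +0.05, so cycles don't close
    return (y - 1.0) / 2.0 + 0.05

rng = np.random.default_rng(0)
x = rng.normal(size=100)             # samples from domain X
y = rng.normal(loc=1.0, size=100)    # samples from domain Y

# Cycle-consistency loss (L1): E[|F(G(x)) - x|] + E[|G(F(y)) - y|]
cycle_loss = float(np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y)))
```

Without paired data, adversarial losses alone would let G map an input to any plausible member of the target domain; the cycle loss pins down the mapping by requiring F(G(x)) ≈ x and G(F(y)) ≈ y.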
CycleGAN Results
CycleGAN → StarGAN
CycleGAN – transfer between 2 domains
StarGAN – transfer between multiple domains
Choi et al., StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
GAN Training Problems
• Instability
• Difficult to keep G and D in sync
• Mode Collapse
Figure from: Goodfellow’s Tutorial on GANs; Metz et al. & Hung-yi Lee & Arjovsky et al.
Heuristic Solutions
• GAN Hacks (https://github.com/soumith/ganhacks)
• GAN Tutorial
• Improved Techniques for Training GANs
https://github.com/hindupuravinash/the-gan-zoo
E.g., LSGANs
BigGAN, 2018: High-fidelity samples (interpolation)
How to evaluate GANs?
• Showing pictures of samples is not enough, especially for simpler datasets like MNIST, CIFAR, faces, bedrooms, etc.
• We cannot directly compute the likelihoods of high-dimensional samples (real or generated), or compare their distributions.
• Many GAN approaches claim mainly to improve stability, which is hard to evaluate.
• For discussion, see Ian Goodfellow’s Twitter thread.
“We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes … We did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in Goodfellow et al. (2014)”
Slide inspired by Svetlana Lazebnik (link)
M. Lucic, K. Kurach, M. Michalski, O. Bousquet, S. Gelly, Are GANs created equal? A large-scale study, NIPS 2018
Variational Auto-encoders (VAEs)
• Probabilistic twist on standard auto-encoders
• Recall: how can we generate new data using auto-encoders?

Training: 𝑥 → E → z → D → 𝑥̂
Generation: replace z with a sampled ẑ, then ẑ → D → 𝑥̂
Generating Samples using VAEs

𝑧 → D → 𝑥

P(x) = ∫ P(x|z) P(z) dz
Training a VAE Generator

Data likelihood: P(x) = ∫ P(x|z) P(z) dz

Why is this integral hard to work with?

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
Training a VAE Generator

Data likelihood: P(x) = ∫ P(x|z) P(z) dz

Approximate with samples of z during training: P(x) ≈ (1/n) Σ_{i=1}^{n} P(x|z_i)

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
Training a VAE Generator

Data likelihood: P(x) = ∫ P(x|z) P(z) dz
Approximate with samples of z during training: P(x) ≈ (1/n) Σ_{i=1}^{n} P(x|z_i)

• Need a lot of samples of z
• Most of the P(x|z) ≈ 0
• Can we learn which z will give P(x|z) ≫ 0?

The posterior density is also intractable: P(z|x) = P(x|z) P(z) / P(x)

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
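A toy numpy experiment showing why the naive sample-based estimate is wasteful: with a made-up model where P(z) = N(0, 1) and P(x|z) = N(x; z, 0.1²), almost all prior samples contribute essentially nothing to the Monte Carlo sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: prior z ~ N(0,1); likelihood P(x|z) = N(x; z, 0.1^2).
def p_x_given_z(x, z, sigma=0.1):
    return np.exp(-0.5 * ((x - z) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = 2.0        # a data point we want the likelihood of
n = 10_000
z = rng.normal(size=n)                     # samples from the prior P(z)

# Naive Monte Carlo: P(x) ~= (1/n) * sum_i P(x | z_i)
p_x_estimate = float(np.mean(p_x_given_z(x, z)))

# Fraction of samples that contribute meaningfully: P(x|z_i) is ~0
# unless z_i happens to land near x = 2.
useful = float(np.mean(p_x_given_z(x, z) > 1e-3))
```

Only the rare prior samples near x contribute, which is exactly the motivation for learning Q(z|x): a distribution that proposes the z values for which P(x|z) is large.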
Variational Auto-encoder (VAE)

We want, but it is impractical:
P(x) = E_{z∼P(z)}[P(x|z)] ≈ (1/n) Σ_{i=1}^{n} P(x|z_i)

Idea: instead of sampling z from the prior P(z), sample from a learned distribution Q(z) that puts its mass where P(x|z) is large.

Questions:
• How can we learn such a Q(z)?
• How are P(x) (i.e., E_{z∼P(z)}) and E_{z∼Q(z)} related?

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
Variational Auto-encoder (VAE)

𝑥 → E_φ → Q_φ(z|x)
𝑧 → D_θ → P_θ(x|z)

If Q_φ and P_θ are diagonal Gaussian distributions:

𝑥 → E_φ → μ_{z|x}, Σ_{z|x} (mean and diagonal covariance of Q_φ(z|x)); sample z from z|x ∼ N(μ_{z|x}, Σ_{z|x})
𝑧 → D_θ → μ_{x|z}, Σ_{x|z} (mean and diagonal covariance of P_θ(x|z)); sample x from x|z ∼ N(μ_{x|z}, Σ_{x|z})

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
VAE: Relating P(x) and E_{z∼Q(z|x)}

P(x) = E_{z∼P(z)}[P(x|z)] ≈ (1/n) Σ_{i=1}^{n} P(x|z_i)

Definition of KL-divergence:
D_KL(Q(z|x) ∥ P(z|x)) = E_{z∼Q(z|x)}[log Q(z|x) − log P(z|x)]

= E_{z∼Q(z|x)}[log Q(z|x) − log (P(x|z) P(z) / P(x))]   (Bayes rule)

= E_{z∼Q(z|x)}[log Q(z|x) − log P(x|z) − log P(z)] + log P(x)   (P(x) is independent of z)

Re-arranging:
log P(x) − D_KL(Q(z|x) ∥ P(z|x)) = E_{z∼Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))
VAE: Putting everything together

Maximize the tractable right-hand side (the ELBO):
E_{z∼Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

For Gaussian Q, the second term is D_KL(N(μ_{z|x}, Σ_{z|x}) ∥ N(0, I)), which has a closed form.

𝑥 → E → μ_{z|x}, Σ_{z|x}; sample z from z|x ∼ N(μ_{z|x}, Σ_{z|x})
𝑧 → D → μ_{x|z}, Σ_{x|z}; sample 𝑥̂ from x|z ∼ N(μ_{x|z}, Σ_{x|z})
Reconstruction loss: ‖𝑥 − 𝑥̂‖²
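For a diagonal Gaussian Q(z|x) = N(μ, diag(σ²)) and prior N(0, I), the KL term has the well-known closed form 0.5 Σ(σ² + μ² − 1 − log σ²). A small numpy sketch (the example μ and log σ² values are arbitrary):

```python
import numpy as np

# Closed-form KL between a diagonal Gaussian N(mu, diag(sigma^2)) and N(0, I):
#   D_KL = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
def kl_to_standard_normal(mu, log_var):
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

mu = np.array([0.5, -0.3])
log_var = np.array([0.0, 0.2])   # sigma^2 = 1.0 and ~1.22

kl = kl_to_standard_normal(mu, log_var)

# Sanity check: KL(N(0,I) || N(0,I)) = 0.
zero = kl_to_standard_normal(np.zeros(2), np.zeros(2))
```

This is why the encoder is usually trained to output log σ² directly: the KL term and the reparameterized sample both become simple differentiable expressions.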
Modeling P(x|z)

Let f(z) be the network output.
- Assume P(x|z) to be i.i.d. Gaussian
- 𝑥̂ = f(z) + η, where η ∼ N(0,1) (recall: Linear Regression)
- Simplifies to an L2 loss: ‖𝑥 − f(z)‖²

Also, approximate E_{z∼Q(z|x)}[log P(x|z)] with ‖𝑥 − f(z)‖² for a single z.
Why is this reasonable?

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link).
Details of the second approximation in Tutorial on VAE.
VAE: Putting everything together

Loss: E_{z∼Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z)), with KL term D_KL(N(μ_{z|x}, Σ_{z|x}) ∥ N(0, I))

𝑥 → E → μ_{z|x}, Σ_{z|x}; sample z from z|x ∼ N(μ_{z|x}, Σ_{z|x})
𝑧 → D → f(z); reconstruction term ‖𝑥 − f(z)‖²
Re-parameterization Trick

Sampling z ∼ N(μ, σ²) is equivalent to:
• z = μ + σ ⋅ ε, where ε ∼ N(0,1)
• Now we can easily backpropagate the loss to the Encoder.

Slides from: Fei-Fei Li & Justin Johnson & Serena Yeung (link)
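A numpy sketch of the trick: the sample z = μ + σ·ε has the right distribution, and given ε it is a deterministic (hence differentiable) function of μ and σ. The values of μ and σ here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.5, 0.5

# Direct sampling z ~ N(mu, sigma^2) is a stochastic node: gradients
# cannot flow through it to mu and sigma.
# Re-parameterized: z = mu + sigma * eps, with eps ~ N(0,1).
# The randomness now lives entirely in eps.
eps = rng.normal(size=100_000)
z = mu + sigma * eps

# Given eps, z is deterministic in (mu, sigma):
#   dz/dmu = 1 and dz/dsigma = eps, for every sample.
grad_mu = np.ones_like(eps)
grad_sigma = eps
```

The empirical mean and standard deviation of z match μ and σ, confirming the two sampling procedures are equivalent in distribution.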
Application: Expression Magnification and Suppression
Slide inspired by Svetlana Lazebnik (link). Bottom diagram from Larsen et al.
Application: Fader Networks
L-2 Loss**
*see reddit discussion
**see blog
KL Divergence

From: Ian Goodfellow, Tutorial on GANs, 2017

Discriminative Models vs. Generative Models
What generative models haven’t we covered?

Videos
• Future generation
• Future prediction
• Future action prediction
• Future pose prediction
• Future pose generation
• Etc.
Text Generation Models
source
Text Generation Models

Three solutions (to the non-differentiable sampling of discrete tokens):
• Gumbel-Softmax (continuous approximation of Softmax)
• Work with continuous spaces
• Reinforcement learning (e.g., REINFORCE)

source
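The first solution can be sketched in a few lines of numpy (the logits and temperatures are made up): adding Gumbel noise to the logits and applying a temperature-controlled softmax gives a continuous, differentiable relaxation of drawing a discrete token.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature):
    """Continuous relaxation of sampling from a categorical distribution."""
    # Gumbel(0,1) noise via the inverse-CDF trick: g = -log(-log(U)).
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / temperature
    y = np.exp(y - y.max())        # softmax, numerically stabilized
    return y / y.sum()

logits = np.array([2.0, 0.5, -1.0])   # unnormalized token scores

soft = gumbel_softmax(logits, temperature=5.0)   # high temperature: smooth
hard = gumbel_softmax(logits, temperature=0.1)   # low temperature: close to one-hot
```

As the temperature goes to 0 the output approaches a one-hot sample from the categorical distribution, while at higher temperatures it stays smooth enough for gradients to be informative.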
Text Generation Models – Examples
source
Text to Image – Examples
source
Text to Video – Examples
source
Text Generation Models – Good References
• GAN for text generation – Part I
• GAN for text generation – Part II: RL (part III & IV coming shortly)
• OpenAI GPT-2 model
• Generating Natural-language text with Neural Networks
• Generative Model for text: An overview of recent advancements
GAN and VAE references
• Why it is so hard to train GANs!
• VAE explanations (link, link, link, link, link, link, tutorial)
Supervised Learning vs. Unsupervised Learning
Discriminative Models vs. Generative Models
Generative and Unsupervised Gap
[Figure: example images of a cat and a dog]
Self-supervised Learning – teaser
Proxy Tasks for “Self”- supervised learning
source
Back to image generative models
source
Conclusion
source