BTP Presentation on Text-to-Image Synthesis


INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

Generative Adversarial Text to Image Synthesis

Amit Manchanda (14116013)
Anshul Jain (14116016)

Under the guidance of


Dr. Vinod Pankajakshan
(Assistant Professor, ECE, IIT Roorkee)
Content
➢ Objective
➢ Background
○ GANs
○ Text Embeddings
➢ Methodology and Results
○ Vanilla GANs
○ WGANs
○ Attention GANs
➢ Future Scope of Research
➢ Conclusion
Objective

Translating single-sentence, human-written text descriptions directly into images.
Background

Background: GANs

Source : http://slazebni.cs.illinois.edu/spring17/lec11_gan.pdf
Background: GANs (continued)

Source : http://slazebni.cs.illinois.edu/spring17/lec11_gan.pdf
Background: GANs (continued)

Source : http://slazebni.cs.illinois.edu/spring17/lec11_gan.pdf
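The three slides above reproduce figures from the linked lecture notes; for reference, the standard GAN minimax objective they build on, from [5], is

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

where the discriminator $D$ learns to separate real samples from generated ones and the generator $G$ is trained to fool it.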
Background: Text Embedding

Recurrent Neural Network

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Background: Text Embedding (continued)

Long Short Term Memory Network

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Background: Text Embedding (continued)

● Skip-thought vectors:
An encoder-decoder model in which the encoder maps a given sentence to a vector and the decoder generates the surrounding sentences from it.

● The following objective function is optimized:
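The equation appears as an image in the original slides; as recalled from the skip-thoughts paper (Kiros et al., 2015), the objective sums the log-likelihoods of the next and previous sentences given the encoding $h_i$ of sentence $s_i$:

$$\sum_{t} \log P\big(w_{i+1}^{t} \mid w_{i+1}^{<t}, h_i\big) \;+\; \sum_{t} \log P\big(w_{i-1}^{t} \mid w_{i-1}^{<t}, h_i\big),$$

where $w_{i\pm 1}^{t}$ denotes the $t$-th word of the next (previous) sentence.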

Datasets

Datasets

We used the Caltech-UCSD Birds (CUB) dataset and the Oxford-102 Flowers dataset.
● The CUB dataset contains 11,788 bird images across 200 categories.
● The Oxford-102 dataset contains 8,189 images from 102 flower categories.

Methodology and Results

Vanilla GANs

Source: [1]
Vanilla GANs (continued)

Source: [1]
Vanilla GANs (continued)

Source: [1]
Vanilla GANs (continued)

Source: [1]
Vanilla GANs (continued)

[Result figures]
WGANs

● Minimizes the distance between the real distribution and the model distribution.
● Uses the Earth-Mover (Wasserstein) distance.
● We want to model a distribution $P_\theta$ as a generator network $g_\theta$ dependent on the parameter $\theta$. By the Kantorovich-Rubinstein duality used in [2], the objective is

$$W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim P_r}\big[f(x)\big] - \mathbb{E}_{x \sim P_\theta}\big[f(x)\big],$$

where $f$ is the critic (a 1-Lipschitz function).
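A minimal PyTorch sketch of the WGAN training procedure from [2] (RMSprop, several critic updates per generator update, weight clipping to keep the critic approximately 1-Lipschitz). The network shapes and the Gaussian stand-in data are illustrative assumptions, not the text-conditional networks used in this project.

```python
import torch
import torch.nn as nn

# Illustrative shapes only: 100-d noise, 64-d samples.
G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 64))  # generator g_theta
f = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))    # critic f

opt_f = torch.optim.RMSprop(f.parameters(), lr=5e-5)
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)

def real_batch(n=32):
    return torch.randn(n, 64) + 2.0  # stand-in for samples from P_r

for step in range(1000):
    for _ in range(5):  # n_critic updates per generator update
        x, z = real_batch(), torch.randn(32, 100)
        fake = G(z).detach()
        # Critic maximizes E[f(x)] - E[f(g(z))]; minimize the negative.
        loss_f = f(fake).mean() - f(x).mean()
        opt_f.zero_grad(); loss_f.backward(); opt_f.step()
        for p in f.parameters():  # weight clipping enforces the Lipschitz constraint
            p.data.clamp_(-0.01, 0.01)
    z = torch.randn(32, 100)
    loss_G = -f(G(z)).mean()  # generator maximizes the critic's score on fakes
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```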

WGANs (continued)

[Training curves: Wasserstein loss and generator loss]

WGANs (continued)

[Result figures]
Attention GANs

StackGAN: a multi-stage generation process.

Source: [3]
Attention GANs: Attention Mechanism
● Motivated by the human tendency to focus on certain words.
● The model takes n inputs along with a context vector and returns a weighted sum of the inputs.
● This focuses the model on the contextual information (see the sketch below).

[Figure: attention model]
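A minimal NumPy sketch of the weighted-sum attention described above. The dot-product scoring and the dimensions are illustrative assumptions; the actual model learns its scoring function.

```python
import numpy as np

def attention(inputs, context):
    """Weight each of the n inputs by its relevance to the context,
    then return the weighted sum (and the weights themselves)."""
    scores = inputs @ context                # (n,) relevance scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ inputs, weights

# Toy usage: 4 input vectors of dimension 3, one context vector.
inputs = np.random.randn(4, 3)
context = np.random.randn(3)
summary, weights = attention(inputs, context)  # weights sum to 1
```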
Attention GANs (continued)

Deep Attentional Multimodal Similarity Model (DAMSM)

● Text encoder:
○ Uses a bi-directional LSTM to extract word feature vectors.
○ The global sentence vector is taken from the last hidden state.
● Image encoder:
○ Uses part of an Inception-v3 network trained on ImageNet.
○ The global image feature vector is taken from the last pooling layer.
● The DAMSM loss is computed to measure the similarity between an image and a sentence (see the sketch below).
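A minimal sketch of the sentence-level half of this similarity computation, assuming the global feature vectors described above have already been projected to a common dimension. Full DAMSM also computes a word-level score using attention over image regions, which is omitted here.

```python
import numpy as np

def cosine_similarity(a, b):
    # Relevance score R(image, sentence) between global feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Hypothetical global features: sentence vector from the bi-LSTM's last
# state, image vector from Inception-v3's last pooling layer, both
# projected to a common 256-d space.
sentence_vec = np.random.randn(256)
image_vec = np.random.randn(256)

score = cosine_similarity(image_vec, sentence_vec)  # higher = better match
```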

Attention GANs (continued)

Attention Generative Network

● The model has m generator-discriminator pairs.
● Each generator takes the hidden state h_i as input and produces an intermediate image.
● The hidden states are defined by the recurrence shown below.
● The word-context vectors from the attention mechanism are used to generate the image at the next stage.
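The definitions appear as an image in the original slides; as given in [4], the hidden states and stage images are

$$h_0 = F_0\big(z, F^{ca}(\bar{e})\big), \qquad h_i = F_i\big(h_{i-1}, F_i^{attn}(e, h_{i-1})\big), \qquad \hat{x}_i = G_i(h_i),$$

where $z$ is a noise vector, $\bar{e}$ the global sentence vector, $e$ the matrix of word features, $F^{ca}$ the conditioning augmentation network, and $F_i^{attn}$ the attention model at stage $i$.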

Attention GANs (continued)

Source: [4]
Attention GANs (continued)

[Discriminator loss and generator loss]
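The loss equations appear as images in the original slides; as given in [4], each stage's adversarial losses combine an unconditional and a text-conditional term:

$$\mathcal{L}_{G_i} = -\tfrac{1}{2}\,\mathbb{E}_{\hat{x}_i \sim p_{G_i}}\big[\log D_i(\hat{x}_i)\big] - \tfrac{1}{2}\,\mathbb{E}_{\hat{x}_i \sim p_{G_i}}\big[\log D_i(\hat{x}_i, \bar{e})\big],$$

$$\mathcal{L}_{D_i} = -\tfrac{1}{2}\,\mathbb{E}_{x_i \sim p_{data}}\big[\log D_i(x_i)\big] - \tfrac{1}{2}\,\mathbb{E}_{\hat{x}_i \sim p_{G_i}}\big[\log(1 - D_i(\hat{x}_i))\big] - \tfrac{1}{2}\,\mathbb{E}_{x_i \sim p_{data}}\big[\log D_i(x_i, \bar{e})\big] - \tfrac{1}{2}\,\mathbb{E}_{\hat{x}_i \sim p_{G_i}}\big[\log(1 - D_i(\hat{x}_i, \bar{e}))\big],$$

and the total generator loss adds the DAMSM term: $\mathcal{L} = \sum_i \mathcal{L}_{G_i} + \lambda\,\mathcal{L}_{DAMSM}$.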

Attention GANs (continued)

[Result figures]
Future Scope of Research

● Divide the image generation process into individual object generation.
● Combine WGAN with the attention mechanism.
● Train on the MS-COCO dataset to produce more generalized images.
● Object-oriented learning.

Conclusions

● Successfully implemented a model for synthesizing images from text descriptions.
● Generated 256 × 256 images of photorealistic quality.
● Implemented the image-word loss, DAMSM, used for training the model.
● Explored a conditional WGAN.

References
1. S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee,
“Generative adversarial text to image synthesis,” in Proceedings of
the 33rd International Conference on Machine Learning - Volume 48,
ICML’16, pp. 1060–1069, JMLR.org, 2016.
2. M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative
adversarial networks,” in Proceedings of the 34th International
Conference on Machine Learning, vol. 70 of Proceedings of Machine
Learning Research, pp. 214–223, PMLR, 06–11 Aug 2017.
3. H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. N.
Metaxas, “StackGAN: Text to photo-realistic image synthesis with
stacked generative adversarial networks,” in ICCV, 2017.
4. T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He,
“AttnGAN: Fine-grained text to image generation with attentional
generative adversarial networks,” CoRR, vol. abs/1711.10485, 2017.
5. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial
nets,” in NIPS, 2014.
Thank You
