BTP Presentation On Text To Image Synthesis
Background
Background: GANs
Source : http://slazebni.cs.illinois.edu/spring17/lec11_gan.pdf
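As background, the objective introduced in [5] can be written as a minimax game between the generator G and the discriminator D, where D is trained to tell real samples from generated ones while G is trained to fool D:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] \;+\; \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```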
Background: Text Embedding
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Background: Text Embedding (continued)
● Skip-thought Vectors: an encoder-decoder model that generates the surrounding sentences from a given sentence.
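A minimal sketch of this idea, assuming a GRU encoder and two GRU decoders (the layer sizes and class names below are illustrative, not the original implementation):

```python
# Sketch of skip-thought: encode the current sentence into a vector, then train
# two decoders to generate the previous and the next sentence from that vector.
import torch
import torch.nn as nn

class SkipThoughtSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # One decoder for the previous sentence, one for the next sentence.
        self.dec_prev = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.dec_next = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, cur, prev, nxt):
        # The final hidden state of the encoder is the sentence embedding.
        _, h = self.encoder(self.embed(cur))
        # Both decoders are initialized with the sentence vector (teacher forcing).
        logits_prev = self.out(self.dec_prev(self.embed(prev), h)[0])
        logits_next = self.out(self.dec_next(self.embed(nxt), h)[0])
        return h.squeeze(0), logits_prev, logits_next
```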
Datasets
Methodology and Results
Vanilla GANs
Source: [1]
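A minimal sketch of the conditioning scheme described in [1]; the layer sizes below are illustrative assumptions. The text embedding is compressed and concatenated with the noise vector in the generator, and tiled over the spatial feature map in the discriminator:

```python
# Sketch of a text-conditioned GAN in the style of [1] (assumed sizes, 32x32 output).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, txt_dim=1024, txt_proj=128):
        super().__init__()
        self.project_txt = nn.Sequential(nn.Linear(txt_dim, txt_proj), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + txt_proj, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 32x32 RGB image
        )

    def forward(self, z, txt_embedding):
        t = self.project_txt(txt_embedding)
        x = torch.cat([z, t], dim=1).unsqueeze(-1).unsqueeze(-1)  # (B, z+t, 1, 1)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, txt_dim=1024, txt_proj=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
        )  # 32x32 image -> 4x4 feature map
        self.project_txt = nn.Sequential(nn.Linear(txt_dim, txt_proj), nn.LeakyReLU(0.2))
        self.classify = nn.Conv2d(256 + txt_proj, 1, 4)  # real/fake score

    def forward(self, img, txt_embedding):
        f = self.features(img)                                  # (B, 256, 4, 4)
        t = self.project_txt(txt_embedding)[:, :, None, None]   # (B, 128, 1, 1)
        t = t.expand(-1, -1, f.size(2), f.size(3))              # tile over the 4x4 grid
        return self.classify(torch.cat([f, t], dim=1)).view(-1)
```

In [1], the discriminator is additionally shown real images paired with mismatched text as a further kind of negative example (the matching-aware GAN-CLS trick).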
WGANs
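A minimal sketch of one WGAN training step in the spirit of [2]; the critic, generator, and optimizer objects are assumed, and the clipping constant and five critic updates per generator update follow the defaults reported in the paper:

```python
# Sketch of a WGAN update: train the critic several steps with weight clipping,
# then take one generator step against the critic's score.
import torch

def wgan_step(critic, generator, opt_c, opt_g, real, z_dim=100, clip=0.01, n_critic=5):
    for _ in range(n_critic):
        z = torch.randn(real.size(0), z_dim)
        fake = generator(z).detach()
        # Critic maximizes E[D(real)] - E[D(fake)], i.e. minimizes the negative.
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        # Weight clipping keeps the critic (approximately) 1-Lipschitz.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)
    z = torch.randn(real.size(0), z_dim)
    # Generator minimizes -E[D(G(z))].
    loss_g = -critic(generator(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_c.item(), loss_g.item()
```

The critic estimates the Wasserstein distance between the real and generated distributions, which [2] reports leads to more stable training than the standard GAN loss.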
Attention GANs
Source: [3]
Attention GANs: Attention Mechanism
● Motivated by the human tendency to focus on certain words.
● The model takes n inputs along with a context vector and returns a weighted sum of the inputs.
● The attention weights emphasize the inputs that are most relevant to the context.
[Figure: Attention Model]
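A minimal sketch of this weighted-sum view of attention; dot-product scoring is an illustrative choice rather than the model's exact formulation:

```python
# Score each input against a context vector, softmax the scores,
# and return the weighted sum of the inputs.
import torch
import torch.nn.functional as F

def attend(inputs, context):
    """inputs: (n, d) feature vectors; context: (d,) context/query vector."""
    scores = inputs @ context              # (n,) similarity of each input to the context
    weights = F.softmax(scores, dim=0)     # (n,) attention weights summing to 1
    return weights @ inputs, weights       # weighted sum (d,) and the weights
```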
Attention GANs (continued)
● Text Encoder:
○ uses a bi-directional LSTM to extract word feature vectors.
○ The global sentence vector is taken from the last hidden state.
● Image Encoder:
○ uses part of Inception-v3 pre-trained on ImageNet.
○ The global image feature vector is taken from the last pooling layer.
● The DAMSM loss is computed to measure the similarity between the image and the sentence.
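A minimal sketch of the text-encoder side, with assumed sizes: a bi-directional LSTM produces per-word feature vectors, and the final hidden states of the two directions are concatenated into the global sentence vector. The DAMSM loss then compares these text features against the Inception-v3 image features.

```python
# Sketch of a bi-LSTM text encoder returning word features and a sentence vector.
import torch
import torch.nn as nn

class TextEncoderSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, feat_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, feat_dim // 2, batch_first=True, bidirectional=True)

    def forward(self, tokens):                            # tokens: (B, T) word indices
        out, (h, _) = self.lstm(self.embed(tokens))
        word_features = out                               # (B, T, feat_dim) per-word vectors
        sentence_vector = torch.cat([h[0], h[1]], dim=1)  # (B, feat_dim) global vector
        return word_features, sentence_vector
```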
Attention GANs (continued)
Source: [4]
Future Scope of Research
Conclusions
References
1. S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text to image synthesis," in Proceedings of the 33rd International Conference on Machine Learning (ICML'16), vol. 48, pp. 1060–1069, JMLR.org, 2016.
2. M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proceedings of the 34th International Conference on Machine Learning, vol. 70 of Proceedings of Machine Learning Research, pp. 214–223, PMLR, 2017.
3. H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. N. Metaxas, "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," in ICCV, 2017.
4. T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks," CoRR, vol. abs/1711.10485, 2017.
5. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative adversarial nets," in NIPS, 2014.
Thank You