All Questions
Tagged with variational or variational-inference
104 questions
2 votes · 1 answer · 45 views
Posterior estimation using VAE
Using normalizing flows, we can model a model's posterior $p(\theta|D)$ by feeding Gaussian noise $z$ to the NF (parametrized with $\phi$), using the output of the NF, $\theta$, as model parameters, and ...
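A minimal sketch of the setup this question describes, under simplifying assumptions: the "flow" below is a single affine map rather than a real normalizing flow, and the names (`AffineFlow`, `log_det`) are purely illustrative.

```python
import torch

# Toy "flow": a single affine map from base noise z ~ N(0, I) to model parameters theta.
# A real normalizing flow would stack several invertible layers; this is only a sketch.
class AffineFlow(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.shift = torch.nn.Parameter(torch.zeros(dim))
        self.log_scale = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, z):
        theta = self.shift + torch.exp(self.log_scale) * z
        log_det = self.log_scale.sum()      # log|det d(theta)/d(z)| of the affine map
        return theta, log_det

dim = 3                                     # dimension of the model parameters theta
flow = AffineFlow(dim)                      # the flow's parameters play the role of phi
z = torch.randn(16, dim)                    # Gaussian noise fed to the flow
theta, log_det = flow(z)                    # theta is then used as the model's parameters
# Density of the induced variational posterior q_phi(theta), by change of variables:
base = torch.distributions.Normal(0.0, 1.0)
log_q_theta = base.log_prob(z).sum(dim=-1) - log_det
```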
2 votes · 2 answers · 47 views
VAEs - Two questions regarding the posterior and prior distribution derivations
I'm struggling to understand the first step in the ELBO derivation for VAEs.
When asking my questions I'll also try to clearly state my assumptions, since perhaps some of them are wrong to begin with:
...
1 vote · 0 answers · 59 views
How to speed up the following ELBO evaluation?
I have an estimation problem where I need to maximize the evidence lower bound:
$$ \mathrm{ELBO} = -\frac{1}{2} \Bigg( \mathbb{E}_{q(\theta)} \left[ \mathrm{vec}(\mathbf{Z})^{\mathrm{H}} \mathbf{C}^{-...
0 votes · 0 answers · 34 views
Is the inferential challenge of dense Bayesian or Markov networks solved given current improvements in variational inference and neural networks?
I am trying to understand more about graphical models, and have a reasonable grasp of the basics now.
One issue that recurs in a lot of the papers of the mid-2000s and even in Koller's textbook is ...
1 vote · 0 answers · 28 views
Why do we need to marginalize when finding p(data) when latent variables are involved? (part of ELBO derivation)
I'm so confused with the derivation of the ELBO. In part of the derivation, p(data) is intractable as it involves an integral over a high-dimensional latent variable. I can't understand why the latent ...
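The step in question is the marginalization of the joint over the latent variable; written out for a latent-variable model it reads

$$
p(x) \;=\; \int p(x, z)\,dz \;=\; \int p(x \mid z)\,p(z)\,dz,
$$

and when $z$ is high-dimensional this integral generally has no closed form, which is what makes $p(\text{data})$ intractable.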
3 votes · 1 answer · 54 views
When deriving the ELBO to solve variational inference problems, why do we know p(z) and p(x,z) but not p(x) and p(z|x)?
I am a bit lost with the derivation of the ELBO because I don't understand why some distributions are known and some are unknown.
I guess we know p(z) (the prior) because it was the last value of q(z) ...
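For orientation, the usual accounting is that the pieces we specified ourselves when writing down the model are "known", while anything that requires integrating over the latent variable is not:

$$
p(x, z) = p(x \mid z)\,p(z), \qquad
p(x) = \int p(x, z)\,dz, \qquad
p(z \mid x) = \frac{p(x, z)}{p(x)},
$$

so $p(x)$ and $p(z \mid x)$ are "unknown" only in the sense that evaluating them requires the intractable integral over $z$.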
1 vote · 0 answers · 40 views
ELBO & "backwards" KL divergence argument order
On Wikipedia it says: "A simple interpretation of the KL divergence of P from Q [i.e. D_KL(P||Q)] is the expected excess surprise from using Q as a model instead of P when the actual distribution ...
0 votes · 0 answers · 11 views
Setting inducing points to non-trainable in Gaussian Process regression
I notice that in the GPflow tutorials for Stochastic Variational Inference they choose a certain number of inducing points, and after that they make them non-trainable.
Here they set it to not ...
0 votes · 1 answer · 291 views
Exploring VAE latent space
I recently trained an AE and a VAE and used the latent variables of each for a clustering task. It seemed to work well, with sensible clusters. The main reason for training the VAE was to gain more ...
2 votes · 1 answer · 116 views
Why is sampling from the posterior a good estimate for the likelihood, but sampling from the prior bad?
In Variational Autoencoders (VAE), we have:
$$
\log p_\theta(x) = \log \left[ \int p_\theta(x \mid z)p(z) \, dz \right]
$$
where $ p_\theta(x \mid z) = \mathcal{N}(x; \mu_\theta(z), I) $ and $ p(z) = \...
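To make the comparison concrete, here is a small sketch contrasting the naive prior-sampling estimate of $\log p_\theta(x)$ with an importance-sampling estimate that uses a distribution close to the posterior. The toy model and numbers are assumptions for illustration, not from the question.

```python
import math
import torch

# Toy 1-D latent-variable model: p(z) = N(0, 1), p(x|z) = N(x; z, 0.1^2).
prior = torch.distributions.Normal(0.0, 1.0)

def log_lik(x, z):
    return torch.distributions.Normal(z, 0.1).log_prob(x)

x = torch.tensor(2.5)
n = 10_000

# (a) Monte Carlo with prior samples: log p(x) ~ logsumexp_i log p(x|z_i) - log n.
# Most prior samples land where p(x|z) is tiny, so the estimate has high variance.
z_prior = prior.sample((n,))
log_px_prior = torch.logsumexp(log_lik(x, z_prior), dim=0) - math.log(n)

# (b) Importance sampling with q(z) close to the posterior p(z|x):
# log p(x) ~ logsumexp_i [log p(x|z_i) + log p(z_i) - log q(z_i)] - log n.
q = torch.distributions.Normal(x, 0.1)      # stand-in for an encoder q(z|x)
z_q = q.sample((n,))
log_w = log_lik(x, z_q) + prior.log_prob(z_q) - q.log_prob(z_q)
log_px_post = torch.logsumexp(log_w, dim=0) - math.log(n)

print(log_px_prior.item(), log_px_post.item())
```

With a peaked likelihood, the prior-based estimate needs vastly more samples before it stops underestimating $\log p_\theta(x)$, while the posterior-like proposal concentrates samples where the integrand actually has mass.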
2 votes · 1 answer · 75 views
Why is the forward process referred to as the "ground truth" in diffusion models?
I've seen many tutorials on diffusion models refer to the distribution of the latent variables induced by the forward process as "ground truth". I wonder why. What we can actually see is ...
2 votes · 2 answers · 100 views
Why does Variational Inference work?
The ELBO is a lower bound, and it only matches the true likelihood when the q-distribution/encoder we choose equals the true posterior distribution. Are there any guarantees that maximizing the ELBO indeed ...
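The identity underlying this question is the standard decomposition of the log-evidence:

$$
\log p_\theta(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]}_{\mathrm{ELBO}} \;+\; D_{KL}\!\left(q(z)\,\|\,p_\theta(z \mid x)\right),
$$

so for fixed $\theta$, raising the ELBO over $q$ is exactly the same as shrinking the KL gap to the true posterior, while raising it over $\theta$ pushes up a lower bound on $\log p_\theta(x)$.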
8 votes · 2 answers · 225 views
Theoretical justification for minimizing $KL(q_\phi|p)$ rather than $KL(p|q_\phi)$?
Suppose we have a true but unknown distribution $p$ over some discrete set (i.e. assume no structure or domain knowledge), and a parameterized family of distributions $q_\phi$.
In general it makes ...
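One practical piece of the usual justification, written out: the two divergences take the expectation under different distributions,

$$
D_{KL}(q_\phi \,\|\, p) = \mathbb{E}_{x \sim q_\phi}\!\left[\log \frac{q_\phi(x)}{p(x)}\right],
\qquad
D_{KL}(p \,\|\, q_\phi) = \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q_\phi(x)}\right],
$$

so the first can be estimated with samples from $q_\phi$ and $p$ known only up to a normalizing constant (the constant only shifts the objective additively), whereas the second requires expectations under the unknown $p$ itself.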
1 vote · 0 answers · 21 views
Getting accurate Uncertainty from MFVI?
I wanted to know if there has been any research on methods to improve the accuracy of Mean-Field Variational Inference (which doesn't discard the mean-field approximation). Apparently it is known to ...
0 votes · 0 answers · 20 views
Sampling Gauss-Bernoulli RBM
In the 2018 paper Stein Variational Gradient Descent Without Gradient the authors analyze the sampling performance of their algorithm on multiple benchmarks. One of them is sampling from a Gauss-...
0 votes · 0 answers · 19 views
ShapeNet VAE KL Divergence issues
I am trying to train a VAE on ShapeNet but I can't seem to make it work. Any help or ideas would be highly appreciated. The problem is that whenever I apply the KL divergence loss, the network seems to ...
2 votes · 0 answers · 44 views
Posterior approximation following optimization methods
I'm trying to quantify the uncertainty in a high-dimensional, multimodal posterior space. We do not have an analytical solution for the forward model, and the forward model could be expensive to ...
0 votes · 0 answers · 30 views
Variational inference question - how to get eq. 22 from eq. 21
Referring to David Blei's notes on variational inference, I wonder how to get eq. 22 from eq. 21. Also, what is $z_{-k}$ in $L_k = \int q(z_k) \mathbb{E}_{-k}[\log p(z_k | z_{-k},x)]dz_k - \int q(z_k)\...
0 votes · 0 answers · 23 views
Conditions of application for coordinate ascent variational inference?
In every reference about coordinate ascent variational inference for the mean-field family (Chapter 10 of C. Bishop's book Pattern Recognition and Machine Learning, or the review article of Blei ...
1 vote · 0 answers · 41 views
Using bootstrap for accurate posterior in Variational Bayes
A well-known issue in Variational Bayes is the variance underestimation of the posterior. Some methods using the "sandwich" variance have already been proposed, but they provide frequentist ...
0 votes · 0 answers · 142 views
VAE with linear decoder and nonlinear encoder, does this just learn a linear decomposition of the data?
There are a number of variational autoencoder (VAE) methods that have nonlinear encoders and linear decoders. The point of using a linear decoder is to improve interpretability (which features ...
1 vote · 0 answers · 50 views
Understanding Variational inference and EM in relation to each other
I have read several answers like here, but somehow I still have a few doubts. I hope to present my understanding and ask a few questions to clear them up.
EM:
A maximization-maximization algorithm
E-...
4 votes · 3 answers · 185 views
Justification of independence assumption for latent variables in Expectation Maximization algorithm
When deriving the ELBO/free energy in the EM algorithm, it is often done in a "general" case of observed and latent variables and then an assumption of independent (or iid) variables is ...
2 votes · 1 answer · 85 views
How to calculate the score of a new data point with a score-based diffusion model (Song & Ermon, 2019)?
I have a pretrained score-based diffusion model trained on 64×64 images. Now I want to calculate the score of a new image (of the same dimension) through this pre-trained neural network.
The score network ...
1 vote · 0 answers · 23 views
Using MCMC-derived posterior to design variational approximation function
I am trying to fit a hierarchical model that estimates the covariance of some parameters, using the probabilistic programming language pyro.
In simulation experiments, I saw that the MCMC generates ...
1 vote · 1 answer · 362 views
Understanding a beta-variational autoencoder
I'm working on a beta-variational autoencoder using car images from the Vehicle Color Recognition Dataset. At this point, I'm just exploring different architectures and values for beta. (If you're ...
0 votes · 0 answers · 44 views
Inference of Beta-Bernoulli Distribution
Assume $x_1, x_2, \cdots, x_n$ follow a $\mathrm{Bern}(\pi_0)$ distribution. Let $y_{ik}$ follow a $\mathrm{Beta}(\alpha,\beta)$ distribution, $i\in \{1,\cdots, n\}$, $k\in \{1,\cdots, K\}$. Let $z_k$ follow a Bernoulli distribution with a ...
1 vote · 0 answers · 94 views
Why can Variational Autoencoders (VAEs) approximate arbitrary distributions?
I am trying to reason to myself why it is that VAEs can approximate arbitrary probability distributions even though $q_{\phi}(z|x)$ and $p_{\theta}(x|z)$ are Gaussian.
I understand that the parameters ...
1 vote · 0 answers · 35 views
Tree-reweighted belief propagation: optimizing edge appearances $\mu$
I am currently implementing Tree-Reweighted Belief Propagation (TRBP) to optimize edge appearances. The authors in the main manuscript of this work keep the edge appearances, represented by $\mu$, fixed [...
5 votes · 1 answer · 93 views
Calculation of an optimal variational distribution for covariance parameters in a Bayesian graphical lasso model
Context:
I am considering here a variational Bayesian framework where I need to calculate the optimal variational distribution for some covariance parameters.
Formally the model can be expressed as:
$$...
2 votes · 1 answer · 52 views
Do I need to take additional log det Jacobians for every PDF that uses the reparameterization trick?
Consider the negative ELBO objective with reparameterization, which is also used in VAEs: $$
\mathcal L_{\theta,\phi}(x)=-\log p_\theta(X|Z)-\log p_\theta(Z)+\log q_\phi(Z)
$$
The reparameterization trick ...
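A small sketch of the distinction at play, assuming a diagonal-Gaussian $q_\phi(z|x)$: when $\log q_\phi(z)$ is evaluated with the Gaussian density itself and the sample is produced as $z = \mu + \sigma \odot \epsilon$, no extra log-det Jacobian appears; such terms only enter when $q$ is itself defined by transforming a base density, as in a normalizing flow.

```python
import torch

# Diagonal-Gaussian variational posterior q_phi(z|x) with learnable mu, log_sigma.
mu = torch.zeros(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)
sigma = torch.exp(log_sigma)

eps = torch.randn(4)                  # epsilon ~ N(0, I)
z = mu + sigma * eps                  # reparameterized sample, differentiable in (mu, sigma)

q = torch.distributions.Normal(mu, sigma)
log_qz = q.log_prob(z).sum()          # log q_phi(z): evaluated directly with the Gaussian
                                      # density; no log-det Jacobian term is added, because
                                      # q is the Gaussian itself, not a pushforward of it
```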
1 vote · 0 answers · 64 views
How does Variational Autoencoder approximate the joint probability distribution?
I know that in Variational Inference the idea is to approximate the posterior P(z|x, y) and I know that Variational AutoEncoders (VAEs) use the idea of variational inference through neural network ...
5 votes · 1 answer · 219 views
Derivation of coordinate ascent variational inference
The slides on variational inference show the evidence lower bound ($L$) and its derivative with respect to a variational distribution $q(z_k)$, quoted as follows:
$$
L_k = \int q(z_k) E_{-k} \bigg[ \log ...
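For context, the update this derivation leads to is the standard CAVI result: holding the other factors fixed, $L_k$ is, up to an additive constant, a negative KL divergence, so it is maximized by

$$
q^*(z_k) \;\propto\; \exp\!\left\{\mathbb{E}_{-k}\!\left[\log p(z_k \mid z_{-k}, x)\right]\right\},
$$

where $\mathbb{E}_{-k}$ denotes the expectation under the variational factors of all coordinates other than $z_k$.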
0 votes · 0 answers · 22 views
Understanding a line in the derivation of the KL-divergence objective in Variational Bayes
I am following the derivation of the Variational Bayes approach in David Blei's lecture notes, particularly equations (13-16).
In particular, the line:
$$
= E_q [\ \log_2 q(Z) ]\ - E_q \left[\ \log_2 \...
3 votes · 2 answers · 460 views
Replacing the KL-divergence term in a VAE with parameter regularization
When training a VAE, one aims to optimize the function $\mathcal{L}$, defined as:
$$\mathcal{L}\left(\theta,\phi; \mathbf{x}^{(i)}\right) = - D_{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) || p_\theta(\...
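For reference, a minimal sketch of the per-sample objective in the common special case of a diagonal-Gaussian encoder and standard-normal prior; the closed-form KL term is the piece the question considers replacing. The Bernoulli decoder here is an illustrative assumption, not part of the question.

```python
import torch

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO for a diagonal-Gaussian q_phi(z|x) and an N(0, I) prior."""
    # Reconstruction term, assuming a Bernoulli decoder (illustrative choice).
    recon = torch.nn.functional.binary_cross_entropy(x_recon, x, reduction="sum")
    # Analytic KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```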
1 vote · 0 answers · 126 views
How to measure posterior collapse, if any
Is there any theoretical work on how to measure posterior collapse?
One can measure decoder output, but it is not clear if the degradation (if any) happened due to posterior collapse or due to failing ...
1 vote · 0 answers · 113 views
In the β-TCVAE paper, can someone help with the derivation (S3) in Appendix C.1?
Paper: Isolating Sources of Disentanglement in VAEs
I follow as far as,
$$\mathbb{E}_{q(z)}[\log q(z)] = \mathbb{E}_{q(z, n)}\big[\log \mathbb{E}_{n'\sim p(n)}[q(z|n')]\big]$$ Subsequently, I don't ...
1 vote · 0 answers · 172 views
Are there any methods that combine MCMC and VI?
Are there any methods that combine VI and MCMC? If such methods exist, why aren't they used more prominently than techniques such as NUTS or other VI methods?
2 votes · 0 answers · 305 views
Why is the Wasserstein distance not used in Variational Inference?
I just started learning the concept of variational inference in the context of variational autoencoders, so please excuse me if the answer is obvious. I would like to know why, traditionally, KL-...
4 votes · 1 answer · 887 views
Justification of the fixed variational distribution in diffusion models
Diffusion models can be regarded as latent variable models (Ho et al., 2020; Section 2), with the latents being a hierarchical chain of random variables $z_T → \dots → z_t → z_{t-1} → \dots → z_1$ (...
7 votes · 2 answers · 1k views
VAE : How is likelihood $p(x|z)$ defined?
Disclaimer: I don't have a strong background in Bayesian statistics.
I gather from questions such as this one and this one that in the context of VAEs, we suppose that we know the (form of the ?) prior $p(z)$ ...
3 votes · 2 answers · 420 views
Variational inference: is the evidence constant?
I'm studying variational inference (in the context of VAEs), and I'm watching this video at this time point. At this point in the video, the goal of approximating the intractable posterior $p_{\theta}(...
0 votes · 0 answers · 74 views
Understanding Variational inference for LDA
I am trying to derive from scratch variational inference for LDA. I am following this course: https://home.cs.colorado.edu/~jbg/teaching/CSCI_5622/19a.pdf
When computing $p(Z|\Theta)$ they do the ...
0 votes · 0 answers · 520 views
What is the closed-form of the KL-Divergence between two relaxed Bernoulli distributions?
I've seen multiple papers that use a relaxation of the Bernoulli distribution as defined in Maddison et al. (here it is referred to as Binary Concrete), and they say that a closed-form solution for ...
1 vote · 1 answer · 213 views
Comparing Gibbs sampler and variational inference
I am learning about variational inference and the Gibbs sampler.
I am in the process of deriving variational inference on my own. In this process, I need to make a comparison with the Gibbs sampler.
I am ...
7 votes · 1 answer · 4k views
What's the role of the commitment loss in VQ-VAE?
I'm reading about VQ-VAE, and trying to understand the commitment loss $\beta||z_e(x) - sg(e)||^2$, described in the following sentence:
Finally, since the volume of the embedding space is ...
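As a rough sketch (not the reference implementation), the two codebook-related terms are usually written with a stop-gradient, which in PyTorch corresponds to `.detach()`; here `z_e` is the encoder output and `e` the selected codebook vector, following the notation of the quoted loss.

```python
import torch

def vq_terms(z_e, e, beta=0.25):
    # Codebook loss: pulls the embedding e toward the (frozen) encoder output z_e.
    codebook_loss = torch.mean((z_e.detach() - e) ** 2)
    # Commitment loss: keeps the encoder output close to the (frozen) embedding,
    # so the encoder "commits" to a codebook entry instead of drifting arbitrarily.
    commitment_loss = beta * torch.mean((z_e - e.detach()) ** 2)
    return codebook_loss + commitment_loss
```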
4 votes · 3 answers · 128 views
What's the difference between p(Z, X=x) and p(Z|X=x)?
I'm trying to understand variational inference, and I've found resources that mention $p(Z, X=x)$, where $Z$ is a latent random variable and $X$ is the observed random variable. (Here is one such ...
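The short relation behind the two notations: for a fixed observation $x$,

$$
p(Z \mid X = x) \;=\; \frac{p(Z, X = x)}{p(X = x)},
\qquad
p(X = x) = \int p(Z = z, X = x)\,dz,
$$

so the joint evaluated at the observed $x$ is an unnormalized version of the posterior; the two differ only by the constant $p(X = x)$, which is exactly the intractable evidence.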
3 votes · 1 answer · 1k views
VQ-VAE objective - is it ELBO maximization, or minimization of the KL-divergence between the posterior and its approximation?
I'm reading two descriptions of the VQ-VAE objective:
Kingma claims on page 18 that we want to maximize the ELBO, and shows that it can be written as $\mathrm{ELBO} = \log p_{\theta}(x) - KL(q_{\phi}(z|x)||p_{\...