
All Questions

2 votes
1 answer
45 views

Posterior estimation using VAE

Using normalizing flows, we can model a model's posterior $p(\theta|D)$ by feeding Gaussian noise $z$ to the NF (parametrized by $\phi$), using the output $\theta$ of the NF as the model parameters, and ...
Alberto
  • 1,381
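A note on the question above: in the standard flow-based VI setup (assumed here, since the excerpt is truncated), with $\theta = f_\phi(z)$ and $z \sim \mathcal{N}(0, I)$, the objective being maximized is typically $$\mathrm{ELBO}(\phi) = \mathbb{E}_{z \sim \mathcal{N}(0,I)}\big[\log p(D \mid f_\phi(z)) + \log p(f_\phi(z)) - \log q_\phi(f_\phi(z))\big], \qquad \log q_\phi(\theta) = \log \mathcal{N}(z; 0, I) - \log\left|\det \frac{\partial f_\phi}{\partial z}\right|.$$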
2 votes
2 answers
47 views

VAEs - Two questions regarding the posterior and prior distribution derivations

I'm really struggling to understand the first step in the ELBO derivation in VAEs. When asking my questions I'll also try to clearly state my assumptions, since perhaps some of them are wrong to begin with: ...
DrPrItay
  • 121
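For reference, the first step being asked about is usually the introduction of $q_\phi(z|x)$ followed by Jensen's inequality: $$\log p_\theta(x) = \log \int p_\theta(x, z)\,dz = \log \mathbb{E}_{q_\phi(z|x)}\!\left[\frac{p_\theta(x, z)}{q_\phi(z|x)}\right] \ge \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x, z) - \log q_\phi(z|x)\big] = \mathrm{ELBO}.$$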
1 vote
0 answers
59 views

How to speed up the following ELBO evaluation?

I have an estimation problem where I need to maximize the evidence lower bound: $$ \mathrm{ELBO} = -\frac{1}{2} \Bigg( \mathbb{E}_{q(\theta)} \left[ \mathrm{vec}(\mathbf{Z})^{\mathrm{H}} \mathbf{C}^{-...
CfourPiO
  • 315
0 votes
0 answers
34 views

Is the inferential challenge of dense Bayesian or Markov networks solved by recent improvements in variational inference and neural networks?

I am trying to understand more about Graphical models, and have a reasonable grasp of the basics now. One issue that recurs in a lot of the papers of the mid-2000s and even in Koller's textbook is ...
krishnab
  • 1,582
1 vote
0 answers
28 views

Why do we need to marginalize when finding p(data) when latent variables are involved? (part of the ELBO derivation)

I'm so confused by the derivation of the ELBO. In part of the derivation, p(data) is intractable as it involves an integral over a high-dimensional latent variable. I can't understand why the latent ...
user425635
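For context, the marginalization in question is $$p(x) = \int p(x, z)\,dz = \int p(x \mid z)\,p(z)\,dz,$$ which is needed because the generative model only specifies $p(x \mid z)$ and $p(z)$, not $p(x)$ directly.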
3 votes
1 answer
54 views

When deriving the ELBO to solve variational inference problems, why do we know p(z) and p(x,z) but not p(x) and p(z|x)?

I am a bit lost with the derivation of the ELBO because I don't understand why some distributions are known and some are unknown. I guess we know p(z) (the prior) because it was the last value of q(z) ...
user425635
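A short note on the usual answer: the model specifies the joint as $p(x, z) = p(x \mid z)\,p(z)$, so $p(z)$ and $p(x, z)$ are available by construction, whereas $$p(x) = \int p(x, z)\,dz \quad\text{and}\quad p(z \mid x) = \frac{p(x, z)}{p(x)}$$ both require the (generally intractable) integral over $z$.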
1 vote
0 answers
40 views

ELBO & "backwards" KL divergence argument order

On Wikipedia it says: "A simple interpretation of the KL divergence of P from Q [i.e. D_KL(P||Q)] is the expected excess surprise from using Q as a model instead of P when the actual distribution ...
profPlum
  • 451
0 votes
0 answers
11 views

Setting inducing points to non-trainable in Gaussian process regression

I notice that in the GPflow tutorials for Stochastic Variational Inference they choose a certain number of inducing points, and after that they make them not trainable. Here they set it to not ...
Francisco Javier Jara Ávila
0 votes
1 answer
291 views

Exploring VAE latent space

I recently trained an AE and a VAE and used the latent variables of each for a clustering task. It seemed to work well, with sensible clusters. The main reason for training the VAE was to gain more ...
Nathan Thompo
2 votes
1 answer
116 views

Why is sampling from the posterior a good estimate for the likelihood, but sampling from the prior bad?

In Variational Autoencoders (VAE), we have: $$ \log p_\theta(x) = \log \left[ \int p_\theta(x \mid z)p(z) \, dz \right] $$ where $ p_\theta(x \mid z) = \mathcal{N}(x; \mu_\theta(z), I) $ and $ p(z) = \...
rando
  • 328
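For reference, both estimators discussed in this question come from the same importance-sampling identity, $$p_\theta(x) = \mathbb{E}_{p(z)}\big[p_\theta(x \mid z)\big] = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\frac{p_\theta(x \mid z)\,p(z)}{q_\phi(z \mid x)}\right],$$ and the proposal $q_\phi(z \mid x)$ typically gives a far lower-variance Monte Carlo estimate than the prior, which rarely places mass where $p_\theta(x \mid z)$ is large.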
2 votes
1 answer
75 views

Why is the forward process referred to as the "ground truth" in diffusion models?

I've seen many tutorials on diffusion models refer to the distribution of the latent variables induced by the forward process as "ground truth". I wonder why. What we can actually see is ...
Daniel Mendoza
2 votes
2 answers
100 views

Why does Variational Inference work?

The ELBO is a lower bound, and only matches the true likelihood when the q-distribution/encoder we choose equals the true posterior distribution. Are there any guarantees that maximizing the ELBO indeed ...
Daniel Mendoza
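The usual starting point for this question is the exact decomposition $$\log p_\theta(x) = \mathrm{ELBO}(\theta, \phi; x) + D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big),$$ so for fixed $\theta$, maximizing the ELBO over $\phi$ is equivalent to minimizing the KL gap to the true posterior.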
8 votes
2 answers
225 views

Theoretical justification for minimizing $KL(q_\phi|p)$ rather than $KL(p|q_\phi)$?

Suppose we have a true but unknown distribution $p$ over some discrete set (i.e. assume no structure or domain knowledge), and a parameterized family of distributions $q_\phi$. In general it makes ...
user56834
  • 2,987
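One practical part of the answer, stated as an equation: with $p(z \mid x)$ as the target, $$D_{KL}(q_\phi \,\|\, p) = \mathbb{E}_{q_\phi}[\log q_\phi(z)] - \mathbb{E}_{q_\phi}[\log p(x, z)] + \log p(x),$$ so minimizing it only requires expectations under $q_\phi$ and the unnormalized density $p(x, z)$, whereas $D_{KL}(p \,\|\, q_\phi)$ needs expectations under the unknown $p$ itself.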
1 vote
0 answers
21 views

Getting accurate Uncertainty from MFVI?

I wanted to know if there has been any research on methods to improve the accuracy of Mean-Field Variational Inference (without discarding the mean-field approximation). Apparently it is known to ...
profPlum
  • 451
0 votes
0 answers
20 views

Sampling Gauss-Bernoulli RBM

In the 2018 paper Stein Variational Gradient Descent Without Gradient, the authors analyze the sampling performance of their algorithm on multiple benchmarks. One of them is sampling from a Gauss-...
HansDoe
  • 11
0 votes
0 answers
19 views

ShapeNet VAE KL Divergence issues

I am trying to train a VAE on ShapeNet but I can't seem to make it work. Any help or ideas would be highly appreciated. The problem is that whenever I apply the KL divergence loss, the network seems to ...
Youssef
2 votes
0 answers
44 views

Posterior approximation following optimization methods

I'm trying to quantify the uncertainty in a high-dimensional, multimodal posterior space. We do not have an analytical solution for the forward model, and the forward model could be expensive to ...
Geooo
  • 21
0 votes
0 answers
30 views

Variational inference question - how to get eq. 22 from eq. 21

Referring to David Blei's notes on variational inference, I wonder how to get eq. 22 from eq. 21. Also, what is $z_{-k}$ in $L_k = \int q(z_k) \mathbb{E}_{-k}[\log p(z_k | z_{-k},x)]dz_k - \int q(z_k)\...
Mark
  • 171
0 votes
0 answers
23 views

Conditions of application for coordinate ascent variational inference?

In every reference about coordinate ascent variational inference for the mean-field family (Chapter 10 of C. Bishop's book Pattern Recognition and Machine Learning, or the review article by Blei ...
Pierre Gloaguen
1 vote
0 answers
41 views

Using bootstrap for accurate posterior in Variational Bayes

A well-known issue in Variational Bayes is the underestimation of the posterior variance. Some methods using "sandwich" variances have already been proposed but provide frequentist ...
Mangnier Loïc
0 votes
0 answers
142 views

VAE with a linear decoder and nonlinear encoder: does this just learn a linear decomposition of the data?

There are a number of variational autoencoder (VAE) methods that have nonlinear encoders and linear decoders. The idea of using a linear decoder is to improve interpretability (which features ...
sanK
  • 1
1 vote
0 answers
50 views

Understanding Variational inference and EM in relation to each other

I have read several answers (like here), but somehow I still have a few doubts. I hope to present my understanding and ask a few questions to clear my doubts. EM: a maximization-maximization algorithm. E-...
figs_and_nuts
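For context, both procedures can be read as coordinate ascent on the same functional $$\mathcal{F}(q, \theta) = \mathbb{E}_{q(z)}[\log p_\theta(x, z)] - \mathbb{E}_{q(z)}[\log q(z)],$$ where the EM E-step sets $q(z) = p_\theta(z \mid x)$ exactly (making the bound tight), while VI restricts $q$ to a tractable family and only maximizes within it.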
4 votes
3 answers
185 views

Justification of the independence assumption for latent variables in the Expectation-Maximization algorithm

When deriving the ELBO/free energy in the EM algorithm, it is often done in a "general" case of observed and latent variables, and then an assumption of independent (or i.i.d.) variables is ...
user246795
2 votes
1 answer
85 views

How to calculate the score of a new datapoint with a score-based diffusion model (Song & Ermon, 2019)?

I have a pretrained score-based diffusion model trained on 64x64 images. Now I want to calculate the score of a new image (of the same dimensions) through this pre-trained neural network. The score network ...
rajoy99
  • 21
1 vote
0 answers
23 views

Using MCMC-derived posterior to design variational approximation function

I am trying to fit a hierarchical model that estimates the covariance of some parameters, using the probabilistic programming language pyro. In simulation experiments, I saw that the MCMC generates ...
David Shor
1 vote
1 answer
362 views

Understanding a beta-variational autoencoder

I'm working on a beta-variational autoencoder using car images from the Vehicle Color Recognition Dataset. At this point, I'm just exploring different architectures and values for beta. (If you're ...
KirkD_CO
  • 1,170
0 votes
0 answers
44 views

Inference of Beta-Bernoulli Distribution

Assume $x_1, x_2, \cdots, x_n$ follow $Bern(\pi_0)$. Let $y_{ik}$ follow $Beta(\alpha,\beta)$, $i\in \{1,\cdots, n\}$, and $k\in \{1,\cdots, K\}$. Let $z_k$ follow a Bernoulli distribution with a ...
LAM_MN
  • 1
1 vote
0 answers
94 views

Why can Variational Autoencoders (VAEs) approximate arbitrary distributions?

I am trying to reason to myself about why it is that VAEs can approximate arbitrary probability distributions even though $q_{\phi}(z|x)$ and $p_{\theta}(x|z)$ are Gaussian. I understand that the parameters ...
Decaying Tails
1 vote
0 answers
35 views

Tree-reweighted belief propagation: optimizing edge appearances $\mu$

I am currently implementing Tree-Reweighted Belief Propagation (TRBP) to optimize edge appearances. In the main manuscript of this work, the authors keep the edge appearances, represented by $\mu$, fixed [...
c.uent
  • 115
5 votes
1 answer
93 views

Calculation of an optimal variational distribution for covariance parameters in a Bayesian graphical lasso model

Context: I am considering here a variational Bayesian framework where I need to calculate the optimal variational distribution for some covariance parameters. Formally the model can be expressed as: $$...
Mangnier Loïc
2 votes
1 answer
52 views

Do I need to take additional log det Jacobians for every PDF that uses the reparameterization trick?

Consider the -ELBO objective with reparameterization, which is also used in VAEs: $$ \mathcal L_{\theta,\phi}(x)=\log p_\theta(X|Z)+\log p_\theta(Z) +\log q_\phi(Z) $$ The reparameterization trick ...
wd violet
  • 777
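A minimal PyTorch sketch of the point at issue (variable names are illustrative, not from the question): for a Gaussian $q_\phi(z|x)$, the reparameterized sample $z = \mu + \sigma \odot \epsilon$ is scored directly under $\mathcal{N}(\mu, \sigma^2)$, so no extra log-det term is added; the change of variables from $\epsilon$ to $z$ is already contained in that density.

    import torch
    from torch.distributions import Normal

    # Illustrative encoder outputs for a single datapoint.
    mu = torch.zeros(4, requires_grad=True)
    log_sigma = torch.zeros(4, requires_grad=True)
    sigma = log_sigma.exp()

    # Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = torch.randn(4)
    z = mu + sigma * eps

    # log q_phi(z|x): evaluate the Gaussian density at z directly.
    # The Jacobian of the map eps -> z is already accounted for in this
    # density, so no separate log|det J| term is needed for a plain Gaussian q.
    log_qz = Normal(mu, sigma).log_prob(z).sum()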
1 vote
0 answers
64 views

How does a Variational Autoencoder approximate the joint probability distribution?

I know that in Variational Inference the idea is to approximate the posterior P(z|x, y) and I know that Variational AutoEncoders (VAEs) use the idea of variational inference through neural network ...
Amir Jalilifard
5 votes
1 answer
219 views

Derivation of coordinate ascent variational inference

The slides on variational inference show the evidence lower bound ($L$) and the derivative with respect to a variational distribution $q(z_k)$, quoted as follows: $$ L_k = \int q(z_k) E_{-k} \bigg[ \log ...
avocado
  • 3,653
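For reference, setting the derivative of $L_k$ to zero (subject to $\int q(z_k)\,dz_k = 1$) yields the standard CAVI update $$q^*(z_k) \propto \exp\big\{E_{-k}\big[\log p(z_k \mid z_{-k}, x)\big]\big\},$$ where $E_{-k}$ denotes the expectation over $\prod_{j \neq k} q(z_j)$.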
0 votes
0 answers
22 views

Understanding a line in the derivation of the KL divergence optimising function in Variational Bayes

I am following the derivation of the Variational Bayes approach in David Blei's lecture notes, particularly equations (13-16). In particular, the line: $$ = E_q [\ \log_2 q(Z) ]\ - E_q \left[\ \log_2 \...
Joseph
  • 143
3 votes
2 answers
460 views

Replacing the KL-divergence term in a VAE with parameter regularization

When training a VAE, one aims to optimize the function $\mathcal{L}$, defined as: $$\mathcal{L}\left(\theta,\phi; \mathbf{x}^{(i)}\right) = - D_{KL}\left(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) || p_\theta(\...
Asterion
  • 946
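For a diagonal-Gaussian encoder and a standard-normal prior, the KL term in this objective has the familiar closed form $$D_{KL}\big(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})\big) = \frac{1}{2}\sum_j\big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big),$$ which is the term any proposed parameter-regularization surrogate would be compared against.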
1 vote
0 answers
126 views

How to measure posterior collapse, if any

Is there any theoretical work on how to measure posterior collapse? One can measure the decoder output, but it is not clear whether the degradation (if any) happened due to posterior collapse or due to failing ...
Pavel Podlipensky
1 vote
0 answers
113 views

In the β-TCVAE paper, can someone help with the derivation (S3) in Appendix C.1?

Paper: Isolating Sources of Disentanglement in VAEs. I follow as far as $$\mathbb{E}_{q(z)}[\log q(z)] = \mathbb{E}_{q(z, n)}[\ \log\ \mathbb{E}_{n'\sim\ p(n)}[q(z|n')]\ ]$$ Subsequently, I don't ...
S R
  • 33
1 vote
0 answers
172 views

Are there any methods that combine MCMC and VI?

Are there any methods that combine VI and MCMC? If such methods exist, why aren't they used prominently over techniques such as NUTS or other VI approaches?
JJbox
  • 11
2 votes
0 answers
305 views

Why is the Wasserstein distance not used in Variational Inference?

I just started learning the concept of variational inference in the context of variational autoencoders, so please excuse me if the answer is obvious. I would like to know why, traditionally, the KL-...
user3748950
4 votes
1 answer
887 views

Justification of the fixed variational distribution in diffusion models

Diffusion models can be regarded as latent variable models (Ho et al., 2020; Section 2), with the latents being a hierarchical chain of random variables $z_T → \dots → z_t → z_{t-1} → \dots → z_1$ (...
Dan Oneață
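For context, the fixed variational distribution in Ho et al. (2020) is the Gaussian forward (noising) process, written in the question's notation as $$q(z_t \mid z_{t-1}) = \mathcal{N}\big(z_t;\, \sqrt{1 - \beta_t}\, z_{t-1},\, \beta_t \mathbf{I}\big),$$ with a prescribed variance schedule $\beta_1, \dots, \beta_T$ rather than learned parameters.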
7 votes
2 answers
1k views

VAE: How is the likelihood $p(x|z)$ defined?

Disclaimer: I don't have a strong background in Bayesian statistics. I gather from questions such as this one and this one that, in the context of VAEs, we suppose that we know the (form of the?) prior $p(z)$ ...
Soltius
  • 1,396
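A common answer, under the usual modelling assumptions: $p(x|z)$ is chosen by the modeller, with the decoder network outputting its parameters, e.g. $$p_\theta(x \mid z) = \mathcal{N}\big(x;\, \mu_\theta(z),\, \sigma^2 \mathbf{I}\big) \quad\text{or}\quad p_\theta(x \mid z) = \prod_j \mathrm{Bern}\big(x_j;\, \pi_{\theta, j}(z)\big)$$ for real-valued and binary data respectively.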
3 votes
2 answers
420 views

Variational inference: is the evidence constant?

I'm studying variational inference (in the context of VAEs), and I'm watching this video at this time point. There, the goal of approximating the intractable posterior $p_{\theta}(...
Soltius
  • 1,396
0 votes
0 answers
74 views

Understanding Variational inference for LDA

I am trying to derive variational inference for LDA from scratch. I am following this course: https://home.cs.colorado.edu/~jbg/teaching/CSCI_5622/19a.pdf When computing $p(Z|\Theta)$, they do the ...
sam
  • 449
0 votes
0 answers
520 views

What is the closed form of the KL divergence between two relaxed Bernoulli distributions?

I've seen multiple papers that use a relaxation of the Bernoulli distribution as defined in Maddison et al. (there it is referred to as Binary Concrete), and they say that a closed-form solution for ...
dannybrig
1 vote
1 answer
213 views

Comparing the Gibbs sampler and variational inference

I am learning about variational inference and the Gibbs sampler. I am in the process of deriving variational inference on my own. In this process, I need to make a comparison with the Gibbs sampler. I am ...
sam
  • 449
7 votes
1 answer
4k views

What's the role of the commitment loss in VQ-VAE?

I'm reading about VQ-VAE, and trying to understand the commitment loss $\beta||z_e(x) - sg(e)||^2$, described in the following sentence: Finally, since the volume of the embedding space is ...
ihadanny
  • 3,360
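A minimal PyTorch sketch of the three VQ-VAE loss terms (function and variable names here are illustrative); the stop-gradient $sg$ in the question corresponds to .detach():

    import torch
    import torch.nn.functional as F

    def vq_vae_loss(z_e, e_q, x_recon, x, beta=0.25):
        """z_e: encoder output, e_q: nearest codebook vector; sg == detach."""
        recon = F.mse_loss(x_recon, x)                      # reconstruction / likelihood term
        codebook = F.mse_loss(e_q, z_e.detach())            # ||sg(z_e) - e||^2, pulls codes toward encoder outputs
        commitment = beta * F.mse_loss(z_e, e_q.detach())   # beta * ||z_e - sg(e)||^2, keeps the encoder committed to its code
        return recon + codebook + commitment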
4 votes
3 answers
128 views

What's the difference between p(Z, X=x) and p(Z|X=x)?

I'm trying to understand variational inference, and I've found resources that mention $p(Z, X=x)$, where $Z$ is a latent random variable and $X$ is the observed random variable. (Here is one such ...
Addison
  • 221
3 votes
1 answer
1k views

VQ-VAE objective - is it ELBO maximization, or minimization of the KL-divergence between the posterior and its approximation?

I'm reading two descriptions of the VQ-VAE objective: Kingma claims on page 18 that we want to maximize the ELBO, and shows that it can be written as $ELBO = \log p_{\theta}(x) - KL(q_{\phi}(z|x)||p_{\...
ihadanny
  • 3,360