Questions tagged [maximum-entropy]
Maximum entropy (maxent) is a statistical principle derived from information theory. Distributions that maximize entropy (under some constraints) are considered "maximally uninformative" given those constraints. Maximum entropy can be used for multiple purposes, such as choice of prior, choice of sampling model, or design of experiments.
172 questions
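A standard statement of the discrete maxent program behind most of the questions below:
$$\max_{p}\; -\sum_i p_i \log p_i \quad \text{subject to} \quad \sum_i p_i = 1, \qquad \sum_i p_i f_k(i) = c_k \;\; (k = 1, \dots, K).$$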
0
votes
0
answers
12
views
Why is the formalism of the max entropy principle different in the continuous case than in the discrete case?
In the discrete case (Chapter 11 of Jaynes' Probability Theory: The Logic of Science), the max entropy principle states that we need to encode our ignorance prior by solving a constrained optimization problem, ...
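For context, the discrete objective is the Shannon entropy, while Jaynes' continuous formulation maximizes entropy relative to an invariant measure $m(x)$, since the naive $-\int p \log p \, dx$ is not invariant under a change of variables:
$$H_d(p) = -\sum_i p_i \log p_i, \qquad H_c(p) = -\int p(x) \log \frac{p(x)}{m(x)} \, dx.$$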
2
votes
1
answer
32
views
Which operations on distributions respect MaxEnt property?
Seems like MaxEnt property for log-normal distribution follows directly from MaxEnt property of normal. So for any Y, such that ...
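One standard identity behind this: for a smooth invertible map $g$, differential entropy transforms as
$$h(g(X)) = h(X) + E\big[\log |g'(X)|\big],$$
so the constraint set transforms along with the distribution; the log-normal maximizes entropy subject to constraints on $E[\ln Y]$ and $E[(\ln Y)^2]$ because the normal does so for $E[X]$ and $E[X^2]$.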
2
votes
0
answers
16
views
Maximum entropy distributions with more general constraints
Gibbs showed that for a space $X$ (assume finite for simplicity) and functions $f_i:X\to R$, the maximum entropy distribution on $X$ s.t. constraints on the expectation of $f_i$ is the Boltzmann ...
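The Boltzmann (Gibbs) solution referenced here has the exponential-family form
$$p(x) = \frac{1}{Z(\lambda)} \exp\Big(-\sum_i \lambda_i f_i(x)\Big), \qquad Z(\lambda) = \sum_{x \in X} \exp\Big(-\sum_i \lambda_i f_i(x)\Big),$$
with the multipliers $\lambda_i$ chosen so that the expectation constraints hold.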
1
vote
0
answers
31
views
Solve for maximum entropy conditional probability
I'm new to the max-ent principle and functional derivatives. I have a known joint data distribution $p_D(x,y)$ (where $y$ is regarded as the labels) and a latent variable model $(x,y,z)$ with the prior $p(z)...
0
votes
0
answers
48
views
Under What Conditions Does a Gaussian Mixture Model (GMM) Have Maximum Entropy?
Introduction
I'm delving into Gaussian Mixture Models (GMMs) within unsupervised learning frameworks and am particularly interested in their statistical properties, with a focus on entropy. Entropy ...
1
vote
0
answers
35
views
Exponential families as families of limit distributions of Markov processes
An exponential family satisfies a maximum entropy property: each density is the maximum entropy density given the expectation of its sufficient statistic.
On the other hand, from my understanding, the ...
1
vote
0
answers
31
views
Maximum Entropy distribution of a ticking clock
Say I have a clock that emits "ticks". An ideal clock looks like a Dirac comb. It has:
perfect periodicity of ticks (there is a precise fixed time interval between any two consecutive ticks)...
0
votes
0
answers
67
views
Minimizing cross entropy over a restricted domain?
Suppose $f(x;q)$ is the true distribution. The support of the random variable $X$ is $\Omega$. Suppose I am interested in a particular subset $\Xi \subset \Omega$. I would like to minimize the ...
2
votes
1
answer
198
views
When and how was the Bernoulli distribution with real binomial proportion introduced?
I certainly should read Jakob Bernoulli's Ars Conjectandi again but let me share my concerns.
I'm just wondering when and how the Bernoulli distribution $Be(p)$ (and related distributions like the ...
4
votes
1
answer
103
views
Does every distribution family have a set of maximum entropy constraints?
I am reflecting on these examples of maximum entropy distributions. I am (pleasantly) surprised that various common distribution families have maximum entropy constraints.
It got me wondering if ...
1
vote
1
answer
103
views
Is the principle of maximum entropy misleading?
If a distribution belongs to a certain class, then the distribution with the largest entropy in that class is typically referred to as the least-informative distribution. To me, this is highly ...
2
votes
1
answer
226
views
What is the reasoning behind max entropy constraints for the gamma distribution?
The max entropy method is a way of deriving a probability distribution given only the information you know about how the data is distributed and nothing more. For example, the normal distribution can ...
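For reference, the gamma distribution is the maximum entropy distribution on $(0, \infty)$ subject to fixing its two sufficient statistics,
$$E[X] = c_1, \qquad E[\ln X] = c_2,$$
which matches the gamma log-density $\ln p(x) = \alpha \ln \beta - \ln \Gamma(\alpha) + (\alpha - 1) \ln x - \beta x$ being linear in $x$ and $\ln x$.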
8
votes
2
answers
580
views
Jaynes' Description of Maximum Entropy Distribution
I am reading E. T. Jaynes' probability theory book, and I am at chapter 11 where he introduces the maximum entropy principle. I understand that Jaynes separates the notion of probability from that of ...
1
vote
1
answer
150
views
How can we use Shannon entropy to discriminate between two similar probability distribution functions?
I studied two papers related to discriminating between two similar distributions using Shannon entropy, but the two papers took different views. Can anyone explain what would be the basic flow of idea to ...
5
votes
1
answer
2k
views
Choosing "Target Entropy" for Soft-Actor-Critic (SAC) algorithm
I am quite familiar with Soft-Actor-Critic (SAC) and its many applications in continuous control RL environments. However, when implementing this algorithm in a practical setting, one thing that still ...
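Not a full answer, but the heuristic most reference implementations use (following the SAC variant with automatic temperature tuning) is to set the target entropy to the negative of the action dimensionality. A minimal sketch, where `action_shape` is a hypothetical placeholder for your environment's action-space shape:

```python
import numpy as np

def default_target_entropy(action_shape):
    # Common heuristic from the automatic-temperature-tuning SAC variant:
    # target entropy = -dim(action space).
    return -float(np.prod(action_shape))

print(default_target_entropy((6,)))  # -6.0 for a 6-dimensional continuous action space
```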
1
vote
0
answers
86
views
Discrete Bayes Net learning under parameter constraints
What is some relevant research available on estimating the parameters of a Bayes Net (with known structure) when there are known constraints on conditional and marginal probabilities?
For example, ...
2
votes
1
answer
209
views
Highest Entropy Distribution on $[0,\infty)$ given Mean, Variance, and $p(0) = 0$?
I am dealing with temperature measurements, and normally we assume the probability of getting a measurement $t_i$ with a certain uncertainty $\sigma_t$ given the model ('true' value) $M(x_i)$ (where $...
3
votes
0
answers
93
views
Maximum entropy distribution of a positive continuous variable with known mean and vanishing probability at 0
I am working on a problem where I know that the variable of concern $x$ is positive, has no upper bound on its value, and has a probability that vanishes as we approach 0: $\lim_{x \rightarrow 0^+} ...
3
votes
1
answer
34
views
How to statistically detect a threshold effect on a dependent variable measured repeatedly in the same population
I want to identify the level of a predictive variable X (with a Gaussian distribution) that induces a reduction in a variable y (with a Poisson distribution) that has been measured on the same ...
0
votes
0
answers
402
views
Computing the gradient of the log-partition function in a linear-chain conditional random field (CRF) model
Query.
When computing the gradient of the log-partition function for an exponential family distribution specified by the linear-chain conditional random field (CRF) model, will unary conditional ...
3
votes
0
answers
431
views
Geometric distribution and entropy
According to Wikipedia, among all discrete probability distributions supported on $\{1, 2, 3, ... \}$ with given expected value $\mu$, the geometric distribution $X$ with parameter $p = \frac{1}{\mu}$ ...
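Spelling out the claim from the excerpt: the entropy-maximizing pmf on $\{1, 2, 3, \dots\}$ with mean $\mu$ is
$$p(k) = \Big(1 - \frac{1}{\mu}\Big)^{k-1} \frac{1}{\mu}, \qquad k = 1, 2, 3, \dots$$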
1
vote
0
answers
118
views
What is the maximum entropy joint Bernoulli distribution with fixed covariances and individual means?
We have Bernoulli variables $B_i$ with known means $E(B_i)$ and covariance matrix $\Sigma = (cov(B_i, B_j))$. What joint distribution would have the maximum entropy?
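Since the constraints fix first and second moments, the maxent solution (when it exists) takes the pairwise exponential-family (Ising-like) form
$$p(b_1, \dots, b_n) \propto \exp\Big(\sum_i h_i b_i + \sum_{i<j} J_{ij}\, b_i b_j\Big),$$
with $h_i$ and $J_{ij}$ chosen to match the given means and covariances.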
1
vote
0
answers
88
views
How to evaluate the likelihood of a conditional MAXENT estimation?
Suppose I have a random variable $Y$ (the outcome) and a set of random variables $\mathbf{X}$ (the input variables). I don't have access to observations of the joint distribution of $P(Y, \mathbf{X})$,...
3
votes
0
answers
123
views
Generalization of Burg's Maximum Entropy Theorem
Burg's Theorem characterizes the form of an entropy-maximizing time series, subject to constraints on the autocorrelation. More precisely, the theorem states that the autoregressive Gaussian process $...
1
vote
0
answers
15
views
Complexity of Maximum Entropy Algorithm in Sentiment Analysis
Does anyone know how to calculate the complexity of the maximum entropy algorithm and of its implementation in sentiment analysis?
Please help me, because I haven't got a ...
3
votes
0
answers
108
views
What determines the functional form of maximum entropy constraints?
I'm familiar with the maximum entropy (ME) principle in statistical mechanics, where, for example, the Boltzmann distribution $p(\epsilon_i|\beta)$ is identified as the ME distribution constrained by ...
5
votes
1
answer
3k
views
Is there a relationship between Maximum Likelihood Estimation and the Maximum Entropy Principle?
I know that both techniques can be used to estimate a distribution from data, but I didn't see anything in common between the two, and I haven't found anything on the internet yet that relates the ...
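One well-known connection: for an exponential family $p_\theta(x) \propto \exp(\theta^\top f(x))$, the MLE solves the moment-matching equations
$$E_{p_{\hat\theta}}[f(X)] = \frac{1}{n} \sum_{j=1}^n f(x_j),$$
which is exactly the constraint set of the maxent problem whose moments are set to their empirical values; by convex duality the two procedures pick out the same distribution.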
2
votes
0
answers
117
views
Maximum entropy discrete distributions with specified mean
Consider a discrete distribution on $\{1, \dots, n\}$ with mean given as $m$; what is the maximum entropy distribution?
I know it takes the form $p_{X}(k)=ar^{k}$ and is a geometric distribution when $n$ is ...
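A minimal numerical sketch (assuming $1 < m < n$; the form $p_X(k) = a r^k$ is from the question, the solver below is illustrative):

```python
import numpy as np
from scipy.optimize import brentq

# Maxent distribution on {1,...,n} with mean m has the form p_k ∝ r^k
# (r = 1 recovers the uniform distribution, with mean (n+1)/2).
# The mean is increasing in r, so solve for log r with a root-finder.
def maxent_on_range(n, m):
    k = np.arange(1, n + 1)
    def weights(log_r):
        w = np.exp(k * log_r - np.max(k * log_r))  # numerically stabilized r^k
        return w / w.sum()
    log_r = brentq(lambda t: weights(t) @ k - m, -50.0, 50.0)
    return weights(log_r)

p = maxent_on_range(10, 3.0)
print(p.round(4), p @ np.arange(1, 11))  # mean ≈ 3.0
```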
3
votes
0
answers
91
views
entropy regularization in generative model
I am wondering if it is possible to use entropy as a regularization in a generative model. For example, in the conjugate model where $x_i \in X$ is observed data and generated from a Normal ...
2
votes
0
answers
321
views
If a zero entropy distribution implies high information a priori, what does it mean ex posteriori?
The following counteracts the statements made for the maximum entropy principle case in order to posit a pseudo "minimum entropy principle" case that is simply the polar opposite of the ...
0
votes
1
answer
639
views
Which has minimum concentration: the uniform distribution or the maximum entropy distribution?
For a continuous random variable, the uniform distribution has high entropy because it demonstrates the greatest level of uncertainty.
However, this conflicts with the maximum entropy principle, which ...
2
votes
0
answers
223
views
Multiplying vector by the covariance matrix only known approximately
(cross-posted on math.SE)
For random variable $(x,y)$ in $\mathbb{R}^{2d}$ and vector $v$, I need to perform the following operation on a $d \times d$ covariance matrix $E[xy']$
$$T(v)=E[xy']v$$
The ...
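A minimal sketch of one common workaround (my own illustration, not necessarily what the asker needs): with i.i.d. samples, $E[xy']v = E[x \,(y^\top v)]$, so the $d \times d$ matrix never has to be formed:

```python
import numpy as np

def apply_cov(X, Y, v):
    # Estimate E[x y'] v from samples via E[x y'] v = E[x * (y . v)]:
    # weight each sample x_i by the scalar y_i . v, then average.
    # X, Y: (num_samples, d) arrays; v: (d,) vector.
    return (X * (Y @ v)[:, None]).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 3))
Y = X + rng.normal(size=(100_000, 3))  # correlated with X, so E[xy'] = I
v = np.array([1.0, 2.0, 3.0])
print(apply_cov(X, Y, v))              # ≈ E[xy'] v = v in this toy example
```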
3
votes
2
answers
578
views
How does a distribution's differential entropy correspond to its moments?
The Gaussian distribution maximizes entropy for the following functional constraints
$$E(x) = \mu$$
and
$$E((x-\mu)^2) = \sigma^2$$
which are just its first and second statistical moments (true ...
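For the Gaussian the correspondence is explicit: its differential entropy depends only on the second central moment,
$$h(X) = \frac{1}{2}\log(2\pi e \sigma^2) \quad \text{for } X \sim \mathcal{N}(\mu, \sigma^2).$$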
4
votes
1
answer
2k
views
Does minimizing KL-divergence result in maximum entropy principle?
The Kullback-Leibler divergence (or relative entropy) is a measure of how a probability distribution differs from another reference probability distribution. I want to know what connection it has to ...
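On a finite alphabet of size $N$, the link is an identity: with $u$ the uniform distribution,
$$D_{\mathrm{KL}}(p \,\|\, u) = \log N - H(p),$$
so minimizing KL divergence to the uniform reference is the same as maximizing entropy; with a non-uniform reference one gets the minimum relative entropy principle instead.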
2
votes
1
answer
3k
views
What does maximizing mutual information do?
In information theory, there is something called the maximum entropy principle. Are other information measures, such as mutual information, also commonly maximized? If mutual information describes the ...
1
vote
2
answers
1k
views
Do all random variables' probability distributions have entropy?
Entropy of a probability distribution is the weighted average of the negative log probabilities of each observation of a random variable. Does this mean that every random variable that has a probability ...
6
votes
1
answer
980
views
Why do we want a maximum entropy distribution, if it has the lowest information?
It is said that the distribution with the largest entropy should be chosen as the least-informative default. That is, we should choose the distribution that maximizes entropy because it has the lowest ...
1
vote
0
answers
38
views
Am I understanding correctly the Maximum Entropy concept using a sentence?
In the sentence "The house is white", each word carries a different amount of information. If I remove the "The" from the sentence, almost nothing happens: you are still able to ...
3
votes
1
answer
198
views
Continuous Entropy and Maximum Entropy Solution
This is a problem that I have been working on and the mathematics of it have me fairly stumped.
I am given the continuous entropy for a density $p(x)$. It is $H(X)=-\int_{0}^{\infty}p(x)\text{log}\: ...
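For comparison, the classic instance of this problem: maximizing $H$ subject to $\int_0^\infty p(x)\,dx = 1$ and $\int_0^\infty x\,p(x)\,dx = \mu$ yields the exponential density
$$p(x) = \frac{1}{\mu} e^{-x/\mu}, \qquad x \ge 0.$$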
2
votes
1
answer
55
views
Is this implementation detail for solving maximum entropy on a computer correct?
I am currently looking at a paper by Mattos and Veiga, who describe an approach to solving the maximum entropy problem subject to linear constraints:
$$\begin{aligned}
\max_{p_i} -\sum_{i=1}^N p_i \...
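A minimal sketch of the standard dual approach to this problem class (not necessarily the Mattos-Veiga algorithm): the solution has the form $p_i \propto \exp(-(A^\top \lambda)_i)$, and $\lambda$ minimizes a smooth convex dual:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def maxent_linear(A, b):
    # Maximize -sum_i p_i log p_i  subject to  A p = b  and  sum_i p_i = 1.
    # The optimizer is p_i ∝ exp(-(A^T λ)_i); λ minimizes the convex dual below.
    def dual(lam):
        return logsumexp(-A.T @ lam) + lam @ b
    def grad(lam):
        s = -A.T @ lam
        p = np.exp(s - logsumexp(s))
        return b - A @ p
    res = minimize(dual, np.zeros(len(b)), jac=grad, method="BFGS")
    s = -A.T @ res.x
    return np.exp(s - logsumexp(s))

# Jaynes' loaded-die example: distribution on {1,...,6} with mean 4.5.
k = np.arange(1.0, 7.0)
p = maxent_linear(A=k[None, :], b=np.array([4.5]))
print(p.round(4), p @ k)  # probabilities increasing in k, mean ≈ 4.5
```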
2
votes
1
answer
633
views
When does the uniform distribution have maximum entropy instead of the normal distribution?
As far as I know, when we have just data and no constraints (other than that probabilities must add up to 1), the distribution that gives maximum entropy is the uniform distribution. But when we know the mean and ...
5
votes
0
answers
162
views
Why are $\mathbb{E}( \ln(x))$ and $\mathbb{E} ( \ln(1 - x))$ reasonable descriptions of knowledge about a beta distribution?
The max entropy philosophy states that given some constraints on the prior, we should choose the prior with maximum entropy subject to those constraints.
I know that the Beta($\alpha, \beta$) is ...
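The standard justification: the Beta density has $\ln f(x) = (\alpha - 1)\ln x + (\beta - 1)\ln(1 - x) - \ln B(\alpha, \beta)$, so $\ln x$ and $\ln(1 - x)$ are its sufficient statistics, and Beta($\alpha, \beta$) is the maxent density on $(0,1)$ subject to
$$E[\ln X] = c_1, \qquad E[\ln(1 - X)] = c_2.$$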
9
votes
1
answer
229
views
Do Lévy α-stable distributions maximize entropy subject to a simple constraint?
Is there a simple constraint on real-valued distributions such that the maximum entropy distribution is Lévy α-stable? Special cases include the Normal and Cauchy distributions for which the answer is ...
1
vote
1
answer
174
views
How do I prove conditional entropy is a good measure of information?
This question is a follow-up to Does "expected entropy" make sense?, which you don't have to read, as I'll reproduce the relevant parts. Let's begin with the statement of the problem
A student has ...
4
votes
0
answers
325
views
Are there nonparametric generative models for datasets?
Typically when I see generative models, e.g., Latent Dirichlet Allocation (JMLR) or Linear/Quadratic Discriminant Analysis (wikipedia LDA), they are probabilistic models that belong to the exponential ...
5
votes
2
answers
802
views
Intuition for the uniform distribution having the maximum entropy
I saw the following explanation for Entropy in probability:
(Entropy). The surprise of learning that an event with probability $p$ happened is defined as $\log_2(1/p)$, measured in a unit called ...
2
votes
1
answer
963
views
MaxEnt model vs cross entropy loss
Pardon my ignorance. I am still learning.
We try to minimize the cross-entropy loss for best results.
However, why should the entropy of a MaxEnt model be high for the model to be good?
My ...
0
votes
0
answers
30
views
Maximum entropy function, with f(0)=0?
I want to derive the Maximum Entropy distribution $f(x)$ with the following constraints: (1) non-negative, (2) specified mean, (3) specified variance, (4) $f(0)=0$. I know how to derive the MaxEnt distro with ...
1
vote
0
answers
19
views
Maximum entropy prior for dichotomous variables [closed]
I have a set of dichotomous variables $A, B, C,$... and I know their probabilities $P(A), P(B), P(C),$... as well as their pairwise dependencies $P(A \cap B), P(A \cap C), P(B \cap C),$... . Or in ...
1
vote
0
answers
55
views
Why are generative models in Machine Learning backed by the Boltzmann distribution?
I learned from this review paper that MaxEnt models naturally display a Boltzmann distribution for the data samples; it comes from the Principle of Maximum Entropy. But I could not understand why this ...