
Questions tagged [maximum-entropy]

Maximum entropy, or MaxEnt, is a statistical principle derived from information theory. Distributions maximizing entropy (under some constraints) are thought to be "maximally uninformative" given those constraints. Maximum entropy can be used for multiple purposes, such as choice of prior, choice of sampling model, or design of experiments.

0 votes
0 answers
12 views

Why is the formalism of the max entropy principle for the continuous case different from the discrete case?

In the discrete case (Chapter 11 of Jaynes' Probability Theory: The Logic of Science), the max entropy principle states that we encode our ignorance prior by solving a constrained optimization problem, ...
username123
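For orientation, a condensed statement of the two problems (following Jaynes, Chapters 11–12; the measure $m(x)$ below is his "limiting density of discrete points"):

```latex
% Discrete case: maximize Shannon entropy subject to expectation constraints.
\max_{p}\; -\sum_i p_i \log p_i
\quad\text{s.t.}\quad \sum_i p_i = 1, \;\; \sum_i p_i f_k(i) = F_k .

% Continuous case: -\int p \log p \, dx is not invariant under a change of
% variables, so the entropy is taken relative to an invariant measure m(x):
\max_{p}\; -\int p(x) \log \frac{p(x)}{m(x)} \, dx
\quad\text{s.t.}\quad \int p(x)\,dx = 1, \;\; \int p(x) f_k(x)\,dx = F_k .
```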
2 votes
1 answer
32 views

Which operations on distributions respect MaxEnt property?

It seems that the MaxEnt property of the log-normal distribution follows directly from the MaxEnt property of the normal. So for any Y, such that ...
uhbif19 • 123
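One way to see the log-normal case, sketched under the assumption that the intended operation is $Y = e^X$ with $X$ normal:

```latex
% For a smooth monotone map Y = g(X), differential entropy transforms as
h(Y) = h(X) + \mathbb{E}\big[ \log |g'(X)| \big].
% With g(x) = e^x the correction term is E[X] = E[ln Y], which is exactly one
% of the constraints in the transformed problem. So maximizing h(X) subject to
% E[X], E[X^2] (normal) is equivalent to maximizing h(Y) subject to E[ln Y],
% E[(ln Y)^2] (log-normal): the MaxEnt property transfers whenever the
% entropy-correction term is itself a constrained quantity.
```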
2 votes
0 answers
16 views

Maximum entropy distributions with more general constraints

Gibbs showed that for a space $X$ (assume finite for simplicity) and functions $f_i: X \to \mathbb{R}$, the maximum entropy distribution on $X$ subject to constraints on the expectations of the $f_i$ is the Boltzmann ...
user56834 • 2,987
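A minimal numerical sketch of the finite case (the toy space and target moments below are illustrative, not from the question): the dual of the MaxEnt problem is convex in the multipliers $\lambda$, and its minimizer yields the Boltzmann form $p(x) \propto \exp\big(\sum_i \lambda_i f_i(x)\big)$.

```python
# MaxEnt on a finite space X subject to E[f_i] = F_i, solved via the
# convex dual  log Z(lam) - lam . F.
import numpy as np
from scipy.optimize import minimize

X = np.arange(6)                      # toy space X = {0, ..., 5}
f = np.stack([X, X**2])               # constraint statistics f_1(x)=x, f_2(x)=x^2
F = np.array([2.0, 6.0])              # illustrative targets E[X]=2, E[X^2]=6

def dual(lam):
    logits = lam @ f                  # sum_i lam_i f_i(x), one entry per x
    return np.logaddexp.reduce(logits) - lam @ F   # log Z(lam) - lam . F

lam = minimize(dual, x0=np.zeros(2), method="BFGS").x
logits = lam @ f
p = np.exp(logits - np.logaddexp.reduce(logits))   # Boltzmann/Gibbs form
print(p, f @ p)                       # f @ p recovers F up to solver tolerance
```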
1 vote
0 answers
31 views

Solve for maximum entropy conditional probability

I'm new to the max-ent principle and functional derivatives. I have a known joint data distribution $p_D(x,y)$ (where $y$ is regarded as the labels) and a latent variable model $(x,y,z)$ with the prior $p(z)...
Kaiwen • 211
0 votes
0 answers
48 views

Under What Conditions Does a Gaussian Mixture Model (GMM) Have Maximum Entropy?

I'm delving into Gaussian Mixture Models (GMMs) within unsupervised learning frameworks and am particularly interested in their statistical properties, with a focus on entropy. Entropy ...
Alireza • 113
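One relevant fact: a GMM's entropy has no closed form, so it is usually bounded or estimated. A minimal Monte Carlo sketch (the weights, means, and scales are illustrative):

```python
# Monte Carlo estimate of H(p) = -E_p[log p(X)] for a 1-D Gaussian mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w, mu, sd = np.array([0.3, 0.7]), np.array([-2.0, 1.0]), np.array([1.0, 0.5])

n = 100_000
comp = rng.choice(len(w), size=n, p=w)      # sample a component index per draw
x = rng.normal(mu[comp], sd[comp])          # then sample from that component

dens = (w * norm.pdf(x[:, None], mu, sd)).sum(axis=1)   # mixture density at x
print(f"entropy estimate: {-np.log(dens).mean():.4f} nats")
```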
1 vote
0 answers
35 views

Exponential families as families of limit distributions of Markov processes

An exponential family verifies a maximum entropy property: each density is the maximum entropy density given the expectation of its sufficient statistic. On the other hand, from my understanding, the ...
Chevallier
1 vote
0 answers
31 views

Maximum Entropy distribution of a ticking clock

Say I have a clock that emits "ticks". An ideal clock looks like a Dirac comb. It has: perfect periodicity of ticks (there is a precise, fixed time interval between any two consecutive ticks)...
kram1032 • 277
0 votes
0 answers
67 views

Minimizing cross entropy over a restricted domain?

Suppose $f(x;q)$ is the true distribution. The support of the random variable $X$ is $\Omega$. Suppose I am interested in a particular subset $\Xi \subset \Omega$. I would like to minimize the ...
entropy • 19
2 votes
1 answer
198 views

When and how was the Bernoulli distribution with real binomial proportion introduced?

I certainly should read Jakob Bernoulli's Ars Conjectandi again but let me share my concerns. I'm just wondering when and how the Bernoulli distribution $Be(p)$ (and related distributions like the ...
Student • 39
4 votes
1 answer
103 views

Does every distribution family have a set of maximum entropy constraints?

I am reflecting on these examples of maximum entropy distributions. I am (pleasantly) surprised that various common distribution families have maximum entropy constraints. It got me wondering if ...
Galen • 9,680
1 vote
1 answer
103 views

Is the principle of maximum entropy misleading?

If a distribution belongs to a certain class, then the distribution with the largest entropy in that class is typically referred to as the least-informative distribution. To me, this is highly ...
Mr Saltine
2 votes
1 answer
226 views

What is the reasoning behind max entropy constraints for the gamma distribution?

The max entropy method is a way of deriving a probability distribution given only the information you know about how the data are distributed and nothing more. For example, the normal distribution can ...
Davey • 171
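A brief note on where the gamma constraints come from: read the density as a Gibbs form, and the constrained statistics are whatever the Lagrange multipliers multiply in the exponent.

```latex
% On (0, \infty) the gamma density is a Gibbs form,
f(x) \propto \exp\big( (k-1)\ln x - x/\theta \big),
% so it maximizes entropy subject to fixing the two expectations that appear
% linearly in the exponent:
\mathbb{E}[X] = k\theta, \qquad \mathbb{E}[\ln X] = \psi(k) + \ln\theta,
% where \psi is the digamma function.
```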
8 votes
2 answers
580 views

Jaynes' Description of Maximum Entropy Distribution

I am reading E. T. Jaynes' probability theory book, and I am at chapter 11 where he introduces the maximum entropy principle. I understand that Jaynes separates the notion of probability from that of ...
Feri • 187
1 vote
1 answer
150 views

How can we use Shannon entropy to discriminate between two similar probability distribution functions?

I studied two papers related to discriminating between two similar distributions using Shannon entropy. But both of them had different views. Can anyone explain what would be the basic flow of idea to ...
nishant • 11
5 votes
1 answer
2k views

Choosing "Target Entropy" for Soft-Actor-Critic (SAC) algorithm

I am quite familiar with Soft-Actor-Critic (SAC) and its many applications in continuous control RL environments. However, when implementing this algorithm in a practical setting, one thing that still ...
Alerra • 205
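For reference, the heuristic most implementations use (from Haarnoja et al., 2018) sets the target entropy to $-\dim(\mathcal{A})$ for continuous actions and tunes the temperature $\alpha$ by gradient descent. A minimal sketch; `log_prob` stands for $\log \pi(a|s)$ of actions sampled from the current policy, and the dimension and learning rate are illustrative:

```python
# Automatic temperature tuning in SAC with the common
# target_entropy = -dim(action space) heuristic.
import torch

action_dim = 6                              # illustrative, e.g. a locomotion task
target_entropy = -float(action_dim)

log_alpha = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_step(log_prob: torch.Tensor) -> float:
    # log_prob: log pi(a|s) for a batch of freshly sampled actions
    loss = -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return log_alpha.exp().item()           # current temperature alpha
```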
1 vote
0 answers
86 views

Discrete Bayes Net learning under parameter constraints

What is some relevant research available on estimating the parameters of a Bayes Net (with known structure) when there are known constraints on conditional and marginal probabilities? For example, ...
Innuo • 1,168
2 votes
1 answer
209 views

Highest Entropy Distribution on $[0,\infty)$ given Mean and Variance, with $p(0) = 0$?

I am dealing with temperature measurements, and normally we assume the probability of getting a measurement $t_i$ with a certain uncertainty $\sigma_t$ given the model ('true' value) $M(x_i)$ (where $...
Craig • 133
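A note on the standard device for this kind of problem: $p(0)=0$ is a pointwise condition, not an expectation constraint, so it does not enter the Lagrangian directly; adding a constraint on $E[\ln X]$ does the job.

```latex
% With constraints on E[X], E[X^2], and E[ln X], the MaxEnt density on (0, \infty) is
p(x) \propto x^{\lambda_3} \exp\big( -\lambda_1 x - \lambda_2 x^2 \big),
% and any \lambda_3 > 0 forces p(x) -> 0 as x -> 0+, while \lambda_1 and
% \lambda_2 are fixed by the mean and variance.
```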
3 votes
0 answers
93 views

Maximum entropy distribution of a positive continuous variable with known mean and vanishing probability at 0

I am working on a problem where I know that the variable of concern $x$ is positive, has no upper bound on its value, and has probability vanishing as we approach 0, $\lim_{x \rightarrow 0^+} ...
Ishan Kashyap Hazarika
3 votes
1 answer
34 views

How to statistically detect a threshold effect on a dependent variable measured repeatedly on the same population

I want to identify the level of a predictive variable X (with a Gaussian distribution) able to induce a reduction in a variable y (with a Poisson distribution) that has been measured over the same ...
Agus Camacho
0 votes
0 answers
402 views

Computing the gradient of the log-partition function in a linear-chain conditional random field (CRF) model

Query. When computing the gradient of the log-partition function for an exponential family distribution specified by the linear-chain conditional random field (CRF) model, will unary conditional ...
microhaus • 2,630
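For reference, the general exponential-family identity behind the question:

```latex
% For p_\theta(y \mid x) = \exp\big( \theta^\top \phi(x, y) - A(\theta, x) \big),
\nabla_\theta A(\theta, x) = \mathbb{E}_{p_\theta(y \mid x)}\big[ \phi(x, y) \big].
% In a linear-chain CRF, \phi decomposes into unary and pairwise features, so
% this expectation reduces to node and edge marginals, computable with the
% forward-backward algorithm.
```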
3 votes
0 answers
431 views

Geometric distribution and entropy

According to Wikipedia, among all discrete probability distributions supported on $\{1, 2, 3, \dots\}$ with given expected value $\mu$, the geometric distribution $X$ with parameter $p = \frac{1}{\mu}$ ...
ABK • 668
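The derivation behind the Wikipedia claim is short enough to sketch:

```latex
% Maximize -\sum_k p_k \log p_k on k = 1, 2, 3, ... subject to
% \sum_k p_k = 1 and \sum_k k\, p_k = \mu. Stationarity of the Lagrangian gives
p_k \propto e^{-\lambda k} = r^k,
% i.e. p_k = (1 - r)\, r^{k-1}: a geometric distribution. Matching the mean,
\frac{1}{1 - r} = \mu \;\Longrightarrow\; r = 1 - \tfrac{1}{\mu},
% which is success parameter p = 1/\mu.
```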
1 vote
0 answers
118 views

What is the maximum entropy joint Bernoulli distribution with fixed covariances and individual means?

We have Bernoulli variables $B_i$ with known means $E(B_i)$ and covariance matrix $\Sigma = (cov(B_i, B_j))$. What joint distribution would have the maximum entropy?
Solveit • 111
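For context, with means and pairwise covariances fixed, the Gibbs form of the answer is the Ising model (equivalently, a fully visible Boltzmann machine); computing the multipliers is the hard part.

```latex
% Constraints on E[B_i] and E[B_i B_j] (equivalent to the covariances, given
% the means) give the MaxEnt joint distribution
P(b_1, \dots, b_n) \propto \exp\Big( \sum_i h_i b_i + \sum_{i < j} J_{ij} b_i b_j \Big),
% with h_i and J_{ij} the Lagrange multipliers fixed by moment matching.
```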
1 vote
0 answers
88 views

How to evaluate the likelihood of a conditional MAXENT estimation?

Suppose I have a random variable $Y$ (the outcome) and a set of random variables $\mathbf{X}$ (the input variables). I don't have access to observations of the joint distribution of $P(Y, \mathbf{X})$,...
Sergio • 336
3 votes
0 answers
123 views

Generalization of Burg's Maximum Entropy Theorem

Burg's Theorem characterizes the form of an entropy-maximizing time series, subject to constraints on the autocorrelation. More precisely, the theorem states that the autoregressive Gaussian process $...
Simon Segert • 2,114
1 vote
0 answers
15 views

Complexity of Maximum Entropy Algorithm in Sentiment Analysis

Does anyone know how to calculate the complexity of the maximum entropy algorithm and how to implement it in sentiment analysis? Please help me, because I haven't got a ...
Triska Pangaribuan
3 votes
0 answers
108 views

What determines the functional form of maximum entropy constraints?

I'm familiar with the maximum entropy (ME) principle in statistical mechanics, where, for example, the Boltzmann distribution $p(\epsilon_i|\beta)$ is identified as the ME distribution constrained by ...
marnix • 88
5 votes
1 answer
3k views

Is there a relationship between Maximum Likelihood Estimation and the Maximum Entropy Principle?

I know that both techniques can be used to estimate a distribution from data, but I didn't see anything in common between the two, and I haven't found anything on the internet yet that relates the ...
Raphael Augusto
2 votes
0 answers
117 views

Maximum entropy discrete distributions with specified mean

Consider a discrete distribution on $\{1, \dots, n\}$ with mean given as $m$: what is the maximum entropy distribution? I know it takes the form $p_{X}(k)=ar^{k}$ and is a geometric distribution when $n$ is ...
triplester
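A minimal numerical sketch for this setup: the solution is a truncated geometric $p_k \propto r^k$, and $r$ can be found by matching the mean (at $m = (n+1)/2$ it degenerates to the uniform distribution, i.e. $r = 1$).

```python
# MaxEnt pmf on {1, ..., n} with prescribed mean m: p_k ∝ r^k, r by root-finding.
import numpy as np
from scipy.optimize import brentq

def mean_given_r(r, n):
    k = np.arange(1, n + 1)
    w = r ** k
    return (k * w).sum() / w.sum()

def maxent_pmf(n, m):
    # mean_given_r is increasing in r, from ~1 (r -> 0) to ~n (r large),
    # so any 1 < m < n is bracketed.
    r = brentq(lambda r: mean_given_r(r, n) - m, 1e-9, 1e6)
    k = np.arange(1, n + 1)
    w = r ** k
    return w / w.sum()

p = maxent_pmf(n=10, m=3.0)
print(p, (np.arange(1, 11) * p).sum())   # second value ≈ 3.0
```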
3 votes
0 answers
91 views

Entropy regularization in a generative model

I am wondering if it is possible to use entropy as a regularization in a generative model. For example, in the conjugate model where $x_i \in X$ is observed data and generated from a Normal ...
JYY • 767
2 votes
0 answers
321 views

If a zero entropy distribution implies high information a priori, what does it mean a posteriori?

The following counteracts the statements made for the maximum entropy principle in order to posit a pseudo "minimum entropy principle" that is simply the polar opposite of the ...
develarist • 4,049
0 votes
1 answer
639 views

Which has minimum concentration: the uniform distribution or the maximum entropy distribution?

For a continuous random variable, the uniform distribution has high entropy because it demonstrates the greatest level of uncertainty. However, this conflicts with the maximum entropy principle, which ...
develarist • 4,049
2 votes
0 answers
223 views

Multiplying vector by the covariance matrix only known approximately

(cross-posted on math.SE) For random variable $(x,y)$ in $\mathbb{R}^{2d}$ and vector $v$, I need to perform the following operation on a $d \times d$ covariance matrix $E[xy']$ $$T(v)=E[xy']v$$ The ...
Yaroslav Bulatov
3 votes
2 answers
578 views

How does a distribution's differential entropy correspond to its moments?

The Gaussian distribution maximizes entropy for the following functional constraints $$E(x) = \mu$$ and $$E((x-\mu)^2) = \sigma^2$$ which are just its first and second statistical moments (true ...
develarist • 4,049
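The standard link between moments and entropy goes through non-negativity of KL divergence; a sketch for the Gaussian case:

```latex
% Let q = N(\mu, \sigma^2) and let p share its mean and variance. Since
% -\log q(x) is quadratic in x, the cross-entropy E_p[-\log q(X)] depends on p
% only through those two moments, so it equals E_q[-\log q(X)] = h(q). Then
0 \le D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_p[-\log q(X)] - h(p) = h(q) - h(p),
% so h(p) \le h(q): a distribution is the MaxEnt member of a moment class
% exactly when -\log of its density is affine in the constrained statistics.
```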
4 votes
1 answer
2k views

Does minimizing KL-divergence result in maximum entropy principle?

The Kullback-Leibler divergence (or relative entropy) is a measure of how a probability distribution differs from another reference probability distribution. I want to know what connection it has to ...
develarist • 4,049
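The connection, in one line for the finite case:

```latex
% With u the uniform distribution on n outcomes,
D_{\mathrm{KL}}(p \,\|\, u) = \sum_i p_i \log \frac{p_i}{1/n} = \log n - H(p),
% so maximizing entropy under constraints is the same as minimizing KL
% divergence to the uniform reference under those constraints; replacing u by
% a general reference measure gives the minimum-relative-entropy principle.
```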
2 votes
1 answer
3k views

What does maximizing mutual information do?

In information theory, there is something called the maximum entropy principle. Are other information measures, such as mutual information, also commonly maximized? If mutual information describes the ...
develarist • 4,049
1 vote
2 answers
1k views

Do all random variables' probability distributions have entropy?

The entropy of a probability distribution is the negative weighted average of the log probabilities of each observation of a random variable. Does this mean that every random variable that has a probability ...
develarist • 4,049
6 votes
1 answer
980 views

Why do we want a maximum entropy distribution, if it has the lowest information?

It is said that the distribution with the largest entropy should be chosen as the least-informative default. That is, we should choose the distribution that maximizes entropy because it has the lowest ...
develarist • 4,049
1 vote
0 answers
38 views

Am I understanding correctly the Maximum Entropy concept using a sentence?

In the sentence "The house is white", each word carries a different amount of information. If I remove the "The" from the sentence, almost nothing happens: you are still able to ...
perep1972
3 votes
1 answer
198 views

Continuous Entropy and Maximum Entropy Solution

This is a problem that I have been working on and the mathematics of it have me fairly stumped. I am given the continuous entropy for a density $p(x)$. It is $H(X)=-\int_{0}^{\infty}p(x)\text{log}\: ...
pflykyle • 103
2 votes
1 answer
55 views

Is this implementation detail for solving maximum entropy on a computer correct?

I am currently looking at a paper by Mattos and Veiga, who describe an approach to solving the maximum entropy problem subject to linear constraints: $$\begin{aligned} \max_{p_i} -\sum_{i=1}^N p_i \...
stats_model • 2,515
2 votes
1 answer
633 views

When does the uniform distribution have maximum entropy instead of the normal distribution?

As far as I know, when we have just data and no constraints (other than probabilities must add up to 1), the distribution that gives maximum entropy is uniform distribution. But when we know mean and ...
ikadorus
5 votes
0 answers
162 views

Why are $\mathbb{E}( \ln(x))$ and $\mathbb{E} ( \ln(1 - x))$ reasonable descriptions of knowledge about a beta distribution?

The max entropy philosophy states that given some constraints on the prior, we should choose the prior that has maximum entropy subject to those constraints. I know that the Beta($\alpha, \beta$) is ...
Elle Najt • 221
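A Gibbs-form reading of the Beta density makes the choice of constraints less mysterious:

```latex
% On (0, 1),
f(x) \propto \exp\big( (\alpha - 1)\ln x + (\beta - 1)\ln(1 - x) \big),
% so Beta(\alpha, \beta) is the MaxEnt distribution on (0, 1) subject to fixed
% E[\ln X] and E[\ln(1 - X)]: those two expectations are the sufficient
% statistics, hence the natural summary of knowledge for this family.
```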
9 votes
1 answer
229 views

Do Lévy α-stable distributions maximize entropy subject to a simple constraint?

Is there a simple constraint on real-valued distributions such that the maximum entropy distribution is Lévy α-stable? Special cases include the Normal and Cauchy distributions for which the answer is ...
fritzo • 240
1 vote
1 answer
174 views

How do I prove conditional entropy is a good measure of information?

This question is a follow-up of Does "expected entropy" make sense?, which you don't have to read as I'll reproduce the relevant parts. Let's begin with the statement of the problem A student has ...
nalzok • 1,817
4 votes
0 answers
325 views

Are there nonparametric generative models for datasets?

Typically when I see generative models, e.g., Latent Dirichlet Allocation (JMLR) or Linear/Quadratic Discriminant Analysis (wikipedia LDA), they are probabilistic models that belong to the exponential ...
Sleepy 17
5 votes
2 answers
802 views

Intuition for the uniform distribution having the maximum entropy

I saw the following explanation for Entropy in probability: (Entropy). The surprise of learning that an event with probability $p$ happened is defined as $\log_2(1/p)$, measured in a unit called ...
Dom Fomello
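For the finite-support case the maximization itself is one Jensen step:

```latex
% For p on n outcomes, since log is concave,
H(p) = \sum_i p_i \log \frac{1}{p_i} \;\le\; \log \sum_i p_i \cdot \frac{1}{p_i} = \log n,
% with equality iff all p_i = 1/n: the uniform distribution is the unique
% maximizer when only normalization is constrained.
```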
2 votes
1 answer
963 views

MaxEnt model vs cross entropy loss

Pardon my ignorance. I am still learning. We try to minimize the cross-entropy loss for best results. However, why should the entropy be high for a MaxEnt model to be good? My ...
sdk • 21
0 votes
0 answers
30 views

Maximum entropy function, with f(0)=0?

I want to derive the maximum entropy distribution $f(x)$ with the following constraints: (1) non-negative, (2) specified mean, (3) specified variance, (4) $f(0)=0$. I know how to derive the MaxEnt distro with ...
AAndersson
1 vote
0 answers
19 views

Maximum entropy prior for dichotomous variables [closed]

I have a set of dichotomous variables $A, B, C, \dots$ and I know their probabilities $P(A), P(B), P(C), \dots$ as well as their pairwise dependencies $P(A \cap B), P(A \cap C), P(B \cap C), \dots$. Or in ...
Maximilian
1 vote
0 answers
55 views

Why are generative models in Machine Learning Boltzmann distribution-backed?

I learned from this review paper that MaxEnt models naturally display a Boltzmann distribution for the data samples; this comes from the Principle of Maximum Entropy. But I could not understand why this ...
matte • 11