Questions tagged [information-theory]
The science of compressing and communicating information. It is a branch of applied mathematics and electrical engineering. Though originally the focus was on digital communications and computing, it now finds wide use in biology, physics and other sciences.
2,531 questions
0 votes · 0 answers · 21 views
How to calculate the channel capacity of the transmit model?
I recently faced the problem of working out the channel capacity of the following transmit model. The received signal
$$
{\boldsymbol{y}}=\boldsymbol{h}\left(\boldsymbol{Fx}\otimes\boldsymbol{I}_{N_r}\...
-1 votes · 0 answers · 33 views
Regarding Kolmogorov complexity, I need to show the following theorem.
The original text is in Japanese but I used ChatGPT to translate it.
For any natural number c, there exists a natural number n such that K(n)>c.
For any consistent axiomatic system T capable of ...
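For the first claim, a standard counting argument suffices (a sketch, not necessarily the proof the translated text intends): there are at most $\sum_{\ell=0}^{c} 2^{\ell} = 2^{c+1} - 1$ programs of length at most $c$, so
$$\#\{\, n \in \mathbb{N} : K(n) \le c \,\} \;\le\; 2^{c+1} - 1 \;<\; \infty,$$
and any natural number outside this finite set satisfies $K(n) > c$.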
1 vote · 1 answer · 54 views
The relation between nonadjacency-preserving maps and the fractional chromatic number
In the paper "Graph information ratio", $\phi:V(G^k)\to V(H^n)$ is nonadjacency preserving if for any $g\not\sim g'$, $\phi(g)\not\sim \phi(g')$, where $\not\sim$ means not adjacent and distinct. ...
1 vote · 0 answers · 85 views
Imperfect coding theory problem: Locating $j$ and determining the best constant $k_{q}$ [closed]
Let $\mathbf{x}= \left ( x_{1}, x_{2}, \ldots, x_{n} \right )\in\left\{ 0, 1, \ldots, q- 1 \right\}^{n}$ be unknown, but its syndrome is given:
$$\operatorname{Syn}\left ( \mathbf{x} \right )= \sum\...
-1 votes · 0 answers · 31 views
What is the graph information ratio when one of the graphs is complete
In the paper "Graph Information Ratio", the information ratio is defined to be $\textrm{Ir}(H/G)=\sup \{k/n:\exists \textrm{ a nonadjacency preserving mapping from } G^k \textrm{ to } H^n\}$,...
0 votes · 2 answers · 53 views
How to get an understanding of information entropy.
In my book it says:
If we are told that a highly improbable event has occurred, we will have received more information than if we were told that some likely event has just occurred.
I have come up ...
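A minimal numeric illustration of the book's claim (the probabilities are made up):

```python
import math

def surprisal(p: float) -> float:
    """Shannon information content of an event with probability p, in bits."""
    return -math.log2(p)

# An improbable event carries more information than a likely one:
print(surprisal(0.01))   # ~6.64 bits
print(surprisal(0.99))   # ~0.014 bits
```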
0 votes · 0 answers · 10 views
$\Omega(\log T)$ minimax regret lower bound for a 2-armed bandit with Uniform rewards
I'm studying multi-armed bandit problems and encountered the following scenario:
We have a 2-armed bandit where each arm $ i $ yields rewards distributed as
$$
X_i = \mu_i + \text{Unif}[-1, 1],
$$
...
1 vote · 2 answers · 55 views
Uniformity of X Given a Doubly Stochastic Conditional Matrix
I have two discrete finite random variables $X$ and $Y$, both defined over the same finite domain. The conditional distribution of $Y$ given $X$ is represented by a square matrix $W$, where the $x$-th ...
1 vote · 0 answers · 35 views
What is the maximum subspace of $\mathbb{F}_2^n$ that only contains $0$ and elements of weight larger than $w$? [closed]
For element $x$ in binary linear space $\mathbb{F}_2^n$, $\mathrm{wt}(x)$ is the weight function. Define the set
$$A := \{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \ge w\}. $$
What is the largest ...
0 votes · 0 answers · 32 views
Information content of a square wave with random cycles
How can I measure the information content of a square wave with random cycles?
The signal takes the values 0 and 1, but the time it remains in each state varies, so the time is ...
0 votes · 1 answer · 37 views
Behavior of argmax of dependent random variables
Suppose $X_1,\ldots,X_N$ are dependent random variables with the same (continuous) support. They do not necessarily have the same distribution. Let $Y$ be $\operatorname{argmax}(X_1,\ldots,X_N)$, so with ...
0 votes · 1 answer · 58 views
Polar codes and Reed-Muller codes
Can someone explain the construction, or give a definition, of polar codes, say for the binary symmetric channel? [I seriously can't even find a definition.] I'd like an explanation, if possible, of ...
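A minimal sketch of the encoder side, assuming Arikan's construction $x = uG$ with $G$ the $n$-fold Kronecker power of the $2\times 2$ kernel (bit-reversal permutation omitted); the channel-dependent step of choosing which coordinates of $u$ to freeze for the BSC is exactly what the question asks about and is not shown:

```python
import numpy as np

F = np.array([[1, 0], [1, 1]])  # Arikan's 2x2 polarization kernel

def polar_generator(n: int) -> np.ndarray:
    """n-fold Kronecker power of F: the generator of a length-2^n polar transform."""
    G = np.array([[1]])
    for _ in range(n):
        G = np.kron(G, F)
    return G % 2

# Hypothetical N = 8 example: some positions carry data, the rest are frozen to 0.
u = np.array([0, 0, 0, 1, 0, 1, 1, 0])
x = (u @ polar_generator(3)) % 2   # the transmitted codeword
print(x)
```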
1 vote · 0 answers · 47 views
Inequality between $f$-divergences and Fisher information
I am wondering if there exists a general global inequality between some $f$-divergence and Fisher information of the form:
$$\forall \theta_0, \theta_1 ~ D_f(p_{\theta_0} || p_{\theta_1}) \leq C (\...
0 votes · 0 answers · 21 views
Reversing data processing inequality for f-divergences for certain types of corruptions
Consider a random variable $X$. Let $Z \sim \mathcal{N}(0, \sigma^2)$ and let $Y = X + Z$. For distributions $p_X, q_X$ and $p_Y, q_Y$, the data processing inequality gives us
$$D_f(p_Y \| q_Y) \leq D_f(p_X \| ...
0 votes · 1 answer · 58 views
Lipschitzness of the gradient of the KL divergence $KL(P||Q_\phi)$ where $P$ is fixed and $Q$ is parameterized by $\phi$
I am considering KL divergence $\text{KL}(P||Q_\phi)$ between two continuous distributions $P$ and $Q$ where $P$ is fixed and $Q$ is a Gaussian parameterized by $\phi$. Is it possible to prove that ...
0 votes · 1 answer · 32 views
How to derive feature importance based on information?
Suppose we have features $x_1,\ldots,x_n$ and target $y$, and we want to determine the importance of each feature $x_i$ for $y$. Define a quantity (based on information) like $I(y,x_1,\ldots,x_n) = \log_2(P(y|x_1,\ldots$ ...
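A common information-based variant of this idea is to rank each feature by its mutual information with the target. A minimal self-contained sketch (the toy data and the per-feature ranking are illustrative assumptions, not the asker's joint quantity):

```python
import math
from collections import Counter

def mutual_information(xs, ys) -> float:
    """I(X;Y) in bits, estimated from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

y  = [0, 0, 1, 1, 1, 0, 1, 0]
x1 = [0, 0, 1, 1, 1, 0, 1, 1]   # mostly tracks y -> high MI
x2 = [0, 1, 0, 1, 0, 0, 1, 1]   # independent of y -> zero MI
print(mutual_information(y, x1), mutual_information(y, x2))
```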
2 votes · 1 answer · 105 views
Interpreting "Log Probability" in Optimization/Statistics/Machine Learning
It seems like in machine learning we work with log probabilities a lot, e.g., minimizing or maximizing the log probability. This makes sense from a numerical perspective, since adding is easier than multiplying ...
2 votes · 0 answers · 62 views
On the Abstract Structure Behind Oracle Problems
Here is what I mean by an 'Oracle Problem'.
A universe set or class $U$, a family of queries $\mathcal{Q}$ consisting of partitions of $U$, and a target element $t\in U$ are fixed. You're not aware of the target ...
0 votes · 0 answers · 35 views
Given the channel transition matrix $\mathbf{P}$ of a DMC, does the capacity of $\mathbf{P}$ equal that achieved with $\mathbf{P}^T$?
I'm a bit confused with this.
Since $I(X;Y) = I(Y;X)$, at first I thought it implies that Capacity($\mathbf{P}$) = Capacity($\mathbf{P}^T$), i.e., the capacity achieved with a transition matrix, ...
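The question can at least be probed numerically with the Blahut-Arimoto algorithm; a minimal sketch (note that $\mathbf{P}^T$ is a valid transition matrix only when the columns of $\mathbf{P}$ also sum to one, i.e. $\mathbf{P}$ is doubly stochastic, so the example uses such a matrix):

```python
import numpy as np

def capacity(P: np.ndarray, iters: int = 2000) -> float:
    """Blahut-Arimoto estimate of DMC capacity in bits; P[x, y] = p(y | x)."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])       # input distribution
    for _ in range(iters):
        q = p @ P                                    # induced output distribution
        # d[x] = D( P[x, :] || q ) in bits, skipping zero entries of P
        d = np.sum(P * np.log2(P / q, where=P > 0, out=np.zeros_like(P)), axis=1)
        p = p * np.exp2(d)                           # Blahut-Arimoto update
        p /= p.sum()
    q = p @ P
    d = np.sum(P * np.log2(P / q, where=P > 0, out=np.zeros_like(P)), axis=1)
    return float(p @ d)

# A doubly stochastic 3x3 channel, so that P.T is also a channel:
P = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.4, 0.5],
              [0.3, 0.3, 0.4]])
print(capacity(P), capacity(P.T))   # compare the two numerically
```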
0 votes · 1 answer · 48 views
Optimal coding for uniform distribution?
I have asked this on Computer Science Stack Exchange, here: Optimal coding for uniform distribution. My answer to the question only provides intuition; I fail to make it mathematically rigorous. The ...
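For the record, the intuition is exact precisely when the alphabet size is a power of two; a small worked comparison (the $n=5$ Huffman lengths are easy to verify by hand):
$$H(X) = \log_2 n, \qquad \bar{L} = \begin{cases} \log_2 n, & n = 2^k \text{ (fixed-length code)},\\[2pt] \frac{2+2+2+3+3}{5} = 2.4 > \log_2 5 \approx 2.322, & n = 5 \text{ (Huffman lengths } 2,2,2,3,3\text{)}. \end{cases}$$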
1 vote · 1 answer · 68 views
Conditional expectation of the second occurrence of an $n$-digit pattern knowing that it occurred at the beginning of the string $p$
I was confronted with the problem of proving the following lemma:
Consider the semi-infinite sequence of digits $d_i$: $$d = [d_0, d_1, d_2, \ldots]$$
We have $L \geq 2$ possible ...
0 votes · 0 answers · 34 views
What do autoencoders do from a statistical point of view?
As the title suggests, I was wondering how autoencoders can be interpreted from a statistical point of view. Here I'll try to give my interpretation but, since I'm not too familiar with ...
1 vote · 1 answer · 55 views
Weird manipulation using log-sum inequality
This is going to be a simple question. The statement is:
Given a probability vector $a$ and a doubly stochastic matrix $P$, which satisfies
$$\sum_{i}P_{ij} = 1 = \sum_{j} P_{ij},$$
then $b = ...
0 votes · 0 answers · 41 views
Mutual information upper-bound
For a given $c\in(0,1)$, a positive integer $n$ and independent random variables $X$ and $Y$, I am interested in obtaining an upper-bound on the mutual information between $c^n X+Y$ and $X$.
One (...
0 votes · 1 answer · 53 views
Gaussian Distribution as minimizer of difference between two differential entropies [closed]
Let $N$ be a Gaussian random variable with variance $\sigma^2$, and let $X$ be another random variable, independent of $N$, with variance $P$. I want to show that the Gaussian distribution for the ...
0 votes · 0 answers · 176 views
How can I evaluate Gabor's uncertainty relation for a system with finite duration?
Intro
In another question I think I found an ansatz for solving the PDE $$ \Delta |u|^{\frac12} = ...
0 votes · 0 answers · 34 views
What are the asymptotic properties of plain Kolmogorov complexity?
This is somewhat of a vague question, but here is an example:
If $C$ is plain Kolmogorov complexity, and $f$ is a computable injective function, does $C(f(t)) = \Theta(C(t))$ regardless of the ...
0 votes · 0 answers · 51 views
How Hamming Distance changes with added vector
Suppose we have $k$ vectors, $v_1, \dots, v_k$, where $v_i = (v_{i,1}, \dots, v_{i,n})$, all of the same length $n$ over a finite field $\mathbb{F}_p$. We can construct a matrix $V$:
\begin{equation}
...
1 vote · 0 answers · 24 views
Prove optimality of Huffman code
This is going to be a short question. I have trouble generalizing the optimality of the Huffman code beyond a two-letter alphabet. The book I'm following suggests an inductive approach, but I find it not ...
0 votes · 1 answer · 60 views
Partition form of the KL divergence.
A definition of the KL divergence is the following:
$$
KL(P\|Q)=\sup_{\mathcal{Z}} KL(P_{\mathcal{Z}}\|Q_{\mathcal{Z}}),
$$
where $\mathcal{Z}$ is any finite partition of the underlying sample space and $...
-1 votes · 1 answer · 69 views
Le Cam's two-point method [closed]
I am looking for a source for Le Cam's two-point method and I can't find any book that states it in a form I could cite. Specifically, I am looking for the following result:
$$
\sup_{\theta \in \Theta}...
0 votes · 0 answers · 18 views
Understanding Statistical Manifolds for Discrete Events vs Continuous Sample Spaces
Consider a probability space $(\Omega, \mathcal{F}, P)$ where:
$\Omega$ is the sample space
$\mathcal{F} = \{A_1, A_2, A_3, A_4, A_5\}$ is a finite $\sigma$-algebra
$P$ is a probability measure
We ...
0 votes · 0 answers · 26 views
Understanding Non-parametric Probability Density Estimation
I'm trying to deepen my understanding of non-parametric probability density estimation methods and their relation to parametric methods. I'd like to verify if my current understanding is correct:
In ...
0 votes · 0 answers · 40 views
Informativeness and entropy, Sims 2010 chapter 4
I have a potentially very simple question for you, though a little tough for me. I'm reading chapter 4 of the famous economics handbook edited by Woodford, where Sims introduces the idea of the ...
1 vote · 0 answers · 30 views
Is the conditional smooth min-entropy of two random variables larger than the conditional smooth min-entropy of one?
I have tried to prove the following inequality for smooth min-entropies:
\begin{equation}
H_{\min}^{\epsilon}(XY|K) \geq H_{\min}^{\epsilon}(X|K)
\end{equation}
I started trying to prove it for the ...
0 votes · 0 answers · 52 views
Why do standardized LDPC codes, such as 3GPP NR, have 4-cycles in their Tanner graphs?
We know that short cycles in a Tanner graph are detrimental to error performance, so why do standardized 5G NR LDPC codes have 4-cycles?
The 3rd Generation Partnership Project (3GPP) has announced base matrix ...
0 votes · 1 answer · 39 views
Illustration of high-probability and typical sets in Elements of Information Theory?
I'm following the book Elements of Information Theory by Thomas M. Cover and Joy A. Thomas. Here's the link to the text for you to understand the question: Elements of Information Theory
At the end of ...
1 vote · 1 answer · 42 views
Compute the Expected Codeword Length when using a Huffman-Code for a Source that Generates the German Alphabet
Lemma we have to use:
Let $ E(p_1, \dots, p_n) $ be the average codeword length of a Huffman code for an alphabet with $ n $ letters and probabilities $ p_1, \dots, p_n $. Let $ p_1 $ and $ p_2 $ be ...
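One way to sanity-check a hand computation for this exercise: the expected Huffman length equals the sum of the probabilities of all merged (internal) nodes of the code tree. A sketch in Python, with made-up stand-in frequencies rather than the exercise's actual German letter probabilities:

```python
import heapq

def huffman_expected_length(probs) -> float:
    """Average codeword length of a binary Huffman code.

    Uses the merge-cost identity: the expected length equals the sum of
    the probabilities of all internal (merged) nodes of the code tree.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p1 = heapq.heappop(heap)
        p2 = heapq.heappop(heap)
        total += p1 + p2   # this merge adds one bit to every symbol beneath it
        heapq.heappush(heap, p1 + p2)
    return total

# Stand-in frequencies only; substitute the exercise's actual letter probabilities.
probs = [0.17, 0.10, 0.09, 0.08, 0.07, 0.07, 0.06, 0.06, 0.05, 0.05,
         0.04, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01]
print(huffman_expected_length(probs))
```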
1 vote · 0 answers · 46 views
$\frac{1}{N}\sum_{n=1}^{N}\ln\left(\frac{p^{z_n}(1-p)^{1-z_n}}{{q^{z_n}(1-q)^{1-z_n}}}\right)$ approximates $D_{KL}(\text{Bern}(p)||\text{Bern}(q))$?
Let $Z_n\overset{iid}{\sim}\text{Bernoulli}(p)$ for some $p\in[0,1]$. Fix some $q\in[0,1]$, where $q$ may, or may not, be equal to $p$.
Fix some $N\in\mathbb{N}$ and fix some sequence of ...
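Yes: this is the law of large numbers applied to the log-likelihood ratio, whose expectation under $\text{Bern}(p)$ is exactly the KL divergence. A quick Monte Carlo check (assuming $p, q \in (0,1)$ so all logarithms are finite):

```python
import math
import random

def kl_bern(p: float, q: float) -> float:
    """D_KL(Bern(p) || Bern(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

p, q, N = 0.3, 0.5, 200_000
z = [1 if random.random() < p else 0 for _ in range(N)]
# Empirical average of the log-likelihood ratio under the true law Bern(p):
est = sum(math.log((p if zi else 1 - p) / (q if zi else 1 - q)) for zi in z) / N
print(est, kl_bern(p, q))   # the two agree up to O(1/sqrt(N)) fluctuations
```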
0 votes · 0 answers · 45 views
Entropy of a random walk on a chessboard: queen less than rook + bishop
I am looking at Cover's textbook on information theory and computed the entropies for random walks of chess pieces on an 8x8 board as follows:
Rook: can move to 14 squares wherever it is $\to H(Rook) =...
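The comparison can be checked numerically: for a stationary random walk on an undirected graph, the entropy rate is $\sum_i \frac{d_i}{2E}\log_2 d_i$, as in Cover's treatment. A sketch that enumerates each piece's legal moves (glossing over the fact that the bishop's walk splits into two color classes):

```python
import math

def degrees(directions):
    """Number of legal destination squares from each square of an 8x8 board,
    for a piece sliding along the given unit directions."""
    deg = {}
    for r in range(8):
        for c in range(8):
            n = 0
            for dr, dc in directions:
                rr, cc = r + dr, c + dc
                while 0 <= rr < 8 and 0 <= cc < 8:
                    n += 1
                    rr, cc = rr + dr, cc + dc
            deg[(r, c)] = n
    return deg

def entropy_rate(deg):
    """Entropy rate (bits per move) of the stationary walk: sum_i (d_i/2E) log2 d_i."""
    two_e = sum(deg.values())
    return sum(d / two_e * math.log2(d) for d in deg.values())

rook   = [(0, 1), (0, -1), (1, 0), (-1, 0)]
bishop = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
queen  = rook + bishop
for name, dirs in [("rook", rook), ("bishop", bishop), ("queen", queen)]:
    print(name, entropy_rate(degrees(dirs)))   # rook gives log2(14) ~ 3.807
```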
0 votes · 0 answers · 13 views
Geodesic Behavior in the Calvo-Oller Embedding of Multivariate Normal Distributions
The $\textbf{Calvo-Oller Embedding}$, introduced by Calvo and Oller, provides a novel way to study multivariate normal distributions by embedding their parameter space into the manifold of symmetric ...
2 votes · 1 answer · 149 views
Lower bound for Kullback–Leibler divergence between Bernoulli distributed random variables
Let $D(x, y)$ be the Kullback–Leibler divergence between Bernoulli distributed random variables, i.e., $D(x, y) = x \ln \frac{x}{y} + (1-x) \ln \frac{1-x}{1-y}$.
I've seen this lower bound for this ...
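One standard bound of this form, possibly the one meant here, is Pinsker's inequality; for Bernoulli pairs the total variation distance is $|x - y|$, so (with $D$ in nats) it reads:
$$D(x, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y} \;\ge\; 2\,(x - y)^2.$$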
-2 votes · 1 answer · 51 views
Why Do the Entropies of Discrete and Continuous Distributions Diverge, Even When Their Distributions Converge? [closed]
I'm grappling with the concept of entropy as it relates to discrete and continuous distributions, and I’d appreciate some help understanding a thought experiment that’s left me puzzled.
Thought ...
0 votes · 0 answers · 57 views
Maximizing the KL Divergence when it is Bounded
I've recently been working on trying to find the "worst" possible model, in the Kullback-Leibler (KL) divergence sense, to look at the robustness of an underlying algorithm. I was wondering ...
1 vote · 0 answers · 41 views
Show that $r(X,Y)=0$ when $Y$ is a function of $X$
Let $X$ and $Y$ be two random variables, and consider the measure called conditional residual extropy, defined as
$$J(X|Y)=-\frac{1}{2}\int_0^\infty \bar{F}^2(x|y)\,dx, \qquad y>0,$$ where $\bar{F}_{X|Y}(x\...
1 vote · 1 answer · 46 views
Question about Gaussian Mixture Model
A random vector $X = (X_1, …, X_k)'$ is said to have the multivariate normal distribution if it satisfies the following equivalent conditions.
Every linear combination of its components $Y = a_1X_1 + ...
0 votes · 0 answers · 27 views
Conditional limit theorem for method of types with explicit bounds in terms of n
A lot of large deviation theory seems concerned only with the asymptotic behavior of certain probabilities, mainly, I imagine, because most applications have $n$ so large that lower-order terms are ...
1 vote · 0 answers · 55 views
Understanding the entropy of the normal distribution
Using the standard definition of $H[p]$, you can show that the entropy of a normally distributed variable is
$$
H(\mu, \sigma^2) = \frac{1}{2} \ln\left(2\pi\sigma^2\right) + \frac{1}{2}
$$
in units ...
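The formula drops out in one line once the quadratic term is integrated against the density, using $\mathbb{E}\left[(X-\mu)^2\right] = \sigma^2$:
$$H = -\int p(x)\ln p(x)\,dx = \int p(x)\left[\tfrac{1}{2}\ln\left(2\pi\sigma^2\right) + \frac{(x-\mu)^2}{2\sigma^2}\right]dx = \tfrac{1}{2}\ln\left(2\pi\sigma^2\right) + \tfrac{1}{2}.$$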
1 vote · 0 answers · 65 views
Is there any good definition of the "interesting" information contained in a string?
The most well known measurement of a string's information content is Kolmogorov complexity. By this measurement random strings contain the most information. However it's clear that the information ...
6 votes · 3 answers · 174 views
Link between channel capacity and mutual information
I am confused about the following: how can channel capacity, which is measured in bits per second, be equal to (the maximum) mutual information, which is measured in bits? Could someone ...
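The usual resolution: the maximal mutual information is capacity per channel use; multiplying by the signaling rate (channel uses per second) converts it to bits per second. For a bandlimited AWGN channel used $2B$ times per second, this is exactly how the Shannon-Hartley formula arises:
$$C = \underbrace{\max_{p(x)} I(X;Y)}_{\text{bits per channel use}} \times \underbrace{f_s}_{\text{uses per second}}, \qquad 2B \cdot \tfrac{1}{2}\log_2(1 + \mathrm{SNR}) = B \log_2(1 + \mathrm{SNR}) \ \text{bits/s}.$$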