
Questions tagged [information-theory]

The science of compressing and communicating information. It is a branch of applied mathematics and electrical engineering. Though originally the focus was on digital communications and computing, it now finds wide use in biology, physics and other sciences.

0 votes
0 answers
21 views

How to calculate the channel capacity of this transmission model?

I recently faced the problem of working out the channel capacity of the following transmission model: the received signal $$ {\boldsymbol{y}}=\boldsymbol{h}\left(\boldsymbol{Fx}\otimes\boldsymbol{I}_{N_r}\...
RACHpreamble
-1 votes
0 answers
33 views

Regarding Kolmogorov complexity, I need to show the following theorem.

The original text is in Japanese, but I used ChatGPT to translate it. For any natural number $c$, there exists a natural number $n$ such that $K(n)>c$. For any consistent axiomatic system $T$ capable of ...
Aniq Zafri
1 vote
1 answer
54 views

The relation between nonadjacency preserving map and fractional chromatic number

In the paper "Graph information ratio", $\phi:V(G^k)\to V(H^n)$ is nonadjacency preserving if for any $g\not\sim g'$, $\phi(g)\not\sim \phi(g')$, $\not\sim$ means not adjacent and distinct. ...
Delong
  • 1,943
1 vote
0 answers
85 views

Imperfect coding theory problem: Locating $j$ and determining the best constant $k_{q}$ [closed]

Let $\mathbf{x}= \left ( x_{1}, x_{2}, \ldots, x_{n} \right )\in\left\{ 0, 1, \ldots, q- 1 \right\}^{n}$ be unknown, but its syndrome is given: $$\operatorname{Syn}\left ( \mathbf{x} \right )= \sum\...
kangwon_1502
-1 votes
0 answers
31 views

What is the graph information ratio when one of the graphs is complete?

In the paper "Graph Information Ratio", the information ratio is defined to be $\textrm{Ir}(H/G)=\sup \{k/n:\exists \textrm{ a nonadjacency preserving mapping from } G^k \textrm{ to } H^n\}$,...
Delong
  • 1,943
0 votes
2 answers
53 views

How to get an understanding of information entropy?

In my book it says: If we are told that a highly improbable event has occurred, we will have received more information than if we were told that some likely event has just occurred. I have come up ...
HMPtwo
  • 533
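A minimal formalization of the idea in this excerpt, using the standard definition of self-information (added for context, not part of the question text): the information gained from observing an event of probability $p$ is $I = -\log_2 p$ bits, so an event with $p = 1/8$ carries $-\log_2(1/8) = 3$ bits while an event with $p = 1/2$ carries only $1$ bit, and entropy is the expectation $H(X) = -\sum_x p(x)\log_2 p(x)$.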
0 votes
0 answers
10 views

$\Omega(\log T)$ minimax regret lower bound for a 2-armed bandit with Uniform rewards

I'm studying multi-armed bandit problems and encountered the following scenario: We have a 2-armed bandit where each arm $ i $ yields rewards distributed as $$ X_i = \mu_i + \text{Unif}[-1, 1], $$ ...
MartinS
  • 88
1 vote
2 answers
55 views

Uniformity of X Given a Doubly Stochastic Conditional Matrix

I have two discrete finite random variables $X$ and $Y$, both defined over the same finite domain. The conditional distribution of $Y$ given $X$ is represented by a square matrix $W$, where the $x$-th ...
pmoi
  • 101
1 vote
0 answers
35 views

What is the maximum subspace of $\mathbb{F}_2^n$ that only contains $0$ and elements of weight larger than $w$? [closed]

For an element $x$ in the binary linear space $\mathbb{F}_2^n$, $\mathrm{wt}(x)$ is the weight function. Define the set $$A := \{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \ge w\}. $$ What is the largest ...
Jiawei Wu
0 votes
0 answers
32 views

Information content of a square wave with random cycles

How can I measure the information content of a square wave with random cycles? This signal takes the values 0 and 1, but the time it remains in one of the states changes, so the time is ...
SaeedTaghavi
0 votes
1 answer
37 views

Behavior of argmax of dependent random variables

Suppose $X_1,\dots,X_N$ are dependent random variables with the same (continuous) support. They do not necessarily have the same distribution. Let $Y$ be $\text{argmax}(X_1,\dots,X_N)$, so with ...
Ryan Park
0 votes
1 answer
58 views

Polar codes and Reed-Muller codes

Can someone explain the construction, or give a definition, of polar codes, say for the binary symmetric channel? [I seriously can't even find a definition.] I'd like an explanation, if possible, of ...
yoyo
  • 10.2k
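A compressed sketch of the construction asked about here, following Arıkan's original recipe (conventions differ on whether a bit-reversal permutation is included): for block length $N = 2^n$, form the transform $G_N = F^{\otimes n}$ with kernel $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, encode as $x = u\,G_N$, freeze the coordinates of $u$ whose synthesized channels $W_N^{(i)}$ are least reliable to known values, put information bits on the remaining coordinates, and decode by successive cancellation. Reed-Muller codes use the same matrix but select rows by Hamming weight rather than by channel reliability.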
1 vote
0 answers
47 views

Inequality between $f$-divergences and Fisher information

I am wondering if there exists a general global inequality between some $f$-divergence and Fisher information of the form: $$\forall \theta_0, \theta_1 ~ D_f(p_{\theta_0} || p_{\theta_1}) \leq C (\...
Ramufasa
0 votes
0 answers
21 views

Reversing data processing inequality for f-divergences for certain types of corruptions

Consider a random variable $X$. Let $Z \sim \mathcal{N}(0, \sigma^2)$ and let $Y = X + Z$. For distributions $p_X, q_X$ and $p_Y, q_Y$, the data processing inequality gives us $$D_f(p_Y \| q_Y) \leq D_f(p_X \| ...
kvarun95
0 votes
1 answer
58 views

Lipschitzness of the gradient of the KL divergence $KL(P||Q_\phi)$, where $P$ is fixed and $Q$ is parameterized by $\phi$

I am considering KL divergence $\text{KL}(P||Q_\phi)$ between two continuous distributions $P$ and $Q$ where $P$ is fixed and $Q$ is a Gaussian parameterized by $\phi$. Is it possible to prove that ...
melatonin15
0 votes
1 answer
32 views

how to derive the feature importance based on information?

Suppose we have features $x_1,\dots,x_n$ and a target $y$, and we want to determine the feature importance of each $x_i$ for $y$. Define a quantity (based on information) like $I(y,x_1,\dots,x_n) = \log_2(P(y|x_1,...
Xu Shan
  • 256
2 votes
1 answer
105 views

Interpreting "Log Probability" in Optimization/Statistics/Machine Learning

It seems like in machine learning we work with log probabilities a lot, like, minimize, maximize log probability. This makes sense from a numerical perspective since adding is easier than multiplying ...
Mason Wang
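The identity usually behind the practice described in this excerpt: because $\log$ is strictly increasing, $$\arg\max_\theta \prod_{i=1}^n p(x_i \mid \theta) = \arg\max_\theta \sum_{i=1}^n \log p(x_i \mid \theta),$$ so maximizing the log-likelihood is equivalent to maximizing the likelihood, and the sum form avoids floating-point underflow when many small probabilities are multiplied.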
2 votes
0 answers
62 views

On the Abstract Structure Behind Oracle Problems

Here is what I mean by an 'Oracle Problem'. A universe set or class $U$, a family of queries $\mathcal Q$ consisting of partitions of $U$, and a target element $t\in U$ are fixed. You're not aware of the target ...
Alma Arjuna
  • 5,643
0 votes
0 answers
35 views

Given the channel transition matrix $\mathbf{P}$ of a DMC, does the capacity achieved with $\mathbf{P}$ equal that achieved with $\mathbf{P}^T$?

I'm a bit confused with this. Since $I(X;Y) = I(Y;X)$, at first I thought it implies that Capacity($\mathbf{P}$) = Capacity($\mathbf{P}^T$), i.e., the capacity achieved with a transition matrix, ...
ChuSequence
0 votes
1 answer
48 views

Optimal coding for uniform distribution?

I have asked this on Computer Science Stack Exchange, here: Optimal coding for uniform distribution. My answer to the question only provides intuition; I fail to make it mathematically valid. The ...
Duck Gia
  • 111
1 vote
1 answer
68 views

Conditional expectation of the second occurrence of an $n$-digit pattern, knowing that it occurred at the beginning of the string $p$

I was confronted with the problem of proving the following lemma: Consider the semi-infinite sequence of digits $d_i$, \begin{align} d = [d_0, d_1, d_2, \ldots]. \end{align} We have $L \geq 2$ possible ...
soufiane yes
0 votes
0 answers
34 views

What do autoencoders do from a statistical point of view?

As the title suggests, I was wondering how autoencoders can be interpreted from a statistical point of view. Here, I'll try to give my interpretation, but since I'm not too familiar with ...
Francesco De Santis
1 vote
1 answer
55 views

Weird manipulation using log-sum inequality

This is going to be a simple question. The statement is: given a probability vector $a$ and a doubly stochastic matrix $P$, which satisfies $$\sum_{i}P_{ij} = 1 = \sum_{j} P_{ij},$$ then $b = ...
Duck Gia
  • 111
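The excerpt is cut off, but assuming the intended claim is the standard one that $b = Pa$ satisfies $H(b) \ge H(a)$, the log-sum step looks like this: for each $j$, $$b_j \log b_j = \Big(\sum_i P_{ij} a_i\Big)\log\frac{\sum_i P_{ij} a_i}{\sum_i P_{ij}} \le \sum_i P_{ij} a_i \log\frac{P_{ij} a_i}{P_{ij}} = \sum_i P_{ij} a_i \log a_i,$$ using $\sum_i P_{ij} = 1$; summing over $j$ and using $\sum_j P_{ij} = 1$ gives $\sum_j b_j \log b_j \le \sum_i a_i \log a_i$, i.e. $H(b) \ge H(a)$.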
0 votes
0 answers
41 views

Mutual information upper-bound

For a given $c\in(0,1)$, a positive integer $n$ and independent random variables $X$ and $Y$, I am interested in obtaining an upper-bound on the mutual information between $c^n X+Y$ and $X$. One (...
ba029188
0 votes
1 answer
53 views

Gaussian Distribution as minimizer of difference between two differential entropies [closed]

Let $N$ be a Gaussian random variable with variance $\sigma^2$, and let $X$ be another random variable, independent of $N$, with variance $P$. I want to show that the Gaussian distribution for the ...
Nick Cooper
0 votes
0 answers
176 views

How could I evaluate Gabor's uncertainty relation for a system of finite duration?

How could I evaluate Gabor's uncertainty relation for a system of finite duration? Intro: In this other question I think I found an ansatz for solving the PDE $$ \Delta |u|^{\frac12} = ...
Joako
  • 1,756
0 votes
0 answers
34 views

What are the asymptotic properties of plain Kolmogorov complexity?

This is somewhat of a vague question, but here is an example: If $C$ is plain Kolmogorov complexity, and $f$ is a computable injective function, does $C(f(t)) = \Theta(C(t))$ regardless of the ...
William Oliver
0 votes
0 answers
51 views

How Hamming Distance changes with added vector

Suppose we have $k$ vectors, $v_1, \dots, v_k$, where $v_i = (v_{i,1}, \dots, v_{i,n})$, all of the same length $n$ over a finite field $\mathbb{F}_p$. We can construct a matrix $V$: \begin{equation} ...
CatsAndDogs
1 vote
0 answers
24 views

Prove optimality of Huffman code

This is going to be a short question. I have trouble generalizing the Huffman optimality result beyond a two-symbol alphabet. The book I'm following suggests an inductive approach, but I find it not ...
Duck Gia
  • 111
0 votes
1 answer
60 views

Partition form of the KL divergence.

A definition of the $KL$ divergence is the following: $$ KL(P||Q)=\underset{\mathcal{Z}}{\sup}KL(P_{\mathcal{Z}}||Q_{\mathcal{Z}}), $$ where $\mathcal{Z}$ ranges over finite partitions of the underlying sample space and $...
Sushant Vijayan
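A small illustration of this definition, added for context: for the two-set partition $\mathcal{Z} = \{A, A^c\}$ the inner term is $$KL(P_{\mathcal{Z}}\|Q_{\mathcal{Z}}) = P(A)\log\frac{P(A)}{Q(A)} + P(A^c)\log\frac{P(A^c)}{Q(A^c)},$$ and taking the supremum over all finite partitions recovers the usual measure-theoretic KL divergence (both sides being $+\infty$ when $P \not\ll Q$).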
-1 votes
1 answer
69 views

Le Cam's two-point method [closed]

I am looking for a source for Le Cam's two-point method and I can't find any book that states it in a form I could cite. Specifically, I am looking for the following result: $$ \sup_{\theta \in \Theta}...
craaaft
  • 783
0 votes
0 answers
18 views

Understanding Statistical Manifolds for Discrete Events vs Continuous Sample Spaces

Consider a probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the sample space, $\mathcal{F} = \{A_1, A_2, A_3, A_4, A_5\}$ is a finite $\sigma$-algebra, and $P$ is a probability measure. We ...
Andyale
  • 51
0 votes
0 answers
26 views

Understanding Non-parametric Probability Density Estimation

I'm trying to deepen my understanding of non-parametric probability density estimation methods and their relation to parametric methods. I'd like to verify if my current understanding is correct: In ...
Andyale
  • 51
0 votes
0 answers
40 views

Informativeness and entropy, Sims 2010 chapter 4

I have a potentially very simple question for you, though a little bit tough for me. I'm reading chapter 4 of the famous handbook of economics by Woodford, where Sims introduces the idea of the ...
Riccardo Rasoni
1 vote
0 answers
30 views

Is the conditional smooth min-entropy of two random variables larger than the conditional smooth min-entropy of one?

I have tried to prove the following inequality for smooth min-entropies \begin{equation} H_{min}^{\epsilon}(XY|K) \geq H_{min}^{\epsilon}(X|K) \end{equation} I started trying to prove it for the ...
SemLavy
  • 11
0 votes
0 answers
52 views

Why do standardized LDPC codes, such as 3GPP NR, have 4-cycles in the Tanner graph?

We know that short cycles in the Tanner graph are detrimental to error performance. So why do standardized 5G NR LDPC codes have 4-cycles? The 3rd Generation Partnership Project (3GPP) announced the base matrix ...
Ozkan
  • 1
0 votes
1 answer
39 views

Illustration of the high-probability set and the typical set in Elements of Information Theory?

I'm following the book Elements of Information Theory by Thomas M. Cover and Joy A. Thomas. Here's the link to the text for you to understand the question: Elements of Information Theory. At the end of ...
Duck Gia
  • 111
1 vote
1 answer
42 views

Compute the Expected Codeword Length when using a Huffman-Code for a Source that Generates the German Alphabet

Lemma we have to use: Let $ E(p_1, \dots, p_n) $ be the average codeword length of a Huffman code for an alphabet with $ n $ letters and probabilities $ p_1, \dots, p_n $. Let $ p_1 $ and $ p_2 $ be ...
Sgt. Slothrop
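The lemma is cut off above, but its standard form (presumably what is meant here) is the Huffman recursion: if $p_1$ and $p_2$ are the two smallest probabilities, then $$E(p_1,\dots,p_n) = E(p_1+p_2,\, p_3,\dots,p_n) + (p_1+p_2),$$ since merging the two least likely letters lengthens exactly those two codewords by one bit; applied repeatedly, the expected length is the sum of the merged probabilities over all $n-1$ merge steps.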
1 vote
0 answers
46 views

$\frac{1}{N}\sum_{n=1}^{N}\ln\left(\frac{p^{z_n}(1-p)^{1-z_n}}{{q^{z_n}(1-q)^{1-z_n}}}\right)$ approximates $D_{KL}(\text{Bern}(p)||\text{Bern}(q))$?

Let $Z_n\overset{iid}{\sim}\text{Bernoulli}(p)$ for some $p\in[0,1]$. Fix some $q\in[0,1]$, where $q$ may or may not be equal to $p$. Fix some $N\in\mathbb{N}$ and fix some sequence of ...
cluelessmathematician
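A one-line check of why the title expression is a natural estimator, under the i.i.d. model stated in the excerpt: a single term has expectation $$\mathbb{E}\!\left[\ln\frac{p^{Z}(1-p)^{1-Z}}{q^{Z}(1-q)^{1-Z}}\right] = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q} = D_{KL}(\text{Bern}(p)\,\|\,\text{Bern}(q)),$$ so by the law of large numbers the average over $N$ samples converges to the KL divergence as $N \to \infty$; for a fixed (non-random) sequence $z_n$ the same conclusion needs the empirical frequency of ones to converge to $p$.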
0 votes
0 answers
45 views

Entropy of a random walk on a chessboard: queen less than rook + bishop

I am looking at Cover's textbook on information theory and computed the entropies for random walks of chess pieces on an 8x8 board as follows: Rook: can move to 14 squares wherever it is $\to H(Rook) =...
warped
  • 101
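Under the uniform-over-legal-moves model used in the excerpt, a quick sanity check of the title's comparison: $H(\text{Rook}) = \log_2 14 \approx 3.81$ bits per move, and since a queen never has more than $27$ legal moves on an empty board, its per-move entropy is at most $\log_2 27 \approx 4.75$ bits, which is already below $\log_2 14 + \log_2 7 \approx 6.61$, the smallest possible value of rook plus bishop under the same counting. This is only a bound, not the full stationary random-walk entropy-rate computation.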
0 votes
0 answers
13 views

Geodesic Behavior in the Calvo-Oller Embedding of Multivariate Normal Distributions

The $\textbf{Calvo-Oller Embedding}$, introduced by Calvo and Oller, provides a novel way to study multivariate normal distributions by embedding their parameter space into the manifold of symmetric ...
Andyale
  • 51
2 votes
1 answer
149 views

Lower bound for Kullback–Leibler divergence between Bernoulli distributed random variables

Let $D(x, y)$ be the Kullback–Leibler divergence between Bernoulli distributed random variables, i.e. $D(x, y) = x \ln \frac{x}{y} + (1-x) \ln \frac{1-x}{1-y}$. I've seen this lower bound for this ...
Iakunin Aleksandr
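The excerpt does not show which bound is meant, but the one most commonly quoted in this setting is the Pinsker-type inequality $$D(x, y) \ge 2\,(x - y)^2 \quad (\text{in nats}),$$ which follows from Pinsker's inequality because the total variation distance between $\text{Bern}(x)$ and $\text{Bern}(y)$ is $|x - y|$.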
-2 votes
1 answer
51 views

Why Do the Entropies of Discrete and Continuous Distributions Diverge, Even When Their Distributions Converge? [closed]

I'm grappling with the concept of entropy as it relates to discrete and continuous distributions, and I’d appreciate some help understanding a thought experiment that’s left me puzzled. Thought ...
itchytoo
0 votes
0 answers
57 views

Maximizing the KL Divergence when it is Bounded

I've recently been working on trying to find the "worst" possible model, in the Kullback-Leibler (KL) divergence sense, to look at the robustness of an underlying algorithm. I was wondering ...
kjc93
  • 41
1 vote
0 answers
41 views

Show that $r(X,Y)=0$ when $Y$ is a function of $X$.

Let $X$ and $Y$ be two random variables. Then a measure called conditional residual extropy is defined as $$J(X|Y)=-\frac{1}{2}\int_0^\infty \bar{F}^2(x|y)\,dx, \hspace{2mm}y>0,$$ where $\bar{F}_{X|Y}(x\...
Unknown
  • 11
1 vote
1 answer
46 views

Question about Gaussian Mixture Model

A random vector $X = (X_1, \dots, X_k)'$ is said to have the multivariate normal distribution if it satisfies the following equivalent conditions: every linear combination of its components $Y = a_1X_1 + ...
Andyale
  • 51
0 votes
0 answers
27 views

Conditional limit theorem for method of types with explicit bounds in terms of n

A lot of large deviation theory seems to be concerned only with the asymptotic nature of certain probabilities, mainly because, I imagine, most applications have $n$ so big that lower-order terms are ...
latbbltes
  • 430
1 vote
0 answers
55 views

Understanding the entropy of the normal distribution

Using the standard definition of $H[p]$, you can show that the (differential) entropy of a normally distributed variable is $$ H(\mu, \sigma^2) = \frac{1}{2} \ln\left(2\pi\sigma^2\right) + \frac{1}{2} $$ in units ...
Xander
  • 71
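A short derivation of the stated formula, for context: with $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}$, $$H = -\int p(x)\ln p(x)\,dx = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{\mathbb{E}[(X-\mu)^2]}{2\sigma^2} = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2},$$ in nats; note this is a differential entropy, so it can be negative for small $\sigma$.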
1 vote
0 answers
65 views

Is there any good definition of the "interesting" information contained in a string?

The best-known measure of a string's information content is Kolmogorov complexity. By this measure, random strings contain the most information. However, it's clear that the information ...
CyborgOctopus
6 votes
3 answers
174 views

Link between channel capacity and mutual information

I am confused about the following: how can channel capacity, which is measured in bits per second, be equal to (the maximum) mutual information, which is measured in bits? Could someone ...
Ronald
  • 71
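The standard way this is usually reconciled: the capacity of a discrete memoryless channel is $C = \max_{p(x)} I(X;Y)$, measured in bits per channel use; multiplying by the number of channel uses per second (the symbol rate) converts it to bits per second, which is the form that appears in, for example, the Shannon-Hartley formula for a band-limited channel.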
