
Questions tagged [information-theory]

The science of compressing and communicating information. It is a branch of applied mathematics and electrical engineering. Though originally the focus was on digital communications and computing, it now finds wide use in biology, physics and other sciences.

0 votes
0 answers
21 views

How to calculate the channel capacity of this transmission model?

I recently faced the problem of working out the channel capacity of the following transmission model: the received signal $$ {\boldsymbol{y}}=\boldsymbol{h}\left(\boldsymbol{Fx}\otimes\boldsymbol{I}_{N_r}\...
RACHpreamble
-1 votes
0 answers
33 views

Regarding Kolmogorov complexity, I need to show the following theorem.

The original text is in Japanese, but I used ChatGPT to translate it. For any natural number $c$, there exists a natural number $n$ such that $K(n)>c$. For any consistent axiomatic system $T$ capable of ...
Aniq Zafri
1 vote
1 answer
54 views

The relation between nonadjacency preserving map and fractional chromatic number

In the paper "Graph information ratio", $\phi:V(G^k)\to V(H^n)$ is nonadjacency preserving if for any $g\not\sim g'$, $\phi(g)\not\sim \phi(g')$, $\not\sim$ means not adjacent and distinct. ...
Delong
  • 1,943
1 vote
0 answers
85 views

Imperfect coding theory problem: Locating $j$ and determining the best constant $k_{q}$ [closed]

Let $\mathbf{x}= \left ( x_{1}, x_{2}, \ldots, x_{n} \right )\in\left\{ 0, 1, \ldots, q- 1 \right\}^{n}$ be unknown, but its syndrome is given: $$\operatorname{Syn}\left ( \mathbf{x} \right )= \sum\...
kangwon_1502
-1 votes
0 answers
31 views

What is the graph information ratio when one of the graphs is complete?

In the paper "Graph Information Ratio", the information ratio is defined to be $\textrm{Ir}(H/G)=\sup \{k/n:\exists \textrm{ a nonadjacency preserving mapping from } G^k \textrm{ to } H^n\}$,...
Delong
  • 1,943
0 votes
2 answers
53 views

How to get an understanding of information entropy?

In my book it says: If we are told that a highly improbable event has occurred, we will have received more information than if we were told that some likely event has just occurred. I have come up ...
HMPtwo
  • 533
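A minimal formalization of the idea in this excerpt, using the standard definition of self-information (added for context, not part of the question text): the information gained from observing an event of probability $p$ is $I = -\log_2 p$ bits, so an event with $p = 1/8$ carries $-\log_2(1/8) = 3$ bits while an event with $p = 1/2$ carries only $1$ bit, and entropy is the expectation $H(X) = -\sum_x p(x)\log_2 p(x)$.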
0 votes
0 answers
10 views

$\Omega(\log T)$ minimax regret lower bound for a 2-armed bandit with Uniform rewards

I'm studying multi-armed bandit problems and encountered the following scenario: We have a 2-armed bandit where each arm $ i $ yields rewards distributed as $$ X_i = \mu_i + \text{Unif}[-1, 1], $$ ...
MartinS
  • 88
1 vote
2 answers
55 views

Uniformity of X Given a Doubly Stochastic Conditional Matrix

I have two discrete finite random variables $X$ and $Y$, both defined over the same finite domain. The conditional distribution of $Y$ given $X$ is represented by a square matrix $W$, where the $x$-th ...
pmoi
  • 101
1 vote
0 answers
35 views

What is the maximum subspace of $\mathbb{F}_2^n$ that only contains $0$ and elements of weight larger than $w$? [closed]

For an element $x$ in the binary linear space $\mathbb{F}_2^n$, $\mathrm{wt}(x)$ is the weight function. Define the set $$A := \{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \ge w\}. $$ What is the largest ...
Jiawei Wu
0 votes
0 answers
32 views

Information content of a square wave with random cycles

How can I measure the information content of a square wave with random cycles? This signal takes the values 0 and 1, but the time it remains in one of the states changes, so the time is ...
SaeedTaghavi
0 votes
1 answer
37 views

Behavior of argmax of dependent random variables

Suppose $X_1,\dots,X_N$ are dependent random variables with the same (continuous) support. They do not necessarily have the same distribution. Let $Y$ be $\text{argmax}(X_1,\dots,X_N)$, so with ...
Ryan Park
0 votes
1 answer
58 views

Polar codes and Reed-Muller codes

Can someone explain the construction, or give a definition, of polar codes, say for the binary symmetric channel? [I seriously can't even find a definition.] I'd like an explanation, if possible, of ...
yoyo
  • 10.2k
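A compressed sketch of the construction asked about here, following Arıkan's original recipe (conventions differ on whether a bit-reversal permutation is included): for block length $N = 2^n$, form the transform $G_N = F^{\otimes n}$ with kernel $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, encode as $x = u\,G_N$, freeze the coordinates of $u$ whose synthesized channels $W_N^{(i)}$ are least reliable to known values, put information bits on the remaining coordinates, and decode by successive cancellation. Reed-Muller codes use the same matrix but select rows by Hamming weight rather than by channel reliability.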
1 vote
0 answers
47 views

Inequality between $f$-divergences and Fisher information

I am wondering if there exists a general global inequality between some $f$-divergence and Fisher information of the form: $$\forall \theta_0, \theta_1 ~ D_f(p_{\theta_0} || p_{\theta_1}) \leq C (\...
Ramufasa
0 votes
0 answers
21 views

Reversing data processing inequality for f-divergences for certain types of corruptions

Consider a random variable $X$. Let $Z \sim \mathcal{N}(0, \sigma^2)$ and let $Y = X + Z$. For distributions $p_X, q_X$ and $p_Y, q_Y$, the data processing inequality gives us $$D_f(p_Y \| q_Y) \leq D_f(p_X \| ...
kvarun95
0 votes
1 answer
58 views

Lipschitzness of the gradient of the KL divergence $KL(P||Q_\phi)$, where $P$ is fixed and $Q$ is parameterized by $\phi$

I am considering KL divergence $\text{KL}(P||Q_\phi)$ between two continuous distributions $P$ and $Q$ where $P$ is fixed and $Q$ is a Gaussian parameterized by $\phi$. Is it possible to prove that ...
melatonin15
0 votes
1 answer
32 views

how to derive the feature importance based on information?

Suppose we have features $x_1,\dots,x_n$ and a target $y$, and we want to determine the feature importance of each $x_i$ for $y$. Define a quantity (based on information) like $I(y,x_1,\dots,x_n) = \log_2(P(y|x_1,...
Xu Shan
  • 256
2 votes
1 answer
105 views

Interpreting "Log Probability" in Optimization/Statistics/Machine Learning

It seems like in machine learning we work with log probabilities a lot, like, minimize, maximize log probability. This makes sense from a numerical perspective since adding is easier than multiplying ...
Mason Wang
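The identity usually behind the practice described in this excerpt: because $\log$ is strictly increasing, $$\arg\max_\theta \prod_{i=1}^n p(x_i \mid \theta) = \arg\max_\theta \sum_{i=1}^n \log p(x_i \mid \theta),$$ so maximizing the log-likelihood is equivalent to maximizing the likelihood, and the sum form avoids floating-point underflow when many small probabilities are multiplied.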
2 votes
0 answers
62 views

On the Abstract Structure Behind Oracle Problems

Here is what I mean by an 'Oracle Problem'. A universe set or class $U$, a family of queries $\mathcal Q$ consisting of partitions of $U$, and a target element $t\in U$ are fixed. You're not aware of the target ...
Alma Arjuna
  • 5,643
0 votes
0 answers
35 views

Given the channel transition matrix $\mathbf{P}$ of a DMC, does the capacity achieved with $\mathbf{P}$ equal that achieved with $\mathbf{P}^T$?

I'm a bit confused with this. Since $I(X;Y) = I(Y;X)$, at first I thought it implies that Capacity($\mathbf{P}$) = Capacity($\mathbf{P}^T$), i.e., the capacity achieved with a transition matrix, ...
ChuSequence
0 votes
1 answer
48 views

Optimal coding for uniform distribution?

I have asked this on Computer Science Stack Exchange, here: Optimal coding for uniform distribution. My answer to the question only provides intuition; I fail to make it mathematically valid. The ...
Duck Gia
  • 111
1 vote
1 answer
68 views

Conditional expectation of the second occurrence of an $n$-digit pattern, knowing that it occurred at the beginning of the string $p$

I was confronted with the problem of proving the following lemma: Consider the semi-infinite sequence of digits $d_i$, \begin{align} d = [d_0, d_1, d_2, \ldots]. \end{align} We have $L \geq 2$ possible ...
soufiane yes
0 votes
0 answers
34 views

What do autoencoders do from a statistical point of view?

As the title suggests, I was wondering how autoencoders can be interpreted from a statistical point of view. Here, I'll try to give my interpretation, but since I'm not too familiar with ...
Francesco De Santis
1 vote
1 answer
55 views

Weird manipulation using log-sum inequality

This is going to be a simple question. The statement is: given a probability vector $a$ and a doubly stochastic matrix $P$, which satisfies $$\sum_{i}P_{ij} = 1 = \sum_{j} P_{ij},$$ then $b = ...
Duck Gia
  • 111
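The excerpt is cut off, but assuming the intended claim is the standard one that $b = Pa$ satisfies $H(b) \ge H(a)$, the log-sum step looks like this: for each $j$, $$b_j \log b_j = \Big(\sum_i P_{ij} a_i\Big)\log\frac{\sum_i P_{ij} a_i}{\sum_i P_{ij}} \le \sum_i P_{ij} a_i \log\frac{P_{ij} a_i}{P_{ij}} = \sum_i P_{ij} a_i \log a_i,$$ using $\sum_i P_{ij} = 1$; summing over $j$ and using $\sum_j P_{ij} = 1$ gives $\sum_j b_j \log b_j \le \sum_i a_i \log a_i$, i.e. $H(b) \ge H(a)$.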
0 votes
0 answers
41 views

Mutual information upper-bound

For a given $c\in(0,1)$, a positive integer $n$ and independent random variables $X$ and $Y$, I am interested in obtaining an upper-bound on the mutual information between $c^n X+Y$ and $X$. One (...
ba029188
0 votes
1 answer
53 views

Gaussian Distribution as minimizer of difference between two differential entropies [closed]

Let $N$ be a Gaussian random variable with variance $\sigma^2$, and let $X$ be another random variable, independent of $N$, with variance $P$. I want to show that the Gaussian distribution for the ...
Nick Cooper
0 votes
0 answers
176 views

How could I evaluate Gabor's uncertainty relation for a system of finite duration?

How could I evaluate Gabor's uncertainty relation for a system of finite duration? Intro: In this other question I think I found an ansatz for solving the PDE $$ \Delta |u|^{\frac12} = ...
Joako
  • 1,756
0 votes
0 answers
34 views

What are the asymptotic properties of plain Kolmogorov complexity?

This is somewhat of a vague question, but here is an example: If $C$ is plain Kolmogorov complexity, and $f$ is a computable injective function, does $C(f(t)) = \Theta(C(t))$ regardless of the ...
William Oliver
0 votes
0 answers
51 views

How Hamming Distance changes with added vector

Suppose we have $k$ vectors, $v_1, \dots, v_k$, where $v_i = (v_{i,1}, \dots, v_{i,n})$, all of the same length $n$ over a finite field $\mathbb{F}_p$. We can construct a matrix $V$: \begin{equation} ...
CatsAndDogs
1 vote
0 answers
24 views

Prove optimality of Huffman code

This is going to be a short question. I have trouble generalizing the Huffman optimality result beyond a two-symbol alphabet. The book I'm following suggests an inductive approach, but I find it not ...
Duck Gia
  • 111
0 votes
1 answer
60 views

Partition form of the KL divergence.

A definition of the $KL$ divergence is the following: $$ KL(P||Q)=\underset{\mathcal{Z}}{\sup}KL(P_{\mathcal{Z}}||Q_{\mathcal{Z}}), $$ where $\mathcal{Z}$ ranges over finite partitions of the underlying sample space and $...
Sushant Vijayan
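A small illustration of this definition, added for context: for the two-set partition $\mathcal{Z} = \{A, A^c\}$ the inner term is $$KL(P_{\mathcal{Z}}\|Q_{\mathcal{Z}}) = P(A)\log\frac{P(A)}{Q(A)} + P(A^c)\log\frac{P(A^c)}{Q(A^c)},$$ and taking the supremum over all finite partitions recovers the usual measure-theoretic KL divergence (both sides being $+\infty$ when $P \not\ll Q$).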
-1 votes
1 answer
69 views

Le Cam's two-point method [closed]

I am looking for a source for Le Cam's two-point method and I can't find any book that states it in a form I could cite. Specifically, I am looking for the following result: $$ \sup_{\theta \in \Theta}...
craaaft
  • 783
0 votes
0 answers
18 views

Understanding Statistical Manifolds for Discrete Events vs Continuous Sample Spaces

Consider a probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the sample space, $\mathcal{F} = \{A_1, A_2, A_3, A_4, A_5\}$ is a finite $\sigma$-algebra, and $P$ is a probability measure. We ...
Andyale
  • 51
0 votes
0 answers
26 views

Understanding Non-parametric Probability Density Estimation

I'm trying to deepen my understanding of non-parametric probability density estimation methods and their relation to parametric methods. I'd like to verify if my current understanding is correct: In ...
Andyale
  • 51
0 votes
0 answers
40 views

Informativeness and entropy, Sims 2010 chapter 4

I have a potentially very simple question for you, though a little bit tough for me. I'm reading chapter 4 of the famous handbook of economics by Woodford, where Sims introduces the idea of the ...
Riccardo Rasoni
1 vote
0 answers
30 views

Is the conditional smooth min-entropy of two random variables larger than the conditional smooth min-entropy of one?

I have tried to prove the following inequality for smooth min-entropies \begin{equation} H_{min}^{\epsilon}(XY|K) \geq H_{min}^{\epsilon}(X|K) \end{equation} I started trying to prove it for the ...
SemLavy
  • 11
0 votes
0 answers
52 views

Why do standardized LDPC codes, such as 3GPP NR, have 4-cycles in the Tanner graph?

We know that short cycles in the Tanner graph are detrimental to error performance. So why do standardized 5G NR LDPC codes have 4-cycles? The 3rd Generation Partnership Project (3GPP) announced the base matrix ...
Ozkan
  • 1
0 votes
1 answer
39 views

Illustration of the high-probability set and the typical set in Elements of Information Theory?

I'm following the book Elements of Information Theory by Thomas M. Cover and Joy A. Thomas. Here's the link to the text for you to understand the question: Elements of Information Theory. At the end of ...
Duck Gia
  • 111
1 vote
1 answer
42 views

Compute the Expected Codeword Length when using a Huffman-Code for a Source that Generates the German Alphabet

Lemma we have to use: Let $ E(p_1, \dots, p_n) $ be the average codeword length of a Huffman code for an alphabet with $ n $ letters and probabilities $ p_1, \dots, p_n $. Let $ p_1 $ and $ p_2 $ be ...
Sgt. Slothrop
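The lemma is cut off above, but its standard form (presumably what is meant here) is the Huffman recursion: if $p_1$ and $p_2$ are the two smallest probabilities, then $$E(p_1,\dots,p_n) = E(p_1+p_2,\, p_3,\dots,p_n) + (p_1+p_2),$$ since merging the two least likely letters lengthens exactly those two codewords by one bit; applied repeatedly, the expected length is the sum of the merged probabilities over all $n-1$ merge steps.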
1 vote
0 answers
46 views

$\frac{1}{N}\sum_{n=1}^{N}\ln\left(\frac{p^{z_n}(1-p)^{1-z_n}}{{q^{z_n}(1-q)^{1-z_n}}}\right)$ approximates $D_{KL}(\text{Bern}(p)||\text{Bern}(q))$?

Let $Z_n\overset{iid}{\sim}\text{Bernoulli}(p)$ for some $p\in[0,1]$. Fix some $q\in[0,1]$, where $q$ may or may not be equal to $p$. Fix some $N\in\mathbb{N}$ and fix some sequence of ...
cluelessmathematician
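A one-line check of why the title expression is a natural estimator, under the i.i.d. model stated in the excerpt: a single term has expectation $$\mathbb{E}\!\left[\ln\frac{p^{Z}(1-p)^{1-Z}}{q^{Z}(1-q)^{1-Z}}\right] = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q} = D_{KL}(\text{Bern}(p)\,\|\,\text{Bern}(q)),$$ so by the law of large numbers the average over $N$ samples converges to the KL divergence as $N \to \infty$; for a fixed (non-random) sequence $z_n$ the same conclusion needs the empirical frequency of ones to converge to $p$.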
0 votes
0 answers
45 views

Entropy of a random walk on a chessboard: queen less than rook + bishop

I am looking at Cover's textbook on information theory and computed the entropies for random walks of chess pieces on an 8x8 board as follows: Rook: can move to 14 squares wherever it is $\to H(Rook) =...
warped
  • 101
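Under the uniform-over-legal-moves model used in the excerpt, a quick sanity check of the title's comparison: $H(\text{Rook}) = \log_2 14 \approx 3.81$ bits per move, and since a queen never has more than $27$ legal moves on an empty board, its per-move entropy is at most $\log_2 27 \approx 4.75$ bits, which is already below $\log_2 14 + \log_2 7 \approx 6.61$, the smallest possible value of rook plus bishop under the same counting. This is only a bound, not the full stationary random-walk entropy-rate computation.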
0 votes
0 answers
13 views

Geodesic Behavior in the Calvo-Oller Embedding of Multivariate Normal Distributions

The $\textbf{Calvo-Oller Embedding}$, introduced by Calvo and Oller, provides a novel way to study multivariate normal distributions by embedding their parameter space into the manifold of symmetric ...
Andyale
  • 51
2 votes
1 answer
149 views

Lower bound for Kullback–Leibler divergence between Bernoulli distributed random variables

Let $D(x, y)$ be the Kullback–Leibler divergence between Bernoulli distributed random variables, i.e. $D(x, y) = x \ln \frac{x}{y} + (1-x) \ln \frac{1-x}{1-y}$. I've seen this lower bound for this ...
Iakunin Aleksandr
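The excerpt does not show which bound is meant, but the one most commonly quoted in this setting is the Pinsker-type inequality $$D(x, y) \ge 2\,(x - y)^2 \quad (\text{in nats}),$$ which follows from Pinsker's inequality because the total variation distance between $\text{Bern}(x)$ and $\text{Bern}(y)$ is $|x - y|$.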
-2 votes
1 answer
51 views

Why Do the Entropies of Discrete and Continuous Distributions Diverge, Even When Their Distributions Converge? [closed]

I'm grappling with the concept of entropy as it relates to discrete and continuous distributions, and I’d appreciate some help understanding a thought experiment that’s left me puzzled. Thought ...
itchytoo
0 votes
0 answers
57 views

Maximizing the KL Divergence when it is Bounded

I've recently been working on trying to find the "worst" possible model, in the Kullback-Leibler (KL) divergence sense, to look at the robustness of an underlying algorithm. I was wondering ...
kjc93
  • 41
1 vote
0 answers
41 views

Show that $r(X,Y)=0$ when $Y$ is a function of $X$.

Let $X$ and $Y$ be two random variables. Then a measure called conditional residual extropy is defined as $$J(X|Y)=-\frac{1}{2}\int_0^\infty \bar{F}^2(x|y)\,dx, \hspace{2mm}y>0,$$ where $\bar{F}_{X|Y}(x\...
Unknown
  • 11
1 vote
1 answer
46 views

Question about Gaussian Mixture Model

A random vector $X = (X_1, \dots, X_k)'$ is said to have the multivariate normal distribution if it satisfies the following equivalent conditions: every linear combination of its components $Y = a_1X_1 + ...
Andyale
  • 51
0 votes
0 answers
27 views

Conditional limit theorem for method of types with explicit bounds in terms of n

A lot of large deviation theory seems to be concerned only with the asymptotic nature of certain probabilities, mainly because, I imagine, most applications have $n$ so big that lower-order terms are ...
latbbltes
  • 430
1 vote
0 answers
55 views

Understanding the entropy of the normal distribution

Using the standard definition of $H[p]$, you can show that the (differential) entropy of a normally distributed variable is $$ H(\mu, \sigma^2) = \frac{1}{2} \ln\left(2\pi\sigma^2\right) + \frac{1}{2} $$ in units ...
Xander
  • 71
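A short derivation of the stated formula, for context: with $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}$, $$H = -\int p(x)\ln p(x)\,dx = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{\mathbb{E}[(X-\mu)^2]}{2\sigma^2} = \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2},$$ in nats; note this is a differential entropy, so it can be negative for small $\sigma$.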
1 vote
0 answers
65 views

Is there any good definition of the "interesting" information contained in a string?

The best-known measure of a string's information content is Kolmogorov complexity. By this measure, random strings contain the most information. However, it's clear that the information ...
CyborgOctopus
6 votes
3 answers
174 views

Link between channel capacity and mutual information

I am confused about the following: how can channel capacity, which is measured in bits per second, be equal to (the maximum) mutual information, which is measured in bits? Could someone ...
Ronald
  • 71
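The standard way this is usually reconciled: the capacity of a discrete memoryless channel is $C = \max_{p(x)} I(X;Y)$, measured in bits per channel use; multiplying by the number of channel uses per second (the symbol rate) converts it to bits per second, which is the form that appears in, for example, the Shannon-Hartley formula for a band-limited channel.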
