
Questions tagged [st.statistics]

3 votes
0 answers
54 views

Cuthill–McKee Guarantees?

I'm interested in the following problem: given $M$, a $p \times p$ symmetric sparse matrix (the number of non-zero elements in each row is at most $s \ll p$), find a matrix $B = PMP^T$ where $P$ is a ...
WeakLearner
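SciPy ships a (reverse) Cuthill–McKee implementation, so the behavior the question asks about is easy to probe empirically. A minimal sketch, assuming SciPy (the `bandwidth` helper is ad hoc, and RCM is a heuristic with no worst-case guarantee, which is exactly the gap the question is about):

```python
# Reverse Cuthill-McKee: permute a sparse symmetric matrix to reduce bandwidth.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.csgraph import reverse_cuthill_mckee

p = 200
A = sparse_random(p, p, density=0.02, random_state=0)
M = ((A + A.T) > 0).astype(float).tocsr()    # symmetric sparsity pattern

perm = reverse_cuthill_mckee(M, symmetric_mode=True)
B = M[perm][:, perm]                         # B = P M P^T

def bandwidth(S):
    i, j = S.nonzero()
    return int(np.max(np.abs(i - j))) if len(i) else 0

print(bandwidth(M), bandwidth(B))            # bandwidth typically shrinks
```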
2 votes
1 answer
375 views

Additive Chernoff bound

From Wikipedia (additive form, absolute error): The following theorem is due to Wassily Hoeffding and hence is called the Chernoff–Hoeffding theorem. Chernoff–Hoeffding theorem. Suppose $X_1, \ldots, ...
Dotman • 109
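For context, the statement the excerpt is quoting (for i.i.d. $X_1, \ldots, X_n$ taking values in $\{0,1\}$ with $p = \mathbb{E}[X_i]$) is

$$\Pr\left(\frac{1}{n}\sum_{i=1}^n X_i \ge p+\varepsilon\right) \le e^{-n\,D(p+\varepsilon\,\|\,p)} \le e^{-2n\varepsilon^2},$$

where $D(a\,\|\,p) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$ is the KL divergence between Bernoulli distributions; the second inequality is the additive (absolute-error) form.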
3 votes
1 answer
118 views

Is a Single Linear MLP Equivalent to a Random Projection?

I am just hoping to confirm my hypothesis, that a single MLP (untrained and randomly initialized) can be used for random projection for dimensionality reduction. If a random MLP layer with no ...
Liam F-A
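The hypothesis is straightforward to sanity-check numerically. A minimal sketch, assuming Gaussian initialization, no bias, and no nonlinearity (so the "layer" is literally a Johnson–Lindenstrauss-style random matrix):

```python
# An untrained linear layer with random Gaussian weights acts as a random
# projection: pairwise distances are approximately preserved on average.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 512, 64
X = rng.normal(size=(n, d))

W = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))  # random "MLP layer"
Y = X @ W                                            # forward pass, no bias

i, j = rng.integers(n, size=100), rng.integers(n, size=100)
orig = np.linalg.norm(X[i] - X[j], axis=1)
proj = np.linalg.norm(Y[i] - Y[j], axis=1)
print(np.mean(proj / np.clip(orig, 1e-9, None)))     # ratio close to 1
```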
2 votes
1 answer
93 views

Are there pseudorandom sequences which cannot be learned by any ML model but which still fail the Diehard tests?

This is likely a very silly question which has a simple answer. As I understand, ML models are able to detect patterns in sequences. Given a sequence which is not truly random but rather only ...
AaronYeloy
1 vote
0 answers
166 views

Computational complexity of higher-order cumulants

From Wikipedia: In probability theory and statistics, the cumulants $\kappa_n$ of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. Any two ...
Omar Shehab
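For small orders the estimation itself is cheap; SciPy's k-statistics give unbiased cumulant estimates up to order 4, and it is the combinatorial growth of the moment-to-cumulant formulas at higher orders that the complexity question concerns. A minimal sketch, assuming SciPy:

```python
# Unbiased estimates of the first four cumulants via k-statistics.
import numpy as np
from scipy.stats import kstat

x = np.random.default_rng(0).normal(size=10_000)
for n in range(1, 5):
    print(n, kstat(x, n))   # for N(0,1): approximately 0, 1, 0, 0
```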
1 vote
0 answers
80 views

Relationship between statistical query lower bounds and "traditional" iid sampling lower bounds

Coming from a more statistical background, it is not clear to me if or how lower bounds in the statistical query (SQ) model imply anything useful about traditional learning problems with iid samples (...
student3365
2 votes
1 answer
91 views

Regularity Lemma for Multi-Relational Graphs?

Is there an analogue of the Szemerédi regularity lemma in the setting where I have a multi-relational graph, i.e. I have $n$ nodes, but instead of edges being in $\{0,1\}$, i.e. there is an edge or ...
SagarM • 716
3 votes
1 answer
216 views

Sample complexity lower bound to learn the mode (the value with the highest probability) of a distribution with finite support

Say we have a black-box access to a distribution $\mathcal{D}$ with finite support $\{1,2,...,n\}$ with probability mass function $i \mapsto p_i$. How many samples of $\mathcal{D}$ are needed to learn ...
actcon • 33
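As an illustration only (this simulates the natural plug-in estimator and proves no lower bound): take the empirical mode of $m$ samples and watch the error probability decay as $m$ grows relative to the gap between the top two probabilities.

```python
# Empirical-mode estimator: error rate vs. sample size.
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.35, 0.30, 0.20, 0.15])      # true mode is symbol 0
for m in [50, 200, 1000]:
    errs = 0
    for _ in range(2000):
        s = rng.choice(len(p), size=m, p=p)
        errs += np.bincount(s, minlength=len(p)).argmax() != 0
    print(m, errs / 2000)                    # error rate shrinks with m
```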
0 votes
0 answers
33 views

Minimax computation for classification problems with smooth density functions

Fix $d=1$, $r \in (0,\infty)$, and a neighborhood $\Omega$ of $0$ in $\mathbb R^d$, and let $W^{1,\infty}(r)$ be the Sobolev ball of continuously differentiable functions $f:\mathbb R^d \to \mathbb ...
dohmatob • 291
2 votes
1 answer
240 views

Upper bound for VCdim of $H$ in terms of subgraph$(F)$, where $H := \{S(f) | f \in F\}$, with $S(f) := \{(x,y) \in X \times \{\pm 1\} | yf(x) \le 1\}$

$\DeclareMathOperator\sg{sg}\DeclareMathOperator\VCdim{VCdim}$ Let $X$ be a measurable space and given a measurable function $f:X \to \mathbb R$, recall that the subgraph of $f$, denoted $\sg(f)$ is ...
dohmatob • 291
0 votes
1 answer
145 views

Differential privacy definition: subset of range of values vs. equals a value in the range

Consider only $\epsilon$-differential privacy. The textbook definition for this is: Definition 1: "A randomized algorithm $\mathcal{M}$ with domain $\mathbb{N}^{|\chi|}$ is $\epsilon$-...
user1246462
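For reference, the two variants being compared are, for all neighboring datasets $x, y$,

$$\Pr[\mathcal{M}(x) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(y) \in S] \quad \text{for all } S \subseteq \mathrm{Range}(\mathcal{M})$$

versus the pointwise version

$$\Pr[\mathcal{M}(x) = o] \le e^{\epsilon} \Pr[\mathcal{M}(y) = o] \quad \text{for all } o \in \mathrm{Range}(\mathcal{M}).$$

For a discrete range these are equivalent: summing the pointwise inequality over $o \in S$ gives the subset version, and taking $S = \{o\}$ recovers the pointwise one.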
0 votes
2 answers
169 views

An (unusual?) risk bound

I am told that a bound on the generalization error of the following form exists in terms of something called the "shattering coefficient", but I am not able to reference this quantity in ...
gradstudent • 1,453
1 vote
1 answer
280 views

Generalization bound for parameters rather than loss functions

I was wondering if it is possible to obtain high-probability bounds (given a finite sample size of the training data) for the distance (say in the $\ell_1$ or $\ell_2$ norm) between the best parameter set and ...
kd212149
3 votes
1 answer
70 views

Examples of learning via exactly integrable gradient flows

If $\ell(\vec{w}, \vec{z})$ is the loss function at weights $\vec{w}$ and data $\vec{z}$, then, corresponding to a distribution ${\cal D}$, we can consider doing gradient flow with step-length $\eta ...
gradstudent • 1,453
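One standard exactly-integrable example, stated here for context: for a quadratic loss $L(\vec w) = \frac{1}{2}(\vec w - \vec w^\ast)^\top H (\vec w - \vec w^\ast)$ with $H \succeq 0$, the gradient flow $\dot{\vec w}(t) = -\nabla L(\vec w(t))$ has the closed-form solution

$$\vec w(t) = \vec w^\ast + e^{-tH}\bigl(\vec w(0) - \vec w^\ast\bigr),$$

i.e. linear (least-squares) models are learned along an exactly integrable flow.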
0 votes
1 answer
67 views

Machine Learning: Calibrating Subgroups of Probability Predictions inside a Dataset Which Should Sum to 100%

I am working on an interesting type of problem where I want to make predictions for individual elements within subgroups, with the knowledge that the sum of the probabilities within a subgroup should ...
daffodil_flower_boy
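The simplest baseline for this constraint is to rescale raw scores within each subgroup so they sum to 1. A minimal sketch with hypothetical column names (note this is proportional rescaling, not a statistical calibration method like Platt scaling or isotonic regression):

```python
# Rescale raw model scores so each subgroup's probabilities sum to 100%.
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "score": [0.50, 0.30, 0.40, 0.90, 0.20],   # raw model outputs
})
df["prob"] = df["score"] / df.groupby("group")["score"].transform("sum")
print(df.groupby("group")["prob"].sum())        # 1.0 per group
```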
6 votes
1 answer
519 views

Is there an equivalent to VC-dimension for density estimation as opposed to classification?

VC-dimension can be used to quantify the capacity for classifier models and compute generalization bounds, but is there an equivalent concept that can be applied to density estimation, e.g. to compute ...
Andy • 105
7 votes
1 answer
325 views

Testing for finite expectation

The mean of a positive random variable $X$ is either finite or infinite; define $J(X)$ to be $0$ in the former case and $1$ in the latter case. Claim: there does not exist a function $J_n$ from the ...
Aryeh • 10.6k
3 votes
1 answer
239 views

Why do Asymptotic Equipartition Property proofs assume the source is memoryless?

I do not understand the assumption that $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$ in the AEP proofs I have seen. I have read several different sources for understanding the Asymptotic Equipartition Property. Using ...
Fred Guth • 143
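For context, the i.i.d. assumption is exactly what makes the joint log-probability factor into an average of i.i.d. terms,

$$-\frac{1}{n}\log p(X_1,\ldots,X_n) = -\frac{1}{n}\sum_{i=1}^n \log p(X_i) \xrightarrow{\;P\;} H(X),$$

so the weak law of large numbers applies; for sources with memory the analogous statement needs stationarity and ergodicity (the Shannon–McMillan–Breiman theorem).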
-1 votes
1 answer
68 views

Notation in proof for Asymptotic Equipartition Property

In the following lecture notes (chapter 3, pages 12-13), they state the following. We begin by introducing some important notation: for a set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality (...
sn3jd3r • 133
0 votes
1 answer
158 views

Volume of elements mapped to the same codeword is $2^{H(X|\hat{X})}$

In this paper by Tishby, Pereira, and Bialek, they mention on page 4, in the "Relevant quantization" section, that the setting is the following: given some signal space $X \sim p(x)$ and a quantized codebook $\...
sn3jd3r • 133
2 votes
1 answer
312 views

Why does non-uniform learnability not imply PAC learnability?

PAC guarantees provide us a learning algorithm $A_n(\cdot)$ and a sample complexity bound $n_{\mathcal{F}}(\epsilon,\sigma)$ that ensure $ P\left[L_P(A(\mathcal{D}^n))-L_P(f^*)\leq \epsilon\right]\...
CrispyMcDiarmid
0 votes
1 answer
121 views

Notation of sequences in rate distortion theory

I have been reading whatever sources I could get my hands on today regarding this problem. Most notes online about rate distortion theory come from the book Elements of Information Theory by Thomas ...
sn3jd3r • 133
3 votes
1 answer
209 views

Binary search on coin heads probability

Let $f:[0,1] \to [0,1]$ be a smooth, monotonically increasing function. I want to find the smallest $x$ such that $f(x) \ge 1/2$. If I had a way to compute $f(x)$ given $x$, I could simply use ...
D.W. • 12.4k
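The naive approach the question alludes to can be sketched as follows, assuming a hypothetical coin-flip oracle `flip(x)` that returns 1 with probability $f(x)$ (dedicated noisy-binary-search algorithms improve on this flip budget):

```python
# Noisy binary search: estimate f(mid) by repeated coin flips, then recurse.
import random

def flip(x):                       # hypothetical oracle; f(x) = x here
    return random.random() < x

def noisy_search(lo=0.0, hi=1.0, eps=1e-2, flips=2000):
    while hi - lo > eps:
        mid = (lo + hi) / 2
        est = sum(flip(mid) for _ in range(flips)) / flips
        if est >= 0.5:
            hi = mid               # f(mid) looks >= 1/2: go left
        else:
            lo = mid               # f(mid) looks < 1/2: go right
    return (lo + hi) / 2

print(noisy_search())              # ~0.5 for f(x) = x
```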
4 votes
0 answers
118 views

Strong data-processing inequality: bound $TV(T_{\#}P_0,T_{\#}P_1)$ if $\|T(x)-x\|_\infty \le \varepsilon\;\forall x \in \mathbb R^p$

Disclaimer: I've moved this question from MO hoping that here is the right venue. Also, this is my first post on this site, so please have some patience. So, let $X = (X,d)$ be a Polish space, ...
dohmatob • 291
4 votes
1 answer
284 views

Is this a known learning problem?

Let $(\mathcal{X},\rho)$ be a metric space (say, $\mathcal{X}=[0,1]$ with the Euclidean metric). Let $\alpha:\mathcal{X}\to[0,1]$ be unknown. Suppose that $\mathcal{X}$ is endowed with a distribution $...
Aryeh • 10.6k
2 votes
1 answer
282 views

Is the Chi-square divergence a Bregman divergence?

Is the Chi-squared divergence $\sum_{i} \frac{(x(i)-y(i))^2}{x(i)}$ a Bregman divergence? I.e., can it be written as $\phi(x) - \phi(y) - \langle\phi'(y),x-y\rangle$? If so, what is the potential ...
Yonatan • 33
4 votes
1 answer
80 views

Terminology and references for a learning model

Let's say we're doing regression over $[0,1]^d$ -- either in the PAC sense with bounded-range agnostic noise or in the more classical-statistics sense with additive Gaussian noise. Suppose further ...
Aryeh • 10.6k
1 vote
0 answers
57 views

Average smoothness learning rates

This question is somewhat related to this one. There are many results in statistics where convergence rates (including minimax ones) are given in terms of the smoothness properties of the underlying ...
Aryeh • 10.6k
0 votes
0 answers
89 views

Expected value of a random experiment in a graph

I need to find the expected value of R in the random experiment below. $$ R = \frac{1}{K} \sum_{C \in \mathcal{H} } \ [\frac{1}{2} |V(C)| * (|V(C)| - 1) - |C|] $$ $\mathcal{H}$ is a partition on ...
Morten Movik
2 votes
2 answers
584 views

About learning a single Gaussian in total-variation distance

I am looking for the proof of the following result, which I saw claimed as "folklore" in a paper. It would be helpful if someone could share a reference where this has been shown! Let $G$ ...
gradstudent • 1,453
2 votes
0 answers
105 views

Lower bounds for SRM?

This question is about structural risk minimization and model selection. Let $H_n$ be the collection of all binary classifiers on some fixed set with an $n$-bit description length in some fixed ...
Aryeh • 10.6k
2 votes
1 answer
142 views

Sample Complexity for Order Statistics

I have a sample complexity question which seems fairly basic, but for which I'm having trouble finding a reference. Let $F$ be an unknown distribution over $[0,1]$. Denote by $X_{k:n}$ the $k$th of $...
Lemke • 21
1 vote
0 answers
119 views

Average margin bounds for separable SVM

Suppose we're training a linear separator in the realizable PAC setting. Given $m$ labeled examples $(x_i,y_i)$ in $\mathbb R^d\times\{-1,1\}$, a (consistent) linear separator is a vector $w\in\mathbb ...
Aryeh • 10.6k
0 votes
0 answers
50 views

Function that maps non-linear distribution to normal distribution while maintaining distance

I have a collection $X$ of 10 million $(x,y,z)$ 3-tuples, where $x$, $y$, and $z$ are all numbers between 0 and 1. The distributions of $x$, $y$, and $z$ values are complex, and the distributions of $...
lightning • 433
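The standard marginal trick is to push each coordinate through its empirical CDF and then the normal quantile function; a minimal sketch assuming scikit-learn (note this Gaussianizes each marginal and preserves per-coordinate ranks, but it does not preserve pairwise distances, which is the hard part of the question):

```python
# Marginal Gaussianization via the quantile transform.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.random.default_rng(0).beta(0.5, 2.0, size=(10_000, 3))  # skewed data
qt = QuantileTransformer(output_distribution="normal", random_state=0)
Z = qt.fit_transform(X)              # each column is now approximately N(0, 1)
print(Z.mean(axis=0), Z.std(axis=0))
```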
3 votes
2 answers
325 views

How can AIC converge in the limit when even 2 parameter models can have infinite VC dimension?

AIC-based model-selection converges to zero error in the limit, and also has finite-sample convergence that is rate-optimal with respect to worst case minimax error [1]. (Note that AIC refers to ...
Andy • 105
4 votes
1 answer
106 views

Design a sampling process to select an element with probability proportional to its appearance probability in a simulation

We are given a black box $A$ that can run a simulation. Each run of box $A$ gives a sample $S \in 2^X$, where $X$ is a finite ground set. Let $\Pr[x]$ be the probability that $x \in X$ appears in ...
Arthur • 117
1 vote
0 answers
69 views

To what extent does supervised learning (ERM) learn first-order knowledge?

Suppose I have a collection of (hidden) first-order rules: $$ \mathcal{R}: \{ Q_i(x) \Rightarrow P_i(x) \}_{i=1}^{k} $$ all defined over $x \in \mathcal{X}$. I can use these rules and (automatically) ...
Daniel • 749
-1 votes
1 answer
170 views

Application of the inequality with expectations

Let $\Vert\cdot\Vert$ be a norm on $\mathbb R^n$. Let $x_1,\dots,x_N$ be non-independent Rademacher random variables (variables which are uniform on $\{-1, 1\}$). By $E$ we denote an ...
user124297
0 votes
1 answer
115 views

Learning a discrete distribution in $\ell_r$ norm

Let $P=(p_1,\ldots,p_d)$ be a distribution on $[d]$. Given $n$ iid draws from $P$, we construct some empirical estimate $\hat P_n=(\hat p_{n,1},\ldots,\hat p_{n,d})$. Let us define the $r$-risk by $$ ...
Aryeh • 10.6k
15 votes
0 answers
363 views

Differential privacy and data poisoning

A differentially private algorithm takes datasets containing inputs and produces randomized outputs, such that no small change in the dataset can shift the distribution of outputs by too much. This ...
WuTheFWasThat
6 votes
2 answers
671 views

Learning a coin's bias (localized)

It's well known that the minimax sample complexity for estimating the bias $p$ of a coin to additive error $\epsilon$ with confidence $\delta$ is $\Theta(\epsilon^{-2}\log(1/\delta))$. What if we ...
Aryeh • 10.6k
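For reference, the non-localized baseline behind the $\Theta(\epsilon^{-2}\log(1/\delta))$ rate is Hoeffding's inequality; a minimal sketch of the resulting sample-size rule:

```python
# Hoeffding-based flip count: estimate a coin's bias to within eps,
# with failure probability at most delta.
import math

def hoeffding_n(eps, delta):
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

print(hoeffding_n(0.01, 0.05))   # 18445 flips
```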
-1 votes
1 answer
183 views

L1 / Variational Distance between distributions [closed]

My statistics knowledge is somewhat poor, so I have to ask one (dumb) question. Let $\beta$ be a real number in the interval $\big[0, \frac{1}{2}\big)$ and $\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}...
Cip Baetu
9 votes
2 answers
356 views

What is the connection between moments of Gaussians and perfect matchings of graphs?

Today, I heard the following statement in a talk: The 4th moment of a $1$-dimensional Gaussian distribution with mean $0$ and variance $1$ is the same as the number of perfect matchings of a ...
Jardine • 405
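The statement in the talk is an instance of Isserlis' (Wick's) theorem: for a centered jointly Gaussian vector,

$$\mathbb{E}[X_{i_1} X_{i_2} \cdots X_{i_{2k}}] = \sum_{\text{pairings } \pi} \; \prod_{\{a,b\} \in \pi} \mathbb{E}[X_{i_a} X_{i_b}],$$

one term per perfect matching of the $2k$ indices. For $X \sim N(0,1)$ every pairing contributes $1$, so $\mathbb{E}[X^4] = 3$, the number of perfect matchings of $4$ labeled vertices; in general $\mathbb{E}[X^{2k}] = (2k-1)!!$ counts the perfect matchings of $K_{2k}$.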
2 votes
1 answer
165 views

Learning from derivative data

In many machine learning algorithms, it is often assumed that outputs of an unknown function and their corresponding inputs are given in order to estimate the unknown function. However, I wonder whether there ...
Kyoungjae Lee
2 votes
0 answers
71 views

Impossibility result on metric learning?

Are there any fundamental limitations (impossibility results) known for metric learning? Is there any direct connection or reduction from/to clustering that would let me use results in clustering? (e.g. this: 2 ) 2 ...
Daniel • 749
1 vote
0 answers
171 views

Maximal correlation vs correlation coefficient when one RV is Gaussian

Last week I asked a question on MOF (see here), but I got no reply. So I am asking my question here. Let $(X,Y)$ be a pair of continuous random variables (i.e., they both have ...
SAmath • 435
7 votes
1 answer
1k views

Exponential Concentration Inequality for Higher-order moments of Gaussian Random Variables

Let $X_1,\ldots, X_n$ be $n$ i.i.d. copies of a Gaussian random variable $X \sim N(0, \sigma^2)$. It is known that \begin{align} \mathbb{P}\Bigl( \Bigl|\frac{1}{n}\sum_{j=1}^n X_j \Bigr| >t\Bigr) &...
Steve • 449
2 votes
1 answer
164 views

Tolerance parameter of statistical query model and adaptivity

It seems that the reasonable assumption for the tolerance parameter of statistical query model is roughly $1/\sqrt{n}$, which is obtained from concentration inequalities (see, e.g., Definition 2.3 of ...
Minkov • 862
4 votes
3 answers
274 views

Approximating distributions from samples

One claim I find in many papers about identity testing and closeness testing is that any distribution over $[n]$ can be approximated to within $\ell_1$ distance $\epsilon$ in $O\left(\frac{n}{\...
Devil • 419
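A quick simulation (illustration, not a proof) showing the empirical distribution's $\ell_1$ error shrinking at the claimed $\sqrt{n/m}$ rate for $m$ samples:

```python
# l1 distance between a distribution on [n] and its empirical estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 100
p = rng.dirichlet(np.ones(n))                # arbitrary distribution on [n]
for m in [n, 10 * n, 100 * n]:
    counts = np.bincount(rng.choice(n, size=m, p=p), minlength=n)
    print(m, np.abs(counts / m - p).sum())   # error ~ sqrt(n / m)
```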
0 votes
0 answers
54 views

Establishing causality under conditions of certainty

I'm currently reading "Causality: Models, Reasoning, and Inference" by Judea Pearl. Early on, he states that the development assumes that there are no certain entailments, no 1 or 0 probabilities -- ...
James Koppel