Questions tagged [st.statistics]
The st.statistics tag has no usage guidance.
82 questions
3
votes
0
answers
54
views
Cuthill-McKee Guarantees?
I'm interested in the following problem: given $M$, a $p \times p $ symmetric sparse matrix (the number of non-zero elements in each row is at most $s \ll p$), find a matrix $B = PMP^T$ where $P$ is a ...
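For anyone wanting to experiment with this empirically: SciPy ships a reverse Cuthill-McKee implementation, so the reordering itself is easy to try (the worst-case guarantees the question asks about are a separate matter). A minimal sketch on a random sparse symmetric matrix:

```python
# Minimal sketch: apply reverse Cuthill-McKee to a random sparse symmetric
# matrix and compare bandwidths. This only demonstrates the reordering; it
# says nothing about the guarantees the question asks for.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the nonzero entries of a sparse matrix."""
    coo = A.tocoo()
    return int(np.abs(coo.row - coo.col).max())

p = 200
A = sp.random(p, p, density=0.02, format="csr", random_state=0)
M = (A + A.T).tocsr()  # symmetrize

perm = reverse_cuthill_mckee(M, symmetric_mode=True)
B = M[perm, :][:, perm]  # B = P M P^T, realized as a row/column permutation

print("bandwidth before:", bandwidth(M), "after:", bandwidth(B))
```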
2
votes
1
answer
375
views
Additive Chernoff bound
From Wikipedia:
Additive form (absolute error)
The following theorem is due to Wassily Hoeffding and hence is called the Chernoff-Hoeffding theorem.
Chernoff-Hoeffding theorem.
Suppose $X_1, \ldots, ...
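For reference, the statement being quoted continues, to the best of my recollection of the Wikipedia article, roughly as: if $X_1, \ldots, X_n$ are i.i.d. Bernoulli with mean $p$, then for $\varepsilon > 0$
$$
\Pr\left[\frac{1}{n}\sum_{i=1}^n X_i \ge p + \varepsilon\right] \le e^{-n\,D(p+\varepsilon\,\|\,p)} \le e^{-2n\varepsilon^2},
$$
where $D(x\|y) = x\ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}$ is the binary KL divergence, and the second ("additive form") inequality follows from $D(x\|y) \ge 2(x-y)^2$ (Pinsker).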
3
votes
1
answer
118
views
Is a Single Linear MLP Equivalent to a Random Projection?
I am just hoping to confirm my hypothesis that a single MLP (untrained and randomly initialized) can be used for random projection for dimensionality reduction.
If a random MLP layer with no ...
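A quick numerical sanity check of the hypothesis: an untrained linear layer with Gaussian entries is exactly a Johnson-Lindenstrauss-style random projection, so pairwise distances should be roughly preserved. A minimal NumPy sketch (the $1/\sqrt{k}$ scaling is my assumption, matching the standard JL construction rather than any particular framework's default init):

```python
# Sketch: an untrained linear layer y = Wx with Gaussian entries is a
# random projection. Check distance preservation on random data.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 1000, 100, 50          # input dim, projected dim, num points
X = rng.normal(size=(n, d))

W = rng.normal(size=(d, k)) / np.sqrt(k)   # JL-style scaling (assumption)
Y = X @ W                                   # "forward pass" of the layer

def pdist(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

ratios = pdist(Y)[np.triu_indices(n, 1)] / pdist(X)[np.triu_indices(n, 1)]
print("distance ratios: mean %.3f, std %.3f" % (ratios.mean(), ratios.std()))
# Ratios concentrate around 1, as JL predicts for k ~ log(n)/eps^2.
```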
2
votes
1
answer
93
views
Are there pseudorandom sequences which cannot be learned by any ML model but which still fail the Diehard tests?
This is likely a very silly question which has a simple answer. As I understand, ML models are able to detect patterns in sequences. Given a sequence which is not truly random but rather only ...
1
vote
0
answers
166
views
Computational complexity of higher order cumulants
From Wikipedia:
In probability theory and statistics, the cumulants $\kappa_n$ of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. Any two ...
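As a concrete anchor for the computational question: given the first $n$ raw moments, the cumulants follow from the standard moment-cumulant recursion $\kappa_n = m_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k m_{n-k}$, which costs $O(n^2)$ arithmetic operations. A small sketch:

```python
# Sketch: cumulants from raw moments via the standard recursion
#   m_n = sum_{k=1}^{n} C(n-1, k-1) * kappa_k * m_{n-k},   m_0 = 1.
# Cost is O(n^2) arithmetic operations given the moments.
from math import comb

def cumulants_from_moments(m):
    """m[j] = E[X^j] for j = 0..n (with m[0] = 1); returns kappa[1..n]."""
    n = len(m) - 1
    kappa = [0.0] * (n + 1)
    for j in range(1, n + 1):
        kappa[j] = m[j] - sum(
            comb(j - 1, k - 1) * kappa[k] * m[j - k] for k in range(1, j)
        )
    return kappa[1:]

# Standard Gaussian: moments 1, 0, 1, 0, 3 -> cumulants 0, 1, 0, 0
print(cumulants_from_moments([1, 0, 1, 0, 3]))
```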
1
vote
0
answers
80
views
Relationship between statistical query lower bounds and "traditional" iid sampling lower bounds
Coming from a more statistical background, it is not clear to me if or how lower bounds in the statistical query (SQ) model imply anything useful about traditional learning problems with iid samples (...
2
votes
1
answer
91
views
Regularity Lemma for Multi-Relational Graphs?
Is there an analogue of the Szemerédi regularity lemma in the setting where I have a multi-relational graph, i.e. I have $n$ nodes, but instead of edges being in $\{0,1\}$, i.e. there is an edge or ...
3
votes
1
answer
216
views
Sample complexity lower bound to learn the mode (the value with the highest probability) of a distribution with finite support
Say we have a black-box access to a distribution $\mathcal{D}$ with finite support $\{1,2,...,n\}$ with probability mass function $i \mapsto p_i$. How many samples of $\mathcal{D}$ are needed to learn ...
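The obvious baseline here is the plug-in estimator: take the empirical mode. A minimal sketch (that roughly $\log(1/\delta)/\Delta^2$ samples suffice, with $\Delta$ the gap between the top two probabilities, follows from Chernoff bounds; I am treating that as a standard fact rather than the tight answer to the question):

```python
# Sketch: plug-in mode estimator from iid samples of a finite-support
# distribution. Correct w.h.p. once the sample size exceeds ~ log(1/delta)
# over the squared gap between the top two probabilities (Chernoff bound).
from collections import Counter
import random

def empirical_mode(sampler, m):
    """Draw m samples from `sampler()` and return the most frequent value."""
    counts = Counter(sampler() for _ in range(m))
    return counts.most_common(1)[0][0]

p = [0.4, 0.3, 0.2, 0.1]                 # hypothetical p_i; the mode is 1
sampler = lambda: random.choices(range(1, 5), weights=p)[0]
print(empirical_mode(sampler, 2000))     # prints 1 with high probability
```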
0
votes
0
answers
33
views
Minimax computation for classification problems with smooth densities functions
Fix $d=1$, $r \in (0,\infty)$, and a neighborhood $\Omega$ of $0$ in $\mathbb R^d$, and let $W^{1,\infty}(r)$ be the Sobolev ball of continuously differentiable functions $f:\mathbb R^d \to \mathbb ...
2
votes
1
answer
240
views
Upper bound for VCdim of $H$ in terms of subgraph$(F)$, where $H := \{S(f) | f \in F\}$, with $S(f) := \{(x,y) \in X \times \{\pm 1\} | yf(x) \le 1\}$
$\DeclareMathOperator\sg{sg}\DeclareMathOperator\VCdim{VCdim}$
Let $X$ be a measurable space and given a measurable function $f:X \to \mathbb R$, recall that the subgraph of $f$, denoted $\sg(f)$ is ...
0
votes
1
answer
145
views
Differential privacy definition: subset of range of values vs. equals a value in the range
Consider only $\epsilon$-differential privacy. The textbook definition for this is:
Definition 1: "A randomized algorithm $\mathcal{M}$ with domain $\mathbb{N}^{|\chi|}$ is $\epsilon$-...
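For context (paraphrasing the Dwork-Roth textbook definition, which quantifies over events): $\mathcal{M}$ is $\epsilon$-differentially private if for all adjacent $x, y$ and all $S \subseteq \mathrm{Range}(\mathcal{M})$,
$$
\Pr[\mathcal{M}(x) \in S] \le e^{\epsilon}\,\Pr[\mathcal{M}(y) \in S].
$$
For a discrete range the two variants in the title are equivalent: the pointwise version $\Pr[\mathcal{M}(x) = r] \le e^{\epsilon} \Pr[\mathcal{M}(y) = r]$ implies the subset version by summing over $r \in S$, and the converse follows by taking $S = \{r\}$; for continuous ranges only the event-based form is meaningful.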
0
votes
2
answers
169
views
An (unusual?) risk bound
I am told that a bound on the generalization error of the following form exists in terms of something called the "shattering coefficient", but I am not able to find this quantity referenced in ...
1
vote
1
answer
280
views
Generalization bound for parameters rather than loss functions
I was wondering if it is possible to obtain high-probability bounds (given a finite training sample size) for the distance (say in the $\ell_1$ or $\ell_2$ norm) between the best parameter set and ...
3
votes
1
answer
70
views
Examples of learning via exactly integrable gradient flows
If $\ell (\vec{w}, \vec{z})$ is the loss function at weights $\vec{w}$ and for data $\vec{z}$ then corresponding to a distribution ${\cal D}$ we can consider doing gradient flow with step-length $\eta ...
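One family where the flow is exactly integrable, sketched here in case it is the kind of example sought (notation mine, since the excerpt is cut off): quadratic population loss. If $\bar\ell(\vec{w}) = \mathbb{E}_{\vec{z}\sim\mathcal{D}}\,\ell(\vec{w},\vec{z}) = \frac{1}{2}(\vec{w}-\vec{w}_*)^\top H (\vec{w}-\vec{w}_*)$, then gradient flow
$$
\dot{\vec{w}}(t) = -\eta\,\nabla\bar\ell(\vec{w}(t)) = -\eta H(\vec{w}(t)-\vec{w}_*)
$$
has the closed-form solution $\vec{w}(t) = \vec{w}_* + e^{-\eta H t}(\vec{w}(0)-\vec{w}_*)$, converging along each eigendirection of $H$ at a rate set by the corresponding eigenvalue.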
0
votes
1
answer
67
views
Machine Learning: Calibrating SubGroups of Probability Predictions inside a Dataset which should sum to 100%
I am working on an interesting type of problem where I want to make predictions for individual elements within subgroups, with the knowledge that the sum of the probabilities within a subgroup should ...
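The simplest baseline for this constraint is to renormalize the model's probabilities within each subgroup so they sum to 1. A minimal sketch (column names are hypothetical):

```python
# Sketch: force predicted probabilities to sum to 1 within each subgroup
# by renormalizing. A crude baseline; it rescales rather than recalibrates.
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],      # hypothetical subgroup ids
    "p_raw": [0.5, 0.3, 0.4, 0.9, 0.3],      # hypothetical raw predictions
})
df["p_cal"] = df["p_raw"] / df.groupby("group")["p_raw"].transform("sum")
print(df)   # within each group, p_cal now sums to 1.0
```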
6
votes
1
answer
519
views
Is there an equivalent to VC-dimension for density estimation as opposed to classification?
VC-dimension can be used to quantify the capacity for classifier models and compute generalization bounds, but is there an equivalent concept that can be applied to density estimation, e.g. to compute ...
7
votes
1
answer
325
views
Testing for finite expectation
The mean of a positive random variable $X$ is either finite or infinite; define $J(X)$ to be $0$ in the former case and $1$ in the latter case. Claim: there does not exist a function $J_n$ from the ...
3
votes
1
answer
239
views
Why do Asymptotic Equipartition Property theorem proofs assume the source is memoryless?
I do not understand the assumption that $X_1, X_2, \cdots$ are i.i.d. $\sim p(x)$ in the AEP proofs I have seen. I have read some different sources for understanding the Asymptotic Equipartition Property. Using ...
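For what it is worth, the i.i.d. assumption is exactly what reduces the AEP to the weak law of large numbers: independence factorizes the joint probability, so
$$
-\frac{1}{n}\log p(X_1,\ldots,X_n) = -\frac{1}{n}\sum_{i=1}^{n}\log p(X_i) \xrightarrow{\;P\;} \mathbb{E}[-\log p(X)] = H(X).
$$
Versions for stationary ergodic sources exist (the Shannon-McMillan-Breiman theorem) but require a different proof.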
-1
votes
1
answer
68
views
Notation in proof for Asymptotic Equipartition Property
In the following lecture notes (chapter 3, pages 12-13), they state the following:
We begin by introducing some important notation:
- For a set $\mathcal{S},|\mathcal{S}|$ denotes its cardinality (...
0
votes
1
answer
158
views
Volume of elements mapped to the same codeword is $2^{H(X|\hat{X})}$
In this paper by Tishby, Pereira and Bialek, they mention on page 4, in the Relevant Quantization section, that the setting is the following: given some signal space $X \sim p(x)$ and a quantized codebook $\...
2
votes
1
answer
312
views
Why does non-uniform learnability not imply PAC learnability?
PAC guarantees provide us a learning algorithm $A_n(\cdot)$ and a sample complexity bound $n_{\mathcal{F}}(\epsilon,\sigma)$ that ensures
$P\left[L_P(A(\mathcal{D}^n))-L_P(f^*)\leq \epsilon\right]\...
0
votes
1
answer
121
views
Notation of sequences in rate distortion theory
I have been reading whatever sources I could get my hands on today regarding this problem.
Most notes online about rate distortion theory come from the book Elements of Information Theory by Thomas ...
3
votes
1
answer
209
views
Binary search on coin heads probability
Let $f:[0,1] \to [0,1]$ be a smooth, monotonically increasing function. I want to find the smallest $x$ such that $f(x) \ge 1/2$.
If I had a way to compute $f(x)$ given $x$, I could simply use ...
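A sketch of the natural approach, in case it helps frame the question: noisy binary search, estimating $f(\mathrm{mid})$ at each step from repeated coin flips. The per-step flip budget below is a crude Hoeffding-based choice and assumes $f$ is bounded away from $1/2$ outside a small neighborhood of the target; the interesting part of the question is precisely how to avoid wasting flips when $f(\mathrm{mid})$ is close to $1/2$.

```python
# Sketch: noisy binary search for the smallest x with f(x) >= 1/2,
# given only Bernoulli(f(x)) samples at chosen points x.
import math, random

def noisy_bisect(flip, eps=0.01, delta=1e-3, steps=20):
    """flip(x) -> 0/1 sample with mean f(x); returns x with f(x) ~ 1/2."""
    lo, hi = 0.0, 1.0
    # Hoeffding: m flips estimate f(mid) to +/- eps w.p. 1 - delta/steps.
    m = math.ceil(math.log(2 * steps / delta) / (2 * eps ** 2))
    for _ in range(steps):
        mid = (lo + hi) / 2
        est = sum(flip(mid) for _ in range(m)) / m
        if est >= 0.5:
            hi = mid          # f(mid) likely >= 1/2 - eps: search left
        else:
            lo = mid
    return hi

f = lambda x: min(1.0, x ** 2 + 0.2)             # hypothetical smooth f
flip = lambda x: 1 if random.random() < f(x) else 0
print(noisy_bisect(flip))    # ~ sqrt(0.3) ~ 0.548 for this f
```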
4
votes
0
answers
118
views
Strong data-processing inequality: bound $TV(T_{\#}P_0,T_{\#}P_1)$ if $\|T(x)-x\|_\infty \le \varepsilon \;\forall x \in \mathbb R^p$
Disclaimer. I've moved this question from MO hoping that this is the right venue. Also, this is my first post on this site, so please have some patience.
So, let $X = (X,d)$ be a Polish space, ...
4
votes
1
answer
284
views
Is this a known learning problem?
Let $(\mathcal{X},\rho)$ be a metric space (say, $\mathcal{X}=[0,1]$ with the Euclidean metric). Let $\alpha:\mathcal{X}\to[0,1]$ be unknown. Suppose that $\mathcal{X}$ is endowed with a distribution $...
2
votes
1
answer
282
views
Is the Chi-square divergence a Bregman divergence?
Is the Chi-squared divergence $\sum_{i} \frac{(x(i)-y(i))^2}{x(i)}$ a Bregman divergence? I.e., can it be written as
$\phi(x) - \phi(y) - \langle\phi'(y),x-y\rangle$?
If so, what is the potential ...
4
votes
1
answer
80
views
Terminology and references for a learning model
Let's say we're doing regression over $[0,1]^d$ -- either in the PAC sense with bounded-range agnostic noise or in the more classical-statistics sense with additive Gaussian noise. Suppose further ...
1
vote
0
answers
57
views
Average smoothness learning rates
This question is somewhat related to this one. There are many results in statistics where convergence rates (including minimax ones) are given in terms of the smoothness properties of the underlying ...
0
votes
0
answers
89
views
Expected value of a random experiment in a graph
I need to find the expected value of $R$ in the random experiment below.
$$
R = \frac{1}{K} \sum_{C \in \mathcal{H}} \left[ \frac{1}{2}\,|V(C)| \cdot (|V(C)| - 1) - |C| \right]
$$
$\mathcal{H}$ is a partition on ...
2
votes
2
answers
584
views
About learning a single Gaussian in total-variation distance
I am looking for a proof of the following result, which I saw claimed as "folklore" in a paper. It would be helpful if someone could share a reference where this has been shown!
Let $G$ ...
2
votes
0
answers
105
views
Lower bounds for SRM?
This question is about structural risk minimization and model selection. Let $H_n$ be the collection of all binary classifiers on some fixed set with an $n$-bit description length in some fixed ...
2
votes
1
answer
142
views
Sample Complexity for Order Statistics
I have a sample complexity question which seems fairly basic, but for which I'm having trouble finding a reference.
Let $F$ be an unknown distribution over $[0,1]$. Denote by $X_{k:n}$ the $k$th of $...
1
vote
0
answers
119
views
Average margin bounds for separable SVM
Suppose we're training a linear separator in the realizable PAC setting. Given $m$ labeled examples $(x_i,y_i)$ in $\mathbb R^d\times\{-1,1\}$, a (consistent) linear separator is a vector $w\in\mathbb ...
0
votes
0
answers
50
views
Function that maps non-linear distribution to normal distribution while maintaining distance
I have a collection $X$ of 10 million $(x,y,z)$ 3-tuples, where $x$, $y$, and $z$ are all numbers between 0 and 1. The distributions of $x$, $y$, and $z$ values are complex, and the distributions of $...
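The standard tool for the "make it normal" half is a per-coordinate rank (probability integral) transform; the caveat, which seems central to the question, is that it generally will not preserve pairwise distances. A minimal sketch:

```python
# Sketch: per-coordinate rank transform to normal marginals.
# This makes each coordinate ~ N(0,1) but does NOT in general preserve
# pairwise distances, which is the hard part of the question.
import numpy as np
from scipy.stats import norm, rankdata

def to_normal_marginals(X):
    """Map each column of X to N(0,1) via empirical CDF + Gaussian quantile."""
    n = X.shape[0]
    U = rankdata(X, axis=0) / (n + 1)   # ranks -> (0,1), ties averaged
    return norm.ppf(U)

X = np.random.default_rng(0).beta(0.5, 2.0, size=(10000, 3))  # skewed toy data
Z = to_normal_marginals(X)
print(Z.mean(axis=0), Z.std(axis=0))   # roughly 0 and 1 per column
```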
3
votes
2
answers
325
views
How can AIC converge in the limit when even 2 parameter models can have infinite VC dimension?
AIC-based model-selection converges to zero error in the limit, and also has finite-sample convergence that is rate-optimal with respect to worst case minimax error [1]. (Note that AIC refers to ...
4
votes
1
answer
106
views
Design a sampling process to select an element with probability proportional to its appearance probability in a simulation
We are given a black box $A$ that can run a simulation. Each run of box $A$ gives a sample $S \in 2^X$, where $X$ is a finite ground set.
Let $\Pr[x]$ be the probability that $x \in X$ appears in ...
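One approach that seems to work, assuming membership in the returned set can be tested and $X$ is small enough to sample from uniformly: rejection sampling. A round outputs $x$ with probability $\Pr[x]/|X|$, so each accepted output is distributed proportionally to $\Pr[x]$. A sketch:

```python
# Sketch: rejection sampling. Draw S from the black box A, draw x uniform
# from X, accept if x is in S. P(output = x) is proportional to Pr[x in S].
from collections import Counter
import random

def sample_proportional(A, X):
    while True:
        S = A()                      # one fresh run of the black-box simulation
        x = random.choice(X)         # uniform proposal over the ground set
        if x in S:
            return x                 # accepted w.p. Pr[x]/|X| per round

# Toy black box over X = {0,1,2}: element i appears w.p. (i+1)/3.
X = [0, 1, 2]
A = lambda: {i for i in X if random.random() < (i + 1) / 3}
print(Counter(sample_proportional(A, X) for _ in range(6000)))
# Counts come out roughly in ratio 1 : 2 : 3.
```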
1
vote
0
answers
69
views
To what extent does supervised learning (ERM) learn first-order knowledge?
Suppose I have a collection of (hidden) first-order rules:
$$
\mathcal{R}: \{ Q_i(x) \Rightarrow P_i(x) \}_{i=1}^{k}
$$
all defined over $x \in \mathcal{X}$.
I can use these rules and (automatically) ...
-1
votes
1
answer
170
views
Application of the inequality with expectations
Let $\Vert\cdot\Vert$ be a norm on $\mathbb{R}^n$. Let $x_1,\dots,x_N$ be non-independent Rademacher random variables (variables which are uniform on $\{-1, 1\}$). By $E$ we denote an ...
0
votes
1
answer
115
views
Learning a discrete distribution in $\ell_r$ norm
Let $P=(p_1,\ldots,p_d)$ be a distribution on $[d]$. Given $n$ iid draws from $P$, we construct some empirical estimate $\hat P_n=(\hat p_{n,1},\ldots,\hat p_{n,d})$. Let us define the $r$-risk by
$$ ...
15
votes
0
answers
363
views
Differential privacy and data poisoning
A differentially private algorithm takes datasets containing inputs and produces randomized outputs, such that no small change in the dataset can shift the distribution of outputs by too much. This ...
6
votes
2
answers
671
views
Learning a coin's bias (localized)
It's well known that the minimax sample complexity for estimating the bias $p$ of a coin to additive error $\epsilon$ with confidence $\delta$ is
$\Theta(\epsilon^{-2}\log(1/\delta))$. What if we ...
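For concreteness, the upper-bound half of the well-known $\Theta(\epsilon^{-2}\log(1/\delta))$ statement is just Hoeffding applied to the empirical mean; a sketch of that baseline (the localized version the question asks about is the interesting part):

```python
# Sketch: the standard (non-localized) estimator. Hoeffding gives
#   P(|p_hat - p| >= eps) <= 2 exp(-2 n eps^2) <= delta
# once n >= ln(2/delta) / (2 eps^2).
import math, random

def estimate_bias(flip, eps, delta):
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    return sum(flip() for _ in range(n)) / n

p = 0.37                                    # hypothetical true bias
flip = lambda: 1 if random.random() < p else 0
print(estimate_bias(flip, eps=0.01, delta=0.05))  # within 0.01 of p w.p. 0.95
```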
-1
votes
1
answer
183
views
L1 / Variational Distance between distributions [closed]
My statistics knowledge is somewhat poor, so I have to ask one (dumb) question.
Let $\beta$ be a real number in the interval $\big[0, \frac{1}{2}\big)$ and $\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}...
9
votes
2
answers
356
views
What is the connection between moments of Gaussians and perfect matchings of graphs?
Today, I heard the following statement in a talk:
The 4th moment of a $1$-dimensional Gaussian distribution with mean $0$ and variance $1$ is the same as the number of perfect matchings of a ...
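The statement is Wick's (Isserlis') theorem: for $g \sim N(0,1)$,
$$
\mathbb{E}[g^{2k}] = (2k-1)!! = \#\{\text{perfect matchings of } K_{2k}\},
$$
since expanding the expectation of a product of jointly Gaussian variables into pairwise covariances contributes one term per pairing of the $2k$ factors. For $k=2$: $\mathbb{E}[g^4]=3$, matching the three perfect matchings of $K_4$, namely $\{12,34\},\{13,24\},\{14,23\}$.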
2
votes
1
answer
165
views
Learning from derivative data
In many machine learning algorithms, it is often assumed that the outputs of an unknown function and their corresponding inputs are given in order to estimate the unknown function. However, I wonder whether there ...
2
votes
0
answers
71
views
Impossibility result on metric learning?
Are there any fundamental limitations (impossibility results) known for metric learning?
Are there any direct connections or reductions from/to clustering that would let me use known results there? (e.g., this) ...
1
vote
0
answers
171
views
Maximal correlation vs correlation coefficient when one RV is Gaussian
Last week I asked a question on MO (see here), but I got no reply. So I am asking my question here.
Let $(X,Y)$ be a pair of continuous random variables (i.e., they both have ...
7
votes
1
answer
1k
views
Exponential Concentration Inequality for Higher-order moments of Gaussian Random Variables
Let $X_1,\ldots, X_n$ be $n$ i.i.d. copies of a Gaussian random variable $X \sim N(0, \sigma^2)$. It is known that
\begin{align}
\mathbb{P}\Bigl( \Bigl|\frac{1}{n}\sum_{j=1}^n X_j \Bigr| >t\Bigr) &...
2
votes
1
answer
164
views
Tolerance parameter of statistical query model and adaptivity
It seems that the reasonable assumption for the tolerance parameter of statistical query model is roughly $1/\sqrt{n}$, which is obtained from concentration inequalities (see, e.g., Definition 2.3 of ...
4
votes
3
answers
274
views
Approximating distributions from samples
One claim I find in many papers about identity testing and closeness testing is that any distribution over $[n]$ can be approximated to within $\ell_1$ distance $\epsilon$ in $O\left(\frac{n}{\...
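The folklore argument I have seen for this (reading the truncated bound as the usual $O(n/\epsilon^2)$): the empirical distribution $\hat p$ from $m$ samples satisfies
$$
\mathbb{E}\,\|\hat p - p\|_1 = \sum_{i=1}^{n} \mathbb{E}\,|\hat p_i - p_i| \le \sum_{i=1}^{n} \sqrt{\frac{p_i}{m}} \le \sqrt{\frac{n}{m}},
$$
using $\mathbb{E}|\hat p_i - p_i| \le \sqrt{\operatorname{Var}(\hat p_i)} \le \sqrt{p_i/m}$ and Cauchy-Schwarz, so $m = O(n/\epsilon^2)$ samples give expected $\ell_1$ error $\epsilon$; Markov (or McDiarmid, for high probability) finishes the claim.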
0
votes
0
answers
54
views
Establishing causality under conditions of certainty
I'm currently reading "Causality: Models, Reasoning, and Inference" by Judea Pearl. Early on, he states that the development assumes that there are no certain entailments, no 1 or 0 probabilities -- ...