Questions tagged [mathematical-statistics]
Mathematical theory of statistics, concerned with formal definitions and general results.
7,896 questions
0
votes
0
answers
23
views
Resampling for AB Test, to achieve normal distribution under the CLT
I have (finally) wrapped my head around the Central Limit Theorem. Very exciting. However, I am struggling with how to apply it, or whether it should be applied, to an AB Test. In this example, let's say I ...
0
votes
0
answers
23
views
Correct usage of "sum" and "mean" for proportions vs continuous variables
What is the proper way to aggregate a measure of "proportions" vs "continuous" by "sum" vs "mean"?
For example, let's say I have "time_on_site" and "...
0
votes
0
answers
9
views
Stein's method for exchangeable sequence
I have a random (not i.i.d.) sequence that I can show is an exchangeable pair.
I am trying to apply relatively recent ideas in Berry-Esseen bounds to the problem (https://link.springer.com/chapter/10....
0
votes
1
answer
19
views
Interaction term negative when both its components are positive?
I examined the effect of labour cost, labour quality and their interaction (cost*quality) on FDI, but I got positive coefficients for both components and a negative coefficient for the interaction term. How could to ...
0
votes
1
answer
29
views
What is the relationship between OR, se, and CI
Apologies if this is a duplicate question, but I have been unable to find a clear answer.
What is the relationship between Odds Ratio, standard error, and confidence intervals?
I try to do a meta-...
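A minimal sketch of the standard relationship, assuming the usual Wald interval built on the log scale: the interval is $\exp(\ln \text{OR} \pm z \cdot \text{SE})$, where SE is the standard error of $\ln \text{OR}$, so the SE can be recovered from a reported interval. The function names and the example values (OR = 2.0, SE = 0.25) are hypothetical and not taken from the question.

```python
import math

def or_confidence_interval(odds_ratio, se_log_or, z=1.96):
    """Wald interval for an odds ratio, constructed on the log scale."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

def se_from_confidence_interval(lower, upper, z=1.96):
    """Recover SE(log OR) from a reported Wald interval (assumes the same z)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Hypothetical inputs, for illustration only.
lower, upper = or_confidence_interval(2.0, 0.25)
recovered_se = se_from_confidence_interval(lower, upper)
```

Because the interval is symmetric on the log scale, the odds ratio is the geometric mean of its limits. This is useful in a meta-analysis when only the OR and its interval are reported.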
2
votes
0
answers
34
views
Marginal empirical distribution from joint sample
I have a quite 'simple' doubt that I would like to clear up.
Suppose I have a hierarchical model where data is sampled in the following manner:
Sample $U_i$ from $P_U$
Sample $X_i$ from $P_{X|U_i}$
In ...
3
votes
1
answer
30
views
What is the difference between Tolerance Interval and Control Limits?
I understand that a tolerance interval indicates a range within which a specified percentage of a population (future value) is expected to fall.
Control limits, on the other hand, represent the ...
0
votes
0
answers
10
views
Fixed-Length Confidence Interval for Gaussian iid Mean [closed]
I have a question on an exercise where the goal is to find a bounded length confidence interval for the mean of Gaussian random variables. More precisely, my question is on part a) of the exercise ...
1
vote
0
answers
9
views
Cumulative distribution function of coherent system
The theorem is from here.
Theorem 3.1. Let $X_1, \ldots, X_n$ be the i.i.d. component lifetimes of an $n$-component coherent system with signature $s$, and let $T$ be the system's lifetime. Then
$$
\...
0
votes
0
answers
16
views
Estimate SD[Z=X*Y] which one is correct?
Suppose we have a sample of two variables X and Y with sample size n. I want to calculate the standard deviation of Z = X*Y. I don't know which of the two options below is correct.
Option 1:
Simply ...
4
votes
1
answer
68
views
How do machine learning topics fit into a traditional undergraduate statistics course on estimation?
I recently started teaching an undergraduate introduction to statistics course, but as required by the program director, I need to add some machine learning material to it. I'm wondering what is the appropriate ...
2
votes
1
answer
44
views
Deriving Scale Parameter in Exponential Family
The following is from p. 98 of Casella & Berger's Statistical Inference (2024 edition):
Several of the families introduced in Section 3.3 either are scale families or have scale families
as ...
0
votes
0
answers
16
views
Compare two variances
I am reading this paper
I have difficulty understanding Section 6: A Linear Time Statistic and Test.
At the beginning, they claim that $\text{MMD}^2_l$ has higher variance than $\text{MMD}^2_u$ (we ...
2
votes
1
answer
59
views
What exactly is the definition of a UMP unbiased test?
I am solving exercises from section 8.3 of Hogg and McKean's "Introduction to Mathematical Statistics." I cannot proceed because the authors have not formally defined a UMPU test even though ...
1
vote
0
answers
41
views
What is the proof of the mean sampling distribution being approximated by Student's t-distribution? [closed]
In the case of non-normal distribution with unknown variance, what is the justification that the sampling distribution of the mean (with large samples) is approximately Student’s t-distribution?
======...
1
vote
1
answer
42
views
Rate of convergence in probability
I am reading the paper
In this paper, they prove Theorem 7, which is stated as follows:
Theorem 7: Let $p, q, X, Y$ be defined as in Problem 1, and assume $0 \leq k(x, y) \leq K$. Then:
$$
\Pr_{...
2
votes
0
answers
43
views
Occurrence extrapolation/statistics in a set of incomplete collections
My statistics courses are a long way off now (I'm a biologist). My problem is probably trivial, but I don't know where to start.
My goal is to calculate the occurrence of a gene X in a set of genomes. ...
2
votes
2
answers
44
views
Mathematical Reference for Metropolis-Within-Gibbs Algorithm
Is there a MATHEMATICAL reference for the Metropolis-Within-Gibbs algorithm which proves the algorithm mathematically? (Presumably, the reference shall use facts in Markov Chain Theory, the fact that ...
1
vote
1
answer
60
views
Outcome Level vs. Treatment Level vs. Fixed-Effects Level in Difference-in-Differences
I am a bit confused about which controls I should include in my event-study (Callaway and Sant'Anna 2021) specifications.
One of my models tries to understand the impact of the opening of a public ...
1
vote
2
answers
25
views
Is a sequential binomial sample a multinomial sample?
Say I have N particles and I remove a fraction $f_1$ of these obtaining $k_1$ particles as
$$ k_1 \sim \text{Binomial}(N,f_1) $$
and from these selected $k_1$ I have another Binomial draw of $k_2$ ...
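A small numerical check may help here. This sketch assumes the second stage is $k_2 \sim \text{Binomial}(k_1, f_2)$, where the symbol $f_2$ and all numeric values are hypothetical, since the excerpt is truncated. Under that assumption, the joint law of $(N-k_1,\ k_1-k_2,\ k_2)$ coincides with a multinomial with cell probabilities $(1-f_1,\ f_1(1-f_2),\ f_1 f_2)$, and the code compares the two probability mass functions exactly.

```python
from math import comb, factorial

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def multinomial_pmf(counts, probs):
    """Multinomial probability of the given cell counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an exact integer at every step
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p**c
    return value

# Hypothetical values chosen for illustration.
N, f1, f2 = 20, 0.3, 0.5
cell_probs = (1 - f1, f1 * (1 - f2), f1 * f2)

# Largest discrepancy between the two-stage binomial joint pmf and the multinomial pmf.
max_gap = max(
    abs(binom_pmf(k1, N, f1) * binom_pmf(k2, k1, f2)
        - multinomial_pmf((N - k1, k1 - k2, k2), cell_probs))
    for k1 in range(N + 1)
    for k2 in range(k1 + 1)
)
```

With these inputs `max_gap` is at the level of floating-point rounding, consistent with the two descriptions defining the same distribution. In particular, the marginal of $k_2$ is $\text{Binomial}(N, f_1 f_2)$.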
0
votes
1
answer
27
views
Choose a good estimator in a candidate set
Recently, I've been interested in the following statistical problem:
I have a set that consists of some estimators $\hat{A}_i$ of a matrix $A\in \mathbb{R}^{p\times p}$. Then I have some data ...
6
votes
1
answer
226
views
MLE in stochastically increasing parametric family
Let $X$ have cumulative distribution function $F_{\theta}$, and suppose this family is stochastically increasing in $\theta$, that is, for $\theta_1<\theta_2$, $F(x;\theta_2) \le F(x;\theta_1)$.
We have one ...
0
votes
0
answers
25
views
Variable selection for checking causal relationship of regression model: should or should not? [duplicate]
I am looking for documents and online sources to understand whether or not I should exclude variables from my model through model selection (variable selection).
I also tried to use methods of Least ...
1
vote
1
answer
44
views
Generalization error as U-shaped curve with respect to model complexity (bias-variance tradeoff)
Is there any work that rigorously proves mathematically that the generalization error for certain learning problems exhibits a U-shaped curve with respect to model complexity (bias-variance tradeoff)? Any ...
2
votes
2
answers
57
views
(More complete) proof that the Fisher information is additive
For independent, identically distributed variables it is well known that the Fisher information is additive, i.e.
\begin{align}
\mathcal{I}_n(\theta)&=\left<{\left({\frac{\partial}{\partial\...
2
votes
1
answer
41
views
Alternatives for RMSE to Evaluate Goodness of Fit for Stable Distribution Parameters
I am estimating the parameters (alpha, beta, gamma, delta) of a stable distribution from a list of numerical data. I used a package to generate data from one type of stable distribution, specifically ...
1
vote
1
answer
91
views
Why could data bootstrapping modify the slope of a population coming from the same distribution?
I'm bootstrapping some samples (with replacement) to calculate slopes. Once that is done, the slopes, which should have the same distribution, do not have the same distribution. To be clear, I'm not asking ...
3
votes
1
answer
137
views
Different parametrizations of the exponential family
I found in this article https://arxiv.org/pdf/1607.06450 , formula 10, a parametrization of the exponential family that I think can be written like this:
$$P(t|\eta,s)=e^{\eta t/s}e^{-g(\eta)/s}e^{c(t,...
0
votes
1
answer
50
views
Connecting two different meanings of "degree of freedom"
I have heard at least 2 meanings of "degree of freedom".
The parameter in a t-distribution.
The number of values in the final calculation of a statistic that are free to vary (like ...
0
votes
0
answers
12
views
Non-Analytical Differentiable Hamiltonian Function in Neural Networks
I'm writing a study on this paper: https://arxiv.org/pdf/1906.01563v1
It's by Sam Greydanus et al., and they discovered that by using neural networks to predict the Hamiltonian (total energy of a system) ...
0
votes
0
answers
28
views
Prove that a test is most powerful when $X_1,\cdots,X_n\sim U(0,\theta)$
Let $X_1,\cdots,X_n\sim U(0,\theta),\theta >0$ be independent random variables. I want to prove that $\phi :\mathbb{R}^n\to [0,1]$ given by $\phi (x):=\begin{cases}1,&\theta _0<x_{(n)}\vee ...
2
votes
1
answer
43
views
A simple Hidden Markov Model
I was clarifying the exact formulas for the EM algorithm of a simple hidden Markov model. This problem came from Problem 25, Chapter 9, in which several practical examples of EM algorithms ...
8
votes
3
answers
577
views
Proof of the statement "the best test is unbiased"
There is a corollary from Hogg and McKean's textbook titled "Introduction to Mathematical Statistics" and I have miserably failed to understand the proof. Unfortunately, my question requires ...
0
votes
0
answers
9
views
Understanding the Implications of Similar Smooth Functions in Generalised Additive Models
I have a question about GAM models.
If I fit two GAM models, one including all variables together and the other adding one variable at a time, and the resulting smooth functions are similar, what does ...
3
votes
2
answers
83
views
Estimate a vector $\beta=\underbrace{\beta_1}_{\text{sparse}}+\underbrace{\beta_2}_{\text{dense}}$
In high-dimensional settings, we solve the linear regression using the lasso method which relies on the assumption of sparsity,
$$
\hat{\beta}=\underset{\beta\in \mathbb{R}^{p}}{\arg \min}\|Y-X\beta\|...
2
votes
1
answer
111
views
definition of regular estimators
In the book "Semiparametric Theory and Missing Data" by Tsiatis, superefficient estimators are defined as "they are unnatural and have undesirable local properties associated with them"...
2
votes
2
answers
30
views
Showing what the best line of fit is given the method of least squares - what do I do from here?
I have been working through Multivariable Calculus 4th Edition by James Stewart (©1999) and am currently stuck on what seems to be a stats problem on problem 51 of Chapter 15.7:
Suppose that a ...
0
votes
0
answers
25
views
Is there a form of regularized regression that's equivalent to maximum likelihood together with model selection by information criteria?
AIC
We often use stuff like AIC for model selection:
$$
\mathrm{AIC} = 2k - 2\ln(\hat{L})
$$
where $k$ is the number of parameters and $\hat{L}$ is the maximized value of the likelihood function.
https://en.wikipedia.org/wiki/...
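As a concrete illustration of the AIC formula, here is a minimal sketch that evaluates it for a Gaussian model fitted by maximum likelihood. The data values and function names are hypothetical. The parameter count $k = 2$ assumes both the mean and the variance are estimated.

```python
import math

def aic(max_log_likelihood, n_params):
    """AIC = 2k - 2 ln(L-hat), taking the maximized log-likelihood directly."""
    return 2 * n_params - 2 * max_log_likelihood

def gaussian_max_log_likelihood(data):
    """Maximized log-likelihood of an i.i.d. normal model (MLE mean and variance)."""
    n = len(data)
    mean = sum(data) / n
    var_mle = sum((x - mean) ** 2 for x in data) / n  # MLE divides by n, not n - 1
    return -0.5 * n * (math.log(2 * math.pi * var_mle) + 1)

# Hypothetical data, for illustration only.
data = [2.1, 1.9, 2.4, 2.0, 1.6]
log_l = gaussian_max_log_likelihood(data)
aic_value = aic(log_l, n_params=2)
```

Working with the log-likelihood rather than $\hat{L}$ itself avoids numerical underflow for larger samples.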
0
votes
0
answers
17
views
A practical way to understand subgaussian parameter
I am currently assuming that the random variable $X$ I am working with is subgaussian with parameter $\sigma^2$. I have simulated data, but I would like to know how to use the generated data to ...
0
votes
0
answers
25
views
Chi squared for samples of different sizes
I would like to test for independence of a categorical variable in three different samples, X, Y, and Z. Each sample can be either of category A or of category B. This seems like a straightforward ...
0
votes
0
answers
32
views
Square of the convergence rate $\|\hat{A}-A\|_F^2$
In a high-dimensional setting, if we have an estimator, $\hat{A}\in\mathbb{R}^{n\times n}$, we always try to get the convergence rate measured by matrix norms, such as $\|\hat{A}-A\|_F$.
Now, if I ...
1
vote
0
answers
21
views
Is $\mathbb{E}\left[\|\hat{\Sigma}\|_F\right]=\|{\Sigma}\|_F$?
In one paper I read, the authors write
$$
\mathbb{E}\left[\|\tilde{\Sigma}^{-\frac{1}{2}}\left(\hat{\Theta}-\Omega\right)\|_F^2\right]=\mathbb{E}\left[\|{\Sigma}^{-\frac{1}{2}}\left(\hat{\Theta}-\...
1
vote
1
answer
43
views
Unbiased Variance MLE Distribution
If you take $10000$ samples from a normal distribution, the unbiased variance MLE (with Bessel's correction) is
$$\hat{\sigma}^2 = \frac{1}{9999}\sum_i (x_i - \hat{\mu})^2$$
Apparently the distribution ...
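For reference, a minimal sketch of the Bessel-corrected estimator: the sum of squared deviations from the sample mean, divided by $n-1$. The four-point sample is hypothetical and far smaller than the $10000$ draws in the question. The result is cross-checked against `statistics.variance` from the Python standard library, which uses the same $n-1$ denominator.

```python
import statistics

def unbiased_variance(xs):
    """Sample variance with Bessel's correction (denominator n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical sample, for illustration only.
sample = [4.0, 7.0, 13.0, 16.0]
s2 = unbiased_variance(sample)
reference = statistics.variance(sample)  # also divides by n - 1
```

For normal data, $(n-1)\hat{\sigma}^2/\sigma^2$ follows a chi-squared distribution with $n-1$ degrees of freedom, which is the usual starting point for describing the estimator's sampling distribution.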
4
votes
1
answer
128
views
Closed form solution for bayesian linear regression with 2 responses?
I am thinking about first principles from the point of view of a frequentist moving from regression with 1 response to regression with 2 responses. Reflecting on that I am trying to figure out how to ...
1
vote
1
answer
48
views
The interpretation of the term "uncertainty" in statistics vs. information theory vs. machine learning
I have an ensemble model consisting of multiple classifiers and I wish to quantify the uncertainty of the predictions the ensemble model makes. From an information theory / machine learning ...
0
votes
0
answers
17
views
Deriving a multiple based on actuals and forecast values
For context, we are using the DeepAR model for demand planning forecasting. Currently the forecast often underrepresents actual demand. It was suggested that we use a higher quantile to overestimate ...
3
votes
1
answer
86
views
Probability expression in Multi-Task Logistic Regression
I'm trying to understand how the authors of this paper (Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors) obtain the general formula on page for the ...
4
votes
2
answers
141
views
Variance-bias tradeoff formula for simple linear regression with both X fixed and X random
I am trying to understand the Variance-Bias tradeoff formula using simple linear regression. But there are some formulas I am not able to derive. I will explain what I mean by first doing it for a ...
0
votes
0
answers
23
views
Confidence interval for entropy from Basharin’s asymptotic normality result
Setup:
Say we have i.i.d. observations $X_1, \dots, X_N$ from the distribution given by
$$
0 < p_i := \mathbb P(X_j = i) < 1, \quad i = 1, \dots, s, \quad \text{and} \quad \sum_{i = 1}^s p_i = 1....
2
votes
1
answer
47
views
Conditions for Pointwise convergence to imply uniform convergence
I have the following situation. Let $f:\mathbb{R}^p \times \Theta \to \mathbb{R}$ be a measurable function. Moreover, let $X_n$ be a sequence of real-valued random vectors. I know that the function ...