Questions tagged [computational-statistics]
Refers to the interface of statistics and computing; the use of algorithms and software for statistical purposes.
709 questions
3
votes
1
answer
111
views
Wilcoxon signed-rank test always finding randomly generated ratios to be different from unity?
I need to test if a group of ratios from an experiment (calculated as condition1/condition2 from paired samples) is significantly different from one. All measures in the experiment are positive, real ...
0
votes
0
answers
32
views
Using inducing points for exact Gaussian process inference
I'm a bit muddled about inference for Gaussian processes using inducing points, in particular the conditions under which this should be exact inference and not an approximation.
For a gaussian process $f\...
1
vote
1
answer
62
views
Computing or sampling from a posterior with samples observed through a dimensional reduction transformation
Let $\boldsymbol \theta$ be a vector of parameters, with a known prior $\pi(\boldsymbol \theta)$.
Let $\boldsymbol x_1,...,\boldsymbol x_n$ be i.i.d. samples with $\boldsymbol x|\boldsymbol \theta$ ...
3
votes
0
answers
35
views
Is it more efficient to optimize precision than covariance matrix?
This might be a silly question, but I want to make sure I'm not missing something.
Say that we want to fit a multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ to some data by maximizing ...
0
votes
0
answers
30
views
How do I maximize this specific loglikelihood function in R?
I am interested in determining the parameters mu and lambda that maximize the function:
...
2
votes
1
answer
164
views
Why does the `boot` R package require the `i` argument? When does it make the package easier to use instead of harder? [closed]
I want to use the boot package to calculate bootstrap confidence intervals for the mean. Sure, I could do this by inverting a t-test, but I want to see what happens ...
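As a hedged illustration of the convention this question asks about: `boot` calls the user-supplied statistic with the original data and a vector of resampled indices, and the statistic subsets the data itself. The Python sketch below mimics that convention for a bootstrap interval for the mean. It is not the `boot` package; `mean_stat`, `bootstrap`, and `percentile_interval` are hypothetical names introduced here, and the percentile rule is deliberately crude.

```python
import random
from typing import Callable, List, Sequence, Tuple


def mean_stat(data: Sequence[float], indices: Sequence[int]) -> float:
    """Statistic in the (data, indices) style: subset first, then summarize."""
    return sum(data[i] for i in indices) / len(indices)


def bootstrap(
    data: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[int]], float],
    n_resamples: int,
    rng: random.Random,
) -> List[float]:
    """Draw index vectors with replacement and pass them to the statistic."""
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        indices = [rng.randrange(n) for _ in range(n)]
        replicates.append(statistic(data, indices))
    return replicates


def percentile_interval(
    replicates: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Crude percentile interval read off the sorted replicates."""
    ordered = sorted(replicates)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper
```

One plausible reason for the index argument, stated as an assumption rather than the package authors' rationale: the resampler never needs to know the shape of the data, so the same loop works when the statistic has to pick matching rows from several aligned columns.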
1
vote
0
answers
22
views
Recycling MCMC samples for another data set from the same distribution
Suppose I'm given $\theta_0$ and I want to sample data from a density $f(Y|\theta_0)$ and then sample from the posterior of $\theta|Y$ (given, obviously, some prior). I want to do this lots of times, ...
1
vote
0
answers
43
views
How to approximate the eigendecomposition of a correlation matrix when the data have been standardized?
Context
I am working to develop a penalized regression framework that will scale up to analyzing high dimensional data with a certain correlation structure. Let $X$ represent an $n \times p$ matrix of ...
0
votes
0
answers
17
views
Why does the forecast for some series degrade when using a VARMA model compared to independent ARMA models?
I am working with multiple time series that I suspect are correlated, and I have assumed that using a VARMA model would at least not degrade the forecasts of each series, if not improve them. However, ...
1
vote
0
answers
30
views
Statistically determining a count of particles
I perform experiments to take measurements on various pharmaceuticals. One such measurement concerns the number of particles confined in a small volume.
The raw data in my experiment ...
1
vote
1
answer
91
views
Using Rao-Blackwell to improve the estimator of P(X/Y < t)
$X$ and $Y$ are independent $N(0, 1)$ random variables; we want to approximate $P(X/Y \le t)$ for a fixed number $t$.
The first part of the problem was to describe a naive Monte Carlo estimate. I described ...
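The excerpt is cut off before the asker's own estimators, so the following is only a sketch of one standard way to set the problem up, not their solution. Conditioning on $Y$ gives $P(X/Y \le t \mid Y) = \Phi(t|Y|)$, because dividing by a negative $Y$ flips the inequality; averaging that conditional probability is a Rao-Blackwellized estimator. The function names are hypothetical, and the exact value $1/2 + \arctan(t)/\pi$ (the standard Cauchy CDF) is included only as a check.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def naive_estimate(t: float, n: int, rng: random.Random) -> float:
    """Average the indicator 1{X/Y <= t} over n independent (X, Y) draws."""
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        # X/Y <= t written without division, so y == 0.0 cannot raise.
        below = (x <= t * y) if y > 0.0 else (x >= t * y)
        if below:
            hits += 1
    return hits / n


def rao_blackwell_estimate(t: float, n: int, rng: random.Random) -> float:
    """Average the conditional probability P(X/Y <= t | Y) = Phi(t * |Y|)."""
    total = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        total += normal_cdf(t * abs(y))
    return total / n


def exact_probability(t: float) -> float:
    """X/Y is standard Cauchy, so the target has a closed form."""
    return 0.5 + math.atan(t) / math.pi
```

Both estimators are unbiased for the same quantity; the conditional version replaces a 0/1 indicator with a value in $[0, 1]$, which cannot increase the variance.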
3
votes
0
answers
57
views
XGBoost with time lagged predictors
I have a prediction problem that involves an outcome $Y_t$ and predictors $X_t$ that vary with time $t$. I want to fit a regression of $Y_t$ on $(t,X_t)$ including also lagged versions of $X$, i.e., $...
0
votes
0
answers
15
views
Estimating the alpha-support of a distribution [duplicate]
I want to estimate the $\alpha$-support ($S_{\alpha}$) of a distribution, which is the minimum volume subset of the support $S$ of a probability distribution $P$ ($S = \operatorname{supp}(P)$), that supports a probability ...
1
vote
0
answers
24
views
How to compute the expected value of a function of a random variable given its log-density function? [closed]
Given a log-density function $\mathcal{L}f_{X}(x)$ of a 1d continuous random variable $X \in \mathcal{L}^{\infty}$ and a 1d polynomial function $h: \mathcal{I}(X) \to \mathbb{R}$, the expected value ...
0
votes
1
answer
35
views
Wilcoxon Sign Rank Test with differing list lengths [closed]
I am running a Wilcoxon Sign-Rank test with two lists. One contains 5 elements and the other contains 6. They were taken from the same place but under different conditions. I am trying to compute the ...
1
vote
0
answers
56
views
Sampling from a very high dimensional Gaussian
I would like to sample from a Gaussian $N(0,K)$ where $K$ is a kernel Gram matrix, so that $K=[K_{ij}]$ with $K_{ij} = k(x_i,x_j)$ for some positive definite function $k$. The first issue is that ...
0
votes
0
answers
13
views
False negative B coefficient following multiple linear regression? [duplicate]
I'm running a linear regression in SPSS to test for effects of a binary variable (X) on cost of hospital admission. The variable is correlated with a cost increase of around $3000.
When the model has ...
3
votes
1
answer
71
views
Expectation and variance of bivariate skew normal distribution
I am fitting a bivariate skew normal distribution to 2D data through the sn package in R. I get a $2 \times 1$ vector of ...
0
votes
0
answers
23
views
Metropolis-Hastings on domain $(2, \infty)$
I'm trying to understand the Metropolis-Hastings algorithm in depth by solving some exercise problems. In one of them, I'm asked to use MH to generate samples from
$$f(x) = c \frac{1}{\theta}e^{-\frac{...
0
votes
0
answers
25
views
Reference datasets for conditional density estimation
[In case you feel inclined to close this question because I'm asking for a dataset - I'm looking for solutions in the spirit of point 2 (on-topic) in the accepted answer to this question about asking ...
4
votes
1
answer
107
views
Check if a coin flips randomly, but it can have a different number of sides each toss
I would like to check if a coin flips randomly, based on observational data. The catch is, the coin can have two sides, but also three, four, up to nine. The number of sides differs in each ...
0
votes
0
answers
77
views
Efficient way to encode a set of large covariance matrices
I have a computational model that involves having a set of $K$ covariance matrices, $\{\Sigma_1, ..., \Sigma_K\}$ with each $\Sigma_i \in R^{n \times n}$. Storing all these full covariance matrices is ...
1
vote
0
answers
35
views
Mixed effect model question
Hi, I have a certain task I want to solve:
For two months, participants played an app, in which they played 5 different therapeutic games (TGs). At the beginning of each session, they also completed a ...
0
votes
0
answers
39
views
Resampling only $N$ particles out of $N(T+1)$ weighted particles
I have a bunch of weighted particles $(Z^{(i, k)}, W^{(i, k)})$ from a distribution $\mu(dz)$ where $i=1, \ldots, N$ and $k=0,\ldots, T$. These define the following empirical approximation
$$
\hat{\...
0
votes
1
answer
90
views
Guidance for statistical analysis on academic collaborations [closed]
I am currently engaged in a research project involving data analysis in the field of academic publications and author collaborations.
The dataset I'm working with includes information such as ...
2
votes
1
answer
254
views
Confusion on Chi-Squared test results
Basically, I have data with which I am trying to assess whether there is any correlation between the gender of the manager and the gender of people within their team. I decided to do the chi-squared ...
2
votes
0
answers
75
views
Bias and Variance of an Honest Random Forest
I am trying to read the paper Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. In Section 3.1 (Theoretical Background), page 13, paragraph 2, the authors have ...
1
vote
0
answers
67
views
Most efficient way of converting a Quarto document to a presentation [closed]
I currently have a large body of statistics lecture notes that I wrote in Rmarkdown/Quarto document format, and I am looking to convert these notes to Quarto presentations in the simplest way possible....
0
votes
0
answers
44
views
Inference of Beta-Bernoulli Distribution
Assume $x_1, x_2, \cdots, x_n$ follow a $Bern(\pi_0)$ distribution. Let $y_{ik}$ follow $Beta(\alpha,\beta)$, $i\in \{1,\cdots, n\}$, and $k\in \{1,\cdots, K\}$. Let $z_k$ follow a Bernoulli distribution with a ...
0
votes
0
answers
252
views
Why do I get NaN p values in some variables when using mgcv to fit generalized additive mixed models?
I am currently trying to fit milk production data collected over three years using a generalized additive mixed model with the mgcv package. The problem is, I am getting NaN p values in some variables. ...
0
votes
0
answers
61
views
Is this algorithm for robust estimation of the covariance matrix sensible?
I have a high dimensional dataset $\bf{X} \subset \mathbb{R}^d$, which is multimodal and has outliers. I want to estimate a robust measure of association, something like the correlation between two ...
1
vote
0
answers
19
views
How do I numerically compute $I(X;CX+Y)$?
Let $X\sim\text{Bernoulli}(\nu)$ for some $\nu\in(0,1)$ and $Y\sim N(0,1)$ be independent random variables. I want to compute the mutual information $I(X;CX+Y)$, where $C$ is some known non-...
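One way this quantity could be computed numerically, sketched under the assumption that $C$ is a real constant (the excerpt is truncated at that point): with $Z = CX + Y$, we have $I(X;Z) = h(Z) - h(Z \mid X) = h(Z) - \tfrac{1}{2}\log(2\pi e)$, since $Z$ given $X$ is a unit-variance normal. The marginal of $Z$ is a two-component normal mixture, so $h(Z)$ is a one-dimensional integral. The function names, grid width, and step count below are illustrative choices, and the result is in nats.

```python
import math


def mixture_pdf(z: float, nu: float, c: float) -> float:
    """Density of Z = C*X + Y: (1 - nu) * N(0, 1) + nu * N(c, 1)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    at_zero = math.exp(-0.5 * z * z)
    at_c = math.exp(-0.5 * (z - c) ** 2)
    return norm * ((1.0 - nu) * at_zero + nu * at_c)


def mutual_information(
    nu: float, c: float, half_width: float = 12.0, steps: int = 40000
) -> float:
    """I(X; CX + Y) in nats via trapezoidal integration of -p log p."""
    lo = min(0.0, c) - half_width
    hi = max(0.0, c) + half_width
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        p = mixture_pdf(z, nu, c)
        term = -p * math.log(p) if p > 0.0 else 0.0
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * term
    entropy_z = total * h
    entropy_noise = 0.5 * math.log(2.0 * math.pi * math.e)  # h(Y), Y ~ N(0, 1)
    return entropy_z - entropy_noise
```

Two sanity checks follow from the formula: the result is zero when $C = 0$, and it approaches the binary entropy of $\nu$ as $|C|$ grows and the two mixture components separate.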
2
votes
1
answer
154
views
Generating MLE in Python - Problem with the function [closed]
After my previous question (here) I tried to improve my work with this distribution. I'm using the parametrization $$f_X(x) = \frac{\theta^2 x^{\theta-1}(\gamma-\log(x))}{1+\theta\gamma} \mathbb{I}(0&...
4
votes
2
answers
378
views
How to iteratively calculate weighted standard error to report alongside a weighted mean
I have a group of individuals for which I would like to report a mean and a weighted error. The data that I observe on a daily basis are two independent $iid$ random variables with unknown ...
4
votes
1
answer
197
views
bootstrap confidence interval and p-value calculations for finite population sizes
I am comparing the difference of medians between two groups of sample sizes $n_1$ and $n_2$. I would like to confirm that my bootstrap approach for finite population size without pooling sample data ...
12
votes
6
answers
2k
views
How to generate from this distribution without inverse in R/Python?
I am working with a distribution with the following density: $$f(x) = - \frac{(\alpha+1)^2 x^\alpha \log(\beta x)}{1-(\alpha + 1)\log(\beta)}$$ and CDF $$\mathbb{P} (X \leq x) = \int_0^x - \frac{(\...
0
votes
0
answers
82
views
Exact computation of Bayes factor for multivariate normal
Question: Is there a known, exact expression for the Bayes factor between two multivariate normal hypotheses?
Let $H_1$ and $H_2$ be two subsets of $R^d$ with normal priors $\pi(\mu|H_j)$. The sets $...
1
vote
1
answer
86
views
XGBoost: Why is the "approximate algorithm" faster?
I am reading T. Chen, C. Guestrin, "XGBoost: A Scalable Tree Boosting System", 2016 (arXiv), which is seemingly full of typos. They propose the so-called "approximate algorithm" (...
0
votes
0
answers
59
views
Can I use Kendall's correlation to determine the correlation between continuous and binary variables?
I have a dataset where the first 6 columns correspond to binary entries referring to sick/not sick and 2 additional columns with age and a specific score (dim of the dataset 60x8). I need to generate ...
0
votes
0
answers
405
views
Testing whether a set of points on the unit sphere is uniformly distributed
The canonical way to do the test is to perform the spherical harmonic transform of the empirical distribution and then check that the power spectrum decays, but this is presumably fairly expensive. Is ...
1
vote
0
answers
323
views
Algorithm for Irwin Hall Distribution [closed]
I've been trying to create a function for the Irwin Hall distribution that doesn't face the same issue as the unifed package implementation. Because the function suffers from numerical issues, I ...
0
votes
0
answers
166
views
Overflow when computing binomial distribution for large n [duplicate]
How do you compute a binomial probability distribution for large $n$? If I try the following, I get an integer overflow in any programming language:
...
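The asker's own code is elided, so this is only a sketch of a common workaround rather than a fix to it: evaluate the binomial probability in log space, using the log-gamma function for the binomial coefficient, and exponentiate at the end. The function names are hypothetical, and the sketch assumes $0 < p < 1$.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """log P(K = k) for K ~ Binomial(n, p), computed without large integers."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    if not 0 <= k <= n:
        return -math.inf
    # log C(n, k) = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k); may underflow to 0.0 far in the tails, but never overflows."""
    return math.exp(binom_log_pmf(k, n, p))
```

Keeping the log-probability around is often the better choice downstream, for example when summing log-likelihoods, because the exponentiated value can underflow for extreme `k`.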
3
votes
0
answers
125
views
How does one create comparable metrics when the original metrics are not comparable?
The problem I have is that I have several groups (say 3 to make discussion concrete) with observations from a true but known distribution $p^*_1, p^*_2, p^*_3$ (or 3 populations). I can compute some ...
2
votes
0
answers
335
views
Rust or C++ for computational statistics? [closed]
I'll work on developing computer-intensive Bayesian sampling algorithms for spatio-temporal applications (e.g. MCMC, KF). So far, I'm thinking of coding my algorithms in C++. However, I've heard that ...
1
vote
0
answers
125
views
Sampling From Four-Parameter Beta Distribution
Most statistical computing packages have functions to sample out of a two-parameter Beta distribution, but not many offer functions for sampling out of a four-parameter Beta distribution. If I want to ...
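A minimal sketch, assuming the usual location-scale definition of the four-parameter Beta (a standard Beta on $[0, 1]$ stretched to $[\text{lower}, \text{upper}]$); the excerpt is truncated, so this may not match the exact parameterization the asker has in mind. `sample_beta4` is a hypothetical name, and `random.betavariate` is the two-parameter sampler from Python's standard library.

```python
import random


def sample_beta4(
    alpha: float, beta: float, lower: float, upper: float, rng: random.Random
) -> float:
    """Draw from Beta(alpha, beta) rescaled from [0, 1] to [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    unit_draw = rng.betavariate(alpha, beta)  # two-parameter Beta on [0, 1]
    return lower + (upper - lower) * unit_draw
```

Under this definition the mean is `lower + (upper - lower) * alpha / (alpha + beta)`, which gives a quick check on simulated draws.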
1
vote
0
answers
32
views
How to filter out outliers from dataset? [duplicate]
I am trying to filter out outliers given a data set (maximum of 50 samples). For example:
dataSet = (10.0, 10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...
1
vote
0
answers
94
views
Introducing the third spatial dimension (Z-coordinate) in the generalized dissimilarity model in R
Generalized dissimilarity modeling (R package gdm) is a tool to study the relative effects of the user-defined environmental gradients and the spatial distance decay on the pair-wise dissimilarity ...
0
votes
0
answers
28
views
How to simulate multivariate posterior distribution with a flat prior in general?
If I know that the posterior $p(\theta_1,\dots,\theta_m|y)$ can be written as $p(\theta_1|\theta_2,\dots,\theta_m,y)p(\theta_2|\theta_3,\dots,\theta_m,y)\dots p(\theta_m|y)$ where $p(\cdot|y)$ in each ...
0
votes
0
answers
119
views
Numerical Stability when Inverse CDF Sampling from Truncated Density
Let $f(x)$ be the pdf of a random variable that we want to truncate to the interval $[a,b]$ and then sample from it. Let $F(x)$ denote the corresponding cdf. We can use inverse cdf sampling and ...
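To make the stability issue concrete, here is a sketch for one illustrative case, an exponential density truncated to $[a, b]$; the choice of distribution and the function name are assumptions, not taken from the question. The textbook recipe draws $U$ uniformly on $[F(a), F(b)]$ and returns $F^{-1}(U)$, but far in the upper tail both $F(a)$ and $F(b)$ round to 1.0 in floating point. Working with the survival function $S(x) = 1 - F(x)$ avoids that cancellation.

```python
import math
import random


def sample_truncated_exponential(
    rate: float, a: float, b: float, rng: random.Random
) -> float:
    """Draw from Exponential(rate) restricted to [a, b] via the survival function."""
    if not 0.0 <= a < b:
        raise ValueError("need 0 <= a < b")
    s_a = math.exp(-rate * a)  # S(a) = 1 - F(a), tiny but representable
    s_b = math.exp(-rate * b)  # S(b) = 1 - F(b)
    if s_a == 0.0:
        raise ValueError("S(a) underflowed; rescale the problem first")
    u = rng.random()  # uniform on [0, 1)
    v = s_a - u * (s_a - s_b)  # uniform on (S(b), S(a)]
    return -math.log(v) / rate  # inverse of S


# The CDF form loses the interval entirely in the tail:
# 1.0 - math.exp(-50.0) evaluates to exactly 1.0 in double precision.
```

The same idea carries over to other distributions whenever a library exposes an accurate survival function and its inverse; whether that is available depends on the library in use.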
0
votes
0
answers
79
views
Conditional expectation of Uniform given sum of Bernoulli trials
Given:
[]
Find the conditional probability distribution of $\theta$ given $S_n$ and compute the conditional expectation.
I believe the distribution of $S_n$ will be a binomial with mean $n\theta$ and variance (...