
Questions tagged [computational-statistics]

Refers to the interface of statistics and computing; the use of algorithms and software for statistical purposes.

3 votes
1 answer
111 views

Wilcoxon signed-rank test always finding randomly generated ratios to be different from unity?

I need to test if a group of ratios from an experiment (calculated as condition1/condition2 from paired samples) is significantly different from one. All measures in the experiment are positive, real ...
user443937's user avatar
0 votes
0 answers
32 views

Using inducing points for exact Gaussian process inference

I'm a bit muddled about Gaussian process inference using inducing points, in particular about the conditions under which this should be exact inference and not an approximation. For a Gaussian process $f\...
Charlie's user avatar
1 vote
1 answer
62 views

Computing or sampling from a posterior with samples observed through a dimensional reduction transformation

Let $\boldsymbol \theta$ be a vector of parameters, with a known prior $\pi(\boldsymbol \theta)$. Let $\boldsymbol x_1,...,\boldsymbol x_n$ be i.i.d. samples with $\boldsymbol x|\boldsymbol \theta$ ...
heckman's user avatar
  • 13
3 votes
0 answers
35 views

Is it more efficient to optimize precision than covariance matrix?

This might be a silly question, but I want to make sure I'm not missing something. Say that we want to fit a multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ to some data by maximizing ...
dherrera's user avatar
  • 1,405
0 votes
0 answers
30 views

How do I maximize this specific loglikelihood function in R?

I am interested in determining the parameters mu and lambda that maximizes the function: ...
learner123's user avatar
2 votes
1 answer
164 views

Why does the `boot` R package require the `i` argument? When does it make the package easier to use instead of harder? [closed]

I want to use the boot package to calculate bootstrap confidence intervals for the mean. Sure, I could do this by inverting a t-test, but I want to see what happens ...
Dave's user avatar
  • 67.2k
1 vote
0 answers
22 views

Recycling MCMC samples for another data set from the same distribution

Suppose I'm given $\theta_0$ and I want to sample data from a density $f(Y|\theta_0)$ and then sample from the posterior of $\theta|Y$ (given, obviously, some prior). I want to do this lots of times, ...
Thomas Lumley's user avatar
1 vote
0 answers
43 views

How to approximate the eigendecomposition of a correlation matrix when the data have been standardized?

Context I am working to develop a penalized regression framework that will scale up to analyzing high dimensional data with a certain correlation structure. Let $X$ represent an $n \times p$ matrix of ...
Tabitha Peter's user avatar
0 votes
0 answers
17 views

Why does the forecast for some series degrade when using a VARMA model compared to independent ARMA models?

I am working with multiple time series that I suspect are correlated, and I have assumed that using a VARMA model would at least not degrade the forecasts of each series, if not improve them. However, ...
Rocio's user avatar
  • 1
1 vote
0 answers
30 views

Statistically determining a count of particles

I perform experiments to take measurements on various pharmaceuticals. One such measurement concerns the number of particles confined in a small volume. The raw data in my experiment ...
gokudegrees's user avatar
1 vote
1 answer
91 views

Using Rao-Blackwell to improve the estimator of P(X/Y < t)

X and Y are independent N(0, 1) random variables, and we want to approximate P(X/Y ≤ t) for a fixed number t. The first part of the problem was to describe a naive Monte Carlo estimate. I described ...
stat_student123's user avatar
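
For readers comparing the two estimators named in this question, here is a minimal sketch. It is not taken from the question or its answer. It assumes that the Rao-Blackwell step conditions on Y, which gives P(X/Y ≤ t | Y) = Φ(t|Y|) for independent standard normals, and it checks both estimates against the Cauchy closed form 1/2 + arctan(t)/π. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def naive_estimate(t, n, rng):
    """Naive Monte Carlo: fraction of simulated pairs with X/Y <= t."""
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        # X/Y <= t means X <= t*Y when Y > 0 and X >= t*Y when Y < 0.
        # Writing the event this way avoids dividing by Y.
        if (y > 0 and x <= t * y) or (y < 0 and x >= t * y):
            hits += 1
    return hits / n


def rao_blackwell_estimate(t, n, rng):
    """Average the conditional probability P(X/Y <= t | Y) = Phi(t * |Y|)."""
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        total += phi(t * abs(y))
    return total / n


def exact_probability(t):
    """X/Y is standard Cauchy, so the target has a closed form."""
    return 0.5 + math.atan(t) / math.pi


if __name__ == "__main__":
    rng = random.Random(0)
    t = 1.0
    print("naive:        ", naive_estimate(t, 20000, rng))
    print("Rao-Blackwell:", rao_blackwell_estimate(t, 20000, rng))
    print("exact:        ", exact_probability(t))
```

The conditional version averages a smooth function of Y instead of a 0/1 indicator, so its variance cannot exceed that of the naive estimator at the same sample size.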
3 votes
0 answers
57 views

XGBoost with time lagged predictors

I have a prediction problem that involves an outcome $Y_t$ and predictors $X_t$ that vary with time $t$. I want to fit a regression of $Y_t$ on $(t,X_t)$ including also lagged versions of $X$, i.e., $...
Iván Díaz's user avatar
0 votes
0 answers
15 views

Estimating the alpha-support of a distribution [duplicate]

I want to estimate the $\alpha$-support ($S_{α}$) of a distribution, which is the minimum volume subset of the support $S$ of a probability distribution $P$ ($S= supp(P)$), that supports a probability ...
SpaakC's user avatar
  • 1
1 vote
0 answers
24 views

How to compute the expected value of a function of a random variable given its log-density function? [closed]

Given a log-density function $\mathcal{L}f_{X}(x)$ of a 1d continuous random variable $X \in \mathcal{L}^{\infty}$ and a 1d polynomial function $h: \mathcal{I}(X) \to \mathbb{R}$, the expected value ...
Alice Springs's user avatar
0 votes
1 answer
35 views

Wilcoxon Sign Rank Test with differing list lengths [closed]

I am running a Wilcoxon Sign-Rank test with two lists. One contains 5 elements and the other contains 6. They were taken from the same place but under different conditions. I am trying to compute the ...
Indefeasible's user avatar
1 vote
0 answers
56 views

Sampling from a very high dimensional Gaussian

I would like to sample from a Gaussian $N(0,K)$ where $K$ is a kernel Gram matrix, so that $K=[K_{ij}]$ with $K_{ij} = k(x_i,x_j)$ for some positive definite function $k$. The first issue is that ...
WeakLearner's user avatar
  • 1,531
0 votes
0 answers
13 views

False negative B coefficient following multiple linear regression? [duplicate]

I'm running a linear regression in SPSS to test for effects of a binary variable (X) on cost of hospital admission. The variable is correlated with a cost increase of around $3000. When the model has ...
James's user avatar
  • 1
3 votes
1 answer
71 views

Expectation and variance of bivariate skew normal distribution

I am fitting a bivariate skew normal distribution to a 2D data through the sn package in R. I get a $2 \times 1$ vector of ...
Kasthuri's user avatar
  • 173
0 votes
0 answers
23 views

Metropolis-Hastings on domain $(2, \infty)$

I'm trying to understand the Metropolis Hastings algorithm in depth by solving some exercise problems. On one of them, I'm asked to use MH to generate samples from $$f(x) = c \frac{1}{\theta}e^{-\frac{...
Christina Kataki's user avatar
0 votes
0 answers
25 views

Reference datasets for conditional density estimation

[In case you feel inclined to close this question because I'm asking for a dataset - I'm looking for solutions in the spirit of point 2 (on-topic) in the accepted answer to this question about asking ...
Scriddie's user avatar
  • 2,439
4 votes
1 answer
107 views

Check if a coin flips randomly, but it can have a different number of sides each toss

I would like to check if a coin flips randomly, based on observational data. The catch is, the coin can have two sides, but also three, four, up to nine. The number of sides differs in each ...
Nucular's user avatar
  • 453
0 votes
0 answers
77 views

Efficient way to encode a set of large covariance matrices

I have a computational model that involves having a set of $K$ covariance matrices, $\{\Sigma_1, ..., \Sigma_K\}$ with each $\Sigma_i \in R^{n \times n}$. Storing all these full covariance matrices is ...
dherrera's user avatar
  • 1,405
1 vote
0 answers
35 views

mixed effect model question

Hi, I have a certain task I want to solve: For two months, participants played an app, in which they played 5 different therapeutic games (TGs). At the beginning of each session, they also completed a ...
nof's user avatar
  • 11
0 votes
0 answers
39 views

Resampling only $N$ particles out of $N(T+1)$ weighted particles

I have a bunch of weighted particles $(Z^{(i, k)}, W^{(i, k)})$ from a distribution $\mu(dz)$ where $i=1, \ldots, N$ and $k=0,\ldots, T$. These define the following empirical approximation $$ \hat{\...
Physics_Student's user avatar
0 votes
1 answer
90 views

Guidance for statistical analysis on academic collaborations [closed]

I am currently engaged in a research project involving data analysis in the field of academic publications and author collaborations. The dataset I'm working with includes information such as ...
idkwhatiamdoing's user avatar
2 votes
1 answer
254 views

Confusion on Chi-Squared test results

Basically, I have the data where I am trying to assess whether there is any correlation between the gender of the manager and the gender of people within their team. I decided to do the chi-squared ...
prettyPlease's user avatar
2 votes
0 answers
75 views

Bias and Variance of a Honest Random Forest

I am trying to read the paper Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. In Section 3.1 (Theoretical Background), page 13, paragraph 2, the authors have ...
yo wa's user avatar
  • 137
1 vote
0 answers
67 views

Most efficient way of converting a Quarto document to a presentation [closed]

I currently have a large body of statistics lecture notes that I wrote in Rmarkdown/Quarto document format, and I am looking to convert these notes to Quarto presentations in the simplest way possible....
Rmarkdown_user's user avatar
0 votes
0 answers
44 views

Inference of Beta-Bernoulli Distribution

Assume $x_1, x_2, \cdots, x_n$ follows a $Bern(\pi_0)$, Let $y_{ik}$ follows $Beta(\alpha,\beta)$, $i\in \{1,\cdots, n\}$, and $k\in \{1,\cdots, K\}$. Let $z_k$ follows a Bernoulli Distribution with a ...
LAM_MN's user avatar
  • 1
0 votes
0 answers
252 views

Why do I get NaN p values in some variables when using mgcv to fit generalized additive mixed models?

I am currently trying to fit milk production data collected over three years using a generalized additive mixed model through the mgcv package. The problem is, I am getting NaN p values in some variables. ...
Zainab Hassan's user avatar
0 votes
0 answers
61 views

Is this algorithm for robust estimation of the covariance matrix sensible?

I have a high dimensional dataset $\bf{X} \subset \mathbb{R}^d$, which is multimodal and has outliers. I want to estimate a robust measure of association, something like the correlation between two ...
MachineEpsilon's user avatar
1 vote
0 answers
19 views

How do I numerically compute $I(X;CX+Y)$?

Given that $X\sim\text{Bernoulli}(\nu)$ for some $\nu\in(0,1)$, and $Y\sim N(0,1)$ are independent random variables. I want to compute the mutual information $I(X;CX+Y)$, where $C$ is some known non-...
Resu's user avatar
  • 229
2 votes
1 answer
154 views

Generating MLE in python - Problem with the function [closed]

After my previous question (here) I tried to improve my work with this distribution. I'm using the parametrization $$f_X(x) = \frac{\theta^2 x^{\theta-1}(\gamma-\log(x))}{1+\theta\gamma} \mathbb{I}(0&...
Lucas cantu's user avatar
4 votes
2 answers
378 views

How to iteratively calculate weighted standard error to report alongside a weighted mean

I have a group of individuals for which I would like to report a mean and a weighted error. The data that I observe on a daily basis are two independent $iid$ random variables with unknown ...
bmasri's user avatar
  • 193
4 votes
1 answer
197 views

bootstrap confidence interval and p-value calculations for finite population sizes

I am comparing the difference of medians between two groups of sample sizes $n1$ and $n2$. I would like to confirm that my bootstrap approach for finite population size without pooling sample data ...
Docuemada's user avatar
  • 103
12 votes
6 answers
2k views

How to generate from this distribution without inverse in R/Python?

I am working with a distribution with the following density: $$f(x) = - \frac{(\alpha+1)^2 x^\alpha \log(\beta x)}{1-(\alpha + 1)\log(\beta)}$$ and CDF $$\mathbb{P} (X \leq x) = \int_0^x - \frac{(\...
Lucas cantu's user avatar
0 votes
0 answers
82 views

Exact computation of Bayes factor for multivariate normal

Question: Is there a known, exact expression for the Bayes factor between two multivariate normal hypotheses? Let $H_1$ and $H_2$ be two subsets of $R^d$ with normal priors $\pi(\mu|H_j)$. The sets $...
tims's user avatar
  • 1
1 vote
1 answer
86 views

XGBoost: Why is the "approximate algorithm" faster?

I am reading T. Chen, C. Guestrin, "XGBoost: A Scalable Tree Boosting System", 2016 (arXiv), which is seemingly full of typos. They propose the so-called "approximate algorithm" (...
paperskilltrees's user avatar
0 votes
0 answers
59 views

Can I use Kendall's correlation to determine the correlation between continuous and binary variables?

I have a dataset where the first 6 columns correspond to binary entries referring to sick/not sick and 2 additional columns with age and a specific score (dim of the dataset 60x8). I need to generate ...
mango's user avatar
  • 1
0 votes
0 answers
405 views

Testing whether a set of points on the unit sphere is uniformly distributed

The canonical way to do the test is to perform the spherical harmonic transform of the empirical distribution and then check that the power spectrum decays, but this is presumably fairly expensive. Is ...
Igor Rivin's user avatar
1 vote
0 answers
323 views

Algorithm for Irwin Hall Distribution [closed]

I've been trying to create a function for the Irwin Hall distribution that doesn't face the same issue as the unifed package implementation. Because the function suffers from numerical issues, I ...
user1329307's user avatar
0 votes
0 answers
166 views

Overflow when computing binomial distribution for large n [duplicate]

How do you compute a binomial probability distribution for large $n$? If I try the following, I get an integer overflow in any programming language: ...
at01's user avatar
  • 111
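
One common way around the overflow described in this question is to work on the log scale. The sketch below is an illustration only. The asker's own code is elided in the excerpt and is not reproduced here, and the function names are hypothetical. It uses `math.lgamma` for the log binomial coefficient so that no large factorial is ever formed.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) pmf at k, computed without large factorials."""
    if not 0 <= k <= n:
        return -math.inf
    # Handle the degenerate success probabilities separately to avoid log(0).
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    # log C(n, k) = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_pmf(k, n, p):
    """Binomial pmf obtained by exponentiating the log pmf."""
    return math.exp(binom_logpmf(k, n, p))


if __name__ == "__main__":
    # A case where n! would overflow any fixed-width integer type.
    print(binom_pmf(500_000, 1_000_000, 0.5))
```

Keeping results as log probabilities for as long as possible, and exponentiating only at the end, also avoids underflow when individual probabilities are extremely small.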
3 votes
0 answers
125 views

How does one create comparable metrics when the original metrics are not comparable?

The problem I have is that I have several groups (say 3 to make discussion concrete) with observations from a true but known distribution $p^*_1, p^*_2, p^*_3$ (or 3 populations). I can compute some ...
Charlie Parker's user avatar
2 votes
0 answers
335 views

Rust or C++ for computational statistics? [closed]

I'll work on developing computer-intensive Bayesian sampling algorithms for spatio-temporal applications (e.g. MCMC, KF). So far, I'm thinking of coding my algorithms in C++. However, I've heard that ...
antarctica's user avatar
1 vote
0 answers
125 views

Sampling From Four-Parameter Beta Distribution

Most statistical computing packages have functions to sample out of a two-parameter Beta distribution, but not many offer functions for sampling out of a four-parameter Beta distribution. If I want to ...
nguzman's user avatar
  • 123
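
A four-parameter Beta distribution is usually defined as a two-parameter Beta variate shifted and rescaled to an interval. Under that assumption, sampling reduces to one affine transformation. The sketch below is hedged accordingly: the excerpt is truncated, so this may not be the approach the asker had in mind, and the function and parameter names are hypothetical.

```python
import random


def sample_four_parameter_beta(alpha, beta, lower, upper, rng):
    """Draw from Beta(alpha, beta) rescaled from [0, 1] to [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    # A standard Beta draw lies in [0, 1]; the affine map moves it to the target interval.
    return lower + (upper - lower) * rng.betavariate(alpha, beta)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_four_parameter_beta(2.0, 3.0, 10.0, 20.0, rng) for _ in range(5)]
    print(draws)
```

The mean of the rescaled variate is `lower + (upper - lower) * alpha / (alpha + beta)`, which gives a quick sanity check on simulated output.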
1 vote
0 answers
32 views

How to filter out outliers from dataset? [duplicate]

I am trying to filter out outliers given a data set (maximum of 50 samples). For example: dataSet = (10.0, 10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...
Java2Avaj's user avatar
1 vote
0 answers
94 views

Introducing the third spatial dimension (Z-coordinate) in the generalized dissimilarity model in R

Generalized dissimilarity modeling (R package gdm) is a tool to study the relative effects of the user-defined environmental gradients and the spatial distance decay on the pair-wise dissimilarity ...
Kryštof Chytrý's user avatar
0 votes
0 answers
28 views

How to simulate multivariate posterior distribution with a flat prior in general?

If I know that the posterior $p(\theta_1,\dots,\theta_m|y)$ can be written $p(\theta_1|\theta_2,\dots,\theta_m,y)p(\theta_2|\theta_3,\dots,\theta_m|y)\dots p(\theta_m|y)$ where $p(\cdot|y)$ in each ...
user45765's user avatar
  • 1,465
0 votes
0 answers
119 views

Numerical Stability when Inverse CDF Sampling from Truncated Density

Let $f(x)$ be the pdf of a random variable that we want to truncate to the interval $[a,b]$ and then sample from it. Let $F(x)$ denote the corresponding cdf. We can use inverse cdf sampling and ...
yrx1702's user avatar
  • 730
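
The inverse-CDF scheme described in this question draws $u$ uniformly on $[F(a), F(b)]$ and returns $F^{-1}(u)$. The sketch below illustrates one stability concern for the special case of a standard normal, which is an assumption, since the question concerns a general $f$. When the interval lies far in the upper tail, $F(a)$ and $F(b)$ both round to 1. The sketch therefore reflects the interval into the lower tail, where the CDF values are tiny but representable. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def std_normal_cdf(x):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sample_truncated_std_normal(a, b, rng):
    """Inverse-CDF draw from a standard normal truncated to [a, b], with a < b."""
    if a > 0:
        # Upper-tail interval: sample the mirrored interval [-b, -a] and negate,
        # so that the CDF values are small numbers rather than numbers close to 1.
        return -sample_truncated_std_normal(-b, -a, rng)
    fa, fb = std_normal_cdf(a), std_normal_cdf(b)
    u = fa + (fb - fa) * rng.random()
    # inv_cdf needs 0 < u < 1, so clamp away from the endpoints.
    u = min(max(u, 5e-324), 1.0 - 2.0 ** -53)
    x = _STD_NORMAL.inv_cdf(u)
    # Guard against rounding pushing the draw just outside the interval.
    return min(max(x, a), b)


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_truncated_std_normal(8.0, 9.0, rng) for _ in range(5)])
```

For a general distribution, the same idea applies whenever a survival function and its inverse are available: work with whichever of $F$ or $1 - F$ is small on the truncation interval.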
0 votes
0 answers
79 views

Conditional expectation of Uniform given sum of Bernoulli trials

Given: [] Find the conditional probability distribution of theta given Sn and compute the conditional expectation. I believe the distribution of Sn will be a binomial with mean ntheta and variance (...
Santori's user avatar
