Newest 'computational-statistics+r' Questions

0 votes

0 answers

30 views

How do I maximize this specific loglikelihood function in R?

I am interested in determining the parameters mu and lambda that maximize the function: ...

learner123

11

asked Jul 27 at 0:53

2 votes

1 answer

165 views

Why does the `boot` R package require the `i` argument? When does it make the package easier to use instead of harder? [closed]

I want to use the boot package to calculate bootstrap confidence intervals for the mean. Sure, I could do this by inverting a t-test, but I want to see what happens ...

Dave

67.2k

asked Jul 25 at 10:46

12 votes

6 answers

2k views

How to generate from this distribution without inverse in R/Python?

I am working with a distribution with the following density: $$f(x) = - \frac{(\alpha+1)^2 x^\alpha \log(\beta x)}{1-(\alpha + 1)\log(\beta)}$$ and CDF $$\mathbb{P} (X \leq x) = \int_0^x - \frac{(\...

Lucas cantu

197

asked Apr 9, 2023 at 15:16

1 vote

0 answers

323 views

Algorithm for Irwin Hall Distribution [closed]

I've been trying to create a function for the Irwin Hall distribution that doesn't face the same issue as the unifed package implementation. Because the function suffers from numerical issues, I ...

user1329307

113

asked Feb 27, 2023 at 14:42

1 vote

0 answers

94 views

Introducing the third spatial dimension (Z-coordinate) in the generalized dissimilarity model in R

Generalized dissimilarity modeling (R package gdm) is a tool to study the relative effects of the user-defined environmental gradients and the spatial distance decay on the pair-wise dissimilarity ...

Kryštof Chytrý

133

asked Dec 28, 2022 at 21:56

0 votes

0 answers

48 views

Calculating the computer memory required to estimate a fixed effects logit model

What is the calculation to determine the computer memory necessary to estimate a fixed effects logit model? Will this calculation vary across statistical software? For example, let's say I want to ...

nicholas

103

asked Oct 25, 2022 at 23:40

1 vote

1 answer

280 views

Solving estimating equation using R [closed]

I am working with different types of data and comparing a variety of estimating equation approaches which share a multi-dimensional parameter $\beta$. Given a set of data $\boldsymbol{X}$, is there an ...

J McVittie

56

asked Sep 27, 2022 at 3:43

2 votes

1 answer

1k views

Coding the likelihood function for logistic regression

I would appreciate help in understanding if I made a correct interpretation and coding of the likelihood function for logistic regression. Background: For a task I am going to write a function in <...

idlatva

33

asked Sep 20, 2022 at 23:13

0 votes

0 answers

27 views

Computing the "mix effect" in the evolution of a variable in R

Background: I am currently working on a dataset representing the evolution of the income of a hospital which hosts several medical specialties. The ratio income/medical act increases between year $N$ ...

gérard

1

asked Aug 4, 2022 at 12:23

2 votes

1 answer

222 views

I have a group that contains zero data, and I want to know whether it is considered normally or not normally distributed

I have a group (group2) that contains zero data, and I want to know whether it is considered normal or not normally distributed. I used the SPSS software, and it showed this result. I also tried R, ...

halah A

21

asked May 22, 2022 at 9:49

5 votes

2 answers

1k views

Antithetic method for monte carlo when bounds of the integral are infinite

I wanted to apply Monte Carlo with antithetic variables to estimate $\int_{0}^{\infty} e^{-x} \,dx$ (equal to 1). I used this R code. ...

Rootsyl

53

asked May 1, 2022 at 9:47
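
One way to handle the infinite upper bound in the question above is to map $[0, \infty)$ onto $[0, 1)$ before pairing draws. The following is only a minimal sketch under assumptions: the substitution $x = u/(1-u)$, the helper names `g` and `antithetic_estimate`, and the use of Python rather than R are illustrative choices, not the asker's code. Because the transformed integrand is not monotone in $u$, pairing $u$ with $1-u$ is not guaranteed to reduce variance here.

```python
import math
import random

def g(u):
    """Integrand of e^{-x} on [0, inf) after substituting x = u / (1 - u)."""
    if u >= 1.0:
        return 0.0  # limit of the transformed integrand as u -> 1 (x -> infinity)
    x = u / (1.0 - u)
    # dx = du / (1 - u)^2, so the Jacobian multiplies the original integrand
    return math.exp(-x) / (1.0 - u) ** 2

def antithetic_estimate(n_pairs, seed=0):
    """Average g over antithetic pairs (u, 1 - u) with u ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (g(u) + g(1.0 - u))
    return total / n_pairs

estimate = antithetic_estimate(100_000)
print(estimate)  # should be close to the exact value 1
```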

2 votes

1 answer

355 views

Is best model selection by RSS equivalent to best model selection by R2 value?

I am trying to compare models using K-Fold-CV using the regsubsets function in R. By default, it states that the ideal model is determined by the $RSS$. I wished to ...

h3ab74

133

asked Dec 1, 2021 at 1:04

0 votes

0 answers

66 views

Bootstrapping example ISL - pages 194-195

I'm currently learning about bootstrapping using the book Introduction to Statistical Learning, and am struggling to understand what the point of using the boot ...

h3ab74

133

asked Nov 10, 2021 at 22:36

1 vote

0 answers

11 views

Can Statistics help in reducing row-wise computation time on data.frames? [duplicate]

I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...

Capri

21

asked Sep 9, 2021 at 0:12

1 vote

0 answers

192 views

How to use statistics to speed up row-wise computations on a data.frame?

I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...

Capri

21

asked Sep 4, 2021 at 5:09

1 vote

0 answers

47 views

Assessing performance of entire nonlinear SUR in R?

I understand that McElroy's R-squared is used to assess the entire SUR system in Hamann & Henningsen's systemfit package in R. However, I've been running SURs ...

250gallontank

11

asked Aug 6, 2021 at 15:17

1 vote

1 answer

6k views

Maximum likelihood estimation of gamma distribution using optim in R

I'm trying to get the shape and scale parameters for this data using the optim function in R. ...

Seb

69

asked May 22, 2021 at 16:38

32 votes

13 answers

2k views

If R were reprogrammed from scratch today, what changes would be most useful to the statistics community? [closed]

Many people in the statistics community and other academic fields use R as their primary language for data analysis and statistical computing. It is a wonderful ...

Community wiki

Ben

1 vote

0 answers

33 views

Generate data for significance testing

I want to generate a data set with a pre-specified significance level. Let's say we have 2 covariates x1, x2, and an outcome variable y. We fit a linear regression model as follows: ...

Rasel Biswas

11

asked Dec 23, 2020 at 4:22

2 votes

0 answers

68 views

Computationally estimate $E[f(\hat \beta_1 X)]$ where $\hat \beta_1$ is the estimated coefficient obtained by ordinary least squares regression?

Let $(X_1,Y_1),(X_2,Y_2),\dots,(X_5,Y_5)$ be i.i.d samples and consider the regression model $$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \quad \text{for} \ i \in \{1,2,\dots,5\}, $$ where $\...

Bertus101

805

asked Nov 16, 2020 at 19:40

0 votes

0 answers

33 views

How to explain why categorical inputs (as is) versus categorical inputs (as binary) produce different coefficients in logistic regression

Background When using logistic regression, I observe that using categorical inputs (as-is i.e. leaving them as categorical with type factor) versus using said categorical inputs but transforming them ...

kyg

101

asked Sep 23, 2020 at 12:27

3 votes

2 answers

402 views

Is my logistic regression model correct?

I have a 2*2 factorial design (A and B). Both variables have two responses, high (coded as 1) and low (coded as 0), and I have a response variable $y$. My logistic model includes an interaction between A ...

Mustapha Hakkou Asz

31

asked Jul 22, 2020 at 17:34

1 vote

1 answer

240 views

Hypothesis Testing: Z-test or T-test? and how to test the null hypothesis?

I have a question as below: I have to generate two portfolio returns in R and compare their means. However, I am not sure which test should be applied in this case. And how can I indicate at least 2%...

zeze

11

asked May 23, 2020 at 17:03

6 votes

2 answers

14k views

Advice on running random forests on a large dataset

I am planning to run random forests to predict a binary outcome. I have a relatively (from my point of view) large dataset, composed of 500,000 units and around 100 features (a mix of continuous, ...

Miker

61

asked Apr 2, 2020 at 18:36

1 vote

0 answers

197 views

Correlation between two 3d arrays [closed]

I know that a is related to b is related to c. I have data for two years: In 2018, d=10. In 2019, d=0. I would like to know the correlation between a, b and c, for both d=0 and d=10 in order to ...

Eugene

31

asked Jul 25, 2019 at 16:17

0 votes

0 answers

388 views

Ordinal multinomial logistic regression on one-hot encoded data

I have a task I am unable to tackle by principle. I'm working on survey data for one of our clients such that my design matrix is made of one-hot vectors with 15 features (originally 3 variables with ...

Hany Daher

1

asked Jul 19, 2019 at 13:38

1 vote

0 answers

557 views

Random-effects-meta-analysis-simulation: zero-estimates for tau^2

I am working on simulating a random-effects-model for comparison of the DerSimonian-Laird-method vs. Hartung-Knapp-Sidik-Jonkman-method in R. To do so, I chose different combinations of mu (true ...

feher

11

asked Jun 6, 2019 at 10:12

4 votes

1 answer

13k views

Why is the median NA for some of the group outcomes in survival analysis?

I'm trying to do survival analysis using the Followup information, patient_vital_status and the expression of gene. I'm using like below: ...

beginner

175

asked Apr 11, 2019 at 12:41

0 votes

1 answer

4k views

Use Shapley Values for explaining whole Data Frame instead of a Single prediction [closed]

I am working on a Machine Learning model. One of the requests is to explain the models 'decisions' to the business. Therefore I am using Shapley Values (Game Theory). I found an interesting example ...

R overflow

245

asked Apr 10, 2019 at 7:59

1 vote

0 answers

690 views

Using R to maximize a two parameter Weibull model via multivariate extension of Newton-Raphson method

I am just getting back into using R for the first time in a while, and wrote some code to perform the aforementioned task in the title. I was wondering if anyone could take a look at it and see if ...

Tai Lopez

143

asked Feb 28, 2019 at 17:59

2 votes

0 answers

86 views

Using Markov random field spatial weights to account for spatial autocorrelation

I am looking at the relationship between life expectancy and smoking rate within the London boroughs. I thus created a bayesx spatial regression model including a term which assigns spatial ...

Steve Ahlswede

133

asked Feb 7, 2019 at 19:18

8 votes

2 answers

1k views

Subtracting very small probabilities - How to compute? [duplicate]

This question is an extension of a related question about adding small probabilities. Suppose you have log-probabilities $\ell_1 \geqslant \ell_2$, where the corresponding probabilities $\exp(\ell_1)$...

Ben

133k

asked Dec 18, 2018 at 4:15
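
For the setup described in the question above, with $\ell_1 \geqslant \ell_2$, the difference can be kept in log space through the identity $\log(e^{\ell_1} - e^{\ell_2}) = \ell_1 + \log(1 - e^{\ell_2 - \ell_1})$. The sketch below is an illustration only: the function name `log_sub` is hypothetical, and Python is used for convenience (base R's `log1p` and `exp` would support the same approach).

```python
import math

def log_sub(l1, l2):
    """Return log(exp(l1) - exp(l2)) for l1 >= l2 without leaving log space."""
    if l2 > l1:
        raise ValueError("requires l1 >= l2 so the difference is non-negative")
    if l1 == l2:
        return -math.inf  # the two probabilities cancel exactly
    # exp(l2 - l1) lies in [0, 1), so log1p of its negative is finite
    return l1 + math.log1p(-math.exp(l2 - l1))

# Both probabilities underflow to 0.0 as ordinary doubles,
# but their difference is still representable on the log scale.
print(log_sub(-1000.0, -1001.0))
```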

3 votes

1 answer

1k views

Vectorised computation of logsumexp

In this related post there is an explanation of how you can add together two very small probabilities using the logsumexp function, and how this can be programmed into base ...

Ben

133k

asked Dec 14, 2018 at 1:51

2 votes

1 answer

947 views

Data perturbation with normal variables

I am doing some projects related to statistics simulation using R based on "Introduction to Scientific Programming and Simulation Using R". In the Students projects session (chapter 24), I am doing ...

Gabriel Monteiro

43

asked Dec 12, 2018 at 18:40

4 votes

1 answer

1k views

Adding very small probabilities—How to compute?

In some problems, probabilities are so small that they are best represented in computational facilities as log-probabilities. Computational problems can arise when you try to add these small ...

Ben

133k

asked Nov 29, 2018 at 3:06
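
The usual tool for the problem in the question above is the log-sum-exp identity $\log(e^{\ell_1} + e^{\ell_2}) = m + \log(1 + e^{-|\ell_1 - \ell_2|})$ with $m = \max(\ell_1, \ell_2)$. The following is a minimal sketch for two log-probabilities; the function name `log_add` is hypothetical, and Python is used for illustration rather than R.

```python
import math

def log_add(l1, l2):
    """Return log(exp(l1) + exp(l2)) without leaving log space."""
    hi, lo = max(l1, l2), min(l1, l2)
    if hi == -math.inf:
        return -math.inf  # both probabilities are exactly zero
    # exp(lo - hi) lies in [0, 1], so nothing here can overflow
    return hi + math.log1p(math.exp(lo - hi))

# exp(-1000) underflows to 0.0 as an ordinary double,
# yet the sum of two such probabilities is fine on the log scale.
print(log_add(-1000.0, -1000.0))
```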

6 votes

2 answers

526 views

Evaluating the hazard function when the CDF is close to 1?

I need to evaluate a hazard function $h(t;\theta) = \dfrac{f(t;\theta)}{1-F(t;\theta)}$, where $f$ and $F$ are a pdf and a cdf, respectively, at many values of $t$ (and for several values of the ...

Hazardous

63

asked Oct 15, 2018 at 15:26
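
The difficulty in the question above is that $1 - F(t;\theta)$ rounds to zero once $F$ is within machine precision of 1. One common remedy is to evaluate the survival function directly rather than subtracting the CDF from 1. The sketch below assumes a standard normal distribution purely for illustration (the question does not say which distribution is involved), and the function name `normal_hazard` is hypothetical. For extremely large $t$ even the survival function underflows, in which case working with log-densities and log-survival values would be needed.

```python
import math

SQRT2 = math.sqrt(2.0)

def normal_hazard(t):
    """Hazard f(t) / S(t) of a standard normal, using erfc for the upper tail."""
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    sf = 0.5 * math.erfc(t / SQRT2)  # S(t) = 1 - F(t), computed without cancellation
    return pdf / sf

t = 10.0
naive_sf = 1.0 - 0.5 * (1.0 + math.erf(t / SQRT2))  # subtracting the CDF from 1
print(naive_sf)          # the naive survival value loses all precision here
print(normal_hazard(t))  # finite, roughly t + 1/t for large t
```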

5 votes

1 answer

7k views

When to switch off the continuity correction in chisq.test function?

From this Research paper Table1 Association of RAD51-AS1 expression with clinicopathological features of EOC patients I see that p-value is calculated based on Chi-...

stack_learner

303

asked Aug 16, 2018 at 15:58

0 votes

0 answers

354 views

Is this model really saturated? Is there a good alternative to an ANOVA for saturated models?

I have chemical data from groundwater wells that were exposed to two separate amendments of vegetable oil and monitored over several time points. I am trying to set up a 2-way ANOVA to test if the ...

Kt McBride

41

asked Jul 11, 2018 at 21:54

1 vote

1 answer

84 views

Compute $P(X > Y)$ for two random variables with unknown distributions from Markov chains

I would like to compute the probability $P(X > Y)$ with R. I used JAGS to sample from the posterior distribution of each variable, so I have a Markov chain for each variable (of length $3\times 10^{...

FatherNucleus

21

asked Jun 12, 2018 at 21:58

2 votes

0 answers

42 views

Selection of differentially expressed genes

I don't have any statistical background and have some questions. I see in some research papers they select differentially expressed genes based on fold change and p.value. And in some other papers I ...

beginner

175

asked May 28, 2018 at 9:10

1 vote

2 answers

265 views

Can you confirm the complexity of an algorithm using simulations?

Let's say we're solving a regression problem where $X$ is an $n \times p$ matrix, $Y$ is $n \times 1$, and $\beta$ is $p \times 1.$ Then if we use the naive approach to solving the least squares ...

user9685396

11

asked Apr 23, 2018 at 16:51

1 vote

2 answers

1k views

Can normalization modify the results of an analysis?

I'm currently working with R on some oculometric data produced after an experiment I made. I have two conditions ("Risky" & "Safe") and 3 mental states ("Focus" & "Around" & "Mind ...

Pyxel

127

asked Nov 22, 2017 at 10:31

2 votes

1 answer

8k views

How to use kappa() or cohen.kappa() to check for matching assignments between observers

Example 1. In a common situation, if we have the following data: ...

Yatrosin

121

asked Oct 5, 2017 at 0:55

0 votes

1 answer

264 views

What is the analytical test to run in case of 1 measure for three groups?

I have this case where the data look like this:

Trial  Person 1  Person 2
1      4.7       3.8
2      7.1       6.3
3      5.4       4.5

I want ...

Omar113

232

asked Sep 22, 2017 at 8:16

-1 votes

1 answer

428 views

How to build a roc curve and do statistical analysis for discrete classifiers?

I have 5 supervised databases containing S similar documents and N not similar. Within each base, I separated 10 samples with bootstrapping. These samples contain the identifier of each document. For ...

Denise

1

asked Aug 27, 2017 at 17:52

0 votes

1 answer

298 views

How to compute more efficiently in R the probability distribution of the sum of non-independent discrete random variables

I hope you are well. Let $\{s_0,\,s_1,\ldots,\,s_T\}$ be a sequence of discrete random variables and denote $S_t=s_0+s_1+\cdots+s_t$, with $S_0=0$. For all $t\in\{1,\ldots,\,T\}$, suppose that $s_t|\{...

Student1981

131

asked Jul 20, 2017 at 23:18

3 votes

1 answer

2k views

Difference between pROC and ROCR in compute time and accuracy

I've been calculating receiver operating characteristic (ROC) curves on very large datasets for my thesis. I tried to run these in the pROC R package but the ...

Tom Kelly ケリー・トム

208

asked Mar 24, 2017 at 11:30

4 votes

1 answer

560 views

Computing by hand vs. R's magic wand

I have fitted a polynomial to a data set that I have. Thus I have obtained coefficients $\beta_i$ for $i=0,1,2$ and have a relationship of the form $$Y=\beta_0+\beta_{1}X+\beta_{2}X^{2}+\varepsilon$...

l7ll7

1,305

asked Jan 29, 2017 at 21:25

1 vote

0 answers

291 views

Taguchi Crossed Array Design Creation and Analysis [closed]

I am trying to create the following Taguchi Crossed Array Design in R. ...

MYaseen208

2,759

asked Nov 28, 2016 at 9:16

3 votes

1 answer

735 views

simulation of t distribution - repeated sampling

I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t distribution with degrees of freedom 10. Do I need to create a single vector of data from the rt generator, ...

user119563

81

asked Nov 23, 2016 at 17:18
