All Questions

0 votes
0 answers
30 views

How do I maximize this specific loglikelihood function in R?

I am interested in determining the parameters mu and lambda that maximize the function: ...
learner123
2 votes
1 answer
165 views

Why does the `boot` R package require the `i` argument? When does it make the package easier to use instead of harder? [closed]

I want to use the boot package to calculate bootstrap confidence intervals for the mean. Sure, I could do this by inverting a t-test, but I want to see what happens ...
Dave (reputation 67.2k)
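Some context on the `i` argument: `boot` calls the user's statistic as `statistic(data, i)`, where `i` is a vector of resampled indices. One consequence is that the same function can subset a vector, a data frame, or several columns in a consistent way. Below is a minimal Python sketch of that calling convention, with a percentile interval for the mean. `bootstrap`, `mean_stat` and `percentile_ci` are hypothetical names, and the data are made up for illustration.

```python
import random
import statistics

def mean_stat(data, i):
    # Statistic in boot's style: receives the original data plus a vector of resampled indices.
    return statistics.fmean(data[j] for j in i)

def bootstrap(data, statistic, R=2000, seed=1):
    # Draw R index vectors with replacement and evaluate the statistic on each one.
    rng = random.Random(seed)
    n = len(data)
    t0 = statistic(data, range(n))  # statistic on the original sample
    reps = [statistic(data, [rng.randrange(n) for _ in range(n)]) for _ in range(R)]
    return t0, sorted(reps)

def percentile_ci(reps, level=0.95):
    # Simple percentile interval read off the sorted bootstrap replicates.
    lo = int((1 - level) / 2 * len(reps))
    hi = int((1 + level) / 2 * len(reps)) - 1
    return reps[lo], reps[hi]

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
t0, reps = bootstrap(data, mean_stat)
print(t0, percentile_ci(reps))
```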
12 votes
6 answers
2k views

How to generate from this distribution without inverse in R/Python?

I am working with a distribution with the following density: $$f(x) = - \frac{(\alpha+1)^2 x^\alpha \log(\beta x)}{1-(\alpha + 1)\log(\beta)}$$ and CDF $$\mathbb{P} (X \leq x) = \int_0^x - \frac{(\...
Lucas cantu
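This is one way to avoid inverting the CDF. It is sketched under three assumptions: the support is $(0,1)$, $\alpha > -1$, and $0 < \beta \le 1$, so that the density is non-negative. Splitting $\log(\beta x) = \log\beta + \log x$ writes $f$ as a two-component mixture:

$$f(x) = w_1\,(\alpha+1)x^\alpha + w_2\,(\alpha+1)^2 x^\alpha(-\log x),\qquad w_1 = \frac{-(\alpha+1)\log\beta}{1-(\alpha+1)\log\beta},\qquad w_2 = \frac{1}{1-(\alpha+1)\log\beta}.$$

The first component is the law of $e^{-G/(\alpha+1)}$ with $G \sim \text{Gamma}(1,1)$. The second is the same transform with $G \sim \text{Gamma}(2,1)$. Sampling therefore only needs exponential draws. The sketch is in Python, and `sample_f` is a hypothetical name.

```python
import math
import random

def sample_f(n, alpha, beta, seed=None):
    """Draw n values from f(x) = -(a+1)^2 x^a log(b x) / (1 - (a+1) log b) on (0, 1).

    Assumes alpha > -1 and 0 < beta <= 1, so that both mixture weights are non-negative.
    """
    if not (alpha > -1 and 0 < beta <= 1):
        raise ValueError("sketch assumes alpha > -1 and 0 < beta <= 1")
    rng = random.Random(seed)
    a1 = alpha + 1
    D = 1 - a1 * math.log(beta)
    w1 = -a1 * math.log(beta) / D  # weight of the (a+1) x^a component
    out = []
    for _ in range(n):
        # Gamma(1, 1) = one exponential with probability w1; Gamma(2, 1) = sum of two otherwise.
        g = rng.expovariate(1.0)
        if rng.random() >= w1:
            g += rng.expovariate(1.0)
        out.append(math.exp(-g / a1))
    return out

xs = sample_f(100_000, alpha=1.0, beta=0.5, seed=42)
print(sum(xs) / len(xs))
```

As a check, the sample mean should be close to $w_1 r + w_2 r^2$ with $r = (\alpha+1)/(\alpha+2)$, which is the mixture of the two components' means.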
1 vote
0 answers
323 views

Algorithm for Irwin Hall Distribution [closed]

I've been trying to create a function for the Irwin Hall distribution that doesn't face the same issue as the unifed package implementation. Because the function suffers from numerical issues, I ...
user1329307
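The Irwin–Hall CDF is an alternating sum:

$$F_n(x) = \frac{1}{n!}\sum_{k=0}^{\lfloor x\rfloor}(-1)^k\binom{n}{k}(x-k)^n.$$

In floating point, this sum can cancel catastrophically as $n$ grows. One workaround, which is not the `unifed` package's method, is to evaluate the sum in exact rational arithmetic. This is slow for large $n$ but free of cancellation. The sketch is in Python, and `irwin_hall_cdf` is a hypothetical name.

```python
from fractions import Fraction
from math import comb, factorial, floor

def irwin_hall_cdf(n, x):
    """Exact CDF of the sum of n iid Uniform(0, 1) variables at a rational x.

    Evaluates F(x) = (1/n!) * sum_{k=0}^{floor(x)} (-1)^k C(n, k) (x - k)^n
    with Fractions, so the alternating terms cancel exactly instead of in floating point.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)

print(float(irwin_hall_cdf(30, 15)))
```

For example, `irwin_hall_cdf(30, 15)` is exactly `Fraction(1, 2)`, as symmetry about $n/2$ requires. Call `float()` on the result when a double is needed.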
1 vote
0 answers
94 views

Introducing the third spatial dimension (Z-coordinate) in the generalized dissimilarity model in R

Generalized dissimilarity modeling (R package gdm) is a tool to study the relative effects of the user-defined environmental gradients and the spatial distance decay on the pair-wise dissimilarity ...
Kryštof Chytrý
0 votes
0 answers
48 views

Calculating the computer memory required to estimate a fixed effects logit model

What is the calculation to determine the computer memory necessary to estimate a fixed effects logit model? Will this calculation vary across statistical software? For example, let's say I want to ...
nicholas (reputation 103)
1 vote
1 answer
280 views

Solving estimating equations using R [closed]

I am working with different types of data and comparing a variety of estimating equation approaches which share a multi-dimensional parameter $\beta$. Given a set of data $\boldsymbol{X}$, is there an ...
J McVittie
2 votes
1 answer
1k views

Coding the likelihood function for logistic regression

I would appreciate help in checking whether I have correctly interpreted and coded the likelihood function for logistic regression. Background: For a task, I am going to write a function in ...
idlatva (reputation 33)
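For reference, let $p_i = 1/(1+e^{-\eta_i})$ and $\eta_i = x_i^\top\beta$. The Bernoulli log-likelihood $\sum_i [y_i\log p_i + (1-y_i)\log(1-p_i)]$ then simplifies to $\sum_i [y_i\eta_i - \log(1+e^{\eta_i})]$. Below is a minimal Python sketch of that form, using a numerically stable $\log(1+e^{\eta})$. The function names are hypothetical. `X` is assumed to already contain an intercept column if one is wanted.

```python
import math

def log1pexp(z):
    # Numerically stable log(1 + exp(z)): avoids overflow of exp(z) for large z.
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def logistic_loglik(beta, X, y):
    """Log-likelihood sum_i [y_i * eta_i - log(1 + exp(eta_i))], with eta_i = x_i . beta.

    beta: coefficient list; X: list of rows; y: list of 0/1 outcomes.
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - log1pexp(eta)
    return ll
```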
0 votes
0 answers
27 views

Computing the "mix effect" in the evolution of a variable in R

Background: I am currently working on a dataset representing the evolution of the income of a hospital which hosts several medical specialties. The ratio income/medical act increases between year $N$ ...
gérard
2 votes
1 answer
222 views

I have a group that contains zero data, and I want to know whether it is considered normally distributed or not.

I have a group (group2) that contains zero data, and I want to know whether it is considered normally distributed or not. I used SPSS, and it showed this result. I also tried R, ...
halah A (reputation 21)
5 votes
2 answers
1k views

Antithetic method for monte carlo when bounds of the integral are infinite

I wanted to apply Monte Carlo with antithetic variables to estimate $\int_{0}^{\infty} e^{-x} \,dx$ (equal to 1). I used this R code. ...
Rootsyl (reputation 53)
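One way to handle the infinite upper limit is sketched below. It assumes an exponential proposal whose rate, `RATE = 1/3`, is an illustrative choice and does not come from the question. Write $\int_0^\infty e^{-x}\,dx = E[w(X)]$ with $X \sim \text{Exp}(\lambda)$ and $w(x) = e^{-x}/(\lambda e^{-\lambda x})$. Draw $X = -\log(U)/\lambda$ and use $-\log(1-U)/\lambda$ as its antithetic partner. With $\lambda = 1$ the weight would be constant and the problem trivial. The sketch is in Python.

```python
import math
import random

RATE = 1.0 / 3.0  # rate of the exponential proposal (illustrative choice)

def weight(x):
    # e^{-x} / (RATE * e^{-RATE * x}) written as one exponential; its mean under Exp(RATE) is 1.
    return math.exp(-(1.0 - RATE) * x) / RATE

def draw(u):
    # Inverse-CDF draw from Exp(RATE); requires 0 < u <= 1.
    return -math.log(u) / RATE

def open_uniform(rng):
    # Uniform on (0, 1): reject the (rare) exact zero so that log(u) is defined.
    while True:
        u = rng.random()
        if u > 0.0:
            return u

def plain_estimate(n, seed=0):
    rng = random.Random(seed)
    return sum(weight(draw(open_uniform(rng))) for _ in range(n)) / n

def antithetic_estimate(n, seed=0):
    # n // 2 pairs (U, 1 - U): the same number of weight evaluations as plain_estimate(n).
    rng = random.Random(seed)
    pairs = n // 2
    total = 0.0
    for _ in range(pairs):
        u = open_uniform(rng)
        total += 0.5 * (weight(draw(u)) + weight(draw(1.0 - u)))
    return total / pairs

print(plain_estimate(100_000), antithetic_estimate(100_000))
```

With $\lambda = 1/3$ the weight reduces to $3U^2$, which is monotone in $U$, so the antithetic pairing cannot increase the variance. The variance is $0.8$ per plain draw, versus $0.05$ per antithetic pair of two draws.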
2 votes
1 answer
355 views

Is best model selection by RSS equivalent to best model selection by $R^2$?

I am trying to compare models with K-fold CV using the regsubsets function in R. By default, the ideal model is determined by the $RSS$. I wished to ...
h3ab74 (reputation 133)
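Here is a short derivation of why ranking by RSS and ranking by $R^2$ agree on the same data. The key point is that the total sum of squares depends only on $y$, not on the model:

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad \mathrm{TSS} = \sum_{i=1}^{n}(y_i - \bar{y})^2.$$

So, for $\mathrm{TSS} > 0$, $R^2$ is a strictly decreasing function of RSS, and the model with the smallest RSS has the largest $R^2$. This equivalence does not carry over to adjusted $R^2$ when comparing models of different sizes, because adjusted $R^2$ also depends on the number of predictors.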
0 votes
0 answers
66 views

Bootstrapping example ISL - pages 194-195

I'm currently learning about bootstrapping from the book Introduction to Statistical Learning, and am struggling to understand the point of using the boot ...
h3ab74 (reputation 133)
1 vote
0 answers
11 views

Can statistics help reduce row-wise computation time on data.frames? [duplicate]

I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...
Capri (reputation 21)
1 vote
0 answers
192 views

How to use statistics to speed up row-wise computations on a data.frame?

I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...
Capri (reputation 21)
1 vote
0 answers
47 views

Assessing the performance of an entire nonlinear SUR system in R?

I understand that McElroy's R-squared is used to assess the entire SUR system in Hamann & Henningsen's systemfit package in R. However, I've been running SURs ...
250gallontank
1 vote
1 answer
6k views

Maximum likelihood estimation of gamma distribution using optim in R

I'm trying to get the shape and scale parameters for this data using the optim function in R. ...
Seb (reputation 69)
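To stay within the Python standard library, which has no `optim` equivalent, this sketch swaps in a different route to the same maximum. For a fixed shape $k$, the scale MLE is $\bar{x}/k$. Substituting it leaves a one-dimensional profile log-likelihood in $k$, which a golden-section search can maximise. The profile is unimodal for gamma data, so the search is well defined. The sketch assumes strictly positive data. `gamma_mle` is a hypothetical name, and the simulated data are illustrative only.

```python
import math
import random

def gamma_mle(x, lo=1e-3, hi=1e3, tol=1e-8):
    """Maximum-likelihood (shape, scale) of a gamma distribution for positive data x.

    For fixed shape k the scale MLE is mean(x) / k; substituting it gives a
    one-dimensional profile log-likelihood in k, maximised by golden-section search.
    """
    n = len(x)
    sum_log = sum(math.log(v) for v in x)
    mean = sum(x) / n

    def profile_loglik(k):
        # sum_i [(k - 1) log x_i - x_i / theta - k log theta - lgamma(k)] with theta = mean / k
        theta = mean / k
        return (k - 1) * sum_log - n * k - n * k * math.log(theta) - n * math.lgamma(k)

    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if profile_loglik(c) > profile_loglik(d):
            b, d = d, c  # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    k = (a + b) / 2
    return k, mean / k

rng = random.Random(7)
sample = [rng.gammavariate(2.0, 3.0) for _ in range(5000)]  # simulated: shape 2, scale 3
print(gamma_mle(sample))
```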
32 votes
13 answers
2k views

If R were reprogrammed from scratch today, what changes would be most useful to the statistics community? [closed]

Many people in the statistics community and other academic fields use R as their primary language for data analysis and statistical computing. It is a wonderful ...
1 vote
0 answers
33 views

Generate data for significance testing

I want to generate a data set with a pre-specified significance level. Let's say we have 2 covariates x1, x2, and an outcome variable y. We fit a linear regression model as follows: ...
Rasel Biswas
2 votes
0 answers
68 views

Computationally estimate $E[f(\hat \beta_1 X)]$ where $\hat \beta_1$ is the estimated coefficient obtained by ordinary least squares regression?

Let $(X_1,Y_1),(X_2,Y_2),\dots,(X_5,Y_5)$ be i.i.d. samples and consider the regression model $$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \quad \text{for} \ i \in \{1,2,\dots,5\}, $$ where $\...
Bertus101 (reputation 805)
0 votes
0 answers
33 views

How to explain why categorical inputs (as is) versus categorical inputs (as binary) produce different coefficients in logistic regression

Background: When using logistic regression, I observe that using categorical inputs as-is (i.e. leaving them as categorical with type factor) versus using the same categorical inputs but transforming them ...
kyg (reputation 101)
3 votes
2 answers
402 views

Is my logistic regression model correct?

I have a 2×2 factorial design (A and B). Both variables have two levels, high (coded as 1) and low (coded as 0), and I have a response variable $y$; my logistic model includes the interaction between A ...
Mustapha Hakkou Asz
1 vote
1 answer
240 views

Hypothesis testing: z-test or t-test? And how to test the null hypothesis?

I have a question as below: I have to generate two portfolio returns in R and compare their means. However, I am not sure which test should be applied in this case. And how can I indicate at least 2%...
zeze (reputation 11)
6 votes
2 answers
14k views

Advice on running random forests on a large dataset

I am planning to run random forests to predict a binary outcome. I have a relatively (from my point of view) large dataset, composed of 500,000 units and around 100 features (a mix of continuous, ...
Miker (reputation 61)
1 vote
0 answers
197 views

Correlation between two 3d arrays [closed]

I know that a is related to b is related to c. I have data for two years: In 2018, d=10. In 2019, d=0. I would like to know the correlation between a, b and c, for both d=0 and d=10 in order to ...
Eugene (reputation 31)
0 votes
0 answers
388 views

Ordinal multinomial logistic regression on one-hot encoded data

I have a task I do not know how to approach, even in principle. I'm working on survey data for one of our clients, and my design matrix is made of one-hot vectors with 15 features (originally 3 variables with ...
Hany Daher
1 vote
0 answers
557 views

Random-effects meta-analysis simulation: zero estimates for tau^2

I am working on simulating a random-effects model to compare the DerSimonian–Laird method with the Hartung–Knapp–Sidik–Jonkman method in R. To do so, I chose different combinations of mu (true ...
feher (reputation 11)
4 votes
1 answer
13k views

Why is the median NA for some of the group outcomes in survival analysis?

I'm trying to do survival analysis using the Followup information, patient_vital_status, and the expression of a gene. I'm using the code below: ...
beginner (reputation 175)
0 votes
1 answer
4k views

Use Shapley Values for explaining whole Data Frame instead of a Single prediction [closed]

I am working on a machine learning model. One of the requests is to explain the model's 'decisions' to the business. Therefore, I am using Shapley values (game theory). I found an interesting example ...
R overflow
1 vote
0 answers
690 views

Using R to maximize a two parameter Weibull model via multivariate extension of Newton-Raphson method

I am getting back into R for the first time in a while, and I wrote some code to perform the task described in the title. I was wondering if anyone could take a look at it and see if ...
Tai Lopez (reputation 143)
2 votes
0 answers
86 views

Using Markov random field spatial weights to account for spatial autocorrelation

I am looking at the relationship between life expectancy and smoking rate within the London boroughs. I thus created a bayesx spatial regression model including a term which assigns spatial ...
Steve Ahlswede
8 votes
2 answers
1k views

Subtracting very small probabilities - How to compute? [duplicate]

This question is an extension of a related question about adding small probabilities. Suppose you have log-probabilities $\ell_1 \geqslant \ell_2$, where the corresponding probabilities $\exp(\ell_1)$...
Ben (reputation 133k)
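Here is a sketch of one standard approach, in Python. It uses the identity

$$\log(e^{\ell_1} - e^{\ell_2}) = \ell_1 + \log\!\left(1 - e^{-(\ell_1 - \ell_2)}\right).$$

The inner term is computed with `expm1` when $\ell_1 - \ell_2$ is small and with `log1p` otherwise. The switch happens at $\log 2$, as in Mächler's note on accurately computing $\log(1 - e^{-|a|})$. The function names are illustrative.

```python
import math

def log1mexp(a):
    """log(1 - exp(-a)) for a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    if a <= math.log(2):
        return math.log(-math.expm1(-a))  # small a: expm1 keeps 1 - exp(-a) accurate
    return math.log1p(-math.exp(-a))       # larger a: exp(-a) is small, log1p keeps precision

def logdiffexp(l1, l2):
    """log(exp(l1) - exp(l2)) for l1 >= l2, computed without leaving the log scale."""
    if l2 > l1:
        raise ValueError("need l1 >= l2 for a non-negative difference")
    if l1 == l2:
        return -math.inf  # equal probabilities: the difference is 0
    return l1 + log1mexp(l1 - l2)
```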
3 votes
1 answer
1k views

Vectorised computation of logsumexp

In this related post there is an explanation of how you can add together two very small probabilities using the logsumexp function, and how this can be programmed into base ...
Ben (reputation 133k)
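The usual trick for a whole vector is to factor out the maximum $m = \max_i \ell_i$:

$$\log\sum_i e^{\ell_i} = m + \log\sum_i e^{\ell_i - m}.$$

In R this is `max(x) + log(sum(exp(x - max(x))))`, provided `max(x)` is finite. Below is a Python sketch of the same idea that handles the all-zero-probability case explicitly. It is a self-contained helper; SciPy ships its own `scipy.special.logsumexp`.

```python
import math

def logsumexp(log_values):
    """log(sum(exp(v) for v in log_values)) computed without underflow."""
    values = list(log_values)
    if not values:
        return -math.inf  # empty sum of probabilities
    m = max(values)
    if m == -math.inf:
        return -math.inf  # every term is exp(-inf) = 0
    # Shift by the maximum so the largest term becomes exp(0) = 1, then undo the shift.
    return m + math.log(sum(math.exp(v - m) for v in values))
```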
2 votes
1 answer
947 views

Data perturbation with normal variables

I am doing some projects related to statistical simulation using R, based on "Introduction to Scientific Programming and Simulation Using R". In the student projects section (chapter 24), I am doing ...
Gabriel Monteiro
4 votes
1 answer
1k views

Adding very small probabilities—How to compute?

In some problems, probabilities are so small that they are best represented in computational facilities as log-probabilities. Computational problems can arise when you try to add these small ...
Ben (reputation 133k)
6 votes
2 answers
526 views

Evaluating the hazard function when the CDF is close to 1?

I need to evaluate a hazard function $h(t;\theta) = \dfrac{f(t;\theta)}{1-F(t;\theta)}$, where $f$ and $F$ are a pdf and a cdf, respectively, at many values of $t$ (and for several values of the ...
Hazardous
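The question's $f$ and $F$ are not shown, so this sketch assumes a standard normal for illustration. The idea is never to form $1 - F(t)$ by subtraction. Instead, compute $\log S(t) = \log(1 - F(t))$ directly and take $h(t) = \exp(\log f(t) - \log S(t))$. In R, for example, this means `pnorm(t, lower.tail = FALSE, log.p = TRUE)` together with `dnorm(t, log = TRUE)`. In Python, `math.erfc` plays the role of the upper-tail function. At $t = 10$, the naive `1 - F(t)` already rounds to exactly 0.

```python
import math

def normal_log_pdf(t):
    # log of the standard normal density
    return -0.5 * t * t - 0.5 * math.log(2 * math.pi)

def normal_log_survival(t):
    # log(1 - Phi(t)) from erfc, which keeps relative accuracy in the upper tail
    # (usable until erfc underflows, near t = 38 for the standard normal)
    return math.log(0.5 * math.erfc(t / math.sqrt(2)))

def normal_hazard(t):
    # h(t) = f(t) / (1 - F(t)), formed on the log scale
    return math.exp(normal_log_pdf(t) - normal_log_survival(t))

print(normal_hazard(10.0))
```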
5 votes
1 answer
7k views

When to switch off the continuity correction in chisq.test function?

In Table 1 of this research paper (Association of RAD51-AS1 expression with clinicopathological features of EOC patients), I see that the p-value is calculated based on Chi-...
stack_learner
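For a 2×2 table, `chisq.test` applies Yates' continuity correction by default (`correct = TRUE`). Its documentation says that one half is subtracted from every $|O - E|$, but that the correction is never bigger than the differences themselves. Below is a Python sketch that mirrors that description, so that the corrected and uncorrected statistics can be compared on the same table. `chisq_2x2` and the counts are illustrative.

```python
def chisq_2x2(table, correct=True):
    """Pearson chi-squared statistic for a 2x2 table, optionally with Yates' correction.

    The correction subtracts 0.5 from every |O - E|, capped at the smallest |O - E|.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    diffs = [[abs(observed[i][j] - expected[i][j]) for j in range(2)] for i in range(2)]
    yates = min(0.5, min(min(r) for r in diffs)) if correct else 0.0
    return sum((diffs[i][j] - yates) ** 2 / expected[i][j] for i in range(2) for j in range(2))

t = [[10, 20], [30, 40]]
print(chisq_2x2(t, correct=False), chisq_2x2(t))
```

For this table every $|O - E|$ equals 2, so the statistic drops from about 0.794 without the correction to about 0.446 with it. The correction always moves the statistic toward 0, which makes the test more conservative.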
0 votes
0 answers
354 views

Is this model really saturated? Is there a good alternative to an ANOVA for saturated models?

I have chemical data from groundwater wells that were exposed to two separate amendments of vegetable oil and monitored over several time points. I am trying to set up a 2-way ANOVA to test if the ...
Kt McBride
1 vote
1 answer
84 views

Compute $P(X > Y)$ for two random variables with unknown distributions from Markov chains

I would like to compute the probability $P(X > Y)$ with R. I used JAGS to sample from the posterior distribution of each variable, so I have a Markov chain for each variable (of length $3\times 10^{...
FatherNucleus
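Each MCMC iteration gives a draw from the joint posterior. $P(X > Y)$ can therefore be estimated as the fraction of iterations in which the $X$ draw exceeds the $Y$ draw. This assumes that both chains come from the same JAGS run and are aligned by iteration. Autocorrelation does not bias this estimate, but it makes the Monte Carlo error larger than the i.i.d. binomial formula suggests. The sketch is in Python, and the normal draws stand in for real chains.

```python
import random

def prob_greater(x_draws, y_draws):
    """Monte Carlo estimate of P(X > Y) from draws paired by MCMC iteration."""
    if len(x_draws) != len(y_draws):
        raise ValueError("chains must have the same length")
    return sum(xi > yi for xi, yi in zip(x_draws, y_draws)) / len(x_draws)

rng = random.Random(3)
x_chain = [rng.gauss(1.0, 1.0) for _ in range(30_000)]  # stand-in for posterior draws of X
y_chain = [rng.gauss(0.0, 1.0) for _ in range(30_000)]  # stand-in for posterior draws of Y
print(prob_greater(x_chain, y_chain))  # the exact value for these stand-ins is about 0.760
```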
2 votes
0 answers
42 views

Selection of differentially expressed genes

I don't have any statistical background and have some questions. I see that some research papers select differentially expressed genes based on fold change and p-value. In some other papers I ...
beginner (reputation 175)
1 vote
2 answers
265 views

Can you confirm the complexity of an algorithm using simulations?

Let's say we're solving a regression problem where $X$ is an $n \times p$ matrix, $Y$ is $n \times 1$, and $\beta$ is $p \times 1.$ Then if we use the naive approach to solving the least squares ...
user9685396
1 vote
2 answers
1k views

Can normalization modify the results of an analysis?

I'm currently working with R on some oculometric data produced by an experiment I ran. I have two conditions ("Risky" & "Safe") and 3 mental states ("Focus" & "Around" & "Mind ...
Pyxel (reputation 127)
2 votes
1 answer
8k views

How to use kappa() or cohen.kappa() to check for matching assignments between observers

Example 1. In a common situation, if we have the following data: ...
Yatrosin (reputation 121)
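One point is worth checking first. Base R's `kappa()` estimates the condition number of a matrix, not inter-rater agreement, whereas `cohen.kappa()` in the psych package computes Cohen's kappa. The unweighted statistic is $\kappa = (p_o - p_e)/(1 - p_e)$. Here $p_o$ is the observed agreement, and $p_e$ is the agreement expected from the two raters' marginal proportions. The sketch is in Python, and `cohen_kappa` is a hypothetical name.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)

print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```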
0 votes
1 answer
264 views

What statistical test should I run for one measure in three groups?

I have this case where the data look like this:

| Trial | Person 1 | Person 2 |
|---|---|---|
| 1 | 4.7 | 3.8 |
| 2 | 7.1 | 6.3 |
| 3 | 5.4 | 4.5 |

I want ...
Omar113 (reputation 232)
-1 votes
1 answer
428 views

How to build a ROC curve and do statistical analysis for discrete classifiers?

I have 5 supervised databases containing S similar documents and N not similar. Within each base, I separated 10 samples with bootstrapping. These samples contain the identifier of each document. For ...
Denise (reputation 1)
0 votes
1 answer
298 views

How to compute more efficiently in R the probability distribution of the sum of non-independent discrete random variables

I hope you are well. Let $\{s_0,\,s_1,\ldots,\,s_T\}$ be a sequence of discrete random variables and denote $S_t=s_0+s_1+\cdots+s_t$, with $S_0=0$. For all $t\in\{1,\ldots,\,T\}$, suppose that $s_t|\{...
Student1981
3 votes
1 answer
2k views

Difference between pROC and ROCR in compute time and accuracy

I've been calculating receiver operating characteristic (ROC) curves on very large datasets for my thesis. I tried to run these in the pROC R package but the ...
Tom Kelly ケリー・トム
4 votes
1 answer
560 views

Computing by hand vs. R's magic wand

I have fitted a polynomial to my data set. Thus I have obtained coefficients $\beta_i$ for $i=0,1,2$ and have a relationship of the form $$Y=\beta_0+\beta_{1}X+\beta_{2}X^{2}+\varepsilon$...
l7ll7 (reputation 1,305)
1 vote
0 answers
291 views

Taguchi Crossed Array Design Creation and Analysis [closed]

I am trying to create the following Taguchi Crossed Array Design in R. ...
MYaseen208 (reputation 2,759)
3 votes
1 answer
735 views

Simulation of a t distribution: repeated sampling

I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t distribution with degrees of freedom 10. Do I need to create a single vector of data from the rt generator, ...
user119563
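The draws are i.i.d., so generating one vector of $1000 \times 25$ values and splitting it into 1000 blocks gives the same distribution as 1000 separate calls. In R, `matrix(rt(1000 * 25, df = 10), nrow = 1000)` yields one sample per row. Below is a Python sketch of the same idea. Each $t_{10}$ draw is built as $Z/\sqrt{V/10}$ with $Z \sim N(0,1)$ and $V \sim \chi^2_{10}$. The seed is arbitrary.

```python
import math
import random

def rt(n, df, rng):
    # Student t draws: Z / sqrt(V / df), with Z ~ N(0, 1) and V ~ chi-squared(df) = Gamma(df/2, scale 2).
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2, 2.0) / df) for _ in range(n)]

rng = random.Random(2024)
n_samples, size, df = 1000, 25, 10
# One long vector of 25,000 draws, split into 1000 consecutive samples of size 25.
draws = rt(n_samples * size, df, rng)
samples = [draws[i * size:(i + 1) * size] for i in range(n_samples)]
sample_means = [sum(s) / size for s in samples]
print(len(samples), len(samples[0]))
```

The variance of $t_{10}$ is $10/8 = 1.25$, so the 1000 sample means should have a variance near $1.25/25 = 0.05$.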