All Questions
Tagged with computational-statistics r
106 questions
0
votes
0
answers
30
views
How do I maximize this specific loglikelihood function in R?
I am interested in determining the parameters mu and lambda that maximize the function:
...
2
votes
1
answer
165
views
Why does the `boot` R package require the `i` argument? When does it make the package easier to use instead of harder? [closed]
I want to use the boot package to calculate bootstrap confidence intervals for the mean. Sure, I could do this by inverting a t-test, but I want to see what happens ...
12
votes
6
answers
2k
views
How to generate from this distribution without inverse in R/Python?
I am working with a distribution with the following density: $$f(x) = - \frac{(\alpha+1)^2 x^\alpha \log(\beta x)}{1-(\alpha + 1)\log(\beta)}$$ and CDF $$\mathbb{P} (X \leq x) = \int_0^x - \frac{(\...
1
vote
0
answers
323
views
Algorithm for Irwin Hall Distribution [closed]
I've been trying to create a function for the Irwin Hall distribution that doesn't face the same issue as the unifed package implementation. Because the function suffers from numerical issues, I ...
1
vote
0
answers
94
views
Introducing the third spatial dimension (Z-coordinate) in the generalized dissimilarity model in R
Generalized dissimilarity modeling (R package gdm) is a tool to study the relative effects of the user-defined environmental gradients and the spatial distance decay on the pair-wise dissimilarity ...
0
votes
0
answers
48
views
Calculating the computer memory required to estimate a fixed effects logit model
What is the calculation to determine the computer memory necessary to estimate a fixed effects logit model? Will this calculation vary across statistical software? For example, let's say I want to ...
1
vote
1
answer
280
views
Solving estimating equation using R [closed]
I am working with different types of data and comparing a variety of estimating equation approaches which share a multi-dimensional parameter $\beta$. Given a set of data $\boldsymbol{X}$, is there an ...
2
votes
1
answer
1k
views
Coding the likelihood function for logistic regression
I would appreciate help in understanding if I made a correct interpretation and coding of the likelihood function for logistic regression.
Background: For a task I am going to write a function in <...
0
votes
0
answers
27
views
Computing the "mix effect" in the evolution of a variable in R
Background:
I am currently working on a dataset representing the evolution of the income of a hospital which hosts several medical specialties.
The ratio income/medical act increases between year $N$ ...
2
votes
1
answer
222
views
I have a group that contains zero data, and I want to know whether it is considered normally or not normally distributed?
I have a group (group2) that contains zero data, and I want to know whether it is considered normal or not normally distributed.
I used the SPSS software, and it showed this result.
I also tried R, ...
5
votes
2
answers
1k
views
Antithetic method for Monte Carlo when bounds of the integral are infinite
I wanted to apply Monte Carlo with antithetic variables to estimate $\int_{0}^{\infty} e^{-x} \,dx$ (equal to 1). I used this R code.
...
2
votes
1
answer
355
views
Is best model selection by RSS equivalent to best model selection by R2 value?
I am trying to compare models using K-Fold-CV using the regsubsets function in R.
By default, it states that the ideal model is determined by the $RSS$.
I wished to ...
0
votes
0
answers
66
views
Bootstrapping example ISL - pages 194-195
I'm currently learning about bootstrapping using the book Introduction to Statistical Learning, and am struggling to understand what the point of using the boot ...
1
vote
0
answers
11
views
Can Statistics help in reducing row-wise computation time on data.frames? [duplicate]
I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...
1
vote
0
answers
192
views
How to use statistics to speed up row-wise computations on a data.frame?
I have a data frame with 10,000 rows and 40 columns. I am trying to apply a function to each of these rows. For each row, I am expecting to return a scalar which is the value of the statistic I am ...
1
vote
0
answers
47
views
Assessing performance of entire nonlinear SUR in R?
I understand that McElroy's R-squared is used to assess the entire SUR system in Hamann & Henningsen's systemfit package in R. However, I've been running SURs ...
1
vote
1
answer
6k
views
Maximum likelihood estimation of gamma distribution using optim in R
I'm trying to get the shape and scale parameters for this data using the optim function in R.
...
32
votes
13
answers
2k
views
If R were reprogrammed from scratch today, what changes would be most useful to the statistics community? [closed]
Many people in the statistics community and other academic fields use R as their primary language for data analysis and statistical computing. It is a wonderful ...
1
vote
0
answers
33
views
Generate data for significance testing
I want to generate a data set with a pre-specified significance level.
Let's say we have 2 covariates x1, x2, and an outcome variable y.
We fit a linear regression model as follows:
...
2
votes
0
answers
68
views
Computationally estimate $E[f(\hat \beta_1 X)]$ where $\hat \beta_1$ is the estimated coefficient obtained by ordinary least squares regression?
Let $(X_1,Y_1),(X_2,Y_2),\dots,(X_5,Y_5)$ be i.i.d. samples and consider the regression model
$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \quad \text{for} \ i \in \{1,2,\dots,5\},
$$
where $\...
0
votes
0
answers
33
views
How to explain why categorical inputs (as is) versus categorical inputs (as binary) produce different coefficients in logistic regression
Background
When using logistic regression, I observe that using categorical inputs (as-is i.e. leaving them as categorical with type factor) versus using said categorical inputs but transforming them ...
3
votes
2
answers
402
views
Is my logistic regression model correct?
I have a 2×2 factorial design (A and B). Both variables take two values, high (coded as 1) and low (coded as 0), and I have a response variable $y$. My logistic model includes an interaction between A ...
1
vote
1
answer
240
views
Hypothesis Testing: Z-test or T-test? and how to test the null hypothesis?
I have a question as below:
I have to generate two portfolio returns in R and compare their means.
However, I am not sure which test should be applied in this case. And how can I indicate at least 2%...
6
votes
2
answers
14k
views
Advice on running random forests on a large dataset
I am planning to run random forests to predict a binary outcome. I have a relatively (from my point of view) large dataset, composed of 500,000 units and around 100 features (a mix of continuous, ...
1
vote
0
answers
197
views
Correlation between two 3d arrays [closed]
I know that a is related to b is related to c. I have data for two years: In 2018, d=10. In 2019, d=0. I would like to know the correlation between a, b and c, for both d=0 and d=10 in order to ...
0
votes
0
answers
388
views
Ordinal multinomial logistic regression on one-hot encoded data
I have a task I am unable to tackle by principle. I'm working on survey data for one of our clients such that my design matrix is made of one-hot vectors with 15 features (originally 3 variables with ...
1
vote
0
answers
557
views
Random-effects-meta-analysis-simulation: zero-estimates for tau^2
I am working on simulating a random-effects-model for comparison of the DerSimonian-Laird-method vs. Hartung-Knapp-Sidik-Jonkman-method in R. To do so, I chose different combinations of mu (true ...
4
votes
1
answer
13k
views
Why is the median NA for some of the group outcomes in survival analysis?
I'm trying to do survival analysis using the Followup information, patient_vital_status and the expression of a gene. I'm using the following:
...
0
votes
1
answer
4k
views
Use Shapley Values for explaining whole Data Frame instead of a Single prediction [closed]
I am working on a Machine Learning model. One of the requests is to explain the model's 'decisions' to the business. Therefore I am using Shapley Values (Game Theory).
I found an interesting example ...
1
vote
0
answers
690
views
Using R to maximize a two parameter Weibull model via multivariate extension of Newton-Raphson method
I am just getting back into using R for the first time in a while, and wrote some code to perform the aforementioned task in the title. I was wondering if anyone could take a look at it and see if ...
2
votes
0
answers
86
views
Using Markov random field spatial weights to account for spatial autocorrelation
I am looking at the relationship between life expectancy and smoking rate within the London boroughs.
I thus created a bayesx spatial regression model including a term which assigns spatial ...
8
votes
2
answers
1k
views
Subtracting very small probabilities - How to compute? [duplicate]
This question is an extension of a related question about adding small probabilities. Suppose you have log-probabilities $\ell_1 \geqslant \ell_2$, where the corresponding probabilities $\exp(\ell_1)$...
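A minimal R sketch of the subtraction described in this excerpt, assuming finite log-probabilities with $\ell_1 \geqslant \ell_2$. The helper name `log_diff_exp` is hypothetical and is not taken from the question or its answers. It relies on the identity $\log(e^{\ell_1} - e^{\ell_2}) = \ell_1 + \log(1 - e^{\ell_2 - \ell_1})$, so neither probability is exponentiated on its own.

```r
# Hypothetical helper: log(exp(l1) - exp(l2)) for finite l1 >= l2,
# computed without forming exp(l1) or exp(l2) directly.
log_diff_exp <- function(l1, l2) {
  stopifnot(all(l1 >= l2))
  l1 + log1p(-exp(l2 - l1))
}

l1 <- -1000
l2 <- -1001
naive  <- log(exp(l1) - exp(l2))  # both exp() calls underflow to 0, so this is log(0)
stable <- log_diff_exp(l1, l2)    # finite, slightly below l1
```

When the two log-probabilities are equal the difference is zero, and the sketch returns `-Inf` as expected.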
3
votes
1
answer
1k
views
Vectorised computation of logsumexp
In this related post there is an explanation of how you can add together two very small probabilities using the logsumexp function, and how this can be programmed into base ...
2
votes
1
answer
947
views
Data perturbation with normal variables
I am doing some projects related to statistics simulation using R based on "Introduction to Scientific Programming and Simulation Using R". In the Students projects session (chapter 24), I am doing ...
4
votes
1
answer
1k
views
Adding very small probabilities—How to compute?
In some problems, probabilities are so small that they are best represented in computational facilities as log-probabilities. Computational problems can arise when you try to add these small ...
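A minimal R sketch of the addition problem described in this excerpt, using the common "subtract the maximum" form of log-sum-exp. The helper name `log_sum_exp` is hypothetical and is not the code from the question or its answers.

```r
# Hypothetical helper: log(sum(exp(l))) for a vector of log-probabilities,
# shifted by the maximum so that the largest term becomes exp(0) = 1.
log_sum_exp <- function(l) {
  m <- max(l)
  if (!is.finite(m)) return(m)  # every entry is -Inf, or an Inf is present
  m + log(sum(exp(l - m)))
}

l <- c(-1000, -1001)
naive  <- log(sum(exp(l)))  # exp() underflows to 0, so this is log(0)
stable <- log_sum_exp(l)    # finite, slightly above max(l)
```

Shifting by the maximum keeps every exponent at or below zero, so the sum inside the logarithm is at least 1 and cannot underflow.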
6
votes
2
answers
526
views
Evaluating the hazard function when the CDF is close to 1?
I need to evaluate a hazard function $h(t;\theta) = \dfrac{f(t;\theta)}{1-F(t;\theta)}$, where $f$ and $F$ are a pdf and a cdf, respectively, at many values of $t$ (and for several values of the ...
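A minimal R sketch of one way to keep $h(t)$ finite when $F(t)$ rounds to 1: work on the log scale and take the survival probability from the upper tail directly. The standard normal is used here purely as a stand-in, since the excerpt does not say which $f$ and $F$ are involved, and the function names are hypothetical.

```r
# Assumption: a standard normal stands in for the unspecified f and F.
hazard_naive <- function(t) dnorm(t) / (1 - pnorm(t))

# log h(t) = log f(t) - log S(t), where S(t) = 1 - F(t) is requested
# directly as a log upper-tail probability instead of computing 1 - F(t).
hazard_stable <- function(t) {
  exp(dnorm(t, log = TRUE) - pnorm(t, lower.tail = FALSE, log.p = TRUE))
}

t <- 10
h_naive  <- hazard_naive(t)   # 1 - pnorm(10) rounds to 0, so this divides by zero
h_stable <- hazard_stable(t)  # finite
```

The same pattern applies to any distribution whose R density and distribution functions accept `log`, `lower.tail` and `log.p` arguments.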
5
votes
1
answer
7k
views
When to switch off the continuity correction in chisq.test function?
In Table 1 ("Association of RAD51-AS1 expression with clinicopathological features of EOC patients") of this research paper, I see that the p-value is calculated based on Chi-...
0
votes
0
answers
354
views
Is this model really saturated? Is there a good alternative to an ANOVA for saturated models?
I have chemical data from groundwater wells that were exposed to two separate amendments of vegetable oil and monitored over several time points. I am trying to set up a 2-way ANOVA to test if the ...
1
vote
1
answer
84
views
Compute $P(X > Y)$ for two random variables with unknown distributions from Markov chains
I would like to compute the probability $P(X > Y)$ with R.
I used JAGS to sample from the posterior distribution of each variable, so I have a Markov chain for each variable (of length $3\times 10^{...
2
votes
0
answers
42
views
Selection of differentially expressed genes
I don't have any statistical background and have some questions.
I see that some research papers select differentially expressed genes based on fold change and p-value. And in some other papers I ...
1
vote
2
answers
265
views
Can you confirm the complexity of an algorithm using simulations?
Let's say we're solving a regression problem where $X$ is an $n \times p$ matrix, $Y$ is $n \times 1$, and $\beta$ is $p \times 1.$
Then if we use the naive approach to solving the least squares ...
1
vote
2
answers
1k
views
Can normalization modify the results of an analysis?
I'm currently working with R on some oculometric data produced by an experiment I ran. I have two conditions ("Risky" & "Safe") and 3 mental states ("Focus" & "Around" & "Mind ...
2
votes
1
answer
8k
views
How to use kappa() or cohen.kappa() to check for matching assignments between observers
Example 1. In a common situation, if we have the following data:
...
0
votes
1
answer
264
views
What is the analytical test to run in case of 1 measure for three groups?
I have a case where the data look like this:
Trial  Person 1  Person 2
1      4.7       3.8
2      7.1       6.3
3      5.4       4.5
I want ...
-1
votes
1
answer
428
views
How to build a roc curve and do statistical analysis for discrete classifiers?
I have 5 supervised databases containing S similar documents and N not similar. Within each base, I separated 10 samples with bootstrapping. These samples contain the identifier of each document. For ...
0
votes
1
answer
298
views
How to compute more efficiently in R the probability distribution of the sum of non-independent discrete random variables
I hope you are well.
Let $\{s_0,\,s_1,\ldots,\,s_T\}$ be a sequence of discrete random variables and denote $S_t=s_0+s_1+\cdots+s_t$, with $S_0=0$.
For all $t\in\{1,\ldots,\,T\}$, suppose that
$s_t|\{...
3
votes
1
answer
2k
views
Difference between pROC and ROCR in compute time and accuracy
I've been calculating receiver operating characteristic (ROC) curves on very large datasets for my thesis. I tried to run these in the pROC R package but the ...
4
votes
1
answer
560
views
Computing by hand vs. R's magic wand
I have fitted a polynomial to a data set that I have. Thus I have obtained coefficients $\beta_i$ for $i=0,1,2$ and have a relationship of the form $$Y=\beta_0+\beta_{1}X+\beta_{2}X^{2}+\varepsilon$...
1
vote
0
answers
291
views
Taguchi Crossed Array Design Creation and Analysis [closed]
I am trying to create the following Taguchi Crossed Array Design in R.
...
3
votes
1
answer
735
views
simulation of t distribution - repeated sampling
I am new to simulation exercises in R. I want to create 1000 samples of size 25 from a t distribution with degrees of freedom 10.
Do I need to create a single vector of data from the rt generator, ...
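A minimal R sketch of the setup described in this excerpt, assuming the goal is 1000 independent samples of size 25 from a t distribution with 10 degrees of freedom. The excerpt is truncated, so this is not the asker's code. The variable names and the seed are illustrative choices.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
n_samples <- 1000  # number of repeated samples
n         <- 25    # size of each sample
df        <- 10    # degrees of freedom

# One call to rt(), reshaped so that each row is one sample of size 25.
samples <- matrix(rt(n_samples * n, df = df), nrow = n_samples, ncol = n)

# Example per-sample statistic: the mean of each of the 1000 samples.
sample_means <- rowMeans(samples)
```

Because the draws are i.i.d., generating one long vector and reshaping it is equivalent to calling `rt(25, df = 10)` 1000 times, for example with `replicate()`.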