Questions tagged [rule-of-thumb]
Advice on statistical analysis that is often useful in practice (but is not always guaranteed to work).
62 questions
1
vote
2
answers
56
views
Prevalence upper bound when no events are observed in sample
In a sample of 2000 observations, no positive cases were found, but I still want to provide an upper bound for the prevalence.
The general rule that people seem to use is to simply take 3/n,...
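A minimal sketch of the 3/n rule mentioned in this excerpt, assuming the usual setting of n independent Bernoulli trials with zero observed events. Under that assumption the exact one-sided 95% upper bound solves (1 - p)^n = 0.05, and 3/n approximates it because -ln(0.05) is close to 3. The function names are illustrative and do not come from the question.

```python
import math

def rule_of_three_upper(n: int) -> float:
    """Approximate 95% upper bound on the prevalence when 0 of n cases are positive."""
    return 3.0 / n

def exact_upper(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided bound: the largest p with (1 - p)**n >= alpha."""
    return 1.0 - alpha ** (1.0 / n)

n = 2000  # sample size quoted in the question
approx = rule_of_three_upper(n)
exact = exact_upper(n)
print(f"rule of three: {approx:.6f}, exact: {exact:.6f}")
```

For a sample this large the two bounds agree to several decimal places, with 3/n being slightly more conservative.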
5
votes
3
answers
926
views
Rule of Thumb meaning in statistics
I was wondering what the term "rule of thumb" actually means in statistics. Why did they select this name, for example, for sample size calculation? Is it like an approximation based on ...
4
votes
1
answer
413
views
Academic reference on the "minimum of 5 expected counts per cell" rule of thumb for Chi-Square test?
I remember very clearly an academic paper that investigated thoroughly the "minimum of 5 expected counts per cell" rule of thumb when conducting a chi-square test on a contingency table. It ...
1
vote
1
answer
940
views
Number of observations for multiple linear regression
I've read multiple responses here that the recommended range of observations per each IV is 20. However, I'd like to ask for clarification on whether this number (20) includes dummy variables.
Let's ...
1
vote
0
answers
1k
views
Identifying High Leverage in Logistic Regression
As seen below from Piet De Jong, Generalized Linear Models for Insurance Data, for a linear model we can identify high leverage: if the leverage exceeds 2p/n (that is, $h_{ii} > 2p/n$), then this indicates ...
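A small sketch of the 2p/n cutoff for the linear-model case described in this excerpt only; it does not address the logistic-regression question. It assumes simple linear regression with an intercept (so p = 2), where the leverage has the closed form h_ii = 1/n + (x_i - mean)^2 / Sxx. The data and function names are made up for illustration.

```python
def leverages(x):
    """Leverages h_ii for simple linear regression with an intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]

def high_leverage_indices(x, p=2):
    """Indices whose leverage exceeds the 2p/n rule-of-thumb cutoff."""
    cutoff = 2.0 * p / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]

x = [1, 2, 3, 4, 5, 20]  # hypothetical predictor values; 20 sits far from the rest
h = leverages(x)
flagged = high_leverage_indices(x)
print(flagged)  # indices of points above the cutoff 2p/n
```

The leverages always sum to p, so the average leverage is p/n and the cutoff flags points with more than twice the average.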
1
vote
1
answer
1k
views
Rule of Thumb for multivariable linear regression analysis
Rule of thumb suggested by Green (1991) (https://doi.org/10.1207/s15327906mbr2603_7): "Some support was obtained for a rule-of-thumb that $N \geq 50 + 8m$ for the multiple correlation and $N \geq 104 + ...
0
votes
0
answers
328
views
Sum of sensitivity and specificity meaningful?
I came across an article in the BMJ that claims that a useful rule of thumb metric for assessing a medical test's performance is that the sum of specificity and sensitivity should be greater than 1.5 ...
0
votes
1
answer
280
views
Doc2vec Corpus Size Recommendation
I'm trying to make a semantic search engine with Doc2Vec where you query the model with a document and it returns the N most similar documents from its training corpus. I'm having trouble pushing accuracy past ...
3
votes
1
answer
200
views
What is the optimal sampling split for stratified sampling?
Suppose we have a population split into two strata of interest and we want to estimate the mean of some quantity of interest pertaining to the population. Denote the quantities of interest for the ...
0
votes
1
answer
194
views
On the existence of rule of thumb for machine learning algorithms
I want to know if there are conditions about the minimum number of observations to have (the relation between the number of variables and the number of presence and absence records) in order to use ...
13
votes
3
answers
2k
views
Revisiting the Rule of Three
The rule of three is a method for calculating a 95% confidence interval when estimating $p$ from a set of $n$ IID Bernoulli trials with no successes.
My understanding from its derivation is that the ...
2
votes
1
answer
510
views
Is there any rationale for rules of thumb for maxlag selection?
I understand that the optimum number of lags can be selected using some information criterion (e.g. the Akaike information criterion (AIC), the Schwarz Bayesian information criterion (SBIC), etc.). However, in order to select the ...
1
vote
0
answers
106
views
Bounds for hypothesis testing with non-normal distributions
After running a randomized experiment with two variants (an A/B test), I want to compare their means with a hypothesis test that does not assume a normal distribution. Since the sample sizes are different,...
1
vote
0
answers
2k
views
Reference for the rule of thumb $\sqrt{n}$ for the number of bins of a histogram
Does anyone have an idea where the rule of thumb $\sqrt{n}$ for the number of bins of a histogram comes from? I need a reference to put in my article. I remember that rule from my college times, but ...
1
vote
0
answers
550
views
Does the 10% condition apply to geometric distributions?
I know that the 10% condition is required to assume independence in binomial settings. However, in a geometric setting where there is no fixed n-value, how would one determine independence?
I am a ...
3
votes
0
answers
77
views
How small is too small to fit a reasonable long memory model?
When looking at papers about long memory they tend to analyze data sets whose length is in the thousands, see http://www.math.canterbury.ac.nz/~m.reale/pub/Reaetal2011.pdf for an example.
My question ...
1
vote
1
answer
318
views
How should I treat categorical variables for the purpose of the "One in 10 rule"?
Hope a basic question like this is alright! To avoid overfitting, we try to maintain enough cases for the least common event per explanatory variable; people usually recommend at least 10.
How should ...
2
votes
0
answers
55
views
Bootstrap and distribution of the test statistic
One rule of thumb says that we should avoid using the bootstrap to construct a confidence interval for some test statistic, if (1) the test statistic is greatly affected by outliers or by rare ...
3
votes
1
answer
540
views
Is there a general rule of thumb for how big the ratio ( Sample size / parameters in model ) should be?
I'm aware of model selection methods based on AIC and backwards/forwards selection, but I'm wondering if there is a general rule about how big your sample size $n$ should be (as an absolute minimum) ...
1
vote
1
answer
2k
views
Equality of variances based on Rule of Thumb (for very small groups)
I have three groups with very small sample sizes (6 obs per each group). In order to use ANOVA I just want to make sure that the equality of variances is satisfied, as Levene's test does not have ...
1
vote
0
answers
181
views
Minimum sample size heuristic for using Hotelling's T$^2$ test
My understanding is that for non-normal data, a widely accepted heuristic for using the t-test is a sample size of at least $n=30$. Since for smaller sample sizes the distribution of the sample mean ...
5
votes
1
answer
648
views
How good an approximation is sampling with replacement to sampling without replacement?
I'm learning about probability with Feller's book and he states that, when the population size $n$ is big in comparison with the sample size $r$, then $n_r$, which is a shorthand for $\frac{ n!}{(n-r)!...
18
votes
2
answers
8k
views
What is the logic behind "rule of thumb" for meaningful differences in AIC?
I've been struggling to find meaningful guidelines for comparing models based on differences in AIC. I keep coming back to the rule of thumb offered by Burnham & Anderson 2004, pp. 270-272:
...
3
votes
1
answer
1k
views
Paper for the rule of thumb for homogeneity of variances
I often read about a rule of thumb that one can apply if the test of equal variances returns a significant result. Depending on the source, the proposed maximal $F$ ratio varies between 1.5 and 4, which ...
1
vote
1
answer
4k
views
Multiple Regression - Minimum Observations Per Dummy Variable
I believe the rule of thumb is at least 10-20 observations per predictor variable, but I was hoping to get some additional clarification.
Suppose a hypothetical example with dependent variable of ...
1
vote
1
answer
657
views
Rule of thumb for the number of degrees of freedom of a chi-squared to converge to normal
It is said that if $X\sim\chi^2_{(k)}$, then $Y=\frac{X-k}{\sqrt{2k}}$ converges in distribution to $N(0,1)$ as $k$ tends to $\infty$.
Is there a commonly used rule of thumb for a $k$ to be "big enough" to ...
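One hedged way to put numbers on "big enough" is to look at how fast the shape of the standardized variable approaches the normal. A chi-squared variable with k degrees of freedom has skewness sqrt(8/k) and excess kurtosis 12/k, and standardizing does not change either. The sketch below tabulates these and finds the smallest k below a skewness tolerance; the tolerance of 0.5 is an arbitrary assumption for illustration, not an established rule.

```python
import math

def chi2_skewness(k: int) -> float:
    """Skewness of a chi-squared variable with k degrees of freedom."""
    return math.sqrt(8.0 / k)

def chi2_excess_kurtosis(k: int) -> float:
    """Excess kurtosis of a chi-squared variable with k degrees of freedom."""
    return 12.0 / k

def smallest_k_below(tolerance: float) -> int:
    """Smallest k whose skewness is strictly below the given tolerance."""
    k = 1
    while chi2_skewness(k) >= tolerance:
        k += 1
    return k

for k in (5, 30, 100, 500):
    print(k, round(chi2_skewness(k), 3), round(chi2_excess_kurtosis(k), 3))

k_min = smallest_k_below(0.5)  # 0.5 is an illustrative tolerance, not a standard
print(k_min)
```

Because the skewness decays only like 1/sqrt(k), halving it requires four times as many degrees of freedom.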
4
votes
0
answers
153
views
Optimal number of folds for nested cross-validation? [duplicate]
I'm doing nested cross-validation on a classification problem (inner loop to tune hyperparameters, outer loop to select algorithm).
I've often heard that a reasonable number of folds for standard (I.e....
3
votes
3
answers
3k
views
How to judge skewness based on the mean and range?
Is there any rule of thumb for judging the skewness of data based on its mean and range (max-min)? I found such an implication in one of the papers I'm reading and I can't see why it would be obvious.
The ...
2
votes
3
answers
2k
views
How many cycles are required to model seasonality?
How many cycles should I have in my time series to model the seasonality component of it so that I can get rid of it and just look at the trend?
2
votes
0
answers
327
views
'68–95–99.7 rule' equivalent for multivariate normal distribution [duplicate]
The 68–95–99.7 rule is a convenient way of quickly getting an overview of the spread of some normally distributed data.
I am wondering if there is an equivalent rule in the multivariate case, and how ...
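As a rough illustration of what such an equivalent could look like, the sketch below compares the univariate coverage P(|Z| <= k) = erf(k / sqrt(2)) with the bivariate case. It assumes a two-dimensional normal distribution, where the squared Mahalanobis distance follows a chi-squared distribution with 2 degrees of freedom, so the probability of falling within distance k is 1 - exp(-k^2 / 2). Higher dimensions need the general chi-squared CDF and are not covered here; the function names are illustrative.

```python
import math

def univariate_coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2.0))

def bivariate_coverage(k: float) -> float:
    """P(Mahalanobis distance <= k) for a 2-D normal (chi-squared, 2 df)."""
    return 1.0 - math.exp(-k * k / 2.0)

for k in (1, 2, 3):
    print(k, round(univariate_coverage(k), 4), round(bivariate_coverage(k), 4))
```

The same radius k covers noticeably less probability mass in two dimensions than in one, which is why the familiar percentages do not carry over directly.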
15
votes
3
answers
10k
views
Relation between learning rate and number of hidden layers?
Is there any rule of thumb between depth of a neural network and learning rate? I have been noticing that the deeper the network is, the lower the learning rate must be.
If that's correct, why is ...
2
votes
2
answers
5k
views
Rule of Thumb for Accepting the Null Hypothesis
Usually, hypothesis testing is performed with the goal of drawing conclusions about the statistical significance of an effect, i.e. $H_0 \ \hat{=} \ \text{No Effect} $ vs. $H_1 \ \hat{=} \ \text{Effect}...
10
votes
1
answer
373
views
How does one formalize a prior probability distribution? Are there rules of thumb or tips one should use?
While I like to think I have a good grasp of the concept of prior information in Bayesian statistical analysis and decision making, I often have trouble wrapping my head around its application. I have ...
3
votes
1
answer
3k
views
Rule of thumb for the number of significant digits to report
What is a good rule of thumb for the number of significant digits to report? Preferably, the rule of thumb is given in a citable publication. I am particularly interested in a rule that does not ...
3
votes
1
answer
5k
views
How to tell a coefficient of variation (C.V.) is high?
I have a group of data series which contain hundreds of values.
I've calculated the C.V.s of these data series, but I don't know how I can recognise if they are high or low according to the C.V. ...
1
vote
0
answers
72
views
Statistical texts [closed]
Neophyte social science researcher here. I keep finding myself in the position of needing to reference the interpretation of some statistic or other (e.g. >.40 is adequate factor loading, >.80 is ...
6
votes
2
answers
3k
views
Rule of thumb for using logarithmic scale
When I am given a variable, I usually decide whether to take its logarithm based on gut feeling. Usually I base it on its distribution - if it has a long tail (like: salaries, GDP, ...) I use logarithms....
0
votes
0
answers
283
views
How do "stacked" repeated samples influence our rules of thumb for minimum sample size in regression?
A professor in one of my graduate statistics courses once said, when briefly reviewing simple linear regression: "I would never EVER fit a line to fewer than 8-10 data points, it would make me feel......
8
votes
1
answer
20k
views
Optimal number of bins in histogram by the Freedman–Diaconis rule: difference between theoretical rate and actual number
Wikipedia reports that under the Freedman–Diaconis rule,
the optimal number of bins in a histogram, $k$, should grow as
$$k\sim n^{1/3}$$
where $n$ is the sample size.
However, if you look at ...
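A minimal sketch of the Freedman–Diaconis rule, assuming its usual form: bin width h = 2 * IQR * n^(-1/3), with the number of bins taken as the data range divided by h and rounded up. The quartile convention (Python's default `statistics.quantiles` method) and the evenly spaced test data are assumptions chosen to keep the example deterministic.

```python
import math
import statistics

def fd_bin_count(data) -> int:
    """Number of histogram bins under the Freedman-Diaconis rule."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default method
    width = 2.0 * (q3 - q1) * n ** (-1.0 / 3.0)  # FD bin width
    return math.ceil((max(data) - min(data)) / width)

small = list(range(1, 1001))  # 1000 evenly spaced points
large = list(range(1, 8001))  # 8 times as many points
bins_small = fd_bin_count(small)
bins_large = fd_bin_count(large)
print(bins_small, bins_large)
```

On this evenly spaced data, multiplying n by 8 doubles the bin count, matching the n^(1/3) rate. For real data the count also depends on how the range and the IQR change with n, so the observed growth can differ from the theoretical rate.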
0
votes
3
answers
982
views
How strongly supported is a conclusion drawn from the correlation of 10 observations?
The Spearman’s rank correlation coefficient calculated for these observations is 0.826 (p<0.005). Is there something like a "rule of thumb" minimum number of observations?
15
votes
2
answers
38k
views
"When to use boxplot and when barplot" rules (of thumb?)
Both box-and-whisker plot and bar chart are appropriate graphics
for ANOVA according to The R Book (Crawley, 2013),
but which is more appropriate? I suppose it depends on the situation... can anybody help ...
8
votes
3
answers
3k
views
Is there a simple method to calculate the minimum number of datalines needed for a machine learning algorithm with x variables?
Here is the problem: I am making a machine learning algorithm that takes the inputs and outputs of some software I've written, and I don't know how many datalines to produce to get results that are a '...
3
votes
2
answers
255
views
Rule of thumb when drawing N samples from a discrete distribution with N possible values with replacement
I'm looking for an explanation and possibly the name of a rule of thumb:
When drawing N samples with replacement from a discrete uniform distribution of N values, it is very likely that:
1/3 of the ...
4
votes
0
answers
403
views
Intuition or explanation of "rule of thumb" for loss of degrees of freedom in chi-square for model selection/knot placement?
A document I read many years ago (to do with actuarial education; sorry I can't link to a copy) used chi-square tests to assess lack of fit of various models to mortality rates which were treated ...
-1
votes
3
answers
3k
views
Minimum items for questionnaire
Is there a minimum number of items for a questionnaire? Some researchers say that the total number of items for teenagers should range from 80 to 110.
1
vote
0
answers
6k
views
Min-max vs. standard deviation when plotting error bars
I am running some experiments in which I plot the values of some variable y (averaged over 4 runs) against some independent variable x. I also compare the effect of z. In other words:
horizontal axis:...
11
votes
1
answer
9k
views
Histogram with uniform vs non-uniform Bins
This question describes the basic difference between a uniform and a nonuniform histogram. And this question discusses the rule of thumb for picking the number of bins of a uniform histogram that ...
3
votes
2
answers
465
views
Is Rule of Three inappropriate in some cases?
I'm building binomial proportion confidence intervals for a patient dataset containing the frequency of home nursing visits during the week prior to hospital admission. The freq. of home visit ...
7
votes
0
answers
1k
views
Rule of thumb - number of predictors - Poisson regression rates
I am interested in estimating a Poisson regression for mortality rates, with number of deaths as the dependent variable and log(population size) as the offset. I have 50 observations (states).
I am ...
3
votes
2
answers
2k
views
Rules of Thumb to choose an initial number of class intervals and refine that choice (potentially automatically)
I was wondering if there are established rules of thumb (or algorithms) that, given a set of observations can help:
choose an initial number of class intervals.
refine that choice to a better number.
...