Questions tagged [bias]
The difference between the expected value of a parameter estimator & the true value of the parameter. Do NOT use this tag to refer to the [bias-term] / [bias-node] (i.e., the [intercept]).
1,105 questions
21
votes
4
answers
3k
views
Is it (always) better to build a model prior to viewing the data?
When it comes to data exploration, aside from checking for outliers (human error), correlated covariates, and missing values, is there a downside to viewing relationships between a response variable ...
3
votes
1
answer
58
views
Would withholding marks until students respond to a survey bias the responses?
My university is running an anonymous survey, mostly to check if we understand how we are going to be assessed, if we are comfortable with the material, and if we find the material well organised. ...
0
votes
0
answers
42
views
What is the asymptotic bias of the nonparametric histogram density estimator?
I am trying to derive an expression for the asymptotic bias of the nonparametric histogram density estimator in order to compare it to the bias of the kernel density estimator. In terms of notation, ...
1
vote
1
answer
30
views
Seeking advice on bad control when the relationship is indirect (overcontrol bias?)
I’m working on a project where we’re trying to estimate the impact of country fragility status on project outcomes. We have data on the total grant funding allocated to each project, which we’ve been ...
2
votes
1
answer
160
views
How do I estimate the mean and variance from data?
I have made a periodogram (plot given below) from some 1D data, and would like to estimate the bias and variance of it, because by minimizing both I could select the ideal window size for calculating ...
2
votes
0
answers
19
views
Asymptotic properties of the estimator in IV panel regression
I am studying the rationale behind using the SYS-GMM estimator from Baltagi's book.
Consider the following Data Generating Process (DGP):
\begin{equation}
y_{it} = \rho y_{i,t-1} + \alpha_i + \nu_{it},...
0
votes
0
answers
26
views
Underestimation of Empirical Coefficient of Variation When Sampling from a Log-Normal Distribution with High CV
I am working with a log-normal distribution where I input a coefficient of variation (CV) to generate the variance. I then sample $n$ times from this distribution. The issue I am encountering is that ...
4
votes
1
answer
107
views
Statistical test for bias in a simulation study to tell if the estimate is biased or unbiased
Say we have a classic simulation study: we choose true parameter value $\theta$, then we generate N datasets and on each of those we run the model we want to test. So we get N estimates $\hat{\theta}...
2
votes
0
answers
39
views
Can I Perform a Micro Synthetic Control Analysis with Different Aggregation Levels for Treatment and Control Groups?
I am conducting an analysis using the microsynth package in R to evaluate the impact of increased police presence on various outcome measures obtained from an official survey. My treatment areas ...
3
votes
3
answers
56
views
The ability to engage in an unhealthy behavior (e.g. smoking) late in life may indicate strong overall health. Is there a name for this "bias"?
Is there a name for this phenomenon in epidemiology? I'd like to read about examples and approaches to identify and account for it.
The scenario:
Imagine you have an elderly cohort.
Most of these ...
2
votes
2
answers
94
views
Simple OLS to measure correlation
I have two variables, X and Y, and I have good reason to believe that they are simultaneously determined.
$$Y = a_{1} + b_{1}X + u_{1}\tag{1}$$
$$X = a_{2} + b_{2}Y + u_{2}\tag{2}$$
My question is ...
0
votes
0
answers
10
views
How to find the bias gradient for localization problem?
The work is about finding the Cramér–Rao bound when the estimator is biased. The algorithm it is based on is from Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramér–Rao Bound, and it ...
0
votes
0
answers
14
views
How to deal with Bias Gradient Matrix for biased CRB (Cramér–Rao bound) calculation if the gradient matrix is m-by-n but $m \neq n$?
I am doing a model for collaborative localization and using the CRB (Cramér–Rao bound) as the localization performance measurement. I want to consider interference caused by NLOS and clutter, therefore ...
1
vote
0
answers
22
views
Why is the threshold term incorporated into the weight vector in linear classifiers?
In the context of linear classifiers, such as the perceptron or logistic regression, I understand that the decision boundary is defined by a linear combination of input features and weights, plus a ...
0
votes
1
answer
36
views
Expectation of reciprocal residual sum of squares
Consider an IID sample $X_1 , \cdots, X_n \in \mathbb{R}^d$, then what can we say about the expectation of the reciprocal residuals when projecting onto every other point? That is, can we compute
$$
E \...
0
votes
0
answers
12
views
Interpreting differences between confidence intervals with and without adjustment for clustering. Should those from adjustment be wider?
I am trying to interpret an article involving data from a cluster randomised trial, where the confidence intervals for effect sizes are said to have been adjusted 'using the standard errors of the ...
2
votes
1
answer
54
views
Why do top-down approaches produce biased coherent forecasts?
The context is forecasting hierarchical time series. Section 10.4 of "Forecasting: Principles and Practice" (2nd edition) by Hyndman & Athanasopoulos states:
One disadvantage of all top-...
0
votes
1
answer
72
views
Why does increasing model complexity reduce bias over the entire data distribution?
In ML, we often talk about the bias-variance tradeoff, and how increasing model complexity both reduces bias and increases variance. I understand why increasing model complexity reduces bias at first, ...
0
votes
0
answers
124
views
How to prevent a regression model from overpredicting lower values and underpredicting higher ones
I'm trying to predict rental prices for houses that are listed for sale. My training set consists of houses that are listed for rent. With the predictions, my idea is to then compute an estimate of ...
0
votes
0
answers
29
views
How were the asymmetric recovery ranges in Table A5 of Appendix F from AOAC determined?
I am trying to understand how the recovery ranges in Table A5 of Appendix F from AOAC (https://www.aoac.org/wp-content/uploads/2019/08/app_f.pdf) were determined.
I did not understand how the ...
3
votes
0
answers
39
views
Is there a likelihood penalization or (im)proper prior to remove estimation bias for gamma parameters?
So I am learning that maximum likelihood estimates of the parameters of a gamma distribution are biased. As far as I understand there is no guarantee in general that there exists a prior (or base ...
2
votes
0
answers
32
views
Bias in treatment effect estimation (adaptive design)
Could someone explain the source of bias in treatment effect estimation in the context of adaptive designs?
The FDA guidance for industry for adaptive designs https://www.fda.gov/media/78495/...
8
votes
2
answers
167
views
What to show as error-bar if the bootstrap distribution is biased?
Say I have a sample, of finite size $N$, and I compute some statistic $\theta$ from it. I want to plot this sample estimate, $\hat{\theta}$, with an error-bar.
To compute the error, I am using ...
1
vote
0
answers
47
views
Question on nonlinear least squares
Consider the following equation for $Y>0$:
$$
(1) \quad \log(Y)=\log(\gamma)+\log(\alpha+\beta X)+\epsilon.
$$
Assume that $E(\epsilon| X)=c\neq 0$. What are the consequences of this assumption on ...
0
votes
0
answers
26
views
Target encoding in linear regression
I have a dataset with the loss rates of each contract as dependent variable. As independent variables I have country (four values), profession (five values) and income (continuous variable). I apply ...
3
votes
1
answer
98
views
Granular difference-in-differences with non-repeating unit of observation
I want to analyze changes in characteristics of job postings around an (exogenous) event. However, rather than conducting the analysis at the job poster level (e.g., a company or geographic area), my ...
0
votes
0
answers
20
views
Multi-reader study design: split-plot or fully crossed?
I am a radiologist designing a study where 230 CT scans of cancer patients will be evaluated by 5 radiologists. There will be two sets of evaluations: one where the radiologist is aided by an AI Computer-...
0
votes
0
answers
15
views
Methods to level spatiotemporal data when simultaneous measurements of the same physical quantity are different
I have data of (simulated) measurements of the density content of ionized ozone in the atmosphere with three different satellites. Specifically, I have a unique set of observations x1,x2,x3,...xN for ...
0
votes
0
answers
9
views
What are the conditions to specify the regressors in Heckman 2 step model
I am having trouble interpreting the Stata command for the two-step Heckman model, and also with adding fixed effects to the model.
My analysis is based on a panel dataset and I want to solve for the selection bias ...
1
vote
0
answers
51
views
Standard practice to show Biased CRBs
I have a problem with four-parameter estimation. I have derived the variances for the estimated parameters using Monte Carlo simulations (numerical ones) and theoretical ones using the inverse of the ...
0
votes
1
answer
35
views
Does the intuitive sense of overfitting in this mechanism design context exemplify bias-variance tradeoff?
Suppose the (we can say unanimous) preference of each individual in a society is to select roads for travel by placing 95% weight on the objective of minimizing travel time, and the remaining 5% ...
1
vote
0
answers
64
views
Degrees of freedom for biased sample autocorrelation function
I want to find the expression for a biased estimate of the autocorrelation function for a time series $X$, and am doing this from the biased estimated autocovariance function for lag $k$, divided ...
0
votes
1
answer
63
views
conditional-on-positives bias
I am reading the Bad COP section on https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html#bad-cop. I am confused if
$$
E[Y|T = 1] - E[Y|T = 0] = \\
E[Y|Y > 0, T = 1]...
0
votes
0
answers
17
views
How to calculate bias having three groups?
Three groups of people each tried one of the three different applications and answered a questionnaire on a Likert scale from 0 to 4. Their age and experience in video games were also asked (on a ...
2
votes
1
answer
51
views
Is Assessment Bias a type of Observer Bias?
Based on the definitions of assessment bias and observer bias I have found below, it seems like assessment bias is a type of observer bias?
Assessment bias:
If the observer knows the treatment being ...
1
vote
0
answers
121
views
Regression Discontinuity Design, staggered treatment allocation
I'm unsure if this complex allocation rule is appropriate for RDD. I will have data for a staggered rollout treatment where there will be about 10 rounds of selection over two years for services (...
3
votes
1
answer
48
views
Can we get the conditional bias of the estimator at a generic $x$?
Consider a standard ERM problem based on quadratic loss where we solve
$$
\hat{f}_n\in \operatorname*{arg min}_{f\in \mathcal{F}} R_\text{tr}(f)
$$
where $R_\text{tr}(f)=\frac{1}{n}\sum_{i=1}^n (Y_i-f(...
6
votes
3
answers
728
views
Do autocorrelated residuals cause OLS coefficients to be biased?
I see different answers everywhere. Intuitively, I would think if residuals are autocorrelated then there is some information that you are not incorporating into your model and is a sign of a biased ...
0
votes
0
answers
37
views
Derivation of bias of LASSO in the orthonormal case
In the following lecture slides by Breheny, P. (2016) titled "Adaptive lasso, MCP, and SCAD" from the High Dimensional Data Analysis course at the University of Iowa, slide 2 presents the ...
6
votes
5
answers
369
views
Name of this fallacy and how to reach conclusion
While handling some demographic data, I got stuck in a position where (I did not disclose the actual data set and whom it is concerning, therefore I replace it with hypothetical data) I could not reach a ...
2
votes
1
answer
76
views
Check if method of moments estimator is unbiased for $X_1...X_n$ being a random sample from $\mathcal{U}_{[-\theta,\theta]}$
I am not sure how to do this. To find the method of moments estimator I did:
$$E[X] = \frac{-\theta + \theta}{2} = 0$$
use 2nd moment:
$$E[X^2] = \frac{(-\theta)^2 + -(\theta^2) + \theta^2}{3} = \frac{...
7
votes
1
answer
84
views
On unbiasedness of an optimal forecast
Diebold "Forecasting in Economics, Business, Finance and Beyond" (v. 1 August 2017) section 10.1 lists absolute standards for point forecasts, with the first one being unbiasedness: Optimal ...
3
votes
2
answers
160
views
Instrumental variable as a control variable
I understand that an instrumental variable is used to address endogeneity bias since there could be correlation between the variable of interest and the error term.
Suppose now we want to see the ...
1
vote
0
answers
28
views
Multiplicative BIASES in Log-Log regression
When we try to estimate elasticities by regression, we usually estimate the following regression model:
$$\ln(y) = \beta_0 + \beta_1 \ln(x_1) + \dots + \epsilon$$
When we expect to have endogenous ...
3
votes
1
answer
162
views
How does non-collapsibility and the lack of an error term affect coefficients in regression
I have read from here that in nonlinear models such as the logit and Cox, because of a lack of an error term, coefficients may be biased (typically towards zero) when covariates are omitted; I see how ...
4
votes
2
answers
167
views
What does it mean that BLUP is unbiased, given a linear two-level model?
Suppose we have the following mixed effects model for observation $Y_{ij}$ of pupil $i$ in school $j$:
$Y_{ij}=b_0 + u_j + e_{ij}$
Here, $b_0$ is a fixed parameter for the "grand mean", $u_j$...
0
votes
1
answer
91
views
Bias vs consistency in instrumental variable estimation
So in Mostly Harmless Econometrics, page 154, they analyse the bias of instrumental variables:
They consider the case of one endogenous variable $x$, multiple instruments $Z$, and $\eta$ is the ...
0
votes
0
answers
32
views
Treating longitudinal data as a repeated cross section
Can you introduce bias by treating longitudinal data as a repeated cross section?
Suppose I have two data sources measuring the same variables. The first is a balanced panel dataset $\{y^{long}_{it},X^...
0
votes
0
answers
30
views
Why does the jackknife reduce bias? [duplicate]
Given a sample $x = (x_1, \ldots, x_n)$, define $x_{(-i)}$ as the sample values excluding sample $x_i$. That is,
$$
x_{(-i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots x_n).
$$
Now given estimator $T(x)$...
0
votes
0
answers
55
views
Why is the MSE of the fitted data not equal to the sum of the bias and the variance in R?
I use simple linear regression and I want to find the decomposition of MSE, that is as a sum of the bias, the variance and the variance of the error terms. I have the following code:
...