Questions tagged [selection-bias]
Bias introduced by non-random selection of observations, such that the sample is not representative of the underlying population.
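As a rough illustration of the definition above, the following sketch simulates a population and a non-random sample drawn from it. Everything in it is an assumption made for illustration: the function name `simulate_selection_bias`, the standard normal outcome, and the rule that a unit is observed only when its outcome plus independent noise is positive.

```python
import random
from statistics import mean


def simulate_selection_bias(n=100_000, seed=0):
    """Return (population mean, selected-sample mean) for a toy selection rule."""
    rng = random.Random(seed)
    # Hypothetical population: outcomes are standard normal, so the true mean is 0.
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Non-random selection: a unit is observed only if outcome + noise > 0,
    # so units with high outcomes are over-represented in the sample.
    sample = [y for y in population if y + rng.gauss(0.0, 1.0) > 0]
    return mean(population), mean(sample)


population_mean, sample_mean = simulate_selection_bias()
print(f"population mean: {population_mean:.3f}")
print(f"selected-sample mean: {sample_mean:.3f}")
```

Under these assumptions the selected-sample mean sits well above the population mean, even though no individual observation is measured incorrectly.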
105 questions
0
votes
0
answers
8
views
Selection Correction Method with the Variance of the Truncated Normal Distribution
Consider the following data generating process:
$$Y=\beta_0+\beta_1X_1+\beta_2X_2+\beta_3u+\varepsilon,$$
$$D=1\left[\gamma_0+\gamma_1Z+\gamma_2X_1+\gamma_3X_2+u>0\right],$$
where $D=1$ if the unit ...
0
votes
0
answers
10
views
In a Heckman model, can I use the dependent variable of the selection equation as an independent variable in the outcome equation? [closed]
I have included the above-mentioned variable as the dependent variable in the selection equation and as an independent variable in the outcome equation. But in the results, the chosen and most important variable got omitted. It is showing omitted because ...
9
votes
3
answers
1k
views
How to handle bias in 1-5 star ratings?
I was discussing with a friend my bad experience with a health insurance company, and as support for my impression, I pointed out to her that Trustpilot gave it a very low score.
There were 70 reviews,...
2
votes
2
answers
86
views
Definition of selection bias vs confounding bias
I've been learning about causal inference, having read Pearl's Primer and Parts I and II of "What If?".
I was under the impression that the definition of "There is confounding" was
...
3
votes
2
answers
72
views
Limitations of propensity score matching
While studying propensity score matching, I was struck by the following thought:
When we are running a logistic regression model to estimate $p(Z=1∣X)$ through some form of parametrization and we are ...
3
votes
1
answer
71
views
How to tune hyperparameters without sample bias?
While searching for ways to fine-tune the hyperparameters (HP) of my models, I found multiple references to cross-validation techniques (K-folds, LPO, OOB.632+) and ways to select the best ...
0
votes
0
answers
42
views
Is this confounding bias or selection bias or both?
Can confounding and selection bias (biased sampling) be the same?
In epidemiology, selection bias and confounding are often considered two different biases. I wonder if they can be the same in certain ...
9
votes
3
answers
768
views
Is prescreening not detrimental for paid surveys?
Survey sites like Swagbucks often have a prescreening mode in which one is asked questions such as one's annual income or whether one owns a car. It is observed that most of the time if one selects ...
1
vote
0
answers
19
views
Inverse probability weighting in a directed acyclic graph for a binary collider as a selection bias
For a confounder, as in the following figure, it is commonly suggested that inverse probability weighting can remove the path from confounder to exposure so that it removes the backdoor ...
3
votes
1
answer
41
views
Statistical Non-Response and Drop Out
In statistical studies, it is possible that there might be biases:
Some groups of people are more likely to be represented than other groups (e.g. poorer people have difficult ...
0
votes
0
answers
20
views
Positivity Assumption in Propensity Score Methods for Pre- and Post-Treatment [duplicate]
I am designing a research project and could use some guidance. My research question focuses on estimating the effect of a new co-responder policing program on use-of-force and arrests. I want to see ...
0
votes
1
answer
63
views
Conditional-on-positives bias
I am reading the Bad COP section on https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html#bad-cop. I am confused if
$$
E[Y|T = 1] - E[Y|T = 0] = \\
E[Y|Y > 0, T = 1]...
1
vote
0
answers
121
views
Regression Discontinuity Design, staggered treatment allocation
I'm unsure if this complex allocation rule is appropriate for RDD. I will have data for a staggered rollout treatment where there will be about 10 rounds of selection over two years for services (...
1
vote
1
answer
38
views
Bias introduced by removing early censors
Suppose we have right-censored survival data on some population, and want to compare individuals with "good outcome" (who have no event in the first X months) to individuals with "bad ...
0
votes
4
answers
129
views
Small-sample binary logit and linear models - response to referees [closed]
Background: This cross-sectional study collected 30 thrombosis samples. We evaluated the presence or absence of MP components (dependent variable), where 24 cases had MP (coded as 1) and 6 cases did ...
1
vote
1
answer
52
views
Average treatment effect (ATE) estimation via matching method while outcomes of control population are constant
I want to estimate the average effect of a treatment that was given with a selection bias. To do this, I'd like to use a matching method. Basically, this method involves finding, for each treated ...
0
votes
0
answers
37
views
Find correlation from biased observations
I have a set of observations of a variable Z (shown as the colormap) as a function of two other variables A and B. I want to study how Z varies with respect to A, B, and both A and B (e.g. if A ...
2
votes
0
answers
37
views
How to estimate the age of players correctly?
I have the data of players active on a gaming console and the playtime hours corresponding to the games they have played and their age. I want to analyze the top (say 10) games that the people between ...
3
votes
1
answer
68
views
Deriving conditional independence statements for causal graphs with selection nodes
In "basic" causal graphs / DAGs / probabilistic graphical models (PGMs), conditional independence statements can be derived using the d-separation criterion. How does this work if selection ...
1
vote
1
answer
22
views
Choice and endogeneity
An independent variable is endogenous if it is correlated with the error term (source).
In the regression framework, this may happen (only?) in case of omitted variables, simultaneity, or measurement ...
0
votes
0
answers
38
views
Difference in Difference and Selection Into Treatment
Suppose I impose that the true model of some variable of interest is:
$$
Y_{it} = \alpha_i + \beta_t+\tau_{it}D_{it}+\epsilon_{it}
$$
Where $ D_{it} = 1\{E_i \geq t\} $. This is a kind of DID model ...
1
vote
0
answers
271
views
Calculating Inverse Mills Ratio after Probit
I need to compute the Inverse Mills Ratio after the probit command in Stata. From here, I found that predict IMR1, score, will calculate it and store it in IMR1. I ...
5
votes
2
answers
243
views
Correcting for selection bias with standardisation/g-computation
Two sets of methods for correcting for selection bias are g-computation (standardisation) and inverse probability of censoring weighting (IPCW). I'm having a difficult time understanding how to apply ...
0
votes
0
answers
25
views
Right Way to Sample a Validation Set
I am working on a project that uses training data selection techniques; it involves sampling the training set in some smart way rather than sampling randomly. The goal is to compare different data ...
1
vote
1
answer
112
views
Understanding selection bias and endogeneity in marketing
Media mix modelling is concerned with estimating the causal impact of marketing investments, a goal which has several challenges. In general, multiple regression models are deployed mapping up total ...
0
votes
0
answers
36
views
Is "skewing the data" and "skewing the results" just selection bias?
I recall various conversations with biologists, ecologists, and foresters that I neglected to ask for clarification on at the time. It doesn't occur in any of my statistics references.
Sometimes in ...
1
vote
0
answers
22
views
Can I use Shapley values with metadata (i.e. information about observations that I didn't train my model on)?
I'm training a set of models (random forest/XGBoost) for an ordinal regression task. I'm (tentatively) planning to use Shapley values to infer feature performance.
I also have some metadata that my ...
1
vote
0
answers
34
views
How can aggregation be helpful in mitigating bias?
I am working on estimating the impact of exposure to infrastructure (mainly schooling) on the number of children.
Since I do not have migration data, my colleague recommended that I ...
1
vote
1
answer
32
views
Selection Bias in Conflict Studies
A common critique I have heard leveled against conflict studies (research examining the causes, consequences, and solutions to violence such as civil war, terrorism, etc.) is the problem of selection ...
2
votes
0
answers
853
views
How to address selection bias in a diff-in-diff study?
We know that selection bias occurs when the treatment and control groups are not comparable, leading to differences in the outcome that are not solely due to the treatment.
First edit: By selection ...
0
votes
0
answers
70
views
Heckman correction for correlation estimates
Suppose I observe random $y_{i,1}, y_{i,2}$, and I wish to estimate the correlation between them.
However, the $y_{i,j}$ are observed subject to some sample selection criterion.
That is,
there are ...
1
vote
1
answer
39
views
Comparing a multi-dose drug to no drug exposure in a cohort study: Censoring events between doses
I am interested in assessing the association between two doses of a dietary supplement and an event of interest.
The primary exposure is 'two doses of the supplement', and the comparator is 'no ...
0
votes
1
answer
144
views
How to understand random assignment eliminates selection bias in the potential outcomes framework
In Angrist & Pischke's book Mostly Harmless Econometrics, they explain that if the treatment in an RCT $D_i$ is randomly assigned, then $D_i$ is independent of potential outcomes and the following ...
1
vote
0
answers
18
views
Selection bias in postmortem data and creating an artificial earlier study endpoint
I want to analyze postmortem (neuropathology) data from dementia patients who are part of a larger ongoing observational study. At the time of the data freeze (i.e. the time at which I access the data)...
0
votes
1
answer
124
views
Questions Regarding Sampling Bias
I'm taking a course in R: "Data Analysis in R" on Coursera, and I came across this question during the lecture:
A retail store considering updates to their credit card policies randomly ...
2
votes
1
answer
67
views
How to correct for sampling bias in one population when comparing against another
I have two populations that I'd like to compare across certain metrics. However, most members of population A did not respond to our request for data, and those respondents that did are not ...
1
vote
1
answer
51
views
Maximum likelihood of Normal density under selection
Consider the density function given by
$$
\left[\dfrac{\gamma_{\leq0} \mathbb{1}(t \leq 0) + \gamma_{>0} \mathbb{1}(t > 0)}{\gamma_{\leq0}\Phi\left(- \mu / \sigma\right) + \gamma_{>0}\Phi\...
1
vote
0
answers
49
views
Why does a normalizing difference score > 0.25 indicate selection bias which cannot be corrected by regression?
I am reading Propensity Score Analysis (2014) by Guo and Fraser, chapter 1, section 4. Let $\Delta_X$ denote the normalizing difference score of covariate $X$.
"Following Imbens and Wooldridge, a $\Delta_X$ ...
1
vote
0
answers
148
views
Inverse Mills Ratio Interpretation [closed]
What is the interpretation of the inverse Mills ratio in the Heckman selection model? Why do we include it as an explanatory variable in the OLS estimator?
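For readers landing on this question, a minimal sketch of what the inverse Mills ratio measures may help. It assumes the textbook setup with a standard normal selection error $u$ and selection when $u > -a$ for some index value $a$; in that case $E[u \mid u > -a] = \phi(a)/\Phi(a)$, which is the inverse Mills ratio evaluated at $a$. The function names and the simulation settings below are illustrative assumptions, not taken from the question.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index: float) -> float:
    """lambda(index) = phi(index) / Phi(index) for a standard normal error."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def simulated_truncated_mean(index: float, n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[u | u > -index] with u ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_draws))
    # Keep only the units that pass the (hypothetical) selection rule u > -index.
    selected = [u for u in draws if u > -index]
    return sum(selected) / len(selected)


index = 0.5  # hypothetical value of the selection index
print(f"analytic IMR:        {inverse_mills_ratio(index):.4f}")
print(f"simulated E[u | sel]: {simulated_truncated_mean(index):.4f}")
```

The two numbers should agree up to simulation noise. Read this way, the ratio is the expected selection error among selected units, so adding it as a regressor in the second-stage OLS stands in for the part of the outcome error that is correlated with selection.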
1
vote
1
answer
24
views
Control Group Selection Bias
I found a study that compared minor physical anomalies (MPA) between a certain group of patients and a control group to determine if MPAs occur more frequently among these patients compared to the ...
1
vote
1
answer
19
views
Estimating interactions from non-interacting features
Suppose I have a sample $\mathcal{D}=\{(\mathbf{x}^{i}, y^{i})\}_{i=1\dots M}$ of binary variables $\mathbf{X}$ ($N$ of them) and a continuous variable $Y$ that I want to predict based on a linear ...
1
vote
0
answers
99
views
Nested Cross-Validation with Small dataset
I am currently working with a small dataset (only 175 samples, 45 features) and have been reading on the proper way to cross-validate my model. I had started with a basic cross-validation using a grid ...
2
votes
1
answer
315
views
Sampling weights in Cox proportional hazards models
I'd like to use sampling weights in a Cox proportional hazards regression model to address selection due to different response probabilities. I'm calculating the weights as Inverse Probability of ...
4
votes
3
answers
178
views
When can we get unbiased estimate given biased data?
There was a recent "hot take" tweet by Andrej Karpathy (without any comment or clarification from the author):
real-world data distribution is ~N(0,1)
good dataset is ~U(-2,2)
It provoked ...
3
votes
1
answer
364
views
Selection Models of Publication Bias for Multilevel Meta-analyses?
Are there any suitable selection models of publication bias for multilevel meta-analyses?
I am currently conducting a 3-level meta-analysis and trying to incorporate selection models to assess ...
1
vote
1
answer
30
views
Sample Bias in Study
I have the following study statement:
A council wishes to study the digital awareness of its resident senior population (over 65 years), so it questioned in person 50 residents randomly chosen from a ...
1
vote
1
answer
89
views
How to show how biased a statistic is in a non-random sample, knowing the parameter in the general population?
I have a convenience sample, and want to show readers how biased it is relative to the population it's taken from. It's absolutely certain that the sample is biased, and I want to give readers as many ...
1
vote
1
answer
139
views
Is it true that a larger, representative dataset is always better to use than a smaller, representative dataset?
By "representative" I mean that the data in the dataset faithfully reflects the "underlying signal" a model is trying to tap into. Is it always true that, as long as increasing ...
0
votes
1
answer
28
views
Can I ignore these individuals without introducing bias?
I have a population that falls under 10 classes. Each individual may or may not come with a location - 83% overall have locations and a breakdown by class is:
Class
# individuals
# with location
# ...
2
votes
0
answers
68
views
What are the statistical fallacies of illusion of control?
Illusion of control* appears in gambling and events involving randomness. For example, choosing a lottery ticket that carries additional information which the participant has control over choosing, such as ...