
Questions tagged [selection-bias]

Bias introduced by non-random selection of observations, such that the sample is not representative of the underlying population.

0 votes
0 answers
8 views

Selection Correction Method with the Variance of the Truncated Normal Distribution

Consider the following data generating process: $$Y=\beta_0+\beta_1X_1+\beta_2X_2+\beta_3u+\varepsilon,$$ $$D=1\left[\gamma_0+\gamma_1Z+\gamma_2X_1+\gamma_3X_2+u>0\right],$$ where $D=1$ if the unit ...
MinChul Park
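A minimal sketch of the truncated-normal moments behind this kind of correction, assuming $u \sim N(0,1)$ independent of $\varepsilon$ (the excerpt does not state the distribution). Writing $w=\gamma_0+\gamma_1Z+\gamma_2X_1+\gamma_3X_2$, selection means $u>-w$, so $E[u\mid D=1]=\lambda(w)=\phi(w)/\Phi(w)$ and $\operatorname{Var}(u\mid D=1)=1-\lambda(w)\,(\lambda(w)+w)$. The first moment gives $E[Y\mid D=1]=\beta_0+\beta_1X_1+\beta_2X_2+\beta_3\lambda(w)$; the second implies that the error in that regression has variance $\beta_3^2\operatorname{Var}(u\mid D=1)+\sigma_\varepsilon^2$, which changes with $w$. The value `w = 0.3` and the function names are illustrative only.

```python
import numpy as np
from scipy.stats import norm


def inverse_mills(w):
    """lambda(w) = phi(w) / Phi(w), computed on the log scale for stability."""
    return np.exp(norm.logpdf(w) - norm.logcdf(w))


def truncated_moments(w):
    """Mean and variance of u ~ N(0, 1) given u > -w (i.e. given D = 1)."""
    lam = inverse_mills(w)
    mean = lam
    var = 1.0 - lam * (lam + w)
    return mean, var


# Monte Carlo check at one illustrative value of the selection index w.
rng = np.random.default_rng(0)
w = 0.3
u = rng.standard_normal(1_000_000)
selected = u[u > -w]                      # keep only units with D = 1
mc_mean, mc_var = selected.mean(), selected.var()
an_mean, an_var = truncated_moments(w)
print(f"mean: MC {mc_mean:.4f} vs formula {an_mean:.4f}")
print(f"var:  MC {mc_var:.4f} vs formula {an_var:.4f}")
```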
0 votes
0 answers
10 views

In a Heckman model, can I use the dependent variable of the selection equation as an independent variable in the outcome equation? [closed]

I have included the above-mentioned variable as the dependent variable in the selection equation and as an independent variable in the outcome equation. But in the results, the chosen and most important variable got omitted. It is showing as omitted because ...
Tej (reputation 1)
9 votes
3 answers
1k views

How to handle bias in 1-5 star ratings?

I was discussing with a friend my bad experience with a health insurance company, and as support for my impression, I pointed out to her that trustpilot gave it a very low score. There were 70 reviews,...
user6376297
2 votes
2 answers
86 views

Definition of selection bias vs confounding bias

I've been learning about causal inference, having read Pearl's Primer and Parts I and II of "What If?". I was under the impression that the definition of "There is confounding" was ...
ThighCrush
3 votes
2 answers
72 views

Limitations of propensity score matching

While studying propensity score matching, I was struck by the following thought: When we are running a logistic regression model to estimate $p(Z=1 \mid X)$ through some form of parametrization and we are ...
richardjoseph
3 votes
1 answer
71 views

How to Tune Hyperparameters without Sample Bias?

While searching for ways to fine-tune the hyperparameters (HP) of my models, I found multiple references to cross-validation techniques (k-fold, LPO, OOB.632+) and ways to select the best ...
Linces games
0 votes
0 answers
42 views

Is this confounding bias or selection bias or both?

Can confounding and selection bias (biased sampling) be the same? In epidemiology, selection bias and confounding are often considered as two different biases. I wonder if they can be the same in certain ...
Vincent (reputation 431)
9 votes
3 answers
768 views

Is prescreening not detrimental for paid surveys?

Survey sites like Swagbucks often have a prescreening mode in which one is asked questions such as one's annual income or whether one owns a car. It is observed that most of the time, if one selects ...
Splendid Digital Solutions
1 vote
0 answers
19 views

Inverse probability weighting in a directed acyclic graph for a binary collider as a source of selection bias

For a confounder, as in the following figure, it is commonly suggested that inverse probability weighting can remove the path from the confounder to the exposure, so that it removes the backdoor ...
Elong Chen
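A hedged simulation of the collider case, under assumptions that are ours rather than the poster's: a binary exposure `A` and a binary covariate `L` are independent, `L` affects the outcome, and both raise the probability of being selected (`S` is a collider). Restricting to `S = 1` opens a path between `A` and `L`; weighting each selected unit by $1/P(S=1\mid A,L)$ (taken as known here; in practice it would be estimated) restores the original distribution of `L` within each exposure arm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Hypothetical DGP: A and L independent, L affects Y, and S is a collider A -> S <- L.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.5, n)
Y = 1.0 * A + 2.0 * L + rng.standard_normal(n)   # true effect of A on Y is 1.0

p_select = 0.2 + 0.3 * A + 0.4 * L                # P(S = 1 | A, L), assumed known
S = rng.random(n) < p_select


def mean_diff(y, a, weights):
    """Weighted (Hajek) difference in means between A = 1 and A = 0."""
    m1 = np.average(y[a == 1], weights=weights[a == 1])
    m0 = np.average(y[a == 0], weights=weights[a == 0])
    return m1 - m0


y_s, a_s = Y[S], A[S]
naive = mean_diff(y_s, a_s, np.ones(S.sum()))     # selected sample, unweighted
ipw = mean_diff(y_s, a_s, 1.0 / p_select[S])      # inverse probability of selection
print(f"naive among selected: {naive:.3f}")       # biased; about 0.79 in theory
print(f"IPW among selected:   {ipw:.3f}")         # close to the true 1.0
```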
3 votes
1 answer
41 views

Statistical Non-Response and Drop Out

In statistical studies, there might be biases: some groups of people are more likely to be represented than other groups (e.g. poorer people have difficult ...
0 votes
0 answers
20 views

Positivity Assumption in Propensity Score Methods for Pre- and Post-Treatment [duplicate]

I am designing a research project and could use some guidance. My research question focuses on estimating the effect of a new co-responder policing program on use-of-force and arrests. I want to see ...
galaxy-friday1017
0 votes
1 answer
63 views

conditional-on-positives bias

I am reading the Bad COP section on https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html#bad-cop. I am confused if $$ E[Y|T = 1] - E[Y|T = 0] = \\ E[Y|Y > 0, T = 1]...
Anonny (reputation 143)
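A small simulation, under hypothetical choices of ours ($Y=\max(0,\,-0.5+T+e)$ with $e\sim N(0,1)$ and randomized $T$), suggests the two contrasts are not equal in general: randomization balances the full sample, but conditioning on $Y>0$ compares units that would be positive anyway with units that treatment pushed above zero. The "in theory" values in the comments come from the normal formulas $E[\max(0,c+e)]=c\,\Phi(c)+\phi(c)$ and $E[c+e\mid c+e>0]=c+\phi(c)/\Phi(c)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical setup: T is randomized, Y is censored at zero (e.g. spending).
T = rng.binomial(1, 0.5, n)
latent = -0.5 + 1.0 * T + rng.standard_normal(n)
Y = np.maximum(latent, 0.0)

# Difference in means over everyone: valid under randomization.
ate = Y[T == 1].mean() - Y[T == 0].mean()

# Difference in means among positives only ("conditional on positives").
pos = Y > 0
cop = Y[pos & (T == 1)].mean() - Y[pos & (T == 0)].mean()

print(f"E[Y|T=1] - E[Y|T=0]         = {ate:.3f}")  # about 0.50 in theory
print(f"E[Y|Y>0,T=1] - E[Y|Y>0,T=0] = {cop:.3f}")  # about 0.37 in theory
```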
1 vote
0 answers
121 views

Regression Discontinuity Design, staggered treatment allocation

I'm unsure if this complex allocation rule is appropriate for RDD. I will have data for a staggered rollout treatment where there will be about 10 rounds of selection over two years for services (...
dcoy (reputation 372)
1 vote
1 answer
38 views

Bias introduced by removing early censors

Suppose we have right-censored survival data on some population, and want to compare individuals with "good outcome" (who have no event in the first X months) to individuals with "bad ...
Nuclear Hoagie
0 votes
4 answers
129 views

Small-sample binary logit and linear models - response to referees [closed]

Background: This cross-sectional study collected 30 thrombosis samples. We evaluated the presence or absence of MP components (dependent variable), where 24 cases had MP (coded as 1) and 6 cases did ...
zhiheng yi
1 vote
1 answer
52 views

Average treatment effect (ATE) estimation via a matching method when the outcomes of the control population are constant

I want to estimate the average effect of a treatment that was assigned with selection bias. To do this, I'd like to use a matching method. Basically, this method involves finding, for each treated ...
HnbBarca
0 votes
0 answers
37 views

Find correlation from biased observations

I have a set of observations of a variable Z (shown as the colormap) as a function of two other variables A and B. I want to study how Z varies with respect to A, B, and both A and B (e.g., if A ...
Euryproktos
2 votes
0 answers
37 views

How to estimate the age of players correctly?

I have the data of players active on a gaming console and the playtime hours corresponding to the games they have played and their age. I want to analyze the top (say 10) games that the people between ...
Ritik P. Nayak
3 votes
1 answer
68 views

Deriving conditional independence statements for causal graphs with selection nodes

In "basic" causal graphs / DAGs / probabilistic graphical models (PGMs), conditional independence statements can be derived using the d-separation criterion. How does this work if selection ...
Eike P. (reputation 3,098)
1 vote
1 answer
22 views

Choice and endogeneity

An independent variable is endogenous if it is correlated with the error term (source). In the regression framework, this may happen (only?) in case of omitted variables, simultaneity, or measurement ...
robertspierre
0 votes
0 answers
38 views

Difference in Difference and Selection Into Treatment

Suppose I impose that the true model of some variable of interest is: $$ Y_{it} = \alpha_i + \beta_t+\tau_{it}D_{it}+\epsilon_{it} $$ Where $ D_{it} = 1\{E_i \geq t\} $. This is a kind of DID model ...
DarkenExcalibur
1 vote
0 answers
271 views

Calculating Inverse Mills Ratio after Probit

I need to compute the Inverse Mills Ratio after the probit command in Stata. From here, I found that predict IMR1, score, will calculate it and store it in IMR1. I ...
user917983
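A note and sketch, hedged: as we understand Stata's documentation, `predict ..., score` after `probit` returns the equation-level score $\partial \ln L_i/\partial(x_i'\beta)$. That equals the inverse Mills ratio $\phi(x_i'\beta)/\Phi(x_i'\beta)$ only for observations with outcome 1; for outcome 0 it is $-\phi(x_i'\beta)/(1-\Phi(x_i'\beta))$. A common alternative is `predict xb, xb` followed by `generate imr = normalden(xb)/normal(xb)`. The Python below (not Stata) only checks the algebra on made-up index values.

```python
import numpy as np
from scipy.stats import norm


def inverse_mills_ratio(xb):
    """IMR used in the Heckman two-step: phi(xb) / Phi(xb)."""
    return norm.pdf(xb) / norm.cdf(xb)


def probit_score(xb, d):
    """d log L_i / d(xb) for a probit observation with outcome d in {0, 1}."""
    return np.where(d == 1,
                    norm.pdf(xb) / norm.cdf(xb),
                    -norm.pdf(xb) / norm.sf(xb))   # sf(x) = 1 - Phi(x)


xb = np.array([-1.0, 0.0, 1.5])   # illustrative linear-index values
d = np.array([1, 0, 1])           # illustrative selection outcomes
imr = inverse_mills_ratio(xb)
score = probit_score(xb, d)
print(np.round(imr, 4))
print(np.round(score, 4))
# The two agree only where d == 1.
```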
5 votes
2 answers
243 views

Correcting for selection bias with standardisation/g-computation

Two sets of methods for correcting for selection bias are g-computation (standardisation) and inverse probability of censoring weighting (IPCW). I'm having a difficult time understanding how to apply ...
Lachlan (reputation 1,182)
0 votes
0 answers
25 views

Right Way to Sample a Validation Set

I am working on a project that uses training data selection techniques; it involves sampling the training set in some smart way rather than sampling randomly. The goal is to compare different data ...
Mr.Robot (reputation 247)
1 vote
1 answer
112 views

Understanding selection bias and endogeneity in marketing

Media mix modelling is concerned with estimating the causal impact of marketing investments, a goal that poses several challenges. In general, multiple regression models are deployed mapping up total ...
kurt eriksson
0 votes
0 answers
36 views

Is "skewing the data" and "skewing the results" just selection bias?

I recall various conversations with biologists, ecologists, and foresters that I neglected to ask for clarification on at the time. It doesn't occur in any of my statistics references. Sometimes in ...
Galen (reputation 9,680)
1 vote
0 answers
22 views

Can I use Shapley values with metadata (i.e. information about observations that I didn't train my model on)?

I'm training a set of models (random forest/XGBoost) for an ordinal regression task. I'm (tentatively) planning to use Shapley values to infer feature performance. I also have some metadata that my ...
Neil (reputation 66)
1 vote
0 answers
34 views

How can aggregation be helpful in mitigating bias?

I am working on an estimation of the impact of exposure to infrastructure (mainly schooling) on the number of children. Since I do not have migration data, my colleague recommended that I ...
Yendao Su
1 vote
1 answer
32 views

Selection Bias in Conflict Studies

A common critique I have heard levied against conflict studies (research examining the causes, consequences, and solutions to violence such as civil war, terrorism, etc.) is the problem of selection ...
Brian Lookabaugh
2 votes
0 answers
853 views

How to address selection bias in a diff-in-diff study?

We know that selection bias occurs when the treatment and control groups are not comparable, leading to differences in the outcome that are not solely due to the treatment. First edit: By selection ...
funcard (reputation 61)
0 votes
0 answers
70 views

Heckman correction for correlation estimates

Suppose I observe random $y_{i,1}, y_{i,2}$, and I wish to estimate the correlation between them. However, the $y_{i,j}$ are observed subject to some sample selection criterion. That is, there are ...
shabbychef
1 vote
1 answer
39 views

Comparing a multi-dose drug to no drug exposure in a cohort study: Censoring events between doses

I am interested in assessing the association of two doses of a dietary supplement with an event of interest. The primary exposure is 'two doses of the supplement', and the comparator is 'no ...
user3qpu (reputation 109)
0 votes
1 answer
144 views

How to understand that random assignment eliminates selection bias in the potential outcomes framework

In Angrist & Pischke's book Mostly Harmless Econometrics, they explain that if the treatment in an RCT $D_i$ is randomly assigned, then $D_i$ is independent of potential outcomes and the following ...
Tomas R (reputation 177)
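A minimal simulation of the decomposition discussed there, $E[Y_i\mid D_i=1]-E[Y_i\mid D_i=0]=E[Y_{1i}-Y_{0i}\mid D_i=1]+\big(E[Y_{0i}\mid D_i=1]-E[Y_{0i}\mid D_i=0]\big)$, with made-up potential outcomes and a constant effect of 2. The identity holds exactly in any sample; what random assignment changes is the last term, because $D_i$ independent of $Y_{0i}$ makes the two conditional means of $Y_{0i}$ equal in expectation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Hypothetical potential outcomes with a constant treatment effect of 2.
y0 = rng.normal(10.0, 3.0, n)
y1 = y0 + 2.0


def decompose(d):
    """Observed difference = ATT + selection bias (Angrist & Pischke)."""
    y = np.where(d == 1, y1, y0)          # only one potential outcome is observed
    observed = y[d == 1].mean() - y[d == 0].mean()
    att = (y1 - y0)[d == 1].mean()
    selection_bias = y0[d == 1].mean() - y0[d == 0].mean()
    return observed, att, selection_bias


# Self-selection: units with high y0 are more likely to take the treatment.
d_self = (y0 + rng.normal(0.0, 3.0, n) > 10.0).astype(int)
# Random assignment: a coin flip independent of (y0, y1).
d_rand = rng.binomial(1, 0.5, n)

for name, d in [("self-selected", d_self), ("randomized", d_rand)]:
    obs, att, sb = decompose(d)
    print(f"{name:>13}: observed {obs:.3f} = ATT {att:.3f} + bias {sb:.3f}")
```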
1 vote
0 answers
18 views

Selection bias in postmortem data and creating an artificial earlier study endpoint

I want to analyze postmortem (neuropathology) data from dementia patients who are part of a larger ongoing observational study. At the time of the data freeze (i.e. the time at which I access the data)...
AnnaC (reputation 11)
0 votes
1 answer
124 views

Questions Regarding Sampling Bias

I'm taking a course in R: "Data Analysis in R" on Coursera, and I came across this question during the lecture: A retail store considering updates to their credit card policies randomly ...
JackJackAttack0214
2 votes
1 answer
67 views

How to correct for sampling bias in one population when comparing against another

I have two populations that I'd like to compare across certain metrics. However, most members of population A did not respond to our request for data, and those respondents that did are not ...
mdrishan (reputation 207)
1 vote
1 answer
51 views

Maximum likelihood of Normal density under selection

Consider the density function given by $$ \left[\dfrac{\gamma_{\leq0} \mathbb{1}(t \leq 0) + \gamma_{>0} \mathbb{1}(t > 0)}{\gamma_{\leq0}\Phi\left(- \mu / \sigma\right) + \gamma_{>0}\Phi\...
Student_718
1 vote
0 answers
49 views

Why does a normalizing difference score > 0.25 indicate selection bias that cannot be corrected by regression?

I am reading Propensity Score Analysis (2014) by Guo and Fraser, chapter 1, section 4. Denote by $\Delta_X$ the normalizing difference score of covariate $X$. "Following Imbens and Wooldridge, a $\Delta_X$ ...
user45765 (reputation 1,465)
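For reference, a sketch of the statistic itself, assuming the definition in Imbens and Wooldridge, $\Delta_X=(\bar X_1-\bar X_0)/\sqrt{S_1^2+S_0^2}$ (some texts divide the sum of variances by 2). The usual argument for the 0.25 cutoff, as we understand it, is that with larger imbalance a regression adjustment leans on extrapolation from its functional form rather than on comparable units, so its estimates become sensitive to the specification.

```python
import numpy as np


def normalized_difference(x_treated, x_control):
    """Delta_X = (mean_1 - mean_0) / sqrt(s_1^2 + s_0^2).

    Assumes the Imbens-Wooldridge definition (no division by 2 inside
    the square root); some texts use sqrt((s_1^2 + s_0^2) / 2) instead.
    """
    x1 = np.asarray(x_treated, dtype=float)
    x0 = np.asarray(x_control, dtype=float)
    return (x1.mean() - x0.mean()) / np.sqrt(x1.var(ddof=1) + x0.var(ddof=1))


delta = normalized_difference([1, 2, 3, 4, 5], [0, 1, 2, 3, 4])
print(round(delta, 4))      # 0.4472, i.e. 1 / sqrt(5)
print(abs(delta) > 0.25)    # True: above the 0.25 rule of thumb
```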
1 vote
0 answers
148 views

Inverse Mills Ratio Interpretation [closed]

What is the interpretation of the inverse Mills ratio in the Heckman selection model? Why are we including it as an explanatory variable in the OLS estimator?
Shivam Saboo
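A standard derivation, sketched under the usual bivariate-normal assumption of the Heckman model (the notation here is ours): the inverse Mills ratio is, up to the factor $\rho\sigma$, the conditional mean of the outcome error among selected units. Leaving it out of the OLS regression on the selected sample is therefore an omitted-variable problem, and its coefficient estimates $\rho\sigma$.

$$y_i = x_i'\beta + \varepsilon_i, \qquad d_i = \mathbb{1}\left[z_i'\gamma + u_i > 0\right], \qquad \begin{pmatrix}\varepsilon_i \\ u_i\end{pmatrix} \sim N\!\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}\sigma^2 & \rho\sigma \\ \rho\sigma & 1\end{pmatrix}\right)$$

$$E[y_i \mid x_i, z_i, d_i = 1] = x_i'\beta + E[\varepsilon_i \mid u_i > -z_i'\gamma] = x_i'\beta + \rho\sigma\,E[u_i \mid u_i > -z_i'\gamma] = x_i'\beta + \rho\sigma\,\frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}$$

In the two-step estimator, $\gamma$ comes from a probit of $d_i$ on $z_i$, and $\hat\lambda_i=\phi(z_i'\hat\gamma)/\Phi(z_i'\hat\gamma)$ enters the outcome regression; with $\sigma>0$, testing its coefficient against zero is a test of $\rho=0$, i.e. of no selection on unobservables.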
1 vote
1 answer
24 views

Control Group Selection Bias

I found a study that compared minor physical anomalies (MPAs) between a certain group of patients and a control group to determine if MPAs occur more frequently among these patients compared to the ...
Kim (reputation 11)
1 vote
1 answer
19 views

Estimating interactions from non-interacting features

Suppose I have a sample $\mathcal{D}=\{(\mathbf{x}^{i}, y^{i})\}_{i=1\dots M}$ of binary variables $\mathbf{X}$ ($N$ of them) and a continuous variable $Y$ that I want to predict based on a linear ...
Sergio (reputation 336)
1 vote
0 answers
99 views

Nested Cross-Validation with Small dataset

I am currently working with a small dataset (only 175 samples, 45 features) and have been reading on the proper way to cross-validate my model. I had started with a basic cross-validation using a grid ...
Fritos121
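A minimal nested cross-validation sketch with scikit-learn, using synthetic data of the same shape (175 samples, 45 features) as a stand-in; the model, grid, and fold counts are our assumptions. The inner `GridSearchCV` picks the hyperparameter; the outer `cross_val_score` evaluates the entire tuning procedure on folds it never tuned on, which keeps the reported score from inheriting the optimism of the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the dataset described (175 x 45).
X, y = make_classification(n_samples=175, n_features=45, n_informative=5,
                           random_state=0)

# Inner loop: chooses C. Scaling sits inside the pipeline so it is refit per fold.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner,
)

# Outer loop: scores the whole tuning procedure on folds it never saw.
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(search, X, y, cv=outer)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```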
2 votes
1 answer
315 views

Sampling weights in Cox proportional hazards models

I'd like to use sampling weights in a Cox proportional hazards regression model to address selection due to different response probabilities. I'm calculating the weights as Inverse Probability of ...
r_epi (reputation 31)
4 votes
3 answers
178 views

When can we get unbiased estimate given biased data?

There was a recent "hot take" tweet by Andrej Karpathy (without any comment or clarification from the author): real-world data distribution is ~N(0,1); good dataset is ~U(-2,2). It provoked ...
Tim (reputation 141k)
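One narrow statistical reading of the question, sketched with illustrative choices of ours: if data were drawn from $q=U(-2,2)$ but expectations are wanted under $p=N(0,1)$, importance weights $p(x)/q(x)$ correct the estimate only on the support of $q$. The mass of $p$ outside $[-2,2]$ is never observed, so no reweighting of these samples can recover it without extra assumptions; this is the overlap (positivity) condition in another guise.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 1_000_000

# Data collected from q = U(-2, 2), but we want expectations under p = N(0, 1).
x = rng.uniform(-2.0, 2.0, n)
w = norm.pdf(x) / 0.25                 # importance weights p(x) / q(x); q(x) = 1/4

# Aims at E_p[X^2] = 1, but q puts no mass outside [-2, 2], so this
# estimator can only converge to E_p[X^2 * 1(|X| <= 2)].
est_second_moment = np.mean(w * x**2)

# Exact value via the antiderivative of x^2 phi(x), which is Phi(x) - x phi(x).
lo, hi = -2.0, 2.0
restricted_truth = (norm.cdf(hi) - hi * norm.pdf(hi)) - (norm.cdf(lo) - lo * norm.pdf(lo))

print(f"weighted estimate:          {est_second_moment:.4f}")
print(f"E_p[X^2 1(|X|<=2)] (exact): {restricted_truth:.4f}")
print("E_p[X^2] (target):          1.0000")
```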
3 votes
1 answer
364 views

Selection Models of Publication Bias for Multilevel Meta-analyses?

Are there any suitable selection models of publication bias for multilevel meta-analyses? I am currently conducting a 3-level meta-analysis and trying to incorporate selection models to assess ...
makie (reputation 73)
1 vote
1 answer
30 views

Sample Bias in Study

I have the following study statement: A council wishes to study the digital awareness of its resident senior population (over 65 years), so it questioned in person 50 residents randomly chosen from a ...
Snoke (reputation 23)
1 vote
1 answer
89 views

How to show how biased a statistic is in a non-random sample, knowing the parameter in the general population?

I have a convenience sample, and want to show readers how biased it is relative to the population it's taken from. It's absolutely certain that the sample is biased, and I want to give readers as many ...
J-J-J (reputation 5,873)
1 vote
1 answer
139 views

Is it true that a larger, representative dataset is always better to use than a smaller, representative dataset?

By "representative" I mean that the data in the dataset faithfully reflects the "underlying signal" a model is trying to tap into. Is it always true that, as long as increasing ...
sangstar (reputation 131)
0 votes
1 answer
28 views

Can I ignore these individuals without introducing bias?

I have a population that falls under 10 classes. Each individual may or may not come with a location - 83% overall have locations and a breakdown by class is: Class # individuals # with location # ...
Chris Browne
2 votes
0 answers
68 views

What are the statistical fallacies of illusion of control?

The illusion of control* appears in gambling and in events involving randomness. For example, choosing a lottery ticket that carries additional information the participant can control, such as ...
patagonicus (reputation 2,630)