Advanced Handling of Missing Data: One-Day Workshop
Missing Data
One-day Workshop
Nicole Janz
Goals
Steps in the research process
1. Identify patterns of missingness for each variable
2. Why are data missing? Could this bias your sample?
Proportions of missingness per variable in a table
variable nmiss n propmiss
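A table with these columns can be sketched in a few lines. The example below is a minimal Python illustration, not the workshop's own lab code (the lab uses R). The toy data and the helper name `missingness_table` are hypothetical, and it assumes `n` is the total number of rows and `propmiss = nmiss / n`.

```python
# Hypothetical toy data: None marks a missing value.
data = {
    "gdp_pc":    [1200.0, None, 950.0, 3100.0, None],
    "democracy": [1, 0, None, 1, 1],
    "trade":     [55.2, 60.1, 48.7, 71.0, 66.3],
}

def missingness_table(columns):
    """Return one row per variable: (variable, nmiss, n, propmiss)."""
    rows = []
    for name, values in columns.items():
        n = len(values)                          # assumed: n counts all rows
        nmiss = sum(v is None for v in values)   # number of missing values
        rows.append((name, nmiss, n, nmiss / n if n else 0.0))
    # Put the variables with the largest share of missing values first.
    return sorted(rows, key=lambda r: r[3], reverse=True)

table = missingness_table(data)
print(f"{'variable':<12}{'nmiss':>6}{'n':>4}{'propmiss':>10}")
for name, nmiss, n, prop in table:
    print(f"{name:<12}{nmiss:>6}{n:>4}{prop:>10.2f}")
```

Sorting by `propmiss` makes the most incomplete variables easy to spot before deciding how to handle them.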
Proportions of missingness per variable in a graph
[Figure: proportion of missingness per variable. Variables shown: Petrol/GDP, Mining/GDP, Other FDI/GDP, Deposit./GDP, Finance/GDP, US FDI/GDP, Wh.Trade/GDP, Food/GDP, Chemical/GDP, Metal/GDP, Transp./GDP, Machinery/GDP, Mosley Law, Mosley Prac., Mosley Labor, Electr./GDP, PTS, Democracy, CIRI Women, CIRI Phys., CIRI Emp., CIRI Worker, Trade, GDP p. capita, Population, Conflict, Fariss, Life exp., Inf.mort.]
Heatmap per country-year and variable
[Figure: heatmap of missingness by country-year and variable; yellow = missing]
Why are my data missing?
Due to social/natural processes
• school graduation, dropout, death
• a country no longer exists, e.g. the GDR
• statistics office reclassified variables
• intentional non-disclosure
Respondent refusal
• income
Why are my data missing?
variable nmiss n propmiss
Three types of missingness
MCAR: Missing Completely at Random
Missingness in y depends neither on x nor on y itself. The probability of
missingness is the same for all units.
What to do:
If data are missing completely at random, then throwing out
cases with missing data does not bias your inferences
MAR: Missing at Random
If sex, race, education, and age are recorded for all the
people in the survey, then “earnings” is MAR if the
probability of nonresponse depends only on these variables.
If men are more likely to tell you their weight than women,
and we record gender, then weight is MAR.
What to do?
Some say listwise deletion is fine, but only if the regression
controls for all variables that affect the probability of missingness.
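The difference between MCAR and MAR can be illustrated with a small simulation of the weight example. This is a sketch under stated assumptions, not part of the original lab: the population values, the response probabilities, and the helper names are all hypothetical. Gender is always recorded, and the chance that weight is reported depends only on gender.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical population: gender is always recorded, weight may go missing.
people = []
for _ in range(20000):
    male = random.random() < 0.5
    weight = random.gauss(82 if male else 68, 10)
    people.append((male, weight))

def observe(people, p_male, p_female):
    """Keep each weight with a probability that depends only on gender."""
    return [(m, w if random.random() < (p_male if m else p_female) else None)
            for m, w in people]

def complete_case_mean(sample):
    """Mean weight after dropping every case with a missing weight."""
    return mean([w for _, w in sample if w is not None])

def gender_adjusted_mean(sample):
    """Within-gender means, weighted by each gender's share of the full sample."""
    total = 0.0
    for g in (True, False):
        share = sum(m == g for m, _ in sample) / len(sample)
        total += share * mean([w for m, w in sample if m == g and w is not None])
    return total

true_mean = mean([w for _, w in people])
mcar = observe(people, 0.7, 0.7)   # same response probability for everyone
mar = observe(people, 0.9, 0.5)    # men respond more often than women

print("true mean:            ", round(true_mean, 2))
print("MCAR, complete cases: ", round(complete_case_mean(mcar), 2))
print("MAR, complete cases:  ", round(complete_case_mean(mar), 2))
print("MAR, gender-adjusted: ", round(gender_adjusted_mean(mar), 2))
```

Under MCAR the complete-case mean stays close to the true mean. Under MAR it drifts towards the group that responds more often, and conditioning on the recorded variable (here gender) removes most of that drift.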
MNAR: Missing not at Random
(non-ignorable missingness)
What to do?
Most problematic case. Potential lurking variables are often
unobserved.
How to distinguish between MNAR and MAR?
Think about your variables and use your substantive scientific
knowledge of the data and your field.
How to distinguish between MAR and MCAR?
Again, think about the data. Some indication (but no definitive
answer) can be gained from two tests: Little's MCAR test and dummy-coding
missingness.
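One common check of this kind is to dummy-code missingness: create an indicator for whether y is missing and compare a fully observed variable x across the two groups. The sketch below computes a Welch t statistic by hand. The toy data and the function name `missingness_t_stat` are hypothetical, and the code is an illustration rather than the workshop's lab code.

```python
from math import sqrt
from statistics import mean, variance

def missingness_t_stat(x, y):
    """Welch t statistic for x, split by whether y is missing (None)."""
    x_miss = [xi for xi, yi in zip(x, y) if yi is None]
    x_obs = [xi for xi, yi in zip(x, y) if yi is not None]
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(x_miss) / len(x_miss) + variance(x_obs) / len(x_obs))
    return (mean(x_miss) - mean(x_obs)) / se

# Hypothetical toy data: income is missing mostly for older respondents.
age =    [23, 25, 31, 34, 38, 45, 52, 58, 61, 67]
income = [21, 24, 30, None, 35, 41, None, None, None, None]

t = missingness_t_stat(age, income)
print("t statistic:", round(t, 2))
```

A large absolute t value suggests that missingness in y is related to x, which speaks against MCAR. It cannot tell MAR apart from MNAR, because that would require the unobserved values of y.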
Mean imputation
The easiest way to impute is to replace each NA with the mean of the
observed values.
• distorts distribution for this variable, e.g. underestimates sd
• ignores changes over time
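The effect on the standard deviation is easy to see in a small example. This is a minimal Python sketch with hypothetical values, in which `None` plays the role of NA.

```python
from statistics import mean, stdev

def mean_impute(values):
    """Replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical series with three missing values.
x = [2.0, None, 4.0, 6.0, None, 8.0, 10.0, None]
imputed = mean_impute(x)

sd_observed = stdev([v for v in x if v is not None])
sd_imputed = stdev(imputed)

print("sd of observed values:  ", round(sd_observed, 2))
print("sd after mean imputation:", round(sd_imputed, 2))
```

The mean is unchanged, but every imputed value sits exactly at the mean, so the spread of the completed variable shrinks.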
Multiple Imputation Techniques
Multiple imputation (MI) is also based on the idea of using
predicted values, but it builds in mechanisms to incorporate
uncertainty about the predicted values.
The observed values remain the same, but the imputed values
vary across the 5 imputed data sets, reflecting uncertainty.
Figure: https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf
For details on the expectation maximization (EM) algorithm, see King et al. (2001).
Combination of results
Run each analysis (e.g. regression) on all 5 imputed data
sets.
See King et al. (2001).
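The per-data-set results are commonly pooled with Rubin's rules: the point estimate is the average across imputations, and its variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The sketch below assumes that rule. The coefficients, standard errors, and the function name `combine_estimates` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def combine_estimates(estimates, std_errors):
    """Pool one coefficient across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)

# Hypothetical coefficients and standard errors from 5 imputed data sets.
betas = [0.50, 0.54, 0.47, 0.52, 0.49]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]

beta, se = combine_estimates(betas, ses)
print("pooled estimate:", round(beta, 3))
print("pooled std. error:", round(se, 3))
```

The pooled standard error is never smaller than the average within-imputation standard error, which is how the uncertainty about the imputed values enters the final result.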
Multiple Imputation Software
STATA mi estimate
Social Sciences Research
Methods Centre
Lab
Summarizing and Visualizing Missingness in R
% of missingness per variable and subsets of variables
Graphical display
MCAR patterns?
Ad-hoc measures in R
1) Listwise deletion, pairwise deletion
3) Mean imputation
Example 1
Adapted from Schlomer et al. (2010)
Step 1:
Calculate M, SD, B, and SE with DV0Miss
Step 2:
Create the target data set with DV20MAR
Describe missing patterns
• Summarize and visualize missingness
• Little's (1988) MCAR test
• Dummy code missingness
Ad-hoc methods
• Delete listwise or pairwise
• Carry last value forward
• Substitute with mean
• Recode manually
• Predict from regression
Multiple imputation
• Amelia II
Multiple Imputation with Amelia II
Reproducibility
Dear _____,
I'm a PhD student at Cambridge University, and I work on foreign
investment and labor standards. I read your _____ with great interest. I
was wondering if you could make the imputation R code available
to me? I am asking this because I am using Amelia as well, and I
would like to try and replicate your imputation with the same
specifications.
• Include variables in the form they enter the model (lags, logs,
leads, transformations).
• Check diagnostics
Literature and tutorials
Amelia mailing list
https://lists.gking.harvard.edu/mailman/listinfo/amelia
James Honaker and Gary King. What to do About Missing Values in Time
Series Cross-Section Data, American Journal of Political Science, Vol. 54,
No. 2 (April, 2010): Pp. 561-581.
Gary King, James Honaker, Anne Joseph, and Kenneth Scheve. Analyzing
Incomplete Political Science Data: An Alternative Algorithm for Multiple
Imputation, American Political Science Review, Vol. 95, No. 1 (March,
2001): Pp. 49-69.
Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and
Multilevel/Hierarchical Models, Chapter 25: Missing-data imputation.
Cambridge University Press, Cambridge (2006).
Enders, Craig. 2010. Applied Missing Data Analysis. Guilford Press: New
York.
Little, Roderick J., Donald Rubin. 2002. Statistical Analysis with Missing
Data. John Wiley & Sons, Inc: Hoboken.
Schafer, Joseph L., John W. Graham. 2002. “Missing Data: Our View of the
State of the Art.” Psychological Methods.
Thank you !
Nicole Janz
www.nicolejanz.de