Propensity Score Matching (PSM) using MatchThem

Ask Question

Asked 20 days ago

Modified 20 days ago

Viewed 24 times

I am conducting a non-randomized study to assess differences in complication rates between two treatment groups. Before matching, the observed complication rates differ significantly between the two groups (22% vs. 50%).

To account for potential confounders and ensure the consistency of these results, I decided to perform propensity score matching (PSM) using the Matchthem package. First, I identified potential confounders based on clinical considerations and constructed a Directed Acyclic Graph (DAG). I then applied various matching models to determine which performed the best (see Table for comparisons).

Model	Distance	Balanced Variables (<0.1)	Unbalanced Variables (>0.1)	Greatest Imbalance	Treated ESS	Control ESS	Unmatched Treated	Unmatched Control
Nearest Logit	0.0278	11	4	min_distance_gallblad_mm (0.1833)	29.89	70.4	4	6
Nearest Logit (Ratio=1, Replace=False)	0.1326	8	6	min_distance_gallblad_mm (0.2641)	42.6	42.6	10.4	34.4
Nearest Mahalanobis	NA	7	5	Segments_treated_per_session (0.4141)	32.74	77	7	7
Optimal	0.46	9	5	Metastasis_size_at_treatment_mm (0.3666)	53	53	0	24
Full Matching	0.0496	8	7	min_distance_gallblad_mm (0.3046)	19.5	77	NA	NA
Genetic	1.3338	5	9	Metastasis_size_at_treatment_mm (0.5037)	53	53	0	24

Below is the code for the first model, which performed the best:

# Perform matching in each imputed dataset
m.out_model1 <- matchthem(Treatment_type_tumor_near_gallbladder ~ age + sex + bmi + ASA + comorbidities + Systemic_treatment_prior_local_treatment + Segments_treated_per_session + Metastasis_size_at_treatment_mm + min_distance_gallblad_mm + stage_binary, 
data = new_amcore_imputed, 
method = "nearest", 
distance = "logit",
replace = TRUE,
ratio = 3,
caliper = 0.2)

Output:

Balance summary across all imputations
                                                 Type Max.Diff.Adj
distance                                     Distance       0.0278
age                                           Contin.       0.0558
sex_Male                                       Binary       0.1080
bmi                                           Contin.       0.1040
ASA_1                                          Binary       0.0094
ASA_2                                          Binary       0.0423
ASA_3                                          Binary       0.0329
comorbidities_0                                Binary       0.0381
comorbidities_1                                Binary       0.0452
comorbidities_2                                Binary       0.0329
Systemic_treatment_prior_local_treatment_Yes   Binary       0.0440
Segments_treated_per_session                  Contin.       0.1328
Metastasis_size_at_treatment_mm               Contin.       0.0931
min_distance_gallblad_mm                      Contin.       0.1833
stage_binary_III-IV                            Binary       0.0493
                                                    M.Threshold
distance                                         Balanced, <0.1
age                                              Balanced, <0.1
sex_Male                                     Not Balanced, >0.1
bmi                                          Not Balanced, >0.1
ASA_1                                            Balanced, <0.1
ASA_2                                            Balanced, <0.1
ASA_3                                            Balanced, <0.1
comorbidities_0                                  Balanced, <0.1
comorbidities_1                                  Balanced, <0.1
comorbidities_2                                  Balanced, <0.1
Systemic_treatment_prior_local_treatment_Yes     Balanced, <0.1
Segments_treated_per_session                 Not Balanced, >0.1
Metastasis_size_at_treatment_mm                  Balanced, <0.1
min_distance_gallblad_mm                     Not Balanced, >0.1
stage_binary_III-IV                              Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1        11
Not Balanced, >0.1     4

Variable with the greatest mean difference
                 Variable Max.Diff.Adj        M.Threshold
 min_distance_gallblad_mm       0.1833 Not Balanced, >0.1

Average sample sizes across imputations
                         0  1
All                  53.   77
Matched (ESS)        31.17 71
Matched (Unweighted) 49.   71
Unmatched             4.    6

Question 1: Can I rely on the results even if not all variables are balanced? If not, what alternative approach is appropriate?

Afterward, I performed a multivariable logistic regression using the matched dataset. Following guidance from Noah Greifer in another post (thanks!), I computed the Average Treatment Effect on the Treated (ATT) using the marginaleffects::avg_comparisons function.

# Perform a logistic regression on the matched data
results_all <- with(m.out_model1, svyglm(complications ~ Treatment_type_tumor_near_gallbladder + age + sex + bmi +
ASA + comorbidities + Systemic_treatment_prior_local_treatment + Segments_treated_per_session +
Metastasis_size_at_treatment_mm + min_distance_gallblad_mm + stage_binary,
family = quasibinomial()), cluster = FALSE)

# Compute the ATT on RISK RATIO SCALE
ATT_all <- lapply(results_all$analyses, function(fit) {
marginaleffects::avg_comparisons(fit, variables = c("Treatment_type_tumor_near_gallbladder", "age", "sex",
"bmi", "ASA", "comorbidities", "Systemic_treatment_prior_local_treatment", "Segments_treated_per_session",
"Metastasis_size_at_treatment_mm", "min_distance_gallblad_mm", "stage_binary"),
newdata = subset(Treatment_type_tumor_near_gallbladder == "MWA"),
comparison = "lnratioavg") # This returns the ATT on the risk ratio scale. To get the ATT on the odds ratio scale, change comparison to "lnoravg". To get the ATT on the risk difference scale, remove comparison and exponentiate.
})

# Pool the results
ATT_all |> mice::pool() |> summary(exponentiate = TRUE, conf.int = TRUE) |> mutate(across(where(is.numeric),
round, digits = 3))

Question 2: In the code above, I included all the variables used in the PSM to perform the multivariable logistic regression. Is this a correct approach, or should I have included only the unbalanced variables?

Thank you in advance.

asked Nov 25 at 22:53

Ludovico Ambrosi

473 bronze badges

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Stack Exchange Network

Propensity Score Matching (PSM) using MatchThem

0

Your Answer

Browse other questions tagged
r
logistic
propensity-scores
or ask your own question.

Hot Network Questions

Propensity Score Matching (PSM) using MatchThem

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged rlogisticpropensity-scores or ask your own question.

Related

Hot Network Questions

Browse other questions tagged
r
logistic
propensity-scores
or ask your own question.