Merge two dataframes and keep non-missing entries

Question

I have two dataframes like this:

set.seed(1)
df1 <- data.frame(id= 1:4, sex= c("m", "m", NA, NA), somevar= letters[1:4], whocares_var= rnorm(4))
df2 <- data.frame(id= 1:6, sex= c("m", NA, "m", NA, "m", NA), somevar= NA, morevars= LETTERS[1:6])

I want to merge them. What I currently do is:

df_both <- merge(df1, df2, by= "id", all= TRUE)
df_both

  id sex.x somevar.x whocares_var sex.y somevar.y morevars
1  1     m         a   -0.6264538     m        NA        A
2  2     m         b    0.1836433  <NA>        NA        B
3  3  <NA>         c   -0.8356286     m        NA        C
4  4  <NA>         d    1.5952808  <NA>        NA        D
5  5  <NA>      <NA>           NA     m        NA        E
6  6  <NA>      <NA>           NA  <NA>        NA        F
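For reference, here is a quick check of which columns merge() suffixes. It assumes R >= 4.0, where data.frame() no longer converts character columns to factors. Only non-key columns present in both inputs get the default .x/.y suffixes, so these are exactly the columns that need collapsing.

```r
set.seed(1)
df1 <- data.frame(id = 1:4, sex = c("m", "m", NA, NA), somevar = letters[1:4],
                  whocares_var = rnorm(4))
df2 <- data.frame(id = 1:6, sex = c("m", NA, "m", NA, "m", NA), somevar = NA,
                  morevars = LETTERS[1:6])

# non-key columns that occur in both data frames
shared <- setdiff(intersect(names(df1), names(df2)), "id")
shared  # c("sex", "somevar")

# merge() appends suffixes (default c(".x", ".y")) only to those columns
df_both <- merge(df1, df2, by = "id", all = TRUE)
names(df_both)
```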

I don't want the merged dataframe to have two columns, sex.x and sex.y (and likewise somevar.x and somevar.y). Instead I want a single sex column that contains whichever entry is non-missing. So what I expect to get is:

set.seed(1)
df_wanted <- data.frame(id= 1:6, sex= c("m", "m", "m", NA, "m", NA),
                        somevar= c(letters[1:4], NA, NA),
                        whocares_var= c(rnorm(4), NA, NA),
                        morevars= LETTERS[1:6])
df_wanted
  id  sex somevar whocares_var morevars
1  1    m       a   -0.6264538        A
2  2    m       b    0.1836433        B
3  3    m       c   -0.8356286        C
4  4 <NA>       d    1.5952808        D
5  5    m    <NA>           NA        E
6  6 <NA>    <NA>           NA        F

So the function I am looking for should keep only the non-missing entry whenever both dataframes have a column with the same name. A column that is present in only one of the dataframes should also appear in the final data. How can I achieve that?

Remark: I don't have conflicting entries (i.e. different non-missing values for the same id).
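One way to express this requirement in base R is the sketch below. merge_coalesce() is a hypothetical helper name, not an existing function. It assumes R >= 4.0 (character rather than factor columns) and, as stated in the remark, no conflicting non-missing values. Where both copies are filled, it keeps the value from the first data frame.

```r
set.seed(1)
df1 <- data.frame(id = 1:4, sex = c("m", "m", NA, NA), somevar = letters[1:4],
                  whocares_var = rnorm(4))
df2 <- data.frame(id = 1:6, sex = c("m", NA, "m", NA, "m", NA), somevar = NA,
                  morevars = LETTERS[1:6])

# Hypothetical helper: full outer merge on `by`, then collapse each
# duplicated column pair (col.x / col.y) into a single column `col`.
merge_coalesce <- function(x, y, by) {
  # non-key columns present in both inputs; merge() suffixes these with .x/.y
  shared <- setdiff(intersect(names(x), names(y)), by)
  out <- merge(x, y, by = by, all = TRUE)
  for (col in shared) {
    cx <- paste0(col, ".x")
    cy <- paste0(col, ".y")
    vals <- out[[cx]]
    miss <- is.na(vals)
    vals[miss] <- out[[cy]][miss]  # fill gaps in x's copy from y's copy
    out[[col]] <- vals
    out[[cx]] <- NULL
    out[[cy]] <- NULL
  }
  # restore the input column order: keys, then x's columns, then y's extras
  out[unique(c(by, names(x), names(y)))]
}

result <- merge_coalesce(df1, df2, by = "id")
result
```

For the toy data this should match df_wanted column for column.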

Can you please adapt your toy data? Two similar columns per data frame would be helpful. — Friede, Commented Oct 29 at 13:56

ThomasIsCoding · Accepted Answer · 2024-10-29 14:53:41Z

You could try

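library(dplyr)  # for coalesce()

d <- merge(df1, df2, by = "id", all = TRUE)
# strip the trailing .x/.y suffixes that merge() adds to duplicated names
nms <- sub("\\.[xy]$", "", names(d))
list2DF(
    lapply(
        # group columns by root name, keeping their original order
        split.default(d, nms)[unique(nms)],
        # take the first non-missing value across each group's columns
        \(x) do.call(coalesce, x)
    )
)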

which gives

  id  sex somevar whocares_var morevars
1  1    m       a   -0.6264538        A
2  2    m       b    0.1836433        B
3  3    m       c   -0.8356286        C
4  4 <NA>       d    1.5952808        D
5  5    m    <NA>           NA        E
6  6 <NA>    <NA>           NA        F

Note: coalesce() comes from the dplyr package.
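If you'd rather avoid the dplyr dependency, the same split/list2DF idea works with a small base-R replacement for coalesce(). coalesce_base() below is a hypothetical helper, not dplyr's implementation. It assumes equal-length vectors and simply fills each vector's NAs from the next one. The \(x) shorthand needs R >= 4.1 and list2DF() needs R >= 4.0.

```r
# Hypothetical base-R stand-in for dplyr::coalesce(): returns the first
# non-missing value at each position across equal-length vectors.
coalesce_base <- function(...) {
  Reduce(function(acc, nxt) {
    miss <- is.na(acc)
    acc[miss] <- nxt[miss]  # fill remaining gaps from the next vector
    acc
  }, list(...))
}

set.seed(1)
df1 <- data.frame(id = 1:4, sex = c("m", "m", NA, NA), somevar = letters[1:4],
                  whocares_var = rnorm(4))
df2 <- data.frame(id = 1:6, sex = c("m", NA, "m", NA, "m", NA), somevar = NA,
                  morevars = LETTERS[1:6])

d <- merge(df1, df2, by = "id", all = TRUE)
nms <- sub("\\.[xy]$", "", names(d))  # "sex.x" -> "sex", "id" unchanged
res <- list2DF(lapply(split.default(d, nms)[unique(nms)],
                      \(x) do.call(coalesce_base, x)))
res
```

Anchoring the regex with $ means only a trailing .x/.y is removed, so a name such as a hypothetical my.xvar is left alone.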

G. Grothendieck · Accepted Answer · 2024-10-29 15:37:46Z

With the expanded example in the question, this answer has been revised. Merge the data frames, convert the result to a list, split the columns into groups that share the same root name, combine the columns in each group with pmax (dropping NAs unless all values are NA), and finally convert back to a data frame. No packages are used other than tools, which ships with R, so nothing has to be installed.

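library(tools)  # for file_path_sans_ext()

df1 |>
  merge(df2, by = "id", all = TRUE) |>
  as.list() |>
  list(xx = _) |>  # name the list so with() can refer to it (R >= 4.2)
  # group columns by root name: "sex.x" and "sex.y" both map to "sex"
  with(split(xx, file_path_sans_ext(names(xx)))) |>
  # within each group keep the non-NA value (NA only if all are NA)
  lapply(\(cols) do.call("pmax", c(cols, na.rm = TRUE))) |>
  as.data.frame()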

giving

  id morevars  sex somevar whocares_var
1  1        A    m       a   -0.6264538
2  2        B    m       b    0.1836433
3  3        C    m       c   -0.8356286
4  4        D <NA>       d    1.5952808
5  5        E    m    <NA>           NA
6  6        F <NA>    <NA>           NA
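The two building blocks of this answer can be checked in isolation. In the sketch below, the column names and the sex.x/sex.y values are copied from the merged output in the question. file_path_sans_ext() maps sex.x and sex.y to the same root, and pmax(..., na.rm = TRUE) returns NA only where every input is NA. Two caveats follow from how these functions work. First, a column whose own name contains a dot (my.var below is a hypothetical name) is truncated as well. Second, if the two copies ever disagree, pmax() silently keeps the larger value, which is why the no-conflict remark in the question matters.

```r
library(tools)  # file_path_sans_ext() ships with R's tools package

# column names as produced by merge(df1, df2, by = "id", all = TRUE)
merged_names <- c("id", "sex.x", "somevar.x", "whocares_var",
                  "sex.y", "somevar.y", "morevars")
# dropping the ".x"/".y" "extension" gives each column's root name
roots <- file_path_sans_ext(merged_names)
roots

# sex.x and sex.y from the merged data: per row the non-NA value is kept,
# and the result is NA only where both inputs are NA
combined <- pmax(c("m", "m", NA, NA, NA, NA),
                 c("m", NA, "m", NA, "m", NA), na.rm = TRUE)
combined

# caveats: a conflict is resolved silently in favour of the larger value,
# and a name that merely contains a dot is truncated too
pmax("a", "b", na.rm = TRUE)   # "b"
file_path_sans_ext("my.var")   # "my" (my.var is a hypothetical column name)
```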

Have revised answer in light of the expanded example. — G. Grothendieck, Commented Oct 29 at 15:43

Tags: r, dataframe, merge, missing-data
