Home > Enterprise >  R merge not duplicating columns even when some row differ between both dataframes
R merge not duplicating columns even when some row differ between both dataframes

Time:01-19

I am trying to merge 2 dataframes which only differs in a row and a column:

df1
          factor  nb  pass
   A        f      1   Yes
   B        c      3   No
   C        a      3   Yes

df2
         factor  quality pass
   A        f       100     Yes
   B        c       30      No
   D        x       60      Yes

After consulting this post, I've tried the following:

merge(df1, df2, by='row.names')

But this only returns the common observations between both dataframes, and it also duplicates common columns:

df3
       factor.x   nb.x   pass.x   factor.y   quality.y  pass.y
A         f        1       Yes      f          100       Yes
B         c        3       No       c          30        No

I've also tried:

merge(df1, df2)

But it does not work as the common variant is located in rownames.

Finally, I've tried:

merge(df1, df2, by='row.names', all =T, no.dup= T)

In this case, uncommon observations are also included in the merged dataframe, but all variants are duplicated.

Do you know how can I get this dataframe?

     factor   nb   pass   quality
A      f       1    Yes    100
B      c       3    No     30
C      a       3    Yes    NA
D      x       NA   Yes    60

Thanks!

CodePudding user response:

library(dplyr)
full_join(df1, df2)

Joining, by = c("id", "factor", "pass")
  id factor nb pass quality
1  A      f  1  Yes     100
2  B      c  3   No      30
3  C      a  3  Yes      NA
4  D      x NA  Yes      60

CodePudding user response:

R is not very good with row names, if you want to keep the row names you need to first add them as a column and then merge by all the common columns

merge(
  cbind("rows"=rownames(df1),df1),
  cbind("rows"=rownames(df2),df2),
  all=T
)

  rows factor pass nb quality
1    A      f  Yes  1     100
2    B      c   No  3      30
3    C      a  Yes  3      NA
4    D      x  Yes NA      60

alternatively

merge(df1,df2,by=c("row.names","factor","pass"),all=T)
  •  Tags:  
  • Related