I am trying to merge 2 dataframes which only differs in a row and a column:
df1
factor nb pass
A f 1 Yes
B c 3 No
C a 3 Yes
df2
factor quality pass
A f 100 Yes
B c 30 No
D x 60 Yes
After consulting this post, I've tried the following:
merge(df1, df2, by='row.names')
But this only returns the common observations between both dataframes, and it also duplicates common columns:
df3
factor.x nb.x pass.x factor.y quality.y pass.y
A f 1 Yes f 100 Yes
B c 3 No c 30 No
I've also tried:
merge(df1, df2)
But it does not work as the common variant is located in rownames.
Finally, I've tried:
merge(df1, df2, by='row.names', all =T, no.dup= T)
In this case, uncommon observations are also included in the merged dataframe, but all variants are duplicated.
Do you know how can I get this dataframe?
factor nb pass quality
A f 1 Yes 100
B c 3 No 30
C a 3 Yes NA
D x NA Yes 60
Thanks!
CodePudding user response:
library(dplyr)
full_join(df1, df2)
Joining, by = c("id", "factor", "pass")
id factor nb pass quality
1 A f 1 Yes 100
2 B c 3 No 30
3 C a 3 Yes NA
4 D x NA Yes 60
CodePudding user response:
R is not very good with row names, if you want to keep the row names you need to first add them as a column and then merge by all the common columns
merge(
cbind("rows"=rownames(df1),df1),
cbind("rows"=rownames(df2),df2),
all=T
)
rows factor pass nb quality
1 A f Yes 1 100
2 B c No 3 30
3 C a Yes 3 NA
4 D x Yes NA 60
alternatively
merge(df1,df2,by=c("row.names","factor","pass"),all=T)
