I have two dataframes, df1 of 19600 rows and df2 of 18830 rows, df2$id finds correspondence in the df1$id, while the remaining 770 have no correspondence, I would like to merge the two dataframes creating df3 with column df3$id equal to df1$id , if the row matches df2 it gives me values df2$value, otherwise it gives me 0, so in the 770 unmatched places I will have 0 as value
CodePudding user response:
Let's try to recreate your problem with a toy data set:
df1 <- data.frame(id = 1:5, values = 1:5)
df2 <- data.frame(id = 3:7, values = 6:10)
df1
#> id values
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 5
df2
#> id values
#> 1 3 6
#> 2 4 7
#> 3 5 8
#> 4 6 9
#> 5 7 10
Then the naive way to achieve this in base R would be:
df3 <- df1
df3$values <- df2$values[match(df3$id, df2$id)]
df3$values[is.na(df3$values)] <- 0
Resulting in:
df3
#> id values
#> 1 1 0
#> 2 2 0
#> 3 3 6
#> 4 4 7
#> 5 5 8
You will see that id 1 and 2 appear in df1 but not df2. The final result has 0 for the values column in these rows, but where there is a match, the correct values from df2 are written in the corresponding position.
Created on 2022-02-05 by the reprex package (v2.0.1)
CodePudding user response:
Using data.table():
library(data.table)
DT1 = data.table(ID = c(1:15),
Value = c(3:10))
DT2 = data.table(ID = c(1:10),
Value = c(7:13))
Join the tables and then recode the variable:
DT3 = DT2[DT1, on = .(ID)][, .(ID,Value)]
DT3[!ID %in% DT2$ID, Value := 0]
tail(DT3)
Output
ID Value
1: 10 9
2: 11 0
3: 12 0
4: 13 0
5: 14 0
6: 15 0
