Home > database >  create new database from two dataframes, with different rows
create new database from two dataframes, with different rows

Time:02-06

I have two dataframes, df1 of 19600 rows and df2 of 18830 rows, df2$id finds correspondence in the df1$id, while the remaining 770 have no correspondence, I would like to merge the two dataframes creating df3 with column df3$id equal to df1$id , if the row matches df2 it gives me values ​​df2$value, otherwise it gives me 0, so in the 770 unmatched places I will have 0 as value

CodePudding user response:

Let's try to recreate your problem with a toy data set:

df1 <- data.frame(id = 1:5, values = 1:5)
df2 <- data.frame(id = 3:7, values = 6:10)

df1
#>   id values
#> 1  1      1
#> 2  2      2
#> 3  3      3
#> 4  4      4
#> 5  5      5

df2
#>   id values
#> 1  3      6
#> 2  4      7
#> 3  5      8
#> 4  6      9
#> 5  7     10

Then the naive way to achieve this in base R would be:

df3        <- df1
df3$values <- df2$values[match(df3$id, df2$id)]
df3$values[is.na(df3$values)] <- 0

Resulting in:

df3
#>   id values
#> 1  1      0
#> 2  2      0
#> 3  3      6
#> 4  4      7
#> 5  5      8

You will see that id 1 and 2 appear in df1 but not df2. The final result has 0 for the values column in these rows, but where there is a match, the correct values from df2 are written in the corresponding position.

Created on 2022-02-05 by the reprex package (v2.0.1)

CodePudding user response:

Using data.table():

library(data.table)

DT1 =  data.table(ID = c(1:15),
                  Value = c(3:10))

DT2 = data.table(ID = c(1:10),
                 Value = c(7:13))

Join the tables and then recode the variable:

DT3 = DT2[DT1, on = .(ID)][, .(ID,Value)]

DT3[!ID %in% DT2$ID, Value := 0] 

tail(DT3)

Output

 ID Value
1: 10     9
2: 11     0
3: 12     0
4: 13     0
5: 14     0
6: 15     0
  •  Tags:  
  • Related