Home > Blockchain >  How to filter rows in a datatable based on another datatable?
How to filter rows in a datatable based on another datatable?

Time:02-03

I want to filter a datatable(d1) for all rows sharing the same id as another datatable(d2).

d1 <- data.table(id = c('123','132','142', '153', '143'),
                 x1 = c('a','a','b','e','r'),
                 x2 = c('d','e','f','z','o'))
    
d2 <- data.table(id = c('123','132','142', '143', '165', '153', '143'),
                   y1= c('ABCDE', 'CDEFE', 'EFSAD', 'FEASD', 'SDSDS', 'VDCDS', 'DSXSZ' )

Expected output:

   id    y1
1: 123  ABCDE
2: 132  CDEFE
3: 142  EFSAD
4: 153  VDCDS
5: 143  DSXSZ

I have tried the simple code:

d2[id %in% d1$id]

But R gets stuck when running the code and the stop sign to stop the code does not show up and ESC when in the console does not work. I need to restart R to stop the code.

I know how to do this in dplyr

d2 %>%
 group_by(id) %>%
  filter(lopnr %in% d1$lopnr)

but i need to do it in data.table since I am dealing with large dataframes and it takes too long time in dplyr.

CodePudding user response:

newdf <- merge(d2,d1, all = FALSE)

This will still bring in the other columns but you can simply drop these after

CodePudding user response:

You can use this approach:

library(data.table)

d2[d1, on = .(id), nomatch = NULL][, c("id", "y1")]

Output

    id    y1
1: 123 ABCDE
2: 132 CDEFE
3: 142 EFSAD
4: 153 VDCDS
5: 143 FEASD
6: 143 DSXSZ
  •  Tags:  
  • Related