r data.table excluding rows with certain value in a column removes NAs too

Time:01-28

I came across this unexpected behaviour in data.table. Rows with NA in a column are removed when I exclude rows with a certain value in that column, as in this example:

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ cyl != 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 0

Created on 2022-01-27 by the reprex package (v2.0.1)

Excluding the rows with %in% instead, as in this example,

library(data.table)

dt_mtcars <- setDT(copy(mtcars))

set.seed(42)
na_rows <- runif(3, min = 1, max = nrow(mtcars))

dt_mtcars[ na_rows, cyl := NA]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

dt_mtcars <- dt_mtcars[ !cyl %in% 4]

dt_mtcars[ is.na(cyl), .N]
#> [1] 3

does give the expected result. Am I wrong to expect the same result in the first example? Or is this a bug in data.table?

Answer:

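This isn't a bug in data.table. Comparing NA with any value gives NA rather than TRUE or FALSE, and data.table treats an NA in a logical i as FALSE.

In the first case the condition is NA for those rows, so they are not selected: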

NA != 4
[1] NA

In the second case %in% returns FALSE for NA (it never returns NA), so the negation is TRUE and the rows are kept:

!NA %in% 4
[1] TRUE
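
If you want to keep the != comparison and still retain the NA rows, one option is to state that explicitly in the condition. The following is only a minimal sketch on a small made-up table (the names dt, id, dropped, kept and kept_in are illustrative and not from the question):

```r
library(data.table)

# Toy table for illustration only (not the mtcars data from the question)
dt <- data.table(id = 1:5, cyl = c(4, 6, NA, 8, 4))

# cyl != 4 is NA for the row where cyl is NA,
# and data.table treats NA in a logical i as FALSE, so that row is dropped
dropped <- dt[cyl != 4]

# Keep the NA rows by asking for them explicitly
kept <- dt[cyl != 4 | is.na(cyl)]

# %in% never returns NA, so negating it keeps the NA rows as well
kept_in <- dt[!cyl %in% 4]

print(dropped$id)
print(kept$id)
print(kept_in$id)
```

Here dropped contains only the rows with cyl 6 and 8, while kept and kept_in also contain the row where cyl is NA. Which form to use is mostly a matter of taste: the is.na() version makes the intent visible, the %in% version is shorter.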
