Filtering any missing values in R

I have been trying to filter rows with any missing values in R using dplyr, but I can't seem to get it right. Any pointers would be appreciated.

library(dplyr)
library(tibble)

set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

testdata2 <- testdata %>% 
  filter(across(.cols = everything(), ~any_vars(is.na(.))))

CodePudding user response：

With dplyr 1.0.4 or later, you can use if_any() and if_all() inside filter() for this.

To select rows with at least one missing value -

library(dplyr)

testdata %>% filter(if_any(.fns = is.na))

#    a1 a2     a3        a4
#  <dbl> <chr>  <chr>  <int>
#1    10 Test   NA         5
#2    NA Test 2 Test 2     6
#3    10 NA     NA         5
#4    13 NA     Test 4     6

To select rows with no missing values -

testdata %>% filter(if_all(.fns = Negate(is.na)))

#     a1 a2    a3         a4
#  <dbl> <chr> <chr>   <int>
#1    12 Test1 Test 10     6
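
As a quick sanity check on the two filters above, here is a minimal sketch showing that they split the rows into two disjoint groups, and that if_any() can be restricted to a subset of columns. The names with_na, without_na, and na_in_text are illustrative and not from the question; the sketch assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(tibble)

set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

# Rows with at least one NA, and rows with none (illustrative names)
with_na    <- testdata %>% filter(if_any(everything(), is.na))
without_na <- testdata %>% filter(if_all(everything(), ~ !is.na(.x)))

# Together the two results account for every row of the input
nrow(with_na) + nrow(without_na) == nrow(testdata)

# Restrict the test to chosen columns: keep rows with an NA in a2 or a3 only
na_in_text <- testdata %>% filter(if_any(c(a2, a3), is.na))
```

The row where only a1 is missing is kept by the first filter but dropped by the last one, because a1 is not among the columns being tested.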

CodePudding user response：

To remove any rows that have an NA value, you'll need to negate the test, i.e. keep only the rows where is.na() returns FALSE in every column. Note that using across() directly inside filter() is deprecated as of dplyr 1.0.8, so this version uses if_all() instead.

I also used .cols = contains("a") to show you a way of using tidy select when you don't want to include every column.

testdata %>%
  filter(if_all(.cols = contains("a"), .fns = ~ !is.na(.x)))

CodePudding user response：

library(tidyverse)
set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

testdata %>% 
  filter(rowSums(across(.cols = everything(), .fns = is.na)) > 0)
#> # A tibble: 4 x 4
#>      a1 a2     a3        a4
#>   <dbl> <chr>  <chr>  <int>
#> 1    10 Test   <NA>       5
#> 2    NA Test 2 Test 2     6
#> 3    10 <NA>   <NA>       5
#> 4    13 <NA>   Test 4     6

Created on 2022-01-25 by the reprex package (v2.0.1)
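
To make the filter condition above easier to follow, here is a small sketch of what it computes: across() with is.na yields one logical column per input column, and rowSums() counts the TRUE values in each row. The column name n_missing and the variable counted are illustrative and not part of the original answer.

```r
library(dplyr)
library(tibble)

set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

# Store the per-row NA count in a column instead of filtering on it directly
counted <- testdata %>%
  mutate(n_missing = rowSums(across(.cols = everything(), .fns = is.na)))

# Rows where n_missing > 0 are exactly the rows the filter keeps
counted %>% filter(n_missing > 0)
```

Keeping the count as a column also makes it easy to filter on other thresholds, for example rows with two or more missing values.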

CodePudding user response：

For the sake of completeness, here's a concise base R solution (the \(x) lambda syntax requires R 4.1.0 or later):

testdata[apply(testdata, 1, \(x) any(is.na(x))),]

     a1 a2     a3        a4
1    10 Test   NA         5
2    NA Test 2 Test 2     6
3    10 NA     NA         5
4    13 NA     Test 4     6

If you wish to select rows with no NAs, I'd suggest drop_na:

tidyr::drop_na(testdata)

     a1 a2    a3         a4
1    12 Test1 Test 10     6
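
As a vectorised alternative to the apply() call, here is a minimal base R sketch using complete.cases(), which returns TRUE for rows with no missing values. It builds testdata as a plain data.frame to stay within base R, and the names is_complete, rows_with_na, and rows_without_na are illustrative.

```r
set.seed(2021)
testdata <- data.frame(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

# TRUE for rows that contain no missing values in any column
is_complete <- complete.cases(testdata)

# Rows with at least one NA, and rows with none
rows_with_na    <- testdata[!is_complete, ]
rows_without_na <- testdata[is_complete, ]
```

Unlike apply(), this avoids coercing each row to a character vector before testing for NA.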