I have been trying to filter any missing values in R using dplyr but I don't seem to get it right. Any lead would be grateful.
library(dplyr)
library(tibble)
set.seed(2021)
testdata <- tibble(
a1 = c(10, 12, NA, 10, 13),
a2 = c("Test", "Test1", "Test 2", NA, NA),
a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
a4 = rbinom(5, 10, 0.5)
)
testdata2 <- testdata %>%
filter(across(.cols = everything(), ~any_vars(is.na(.))))
CodePudding user response:
If you are using dplyr to do this you can use the functions if_all/if_any to do this.
To select rows with at least one missing value -
library(dplyr)
testdata %>% filter(if_any(.fns = is.na))
# a1 a2 a3 a4
# <dbl> <chr> <chr> <int>
#1 10 Test NA 5
#2 NA Test 2 Test 2 6
#3 10 NA NA 5
#4 13 NA Test 4 6
To select rows with no missing values -
testdata %>% filter(if_all(.fns = Negate(is.na)))
# a1 a2 a3 a4
# <dbl> <chr> <chr> <int>
#1 12 Test1 Test 10 6
CodePudding user response:
To remove any rows that have an NA value you'll need to edit your code slightly, to include a negation (i.e. filter for the rows that return a FALSE when you ask if they contain missing values).
I also used .cols = contains("a") to show you a way of using tidy select when you don't want to include every column.
testdata %>%
filter(across(.cols = contains("a"), ~!is.na(.)))
CodePudding user response:
library(tidyverse)
set.seed(2021)
testdata <- tibble(
a1 = c(10, 12, NA, 10, 13),
a2 = c("Test", "Test1", "Test 2", NA, NA),
a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
a4 = rbinom(5, 10, 0.5)
)
testdata %>%
filter(rowSums(across(.cols = everything(), .fns = is.na)) > 0)
#> # A tibble: 4 x 4
#> a1 a2 a3 a4
#> <dbl> <chr> <chr> <int>
#> 1 10 Test <NA> 5
#> 2 NA Test 2 Test 2 6
#> 3 10 <NA> <NA> 5
#> 4 13 <NA> Test 4 6
Created on 2022-01-25 by the reprex package (v2.0.1)
CodePudding user response:
For the sake of completeness, here's a base R concise solution:
testdata[apply(testdata, 1, \(x) any(is.na(x))),]
a1 a2 a3 a4
1 10 Test NA 5
2 NA Test 2 Test 2 6
3 10 NA NA 5
4 13 NA Test 4 6
If you wish to select rows with no NAs, I'd suggest drop_na:
tidyr::drop_na(testdata)
a1 a2 a3 a4
1 12 Test1 Test 10 6
