Filtering by multiple columns at once in `dplyr`

Here is some sample data:

library(tidyverse)

data <- matrix(runif(20), ncol = 4) 
colnames(data) <- c("mt100", "cp001", "cp002", "cp003")
data <- as_tibble(data)

The real data set has many more columns, and many of them start with "cp". In dplyr I can select all of these columns:

data %>%
  select(starts_with("cp"))

Is there a way to use starts_with() (or a similar function) to filter by multiple columns without having to write them all out explicitly? I'm thinking of something like this:

data %>%
  filter(starts_with("cp") > 0.2)

Thanks!

CodePudding user response：

As Anil points out in the comments, we could use if_all() or if_any(). They were introduced in dplyr 1.0.4, as described in this blog post:

https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

if_any() and if_all()

"across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any()."

if_all:

data %>% 
  filter(if_all(starts_with("cp"), ~ . > 0.2))

  mt100 cp001 cp002 cp003
  <dbl> <dbl> <dbl> <dbl>
1 0.688 0.402 0.467 0.646
2 0.663 0.757 0.728 0.335
3 0.472 0.533 0.717 0.638

if_any:

data %>% 
  filter(if_any(starts_with("cp"), ~ . > 0.2))

  mt100 cp001   cp002 cp003
  <dbl> <dbl>   <dbl> <dbl>
1 0.554 0.970 0.874   0.187
2 0.688 0.402 0.467   0.646
3 0.658 0.850 0.00813 0.542
4 0.663 0.757 0.728   0.335
5 0.472 0.533 0.717   0.638
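Because the sample data is random, the difference between the two can be easier to see on a small hand-written tibble. The following is a minimal sketch; the `toy` data and its values are made up for illustration and are not from the question. if_all() combines the per-column conditions with "and", if_any() combines them with "or":

```r
library(dplyr)

# Hypothetical toy data, chosen by hand so the result is easy to check
toy <- tibble(
  mt100 = c(0.9, 0.1, 0.5),
  cp001 = c(0.3, 0.1, 0.5),
  cp002 = c(0.4, 0.1, 0.1)
)

# Keep rows where every "cp" column exceeds 0.2 (only the first row)
all_rows <- toy %>%
  filter(if_all(starts_with("cp"), ~ .x > 0.2))

# Keep rows where at least one "cp" column exceeds 0.2 (first and third rows)
any_rows <- toy %>%
  filter(if_any(starts_with("cp"), ~ .x > 0.2))

# The same filters with the columns written out explicitly
all_manual <- toy %>% filter(cp001 > 0.2 & cp002 > 0.2)
any_manual <- toy %>% filter(cp001 > 0.2 | cp002 > 0.2)
```

Here all_rows matches all_manual and any_rows matches any_manual, which is the explicit version the question wanted to avoid typing.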

CodePudding user response：

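You can use dplyr::across() along with a purrr-style anonymous function. Inside filter(), across() keeps only the rows where the condition holds for all selected columns, so it behaves like if_all(). Note that newer versions of dplyr deprecate across() inside filter() in favor of if_all() and if_any():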

data %>%
  filter(across(starts_with("cp"), ~ . > .2))
# # A tibble: 3 × 4
#   mt100 cp001 cp002 cp003
#   <dbl> <dbl> <dbl> <dbl>
# 1 0.628 0.604 0.802 0.501
# 2 0.744 0.283 0.702 0.493
# 3 0.279 0.372 0.975 0.751

(Note that without a set.seed(), our results will differ due to the RNG.)
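Following up on the set.seed() note, here is a sketch of how the sample data from the question could be made reproducible. The seed value 42 is an arbitrary assumption; any fixed value works, and the rows that survive the filter depend on the seed chosen:

```r
library(dplyr)

set.seed(42)  # arbitrary seed (an assumption), fixes the random numbers below

# Same construction as in the question
data <- matrix(runif(20), ncol = 4)
colnames(data) <- c("mt100", "cp001", "cp002", "cp003")
data <- as_tibble(data)

# Rows where all "cp" columns exceed 0.2
kept <- data %>%
  filter(if_all(starts_with("cp"), ~ .x > 0.2))
```

Rerunning the whole block, including the set.seed() call, should give the same data and the same kept rows each time.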