Here is some sample data:
library(tidyverse)
data <- matrix(runif(20), ncol = 4)
colnames(data) <- c("mt100", "cp001", "cp002", "cp003")
data <- as_tibble(data)
The real data set has many more columns, but the point stands: there are many columns that all start with "cp". In dplyr I can select all of these columns with:
data %>%
select(starts_with("cp"))
Is there a way to use starts_with() (or a similar function) to filter on multiple columns without having to write them all out explicitly? I'm thinking of something like this:
data %>%
filter(starts_with("cp") > 0.2)
Thanks!
Answer:
We could use if_all() or if_any(), as Anil points out in the comments. Both were introduced in dplyr 1.0.4:
https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/
if_any() and if_all()
"across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any()."
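To make the quoted point concrete: each condition produces one logical vector per selected column, and the two helpers differ only in how those vectors are combined into a single row mask. The base-R sketch below rebuilds both masks by hand with Reduce(), assuming if_all() corresponds to combining with `&` and if_any() to combining with `|`. The seed (42) and the variable names are illustrative choices, not part of the original answer.

```r
library(dplyr)

set.seed(42)  # illustrative seed, only so the sketch is reproducible
data <- as_tibble(matrix(runif(20), ncol = 4,
                         dimnames = list(NULL, c("mt100", "cp001", "cp002", "cp003"))))

# One logical vector per "cp" column: does each value exceed 0.2?
cp_tests <- lapply(select(data, starts_with("cp")), function(x) x > 0.2)

# if_all()-style mask: combine the vectors with & (every cp column must pass)
all_mask <- Reduce(`&`, cp_tests)
# if_any()-style mask: combine them with | (at least one cp column must pass)
any_mask <- Reduce(`|`, cp_tests)

# Subset the rows by hand using each combined mask
manual_all <- data[all_mask, ]
manual_any <- data[any_mask, ]
```

Comparing manual_all with `filter(data, if_all(starts_with("cp"), ~ .x > 0.2))` (and manual_any with the if_any() version) should give the same rows.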
if_all (keep rows where every "cp" column is greater than 0.2):
data %>%
filter(if_all(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.688 0.402 0.467 0.646
2 0.663 0.757 0.728 0.335
3 0.472 0.533 0.717 0.638
if_any (keep rows where at least one "cp" column is greater than 0.2):
data %>%
filter(if_any(starts_with("cp"), ~ . > 0.2))
mt100 cp001 cp002 cp003
<dbl> <dbl> <dbl> <dbl>
1 0.554 0.970 0.874 0.187
2 0.688 0.402 0.467 0.646
3 0.658 0.850 0.00813 0.542
4 0.663 0.757 0.728 0.335
5 0.472 0.533 0.717 0.638
Answer:
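You can use dplyr::across() along with a purrr-style anonymous function. Inside filter(), the per-column results are combined with AND, so this keeps the same rows as if_all(). Note that dplyr 1.0.8 and later deprecate using across() in filter() in favour of if_any() and if_all():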
data %>%
filter(across(starts_with("cp"), ~ . > .2))
# # A tibble: 3 × 4
# mt100 cp001 cp002 cp003
# <dbl> <dbl> <dbl> <dbl>
# 1 0.628 0.604 0.802 0.501
# 2 0.744 0.283 0.702 0.493
# 3 0.279 0.372 0.975 0.751
(Note that without a set.seed(), our results will differ due to the RNG.)
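A minimal reproducible sketch: calling set.seed() before runif() fixes the sample data, so everyone running it gets the same filtered rows. The helper make_data() and the seed 2021 are hypothetical choices for illustration. It uses if_all(), which keeps the rows where every "cp" column exceeds 0.2.

```r
library(dplyr)

# Hypothetical helper: rebuilds the question's sample data from a fixed seed
make_data <- function(seed) {
  set.seed(seed)  # fixing the RNG state makes runif() return the same values
  data <- matrix(runif(20), ncol = 4)
  colnames(data) <- c("mt100", "cp001", "cp002", "cp003")
  as_tibble(data)
}

data <- make_data(2021)  # 2021 is an arbitrary seed
result <- data %>%
  filter(if_all(starts_with("cp"), ~ .x > 0.2))  # every cp column must exceed 0.2
result
```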
