I have a dataframe and a list. The list contains the data I need to filter in the dataframe. How can I automate the filtering process when I don't know the variables in the list?
Some sample data:
df <- data.frame(V1 = sample(1:2, 10, replace = TRUE),
                 V2 = sample(c("A", "B", "C"), 10, replace = TRUE),
                 V3 = sample(100:104, 10, replace = TRUE))
The list, f_list, is created in another part of the application and eventually passed on to the function that needs to do the filtering. For example, sometimes the list contains V1 and V3:
f_list <- list()
f_list$V1 <- c("2")
f_list$V3 <- c("101","103","104")
Other times it contains V1 and V2:
f_list <- list()
f_list$V1 <- c("1")
f_list$V2 <- c("A","B")
And so on; the real data has hundreds of variables. How can I automate the filtering process, which would look something like this when the variables are known?
df %>%
  filter(V1 %in% f_list$V1,
         V3 %in% f_list$V3)
How do I construct the loop?
EDITED
I edited the name of the object, from ls to f_list per @I_0's reminder that objects should not have names of existing functions. Thanks for the help everyone.
CodePudding user response:
You could use if_all and cur_column:
library(dplyr)
df %>%
  filter(
    (if_all(
      .cols = names(f_list),
      .fns = ~ .x %in% f_list[[cur_column()]])
    )
  )
# V1 V2 V3
#1 1 B 100
#2 1 A 101
#3 1 A 102
#4 1 A 103
Note the extra () around if_all: for the time being, cur_column() requires extra parentheses to work inside if_any() and if_all().
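Since the question asks how to construct the loop, the same filter can also be sketched as an explicit loop in base R. This is only a minimal sketch: filter_by_list is a hypothetical helper name (not part of dplyr), and the small df below is made-up data chosen so the result is deterministic. It relies on %in% coercing both sides to a common type, which works for the integers and strings used here.

```r
# Hypothetical helper: keep the rows where every column named in f_list
# holds one of the values listed for that column.
filter_by_list <- function(df, f_list) {
  keep <- rep(TRUE, nrow(df))
  for (nm in names(f_list)) {
    # %in% coerces types, so the string "2" matches the number 2
    keep <- keep & df[[nm]] %in% f_list[[nm]]
  }
  df[keep, , drop = FALSE]
}

# Made-up example data (an assumption, not the random data from the question)
df <- data.frame(V1 = c(1, 2, 2, 1),
                 V2 = c("A", "B", "C", "A"),
                 V3 = c(101, 103, 100, 104))
f_list <- list(V1 = "2", V3 = c("101", "103", "104"))

res <- filter_by_list(df, f_list)
res
#   V1 V2  V3
# 2  2  B 103
```

An empty f_list leaves the data frame unchanged, because the loop body never runs.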
CodePudding user response:
one approach:
## example data:
filter_list <- list()
filter_list$V1 <- as.numeric(c("2"))
filter_list$V3 <- as.numeric(c("101","103","104"))
Notes: 1. Avoid giving an object (your list ls) the name of an existing function (ls() in base R). 2. Remember to match the type (numeric, character, etc.) of the filter criteria to the columns being filtered.
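To illustrate the second note: %in% coerces its arguments to a common type, but a dplyr join will generally refuse to match a character column against a numeric one, so the list values may need converting first. The sketch below is one possible way to do that; match_types is a hypothetical helper name, and it assumes the data frame only has numeric and character columns.

```r
# Hypothetical helper: convert each filter vector to the type of the
# data frame column with the same name.
# Assumption: columns are either numeric or character.
match_types <- function(f_list, df) {
  for (nm in names(f_list)) {
    if (is.numeric(df[[nm]])) {
      f_list[[nm]] <- as.numeric(f_list[[nm]])
    } else {
      f_list[[nm]] <- as.character(f_list[[nm]])
    }
  }
  f_list
}

# Made-up example data (an assumption, for illustration only)
df <- data.frame(V1 = c(1, 2), V2 = c("A", "B"))
typed <- match_types(list(V1 = "2", V2 = "A"), df)
str(typed)
```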
code:
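library(dplyr)
# expand.grid() builds every combination of the filter values, so vectors
# of different lengths are fine; inner_join() then keeps only the rows of
# df that match one of those combinations (left_join() would also return
# NA rows for combinations that do not occur in df)
filter_list %>%
  expand.grid(stringsAsFactors = FALSE) %>%
  inner_join(df, by = names(filter_list))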
