I'm looking for a more eloquent way to write R code for a situation I've encountered more than once. Here is some example data and code that produces the result I want:
library(tidyverse)
df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)
df |>
  mutate(target_area =
           primary_county %in% specific_counties | secondary_county %in% specific_counties)
The result is:
# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>
1     1            101              201 TRUE
2     2            102              202 TRUE
3     3            103              203 TRUE
4     4            104              204 FALSE
5     5            105              205 TRUE
I want to know if there is a way to get the same result with code that would be more succinct and eloquent if I were dealing with more columns of the "..._county" variety. Specifically, in my code above, the expression %in% specific_counties must be repeated, joined with |, for each extra column I want to handle. Is there a way to avoid repeating that expression for every column?
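For reference, the repeated "col %in% specific_counties | ..." pattern amounts to folding | over one %in% test per column. Here is a minimal base-R sketch of that idea. It is not from the original thread, and it assumes the columns to test are exactly those whose names end in "_county"; it rebuilds the example data as a plain data.frame so it runs on its own.

```r
# Example data from the question, as a base-R data.frame
df <- data.frame(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# Assumption: every column to test has a name ending in "_county"
county_cols <- grep("_county$", names(df), value = TRUE)

# One logical vector per county column ...
hits <- lapply(df[county_cols], `%in%`, specific_counties)

# ... combined with |, i.e. hits[[1]] | hits[[2]] | ... for any number of columns
df$target_area <- Reduce(`|`, hits)
df
```

Adding another ..._county column to the data needs no change to this code.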
Answer:
This generalizes a little beyond what you have, though I'm not sure how "eloquent" I'd call it:
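df %>%
  mutate(
    target_area = rowSums(
      # sapply() gives one logical column per *_county variable;
      # rowSums() > 0 is TRUE where at least one of them is TRUE.
      # cur_data() is deprecated as of dplyr 1.1.0, where
      # pick(matches("_county")) replaces select(cur_data(), matches("_county"))
      sapply(select(cur_data(), matches("_county")),
             `%in%`, specific_counties)) > 0
  )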
# # A tibble: 5 x 4
#      id primary_county secondary_county target_area
#   <int>          <int>            <int> <lgl>
# 1     1            101              201 TRUE
# 2     2            102              202 TRUE
# 3     3            103              203 TRUE
# 4     4            104              204 FALSE
# 5     5            105              205 TRUE
Or you can list the columns explicitly, replacing the select(.., matches(..)) with list(primary_county, secondary_county).
Add as many columns to the list(..) as you want.
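Spelled out, the list() variant described above might look like the following. This is a sketch that assumes dplyr is installed and reuses the question's example data.

```r
library(dplyr)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

res <- df %>%
  mutate(
    target_area = rowSums(
      # Columns listed explicitly instead of matched by name pattern;
      # sapply() returns a rows x columns logical matrix
      sapply(list(primary_county, secondary_county),
             `%in%`, specific_counties)) > 0
  )
res
```

One thing worth checking with this sapply()/rowSums() pattern: if the data has only one row, sapply() simplifies its result to a plain vector rather than a matrix, and rowSums() then stops with an error because it needs at least two dimensions.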
Answer:
I would use across() to select the columns and purrr's pmap_lgl() inside mutate() to create the desired column. The key is to collect each row's values with c(...), as in any(c(...) %in% specific_counties).
library(dplyr)
library(purrr)
df %>%
  mutate(target_area = pmap_lgl(across(ends_with('county')),
                                ~any(c(...) %in% specific_counties)))
# A tibble: 5 × 4
     id primary_county secondary_county target_area
  <int>          <int>            <int> <lgl>
1     1            101              201 TRUE
2     2            102              202 TRUE
3     3            103              203 TRUE
4     4            104              204 FALSE
5     5            105              205 TRUE
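To make the c(...) trick more visible, here is the same idea with the formula written out as an ordinary function. This is a sketch assuming dplyr and purrr are installed; row_hits is a hypothetical helper name, and the columns are listed explicitly rather than picked with across(ends_with('county')).

```r
library(dplyr)
library(purrr)

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# Hypothetical helper, equivalent to ~any(c(...) %in% specific_counties):
# pmap_lgl() calls it once per row, passing one argument per column,
# and c(...) gathers those arguments into a single vector
row_hits <- function(...) any(c(...) %in% specific_counties)

res <- df %>%
  mutate(target_area = pmap_lgl(list(primary_county, secondary_county), row_hits))
res
```

Because the function takes ..., it works unchanged whether two or ten columns are passed in.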
Using dplyr::select() instead of list() may generalize better to other use cases.
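Another option, not from the answers above, is dplyr's if_any(), which needs dplyr 1.0.4 or later. It applies one test to every selected column and combines the results with | row by row, which is the same OR-combination the question writes by hand. A sketch using the question's example data:

```r
library(dplyr)  # if_any() requires dplyr >= 1.0.4

df <- tibble(id = 1:5, primary_county = 101:105, secondary_county = 201:205)
specific_counties <- c(101, 103, 202, 205)

# For each row: TRUE if any *_county column is in specific_counties
res <- df %>%
  mutate(target_area = if_any(ends_with("_county"), ~ .x %in% specific_counties))
res
```

if_all() is the counterpart when every selected column must match.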
