I have a data set with multiple columns. A sample of the first three columns is as follows:
df$a1 <- c("00845", "486", "49392", "04186", "5990")
df$a2 <- c("34580", "2761", "27800", "4439", "5849")
df$a3 <- c("0340", "49392", "78831", "70714", "486")
I want to create a column df$b which gives me a "1" if any of the columns a1-a15 contain the string "2761".
| a1 | a2 | a3 | ... | a15 | b |
|---|---|---|---|---|---|
| 00845 | 34580 | 0340 | ... | 4280 | 0 |
| 486 | 2761 | 49392 | ... | 25000 | 1 |
| 49392 | 27800 | 78831 | ... | 7955 | 0 |
| 04186 | 4439 | 70714 | ... | 27800 | 0 |
| 5990 | 5849 | 486 | ... | 4400 | 0 |
So far, I've developed the following code:
df %>%
  mutate(d = c(0, 1)[(a1:a15 %in% c("2761")) + 1])
but it doesn't work. Any help would be greatly appreciated!
CodePudding user response:
We can use if_any() to check whether any of the columns 'a1' to 'a15' in a row contains the string "2761". if_any() returns a logical vector, which is coerced to binary with + or as.integer() to create the new column 'd'.
library(dplyr)
df <- df %>%
  mutate(d = +(if_any(matches("^a\\d+$"), ~ . %in% "2761")))
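As a quick check, here is a minimal sketch that applies the if_any() approach to the three sample columns shown in the question (a1 to a3 only, since the remaining columns are not given). It assumes dplyr 1.0.4 or later, where if_any() is available; the name `out` is only used for illustration.

```r
library(dplyr)

# Sample data: only the three columns shown in the question
df <- data.frame(
  a1 = c("00845", "486", "49392", "04186", "5990"),
  a2 = c("34580", "2761", "27800", "4439", "5849"),
  a3 = c("0340", "49392", "78831", "70714", "486"),
  stringsAsFactors = FALSE
)

# if_any() is TRUE for a row when any selected column equals "2761";
# the unary + converts the logical vector to integer 0/1
out <- df %>%
  mutate(d = +(if_any(matches("^a\\d+$"), ~ . %in% "2761")))

out$d
# 0 1 0 0 0
```

Only the second row is flagged, because a2 holds "2761" there. Note that "27800" in the third row is not flagged, since %in% compares whole values.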
CodePudding user response:
You may use dplyr's rowwise() and c_across() as follows:
df |>
rowwise() |>
mutate(
b = grepl(pattern = "2761", x = c_across(a1:a15)) |> any() |> as.numeric()
)
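One thing to keep in mind with this approach: grepl() does a substring match, so a value such as "27610" would also count as a hit, whereas %in% only matches the whole value. Which behaviour is wanted depends on the data. The sketch below uses base R only and a made-up vector `codes` (not from the question) to show the difference; anchoring the pattern is one way to get exact matching with grepl().

```r
# Hypothetical values for illustration; not taken from the question's data
codes <- c("2761", "27610", "486")

substring_hit <- grepl("2761", codes)    # substring match
exact_hit     <- codes %in% "2761"       # whole-value match
anchored_hit  <- grepl("^2761$", codes)  # anchored pattern, exact match

substring_hit  # TRUE  TRUE FALSE
exact_hit      # TRUE FALSE FALSE
anchored_hit   # TRUE FALSE FALSE
```

If exact matching is what you need, replacing pattern = "2761" with pattern = "^2761$" in the c_across() call should give the same result as the %in% version.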
