Creating a column based off of multiple columns

Time:01-11

I have a data set with multiple columns. A sample of the first three columns is as follows:

df$a1 <- c("00845", "486", "49392", "04186", "5990")

df$a2 <- c("34580", "2761", "27800", "4439", "5849")

df$a3 <- c("0340", "49392", "78831", "70714", "486")

I want to create a column df$b which gives me a "1" if any of the columns a1-a15 contains the string "2761", and a "0" otherwise.

a1 a2 a3 ... a15 b
00845 34580 0340 ... 4280 0
486 2761 49392 ... 25000 1
49392 27800 78831 ... 7955 0
04186 4439 70714 ... 27800 0
5990 5849 486 ... 4400 0

So far, I've developed the following code:

df %>%
  mutate(d = c(0, 1)[(a1:a15 %in% c("2761")) + 1])

but it doesn't work. Any help would be greatly appreciated!

CodePudding user response:

We can use if_any() to check whether any of the columns 'a1' to 'a15' in a row contains the string "2761". if_any() returns a logical vector, which is coerced to binary with + or as.integer() to create the new column 'd'.

library(dplyr)
df <- df %>%
     mutate(d = +(if_any(matches("^a\\d+$"), ~ . %in% "2761")))
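To check the approach on the data from the question, here is a minimal sketch that rebuilds only the three columns shown there (a1 to a3; the remaining columns are not given, so they are left out). It assumes dplyr 1.0.4 or later, where if_any() is available, and names the result 'b' as the question asks.

```r
library(dplyr)

# Sample data from the question, limited to the three columns shown there
df <- data.frame(
  a1 = c("00845", "486", "49392", "04186", "5990"),
  a2 = c("34580", "2761", "27800", "4439", "5849"),
  a3 = c("0340", "49392", "78831", "70714", "486"),
  stringsAsFactors = FALSE
)

# if_any() is TRUE for a row when at least one selected column equals "2761";
# as.integer() turns TRUE/FALSE into 1/0
df <- df %>%
  mutate(b = as.integer(if_any(matches("^a\\d+$"), ~ .x %in% "2761")))

df$b
# 0 1 0 0 0
```

Only the second row contains "2761", so only that row gets a 1. Because %in% compares whole values, a longer code that merely contains those digits would not be flagged.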

CodePudding user response:

You may use dplyr's rowwise() and c_across() as follows:

library(dplyr)

df |>
  rowwise() |>
  mutate(
    b = grepl(pattern = "2761", x = c_across(a1:a15)) |> any() |> as.numeric()
  ) |>
  ungroup()  # drop the rowwise grouping so later steps work on whole columns
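One thing to keep in mind with grepl(): the pattern "2761" matches any value that contains those digits, not only the exact code. The sketch below uses a small made-up data frame (the value "27610" is hypothetical, not from the question) to show the difference between the unanchored pattern and an anchored one, "^2761$". It assumes R 4.1 or later for the native pipe.

```r
library(dplyr)

# Hypothetical data: "27610" contains "2761" as a substring but is a different code
df <- data.frame(
  a1 = c("486", "27610", "5990"),
  a2 = c("2761", "4439", "5849"),
  stringsAsFactors = FALSE
)

res <- df |>
  rowwise() |>
  mutate(
    # unanchored pattern: flags any value containing "2761"
    b_substring = as.numeric(any(grepl("2761", c_across(a1:a2)))),
    # anchored pattern: flags only values that are exactly "2761"
    b_exact = as.numeric(any(grepl("^2761$", c_across(a1:a2))))
  ) |>
  ungroup()

res$b_substring
# 1 1 0
res$b_exact
# 1 0 0
```

If only the exact code should count, the anchored pattern (or a plain == / %in% comparison) is probably the safer choice.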