Loop over multiple character columns to derive a new variable

Time:01-19

Could you please help me with some R code? I want to derive a new variable, available, with the values yes or no. It needs to check the character variables a1, a2, a3 and a4: if any of them has the value 'yellow', available should be yes, otherwise no.

a1 <- c('orange','red') 
a2 <- c('red','yellow') 
a3 <- c('black','orange') 
a4 <- c('red','brown')

testa <- data.frame(a1,a2,a3,a4)

I want the available variable as below

CodePudding user response:

dplyr

You may use rowwise and c_across:

library(dplyr)

testa %>% 
  rowwise() %>% 
  mutate(available = ifelse(any(c_across(a1:a4) == "yellow"), "yes", "no"))

  a1     a2     a3     a4    available
  <chr>  <chr>  <chr>  <chr> <chr>    
1 orange red    black  red   no       
2 red    yellow orange brown yes      

base R

Use apply:

testa$available <- apply(testa, 1, function(x) ifelse(any(x == "yellow"), "yes", "no"))
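
One thing to keep in mind with apply is that it first converts the whole data frame to a matrix, so every column takes part in the check, including available itself if you run the line twice. A possible column-wise variant is sketched below. The cols vector and the Reduce step are my own additions, not part of the answer above, and it assumes only a1 to a4 should be inspected.

```r
testa <- data.frame(
  a1 = c("orange", "red"),
  a2 = c("red", "yellow"),
  a3 = c("black", "orange"),
  a4 = c("red", "brown")
)

# Columns to inspect (assumption: only these four matter)
cols <- c("a1", "a2", "a3", "a4")

# One logical vector per column, TRUE where the value is exactly "yellow"
hits <- lapply(testa[cols], function(x) x %in% "yellow")

# Combine the per-column vectors with an elementwise OR
has_yellow <- Reduce(`|`, hits)

testa$available <- ifelse(has_yellow, "yes", "no")
testa
```

Because the columns are named explicitly, rerunning the code or adding unrelated columns to testa should not change the result.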

CodePudding user response:

a1 <- c('orange','red') 
a2 <- c('red','yellow') 
a3 <- c('black','orange') 
a4 <- c('red','brown')

df <- data.frame(a1,a2,a3,a4)

library(tidyverse)
df %>% 
  mutate(available = ifelse(rowSums(. == "yellow") > 0, "yes", "no"))
#>       a1     a2     a3    a4 available
#> 1 orange    red  black   red        no
#> 2    red yellow orange brown       yes

library(data.table)

setDT(df)[, available := ifelse(rowSums(.SD == "yellow") > 0, "yes", "no")][]
#>        a1     a2     a3    a4 available
#> 1: orange    red  black   red        no
#> 2:    red yellow orange brown       yes

Created on 2022-01-19 by the reprex package (v2.0.1)
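
Both rowSums versions above compare every column in the data frame. If your real data has other columns or missing values, something like the following base R sketch might be safer. The id column and the NA are hypothetical, added only to show the effect.

```r
df <- data.frame(
  id = 1:3,                          # hypothetical extra column, not checked
  a1 = c("orange", "red", NA),       # hypothetical missing value
  a2 = c("red", "yellow", "black")
)

# Restrict the comparison to the colour columns
cols <- c("a1", "a2")

# Logical matrix; the NA cell compares to NA, not FALSE
is_yellow <- df[cols] == "yellow"

# na.rm = TRUE treats missing cells as "not yellow" instead of propagating NA
df$available <- ifelse(rowSums(is_yellow, na.rm = TRUE) > 0, "yes", "no")
df
```

Without na.rm = TRUE the third row would come out as NA rather than "no", which may or may not be what you want.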

CodePudding user response:

You could just concatenate the cells in each row and look for the string "yellow" in the resulting strings:

> grepl("yellow",paste(as.data.frame(t(testa))))
[1] FALSE  TRUE

You can then use this resulting logical vector to put labels in a new column:

testa$available = c("no","yes")[grepl("yellow",paste(as.data.frame(t(testa)))) + 1]

That would result in the data.frame:

> testa
      a1     a2     a3    a4 available
1 orange    red  black   red        no
2    red yellow orange brown       yes

(If you want "yes" for the lines that do not contain yellow, just flip them in the vector c("no","yes"))
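
The indexing works because adding 1 to a logical vector turns FALSE into 1 and TRUE into 2, which then selects "no" or "yes" by position. Note that grepl does a substring search, so a value such as "yellowish" would also count. Here is a small sketch of both points; the colours_df data and the word-boundary pattern are assumptions of mine for illustration, not part of the answer above.

```r
# Hypothetical data: row 2 holds "yellowish", not "yellow"
colours_df <- data.frame(
  a1 = c("orange", "red", "yellow"),
  a2 = c("red", "yellowish", "black")
)

# One string per row, e.g. "orange red"
row_text <- do.call(paste, c(colours_df, sep = " "))

# Plain substring search: also flags "yellowish"
substring_hit <- grepl("yellow", row_text, fixed = TRUE)

# Whole-word search: \\b word boundaries keep "yellowish" out
word_hit <- grepl("\\byellow\\b", row_text)

# FALSE + 1 is 1 and TRUE + 1 is 2, so this picks a label by position
labels <- c("no", "yes")[word_hit + 1]
labels
```

If you need exact cell matches only, comparing with == on the columns is probably the simpler route.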
