Create a new column with all selected column names in R-CodePudding

It's been a while since I've had to drop by and ask for advice, but this one has me stumped and my searches for an answer haven't been fruitful (probably because the words I'm using are confusing search engines).

I have a survey data file that has columns that are either set to a 1 (if the respondent clicked the box) or are NA (if they didn't). I want to create a new field for each respondent that summarizes the checked boxes.

Below is a reprex with the starting df and the desire df2 I would like to create.

Any advice is greatly appreciated. Happy Sunday! And thanks in advance.

Original table:

library(tibble)
set.seed(125)
df <- tibble(
  id = 1:10,
  math = sample(c(rep(1, 3), rep(NA, 7))),
  english = sample(c(rep(1, 5), rep(NA, 5))),
  science = sample(c(rep(1, 6), rep(NA, 4))),
  history = sample(c(rep(1, 8), rep(NA, 2)))
)

Looks like this:

# A tibble: 10 x 5
#      id  math english science history
#   <int> <dbl>   <dbl>   <dbl>   <dbl>
# 1     1    NA       1      NA       1
# 2     2    NA      NA       1       1
# 3     3    NA       1       1       1
# 4     4     1      NA       1       1
# 5     5     1       1      NA       1
# 6     6    NA       1       1       1
# 7     7    NA      NA      NA      NA
# 8     8     1      NA       1       1
# 9     9    NA       1      NA       1
#10    10    NA      NA       1      NA

Desired table

# A tibble: 10 x 2
#      id background                 
#   <int> <chr>                      
# 1     1 "english, history"         
# 2     2 "science, history"         
# 3     3 "english, science, history"
# 4     4 "math, science, history"   
# 5     5 "math, english, history"   
# 6     6 "english, science, history"
# 7     7 ""                         
# 8     8 "math, science, history"   
# 9     9 "english, history"         
#10    10 "science"

^{Created on 2022-01-30 by the reprex package (v2.0.1)}

CodePudding user response：

A dplyr/purrr approach:

library(dplyr)
library(purrr)

df %>%
  transmute(id = id, 
            background = pmap_chr(across(-id, ~ cur_column()[.x]), ~ toString(na.omit(c(...)))))

# A tibble: 10 x 2
      id background                 
   <int> <chr>                      
 1     1 "english, history"         
 2     2 "science, history"         
 3     3 "english, science, history"
 4     4 "math, science, history"   
 5     5 "math, english, history"   
 6     6 "english, science, history"
 7     7 ""                         
 8     8 "math, science, history"   
 9     9 "english, history"         
10    10 "science"

CodePudding user response：

If you don't mind, here is a base R option using by (somewhat short for lapply(split(x, seq_len(nrow(x) ...) where we loop over the rows, and extract the name of columns 2:4 if the value is equal to 1.

tmp <- df[,-1]
out <- by(data = tmp,
          INDICES = df[, 1], 
          FUN = \(x) names(tmp)[which(x == 1)])

The result will be unclassed and combined with the first row of your df:

cbind(
  df[, 1, drop = FALSE],
  background = unclass(out)
)

Result

#   id                background
#1   1          english, history
#2   2          science, history
#3   3 english, science, history
#4   4    math, science, history
#5   5    math, english, history
#6   6 english, science, history
#7   7                          
#8   8    math, science, history
#9   9          english, history
#10 10                   science

data

set.seed(125)
df <- data.frame(
  id = 1:10,
  math = sample(c(rep(1, 3), rep(NA, 7))),
  english = sample(c(rep(1, 5), rep(NA, 5))),
  science = sample(c(rep(1, 6), rep(NA, 4))),
  history = sample(c(rep(1, 8), rep(NA, 2)))
)

CodePudding user response：

You can use pivot_longer and filter with a final paste to get the desired output

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols=c(-id)) %>% 
  group_by(id) %>% 
  mutate(value=replace_na(value, 0), 
         v_sum=sum(value), 
         name=ifelse(v_sum==0,"",name)) %>% 
  filter(value==1|v_sum==0) %>% 
  summarise(background=paste(unique(name), collapse=", "))
# A tibble: 10 × 2
      id background                 
   <int> <chr>                      
 1     1 "english, history"         
 2     2 "science, history"         
 3     3 "english, science, history"
 4     4 "math, science, history"   
 5     5 "math, english, history"   
 6     6 "english, science, history"
 7     7 ""                         
 8     8 "math, science, history"   
 9     9 "english, history"         
10    10 "science"