I am working with survey data where I need to identify whether responses were not applicable or simply skipped.
x1andx2are my primary response questions. x3 is only filled out when someone indicates ifx1andx2are not applicable.- I want row 1 and row 2 to equal 0 as
x1andx2were not applicable, and I want row 4 to equal 1 asx1andx2was correctly reported as "None of the above." - Row 3 is the truly missing observation as neither
x1,x2, orx3were answered.
My code below works, but is there a simplier way where I can just refer to my list variable rather than writing a lengthy AND statement using x1 and x2? I have simplified this example as my full dataset has 10 x variables.
library(dplyr)
df3 <- tibble(x1 = c("Yes", "No", NA, NA), x2 = c("Yes", "Yes", NA, NA), x3 = c(NA, NA, NA, "None of the above"))
list <- c("x1", "x2")
df3 %>%
mutate(
x3 = coalesce(as.integer(x3 == "None of the above"), 0),
x3 = case_when(
x3 == 1 ~ 1,
is.na(x1) & is.na(x2) ~ NA_real_,
TRUE ~ 0))
CodePudding user response:
A minimal revision is replacing is.na(x1) & is.na(x2) with if_all(..., is.na), i.e.
df3 %>%
mutate(
x3 = coalesce(as.integer(x3 == "None of the above"), 0),
x3 = case_when(
x3 == 1 ~ 1,
if_all(all_of(list), is.na) ~ NA_real_,
TRUE ~ 0))
With if_all(), you don't even need to refer to the list variable, just pass a simple tidy-select helper instead, e.g.
if_all(x1:x2, is.na)(This wayx1&x2must be consecutive columns)if_all(starts_with("x"), is.na)if_all(num_range("x", 1:2), is.na)
CodePudding user response:
Here is one idea. We can use all_of, select to select the columns based on the target vector (I changed list to target as list is a function in R), then use is.na and rowSums to evaluate if all columns are NA. We can combine this strategy with case_when to create the x3 column.
library(dplyr)
target <- c("x1", "x2")
df3 %>%
mutate(
x3 = case_when(
rowSums(is.na(select(., all_of(target)))) == 0 ~0L,
rowSums(is.na(select(., all_of(target)))) != 0 & !is.na(x3) ~1L,
TRUE ~NA_integer_
),
)
# # A tibble: 4 × 3
# x1 x2 x3
# <chr> <chr> <int>
# 1 Yes Yes 0
# 2 No Yes 0
# 3 NA NA NA
# 4 NA NA 1
