I have a data frame df which looks something like what you see below.
df <- data.frame(id = c("abc", "def", "ghi", "jkl", "mno"),
flag_V1 = c(2, 1, 1, 0, 1),
V1 = c(600, 1000, 500, NA, 700),
flag_V2 = c(1, 1, 0, 2, 0),
V2 = c(400, 600, 100, 700, NA),
flag_V3 = c(1, 2, 1, 0, 1),
V3 = c(600, 300, 600, NA, 700))
> df
id flag_V1 V1 flag_V2 V2 flag_V3 V3
1 abc 2 600 1 400 1 600
2 def 1 1000 1 600 2 300
3 ghi 1 500 0 100 1 600
4 jkl 0 NA 2 700 0 NA
5 mno 1 700 0 NA 1 700
Essentially, each column V1, V2, and V3 comes with a flag variable flag_V1, flag_V2, and flag_V3 which contain information about how the values in V1, V2, and V3 where collected.
As you can see, whenever the flag variable reports 0, the value in the associated column is NA. I want to replace these missing values with zeroes if (and only if) the flag variable reports 0. I want something like this
df_modified <- df %>% mutate(V1 = ifelse(flag_V1 == 0, 0, V1))
for every column that comes with a flag variable. The actual data that I'm working with has hundreds of columns, and not all of them come with a flag variable.
Is there a simple way to perform the above operation on many columns without having to do it one by one, preferably using dplyr-like syntax and avoiding traditional loops? I want to highlight again that I don't want to change all missing values to zeroes; it is very important that this is only done when the flag variable reports 0. Thank you!
CodePudding user response:
Use across twice with replace:
library(dplyr)
df %>%
mutate(replace(across(matches("^V")), across(matches("^flag")) == 0, 0))
id flag_V1 V1 flag_V2 V2 flag_V3 V3
1 abc 2 600 1 400 1 600
2 def 1 1000 1 600 2 300
3 ghi 1 500 0 0 1 600
4 jkl 0 0 2 700 0 0
5 mno 1 700 0 0 1 700
