I have a dataframe that has many columns describing the number of a particular asset that a household owns. I would like set all missing values NA equal to 0 conditional on there being at least one non NA value in at least one other asset_ column. For example, in the dataframe:
dat <- data.frame(asset_1 = c(NA, 2, NA), asset_2 = c(1,3,NA), asset_3 = c(NA, NA, NA))
| asset_1 | asset_2 | asset_3 |
|---|---|---|
| NA | 3 | NA |
| 0 | 1 | NA |
| NA | NA | NA |
I would to achieve:
| asset_1 | asset_2 | asset_3 |
|---|---|---|
| 0 | 3 | 0 |
| 0 | 1 | 0 |
| NA | NA | NA |
Is there any nice way to do this with Dplyr? Any more efficient alternatives in any other package are also welcome.
My guess would be to use mutate in combination with across and if_else but I can't figure out a way to consider all columns except the one that is being mutated.
CodePudding user response:
Please try
library(dplyr)
dat <- data.frame(asset_1 = c(NA, 2, NA), asset_2 = c(1,3,NA), asset_3 = c(NA, NA, NA))
dat2 <- dat %>% mutate(new=coalesce(asset_1,asset_2,asset_3), across(starts_with('asset'), ~ ifelse(!is.na(new) & is.na(.x), 0, .x))) %>% dplyr::select(-new)
CodePudding user response:
library(tidyverse)
dat %>%
mutate(helper = apply(across(everything()), 1, function(x) sum(is.na(x))),
across(-helper, ~ifelse(helper != 3 & is.na(.), 0, .)))
asset_1 asset_2 asset_3 helper
1 0 1 0 2
2 2 3 0 1
3 NA NA NA 3
