I have a dataframe like this:
df <- data.frame(
Metric = c('WI', NA, 'MN', NA, 'CO', NA),
Eval = c('WI', NA, 'AK', NA, 'CO', NA),
colA = c(30, 'ABC', 45, 'DEF', 2, 'XYZ'),
colB = c(25, 'BEC', 23, 'FED', 50, 'HIJ')
)
I'd like to evaluate whether the Metric column and the Eval column are equal, and, if they are, fill everything to the right of the Metric column with NAs so the result would look like this:
df_desired <- data.frame(
Metric = c('WI', NA, 'MN', NA, 'CO', NA),
Eval = c(NA, NA, 'AK', NA, NA, NA),
colA = c(NA, 'ABC', 45, 'DEF', NA, 'XYZ'),
colB = c(NA, 'BEC', 23, 'FED', NA, 'HIJ')
)
What's the best way to do this using R, ideally with a tidyverse function? I tried using mutate/across, but defining the conditional here is throwing me off.
CodePudding user response:
Create a logical vector and do the assignment based on row/column index/names (base R is more direct to do)
i1 <- with(df, Metric == Eval & !is.na(Metric) & !is.na(Eval))
df[i1, -1] <- NA
-output
> df
Metric Eval colA colB
1 WI <NA> <NA> <NA>
2 <NA> <NA> ABC BEC
3 MN AK 45 23
4 <NA> <NA> DEF FED
5 CO <NA> <NA> <NA>
6 <NA> <NA> XYZ HIJ
Or with dplyr, create a column of logical vector ('i1'), loop across the columns 'Eval' to 'colB', use case_when/ifelse/if_else/replace to change the values to NA based on the 'i1' and remove the temporary column by assigning to NULL
library(dplyr)
df %>%
mutate(i1 = Metric == Eval,
across(Eval:colB, ~ case_when(i1 ~ NA_character_, TRUE ~ .)),
i1 = NULL)
-output
Metric Eval colA colB
1 WI <NA> <NA> <NA>
2 <NA> <NA> ABC BEC
3 MN AK 45 23
4 <NA> <NA> DEF FED
5 CO <NA> <NA> <NA>
6 <NA> <NA> XYZ HIJ
CodePudding user response:
Use mutate() and if_else() to conditionally replace values:
df |>
mutate(colA = if_else(Metric == Eval, NA_character_, colA, missing = colA))
#> Metric Eval colA colB
#> 1 WI WI <NA> 25
#> 2 <NA> <NA> ABC BEC
#> 3 MN AK 45 23
#> 4 <NA> <NA> DEF FED
#> 5 CO CO <NA> 50
#> 6 <NA> <NA> XYZ HIJ
Note that we can't just use NA, we have to match it to the existing column type. In your example colA and colB are character vectors, so it's NA_character_. And we have to specify missing to handle the NA == NA case.
To generalise this across multiple columns, use across() and wrap the if_else() in an anonymous function:
df |>
mutate(across(Eval:colB, ~if_else(Metric == Eval, NA_character_, ., missing = .)))
#> Metric Eval colA colB
#> 1 WI <NA> <NA> <NA>
#> 2 <NA> <NA> ABC BEC
#> 3 MN AK 45 23
#> 4 <NA> <NA> DEF FED
#> 5 CO <NA> <NA> <NA>
#> 6 <NA> <NA> XYZ HIJ
