How to conditionally fill in rows with NAs depending on an evaluation of the first two columns in a-CodePudding

I have a dataframe like this:

df <- data.frame(
  Metric = c('WI', NA, 'MN', NA, 'CO', NA),
  Eval = c('WI', NA, 'AK', NA, 'CO', NA),
  colA = c(30, 'ABC', 45, 'DEF', 2, 'XYZ'),
  colB = c(25, 'BEC', 23, 'FED', 50, 'HIJ')
)

I'd like to evaluate whether the Metric column and the Eval column are equal, and, if they are, fill everything to the right of the Metric column with NAs so the result would look like this:

df_desired <- data.frame(
  Metric = c('WI', NA, 'MN', NA, 'CO', NA),
  Eval = c(NA, NA, 'AK', NA, NA, NA),
  colA = c(NA, 'ABC', 45, 'DEF', NA, 'XYZ'),
  colB = c(NA, 'BEC', 23, 'FED', NA, 'HIJ')
)

What's the best way to do this using R, ideally with a tidyverse function? I tried using mutate/across, but defining the conditional here is throwing me off.

CodePudding user response：

Create a logical vector and do the assignment based on row/column index/names (base R is more direct to do)

i1 <- with(df, Metric == Eval & !is.na(Metric) & !is.na(Eval))
df[i1, -1] <- NA

-output

> df
  Metric Eval colA colB
1     WI <NA> <NA> <NA>
2   <NA> <NA>  ABC  BEC
3     MN   AK   45   23
4   <NA> <NA>  DEF  FED
5     CO <NA> <NA> <NA>
6   <NA> <NA>  XYZ  HIJ

Or with dplyr, create a column of logical vector ('i1'), loop across the columns 'Eval' to 'colB', use case_when/ifelse/if_else/replace to change the values to NA based on the 'i1' and remove the temporary column by assigning to NULL

library(dplyr)
df %>% 
   mutate(i1 = Metric == Eval, 
    across(Eval:colB, ~ case_when(i1 ~ NA_character_, TRUE ~ .)), 
      i1 = NULL)

-output

 Metric Eval colA colB
1     WI <NA> <NA> <NA>
2   <NA> <NA>  ABC  BEC
3     MN   AK   45   23
4   <NA> <NA>  DEF  FED
5     CO <NA> <NA> <NA>
6   <NA> <NA>  XYZ  HIJ

CodePudding user response：

Use mutate() and if_else() to conditionally replace values:

df |>
  mutate(colA = if_else(Metric == Eval, NA_character_, colA, missing = colA))
#>   Metric Eval colA colB
#> 1     WI   WI <NA>   25
#> 2   <NA> <NA>  ABC  BEC
#> 3     MN   AK   45   23
#> 4   <NA> <NA>  DEF  FED
#> 5     CO   CO <NA>   50
#> 6   <NA> <NA>  XYZ  HIJ

Note that we can't just use NA, we have to match it to the existing column type. In your example colA and colB are character vectors, so it's NA_character_. And we have to specify missing to handle the NA == NA case.

To generalise this across multiple columns, use across() and wrap the if_else() in an anonymous function:

df |>
  mutate(across(Eval:colB, ~if_else(Metric == Eval, NA_character_, ., missing = .)))
#>   Metric Eval colA colB
#> 1     WI <NA> <NA> <NA>
#> 2   <NA> <NA>  ABC  BEC
#> 3     MN   AK   45   23
#> 4   <NA> <NA>  DEF  FED
#> 5     CO <NA> <NA> <NA>
#> 6   <NA> <NA>  XYZ  HIJ