Recode NA when another column value is NA in R

I have a quick recoding question. Here is what my sample dataset looks like:

df <- data.frame(id = c(1,2,3),
                 i1 = c(1,NA,0),
                 i2 = c(1,1,1))

> df
  id i1 i2
1  1  1  1
2  2 NA  1
3  3  0  1

When i1 is NA, I need to recode i2 to NA as well. I tried the code below, but had no luck.

df %>%
  mutate(i2 = case_when(
    i1 == NA ~  NA_real_,
    TRUE ~ as.character(i2)))

Error in `mutate()`:
! Problem while computing `i2 = case_when(i1 == "NA" ~ NA_real_, TRUE ~ as.character(i2))`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

My desired output looks like this:

> df
  id i1 i2
1  1  1  1
2  2 NA NA
3  3  0  1

CodePudding user response：

Here is an option:

t(apply(df, 1, \(x) if (any(is.na(x))) cumsum(x) else x))
#     id i1 i2
#[1,]  1  1  1
#[2,]  2 NA NA
#[3,]  3  0  1

The idea is to calculate the cumulative sum of every row that contains an NA: if there is an NA in term i, all subsequent terms i + 1, i + 2, ... will also be NA (since e.g. NA + 1 = NA). Since your sample data df is all numeric, I recommend using a matrix (rather than a data.frame). Matrix operations are usually faster than data.frame (i.e. list) operations.
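
As a small illustration of why the cumulative sum works here, the sketch below applies cumsum() to the second row of the sample data on its own. The vector name x is made up for this example.

```r
# Row 2 of the sample data: id = 2, i1 = NA, i2 = 1
x <- c(id = 2, i1 = NA, i2 = 1)

# Arithmetic with NA yields NA, so once the running sum reaches i1
# it stays NA for every later term, including i2.
res <- cumsum(x)
print(res)
```

The id value survives only because it comes before the first NA in the row.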

Key assumptions:

id cannot be NA.
This replaces NAs in i2 based on an NA in i1 per row.
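
One further caveat seems worth spelling out: the cumsum() trick rewrites any row that contains an NA, so a row where only i2 is NA would have its i1 replaced by id + i1. The sketch below shows that side effect and, as a possible alternative under the same all-numeric assumption, plain matrix indexing that only ever touches i2. The names side_effect and m are made up for this example.

```r
df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, 1, 1))

# Hypothetical row where only i2 is NA: cumsum() turns i1 into id + i1
side_effect <- cumsum(c(id = 1, i1 = 1, i2 = NA))
print(side_effect)

# Matrix alternative: set i2 to NA in exactly the rows where i1 is NA
m <- as.matrix(df)
m[is.na(m[, "i1"]), "i2"] <- NA
print(m)
```

The indexing version leaves id and i1 untouched regardless of where the NAs sit.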

A `tidyverse` solution

I advise against a tidyverse solution here, for a couple of reasons:

Your data is all-numerical, so a matrix is a more suitable data structure than a data.frame/tibble.
dplyr/tidyr syntax usually operates efficiently on columns; as soon as you want to do things row-wise, dplyr (and its family of packages) might not be the best tool (dplyr::rowwise() notwithstanding, since it just introduces a grouping by row number).

With that out of the way, you can transpose the problem.

library(tidyverse)
df %>%
    transpose() %>%
    map(~ { if (is.na(.x$i1)) .x$i2 <- NA_real_; .x }) %>%
    transpose() %>%
    as_tibble() %>%
    unnest(everything())
## A tibble: 3 × 3
#     id    i1    i2
#  <dbl> <dbl> <dbl>
#1     1     1     1
#2     2    NA    NA
#3     3     0     1

CodePudding user response：

Would a simple assignment meet your requirements for this?

df$i2[is.na(df$i1)] <- NA

A tidyverse solution
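
A minimal sketch of how this could look with dplyr, assuming the df from the question. It is one possible approach, not necessarily the only one: the comparison i1 == NA always evaluates to NA and so never matches, whereas is.na(i1) returns TRUE or FALSE per row. The name out is made up for this example.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, 1, 1))

# Keep i2 as it is, except where i1 is missing.
# NA_real_ matches the double type of i2, as if_else() requires.
out <- df %>%
  mutate(i2 = if_else(is.na(i1), NA_real_, i2))
print(out)
```

Because both branches are doubles, i2 stays numeric; the as.character(i2) in the original attempt would have mixed types inside case_when().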
