I have a quick recoding question. Here is what my sample dataset looks like:
df <- data.frame(id = c(1,2,3),
i1 = c(1,NA,0),
i2 = c(1,1,1))
> df
  id i1 i2
1  1  1  1
2  2 NA  1
3  3  0  1
When i1 is NA, I need to recode i2 to NA as well. I tried the code below, but had no luck.
df %>%
mutate(i2 = case_when(
i1 == NA ~ NA_real_,
TRUE ~ as.character(i2)))
Error in `mutate()`:
! Problem while computing `i2 = case_when(i1 == "NA" ~ NA_real_, TRUE ~ as.character(i2))`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
My desired output looks like this:
> df
  id i1 i2
1  1  1  1
2  2 NA NA
3  3  0  1
CodePudding user response:
Here is an option:
t(apply(df, 1, \(x) if (any(is.na(x))) cumsum(x) else x))
# id i1 i2
#[1,] 1 1 1
#[2,] 2 NA NA
#[3,] 3 0 1
The idea is to calculate the cumulative sum of every row that contains an NA: if there is an NA in term i, all subsequent terms (i + 1, i + 2, ...) will also be NA (since e.g. NA + 1 = NA). Since your sample data df is all numeric, I recommend using a matrix (rather than a data.frame). Matrix operations are usually faster than data.frame (i.e. list) operations.
Key assumptions:
- id cannot be NA.
- This sets i2 to NA based on an NA in i1, per row.
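To make the propagation concrete, here is a minimal sketch in base R. The matrix m is just the question's sample data rebuilt with cbind(), and the direct indexing at the end is an alternative I am adding for illustration, not part of the apply() one-liner above. It may be preferable when i1 is not the column immediately after id, because cumsum() also accumulates the values to the left of the first NA.

```r
# NA propagates through a cumulative sum:
# everything from the first NA onward becomes NA
cumsum(c(2, NA, 1))
# 2 NA NA

# Same sample data as in the question, stored as a numeric matrix
m <- cbind(id = c(1, 2, 3), i1 = c(1, NA, 0), i2 = c(1, 1, 1))

# Alternative that does not depend on column order (illustrative):
# set i2 to NA in every row where i1 is NA
m[is.na(m[, "i1"]), "i2"] <- NA
m
```

With the sample data, only row 2 has an NA in i1, so only its i2 value is replaced.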
A tidyverse solution
I advise against a tidyverse solution here for a couple of reasons:
- Your data is all-numerical, so a matrix is a more suitable data structure than a data.frame/tibble.
- dplyr/tidyr syntax usually operates efficiently on columns; as soon as you want to do things "row-wise", dplyr (and its family packages) might not be the best way (despite dplyr::rowwise() which just introduces a row number-based grouping).
With that out of the way, you can transpose the problem.
library(tidyverse)
df %>%
transpose() %>%
map(~ { if (is.na(.x$i1)) .x$i2 <- NA_real_; .x }) %>%
transpose() %>%
as_tibble() %>%
unnest(everything())
## A tibble: 3 × 3
# id i1 i2
# <dbl> <dbl> <dbl>
#1 1 1 1
#2 2 NA NA
#3 3 0 1
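If you do want to stay in dplyr, this particular recode can probably also be written column-wise, without iterating over rows. A minimal sketch, assuming the same df as in the question: is.na() is used for the test because i1 == NA always evaluates to NA, and NA_real_ keeps both branches of if_else() the same (double) type, which the original case_when() attempt did not.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(id = c(1, 2, 3), i1 = c(1, NA, 0), i2 = c(1, 1, 1))

# Where i1 is missing, set i2 to a numeric NA; otherwise keep i2 as-is
out <- df %>%
  mutate(i2 = if_else(is.na(i1), NA_real_, i2))

out
```

The result is stored in out (a name chosen here for illustration) so that df itself is left unchanged.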
CodePudding user response:
Would a simple assignment meet your requirements for this?
df$i2[is.na(df$i1)] <- NA
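As a short sketch of why this works where the case_when() attempt did not: comparing with == NA yields NA for every element, whereas is.na() returns a proper logical vector that can be used as an index. Using the sample df from the question:

```r
# Sample data from the question
df <- data.frame(id = c(1, 2, 3), i1 = c(1, NA, 0), i2 = c(1, 1, 1))

# Comparing against NA never gives TRUE or FALSE
df$i1 == NA
# NA NA NA

# is.na() gives a usable logical vector
is.na(df$i1)
# FALSE TRUE FALSE

# Use it to index the rows of i2 that should become NA
df$i2[is.na(df$i1)] <- NA
df
```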
