How to generate a new variable in R where one column's value overrides the others'

Time:01-19

I have a dataset that is essentially in the following format:

group var1 var2 var3
a 1 . .
a 1 . .
a 1 2 .
a 1 2 3
a 1 . .
b 1 . .
b 1 2 3
b 1 2 .
b 1 2 3
b 1 2 .

I want to generate a new variable in this format:

group var1 var2 var3 newvar
a 1 . . 1
a 1 . . 1
a 1 2 . 2
a 1 2 3 3
a 1 . . 3
b 1 . . 1
b 1 2 3 3
b 1 2 . 3
b 1 2 3 3
b 1 2 . 3

Any help, please?

Answer:

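Here is an option with pmax and cummax (assuming the . values are missing, i.e. NA). Grouped by 'group', take the row-wise maximum with pmax across the columns whose names start with 'var' (starts_with('var')), ignoring NA, and then take the cumulative max (cummax) so that the highest value seen so far carries forward to later rows of the same group.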

library(dplyr)

df1 %>%
    group_by(group) %>%
    # row-wise max over the var* columns (NA ignored), then the running max within each group
    mutate(newvar = cummax(do.call(pmax,
        c(across(starts_with('var')), na.rm = TRUE)))) %>%
    ungroup()

Output:

# A tibble: 10 × 5
   group  var1  var2  var3 newvar
   <chr> <int> <int> <int>  <int>
 1 a         1    NA    NA      1
 2 a         1    NA    NA      1
 3 a         1     2    NA      2
 4 a         1     2     3      3
 5 a         1    NA    NA      3
 6 b         1    NA    NA      1
 7 b         1     2     3      3
 8 b         1     2    NA      3
 9 b         1     2     3      3
10 b         1     2    NA      3
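To see what the two steps do on their own, here is a minimal base-R sketch on group a only, assuming the . entries are NA; the vector names are illustrative. Note that taking the row maximum matches the "later column overrides" idea here only because the values rise with the column index (var1 = 1, var2 = 2, var3 = 3).

```r
# Group "a" from the question, with "." treated as NA (assumption)
var1 <- c(1L, 1L, 1L, 1L, 1L)
var2 <- c(NA, NA, 2L, 2L, NA)
var3 <- c(NA, NA, NA, 3L, NA)

# Step 1: element-wise (row-wise) maximum across the columns, skipping NA
row_max <- pmax(var1, var2, var3, na.rm = TRUE)
print(row_max)

# Step 2: running maximum down the rows, so a later row with fewer
# filled columns keeps the highest value seen so far in the group
new_var <- cummax(row_max)
print(new_var)
```

The first print gives 1 1 2 3 1 and the second 1 1 2 3 3, which is the newvar column for group a.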

Data:

df1 <- structure(list(group = c("a", "a", "a", "a", "a", "b", "b", "b", 
"b", "b"), var1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    var2 = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L), var3 = c(NA, 
    NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)), row.names = c(NA, -10L
), class = "data.frame")
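If the data actually arrives as whitespace-separated text like in the question, a sketch along these lines (assuming . always means missing) builds an equivalent df1 without dput; txt is just an illustrative name.

```r
# The question's table as text; "." marks a missing value (assumption)
txt <- "group var1 var2 var3
a 1 . .
a 1 . .
a 1 2 .
a 1 2 3
a 1 . .
b 1 . .
b 1 2 3
b 1 2 .
b 1 2 3
b 1 2 ."

# na.strings = "." turns every "." into NA, so var1-var3 are read as integers
df1 <- read.table(text = txt, header = TRUE, na.strings = ".",
                  stringsAsFactors = FALSE)
str(df1)
```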

Answer:

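See if this helps. lastValue() returns the last non-missing value of a row; apply it to the var columns only, then carry the running maximum forward within each group so that rows like the fifth "a" row get 3 rather than 1:

# last non-missing value of a vector
lastValue <- function(x) tail(x[!is.na(x)], 1)
# apply row-wise to the var* columns only (keeps the result numeric)
df1$newvar <- apply(df1[startsWith(names(df1), "var")], 1, lastValue)
# carry the running maximum forward within each group
df1$newvar <- ave(df1$newvar, df1$group, FUN = cummax)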
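A vectorized variant of the "last non-missing value" idea, sketched under the assumption that every row has at least one filled var column (var1 always is here): max.col() finds the rightmost filled column in each row, and matrix indexing pulls its value. Unlike pmax, this takes the value of the rightmost filled column even if it were smaller than an earlier one; the helper names are illustrative.

```r
# Data as in the first answer; "." from the question stored as NA
df1 <- data.frame(
  group = rep(c("a", "b"), each = 5),
  var1  = rep(1L, 10),
  var2  = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L),
  var3  = c(NA, NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)
)

m <- as.matrix(df1[startsWith(names(df1), "var")])

# 1 where a cell is filled, 0 where it is NA
filled <- 1 * (!is.na(m))

# Column index of the rightmost filled cell in each row
last_col <- max.col(filled, ties.method = "last")

# Value of that cell, row by row, via (row, col) matrix indexing
last_val <- m[cbind(seq_len(nrow(m)), last_col)]

# Carry the running maximum forward within each group
df1$newvar <- ave(last_val, df1$group, FUN = cummax)
df1$newvar
```

Avoiding apply() keeps the row lookup vectorized, which may matter on larger data.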