I have a dataset that is essentially in the following format:
| group | var1 | var2 | var3 |
|---|---|---|---|
| a | 1 | . | . |
| a | 1 | . | . |
| a | 1 | 2 | . |
| a | 1 | 2 | 3 |
| a | 1 | . | . |
| b | 1 | . | . |
| b | 1 | 2 | 3 |
| b | 1 | 2 | . |
| b | 1 | 2 | 3 |
| b | 1 | 2 | . |
I want to generate a new variable in this format:
| group | var1 | var2 | var3 | new var |
|---|---|---|---|---|
| a | 1 | . | . | 1 |
| a | 1 | . | . | 1 |
| a | 1 | 2 | . | 2 |
| a | 1 | 2 | 3 | 3 |
| a | 1 | . | . | 3 |
| b | 1 | . | . | 1 |
| b | 1 | 2 | 3 | 3 |
| b | 1 | 2 | . | 3 |
| b | 1 | 2 | 3 | 3 |
| b | 1 | 2 | . | 3 |
Help pls?
CodePudding user response:
Here is an option with pmax and cummax (assuming the . are missing values, i.e. NA). Grouped by 'group', invoke pmax across the columns whose names start with 'var' (starts_with) to get the row-wise maximum with NAs ignored, then take the cumulative maximum (cummax) of that within each group.
library(dplyr)
library(purrr)
df1 %>%
  group_by(group) %>%
  mutate(newvar = cummax(invoke(pmax,
    c(across(starts_with('var')), na.rm = TRUE)))) %>%
  ungroup()
-output
# A tibble: 10 × 5
   group  var1  var2  var3 newvar
   <chr> <int> <int> <int>  <int>
 1 a         1    NA    NA      1
 2 a         1    NA    NA      1
 3 a         1     2    NA      2
 4 a         1     2     3      3
 5 a         1    NA    NA      3
 6 b         1    NA    NA      1
 7 b         1     2     3      3
 8 b         1     2    NA      3
 9 b         1     2     3      3
10 b         1     2    NA      3
data
df1 <- structure(list(group = c("a", "a", "a", "a", "a", "b", "b", "b",
"b", "b"), var1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
var2 = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L), var3 = c(NA,
NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)), row.names = c(NA, -10L
), class = "data.frame")
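To make the two steps easier to see, here is a rough base R sketch of the same idea, with the row-wise maximum and the per-group running maximum computed separately. It uses do.call in place of purrr's invoke and ave in place of group_by; the names var_cols and row_max are only illustrative, and it assumes every row has at least one non-missing var value.

```r
# Same data as df1 above, rebuilt so the snippet runs on its own
df1 <- data.frame(
  group = rep(c("a", "b"), each = 5),
  var1  = rep(1L, 10),
  var2  = c(NA, NA, 2L, 2L, NA, NA, 2L, 2L, 2L, 2L),
  var3  = c(NA, NA, NA, 3L, NA, NA, 3L, NA, 3L, NA)
)

# Step 1: row-wise maximum over the var* columns, ignoring NA
var_cols <- df1[startsWith(names(df1), "var")]
row_max  <- do.call(pmax, c(var_cols, na.rm = TRUE))
# row_max is 1 1 2 3 1 1 3 2 3 2

# Step 2: running maximum of row_max, restarted for each group
df1$newvar <- ave(row_max, df1$group, FUN = cummax)
# df1$newvar is 1 1 2 3 3 1 3 3 3 3
```

Note that cummax returns NA from the first NA onwards, so a row where all var columns are missing would need separate handling.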
CodePudding user response:
See if this helps you out
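lastValue <- function(x) tail(x[!is.na(x)], 1)
# only the var* columns, so 'group' does not coerce the rows to character
rowLast <- apply(df1[startsWith(names(df1), "var")], 1, lastValue)
df1$newvar <- ave(rowLast, df1$group, FUN = cummax)  # running max per group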
