I'm trying to mutate a column in a Dataframe using the lag() function as a condition without producing NA values. Let me create an example:
df <- data.frame("Score" = as.numeric(c("20", "10", "15", "30", "15", "10")),
"Time" = c("1", "2", "1", "2", "1", "2"),
"Team" = c("A", "A", "B", "B", "C", "C"))
After that, I created a new column named Diff that calculates the difference of the Score of every Team:
df <- df %>%
group_by(Team) %>%
mutate(Diff = Score - lag(Score))
My problem is that this method creates NA values, obviously:
Score Time Team Diff
20 1 A NA
10 2 A -10
15 1 B NA
30 2 B 15
15 1 C NA
10 2 C -5
My goal is to have this at the end:
Score Time Team Diff
20 1 A -10
10 2 A -10
15 1 B 15
30 2 B 15
15 1 C -5
10 2 C -5
I've tried mutating again using the case_when() function to substitute the NA for the next value, but it also didn't work:
df %>%
group_by(Team) %>%
mutate(Diff = Score - lag(Score)) %>%
mutate(Diff = case_when(
NA ~ lead(Diff)
))
Anyway, how do I make the NA values be replaced by the next Diff value?
Thanks a lot!
CodePudding user response:
Just use fill() after the fact:
library(tidyverse)
df <- data.frame("Score" = as.numeric(c("20", "10", "15", "30", "15", "10")),
"Time" = c("1", "2", "1", "2", "1", "2"),
"Team" = c("A", "A", "B", "B", "C", "C"))
df <- df %>%
group_by(Team) %>%
mutate(Diff = Score - lag(Score)) %>%
fill(Diff, .direction = 'up')
df
# output
# Score Time Team Diff
# <dbl> <chr> <chr> <dbl>
#1 20 1 A -10
#2 10 2 A -10
#3 15 1 B 15
#4 30 2 B 15
#5 15 1 C -5
#6 10 2 C -5
