I have a data df as follows:
df <- data.frame(id = c(1,1,1,2,2,2,2,3,3),
year=c(2011,2012,2013,2010,2011,2012,2013,2012,2013),
points=c(45,69,79,53,13,12,11,89,91),
result = c(2,3,5,4,6,1,2,3,4))
But I want to make df as below:
df <- data.frame(id = c(1,1,2,2,2,3),
year=c(2011,2012,2010,2011,2012,2012),
points=c(45,69,53,13,12,89),
result = c(3,5,6,1,2,4))
Here, I want to do some regression with the response variable result. Since I want to estimate result, I have to delay the response variable result and leave the other dependent variable points. So, for my regression setting, result is the response variable and points is the dependent variable. In summary, I want to do time lag for result. Within each id, each last row should be removed because, there are no next result.
I simplified my problem for demonstration purpose. Is there any way to achieve this using R?
CodePudding user response:
Tidyverse solution:
library(tidyverse)
df %>% group_by(id) %>% mutate(lead_result = lead(result)) %>% na.exclude
# A tibble: 6 x 5
# Groups: id [3]
id year points result lead_result
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2011 45 2 3
2 1 2012 69 3 5
3 2 2010 53 4 6
4 2 2011 13 6 1
5 2 2012 12 1 2
6 3 2012 89 3 4
CodePudding user response:
A data.table solution:
library(data.table)
na.omit(setDT(df)[, result := shift(result, type = "lead"), by = "id"], "result")
Output
id year points result
<num> <num> <num> <num>
1: 1 2011 45 3
2: 1 2012 69 5
3: 2 2010 53 6
4: 2 2011 13 1
5: 2 2012 12 2
6: 3 2012 89 4
