I like to reshape a time series data from long to wide format , with columns such as StartTime and StopTime. All variables measured during the same time interval (StartTime, StopTime) to be in the same line.
For example if this is my dataset
Id Time Status Col1
10 2012 4 2
11 2009 2 5
11 2010 2 5
12 2004 2 2
12 2009 2 3
12 2011 2 1
12 2018 2 3
17 2018 2 3
17 2020 2 1
Expecting a dataset like this
Id From To Status Col1
10 2012 2012 4 2
11 2009 2010 2 5
12 2004 2009 2 2
12 2009 2011 2 3
12 2011 2018 2 1
12 2018 2018 2 3
17 2018 2020 2 3
17 2020 2020 2 1
Thanks in advance for the help.
CodePudding user response:
An option would be to create a lead column after grouping by 'Id'
library(dplyr)
df1 %>%
group_by(Id) %>% mutate(To = if(n() == 1) Time else
lead(Time, default = last(Time)), .before = Status) %>%
ungroup %>%
rename(From = Time) %>%
filter(!is.na(To))
CodePudding user response:
I do not understand why in id 12, there is no 2018-2018.
df %>%
group_by(Id)%>%
mutate(From = Time,To = lead(Time, def = last(Time)),.after=Id, Time = NULL)
# A tibble: 9 × 5
# Groups: Id [4]
Id From To Status Col1
<int> <int> <int> <int> <int>
1 10 2012 2012 4 2
2 11 2009 2010 2 5
3 11 2010 2010 2 5
4 12 2004 2009 2 2
5 12 2009 2011 2 3
6 12 2011 2018 2 1
7 12 2018 2018 2 3
8 17 2018 2020 2 3
9 17 2020 2020 2 1
