Not the best title... but I'm trying to track tasks outstanding using weekly data exports. I want to use the total task count as of the minimum date (e.g. Jan 1st) of each year as a variable to return percent remaining (proportion) each week throughout that year. See example:
dt <- data.table(id.date=as.Date(c(rep('2019-01-01',80),rep('2019-01-08',60),rep('2019-01-15',40),
rep('2019-01-22',40),rep('2020-01-01',70),rep('2020-01-08',50),
rep('2020-01-15',40),rep('2020-01-29',20))),
task.type=rep('taskA'))[order(id.date)][,year:=year(id.date)]
# 2 sample records per weekly report
dt[,.SD[sample(.N,min(2,.N))],by=id.date]
id.date task.type year
<date> <chr> <int>
2019-01-01 taskA 2019
2019-01-01 taskA 2019
2019-01-08 taskA 2019
2019-01-08 taskA 2019
2019-01-15 taskA 2019
2019-01-15 taskA 2019
2019-01-22 taskA 2019
2019-01-22 taskA 2019
2020-01-01 taskA 2020
2020-01-01 taskA 2020
2020-01-08 taskA 2020
2020-01-08 taskA 2020
2020-01-15 taskA 2020
2020-01-15 taskA 2020
2020-01-29 taskA 2020
2020-01-29 taskA 2020
And the desired output:
# for weekly reports in 2019, 80 tasks would be used for all reports in 2019
# and for 2020, 70 would be used.
id.date N pct.rem
<date> <int> <dbl>
2019-01-01 80 1.0000000
2019-01-08 60 0.7500000
2019-01-15 40 0.5000000
2019-01-22 40 0.5000000
2020-01-01 70 1.0000000
2020-01-08 50 0.7142857
2020-01-15 40 0.5714286
2020-01-29 20 0.2857143
I've made attempts reworking some of the answers from this SO discussion here. But no success. My guess is that I need to utilize .SD or .EACHI is some manner but I'm just starting out with data.table.
Any guidance here is much appreciated. Let me know if more clarity would be helpful. Thanks.
CodePudding user response:
It seems like
dt[, N := .N, by = id.date]
dt[, pct.rem := N/.SD[1,N], by = year]
# .SD[1,N] by year gives the total task's number of each year
dt[,.SD[1,.(N,pct.rem)], by = id.date]
If you don't want to change dt, you can also use copy(dt) instead.
output:
id.date N pct.rem
<Date> <int> <num>
1: 2019-01-01 80 1.0000000
2: 2019-01-08 60 0.7500000
3: 2019-01-15 40 0.5000000
4: 2019-01-22 40 0.5000000
5: 2020-01-01 70 1.0000000
6: 2020-01-08 50 0.7142857
7: 2020-01-15 40 0.5714286
8: 2020-01-29 20 0.2857143
CodePudding user response:
You can also do something like this
out <- dt[, .(.N), by = id.date][, pct.rem := N / N[[1L]], by = year(id.date)]
Output
> out[]
id.date N pct.rem
1: 2019-01-01 80 1.0000000
2: 2019-01-08 60 0.7500000
3: 2019-01-15 40 0.5000000
4: 2019-01-22 40 0.5000000
5: 2020-01-01 70 1.0000000
6: 2020-01-08 50 0.7142857
7: 2020-01-15 40 0.5714286
8: 2020-01-29 20 0.2857143
If you have multiple types of tasks per id.date, then try this
out <- dt[, .(.N), by = .(id.date, task.type)][, pct.rem := N / N[[1L]], by = .(year(id.date), task.type)]
Output
> out[]
id.date task.type N pct.rem
1: 2019-01-01 taskA 80 1.0000000
2: 2019-01-08 taskA 60 0.7500000
3: 2019-01-15 taskA 40 0.5000000
4: 2019-01-22 taskA 40 0.5000000
5: 2020-01-01 taskA 70 1.0000000
6: 2020-01-08 taskA 50 0.7142857
7: 2020-01-15 taskA 40 0.5714286
8: 2020-01-29 taskA 20 0.2857143
