Return percent remaining within a group, using the 1st date of group's count-CodePudding

Not the best title... but I'm trying to track tasks outstanding using weekly data exports. I want to use the total task count as of the minimum date (e.g. Jan 1st) of each year as a variable to return percent remaining (proportion) each week throughout that year. See example:

dt <- data.table(id.date=as.Date(c(rep('2019-01-01',80),rep('2019-01-08',60),rep('2019-01-15',40),
                         rep('2019-01-22',40),rep('2020-01-01',70),rep('2020-01-08',50),
                         rep('2020-01-15',40),rep('2020-01-29',20))),
           task.type=rep('taskA'))[order(id.date)][,year:=year(id.date)]

# 2 sample records per weekly report
dt[,.SD[sample(.N,min(2,.N))],by=id.date]

 id.date  task.type year
  <date>    <chr>  <int>
2019-01-01  taskA   2019        
2019-01-01  taskA   2019        
2019-01-08  taskA   2019        
2019-01-08  taskA   2019        
2019-01-15  taskA   2019        
2019-01-15  taskA   2019        
2019-01-22  taskA   2019        
2019-01-22  taskA   2019        
2020-01-01  taskA   2020        
2020-01-01  taskA   2020
2020-01-08  taskA   2020        
2020-01-08  taskA   2020    
2020-01-15  taskA   2020
2020-01-15  taskA   2020    
2020-01-29  taskA   2020    
2020-01-29  taskA   2020

And the desired output:

# for weekly reports in 2019, 80 tasks would be used for all reports in 2019
# and for 2020, 70 would be used.
   id.date   N    pct.rem
  <date>   <int>    <dbl>
2019-01-01  80  1.0000000       
2019-01-08  60  0.7500000       
2019-01-15  40  0.5000000       
2019-01-22  40  0.5000000       
2020-01-01  70  1.0000000       
2020-01-08  50  0.7142857       
2020-01-15  40  0.5714286       
2020-01-29  20  0.2857143

I've made attempts reworking some of the answers from this SO discussion here. But no success. My guess is that I need to utilize .SD or .EACHI is some manner but I'm just starting out with data.table.

Any guidance here is much appreciated. Let me know if more clarity would be helpful. Thanks.

CodePudding user response：

It seems like

dt[, N := .N, by = id.date]
dt[, pct.rem := N/.SD[1,N], by = year] 
# .SD[1,N] by year gives the total task's number of each year

dt[,.SD[1,.(N,pct.rem)], by = id.date]

If you don't want to change dt, you can also use copy(dt) instead.

output:

      id.date     N   pct.rem
       <Date> <int>     <num>
1: 2019-01-01    80 1.0000000
2: 2019-01-08    60 0.7500000
3: 2019-01-15    40 0.5000000
4: 2019-01-22    40 0.5000000
5: 2020-01-01    70 1.0000000
6: 2020-01-08    50 0.7142857
7: 2020-01-15    40 0.5714286
8: 2020-01-29    20 0.2857143

CodePudding user response：

You can also do something like this

out <- dt[, .(.N), by = id.date][, pct.rem := N / N[[1L]], by = year(id.date)]

Output

> out[]
      id.date  N   pct.rem
1: 2019-01-01 80 1.0000000
2: 2019-01-08 60 0.7500000
3: 2019-01-15 40 0.5000000
4: 2019-01-22 40 0.5000000
5: 2020-01-01 70 1.0000000
6: 2020-01-08 50 0.7142857
7: 2020-01-15 40 0.5714286
8: 2020-01-29 20 0.2857143

If you have multiple types of tasks per id.date, then try this

out <- dt[, .(.N), by = .(id.date, task.type)][, pct.rem := N / N[[1L]], by = .(year(id.date), task.type)]

Output

> out[]
      id.date task.type  N   pct.rem
1: 2019-01-01     taskA 80 1.0000000
2: 2019-01-08     taskA 60 0.7500000
3: 2019-01-15     taskA 40 0.5000000
4: 2019-01-22     taskA 40 0.5000000
5: 2020-01-01     taskA 70 1.0000000
6: 2020-01-08     taskA 50 0.7142857
7: 2020-01-15     taskA 40 0.5714286
8: 2020-01-29     taskA 20 0.2857143