library(data.table)
DT <- data.table(var = 1:100)
I want to create a second variable, group that groups the values in var by n consecutive integers. So if n is equal to 1, it would return the same column as var. If n=2, it would return me:
var group
1: 1 1
2: 2 1
3: 3 2
4: 4 2
5: 5 3
6: 6 3
If n=3, it would return me:
var group
1: 1 1
2: 2 1
3: 3 1
4: 4 2
5: 5 2
6: 6 2
and so on. I would like to do this as flexibly as possibly.
Note that there could be repeated values:
var group
1: 1 1
2: 1 1
3: 2 1
4: 3 2
5: 3 2
6: 4 2
Here, group corresponds to n=2. Thank you!
CodePudding user response:
I think we can use findInterval for this:
DT <- data.table(var = c(1L, 1:10))
n <- 2
DT[, group := findInterval(var, seq(min(var), max(var) n, by = n))]
# var group
# <int> <int>
# 1: 1 1
# 2: 1 1
# 3: 2 1
# 4: 3 2
# 5: 4 2
# 6: 5 3
# 7: 6 3
# 8: 7 4
# 9: 8 4
# 10: 9 5
# 11: 10 5
n <- 3
DT[, group := findInterval(var, seq(min(var), max(var) n, by = n))]
# var group
# <int> <int>
# 1: 1 1
# 2: 1 1
# 3: 2 1
# 4: 3 1
# 5: 4 2
# 6: 5 2
# 7: 6 2
# 8: 7 3
# 9: 8 3
# 10: 9 3
# 11: 10 4
(The n in the call to seq is so that we always have a little more than we need; if we did just seq(min(.),max(.),by=n), it would be possible the highest values of var would be outside of the sequence. One could also do c(seq(min(.), max(.), by=n), Inf) for the same effect.)
