Suppose I have a dataset dt like this:
| meta_cat | cat | sku | price | sales |
|---|---|---|---|---|
| bakery | bread | 796590 | 22.6 | 24 |
| bakery | bread | 796595 | 19.8 | 20 |
| bakery | doughnut | 796588 | 30.6 | 36 |
| bakery | sandwich | 796640 | 45.9 | 42 |
| bakery | sandwich | 796643 | 43.3 | 45 |
| fruits | feijoa | 645342 | 97.2 | 5 |
| fruits | orange | 645675 | 35.7 | 78 |
| fruits | orange | 645677 | 43.9 | 65 |
| fruits | feijoa | 645342 | 92.9 | 11 |
Also, I have a list which looks like this, for example:
lvl_list <- list(c("meta_cat"),
c("cat"))
I don't know in advance how many levels the list will contain: its length can be 0 (an empty list), 1, 2, 3, and so on (in our example there are two levels). The list values correspond to column names in the dataset.
My task is to run nested for loops whose depth depends on the length of the list.
If the list is empty, the loop does not start and the main code is executed.
If the list length = 1, there should be 1 for loop like this:
for(i in unique(dt[[lvl_list[[1]]]])){
  sub_dt <- dt[get(lvl_list[[1]]) == i, ] # make subset without overwriting dt
  # run main code on sub_dt
  # .
  # .
  # main code
}
So, at the first iteration, we filter the dt by the first unique value of the meta_cat column (for example, choose only records where meta_cat = "bakery") and run main code on this dt.
If the length of the list = 2, we should get 2 for loops:
for(i in unique(dt[[lvl_list[[1]]]])){
  dt_i <- dt[get(lvl_list[[1]]) == i, ] # filter dt by the first level
  for(j in unique(dt_i[[lvl_list[[2]]]])){
    dt_ij <- dt_i[get(lvl_list[[2]]) == j, ] # filter again by the second level
    # run main code on dt_ij
    # .
    # .
    # main code
  }
}
So, here we filter dt by values of two columns.
There are two unique values for variable meta_cat and 5 unique values for cat variable.
The logic of code execution should be as follows: at the first iteration, we filter the dt by the first value of meta_cat (leaving in dt observations, where meta_cat = "bakery"), at the first iteration of the second loop, we filter the dt by the first value of cat variable (we will choose observations where cat = "bread"). So, we obtain dt where meta_cat = "bakery" and cat = "bread". Further, this filtered dt is used as an input for the modelling code.
On the second iteration, the original dt is filtered by meta_cat = "bakery" and cat = "doughnut". Then the main code is executed for this dt, and so on.
If there are 3 levels in the list, we should have 3 for loops, etc.
My question: is it possible to create nested for loops dynamically, based on the list length?
I would be grateful for any help on how this can be implemented.
Answer:
It may be easier with split (shown here for two levels):
lst1 <- lapply(split(dt, dt[[lvl_list[[1]]]]), function(x)
split(x, x[[lvl_list[[2]]]]))
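The two-level lapply/split above can be extended to any number of levels with a small recursive helper. The following is only a sketch under some assumptions: nested_apply and main_code are hypothetical names that do not appear in the question, the data is rebuilt as a plain data.frame so the example needs base R only, and sum(d$sales) stands in for the real modelling code.

```r
# Toy data mirroring the question's table (plain data.frame, base R only)
dt <- data.frame(
  meta_cat = c("bakery", "bakery", "bakery", "bakery", "bakery",
               "fruits", "fruits", "fruits", "fruits"),
  cat = c("bread", "bread", "doughnut", "sandwich", "sandwich",
          "feijoa", "orange", "orange", "feijoa"),
  sku = c(796590L, 796595L, 796588L, 796640L, 796643L,
          645342L, 645675L, 645677L, 645342L),
  price = c(22.6, 19.8, 30.6, 45.9, 43.3, 97.2, 35.7, 43.9, 92.9),
  sales = c(24L, 20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L)
)
lvl_list <- list(c("meta_cat"), c("cat"))

# Hypothetical helper: each recursion level plays the role of one for loop.
# cols is a character vector of the grouping columns still to be applied.
nested_apply <- function(data, cols, main_code) {
  # No levels left (or an empty list to begin with): run the main code
  if (length(cols) == 0L) {
    return(main_code(data))
  }
  # Split by the current level, then recurse into each piece
  pieces <- split(data, data[[cols[1]]], drop = TRUE)
  lapply(pieces, nested_apply, cols = cols[-1], main_code = main_code)
}

# Placeholder "main code": total sales of the filtered subset
res <- nested_apply(dt, unlist(lvl_list), function(d) sum(d$sales))
res$bakery$bread   # 44
res$fruits$orange  # 143
```

With an empty list, unlist(lvl_list) has length zero, so main_code runs once on the whole dataset, which matches the "loop does not start" case from the question.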
Also, since this is a recursive split, you can use rsplit from the collapse package, which by default splits recursively and returns a nested list:
library(collapse)
lst2 <- rsplit(dt, by = dt[, unlist(lvl_list), with = FALSE])
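If the nested structure itself is not needed and the goal is only to run the main code once per existing combination of level values, a flat split on all columns at once may be enough. This is a base R sketch under assumptions: run_by_levels and main_code are hypothetical names, the data is a plain data.frame (with a data.table, data[cols] would need to be written differently, for example with with = FALSE as above), and sum(d$sales) is a placeholder for the real main code.

```r
# Toy data mirroring the question's table (plain data.frame, base R only)
dt <- data.frame(
  meta_cat = c("bakery", "bakery", "bakery", "bakery", "bakery",
               "fruits", "fruits", "fruits", "fruits"),
  cat = c("bread", "bread", "doughnut", "sandwich", "sandwich",
          "feijoa", "orange", "orange", "feijoa"),
  sku = c(796590L, 796595L, 796588L, 796640L, 796643L,
          645342L, 645675L, 645677L, 645342L),
  price = c(22.6, 19.8, 30.6, 45.9, 43.3, 97.2, 35.7, 43.9, 92.9),
  sales = c(24L, 20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L)
)
lvl_list <- list(c("meta_cat"), c("cat"))

# Hypothetical helper: one call of main_code per observed combination
run_by_levels <- function(data, lvl_list, main_code) {
  cols <- unlist(lvl_list)
  # Empty list: no grouping, run the main code once on everything
  if (length(cols) == 0L) {
    return(list(all = main_code(data)))
  }
  # Splitting by several columns uses their interaction;
  # drop = TRUE discards combinations that never occur (e.g. fruits.bread)
  groups <- split(data, data[cols], drop = TRUE)
  lapply(groups, main_code)
}

res_flat <- run_by_levels(dt, lvl_list, function(d) sum(d$sales))
res_flat[["bakery.bread"]]   # 44
res_flat[["fruits.orange"]]  # 143
```

The result is a flat named list with one element per combination such as "bakery.sandwich", which is usually easier to iterate over or bind together than a nested list.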
data
dt <- structure(list(meta_cat = c("bakery", "bakery", "bakery", "bakery",
"bakery", "fruits", "fruits", "fruits", "fruits"), cat = c("bread",
"bread", "doughnut", "sandwich", "sandwich", "feijoa", "orange",
"orange", "feijoa"), sku = c(796590L, 796595L, 796588L, 796640L,
796643L, 645342L, 645675L, 645677L, 645342L), price = c(22.6,
19.8, 30.6, 45.9, 43.3, 97.2, 35.7, 43.9, 92.9), sales = c(24L,
20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L)), row.names = c(NA, -9L
), class = c("data.table", "data.frame"))
