Assuming we have a data.table with a nested column val
dt <- data.table(
grp = c(1, 2, 1, 3, 4),
val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)
which shows as
> dt
grp val
1: 1 a
2: 2 b,c
3: 1 d,e,f
4: 3 g
5: 4 h,i
Question
I would like to unnest the val column, where a possible option is using tidyr::unnest, i.e.,
> dt %>%
unnest(val)
# A tibble: 9 × 2
grp val
<dbl> <chr>
1 1 a
2 2 b
3 2 c
4 1 d
5 1 e
6 1 f
7 3 g
8 4 h
9 4 i
I am wondering if we can implement it by using the data.table only.
Watch out the order of values in the column grp, I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4.
My data.table Attempt
My attempt is as below
> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
grp val
1: 1 a
2: 2 b
3: 2 c
4: 1 d
5: 1 e
6: 1 f
7: 3 g
8: 4 h
9: 4 i
or
> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
grp val
1: 1 a
2: 2 b
3: 2 c
4: 1 d
5: 1 e
6: 1 f
7: 3 g
8: 4 h
9: 4 i
but I guess there might be some more concise and elegant way to do this, e.g., without creating an auxiliary column id or using rep lengths.
Any idea? Much appreciated!
CodePudding user response:
One more option:
dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
# grp val
# <num> <char>
# 1: 1 a
# 2: 2 b
# 3: 2 c
# 4: 1 d
# 5: 1 e
# 6: 1 f
# 7: 3 g
# 8: 4 h
# 9: 4 i
PS. If you are using data.table 1.4.2 or older by = .I won't work. Instead you can use:
dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]
PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.
CodePudding user response:
I guess this works:
dt[, lapply(.SD, unlist), by = .(grp)]
grp val
1: 1 a
2: 1 d
3: 1 e
4: 1 f
5: 2 b
6: 2 c
7: 3 g
8: 4 h
9: 4 i
