I have checked the documentation at:
https://dataframes.juliadata.org/stable/man/comparisons/#Comparison-with-the-R-package-data.table
but it does not cover several data.table commands that I use often, and I am having trouble finding DataFrames.jl equivalents for them.
I am reusing the example from that page:
library(data.table)
df <- data.table(grp = rep(1:2, 3), x = 6:1, y = 4:9,
z = c(3:7, NA), id = letters[1:6])
df
grp x y z id
1: 1 6 4 3 a
2: 2 5 5 4 b
3: 1 4 6 5 c
4: 2 3 7 6 d
5: 1 2 8 7 e
6: 2 1 9 NA f
Get row count by group:
df[, .N, by=grp]
grp N
1: 1 3
2: 2 3
Add a column holding the row index within each group:
df[, idx := 1:.N, by=grp]
df
grp x y z id idx
1: 1 6 4 3 a 1
2: 2 5 5 4 b 1
3: 1 4 6 5 c 2
4: 2 3 7 6 d 2
5: 1 2 8 7 e 3
6: 2 1 9 NA f 3
Add a column holding the index of each group. Here grp is already a numeric index, but that is often not the case.
df[, grp_index := .GRP, by=grp]
df
grp x y z id idx grp_index
1: 1 6 4 3 a 1 1
2: 2 5 5 4 b 1 2
3: 1 4 6 5 c 2 1
4: 2 3 7 6 d 2 2
5: 1 2 8 7 e 3 1
6: 2 1 9 NA f 3 2
Answer:
There are several ways to do it. Here is an example:
julia> using DataFrames
julia> df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9,
z = [3:7; missing], id = 'a':'f')
6×5 DataFrame
Row │ grp x y z id
│ Int64 Int64 Int64 Int64? Char
─────┼────────────────────────────────────
1 │ 1 6 4 3 a
2 │ 2 5 5 4 b
3 │ 1 4 6 5 c
4 │ 2 3 7 6 d
5 │ 1 2 8 7 e
6 │ 2 1 9 missing f
julia> gdf = groupby(df, :grp)
GroupedDataFrame with 2 groups based on key: grp
First Group (3 rows): grp = 1
Row │ grp x y z id
│ Int64 Int64 Int64 Int64? Char
─────┼───────────────────────────────────
1 │ 1 6 4 3 a
2 │ 1 4 6 5 c
3 │ 1 2 8 7 e
⋮
Last Group (3 rows): grp = 2
Row │ grp x y z id
│ Int64 Int64 Int64 Int64? Char
─────┼────────────────────────────────────
1 │ 2 5 5 4 b
2 │ 2 3 7 6 d
3 │ 2 1 9 missing f
julia> combine(gdf, nrow)
2×2 DataFrame
Row │ grp nrow
│ Int64 Int64
─────┼──────────────
1 │ 1 3
2 │ 2 3
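If the count column should be called N, as in the data.table output, nrow can be paired with a target column name. A minimal sketch, rebuilding a reduced version of the example table so that it runs on its own (the variable name counts is only illustrative):

```julia
using DataFrames

# Reduced version of the example table; only the grouping column matters here.
df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1)

# `nrow => :N` stores the per-group row count in a column named N,
# roughly mirroring data.table's df[, .N, by=grp].
counts = combine(groupby(df, :grp), nrow => :N)
```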
julia> transform!(gdf, :grp => eachindex => :idx)
6×6 DataFrame
Row │ grp x y z id idx
│ Int64 Int64 Int64 Int64? Char Int64
─────┼───────────────────────────────────────────
1 │ 1 6 4 3 a 1
2 │ 2 5 5 4 b 1
3 │ 1 4 6 5 c 2
4 │ 2 3 7 6 d 2
5 │ 1 2 8 7 e 3
6 │ 2 1 9 missing f 3
julia> df.grp_index = groupindices(gdf)
6-element Vector{Union{Missing, Int64}}:
1
2
1
2
1
2
julia> df
6×7 DataFrame
Row │ grp x y z id idx grp_index
│ Int64 Int64 Int64 Int64? Char Int64 Int64?
─────┼──────────────────────────────────────────────────────
1 │ 1 6 4 3 a 1 1
2 │ 2 5 5 4 b 1 2
3 │ 1 4 6 5 c 2 1
4 │ 2 3 7 6 d 2 2
5 │ 1 2 8 7 e 3 1
6 │ 2 1 9 missing f 3 2
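Both index columns can probably also be added in a single transform! call. The sketch below assumes DataFrames.jl 1.4 or later, where eachindex and groupindices are accepted directly in the transformation syntax, much like nrow. It uses string group keys to illustrate the case where grp is not already numeric, and passes sort=true so that the group numbering follows the sorted keys rather than an unspecified order.

```julia
using DataFrames

# String group keys, to show the case where grp is not a numeric index.
df = DataFrame(grp = ["b", "a", "b", "a", "b", "a"], x = 6:-1:1)

# sort=true makes the group order (and so the group numbering) well defined.
gdf = groupby(df, :grp, sort = true)

# eachindex    => row index within each group (like 1:.N in data.table)
# groupindices => index of the group itself   (like .GRP in data.table)
# Assumption: this special syntax needs DataFrames.jl 1.4 or later.
transform!(gdf, eachindex => :idx, groupindices => :grp_index)
```

Because transform! works in place on the parent of gdf, df itself gains the idx and grp_index columns, and the rows keep their original order.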
As @phipsgabler commented, you can also use the DataFramesMeta.jl or DataFrameMacros.jl packages if you want non-standard evaluation syntax (the code above does not rely on non-standard evaluation and uses only standard Julia syntax).
You can also chain these operations with Chain.jl if you prefer.
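For illustration, here is a rough sketch of the same two grouped operations written with DataFramesMeta.jl macros and a Chain.jl pipeline. It assumes both packages are installed alongside DataFrames.jl. The variable names counts and indexed are only illustrative.

```julia
using DataFrames, DataFramesMeta, Chain

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9,
               z = [3:7; missing], id = 'a':'f')

# Row count per group, roughly data.table's df[, .N, by=grp].
counts = @by(df, :grp, :N = length(:x))

# Row index within each group, roughly df[, idx := 1:.N, by=grp].
# @chain passes the result of each step as the first argument of the next.
indexed = @chain df begin
    groupby(:grp)
    @transform(:idx = eachindex(:x))
end
```

Unlike transform!, this version leaves df untouched and returns a new data frame.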
