Julia DataFrame - how to get data.table .N, .GRP

Time:01-09

I have checked the documentation at:

https://dataframes.juliadata.org/stable/man/comparisons/#Comparison-with-the-R-package-data.table

but it does not cover the following data.table operations, which I use often and for which I cannot find a DataFrames.jl equivalent.

I am re-using the same example:

library(data.table)
df  <- data.table(grp = rep(1:2, 3), x = 6:1, y = 4:9,
                  z = c(3:7, NA), id = letters[1:6])
df
   grp x y  z id
1:   1 6 4  3  a
2:   2 5 5  4  b
3:   1 4 6  5  c
4:   2 3 7  6  d
5:   1 2 8  7  e
6:   2 1 9 NA  f

Get row count by group:

df[, .N, by=grp]
   grp N
1:   1 3
2:   2 3

Add a column holding the row index within each group:

df[, idx := 1:.N, by=grp]
df
   grp x y  z id idx
1:   1 6 4  3  a   1
2:   2 5 5  4  b   1
3:   1 4 6  5  c   2
4:   2 3 7  6  d   2
5:   1 2 8  7  e   3
6:   2 1 9 NA  f   3

Add a column holding the index of the group each row belongs to. Here grp is already a numeric index, but that is often not the case.

df[, grp_index := .GRP, by=grp]
df
   grp x y  z id idx grp_index
1:   1 6 4  3  a   1         1
2:   2 5 5  4  b   1         2
3:   1 4 6  5  c   2         1
4:   2 3 7  6  d   2         2
5:   1 2 8  7  e   3         1
6:   2 1 9 NA  f   3         2

Answer:

There are several ways to do this. Here is one that uses only functions from DataFrames.jl:

julia> using DataFrames

julia> df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9,
                         z = [3:7; missing], id = 'a':'f')
6×5 DataFrame
 Row │ grp    x      y      z        id
     │ Int64  Int64  Int64  Int64?   Char
─────┼────────────────────────────────────
   1 │     1      6      4        3  a
   2 │     2      5      5        4  b
   3 │     1      4      6        5  c
   4 │     2      3      7        6  d
   5 │     1      2      8        7  e
   6 │     2      1      9  missing  f

julia> gdf = groupby(df, :grp)
GroupedDataFrame with 2 groups based on key: grp
First Group (3 rows): grp = 1
 Row │ grp    x      y      z       id
     │ Int64  Int64  Int64  Int64?  Char
─────┼───────────────────────────────────
   1 │     1      6      4       3  a
   2 │     1      4      6       5  c
   3 │     1      2      8       7  e
⋮
Last Group (3 rows): grp = 2
 Row │ grp    x      y      z        id
     │ Int64  Int64  Int64  Int64?   Char
─────┼────────────────────────────────────
   1 │     2      5      5        4  b
   2 │     2      3      7        6  d
   3 │     2      1      9  missing  f

julia> combine(gdf, nrow)
2×2 DataFrame
 Row │ grp    nrow
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      3

julia> transform!(gdf, :grp => eachindex => :idx)
6×6 DataFrame
 Row │ grp    x      y      z        id    idx
     │ Int64  Int64  Int64  Int64?   Char  Int64
─────┼───────────────────────────────────────────
   1 │     1      6      4        3  a         1
   2 │     2      5      5        4  b         1
   3 │     1      4      6        5  c         2
   4 │     2      3      7        6  d         2
   5 │     1      2      8        7  e         3
   6 │     2      1      9  missing  f         3

julia> df.grp_index = groupindices(gdf)
6-element Vector{Union{Missing, Int64}}:
 1
 2
 1
 2
 1
 2

julia> df
6×7 DataFrame
 Row │ grp    x      y      z        id    idx    grp_index
     │ Int64  Int64  Int64  Int64?   Char  Int64  Int64?
─────┼──────────────────────────────────────────────────────
   1 │     1      6      4        3  a         1          1
   2 │     2      5      5        4  b         1          2
   3 │     1      4      6        5  c         2          1
   4 │     2      3      7        6  d         2          2
   5 │     1      2      8        7  e         3          1
   6 │     2      1      9  missing  f         3          2
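
For reference, the three steps above can probably be written more compactly. The sketch below assumes DataFrames.jl 1.4 or later, where (as far as I know) eachindex and groupindices can be passed to transform! on a GroupedDataFrame without naming a source column. The output column name N is my own choice, picked to match data.table; it is not something DataFrames.jl produces by default.

```julia
using DataFrames

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9,
               z = [3:7; missing], id = 'a':'f')

# .N equivalent: row count per group, with the result column renamed to N
# (the default name would be nrow).
counts = combine(groupby(df, :grp), nrow => :N)

# 1:.N and .GRP equivalents in a single call.
# Assumption: DataFrames.jl >= 1.4, which accepts eachindex and groupindices
# without a source column when operating on a GroupedDataFrame.
transform!(groupby(df, :grp), eachindex => :idx, groupindices => :grp_index)
```

transform! mutates df in place and keeps the original row order, so idx and grp_index line up with the data.table output in the question.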

As @phipsgabler commented, you can also use the DataFramesMeta.jl or DataFrameMacros.jl packages if you prefer non-standard evaluation syntax. The code above does not rely on non-standard evaluation; it uses only standard Julia syntax.
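
As a rough illustration of the macro-based style, here is a minimal sketch using DataFramesMeta.jl. It assumes a recent DataFramesMeta.jl release (0.10 or later), where new columns are written as `:name = expression`; the column names N and idx are my own choices.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1)

# .N equivalent: length of any column within each group.
counts = @by(df, :grp, :N = length(:x))

# 1:.N equivalent: eachindex of a column within each group.
# @transform returns a new data frame in the original row order.
indexed = @transform(groupby(df, :grp), :idx = eachindex(:x))
```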

You can also chain these operations with Chain.jl if you prefer.
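
A minimal sketch of what such a chain could look like, assuming Chain.jl is installed. Inside @chain, the result of each line is passed as the first argument of the next call. The column names idx, N and last_idx are illustrative only.

```julia
using DataFrames, Chain

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1)

result = @chain df begin
    groupby(:grp)                          # groupby(df, :grp)
    transform(:grp => eachindex => :idx)   # add the within-group row index
    groupby(:grp)                          # regroup the returned data frame
    combine(nrow => :N, :idx => maximum => :last_idx)
end
```

Here the largest within-group index equals the group's row count, so N and last_idx should agree for every group.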
