Convenient way to convert dataframe to vector of tuple?

Time:01-23

I'm trying to write a function that converts a dataframe to a vector of tuples in Julia.

For example,

using DataFrames
df = DataFrame(A=1:4, B=4:7, C=10:13)

4×3 DataFrame
 Row │ A      B      C     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      4     10
   2 │     2      5     11
   3 │     3      6     12
   4 │     4      7     13

T = [t for t in zip(df.A, df.B, df.C)]

4-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 4, 10)
 (2, 5, 11)
 (3, 6, 12)
 (4, 7, 13)

T is then exactly the result I wanted.

The problem, however, is that I need to wrap the above process in a function.

So what I need is a way to pass all the columns of the dataframe to the zip function automatically.

The function I want to write has the following form:

using DataFrames

function DataframeToTuple(df)
    T = [t for t in zip(df.first column name, df.second column name, ... df.last column name)]
    return T
end

Is there a convenient way to do this? Thanks a lot.

CodePudding user response:

This is perhaps the shortest way:

julia> Tuple.(eachrow(df))
4-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 4, 10)
 (2, 5, 11)
 (3, 6, 12)
 (4, 7, 13)

It is also useful to know that you can convert a DataFrame to a Vector of NamedTuples in the same way:

julia> NamedTuple.(eachrow(df))
4-element Vector{NamedTuple{(:A, :B, :C), Tuple{Int64, Int64, Int64}}}:
 (A = 1, B = 4, C = 10)
 (A = 2, B = 5, C = 11)
 (A = 3, B = 6, C = 12)
 (A = 4, B = 7, C = 13)
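For the function form the question asks for, the zip-based approach can also be kept by splatting the columns into zip. The following is a minimal sketch, not the only way to do it: the name dataframe_to_tuples is hypothetical, and it assumes the data frame has at least one column.

```julia
using DataFrames

# Hypothetical helper name, chosen for illustration only.
# eachcol(df) iterates over the columns; `...` splats them into zip,
# so zip receives every column as a separate argument.
function dataframe_to_tuples(df::AbstractDataFrame)
    return collect(zip(eachcol(df)...))
end

df = DataFrame(A=1:4, B=4:7, C=10:13)
T = dataframe_to_tuples(df)
# T == [(1, 4, 10), (2, 5, 11), (3, 6, 12), (4, 7, 13)]
```

This should give the same vector as Tuple.(eachrow(df)) above; which of the two reads better is largely a matter of taste.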

CodePudding user response:

A more efficient way to convert a data frame to a vector of NamedTuples, provided it is not very wide (roughly fewer than 1000 columns), is:

julia> Tables.rowtable(df)
4-element Vector{NamedTuple{(:A, :B, :C), Tuple{Int64, Int64, Int64}}}:
 (A = 1, B = 4, C = 10)
 (A = 2, B = 5, C = 11)
 (A = 3, B = 6, C = 12)
 (A = 4, B = 7, C = 13)

If you insist on plain tuples, use Tuple.(Tables.rowtable(df)).

CodePudding user response:

There are quite a few ways of doing this; inevitably, each involves Tuple or NamedTuple.

using DataFrames
df = DataFrame(A=1:4, B=4:7, C=10:13)

4×3 DataFrame
 Row │ A      B      C     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      4     10
   2 │     2      5     11
   3 │     3      6     12
   4 │     4      7     13

[Tuple(df[n,:]) for n in 1:size(df,1)]

4-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 4, 10)
 (2, 5, 11)
 (3, 6, 12)
 (4, 7, 13)

The reason is worth spelling out. As a rough mental model, there are two kinds of types: simple values and collections. String is an example of the former; Tuple and Array are examples of the latter. By convention, type names start with a capital letter.

Some types have a companion function with the same name written in all lower case: String has string, and Tuple has tuple. This is a naming convention that holds for a handful of types, not a rule for every type. The lowercase function takes its arguments and puts them into the new type. For a simple type such as String, a conversion is done; for a collection such as Tuple, each argument is placed into the new collection as a single element, without being converted or iterated.

 # String
 string(1.0)       # -> "1.0"
 string(1:3)       # -> "1:3"

 # Tuple
 tuple(1.0)        # -> (1.0,)
 tuple(1:3)        # -> (1:3,) 
 tuple([1,2,3,4])  # -> ([1, 2, 3, 4],)

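The function with exactly the same name as the type, capital letter included, is the type's constructor. For a collection type, the constructor iterates over its argument and makes each item an element of the new collection, so this is the one that does the conversion.

 # Tuple
 Tuple(1.0)        # -> (1.0,)   (a number iterates as a single item)
 Tuple(1:3)        # -> (1, 2, 3)
 Tuple([1,2,3,4])  # -> (1, 2, 3, 4)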

Hence, when converting rows to tuples here, we have to use the constructor Tuple (or NamedTuple) rather than the lowercase tuple.
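The difference shows up clearly when both functions are broadcast over a collection of rows. The sketch below uses plain vectors as stand-ins for data frame rows, which is an assumption made only to keep the example free of package dependencies; the names rows, wrapped and converted are illustrative.

```julia
# Plain vectors stand in for data frame rows (an assumption for brevity).
rows = [[1, 4, 10], [2, 5, 11]]

# tuple wraps each row as the single element of a 1-tuple.
wrapped = tuple.(rows)
# wrapped[1] == ([1, 4, 10],)

# Tuple iterates each row, so the row's items become the tuple's elements.
converted = Tuple.(rows)
# converted[1] == (1, 4, 10)
```

Only the second form produces the flat tuples the question asks for.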
