How to store my training examples in a Pandas dataframe?

I currently have my training examples stored in a Python list. Each training example is a dictionary with the following structure:

example = {
  "features" : {
     "position" : np.array([[0,1], [1,2], [2, 3], [3, 4]]),
     "type"     : np.array([-1, -1, 2, 1])
   },
  "labels" : np.array([[2,1], [3,2], [4, 3], [5, 4]])
}

What would be the correct structure to store this in a pandas DataFrame? Can a column have a NumPy array as its datatype? I think that would be ideal, but it does not seem to be possible.

Answer:

Try pd.json_normalize, optionally followed by df.explode, depending on how you want to treat the rows of each array:

import numpy as np
import pandas as pd
import tabulate  # not used directly; DataFrame.to_markdown() needs it installed

example = {
  "features" : {
     "position" : np.array([[0,1], [1,2], [2, 3], [3, 4]]),
     "type"     : np.array([-1, -1, 2, 1])
   },
  "labels" : np.array([[2,1], [3,2], [4, 3], [5, 4]])
}

df = pd.json_normalize(example, sep='_')
# df = pd.json_normalize([example, example], sep='_') <-- for a list of examples

print(df.to_markdown())

|    | labels   | features_position   | features_type   |
|---:|:---------|:--------------------|:----------------|
|  0 | [[2 1]   | [[0 1]              | [-1 -1  2  1]   |
|    |  [3 2]   |  [1 2]              |                 |
|    |  [4 3]   |  [2 3]              |                 |
|    |  [5 4]]  |  [3 4]]             |                 |
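
On the question of whether a column can hold NumPy arrays: as far as I know there is no dedicated array column type in plain pandas, but the columns produced above have the generic object dtype, and each cell still holds the original array. A minimal sketch to check this, reusing the example from the question (the variable names dtypes and position are only for illustration):

```python
import numpy as np
import pandas as pd

example = {
    "features": {
        "position": np.array([[0, 1], [1, 2], [2, 3], [3, 4]]),
        "type": np.array([-1, -1, 2, 1]),
    },
    "labels": np.array([[2, 1], [3, 2], [4, 3], [5, 4]]),
}

df = pd.json_normalize(example, sep="_")

# Column dtypes as strings, keyed by column name.
dtypes = df.dtypes.astype(str).to_dict()

# The single cell of a column is still the ndarray that was put in.
position = df.loc[0, "features_position"]

print(dtypes)
print(type(position), position.shape)
```

Because the dtype is object, pandas treats each array as an opaque Python object, so vectorised column operations will generally not reach inside the arrays.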

print(df.explode('features_type').to_markdown())

|    | labels   | features_position   |   features_type |
|---:|:---------|:--------------------|----------------:|
|  0 | [[2 1]   | [[0 1]              |              -1 |
|    |  [3 2]   |  [1 2]              |                 |
|    |  [4 3]   |  [2 3]              |                 |
|    |  [5 4]]  |  [3 4]]             |                 |
|  0 | [[2 1]   | [[0 1]              |              -1 |
|    |  [3 2]   |  [1 2]              |                 |
|    |  [4 3]   |  [2 3]              |                 |
|    |  [5 4]]  |  [3 4]]             |                 |
|  0 | [[2 1]   | [[0 1]              |               2 |
|    |  [3 2]   |  [1 2]              |                 |
|    |  [4 3]   |  [2 3]              |                 |
|    |  [5 4]]  |  [3 4]]             |                 |
|  0 | [[2 1]   | [[0 1]              |               1 |
|    |  [3 2]   |  [1 2]              |                 |
|    |  [4 3]   |  [2 3]              |                 |
|    |  [5 4]]  |  [3 4]]             |                 |
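
If you keep one row per example (no explode), the arrays can be stacked back into batches when you need them for training. This is only a sketch under assumptions: make_example is a hypothetical helper that builds examples shaped like the one in the question, and every example is assumed to have arrays of the same shape, otherwise np.stack will raise an error.

```python
import numpy as np
import pandas as pd


def make_example(offset):
    # Hypothetical helper: builds one example with the same layout as in the question.
    position = np.array([[0, 1], [1, 2], [2, 3], [3, 4]]) + offset
    return {
        "features": {
            "position": position,
            "type": np.array([-1, -1, 2, 1]),
        },
        "labels": position + np.array([2, 0]),
    }


examples = [make_example(0), make_example(10)]

# One row per training example, one column per (flattened) key.
df = pd.json_normalize(examples, sep="_")

# Stack the per-row arrays into batch arrays with a leading "example" axis.
X_position = np.stack(df["features_position"].to_list())
X_type = np.stack(df["features_type"].to_list())
y = np.stack(df["labels"].to_list())

print(df.shape, X_position.shape, X_type.shape, y.shape)
```

Whether the round trip through a DataFrame is worth it depends on what you want pandas for; if you only need the stacked arrays, building them directly from the list of dictionaries would also work.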