I currently have my training examples stored in a Python list. Each training example is a dictionary with the following structure:
example = {
"features" : {
"position" : np.array([[0,1], [1,2], [2, 3], [3, 4]]),
"type" : np.array([-1, -1, 2, 1])
},
"labels" : np.array([[2,1], [3,2], [4, 3], [5, 4]])
}
What would be the correct structure to store this in a pandas DataFrame? Can I have a NumPy array as a column datatype? That would be ideal, I think, but it does not seem possible.
CodePudding user response:
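A column can hold NumPy arrays as cell values, although its dtype is then object rather than a dedicated array type. You could try pd.json_normalize, optionally followed by df.explode, depending on how you want to treat each row in your arrays: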
import numpy as np
import pandas as pd
import tabulate  # required by DataFrame.to_markdown
example = {
"features" : {
"position" : np.array([[0,1], [1,2], [2, 3], [3, 4]]),
"type" : np.array([-1, -1, 2, 1])
},
"labels" : np.array([[2,1], [3,2], [4, 3], [5, 4]])
}
df = pd.json_normalize(example, sep='_')
# df = pd.json_normalize([example, example], sep='_') <-- for a list of examples
print(df.to_markdown())
| | labels | features_position | features_type |
|---:|:---------|:--------------------|:----------------|
| 0 | [[2 1] | [[0 1] | [-1 -1 2 1] |
| | [3 2] | [1 2] | |
| | [4 3] | [2 3] | |
| | [5 4]] | [3 4]] | |
# explode gives one row per element of features_type; the other columns are repeated
print(df.explode('features_type').to_markdown())
| | labels | features_position | features_type |
|---:|:---------|:--------------------|----------------:|
| 0 | [[2 1] | [[0 1] | -1 |
| | [3 2] | [1 2] | |
| | [4 3] | [2 3] | |
| | [5 4]] | [3 4]] | |
| 0 | [[2 1] | [[0 1] | -1 |
| | [3 2] | [1 2] | |
| | [4 3] | [2 3] | |
| | [5 4]] | [3 4]] | |
| 0 | [[2 1] | [[0 1] | 2 |
| | [3 2] | [1 2] | |
| | [4 3] | [2 3] | |
| | [5 4]] | [3 4]] | |
| 0 | [[2 1] | [[0 1] | 1 |
| | [3 2] | [1 2] | |
| | [4 3] | [2 3] | |
| | [5 4]] | [3 4]] | |
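To check that the arrays survive the round trip, here is a minimal sketch under a few assumptions: the list `examples` simply repeats the example from the question twice (hypothetical data), and every example has arrays of the same shape so they can be stacked. It shows that the columns have object dtype, that each cell is still the original ndarray, and that np.stack can rebuild a batch array from a column.

```python
import numpy as np
import pandas as pd

# Hypothetical list of training examples: the question's example repeated twice.
examples = [
    {
        "features": {
            "position": np.array([[0, 1], [1, 2], [2, 3], [3, 4]]),
            "type": np.array([-1, -1, 2, 1]),
        },
        "labels": np.array([[2, 1], [3, 2], [4, 3], [5, 4]]),
    }
] * 2

# One row per example; nested keys are flattened to features_position, features_type.
df = pd.json_normalize(examples, sep="_")

# The column dtype is object, but each cell is still a NumPy array.
column_dtype = df["features_position"].dtype
first_cell = df.loc[0, "features_position"]

# Rebuild a single batch array (assumes all examples share the same shape).
positions = np.stack(df["features_position"].to_numpy())
print(column_dtype, type(first_cell).__name__, positions.shape)
```

Because the columns are object dtype, vectorized pandas operations do not apply inside the cells; stacking back to NumPy before doing numerical work is usually the simpler route.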
