I have this dataframe:
COL0 COL1 COL2 COL3
----------------------------
A A1 A11 A111
A A1 A11 A112
A A1 A12 A113
A A1 A12 A114
A A2 A13 A115
A A2 A13 A116
A A2 A14 A117
A A2 A14 A118
And I would like to obtain a dictionary like the below from it. If I just apply the to_dict() method to the original dataframe, the format is not what I would like.
{
'A':{
'A1':{
'A11':['A111', 'A112'],
'A12':['A113', 'A114']
},
'A2':{
'A13':['A115', 'A116'],
'A13':['A117', 'A118']
}
}
}
PS: snippet to generate the above dataframe:
df = pd.DataFrame(
{
'COL0': ['A']*8,
'COL1': ['A1']*4 ['A2']*4,
'COL2': ['A11']*2 ['A12']*2 ['A12']*2 ['A13']*2,
'COL3': [f'A11{i 1}' for i in range(8)]
})
EDIT:
TypeError Traceback (most recent call last) in 1 {a: {k: f.groupby('COL2')['COL3'].apply(list).to_dict() 2 for k, f in g.groupby('COL1')} ----> 3 for a, g in df.groupby('COL0')}
in (.0) 1 {a: {k: f.groupby('COL2')['COL3'].apply(list).to_dict() 2 for k, f in g.groupby('COL1')} ----> 3 for a, g in df.groupby('COL0')}
in (.0) 1 {a: {k: f.groupby('COL2')['COL3'].apply(list).to_dict() ----> 2 for k, f in g.groupby('COL1')} 3 for a, g in df.groupby('COL0')}
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\generic.py in apply(self, func, *args, **kwargs) 219 ) 220 def apply(self, func, *args, **kwargs): --> 221 return super().apply(func, *args, **kwargs) 222 223 @doc(_agg_template, examples=_agg_examples_doc, klass="Series")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in apply(self, func, *args, **kwargs) 865 def apply(self, func, *args, **kwargs): 866 --> 867 func = self._is_builtin_func(func) 868 869 # this is needed so we don't try and wrap strings. If we could
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\base.py in _is_builtin_func(self, arg) 342 otherwise return the arg 343 """ --> 344 return self._builtin_table.get(arg, arg) 345 346
TypeError: unhashable type: 'list'
EDIT2:
TypeError Traceback (most recent call last) in 1 out = {} ----> 2 for keys, v in df.groupby(list(df.columns[:-1]))[df.columns[-1]]: 3 d = out # restart at root 4 val = v.to_list() 5 for k in keys:
TypeError: 'list' object is not callable
CodePudding user response:
Iterate over the rows and collect key-value pairs; use dict.setdefault to set a default value if a key doesn't exist yet:
out = {}
for w,x,y,z in df.to_numpy():
out.setdefault(w, {}).setdefault(x, {}).setdefault(y, []).append(z)
With pd.groupby and a nested dict comprehension:
out = {a: {k: f.groupby('COL2')['COL3'].apply(list).to_dict()
for k, f in g.groupby('COL1')}
for a, g in df.groupby('COL0')}
Output:
{'A': {'A1': {'A11': ['A111', 'A112'], 'A12': ['A113', 'A114']},
'A2': {'A13': ['A115', 'A116'], 'A14': ['A117', 'A118']}}}
CodePudding user response:
Here is a way that works with an arbitrary number of columns.
It uses first groupby to set the groups per all but the last column.
Then it goes down the tree, changing the reference to the parent until it reaches the last key.
out = {}
for keys, v in df.groupby(list(df.columns[:-1]))[df.columns[-1]]:
d = out # restart at root
val = v.to_list()
for k in keys:
if k not in d:
d[k] = {} # create child if missing
parent = d
d = d[k] # go down in nested level
parent[k] = val # we reached the bottom, set the value
output:
{'A': {'A1': {'A11': ['A111', 'A112'],
'A12': ['A113', 'A114']},
'A2': {'A13': ['A115', 'A116'],
'A14': ['A117', 'A118']},
},
}
Example on further nested dictionary:
from itertools import product
df = (pd.DataFrame(product('AB', '12', '123', '1', '12'))
.add_prefix('COL').cumsum(1)
)
# COL0 COL1 COL2 COL3 COL4
# 0 A A1 A11 A111 A1111
# 1 A A1 A11 A111 A1112
# 2 A A1 A12 A121 A1211
# 3 A A1 A12 A121 A1212
# 4 A A1 A13 A131 A1311
# ...
# 23 B B2 B23 B231 B2312
output:
{'A': {'A1': {'A11': {'A111': ['A1111', 'A1112']},
'A12': {'A121': ['A1211', 'A1212']},
'A13': {'A131': ['A1311', 'A1312']}},
'A2': {'A21': {'A211': ['A2111', 'A2112']},
'A22': {'A221': ['A2211', 'A2212']},
'A23': {'A231': ['A2311', 'A2312']}}},
'B': {'B1': {'B11': {'B111': ['B1111', 'B1112']},
'B12': {'B121': ['B1211', 'B1212']},
'B13': {'B131': ['B1311', 'B1312']}},
'B2': {'B21': {'B211': ['B2111', 'B2112']},
'B22': {'B221': ['B2211', 'B2212']},
'B23': {'B231': ['B2311', 'B2312']}}}}
