For every unique combination of the first two index levels, I want all rows of the third index level (along with its name) collected into a single JSON string column.
For example:
import pandas as pd
recs = [{'id':123,'color':'orange','store':'big_mart', 'price':6}, {'id':123,'color':'orange','store':'Buckyz','price':5}, {'id':234,'color':'blue', 'store':'Gmart','price':7}]
d3 = pd.DataFrame(recs)
d3.set_index(['id','color','store'])
My intended output should look something like this:
| id | color | new_col |
|---|---|---|
| 123 | 'orange' | "[ {'store':'big_mart','price': 6}, {'store':'Buckyz','price':5}]" |
| 234 | 'blue' | "[{'store':'Gmart','price':7}]" |
The best I could do is this:
d3.reset_index()
d3.groupby(['id','color']).apply(lambda x: x[['store','price']].to_json(orient='records'))
id color
123 orange [{"store":"big_mart","price":6},{"store":"Buck...
234 blue [{"store":"Gmart","price":7}]
dtype: object
It works, but I'm having trouble attaching that output as a column of a deduplicated dataframe (one row per id/color pair).
CodePudding user response:
You can group by "id" and "color" and then apply to_dict with the orient parameter set to "records" to each group:
out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_dict('records')).astype(str).reset_index(name='new_col')
or use to_json (saves conversion to type str):
out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_json(orient='records')).reset_index(name='new_col')
Output (from the to_dict version; to_json produces the same records with double quotes):
id color new_col
0 123 orange [{'store': 'big_mart', 'price': 6}, {'store': 'Buckyz', 'price': 5}]
1 234 blue [{'store': 'Gmart', 'price': 7}]
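One difference between the two variants is worth checking: astype(str) stores the Python repr of the list (single quotes), while to_json stores a string that a JSON parser accepts. Here is a minimal sketch of that check, assuming the same recs as in the question. The name parsed is only illustrative.

```python
import json
import pandas as pd

recs = [
    {'id': 123, 'color': 'orange', 'store': 'big_mart', 'price': 6},
    {'id': 123, 'color': 'orange', 'store': 'Buckyz', 'price': 5},
    {'id': 234, 'color': 'blue', 'store': 'Gmart', 'price': 7},
]
d3 = pd.DataFrame(recs)

# One row per (id, color); new_col holds a JSON string built from the group's rows.
out = (
    d3.groupby(['id', 'color'])[['store', 'price']]
      .apply(lambda x: x.to_json(orient='records'))
      .reset_index(name='new_col')
)

# Each cell is a JSON string, so json.loads turns it back into a list of dicts.
parsed = out['new_col'].map(json.loads)
print(parsed.iloc[0])
```

If the column has to be consumed by something that expects JSON, the to_json variant is probably the safer choice of the two.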
CodePudding user response:
Try this:
cols = ['id', 'color']
new_df = d3.groupby(cols).apply(lambda x: x.drop(cols, axis=1).to_dict('records')).reset_index(name='new_col')
Output:
>>> new_df
id color new_col
0 123 orange [{'store': 'big_mart', 'price': 6}, {'store': ...
1 234 blue [{'store': 'Gmart', 'price': 7}]
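If d3 really carries the three-level index described in the question, the same idea can be sketched as below. It assumes the index is flattened with reset_index first, and it selects the non-key columns before apply rather than dropping them inside the lambda. As far as I know, whether the grouping columns are passed to the applied function differs between pandas versions, so selecting up front avoids relying on that. The names keys, flat and value_cols are illustrative only.

```python
import pandas as pd

recs = [
    {'id': 123, 'color': 'orange', 'store': 'big_mart', 'price': 6},
    {'id': 123, 'color': 'orange', 'store': 'Buckyz', 'price': 5},
    {'id': 234, 'color': 'blue', 'store': 'Gmart', 'price': 7},
]
# Frame with the three-level index from the question.
d3 = pd.DataFrame(recs).set_index(['id', 'color', 'store'])

keys = ['id', 'color']
flat = d3.reset_index()  # move the index levels back into ordinary columns
value_cols = [c for c in flat.columns if c not in keys]  # ['store', 'price']

# One row per (id, color); new_col holds the list of remaining records.
new_df = (
    flat.groupby(keys)[value_cols]
        .apply(lambda x: x.to_dict('records'))
        .reset_index(name='new_col')
)
print(new_df)
```

Here new_col holds Python lists; swapping to_dict('records') for to_json(orient='records') would give JSON strings instead.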
