Pandas, transform inner most index into a json string column or a list of dictionaries

Time:02-01

For every unique combination of the first two index levels, I want all rows of the third level (together with its name) collapsed into a single JSON string column.

For example


import pandas as pd

recs = [{'id':123,'color':'orange','store':'big_mart', 'price':6}, {'id':123,'color':'orange','store':'Buckyz','price':5}, {'id':234,'color':'blue', 'store':'Gmart','price':7}]
d3 = pd.DataFrame(recs)
d3.set_index(['id','color','store'])  # returns a new, indexed frame; d3 itself is unchanged

My intended output should look something like this:

id   color     new_col
123  'orange'  "[{'store':'big_mart','price':6}, {'store':'Buckyz','price':5}]"
234  'blue'    "[{'store':'Gmart','price':7}]"

The best I could do is this

d3.reset_index()
d3.groupby(['id','color']).apply(lambda x: x[['store','price']].to_json(orient='records'))

id   color 
123  orange    [{"store":"big_mart","price":6},{"store":"Buck...
234  blue                          [{"store":"Gmart","price":7}]
dtype: object

It works, but I'm having trouble attaching this output as a column of a deduplicated dataframe.

CodePudding user response:

You can group by "id" and "color", then apply to_dict with the orient parameter set to "records" to each group:

out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_dict('records')).astype(str).reset_index(name='new_col')

Or use to_json, which already returns a string, so no conversion with astype(str) is needed:

out = d3.groupby(['id', 'color'])[['store', 'price']].apply(lambda x: x.to_json(orient='records')).reset_index(name='new_col')

Output (from the to_dict version; to_json gives the same records with double quotes):

    id   color                                            new_col
0  123  orange  [{'store': 'big_mart', 'price': 6}, {'store': 'Buckyz', 'price': 5}]
1  234    blue                   [{'store': 'Gmart', 'price': 7}]
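
The two variants above look alike when printed but are not interchangeable. As a rough sketch (the names as_repr, as_json, parsed and repr_is_json are made up for illustration), the to_dict + astype(str) cell is a Python repr with single quotes, while the to_json cell is a real JSON string that json.loads can read back:

```python
import json
import pandas as pd

recs = [
    {'id': 123, 'color': 'orange', 'store': 'big_mart', 'price': 6},
    {'id': 123, 'color': 'orange', 'store': 'Buckyz', 'price': 5},
    {'id': 234, 'color': 'blue', 'store': 'Gmart', 'price': 7},
]
d3 = pd.DataFrame(recs)

# Selecting the value columns first keeps 'id' and 'color' out of each record.
grouped = d3.groupby(['id', 'color'])[['store', 'price']]

# Variant 1: list of dicts converted with str(), i.e. a Python repr, not JSON.
as_repr = (
    grouped.apply(lambda g: g.to_dict('records'))
           .astype(str)
           .reset_index(name='new_col')
)

# Variant 2: one JSON string per group.
as_json = (
    grouped.apply(lambda g: g.to_json(orient='records'))
           .reset_index(name='new_col')
)

# The JSON cell parses back into a list of dicts.
parsed = json.loads(as_json.loc[0, 'new_col'])

# The repr cell uses single quotes, which the JSON parser rejects.
try:
    json.loads(as_repr.loc[0, 'new_col'])
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False
```

If the column will be consumed by anything that expects JSON, the to_json variant is probably the safer choice.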

CodePudding user response:

Try this:

cols = ['id', 'color']
new_df = d3.groupby(cols).apply(lambda x: x.drop(cols, axis=1).to_dict('records')).reset_index(name='new_col')

Output:

>>> new_df
    id   color                                            new_col
0  123  orange  [{'store': 'big_mart', 'price': 6}, {'store': ...
1  234    blue                   [{'store': 'Gmart', 'price': 7}]
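
For the part of the question about attaching the result to a deduplicated dataframe, one possible approach is a merge on the two key columns. This is only a sketch: dedup is a hypothetical stand-in for the asker's deduplicated frame, and records and merged are illustrative names.

```python
import pandas as pd

recs = [
    {'id': 123, 'color': 'orange', 'store': 'big_mart', 'price': 6},
    {'id': 123, 'color': 'orange', 'store': 'Buckyz', 'price': 5},
    {'id': 234, 'color': 'blue', 'store': 'Gmart', 'price': 7},
]
d3 = pd.DataFrame(recs)

# Hypothetical deduplicated frame: one row per (id, color) pair.
dedup = d3[['id', 'color']].drop_duplicates().reset_index(drop=True)

# One list of {'store', 'price'} dicts per (id, color) group.
records = (
    d3.groupby(['id', 'color'])[['store', 'price']]
      .apply(lambda g: g.to_dict('records'))
      .reset_index(name='new_col')
)

# A left merge keeps every row of dedup and adds the matching new_col.
merged = dedup.merge(records, on=['id', 'color'], how='left')
print(merged)
```

Because reset_index(name='new_col') turns the group keys back into ordinary columns, the merge needs nothing beyond the shared 'id' and 'color' columns.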