Create new dataframe columns based on lists of indices in a column and another dictionary

Time:01-22

Given the following dataframe and list of dictionaries:

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict([
                        {'id': '912SAFD', 'key': 3, 'list_index': [0]},
                        {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
                        {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]}, 
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]}, 
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

Dataframe output:

        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]

Is it possible, ideally without explicit loops, to take each index in a row's 'list_index' list, look up the entry at that position in designs, and create new columns from the values in that entry's dictionaries?

Here is an example of the expected result:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr

If 'lang' is not empty, it should also become a column, named by joining the color_id value and the key name with an underscore, for example 609092_lang.

Any help would be much appreciated.

CodePudding user response:

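# Flatten to one dict per inner design and build a tidy dataframe from them.
# Row positions match 'list_index' only while each entry holds exactly one design.
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)

# One row per listed index, matched to df_designs by position, then color_id spread into columns
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()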

print(df)

Output:

color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
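The result above keeps lang as an ordinary column and drops list_index, so it does not quite match the layout asked for in the question. One possible variation is sketched below; it is not the answerer's code. The names flat, cells, exploded and wide are made up for illustration, the new column labels are strings, and it assumes every list in 'list_index' is non-empty and that no row reaches the same color_id twice (pivot raises on duplicates).

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]

# One row per inner design, tagged with its position in the outer list.
flat = pd.DataFrame(
    [{'list_index': i, **info} for i, entry in enumerate(designs) for info in entry['designs']]
)

# Long format: one row per new cell (target column name plus cell value).
values = pd.DataFrame({
    'list_index': flat['list_index'],
    'col': flat['color_id'].astype(str),
    'cell': flat['value'],
})
with_lang = flat[flat['lang'] != '']
langs = pd.DataFrame({
    'list_index': with_lang['list_index'],
    'col': with_lang['color_id'].astype(str) + '_lang',
    'cell': with_lang['lang'],
})
cells = pd.concat([values, langs], ignore_index=True)

# Explode the index lists, remembering which original row each index came from.
exploded = df['list_index'].explode().astype(int).rename('list_index').reset_index()

# Attach the cells, spread them into columns, and join back on the row label.
wide = exploded.merge(cells, on='list_index').pivot(index='index', columns='col', values='cell')
out = df.join(wide)
print(out)
```

Joining back onto the original df is what preserves list_index; swapping pivot for pivot_table with aggfunc='first' would be one way to tolerate repeated color_id values.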

CodePudding user response:

First, we need to reshape the designs list into a mapper from each list position to the column/value dicts it should produce. Use enumerate and dict.setdefault for that:

designs_dict = {}
for i, des in enumerate(designs):
    info = des['designs'][0]  # only the first inner design of each entry is read
    color_id = info['color_id']
    designs_dict.setdefault(i, []).append({color_id: info['value']})
    if info['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id): info['lang']})

Now designs_dict looks like this:

{0: [{609090: 'b'}], 
 1: [{609091: 'c'}], 
 2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

Then

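(i) explode "list_index" so each index gets its own row, map "designs_dict" onto it, then explode again so each row holds a single dict instead of a list of dicts

(ii) construct a DataFrame from (i), then group by the original row index and use first, which keeps the first non-null value per column and collapses the result back to one row per original row

(iii) join (ii) to df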

s_from_designs = df['list_index'].explode().map(designs_dict).explode()
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
out = df.join(df_from_designs)

Final output:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b   None   None        None
1  812SAFD    4     [0, 1]      b      c   None        None
2  712SAFD    5        [2]   None   None      d          fr
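The mapper above reads only des['designs'][0]. If an entry could hold several inner designs, a variation along the following lines might work. This is a sketch rather than part of the answer: entry_to_columns, per_index and row_dicts are hypothetical names, and the lambda is still an implicit loop over each row's indices.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]


def entry_to_columns(entry):
    """Hypothetical helper: flatten every inner design of one entry into a single dict."""
    cols = {}
    for info in entry['designs']:
        cols[info['color_id']] = info['value']
        if info['lang'] != '':
            cols['{}_lang'.format(info['color_id'])] = info['lang']
    return cols


# Position i holds the new columns contributed by designs[i].
per_index = [entry_to_columns(entry) for entry in designs]

# Merge the dicts of all indices listed in a row into one dict per row.
row_dicts = df['list_index'].map(
    lambda idxs: {k: v for i in idxs for k, v in per_index[i].items()}
)

# One dict per row becomes one DataFrame row; keys missing from a row are left empty.
new_cols = pd.DataFrame(row_dicts.tolist(), index=df.index)
out = df.join(new_cols)
print(out)
```

Because each row ends up with exactly one dict, the second explode and the groupby are no longer needed. If two indices in the same row supplied the same key, the later one would win.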