Home > Enterprise >  How to convert list of nested dictionary into Pandas dataframe. But nested dictionary list length is
How to convert list of nested dictionary into Pandas dataframe. But nested dictionary list length is

Time:02-04

I want convert the nested dictionary list into data frame. but Nested dictionary each key's List has different length. for instance, the below nested dictionary keys x's,y's,z's list has different lengths.

dict_example = {
    'a': {
        'x': ['f1', 'f2', 'f6'],
        'y': ['f6', 'f9', 'f2', 'f8'],
        'z': ['f1', 'f9', 'f2', 'f8', 'f6', 'f10', 'f3']
    },
    'b': {
        'x': ['f1', 'f2', 'f6', 'f4'],
        'y': ['f6', 'f9', 'f2', 'f8', 'f17','f12'],
        'z': ['f1', 'f9', 'f2', 'f10', 'f3']
    }
}
from tabulate import tabulate
import pandas as pd
df = pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in dict_example.items()]))
print(tabulate(df, headers='keys', tablefmt='psql'))

the above code is giving the output like the below table. But i want to get the each value of the list into a row that belongs to nested key. key a and b are main columns.

 ---- --------------------------------------------- --------------------------------- 
|    | a                                           | b                               |
|---- --------------------------------------------- ---------------------------------|
| x  | ['f1', 'f2', 'f6']                          | ['f1', 'f2', 'f6', 'f4']        |
| y  | ['f6', 'f9', 'f2', 'f8']                    | ['f6', 'f9', 'f2', 'f8', 'f17'] |
| z  | ['f1', 'f9', 'f2', 'f8', 'f6', 'f10', 'f3'] | ['f1', 'f9', 'f2', 'f10', 'f3'] |
 ---- --------------------------------------------- --------------------------------- 

The output data frame should look like this.

    a   b
x   f1  f1
    f2  f2
    f6  f6
    NaN f4
y   f6  f6
    f9  f9
    f2  f2
    f8  f8
    NaN f17
z   f1  f1
    f9  f9
    f2  f2
    f8  f10
    f6  f3
    f10 NaN
    f3  NaN

CodePudding user response:

Use dict comprehension with concat and DataFrame.stack:

d = {k: pd.DataFrame({k1:pd.Series(v1) for k1, v1 in v.items()}) 
                      for k,v in dict_example.items()}

df = pd.concat(d, axis=1).stack()
print (df)
       a    b
x 0   f1   f1
  1   f2   f2
  2   f6   f6
  3  NaN   f4
y 0   f6   f6
  1   f9   f9
  2   f2   f2
  3   f8   f8
  4  NaN  f17
  5  NaN  f12
z 0   f1   f1
  1   f9   f9
  2   f2   f2
  3   f8  f10
  4   f6   f3
  5  f10  NaN
  6   f3  NaN

CodePudding user response:

This approach seems pretty fast using itertools and numpy:

import pandas as pd
import numpy as np
import itertools

def process(a, b):
  x = np.array(list(itertools.zip_longest(*[a, b],fillvalue=np.nan))).T
  return x[0], x[1]
df = pd.DataFrame(dict_example)
df[['a','b']] = df.apply(lambda x: process(x.a, x.b), axis=1, result_type='expand')
df = df.explode(['a', 'b'])
     a    b
x   f1   f1
x   f2   f2
x   f6   f6
x  nan   f4
y   f6   f6
y   f9   f9
y   f2   f2
y   f8   f8
y  nan  f17
y  nan  f12
z   f1   f1
z   f9   f9
z   f2   f2
z   f8  f10
z   f6   f3
z  f10  nan
z   f3  nan

Or even faster with to_numpy():

def process(A, B):
  a_col, b_col = [], []
  for a,b in zip(A,B):
    x = np.array(list(itertools.zip_longest(*[a, b],fillvalue=np.nan))).T
    a_col.append(x[0])
    b_col.append(x[1])
  return a_col, b_col

df = pd.DataFrame(dict_example)
df['a'], df['b'] = process(df['a'].to_numpy(), df['b'].to_numpy())
df = df.explode(['a', 'b'])
  •  Tags:  
  • Related