Unnest dict to unique key value pairs

Time:01-21

I would like to use the pd.DataFrame.from_dict() method to generate a two-column dataframe. The objective is to convert a dictionary of sets into all unique key-value pairs, for example:

{'83d945fffffffff': {'83d940fffffffff',
  '83d941fffffffff',
  '83d944fffffffff',
  '83d963fffffffff',
  '83d96afffffffff',
  '83d96efffffffff'},
 '83bcf2fffffffff': {'83bc8dfffffffff', '83bcf6fffffffff'}...

should become

                 k                v
0  83d945fffffffff  83d940fffffffff
1  83d945fffffffff  83d941fffffffff
2  83d945fffffffff  83d944fffffffff
3  83d945fffffffff  83d963fffffffff
4  83d945fffffffff  83d96afffffffff
5  83d945fffffffff  83d96efffffffff
6  83bcf2fffffffff  83bc8dfffffffff
7  83bcf2fffffffff  83bcf6fffffffff

Specifying orient='index', however, does not provide this result and instead creates NoneType cells:

                  0                1                2                3                4                5                6
    83d945fffffffff  83d96efffffffff  83d963fffffffff  83d941fffffffff  83d940fffffffff  83d944fffffffff  83d96afffffffff
    83bcf2fffffffff  83bc8dfffffffff  83bcf6fffffffff             None             None             None             None

Is there a known workaround or an efficient method for producing a two-column dataframe directly from a dict?

CodePudding user response:

Here is a quick-and-dirty nested loop solution.

import pandas as pd
d = {'83d945fffffffff': {'83d940fffffffff',
  '83d941fffffffff',
  '83d944fffffffff',
  '83d963fffffffff',
  '83d96afffffffff',
  '83d96efffffffff'},
 '83bcf2fffffffff': {'83bc8dfffffffff', '83bcf6fffffffff'}}

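# Collect keys and values in two parallel lists
k, v = [], []
for ki, vi in d.items():
    # set() guarantees each value appears only once per key
    for vii in set(vi):
        k.append(ki)
        v.append(vii)
df = pd.DataFrame({'k': k, 'v': v})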

If you want something more compact, you can build the pairs with a list comprehension:

# One (key, value) tuple per unique value of each key
pairs = [(k, vi) for k, v in d.items() for vi in set(v)]
df = pd.DataFrame(pairs, columns=['k', 'v'])
df
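
The set() call above only matters when the dict values can contain repeats, for example when they are lists rather than sets. A minimal sketch of that case, assuming a hypothetical helper name dict_to_pairs and made-up sample data (neither is from the question); the values are also sorted here so the row order is reproducible:

```python
import pandas as pd


def dict_to_pairs(mapping):
    """Return a two-column DataFrame of unique (k, v) pairs.

    dict_to_pairs is a hypothetical helper name used for illustration.
    """
    # set() drops repeated values per key; sorted() fixes the row order
    pairs = [
        (key, value)
        for key, values in mapping.items()
        for value in sorted(set(values))
    ]
    return pd.DataFrame(pairs, columns=["k", "v"])


# Made-up sample data: key "a" holds a list with a repeated value
example = {"a": ["x", "y", "x"], "b": {"z"}}
pairs_df = dict_to_pairs(example)
print(pairs_df)
```

The repeated "x" under key "a" should appear only once in the result.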

CodePudding user response:

Pass d.items() to pd.DataFrame, then explode the 'v' column:

df = pd.DataFrame(d.items(), columns=['k', 'v']).explode('v').reset_index(drop=True)
print(df)

# Output
                 k                v
0  83d945fffffffff  83d963fffffffff
1  83d945fffffffff  83d96afffffffff
2  83d945fffffffff  83d941fffffffff
3  83d945fffffffff  83d940fffffffff
4  83d945fffffffff  83d944fffffffff
5  83d945fffffffff  83d96efffffffff
6  83bcf2fffffffff  83bcf6fffffffff
7  83bcf2fffffffff  83bc8dfffffffff

Setup:

d = {'83d945fffffffff': {'83d940fffffffff',
  '83d941fffffffff',
  '83d944fffffffff',
  '83d963fffffffff',
  '83d96afffffffff',
  '83d96efffffffff'},
 '83bcf2fffffffff': {'83bc8dfffffffff', '83bcf6fffffffff'}}
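
Because the values are sets, the row order produced by explode is not guaranteed and may differ from the output shown above. If a stable order matters, one option is to sort afterwards. A sketch under that assumption, reusing the question's sample dict; note that sort_values also reorders the keys, and ignore_index needs a reasonably recent pandas:

```python
import pandas as pd

d = {'83d945fffffffff': {'83d940fffffffff',
  '83d941fffffffff',
  '83d944fffffffff',
  '83d963fffffffff',
  '83d96afffffffff',
  '83d96efffffffff'},
 '83bcf2fffffffff': {'83bc8dfffffffff', '83bcf6fffffffff'}}

df = (
    pd.DataFrame(list(d.items()), columns=['k', 'v'])
    .explode('v')                                # one row per set element
    .sort_values(['k', 'v'], ignore_index=True)  # deterministic row order
)
print(df)
```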