I have sample Excel data like this:
Col1 Col2 Col3 Col4 Col5 . . .
MIT1 AD Profile MM MD
MIT1 AES Document MM MD
MA3 ALL Price Retail -
MA4 BEG Group MM Eh
I want to read this Excel file into this format:
{'MIT1': [{'AD':['Profile','MM', 'MD']},{'AES':['Document','MM', 'MD']},...]... }
Here Col1 becomes the key of the outer dictionary, Col2 becomes the key of each inner dictionary, and the values from Col3 onwards become the inner dictionary's value as a list (there are more columns, but I have shown only 5 for brevity).
I wrote this code, but it doesn't produce the desired format:
df1 = pd.read_excel('Example.xlsx', sheet_name='Datamodel',header = [0])
df1 = df1.dropna()
df1_dict = df1.to_dict()
How can I get the correct format?
CodePudding user response:
You can create a Series with a MultiIndex from the first two columns, convert all the other columns to lists, and then build nested dicts (here df is the DataFrame read from the Excel file):
s = df.set_index(['Col1','Col2']).agg(list, axis=1)
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print (d)
{'MA3': {'ALL': ['Price', 'Retail', '-']},
 'MA4': {'BEG': ['Group', 'MM', 'Eh']},
 'MIT1': {'AD': ['Profile', 'MM', 'MD'],
          'AES': ['Document', 'MM', 'MD']}}
CodePudding user response:
What you want to achieve is impossible: dictionaries cannot have duplicate keys.
An alternative might be:
out = {k: v.set_index('Col2').T.to_dict('list')
       for k, v in df.set_index('Col1').groupby(level=0)}
Output:
{'MA3': {'ALL': ['Price', 'Retail', '-']},
 'MA4': {'BEG': ['Group', 'MM', 'Eh']},
 'MIT1': {'AD': ['Profile', 'MM', 'MD'],
          'AES': ['Document', 'MM', 'MD']}}
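If the layout written in the question is wanted literally, with each Col1 value mapping to a list of single-key dictionaries, a plain loop over the rows might be enough. This is only a sketch: the DataFrame below is rebuilt by hand from the sample rows in place of reading Example.xlsx, and the name result is just illustrative.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (assumption: the real
# data would come from pd.read_excel and may have more columns).
df = pd.DataFrame({
    "Col1": ["MIT1", "MIT1", "MA3", "MA4"],
    "Col2": ["AD", "AES", "ALL", "BEG"],
    "Col3": ["Profile", "Document", "Price", "Group"],
    "Col4": ["MM", "MM", "Retail", "MM"],
    "Col5": ["MD", "MD", "-", "Eh"],
})

result = {}
# name=None yields plain tuples; *rest collects Col3 onwards into a list.
for col1, col2, *rest in df.itertuples(index=False, name=None):
    # Append one single-key dict per row to the list kept under Col1.
    result.setdefault(col1, []).append({col2: rest})

print(result)
```

Because each row gets its own inner dictionary inside a list, a repeated Col2 value under the same Col1 is kept as a separate entry instead of overwriting an earlier one.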
