Home > Net >  Splitting nested dictionary with list as Values (with two values) in two dataframes
Splitting nested dictionary with list as Values (with two values) in two dataframes

Time:02-01

I'm fairly new to Python and Pandas and I'm currently working on a paper for university. I have a nested dictionary:

#Normaly this is a dict contains about 400 ID's and is created from an CSV file, so I just added an example dict.
ExampleDict= { 10001: 123123, 10002: 234234, 1003: 345345}

# Just pasted the function I'm using to create a base dict with 0 as values, these are changed later, but are not needed for this example.

def createVDict (BaseDict):
    date_generated = pd.date_range(datetime.date(2019, 6, 1),datetime.date(2022, 1, 16))
    emptyDict = defaultdict(dict)
    for id in BaseDict:
        for Date in date_generated:
            emptyDict[id][Date.strftime('%Y-%m-%d')] = [0, 0]
    return emptyDict

Dict_to_transform = createVDict(ExampleDict)

I now want to transform this Dict into two seperate Dataframes, one for each Value in the list:

ID 1 ID 2
Date 1 Value1 Value1
Date 2 Value1 Value1
ID 1 ID 2
Date 1 Value2 Value2
Date 2 Value2 Value2

From the information I gathered in other posts I think the way it is supposed to work is like this:

  1. From Dataframe from Dict: DictDF = pd.DataFrame.from_dict(Nested_Dict(),orient='columns'):
ID 1 ID 2
Date 1 [Value1, Value2] [Value1, Value2]
Date 2 [Value1, Value2] [Value1, Value2]
  1. Split Columns using pd.DataFrame(DictDF[ID].to_list(), columns=str(ID) 'V1',str(ID) 'V2'])

  2. Safe those columns as a list (as one should not alter Dataframes for perfomance sake).

  3. Create two dataframes from the lists created.

My problem is that I'm not able to combine the different answers I found for the steps. I hope you guys can help me out. My goal is, to compare the two dataframes (for each value) with two other dataframes (which have the same structure) using .corrwith(), so that I have the correlation for value 1 and value 2.

CodePudding user response:

It's not always necessary to use pandas for everything. I've found that it's often much easier to do a little preprocessing before importing to pandas, rather than try to twist my data after it's already in pandas.

import pandas as pd
from collections import defaultdict

data = {
    "ID: 1": {
        "2022-01-01": [1111, 2222],
        "2022-01-02": [3333, 4444]
    },
    "ID: 2": {
        "2022-01-01": [5555, 6666],
        "2022-01-02": [7777, 8888]
    }
}

build1 = defaultdict(list)
build2 = defaultdict(list)
keys = ['date'] list(data.keys())
for k,v in data.items():
    for k1, v1 in v.items():
        build1[k1].append( v1[0] )
        build2[k1].append( v1[1] )
build1 = [[k] v for k,v in build1.items()]
build2 = [[k] v for k,v in build2.items()]
df1 = pd.DataFrame(build1, columns=keys).set_index('date')
df2 = pd.DataFrame(build2, columns=keys).set_index('date')
print(df1)
print(df2)

Output:

            ID: 1  ID: 2
date                    
2022-01-01   1111   5555
2022-01-02   3333   7777
            ID: 1  ID: 2
date                    
2022-01-01   2222   6666
2022-01-02   4444   8888
  •  Tags:  
  • Related