I'm fairly new to Python and Pandas and I'm currently working on a paper for university. I have a nested dictionary:
#Normaly this is a dict contains about 400 ID's and is created from an CSV file, so I just added an example dict.
ExampleDict= { 10001: 123123, 10002: 234234, 1003: 345345}
# Just pasted the function I'm using to create a base dict with 0 as values, these are changed later, but are not needed for this example.
def createVDict (BaseDict):
date_generated = pd.date_range(datetime.date(2019, 6, 1),datetime.date(2022, 1, 16))
emptyDict = defaultdict(dict)
for id in BaseDict:
for Date in date_generated:
emptyDict[id][Date.strftime('%Y-%m-%d')] = [0, 0]
return emptyDict
Dict_to_transform = createVDict(ExampleDict)
I now want to transform this Dict into two seperate Dataframes, one for each Value in the list:
| ID 1 | ID 2 | |
|---|---|---|
| Date 1 | Value1 | Value1 |
| Date 2 | Value1 | Value1 |
| ID 1 | ID 2 | |
|---|---|---|
| Date 1 | Value2 | Value2 |
| Date 2 | Value2 | Value2 |
From the information I gathered in other posts I think the way it is supposed to work is like this:
- From Dataframe from Dict:
DictDF = pd.DataFrame.from_dict(Nested_Dict(),orient='columns'):
| ID 1 | ID 2 | |
|---|---|---|
| Date 1 | [Value1, Value2] | [Value1, Value2] |
| Date 2 | [Value1, Value2] | [Value1, Value2] |
Split Columns using
pd.DataFrame(DictDF[ID].to_list(), columns=str(ID) 'V1',str(ID) 'V2'])Safe those columns as a list (as one should not alter Dataframes for perfomance sake).
Create two dataframes from the lists created.
My problem is that I'm not able to combine the different answers I found for the steps. I hope you guys can help me out. My goal is, to compare the two dataframes (for each value) with two other dataframes (which have the same structure) using .corrwith(), so that I have the correlation for value 1 and value 2.
CodePudding user response:
It's not always necessary to use pandas for everything. I've found that it's often much easier to do a little preprocessing before importing to pandas, rather than try to twist my data after it's already in pandas.
import pandas as pd
from collections import defaultdict
data = {
"ID: 1": {
"2022-01-01": [1111, 2222],
"2022-01-02": [3333, 4444]
},
"ID: 2": {
"2022-01-01": [5555, 6666],
"2022-01-02": [7777, 8888]
}
}
build1 = defaultdict(list)
build2 = defaultdict(list)
keys = ['date'] list(data.keys())
for k,v in data.items():
for k1, v1 in v.items():
build1[k1].append( v1[0] )
build2[k1].append( v1[1] )
build1 = [[k] v for k,v in build1.items()]
build2 = [[k] v for k,v in build2.items()]
df1 = pd.DataFrame(build1, columns=keys).set_index('date')
df2 = pd.DataFrame(build2, columns=keys).set_index('date')
print(df1)
print(df2)
Output:
ID: 1 ID: 2
date
2022-01-01 1111 5555
2022-01-02 3333 7777
ID: 1 ID: 2
date
2022-01-01 2222 6666
2022-01-02 4444 8888
