I have a dictionary like so: {key_1: pd.DataFrame, key_2: pd.DataFrame, ...}.
Each of the DataFrames in the dictionary has a column called 'ID'.
Not every ID appears in every DataFrame, so the DataFrames have different sizes.
Is there any way I could combine these into one large DataFrame?
Here's a minimal reproducible example of the data:
import pandas as pd

data1 = [{'ID': 's1', 'country': 'Micronesia', 'Participants': 3},
         {'ID': 's2', 'country': 'Thailand', 'Participants': 90},
         {'ID': 's3', 'country': 'China', 'Participants': 36},
         {'ID': 's4', 'country': 'Peru', 'Participants': 30}]
data2 = [{'ID': '1', 'country': 'Micronesia', 'Kids_per_participant': 3},
         {'ID': 's2', 'country': 'Thailand', 'Kids_per_participant': 9},
         {'ID': 's3', 'country': 'China', 'Kids_per_participant': 39}]
data3 = [{'ID': 's1', 'country': 'Micronesia', 'hair_style_rank': 3},
         {'ID': 's2', 'country': 'Thailand', 'hair_style_rank': 9}]
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
dict_example = {'df1_key': df1, 'df2_key': df2, 'df3_key': df3}
# Attempt (raises an error: pd.merge expects two DataFrames, left and right)
pd.merge(dict_example.values(), on="ID", how="outer")
Answer:
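For a dict with an arbitrary number of keys, you could do this:
i = list(dict_example.keys())
# start from the first DataFrame in the dict
newthing = dict_example[i[0]]
# merge each remaining DataFrame into the running result
for j in range(1, len(i)):
    newthing = newthing.merge(dict_example[i[j]], on='ID', how='outer')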
First, make a list of the dictionary's keys. Second, take the first DataFrame as the starting point. Then iterate over the remaining DataFrames and merge each one into the running result. I noticed that every ID has a country, but country isn't listed in your on argument. Do you want to join on country as well? If so, replace the merge line above with this one, which changes the join criteria to a list that includes country:
newthing = newthing.merge(dict_example[i[j]],on=['ID','country'], how = 'outer')
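To see why the choice of join keys matters, here is a small sketch using two hypothetical frames (left and right are illustrative names, not from the question). Merging on 'ID' alone keeps both country columns and renames them with pandas' default _x and _y suffixes, while merging on ['ID', 'country'] keeps a single country column:

```python
import pandas as pd

# Two small illustrative frames that share 'ID' and 'country'
left = pd.DataFrame({'ID': ['s1', 's2'],
                     'country': ['Micronesia', 'Thailand'],
                     'Participants': [3, 90]})
right = pd.DataFrame({'ID': ['s2', 's3'],
                      'country': ['Thailand', 'China'],
                      'Kids_per_participant': [9, 39]})

# Join on ID only: the overlapping 'country' column is duplicated
by_id = left.merge(right, on='ID', how='outer')

# Join on both keys: 'country' is treated as part of the key
by_id_country = left.merge(right, on=['ID', 'country'], how='outer')

print(list(by_id.columns))
print(list(by_id_country.columns))
```

With the ID-only join you would have to reconcile country_x and country_y yourself afterwards, which is why joining on both columns is usually the tidier option when country is consistent per ID.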
See the pandas documentation on merge.
If you don't mind overwriting df1, the code could be shorter, like this:
# i is the list of dict keys from above; df1 is the first DataFrame in the dict
for j in range(1, len(i)):
    df1 = df1.merge(dict_example[i[j]], on=['ID', 'country'], how='outer')
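As an alternative to the explicit loop, the same pairwise merge can be folded over the dict's values with functools.reduce. This is only a sketch, assuming every DataFrame has both 'ID' and 'country' columns as in the question's sample data; the name merged is chosen here for illustration:

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame([{'ID': 's1', 'country': 'Micronesia', 'Participants': 3},
                    {'ID': 's2', 'country': 'Thailand', 'Participants': 90},
                    {'ID': 's3', 'country': 'China', 'Participants': 36},
                    {'ID': 's4', 'country': 'Peru', 'Participants': 30}])
df2 = pd.DataFrame([{'ID': '1', 'country': 'Micronesia', 'Kids_per_participant': 3},
                    {'ID': 's2', 'country': 'Thailand', 'Kids_per_participant': 9},
                    {'ID': 's3', 'country': 'China', 'Kids_per_participant': 39}])
df3 = pd.DataFrame([{'ID': 's1', 'country': 'Micronesia', 'hair_style_rank': 3},
                    {'ID': 's2', 'country': 'Thailand', 'hair_style_rank': 9}])
dict_example = {'df1_key': df1, 'df2_key': df2, 'df3_key': df3}

# Fold the DataFrames together two at a time with an outer join;
# reduce uses the first DataFrame as the starting value
merged = reduce(
    lambda left, right: left.merge(right, on=['ID', 'country'], how='outer'),
    dict_example.values(),
)

print(merged)
```

This leaves df1, df2, and df3 untouched. Rows missing from a given DataFrame get NaN in that DataFrame's columns, and the ID '1' in the second sample set ends up as its own row because it does not match 's1'.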
