I am looking to create multiple dataframes from one original dataframe. I want to loop through one column and create a separate dataframe for each of its matching values.
   TargetListID  Placement_Description  Buy_Component_Type  NPI_ID  First_Name
0        123456              test_test               email  123456        paul
1        234567              test_test               email  123456        paul
2        345678              test_test               email  123456        paul
3        456789              test_test               email  123456        paul
4        123456         test_test_test               video  987654       karol
5        234567         test_test_test               video  987654       karol
6        345678         test_test_test               video  987654       karol
7        456789         test_test_test               video  987654       karol
This is my original df, and I want to create a new df for every match in TargetListID. I have looked through:
Create multiple dataframes in loop
And tried the following:
def create_df_from_target_list_id(dataframe):
    dataframe = {target_list_id: pd.DataFrame() for target_list_id in dataframe['TargetListID']}
    return dataframe
test = create_df_from_target_list_id(df)
print(test)
Which gives me:
{123456: Empty DataFrame
Columns: []
Index: [], 234567: Empty DataFrame
Columns: []
Index: [], 345678: Empty DataFrame
Columns: []
Index: [], 456789: Empty DataFrame
Columns: []
Index: []}
So I'm not sure what exactly I am doing wrong here. Any pointers would be great. The reason for this is that the original dataframe could have thousands of rows, so I would like to create the dataframes without knowing the TargetListID values in advance.
I tried groupby here:
def create_df_from_target_list_id(dataframe):
    dataframe = dict(iter(dataframe.groupby('TargetListID')))
    return dataframe
test = create_df_from_target_list_id(df)
print(test)
And got the following:
{123456: TargetListID Placement_Description Buy_Component_Type NPI_ID First_Name
0 123456 test_test email 123456 paul
4 123456 test_test_test video 987654 karol, 234567: TargetListID Placement_Description Buy_Component_Type NPI_ID First_Name
1 234567 test_test email 123456 paul
5 234567 test_test_test video 987654 karol, 345678: TargetListID Placement_Description Buy_Component_Type NPI_ID First_Name
2 345678 test_test email 123456 paul
6 345678 test_test_test video 987654 karol, 456789: TargetListID Placement_Description Buy_Component_Type NPI_ID First_Name
3 456789 test_test email 123456 paul
7 456789 test_test_test video 987654 karol}
CodePudding user response:
It looks like your approach is not far off, but notice that in your function you are calling pd.DataFrame(), which creates a DataFrame but gives it no data to use (hence the empty DataFrames).
If you want a DataFrame for each unique value of TargetListID, and your original DataFrame is called df, then you could do something like this:
df_dictionary = { target_id: df[ df.TargetListID == target_id ] for target_id in df.TargetListID.unique() }
You could do this inside your function or separately.
Note that you may want to first drop any null TargetListID values and you may want to use reset_index() when creating your dictionary depending on what you need.
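As a rough sketch of that note, the version below drops null TargetListID rows first and resets the index of each piece. The sample data is made up for illustration (only the column names come from the question), and the cast back to int64 is an assumption that your IDs are whole numbers:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "TargetListID": [123456, 234567, None, 123456],
    "First_Name": ["paul", "paul", "paul", "karol"],
})

# Drop rows where TargetListID is missing before splitting.
clean = df.dropna(subset=["TargetListID"])

# The null forced the column to float; cast back (assumes whole-number IDs).
clean = clean.astype({"TargetListID": "int64"})

# One DataFrame per unique TargetListID, each with a fresh 0..n-1 index.
df_dictionary = {
    target_id: clean[clean["TargetListID"] == target_id].reset_index(drop=True)
    for target_id in clean["TargetListID"].unique()
}

print(df_dictionary[123456])
```

Whether you want reset_index(drop=True) depends on whether you still need the original row labels later; leave it out if you do.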
CodePudding user response:
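I think you are looking for a plain and simple groupby. Iterating over it yields each TargetListID together with the rows that share it:
r = {}
for target_id, group_df in dataframe.groupby('TargetListID'):
    # group_df holds every row whose TargetListID equals target_id
    r[target_id] = group_df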
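For what it's worth, here is a small self-contained sketch of the groupby idea, written as a dict comprehension. The sample frame and the name frames are placeholders I made up; only the column names come from the question:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "TargetListID": [123456, 234567, 123456, 234567],
    "Buy_Component_Type": ["email", "email", "video", "video"],
})

# groupby yields (key, sub-DataFrame) pairs, one per distinct TargetListID,
# so the IDs do not need to be known in advance.
frames = {target_id: group for target_id, group in df.groupby("TargetListID")}

# Look up the rows for a single ID.
print(frames[123456])
```

Each sub-DataFrame keeps the original row labels, which matches the output shown in the question; call reset_index(drop=True) on a group if you would rather have them renumbered.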
