creating multiple dataframes from looping over a column in a df

Time:01-12

I am looking to create multiple dataframes from one original dataframe. I want to loop through one column and create dataframes on its matches.

   TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
0        123456             test_test              email  123456       paul
1        234567             test_test              email  123456       paul
2        345678             test_test              email  123456       paul
3        456789             test_test              email  123456       paul
4        123456        test_test_test              video  987654      karol
5        234567        test_test_test              video  987654      karol
6        345678        test_test_test              video  987654      karol
7        456789        test_test_test              video  987654      karol

This is my original df, and I want to create a new df for every distinct value in TargetListID. I have looked through:

Create multiple dataframes in loop

And tried the following:

def create_df_from_target_list_id(dataframe):
    dataframe = {target_list_id: pd.DataFrame() for target_list_id in dataframe['TargetListID']}
    return dataframe


test = create_df_from_target_list_id(df)
print(test)

Which gives me:

{123456: Empty DataFrame
Columns: []
Index: [], 234567: Empty DataFrame
Columns: []
Index: [], 345678: Empty DataFrame
Columns: []
Index: [], 456789: Empty DataFrame
Columns: []
Index: []}

So I am not sure what exactly I am doing wrong here. Any pointers would be great. The reason for this is that the original dataframe could have 1000s of rows, so I would like to create the dataframes without knowing the TargetListID values in advance.

I tried groupby here:

def create_df_from_target_list_id(dataframe):
    dataframe = dict(iter(dataframe.groupby('TargetListID')))
    return dataframe


test = create_df_from_target_list_id(df)
print(test)

and got the following

{123456:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
0        123456             test_test              email  123456       paul
4        123456        test_test_test              video  987654      karol, 234567:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
1        234567             test_test              email  123456       paul
5        234567        test_test_test              video  987654      karol, 345678:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
2        345678             test_test              email  123456       paul
6        345678        test_test_test              video  987654      karol, 456789:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
3        456789             test_test              email  123456       paul
7        456789        test_test_test              video  987654      karol}

CodePudding user response:

It looks like your approach is not far off, but notice that in your function you are calling pd.DataFrame(), which creates a DataFrame but gives it no data to use. The dictionary keys come out right, but every value is an empty DataFrame.

If you want a DataFrame for each unique value of TargetListID, and your original DataFrame is called df, then you could do something like this:

df_dictionary = { target_id: df[ df.TargetListID == target_id ] for target_id in df.TargetListID.unique() }

You could do this inside your function or separately.

Note that you may want to first drop any null TargetListID values and you may want to use reset_index() when creating your dictionary depending on what you need.
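To make the note about null values and reset_index() concrete, here is a minimal sketch. The sample data is a trimmed-down copy of the frame in the question, and the name clean is only for illustration; whether you want the index renumbered depends on your use case.

```python
import pandas as pd

# Sample data mirroring the question (only three columns kept for brevity).
df = pd.DataFrame({
    "TargetListID": [123456, 234567, 345678, 456789,
                     123456, 234567, 345678, 456789],
    "Buy_Component_Type": ["email"] * 4 + ["video"] * 4,
    "First_Name": ["paul"] * 4 + ["karol"] * 4,
})

# Drop rows whose TargetListID is missing so NaN never becomes a dictionary key.
clean = df.dropna(subset=["TargetListID"])

# One filtered DataFrame per unique ID.
# reset_index(drop=True) renumbers each piece from 0 and discards the old index.
df_dictionary = {
    target_id: clean[clean["TargetListID"] == target_id].reset_index(drop=True)
    for target_id in clean["TargetListID"].unique()
}

print(df_dictionary[123456])
```

Leave out reset_index(drop=True) if you need the original row labels to trace rows back to the source frame.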

CodePudding user response:

I think you are looking for a plain and simple groupby.

r = {}
# groupby yields (key, sub-DataFrame) pairs, one per distinct TargetListID
for target_id, group in dataframe.groupby('TargetListID'):
    r[target_id] = group
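The same groupby idea can be wrapped in a function, as in the question. This is only a sketch: the function name split_by_target_list_id and the trimmed sample data are made up for illustration. Note that groupby skips rows with a missing key by default, so no separate null handling is needed here.

```python
import pandas as pd


def split_by_target_list_id(dataframe):
    """Return a dict mapping each TargetListID to the rows that carry it."""
    # Iterating a groupby yields (key, sub-DataFrame) pairs.
    return {
        target_id: group
        for target_id, group in dataframe.groupby("TargetListID")
    }


# Sample data mirroring the question (two columns kept for brevity).
df = pd.DataFrame({
    "TargetListID": [123456, 234567, 345678, 456789,
                     123456, 234567, 345678, 456789],
    "First_Name": ["paul"] * 4 + ["karol"] * 4,
})

frames = split_by_target_list_id(df)

# The IDs never have to be known in advance; just walk the dictionary.
for target_id, frame in frames.items():
    print(target_id, len(frame))
```

Each value keeps the row labels from the original frame; call reset_index(drop=True) on a piece if you want it renumbered from 0.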