A question regarding filtering using a list of values. I want to do the following:
- filter a dataframe based on certain criteria
- create a list of (one column, containing ID's) of this dataframe
- next i want to exclude this list from another dataframe.
all individual steps are working using the following code:
df3 = df2.loc[df2['value'] < parameter] (1)
my_list = df3['ID'].tolist() (2)
final_df = df[~df['column'].isin(my_list)] (3)
yet somehow filtering the frame using the first step results in the final step NOT working (so not filtering anything). When i remove the first step it again works like a charm. Does anybody know what i am doing wrong?
kind regards,
Alex
CodePudding user response:
can not confirm without having a look at an example dataframe. But You can try the same using joins and see if it produce the same answer. Other wise there might be a bug some where in your code.
PARAMETER = 30
df3 = df2[df["value"] < PARAMETER]
merged_df = df.merge(df3[["ID"]], on="ID", how="left", indicator=True)
final_df = merged_df[merged_df["_merge"] == "left_only"]
final_df = final_df.drop("_merge", axis=1)
I have not tested the code, it might need modification, it is just a concept which you can try out. Here I have used a join with an indicator which will add a new column in the merged dataframe as an indicator if the row comes from left table or right table and then I kept only the data comes from left table. This approach can also be faster than list approach, but can not confirm
CodePudding user response:
Changed datatype to 'integer' between step 1 and 2.
