I have the following df, I want to groupby "group" and drop not only rows that include NaN, but all the preceding values.
group date value
1 jan2019 NaN
1 jan2019 3
1 jan2019 NaN
1 feb2019 3
1 mar2019 4
1 mar2019 5
2 feb2019 0
2 feb2019 NaN
2 mar2019 7
2 mar2019 4
2 apr2019 5
desired df
group date value
1 feb2019 3
1 mar2019 4
1 mar2019 5
2 mar2019 7
2 mar2019 4
2 apr2019 5
CodePudding user response:
You can flag the inf values using abs eq. Then to flag all the values before it as well, you can reverse the order of the Series and use cummax. Since you want to do this job across groups, you can use groupby.cummax. Finally, use the boolean mask to filter the desired output via loc:
out = df.loc[~df['value'].abs().eq(float('inf'))[::-1].groupby(df['group']).cummax()]
If the values to flag are NaNs (rather than inf), then we could use isna instead:
out = df.loc[~df['value'].isna()[::-1].groupby(df['group']).cummax()]
Output:
group date value
2 1 feb2019 3.0
3 1 mar2019 4.0
4 1 mar2019 5.0
7 2 mar2019 7.0
8 2 mar2019 4.0
9 2 apr2019 5.0
CodePudding user response:
A more verbose but possibly more readable approach. .isin will work for NaN as well as infs.
def filter_preceding(df, search_values=[np.nan, np.inf, -np.inf]):
ind = np.where(df['values'].isin(search_values))[0]
if len(ind) == 0:
return df
max_ind = ind.max()
return df.iloc[max_ind 1:, :]
df.groupby('group').apply(filter_preceding).reset_index(drop=True)
