I have a dataframe df that looks something like this:
print(df)
x outlier_flag
10 1
NaN 1
30 1
543 -1
50 1
I would like to replace values flagged with outlier_flag==-1 with the value interpolated between df['x'][i-1] and df['x'][i+1], meaning I want to replace the wrong value 543 shown above with 40.
What I could do is
df['x'] = df.apply(lambda row: np.nan if row['outlier_flag']==-1 else row['x'], axis=1)
df.interpolate(method='polynomial', order=3, inplace=True)
But I don't want to do this, because it would also interpolate NaN values in df['x'] that are not marked with outlier_flag==-1 (see the second row). I want to keep pure NaN values that are not marked by the flag as NaN for a later task.
So, is there a way to do the interpolation in place, even for a value like 543 that is not nan?
I tried doing
df['x'] = df.apply(lambda row: row['x'].interpolate(method='polynomial', order=3) if row['outlier_flag']==-1 else row['x'], axis=1)
But this throws an error, because only NaN can be interpolated and 543 is an int. Do you have a suggestion for me? Thanks.
CodePudding user response:
Use np.where to replace each flagged value with the mean of its two neighbours:
df['x'] = np.where(df['outlier_flag'] == -1, (df['x'].shift(1) + df['x'].shift(-1)) / 2, df['x'])
print(df)
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 40.0 -1
4 50.0 1
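For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample frame is rebuilt from the question, and the names is_outlier and neighbour_mean are only illustrative. Note that this averages the immediate neighbours, so it presumably yields NaN when a flagged row sits next to a NaN or at the start or end of the frame.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})

# Boolean mask of the rows flagged as outliers (illustrative name).
is_outlier = df["outlier_flag"] == -1

# Mean of the previous and next value for every row.
neighbour_mean = (df["x"].shift(1) + df["x"].shift(-1)) / 2

# Take the neighbour mean only where the flag is set; keep x elsewhere,
# so the unflagged NaN in the second row is left untouched.
df["x"] = np.where(is_outlier, neighbour_mean, df["x"])

print(df)
```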
CodePudding user response:
Here is a way to use interpolate() as you intended.
First, create a list containing the index of the rows that have -1 in outlier_flag, and set the values of x in those rows to np.nan using loc:
incl = df.index[df['outlier_flag'] == -1].tolist()
df.loc[df.index.isin(incl), 'x'] = np.nan
>>> df
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 NaN -1
4 50.0 1
Then, you can use np.where to check whether x isnull() and whether that particular index is in the list you created, and apply your interpolation:
df['x'] = np.where(df['x'].isnull() & df.index.isin(incl), df['x'].interpolate(), df['x'])
Which prints:
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 40.0 -1
4 50.0 1
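The two steps above can also be condensed with a boolean mask instead of an index list. This is only a sketch under the assumption that linear interpolation (the default of interpolate()) is acceptable. The names is_outlier and interpolated are illustrative, and the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})

# Boolean mask of the rows flagged as outliers (illustrative name).
is_outlier = df["outlier_flag"] == -1

# Blank out the flagged values on a copy, then interpolate that copy.
# This fills every NaN in the copy, including the unflagged one.
interpolated = df["x"].mask(is_outlier).interpolate()

# Copy back only the flagged rows, so unflagged NaN values in df stay NaN.
df.loc[is_outlier, "x"] = interpolated[is_outlier]

print(df)
```

Because the interpolation runs on a temporary Series and only the flagged rows are written back, other interpolation settings could be passed to interpolate() here without affecting the unflagged NaN rows.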
