How can I replace leading zeros with NAs? Suppose I have the following example:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'c1': [0.0, 0.0, 1.0, 0.0], 'c2': [1.0, 1.0, 1.0, 0.0]})
The goal is to have the following result:
c1 c2
NA 1.0
NA 1.0
1.0 1.0
0.0 0.0
But doing the following will not work, since it replaces every zero and not just the leading ones:
df[np.abs(df) < 1e-50] = np.nan
CodePudding user response:
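Similar to mozway's answer, but with assignment and cummin. The cumulative minimum of df.eq(0) stays True only while every value seen so far in the column is zero: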
df[df.eq(0).cummin()] = pd.NA
Demo:
>>> df
c1 c2
0 0.0 1.0
1 0.0 1.0
2 1.0 1.0
3 0.0 0.0
>>> df.eq(0).cummin()
c1 c2
0 True False
1 True False
2 False False
3 False False
>>> df[df.eq(0).cummin()] = pd.NA
>>> df
c1 c2
0 NaN 1.0
1 NaN 1.0
2 1.0 1.0
3 0.0 0.0
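If you would rather not modify df in place, the same cummin idea can be wrapped in a small helper. This is only a sketch: the name mask_leading_zeros is made up for illustration, and the astype(bool) call is a defensive assumption in case the cumulative result does not come back as a boolean dtype.

```python
import pandas as pd


def mask_leading_zeros(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with leading zeros in each column set to NaN.

    Hypothetical helper name, used here only for illustration.
    """
    # True only while every value seen so far in the column equals zero
    leading = frame.eq(0).cummin().astype(bool)
    # mask() returns a new frame with NaN wherever `leading` is True
    return frame.mask(leading)


df = pd.DataFrame({'c1': [0.0, 0.0, 1.0, 0.0], 'c2': [1.0, 1.0, 1.0, 0.0]})
result = mask_leading_zeros(df)
print(result)
```

Because mask returns a new DataFrame, the original df keeps its zeros.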
CodePudding user response:
You could use cummax to carry the first non-zero value forward, so that only the leading zeros are still zero, and then mask those:
df.mask(df.cummax().eq(0))
Or, to treat anything below a small threshold as zero (limited floating-point precision):
df.mask(df.gt(1e-50).cummax().lt(1e-50))
output:
c1 c2
0 NaN 1.0
1 NaN 1.0
2 1.0 1.0
3 0.0 0.0
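One thing to watch, as far as I can tell: both cummax variants above compare the raw values, so a negative number that appears before the first positive one would still be masked. A possible sketch that compares absolute values against a tolerance instead is shown here. TOL and the sample data are assumptions chosen for illustration, not part of the question.

```python
import pandas as pd

TOL = 1e-12  # hypothetical tolerance; pick one that suits your data

df = pd.DataFrame({
    'c1': [0.0, 1e-15, -2.0, 0.0],  # near-zero noise, then a negative value
    'c2': [0.0, 0.0, 0.0, 0.0],     # a column that is zero throughout
})

# True from the first value whose magnitude exceeds TOL onwards
nonzero_seen = df.abs().gt(TOL).cummax().astype(bool)

# Everything before that first "real" value is treated as a leading zero
result = df.mask(~nonzero_seen)
print(result)
```

Here -2.0 and the zero after it are kept, while the leading 0.0 and 1e-15 in c1 and the whole of c2 become NaN.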
CodePudding user response:
I think you got the sign wrong.
If you switch to
df[np.abs(df) < 1e-50] = np.nan
it should work as intended.
CodePudding user response:
You can also do this with ffill: hide every zero as NaN, forward-fill, and whatever is still NaN afterwards was a leading zero:
df[df.mask(df==0).ffill().isna()] = np.nan
df
Out[141]:
c1 c2
0 NaN 1.0
1 NaN 1.0
2 1.0 1.0
3 0.0 0.0
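To make the ffill one-liner easier to follow, here is the same logic split into steps. This is just a sketch with made-up intermediate names. Note that it assumes the data has no missing values of its own at the top of a column, since those would also count as "leading".

```python
import pandas as pd

df = pd.DataFrame({'c1': [0.0, 0.0, 1.0, 0.0], 'c2': [1.0, 1.0, 1.0, 0.0]})

# Step 1: hide every zero by turning it into NaN
zeros_hidden = df.mask(df == 0)

# Step 2: forward-fill, so NaNs that follow a non-zero value get filled in
filled = zeros_hidden.ffill()

# Step 3: anything still NaN had no non-zero value above it
leading = filled.isna()

# Step 4: blank out only those positions in the original frame
result = df.mask(leading)
print(result)
```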
