Replace zeroes with nan in either data frame or array based on another element in the row

Time:01-28

I have a dataset that is available both as a numpy array and as a dataframe. Here is a sample of it as a dataframe:

  totalsum totalmean raindiffsum raindiffmean       name                  bin
0        0       NaN           0          NaN  openguage  2021-11-01 00:00:00
1        0       NaN           0          NaN  openguage  2021-11-01 00:30:00
2        0       NaN           0          NaN  openguage  2021-11-01 01:00:00
3        0       NaN           0          NaN  openguage  2021-11-01 01:30:00
4        0       NaN           0          NaN  openguage  2021-11-01 02:00:00

I have the same data as a numpy array. I need to replace the zero values with nan, but only when there is a nan in the same row.

For clarity, this is further down the same dataframe. I DO NOT want to replace the zeroes in rows 1518 and 1519 with nan.

     totalsum totalmean  ...       name                  bin
1515        0       NaN  ...  openguage  2021-12-02 13:30:00
1516        0       NaN  ...  openguage  2021-12-02 14:00:00
1517        0       NaN  ...  openguage  2021-12-02 14:30:00
1518      0.0       0.0  ...  openguage  2021-12-02 15:00:00
1519      0.0       0.0  ...  openguage  2021-12-02 15:30:00

[5 rows x 6 columns]

I have tried np.where() and I have tried a for loop (on the numpy array). None of these loops throws an error, but none of them has any effect:

for i in range(len(dfbinarr)):
    if dfbinarr[i,1] is nan:
        dfbinarr[i,0]=nan
        dfbinarr[i,2]=nan

for i in range(len(dfbinarr)):
    if dfbinarr[i,1] is nan:
        dfbinarr[i,0]=np.nan
        dfbinarr[i,2]=np.nan

for i in range(len(dfbinarr)):
    if dfbinarr[i,1] ==nan:
        dfbinarr[i,0]=np.nan
        dfbinarr[i,2]=np.nan

Any help would be greatly appreciated!

CodePudding user response:

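You can use .loc to do this. df['totalmean'].isna() returns a boolean mask (just a Series) where each value is True if the corresponding item in totalmean is NaN and False otherwise. Passing that mask as the row selector limits the assignment to the matching rows.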

df.loc[df['totalmean'].isna(), 'totalsum'] = np.nan

Output:

>>> df
   totalsum  totalmean  raindiffsum  raindiffmean       name                  bin
0       NaN        NaN            0           NaN  openguage  2021-11-01 00:00:00
1       NaN        NaN            0           NaN  openguage  2021-11-01 00:30:00
2       NaN        NaN            0           NaN  openguage  2021-11-01 01:00:00
3       0.0        0.0            0           NaN  openguage  2021-11-01 01:30:00
4       0.0        0.0            0           NaN  openguage  2021-11-01 02:00:00
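
The loops in the question most likely do nothing because of how NaN compares: `x == np.nan` is always False (NaN is not equal to anything, itself included), and an identity test such as `x is np.nan` is generally not reliable for values pulled out of an array. Below is a minimal sketch of the same masking idea applied to both sum columns and to the numpy array. The sample values are made up, and pairing each sum column with its mean column is my assumption about what is wanted. `pd.isna` is used on the array because it also handles object-dtype arrays, which is what you get when string columns such as name are present.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's column names (values are made up).
df = pd.DataFrame({
    "totalsum": [0.0, 0.0, 0.0],
    "totalmean": [np.nan, np.nan, 0.0],
    "raindiffsum": [0.0, 0.0, 0.0],
    "raindiffmean": [np.nan, np.nan, 0.0],
    "name": ["openguage"] * 3,
})

# Mixed column types give an object-dtype array; to_numpy() returns a copy here,
# so the pandas edits below do not touch arr.
arr = df.to_numpy()

# NaN never compares equal, so an `== np.nan` test can never be True.
assert not (np.nan == np.nan)

# pandas: set each sum column to NaN wherever its matching mean column is NaN.
# (The sum/mean pairing is an assumption, not something stated in the question.)
for sum_col, mean_col in [("totalsum", "totalmean"), ("raindiffsum", "raindiffmean")]:
    df.loc[df[mean_col].isna(), sum_col] = np.nan

# numpy: build a boolean row mask from column 1, then assign through it.
# pd.isna works elementwise on object arrays, where np.isnan would raise.
mask = pd.isna(arr[:, 1])
arr[mask, 0] = np.nan
arr[mask, 2] = np.nan
```

Rows where the mean is a real 0.0 (like 1518 and 1519 in the question) are not selected by the mask, so their zeroes stay as they are.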