I have some -np.inf and np.inf values in my dataframe.
I would like to replace them with the respective minimum and maximum values of the dataframe.
I thought it should be possible with something like this:
df.replace([np.inf, -np.inf], [df.max, df.min], axis=1, inplace = True)
But it didn't work. I had the idea because I can use something similar to replace nans with fillna().
What is an effective way to go about it?
Is there a numpy version?
Thanks for any tips!
CodePudding user response:
You can use .replace(), as follows:
df = df.replace({np.inf: df[np.isfinite(df)].max().max(),
-np.inf: df[np.isfinite(df)].min().min()})
Here, df[np.isfinite(df)].max().max() and df[np.isfinite(df)].min().min() are the respective finite maximum and minimum of the dataframe. We replace np.inf and -np.inf with them respectively.
Demo
Data Input
df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})
Col1 Col2
0 inf 1234.0
1 -2000.0 -inf
2 345.0 890.0
Output:
print(df)
Col1 Col2
0 1234.0 1234.0
1 -2000.0 -2000.0
2 345.0 890.0
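For reference, the demo above can be run end to end as a short script. This is only a sketch: the helper name replace_inf_global is made up for illustration, and it assumes every column is numeric, since np.isfinite does not accept string/object columns.

```python
import numpy as np
import pandas as pd


def replace_inf_global(df):
    # Hypothetical helper (not part of pandas); assumes all columns are numeric.
    finite = df[np.isfinite(df)]   # inf and -inf become NaN here
    hi = finite.max().max()        # largest finite value in the whole frame
    lo = finite.min().min()        # smallest finite value in the whole frame
    return df.replace({np.inf: hi, -np.inf: lo})


df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0],
                   'Col2': [1234.0, -np.inf, 890.0]})
result = replace_inf_global(df)
print(result)
```

The helper returns a new dataframe and leaves the original untouched, so the result has to be assigned back if you want to overwrite df.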
Edit
If you want to replace with the min and max of each particular column instead of the global min and max of the dataframe, you can pass a nested dict to .replace(), as follows:
finite = df[np.isfinite(df)]  # inf and -inf become NaN, so max()/min() skip them
df = df.replace({col: {np.inf: finite[col].max(), -np.inf: finite[col].min()} for col in df.columns})
Demo
Data Input
df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})
Col1 Col2
0 inf 1234.0
1 -2000.0 -inf
2 345.0 890.0
Output:
print(df)
Col1 Col2
0 345.0 1234.0
1 -2000.0 890.0
2 345.0 890.0
inf and -inf are replaced by the finite max and min of their own column, respectively.
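As an alternative for the per-column case (a different technique from the nested dict above, not taken from this answer), a minimal sketch with DataFrame.clip is shown below. It assumes numeric columns; the names finite, col_min, col_max and clipped are only illustrative. Clipping each column to its own finite range has the same effect here, because only inf and -inf fall outside that range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0],
                   'Col2': [1234.0, -np.inf, 890.0]})

# Turn inf / -inf into NaN so that min() and max() only see finite values.
finite = df.where(np.isfinite(df))
col_min = finite.min()   # Series indexed by column name
col_max = finite.max()

# axis=1 aligns the bounds with the columns, so each column is clipped
# to its own finite min and max.
clipped = df.clip(lower=col_min, upper=col_max, axis=1)
print(clipped)
```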
CodePudding user response:
You can use df.mask(), which takes a boolean Series or DataFrame as its condition; np.isinf is one way to build it.
>>> df
0
0 0.0
1 1.0
2 inf
3 2.0
4 -inf
5 3.0
>>> posinf = df.gt(0) & df.transform(np.isinf)
>>> neginf = df.lt(0) & df.transform(np.isinf)
>>> df = df.mask(posinf, df.mask(posinf).max().max())
>>> df = df.mask(neginf, df.mask(neginf).min().min())
>>> df
0
0 0.0
1 1.0
2 3.0
3 2.0
4 0.0
5 3.0
I'm masking twice: the inner call has no replacement value, so it replaces the matches with NaN. Since max() and min() skip NaN, this makes it possible to compute the finite bounds.
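The two masking steps can be spelled out separately, which may make the NaN trick easier to follow. This is a sketch of the same idea as a plain script; the intermediate names (is_inf, without_posinf, upper, lower) are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [0.0, 1.0, np.inf, 2.0, -np.inf, 3.0]})

is_inf = np.isinf(df)
posinf = df.gt(0) & is_inf   # True only where the value is +inf
neginf = df.lt(0) & is_inf   # True only where the value is -inf

# Inner mask: no replacement value, so +inf becomes NaN and max() skips it.
without_posinf = df.mask(posinf)
upper = without_posinf.max().max()    # 3.0

# Outer mask: put the finite maximum where +inf used to be.
df = df.mask(posinf, upper)

# Same two steps for -inf.
lower = df.mask(neginf).min().min()   # 0.0
df = df.mask(neginf, lower)
print(df)
```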
CodePudding user response:
You can compute masks for inf/-inf and replace with the values you want:
import numpy as np
m1 = df.eq(np.inf)
m2 = df.eq(-np.inf)
df.mask(m1, df[~m1].max().max()).mask(m2, df[~m2].min().min())
NB. This replaces the inf values with the min/max of the whole dataframe. If you want the min/max per column instead:
df.mask(m1, df[~m1].max(), axis=1).mask(m2, df[~m2].min(), axis=1)
input:
col
0 inf
1 1.0
2 -inf
3 2.0
4 NaN
output:
col
0 2.0
1 1.0
2 1.0
3 2.0
4 NaN
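As for the NumPy version the question asks about, here is a minimal sketch of the same idea on a plain array. The function name replace_inf_numpy and its axis parameter are hypothetical, chosen for illustration: axis=None uses the global finite min/max, and axis=0 uses the min/max of each column of a 2-D array.

```python
import numpy as np


def replace_inf_numpy(arr, axis=None):
    # Hypothetical helper, not a NumPy function.
    arr = np.asarray(arr, dtype=float)
    # Hide inf, -inf (and NaN) so nanmax/nanmin only see finite values.
    masked = np.where(np.isfinite(arr), arr, np.nan)
    hi = np.nanmax(masked, axis=axis)
    lo = np.nanmin(masked, axis=axis)
    # hi and lo broadcast against arr (scalars, or one value per column).
    out = np.where(arr == np.inf, hi, arr)
    return np.where(arr == -np.inf, lo, out)


values = np.array([np.inf, 1.0, -np.inf, 2.0, np.nan])
print(replace_inf_numpy(values))
```

NaN entries are left as they are. If a column (or the whole array) has no finite value at all, np.nanmax and np.nanmin emit a warning and return NaN, so that case would need separate handling.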
