Home > Software design >  Pandas dataframe: convert column to number with default value
Pandas dataframe: convert column to number with default value

Time:01-08

I'm surprised I couldn't find anything about this:

Q: How can I convert values of a column to numbers with a default value if casting fails.

I need all values to be numbers in order to perform arithmetical functions on them. The raw input data contains non-convertible values like "","**", "not available", ....

E.g.

1
2
*
4

Should become:

1
2
0
4

What I've tried:

  • to_numberic: Does not allow to set a default value. errors=coerce will set the value to NaN, which can not be handled arithmetically afterwards.
  • df[COLUMN].apply (makeNumber) with def makeNumber(value): return 0.0 does apparently nothing on the string values.

CodePudding user response:

If no missing values in original use:

df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0)

If need downcast like comment solution:

df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0, downcast="infer")
print (df)
   col
0    1
1    2
2    0
3    4

If possible missing values in original and need not replace them:

print (df)
   col
0    1
1    2
2    *
3    4
4  NaN

s = pd.to_numeric(df['col'], errors='coerce')

df['col'] = s.mask(s.isna() & df['col'].notna(), 0)
print (df)
   col
0  1.0
1  2.0
2  0.0
3  4.0
4  NaN
  •  Tags:  
  • Related