How to treat pandas <NA> values in a series/list of numbers to summarize

Time:01-04

My question focuses on the pandas way. Is the behaviour of pandas well defined in this situation?

I have a list/series of numbers and want to sum them. I can do this with sum() or simply with the + operator. The point is that sometimes there is a <NA> in such a list. It is fine for me if the result is then always <NA>.

Of course I could check each element explicitly with if val is pandas.NA. But I hope there is a better and still safe way.

Here is an MWE producing two different results. Using + results in <NA>, as expected. But .sum() simply ignores the <NA> in the list and gives a concrete number as the result.

#!/usr/bin/env python3
import pandas as pd

# 1.2.5 and 1.3.2
print(pd.__version__)

df = pd.DataFrame(data={'VAR': [pd.NA], 'X': [2]})

a = df.VAR + df.X
print(a)  # <NA>

b = df.iloc[0].sum()
print(b)  # 2

CodePudding user response:

You need to pass skipna=False to sum, because it's True by default:

>>> df.iloc[0].sum(skipna=False)
<NA>
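
For comparison, here is a small sketch that puts both calls side by side on the frame from the question. The variable names row, default_sum and strict_sum are only illustrative, and the check uses pd.isna() rather than relying on how the missing value is printed.

```python
import pandas as pd

# Same single-row frame as in the question.
df = pd.DataFrame(data={'VAR': [pd.NA], 'X': [2]})
row = df.iloc[0]

# Default behaviour (skipna=True): the <NA> is dropped before summing.
default_sum = row.sum()

# With skipna=False the missing value propagates into the result.
strict_sum = row.sum(skipna=False)

print(default_sum)
print(pd.isna(strict_sum))
```

If the result should be missing whenever any input is missing, passing skipna=False explicitly at every call site is probably the least surprising option, since the default silently skips missing values.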