My question focuses on the pandas way. Is the behaviour of pandas well defined in this situation?
I have a list/series of numbers and want to sum them up. I can do this with sum() or simply with the + operator. The point is that sometimes there is a <NA> in such a list. It is fine for me if the result is then always <NA>.
Of course I could check each element explicitly with if val is pandas.NA. But I hope there is a better, yet still safe, way.
Here is an MWE producing two different results. Using + results in a <NA>, as expected. But .sum() simply ignores the <NA> in the list and gives a concrete number as the result.
#!/usr/bin/env python3
import pandas as pd
# 1.2.5 and 1.3.2
print(pd.__version__)
df = pd.DataFrame(data={'VAR': [pd.NA], 'X': [2]})
a = df.VAR + df.X
print(a) # <NA>
b = df.iloc[0].sum()
print(b) # 2
CodePudding user response:
You need to pass skipna=False to sum, because it's True by default:
>>> df.iloc[0].sum(skipna=False)
<NA>
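To make the difference easier to see, here is a small sketch. The Series `s` and its values are made up for illustration, and using the nullable `Int64` dtype is my assumption, not something from the question. The last part applies the same call to the row from the question's MWE.

```python
import pandas as pd

# Illustrative data (not from the question): nullable integers with one <NA>.
s = pd.Series([1, 2, pd.NA], dtype="Int64")

total_default = s.sum()             # skipna=True by default, so the <NA> is ignored
total_strict = s.sum(skipna=False)  # a single <NA> makes the whole result <NA>

print(total_default)
print(total_strict)

# The same call on the row from the question's MWE.
df = pd.DataFrame(data={'VAR': [pd.NA], 'X': [2]})
row_total = df.iloc[0].sum(skipna=False)
print(pd.isna(row_total))
```

If you need to branch on the result afterwards, pd.isna(result) is probably a more robust check than comparing against pd.NA with ==, since a comparison with <NA> itself evaluates to <NA>.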
