I have two DataFrames, one called old and the other called new.
Both have multiple columns, but I am only interested in the column called ADDTEXT. When I open the two files in Excel and compare the ADDTEXT columns, they are completely identical.
When I do new['ADDTEXT'] == old['ADDTEXT'] in Python, it returns False for every row. When I do new['ADDTEXT'].equals(old['ADDTEXT']), it returns True.
Why don't both return True, given that both columns contain only NaN values?
Example output:
>>> import pandas as pd
>>> new = pd.read_excel('3.8_self_input_data.xlsx')
>>> old = pd.read_excel('3.7_self_input_data.xlsx')
>>>
>>> old['ADDTEXT']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
13630 NaN
13631 NaN
13632 NaN
13633 NaN
13634 NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
13630 NaN
13631 NaN
13632 NaN
13633 NaN
13634 NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT'] == old['ADDTEXT']
0 False
1 False
2 False
3 False
4 False
...
13630 False
13631 False
13632 False
13633 False
13634 False
Name: ADDTEXT, Length: 13635, dtype: bool
>>>
>>> new['ADDTEXT'].equals(old['ADDTEXT'])
True
Answer:
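NaN != NaN: by definition (IEEE 754), NaN never compares equal to anything, including itself, so the element-wise == gives False in every row. Series.equals() is different: it treats NaNs in the same location as equal, which is why it returns True.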
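You can see the NaN rule without pandas at all. This is only a minimal sketch using a plain Python float; the variable name nan is just for illustration:

```python
import math

# A plain float NaN, the same kind of value pandas shows as NaN.
nan = float("nan")

print(nan == nan)       # False: NaN is never equal to itself
print(nan != nan)       # True
print(math.isnan(nan))  # True: the reliable way to test for NaN
```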
If you want an element-wise comparison instead of the single True/False from .equals(), you can combine eq() with isna() on the two columns:
(new['ADDTEXT'].eq(old['ADDTEXT']) | (new['ADDTEXT'].isna() & old['ADDTEXT'].isna()))
Basically that reads: return True for each item if both items are equal or both are NaN.
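Here is a small self-contained sketch of the difference. The two Series below are made-up stand-ins for the ADDTEXT columns (the real data comes from the Excel files), and the names plain and nan_aware are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for old['ADDTEXT'] and new['ADDTEXT'] (values are made up).
old = pd.Series([np.nan, "a", np.nan, "b"], name="ADDTEXT", dtype=object)
new = pd.Series([np.nan, "a", np.nan, "c"], name="ADDTEXT", dtype=object)

# Plain element-wise ==: positions holding NaN come out False.
plain = new == old

# NaN-aware element-wise comparison: equal values, or NaN on both sides.
nan_aware = new.eq(old) | (new.isna() & old.isna())

print(plain.tolist())      # [False, True, False, False]
print(nan_aware.tolist())  # [True, True, True, False]

# equals() gives one bool for the whole Series; it is False here
# only because the last row really differs ("b" vs "c").
print(new.equals(old))     # False
```

The last row stays False in both comparisons because the values really differ, so the NaN-aware version does not hide genuine mismatches.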
