Pandas dataframe operations are pretty straightforward. Look at this: I create a DataFrame with two columns called A and B:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
In [3]: df
Out[3]:
       A  B
0   True  1
1  False  0
In [5]: df.any()
Out[5]:
A    True
B    True
dtype: bool
The documentation says the operation happens along the rows (axis=0) by default. Then why does the output contain the column names instead of the row index? Shouldn't the output be this:
In [5]: df.any()
Out[5]:
0     True
1    False
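The per-row result expected above is what any(axis=1) produces. A minimal sketch comparing the default call with axis=1 on the same frame (the variable names per_column and per_row are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 0]})

# Default axis=0: reduce the index (rows), giving one result per column label.
per_column = df.any()
print(per_column)

# axis=1: reduce the columns, giving one result per row label.
per_row = df.any(axis=1)
print(per_row)
```

Row 0 holds True and 1, so its result is True; row 1 holds False and 0, so its result is False.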
Thanks to @user4718221.
Explanation: any() reduces the index, while any(axis=1) reduces the columns. The next question is: how does this work step by step? Here is the explanation:
- any() returns whether any element is True, i.e. a logical OR over the values.
- With the default axis=0, it reduces the entire index to a single value for each column.
- Row 0 of column A is True, so the OR for column A is already True; the remaining rows cannot change that.
- Row 0 of column B is 1, which is truthy, so the OR for column B is True as well.
- The result is A: True, B: True, which is exactly the output shown above.
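The steps above can be written out as a plain loop. This is only a sketch of the conceptual model (a per-column logical OR that stops at the first truthy value); it is not how pandas implements any() internally, and the helper any_per_column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 0]})


def any_per_column(frame: pd.DataFrame) -> dict:
    """Hypothetical helper modelling frame.any() with axis=0."""
    result = {}
    for column in frame.columns:
        found = False
        for value in frame[column]:  # walk down the index of this column
            if bool(value):  # 1 is truthy, 0 is falsy
                found = True
                break  # an OR that is already True cannot become False
        result[column] = found
    return result


manual = any_per_column(df)
print(manual)  # {'A': True, 'B': True}
print(manual == df.any().to_dict())  # True
```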
Answer:
Here's what the documentation says about the axis parameter:
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
The default value is 0, so what you're getting is a Series indexed by the column labels, showing whether each column contains at least one True (truthy) value. To get one result per row instead, use df.any(axis=1).
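A short sketch of the axis aliases, assuming the same df as in the question: 0 and "index" should select the same reduction, as should 1 and "columns", and the resulting index tells you which axis was kept.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 0]})

# 0 / "index": reduce the index, result is indexed by the column labels.
by_index = df.any(axis="index")
# 1 / "columns": reduce the columns, result is indexed by the row labels.
by_columns = df.any(axis="columns")

print(by_index.equals(df.any(axis=0)))    # True
print(by_columns.equals(df.any(axis=1)))  # True
print(list(by_index.index), list(by_columns.index))  # ['A', 'B'] [0, 1]
```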
