Understanding Columns and Rows operations in Pandas dataframe

Time:01-27

Pandas dataframe operations are pretty straightforward. Look at this. I create a dataframe with two columns called A and B:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [True, False], "B": [1, 0]})

In [3]: df
Out[3]: 
       A  B
0   True  1
1  False  0
    
In [5]: df.any()
Out[5]: 
A    True
B    True
dtype: bool

The documentation says any() operates along axis 0 (the index, i.e. the rows) by default. Then how come the output is labeled with column names instead of the row index? Shouldn't this be the output:

In [5]: df.any()
Out[5]: 
0     True
1    False
dtype: bool

Thanks to @user4718221.

Explanation: any() reduces the index (collapsing the rows into one value per column), while any(axis=1) reduces the columns (one value per row). The next question is: how does this work, step by step? Here is the explanation:

  • any() returns whether any element is True, i.e. a logical OR over the elements.
  • With the default axis=0, it reduces the entire index to one value per column.
  • Row 0 of column A is True, so the logical OR for column A is already True; the remaining rows cannot change the result.
  • Row 0 of column B is 1 (which is truthy), so the logical OR for column B is True as well.
  • The answer is A: True, B: True, which matches the output above.
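
To see both directions side by side, here is a minimal sketch using the same df as in the question. The names per_column and per_row are only illustrative and not from the original post:

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 0]})

# axis=0 (the default): collapse the row index, leaving one result per column
per_column = df.any()

# axis=1: collapse the columns, leaving one result per row
per_row = df.any(axis=1)

print(per_column)  # indexed by the column labels A and B
print(per_row)     # indexed by the row labels 0 and 1
```

The second call, df.any(axis=1), produces the row-indexed result the question was expecting: True for row 0 and False for row 1.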

CodePudding user response:

Here's what the documentation states:

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

The default value is 0, so what you're getting is a Series indexed by the column labels, showing whether each column contains at least one truthy value.
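
As a rough check of that reading of the docs, the sketch below compares axis=0 with its string alias "index" and with a hand-written per-column OR. The variable manual and the use of Python's built-in any() are my own illustration, not how pandas implements the reduction:

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 0]})

by_number = df.any(axis=0)        # the default
by_name = df.any(axis="index")    # string alias for axis 0

# Hand-written equivalent: OR together the values of each column.
# The keys of the result are the column labels, as the docs describe.
manual = pd.Series({col: any(df[col]) for col in df.columns})

print(list(by_name.index))  # ['A', 'B']
print(manual)
```

All three Series carry the column labels as their index, which is why the output in the question shows A and B rather than 0 and 1.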
