Select the columns in a dataframe where values are less than x value and exclude this operation on c-CodePudding

I have a data frame

In = pd.DataFrame([
    #["A", "B", "C", "D","E","F"]
    ["V",  13, 0.0,  45,  65,  32], 
    ["W",  23, 45, 0.06,  78,  33], 
    ["X",  12, 37,   53, 0.00, 0.9],
    ["Y",  12, 34,   56, 0.03, 36],
    ["Z",0.07, 31,   46,  87,  33]
], columns=["A","B","C","D","E","F"])

I want to keep only those columns where the value is greater than 1. And this operation should exclude certain rows. In this case W and Z row of A column.

Expected Output:

Out = pd.DataFrame([
    #["A","B", "D"]
    ["V", 13,  45], 
    ["W", 23, 0.06], 
    ["X", 12,  53],
    ["Y", 12,  56],
    ["Z",0.07, 46]
],columns=["A","B","D"])

How to do it?

CodePudding user response：

Create a boolean mask: first hide rows W and Z then keep columns where all values are greater than 0.

mask = In[~In['A'].isin(['W', 'Z'])].select_dtypes('number').gt(1).all()
Out = In[['A']   mask[mask].index.tolist()]

Output:

>>> Out
   A      B      D
0  V  13.00  45.00
1  W  23.00   0.06
2  X  12.00  53.00
3  Y  12.00  56.00
4  Z   0.07  46.00

>>> mask
B     True
C    False
D     True
E    False
F    False
dtype: bool

CodePudding user response：

Use the loc accessor to slice. To do this, set column A as index.

Oneliner of code

In.set_index('A').loc[:,In.set_index('A').loc[['W','Z']].le(1).any()]

We can make it neat

s = In.set_index('A')

s.loc[:,s.loc[['W','Z']].le(1).any()].reset_index()



   A      B      D
0  V  13.00  45.00
1  W  23.00   0.06
2  X  12.00  53.00
3  Y  12.00  56.00
4  Z   0.07  46.00