I have a data frame
In = pd.DataFrame([
#["A", "B", "C", "D","E","F"]
["V", 13, 0.0, 45, 65, 32],
["W", 23, 45, 0.06, 78, 33],
["X", 12, 37, 53, 0.00, 0.9],
["Y", 12, 34, 56, 0.03, 36],
["Z",0.07, 31, 46, 87, 33]
], columns=["A","B","C","D","E","F"])
I want to keep only those columns where the value is greater than 1. And this operation should exclude certain rows. In this case W and Z row of A column.
Expected Output:
Out = pd.DataFrame([
#["A","B", "D"]
["V", 13, 45],
["W", 23, 0.06],
["X", 12, 53],
["Y", 12, 56],
["Z",0.07, 46]
],columns=["A","B","D"])
How to do it?
CodePudding user response:
Create a boolean mask: first hide rows W and Z then keep columns where all values are greater than 0.
mask = In[~In['A'].isin(['W', 'Z'])].select_dtypes('number').gt(1).all()
Out = In[['A'] mask[mask].index.tolist()]
Output:
>>> Out
A B D
0 V 13.00 45.00
1 W 23.00 0.06
2 X 12.00 53.00
3 Y 12.00 56.00
4 Z 0.07 46.00
>>> mask
B True
C False
D True
E False
F False
dtype: bool
CodePudding user response:
Use the loc accessor to slice. To do this, set column A as index.
Oneliner of code
In.set_index('A').loc[:,In.set_index('A').loc[['W','Z']].le(1).any()]
We can make it neat
s = In.set_index('A')
s.loc[:,s.loc[['W','Z']].le(1).any()].reset_index()
A B D
0 V 13.00 45.00
1 W 23.00 0.06
2 X 12.00 53.00
3 Y 12.00 56.00
4 Z 0.07 46.00
