There is this dataframe called frame with columns: Age, Maturity, Gender, Height which has values 'PASSED' or 'FAILED'.
I want to create a new column called result and count the number of a subset of the overall columns as seen in check_columns which has the value PASSED.
I tried to use a comprehension expression which as seen in this case I wanted it to be evaluated to 2 since the columns Age and Maturity have PASSED while Gender has FAILED which are present in a subset of columns in check_columns.
frame = pd.DataFrame(
data = {'Age':['PASSED'], 'Maturity':['PASSED'],'Gender':['FAILED'], 'Height':['PASSED']}
)
check_columns = ['Age','Maturity','Gender']
frame['result'] = sum([1 if frame[column] =='PASSED' else 0 for column in check_columns ])
I tried to used a comprehension with a list but it says this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
Select only the desired columns, check which ones are equal to 'PASSED' using DataFrame.eq, and count the True values using DataFrame.sum (pass axis=1 to count column-wise)
import pandas as pd
frame = pd.DataFrame(
data = {'Age':['PASSED'], 'Maturity':['PASSED'],'Gender':['FAILED'], 'Height':['PASSED']}
)
check_columns = ['Age','Maturity','Gender']
frame['results'] = frame[check_columns].eq('PASSED').sum(axis=1)
Output:
>>> frame
Age Maturity Gender Height results
0 PASSED PASSED FAILED PASSED 2
