How can I add a feature to a DataFrame based on a complex condition?

import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1], 'col_b': [1, 0, 0, 1, 0, 1], 'col_c': [1, 0, 0, 1, 0, 1]})

df
   col_a  col_b  col_c
0      0      1      1
1      0      0      0
2      0      0      0
3      1      1      1
4      1      0      0
5      1      1      1

I want to add a new feature to this df based on the following rule (pseudocode): if 1s are the majority of the values in a row, just like a vote. I have tried looping over every column, but the original data has 10,000 rows and it takes several minutes (I think using the pandas API would be faster). I have tried apply and assign, but I could not get them to work because I am unfamiliar with pandas. I would like to learn how to do this with the pandas API. Thank you all.
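For reference, the rule above can be written as a plain row-wise function and passed to DataFrame.apply with axis=1, which is roughly what an apply-based attempt looks like. This is a minimal sketch; majority_of_ones is a hypothetical helper name. It gives the right result, but it calls Python code once per row, so on 10,000 rows it is typically slower than a vectorized expression.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

def majority_of_ones(row):
    # Hypothetical helper: 1 if more than half of the row's values are 1, else 0
    return int((row == 1).sum() > len(row) / 2)

# Row-wise apply: correct, but runs the Python function once per row
df['col_d'] = df.apply(majority_of_ones, axis=1)
print(df)
```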

Answer:

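You can use mode along each row. DataFrame.mode(axis=1) returns a DataFrame (one column per tied value), so take its first column:

df['col_d'] = df.mode(axis=1)[0]
print(df)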

# Output
   col_a  col_b  col_c  col_d
0      0      1      1      1
1      0      0      0      0
2      0      0      0      0
3      1      1      1      1
4      1      0      0      0
5      1      1      1      1
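With three 0/1 columns a row can never tie, so mode yields a single column here. With an even number of columns a row can have two modes, and DataFrame.mode(axis=1) then returns a second column padded with NaN; a multi-column result cannot be assigned to one column, so selecting the first column with df.mode(axis=1)[0] is the safer way to build a single feature. The sketch below uses a hypothetical four-column frame, df4, to show this. Since Series.mode returns the modes in sorted order, taking column 0 resolves a 0/1 tie in favour of 0.

```python
import pandas as pd

# Hypothetical frame with an even number of columns, so rows 0 and 1 tie 2-2
df4 = pd.DataFrame({'col_a': [0, 1, 1],
                    'col_b': [1, 0, 1],
                    'col_c': [1, 1, 0],
                    'col_d': [0, 0, 1]})

# One column per mode; rows with a single mode get NaN in the extra column
modes = df4.mode(axis=1)
print(modes)

# The first column holds the smallest mode of each row, so ties become 0
print(modes[0])
```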

Answer:

You can sum across the columns of each row. With three 0/1 columns, a row sum of 2 or more (i.e. greater than 1) means 1 is the majority:

import numpy as np

df['feature'] = np.where(df.sum(axis=1).ge(2), '1 majority', '0 majority')

print(df)

   col_a  col_b  col_c     feature
0      0      1      1  1 majority
1      0      0      0  0 majority
2      0      0      0  0 majority
3      1      1      1  1 majority
4      1      0      0  0 majority
5      1      1      1  1 majority
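The threshold of 2 above is specific to three columns. A sketch that works for any number of 0/1 columns, assuming the voting columns are listed explicitly in a hypothetical vote_cols list, compares each row's mean (the share of 1s) with 0.5 and stores an integer 0/1 feature instead of a string. It uses DataFrame.assign, which the question mentions; an exact tie (possible only with an even number of columns) maps to 0 here.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

# Hypothetical: the columns that take part in the vote
vote_cols = ['col_a', 'col_b', 'col_c']

# The mean of a 0/1 row is the share of 1s; strictly more than half means 1 wins.
# assign passes the current frame to the lambda and returns a new DataFrame.
df = df.assign(col_d=lambda d: (d[vote_cols].mean(axis=1) > 0.5).astype(int))
print(df)
```

On the sample data col_d comes out as 1, 0, 0, 1, 0, 1, the same values as the mode answer.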