import pandas as pd
df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1], 'col_b': [1, 0, 0, 1, 0, 1], 'col_c': [1, 0, 0, 1, 0, 1]})
df
   col_a  col_b  col_c
0      0      1      1
1      0      0      0
2      0      0      0
3      1      1      1
4      1      0      0
5      1      1      1
I want to add a new feature (column) to this df based on a majority vote within each row (pseudocode: if the 1s in a row are the majority, the new value is 1, otherwise 0). I tried a for loop, but the original data has 10000 rows and it takes several minutes (I think it would be faster with the pandas API). I also tried apply and assign, but could not get them to work because I am not familiar with the pandas package.
I would like to learn how to do this with the pandas API. Thank you all.
CodePudding user response:
You can use mode along the rows (axis=1), which returns the most frequent value in each row:
df['col_d'] = df.mode(axis=1)
print(df)
# Output
   col_a  col_b  col_c  col_d
0      0      1      1      1
1      0      0      0      0
2      0      0      0      0
3      1      1      1      1
4      1      0      0      0
5      1      1      1      1
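One caveat, sketched here under assumptions: with an even number of 0/1 columns a row can be tied, and mode(axis=1) then returns more than one column, so the direct assignment no longer fits. The frame `even` below is a hypothetical four-column example (not from the question), and picking the first mode column is just one possible tie rule (it resolves ties to 0).

```python
import pandas as pd

# Hypothetical frame with an even number of columns, so ties are possible
even = pd.DataFrame({'col_a': [0, 1, 1],
                     'col_b': [0, 1, 0],
                     'col_c': [1, 1, 0],
                     'col_d': [1, 0, 0]})

# Row 0 is tied (two 0s, two 1s), so mode returns two columns;
# rows with a single mode are padded with NaN in the extra column
modes = even.mode(axis=1)

# Taking the first mode column always yields one value per row;
# for a tied row this is the smaller value, i.e. 0
even['vote'] = modes[0].astype(int)
print(even)
```

If ties should be resolved differently (for example in favour of 1), comparing the row sum against a threshold gives more direct control than mode.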
CodePudding user response:
You can sum across the columns of each row. With three 0/1 columns, a row sum of at least 2 means 1 is the majority:
import numpy as np
df['feature'] = np.where(df.sum(axis=1).ge(2), '1 majority', '0 majority')
print(df)
   col_a  col_b  col_c     feature
0      0      1      1  1 majority
1      0      0      0  0 majority
2      0      0      0  0 majority
3      1      1      1  1 majority
4      1      0      0  0 majority
5      1      1      1  1 majority
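The threshold of 2 is specific to three columns. A minimal sketch of the same idea for any number of 0/1 columns, assuming a strict majority is wanted: the row mean is the share of 1s, so it can be compared against 0.5. The names `vote_cols` and `majority` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

# Columns that take part in the vote (hypothetical name)
vote_cols = ['col_a', 'col_b', 'col_c']

# For 0/1 data the row mean is the fraction of 1s;
# strictly more than half means 1 is the majority (a tie counts as 0)
df['majority'] = df[vote_cols].mean(axis=1).gt(0.5).astype(int)
print(df)
```

This stays vectorized, so it avoids the row-by-row loop, and selecting `vote_cols` explicitly keeps any previously added result column out of the vote.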
