I have the following dataframe:
| id | Statement_1 | Statement_2 | Statement_3 |
|---|---|---|---|
| xyz | 0.0 | 1.0 | 0.0 |
| abc | 1.0 | 0.0 | 0.0 |
| efg | 0.0 | 0.0 | 1.0 |
I want to create a new column based on a condition: if a row has a 1.0, the new column should contain the name of the column that holds it.
More clearly, I would like to get this dataframe in return:
| id | Statement_1 | Statement_2 | Statement_3 | Condition |
|---|---|---|---|---|
| xyz | 0.0 | 1.0 | 0.0 | Statement_2 |
| abc | 1.0 | 0.0 | 0.0 | Statement_1 |
| efg | 0.0 | 0.0 | 1.0 | Statement_3 |
Thanks to everyone who can help me with this!
CodePudding user response:
Use idxmax if you are sure each row has at least one 1:
df['Condition'] = df.filter(like='Statement').idxmax(axis=1)
print(df)
# Output
id Statement_1 Statement_2 Statement_3 Condition
0 xyz 0.0 1.0 0.0 Statement_2
1 abc 1.0 0.0 0.0 Statement_1
2 efg 0.0 0.0 1.0 Statement_3
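One caveat with idxmax: on a row with no 1 at all, every value ties at 0.0 and idxmax returns the first column name. Here is a minimal sketch of one way to guard against that, assuming the same column layout as the question. The extra `hij` row and the `has_one` mask are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical sample data: the question's three rows plus an all-zero row.
df = pd.DataFrame({
    "id": ["xyz", "abc", "efg", "hij"],
    "Statement_1": [0.0, 1.0, 0.0, 0.0],
    "Statement_2": [1.0, 0.0, 0.0, 0.0],
    "Statement_3": [0.0, 0.0, 1.0, 0.0],
})

statements = df.filter(like="Statement")

# True for rows that actually contain a 1.0 somewhere.
has_one = statements.eq(1).any(axis=1)

# idxmax alone would label the all-zero row "Statement_1" (the first maximum),
# so keep the label only where has_one is True; other rows become missing.
df["Condition"] = statements.idxmax(axis=1).where(has_one)

print(df)
```

With this, the `hij` row gets a missing value in `Condition` instead of a misleading `Statement_1`.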
A more robust way, for cases where a row contains zero, one, or several 1s:
df['Condition'] = df.filter(like='Statement').melt(ignore_index=False) \
.query('value == 1').groupby(level=0)['variable'] \
.apply(', '.join)
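To see how the melt approach behaves on those edge cases, here is a small self-contained run. The data is hypothetical: the `abc` row has two 1s and the `efg` row has none. It also assumes pandas 1.1 or later, since `melt(ignore_index=False)` is not available before that.

```python
import pandas as pd

# Hypothetical data: row 1 has two 1s, row 2 has none.
df = pd.DataFrame({
    "id": ["xyz", "abc", "efg"],
    "Statement_1": [0.0, 1.0, 0.0],
    "Statement_2": [1.0, 0.0, 0.0],
    "Statement_3": [0.0, 1.0, 0.0],
})

df["Condition"] = (
    df.filter(like="Statement")
      .melt(ignore_index=False)       # long format, original row index kept
      .query("value == 1")            # keep only the cells equal to 1
      .groupby(level=0)["variable"]   # group the column names by original row
      .apply(", ".join)               # join the names found for each row
)

print(df)
```

Rows without any 1 drop out at the `query` step, so the index-aligned assignment leaves them as missing values in `Condition`.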
CodePudding user response:
If a row can contain multiple 1 values, use matrix multiplication: ![[1]: https://i.stack.imgur.com/2lzSk.png](https://img.codepudding.com/202203/648c2a5a2f1b477ca18ccfa17e9b5f08.png)
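The code for this answer only survives as an image, so what follows is a guess at the general idea rather than the answerer's exact code. It is a minimal sketch using `DataFrame.dot` on a boolean mask, and the sample data and the `", "` separator are assumptions.

```python
import pandas as pd

# Hypothetical data: row 1 has two 1s, row 2 has none.
df = pd.DataFrame({
    "id": ["xyz", "abc", "efg"],
    "Statement_1": [0.0, 1.0, 0.0],
    "Statement_2": [1.0, 0.0, 0.0],
    "Statement_3": [0.0, 1.0, 0.0],
})

statements = df.filter(like="Statement")

# Boolean mask of the cells equal to 1. Multiplying True/False by a string
# yields the string or "", so the dot product concatenates the matching
# column names for each row.
mask = statements.eq(1)
df["Condition"] = mask.dot(statements.columns + ", ").str.rstrip(", ")

print(df)
```

The mask is built with `eq(1)` because multiplying the raw float values by strings would fail. A row with no 1 ends up with an empty string here, not a missing value.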
