I have small dataset defined as follows;
df = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
'cmp1':[1,0,0],
'cmp2':[0,1,0],
'cmp3':[1,0,0]})
The goal is to add a new column called col which'll tell the first occurrence of 1 for every row among the columns, cmp1,cmp2,cmp3. Hence the final output shd look like;
df2 = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
'cmp1':[1,0,0],
'cmp2':[0,1,0],
'cmp3':[1,0,0],
'col':['first', 'second','none']})
As you can see for id=id1, 1 occurred for the very first time in cmp1, hence we have first in col.It is irrespective of the fact that we have cmp2=0, and again cmp3=1.
Can I get some help on how to define this new column, col?
CodePudding user response:
You can use .max(axis=1) to find those rows with at least one 1, then np.where with idxmax (for the column name):
# the actual data columns
data = df.iloc[:, 1:]
df['col'] = np.where(data.max(axis=1), data.idxmax(axis=1), 'none')
Output:
id cmp1 cmp2 cmp3 col
0 id1 1 0 1 cmp1
1 id2 0 1 0 cmp2
2 id3 0 0 0 none
If you insist on first, second,... you would need to do manually
labels = np.array(['first', 'second', 'third'])
df['col'] = np.where(data.max(axis=1), labels[data.to_numpy().argmax(axis=1)], 'none')
and output:
id cmp1 cmp2 cmp3 col
0 id1 1 0 1 first
1 id2 0 1 0 second
2 id3 0 0 0 none
CodePudding user response:
Use idxmax() with axis=1
cmp = df.filter(like='cmp')
df.loc[cmp.eq(1).any(axis=1), 'col'] = cmp.eq(1).idxmax(axis=1)
Output:
>>> df
id cmp1 cmp2 cmp3 col
0 id1 1 0 1 cmp1
1 id2 0 1 0 cmp2
2 id3 0 0 0 NaN
CodePudding user response:
You can use a combination of idxmax() and a mask to find the answer
df['first'] = df.drop(['id'], axis = 1).mask(df.eq(0)).idxmax(axis = 1).fillna('None')
