I'm trying to group by DocumentNo and return the max of the Concat column. However, I want the Revision column (and any other columns I add to this df) to be included in the final output.
This is an example input:
DocumentNo Revision Concat
MAOIE-2100-CI-MTO-0001 4 MAOIE-2100-CI-MTO-00014
MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
This should be the output:
DocumentNo Revision Concat
MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
I've tried this, but I get an argmax error, I think because Concat is a string column.
df = df.groupby(['DocumentNo'], as_index=False).agg({'Concat': 'max', 'Revision': 'first'})
This works, but only returns the DocumentNo and Concat columns:
df = df.groupby('DocumentNo')['Concat'].max()
CodePudding user response:
This should work:
df.groupby("DocumentNo").apply(lambda d: d.loc[d["Concat"] == d["Concat"].max()])
Probably a faster way exists.
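One candidate for a faster way, as a rough sketch rather than a benchmarked answer: sort by Concat and keep the last row per DocumentNo, which avoids calling a Python lambda for every group. The sample frame is rebuilt from the question's input, and the name result is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-5100-EL-MTO-0005", "MAOIE-2100-CI-MTO-0001"],
    "Revision": [4, 3, 5],
    "Concat": ["MAOIE-2100-CI-MTO-00014", "MAOIE-5100-EL-MTO-00053", "MAOIE-2100-CI-MTO-00015"],
})

# Sort so the largest Concat of each DocumentNo comes last,
# then keep only that last row per DocumentNo. All other columns ride along.
result = (
    df.sort_values("Concat")
      .drop_duplicates("DocumentNo", keep="last")
      .reset_index(drop=True)
)
print(result)
```

Note one difference: if two rows in a group tie on the max Concat, the apply version above keeps both, while this keeps only one of them.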
CodePudding user response:
You can try merging after finding the groupby max:
df = df.merge(df.groupby('DocumentNo', as_index=False)['Concat'].max(), how='right')
print(df)
DocumentNo Revision Concat
0 MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
1 MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
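One caveat with taking the max of Concat: it is a string, so the comparison is character by character, and a revision of 10 would sort below a revision of 9. If Revision is numeric (an assumption; the question does not state its dtype), selecting rows with idxmax on Revision may be more robust. A minimal sketch with made-up document numbers (DOC-A and DOC-B are hypothetical):

```python
import pandas as pd

# Hypothetical data with a two-digit revision to show the string-ordering issue.
df = pd.DataFrame({
    "DocumentNo": ["DOC-A", "DOC-A", "DOC-B"],
    "Revision": [9, 10, 3],
})
df["Concat"] = df["DocumentNo"] + df["Revision"].astype(str)

# String max: "DOC-A9" beats "DOC-A10" because "9" > "1" at the first differing character.
by_string = df.groupby("DocumentNo")["Concat"].max()

# idxmax returns the row label of the highest numeric Revision in each group;
# df.loc then pulls those full rows, so every column is kept.
latest = df.loc[df.groupby("DocumentNo")["Revision"].idxmax()].reset_index(drop=True)
print(by_string)
print(latest)
```

With the question's sample data both approaches pick the same rows, since every revision there is a single digit.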
