| ID | Check 1 | Check 2 | Value |
|---|---|---|---|
| 1 | | Y | 1 |
| 1 | Y | Y | 2 |
| 2 | | Y | 5 |
| 2 | | | 8 |
I have a dataframe similar to the above table. I need to find a way to collapse the table so that I have only one row for each ID, and the Value column contains the minimum value for that ID.
I'm able to do this using:
df = df.fillna('')
df.groupby(['ID'],as_index=False).min()
However I also need to preserve any of the 'Y' values in my two Check columns, so that a Y will always take priority and fill in any null values on a row with the same ID. So based on the table above, the result I'm looking for would be like this:
| ID | Check 1 | Check 2 | Value |
|---|---|---|---|
| 1 | Y | Y | 1 |
| 2 | | Y | 5 |
What I'm seeing happen so far is that I'll be left with blanks in my Check 1 column, instead of a Y value. Any ideas on how to do this?
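For reference, the behaviour can be reproduced with the minimal sketch below. It assumes the blank cells are NaN and that the sample frame matches the table above; the name `collapsed` is only for illustration. Once the nulls are filled with `''`, `min()` compares strings, and the empty string sorts before `'Y'`, so a blank wins whenever a group contains one.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table above; blanks are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [np.nan, "Y", np.nan, np.nan],
    "Check 2": ["Y", "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# After fillna(''), min() on the Check columns is a string comparison,
# and '' < 'Y', so any blank in a group replaces the 'Y'.
collapsed = df.fillna("").groupby(["ID"], as_index=False).min()
print(collapsed)
```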
CodePudding user response:
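Group the dataframe by ID, then aggregate with min for the Value column and first for the remaining columns. In a groupby, first returns the first non-null entry of each column, so a Y is kept whenever the group has one. This only works while the blanks are still real nulls (None/NaN), so do not call fillna('') beforehand.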
>>> df.groupby('ID').agg({'Check 1': 'first', 'Check 2':'first', 'Value': 'min'})
   Check 1 Check 2  Value
ID
1        Y       Y      1
2     None       Y      5
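To run this end to end, here is a minimal sketch. It assumes the blanks are stored as None and rebuilds the sample frame from the question; `result` is just an illustrative name. Filling the remaining nulls with `''` afterwards, rather than before the groupby, gives the blank cell the question asks for.

```python
import pandas as pd

# Sample data rebuilt from the question; blanks are assumed to be None.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [None, "Y", None, None],
    "Check 2": ["Y", "Y", "Y", None],
    "Value": [1, 2, 5, 8],
})

# 'first' takes the first non-null entry per group, 'min' the smallest Value.
# as_index=False keeps ID as a regular column instead of the index.
result = df.groupby("ID", as_index=False).agg(
    {"Check 1": "first", "Check 2": "first", "Value": "min"}
)

# Only now replace the nulls that are left in the Check columns with blanks.
result = result.fillna({"Check 1": "", "Check 2": ""})
print(result)
```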
CodePudding user response:
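If you sort_values on the "Value" column before the groupby, you can simply take first() for every column: it returns the first non-null entry per column in each group, which for "Value" is the minimum because of the sort.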
>>> df.sort_values("Value").groupby("ID").first().fillna('')
   Check 1 Check 2  Value
ID
1        Y       Y      1
2                Y      5
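A self-contained sketch of the same idea follows, assuming the blanks are NaN. The rows are deliberately shuffled here (an assumption, not the original data order) to show that the sort is what makes first() pick the minimum Value. Note that first() works column by column, so the Y and the Value in a result row may come from different source rows, which is what the question wants.

```python
import numpy as np
import pandas as pd

# Same values as in the question, but with the rows out of order.
df = pd.DataFrame({
    "ID": [2, 1, 2, 1],
    "Check 1": [np.nan, "Y", np.nan, np.nan],
    "Check 2": [np.nan, "Y", "Y", "Y"],
    "Value": [8, 2, 5, 1],
})

result = (
    df.sort_values("Value")   # smallest Value comes first within each ID
      .groupby("ID")
      .first()                # first non-null entry per column and group
      .fillna("")             # blank out columns that had no 'Y' at all
      .reset_index()          # turn ID back into a regular column
)
print(result)
```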
