I have a table with 3 main columns. I would like to first group the data by Company ID, then get the Highest Post Valuation per Company ID, and its corresponding Deal Date.
Question: How do I add corresponding Deal Date in?
The data:
| Company ID | Post Valuation | Deal Date | ||
|---|---|---|---|---|
| 60 | 119616-85 | NaN | 2022-03-01 | |
| 80 | 160988-50 | 6.77 | 2022-02-10 | |
| 85 | 108827-47 | NaN | 2022-02-01 | |
| 89 | 154876-33 | 1.40 | 2022-01-27 | |
| 104 | 435509-92 | 6.16 | 2022-01-05 | |
| 107 | 186777-73 | 17.26 | 2022-01-03 | |
| 111 | 232001-47 | NaN | 2022-01-01 | |
| 113 | 160988-50 | NaN | 2021-12-31 | |
| 119 | 114196-78 | NaN | 2021-12-15 | |
| 128 | 481375-00 | 2.82 | 2021-12-01 | |
| 130 | 128348-20 | NaN | 2021-11-25 | |
| 131 | 166855-60 | 658.36 | 2021-11-25 | |
| 150 | 113503-87 | NaN | 2021-10-20 | |
| 156 | 178448-68 | 21.75 | 2021-10-07 | |
| 170 | 479007-64 | NaN | 2021-09-13 | |
| 182 | 128479-51 | NaN | 2021-09-01 | |
| 185 | 113503-87 | NaN | 2021-08-31 | |
| 186 | 128348-20 | NaN | 2021-08-30 | |
| 191 | 108643-42 | 8.02 | 2021-08-13 | |
| 192 | 186272-74 | NaN | 2021-08-12 |
The attempt
df_X.sort_values('Post Valuation', ascending=True).groupby('Company ID', as_index=False)['Post Valuation'].first()
CodePudding user response:
Sort and drop duplicates:
result = df.sort_values('Post Valuation').drop_duplicates(subset='Company ID', keep='last')
