I have a dataframe that looks like this:
| ID | Name | Major1 | Major2 | Major3 |
|---|---|---|---|---|
| 12 | Dave | English | NaN | NaN |
| 12 | Dave | NaN | Biology | NaN |
| 12 | Dave | NaN | NaN | History |
| 13 | Nate | Spanish | NaN | NaN |
| 13 | Nate | NaN | Business | NaN |
I need to merge rows resulting in this:
| ID | Name | Major1 | Major2 | Major3 |
|---|---|---|---|---|
| 12 | Dave | English | Biology | History |
| 13 | Nate | Spanish | Business | NaN |
I know this is possible with groupby but I haven't been able to get it to work correctly. Can anyone help?
CodePudding user response:
If you are intent on using groupby, you could do something like this:
dataframe = dataframe.melt(['ID', 'Name']).dropna()
dataframe = dataframe.groupby(['ID', 'Name', 'variable'])['value'].sum().unstack('variable')
You may have to mess with the column names a bit, but this is what comes to me as a possible solution using groupby.
CodePudding user response:
Use melt and pivot
>>> df.melt(['ID', 'Name']).dropna() \
.pivot(['ID', 'Name'], 'variable', 'value') \
.reset_index().rename_axis(columns=None)
ID Name Major1 Major2 Major3
0 12 Dave English Biology History
1 13 Nate Spanish Business NaN
