I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
     [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)
| | CustomerNr | Period1 | Period2 |
|---|---|---|---|
| 0 | 1001 | 120.0 | NaN |
| 1 | 1001 | NaN | 30.0 |
| 2 | 1004 | 160.0 | NaN |
| 3 | 1005 | 160.0 | NaN |
| 4 | 1006 | NaN | 8.0 |
| 5 | 1010 | 160.0 | NaN |
| 6 | 1010 | NaN | 4.0 |
I need to generate the following, where duplicated CustomerNr rows are collapsed into a single row but the values of Period1 and Period2 are kept:
| | CustomerNr | Period1 | Period2 |
|---|---|---|---|
| 0 | 1001 | 120.0 | 30.0 |
| 1 | 1004 | 160.0 | NaN |
| 2 | 1005 | 160.0 | NaN |
| 3 | 1006 | NaN | 8.0 |
| 4 | 1010 | 160.0 | 4.0 |
CodePudding user response:
df.groupby('CustomerNr').agg('min')
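One caveat worth spelling out: min gives the expected result for the data in the question because each customer has at most one non-null value per column, and min skips NaN. If a customer could have several different non-null values in the same column, min would keep the smallest one rather than the first one. A minimal sketch of the difference, using a small made-up frame (demo and its values are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: customer 1 has two different non-null Period1 values
demo = pd.DataFrame({
    'CustomerNr': [1, 1, 2],
    'Period1': [200.0, 150.0, np.nan],
    'Period2': [np.nan, 30.0, 8.0],
})

# Smallest non-null value per column within each group
by_min = demo.groupby('CustomerNr').agg('min')

# First non-null value per column within each group
by_first = demo.groupby('CustomerNr').first()

print(by_min)
print(by_first)
```

For customer 1, by_min keeps 150.0 in Period1 while by_first keeps 200.0; the other cells are the same in both results.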
CodePudding user response:
You can group by CustomerNr and take the first item per group. GroupBy.first() returns the first non-null value of each column within a group, so the NaNs are skipped:
df.groupby('CustomerNr').first()
Output:
            Period1  Period2
CustomerNr
1001       120.0000  30.0000
1004       160.0000      NaN
1005       160.0000      NaN
1006            NaN   8.0000
1010       160.0000   4.0000
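Note that the result above has CustomerNr as the index, whereas the desired table in the question shows it as a regular column with a fresh 0..4 index. One way to get that shape, sketched here on the question's data, is to pass as_index=False to groupby (calling .reset_index() on the result should be equivalent):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
     [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)

# as_index=False keeps CustomerNr as a regular column instead of the index
result = df.groupby('CustomerNr', as_index=False).first()
print(result)
```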
