Remove duplicates from dataframe but keep the values of other dataframe columns

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
                   [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
                  columns=['CustomerNr', 'Period1', 'Period2'])

	CustomerNr	Period1	Period2
0	1001	120.0	NaN
1	1001	NaN	30.0
2	1004	160.0	NaN
3	1005	160.0	NaN
4	1006	NaN	8.0
5	1010	160.0	NaN
6	1010	NaN	4.0

I need to generate the following, where rows with a duplicated CustomerNr are merged into one while the values of Period1 and Period2 are kept:

	CustomerNr	Period1	Period2
0	1001	120.0	30.0
1	1004	160.0	NaN
2	1005	160.0	NaN
3	1006	NaN	8.0
4	1010	160.0	4.0

CodePudding user response：

df.groupby('CustomerNr').agg('min')
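
This works for the data in the question because each customer has at most one non-null value per period and min skips NaN. If a group held several non-null values, min would return the smallest one rather than the first. A small sketch of that difference, using a made-up extra row for customer 1001 that is not in the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: customer 1001 has two non-null Period1 values (120 and 90).
df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1001, 90, np.nan]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)

# min picks the smallest non-null value per group.
by_min = df.groupby('CustomerNr').agg('min')

# first picks the first non-null value per group, in row order.
by_first = df.groupby('CustomerNr').first()

print(by_min)
print(by_first)
```

Here Period1 for 1001 comes out as 90.0 with min and 120.0 with first, so which one you want depends on what a "kept" value should mean when there is more than one.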

CodePudding user response：

You can group by CustomerNr and take the first item per group. By default, first() skips NaNs and returns the first non-null value in each column:

df.groupby('CustomerNr').first()

output:

             Period1   Period2
CustomerNr                    
1001        120.0000   30.0000
1004        160.0000       NaN
1005        160.0000       NaN
1006             NaN    8.0000
1010        160.0000    4.0000
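
The output above has CustomerNr as the index. If you want it back as a regular column with a fresh 0 to 4 index, as in the desired output in the question, one option is to pass as_index=False to groupby (calling .reset_index() on the result should give the same thing). A minimal sketch using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan],
     [1005, 160, np.nan], [1006, np.nan, 8], [1010, 160, np.nan],
     [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'],
)

# as_index=False keeps CustomerNr as a column instead of moving it to the index.
result = df.groupby('CustomerNr', as_index=False).first()

print(result)
```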