Home > database >  Group by ID and select rows with the newest DATE1 and the oldest DATE2
Group by ID and select rows with the newest DATE1 and the oldest DATE2

Time:01-13

I have dataframe:

ID  DATE_1      DATE_2
12  01.01.2015  01.01.2021
12  01.01.2016  01.01.2021
12  01.01.2017  01.01.2019
12  01.01.2012  01.01.2021
13  01.01.2011  01.01.2021
13  01.01.2014  01.01.2020
13  01.01.2010  01.01.2018
13  01.01.2013  01.01.2016
14  01.01.2012  01.01.2021
14  01.01.2016  01.01.2020
14  01.01.2017  01.01.2021
14  01.01.2018  01.01.2021

I want to group by ID and leave records with the newest DATE_2 and the oldest DATE_1. How to do it?

CodePudding user response:

IIUC, this is a classical groupby agg. You need to set the dates to a datetime type for meaningful comparisons:

(df
 .assign(DATE_1=pd.to_datetime(df['DATE_1']),
         DATE_2=pd.to_datetime(df['DATE_2'])
        )
 .groupby('ID')
 .agg({'DATE_1': 'min', 'DATE_2': 'max'})
)

output:

       DATE_1     DATE_2
ID                      
12 2012-01-01 2021-01-01
13 2010-01-01 2021-01-01
14 2012-01-01 2021-01-01
  •  Tags:  
  • Related