My goal is to choose the most closest date to end of month. For instance, how to choose '2021-01-29', '2021-02-26' ? (Some sort of masking method is available?)
import pandas as pd
df=pd.DataFrame({'date': ['2021-01-28', '2021-01-29', '2021-02-25','2021-02-26']})
date
0 2021-01-28
1 2021-01-29
2 2021-02-25
3 2021-02-26
CodePudding user response:
You can grouping by month periods and get maximal value of datetimes:
df['date'] = pd.to_datetime(df['date'])
df = df.groupby(df['date'].dt.to_period('m')).max()
print (df)
date
date
2021-01 2021-01-29
2021-02 2021-02-26
Or use DataFrameGroupBy.idxmax for indices by maximal date, then select values by DataFrame.loc:
df['date'] = pd.to_datetime(df['date'])
df = df.loc[df.groupby(df['date'].dt.to_period('m'))['date'].idxmax()]
print (df)
date
1 2021-01-29
3 2021-02-26
Or first sorting and then get last duplicated row by Series.duplicated:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')
df = df[~df['date'].dt.to_period('m').duplicated(keep='last')]
print (df)
date
1 2021-01-29
3 2021-02-26
