I have data similar to:
| Date | A | B |
|---|---|---|
| 1/5/22 | 4 | 5 |
| 1/6/22 | 8 | 3 |
| 1/8/22 | 1 | 5 |
| 2/10/22 | 4 | 8 |
| 2/28/22 | 6 | 4 |
| 3/15/22 | 6 | 0 |
| 3/21/22 | 0 | 0 |
| 3/25/22 | 5 | 7 |
I want to retain only rows with the last date of the month, similar to below:
| Date | A | B |
|---|---|---|
| 1/8/22 | 1 | 5 |
| 2/28/22 | 6 | 4 |
| 3/25/22 | 5 | 7 |
I tried using the following code:
df.resample("M").last()
However, I get the following error: 'DataFrame' object has no attribute 'to_datetime'
However, my index is the Date column, which I converted with pd.to_datetime as shown below:
df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = pd.to_datetime(df['Date']).dt.date
df.set_index('Date', inplace=True)
Answer:
You can group the dates by monthly period (convert with pd.to_datetime, then dt.to_period('M')) and use groupby.idxmax to get the row label of the latest date in each month:
# convert to datetime
date = pd.to_datetime(df['Date'], dayfirst=False)
# get index of last day per monthly period
out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
output:
      Date  A  B
2   1/8/22  1  5
4  2/28/22  6  4
7  3/25/22  5  7
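For reference, here is a self-contained sketch of the approach above, run on the sample data from the question. The explicit `format="%m/%d/%y"` is my assumption about how the dates are written (month/day/two-digit year); the original snippet relies on `dayfirst=False` instead.

```python
import pandas as pd

# Sample data copied from the question; "Date" is still a plain string column.
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22",
             "2/28/22", "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Parse the strings into datetime64 values (format is an assumption, see above).
date = pd.to_datetime(df["Date"], format="%m/%d/%y")

# For each monthly period, find the row label holding the latest date.
last_idx = date.groupby(date.dt.to_period("M")).idxmax()

# Select those rows from the original frame; "Date" keeps its original strings.
out = df.loc[last_idx]
print(out)
```

Because the parsed dates live in a separate Series, the `Date` column in `out` keeps its original text formatting.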
NB. If several rows can share the last date of a month and you want to keep all of them, use:
out = df.loc[date.isin(date.groupby(date.dt.to_period('M')).max())]
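A small sketch of that duplicate case, using made-up data in which the last January date appears twice. The data and the `format` string are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: 1/8/22 is the last January date and occurs on two rows.
df = pd.DataFrame({
    "Date": ["1/5/22", "1/8/22", "1/8/22", "2/10/22", "2/28/22"],
    "A": [4, 1, 9, 4, 6],
    "B": [5, 5, 2, 8, 4],
})

date = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Latest date within each monthly period (one Timestamp per month).
last_dates = date.groupby(date.dt.to_period("M")).max()

# Keep every row whose date equals one of those per-month maxima.
out = df.loc[date.isin(last_dates)]
print(out)
```

With idxmax only one of the two 1/8/22 rows would be returned, whereas the isin mask keeps both.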
If Date is the index:
date = pd.to_datetime(df.index.to_series(), dayfirst=False)
out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
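As a side note on the error in the question: `.dt.date` turns the column into plain Python `date` objects, so after `set_index` the index has object dtype rather than being a `DatetimeIndex`, and `resample` only accepts a `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`. Below is a minimal sketch that skips `.dt.date` and applies the same groupby/idxmax idea to the index. The `format` string is again my assumption about the date layout.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22",
             "2/28/22", "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Keep datetime64 values (no .dt.date) so the index becomes a DatetimeIndex.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
df = df.set_index("Date")

# Turn the index into a Series so the .dt accessor and groupby are available.
date = df.index.to_series()

# idxmax returns the index label (a Timestamp) of the latest date per month.
out = df.loc[date.groupby(date.dt.to_period("M")).idxmax()]
print(out)
```

Unlike `resample(...).last()`, which labels each row with the calendar month end, this keeps the dates actually present in the data (for example 2022-01-08 rather than 2022-01-31).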
