I have a pandas DataFrame like the below:
| Price | Date |
|---|---|
| 25149.570 | 2/5/2017 14:22 |
| 24799.680 | 2/5/2017 14:22 |
| 24799.680 | 2/5/2017 14:22 |
| 14570.000 | 2/5/2017 14:47 |
| 14570.001 | 2/5/2017 14:47 |
| 14570.001 | 2/5/2017 14:47 |
| 14570.000 | 2/5/2017 15:01 |
| 14570.001 | 2/5/2017 15:01 |
| 14570.001 | 2/5/2017 15:01 |
| 14600.000 | 2/6/2017 17:49 |
| 14600.000 | 2/6/2017 17:49 |
| 14800.000 | 2/6/2017 17:49 |
| 14600.000 | 2/6/2017 17:49 |
| 14600.000 | 2/6/2017 17:49 |
| 14600.000 | 2/6/2017 18:30 |
| 14600.000 | 2/6/2017 18:30 |
| 14800.000 | 2/6/2017 18:30 |
| 14600.000 | 2/6/2017 18:30 |
| 14600.000 | 2/6/2017 18:30 |
I want to find first and last value of each day based on Date column. The result can be like the below for the first day:
| Date | first | last |
|---|---|---|
| 2/5/2017 | 25149.57 | 14570.001 |
I try to use this Q/A solution but it does not work.
How do I find First and Last Value of each day (group by date)?
CodePudding user response:
You could convert "Date" column values to dates (without hours); then groupby it and use first and last to get the desired outcome:
out = df.groupby(pd.to_datetime(df['Date']).dt.strftime('%m/%d/%Y'))['Price'].agg(['first', 'last']).reset_index()
Output:
Date first last
0 02/05/2017 25149.57 14570.001
1 02/06/2017 14600.00 14600.000
CodePudding user response:
You have to ensure your dataframe is sorted by ascending Date (and maybe Price)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=False)
out = df.sort_values(['Date', 'Price']).groupby(df['Date'].dt.date)['Price'] \
.agg(['first', 'last']).reset_index()
print(out)
# Output
Date first last
0 2017-02-05 24799.68 14570.001
1 2017-02-06 14600.00 14800.000
CodePudding user response:
You can use pd.to_datetime and dt.date as a grouper for GroupBy.agg:
df2 = (df.groupby(pd.to_datetime(df['Date']).dt.date)
['Price'].agg(['first', 'last'])
)
Output:
first last
Date
2017-02-05 25149.57 14570.001
2017-02-06 14600.00 14600.000
CodePudding user response:
You can simply use:
df["Date"] = pd.to_datetime(df["Date"])
df.set_index("Date", inplace=True)
df.groupby(pd.Grouper(freq="D")).agg(["first", "last"])
Output
| Date | ('Price', 'first') | ('Price', 'last') |
|---|---|---|
| 2017-02-05 00:00:00 | 25149.6 | 14570 |
| 2017-02-06 00:00:00 | 14600 | 14600 |
