I have a dataframe that looks like this:
| Part | Date |
|---|---|
| 1 | 9/1/2021 |
| 1 | 9/8/2021 |
| 1 | 9/15/2021 |
| 2 | 9/1/2020 |
| 2 | 9/12/2020 |
| 2 | 9/14/2020 |
The dataframe is already sorted by part, then by date.
I need to calculate the number of days between each date and the date in the previous row.
The date diff calculation has to restart each time a new part is encountered.
So the desired output would be:
| Part | Date | Diff |
|---|---|---|
| 1 | 9/1/2021 | |
| 1 | 9/8/2021 | 7 |
| 1 | 9/15/2021 | 7 |
| 2 | 9/1/2020 | |
| 2 | 9/12/2020 | 11 |
| 2 | 9/14/2020 | 2 |
How would you go about processing this data to achieve the desired output?
Any assistance on this would be greatly appreciated!
Thank you
CodePudding user response:
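Use groupby followed by diff, so the difference is computed within each Part and restarts at the first row of every group: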
df.groupby('Part').Date.diff()
0 NaT
1 7 days
2 7 days
3 NaT
4 11 days
5 2 days
Name: Date, dtype: timedelta64[ns]
If the Date column is not already a datetime dtype (for example, it holds strings), convert it first with df.Date = pd.to_datetime(df.Date).
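For reference, here is a minimal end-to-end sketch that goes from the sample data in the question to the desired Diff column. It assumes the dates are month/day/year strings (the explicit format argument is my assumption, based on values like 9/15/2021), and it uses .dt.days to turn the timedelta result into plain day counts. The conversion to the nullable Int64 dtype is optional and only there so the first row of each part stays empty instead of turning the column into floats.

```python
import pandas as pd

# Sample data copied from the question; dates are assumed to be month/day/year strings.
df = pd.DataFrame({
    "Part": [1, 1, 1, 2, 2, 2],
    "Date": ["9/1/2021", "9/8/2021", "9/15/2021",
             "9/1/2020", "9/12/2020", "9/14/2020"],
})

# Convert the strings to datetimes so that date arithmetic works.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Difference to the previous row within each Part; the first row of each group is NaT.
delta = df.groupby("Part")["Date"].diff()

# Keep only the whole-day count. NaT becomes NaN, so cast to the nullable
# Int64 dtype to keep integer values alongside missing ones.
df["Diff"] = delta.dt.days.astype("Int64")

print(df)
```

Since the frame is already sorted by part and then by date, no extra sort is needed. If that were not guaranteed, calling df.sort_values(["Part", "Date"]) before the groupby would be the safer route.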
