I am trying to calculate the time difference(in days) between the customer's previous visit out time and the customer's latest visit in time.
time difference = latest in time - previous out time
Here is a sample of input data
sample output table
The approach I have tried so far groupby based on customer ID and rank
temp['RANK'] = temp.groupby('customer ID')['in time'].rank(ascending=True)
but I am unsure that how to calculate the difference.
CodePudding user response:
You can use GroupBy.shift() to get the previous out time within the group. Substracted by current in time. Then, use dt.days to get the number of days of the timedelta between in time and out time within the group, as follows:
# convert date strings to datetime format
df['out time'] = pd.to_datetime(df['out time'], dayfirst=True)
df['in time'] = pd.to_datetime(df['in time'], dayfirst=True)
df['Visit diff (in days)'] = (df['in time'] - df['out time'].groupby(df['customer ID']).shift()).dt.days
Data Input:
print(df)
customer ID out time in time
0 1 05-12-1999 15:20:07 05-12-1999 14:23:31
1 1 21-12-1999 09:59:34 21-12-1999 09:41:09
2 2 05-12-1999 11:53:34 05-12-1999 11:05:37
3 2 08-12-1999 19:55:00 08-12-1999 19:40:10
4 3 01-12-1999 15:15:26 01-12-1999 13:08:11
5 3 16-12-1999 17:10:09 16-12-1999 16:34:10
Result:
print(df)
customer ID out time in time Visit diff (in days)
0 1 1999-12-05 15:20:07 1999-12-05 14:23:31 NaN
1 1 1999-12-21 09:59:34 1999-12-21 09:41:09 15.0
2 2 1999-12-05 11:53:34 1999-12-05 11:05:37 NaN
3 2 1999-12-08 19:55:00 1999-12-08 19:40:10 3.0
4 3 1999-12-01 15:15:26 1999-12-01 13:08:11 NaN
5 3 1999-12-16 17:10:09 1999-12-16 16:34:10 15.0
CodePudding user response:
You may try the following:
temp.groupby('customer ID').apply(lambda x: (x['in time'].max() - x['out time'].min()).days )


