I have a dataset with two columns, Date1 and Date2. My goal is to subtract date2 from date1 at lag 0, lag 1, lag 2, ..., lag 6. If date2 is bigger, then flag it as 1 otherwise 0. I need to do this iteratively so
date2[0] - date1[0] date1[1] ... date1[5]
date2[1] - date1[1] date1[2] ... date1[6]
etc.
Date1 Date2
0 2021-06-15 00:25:29 2021-06-15 10:05:50
1 2021-06-15 13:32:01 2021-06-15 14:17:30
2 2021-06-15 17:59:37 2021-06-15 18:12:30
3 2021-06-17 01:01:16 2021-06-17 13:30:23
4 2021-06-17 14:07:11 2021-06-17 14:34:45
5 2021-06-17 18:30:24 2021-06-17 19:22:02
6 2021-06-17 19:42:28 2021-06-18 10:11:04
7 2021-06-18 12:54:50 2021-06-18 13:25:16
8 2021-06-18 16:59:40 2021-06-18 17:22:23
9 2021-06-18 17:49:54 2021-06-18 18:25:53
10 2021-06-18 19:57:39 2021-06-18 20:43:11
11 2021-06-21 13:52:28 2021-06-21 14:03:31
12 2021-06-21 15:44:46 2021-06-21 18:31:21
13 2021-06-21 20:03:37 2021-06-21 20:59:54
14 2021-06-22 18:39:22 2021-06-22 19:23:28
15 2021-06-23 19:45:54 2021-06-23 19:52:26
16 2021-06-23 19:59:33 2021-06-23 20:00:43
17 2021-06-24 12:53:31 2021-06-25 13:25:30
18 2021-06-25 17:57:05 2021-06-25 19:32:37
19 2021-06-28 13:34:25 2021-06-28 14:00:04
DATA
{'Date1': {0: Timestamp('2021-06-15 00:25:29'), 1: Timestamp('2021-06-15 13:32:01'), 2: Timestamp('2021-06-15 17:59:37'), 3: Timestamp('2021-06-17 01:01:16'), 4: Timestamp('2021-06-17 14:07:11'), 5: Timestamp('2021-06-17 18:30:24'), 6: Timestamp('2021-06-17 19:42:28'), 7: Timestamp('2021-06-18 12:54:50'), 8: Timestamp('2021-06-18 16:59:40'), 9: Timestamp('2021-06-18 17:49:54'), 10: Timestamp('2021-06-18 19:57:39'), 11: Timestamp('2021-06-21 13:52:28'), 12: Timestamp('2021-06-21 15:44:46'), 13: Timestamp('2021-06-21 20:03:37'), 14: Timestamp('2021-06-22 18:39:22'), 15: Timestamp('2021-06-23 19:45:54'), 16: Timestamp('2021-06-23 19:59:33'), 17: Timestamp('2021-06-24 12:53:31'), 18: Timestamp('2021-06-25 17:57:05'), 19: Timestamp('2021-06-28 13:34:25')}, 'Date2': {0: Timestamp('2021-06-15 10:05:50'), 1: Timestamp('2021-06-15 14:17:30'), 2: Timestamp('2021-06-15 18:12:30'), 3: Timestamp('2021-06-17 13:30:23'), 4: Timestamp('2021-06-17 14:34:45'), 5: Timestamp('2021-06-17 19:22:02'), 6: Timestamp('2021-06-18 10:11:04'), 7: Timestamp('2021-06-18 13:25:16'), 8: Timestamp('2021-06-18 17:22:23'), 9: Timestamp('2021-06-18 18:25:53'), 10: Timestamp('2021-06-18 20:43:11'), 11: Timestamp('2021-06-21 14:03:31'), 12: Timestamp('2021-06-21 18:31:21'), 13: Timestamp('2021-06-21 20:59:54'), 14: Timestamp('2021-06-22 19:23:28'), 15: Timestamp('2021-06-23 19:52:26'), 16: Timestamp('2021-06-23 20:00:43'), 17: Timestamp('2021-06-25 13:25:30'), 18: Timestamp('2021-06-25 19:32:37'), 19: Timestamp('2021-06-28 14:00:04')}}
CodePudding user response:
Setup
print(df)
Date1 Date2
0 2021-06-14 00:25:29 2021-06-15 10:05:50
1 2021-06-09 13:32:01 2021-06-15 14:17:30
2 2021-06-12 17:59:37 2021-06-15 18:12:30
3 2021-06-17 01:01:16 2021-06-17 13:30:23
4 2021-06-14 14:07:11 2021-06-17 14:34:45
5 2021-06-12 18:30:24 2021-06-17 19:22:02
6 2021-06-11 19:42:28 2021-06-18 10:11:04
7 2021-06-17 12:54:50 2021-06-18 13:25:16
8 2021-06-18 16:59:40 2021-06-18 17:22:23
9 2021-06-15 17:49:54 2021-06-18 18:25:53
Numpy solution
x = df['Date1'].to_numpy()
y = df['Date2'].to_numpy()
m = y[:, None] >= x
df['count'] = (np.triu(m) & ~np.triu(m, 6)).sum(1)
Result
Date1 Date2 count
0 2021-06-14 00:25:29 2021-06-15 10:05:50 5
1 2021-06-09 13:32:01 2021-06-15 14:17:30 5
2 2021-06-12 17:59:37 2021-06-15 18:12:30 4
3 2021-06-17 01:01:16 2021-06-17 13:30:23 5
4 2021-06-14 14:07:11 2021-06-17 14:34:45 5
5 2021-06-12 18:30:24 2021-06-17 19:22:02 4
6 2021-06-11 19:42:28 2021-06-18 10:11:04 3
7 2021-06-17 12:54:50 2021-06-18 13:25:16 2
8 2021-06-18 16:59:40 2021-06-18 17:22:23 2
9 2021-06-15 17:49:54 2021-06-18 18:25:53 1
Some details
y[:, None] >= xcreates a boolean mask by comparing each value inDate2to every value inDate1np.triu(arr, k=0)is used to select an array with the elements below thek-th diagonal zeroed.
