Home > Back-end >  Finding whether there is any overlap between two date periods in DataFrame
Finding whether there is any overlap between two date periods in DataFrame

Time:01-09

I have the following pd.DataFrame

df = pd.DataFrame({'admission_timestamp': ['2021-01-17 17:45:00', '2020-03-31 23:32:00', '2020-03-27 18:20:00', '2020-04-17 18:12:00', '2020-03-19 19:12:00'], 'end_period': ['2021-01-18 17:45:00', '2020-04-01 23:32:00', '2020-03-28 18:20:00', '2020-04-18 18:12:00', '2020-03-20 19:12:00'], 'start_med': ['NaT', '2020-04-01 00:00:00', '2020-03-27 19:00:00', '2020-04-17 18:39:24', 'NaT'], 'end_med': ['NaT', '2020-04-14 21:00:00', '2020-04-05 00:00:00', '2020-05-06 22:07:29', 'NaT']})

that looks like:

   admission_timestamp  end_period           start_med            end_med 
1  2021-01-17 17:45:00  2021-01-18 17:45:00  NaT                  NaT   
2  2020-03-31 23:32:00  2020-04-01 23:32:00  2020-04-01 00:00:00  2020-04-14 21:00:00
3  2020-03-27 18:20:00  2020-03-28 18:20:00  2020-03-27 19:00:00  2020-04-05 00:00:00
4  2020-04-17 18:12:00  2020-04-18 18:12:00  2020-04-17 18:39:24  2020-05-06 22:07:29   
5  2020-03-19 19:12:00  2020-03-20 19:12:00  NaT                  NaT

I want to create a new column received_medidation that states whether or not (boolean) the patient received medication between admission_timestamp and end_period (even if it was for only one second). So, the boolean should state if there is any time between admission_timestamp and end_period that overlaps with the time between start_med and end_med. The dtypes are all datetime64[ns].

I know that we can create boolean masks such as

condition = (df['date'] > start_date) & (df['date'] <= end_date)

... however I fail to understand how this could possibily solve the task above. Any help is appreciated.

CodePudding user response:

Use between

df['overlaps'] = df['start_med'].between(df['admission_timestamp'], df['end_period']) \
                 | df['end_med'].between(df['admission_timestamp'], df['end_period'])
print(df)

# Output
  admission_timestamp          end_period           start_med             end_med  overlaps
1 2021-01-17 17:45:00 2021-01-18 17:45:00                 NaT                 NaT     False
2 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2020-04-14 21:00:00      True
3 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 2020-04-05 00:00:00      True
4 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 2020-05-06 22:07:29      True
5 2020-03-19 19:12:00 2020-03-20 19:12:00                 NaT                 NaT     False

CodePudding user response:

You can apply a function that checks the condition (note that it suffices that start_med date is between admission_timestamp and end_period):

for col in df.columns:
    df[col] = pd.to_datetime(df[col])

df['received_medidation'] = (df['admission_timestamp'] < df['start_med']) & (df['start_med'] < df['end_period'])

Output:

  admission_timestamp          end_period           start_med  \
0 2021-01-17 17:45:00 2021-01-18 17:45:00                 NaT   
1 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00   
2 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00   
3 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24   
4 2020-03-19 19:12:00 2020-03-20 19:12:00                 NaT   

              end_med  received_medidation  
0                 NaT                False  
1 2020-04-14 21:00:00                 True  
2 2020-04-05 00:00:00                 True  
3 2020-05-06 22:07:29                 True  
4                 NaT                False  
  •  Tags:  
  • Related