how to check whether a customer id in df1 existed before that date in df2 using pandas-CodePudding

I have two datasets allocated to pandas dataframes DF1,DF2. One with historical customer transactions (DF2) and another with other customer activities (DF1), I need to first keep only first occurances in DF2 to avoid any mistakes, then add a column ['existed_before'] to DF1 with input as True/False after comparing ID's and dates in both dataframes. The logic I have in mind is something like the following: if DF1['id'] == to DF2['ID'] and DF1['date'] > DF2['date'] then True, else it should be False or Nan in case the ID didn't exist in DF2 at all.

Below is a demonstration of the desired output, *Notes: code should consider both id and full date (including hours/mins/sec). Also, an ID might repeat many times in both dataframes.

df1:
id date
1 19-01-2021 00:13:48
2 04-03-2021 06:23:21
3 06-05-2021 08:44:00
3 23-11-2021 23:18:18
4 15-12-2021 03:20:13

df2:
id date
1 19-01-2021 00:13:48
2 02-01-2021 09:27:23
3 06-05-2021 08:44:00

df1:
id date existed_before
1 19-01-2021 00:13:48 False
2 04-03-2021 06:23:21 True
3 06-05-2021 08:44:00 False
3 23-11-2021 23:18:18 True
4 15-12-2021 03:20:13 NaN

CodePudding user response：

I got different ouput in merge_asof with helper column:

df1['date'] = pd.to_datetime(df1['date'], dayfirst=True)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=True)

df1 = df1.sort_values('date')
df2 = df2.sort_values('date')

df = pd.merge_asof(df1, 
                    df2.assign(first_time=True), 
                    on='date', 
                    by='id', 
                    allow_exact_matches=False)

df['first_time'] = (df['first_time'].fillna(False)
                                    .astype('boolean')
                                    .where(df['id'].isin(df2['id'])))
print (df)
   id                date  first_time
0   1 2021-01-19 00:13:48       False
1   2 2021-03-04 06:23:21        True
2   3 2021-05-06 08:44:00       False
3   3 2021-11-23 23:18:18        True
4   4 2021-12-15 03:20:13        <NA>