Remove all previous rows from primary dataframe based on condition from another dataframe-CodePudding

I have two dataframe say df1 (primary dataframe) and df2. I want to drop all previous rows from df1 based on a condition from df2. My dataframe are like below:

df2

           tradingsymbol      Time
0  BANKNIFTY2220339500CE  12:54:40
1  BANKNIFTY2220340000CE  12:53:33
2  BANKNIFTY2220340500CE  12:51:50

df1.head(20)

            tradingsymbol      Time  last_price
0   BANKNIFTY2220339500CE  09:20:10       84.40
1   BANKNIFTY2220339500CE  09:20:10       85.95
2   BANKNIFTY2220339500CE  12:55:60       84.70 <-Valid Row
3   BANKNIFTY2220339500CE  13:22:10       86.35 <-Valid Row
4   BANKNIFTY2220339500CE  14:55:40       87.10 <-Valid Row

5   BANKNIFTY2220340000CE  09:20:13       88.95
6   BANKNIFTY2220340000CE  09:20:13       88.80
7   BANKNIFTY2220340000CE  09:20:14       88.30
8   BANKNIFTY2220340000CE  14:23:11       87.30 <-Valid Row

9   BANKNIFTY2220340500CE  09:20:15       90.15
10  BANKNIFTY2220340500CE  09:20:16       90.10
11  BANKNIFTY2220340500CE  09:20:17       91.05
12  BANKNIFTY2220340500CE  09:20:18       90.95

I want to remove all rows from df1 previous to time in Time column of df2 for each tradingsymbol. I want my result as below:

            tradingsymbol      Time  last_price
2   BANKNIFTY2220339500CE  12:55:60       84.70
3   BANKNIFTY2220339500CE  13:22:10       86.35
4   BANKNIFTY2220339500CE  14:55:40       87.10
8   BANKNIFTY2220340000CE  14:23:11       87.30

CodePudding user response：

In case the column elements are not yet in datetime format, you can transform:

df["Time"] = pd.to_datetime(df["Time"]).dt.time

Or, you can set this option directly while reading:

df = pd.read_csv(
    filename,
    parse_dates=["Time"],
    date_parser=lambda x: pd.to_datetime(x, format="%H:%M:%S").time()
)

Having done this for both dataframes, one way to filter the dataframe is to go through all the rows in df2 and, for each one, drop the rows that satisfy the condition in df1. So:

for index, row in df2.iterrows():
    df1.drop(
        df1[(df1.tradingsymbol == row["tradingsymbol"]) & (df1.Time < row["Time"])].index,
        inplace=True
    )