I have two dataframe say df1 (primary dataframe) and df2. I want to drop all previous rows from df1 based on a condition from df2. My dataframe are like below:
df2
tradingsymbol Time
0 BANKNIFTY2220339500CE 12:54:40
1 BANKNIFTY2220340000CE 12:53:33
2 BANKNIFTY2220340500CE 12:51:50
df1.head(20)
tradingsymbol Time last_price
0 BANKNIFTY2220339500CE 09:20:10 84.40
1 BANKNIFTY2220339500CE 09:20:10 85.95
2 BANKNIFTY2220339500CE 12:55:60 84.70 <-Valid Row
3 BANKNIFTY2220339500CE 13:22:10 86.35 <-Valid Row
4 BANKNIFTY2220339500CE 14:55:40 87.10 <-Valid Row
5 BANKNIFTY2220340000CE 09:20:13 88.95
6 BANKNIFTY2220340000CE 09:20:13 88.80
7 BANKNIFTY2220340000CE 09:20:14 88.30
8 BANKNIFTY2220340000CE 14:23:11 87.30 <-Valid Row
9 BANKNIFTY2220340500CE 09:20:15 90.15
10 BANKNIFTY2220340500CE 09:20:16 90.10
11 BANKNIFTY2220340500CE 09:20:17 91.05
12 BANKNIFTY2220340500CE 09:20:18 90.95
I want to remove all rows from df1 previous to time in Time column of df2 for each tradingsymbol. I want my result as below:
tradingsymbol Time last_price
2 BANKNIFTY2220339500CE 12:55:60 84.70
3 BANKNIFTY2220339500CE 13:22:10 86.35
4 BANKNIFTY2220339500CE 14:55:40 87.10
8 BANKNIFTY2220340000CE 14:23:11 87.30
CodePudding user response:
In case the column elements are not yet in datetime format, you can transform:
df["Time"] = pd.to_datetime(df["Time"]).dt.time
Or, you can set this option directly while reading:
df = pd.read_csv(
filename,
parse_dates=["Time"],
date_parser=lambda x: pd.to_datetime(x, format="%H:%M:%S").time()
)
Having done this for both dataframes, one way to filter the dataframe is to go through all the rows in df2 and, for each one, drop the rows that satisfy the condition in df1. So:
for index, row in df2.iterrows():
df1.drop(
df1[(df1.tradingsymbol == row["tradingsymbol"]) & (df1.Time < row["Time"])].index,
inplace=True
)
