Loop the removal of pandas dataframe rows

I want to delete rows in a loop until I reach the row whose time value is 04:30:00, and then stop deleting. How do I do that?

  ticker        date      time  vol    vwap  open  high   low  close
0   AACG  2022-01-06  04:07:00  242  2.0400  2.04  2.04  2.04   2.04
1   AACG  2022-01-06  04:08:00  427  2.0858  2.06  2.10  2.06   2.10
2   AACG  2022-01-06  04:09:00  906  2.1098  2.10  2.11  2.10   2.11
3   AACG  2022-01-06  04:16:00  186  2.1108  2.12  2.12  2.10   2.10
4   AACG  2022-01-06  04:30:00  237  2.0584  2.06  2.06  2.06   2.06
5   AACG  2022-01-06  04:31:00  700  2.1098  2.10  2.11  2.10   2.11

I tried this but it doesn't show that anything has changed:

row = 0
while df['time'].values[row] == datetime.time(4,30) == False:
    print(df['time'].values[row])
    df.drop(row,axis=0,inplace=True)
    row = row + 1

Here is the df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
ticker    10 non-null object
date      10 non-null object
time      10 non-null object
vol       10 non-null int64
vwap      10 non-null float64
open      10 non-null float64
high      10 non-null float64
low       10 non-null float64
close     10 non-null float64
lbh       10 non-null int64
lah       10 non-null int64
trades    10 non-null int64
dtypes: float64(5), int64(4), object(3)
memory usage: 1.1+ KB

UPDATE: Thanks again for your help everyone.

"df[df['time'] >= datetime.time(4, 30)]" helped me remove unnecessary rows.

CodePudding user response：

You can use a boolean mask to filter your data. If df['time'] holds datetime.time objects, you can filter df simply as shown here. Note that > also drops the 04:30:00 row itself; use >= if you want to keep it:

out = df[df['time'] > datetime.time(4,30)]

Output:

  ticker        date      time  vol    vwap  open  high  low  close
5   AACG  2022-01-06  04:31:00  700  2.1098   2.1  2.11  2.1   2.11
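
To make the difference between > and >= concrete, here is a minimal sketch. The small frame below is a hypothetical stand-in for the question's data (only three columns), and the names cutoff, after_cutoff and from_cutoff are made up for illustration:

```python
import datetime

import pandas as pd

# Hypothetical miniature of the question's frame; only the columns needed here.
df = pd.DataFrame({
    "ticker": ["AACG"] * 6,
    "time": [
        datetime.time(4, 7), datetime.time(4, 8), datetime.time(4, 9),
        datetime.time(4, 16), datetime.time(4, 30), datetime.time(4, 31),
    ],
    "vol": [242, 427, 906, 186, 237, 700],
})

cutoff = datetime.time(4, 30)

# Strictly later than the cutoff: the 04:30:00 row is dropped as well.
after_cutoff = df[df["time"] > cutoff]

# At or later than the cutoff: the 04:30:00 row is kept.
from_cutoff = df[df["time"] >= cutoff]

print(after_cutoff.index.tolist())
print(from_cutoff.index.tolist())
```

Both variants compare values rather than positions, so they keep every row at or past the cutoff regardless of where it sits in the frame.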

CodePudding user response：

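Don't loop, slice. You can use a mask for that (here built from a boolean Series and cummax). Comparing against '04:30:00' assumes the time column holds strings; for datetime.time objects, compare against datetime.time(4, 30) instead: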

df[df['time'].eq('04:30:00').cummax()]

output:

  ticker        date      time  vol    vwap  open  high   low  close
4   AACG  2022-01-06  04:30:00  237  2.0584  2.06  2.06  2.06   2.06
5   AACG  2022-01-06  04:31:00  700  2.1098  2.10  2.11  2.10   2.11

If you also want to exclude the matching row:

df[df['time'].eq('04:30:00').shift(fill_value=False).cummax()]
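
A short sketch of how the two masks are built, step by step. It assumes the times are plain strings, and the frame and the names hit, keep_from_hit and keep_after_hit are hypothetical, chosen only for this example:

```python
import pandas as pd

# Hypothetical sample; the times are plain strings here (an assumption).
df = pd.DataFrame({
    "time": ["04:07:00", "04:08:00", "04:09:00",
             "04:16:00", "04:30:00", "04:31:00"],
    "vol": [242, 427, 906, 186, 237, 700],
})

# True only on the row that matches the target time.
hit = df["time"].eq("04:30:00")

# cummax carries the first True forward: True from the match onward.
keep_from_hit = hit.cummax()

# Shifting by one row first moves the True down, so the match itself is excluded.
keep_after_hit = hit.shift(fill_value=False).cummax()

print(df[keep_from_hit].index.tolist())
print(df[keep_after_hit].index.tolist())
```

Because the mask depends on row order rather than on comparing values, it mirrors "delete everything before the first match" even if the times are not sorted.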

CodePudding user response：

You don't need a loop here if you convert your time column to timedeltas with pd.to_timedelta (this variant expects the column to hold strings such as '04:30:00'):

out = df[~pd.to_timedelta(df['time']).lt('04:30:00')]
print(out)

# Output
  ticker        date      time  vol    vwap  open  high   low  close
4   AACG  2022-01-06  04:30:00  237  2.0584  2.06  2.06  2.06   2.06
5   AACG  2022-01-06  04:31:00  700  2.1098  2.10  2.11  2.10   2.11
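
Here is a self-contained sketch of the timedelta approach. It assumes the column holds "HH:MM:SS" strings; the second half shows one possible way to handle datetime.time objects by converting them to strings first. The frame and the names elapsed, as_objects and elapsed_from_objects are hypothetical:

```python
import datetime

import pandas as pd

# Hypothetical sample with string times (an assumption about the data).
df = pd.DataFrame({
    "time": ["04:07:00", "04:08:00", "04:09:00",
             "04:16:00", "04:30:00", "04:31:00"],
    "vol": [242, 427, 906, 186, 237, 700],
})

cutoff = pd.Timedelta(hours=4, minutes=30)

# Parse the strings as durations since midnight, then compare to the cutoff.
elapsed = pd.to_timedelta(df["time"])
out = df[elapsed >= cutoff]
print(out.index.tolist())

# If the column held datetime.time objects instead, converting them to
# strings first is one option before calling pd.to_timedelta.
as_objects = df["time"].map(datetime.time.fromisoformat)
elapsed_from_objects = pd.to_timedelta(as_objects.astype(str))
print(df[elapsed_from_objects >= cutoff].index.tolist())
```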

If the column already holds datetime.time objects, does this work?

from datetime import time

out = df[df['time'] >= time(4, 30)]
print(out)

# Output:
  ticker        date      time  vol    vwap  open  high   low  close
4   AACG  2022-01-06  04:30:00  237  2.0584  2.06  2.06  2.06   2.06
5   AACG  2022-01-06  04:31:00  700  2.1098  2.10  2.11  2.10   2.11

# Info
print(repr(df['time'].iloc[0]))
# datetime.time(4, 7)
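
As for why the loop in the question seems to change nothing: Python chains comparisons, so a == b == False is read as (a == b) and (b == False). A minimal sketch, with made-up names (value, cutoff, chained, intended) standing in for the loop's variables:

```python
import datetime

cutoff = datetime.time(4, 30)
value = datetime.time(4, 7)   # stands in for df['time'].values[row]

# Chained comparison: (value == cutoff) and (cutoff == False).
# A time object never equals False, so the whole chain is always False
# and the while loop body never runs.
chained = value == cutoff == False

# What the loop presumably intended: keep going while the value differs.
intended = value != cutoff

print(chained, intended)
```

Even with the condition fixed, the loop would likely still misbehave: .values[row] is positional while drop(row) works on index labels, so the two get out of step once rows have been removed. That is one more reason to prefer a mask.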