I want to delete rows in a loop until I reach the row whose time value is 04:30:00, and then stop deleting. How do I do that?
ticker date time vol vwap open high low close
0 AACG 2022-01-06 04:07:00 242 2.0400 2.04 2.04 2.04 2.04
1 AACG 2022-01-06 04:08:00 427 2.0858 2.06 2.10 2.06 2.10
2 AACG 2022-01-06 04:09:00 906 2.1098 2.10 2.11 2.10 2.11
3 AACG 2022-01-06 04:16:00 186 2.1108 2.12 2.12 2.10 2.10
4 AACG 2022-01-06 04:30:00 237 2.0584 2.06 2.06 2.06 2.06
5 AACG 2022-01-06 04:31:00 700 2.1098 2.10 2.11 2.10 2.11
I tried this but it doesn't show that anything has changed:
row = 0
while df['time'].values[row] == datetime.time(4,30) == False:
    print(df['time'].values[row])
    df.drop(row,axis=0,inplace=True)
    row = row + 1
Here is the df.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
ticker 10 non-null object
date 10 non-null object
time 10 non-null object
vol 10 non-null int64
vwap 10 non-null float64
open 10 non-null float64
high 10 non-null float64
low 10 non-null float64
close 10 non-null float64
lbh 10 non-null int64
lah 10 non-null int64
trades 10 non-null int64
dtypes: float64(5), int64(4), object(3)
memory usage: 1.1 KB
UPDATE: Thanks again for your help everyone.
"df[df['time'] >= datetime.time(4, 30)]" helped me remove unnecessary rows.
CodePudding user response:
You can use a boolean mask to filter your data. If df['time'] holds datetime.time objects, you can select the rows after 04:30:00 with the line below (use >= instead of > if you want to keep the 04:30:00 row itself):
out = df[df['time'] > datetime.time(4,30)]
Output:
ticker date time vol vwap open high low close
5 AACG 2022-01-06 04:31:00 700 2.1098 2.1 2.11 2.1 2.11
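Here is a minimal, self-contained sketch of that mask, assuming the time column holds datetime.time objects as in the question. The frame is rebuilt by hand from a few of the question's columns, and the names cutoff, after_cutoff and from_cutoff are only illustrative.

```python
import datetime

import pandas as pd

# Sample rows copied from the question (only three columns kept for brevity).
df = pd.DataFrame({
    "ticker": ["AACG"] * 6,
    "time": [
        datetime.time(4, 7), datetime.time(4, 8), datetime.time(4, 9),
        datetime.time(4, 16), datetime.time(4, 30), datetime.time(4, 31),
    ],
    "vol": [242, 427, 906, 186, 237, 700],
})

cutoff = datetime.time(4, 30)

# Strictly later than the cutoff: the 04:30:00 row is dropped too.
after_cutoff = df[df["time"] > cutoff]

# At or after the cutoff: the 04:30:00 row is kept.
from_cutoff = df[df["time"] >= cutoff]

print(after_cutoff)
print(from_cutoff)
```

Neither line modifies df; assign the result back (df = df[...]) if you want the rows gone for good.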
CodePudding user response:
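Don't loop, slice instead. You can use a boolean mask for that (here built with eq and cummax; comparing against '04:30:00' assumes the time column holds strings, so with datetime.time objects compare against datetime.time(4, 30) instead):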
df[df['time'].eq('04:30:00').cummax()]
output:
ticker date time vol vwap open high low close
4 AACG 2022-01-06 04:30:00 237 2.0584 2.06 2.06 2.06 2.06
5 AACG 2022-01-06 04:31:00 700 2.1098 2.10 2.11 2.10 2.11
If you also want to exclude the matching row:
df[df['time'].eq('04:30:00').shift(fill_value=False).cummax()]
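To make the cummax trick easier to follow, here is a small sketch that builds the mask step by step. It assumes the time column holds strings; the frame and the names hit, keep and keep_after are made up for illustration. Unlike a >= comparison, this mask is positional: it keeps everything from the first matching row onward, whatever the later values are.

```python
import pandas as pd

# Assumption: times are stored as strings such as "04:30:00".
df = pd.DataFrame({
    "time": ["04:07:00", "04:08:00", "04:09:00",
             "04:16:00", "04:30:00", "04:31:00"],
    "vol": [242, 427, 906, 186, 237, 700],
})

# True only on the row(s) equal to the target time.
hit = df["time"].eq("04:30:00")

# cummax turns every value after the first True into True as well.
keep = hit.cummax()

# Shifting the hit down by one row first excludes the matching row itself.
keep_after = hit.shift(fill_value=False).cummax()

print(df[keep])
print(df[keep_after])
```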
CodePudding user response:
You don't need a loop here if you convert your time column to timedeltas with pd.to_timedelta:
out = df[~pd.to_timedelta(df['time']).lt('04:30:00')]
print(out)
# Output
ticker date time vol vwap open high low close
4 AACG 2022-01-06 04:30:00 237 2.0584 2.06 2.06 2.06 2.06
5 AACG 2022-01-06 04:31:00 700 2.1098 2.10 2.11 2.10 2.11
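The timedelta approach can be sketched end to end as follows. Going through astype(str) first is an assumption added here so that the same code handles both string values and datetime.time objects; the frame and the names elapsed and cutoff are illustrative only.

```python
import datetime

import pandas as pd

# Sample rows from the question, with datetime.time objects in the time column.
df = pd.DataFrame({
    "time": [
        datetime.time(4, 7), datetime.time(4, 8), datetime.time(4, 9),
        datetime.time(4, 16), datetime.time(4, 30), datetime.time(4, 31),
    ],
    "vol": [242, 427, 906, 186, 237, 700],
})

# str(datetime.time(4, 7)) is "04:07:00", a format pd.to_timedelta can parse.
elapsed = pd.to_timedelta(df["time"].astype(str))

# Keep rows at or after 04:30:00, measured as time elapsed since midnight.
cutoff = pd.Timedelta(hours=4, minutes=30)
out = df[elapsed >= cutoff]

print(out)
```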
If the time column holds datetime.time objects, does this work instead?
from datetime import time
out = df[df['time'] >= time(4, 30)]
print(out)
# Output:
ticker date time vol vwap open high low close
4 AACG 2022-01-06 04:30:00 237 2.0584 2.06 2.06 2.06 2.06
5 AACG 2022-01-06 04:31:00 700 2.1098 2.10 2.11 2.10 2.11
# Info
print(repr(df['time'].iloc[0]))
# datetime.time(4, 7)
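As for why the loop in the question appeared to change nothing, a likely cause is Python's comparison chaining: a == b == False is evaluated as (a == b) and (b == False), and a datetime.time value is never equal to False. The condition is therefore False on the very first check and the loop body never runs. A small sketch with plain datetime objects (the variable names are illustrative):

```python
import datetime

first_time = datetime.time(4, 7)
cutoff = datetime.time(4, 30)

# Chained comparison: (first_time == cutoff) and (cutoff == False).
chained = first_time == cutoff == False

# Even on the cutoff row itself the second half, cutoff == False, fails.
chained_at_cutoff = cutoff == cutoff == False

# What the loop condition was presumably meant to express.
intended = first_time != cutoff

print(chained, chained_at_cutoff, intended)
```

Writing the condition as df['time'].values[row] != datetime.time(4, 30) would let the loop run, but the mask-based answers avoid the loop altogether.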
