I am trying to read an excel data set starting from a specific row in a workbook containing multiple sheets. I can't use the row index as the location of the row I want to start with changes in each sheet.
My data looks like this:
| A header | Another header |
|---|---|
| start after this | A |
| 07:00 | B |
| 08:00 | C |
| 09:00 | D |
and I wanted to read the data starting from whatever row is before 08:00
| A header | Another header |
|---|---|
| 07:00 | B |
| 08:00 | C |
| 09:00 | D |
I have tried the following but I am getting an error that it can't find "08:00" in the list
df = df[df.index[list(df.index.values).index('08:00') - 1:]]
I would appreciate any hints, many thanks in advance!
CodePudding user response:
Compare values of index converted to Series, so possible add Series.shift for previous match and Series.cummax for next Trues:
print (df)
Another header
A header
start after this A
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '08:00').shift(-1,fill_value=False).cummax()])
Another header
A header
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '07:00').shift(-1,fill_value=False).cummax()])
Another header
A header
start after this A
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '09:00').shift(-1,fill_value=False).cummax()])
Another header
A header
08:00 C
09:00 D
print (df[(df.index.to_series() == '17:00').shift(-1,fill_value=False).cummax()])
Empty DataFrame
Columns: [Another header]
Index: []
CodePudding user response:
I would write the iteration out explicitly to make things clear:
start = 0 # in case we fail to find it later
for i, row in df["Col 1 header"].iteritems():
if "07:00" in row:
start = i -1
break
Then you can just use start from then on.
