I am reading PDFs and loading the data into a DataFrame.
| EmpID | Team_Name | Cost | No_Emps |
|---|---|---|---|
| AA1 | Sam | 25,689 | 2 |
| AA2 | Tom | 78,368 | 3 |
| AA3 | Dick | 125,369 | 5 |
| AA4 | Harry | 32,658 | 2 |
| AA5 | Joan | 22,685 | 2 |
| Grand Total: | | 284,769 | 17 |
| xxx | | | |
| yyy | | | |
| dfg | nnn | | |
| fgh | xxx | | |
| vhg | ttt | | |
| ppp | ddd | | |
There can be any number of rows after the Grand Total row. I need to exclude all rows that come after the row where EmpID = 'Grand Total:'.
CodePudding user response:
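If you have a pandas DataFrame:
# Keep everything up to and including the "Grand Total:" row
# (assumes the default 0..n-1 index, so the label is also the position).
df = df.iloc[: df.index[df['EmpID'] == 'Grand Total:'][0] + 1]
Output: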
EmpID Team_Name Cost No_Emps
0 AA1 Sam 25689 2
1 AA2 Tom 78368 3
2 AA3 Dick 125369 5
3 AA4 Harry 32658 2
4 AA5 Joan 22685 2
5 Grand Total: NaN 284769 17
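To try this end to end, here is a minimal sketch that types in the sample table by hand (nothing is read from a PDF, and the Cost values are assumed to be already parsed as numbers) and applies the same slice. It relies on the default 0..n-1 index, because the label returned by `df.index[...]` is used as a position in `iloc`.

```python
import pandas as pd

# Sample rows copied from the question; the rows after "Grand Total:"
# stand in for the unwanted trailing text extracted from the PDF.
df = pd.DataFrame(
    {
        "EmpID": ["AA1", "AA2", "AA3", "AA4", "AA5", "Grand Total:",
                  "xxx", "yyy", "dfg", "fgh", "vhg", "ppp"],
        "Team_Name": ["Sam", "Tom", "Dick", "Harry", "Joan", None,
                      None, None, "nnn", "xxx", "ttt", "ddd"],
        "Cost": [25689, 78368, 125369, 32658, 22685, 284769,
                 None, None, None, None, None, None],
        "No_Emps": [2, 3, 5, 2, 2, 17, None, None, None, None, None, None],
    }
)

# Label of the first row whose EmpID is the total marker.
total_label = df.index[df["EmpID"] == "Grand Total:"][0]

# With the default index the label equals the row position,
# so adding 1 makes the slice include the total row itself.
trimmed = df.iloc[: total_label + 1]
print(trimmed)
```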
CodePudding user response:
Make use of the index of the "Grand Total" row.
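index = df.index
# Index label(s) of the row(s) where EmpID is "Grand Total:"
gt_index = index[df["EmpID"] == "Grand Total:"]
# Drop every row after the first match (assumes the default 0..n-1 index)
new_df = df.drop(index=index[gt_index[0] + 1:])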
new_df
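Note that this treats an index label as a row position, which holds only for the default integer index. As a hedged alternative, the sketch below locates the marker by position instead, so it should also work after filtering or with a custom index, and it leaves the frame untouched when the marker is missing. The helper name `keep_through_marker` and the small sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def keep_through_marker(df, column="EmpID", marker="Grand Total:"):
    """Return rows up to and including the first row where df[column] == marker."""
    # Positions (not labels) of the matching rows.
    positions = np.flatnonzero((df[column] == marker).to_numpy())
    if positions.size == 0:
        return df  # marker not found: nothing to trim
    return df.iloc[: positions[0] + 1]


# Illustrative frame with a non-default index.
df = pd.DataFrame(
    {
        "EmpID": ["AA1", "AA2", "Grand Total:", "xxx", "yyy"],
        "Team_Name": ["Sam", "Tom", None, None, None],
    },
    index=[10, 20, 30, 40, 50],
)

new_df = keep_through_marker(df)
print(new_df)
```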
