I'm trying to remove rows from a pandas DataFrame so that everything between two specific values (e.g., start and end) is deleted, including the rows that hold those two values. The start/end pair can occur more than once, as in:
| c1 | c2 |
|---|---|
| 1 | 1 |
| 2 | start |
| 3 | 1 |
| 4 | 0 |
| 5 | end |
| 6 | 1 |
| 7 | start |
| 8 | 1 |
| 9 | 0 |
| 10 | end |
| 11 | 1 |
So the desired output would be:
| c1 | c2 |
|---|---|
| 1 | 1 |
| 6 | 1 |
| 11 | 1 |
CodePudding user response:
I recreated a dataframe similar to yours. This is not an efficient way to do it, because it drops one row at a time, but it works.
df:
   c1     c2
0   1      1
1   2  start
2   3      3
3   4    end
4   5      5
5   6  start
6   7    end
7   8      0
code:
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})
df2 = df.copy()
flag = False  # True while we are inside a start...end block
for i, row in df.iterrows():
    if row['c2'] == 'start':
        flag = True
        df2 = df2.drop(index=i)  # drop the 'start' row itself
    elif row['c2'] == 'end':
        flag = False
        df2 = df2.drop(index=i)  # drop the 'end' row itself
    elif flag:
        df2 = df2.drop(index=i)  # drop rows between the markers
output df2:
   c1 c2
0   1  1
4   5  5
7   8  0
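The same flag logic can also be written so that nothing is dropped inside the loop. The sketch below is one possible variant, not the answer's original code: it collects a boolean per row in a list (the names `keep` and `inside` are made up for illustration) and filters the dataframe once at the end. It assumes every start is eventually followed by an end.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})

keep = []       # one boolean per row: True means the row survives
inside = False  # True while between a 'start' and its matching 'end'
for value in df['c2']:
    if value == 'start':
        inside = True
        keep.append(False)       # the 'start' row is removed
    elif value == 'end':
        inside = False
        keep.append(False)       # the 'end' row is removed
    else:
        keep.append(not inside)  # ordinary rows survive only outside a block

df2 = df[keep]  # boolean indexing with a list of the same length as df
print(df2)
```

Filtering once avoids building a new dataframe on every `drop` call, and the original index labels (0, 4, 7) are preserved as in the loop version.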
CodePudding user response:
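You can use masks. Note that these two masks keep only the row immediately before each "start" and the row immediately after each "end", so they reproduce the expected result when the rows to keep come in runs of one or two (as in the sample data); in a longer run the middle rows would be dropped as well.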
mask1 = df.c2.shift(-1) == "start"
mask2 = df.c2.shift(1) == "end"
newDf = (df.loc[mask1 | mask2]).reset_index(drop=True)
Output
   c1 c2
0   1  1
1   5  5
2   8  0
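For data where the rows to keep come in longer runs, a mask based on cumulative counts is one option. The sketch below is an assumption-laden variant rather than the code from the answer: `drop_between` is a hypothetical helper name, and it assumes that every start has a matching end and that blocks are not nested. A row counts as inside a block when the number of start markers seen up to and including it exceeds the number of end markers seen strictly before it.

```python
import pandas as pd

def drop_between(df, column, start="start", end="end"):
    """Hypothetical helper: drop every start..end block, markers included."""
    # Number of start markers seen up to and including each row.
    starts_so_far = df[column].eq(start).cumsum()
    # Number of end markers seen strictly before each row
    # (shifting by one keeps the 'end' row itself inside its block).
    ends_before = df[column].eq(end).cumsum().shift(fill_value=0)
    inside = starts_so_far > ends_before
    return df.loc[~inside].reset_index(drop=True)

# The table from the question.
question_df = pd.DataFrame({
    "c1": range(1, 12),
    "c2": [1, "start", 1, 0, "end", 1, "start", 1, 0, "end", 1],
})
result = drop_between(question_df, "c2")
print(result)
```

On the question's table this leaves the rows with c1 equal to 1, 6 and 11. If a start never gets its end, everything from that start to the bottom of the dataframe is treated as inside the block and removed, which may or may not be what you want.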
