Remove rows in a pandas dataframe between two specific values

I'm trying to remove rows from a pandas dataframe so that everything between two specific values (e.g., start and end) is deleted, including the rows that hold the two values themselves. These marker values can repeat, as in:

c1	c2
1	1
2	start
3	1
4	0
5	end
6	1
7	start
8	1
9	0
10	end
11	1

So the desired output would be:

c1	c2
1	1
6	1
11	1

CodePudding user response：

I recreated a dataframe similar to yours. This is not an efficient way to do it, but it works.

df:

   c1     c2
0   1      1
1   2  start
2   3      3
3   4    end
4   5      5
5   6  start
6   7    end
7   8      0

code:

import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})

to_drop = []     # index labels of the rows to remove
inside = False   # True while we are between a 'start' and its 'end'
for idx, row in df.iterrows():
    if row['c2'] == 'start':
        inside = True
        to_drop.append(idx)
    elif row['c2'] == 'end':
        inside = False
        to_drop.append(idx)
    elif inside:
        to_drop.append(idx)

# drop all collected rows in one call instead of once per iteration
df2 = df.drop(index=to_drop)

output df2:

   c1  c2
0   1   1
4   5   5
7   8   0

CodePudding user response：

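You can use boolean masks. Note that the masks below keep only the row directly before each "start" and the row directly after each "end", so they give the desired output only when exactly one row separates the blocks, as in the example.

mask1 = df.c2.shift(-1) == "start"  # row directly before a "start"
mask2 = df.c2.shift(1) == "end"     # row directly after an "end"
newDf = df.loc[mask1 | mask2].reset_index(drop=True)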

Output
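The shift-based masks only look at rows adjacent to a marker, so a run of several rows outside the blocks would be partly lost. As a rough sketch of a more general alternative (not taken from the answers above), one could count the markers with cumsum and drop every row that sits inside an open block. The helper name drop_between and the sample data are hypothetical, and the sketch assumes that "start" and "end" strictly alternate and are not nested; an unmatched "start" would remove everything up to the end of the frame.

```python
import pandas as pd


def drop_between(df, col="c2", start="start", end="end"):
    """Drop the rows from each `start` through its matching `end`, inclusive.

    Hypothetical helper; assumes markers alternate start, end, start, end...
    """
    is_start = df[col].eq(start)
    is_end = df[col].eq(end)
    # Number of 'start' markers seen up to and including each row.
    opened = is_start.cumsum()
    # Number of 'end' markers seen strictly before each row, so that the
    # 'end' row itself still counts as being inside the block.
    closed = is_end.cumsum().shift(1, fill_value=0)
    inside = opened > closed
    return df.loc[~inside].reset_index(drop=True)


# Made-up data with two rows before the first block; keeping only the rows
# adjacent to a marker would lose the row with c1 == 1.
df = pd.DataFrame({
    "c1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "c2": ["1", "1", "start", "1", "0", "end", "1", "start", "end"],
})
result = drop_between(df)
print(result)
```

Here the rows with c1 equal to 1, 2 and 7 remain. The same function applied to the dataframe from the question keeps the rows with c1 equal to 1, 6 and 11.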