Remove rows in a pandas dataframe between two specific values

Time:01-20

I'm trying to remove rows from a pandas DataFrame so that everything between two marker values (e.g., start and end) is deleted, including the marker rows themselves. The markers can repeat, as in:

c1 c2
1 1
2 start
3 1
4 0
5 end
6 1
7 start
8 1
9 0
10 end
11 1

So the desired output would be:

c1 c2
1 1
6 1
11 1
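
To make the example reproducible, the sample above could be built roughly as follows. This is only a sketch: storing 1 and 0 as integers alongside the string markers is an assumption, since the post does not show the column dtype, and the names df and expected are illustrative.

```python
import pandas as pd

# The question's sample data. Keeping 1 and 0 as ints next to the string
# markers is an assumption; the post does not show the dtypes.
df = pd.DataFrame({
    "c1": list(range(1, 12)),
    "c2": [1, "start", 1, 0, "end", 1, "start", 1, 0, "end", 1],
})

# The desired result from the question, with the original index labels kept.
expected = df[df["c1"].isin([1, 6, 11])]
print(expected)
```

Any candidate answer can then be compared against expected.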

CodePudding user response:

I recreated a DataFrame similar to yours. Looping over the rows is not an efficient way to do this, but it works.

df:

   c1     c2
0   1      1
1   2  start
2   3      3
3   4    end
4   5      5
5   6  start
6   7    end
7   8      0

code:

import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})

# Collect the index labels of every row from 'start' through 'end', inclusive.
to_drop = []
flag = False
for i, value in df['c2'].items():
    if value == 'start':
        flag = True
    if flag or value == 'end':
        to_drop.append(i)
    if value == 'end':
        flag = False

# Drop by label in one call, so this also works with a non-default index.
df2 = df.drop(index=to_drop)

output df2:

   c1 c2
0   1  1
4   5  5
7   8  0

CodePudding user response:

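You can use masks. Note that these two masks keep only the row directly before each start and the row directly after each end, so they match the sample data but would drop rows whenever more than one row sits outside a start/end block.

mask1 = df.c2.shift(-1) == "start"  # row immediately before a start
mask2 = df.c2.shift(1) == "end"     # row immediately after an end
newDf = df.loc[mask1 | mask2].reset_index(drop=True)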

Output

   c1 c2
0   1  1
1   5  5
2   8  0
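
If several rows can sit outside the blocks, one possible generalization is to count markers with cumsum instead of looking only at neighbouring rows. The sketch below is not from either answer; drop_between is a hypothetical helper name, and it assumes well-formed markers (every start is followed by a matching end, with no nesting and no stray end before the first start).

```python
import pandas as pd


def drop_between(df, column, start="start", end="end"):
    """Return df without the rows from each start marker through its end marker."""
    col = df[column]
    # Number of start markers seen up to and including each row.
    opened = col.eq(start).cumsum()
    # Number of end markers seen strictly before each row, so that the end
    # row itself still counts as inside its block.
    closed = col.eq(end).cumsum().shift(fill_value=0)
    # A row is outside every block when all opened blocks are already closed.
    return df[opened == closed]


# Illustrative data with two consecutive rows outside the blocks.
df = pd.DataFrame({
    "c1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "c2": ["a", "b", "start", "x", "end", "c", "d", "start", "end"],
})
result = drop_between(df, "c2").reset_index(drop=True)
print(result)
```

Here the rows with c1 equal to 1, 2, 6 and 7 are kept. An unclosed start would remove everything after it, which is why the well-formed-markers assumption matters.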