I am quite stuck on a task. I have a dataframe and I want to delete specific rows from it according to some conditions. When df[var1] > 40, we note the value of df[var2] as x.
I want to delete all the rows starting from the first row where df[var1] > 40 (included) up to the first row where df[var2] == x + 1 (excluded). Note that df[var2] will contain this value x again later, and I want to keep those rows. I only want to delete from that row with df[var2] == x up to the first df[var2] == x + 1.
I am tearing my hair out trying to find some efficient code :(
I am using Python. The values in bold are the ones I want to delete:
var2 : [1,1,**1,1**,2,2,2, etc ...1,1,1,2,2,etc...]
var1 : [4,12,**350,4**,18,12,etc....12,14,etc...]
I wrote this kind of code to explain what I want to do:
for index, row in df.iterrows():
    if df.var1[index] > 40:
        x = df.var2[index]
        while df.var2[index] == x:
            df.drop(row)
CodePudding user response:
This should do the job:
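df[(df[var1] <= 40) | (df[var2] != x + 1)]
Essentially, this keeps the rows that satisfy the reverse condition. The parentheses are needed because | binds more tightly than the comparison operators.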
CodePudding user response:
You can use this syntax to extract data with multiple conditions, by adding "&" between the conditions:
df[(df[var1] <= 40) & (df[var2] != x + 1)]
# df[(condition_1) & (condition_2) &...]
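As a small sketch of how such a combined mask behaves (the sample data and the value x = 1 are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data; x is an assumed value for illustration only.
df = pd.DataFrame({"var2": [1, 1, 1, 2, 2], "var1": [4, 12, 350, 18, 12]})
x = 1

# Each comparison is parenthesised because & binds more tightly than <= and !=.
mask = (df["var1"] <= 40) & (df["var2"] != x + 1)
kept = df[mask]
print(kept)
```

Note that a mask like this is evaluated row by row, so on its own it does not remove the rows that follow the first var1 > 40 within the same run of var2.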
CodePudding user response:
IIUC, suppose the following dataframe:
>>> df
    var2  var1
0      1     4
1      1    12
2      1    55  # <- (1) remove from here
3      1    21
4      1    32  # <- (1) until then
5      2    23
6      2    29
7      2    49  # <- (2) remove from here
8      2    72
9      2    23  # <- (2) until then
10     1    12
11     1    49  # <- (3) remove from here
12     1    23  # <- (3) until then
Since the values of var2 are cyclic, you can't group on it directly; you have to create dummy groups, one per consecutive run of equal values. Within each group, flag the rows where var1 is greater than 40 and take the cumulative sum. Only the rows where this sum equals 0 are kept.
>>> df[df.groupby(df['var2'].ne(df['var2'].shift()).cumsum())['var1']
...      .apply(lambda x: x.gt(40).cumsum()).eq(0)]
    var2  var1
0      1     4
1      1    12
5      2    23
6      2    29
10     1    12
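The same idea can be sketched step by step as a standalone script. The names run_id and hits are my own, and this variant uses a grouped cumsum instead of apply with a lambda, since apply may, depending on the pandas version, prepend the group key to the index and then no longer align with df:

```python
import pandas as pd

# Same data as the example dataframe above.
df = pd.DataFrame({
    "var2": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1],
    "var1": [4, 12, 55, 21, 32, 23, 29, 49, 72, 23, 12, 49, 23],
})

# Label each consecutive run of equal var2 values with its own id.
# A new run starts whenever var2 differs from the previous row.
run_id = df["var2"].ne(df["var2"].shift()).cumsum()

# Within each run, count how many rows with var1 > 40 have been seen so far.
# The grouped cumsum keeps the original index, so it aligns with df.
hits = df["var1"].gt(40).astype(int).groupby(run_id).cumsum()

# Keep only the rows that come before the first var1 > 40 of their run.
result = df[hits.eq(0)]
print(result)
```

Printing run_id and hits next to df is a quick way to check that the runs and the cut-off points are where you expect them.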
