I'm working on a problem where if certain conditions are met then I need to add or subtract from the previous row. So I have the following df:
data = {'sample_val':[5.5, 6.2 , 4.0, 7.8, 3.6], 'sample_lab':['nor','high','nor','high','low']}
df=pd.DataFrame(data)
I want to add an additional column 'adjustment', that is filled by the following rules and has a starting value of 10:
- if sample_val < 4 then - 6 from the previous adjustment entry
- if 4<= sample_val <= 5 then 0 to the previous adjustment entry
- if sample_val>5 then add 4 to the previous adjustment entry
I've tried iterrows( ), but I can't get it to add or subtract. I also tried np.where, but can't get it to iterate.
CodePudding user response:
If I understood correctly, you can use np.select to know which coefficients you must add or subtract, and then accumulate the result to the initial value of 10 using panda's cumsum:
import numpy as np
x = df.sample_val.to_numpy()
adj = np.select([x<4, (x>=4)&(x<=5), x>5], [-6,0,4], x)
df['adjustment'] = 0
df.loc[0, 'adjustment'] = 10
df['adjustment'] = df.adjustment.add(adj).cumsum()
print(df)
sample_val sample_lab adjustment
0 5.5 nor 14.0
1 6.2 high 18.0
2 4.0 nor 18.0
3 7.8 high 22.0
4 3.6 low 16.0
CodePudding user response:
Use pandas.cut to assign numbers to bins and cumsum 10 for the cumulated sum starting at 10:
df['adjustment'] = (pd.cut(df['sample_val'],
bins=[-np.inf, 4, 5, np.inf],
labels=[-6, 0, 4])
.astype(int)
.cumsum()
) 10
output:
sample_val sample_lab adjustment
0 5.5 nor 14
1 6.2 high 18
2 4.0 nor 12
3 7.8 high 16
4 3.6 low 10
CodePudding user response:
There are other approaches that provide concise solutions however as you stated that you could not get this to work with itterows, this solution shows you how it can be done. This approach is also useful if you need to add additional logic or perform other calculations.
You can use Pandas iterrows and loc functions. See the Pandas documentation for iterrows and loc to learn more.
The following assumes you are starting with a value of 10. You are then applying the rules before setting the value of the adjustment column.
# add the additional column
data_df['adjustment'] = 0
# set the initial adjustment
adjustment = 10
# adjust the values of the column
for index, row in data_df.iterrows():
sample_val = float(row['sample_val'])
if sample_val < 4:
adjustment -= 6
elif sample_val > 5:
adjustment = 4
# set the column value
data_df.loc[index, 'adjustment'] = adjustment
print(data_df)
This would produce the following output for your example:
sample_val sample_lab adjustment
0 5.5 nor 14
1 6.2 high 18
2 4.0 nor 18
3 7.8 high 22
4 3.6 low 16
Alternatively, if you require the first row to have a value of 10 in the adjustment column then you will need to adjust the code as follows:
# add the additional column
data_df['adjustment'] = 0
# set the initial adjustment
adjustment = 10
# adjust the values of the column
for index, row in data_df.iterrows():
sample_val = float(row['sample_val'])
if index == 0:
pass
elif sample_val < 4:
adjustment -= 6
elif sample_val > 5:
adjustment = 4
# set the column value
data_df.loc[index, 'adjustment'] = adjustment
print(data_df)
This would produce the following result:
sample_val sample_lab adjustment
0 5.5 nor 10
1 6.2 high 14
2 4.0 nor 14
3 7.8 high 18
4 3.6 low 12
