I have the following table:
A B C D
75987 1 0 0
75987 1 1 1
75987 2 1 1
75987 2 2 1
75987 2 6 4
75987 1 6 2
75987 1 6 1
59221 2 18 4
59221 1 18 0
59221 2 18 1
I am trying to create column E based upon the following criteria:
- Groupby columns A and B;
- When column C is less than or equal to 6, sum the corresponding values in column D;
- Assign these summed values to column E
I would expect the values in column E to be as below:
A B C D E
75987 1 0 0 4
75987 1 1 1 4
75987 2 1 1 6
75987 2 2 1 6
75987 1 6 2 4
75987 1 6 1 4
75987 2 6 4 6
59221 2 18 4 0
59221 1 18 0 0
59221 2 18 1 0
There are suggestions on how to do implement groupby, boolean indexing and .query from this forum to solve similar issues but I can't seem to adapt them to get it to work as expected. Here for example, but there are plenty others I have tried too.
My attempt below seems logical but I'm still having difficulties.
df['E'] = np.where(df.groupby(['A','B'])(['C'] <= 6))['D'].transform('sum')
CodePudding user response:
Here is a way that works, similar to your attempt. The idea is to replace the values in D by 0 where the column C is over 6. then groupby.transform with the sum.
df['E'] = (
df['D'].where(df['C'].le(6), other=0)
.groupby([df['A'], df['B']])
.transform(sum)
)
print(df)
# A B C D E
# 0 75987 1 0 0 4
# 1 75987 1 1 1 4
# 2 75987 2 1 1 6
# 3 75987 2 2 1 6
# 4 75987 2 6 4 6
# 5 75987 1 6 2 4
# 6 75987 1 6 1 4
# 7 59221 2 18 4 0
# 8 59221 1 18 0 0
# 9 59221 2 18 1 0
