Home > database >  pandas groupby excluding when a column takes some value
pandas groupby excluding when a column takes some value

Time:01-11

Is there a way to exclude rows that take certain values when aggregating?

For example:

ID | Company | Cost 
1  | Us      | 2
1  | Them    | 1
1  | Them    | 1
2  | Us      | 1
2  | Them    | 2
2  | Them    | 1

I would like to do a groupby and sum but ignoring whenever a row is Company="us".

The result should be something like this:

ID | Sum of cost
1  | 2
2  | 3

I solved it by doing this, but I want to know if there's a smarter solution:

df_agg = df[df['Company']!="Us"][['ID','Cost']].groupby(['ID']).sum()

CodePudding user response:

Use:

print (df)
   ID Company  Cost
0   1      Us     2
1   1    Them     1
2   1    Them     1
3   2      Us     1
4   2    Them     2
5   2    Them     1
6   3      Us     1 <- added new row for see difference

If need filter first and not matched groups (if exist) are not important use:

df1 = df[df.Company!="Us"].groupby('ID', as_index=False).Cost.sum()
print (df1)
   ID  Cost
0   1     2
1   2     3


df1 = df.query('Company!="Us"').groupby('ID', as_index=False).Cost.sum()
print (df1)
   ID  Cost
0   1     2
1   2     3

If need all groups ID with Cost=0 for Us first set Cost to 0 and then aggregate:

df2 = (df.assign(Cost = df.Cost.where(df.Company!="Us", 0))
         .groupby('ID', as_index=False).Cost
         .sum())
print (df2)
   ID  Cost
0   1     2
1   2     3
2   3     0
  •  Tags:  
  • Related