Home > Software design >  SUMIFS in Python
SUMIFS in Python

Time:01-08

What I would like to do is group my data by "Team" and "Date". If player(s) on a team are not playing, then those that are playing will pick up the sum of points of those players not playing on that date. My data has a bunch of different teams, I'm showing one team as an example.

Here is my input data:

input:

  Team  Name Played?  Avg Points Scored      Date
0  Red   Bob     Yes                  8  1/1/2022
1  Red  Mike     Yes                  7  1/1/2022
2  Red  Paul      No                 12  1/1/2022
3  Red   Tom     Yes                  9  1/1/2022
4  Red   Bob     Yes                 10  1/2/2022
5  Red  Mike      No                 14  1/2/2022
6  Red  Paul      No                 12  1/2/2022
7  Red   Tom     Yes                 12  1/2/2022

My goal is the column "Point Burden" below:

goal:

  Team  Name Played?  Avg Points Scored      Date  Point Burden
0  Red   Bob     Yes                  8  1/1/2022            12
1  Red  Mike     Yes                  7  1/1/2022            12
2  Red  Paul      No                 12  1/1/2022             0
3  Red   Tom     Yes                  9  1/1/2022            12
4  Red   Bob     Yes                 10  1/2/2022            26
5  Red  Mike      No                 14  1/2/2022             0
6  Red  Paul      No                 12  1/2/2022             0
7  Red   Tom     Yes                 12  1/2/2022            26

code:

import pandas as pd

#raw data
df = pd.DataFrame({'Team': ['Red']*8,
                   'Name': ['Bob','Mike','Paul','Tom']*2,
                   'Played?': ['Yes','Yes','No','Yes','Yes','No','No','Yes'],
                   'Avg Points Scored': [8,7,12,9,10,14,12,12],
                   'Date': ['1/1/2022']*4 ['1/2/2022']*4}
                   )

df['Date'] = pd.to_datetime(df['Date'])

df['Point Burden'] = df.groupby(['Team', 'Date']).apply(lambda x : x['Avg Points Scored'][x['Played?']=='No'].sum()).reset_index(level=0, drop=True)

print(df)

result:

  Team  Name Played?  Avg Points Scored       Date  Point Burden
0  Red   Bob     Yes                  8 2022-01-01           NaN
1  Red  Mike     Yes                  7 2022-01-01           NaN
2  Red  Paul      No                 12 2022-01-01           NaN
3  Red   Tom     Yes                  9 2022-01-01           NaN
4  Red   Bob     Yes                 10 2022-01-02           NaN
5  Red  Mike      No                 14 2022-01-02           NaN
6  Red  Paul      No                 12 2022-01-02           NaN
7  Red   Tom     Yes                 12 2022-01-02           NaN

I know I'm missing a step somewhere to make the people that played receive the sum, not the players that didn't, but I haven't been able to get the sum to work.

CodePudding user response:

One option is to groupby on the name and date, and merge back to the main dataframe:

# Group on `Name` and `Date`, and select only `No` values:
mapping = (df.pivot_table(index = ['Team', 'Date'], 
                          columns = 'Played?', 
                          values = 'Avg Points Scored', 
                          aggfunc = 'sum' )
             .No # these are the values we need
             .rename('Point Burden')
           )

# merge back to main dataframe
(df.merge(mapping, 
          on = ['Team', 'Date'], 
          how = 'left')
   .assign(**{'Point Burden' : lambda df: np.where(df['Played?'] == 'No', 
                                                   0, 
                                                   df['Point Burden'])
              })
)

  Team  Name Played?  Avg Points Scored       Date  Point Burden
0  Red   Bob     Yes                  8 2022-01-01            12
1  Red  Mike     Yes                  7 2022-01-01            12
2  Red  Paul      No                 12 2022-01-01             0
3  Red   Tom     Yes                  9 2022-01-01            12
4  Red   Bob     Yes                 10 2022-01-02            26
5  Red  Mike      No                 14 2022-01-02             0
6  Red  Paul      No                 12 2022-01-02             0
7  Red   Tom     Yes                 12 2022-01-02            26

Another option, and hopefully simpler and more intuitive, is to filter for the No rows, groupby, and merge back to the main dataframe:

# filter first, before grouping
# should be more efficient this way, 
# since number of rows are reduced before computation
mapping = (df.loc[df['Played?'] == 'No', ['Team', 'Avg Points Scored', 'Date']]
             .groupby(['Team', 'Date'])
             .sum()
           )

# add the `Yes` rows
# makes it easy when merging back to the original dataframe
mapping = mapping.set_index([['Yes'] * len(mapping)], append = True)
mapping.index.names = ['Team', 'Date', 'Played?']
mapping = mapping.squeeze().rename('Points Burden')

#final step
(df.merge(mapping, on = ['Team', 'Date', 'Played?'], how = 'left')
   .fillna({'Points Burden' : 0}, downcast='infer')
)

  Team  Name Played?  Avg Points Scored       Date  Points Burden
0  Red   Bob     Yes                  8 2022-01-01             12
1  Red  Mike     Yes                  7 2022-01-01             12
2  Red  Paul      No                 12 2022-01-01              0
3  Red   Tom     Yes                  9 2022-01-01             12
4  Red   Bob     Yes                 10 2022-01-02             26
5  Red  Mike      No                 14 2022-01-02              0
6  Red  Paul      No                 12 2022-01-02              0
7  Red   Tom     Yes                 12 2022-01-02             26
  •  Tags:  
  • Related