What I would like to do is group my data by "Team" and "Date". If player(s) on a team are not playing, then those that are playing will pick up the sum of points of those players not playing on that date. My data has a bunch of different teams, I'm showing one team as an example.
Here is my input data:
input:
Team Name Played? Avg Points Scored Date
0 Red Bob Yes 8 1/1/2022
1 Red Mike Yes 7 1/1/2022
2 Red Paul No 12 1/1/2022
3 Red Tom Yes 9 1/1/2022
4 Red Bob Yes 10 1/2/2022
5 Red Mike No 14 1/2/2022
6 Red Paul No 12 1/2/2022
7 Red Tom Yes 12 1/2/2022
My goal is the column "Point Burden" below:
goal:
Team Name Played? Avg Points Scored Date Point Burden
0 Red Bob Yes 8 1/1/2022 12
1 Red Mike Yes 7 1/1/2022 12
2 Red Paul No 12 1/1/2022 0
3 Red Tom Yes 9 1/1/2022 12
4 Red Bob Yes 10 1/2/2022 26
5 Red Mike No 14 1/2/2022 0
6 Red Paul No 12 1/2/2022 0
7 Red Tom Yes 12 1/2/2022 26
code:
import pandas as pd
#raw data
df = pd.DataFrame({'Team': ['Red']*8,
'Name': ['Bob','Mike','Paul','Tom']*2,
'Played?': ['Yes','Yes','No','Yes','Yes','No','No','Yes'],
'Avg Points Scored': [8,7,12,9,10,14,12,12],
'Date': ['1/1/2022']*4 ['1/2/2022']*4}
)
df['Date'] = pd.to_datetime(df['Date'])
df['Point Burden'] = df.groupby(['Team', 'Date']).apply(lambda x : x['Avg Points Scored'][x['Played?']=='No'].sum()).reset_index(level=0, drop=True)
print(df)
result:
Team Name Played? Avg Points Scored Date Point Burden
0 Red Bob Yes 8 2022-01-01 NaN
1 Red Mike Yes 7 2022-01-01 NaN
2 Red Paul No 12 2022-01-01 NaN
3 Red Tom Yes 9 2022-01-01 NaN
4 Red Bob Yes 10 2022-01-02 NaN
5 Red Mike No 14 2022-01-02 NaN
6 Red Paul No 12 2022-01-02 NaN
7 Red Tom Yes 12 2022-01-02 NaN
I know I'm missing a step somewhere to make the people that played receive the sum, not the players that didn't, but I haven't been able to get the sum to work.
CodePudding user response:
One option is to groupby on the name and date, and merge back to the main dataframe:
# Group on `Name` and `Date`, and select only `No` values:
mapping = (df.pivot_table(index = ['Team', 'Date'],
columns = 'Played?',
values = 'Avg Points Scored',
aggfunc = 'sum' )
.No # these are the values we need
.rename('Point Burden')
)
# merge back to main dataframe
(df.merge(mapping,
on = ['Team', 'Date'],
how = 'left')
.assign(**{'Point Burden' : lambda df: np.where(df['Played?'] == 'No',
0,
df['Point Burden'])
})
)
Team Name Played? Avg Points Scored Date Point Burden
0 Red Bob Yes 8 2022-01-01 12
1 Red Mike Yes 7 2022-01-01 12
2 Red Paul No 12 2022-01-01 0
3 Red Tom Yes 9 2022-01-01 12
4 Red Bob Yes 10 2022-01-02 26
5 Red Mike No 14 2022-01-02 0
6 Red Paul No 12 2022-01-02 0
7 Red Tom Yes 12 2022-01-02 26
Another option, and hopefully simpler and more intuitive, is to filter for the No rows, groupby, and merge back to the main dataframe:
# filter first, before grouping
# should be more efficient this way,
# since number of rows are reduced before computation
mapping = (df.loc[df['Played?'] == 'No', ['Team', 'Avg Points Scored', 'Date']]
.groupby(['Team', 'Date'])
.sum()
)
# add the `Yes` rows
# makes it easy when merging back to the original dataframe
mapping = mapping.set_index([['Yes'] * len(mapping)], append = True)
mapping.index.names = ['Team', 'Date', 'Played?']
mapping = mapping.squeeze().rename('Points Burden')
#final step
(df.merge(mapping, on = ['Team', 'Date', 'Played?'], how = 'left')
.fillna({'Points Burden' : 0}, downcast='infer')
)
Team Name Played? Avg Points Scored Date Points Burden
0 Red Bob Yes 8 2022-01-01 12
1 Red Mike Yes 7 2022-01-01 12
2 Red Paul No 12 2022-01-01 0
3 Red Tom Yes 9 2022-01-01 12
4 Red Bob Yes 10 2022-01-02 26
5 Red Mike No 14 2022-01-02 0
6 Red Paul No 12 2022-01-02 0
7 Red Tom Yes 12 2022-01-02 26
