I have the following data frames:
A list of client invoices.
| Client | NetTotal | Issued |
|---|---|---|
| A | 1000 | 01/01/2021 |
| A | 2500 | 03/01/2021 |
| B | 1500 | 05/01/2021 |
| B | 2000 | 07/01/2021 |
A list of the same clients and dates of promotional offers.
| Client | ChangeDate |
|---|---|
| A | 02/01/2021 |
| B | 06/01/2021 |
Dates are formatted dd/mm/yyyy.
And I need to flatten these into another data frame that shows the totals before and after the change date like so:
| Client | ChangeDate | NetTotal | BeforeChange | AfterChange |
|---|---|---|---|---|
| A | 02/01/2021 | 3500 | 1000 | 2500 |
| B | 06/01/2021 | 3500 | 1500 | 2000 |
Thank you.
data_invoices = {'Client': ['A', 'A', 'B', 'B'], 'NetTotal': [1000,2500,1500,2000], 'Issued':['01/01/2021','03/01/2021', '05/01/2021', '07/01/2021']}
df_invoices = pd.DataFrame(data_invoices)
data_changes = {'Client': ['A', 'B'], 'ChangeDate': ['02/01/2021', '06/01/2021']}
df_changes = pd.DataFrame(data_changes)
CodePudding user response:
Use:
#convert columns to datetimes
df_invoices['Issued'] = pd.to_datetime(df_invoices['Issued'], dayfirst=True)
df_changes['ChangeDate'] = pd.to_datetime(df_changes['ChangeDate'], dayfirst=True)
#added column for compare with greater
df_invoices['ChangeDate'] = df_invoices['Client'].map(df_changes.set_index('Client')['ChangeDate'])
df_invoices['g'] = np.where(df_invoices['ChangeDate'].gt(df_invoices['Issued']), 'BeforeChange','AfterChange')
#pivoting with aggregate sum
df1 = df_invoices.pivot_table(index='Client', columns='g', values='NetTotal', aggfunc='sum')
#added total aggregation sum with before after column
df = df_changes.join(df_invoices.groupby('Client')['NetTotal'].sum(), on='Client').join(df1, on='Client')
print (df)
Client ChangeDate NetTotal AfterChange BeforeChange
0 A 2021-01-02 3500 2500 1000
1 B 2021-01-06 3500 2000 1500
