Home > Software design >  Avoiding iterating through a dataframe to get a total column, using a second dataframe as data
Avoiding iterating through a dataframe to get a total column, using a second dataframe as data

Time:02-02

Considering the two dataframes

>>> df1
    Dr  Cr  Opening Balance
0  B2  B2              0.0
1  B1  B1            100.0
2  D1  D1              0.0
3  F1  F1           -100.0

>>> df2
         Date  Amount  Dr  Cr
0 2021-12-01  452.25  B1  D1
1 2022-01-01  100.00  B1  D1
2 2022-01-02  100.00  B1  D1

I am trying to add a column to df1 which gives the total of Opening Balance sum of Dr from df2 - sum of Cr from df2

In this case, df1 would become:

>>> df1

    Dr  Cr  Opening Balance    Total
0  B2  B2              0.0      0.0
1  B1  B1            100.0   752.25
2  D1  D1              0.0  -652.25
3  F1  F1           -100.0   -100.0

Thanks

CodePudding user response:

One of more straight-forward and relatively debug friendly approach is to group df2 based on Dr and Cr, join the results to df1 and add/subtract the values:

dr = df2.groupby('Dr')['Amount'].sum().rename('Dr Amount')
cr = df2.groupby('Cr')['Amount'].sum().rename('Cr Amount')
df3 = df1.join(dr, 'Dr').join(cr, 'Cr').fillna(0)
df3['Total'] = df3['Opening Balance']   df3['Dr Amount'] - df3['Cr Amount']

#    Dr  Cr  Opening Balance  Dr Amount  Cr Amount   Total
# 0  B2  B2              0.0       0.00       0.00    0.00
# 1  B1  B1            100.0     652.25       0.00  752.25
# 2  D1  D1              0.0       0.00     652.25 -652.25
# 3  F1  F1           -100.0       0.00       0.00 -100.00


# Optional clean-up:
df3.drop(columns=['Dr Amount', 'Cr Amount'], inplace=True)

Note that the fillna(0) on third row is needed only for rows where there is no corresponding Dr or Cr in df2

  •  Tags:  
  • Related