I have two dataframes as shown below:
df1 =
A B C D E F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN
2022-04-08 10:07:01 70 80 80 10 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10 NaN
df2 =
A_1 B_1
C 10 10
D 20 10
E 30 20
F 5 15
As you can see, the columns C,D,E and F of df1 are index of df2. I want to perform an operation as follows:
For each row in df1, the values in columns C D E and F are checked. If there is a value present in one of the columns, the name of the column is noted. Then the corresponding values for columns A and B are taken. These values are added with the values in A_1 and B_1 of df2 for that particular index (which was noted earlier) and this sum is updated back in df1. The below table might explain the problem better. The expected output is as follows:
df1 =
A B C D E F Sum_C Sum_D Sum_E Sum_F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN NaN 120 NaN NaN
2022-04-08 10:07:01 70 80 80 10 NaN NaN 170 180 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10 NaN NaN 450 470 NaN
Here,
In the first row of df1, there is a value for column D. The values from column A and B are 40 and 50. Now, since there is a value present in column D of df1, the index D of df2 is selected, the corresponding A_1 and B_1 is noted which in this case is 20 and 10.
All of these values (A B A_1 B_1) is added resulting in a value of 120. This is added in Sum_D column of df1. The rest of the values (Sum_C,Sum_E,Sum_F) are NaN.
Would appreciate if there is an easier way to do this.
Thanks in advance!
CodePudding user response:
IIUC, you can use:
d = df1[df2.index].notna()
out = (df1
.join(d.mul(df2.sum(1)) # add sum from df2
.add(df1[['A', 'B']].sum(1), axis=0) # add data from A/B
.where(d) # keep only original non-NAN
.add_prefix('Sum_')
)
)
output:
A B C D E F Sum_C Sum_D Sum_E Sum_F
timestamp
2022-04-08 10:07:00 40 50 NaN 50 NaN NaN NaN 120 NaN NaN
2022-04-08 10:07:01 70 80 80.0 10 NaN NaN 170.0 180 NaN NaN
2022-04-08 10:07:02 200 220 NaN 10 10.0 NaN NaN 450 470.0 NaN
