I am trying to sum rows values of each of the below columns if their "date" values are lower than column names' values :
01-01-2020 01-01-2021 01-01-2022 date
1 1 3 6 01-01-2020
2 4 4 2 01-10-2021
3 5 1 9 01-12-2021
For instance for column 1, the only row whose date value is equal or lower than column 1's name (01-01-2020) is the first row, thus the sum is 1 for column 1.
Likewise, as all dates in the "date" column are lower than last column's name (01-01-2022), the total is 6 2 9=17, which would result to this :
01-01-2020 01-01-2021 01-01-2022 date
1 1 3 6 01-01-2020
2 4 4 2 01-10-2021
3 5 1 9 01-12-2021
Total 1 3 17
Is there any way do to it more elegantly than looping over each columns and then each rows ?
CodePudding user response:
We can check with np.greater_equal.outer, then slice the column mask the unwanted cell with boolean output as NaN
s = pd.to_datetime(df.date).values
m = np.greater_equal.outer(pd.to_datetime(df.columns[:-1]).values,s).T
df = df.append(df.iloc[:,:-1].where(m).sum().to_frame('Total').T)
df
Out[381]:
01-01-2020 01-01-2021 01-01-2022 date
1 1.0 3.0 6.0 01-01-2020
2 4.0 4.0 2.0 01-10-2021
3 5.0 1.0 9.0 01-12-2021
Total 1.0 3.0 17.0 NaN
