Home > OS >  Sum if column name is higher than row value
Sum if column name is higher than row value

Time:01-21

I am trying to sum rows values of each of the below columns if their "date" values are lower than column names' values :

  01-01-2020     01-01-2021     01-01-2022           date
1          1              3              6     01-01-2020
2          4              4              2     01-10-2021
3          5              1              9     01-12-2021

For instance for column 1, the only row whose date value is equal or lower than column 1's name (01-01-2020) is the first row, thus the sum is 1 for column 1.

Likewise, as all dates in the "date" column are lower than last column's name (01-01-2022), the total is 6 2 9=17, which would result to this :

      01-01-2020     01-01-2021     01-01-2022           date
1              1              3              6     01-01-2020
2              4              4              2     01-10-2021
3              5              1              9     01-12-2021
Total          1              3             17

Is there any way do to it more elegantly than looping over each columns and then each rows ?

CodePudding user response:

We can check with np.greater_equal.outer, then slice the column mask the unwanted cell with boolean output as NaN

s = pd.to_datetime(df.date).values
m = np.greater_equal.outer(pd.to_datetime(df.columns[:-1]).values,s).T
df = df.append(df.iloc[:,:-1].where(m).sum().to_frame('Total').T)
df
Out[381]: 
       01-01-2020  01-01-2021  01-01-2022        date
1             1.0         3.0         6.0  01-01-2020
2             4.0         4.0         2.0  01-10-2021
3             5.0         1.0         9.0  01-12-2021
Total         1.0         3.0        17.0         NaN
  •  Tags:  
  • Related