Home > Mobile >  Summarize values every N columns
Summarize values every N columns

Time:01-27

From this dataframe :

|-|----|---|---|---|---|---|---|---|---|
| |code|M-1|M-2|M-3|M-4|M-5|M-6|M-7|M-8|
|-|----|---|---|---|---|---|---|---|---|
|0| DE | 3 | 0 | 5 | 7 | 0 | 2 | 1 | 9 |
|1| GT | 5 | 2 | 2 | 1 | 0 | 3 | 1 | 7 |
|2| KT | 8 | 2 | 0 | 3 | 0 | 7 | 0 | 3 |
|3| SZ | 0 | 2 | 3 | 2 | 5 | 4 | 0 | 2 |
|4| NJ | 7 | 3 | 3 | 0 | 2 | 1 | 0 | 1 |
|5| DC | 1 | 0 | 3 | 0 | 8 | 1 | 0 | 0 |
|-|----|---|---|---|---|---|---|---|---|

I would like to get that :

|-|----|-----|-----|
| |code| T-1 | T-2 |
|-|----|-----|-----|
|0| DE |  8  |  9  |
|1| GT |  9  |  4  |
|2| KT | 10  | 10  |
|3| SZ |  5  | 11  |
|4| NJ | 13  |  3  |
|5| DC |  4  |  9  |
|-|----|-----|-----|

Month-1, Month-2, Month-3 are summarized in Trimester-1.

M-4, M-5, M-6 are summarized in T-2

We lack M-9 to add the column T-3...so we deleted M-7 and M-8.

In this example, the input dataframe goes till M-8 but it could have been just till M-1 or till M-12.

CodePudding user response:

static code may not work because as you mentioned number of columns may vary this works for any given number of columns

example input:

  code  M-1  M-2  M-3  M-4  M-5  M-6
0   DE    8    9   10    9   10    9
1   GT    9    4    8    4    8    4
2   KT   10   10    3   10    3   10
3   SZ    5   11    2   11    2   11
4   NJ   13    3    1    3    1    3
5   DC    4    9    0    9    0    9

create another dataframe df1

df1=df[df.columns[:1]]
for x in range(len(df.columns[1:])//3):
     df1["T-" str(x 1)]=df[df.columns[x*3 1:(x 1)*3 1]].sum(axis=1)
df1

a warning might arise ignore it output:

  code  T-1  T-2
0   DE   27   28
1   GT   21   16
2   KT   23   23
3   SZ   18   24
4   NJ   17    7
5   DC   13   18

CodePudding user response:

Assuming you have months in order per group of 3 columns, you can use:

import numpy as np

group = np.arange(len(df.columns)-1)//3 1

(df.set_index('code')
   .groupby(group, axis=1) 
   .sum()
   .add_prefix('T-')
)

If you want to extract the trimester from the column names of potentially unsorted columns (e.g., M-4->T-2):

group = (df.columns[1:].str[2:].astype(int)-1)%3 1

(df.set_index('code')
   .groupby(group, axis=1) 
   .sum()
   .add_prefix('T-')
)

output:

      T-1  T-2  T-3
code               
DE      8    9   10
GT      9    4    8
KT     10   10    3
SZ      5   11    2
NJ     13    3    1
DC      4    9    0
  •  Tags:  
  • Related