From this dataframe :
|-|----|---|---|---|---|---|---|---|---|
| |code|M-1|M-2|M-3|M-4|M-5|M-6|M-7|M-8|
|-|----|---|---|---|---|---|---|---|---|
|0| DE | 3 | 0 | 5 | 7 | 0 | 2 | 1 | 9 |
|1| GT | 5 | 2 | 2 | 1 | 0 | 3 | 1 | 7 |
|2| KT | 8 | 2 | 0 | 3 | 0 | 7 | 0 | 3 |
|3| SZ | 0 | 2 | 3 | 2 | 5 | 4 | 0 | 2 |
|4| NJ | 7 | 3 | 3 | 0 | 2 | 1 | 0 | 1 |
|5| DC | 1 | 0 | 3 | 0 | 8 | 1 | 0 | 0 |
|-|----|---|---|---|---|---|---|---|---|
I would like to get that :
|-|----|-----|-----|
| |code| T-1 | T-2 |
|-|----|-----|-----|
|0| DE | 8 | 9 |
|1| GT | 9 | 4 |
|2| KT | 10 | 10 |
|3| SZ | 5 | 11 |
|4| NJ | 13 | 3 |
|5| DC | 4 | 9 |
|-|----|-----|-----|
Month-1, Month-2, Month-3 are summarized in Trimester-1.
M-4, M-5, M-6 are summarized in T-2
We lack M-9 to add the column T-3...so we deleted M-7 and M-8.
In this example, the input dataframe goes till M-8 but it could have been just till M-1 or till M-12.
CodePudding user response:
static code may not work because as you mentioned number of columns may vary this works for any given number of columns
example input:
code M-1 M-2 M-3 M-4 M-5 M-6
0 DE 8 9 10 9 10 9
1 GT 9 4 8 4 8 4
2 KT 10 10 3 10 3 10
3 SZ 5 11 2 11 2 11
4 NJ 13 3 1 3 1 3
5 DC 4 9 0 9 0 9
create another dataframe df1
df1=df[df.columns[:1]]
for x in range(len(df.columns[1:])//3):
df1["T-" str(x 1)]=df[df.columns[x*3 1:(x 1)*3 1]].sum(axis=1)
df1
a warning might arise ignore it output:
code T-1 T-2
0 DE 27 28
1 GT 21 16
2 KT 23 23
3 SZ 18 24
4 NJ 17 7
5 DC 13 18
CodePudding user response:
Assuming you have months in order per group of 3 columns, you can use:
import numpy as np
group = np.arange(len(df.columns)-1)//3 1
(df.set_index('code')
.groupby(group, axis=1)
.sum()
.add_prefix('T-')
)
If you want to extract the trimester from the column names of potentially unsorted columns (e.g., M-4->T-2):
group = (df.columns[1:].str[2:].astype(int)-1)%3 1
(df.set_index('code')
.groupby(group, axis=1)
.sum()
.add_prefix('T-')
)
output:
T-1 T-2 T-3
code
DE 8 9 10
GT 9 4 8
KT 10 10 3
SZ 5 11 2
NJ 13 3 1
DC 4 9 0
