Home > Net >  pandas: join columns if their headers contains a specific string
pandas: join columns if their headers contains a specific string

Time:02-01

I have a dataframe as follows:

import pandas as pd
df = pd.DataFrame({"A_1_a.1": [3],"A_1_a.2": [4],"A_1_a.3": [1],
"A_2_a.1": [2],"A_2_a.2": [3], "A_2_a.3": [7],"A_3_a.1": [4],
"A_3_a.2": [2],"A_3_a.3": [9]})

The first number in the header is the 'relation number' and the last number in each header is the 'attribute' number. I would like to join the columns in a way that I would have the following output. but i don't know how to proceed.

desired output:

  attribute_num   A_1_a   A_2_a  A_3_a
0     1              3      2      4        
1     2              4      3      2
2     3              1      7      9

I have tried the following but I know that it returns an error

df.groupby(df['A_1_a.1', 'A_1_a.2', 'A_1_a.3', 'A_2_a.1',
          'A_2_a.2', 'A_2_a.3','A_3_a.1', 'A_3_a.2','A_3_a.3'].str[2]).reset_index(name='new')

CodePudding user response:

You can split the column index to create a MultiIndex and stack:

df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.map(lambda x: x.split('.'))))
df.stack(level=1).rename_axis((None, 'attribute_num')).reset_index(level=1)

Output:

  attribute_num  A_1_a  A_2_a  A_3_a
0             1      3      2      4
0             2      4      3      2
0             3      1      7      9
  •  Tags:  
  • Related