Combine similar DataFrame columns and stack values-CodePudding

Currently, I have a dataframe that looks as such:

abc	def	ghi	abc	def	ghi
2	4	78	56	7	45

Is there a way to combine the columns that have the same name and create a new row for each set of values? Example:

abc	def	ghi
2	4	78
56	7	45

CodePudding user response：

You can use .groupby(level=0, axis='columns') to assign a cumulative count and then perform a transformation based on that.

import pandas as pd

new_cols = pd.MultiIndex.from_arrays([df.columns, df.groupby(level=0, axis=1).cumcount()])

out = df.set_axis(new_cols, axis=1).stack().reset_index(level=0, drop=True)

print(out)
   abc  def  ghi
0    2    4   78
1   56    7   45

CodePudding user response：

You can set up a MultiIndex with help of groupby.cumcount and stack:

(df
 .set_axis(pd.MultiIndex
             .from_arrays([df.columns,
                           df.groupby(level=0, axis=1)
                             .cumcount()]), axis=1)
 .stack()
 .droplevel(0)
)

Output:

   abc  def  ghi
0    2    4   78
1   56    7   45

CodePudding user response：

Just to give an alternative to other answers with melt:

out = (df.melt(var_name='col', value_name='val')
         .assign(idx=lambda x: x.groupby('col').cumcount())
         .pivot('idx', 'col', 'val').rename_axis(index=None, columns=None))
print(out)

# Output
   abc  def  ghi
0    2    4   78
1   56    7   45

CodePudding user response：

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(names_to = '.value', names_pattern = '(. )')

   abc  def  ghi
0    2    4   78
1   56    7   45

In the above solution, the .value determines which parts of the column labels remain as headers - the labels are determined by the groups in the regular expression in names_pattern.

Another option would be to pass the names of the new columns to names_to, while passing a list of matching regular expressions to names_pattern :

df.pivot_longer(names_to = ['abc', 'def', 'ghi'], 
                names_pattern = ['abc', 'def', 'ghi'])

   abc  def  ghi
0    2    4   78
1   56    7   45