Currently, I have a dataframe that looks as such:
| abc | def | ghi | abc | def | ghi |
|---|---|---|---|---|---|
| 2 | 4 | 78 | 56 | 7 | 45 |
Is there a way to combine the columns that have the same name and create a new row for each set of values? Example:
| abc | def | ghi |
|---|---|---|
| 2 | 4 | 78 |
| 56 | 7 | 45 |
CodePudding user response:
You can use .groupby(level=0, axis='columns') to assign a cumulative count and then perform a transformation based on that.
import pandas as pd
new_cols = pd.MultiIndex.from_arrays([df.columns, df.groupby(level=0, axis=1).cumcount()])
out = df.set_axis(new_cols, axis=1).stack().reset_index(level=0, drop=True)
print(out)
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
You can set up a MultiIndex with help of groupby.cumcount and stack:
(df
.set_axis(pd.MultiIndex
.from_arrays([df.columns,
df.groupby(level=0, axis=1)
.cumcount()]), axis=1)
.stack()
.droplevel(0)
)
Output:
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
Just to give an alternative to other answers with melt:
out = (df.melt(var_name='col', value_name='val')
.assign(idx=lambda x: x.groupby('col').cumcount())
.pivot('idx', 'col', 'val').rename_axis(index=None, columns=None))
print(out)
# Output
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(names_to = '.value', names_pattern = '(. )')
abc def ghi
0 2 4 78
1 56 7 45
In the above solution, the .value determines which parts of the column labels remain as headers - the labels are determined by the groups in the regular expression in names_pattern.
Another option would be to pass the names of the new columns to names_to, while passing a list of matching regular expressions to names_pattern :
df.pivot_longer(names_to = ['abc', 'def', 'ghi'],
names_pattern = ['abc', 'def', 'ghi'])
abc def ghi
0 2 4 78
1 56 7 45
