Reshaping dataframe from categories cross-table to one with multi-index in the columns-CodePudding

Problem statement

I am trying to turn a dataframe with a table that crosses categories with items at different dates into a dataframe with dates in the index and multiindex columns with categories as first level and items as the second level. Below you can find an example of original and target tables.

Is there a pythonic way to achieve this in Pandas?

Input table

All of the table is in the dataFrame data. Indices and columns are just integers*

	Category A	Category B	Category C
31/12/2020
item one	10	100	1000
item two	20	200	2000
31/12/2021
item one	30	300	3000
item two	40	400	4000

Desired output table

Dates should be in the index and columns should be a multiindex of Category and item

	Category A		Category B		Category C
	item one	item two	item one	item two	item one	item two
31/12/2020	10	20	100	200	1000	2000
31/12/2021	30	40	300	400	3000	4000

Python code to create input dataframe

import pandas as pd

input_df = pd.DataFrame(
              index = [0,1,2,3,4,5,6],
              columns = [0,1,2,3],
              data = [
                 ["","Category A","Category B","Category C"],
                 ["31/12/2020","","",""],
                 ["item one",10,100,1000],
                 ["item two",20,200,2000],
                 ["31/12/2021","","",""],
                 ["item one",30,300,3000],
                 ["item two",40,400,4000],
              ]
           )

CodePudding user response：

It's a complex reshape with a central pivot. I used the ability to be converted to date to identify the groups of rows:

out = (input_df
 .T.set_index(0).T
 .assign(date=lambda d: pd.to_datetime(d[''], errors='coerce').ffill())
 .loc[lambda d: pd.to_datetime(d[''], errors='coerce').ne(d['date'])]
 .pivot(index='date', columns='')
 .rename_axis(index=None, columns=(None, None))
)

Output:

           Category A          Category B          Category C         
             item one item two   item one item two   item one item two
2020-12-31         10       20        100      200       1000     2000
2021-12-31         30       40        300      400       3000     4000

CodePudding user response：

Here's another way. Create a dummy Series from the first column depending on if it contains '/' or not (helps identify dates). Then create groups depending on dates. Then create a mapper that maps groups to dates using groupby_first and use it to create a "date" column. Finally, select relevant rows and unstack to get the desired shape:

input_df = input_df.rename(columns=input_df.iloc[0]).drop(input_df.index[0])
check_for_items = input_df[''].str.contains(r'/')
input_df['date'] = check_for_items.cumsum()
input_df['date'] = input_df['date'].map(input_df.groupby('date')[''].first())
out = input_df[~check_for_items].set_index(['date','']).unstack().reset_index()

Output:

         date Category A          Category B          Category C         
                item one item two   item one item two   item one item two
0  31/12/2020         10       20        100      200       1000     2000
1  31/12/2021         30       40        300      400       3000     4000