I have a dataset
Name Type Cluster Value
ABC AA,BB AZ,YZ 15
LMN CC,DD,EE LM,LM,LM 20
with many other columns.
I want to convert it to a dataframe like:
Name Type Cluster Value TypeSubset ClusterSubset
ABC AA, BB AZ, YZ 15 AA AZ
ABC AA, BB AZ, YZ 15 BB YZ
LMN CC,DD,EE LM,LM,LM 20 CC LM
LMN CC,DD,EE LM,LM,LM 20 DD LM
LMN CC,DD,EE LM,LM,LM 20 EE LM
The dataframe can have many columns. But the Number of elements in Type and Cluster will be same. I just want them separated into different rows and duplicate all the other columns.
How can I do it in python.
I tried
df.set_index(['Type','Cluster'])
.apply(lambda x: x.astype(str).str.split(',').explode())
.reset_index()) ```
Not getting the desired result.
CodePudding user response:
assign new columns and explode in parallel.
(df.assign(TypeSubset=df['Type'].str.split(','),
ClusterSubset=df['Cluster'].str.split(',')
)
.explode(['TypeSubset', 'ClusterSubset'])
)
