Home > Software engineering >  seperate multi-valued column into new columns by not using str.split('',expand=true)
seperate multi-valued column into new columns by not using str.split('',expand=true)

Time:01-15

I have below data in csv.

dataCenter,customer,companyID,UID,uba
dc1,customer1,companyID1,uid1,"uba1,uba2,uba3,uba4"
dc2,customer2,companyID1,uid2,"ubaA"
dc3,customer3,companyID3,uid3,"uba1,uba4"
dc4,customer4,companyID4,uid4,"uba1,uba2,uba5,uba6,uba10"

Now I want to convert the data to below format, assign the multi-values in 'uba' column into other new columns.

dataCenter,customer,companyID,UID,action1,action2,action3,action4,action5,...,
dc1,customer1,companyID1,uid1,uba1,uba2,uba3,uba4
dc2,customer2,companyID1,uid2,uba
dc3,customer3,companyID3,uid3,uba1,uba4
dc4,customer4,companyID4,uid4,uba1,uba2,uba5,uba6,uba10,uba11,uba12,uba13

I have tried below,but doesn't work.

a = a.explode('uba')
a = pd.concat([a,pd.DataFrame(a.pop('uba').tolist(),index=a.index)],axis=1)

I don't want to use str.split('',expand=True) as the performance is really bad when data is large.

is there any other good option with good performance for me?

CodePudding user response:

If done want to use str.split('',expand=True) and no missing values is possible use list comprehension:

a = pd.concat([a,pd.DataFrame([x.split(',') for x in a.pop('uba')],index=a.index).add_prefix('action')],axis=1)
print (a)
  dataCenter   customer   companyID   UID action0 action1 action2 action3  \
0        dc1  customer1  companyID1  uid1    uba1    uba2    uba3    uba4   
1        dc2  customer2  companyID1  uid2    ubaA    None    None    None   
2        dc3  customer3  companyID3  uid3    uba1    uba4    None    None   
3        dc4  customer4  companyID4  uid4    uba1    uba2    uba5    uba6   

  action4  
0    None  
1    None  
2    None  
3   uba10  

EDIT: For processing first N values use:

N = 2
a = pd.concat([a,pd.DataFrame([x.split(',', N)[:N] for x in a.pop('uba')],index=a.index).add_prefix('action')],axis=1)
print (a)
  dataCenter   customer   companyID   UID action0 action1
0        dc1  customer1  companyID1  uid1    uba1    uba2
1        dc2  customer2  companyID1  uid2    ubaA    None
2        dc3  customer3  companyID3  uid3    uba1    uba4
3        dc4  customer4  companyID4  uid4    uba1    uba2
  •  Tags:  
  • Related