Home > Software design >  Concatenating two fields which have multiple values separated by delimiter using pandas
Concatenating two fields which have multiple values separated by delimiter using pandas

Time:02-02

I have source csv file with structure as:

Col1, Col2, Col3, Col4
123, John, ID1|ID2|ID3, Lisbon|Kite|Dam
456, Tree, ID4|ID9|ID10|ID11, Chart||Cam|Dare
543, Jam, ID12, Jar

I need to concatenate the col3 and col4 values by maintaining the sequence and order of pipe(delimiter). The CombinedCol should also be a pipe separated column like below:

Col1, Col2, CombinedCol
123, John, ID1:Lisbon|ID2:Kite|ID3:Dam
456, Tree, ID4:Chart|ID9|ID10:Cam|ID11:Dare
543, Jam, ID12:Jar

I am trying this using pandas but I am stuck and not able to iterate the field values at the column

df['CombinedCol']=df['Col3'].str.split('|') ':' df['Col4'].str.split('|')

CodePudding user response:

Use list comparehension with double zip - by splitted values by |, then zip and join by : and last join by | to new column:

zipped = zip(df['Col3'], df['Col4'])

df['CombinedCol'] = ['|'.join(':'.join(y).rstrip(':') 
                      for y in zip(a.split('|'), b.split('|'))) 
                      for a, b in zipped]
print (df)
   Col1  Col2               Col3             Col4  \
0   123  John        ID1|ID2|ID3  Lisbon|Kite|Dam   
1   456  Tree  ID4|ID9|ID10|ID11  Chart||Cam|Dare   
2   543   Jam               ID12              Jar   

                        CombinedCol  
0       ID1:Lisbon|ID2:Kite|ID3:Dam  
1  ID4:Chart|ID9|ID10:Cam|ID11:Dare  
2                          ID12:Jar 

CodePudding user response:

csv file :

Col1,Col2,Col3,Col4
123,John,ID1|ID2|ID3,Lisbon|Kite|Dam
456,Tree,ID4|ID9|ID10|ID11,Chart||Cam|Dare
543,Jam,ID12,Jar

Code:

>>> import pandas as pd
>>> df = pd.read_csv('/Users/rpghosh/python-examples/webscraping/weathermap/test1.csv')
>>> import numpy as np
>>> def f1():
    r = np.empty(len(df), dtype=object)
    for idx, i in enumerate(list(zip(df.Col3, df.Col4))):
        lst_ids = i[0].split('|')
        lst_vals = i[1].split('|')
        s = []
        for item in list(zip(lst_ids, lst_vals)):
            s.append( item[0]   ':'    item[1] )
    
        r[idx] = '|'. join(s)
    #print(r)
    return r

>>> f1()
array(['ID1:Lisbon|ID2:Kite|ID3:Dam', 'ID4:Chart|ID9:|ID10:Cam|ID11:Dare',
       'ID12:Jar'], dtype=object)

>>> df['Combined'] = f1()

>>> print(df)

Col1  Col2               Col3             Col4  \
0   123  John        ID1|ID2|ID3  Lisbon|Kite|Dam   
1   456  Tree  ID4|ID9|ID10|ID11  Chart||Cam|Dare   
2   543   Jam               ID12              Jar   

                            Combined  
0        ID1:Lisbon|ID2:Kite|ID3:Dam  
1  ID4:Chart|ID9:|ID10:Cam|ID11:Dare  
2                           ID12:Jar  
  •  Tags:  
  • Related