Extract element in column between two pipes in pandas

Time:01-18

I have a dataframe such as:

COL1        
Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU
Rats|Q1GYHH9|VFG_IIV3
Mice|Q9TY15|Q9GT5_GQCPA
Bird|BTYG8|BHJU8_OUIVM

I would like to extract the element in COL1 between the two pipes into a new column COL2, to get:

COL1                                      COL2 
Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU    A0AYFFTBB6
Rats|Q1GYHH9|VFG_IIV3                     Q1GYHH9
Mice|Q9TY15|Q9GT5_GQCPA                   Q9TY15
Bird|BTYG8|BHJU8_OUIVM                    BTYG8

Thanks a lot for your help.

CodePudding user response:

You can use str.split on '|' and select the second element of each split:

df['COL2'] = df['COL1'].str.split('|').str[1]

Output:

                                     COL1        COL2
0  Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU  A0AYFFTBB6
1                   Rats|Q1GYHH9|VFG_IIV3     Q1GYHH9
2                 Mice|Q9TY15|Q9GT5_GQCPA      Q9TY15
3                  Bird|BTYG8|BHJU8_OUIVM       BTYG8
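As a self-contained sketch of the split approach, the snippet below rebuilds the sample frame from the question and applies the same call. The `no_pipe` series is a hypothetical extra case, not part of the question's data, added to show what happens when a value contains no pipe at all.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": [
        "Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU",
        "Rats|Q1GYHH9|VFG_IIV3",
        "Mice|Q9TY15|Q9GT5_GQCPA",
        "Bird|BTYG8|BHJU8_OUIVM",
    ]
})

# Split each string on the literal '|' and keep the piece at position 1.
df["COL2"] = df["COL1"].str.split("|").str[1]

# Hypothetical value with no pipe: position 1 does not exist,
# so the result is a missing value rather than an error.
no_pipe = pd.Series(["NoPipesHere"]).str.split("|").str[1]

print(df)
print(no_pipe.isna().tolist())
```

Because a missing second field comes back as a missing value, malformed rows can be found afterwards with `df['COL2'].isna()`.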

CodePudding user response:

Use Series.str.extract to capture the value between the two pipes. The | must be escaped because it is a special regex character:

df['COL2'] = df['COL1'].str.extract(r'\|(.+)\|')
print(df)
                                     COL1        COL2
0  Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU  A0AYFFTBB6
1                   Rats|Q1GYHH9|VFG_IIV3     Q1GYHH9
2                 Mice|Q9TY15|Q9GT5_GQCPA      Q9TY15
3                  Bird|BTYG8|BHJU8_OUIVM       BTYG8
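One caveat with the regex approach: a greedy pattern such as `\|(.+)\|` captures everything between the first and the last pipe. That is fine for the question's data, where each value has exactly two pipes. The sketch below compares it with a negated character class on a hypothetical value that has three pipes (`"a|b|c|d"` is an assumption for illustration, not from the question). It also passes `expand=False` so that `str.extract` returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

s = pd.Series([
    "Rats|Q1GYHH9|VFG_IIV3",  # from the question: exactly two pipes
    "a|b|c|d",                # hypothetical value with three pipes
])

# Greedy: '.+' runs up to the LAST pipe in the string.
greedy = s.str.extract(r"\|(.+)\|", expand=False)

# Negated class: '[^|]+' stops at the NEXT pipe, so only the second field is kept.
second_field = s.str.extract(r"\|([^|]+)\|", expand=False)

print(greedy.tolist())
print(second_field.tolist())
```

When only the second field is wanted and the number of pipes may vary, the `[^|]+` form behaves like `str.split('|').str[1]`.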