I have a dataframe such as:
COL1
Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU
Rats|Q1GYHH9|VFG_IIV3
Mice|Q9TY15|Q9GT5_GQCPA
Bird|BTYG8|BHJU8_OUIVM
I would like to extract the element between the two pipes in COL1 into a new column, COL2, to get:
COL1                                    COL2
Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU  A0AYFFTBB6
Rats|Q1GYHH9|VFG_IIV3                   Q1GYHH9
Mice|Q9TY15|Q9GT5_GQCPA                 Q9TY15
Bird|BTYG8|BHJU8_OUIVM                  BTYG8
Thanks a lot for your help.
CodePudding user response:
You can use str.split on '|' and select the second element of each split:
df['COL2'] = df['COL1'].str.split('|').str[1]
Output:
                                     COL1        COL2
0  Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU  A0AYFFTBB6
1                   Rats|Q1GYHH9|VFG_IIV3     Q1GYHH9
2                 Mice|Q9TY15|Q9GT5_GQCPA      Q9TY15
3                  Bird|BTYG8|BHJU8_OUIVM       BTYG8
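For anyone who wants to run this end to end, here is a minimal self-contained sketch of the split approach. The DataFrame is rebuilt from the sample rows in the question; the `no_pipe` series is a hypothetical edge case added only to illustrate what happens when a value contains no pipe at all.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": [
        "Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU",
        "Rats|Q1GYHH9|VFG_IIV3",
        "Mice|Q9TY15|Q9GT5_GQCPA",
        "Bird|BTYG8|BHJU8_OUIVM",
    ]
})

# Split each string on the literal "|" and keep the second piece (index 1).
df["COL2"] = df["COL1"].str.split("|").str[1]

# Hypothetical edge case (not in the question): a value without any pipe
# has no second piece, so .str[1] yields NaN instead of raising an error.
no_pipe = pd.Series(["NoPipesHere"]).str.split("|").str[1]

print(df)
print(no_pipe.isna().iloc[0])
```

If some rows might lack pipes, it is probably worth checking COL2 for NaN afterwards rather than assuming every row matched.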
CodePudding user response:
Use Series.str.extract with an escaped \|, because | is a special regex character (alternation). The capture group returns the value between the two pipes:
df['COL2'] = df['COL1'].str.extract(r'\|(.+)\|')
print(df)
                                     COL1        COL2
0  Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU  A0AYFFTBB6
1                   Rats|Q1GYHH9|VFG_IIV3     Q1GYHH9
2                 Mice|Q9TY15|Q9GT5_GQCPA      Q9TY15
3                  Bird|BTYG8|BHJU8_OUIVM       BTYG8
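As a runnable sketch of the regex approach, the example below uses a slightly stricter pattern than the answer: `[^|]+` (one or more non-pipe characters) in place of a greedy `.+`. That substitution is my own assumption about what is wanted if a value ever contains more than two pipes; the `extra` series is a hypothetical input added to show the difference, and `expand=False` is used so that a Series comes back instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": [
        "Cais_lupus|A0AYFFTBB6|AFTT1SZAA6_9VIRU",
        "Rats|Q1GYHH9|VFG_IIV3",
        "Mice|Q9TY15|Q9GT5_GQCPA",
        "Bird|BTYG8|BHJU8_OUIVM",
    ]
})

# Escaped pipes delimit the capture group; [^|]+ cannot cross a pipe.
pattern = r"\|([^|]+)\|"
df["COL2"] = df["COL1"].str.extract(pattern, expand=False)

# Hypothetical input with three pipes (not in the question).
extra = pd.Series(["a|b|c|d"])
greedy = extra.str.extract(r"\|(.+)\|", expand=False).iloc[0]   # spans to the last pipe
bounded = extra.str.extract(pattern, expand=False).iloc[0]      # stops at the next pipe

print(df)
print(greedy, bounded)
```

With exactly two pipes per value, as in the question's data, both patterns give the same result; they only differ when extra pipes appear.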
