I'm trying to extract the part of a string before a dash in some rows of a pandas DataFrame. The problem is that when I use the str.extract() function, it extracts the part of the string before the dash but inserts NaN in rows where no dash is present.
Data example:
I2311-A45
Z13A-SA87
CSSSAA1-4
LKJ3B-15
1AAAZ0-14
ASHENSKFR
ASD
AFSDFGRE
So df['values'] is the column holding the example data above. My attempts are:
df['values'] = df['values'].str.extract('(.*)-')
output:
I2311
Z13A
CSSSAA1
LKJ3B
1AAAZ0
NaN
NaN
NaN
and it gives me 3 NaN values instead of
ASHENSKFR
ASD
AFSDFGRE
Next I tried df.loc conditions and the apply() function with a lambda, but both raise the same exception:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
df['values'] = df['values'].apply(lambda x: df['values'].str.extract('(.*)-') if df['values'].str.contains('-') else None)
Thank you in advance for your help!
CodePudding user response:
You can simply use Series.str.split. It splits each value at - where one is present, and a value without a dash comes back as a single piece, so taking the first piece with .str[0] leaves it unchanged.
In [134]: df['values'].str.split('-').str[0]
Out[134]:
0 I2311
1 Z13A
2 CSSSAA1
3 LKJ3B
4 1AAAZ0
5 ASHENSKFR
6 ASD
7 AFSDFGRE
Name: values, dtype: object
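For reference, here is a minimal self-contained sketch that rebuilds the example column from the question and compares three ways of getting the same result. The variable names by_split, by_extract and by_apply are made up for illustration. The second variant keeps the original str.extract call and fills the NaN rows back in from the original column. The third shows what the apply() attempt was probably aiming for: inside apply() the lambda receives one string at a time, so the check has to be made on x rather than on the whole df['values'] Series, which is what triggers the "truth value of a Series is ambiguous" error.

```python
import pandas as pd

# Example column from the question
df = pd.DataFrame({
    "values": ["I2311-A45", "Z13A-SA87", "CSSSAA1-4", "LKJ3B-15",
               "1AAAZ0-14", "ASHENSKFR", "ASD", "AFSDFGRE"]
})

# 1) Split on the first dash only and keep the first piece.
#    Rows without a dash yield a one-element list, so they stay unchanged.
by_split = df["values"].str.split("-", n=1).str[0]

# 2) Keep str.extract, but return a Series (expand=False) and
#    replace the NaN rows with the original values.
by_extract = df["values"].str.extract(r"(.*)-", expand=False).fillna(df["values"])

# 3) apply() with a lambda that works on a single string x,
#    not on the whole Series.
by_apply = df["values"].apply(lambda x: x.split("-")[0] if "-" in x else x)

print(by_split.tolist())
```

One difference to keep in mind: the pattern (.*)- is greedy, so for a value with several dashes it keeps everything up to the last dash, while splitting and taking the first piece keeps only the text before the first dash. The sample data has at most one dash per value, so all three variants agree here. If the "up to the last dash" behaviour is wanted, df['values'].str.rsplit('-', n=1).str[0] should match it.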
