Pandas extracting values from dataframe based on condition

Time:01-07

I'm trying to extract the part of a string before a dash in some rows of a pandas DataFrame, df. The problem is that when I use the str.extract() function, it returns the part of the string before the dash but inserts NaN in rows where no dash is present.

Data example:

I2311-A45
Z13A-SA87 
CSSSAA1-4 
LKJ3B-15
1AAAZ0-14
ASHENSKFR
ASD
AFSDFGRE

The example data is in the column df['values']. My first attempt was:

df['values'] = df['values'].str.extract('(.*)-')

output:

I2311
Z13A 
CSSSAA1 
LKJ3B
1AAAZ0
NaN
NaN
NaN

and it gives me 3 NaN values instead of

ASHENSKFR
ASD
AFSDFGRE

Next I tried df.loc conditions and the apply() function with a lambda, but both fail with the same exception:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

df['values'] = df['values'].apply(lambda x: df['values'].str.extract('(.*)-') if df['values'].str.contains('-') else None)

Thank you for help in advance!

CodePudding user response:

You can simply use Series.str.split. It splits each value on "-", and a value with no dash comes back as a single-element list, so taking the first element with .str[0] leaves that value as is.

In [134]: df['values'].str.split('-').str[0]
Out[134]: 
0        I2311
1         Z13A
2      CSSSAA1
3        LKJ3B
4       1AAAZ0
5    ASHENSKFR
6          ASD
7     AFSDFGRE
Name: values, dtype: object
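
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It rebuilds the question's data (without the stray trailing spaces) and compares the split approach with a few alternatives that are not part of the original answer: an extract() pattern that always matches, extract() followed by fillna(), and a row-wise lambda. The variable names (by_split, by_extract, by_fillna, by_apply) are made up for illustration.

```python
import pandas as pd

# Sample data from the question (trailing spaces removed).
df = pd.DataFrame({
    "values": [
        "I2311-A45", "Z13A-SA87", "CSSSAA1-4", "LKJ3B-15",
        "1AAAZ0-14", "ASHENSKFR", "ASD", "AFSDFGRE",
    ]
})

# The answer's approach: split on "-" and keep the first piece.
# n=1 stops after the first dash; rows without a dash are unchanged.
by_split = df["values"].str.split("-", n=1).str[0]

# Alternative: an extract() pattern that matches every row.
# "^([^-]*)" captures everything up to the first dash, or the whole
# string when there is no dash, so no NaN is produced.
by_extract = df["values"].str.extract(r"^([^-]*)", expand=False)

# Alternative: keep the original pattern and fill the NaN rows with the
# original values. Note that "(.*)-" is greedy and captures up to the
# LAST dash, which only matters if a value has more than one dash.
by_fillna = df["values"].str.extract(r"(.*)-", expand=False).fillna(df["values"])

# Alternative: a row-wise lambda. Inside apply(), x is a single string,
# so the condition must test x itself rather than the whole column;
# testing the whole Series is what raises the "truth value of a Series
# is ambiguous" error.
by_apply = df["values"].apply(lambda x: x.split("-")[0] if "-" in x else x)

print(by_split)
```

On this data all four variants give the same result. The str.split version is probably the easiest to read; the fillna() version is worth considering only if the greedy "up to the last dash" behaviour is what you actually want.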