I have locations in the column 'ville' in the format 'Paris 75000' or 'Paris 75000 foobar'.
I would like to keep only the string before the zip code in the column 'ville'.
I tried to break the task down on a single value first:
import re
ville = 'Paris 75000 foo'
# index of the first digit, minus 1 to drop the space before it
index = re.search(r"[0-9]", ville).start() - 1
print(ville[0:index])
The result looks right: my regex gives me the index of the first digit, and the slice works.
But I am having trouble applying this to my DataFrame.
I tried:
data['ville']= data['ville'].apply(lambda x: re.split(r" [0-9]", x))
But I got a Python list with several values in each row.
I also tried slicing the column, but I could not find the right syntax.
Any ideas?
Thank you for your help.
CodePudding user response:
Try this:
# split on "a space followed by a digit" and keep the part before it
df['ville'] = df['ville'].str.split(r' [0-9]').str[0]
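Here is a small self-contained sketch of the same idea, in case it helps to test it. The sample rows are made up for illustration (only the first two formats come from the question), and the DataFrame is named `data` as in the question. Passing `n=1` is an optional extra that stops after the first split.

```python
import pandas as pd

# Hypothetical sample data; only the first two formats come from the question.
data = pd.DataFrame(
    {"ville": ["Paris 75000", "Paris 75000 foobar", "Saint Denis 93200", "Lyon"]}
)

# Split each value once at the first "space followed by a digit".
# A pattern longer than one character is treated as a regex by default.
parts = data["ville"].str.split(r" [0-9]", n=1)

# .str[0] takes the first element of each list, i.e. the text before the zip code.
data["ville"] = parts.str[0]

print(data["ville"].tolist())
```

A row with no digit at all (like 'Lyon' here) is left unchanged, because the split finds nothing to cut on. City names that contain spaces are kept whole, since the cut only happens at a space that is directly followed by a digit.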
