I want to extract specific content from a string. Consider the following dataframe:
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "bike10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)
I would like to use a regular expression to extract the numeric part of each string. The final result should be:
data = {'time': [0, 1, 2, 3, 4], 'id': [0, 10, 0, 10, 100]}
df = pd.DataFrame(data)
The difficulty here is that the length of the string and the number of digits to extract are variable.
Thanks for your help.
CodePudding user response:
You can grab the sequence of digits at the end of each string in the id column, then convert them to integers and reassign the result to the id column.
df['id'] = df.id.str.extract(r'(\d+)$').astype(int)
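As a self-contained sketch of the same idea, assuming pandas is imported as pd and every id ends in at least one digit (the expand=False argument is an addition here, so that extract returns a Series rather than a one-column DataFrame):

```python
import pandas as pd

data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "bike10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)

# (\d+)$ captures one or more digits anchored at the end of the string.
# expand=False (an assumption, not in the one-liner above) yields a Series.
df['id'] = df['id'].str.extract(r'(\d+)$', expand=False).astype(int)
print(df)
```

If some id had no trailing digits, extract would return a missing value for that row and astype(int) would raise an error, so that case would need separate handling, for example with pd.to_numeric(..., errors='coerce').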
CodePudding user response:
The code below should work. It removes all alphabetic characters, ignoring case. You can extend the pattern to cover special characters as well.
import pandas as pd
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "biKe10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)
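# regex=True is required in recent pandas versions, where str.replace treats the pattern literally by default
df["id"] = df["id"].str.replace(r"[a-z]", "", case=False, regex=True).astype(int)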
print(df)
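One possible way to extend this to special characters, as a sketch: strip every non-digit character with \D instead of listing the letters. The sample ids below are hypothetical, and the approach assumes each id contains a single run of digits; an id such as "a1b2" would have its digits joined into 12.

```python
import pandas as pd

# Hypothetical ids containing letters, mixed case, and special characters
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "biKe-10", "veh_0", "veh10", "moto#100"]}
df = pd.DataFrame(data)

# \D matches any character that is not a digit, so only the digits remain
df["id"] = df["id"].str.replace(r"\D", "", regex=True).astype(int)
print(df)
```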
