How to extract specific content in a pandas dataframe

Time:01-18

I want to extract specific content from the strings in a column. Consider the following dataframe:

import pandas as pd
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "bike10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)

I would like to use a regular expression to extract the numeric part of each string in the id column. The final result should be:

data = {'time': [0, 1, 2, 3, 4], 'id': [0, 10, 0, 10, 100]}  
df = pd.DataFrame(data)

The difficulty here is that the length of the string and the number of digits to extract are variable.

Thanks for your help.

CodePudding user response:

You can grab the sequence of digits at the end of each string in the id column, convert it to an integer, and reassign the result to the id column.

df['id'] = df['id'].str.extract(r'(\d+)$', expand=False).astype(int)
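As a self-contained check of this approach, here is a minimal sketch using the dataframe from the question. The `pd.to_numeric` call and the nullable `Int64` dtype are additions not in the answer above; they are one way to cope with ids that have no trailing digits, which `str.extract` returns as NaN and which a plain `astype(int)` would reject.

```python
import pandas as pd

data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "bike10", "veh0", "veh10", "moto100"]}
df = pd.DataFrame(data)

# (\d+)$ captures one or more digits anchored at the end of the string.
# expand=False returns a Series rather than a one-column DataFrame.
digits = df['id'].str.extract(r'(\d+)$', expand=False)

# Assumption: ids without trailing digits should stay missing instead of raising,
# so convert via to_numeric and the nullable Int64 dtype.
df['id'] = pd.to_numeric(digits).astype('Int64')
print(df)
```

If every id is known to end in digits, the shorter `.astype(int)` form is enough.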

CodePudding user response:

I hope the code below helps. It removes all alphabetic characters, leaving only the digits. You can extend the pattern to cover special characters as well.

import pandas as pd
data = {'time': [0, 1, 2, 3, 4], 'id': ["bike0", "biKe10", "veh0", "veh10", "moto100"]}  
df = pd.DataFrame(data)
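# regex=True is needed in pandas 2.0+, where str.replace matches the pattern literally by default
df["id"] = df["id"].str.replace(r"[a-z]", "", case=False, regex=True).astype(int)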
print(df)
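One possible way to extend this to special characters is to remove everything that is not a digit with `\D`, rather than listing the characters to drop. The sample ids below are hypothetical and only illustrate separators that `[a-z]` would leave behind.

```python
import pandas as pd

# Hypothetical ids containing separators; not part of the original data.
df = pd.DataFrame({'id': ["bike-0", "veh_10", "moto#100"]})

# \D matches any non-digit character, so letters and symbols are all removed.
df['id'] = df['id'].str.replace(r'\D', '', regex=True).astype(int)
print(df)
```

Note that this concatenates every digit in the string, so an id such as "a1b2" would become 12. When only the trailing number matters, extracting with `(\d+)$` is the safer choice.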