Python: how to remove an unwanted substring from a string - CodePudding

I have a column in my dataset that contains the following data sample:

and I'm trying to make it look like this using Python:

Please help.

CodePudding user response:

In Python, when you have a str and want to remove a substring, you can use the .replace method. Strings are immutable, so .replace returns a new string and the result has to be assigned back:

>>> a = "Hello!"
>>> a.replace("Hello", '')
'!'
>>> a = a.replace("Hello", '')
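As a small sketch of the same behaviour outside the REPL: the variable names below are only illustrative, and the optional third argument of str.replace limits how many occurrences are replaced.

```python
# Strings are immutable: replace() builds a new string and leaves the original alone.
original = "Hello, Hello!"

removed_all = original.replace("Hello", "")       # every occurrence is removed
removed_first = original.replace("Hello", "", 1)  # third argument caps the number of replacements

print(repr(original))
print(repr(removed_all))
print(repr(removed_first))
```

Because the original is never modified in place, forgetting to assign the result is the usual reason a replace appears to "do nothing".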

In your case, the simplest thing to do is this:

>>> s = "Kareem Hunt*\\HuntKa00"
>>> s = s.replace("*\\", ' ').replace("00", '')

Or, to make sure the 00 is removed only from the end of the string (str.removesuffix requires Python 3.9 or later):

>>> s = s.replace("*\\", ' ').removesuffix("00")

Since not all of your strings end with 00 (some end with 08, for example), I would suggest this:

>>> s = s.replace("*\\", ' ')[:-2]

which drops the last two characters of the string, whatever they are.
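The slice removes two characters even when they are not digits. A slightly more defensive sketch, assuming the trailing code is always exactly two digits, uses a regular expression instead. The helper name strip_id_suffix is hypothetical and not part of the answer above.

```python
import re


def strip_id_suffix(raw: str) -> str:
    """Replace the star-backslash separator with a space and drop a trailing two-digit code."""
    spaced = raw.replace("*\\", " ")
    # Remove the last two characters only when both are digits,
    # so values without a numeric suffix are returned unchanged.
    return re.sub(r"\d{2}$", "", spaced)


# Sample value taken from the answer above.
print(strip_id_suffix("Kareem Hunt*\\HuntKa00"))  # Kareem Hunt HuntKa
```

Note that, like the slice, this keeps the text after the separator and only trims the numeric code.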

CodePudding user response:

You can split on the first special character and get the first chunk:

df['player'] = df['player'].str.split(r'[^\w ]', n=1).str[0]

Or, using replace:

df['player'] = df['player'].str.replace(r'[^\w ].*$', '', regex=True)

Output:

          player
0  David Johnson
1    Kareem Hunt
2  Melvin Gordon
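To try the pattern on plain strings without pandas, the two one-liners can be approximated with the standard re module. This is only a sketch: the function names are hypothetical, and the sample value is the one quoted in the first answer.

```python
import re

# First character that is neither a word character (letter, digit, underscore) nor a space.
SEPARATOR = re.compile(r"[^\w ]")


def player_name_split(raw: str) -> str:
    """Split once on the first special character and keep the left part."""
    return SEPARATOR.split(raw, maxsplit=1)[0]


def player_name_sub(raw: str) -> str:
    """Delete everything from the first special character to the end of the string."""
    return re.sub(r"[^\w ].*$", "", raw)


sample = "Kareem Hunt*\\HuntKa00"
print(player_name_split(sample))  # Kareem Hunt
print(player_name_sub(sample))    # Kareem Hunt
```

One caveat of this character class: an apostrophe, hyphen, or period also counts as a special character, so a name containing one would be cut at that point.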