I have a column in my dataset that contains the following sample data:
| player |
|---|
| David Johnson* \JohnDa08 |
| Kareem Hunt*\HuntKa00 |
| Melvin Gordon\GordMe00 |
and I'm trying to make it look like this using Python:
| player |
|---|
| David Johnson |
| Kareem Hunt |
| Melvin Gordon |
Please help.
CodePudding user response:
In Python, when you have a str and want to remove a substring, you can use the .replace method. It returns a new string (strings are immutable), so assign the result back if you want to keep it:
>>> a = "Hello!"
>>> a.replace("Hello", '')
'!'
>>> a = a.replace("Hello", '')
In your case, you can chain .replace calls to remove the *\ separator and the trailing 00. Note that this leaves the ID letters (HuntKa) in the string:
>>> s = "Kareem Hunt*\\HuntKa00"
>>> s.replace("*\\", ' ').replace("00", '')
'Kareem Hunt HuntKa'
Or, to make sure the 00 is removed only from the end of the string (str.removesuffix requires Python 3.9+):
>>> s.replace("*\\", ' ').removesuffix("00")
'Kareem Hunt HuntKa'
Since not all of your strings end with 00 (some end with 08, for example), I would suggest slicing instead:
>>> s.replace("*\\", ' ')[:-2]
'Kareem Hunt HuntKa'
which drops the last two characters from the string, whatever they are.
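If the goal is to keep only the name, as in the desired output, one possible sketch is to split on the backslash and strip the trailing asterisk and whitespace. This assumes the ID always follows a backslash, as in the sample data; the helper name `clean_player` is hypothetical:

```python
def clean_player(raw: str) -> str:
    """Return the player name that precedes the backslash-separated ID."""
    # Keep only the text before the first backslash (the whole string if none).
    name = raw.split("\\", 1)[0]
    # Drop any trailing '*' markers and whitespace left before the backslash.
    return name.rstrip("* ")


samples = [
    "David Johnson* \\JohnDa08",
    "Kareem Hunt*\\HuntKa00",
    "Melvin Gordon\\GordMe00",
]
cleaned = [clean_player(s) for s in samples]
print(cleaned)  # ['David Johnson', 'Kareem Hunt', 'Melvin Gordon']
```

Unlike slicing off a fixed number of characters, this does not depend on the length of the ID.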
CodePudding user response:
You can split on the first special character (anything that is not a word character or a space) and keep the first chunk:
df['player'] = df['player'].str.split(r'[^\w ]', n=1).str[0]
Or, using replace:
df['player'] = df['player'].str.replace(r'[^\w ].*$', '', regex=True)
Output:
          player
0  David Johnson
1    Kareem Hunt
2  Melvin Gordon
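To check what the pattern `[^\w ].*$` does without pandas, here is a minimal sketch using the standard `re` module. It also illustrates a caveat: a name containing punctuation, such as an apostrophe, is cut at that character. The `Pat O'Neil` input is made up for illustration and is not from the question's data:

```python
import re

# Same pattern as above: the first character that is neither a word
# character nor a space, followed by everything after it.
TAIL = re.compile(r"[^\w ].*$")

samples = [
    "David Johnson* \\JohnDa08",
    "Kareem Hunt*\\HuntKa00",
    "Melvin Gordon\\GordMe00",
]
names = [TAIL.sub("", s) for s in samples]
print(names)  # ['David Johnson', 'Kareem Hunt', 'Melvin Gordon']

# Hypothetical input (not from the question): the apostrophe is also
# "not a word character or a space", so the name is truncated there.
hypothetical = "Pat O'Neil*\\ONeiPa00"
truncated = TAIL.sub("", hypothetical)
print(truncated)  # Pat O

# Splitting on the backslash avoids that, assuming the ID always follows one.
safer = hypothetical.split("\\", 1)[0].rstrip("* ")
print(safer)  # Pat O'Neil
```

If your real data contains such names, the pandas equivalent of the last step would be something like `df['player'].str.split('\\', n=1).str[0].str.rstrip('* ')`, again assuming every ID is preceded by a backslash.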
