Is there a way to pass through select numbers when parsing a string column that contains a name and-CodePudding

I have a column in python Name that reads in football team names and a number such as New Orleans Saints 2

I parse the data into 2 separate & additional columns as Name2: New Orleans Saints & Name3: 2

I do this by using this line of code: NameSeparate=dt['Name'].str.split(r'\s(?:([ -]?\d{,100}\.?\d{1,100}?))', expand=True)

Then dt[Name2] = NameSeparate[0] & dt[Name3] = NameSeparate[1] separates and creates the 2 new columns

I have an issue when the code pulls in San Francisco 49ers 5 as the code parses the 49 instead of the 5

I've tried writing a check loop in but can't get the script to pass through the 49 and just pull the 5

The name will always come in as San Francisco 49ers however the 2nd piece can have either a or - sign in addition to any number following, for example it may come in as: San Francisco 49ers 5, San Francisco 49ers -5, San Francisco 49ers 8.5, San Francisco 49ers -8.5, etc.

Can someone please help me write a check for this issue? Thank you!

CodePudding user response：

In case there is always a or a - in front of the number which should be in the second column, you should remove the ? after [ -] in your regex for splitting the original column:

NameSeparate=dt['Name'].str.split(r'\s(?:([ -]\d{,100}\.?\d{1,100}?))', expand=True)

Otherwise, you also select numbers where no such sign is placed before, as you already noticed.

If there are also entries where the number has no or - before it, you can add an $ to the very end of your regex string, which denotes the end of the string. Since the number seems to always come last, this also should work:

NameSeparate=dt['Name'].str.split(r'\s(?:([ -]?\d{,100}\.?\d{1,100}?))$', expand=True)