How to split a dataframe column into 2 new columns by separating the last word from everything before it

I have a dataframe with a column that contains addresses. I would like to split each address so that the last word goes into a column Ending and everything before it goes into a separate column Beginning. The addresses vary in length, e.g.:

Main Street
Jon Smith Close
The Rovers Avenue

After searching different resources I came up with the following

new_address_df['begining'], new_address_df['ending'] = new_address_df['street'].str.split().str[:-1].apply(lambda x: ' '.join(map(str, x))), new_address_df['street'].str.split().str[-1]

The code works, but I am not sure it's the right way to write this in Python. Another option would have been to convert the column to a list, modify the data in list form, and then convert it back to a dataframe. I guess that might not be the best approach.

Is there a way to improve the above code if it's not Pythonic?

CodePudding user response：

There are certainly a lot of ways of doing this :) I would go for the str accessor and rpartition. rpartition splits each string at the last occurrence of the separator into 3 components: the part before the separator, the separator itself, and the part after it. If you take just the first and last components (columns 0 and 2), you are done.

df[["begining", "ending"]]=df.street.str.rpartition(" ")[[0,2]]
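For reference, here is a minimal self-contained sketch of the same idea, using the three sample addresses from the question. The DataFrame construction and the intermediate name parts are assumptions added for illustration, not part of the one-liner above.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"street": ["Main Street", "Jon Smith Close", "The Rovers Avenue"]})

# str.rpartition(" ") returns a DataFrame with columns 0, 1 and 2:
# the text before the last space, the space itself, and the text after it.
parts = df["street"].str.rpartition(" ")

# Column names (including the spelling "begining") follow the question's code
df["begining"] = parts[0]
df["ending"] = parts[2]
print(df)
```

One thing to be aware of: if an address contains no space at all, rpartition puts the whole string in the last component, so begining would be an empty string and ending would hold the full address.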

CodePudding user response：

You might use a regular expression for this, as follows:

import pandas as pd
df = pd.DataFrame({"street":["Main Street","Jon Smith Close","The Rovers Avenue"]})
df2 = df.street.str.extract(r"(?P<Beginning>.+)\s(?P<Ending>\S+)")
df = pd.concat([df,df2],axis=1)
print(df)

output

              street   Beginning  Ending
0        Main Street        Main  Street
1    Jon Smith Close   Jon Smith   Close
2  The Rovers Avenue  The Rovers  Avenue

Explanation: I used named capturing groups, which results in a pandas.DataFrame with columns of those names, which I then concat with the original df using axis=1. In the pattern, the groups are separated by a single whitespace character (\s). In group Beginning any character is allowed; in group Ending only non-whitespace (\S) characters are allowed, so Ending can only ever be the last word.
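To see why the split lands on the last space, it may help to try the pattern with the plain re module, outside pandas. This is only a sketch: it assumes the pattern is written with + quantifiers on both groups, and the single-word address "Broadway" is a made-up example that does not appear in the question.

```python
import re

# Same named groups as in the answer, assuming "+" quantifiers on both
pattern = re.compile(r"(?P<Beginning>.+)\s(?P<Ending>\S+)")

for street in ["Main Street", "Jon Smith Close", "The Rovers Avenue"]:
    # ".+" is greedy, so it grabs as much as it can and only backtracks
    # far enough to leave one whitespace plus a run of non-whitespace.
    match = pattern.search(street)
    print(match.groupdict())

# Hypothetical single-word address: there is no whitespace to split on,
# so the pattern does not match at all.
print(pattern.search("Broadway"))
```

Because a single-word address gives no match, str.extract would return NaN in both new columns for such a row, which is worth checking for if the real data can contain one-word street names.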