How to remove unwanted spaces from a cell using pandas?-CodePudding

I have the data shown below.

,name,link,address
0,Aasia Steel Industrial Group,http://www.aasiasteel.com/,"
Address

                                        1 
                                                Saudi Arabia 
                                    "
1,ADES,http://investors.adihgroup.com/,"
Address

                                        Al-Kifah Tower 
                                                King Fahad Road 
                                                    Dhahran 
                                                Saudi Arabia 
                                    "
2,AEC,https://www.aecl.com,"
Address

                                        King Khalid International Airport. 
                                                Industrial Estate P.O.Box 90916, 
                                                    Riyadh 11623, 
                                                    Saudi Arabia 
                                    "

There are a lot of unwanted spaces. I tried using the function below, but I am not able to clean my cells.

df['address']=df.address.str.strip()

In the console, the output for the address column is as follows:

\nAddress\r\n\r\n\t\t\t\t\t\t\t\t\t\tAl-Kifah

CodePudding user response:

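df['address'] = df['address'].apply(lambda x: ' '.join(x.split()))

Calling split() with no arguments splits on any run of whitespace (spaces, tabs, \r, \n), so joining the pieces with a single space collapses all of it. If the column also contains values other than strings (for example NaN), leave those untouched:

df['address'] = df['address'].apply(lambda x: ' '.join(x.split()) if isinstance(x, str) else x)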
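
The same whitespace collapse can probably also be done with pandas' vectorized string methods instead of apply. The following is a minimal sketch, assuming a small made-up DataFrame that mimics the question's address column (the sample values and the None row are illustrative, not taken from the real file):

```python
import pandas as pd

# Hypothetical sample mirroring the question's data, plus one missing value.
df = pd.DataFrame({
    "address": [
        "\nAddress\r\n\r\n\t\t\t\tAl-Kifah Tower \r\n\t\tKing Fahad Road "
        "\r\n\t\tDhahran \r\n\t\tSaudi Arabia \r\n\t",
        None,
    ]
})

# Replace every run of whitespace (spaces, tabs, \r, \n) with one space,
# then trim what is left at both ends. Missing values stay missing.
df["address"] = (
    df["address"]
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(df["address"].iloc[0])
```

Because the .str accessor skips missing values, no explicit type check is needed here, which is the main reason one might prefer it over the lambda.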

CodePudding user response:

Because your address cells contain newlines, it is better to split each cell on line breaks. The following solution also removes leading and trailing whitespace from every line using the strip() method.

def format_address(address):
    lines = address.splitlines()               # split the cell into lines
    lines = [line.strip() for line in lines]   # remove leading/trailing whitespace
    lines = [line for line in lines if line]   # drop every empty line
    return ",".join(lines)                     # join the remaining lines with commas

df['address'] = df['address'].apply(format_address)
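
To see what the line-based approach does to a single cell, here is a small self-contained sketch that needs no DataFrame. The helper name clean_address and its label parameter are hypothetical. Skipping the leading "Address" label is an assumption about the desired result, not something the question asks for. The raw string is a shortened imitation of the console output shown in the question.

```python
def clean_address(cell, label="Address"):
    """Hypothetical helper: join the non-empty lines of a cell with ', '."""
    if not isinstance(cell, str):
        # Leave NaN, None, numbers, etc. untouched.
        return cell
    # splitlines() treats \n, \r\n and \r as line breaks.
    lines = [line.strip() for line in cell.splitlines()]
    # Keep non-empty lines and skip the "Address" label
    # (assumption: the label is not wanted in the cleaned value).
    lines = [line for line in lines if line and line != label]
    return ", ".join(lines)


raw = (
    "\nAddress\r\n\r\n\t\t\tAl-Kifah Tower \r\n\t\tKing Fahad Road "
    "\r\n\t\tDhahran \r\n\t\tSaudi Arabia \r\n\t\t"
)
print(clean_address(raw))
# Al-Kifah Tower, King Fahad Road, Dhahran, Saudi Arabia
```

It could then be applied to the whole column with df['address'].apply(clean_address). Note that lines which already end with a comma, as in the AEC row, would end up with a doubled separator unless the trailing comma is stripped first.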