How to remove unwanted spaces from a cell using pandas?

Time:01-31

I have the data shown below.

,name,link,address
0,Aasia Steel Industrial Group,http://www.aasiasteel.com/,"
Address

                                        1 
                                                Saudi Arabia 
                                    "
1,ADES,http://investors.adihgroup.com/,"
Address

                                        Al-Kifah Tower 
                                                King Fahad Road 
                                                    Dhahran 
                                                Saudi Arabia 
                                    "
2,AEC,https://www.aecl.com,"
Address

                                        King Khalid International Airport. 
                                                Industrial Estate P.O.Box 90916, 
                                                    Riyadh 11623, 
                                                    Saudi Arabia 
                                    "

There are a lot of unwanted spaces. I tried using the function below, but I am not able to clean my cells.

df['address']=df.address.str.strip()

In the console, the output for the address column is as follows:

\nAddress\r\n\r\n\t\t\t\t\t\t\t\t\t\tAl-Kifah

CodePudding user response:

df['address'] = df['address'].apply(lambda x: ' '.join(x.split()))

If the column contains values other than strings, we can use:

df['address'] = df['address'].apply(lambda x: ' '.join(x.split()) if isinstance(x, str) else x)
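
The same whitespace collapsing can probably also be done with pandas' vectorized string methods instead of apply. The following is only a sketch: the two-row DataFrame is made-up sample data modelled on the question, and `cleaned` is a name chosen for illustration.

```python
import pandas as pd

# Made-up sample resembling the question's data; the second cell is missing.
df = pd.DataFrame({
    "address": [
        "\nAddress\r\n\r\n\t\t\tAl-Kifah Tower \r\n\t\tDhahran \r\n",
        None,
    ]
})

# Replace every run of whitespace (spaces, tabs, \r, \n) with one space,
# then strip what is left at both ends. Missing values stay missing.
cleaned = df["address"].str.replace(r"\s+", " ", regex=True).str.strip()

print(cleaned[0])
```

Because the `.str` methods skip missing values, no explicit type check is needed here, though cells holding other non-string objects would come back as missing rather than unchanged.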

CodePudding user response:

Since your address cells contain newlines, it is better to split them on the newline character. The following solution also removes leading and trailing spaces from each line using the strip() method.

def format_address(address):
    lines = address.splitlines()  # split the cell into lines
    lines = [line.strip() for line in lines]  # remove leading/trailing whitespace
    lines = [line for line in lines if line]  # drop all empty lines
    return ",".join(lines)  # join the remaining lines with commas

df['address'] = df['address'].apply(format_address)
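
To check what a line-based cleaner produces, it can be run on a single raw cell. This is a hedged sketch rather than the answer's exact code: `format_address_safe` and its `sep` parameter are hypothetical names, the sample string is reconstructed from the question's data, and the type check is an added assumption so that missing cells do not raise an error.

```python
def format_address_safe(address, sep=", "):
    # Leave non-string cells (for example None or NaN) untouched.
    if not isinstance(address, str):
        return address
    # Strip each line, then keep only the lines that are not empty.
    parts = [line.strip() for line in address.splitlines()]
    return sep.join(part for part in parts if part)


# Reconstructed sample cell, similar to the second row in the question.
sample = (
    "\nAddress\r\n\r\n"
    "\t\t\t\tAl-Kifah Tower \r\n"
    "\t\tKing Fahad Road \r\n"
    "\t\tDhahran \r\n"
    "\t\tSaudi Arabia \r\n\t"
)

result = format_address_safe(sample)
print(result)
# Address, Al-Kifah Tower, King Fahad Road, Dhahran, Saudi Arabia
```

Applied to the column, this would look like `df['address'] = df['address'].apply(format_address_safe)`.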