I have the data shown below.
,name,link,address
0,Aasia Steel Industrial Group,http://www.aasiasteel.com/,"
Address
1
Saudi Arabia
"
1,ADES,http://investors.adihgroup.com/,"
Address
Al-Kifah Tower
King Fahad Road
Dhahran
Saudi Arabia
"
2,AEC,https://www.aecl.com,"
Address
King Khalid International Airport.
Industrial Estate P.O.Box 90916,
Riyadh 11623,
Saudi Arabia
"
There is a lot of unwanted whitespace. I tried using the function below, but I am not able to clean my cells.
df['address']=df.address.str.strip()
In the console, the output for the address column looks as follows:
\nAddress\r\n\r\n\t\t\t\t\t\t\t\t\t\tAl-Kifah
CodePudding user response:
df['address'] = df['address'].apply(lambda x: ' '.join(x.split()))
If the column also contains values other than strings, we can use:
df['address'] = df['address'].apply(lambda x: ' '.join(x.split()) if hasattr(x, 'lower') else x)
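To see why str.strip() alone leaves the cell dirty while split/join cleans it, here is a minimal sketch. The sample string is an assumption, made up to resemble the console output in the question, and the isinstance check is one possible way to skip non-string values.

```python
import pandas as pd

# Hypothetical sample cell, modelled on the console output in the question.
raw = "\nAddress\r\n\r\n\t\t\tAl-Kifah Tower\r\nKing Fahad Road\r\nDhahran\r\nSaudi Arabia\r\n"
df = pd.DataFrame({"address": [raw, None]})

# str.strip() only trims the two ends of each string;
# the tabs and line breaks in the middle are left untouched.
stripped = df["address"].str.strip()

# split() with no argument splits on any run of whitespace (spaces, tabs,
# \r, \n), and ' '.join() glues the pieces back with single spaces.
cleaned = df["address"].apply(
    lambda x: " ".join(x.split()) if isinstance(x, str) else x
)

print(repr(stripped[0]))
print(repr(cleaned[0]))
```

The second print should show the whole address on one line with single spaces, while the first still contains the interior \r\n and \t characters.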
CodePudding user response:
Since your address cells contain newlines, it's better to split them on the newline character. The following solution also removes leading and trailing whitespace from each line using the strip() method.
def format_address(address):
    lines = address.splitlines()  # split the cell into lines
    lines = [l.strip() for l in lines]  # remove leading/trailing whitespace
    lines = [l for l in lines if l]  # drop all empty strings
    return ",".join(lines)  # join the remaining lines with commas

df.address = df.address.apply(format_address)
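As a rough, self-contained sketch of the line-based approach, the snippet below runs it on a made-up cell. The helper name clean_lines and the sample string are assumptions for illustration. Every empty line is filtered out with a comprehension, since list.remove("") only removes the first empty string and raises ValueError when there is none.

```python
import pandas as pd

def clean_lines(address, sep=","):
    """Strip each line of a cell, drop empty lines, join the rest with sep."""
    if not isinstance(address, str):
        return address  # leave NaN/None and other non-string values as they are
    lines = (line.strip() for line in address.splitlines())
    return sep.join(line for line in lines if line)

# Hypothetical sample cell, modelled on the data in the question.
raw = "\nAddress\r\n\r\n\t\t\tAl-Kifah Tower\r\nKing Fahad Road\r\nDhahran\r\nSaudi Arabia\r\n"
df = pd.DataFrame({"address": [raw]})

df["address"] = df["address"].apply(clean_lines)
print(df["address"][0])  # Address,Al-Kifah Tower,King Fahad Road,Dhahran,Saudi Arabia
```

Note that if a line already ends with a comma, as in the AEC row, joining with "," produces a doubled comma; passing a different separator such as sep=" " avoids that.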
