How to escape special characters in Python?

Time:02-02

I pulled a report from a CRM system that came with some special characters, like:

Belgium (Dutch)                                
Saint Lucia                                              
Trinidad and Tobago                                 
Sierra Leone                                             
Mali                                                          
Svalbard and Jan Mayen                         

This is a drop-down menu from the web interface that contains all the countries and regions. From what I have read, this is an XML formatting issue. I am processing the data with pandas in Python. I got an idea from this post, but I'd like to write a regex to escape any string with a similar sequence of characters.

By the way, I imported the CSV file like this:

df = pd.read_csv('report.csv', encoding='utf-8')

And I used this to try to escape the characters (it worked for that one case only):

df['Country/Region'] = df['Country/Region'].replace(to_replace='(', value= ' ', regex=False)

That targets one specific character, but I could not figure out how to do it with a regex.

CodePudding user response:

You can use the html.unescape function from the standard library's html module:

import html
df['Country/Region'] = df['Country/Region'].astype(str).map(html.unescape)

Output:

>>> df
           Country/Region
0         Belgium (Dutch)
1             Saint Lucia
2     Trinidad and Tobago
3            Sierra Leone
4                    Mali
5  Svalbard and Jan Mayen
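
One caveat with the snippet above: astype(str) turns missing values into the literal string 'nan'. The sketch below is a possible variant that leaves non-string cells alone and also strips the non-breaking spaces that &nbsp; decodes to. The sample values and the helper name unescape_cell are assumptions made for illustration; the actual contents of report.csv are not shown in the question.

```python
import html

import pandas as pd

# Hypothetical sample data: the entity-encoded values are an assumption
# about what the raw CSV might contain, not taken from the real report.
df = pd.DataFrame(
    {
        "Country/Region": [
            "Belgium &#40;Dutch&#41;&nbsp;&nbsp;",
            "Saint Lucia&nbsp;",
            None,
        ]
    }
)


def unescape_cell(value):
    """Decode HTML entities in a string cell; leave other values untouched."""
    if not isinstance(value, str):
        return value
    # html.unescape turns "&#40;" into "(" and "&nbsp;" into U+00A0.
    # str.strip() treats U+00A0 as whitespace, so trailing padding is removed.
    return html.unescape(value).strip()


df["Country/Region"] = df["Country/Region"].map(unescape_cell)
print(df["Country/Region"].tolist())
```

No regex is needed here, since html.unescape already recognises both numeric and named character references.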