How can I convert special characters into regular characters (é to e, å to a, etc.)?

Time:01-07

I have a list of songs that I want to search for on YouTube. However, when a song name contains special characters, the code below raises a UnicodeEncodeError:

Code:

import urllib.request
import re

search_kw = tracks[3]['Artist'] + ' ' + tracks[3]['Track Title']
search_kw = search_kw.replace(' ', '+')

html = urllib.request.urlopen("https://www.youtube.com/results?search_query=" + search_kw)
video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
print("https://www.youtube.com/watch?v=" + video_ids[0])

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 43: ordinal not in range(128)

Example of string that causes error:

Tutu Au Mic'  –  dumbéa

How can I convert the special characters into regular characters to prevent the error from occurring?

CodePudding user response:

Use the Unidecode library for this: https://pypi.org/project/Unidecode/. It guarantees an ASCII string in return.
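
If installing a third-party package is not an option, a rough standard-library approximation is to decompose each character with unicodedata and discard whatever is not ASCII. This is a sketch of a different technique, not what Unidecode does internally. The helper name strip_accents is made up for illustration, and characters with no decomposition (such as the en dash in the example string) are dropped rather than approximated.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: approximate text with plain ASCII characters."""
    # NFKD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards the combining marks and any
    # other character that has no ASCII form.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(strip_accents("dumbéa"))  # dumbea
print(strip_accents("å"))       # a
```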

CodePudding user response:

For a web query, you probably need to use urlencode:

urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
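
As a minimal sketch of how that call could be applied to the question's search, the snippet below builds the query string from a dict. The search_query parameter name and the base URL are taken from the question; the variable names are only illustrative.

```python
from urllib.parse import urlencode

# The example string from the question, including the non-ASCII characters.
search_kw = "Tutu Au Mic' – dumbéa"

# urlencode takes a mapping of parameter names to values and percent-encodes
# both. By default it quotes with quote_plus, so spaces become "+".
query = urlencode({"search_query": search_kw})
url = "https://www.youtube.com/results?" + query

print(url)
```

The resulting URL contains only ASCII characters, so the manual replace(' ', '+') step is no longer needed.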

Or, for general character translation, use the string maketrans method together with translate:

Python 3.9.5 (default, Nov 18 2021, 16:00:48) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = "Tutu Au Mic' – dumbéa"
>>> mytable = txt.maketrans("é", "e")
>>> print(txt.translate(mytable))
Tutu Au Mic' – dumbea
>>> 
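
The same idea extends to several characters at once. The sketch below uses an arbitrary, incomplete table chosen for illustration; any character missing from the table is left unchanged, so the result is not guaranteed to be ASCII.

```python
# Map several accented characters in one table. The two strings passed to
# str.maketrans must have the same length.
table = str.maketrans("éèåä", "eeaa")

txt = "Tutu Au Mic' – dumbéa"
result = txt.translate(table)

print(result)            # Tutu Au Mic' – dumbea
print(result.isascii())  # False: the en dash is not in the table
```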

CodePudding user response:

Instead of doing this, you should encode the non-ASCII characters. YouTube will likely understand an ASCII approximation, but not every character has one. It is also unnecessary: there are well-defined ways to pass non-ASCII characters as part of a URL's query string.

The standard library offers urllib.parse.quote_plus for escaping text to be used in a query string. Alternatively, use the excellent requests library, https://docs.python-requests.org/en/latest/.
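
A minimal sketch of the quote_plus approach applied to the question's code, assuming the search string has already been assembled. The helper name build_search_url is hypothetical, and no request is actually sent here.

```python
from urllib.parse import quote_plus


def build_search_url(search_kw: str) -> str:
    """Hypothetical helper: return an ASCII-only YouTube search URL."""
    # quote_plus percent-encodes non-ASCII characters as UTF-8 bytes and
    # turns spaces into "+", so no manual replace(' ', '+') is needed.
    return "https://www.youtube.com/results?search_query=" + quote_plus(search_kw)


url = build_search_url("Tutu Au Mic' – dumbéa")
print(url)
# The result can then be passed to urllib.request.urlopen(url) as in the question.
```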
