I'm trying to use the following Google Translate API endpoint to translate text in an application: https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl=en&q=контрольная работа
When I click the link, it downloads a text file that, when opened, contains all the information I need, seemingly in the right format (sentences[0].trans = "text" is in the same format as if I had typed out the word "text" manually).
However, in C# using a WWW request, in Python using requests.get, or through Postman, I get the following string instead: "trans": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°".
I have tried converting it to a number of different encodings, but none gives the right value. It also bothers me that the English parts of the response are correct, while the translation, which should be in English, comes back garbled, and so does the field that echoes the original Russian text.
The text I get back doesn't seem to convert back to test no matter which encoding I try in C# (UTF-7, UTF-8, UTF-16, UTF-16 BE).
Is there something that I'm missing here?
The code for the request, the result of downloading the file manually, and the result of running the code are shown below:
Code:
import json
import requests
text = "контрольная работа"
lang = "en"
url = f"https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl={lang}&q={text}"
url = url.replace(" ", " ")
res = requests.get(url)
res = res.text
jres = json.loads(res)
translation = jres["sentences"][0]["trans"]
print(res, end="\n\n")
print("\t", translation)
Manual download (clicking the link in Chrome downloads the file):
{
"sentences": [
{
"trans": "test",
"orig": "контрольная работа",
"backend": 10
},
{
"src_translit": "kontrol'naya rabota"
}
],
"dict": [
{
"pos": "noun",
"terms": [
"test"
],
"entry": [
{
"word": "test",
"reverse_translation": [
"тест",
"испытание",
"анализ",
"проверка",
"критерий",
"контрольная работа"
],
"score": 0.18498141
}
],
"base_form": "контрольная работа",
"pos_enum": 1
}
],
"src": "ru",
"alternative_translations": [
{
"src_phrase": "контрольная работа",
"alternative": [
{
"word_postproc": "test",
"score": 1000,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
10
]
},
{
"word_postproc": "test work",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 18
}
],
"raw_src_segment": "контрольная работа",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"ru"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"ru"
]
},
"target_inflections": [
{
"written_form": "test",
"features": {
"number": 2
}
},
{
"written_form": "tests",
"features": {
"number": 1
}
}
]
}
Requesting the file using WWW in C# (.NET Framework 3.5 with the Unity engine, from before WWW was deprecated) or requests in Python:
{
"sentences": [
{
"trans": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
"orig": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°",
"backend": 3,
"translation_engine_debug_info": [
{
"model_tracking": {
"checkpoint_md5": "ef4a126affdcc2d3c84e987e2d0fb6b1",
"launch_doc": "tea_GermanicB_afdaislbnosvfyyiiw_en_2020q2.md"
}
}
]
}
],
"src": "is",
"alternative_translations": [
{
"src_phrase": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°",
"alternative": [
{
"word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
},
{
"word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð °",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
8
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 35
}
],
"raw_src_segment": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"is"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"is"
]
}
}
Answer:
Since the link worked in Chrome directly, I added a Chrome User-Agent header to the request, and it returned the correct result:
import json
import requests
url = 'https://clients5.google.com/translate_a/t'
# Passing the query via params lets requests percent-encode the Cyrillic text.
params = {'client': 'dict-chrome-ex',
          'sl': 'auto',
          'tl': 'en',
          'q': 'контрольная работа'}
# Present the request as coming from Chrome, since the link works there.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'}
r = requests.get(url, params=params, headers=headers)
jres = r.json()
# ensure_ascii=False prints Cyrillic as-is instead of \uXXXX escapes.
print(json.dumps(jres, indent=2, ensure_ascii=False))
Output:
{
"sentences": [
{
"trans": "test",
"orig": "контрольная работа",
"backend": 10
},
{
"src_translit": "kontrol'naya rabota"
}
],
"dict": [
{
"pos": "noun",
"terms": [
"test"
],
"entry": [
{
"word": "test",
"reverse_translation": [
"тест",
"испытание",
"анализ",
"проверка",
"критерий",
"контрольная работа"
],
"score": 0.18498141
}
],
"base_form": "контрольная работа",
"pos_enum": 1
}
],
"src": "ru",
"alternative_translations": [
{
"src_phrase": "контрольная работа",
"alternative": [
{
"word_postproc": "test",
"score": 1000,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
10
]
},
{
"word_postproc": "control work",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 18
}
],
"raw_src_segment": "контрольная работа",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"ru"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"ru"
]
},
"target_inflections": [
{
"written_form": "test",
"features": {
"number": 2
}
},
{
"written_form": "tests",
"features": {
"number": 1
}
}
]
}
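As for why re-encoding the response could not repair it: one plausible reading, and this is an assumption drawn only from the two responses shown in the question, not from any Google documentation, is that without the browser header the server read each UTF-8 byte of the query as a separate character before translating. The sketch below reproduces that kind of garbling locally with no network access. Latin-1 stands in for whatever single-byte encoding was actually applied, because it maps every byte to a code point and so keeps the round trip lossless.

```python
# Illustration only: reproduces the garbling locally, no request is sent.
text = "контрольная работа"

# What a UTF-8 aware client puts on the wire for the q parameter.
utf8_bytes = text.encode("utf-8")

# Assumption: the receiving side treats every byte as one character.
# Latin-1 is a stand-in for the real (unknown) single-byte encoding.
garbled = utf8_bytes.decode("latin-1")

print(len(text))     # characters in the original query
print(len(garbled))  # one "character" per UTF-8 byte after misdecoding

# Undoing the mistake recovers the query text, but not a translation:
# a "trans" value computed from garbled input cannot be fixed afterwards.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == text)
```

This should print 18, 35 and True. The 35 matches the "end": 35 under "srcunicodeoffsets" in the garbled response, against "end": 18 in the correct one, and the garbled response also reports "src": "is" rather than "ru". Both are consistent with the query being damaged before translation, which would explain why no decoding of the returned "trans" string yields "test".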
