I have an API that returns a pdf from json, but it just returns as a long string of integers like following
[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...
...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}
My questions are:
- What is this encoding?
- How to convert this into a pdf using python?
P.S: Here is the endpoint to get the full response.
CodePudding user response:
The beginning of data is a hint that you actually have a list of the bytes values of the PDF file: it starts with the byte values of '%PDF-1.4'.
So you must first extract that curious string:
data = json_data[1]['data']
to have:
"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
convert it to a list of int first, then a byte string (i if i >=0 else i 256 ensure positive values...):
intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >=0 else i 256 for i in intlist)
to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
And finaly save that to a file:
with open('file.pdf', 'wb') as fd:
fd.write(b)
