Unknown pdf encoding from JSON response

Time:01-23

I have an API that returns a PDF inside a JSON response, but the PDF comes back as a long string of integers, like the following:

[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...



...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}

My questions are:

  1. What is this encoding?
  2. How can I convert this into a PDF using Python?

P.S: Here is the endpoint to get the full response.

Answer:

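The beginning of data is a hint that you actually have a list of the byte values of the PDF file: the first values, 37,80,68,70,45,49,46,52, are the ASCII codes of '%PDF-1.4', the header a PDF file starts with. The negative numbers are bytes above 127 written as signed 8-bit integers (two's complement), so -45 stands for 256 - 45 = 211 (0xd3).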
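To make this concrete, here is a small check on the opening values of the data field, decoded by hand. The sample string is just the first 15 values copied from the question; nothing else is assumed:

```python
# First 15 values from the "data" field shown in the question
sample = "37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"

values = [int(v) for v in sample.split(",")]

# Values 0..127 are plain ASCII codes: 37 -> '%', 80 -> 'P', 68 -> 'D', ...
header = "".join(chr(v) for v in values[:8])
print(header)

# Negative values are bytes >= 128 stored as signed 8-bit integers;
# adding 256 recovers the unsigned byte value (two's complement).
high = [v + 256 for v in values if v < 0]
print([hex(v) for v in high])
```

This prints %PDF-1.4 followed by ['0xd3', '0xeb', '0xe9', '0xe1'], matching the bytes that appear after the header in the decoded output further down.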

So you first have to extract that string from the parsed JSON:

data = json_data[1]['data']

which gives:

"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
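The extraction line above assumes json_data already holds the parsed response. A minimal sketch of getting there, assuming the body is a JSON array shaped like the one in the question; find_field is a hypothetical helper that avoids relying on the data entry always being at index 1. If you fetch the endpoint with the requests library, response.json() returns the same parsed list.

```python
import json

# A shortened stand-in for the API response shown in the question
response_text = '[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52"}]'

json_data = json.loads(response_text)

def find_field(items, key):
    """Hypothetical helper: return `key` from the first dict in `items` that has it."""
    for item in items:
        if isinstance(item, dict) and key in item:
            return item[key]
    raise KeyError(key)

status = find_field(json_data, "status")
data = find_field(json_data, "data")
print(status)
print(data)
```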

Convert it to a list of ints first, then to a byte string (i if i >= 0 else i + 256 maps the negative values back into the 0 to 255 range):

intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >= 0 else i + 256 for i in intlist)

to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
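The conditional i + 256 is one way to undo the signed representation. A few equivalent alternatives, shown on the first 15 values only; the array('b', ...) variant reinterprets the list as signed chars and assumes every value lies in -128..127:

```python
from array import array

intlist = [37, 80, 68, 70, 45, 49, 46, 52, 10, 37, -45, -21, -23, -31, 10]

# Option 1: Python's modulo maps -128..-1 to 128..255 and leaves 0..255 unchanged
b1 = bytes(i % 256 for i in intlist)

# Option 2: masking the low 8 bits has the same effect here
b2 = bytes(i & 0xFF for i in intlist)

# Option 3: store the values as signed chars, then dump the raw bytes.
# This raises OverflowError if any value is outside -128..127.
b3 = array("b", intlist).tobytes()

assert b1 == b2 == b3
print(b1)
```

All three give b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n', the same start as above; the array version has the advantage of failing loudly if the input is not really a list of signed bytes.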

Finally, save it to a file:

with open('file.pdf', 'wb') as fd:
    fd.write(b)
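Putting the steps together, a sketch of a small helper. json_to_pdf and the sample.pdf filename are hypothetical, and it assumes the response keeps the shape shown in the question (a list of one-key objects holding status and data). It also checks for the %PDF- header before writing, which catches the case where the field holds something other than a PDF:

```python
import json
import os
import tempfile

def json_to_pdf(response_text, path):
    """Hypothetical helper: decode the comma-separated signed byte list
    from a response shaped like the question's and write it to `path`."""
    items = json.loads(response_text)
    # Merge the list of one-key dicts into a single dict for easy lookup
    fields = {}
    for item in items:
        fields.update(item)
    if fields.get("status") != "SUCCESS":
        raise ValueError(f"unexpected status: {fields.get('status')!r}")
    # % 256 turns the signed values -128..-1 into the bytes 128..255
    pdf_bytes = bytes(int(v) % 256 for v in fields["data"].split(","))
    # Sanity check: a well-formed PDF normally starts with "%PDF-"
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded data does not look like a PDF")
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    return len(pdf_bytes)

# Usage with a shortened sample (a real response contains the whole file)
sample = '[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"}]'
out = os.path.join(tempfile.gettempdir(), "sample.pdf")
written = json_to_pdf(sample, out)
print(written)
```

With the shortened sample this writes 15 bytes; on the full response the resulting file should end with the %%EOF marker visible at the end of the data string.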