Web scraping in Python - Arraybuffer to data readable

Time:01-26

I'm developing a web scraper for the Hoymiles monitoring system. One of the statistics I can get is historical data, but it arrives in a strange format. After a lot of research and digging through the platform's code, I found that the POST request, in addition to the headers and payload, sets the parameter responseType: "arraybuffer". After more research, I found that an arraybuffer is "a data type used to represent a generic, fixed-size binary data buffer".

My code is as follows:

import json

from bs4 import BeautifulSoup


def plants_data_historycal(self, authorization):
    # Request body: mode 3 for the given date
    payload = '''
        {
            "mode":3,
            "date":"2022-01-20"
        }
    '''

    headers = {
        'Accept': 'application/json, text/plain, */*',
        'Accept-Encoding': 'gzip, deflate, br',
        'authorization': authorization,
        'Content-Type': 'application/json;charset=UTF-8',
        'Origin': 'https://global.hoymiles.com',
        'Referer': 'https://global.hoymiles.com/platform/login',
        'Cookie': cookie,  # cookie is assumed to be defined elsewhere in the scraper
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36'
    }

    # Join the base URL and the endpoint path
    response = self.session.post(self.url + '/pvm-data/api/0/statistics/count_station_eq', headers=headers, data=payload)

    if response.status_code != 200:
        raise RuntimeError(f"The request failed: {response}")

    print(response.text)
    data = BeautifulSoup(response.text, 'html.parser')
    data = json.loads(data.text)

    return data

The response to my request looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20Y
pv_eqP��G�@H��[H�nH���H�0H��G���G�QH���G��H�JH�
�H`�H�1�H�ݗH@]sH��_H`j�H�!�H

The response.text before BeautifulSoup:

\n\x011\n\x012\n\x013\n\x014\n\x015\n\x016\n\x017\n\x018\n\x019\n\x0210\n\x0211\n\x0212\n\x0213\n\x0214\n\x0215\n\x0216\n\x0217\n\x0218\n\x0219\n\x0220\n\x0221\n\x0222\n\x0223\x12e\n\x05pv_eq\x12\\\x00��G\x00�@H��[H\x00�nH���H�\x080H\x00��G���G�Q\x00H���G��\x02H�J\x03H�\r�H`�H�1�H�ݗH@]sH��_H`j�H�!�H���H`z�H I�H
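The replacement characters (�) in this output appear because response.text decodes the body as text, which is lossy for binary data. In requests, response.content returns the undecoded bytes, which is the closest Python counterpart to responseType: "arraybuffer". A minimal sketch of the difference follows; the eight sample bytes are an illustrative assumption standing in for a live response body.

```python
# Illustrative sample of raw body bytes; in the scraper these would come from
# response.content (bytes) rather than response.text (str).
raw = bytes([0, 244, 214, 71, 0, 144, 64, 72])

# Roughly what response.text does with binary data: bytes that are not valid
# UTF-8 are replaced with U+FFFD, so their original values are lost.
as_text = raw.decode("utf-8", errors="replace")
print(repr(as_text))

# Re-encoding the text does not give the original bytes back.
round_tripped = as_text.encode("utf-8")
print(raw == round_tripped)  # False
```

Any further decoding is therefore better done on response.content than on the text shown above.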

To try to turn this string into something understandable, I looked at the front-end code in Chrome DevTools on the Hoymiles home page (https://global.hoymiles.com/platform/home). There I found that they transform the arraybuffer with this function:

transformResponse: [
    function (e) {
        if ("string" == typeof e)
            try {
                e = JSON.parse(e);
            } catch (e) {}
        return e;
    },
],

But even with that, the arraybuffer comes back empty. So I converted the arraybuffer response into a Uint8Array, but I can't work out what the data means.

Uint8Array {
0: 123
1: 34
2: 115
3: 116
4: 97
5: 116
6: 117
7: 115
8: 34
9: 58
10: 34
11: 49
12: 48
13: 48
14: 34
15: 44
16: 34
17: 100
18: 97
19: 116
20: 97
21: 34
22: 58
23: 34
24: 34
25: 44
26: 34
27: 109
28: 101
29: 115
30: 115
31: 97
32: 103
33: 101
34: 34
35: 58
36: 34
37: 116
38: 111
39: 107
40: 101
41: 110
42: 32
43: 118
44: 101
45: 114
46: 105
47: 102
48: 121
49: 32
50: 101
51: 114
52: 114
53: 111
54: 114
55: 46
56: 34
57: 125
}

Does anyone know how to turn this into readable or understandable data?

CodePudding user response:

I don't know what data type you're expecting from that Uint8Array, but here are some insights that might help. A Uint8Array is a typed array of 8-bit unsigned integers, so there is ambiguity about what it is supposed to represent: it could be a string, a float, JSON, or an int.

I'm assuming the output you printed of the Uint8Array uses the keys as index positions in the array and the values as the values at those positions. However, index 50 is not a key in your output, which is confusing.

Anyway, below I made a Uint8Array with the values from your output, in index order.


// Same array you printed out
const lst = [ 123, 34, 115, 116, 97, 116, 117, 115,  34,  58,  34,  49,  48, 
 48, 34,  44,  34,  100,  97,  116,  97,  34,  58,34,  34,  44,  34,  109,  
 101,  115,  115,  97,  103,  101,  34,  58,  34,  116,  111,  107,  101, 
 110,  32,  118,  101,  114,  105,  102,  121,  32,  undefined,  114,  114,  
  111,  114,  46,  34 ]

// Make the Uint8Array
var data = new Uint8Array(lst);

// Print as a base64 string
let str = Buffer.from(data).toString('base64');
console.log(str); // output: eyJzdGF0dXMiOiIxMDAiLCJkYXRhIjoiIiwibWVzc2FnZSI6InRva2VuIHZlcmlmeSAAcnJvci4i


// Print as float 
const floatValue = new DataView(data.buffer).getFloat64(0);
console.log(floatValue) // output: 1.3718470458079746e+285

// Print as json
function printAsJson(arr) {
  let str = "";
  for (var i=0; i<arr.byteLength; i++) {
    str += String.fromCharCode(arr[i]);
  }

  var serializedData = JSON.stringify(str);
  let message = JSON.parse(serializedData);

  console.log(message)
}
printAsJson(data); // output: {"status":"100","data":"","message":"token verify rror."

So to sum it up, the JSON interpretation makes the most sense, and it seems you have a token verification error.
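Since the scraper itself is written in Python, the same comparison can be sketched there. This is only an illustration: it uses the 58 values from the question's dump in index order and reads them both as a big-endian float64 (the same first 8 bytes that DataView.getFloat64(0) reads) and as UTF-8 JSON.

```python
import json
import struct

# The 58 byte values from the question's Uint8Array dump, in index order.
data = bytes([
    123, 34, 115, 116, 97, 116, 117, 115, 34, 58,
    34, 49, 48, 48, 34, 44, 34, 100, 97, 116,
    97, 34, 58, 34, 34, 44, 34, 109, 101, 115,
    115, 97, 103, 101, 34, 58, 34, 116, 111, 107,
    101, 110, 32, 118, 101, 114, 105, 102, 121, 32,
    101, 114, 114, 111, 114, 46, 34, 125,
])

# Interpretation 1: the first 8 bytes as a big-endian 64-bit float.
as_float = struct.unpack(">d", data[:8])[0]
print(as_float)

# Interpretation 2: the whole buffer as UTF-8 text containing JSON.
message = json.loads(data.decode("utf-8"))
print(message)
```

Only the JSON reading produces something meaningful here, which matches the conclusion above.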

CodePudding user response:

Indeed, I had a token verification error because I was using an outdated authorization token in the JavaScript code. I re-ran the request with an updated token and then used your code.

Code

const lst = [10, 1, 49, 10, 1, 50, 10, 1, 51, 10, 1, 52, 10, 1, 53, 10, 1, 54, 10, 1, 55, 10, 1, 56, 10, 1, 57, 10, 2, 49, 48, 10, 2, 49, 49, 10, 2, 49 , 50, 10 , 2, 49 , 51, 10 , 2, 49 , 52 , 10 , 2 , 49 , 53 , 10 , 2 , 49 , 54 , 10 , 2 , 49 , 55 , 10 , 2 , 49 , 56 , 10 , 2 , 49 , 57 , 10 , 2 , 50 , 48 , 10 , 2 , 50 , 49 , 10 , 2 , 50 , 50 , 10 , 2 , 50 , 51 , 10 , 2 , 50 , 52 , 18 , 105 , 10 , 5 , 112 , 118 , 95 , 101 , 113 , 18 , 96 , 0 , 244 ,  214 ,  71 ,  0 ,  144 ,  64 ,  72 ,  128 ,  237 ,  91 ,  72 ,  0 ,  162 ,  110 ,  72 ,  128 ,  138 ,  130 ,  72 ,  128 ,  8 ,  48 ,  72 ,  0 ,  236 ,  252 ,  71 ,  128 ,  209 ,  205 ,  71 ,  192 ,  81 ,  0 ,  72 ,  128 ,  189 ,  253 ,  71 ,  128 ,  156 ,  2 ,  72 ,  192 ,  74 ,  3 ,  72 ,  128 ,  13 ,  128 ,  72 ,  96 ,  234 ,  150 ,  72 ,  192 ,  49 ,  161 ,  72 ,  224 ,  221 ,  151 ,  72 ,  64 ,  93 ,  115 ,  72 ,  192 ,  145 ,  95 ,  72 ,  96 ,  106 ,  129 ,  72 ,  160 ,  33 ,  164 ,  72 ,  160 ,  240 ,  143 ,  72 ,  96 ,  122 ,  147 ,  72 ,  32 ,  73 ,  158 ,  72 ,  160 ,  159 ,  139 ,  72]

var data = new Uint8Array(lst);

let str = Buffer.from(data).toString('base64');
console.log(str); 

const floatValue =new DataView(data.buffer).getFloat64(0);
console.log(floatValue) 

function printAsJson(arr) {
  let str = "";
  for (var i=0; i<arr.byteLength; i++) {
    str += String.fromCharCode(arr[i]);
  }

  var serializedData = JSON.stringify(str);
  let message = JSON.parse(serializedData);

  console.log(message)
}
printAsJson(data);

This was the result:

str

CgExCgEyCgEzCgE0CgE1CgE2CgE3CgE4CgE5CgIxMAoCMTEKAjEyCgIxMwoCMTQKAjE1CgIxNgoCMTcKAjE4CgIxOQoCMjAKAjIxCgIyMgoCMjMKAjI0EmkKBXB2X2VxEmAA9NZHAJBASIDtW0gAom5IgIqCSIAIMEgA7PxHgNHNR8BRAEiAvf1HgJwCSMBKA0iADYBIYOqWSMAxoUjg3ZdIQF1zSMCRX0hgaoFIoCGkSKDwj0hgepNIIEmeSKCfi0g=

floatValue

1.7470648220442632e-260

json

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24i
pv_eq`
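The thread ends without identifying the format. One reading that fits the bytes is a tag/length/value layout like the Protocol Buffers wire format, but this is an assumption that neither Hoymiles nor anyone in the thread confirms. On that reading, each 10 (field 1, length-delimited) is followed by a length and a short ASCII label, and the final 18 (field 2, length-delimited) wraps a name (pv_eq) plus 96 bytes that split evenly into 24 four-byte groups. The sketch below decodes the base64 string printed above under that assumption and reads the 96 bytes as little-endian float32 values. The field meanings and the float32 choice are guesses from the byte pattern, and the helper names are hypothetical.

```python
import base64
import struct

# Base64 string copied from the result above (the same bytes as `lst`).
B64 = (
    "CgExCgEyCgEzCgE0CgE1CgE2CgE3CgE4CgE5CgIxMAoCMTEKAjEyCgIxMwoCMTQKAjE1"
    "CgIxNgoCMTcKAjE4CgIxOQoCMjAKAjIxCgIyMgoCMjMKAjI0EmkKBXB2X2VxEmAA9NZH"
    "AJBASIDtW0gAom5IgIqCSIAIMEgA7PxHgNHNR8BRAEiAvf1HgJwCSMBKA0iADYBIYOqW"
    "SMAxoUjg3ZdIQF1zSMCRX0hgaoFIoCGkSKDwj0hgepNIIEmeSKCfi0g="
)


def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_fields(buf):
    """Yield (field_number, payload) for length-delimited records only."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag & 0x07 != 2:  # wire type 2 = length-delimited
            raise ValueError(f"unexpected wire type {tag & 0x07}")
        length, pos = read_varint(buf, pos)
        yield tag >> 3, buf[pos:pos + length]
        pos += length


raw = base64.b64decode(B64)
labels, series = [], {}
for number, payload in read_fields(raw):
    if number == 1:    # assumed: axis labels "1".."24"
        labels.append(payload.decode("ascii"))
    elif number == 2:  # assumed: nested record with a name and packed values
        inner = dict(read_fields(payload))
        count = len(inner[2]) // 4
        # Assumed encoding: little-endian 32-bit floats
        series[inner[1].decode("ascii")] = struct.unpack(f"<{count}f", inner[2])

print(labels)
print(series)
```

Under these assumptions the buffer yields 24 labels and one series named pv_eq with 24 values, the first being 110056.0. The unit is not stated anywhere in the thread. If the guess holds, the official protobuf library with the service's schema would be a sturdier choice than this hand-rolled reader.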