how do you convert json output to a data frame in python-CodePudding

I need to convert this json file to a data frame in python:

print(resp2)

    {
  "totalCount": 1,
  "nextPageKey": null,
  "result": [
    {
      "metricId": "builtin:tech.generic.cpu.usage",
      "data": [
        {
          "dimensions": [
            "process_345678"
          ],
          "dimensionMap": {
            "dt.entity.process_group_instance": "process_345678"
          },
          "timestamps": [
            1642021200000,
            1642024800000,
            1642028400000
          ],
          "values": [
            10,
            15,
            12
          ]
        }
      ]
    }
  ]
}

Output needs to be like this:

metricId    dimensions  timestamps  values
builtin:tech.generic.cpu.usage  process_345678  1642021200000   10
builtin:tech.generic.cpu.usage  process_345678  1642024800000   15
builtin:tech.generic.cpu.usage  process_345678  1642028400000   12

I have tried this:

print(pd.json_normalize(resp2, "data"))

I get invalid syntax, any ideas?

CodePudding user response：

Take a look at the examples of json_normalize, and you'll see a list of dictionaries that have the key names of the columns you want, unique to each row. When you have nested lists/objects, then the columns will be flatten to have dot-notation, but nested arrays will not end up duplicated across rows.

Therefore, parse the data into a flat list, then you can use from_records.

data = []
for r in resp2['result']:
    metricId = r['metricId']
    for d in r['data']:
        dimension = d['dimensions'][0]  # unclear why this is an array 
        timestamps = d['timestamps']
        values = d['values']
        for t, v in zip(timestamps, values):
            data.append({'metricId': metricId, 'dimensions': dimension,  'timestamps': t, 'values': v})

df = pd.DataFrame.from_records(data)