Python compute average value of key in series of JSON

I have a pandas.core.series.Series where each element is a JSON object, as shown:

0     {"count": 157065, "grp": {"a1": 12, "a2": 32}}
1     {"count": 2342, "grp": {"a1": 4, "a2": 34}}
2     {"count": 543, "grp": {"a1": 1, "a2": 11}}
3     {"count": 156, "grp": {"a1": 56, "a2": 75}}

How can I compute the average value of count across all the JSONs, and also the average values of a1 and a2?

CodePudding user response：

I'm not entirely sure whether this is what you were asking for.

This calculates the average of "count":

doc1 = {"count": 157065, "grp": {"a1": 12, "a2": 32}}
doc2 = {"count": 2342, "grp": {"a1": 4, "a2": 34}}
doc3 = {"count": 543, "grp": {"a1": 1, "a2": 11}}
doc4 = {"count": 156, "grp": {"a1": 56, "a2": 75}}
lojs = [doc1, doc2, doc3, doc4] # list of all the jsons

countaverage = 0
# For every JSON document, add its count to the running total
for j in lojs:
    countaverage += j["count"]
# Divide the total by the number of documents
countaverage = countaverage / len(lojs)

If you want the average of a1 as well as, or instead of, the one above, you can use this code:

a1average = 0
for j in lojs:
    a1average += j["grp"]["a1"] # adding the "a1" inside of "grp" to the running total
a1average = a1average / len(lojs)

You can swap a1 for a2 if you want the average of a2 instead.
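As a more compact sketch of the same calculation: the question doesn't say whether the Series elements are already dicts or still JSON strings, so this assumes strings and parses them with json.loads first. The names raw_rows and docs are only for illustration; with a real Series you could get the list from its tolist() method.

```python
import json
from statistics import fmean

# Rows copied from the question, assumed here to be JSON strings
raw_rows = [
    '{"count": 157065, "grp": {"a1": 12, "a2": 32}}',
    '{"count": 2342, "grp": {"a1": 4, "a2": 34}}',
    '{"count": 543, "grp": {"a1": 1, "a2": 11}}',
    '{"count": 156, "grp": {"a1": 56, "a2": 75}}',
]

# Parse each JSON string into a dict (skip this step if they are dicts already)
docs = [json.loads(row) for row in raw_rows]

# fmean sums the values and divides by how many there are
count_avg = fmean([doc["count"] for doc in docs])
a1_avg = fmean([doc["grp"]["a1"] for doc in docs])
a2_avg = fmean([doc["grp"]["a2"] for doc in docs])

print(count_avg, a1_avg, a2_avg)  # 40026.5 18.25 38.0
```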

EXTENSION: for documents that might have different numbers of "a"s:

doc1 = {"count": 157065, "grp": {"a1": 12, "a2": 32}}
doc2 = {"count": 2342, "grp": {"a1": 4, "a2": 34}}
doc3 = {"count": 543, "grp": {"a1": 1, "a2": 11, "a3": 46, "a4": 23}}
doc4 = {"count": 156, "grp": {"a1": 56, "a2": 75, "a3": 23}}
lojs = [doc1, doc2, doc3, doc4]

grps = [] # defining a list that will contain all of the "a"s
for doc in lojs: # getting each document in the list of documents
    for a in doc["grp"].keys(): # getting all the keys in the grp of that document
        if a not in grps: # checking whether the "a" already exists in the list of "a"s
            grps.append(a) # adding the new "a" to the list

averages = {} # using a dict instead of a list because it maps each "a" to its own values
for grp in grps: # getting each "a"
    averages[grp] = [0, 0] # [running sum, number of occurrences], both starting at zero

for grp in grps: # getting each "a"
    for doc in lojs: # getting each document
        if grp in doc["grp"]: # checking whether this document has that "a"
            averages[grp][0] += doc["grp"][grp] # adding the value of that "a" to its running sum
            averages[grp][1] += 1 # increasing the number of times the "a" has occurred by 1

for el in averages: # getting each "a"
    averages[el][0] = averages[el][0] / averages[el][1] # dividing the sum by the number of occurrences

And you can get the value of each average using

averages["a3"][0]

Of course, you can change "a3" to whichever "a" you want. You take the first element because the value of each key is a list that holds both the averaged value and the number of times that "a" occurred in your documents.

This probably isn't the most efficient way, but I mean, it works!
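If you want something shorter, here is a rough single-pass sketch of the same idea using collections.defaultdict. The names docs, totals and counts are just my own choices, and it assumes every document has a "grp" dict with numeric values.

```python
from collections import defaultdict

docs = [
    {"count": 157065, "grp": {"a1": 12, "a2": 32}},
    {"count": 2342, "grp": {"a1": 4, "a2": 34}},
    {"count": 543, "grp": {"a1": 1, "a2": 11, "a3": 46, "a4": 23}},
    {"count": 156, "grp": {"a1": 56, "a2": 75, "a3": 23}},
]

totals = defaultdict(int)  # running sum per "a"
counts = defaultdict(int)  # number of documents that contain each "a"

# One pass over the documents, visiting only the keys each one actually has
for doc in docs:
    for key, value in doc["grp"].items():
        totals[key] += value
        counts[key] += 1

# Each average divides by the number of documents that had that key
averages = {key: totals[key] / counts[key] for key in totals}

print(averages)  # {'a1': 18.25, 'a2': 38.0, 'a3': 34.5, 'a4': 23.0}
```

Here averages["a3"] is the number itself, so there is no [0] to remember.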