How to loop values of different dictionaries and append them in a pandas dataframe

Time:01-19

I need to loop over the values of several dictionaries and build a DataFrame from them.

The data comes from an API that returns JSON like the following:

Source Dictionary

result = {
    "meta": {
        "request": {
            "segment_name": "Searches1",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 100, "confidence": "High"},
        {"date": "2021-11-02", "visits": 200, "confidence": "High"},
        {"date": "2021-11-03", "visits": 300, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 400, "confidence": "High"},
        {"date": "2021-11-05", "visits": 500, "confidence": "Low"},
    ]
},
{
    "meta": {
        "request": {
            "segment_name": "Searches2",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        {"date": "2021-11-02", "visits": 220, "confidence": "High"},
        {"date": "2021-11-03", "visits": 330, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 440, "confidence": "High"},
        {"date": "2021-11-05", "visits": 540, "confidence": "Low"},
    ]
}

I tried the following approach, which only loops over the "segments" list, but it doesn't work.

My Approach

def getSearches():

    Searches = []
    segment_name = result['meta']['request']['segment_name']

    if "segments" in result:
        for fs in result['segments']:
            Searches.append(
                {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})

    fs_df = pd.DataFrame(Searches)
    print(fs_df)


getSearches()

I get the following error:

Error Message

Traceback (most recent call last):
  File "/Users/ismail/Desktop/sw_dict_test", line 51, in <module>
    getFlightSearches()
  File "/Users/ismail/Desktop/sw_dict_test", line 40, in getFlightSearches
    segment_name = result['meta']['request']['segment_name']
TypeError: tuple indices must be integers or slices, not str

To be precise, I need the "segment_name" from each "request" dictionary, plus all the fields of each record in the "segments" list, combined into a single pandas DataFrame.

Desired Output


         date segment_name  visits confidence
0  2021-11-01    Searches1     100       High
1  2021-11-02    Searches1     200       High
2  2021-11-03    Searches1     300        Low
3  2021-11-04    Searches1     400       High
4  2021-11-05    Searches1     500        Low
5  2021-11-01    Searches2     110       High
6  2021-11-02    Searches2     220       High
7  2021-11-03    Searches2     330        Low
8  2021-11-04    Searches2     440       High
9  2021-11-05    Searches2     540        Low

How can I achieve that?

Answer:

You can also use json_normalize to flatten the JSON data. The records that should become rows are the dicts stored under "segments", so set record_path='segments'. The only metadata you need on each row is "segment_name", which is nested inside "meta" and "request", so pass its path as a list: meta=[['meta', 'request', 'segment_name']].

Then use rename to shorten the generated column name meta.request.segment_name to segment_name, and reindex to put the columns in the desired order.

import pandas as pd
df = pd.json_normalize(result, 'segments', [['meta', 'request', 'segment_name']]).rename({'meta.request.segment_name':'segment_name'}, axis=1).reindex(['date', 'segment_name', 'visits', 'confidence'], axis=1)

Output:

         date segment_name  visits confidence
0  2021-11-01    Searches1     100       High
1  2021-11-02    Searches1     200       High
2  2021-11-03    Searches1     300        Low
3  2021-11-04    Searches1     400       High
4  2021-11-05    Searches1     500        Low
5  2021-11-01    Searches2     110       High
6  2021-11-02    Searches2     220       High
7  2021-11-03    Searches2     330        Low
8  2021-11-04    Searches2     440       High
9  2021-11-05    Searches2     540        Low
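A self-contained check of the json_normalize approach, written with keyword arguments so each parameter is visible. The data is trimmed to two records per segment for brevity (an illustrative simplification), and result is converted to a list explicitly rather than relying on json_normalize to accept a tuple.

```python
import pandas as pd

# Trimmed copy of the two API responses; the comma between the dicts makes `result` a tuple.
result = {
    "meta": {"request": {"segment_name": "Searches1", "metrics": ["Visits"]}, "status": "Success"},
    "segments": [
        {"date": "2021-11-01", "visits": 100, "confidence": "High"},
        {"date": "2021-11-02", "visits": 200, "confidence": "High"},
    ],
}, {
    "meta": {"request": {"segment_name": "Searches2", "metrics": ["Visits"]}, "status": "Success"},
    "segments": [
        {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        {"date": "2021-11-02", "visits": 220, "confidence": "High"},
    ],
}

df = (
    pd.json_normalize(
        list(result),                                # one element per API response
        record_path="segments",                      # each dict in "segments" becomes a row
        meta=[["meta", "request", "segment_name"]],  # nested key copied onto every row
    )
    # json_normalize joins the meta path with "." to name the column
    .rename(columns={"meta.request.segment_name": "segment_name"})
    .reindex(columns=["date", "segment_name", "visits", "confidence"])
)
print(df)
```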

Answer:

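result is a tuple, not a dict: the comma between the two dictionary literals packs them into a tuple, and indexing a tuple with a string such as 'meta' raises the TypeError. Make it a list instead and loop over each element, passing one dictionary at a time to getSearches.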
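A minimal reproduction of the error, with the payload trimmed to the keys involved (a simplification for illustration):

```python
# Two dict literals separated by a comma: Python builds a tuple, not a dict.
result = {"meta": {"request": {"segment_name": "Searches1"}}}, {
    "meta": {"request": {"segment_name": "Searches2"}}
}
print(type(result))  # <class 'tuple'>

try:
    result["meta"]  # string index on a tuple
except TypeError as exc:
    print(exc)  # tuple indices must be integers or slices, not str

# Indexing by position (or looping) reaches each dict.
print(result[0]["meta"]["request"]["segment_name"])  # Searches1
```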

import pandas as pd

result = [{
    "meta": {
        "request": {
            "segment_name": "Searches1",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 100, "confidence": "High"},
        {"date": "2021-11-02", "visits": 200, "confidence": "High"},
        {"date": "2021-11-03", "visits": 300, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 400, "confidence": "High"},
        {"date": "2021-11-05", "visits": 500, "confidence": "Low"},
    ]
},
{
    "meta": {
        "request": {
            "segment_name": "Searches2",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        {"date": "2021-11-02", "visits": 220, "confidence": "High"},
        {"date": "2021-11-03", "visits": 330, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 440, "confidence": "High"},
        {"date": "2021-11-05", "visits": 540, "confidence": "Low"},
    ]
}]


def getSearches(result):
    # Build one row per record in "segments", tagged with this response's segment_name
    Searches = []
    segment_name = result['meta']['request']['segment_name']

    if "segments" in result:
        for fs in result['segments']:
            Searches.append(
                {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})

    return Searches

searches = []
for r in result:
    # Accumulate rows from every response; plain "=" would keep only the last one
    searches += getSearches(r)

df = pd.DataFrame(searches)
print(df)
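Strictly speaking, the loop does not need a list: a tuple is iterable too, so the original result can be looped over as-is. A sketch of a variant that builds one DataFrame per response and stacks them with pd.concat (a swapped-in technique, not the answer's code; data trimmed to two records per segment for brevity):

```python
import pandas as pd

# Trimmed copy of the question's data; `result` is still a tuple here.
result = {
    "meta": {"request": {"segment_name": "Searches1", "metrics": ["Visits"]}, "status": "Success"},
    "segments": [
        {"date": "2021-11-01", "visits": 100, "confidence": "High"},
        {"date": "2021-11-02", "visits": 200, "confidence": "High"},
    ],
}, {
    "meta": {"request": {"segment_name": "Searches2", "metrics": ["Visits"]}, "status": "Success"},
    "segments": [
        {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        {"date": "2021-11-02", "visits": 220, "confidence": "High"},
    ],
}

frames = []
for response in result:  # iterating a tuple works; no conversion to list needed
    frame = pd.DataFrame(response["segments"])
    # Broadcast this response's segment name onto every one of its rows
    frame["segment_name"] = response["meta"]["request"]["segment_name"]
    frames.append(frame)

# Stack the per-response frames, renumber the index 0..n-1, and order the columns
df = pd.concat(frames, ignore_index=True).reindex(
    columns=["date", "segment_name", "visits", "confidence"]
)
print(df)
```

Building a frame per response keeps the column names coming from the JSON keys, so new fields in "segments" show up without editing a hand-written row dict.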