I need to loop over the values of several dictionaries and create a DataFrame from them.
The data comes from an API that returns JSON that looks like the following:
Source Dictionary
result = {
"meta": {
"request": {
"segment_name": "Searches1",
"metrics": ["Visits"]
},
"status": "Success"
},
"segments": [
{"date": "2021-11-01", "visits": 100, "confidence": "High"},
{"date": "2021-11-02", "visits": 200, "confidence": "High"},
{"date": "2021-11-03", "visits": 300, "confidence": "Low"},
{"date": "2021-11-04", "visits": 400, "confidence": "High"},
{"date": "2021-11-05", "visits": 500, "confidence": "Low"},
]
},
{
"meta": {
"request": {
"segment_name": "Searches2",
"metrics": ["Visits"]
},
"status": "Success"
},
"segments": [
{"date": "2021-11-01", "visits": 110, "confidence": "High"},
{"date": "2021-11-02", "visits": 220, "confidence": "High"},
{"date": "2021-11-03", "visits": 330, "confidence": "Low"},
{"date": "2021-11-04", "visits": 440, "confidence": "High"},
{"date": "2021-11-05", "visits": 540, "confidence": "Low"},
]
}
I tried the following approach, where I just loop over the "segments" list, but it fails.
My Approach
def getSearches():
    Searches = []
    segment_name = result['meta']['request']['segment_name']
    if "segments" in result:
        for fs in result['segments']:
            Searches.append(
                {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})
    fs_df = pd.DataFrame(Searches)
    print(fs_df)
getSearches()
I get the following error message:
Error Message
Traceback (most recent call last):
  File "/Users/ismail/Desktop/sw_dict_test", line 51, in <module>
    getFlightSearches()
  File "/Users/ismail/Desktop/sw_dict_test", line 40, in getFlightSearches
    segment_name = result['meta']['request']['segment_name']
TypeError: tuple indices must be integers or slices, not str
To be exact, I need to access the "segment_name" from the "request" dictionary as well as all the fields of each entry in the "segments" list, and combine them in a pandas DataFrame.
Desired Output
         date segment_name  visits confidence
0  2021-11-01    Searches1     100       High
1  2021-11-02    Searches1     200       High
2  2021-11-03    Searches1     300        Low
3  2021-11-04    Searches1     400       High
4  2021-11-05    Searches1     500        Low
5  2021-11-01    Searches2     110       High
6  2021-11-02    Searches2     220       High
7  2021-11-03    Searches2     330        Low
8  2021-11-04    Searches2     440       High
9  2021-11-05    Searches2     540        Low
How can I achieve that?
CodePudding user response:
You can use pd.json_normalize to flatten the JSON data. The records, i.e. the dicts that become rows, are stored under "segments", so set record_path='segments'. The only metadata needed for each record is "segment_name", so pass its path as a list: meta=[['meta', 'request', 'segment_name']].
Then use rename to shorten the metadata column name and reindex to put the columns in the desired order.
df = (
    pd.json_normalize(result, record_path='segments', meta=[['meta', 'request', 'segment_name']])
    .rename({'meta.request.segment_name': 'segment_name'}, axis=1)
    .reindex(['date', 'segment_name', 'visits', 'confidence'], axis=1)
)
Output:
         date segment_name  visits confidence
0  2021-11-01    Searches1     100       High
1  2021-11-02    Searches1     200       High
2  2021-11-03    Searches1     300        Low
3  2021-11-04    Searches1     400       High
4  2021-11-05    Searches1     500        Low
5  2021-11-01    Searches2     110       High
6  2021-11-02    Searches2     220       High
7  2021-11-03    Searches2     330        Low
8  2021-11-04    Searches2     440       High
9  2021-11-05    Searches2     540        Low
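For anyone who wants to try this without the full payload, here is a minimal self-contained sketch of the same call. It assumes pandas 1.0 or later (where json_normalize is exposed as pd.json_normalize) and uses a shortened copy of the sample data with two rows per segment; the shortening is only to keep the example small.

```python
import pandas as pd

# Shortened copy of the sample data from the question (two rows per segment).
result = [
    {
        "meta": {"request": {"segment_name": "Searches1", "metrics": ["Visits"]},
                 "status": "Success"},
        "segments": [
            {"date": "2021-11-01", "visits": 100, "confidence": "High"},
            {"date": "2021-11-02", "visits": 200, "confidence": "High"},
        ],
    },
    {
        "meta": {"request": {"segment_name": "Searches2", "metrics": ["Visits"]},
                 "status": "Success"},
        "segments": [
            {"date": "2021-11-01", "visits": 110, "confidence": "High"},
            {"date": "2021-11-02", "visits": 220, "confidence": "High"},
        ],
    },
]

df = (
    # One row per entry in "segments"; the nested segment_name is repeated on each row.
    pd.json_normalize(result, record_path="segments",
                      meta=[["meta", "request", "segment_name"]])
    # The metadata column is named after its dotted path, so shorten it.
    .rename({"meta.request.segment_name": "segment_name"}, axis=1)
    # Put the columns in the order shown in the question.
    .reindex(["date", "segment_name", "visits", "confidence"], axis=1)
)
print(df)
```

The metadata column comes back under the dotted name meta.request.segment_name, which is why the rename step is needed before reindex.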
CodePudding user response:
In your code, result is a tuple (two dicts separated by a comma), hence the error. Make it a list instead and loop over each element.
result = [{
"meta": {
"request": {
"segment_name": "Searches1",
"metrics": ["Visits"]
},
"status": "Success"
},
"segments": [
{"date": "2021-11-01", "visits": 100, "confidence": "High"},
{"date": "2021-11-02", "visits": 200, "confidence": "High"},
{"date": "2021-11-03", "visits": 300, "confidence": "Low"},
{"date": "2021-11-04", "visits": 400, "confidence": "High"},
{"date": "2021-11-05", "visits": 500, "confidence": "Low"},
]
},
{
"meta": {
"request": {
"segment_name": "Searches2",
"metrics": ["Visits"]
},
"status": "Success"
},
"segments": [
{"date": "2021-11-01", "visits": 110, "confidence": "High"},
{"date": "2021-11-02", "visits": 220, "confidence": "High"},
{"date": "2021-11-03", "visits": 330, "confidence": "Low"},
{"date": "2021-11-04", "visits": 440, "confidence": "High"},
{"date": "2021-11-05", "visits": 540, "confidence": "Low"},
]
}]
import pandas as pd

def getSearches(result):
    # Build one row per entry in "segments", tagged with the segment name.
    Searches = []
    segment_name = result['meta']['request']['segment_name']
    if "segments" in result:
        for fs in result['segments']:
            Searches.append(
                {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})
    return Searches

searches = []
for r in result:
    # extend rather than assign, so rows from every element are kept
    searches.extend(getSearches(r))
df = pd.DataFrame(searches)
print(df)
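To see where the TypeError comes from, here is a small sketch with stripped-down dicts. The names first, second, kind, error_message and names are only for illustration and are not part of the original code.

```python
first = {"meta": {"request": {"segment_name": "Searches1"}}}
second = {"meta": {"request": {"segment_name": "Searches2"}}}

# A bare comma between two values builds a tuple, even without parentheses.
result = first, second
kind = type(result).__name__

# Indexing the tuple with a string key raises the error from the question.
try:
    result["meta"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Iterating (or indexing by position) reaches the individual dicts.
names = [r["meta"]["request"]["segment_name"] for r in result]

print(kind)
print(error_message)
print(names)
```

Wrapping the dicts in square brackets is not strictly required for iteration, since a tuple can be looped over as well, but a list makes the intent clearer and matches what a JSON array decodes to.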
