I've been attempting to use a get request from a rest API call. When putting the following url in for a link the data displays in JSON.
https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json?classId=C&relaSource=ATC&trans=0
Below is my code, which is mostly commented and mostly what I've attempted.
import urllib3
from urllib3 import request
import json
import ndjson
import pandas as pd
from pandas.io.json import json_normalize
http = urllib3.PoolManager()
url ='https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json?classId=C&relaSource=ATC&trans=0'
resp = http.request("GET", url)
data = json.loads(resp.data.decode('utf-8'))
#with open(data) as f: ##expected str, byes or os.pathlike object not dict. This was for a malformed json test.
# source = f.read()
# source = source.replace('}{', '}\n{')
# data = ndjson.loads(source)
#print(data)
#print(data.key()) # drugMemberGroup
#print(data.values()) #prints the entire json
df = pd.json_normalize(data, 'drugMemberGroup') # prints the entire json since it's stored as a single key
#print(df.head(10))
#with open(df) as f: # doesn't change anything still prints a single key:value for df
# source = f.read()
# source = source.replace('}{', '}\n{')
# data = ndjson.loads(source)
#print(df)
#print(df.key())
#b = json.loads(data) #TypeError: the JSON object must be str, bytes or bytearray, not dict
#print(b["rxcui"]) #if I use pdf instead of data then I get the entire json for a single key
#rxcuijson = json.dumps(data) #TypeError: the JSON object must be str, bytes or bytearray, not dict
#print(rxcuijson) #if I use pdf instead of data then I get the entire json for a single key
I've pasted the JSON in a JSON viewer and the tree looks like the following, so I don't believe it's malformed JSON. However, I did try ndjson just in case.:
object {1}
drugMemberGroup {1}
drugMember [326]
0 {2}
minConcept {3}
rxcui : 1007489
name : felodipine / metoprolol
tty : MIN
nodeAttr [3]
0 {2}
attrName : SourceId
attrValue : C07FB02
1 {2}
attrName : SourceName
attrValue : metoprolol and felodipine
2 {2}
attrName : Relation
attrValue : INDIRECT
1 {2}
minConcept {3}
rxcui : 1009219
name : aliskiren / amlodipine
tty : MIN
nodeAttr [3]
0 {2}
attrName : SourceId
attrValue : C09XA53
1 {2}
attrName : SourceName
attrValue : aliskiren and amlodipine
2 {2}
attrName : Relation
attrValue : INDIRECT
2 {2}
minConcept {3}
rxcui : 1033889
name : amlodipine / perindopril
tty : MIN
nodeAttr [3]
0 {2}
attrName : SourceId
attrValue : C09BB04
1 {2}
attrName : SourceName
attrValue : perindopril and amlodipine
2 {2}
attrName : Relation
attrValue : INDIRECT
I'd like the JSON from the GET request to come through as a dictionary, so I can put it in tabular format, or work with it in a dataframe. Everything I tried puts the JSON as a single key:value dictionary. The only key available is the drugMemberGroup key, I believe my problem is around that.
As you can see I tried using pandas, json, ndjson, json_normalize, but nothing seems to work properly. Anything that can point me in the correct direction would be appreciated. Feel's like this should be an easy problem to solve, however it's been quite the headache.
CodePudding user response:
Have you looked at the data? You have a dictionary with a single key, "drugMemberGroup", which itself contains a single key, "drugMember". THAT key contains list, and I suspect it is that list that you want to process. Something like this:
df = pd.json_normalize(data['drugMemberGroup']['drugMember]')
By the way, you should use the excellent jq tool for looking at JSON data. Among its other uses, it prints JSON in a pretty structured format.
