My code is:

```python
import json

import requests

if __name__ == '__main__':
    json_data = requests.get("https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds?articleIds=PMC:4771370&section=Abstract&provider=Europe PMC&format=JSON").content
    r = json.loads(json_data)
    df = json_to_dataframe(r)  # my own helper, defined elsewhere
    print(df)
```
My only problem is how to run this for multiple IDs: I have at least a few thousand IDs in a file. I'm using Python; any help is appreciated.
Answer:
Assuming you can read all the IDs from the file into a list `article_ids`, you can use the following script. It lets `requests` build and URL-encode the query string from a dictionary instead of hard-coding it:
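```python
import json

import requests

URL = 'https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds'

# Replace this with the list of IDs read from your file
article_ids = ['PMC:4771370']

for article_id in article_ids:
    # requests URL-encodes these query parameters for you
    params = {
        'articleIds': article_id,
        'section': 'Abstract',
        'provider': 'Europe PMC',
        'format': 'JSON',
    }
    json_data = requests.get(URL, params=params).content
    r = json.loads(json_data)
    df = json_to_dataframe(r)  # the helper function from your question
    print(df)
```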
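To fill `article_ids` from the file, here is a minimal sketch. It assumes the file holds one ID per line; `read_article_ids` and its `default_source` parameter are hypothetical names, and prefixing bare numbers with `PMC:` is an assumption about what those numbers are.

```python
from pathlib import Path


def read_article_ids(path, default_source="PMC"):
    """Read one article ID per line (assumed file layout).

    Blank lines are skipped. A bare ID such as '4771370' is assumed to
    belong to `default_source` and becomes e.g. 'PMC:4771370'.
    """
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            # Assumption: IDs without a SOURCE prefix use the default source
            line = f"{default_source}:{line}"
        ids.append(line)
    return ids


# Usage (the file name is a placeholder):
# article_ids = read_article_ids("article_ids.txt")
```

If your file is a CSV or uses another layout, only the parsing loop needs to change; the request loop stays the same.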
Answer:
After analyzing the shared URL and reading up on URL encoding, I observed that each value of the `articleIds` parameter of the `annotationsByArticleIds` endpoint has the format `SOURCE:EXTERNAL_ID`.
Test 1: if you request the URL with a source but no ID:

https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds?articleIds=PMC

the output is: "It must contain values with format SOURCE:EXTERNAL_ID where SOURCE must have one of the following values [PMC, MED, PAT, AGR, CBA, HIR, CTX, ETH, CIT, PPR, NBK] and EXTERNAL_ID must be a number when SOURCE=PMC"
- The output above lists the possible sources.
- Each source is separated from its EXTERNAL_ID by a colon.
- In URL encoding, the colon is represented as `%3A`.
- To separate one SOURCE:EXTERNAL_ID pair from the next, you can use a comma.
- In URL encoding, the comma is represented as `%2C`.
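Before sending thousands of IDs, it can help to check them locally against the format the error message describes. This is a sketch that only encodes the rules stated in that message (the listed sources, and numeric IDs for `PMC`); the service may enforce further rules, and `is_valid_article_id` is a hypothetical helper name.

```python
import re

# Sources listed in the API's error message quoted above
VALID_SOURCES = {"PMC", "MED", "PAT", "AGR", "CBA", "HIR",
                 "CTX", "ETH", "CIT", "PPR", "NBK"}


def is_valid_article_id(article_id):
    """Check the SOURCE:EXTERNAL_ID format described by the error message."""
    source, sep, external_id = article_id.partition(":")
    if not sep or source not in VALID_SOURCES or not external_id:
        return False
    if source == "PMC":
        # The message says EXTERNAL_ID must be a number when SOURCE=PMC
        return re.fullmatch(r"[0-9]+", external_id) is not None
    return True
```

Filtering with `[i for i in article_ids if is_valid_article_id(i)]` keeps one malformed line from causing a whole batched request to fail.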
ANSWER: So, to fetch multiple articles in one request, you can build a string of article IDs in the format `SOURCE1:EXTERNAL_ID1,SOURCE2:EXTERNAL_ID2,...,SOURCEn:EXTERNAL_IDn` and append it to the main URL as the value of `articleIds`.
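A sketch of building such a URL with the standard library: the IDs are joined with commas and `urllib.parse.urlencode` percent-encodes the separators. `build_batch_url` is a hypothetical helper, and `MED:123` is a placeholder ID. Passing the same dictionary as `params=` to `requests.get` should produce an equivalent query string.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def build_batch_url(article_ids, section="Abstract",
                    provider="Europe PMC", fmt="JSON"):
    """Join IDs as SOURCE1:ID1,SOURCE2:ID2,... and build the request URL."""
    params = {
        "articleIds": ",".join(article_ids),
        "section": section,
        "provider": provider,
        "format": fmt,
    }
    # urlencode percent-encodes ':' as %3A, ',' as %2C and ' ' as '+'
    return BASE_URL + "?" + urlencode(params)


print(build_batch_url(["PMC:4771370", "MED:123"]))
```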
Some limitations:
- The maximum URL length may be around 2048 characters.
- Depending on the length of your IDs, that allows roughly 150 to 200 articles per request.
- With thousands of IDs, you can split them into batches of about 150 and loop over the batches to fetch the required information.
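Because percent-encoding turns each `:` and `,` into three characters, how many IDs fit under the limit depends on the IDs themselves. The sketch below therefore measures the encoded URL rather than trusting a fixed count: it greedily packs IDs into batches capped at 150 IDs and 2048 characters. Both numbers come from the limitations above and are assumptions, not documented server limits; `make_url` and `batch_urls` are hypothetical names.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
MAX_URL_LENGTH = 2048  # assumed limit from the answer; the real one may differ
MAX_BATCH_SIZE = 150   # batch size suggested in the answer


def make_url(ids):
    """Build the request URL for one batch of SOURCE:EXTERNAL_ID strings."""
    query = urlencode({
        "articleIds": ",".join(ids),
        "section": "Abstract",
        "provider": "Europe PMC",
        "format": "JSON",
    })
    return f"{BASE_URL}?{query}"


def batch_urls(article_ids, max_len=MAX_URL_LENGTH, max_size=MAX_BATCH_SIZE):
    """Greedily pack IDs into batches whose URLs stay within max_len.

    A single ID whose URL alone exceeds max_len still gets its own batch.
    """
    urls, batch = [], []
    for article_id in article_ids:
        candidate = batch + [article_id]
        too_big = len(candidate) > max_size or len(make_url(candidate)) > max_len
        if batch and too_big:
            urls.append(make_url(batch))  # flush the full batch
            batch = [article_id]          # start a new one with this ID
        else:
            batch = candidate
    if batch:
        urls.append(make_url(batch))
    return urls


# Usage with requests and the asker's helper (not run here):
# for url in batch_urls(article_ids):
#     r = requests.get(url).json()
#     df = json_to_dataframe(r)
```

Checking the length before each append keeps every URL within the limit without having to guess how long the encoded IDs will be.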
