I have a pmc.txt file that contains at least 20k PMC IDs, and I think the API only accepts about 1000 requests at a time. I have written code that works for a single ID, but I can't get it to work for the whole file. My main code is below. Please help.
import json
import requests

# json_to_dataframe is my own helper (definition not shown)
if __name__ == '__main__':
    URL = 'https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds'
    article_ids = ['PMC:4771370']
    for article_id in article_ids:
        params = {
            'articleIds': article_id,
            'section': 'Abstract',
            'provider': 'Europe PMC',
            'format': 'JSON'
        }
        json_data = requests.get(URL, params=params).content
        r = json.loads(json_data)
        df = json_to_dataframe(r)
        print(df)
        df.to_csv("data.csv")
CodePudding user response:
You can read the IDs from the file like so:
with open('pmc.txt', 'r') as file:
    # strip() drops the trailing newline; blank lines are skipped
    article_ids = [line.strip() for line in file if line.strip()]
Use that in place of article_ids = ['PMC:4771370'].
Note that data.csv is overwritten on every pass through the loop, so you will either have to save each result under a different file name (which gives you 20,000 files) or append each result to a single DataFrame before you write it to CSV.
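To make the "one CSV instead of 20,000 files" option concrete, here is a rough sketch that sends the IDs in batches and appends each result to the same file. Treat it as a starting point only: the helper names (read_ids, chunked, fetch_batch, append_to_csv, run) are made up for this example, the batch size of 8 is a guess, and it assumes the API accepts several comma-separated IDs in articleIds, so check the API documentation for the real limit. It also assumes the lines in pmc.txt are already in the PMC:4771370 form and that json_to_dataframe returns a pandas DataFrame.

```python
import os


def read_ids(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_batch(url, batch):
    """Request the annotations for one batch of IDs and return the parsed JSON."""
    import requests  # third-party; imported here so the other helpers work without it

    params = {
        'articleIds': ','.join(batch),  # assumption: comma-separated IDs are accepted
        'section': 'Abstract',
        'provider': 'Europe PMC',
        'format': 'JSON',
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def append_to_csv(df, path):
    """Append a DataFrame to `path`, writing the header only if the file is new."""
    df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)


def run(url, ids_path, out_path, to_dataframe, batch_size=8, fetch=fetch_batch):
    """Fetch every batch of IDs and append each converted result to one CSV."""
    for batch in chunked(read_ids(ids_path), batch_size):
        append_to_csv(to_dataframe(fetch(url, batch)), out_path)
```

With the names from the question you would call it as run(URL, 'pmc.txt', 'data.csv', json_to_dataframe). Delete any old data.csv first, because the sketch appends to whatever is already there.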
CodePudding user response:
You can use grequests. Try setting stream=False in grequests.get, or explicitly call response.close() after reading response.content. It's discussed in detail here
Alternatively, you can try requests-futures. grequests is faster, but it relies on monkey patching and brings additional dependency problems. requests-futures is several times slower than grequests, but simply wrapping requests in a ThreadPoolExecutor can be as fast as grequests without the external dependencies. Reference here.
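For the ThreadPoolExecutor route, a minimal sketch using only the standard library could look like this. fetch_all and its parameters are names invented for this example, and max_workers=8 is an arbitrary choice; keep it low enough that you stay within whatever rate limit the API enforces.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(batches, fetch_one, max_workers=8):
    """Call fetch_one(batch) for every batch in a thread pool.

    Results come back in the same order as `batches`. An exception raised
    inside fetch_one is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, batches))
```

Here fetch_one would be a small function of your own that makes the requests.get call from the question for one ID (or one batch of IDs) and returns the parsed JSON. You can then convert and save the results one by one after fetch_all returns.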
