Iterating over paginated data with an unknown number of pages with Python requests from BSE API

Time:01-23

I am trying to fetch data from a URL that uses pagination. I fetch the paginated data using the following payload:

payload = {
    'Pageno': '7',
    'strCat': '-1',
    'strPrevDate': '20220122',
    'strScrip': '',
    'strSearch': 'P',
    'strToDate': '20220122',
    'strType': 'C'}

I do not know how many pages there are. I want to fetch them one by one until I reach a page that doesn't exist. The code to extract the data is as follows:

jsonData = requests.get(url, headers=headers, params=payload).json()

How can I make this request conditional on the page actually existing?

The URL is:

url = 'https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

Answer:

BSE's API returns 200 OK for any positive value of Pageno, even if you've read past the end of the data, so the status code alone won't tell you when to stop. Instead, loop over the pages and break out as soon as the response's "Table" list is empty, which indicates that you've reached the end of the data.

import requests

payload = {
    'Pageno': 1,
    'strCat': '-1',
    'strPrevDate': '20220122',
    'strScrip': '',
    'strSearch': 'P',
    'strToDate': '20220122',
    'strType': 'C'
}

url = 'https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

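data = []
should_fetch_next_page = True
while should_fetch_next_page:
    print(f"Fetching page {payload['Pageno']} ...")
    response = requests.get(url, headers=headers, params=payload, timeout=30)
    response.raise_for_status()  # still fail loudly on genuine HTTP errors
    jsonData = response.json()
    if jsonData["Table"]:
        # Non-empty page: keep its rows and advance to the next page number
        data.extend(jsonData["Table"])
        payload['Pageno'] += 1
    else:
        # Empty "Table" list: we have read past the last page, so stop
        should_fetch_next_page = False

print(data)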
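The same stop-on-empty-page logic can also be wrapped in a reusable generator. This is a sketch under a few assumptions: `iter_pages` and `fetch_page` are hypothetical names, each page's decoded JSON is assumed to be a dict holding its rows under `"Table"` (as in the loop above), and `max_pages` is an arbitrary safety cap so an endpoint that never returns an empty page can't cause an endless loop.

```python
from typing import Any, Callable, Dict, Iterator


def iter_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    key: str = "Table",
    start: int = 1,
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield rows page by page until a page comes back with an empty list.

    fetch_page(page_no) is assumed to return the decoded JSON for that page.
    max_pages is a hypothetical safety cap, not something the API defines.
    """
    for page_no in range(start, start + max_pages):
        # A missing key or a null value is treated the same as an empty list
        rows = fetch_page(page_no).get(key) or []
        if not rows:
            return  # empty page: we have read past the end of the data
        yield from rows
    # Only reached if every page up to the cap was non-empty
    raise RuntimeError(f"No empty page within {max_pages} pages; stopping")


# Hypothetical wiring for the BSE endpoint (not executed here; assumes the
# `payload`, `url` and `headers` from the answer above):
#
# import requests
#
# def fetch_bse_page(page_no):
#     params = dict(payload, Pageno=page_no)
#     resp = requests.get(url, headers=headers, params=params, timeout=30)
#     resp.raise_for_status()
#     return resp.json()
#
# data = list(iter_pages(fetch_bse_page))
```

Keeping the HTTP call behind `fetch_page` makes the stopping logic easy to check with a stub function, and yielding rows lets you process them as they arrive instead of holding every page in memory.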