This is my first time scraping dynamic pagination with Selenium. I want to scrape the following website. The idea is to scrape the table on all 118 pages and store the data as JSON. Getting the first table works and it prints fine, but when I try to click the next button, it throws an exception:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <tr > is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
Here is the part of the code I have tried so far:
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.get("https://merolagani.com/Floorsheet.aspx")
for z in (driver.find_elements(By.XPATH, '//tbody/tr')):
    table_data = z.find_elements_by_tag_name('td')
    for td in table_data:
        print(td.text)
    time.sleep(1)
    z.find_element(By.XPATH, "(//a[@title='Next Page'])[2]").click()
This is my first time scraping dynamic pagination, so any help would be useful. Thank you.
CodePudding user response:
StaleElementReferenceException means that the page's DOM changed while you were still trying to access or interact with a cached WebElement (one stored in a variable), and either:
- the element is no longer present on the page, OR
- a different element is now found by the original element's locator
So, after a new page is loaded, make sure you re-locate all the elements with driver.find_element / driver.find_elements.
In your case, the problem can appear if you build the list of elements, iterate over it, and a new page load happens inside the loop. That invalidates your original list of elements.
Always keep this point in mind.
I see a click call in your script. It may well be what leads to the StaleElementReferenceException, since it can trigger DOM changes.
The message refers to the <tr> element, so make sure you re-locate the rows.
See also https://www.selenium.dev/exceptions/#stale_element_reference
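To make the "re-locate after every page load" advice concrete, here is a minimal sketch, not a tested solution for this site. It reuses the two XPath locators from the question, copies the cell text into plain Python lists before clicking, and locates the rows again on each page. The names read_rows, scrape_pages, page_count and pause are made up for illustration, the fixed sleep is only a crude stand-in for a proper wait, and the locator strategies are written as the plain strings that By.XPATH and By.TAG_NAME stand for.

```python
import time

# These are the string values of By.XPATH and By.TAG_NAME
# (selenium.webdriver.common.by); you can use By.* instead.
XPATH = "xpath"
TAG_NAME = "tag name"

ROWS = "//tbody/tr"                      # row locator from the question
NEXT = "(//a[@title='Next Page'])[2]"    # next-button locator from the question


def read_rows(driver):
    """Locate the rows afresh and copy their cell text into plain lists."""
    rows = []
    for tr in driver.find_elements(XPATH, ROWS):
        rows.append([td.text for td in tr.find_elements(TAG_NAME, "td")])
    return rows


def scrape_pages(driver, page_count, pause=1.0):
    """Read each page completely, then click Next and locate the rows again."""
    all_rows = []
    for page in range(page_count):
        # Only strings are kept, so nothing here can go stale after the click.
        all_rows.extend(read_rows(driver))
        if page < page_count - 1:
            # Locate the button from the driver, not from a cached row.
            driver.find_element(XPATH, NEXT).click()
            time.sleep(pause)  # assumption: a fixed pause is enough for the reload
    return all_rows
```

The point of the design is that no WebElement survives a click: the loop holds on to text only, and every find_elements call runs against the page as it is at that moment.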
CodePudding user response:
A bit of a laggy answer, but I did it this way:
# Read the pager text and take the page count from its last token.
total_length = driver.find_element(By.XPATH, "//span[@id='ctl00_ContentPlaceHolder1_PagerControl2_litRecords']").text
z = int(total_length.split()[-1].replace(']', ''))
for data in range(1, z + 1):
    # Click the link for page `data`, then locate the rows again on the new page.
    driver.find_element(By.XPATH, "(//a[@title='Page {}'])[2]".format(data)).click()
    for value in driver.find_elements(By.XPATH, '//tbody/tr'):
        table_data = value.find_elements(By.TAG_NAME, 'td')
        print([td.text for td in table_data])
    time.sleep(2)
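Since the question asks for JSON rather than printed output, here is a rough sketch of the two pure-Python pieces around the loop above: pulling the page count out of the pager text, and serializing the collected rows. The helper names (parse_total_pages, rows_to_json, save_rows) are hypothetical, and I am assuming the pager text ends with the page count followed by "]", as the split/replace code above implies.

```python
import json
import re


def parse_total_pages(pager_text):
    """Take the last whitespace-separated token and keep only its digits."""
    tokens = pager_text.split()
    digits = re.sub(r"\D", "", tokens[-1]) if tokens else ""
    if not digits:
        raise ValueError("no page count found in {!r}".format(pager_text))
    return int(digits)


def rows_to_json(rows):
    """Serialize rows (lists of cell strings) as a JSON array of arrays."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def save_rows(rows, path):
    """Write the serialized rows to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(rows_to_json(rows))
```

In the loop you would append [td.text for td in table_data] to a list instead of printing it, and after the last page call something like save_rows(all_rows, "floorsheet.json"), where both the list name and the file name are placeholders.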
