I'm trying to scrape a paginated table with Selenium. The website I'm scraping doesn't expose the page number in the URL.
table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
teams = []
page = 0
while page < 10:
    page += 1
    time.sleep(5)
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        pair = teams.append(temp_data)
    next_page = driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()
The teams list only stores data from the first page. When the script moves to the next page, I get this error:
Traceback (most recent call last):
  File "C:\Users\XXX\OneDrive\Documents\A\b\s_pc.py", line 49, in <module>
    temp_data = home[i].text + '\n' + away[i].text
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 76, in text
    return self._execute(Command.GET_ELEMENT_TEXT)['value']
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
    self.error_handler.check_response(response)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=96.0.4664.45)
Stacktrace:
Answer:
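I moved the home and away lookups inside the while loop, so the elements are located again on every page. The references found on the first page are no longer attached to the document once the next page's table is rendered, which is what the StaleElementReferenceException reports. I also moved time.sleep() to the beginning of the while loop. With these changes the code didn't throw any error.
Check whether this works as expected.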
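table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
teams = []
page = 0
while page < 10:
    # Wait for the current page's table to load before locating anything.
    time.sleep(5)
    # Re-locate the cells on every page; references from the previous page are stale.
    home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
    away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
    page += 1
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        teams.append(temp_data)  # append() returns None, so there is nothing to assign
    # Click through to the next page.
    driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()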
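
The same idea can be wrapped in a small helper. This is only a sketch, not a tested scraper for the site in question: the function name scrape_team_pairs and its parameters (home_locator, away_locator, next_locator, max_pages, delay) are made up for illustration, and it assumes the "next" link simply disappears on the last page, which may not hold for this site. It re-locates the cells on every page and reads their text immediately, so no reference is kept across a page change.

```python
import time


def scrape_team_pairs(driver, home_locator, away_locator, next_locator,
                      max_pages=10, delay=5.0):
    """Collect home/away team names from up to max_pages pages of a table.

    Hypothetical helper: each locator is a (strategy, value) tuple that is
    passed straight to driver.find_elements().
    """
    teams = []
    for _ in range(max_pages):
        # Give the table time to render before looking anything up.
        time.sleep(delay)
        # Look the cells up again on every page: references obtained before
        # a page change go stale once the table is re-rendered.
        home = driver.find_elements(*home_locator)
        away = driver.find_elements(*away_locator)
        # Read the text right away, while the references are still fresh.
        teams.extend(f"{h.text}\n{a.text}" for h, a in zip(home, away))
        # find_elements returns an empty list when nothing matches, so a
        # missing "next" link (an assumption about the site) ends the loop.
        next_links = driver.find_elements(*next_locator)
        if not next_links:
            break
        next_links[0].click()
    return teams
```

With Selenium it would be called with the locators from the code above, for example scrape_team_pairs(driver, (By.XPATH, '//tbody/tr/td[5]'), (By.XPATH, '//tbody/tr/td[7]'), (By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span')). Taking the driver and locators as arguments keeps the helper free of any Selenium import, so it can also be tried against a stand-in driver object.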
