I'm trying to load the videos page of a YouTube channel and parse it to extract information about recent videos. I want to avoid the API because it has a daily usage quota. The problem is that Selenium does not seem to load the full HTML of the page when I print driver.page_source:
from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
driver = Chrome(executable_path='chromedriver')
driver.get('https://www.youtube.com/c/Oxylabs/videos')
# Agree to youtube cookie popup
try:
    consent = driver.find_element_by_xpath(
        "//*[contains(text(), 'I agree')]")
    consent.click()
except Exception:
    # No consent popup was shown, so there is nothing to dismiss
    pass
# Parse html
WebDriverWait(driver,100).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="show-more-button"]')))
print(driver.page_source)
I tried WebDriverWait as shown above, but it raises a TimeoutException. Waiting on the following XPath instead (/html, which I assumed marks the end of the webpage) does not time out:
WebDriverWait(driver,100).until(EC.visibility_of_element_located((By.XPATH, '/html')))
That does not give me the full HTML either. I also tried time.sleep(100) instead of WebDriverWait, but the HTML is still incomplete. Any help would be greatly appreciated.
CodePudding user response:
The element you are waiting for is not on the page, which is why the wait times out:
//*[@id="show-more-button"]
Have you tried scrolling to the bottom of the page, or waiting for some other element?
driver.execute_script("arguments[0].scrollIntoView();", element)
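If there is no specific element to scroll to, one option is to scroll the whole window until the page height stops growing. This is only a minimal sketch: the helper name scroll_to_bottom and the pause and max_rounds parameters are made up for illustration, and it assumes the page loads more content as you scroll and that the height can be read from document.documentElement.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing.

    Returns the number of scroll attempts made. `driver` only needs an
    execute_script() method, as a Selenium WebDriver has.
    """
    # Assumption: the page scrolls the root element, so its scrollHeight
    # reflects how much content has been loaded so far.
    get_height = "return document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # the height did not change, so nothing new was loaded
        last_height = new_height
    return rounds
```

After calling scroll_to_bottom(driver), driver.page_source should include whatever was loaded during scrolling. If you keep a WebDriverWait, point it at an element that actually exists on the page; which locator works depends on YouTube's current markup, so check it in the browser's inspector first.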
