Why Selenium and the Firefox WebDriver cannot crawl website tags loaded by AJAX

Time:01-16

I want to get the text of some HTML tags from bonbast, where some elements are loaded by AJAX (for example, the tag with id "ounce_top"). I have tried Selenium with geckodriver, but I still cannot crawl these tags. Also, when the automated Firefox window (geckodriver) opens, these elements are not shown on the web page. I have no idea why this happens. How can I crawl this website?

Code trials:

from selenium import webdriver
from bs4 import BeautifulSoup

url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
driver.get(url_news)
html = driver.page_source
soup = BeautifulSoup(html)
a = driver.find_element_by_id(id_="ounce_top")

CodePudding user response:

To do that with Selenium you need to add a wait. An explicit wait with expected conditions is preferable to a fixed delay.
I assume you are trying to get the text inside that element.
This should work:

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 20)  # give up after 20 seconds
driver.get(url_news)
# Block until the AJAX-loaded element is visible, then read its text
your_gold_value = wait.until(EC.visibility_of_element_located((By.ID, "ounce_top"))).text
# Take the HTML snapshot only after the wait, so it includes the loaded element
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
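
To illustrate what an explicit wait does, here is a minimal sketch in plain Python, without Selenium. It only approximates the idea: the names wait_until, WaitTimeout and fake_lookup are hypothetical and are not part of Selenium's API, and the real WebDriverWait differs in detail. The condition is checked repeatedly until it returns something truthy or the timeout expires.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for the page: the "element text" only appears on the third
# check, the way an AJAX-filled tag appears some time after page load.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "1,817.43" if attempts["count"] >= 3 else None


value = wait_until(fake_lookup, timeout=5.0, poll_interval=0.01)
print(value)
```

Reading the element immediately after driver.get() corresponds to calling fake_lookup() once and getting None. The wait keeps retrying until the content has arrived.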

CodePudding user response:

The desired element is loaded dynamically, so to extract its text (for example 1,817.43) you need an explicit wait: use WebDriverWait with visibility_of_element_located(). Both snippets below first click the cookie-acceptance button. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://bonbast.com/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#ounce_top"))).text)
    
  • Using XPATH:

    driver.get("https://bonbast.com/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@id='ounce_top']"))).text)
    
  • Console Output:

    1,817.43
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
