Selenium get page's inspect element HTML

Time:01-25

My goal is to get a Google search result's title by parsing the HTML shown in the browser's Inspect panel, like this:

[Screenshot: website title]

I tried to find the tag the title is attached to and locate it with find_element(By.XPATH), but this doesn't seem to work.

[Screenshot: location of the wanted data in Inspect]

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

web = "https://www.google.com/search?q=crash"
path = "C:\\Users\\simon\\Downloads\\chromedriver_win32\\chromedriver.exe"

driver = webdriver.Chrome(service=Service(path))  # Selenium 4: pass the driver path via a Service object
driver.get(web)
html = driver.page_source
elem = driver.find_element(By.XPATH, "//div[contains(@class,'yuRUbf')]")
print(elem)

But I just get this output:

<selenium.webdriver.remote.webelement.WebElement (session="29b69f91490fc177d1833f2fc156ec01", element="0289ff09-6e3d-4373-ab09-b265cc3b1206")>

This isn't useful. If I target the next nested class instead, I get this error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[contains(@class,'LC201b MBeuO DKV0Md')]"}

Is this because I am parsing the page source rather than the inspect data?

Answer:

First of all, you are missing a wait. The simplest fix is to add a fixed time.sleep(5), but the better approach is an explicit wait with Expected Conditions.
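To see why an explicit wait beats a fixed sleep, here is a minimal pure-Python sketch of the polling idea. The helper wait_until and the simulated results_loaded check are hypothetical illustrations, not Selenium code. Selenium's WebDriverWait(driver, timeout) works along the same lines: it re-checks its condition every 0.5 seconds by default and raises a TimeoutException if the timeout expires.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper illustrating an explicit wait; not Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met: return at once instead of sleeping a fixed time
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # short pause before checking again


# Simulated "page" whose result title only appears on the third check.
checks = {"n": 0}


def results_loaded():
    checks["n"] += 1
    return "Website title" if checks["n"] >= 3 else None


print(wait_until(results_loaded, timeout=5, poll=0.01))  # Website title
```

The call returns as soon as the condition holds, so on a fast page you wait only as long as needed rather than the full five seconds a time.sleep(5) always costs.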
But even then your code will not print the title, because driver.find_element(By.XPATH, "//div[contains(@class,'yuRUbf')]") returns a WebElement object, not its text. To get the text, read the element's .text property.
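The output in the question is just the object's repr: print(elem) describes the WebElement itself (its session and element IDs), not the page content. The hypothetical FakeWebElement class below only mimics that behaviour so it can run without a browser; it is not Selenium's class.

```python
class FakeWebElement:
    """Hypothetical stand-in for selenium's WebElement, for illustration only."""

    def __init__(self, session: str, element_id: str, text: str) -> None:
        self._session = session
        self._id = element_id
        self._text = text

    def __repr__(self) -> str:
        # print(elem) falls back to this: it describes the object, not its content
        return f'<FakeWebElement (session="{self._session}", element="{self._id}")>'

    @property
    def text(self) -> str:
        # A property, so it is read as elem.text (no parentheses)
        return self._text


elem = FakeWebElement("session-1", "element-1", "Website title")
print(elem)       # <FakeWebElement (session="session-1", element="element-1")>
print(elem.text)  # Website title
```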
Your locator is also not quite right and could be improved.
Anyway, this should work better:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

web = "https://www.google.com/search?q=crash"
path = "C:\\Users\\simon\\Downloads\\chromedriver_win32\\chromedriver.exe"

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(path))
# Explicit wait: poll for up to 20 seconds instead of sleeping a fixed time
wait = WebDriverWait(driver, 20)
driver.get(web)
# Block until the first matching result container is visible
elem = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'yuRUbf')]")))
# .text is the element's rendered text; print(elem) would only show the object
print(elem.text)
driver.quit()
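On the question of page source versus inspect data: driver.page_source is just an HTML string, so once the explicit wait has returned you can also parse it offline. Below is a minimal standard-library sketch. It assumes, as the XPath above does, that each result sits in a div whose class contains yuRUbf, and additionally assumes the title is the text of an h3 inside it; Google's markup changes often, so treat both as assumptions. The sample HTML is made up for the example.

```python
from html.parser import HTMLParser
from typing import List


class TitleParser(HTMLParser):
    """Collect the text of every <h3> nested inside a <div> whose class includes 'yuRUbf'.

    The class name and the <h3> nesting are assumptions about Google's markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._div_depth = 0  # open <div>s since entering a result container (0 = outside)
        self._in_h3 = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._div_depth > 0:
                self._div_depth += 1  # nested div inside a result container
            elif "yuRUbf" in classes:
                self._div_depth = 1  # entering a result container
        elif tag == "h3" and self._div_depth > 0:
            self._in_h3 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth > 0:
            self._div_depth -= 1
        elif tag == "h3" and self._in_h3:
            self.titles.append("".join(self._buf).strip())
            self._in_h3 = False

    def handle_data(self, data):
        if self._in_h3:
            self._buf.append(data)  # includes text from inline tags such as <b>


sample = """
<div class="g"><div class="yuRUbf"><a href="https://example.com/a">
  <h3>First result</h3></a></div></div>
<div class="yuRUbf extra"><div><a href="https://example.com/b"><h3>Second <b>result</b></h3></a></div></div>
<h3>Not a result title</h3>
"""

parser = TitleParser()
parser.feed(sample)
print(parser.titles)  # ['First result', 'Second result']
```

With Selenium you would call parser.feed(driver.page_source) after the wait instead of feeding the sample string.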