Cannot click on SVG element within loop and grab all content

Time:01-05

I'm trying to scrape a job site for its job titles. I would also like to go to the next page, extract the data, and keep going until no more pages are available. However, when I try to click the next-page control, which is an svg tag, I get the following error:

ElementClickInterceptedException: Message: element click intercepted: Element <path d="M5.408.153a.588.588 0 00-.098.755l.059.076L13.566 10 5.37 19.016a.588.588 0 00-.025.761l.065.07c.216.197.54.202.761.026l.07-.066 8.27-9.096c.337-.372.363-.925.077-1.324l-.078-.097L6.24.193a.588.588 0 00-.832-.04z" fill-rule="evenodd"></path> is not clickable at point (1357, 686). Other element would receive the click: <section id="explicit_consent" >...</section>
  (Session info: chrome=96.0.4664.110)
Stacktrace:
0   chromedriver                        0x000000010edfa269 __gxx_personality_v0   582729
1   chromedriver                        0x000000010ed85c33 __gxx_personality_v0   106003
2   chromedriver                        0x000000010e942e28 chromedriver   171560
3   chromedriver                        0x000000010e97f681 chromedriver   419457
4   chromedriver                        0x000000010e97d33e chromedriver   410430
....
....

Here's the script that I'm working with:

from selenium import webdriver
import time
import pandas as pd
from collections import defaultdict
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

url1 = {'Accounting_and_Finance': ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance'],
             'Aeronautical_Engineering': ['https://www.jobsite.co.uk/jobs/Degree-Aeronautical-Engineering']}


driver = webdriver.Chrome()
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 10)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
            data = match.select('article h2[]')
            #test_data['job_title'].append(data.text.strip())
            print(data)
            points = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[name()='path' and contains(@d,'M5.408.153')]")))
            for point in points:
                point.click()
            time.sleep(1)

Update: Whilst using additional code from @Arundeep, I have the following:

driver = webdriver.Chrome()
driver.implicitly_wait(5)
wait = WebDriverWait(driver, 5)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        while True:
            for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
                for m in range(1, 26):
                    data = match.select(f'body > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child({m}) > article:nth-child(1) > div:nth-child(3) > dl:nth-child(5) > span:nth-child(1)')
                    #test_data['job_title'].append(data.text.strip())
                    print(data)
                    wait=WebDriverWait(driver,60)
                    try:
                        wait.until(EC.element_to_be_clickable((By.XPATH,"//a[@data-at='pagination-next'][not(@disabled)]"))).click()
                    except:
                        break

I've had to change the CSS selector to a more general one, because the class name changes on each page. However, I cannot get the text, as this keeps throwing an error. Without grabbing the text, I get the following output:

[<span  fill="#3a434f"><svg viewbox="0 0 16 16"><path d="M15.52 5.06a.48.48 0 01.472.394L16 5.54v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H3.36a.48.48 0 01-.086-.952l.086-.008h11.52a.16.16 0 00.152-.11l.008-.05V5.54a.48.48 0 01.48-.48zm-1.28-1.28a.48.48 0 01.472.394l.008.086v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H2.08a.48.48 0 01-.086-.952l.086-.008H13.6a.16.16 0 00.152-.11l.008-.05V4.26a.48.48 0 01.48-.48zM11.683 2.5c.795 0 1.44.645 1.44 1.44v5.484a1.44 1.44 0 01-1.44 1.44H1.44A1.44 1.44 0 010 9.424V3.94C0 3.145.645 2.5 1.44 2.5zM.96 8.634v.79c0 .265.215.48.48.48l.789-.001L.96 8.634zm8.575-5.175H3.588L.96 6.087v1.189l2.627 2.627h5.949l2.627-2.628V6.088L9.535 3.459zm2.628 5.174l-1.269 1.27h.79a.48.48 0 00.471-.393l.008-.086v-.791zM6.562 4.351a2.33 2.33 0 110 4.662 2.33 2.33 0 010-4.662zm0 .96a1.37 1.37 0 100 2.742 1.37 1.37 0 000-2.742zm-3.438.89a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.53zm7.404 0a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.529zm1.155-2.741l-.79-.001 1.27 1.271v-.79a.48.48 0 00-.394-.472l-.086-.008zM2.23 3.459l-.79.001a.48.48 0 00-.48.48v.789l1.27-1.27z" fill-rule="evenodd"></path></svg></span>]
[]
[]
[]
[]
...
...

It clicks through to the next page, but, as the output shows, I cannot grab the data for each page. Is this an issue with the loop or with the selector?

CodePudding user response:

url1 = {'Accounting_and_Finance': ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance'],
             'Aeronautical_Engineering': ['https://www.jobsite.co.uk/jobs/Degree-Aeronautical-Engineering']}


driver = webdriver.Chrome()
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 10)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
            data = match.select('article h2')  # 'h2[]' is not a valid CSS selector
            #test_data['job_title'].append(data.text.strip())
            print(data)
            points = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[name()='path' and contains(@d,'M5.408.153')]")))
            for point in points:
                # point is already a WebElement; find_element_by_xpath expects an XPath string
                point.click()
            time.sleep(1)

This may work; try it and leave a comment if it does not. Two things to experiment with: the wait condition (presence_of_all_elements_located versus visibility_of_element_located), and clicking the element through JavaScript with "arguments[0].click();".
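The JavaScript click mentioned above could be wrapped in a small fallback helper. This is only a sketch: click_with_js_fallback and its parameters are hypothetical names, driver is anything with Selenium's execute_script method, and intercepted_errors is the tuple of exceptions you want to treat as "click blocked" (for example ElementClickInterceptedException from selenium.common.exceptions). It also assumes the target is an HTML element such as the pagination link; SVG nodes like path do not expose a click() method in the DOM, so the script would fail on them.

```python
def click_with_js_fallback(driver, element, intercepted_errors):
    """Try a normal click first; if it is blocked, click through JavaScript.

    driver             -- object exposing execute_script(script, *args)
    element            -- object exposing click() (an HTML element, not an SVG node)
    intercepted_errors -- tuple of exception classes that mean "the click was blocked"
    Returns "native" or "javascript" so the caller can log which path was taken.
    """
    try:
        element.click()
        return "native"
    except intercepted_errors:
        # The DOM click() call is made directly on the element, so an
        # overlapping element cannot receive the click instead.
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

Closing the overlay first is usually the cleaner fix, since a JavaScript click can also hit elements a real user could not reach.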

CodePudding user response:

# Wait for the consent overlay's accept button and click it to dismiss the overlay
wait = WebDriverWait(driver, 60)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ccmgt_explicit_accept > span"))).click()

You have an overlapping element; please close it first.

<section id="explicit_consent" >...</section>

To find the pagination-next link, I would recommend the following, so you can tell whether it is disabled.

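from selenium.common.exceptions import TimeoutException

while True:
    try:
        # Click "next" as long as the link is clickable and not disabled
        wait.until(EC.element_to_be_clickable((By.XPATH,"//a[@data-at='pagination-next'][not(@disabled)]"))).click()
    except TimeoutException:
        # No enabled "next" link appeared within the timeout: last page reached
        break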

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
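
The pagination loop above only clicks through the pages. In the updated code from the question, the soup is built once before the loop, so every later iteration still parses page 1, which may explain the empty lists. A minimal sketch of re-reading the page on each iteration follows. scrape_all_pages, parse_page and go_to_next_page are hypothetical names, not part of Selenium; driver only needs a page_source attribute.

```python
def scrape_all_pages(driver, parse_page, go_to_next_page, max_pages=200):
    """Collect parse_page(html) results from every page of a paginated listing.

    driver          -- object exposing a page_source attribute (e.g. a Selenium WebDriver)
    parse_page      -- callback: HTML string -> list of extracted items
    go_to_next_page -- callback: driver -> True if it advanced, False on the last page
    max_pages       -- hard cap so a faulty "next" check cannot loop forever
    """
    results = []
    for _ in range(max_pages):
        # Read page_source inside the loop: HTML captured before the first
        # click would still describe page 1 after every later click.
        results.extend(parse_page(driver.page_source))
        if not go_to_next_page(driver):
            break
    return results
```

With the setup from this thread, go_to_next_page could wrap the wait.until(...).click() call, wait for the results container to be visible again, and return False when a TimeoutException is raised. parse_page could build BeautifulSoup(html, 'lxml') and return the stripped text of soup.select('article h2'); that selector is an assumption about the site's markup, so check it against the live page.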