Python Selenium Webscraping: find_elements_by_xpath returning an empty list

Time:01-18

I've taken a few coding subjects at uni and am trying to analyse tennis statistics using Selenium, which is completely new to me.

The page I'm using is here (https://www.atptour.com/en/scores/results-archive?year=2021) and I'm following a guide from this website (https://www.scrapingbee.com/blog/selenium-python/ , https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/). The particular problem I'm having comes from the second guide, under the subtitle "E-commerce product data extraction".

My goal is to loop through the tournaments and extract the links behind each 'Results' button, but my program just gives me an empty list.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


DRIVER_PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not treated as escapes
#driver = webdriver.Chrome(executable_path=DRIVER_PATH)
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
#driver.get("https://www.nintendo.com/")
#print(driver.page_source)
#driver.quit()
# 1 Data Collection
# 1.1 Find Links to All Tournaments
tournaments_2021_url = "https://www.atptour.com/en/scores/results-archive?year=2021"
#tournament_class = "tourney-result"
driver.get(tournaments_2021_url) # print(driver.page_source)
tournaments_2021_url_list = driver.find_elements_by_xpath("//a[@class='button-border']")
print("\n tournament urls \n")
print(tournaments_2021_url_list)
print(len(tournaments_2021_url_list))
driver.quit()
# 1.2 For Each Tournament, Find Links to Each Match
# 1.3 For Each Match, Extract Relevant Statistics

I would expect to get a list of elements, or some other objects from which I could extract the links, but instead I get an empty list of length 0. Thanks for any help.

CodePudding user response:

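To print the href attribute of every Results link, wait for the links to become visible before reading them: use WebDriverWait together with visibility_of_all_elements_located(). A plain find_elements call returns immediately, so it can come back empty if the links have not been rendered yet. You can use any of the following locator strategies: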

  • Using PARTIAL_LINK_TEXT:

    driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.PARTIAL_LINK_TEXT, "Results")))])
    driver.quit()
    
  • Using CSS_SELECTOR:

    driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='results']")))])
    driver.quit()
    
  • Using XPATH:

    driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Results']")))])
    driver.quit()
    
  • Console Output:

    ['https://www.atptour.com/en/scores/archive/delray-beach/499/2021/results', 'https://www.atptour.com/en/scores/archive/antalya/9426/2021/results', 'https://www.atptour.com/en/scores/archive/auckland/301/2021/results', 'https://www.atptour.com/en/scores/archive/melbourne/8998/2021/results', 'https://www.atptour.com/en/scores/archive/melbourne/9428/2021/results', 'https://www.atptour.com/en/scores/archive/pune/891/2021/results', 'https://www.atptour.com/en/scores/archive/atp-cup/8888/2021/results', 'https://www.atptour.com/en/scores/archive/australian-open/580/2021/results', 'https://www.atptour.com/en/scores/archive/new-york/424/2021/results', 'https://www.atptour.com/en/scores/archive/rio-de-janeiro/6932/2021/results', 'https://www.atptour.com/en/scores/archive/singapore/9460/2021/results', 'https://www.atptour.com/en/scores/archive/cordoba/9158/2021/results', 'https://www.atptour.com/en/scores/archive/montpellier/375/2021/results', 'https://www.atptour.com/en/scores/archive/rotterdam/407/2021/results', 'https://www.atptour.com/en/scores/archive/buenos-aires/506/2021/results', 'https://www.atptour.com/en/scores/archive/doha/451/2021/results', 'https://www.atptour.com/en/scores/archive/marseille/496/2021/results', 'https://www.atptour.com/en/scores/archive/santiago/8996/2021/results', 'https://www.atptour.com/en/scores/archive/dubai/495/2021/results', 'https://www.atptour.com/en/scores/archive/acapulco/807/2021/results', 'https://www.atptour.com/en/scores/archive/miami/403/2021/results', 'https://www.atptour.com/en/scores/archive/marrakech/360/2021/results', 'https://www.atptour.com/en/scores/archive/cagliari/9481/2021/results', 'https://www.atptour.com/en/scores/archive/marbella/9462/2021/results', 'https://www.atptour.com/en/scores/archive/houston/717/2021/results', 'https://www.atptour.com/en/scores/archive/monte-carlo/410/2021/results', 'https://www.atptour.com/en/scores/archive/barcelona/425/2021/results', 
'https://www.atptour.com/en/scores/archive/belgrade/5053/2021/results', 'https://www.atptour.com/en/scores/archive/estoril/7290/2021/results', 'https://www.atptour.com/en/scores/archive/munich/308/2021/results', 'https://www.atptour.com/en/scores/archive/madrid/1536/2021/results', 'https://www.atptour.com/en/scores/archive/rome/416/2021/results', 'https://www.atptour.com/en/scores/archive/geneva/322/2021/results', 'https://www.atptour.com/en/scores/archive/lyon/7694/2021/results', 'https://www.atptour.com/en/scores/archive/parma/9510/2021/results', 'https://www.atptour.com/en/scores/archive/belgrade/9512/2021/results', 'https://www.atptour.com/en/scores/archive/roland-garros/520/2021/results', 'https://www.atptour.com/en/scores/archive/s-hertogenbosch/440/2021/results', 'https://www.atptour.com/en/scores/archive/stuttgart/321/2021/results', 'https://www.atptour.com/en/scores/archive/halle/500/2021/results', 'https://www.atptour.com/en/scores/archive/london/311/2021/results', 'https://www.atptour.com/en/scores/archive/mallorca/8994/2021/results', 'https://www.atptour.com/en/scores/archive/eastbourne/741/2021/results', 'https://www.atptour.com/en/scores/archive/wimbledon/540/2021/results', 'https://www.atptour.com/en/scores/archive/hamburg/414/2021/results', 'https://www.atptour.com/en/scores/archive/newport/315/2021/results', 'https://www.atptour.com/en/scores/archive/bastad/316/2021/results', 'https://www.atptour.com/en/scores/archive/los-cabos/7480/2021/results', 'https://www.atptour.com/en/scores/archive/gstaad/314/2021/results', 'https://www.atptour.com/en/scores/archive/umag/439/2021/results', 'https://www.atptour.com/en/scores/archive/tokyo/96/2021/results', 'https://www.atptour.com/en/scores/archive/atlanta/6116/2021/results', 'https://www.atptour.com/en/scores/archive/kitzbuhel/319/2021/results', 'https://www.atptour.com/en/scores/archive/washington/418/2021/results', 'https://www.atptour.com/en/scores/archive/toronto/421/2021/results', 
'https://www.atptour.com/en/scores/archive/cincinnati/422/2021/results', 'https://www.atptour.com/en/scores/archive/winston-salem/6242/2021/results', 'https://www.atptour.com/en/scores/archive/us-open/560/2021/results', 'https://www.atptour.com/en/scores/archive/nur-sultan/9410/2021/results', 'https://www.atptour.com/en/scores/archive/metz/341/2021/results', 'https://www.atptour.com/en/scores/archive/laver-cup/9210/2021/results', 'https://www.atptour.com/en/scores/archive/san-diego/9569/2021/results', 'https://www.atptour.com/en/scores/archive/sofia/7434/2021/results', 'https://www.atptour.com/en/scores/archive/chengdu/7581/2021/results', 'https://www.atptour.com/en/scores/archive/zhuhai/9164/2021/results', 'https://www.atptour.com/en/scores/archive/shanghai/5014/2021/results', 'https://www.atptour.com/en/scores/archive/beijing/747/2021/results', 'https://www.atptour.com/en/scores/archive/tokyo/329/2021/results', 'https://www.atptour.com/en/scores/archive/indian-wells/404/2021/results', 'https://www.atptour.com/en/scores/archive/moscow/438/2021/results', 'https://www.atptour.com/en/scores/archive/antwerp/7485/2021/results', 'https://www.atptour.com/en/scores/archive/vienna/337/2021/results', 'https://www.atptour.com/en/scores/archive/st-petersburg/568/2021/results', 'https://www.atptour.com/en/scores/archive/basel/328/2021/results', 'https://www.atptour.com/en/scores/archive/paris/352/2021/results', 'https://www.atptour.com/en/scores/archive/stockholm/429/2021/results', 'https://www.atptour.com/en/scores/archive/intesa-sanpaolo-next-gen-atp-finals/7696/2021/results', 'https://www.atptour.com/en/scores/archive/nitto-atp-finals/605/2021/results']
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
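Putting the pieces above together, a minimal sketch of a complete script might look like the following. The function names `unique_hrefs`, `collect_result_links` and `main` are hypothetical helpers introduced here, not part of Selenium. De-duplicating the links is an added design choice, not something the snippets above do. The sketch also assumes a recent Selenium 4 release in which `webdriver.Chrome(options=options)` can locate chromedriver on its own; on older setups you would still need to supply the driver path as in the question.

```python
def unique_hrefs(elements):
    """Return the non-empty href values of the elements, in order, without duplicates."""
    seen = set()
    links = []
    for elem in elements:
        href = elem.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


def collect_result_links(driver, url, timeout=20):
    """Open url, wait until the Results links are visible, and return their hrefs."""
    # Selenium imports are local so that unique_hrefs can be used without a browser.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Same CSS selector as in the answer above: anchors whose href ends in "results".
    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='results']"))
    )
    return unique_hrefs(elements)


def main():
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1200")
    # Assumption: chromedriver is discoverable without an explicit path.
    driver = webdriver.Chrome(options=options)
    try:
        url = "https://www.atptour.com/en/scores/results-archive?year=2021"
        for link in collect_result_links(driver, url):
            print(link)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `main()` runs the scrape against the live page. If the wait expires without any matching link becoming visible, `WebDriverWait.until` raises a `TimeoutException`, which is easier to notice than a silently empty list.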
    

CodePudding user response:

I ran your code as posted and it works: it does what it's supposed to. My advice is therefore to step through it in a debugger and confirm that each call behaves as expected. Remove the headless option as well, so you can watch the browser and visually confirm that the page loads. Check that your Chrome version matches the chromedriver you're using (although a mismatch should give you an error message). Finally, if all else fails, try another browser, for example Firefox with the appropriate geckodriver.
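The checks suggested above can be gathered into one small diagnostic helper. This is only a sketch: `page_diagnostics` is a hypothetical name, and the capability keys it reads (`browserVersion` and `chrome` / `chromedriverVersion`) are what Chrome sessions usually report, so the code falls back to "unknown" when they are missing.

```python
def page_diagnostics(driver, xpath):
    """Summarise what the driver currently sees, to help explain an empty result."""
    caps = driver.capabilities
    return {
        "title": driver.title,
        "source_length": len(driver.page_source),
        # "xpath" is the string value of By.XPATH, so no Selenium import is needed here.
        "match_count": len(driver.find_elements("xpath", xpath)),
        "browser_version": caps.get("browserVersion", "unknown"),
        "driver_version": caps.get("chrome", {}).get("chromedriverVersion", "unknown"),
    }
```

For example, calling `page_diagnostics(driver, "//a[@class='button-border']")` right after `driver.get(...)`, once with the headless option and once without, lets you compare the two runs. A very short page source or an unexpected title alongside a match count of 0 would suggest that the headless browser was served a different page, rather than that the XPath is wrong.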
