Why is the Selenium XPath for scraping a table not matching, although the attribute given is unique?

Time:01-05

I am trying to scrape the NASDAQ values from the www.n-tv.de website. I crawl through the pages with Selenium. The stock values are shown in tables on the site.

The source code of one such table looks like this, for example:

<div >
  <table >
    <thead>
      <tr>
        <th>Name</th><th >Kurs</th><th >%</th><th >Absolut</th><th >Relation</th><th >Zeit</th><th >Handelsvolumen</th><th >ISIN</th>
      </tr>
    </thead>
    <tbody>
      
      <tr  onclick="document.location='https://www.n-tv.de/boersenkurse/aktien/activision-blizzard-295693';">
        <td>Activision Blizzard</td>
        <td ><span >66,53$</span></td>
        <td ><span >-1,42%</span></td>
        <td ><span >-0,96</span></td>
        <td ><span >&nbsp;<span><span></span></span><span style="border-width: 24px;"></span></span></td>
        <td >31.12.</td>
        <td >8 Tsd.</td>
        <td >US00507V1098</td>
      </tr>
  
      
      ...
  
    </tbody>
  </table>
</div>

So I do not understand the following problem:

I search for the web element of the NASDAQ table via XPath:

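from selenium.webdriver.common.by import By

# The find_element_by_* helpers were removed in Selenium 4.3; use find_element(By..., ...) instead
nasdaq = driver.find_element(By.XPATH, '//table[@]')
rows_nasdaq = nasdaq.find_elements(By.CLASS_NAME, 'linked')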

I have another solution that works correctly: I search for the tableholder elements (3 on this site), list them, and take only the third one. But I really want to understand why the XPath selector above does not match the element, although the class name I use is unique on this site as an attribute of the table tag.

I do not use CSS selectors or anything similar. Does anyone have an idea why the XPath does not match in this case?

CodePudding user response:

Assuming you want to scrape this URL: https://www.n-tv.de/boersenkurse/suche/?suchbegriff=to le.

You have to wait until the element you are trying to find is present in the DOM. Selenium's explicit waits can be used for this:

nasdaq = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@]')))

This requires the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
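To make the idea of an explicit wait more concrete, here is a minimal, library-free sketch of the polling pattern that such a wait follows conceptually. It is not Selenium's implementation; `wait_until` and `table_present` are hypothetical names invented for this illustration, and the "page" is simulated by a counter.

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)

# Hypothetical stand-in for a page whose table only shows up on the third check.
attempts = {"count": 0}

def table_present():
    attempts["count"] += 1
    return "table element" if attempts["count"] >= 3 else None

found = wait_until(table_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

With Selenium, the condition would be something like EC.presence_of_element_located(...), and the wait raises its own timeout exception instead of TimeoutError.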

Example:

....
driver.get('https://www.n-tv.de/boersenkurse/suche/?suchbegriff=to le')
nasdaq = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@]')))

# find_elements_by_class_name was removed in Selenium 4.3; use find_elements(By.CLASS_NAME, ...)
for i in nasdaq.find_elements(By.CLASS_NAME, 'linked'):
    print(i.get_attribute('onclick'))

Output

document.location='https://www.n-tv.de/boersenkurse/indizes/swx-sp-tra-leis-tr-303397';
document.location='https://www.n-tv.de/boersenkurse/aktien/apollo-tourism- -leisure-1562996';
document.location='https://www.n-tv.de/boersenkurse/aktien/toqublanmonde--eo-047-11904326';
document.location='https://www.n-tv.de/boersenkurse/indizes/cb-p2p-onl-lend---digbanking-12533785';
document.location='https://www.n-tv.de/boersenkurse/indizes/concinngenddivwomin-leader-3254557';
document.location='https://www.n-tv.de/boersenkurse/indizes/concinnity-msos-leaders-39076931';
...
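If only the target URLs are needed, the onclick strings printed above can be reduced to plain links. The following is a small sketch that assumes every onclick value has the form document.location='...'; as seen in the sample rows; `url_from_onclick` is a hypothetical helper name, not part of Selenium.

```python
import re

# Matches the URL inside: document.location='https://...';
ONCLICK_URL = re.compile(r"document\.location\s*=\s*'([^']+)'")

def url_from_onclick(onclick):
    """Return the URL embedded in an onclick handler, or None if there is none."""
    match = ONCLICK_URL.search(onclick or "")
    return match.group(1) if match else None

sample = "document.location='https://www.n-tv.de/boersenkurse/aktien/activision-blizzard-295693';"
print(url_from_onclick(sample))
```

In the loop above, this could be applied to i.get_attribute('onclick'), which may return None for rows without a handler; the helper tolerates that case.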

EDIT

Based on your comment I understood the "link" issue: there is no table at the URL https://www.n-tv.de/, but the NASDAQ is linked at https://www.n-tv.de/boersenkurse/indizes/nasdaq-849974, and there I found your table.

So waiting is not strictly necessary here, but it cannot hurt either. I imported the table directly into a DataFrame with pandas:

import pandas as pd
...
driver.get('https://www.n-tv.de/boersenkurse/indizes/nasdaq-849974')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@]')))

pd.read_html(driver.page_source)[3]

Output

Note: The Relation column is empty because no text is stored in it; you can simply drop it if you like.

Name                    Kurs     %       Absolut  Relation  Zeit   Handelsvolumen  ISIN
Activision Blizzard     67,12$   -0,44%  -30      nan       18:05  4 Mio.          US00507V1098
Adobe                   545,25$  -3,39%  -1912    nan       18:05  2 Mio.          US00724F1012
Advanced Micro Devices  141,89$  -5,55%  -834     nan       18:05  44 Mio.         US0079031078
Airbnb                  167,86$  -2,79%  -481     nan       18:05  2 Mio.          US0090661010
Align Technology        629,44$  -2,87%  -1861    nan       18:02  178 Tsd.        US0162551016
...                     ...      ...     ...      ...       ...    ...             ...
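The values in this table use German number formatting (decimal comma, "Tsd." for thousand, "Mio." for million). The Absolut column shows values such as -30, most likely because pandas treats the comma as a thousands separator by default; pd.read_html accepts decimal and thousands arguments that may help with purely numeric columns. For cells with a unit attached, a small converter is one option. The sketch below assumes the cell formats seen in the sample rows; `parse_german_number` is a hypothetical helper name.

```python
import re

# German magnitude abbreviations as they appear in the Handelsvolumen column.
MULTIPLIERS = {"Tsd.": 1_000, "Mio.": 1_000_000}

def parse_german_number(text):
    """Convert a German-formatted cell such as '66,53$' or '8 Tsd.' to a float."""
    text = text.strip()
    factor = 1
    for suffix, value in MULTIPLIERS.items():
        if text.endswith(suffix):
            factor = value
            text = text[: -len(suffix)]
            break
    # Drop currency signs, percent signs and spaces; keep digits, separators, sign.
    cleaned = re.sub(r"[^0-9,.\-]", "", text)
    # Remove thousands dots, then turn the decimal comma into a decimal point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned) * factor

for cell in ["66,53$", "-1,42%", "-0,96", "8 Tsd.", "4 Mio."]:
    print(cell, "->", parse_german_number(cell))
```

Such a function could be applied per column after loading, for example with Series.map, once the empty Relation column has been dropped.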