I have a table built from `div` elements. Its content is dynamic: what it shows depends on the selected option, and the data itself is generated with JavaScript. The HTML structure looks like this:
```html
<div >
<div >
<div >
<div >
</div></div></div>
<div style="box-shadow:none">
<div style="width:0"></div>
<span >Total common shares outstanding</span></div>
<div ></div>
<div >
<div >
<div >
<div>22.32B</div>
</div></div>
<div >
<div >
<div>21.34B</div>
</div></div>
<div >
<div ><div>20.50B</div>
</div></div>
```
With the Python code below, the result is `Total common shares outstanding22.32B21.34B20.50B19.02B17.77B16.98B16.43B16.33B`. Instead, I would like it in a list or in a DataFrame like this:
```python
['Total common shares outstanding', 22.32, 21.34, 20.50, 19.02, 17.77, 16.98, 16.43, 16.33]
```
This is the Python code I'm using to scrape the data:
```python
from selenium import webdriver
import pandas as pd
import bs4

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

url = 'https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/'
# chromedriver must be on PATH (Selenium 4.6+ can also locate it automatically)
driver = webdriver.Chrome(options=options)
driver.get(url)
html = driver.page_source
driver.quit()

soup = bs4.BeautifulSoup(html, 'html.parser')
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
    # .text joins every nested string, which is why the values run together
    print(title.text + '\n')
```
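The values run together because `Tag.text` concatenates every string inside the tag with nothing in between. A small offline check illustrates the difference between `.text` and `get_text(separator=..., strip=True)`. The one-row markup and the `row` class below are made-up simplifications, not TradingView's actual HTML:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical version of one table row (no whitespace between tags,
# mirroring the concatenated output in the question).
html = (
    '<div class="row">'
    '<div><span>Total common shares outstanding</span></div>'
    '<div><div>22.32B</div></div>'
    '<div><div>21.34B</div></div>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
row = soup.find("div", {"class": "row"})

# .text glues all descendant strings together.
print(row.text)

# get_text() with a separator keeps the cells apart; strip=True drops
# surrounding whitespace and skips whitespace-only strings.
print(row.get_text(separator="|", strip=True))
```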
Is there any way in Selenium or BeautifulSoup to get a list like that?
Answer:
If there is no API (which you should prefer when one exists), one approach is to use BeautifulSoup's `stripped_strings`:
```python
data = []
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
    # stripped_strings yields each non-empty text node, already stripped
    data.append(list(title.stripped_strings))

pd.DataFrame(data)
```
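To see what `stripped_strings` yields without launching a browser, here is a sketch against a hand-written fragment modelled on the HTML in the question. The class name is the one used in this answer; the rest of the markup is simplified and assumed:

```python
from bs4 import BeautifulSoup

# Two rows loosely modelled on the question's HTML (simplified, assumed markup).
html = """
<div class="container-jKD0Exn-">
  <div style="box-shadow:none"><div style="width:0"></div>
    <span>Total common shares outstanding</span></div>
  <div></div>
  <div><div><div>22.32B</div></div></div>
  <div><div><div>21.34B</div></div></div>
</div>
<div class="container-jKD0Exn-">
  <div style="box-shadow:none"><div style="width:0"></div>
    <span>Float shares outstanding</span></div>
  <div></div>
  <div><div><div>22.29B</div></div></div>
  <div><div><div>21.32B</div></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per row container: the label first, then its values.
data = [list(div.stripped_strings)
        for div in soup.find_all("div", {"class": "container-jKD0Exn-"})]
print(data)
```

Whitespace-only strings, such as the newlines between tags, are skipped, so each row comes out as the label followed by its values.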
Output DataFrame:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Key stats | | | | | | | | |
| Total common shares outstanding | 22.32B | 21.34B | 20.50B | 19.02B | 17.77B | 16.98B | 16.43B | 16.33B |
| Float shares outstanding | 22.29B | 21.32B | 20.48B | 18.99B | 17.75B | 16.96B | 16.41B | 16.32B |
| Number of employees | 110.00K | 116.00K | 123.00K | 132.00K | 137.00K | 147.00K | 154.00K | — |
| Number of shareholders | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | — |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
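The table above still holds strings, with a section header row (`Key stats`) and `—` for missing values. A sketch of one way to index by metric name and convert the abbreviated values to numbers with pandas, assuming only the suffixes K, M, B and T occur (`to_number` and `multipliers` are illustrative names, not part of any library):

```python
import pandas as pd

# A few rows copied from the output table; "—" marks a missing value.
data = [
    ["Key stats"],
    ["Total common shares outstanding", "22.32B", "21.34B", "20.50B"],
    ["Number of employees", "110.00K", "116.00K", "—"],
]

# Ragged rows are padded with None; use the label column as the index.
df = pd.DataFrame(data).set_index(0)
df = df.dropna(how="all")  # drops header rows such as "Key stats"

# Assumed suffix meanings; "" covers values with no suffix.
multipliers = {"": 1, "K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}

def to_number(col):
    # Split "22.32B" into "22.32" and "B"; non-matching cells ("—") become NaN.
    parts = col.str.extract(r"^([\d.]+)([KMBT]?)$")
    return parts[0].astype(float) * parts[1].map(multipliers)

numeric = df.apply(to_number)
print(numeric)
```

To get the unscaled figures shown in the question (22.32 rather than 22.32e9), return `parts[0].astype(float)` without the multiplication.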
Answer:
To print the desired texts with Selenium, first click the cookie-consent "Accept" button, then use WebDriverWait to wait until `visibility_of_all_elements_located()` is satisfied, with the following locator strategy:

Using XPath:
```python
driver.get("https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/")
# Dismiss the cookie-consent banner first
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Accept']"))).click()
# Collect every value cell that follows the row label
df = pd.DataFrame(
    [my_elem.text.replace('\u202a', ' ').replace('\u202c', ' ')
     for my_elem in WebDriverWait(driver, 20).until(
         EC.visibility_of_all_elements_located((By.XPATH, "//span[text()='Total common shares outstanding']//following::div[2]//div[starts-with(@class, 'wrap')]/div")))],
    columns=['Total common shares outstanding'])
print(df)
driver.quit()
```

Console output:

```
  Total common shares outstanding
0                          22.32B
1                          21.34B
2                          20.50B
3                          19.02B
4                          17.77B
5                          16.98B
6                          16.43B
7                          16.33B
```

Note: you have to add the following imports (in addition to `import pandas as pd` and a `driver` created as in the question):

```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
```
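The `replace('\u202a', ' ')` and `replace('\u202c', ' ')` calls deal with invisible Unicode directional-formatting characters around each value. A stdlib sketch that names those characters and deletes them outright instead of turning them into spaces; the `raw` list is a hypothetical sample of what `element.text` might return, not captured output:

```python
import unicodedata

# Hypothetical raw values, wrapped in the marks the answer replaces.
raw = ["\u202a22.32B\u202c", "\u202a21.34B\u202c"]

# Show which characters these are.
for ch in "\u202a\u202c":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Map both code points to None so str.translate deletes them.
DIRECTIONAL_MARKS = dict.fromkeys(map(ord, "\u202a\u202c"))
clean = [s.translate(DIRECTIONAL_MARKS) for s in raw]
print(clean)
```

Deleting rather than replacing means no extra `.strip()` is needed before converting the values.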
