I have a website with property details that I need to scrape using Python 3. I tried to scrape those details, but the essential information only appears after clicking a toggle. Please see the code and image below to understand the problem.
<section _ngcontent-serverapp-c62="" >
<h2 _ngcontent-serverapp-c62="">Details</h2>
<dl _ngcontent-serverapp-c62="">
<dt _ngcontent-serverapp-c62="">Objektart</dt>
<dd _ngcontent-serverapp-c62="">Dachgeschosswohnung</dd>
<!---->
<dt _ngcontent-serverapp-c62="">Lage</dt>
<dd _ngcontent-serverapp-c62="">gute Infrastruktur</dd>
<!---->
<dt _ngcontent-serverapp-c62="">Vertragsart</dt>
<dd _ngcontent-serverapp-c62="">Kauf</dd>
<!---->
<dt _ngcontent-serverapp-c62="">Kaufpreis</dt>
<dd _ngcontent-serverapp-c62="">439.000</dd>
<!---->
<dt _ngcontent-serverapp-c62="">Betriebskosten pro Monat</dt>
<dd _ngcontent-serverapp-c62="">172</dd>
<!---->
<!---->
</dl>
<div _ngcontent-serverapp-c62="" > mehr anzeigen <i _ngcontent-serverapp-c62=""
></i>
<!---->
<!---->
<!---->
</div>
<!---->
</section>
I tried to perform the click using Selenium WebDriver:
WebDriver = None
WebDriver = GetWebDriver.detail_page()
WebDriver.get("url_of_website")
sectionData = WebDriver.find_element_by_xpath('//section[@]/div[@]').click()
But it throws an error saying the element is not clickable, although clicking it with jQuery in the browser console works. The click appends new elements to the page, so how can I scrape that data? If anyone has an idea, please help.
For a better understanding, please see the video below: https://www.awesomescreenshot.com/video/7053773?key=6c1c016ea3b7fcaa63b50fb4374690ef
CodePudding user response:
You can find the data you are looking for in JSON that is embedded in a script tag within the HTML, so no click is needed. The code below shows how to extract it; the JSON key for the property you want is derived from the property's URL:
import requests
from bs4 import BeautifulSoup
import json
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'paste_the_url_here' #I figured out what site you are scraping ;)
key_value = 'G./api/article' + url.split('at')[-1] + '?'  # key of this property's entry in the JSON state
resp = requests.get(url,headers=headers)
soup = BeautifulSoup(resp.text,'html.parser')
dirty = soup.find('script',{'id':'serverApp-state'}).text.replace('&q;','"')
clean = json.loads(dirty)
print(clean[key_value]['body'])
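The decoding step above only replaces `&q;`. Assuming the page is rendered with Angular Universal (which the `serverApp-state` id and `_ngcontent-serverapp` attributes suggest, but which is not confirmed here), the embedded state may also contain the other escape sequences Angular's TransferState uses. A minimal sketch that unescapes all of them and looks up the entry by a key fragment instead of an exact key could look like this; the helper names and the sample string are hypothetical and only for illustration:

```python
import json
import re

# Escape sequences assumed to follow Angular's TransferState convention.
_UNESCAPE = {"&a;": "&", "&q;": '"', "&s;": "'", "&l;": "<", "&g;": ">"}


def decode_transfer_state(raw):
    """Unescape the text of the serverApp-state script tag and parse it as JSON."""
    # A single regex pass avoids unescaping the same characters twice.
    text = re.sub(r"&[aqslg];", lambda m: _UNESCAPE[m.group(0)], raw)
    return json.loads(text)


def find_body(state, key_fragment):
    """Return the 'body' of the first entry whose key contains key_fragment."""
    for key, value in state.items():
        if key_fragment in key and isinstance(value, dict) and "body" in value:
            return value["body"]
    return None


# Hypothetical sample standing in for the real script tag contents.
sample = (
    "{&q;G./api/article/123?&q;:{&q;body&q;:"
    "{&q;Objektart&q;:&q;Dachgeschosswohnung&q;,&q;note&q;:&q;A &a; B&q;}}}"
)
state = decode_transfer_state(sample)
body = find_body(state, "/api/article")
print(body)
```

With the real page, `raw` would be the `.text` of the script tag found in the answer's code, and `find_body` may be more forgiving than building the exact key when the URL format varies.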
