I searched a lot on the internet. I couldn't find an example similar to the one below. I'm trying to pull text from a web page. There is no location line in the first p tag. The second location section has a location line. When pulling data, I can only pull the contents of the p tag, which is the location row. I cannot pull the contents of the other p tag. I wonder how can I pull the data inside the first and second p tag?
HTML codes of Page Source:
<div >
<p>
<i class='fa fa-home main-color'></i> ORHAN MAH.İBRAHİM CAD. NO:35
<br>
<i class='fa fa-phone main-color'></i>
<a href="tel:0508-2920344">0508-2920344 </a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span >19.01.2022</span>
</p>
<p>
<i class='fa fa-home main-color'></i> HAZAN MAH.ÖKTEM CAD. NO:13/B
<br>
<i class='fa fa-phone main-color'></i>
<a href="tel:0584 837 23 70">0584 837 23 70 </a>
<br>
<i ></i>
<a href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span >20.01.2022</span>
</p>
</div>
Here is the selenium code I used to pull the data from the HTML source above:
item = browser.find_elements_by_class_name("col-md-10")
urls = browser.find_elements_by_xpath("//div[@class=' col-md-10']/p/a[2]")
for i in zip(item,urls):
try:
address = i[0].find_element_by_css_selector("p").text.split("\n")[:2]
except:
address = None
try:
phone = i[0].find_element_by_xpath("//a[@class='gri'][1]").text
except:
phone = None
print(address)
print(phone)
try:
url = i[1].get_attribute('href').replace("https://www.google.com/maps?q=","")
except:
url = None
try:
date = i[0].find_element_by_xpath("//span[@class='red'][1]").text
except:
date = None
print(url)
print(date)
CodePudding user response:
Use xpath //div[@class=' col-md-8']/p. This will return data of both p tags.
Then you can perform string operations as per your requirement and use data of each p tag using for loop
CodePudding user response:
After long research, I found the solution to the problem, friends. It is necessary to use zip_longest from the itertools module.
