data scraping by selenium p tag

Time:01-24

I searched extensively online but couldn't find an example like the one below. I'm trying to extract text from a web page. The first p tag has no location (map link) line; the second p tag does. When I scrape the page, I only get the contents of the p tag that has the location line, and not the contents of the other one. How can I get the data from both the first and the second p tag?

HTML from the page source:

<div >
    <p>                                                                       
    <i class='fa fa-home main-color'></i> ORHAN MAH.İBRAHİM CAD. NO:35  
    <br>
    <i class='fa fa-phone main-color'></i> 
    <a  href="tel:0508-2920344">0508-2920344 </a>
    <br /> 
    <i class='fa fa-clock-o main-color'></i> 
    <span >19.01.2022</span>     
    </p>
    <p>
       <i class='fa fa-home main-color'></i> HAZAN MAH.ÖKTEM CAD. NO:13/B                                           
    <br>
    <i class='fa fa-phone main-color'></i> 
    <a  href="tel:0584 837 23 70">0584 837 23 70 </a>
    <br>
    <i ></i> 
    <a  href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
    <br /> 
    <i class='fa fa-clock-o main-color'></i> 
    <span >20.01.2022</span> 
    </p>
</div>

Here is the Selenium code I used to pull the data from the HTML source above:

item = browser.find_elements_by_class_name("col-md-10")
urls = browser.find_elements_by_xpath("//div[@class=' col-md-10']/p/a[2]")
for i in zip(item,urls):
    try:            
        address = i[0].find_element_by_css_selector("p").text.split("\n")[:2]
    except:
        address = None
    try:            
        phone = i[0].find_element_by_xpath("//a[@class='gri'][1]").text
    except:
        phone = None
    print(address)
    print(phone)
    try:
        url = i[1].get_attribute('href').replace("https://www.google.com/maps?q=","")
    except:
        url = None
    try:            
        date = i[0].find_element_by_xpath("//span[@class='red'][1]").text
    except:
        date = None
    print(url)
    print(date)

CodePudding user response:

Use the XPath //div[@class=' col-md-8']/p. It returns both p tags. You can then loop over the results and apply whatever string operations you need to the data of each p tag.
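A minimal sketch of the per-paragraph string handling suggested above. The helper name parse_paragraph, the record keys, and the sample strings are all hypothetical. The sample strings only approximate what each p tag's visible text might look like, so the real page may need different splitting rules.

```python
import re

MAPS_PREFIX = "https://www.google.com/maps?q="
DATE_RE = re.compile(r"\d{2}\.\d{2}\.\d{4}")  # matches dates such as 19.01.2022


def parse_paragraph(text, links):
    """Build one record from a single <p> block.

    text  -- visible text of the paragraph (for example p.text in Selenium)
    links -- (link_text, href) pairs for the <a> tags inside that paragraph
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {
        "address": lines[0] if lines else None,  # assumes the address comes first
        "phone": None,
        "coords": None,
        "date": None,
    }
    for link_text, href in links:
        if href.startswith("tel:"):
            record["phone"] = link_text.strip()
        elif href.startswith(MAPS_PREFIX):
            record["coords"] = href[len(MAPS_PREFIX):]
    match = DATE_RE.search(text)
    if match:
        record["date"] = match.group(0)
    return record


# Hypothetical sample input modelled on the two paragraphs in the question.
first = parse_paragraph(
    "ORHAN MAH.İBRAHİM CAD. NO:35\n0508-2920344\n19.01.2022",
    [("0508-2920344 ", "tel:0508-2920344")],
)
second = parse_paragraph(
    "HAZAN MAH.ÖKTEM CAD. NO:13/B\n0584 837 23 70\nHaritada\n20.01.2022",
    [
        ("0584 837 23 70 ", "tel:0584 837 23 70"),
        ("Haritada", "https://www.google.com/maps?q=35.554433,25.887766"),
    ],
)
print(first)
print(second)
```

With Selenium 4, the inputs could come from looping over browser.find_elements(By.XPATH, ...) with the XPath above, then passing p.text and the (a.text, a.get_attribute("href")) pairs from p.find_elements(By.TAG_NAME, "a"). Because each record is built from a single p tag, a paragraph without a map link simply gets coords set to None instead of being skipped.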

CodePudding user response:

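After a lot of research I found the solution to the problem, friends: use zip_longest from the itertools module instead of zip. zip stops as soon as its shortest input runs out, so when only one p tag has a map link the loop runs only once. zip_longest continues to the end of the longest input and fills the missing values with None.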
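The difference can be seen in a small sketch that uses plain lists as stand-ins for the Selenium results. The names paragraphs and map_links and their contents are hypothetical placeholders, not values from the real page.

```python
from itertools import zip_longest

# Stand-ins for the scraped elements: two paragraphs, but only one map link.
paragraphs = ["first p", "second p"]
map_links = ["https://www.google.com/maps?q=35.554433,25.887766"]

# zip stops at the shorter list, so the second paragraph is silently dropped.
short = list(zip(paragraphs, map_links))

# zip_longest runs to the end of the longer list and pads with None.
full = list(zip_longest(paragraphs, map_links))

print(len(short), len(full))
for paragraph, link in full:
    print(paragraph, link)
```

Note that zip_longest still pairs items by position. In this sketch the only map link is attached to the first paragraph, even though in the HTML above it belongs to the second one, so looking up the link inside each p tag may be the safer choice when the pairing matters.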
