I've scraped the title and website link, but I can't extract the phone number and address. How can I get them?
Script:
import re
import requests
from bs4 import BeautifulSoup
url='https://www.constructionplacements.com/top-construction-companies-in-india/'
req=requests.get(url)
soup =BeautifulSoup(req.content,'lxml')
# Match only the numbered company headings, e.g. "1. Company name"
for h4 in soup.find_all(lambda tag: tag.name=='h4' and re.search(r'^\d+\.',tag.text)):
    title=h4.text
    website=h4.find_next('a')['href']
CodePudding user response:
You might want to try this:
Note: Not all companies have a phone number.
import requests
from bs4 import BeautifulSoup
def extractor(search_for: str) -> list:
    return [
        p.getText() for p in soup if p.getText(strip=True).startswith(search_for)
    ]
url = 'https://www.constructionplacements.com/top-construction-companies-in-india/'
# Note: `soup` holds the list of <p> tags inside ".post", not the whole parsed document
soup = BeautifulSoup(requests.get(url).text, "lxml").select(".post p")
phone_numbers = extractor("Phone")
addresses = extractor("Address")
print(len(phone_numbers), len(addresses))
Output:
62 70
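Because the two counts differ (62 vs 70), the lists can't simply be zipped together company by company. Here is a minimal sketch of one way to keep each phone number and address attached to its company. It assumes you have already collected the heading and paragraph texts in page order, for example with something like `[el.get_text(strip=True) for el in soup.select(".post h4, .post p")]` on a full `BeautifulSoup` object. The `group_by_company` helper and the sample data are made up for illustration and are not taken from the page.

```python
import re

# Numbered company heading, e.g. "1. Example Builders Ltd"
HEADING = re.compile(r"^\d+\.")


def group_by_company(lines):
    """Attach 'Phone'/'Address' lines to the numbered heading that precedes them."""
    companies = []
    for line in lines:
        text = line.strip()
        if HEADING.match(text):
            # A new company starts here; fields stay None until found
            companies.append({"title": text, "phone": None, "address": None})
        elif companies:
            for label in ("Phone", "Address"):
                key = label.lower()
                if text.startswith(label) and companies[-1][key] is None:
                    # Drop the label and any separator such as ":" or "-"
                    companies[-1][key] = text[len(label):].lstrip(" :-")
    return companies


# Hypothetical sample input (placeholder values, not real page data)
sample = [
    "1. Example Builders Ltd",
    "Address: 12 Sample Road, Mumbai",
    "Phone: 000-0000000",
    "2. Demo Infra Pvt Ltd",
    "Address: 34 Placeholder Street, Delhi",
]

for company in group_by_company(sample):
    print(company)
```

Companies without a phone number keep `phone` as `None`, so every record stays aligned with its title.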
