I am a beginner with Python and am using BeautifulSoup to extract links from the following webpage: https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital. My current code is as follows:
import urllib.request
from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))
The output includes partial links such as "/providers", which should be "https://mhealthfairview.org/providers". Is there a way to extract the full link rather than the partial one? Thank you.
CodePudding user response:
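import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"
html_page = urllib.request.urlopen(url)
soup = BeautifulSoup(html_page, "html.parser")
# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    # urljoin resolves each href against the page URL
    print(urljoin(url, link.get('href')))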
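To see what urljoin does with different kinds of href, here is a small sketch that runs without fetching the page. The values in samples are made-up examples, not links taken from the actual site.

```python
from urllib.parse import urljoin

base = "https://mhealthfairview.org/locations/m-health-fairview-st-johns-hospital"

# Hypothetical href values of the kinds a page might contain
samples = ["/providers", "https://example.com/page", "#main", "visit"]

# Resolve every href against the page URL
resolved = [urljoin(base, href) for href in samples]
for href, full in zip(samples, resolved):
    print(f"{href!r} -> {full}")
```

Root-relative paths get the scheme and host of the base, absolute URLs come back unchanged, and a bare name like "visit" replaces the last path segment of the base.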
CodePudding user response:
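You can simply use an if to prepend the site root whenever the href starts with a slash.
webroot = 'https://mhealthfairview.org'
for link in soup.find_all('a'):
    href = link.get('href')
    # Guard against missing or empty href values before checking the prefix
    if href and href.startswith("/"):
        print(webroot + href)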
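One thing to watch with the prefix check is that it only covers hrefs starting with a single slash. The sketch below compares it with urljoin on a few made-up href values; prefix_root is a hypothetical helper name used only for this illustration.

```python
from urllib.parse import urljoin

webroot = "https://mhealthfairview.org"

def prefix_root(href):
    # Same idea as the check above: prepend the root to hrefs starting with "/"
    if href and href.startswith("/"):
        return webroot + href
    return href

# Hypothetical href values: root-relative, protocol-relative, plain relative
for href in ["/providers", "//cdn.example.com/lib.js", "contact"]:
    print(repr(href), "|", prefix_root(href), "|", urljoin(webroot + "/", href))
```

Both approaches agree on "/providers", but the prefix check glues the site root onto the protocol-relative "//cdn.example.com/lib.js" and leaves "contact" unresolved, while urljoin handles both. If the page only contains root-relative links, the simple if may be enough.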
