I am working with a Flask API and trying to retrieve, for example, the page description from a meta tag. I can retrieve the full tag, but I only want the value of its content attribute.
Html_Content_of_page = requests.get(website).text
soup = BeautifulSoup(Html_Content_of_page, "html5lib")
print(soup.find("meta", property="og:title"))
For example, when website = 'https://nytimes.com', this code prints to the terminal/console:
<meta content="The New York Times - Breaking News, US News, World News and Videos" data-rh="true" property="og:title"/>
Instead, I want only:
"The New York Times - Breaking News, US News, World News and Videos"
without the actual tag.
CodePudding user response:
soup.find("meta", property="og:title") returns the first element with the given tag name and property attribute. You can use element["attribute_name"] to extract the value of the attribute named "attribute_name". Try:
from bs4 import BeautifulSoup
import requests
Html_Content_of_page = requests.get("https://nytimes.com").text
soup = BeautifulSoup(Html_Content_of_page, "html5lib")
print(soup.find("meta", property="og:title")["content"])
This outputs:
The New York Times - Breaking News, US News, World News and Videos
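element["attribute_name"] raises KeyError if the attribute does not exist on the element. So you can use soup.find("meta", property="og:title").get("content") instead, which returns None if the attribute is missing. Note that soup.find() itself returns None when no matching tag is found, in which case both forms raise an error (TypeError for indexing, AttributeError for .get), so check the result of find() before reading the attribute.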
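As a rough sketch of the safe lookup described above, the helper below handles both a missing tag and a missing attribute. The function name get_meta_content and the sample_html string are made up for illustration and are not part of the original code. It uses the built-in "html.parser" so it runs without a network request or html5lib; passing "html5lib" as in the question should behave the same way for this case.

```python
from bs4 import BeautifulSoup


def get_meta_content(html, prop):
    """Return the content attribute of <meta property=prop>, or None.

    Hypothetical helper: returns None if the tag is not found or if
    the tag has no content attribute, instead of raising.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", property=prop)
    if tag is None:
        # No matching <meta> tag in the document.
        return None
    # .get() returns None when the attribute is absent.
    return tag.get("content")


# Made-up sample markup standing in for requests.get(website).text
sample_html = (
    '<html><head>'
    '<meta content="Example Title" property="og:title"/>'
    '</head></html>'
)

print(get_meta_content(sample_html, "og:title"))        # Example Title
print(get_meta_content(sample_html, "og:description"))  # None
```

In the Flask route you would pass requests.get(website).text as the html argument and decide what to return to the client when the result is None.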
