With BeautifulSoup How to get content from Meta tag

Time:02-06

I am working with a Flask API and trying to retrieve, for example, the page description from a meta tag. I can retrieve the full tag, but I only want the value of its content attribute.

Html_Content_of_page = requests.get(website).text
soup = BeautifulSoup(Html_Content_of_page, "html5lib")
print(soup.find("meta", property="og:title"))

e.g. when website = 'https://nytimes.com'

this code prints to the terminal/console:

<meta content="The New York Times - Breaking News, US News, World News and Videos" data-rh="true" property="og:title"/>

Instead, I want only: "The New York Times - Breaking News, US News, World News and Videos"

without the actual tag.

CodePudding user response:

soup.find("meta", property="og:title") returns the first element that matches the given tag name and property attribute. You can then use element["attr_name"] to extract the value of the attribute "attr_name". Try -

from bs4 import BeautifulSoup
import requests
Html_Content_of_page = requests.get("https://nytimes.com").text
soup = BeautifulSoup(Html_Content_of_page, "html5lib")
print(soup.find("meta", property="og:title")["content"])

outputs -

The New York Times - Breaking News, US News, World News and Videos

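Subscript access such as element["content"] raises KeyError if the attribute does not exist on the element. You can use soup.find("meta", property="og:title").get("content") instead, which returns None if the attribute is missing. Note that find() itself returns None when no matching tag exists, so calling .get() on its result directly would raise AttributeError in that case.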
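
To make both failure cases concrete (no matching tag, and a tag without a content attribute), here is a minimal sketch. The helper name meta_content and the inline sample_html are made up for illustration and are not part of the original question. It uses the built-in "html.parser" instead of "html5lib" only so the example runs without an extra install or a network request.

```python
from bs4 import BeautifulSoup


def meta_content(html, prop, default=None):
    """Return the content attribute of <meta property=prop>, or default.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", property=prop)
    if tag is None:
        # No matching <meta> tag in the document at all.
        return default
    # Tag.get() returns the default instead of raising KeyError
    # when the attribute is absent.
    return tag.get("content", default)


# Made-up sample page: one complete tag, one tag missing "content".
sample_html = """
<html><head>
<meta content="Example Title" property="og:title"/>
<meta property="og:image"/>
</head><body></body></html>
"""

title = meta_content(sample_html, "og:title")
missing_tag = meta_content(sample_html, "og:description")
missing_attr = meta_content(sample_html, "og:image")
print(title, missing_tag, missing_attr)
```

With a real page you would pass requests.get(website).text as the html argument, as in the question.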
