Using BeautifulSoup4, find every time text starts with a certain symbol in a website

Time:01-23

I am trying to scrape the price of an item from a website using Python.

import requests
from bs4 import BeautifulSoup

URL = "https://..."
result = requests.get(URL)

doc = BeautifulSoup(result.text, "html.parser")
prices = doc.find_all(???)
print(prices)

In place of the question marks I know I can write the full string to look for, but I want it to find every piece of text that starts with "$".

Is this possible, and if so, how?

Answer:

Use a regular expression to match the tags whose text starts with a certain character, as below:

import re
from bs4 import BeautifulSoup

html = """
<p>$Show me</p>
<p>I am invisible</p>
<p>me too</p>
<p>$Show me too</p>
"""

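soup = BeautifulSoup(html, 'html.parser')
# string= accepts a compiled regex; "^" anchors the match to the start of the tag's text
result = soup.find_all("p", string=re.compile(r"^\$"))
print(result)
# -> [<p>$Show me</p>, <p>$Show me too</p>]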
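If you don't know which tag holds the price (as with the find_all(???) call in the question), one possible approach is to leave out the tag name: find_all(string=...) then returns the matching text nodes from any tag, and each node's .parent is the tag that contains it. This is a minimal sketch; the sample HTML is made up, and the \s* in the pattern is an added assumption that a price may be preceded by whitespace.

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup: prices sit in different tags.
html = """
<div><span>$19.99</span> per item</div>
<p>  $5 shipping</p>
<p>No price here</p>
<li>Price: $3</li>
"""

soup = BeautifulSoup(html, "html.parser")

# With no tag name, find_all(string=...) returns matching text nodes
# (NavigableString objects) from any tag. "\s*" allows leading whitespace.
matches = soup.find_all(string=re.compile(r"^\s*\$"))

# .parent is the tag that directly contains each text node.
found = [(text.parent.name, text.strip()) for text in matches]
print(found)
# -> [('span', '$19.99'), ('p', '$5 shipping')]
```

Removing the "^\s*" part of the pattern would also match text that merely contains a "$" somewhere, such as "Price: $3".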

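Note that I put a backslash (\) before the $ because the dollar sign is itself a special character in regular expressions: on its own it matches the end of the text, so "\$" is needed to match a literal dollar sign. Writing the pattern as a raw string (r"^\$") keeps Python from treating the backslash as a string escape. See the regular expression syntax documentation for more information.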
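To see why the escape matters, this short sketch compares the escaped pattern with an unescaped one using only Python's standard re module; the sample strings are made up for illustration.

```python
import re

# Escaped: "\$" is a literal dollar sign; "^" anchors it to the start of the text.
escaped = re.compile(r"^\$")
# Unescaped: "$" is the end-of-text anchor, so "^$" matches empty text,
# not a price such as "$19.99".
unescaped = re.compile(r"^$")

samples = ["$19.99", "19.99", "Total: $5", ""]
results = {s: (bool(escaped.search(s)), bool(unescaped.search(s))) for s in samples}

for s, (esc, unesc) in results.items():
    print(f"{s!r:12} escaped={esc} unescaped={unesc}")
```

Only "$19.99" matches the escaped pattern, while the unescaped one matches nothing but the empty string in this sample.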
