Python BeautifulSoup extract Class Text only if it contains specific text

Time:01-23

Is there a way to extract the element below only if its whole text equals "New"?

 <li class="ClassifiedDetail">New</li>

I tried:

doc.find('li', class_ = 'ClassifiedDetail').attrs['New']

Maybe something like: if the element's text equals "New" or contains "New", take it?

CodePudding user response:

Note: It is not clear whether you mean the class or the tag, so I assume you mean the text of a tag.

One approach is to use CSS selectors with the :-soup-contains() pseudo-class, which matches elements whose text contains the given substring:

soup.select('li.ClassifiedDetail:-soup-contains("New")')

An alternative is to use string=re.compile(), because string (or text in older versions) given as a plain string only matches when it equals the tag's full string exactly:

soup.find_all('li', class_='ClassifiedDetail', string=re.compile('New'))
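To make the difference between an exact match and a regex match concrete, here is a minimal sketch. The sample HTML and the variable names (exact, partial) are assumptions for illustration and are not from the question:

```python
import re
from bs4 import BeautifulSoup

# Sample markup assumed for illustration; the class name comes from the question.
html = '''
<li class="ClassifiedDetail">New</li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">Old</li>
<li class="ClassifiedDetail">knew</li>
'''
soup = BeautifulSoup(html, 'html.parser')

# A plain string only matches tags whose full string equals "New".
exact = soup.find_all('li', class_='ClassifiedDetail', string='New')

# A compiled regex is applied as a search, so it also matches "New York".
# The match is case-sensitive, so "knew" is not included.
partial = soup.find_all('li', class_='ClassifiedDetail', string=re.compile('New'))

exact_texts = [li.get_text() for li in exact]
partial_texts = [li.get_text() for li in partial]
print(exact_texts)    # ['New']
print(partial_texts)  # ['New', 'New York']
```

If only the whole-text case is needed, the plain string='New' form is probably the simpler choice.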

Example

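from bs4 import BeautifulSoup

html='''
<li class="ClassifiedDetail">New</li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">Ne </li>
<li class="ClassifiedDetail">Old</li>
<li class="ClassifiedDetail">knew</li>
'''

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
# :-soup-contains() is case-sensitive, so "knew" is not matched
for li in soup.select('li.ClassifiedDetail:-soup-contains("New")'):
    print(li.text)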

Output

New
New York
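If only elements whose whole text is "New" are wanted, and the markup may carry stray whitespace around the text, one possible sketch is to select by class and compare the stripped text. The sample HTML and the names matches and match_texts are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup assumed for illustration; note the padding around "New".
html = '''
<li class="ClassifiedDetail"> New </li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">Old</li>
'''
soup = BeautifulSoup(html, 'html.parser')

# Select by class first, then keep only items whose stripped text is exactly "New".
matches = [
    li for li in soup.select('li.ClassifiedDetail')
    if li.get_text(strip=True) == 'New'
]

match_texts = [li.get_text(strip=True) for li in matches]
print(match_texts)  # ['New']
```

This trades the compact selector for an explicit comparison, which keeps "New York" out and tolerates padding inside the tag.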