Parse div element from html with style attributes

Time:01-28

I'm trying to get the text "Something here I want to get" from inside a div element in an HTML file, using Python and BeautifulSoup.

This is what the relevant part of the HTML looks like:

<div xmlns="" id="idp46819314579224" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #d43f3a; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;"  onclick="toggleSection('idp46819314579224-container');" onmouseover="this.style.cursor='pointer'">Something here I want to get<div id="idp46819314579224-toggletext" style="float: right; text-align: center; width: 8px;">
                -
            </div>
</div>

And this is what I tried:

vu = soup.find_all("div", {"style" : "background: #d43f3a"})

for div in vu:
    print(div.text)

I use a loop because there are several divs with different ids, but all of them have the same background colour. The code raises no errors, but I get no output.

How can I get the text using the background colour as the condition?

Answer:

The style attribute contains more than just the background declaration:

style="box-sizing: ....; ....;"

Your current code asks whether style == "background: #d43f3a", which is false because the attribute holds the full list of declarations.

What you can do instead is ask whether "background: #d43f3a" in style, i.e. a substring check.
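To make the difference concrete, here is a small sketch in plain Python, independent of BeautifulSoup. It compares the style value from the question against the search text both ways. The variable names are made up for illustration.

```python
# Style value copied from the div in the question.
style = (
    "box-sizing: border-box; width: 100%; margin: 0 0 10px 0; "
    "padding: 5px 10px; background: #d43f3a; font-weight: bold; "
    "font-size: 14px; line-height: 20px; color: #fff;"
)
wanted = "background: #d43f3a"

is_equal = style == wanted      # what a plain string filter checks
is_contained = wanted in style  # the substring check

print(is_equal)      # False
print(is_contained)  # True
```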

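One approach is to pass a compiled regular expression. BeautifulSoup matches it with search(), so it finds the text anywhere inside the attribute value.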

>>> import re
>>> vu = soup.find_all("div", style=re.compile("background: #d43f3a"))
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get

You can also express the same thing with a CSS selector, where the *= operator matches when the attribute value contains the given substring:

soup.select('div[style*="background: #d43f3a"]')
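A minimal end-to-end sketch of the selector approach follows. It assumes the bs4 package is installed. The sample markup, ids and heading texts are invented, not taken from the real page. Because each heading div in the question also contains a nested toggle div, the sketch reads only the first direct text node with find(string=True, recursive=False) instead of div.text, which would include the nested "-" as well.

```python
from bs4 import BeautifulSoup

# Made-up sample markup that mimics the structure in the question.
html = """
<div id="a" style="margin: 0 0 10px 0; background: #d43f3a; color: #fff;">First heading<div id="a-toggletext" style="float: right;">-</div></div>
<div id="b" style="margin: 0 0 10px 0; background: #0071b9; color: #fff;">Other heading<div id="b-toggletext" style="float: right;">-</div></div>
<div id="c" style="margin: 0 0 10px 0; background: #d43f3a; color: #fff;">Second heading<div id="c-toggletext" style="float: right;">-</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# [style*="..."] selects divs whose style attribute contains the substring.
red_divs = soup.select('div[style*="background: #d43f3a"]')

# Take only the text that sits directly inside each matched div.
headings = [div.find(string=True, recursive=False).strip() for div in red_divs]
print(headings)  # ['First heading', 'Second heading']
```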

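Or pass a function/lambda. BeautifulSoup calls it with None for any div that has no style attribute, so check for that first:

>>> vu = soup.find_all("div", style=lambda style: style and "background: #d43f3a" in style)
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get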
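If the spacing or letter case of the declaration varies between divs, a named function can normalise the value before checking it. The following is only a sketch under that assumption. has_red_background is a hypothetical helper, and the sample markup is made up.

```python
import re

from bs4 import BeautifulSoup


def has_red_background(style):
    """Hypothetical helper: does the style value declare background #d43f3a?"""
    if not style:  # None or empty, e.g. a div without a style attribute
        return False
    # Drop all whitespace and lowercase so spacing and case do not matter.
    normalized = re.sub(r"\s+", "", style).lower()
    return "background:#d43f3a" in normalized


# Made-up sample markup, not the page from the question.
html = """
<div style="padding: 5px; background: #d43f3a;">First</div>
<div style="padding: 5px; background:#D43F3A">Second</div>
<div style="background: #0071b9;">Other colour</div>
<div>No style attribute</div>
"""

soup = BeautifulSoup(html, "html.parser")
texts = [div.text.strip() for div in soup.find_all("div", style=has_red_background)]
print(texts)  # ['First', 'Second']
```

This is still a substring test on the normalised text, so it is a convenience for loosely formatted styles, not a full CSS parser.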