I'm trying to get the text "Something here I want to get" from inside a div element in an HTML file, using Python and BeautifulSoup.
This is what the relevant part of the HTML looks like:
<div xmlns="" id="idp46819314579224" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #d43f3a; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;" onclick="toggleSection('idp46819314579224-container');" onmouseover="this.style.cursor='pointer'">Something here I want to get<div id="idp46819314579224-toggletext" style="float: right; text-align: center; width: 8px;">
-
</div>
</div>
And this is what I tried:
vu = soup.find_all("div", {"style" : "background: #d43f3a"})
for div in vu:
    print(div.text)
I use a loop because there are several divs with different ids, but all of them have the same background colour. The code raises no errors, but I get no output.
How can I get the text using the background colour as the condition?
CodePudding user response:
The style attribute has other content inside it
style="box-sizing: ....; ....;"
Your current code asks whether style == "background: #d43f3a", which is false because the attribute holds the whole declaration list.
What you can do is ask if "background: #d43f3a" in style -- a sub-string check.
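To make the difference concrete, here is a minimal sketch in plain Python, with no BeautifulSoup involved. The style string is a shortened stand-in for the one in the question, and the variable names are only illustrative.

```python
# Shortened stand-in for the style value of the div in the question.
style = "box-sizing: border-box; width: 100%; background: #d43f3a; color: #fff;"
target = "background: #d43f3a"

# Exact comparison: this is effectively what the dict filter tests.
exact_match = (style == target)

# Sub-string check: this is what the approaches below test.
substring_match = (target in style)

print(exact_match, substring_match)
```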
One approach is passing a regular expression.
>>> import re
>>> vu = soup.find_all("div", style=re.compile("background: #d43f3a"))
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get
You can also say the same thing using CSS Selectors
soup.select('div[style*="background: #d43f3a"]')
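Or by passing a function/lambda
>>> # style is None for divs that have no style attribute, so guard against that
>>> vu = soup.find_all("div", style=lambda style: style is not None and "background: #d43f3a" in style)
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get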
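For reference, here is a self-contained sketch you can run as a script. The HTML is a trimmed-down stand-in for the markup in the question (the ids and the two extra divs are made up for illustration), and own_text is a hypothetical helper, not part of BeautifulSoup. It is there because div.text concatenates the text of all descendants, so with nested markup like the question's it can also pick up the "-" from the inner toggle div.

```python
from bs4 import BeautifulSoup

# Assumed sample markup: one matching div with a nested toggle div,
# one div with a different background, and one with no style at all.
html = """
<div id="a" style="width: 100%; background: #d43f3a; color: #fff;">Something here I want to get<div id="a-toggletext" style="float: right;">
-
</div>
</div>
<div id="b" style="width: 100%; background: #ffffff;">Not this one</div>
<div id="c">No style attribute at all</div>
"""

soup = BeautifulSoup(html, "html.parser")
target = "background: #d43f3a"


def own_text(tag):
    """Return only the text directly inside tag, skipping nested tags."""
    pieces = tag.find_all(string=True, recursive=False)
    return "".join(pieces).strip()


# Sub-string check on the style attribute; the None guard covers
# divs that have no style attribute.
matches = soup.find_all("div", style=lambda s: s is not None and target in s)

texts = [own_text(div) for div in matches]
print(texts)
```

If you do want the nested text as well, div.get_text(strip=True) or div.text.strip() on each match is enough.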
