Is it possible to test with Beautiful Soup whether a div is a (not necessarily immediate) child of another div?
E.g.:
<div class='a'>
  <div class='aa'>
    <div class='aaa'>
      <div class='aaaa'>
      </div>
    </div>
  </div>
  <div class='ab'>
    <div class='aba'>
      <div class='abaa'>
      </div>
    </div>
  </div>
</div>
Now I want to test whether the div with class aaaa and the div with class abaa are (not necessarily immediate) children of the div with class aa.
import bs4
with open('test.html', 'r') as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), 'lxml')
div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})
print(div1 in div0) # must return True, but returns False
print(div2 in div0) # must return False
How can this be done?
(Of course, the actual HTML is more complicated, with more nested divs.)
CodePudding user response:
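Try collecting all the descendant elements of the parent with find_all and checking whether any of them has the required class attribute. (find_all_next is not suitable here: it also returns elements that come after the parent in the document, not only those nested inside it.)
from bs4 import BeautifulSoup
with open("test.html", "r") as i_file:
    soup = BeautifulSoup(i_file.read(), "html.parser")
def is_child(element, parent_class, child_class):
    # Look only at tags nested inside the first div with parent_class.
    parent = soup.find("div", attrs={"class": parent_class})
    return any(
        child_class in i.get("class", [])  # tolerate tags without a class
        for i in parent.find_all(element)
    )
print(is_child("div", "aa", "aaa"))  # True
print(is_child("div", "abaa", "aa"))  # False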
CodePudding user response:
You can use the find_parent method from Beautiful Soup.
import bs4
with open("test.html", "r") as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), "lxml")
div0 = soup.find("div", {"class": "aa"})
div1 = soup.find("div", {"class": "aaaa"})
div2 = soup.find("div", {"class": "abaa"})
print(div1.find_parent(div0.name, attrs=div0.attrs) is not None) # Returns True
print(div2.find_parent(div0.name, attrs=div0.attrs) is not None) # Returns False
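One caveat: find_parent matches by tag name and attributes, so any ancestor that merely looks like div0 would also count. If the check should be against that exact element, a minimal sketch could walk the .parents generator and compare by identity. Here is_descendant is a hypothetical helper name, the HTML is inlined instead of read from test.html, and html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Same structure as in the question, inlined for a self-contained example.
HTML = """
<div class='a'>
  <div class='aa'>
    <div class='aaa'>
      <div class='aaaa'></div>
    </div>
  </div>
  <div class='ab'>
    <div class='aba'>
      <div class='abaa'></div>
    </div>
  </div>
</div>
"""


def is_descendant(node, ancestor):
    """Return True if `ancestor` is one of `node`'s ancestors.

    Hypothetical helper: compares with `is`, so only the exact Tag object
    counts, not another tag with the same name and attributes.
    """
    return any(parent is ancestor for parent in node.parents)


soup = BeautifulSoup(HTML, "html.parser")
div0 = soup.find("div", class_="aa")
div1 = soup.find("div", class_="aaaa")
div2 = soup.find("div", class_="abaa")

print(is_descendant(div1, div0))  # True
print(is_descendant(div2, div0))  # False
```

This works at any nesting depth, because .parents yields every ancestor up to the document root.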
CodePudding user response:
Okay, I think I found a way. You can get all the descendant divs of the parent div with find_all (it searches recursively by default) and check membership in the result:
import bs4
with open('test.html', 'r') as i_file:
    soup = bs4.BeautifulSoup(i_file.read(), 'lxml')
div0 = soup.find('div', {'class':'aa'})
div1 = soup.find('div', {'class':'aaaa'})
div2 = soup.find('div', {'class':'abaa'})
children = div0.find_all('div')
print(div1 in children)  # True
print(div2 in children)  # False
