I'm learning web scraping, and the <li> tags are not showing up when I run soup.findAll.
Here's the HTML:
<label>
<input type="checkbox">
<ul class="dropdown-content">
<li>
<a href=stuff</a>
</li>
</ul>
</label>
I tried:
soup = BeautifulSoup(r.content,'html5lib')
dropdown = soup.findAll('ul', {'class':'dropdown-content'})
print(dropdown)
And it only shows:
[<ul class="dropdown-content"></ul>]
Any help will do. Thanks!
CodePudding user response:
In this command, dropdown = soup.findAll('ul', {'class':'dropdown-content'}), you search for <ul> elements with the dropdown-content class, so the result is the <ul> itself. To get the <li> items, search inside the <ul>:
dropdown = soup.find('ul').findAll('li')
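A slightly more defensive sketch of the same idea, since soup.find() returns None when nothing matches. The sample HTML string here is made up for illustration, and it assumes the <ul> carries the dropdown-content class from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real page from the question.
html = '<ul class="dropdown-content"><li><a href="stuff">stuff</a></li></ul>'
soup = BeautifulSoup(html, "html.parser")

# find() returns None if no matching <ul> exists, so guard before chaining.
ul = soup.find("ul", class_="dropdown-content")
items = ul.find_all("li") if ul is not None else []

print([li.get_text(strip=True) for li in items])
```

If items comes back empty on the real page, the <li> tags are probably not in the HTML that requests downloaded.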
CodePudding user response:
Your selection per se is fine for finding the <ul>. It may simply not contain any <li> because, I assume, these elements are generated dynamically by JavaScript. To validate this, the question should be improved and the URL of the website provided.
If the content is provided dynamically, one approach is to work with Selenium, which renders the website like a browser and can return the "full" DOM.
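A rough sketch of how that could look, under a few assumptions: the helper names count_list_items and fetch_rendered_html are made up for this example, Chrome is assumed to be installed, and the static_html string is a stand-in for what requests returns when the list is filled in later by JavaScript. Content that loads after the initial page load may additionally need an explicit wait before reading page_source.

```python
from bs4 import BeautifulSoup


def count_list_items(html, css_class="dropdown-content"):
    """Count <li> tags inside every <ul> that carries the given class."""
    soup = BeautifulSoup(html, "html.parser")
    return sum(len(ul.find_all("li")) for ul in soup.find_all("ul", class_=css_class))


def fetch_rendered_html(url):
    """Load the page in headless Chrome and return the DOM as the browser sees it."""
    # Imported lazily so count_list_items() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Stand-in for the static response: the <ul> exists but is still empty.
static_html = '<label><input type="checkbox"><ul class="dropdown-content"></ul></label>'
print(count_list_items(static_html))  # 0
```

If count_list_items(r.text) is 0 but count_list_items(fetch_rendered_html(url)) is greater than 0 for the same URL, the items are being added by JavaScript.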
Note: In new code, use find_all() instead of the older findAll() syntax.
Example
The HTML in your example is broken, but your code works as long as the <ul> in your soup actually contains <li> elements.
from bs4 import BeautifulSoup
# Static HTML with the <li> already present inside the <ul>
html = '''
<label>
<input type="checkbox">
<ul class="dropdown-content">
<li>
<a href="stuff"></a>
</li>
</ul>
</label>
'''
soup = BeautifulSoup(html,'html5lib')
# Select every <ul> with the dropdown-content class
dropdown = soup.find_all('ul', {'class':'dropdown-content'})
print(dropdown)
Output
[<ul class="dropdown-content">
<li>
<a href="stuff"></a>
</li>
</ul>]
