I use the script below to collect all tags from an HTML page, but instead of the HTML response I get something else:
import urllib.request
from bs4 import BeautifulSoup
loginurl= 'https://172.56.66.77'
fhand = urllib.request.urlopen(loginurl).read()
soup = BeautifulSoup(fhand,'html.parser')
print(soup)
I am trying to extract a particular value from the page, but Beautiful Soup does not return HTML; instead I get the response below:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse : 76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>
I cannot filter the data because the response does not contain HTML tags.
Please help me get the second value, SEP76238823354, from the response.
CodePudding user response:
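The response is XML rather than HTML, but its tags can still be searched by name. One option is to drop the second argument 'html.parser' from the constructor call and let Beautiful Soup choose a parser itself (it emits a GuessedAtParserWarning when it does so):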
from bs4 import BeautifulSoup
xml_doc = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse : 76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>"""
soup = BeautifulSoup(xml_doc)
print(soup.find_all("name")[1])
# -> <name>Host-Name : SEP76238823354</name>
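Because the response is well-formed XML, the same lookup can also be sketched with the standard library alone. This is an alternative to Beautiful Soup, not what the answer above uses; the xml_bytes variable is a stand-in (an assumption) for the bytes that urllib.request.urlopen(loginurl).read() would return.

```python
import xml.etree.ElementTree as ET

# Sample bytes standing in for the real device response (assumption:
# the live response has the same structure as the one in the question).
xml_bytes = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse : 76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>""".encode("utf-8")

# Parse the document and collect the text of every <name> element in order.
root = ET.fromstring(xml_bytes)
names = [el.text for el in root.iter("name")]

# The second <name> element holds the host name entry.
host_entry = names[1]
print(host_entry)
```

Indexing by position only works while the device keeps returning the entries in this order; matching on the "Host-Name" label is less fragile.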
CodePudding user response:
Select the element you need, here the one whose text contains Host-Name, split() its text on the delimiter, and take the last part (the 'xml' parser requires lxml to be installed):
...
soup = BeautifulSoup(fhand, 'xml')
soup.select_one('name:-soup-contains("Host-Name")').text.split(': ')[-1]
Output:
SEP76238823354
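If more than one field is needed, the split step can be generalised. The sketch below is only an illustration: parse_entries is a hypothetical helper, and the entries list is assumed to hold the text of each name element, as it appears in the response from the question.

```python
def parse_entries(entries):
    """Turn 'key : value' strings into a dict, splitting on the first colon only."""
    result = {}
    for entry in entries:
        key, sep, value = entry.partition(":")
        if sep:  # skip strings that contain no colon at all
            result[key.strip()] = value.strip()
    return result


# Texts of the <name> elements from the response in the question.
entries = ["MAC-Adresse : 76238823354", "Host-Name : SEP76238823354"]
info = parse_entries(entries)
print(info["Host-Name"])
```

Unlike split(': ')[-1], partition(":") splits only at the first colon, so a value that itself contains colons would stay in one piece.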
