Given the following HTML response:
<div><input type="hidden" id="CSRFToken" name="CSRFToken" value="HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA=="/>Login<input type="password" id="LogBox" name="B8d5" /><input type="hidden" name="loginurl" value="/general/status.html"/><input id="login" type="submit" value=" " /></div>
How can I retrieve the value of all:
Hidden fields only
whose name is CSRFToken or CSRFToken2
I tried:
return soup.find("input", {"name":"CSRFToken", }, type='hidden').get('value','')
CodePudding user response:
Hey, I have been working with BeautifulSoup as well, and I know how much trouble it can cause. Check out requests-html, which builds on the requests library and (IMO) handles a task like this better than BeautifulSoup.
CodePudding user response:
I've simulated your initial situation as best I can. You might still need to specify an appropriate parser. Since you use return, I assume you want to write a function; just put the code given here into a function.
Example Code
#!pip install beautifulsoup4
from bs4 import BeautifulSoup as bs

# Added a simulated input with name CSRFToken2
simulated_html_input = """<div>
<input type="hidden" id="CSRFToken" name="CSRFToken" value="HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA=="/>Login<input type="password" id="LogBox" name="B8d5" />
<input type="hidden" name="loginurl" value="/general/status.html"/><input id="login" type="submit" value=" " />
<input type="hidden" id="CSRFToken" name="CSRFToken2" value="HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA==CSRFTOKEN2"/>Login<input type="password" id="LogBox" name="B8d5" />
</div>"""

# Init soup (embed the code into a function if needed)
soup = bs(simulated_html_input, "html.parser")

try:
    # A list as attribute value matches any of the given names
    token_list = soup.find_all('input', {'name': ['CSRFToken', 'CSRFToken2'], 'type': 'hidden'})
except Exception as e:
    print("Got unhandled exception %s" % str(e))
    token_list = []  # fall back to an empty list so the loop below still runs

res = dict()
for i, ele in enumerate(token_list):
    # Modify output as needed
    res["token_" + str(i)] = ele.get('value')

# In a function, use return instead
print(res)
Output is:
{'token_0': 'HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA==', 'token_1': 'HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA==CSRFTOKEN2'}
Modify this code to fit your own requirements.
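Since the question uses return, here is a minimal sketch of the same lookup wrapped in a function. The function name get_hidden_tokens, its names parameter, and the shortened sample values are illustrative assumptions, not part of the original code. It also keys the result by field name rather than by position.

```python
from bs4 import BeautifulSoup


def get_hidden_tokens(html, names=("CSRFToken", "CSRFToken2")):
    """Return {name: value} for hidden inputs whose name is in `names`."""
    soup = BeautifulSoup(html, "html.parser")
    # A list as the attribute value matches any of the listed names.
    tags = soup.find_all("input", attrs={"type": "hidden", "name": list(names)})
    return {tag["name"]: tag.get("value", "") for tag in tags}


# Hypothetical sample with shortened token values
sample_html = """<div>
<input type="hidden" name="CSRFToken" value="abc=="/>
<input type="password" name="B8d5"/>
<input type="hidden" name="loginurl" value="/general/status.html"/>
<input type="hidden" name="CSRFToken2" value="def=="/>
</div>"""

tokens = get_hidden_tokens(sample_html)
print(tokens)
```

An empty dict comes back when no matching field exists, so the caller can check the result without catching exceptions.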
CodePudding user response:
To select just these two fields, pass their names as a list, ['CSRFToken','CSRFToken2']:
soup.find_all('input', {'name': ['CSRFToken','CSRFToken2'], 'type': 'hidden'})
This will give you a ResultSet of tags that you have to iterate over to get their values.
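For comparison with the find() call in the question: find() returns only the first matching tag (or None), while find_all() returns every match. The following is a small sketch of that difference; the sample HTML and its shortened values are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with shortened token values
sample_html = """<div>
<input type="hidden" name="CSRFToken" value="abc=="/>
<input type="hidden" name="loginurl" value="/general/status.html"/>
<input type="hidden" name="CSRFToken2" value="def=="/>
</div>"""
soup = BeautifulSoup(sample_html, "html.parser")

wanted = {"name": ["CSRFToken", "CSRFToken2"], "type": "hidden"}

first_only = soup.find("input", wanted)       # a single Tag, or None
all_matches = soup.find_all("input", wanted)  # a ResultSet, possibly empty

# Guard against None before reading the attribute from find()'s result
first_value = first_only.get("value", "") if first_only else ""
all_values = [tag.get("value", "") for tag in all_matches]

print(first_value)
print(all_values)
```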
As an alternative and more generic approach, you can use CSS selectors in combination with a dict comprehension, as one of many options:
Select all hidden inputs
soup.select('input[type="hidden"]')
Alternative with find_all()
soup.find_all("input", type="hidden")
Iterate over resultset
{i.get('name'):i.get('value') for i in soup.select('input[type="hidden"]')}
Iterate over the ResultSet, including a check whether the name contains CSRFToken
{i.get('name'):i.get('value') for i in soup.select('input[type="hidden"]') if 'CSRFToken' in i.get('name', '')}
This will give you a dict of name / value pairs based on your conditions.
Example
from bs4 import BeautifulSoup
html='''
<div>
<input type="hidden" id="CSRFToken" name="CSRFToken" value="HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA=="/>
<input type="hidden" id="CSRFToken2" name="CSRFToken2" value="CSRFTOKEN2VALUE"/>
Login
<input type="password" id="LogBox" name="B8d5" />
<input type="hidden" name="loginurl" value="/general/status.html"/>
<input id="login" type="submit" value=" " />
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
{i.get('name'):i.get('value') for i in soup.select('input[type="hidden"]') if 'CSRFToken' in i.get('name', '')}
Output
{'CSRFToken': 'HFT/qajA/9FV2kJvMvONwurnFDY6GXZBAA==', 'CSRFToken2': 'CSRFTOKEN2VALUE'}
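Note that the substring check 'CSRFToken' in the name also accepts any other hidden field whose name merely contains that text. If only the two exact names are wanted, a CSS selector list is one way to express that. This is a sketch under assumptions: the sample HTML, including the CSRFTokenBackup field, is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; CSRFTokenBackup exists only to show what gets excluded
html = """<div>
<input type="hidden" name="CSRFToken" value="abc=="/>
<input type="hidden" name="CSRFToken2" value="def=="/>
<input type="hidden" name="CSRFTokenBackup" value="ignored"/>
<input type="hidden" name="loginurl" value="/general/status.html"/>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# Each comma-separated part requires type="hidden" and one exact name
selector = (
    'input[type="hidden"][name="CSRFToken"], '
    'input[type="hidden"][name="CSRFToken2"]'
)
exact = {tag["name"]: tag.get("value", "") for tag in soup.select(selector)}
print(exact)
```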
