I am trying to extract all the links that show up in the Inspect Element view of this website, using the following code:
import requests
from bs4 import BeautifulSoup
url = 'https://chromedriver.storage.googleapis.com/index.html?path=97.0.4692.71/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
for link in soup.find_all('a'):
    print(link)
However, I got no links. I then printed soup and compared it with what the browser shows under both Inspect Element and View Page Source on the actual website. The output of print(soup) matched View Page Source, but it did not match Inspect Element. Firstly, how do I get the Inspect Element code instead of the page source code? Secondly, why are the two different?
CodePudding user response:
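The two differ because the page builds its file list with JavaScript after it loads. requests only downloads the static page source and never runs those scripts, whereas Inspect Element shows the DOM after they have run. You don't need the rendered DOM here: just use the other URL mentioned in the comments, which returns the file listing as XML, and parse that XML with BeautifulSoup.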
For example:
import requests
from bs4 import BeautifulSoup
url = "https://chromedriver.storage.googleapis.com/?delimiter=/&prefix=97.0.4692.71/"
# features="xml" requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url).text, features="xml").find_all("Key")
keys = [f"https://chromedriver.storage.googleapis.com/{k.getText()}" for k in soup]
print("\n".join(keys))
Output:
https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_linux64.zip
https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_mac64.zip
https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_mac64_m1.zip
https://chromedriver.storage.googleapis.com/97.0.4692.71/chromedriver_win32.zip
https://chromedriver.storage.googleapis.com/97.0.4692.71/notes.txt
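If you would rather not install lxml, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch: SAMPLE_XML is a hypothetical, abbreviated stand-in for the listing (the real response has more fields and may differ in detail), and extract_links and BASE_URL are names made up for this illustration. The tag matching ignores XML namespaces so that it does not depend on the exact namespace the server uses.

```python
import xml.etree.ElementTree as ET

# Assumed base URL, taken from the links in the output above.
BASE_URL = "https://chromedriver.storage.googleapis.com/"

# Hypothetical, abbreviated sample of a listing response (an assumption,
# not a copy of the real document).
SAMPLE_XML = """<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Contents><Key>97.0.4692.71/chromedriver_linux64.zip</Key></Contents>
  <Contents><Key>97.0.4692.71/notes.txt</Key></Contents>
</ListBucketResult>"""


def extract_links(xml_text, base_url=BASE_URL):
    """Return one URL per <Key> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}Key"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Key" and element.text:
            links.append(base_url + element.text.strip())
    return links


if __name__ == "__main__":
    print("\n".join(extract_links(SAMPLE_XML)))
```

To run it against the live listing, you would pass requests.get(url).text to extract_links instead of SAMPLE_XML.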
