import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'http://www.mtv.de/charts/c6mc86/single-top-100?expanded=true'
chromedriver = Service("/usr/local/bin/chromedriver")
op = webdriver.ChromeOptions()
browser = webdriver.Chrome(service=chromedriver, options=op)
browser.get(url)
timeout = 60
browser.implicitly_wait(20)
browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
time.sleep(5)
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[1]/main/div/section/div/div/div/object')))
    print('========================')
except TimeoutException:
    browser.quit()
items = browser.switch_to.frame(browser.find_element(By.TAG_NAME, 'object'))
print(items)
itembox = items.find_elements(By.CLASS_NAME, 'charts-marslnet')
# print(itembox)
for item in itembox:
    print(item.text)
I have been trying to scrape the song name, artist and URL for each song from this website, but I am unable to access the HTML inside the object tag, under the #document section. I cannot figure out why I can't access it. Any insight into what is wrong with my code, or what should be done to access the HTML inside the #document section, would be very helpful.
CodePudding user response:
You can fetch the chart data directly from the URL that serves it, without Selenium:
import requests
from bs4 import BeautifulSoup

url = 'https://mtv.marsl.net/demo/showdbcharts.php?c=4'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

acts = soup.find_all('div', {'class':'cmn-act'})
for each in acts:
    title = each.find_next('div', {'class':'cmn-title'}).text.strip()
    artist = each.find_next('div', {'class':'cmn-artist'}).text.strip()
    link = each.find_next('a', href=True)['href']
    print(f'{title}\n{artist}\n{link}\n\n')
Output:

abcdefu
Gayle
https://www.mtv.de/musikvideos/r9d9sl/abcdefu

Wenn ich will
Gzuz & Bonez MC
https://www.mtv.de/musikvideos/7evkst/10von10

10von10
Pajel
https://www.mtv.de/musikvideos/7evkst/10von10

Shivers
Ed Sheeran
https://www.mtv.de/musikvideos/miq9lq/shivers

Heat Waves
Glass Animals
https://www.mtv.de/musikvideos/l9rv5d/heat-waves

...
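
Note that in the output above, "Wenn ich will" is printed with the link for "10von10". One plausible cause, assuming that entry has no link of its own, is that find_next keeps searching forward past the current entry and picks up the next entry's link. Below is a minimal sketch of one way to avoid that: walk the document in order and attach each title, artist and link to the most recent cmn-act entry, so a missing link stays empty. It uses html.parser from the standard library rather than BeautifulSoup. The class names are taken from the answer; the sample markup, the example.com URLs, and the names ChartParser and parse_chart are made up for illustration, and the real page layout may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: class names come from the answer, the layout is assumed.
SAMPLE_HTML = """
<div class="cmn-act">
  <div class="cmn-title"><a href="https://example.com/videos/song-a">Song A</a></div>
  <div class="cmn-artist">Artist A</div>
</div>
<div class="cmn-act">
  <div class="cmn-title">Song B</div>
  <div class="cmn-artist">Artist B</div>
</div>
<div class="cmn-act">
  <div class="cmn-title"><a href="https://example.com/videos/song-c">Song C</a></div>
  <div class="cmn-artist">Artist C</div>
</div>
"""


class ChartParser(HTMLParser):
    """Builds one record per 'cmn-act' div, in document order."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per chart entry
        self._field = None   # text field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'div' and 'cmn-act' in classes:
            # A new entry starts; everything until the next one belongs to it.
            self.records.append({'title': '', 'artist': '', 'link': None})
        elif not self.records:
            return  # ignore anything before the first entry
        elif tag == 'div' and 'cmn-title' in classes:
            self._field = 'title'
        elif tag == 'div' and 'cmn-artist' in classes:
            self._field = 'artist'
        elif tag == 'a' and attrs.get('href'):
            # Keep only the first link seen for the current entry.
            if self.records[-1]['link'] is None:
                self.records[-1]['link'] = attrs['href']

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] += data

    def handle_endtag(self, tag):
        if tag == 'div' and self._field:
            # The title/artist div is closed: tidy up its text.
            record = self.records[-1]
            record[self._field] = record[self._field].strip()
            self._field = None


def parse_chart(html):
    """Returns a list of {'title', 'artist', 'link'} dicts; link may be None."""
    parser = ChartParser()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == '__main__':
    for record in parse_chart(SAMPLE_HTML):
        print(record['title'], record['artist'], record['link'], sep='\n', end='\n\n')
```

To try this against the real page, you could pass response.text from the answer's code to parse_chart instead of SAMPLE_HTML. Whether it fixes the mismatched link depends on how the actual markup is laid out.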
