Scraping links should be a simple feat, usually just grabbing the href value of the a tag.
I recently came across this website (
CodePudding user response:
Reverse-engineering the JavaScript that takes you to the promotion pages (see https://sunteccity.com.sg/_nuxt/d4b648f.js) gives you a way to get all the links, which are based on the HappeningID. You can verify this by running the following in the JS console, which returns the HappeningID of the first promotion:
window.__NUXT__.state.Promotion.promotions[0].HappeningID
Based on that, you can create a Python loop to get all the promotions:
items = driver.execute_script("return window.__NUXT__.state.Promotion;")
base = "https://sunteccity.com.sg/promotions/"
for item in items["promotions"]:
    # Each promotion URL is the base path followed by the promotion's HappeningID.
    happening_id = str(item["HappeningID"])
    print(base + happening_id)
That generated the following output:
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
https://sunteccity.com.sg/promotions/764
https://sunteccity.com.sg/promotions/766
https://sunteccity.com.sg/promotions/762
https://sunteccity.com.sg/promotions/767
https://sunteccity.com.sg/promotions/732
https://sunteccity.com.sg/promotions/733
https://sunteccity.com.sg/promotions/735
https://sunteccity.com.sg/promotions/736
https://sunteccity.com.sg/promotions/737
https://sunteccity.com.sg/promotions/738
https://sunteccity.com.sg/promotions/739
https://sunteccity.com.sg/promotions/740
https://sunteccity.com.sg/promotions/741
https://sunteccity.com.sg/promotions/742
https://sunteccity.com.sg/promotions/743
https://sunteccity.com.sg/promotions/744
https://sunteccity.com.sg/promotions/745
https://sunteccity.com.sg/promotions/746
https://sunteccity.com.sg/promotions/747
https://sunteccity.com.sg/promotions/748
https://sunteccity.com.sg/promotions/749
https://sunteccity.com.sg/promotions/750
https://sunteccity.com.sg/promotions/753
https://sunteccity.com.sg/promotions/755
https://sunteccity.com.sg/promotions/756
https://sunteccity.com.sg/promotions/757
https://sunteccity.com.sg/promotions/758
https://sunteccity.com.sg/promotions/759
https://sunteccity.com.sg/promotions/760
https://sunteccity.com.sg/promotions/761
https://sunteccity.com.sg/promotions/763
https://sunteccity.com.sg/promotions/765
https://sunteccity.com.sg/promotions/730
https://sunteccity.com.sg/promotions/734
https://sunteccity.com.sg/promotions/623
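The URL-building step can also be separated from the browser call, which makes it easy to check without opening the site. The following is only a sketch: build_promotion_urls and sample_state are hypothetical names, and the sample dictionary merely mimics the shape of window.__NUXT__.state.Promotion described above (a "promotions" list whose items carry a HappeningID).

```python
from typing import Any, Dict, List

BASE_URL = "https://sunteccity.com.sg/promotions/"


def build_promotion_urls(promotion_state: Dict[str, Any], base: str = BASE_URL) -> List[str]:
    """Build one URL per HappeningID, keeping first-seen order and skipping repeats."""
    urls: List[str] = []
    seen = set()
    for item in promotion_state.get("promotions", []):
        happening_id = item.get("HappeningID")
        if happening_id is None:
            # Assumption: entries without an ID cannot be linked, so skip them.
            continue
        url = base + str(happening_id)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hand-written stand-in for the dict returned by
# driver.execute_script("return window.__NUXT__.state.Promotion;")
sample_state = {
    "promotions": [
        {"HappeningID": 724},
        {"HappeningID": 731},
        {"HappeningID": 724},  # duplicate, dropped
        {"Title": "no id"},    # no HappeningID, skipped
    ]
}

for url in build_promotion_urls(sample_state):
    print(url)
```

With a live driver, the same function would be called on the value returned by execute_script instead of sample_state.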
CodePudding user response:
You are using the wrong locator. It returns a lot of irrelevant elements.
Instead of find_elements_by_class_name('thumb-img'), try find_elements_by_css_selector('.collections-page .thumb-img'), so your code becomes:
all_items = bot.find_elements_by_css_selector('.collections-page .thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))
You can also get the desired links directly with the locator .collections-page .thumb-img a, so your code could be:
links = bot.find_elements_by_css_selector('.collections-page .thumb-img a')
for link in links:
    print(link.get_attribute("href"))
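If you are on a recent Selenium 4 release, the find_elements_by_* helpers are no longer available and the same locator is passed to find_elements(By.CSS_SELECTOR, ...) instead. Here is a rough sketch of that variant that also drops empty and repeated href values. The names unique_hrefs, collect_promo_links and FakeAnchor are hypothetical; FakeAnchor only stands in for a WebElement so the filtering can be tried without a browser.

```python
from typing import Iterable, List, Optional


def unique_hrefs(elements: Iterable) -> List[str]:
    """Return each element's href once, in first-seen order, skipping empty values."""
    seen = set()
    hrefs: List[str] = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def collect_promo_links(driver) -> List[str]:
    """Run the locator from this answer using the Selenium 4 call style."""
    # Imported here so the helper above can be used without selenium installed.
    from selenium.webdriver.common.by import By

    anchors = driver.find_elements(By.CSS_SELECTOR, ".collections-page .thumb-img a")
    return unique_hrefs(anchors)


class FakeAnchor:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, href: Optional[str]) -> None:
        self._href = href

    def get_attribute(self, name: str) -> Optional[str]:
        return self._href if name == "href" else None


demo_elements = [
    FakeAnchor("https://sunteccity.com.sg/promotions/724"),
    FakeAnchor(None),  # anchor without an href, skipped
    FakeAnchor("https://sunteccity.com.sg/promotions/724"),  # repeat, dropped
    FakeAnchor("https://sunteccity.com.sg/promotions/731"),
]
print(unique_hrefs(demo_elements))
```

In a real session you would call collect_promo_links(bot) after the page has loaded.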
CodePudding user response:
The descendant <img> tags of the parent <div> element don't have an href or onclick attribute, but they do have a src attribute.
To print the values of the src attribute, you need to induce WebDriverWait for presence_of_all_elements_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://sunteccity.com.sg/promotions")
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "ul.collections div.thumb-img>a>img")))])
Using XPATH:
driver.get("https://sunteccity.com.sg/promotions")
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[contains(@class, 'collections')]//div[@class='thumb-img']/a/img")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2753-0605_Marcom_New_StoresWebsite_LandingPage_06122021__1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4748-0608_Marcom_CNY2022_Digital_FA_1536x882px_EATS-09.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6775-Website-Promotion-1536(w)-x-882(h)_-_annchi_sac.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8883-Woptics_Metaform_KV_360W_x_260H.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/320-TRU_LNY_campaign_Website_Promotion_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9035-GintellCNY-Digital-Marketing_Singapore_1536x882_Rev-C.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1605-Website_Promotion__Organic_Hair_Regrowth_Solutions.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5125-website_image_-_PY.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7462-Martiangear2._Website_Promotion_1536(w)_x_882(h)_(1).jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9576-BBQSuntec_WebsitePromo.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7265-Nimisski_suntec_2_-_mandy_oh.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4106-1536x882_-_Umistrong.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4982-HLA_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2197-bh_cny_2022_(1536_x_882_px).jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7657-(1536x882)_Hair_Plus_-_Suntec_City_Website_Promotion_-_Wee.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8834-fz_cny_04_-_Sherman_Fu.jpg', 
'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2742-White_Restaurant_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2424-BWCJ_Chinese_New_Year_Special_Bundle_1536_X_882_no_text.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2812-EYS_20-Dec-Hamper-1536x882-r1_-_Bok_kok_wai.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6476-Superpark_20off_(1536_x_882_px)_(2).jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/921-TB_CNY_FieryFeastSet_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6770-Recoil_Suntec_Website_Promotion_(1).jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1797-morganfield_website.jpeg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/273-1536x882_-_Ruth_NgTSB.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8610-DEC-SuntecCity-CNY2022-TigerPlushToy-Banner-1536x882_-_Shiau_Chen_Lim.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5460-SG_Scanteak_CNY2022_SUNTEC_DIGITALSCREEN-04_-_Scanteak_SG.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3926-Singapore_min_tNew_Suntec_Web_Promo_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4308-Suntec-CNY22-1536x882_-_Elements_Wellness_Group.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5923-PetLoversSuntecCity-CNY22-1536x882-Dec21.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/5906-Myths_&_Legends_Website.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/8873-FANCL_Suntec_LNY_visual_websitepromo.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4584-Suntec-1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2898-1536_882_low_res_-_Theresa_StateSwim.png', 
'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6775-Website-Promotion-1536(w)-x-882(h)_-_annchi_sac.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3288-Suntec_Advertising_LNY_Website_Promotion_-_Ilina_Sim.png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1437-Hans_2022_CNY_-_SUN_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/6395-SC_Website_Highlights_1536x882px_ToTT_-_Ren_Qi_Quak.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9201-Harvey_Norman_Electrical_&_IT_lifestyle_V2_1536x882px.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/9017-EncikTanSuntec_City_-_Website_Highlights_(1536px_by_882px).png', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/4748-0608_Marcom_CNY2022_Digital_FA_1536x882px_EATS-09.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/3852-FINAL_Promo_listing_1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/2753-0605_Marcom_New_StoresWebsite_LandingPage_06122021__1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/1050-WebsiteHighlights1536x882.jpg', 'https://suntecproject.s3.amazonaws.com/BI/highlight/mobile_small/7312-TUES15_EATS_promolisting_01.jpg']
