The website has 9 pages, but my code only adds the elements from the last page to the lists. I want to collect the elements from all pages together in one list.
alltitles = []
allnames = []
alllinks = []
allpeices = []
allstocks = []
for n in range(pagenum):
    pages_url = f"https://www.ispsupplies.com/manufacturers/TP~Link?order=relevance:asc&page={n + 1}&keywords=tp-link"
    driver.get(pages_url)
    html = driver.page_source
    soup = Soup(html)
    title = soup.find_all("span", itemprop="name")
    titleloop = [titles.text for titles in title]
    alltitles.append(titleloop)
    name = soup.find_all("div", class_="item-details-sku-container")
    nameloop = [names.text for names in name]
    allnames.append(nameloop)
    link = soup.find_all("a", class_="facets-item-cell-grid-title")
    linkloop = [links.text for links in link]
    alllinks.append(linkloop)
    price = soup.find_all("span", class_="item-views-price-lead")
    priceloop = [prices.text for prices in price]
    allpeices.append(priceloop)
    stock = soup.find_all("div", class_="item-details-stock")
    stockloop = [stocks.text for stocks in stock]
    allstocks.append(stockloop)
CodePudding user response:
What happens?
The code works, but it iterates too fast: the elements you're looking for are not yet present at the moment you try to find them.
How to fix?
Use selenium waits to check if elements are present in the DOM:
...
driver.get(pages_url)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-type="item"]')))
html = driver.page_source
...
Note: You have to add some additional imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
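To make the idea behind an explicit wait concrete, here is a minimal pure-Python sketch of polling a condition until it succeeds or a timeout expires. It only illustrates the concept and is not Selenium's implementation; `wait_until` and `fake_page_ready` are hypothetical names invented for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.1):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(poll)


# Hypothetical stand-in for a page that needs a few polls before it is ready.
attempts = {"count": 0}


def fake_page_ready():
    attempts["count"] += 1
    return "items loaded" if attempts["count"] >= 3 else None


status = wait_until(fake_page_ready, timeout=2.0, poll=0.01)
print(status, "after", attempts["count"], "polls")
```

With a real browser, the condition would be the presence check on `[data-type="item"]` rather than a counter.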
Example
Not sure why you decided on this bunch of lists; this example uses a single list of dicts instead:
data = []
for n in range(2):
    pages_url = f"https://www.ispsupplies.com/manufacturers/TP~Link?order=relevance:asc&page={n + 1}&keywords=tp-link"
    driver.get(pages_url)
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-type="item"]')))
    html = driver.page_source
    soup = Soup(html)
    for item in soup.select('[data-type="item"]'):
        data.append({
            'title' : item.find("span", itemprop="name").text,
            'name' : item.find("div", class_="item-details-sku-container").text,
            'link' : item.find("a", class_="facets-item-cell-grid-title")['href'],
            'price' : item.find("span", class_="item-views-price-lead").text,
            'stock' : item.find("div", class_="item-details-stock").text.strip()
        })
pd.DataFrame(data)
Output
| title | name | link | price | stock |
|---|---|---|---|---|
| TP-Link AC750 Wireless Dual Band Router | SKU: Archer C20 | /TP-Link-Archer-C20 | US$34.99 | Direct Ship item Item usually ships directly from the manufacturer |
| TP-Link 16-Port Gigabit Unmanaged Pro Switch | SKU: TL-SG116E | /TP-Link-TL-SG116E | US$79.99 | 3 In Stock |
| TP-Link AC1200 Wireless MU-MIMO Gigabit Router Archer A6 | SKU: Archer A6_V3 | /TP-Link-Archer-A6 | US$49.99 | Direct Ship item Item usually ships directly from the manufacturer |
| TP-Link AC4000 MU-MIMO Tri-Band Wi-Fi Router Archer A20 | SKU: Archer A20 | /TP-Link-Archer-A20 | US$189.99 | Direct Ship item Item usually ships directly from the manufacturer |
| TP-Link AC5400 MU-MIMO Tri-Band Gaming Router | SKU: Archer C5400X | /TP-Link-Archer-C5400X | US$279.99 | Direct Ship item Item usually ships directly from the manufacturer |
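A side note on the original lists: `append` adds each page's list as a single nested element, so `alltitles` would end up as a list of lists. If one flat list across all pages is wanted, `extend` is probably what was intended. A small sketch with made-up data (the titles below are placeholders, not scraped results):

```python
# Hypothetical per-page results standing in for the scraped titles.
pages = [
    ["Router A", "Switch B"],   # page 1
    ["Router C"],               # page 2
]

nested = []
flat = []
for page_titles in pages:
    nested.append(page_titles)  # adds the whole list as one element
    flat.extend(page_titles)    # adds each title individually

print(nested)
print(flat)
```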
CodePudding user response:
Any reason not to just go through the API? It's far more efficient, and you'll get more data. You can always filter out the columns you don't need.
import requests
import pandas as pd

items = []
page = 0
while True:
    url = 'https://www.ispsupplies.com/api/items'
    payload = {
        '_t': '1641815468877',
        'c': '393682',
        'country': 'US',
        'currency': 'USD',
        'custitem_disable_from_main_website': '0',
        'custitem_is_international': '0',
        'fieldset': 'search',
        'include': 'facets',
        'language': 'en',
        'limit': '100',
        'manufacturers': 'TP~Link',
        'n': '2',
        'nocache': 'T',
        'offset': str(page*100),
        'sort': 'quantityavailable:desc'}
    jsonData = requests.get(url, params=payload).json()
    # Accumulate this page's items instead of overwriting the list
    items += jsonData['items']
    print('Page: %s' %(page + 1))
    # A page with fewer than 100 items is the last one
    if len(jsonData['items']) < 100:
        break
    page += 1
df = pd.DataFrame(items)
Output:
Full Output (just first 5 rows of the 199 products):
print(df.head(5).to_string())
custitem88 custitem89 custitem83 custitem_is_international custitem_open_box_ids custitem_ns_pr_item_attributes custitemnew ispurchasable custitem_ns_pr_attributes_rating stockdescription custitemclearance itemimages_detail custitem_commercecategory_brand custitemwarehousemessage custitem_incanada onlinecustomerprice_detail custitem71 weight custitem_ns_pr_rating_by_rate internalid itemoptions_detail outofstockmessage custitemextralargeimage2 custitem_availableus storedescription pricelevel1_formatted isinstock custitem67 custitem20 custitem21 onlinecustomerprice dontshowprice custitemrefurbished custitemonsale custitem68 manufacturer custitem69 custitemfree_shipping itemid custitemondiscount offersupport onlinecustomerprice_formatted nopricemessage custitem_disable_from_main_website pricelevel66_formatted isbackorderable custitemtariff_item custitemfree_shipping_cw custitem93 custitem94 custitem19 custitem18 custitem_st7 custitem_st6 showoutofstockmessage outofstockbehavior custitem_st8 itemtype quantityavailable custitem_st3 custitem_st2 custitem_st5 displayname storedisplayname2 custitem_st4 custitem_availableca pricelevel1 custitem_st1 custitem_gpon urlcomponent pricelevel66 custitem_commerce_category_1 custitem_commerce_category_3 custitem_commerce_category_2
0 0 False False True False {'5366': {'urls': [{'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-LINK-Gigabit-PCI-Express-Network-Adapter-TG-3468.5366-2.jpg'}]}} TP-Link 11/8/2021 False {'onlinecustomerprice_formatted': 'US$14.99', 'onlinecustomerprice': 14.99} 0.50 5366 {'fields': [{'internalid': 'custcol19', 'label': 'Item Length', 'type': 'float'}, {'internalid': 'custcol20', 'label': 'Item Width', 'type': 'float'}, {'internalid': 'custcol21', 'label': 'Item Height', 'type': 'float'}, {'internalid': 'custcol_tariff_fee_option', 'label': 'Tariff Fee', 'type': 'currency'}, {'internalid': 'custcol_tariff_fee', 'label': 'Tariff Fee Custom', 'type': 'currency'}, {'internalid': 'custcol_is_tariff', 'label': 'Is Tariff', 'type': 'checkbox'}, {'internalid': 'custcol26', 'label': 'Purchase Price', 'type': 'currency'}, {'internalid': 'custcol36', 'label': 'Not Kit Component', 'type': 'checkbox'}, {'internalid': 'custcol67', 'label': 'Is Tariff (Webstore)', 'type': 'text'}, {'internalid': 'custcol_shiphawk_proposed_shipment_id', 'label': 'ShipHawk Proposed Shipment ID', 'type': 'text'}, {'internalid': 'custcol_shiphawk_source_system_line_n', 'label': 'ShipHawk Source System Line Number', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier', 'label': 'Carrier Name', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier_service', 'label': 'Carrier Service', 'type': 'text'}]} /core/media/media.nl?id=922920&c=393682&h=1qP1ijidIPW2P4DK3Fi_jlV_N3UT-StJuJYKXsZSuMSrOrIn 109 32-bit Gigabit PCIe Network Adapter US$14.99 True 2.25 False 14.99 False False False TP-Link False True TG-3468 False False US$14.99 False US$14.99 True False False <div >In stock at College Station</div> 5.50 6.25 False - Default - InvtPart 109.0 TP-LINK 32-bit Gigabit PCIe Network Adapter 14.99 False TP-LINK-Gigabit-PCI-Express-Network-Adapter-TG-3468 14.99 PCI Adapters NaN NaN
1 0 False False True False {'urls': [{'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-LINK-TL-PA4010-KIT.01.jpg'}]} TP-Link 11/8/2021 False {'onlinecustomerprice_formatted': 'US$39.99', 'onlinecustomerprice': 39.99} 1.00 5406 {'fields': [{'internalid': 'custcol19', 'label': 'Item Length', 'type': 'float'}, {'internalid': 'custcol20', 'label': 'Item Width', 'type': 'float'}, {'internalid': 'custcol21', 'label': 'Item Height', 'type': 'float'}, {'internalid': 'custcol_tariff_fee_option', 'label': 'Tariff Fee', 'type': 'currency'}, {'internalid': 'custcol_tariff_fee', 'label': 'Tariff Fee Custom', 'type': 'currency'}, {'internalid': 'custcol_is_tariff', 'label': 'Is Tariff', 'type': 'checkbox'}, {'internalid': 'custcol26', 'label': 'Purchase Price', 'type': 'currency'}, {'internalid': 'custcol36', 'label': 'Not Kit Component', 'type': 'checkbox'}, {'internalid': 'custcol67', 'label': 'Is Tariff (Webstore)', 'type': 'text'}, {'internalid': 'custcol_shiphawk_proposed_shipment_id', 'label': 'ShipHawk Proposed Shipment ID', 'type': 'text'}, {'internalid': 'custcol_shiphawk_source_system_line_n', 'label': 'ShipHawk Source System Line Number', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier', 'label': 'Carrier Name', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier_service', 'label': 'Carrier Service', 'type': 'text'}]} /core/media/media.nl?id=875835&c=393682&h=blNs8_wT0YD2isH8-8LHyXuDVz82k4V5VxMsQVeVrrUeVsAE 94 AV500 Nano Powerline Ethernet Adapter Starter Kit, Twin Pack US$39.99 True 4.00 False 39.99 False False False TP-Link False True TL-PA4010 KIT False False US$39.99 False US$39.99 True False False <div >In stock at College Station</div> 6.00 8.00 False - Default - InvtPart 94.0 TP-LINK AV600 Powerline Starter Kit 39.99 False TP-LINK-TL-PA4010-KIT 39.99 Powerline Systems NaN NaN
2 0 False False True False {'urls': [{'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-UE300.01.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-UE300.02.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-UE300.03.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-UE300.04.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-UE300.05.jpg'}]} TP-Link 9/17/2021 False {'onlinecustomerprice_formatted': 'US$12.99', 'onlinecustomerprice': 12.99} 0.25 20996 {'fields': [{'internalid': 'custcol19', 'label': 'Item Length', 'type': 'float'}, {'internalid': 'custcol20', 'label': 'Item Width', 'type': 'float'}, {'internalid': 'custcol21', 'label': 'Item Height', 'type': 'float'}, {'internalid': 'custcol_tariff_fee_option', 'label': 'Tariff Fee', 'type': 'currency'}, {'internalid': 'custcol_tariff_fee', 'label': 'Tariff Fee Custom', 'type': 'currency'}, {'internalid': 'custcol_is_tariff', 'label': 'Is Tariff', 'type': 'checkbox'}, {'internalid': 'custcol26', 'label': 'Purchase Price', 'type': 'currency'}, {'internalid': 'custcol36', 'label': 'Not Kit Component', 'type': 'checkbox'}, {'internalid': 'custcol67', 'label': 'Is Tariff (Webstore)', 'type': 'text'}, {'internalid': 'custcol_shiphawk_proposed_shipment_id', 'label': 'ShipHawk Proposed Shipment ID', 'type': 'text'}, {'internalid': 'custcol_shiphawk_source_system_line_n', 'label': 'ShipHawk Source System Line Number', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier', 'label': 'Carrier Name', 'type': 'text'}, {'internalid': 
'custcol_shiphawk_carrier_service', 'label': 'Carrier Service', 'type': 'text'}]} /core/media/media.nl?id=7189171&c=393682&h=qYPfPWXvWc_Udet9IChlyz96qbiA25Y-jMsjg8svIFm-WHxm 79 US$12.99 True 0.67 False 12.99 False False False TP-Link False False UE300 False False US$12.99 False US$12.99 True False False <div >In stock at College Station</div> 3.35 6.10 False - Default - InvtPart 79.0 TP-Link USB 3.0 to Gigabit Ethernet Network Adapter 12.99 False TP-Link-UE300 12.99 USB Converters NaN NaN
3 0 False False True False {'urls': [{'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-LINK-2-4GHz-300Mbps-9dBi-Outdoor-CPE-CPE210.001.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-LINK-2-4GHz-300Mbps-9dBi-Outdoor-CPE-CPE210.002.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-LINK-2-4GHz-300Mbps-9dBi-Outdoor-CPE-CPE210.003.jpg'}]} TP-Link 9/22/2021 False {'onlinecustomerprice_formatted': 'US$39.99', 'onlinecustomerprice': 39.99} 1.65 5319 {'fields': [{'internalid': 'custcol19', 'label': 'Item Length', 'type': 'float'}, {'internalid': 'custcol20', 'label': 'Item Width', 'type': 'float'}, {'internalid': 'custcol21', 'label': 'Item Height', 'type': 'float'}, {'internalid': 'custcol_tariff_fee_option', 'label': 'Tariff Fee', 'type': 'currency'}, {'internalid': 'custcol_tariff_fee', 'label': 'Tariff Fee Custom', 'type': 'currency'}, {'internalid': 'custcol_is_tariff', 'label': 'Is Tariff', 'type': 'checkbox'}, {'internalid': 'custcol26', 'label': 'Purchase Price', 'type': 'currency'}, {'internalid': 'custcol36', 'label': 'Not Kit Component', 'type': 'checkbox'}, {'internalid': 'custcol67', 'label': 'Is Tariff (Webstore)', 'type': 'text'}, {'internalid': 'custcol_shiphawk_proposed_shipment_id', 'label': 'ShipHawk Proposed Shipment ID', 'type': 'text'}, {'internalid': 'custcol_shiphawk_source_system_line_n', 'label': 'ShipHawk Source System Line Number', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier', 'label': 'Carrier Name', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier_service', 'label': 'Carrier Service', 'type': 'text'}]} /core/media/media.nl?id=875579&c=393682&h=skaSM39aCBHsxoAkbixkUtedRt2h7qw6xp6EXKWbFg9QUAGA 71 Outdoor 2.4GHz 300Mbps High power Wireless Access 
Point US$39.99 True 4.10 False 39.99 False False False TP-Link False True CPE210 False False US$39.99 False US$39.99 True False False <div >In stock at College Station</div> 5.25 10.62 False - Default - InvtPart 71.0 TP-LINK 2.4GHz 300Mbps 9dBi Outdoor CPE CPE210 39.99 False TP-LINK-2-4GHz-300Mbps-9dBi-Outdoor-CPE-CPE210 39.99 2GHz PTP/PTMP NaN NaN
4 0 False False True False {'urls': [{'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-TL-WR902AC.011.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-TL-WR902AC.012.jpg'}, {'altimagetext': '', 'url': 'https://www.ispsupplies.com/SSP Applications/NetSuite Inc. - SCA Vinson/Development/product_images/images/TP-Link-TL-WR902AC.013.jpg'}]} TP-Link 11/29/2021 False {'onlinecustomerprice_formatted': 'US$39.99', 'onlinecustomerprice': 39.99} 0.60 5512 {'fields': [{'internalid': 'custcol19', 'label': 'Item Length', 'type': 'float'}, {'internalid': 'custcol20', 'label': 'Item Width', 'type': 'float'}, {'internalid': 'custcol21', 'label': 'Item Height', 'type': 'float'}, {'internalid': 'custcol_tariff_fee_option', 'label': 'Tariff Fee', 'type': 'currency'}, {'internalid': 'custcol_tariff_fee', 'label': 'Tariff Fee Custom', 'type': 'currency'}, {'internalid': 'custcol_is_tariff', 'label': 'Is Tariff', 'type': 'checkbox'}, {'internalid': 'custcol26', 'label': 'Purchase Price', 'type': 'currency'}, {'internalid': 'custcol36', 'label': 'Not Kit Component', 'type': 'checkbox'}, {'internalid': 'custcol67', 'label': 'Is Tariff (Webstore)', 'type': 'text'}, {'internalid': 'custcol_shiphawk_proposed_shipment_id', 'label': 'ShipHawk Proposed Shipment ID', 'type': 'text'}, {'internalid': 'custcol_shiphawk_source_system_line_n', 'label': 'ShipHawk Source System Line Number', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier', 'label': 'Carrier Name', 'type': 'text'}, {'internalid': 'custcol_shiphawk_carrier_service', 'label': 'Carrier Service', 'type': 'text'}]} /core/media/media.nl?id=1056731&c=393682&h=che7-nic7o8Sln8Cl1UJWkH_DVUv7VRlcJi9_va_9WP4bFwv 60 AC750 Portable Wi-Fi Travel Router, 2.4/5GHz US$39.99 True 3.00 False 39.99 False False False TP-Link False True TL-WR902AC False 
False US$39.99 False US$39.99 True False False <div >In stock at College Station</div> 4.50 4.50 False - Default - InvtPart 60.0 TP-Link AC750 Wireless Travel Router 2.4/5GHz 39.99 False TP-Link-TL-WR902AC 39.99 Wireless Routers NaN
Or just what's seen on the site:
print(df[['storedisplayname2',
'itemid',
'urlcomponent',
'onlinecustomerprice_formatted',
'quantityavailable']].head(5).to_string())
storedisplayname2 itemid urlcomponent onlinecustomerprice_formatted quantityavailable
0 TP-LINK 32-bit Gigabit PCIe Network Adapter TG-3468 TP-LINK-Gigabit-PCI-Express-Network-Adapter-TG-3468 US$14.99 109.0
1 TP-LINK AV600 Powerline Starter Kit TL-PA4010 KIT TP-LINK-TL-PA4010-KIT US$39.99 94.0
2 TP-Link USB 3.0 to Gigabit Ethernet Network Adapter UE300 TP-Link-UE300 US$12.99 79.0
3 TP-LINK 2.4GHz 300Mbps 9dBi Outdoor CPE CPE210 CPE210 TP-LINK-2-4GHz-300Mbps-9dBi-Outdoor-CPE-CPE210 US$39.99 71.0
4 TP-Link AC750 Wireless Travel Router 2.4/5GHz TL-WR902AC TP-Link-TL-WR902AC US$39.99 60.0
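The offset-based loop above can be tried without hitting the site. The sketch below keeps the same stop condition (a page shorter than the limit ends the loop) but replaces the HTTP request with a hypothetical `fetch_page` function that slices a made-up catalog; neither `fetch_page` nor `fetch_all` is part of the real API.

```python
def fetch_page(offset, limit, catalog):
    """Hypothetical stand-in for the API call: returns one page of items."""
    return catalog[offset:offset + limit]


def fetch_all(catalog, limit=100):
    """Collect every item by advancing the offset until a short page arrives."""
    items = []
    page = 0
    while True:
        batch = fetch_page(page * limit, limit, catalog)
        items.extend(batch)
        if len(batch) < limit:   # last page reached
            break
        page += 1
    return items


# Made-up catalog of 199 entries, mirroring the product count mentioned above.
catalog = [{"itemid": f"item-{i}"} for i in range(199)]
all_items = fetch_all(catalog, limit=100)
print(len(all_items))
```

Note that when the total is an exact multiple of the limit, this stop condition costs one extra request that returns an empty page.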

