I am building a bot to check the stock of various Ubiquiti UniFi devices on their shop page (hey, these things are disappearing FAST) and I need some help. I've been searching all day for something like this, but none of the approaches I've seen here have quite worked.
I'm using the code below to access the UI.com shop (https://store.ui.com/). They, quite conveniently, include the in-stock product information in a script in the <head> of every page. I'm using Selenium to load the home page, and I need to access this:
<script data-ot-ignore type="text/javascript">
window.APP_DATA = {
assets: {...},
cart: {"note":null,"attributes":{"quantity-hdds":"{\"4446782390361\"=\u003e{\"0\"=\u003e{\"sku\"=\u003e\"HDD-1TB\", \"ratio\"=\u003e\"1\"}, \"1\"=\u003e{\"sku\"=\u003e\"HDD-8TB\", \"ratio\"=\u003e\"1\"}}}"},"original_total_price":0,"total_price":0,"total_discount":0,"total_weight":0.0,"item_count":0,"items":[],"requires_shipping":false,"currency":"USD","items_subtotal_price":0,"cart_level_discount_applications":[]},
cartAccessories: [{
"id": 4446782390361,
"title": "Dream Machine Pro",
"handle": "udm-pro",
"url": "\/products\/udm-pro",
"tags": ["#HDD-1TB","#HDD-8TB","ALT","ALT::udm-pro","bestseller","enhanced-wizard","featured","mx29","recommended","redirect-wizard","related","UI::1U","UI::AI","UI::Cloud Key","UI::HDD","UI::Network","UI::SFP ","UI::UniFi","unifi"],
"featured_image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008",
"variants": [{
"id": 32264307703897,
"title": "Default Title",
"price": 37900,
"sku": "UDM-Pro",
"available": true,
"inventory_empty":false,
"inventory_policy": "deny",
"image": "//cdn.shopify.com/s/files/1/1439/1668/products/UDM-Pro_front-top-angle_53e97c87-61d9-4f3e-acad-6ba113bbf5de_small.png?v=1629983008"
},],
"data":{"for":{"product-vendors":[],"product-types":["VoIP","Access","Surveillance"]},"type":"UDM-PRO","view":{"default":["UAP-nanoHD-US","UAP-FlexHD-US","UWB-XG-US","UAP-IW-HD-US","UAP-AC-HD-US","UAP-AC-M-US","UAP-BeaconHD-US","UAP-AC-PRO-US","UAP-AC-LITE-US","UAP-AC-LR-US","UAP-AC-IW-US","UAP-AC-M-PRO-US","UAP-AC-SHD-US","UAP-XG-US","UAP-AC-EDU-US","USW-48-POE","USW-24","USW-Pro-24","USW-48-BETA","USW-Lite-16-PoE-BETA","USW-LEAF-BETA","USW-16-PoE","USW-24-PoE","USW-Pro-48-PoE","USW-Pro-24-PoE","USW-Pro-48o","USW-Industrial","UVC-G4-DoorBell","UVC-G3-FLEX","UP-Sense-BETA","UP-Sense","*","!UT-ATA-BETA","!UT-Conference-BETA"],"checkout":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US","U6-IW-US-BETA","U6-Extender-US-BETA","U6-Lite-US-BETA"],"bundle":["!UDM-Pro","!UDM-US","UVC-G4-DoorBell","UWB-XG-US","UAP-AC-HD-US","UAP-FlexHD-US","UAP-IW-HD-US","UAP-nanoHD-US","UAP-BeaconHD-US","UAP-XG-US","UAP-AC-SHD-US","UAP-AC-EDU-US","UAP-AC-M-PRO-US","UAP-AC-PRO-US","UAP-AC-LR-US","UAP-AC-M-US","UAP-AC-IW-US","UAP-AC-LITE-US"]},"priority":1,"description":"","countries":[]}},{
Now, I don't have a lot of experience with JavaScript, but it looks like my data of interest is basically a JavaScript array of objects inside another object? (The [{}] structure of "cartAccessories", where everything is.) Inspecting the element in the page source gives me "/html/head/script[33]" as the XPath for the script... I think. It seems to return different data almost every time.
I'm using the following basic code to get the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
urlpage = 'https://store.ui.com/'
print(urlpage)
driver = webdriver.Firefox()
# get web page
driver.get(urlpage)
time.sleep(1)
print("Getting Results.")
results = driver.find_element(By.XPATH, "/html/head/script[33]")
html = results.get_attribute('innerHTML')
print(f"The results are: {html}")
driver.quit()
But this doesn't seem to be right. I want to get that "cartAccessories" information into a Python list so I can work with it. What's the best way to access this information? Am I going about this all wrong?
CodePudding user response:
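You can use a regex to grab the overarching JavaScript object (window.APP_DATA) that contains the array of interest, then pass that string to hjson, a third-party parser that tolerates the unquoted keys and trailing commas that the standard json module would reject. Finally, extract the cartAccessories item, which comes back as a Python list of dicts, and do what you want with it. No browser is needed for this, since the data is already in the raw HTML.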
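import re
import requests
import hjson  # third-party: pip install hjson
r = requests.get('https://store.ui.com/')
# Capture everything after "window.APP_DATA = " up to the next "<" (the closing </script> tag).
match = re.search(r'window\.APP_DATA = (.*?)<', r.text, re.S)
data = hjson.loads(match.group(1))
print(data['cartAccessories'])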
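Once data['cartAccessories'] is a Python list, checking stock is a matter of walking each product's "variants" and reading the "available" flag shown in the question. Here is a minimal sketch of that step. It runs against a trimmed, hand-written sample rather than the live page; the second product and the helper name in_stock_skus are made up for illustration, and it assumes "available" really does reflect stock status on the site.

```python
# Trimmed sample shaped like entries of data['cartAccessories'] in the question.
sample_accessories = [
    {
        "title": "Dream Machine Pro",
        "handle": "udm-pro",
        "variants": [
            {"sku": "UDM-Pro", "price": 37900, "available": True},
        ],
    },
    {
        # Hypothetical out-of-stock entry, added only for illustration.
        "title": "Example Device",
        "handle": "example-device",
        "variants": [
            {"sku": "EXAMPLE-SKU", "price": 9900, "available": False},
        ],
    },
]


def in_stock_skus(accessories):
    """Return (title, sku) pairs for every variant flagged as available."""
    found = []
    for product in accessories:
        # .get() with a default keeps the loop safe if a key is missing.
        for variant in product.get("variants", []):
            if variant.get("available"):
                found.append((product.get("title"), variant.get("sku")))
    return found


print(in_stock_skus(sample_accessories))
```

With the real page you would call in_stock_skus(data['cartAccessories']) instead of passing the sample list.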
