I can't access some <div> elements using bs4?

Time:01-23

containers=page_soup.findAll("div",{"class":"item-container"})
containers[0].div.div

When I run this code, it outputs the following:

<div ><a  href="https://www.newegg.com/Yeston-GeForce-GTX-1050-Ti-GTX1050Ti-4G/p/27N-0042-00041?cm_sp=SH-_-946241-_-8-_-2-_-9SIAZUEEV65926-_-graphics-_-graphic-_-4&amp;Item=9SIAZUEEV65926&amp;IsFeedbackTab=true#scrollFullInfo" title="Rating   5"><i ></i><span >(<!-- -->3<!-- -->)</span></a></div>

It seems to skip a <div> element nested inside this <div>, identified as . How can I access it?

Answer:

What happens?

Some containers hold sponsored products and have a slightly different structure, in which the tag with is an <a>:

<div >
    <div >
    <div >
        <div >
            <a href="https://www.newegg.com/Sabrent/BrandStore/ID-8281" >
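To see why the same chained lookup lands in different places, here is a small self-contained sketch. The HTML is hypothetical and simplified (the class names item-info, item-branding and item-label are assumptions, not Newegg's real markup). It only illustrates that tag.div is shorthand for tag.find("div"), i.e. the first descendant <div> in document order, so one extra wrapper changes what .div.div returns.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: class names other than item-container,
# item-brand and item-sponsored are assumptions for illustration only.
HTML = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand" href="#"><img title="ASUS"></a>
    </div>
  </div>
</div>
<div class="item-container">
  <div class="item-sponsored">
    <div class="item-label">Sponsored</div>
  </div>
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand" href="#"><img title="Sabrent"></a>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
containers = soup.find_all("div", {"class": "item-container"})

# .div means .find("div"): the FIRST descendant <div> in document order,
# so the sponsored wrapper shifts where .div.div lands.
results = [container.div.div["class"] for container in containers]

print(results)  # [['item-branding'], ['item-label']]
```

In the regular container the chain reaches the branding <div>; in the sponsored one it stops inside the sponsored wrapper instead.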

How to fix?

There are different strategies:

  1. Select your elements more specifically and handle None results:

    containers[0].find("a", {"class": "item-brand"})

  2. Select only the containers without sponsored products:

    containers = soup.select('div.item-container:not(:has(.item-sponsored))')
    containers[0].div.div.a
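Both strategies can be tried offline against a similar hypothetical snippet (the markup is an assumption for illustration, not copied from Newegg). Note that select() with :has() relies on the soupsieve package, which bs4 4.7 and later installs as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one regular container and one sponsored container
# whose extra wrapper <div> comes first.
HTML = """
<div class="item-container">
  <div class="item-info"><div class="item-branding">
    <a class="item-brand" href="#"><img title="ASUS"></a>
  </div></div>
</div>
<div class="item-container">
  <div class="item-sponsored"><div class="item-label">Sponsored</div></div>
  <div class="item-info"><div class="item-branding">
    <a class="item-brand" href="#"><img title="Sabrent"></a>
  </div></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Strategy 1: look up the target element directly and handle a None result.
brands = []
for container in soup.find_all("div", {"class": "item-container"}):
    link = container.find("a", {"class": "item-brand"})
    brands.append(link.img["title"] if link and link.img else None)

# Strategy 2: keep only containers without a sponsored marker, where the
# positional .div.div.a chain is reliable in this sample.
regular = soup.select("div.item-container:not(:has(.item-sponsored))")
regular_brands = [c.div.div.a.img["title"] for c in regular]

print(brands)          # ['ASUS', 'Sabrent']
print(regular_brands)  # ['ASUS']
```

Strategy 1 is usually the more robust choice, because it does not depend on how many wrapper <div>s precede the element you want.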
    

Note: In new code, use find_all() instead of the old findAll() syntax.
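A quick sketch showing that the two spellings return the same matches: findAll is still accepted as a backwards-compatible alias of find_all (newer bs4 releases may warn that it is deprecated). The class_ keyword shown here is bs4's alternative to passing an attribute dict.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div class="item-container"></div><div class="item-container"></div>',
    "html.parser",
)

# Old camelCase name, kept as an alias for backwards compatibility
old_style = soup.findAll("div", {"class": "item-container"})
# Preferred snake_case name; class_ avoids clashing with Python's keyword
new_style = soup.find_all("div", class_="item-container")

print(len(old_style), len(new_style), old_style == new_style)  # 2 2 True
```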

Example

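import requests
from bs4 import BeautifulSoup

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38'
page = requests.get(url)
# Name the parser explicitly; otherwise bs4 picks one and warns about it
soup = BeautifulSoup(page.text, 'html.parser')

data = []

for item in soup.find_all('div', {'class': 'item-container'}):
    # find() returns None when a container lacks the element, so check before using it
    title = item.find('a', {'class': 'item-title'})
    brand = item.find('a', {'class': 'item-brand'})
    data.append({
        'title': title.text if title else None,
        'brand': brand.img.get('title') if brand and brand.img else None
    })

print(data)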

Output

[{'title': 'ASRock Phantom Gaming D Radeon RX 6500 XT Video Card RX6500XT PGD 4GO', 'brand': 'ASRock'}, {'title': 'GIGABYTE Eagle Radeon RX 6500 XT Video Card GV-R65XTEAGLE-4GD', 'brand': 'GIGABYTE'}, {'title': 'PowerColor AMD Radeon RX 6500XT ITX Gaming Graphics Card 4GB GDDR6', 'brand': 'PowerColor'}, {'title': 'ASUS TUF Gaming Radeon RX 6900 XT Video Card TUF-RX6900XT-O16G-GAMING', 'brand': 'ASUS'}, {'title': 'ASUS Dual Radeon RX 6500 XT Video Card DUAL-RX6500XT-O4G', 'brand': 'ASUS'}, {'title': 'MSI Mech Radeon RX 6500 XT Video Card RX 6500 XT MECH 2X 4G OC', 'brand': 'MSI'}, {'title': 'SAPPHIRE PULSE Radeon RX 6500 XT Video Card 11314-01-20G', 'brand': 'Sapphire Tech'}, {'title': 'MSI Ventus GeForce RTX 3080 Ti Video Card RTX 3080 Ti VENTUS 3X 12G', 'brand': 'MSI'}, {'title': 'EVGA GeForce RTX 3080 Ti XC3 GAMING Video Card, 12G-P5-3953-KR, 12GB GDDR6X, iCX3 Cooling, ARGB LED, Metal Backplate', 'brand': 'EVGA'}, {'title': 'ASUS ROG Strix NVIDIA GeForce RTX 3080 OC Edition Gaming Graphics Card (PCIe 4.0, 12GB GDDR6X, LHR, HDMI 2.1, DisplayPort 1.4a, Axial-tech Fan Design, 2.9-slot, Super Alloy Power II, GPU Tweak II)', 'brand': 'ASUS'}, {'title': 'ASUS Noctua OC Edition GeForce RTX 3070 Video Card RTX3070-O8G-NOCTUA (LHR)', 'brand': 'ASUS'}, {'title': 'GIGABYTE Eagle GeForce RTX 3080 Ti Video Card GV-N308TEAGLE OC-12GD', 'brand': 'GIGABYTE'}]