Web Scraper not pulling text

Time:02-01

I'm trying to make a web scraper that grabs data from the Milwaukee tools website. I can make a request and download the page, but I can't seem to get the title text. All I get is <div v-html="result.Title"></div>, which is not the data I need. What I want it to return is <div>M18 FUEL™ HAMMERVAC™ 1-1/8” Dedicated Dust Extractor</div>, which is the first entry on the page. This is my code:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.milwaukeetool.com/Products/Power-Tools/Drilling').text
soup = BeautifulSoup(html_text, 'html5lib')
tools = soup.find('div', class_ = 'product-listing__result')
name = tools.find('div', class_ = 'result-title')
print(name)
sku = tools.find('div', class_="result-sku")

Any help is appreciated.

CodePudding user response:

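I think you should scrape the title from each product's own page. When you request the HTML of the listing page, https://www.milwaukeetool.com/Products/Power-Tools/Drilling, some tags are not filled in yet, so the data you want is not in the response. The v-html attribute in your output is a Vue.js template binding: the titles are inserted by JavaScript in the browser, and requests does not run JavaScript.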
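As a quick way to confirm that a downloaded page still contains unrendered templates, here is a minimal sketch that lists every v-html binding in a piece of HTML. It uses only the standard library's html.parser instead of BeautifulSoup, and the names VueBindingFinder and find_unrendered_bindings are made up for this example. The sample string is the tag quoted in the question, not a fresh download.

```python
from html.parser import HTMLParser


class VueBindingFinder(HTMLParser):
    """Collects the expressions bound with v-html in raw markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.bindings = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        for name, value in attrs:
            if name == "v-html":
                self.bindings.append(value)


def find_unrendered_bindings(html_text):
    """Return the v-html expressions found in html_text, in document order."""
    finder = VueBindingFinder()
    finder.feed(html_text)
    return finder.bindings


# The tag from the question: a template placeholder, not a rendered title.
sample = '<div v-html="result.Title"></div>'
print(find_unrendered_bindings(sample))
```

If the list is not empty for the text returned by requests.get, the page is most likely filled in by JavaScript, and the data has to come from somewhere else, such as the individual product pages.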

Something like this may get you the data you want:

import requests
from bs4 import BeautifulSoup
  

# Add every product URL you want a title from to this list.
urls = [
    'https://www.milwaukeetool.com/Products/Power-Tools/Drilling/2915-DE',
    'https://www.milwaukeetool.com/Products/Power-Tools/Drilling/2706-20',
]

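for url in urls:
    reqs = requests.get(url)
    # parse the page with BeautifulSoup
    soup = BeautifulSoup(reqs.text, 'html.parser')
    # the product title is in the <h1> with class product-info__title
    title = soup.find('h1', class_='product-info__title')
    # find() returns None when the tag is missing, so check before reading .text
    if title is not None:
        print(title.text.strip())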

CodePudding user response:

If you want the list of result titles, in no particular order, you can query the site's product-listing API directly:

import requests
import json

params = {
    'Availability': False,
    'Categories': 'BA16CBC0-793E-407A-AED6-DDBB1359AA10',
    'FacetList': '501a4ad5-8e79-40bf-b125-d1cdc48d49ea|d77896af-2dc6-4e90-8d40-3a7c211bb04e|135e489f-c19c-46cd-834f-93092fe8da25|a438883b-2015-4f97-91c8-9a1f1fc5de40',
    'Fuel': False,
    'Language': 'en',
    'NumberFacetValues': 8,
    'OneKey': False
}

# params= sends these values in the URL query string of the POST request
page = requests.post('https://www.milwaukeetool.com/api/sitecore/products/GetProductsByProductListingQuery', params=params)
data = json.loads(page.content)

for item in data['Results']:
    print(item['Title'])
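If you need more than the title from each result, a small helper along these lines may be handy. It is only a sketch: the answer above confirms the 'Results' and 'Title' keys, while 'Sku' is a guess at a field name and the sample payload is hand-written, so print one real item first to see which keys the API actually returns. The function name extract_fields is made up for this example.

```python
def extract_fields(data, fields=("Title",)):
    """Return one dict per result, keeping only the requested fields.

    Missing fields come back as None instead of raising KeyError.
    """
    rows = []
    for item in data.get("Results", []):
        rows.append({field: item.get(field) for field in fields})
    return rows


# Hand-written sample shaped like the response used above.
# 'Sku' and its value are assumptions, not taken from the real API.
sample = {
    "Results": [
        {"Title": "M18 FUEL™ HAMMERVAC™ 1-1/8” Dedicated Dust Extractor", "Sku": "EXAMPLE-SKU"},
    ]
}

for row in extract_fields(sample, fields=("Title", "Sku")):
    print(row["Title"], row["Sku"])
```

With the real response you would call extract_fields(data, fields=(...)) using whatever key names the API turns out to provide.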

I hope this helps.
