from bs4 import BeautifulSoup
import requests
url13cases = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'
r = requests.get(url13cases)
soup = BeautifulSoup(r.text, 'html.parser')
img = soup.findAll('img',{"class":"attachment-woocommerce_thumbnail size-woocommerce_thumbnail"})
I am trying to scrape all the pictures from my friend's website, but the products are spread over a few pages. I want to know how to edit the URL so that it also goes to the second, third, and fourth pages. I also want to create an array or objects for each link.
The link for page 2 looks like this: https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/2/
It's the same as the first link, with an extra /page/2/ at the end. There are 2 more pages after that, for 4 pages total. How do I get all of them and create the objects?
CodePudding user response:
You could use the built-in function range() to iterate over the pages.
In newer code, avoid the old findAll() syntax and use find_all(), or select() with CSS selectors, instead. For more, take a minute to check the docs.
Example
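from bs4 import BeautifulSoup
import requests

img_list = []
# Pages 1 to 4; range() excludes the upper bound
for i in range(1, 5):
    r = requests.get(f'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/{i}/')
    # Name the parser explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(r.text, 'html.parser')
    # Collect the thumbnail <img> tags from this page
    img_list.extend(soup.find_all('img', {"class": "attachment-woocommerce_thumbnail size-woocommerce_thumbnail"}))

img_list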
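
If you also want "objects" for each link rather than raw tags, here is a minimal sketch of one way to do it. The helper names build_page_urls and images_to_records are hypothetical (they are not part of BeautifulSoup or requests), and the sketch assumes that page 1 is the base URL itself and that each img tag carries its address in the src attribute, which may not hold if the site lazy-loads images.

```python
from urllib.parse import urljoin

BASE_URL = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'


def build_page_urls(base_url, last_page):
    """Return the listing URL for every page from 1 to last_page."""
    urls = [base_url]  # assumption: page 1 is the base URL itself
    for page in range(2, last_page + 1):
        urls.append(urljoin(base_url, f'page/{page}/'))
    return urls


def images_to_records(img_tags, page_url):
    """Turn <img> tags (or any objects with a dict-like .get()) into plain dicts."""
    records = []
    for tag in img_tags:
        src = tag.get('src')
        if not src:
            continue  # skip tags without a usable source
        records.append({
            'page': page_url,               # page the image was found on
            'src': urljoin(page_url, src),  # resolve relative paths
            'alt': tag.get('alt', ''),
        })
    return records
```

Inside the loop above you could then iterate over build_page_urls(BASE_URL, 4) instead of range(1, 5), and call images_to_records(soup.find_all(...), url) on each page's result before extending your list. Plain dicts keep the result easy to dump to JSON or CSV later.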
