In Scrapy, how to proceed to parse method after getting HTTP403

Time:01-07

I'm trying to scrape a website which returns HTTP403 if JavaScript is not enabled.

The approach I'm trying to implement is that, in the parse method, a Selenium driver takes the URL from response.request.url and fetches the page.

But the issue I'm facing is that the request is closed automatically after getting HTTP403, and the parse method is never entered.

Here is my code:

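import scrapy
from selenium import webdriver


class SampleSpider(scrapy.Spider):

    name = "sample_spider"
    start_urls = ["https://website_that_returning_403.com"]

    def parse(self, response):
        # Re-fetch the same URL in a real browser so JavaScript runs
        bot = webdriver.Chrome()
        bot.get(response.request.url)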

CodePudding user response:

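By default, Scrapy's HttpErrorMiddleware drops responses whose status is outside the 200-300 range, so they never reach your callback. To handle such a status, list it in the handle_httpstatus_list spider attribute, as below: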

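import scrapy
from selenium import webdriver


class SampleSpider(scrapy.Spider):

    name = "sample_spider"
    # Let 403 responses through to parse() instead of dropping them
    handle_httpstatus_list = [403]
    start_urls = ["https://website_that_returning_403.com"]

    def parse(self, response):
        # Re-fetch the same URL in a real browser so JavaScript runs
        bot = webdriver.Chrome()
        bot.get(response.request.url)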

Read more about it in the Scrapy docs on HttpErrorMiddleware.
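
To see why parse was never reached in the original spider, here is a rough sketch of the filtering decision. It is a simplified model for illustration only, not Scrapy's actual code: the function name reaches_callback is made up, and it ignores per-request meta keys and the handle_httpstatus_all option.

```python
def reaches_callback(status, handle_httpstatus_list=()):
    """Simplified model of Scrapy's default response filtering.

    Hypothetical helper for illustration; not part of Scrapy's API.
    """
    # Successful responses are always passed to the callback.
    if 200 <= status < 300:
        return True
    # Anything else only gets through if it was explicitly allowed.
    return status in handle_httpstatus_list


print(reaches_callback(200))         # True
print(reaches_callback(403))         # False: dropped, parse() never runs
print(reaches_callback(403, [403]))  # True: allowed via handle_httpstatus_list
```

With the attribute set, a 403 response reaches parse like any other, so it can be worth checking response.status there before deciding what to do with it.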
