Nokogiri not showing search-result

Time:01-24

Today I wanted to try out Nokogiri (Ruby) to list the addresses which are listed on this site: https://www.funda.nl/koop/rotterdam/straat-oostzeedijk/

I tried to show the addresses in the debugger, following this video: https://www.youtube.com/watch?v=b3CLEUBdWwQ

The results are

  • Oostzeedijk 6 B01
  • Oostzeedijk 166 C

Their class is called "search-result__header-title".

I tried different things, such as selecting div elements, but I can't get the results to show.

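require 'nokogiri'
require 'httparty'
require 'byebug'

def scraper
  url = "https://www.funda.nl/koop/rotterdam/straat-oostzeedijk/"
  # Fetch the page; this returns only the HTML the server sends back.
  unparsed_page = HTTParty.get(url)
  # Parse the raw response body into a searchable document.
  parsed_page = Nokogiri::HTML(unparsed_page.body)
  # Pause here to inspect parsed_page in the debugger.
  byebug
end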

scraper

In the debugger I have tried this:

(byebug) parsed_page 

This gives me a result, but when I narrow it down with a selector, the result is empty:

(byebug) parsed_page.css('div.search-content-output')    
[]

Can somebody give me a hint? I am stuck.

CodePudding user response:

The problem is that on the URL you are using (https://www.funda.nl/koop/rotterdam/straat-oostzeedijk/), content is loaded asynchronously.

The tutorial you're following assumes a "simple" web-page, where all of the page's content is loaded immediately. But for your scenario, unparsed_page is initially missing lots of page content that only loads later.
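
One rough way to confirm this before reaching for a browser is to check whether the class name appears anywhere in the raw markup the server returns. The sketch below uses only the standard library; the helper name class_in_markup? and both sample HTML strings are made up for illustration and are not taken from the real site.

```ruby
# Rough diagnostic: does any class attribute in the raw markup contain
# css_class as a whole token? This is a regex scan, not a real HTML parse.
def class_in_markup?(html, css_class)
  html.scan(/class\s*=\s*["']([^"']*)["']/i).flatten.any? do |value|
    value.split.include?(css_class)
  end
end

# Hypothetical markup: an empty shell, as a server might send before any
# script has run, and the same page after its content has been rendered.
static_shell = '<html><body><div id="content"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><div class="search-content-output">' \
           '<h2 class="search-result__header-title">Oostzeedijk 6 B01</h2></div></body></html>'

puts class_in_markup?(static_shell, 'search-content-output') # false
puts class_in_markup?(rendered, 'search-content-output')     # true
```

For the real page you would pass the response body instead, for example class_in_markup?(HTTParty.get(url).body, 'search-content-output'). If that prints false, the element is not in the HTML that Nokogiri receives, so no selector will find it.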

So what you need to do here is run code that actually mimics the behaviour of a user interacting with the website. There are many libraries designed to do this, so my solution below is certainly not the only option available, but hopefully you will find this concrete example useful.

I will be using Google Chrome, chromedriver and the Ruby library watir. Prerequisites:

  1. Install chromedriver. This step will vary depending on your operating system. For example, on macOS, you can probably just run brew install chromedriver.
  2. gem install watir

The code:

require 'watir'

b = Watir::Browser.new :chrome
b.goto("https://www.funda.nl/koop/rotterdam/straat-oostzeedijk/")
puts b.div(class: 'search-content-output').text

Result:

Hartschelp 111 Monster, € 1.395.000 k.k.

Uitgelicht door Kolpa van der Hoek Makelaars Rotterdam

Buitenbassinweg 506 Rotterdam, € 495.000 k.k.

Uitgelicht door Oranje Bouwgroep B.V.

Van der Duijn van Maasdamweg 614 O.3.6. Rotterdam, € 890.000 v.o.n.

...

Note that this website also seems to have a CAPTCHA to deter web scrapers. However, it appears to be misconfigured: at the moment the EU cookie consent popup appears before the CAPTCHA, which makes the CAPTCHA somewhat useless.
