Web scraping with R and rvest when the page content is rendered by JavaScript

Time:02-03

I am attempting to scrape the webpage https://www.filmweb.no/kinotoppen/ for the title and other information listed under each movie. For other webpages, a couple of lines with html_nodes() and html_text(), using SelectorGadget to pick the CSS selectors, have been enough to get what I wanted, like this:

library(rvest)

# Download the static HTML and extract the text of every node matching the selector
html <- read_html("https://www.filmweb.no/kinotoppen/")
title <- html %>% 
  html_nodes(".Kinotoppen_MovieTitle__2MFbT") %>% 
  html_text()

However, when I run those lines on this webpage I only get an empty character vector. Inspecting the page further, I see that it loads its content with JavaScript. I tried extracting the scripts with html_nodes("script") and running them with the V8 package, but to no avail. I'm also unsure which scripts to run, so I tried them all, like this:

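library(V8)
scripts <- html %>% html_nodes("script") %>% html_text()  # script bodies, as described above
ct <- v8()
ct$eval(scripts[3])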

Is there an easier way, in general, to get the webpage into a form where I can just use rvest? I do not know anything about JavaScript.

Answer:

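The titles are inserted by JavaScript after the page loads, so read_html() only sees the initial HTML, which does not contain them yet. Here's what it would look like using RSelenium to drive a real browser, let the page render, and then hand the rendered source to rvest.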

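library(rvest)
library(RSelenium)

# Start a Selenium server and a Chrome session (Chrome must be installed)
remDr <- rsDriver(browser='chrome', port=4444L)
brow <- remDr[["client"]]
brow$open()  # rsDriver() normally opens the client already, so this may be redundant
brow$navigate("https://www.filmweb.no/kinotoppen/")

# Grab the HTML after the browser has run the page's JavaScript
h <- brow$getPageSource()
h <- read_html(h[[1]])
h %>% html_nodes(".Kinotoppen_MovieTitle__2MFbT") %>% 
  html_text()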
# [1] "Spider-Man: No Way Home"              "Clifford: Den store røde hunden"      "Lise & Snøpels - Venner for alltid"  
# [4] "Familien Voff - alle trenger en venn" "Nightmare Alley"                      "Snødronningen"                       
# [7] "Scream"                               "Bergman Island"                       "Trøffeljegerne fra Piemonte"         
# [10] "Encanto"                             
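
If the browser hands back the source before the scripts have finished, the selector can still come up empty. One possible workaround is to poll the page source until the nodes appear. The sketch below is only an illustration: wait_for_text() is a hypothetical helper (not part of rvest or RSelenium), the .MovieTitle class is made up, and fake_source() stands in for a real browser so the example runs without Selenium.

```r
library(rvest)

# Hypothetical helper: call `get_source()` repeatedly until `css` matches at
# least one node or `timeout` seconds have passed, then return the node text.
wait_for_text <- function(get_source, css, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    page <- read_html(get_source())
    found <- html_text(html_nodes(page, css))
    if (length(found) > 0 || Sys.time() >= deadline) {
      return(found)
    }
    Sys.sleep(interval)
  }
}

# Stand-in for a browser (an assumption for this demo): the first two calls
# return the unrendered shell, later calls return the markup after rendering.
shell <- '<html><body><div id="root"></div></body></html>'
rendered <- paste0(
  '<html><body><div id="root">',
  '<span class="MovieTitle">Scream</span>',
  '<span class="MovieTitle">Encanto</span>',
  '</div></body></html>'
)
calls <- 0
fake_source <- function() {
  calls <<- calls + 1
  if (calls < 3) shell else rendered
}

titles <- wait_for_text(fake_source, ".MovieTitle", timeout = 5, interval = 0.1)
print(titles)
```

With the RSelenium session from above, the getter would presumably be function() brow$getPageSource()[[1]] and the selector ".Kinotoppen_MovieTitle__2MFbT". If nothing matches before the timeout, the helper returns an empty character vector, the same result the plain read_html() attempt gave.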

