Scraping data using a CSS selector (Python, BS4)

Time:02-02

I am scraping data with a CSS selector for the first time.

I'm having trouble scraping the content of an anchor (<a>) tag.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

# extract the fields of each featured post
for post in posts:
    title = post.find("span", {"class": "title"}).get_text()
    company = post.find("span", {"class": "company"}).get_text()
    location = post.find("span", {"class": "region company"}).get_text()
    link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")

    print({"title": title, "company": company, "location": location, "link": f"https://weworkremotely.com/{link}"})

I want to scrape the anchor's href so I can build a link to each post, so I added a[href] to the selector.

But it doesn't work: instead of the link, it scrapes the contents of every subcategory.

What do I need to change to scrape just the anchor's href?
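For context, `select()` always returns a list of matching `Tag` objects, even when only one element matches, so formatting its result into an f-string prints the tags' markup rather than a URL. A minimal sketch on a small, made-up HTML fragment (the markup is hypothetical, not the real weworkremotely.com page) shows the difference between `select()`, `select_one()` and reading the `href` attribute itself:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one job post; the real page differs.
html = """
<li class="feature">
  <a href="/company/acme">Acme</a>
  <a href="/remote-jobs/acme-ruby-developer">
    <span class="title">Ruby Developer</span>
  </a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
post = soup.select_one("li.feature")

# select() returns a list of Tag objects, even for a single match.
matches = post.select("a[href]")
print(len(matches))

# select_one() returns only the first matching Tag (or None if nothing matches).
first_link = post.select_one("a[href]")

# The attribute value itself is read by indexing the Tag (or with .get()).
print(first_link["href"])
```

Note that the first `a[href]` in this fragment is the company link, not the job link, which is why a more specific selector is needed.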

Answer:

Assuming you have correctly selected the jobs of interest from all the jobs listed, you need a loop; inside it, extract the first href attribute containing the substring -jobs, i.e. post.select_one('a[href*=-jobs]')['href']:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

for post in posts:
    print('https://weworkremotely.com' + post.select_one('a[href*=-jobs]')['href'])
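Since the question's goal was a dict per post rather than just the URL, here is a minimal sketch that combines the `a[href*=-jobs]` selector with the question's title, company and location fields. The markup, the `parse_post` helper and `BASE_URL` are hypothetical, and the class names are copied from the question rather than checked against the live site. It quotes the attribute value (`"-jobs"`), which CSS treats the same as the unquoted form, and uses `urllib.parse.urljoin` instead of string concatenation so a leading slash in the href cannot produce a double slash.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://weworkremotely.com"

# Hypothetical markup for one listing; class names mirror the question's code,
# but the live page may be structured differently.
html = """
<ul>
  <li class="feature">
    <a href="/company/acme">Acme logo</a>
    <a href="/remote-jobs/acme-senior-ruby-developer">
      <span class="company">Acme</span>
      <span class="title">Senior Ruby Developer</span>
      <span class="region company">Anywhere in the World</span>
    </a>
  </li>
</ul>
"""


def parse_post(post):
    """Return a dict for one listing, or None if it has no job link (hypothetical helper)."""
    # [href*="-jobs"] matches any href containing the substring "-jobs",
    # which skips the /company/... link that comes first.
    anchor = post.select_one('a[href*="-jobs"]')
    if anchor is None:
        return None
    return {
        "title": post.select_one("span.title").get_text(strip=True),
        # the first span with class "company" in document order
        "company": post.select_one("span.company").get_text(strip=True),
        "location": post.select_one("span.region").get_text(strip=True),
        # urljoin resolves "/remote-jobs/..." against the site root
        "link": urljoin(BASE_URL, anchor["href"]),
    }


soup = BeautifulSoup(html, "html.parser")
posts = soup.find_all("li", {"class": "feature"})
jobs = [job for job in map(parse_post, posts) if job is not None]
for job in jobs:
    print(job)
```

Returning `None` for posts without a matching link keeps one unexpected `li` from raising a `TypeError` on `None["href"]` and aborting the whole loop.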

To get all the listings on the page, not just the featured ones, switch to:

posts = wwr_soup.select('li:has(.tooltip)')
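`li:has(.tooltip)` uses the CSS `:has()` relational pseudo-class, which Beautiful Soup's selector engine (Soup Sieve) supports: it keeps every `li` that contains at least one descendant with class `tooltip`, whether or not the `li` itself has the `feature` class. A small sketch on made-up markup, assuming (as the answer implies) that every real listing contains a `.tooltip` element and other list items do not:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two listings (one featured) and one non-listing item.
html = """
<ul>
  <li class="feature"><div class="tooltip">Featured</div><a href="/remote-jobs/a-ruby-dev">A</a></li>
  <li><div class="tooltip">New</div><a href="/remote-jobs/b-ruby-dev">B</a></li>
  <li class="view-all"><a href="/categories/all">View all</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Only <li> elements whose own class list includes "feature".
featured_only = soup.find_all("li", {"class": "feature"})

# Any <li> with a descendant matching .tooltip, featured or not.
all_listings = soup.select("li:has(.tooltip)")

print(len(featured_only), len(all_listings))
```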