BeautifulSoup python module can't find text in a website

Time:01-26

I am trying to scrape the temperature from weather.com using Beautiful Soup. When I open the URL and inspect the element, the text I am looking for, 8:00 pm, is on the page. However, the code outputs a NoneType object and can't find any instance of the text. I also tried weather_entry=soup.find(text="8.00"), and that didn't yield any results either.

import requests
import re
from bs4 import BeautifulSoup

def weather():
    url='https://weather.com/weather/hourbyhour/l/823266028e3362e3a9578cfe64cb1c6ac654c492d22b41dbe3ac567cd31e1083'

    # open with GET method
    resp=requests.get(url)

    # HTTP response 200 means OK status
    if resp.status_code==200:
        soup=BeautifulSoup(resp.text,'html.parser')

        # this line is the problem; .find("8:00") and .find(text=re.compile("8:00")) don't work either
        weather_entry=soup.find(text=re.compile("8:00 pm"))

        print(str(weather_entry) + "\n")
        print(weather_entry.get_text())

    else:
        print("Error")

weather()

CodePudding user response:

I think the weather information you are trying to find is loaded by JavaScript rather than included in the HTML that requests receives. If you switch to the Debugger tab in the developer tools (I'm using Firefox), you will see a folder called hourly/assets, which contains a lot of JS scripts.
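One way to test this suspicion without the developer tools is to look at where the text sits in the raw response. The sketch below uses only the standard library; the helper name locate_text and the sample HTML strings are made up for illustration and are not weather.com's actual markup. It reports whether a pattern appears in ordinary markup, only inside a script tag, or not at all.

```python
import re
from html.parser import HTMLParser


class _TextLocator(HTMLParser):
    """Records whether a pattern occurs in ordinary text or inside <script> tags."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.in_script = False
        self.found_in_markup = False
        self.found_in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies are passed to handle_data as raw text.
        if self.pattern.search(data):
            if self.in_script:
                self.found_in_script = True
            else:
                self.found_in_markup = True


def locate_text(html, pattern):
    """Return 'markup', 'script', or 'absent' (hypothetical helper)."""
    locator = _TextLocator(pattern)
    locator.feed(html)
    locator.close()
    if locator.found_in_markup:
        return "markup"
    if locator.found_in_script:
        return "script"
    return "absent"


# Made-up pages: one static, one that only carries the data in a script.
static_page = "<html><body><span>8:00 pm</span></body></html>"
dynamic_page = (
    '<html><body><div id="app"></div>'
    '<script>window.__data = {"time": "8:00 pm"};</script></body></html>'
)
print(locate_text(static_page, r"8:00 pm"))
print(locate_text(dynamic_page, r"8:00 pm"))
```

Passing resp.text from the question's code as the first argument would show which case applies. A result of 'script' or 'absent' means the text is not in the static markup, so the page builds it in the browser after loading.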

I've tried to use Beautiful Soup to read weather websites before and come up against exactly the same problem. The solution I found (which may not be available to you) was to ask the website for access to the raw data via JSON or an API.
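Once you have JSON, reading a value out of it is much simpler than scraping. This is a minimal sketch under assumptions: the payload shape (an "hours" list with "time" and "temperature" keys) and the function name temperature_at are hypothetical, since every provider defines its own format.

```python
import json


def temperature_at(payload_text, hour_label):
    """Return the temperature for hour_label, or None if it is not listed.

    Assumes a hypothetical payload shaped like:
    {"hours": [{"time": "8:00 pm", "temperature": 59}, ...]}
    """
    data = json.loads(payload_text)
    for entry in data.get("hours", []):
        if entry.get("time") == hour_label:
            return entry.get("temperature")
    return None


# Made-up sample data standing in for a real API response body.
sample = (
    '{"hours": ['
    '{"time": "7:00 pm", "temperature": 61}, '
    '{"time": "8:00 pm", "temperature": 59}]}'
)
print(temperature_at(sample, "8:00 pm"))
```

With requests, resp.text from the provider's endpoint would take the place of sample here. Alternatively, resp.json() returns the already-parsed structure.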

Another solution I have used previously is to find a website for an amateur weather station, which is far more likely to be written in pure HTML.
