I've created a simple python scraper with requests and beautifulsoup that scans a site every 3 minutes with crojobs to see if it contains the word 'dog'. If the word is present, the script will send an email. The problem is that, the script keeps sending the email every 3 minutes as long as the word dog is on the site. How do I make the script send the email just once and never again, as long as 'dog' is on the site. If 'dog' is removed from the site and added, the scraper should send the email again. Kind of like a scraper for an online shop. It sends an email just once when an item is in stock. If the item goes out of stock and gets restocked again, it sends an email.
import requests
from bs4 import BeautifulSoup
url = "https://justpaste.it/4f4jb"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
if 'dog' in soup.text:
send_email()
else:
do_nothing()
CodePudding user response:
Try to add some flag:
import requests
from bs4 import BeautifulSoup
url = "https://justpaste.it/4f4jb"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
send_dog_flag = True
if 'dog' in soup.text:
if send_dog_flag:
send_email()
send_dog_flag = False
else:
send_dog_flag = True
do_nothing()
CodePudding user response:
If you want the code to remember if it already send a message to you, it must have a flag of some kind to remember it by.
You can either write it directly in the code, like dimay suggests, but if your code ever stops running, the information will be lost.
If the code only executes every 3 minutes, or if it sometimes shuts down, you can choose to either save the flag in a local file (like .txt), or in a Database. This will ensure that even if the program shuts down, the value will be remembered.
CodePudding user response:
You need a state to know whether the word was find during the previous run. If the loop was inside the script a simple variable would be enough. In pseudo code is would be:
was_found = false
loop every third minute:
if dog is found in site:
if not was_found:
was_found = true
send-email
else:
was_found = false
As the script is externally launched, you must persist the state, for example in a disk file:
import requests
from bs4 import BeautifulSoup
import ast
# read current state
try:
with open('path/to/state/file') as fd:
prev = ast.literal_eval(next(fd))
except (OSError, SyntaxError):
prev = False
url = "https://justpaste.it/4f4jb"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
found = 'dog' in soup.text
if found and not state
send_email()
else:
do_nothing()
if state != found:
# store state because it changed
with open('path/to/state/file', 'w') as fd:
print(found, file=fd)
