OK, so I'm new to web scraping. I followed a tutorial I found online and it works a treat for one specific website, so I tried to adapt it to work for another site. I think I have figured out the headers, as I get a 200 response, but when I target a div to pull its value I just get null. So my question is: am I doing something wrong here? I have tried following other tutorials to see if they would answer my question, but because I am new I'm not really sure what to look for.
EDIT: I should be a bit more specific. As you can see in my code, I am trying to scrape data from the Chaos Cards website. I think I have the search function sorted (could be wrong?), but what I'm trying to achieve is this: when I inspect the page, I would like to take the data from
<div >Out of stock </div>
Specifically the "Out of stock" part, as I know this div will contain "In stock" when the product is available. But when I target this div I just get null.
All I am trying to do is set up a scraper so that when a user in Discord types a specific product, it searches the website and replies saying whether the product is in stock or not. But for now I'm trying to take baby steps and just get it to print the data I'm after.
CODE
import os
import asyncio
import discord
import bs4 as bs
import requests

r = requests.session()
client = discord.Client()
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.76'}

@client.event
async def on_ready():
    print(f'{client.user.name} - Have a good day <3')
    result = requests.get("https://www.chaoscards.co.uk/", headers=headers)
    print(result.status_code)

def site_search(keyword):
    resp = r.get(f'https://www.chaoscards.co.uk/prod/{keyword}', headers=headers)
    # print(resp.text)
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    in_stock = ''
    out_of_stock = ''
    for x in soup.find_all('div', {'class': 'product-detail__content'}):
        # Compare against the element's text, not the Tag object itself.
        text = x.get_text()
        if 'Out of stock' in text:
            out_of_stock = 'Out of stock bro'
        if 'In stock' in text:
            in_stock = 'Its in stock'
    # current_image_url = soup.find('img', {'itemprop': 'image'}).get('src')
    # current_name = soup.find('p', {'class': 'listing-title'}).get_text()
    return in_stock, out_of_stock

@client.event
async def on_message(message):
    if message.content.startswith('.sm'):
        # Strip the space left between the command and the product name.
        keyword = message.content.split('.sm')[1].strip()
        in_stock, out_of_stock = site_search(keyword)
        print(in_stock, out_of_stock)
EDIT 2:
So I printed the text from resp = r.get(f'https://www.chaoscards.co.uk/prod/{keyword}', headers = headers)
and received this in return:
<html lang="en-US">
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>Just a moment...</title>
<style type="text/css">
html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
p {font-size: 20px; font-weight: 400; margin: 8px 0;}
p, .attribution, {text-align: center;}
#spinner {margin: 0 auto 30px auto; display: block;}
.attribution {margin-top: 32px;}
@keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
@-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
#cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
#cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
#cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
.bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
a:hover{color: #f4a15d}
.attribution{font-size: 16px; line-height: 1.5;}
.ray_id{display: block; margin-top: 8px;}
#cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
#cf-hcaptcha-container { text-align:center;}
#cf-hcaptcha-container iframe { display: inline-block;}
</style>
<meta http-equiv="refresh" content="35">
<script type="text/javascript">
//<![CDATA[
(function(){
window._cf_chl_opt={
cvId: "2",
cType: "non-interactive",
cNounce: "66939",
cRay: "6d5bfeb08acc8771",
cHash: "18474546270a019",
cPMDTk: "wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70",
cUPMDTk: "\/prod\/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70",
cFPWv: "b",
cTTimeMs: "1000",
cRq: {
ru: "aHR0cHM6Ly93d3cuY2hhb3NjYXJkcy5jby51ay9wcm9kL1Bva2Vtb24tTGVhZmVvbi1WLVN0YXItU3BlY2lhbC1Db2xsZWN0aW9uLUJveA==",
ra: "TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzk3LjAuNDY5Mi45OSBTYWZhcmkvNTM3LjM2IEVkZy85Ny4wLjEwNzIuNzY=",
rm: "R0VU",
d: "iWUrdApuyTqwp7Sa1s7 bi5hqVur/PkVsEkqFAgmNisGGdY/Hz93xG5mIaMzA9XizszFqLjvwVKypShAl3Lm45xvxp8eYawYXrvO505H8 ouA9KL2g cmlQJrfXxkdmFI5QseUz1MIX/PGL/2S4A1HCLT7gmpXqr muDiazQCUs7XUTOla n/YWWyPERFG/uhI8 uOckDxuY F8HdGDGE8xus50JmOBLgGMC4gELQfxSTyg7Ed7Lw1YUquPfkjSt9Q4aQ2nOWtuzYmO3zV/UTeu0qSsrMI/p7pPYi9ZDANElXlNnuUhFcMd2aDSnUF/aYdNG09p2RTiG3/Jkj5fPpGt4gm9X98Dd6X OndUT/x01iSCq4NTgwgxjmubgZMbmuryIaU2eFKIV7o7TuJkIz1x6p4mdhapTdMMhsfVTS1iNWy0L0TwedlFeUaCNPv lH76ely2NypA/hUtDUVYz1Eey/bwaxGZBp9McRcVwpsPbTCwddxr9Oc29obSDNCid5gpRPhu1Efs0a9zixzPEjQEjZD5tJ7SaFnmI6n7A6Hjc9YzHmvjPrNAUv ZuWAD",
t: "MTY0MzU1OTI4My4yOTAwMDA=",
m: "HvTOqkkdUexOvObprQaK20tiA50EsMdMAUNxBs9a76U=",
i1: "KnbCImKzNxo3XehPmg6jWg==",
i2: "oGYSEcaLbEuXjAZsN7GZBg==",
zh: "JJbyu7T 3hg5jWQCnkKHsP/7REhUTr23SkrwnAaFfjA=",
uh: "l4HLyhywYXQDOYBGJBbVDnfNOSLbBOqVMJwcpsr3qjc=",
hh: "8JWW5AsAg62xfggeMY1P1hRpDlpOqO6xoRTKU6X/36Q=",
}
}
window._cf_chl_enter = function(){window._cf_chl_opt.p=1};
})();
//]]>
</script>
</head>
<body>
<table width="100%" height="100%" cellpadding="20">
<tr>
<td align="center" valign="middle">
<div >
<noscript>
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="cf-content" style="display:none">
<div id="cf-bubbles">
<div ></div>
<div ></div>
<div ></div>
</div>
<h1><span data-translate="checking_browser">Checking your browser before accessing</span> www.chaoscards.co.uk.</h1>
<div id="no-cookie-warning" data-translate="turn_on_cookies" style="display:none">
<p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
</div>
<p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
<p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs" >Please allow up to 5 seconds…</p>
<p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting…</p>
</div>
<form id="challenge-form" action="/prod/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_f_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="md" value="lBy7XQRIP3rCTVaX6BoLog981WTI9wl7VPUnFUhdr80-1643559283-0-AfIJze-AsdFTbXwD6zN0kNrMUN92opj5F0JV4HP_IIHIJajx_7BeYxgFsAgzPKKs7B76uy2sTy0NMNe5Lonr5nsHsVd0d8oakLrUtEc43FE_-loi5O9yohJL7zVGcrm5BD3ZjEJMgxY3VwIM0TIl4QifHX3Xiacvm9Us_1J5_OALeEt8dyCDKBUbdhJbfkAV36zEt1-iFbst-6wTI-t_LM6YSJOD9j1K_sxVqdUzAawDadHBGslCDmRO4mA2LTGMhZdNdVN_RUZkUpqWKatfeHID4Hp-w3fx3tW4lxHE6gC86Ud8f-YgeYHKUDkfA_YomWCUxk9WFwoEYlr7MqQhQgWfBgxhAJNpXEbcaIb9e71bSZvbGw8BCLipFXuSk2ZvFofI-CdPIymN17v4S2xNgL92cGpXRhcr1OwJT6iFPJ8zuxPXPGud3C9ZeHnXbntYoYRQFXRcpcYcKIbBJEG8lIhJ4aWqmVkpkmai5oGlf0tnolsiO_5-i8cCEazYlbcUCqKnVDt6UGfuQNJdQXTNmmwNusmt4kPFLztjhNjKydzWHO6AWswLkMzj7rC1759cGdsyBiQkzb632-4Yqvi4f6ZOwBOEWfE0t8ZwdQtkEWy4U84c9j6hM8MG_xgl3t_0yKWRIFANVD9vkN1pqTfJRo8bQPm9oD3KmvRrVl5y_5InKhUotZYMJVV6DhV98WvHVOvjOGqJMPs75vQ0VaqQUiPzlyJ1MQ0G4Qe-sZzoIP0cxuvkCbQE2kxhRrzN887jWQ" />
<input type="hidden" name="r" value="IxoGI_uynuxxTjqGKlMnSQ0FLUh3S6TIZtjcFTDgzzE-1643559283-0-ASk8gczAHx3QxOhXW8WEDt3t1OSXiJ7qJHx ppz1M0nJipXy14O9Y2KKa1Q/qTKOeLAkBCnJuHaVX7YBvcXDde6M8x8kRdlX/AS1CNXDoqegpDIwjQKyyw0/e19MMsryFGK5ltynnTh9NKTFHJFOEcTF8OKBZqgcGH0dEGH3I1e/lPhMAAsMmWkE0i7aPiwTtEPYRkL/z8gpJbyDyqF/pL ykLEqtpq3EDfFYbdMn4Fv27XNs4YKU9z4Z1DrjECS/Nwo4hCq0ZLYafLBnFHp9ZzIVEpGrM07Teci91bqTz3COri7Y3YZ0Vyj7NsZ/DPA ykGWKU794u7OeSpIR4iifH6AEJA5ZVjhPMr46W7cvbgEAReq8TA QdkIo7IA4Yn/Zcu77hx2ESjMpGMbXbJE4OrjZ8Xng/GoG18lBpF0nJUA9QAeUQ4cDOcHK8OkfHObBdTN4qGtQywBGdR7Wm8ZsxDjxry1kOKx1r4wXH1/PdOB0C5wWPVz5k6UPtIJOeqDfc8q7GFQ4f1UmHIeHE3Xg5FfntTitBbAQwNEZ/ymhpO2iGeLjog/wgAtiNY/qgnpTkpJXTjYgZoENwu9VgPIAaJt9wOUPLGnSkQu9nTDDnlbo2DwLmQKdfIYtCUfSF2DNNcyrk7LzWDHc5mWsfXhG/d9J2Ns9nJ4hWHcovnqOHHGLI7QLjBNKBW8 OrFn52OkYdCfXKrcC1PiV3mybK1gYT2uPWGjzOEodQ3x4GzII4qhvonkEPlaTKFnTA3sygjmsoQmbc6GnFQxP0kBIyI5B7qtF29/g2jTSB6ymvHQR oNtrkvfaxM0tSt0tiiUV6HiI/83jCBmWkt6552D2PskpfPLgZqf968KCL5M9YfDBEBHlBswKZMBK6TvPGtS04P4S5gmi M1rBuaubxKLZhUIs0V2OOy HAZsJfluf6SNJe3W9x8EPqnXWT0b3tM3ybuYy4yj31JdChBk On5zVqAoPpaWLQTeRLinVW2ludZ7KMFJltS9LqAJ0evwNcEJAnklwuE9/4uagEJjEsuWkf3C6UIyCFB5lfKlofe4hhwxkanVjds Eg1bIJld0xqUNjPmZdA3LIWnzAq3iL5OoWN2WOAz87k7XI4A9H/ruSiPvtHf5KIOtX3fxDVP3TziOAtvb81p pgK WiL3LAEbEasDMw9O3HBSaXw54Gmq gfNkoPDGCgyP7C25WH67yeqkoVtq64Q3EOpSglfjyyEmQyXT24Gs14zta3Ul6N1jSM38CDd3tIV/XCZZg3xa5TggKjI43lKe2dflR3pllF2Bpg8LH1JVMH6NKsts3TkBAy KWrExBPOeoHgu0BZCIxs9nh1kk0k/LFQhjC6ENDW6swlJ 4hlv9865jTuu5DA4emNvpmHXKmjQ0OlQXpJJYhMcqRAoHpsT9TSaO2MYYZpbHx4kmYJIy04N5jY9TB8vzfnimnxrTYKrrM zSxNPVCXZDVh8LUaxQKqYbgl5LsecA2QFzIc66SQc/8waruFwstNO/f8x/6ijA9s3EWrueKmYK6yeQqWrw4iVO30xppcSLK3lvk0aUYyu1TiQOXCokDCFUDIrxG/S3PEq4UgNIpTF3aRhBtkq49XCYd7MfCteVBzkDQu28IaN JdojGY8LrVdR4VInr6p8 fmpirQZ7WgfWWLHJhqr8pF8eHG60yt372F c5QecYvwGtitOitHbjOXKeLDKoXmtnnguTMRw4Xwz ICfhz/wZ96PzlgKuPydwREQ4DbrhMf mmRCc0EWi2QTGGdt56EiR/lJmXq8FpiRTgYuTRxSTtbtwFS1BHKrgdrc Zuqm3h7t9WRvlRj8KhZEXsDJVWJgKDVT0sjox3phvRlo68Gr016valv5Lr 
JAujzr1azDMgSaQhNL4cCuxW5jzL5Q3V/k9JgjEg=="/>
<input type="hidden" value="b8506ea0b61c6bf512de56146f25f432" id="jschl-vc" name="jschl_vc"/>
<!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
<input type="hidden" name="pass" value="1643559284.29-RM/SqTEMYf"/>
<input type="hidden" id="jschl-answer" name="jschl_answer"/>
</form>
<script type="text/javascript">
//<![CDATA[
(function(){
var a = document.getElementById('cf-content');
a.style.display = 'block';
var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
var trkjs = isIE ? new Image() : document.createElement('img');
trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=6d5bfeb08acc8771");
trkjs.id = "trk_jschal_js";
trkjs.setAttribute("alt", "");
document.body.appendChild(trkjs);
var cpo=document.createElement('script');
cpo.type='text/javascript';
cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=6d5bfeb08acc8771";
window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.indexOf('?') !== -1 ? '?' : location.search;
window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
if (window._cf_chl_opt.cUPMDTk && window.history && window.history.replaceState) {
var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
history.replaceState(null, null, "\/prod\/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_rt_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70" + window._cf_chl_opt.cOgUHash);
cpo.onload = function() {
history.replaceState(null, null, ogU);
};
}
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
//]]>
</script>
<div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=6d5bfeb08acc8771')"> </div>
</div>
<div >
DDoS protection by <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
<br />
<span >Ray ID: <code>6d5bfeb08acc8771</code></span>
</div>
</td>
</tr>
</table>
</body>
</html>
One thing that stood out to me is this:
```<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>``` I am using Beautiful Soup, and I have heard it can't handle JavaScript. Is this what's affecting my search?
If anyone has tips, or knows the answer but would prefer to point me in the right direction, I would really appreciate it!
Thank You!
CodePudding user response:
So I found out my problem. As you can see from the update I made to the original post, I was being blocked from accessing the site. The response I got back is Cloudflare's "Checking your browser" page, which needs JavaScript to run before the real page loads. requests only downloads the HTML and Beautiful Soup only parses it; neither can run JavaScript. Therefore I have scrapped the code and followed a new tutorial that uses Selenium, and now it works perfectly.
For anyone who stumbles across this post with the same issue, I will provide a link to the tutorial I followed in the hope it helps you!
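If you want to confirm you are hitting the same block before switching tools, here is a rough sketch that checks the downloaded HTML for strings that appear in the page pasted in the question. The function name `looks_like_challenge_page` and the marker list are my own choices for illustration, and the markers could stop matching if Cloudflare changes its page.

```python
# Markers copied from the challenge page pasted in the question.
# This list is an assumption: it may need updating if the page changes.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    'id="challenge-form"',
    "window._cf_chl_opt",
)


def looks_like_challenge_page(html: str) -> bool:
    """Return True if the HTML contains any of the challenge-page markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


# Two tiny hand-written pages, just to exercise the check.
challenge = "<html><head><title>Just a moment...</title></head><body></body></html>"
product = '<html><body><div class="product-detail__content">Out of stock </div></body></html>'

print(looks_like_challenge_page(challenge))  # True
print(looks_like_challenge_page(product))    # False
```

Calling this on `resp.text` before parsing would tell you whether a null result means "element not found" or "wrong page entirely".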
Link: https://replit.com/talk/learn/Python-Selenium-Tutorial-The-Basics/148030
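In case the link goes dead, this is roughly the shape of the Selenium approach. It is my own minimal sketch, not the tutorial's code. It assumes Chrome and a matching driver are available, and it reuses the `product-detail__content` class from the question as the selector, which may not match the live page. The function name `fetch_stock_texts` is made up for this example. A site can also still challenge an automated browser, so treat it as a starting point.

```python
def fetch_stock_texts(url, selector="div.product-detail__content", timeout=15):
    """Open url in a real browser and return the text of elements matching selector."""
    # Imported here so this file can be loaded even if Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome plus a matching driver exist
    try:
        driver.get(url)
        # Wait until at least one matching element is present in the page.
        # Raises selenium's TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return [element.text.strip() for element in elements]
    finally:
        driver.quit()  # always close the browser, even if something failed


# Example call (opens a browser window, so it is left commented out):
# print(fetch_stock_texts(
#     "https://www.chaoscards.co.uk/prod/Pokemon-Leafeon-V-Star-Special-Collection-Box"
# ))
```

The explicit wait is there because the browser needs time to get past the "Checking your browser" page before the product markup exists.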
CodePudding user response:
You could try turning the source code of the website into a string and do one of the following:
# Take the text between the opening div and its closing tag; [0] selects the string from the list.
website_contents = website_contents.split('<div >')[1].split('</div>')[0]
if 'out' in website_contents.lower():
    print('Out of stock!')
else:
    print('In stock!')
or
if '>Out of stock </' in website_contents:
    print('Out of stock!')
else:
    print('In stock!')
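One caveat with the second snippet: the else branch reports "In stock!" whenever the out-of-stock text is missing, which would also happen for a block page like the one in the question. A slightly safer sketch returns a third "unknown" result. The function name `stock_status` is hypothetical, and it assumes the phrases "out of stock" and "in stock" do not appear elsewhere on the page (for example in a list of related products).

```python
def stock_status(website_contents: str) -> str:
    """Classify page source as 'out of stock', 'in stock', or 'unknown'."""
    text = website_contents.lower()  # makes the match case-insensitive
    if "out of stock" in text:
        return "out of stock"
    if "in stock" in text:
        return "in stock"
    # Neither phrase found: could be a block page or a changed layout.
    return "unknown"


print(stock_status("<div >Out of stock </div>"))        # out of stock
print(stock_status("<div >In stock </div>"))            # in stock
print(stock_status("<title>Just a moment...</title>"))  # unknown
```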
