I’ve been using the following code to get the HTTP status code of a tweet:
import requests
response = requests.get("https://twitter.com/jack/38373837")
status_code = response.status_code
print(status_code)
----------------------
200
I expected 404. However, I got 200.
Is there another command, or perhaps even a Python package, that accurately determines a page’s HTTP status code?
CodePudding user response:
This happens because the request does reach Twitter, and Twitter returns its normal page, which then fails to load the post. The status code is therefore 200 (OK): it tells you that Twitter was reachable, not that the tweet exists.
If you try it with an API, or with a website that has no such protection, you can get a 404.
For example, try "https://cidqu.net/thisisnotexist.html".
That page doesn't load anything, so the server returns a 404.
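To make the distinction concrete, here is a minimal sketch that treats the status code and the page body as two separate signals. The function name `looks_missing` and the marker phrases are hypothetical, picked only for illustration. A site that builds its error page with JavaScript may not include any such phrase in the raw HTML, in which case the body check will not help.

```python
# Hypothetical marker phrases; adjust them for the site you are checking.
NOT_FOUND_MARKERS = ("page doesn't exist", "page not found")


def looks_missing(status_code: int, body: str) -> bool:
    """Return True if the response suggests the page is missing.

    A 404 or 410 status is trusted directly. A 200 status is treated as a
    "soft 404" when the body contains one of the marker phrases.
    """
    if status_code in (404, 410):
        return True
    if status_code == 200:
        lowered = body.lower()
        return any(marker in lowered for marker in NOT_FOUND_MARKERS)
    return False
```

With requests, this could be called as `looks_missing(response.status_code, response.text)`.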
CodePudding user response:
I tried it on my laptop and I got 200 as well. I used curl's -I option to fetch only the headers:
-I, --head Show document info only
nabil@LAPTOP:~$ curl -I https://twitter.com/jack/38373837
HTTP/2 200
date: Mon, 31 Jan 2022 20:08:45 GMT
expiry: Tue, 31 Mar 1981 05:00:00 GMT
pragma: no-cache
server: tsa_f
set-cookie: guest_id=v1:164365972506197722; Max-Age=34214400; Expires=Fri, 03 Mar 2023 20:08:45 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
content-type: text/html; charset=utf-8
x-powered-by: Express
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
last-modified: Mon, 31 Jan 2022 20:08:45 GMT
x-frame-options: DENY
x-xss-protection: 0
x-content-type-options: nosniff
content-security-policy: connect-src 'self' blob: https://*.giphy.com https://*.pscp.tv https://*.video.pscp.tv https://*.twimg.com https://api.twitter.com https://api-stream.twitter.com https://ads-api.twitter.com https://aa.twitter.com https://caps.twitter.com https://media.riffsy.com https://pay.twitter.com https://sentry.io https://ton.twitter.com https://twitter.com https://upload.twitter.com https://www.google-analytics.com https://accounts.google.com/gsi/status https://accounts.google.com/gsi/log https://app.link https://api2.branch.io https://bnc.lt wss://*.pscp.tv https://vmap.snappytv.com https://vmapstage.snappytv.com https://vmaprel.snappytv.com https://vmap.grabyo.com https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net ; default-src 'self'; form-action 'self' https://twitter.com https://*.twitter.com; font-src 'self' https://*.twimg.com; frame-src 'self' https://twitter.com https://mobile.twitter.com https://pay.twitter.com https://cards-frame.twitter.com https://accounts.google.com/ https://recaptcha.net/recaptcha/ https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/; img-src 'self' blob: data: https://*.cdn.twitter.com https://ton.twitter.com https://*.twimg.com https://analytics.twitter.com https://cm.g.doubleclick.net https://www.google-analytics.com https://www.periscope.tv https://www.pscp.tv https://media.riffsy.com https://*.giphy.com https://*.pscp.tv https://*.periscope.tv https://prod-periscope-profile.s3-us-west-2.amazonaws.com https://platform-lookaside.fbsbx.com https://scontent.xx.fbcdn.net https://scontent-sea1-1.xx.fbcdn.net https://*.googleusercontent.com https://imgix.revue.co; manifest-src 'self'; 
media-src 'self' blob: https://twitter.com https://*.twimg.com https://*.vine.co https://*.pscp.tv https://*.video.pscp.tv https://*.giphy.com https://media.riffsy.com https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net; object-src 'none'; script-src 'self' 'unsafe-inline' https://*.twimg.com https://recaptcha.net/recaptcha/ https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://www.google-analytics.com https://twitter.com https://app.link https://accounts.google.com/gsi/client https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js 'nonce-NzE1OGUzMjgtYWVkZS00ZGNkLWI4ZjctNDQwYmU1ODA2NjJh'; style-src 'self' 'unsafe-inline' https://accounts.google.com/gsi/style https://*.twimg.com; worker-src 'self' blob:; report-uri https://twitter.com/i/csp_report?a=O5RXE===&ro=false
strict-transport-security: max-age=631138519
cross-origin-opener-policy: same-origin-allow-popups
cross-origin-embedder-policy: unsafe-none
x-response-time: 185
x-connection-hash: d1535e6f6d60a343d5d9adfbe574b67f65b771b35fcc93c7ea887705bffb2ba8
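For reference, a rough Python equivalent of `curl -I` can be sketched with the standard library alone. The helper name `head_status` and the 10-second default timeout are assumptions made for this example. For a URL like the one above it would be expected to report the same 200 that curl shows, since it only reads the status line the server sends.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request (headers only) and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses;
        # the status code is available on the exception.
        return error.code
```

Usage: `print(head_status("https://twitter.com/jack/38373837"))`.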
CodePudding user response:
Try this endpoint to check whether a tweet exists:
import requests
import json
# https://twitter.com/jack/status/1247616214769086465
tweet_id = 1247616214769086465
url = 'https://twitter.com/i/api/graphql/_iJccJ-mHcyaV0nq_odmBA/TweetDetail'
# Request Headers
headers = {'Host': 'twitter.com',
'sec-ch-ua': '',
'x-twitter-client-language': 'en',
'x-csrf-token': '9d2d0361bd589118ff41e56619327537',
'sec-ch-ua-mobile': '?0',
'authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs'
'=1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA',
'content-type': 'application/json',
'x-guest-token': '1488257541251469319',
'x-twitter-active-user': 'yes',
'sec-ch-ua-platform': '',
'accept': '*/*',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://twitter.com/GioCellRed/status/1488257200195842048',
'accept-language': 'en-US,en;q=0.9',
'cookie': 'guest_id_ads=v1:164069712670696178; guest_id=v1:164069712670696178; '
'guest_id_marketing=v1:164069712670696178; personalization_id=',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.76'}
# Request Parameters
variables = {"focalTweetId": tweet_id, "referrer": "search",
"controller_data": "DAACDAAFDAABDAABDAABCgABAAAAAAAAgEAAAAwAAgoAAQAAAAAAAAAICgACTOJ7aVQ"
"/L38LAAMAAAAFU29uaWEMAAQMAAELAAEAAAAFU29uaWELAAIAAAAkOTUxYmYyZjItMDl"
"hNC00ZTlmLWJkZWItMTBhYTFjMmU5YjBhAAAKAAUbtNSIOd CdQAAAAAA",
"with_rux_injections": False,
"includePromotedContent": True,
"withCommunity": True,
"withQuickPromoteEligibilityTweetFields": True,
"withBirdwatchNotes": False,
"withSuperFollowsUserFields": True,
"withDownvotePerspective": False,
"withReactionsMetadata": False,
"withReactionsPerspective": False,
"withSuperFollowsTweetFields": True,
"withVoice": True, "withV2Timeline": False,
"__fs_interactive_text": False,
"__fs_responsive_web_uc_gql_enabled": False,
"__fs_dont_mention_me_view_api_enabled": False}
params = {'variables': json.dumps(variables)}
with requests.get(url, headers=headers, params=params) as resp:
    result = resp.json()
# The payload has an "errors" key on failure and a "data" key on success
print('Error:', ("errors" in result),
      '\nSuccess:', ("data" in result))
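The final check in the script above can be wrapped in a small helper that returns a single boolean. This is only a sketch: the name `tweet_lookup_succeeded` is hypothetical, and the rule that a non-empty "errors" entry takes priority over "data" is an assumption about the payload, not documented behaviour of the endpoint.

```python
def tweet_lookup_succeeded(result) -> bool:
    """Interpret the parsed JSON body returned by the lookup.

    Assumption: a non-empty "errors" entry means the lookup failed, and a
    "data" key without errors means it succeeded.
    """
    if not isinstance(result, dict):
        return False
    if result.get("errors"):
        return False
    return "data" in result
```

Usage: `print(tweet_lookup_succeeded(result))` after `result = resp.json()`.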
