I'm trying to fetch a JSON endpoint from the SEC with Python 3, and for the life of me I can't get the JSON. I always end up with a JSON decode error, and the response body is the HTML code of a page instead (I'm new to Python). Here is my code:
import requests
base_url = r"https://data.sec.gov/api/xbrl/companyfacts/CIK"
CIK = "0000320193"
json_index = ".json"
url = base_url + CIK + json_index
content = requests.get(url)
decoded_content = content.json()
Thanks a lot for your help!
Answer:
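The server appears to reject requests that carry the default python-requests User-Agent and sends back an HTML page instead of JSON, which is why .json() raises a decode error. You just need to add a User-Agent header to your request so that it is accepted. I copied my Chrome User-Agent and used that: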
import requests
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json'
resp = requests.get(url, headers=headers).json()
print(resp)
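If you want a clearer error than a bare JSON decode error when the server sends HTML, something like the following sketch might help. It checks the status code and Content-Type before calling .json(). The helper names (build_url, parse_json_response, fetch_company_facts) are made up for illustration, and the User-Agent value you pass is up to you. As far as I know, the SEC asks automated clients to send a User-Agent that identifies them (a name and contact email), so a string like that may be a better choice than a copied browser one, but treat that as an assumption and check their developer guidance.

```python
import requests

BASE_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK"


def build_url(cik):
    """Build the company facts URL; the CIK is zero-padded to 10 digits."""
    return f"{BASE_URL}{int(cik):010d}.json"


def parse_json_response(resp):
    """Return the decoded JSON, or raise a readable error if the body is not JSON."""
    content_type = resp.headers.get("Content-Type", "")
    if resp.status_code != 200 or "json" not in content_type.lower():
        # Show the start of the body so an HTML error page is easy to spot.
        raise RuntimeError(
            f"Expected JSON but got status {resp.status_code} "
            f"({content_type!r}): {resp.text[:200]!r}"
        )
    return resp.json()


def fetch_company_facts(cik, user_agent):
    """Request the company facts for one CIK with an explicit User-Agent."""
    headers = {"User-Agent": user_agent}
    resp = requests.get(build_url(cik), headers=headers, timeout=30)
    return parse_json_response(resp)
```

You would call it as fetch_company_facts("0000320193", "<your User-Agent string>"). If the request is still blocked, the RuntimeError message shows the status code and the beginning of the HTML the server returned.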
