JSON SEC scraping

I am trying to scrape a JSON page from the SEC with Python 3, but I can't get the JSON data. I always end up with a JSON decode error, and the response contains the HTML code of a page instead (I'm new to Python). Here is my code:

import requests

base_url = r"https://data.sec.gov/api/xbrl/companyfacts/CIK"

CIK = "0000320193"

json_index = ".json"

url = base_url + CIK + json_index

content = requests.get(url)
decoded_content = content.json()

Thanks a lot for your help!

User response:

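The server appears to reject requests that arrive with the default python-requests User-Agent and sends back an HTML page instead of JSON, which is why content.json() raises a decode error. You just need to add a User-Agent header to your request. I copied my Chrome User-Agent and used that: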

import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json'
resp = requests.get(url, headers=headers).json()
print(resp)
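
If you want a clearer error than a raw JSON decode error when the server sends back HTML, you could check the response before decoding it. The following is only a rough sketch: the helper names build_companyfacts_url, decode_json_response and fetch_companyfacts are made up for illustration, and the User-Agent value is a placeholder you would replace with your own details. As far as I know, the SEC asks automated clients to identify themselves in the User-Agent (a name plus a contact email) rather than imitate a browser, so that is what the sketch assumes.

```python
def build_companyfacts_url(cik):
    """Build the companyfacts URL, zero-padding the CIK to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def decode_json_response(resp):
    """Return the parsed JSON body, or raise a readable error if that fails."""
    if resp.status_code != 200:
        # Include the start of the body, which is often an HTML error page.
        raise RuntimeError(
            f"Request failed with status {resp.status_code}: {resp.text[:200]!r}"
        )
    try:
        return resp.json()
    except ValueError as exc:  # JSON decode errors are ValueError subclasses
        raise RuntimeError(f"Body is not JSON: {resp.text[:200]!r}") from exc


def fetch_companyfacts(cik, user_agent):
    """Fetch and decode the companyfacts JSON for one CIK (needs network access)."""
    import requests  # third-party package, imported here so the helpers above work without it

    headers = {"User-Agent": user_agent}
    resp = requests.get(build_companyfacts_url(cik), headers=headers, timeout=30)
    return decode_json_response(resp)
```

You would then call it with something like fetch_companyfacts("0000320193", "Sample Company Name admin@example.com"), where the second argument is your own identifying string. If the request is blocked, the error message shows the status code and the first part of the HTML body instead of a bare decode error.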