Hi I have been trying to download some files from a website. On this website, there are public and private files. TO access the private files you have to log in with your account. I do have an account and I am able to manually download the files. However, to speed up the process I have written the following script:
def download_fields(year, target_path, save_path):
for ii in range(len(year)):
folder_location1 = save_path '\_' str(year[ii])
if not os.path.exists(folder_location1):
os.mkdir(folder_location1)
month = ['01', '02','03','04','05','06','07','08','09','10','11','12']
for kk in range(len(month)):
url = target_path "/" str(year[ii]) "/" str(month[kk])
#If there is no such folder, the script will create one automatically
folder_location = save_path '\_' str(year[ii]) '\_' month[kk]
if not os.path.exists(folder_location):
os.mkdir(folder_location)
response = requests.get(url)
soup= BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.cdf']"):
#Name the pdf files using the last portion of each link which are unique in this case
filename = os.path.join(folder_location,link['href'].split('/')[-1])
with open(filename, 'wb') as f:
f.write(requests.get(urljoin(url,link['href'])).content)
And run the code like:
"""Download files"""
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
year = [ '2021']
target_path ="http://sweap.cfa.harvard.edu/data/sci/fields/l2/mag_RTN_4_Sa_per_Cyc"
save_path = r"C:\Users\nikos.000\coh_struct_distance\data\psp\magnetic_field"
""" Run functions"""
download_fields(year, target_path, save_path)
The code works fine for the public files but gives me the following error when I try to download the private ones:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>
Any idea how I could fix this issue?
CodePudding user response:
