CodePudding user response:
The way I found the PDF link is by going to the page and looking at the page source. Then I used the finder tool and searched for PDF and found a meta tag.
<meta name="citation_pdf_url" content="https://dashboard.digital.auraria.edu/downloads/1e0b44c6-cd79-49a3-9eac-0b10d1a4392e"/>
I followed the link and it downloaded a PDF with the same title. In the following code below, you can get the entire tag or the contents using .attrs.get('content') at the end.
Required -> pip install bs4 requests
from bs4 import BeautifulSoup
import requests
url = "https://digital.auraria.edu/work/ns/8fb66c05-0ad2-4e56-8cc7-6ced34d0c126"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
pdf_link = soup.find("meta", attrs={'name': "citation_pdf_url"}).attrs.get('content')
print(pdf_link)
Good luck and let me know if you face any other issues!
CodePudding user response:
Just scrape the filename and add that name to this link, I got the link by actually downloading the file copying it's download address, removing file name and adding different one to test it, it works like a charm. https://storage.googleapis.com/us-fedora/fcrepo.binary.directory/ea/05/2a/ea052a597af18fb6a46c44254e9e596a7e93571f?GoogleAccessId=k8s-storage@repositories-307112.iam.gserviceaccount.com&Expires=1643411246&Signature=ICo51cFbe3By7JPJol8nLfxcic+V+v1uvGYjodATCXJc2I6XWSi7JWC8l+M6BTSVFOL8A0YioZOQggY8Afc0JJtiwInkxFHmVjleQ41he3RK5pwF4IwONeuQxcgUXYzd8p94sA5L0YZC6drAFb9mx4AJLwTdKQt7dZh146FmaQYY8ElGT6BpHX2t+K31UGP0pC75uFGUq6b3IDK11gPOCSvnrLGSAM1yulE8togDgZmw0BU77nLPkinXSIATCTjlHNxf5aUxlJkg0/tSM21b53JFvHGHHCQf8QSKtST4WCBA1up6BVX1YLbGLZXxQ07mf8K7jnQ4U/Xfnw6IoTpQxw==&response-content-disposition=attachment; filename=
so according to your example the link would look like this https://storage.googleapis.com/us-fedora/fcrepo.binary.directory/ea/05/2a/ea052a597af18fb6a46c44254e9e596a7e93571f?GoogleAccessId=k8s-storage@repositories-307112.iam.gserviceaccount.com&Expires=1643411246&Signature=ICo51cFbe3By7JPJol8nLfxcic+V+v1uvGYjodATCXJc2I6XWSi7JWC8l+M6BTSVFOL8A0YioZOQggY8Afc0JJtiwInkxFHmVjleQ41he3RK5pwF4IwONeuQxcgUXYzd8p94sA5L0YZC6drAFb9mx4AJLwTdKQt7dZh146FmaQYY8ElGT6BpHX2t+K31UGP0pC75uFGUq6b3IDK11gPOCSvnrLGSAM1yulE8togDgZmw0BU77nLPkinXSIATCTjlHNxf5aUxlJkg0/tSM21b53JFvHGHHCQf8QSKtST4WCBA1up6BVX1YLbGLZXxQ07mf8K7jnQ4U/Xfnw6IoTpQxw==&response-content-disposition=attachment; filename=IR00000195_00001.pdf

