How to get file name from a URL when content-disposition is missing in headers using python?

Time:01-17

I'm trying to automate downloading a bunch of PDFs. One of the URLs is:

https://www.unpri.org/download?ac=4195

I'm using the following code to get the headers from this URL:

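import requests

url = "https://www.unpri.org/download?ac=4195"

# HEAD request: fetch only the headers, following any redirects
h = requests.head(url, allow_redirects=True)
header = h.headers

print(header)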

These are the headers it returns: {'Cache-Control': 'no-cache', 'Connection': 'close', 'Content-Type': 'text/html'}

There is no Content-Disposition header or anything else that could give me the file name. However, when I open the URL in a browser and right-click --> Save as, I'm offered the file's original name (screenshot below).

Screenshot

Is there any way I can get this file name with Python?

CodePudding user response:

Just add a proper User-Agent header to the request and read the file name from the Content-Disposition response header.

Here's how:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko/20100101 Firefox/95.0",
}

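# Send a browser-like User-Agent; this server appears to leave out Content-Disposition without one
r = requests.get("https://www.unpri.org/download?ac=4195", headers=headers)
# The file name is whatever follows the last "=" in the header value
print(r.headers["Content-disposition"].split("=")[-1])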

Output:

PRI_Investor_guide_on_agricultural_supply_chain.pdf
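
Splitting on "=" works for this header, but it would keep surrounding quotes if the server sent filename="...", and it raises a KeyError when the header is absent. A slightly more defensive sketch is below; it is not part of the original answer. The helper name filename_from_response is made up for illustration, and it assumes that falling back to the last segment of the URL path is acceptable when no Content-Disposition header is present.

```python
from email.message import Message
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_response(headers, url):
    """Hypothetical helper: file name from Content-Disposition, else from the URL path."""
    # requests' r.headers is case-insensitive; a plain dict needs this exact key.
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the standard library parse the header parameters (handles quoted values).
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            return name
    # Assumed fallback: last path segment of the URL, or None if there is none.
    path_name = PurePosixPath(unquote(urlparse(url).path)).name
    return path_name or None
```

With the answer's code, this could be called as filename_from_response(r.headers, r.url). For a URL such as https://www.unpri.org/download?ac=4195 the fallback would only yield "download", so the header is still the useful source here.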