From a browser, a CSV file from one URL (url 1) can be downloaded only if another URL of the main page (url) is open

If the URL https://www.nseindia.com/companies-listing/corporate-filings-announcements is open in a browser tab, I can download the CSV file from another URL, https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true, in another tab of the same browser. Otherwise the download fails and the site says "resource not found". How can I implement this in Python using pandas?

CodePudding user response：

This site uses cookies to check whether the file was requested after opening the first page.

You will have to use a requests Session to get the first page, which sets the cookies. Next, request the CSV file with the same Session, so the cookies from the previous request are sent along. Finally, pass the data to pandas using io, which simulates a file in memory.

BTW: the server seems to send the file with a BOM (Byte Order Mark), so I read the bytes from r.content instead of the text from r.text, and pandas then skips the BOM.
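To illustrate the BOM remark, here is a small offline sketch that uses only the standard library. The sample bytes are made up for illustration, and it assumes that r.text was decoded as Latin-1 (ISO-8859-1), which would explain why the BOM shows up as `ï»¿`.

```python
import csv
import io

# Made-up sample payload: a UTF-8 BOM followed by two CSV lines.
raw = b'\xef\xbb\xbf"SYMBOL","DIFFERENCE"\n"RIIL","00:00:10"\n'

# Decoding as Latin-1 turns the three BOM bytes into three visible characters.
as_latin1 = raw.decode('latin-1')
print(repr(as_latin1[:12]))

# The 'utf-8-sig' codec drops a leading BOM while decoding.
as_utf8 = raw.decode('utf-8-sig')
rows = list(csv.reader(io.StringIO(as_utf8)))
print(rows[0])
```

Working from the raw bytes, as the code below does with r.content, avoids depending on how the text was decoded.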

import requests
import pandas as pd
import io

# --- create Session with User-Agent from real browser ---

headers = {
    'User-Agent': 'Mozilla/5.0'
}

s = requests.Session()
s.headers.update(headers)

# --- get first page to get cookies ---

url = 'https://www.nseindia.com/companies-listing/corporate-filings-announcements'
r = s.get(url)
r.raise_for_status()  # stop here if the first page could not be loaded

# --- get file ---

url = 'https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true'
r = s.get(url)
r.raise_for_status()  # stop here if the file request failed

print(r.text[:100])  # characters `ï»¿` at the beginning mean BOM
                     # so I will use `r.content` instead of `r.text`

# --- read file from memory ---

#df = pd.read_csv(io.StringIO(r.text))   # it doesn't remove BOM
df = pd.read_csv(io.BytesIO(r.content))  # it removes BOM

# --- show it ---

print(df.head())

Result:

ï»¿"SYMBOL","COMPANY NAME","SUBJECT","DETAILS","BROADCAST DATE/TIME","RECEIPT","DISSEMINATION","DIFF


      SYMBOL  ... DIFFERENCE
0  TATAELXSI  ...   00:00:08
1       RIIL  ...   00:00:10
2       ERIS  ...   00:00:06
3       RIIL  ...   00:00:09
4  INGERRAND  ...   00:00:09

[5 rows x 8 columns]
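If you need other date ranges, the file URL could be built from dates instead of being hard-coded. This is only a sketch: the function name `build_announcements_url` is hypothetical, and the DD-MM-YYYY format is an assumption inferred from the URL in the question. I didn't test other parameter values against the site.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the question.
BASE_URL = 'https://www.nseindia.com/api/corporate-announcements'

def build_announcements_url(start, end):
    """Build the CSV URL for a date range (hypothetical helper)."""
    params = {
        'index': 'equities',
        # Assumption: the site expects dates as DD-MM-YYYY, as in the question.
        'from_date': start.strftime('%d-%m-%Y'),
        'to_date': end.strftime('%d-%m-%Y'),
        'csv': 'true',
    }
    return BASE_URL + '?' + urlencode(params)

url = build_announcements_url(date(2022, 1, 14), date(2022, 1, 20))
print(url)
```

The resulting `url` can be passed to `s.get(url)` in the code above, after the first page has been requested with the same Session.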