I have the following code that basically logs in and navigates to a page where a file is listed that I want to download:
const getNextcloudDownloadUrl = async (): Promise<string> => {
  const downloadUrl = `https://${BASEURL}${lastFile}`;
  const fileName = downloadUrl.substring(downloadUrl.lastIndexOf('/') + 1);
  const download = await page.evaluate((downloadUrl, fileName) => {
    https.get(downloadUrl, res => {
      const file = fs.createWriteStream(`/tmp/${fileName}`);
      res.pipe(file);
      file.on('finish', () => {
        file.close();
        console.log('done');
      });
    })
  }, downloadUrl, fileName);
  return downloadUrl;
};
I cannot get it to work. It fails with: Error: Evaluation failed: ReferenceError: https is not defined. The file I want to download is about 500 MB. I have looked through everything I could find, and I also tried fetch, but that supposedly does not work with streams.
I have gone through the following resources, but I still cannot solve it:
- How to download file with puppeteer using headless: true?
- https://github.com/puppeteer/puppeteer/issues/299
- https://oncletom.io/2018/puppeteer-download-file/
- https://www.scrapingbee.com/blog/download-file-puppeteer/
- https://docs.browserless.io/docs/downloading-files.html
- https://help.apify.com/en/articles/1929322-handling-file-download-with-puppeteer
Here is the request as copied from Chrome DevTools (I have since found out that this does not work because of streams):
fetch(downloadUrl, {
  "headers": {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6,es;q=0.5,sv;q=0.4,ru;q=0.3",
    "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"97\", \"Chromium\";v=\"97\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "oc_sessionPassphrase=......."
  },
  "referrerPolicy": "no-referrer",
  "body": null,
  "method": "GET"
});
CodePudding user response:
Consider splitting this into two functions and awaiting the download, with the download function returning a Promise that resolves when the download completes:
await downloadUrl(await getNextcloudDownloadUrl());
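For context, the callback passed to page.evaluate runs inside the browser page, where Node modules such as https and fs do not exist, which is why the ReferenceError appears. A minimal sketch of the second function, run on the Node side instead, could look like the following. The names downloadToFile and fileNameFromUrl are hypothetical, and the cookie header is assumed to be built elsewhere from the logged-in session.

```typescript
import * as fs from 'fs';
import * as https from 'https';
import { pipeline } from 'stream/promises';

// Hypothetical helper: take the last path segment of the URL as the file name.
function fileNameFromUrl(url: string): string {
  return url.substring(url.lastIndexOf('/') + 1);
}

// Hypothetical helper: runs in Node (not inside page.evaluate) and returns a
// Promise that resolves with destPath once the file is fully written.
function downloadToFile(url: string, destPath: string, cookieHeader = ''): Promise<string> {
  return new Promise((resolve, reject) => {
    const headers = cookieHeader ? { cookie: cookieHeader } : {};
    https
      .get(url, { headers }, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Unexpected status code ${res.statusCode}`));
          return;
        }
        // pipeline streams the response to disk chunk by chunk and
        // rejects if either side of the pipe fails.
        pipeline(res, fs.createWriteStream(destPath)).then(() => resolve(destPath), reject);
      })
      .on('error', reject);
  });
}
```

It would then be called from the same scope that owns the Puppeteer page, for example: await downloadToFile(url, `/tmp/${fileNameFromUrl(url)}`, cookieHeader). Because the response is piped rather than buffered, a 500 MB file should not need to fit in memory.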
CodePudding user response:
I have gotten it working the following way:
// Download file. This runs in Node, not inside page.evaluate, and assumes a
// fetch whose res.body is a Node.js stream (for example node-fetch).
const fileName = downloadUrl.substring(downloadUrl.lastIndexOf('/') + 1);
// Reuse the logged-in session by forwarding the page's cookies.
const cookies = await page.cookies();
const cStr = cookies.map((c: any) => `${c.name}=${c.value}`).join('; ');
const fRes = fetch(downloadUrl, {
  headers: {
    accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6,es;q=0.5,sv;q=0.4,ru;q=0.3',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    cookie: cStr,
  },
  referrerPolicy: 'no-referrer',
  body: null,
  method: 'GET',
});
return await fRes
  .then(async (res) => {
    const gcsFile = await uploadNextcloudFileToGoogleCloudStorage(fileName);
    const dest = gcsFile.createWriteStream();
    return new Promise((resolve, reject) => {
      // @ts-ignore
      res.body.on('error', reject);
      dest.on('error', reject);
      // 'finish' is emitted by the writable side once all data has been flushed.
      dest.on('finish', () => resolve('it worked'));
      // @ts-ignore
      res.body.pipe(dest);
    });
  })
  .then(() => {
    return {
      status: 200,
      downloadUrl,
      fileName,
    };
  })
  .catch((e) => {
    return {
      status: 400,
      error: e,
    };
  });
This streams the file directly to Google Cloud Storage without storing it in /tmp first.
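The two reusable pieces of this approach are building the cookie header and waiting for the destination stream to finish. They could be factored out roughly as follows. This is only a sketch: toCookieHeader, streamTo, and the CookieLike type are hypothetical names, and any Node.js Readable and Writable pair is assumed (for example a response body stream and a storage write stream).

```typescript
import { Readable, Writable } from 'stream';
import { pipeline } from 'stream/promises';

// Hypothetical minimal shape of the cookie objects used above.
interface CookieLike {
  name: string;
  value: string;
}

// Join cookies into a single Cookie request header value.
function toCookieHeader(cookies: CookieLike[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Resolve once the destination has received and flushed all data,
// and reject if either the source or the destination errors.
async function streamTo(source: Readable, dest: Writable): Promise<void> {
  await pipeline(source, dest);
}
```

With these, the body of the Promise above would reduce to something like await streamTo(res.body, gcsFile.createWriteStream()), and error handling for both streams comes from pipeline rather than from manually attached listeners.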
