I am trying to fetch the Product Description content from the Nykaa website.
URL: https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512?productId=460512&pps=1&skuId=460502
On this page, in the Product Description section, clicking the 'Read More' button reveals some additional text at the end.
The text I want to extract is:
Explore the entire range of Foundation available on Nykaa. Shop more Nykaa Cosmetics products here.You can browse through the complete world of Nykaa Cosmetics Foundation . Alternatively, you can also find many more products from the Nykaa SkinShield Anti-Pollution Matte Foundation range.
Expiry Date: 15 February 2024
Country of Origin: India
Name of Mfg / Importer / Brand: FSN E-commerce Ventures Pvt Ltd
Address of Mfg / Importer / Brand: 104 Vasan Udyog Bhavan Sun Mill Compound Senapati Bapat Marg, Lower Parel, Mumbai City Maharashtra - 400013
When I inspect the page and disable JavaScript, all the content in the Product Description section disappears, which suggests the content is loaded dynamically by JavaScript.
I have used Selenium for this. This is what I have tried:
from msilib.schema import Error
from tkinter import ON
from turtle import goto
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import numpy as np
from random import randint
import pandas as pd
import requests
import csv
browser = webdriver.Chrome(
    r'C:\Users\paart\.wdm\drivers\chromedriver\win32\97.0.4692.71\chromedriver.exe')
browser.maximize_window()  # For maximizing window
browser.implicitly_wait(20)  # gives an implicit wait for 20 seconds
browser.get(
    "https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512?productId=460512&pps=1&skuId=460502")
# Creates "load more" button object.
browser.implicitly_wait(20)
loadMore = browser.find_element_by_xpath(xpath="/html/body/div[1]/div/div[3]/div[1]/div[2]/div/div/div[2]")
loadMore.click()
browser.implicitly_wait(20)
desc_data = browser.find_elements_by_class_name('content-details')
for desc in desc_data:
    para_details = browser.find_element_by_xpath(
        './/*[@id="content-details"]/p[1]').text
    extra_details = browser.find_elements_by_xpath(
        './/*[@id="content-details"]/p[2]', './/*[@id="content-details"]/p[3]', './/*[@id="content-details"]/p[4]', './/*[@id="content-details"]/p[5]').text
    print(para_details, extra_details)
This is the output I get:
PS E:\Web Scraping - Nykaa> python -u "e:\Web Scraping - Nykaa\scrape_nykaa_final.py"
e:\Web Scraping - Nykaa\scrape_nykaa_final.py:16: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
browser = webdriver.Chrome(
DevTools listening on ws://127.0.0.1:1033/devtools/browser/097c0e11-6f2c-4742-a2b5-cd05bee72661
e:\Web Scraping - Nykaa\scrape_nykaa_final.py:28: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
loadMore = browser.find_element_by_xpath(
[9312:4972:0206/110327.883:ERROR:ssl_client_socket_impl.cc(996)] handshake failed; returned -1, SSL error code 1, net_error -101
[9312:4972:0206/110328.019:ERROR:ssl_client_socket_impl.cc(996)] handshake failed; returned -1, SSL error code 1, net_error -101
Traceback (most recent call last):
  File "e:\Web Scraping - Nykaa\scrape_nykaa_final.py", line 28, in <module>
    loadMore = browser.find_element_by_xpath(
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 520, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1244, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/div/div[3]/div[1]/div[2]/div/div/div[2]"}
(Session info: chrome=97.0.4692.99)
Stacktrace:
Backtrace:
Ordinal0 [0x00FDFDC3 2555331]
Ordinal0 [0x00F777F1 2127857]
Ordinal0 [0x00E72E08 1060360]
Ordinal0 [0x00E9E49E 1238174]
Ordinal0 [0x00E9E69B 1238683]
Ordinal0 [0x00EC9252 1413714]
Ordinal0 [0x00EB7B54 1342292]
Ordinal0 [0x00EC75FA 1406458]
Ordinal0 [0x00EB7976 1341814]
Ordinal0 [0x00E936B6 1193654]
Ordinal0 [0x00E94546 1197382]
GetHandleVerifier [0x01179622 1619522]
GetHandleVerifier [0x0122882C 2336844]
GetHandleVerifier [0x010723E1 541697]
GetHandleVerifier [0x01071443 537699]
Ordinal0 [0x00F7D18E 2150798]
Ordinal0 [0x00F81518 2168088]
Ordinal0 [0x00F81660 2168416]
Ordinal0 [0x00F8B330 2208560]
BaseThreadInitThunk [0x76C9FA29 25]
RtlGetAppContainerNamedObjectPath [0x77337A9E 286]
RtlGetAppContainerNamedObjectPath [0x77337A6E 238]
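The two deprecation warnings above point to the Selenium 4 style: a Service object instead of a bare driver path, and find_element(By...) instead of find_element_by_*. Below is a minimal sketch of that style with an explicit wait for the button, not a confirmed fix. The locators are assumptions: READ_MORE_XPATH matches on the visible 'Read More' label, and PARAGRAPHS_CSS reuses the content-details id from the XPath in the attempt above. Neither has been verified against the live page. The function names fetch_description_paragraphs and parse_details are hypothetical.

```python
# Values copied from the question; the two locators are unverified assumptions.
URL = ("https://www.nykaa.com/nykaa-skinshield-matte-foundation/p/460512"
       "?productId=460512&pps=1&skuId=460502")
DRIVER_PATH = r"C:\Users\paart\.wdm\drivers\chromedriver\win32\97.0.4692.71\chromedriver.exe"
READ_MORE_XPATH = "//*[normalize-space()='Read More']"  # assumed: visible button label
PARAGRAPHS_CSS = "#content-details p"                    # assumed: id from the original XPath


def fetch_description_paragraphs(url, driver_path=DRIVER_PATH, timeout=20):
    """Open the page, click 'Read More', and return the text of each paragraph."""
    # Imported here so parse_details below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome(service=Service(driver_path))
    try:
        browser.get(url)
        wait = WebDriverWait(browser, timeout)
        # Explicit wait: poll until the button exists and is clickable.
        button = wait.until(
            EC.element_to_be_clickable((By.XPATH, READ_MORE_XPATH)))
        # Click through JavaScript so an overlapping element cannot intercept it.
        browser.execute_script("arguments[0].click();", button)
        paragraphs = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, PARAGRAPHS_CSS)))
        return [p.text for p in paragraphs]
    finally:
        browser.quit()


def parse_details(text):
    """Split 'Label: value' lines (e.g. 'Country of Origin: India') into a dict."""
    details = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        # Keep only lines that have a non-empty label and a non-empty value.
        if sep and label.strip() and value.strip():
            details[label.strip()] = value.strip()
    return details
```

Calling parse_details("\n".join(fetch_description_paragraphs(URL))) would open a real browser and return the labelled fields as a dictionary. Lines without a colon, such as the "Explore the entire range" paragraph, are skipped by the parser and would have to be read from the paragraph list directly.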
Could anyone help me resolve this issue, or point out what code I am missing to fetch the text content from the Product Description? It would be a big help.
Thanks.
