How To Webscrape A Page With JavaScript Using Python?

Having a bit of trouble and can't seem to figure this out...

I am trying to scrape the URL below to get the body text, but I seem to be running into issues because the page content is loaded with JavaScript. Does anyone have suggestions on how to pull the text? Is this even possible, and which library is best suited for it?

https://www.solanalysis.com/

CodePudding user response：

I would bypass the rendered page and send the same XHR request (a GraphQL POST) that the page itself uses to fetch its data:

import requests
import pandas as pd

url = 'https://solanalysis-graphql-dot-feliz-finance.uc.r.appspot.com/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
payload = {"operationName":"GetProjectStatsQuery",
           "variables":{"pagination_info":{"page_number":1,"page_size":100},
             "conditions":{},
             "order_by":[{"field_name":"market_cap","sort_order":"DESC"}]},
           "query":"query GetProjectStatsQuery($conditions: GetProjectStatsCondition, $order_by: [OrderConfig!], $pagination_info: PaginationConfig) {\n  getProjectStats(\n    conditions: $conditions\n    order_by: $order_by\n    pagination_info: $pagination_info\n  ) {\n    project_stats {\n      project_id\n      market_cap\n      volume_7day\n      volume_1day_change\n      floor_price\n      average_price\n      average_price_1day_change\n      max_price\n      twitter_followers\n      num_of_token_listed\n      project {\n        supply\n        website\n        img_url\n        display_name\n        __typename\n      }\n      __typename\n    }\n    pagination_info {\n      total_page_number\n      current_page_number\n      has_next_page\n      current_page_size\n      __typename\n    }\n    __typename\n  }\n}\n"}

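# POST the GraphQL query as JSON and fail early on an HTTP error
response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
jsonData = response.json()

# Each entry in project_stats becomes one DataFrame row
df = pd.DataFrame(jsonData['data']['getProjectStats']['project_stats'])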

Output:

print(df.head(5).to_string())
             project_id  market_cap  volume_7day  volume_1day_change  floor_price  average_price  average_price_1day_change  max_price  twitter_followers  num_of_token_listed                                                                                                                                                                                                                                                     project   __typename
0  shadowysupercoderdao   117666155       788363             -0.0510        115.0     119.822968                    -0.0085     133.00              30267                   57                                {'supply': 10000, 'website': 'https://genesysgo.com/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/shadowysupercoderdao.png', 'display_name': 'Shadowy Super Coder DAO', '__typename': 'Project'}  ProjectStat
1                   smb    82297085      1119240             -0.0572        137.0     167.117647                     0.0259     555.00              76989                  654                                                  {'supply': 5000, 'website': 'https://market.solanamonkey.business', 'img_url': 'https://storage.googleapis.com/feliz-crypto/smb.jpg', 'display_name': 'Solana Monkey Business', '__typename': 'Project'}  ProjectStat
2              degenape    47488440       432058             -0.0175         36.0      48.349231                    -0.0283     130.00              95024                 1487  {'supply': 10000, 'website': 'https://www.degenape.academy/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/degenape/small/127RACV8SfCbbVrLdRbukh63zCDcubW4xVGh6aV6pnZi.jpg', 'display_name': 'Degenerate Ape Academy', '__typename': 'Project'}  ProjectStat
3        boryokudragonz    30880912       921541              0.1739        280.0     221.028112                     0.0342     269.69              18831                   12                                           {'supply': 1111, 'website': 'https://boryokudragonz.io/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/boryokudragonz.png', 'display_name': 'Boryoku Dragonz', '__typename': 'Project'}  ProjectStat
4                aurory    26850233       257762             -0.0401         22.0      27.342396                    -0.0001      28.90             178981                 1073                                                                           {'supply': 10000, 'website': 'https://app.aurory.io', 'img_url': 'https://storage.googleapis.com/feliz-crypto/aurorylogo.png', 'display_name': 'Aurory', '__typename': 'Project'}  ProjectStat
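Note that the project column in the output above still holds a nested dict per row. If separate columns are easier to work with, one possible approach is pd.json_normalize, which expands nested dicts into dotted column names. The sketch below runs on two hand-written sample records (sample_stats is a stand-in name, with values copied and trimmed from the output above) rather than on a live response:

```python
import pandas as pd

# Stand-in records shaped like jsonData['data']['getProjectStats']['project_stats'];
# the values are copied from the output above and trimmed to a few fields.
sample_stats = [
    {"project_id": "shadowysupercoderdao", "market_cap": 117666155, "floor_price": 115.0,
     "project": {"supply": 10000, "display_name": "Shadowy Super Coder DAO"}},
    {"project_id": "smb", "market_cap": 82297085, "floor_price": 137.0,
     "project": {"supply": 5000, "display_name": "Solana Monkey Business"}},
]

# Nested keys become columns such as 'project.supply' and 'project.display_name'
flat_df = pd.json_normalize(sample_stats)
print(flat_df.to_string())
```

With a real response, the same call should work on the project_stats list in place of sample_stats, assuming the response keeps the shape shown above.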

CodePudding user response：

You have two options:

Use Selenium, which drives a real browser, to scrape data that is loaded by JavaScript.
Inspect the XHR requests the page makes (for example in the Network tab of your browser's developer tools) to see how the data is loaded. You can often send the same request yourself with a simple library such as requests and get the data directly.
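If you go with the second option, replaying a request by hand often also means handling pagination yourself. Here is a rough sketch of that loop, assuming the endpoint honours the page_number and has_next_page fields that appear in the GraphQL query earlier in this thread. The names fetch_all_project_stats, post_json, and max_pages are hypothetical and only serve the illustration; post_json is any callable that sends a payload and returns the decoded JSON.

```python
import copy

def fetch_all_project_stats(post_json, payload, max_pages=50):
    """Collect project_stats rows across pages.

    post_json is a hypothetical callable taking a payload dict and returning
    the decoded JSON response, for example:
    lambda p: requests.post(url, headers=headers, json=p, timeout=30).json()
    """
    rows = []
    page = 1
    while page <= max_pages:  # safety cap in case has_next_page never turns False
        page_payload = copy.deepcopy(payload)  # leave the caller's payload untouched
        page_payload["variables"]["pagination_info"]["page_number"] = page
        result = post_json(page_payload)["data"]["getProjectStats"]
        rows.extend(result["project_stats"])
        # Assumption: has_next_page is False on the last page
        if not result["pagination_info"]["has_next_page"]:
            break
        page += 1
    return rows
```

Passing the request function in as an argument keeps the loop independent of any one HTTP library and makes it easy to try out against canned responses before hitting the real endpoint.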