I'm having a bit of trouble and can't seem to figure this out.
I am trying to scrape the URL below to get the body text, but I seem to be running into issues because the page is rendered with JavaScript. Does anyone have suggestions on how to pull the text? Is this even possible, and what is the best library to use?
https://www.solanalysis.com/
CodePudding user response:
I would call the XHR endpoint that the page fetches its data from directly:
import requests
import pandas as pd
url = 'https://solanalysis-graphql-dot-feliz-finance.uc.r.appspot.com/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
payload = {"operationName":"GetProjectStatsQuery",
"variables":{"pagination_info":{"page_number":1,"page_size":100},
"conditions":{},
"order_by":[{"field_name":"market_cap","sort_order":"DESC"}]},
"query":"query GetProjectStatsQuery($conditions: GetProjectStatsCondition, $order_by: [OrderConfig!], $pagination_info: PaginationConfig) {\n getProjectStats(\n conditions: $conditions\n order_by: $order_by\n pagination_info: $pagination_info\n ) {\n project_stats {\n project_id\n market_cap\n volume_7day\n volume_1day_change\n floor_price\n average_price\n average_price_1day_change\n max_price\n twitter_followers\n num_of_token_listed\n project {\n supply\n website\n img_url\n display_name\n __typename\n }\n __typename\n }\n pagination_info {\n total_page_number\n current_page_number\n has_next_page\n current_page_size\n __typename\n }\n __typename\n }\n}\n"}
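# POST the GraphQL query and parse the JSON response.
jsonData = requests.post(url, headers=headers, json=payload).json()
# Each entry in project_stats becomes one DataFrame row.
df = pd.DataFrame(jsonData['data']['getProjectStats']['project_stats'])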
Output:
print(df.head(5).to_string())
project_id market_cap volume_7day volume_1day_change floor_price average_price average_price_1day_change max_price twitter_followers num_of_token_listed project __typename
0 shadowysupercoderdao 117666155 788363 -0.0510 115.0 119.822968 -0.0085 133.00 30267 57 {'supply': 10000, 'website': 'https://genesysgo.com/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/shadowysupercoderdao.png', 'display_name': 'Shadowy Super Coder DAO', '__typename': 'Project'} ProjectStat
1 smb 82297085 1119240 -0.0572 137.0 167.117647 0.0259 555.00 76989 654 {'supply': 5000, 'website': 'https://market.solanamonkey.business', 'img_url': 'https://storage.googleapis.com/feliz-crypto/smb.jpg', 'display_name': 'Solana Monkey Business', '__typename': 'Project'} ProjectStat
2 degenape 47488440 432058 -0.0175 36.0 48.349231 -0.0283 130.00 95024 1487 {'supply': 10000, 'website': 'https://www.degenape.academy/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/degenape/small/127RACV8SfCbbVrLdRbukh63zCDcubW4xVGh6aV6pnZi.jpg', 'display_name': 'Degenerate Ape Academy', '__typename': 'Project'} ProjectStat
3 boryokudragonz 30880912 921541 0.1739 280.0 221.028112 0.0342 269.69 18831 12 {'supply': 1111, 'website': 'https://boryokudragonz.io/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/boryokudragonz.png', 'display_name': 'Boryoku Dragonz', '__typename': 'Project'} ProjectStat
4 aurory 26850233 257762 -0.0401 22.0 27.342396 -0.0001 28.90 178981 1073 {'supply': 10000, 'website': 'https://app.aurory.io', 'img_url': 'https://storage.googleapis.com/feliz-crypto/aurorylogo.png', 'display_name': 'Aurory', '__typename': 'Project'} ProjectStat
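Note that the project column in the output above still holds a nested dict per row. If you want supply, website, display_name and so on as their own columns, you can flatten each record before building the DataFrame. This is only a minimal sketch: flatten_row and the project_ prefix are names I made up for illustration, not something the API provides.

```python
def flatten_row(row, nested_key="project", sep="_"):
    """Return a copy of row with the dict under nested_key lifted to top-level keys.

    flatten_row and its parameters are hypothetical helper names for this sketch.
    """
    # Keep every field except the nested dict itself.
    flat = {key: value for key, value in row.items() if key != nested_key}
    # Tolerate a missing or null nested dict.
    nested = row.get(nested_key) or {}
    for key, value in nested.items():
        flat[f"{nested_key}{sep}{key}"] = value
    return flat


# Trimmed-down record shaped like one entry of project_stats.
sample = {
    "project_id": "smb",
    "floor_price": 137.0,
    "project": {"supply": 5000, "display_name": "Solana Monkey Business"},
}
print(flatten_row(sample))
```

With the response from above you would then build the frame as pd.DataFrame([flatten_row(r) for r in jsonData['data']['getProjectStats']['project_stats']]). pandas also ships pd.json_normalize, which does the same kind of flattening and names the columns project.supply, project.website, etc.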
CodePudding user response:
You have two options:
- You can use Selenium to scrape data that is loaded by JavaScript.
- You can inspect the XHR requests the page makes and work out how its data is loaded. You may be able to send the same request yourself with a simple library such as requests and get the data you need.
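If you go the XHR route for this site, keep in mind that the endpoint is paginated: the payload in the other answer carries page_number and page_size, and the response reports has_next_page. Here is a rough sketch of looping over the pages. It assumes the response keeps that same shape; fetch_all_pages and max_pages are hypothetical names of mine, and session is expected to be something like requests.Session().

```python
import copy


def fetch_all_pages(session, url, payload, max_pages=50):
    """Collect project_stats rows from every page of the endpoint.

    Hypothetical helper: session is any object with a requests-style
    .post(url, json=..., timeout=...) method, e.g. requests.Session().
    """
    rows = []
    page = 1
    # max_pages is a safety cap so a bad has_next_page flag cannot loop forever.
    while page <= max_pages:
        body = copy.deepcopy(payload)  # leave the caller's payload untouched
        body["variables"]["pagination_info"]["page_number"] = page
        response = session.post(url, json=body, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        result = response.json()["data"]["getProjectStats"]
        rows.extend(result["project_stats"])
        if not result["pagination_info"]["has_next_page"]:
            break
        page += 1
    return rows
```

You would call it as fetch_all_pages(requests.Session(), url, payload) with the url and payload from the other answer (set the session's user-agent header first if the server requires it), then pass the returned list to pd.DataFrame.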
