I am new to Python and learning data analysis. I am trying to scrape data from this web page: https://bitinfocharts.com/dogecoin/address/DN5Hp2kCkvCsdwr5SPmwHpiJgjKnC5wcT7
I can scrape data from simple websites, but because BitInfoCharts uses tables, I think its HTML may be more complex than the examples in the tutorials I am following.
My goal is to scrape the data from the table, which includes Block, Time, Amount, Balance, etc., and save it to a CSV file. I previously tried pandas but found it difficult to select the data I want from the HTML.
To do this, I think I need to get the header/table information from the table with id "table_maina" and then pull all of the information from each row inside it that has class "trb". The number of rows with class="trb" changes from page to page (for example, one address may have 7 transactions and another may have 40). I am not exactly sure, though, as this is new territory for me.
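The plan described above (find the table by its id, then loop over every row with class "trb") can be sketched offline like this. The HTML string is hypothetical and heavily simplified; only the id "table_maina" and the class "trb" come from the real page, and `extract_rows` is an illustrative helper name, not part of any library. Because `find_all` returns every matching row, the same code works whether an address has 7 transactions or 40.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: only id="table_maina" and class="trb"
# are taken from the real page; the cells are made up.
SAMPLE_HTML = """
<table id="table_maina">
  <tr><th>Block</th><th>Amount</th></tr>
  <tr class="trb"><td>101</td><td>5 DOGE</td></tr>
  <tr class="trb"><td>102</td><td>-2 DOGE</td></tr>
  <tr class="trb"><td>103</td><td>7 DOGE</td></tr>
</table>
"""

def extract_rows(html):
    """Return (header, rows) for the table with id="table_maina"."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="table_maina")
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    # find_all returns every matching <tr>, however many there are
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr", class_="trb")
    ]
    return header, rows

header, rows = extract_rows(SAMPLE_HTML)
print(header)     # ['Block', 'Amount']
print(len(rows))  # 3
print(rows[0])    # ['101', '5 DOGE']
```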
I would really appreciate any help.
```python
import requests
from bs4 import BeautifulSoup as bs
url = 'https://bitinfocharts.com/dogecoin/address/DN5Hp2kCkvCsdwr5SPmwHpiJgjKnC5wcT7'
headers = {"User-Agent":"Mozilla/5.0"}
r = requests.get(url, headers=headers)
soup = bs(r.content)
table = soup.find_all("table_maina")
print(table)
```
Answer:
If you decide to parse the table manually rather than with pandas, the following reads the header row and every data row of the table and writes them to x.csv:
```python
import csv
import requests
from bs4 import BeautifulSoup as bs

url = 'https://bitinfocharts.com/dogecoin/address/DN5Hp2kCkvCsdwr5SPmwHpiJgjKnC5wcT7'
request_headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(url, headers=request_headers)
soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package

# The transactions table has id="table_maina"
table = soup.find(id="table_maina")

headers = []
datarows = []
for row in table.find_all('tr'):
    heads = row.find_all('th')
    if heads:
        # Header row: remember the column names
        headers = [th.text for th in heads]
    else:
        # Data row: one list of cell texts per <tr>
        datarows.append([td.text for td in row.find_all('td')])

# 'with' ensures the file is flushed and closed
with open('x.csv', 'w', newline='') as f:
    fcsv = csv.writer(f)
    fcsv.writerow(headers)
    fcsv.writerows(datarows)
```
Answer:
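There is only one table with the id 'table_maina', so call find() rather than find_all(). Also, the first argument to find() and find_all() is the tag name: find_all("table_maina") searches for <table_maina> tags, which don't exist, so it returns an empty list. Pass "table" as the tag name and match the id with the id keyword argument.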
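A minimal offline sketch of the difference, using a hypothetical one-row table whose only real detail is the id "table_maina":

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the id matches the real page
html = '<table id="table_maina"><tr class="trb"><td>1</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag *name*, so this looks for <table_maina> tags
by_name = soup.find_all("table_maina")
# Tag name plus an attribute filter finds the actual table
by_id = soup.find("table", id="table_maina")

print(by_name)      # []
print(by_id["id"])  # table_maina
```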
Try:
```python
table = soup.find('table', id='table_maina')
for tr in table.find_all('tr', class_='trb'):
    print(tr.text)
```
Output:
```
4066317 2022-01-17 15:41:22 UTC2022-01-17 15:41:22 UTC-33,000,000 DOGE (5,524,731.65 USD)220,000,005.04121223 DOGE$36,831,545 @ $0.167$-28,974,248
4063353 2022-01-15 11:04:46 UTC2022-01-15 11:04:46 UTC 4,000,000 DOGE (759,634.87 USD)253,000,005.04121223 DOGE$48,046,907 @ $0.19$-23,283,618
...
```
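In this output the cells run together (for example "UTC2022-01-17") because tr.text concatenates every text fragment in the row with nothing in between. BeautifulSoup's get_text() accepts a separator and strip=True, and collecting each <td> separately keeps the columns apart. The HTML below is hypothetical: it assumes the block cell nests the time in an inner element, which would explain why the timestamp appears twice, but the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical row modeled on the output above (inner <span> is an assumption)
html = (
    '<table><tr class="trb">'
    '<td>4066317 <span>2022-01-17 15:41:22 UTC</span></td>'
    '<td>2022-01-17 15:41:22 UTC</td>'
    '<td>-33,000,000 DOGE</td>'
    '</tr></table>'
)
tr = BeautifulSoup(html, "html.parser").find("tr")

# .text joins all text fragments with no separator
print(tr.text)
# 4066317 2022-01-17 15:41:22 UTC2022-01-17 15:41:22 UTC-33,000,000 DOGE

# get_text(separator, strip=True) strips each fragment and joins with the separator
print(tr.get_text(" | ", strip=True))
# 4066317 | 2022-01-17 15:41:22 UTC | 2022-01-17 15:41:22 UTC | -33,000,000 DOGE

# One string per <td> keeps the columns separate
cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
print(cells)
```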
Next, to write each row to a CSV file, try this:
```python
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://bitinfocharts.com/dogecoin/address/DN5Hp2kCkvCsdwr5SPmwHpiJgjKnC5wcT7'
headers = {"User-Agent": "Mozilla/5.0"}
# Keep TLS certificate verification on (verify=False would disable it)
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

table = soup.find("table", id='table_maina')

with open('out.csv', 'w', newline='') as fout:
    csv_writer = csv.writer(fout)
    # Column names are written by hand rather than read from the page
    csv_writer.writerow(['Block', 'Time', 'Amount', 'Balance', 'Price', 'Profit'])
    # Each transaction is a <tr class="trb">; write its cells as one CSV row
    for tr in table.find_all('tr', class_='trb'):
        tds = tr.find_all('td')
        csv_writer.writerow([x.text for x in tds])
```
Output:
```
Block,Time,Amount,Balance,Price,Profit
4066317 2022-01-17 15:41:22 UTC,2022-01-17 15:41:22 UTC,"-33,000,000 DOGE (5,524,731.65 USD)","220,000,005.04121223 DOGE","$36,831,545 @ $0.167","$-28,974,248"
...
```
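Every value in this CSV is still a string. For data analysis you will probably want numbers and timestamps instead. A minimal sketch, assuming every Block and Amount cell follows the formats in the output above (only two rows are shown, so other rows may differ); parse_block and parse_amount are hypothetical helper names. Decimal is used instead of float to avoid rounding errors in the DOGE and USD amounts.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# Patterns for the cell formats seen in the output above (assumption)
BLOCK_RE = re.compile(r"^(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC$")
AMOUNT_RE = re.compile(
    r"^([-+]?[\d,]+(?:\.\d+)?) DOGE \(([-+]?[\d,]+(?:\.\d+)?) USD\)$"
)

def parse_block(cell):
    """'4066317 2022-01-17 15:41:22 UTC' -> (4066317, UTC datetime)."""
    m = BLOCK_RE.match(cell.strip())
    if m is None:
        raise ValueError(f"unexpected block cell: {cell!r}")
    when = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S")
    return int(m.group(1)), when.replace(tzinfo=timezone.utc)

def parse_amount(cell):
    """'-33,000,000 DOGE (5,524,731.65 USD)' -> (Decimal DOGE, Decimal USD)."""
    m = AMOUNT_RE.match(cell.strip())
    if m is None:
        raise ValueError(f"unexpected amount cell: {cell!r}")
    # Drop thousands separators before converting
    doge, usd = (Decimal(g.replace(",", "")) for g in m.groups())
    return doge, usd

block, when = parse_block("4066317 2022-01-17 15:41:22 UTC")
doge, usd = parse_amount("-33,000,000 DOGE (5,524,731.65 USD)")
print(block, when.isoformat(), doge, usd)
```

The helpers raise ValueError on anything unexpected rather than guessing, so a format change on the site shows up immediately instead of silently producing wrong numbers.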
