I want to scrape https://www.airport-data.com/manuf/Reims.html, iterate through all of its result pages, and write the results to AircraftListing.csv. The code runs without error, but the columns are populated incorrectly and not all of the records on the web pages are extracted to the .csv file.
How can I get all of the Reims aviation records into AircraftListing.csv?
import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.airport-data.com/manuf/Reims.html"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')
paging = soup.find("table", {"class": "table table-bordered table-condensed"}).find_all("td")
start_page = paging[1].text
last_page = paging[len(paging) - 2].text

outfile = open('AircraftListing.csv', 'w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Tail_Number", "Year_Maker_Model", "C_N", "Engines", "Seats", "Location"])

pages = list(range(1, int(last_page) + 1))
for page in pages:
    url = 'https://www.airport-data.com/manuf/Reims:%s.html' % (page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    print('https://www.airport-data.com/manuf/Reims:%s' % (page))
    product_name_list = soup.find("table", {"class": "table table-bordered table-condensed"}).find_all("td")
    # Each row has 6 elements in it.
    # Loop through every sixth element. (The first element of each row)
    # Get all the other elements in the row by adding to index of the first.
    for i in range(int(len(product_name_list) / 6)):
        Tail_Number = product_name_list[(i * 6)].get_text('td')
        Year_Maker_Model = product_name_list[(i * 6) + 1].get_text('td')
        C_N = product_name_list[(i * 6) + 2].get_text('td')
        Engines = product_name_list[(i * 6) + 3].get_text('td')
        Seats = product_name_list[(i * 6) + 4].get_text('td')
        Location = product_name_list[(i * 6) + 5].get_text('td')
        writer.writerow([Tail_Number, Year_Maker_Model, C_N, Engines, Seats, Location])

outfile.close()
print('Done')
CodePudding user response:
There are better ways to do this, but the smallest change is to replace lines 32-40 of your script with:
# Each row has 6 elements in it.
# Loop through every sixth element. (The first element of each row)
# Get all the other elements in the row by adding to index of the first.
for i in range(len(product_name_list) // 6):
    # get_text(strip=True) returns the cell text without surrounding whitespace.
    # (Passing 'td' as the first argument would use "td" as a separator string.)
    Tail_Number = product_name_list[(i * 6)].get_text(strip=True)
    Year_Maker_Model = product_name_list[(i * 6) + 1].get_text(strip=True)
    C_N = product_name_list[(i * 6) + 2].get_text(strip=True)
    Engines = product_name_list[(i * 6) + 3].get_text(strip=True)
    Seats = product_name_list[(i * 6) + 4].get_text(strip=True)
    Location = product_name_list[(i * 6) + 5].get_text(strip=True)
    # Write one CSV row per table row, inside the loop.
    writer.writerow([Tail_Number, Year_Maker_Model, C_N, Engines, Seats, Location])
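The comments explain what is going on: find_all("td") returns every cell as one flat list, so each group of six consecutive cells is one table row.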
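As one possible version of the "better ways" mentioned above, here is a minimal sketch that walks the table row by row instead of counting cells in a flat list. It parses a small inline sample rather than the live site. The markup in SAMPLE_HTML, the aircraft values in it, and the helper name extract_rows are all made up for illustration; the real page may be structured differently, so treat the six-cell check as an assumption to verify against the actual HTML.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for one results page. The values are
# invented placeholders, not real records from the site.
SAMPLE_HTML = """
<table class="table table-bordered table-condensed">
  <tr><th>Tail Number</th><th>Year Maker Model</th><th>C/N</th>
      <th>Engines</th><th>Seats</th><th>Location</th></tr>
  <tr><td><a href="#">F-AAAA</a></td><td>1975 Reims F172M</td><td>1234</td>
      <td>1</td><td>4</td><td>France</td></tr>
  <tr><td><a href="#">G-BBBB</a></td><td>1980 Reims F152</td><td>5678</td>
      <td>1</td><td>2</td><td>United Kingdom</td></tr>
  <tr><td colspan="6">1 2 3 Next</td></tr>
</table>
"""


def extract_rows(html, expected_cells=6):
    """Return one list of cell texts per table row that has exactly
    expected_cells <td> cells; header and paging rows are skipped."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table table-bordered table-condensed")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Rows with a different cell count (headers, paging) are ignored,
        # so they cannot shift the columns of the rows that follow.
        if len(cells) == expected_cells:
            rows.append(cells)
    return rows


# Write to an in-memory buffer here; a real script would open the CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Tail_Number", "Year_Maker_Model", "C_N",
                 "Engines", "Seats", "Location"])
writer.writerows(extract_rows(SAMPLE_HTML))
print(buffer.getvalue())
```

In the original script, the same function could be called with html.text for each page inside the page loop. Checking the cell count per row means a stray cell affects only its own row, whereas the flat-list approach misaligns every row after it.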
