BeautifulSoup, Web Scraping & Pagination, csv

Time:01-06

I want to scrape https://www.airport-data.com/manuf/Reims.html, iterate through all of its pages, and extract the results into AircraftListing.csv. The code runs without error, but the results are populated incorrectly and not all of the records are extracted from the web page into the .csv file.

How can I get all the Reims Aviation records into AircraftListing.csv?

import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.airport-data.com/manuf/Reims.html"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')

paging = soup.find("table",{"class":"table table-bordered table-condensed"}).find_all("td")

start_page = paging[1].text
last_page = paging[len(paging)-2].text


outfile = open('AircraftListing.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Tail_Number", "Year_Maker_Model", "C_N","Engines", "Seats", "Location"])


pages = list(range(1, int(last_page) + 1))
for page in pages:
    url = 'https://www.airport-data.com/manuf/Reims:%s.html' %(page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')

    print ('https://www.airport-data.com/manuf/Reims:%s' %(page))

    product_name_list = soup.find("table",{"class":"table table-bordered table-condensed"}).find_all("td")

    # Each row has 6 elements in it.
    # Loop through every sixth element. (The first element of each row)
    # Get all the other elements in the row by adding to index of the first.
    for i in range(len(product_name_list) // 6):
        Tail_Number = product_name_list[(i*6)].get_text(strip=True)
        Year_Maker_Model = product_name_list[(i*6) + 1].get_text(strip=True)
        C_N = product_name_list[(i*6) + 2].get_text(strip=True)
        Engines = product_name_list[(i*6) + 3].get_text(strip=True)
        Seats = product_name_list[(i*6) + 4].get_text(strip=True)
        Location = product_name_list[(i*6) + 5].get_text(strip=True)

        writer.writerow([Tail_Number, Year_Maker_Model, C_N, Engines, Seats, Location])

outfile.close()
print ('Done')

CodePudding user response:

There are better ways to do this, but in lines 32-40 use:

# Each row has 6 elements in it.
# Loop through every sixth element. (The first element of each row)
# Get all the other elements in the row by adding to index of the first.
for i in range(len(product_name_list) // 6):
    Tail_Number = product_name_list[(i*6)].get_text(strip=True)
    Year_Maker_Model = product_name_list[(i*6) + 1].get_text(strip=True)
    C_N = product_name_list[(i*6) + 2].get_text(strip=True)
    Engines = product_name_list[(i*6) + 3].get_text(strip=True)
    Seats = product_name_list[(i*6) + 4].get_text(strip=True)
    Location = product_name_list[(i*6) + 5].get_text(strip=True)

    writer.writerow([Tail_Number, Year_Maker_Model, C_N, Engines, Seats, Location])

The comments explain what is going on.
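
To see the index arithmetic on its own, without any network access, here is a minimal sketch that groups a flat list of cell texts into rows of six and writes them with the csv module. The helper name group_cells and the sample values are made up for illustration; they are not taken from the site. It assumes every table row really does contribute exactly six cells, which is the same assumption the loop above makes.

```python
import csv
import io

COLUMNS = ["Tail_Number", "Year_Maker_Model", "C_N", "Engines", "Seats", "Location"]


def group_cells(cells, width=len(COLUMNS)):
    """Split a flat list of cell texts into rows of `width` cells each.

    Trailing cells that do not fill a whole row are dropped, which mirrors
    looping over range(len(cells) // width).
    """
    return [cells[i * width:(i + 1) * width] for i in range(len(cells) // width)]


# Made-up values standing in for the text of each <td> (hypothetical data).
sample_cells = [
    "F-AAAA", "1975 Reims F172M", "0001", "1", "4", "Paris",
    "F-BBBB", "1980 Reims F152", "0002", "1", "2", "Lyon",
    "leftover",  # incomplete row, dropped by group_cells
]

rows = group_cells(sample_cells)

# Write to an in-memory buffer instead of a file so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If a page has header cells or rows with a different number of cells, this grouping will shift the columns, so it is worth checking len(cells) % 6 on a real page before relying on it.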
