URL link checker for CSV file

Time:01-27

I have a CSV file with a list of HTTP URLs. For each URL in the list, I need to check whether it is reachable over HTTP. How can I do that?

CodePudding user response:

You can check the URLs with a Python script.

As input you need a file links.csv with this structure:

name,link
google,https://google.com
bla,https://doesnot.exist.com
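
If you prefer to address the columns by name instead of by position, csv.DictReader can read the same structure. This is only a small sketch: read_links is a name I made up for illustration, and the sample text stands in for the real links.csv so that it runs without any file.

```python
import csv
import io

# In-memory stand-in for links.csv, using the structure shown above
sample = "name,link\ngoogle,https://google.com\nbla,https://doesnot.exist.com\n"


def read_links(handle):
    """Return the values of the 'link' column, skipping rows without one."""
    reader = csv.DictReader(handle)
    return [row["link"] for row in reader if row.get("link")]


links = read_links(io.StringIO(sample))
print(links)
```

With a real file you would pass open("links.csv", newline="") instead of the io.StringIO object.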

Copy the following Python code into a file named check_url.py.
Then execute it with: python3 check_url.py

import csv
import urllib.error
import urllib.parse
import urllib.request
import socket

# try to resolve the hostname
def hostname_resolves(hostname):
    try:
        socket.gethostbyname(hostname)
        return 1
    except socket.error:
        return 0    

# open the file; it is closed automatically at the end of the with block
with open("links.csv", newline="", encoding="UTF8") as file:
    csvreader = csv.reader(file)

    # extract headers
    header = next(csvreader)

    # extract data, skipping blank lines
    rows = [row for row in csvreader if row]

# iterate over the links and check if they can be reached and respond with a valid http response code
for row in rows:

    # extract url
    url = row[1]
    print("check url: " + url)

    # extract host (hostname leaves out any port or credentials contained in the netloc)
    parsed_url = urllib.parse.urlparse(url)
    host = parsed_url.hostname or ""

    # try to resolve host over dns
    resolvable = hostname_resolves(host)

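    # if the host could be resolved, try to do an http request
    url_reachable_over_http = 0
    if resolvable == 1:
        try:
            http_status_code = urllib.request.urlopen(url, timeout=10).getcode()
        except urllib.error.HTTPError as error:
            # urlopen raises HTTPError for 4xx/5xx responses; keep the status code
            http_status_code = error.code
        except (urllib.error.URLError, OSError, ValueError):
            # no http response at all: connection refused, timeout, TLS failure or malformed url
            http_status_code = None
        if http_status_code is not None and http_status_code < 500:
            url_reachable_over_http = 1

    row.append(url_reachable_over_http)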

# write the result to a new csv file
with open('links_checked_result.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)

    # write the header, extended by a column for the check result
    writer.writerow(header + ['reachable'])

    for row in rows:
        # write the data
        writer.writerow(row)

The output should be a file links_checked_result.csv with this content:

name,link,reachable
google,https://google.com,1
bla,https://doesnot.exist.com,0
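
If you want to reuse the HTTP check elsewhere, it could be pulled out into a small helper. Treat this as a sketch under a few assumptions: fetch_status and is_reachable are names I made up, the 10 second timeout is an arbitrary choice, and the rule that any status below 500 counts as reachable is taken from the script above. The fetch parameter is only there so the decision logic can be tried without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx/5xx status
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, TLS error or malformed URL
        return None


def is_reachable(url, fetch=fetch_status):
    """Return 1 if fetch(url) yields a status below 500, otherwise 0."""
    status = fetch(url)
    return 1 if status is not None and status < 500 else 0
```

Calling is_reachable("https://google.com") performs a real request, while is_reachable(url, fetch=lambda u: 503) lets you check how a given status code would be classified.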