Scraping data from Basketball Reference, but the loop is not iterating through the full URL

The loop only works up to this point in the URL, 'https://www.basketball-reference.com/teams/{0}', and nothing after it, so it ends up grabbing the wrong data from the wrong URL:

import pandas as pd

team_abbrev = pd.read_csv(r'C:\Users\micha\OneDrive\Desktop\NBA\team_abbreviations.csv')

for i in team_abbrev:
    url = ('https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/#tgl_advanced').format(i)
    team_perf = pd.read_html(url)[0]

Answer:

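You aren't iterating through the rows of your DataFrame. Looping directly over a pandas DataFrame (for i in team_abbrev) yields its column labels, not its values, so i is the header name rather than a team abbreviation. Load the CSV into a DataFrame first, then iterate over its rows: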

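import pandas as pd
import requests

def baskiceball():
    filename = 'C:/Users/Me/Desktop/teams.csv'
    df = pd.read_csv(filename)
    # iterrows() yields (index, row) pairs; each row is a Series of that row's values
    for index, row in df.iterrows():
        for x in range(0, len(row)):
            # .iloc gives positional access into the row Series
            url = f'https://www.basketball-reference.com/teams/{row.iloc[x]}/2022/gamelog-advanced/#tgl_advanced'
            r = requests.get(url)
            data = r.status_code
            print(f"{row.iloc[x]} | {data}")

baskiceball()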
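Because the CSV has only one column, the nested iterrows()/range() loop can also be replaced by iterating that column directly. A minimal sketch, assuming the column is named team_abbreviation (as in the teams.csv described in this answer) and the 2022 season; build_gamelog_urls is a hypothetical helper name. The #tgl_advanced fragment is dropped because a URL fragment only tells a browser where to scroll and is normally not sent to the server:

```python
import pandas as pd

def build_gamelog_urls(df: pd.DataFrame, column: str = "team_abbreviation",
                       season: int = 2022) -> dict:
    """Map each abbreviation in `column` to its advanced game log URL (hypothetical helper)."""
    base = "https://www.basketball-reference.com/teams/{0}/{1}/gamelog-advanced/"
    # Iterating a single column (a Series) yields its cell values, one per row.
    return {abbr: base.format(abbr, season) for abbr in df[column]}

# Usage with an in-memory frame standing in for teams.csv.
teams = pd.DataFrame({"team_abbreviation": ["SAC", "GSW"]})
for abbr, url in build_gamelog_urls(teams).items():
    print(abbr, url)
```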

My teams.csv file has the team abbreviations in a single column:

team_abbreviation
SAC
GSW
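With a file shaped like this, it is easy to check why the original loop built the wrong URL. A minimal sketch that reads the same two rows from an in-memory string (io.StringIO stands in for teams.csv so it runs anywhere):

```python
import io

import pandas as pd

# Same content as teams.csv: one header line, then one abbreviation per row.
csv_text = "team_abbreviation\nSAC\nGSW\n"
team_abbrev = pd.read_csv(io.StringIO(csv_text))

# Iterating the DataFrame itself yields column labels, not cell values.
labels = [i for i in team_abbrev]
print(labels)  # ['team_abbreviation']

# Iterating one column (a Series) yields the abbreviations themselves.
abbrevs = [abbr for abbr in team_abbrev["team_abbreviation"]]
print(abbrevs)  # ['SAC', 'GSW']
```

So in the question's loop, {0} was filled with the header text team_abbreviation rather than a team code.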

1. Plug each abbreviation into the URL. It fills the team segment of the path, not a query string.
2. Make the request with r = requests.get(url).
3. Read the response. In this instance I used r.status_code, since the page returns HTML rather than JSON and I only wanted to show that each URL resolves. The result:

SAC | 200
GSW | 200
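The status codes only confirm that each URL resolves. To get back to the original goal of reading each team's table, note that the question's loop also overwrote team_perf on every pass; collecting the tables and concatenating them keeps all of them. A sketch under stated assumptions: collect_game_logs is a hypothetical helper, the table-reading step is passed in as a fetch_table callable, and the default pause is an arbitrary politeness delay, not a documented site limit:

```python
import time
from typing import Callable, Iterable

import pandas as pd

def collect_game_logs(
    abbrevs: Iterable[str],
    fetch_table: Callable[[str], pd.DataFrame],
    pause: float = 3.0,
) -> pd.DataFrame:
    """Fetch one table per team and stack them (hypothetical helper)."""
    frames = []
    for n, abbr in enumerate(abbrevs):
        if n > 0:
            time.sleep(pause)  # arbitrary delay between requests (assumption)
        url = f"https://www.basketball-reference.com/teams/{abbr}/2022/gamelog-advanced/"
        table = fetch_table(url)
        table["team"] = abbr  # tag each row with the team it came from
        frames.append(table)
    # Concatenate once at the end instead of overwriting one variable per pass.
    return pd.concat(frames, ignore_index=True)
```

For real use you might call collect_game_logs(["SAC", "GSW"], lambda url: pd.read_html(url)[0]); pd.read_html needs an HTML parser such as lxml installed. Passing the fetcher in keeps the looping logic testable without network access.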