Scraping data from basketball reference and it is not looping through full url

Time:01-28

The loop only seems to use the URL up to 'https://www.basketball-reference.com/teams/{0}' and ignores everything after it, so it pulls the wrong data from the wrong URL.

team_abbrev = pd.read_csv(r'C:\Users\micha\OneDrive\Desktop\NBA\team_abbreviations.csv')



for i in team_abbrev:
    url = ('https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/#tgl_advanced').format(i)

    team_perf = pd.read_html(url)[0]

CodePudding user response:

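You aren't iterating through the rows of your DataFrame. Iterating over a DataFrame directly (for i in team_abbrev) yields its column labels, not its rows, so {0} is filled with the column name instead of a team abbreviation. Load the CSV into a DataFrame, then iterate over its rows: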

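import pandas as pd
import requests

def baskiceball():
    # CSV with one team abbreviation per row
    filename = 'C:/Users/Me/Desktop/teams.csv'
    df = pd.read_csv(filename)
    # iterrows() yields (index, row) pairs, one per CSV row
    for index, row in df.iterrows():
        for x in range(0, len(row)):
            # Positional access; strip stray whitespace such as 'GSW '
            team = str(row.iloc[x]).strip()
            url = f'https://www.basketball-reference.com/teams/{team}/2022/gamelog-advanced/#tgl_advanced'
            r = requests.get(url)
            data = r.status_code
            print(f"{team} | {data}")

baskiceball()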

My teams.csv document has the team abbreviations in a single column:

team_abbreviation
SAC
GSW 
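
To see why the original loop never reached the team abbreviations, here is a minimal sketch that uses an in-memory copy of the CSV above in place of the real file. The names csv_text, labels and teams are illustrative only and do not come from the question.

```python
import io

import pandas as pd

# In-memory stand-in for teams.csv (same layout as shown above)
csv_text = "team_abbreviation\nSAC\nGSW \n"
df = pd.read_csv(io.StringIO(csv_text))

# Iterating over the DataFrame itself yields column labels, not rows
labels = [i for i in df]
print(labels)  # ['team_abbreviation']

# Iterating over the column yields the actual abbreviations
teams = [t.strip() for t in df["team_abbreviation"]]
print(teams)  # ['SAC', 'GSW']
```

So in the question's loop, i would be the column name (or the first abbreviation, if the CSV has no header row), which explains the wrong URL.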

You plug each value from the row into the URL.

You make the request: r = requests.get(url)

You read the response. In this instance I went with r.status_code, since the URL returns HTML rather than JSON and I just wanted to show that the request works. The result:

SAC | 200
GSW | 200
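
Once the status codes look right, the same loop can feed pd.read_html, as in the original question. The following is only a sketch under a few assumptions: build_url and fetch_team_tables are hypothetical helper names, the pause between requests is a guessed courtesy delay rather than a documented limit, and pd.read_html needs an HTML parser such as lxml installed. The #tgl_advanced fragment is left out because a URL fragment is never sent to the server.

```python
import time

import pandas as pd

URL_TEMPLATE = "https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/"


def build_url(team):
    """Return the advanced game-log URL for one team abbreviation."""
    return URL_TEMPLATE.format(team.strip())


def fetch_team_tables(teams, pause_seconds=5):
    """Hypothetical helper: map each abbreviation to the first table on its page."""
    tables = {}
    for team in teams:
        # read_html returns a list of DataFrames, one per <table> it finds
        tables[team] = pd.read_html(build_url(team))[0]
        # Assumed courtesy delay; the site's real limits are not stated here
        time.sleep(pause_seconds)
    return tables


# Example usage (performs real network requests, so it is left commented out):
# df = pd.read_csv("C:/Users/Me/Desktop/teams.csv")
# team_perf = fetch_team_tables(df["team_abbreviation"].str.strip())
```

Whether index 0 is the table you want depends on the page layout, so it is worth checking the length of the list read_html returns before relying on it.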