Home > Mobile >  How to pass a parameter into bigquery query on colab
How to pass a parameter into bigquery query on colab

Time:01-21

I have a Bigquery query on colab:

from google.colab import auth
auth.authenticate_user()
print('Authenticated')
project_id = '[your project ID]'

sample_count = 2000
df = pd.io.gbq.read_gbq('''
  SELECT name, SUM(number) as count
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE state = 'TX'
  AND year BETWEEN 1910 AND 1920
  GROUP BY name
  ORDER BY count DESC
  LIMIT 100
''', project_id=project_id, dialect='standard')

df.head()

It works, but now I try to pass a parameter into the query and replace '1920' in the query WHERE clause. this parameter is dependent on another file

end_year = max(record.year) # set end_year

df = pd.io.gbq.read_gbq('''
  SELECT name, SUM(number) as count
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE state = 'TX'
  AND year BETWEEN 1910 AND end_year
  GROUP BY name
  ORDER BY count DESC
  LIMIT 100
''', project_id=project_id, dialect='standard')

df.head()

But I get an error:

BadRequest: 400 Syntax error: Unexpected identifier "end_year"

I guess the parameter doesn't pass into the query successfully, but I don't know how to fix it.

CodePudding user response:

As far as python cares, the query you're passing to bigquery is a string, so it cannot call your variable.

Try something like this:

end_year = max(record.year) # set end_year

df = pd.io.gbq.read_gbq('''
  SELECT name, SUM(number) as count
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE state = 'TX'
  AND year BETWEEN 1910 AND %s
  GROUP BY name
  ORDER BY count DESC
  LIMIT 100
''' % (end_year), project_id=project_id, dialect='standard')

df.head()

Really important note here -- I'm assuming that this is a script you're personally running for one-off data analysis and not code used in a production application where SQL injection might be a concern.

CodePudding user response:

As @Mike Karp mentioned, the query in your code is a String that is why you are encountering an error whenever you are passing your variable directly to the query.

You may also use python's f string to format your string and be able to pass the variable inside your query.

from google.colab import auth
import pandas as pd
auth.authenticate_user()
print('Authenticated')
project_id = 'PROJECT_ID'

end_year = max(record.year) # set end_year

query = (f" SELECT name, SUM(number) as count \
  FROM `bigquery-public-data.usa_names.usa_1910_2013` \
  WHERE state = 'TX' \
  AND year BETWEEN 1910 AND {end_year} \
   GROUP BY name \
  ORDER BY count DESC \
  LIMIT 100")

df = pd.io.gbq.read_gbq(query=query, project_id=project_id, dialect='standard')

df.head()
  •  Tags:  
  • Related