Snowflake Copy Into rejecting file without error in Python Snowflake Connector-CodePudding

I'm using the Python Snowflake Connector to PUT a JSON file into a Snowflake Stage and then COPY INTO to insert the JSON into a table.

Here's my code:

import snowflake.connector

snowflake_conn = snowflake.connector.connect(
            user=sf_user,
            password=sf_password,
            account=sf_account
            )

role_init = "USE ROLE ELT_ROLE"
wh_init = "USE WAREHOUSE TEST_WH"
db_init = "USE DATABASE TEST_DB"
schema_init = "USE SCHEMA TEST_SCHEMA"

snowflake_conn_cur.execute(role_init)
snowflake_conn_cur.execute(wh_init)
snowflake_conn_cur.execute(db_init)
snowflake_conn_cur.execute(schema_init)

remove_file_command = 'REMOVE @TEST_STAGE/test_file.json;'
put_file_command = 'PUT file://test_file.json @TEST_STAGE;'
truncate_existing_table_data_command = 'TRUNCATE TABLE OUTPUT_TABLE;'
copy_file_command = 'COPY INTO OUTPUT_TABLE FROM @TEST_STAGE/test_file.json file_format=(TYPE=JSON) on_error=CONTINUE;'

snowflake_conn_cur.execute(remove_file_command)
snowflake_conn_cur.execute(put_file_command)
snowflake_conn_cur.execute(truncate_existing_table_data_command)
snowflake_conn_cur.execute(copy_file_command)

My code executes successfully, but I noticed in Snowflake that the file was rejected (separate issue).

In the Snowflake Python Connector, on the cursor execute statements, is there a way to have it return back an error and use that to validate that it completed successfully or not?

Without that, it is basically failing silently. The only other way I can think of is to query the table after the fact to see if it has data, but that may not always help if the table wasn't truncated beforehand.

CodePudding user response：

For PUT/GET only it should return the error by default. For your example, using:

PUT file://test_file.json

on a Mac/Linux machine is incorrect (should be PUT file:///test_file.json) and would generate a stacktrace by default, like in this example:

cs = ctx.cursor()
cs.execute("PUT file://Users/<user>/Downloads/result_00XXX.csv @~")
cs.close()
ctx.close()

gets me:

$python basic_test.py
Traceback (most recent call last):
  File "basic_test.py", line 37, in <module>
    cs.execute("PUT file://Users/<user>/Downloads/result_00XXX.csv @~")
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 763, in execute
    sf_file_transfer_agent.execute()
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/file_transfer_agent.py", line 366, in execute
    self._init_file_metadata()
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/file_transfer_agent.py", line 966, in _init_file_metadata
    Error.errorhandler_wrapper(
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/errors.py", line 272, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/errors.py", line 327, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/Users/<user>/Documents/Connectors/python/snow/lib/python3.8/site-packages/snowflake/connector/errors.py", line 206, in default_errorhandler
    raise error_class(
snowflake.connector.errors.ProgrammingError: 253006: 253006: File doesn't exist: ['Users/<user>/Downloads/result_00XXXX.csv']

You can also use a try/catch block to just get the error:

cs = ctx.cursor()

try:
    cs.execute("PUT file://Users/<user>/Downloads/result_00XXX.csv @~")
except Exception as err:
    print(err)
finally:
    cs.close()
ctx.close()

gets me:

$ python basic_test.py
253006: 253006: File doesn't exist: ['Users/<user>/Downloads/result_00XXX.csv']

CodePudding user response：

Remove the "on_error=CONTINUE" command in the COPY INTO function in order to raise the error. The issue that was causing the error when trying to load the file into the table is that the file is too large.

To resolve the file size issue, since my JSON was encapsulated into an array, setting the STRIP_OUTER_ARRAY=TRUE on the file_format of the COPY INTO command removes the array and loads each JSON node into its own row in the target table.