Azure Synapse Analytics - column load with high length-CodePudding

I have a column which contains description and has length around 80000(contains unicode characters). I am unable to load it to Azure Synapse Analytics. Datatype defined in Synapse is nvarchar. We are using Azure Databricks. I tried options with maxStrLength (.option('maxStrLength', 4000)), but the max value for this is 4000. So, I am getting this error.

: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Underlying SQLException(s):

com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]

Could you please let me know if there any way to solve this issue?

Update - The issue is resolved by removing .option('maxStrLength', 4000) while writing the dataframe into Synapse and declaring the target datatype as nvarchar(max)

CodePudding user response：

You can store up to 1 billion 2-byte Unicode characters by using nvarchar [ ( n | max ) ].

Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.

You can refer the related SO threads mentioned below:

How does SQL Server store more than 4000 characters in NVARCHAR(max)?

SQL query variable nvarchar(max) can not store more than 4000 characters