ERROR: invalid byte sequence for encoding "UTF8": 0xff

I'm importing a database called AdventureWorks into PostgreSQL

and this message appears:

ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY businessentity, line 1
SQL state: 22021

CodePudding user response：

Very likely, you used the wrong client encoding when running the script. Figure out what the encoding of the file is and use that as the client encoding, so that PostgreSQL can convert the data properly.

CodePudding user response：

As the error says, the byte 0xFF isn't valid in a UTF-8 file. Since you're trying to load data from a SQL Server sample database, I suspect the file was saved as UTF-16 with a Byte Order Mark (BOM). Unicode isn't a single encoding. Unicode text files can contain a signature at the start which specifies the encoding used in the file. For UTF-16 the BOM is either 0xFF 0xFE (little-endian) or 0xFE 0xFF (big-endian), and neither byte is valid anywhere in UTF-8.
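To check whether a file really starts with a BOM, you can look at its first few bytes. Here is a minimal Python sketch of that check; the names `detect_bom` and `sniff_file` are made up for illustration, and it only recognises files that actually carry a BOM:

```python
import codecs

# BOM signatures mapped to a Python codec name that can decode the file.
# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM begins with
# the same two bytes (0xFF 0xFE) as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(head: bytes):
    """Return a codec name if `head` starts with a known BOM, else None."""
    for bom, codec_name in BOMS:
        if head.startswith(bom):
            return codec_name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and look for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

If `sniff_file` returns `"utf-16"` for the data file, that would explain the 0xff on line 1 of the COPY. A result of `None` only means there is no BOM, not that the file is UTF-8.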

As far as I know you can't specify a UTF-16 encoding with COPY, so you'll have to either convert the CSV file to UTF-8 with a command line tool or export it again as UTF-8. If you exported the data using any SQL Server tool (SSMS, SSIS, bcp), you can easily specify the encoding you want. For example:

bcp Person.BusinessEntity out "c:\MyPath\BusinessEntity.csv" -c -C 65001

This exports the data using code page 65001, which is UTF-8.
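If re-exporting isn't an option, the other route mentioned above is converting the existing file. A rough Python sketch of that conversion follows, assuming the file really is UTF-16; the function name `convert_utf16_to_utf8` and the chunk size are illustrative choices, not part of any tool:

```python
def convert_utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16 text file as UTF-8 without a BOM.

    The "utf-16" codec reads the BOM to pick the byte order and drops it.
    newline="" keeps the original line endings unchanged.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            # Copy in chunks so large exports are not loaded into memory at once.
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

Write the output to a new file rather than over the original, then point the COPY at the converted file.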