Here's the statement:
LOAD DATA INFILE '/var/lib/mysql-files/test.csv'
INTO TABLE test
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
Here's a line that imports fine:
1450327840667357185, 1, "This is my text"
If the text field has a comma though, it fails. For example, see this line:
1450327840667357185, 1, "This is my text, with a comma in it"
Why does it fail?
If I escape the comma (by putting a backslash in front of it) it works fine. This does not make sense though. I have stipulated that the fields may be enclosed by double quotes, so why doesn't it accept all the content therein as a content of the field?
If I have to reprocess all the text to quote comma's, it would be a big task.
CodePudding user response:
Your CSV has spaces after the , separator.
I don't have MySQL to play with, but I expect that means it's seeing those spaces as part of the values, causing all sorts of problems.
This means that the CSV line is read as...
- VAL =
1450327840667357185 - SEP =
, - VAL =
1 - SEP =
, - VAL =
"This is my text - SEP =
, - VAL =
with a comma in it"
Note that the leading spaces are included in the values.
This means that the third values does not start with " (it starts with a space), so the ENCLOSED BY '"' isn't doing anything. This means that the parser treats the string as unquoted, which in turn means that the subsequent , is seen as a separator.
Try TERMINATED BY ', '...
This will mean that the the parser now reads the line as...
- VAL =
1450327840667357185 - SEP =
, - VAL =
1 - SEP =
, - VAL =
"This is my text, with a comma in it"
Note that the third value now does start with " (because the leading space is now part of the separator), and so is a quoted string and so the subsequent , is not treated as a separator.
CodePudding user response:
Use Input Preprocessing. Load the whole row into a variable then parse it to separate columns.
LOAD DATA INFILE '/var/lib/mysql-files/test.csv'
INTO TABLE test
-- FIELDS TERMINATED BY ''
(@tmp)
SET column1 = SUBSTRING_INDEX(@tmp, ',', 1),
@tmp = TRIM(SUBSTRING(@tmp FROM 1 LOCATE(',', @tmp))),
column2 = SUBSTRING_INDEX(@tmp, ',', 1),
column3 = TRIM(BOTH '"' FROM TRIM(SUBSTRING(@tmp FROM 1 LOCATE(',', @tmp))))
