I was working on a task and got an error that says "invalid char between encapsulated token and delimiter". Here is a screenshot of the data:
For line 08, I was using the pipe (|) as the delimiter and '^' as the escape character. For line 12, I was using the comma as the delimiter. The highlighted part is the issue. If I remove the caret (^) from line 08 and the single quote from line 12, it runs successfully. The processor is ConvertRecord, and here is a screenshot of the processor's configuration.
Actually, I am using two ConvertRecord processors. In one processor the field separator is a comma (,), whereas in the second processor the field separator is also a comma (,) but the escape character is the caret (^).
Assume that these are two different records.
Why is it throwing an error at that point, and how can I solve this issue?
Thanks in advance.
CodePudding user response:
For the first sample data (line 08), configure CSVReader as:
Quote Character: "
Escape Character: \
Value Separator (delimiter): |
For the second sample data (line 12), configure CSVReader as:
Quote Character: "
Escape Character: \
Value Separator (delimiter): ,
The reason for the failure is that your data does not conform to the delimited-data format, i.e. the data is invalid, so you need to add upstream cleanup logic.
For the line 08 data: you have set the escape character to ^, and the same character also appears in the data. When CSVReader encounters ^", it treats the " as escaped, so the opening double quote no longer has a corresponding closing double quote, which causes the exception. Setting Escape Character to \ should resolve the issue. \ is a widely used escape character, so it rarely appears as part of the data.
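To make the escape-character effect concrete, here is a minimal sketch using Python's standard csv module as a stand-in for NiFi's CSVReader. The sample row is hypothetical (the real data was only shown in a screenshot), and Python reports a different error message than NiFi does; the point is only to show how ^ immediately before the closing quote changes where the quoted field ends.

```python
import csv

# Hypothetical sample row; the actual record was only visible in a screenshot.
SAMPLE = '1|"abc^"|def'


def parse(line, escapechar):
    """Parse one pipe-delimited line.

    strict=True makes malformed quoting raise csv.Error instead of
    being silently accepted.
    """
    reader = csv.reader(
        [line],
        delimiter="|",
        quotechar='"',
        escapechar=escapechar,
        strict=True,
    )
    return next(reader)


def try_parse(line, escapechar):
    """Return the parsed fields, or a short error string if parsing fails."""
    try:
        return parse(line, escapechar)
    except csv.Error as exc:
        return f"error: {exc}"


# With ^ as the escape character, ^" is read as a literal quote, so the
# quoted field is never closed and the parser fails.
print(try_parse(SAMPLE, "^"))

# With \ as the escape character, ^ is ordinary data and the quote closes.
print(try_parse(SAMPLE, "\\"))
```

With the backslash setting, the second field keeps the caret as ordinary data (abc^), which matches the behaviour described above.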
For the line 12 data: it looks like the single quote ' is being used as the Quote Character and the corresponding closing ' is missing, which causes the exception. You need to devise logic that adds the missing closing quote character wherever required. A workaround is to set Quote Character to " so that ' becomes part of the data, and then clean it up downstream; e.g. if you are loading the data into a table, update the column after ingestion to remove the '.
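The workaround for the unbalanced single quote can be sketched the same way, again with Python's csv module standing in for CSVReader. The sample row and the helper names (parse_with_quote, strip_stray_quotes) are made up for illustration, and the cleanup step is only one possible downstream fix.

```python
import csv

# Hypothetical sample row with an opening single quote that is never closed.
SAMPLE = "7,'widget,blue"


def parse_with_quote(line, quotechar):
    """Parse one comma-delimited line with the given quote character."""
    reader = csv.reader([line], delimiter=",", quotechar=quotechar, strict=True)
    return next(reader)


def strip_stray_quotes(fields):
    """Downstream cleanup: drop single quotes that ended up inside the values."""
    return [field.replace("'", "") for field in fields]


# Using ' as the quote character fails because the quote is never closed.
try:
    parse_with_quote(SAMPLE, "'")
except csv.Error as exc:
    print(f"error: {exc}")

# Using " as the quote character turns ' into ordinary data, which can be
# cleaned afterwards.
fields = parse_with_quote(SAMPLE, '"')
print(fields)
print(strip_stray_quotes(fields))
```

Two caveats apply to this sketch: once ' is no longer the quote character, a comma that was meant to sit inside the quoted value will split it into separate fields, and removing every ' would also remove legitimate apostrophes, so the cleanup rule should be adapted to the real data.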


