Javascript regex replace character if preceded AND followed by a letter

Time:01-12

Basically I am working with a string that is a json string in Python, but by the time it is used in JavaScript it has single quotes (') instead of double quotes. I would like to turn it into real JSON with JSON.parse(). Because I replaced every ' with a double quote, however, some double quotes now sit in the middle of the values.

Example: '{"author": "Jonah D"Almeida", ... }' (I want to replace the one in between D and Almeida)

Since quotation marks already delimit the whole value, JavaScript throws an error because it cannot parse the string as JSON. To solve that, I want to replace the quotation mark in the middle of the value with a ' (single quote), but only if a letter both precedes and follows it.

My thought: myString.replace('letter before ... " ... letter after', "'")

Any idea how I can get the right expression for this? I just want a regex that checks whether the " quote has a letter directly before and after it and, if so, changes it to a single quote (').

CodePudding user response:

The OP ... "Basically I am working with a string that is a json string"

The example above is not a JSON string, despite what the OP calls it. The OP's example data string is already invalid JSON.

Thus the first step should be to fix the process that generates such data.

The reason: parsing valid JSON data returns a perfectly valid object and, in the OP's use case, a correctly escaped string value as well, as the following proves ...

const testSample_A = { author: "Jonah D'Almeida" };
const testSample_B = { author: 'Jonah D"Almeida' };
const testSample_C = { author: 'Jonah D\'Almeida' };
const testSample_D = { author: "Jonah D\"Almeida" };

console.log({
  testSample_A,
  testSample_B,
  testSample_C,
  testSample_D,
});
console.log('JSON.stringify(...) ... ', {
  testSample_A: JSON.stringify(testSample_A),
  testSample_B: JSON.stringify(testSample_B),
  testSample_C: JSON.stringify(testSample_C),
  testSample_D: JSON.stringify(testSample_D),
});
console.log('JSON.parse(JSON.stringify(...)) ... ', {
  testSample_A: JSON.parse(JSON.stringify(testSample_A)),
  testSample_B: JSON.parse(JSON.stringify(testSample_B)),
  testSample_C: JSON.parse(JSON.stringify(testSample_C)),
  testSample_D: JSON.parse(JSON.stringify(testSample_D)),
});

Edit

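A sanitizing task which exactly follows the OP's requirements can nevertheless be achieved with a regex that features both a positive lookbehind and a positive lookahead: either /(?<=\w)"(?=\w)/gm for basic Latin (note that \w also matches digits and the underscore, so it is slightly broader than "letters"), or, more internationally, with Unicode property escapes: /(?<=\p{L})"(?=\p{L})/gmu. Neither pattern actually needs the m flag, since neither uses ^ or $. The examples replace the quote with an escaped \" so that the result stays parseable and the original character is preserved; passing "'" as the replacement yields the single quote the OP asked for.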

console.log('Letter unicode escapes ...', `
  {"author": "Jonah D"Almeida", ... }
  {"author": "Jon"ah D"Almeida", ... }
  {"author": "Jon"ah D"Alme"ida", ... }`
    .replace(/(?<=\p{L})"(?=\p{L})/gmu, '\\"')
);
console.log('Basic Latin support ...', `
  {"author": "Jonah D"Almeida", ... }
  {"author": "Jon"ah D"Almeida", ... }
  {"author": "Jon"ah D"Alme"ida", ... }`
    .replace(/(?<=\w)"(?=\w)/gm, '\\"')
);
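
To make the difference between the two patterns more visible, here is a small sketch that runs both against a few inputs. The sample strings are made up for illustration and are not from the OP's data; the non-ASCII letters are written as \u escapes only to keep the snippet encoding-safe.

```javascript
// Hypothetical sample inputs, not taken from the original question.
const samples = [
  'D"Almeida',            // ASCII letters on both sides of the quote
  'Jos\u00e9"\u00c1ngel', // accented letters (é and Á) on both sides
  'A1"2B',                // digits on both sides
  'snake_"_case',         // underscores on both sides
];

// \w is [A-Za-z0-9_]: ASCII letters, digits and the underscore.
const wordCharPattern = /(?<=\w)"(?=\w)/g;
// \p{L} is any Unicode letter and requires the u flag.
const letterPattern = /(?<=\p{L})"(?=\p{L})/gu;

const results = samples.map((input) => ({
  input,
  wordChar: input.replace(wordCharPattern, "'"),
  letter: input.replace(letterPattern, "'"),
}));

console.log(results);
```

With these inputs, only the \p{L} variant touches the quote between the accented letters, and only the \w variant touches the quotes next to digits or underscores. Which behavior is wanted depends on the data, so treat the choice as an assumption to verify against real input.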

console.log(
  'sanitized and parsed string data ...',
  JSON.parse(`[
    { "author": "Jonah D"Almeida" },
    { "author": "Jon"ah D"Almeida" },
    { "author": "Jon"ah D"Alme"ida" }
  ]`.replace(/(?<=\p{L})"(?=\p{L})/gmu, '\\"'))
);
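
Since the regex only repairs a quote that sits directly between two letters, other stray quotes will still break JSON.parse. A minimal sketch of a defensive wrapper follows; the helper name parseWithEscapedInnerQuotes and both sample strings are hypothetical and only serve to show the limit of the approach.

```javascript
// Hypothetical helper: escapes a double quote only when it sits directly
// between two letters, then attempts to parse the result as JSON.
function parseWithEscapedInnerQuotes(text) {
  const sanitized = text.replace(/(?<=\p{L})"(?=\p{L})/gu, '\\"');
  try {
    return { ok: true, value: JSON.parse(sanitized) };
  } catch (error) {
    // The input was still not valid JSON after sanitizing.
    return { ok: false, error: error.message };
  }
}

// The stray quote has a letter on each side, so it gets escaped.
const fixable = parseWithEscapedInnerQuotes('{"author": "Jonah D"Almeida"}');

// These stray quotes touch a space, so the pattern leaves them alone
// and parsing fails.
const notFixable = parseWithEscapedInnerQuotes('{"author": "Jonah "JD" Almeida"}');

console.log(fixable, notFixable);
```

The second call shows why fixing the process that generates the data remains the more reliable route: the regex cannot tell a stray quote from a delimiter once it is no longer surrounded by letters.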
