I'm trying to create a very rudimentary parser that would take a multi-line string and convert that into an array containing objects. The string would be formatted like this:
title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com
The goal is to turn this into an array where each section separated by one or more empty lines would be an object containing key/value pairs with the colon as the separator in between the key and value, and one new line as the separator in between individual key/value pairs. So the input above should result in the following output:
[
  {
    title: "This is a title",
    description: "Shorter text in one line",
    image: "https://www.example.com"
  },
  {
    title: "This is another title : with colon",
    description: "Longer text that potentially could span over several new lines, even three or more",
    image: "https://www.example.com"
  },
  {
    title: "This is another title, where the blank lines above are two",
    description: "Another description",
    image: "https://www.example.com"
  }
]
I've started with this CodePen, but as you can see, the code currently has a few problems that need to be solved before it's complete.
- If colons are used in the value, the value shouldn't be split on them. I somehow need to split at the first occurrence of a colon and then ignore any additional colons in the value. This currently results in the following:
// Input:
// title: This is another title : with colon
// image: https://www.example.com
{
image: " https",
title: " This is another title "
}
- Some values could span multiple lines. The lines of such a value should be joined into a single line, and the line breaks should not be treated as separators for a new key/value pair. This currently results in the following:
// Input:
// description: Longer text that potentially
// could span over several new lines,
// even three or more
{
could span over several new lines,: undefined,
description: " Longer text that potentially",
even three or more: undefined
}
I would greatly appreciate any help with how to approach this given the code I have so far. Suggestions on how to make the code more efficient are also very welcome.
CodePudding user response:
As a partial answer, the code below handles multiple colons on one line:
var input = `title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com`;

var finalArray = [];
// Split into sections on one or more blank lines.
var first = input.split(/\n\s*\n/);
console.log("Array with sections split:", first);

first.forEach(function (section) {
  var result = section.split("\n").reduce(function (o, pair) {
    // Split on every colon, then re-join everything after the first one
    // so that colons inside the value survive.
    var parts = pair.split(":");
    var key = parts.shift().trim();
    o[key] = parts.join(":").trim();
    return o;
  }, {});
  console.log(result);
  finalArray.push(result);
});

console.log("Array of sections as objects:", finalArray);
This still doesn't handle multi-line values. The issue is that your schema gives no unambiguous way to tell whether a new line starts a new property or just continues the previous value, because a continuation line may itself contain a colon. Since colons and commas can both appear inside values, neither can serve as a reliable separator, so the second issue can't be solved with the format as it stands.
I'd advise using a special character that you don't allow in the main text body to denote the end of a key-value pair and splitting based on that.
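To illustrate that suggestion, here is a minimal sketch. It assumes a changed input format in which every key/value pair ends with a `|` character that never appears in keys or values. Both the `|` terminator and the `parseDelimited` name are hypothetical choices made for this example, not part of the original question.

```javascript
// Hypothetical terminator: assumed never to appear inside a key or value.
const PAIR_END = "|";

function parseDelimited(text) {
  return text
    .split(/\n\s*\n/) // one or more blank lines separate sections
    .filter((section) => section.trim() !== "")
    .map((section) => {
      const entry = {};
      for (const chunk of section.split(PAIR_END)) {
        const pair = chunk.trim();
        if (pair === "") continue; // empty tail after the last terminator
        const colon = pair.indexOf(":"); // only the first colon separates key and value
        if (colon === -1) continue; // skip malformed chunks that have no key
        const key = pair.slice(0, colon).trim();
        // Collapse line breaks inside the value into single spaces.
        entry[key] = pair.slice(colon + 1).trim().replace(/\s*\n\s*/g, " ");
      }
      return entry;
    });
}

const sample = `title: A title : with colon|
description: Longer text that
spans two lines|
image: https://www.example.com|

title: Second|
image: https://www.example.com|`;

const delimitedResult = parseDelimited(sample);
console.log(delimitedResult);
```

Because the terminator marks where a pair ends, line breaks inside a value no longer need to be interpreted, and extra colons are harmless since only the first one is used as the key/value boundary.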
CodePudding user response:
There is a very simple rule when you work with text: always keep regular expressions in mind.
Try this approach:
const data = `title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com`;

// Split into sections on one or more blank lines.
const bloks = data.split(/\n\s*\n/);

const result = bloks.map((blok) => {
  // Each lookbehind anchors on a key; the lookahead stops at the next key.
  const title = blok.match(/(?<=title:)([\S\s]*\n?)(?=description:)/gm).join(' ').trim();
  // Line breaks inside the description are replaced with spaces.
  const description = blok.match(/(?<=description:)([\S\s]*\n?)(?=image:)/gm).join(' ').replaceAll('\n', ' ').trim();
  // The image is the last key, so everything up to the end of the section is taken.
  const image = blok.match(/(?<=image:)([\S\s]*\n?)/gm).join(' ').trim();
  return { title, description, image };
});

console.log(result);
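For comparison, here is a minimal sketch of a single-pass, line-by-line alternative that needs no lookbehind. It assumes the only keys are `title`, `description` and `image`, and that any line not starting with one of them followed by a colon continues the previous value. The names `KNOWN_KEYS`, `PAIR_START` and `parseSections` are made up for this example.

```javascript
// Assumption: a new pair always starts with one of these keys plus a colon.
const KNOWN_KEYS = ["title", "description", "image"];
const PAIR_START = new RegExp(`^(${KNOWN_KEYS.join("|")}):\\s*(.*)$`);

function parseSections(text) {
  const entries = [];
  let current = null; // object being filled, or null between sections
  let lastKey = null; // key that continuation lines are appended to

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") {
      // A blank line closes the current section.
      current = null;
      lastKey = null;
      continue;
    }
    if (current === null) {
      // First non-blank line after a gap starts a new section.
      current = {};
      entries.push(current);
    }
    const match = PAIR_START.exec(line);
    if (match) {
      // Only the colon right after the key is consumed; later colons stay in the value.
      lastKey = match[1];
      current[lastKey] = match[2];
    } else if (lastKey !== null) {
      // Continuation line: join it to the previous value with a single space.
      current[lastKey] = current[lastKey] ? current[lastKey] + " " + line : line;
    }
    // Lines that appear before any known key are ignored.
  }
  return entries;
}

const sampleInput = `title: This is a title
description: Shorter text in one line
image: https://www.example.com

title: This is another title : with colon
description: Longer text that potentially
could span over several new lines,
even three or more
image: https://www.example.com


title: This is another title, where the blank lines above are two
description: Another description
image: https://www.example.com`;

const parsed = parseSections(sampleInput);
console.log(parsed);
```

The known-key list is what lets the parser distinguish a new pair from a continuation line, so a continuation line that happens to start with `title:`, `description:` or `image:` would still be misread. Each line is visited once, which keeps the work linear in the input size.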
