I have a string like the following:
19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit
How can I write a regex that gives me these two separate strings?
19990101 - John DoeLorem ipsum dolor sit amet
19990102 - Elton Johnconsectetur adipiscing elit
The regex I wrote works up to this point:
/\d+ -/gm
But I don't know how to include the letters that follow as well.
CodePudding user response:
For the OP's use case, a regex-based split such as str.split(/(?<=\w)\s+(?=\d)/) should already do it.
The regex uses lookarounds: it matches a whitespace sequence (\s+) that is preceded, via the lookbehind (?<= ... ), by a word character (\w) and followed, via the lookahead (?= ... ), by a digit (\d). Because lookarounds consume no characters, only the whitespace itself is removed by the split.
console.log(
'19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit 19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit'
  .split(/(?<=\w)\s+(?=\d)/)
);
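As a side note, the lookbehind is not strictly needed for the sample input. The sketch below assumes the input never starts with whitespace and that a digit only ever appears after whitespace at the start of a new entry; under those assumptions a lookahead-only pattern gives the same result, which may help in engines that lack lookbehind support. The variable names are illustrative only.

```javascript
// Sample input taken from the question.
const input = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';

// Pattern from the answer above: whitespace preceded by a word character
// and followed by a digit.
const withLookbehind = input.split(/(?<=\w)\s+(?=\d)/);

// Assumption: no leading whitespace in the input, so the lookbehind can be
// dropped and only the lookahead for a digit is kept.
const lookaheadOnly = input.split(/\s+(?=\d)/);

console.log(withLookbehind);
console.log(lookaheadOnly);
```

Both calls should return the same two entries for this input; they can differ on inputs where whitespace before a digit is preceded by punctuation rather than a word character.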
CodePudding user response:
You can use
const text = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
console.log(text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g));
console.log(text.split(/(?!^)\s+(?=\d+\s+-)/));
The text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) approach extracts the alphanumeric and whitespace characters that follow the \d+\s+- pattern. Details:
\d+ - one or more digits
\s+ - one or more whitespace characters
- - a hyphen
[A-Za-z0-9\s]* - zero or more alphanumeric or whitespace characters
[A-Za-z] - a letter, so each match ends on a letter rather than on the whitespace or digits that start the next entry
The text.split(/(?!^)\s+(?=\d+\s+-)/) approach splits the string at one or more whitespace characters that are followed by one or more digits, one or more whitespace characters, and a hyphen:
(?!^) - not at the start of the string
\s+ - one or more whitespace characters
(?=\d+\s+-) - a positive lookahead that matches a location immediately followed by one or more digits, one or more whitespace characters, and a hyphen
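To illustrate why the stricter lookahead can matter, here is a small sketch using a hypothetical input (not from the question) in which the entry text itself contains a standalone number. A pattern that only checks for a following digit splits there too, while the pattern that also requires whitespace and a hyphen after the digits does not.

```javascript
// Hypothetical input: the first entry's text contains the number 42.
const sample = '19990101 - John DoeLorem ipsum 42 dolor 19990102 - Elton Johnconsectetur elit';

// Looser pattern: splits at any whitespace between a word character and a digit.
const loose = sample.split(/(?<=\w)\s+(?=\d)/);

// Stricter pattern: splits only where digits, whitespace, and a hyphen follow.
const strict = sample.split(/(?!^)\s+(?=\d+\s+-)/);

console.log(loose);  // three pieces: the text is also cut before "42"
console.log(strict); // two pieces: one per dated entry
```

Which pattern is appropriate depends on whether numbers can occur inside the entry text; if they cannot, the simpler pattern is enough.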
