I'm trying to make a RegEx that matches all thematic breaks in a string for use in JavaScript's String.split function.
A thematic break can be:
- Hyphens: ---
- Asterisks: ***
- Underscores: ___
There can be whitespace between the hyphens, asterisks or underscores, but the characters can't be mixed; for example, --* is not valid.
Full spec: https://spec.commonmark.org/0.30/#thematic-breaks
Here's what I've tried: /[-*_]{3,}/g, but that does not match breaks with whitespace in the middle. If I add a space there, it will match stuff like -- which is not desirable. I also thought of first stripping the whitespace, but I'd like to fit it all into a RegEx.
Is this possible? And how?
CodePudding user response:
You can use this regex:
/^[ ]{0,3}([-*_])\s*\1\s*\1\s*$/gm
Explanation:
^ - match start of line
[ ]{0,3} - match up to 3 optional leading spaces
([-*_]) - match either -, * or _ and put it in a group
\s*\1\s*\1\s* - match the character from the first group two more times, with optional whitespace before, between and after
$ - match end of line
Edit (from comment):
/^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/gm
It now supports any number of repetitions, as long as the same character is used throughout.
The non-capturing group (?:\1\s*) is repeated 2 or more times, which gives at least 3 characters in total.
Examples of matches:
***
- - -
__ _
** * ** * ** * **
Examples of non matches:
*-_
abc
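To check the examples above, something like the following sketch could be used. The helper name isThematicBreak and the sample list are only for illustration. The g flag is left out here on purpose, because test() on a global regex keeps lastIndex state between calls.

```javascript
// The edited regex from this answer, without the g flag so that
// repeated test() calls do not depend on lastIndex.
const thematicBreak = /^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/m;

// Hypothetical helper: expects a single line of text.
function isThematicBreak(line) {
  return thematicBreak.test(line);
}

const samples = ["***", "- - -", "__ _", "** * ** * ** * **", "*-_", "abc", "--"];

for (const line of samples) {
  // Prints each sample next to true (match) or false (no match).
  console.log(JSON.stringify(line), isThematicBreak(line));
}
```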
I should add that I use \s although the spec says 'space or tab'. This is safe when each line is tested on its own. On a multi-line string with the m flag, \s also matches line breaks, so [ \t] is the safer choice there.
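Since the question is about String.split on a whole string, here is a rough sketch of one way to do that. It makes two assumptions that differ from the regex above. First, it uses [ \t] instead of \s, because with \s and the m flag three consecutive lines that each hold a single - would count as one break. Second, it spells out one alternative per character instead of using a backreference, because split inserts the contents of capturing groups into the resulting array. The names run, breakLine and splitOnThematicBreaks are made up for this example.

```javascript
// Assumption: only spaces and tabs may appear between the characters.
// Each alternative handles one character, so no capturing group
// (and no backreference) is needed.
const run = (ch) => `(?:${ch}[ \\t]*){3,}`;
const breakLine = new RegExp(
  `^[ ]{0,3}(?:${run("-")}|${run("\\*")}|${run("_")})$`,
  "gm"
);

// Hypothetical helper: splits a multi-line string at thematic breaks.
function splitOnThematicBreaks(text) {
  return text.split(breakLine);
}

const doc = "first\n* * *\nsecond\n-\n-\n-\nthird\n___\nfourth";
console.log(splitOnThematicBreaks(doc));
```

The line breaks around each thematic break stay attached to the neighbouring pieces, so you may want to trim the results afterwards.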
