Can someone please help me with this?
I'm trying to match Roman numerals followed by a ".", then a space, then a capital letter. For example:
I. And here is a line.
II. And here is another line.
X. Here is again another line.
So, the regex should match "I. A", "II. A" and "X. H".
I tried "^(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}){1,4}\.\s[A-Z]", but the problem is that this regex also matches ". A", and I don't want that.
In summary, the match should have at least one Roman numeral, followed by a ".", then a space and a capital letter.
CodePudding user response:
You need a (?=[LXVI]) lookahead right after ^ that requires at least one Roman numeral letter at the start of the string:
^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
# ^^^^^^^^^
See the regex demo. Not sure why you used {1,4}, I suggest removing it.
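To check the lookahead version against the sample lines from the question, here is a minimal Python sketch. The names PATTERN and leading_numeral are made up for illustration, and the fourth sample line is an invented negative case that starts with a bare ".".

```python
import re

# Pattern from the answer: the lookahead (?=[LXVI]) demands at least one
# Roman numeral letter right at the start of the line.
PATTERN = re.compile(r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]")


def leading_numeral(line):
    """Return the matched prefix (e.g. 'II. A'), or None when nothing matches."""
    m = PATTERN.match(line)
    return m.group(0) if m else None


samples = [
    "I. And here is a line.",
    "II. And here is another line.",
    "X. Here is again another line.",
    ". A line without a numeral.",  # hypothetical negative case
]
results = [leading_numeral(s) for s in samples]
print(results)  # ['I. A', 'II. A', 'X. H', None]
```

The last line is rejected because the lookahead fails on "." before the two optional groups get a chance to match the empty string.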
Another workaround here would be to use a word boundary right after ^:
^\b(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
#^^
This disallows a match where . appears at the start: \b, placed at the start of the string, requires the next char to be a word char, and the only word chars the pattern accepts there are Roman numeral letters.
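The effect of the word boundary can be seen by comparing the pattern with and without it. This is a small Python sketch; UNGUARDED, BOUNDARY and matches are hypothetical names, and the two test strings are invented.

```python
import re

# Shared tail of the pattern; both groups may match the empty string.
TAIL = r"(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]"

UNGUARDED = re.compile(r"^" + TAIL)    # no guard: a bare ". A" slips through
BOUNDARY = re.compile(r"^\b" + TAIL)   # \b needs a word char at position 0


def matches(pattern, line):
    """True when the pattern matches at the start of the line."""
    return pattern.match(line) is not None


bare = ". A line"
numbered = "IV. A line"
print(matches(UNGUARDED, bare), matches(BOUNDARY, bare))          # True False
print(matches(UNGUARDED, numbered), matches(BOUNDARY, numbered))  # True True
```

A line such as "A. B" is still rejected by the boundary version: \b succeeds there, but the numeral groups match nothing and \. then fails on "A".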
Regarding \.\s[A-Z], you may enhance it if you add + or * after \s, and if you ever need to require that part without including it in the match, turn it into a positive lookahead, (?=\.\s+[A-Z]) or (?=\.\s*[A-Z]).
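As a rough illustration of the lookahead variant, the following Python sketch returns only the numeral, while still requiring the ".", one or more whitespace characters (\s+) and a capital letter after it. NUMERAL_ONLY and numeral_prefix are illustrative names, and the inputs are invented.

```python
import re

# The trailing (?=\.\s+[A-Z]) checks for ".", whitespace and a capital
# letter without consuming them, so the match is the numeral alone.
NUMERAL_ONLY = re.compile(
    r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=\.\s+[A-Z])"
)


def numeral_prefix(line):
    """Return just the Roman numeral at the start of the line, or None."""
    m = NUMERAL_ONLY.match(line)
    return m.group(0) if m else None


print(numeral_prefix("II.   Another line"))  # II
print(numeral_prefix("X. Here is a line"))   # X
print(numeral_prefix("II. another line"))    # None (lowercase after the space)
print(numeral_prefix("II.Another line"))     # None (\s+ needs whitespace)
```

Swapping \s+ for \s* in the lookahead would accept the last input as well.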
