I'm using Perl to highlight errors in pages of text that I review in my browser. At the moment, I want to make sure the text Seq is preceded by a Maltese cross and a space (✠ ), and highlight it otherwise. I also want to ignore n>Seq.
P.S. If it's easier, I'm happy to ignore any >, since it will always be n>. In fact, it will always be </span>, so whichever of these is easiest to check for is fine.
Example phrase: ✠ Seq. S. Evangélii sec. Joánnem. — In illo témpore
I'm trying to replace xySeq if xy is NOT a Maltese cross followed by a space (✠ ) AND xy is NOT the letter n followed by a greater-than sign (n>).
In other words, I don’t want to substitute
✠ Seq
n>Seq
>Seq
</span>Seq
but I do want to replace things like
✠Seq
* Seq
a✠Seq
>aSeq
The following would work if I were only checking for single characters such as ✠ or >:
use utf8;  # the source is UTF-8, so ✠ below is one character, not three bytes
my $span_beg = q(<span class='bcy'>); # HTML markup for highlighting
my $span_end = q(</span>);
$phr =~ s/([^✠>]Seq)/$span_beg$1$span_end/g;
but [^✠ >]Seq naturally treats ✠ and the space as two separate single characters, not as the pair ✠ followed by a space.
I also tried [^(✠\s)>]Seq and a variable, [^$var>], but neither worked.
I experimented with (?<!✠\s)Seq but didn't know how to incorporate >, or whether it was even the right approach.
I hope this is possible. Thanks for any help.
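Separately from the two-character problem, any pattern containing ✠ only treats it as one character when Perl is working with character strings rather than raw UTF-8 bytes. A minimal sketch, assuming the page text arrives as UTF-8 bytes and is decoded with the core Encode module, while `use utf8` makes the ✠ literal in the source a single character:

```perl
use strict;
use warnings;
use utf8;                      # literals in this file are characters
use Encode qw(decode);

# Raw UTF-8 bytes for "✠Seq", as they might be read from a file or socket.
my $bytes = "\xE2\x9C\xA0Seq";
my $text  = decode('UTF-8', $bytes);   # now a character string

print length($bytes), "\n";    # ✠ occupies three bytes here
print length($text),  "\n";    # ✠ is a single character here

# The single-character class from the question behaves as intended only
# on the decoded string. On the raw bytes, the last byte of ✠ (\xA0) is
# "some other character", so the class matches it.
my $on_text  = $text  =~ /[^✠>]Seq/ ? 'match' : 'no match';
my $on_bytes = $bytes =~ /[^✠>]Seq/ ? 'match' : 'no match';
print "decoded: $on_text, raw bytes: $on_bytes\n";
```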
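Both attempts still produce a single character class, and a character class always matches exactly one character. Inside [...] the parentheses lose their grouping meaning and become literal, and the contents of $var in [^$var>] are likewise spliced in as individual characters. A small probe, assuming a UTF-8 source file (hence `use utf8`):

```perl
use strict;
use warnings;
use utf8;                              # the ✠ below is one character
binmode STDOUT, ':encoding(UTF-8)';    # allow printing ✠ without warnings

# Inside [...] the parentheses are literal, so this is still a class of
# single characters: anything except ( ✠ whitespace ) or >
my $class = qr/[^(✠\s)>]/;

for my $ch ('(', ')', '✠', ' ', '>', 'a', '*') {
    my $verdict = $ch =~ /\A$class\z/ ? 'allowed' : 'excluded';
    print "'$ch' $verdict\n";
}
```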
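Extending the look-behind idea: two negative look-behinds placed side by side are both tested at the same position, so either one can veto the match. A minimal sketch, assuming decoded UTF-8 text and a UTF-8 source file; `highlight_seq` is a hypothetical helper name, and `(?<!>)` is used instead of `(?<!n>)` because the question allows ignoring any >. Only Seq itself is wrapped.

```perl
use strict;
use warnings;
use utf8;                              # the ✠ literal below is one character
binmode STDOUT, ':encoding(UTF-8)';    # print wide characters without warnings

my $span_beg = q(<span class='bcy'>);  # HTML markup for highlighting
my $span_end = q(</span>);

# Hypothetical helper: wrap every "Seq" that is preceded neither by
# "✠ " nor by ">" (which also covers "n>" and "</span>").
sub highlight_seq {
    my ($phr) = @_;
    # Both look-behinds are checked at the same position, so both must hold.
    $phr =~ s/(?<!✠ )(?<!>)Seq/$span_beg$&$span_end/g;
    return $phr;
}

for my $text ('✠ Seq', 'n>Seq', '>Seq', '</span>Seq',
              '✠Seq', '* Seq', 'a✠Seq', '>aSeq') {
    print highlight_seq($text), "\n";
}
```

Because the inserted markup ends in >, a Seq that has already been wrapped is preceded by > and is left alone if the substitution runs a second time.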
Guy
Answer:
If you always want to tag Seq together with exactly the two characters before it (so Seq needs at least two characters in front of it), a couple of negative look-behinds might be enough:
s{..(?<!✠\s)(?<!n>)Seq}{$span_beg$&$span_end}g;
Or, with look-ahead:
s{(?!✠\s)(?!n>)..Seq}{$span_beg$&$span_end}g;
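To see how both forms behave on the cases listed in the question, here is a small sketch that wraps each substitution in a sub (the names `tag_lookbehind` and `tag_lookahead` are hypothetical) and prints the results. It assumes a UTF-8 source file with `use utf8`; without it, `..` counts bytes and can split the three-byte ✠ in half. One limitation it shows: because `..` must consume two characters, a Seq with fewer than two characters in front of it (such as ✠Seq at the very start of a string) is never tagged.

```perl
use strict;
use warnings;
use utf8;                              # ✠ is one character, so .. counts characters
binmode STDOUT, ':encoding(UTF-8)';

my $span_beg = q(<span class='bcy'>);
my $span_end = q(</span>);

# Look-behind form: consume two characters, then check what they were.
sub tag_lookbehind {
    my ($s) = @_;
    $s =~ s{..(?<!✠\s)(?<!n>)Seq}{$span_beg$&$span_end}g;
    return $s;
}

# Look-ahead form: check the next two characters, then consume them.
sub tag_lookahead {
    my ($s) = @_;
    $s =~ s{(?!✠\s)(?!n>)..Seq}{$span_beg$&$span_end}g;
    return $s;
}

for my $text ('✠ Seq', 'n>Seq', '</span>Seq', '* Seq', 'a✠Seq', '>aSeq', '✠Seq') {
    printf "%-12s %s\n", $text, tag_lookbehind($text);
    printf "%-12s %s\n", $text, tag_lookahead($text);
}
```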
Answer:
This should be more efficient than performing lookaround at every position:
# Doesn't include preceding characters in the span.
s{(✠ |>)?Seq}{ $1 ? $& : "$span_beg$&$span_end" }eg;
# Includes two preceding characters in the span.
# (.> instead of > stops the .. branch from matching the n> of n>Seq.)
s{(?:(✠ |.>)|..)Seq}{ $1 ? $& : "$span_beg$&$span_end" }seg;
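A sketch that runs both substitutions from this answer on the question's examples. The sub names are hypothetical, and `use utf8` is assumed so that ✠ and `..` are handled as characters. In this sketch the two-character variant uses `(✠ |.>)` rather than `(✠ |>)`: with a bare `>`, the regex engine reaches the n of n>Seq first, the `..` branch matches n>, and n>Seq gets wrapped (in real markup this would insert a span inside `</span>`). `tag_wide_bare` reproduces that for comparison.

```perl
use strict;
use warnings;
use utf8;                              # ✠ is one character
binmode STDOUT, ':encoding(UTF-8)';

my $span_beg = q(<span class='bcy'>);
my $span_end = q(</span>);

# Tags only "Seq"; a captured "✠ " or ">" means "leave this match alone".
sub tag_short {
    my ($s) = @_;
    $s =~ s{(✠ |>)?Seq}{ $1 ? $& : "$span_beg$&$span_end" }eg;
    return $s;
}

# Tags "Seq" plus the two characters before it. ".>" (not ">") lets the
# first branch match at the same position where ".." would otherwise
# swallow the "n>" of "n>Seq".
sub tag_wide {
    my ($s) = @_;
    $s =~ s{(?:(✠ |.>)|..)Seq}{ $1 ? $& : "$span_beg$&$span_end" }seg;
    return $s;
}

# Same as tag_wide but with a bare ">", kept only to show the problem.
sub tag_wide_bare {
    my ($s) = @_;
    $s =~ s{(?:(✠ |>)|..)Seq}{ $1 ? $& : "$span_beg$&$span_end" }seg;
    return $s;
}

for my $text ('✠ Seq', 'n>Seq', '</span>Seq', '* Seq', 'a✠Seq', '>aSeq') {
    printf "%-12s short: %s\n", $text, tag_short($text);
    printf "%-12s wide:  %s\n", $text, tag_wide($text);
}
print tag_wide_bare('n>Seq'), "\n";
```

The short form needs no characters before Seq, so it also handles ✠Seq at the start of a string; the wide form, like the look-around answer, needs at least two.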
