For example, I want to match all text between two different tags, as long as the first tag doesn't appear again in the text in between.
Let's say the specific strings I want to match between are "<tag 1>hello</tag 1>" and "<tag 2>hi there</tag 2>", and the specific string I don't want in between them is "<tag 1>".
So I'd want a match with this:
<tag 1>hello</tag 1>
a bunch of text that includes newlines
<tag 2>hi there</tag 2>
But not a match with this:
<tag 1>hello</tag 1>
a bunch of text that includes newlines
<tag 2>something other than hi there</tag 2>
<tag 1>something other than hello</tag 1>
a bunch of text that includes newlines
<tag 2>hi there</tag 2>
I've tried:
<tag 1>hello</tag 1>[\S\s]*?(?=<tag 1>|$)<tag 2>hi there</tag 2>
This doesn't work: it doesn't match anything.
I'll be using Python for this, so a pattern in Python's regex dialect would be good.
CodePudding user response:
"<tag 1>hello</tag 1>.*(\n .*|\s)*(?:(?!tag 1).)*(\n .*|\s)*.*<tag 2>hi there</tag 2>"mg
This regex:
(?:(?!tag 1).)* - excludes the tag 1 string, as a non-capturing group
(\n .*|\s)* - matches text on multiple lines
.* at the end of the expression allows multiple new lines between the 2 strings
<tag 1>hello</tag 1>.*....*<tag 2>hi there</tag 2> - matches everything between the strings <tag 1>hello</tag 1> and <tag 2>hi there</tag 2>
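The negative-lookahead idea described above can also be tried in a simpler form in Python. The following is a minimal sketch, not the exact pattern from this answer: it assumes the re.DOTALL flag (so that . also matches newlines) and checks for the full string "<tag 1>" at every character between the two markers. The names TEMPERED, sample_ok and sample_bad are made up for illustration.

```python
import re

# Sketch only: consume one character at a time between the markers,
# but refuse to step onto a position where "<tag 1>" begins.
# re.DOTALL lets "." match newlines as well.
TEMPERED = re.compile(
    r"<tag 1>hello</tag 1>(?:(?!<tag 1>).)*?<tag 2>hi there</tag 2>",
    re.DOTALL,
)

# First example from the question: should match.
sample_ok = (
    "<tag 1>hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>hi there</tag 2>"
)

# Second example from the question: "<tag 1>" reappears, so no match is wanted.
sample_bad = (
    "<tag 1>hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>something other than hi there</tag 2>\n"
    "<tag 1>something other than hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>hi there</tag 2>"
)

print(TEMPERED.search(sample_ok) is not None)
print(TEMPERED.search(sample_bad) is not None)
```

Checking the lookahead before every character is easy to read but can be slow on long inputs, so this is probably best kept for small texts.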
CodePudding user response:
This worked:
<tag 1>hello</tag 1>(?:[^<]*(?:<(?!tag 1)[^<]*)*?)<tag 2>hi there</tag 2>
Thanks to bobble bubble who suggested it in a comment.
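Since the question asks for Python, here is a small sketch of how the pattern above could be checked against the two examples from the question. The variable names (PATTERN, should_match, should_not_match) are just illustrative.

```python
import re

# Pattern from the answer above: after the opening marker, consume runs of
# non-"<" characters, and allow a "<" only when it is not followed by "tag 1".
PATTERN = re.compile(
    r"<tag 1>hello</tag 1>(?:[^<]*(?:<(?!tag 1)[^<]*)*?)<tag 2>hi there</tag 2>"
)

# First example from the question: a match is wanted.
should_match = (
    "<tag 1>hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>hi there</tag 2>"
)

# Second example from the question: "<tag 1>" appears again in between,
# so no match is wanted.
should_not_match = (
    "<tag 1>hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>something other than hi there</tag 2>\n"
    "<tag 1>something other than hello</tag 1>\n"
    "a bunch of text that includes newlines\n"
    "<tag 2>hi there</tag 2>"
)

print(PATTERN.search(should_match) is not None)
print(PATTERN.search(should_not_match) is not None)
```

No flags are needed here, because [^<] already matches newlines.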
CodePudding user response:
<tag 1>hello<\/tag 1>(?:\n)?(?!<tag 1>)[a-zA-Z\s\n]*<tag 2>hi there<\/tag 2>
<tag 1>hello<\/tag 1>(?:\n)? matches the string '<tag 1>hello</tag 1>' and allows for a new line after it.
(?!<tag 1>) makes sure the string '<tag 1>' does not appear at that position (negative lookahead)
[a-zA-Z\s\n]* matches 0 or more letters, spaces and newlines
<tag 2>hi there<\/tag 2> matches the string '<tag 2>hi there</tag 2>'
