So I have this example string from an HTML mail:
Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>
I'm using the following expression to find the company name, in this case Musterfirma GmbH:
(?<=Abholstellenname \(Firmenname, Details\): ).*
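To reproduce the problem, here is a minimal sketch in Python. The question doesn't say which regex engine is used, so Python's `re` module is an assumption here. It shows that the greedy `.*` runs to the end of the line and therefore also captures the trailing `<br>`:

```python
import re

# The question's original pattern: the greedy .* consumes everything up to the end of the line.
pattern = r"(?<=Abholstellenname \(Firmenname, Details\): ).*"
line = "Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>"

match = re.search(pattern, line)
print(match.group(0))  # Musterfirma GmbH<br>
```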
But I need to exclude the <br> tag following the company name.
How can I achieve this?
I wouldn't be asking here if I hadn't already read through the tutorials; I still don't get it.
Answer:
You can use
(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)
The main idea is to turn the .* part into .*?(?=<br>|$), which matches zero or more characters other than line break characters, as few as possible, up to (but not including) either <br> or the end of the string.
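A minimal sketch of this pattern in Python (the engine choice and the helper name `company_name` are assumptions for illustration). The lazy `.*?` grows one character at a time until the lookahead sees `<br>` or the end of the string, so the tag is never part of the match:

```python
import re

# Lazy .*? stops at the first position where the lookahead sees <br> or the end of the string.
PATTERN = r"(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)"


def company_name(line):
    """Hypothetical helper: return the text after the label, or None if the label is absent."""
    match = re.search(PATTERN, line)
    return match.group(0) if match else None


print(company_name("Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>"))  # Musterfirma GmbH
print(company_name("Abholstellenname (Firmenname, Details): Musterfirma GmbH"))  # Musterfirma GmbH
```

If you search a whole multi-line mail body instead of a single line, passing `re.MULTILINE` to `re.search` makes `$` match at every line end, so a line without a trailing `<br>` is still found.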
If the spaces can be any whitespace chars, replace the literal spaces in the pattern with \s.
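A sketch of the `\s` variant, again assuming Python's `re`. Each `\s` matches exactly one whitespace character, so the lookbehind stays fixed-width. That matters in Python: `\s+` inside the lookbehind would be rejected with a "look-behind requires fixed-width pattern" error, although some other engines, such as .NET, accept variable-length lookbehind. The sample line with a tab and a non-breaking space is made up for illustration:

```python
import re

# Each literal space is replaced by \s (exactly one whitespace character),
# which keeps the lookbehind fixed-width, as Python's re requires.
PATTERN = r"(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*?(?=<br>|$)"

# Hypothetical input: a tab and a non-breaking space (\xa0) in the label.
# In Python 3 str patterns, \s matches both.
line = "Abholstellenname\t(Firmenname,\xa0Details): Musterfirma GmbH<br>"
match = re.search(PATTERN, line)
print(match.group(0))  # Musterfirma GmbH
```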
Answer:
You can match the spaces with \s (any whitespace character), and the parentheses need to be escaped as \( and \).
[^<br>] matches any character other than <, >, b and r. This works for your <br>, but if anything follows the <br>, it will be captured as well.
(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]
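A minimal Python sketch of how this pattern behaves (Python's `re` and the helper `extract` are assumptions; the inputs after the first are hypothetical). The greedy `.*` backtracks until the next character is not one of `<`, `>`, `b`, `r`. That strips the `<br>` in the simple case, but anything after the `<br>` is kept, and a name that itself ends in `b` or `r` loses those letters:

```python
import re

# This answer's pattern: greedy .* followed by one character that is not <, >, b or r.
PATTERN = r"(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]"


def extract(rest):
    """Hypothetical helper: prefix the label and return what the pattern captures."""
    line = "Abholstellenname (Firmenname, Details): " + rest
    return re.search(PATTERN, line).group(0)


print(extract("Musterfirma GmbH<br>"))  # Musterfirma GmbH
print(extract("Musterfirma GmbH<br>Tel. 123"))  # Musterfirma GmbH<br>Tel. 123
print(extract("Schreinerei Maier<br>"))  # Schreinerei Maie
```

The lazy `.*?(?=<br>|$)` approach avoids both problems because it stops at the `<br>` itself rather than excluding individual characters.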
