I'm parsing a file and need to extract the street name and the house number into separate capture groups.
A file could look like this:
Start of String
Straße HNR : example street
More example data
Currently my expression looks like this:
/\s*Straße\s*HNR\s*:\s*(?<loc_street>\D+)(?<loc_streetnumber>\d*\s{0,1}[a-z]*){0,1}/g
which matches things like:
Straße HNR : example street 1 a
Straße HNR : example street 12
correctly. But if there is no house number, (?<loc_street>\D+) matches everything up to the end of the file, whereas I want it to stop at the end of the line. Any hints?
Answer:
I would use this:
/\h*Straße\s*HNR\s*:\s*(?<loc_street>[^\d\n]+)(?<loc_streetnumber>\d*\s?[a-z]*)?/g
One key point is to match only horizontal whitespace (\h) at the beginning of the line. \s is equivalent to [\r\n\t\f\v ], so a leading \s* could also swallow any newlines that come before the line.
The other point is to keep newlines out of the loc_street group. \D matches anything that is not a digit, and that includes newlines, so a greedy \D+ runs on across line breaks. [^\d\n] matches only characters that are neither a digit nor a newline, so the group stops at the end of the line.
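To see the difference concretely, here is a small sketch in Python. Python is an assumption on my part, since the question does not name a language; the sample text is made up from the example in the question.

```python
import re

text = "example street\nMore example data\n"

# \D matches any non-digit character, and a newline is a non-digit,
# so the greedy match runs across the line break to the end of the text.
greedy = re.match(r"\D+", text).group()

# Excluding \n from the character class makes the match stop at the line end.
single_line = re.match(r"[^\d\n]+", text).group()

print(repr(greedy))
print(repr(single_line))
```

The first match contains both lines, the second only the street line.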
I replaced {0,1} with ?, but that is not important, just personal preference.
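For completeness, here is a rough sketch of how the whole pattern might be applied in Python, which is an assumption since the question does not say which language is in use. Python's re module has no \h and spells named groups (?P<name>...), so the sketch uses [ \t]* instead of \h*. It also deviates from the pattern above in one place: inside the house number group it uses [ \t]? rather than \s?, because \s? could otherwise consume the line break and pull lowercase text from the next line into the number. The extract function and the SAMPLE text are hypothetical names for illustration only.

```python
import re

# Adapted for Python's re module:
#   [ \t]* stands in for \h* (horizontal whitespace only),
#   (?P<name>...) replaces (?<name>...),
#   [ \t]? replaces \s? so the number group cannot cross a line break.
PATTERN = re.compile(
    r"[ \t]*Straße\s*HNR\s*:\s*"
    r"(?P<loc_street>[^\d\n]+)"
    r"(?P<loc_streetnumber>\d*[ \t]?[a-z]*)?"
)

# Made-up input based on the lines shown in the question.
SAMPLE = (
    "Start of String\n"
    "Straße HNR : example street\n"
    "More example data\n"
    "Straße HNR : example street 1 a\n"
    "Straße HNR : example street 12\n"
)


def extract(text):
    """Return (street, house_number) pairs; the number is '' when absent."""
    return [
        (
            m.group("loc_street").strip(),
            (m.group("loc_streetnumber") or "").strip(),
        )
        for m in PATTERN.finditer(text)
    ]


for street, number in extract(SAMPLE):
    print(repr(street), repr(number))
```

The street group keeps the space that precedes the house number, which is why the sketch strips both values after matching.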
