Maybe some regex-Master can solve my problem.
I have a big list of addresses with no separators (, ;). Each address string contains the following information:
- The first group is the street name
- The second group is the street number
- The third group is the zipcode (optional)
- The last group is the town name (optional)
With my current regex, the last two test strings do not match. I need the last two regex groups to be optional, and the third group should be either 4 or 5 digits.
I tried (\d{4,5}) to allow 4 or 5 digits, but this only half works: it sometimes mixes the street number and the zip code together.
I also tried (?:\d{5})? to make the third and fourth groups optional, but this destroys my whole group layout...

This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself: https://regex101.com/r/zC8NCP/1
My brain is only farting at the moment and I can't think straight anymore.
Please help me fix this problem so I can die in peace.
CodePudding user response:
You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
See the regex demo (note that in the demo all \s are replaced with \h so that only horizontal whitespace is matched).
Details:
- ^ - start of string
- (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
- (?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
  - \s+ - one or more whitespaces
  - (\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
    - \d+ - one or more digits
    - (?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, then -, +, | or /, zero or more whitespaces, one or more digits
    - \s* - zero or more whitespaces
    - [a-z]?\b - an optional lowercase ASCII letter and a word boundary
- (?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching
  - \s+ - one or more whitespaces
  - (\d{4,5}) - Group 3: four or five digits
  - (?:\s+(.*))? - an optional sequence of one or more whitespaces and then Group 4: any zero or more chars other than line break chars, as many as possible
- $ - end of string.
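Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work. If it were placed outside, it could match right after the first whitespace and swallow the rest of the string, because Group 1 is lazy and all the other groups are optional.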
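
A minimal Python sketch of how the pattern could be applied, in case it helps to see the four groups in action. The helper name parse_address and the sample addresses are made up for illustration, and since Python's re module has no \h, this assumes each address is matched as its own string rather than as one multiline block:

```python
import re

# The pattern from the answer, split over three lines for readability.
# Assumption: one address per string, so \s cannot run across line breaks.
ADDRESS_RE = re.compile(
    r"^(.*?)"                                        # Group 1: street name (lazy)
    r"(?:\s+(\d+(?:\s*[-|+/]\s*\d+)*\s*[a-z]?\b))?"  # Group 2: street number (optional)
    r"(?:\s+(\d{4,5})(?:\s+(.*))?)?$",               # Group 3: zip, Group 4: town (optional)
    re.IGNORECASE,
)


def parse_address(line):
    """Return (street, number, zipcode, town); parts that are absent are None."""
    match = ADDRESS_RE.match(line.strip())
    return match.groups() if match else None


# Hypothetical sample addresses, not taken from the original test list.
samples = [
    "Musterstraße 12a 12345 Berlin",
    "Hauptstr. 5 8000 Zürich",
    "Lindenweg 3-5 04109 Leipzig",
    "Bahnhofstrasse 10",
]

for sample in samples:
    print(parse_address(sample))
```

Because the town group is nested inside the zip code group, an address without a zip code gives None for both of the last two groups instead of shifting the town into the wrong slot.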

