In our Python system, I'm trying to isolate the second part of a size so that I can save the values separately.
Because the data arrives in many different formats, I have to take a lot of scenarios into account. On top of that, our system requires the value to be in group 1 to be identified correctly, which adds to the complexity.
This is what I have so far:
(?<=[\/\-])\s*([A-Za-z]+|\w+)+?(?!\d*\s*\)|\d*\)|\w*\))(?!\s*[\/\-]+)
Examples
working
These examples work:
110/116
S/M
S / M
S/M(32-34)
110/116(10-12y)
110/116(S/M)
not working
However, my regex only works correctly on the examples above.
The following 7 cause issues:
S/M / L /XL
S / M / L / XL
S/M / L/XL
S/M/L/XL
S/M/L/XL(30-32)
S/M / L/XL(30-32)
S/M / L / XL(30-32)
How can I capture these cases as shown in the table below?
| Case | Input | Expected capture in group 1 |
|---|---|---|
| 1 | S/M / L /XL | "L /XL" |
| 2 | S / M / L / XL | "L / XL" |
| 3 | S/M / L/XL | "L/XL" |
| 4 | S/M/L/XL | "L/XL" |
| 5 | S/M/L/XL(30-32) | "L/XL" |
| 6 | S/M / L/XL(30-32) | "L/XL" |
| 7 | S/M / L / XL(30-32) | "L / XL" |
Issue
How can I capture a "/" in the middle, including the whole part after it (like /XL), but without any trailing parentheses (i.e. not the (30-32))?
For example, for S/M / L / XL(30-32) I want to capture only L / XL.
Answer
You can use
(?<=[/-])\s*([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)\b(?!\s*[/)-])
See the regex demo. Details:
- `(?<=[/-])` - a position immediately preceded with `/` or `-`
- `\s*` - zero or more whitespaces
- `([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)` - Group 1: one or more uppercase letters, and then an optional sequence of a `/` char enclosed with zero or more whitespaces and then one or more uppercase letters, or one or more digits
- `\b` - a word boundary
- `(?!\s*[/)-])` - immediately to the right of the current location, there can't be zero or more whitespaces and then either `/`, `)` or `-`.
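
To check the pattern against the inputs from the question, a minimal Python sketch could look like the one below. It assumes the quantifiers in the pattern are `+` ("one or more", as the details describe) and uses `re.search`, so only the first match is taken. The helper name `second_part` and the `CASES` mapping are made up for illustration, and the expected values for the six "working" inputs are my reading of the question rather than something it states.

```python
import re

# Pattern from the answer, with "+" assumed for every "one or more" quantifier.
# Group 1 holds the second part of the size.
SIZE_RE = re.compile(r"(?<=[/-])\s*([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)\b(?!\s*[/)-])")


def second_part(size):
    """Return group 1 of the first match, or None if nothing matches."""
    match = SIZE_RE.search(size)
    return match.group(1) if match else None


# Inputs from the question mapped to the capture expected in group 1.
# The first six values are inferred from the "working" examples (assumption);
# the last seven come from the table in the question.
CASES = {
    "110/116": "116",
    "S/M": "M",
    "S / M": "M",
    "S/M(32-34)": "M",
    "110/116(10-12y)": "116",
    "110/116(S/M)": "116",
    "S/M / L /XL": "L /XL",
    "S / M / L / XL": "L / XL",
    "S/M / L/XL": "L/XL",
    "S/M/L/XL": "L/XL",
    "S/M/L/XL(30-32)": "L/XL",
    "S/M / L/XL(30-32)": "L/XL",
    "S/M / L / XL(30-32)": "L / XL",
}

for text, expected in CASES.items():
    result = second_part(text)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{text!r:24} -> {result!r:10} {status}")
```

Note that the character class is `[A-Z]`, so lowercase sizes such as `s/m` are not matched; if your data contains them, widening the class to `[A-Za-z]` or compiling with `re.IGNORECASE` would be one option.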
