I am looking to parse patterns like "w x h x l" using regex, so basically the letters w,h,l (and others) with "x" in between. There could be text around the searched expression, and "w x h x l x l x h" would be valid as well.
I have tried the regular expression
(w|h|l|b)(\s*x\s*(w|h|l|b))+
but I don't understand why this doesn't work.
Examples of the expected results (with Python's re.findall):
"The measurements are (w x h x l): 5x7x3cm" => [(w,h,l)]
"Measurement options are (wxhxl), (hxlxb): Some random stuff" => [(w,h,l),(h,l,b)]
"The measurements, in form wxhxl: 5x7x3cm" => [(w,h,l)]
CodePudding user response:
You can use your pattern with a non-capturing group to extract all matches, and then split each match on x to get the separate chars:
import re
texts = [
    "The measurements are (w x h x l): 5x7x3cm", # => [(w,h,l)]
    "Measurement options are (wxhxl), (hxlxb): Some random stuff", # => [(w,h,l),(h,l,b)]
    "The measurements, in form wxhxl: 5x7x3cm" # => [(w,h,l)]
]
for text in texts:
    # Remove all whitespace from each match, then split on "x" to get the letters
    print( [tuple(''.join(x.split()).split('x')) for x in re.findall(r'\b[whlb](?:\s*x\s*[whlb])+\b', text)] )
Output:
[('w', 'h', 'l')]
[('w', 'h', 'l'), ('h', 'l', 'b')]
[('w', 'h', 'l')]
The \b[whlb](?:\s*x\s*[whlb])+\b pattern matches:
\b - a word boundary
[whlb] - a w, h, l or b char
(?:\s*x\s*[whlb])+ - one or more repetitions of an x enclosed with zero or more whitespaces and then a w, h, l or b char
\b - a word boundary
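As for why the capturing version from the question gives unexpected results: when a pattern contains capturing groups, re.findall returns the group values rather than the whole match, and a repeated group keeps only its last repetition. A minimal sketch of the difference, assuming the question's pattern ended in a + quantifier:

```python
import re

text = "The measurements, in form wxhxl: 5x7x3cm"

# Capturing groups: findall returns one tuple of group values per match,
# and a repeated group only remembers its last repetition.
capturing = re.findall(r'(w|h|l|b)(\s*x\s*(w|h|l|b))+', text)
print(capturing)  # [('w', 'xl', 'l')]

# Non-capturing group: findall returns the whole matched text instead.
non_capturing = re.findall(r'[whlb](?:\s*x\s*[whlb])+', text)
print(non_capturing)  # ['wxhxl']
```

The middle letter h is lost in the first result because groups 2 and 3 were overwritten by the second repetition.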
CodePudding user response:
If you can use the PyPI regex module, you can reuse one named capture group and read all of its captures:
\b(?<pat>[whlb])(?:\s*x\s*(?<pat>[whlb]))+\b
\b - A word boundary to prevent a partial word match
(?<pat>[whlb]) - Named group pat, matches one of w, h, l, b
(?: - Non-capturing group to repeat as a whole
\s*x\s*(?<pat>[whlb]) - Match an x between optional whitespace chars and again the named capture group pat
)+ - Close the non-capturing group and repeat it 1 or more times to match at least a single x
\b - A word boundary
import regex
# The group name "pat" is reused; the regex module keeps every capture of it
pattern = r'(?<pat>[whlb])(?:\s*x\s*(?<pat>[whlb]))+'
s = ("The measurements are (w x h x l): 5x7x3cm\n"
     "The measurements, in form wxhxl: 5x7x3cm\n"
     "Measurement options are (wxhxl), (hxlxb): Some random stuff\n"
     "w x h x l x l x h")
for m in regex.finditer(pattern, s):
    # captures() returns all values captured by "pat", not only the last one
    print(tuple(m.captures("pat")))
Output
('w', 'h', 'l')
('w', 'h', 'l')
('w', 'h', 'l')
('h', 'l', 'b')
('w', 'h', 'l', 'l', 'h')
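If installing the regex module is not an option, roughly the same result can be obtained with the standard re module by matching the whole run first and then pulling the letters out of it. This is only a sketch: the names RUN and parse_dimensions are made up for illustration, and it assumes the letters of interest are exactly w, h, l and b.

```python
import re

# Whole "w x h x l"-style run, bounded by word boundaries.
RUN = re.compile(r'\b[whlb](?:\s*x\s*[whlb])+\b')

def parse_dimensions(text):
    """Return one tuple of letters per matched run (hypothetical helper)."""
    # 'x' is not in [whlb], so collecting the letters drops the separators.
    return [tuple(re.findall(r'[whlb]', m.group())) for m in RUN.finditer(text)]

print(parse_dimensions("Measurement options are (wxhxl), (hxlxb): Some random stuff"))
print(parse_dimensions("w x h x l x l x h"))
```

Because each run is handled separately, this also copes with runs of any length, such as the five-letter example from the question.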
