I'm writing a Python regex that parses the content of a heading. However, the greedy quantifier is not working well, and the non-greedy quantifier is not working at all.
My string is:
Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:
What I'm trying to do is extract the step number and the heading, excluding the :.
I've tried multiple regex strings and came up with these two:
r1 = r"Step ?([0-9]+) ?(.*) ?:?"
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"
r1 captures the step number, but the heading group also captures the : at the end.
r2 captures the step number, but the heading group is an empty string (''). I'm not sure how to handle a .* that is followed by more of the pattern.
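A minimal reproduction, assuming the step-number group is [0-9]+ and the three headings sit in one multi-line string (the text variable is only for illustration). The greedy (.*) runs to the end of the line and keeps the colon, because the trailing ` ?:?` is happy to match nothing. The lazy (.*?) stops at zero characters for the same reason: ` ?:?` can match the empty string, so the very first attempt already succeeds and nothing forces the engine to expand.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:"""

r1 = r"Step ?([0-9]+) ?(.*) ?:?"   # greedy: .* runs to the end of the line, colon included
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"  # lazy: .*? stops as soon as the rest can match, i.e. at once

greedy = re.findall(r1, text)
lazy = re.findall(r2, text)

print(greedy)
# [('1', 'Introduce The Assets:'), ('2', 'Verifying the Assets'), ('3', 'Making sure all the data is in the right place:')]
print(lazy)
# [('1', ''), ('2', ''), ('3', '')]
```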
Edit:
The heading might contain a : inside the string; I just want to ignore the trailing one. I know I can use strip(':'), but I want to understand what I'm doing wrong.
Answer:
You can write the pattern without the non-greedy and optional parts by using a negated character class:
\bStep ?(\d+) ?([^:\n]+)
- \bStep ? Match the word Step and an optional space
- (\d+) ? Capture 1+ digits in group 1, followed by an optional space
- ([^:\n]+) Capture 1+ chars other than : or a newline in group 2
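A quick check of this pattern with re.findall on the sample headings (the text variable and the extra "Step 4" line are made up for illustration). It also shows the trade-off raised in the edit: because group 2 cannot contain a colon, a heading with an inner colon is cut at the first one.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:"""

# [^:\n]+ stops at the first colon or newline, so the trailing colon
# never ends up in group 2.
pattern = r"\bStep ?(\d+) ?([^:\n]+)"

matches = re.findall(pattern, text)
for number, heading in matches:
    print(number, repr(heading))

# Caveat: a colon inside the heading also ends group 2.
inner = re.findall(pattern, "Step 4 Note: check the totals:")
print(inner)  # [('4', 'Note')]
```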
If the colon has to be at the end of the string:
\bStep ?(\d+) ?([^:\n]+):?$
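The $ anchor matters here: in a multi-line string it only matches at each line end when re.MULTILINE is passed. The sketch below compares this anchored pattern with an alternative that is not part of the answer, namely the question's lazy r2 with a $ added, which is one way to keep inner colons while dropping only the trailing one. The fourth heading is a made-up example containing an inner colon.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:
Step 4 Note: check the totals:"""

# Anchored negated-class pattern: [^:\n]+ cannot cross a colon, so a
# heading with an inner colon (Step 4) does not match at all.
anchored = r"\bStep ?(\d+) ?([^:\n]+):?$"

# Alternative (not from the answer): the question's r2 plus a $ anchor.
# .*? must now keep expanding until only an optional space/colon is
# left before the end of the line, so inner colons survive.
lazy_anchored = r"Step ?([0-9]+) ?(.*?) ?:?$"

with_m = re.findall(anchored, text, re.MULTILINE)   # $ matches before each newline
without_m = re.findall(anchored, text)              # $ only at the end of the text
lazy = re.findall(lazy_anchored, text, re.MULTILINE)

print(with_m)
print(without_m)
print(lazy)
```

With this text, with_m holds the first three headings and skips Step 4, without_m is empty, and lazy also returns ('4', 'Note: check the totals'). Once the match has to reach the end of the line, .*? can no longer stop at zero characters, which is the piece r2 was missing.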
