This seems like a simple match, but I can't figure out how to match all text that starts with a known block of text and ends with a semicolon followed by a newline. What I have right now mostly works:
pattern = r'''[ ]+(value \w+\n)([^;]+)'''
This lets me parse a section of text such as:
value Y1N5NALC
1 = 'Yes'
5 = 'No'
7 = 'Not ascertained' ;
value AGESCRN
15 = '15 years'
16 = '16 years';
However, if any of the value strings contains a semicolon, the match ends early, because the negated class [^;] stops at the first semicolon it sees. An example:
value Y1N5NALC
1 = 'Yes'
5 = 'No;Maybe'
7 = 'Not ascertained' ;
What I'd like to do is end the match at a semicolon followed by optional spaces or tabs and then a newline. Using ([^;\n]+) fails, since the newline is then excluded by the negated class and the match stops at the end of the first line.
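A minimal reproduction of the early stop, as a sketch. The two-space indentation in the sample is an assumption (the original whitespace is not preserved in this post), and the variable names are only illustrative.

```python
import re

# Assumed sample: the leading indentation is a guess, not taken from the real file.
text = (
    "  value Y1N5NALC\n"
    "  1 = 'Yes'\n"
    "  5 = 'No;Maybe'\n"
    "  7 = 'Not ascertained' ;\n"
)

# The pattern from the question: group 2 takes everything up to the first ';'.
pattern = re.compile(r"[ ]+(value \w+\n)([^;]+)")

m = pattern.search(text)
header, body = m.group(1), m.group(2)

# The body is cut off inside 'No;Maybe', so the last entry is lost.
print(repr(header))
print(repr(body))
```

The body ends right after 'No because [^;]+ cannot step over the semicolon inside the quoted string.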
Answer:
You can use
(?sm)^ +(value \w+\n)(.*?);$
Details:
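(?sm) - re.S and re.M are on
^ - start of a line
 + (a space followed by +) - one or more spaces
(value \w+\n) - Group 1: value, a space, one or more word chars, and an LF line break
(.*?) - Group 2: any zero or more chars (including line breaks, because of re.S), as few as possible
; - a ;
$ - at the end of a line.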
If the input may contain CRLF line endings, use
(?sm)^ +(value \w+\r?\n)(.*?);\r?$
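As a rough sketch of how this could be used from Python: the variant below is not the exact pattern above but an adaptation. It allows optional spaces or tabs after the closing semicolon (as the question asks), accepts any leading indentation, and captures only the name after value. The names BLOCK_RE and parse_blocks and the indentation of the sample are assumptions made for illustration.

```python
import re

# Adapted pattern (an assumption, not the answer's exact regex):
# re.M makes ^ and $ match at line boundaries, re.S lets . cross newlines.
BLOCK_RE = re.compile(
    r"^[ \t]*value (\w+)[ \t]*\r?\n"  # header line; group 1 is the name
    r"(.*?)"                          # group 2: body, as short as possible
    r";[ \t]*\r?$",                   # ';' + optional spaces/tabs at end of line
    re.S | re.M,
)

def parse_blocks(text):
    """Return a list of (name, body) tuples, one per 'value' block."""
    return [(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(text)]

# Assumed sample input; the leading spaces are a guess at the real layout.
sample = (
    "  value Y1N5NALC\n"
    "  1 = 'Yes'\n"
    "  5 = 'No;Maybe'\n"
    "  7 = 'Not ascertained' ;\n"
    "  value AGESCRN\n"
    "  15 = '15 years'\n"
    "  16 = '16 years';\n"
)

blocks = parse_blocks(sample)
for name, body in blocks:
    print(name)
    print(body)
```

The semicolon inside 'No;Maybe' is skipped because it is not followed by the end of a line, so the lazy (.*?) keeps expanding until it reaches a semicolon that is. A value string that itself ends a line with a semicolon would still terminate the block early.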
