What I want
I'm trying to work out how to use a regex to find two groups in RST news files. I want to get the change level as well as the change text:
- hence I want a regex of the form (changelevel): (change text)
- I was thinking about something like (changelevel): (anything until the next change level)
For instance, the following .rst file:
* Major: This is a **Major** change
* Minnor: This is is a minor change with a typo
* Patch: This
is a multiline
patch
Should return a match, group 1 and group 2 as follows:
Match 1:
"* Major: This is a **Major** change"
"* Major: "
"This is a **Major** change"
Match 2:
"* Patch: This\nis a multiline\n patch"
"* Patch: "
"This\nis a multiline\n patch"
What I need help with
I cannot make a regex that handles multiple lines and the asterisks present in the "change text". I tried the following logic:
- Match the change level
^(\*\s+(\w+):\s)
- Match anything, with the "dot matches newline" option turned on
.*
- Negative lookahead until I match the next change level
(?!^(\*\s+(\w+):\s))
- I ended up with
^(\*\s+(\w+):\s).*(?!^(\*\s+(\w+):\s))
but .* seems to just match everything into group 2
What works
I managed to get the first group working with the following regex:
- beginning of the line
- star in front
- then whitespace
- a word
- colon
- white space
^(\*\s+(\w+):\s)
CodePudding user response:
You are almost there. You can write the pattern using a negative lookahead: match a newline, and if the assertion succeeds (the next line does not start a new change level), match that whole line.
^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)
Explanation
- ^ Start of the string (start of each line with re.MULTILINE)
- ( Capture group 1
- \*\s+\w+:\s Match *, 1+ whitespace chars, 1+ word chars, : and a whitespace char
- ) Close group 1
- ( Capture group 2
- .* Match the whole line
- (?: Non-capture group to repeat as a whole
- \n Match a newline
- (?!\*\s+\w+:\s) The negative lookahead, asserting that the starting pattern is not here
- .* Match the whole line
- )* Close the non-capture group and optionally repeat it to match all lines
- ) Close group 2
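The breakdown above can also be put straight into the pattern. This is only a sketch of the same regex, written with `+` quantifiers and `re.VERBOSE` so every part carries its own comment; the names `CHANGE_RE` and `sample` are made up for the illustration.

```python
import re

# The pattern from this answer, spread out with re.VERBOSE.
# In verbose mode, unescaped whitespace and "#" comments are ignored.
CHANGE_RE = re.compile(
    r"""
    ^(\*\s+\w+:\s)           # group 1: star, whitespace, level word, colon, one whitespace
    (                        # group 2: the change text
        .*                   #   rest of the first line
        (?:                  #   repeated for every continuation line:
            \n               #     a newline ...
            (?!\*\s+\w+:\s)  #     ... not followed by the start of a new entry
            .*               #     then the whole line
        )*
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample input, shorter than the one in the question.
sample = "* Major: first\n* Patch: This\nis a multiline\n patch"
matches = CHANGE_RE.findall(sample)
print(matches)
```

Note that there is no `re.DOTALL` here: `.` has to stop at the end of each line so that the lookahead gets a chance to run before the next line is consumed.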
See a regex demo and a Python demo.
Example code:
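import re

# Group 1 is the "* Level: " prefix, group 2 is the (possibly multiline) change text
pattern = r"^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)"
s = ("* Major: This is a **Major** change\n"
     "* Minnor: This is is a minor change with a typo\n"
     "* Patch: This\n"
     "is a multiline\n"
     " patch")
# re.MULTILINE lets ^ match at the start of every line
result = re.findall(pattern, s, re.MULTILINE)
print(result)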
Output
[('* Major: ', 'This is a **Major** change'), ('* Minnor: ', 'This is is a minor change with a typo'), ('* Patch: ', 'This\nis a multiline\n patch')]
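If you only need the level word itself (the inner `(\w+)` from the question) rather than the whole `* Level: ` prefix, a variant with named groups might look like the sketch below. `ENTRY_RE`, `parse_changes` and `news` are hypothetical names, not part of the original answer.

```python
import re

# Variant of the pattern above: "level" captures just the word, "text" the change text.
ENTRY_RE = re.compile(
    r"^\*\s+(?P<level>\w+):\s(?P<text>.*(?:\n(?!\*\s+\w+:\s).*)*)",
    re.MULTILINE,
)


def parse_changes(rst):
    """Return a list of (level, text) pairs, one per change entry."""
    return [(m.group("level"), m.group("text")) for m in ENTRY_RE.finditer(rst)]


news = ("* Major: This is a **Major** change\n"
        "* Minnor: This is is a minor change with a typo\n"
        "* Patch: This\n"
        "is a multiline\n"
        " patch")

changes = parse_changes(news)
for level, text in changes:
    print(f"{level}: {text!r}")
```

A list of pairs is used instead of a dict because the same level can appear more than once in a news file.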
CodePudding user response:
re.findall(r'(\*\s*\w+:\s+)([\s\S]*?(?=\n\*|$))', text)
Use \n (newline) followed by * or the end of the string $ as an anchor.
Group 1: A literal * followed by zero or more \s whitespace characters, one or more \w word characters, a literal : and one or more \s whitespace characters.
Group 2: Match everything non-greedily with *? up to \n\* or $.
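A quick check of this approach on the question's input, as a sketch: the pattern is written with `+` quantifiers, and `text`, `LAZY_RE` and `tricky` are names made up here. It also shows one case where the two answers differ, namely a continuation line that itself begins with `*`.

```python
import re

text = ("* Major: This is a **Major** change\n"
        "* Minnor: This is is a minor change with a typo\n"
        "* Patch: This\n"
        "is a multiline\n"
        " patch")

# Lazy match up to the next "\n*" or the end of the string (no flags needed).
LAZY_RE = r'(\*\s*\w+:\s+)([\s\S]*?(?=\n\*|$))'
lazy = re.findall(LAZY_RE, text)
print(lazy)

# Hypothetical edge case: the second line starts with "*" but is not a new entry.
tricky = "* Patch: see\n*emphasis* here\n* Major: next"

# The lazy version stops at any "\n*", so the "*emphasis* here" line is dropped.
lazy_tricky = re.findall(LAZY_RE, tricky)

# The lookahead version only stops at a full "* Level: " prefix.
LOOKAHEAD_RE = r"^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)"
lookahead_tricky = re.findall(LOOKAHEAD_RE, tricky, re.MULTILINE)

print(lazy_tricky)
print(lookahead_tricky)
```

For files where no continuation line starts with an asterisk, both patterns should give the same result on input like the above.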


