Regex that stops at a line

Time:01-15

I'm trying to build a regex that stops when a line is equal to "--- admonition".

For example, I have:

??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition

I can have the same capture format multiple times on a page.

In every match, I want to retrieve in a first group:

Quels sont les deux types de bornages ?

and in a second:

Il y en a deux :

  • Le bornage amiable.

  • Le bornage judiciaire.

test

I tried:

^\?{3} ad-question {1}(.+)\n*((?:\n(?:^[^#].{0,2}$|^[^#].{3}(?<!---).*))+)

or

^\?{3} ad-question {1}(.+)\n*((?:\n(?:^[^\n#].{0,2}$|^[^\n#](?<!----).*))+)

but it didn't stop at "\n--- admonition", and it included the newline between the two groups.

Can someone help me build this regex?

PS: There must be a new line between the two groups and between group 2 and "--- admonition", so these lines must be excluded from the groups.

Thanks for your help.

CodePudding user response:

If you want 2 capture groups without matching the newlines in between the groups, but there must be at least a whole empty line in between the groups:

^\?{3} ad-question (.+)\n{2,}((?:(?!---).*\n)*?)\n+---

The pattern matches:

  • ^ Start of string (or start of a line when the multiline flag is set)
  • \?{3} ad-question Match ??? ad-question
  • (.+) Capture group 1, match the rest of the line
  • \n{2,} Match 2 or more newlines, so that there is at least an empty line in between
  • ( Capture group 2
    • (?:(?!---).*\n)*? Repeat as few times as possible, matching whole lines (including the newline) that do not start with ---
  • ) Close group 2
  • \n+--- Match 1 or more newlines and ---

If there should be at least a single newline present:

^\?{3} ad-question (.+)\n+((?:(?!---).*\n)*?)\n*---
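
To see how the two variants differ in practice, here is a minimal Python sketch. The sample page, the `compact` string and the variable names are made up for illustration, and `re.MULTILINE` is assumed so that `^` can match at the start of every "??? ad-question" line rather than only at the start of the text.

```python
import re

# Both patterns from this answer, compiled with re.MULTILINE so that ^
# matches at the start of each line, not only at the start of the string.
STRICT = re.compile(
    r"^\?{3} ad-question (.+)\n{2,}((?:(?!---).*\n)*?)\n+---", re.MULTILINE
)
LOOSE = re.compile(
    r"^\?{3} ad-question (.+)\n+((?:(?!---).*\n)*?)\n*---", re.MULTILINE
)

# Made-up sample page containing two blocks.
page = (
    "??? ad-question Quels sont les deux types de bornages ?\n"
    "\n"
    "Il y en a deux :\n"
    "\n"
    "- Le bornage amiable.\n"
    "\n"
    "test\n"
    "\n"
    "--- admonition\n"
    "??? ad-question Second question ?\n"
    "\n"
    "Second answer\n"
    "\n"
    "--- admonition\n"
)

strict_matches = [m.groups() for m in STRICT.finditer(page)]
for question, body in strict_matches:
    print(repr(question), repr(body))

# No empty line between the groups: only the looser pattern matches here.
compact = "??? ad-question Q ?\nbody\n--- admonition\n"
strict_on_compact = STRICT.search(compact)
loose_on_compact = LOOSE.search(compact)
```

Note that group 2 still ends with the newline that terminates its last line; calling `.rstrip("\n")` on it is one way to drop that if it is unwanted.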

CodePudding user response:

Try this regex:

\?{3}\s*(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition

Explanation:

  • \?{3} - matches 3 occurrences of ?
  • \s* - matches 0 or more white-spaces
  • (.+) - matches 1 or more occurrences of any character except a new line and captures it in group 1
  • \s* - matches 0 or more white-spaces
  • ((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition - matches 0 or more characters (newlines included), as few as possible, checking at each position that --- admonition does not start there. After matching all such characters, it matches 0 or more white-spaces followed by the text --- admonition
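
A quick Python check of this pattern, as a sketch on a made-up sample: because the pattern has no literal "ad-question", group 1 comes back with that tag in front. The second pattern, `WITH_TAG`, is a hypothetical tweak of mine (not part of the answer) that matches the tag outside the group.

```python
import re

text = (
    "??? ad-question Quels sont les deux types de bornages ?\n"
    "\n"
    "Il y en a deux :\n"
    "\n"
    "- Le bornage amiable.\n"
    "\n"
    "test\n"
    "\n"
    "--- admonition\n"
)

# The pattern exactly as posted in this answer.
AS_POSTED = re.compile(
    r"\?{3}\s*(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition"
)
# Hypothetical variant: match the literal tag before group 1 starts.
WITH_TAG = re.compile(
    r"\?{3}\s*ad-question\s+(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition"
)

posted_groups = AS_POSTED.search(text).groups()
tagged_groups = WITH_TAG.search(text).groups()
print(posted_groups)
print(tagged_groups)
```

Since `\s*` on both sides of group 2 absorbs the surrounding blank lines, group 2 carries no leading or trailing newline here.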

CodePudding user response:

There are many ways of doing this, I guess; my two cents:

^\?{3}\h+ad-question\h+(.+)\n+((?:.*\n?)+?)\n+^---\h+admonition$

  • ^\?{3}\h+ad-question\h+ - Start-line anchor followed by three literal question marks, 1+ (greedy) horizontal whitespace characters, literally 'ad-question' and another 1+ horizontal whitespace chars;
  • (.+) - Your 1st capture group with 1+ (greedy) characters other than newline;
  • \n+ - 1+ (greedy) newline chars;
  • ((?:.*\n?)+?) - A 2nd capture group with a nested non-capture group matched 1+ (lazy) times, each time matching 0+ characters up to an optional newline char;
  • \n+ - 1+ (greedy) newline chars;
  • ^---\h+admonition$ - From start-line anchor to end-line anchor, match: '---', 1+ horizontal whitespace chars and 'admonition'.
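
For anyone trying this in Python: `\h` is PCRE syntax and Python's `re` module rejects it as a bad escape. The sketch below therefore swaps `\h` for `[ \t]`, which is my substitution and only approximates horizontal whitespace (space and tab). The sample text is made up, and `re.MULTILINE` is needed for the `^` and `$` line anchors.

```python
import re

# \h replaced by [ \t] because Python's re has no \h escape (assumption:
# space and tab are the only horizontal whitespace that matters here).
PATTERN = re.compile(
    r"^\?{3}[ \t]+ad-question[ \t]+(.+)\n+((?:.*\n?)+?)\n+^---[ \t]+admonition$",
    re.MULTILINE,
)

text = (
    "??? ad-question Quels sont les deux types de bornages ?\n"
    "\n"
    "Il y en a deux :\n"
    "\n"
    "- Le bornage amiable.\n"
    "\n"
    "test\n"
    "\n"
    "--- admonition\n"
    "??? ad-question Second question ?\n"
    "\n"
    "Second answer\n"
    "\n"
    "--- admonition\n"
)

results = [m.groups() for m in PATTERN.finditer(text)]
for question, body in results:
    print(repr(question), repr(body))
```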

CodePudding user response:

You most probably need the re.DOTALL and re.MULTILINE flags. You can also set them as inline flags within the pattern: '(?s)' and '(?m)'.

DOTALL lets '.' also match '\n', which it normally does NOT match (re.DOTALL is Python; other dialects have similar flags, e.g. JS, Java).
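
A tiny sketch of what DOTALL changes, using a made-up two-line string; the flag argument and the inline `(?s)` form should behave the same way:

```python
import re

sample = "first line\nsecond line"

# Without DOTALL, '.' stops at the newline, so nothing matches.
default_match = re.search(r"first.*second", sample)

# With the flag, or with the inline form at the start of the pattern,
# '.' also matches '\n'.
flag_match = re.search(r"first.*second", sample, re.DOTALL)
inline_match = re.search(r"(?s)first.*second", sample)

print(default_match, flag_match.group(), inline_match.group())
```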

You can capture yours with r'\?\?\?(.*?)\?(.*?)--- admonition' and those 2 flags.

Python example:

import re

text = """??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition
??? ad-question 2  types de bornages ?

Il y en a deux :

- Le bornage judiciaire.

test 2

--- admonition"""


pattern = r'\?\?\?(.*?)\?(.*?)--- admonition'

for f in re.finditer(pattern, text, re.MULTILINE | re.DOTALL):
    print(f)
    print(f.groups())  # tuple of groups (A, B, ..) of grouped matches

Output:

<re.Match object; span=(0, 144), match='??? ad-question Quels sont les deux types de born>
(' ad-question Quels sont les deux types de bornages ', 
 '\n\nIl y en a deux :\n\n- Le bornage amiable.\n\n- Le bornage judiciaire.\n\ntest\n\n')

<re.Match object; span=(145, 251), match='??? ad-question 2  types de bornages ?\n\nIl y en>
(' ad-question 2  types de bornages ', 
 '\n\nIl y en a deux :\n\n- Le bornage judiciaire.\n\ntest 2\n\n')

Pattern '\?\?\?(.*?)\?(.*?)--- admonition' explained:

\?\?\?                 - 3 literal question marks (QM)
(.*?)\?                - non greedy capture (including \n) up to 1st QM
(.*?)--- admonition    - non greedy capture up to --- admonition
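
As the output shows, the raw groups still carry the " ad-question " tag and the surrounding newlines, and the "?" that ends the question is consumed by the pattern. If that matters, one possible clean-up is sketched below; `clean_groups` and `TAG` are hypothetical names of mine, and the sketch assumes the tag is always "ad-question " and that the question contains no earlier "?".

```python
import re

PATTERN = re.compile(r"\?\?\?(.*?)\?(.*?)--- admonition", re.DOTALL)
TAG = "ad-question "  # assumed to be the same for every block


def clean_groups(match):
    """Hypothetical helper: tidy the two raw groups of one match."""
    raw_question, raw_body = match.groups()
    question = raw_question.lstrip()
    if question.startswith(TAG):
        question = question[len(TAG):]
    # The pattern consumed the "?" closing the question, so put it back,
    # and drop the blank lines around the body.
    return question + "?", raw_body.strip("\n")


text = (
    "??? ad-question Quels sont les deux types de bornages ?\n"
    "\n"
    "Il y en a deux :\n"
    "\n"
    "- Le bornage amiable.\n"
    "\n"
    "test\n"
    "\n"
    "--- admonition"
)

cleaned = [clean_groups(m) for m in PATTERN.finditer(text)]
print(cleaned)
```

A question with a "?" in the middle would be split at that first "?", so this only fits pages where the question mark appears once, at the end of the first line.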