I'm trying to write a regex pattern that matches a -- comment only when everything before it on the same line is whitespace (or nothing at all). For example:
--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
    --comment with some indentation (match)
    --another comment with some indentation (match)
The closest I've got is this pattern, (?<!.)--.*\n, which gives me this result:
--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
    --comment with some indentation (fail)
    --another comment with some indentation (fail)
I've also tried (?<!\s)--.*\n and (?<=\S)--.*\n, but both return no matches at all.
EDIT: a regexr.com demo illustrating the issue more clearly: regexr.com/6j0mt
CodePudding user response:
With the PyPI regex library, you can use:
import regex
text = r"""--hello
--goodbye
ROW_NUMBER() OVER (ORDER BY DATE) --date
    --comment with some indentation
    --another comment with some indentation"""
print( regex.findall(r'(?<=^[^\S\r\n]*)--.*', text, regex.M) )
# => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']
Or, with the default Python re:
import re
text = r"""--hello
--goodbye
ROW_NUMBER() OVER (ORDER BY DATE) --date
    --comment with some indentation
    --another comment with some indentation"""
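print( re.findall(r'^[^\S\r\n]*(--.*)', text, re.M) )
# => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']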
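The reason the re version consumes the leading whitespace and captures the comment in a group, instead of using the lookbehind, is that the standard re module only accepts fixed-width lookbehinds. A minimal sketch of what happens if the regex-module pattern is passed to re unchanged (the variable names here are illustrative only):

```python
import re

# The lookbehind contains `*`, so its width is not fixed.
variable_width = r'(?<=^[^\S\r\n]*)--.*'

try:
    re.compile(variable_width, re.M)
    compiled = True
except re.error as exc:
    # The standard library rejects variable-width lookbehinds at compile time.
    compiled = False
    print("re rejected the pattern:", exc)
```

The same pattern compiles with the PyPI regex library, which is why the first snippet needs it.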
Pattern details
(?<=^[^\S\r\n]*) - a positive lookbehind that matches a location that is immediately preceded by the start of a string/line and zero or more horizontal whitespace chars
  ^ - start of a string (here, of a line, because the re.M / regex.M option is used)
  [^\S\r\n]* - zero or more chars other than non-whitespace, CR and LF chars (that is, any whitespace except carriage return and line feed chars)
(--.*) - Group 1: -- and the rest of the line (.* matches zero or more chars other than line break chars, as many as possible).
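To see why the pattern uses [^\S\r\n] rather than plain \s, here is a small sketch comparing the two on a made-up input with a blank line before an indented comment. The sample text and variable names are assumptions for illustration, not part of the original question:

```python
import re

# Hypothetical input: a statement, a blank line, then a tab-indented comment.
text = "SELECT 1\n\n\t--indented comment\n"

# Horizontal whitespace only: the match cannot cross a line break.
horizontal = re.compile(r'^[^\S\r\n]*--.*', re.M)
# Any whitespace: \s* can also consume the newline of the blank line.
any_space = re.compile(r'^\s*--.*', re.M)

print(repr(horizontal.search(text).group()))  # '\t--indented comment'
print(repr(any_space.search(text).group()))   # '\n\t--indented comment'
```

With a capturing group around --.* the two variants return the same group, but the full match differs, which matters if the pattern is later reused with re.sub to strip comment lines.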
