I'm trying to match a string with a regular expression in Python, but ignore an optional word if it's present.
For example, I have the following lines:
First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]
I'm looking to capture everything before [Ignore This Part]. Notice I also want to exclude the whitespace before [Ignore This Part]. Therefore my results should look like this:
First string
Second string
Third string (1)
I have tried the following regular expression with no luck, because it still captures [Ignore This Part]:
.+(?:\s\[.+\])?
Any assistance would be appreciated.
I'm using Python 3.8 on Windows 10.
Edit: The examples are meant to be processed one line at a time.
CodePudding user response:
Use [^[] instead of . so it doesn't match anything with square brackets and doesn't match across newlines.
^[^[\n]+(?:\s\[.+\])?
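A minimal sketch of how the negated character class could be used to pull out just the leading text, one line at a time. The capture group, the lazy quantifier, the `$` anchor, and the names `PREFIX` and `strip_suffix` are assumptions added for illustration; they are not part of the answer above.

```python
import re

# Assumed variant of the pattern: group 1 lazily collects non-bracket
# characters, and the bracketed suffix (with leading whitespace) is optional.
PREFIX = re.compile(r"^([^[\n]+?)(?:\s*\[[^\]\n]*\])?$")

def strip_suffix(line):
    """Return the text before a trailing [...] block, or the line unchanged."""
    m = PREFIX.match(line)
    return m.group(1) if m else line

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]
cleaned = [strip_suffix(line) for line in lines]
print(cleaned)
```

Because the group is lazy, it stops as soon as the rest of the line can be matched as an optional bracketed block, so the whitespace before the bracket is not captured.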
CodePudding user response:
With your shown samples, please try the following code, written and tested in Python 3.
import re
var="""First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""
[x for x in list(map(lambda x:x.strip(),re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var))) if x]
The output is a list, which can then be accessed as required:
['First string', 'Second string', 'Third string (1)']
Here is a detailed explanation of the above Python 3 code:
- First, use the re module's split function with the regex (.*?)(?:$|\s\[[^]]*\]) and the multiline flag enabled. The complete split call is: re.split(r'(?m)(.*?)(?:$|\s\[[^]]*\])',var)
- Then pass its output to a lambda function that uses strip to remove the newlines from the elements.
- Apply map to it and create a list from it.
- Then simply remove the empty items from the list to keep only the part the OP requires.
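The steps above can be sketched as separate statements, which may be easier to follow than the one-liner. List comprehensions stand in for `map` and `lambda` here, and the intermediate names `pieces`, `stripped`, and `result` are assumptions for illustration only.

```python
import re

var = """First string
Second string [Ignore This Part]
Third string (1) [Ignore This Part]"""

# Step 1: split with the multiline flag; the captured group (.*?) is kept
# in the output, alongside empty strings and leftover newlines.
pieces = re.split(r"(?m)(.*?)(?:$|\s\[[^]]*\])", var)

# Steps 2-3: strip surrounding whitespace, including newlines, from each piece.
stripped = [piece.strip() for piece in pieces]

# Step 4: drop the pieces that are now empty.
result = [piece for piece in stripped if piece]
print(result)
```

This should print the same list as the one-liner.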
CodePudding user response:
Perhaps you can remove the part that you don't want to match:
[^\S\n]*\[[^][\n]*]$
Explanation
- [^\S\n]* Match optional spaces
- \[[^][\n]*] Match from [ to ]
- $ End of string
Example
import re
pattern = r"[^\S\n]*\[[^][\n]*]$"
s = ("First string\n"
"Second string [Ignore This Part]\n"
"Third string (1) [Ignore This Part]")
# re.M lets $ match at the end of every line, not only at the end of the string
result = re.sub(pattern, "", s, flags=re.M)
if result:
    print(result)
Output
First string
Second string
Third string (1)
If you don't want to be left with an empty string, you can assert a non-whitespace character to the left:
(?<=\S)[^\S\n]*\[[^][\n]*]$
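A small sketch of the difference the lookbehind makes, using a line that consists only of the bracketed part. The variable names `plain`, `guarded`, `only_brackets`, and `normal_line` are assumptions for illustration.

```python
import re

plain = r"[^\S\n]*\[[^][\n]*]$"            # removes any trailing [...] block
guarded = r"(?<=\S)[^\S\n]*\[[^][\n]*]$"   # requires a non-whitespace char before it

only_brackets = "[Ignore This Part]"
normal_line = "Second string [Ignore This Part]"

# Without the lookbehind the whole line is removed.
print(repr(re.sub(plain, "", only_brackets)))

# With the lookbehind there is nothing to the left, so the line is kept.
print(repr(re.sub(guarded, "", only_brackets)))

# A normal line is still trimmed as before.
print(repr(re.sub(guarded, "", normal_line)))
```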
CodePudding user response:
You may use this regex:
^.+?(?=$|\s*\[[^]]*]$)
If you want a better-performing regex, I suggest:
^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)
RegEx Details:
- ^: Start
- .+?: Match 1+ of any characters (lazy match)
- (?=: Start lookahead
- $: End
- |: OR
- \s*: Match 0 or more whitespaces
- \[[^]]*]: Match [...] text
- $: End
- ): Close lookahead
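As a rough sketch, both lookahead patterns could be applied one line at a time like this, assuming they read as written in the code below. The names `LAZY`, `UNROLLED`, and `leading_text` are hypothetical helpers, not part of the answer.

```python
import re

# The simple lazy pattern and the variant suggested for better performance.
LAZY = re.compile(r"^.+?(?=$|\s*\[[^]]*]$)")
UNROLLED = re.compile(r"^\S+(?:\s+\S+)*?(?=$|\s*\[[^]]*]$)")

def leading_text(line, pattern=LAZY):
    """Return the part of the line before an optional trailing [...] block."""
    m = pattern.match(line)
    return m.group(0) if m else ""

lines = [
    "First string",
    "Second string [Ignore This Part]",
    "Third string (1) [Ignore This Part]",
]
for line in lines:
    print(leading_text(line), "|", leading_text(line, UNROLLED))
```

The lookahead only checks what follows, so the whitespace and the bracketed text are never part of the match itself.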
