Example first:
import re
details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'
input_re = re.compile(r'(?!output[0-9]*) mem([0-9a-f]+)')
print(input_re.findall(details))
# Out: ['001', '005', '002', '006']
I am using a negative lookahead to extract the hex part of the mem entries that are not preceded by an output entry. However, as you can see, it fails. The desired output should be: ['001', '002'].
What am I missing?
Answer:
You may use this regex in findall:
\b(?!output\d+)\w+\s+mem([a-fA-F\d]+)
RegEx Details:
- \b: Word boundary
- (?!output\d+): Negative lookahead to assert that we don't have output and 1+ digits ahead
- \w+: Match 1+ word characters
- \s+: Match 1+ whitespace characters
- mem([a-fA-F\d]+): Match mem followed by 1+ hex characters (captured in group 1)
Code:
import re
s = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'
print(re.findall(r'\b(?!output\d+)\w+\s+mem([a-fA-F\d]+)', s))
Output:
['001', '002']
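For readability, the same pattern can also be written with re.VERBOSE, so that each piece from the details list carries its own comment. This is a sketch using the hex class [a-fA-F\d] (so only real hex letters are accepted); it should behave the same as the one-line version.

```python
import re

s = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# In VERBOSE mode, whitespace in the pattern is ignored and '#' starts a comment.
pattern = re.compile(r"""
    \b               # word boundary: start of a token
    (?!output\d+)    # the token must not be 'output' followed by 1+ digits
    \w+              # the token before 'mem' (1+ word characters)
    \s+              # whitespace between the two tokens
    mem              # literal 'mem'
    ([a-fA-F\d]+)    # group 1: one or more hex characters
""", re.VERBOSE)

print(pattern.findall(s))  # ['001', '002']
```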
Answer:
Maybe an easier approach is to split it into two regular expressions. First, filter out anything that starts with output and is followed by mem, like so:
output[0-9]* mem([0-9a-f]+)
Filtering these out results in:
input1 mem001 data2 mem002
Once you have filtered them out, just search for mem again:
mem([0-9a-f]+)
That results in your desired output:
['001', '002']
Maybe not an answer to the original question, but it is a solution to your problem.
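The two-step approach described above can be sketched with re.sub followed by re.findall. This is a minimal illustration, assuming the entries are separated by single spaces as in the example string.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# Step 1: delete every "outputN memXXX" pair from the string.
filtered = re.sub(r'output[0-9]* mem([0-9a-f]+)', '', details)

# Step 2: collect the hex part of the mem entries that are left.
result = re.findall(r'mem([0-9a-f]+)', filtered)
print(result)  # ['001', '002']
```

Replacing the matches with an empty string leaves some extra spaces in filtered, which does not matter for the second search.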
Answer:
First of all, let's understand why your original regex doesn't work:
A regex encapsulates two pieces of information: a description of a location within a text, and a description of what to capture from that location. Your original regex tells the regex matcher: "Find a location within the text where the following characters are not 'output' followed by digits, but are ' mem' followed by alphanumerics." Think about the logic of that expression: if the matcher finds a location in the text where the following characters are ' mem' followed by alphanumerics, then, in particular, the following characters are not 'output' followed by digits. Your lookahead adds nothing to the expression.
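A quick way to check this reasoning is to run the question's pattern with and without the lookahead and compare the results. This sketch uses the question's regex with the quantifier written as [0-9a-f]+.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# The question's pattern, with the negative lookahead...
with_lookahead = re.compile(r'(?!output[0-9]*) mem([0-9a-f]+)')
# ...and the same pattern with the lookahead deleted.
without_lookahead = re.compile(r' mem([0-9a-f]+)')

# At every position where a match can start, the text ahead begins with
# ' mem', so it can never also begin with 'output': the lookahead always passes.
print(with_lookahead.findall(details))     # ['001', '005', '002', '006']
print(without_lookahead.findall(details))  # ['001', '005', '002', '006']
```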
What you really need is to tell the matcher: "Find a location in the text where the following characters are ' mem' followed by alphanumerics, and the preceding characters are not 'output' followed by digits." So what you really need is a look-behind, not a lookahead.
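@ArtyomVancyan proposed a good regex with a look-behind, and it could easily be modified to do what you need: instead of a single digit after 'output', you want potentially more digits, so just put an asterisk (*) after the '\d'. Note, however, that Python's built-in re module only supports fixed-width look-behinds and raises re.error for a pattern like (?<!output\d*); a variable-width look-behind needs the third-party regex module.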
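The look-behind answer itself is not shown here, so the sketch below assumes a hypothetical pattern of the form (?<!output\d) mem([0-9a-f]+). It shows why a single \d is not enough (output12 still lets 006 through) and that the standard re module rejects the variable-width \d* version.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# Hypothetical fixed-width look-behind: 'output' plus exactly ONE digit.
one_digit = re.compile(r'(?<!output\d) mem([0-9a-f]+)')
# Before ' mem006' the last 7 characters are 'utput12', so 006 is not excluded.
print(one_digit.findall(details))  # ['001', '002', '006']

# Variable-width look-behind: the standard re module refuses to compile it.
try:
    re.compile(r'(?<!output\d*) mem([0-9a-f]+)')
except re.error as exc:
    print('re.error:', exc)
```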
