ptx captures most of what I want. Because I am not good at combining many things into one regex, I created a second regex, ptx1, that should ADDITIONALLY capture the following character sequences:
One Department, One foreign Department, Two office
text_list = ' '.join(map(str, text))
ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags = re.DOTALL | re.MULTILINE)
ptx1 = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)((One|Two)\s+(?:foreign\s+)*Department|office)', flags = re.DOTALL | re.MULTILINE)
ten = ptx.search(text_list)
eleven = ptx1.search(text_list)
try:
    if ten:
        ten = ten.group(2)
    else:
        ten = None
except:
    pass
Here is what I added before the else above. It didn't work:
    elif:
        ten = eleven.group(2)
My question is: how do I need to call the group in the elif branch in order to get the (.*) content, i.e. text_i_want, returned? My gut feeling is that I need to access eleven as if it were a list, because it has so many capturing groups, e.g. eleven[0].group(1) to get the first element from the list and then its second group. But that didn't work either.
You can think of text_list like this
text_list = ['...something\npatternx: text_i_want One Department',
'...something patternx: text_i_want One foreign Department',
'...something\n patternx: text_i_want Two office']
CodePudding user response:
It looks as if you got tricked when factoring in the alternatives on the right hand side.
You need to use
\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b
which can be shortened as
\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b
See the regex demo. Details:
\bsomething\s+patternx: - the whole word something, one or more whitespace characters, then the string patternx:
(.*?) - Group 1: any zero or more characters, as few as possible
\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b - either One Department, One foreign, One foreign Department, or Two office, as whole words.
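To make the alternation issue concrete, here is a minimal sketch comparing only the closing part of the two patterns. It assumes the question's ptx1 pattern used \s+ where the whitespace quantifiers appear; the names original_tail and fixed_tail are made up for this illustration. In the original, the | sits at the top level of the outer group, so the alternatives are "(One|Two) ... Department" and a bare "office", which means "Two" is never part of the match for "Two office".

```python
import re

# Closing part of the question's ptx1 pattern (quantifiers assumed to be \s+).
# The top-level | splits it into "(One|Two) ... Department" OR just "office".
original_tail = re.compile(r'((One|Two)\s+(?:foreign\s+)*Department|office)')

# Closing part of the pattern suggested in this answer.
fixed_tail = re.compile(
    r'\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
)

for phrase in ['One Department', 'One foreign Department', 'Two office']:
    old = original_tail.search(phrase).group(0)
    new = fixed_tail.search(phrase).group(0)
    print(f'{phrase!r}: original matched {old!r}, fixed matched {new!r}')
```

For the last phrase the original pattern matches only office, so the Two would end up inside the preceding (.*) group.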
See the Python demo:
import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want ']
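As for calling the group: re.search returns a single match object (or None), not a list, so it cannot be indexed like eleven[0].group(1). With the single pattern above there is only one capturing group, so group(1) is the wanted text and no elif fallback is needed. A minimal sketch, where extract_wanted is a hypothetical helper name and stripping the surrounding whitespace is an assumption about what you want returned:

```python
import re

RX = re.compile(
    r'\bsomething\s+patternx:(.*?)\b'
    r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

def extract_wanted(text):
    """Return the text between 'patternx:' and the closing phrase, or None."""
    match = RX.search(text)  # one match object or None, never a list
    return match.group(1).strip() if match else None

samples = [
    ' something\npatternx: text_i_want One Department',
    ' something patternx: text_i_want One foreign Department',
    ' something\n patternx: text_i_want Two office',
    'no pattern here',
]
for sample in samples:
    print(extract_wanted(sample))
```

The explicit "if match else None" replaces the try/except from the question, since the only failure case is search returning None.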
