I have the following string:
tex = r'''
...
\begin{tabular}
abcdefg
\endhead
\endlastfoot
...
'''
and I want to extract the code between \begin{tabular} and \endlastfoot, or \endhead if \endlastfoot doesn't exist. The following doesn't work as I want:
res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)
What should I change to avoid multiple if statements and multiple re.search calls?
CodePudding user response:
/((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))|((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))/gm
There are two alternatives, each wrapped in its own group:
1. ((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))
   (?<=\\begin{tabular}) is a positive lookbehind: the match must start immediately after the string \begin{tabular}.
   (?=\\endlastfoot) is a positive lookahead: the match must end immediately before \endlastfoot.
   [\n\w\s\\]* matches the code block between the two strings: \n for newlines, \w for word characters (a-zA-Z0-9_), \s for whitespace, and \\ for a backslash. The * repeats this set zero or more times.
2. ((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead)) works the same way as the first group. The only difference is that it matches between the strings \begin{tabular} and \endhead.
Because the \endlastfoot alternative comes first, it is preferred whenever \endlastfoot is present; the \endhead alternative is only used as a fallback.
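Since the question uses Python, here is a minimal sketch of the pattern above with the re module, dropping the /.../gm delimiters. The names ANSWER_PATTERN, ANY_CHAR_PATTERN and extract are made up for this example. ANY_CHAR_PATTERN is a variant that is not part of the answer: it assumes the table body may contain characters outside [\n\w\s\\], such as &, which the original character class does not allow.

```python
import re

# The answer's pattern, written for Python's re module (no /.../gm delimiters).
# The \endlastfoot alternative comes first, so it wins whenever it can match.
ANSWER_PATTERN = re.compile(
    r"(?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot)"
    r"|(?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead)"
)

# Hypothetical variant (not from the answer): a lazy .*? with re.DOTALL accepts
# any character, including & and punctuation, and spans newlines.
ANY_CHAR_PATTERN = re.compile(
    r"(?<=\\begin{tabular}).*?(?=\\endlastfoot)"
    r"|(?<=\\begin{tabular}).*?(?=\\endhead)",
    re.DOTALL,
)


def extract(pattern, tex):
    """Return the matched table body, or None if nothing matches."""
    match = pattern.search(tex)
    return match.group(0) if match else None


both_markers = r'''
\begin{tabular}
abcdefg
\endhead
\endlastfoot
'''

head_only = r'''
\begin{tabular}
abcdefg
\endhead
'''

with_ampersand = r'''
\begin{tabular}
a & b
\endhead
'''

print(repr(extract(ANSWER_PATTERN, both_markers)))    # '\nabcdefg\n\\endhead\n'
print(repr(extract(ANSWER_PATTERN, head_only)))       # '\nabcdefg\n'
print(repr(extract(ANSWER_PATTERN, with_ampersand)))  # None, since & is not in the class
print(repr(extract(ANY_CHAR_PATTERN, with_ampersand)))  # '\na & b\n'
```

One caveat with the variant: the lazy .*? stops at the first \endlastfoot found anywhere after \begin{tabular}, so in a document with several tables it may run into a later table if the first one only has \endhead.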
