How to fix my regex expression for Start and End?

I have the following regex expression:

(?i)[ \t]*(logout|log out|signout)[ \t]*

I want it to match logout only when it is alone on the line: it may be surrounded by spaces or tabs, but nothing else may come before or after them.

I tried adding ^ and $, as many people suggested, but it didn't work and these still match:

"logout"
<a href="logout.asp"><script>i18n("Logout")</script></a>

Why is this, and how can I fix it?

For example, in Python these are the strings that should match:

match1 = 'logout'
match2 = ' logout '

Simple POC:

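import re
match1 = '"logout"'
match2 = 'zlogout'
# Only the end is anchored with $ here, so a match may still begin mid-string.
print(re.findall(r'(?i)[ \t]*(logout|log out|signout)[ \t]*$', match1))
print(re.findall(r'(?i)[ \t]*(logout|log out|signout)[ \t]*$', match2))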

CodePudding user response：

Here is a very simple solution. You can strip the leading and trailing whitespace from the input using Python's str.strip() method.

txt = "     test     "
x = txt.strip()
print(x)  # prints: test

And then you can use this regex to match the stripped string:

logout|log out|signout
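A minimal sketch of this strip-then-match approach follows. The helper name is_logout is hypothetical, and using re.fullmatch is an assumption about what "match" should mean here: with re.findall or re.search the bare alternation would still match inside longer text such as zlogout.

```python
import re

# Alternatives from the answer; IGNORECASE mirrors the (?i) flag in the question.
LOGOUT_RE = re.compile(r'logout|log out|signout', re.IGNORECASE)

def is_logout(line):
    """Return True if the line holds only a logout word plus optional padding."""
    # strip() removes leading and trailing whitespace (spaces, tabs, newlines).
    # fullmatch() then requires the whole remaining string to be one alternative.
    return LOGOUT_RE.fullmatch(line.strip()) is not None

print(is_logout('  Logout  '))
print(is_logout('"logout"'))
print(is_logout('zlogout'))
```

Only the first call should report a match, since the other two inputs contain extra characters around the word.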

You can use this website to test your regex: https://regexr.com/

CodePudding user response：

The problem is that | is dividing your pattern where you're not expecting it, and you're missing the ^ to mark the start of a line.

You are using r'(?i)[ \t]*logout|log out|signout[ \t]*$', which matches anything that matches one of [ \t]*logout OR log out OR signout[ \t]*$, because | has the lowest precedence and splits the entire pattern. I think you want something like r'(?i)^\s*(logout|log out|signout)\s*$'. Note the parentheses around the alternatives and the ^ at the start. \s means any whitespace and is a bit clearer than [ \t], though unlike [ \t] it also matches newlines.
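To make the effect of the grouping visible, here is a small sketch comparing an ungrouped and a grouped pattern, both with ^ and $. The variable names and sample strings are chosen for illustration and are not from the question.

```python
import re

# Without parentheses, | splits the whole pattern into three alternatives:
#   ^[ \t]*logout   OR   log out   OR   signout[ \t]*$
ungrouped = re.compile(r'(?i)^[ \t]*logout|log out|signout[ \t]*$')

# With parentheses, the anchors and padding apply to every alternative.
grouped = re.compile(r'(?i)^[ \t]*(logout|log out|signout)[ \t]*$')

samples = ['logout', ' logout ', '"logout"', 'logout now', 'please log out now']

for s in samples:
    # Print whether each pattern finds a match in the sample.
    print(repr(s), bool(ungrouped.search(s)), bool(grouped.search(s)))
```

The ungrouped pattern accepts 'logout now' and 'please log out now' because only one alternative carries each anchor, while the grouped pattern rejects both.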

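You could also use re.fullmatch to match the whole string instead of findall, which matches anywhere in the string. Note that re.match only anchors the match at the start of the string, so on its own it would still accept trailing text.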
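As a rough illustration of how these calls differ, the sketch below runs the question's unanchored pattern through re.findall, re.match and re.fullmatch. The sample text and variable names are assumptions made for the example.

```python
import re

# The question's pattern, with no ^ or $ anchors.
pattern = r'(?i)[ \t]*(logout|log out|signout)[ \t]*'
text = 'logout.asp'

# findall scans the whole string and returns the captured group of each match.
found = re.findall(pattern, text)

# match only requires the pattern to match at the start of the string.
starts_with = re.match(pattern, text) is not None

# fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(pattern, text) is not None
whole_padded = re.fullmatch(pattern, ' logout ') is not None

print(found, starts_with, whole, whole_padded)
```

With re.fullmatch the anchors become unnecessary, because the function itself enforces that nothing precedes or follows the matched text.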

If you've not come across it already, I can't recommend regex101.com enough for this sort of debugging. It breaks each section of the regex down and explains what it's doing and if you add sample strings it shows which bit matches which part of the pattern.