Python regex to check if a substring is at the beginning or at the end of a bigger path to look for

I have a string containing words in the form word1_word2, word3_word4, word5_word1 (so a word can appear at the left or at the right). I want a regex that looks for all the occurrences of a specific word, and returns the "super word" containing it. So if I'm looking for word1, I expect my regex to return word1_word2, word5_word1. Since the word can appear on the left or on the right, I wrote this:

re.findall("( {}_)?[\u0061-\u007a\u00e0-\u00e1\u00e8-\u00e9\u00ec\u00ed\u00f2-\u00f3\u00f9\u00fa]*(_{} )?".format("w1", "w1"), string)

The optional blocks sit at the beginning and at the end of the pattern. However, it takes forever to execute, and I suspect something is wrong: when I removed the optional blocks and wrote two separate regexes, one for the beginning and one for the end, they ran much faster (but I don't want to use two regexes). Am I missing something, or is this normal?

CodePudding user response：

This would be the regex solution to your problem:

re.findall(rf'\b({yourWord}_\w+?|\w+?_{yourWord})\b', yourString)
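
As a minimal sketch of the alternation approach above, the pattern can be wrapped in a helper. The function name `find_super_words` is hypothetical, and the use of `re.escape` and a non-capturing group are additions not present in the answer; they assume the search word may contain regex metacharacters and that the full match is wanted.

```python
import re

def find_super_words(word, text):
    # Hypothetical helper: escape the word so characters like '.' are literal
    w = re.escape(word)
    # Word on the left of the underscore, or word on the right of it
    pattern = rf'\b(?:{w}_\w+?|\w+?_{w})\b'
    # With no capturing group, findall returns the full matched text
    return re.findall(pattern, text)

text = 'word1_word2, word3_word4, word5_word1'
print(find_super_words('word1', text))  # ['word1_word2', 'word5_word1']
```

Because `\w` also matches the underscore, the `\b` anchors only fire at the outer edges of each "super word", which keeps the lazy `\w+?` from stopping early.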

CodePudding user response：

Python's built-in string methods can do this without a regex:

a=['word1_word2', 'word3_word4', 'word5_word1']

b = [x for x in a if x.startswith("word1") or x.endswith('word1')]
print(b) # ['word1_word2', 'word5_word1']
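
Since the question starts from a single string rather than a list, the same idea might be sketched as follows. It assumes the items are separated by commas, and the variable names `s`, `target`, and `parts` are illustrative only.

```python
s = 'word1_word2, word3_word4, word5_word1'
target = 'word1'

# Split the comma-separated string into individual "super words"
parts = [p.strip() for p in s.split(',')]

# Including the underscore avoids matching e.g. 'word10_x' when looking for 'word1'
matches = [p for p in parts
           if p.startswith(target + '_') or p.endswith('_' + target)]
print(matches)  # ['word1_word2', 'word5_word1']
```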

CodePudding user response：

import re

s = 'word1_word2, word3_word4, word5_word1'

matches = re.finditer(r'(\w+_word1)|(word1_\w+)', s)
result = list(map(lambda x: x.group(), matches))
print(result)  # ['word1_word2', 'word5_word1']

This is one method, but after seeing @Carl's answer I voted for it: that one is faster and cleaner. I will just leave this here as one of many regex options.

CodePudding user response：

This regex will do the job for word1:

# non-capturing groups, so findall returns the full match rather than the groups
regex = r'(?:word\d_)*word1(?:_word\d)*'
re.findall(regex, string)

You can also use this:

re.findall(rf'\b(word{number}_\w+?|\w+?_word{number})\b', string)

CodePudding user response：

Try the following regex.

In the following, replace word1 with the word you're looking for. This is assuming that the word you are looking for consists of only alphanumeric characters.

([a-zA-Z0-9]*_word1)|(word1_.[a-zA-Z0-9]*)
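
One detail worth illustrating: because this pattern has two capturing groups, `re.findall` returns a tuple per match rather than the matched text. A small sketch, assuming the sample string from the question, shows the difference and one way to get the full match with `re.finditer`.

```python
import re

s = 'word1_word2, word3_word4, word5_word1'
pattern = r'([a-zA-Z0-9]*_word1)|(word1_.[a-zA-Z0-9]*)'

# findall returns one tuple per match, with one slot per capturing group
pairs = re.findall(pattern, s)
print(pairs)  # [('', 'word1_word2'), ('word5_word1', '')]

# finditer with group(0) yields the full matched text instead
full = [m.group(0) for m in re.finditer(pattern, s)]
print(full)  # ['word1_word2', 'word5_word1']
```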