Python regex to check if a substring is at the beginning or at the end of a bigger path to look for

Time:01-25

I have a string containing words in the form word1_word2, word3_word4, word5_word1 (so a word can appear at the left or at the right). I want a regex that looks for all the occurrences of a specific word, and returns the "super word" containing it. So if I'm looking for word1, I expect my regex to return word1_word2, word5_word1. Since the word can appear on the left or on the right, I wrote this:

re.findall("( {}_)?[\u0061-\u007a\u00e0-\u00e1\u00e8-\u00e9\u00ec\u00ed\u00f2-\u00f3\u00f9\u00fa]*(_{} )?".format("w1", "w1"), string)

The optional blocks sit at the beginning and at the end of the pattern. However, it takes forever to execute, and I think something is wrong: when I removed the optional blocks and wrote two separate regexes, one anchored at the beginning and one at the end, they ran much faster (but I don't want to use two regexes). Am I missing something, or is this normal?

CodePudding user response:

This would be the regex solution to your problem:

re.findall(rf'\b({yourWord}_\w+?|\w+?_{yourWord})\b', yourString)
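
A minimal runnable sketch of the same idea follows. The helper name find_super_words is hypothetical, and two small changes are assumptions on my part rather than part of the answer: re.escape guards against regex metacharacters in the searched word, and the group is made non-capturing so that findall returns the whole match.

```python
import re

def find_super_words(word, text):
    # Escape the word in case it contains regex metacharacters.
    w = re.escape(word)
    # Match "<word>_<other>" or "<other>_<word>" as a whole token.
    pattern = rf'\b(?:{w}_\w+?|\w+?_{w})\b'
    return re.findall(pattern, text)

print(find_super_words("word1", "word1_word2, word3_word4, word5_word1"))
# ['word1_word2', 'word5_word1']
```

Because \b requires a word boundary on both sides, a token such as word10_word2 is not reported when searching for word1.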

CodePudding user response:

Python's built-in string methods can do this without a regex:

a=['word1_word2', 'word3_word4', 'word5_word1']

b = [x for x in a if x.startswith("word1") or x.endswith('word1')]
print(b) # ['word1_word2', 'word5_word1']
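
The question starts from a single string rather than a list, so here is a sketch of how the same filter might be applied after splitting. It assumes the tokens are separated by commas; the function name filter_super_words is hypothetical. Including the underscore in the prefix and suffix avoids accidental hits such as word10_word2.

```python
def filter_super_words(text, target):
    # Split on commas and trim surrounding whitespace (assumed separator).
    tokens = [t.strip() for t in text.split(',')]
    # Keep tokens where the target is the left or the right half.
    return [t for t in tokens
            if t.startswith(target + '_') or t.endswith('_' + target)]

s = 'word1_word2, word3_word4, word5_word1'
print(filter_super_words(s, 'word1'))  # ['word1_word2', 'word5_word1']
```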


CodePudding user response:

import re

s = 'word1_word2, word3_word4, word5_word1'

# Match either "<other>_word1" or "word1_<other>".
matches = re.finditer(r'(\w+_word1)|(word1_\w+)', s)
result = [m.group() for m in matches]
print(result)  # ['word1_word2', 'word5_word1']

This is one method, but after seeing @Carl's answer I voted for it, since it is faster and cleaner. I will leave this here as one of many regex options.

CodePudding user response:

This regex will do the job for word1:

# Non-capturing groups, so findall returns the whole match
regex = r'(?:word\d_)*word1(?:_word\d)*'
re.findall(regex, string)

You can also use this:

re.findall(rf'\b(word{number}_\w+?|\w+?_word{number})\b', string)

CodePudding user response:

Try the following regex.

In the following, replace word1 with the word you're looking for. This is assuming that the word you are looking for consists of only alphanumeric characters.

([a-zA-Z0-9]+_word1)|(word1_[a-zA-Z0-9]+)
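
One thing to watch when a pattern has two capturing groups: re.findall returns a tuple per match, with an empty string for the group that did not take part. The sketch below uses a close variant of the pattern above (with + quantifiers, which is my assumption) and shows one way to get the full matches back through re.finditer.

```python
import re

s = 'word1_word2, word3_word4, word5_word1'
pattern = r'([a-zA-Z0-9]+_word1)|(word1_[a-zA-Z0-9]+)'

# findall yields one tuple per match, one slot per capturing group.
tuples = re.findall(pattern, s)
print(tuples)  # [('', 'word1_word2'), ('word5_word1', '')]

# group(0) is the entire match, regardless of which alternative matched.
whole = [m.group(0) for m in re.finditer(pattern, s)]
print(whole)  # ['word1_word2', 'word5_word1']
```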