I am trying to collapse multiple adjacent repetitions of a pattern into a single occurrence.
Suppose the following is the text, with the pattern user:
user user please sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
Here I want each such occurrence of user (in this case user user) to be replaced with a single user. The condition is that the repeated occurrences must be adjacent to each other.
For example if the sentence was:
user user user please user user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
I want user user user and user user each to be replaced with user.
What I have come up with so far is this:
re.findall(r'[user\s]+', text)
I know that to do the replacement we will use re.sub.
The output that I am getting is:
['user user user ',
'e',
'se user user s',
'r',
' ',
'u',
' ',
' ',
' s',
'u',
' ',
'e ',
'u',
' ',
'er ',
'rese',
' ',
'er ',
' ',
'r',
' ',
'ur',
'e ',
' ',
'eres',
' ',
' ',
'u',
' ',
' ',
' e',
'e',
' ',
' ',
'e',
'ers',
' ',
'r',
'er ',
's ',
's',
'u',
'e',
' r',
' ',
' ']
So I want only the first and the third elements to be found, and the third element should be user user instead of se user user s.
When you answer, could you please explain how the expression works? I am very new to regex.
CodePudding user response:
I hope I've understood your question correctly. This will shorten user user to user (and it also works for more repetitions):
import re
s = "user user user please user user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http"
s = re.sub(
    r"(?:(\suser\b)|(\buser\s)){2,}",
    lambda g: " user" if g.group(1) else "user ",
    s,
)
print(s)
Prints:
user please user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
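As for how this works: square brackets such as [user\s] define a character class, which matches any single one of the characters u, s, e, r or whitespace, not the word user. That is why the attempt in the question also picks up fragments of other words. To repeat a whole word, it has to go inside a group. A simpler variant of the same idea, as a minimal sketch (the helper name collapse_repeats and its word parameter are hypothetical, not part of the original post), could look like this:

```python
import re

# Hypothetical helper for illustration; the name and signature are assumptions.
def collapse_repeats(text, word="user"):
    w = re.escape(word)
    # \b           word boundary, so "superuser" and "users" are left alone
    # (?:\s+w)+    one or more further copies, each preceded by whitespace
    # (?: ... )    groups the tokens without capturing them
    pattern = rf"\b{w}(?:\s+{w})+\b"
    return re.sub(pattern, word, text)

# A character class matches individual characters, not the word:
print(re.findall(r"[user\s]+", "please user"))  # ['e', 'se user']

text = "user user user please user user sort situation"
print(collapse_repeats(text))  # user please user sort situation
```

Because the match starts and ends on the word itself, the surrounding whitespace is never consumed, so the replacement can be the plain word and no lambda is needed.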
