What is the fastest way to restrict a number of repeating characters with different length?

Time:02-01

I would like to restrict the number of repeating characters in a string, given that different characters have different restrictions.

Suppose I have the string Mary,,, had!!!!! a--- little ? lamb........ and a list of characters that are allowed a higher limit, chars = '.!?'. I want all other punctuation signs such as , and - (suppose I have a list of those too) to occur only once in a row, while characters from chars can occur at most 3 times in a row.

Thus the final string will be formatted like this: Mary, had!!! a- little ? lamb...

Could anyone give me a hint what is the fastest way to do that, please? I suppose I will have to use groupby from itertools, but I can't quite wrap my head around it. Any tips are appreciated! Thank you in advance!

CodePudding user response:

You can use re.sub together with a lambda function which handles the replacement logic:

import re

n_max = {**dict.fromkeys('-,', 1), **dict.fromkeys('.!?', 3)}

test_string = 'Mary,,, had!!!!! a--- little ? lamb........'
result = re.sub(
    r'([{chars}])\1+'.format(chars=''.join(re.escape(c) for c in n_max)),
    lambda m: m.group(0)[:n_max[m.group(1)]],
    test_string,
)
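
To make the callback easier to follow: the pattern captures one restricted character and then one or more repeats of it, so m.group(0) is the whole run and m.group(1) is the character itself. A minimal sketch that wraps the same idea in a reusable function; the name limit_repeats and its limits parameter are made up for illustration and are not part of the answer above:

```python
import re

def limit_repeats(text, limits):
    """Trim each run of a restricted character to its allowed length.

    `limits` maps a single character to its maximum run length and is
    assumed to be non-empty. (Hypothetical helper for illustration.)
    """
    # Character class of all restricted characters, each one escaped.
    char_class = ''.join(re.escape(c) for c in limits)
    # Group 1 is the character; \1+ matches its consecutive repeats.
    pattern = re.compile(r'([{}])\1+'.format(char_class))
    # Slice the matched run down to the limit for that character.
    return pattern.sub(lambda m: m.group(0)[:limits[m.group(1)]], text)

limits = {**dict.fromkeys('-,', 1), **dict.fromkeys('.!?', 3)}
print(limit_repeats('Mary,,, had!!!!! a--- little ? lamb........', limits))
# Mary, had!!! a- little ? lamb...
```

Runs that are already within their limit (for example two dots) are matched but come back unchanged, because slicing a short run to a longer length is a no-op.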

CodePudding user response:

Another solution with re.sub that works without a callback function:

import re

only_once = ',-'
only_thrice = '.!?'

regex = f"([{re.escape(only_once)}])\\1+|([{re.escape(only_thrice)}])\\2{{3,}}"

# example
s = 'Mary,,, had!!!!! a--- little ? lamb........'
result = re.sub(regex, r"\1\2\2\2", s)
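
The replacement string works because only one of the two groups takes part in any given match, and re.sub substitutes an empty string for a group that did not match (this is the behaviour since Python 3.5). A small sketch to inspect that, using arbitrary sample strings chosen for illustration:

```python
import re

only_once = ',-'
only_thrice = '.!?'
regex = f"([{re.escape(only_once)}])\\1+|([{re.escape(only_thrice)}])\\2{{3,}}"

# Show which group matched and what the replacement produces.
for sample in [',,,', '!!!!!', '...', 'a?b']:
    m = re.search(regex, sample)
    groups = m.groups() if m else None
    print(repr(sample), groups, repr(re.sub(regex, r"\1\2\2\2", sample)))
# ',,,' (',', None) ','
# '!!!!!' (None, '!') '!!!'
# '...' None '...'
# 'a?b' None 'a?b'
```

Note that the second alternative only matches runs of four or more, so a run that is already at the limit of three is left alone.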

CodePudding user response:

You could indeed use groupby and set up a dictionary with the number of allowed repetitions for each character that has a restriction:

from itertools import groupby,islice
from collections import Counter

maxRep  = Counter(",-"*1 + ".!?"*3)

Usage:

S = "Mary,,, had!!!!! a--- little ? lamb........"

S = "".join(c for g,r in groupby(S) for c in islice(r,0,maxRep.get(g)))

print(S)
# Mary, had!!! a- little ? lamb...
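
For readers who find the one-liner dense, here is the same logic spelled out as a loop. This is a sketch only: the names max_rep and trim_runs are invented here, and the limits are written out as a plain dict instead of the Counter trick.

```python
from itertools import groupby, islice

# Maximum run length per restricted character (illustrative values).
max_rep = {',': 1, '-': 1, '.': 3, '!': 3, '?': 3}

def trim_runs(text):
    pieces = []
    # groupby yields each character with an iterator over its run.
    for char, run in groupby(text):
        # None for unrestricted characters, so islice keeps the whole run.
        limit = max_rep.get(char)
        pieces.append(''.join(islice(run, limit)))
    return ''.join(pieces)

# What groupby sees: (character, run length) pairs.
runs = [(c, len(list(r))) for c, r in groupby('a--- ?')]
print(runs)
# [('a', 1), ('-', 3), (' ', 1), ('?', 1)]

print(trim_runs('Mary,,, had!!!!! a--- little ? lamb........'))
# Mary, had!!! a- little ? lamb...
```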

Note that this is slower than regular expressions (the re module). However, if you want to use regular expressions, it is simpler and faster to clean up by deleting the superfluous characters than to replace repetitions with their maximum streaks:

import re
pattern = "[{0}]+(?=[{0}]{{{1},{1}}})"    # look ahead for x reps
max1 = pattern.format(r",-",1)            # [,-]+(?=[,-]{1,1})
max3 = pattern.format(r".!?",3)           # [.!?]+(?=[.!?]{3,3})
# note: a run here is any mix of characters from one class, e.g. "?!..." counts as 5
restrictions = re.compile(max1+"|"+max3)

Note that you will have to escape any restricted character that is special inside a regular-expression character class (e.g. a closing square bracket: r"\]").

Usage:

S = "Mary,,, had!!!!! a--- little ? lamb........"

S = restrictions.sub("",S)

print(S)
# Mary, had!!! a- little ? lamb...

This is roughly 3x faster than the groupby solution
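
The actual ratio will depend on the input and the Python version, so it is worth measuring on your own data. A minimal timeit sketch for that; the repeated test string, the function names and the run count are arbitrary choices made for illustration, and no particular timings are claimed:

```python
import re
import timeit
from collections import Counter
from itertools import groupby, islice

# Arbitrary test input: the example string repeated 100 times.
S = "Mary,,, had!!!!! a--- little ? lamb........" * 100

maxRep = Counter(",-" * 1 + ".!?" * 3)
restrictions = re.compile(r"[,-]+(?=[,-]{1})|[.!?]+(?=[.!?]{3})")

def with_groupby(s):
    return "".join(c for g, r in groupby(s) for c in islice(r, 0, maxRep.get(g)))

def with_regex(s):
    return restrictions.sub("", s)

# Both approaches should agree on this input before speed is compared.
assert with_groupby(S) == with_regex(S)

for func in (with_groupby, with_regex):
    seconds = timeit.timeit(lambda: func(S), number=200)
    print(f"{func.__name__}: {seconds:.3f} s for 200 runs")
```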
