I am trying to replace the digits inside alphanumeric words with X, but my text also contains standalone numbers, and those should not be replaced. I can locate the alphanumeric words using the code below, but I am unable to replace only their digits.
Code:
result = re.sub(r'\S*\d+\S*', '', text)
text = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'
Expected output:
my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
CodePudding user response:
I think you could use the regex library from PyPI instead:
\b\d+(?:\.\d+)?\b(*SKIP)(*F)|\d
\b\d+(?:\.\d+)?\b - First match what we don't want. In this case, one or more digits with an optional decimal part, between word boundaries;
(*SKIP)(*F) - Forget what we matched and exclude it from the final results;
| - Or;
\d - Any single digit.
import regex as re
s = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'
print(re.sub(r'\b\d+(?:\.\d+)?\b(*SKIP)(*F)|\d', '*', s))
Prints:
my text contains of the 33109 following values RT****SO** and also with certain numbers godwil_**** and ***stwe and *y*yc* 456453
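If installing the third-party regex module is not an option, a similar effect can probably be had with the standard re module, which does not support (*SKIP)(*F). The sketch below is not from the answer above: it swaps in the common "match what you don't want, capture what you do" alternation idiom, and the names PATTERN and mask_digits are made up for illustration.

```python
import re

# Left branch consumes standalone numbers so their digits are never seen
# by the right branch; only the right branch has a capture group.
PATTERN = re.compile(r'\b\d+(?:\.\d+)?\b|(\d)')

def mask_digits(s, mask='X'):
    # group(1) is set only when the single-digit branch matched;
    # otherwise the standalone number is put back unchanged.
    return PATTERN.sub(
        lambda m: mask if m.group(1) is not None else m.group(0), s
    )

s = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'
print(mask_digits(s))
```

The callback is needed because re.sub has no way to "skip" a match; returning m.group(0) leaves the standalone numbers as they were.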
CodePudding user response:
There's probably a way to do this purely with RE. However, here's a composite approach that seems to fulfil the brief:
import re
text = "my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453"
def replace_numbers(s):
    tokens = s.split()
    for i, t in enumerate(tokens):
        if not t.isdigit():
            tokens[i] = re.sub(r'\d', 'X', t)
    return ' '.join(tokens)
print(replace_numbers(text))
Output:
my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
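One side effect of split() followed by ' '.join() is that runs of spaces, tabs and newlines collapse into single spaces. If the original spacing matters, a variant along these lines might help. It is only a sketch: the function name replace_numbers_keep_spacing is hypothetical, and it relies on re.split keeping the separators when the pattern has a capturing group.

```python
import re

def replace_numbers_keep_spacing(s):
    # The capturing group makes re.split return the whitespace runs too,
    # so joining the parts back together restores the original spacing.
    parts = re.split(r'(\s+)', s)
    return ''.join(
        p if p.isdigit() else re.sub(r'\d', 'X', p)  # all-digit tokens stay as-is
        for p in parts
    )

print(replace_numbers_keep_spacing('RT3123SO55   33109\tgodwil_5708'))
```

Whitespace parts contain no digits, so passing them through re.sub leaves them untouched.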
CodePudding user response:
Would you please try:
import re
text = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'
s = re.sub(r'(?<=[A-Za-z_])\d+|\d+(?=[A-Za-z_])', lambda x: 'X' * len(x.group(0)), text)
print(s)
Result:
my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
CodePudding user response:
Another option using re is to match a word that contains at least one digit and at least one letter a-zA-Z, and then replace the digits in the match with X:
(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*
(?i) - Inline modifier for case-insensitive matching (you can also use re.I on re.sub)
\b - A word boundary to prevent a partial word match
(?=[^\W\d]*\d) - Positive lookahead to assert at least one digit ([^\W\d] is a word character that is not a digit)
[\d_]*[A-Z] - Match optional digits or _ and then match A-Z
\w* - Match optional word characters
import re
pattern = r"(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*"
text = "my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453"
print(re.sub(pattern, lambda m: re.sub(r"\d", "X", m.group()), text))
Output
my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
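All of the approaches above agree on the sample text, but they can differ once punctuation is attached to a number. The following sketch compares the stdlib-only ones on a made-up input. The sample string and the function names are assumptions for illustration, and the lookaround pattern is written with \d+.

```python
import re

def by_tokens(s):
    # Whitespace-token approach: any token that is not purely digits is masked
    return ' '.join(t if t.isdigit() else re.sub(r'\d', 'X', t) for t in s.split())

def by_lookaround(s):
    # Digit runs directly preceded or followed by a letter or underscore
    pat = r'(?<=[A-Za-z_])\d+|\d+(?=[A-Za-z_])'
    return re.sub(pat, lambda m: 'X' * len(m.group(0)), s)

def by_word(s):
    # Whole words containing at least one digit and at least one letter
    pat = r'(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*'
    return re.sub(pat, lambda m: re.sub(r'\d', 'X', m.group()), s)

sample = 'code A1, total 5708, id_9'
for f in (by_tokens, by_lookaround, by_word):
    print(f.__name__, '->', f(sample))
```

Here the token-based version masks "5708," because the trailing comma makes the token fail isdigit(), while the two regex versions leave 5708 alone. Which behaviour is wanted depends on the data.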
