Replace the digits in alphanumeric words with X

I am trying to replace the digits inside alphanumeric words with X, but my text also contains standalone numbers, and those must not be replaced. I was able to locate the alphanumeric words using the code below, but I am unable to replace only their digits.

Code

result = re.sub(r'\S*\d+\S*', '', text)

text = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'

expected output is

my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453

CodePudding user response：

I think you could try PyPI's regex library instead:

\b\d+(?:\.\d+)?\b(*SKIP)(*F)|\d


\b\d+(?:\.\d+)?\b - First match what we don't want. In this case, one or more digits with an optional decimal part, between word boundaries;
(*SKIP)(*F) - Forget what we matched and exclude it from the final results;
| - Or;
\d - Any single digit.

import regex as re
s = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'
print(re.sub(r'\b\d+(?:\.\d+)?\b(*SKIP)(*F)|\d', '*', s))

Prints:

my text contains of the 33109 following values RT****SO** and also with certain numbers godwil_**** and ***stwe and *y*yc* 456453
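
If installing the third-party regex package is not an option, the same "match what you don't want first" idea can probably be approximated with the standard re module. This is only a sketch, not part of the answer above: the first alternative consumes standalone numbers and leaves them unchanged, while the second captures a single digit to mask. The names pattern and mask are made up for this illustration.

```python
import re

s = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'

# Branch 1: a standalone number between word boundaries (kept as-is).
# Branch 2: any other single digit, captured in group 1 (masked).
pattern = re.compile(r'\b\d+(?:\.\d+)?\b|(\d)')

def mask(m):
    # group(1) is only set when the second branch matched
    return 'X' if m.group(1) is not None else m.group(0)

result = pattern.sub(mask, s)
print(result)
```

Because the standalone-number branch is tried first at each position, numbers such as 33109 are consumed whole and returned untouched, which plays the role that (*SKIP)(*F) plays in the regex library.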

CodePudding user response：

There's probably a way to do this purely with RE. However, here's a composite approach that seems to fulfil the brief:

import re
text = "my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453"

def replace_numbers(s):
    tokens = s.split()
    for i, t in enumerate(tokens):
        if not t.isdigit():
            tokens[i] = re.sub(r'\d', 'X', t)

    return ' '.join(tokens)

print(replace_numbers(text))

Output:

my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
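
One thing to be aware of with split() and ' '.join() is that runs of whitespace (double spaces, tabs, newlines) are collapsed into single spaces. If the original spacing matters, a variant along these lines might work. It is a sketch under that assumption, and replace_numbers_keep_spacing and mask_token are hypothetical names, not from the answer above.

```python
import re

def replace_numbers_keep_spacing(s):
    def mask_token(m):
        token = m.group(0)
        # Leave purely numeric tokens alone; mask digits in everything else
        return token if token.isdigit() else re.sub(r'\d', 'X', token)
    # \S+ visits each whitespace-delimited token; the whitespace itself is never touched
    return re.sub(r'\S+', mask_token, s)

sample = "id  RT31\tcount 42"
masked = replace_numbers_keep_spacing(sample)
print(repr(masked))
```

The token test is the same isdigit() check as in replace_numbers, so the two should agree on text that only uses single spaces.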

CodePudding user response：

Would you please try:

import re

text = 'my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453'

s = re.sub(r'(?<=[A-Za-z_])\d+|\d+(?=[A-Za-z_])', lambda x: 'X' * len(x.group(0)), text)
print(s)

Result:

my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453

CodePudding user response：

Another option using re is to match a word that contains at least one digit and at least one letter (a-z, A-Z), and then replace the digits within that match with X:

(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*

(?i) Inline modifier for case-insensitive matching (you can also pass re.I to re.sub)
\b A word boundary to prevent a partial word match
(?=[^\W\d]*\d) Positive lookahead to assert at least one digit ( [^\W\d] is a word character that is not a digit )
[\d_]*[A-Z] Match optional digits or _ and then match a letter A-Z
\w* Match optional word characters


import re

pattern = r"(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*"
text = "my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453"

print(re.sub(pattern, lambda m: re.sub(r"\d", "X", m.group()), text))

Output

my text contains of the 33109 following values RTXXXXSOXX and also with certain numbers godwil_XXXX and XXXstwe and XyXycX 456453
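
To check which words the pattern selects before any digits are replaced, re.findall can be run on the same input. This is just an inspection aid added for illustration; the variable name matches is arbitrary.

```python
import re

pattern = r"(?i)\b(?=[^\W\d]*\d)[\d_]*[A-Z]\w*"
text = "my text contains of the 33109 following values RT3123SO55 and also with certain numbers godwil_5708 and 323stwe and 8y9yc2 456453"

# The pattern has no capturing groups, so findall returns the full matches
matches = re.findall(pattern, text)
print(matches)
```

Only the mixed words should show up in the list; purely alphabetic words and the standalone numbers 33109 and 456453 are skipped because they lack either a digit or a letter.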