Concatenate the count number with a duplicate text pattern using Python regex

I have a very large text file that contains duplicate text patterns. In the sample below, the pattern "Path": "/home/downloads/file" appears 3 times. I want to append a count number to the end of each Path value according to its position. For example, the first Path should become "Path": "/home/downloads/file/1", the second should become "Path": "/home/downloads/file/2", and so on. My current code counts the patterns but doesn't append the number correctly. Below are my code, its current output and the desired output, along with a small chunk of the text.

from io import StringIO  
import re


file = StringIO("""{
    "title": "Pilot",
    "image": [
        {
            "Path": "/home/downloads/file"
            "Path": "/home/downloads/file",
            "Path": "/home/downloads/file"
        }
    ],
    "content": "<p>The wing man ...</p>"
}""")

text = file.read()
patterns = r'"Path": "(.*?)"'

count = 0

for match in re.finditer(patterns, text):
    count += 1
    replace = '"Path": "\\1/' + str(count) + '"'
    text = re.sub(patterns, replace, text)

print(text)

Current output of the code is:

{
    "title": "Pilot",
    "image": [
        {
            "Path": "/home/downloads/file/1/2/3"
            "Path": "/home/downloads/file/1/2/3",
            "Path": "/home/downloads/file/1/2/3"
        }
    ],
    "content": "<p>The wing man ...</p>"
}

Desired output is:

{
    "title": "Pilot",
    "image": [
        {
            "Path": "/home/downloads/file/1"
            "Path": "/home/downloads/file/2",
            "Path": "/home/downloads/file/3"
        }
    ],
    "content": "<p>The wing man ...</p>"
}

Answer:

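Each re.sub call inside your loop rewrites every occurrence of the pattern, so the numbers pile up on every line. Limit each call to a single replacement (count=1) and apply it starting at the current match, keeping track of how many characters have been inserted so far:

offset = 0                                                     # characters inserted so far
for cnt, match in enumerate(re.finditer(patterns, text), 1):   # match positions refer to the original text
    start = match.start() + offset                             # position of the same match in the updated text
    text = text[:start] + re.sub(patterns, rf'"Path": "\1/{cnt}"', text[start:], count=1)
    offset += len(f'/{cnt}')                                   # account for the appended "/cnt"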
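To see why count matters, re.subn can be used instead of re.sub because it also reports how many replacements were made. A small check on a hypothetical three-line sample (the text and variable names are illustrative only):

```python
import re

text = '"Path": "/a"\n"Path": "/a"\n"Path": "/a"'
pattern = r'"Path": "(.*?)"'

# Without count, a single call rewrites all three occurrences
all_text, n_all = re.subn(pattern, r'"Path": "\1/1"', text)
print(n_all)    # 3

# With count=1, only the first occurrence is rewritten
first_text, n_first = re.subn(pattern, r'"Path": "\1/1"', text, count=1)
print(n_first)  # 1
print(first_text)
```

This is why calling re.sub without count inside the question's loop appends every number to every Path.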

Answer:

You can pass a function as the replacement argument of re.sub. It is called once for each non-overlapping match, so a counter inside it can number the matches in order.

Code

text = file.read()
patterns = r'"Path": "(.*?)"'

def repl(m):
    global count
    count += 1                                  # increment once per match of the pattern
    return f'"Path": "{m.group(1)}/{count}"'    # rebuild the match with the number appended

count = 0
text = re.sub(patterns, repl, text)  # calls repl once for each match of the pattern

print(text)

Output

{
    "title": "Pilot",
    "image": [
        {
            "Path": "/home/downloads/file/1"
            "Path": "/home/downloads/file/2",
            "Path": "/home/downloads/file/3"
        }
    ],
    "content": "<p>The wing man ...</p>"
}
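Since the file in the question is very large, here is a sketch that avoids reading it all into memory and replaces the global counter with itertools.count. It assumes that each "Path" entry sits on a single line (an entry split across a line break would be missed); number_paths is a hypothetical helper name, not part of either answer.

```python
import io
import itertools
import re

PATTERN = re.compile(r'"Path": "(.*?)"')

def number_paths(src, dst):
    """Copy src to dst line by line, appending /1, /2, ... to each "Path" value.

    Assumes no "Path" entry is split across two lines.
    """
    counter = itertools.count(1)  # shared across lines, so numbering continues through the file
    for line in src:
        dst.write(PATTERN.sub(lambda m: f'"Path": "{m.group(1)}/{next(counter)}"', line))

# Demo with in-memory files standing in for real ones
src = io.StringIO(
    '{\n'
    '    "Path": "/home/downloads/file"\n'
    '    "Path": "/home/downloads/file",\n'
    '    "Path": "/home/downloads/file"\n'
    '}\n'
)
dst = io.StringIO()
number_paths(src, dst)
print(dst.getvalue())
```

For real files, the same function can be called with two open file objects, e.g. with open("input.txt") as src, open("output.txt", "w") as dst: number_paths(src, dst) (the file names are placeholders). Only one line is held in memory at a time.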