Split a string on two patterns using regex in Python

Given two file paths

Z:\home\user\dfolder\NO,AG,GK.jpg
Z:\home\user\dfolder\NI,DG,BJ (1).jpg

The objective is to split each file name into its parts and store them in a dict.

Currently, I first split each path using os.path.split to get the list of file names s

s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']

and iteratively split the string as below

import re

all_dic=[]
for ds in s:
  k=ds.split(",")
  # drop ".jpg", and drop a trailing " (n)" suffix when one is present
  kk=k[-1].split('.jpg')[0].split("(")[0].strip() if bool(re.search(r'\(\d+\)', ds)) else k[-1].split('.jpg')[0]
  nval={"f":k[0],"s":k[1],"t":kk}
  all_dic.append(nval)

However, I am curious whether there is a regex approach, or any one-liner.

CodePudding user response：

A one-liner using re.sub inside a list comprehension:

import re

s = ['NO,AG,GK.jpg', 'NI,DG,BJ (1).jpg']

keys = ['f', 's', 't']
all_dic = [{keys[k]: x for k, x in enumerate(
    re.sub(r"(\s\(\d+\))?(\.jpg)?", "", item).split(','))} for item in s]

print(all_dic)

[{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
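
As a variation on the same idea, here is a minimal sketch that lets named groups supply the dict keys directly through groupdict(). The pattern is my own assumption about the file-name layout (three comma-separated fields, an optional " (n)" suffix, then .jpg), so adjust it if your names differ:

```python
import re

s = ['NO,AG,GK.jpg', 'NI,DG,BJ (1).jpg']

# Hypothetical pattern: three comma-separated fields, an optional " (n)"
# copy suffix, then the .jpg extension. The group names become the dict keys.
pattern = re.compile(
    r"(?P<f>[^,]+),(?P<s>[^,]+),(?P<t>[^,.(]+?)\s*(?:\(\d+\))?\.jpg"
)

all_dic = []
for item in s:
    m = pattern.fullmatch(item)
    if m is None:
        # Fail loudly instead of silently producing a partial dict
        raise ValueError(f"unexpected file name: {item!r}")
    all_dic.append(m.groupdict())

print(all_dic)
```

Unlike the re.sub version, fullmatch rejects any name that does not fit the expected layout, which may or may not be what you want.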

CodePudding user response：

Well, I think this is the easiest way to get the same output without using the split() function.

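The regular expression matches each run of letters and re.findall returns the matches as a list, so we don't have to split the string or remove the (1) from it. Note that this only works when the parts of the file name consist of letters only.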

import re

s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']
all_dic = []

for ds in s:
    regex = r'[a-zA-Z]+'                # One or more consecutive letters
    k = re.findall(regex,ds)            # We extract all the matches (as a list)

    nval={'f':k[0],'s':k[1],'t':k[2]}   # We create the dictionary
    all_dic.append(nval)                # We append the dictionary to the list
        
print(all_dic)
# Output: [{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]

Also, you have the file extension in k[3], just in case you need it.
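
If you start from the full paths instead of the list s, the same findall idea could be applied to the file name without its extension. This is only a sketch: the paths list and the use of pathlib.PureWindowsPath are my assumptions (it parses Windows-style paths on any OS), and it still relies on the parts being letters only.

```python
import re
from pathlib import PureWindowsPath

# Assumed input: the two full paths from the question
paths = [
    r'Z:\home\user\dfolder\NO,AG,GK.jpg',
    r'Z:\home\user\dfolder\NI,DG,BJ (1).jpg',
]
keys = ['f', 's', 't']

# .stem is the file name without directory or extension, e.g. 'NI,DG,BJ (1)',
# so findall only sees the three letter groups.
all_dic = [
    dict(zip(keys, re.findall(r'[a-zA-Z]+', PureWindowsPath(p).stem)))
    for p in paths
]

print(all_dic)
```

Taking the stem first matters here, because running findall on the whole path would also pick up the drive letter and folder names.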