Given a two file path
Z:\home\user\dfolder\NO,AG,GK.jpg
Z:\home\user\dfolder\NI,DG,BJ (1).jpg
The objective is to split each string and store into a dict
Currently, I first split the path using os.path.split to get list of s
s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']
and iteratively split the string as below
all_dic=[]
for ds in s:
k=ds.split(",")
kk=k[-1].split('.jpg')[0].split("(")[0] if bool(re.search('\(\d \)', ds)) else k[-1].split('.jpg')[0]
nval={"f":k[0],"s":k[1],"t":kk}
all_dic.append(nval)
But, I am curious for a regex approach, or any 1 liner .
CodePudding user response:
One liner parsing using regex inline list parsing:
import re
s = ['NO,AG,GK.jpg', 'NI,DG,BJ (1).jpg']
keys = ['f', 's', 't']
all_dic = [{keys[k]: x for k, x in enumerate(
re.sub("(\s\(\d \))?(\.jpg)?", "", item).split(','))} for item in s]
print(all_dic)
->
[{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
CodePudding user response:
Well, I think this is the easiest way to get the same output without using the split() function.
The regular expression takes only the letters and puts them in a list, so we don't even have to split the string or remove the (1) from it.
import re
s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']
all_dic = []
for ds in s:
regex = '[a-zA-Z] '
k = re.findall(regex,ds) # We extract all the matches (as a list)
nval={'f':k[0],'s':k[1],'t':k[2]} # We create the dictionary
all_dic.append(nval) # We append the dictionary to the list
print(all_dic)
# Output: [{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
Also, you have the file extension in k[3], just in case you need it.
