I have a list of strings. Each element represents a field as a key and a value separated by a space:
listA = [
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]
Behavior
I need to turn this list into a dict, expanding keys like 'abcd1-2' by the range they end in (here 1-2) into multiple keys like abcd1 and abcd2, each with the same value, like 4d4e.
It should run as part of an Ansible plugin, where Python 2.7 is used.
Expected
The end result would look like the dict below:
{
abcd1: 4d4e,
abcd2: 4d4e,
xyz0: 551,
xyz1: 551,
foo: 3ea,
bar1: 2bd,
mc-mqisd0: 77a,
mc-mqisd1: 77a,
mc-mqisd2: 77a,
}
Code
I have created the function below. It works with Python 3.
def listFln(listA):
    import re
    fL = []
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.sub('^(.+?)(\d+-\d+)?$',r'\1',aL)
        cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
        if cmpCountR.strip():
            nStart = int(cmpCountR.split('-')[0])
            nEnd = int(cmpCountR.split('-')[1])
            for j in range(nStart,nEnd+1):
                fL.append(comp+str(j)+' '+bL)
        else:
            fL.append(i)
    return(dict([k.split() for k in fL]))
Error
In older Python versions such as Python 2.7, this code throws an "unmatched group" error:
    cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib64/python2.7/re.py", line 275, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib64/python2.7/sre_parse.py", line 800, in expand_template
    raise error, "unmatched group"
Anything wrong with the regex here?
CodePudding user response:
Here's a simpler version using findall instead of sub, successfully tested on 2.7. It also builds the dict directly instead of first building a list:
mylist=[
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]

def listFln(listA):
    import re
    fL = {}
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.findall('^(.+?)(\d+-\d+)?$',aL)[0]
        if comp[1]:
            nStart = int(comp[1].split('-')[0])
            nEnd = int(comp[1].split('-')[1])
            for j in range(nStart,nEnd+1):
                fL[comp[0]+str(j)] = bL
        else:
            fL[comp[0]] = bL
    return fL

print(listFln(mylist))
# {'abcd1': '4d4e',
# 'abcd2': '4d4e',
# 'xyz0': '551',
# 'xyz1': '551',
# 'foo': '3ea',
# 'bar1': '2bd',
# 'mc-mqisd0': '77a',
# 'mc-mqisd1': '77a',
# 'mc-mqisd2': '77a'}
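The reason findall sidesteps the error is that it reports an optional group that took no part in the match as an empty string, so there is never an unmatched group to expand. A small sketch of just that step, assuming the pattern is the one from the question; split_key is a hypothetical helper name, not part of the answer above:

```python
import re

# Assumed pattern: lazy prefix, then an optional "N-M" range at the end of the key.
KEY_PATTERN = re.compile(r'^(.+?)(\d+-\d+)?$')

def split_key(key):
    """Hypothetical helper: return (prefix, range_text) for one key."""
    # findall yields one tuple per match; an unmatched optional group is ''.
    return KEY_PATTERN.findall(key)[0]

print(split_key('abcd1-2'))  # ('abcd', '1-2')
print(split_key('foo'))      # ('foo', '')
```

Because the second element is either a range like '1-2' or an empty string, a plain truth test is enough to decide whether to expand the key.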
CodePudding user response:
Used Python 2.7 to reproduce:
Both patterns compile
import re
# both seem identical
regex1 = '^(.+?)(\d+-\d+)?$'
regex2 = '^(.+?)(\d+-\d+)?$'
# the compiled pattern is also the same object (re caches compiled patterns), see the address
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: Compiling the pattern once with re.compile() saves time when it is reused many times, as in this loop.
Fix: test for groups found
The error message indicates that there are groups that weren't matched.
Put another way: the replacement template passed to re.sub (see the docs for 2.7) references a group, here the second capturing group (\2), that did not capture anything in the given input string:
sre_constants.error: unmatched group
To fix this, we should test which groups were found in the match.
To do so, we use re.match(regex, str) or the compiled variant pattern.match(str) to create a Match object, then Match.groups() to return all found groups as a tuple.
import re

regex = '^(.+?)(\d+-\d+)?$'
pattern = re.compile(regex)  # <_sre.SRE_Pattern object at 0x7f575ef8fd40>

def listFln(listA):
    fL = []
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        # test for match and groups found
        match = pattern.match(aL)
        print("DEBUG groups:", match.groups())  # tuple containing all the subgroups of the match,
        # watch: the 3rd iteration has only group(1)
        # skip to the next iteration here if there is no 2nd group
        if not match or not match.group(2):
            continue
        comp = re.sub(pattern, r'\1', aL)
        cmpCountR = re.sub(pattern, r'\2', aL)
        if cmpCountR.strip():
            parts = cmpCountR.split('-')
            nStart = int(parts[0])
            nEnd = int(parts[1])
            for j in range(nStart,nEnd+1):
                fL.append(comp+str(j)+' '+bL)
        else:
            fL.append(i)
    return dict([k.split() for k in fL])

listA = [
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]
as_dict = listFln(listA)
print("resulting dict:", as_dict)
Prints:
('DEBUG groups:', ('abcd', '1-2'))
('DEBUG groups:', ('xyz', '0-1'))
('DEBUG groups:', ('foo', None))
('DEBUG groups:', ('bar1', None))
('DEBUG groups:', ('mc-mqisd', '0-2'))
('resulting dict:', {'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
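Note that the resulting dict above has no entries for foo and bar1, because the continue skips every key without a range. A possible variant that keeps them is sketched below, under the assumption that plain keys should be copied unchanged (as in the question's expected output). The name expand_fields is hypothetical. It reads the groups straight from the Match object, so re.sub and its \2 template are never involved:

```python
import re

# Assumed pattern: lazy prefix plus an optional trailing "N-M" range.
pattern = re.compile(r'^(.+?)(\d+-\d+)?$')

def expand_fields(fields):
    """Hypothetical variant: expand ranged keys, copy plain keys unchanged."""
    result = {}
    for field in fields:
        key, value = field.split()
        match = pattern.match(key)
        if match is None or match.group(2) is None:
            # No range suffix (group 2 did not participate): keep the key as is.
            result[key] = value
            continue
        start, end = (int(n) for n in match.group(2).split('-'))
        for j in range(start, end + 1):
            result[match.group(1) + str(j)] = value
    return result

as_dict = expand_fields(['abcd1-2 4d4e', 'foo 3ea', 'bar1 2bd'])
print(sorted(as_dict.items()))
# [('abcd1', '4d4e'), ('abcd2', '4d4e'), ('bar1', '2bd'), ('foo', '3ea')]
```

Checking match.group(2) for None works the same way on Python 2.7 and Python 3, since the difference between the versions only shows up when a template like \2 is expanded.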
CodePudding user response:
You could use a single pattern with 4 capture groups, and check if the 3rd capture group value is not empty.
^(.*?)(\d+)(?:-(\d+))?\s*(.*)
The pattern matches:
^  Start of string
(.*?)  Capture group 1, match any character, as few as possible
(\d+)  Capture group 2, match 1+ digits
(?:-(\d+))?  Optionally match - and capture 1+ digits in group 3
\s*  Match optional whitespace chars
(.*)  Capture group 4, match the rest of the line
Code example (works on Python 2 and Python 3)
import re

strings = [
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]

def listFln(listA):
    dct = {}
    for s in listA:
        lst = sum(re.findall(r"^(.*?)(\d+)(?:-(\d+))?\s*(.*)", s), ())
        if lst:
            for i in range(int(lst[1]), (int(lst[2]) if lst[2] else int(lst[1])) + 1):
                dct[lst[0] + str(i)] = lst[3]
    return dct

print(listFln(strings))
Output
{
'abcd1': '4d4e',
'abcd2': '4d4e',
'xyz0': '551',
'xyz1': '551',
'foo 3': 'ea',
'bar1': '2bd',
'mc-mqisd0': '77a',
'mc-mqisd1': '77a',
'mc-mqisd2': '77a'
}
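In the output above, 'foo 3ea' ends up as 'foo 3': 'ea' rather than 'foo': '3ea', because the lazy first group runs across the space to the first digit it can find. One way to avoid that, as a sketch only, is to restrict the key to non-whitespace characters and to require the full N-M range before treating digits as a range. The pattern and the name expand_line below are assumptions, not part of the answer above:

```python
import re

# Assumed pattern: key without whitespace, optional "N-M" suffix, then the value.
LINE_PATTERN = re.compile(r'^(\S+?)(?:(\d+)-(\d+))?\s+(.*)$')

def expand_line(line):
    """Hypothetical helper: return the {key: value} entries for one line."""
    m = LINE_PATTERN.match(line)
    if m is None:
        return {}
    prefix, start, end, value = m.groups()
    if start is None:
        # No range suffix: the whole first token is the key.
        return {prefix: value}
    return dict((prefix + str(i), value) for i in range(int(start), int(end) + 1))

print(expand_line('foo 3ea'))   # {'foo': '3ea'}
print(expand_line('bar1 2bd'))  # {'bar1': '2bd'}
```

Merging the per-line results with dict.update() over the whole list would then give the dict from the question, assuming every line has exactly one key and one value.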
