Python Regex re.compile query

I'm trying to get a list of required names from a list of names using a regex query.

CSV file: FYI, I converted the countries from capital to lowercase letters.

searchList:

['AU.LS1_james.aus',
'AU.LS1_scott.aus',
'AP.LS1_amanda.usa',
'AP.LS1_john.usa',
'LA.LS1_harsha.ind',
'LA.LS1_vardhan.ind']

I'm trying to get a list for each group, like this:

[
['AU.LS1_james.aus', 'AU.LS1_scott.aus'],
['AP.LS1_amanda.usa', 'AP.LS1_john.usa'],
['LA.LS1_harsha.ind', 'LA.LS1_vardhan.ind']
]

I'm using the following regex query: \<({region}).*\{country}\>

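import re

# regionCountry holds the (region, country) pairs
for region, country in regionCountry:
    query = rf"\<({region}).*\{country}\>"  # raw f-string keeps the backslashes for the regex
    r = re.compile(query)
    group = list(filter(r.match, searchList))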
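A quick way to check what the \< and \> escapes mean to Python's re module is shown below. It is a small sketch; the sample string is taken from searchList, and the "<"-prefixed string is made up purely for the comparison.

```python
import re

name = 'AU.LS1_james.aus'

# In Python's re, a backslash before a non-alphanumeric character just
# escapes it, so \< and \> match literal '<' and '>' characters.
print(re.match(r"\<AU", name))        # None: the name has no leading '<'
print(re.match(r"\<AU", "<" + name))  # a match object for the literal '<AU'

# Python spells the word-boundary anchor \b instead.
print(re.match(r"\bAU", name))        # a match object for 'AU'
```

So a pattern wrapped in \< ... \> only matches names that literally start with "<" and end with ">", which none of the entries in searchList do.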

I tried re.search as well, but the group is always None.

FYI: I also tried this query in Notepad's Find dialog using its regex functionality.

Can anyone tell me where my script is going wrong? Thank you.

User response:

Without regex: use split and a dictionary to group the entries.

Data

entries = ['AU.LS1_james.aus', 'AU.LS1_scott.aus', 'AP.LS1_amanda.usa', 'AP.LS1_john.usa', 'LA.LS1_harsha.ind', 'LA.LS1_vardhan.ind']

Solution 1: simple dict and setdefault

d = {}
for entry in entries:
    # key is the prefix before the first '.', e.g. 'AU'
    d.setdefault(entry.split('.', 1)[0], []).append(entry)

Solution 2: defaultdict

from collections import defaultdict
d = defaultdict(list)  # a missing key starts out as an empty list
for entry in entries:
    d[entry.split('.', 1)[0]].append(entry)

The result is in d.values():

>>> list(d.values())

[['AU.LS1_james.aus', 'AU.LS1_scott.aus'],
 ['AP.LS1_amanda.usa', 'AP.LS1_john.usa'],
 ['LA.LS1_harsha.ind', 'LA.LS1_vardhan.ind']]
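One property of the dictionary approach worth noting: the entries don't have to be adjacent, and the groups come out in the order each prefix is first seen (dicts keep insertion order since Python 3.7). The sketch below contrasts this with itertools.groupby, a swapped-in alternative that only groups consecutive items. The prefix helper and the mixed list are hypothetical, a reordering of the data above.

```python
from collections import defaultdict
from itertools import groupby

def prefix(entry):
    # Hypothetical helper: everything before the first '.', e.g. 'AU'
    return entry.split('.', 1)[0]

# Hypothetical input where the 'AU' entries are not adjacent
mixed = ['AU.LS1_james.aus', 'AP.LS1_amanda.usa', 'AU.LS1_scott.aus']

# dict-based grouping: one bucket per prefix, in first-seen order
d = defaultdict(list)
for entry in mixed:
    d[prefix(entry)].append(entry)
print(list(d.values()))
# [['AU.LS1_james.aus', 'AU.LS1_scott.aus'], ['AP.LS1_amanda.usa']]

# groupby only merges consecutive items, so the 'AU' group is split in two
print([list(g) for _, g in groupby(mixed, key=prefix)])
# [['AU.LS1_james.aus'], ['AP.LS1_amanda.usa'], ['AU.LS1_scott.aus']]
```

If groupby is preferred, the input would need to be sorted by the same key first.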

User response:

Thank you all for trying to help with my question. This answer worked out well for my use case. Python's re module doesn't treat \< and \> as word boundaries (it matches them as literal < and > characters), so I just removed them and it worked fine. I didn't expect the re library to have limitations like this.

Answer: ({region}).*\{country}
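For anyone who wants to keep the word-boundary intent, a minimal sketch of the regex approach using Python's \b follows. It assumes regionCountry pairs shaped like ('AU', 'aus'); the original CSV layout isn't shown, so those values are hypothetical. It also collects every group into one list instead of overwriting group on each iteration.

```python
import re

searchList = ['AU.LS1_james.aus', 'AU.LS1_scott.aus',
              'AP.LS1_amanda.usa', 'AP.LS1_john.usa',
              'LA.LS1_harsha.ind', 'LA.LS1_vardhan.ind']

# Hypothetical (region, country) pairs; the real CSV contents aren't shown.
regionCountry = [('AU', 'aus'), ('AP', 'usa'), ('LA', 'ind')]

groups = []
for region, country in regionCountry:
    # \b is Python's word boundary (the role \< and \> play in some other
    # regex engines); re.escape guards against metacharacters in the values.
    query = rf"\b{re.escape(region)}\..*\.{re.escape(country)}\b"
    r = re.compile(query)
    groups.append(list(filter(r.match, searchList)))

print(groups)
# [['AU.LS1_james.aus', 'AU.LS1_scott.aus'],
#  ['AP.LS1_amanda.usa', 'AP.LS1_john.usa'],
#  ['LA.LS1_harsha.ind', 'LA.LS1_vardhan.ind']]
```

Because r.match already anchors at the start of the string, the leading \b is redundant here; it is kept only to mirror the original \< ... \> pattern.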