Create dictionary from values extracted through regex

Time:02-08

I have a SPARQL query (qres1) that fetches strings from concepts of an RDF file (example below), and I am applying regexes to each result to get two values. I would like to store these values as a key-value pair in a dictionary.

e.g. (rdflib.term.Literal('skin sarcoma', lang='en'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))

pattern_doid = '.*\/(DOID.*)'
pattern_label = '.*\(\'(.*)\',.*'
doid = []
label = []
dict = {}

for line in qres1:
    doid = re.findall(pattern_doid, str(line[0]), re.MULTILINE)
    label = re.findall(pattern_label, str(line[1]), re.MULTILINE)

    # create dictionary with doid as key and prefLabel as value
    dict[doid[0]] = label[0]

This gives me the following error: IndexError: list index out of range

How can I create such a dictionary? Any help is highly appreciated.

CodePudding user response:

I've tweaked the regex but generally it seems okay.

>>> import re
>>> re.findall(r'.*\/(DOID_\d+).*', "rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))", re.MULTILINE)
['DOID_2687']
>>> re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))", re.MULTILINE)
['http://purl.obolibrary.org/obo/DOID_2687']

You will get an IndexError if the string doesn't contain a 'doid' or 'label': re.findall then returns an empty list, and indexing it with [0] fails.

e.g.

>>> re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef'))", re.MULTILINE)[0]
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef'))", re.MULTILINE)[0]
IndexError: list index out of range
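
One way to avoid the IndexError is to check whether the pattern matched before reading the result. The following is only a sketch: first_match is a hypothetical helper (not part of re or rdflib), and the simplified pattern assumes the identifier always looks like DOID_ followed by digits.

```python
import re

def first_match(pattern, text):
    """Return the first captured group, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

# A URI that contains a DOID, and the broken string from the traceback above.
found = first_match(r"/(DOID_\d+)", "http://purl.obolibrary.org/obo/DOID_2687")
missing = first_match(r"/(DOID_\d+)", "rdflib.term.URIRef'))")

print(found)    # DOID_2687
print(missing)  # None
```

Returning None instead of indexing into an empty list lets the caller decide whether to skip the row or report it.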

CodePudding user response:

You could use zip to pair up the keys and values so that if any of them is missing, you won't get an error:

myDictionary.update(zip(doid[:1],label))
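
To illustrate why this does not raise, here is a small sketch with made-up lists standing in for the re.findall results (my_dictionary and the sample values are assumptions for illustration only). zip stops at the shorter input, so an empty list simply produces no pairs.

```python
my_dictionary = {}

# Both lists have a match: one key-value pair is added.
doid, label = ["DOID_2687"], ["skin sarcoma"]
my_dictionary.update(zip(doid[:1], label))

# No DOID was found: zip yields nothing, so nothing is added and no error is raised.
doid, label = [], ["skin sarcoma"]
my_dictionary.update(zip(doid[:1], label))

print(my_dictionary)  # {'DOID_2687': 'skin sarcoma'}
```

Note that this silently drops rows with a missing value, which may or may not be what you want.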

BTW, dict is a built-in type name in Python, so you should not use it as a variable name.

You might also want to check the order of the fields (line[0], line[1]) against the patterns (doid, label) you are searching for. Based on your example data, it seems to me that the 'DOID_' part is in line[1].
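
Putting these points together, the loop could look roughly like the sketch below. It is not tested against rdflib: rows is a hypothetical stand-in for qres1 holding plain (label, URI) strings, the second row and its example.org URI are invented to show the skip case, and it assumes str() on each term gives the plain text rather than the repr shown in the question.

```python
import re

# Hypothetical stand-in for qres1: (label, URI) pairs as plain strings.
rows = [
    ("skin sarcoma", "http://purl.obolibrary.org/obo/DOID_2687"),
    ("no identifier here", "http://example.org/other"),
]

doid_to_label = {}  # avoids shadowing the built-in name dict
for label, uri in rows:
    # The DOID is in the second field, at the end of the URI.
    match = re.search(r"/(DOID_\d+)$", str(uri))
    if match:  # skip rows whose URI has no DOID
        doid_to_label[match.group(1)] = str(label)

print(doid_to_label)  # {'DOID_2687': 'skin sarcoma'}
```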
