I have a SPARQL query (qres1) that fetches strings from the concepts in an RDF file (example row below). I apply regexes to each row to extract two values, which I would like to store as a key-value pair in a dictionary.
e.g. (rdflib.term.Literal('skin sarcoma', lang='en'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))
pattern_doid = '.*\/(DOID.*)'
pattern_label = '.*\(\'(.*)\',.*'
doid = []
label = []
dict = {}
for line in qres1:
    doid = re.findall(pattern_doid, str(line[0]), re.MULTILINE)
    label = re.findall(pattern_label, str(line[1]), re.MULTILINE)
    #create dictionary with doid as key and prefLabel as value
    dict[doid[0]] = label[0]
This gives me the following error: IndexError: list index out of range
How can I create such a dictionary? Any help is highly appreciated.
CodePudding user response:
I've tweaked the regexes, but in general your approach seems okay.
>>> import re
>>> re.findall(r'.*\/(DOID_\d+).*', "rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))", re.MULTILINE)
['DOID_2687']
>>> re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef('http://purl.obolibrary.org/obo/DOID_2687'))", re.MULTILINE)
['http://purl.obolibrary.org/obo/DOID_2687']
You will get an IndexError if the string doesn't contain a 'doid' or 'label': re.findall then returns an empty list, and indexing it with [0] fails.
e.g.
>>> re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef'))", re.MULTILINE)[0]
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    re.findall(r'.*\(\'(.*)\'\).*', "rdflib.term.URIRef'))", re.MULTILINE)[0]
IndexError: list index out of range
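One way to avoid the exception is to check the result before indexing it. The following is only a sketch: first_match is a hypothetical helper name, not something from the question or from the re module.

```python
import re

def first_match(pattern, text):
    """Return the first captured group found in text, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    matches = re.findall(pattern, text)
    # re.findall returns an empty list when there is no match,
    # so test the list before taking element 0.
    return matches[0] if matches else None

# A string that contains an identifier, and one that does not.
hit = first_match(r".*/(DOID_\d+)", "http://purl.obolibrary.org/obo/DOID_2687")
miss = first_match(r".*/(DOID_\d+)", "rdflib.term.URIRef'))")

print(hit)
print(miss)
```

The caller can then skip rows for which the helper returns None instead of crashing.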
CodePudding user response:
You could use zip to pair up the keys and values so that if any of them is missing, you won't get an error:
myDictionary.update(zip(doid[:1],label))
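To see why this does not raise, here is a small sketch with hand-written lists standing in for the re.findall results (the values, including DOID_0000, are made up for illustration). zip stops at the shorter input, so an empty list simply produces no pairs.

```python
my_dictionary = {}

# Both lists have a match: exactly one key-value pair is added.
doid = ["DOID_2687"]
label = ["skin sarcoma"]
my_dictionary.update(zip(doid[:1], label))

# No identifier found: zip yields nothing, so the dictionary is unchanged.
doid = []
label = ["orphan label"]
my_dictionary.update(zip(doid[:1], label))

# No label found: same outcome, and no IndexError.
doid = ["DOID_0000"]
label = []
my_dictionary.update(zip(doid[:1], label))

print(my_dictionary)
```

Note that rows with a missing value are dropped silently, which may or may not be what you want.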
BTW, dict is a built-in type name in Python; you should not use it as a variable name.
You might also want to check the order of the row items (line[0], line[1]) against the patterns (doid, label) you are searching for. It seems to me that the 'DOID_' part is in line[1], based on your example data.
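Putting these points together, a loop with the items in the order suggested by the example data might look like the sketch below. It uses plain (label, URI) string tuples as a stand-in for qres1 so that it runs without rdflib; the second row and the name labels_by_doid are made up for illustration. With real rdflib terms, str() on a Literal or URIRef should give the bare text, so the label would not need a regex at all.

```python
import re

# Stand-in for the query result: (label, URI) pairs, as in the example row.
rows = [
    ("skin sarcoma", "http://purl.obolibrary.org/obo/DOID_2687"),
    ("row without an identifier", "http://example.org/other"),
]

# Capture the identifier at the end of the URI.
pattern_doid = re.compile(r"/(DOID_\d+)$")

labels_by_doid = {}  # avoids shadowing the built-in name dict
for label, uri in rows:
    match = pattern_doid.search(str(uri))
    if match:
        # Key: the DOID identifier, value: the label text.
        labels_by_doid[match.group(1)] = str(label)

print(labels_by_doid)
```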
