I am trying to parse ICD-10 codes and, given a code, identify its description, parent, and children (if any). Here is a sample of the larger XML file I'm parsing:
data = ```<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
<name>A00.0</name>
<desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
<inclusionTerm>
<note>Classical cholera</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.1</name>
<desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
<inclusionTerm>
<note>Cholera eltor</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.9</name>
<desc>Cholera, unspecified</desc>
</diag>
</diag>```
How can I write Python code that finds a specific element by its name? For example, when I look up the code A00.0, I want the program to print the A00.0 code it found, plus its description and inclusionTerm.
CodePudding user response:
As @mzjn mentioned, you can use xml.etree, which is part of the Python Standard Library. In particular, you may want to check out its XPath support:
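from xml.etree import ElementTree as ET
# ET.parse() returns an ElementTree; getroot() gives its root element
# (alternatively, ET.fromstring() returns the root element directly)
root = ET.parse('myfile.xml').getroot()
# Find diagnosis by "name" (ICD-10 code); find() returns None if nothing matches
diag = root.find(".//*[name='A00.0']")
# Print out some information ("name" and "desc" tags)
if diag is not None:
    print(diag.find('name').text)
    print(diag.find('desc').text)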
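The question also asks for the inclusionTerm, which the snippet above does not print. Here is a minimal sketch of one way to do that, assuming the XML is held in a string (the name `xml_text` is made up for this example) and that every inclusion term keeps its text in `<note>` children, as in the sample:

```python
from xml.etree import ElementTree as ET

# Shortened copy of the sample from the question (hypothetical variable name).
xml_text = """<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
<name>A00.0</name>
<desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
<inclusionTerm>
<note>Classical cholera</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.9</name>
<desc>Cholera, unspecified</desc>
</diag>
</diag>"""

root = ET.fromstring(xml_text)
diag = root.find(".//*[name='A00.0']")

# findall() returns an empty list when a code has no inclusionTerm,
# so this also works for entries such as A00.9.
notes = []
if diag is not None:
    notes = [note.text for note in diag.findall("inclusionTerm/note")]
    print(diag.findtext("name"))
    print(diag.findtext("desc"))
    for note in notes:
        print(note)
else:
    print("code not found")
```

`findtext()` is used here instead of `find(...).text` only because it returns `None` rather than raising when a tag is missing.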
Explanation of XPath .//*[name='A00.0']:
. : select the current node (here, the XML root element)
// : select all subelements (search the tree recursively)
* : select all child elements
[name='A00.0'] : select elements that have a child named name whose text is A00.0
So calling root.find() with this XPath finds this node:
<diag>
<name>A00.0</name>
<desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
<inclusionTerm>
<note>Classical cholera</note>
</inclusionTerm>
</diag>
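The original question also asks for the parent and children of a code. ElementTree elements do not keep a reference to their parent, so one common workaround is to build a child-to-parent map first. The sketch below is only one possible approach: the function name `lookup` and the returned dictionary keys are made up for illustration, and it assumes that child codes are `<diag>` elements nested directly inside their parent `<diag>`, as in the sample.

```python
from xml.etree import ElementTree as ET

SAMPLE = """<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
<name>A00.0</name>
<desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
<inclusionTerm>
<note>Classical cholera</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.1</name>
<desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
<inclusionTerm>
<note>Cholera eltor</note>
</inclusionTerm>
</diag>
<diag>
<name>A00.9</name>
<desc>Cholera, unspecified</desc>
</diag>
</diag>"""


def lookup(root, code):
    """Return a dict describing the <diag> named `code`, or None if absent."""
    # Elements have no parent pointer, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # root.iter() also visits the root itself, which ".//*" does not.
    for diag in root.iter("diag"):
        if diag.findtext("name") != code:
            continue
        parent = parent_of.get(diag)
        # Only report a parent code when the enclosing element is a <diag>.
        has_parent = parent is not None and parent.tag == "diag"
        return {
            "name": code,
            "desc": diag.findtext("desc"),
            "inclusion_terms": [n.text for n in diag.findall("inclusionTerm/note")],
            "parent": parent.findtext("name") if has_parent else None,
            "children": [c.findtext("name") for c in diag.findall("diag")],
        }
    return None


root = ET.fromstring(SAMPLE)
info = lookup(root, "A00.0")
print(info["name"], info["desc"], info["inclusion_terms"], info["parent"])
print(lookup(root, "A00")["children"])
```

The map is rebuilt on every call to keep the example short; for a large file with many lookups it would probably be better to build it once and reuse it.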
