I am trying to parse ICD-10 codes and given a code, identify: description, parent, children (if any). Given this sample of the larger file I'm parsing:

A sample of the larger XML file I'm parsing

data = ```<diag>
<name>A00</name>
<desc>Cholera</desc>
<diag>
  <name>A00.0</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  <inclusionTerm>
    <note>Classical cholera</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.1</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar eltor</desc>
  <inclusionTerm>
    <note>Cholera eltor</note>
  </inclusionTerm>
</diag>
<diag>
  <name>A00.9</name>
  <desc>Cholera, unspecified</desc>
</diag>
</diag>```

How can I write a code in python that gives me the specific element by searching its name? (example: I'm looking for the code A00.0 and I want the program to print the found A00.0 code, plus the description and the inclusionTerm.

CodePudding user response：

As @mzjn mentioned, you can use xml.etree, which is part of the Python Standard Library. In particular, you may want to checkout XPath Support:

from xml.etree import ElementTree as ET

root = ET.parse('myfile.xml')  # alternatively use ET.fromstring()

# Find diagnosis by "name" (ICD-10 code)
diag = root.find(".//*[name='A00.0']")

# Print out some information ("name" and "desc" tags)
print(diag.find('name').text)
print(diag.find('desc').text)

Explanation of XPath `.//*[name='A00.0']`

. select current node (XML root element)
// select all subelements (search tree recursively)
* select all child elements
[name='A00.0'] select element that has a child named name whose text is A00.0

So calling root.find() with this XPath finds this node:

<diag>
  <name>A00.0</name>
  <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  <inclusionTerm>
    <note>Classical cholera</note>
  </inclusionTerm>
</diag>

A sample of the larger XML file I'm parsing

Explanation of XPath .//*[name='A00.0']

Explanation of XPath `.//*[name='A00.0']`