Export information from child nodes in xml using Python

Time:01-26

I have an xml file called persons.xml in the following format:

<?xml version="1.0" encoding="UTF-8"?>
<persons>
  <person id="1" name="John">
    <city id="21" name="New York"/>
  </person>
  <person id="2" name="Mary">
    <city id="22" name="Los Angeles"/>
  </person>
</persons>

I want to export the list of person names, along with their city names, to a file. This is what I have so far:

import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('./persons.xml')
root = tree.getroot()

df_cols = ["person_name", "city_name"]
rows = []

for node in root: 
    person_name = node.attrib.get("name")

    rows.append({"person_name": person_name})

out_df = pd.DataFrame(rows, columns = df_cols)    
out_df

This code only obtains the person name, because it reads just the attributes of the root’s direct children (the person elements). I can’t figure out how to also loop through their child nodes to get the city name. Do I need to append something to root to iterate over the child nodes?

I can obtain everything using root.getchildren but it doesn’t allow me to return only the child nodes:

children = root.getchildren()  # removed in Python 3.9; list(root) is the equivalent
for child in children:
    ET.dump(child)  # the module is imported as ET above

Is there a good way to get this information?

CodePudding user response:

Iterate over the person elements with findall, and read each nested city element with find:

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<persons>
   <person id="1" name="John">
      <city id="21" name="New York" />
   </person>
   <person id="2" name="Mary">
      <city id="22" name="Los Angeles" />
   </person>
</persons>'''

root = ET.fromstring(xml)
data = []
for p in root.findall('.//person'):
    # find('city') returns the first <city> child of this <person>
    data.append({'person': p.attrib['name'], 'city': p.find('city').attrib['name']})
df = pd.DataFrame(data)
print(df)

output

  person         city
0   John     New York
1   Mary  Los Angeles
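
The question also asks for the result to be written to a file, which the snippet above stops short of. The following is a minimal sketch of one way to do that, using only the standard library and assuming CSV is an acceptable output format. The helper names extract_rows and write_csv and the output file name persons.csv are made up for illustration. It also assumes a person may lack a city element, in which case the city is left empty rather than raising an error.

```python
import csv
import xml.etree.ElementTree as ET

# Same sample data as in the question (XML declaration omitted for brevity).
XML = '''<persons>
  <person id="1" name="John">
    <city id="21" name="New York"/>
  </person>
  <person id="2" name="Mary">
    <city id="22" name="Los Angeles"/>
  </person>
</persons>'''


def extract_rows(root):
    """Return one dict per <person>; city_name is None if <city> is missing."""
    rows = []
    for person in root.findall("person"):
        city = person.find("city")  # None when there is no <city> child
        rows.append({
            "person_name": person.get("name"),
            "city_name": city.get("name") if city is not None else None,
        })
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["person_name", "city_name"])
        writer.writeheader()
        writer.writerows(rows)


rows = extract_rows(ET.fromstring(XML))
write_csv(rows, "persons.csv")  # hypothetical output file name
```

If pandas is already in use, calling df.to_csv("persons.csv", index=False) on the DataFrame built above should produce an equivalent file.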