Remove specific attribute from entire xml in Python

Time:01-10

I need to remove all id attributes from an XML document using Python. This will be part of a larger app, and the result will be the input for later transformations.

Example input:

<body>
    <r1 format="bold" id="NODE1">
        <r2 title="Test" id="NODE2">
            <r3 group="123" type="Operation" id="NODE3">
                <rtit id="NODE4">Evaluate the temperature</rtit>
                <procedure id="NODE5">
                    <procstep id="NODE6">
                        <graphelem id="NODE7">
                            <graphic graphicname="T123456" res_width="3.58in" scale="70" id="NODE8"/>
                        </graphelem>
                        <proct>Remove the screws. Remove the plates.</proct>
                    </procstep>
                    <procstep id="NODE9">
                        <graphelem id="NODE10">
                            <graphic graphicname="T654321" res_width="3.58in" scale="70" id="NODE11"/>
                        </graphelem>
                        <proct>Fix the thermocouple in the cover.</proct>
                    </procstep>
                </procedure>
            </r3>
        </r2>
    </r1>
</body>

The source files have more than 1000 lines, and more than 30 different XML tags that contain the id attribute.

The expected result is:

<body>
    <r1 format="bold">
        <r2 title="Test">
            <r3 group="123" type="Operation">
                <rtit>Evaluate the temperature</rtit>
                <procedure>
                    <procstep>
                        <graphelem>
                            <graphic graphicname="T123456" res_width="3.58in" scale="70"/>
                        </graphelem>
                        <proct>Remove the screws. Remove the plates.</proct>
                    </procstep>
                    <procstep>
                        <graphelem>
                            <graphic graphicname="T654321" res_width="3.58in" scale="70"/>
                        </graphelem>
                        <proct>Fix the thermocouple in the cover.</proct>
                    </procstep>
                </procedure>
            </r3>
        </r2>
    </r1>
</body>

I've tried to use XSLT to write a transformation that copies everything except the id attribute, but without any success.

Can anyone help me with this issue, please?

CodePudding user response:

I need to remove all id attributes from XML using Python.

Something like the code below: loop over all elements and drop the 'id' attribute from each one.

import xml.etree.ElementTree as ET


xml = '''<body><r1 format="bold" id="NODE1">
        <r2 title="Test" id="NODE2">
            <r3 group="123" type="Operation" id="NODE3">
                <rtit id="NODE4">Evaluate the temperature</rtit>
                <procedure id="NODE5">
                    <procstep id="NODE6">
                        <graphelem id="NODE7">
                            <graphic graphicname="T123456" res_width="3.58in" scale="70" id="NODE8"/>
                        </graphelem>
                        <proct>Remove the screws. Remove the plates.</proct>
                    </procstep>
                    <procstep id="NODE9">
                        <graphelem id="NODE10">
                            <graphic graphicname="T654321" res_width="3.58in" scale="70" id="NODE11"/>
                        </graphelem>
                        <proct>Fix the thermocouple in the cover.</proct>
                    </procstep>
                </procedure>
            </r3>
        </r2>
    </r1>
</body>'''

root = ET.fromstring(xml)
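# iter() walks the root and every descendant element
for elem in root.iter():
    # pop() with a default does nothing when the element has no 'id'
    elem.attrib.pop('id', None)
# print the cleaned tree to stdout
ET.dump(root)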
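
Since the real source is a file rather than a string, the same loop can be wrapped in a small helper. This is only a minimal sketch: the helper names strip_attribute and strip_attribute_in_file and the file names in the usage note are hypothetical, not something from the question.

```python
import xml.etree.ElementTree as ET


def strip_attribute(root, name="id"):
    """Remove attribute `name` from root and all its descendants, in place.

    Returns the number of attributes removed.
    """
    removed = 0
    for elem in root.iter():
        # pop() with a default returns None when the attribute is absent
        if elem.attrib.pop(name, None) is not None:
            removed += 1
    return removed


def strip_attribute_in_file(src_path, dst_path, name="id"):
    """Parse src_path, drop attribute `name` everywhere, write dst_path.

    Both paths are supplied by the caller (hypothetical helper).
    """
    tree = ET.parse(src_path)
    removed = strip_attribute(tree.getroot(), name)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
    return removed
```

Usage would look like strip_attribute_in_file("input.xml", "output.xml"). Two caveats: the default ElementTree parser discards XML comments and processing instructions, so they will not appear in the output, and a namespaced attribute such as xml:id is stored under a different key and is not matched by the plain name 'id'.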