How to replace the contents of an XML child element with a complete DOM document object from anothe

Time:01-07

I have parsed an XML file and stored it as a Document object using the code below.

import xml.dom.minidom as DOM
import shutil
import xml.etree.ElementTree as ET
metadata_path=r"C:\Users\ar\DD2MI_result.xml"
new_metadata=DOM.parse(metadata_path)

Now I want to use this complete Document object to replace the data of a child node in another XML file. I am able to get the child node like this:

output_draft = r"C:\Users\ar\airquality.xml"
doc = DOM.parse(output_draft)
meta=doc.getElementsByTagName('XmlDoc')
for metadata in meta:
    if metadata.firstChild.data:
        metadata.firstChild.replaceData(0,len(new_metadata),new_metadata)
        print (metadata.firstChild.data)

When I run the above code I get the error "TypeError: object of type 'Document' has no len()", which I understand, since new_metadata is a Document object and not a string. How can I use the complete object or file to replace the current contents?

airquality.xml

<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:gco="http://www.isotc211.org/2005/gco"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv"
                 xmlns:gmx="http://www.isotc211.org/2005/gmx"
                 xmlns:gsr="http://www.isotc211.org/2005/gsr"
                 xmlns:gss="http://www.isotc211.org/2005/gss"
                 xmlns:gts="http://www.isotc211.org/2005/gts"
                 xmlns:gml="http://www.opengis.net/gml/3.2"
                 xmlns:xlink="http://www.w3.org/1999/xlink"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd">
   <gmd:fileIdentifier>
      <gco:CharacterString>https://hdl.handle.net/20.500.12085/1f97f2a1-75fc-4110-ae22-f873d7d86565@metadata</gco:CharacterString>
   </gmd:fileIdentifier>
   <gmd:language>
      <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
   </gmd:language>
 </gmd:MD_Metadata>

DD2MI_result.xml before replacement

<SVCManifest xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="typens:SVCManifest">
<Databases xsi:type="typens:ArrayOfSVCDatabase" />
<Resources xsi:type="typens:ArrayOfSVCResource">
<SVCResource xsi:type="typens:SVCResource">
<ID>{429221BF-D0A1-40D8-9DC1-B41D269E95C7}</ID>
<Name>test.crf</Name>
<Metadata xsi:type="typens:XmlPropertySet">
<XmlDoc>&lt;?xml version="1.0"?&gt;
&lt;metadata xml:lang="en"&gt;&lt;Esri&gt;&lt;CreaDate&gt;20211219&lt;/metadata&gt;
</XmlDoc>
</Metadata>
</SVCManifest>

DD2MI_result.xml after replacement

<SVCManifest xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="typens:SVCManifest">
<Databases xsi:type="typens:ArrayOfSVCDatabase" />
<Resources xsi:type="typens:ArrayOfSVCResource">
<SVCResource xsi:type="typens:SVCResource">
<ID>{429221BF-D0A1-40D8-9DC1-B41D269E95C7}</ID>
<Name>test.crf</Name>
<Metadata xsi:type="typens:XmlPropertySet">
<XmlDoc><?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:gco="http://www.isotc211.org/2005/gco"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv"
                 xmlns:gmx="http://www.isotc211.org/2005/gmx"
                 xmlns:gsr="http://www.isotc211.org/2005/gsr"
                 xmlns:gss="http://www.isotc211.org/2005/gss"
                 xmlns:gts="http://www.isotc211.org/2005/gts"
                 xmlns:gml="http://www.opengis.net/gml/3.2"
                 xmlns:xlink="http://www.w3.org/1999/xlink"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd">
   <gmd:fileIdentifier>
      <gco:CharacterString>https://hdl.handle.net/20.500.12085/1f97f2a1-75fc-4110-ae22-f873d7d86565@metadata</gco:CharacterString>
   </gmd:fileIdentifier>
   <gmd:language>
      <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
   </gmd:language>
 </gmd:MD_Metadata>
</XmlDoc>
</Metadata>
</SVCManifest>

CodePudding user response:

Your DD2MI_result.xml is still not well formed: for example, the Resources and SVCResource tags are never closed. So as a first step, this answer assumes that the document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<SVCManifest xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="typens:SVCManifest">
   <Databases xsi:type="typens:ArrayOfSVCDatabase" />
   <Resources xsi:type="typens:ArrayOfSVCResource">
      <SVCResource xsi:type="typens:SVCResource">
         <ID>{429221BF-D0A1-40D8-9DC1-B41D269E95C7}</ID>
         <Name>test.crf</Name>
         <Metadata xsi:type="typens:XmlPropertySet">
            <XmlDoc>&lt;?xml version="1.0"?&gt;
&lt;metadata xml:lang="en"&gt;&lt;Esri&gt;&lt;CreaDate&gt;20211219&lt;/metadata&gt;</XmlDoc>
         </Metadata>
      </SVCResource>
   </Resources>
</SVCManifest>

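Also, as I mentioned, you can't have two XML declarations within one well-formed XML document, and if you have a declaration, it must be at the very beginning of the document, not somewhere in the middle. That is why the original XmlDoc content is stored as escaped text (&lt;?xml ...?&gt;) rather than as markup.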
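
To see the declaration rule in action, here is a small sketch using only the standard library. The helper name is_well_formed and the two short sample strings are made up for illustration; they are not taken from your files.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if minidom (which uses expat) can parse xml_text."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# Hypothetical samples: a raw declaration nested inside an element,
# and the same declaration stored as escaped character data.
nested_declaration = '<XmlDoc><?xml version="1.0"?><metadata/></XmlDoc>'
escaped_declaration = '<XmlDoc>&lt;?xml version="1.0"?&gt;&lt;metadata/&gt;</XmlDoc>'

print(is_well_formed(nested_declaration))
print(is_well_formed(escaped_declaration))
```

The first string is rejected by the parser because the declaration is not at the start of the document, while the escaped version parses as ordinary text.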

Next, we start the actual process, but using the lxml library. This should get you close enough to what I believe is your expected output. If it's not 100% there, you'll have to tinker with it a bit:

from lxml import etree
source = """[the content of airquality.xml, from the question]"""
target = """[the content of DD2MI_result.xml, as corrected above]"""

s_doc = etree.XML(source.encode())
t_doc = etree.XML(target.encode())

#first, we get rid of the ugly XmlDoc element in target doc (DD2MI_result.xml)
for t in t_doc.xpath('//XmlDoc'):
    t.getparent().remove(t)

#we then create an empty replacement for it
new_xd = etree.Element("XmlDoc")

#next, the replacement element is appended as a child of Metadata, where the old XmlDoc used to be
#(addnext() would make it a sibling placed after Metadata instead)
for m in t_doc.xpath('//Metadata'):
    m.append(new_xd)

#finally, we insert in the new XmlDoc element the contents of the source document (airquality.xml)
for t in t_doc.xpath('//XmlDoc'):
    t.insert(0,s_doc.xpath('//*')[0])

#confirm that the output is what you are looking for:
print(etree.tostring(t_doc, xml_declaration=True, pretty_print=True).decode())

As I said, this may not be 100% of what you're trying to do, but should get you closer.
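
If you would rather stay with minidom, as in the question, something along these lines might work. It is only a sketch under a few assumptions: the function name replace_xmldoc and its as_text flag are invented for this example, and the two XML strings are shortened stand-ins for your files. The point is that replaceData() needs a string, so a Document either has to be serialized to text first or its root element has to be imported as a node.

```python
import xml.dom.minidom as DOM

# Shortened, hypothetical stand-ins for airquality.xml and DD2MI_result.xml.
SOURCE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
   <gmd:language>eng</gmd:language>
</gmd:MD_Metadata>"""

TARGET_XML = """<SVCManifest>
   <Metadata>
      <XmlDoc>&lt;metadata xml:lang="en"&gt;old&lt;/metadata&gt;</XmlDoc>
   </Metadata>
</SVCManifest>"""


def replace_xmldoc(target_xml, source_xml, tag="XmlDoc", as_text=False):
    """Replace the contents of every <tag> element in target with the source root."""
    target = DOM.parseString(target_xml)
    source_root = DOM.parseString(source_xml).documentElement
    for node in target.getElementsByTagName(tag):
        # Drop whatever the element currently holds (here: escaped text).
        while node.firstChild is not None:
            node.removeChild(node.firstChild)
        if as_text:
            # Store the serialized source as character data; minidom
            # escapes the angle brackets again when writing the document.
            node.appendChild(target.createTextNode(source_root.toxml()))
        else:
            # importNode copies the source root into the target document.
            node.appendChild(target.importNode(source_root, True))
    return target


as_nodes = replace_xmldoc(TARGET_XML, SOURCE_XML)
as_escaped_text = replace_xmldoc(TARGET_XML, SOURCE_XML, as_text=True)
print(as_nodes.documentElement.toxml())
print(as_escaped_text.documentElement.toxml())
```

Which variant you want depends on what reads the manifest afterwards: as_text=True keeps the escaped-text form that XmlDoc had before, while the default embeds real child elements. In both cases the XML declaration of the source file is left out, since only the root element is used.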

CodePudding user response:

Consider XSLT, the special-purpose language designed to transform XML files, which provides the document() function for reading in other XML documents. It also lets you avoid for-loops and if-logic entirely.

Python's third-party lxml library can run XSLT 1.0 scripts (the built-in etree and minidom modules cannot). Alternatively, Python can call external XSLT 1.0, 2.0, or even 3.0 processors.

XSLT (save as .xsl, a special .xml file)

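The stylesheet below assumes airquality.xml is in the same folder as the .xsl file (a relative path passed to document() is resolved against the stylesheet's location), which here is also the folder containing DD2MI_result.xml.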

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" encoding="utf-8" indent="yes" />
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REPLACE XMLDoc -->
    <xsl:template match="XmlDoc">
     <xsl:copy>
       <xsl:apply-templates select="document('airquality.xml')"/>
     </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et 

doc = et.parse('DD2MI_result.xml') 
xsl = et.parse('MyScript.xsl') 

# CONFIGURE TRANSFORMER 
transform = et.XSLT(xsl) 

# TRANSFORM SOURCE DOC 
result = transform(doc) 

# OUTPUT TO CONSOLE 
print(result) 

# SAVE TO FILE 
result.write_output('Output.xml')

Output

<SVCManifest xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="typens:SVCManifest">
  <Databases xsi:type="typens:ArrayOfSVCDatabase"/>
  <Resources xsi:type="typens:ArrayOfSVCResource"/>
  <SVCResource xsi:type="typens:SVCResource"/>
  <ID>{429221BF-D0A1-40D8-9DC1-B41D269E95C7}</ID>
  <Name>test.crf</Name>
  <Metadata xsi:type="typens:XmlPropertySet">
    <XmlDoc>
      <gmd:MD_Metadata xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gss="http://www.isotc211.org/2005/gss" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd">
        <gmd:fileIdentifier>
          <gco:CharacterString>https://hdl.handle.net/20.500.12085/1f97f2a1-75fc-4110-ae22-f873d7d86565@metadata</gco:CharacterString>
        </gmd:fileIdentifier>
        <gmd:language>
          <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
        </gmd:language>
      </gmd:MD_Metadata>
    </XmlDoc>
  </Metadata>
</SVCManifest>