I have a huge XML file from which I need to remove duplicate elements that appear at the same level, but only if those elements have no child elements. On top of this, each element is prefixed with a namespace. I am using XSLT 1.0 (version="1.0").
Here is what my file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <nms1:parent xmlns:nms1="urn:rdns:com:nms1">
    <nms1:qq>
      <nms1:aa>a0</nms1:aa>
      <nms1:bb>
        <nms1:cc>
          <nms1:dd>
            <nms1:ddd1>1</nms1:ddd1>
            <nms1:ddd2>2</nms1:ddd2>
          </nms1:dd>
        </nms1:cc>
        <nms1:ee>
          <nms1:ff>0</nms1:ff>
          <nms1:gg>
            <nms1:cc>
              <nms1:cc/>
              <nms1:hh>h</nms1:hh>
              <nms1:cc/>
            </nms1:cc>
          </nms1:gg>
        </nms1:ee>
      </nms1:bb>
    </nms1:qq>
  </nms1:parent>
</data>
I need to remove one of the two <nms1:cc/> elements that sit under the parent <nms1:cc> element.
In this case <nms1:cc/> is empty, but it could also have a value. The only condition is that it must not be a parent element itself.
So, in the end, my file must look like this:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <nms1:parent xmlns:nms1="urn:rdns:com:nms1">
    <nms1:qq>
      <nms1:aa>a0</nms1:aa>
      <nms1:bb>
        <nms1:cc>
          <nms1:dd>
            <nms1:ddd1>1</nms1:ddd1>
            <nms1:ddd2>2</nms1:ddd2>
          </nms1:dd>
        </nms1:cc>
        <nms1:ee>
          <nms1:ff>0</nms1:ff>
          <nms1:gg>
            <nms1:cc>
              <nms1:cc/>
              <nms1:hh>h</nms1:hh>
            </nms1:cc>
          </nms1:gg>
        </nms1:ee>
      </nms1:bb>
    </nms1:qq>
  </nms1:parent>
</data>
CodePudding user response:
This stylesheet is a modified identity transform with a specialized template that matches empty elements. Inside that template, it captures the local-name() and namespace-uri() of the matched element and uses them to test whether any preceding-sibling:: element is also empty and has the same local-name() and namespace-uri(). If there is none, it copies the element. If there is one, nothing is produced and the element is dropped. Note that *[not(node())] matches only elements with no content at all, so a childless element that holds a text value is always copied and never treated as a duplicate.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>
  <!-- Identity transform: copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty elements: keep only the first among same-named empty siblings -->
  <xsl:template match="*[not(node())]">
    <xsl:variable name="local-name" select="local-name()"/>
    <xsl:variable name="namespace" select="namespace-uri()"/>
    <xsl:if test="not(preceding-sibling::*[not(node()) and local-name() = $local-name and namespace-uri() = $namespace])">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
CodePudding user response:
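I would use a key (Muenchian grouping): index every childless element by its parent, namespace URI, and local name, then drop every such element that is not the first one in its group: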
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Group childless elements by parent identity, namespace URI, and local name -->
  <xsl:key name="dups" match="*[not(*)]" use="concat(generate-id(..), '|', namespace-uri(), '|', local-name())"/>
  <!-- Identity transform: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: suppress a childless element that is not the first in its group -->
  <xsl:template match="*[not(*)][not(generate-id() = generate-id(key('dups', concat(generate-id(..), '|', namespace-uri(), '|', local-name()))[1]))]"/>
</xsl:stylesheet>
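The key above groups childless siblings by name only, so two same-named siblings with different text values would also collapse into one. The question does not say whether that is wanted. If it is not, a possible variant is to append the element's string value to the key. This is only a sketch under that assumption: the key name dups-by-value is made up for illustration, and attributes are still ignored when comparing.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Assumption: duplicates must agree on parent, namespace URI, local name AND text value -->
  <xsl:key name="dups-by-value" match="*[not(*)]"
           use="concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', .)"/>

  <!-- Identity transform: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress a childless element unless it is the first one in its (name, value) group -->
  <xsl:template match="*[not(*)][not(generate-id() = generate-id(key('dups-by-value', concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', .))[1]))]"/>
</xsl:stylesheet>
```

With the sample input, both <nms1:cc/> siblings have an empty string value, so the second one is still dropped. If values that differ only in surrounding whitespace should count as equal, replacing . with normalize-space(.) in both places would be one way to handle it.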
