XSLT to remove only duplicates on the same level that have no child element

Time:01-13

I have a huge XML file in which I need to eliminate duplicate elements that appear on the same level, but only if those elements have no child elements. On top of this, every element is prefixed by a namespace. I am using XSLT 1.0 (xsl version="1.0").

Here is what my file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <nms1:parent xmlns:nms1="urn:rdns:com:nms1">
        <nms1:qq>
            <nms1:aa>a0</nms1:aa>
            <nms1:bb>
                <nms1:cc>
                    <nms1:dd>
                        <nms1:ddd1>1</nms1:ddd1>
                        <nms1:ddd2>2</nms1:ddd2>
                    </nms1:dd>
                </nms1:cc>
                <nms1:ee>
                    <nms1:ff>0</nms1:ff>
                    <nms1:gg>
                        <nms1:cc>
                            <nms1:cc/>
                            <nms1:hh>h</nms1:hh>
                            <nms1:cc/>
                        </nms1:cc>
                    </nms1:gg>
                </nms1:ee>
            </nms1:bb>
        </nms1:qq>
    </nms1:parent>
</data>

I need to eliminate one of the two <nms1:cc/> elements that sit under the parent <nms1:cc>. In this case nms1:cc is empty, but it can also have a value. The only condition is that it is not a parent element itself.

In the end, my file must look like this:

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <nms1:parent xmlns:nms1="urn:rdns:com:nms1">
        <nms1:qq>
            <nms1:aa>a0</nms1:aa>
            <nms1:bb>
                <nms1:cc>
                    <nms1:dd>
                        <nms1:ddd1>1</nms1:ddd1>
                        <nms1:ddd2>2</nms1:ddd2>
                    </nms1:dd>
                </nms1:cc>
                <nms1:ee>
                    <nms1:ff>0</nms1:ff>
                    <nms1:gg>
                        <nms1:cc>
                            <nms1:cc/>
                            <nms1:hh>h</nms1:hh>
                        </nms1:cc>
                    </nms1:gg>
                </nms1:ee>
            </nms1:bb>
        </nms1:qq>
    </nms1:parent>
</data>

CodePudding user response:

This stylesheet is a modified identity transform with a specialized template that matches empty elements. Inside that template, it captures the local-name() and namespace-uri() of the matched element and uses them to test whether any preceding-sibling:: element is also empty and has the same local-name() and namespace-uri(). If there is none, it copies the node. If there is one, nothing is produced and the element is dropped. Note that not(node()) only matches elements with no content at all, so a leaf element that carries a text value is copied unchanged.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes"/>

    <!-- Identity template: copy every attribute and node by default. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty elements (no child nodes of any kind). -->
    <xsl:template match="*[not(node())]">
        <xsl:variable name="local-name" select="local-name()"/>
        <xsl:variable name="namespace" select="namespace-uri()"/>
        <!-- Copy only if no earlier empty sibling has the same name and namespace.
             "=" is used instead of "eq" so the test also works in XPath 1.0. -->
        <xsl:if test="not(preceding-sibling::*[not(node()) and local-name() = $local-name and namespace-uri() = $namespace])">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
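
The question says the duplicated leaf may also carry a value, which not(node()) would not match. Below is a minimal sketch of the same preceding-sibling approach for that case. It assumes XSLT 1.0, and it assumes that two leaf siblings count as duplicates whenever their name and namespace match, regardless of their text:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity template: copy everything by default. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Leaf elements: no child elements, but text content is allowed. -->
    <xsl:template match="*[not(*)]">
        <xsl:variable name="local-name" select="local-name()"/>
        <xsl:variable name="namespace" select="namespace-uri()"/>
        <!-- Assumption: a duplicate is any earlier leaf sibling with the
             same local name and namespace, whatever its value. -->
        <xsl:if test="not(preceding-sibling::*[not(*) and local-name() = $local-name and namespace-uri() = $namespace])">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

Because an empty <nms1:cc/> is also a leaf, this variant still drops the second <nms1:cc/> in the sample input, while elements that have child elements are never touched.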

CodePudding user response:

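I would use a key (Muenchian grouping). The key indexes every element without child elements by its parent, namespace and local name, and an empty template then drops any such element that is not the first one in its group: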

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:key name="dups" match="*[not(*)]" use="concat(generate-id(..), '|', namespace-uri(), '|', local-name())"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[not(*)][not(generate-id() = generate-id(key('dups', concat(generate-id(..), '|', namespace-uri(), '|', local-name()))[1]))]"/>

</xsl:stylesheet>
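
The key above treats two leaf siblings as duplicates as soon as their names match. If a duplicate should instead require the same text value as well (an assumption, since the question does not say so), the string value can be appended to the key. A sketch under that assumption, with the hypothetical key name dups-by-value:

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- Assumption: leaves are duplicates only if parent, namespace,
       local name AND string value are all equal. -->
  <xsl:key name="dups-by-value" match="*[not(*)]"
           use="concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', .)"/>

  <!-- Identity template: copy everything by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every leaf that is not the first member of its key group. -->
  <xsl:template match="*[not(*)][not(generate-id() = generate-id(key('dups-by-value', concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', .))[1]))]"/>

</xsl:stylesheet>
```

With this key, <nms1:cc>x</nms1:cc> and <nms1:cc>y</nms1:cc> under the same parent would both be kept, while a second <nms1:cc>x</nms1:cc> would be dropped.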