XSLT1.0 copy node content from specific tags when transforming XML data

Time:01-12

I have an XSLT transformation file that converts an XML file to another format. The source XML has many formatting tags that are incompatible with the destination stylesheet, so I need to read the content of some tags and pass through only the element content.

Here is the XSLT1.0 transformation code:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:template match="/">
        <xsl:variable name="var1_initial" select="."/>
        <xsl:for-each select="procstep">
            <procl>
                <xsl:variable name="var11_cur" select="."/>
                <procstep>
                    <xsl:attribute name="time">1</xsl:attribute>
                    <title>
                        <xsl:for-each select="(./proct/node())[./self::text()]">
                            <xsl:variable name="var12_filter" select="."/>
                            <xsl:value-of select="normalize-space(string(.))"/>
                            <xsl:text> </xsl:text>
                        </xsl:for-each>
                    </title>
                </procstep>
            </procl>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

And here is some example source data:

<procl>
    <procstep>
        <proct>Connect the lifts together.</br>Lift the vehicle.</proct>
    </procstep>
    <procstep>
        <proct>Remove the screws.</br>Remove the plates.</proct>
    </procstep>
    <procstep>
        <proct>Remove the nuts and washers.</br>Remove the shield.</proct>
    </procstep>
    <procstep>
        <proct>Secure the exhaust pipe.</br>Install a strap.</br>Apply torque of <hp1>25 Nm</hp1>.</proct>
    </procstep>
    <procstep>
        <proct>Install the screws and nuts.</br>Use tool <hp2>256256</hp2> to fix the clamp.</proct>
    </procstep>
    <procstep>
        <proct>Install the nuts and screws.</br>Assemble the member in the following order:
            <table>
                <tgroup cols="2" colsep="1" rowsep="1">
                    <colspec colwidth="132.38*"/>
                    <colspec colwidth="132.10*"/>
                    <thead>
                        <row>
                            <entry align="left" valign="top">Value</entry>
                            <entry align="left" valign="top">Position</entry>
                        </row>
                    </thead>
                    <tbody>
                        <row>
                            <entry align="left" valign="top">25</entry>
                            <entry align="left" valign="top">Superior</entry>
                        </row>
                        <row>
                            <entry align="left" valign="top">12</entry>
                            <entry align="left" valign="top">Inferior</entry>
                        </row>
                    </tbody>
                </tgroup>
            </table>
        </proct>
    </procstep>
    <procstep>
        <proct>Lower the vehicle.</proct>
    </procstep>
    <procstep>
        <proct>Mark the <hp1>torque value</hp1> in the data sheet.</proct>
    </procstep>
</procl>

Here <hp0>, <hp1>, <hp2> and <hp3> are formatting tags (bold, italic, underline and emphasized). The tag </br> is a line break, and will be removed by the normalize-space option.

The destination must keep the text between <hp*> and </hp*>, but must drop the tags themselves. I tried copying all the content, but the destination stylesheet doesn't allow <hp*> tags. It doesn't allow other tags either, such as table content. Using a single XPath over everything would also include that content, which I need to ignore.

The result must be:

<procl>
    <procstep>
        <title>Connect the lifts together. Lift the vehicle. </title>
    </procstep>
    <procstep>
        <title>Remove the screws. Remove the plates. </title>
    </procstep>
    <procstep>
        <title>Remove the nuts and washers. Remove the shield. </title>
    </procstep>
    <procstep>
        <title>Secure the exhaust pipe. Install a strap. Apply torque of 25 Nm. </title>
    </procstep>
    <procstep>
        <title>Install the screws and nuts. Use tool 256256 to fix the clamp. </title>
    </procstep>
    <procstep>
        <title>Install the nuts and screws. Assemble the member in the following order: </title>
    </procstep>
    <procstep>
        <title>Lower the vehicle. </title>
    </procstep>
    <procstep>
        <title>Mark the torque value in the data sheet. </title>
    </procstep>
</procl>

I'm using the XSLT inside a Python app with the LET.XSLT library, so I am limited to XSLT 1.0. My real XSLT code and source data are more complex; I have simplified them here to focus only on the transformation of the <proct> data.

So, the question is: how do I pass through the content inside <hp*></hp*>, but not the content of the other child nodes under <proct></proct>? Maybe the question is very simple to you, but I am a newbie in XSLT transformations.

Thanks in advance for your time.

CodePudding user response:

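The input XML is not well-formed: </br> is a closing tag with no matching opening tag. I had to fix it by replacing each </br> with the empty element <br/>.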

Input XML

<?xml version="1.0"?>
<procl>
    <procstep>
        <proct>Connect the lifts together.<br/>Lift the vehicle.</proct>
    </procstep>
    <procstep>
        <proct>Remove the screws.<br/>Remove the plates.</proct>
    </procstep>
    <procstep>
        <proct>Remove the nuts and washers.<br/>Remove the shield.</proct>
    </procstep>
    <procstep>
        <proct>Secure the exhaust pipe.<br/>Install a strap.<br/>Apply torque of <hp1>25 Nm</hp1>.</proct>
    </procstep>
    <procstep>
        <proct>Install the screws and nuts.<br/>Use tool <hp2>256256</hp2> to fix the clamp.</proct>
    </procstep>
    <procstep>
        <proct>Install the nuts and screws.<br/>Assemble the member in the following order:
            <table>
                <tgroup cols="2" colsep="1" rowsep="1">
                    <colspec colwidth="132.38*"/>
                    <colspec colwidth="132.10*"/>
                    <thead>
                        <row>
                            <entry align="left" valign="top">Value</entry>
                            <entry align="left" valign="top">Position</entry>
                        </row>
                    </thead>
                    <tbody>
                        <row>
                            <entry align="left" valign="top">25</entry>
                            <entry align="left" valign="top">Superior</entry>
                        </row>
                        <row>
                            <entry align="left" valign="top">12</entry>
                            <entry align="left" valign="top">Inferior</entry>
                        </row>
                    </tbody>
                </tgroup>
            </table>
        </proct>
    </procstep>
    <procstep>
        <proct>Lower the vehicle.</proct>
    </procstep>
    <procstep>
        <proct>Mark the <hp1>torque value</hp1> in the data sheet.</proct>
    </procstep>
</procl>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="proct">
        <title>
            <xsl:apply-templates select="@*|node()"/>
        </title>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>

    <xsl:template match="hp1|hp2">
        <xsl:text> </xsl:text>
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
    </xsl:template>

    <xsl:template match="table"/>
    <xsl:template match="br"/>
</xsl:stylesheet>

Output XML

<procl>
  <procstep>
    <title>Connect the lifts together.Lift the vehicle.</title>
  </procstep>
  <procstep>
    <title>Remove the screws.Remove the plates.</title>
  </procstep>
  <procstep>
    <title>Remove the nuts and washers.Remove the shield.</title>
  </procstep>
  <procstep>
    <title>Secure the exhaust pipe.Install a strap.Apply torque of 25 Nm .</title>
  </procstep>
  <procstep>
    <title>Install the screws and nuts.Use tool 256256 to fix the clamp.</title>
  </procstep>
  <procstep>
    <title>Install the nuts and screws.Assemble the member in the following order:</title>
  </procstep>
  <procstep>
    <title>Lower the vehicle.</title>
  </procstep>
  <procstep>
    <title>Mark the torque value in the data sheet.</title>
  </procstep>
</procl>

CodePudding user response:

Except for one thing, your output could be produced quite simply using only this:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/procl | procstep">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="proct">
    <title>
        <xsl:apply-templates/>
    </title>
</xsl:template>

<xsl:template match="br">
    <xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="*[not(starts-with(name(), 'hp'))]" priority="-1"/>

</xsl:stylesheet>

The only difference between the actual result:

<?xml version="1.0" encoding="UTF-8"?>
<procl>
   <procstep>
      <title>Connect the lifts together. Lift the vehicle.</title>
   </procstep>
   <procstep>
      <title>Remove the screws. Remove the plates.</title>
   </procstep>
   <procstep>
      <title>Remove the nuts and washers. Remove the shield.</title>
   </procstep>
   <procstep>
      <title>Secure the exhaust pipe. Install a strap. Apply torque of 25 Nm.</title>
   </procstep>
   <procstep>
      <title>Install the screws and nuts. Use tool 256256 to fix the clamp.</title>
   </procstep>
   <procstep>
      <title>Install the nuts and screws. Assemble the member in the following order:
            </title>
   </procstep>
   <procstep>
      <title>Lower the vehicle.</title>
   </procstep>
   <procstep>
      <title>Mark the torque value in the data sheet.</title>
   </procstep>
</procl>

and the expected output is the extra white space after "Assemble the member in the following order:".

You could try to remove it by using normalize-space() - but if you apply it globally to all text nodes passed to the output (as suggested above), you will also destroy existing spaces which may exist around the hp* formatting elements. Trying to restore these arbitrarily may lead to a result that is different from the original text - e.g. when only a part of a word is formatted. If there is a way to identify such problematic text nodes (such as having an immediately following table sibling), that would be preferable, IMHO. Alternatively, you could add another pass of processing and apply normalize-space() to the title elements created in the first pass.
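A minimal sketch of that last idea, assuming the rest of the stylesheet above stays as it is. Instead of a true second pass, the proct template can collect its output in a variable (the name title-text is made up for this illustration) and apply normalize-space() to the assembled string. The spaces around the hp* content are already part of that string when it is normalized, so they are kept; only leading and trailing whitespace is trimmed, and internal runs of whitespace collapse to a single space.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Copy the structural elements and keep processing their children. -->
    <xsl:template match="/procl | procstep">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="proct">
        <!-- First build the title text exactly as before.
             "title-text" is a hypothetical name chosen for this sketch. -->
        <xsl:variable name="title-text">
            <xsl:apply-templates/>
        </xsl:variable>
        <!-- Then normalize the whole string at once, so spaces between
             plain text and hp* content survive. -->
        <title>
            <xsl:value-of select="normalize-space($title-text)"/>
        </title>
    </xsl:template>

    <!-- A line break becomes a single space. -->
    <xsl:template match="br">
        <xsl:text> </xsl:text>
    </xsl:template>

    <!-- Drop every other element except hp*, whose text falls through
         to the built-in templates. -->
    <xsl:template match="*[not(starts-with(name(), 'hp'))]" priority="-1"/>

</xsl:stylesheet>
```

This relies only on converting the variable to a string, which XSLT 1.0 allows without any node-set extension. The trade-off is that every run of whitespace inside a title is collapsed, which may not be wanted if the source text contains significant multiple spaces.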
