Using XSL it can show ASCII value of some special character

Time:01-15

I am facing an issue when I transform XML using XSL: the bullet character is not preserved, and the output contains some ASCII characters instead, as shown below.

Here is the XSL that converts the complex XML into simplified XML.

        <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
        <xsl:output method="xml" indent="yes"/>
        <xsl:template match="/document">
            <document>
                <xsl:for-each select="page">
                    <page>
                        <xsl:for-each select="block">
                            <block blockType="{@blockType}">
                               <xsl:for-each select="text">
                                   <text>
                                        <xsl:for-each select="par">
                                            <paragraph>
                                                <line>
                                                    <xsl:value-of select="line"/>
                                                </line>
                                            </paragraph>
                                        </xsl:for-each>
                                    </text>
                                </xsl:for-each>
                            </block>
                        </xsl:for-each>
                    </page>
                </xsl:for-each>
            </document>
        </xsl:template>
        </xsl:stylesheet>
        

Each <line> should begin with a bullet, but it shows some ASCII values instead when the complex XML is transformed with the XSL. I used Saxon to run the transformation with the XSL stylesheet.

        <paragraph>
                       <line>?¢â?¬?¢ If you have to take a picture of a document in poor lighting and need the flash, try to use the flash from 20 inches away and try to find additional light sources.</line>
                    </paragraph>
    

Here is the XML that is transformed using the XSL stylesheet. When I used an online XSL transformation tool, it gave me the correct result, but the Saxon transformation does not. I don't know what I am doing wrong. Is the issue with the transformation code or with the XSL?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 12" pagesCount="2" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="2550" height="3300" resolution="300" originalCoords="1">
<block blockType="Text" blockName="" l="273" t="1721" r="2281" b="2618"><region><rect l="273" t="1721" r="2281" b="2618"/></region>
<text>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2232" l="355" t="2201" r="2275" b="2240"><formatting lang="EnglishUnitedStates">• Use the white balance feature. If your camera has manual white balance, use a white sheet of paper</formatting></line>
<line baseline="2280" l="429" t="2249" r="2209" b="2288"><formatting lang="EnglishUnitedStates">to set white balance. Otherwise, select the appropriate balance mode for your lighting conditions.</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2331" l="355" t="2300" r="1416" b="2339"><formatting lang="EnglishUnitedStates">• Enable the anti-shake setting: otherwise, use a tripod.</formatting></line></par>
<par lineSpacing="1152">
<line baseline="2403" l="282" t="2373" r="759" b="2412"><formatting lang="EnglishUnitedStates">In poor lighting conditions:</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2454" l="355" t="2423" r="1930" b="2462"><formatting lang="EnglishUnitedStates">• Auto focus may function incorrectly: therefore, you should switch to manual focus.</formatting></line></par>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2505" l="355" t="2474" r="2154" b="2513"><formatting lang="EnglishUnitedStates">• Use the maximum aperture allowed by the camera (2.3 or 4.5). (In bright daylight, use smaller</formatting></line>
<line baseline="2553" l="430" t="2522" r="1245" b="2561"><formatting lang="EnglishUnitedStates">apertures: this will produce sharper images).</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2603" l="355" t="2572" r="2121" b="2612"><formatting lang="EnglishUnitedStates">• If your camera gives you more than one choice of ISO speed, select the highest ISO setting.</formatting></line></par>
</text>
</block>
<block blockType="Picture" blockName="" l="315" t="1349" r="697" b="1693"><region><rect l="315" t="1349" r="697" b="1693"/></region>
</block>
<block blockType="Text" blockName="" l="1270" t="3021" r="1304" b="3067"><region><rect l="1270" t="3021" r="1304" b="3067"/></region>
<text>
<par lineSpacing="1380">
<line baseline="3061" l="1276" t="3027" r="1297" b="3061"><formatting lang="EnglishUnitedStates">2</formatting></line></par>
</text>
</block>
</page>
</document>

Here is the Saxon transformation code used to transform it:
public static String saxonTransform(String xml, String xsl) throws TransformerException, FileNotFoundException {
        TransformerFactoryImpl f = new net.sf.saxon.TransformerFactoryImpl();
        f.setAttribute("http://saxon.sf.net/feature/version-warning", Boolean.FALSE);
        try {
            StreamSource xsrc = new StreamSource(new ByteArrayInputStream(xsl.getBytes(Charset.forName("UTF-8"))));
            Transformer t = f.newTransformer(xsrc);
            StreamSource src = new StreamSource(new ByteArrayInputStream(xml.getBytes(Charset.forName("UTF-8"))));
            StreamResult res = new StreamResult(new ByteArrayOutputStream());
            t.transform(src, res);
            return res.getOutputStream().toString();
        } catch (Exception e) {
            logger.warn(e.getMessage());
        }
        return null;
    }

Here is the method that reads the file into an XML string:
 public  String  FileToXmlString( String path){
        String str="";
        String str1="";
        try {
            str=new String(Files.readAllBytes(Paths.get(path)));
            str1=str.substring(3);
            }
            catch (IOException e) {
                logger.error(e.getMessage());
            }
        return str1;        
    }

CodePudding user response:

If you have a Java String with Unicode characters, then the right way to feed them to the XML parser/JAXP Transformer is a StreamSource over a StringReader, i.e. StreamSource src = new StreamSource(new StringReader(xml));. You haven't shown how you construct the String xml, but once you have a String with the characters, use a StringReader.

Of course, if you have a file, use a StreamSource over a FileInputStream. Attempts to guess the encoding and do the decoding by hand are unnecessary and error-prone; an XML parser is usually pretty good at detecting the encoding based on the XML declaration and decoding as necessary.
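A minimal sketch of the file-based approach, assuming the input is a well-formed XML file on disk. The class name FileTransform, the method transformFile, and the copy-everything stylesheet in main are illustrative and not from the question. It uses the JAXP default factory; with Saxon, new net.sf.saxon.TransformerFactoryImpl() from the question could be substituted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileTransform {

    // Hands the raw bytes of the file to the XML parser, which picks the
    // encoding from the byte order mark and the XML declaration itself.
    public static String transformFile(Path xmlPath, String xsl)
            throws IOException, TransformerException {
        // Assumption: JAXP default factory; swap in Saxon's factory if needed.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        try (InputStream in = Files.newInputStream(xmlPath)) {
            t.transform(new StreamSource(in), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input: a UTF-8 file whose text starts with a bullet.
        Path sample = Files.createTempFile("sample", ".xml");
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<doc><line>\u2022 Use the white balance feature.</line></doc>";
        Files.write(sample, content.getBytes(StandardCharsets.UTF_8));

        // Illustrative stylesheet that copies the whole input document.
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + "<xsl:template match=\"/\"><xsl:copy-of select=\".\"/></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transformFile(sample, xsl));
    }
}
```

With this shape there is no new String(bytes) call and no substring(3) to strip a byte order mark, so the platform default charset never touches the input.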

As you also want a String for the transformation result I would additionally recommend a StreamResult over a StringWriter.
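The String-in, String-out variant could look roughly like the sketch below. The class name StringTransform and the small XSLT 1.0 stylesheet in main are illustrative, not the stylesheet from the question, and the JAXP default factory is an assumption; the Saxon factory from the question could be used in its place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StringTransform {

    // Transforms XML held in a String and returns the result as a String.
    // Readers and writers carry characters, not bytes, so no charset
    // conversion happens on either side of the transformation.
    public static String transform(String xml, String xsl) throws TransformerException {
        // Assumption: JAXP default factory; swap in Saxon's factory if needed.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Illustrative stylesheet: wraps the text of <line> in a new <line>.
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + "<xsl:template match=\"/doc\">"
                + "<line><xsl:value-of select=\"line\"/></line>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<doc><line>\u2022 Use the white balance feature.</line></doc>";
        System.out.println(transform(xml, xsl));
    }
}
```

Compared with the question's code, this avoids ByteArrayOutputStream.toString(), which decodes the result bytes using the platform default charset instead of the encoding the serializer wrote.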

CodePudding user response:

The problem will be that the input file is not encoded the way the XML parser thinks it is, so the XML parser is decoding the characters incorrectly. Check whether the input XML file has an XML declaration that declares the encoding, and check whether the bullet character at the start of the line is actually encoded the way it should be.
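To illustrate the kind of mismatch described above, here is a small sketch that encodes a bullet as UTF-8 and then decodes those bytes with a different charset. The class and method names are made up for the example, and windows-1252 is only an assumed "wrong" charset; the actual one depends on the platform where the code in the question runs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then decodes the bytes with another charset.
    // This reproduces what happens when bytes and assumed encoding disagree.
    public static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Returns the UTF-8 bytes of text as space-separated uppercase hex.
    public static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bullet = "\u2022";
        // The bullet U+2022 is three bytes in UTF-8.
        System.out.println(hex(bullet)); // E2 80 A2

        // Decoding those three bytes as windows-1252 yields three characters.
        String garbled = misdecode(bullet, Charset.forName("windows-1252"));
        System.out.println(garbled.length()); // 3
    }
}
```

Repeating the wrong decode on already garbled text multiplies the damage, which would be consistent with the longer run of stray characters in the question's output.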

A good XML editor like Oxygen should help you resolve this.

Of course, once you discover exactly what the encoding problem is, you need to investigate how it happened, and ensure that it doesn't happen again.

(Incidentally, it's "ascii" not "ascaii", and the characters you are seeing are non-ASCII characters. When you're dealing with character encoding problems, you need to be precise.)
