XSLT

JSON to XML in XSLT

XSLT is powerful enough to process even non-XML input. For example, I have created a transformation that converts JSON text to well-structured XML output:

JSON text:

{
     "firstName": "John",
     "lastName": "Smith",
     "age": 25,
     "address": {
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": "10021"
     },
     "phoneNumber": [
         { "type": "home", "number": "212 555-1234" },
         { "type": "fax", "number": "646 555-4567" }
     ]
}

XML result:

<?xml version="1.0" encoding="UTF-8"?>
<json>
   <object>
      <field name="firstName">
         <string>John</string>
      </field>
      <field name="lastName">
         <string>Smith</string>
      </field>
      <field name="age">
         <number>25</number>
      </field>
      <field name="address">
         <object>
            <field name="streetAddress">
               <string>21 2nd Street</string>
            </field>
            <field name="city">
               <string>New York</string>
            </field>
            <field name="state">
               <string>NY</string>
            </field>
            <field name="postalCode">
               <string>10021</string>
            </field>
         </object>
      </field>
      <field name="phoneNumber">
         <array>
            <object>
               <field name="type">
                  <string>home</string>
               </field>
               <field name="number">
                  <string>212 555-1234</string>
               </field>
            </object>
            <object>
               <field name="type">
                  <string>fax</string>
               </field>
               <field name="number">
                  <string>646 555-4567</string>
               </field>
            </object>
         </array>
      </field>
   </object>
</json>

It works by employing the XML Pipeline technique:

  • the first mode parses the text using regular expressions and generates a sequence of tokens in XML format: <comment>, <string>, <number>, <symbol>{</symbol>, etc.;
  • the second mode groups all tokens between "{" and "}" symbols into an <object> element, and all tokens between "[" and "]" symbols into an <array> element;
  • the third mode makes a <field> element from each <string><symbol>:</symbol>(<string>|<number>|<object>|<array>) sequence;
  • the fourth mode drops the commas between consecutive <field> elements;

and finally it performs an XSD validation check.
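The attached stylesheet is not reproduced here, but the idea behind the first mode can be illustrated outside XSLT. Below is a minimal sketch in Java (a swapped-in language, not the author's XSLT code) of a regex tokenizer that emits the same kind of flat <string>/<number>/<symbol> token stream. The JsonTokenizer class, its tokenize method and the <tokens> wrapper element are hypothetical. The sketch assumes input like the sample above: it recognizes only strings, numbers and structural symbols. It does not handle comments or true/false/null, and it does not decode JSON escape sequences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java sketch of pipeline stage 1: JSON text -> flat sequence of XML tokens.
// Recognizes only strings, numbers and the symbols { } [ ] : , (no comments or true/false/null).
class JsonTokenizer {
    // At each position, the alternatives are tried left to right.
    private static final Pattern TOKEN = Pattern.compile(
            "(\\s+)"                                     // group 1: whitespace, produces no token
            + "|\"((?:[^\"\\\\]|\\\\.)*)\""               // group 2: string body, escapes kept verbatim
            + "|(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)"   // group 3: number
            + "|([{}\\[\\]:,])");                         // group 4: structural symbol

    static String tokenize(String json) {
        StringBuilder out = new StringBuilder("<tokens>");
        Matcher m = TOKEN.matcher(json);
        int pos = 0;
        while (pos < json.length()) {
            m.region(pos, json.length());
            if (!m.lookingAt()) { // lookingAt() anchors the match at the region start
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            if (m.group(2) != null) {
                out.append("<string>").append(escapeXml(m.group(2))).append("</string>");
            } else if (m.group(3) != null) {
                out.append("<number>").append(m.group(3)).append("</number>");
            } else if (m.group(4) != null) {
                out.append("<symbol>").append(m.group(4)).append("</symbol>");
            }
            pos = m.end(); // absolute offset just past the token (or whitespace run)
        }
        return out.append("</tokens>").toString();
    }

    // Only the characters that would break the XML token stream are escaped.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"age\": 25, \"phoneNumber\": [\"212 555-1234\"]}"));
    }
}
```

For the JSON text {"age": 25}, tokenize should return <tokens><symbol>{</symbol><string>age</string><symbol>:</symbol><number>25</number><symbol>}</symbol></tokens>. The later modes would then group such a flat sequence into <object>, <field> and <array> elements.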

This approach is quite exotic and non-standard: when people hear the word "parsing", they think of BNF, state machines, ASTs, YACC, etc. But these technologies were created to parse complex programming languages.

I claim that XSLT coupled with regular expressions as a tokenizer is powerful enough to convert simple markup languages such as JSON, CSS, MIF, RTF, wiki, etc., into XML.

P.S.
See the attachment below for the source code of the transformation.

?mode in xsl:include/@href

The xsl:include/@mode patchset I developed to simplify the XML Pipeline in XSLT technique is not a good solution. The major problem is that it won't work with closed-source XSLT processors (Saxon PE/EE, for example).

As Michael Kay suggested, there is a standards-compliant way to implement the above:

<xsl:include href="link1.xsl?mode=link1"/>

It would work by means of a URIResolver that performs the XSLT preprocessing.

Using this approach, chain.xsl would look like this:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exsl="http://exslt.org/common">

    <xsl:output method="xml" omit-xml-declaration="no" indent="no" encoding="UTF-8"/>

    <xsl:include href="link1.xsl?mode=link1"/>
    <xsl:include href="link2.xsl?mode=link2"/>
    <xsl:include href="link3.xsl?mode=link3"/>

    <xsl:template match="/">
        <xsl:variable name="link1">
            <xsl:apply-templates mode="link1" select="node()"/>
        </xsl:variable>
        <xsl:variable name="link2">
            <xsl:apply-templates mode="link2" select="exsl:node-set($link1)/node()"/>
        </xsl:variable>
        <xsl:variable name="link3">
            <xsl:apply-templates mode="link3" select="exsl:node-set($link2)/node()"/>
        </xsl:variable>

        <xsl:copy-of select="exsl:node-set($link3)/node()"/>
    </xsl:template>

</xsl:stylesheet>

link1.xsl, link2.xsl, link3.xsl and identity.xsl stay the same as in the previous variant.

I have implemented ?mode in xsl:include/@href, and it works great.

The implementation uses a URIResolver and a custom SAX ContentHandler. The ContentHandler transforms the XSLT source code on the fly, adding the mode attribute where necessary.

The reason I chose the ContentHandler approach over a simpler XSLT such as

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<xsl:output method="xml" omit-xml-declaration="no" indent="no" encoding="UTF-8"/>

	<xsl:param name="mode"/>

	<xsl:template priority="-9" match="@*|node()">
		<xsl:copy>
			<xsl:apply-templates select="@*|node()"/>
		</xsl:copy>
	</xsl:template>

	<xsl:template match="xsl:template[@match and not(@mode)] | xsl:apply-templates[not(@mode)]">
		<xsl:copy>
			<xsl:attribute name="mode">
				<xsl:value-of select="$mode"/>
			</xsl:attribute>
			<xsl:apply-templates select="@*|node()"/>
		</xsl:copy>
	</xsl:template>

	<xsl:template match="xsl:include/@href | xsl:import/@href">
		<xsl:attribute name="{name()}">
			<xsl:choose>
				<xsl:when test="contains(., '?')">
					<xsl:value-of select="concat(., '&amp;mode=', $mode)"/>
				</xsl:when>
				<xsl:otherwise>
					<xsl:value-of select="concat(., '?mode=', $mode)"/>
				</xsl:otherwise>
			</xsl:choose>
		</xsl:attribute>
	</xsl:template>

</xsl:stylesheet>

is that the ContentHandler approach preserves line-number information: if you make an error in the source, the processor reports the exact line/column position of the error. With the XSLT above, that is not possible.
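For readers who want to try the ContentHandler route, here is a minimal sketch that uses only the standard JAXP and SAX APIs. It is an illustration under assumptions, not the attached implementation, and the ModeFilter and ModeResolver classes and the preprocess helper are hypothetical names. The filter applies the same rules as the XSLT above: it adds the mode to xsl:template[@match] and xsl:apply-templates, and appends it to nested xsl:include/xsl:import hrefs. Because XMLFilterImpl forwards the SAX Locator, errors should still be reported against the original file's line and column.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX filter that adds a default mode while an XSLT stylesheet is parsed.
// XMLFilterImpl forwards setDocumentLocator, so downstream line/column info is unchanged.
class ModeFilter extends XMLFilterImpl {
    private static final String XSL = "http://www.w3.org/1999/XSL/Transform";
    private final String mode;

    ModeFilter(XMLReader parent, String mode) {
        super(parent);
        this.mode = mode;
    }

    static XMLReader create(String mode) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true); // needed so uri/localName identify XSLT elements
        return new ModeFilter(f.newSAXParser().getXMLReader(), mode);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl a = new AttributesImpl(atts);
        if (XSL.equals(uri)) {
            // Same rule as the XSLT variant: xsl:template[@match] and xsl:apply-templates without @mode.
            boolean target = "apply-templates".equals(local)
                    || ("template".equals(local) && atts.getIndex("", "match") >= 0);
            if (target && atts.getIndex("", "mode") < 0) {
                a.addAttribute("", "mode", "mode", "CDATA", mode);
            }
            // Tunnel the mode into nested xsl:include/xsl:import (e.g. identity.xsl).
            int href = atts.getIndex("", "href");
            if (href >= 0 && ("include".equals(local) || "import".equals(local))) {
                String h = atts.getValue(href);
                a.setValue(href, h + (h.contains("?") ? "&" : "?") + "mode=" + mode);
            }
        }
        super.startElement(uri, local, qName, a);
    }

    // Convenience for inspection: returns the rewritten stylesheet as text.
    static String preprocess(String xslt, String mode) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(create(mode), new InputSource(new StringReader(xslt))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

// Hypothetical resolver: "link1.xsl?mode=link1" loads link1.xsl through ModeFilter.
class ModeResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        int q = href.indexOf('?');
        String mode = null;
        if (q >= 0) {
            for (String param : href.substring(q + 1).split("&")) {
                if (param.startsWith("mode=")) {
                    mode = param.substring("mode=".length()); // last mode= wins
                }
            }
        }
        if (mode == null) {
            return null; // not ours: let the processor resolve the URI itself
        }
        String path = href.substring(0, q);
        String systemId = base == null ? path : URI.create(base).resolve(path).toString();
        try {
            return new SAXSource(ModeFilter.create(mode), new InputSource(systemId));
        } catch (Exception e) {
            throw new TransformerException(e);
        }
    }
}
```

The intended use is to register the resolver with TransformerFactory.setURIResolver(new ModeResolver()) before compiling chain.xsl. Hrefs without a mode parameter return null, so the processor falls back to its own resolution. Whether a particular processor routes every xsl:include through the URIResolver in this way is worth verifying.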

A ZIP with the source code implementing ?mode in xsl:include/@href is attached below.

XSLT fn:name() is evil

The XSLT name() function is evil and should be avoided, just like GoTo.

For example, it is very bad to write

<xsl:apply-templates select="*[name()!='a:b']"/>

because it excludes both of these elements:

<a:b/>
<a:b xmlns:a="totallyDifferentNamespace"/>

The correct way to write the above apply-templates is

<xsl:apply-templates select="*[not(self::a:b)]"/>

or, in XSLT 2.0,

<xsl:apply-templates select="* except a:b"/>

assuming xmlns:a="someSpecificNamespace" is declared in the stylesheet.
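The difference is easy to observe with the JDK's built-in XSLT 1.0 processor. The hypothetical demo below is not the attached test code. It binds a to urn:A and, for brevity, uses xsl:for-each with the same two select expressions. It also adds one extra case: x:b in urn:A, which is the same element as a:b under another prefix, and which the name() test fails to exclude.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: a name()-based filter versus a namespace-aware node test (XSLT 1.0).
class NameVsSelf {
    static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:a="urn:A">
          <xsl:output method="text"/>
          <xsl:template match="/r">
            <xsl:text>by-name:</xsl:text>
            <xsl:for-each select="*[name()!='a:b']">
              <xsl:value-of select="concat(' ', name())"/>
            </xsl:for-each>
            <xsl:text>&#10;by-node-test:</xsl:text>
            <xsl:for-each select="*[not(self::a:b)]">
              <xsl:value-of select="concat(' ', name())"/>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Only the first element ({urn:A}b) should be excluded; x:b is that same element name.
    static final String INPUT = """
        <r xmlns:a="urn:A">
          <a:b/>
          <a:b xmlns:a="urn:other"/>
          <x:b xmlns:x="urn:A"/>
          <c/>
        </r>
        """;

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

It should print "by-name: x:b c" on the first line and "by-node-test: a:b c" on the second. The name() test drops the unrelated a:b and keeps x:b, while self::a:b compares the namespace URI and local name.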

Test code is attached below.

xsl:include/@mode

The XML Pipeline in XSLT technique requires every xsl:template and xsl:apply-templates in a chained stylesheet to have a mode attribute.

This could be simplified if XSLT allowed specifying a mode in xsl:include, so that <xsl:include href="link1.xsl" mode="link1"/> would mean "use link1 as the default mode in link1.xsl":

  1. if an xsl:template or xsl:apply-templates in the included stylesheet does not have a mode attribute, it receives the mode of the xsl:include through which the stylesheet was included;
  2. if you still want to reference the default mode from link1.xsl, you should use mode="#default";
  3. if link1.xsl includes identity.xsl, the link1 mode is tunneled to identity.xsl.

If xsl:include/@mode were available in XSLT processors, the XML Pipeline code would look much simpler:

  • chain.xsl
    <?xml version="1.0" encoding="utf-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exsl="http://exslt.org/common">
    
        <xsl:output method="xml" omit-xml-declaration="no" indent="no" encoding="UTF-8"/>
    
        <xsl:include href="link1.xsl" mode="link1"/>
        <xsl:include href="link2.xsl" mode="link2"/>
        <xsl:include href="link3.xsl" mode="link3"/>
    
        <xsl:template match="/">
            <xsl:variable name="link1">
                <xsl:apply-templates mode="link1" select="node()"/>
            </xsl:variable>
            <xsl:variable name="link2">
                <xsl:apply-templates mode="link2" select="exsl:node-set($link1)/node()"/>
            </xsl:variable>
            <xsl:variable name="link3">
                <xsl:apply-templates mode="link3" select="exsl:node-set($link2)/node()"/>
            </xsl:variable>
    
            <xsl:copy-of select="exsl:node-set($link3)/node()"/>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link1.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:import href="identity.xsl"/>
    
        <xsl:template match="a">
            <aa>
                <xsl:apply-templates select="@*|node()"/>
                <b/>
            </aa>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link2.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:import href="identity.xsl"/>
    
        <xsl:template match="b">
            <bb>
                <xsl:apply-templates select="@*|node()"/>
                <c/>
            </bb>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link3.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:import href="identity.xsl"/>
    
        <xsl:template match="c">
            <cc>
                <xsl:apply-templates select="@*|node()"/>
            </cc>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • identity.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:template priority="-9" match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
    </xsl:stylesheet>
    

Attached are my patches to Saxon 9.1 and Xalan-J 2.7.1 that enable @mode in xsl:include.

XML Pipeline in XSLT

Most often, people implement an XML Pipeline in a regular programming language, invoking XSLT transformations one by one and "manually" feeding the output of one transformation into the next.

Besides slow performance, this results in less flexibility: people stop using XSLT's declarative power and revert to the imperative world, which is not acceptable for XML transformations.

In this article I will show the most efficient way of implementing XML Pipeline using native XSLT capabilities.

The code outlining the approach consists of the following files:

  • chain.xsl, the "magic" main file implementing the pipeline [1]:
    <?xml version="1.0" encoding="utf-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exsl="http://exslt.org/common">
    
        <xsl:output method="xml" omit-xml-declaration="no" indent="no" encoding="UTF-8"/>
    
        <xsl:include href="link1.xsl"/>
        <xsl:include href="link2.xsl"/>
        <xsl:include href="link3.xsl"/>
    
        <xsl:template match="/">
            <xsl:variable name="link1">
                <xsl:apply-templates mode="link1" select="node()"/>
            </xsl:variable>
            <xsl:variable name="link2">
                <xsl:apply-templates mode="link2" select="exsl:node-set($link1)/node()"/>
            </xsl:variable>
            <xsl:variable name="link3">
                <xsl:apply-templates mode="link3" select="exsl:node-set($link2)/node()"/>
            </xsl:variable>
    
            <xsl:copy-of select="exsl:node-set($link3)/node()"/>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link1.xsl, linked transformation:
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:template priority="-9" mode="link1" match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates mode="link1" select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template mode="link1" match="a">
            <aa>
                <xsl:apply-templates mode="link1" select="@*|node()"/>
                <b/>
            </aa>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link2.xsl, linked transformation:
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:template priority="-9" mode="link2" match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates mode="link2" select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template mode="link2" match="b">
            <bb>
                <xsl:apply-templates mode="link2" select="@*|node()"/>
                <c/>
            </bb>
        </xsl:template>
    
    </xsl:stylesheet>
    
  • link3.xsl, linked transformation:
    <?xml version="1.0" encoding="UTF-8"?>
    
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:template priority="-9" mode="link3" match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates mode="link3" select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template mode="link3" match="c">
            <cc>
                <xsl:apply-templates mode="link3" select="@*|node()"/>
            </cc>
        </xsl:template>
    
    </xsl:stylesheet>
    

The key point of this approach is the use of the EXSLT node-set function [3]. This function converts a result tree fragment into a node-set, so the nodes produced by one link can be processed again as source nodes. This way, the output of link1.xsl is fed as input to link2.xsl, and the output of link2.xsl is fed as input to link3.xsl. The final xsl:copy-of writes out the result of link3.xsl [2].

This approach leverages the full power of XSLT: it is fast and declarative. Transformations are applied without leaving the XSLT processor, and non-linear pipelines can be implemented easily:

        <xsl:variable name="link2">
            <!-- some logic deciding whether or not to launch link2.xsl -->
            <xsl:choose>
                <xsl:when test="not(.//d)">
                    <xsl:apply-templates mode="link2" select="exsl:node-set($link1)/node()"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:copy-of select="exsl:node-set($link1)/node()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:variable>
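To experiment without separate files, the hypothetical Java harness below inlines link1.xsl, link2.xsl and link3.xsl as modes of a single stylesheet. It also uses the conditional link2 variable from the snippet above. It assumes that the default processor behind TransformerFactory (the JDK's built-in one, unless another is configured) supports exsl:node-set().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical single-file version of chain.xsl with link1..link3 inlined as modes.
class InlinePipeline {
    static final String CHAIN = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:exsl="http://exslt.org/common"
            exclude-result-prefixes="exsl">
          <xsl:output method="xml" omit-xml-declaration="yes"/>

          <xsl:template priority="-9" mode="link1" match="@*|node()">
            <xsl:copy><xsl:apply-templates mode="link1" select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template mode="link1" match="a">
            <aa><xsl:apply-templates mode="link1" select="@*|node()"/><b/></aa>
          </xsl:template>

          <xsl:template priority="-9" mode="link2" match="@*|node()">
            <xsl:copy><xsl:apply-templates mode="link2" select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template mode="link2" match="b">
            <bb><xsl:apply-templates mode="link2" select="@*|node()"/><c/></bb>
          </xsl:template>

          <xsl:template priority="-9" mode="link3" match="@*|node()">
            <xsl:copy><xsl:apply-templates mode="link3" select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template mode="link3" match="c">
            <cc><xsl:apply-templates mode="link3" select="@*|node()"/></cc>
          </xsl:template>

          <xsl:template match="/">
            <xsl:variable name="link1">
              <xsl:apply-templates mode="link1" select="node()"/>
            </xsl:variable>
            <!-- non-linear step: skip link2 when the input contains a d element -->
            <xsl:variable name="link2">
              <xsl:choose>
                <xsl:when test="not(.//d)">
                  <xsl:apply-templates mode="link2" select="exsl:node-set($link1)/node()"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:copy-of select="exsl:node-set($link1)/node()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:variable>
            <xsl:variable name="link3">
              <xsl:apply-templates mode="link3" select="exsl:node-set($link2)/node()"/>
            </xsl:variable>
            <xsl:copy-of select="exsl:node-set($link3)/node()"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(CHAIN)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a/>"));        // linear path: all three links run
        System.out.println(run("<a><d/></a>")); // link2 is skipped
    }
}
```

With the input <a/>, all three links run and the result should be <aa><bb><cc/></bb></aa>. With <a><d/></a>, link2 is skipped, so no <bb> or <cc> is produced and the result should be <aa><d/><b/></aa>.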

I have been using this technique for a number of years, and so far it has worked great.

[1] MSXML does not implement EXSLT by default, though it does have its own node-set function in the xmlns:msxsl="urn:schemas-microsoft-com:xslt" namespace. chain_msxml.xsl is a separate main file for MSXML that uses the node-set function from the msxsl namespace.
[2] The xsl:copy-of in the last statement is included for debugging purposes. For example, if something goes wrong in link2.xsl, just change one character in the xsl:copy-of so that it reads <xsl:copy-of select="exsl:node-set($link2)/node()"/>, and the pipeline output will be the result of the 2nd transformation.
[3] exsl:node-set() is not required with an XSLT 2.0 processor, where temporary trees can be navigated directly.

Sample code is attached below.
