Many modern web applications use XML and XHTML, but developers often fail to realize the full potential of these standards. XSLT is a powerful technology for transforming XML documents into other formats (such as XHTML, CSS, or SQL). This post briefly introduces XSLT and performs some simple transformations on an XHTML document.

An Example

To begin, let’s take a look at a simple XSLT document.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>

<xsl:template match="/">
<html>
  <head>
    <title>Hello, world.</title>
  </head>
  <body>
    <p>Hi, <xsl:value-of select="name"/>.</p>
  </body>
</html>
</xsl:template>

</xsl:stylesheet>

If we use this stylesheet to transform this simple XML document

<?xml version="1.0" encoding="utf-8"?>
<name>Mike</name>

The resulting output will be

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hello, world.</title>
</head>
<body><p>Hi, Mike.</p></body>
</html>

How It Works

The first thing to notice is that XSLT doesn’t just transform XML, it also is XML. Thus, XSLT can be easily generated and manipulated using readily available programming tools. You can even use XSLT to transform another XSLT document. The downside is that even simple constructs require XML elements in XSLT, making the language very verbose. Keep in mind that because XSLT documents are written in XML, they also must be well-formed XML documents.

Now let’s dissect the example stylesheet.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

This is the standard XSLT heading, which identifies the document as a stylesheet. The xmlns:xsl attribute is an XML namespace declaration. It identifies elements with the prefix xsl as part of the W3C XSLT specification. XSLT relies heavily on XML namespaces, and an XSLT processor will treat elements differently depending on the element’s namespace.

<xsl:output method="html"/>

The <xsl:output> element defines the output format that should be produced. The method attribute can have one of four standard values: xml, html, xhtml, or text. XSLT 1.0 defines only xml, html, and text.

  • The xml output method produces an XML document, or an XML fragment.
  • The html output method typically produces HTML 4.0 (though this is somewhat implementation dependent) and recognizes certain HTML conventions such as outputting <hr> elements with no end tag.
  • The xhtml output method follows the rules of the xml output method, but sticks to conventions described in the XHTML specification. Note that this output method only works with XSLT 2.0, but we’ll discuss a workaround later in the article.
  • The text output method can be used to output any other text-based format.

If the <xsl:output> element is missing, the XSLT processor picks a default based on the result it produces. It will choose HTML if the output starts with an <html> element in the null namespace, XHTML if it starts with <html> in the XHTML namespace (XSLT 2.0 only), and XML otherwise.
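
To see how much the output method matters, here is a small sketch that uses the text method on the same <name>Mike</name> document from earlier. The <xsl:text> wrappers and the trailing newline are my own additions for illustration, not part of the original example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- text method: only character data is written, with no tags and no escaping -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- xsl:text keeps the literal text, including the space after the comma -->
    <xsl:text>Hi, </xsl:text>
    <xsl:value-of select="name"/>
    <!-- &#10; is a newline character -->
    <xsl:text>.&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should produce the single line Hi, Mike. with no markup around it.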

<xsl:template match="/">

The <xsl:template match="/"> element defines a template rule that will be triggered when a particular part of the source document is being processed. The match="/" attribute indicates that this rule should be triggered at the beginning of the document. The value of the match attribute is a pattern written in XPath syntax, and “/” identifies the document node of the document.

Once a template is triggered, the body of the template tells the XSLT processor what output to generate. Most of the template body here is HTML. Since the tags are not in the XSLT namespace, the processor will copy the elements into the output file. However, the <xsl:value-of> element is in the XSLT namespace and has special meaning to the processor. This instruction copies the text from a node in the source document to the output document. The select attribute specifies, using an XPath expression, the node whose value should be copied. The XPath expression name tells the processor to find the set of all <name> elements that are children of the node that is currently being processed (in this case the document node). The processor then extracts the text of the first such element and inserts it into the output document.

That’s basically a complicated way of saying that the processor will copy “Mike” from the element <name>Mike</name>, and insert it into the output document.
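
The select expression can also walk deeper into the document. As a rough sketch, assume a slightly larger source document (the <person> and <city> elements are made up for this illustration and do not appear in the example above):

```xml
<!-- Hypothetical source document for this sketch:
     <person>
       <name>Mike</name>
       <city>Boston</city>
     </person>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Paths are relative to the current node, which is the document node here -->
    <p>Hi, <xsl:value-of select="person/name"/> from <xsl:value-of select="person/city"/>.</p>
  </xsl:template>
</xsl:stylesheet>
```

Each step in the path moves one level down, so person/name means “the <name> children of the <person> children of the current node.” For the document in the comment, the paragraph text should read Hi, Mike from Boston.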

Getting Fancy

If you’re still with me, I’d like to jump right into a more complicated stylesheet that web developers might find useful. So, without further ado, I’d like to present my solution to the float clearing problem.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xhtml="http://www.w3.org/1999/xhtml"
 xmlns:m="http://immike.net/m"
 xmlns="http://www.w3.org/1999/xhtml"
 exclude-result-prefixes="xhtml">
 <xsl:output method="xml"
  version="1.0"
  encoding="UTF-8"
  indent="yes"
  omit-xml-declaration="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="xhtml:div[@m:clear='true']">
    <div>
      <xsl:apply-templates select="@* | node()"/>
      <div style="clear: both; height: 0px; line-height: 0px;">&#160;</div>
    </div>
  </xsl:template>

  <xsl:template match="@m:*" />

</xsl:stylesheet>

I know, it looks like a monster. But bear with me, it’ll all make sense in a moment.

Let’s break this one down into smaller chunks that will be a bit easier to understand.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:m="http://immike.net/m"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xhtml">

Again, this is the standard XSLT heading, which tells the processor that this document is a stylesheet. We’ve also declared namespaces for xsl, xhtml, and some custom markup using the prefix ‘m’, which I’ll be using to extend XHTML.

Notice that the XHTML namespace is declared twice, once with the xhtml prefix and once as the document’s default namespace (no prefix). Because XHTML uses a default namespace, we can’t access an XHTML document’s nodes without using a namespace prefix in our stylesheet. The ‘xhtml’ prefix will be used to access nodes in the XHTML namespace in the source document. Setting the default namespace to XHTML will keep the XSLT processor from adding namespace attributes to each XML fragment it outputs. If you’re confused at this point, you can skip the details and just remember this pattern whenever you’re transforming XHTML to XHTML.
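
If you’d like to see the prefix requirement in action, here is a minimal sketch that counts <div> elements two ways. It assumes the input is an XHTML document whose root element declares the XHTML namespace as its default; the labels in the output are just for illustration.

```xml
<!-- Assumed input: any XHTML document whose root declares the XHTML
     namespace as the default namespace and that contains div elements. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing -->
    <xsl:text>div: </xsl:text>
    <xsl:value-of select="count(//div)"/>
    <!-- The prefixed name is bound to the XHTML namespace, so this finds every div -->
    <xsl:text>&#10;xhtml:div: </xsl:text>
    <xsl:value-of select="count(//xhtml:div)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For an input containing two <div> elements, the first count should come out as 0 and the second as 2, which is why every pattern in the stylesheet uses the xhtml prefix.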

<xsl:output method="xml"
  version="1.0"
  encoding="UTF-8"
  indent="yes"
  omit-xml-declaration="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

Here’s our workaround to produce XHTML output using XSLT 1.0. The output method is xml, but we’ve instructed the XSLT processor to omit the XML declaration (it triggers quirks mode in IE 6.0). Finally, the two doctype attributes will produce an XHTML strict document type declaration at the top of our output document.

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

This is a standard template that is used in many transformations. It’s called the identity template, or copy template. The pattern @* matches any attribute (the ‘@’ character selects attributes in XPath). The node() test matches any other kind of node: elements, text, comments, and processing instructions. The pipe character (‘|’) is the union operator, which reads as “or” in a pattern. So our complete pattern, @* | node(), will match every attribute and every other node in the source document.

The <xsl:copy> element is used to copy portions of the source document to the output document. Between the opening and closing <xsl:copy> tags there is an <xsl:apply-templates> tag. This is where the magic happens.

An XPath expression is assigned to the select attribute of the apply-templates element. The apply-templates element then triggers the processing of all nodes that match the expression, using the templates in our stylesheet that match those nodes.
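
To make that dispatch a bit more concrete, here is a minimal sketch that pairs the identity template with one extra rule. The rule that strips <script> elements is hypothetical and only there to show how a more specific template takes over from the generic one.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Copy the current node, then hand its attributes and children
       back to the processor to find the best template for each one -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A named element pattern has a higher default priority than node(),
       so this template wins for script elements. The empty body outputs
       nothing, which removes them from the result. -->
  <xsl:template match="xhtml:script"/>

</xsl:stylesheet>
```

Every node that has no more specific template falls through to the identity template and is copied, so the result is the input document minus its script elements.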

So we’re basically doing a deep copy… unless the tag matches the next template in our stylesheet.

<xsl:template match="xhtml:div[@m:clear='true']">
  <div>
    <xsl:apply-templates select="@* | node()"/>
    <div style="clear: both; height: 0px; line-height: 0px;">&#160;</div>
  </div>
</xsl:template>

This template matches any <div> element that has an m:clear attribute set to true. It rebuilds the matching element, copying its attributes and contents, with one minor addition. Before the closing tag the template outputs an extra div element with zero height, and with its style set to clear: both;. The end result is that you can add an attribute to any div tag telling it to clear any floats inside of it. No need for fancy CSS hacks or JavaScript.
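
Here is a hypothetical source document to try the template on. The title, the id value, and the floated children are made up; only the m namespace URI and the m:clear attribute come from the stylesheet above.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://immike.net/m">
  <head>
    <title>Float clearing</title>
  </head>
  <body>
    <!-- m:clear="true" asks the stylesheet to append a clearing div -->
    <div id="container" m:clear="true">
      <div style="float: left;">Sidebar</div>
      <div style="float: right;">Content</div>
    </div>
  </body>
</html>
```

After the transformation with the complete stylesheet, the container div should keep its id attribute, lose m:clear, and gain the extra clearing div as its last child. The two floated divs have no m:clear attribute, so they are copied unchanged.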

<xsl:template match="@m:*" />

This last template matches any attribute with the ‘m’ prefix and outputs nothing, effectively stripping our non-standard XHTML extensions from the document.

Applying the Transformation

Most modern programming languages have built-in or bundled libraries that make it easy to apply XSLT transformations to an XML document. Here is a quick PHP 5 command-line program (it relies on PHP’s XSL extension) that demonstrates how to use XSLT and can be easily adapted for use in many applications.

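<?php
// Usage: php xslt.php <xml> <xslt>
if (count($argv) < 3) {
  die("Usage: php xslt.php <xml> <xslt>\n");
}

// Load the stylesheet (second argument) and hand it to the processor.
$stylesheet = new DOMDocument();
$stylesheet->load($argv[2]);

$xsl = new XSLTProcessor();
$xsl->importStylesheet($stylesheet);

// Load the source document (first argument) and transform it.
$doc = new DOMDocument();
$doc->load($argv[1]);
$result = $xsl->transformToXml($doc);

print($result);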