XSLT for web developers
Web Development, May 9th, 2007
Many modern web applications utilize XML and XHTML. But developers often fail to realize the full potential of these standards. XSLT is a powerful technology that can be used to transform XML documents into something else (like XHTML, CSS, or SQL). This post will briefly introduce XSLT and perform some simple transformations to an XHTML document.
An Example
To begin, let’s take a look at a simple XSLT document.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>Hello, world.</title>
</head>
<body>
<p>Hi, <xsl:value-of select="name"/>.</p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
If we use this stylesheet to transform this simple XML document
<?xml version="1.0" encoding="utf-8"?>
<name>Mike</name>
The resulting output will be
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hello, world.</title>
</head>
<body><p>Hi, Mike.</p></body>
</html>
How It Works
The first thing to notice is that XSLT doesn’t just transform XML, it also is XML. Thus, XSLT can be easily generated and manipulated using readily available programming tools. You can even use XSLT to transform another XSLT document. The downside is that even simple constructs require XML elements in XSLT, making the language very verbose. Keep in mind that because XSLT documents are written in XML, they also must be well-formed XML documents.
Now let’s dissect the example stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
This is the standard XSLT heading, which identifies the document as a stylesheet. The xmlns:xsl attribute is an XML namespace declaration. It identifies elements with the prefix xsl as part of the W3C XSLT specification. XSLT relies heavily on XML namespaces, and an XSLT processor will treat elements differently depending on the element’s namespace.
<xsl:output method="html"/>
The <xsl:output> element defines the output format that should be produced. The method attribute can have one of four values: xml, html, xhtml, or text.
- The xml output method produces an XML document, or an XML fragment.
- The html output method typically produces HTML 4.0 (though this is somewhat implementation dependent) and recognizes certain HTML conventions, such as outputting <hr> elements with no end tag.
- The xhtml output method follows the rules of the xml output method, but sticks to conventions described in the XHTML specification. Note that this output method only works with XSLT 2.0, but we’ll discuss a workaround later in the article.
- The text output method can be used to output any other text-based format.
If the <xsl:output> element is missing, the XSLT processor tries to guess which output method to use. It will choose HTML if the output starts with an <html> element in the null namespace, XHTML if it starts with <html> in the XHTML namespace, and XML otherwise.
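To make the difference between output methods concrete, here is a minimal sketch of a stylesheet (made up for illustration, not part of the examples in this post) that emits an empty hr element. Switching the method attribute between html and xml changes only how the result is serialized.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Change method to "xml" to compare the serialized result -->
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body>
        <p>Above the rule</p>
        <!-- The stylesheet is XML, so the empty element must be closed here -->
        <hr/>
        <p>Below the rule</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With method="html" a conforming processor should write the rule as <hr> with no end tag. With method="xml" it should be written as an empty XML element, typically <hr/>.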
<xsl:template match="/">
The <xsl:template match="/"> element defines a template rule that will be triggered when a particular part of the source document is being processed. The match="/" attribute indicates that this rule should be triggered at the beginning of the document. The value of the match attribute is a pattern, written in a subset of XPath syntax, and “/” identifies the document node.
Once a template is triggered, the body of the template tells the XSLT processor what output to generate. Most of the template body here is HTML. Since the tags are not in the XSLT namespace the processor will copy the elements into the output file. However, the <xsl:value-of> element is in the XSLT namespace, and has special meaning to the processor. This instruction copies the text from a node in the source document to the output document. The select attribute specifies the node whose value should be copied using an XPath expression. The name XPath expression tells the processor to find the set of all <name> elements that are children of the node that is currently being processed (in this case the document node). The processor then extracts the text of this element and inserts it into the output document.
That’s basically a complicated way of saying that the processor will copy “Mike” from the element <name>Mike</name>, and insert it into the output document.
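One detail is worth a small sketch: when the select expression finds more than one node, an XSLT 1.0 processor uses only the first one in document order. The stylesheet below assumes a hypothetical input document with a <people> root element (not used elsewhere in this post) containing two <name> children.

```xml
<!-- Hypothetical input: <people><name>Mike</name><name>Anna</name></people> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Relative path from the current node (here the document node) -->
    <xsl:text>First: </xsl:text>
    <xsl:value-of select="people/name"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A positional predicate picks a specific element from the set -->
    <xsl:text>Second: </xsl:text>
    <xsl:value-of select="people/name[2]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Under XSLT 1.0 rules this should print "First: Mike" and "Second: Anna" on separate lines, because <xsl:value-of> takes the string value of the first selected node only.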
Getting Fancy
If you’re still with me, I’d like to jump right into a more complicated stylesheet that web developers might find useful. So, without further ado, I’d like to present my solution to the float clearing problem.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:m="http://immike.net/m"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">
<xsl:output method="xml"
version="1.0"
encoding="UTF-8"
indent="yes"
omit-xml-declaration="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="xhtml:div[@m:clear='true']">
<div>
<xsl:apply-templates select="@* | node()"/>
<div style="clear: both; height: 0px; line-height: 0px;"> </div>
</div>
</xsl:template>
<xsl:template match="@m:*" />
</xsl:stylesheet>
I know, it looks like a monster. But bear with me, it’ll all make sense in a moment.
Let’s break this one down into smaller chunks that will be a bit easier to understand.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:m="http://immike.net/m" xmlns="http://www.w3.org/1999/xhtml" exclude-result-prefixes="xhtml">
Again, this is the standard XSLT heading which tells the processor that this document is a stylesheet. We’ve also declared namespaces for xsl, xhtml, and some custom markup using the prefix ‘m’, that I’ll be using to extend xhtml.
Notice that the XHTML namespace is declared twice, once with the xhtml prefix and once as the document’s default namespace (no prefix). Because XHTML uses a default namespace, we can’t access an XHTML document’s nodes without using a namespace prefix in our stylesheet. The ‘xhtml’ prefix will be used to access nodes in the xhtml namespace in the source document. Setting the default namespace as XHTML will keep the XSLT processor from adding namespace attributes to each XML fragment it outputs. If you’re confused at this point, you can read more here, or ignore this paragraph and remember this pattern whenever you’re transforming XHTML to XHTML.
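The namespace issue is easier to see in isolation. The following sketch (an illustrative stylesheet, separate from the float clearing example) reports which template rule each <p> element triggers. Run against an XHTML document, only the prefixed rule should fire.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <!-- Start at the document node and visit every element below it -->
  <xsl:template match="/">
    <xsl:apply-templates select="//*"/>
  </xsl:template>

  <!-- Unprefixed name: matches only <p> elements in no namespace -->
  <xsl:template match="p">
    <xsl:text>no-namespace p&#10;</xsl:text>
  </xsl:template>

  <!-- Prefixed name: matches <p> elements in the XHTML namespace -->
  <xsl:template match="xhtml:p">
    <xsl:text>XHTML p&#10;</xsl:text>
  </xsl:template>

  <!-- Ignore all other elements -->
  <xsl:template match="*"/>
</xsl:stylesheet>
```

In XSLT 1.0 an unprefixed name in a pattern always means "no namespace", regardless of any default namespace declared in the stylesheet, which is why the xhtml prefix is needed.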
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="yes" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
Here’s our workaround to produce XHTML output using XSLT 1.0. The output method is xml, but we’ve instructed the XSLT processor to omit the XML declaration (it triggers quirks mode in IE 6.0). Finally, the two doctype attributes will produce an XHTML strict document type declaration at the top of our output document.
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
This is a standard template that is used in many transformations. It’s usually called the identity template (or identity transform), because on its own it copies the source document unchanged. The pattern @* matches any attribute (the ‘@’ character is used to match an attribute in XPath). The node() test matches any other kind of node: elements, text, comments, and processing instructions. The pipe character (‘|’) combines the two patterns. So our complete pattern, @* | node(), will match every attribute and every other node in the source document.
The <xsl:copy> element is used to copy portions of the source document to the output document. Between the opening and closing <xsl:copy> tags there is an <xsl:apply-templates> tag. This is where the magic happens.
An XPath expression is assigned to the select attribute of the apply-templates element. The apply-templates element then triggers the processing of all nodes that match the expression, using the templates in our stylesheet that match those nodes.
So we’re basically doing a deep copy… unless the tag matches the next template in our stylesheet.
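The "copy everything, except..." idea can be sketched with a smaller override than the one used in this post. The stylesheet below is an illustrative example that assumes a source document without namespaces. It copies its input unchanged, apart from rewriting <b> elements as <strong>.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Identity template: copy anything no other rule handles -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: rewrite <b> as <strong>, keeping attributes and children -->
  <xsl:template match="b">
    <strong>
      <xsl:apply-templates select="@* | node()"/>
    </strong>
  </xsl:template>
</xsl:stylesheet>
```

Given <note>Read <b class="x">this</b> first.</note>, the result should be the same document with <strong class="x">this</strong> in place of the <b> element. The more specific pattern wins because a named element pattern has a higher default priority than node().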
<xsl:template match="xhtml:div[@m:clear='true']">
<div>
<xsl:apply-templates select="@* | node()"/>
<div style="clear: both; height: 0px; line-height: 0px;">
 
</div>
</div>
</xsl:template>
This template matches any <div> element whose m:clear attribute is set to true. It reproduces the matching element, attributes and children included, with one minor addition: just before the closing tag, the template outputs an extra div element with zero height and its style set to clear: both;. The end result is that you can add an attribute to any div tag telling it to clear any floats inside of it. No need for fancy CSS hacks or JavaScript.
<xsl:template match="@m:*" />
This last template matches any attribute with the ‘m’ prefix and outputs nothing, effectively stripping our non-standard xhtml extensions from the document.
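For reference, a source document for this stylesheet might look like the sketch below. The id value, title, and column text are made up for illustration. Only the two namespace URIs and the m:clear attribute come from the stylesheet itself.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://immike.net/m">
  <head>
    <title>Float clearing</title>
  </head>
  <body>
    <!-- m:clear="true" asks the stylesheet to append a clearing div -->
    <div id="columns" m:clear="true">
      <div style="float: left;">Left column</div>
      <div style="float: right;">Right column</div>
    </div>
  </body>
</html>
```

After the transformation, the outer div should keep its id attribute, lose the m:clear attribute, and gain the zero-height clearing div after the two floated children. The inner divs have no m:clear attribute, so the identity template copies them as they are.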
Applying the Transformation
Most modern programming languages have functionality built in that makes it trivially easy to apply XSLT transformations to an XML document. Here is a quick PHP5 command-line program that demonstrates how to use XSLT, and can be easily adapted for use in many applications.
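<?php
// Usage: php xslt.php <xml> <xslt>
if (count($argv) < 3) {
    die("Usage: php xslt.php <xml> <xslt>\n");
}
// Load the stylesheet and hand it to the processor
$stylesheet = new DOMDocument();
$stylesheet->load($argv[2]);
$xsl = new XSLTProcessor();
$xsl->importStylesheet($stylesheet);
// Load the source document and transform it
$source = new DOMDocument();
$source->load($argv[1]);
$result = $xsl->transformToXml($source);
print($result);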
May 9th, 2007 at 6:52 pm
I’ve been a developer for a long time and whenever I see something on XSLT I read it, and while it certainly makes sense, I’ve never understood what benefit XSLT has over using other means - like simply using php or other existing technologies to achieve a similar result.
I worked on a project last year where the developer used XSLT templates to display forms, which really seemed like more work than was truly necessary. The exact same result could have easily been done with php, which was already being used, and the plain html. Adding XSLT files and a class/PEAR package to process the transformation seemed overkill…
Can you provide any insight as to what benefit using XSLT has?
May 9th, 2007 at 10:27 pm
I have to agree with Mike - I also think XSLT is cool - but it has very little or no advantage over a PHP only templating solution.
May 10th, 2007 at 10:09 am
mike:
I think the easiest way to understand why XSLT is useful is to compare it to SQL. There isn’t anything that SQL can do that PHP, Java, Ruby, or any other language can’t. But, as a declarative language, SQL allows you to describe what you want without describing how you want it done.
This separation of what from how is what makes SQL attractive. It makes code more portable and more maintainable, and it offloads some of the implementation details (the how part) to the developers of the SQL engine - when MySQL or Oracle or whoever upgrades their SQL engine, every program that uses the database benefits.
XSLT is also declarative. Instead of listing an imperative sequence of actions you list a template rule that tells the processor what to do with a particular type of node, should it come across one.
The examples I gave here probably aren’t the best if you’re not sure whether XSLT is useful. They’re too trivial to demonstrate the language’s capabilities. But XSLT is a very effective solution in many cases, particularly if you’re trying to convert between two XML schemas (that is what the language was designed for, after all).
Anyways, like any language XSLT has strengths and weaknesses. But I don’t think it gets as much attention as it deserves.
May 10th, 2007 at 10:50 am
So it sounds like XSLT is more for implementing templates across languages, as it’s really just more work if you’re only using a single language. That makes sense, although it’s difficult for me to envision any applicable use of such a technology.
May 16th, 2007 at 5:55 am
It’s all about transformation.
Another attractive use of XSLT is the separation of the data layer and the presentation layer. You could say that PHP does this also. True. But you are bound to PHP once you go that way. With XSLT you can choose whatever language, platform, or technology you like (call it Java, .NET, PHP or whatever else), without changing a bit in your XSLT or the XML data.
I agree, though, that more effort is initially required to create simple things. The real benefit comes later.
I bet you can find other great uses and possibilities if you dig a little deeper.
May 16th, 2007 at 9:15 am
Yes, it is all about transformation! I agree with you, and I do use XSLT in the way you describe. I’m not completely sold on the utility of this approach, but I am working on a project and trying it out - as usual, there are good things and bad things about it.
XSLT is so different from languages like PHP, Java, etc. that it becomes very obvious when you find a shortcoming. When I can’t do something that would be easy in PHP it gets frustrating. On the other hand, the reverse happens just as often: something that would be difficult or inefficient to do in PHP is very simple, so it’s a trade off (as usual).