Some time ago, I spoke to Daniel Veillard, the author of Libxslt and Libxml. When Mono started, we wanted to reuse as much code as possible, and those two libraries made a lot of sense to reuse.
The problem is that the .NET Framework defines three mechanisms for handling XML: the DOM model, the pull parser, and the path navigator (a DOM-like interface). It was not possible for us to just use libxml to implement our XML needs: it would have added too much overhead, and would have been an ugly hack.
.NET contains two pretty clever tricks: the pull parser (XmlTextReader and XmlTextWriter) and the path navigator, which exposes a cursor-based navigation interface. The beauty of the interface is that you can wrap any kind of data source and expose it to the XML subsystem with this interface. So you can wrap a database, a comma-separated value file, a file system, anything you can think of, and dynamically provide the content. This is interesting because it is possible to run XPath queries on top of it, or apply XSLT transformations. XSLT in .NET is built completely on top of the XPathNavigator.
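To make the cursor idea concrete, here is a minimal sketch in Python rather than C#. It is not the XPathNavigator API: the class `CsvNavigator`, its `move_to_*` methods, and the `select_values` helper are all hypothetical names invented for this illustration. It wraps a comma-separated value file as a small tree (a table, its rows, their fields) and shows a consumer that only ever talks to the cursor.

```python
import csv
import io


class CsvNavigator:
    """Hypothetical cursor over CSV text: root -> rows -> fields."""

    def __init__(self, text):
        reader = csv.reader(io.StringIO(text))
        self._header = next(reader)
        self._rows = list(reader)
        # Cursor position: depth 0 = root, 1 = a row, 2 = a field in a row.
        self._depth = 0
        self._row = 0
        self._col = 0

    @property
    def name(self):
        if self._depth == 0:
            return "table"
        if self._depth == 1:
            return "row"
        return self._header[self._col]

    @property
    def value(self):
        # Only fields carry text; the root and the rows are containers.
        if self._depth == 2:
            return self._rows[self._row][self._col]
        return ""

    def move_to_first_child(self):
        if self._depth == 0 and self._rows:
            self._depth, self._row = 1, 0
            return True
        if self._depth == 1 and self._header:
            self._depth, self._col = 2, 0
            return True
        return False

    def move_to_next(self):
        if self._depth == 1 and self._row + 1 < len(self._rows):
            self._row += 1
            return True
        if self._depth == 2 and self._col + 1 < len(self._header):
            self._col += 1
            return True
        return False

    def move_to_parent(self):
        if self._depth == 0:
            return False
        self._depth -= 1
        return True


def select_values(nav, field_name):
    """Collect the text of every node named field_name.

    Only the cursor methods are used, so any navigator would work here.
    """
    found = []
    if nav.name == field_name:
        found.append(nav.value)
    if nav.move_to_first_child():
        while True:
            found.extend(select_values(nav, field_name))
            if not nav.move_to_next():
                break
        nav.move_to_parent()
    return found


nav = CsvNavigator("name,lang\nlibxslt,C\nSAXON,Java\n")
print(select_values(nav, "lang"))
```

The consumer never sees the CSV text, only the cursor, which is why the same query code could sit on top of a database or a file system wrapped the same way.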
For a long time, in Mono we used libxslt: we would dump all the data from an XmlDocument or an XPathNavigator into a temporary file, and then use libxslt to do the heavy lifting.
This approach had various problems: libxslt was not designed to be thread safe (a requirement for us: on ASP.NET we need to be able to do XSLT transformations from multiple threads), so we had to add big locks around the XSLT invocations. Also, it was not possible to create a context for the transformation, so extension functions were global to all the transformations. The only solution was to shut down and restart the libxslt engine every time we did a transformation, which is not really optimal.
Fixing the problem was not going to be easy. Libxslt is a very large piece of code, and Daniel estimated that re-implementing it would take about a year. That is why I discouraged 16-year-old Mono developer Ben Maurer from working on this project. He was not really ready to spend a year of hairy coding, and would likely not be able to produce something even as fast as Libxslt.
Ben ignored my advice and went on to write a managed implementation of XSLT. The idea was to remove the libxslt dependency: remove the temporary files, fix the extension object problem, and get rid of the locks.
In the course of a couple of weeks this summer, Ben had the basics of XSLT done, and he recruited Atsushi to help with problems and missing features in the XML core, and Piers to fix and improve our XPath implementation.
In less than a month, the three-hacker team made a tremendous amount of progress, and the implementation could be used instead of Libxslt for Monodoc.
The surprising news this week is that Ben has made Mono's managed XSLT faster than the C-based libxslt. It is faster on a number of the XSLTMark tests and on other practical stylesheets (including the stylesheet used in Monodoc). The performance can be attributed partially to some performance improvements Atsushi made to our handling of XPathDocuments. The other part was Ben's tireless performance tuning of our implementation, ranging from profile-driven changes, like replacing foreach on ArrayLists with plain loops, to architectural changes. The biggest improvement was adding an interface for resolving functions/variables at compile time.
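The post does not show what that interface looks like, so the following is only a sketch of the general idea, in Python rather than C#, with hypothetical names (`NameLookupVar`, `SlotVar`, `StylesheetCompiler`). The first class looks a variable up by name every time it is evaluated; the second is resolved once, while the stylesheet is compiled, to a fixed slot number.

```python
class NameLookupVar:
    """Resolves the variable by name on every single evaluation."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, scope):
        # scope is a dict; a hash lookup happens on each call.
        return scope[self.name]


class SlotVar:
    """Resolved once, at compile time, to a position in a frame."""

    def __init__(self, slot):
        self.slot = slot

    def evaluate(self, frame):
        # frame is a list; evaluation is a plain index.
        return frame[self.slot]


class StylesheetCompiler:
    """Hypothetical compiler that assigns each declared variable a slot."""

    def __init__(self):
        self._slots = {}

    def declare(self, name):
        slot = len(self._slots)
        self._slots[name] = slot
        return slot

    def compile_var(self, name):
        # Unknown names are rejected here, before any transform runs.
        if name not in self._slots:
            raise NameError(f"undeclared variable: {name}")
        return SlotVar(self._slots[name])


compiler = StylesheetCompiler()
compiler.declare("title")
compiler.declare("author")
author = compiler.compile_var("author")

# At run time each transformation supplies a frame of values.
print(author.evaluate(["Monodoc", "Ben"]))
print(NameLookupVar("author").evaluate({"title": "Monodoc", "author": "Ben"}))
```

Both expressions yield the same value; the difference is that the name lookup is paid once per stylesheet in the second design instead of once per evaluation, and a misspelled variable name surfaces at compile time.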
Atsushi mentioned that a lot of the performance improvements came from Ben reusing concepts from SAXON. SAXON is the fastest XSLT implementation out there, and it's written in Java.
Congratulations to all the Mono XML crazy hackers for achieving this fantastic progress in so little time.
The Mono IL assembler has finally matured to the point where it can be used to bootstrap the Oberon IL compiler on Mono. The details are in Jackson's Blog.
Of course, some of you might want to laugh because we got generics support before Oberon worked ;-)
Go Jackson!
Anne talks about our recent discussion. She wants to reuse the Mono CLI and build patent-free technologies on top of it, along the lines of the technologies built in the past for Java (Cocoon, Struts).
She is absolutely right about the need to build a parallel universe of technologies. In Mono we have already started this process:
Posted on 14 Sep 2003