The EU Prosecutors are Wrong.

The file format war between the Open Document Format (ODF) and Office Open XML (OOXML) is getting heated.

There are multiple discussions taking place and I have been ignoring them for the most part.

This morning I read on News.com an interview with Thomas Vinje. Thomas is part of the legal team representing some companies in the EU against Microsoft.

The bit in the interview that caught my attention was the following quote:

We filed our complaint with the Commission last February, over Microsoft's refusal to disclose their Office file formats (.doc, .xls, and .ppt), so it could be fully compatible and interoperable with others' software, like Linux. We also had concerns with their collaboration software with XP, e-mail software, and OS server software and some media server software with their existing products. They use their vast resources to delay things as long as possible and to wear people down so they'll give up.

And in July, we updated our complaint to reflect our concerns with Microsoft's "open XML." (Microsoft's Office Open XML is a default document format for its Office 2007 suite.) And last month, we supplemented that information with concerns we had for .Net 3.0 (software designed to allow Vista applications to run on XP and older-generation operating systems).

And I think that the group is not only shooting themselves in the foot, they are shooting all of our collective open source feet.

I'll explain.

Open Source and Open Standards

For a few years, those of us advocating open source software have found an interesting niche to push open source: the government niche.

The argument goes along these lines: Open Office is just as good as Microsoft Office; Open Office is open source, so it comes for free; you can actually nurture your economy if you push for a local open source software industry.

The argument makes perfect sense and most people will agree with it, but every once in a while our advocacy has faced some problems: Microsoft Office might have some features that we do not have, the cost of migration is not zero, existing licensing deals sweeten the pot, and there are compatibility corner cases that slow down the adoption.

A new powerful argument was identified a few years back, when Congressman Edgar Villanueva in 2002 replied to a letter from Microsoft's Peru General Manager.

One of the key components at the time was that the government would guarantee free access to government information and the permanence of public data. On those two points the letter said:

To guarantee the free access of citizens to public information, it is indispensable that the encoding of data is not tied to a single provider. The use of standard and open formats gives a guarantee of this free access, if necessary through the creation of compatible free software.

To guarantee the permanence of public data, it is necessary that the usability and maintenance of the software does not depend on the goodwill of the suppliers, or on the monopoly conditions imposed by them. For this reason the State needs systems the development of which can be guaranteed due to the availability of the source code.

The letter is a great document, but the bit that I am interested in is the bit about open standards.

Using Open Standards to Promote Open Source

Open standards and the need for public access to information made for a strong message. This became a key component of promoting OpenOffice and open source software. But it posed two problems:

First, those promoting open standards did not stress the importance of having a fully open source implementation of an office suite.

Second, it assumed that Microsoft would stand still and would not react to this new change in the market.

And that is where the strategy to promote the open source office suite is running into problems. Microsoft did not stand still. It reacted to this new requirement by creating a file format of its own, the OOXML.

Technical Merits of OOXML and ODF

Unlike the XML Schema vs Relax NG discussion where the advantages of one system over the other are very clear, the quality differences between the OOXML and ODF markup are hard to articulate.

The high-level comparisons so far have focused on tiny details (encoding, model used for the XML). There is nothing fundamentally better or worse in those standards like there is between XML Schema and Relax NG.

ODF grew out of OpenOffice.org and is influenced by its internal design. OOXML grew out of Microsoft Office and it is influenced by its internal design. No real surprises there.

The Size of OOXML

A common objection to OOXML is that the specification is "too big", that 6,000 pages is a bit too much for a specification and that this would prevent third parties from implementing support for the standard.

Considering that for years we, the open source community, have been trying to extract as much information about protocols and file formats from Microsoft, this is actually a good thing.

For example, many years ago, when I was working on Gnumeric, one of the issues that we ran into was that the descriptions of functions and formulas in Excel were not entirely accurate in the public books you could buy.

OOXML devotes 324 pages of the standard to document the formulas and functions.

The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks, who represented Novell at TC45, requested the information and it eventually made it into the standard. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).

Depending on how you count, ODF has 4 to 10 pages devoted to formulas and functions. There is no way you could build a spreadsheet program based on this specification.

To build a spreadsheet program based on ODF you would have to resort to the source code of an existing implementation (OpenOffice.org, Gnumeric), or to Microsoft's public documentation, or, ironically, to the OOXML specification.

The ODF Alliance in their OOXML Fact Sheet conveniently ignores this issue.

I guess the fairest thing that can be said about a spec that is 6,000 pages long is that printing it out kills too many trees.

Individual Problems

There is a compilation being tracked in here, but some of the objections there show that the people writing those objections do not understand the issues involved.

Do as I say, not as I do

Some folks have been using a Wiki to keep track of the issues with OOXML. The motivation for tracking these issues seems to be politically inclined, but it manages to pack some important technical issues.

The site is worth exploring and some of the bits there are solid, but there are also some flaky cases.

Some of the objections over OOXML are based around the fact that it does not use existing ISO standards for some of the bits in it. They list 7 ISO standards that OOXML does not use: 8601 dates and times; 639 names and languages; 8632 computer graphics and metafiles; 10118-3 cryptography as well as a handful of W3C standards.

By comparison, ODF only references three ISO standards: Relax NG (OOXML also references this one), 639 (language codes) and 3166 (country codes).

Not only is it demanded that OOXML abide by more standards than ISO's own ODF does, but also that the format used for metafiles from 1999 be used. It seems like it would prevent some nice features developed in the last 8 years for no other reason than "there was a standard for it".

ODF uses SMIL and SVG, but if you save a drawing done in a spreadsheet it is not saved as SVG, it is saved using its own format (Chapter 9) and sprinkles a handful of SVG attributes to store some information (x, y, width, height).
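To make the point about "sprinkled" SVG attributes concrete, here is a minimal sketch. The fragment is hand-written for illustration and is not taken from a real document; it assumes the draw and svg-compatible namespace URIs that ODF defines. Note that the element itself lives in ODF's own drawing namespace, and only the geometry attributes borrow SVG names.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the ODF specification.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}

# Hypothetical shape, written by hand for this example.
FRAGMENT = (
    '<draw:rect xmlns:draw="{draw}" xmlns:svg="{svg}" '
    'svg:x="1cm" svg:y="2cm" svg:width="3cm" svg:height="4cm"/>'
).format(**NS)


def svg_geometry(xml_text):
    """Return only the attributes that come from the SVG-compatible namespace."""
    element = ET.fromstring(xml_text)
    prefix = "{" + NS["svg"] + "}"
    return {
        name[len(prefix):]: value
        for name, value in element.attrib.items()
        if name.startswith(prefix)
    }


shape = ET.fromstring(FRAGMENT)
print(shape.tag)               # the element is a draw:rect, not an SVG <rect>
print(svg_geometry(FRAGMENT))  # the geometry is all that is borrowed from SVG
```

An SVG renderer handed this fragment would not know what to do with it: the shape vocabulary is ODF's, not SVG's.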

There is an important-sounding "Ecma 376 relies on undisclosed information" section, but it is a weak case: the complaint is that Windows Metafiles are not specified.

The format is certainly not in the standard, but the information is publicly available and is hardly "undisclosed information". I would vote to add the information to the standard.

More on the Size of the Spec

A rough breakdown of OOXML:

  • ~100 page "Fundamentals" document;
  • ~200 page "Packaging Conventions" document;
  • ~450 page "Primer" document (a tutorial);
  • ~1850 page Word Processing reference document;
  • ~1090 page Spreadsheet Processing reference document;
  • ~270 page Presentation Processing reference document;
  • ~1140 page Drawing Processing reference document;
  • ~900 pages for other references (VML, SharedML);
  • ~42 page "Future Extensibility" document.
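As a quick sanity check, the rough figures in this breakdown can be added up. This is only a sketch: the dictionary keys are shorthand labels of my own, and the page counts are the approximate ones listed above.

```python
# Approximate page counts from the breakdown above (labels are shorthand).
ooxml_sections = {
    "Fundamentals": 100,
    "Packaging Conventions": 200,
    "Primer": 450,
    "Word Processing": 1850,
    "Spreadsheet Processing": 1090,
    "Presentation Processing": 270,
    "Drawing Processing": 1140,
    "Other references": 900,
    "Future extensibility": 42,
}

total_pages = sum(ooxml_sections.values())
spreadsheet_share = ooxml_sections["Spreadsheet Processing"] / total_pages

print(total_pages)                 # 6042
print(f"{spreadsheet_share:.0%}")  # 18%
```

The total is in line with the "6,000 pages" figure, and the spreadsheet reference is a bit under a fifth of it.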

I have obviously not read the entire specification, and I am biased towards what I have seen from the spreadsheet angle. But considering that it is impossible to implement a spreadsheet program based on ODF, I am convinced that the analysis done by those opposing OOXML is incredibly shallow. The burden is on them to prove that ODF is "enough" to implement alternative applications from scratch.

If Microsoft had produced 760 pages (the size of ODF) as the documentation for the ".doc", ".xls" and ".ppt" that lacked for example the formula specification, wouldn't people justly complain that the specification was incomplete and was useless?

I would have to agree at that point with the EU that not enough information was available to interoperate with Microsoft Office.

If anything, if I were trying to interoperate with Microsoft products, I would request more, not less.

SVG

Then there is the charge about not using SVG in OOXML. There are a few things to point out about SVG.

Referencing SVG would pull virtually every spec that the W3C has produced (Javascript, check; CSS, check; DOM, check).

This can be deceptive in terms of the "size" of the specification, but also makes it incredibly complex to support. To this date I am not aware of a complete open source SVG implementation (and Gnome has been at the forefront of trying out SVG, going back to 1999).

But to make things worse, OpenOffice does not support SVG today, and interop in the SVG land leaves a lot to be desired.

Some of my friends that have had to work with SVG have complained extensively to me in the past about it. One friend said "Adobe has completely hijacked it" referring to the bloatedness of SVG and how hard it is to implement it today.

At the time of this comment, Adobe had not yet purchased Macromedia, and it seemed like Adobe was using the standards group and SVG to compete against Flash, making SVG unnecessarily complicated.

Which is why open source applications only support a subset of SVG, a sensible subset.

ISO Standardization

ODF is today an ISO standard. It spent some time in the public before it was given its stamp of approval.

There is a good case to be made for OOXML to be further fine-tuned before it becomes an ISO standard. But considering that Office 2007 has shipped, I doubt that any significant changes to the file format would be implemented in the short or medium term.

The best possible outcome in delaying the stamp of approval for OOXML would be to get further clarifications on the standard. Delaying it on the grounds of technical limitations is not going to help much.

Considering that ODF spent a year receiving public scrutiny and still has holes the size of the Gulf of Mexico, it seems that the call for delaying the adoption of OOXML is politically based and not technically based.

XAML and .NET 3.0

From another press release from the group:

"Vista is the first step in Microsoft's strategy to extend its market dominance to the Internet," Awde stressed. For example, Microsoft's "XAML" markup language, positioned to replace HTML (the current industry standard for publishing language on the Internet), is designed from the ground up to be dependent on Windows, and thus is not cross-platform by nature.

...

"With XAML and OOXML Microsoft seeks to impose its own Windows-dependent standards and displace existing open cross-platform standards which have wide industry acceptance, permit open competition and promote competition-driven innovation. The end result will be the continued absence of any real consumer choice, years of waiting for Microsoft to improve - or even debug - its monopoly products, and of course high prices," said Thomas Vinje, counsel to ECIS and spokesman on the issue.

He is correct that XAML/WPF will likely be adopted by many developers and probably some developers will pick it over HTML development.

I would support and applaud his efforts to require the licensing of the XAML/WPF specification under the Microsoft Open Specification Promise.

But he is wrong about XAML/WPF being inherently tied to Windows. XAML/WPF are large bodies of code, but they expose fewer dependencies on the underlying operating system than .NET 2.0's Windows.Forms API does. It is within our reach to bring them to Linux and MacOS.

We should be able to compete on technical grounds with Microsoft's offerings. Developers interested in bringing XAML/WPF to other platforms can join the Mono project; we have some bits and pieces implemented as part of our Olive sub project.

I do not know how fast the adoption of XAML/WPF will be, considering that unlike previous iterations of .NET, gradual adoption of WPF is not possible. Unlike .NET 2.0 which was an incremental upgrade for developers, XAML/WPF requires software to be rewritten to take advantage of it.

The Real Problem

The real challenge today that open source faces in the office space is that some administrations might choose to move from the binary office formats to the OOXML formats and that "open standards" will not play a role in promoting OpenOffice.org nor open source.

What is worse is that even if people manage to stop OOXML from becoming an ISO standard it will be an ephemeral victory.

We need to recognize that this is the problem, instead of trying to bury OOXML, which amounts to covering the sun with your finger.

We need to make sure that OpenOffice.org can thrive on its technical grounds.

In Closing

This is not a complete study of the problems that OOXML has. As I said, it has its share of issues, just like the current ODF standard does.

To make ODF successful, we need to make OpenOffice.org a better product, and we need to keep improving it. It is very easy to nitpick a standard, especially one that is as big as OOXML. But it is a lot harder to actually improve OpenOffice.org.

If everyone complaining about OOXML was actually hacking on improving OpenOffice.org to make it a technically superior product in every sense, we would not have to resort, as a community, to playing a political case on weak grounds.

I also plan on updating this blog entry as people correct me (unlike ODF, my blog entries actually contain mistakes ;-)

Updates -- February 1st

There are a few extra points that I was made aware of after I posted this blog entry.

Standard Size

Christian Stefan wrote me to point out that the OOXML specification published by ECMA uses 1.5 line spacing, while OASIS uses single spacing. I quote from his message:

ODF             722 pages
SVG             719
MathML          665
XForms          152     (converted from html using winword, ymmv)
XLink            36     (converted from html using winword, ymmv)
SMIL            537     (converted from html using winword, ymmv)
OpenFormula     371
                ----
              3,202

Now I'm still missing some standards that would add severall hundred
pages and changing line spacing to 1.5 will bring me near the 6000
pages mark I guess. This is not very surprising (at least for me)
since both standards try to solve very similar problems with nearly
equal complexity.
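Christian's arithmetic can be checked with a few lines. This is a rough sketch that assumes page count scales linearly with line spacing, which is only an approximation (headings, tables and figures do not stretch the same way).

```python
# Single-spaced page counts quoted in Christian's message.
referenced_specs = {
    "ODF": 722,
    "SVG": 719,
    "MathML": 665,
    "XForms": 152,
    "XLink": 36,
    "SMIL": 537,
    "OpenFormula": 371,
}

single_spaced_total = sum(referenced_specs.values())

# Assumption: going from single to 1.5 line spacing multiplies pages by 1.5.
at_one_and_a_half = single_spaced_total * 3 // 2

# Single-spaced pages still needed to reach 6,000 pages at 1.5 spacing.
missing_single_spaced = 6000 * 2 // 3 - single_spaced_total

print(single_spaced_total)    # 3202
print(at_one_and_a_half)      # 4803
print(missing_single_spaced)  # 798
```

Under that assumption, roughly 800 more single-spaced pages of referenced standards would close the gap, which matches his "several hundred pages" estimate.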

Review Time

The "one month to review OOXML" meme that has been going around the net turns out to be false. It is unclear where it originated. Brian Jones from Microsoft has a complete explanation. For OOXML to become a standard it is going to take seven months at least.

Posted on 30 Jan 2007 by Miguel de Icaza
This is a personal web page. Things said here do not represent the position of my employer.