The file format war between the Open Document Format (ODF)
and Office Open XML (OOXML) is getting heated.
There are multiple discussions taking place and I have been
ignoring them for the most part.
This morning I read on News.com an
interview with Thomas Vinje. Thomas is part of the legal
team representing some companies in the EU against Microsoft.
The bit in the interview that caught my attention was the
following quote:
We filed our complaint with the Commission last February,
over Microsoft's refusal to disclose their Office file formats
(.doc, .xls, and .ppt), so it could be fully compatible and
interoperable with others' software, like Linux. We also had
concerns with their collaboration software with XP, e-mail
software, and OS server software and some media server
software with their existing products. They use their vast
resources to delay things as long as possible and to wear
people down so they'll give up.
And in July, we updated our complaint to reflect our
concerns with Microsoft's "open XML." (Microsoft's Office Open
XML is a default document format for its Office 2007 suite.)
And last month, we supplemented that information with concerns
we had for .Net 3.0 (software designed to allow Vista
applications to run on XP and older-generation operating
systems).
And I think that the group is not only shooting themselves
in the foot, they are shooting all of our collective open
source feet.
I'll explain.
Open Source and Open Standards
For a few years, those of us advocating open source
software have found an interesting niche to push open source:
the government niche.
The argument goes along these lines: OpenOffice.org is just as
good as Microsoft Office; OpenOffice.org is open source, so it
comes for free; you can actually nurture your economy if you
push for a local open source software industry.
The argument makes perfect sense and most people will agree
with it, but every once in a while our advocacy has faced some
problems: Microsoft Office might have some features that we do
not have, the cost of migration is not zero, existing
licensing deals sweeten the pot, and there are compatibility
corner cases that slow down the adoption.
A new powerful argument was identified a few years back,
when Congressman Edgar Villanueva in 2002 replied
to a letter from Microsoft's
Peru General Manager.
One of the key components at the time was that the
government would guarantee free access to government
information and the permanence of public data. On those two
points the letter said:
To guarantee the free access of citizens to public
information, it is indispensable that the encoding of data is
not tied to a single provider. The use of standard and open
formats gives a guarantee of this free access, if necessary
through the creation of compatible free software.
To guarantee the permanence of public data, it is necessary
that the usability and maintenance of the software does not
depend on the goodwill of the suppliers, or on the monopoly
conditions imposed by them. For this reason the State needs
systems the development of which can be guaranteed due to the
availability of the source code.
The letter is a great document, but the bit that I am
interested in is the bit about open standards.
Using Open Standards to Promote Open Source
Open standards and the need for public access to
information were a strong message. This became a key component
of promoting OpenOffice.org and open source software. This
posed two problems:
First, those promoting open standards did not stress the
importance of having a fully open source implementation of an
office suite.
Second, it assumed that Microsoft would stand still and
would not react to this new change in the market.
And that is where the strategy to promote the open source
office suite is running into problems. Microsoft did not
stand still. It reacted to this new requirement by creating a
file format of its own, OOXML.
Technical Merits of OOXML and ODF
Unlike the XML Schema vs Relax NG discussion where the
advantages of one system over the other are very clear, the
quality differences between the OOXML and ODF markup are hard
to articulate.
The high-level comparisons so far have focused on tiny
details (encoding, model used for the XML). There is nothing
fundamentally better or worse in those standards like there is
between XML Schema and Relax NG.
ODF grew out of OpenOffice.org and is influenced by its
internal design. OOXML grew out of Microsoft Office and it is
influenced by its internal design. No real surprises there.
The Size of OOXML
A common objection to OOXML is that the specification is
"too big", that 6,000 pages is a bit too much for a
specification and that this would prevent third parties from
implementing support for the standard.
Considering that for years we, the open source community,
have been trying to extract as much information about
protocols and file formats from Microsoft, this is actually a
good thing.
For example, many years ago, when I was working on
Gnumeric, one of the issues that we ran into was that the
descriptions of Excel's functions and formulas in the books
you could buy were not entirely accurate.
OOXML devotes 324 pages of the standard to document the
formulas and functions.
The original submission to the ECMA TC45 working group did
not have any of this information. Jody Goldberg and Michael
Meeks, who represented Novell at TC45, requested the
information and it eventually made it into the standard. I
consider those 324 extra pages (almost half the size of the
ODF standard) a win for everyone.
Depending on how you count, ODF has 4 to 10 pages devoted
to formulas and functions. There is no way you could build
spreadsheet software based on this specification.
To build a spreadsheet program based on ODF you would have
to resort to the source code of an existing implementation
(OpenOffice.org, Gnumeric), to Microsoft's public
documentation or, ironically, to the OOXML specification.
The ODF Alliance, in their OOXML Fact Sheet, conveniently
ignores this issue.
I guess the fairest thing that can be said about a spec
that is 6,000 pages long is that printing it out kills too
many trees.
Individual Problems
There is a compilation of objections being tracked here,
but some of them show that the people writing those
objections do not understand the issues involved.
Do as I say, not as I do
Some folks have been using a Wiki to keep track of the
issues with OOXML. The motivation for tracking these issues
seems to be politically inclined, but it manages to pack some
important technical issues.
The site is worth exploring and some of the bits there are
solid, but there are also some flaky cases.
Some of the objections to OOXML are based on the fact
that it does not use existing ISO standards for some of the
bits in it. They list 7 ISO standards that OOXML does not
use, among them 8601 (dates and times), 639 (names of
languages), 8632 (computer graphics metafiles) and 10118-3
(cryptography), as well as a handful of W3C standards.
By comparison, ODF only references three ISO standards:
Relax NG (OOXML also references this one), 639 (language
codes) and 3166 (country codes).
Not only is it demanded that OOXML abide by more standards
than ISO's own ODF does, but also that it use the metafile
format from 1999. It seems like that would prevent
some nice features developed in the last 8 years for no other
reason than "there was a standard for it".
ODF uses SMIL and SVG, but if you save a drawing done in a
spreadsheet it is not saved as SVG: it is saved using ODF's own
format (Chapter 9), which sprinkles in a handful of SVG
attributes to store some information (x, y, width, height).
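To make that concrete, here is a small Python sketch. The XML fragment is
hand-written for illustration (it is not taken from a real saved file, and
the style name "gr1" is made up); the two namespace URIs are, to the best of
my knowledge, the ones ODF uses for its drawing elements and for its
SVG-compatible attributes. The point it tries to show is that the shape
itself is an ODF drawing element and only the geometry borrows SVG names.

```python
import xml.etree.ElementTree as ET

# Namespaces as I understand ODF declares them (assumption, see lead-in).
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
SVG_COMPAT_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
REAL_SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, hand-written fragment: a rectangle drawn in a spreadsheet.
fragment = f"""
<draw:rect xmlns:draw="{DRAW_NS}" xmlns:svg="{SVG_COMPAT_NS}"
           draw:style-name="gr1"
           svg:x="1cm" svg:y="2cm" svg:width="4cm" svg:height="3cm"/>
""".strip()

shape = ET.fromstring(fragment)

# Only the geometry uses SVG-style attribute names.
geometry = {
    name: shape.get(f"{{{SVG_COMPAT_NS}}}{name}")
    for name in ("x", "y", "width", "height")
}

# The element itself lives in the ODF drawing namespace, not in SVG's.
is_odf_drawing = shape.tag == f"{{{DRAW_NS}}}rect"
is_real_svg = shape.tag.startswith(f"{{{REAL_SVG_NS}}}")

print(geometry, is_odf_drawing, is_real_svg)
```

An SVG renderer handed this fragment would have nothing to draw, which is
why I say ODF borrows some SVG vocabulary rather than using SVG.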
There is an important-sounding "Ecma 376 relies on
undisclosed information" section, but it is a weak case: the
complaint is that Windows Metafiles are not specified. The
format is certainly not in the standard, but the information
is publicly available and is hardly "undisclosed
information". I would vote to add the information to the
standard.
More on the Size of the Spec
A rough breakdown of OOXML:
- ~100 page "Fundamentals" document;
- ~200 page "Packaging Conventions" document;
- ~450 page "Primer" document (a tutorial);
- ~1850 page Word Processing reference document;
- ~1090 page Spreadsheet Processing reference
document;
- ~270 page Presentation Processing reference
document;
- ~1140 page Drawing Processing reference document;
- ~900 pages for other references (VML, SharedML);
- ~42 page future extensibility document.
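As a quick check that the rough breakdown above really adds up to the
"6,000 pages" figure, here is a minimal Python sketch. The page counts are
the approximate ones from the list; the grouping into "reference" material
is my own reading of the section names, not something the standard defines.

```python
# Approximate page counts copied from the breakdown above.
ooxml_sections = {
    "Fundamentals": 100,
    "Packaging Conventions": 200,
    "Primer": 450,
    "Word Processing reference": 1850,
    "Spreadsheet Processing reference": 1090,
    "Presentation Processing reference": 270,
    "Drawing Processing reference": 1140,
    "Other references (VML, SharedML)": 900,
    "Future extensibility": 42,
}

total_pages = sum(ooxml_sections.values())

# Sections that are reference material (my own grouping, see lead-in).
reference_pages = sum(
    pages for name, pages in ooxml_sections.items() if "reference" in name.lower()
)
reference_share = reference_pages / total_pages

print(f"total: ~{total_pages} pages")
print(f"reference material: ~{reference_pages} pages ({reference_share:.0%})")
```

By this count the bulk of the specification is reference documentation,
which is the part an implementor actually needs.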
I have obviously not read the entire specification, and I am
biased towards what I have seen from the spreadsheet angle. But
considering that it is impossible to implement a spreadsheet
program based on ODF, I am convinced that the analysis done by
those opposing OOXML is incredibly shallow. The burden is on
them to prove that ODF is "enough" to implement alternative
applications from scratch.
If Microsoft had produced 760 pages (the size of ODF) as
the documentation for the ".doc", ".xls" and ".ppt" formats,
lacking for example the formula specification, wouldn't people
justly complain that the specification was incomplete and
useless?
I would have to agree at that point with the EU that not
enough information was available to interoperate with
Microsoft Office.
If anything, if I were trying to interoperate with Microsoft
products, I would request more, not less.
SVG
Then there is the charge about not using SVG in OOXML.
There are a few things to point out about SVG.
Referencing SVG would pull in virtually every spec that the
W3C has produced (Javascript, check; CSS, check; DOM,
check).
This makes the "size" of the specification deceptive, and it
also makes SVG incredibly complex to support. To this date I
am not aware of a complete open source SVG implementation
(and Gnome has been at the forefront of trying out SVG, going
back to 1999).
But to make things worse, OpenOffice
does not support SVG today, and interop in the SVG land leaves
a lot to be desired.
Some of my friends that have had to work with SVG have
complained extensively to me in the past about it. One
friend said "Adobe has completely hijacked it" referring to
the bloatedness of SVG and how hard it is to implement it
today.
At the time of this comment, Adobe had not yet purchased
Macromedia, and it seemed like Adobe was using the standards
group and SVG to compete against Flash, making SVG
unnecessarily complicated.
Which is why open source applications only support a subset
of SVG, a sensible subset.
ISO Standardization
ODF is today an ISO standard. It spent some time in the
public before it was given its stamp of approval.
There is a good case to be made for OOXML to be further
fine-tuned before it becomes an ISO standard. But
considering that Office 2007 has shipped, I doubt that any
significant changes to the file format would be implemented in
the short or medium term.
The best possible outcome in delaying the stamp of approval
for OOXML would be to get further clarifications on the
standard. Delaying it on the grounds of technical limitations
is not going to help much.
Considering that ODF spent a year receiving public scrutiny
and still has holes the size of the Gulf of Mexico, it seems
that the call for delaying OOXML's adoption is politically
based and not technically based.
XAML and .NET 3.0
From another press release from the group:
"Vista is the first step in Microsoft's strategy to extend its
market dominance to the Internet," Awde stressed. For example,
Microsoft's "XAML" markup language, positioned to replace HTML
(the current industry standard for publishing language on the
Internet), is designed from the ground up to be dependent on
Windows, and thus is not cross-platform by nature.
...
"With XAML and OOXML Microsoft seeks to impose its own
Windows-dependent standards and displace existing open
cross-platform standards which have wide industry acceptance,
permit open competition and promote competition-driven
innovation. The end result will be the continued absence of
any real consumer choice, years of waiting for Microsoft to
improve - or even debug - its monopoly products, and of course
high prices," said Thomas Vinje, counsel to ECIS and spokesman
on the issue.
He is correct that XAML/WPF will likely be adopted by many
developers and probably some developers will pick it over HTML
development.
I would support and applaud his efforts to require the
licensing of the XAML/WPF specification under the Microsoft
Open Specification Promise.
But he is wrong about XAML/WPF being inherently tied to
Windows. XAML/WPF are large bodies of code, but they expose
fewer dependencies on the underlying operating system than
.NET 2.0's Windows.Forms API does. It is within our reach to
bring them to Linux and MacOS.
We should be able to compete on technical grounds with
Microsoft's offerings. Developers interested in bringing
XAML/WPF to other platforms can join the Mono project; we
have some bits and pieces implemented as part of our Olive
subproject.
I do not know how fast the adoption of XAML/WPF will be,
considering that unlike previous iterations of .NET, gradual
adoption of WPF is not possible. Unlike .NET 2.0 which was an
incremental upgrade for developers, XAML/WPF requires software
to be rewritten to take advantage of it.
The Real Problem
The real challenge today that open source faces in the
office space is that some administrations might choose to move
from the binary office formats to the OOXML formats and that
"open standards" will not play a role in promoting
OpenOffice.org nor open source.
What is worse is that even if people manage to stop OOXML
from becoming an ISO standard it will be an ephemeral victory.
We need to recognize that this is the problem, instead
of trying to bury OOXML, which amounts to covering the sun
with your finger.
We need to make sure that OpenOffice.org can thrive on its
technical grounds.
In Closing
This is not a complete study of the problems that OOXML
has. As I said, it has its share of issues, just like the
current ODF standard does.
To make ODF successful, we need to make OpenOffice.org a better
product, and we need to keep improving it. It is very easy
to nitpick a standard, especially one that is as big as OOXML.
But it is a lot harder to actually improve OpenOffice.org.
If everyone complaining about OOXML were actually hacking on
OpenOffice.org to make it a technically superior
product in every sense, we would not have to resort, as a
community, to making a political case on weak grounds.
I also plan on updating this blog entry as people correct
me (unlike ODF, my blog entries actually contain mistakes ;-)
Updates -- February 1st
There are a few extra points that I was made aware of after
I posted this blog entry.
Standard Size
Christian Stefan wrote me to point out that the OOXML
specification published by ECMA uses 1.5 line spacing, while
OASIS uses single spacing. I quote from his message:
ODF 722 pages
SVG 719
MathML 665
XForms 152 (converted from html using winword, ymmv)
XLink 36 (converted from html using winword, ymmv)
SMIL 537 (converted from html using winword, ymmv)
OpenFormula 371
----
3,202
Now I'm still missing some standards that would add severall hundred
pages and changing line spacing to 1.5 will bring me near the 6000
pages mark I guess. This is not very surprising (at least for me)
since both standards try to solve very similar problems with nearly
equal complexity.
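Christian's estimate is easy to reproduce. Here is a small Python sketch
using the page counts from his message. The big assumption, which is mine
and a crude one, is that the page count grows linearly with line spacing, so
going from single to 1.5 spacing multiplies the count by 1.5.

```python
# Single-spaced page counts quoted in Christian's message.
odf_and_referenced_specs = {
    "ODF": 722,
    "SVG": 719,
    "MathML": 665,
    "XForms": 152,
    "XLink": 36,
    "SMIL": 537,
    "OpenFormula": 371,
}

single_spaced_total = sum(odf_and_referenced_specs.values())

# Crude assumption: pages scale linearly with line spacing.
LINE_SPACING = 1.5
scaled_total = single_spaced_total * LINE_SPACING

# Single-spaced pages the standards still missing from the list would
# need to contribute to reach 6,000 pages at 1.5 spacing.
OOXML_PAGES = 6000
missing_single_spaced = round(OOXML_PAGES / LINE_SPACING - single_spaced_total)

print(single_spaced_total, scaled_total, missing_single_spaced)
```

Under that assumption the listed specifications alone come to roughly 4,800
pages at 1.5 spacing, and the standards he has not counted yet would need
to add about 800 single-spaced pages to reach the 6,000 mark.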
Review Time
The "one month to review OOXML" meme that has been going
around the net turns out to be false.
It is unclear where it originated. Brian Jones from
Microsoft has a complete explanation. For OOXML to become a
standard it is going to take seven months at least.