Tuesday, August 14, 2007

ODF support: Express your viewpoints

(Importance to India: This might be a long post, so if you are curious about why this is important to India, skip to the end of the post.)

You are reading this blog due to the magic of HTML (or its successor, XHTML, an XML-based format). The magic is that I don't know what your browser is, and it does not affect my selection of browser. Currently I'm using Firefox, but in the past I have used different browsers to view posts. It has not affected you. You could switch your browser to Opera, Mozilla's Firefox, Safari, Internet Explorer, or even Links2, and you could still read this website. In short, there is complete independence on your part when viewing my blog. It is the same with email. My mail program has little to do with your mail program. As long as we both follow mail standards, we can communicate by email. In the past, I have used at least five different email clients, and yet the recipients of my email were never forced to use my email program.

Shouldn't all documents be like that? When you mail someone a file, they should be able to choose which program they view it with. I don't force you to use a particular version of Firefox to view this blog just because that is the browser I use to create it. But with office documents, the same is not true. If someone sends me a Microsoft Office file, they must specify the program and the version used to create the file. This is why people sometimes prefer to send PDF or image files: they are standards, and can be viewed through many competing programs.

Office file formats (letters, presentations, spreadsheets) are very common, and these files need to be read by a majority of computer users today. It is easy for Government websites to put up applications and forms online, so you can download them, print them out, and fill them in in the comfort of your own home. What used to take a visit to the local Municipal office can be done at home. So far, so good. But in the absence of an HTML-like or JPEG-like standard, there is a glut of competing office file formats. The most common office productivity software is Microsoft's Office program, though there are competing programs like Open Office.

The Microsoft Office file formats are not open. There is no central information source which tells you how to decode them. Despite this, many volunteers have tried to figure out (and reverse-engineer) MS Office file formats. This is why Open Office can read MS Office formats. The reverse engineering isn't feature-complete, since only Microsoft knows what the file format contains. It is workable, but not something to rely on for large volumes of data. Further, as Microsoft continues to evolve the file format, the volunteers have to struggle to keep up. It is a slow and painful process, and not something that is a reliable solution in the long term.

The long-term solution is to have an HTML-like format for office documents. Then, much like this blog, you can use your favorite program to view the content. You can choose to use Microsoft's Office suite, much like you are free to use Microsoft's Internet Explorer to view this blog. I cannot enforce my choice on you. On the other hand, if you find Open Office to be better suited to your needs (perhaps because of its Hindi support), then you can choose to use that. The choice is yours.

This choice of program is different from the choice of file format. Right now, I don't have a choice of which file format to publish this blog in. It is HTML, no matter how much I might dislike it. HTML is an open format, and another competing open format wouldn't add much to the mix. If you had two open formats, every office suite would be forced to implement both formats. After all, the content could be in either, right? This is not a useful choice. It forces the developers of Office suites (both MS Office and Open Office) to spend twice the energy on supporting two different file formats that do the same thing.

Currently in India, there is only one specification of the power plug. Everyone makes power adapters that fit this plug. When you go to the market, you don't have to ask if an appliance works with the power socket at your home. Of course it does: that is what standards accomplish. They guarantee interoperability. A choice in power sockets and adapters wouldn't do any good. Every toaster would have to be made with both plugs, because the manufacturer couldn't be sure which socket the consumer has. That doubles the work without offering the user any new feature. Of course, the consumer pays for this confusion, since that second plug costs money to put in every appliance.

This is even more true of document standards. There is very little to be gained in having two competing open formats. As long as the first is truly open, and can be implemented by everyone, the second only adds more work for software authors, and confusion for users. Today, if you were to push a competitor to HTML, there would be little to gain. After all, HTML already is truly open and well supported. Why force every browser maker to waste time supporting another format?

There is already an office format that has ISO certification. It is called ODF (Open Document Format) and it is already supported by a variety of programs. Even Google Documents supports exporting to it and importing from it. In time, every program that reads and writes documents can support the ODF file format. No more emails with "Please open the file in version x of program y".

This marks a departure from the backward world of proprietary formats. Secret file formats are what prop up the lack of choice in Office programs today. While there is a large choice of browsers, there is comparatively little choice in Office productivity software, all because of the closed nature of the file format. The ISO standardization is a step in the right direction. It is a good step for consumers, since we can choose a program based on technical merit rather than its inside knowledge of a secret. It is a good step for software authors, since they have to support only one format, which everyone else can read and write. It is a good step for interoperability, since my choice does not constrain yours.

Microsoft has traditionally held on to the Office productivity market due to its proprietary file format lock-in. With the standardization of the file format, they stand to lose this monopoly. While they are free to implement the ODF file format in Microsoft Office, this will mean that they must compete on features and price, much like other software authors. If Microsoft Office truly were a superior product, they wouldn't have any problem with this. The trouble is that MS Office might just face some competition from Open Office, much like Internet Explorer faces quite some competition from Firefox. Open Office is a free download. Even though Open Office might not have all the features of MS Office, it has most of the common features. Many individuals may choose to pay more for the advanced features that MS Office has, but that number may be smaller than it is now.

Enter OOXML, Microsoft's competing document format. Its name is Office Open XML, named for maximum confusion. I'll call it MS OOXML, to be safe. This file format is neither open, nor truly XML. MS OOXML isn't open because it has lots of pieces that depend on the behaviour of previous Microsoft programs, which only Microsoft knows. MS OOXML isn't truly XML, because it lacks a definite specification of many features (such as drawings). Without a clear specification, it becomes impossible to implement the format fully. MS is pushing MS OOXML as a competing document standard to the existing ISO standard, ODF.

Microsoft's efforts are a definite step backward. Remember the websites in 199x that would say, "Supported under IE 5.0", and you were forced to get IE 5.0 to view the website? This is the world that Microsoft wants to take us back to. The growth of alternate browsers has forced website authors to write clean HTML, instead of HTML with the proprietary Microsoft extensions. This has benefited every user of the Web, and every website creator. If the website parses as clean HTML, it will show up fine on every browser.

Mark Shuttleworth has a good post about this, and Rob Weir has a wealth of insight into this matter. The No OOXML website is stocked with good information, a list of delegates by country, and cogent arguments against MS OOXML. As always, Pamela Jones at Groklaw has compiled a giant list of links about this issue. I am not affiliated with any of these organizations, and have no technical background in this area. I recommend reading the facts for yourself, rather than relying on my views. However, if you do have a view that conflicts with mine, I'd be happy if you posted a comment.

This is important for India because India is on the ISO committee, and is a member that is required to vote on the issue. USA, China and South Africa have resolved to vote against a competing document format. They are voting for a single file format, which is the right step. India's delegation can be reached at ird@bis.org.in, and it is in every Indian's interest that they vote "no" as well. Please take the time to write a concise, professional mail to them, indicating why you believe that a second file format isn't a good step. Respect the delegates' time, but do let them know what your views are. If you aren't Indian, your delegate can be reached at an address listed here.