XML do's and don'ts

by Miguel de Icaza

Atsushi Enomoto has a great list of hints for developers who use XML, and Mono's XML stack in particular, and who care about performance. His list is available here.

String Collation

Mono originally used IBM's International Components for Unicode (ICU), a C library that provides many tools for handling internationalized strings, such as comparing strings, finding substrings, and handling case sensitivity in a culture-aware fashion.
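To give a feel for why this kind of library is needed at all, here is a minimal Python sketch (not Mono or ICU code) that contrasts a plain code point comparison with a comparison that first applies Unicode normalization and case folding. The helper names `ordinal_equal` and `canonical_caseless_equal` are made up for this illustration, and the sketch is not culture-aware: a real collation library layers per-culture rules on top of this.

```python
import unicodedata


def ordinal_equal(a: str, b: str) -> bool:
    """Compare two strings code point by code point (no linguistic rules)."""
    return a == b


def canonical_caseless_key(s: str) -> str:
    """Build a comparison key: decompose, case-fold, then decompose again.

    This follows the usual recipe for canonical caseless matching.
    It ignores culture-specific rules entirely.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold())


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after normalization and case folding."""
    return canonical_caseless_key(a) == canonical_caseless_key(b)


# The same word, once with a precomposed e-acute and once with
# a plain "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Ordinal sorting puts every uppercase ASCII letter before lowercase ones.
ordinal_order = sorted(["a", "B"])
folded_order = sorted(["a", "B"], key=canonical_caseless_key)
```

The two spellings of "café" look identical on screen but differ as raw code points, so the ordinal comparison treats them as different strings while the normalized comparison treats them as equal. Getting results like these to also match what each culture expects is the hard part that a collation library takes on.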

Basically, all the managed code in Mono would call into the C runtime, and the C runtime would use ICU's functionality to carry out the job. Unfortunately, the behavior of Microsoft's Unicode operations differed from ICU's implementation, and the fixes that we applied in our wrapper code around ICU were insufficient to provide the same semantics. Developers were running into various unexpected problems and erratic behavior that came out of our mapping, which prompted us first to discourage the use of ICU, and later to completely disable the ICU support code in Mono.

A few months ago, Atsushi was wrapping up his work on System.XML 2.x and asked me what he should look into as his next task. I asked Atsushi to look into implementing a replacement for ICU that we could use for Mono. He took this challenge very seriously, and this past week he finally landed the new string collation code in the repository.

His latest blog post has some performance information, and it links to the various posts that detail his quest to implement string collation for Mono.

Posted on 09 Aug 2005