P/Invoke limitations

by Miguel de Icaza

Jonathan Pryor has written a tutorial about Marshaling on Mono and the .NET Framework. Hopefully this will become a chapter in the Mono Handbook.

One of the points discussed in the document is string marshalling in .NET and the limitation imposed by the current specification. Sadly the ECMA specification (and .NET) only cover Ansi and Unicode as encodings for strings. Both are also underspecified: they are both `platform independent encodings'.
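
For readers who have not seen it, this is roughly where that choice gets made in a binding today. It is only a sketch: the library name "libexample" and the three entry points are made up for illustration, and only the CharSet field and the MarshalAs values come from the framework. Note that neither spelling lets you say "utf-8" or "ucs-4".

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // "libexample" and every function name here are hypothetical.

    // Strings are passed as "Ansi", whatever the runtime decides that means.
    [DllImport("libexample", CharSet = CharSet.Ansi)]
    public static extern void takes_ansi(string s);

    // Strings are passed as "Unicode" (two-byte code units on .NET).
    [DllImport("libexample", CharSet = CharSet.Unicode)]
    public static extern void takes_unicode(string s);

    // The same two choices, selected per parameter instead of per function.
    [DllImport("libexample")]
    public static extern void takes_both(
        [MarshalAs(UnmanagedType.LPStr)] string narrow,
        [MarshalAs(UnmanagedType.LPWStr)] string wide);
}
```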

This is a problem, because there is no such thing as `Unicode encoding'. There are specific ways of encoding: utf-8, utf-16, ucs-2, ucs-4 and others. Which one are we talking about?
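
To make the difference concrete, here is a small sketch that encodes the same four-character string three ways and reports the size of each result. The class and method names are mine, and Encoding.UTF32 assumes a framework version that ships it.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Returns the encoded size in bytes as { utf-8, utf-16, ucs-4 }.
    public static int[] Measure(string s)
    {
        return new int[] {
            Encoding.UTF8.GetBytes(s).Length,
            Encoding.Unicode.GetBytes(s).Length, // utf-16, little endian
            Encoding.UTF32.GetBytes(s).Length    // utf-32 / ucs-4, little endian
        };
    }

    static void Main()
    {
        // "café": three ASCII characters and one accented character.
        int[] sizes = Measure("caf\u00e9");
        Console.WriteLine("utf-8:  {0} bytes", sizes[0]); // 5
        Console.WriteLine("utf-16: {0} bytes", sizes[1]); // 8
        Console.WriteLine("ucs-4:  {0} bytes", sizes[2]); // 16
    }
}
```

A native library expecting one of these layouts will misread a buffer that holds another, which is why "Unicode" alone is not enough information for a marshaller.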

The issue we face on Unix and Windows is that there are plenty of libraries that have very specific requirements. We first ran into this problem with the Python bindings to .NET. Python can be compiled to use either ucs-2 or ucs-4 API entry points.

Another example is Gtk#. Currently it generates bindings like this:

[DllImport("libgtk-win32-2.0-0.dll")]
static extern void gtk_label_set_text(IntPtr raw, string str);

public string Text { 
	set {
		gtk_label_set_text(Handle, value);
	}
}		
	

The above uses the default "Ansi" encoding, which we have conveniently mapped in Mono to convert from Unicode to utf-8 (Gtk+ takes utf-8 as input). But this is not entirely correct. The trouble is that to be entirely correct we would have to generate code like this:

[DllImport("libgtk-win32-2.0-0.dll")]
static extern void gtk_label_set_text(IntPtr raw, IntPtr str);

public string Text { 
	set {
		// Marshaller is a helper class that every developer
		// has to roll on their own.
		IntPtr raw_str = Marshaller.ConvertToUTF8 (value);
		gtk_label_set_text(Handle, raw_str);
		// gtk_label_set_text copies the string, so release our utf-8 buffer.
		Marshaller.PtrToStringGFree (raw_str);
	}
}		
	

This is not a problem for Gtk#, which generates the bindings for us; it is just a bit more of an annoyance. But it is a problem for newcomers to .NET, who will find the above just too complicated for average use. Not only that, but most of the time the app appears to work. It is only under stress conditions found during deployment (or testing if you are lucky) that the developer will be bitten by this missing functionality.
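
To give an idea of what "rolling your own" looks like, here is a minimal sketch of such a helper. It is not the Gtk# implementation: the class name Utf8Marshaller and its Free method are hypothetical, and it allocates with Marshal.AllocHGlobal where a Gtk+ binding would more likely use the GLib allocator.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper; a stand-in for the Marshaller class used above.
static class Utf8Marshaller
{
    // Copies a managed string into unmanaged memory as NUL-terminated utf-8.
    public static IntPtr ConvertToUTF8(string s)
    {
        if (s == null)
            return IntPtr.Zero;

        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr mem = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, mem, bytes.Length);
        Marshal.WriteByte(mem, bytes.Length, 0); // trailing NUL
        return mem;
    }

    // Releases a buffer returned by ConvertToUTF8.
    public static void Free(IntPtr mem)
    {
        if (mem != IntPtr.Zero)
            Marshal.FreeHGlobal(mem);
    }

    static void Main()
    {
        IntPtr raw = ConvertToUTF8("caf\u00e9");
        try {
            // A real binding would pass 'raw' to the native function here.
            Console.WriteLine("first byte: 0x{0:x2}", Marshal.ReadByte(raw, 0));
        } finally {
            Free(raw); // always release the native copy, even on error
        }
    }
}
```

Every caller has to remember the try/finally pairing, which is exactly the kind of code a marshalling attribute would write for you.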

We identified a couple of unused bits in the encoding for this attribute in the ECMA spec, and suggested that they be used for the most popular encodings (utf-8 and ucs-2 I believe), but we were turned down by Microsoft. The current position is to use a platform specific attribute to encode this. This has problems, because we can not be compatible with .NET unless we all adopt the same scheme. The other problem is that by using a real custom attribute, as opposed to a synthesized one (where the attribute is turned into a bit value), we will lose performance.

Anyways, too bad that Microsoft is not interested in fixing this platform usability issue, especially since they have been focusing on fixing things that would force programmers to write more code. This seems like one of those things.

Posted on 13 Sep 2003