by Miguel de Icaza

Generators Zen

I got most of the iterator support in the Mono C# compiler working now; hopefully it will be checked into CVS as soon as I polish a few things. You can read a small tutorial about iterators here.

I think that this has more potential than advertised.

Basically, the iterator support in C# simplifies the implementation of methods that return IEnumerable and IEnumerator classes. Typically an implementor would have to create a helper class that would track the state of the enumeration, and implement a number of members: MoveNext, Reset, and Current. Implementing this state machine is not only boring, but also error-prone.
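To make the boredom concrete, here is a rough sketch of the helper classes you would write by hand for a sequence that counts to three. The class names are made up for illustration; only IEnumerable, IEnumerator, MoveNext, Reset and Current come from the framework.

```csharp
using System;
using System.Collections;

// Hypothetical hand-written enumerator: the state machine is explicit.
class CountToThreeEnumerator : IEnumerator
{
	int state = 0;	// 0 = before the first element, 1..3 = current value, 4 = finished

	public object Current {
		get {
			if (state < 1 || state > 3)
				throw new InvalidOperationException ();
			return state;
		}
	}

	public bool MoveNext ()
	{
		if (state > 3)
			return false;
		state++;
		return state <= 3;
	}

	public void Reset ()
	{
		state = 0;
	}
}

// Hypothetical enumerable that hands out a fresh enumerator each time.
class CountToThreeEnumerable : IEnumerable
{
	public IEnumerator GetEnumerator ()
	{
		return new CountToThreeEnumerator ();
	}
}
```

Even for three constant values there is a state field to keep consistent across three members, which is exactly the kind of code that is easy to get subtly wrong.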

Implementing these patterns is so boring that the average developer fights the system and designs clever workarounds: passing a delegate (a method pointer) to be called back, so that the enumerating code can run linearly; constructing an array with all the results and returning it; or, worst of all, exposing the internals of their object. These solutions might work, but they are not available to platform developers, who have to provide the right framework for everyone else. It also means that the average developer's code won't integrate nicely with the underlying platform or interoperate nicely with other code.

The C# iterator support works around the problem. Now it is trivial to implement these enumeration interfaces. There is no excuse to not use the system pattern, as it is so simple to use. Lame sample follows:

	IEnumerable CountToThree ()
	{
		print ("About to say one");
		yield return 1;
		print ("About to say two");
		yield return 2;
		print ("About to say three");
		yield return 3;
	}

When invoked, say from the foreach construct:

	foreach (int value in CountToThree ()){
		print (value);
	}

The message "About to say one" is printed; then the routine returns, and the value is printed by the print in the foreach main loop. When the next value is about to be retrieved, execution resumes where it left off: the string "About to say two" is printed, and the value 2 is returned.

Notice something: the values are not pre-computed ahead of time and returned: they are returned as they are consumed. This is implemented by a clever state machine and an internal class generated by the C# compiler.
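A small way to see the interleaving for yourself is to record every step in a list instead of printing it. This is only a sketch: the LazyDemo class and its Log field are made up for the occasion, and it uses the `yield return` spelling that C# 2.0 shipped with.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: records the order in which things happen.
class LazyDemo
{
	public static List<string> Log = new List<string> ();

	static IEnumerable<int> CountToThree ()
	{
		Log.Add ("About to say one");
		yield return 1;
		Log.Add ("About to say two");
		yield return 2;
		Log.Add ("About to say three");
		yield return 3;
	}

	public static List<string> Run ()
	{
		Log.Clear ();
		foreach (int value in CountToThree ())
			Log.Add ("got " + value);
		return Log;
	}
}
```

After LazyDemo.Run () the log alternates: "About to say one", "got 1", "About to say two", "got 2", "About to say three", "got 3". If the values were computed ahead of time, the three "About to say" entries would all come first.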

Now, today as I explained my excitement about the yield keyword to Ettore, I realized that it could have more uses outside the scope of implementing enumerator interfaces. Part of the beauty of yield is that, from the developer's perspective, it suspends the execution of the method at that point, only to be resumed later on.

This beauty is not obvious at first. It took quite some time for me to assimilate this. Let's repeat it: the yield keyword suspends the execution of the routine, only to be resumed later. This is not only interesting, this is absolutely fabulous.

What the designers of C# have done here is take an annoying and error-prone pattern and make a language extension that effectively addresses the problem.

But it seems like yield could be used for a lot more. Over the years I have implemented plenty of state machines to cope with non-blocking operations: non-blocking protocol handlers, parsers, and GUI applications. It starts to feel like a waste to have this functionality under-used.

For example, a streaming XML parser looks like this:

void HandleToken (int token)
{
	switch (state){
	case START:
		if (match (token, "<"))
			state = OPEN_TAG;
		else
			error ();
		break;
	case OPEN_TAG:
		if (match (token, identifier))
			state = IDENTIFIER;
		else
			error ();
		break;
	case IDENTIFIER:
		if (match (token, OPEN_QUOTES))
			state = QUOTE_HANDLING;
		else if (match (token, ">"))
			state = START;
		break;
	}
}

It would be fascinating if this could be implemented with yield:

HandleToken (int token)
{
	open:
	if (match (token, "<")){
		yield;
		if (match (token, identifier)){
			yield;
			if (match (token, OPEN_QUOTES)){
				...
			} else if (match (token, ">")){
				yield;
				goto open; 	// you know you love it.
			}
		}
	}
}

Ok, that is probably not the best example, as this is just the first time I have thought of this (I am sure the Lisp, Scheme, and Icon people have better examples). The one thing missing here is how to resume the method, and how to provide any new parameters to it. Maybe a resume (method, args) would do the trick.
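The iterator support as it exists can be bent into something close to this. The following is only a sketch under a few assumptions: tokens are plain strings rather than ints, the TagMatcher class and its members are invented for illustration, and the "new parameter" is smuggled in through a field before each MoveNext call, with a Feed method standing in for resume (method, args).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: recognizes the token sequence "<" identifier ">", over and over.
class TagMatcher
{
	string token;			// the "argument" handed over on each resume
	IEnumerator<object> machine;

	public int TagsSeen;
	public bool Failed;

	public TagMatcher ()
	{
		machine = Run ();
		machine.MoveNext ();	// run up to the first yield, which waits for a token
	}

	// Plays the role of resume (method, args).
	public void Feed (string t)
	{
		token = t;
		machine.MoveNext ();
	}

	static bool IsIdentifier (string t)
	{
		return t != null && t.Length > 0 && Char.IsLetter (t [0]);
	}

	// Each "yield return null" means: suspend here until the next token arrives.
	IEnumerator<object> Run ()
	{
		while (true){
			yield return null;
			if (token != "<"){
				Failed = true;
				yield break;
			}
			yield return null;
			if (!IsIdentifier (token)){
				Failed = true;
				yield break;
			}
			yield return null;
			if (token != ">"){
				Failed = true;
				yield break;
			}
			TagsSeen++;
		}
	}
}
```

Feeding "<", "a", ">" to a fresh TagMatcher leaves TagsSeen at 1; the state variable and the switch are gone, and the position in the grammar is simply the position in the method. Passing the token through a field is clumsy, which is why a real resume with arguments would be nicer.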

But the same sort of patterns exist in GUI applications: consider, for example, the state handling of user input while doing region selection (or the other hundred states tracked by Gnumeric).

Extending C# string support

A few things I would like to see in C#. These do not even require changes to the language, only to the String class:

class String {
	// 
	// Returns the string STR replicated COUNT times.
	//
	static string operator * (String str, int count);

	//
	// Splits the string using the provided separator
	//
	static string [] operator / (String str, String sep);
	static string [] operator / (String str, char c);

	//
	// Extracts a range of characters, notice that String.Substring is annoying
	//
	string this [int start, int end];
}

string s = "Hello World";

print (s [0, 4]);  	// prints "Hello"
print (s [-2, -1]);	// prints "ld"
print (s * 2);		// prints "Hello WorldHello World"
string [] j = s / " ";  // returns {"Hello", "World"}
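Until something like this lands in String itself, the idea can be tried out with a wrapper, since user code cannot add operators to System.String. This is a sketch only: the Str type is invented here, and it assumes that both ends of the range are inclusive and that negative indexes count from the end of the string.

```csharp
using System;
using System.Text;

// Hypothetical wrapper around string that carries the proposed operators.
struct Str
{
	readonly string s;

	public Str (string s)
	{
		this.s = s;
	}

	// Returns the string replicated COUNT times.
	public static string operator * (Str str, int count)
	{
		StringBuilder sb = new StringBuilder ();
		for (int i = 0; i < count; i++)
			sb.Append (str.s);
		return sb.ToString ();
	}

	// Splits the string using the provided separator.
	public static string [] operator / (Str str, string sep)
	{
		return str.s.Split (new string [] { sep }, StringSplitOptions.None);
	}

	// Extracts the characters from START to END, both inclusive (an assumption);
	// negative indexes count from the end of the string (also an assumption).
	public string this [int start, int end] {
		get {
			if (start < 0)
				start += s.Length;
			if (end < 0)
				end += s.Length;
			return s.Substring (start, end - start + 1);
		}
	}
}
```

With `Str s = new Str ("Hello World");`, `s [0, 4]` gives "Hello", `s [-2, -1]` gives "ld", `s * 2` gives "Hello WorldHello World", and `s / " "` gives {"Hello", "World"}.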

Wikis

I have been fascinated by the Gtk# Wiki.

Posted on 22 Apr 2003