Iterators and Efficient Use of the Thread Pool

The new HTTP application pipeline in Mono is in my opinion a beautiful work of art. Let me explain.

The HTTP application pipeline processes an incoming HTTP request. When a request comes in, it has to go through a number of steps before it is actually handed over to the developer's code: authentication, authorization, cache lookup and session state acquisition. A similar set of steps is processed after your code completes.

The runtime by default provides a few modules. These are listed in the machine.config file in the <httpModules> section. Mono by default includes the FormsAuthentication, UrlAuthorization, Session and OutputCache modules. These hook up to one or more of the stages in the application pipeline.

These stages are part of the HttpApplication class. You typically have one of these per "directory" where you have deployed an ASP.NET application. As a developer, you can hook up to various points in the processing pipeline. For example, you could add your own authentication system by listing a dynamic module in the web.config file, and then hooking up to the processing pipeline:

	app.AuthenticateRequest += my_authenticator;

Now, typically when processing a request the various hooks are invoked synchronously: one after another. But in some cases your hook code might want to perform a lengthy operation, for example contacting a remote authentication server. Instead of blocking the executing thread, you want to queue the work using the CIL asynchronous framework and release the current thread from its duties so it can be used to process another request. To do this, you must register your hook using a different API:

	app.AddOnAuthenticateRequestAsync (begin_authentication, end_authentication);

Here begin_authentication is the method that will initiate the authentication asynchronously, and end_authentication is the method that will be invoked when the asynchronous operation completes. As I said before, while the asynchronous operation is pending, the thread should be returned to the thread pool so it can handle the next request. When the asynchronous operation completes, the request is queued for execution again, and when a thread becomes available it will execute the completion routine and resume processing.

The challenge is that you might have both synchronous and asynchronous hooks in a given application and also that at any stage the pipeline can be stopped (for example if authorization failed).

The challenge, then, was to come up with a maintainable and clean design for the application pipeline. This is where iterators came into play. The first step to keep things simple was to treat all asynchronous registrations as synchronous at the event layer:

	//
	// AsyncInvoker is merely a wrapper class to hold the `b' and
	// `e' events, it does not actually invoke anything.
	// 
	public void AddAsync (BeginEventHandler b, EndEventHandler e)
	{
		AsyncInvoker invoker = new AsyncInvoker (b, e);
		Hook += new EventHandler (invoker.Invoke);
	}

The Pipeline is written like this:

	IEnumerator Pipeline ()
	{
		if (Authentication != null)
			foreach (bool stop in RunHooks (Authentication))
				yield return stop;

		if (Authorization != null)
			foreach (bool stop in RunHooks (Authorization))
				yield return stop;

		[...]
		
		done.Set ();
	}

Now the trick is to implement RunHooks, which takes a list of events. Here it is:

	IEnumerable RunHooks (Delegate list)
	{
		Delegate [] delegates = list.GetInvocationList ();

		foreach (EventHandler d in delegates){
		if (d.Target != null && d.Target is AsyncInvoker){
				AsyncInvoker ai = (AsyncInvoker) d.Target;

				ai.begin (this, EventArgs.Empty, resume, ai);
				yield return false;
			} else 
				d (this, EventArgs.Empty);

			if (stop_processing)
				yield return true;
		}
	}

Notice that we are basically using nested yield-based enumerators. The value yielded from RunHooks indicates whether the pipeline must be stopped (true) or not (false). RunHooks executes as many synchronous operations as it can, in order, until it finds an asynchronous operation. At that point it initiates the operation by calling the "begin" method and then yields control. Control is transferred to the Pipeline method, which also yields and returns control to the caller.
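The same nested-iterator trick translates directly to Python generators. Here is a small sketch of the idea, not the Mono code; all names (run_hooks, pipeline, execute, the state dictionary, the is_async marker) are made up for illustration:

```python
def run_hooks(hooks, state):
    """Walk one stage's hooks. Sync hooks just run; a hook marked
    is_async is started and we yield False so the current thread can
    go do other work until a completion callback resumes us.
    Yield True when a hook asks the whole pipeline to stop."""
    for hook in hooks:
        if getattr(hook, "is_async", False):
            hook(state)      # begin the asynchronous operation
            yield False      # pause here; completion restarts the driver
        else:
            hook(state)
        if state["stop"]:
            yield True       # a hook aborted the pipeline

def pipeline(stages, state):
    # The outer iterator: bubble each pause/stop up to the driver.
    for hooks in stages:
        for stop in run_hooks(hooks, state):
            yield stop
    state["log"].append("done")

def execute(pipe):
    """One scheduling step, like the Execute method in the post:
    advance the pipeline a single step and report what happened."""
    try:
        stop = next(pipe)
    except StopIteration:
        return "completed"
    return "stopped" if stop else "pending"
```

A run with one synchronous hook followed by one "asynchronous" hook pauses at the async hook (execute returns "pending"), and a second call to execute, standing in for the completion callback, drives the remaining stages to the end.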

The pipeline is kicked into action by:

	void Start (object callback)
	{
		done.Reset ();
		pipeline = Pipeline ();

		Execute ();
	}

Now the actual processing engine lives in the "Execute" method:

	void Execute ()
	{
		if (pipeline.MoveNext ())
			if ((bool)pipeline.Current){
				Console.WriteLine (prefix + "Stop requested");
				done.Set ();
			}
	}

	// This restarts the pipeline after an async call completes.
	void resume (IAsyncResult ar)
	{
		AsyncInvoker ai = (AsyncInvoker) ar.AsyncState;
		if (ai.end != null)
			ai.end (ar);

		Console.WriteLine (prefix + "Completed async operation: {0}", ar.GetType ());
		Execute ();
	}

Execute uses the IEnumerator interface directly, while the Pipeline method uses the convenient foreach statement to iterate over every step.
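The same division of labor exists with Python generators, to pick a language with an equivalent construct (the names here are illustrative, not from the Mono source): ordinary consumers use the for statement, while a driver like Execute advances the iterator explicitly, one step per call:

```python
def steps():
    # A trivial stand-in for the Pipeline iterator.
    yield "authenticate"
    yield "authorize"

# foreach-style: the for statement hides the MoveNext/Current calls.
for step in steps():
    print(step)

# driver-style, like Execute: advance explicitly, one step per call,
# so control returns to the caller between steps.
it = steps()
print(next(it))   # prints "authenticate"
```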

The complete sample can be obtained here. This is the prototype I used as a proof of concept. The actual implementation (with different routine names) that we landed in System.Web is more complete and is available here (you must scroll down).

At 1,000 lines of code for the full file, it's one of the clean and small hacks that I am most proud of.

Posted on 28 Aug 2005 by Miguel de Icaza
This is a personal web page. Things said here do not represent the position of my employer.