Tuning Code Generation

by Miguel de Icaza

The Mono team is working on a number of directions to improve code generation. Although Massi has worked on a number of advanced optimizations to improve code generation (based on SSA and HSSA), the improvements were dominated by problems on the register allocation.

We are working in two directions to fix this issue. A long-term fix is the new linear IR representation which Zoltan is working on. This code today is available in the mini-linear-il branch on SVN. The intention of this new IR is to improve the visibility of register usage to the register allocator, so it can make better choices when producing code.

A second line of work emerged from our research into inlining. Turning on the Mono inliner produced worse code than not turning on the inlining at all (we have it turned off by default on Mono). The problem was that inlining produced plenty of local variables and the resulting IR became very hard to optimize by the higher layers of the JIT engine so Massimiliano devised a plan to eliminate these extra temporary variables.

The details of this new optimization can be found on his mailing list post "Tree Mover".

What follows are a few excerpts from the email. Consider this piece of C# code:

	private int f;
	public int F {
	        get {
	                return f;
	        set {
	                f = value;
	static int Inline (int a) {
	        return a * 4;
	public override int Cprop (int v) {
	        return Inline (F + v);

The IR produced with the method "Cprop" is this:

	 (outarg (ldind.i arg[0]))
	 (stind.i4 local[2] call[get_F])
	 (outarg (add (ldind.i4 local[2]) (ldind.i4 arg[1])))
	 (setret call[Inline])

The problem happens when we inline the call to "Inline", which turns the IR into this:

	 (stind.ref local[3] (ldind.i arg[0]))
	 (stind.i4 local[2] (ldind.i4 (add (ldind.ref local[3]) iconst[8])))
	 (stind.i4 local[4] (ldind.i4 local[2]))
	 (stind.i4 local[6] (add (ldind.i4 local[4]) (ldind.i4 arg[1])))
	 (stind.i4 local[5] (mul (ldind.i4 local[6]) iconst[4]))
	 (stind.i4 local[4] (ldind.i4 local[5]))
	 (setret (ldind.i4 local[4]))

The extra variables introduce the extra pressue on the register allocator, which today produces pretty bad code (see Massi's original post for the details).

What Massi's new tree mover code does is remove the unused local variables from the previous representation:

	 (setret (shl (add (ldind.i4 (add (ldind.ref arg[0]) iconst[8]))
	 (ldind.i4 arg[1])) iconst[2]))

That is a much better tree to cope with in the code generation phase, and produces nice and compact assembly code.

Read his post "Tree Mover" for details for the actual assembly code produced before and after and benchmark information.

Posted on 24 Jan 2006