The Mono team is working on a number of directions to
improve code generation. Although Massi has worked on a
number of advanced optimizations to improve code generation
(based on SSA and HSSA), the improvements were dominated by
problems on the register allocation.
We are working in two directions to fix this issue. A
long-term fix is the new linear IR representation which Zoltan
is working on. This code today is available in the
mini-linear-il branch on SVN. The intention of this new IR
is to improve the visibility of register usage to the register
allocator, so it can make better choices when producing code.
A second line of work emerged from our research into
inlining. Turning on the Mono inliner produced worse code
than not turning on the inlining at all (we have it turned off
by default on Mono). The problem was that inlining produced
plenty of local variables and the resulting IR became very
hard to optimize by the higher layers of the JIT engine so
Massimiliano devised a plan to eliminate these extra temporary
variables.
The details of this new optimization can be found on his
mailing list post "Tree
Mover".
What follows are a few excerpts from the email. Consider
this piece of C# code:
private int f;
public int F {
get {
return f;
}
set {
f = value;
}
}
static int Inline (int a) {
return a * 4;
}
public override int Cprop (int v) {
return Inline (F + v);
}
The IR produced with the method "Cprop" is this:
(outarg (ldind.i arg[0]))
(stind.i4 local[2] call[get_F])
(outarg (add (ldind.i4 local[2]) (ldind.i4 arg[1])))
(setret call[Inline])
The problem happens when we inline the call to "Inline",
which turns the IR into this:
(stind.ref local[3] (ldind.i arg[0]))
(stind.i4 local[2] (ldind.i4 (add (ldind.ref local[3]) iconst[8])))
(stind.i4 local[4] (ldind.i4 local[2]))
(stind.i4 local[6] (add (ldind.i4 local[4]) (ldind.i4 arg[1])))
(stind.i4 local[5] (mul (ldind.i4 local[6]) iconst[4]))
(stind.i4 local[4] (ldind.i4 local[5]))
(setret (ldind.i4 local[4]))
The extra variables introduce the extra pressue on the
register allocator, which today produces pretty bad code (see
Massi's original post for the details).
What Massi's new tree mover code does is remove the unused
local variables from the previous representation:
(setret (shl (add (ldind.i4 (add (ldind.ref arg[0]) iconst[8]))
(ldind.i4 arg[1])) iconst[2]))
That is a much better tree to cope with in the code
generation phase, and produces nice and compact assembly code.
Read his post "Tree
Mover" for details for the actual assembly code produced
before and after and benchmark information.