VMs

Guido recently quoted a conversation we had about Parrot. I said to him that the Parrot VM design was based on ideology (Guido quoted me as saying religion ;-).

One of my favorite books is Computer Architecture: A Quantitative Approach.

The Parrot design is based on ideology because some of the core design considerations are just that: `real machines use registers, hence register-based intermediate code is faster'. An ideology-based design is one where the design decisions are driven by punch lines and not by a careful and quantitative study of the problem at hand.

Mono implements the .NET virtual execution system. This execution system uses a stack-based instruction set to transport code. Mono never actually sees the stack operations, because we treat the stack-based operations as what they are: a serialization format for a tree representation of the original code.

So for example, if you have an operation like:

		a = b + c;

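The code can be thought of as a tree: an assign node whose children are a and an add node, and the add node in turn has b and c as children.

A possible serialization of this tree is easily expressed by a Lisp-like expression: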

	(assign a (add b c))

Another possible serialization is using bytecodes for a stack machine:

	ldloc b
	ldloc c
	add
	stloc a
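
To make the two serializations concrete, here is a minimal Python sketch. It is not Mono code: the tuple representation of the tree and the helper names (`to_sexpr`, `to_stack_code`, `from_stack_code`) are assumptions made for illustration, and only the `ldloc`, `add` and `stloc` operations from the example are handled. It writes one tree out in both forms and then rebuilds the tree from the stack form by simulating the evaluation stack symbolically.

```python
# Illustrative only: a tree is a nested tuple such as
# ("assign", "a", ("add", "b", "c")); leaves are local variable names.

def to_sexpr(node):
    """Serialize a tree as a Lisp-like expression."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


def to_stack_code(node):
    """Serialize a tree as a list of stack-machine instructions."""
    if isinstance(node, str):
        return ["ldloc " + node]
    op = node[0]
    if op == "assign":
        _, target, value = node
        return to_stack_code(value) + ["stloc " + target]
    if op == "add":
        _, left, right = node
        return to_stack_code(left) + to_stack_code(right) + ["add"]
    raise ValueError("unknown node: " + op)


def from_stack_code(instructions):
    """Rebuild the trees by pushing and popping subtrees instead of values."""
    stack, trees = [], []
    for ins in instructions:
        parts = ins.split()
        if parts[0] == "ldloc":
            stack.append(parts[1])
        elif parts[0] == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(("add", left, right))
        elif parts[0] == "stloc":
            trees.append(("assign", parts[1], stack.pop()))
        else:
            raise ValueError("unknown instruction: " + ins)
    return trees


tree = ("assign", "a", ("add", "b", "c"))
code = to_stack_code(tree)
print(to_sexpr(tree))                  # (assign a (add b c))
print(code)                            # ['ldloc b', 'ldloc c', 'add', 'stloc a']
print(from_stack_code(code) == [tree]) # True
```

Each store closes off one tree, so a longer method body decodes into a list of trees rather than a single one.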

The above set of stack instructions can be easily decoded back into its original tree form. This is what the Mono JIT does: it transforms its input CIL bytecodes into various trees like this (we call these the forest of trees):

(stind.i4 regoffset[0xfffffff8(%ebp)] (add (ldind.i4 regoffset[0xfffffff4(%ebp)]) 
				      (ldind.i4 regoffset[0xfffffffc(%ebp)])))

The above is the debugging output that renders our C-based trees. We then use a code generator generator to transform the high-level operations (not listed in this example) into a list of low-level instructions:

        1  loadi4_membase R9 <- %ebp
        2  loadi4_membase R10 <- %ebp
        3  add R8 <- R9 R10 clobbers: 1
        4  move %eax <- R8
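
The general idea of turning a tree into a flat list of instructions over virtual registers can be sketched in a few lines of Python. This is only an illustration and not how Mono does it: Mono's instruction selector is generated from rules, whereas this is a hand-written post-order walk. The `lower` function, the tuple node format and the `load`/`add` mnemonics are made up for the example.

```python
import itertools


def lower(tree):
    """Flatten an expression tree into three-address instructions.

    Every node gets a fresh virtual register (R1, R2, ...); mapping those
    onto real machine registers would be a later pass, not shown here.
    """
    counter = itertools.count(1)
    out = []

    def visit(node):
        op = node[0]
        if op == "load":
            reg = "R%d" % next(counter)
            out.append("load %s <- %s" % (reg, node[1]))
            return reg
        if op == "add":
            # Children first, so their results exist before the add is emitted.
            left = visit(node[1])
            right = visit(node[2])
            reg = "R%d" % next(counter)
            out.append("add %s <- %s %s" % (reg, left, right))
            return reg
        raise ValueError("unknown node: " + op)

    result = visit(tree)
    return out, result


instructions, result = lower(("add", ("load", "b"), ("load", "c")))
for line in instructions:
    print(line)
# load R1 <- b
# load R2 <- c
# add R3 <- R1 R2
```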

Depending on the optimizations turned on, there might be other steps involved, but the end result is:

	mov    0x8(%ebp),%eax
	mov    0xc(%ebp),%ecx
	add    %ecx,%eax

(Actually, I had to trick the JIT compiler into doing this, because dead-code elimination and inlining in Mono eliminate redundant code).

Scripting languages

Now, I am interested in this debate, because I think that Mono and the .NET VM are ideal virtual machines for scripting languages. Since the .NET folks have done a fair amount of work on the interop issues (The Common Language Specification) and it is a fairly advanced virtual machine, my interest is in making Mono a good host for those languages.

It has been said that `.NET works great for static languages, but not for scripting'. That is not true: VB.NET and JScript are both dynamic languages that happen to work just fine on the .NET Framework. And we are convinced that the virtual machine can be improved.

The main issue we have today with Mono and scripting languages is that nobody has done a quantitative study of what the performance problems of running a dynamic language in the .NET/Mono CLR are. Without an attempt to write a native compiler for the platform and to study the problems, it is not possible to solve them.

In one particular case (Lisp), we know that implementors will likely want to structure their code (or their generated code) like this:

class Cell {
	Atom head;
	Cell tail;
}

void Method (object obj)
{
	if (obj is Cell){
		...
	} else if (obj is Atom){
		...
	}
}
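
To see why the type test ends up on the hot path, here is a small Python sketch of the same shape of code, with `isinstance` playing the role of `is`. The `Atom` and `Cell` classes and the `count_atoms` function are made up for illustration, and the sketch assumes that the head of a cell may itself be a cell so that lists can nest. Every step of the walk performs one or two type tests before doing any real work.

```python
class Atom:
    def __init__(self, name):
        self.name = name


class Cell:
    def __init__(self, head, tail=None):
        self.head = head  # an Atom, or a nested Cell (assumption for this sketch)
        self.tail = tail  # the next Cell, or None at the end of the list


def count_atoms(obj):
    """Count the atoms reachable from obj, testing the type at every step."""
    if isinstance(obj, Cell):
        return count_atoms(obj.head) + count_atoms(obj.tail)
    if isinstance(obj, Atom):
        return 1
    return 0  # None marks the end of a list


# The list (a (b c))
nested = Cell(Atom("a"), Cell(Cell(Atom("b"), Cell(Atom("c")))))
print(count_atoms(nested))  # 3
```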

So what they need in this particular case is a fast implementation of the `is' operator (the `isinst' instruction). For this, we made a proposal.

Posted on 22 Jul 2003 by Miguel de Icaza
This is a personal web page. Things said here do not represent the position of my employer.