Language Shootout

Alvaro points out that in the Language Shootout benchmark, Mono comes in 18th place compared to Java's 10th place.

We know that Sun's proprietary Java edition (not the open source one, as that one is nowhere to be found yet) is faster than Mono, but I was surprised that we were so far behind. So I looked at the comparison between Java 6 and Mono.

In memory usage we mostly come out ahead, but there were two tests where Sun's server VM seriously outperformed Mono (by 5x or more): regex-dna and pidigits.

The regex-dna test exercises the regular expression engine in the class libraries; it measures the library implementation rather than the performance of the language or the VM. Clearly, our Regex implementation could use some work.
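To make concrete what this test stresses, here is a minimal Java sketch of its core step: stripping a FASTA input down to the raw sequence and counting matches of "variant" patterns with java.util.regex. It is an illustration under assumptions, not the benchmark's program: the class and method names are hypothetical, only three patterns modeled on the benchmark's variant list are shown, the input is a tiny inline string instead of the large benchmark file, and the later substitution phase is omitted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the variant-counting step of regex-dna.
class RegexDnaSketch {
    // A few alternations of the kind the benchmark uses (not the full list).
    static final String[] VARIANTS = {
        "agggtaaa|tttaccct",
        "[cgt]gggtaaa|tttaccc[acg]",
        "a[act]ggtaaa|tttacc[agt]t"
    };

    // Remove FASTA header lines (those starting with '>') and all newlines.
    static String clean(String fasta) {
        return fasta.replaceAll(">.*\n|\n", "");
    }

    // Count non-overlapping matches of a pattern in the sequence.
    static int count(String seq, String regex) {
        Matcher m = Pattern.compile(regex).matcher(seq);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = ">ONE test\nagggtaaacattt\ngtttaccctcg\n";
        String seq = clean(input);
        for (String v : VARIANTS) {
            System.out.println(v + " " + count(seq, v));
        }
    }
}
```

Almost all the work in this loop happens inside `Matcher.find()`, so a slow result here points at the regex engine in the class libraries rather than at the VM.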

The pidigits test showed Java as 6x faster than Mono. But the test is essentially comparing C# against assembly language: the Mono version uses a BigInteger implemented entirely in C#, while the Java version calls into GMP, a C library tuned with hand-written assembly.

I ported the Java pidigits program to C# so that it also uses native GMP, and the results are promising: we now get a 4.7x speedup and the process is one megabyte smaller. I was unable to test the Java version on my machine, as I could not find the native "libjgmp" library.

I wonder what the shootout's policy is on using external libraries. If it's OK, I should contribute my port; if it's not, the Java test should be rewritten as a fully managed implementation.
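For illustration, a fully managed Java version could use java.math.BigInteger instead of GMP. The sketch below is hedged: it follows the spigot algorithm as the benchmark's pidigits programs are generally written (an accumulator, a numerator and a denominator, folding in one series term at a time and emitting a digit once the quotients for 3 and 4 agree), but the class and method names are hypothetical and this is not the program submitted to the shootout.

```java
import java.math.BigInteger;

// Hypothetical fully managed pidigits sketch using only java.math.BigInteger.
class PiDigitsSketch {
    private BigInteger num = BigInteger.ONE;
    private BigInteger acc = BigInteger.ZERO;
    private BigInteger den = BigInteger.ONE;

    // floor((num * nth + acc) / den); only called when acc >= num > 0,
    // so the operands are positive and truncating division equals floor.
    private int extractDigit(int nth) {
        return num.multiply(BigInteger.valueOf(nth)).add(acc).divide(den).intValue();
    }

    // Remove the emitted digit and shift the state by one decimal place.
    private void eliminateDigit(int d) {
        acc = acc.subtract(den.multiply(BigInteger.valueOf(d))).multiply(BigInteger.TEN);
        num = num.multiply(BigInteger.TEN);
    }

    // Fold in term k of the series (uses the old num when updating acc).
    private void nextTerm(int k) {
        BigInteger k2 = BigInteger.valueOf(2L * k + 1);
        acc = acc.add(num.shiftLeft(1)).multiply(k2);
        den = den.multiply(k2);
        num = num.multiply(BigInteger.valueOf(k));
    }

    // Returns the first n decimal digits of pi as a string.
    static String digits(int n) {
        PiDigitsSketch s = new PiDigitsSketch();
        StringBuilder out = new StringBuilder();
        int k = 0;
        while (out.length() < n) {
            s.nextTerm(++k);
            if (s.num.compareTo(s.acc) > 0) continue;   // not enough precision yet
            int d = s.extractDigit(3);
            if (d != s.extractDigit(4)) continue;       // next digit not settled yet
            out.append((char) ('0' + d));
            s.eliminateDigit(d);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(digits(10)); // 3141592653
    }
}
```

Every step is a big-integer multiply or divide, so the running time is essentially the speed of the BigInteger implementation; that is why swapping a managed BigInteger for GMP moves the results so much.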

If you run all the tests, the gap between Java and Mono shrinks from 8 places to 3; if you remove the two bad tests (our Regex implementation and the pidigits test), Mono is only one slot behind the Java server VM; and if you also factor in memory usage (while still counting all the tests), Mono comes out ahead of Java.

Of course, we have homework to do: why is our Regex implementation so much slower?

Update: Mario Sopena points out that another 25% performance improvement can be achieved by making the two implementations similar: the C# program does a lot more regex work than the Java one. The Python implementation has further refinements to the algorithm that could improve performance even more.

Some Observations

It is interesting to see the progression in the benchmarks:

  • Close-to-the-metal languages form the first tier (C, C++, D, Pascal, even Eiffel).
  • Compiled functional languages come next (OCaml, ML, Haskell, Lisp).
  • Java and Mono are on the next tier.
  • A big jump comes next: Erlang, Smalltalk, MzScheme.
  • Next big jump: Pike, Perl, Python.
  • Another jump: PHP, JavaScript.
  • Tcl: a class of its own.
  • Ruby: the last one.

There are a few oddities, like Fortran being in the same tier as Java and Mono, which probably means the Fortran tests have not been tuned; I would expect it to be in the same tier as C.

I am also surprised by Ruby being last on the list. I expected it to be roughly in the same range as Python, so I suspect that the Ruby tests have not been tuned either. Update: my readers also point out that Ruby 1.9 will improve things.

Update: I just noticed that Eiffel is on the first tier performance-wise, yet has pretty much all the features of the third tier (garbage collection, strong typing, bounds checking). This means that with Eiffel you get the best of both worlds (and Eiffel's compiler is now also open source).

Language Productivity

And of course, at the end of the day, what matters is how productive you are writing code in a language. Wikipedia is powered by PHP, Amazon by lots of Perl and C, Google uses Python extensively, and the stellar productivity that can be achieved with Ruby on Rails is hard to match. So even if your language is slower than the first few tiers, for many developers and sites deploying software what matters is productivity.

Choosing between Mono's C# and Java, two languages roughly in the same class, comes down to the libraries that you use, the ecosystem where the code will be developed and deployed, and to some extent the language itself.

Alvaro's teammates at Sun have a difficult challenge ahead of them when it comes to the language: how to fix a language that has been so badly bruised by their generics implementation, their refusal to acknowledge delegates, the ongoing saga over the catastrophic closure proposals [1] and the lack of a strong language designer to lead Java into the future.

So even if our regular expression engine is slow, we have working closures, iterators, events, the lock and using statements, and LINQ in the language.

Of course, I wish them the best of luck (in the end, Mono is a language-independent VM, and we are just as happy to run C# code as we are to run Java code, which, incidentally, just reached another milestone), and we certainly plan on learning from the open source Java VM source code.

Alternatively, you can use Mainsoft's Grasshopper to write C# code, but have it run on a Java VM.

[1] I am tempted to write a post about the mistakes in both Java closure proposals. But it seems like it would be a waste of time; there appears to be too much hatred/NIH towards C# in that camp for any real progress.

Posted on 28 Dec 2007 by Miguel de Icaza
This is a personal web page. Things said here do not represent the position of my employer.