Botijos in New York City

by Miguel de Icaza

My friend Karla Frechilla will be presenting her artwork in New York City starting this Saturday until the 15th.

There will be an opening reception on Tuesday December 4th from 6 to 8 pm at Jadite Galleries 413 W 50th St in New York City.

She will be presenting her Botijos as well as some of her paintings.

Posted on 30 Nov 2007

Generics Memory Usage

by Miguel de Icaza

Aaron walked into my office today to get some some feedback on some implementation detail for his new listview in Banshee.

Before he left the office he said something like, but not necessarily "In some Microsoft blog someone commented that I should not use generics for small arrays of value types" (see Update at the bottom).

So we decided to measure with a trivial program the memory consumption for storing small arrays with and without generics with Mono 1.2.5 and Mono 1.2.6.

Storing 8 megs of ints (32-megs of data) on an array of objects has a high overhead: the actual data, the box, the object lock which means that you end up using about 21.5 bytes per int:

		object [] e = new object [size];

		for (int i = 0; i < size; i++)
			e [i] = 1;

With a generic array, you are as close as possible to a real array and you only consume 38 megs of ram (this is the full process size, the 32 meg array plus the Mono runtime, JIT, etc), the following sample ensures that am not using a regular int array, but an instantiated generic class with ints:

		public class D<T> {
			public T [] t;
			public D (int size)
				t = new T [size];

		D<int> d = new D<int> (size);
		for (int i = 0; i < size; i++)
			d.t [i] = 1;

The regular collection consumes 178 megs of ram, while the generics collection consumes 38 megs of ram (when running with Mono).

I was a bit shocked about the 178 megs of ram used by regular object wrappers, so I wrote the equivalent for Java, to see how they fared compared to Mono:

	      Object [] x = new Object [8*1024*1024];

	      for (int i = 0; i < 8*1024*1024; i++)
                     x [i] = new Integer (i);

Java uses 248 megs of ram, so it is chubbier than regular C# at 30 bytes per boxed int on average (this was with Sun's Java 1.6.0, maybe there are newer versions, but thats what I got on my system).

I got no Java/generics skills to implement the above with Java, but since Java does not really have generics at the VM level (they are implemented purely as a language-level feature), I do not think that the numbers would be significantly different).

Mono 1.2.6 also introduces a number of memory reduction features for generics that reduce the size of our interfaces. When using generics, in 1.2.5 on a List<int> case we were generating a lot of useless stuff:

IMT tables size: 7752
IMT number of tables: 102
IMT number of methods: 2105
IMT used slots: 872
IMT colliding slots: 486
IMT max collisions: 27
IMT methods at max col: 134
IMT thunks size: 19060

With the upcoming 1.2.6 release the memory savings for the metadata kept resident by the runtime are also significant:

IMT tables size: 7752
IMT number of tables: 102
IMT number of methods: 4
IMT used slots: 2
IMT colliding slots: 1
IMT max collisions: 3
IMT methods at max col: 3
IMT thunks size: 34

There is still an issue of locality. Using the boxing collections has the advantage that the same code is shared across all possible types. The generic versions on the other hand get their own JITed versions of every method involved (at least today).

You can track Mark's progress to change this as he continues with our implementation for generic code sharing.

Summary: From a memory consumption point of view, the bottom line is: if you are storing value types (non-object values like ints, longs, bools) it is better to use generic collections (System.Collections.Generic) than using the general-purpose collections (System.Collections). If you are just going to store references to objects, there is really no difference.

Update: The comment was from Rico Mariani, and the source of the confusion was:

List<T> offers generally better performance than ArrayList as well as type safety. We recommend using List<T> over ArrayList always if T is a reference type. If T is a value type there is additional cost assocated with creating a specialized version of List<T> for that value type. When T would be a value type we recommend List<T> if the savings in boxing is greater than the cost of the additional code -- this tends to happen if you store about 500 elements across all your List<T> objects.

OK, so the confusion is not that it might be worse for value types, but that the JIT will have to generate specific instantiations of generic methods (Insert for example) based on the parameter type.

So the JIT generates separate code for a List.Insert of int and for a List.Insert of long. Considering the savings for even small apps, I will go with "Go wild with the value types" myself as the code for the collection code is really small.

Posted on 21 Nov 2007

Javascript Decompressor Roundup

by Miguel de Icaza

Thanks to everyone that emailed me answers to the Javascript decompressor issue. This is a reply in case other people are looking at ways of de-obfuscating or have to debug some compressed Javascript code.

I included the names of the nice folks that emailed me, and some comments for those that I actually tried out.

Some annotations:

  • [CMD] works from the command line, my favorite kind.
  • [WEB] Provides a web UI.
  • [GUI] GUI
  • [SOURCE] comes with source code.
  • [WINDOWS] Windows-only

Here they go:

  • [WEB, CMD, SOURCE] Beautify is a PHP-based, server-side decompressor. This was the one I used to debug some of the problems we were having, and the results are very good. Source code is available for you to run on your local server, or you can reuse it from the command line (some assembly required for command-line use).
  • [CMD, SOURCE]Qooxdoo's pretty printer written in Python (Fabian Jakobs, Sebastian Werner), to use:
    $ INSTALL_PATH/frontend/framework/tool/modules/ file.js

    Alternatively, you can get only the pretty printer from SVN:

    $ svn co
    $ modules/ -w originalFile.js
  • [WEB, SOURCE] Beautify.aspx. Get the source code as the link from their web page is broken (Jokin Cuadrado, Steven Coffman).
  • [GUI, WINDOWS] A Plug-in for Fiddler: Fiddler is a Windows HTTP debugger, if you are using Windows and Fiddler this plugin might be for you (Jokin Cuadrado, Steinar Herland).

If you are a VIM user, this VIM script provides Javascript indentation. but it seems like a lot of work for general-purpose decompression of javascript (Kjartan Maraas).

If you feel that none of the above is good for you and you want to prepare for your interview at Google, Jeff Walden suggests a hard-core approach:

One of the less well-known aspects of SpiderMonkey, Mozilla's C JavaScript engine, is that it includes a decompiler which translates from SpiderMonkey bytecode to JavaScript (most people only use it the other way around). You can see it at work any time you convert a function to a string. Most JavaScript engines, when asked to convert a function to a string, do one of two things: return the exact source text (I believe IE does this, but I haven't double-checked), or return a string provides the minimum ECMAScript requires -- that the string have the syntax of a function declaration, i.e. that it be be evaluable to create a function (I think this is what Safari does). SpiderMonkey's choice to eliminate the overhead of storing source text after converting means that it can't do the former, and the latter is unpalatable from a developer standpoint. Instead, it decompiles the bytecode back to a JavaScript string representing the function as exactly as possible, while at the s ame time formatting the decompiled source to be reasonably readable. How would you use SpiderMonkey to reformat obfuscated source? First, you get a copy of SpiderMonkey:
  export CVSROOT=:pserver:[email protected]:/cvsroot
  cvs co mozilla/js/src
  cd mozilla/js/src
  make -f Makefile.ref clean && make -f Makefile.ref  # work around broken dependency system
  .obj/js # to run the interpreter

Next, you dump the JS code you want to reformat into a function, and you have SpiderMonkey pretty-print it:

  echo "function container() {" > obfuscated.js
  cat file-to-clean-up.js >> obfuscated.js
  echo "} print(container.toString());" >> obfuscated.js
  path/to/js -f obfuscated.js

SpiderMonkey will then print the container function's string representation, adjusting indentation and such to create a readable, if still name-obfuscated, version.

A couple things to know about this: first, SpiderMonkey doesn't pretty-print functions found in expression context:

  (function() {
     print("this won't get cleaned up");
  call_method(function() {
    print("this will probably be crunched to one line");
    print("not pretty-printed");

These examples are converted (once stripped of the containing function) to:

  (function () {print("this won't get cleaned up");}());
  call_method(function () {print("this will probably be crunched to one line");print("not pretty-printed");});

The former pattern's become fairly common for reducing namespace collisions (unfortunately for the decompiler), and the latter's become more popular as the functional aspects of JavaScript have been more played up recently in libraries. For now at least I think you just have to tweak the original source file to fix these problems. The decompiler could do a better job on these given some changes, but I don't see this happening any time soon. The decompiler is generally agreed to be one of the hairiest and least-well-understood pieces of code in SpiderMonkey, and people don't touch it that often.

Incidentally, the decompiler is also what allows SpiderMonkey to give the informative error messages it gives when your code throws an uncaught exception; the error messages I've seen in any other JavaScript interpreter are woefully less useful than the ones SpiderMonkey gives you using the decompiler.

Posted on 16 Nov 2007

RayTracing in one LINQ statement

by Miguel de Icaza

Through Don Syme's blog I read about Luke Hoban moving from the C# team at Microsoft to the F# team, I did not know about Luke's blog until now, it is a fantastic collection of cool C# 3 nuggets.

One of the things that impressed me the most is a recent sample he posted.

Luke implements a RayTracer in one line LINQ code. This test is insane (and sadly, our C# compiler is not yet able to handle that kind of complexity yet for LINQ statements). I reproduce it here (directly copy-pasted from Luke's blog):

var pixelsQuery =
  from y in Enumerable.Range(0, screenHeight)
  let recenterY = -(y - (screenHeight / 2.0)) / (2.0 * screenHeight)
  select from x in Enumerable.Range(0, screenWidth)
    let recenterX = (x - (screenWidth / 2.0)) / (2.0 * screenWidth)
    let point = Vector.Norm(Vector.Plus(scene.Camera.Forward, 
Vector.Plus(Vector.Times(recenterX, scene.Camera.Right), Vector.Times(recenterY, scene.Camera.Up)))) let ray = new Ray { Start = scene.Camera.Pos, Dir = point } let computeTraceRay = (Func<Func<TraceRayArgs, Color>, Func<TraceRayArgs, Color>>) (f => traceRayArgs => (from isect in from thing in traceRayArgs.Scene.Things select thing.Intersect(traceRayArgs.Ray) where isect != null orderby isect.Dist let d = isect.Ray.Dir let pos = Vector.Plus(Vector.Times(isect.Dist, isect.Ray.Dir), isect.Ray.Start) let normal = isect.Thing.Normal(pos) let reflectDir = Vector.Minus(d, Vector.Times(2 * Vector.Dot(normal, d), normal)) let naturalColors =
from light in traceRayArgs.Scene.Lights let ldis = Vector.Minus(light.Pos, pos) let livec = Vector.Norm(ldis) let testRay = new Ray { Start = pos, Dir = livec } let testIsects = from inter in from thing in traceRayArgs.Scene.Things select thing.Intersect(testRay) where inter != null orderby inter.Dist select inter let testIsect = testIsects.FirstOrDefault() let neatIsect = testIsect == null ? 0 : testIsect.Dist let isInShadow = !((neatIsect > Vector.Mag(ldis)) || (neatIsect == 0)) where !isInShadow let illum = Vector.Dot(livec, normal) let lcolor = illum > 0 ? Color.Times(illum, light.Color) : Color.Make(0, 0, 0) let specular = Vector.Dot(livec, Vector.Norm(reflectDir)) let scolor = specular > 0 ? Color.Times(Math.Pow(specular, isect.Thing.Surface.Roughness), light.Color) : Color.Make(0, 0, 0) select Color.Plus( Color.Times(isect.Thing.Surface.Diffuse(pos), lcolor), Color.Times(isect.Thing.Surface.Specular(pos), scolor)) let reflectPos = Vector.Plus(pos, Vector.Times(.001, reflectDir)) let reflectColor =
traceRayArgs.Depth >= MaxDepth ? Color.Make(.5, .5, .5) : Color.Times(isect.Thing.Surface.Reflect(reflectPos), f(new TraceRayArgs(new Ray { Start = reflectPos, Dir = reflectDir }, traceRayArgs.Scene,
traceRayArgs.Depth + 1))) select naturalColors.Aggregate(reflectColor, (color, natColor) => Color.Plus(color, natColor)))
.DefaultIfEmpty(Color.Background).First()) let traceRay = Y(computeTraceRay) select new { X = x, Y = y, Color = traceRay(new TraceRayArgs(ray, scene, 0)) }; foreach (var row in pixelsQuery) foreach (var pixel in row) setPixel(pixel.X, pixel.Y, pixel.Color.ToDrawingColor());

Although the above is pretty impressive, you might want to read about Luke's history of writing ray tracers as test cases for a new language (I write the factorial function, Luke writes Ray Tracers). His original sample from April goes into the details of how to define the scene, the materials and the objects and is useful to understand the above LINQ statement.

The full source code (includes the support definitions for defining the Scene and materials) is available here.

The original code (not LINQ-ified) is available here.

Posted on 16 Nov 2007

Javascript decompressors?

by Miguel de Icaza

Recently we have been debugging lots of Javascript from web pages, and the Javascript is either white-space compressed or obfuscated.

Does anyone know of a tool (preferably for Unix) to turn those ugly Javascript files into human readable form?

Am currently using GNU indent which is designed for C programs, but it does a mildly passable job at deciphering what is going on, but it is not really designed to be a Javascript pretty-printer. I would appreciate any pointers.

Posted on 13 Nov 2007

Mono Summit Schedule Published

by Miguel de Icaza

Jackson has published the Mono Summit 2007 Program Schedule.

If you are attending, please remember to register.

Posted on 13 Nov 2007

A Cool Mono Success Story

by Miguel de Icaza

Salvatore Scarciglia wrote an email today to tell me their story of using Mono on Clinical Trials. His team currently uses Oracle Clinical Trials, but when they wanted new features, they decided that it was better for them to build their own software than licensing the extra modules they needed to grow.

Salvatore has a description of their project where he explains why they built the software using .NET in the first place (they were mostly a Microsoft shop, with Microsoft servers, and they liked Visual Studio).

Recently something changed:

I believe in Mono (take a look at the projects page of this site) and I think that it can grows very fast in the next years. When I read that one of the latest release of Mono supports almost all the .NET Framework 2.0 I've decided to try our framework with it. The original architecture based on:
Windows Server 2003 + .NET Framework 2.0 + SQL Server 2005

has been substitute with:

Debian 4 + Apache + Mono 1.2.5 (mod_mono2) + MySQL (with .NET connector)

and... it works !

It was very hard to rewrite all the stored procedures and views developed with T-SQL in SQL Server, but at the end all the "dirty" work was done. On the other hand Monodevelop 0.16 has compiled the entire framework with no errors. The .NET connector provided by the MySQL team works fine and, finally, Apache + mod_mono was not so easy to configure but I did it.

Using Microsoft technology you have the advantage to use the best development environment (Visual Studio 2005), a fully supported database engine (SQL Server 2005) and the availability of many documentation and tutorial sites (first of all the MSDN). Using Mono you have the great opportunity to develop in C# on Linux.

[Ed: some typos fixed by my speller while quoting.]

This has also been our experience: porting the stored procedures from a SQL server to another is probably the most time consuming piece of work and it is also not mandatory. Mono comes with a SQL Server provider, and if you just want to replace your front-end ASP.NET servers with Linux hosts, you can continue to talk to your backend MS SQL Server if you want to.

Sadly, it has also been our experience that the most difficult piece to setup in Mono is mod_mono. The last time I set it up it was difficult, and people regularly have problems setting it up. A year or so ago, we came up with a pretty cool extension to mod_mono that (in my mind) simplified the deployment: AutoHosting, but it does not seem to be enough.

Am hoping that the new the FastCGI support for Mono will make it simpler to configure for some setups.

If you are interested in porting your ASP.NET application to Linux, you will like Marek Habersack's tutorial on porting ASP.NET apps that covers many of the details.

Posted on 13 Nov 2007

Android's VM

by Miguel de Icaza

Or <jwz>I, for one, welcome our new virtual machine overlords</jwz>

A very interesting theory on why Google created a new VM for Android instead of using an existing VM:

Dalvik is a virtual machine, just like Java's or .NET's.. but it's Google's own and they're making it open source without having to ask permission to anyone (well, for now, in the future expect a shit-load of IP-related lawsuits on this, especially since Sun and Microsoft signed a cross-IP licensing agreement on exactly such virtual machines technologies years ago... but don't forget IBM who has been writing emulation code for mainframes since the beginning of time).

But Android's programs are written in Java, using Java-oriented IDEs (it also comes with an Eclipse plugin)... it just doesn't compile the java code into java bytecode but (ops, Sun didn't see this one coming) into Dalvik bytecode.

So, Android uses the syntax of the Java platform (the Java "language", if you wish, which is enough to make java programmers feel at home and IDEs to support the editing smoothly) and the java SE class library but not the Java bytecode or the Java virtual machine to execute it on the phone (and, note, Android's implementation of the Java SE class library is, indeed, Apache Harmony's!)

With this VM, they managed to not depend on Sun terms for the future of the language and the VM or be bound by any definitions of the Java language. It is worth reading the entire article.

Once the source code for Android is released, it would be interesting to look into integrating Mono wiht it. It should already run on it, as its just Linux. What would be interesting is continuing to use C# to write code for it.

Some ideas that have been bounced around in the mono channels recently include:

  • CIL to Dalvik recompiler: Translate CIL bytecodes into Dalvik ones, like (like Grasshopper does) and provide a class library add-on.
  • DalvikVM: Implement a VM similar on top of Mono that can run Dalvik bytecodes side-by-side with other CIL code. This would be similar to the IKVM approach: a JavaVM for CIL.
  • Dalvik Support in Mono: Paolo suggested to add support to the Mono VM to have a Dalvik loader and turn the Dalvik instructions into the internal Mono IR (the rest at that point would be shared).
  • D/Invoke: Add support to the Mono VM to transparently call code into another VM. Very much along the lines of P/Invoke or COM's it-just-works support.

Looking forward to the release.

Posted on 13 Nov 2007

On parsing integers: Jeft Stedfast's Beautiful Code

by Miguel de Icaza

While the world is debating what is the proper object oriented and syntactic decoration for another wrapper around getElementById we back in Mono-land were coping with a bug related to parsing integers.

In the CLI the original API to parse integers threw exceptions when an overflow was detected. So if you tried to parse a value that is too large to fit on an int, you would get an overflow exception. This is trivial to implement in C# because you just write the parser code more or less like this:

		int value, digit;
		value = checked (value * 10 + digit);

If the result for the expression value * 10 + digit the JIT just throws. Beautiful.

This is great as long as your input is error-free, but if you are going to cope with lots of errors, you are going to be coping with a lot of exceptions. And throwing and catching exceptions is consideraly slower than returning an error code. So with .NET 2.0, a new family of methods was introduced, something like this: bool Int32.TryParse (string value, out int result).

The idea here is that instead of raising exceptions left and right we instead try to parse the value, and if the value overflows we just return an error condition.

A simple approach is to use a type that is bigger than the type you are parsing and you just check the boundaries for errors, for example, a byte could use an int:

		int value;

		value = value * 10 + digit;

		if (value < Byte.MinValue || value > Byte.MaxValue)
			// error.

The downside with this is that parsing 32-bit ints requires 64-bit longs. This alone is quite expensive as 64-bit operations on 32-bit machines take many more instructions, consume more registers and produce slow code for a very common case. For the 64-bit case, things are worse, since there is no 128-bit longs in .NET, which means that we have to either come up with something clever, or you need to do checked expressions (like before), use try/catch and return an error on the catch handler. Very suboptimal.

The ideal situation is to do the parsing with the data type at hand (use an int for int parsing, a long for long parsing) and catch the overflow before it causes a problem.

Without going into the embarrassing details (which anyone with an inclination to point fingers can do) recently we had to rewrite these routines and Jeff Stedfast came to the rescue with two beautiful routines: one to parse signed integers, and one to parse unsigned integers in C.

His routines are beautiful because they are able to parse a 32-bit int in a single loop, only using 32-bit ints; And 64-bit ints in a single loop, only using 64-bit longs.

His routines were for C code, but variations of these rewritten for C# are now part of our runtime (they will ship as part of Mono 1.2.6).

These routines have a very clever way of coping with overflows.

Posted on 13 Nov 2007

Undoing Bush

by Miguel de Icaza

Joseph Stiglitz discusses the Economic Consequences of Mr. Bush. Looking back at seven years of mistakes, a great big-picture summary of what went wrong and why, and the damage these policies have inflicted in the US.

Harper's Magazine ran a series of essays on how to fix the mess left behind in Undoing Bush: how to repair eight years of sabotage, bungling, and neglect.

Posted on 10 Nov 2007

Mono Bug Days

by Miguel de Icaza

On Monday and Tuesday (November 5th and 6th) we want to spend some time triaging, prioritizing and applying easy fixes to Mono from Bugzilla.

We will be on on the channel #monobugday on Monday and Tuesday going over various Mono components.

Our entry point is:

There is a lot of low-hanging fruit that could be easily fixed in Mono, bugs that are invalid, bugs that are missing information, bugs that needs owners, bugs that need confirmation and patches that have been waiting on the bug tracking system to be applied.

I have zero experience running a bug day and am not sure quite how to run this on Monday, if you have some experience, feel free to drop by, or send your comments.

Posted on 03 Nov 2007