Generics Memory Usage

by Miguel de Icaza

Aaron walked into my office today to get some some feedback on some implementation detail for his new listview in Banshee.

Before he left the office he said something like, but not necessarily "In some Microsoft blog someone commented that I should not use generics for small arrays of value types" (see Update at the bottom).

So we decided to measure with a trivial program the memory consumption for storing small arrays with and without generics with Mono 1.2.5 and Mono 1.2.6.

Storing 8 megs of ints (32-megs of data) on an array of objects has a high overhead: the actual data, the box, the object lock which means that you end up using about 21.5 bytes per int:

		object [] e = new object [size];

		for (int i = 0; i < size; i++)
			e [i] = 1;
	

With a generic array, you are as close as possible to a real array and you only consume 38 megs of ram (this is the full process size, the 32 meg array plus the Mono runtime, JIT, etc), the following sample ensures that am not using a regular int array, but an instantiated generic class with ints:

		public class D<T> {
			public T [] t;
			
			public D (int size)
			{
				t = new T [size];
			}
		}

		D<int> d = new D<int> (size);
		for (int i = 0; i < size; i++)
			d.t [i] = 1;
	

The regular collection consumes 178 megs of ram, while the generics collection consumes 38 megs of ram (when running with Mono).

I was a bit shocked about the 178 megs of ram used by regular object wrappers, so I wrote the equivalent for Java, to see how they fared compared to Mono:

	      Object [] x = new Object [8*1024*1024];

	      for (int i = 0; i < 8*1024*1024; i++)
                     x [i] = new Integer (i);
        

Java uses 248 megs of ram, so it is chubbier than regular C# at 30 bytes per boxed int on average (this was with Sun's Java 1.6.0, maybe there are newer versions, but thats what I got on my system).

I got no Java/generics skills to implement the above with Java, but since Java does not really have generics at the VM level (they are implemented purely as a language-level feature), I do not think that the numbers would be significantly different).

Mono 1.2.6 also introduces a number of memory reduction features for generics that reduce the size of our interfaces. When using generics, in 1.2.5 on a List<int> case we were generating a lot of useless stuff:


IMT tables size: 7752
IMT number of tables: 102
IMT number of methods: 2105
IMT used slots: 872
IMT colliding slots: 486
IMT max collisions: 27
IMT methods at max col: 134
IMT thunks size: 19060

With the upcoming 1.2.6 release the memory savings for the metadata kept resident by the runtime are also significant:


IMT tables size: 7752
IMT number of tables: 102
IMT number of methods: 4
IMT used slots: 2
IMT colliding slots: 1
IMT max collisions: 3
IMT methods at max col: 3
IMT thunks size: 34

There is still an issue of locality. Using the boxing collections has the advantage that the same code is shared across all possible types. The generic versions on the other hand get their own JITed versions of every method involved (at least today).

You can track Mark's progress to change this as he continues with our implementation for generic code sharing.

Summary: From a memory consumption point of view, the bottom line is: if you are storing value types (non-object values like ints, longs, bools) it is better to use generic collections (System.Collections.Generic) than using the general-purpose collections (System.Collections). If you are just going to store references to objects, there is really no difference.

Update: The comment was from Rico Mariani, and the source of the confusion was:

List<T> offers generally better performance than ArrayList as well as type safety. We recommend using List<T> over ArrayList always if T is a reference type. If T is a value type there is additional cost assocated with creating a specialized version of List<T> for that value type. When T would be a value type we recommend List<T> if the savings in boxing is greater than the cost of the additional code -- this tends to happen if you store about 500 elements across all your List<T> objects.

OK, so the confusion is not that it might be worse for value types, but that the JIT will have to generate specific instantiations of generic methods (Insert for example) based on the parameter type.

So the JIT generates separate code for a List.Insert of int and for a List.Insert of long. Considering the savings for even small apps, I will go with "Go wild with the value types" myself as the code for the collection code is really small.

Posted on 21 Nov 2007