Mono's master tree now contains support for a new mode of operation for our garbage collector, we call this the cooperative mode. This is in contrast with the default mode of operation, the preemptive mode.
This mode is currently enabled by setting the MONO_ENABLE_COOP environment variable.
We implemented this new mode of operation to make it simpler to debug our GC, to have access to more data on the runtime during GC times and also to support certain platforms that do not provide the APIs that our preemptive system needed.
When we started building Mono back in 2001, we wanted to get something up and running very quickly. The idea was to have enough of a system running on Linux that we could have a fully self-hosting C# environment in a short period of time, and we managed to do this within eight months.
We were very lucky when it came to garbage collection that the fabulous Boehm GC existed. We were able to quickly add garbage collection to Mono, without having to think much about the problem.
Boehm is fabulous because it does not really require the cooperation of the runtime to work. It is a garbage collector that was originally designed to add garbage collection capabilities to programs written in C or C++. It performs garbage collection without much developer intervention. And it achieves this for existing code: multi-threaded, assembly-loving, low-level code.
Boehm GC is a thing of beauty.
Boehm achieves its magic by pulling some very sophisticated low-level tricks. For example, when it needs to perform a garbage collection it relies on various operating system facilities to stop all running threads, examine the stacks for all these threads to gather roots from the stack, perform the actual GC job then resume the operation of the program.
While Boehm is fantastic, in Mono, we had needs that would be better served with a custom garbage collector. One that was generational and reduced collection times. One fit more closely with .NET. It was then that we built the current GC for Mono: SGen.
SGen has grown by leaps and bounds and has been key in supporting many advanced scenarios on Android and iOS as well as being a higher performance and lower latency GC for Mono.
When we implemented SGen, we had to make some substantial changes to Mono's code generator. This was the first time that Mono's code generator had to coordinate with the GC.
SGen kept a key feature of Boehm: most running code was blissfully unaware that it could be stopped and resumed at any point.
This meant that we did not have to do too much work to integrate SGen into Mono [1]. There are two main downsides with this.
The first downside is that we still required the host platform to support some mechanism to stop, resume and inspect threads. This alone is pretty obnoxious and caused much grief to developers porting Mono to strange platforms.
The second downside is that code that runs during the collection is not really allowed to use many of the runtime APIs or primitives, because the collector might be running in parallel to the regular code. You can only use reentrant code.
This is a major handicap for development and debugging of the collector. One that is just too obnoxious to deal with and one that has wasted too much of our time.
In the new cooperative mode, the generated code is instrumented to support voluntarily stopping execution
Conceptually, you can think of the generated code as one that basically checks on every back-branch, or every call site that the collector has requested for the thread to stop.
The supporting Mono runtime has been instrumented as well to deal with this scenario. This means that every API that is implemented in the C runtime has been audited to determine whether it can run in a finite amount of time, or if it is a blocking operation and adjusted to participate accordingly.
For methods that run in a finite amount of time, we just wait for them to return back to managed code, where we will stop.
For methods that might potentially block, we need to add some annotations that inform our GC that it is safe to assume that the thread is not running any mutating code. Consider the internal call that implements the CreateDirectory method. It now has been decorated with MONO_PREPARE_BLOCKING and MONO_FINISH_BLOCKING to delimit blocking code.
This means that threads do not stop right away as they used to, but they stop soon enough. And it turns out that soon enough is good enough.
This has a number of benefits. First, it allows us to support platforms that do not have enough system primitives to stop, resume and examine arbitrary threads. Those include things like the Windows Store, WatchOS and various gaming consoles.
But selfishly, the most important thing for us is that we will be able to treat the garbage collector code as something that is a first class citizen in the runtime: when the collector works, it will be running in such a state that accessing various runtime structures is fine (or even using any tasty C libraries that we choose to use).
As of today, Mono's Coop engine can either be compiled in by default (by passing --with-cooperative-gc to configure), or by setting the MONO_ENABLE_COOP environment variable to any value.
We have used a precursor of Coop for about 18 months, and now we have a fully productized version of it on Mono master and we are looking for developers to try it out.
We are hoping to enable this by default next year. [1] Astute readers will notice that it still took years of development to make SGen the default collector in Mono.
Posted on 22 Dec 2015