Mono's master tree now contains support for a new mode of
operation for our garbage collector, which we call the
cooperative mode. This is in contrast with the default mode
of operation, the preemptive mode.
Cooperative mode is currently enabled by setting
the MONO_ENABLE_COOP environment variable.
We implemented this new mode of operation to make it
simpler to debug our GC, to have access to more runtime data
while a collection is in progress, and to support certain
platforms that do not provide the APIs that our preemptive
system needs.
Behind Preemptive Mode
When we started building Mono back in 2001, we wanted to
get something up and running very quickly. The idea was to
have enough of a system running on Linux that we could have a
fully self-hosting C# environment in a short period of time,
and we managed to do this within eight months.
We were very lucky when it came to garbage collection that
the fabulous Boehm
GC existed. We were able to quickly add garbage
collection to Mono, without having to think much about the
problem.
Boehm is fabulous because it does not really require the
cooperation of the runtime to work. It is a garbage collector
that was originally designed to add garbage collection
capabilities to programs written in C or C++. It
performs garbage collection without much developer
intervention. And it achieves this for existing code:
multi-threaded, assembly-loving, low-level code.
Boehm GC is a thing of beauty.
Boehm achieves its magic by pulling some very sophisticated
low-level tricks. For example, when it needs to perform a
garbage collection it relies on various operating system
facilities to stop all running threads, examine the stacks for
all these threads to gather roots from the stack, perform the
actual GC job then resume the operation of the program.
While Boehm is fantastic, in Mono we had needs that would
be better served by a custom garbage collector: one that
was generational and reduced collection times, and one that
fit more closely with .NET. It was then that we built the
current GC for Mono: SGen.
SGen has grown by leaps and bounds and has been key in
supporting many advanced scenarios on Android and iOS as well
as being a higher performance and lower latency GC for Mono.
When we implemented SGen, we had to make some substantial
changes to Mono's code generator. This was the first time
that Mono's code generator had to coordinate with the GC.
SGen kept a key feature of Boehm: most running code was
blissfully unaware that it could be stopped and resumed at any
point.
This meant that we did not have to do too much work to
integrate SGen into Mono [1]. This approach has
two main downsides, though.
The first downside is that we still required the host
platform to support some mechanism to stop, resume and inspect
threads. This alone is pretty obnoxious and caused much grief
to developers porting Mono to strange platforms.
The second downside is that code that runs during the
collection is not really allowed to use many of the runtime
APIs or primitives, because the collector might be running in
parallel to the regular code. You can only use reentrant
code.
This is a major handicap for development and debugging of
the collector. One that is just too obnoxious to deal with
and one that has wasted too much of our time.
Cooperative Mode
In the new cooperative mode, the generated code is
instrumented to support voluntarily stopping execution.
Conceptually, you can think of the generated code as code
that checks, on every back-branch or call site, whether
the collector has requested that the thread stop.
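The polling idea can be illustrated with a minimal C sketch. This is
not Mono's generated code: the flag, the function names and the
single-threaded simulation of the collector's request are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag that the collector sets when it wants threads to stop. */
static atomic_bool suspend_requested = false;

/* Counts how often this thread stopped at a poll (illustration only). */
static int safepoints_taken = 0;

/* Hypothetical slow path. A real runtime would publish the thread's state
   and wait for the collector to finish; this sketch only records the stop
   and clears the request so that it can stay single-threaded. */
static void safepoint_slow_path(void)
{
    assert(atomic_load(&suspend_requested)); /* only reached on a request */
    safepoints_taken++;
    atomic_store(&suspend_requested, false);
}

/* The check that instrumented code would conceptually perform. */
static inline void safepoint_poll(void)
{
    if (atomic_load_explicit(&suspend_requested, memory_order_acquire))
        safepoint_slow_path();
}

/* Stand-in for compiled managed code: a loop with a poll on its back-branch.
   request_at simulates the collector asking for a stop at that iteration. */
static long sum_to(long n, long request_at)
{
    long total = 0;
    for (long i = 1; i <= n; i++) {
        total += i;
        if (i == request_at)
            atomic_store(&suspend_requested, true);
        safepoint_poll(); /* back-branch poll */
    }
    return total;
}
```

Calling sum_to(10, 3) computes the sum as usual, but the loop notices
the request on the iteration where it is raised and stops once at the
poll. The cost on the fast path is a single load and branch.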
The supporting Mono runtime has been instrumented as well to deal
with this scenario. This means that every API that is
implemented in the C runtime has been audited to determine
whether it runs in a finite amount of time or is a blocking
operation, and has been adjusted to participate accordingly.
For methods that run in a finite amount of time, we just
wait for them to return back to managed code, where we will
stop.
For methods that might potentially block, we need to add
some annotations that inform our GC that it is safe to assume
that the thread is not running any mutating code. Consider
the internal call that implements the CreateDirectory
method. It has now been decorated with MONO_PREPARE_BLOCKING
and MONO_FINISH_BLOCKING to delimit blocking code.
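A toy model of such a blocking region might look like the sketch below.
The TOY_ macros, the thread state type and the fake system call are
hypothetical stand-ins; they are not the definitions of
MONO_PREPARE_BLOCKING and MONO_FINISH_BLOCKING, and a real
implementation would presumably also have to wait at the end of the
region if a collection were still in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-thread state; the names are hypothetical, not Mono's. */
typedef enum { THREAD_RUNNING, THREAD_BLOCKING } toy_thread_state;

typedef struct {
    toy_thread_state state;
} toy_thread;

/* Hypothetical stand-ins for the macros that delimit blocking code. */
#define TOY_PREPARE_BLOCKING(t) \
    do { (t)->state = THREAD_BLOCKING; } while (0)
#define TOY_FINISH_BLOCKING(t) \
    do { \
        assert((t)->state == THREAD_BLOCKING); \
        (t)->state = THREAD_RUNNING; \
    } while (0)

/* The question a collector would ask: may it proceed without waiting
   for this thread to reach a poll? */
static bool toy_gc_can_proceed_without(const toy_thread *t)
{
    return t->state == THREAD_BLOCKING;
}

/* Stand-in for a blocking system call such as creating a directory.
   It reports what the collector would observe while the call runs. */
static bool fake_blocking_syscall(const toy_thread *t)
{
    return toy_gc_can_proceed_without(t);
}

/* Toy version of an internal call that wraps a blocking operation. */
static bool toy_create_directory(toy_thread *t)
{
    bool gc_safe_during_call;

    TOY_PREPARE_BLOCKING(t);   /* thread promises not to touch managed state */
    gc_safe_during_call = fake_blocking_syscall(t);
    TOY_FINISH_BLOCKING(t);    /* thread is back to running runtime code */

    return gc_safe_during_call;
}
```

While the thread is inside the delimited region, the collector can treat
it as already stopped, so a long system call does not hold up a
collection.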
This means that threads do not stop right away as they used
to, but they stop soon enough. And it turns out that soon
enough is good enough.
This has a number of benefits. First, it allows us to
support platforms that do not have enough system primitives to
stop, resume and examine arbitrary threads. Those include
things like the Windows Store, watchOS and various gaming
consoles.
But selfishly, the most important thing for us is that we
will be able to treat the garbage collector code as something
that is a first class citizen in the runtime: when the
collector works, it will be running in such a state that
accessing various runtime structures is fine (or even using
any tasty C libraries that we choose to use).
Today
As of today, Mono's Coop engine can be enabled either at
build time, by passing --with-cooperative-gc to configure so
that it is compiled in by default, or at run time, by setting
the MONO_ENABLE_COOP environment variable to any value.
We have used a precursor of Coop for about 18 months, and
now we have a fully productized version of it on Mono master
and we are looking for developers to try it out.
We are hoping to enable this by default next year.
[1] Astute readers will notice that it still took
years of development to make SGen the default collector in Mono.