Most heap objects die young. We can take advantage of their early demise to improve the real-time performance of garbage collection.
Automatic memory management is a boon to programmers, but its performance implications can be unacceptable in real-time systems. New techniques in garbage collection have come a long way in overcoming these obstacles. While results vary from one implementation to the next, truly efficient, real-time, deterministic garbage collection is here. If you're considering Java for your next embedded application, you'll need to understand the ins and outs of garbage collection.
Java programmers don't have to figure out who owns memory or where to free it; they just allocate objects and let the system free them. This completely eliminates nasty dangling pointer errors and greatly reduces the likelihood of memory leaks. Better still, it simplifies interfaces and enhances software reuse, because you don't need explicit rules about who is responsible for freeing memory. The system process that automatically detects and frees unused objects is called garbage collection.
If garbage collection is so great, you might be wondering why it isn't in ANSI C++. In fact, language extensions to C++ are available that allow one to designate classes as "managed," such as those found in Microsoft's Visual C++ for the .NET platform. In exchange for accepting limitations on how they are used, you get automatic garbage collection for managed class objects. The problem is that a garbage collector must be able to find all the object pointers. Since C++ allows pointers to be cast into other types, there's normally no way to track them all down.
Generational garbage collection is a key enhancement to the traditional garbage-collection algorithm. The basic idea is to focus on the newest objects. Since most objects live fast and die young, the young object set typically contains a high proportion of garbage. Thus we see another example of the “90/10 rule,” where you get 90% of the benefit from 10% of the work. Of course, there's a lot more to it than that.
The traditional “mark and sweep” garbage-collection algorithm works as follows. When memory gets low (for example, at a failing new), the system stops all the user threads, locates the set of unreachable objects, and frees them. To do this, it must examine every object reference, such as local variables and static class fields, starting at “roots.” Each referenced object is checked to see if it is already marked. If it isn't, it's marked and all its references are processed; otherwise you just go on to the next reference. When this is done, any unmarked objects are deemed unreachable and can, therefore, be recycled. Note that this technique is superior to reference-counting schemes because it correctly detects groups of dead objects that are only referenced by other dead objects. Figures 1 and 2 show before and after pictures, respectively, of this process.
Figure 1: Mark and sweep, before collection
Figure 2: Mark and sweep, after collection
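The mark phase and sweep phase described above can be sketched in a few lines of Java. The object model, class names, and `collect` method below are invented for illustration; they are a toy model of the technique, not any real collector's internals:

```java
import java.util.*;

// Toy model of mark and sweep: heap objects hold references to other
// objects; collection marks everything reachable from the roots, then
// sweeps (frees) whatever was never marked.
class MarkSweep {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: traverse from a root; already-marked objects are
    // skipped, which also terminates cycles.
    static void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;
        for (Obj r : o.refs) mark(r);
    }

    // Sweep phase: anything left unmarked is unreachable and is dropped.
    static List<Obj> collect(List<Obj> roots, List<Obj> heap) {
        for (Obj root : roots) mark(root);
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false;   // reset the mark for the next cycle
                survivors.add(o);
            }
            // unmarked objects are simply not retained (the "sweep")
        }
        return survivors;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        a.refs.add(b);
        b.refs.add(a);   // cycle A<->B, but A is rooted, so both live
        c.refs.add(c);   // self-referencing garbage a refcounter would miss
        List<Obj> survivors = collect(List.of(a), List.of(a, b, c));
        System.out.println(survivors.size());   // 2: A and B survive
    }
}
```

Note how the self-referencing object C is collected even though its reference count never drops to zero; this is exactly the case where mark and sweep beats reference counting.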
Of course, the heap can get low on free space at any time, and the time required to mark and sweep is proportional to the number of objects and references. This produces a pattern in which the application periodically becomes unresponsive or stalls for a brief time. Such stalls are not acceptable in real-time systems, even “soft” real-time systems.
Some implementers attempt to minimize application stalls by a technique called incremental garbage collection. Simply put, the garbage collector will only run for a fixed increment of time, after which it will give the user threads a chance to run. After the application has had more time to run, the incremental garbage collector resumes work where it left off. (Note that some garbage collectors must start over if application threads preempt the garbage collecting thread; such garbage collectors are said to be preemptible, but not incremental.)
Both preemptible and incremental garbage collectors claim to reduce stalls, but the effectiveness of such algorithms varies widely. For example, a preemptible garbage collector is of no use if it is preempted so often that it never gets to make progress. Also, if garbage collection only starts when a thread becomes starved for storage, then at least that thread (if not the whole application) must stall, since it cannot continue until garbage collection releases storage.
The length of the collector's time increment, also called its latency, is critical to application responsiveness. This increment can range from seconds down to tenths of milliseconds, depending on the implementation. The smaller the time increment, the more responsive the application. Obviously, there is an engineering trade-off here between the application's responsiveness and the collector's ability to make progress.
A defragmenting garbage collector can move live objects around in memory, coalescing free space into a small number of large chunks. Without defragmentation, allocation of large objects can eventually fail (especially in long-running programs) because the available space is divided up into many small pieces. Memory fragmentation also makes every memory allocation operation take longer, as it gets harder and harder for the system to find free segments of sufficient size to satisfy the application's requests. Conversely, memory allocation is blindingly fast in a defragmented heap. All it has to do is increment a free pointer, which is no harder than allocating a stack frame for a function call.
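In a compacted region, allocation really is just a bounds check plus a pointer bump, which is why a defragmenting collector makes `new` so cheap. The following sketch is illustrative (the class name, sizes, and offset-based model are invented for this example):

```java
// Bump-pointer allocation in a defragmented region: all free space sits
// above a single "free" offset, so allocating is one comparison and one
// addition. No free-list search is ever needed.
class BumpAllocator {
    private final int capacity;   // region size in bytes
    private int free = 0;         // next free offset in the region

    BumpAllocator(int capacity) { this.capacity = capacity; }

    // Returns the start offset of the new object, or -1 if the region is
    // exhausted (which would trigger a collection in a real system).
    int allocate(int size) {
        if (free + size > capacity) return -1;
        int start = free;
        free += size;             // the entire "search" for free space
        return start;
    }

    int used() { return free; }
}
```

Compare this with a fragmented heap, where the allocator must walk a free list looking for a hole of sufficient size, and may fail entirely for large requests even when plenty of total free space remains.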
Defragmenting garbage collectors typically divide heap memory into several regions. As one region is garbage collected, its live objects are compacted into another region. When this is complete, the original region has become empty, and the destination region has no gaps. This requires additional memory, of course, but has the effect of eliminating fragmentation.
Out with the new
Generational garbage collection works as follows. A number of memory regions are designated as the nursery, where new objects are born. As objects age (by surviving rounds of garbage collection), they are moved out of the nursery and into the main heap regions. Some generational algorithms even distinguish between several "older" generations. The garbage collector for the .NET platform and the C# language distinguishes three generations, thus separating the nursery from generation 1, and generation 1 from generations 2 on up. For simplicity we will describe an algorithm with just two generations. The multigenerational algorithm is similar; there's just more bookkeeping.
Generational garbage collection restricts attention to the nursery objects, essentially assuming that all non-nursery objects are still referenced (and thus alive). To do this, it needs a way to calculate the set of all references to nursery objects. To speed this process, a separate “old-to-young references” table is maintained; this lists all references from non-nursery objects to nursery objects. This table greatly speeds the examination of the object graph for the nursery, since non-nursery objects need not be examined at all.
You might wonder how references from non-nursery objects to nursery objects are tracked in the first place. The answer lies in a bit of magic called a write barrier. The write barrier intercepts all stores, and asks, “Are we storing a reference into a non-nursery object?” If so, it then asks whether the old value that is being overwritten and/or the new value being stored are references to nursery objects. It can then update the set of old-to-young references accordingly. Note that the garbage collector can quickly determine if a reference points to a nursery object by comparing its address to the bounds of the nursery region.
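The write barrier's bookkeeping can be sketched as follows. This is a simplified single-reference-field model with invented names; a real VM would test an address range rather than a flag, and would track individual fields or cards rather than whole objects:

```java
import java.util.*;

// Sketch of a generational write barrier: every reference store goes
// through writeRef(), which maintains the "old-to-young" remembered set
// so the collector can scan the nursery without examining the main heap.
class WriteBarrierDemo {
    static final class Cell {
        Cell ref;                  // this object's single reference field
        final boolean inNursery;   // stand-in for an address-range check
        Cell(boolean inNursery) { this.inNursery = inNursery; }
    }

    // Old objects that may hold references into the nursery.
    final Set<Cell> oldToYoung = new HashSet<>();

    // Intercepts "holder.ref = value" and updates the remembered set.
    void writeRef(Cell holder, Cell value) {
        holder.ref = value;
        if (!holder.inNursery) {
            if (value != null && value.inNursery) {
                oldToYoung.add(holder);      // new old-to-young reference
            } else {
                oldToYoung.remove(holder);   // only young ref was overwritten
            }
        }
        // young-to-young and young-to-old stores need no bookkeeping
    }
}
```

Stores into nursery objects skip the bookkeeping entirely, which keeps the barrier cheap on the common path.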
Measurement of typical applications has shown that the set of old-to-young references is usually quite small. Measurement also confirms that generational garbage collection divides the garbage-collection work linearly. That is, generational garbage collection can collect the newest eighth of memory in about one-eighth of the time it takes to do a full garbage collection. The point, of course, is that this frees much more than an eighth of the garbage.
Figure 3 shows a situation before garbage collection. The nursery contains three objects: C, D, and E. Object C contains a reference to E. The main heap region contains two objects: A and B. Object A contains a reference to C. This reference is noted in the “old-to-young” set. The root reference set contains references to A and E.
Figure 3: Generational, before collection
In this example, we show the nursery region being garbage collected and defragmented at the same time. We show the nursery being garbage collected directly into a main heap region. This has the effect of promoting all the objects in the nursery. That is, they will not be examined in the next round of generational garbage collection. If our algorithm had several generations, then we promote by defragmenting into a region that then becomes the next higher generation.
During generational garbage collection, the root reference to A is ignored, but the root reference to E causes E to be marked as live. A copy of E is created in the destination region, and the reference to E within C is updated. E contains no references, so nothing more is done with E. The old-to-young set is examined, and the reference from A to C is found. This causes C to be marked as live. A copy of C is created in the destination region, and the reference from A to C is updated. Object C contains a reference to E. Since E has already been copied, the reference in C is updated, but no new copy of E is made. The unreachable object D is not copied; hence it is eliminated when the region is reclaimed. This leads to the results shown in Figure 4.
Figure 4: Generational, after collection
Incremental garbage collection is much more complicated than generational garbage collection. Indeed, for years it was an unsolved problem, although there are now several different implementations that claim to be incremental. Here is a description of one particular implementation of incremental garbage collection.
Consider a defragmenting garbage collector (generational or not) that operates on a “from” and “to” region. As garbage collection proceeds, all the live objects in the “from” region are moved into the “to” region. The copy is driven by a traversal of references. Each reference is processed as follows. If the target object has not yet been copied, it is copied and its references are added to the list of references to be processed. Either way, the original reference is rewritten to reflect the target's new location, and a forwarding address to the target object's new location is left at the original from-space location of that object. When all the referenced objects have been copied and all the references rewritten, then garbage collection (of that region) is complete and the “from” region can be recycled.
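The copy-and-forward procedure just described can be sketched as follows. The object layout and names are a toy model invented for illustration; a real collector works with raw memory and leaves the forwarding address in the object's old header:

```java
import java.util.*;

// Sketch of copying collection with forwarding addresses: live objects
// are copied from the "from" region into the "to" region, a forwarding
// pointer is left behind at the old location, and references are
// rewritten through it so each object is copied exactly once.
class CopyingCollect {
    static final class Node {
        final String name;
        Node[] refs;
        Node forward;   // forwarding address, set once the object is copied
        Node(String name, Node... refs) { this.name = name; this.refs = refs; }
    }

    // Copy the graph reachable from roots into toSpace; returns the
    // rewritten roots. Unreachable objects are never visited or copied.
    static List<Node> evacuate(List<Node> roots, List<Node> toSpace) {
        Deque<Node> work = new ArrayDeque<>();
        List<Node> newRoots = new ArrayList<>();
        for (Node r : roots) newRoots.add(copy(r, toSpace, work));
        while (!work.isEmpty()) {               // rewrite refs in the copies
            Node copied = work.pop();
            for (int i = 0; i < copied.refs.length; i++)
                copied.refs[i] = copy(copied.refs[i], toSpace, work);
        }
        return newRoots;
    }

    private static Node copy(Node o, List<Node> toSpace, Deque<Node> work) {
        if (o == null) return null;
        if (o.forward != null) return o.forward;   // already moved: follow it
        Node dup = new Node(o.name, o.refs.clone());
        o.forward = dup;                           // leave forwarding address
        toSpace.add(dup);
        work.push(dup);                            // its refs still need rewriting
        return dup;
    }
}
```

The forwarding check at the top of `copy` is what makes shared objects safe: the second and later references to an object simply pick up the address recorded by the first copy.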
Incremental garbage collection operates in bursts of relatively short duration. Between bursts, user threads are permitted to execute, which is what reduces latency. Naturally, the garbage collector cannot guarantee all the references to an object will have been rewritten before the user threads are unleashed. So how can the program continue to execute properly with the references half corrected? Again, the answer lies in a write barrier. When the program attempts to write into memory, the system checks to see if the object being written is currently being moved by garbage collection. If so, then the write is performed in both the old and new locations. This technique avoids the necessity of a “read barrier,” yet guarantees that modifications made via uncorrected references will not be lost.
The astute reader will notice that the set of object references may change between bursts of garbage collection. The removal of references is harmless. If an object's last reference is removed after the garbage collector has already marked that object as live, then that object will surely be collected in the next garbage-collection pass.
The addition of references must, however, be handled carefully. Each time a new reference is stored while garbage collection is “in progress,” that reference must be added to the list of references to be processed when garbage collection resumes. Again, the write barrier guarantees that this will happen.
There are other tricky details relating to the creation of new objects during incremental garbage collection. Care must be taken to allocate these objects in a separate region that will become the nursery when garbage collection completes. After all, if we allow these new objects to immediately participate in the current round of garbage collection, they would be promoted too soon. Put another way, they would miss out on their childhood and wouldn't have the chance to live in the nursery and enjoy quick collection via generational garbage collection.
Closing the loop
All this discussion makes garbage collection sound a lot more straightforward than it really is. There is an exquisitely delicate balance between the amount of time the system needs to spend executing application code and the amount of time performing garbage collection. If the system spends too much time performing garbage collection, throughput suffers. If the system spends too much time executing application code, then the garbage collector isn't able to keep up, eventually leading to a situation where the application must wait for full garbage collection to complete. Thus the worst-case stalling problem is not solved; it's merely made far more rare.
Merely adjusting static parameters for garbage collection is unlikely to achieve a good balance because the load on the garbage collector varies with the application's behavior. To really solve this problem, the garbage-collection subsystem needs to dynamically monitor itself relative to the application. How much memory is the application allocating? How fast is memory being promoted from the nursery into the main heap? Is garbage collection keeping up or falling behind? If it's falling behind, what should we do about it? Should we do generational garbage collection more frequently or keep objects in the nursery region longer? At what point must we schedule a round of full garbage collection so it has time to complete before running out of memory?
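Two of those questions can be reduced to back-of-the-envelope arithmetic. The sketch below is a deliberately simplified model (it assumes a constant allocation rate and no reclamation until a full collection completes); the class and method names are invented for illustration:

```java
// Toy pacing checks for a self-monitoring collector: is incremental
// collection keeping up, and is it time to schedule a full collection so
// it finishes before memory runs out?
class GcPacer {
    // If the application allocates faster than the collector reclaims,
    // incremental collection alone will eventually fall behind.
    static boolean fallingBehind(double allocBytesPerSec,
                                 double reclaimBytesPerSec) {
        return allocBytesPerSec > reclaimBytesPerSec;
    }

    // A full collection must start while there is still enough free
    // memory to absorb the allocation that happens during the collection
    // itself (in this simple model, nothing is freed until it completes).
    static boolean mustStartFullGc(double allocBytesPerSec,
                                   double fullGcSeconds,
                                   double freeBytes) {
        double consumedDuringGc = allocBytesPerSec * fullGcSeconds;
        return freeBytes <= consumedDuringGc;   // start now or risk a stall
    }
}
```

The required memory "padding" mentioned below is exactly this `consumedDuringGc` term: the faster the application allocates and the slower the collector, the more headroom the heap needs.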
A key point is that by “pacing” the garbage collector to stay ahead of the application, it is finally possible to avoid all stalls, thus achieving true real-time, deterministic behavior. Of course, this assumes the application is somewhat well behaved; that is, that it does not allocate and free storage so quickly that the garbage collector cannot keep up. This boils down to the condition that there be enough excess memory to account for the fact that the garbage collector requires some time to free memory. How much padding is required depends on the efficiency of the garbage collector and the worst-case behavior of the application. The improved efficiency of generational garbage collection reduces the amount of padding needed to achieve real-time performance.
It is beyond the scope of this article to discuss how this kind of magic is achieved. Suffice it to say that an excellent garbage collector can probably tell you a whole lot about what's going on inside it just because it needs to know that information itself.
Measure the results
Memory usage and garbage-collection overhead are key statistics in evaluating system performance. Many garbage collectors provide a measurement API to ascertain the rate at which memory is being allocated, collected, and promoted from the nursery into the main heap. The ability to track application behavior over time is valuable when performance tuning.
Performance tuning tools often focus on measuring the rate at which objects are created and destroyed. Seasoned programmers can usually tell a story of how a great increase in speed was achieved by rewriting code that churns objects. I myself can relate a story in this vein. We had a Java program that converted thousands of timestamps (long integers) into strings. There is a standard Java class method that does this conversion in one step, but it internally creates (and discards) a “date formatter” object. After replacing the high-level operation with its constituent steps, we were able to get by with exactly one date formatter object. Thus we saved the time it took to create (and garbage collect) thousands of date formatter objects.
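The fix amounts to hoisting the formatter out of the conversion. The sketch below reconstructs the idea with `java.text.SimpleDateFormat` (the original article doesn't name the class involved, so treat this as a plausible stand-in; note also that `SimpleDateFormat` is not thread-safe, so the shared instance must be confined to one thread):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Object churn versus reuse when converting timestamps to strings.
class TimestampFormatting {
    // One-step style: builds (and discards) a formatter per timestamp.
    static String churning(long millis) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
                .format(new Date(millis));
    }

    // Reuse style: exactly one formatter for thousands of conversions,
    // so nothing extra is created for the collector to clean up.
    private static final SimpleDateFormat FORMAT =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    static String reusing(long millis) {
        return FORMAT.format(new Date(millis));
    }
}
```

Both methods produce identical output; the difference is thousands of short-lived formatter objects that the first version hands to the garbage collector.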
Sometimes applications know when they are going to be idle. This would be a good time to tell the garbage collector to kick off a full garbage collection. To do this, you need a control API that lets you influence the behavior of the garbage collector. However, I hope you will realize from the previous discussion that the garbage collector usually knows better than you when it is time to take out the garbage.
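In standard Java, the control API is minimal: `System.gc()` is a hint, not a command, and the JVM is free to ignore it. A minimal sketch of the idle-time pattern (the class and method names here are illustrative):

```java
// Hinting a full collection at a known idle point, using only the
// standard java.lang API. The JVM may ignore the request entirely.
class IdleGcHint {
    // Requests a collection and reports how many bytes of apparent heap
    // usage disappeared (which may be zero or even negative).
    static long hintAndReport() {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();
        System.gc();   // a request, not a command
        long after = rt.totalMemory() - rt.freeMemory();
        return before - after;
    }
}
```

Vendor-specific virtual machines, particularly in the embedded space, often expose richer control and measurement APIs than this.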
Garbage collection in general, and generational garbage collection in particular, has been the subject of increasing research in recent years. Obviously, the stakes are very high right now: garbage collection is a major factor in the overall performance of Java and C#. This increased focus on the details has been good for the technology.
Generational garbage collection appeared first in the desktop environment. Sun's HotSpot JVM, for example, does generational garbage collection. Incremental garbage collection is typically found much more in the embedded space, however, because it reduces latency. HotSpot does incremental garbage collection too, but its time increment is quite large; collection progress is deemed more important than latency on the desktop.
Generational garbage collection has produced an order-of-magnitude improvement in garbage-collection performance, and it greatly enhances the effectiveness of incremental garbage collection. Embedded developers will find that combining incremental and generational garbage collection gives the best results, with latency adjusted as appropriate to the required responsiveness.
Kelvin Nilsen is the chief technology officer at NewMonics. He also serves as the technical chair for the J-Consortium. He earned his BS degree in physics from Brigham Young University and his MS and PhD degrees in computer science from the University of Arizona. He taught computer science for eight years at Iowa State University and holds five patents on inventions relating to garbage collection for real-time systems. His e-mail address is . Kelvin would like to thank Julien Horn for his contributions to this article.