ARC, GC, and Noise

Memory management is never completely transparent in any large-scale program. A program’s memory footprint, the time it spends allocating and managing memory, and the safety of its memory-management primitives largely determine its performance and correctness, and they tend to ripple out (consciously or subconsciously) into other design decisions, such as the data structures and algorithms we choose.

Lately, my coding has been partly in C, partly in Swift, and partly in Java, and so their respective approaches to memory management have been on my mind. When switching between languages, the differences can be pretty jarring.

C is conceptually the simplest. It’s all up to you, so you always think about it. Allocating memory on the heap? You need to know who owns it, and who will deallocate it. Both the correctness and the performance of your program depend on your own discipline.

Swift inherited automatic reference counting (ARC) from Objective-C. The core frameworks used in both languages rely on reference counting to manage object deallocation. ARC applies static code analysis within the compiler, using a nontrivial set of rules, to automatically inject reference count increment and decrement operations around variable assignments, parameter passing, and so on. This eliminates a whole lot of noise in our code, ensures that reference counts are generally correct, and even gives a performance boost, as the compiler can optimize away many more operations than it could have when we were explicitly writing them.

However, an ARC-based project still contains noise. Here I define two kinds of noise. Textual noise refers to extra code which must be written, read, and comprehended, and which exists to satisfy something like a memory manager and not to achieve any goal of the code. Cognitive noise refers to extra things you must think about while writing or modifying code, which you must get right, but which, again, are not part of the actual algorithms or data structures you are building.

For ARC, these both revolve around the classic problem with any reference-counted scheme: cyclic references or “retain cycles” (as retain is the NSObject method which increments reference counts).

Consider a GUI screen with a button on it. The screen object obviously contains a reference to the button object, which keeps the button’s reference count at 1 or higher. When the button is pressed, the screen needs to do something, so the screen assigns some sort of callback to the button object; that callback ultimately must reference the screen, so the button object also keeps the screen object’s reference count at 1 or higher. Thus a cycle: when the rest of the program lets go of the screen, neither count can ever reach zero, so neither object is deallocated. Thus a memory leak. This pattern happens all the time in GUI apps (and quite frequently in other apps): look for the words observer, listener, delegate, callback, or action in your GUI framework’s documentation or method names.
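To make the counting concrete, here is a minimal sketch in Java of a toy reference-counting model. Everything in it (the Counted class, its retain, release, and addStrong methods, and RetainCycleDemo) is hypothetical and invented for illustration; it is not how ARC is implemented, and Java itself does not count references. It only replays the screen-and-button bookkeeping described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reference-counted object. A hypothetical model, not ARC's real machinery. */
final class Counted {
    final String name;
    int refCount = 0;
    boolean deallocated = false;
    private final List<Counted> strongRefs = new ArrayList<>();

    Counted(String name) { this.name = name; }

    /** Increment the count, in the spirit of NSObject's retain. */
    void retain() { refCount++; }

    /** Decrement the count; at zero, "deallocate" and release everything this object held. */
    void release() {
        refCount--;
        if (refCount > 0) return;
        deallocated = true;
        List<Counted> held = new ArrayList<>(strongRefs);
        strongRefs.clear();
        for (Counted target : held) target.release();
    }

    /** Store a strong reference: the target's count stays raised while we hold it. */
    void addStrong(Counted target) {
        target.retain();
        strongRefs.add(target);
    }
}

final class RetainCycleDemo {
    /** Builds a screen and a button, then drops the app's only reference to the screen. */
    static Counted[] run(boolean buttonCallsBackToScreen) {
        Counted screen = new Counted("screen");
        Counted button = new Counted("button");
        screen.retain();                 // the app's owning reference to the screen
        screen.addStrong(button);        // the screen holds its button
        if (buttonCallsBackToScreen) {
            button.addStrong(screen);    // the callback captures the screen strongly
        }
        screen.release();                // the app dismisses the screen
        return new Counted[] { screen, button };
    }

    public static void main(String[] args) {
        for (boolean cycle : new boolean[] { false, true }) {
            for (Counted c : run(cycle)) {
                System.out.println("cycle=" + cycle + " " + c.name
                        + " refCount=" + c.refCount + " deallocated=" + c.deallocated);
            }
        }
    }
}
```

Without the callback, releasing the screen cascades and both objects are deallocated. With the callback, both counts are stuck at 1 after the app lets go, and nothing is left in the program that could ever release them.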

ARC solves this by allowing the programmer to declare, somewhere along the way, one or more of these references as “weak”. A weak reference doesn’t increment the target’s reference count; instead, it automatically nulls itself when the target is deallocated.
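As a rough illustration of those two properties (no increment, and self-nulling on deallocation), here is a sketch in Java using a toy counting model. RcObject, WeakSlot, and their methods are hypothetical names invented for this example. Java does ship a real java.lang.ref.WeakReference, but when it gets cleared is up to the garbage collector, so the sketch uses a hand-rolled model to keep the behavior deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reference-counted object with weak-reference support. Hypothetical model only. */
final class RcObject {
    final String name;
    int refCount = 0;
    boolean deallocated = false;
    private final List<RcObject> strongTargets = new ArrayList<>();
    private final List<WeakSlot> weakSlotsPointingHere = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    void retain() { refCount++; }

    void release() {
        refCount--;
        if (refCount > 0) return;
        deallocated = true;
        // Null out every weak slot that pointed at this object.
        for (WeakSlot slot : weakSlotsPointingHere) slot.target = null;
        weakSlotsPointingHere.clear();
        // Then let go of everything this object held strongly.
        List<RcObject> held = new ArrayList<>(strongTargets);
        strongTargets.clear();
        for (RcObject target : held) target.release();
    }

    /** Strong reference: raises the target's count. */
    void holdStrong(RcObject target) {
        target.retain();
        strongTargets.add(target);
    }

    /** Weak reference to this object: the count is deliberately left untouched. */
    WeakSlot weakReference() {
        WeakSlot slot = new WeakSlot(this);
        weakSlotsPointingHere.add(slot);
        return slot;
    }
}

/** A slot that reads as null once its target has been deallocated. */
final class WeakSlot {
    RcObject target;
    WeakSlot(RcObject target) { this.target = target; }
    RcObject get() { return target; }
}

final class WeakCallbackDemo {
    public static void main(String[] args) {
        RcObject screen = new RcObject("screen");
        RcObject button = new RcObject("button");
        screen.retain();                                   // the app's owning reference
        screen.holdStrong(button);                         // screen -> button is strong
        WeakSlot callbackTarget = screen.weakReference();  // button -> screen is weak
        screen.release();                                  // the app dismisses the screen
        System.out.println("screen deallocated=" + screen.deallocated
                + ", button deallocated=" + button.deallocated
                + ", weak slot is null=" + (callbackTarget.get() == null));
    }
}
```

Because the button’s reference back to the screen never touched the screen’s count, the app’s release takes it to zero and the whole structure unwinds.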

Doing so costs a small amount of textual noise: a few weak keywords inserted here and there. It costs a much larger amount of cognitive noise, however. You must always be aware of your dynamic object graph, especially when using function closures, and this knowledge leaks through class and module interface boundaries. When you pass an object pointer to a method, you need to know if it’s going to be held in a strong or weak reference.

This whole essay came to mind, in fact, when I switched tasks from some Swift coding to some Java coding. I implemented a small Observer-type pattern and found myself, for a moment, worried about the retain cycle. Then I remembered.

A fully garbage-collected language evaporates all of that noise. If your object is reachable from some global variable or some running function, it’s not going to get deallocated. If it’s not reachable, it will get collected, eventually. You stop managing memory; you just allocate and use it. (In high-throughput or low-latency environments, you may need to tune that usage to minimize GC pauses, so the noise comes back a bit, but in those environments, no language saves you from thinking about performance.)
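The reachability rule is easy to sketch. The Java below is a toy marking pass over a hand-built object graph, not a real collector; Node, Reachability, and reachableFrom are hypothetical names made up for this example. It shows why the screen-and-button cycle stops mattering: once nothing reachable points into the cycle, both objects are garbage, cycle or not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** A toy heap object that may reference other objects. */
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Reachability {
    /** Returns every node reachable from the roots; anything else would be collectible. */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {          // first visit: follow its references
                pending.addAll(node.refs);
            }
        }
        return marked;
    }
}

final class ReachabilityDemo {
    public static void main(String[] args) {
        Node app = new Node("app");          // stands in for a global variable (a root)
        Node screen = new Node("screen");
        Node button = new Node("button");
        app.refs.add(screen);
        screen.refs.add(button);
        button.refs.add(screen);             // the callback: a cycle

        List<Node> roots = Collections.singletonList(app);
        System.out.println("live while shown: " + Reachability.reachableFrom(roots).size());

        app.refs.clear();                    // the app dismisses the screen
        Set<Node> live = Reachability.reachableFrom(roots);
        System.out.println("screen still live? " + live.contains(screen));
        System.out.println("button still live? " + live.contains(button));
    }
}
```

Nothing in this model asks whether a reference is strong or weak, and that is the point: the question of who holds whom only matters along paths from the roots.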

And a modern GC is fast. Really, really fast. Pointer assignments, parameter passing, and function returns do not require updates of counters, ever. The amortized cost of memory allocation is a few instructions, because there isn’t any bookkeeping to do until the GC has to run. At some point, any GC will need to actually pause your program and do some cleanup, but for most apps those pauses are not a problem. (Source: I have released quite a few lines of server-side and GUI code in Java.)
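One common way a collector gets allocation down to a few instructions is bump-pointer allocation: hand out the next free offset and advance it. That technique is my assumption about what lies behind the claim here, not something specific to any one runtime, and the BumpArena class below is a hypothetical toy in Java rather than any real JVM’s allocator.

```java
/** Toy bump-pointer arena. A hypothetical sketch, not a real runtime's allocator. */
final class BumpArena {
    private final int capacity;
    private int top = 0;                 // offset of the next free byte

    BumpArena(int capacity) { this.capacity = capacity; }

    /** Returns the offset of the new block, or -1 when the arena is full. */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) return -1;   // a real GC would run here
        int offset = top;
        top += size;                     // the entire "bookkeeping" of an allocation
        return offset;
    }

    /** Stand-in for a collection that found nothing live: the whole arena is free again. */
    void reset() { top = 0; }

    int used() { return top; }
}

final class BumpArenaDemo {
    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        System.out.println("first block at " + arena.allocate(16));
        System.out.println("second block at " + arena.allocate(16));
        System.out.println("oversized request returns " + arena.allocate(40));
        arena.reset();
        System.out.println("after reset, block at " + arena.allocate(40));
    }
}
```

The fast path is a bounds check and an addition. All the expensive work (finding what is live, reclaiming the rest) is deferred to the moment the arena fills up.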

ARC is pretty pleasant to use overall, and I enjoy programming in Swift and Objective-C. But there is a substantive difference between automatic reference counting and true automatic memory management.