Über Technik und Philosophie

Memory management in D containers

23. März 2019

Some time ago I was about to write my first container that managed its own memory. It was to be an array, since that is one of the most important data structures. At first glance, it seemed to be very simple; the container would contain a pointer to a block of contiguous memory, modify it as appropriate, and finally free it in the destructor. I started to experiment with my first sketch and it crashed very soon. After quick debugging I said, “Sure it should crash here, but how do I make it work?”

Since then I have written more than one container. In this article I will try to provide an answer to the questions I had at that time. I won’t touch other topics related to containers and data structures — most examples will only take use of arrays. This article is about memory management in containers.

For the code listings, I will use containers from the D standard library (std.container). I’d also like to present my own library, tanya, that maintains a slightly different container design.

Containers and arrays

Containers are abstract data types whose instances are collections of other objects. There can be containers that hold a set of unique numbers, or represent a queue of tasks, or keep a list of titles of your favorite video games. Within the scope of this article, an array is a data structure that provides random access by index.

The code above defines arr as an array containing 3 elements: 5, 8 and 7; and asserts that the second one is 8.

Built-in arrays

One may wonder why there is an array container if D has built-in arrays. A lot of features of built-in arrays rely on the Garbage Collector.

Appending an element to an array can trigger an allocation if there is no room to hold the new element. The same is true when assigning a value to length, e.g. arr.length = 100;. Concatenating two arrays, e.g. a3 = a1 ~ a2; always allocates.

Let us rewrite this example using std.experimental.allocator:

As you can see, creating an array and appending a new element to it are carried out in two steps. To create an array, space for one element is allocated with makeArray(), then a value is assigned to it. Before appending another element, memory must first be allocated with expandArray(), then a value can be assigned to the next position. At the end the array should be freed with dispose().

Notice that capacity is 0. capacity is a property used by the Garbage Collector, representing the number of elements that can be appended before an allocation is triggered. It is 0 when the array is backed by non-GC memory. It means if you want to preallocate some storage for later usage, you have to track the real length of the array and the size of the preallocated storage manually.

Another problem is that makeArray() and expandArray() initialize the allocated memory with initial values (for int it is 0, i.e. int.init), but you probably don’t want to initialize the preallocated storage before you intend to use it.

The standard Array container provides an alternative to the built-in arrays that uses non-GC memory. Using that, we can avoid all of the hassle of tracking the capacity and handling reallocations. Let us rewrite our example once again.

There is no need to free memory at the end and no additional function call before appending a new element with insertBack(). It is as easy as before! Also, like the built-in arrays, enough memory was allocated to allow for multiple insertions before triggering an allocation.

Life cycle

As you may have noticed, there was no free() function call at the end of the last listing. Generally, there is no need to free something that wasn’t allocated with a function like make() or makeArray() from std.experimental.allocator. Since std.container.array.Array is a struct, it will be destroyed at the end of the scope and it will free the memory allocated for its internal storage.

On a side note, Phobos has a class-based container in the form of std.container.rbtree. When I say “goes out of scope”, it doesn’t apply to the RedBlackTree. Since classes are reference types managed by the GC, instances of a RedBlackTree will be destroyed at some future point when no references to them remain.

Let us look at this in more detail. First we will write a “container” for one integer. It:

  1. stores the integer in dynamically-allocated memory in the constructor
  2. and frees it in the destructor (so it is automatically deallocated when the struct goes out of scope)

Consider what happens if the container gets copied:

This code causes a “double free or corruption” error. What happens here? container1 references some dynamically allocated memory through a pointer. When container1 is assigned to container2, the pointer is copied, so container1 and container2 point to the same memory. When both variables go out of scope, they try to free the same memory. Since the standard containers also contain pointers to blocks of memory in one form or another, they should suffer from the same problem. Though if you test similar code with a container, you won’t get an error:

So how it is solved?

Garbage Collector

The first approach is to leave the memory management to the Garbage Collector. This road is taken, for example, by std.container.slist. Such containers often don’t have a destructor; the Garbage Collector takes care of the destruction.

Reference counting

std.container.array uses reference counting, i.e. it keeps track of all copies of a container and destroys the internal storage only after the last copy gets destroyed. A container copy references the same data:

Containers that rely on the Garbage Collector have the same reference semantics, i.e. if we make a copy, both the original and the copied containers reference the same data.

Let us implement such a container storing an array of bytes and see simple reference counting in action:

First of all we added a new member variable, refCounted. refCounted counts the existing references. It should be a pointer since the number of references is global for all copies of a particular container. Also a copy of a copy should be able to adjust the number of existing references to the original container. We achieve this with a pointer to a variable shared between all copies.

The storage is shared between all copies as well — if we change the data in one copy, it is changed everywhere as the last assertion demonstrates.

In the postblit constructor, this(this), we increment the reference count. Although the count is adjusted in the postblit constructor of container2, since refCounted is a pointer, refCounted for container1 and container2 is the same.

In the destructor we don’t just free the memory, but check first whether there are still any references to the container. If there is more than one reference, we decrement the reference count, so the last container destroyed will take care of freeing the memory.

As you can see, reference counting has some storage and performance overhead. It allocates memory for an additional variable and does additional operations in the constructor and destructor, though in many cases the overhead is negligible.

Copying

It’s natural to think that when copying a container, it’s data is copied too. This is what tanya.container does. Each container maintains its own copy of the data:

Here is how it can be implemented:

The constructor doesn’t allocate and initialize any additional storage besides the storage used for the data. The destructor doesn’t have to perform any checks. So there is no additional overhead at the point of construction and destruction. Only a postblit constructor is added, which allocates new storage for container2 and copies the data. container1 and container2 don’t reference the same memory anymore, the value in the storage can be changed independently.

Move semantics

Copying a tanya.container.array (or passing it to a function by value) may be inefficient if copying isn’t desired. If you don’t need the original array and want to transfer ownership of the data to the destination, you can move the container:

tanya.container also supports move semantics. In the following example arr1 isn’t copied, instead its storage is transferred to arr2:

If you omit move here, arr2 will be a copy of arr1.

Moving is also useful for reference-counted containers. If a container is moved, the number of references remains the same, so this container doesn’t need to increase and decrease the reference count, making the overhead of reference counting minimal.

Container ranges

Most of the containers provide one or more ranges that are just a specific view of the container. This means that ranges don’t manage memory themselves.

  • Copying a range from a container with value semantics doesn’t cause the copying of the container.
  • Copying a range from a reference-counted container may or may not change the reference count.
  • std.container.array.Array keeps a reference to the container storage and adjusts the reference count to ensure that the range doesn’t outlive the container.

Ranges provide read-write or read-only access to the elements of a container, but not to the memory these elements are in. It also means that ranges should be passed around with some care. The part of the container a range points to should exist to be accessible from the range, since a range doesn’t keep any elements, it only references some elements in a container.

Container as struct or class member

Imagine there is a struct that stores an array of objects. If we choose a dynamic array as a storage for that objects and manage the memory manually, then we need a destructor to free the memory required for the elements. Furthermore, if the type of the object is a struct with a destructor, we also have to take care of destroying the objects since their destructor won’t be called automatically.

Consider the following element type:

and the following code:

By contrast using a container (whether std.container.array or tanya.container.array) triggers an assertion error:

S doesn’t need a destructor now (to be precise, the compiler generates a destructor for us). Since its member is a value type (Array!T is a struct) and not a reference type like built-in arrays, it gets destroyed automatically with S.

The same applies to the copying and the postblit constructor: The reference count of the array member is automatically decreased if S is copied and decreased when the copy goes out of scope (For the containers with value semantics the array member is copied together with the struct wrapping it).

Container of classes or pointers

Finally, we want to see what happens with the elements of a container if the container is copied or destroyed. It depends whether these elements are value or reference types. It is important to understand whether the elements are owned by the container. Containers can’t destroy and deallocate reference types (like classes) because

  • there can be other references to the same object
  • the container doesn’t know how the referenced memory was allocated (it could be a different allocator)

A reference type only references some object, so if a container contains such references, it can’t own the objects themselves. Thus if you have an Array of class instances, don’t expect their destructors to be called if the container is destroyed. You have to destroy those objects manually.

Conclusion

Some applications may require customized memory management and containers can be of great help here. On the one hand, the programmer can more accurately control how much memory is allocated and when it is deallocated. On the other hand, containers take care of destroying objects properly and freeing memory.


Thanks to Michael Parker for helping out with phrasing.