This is the 8th day of my participation in Gwen Challenge

The GC tag-collation (also known as compression) algorithm (Mark Compact GC) is a combination of the GC tag-sweep algorithm and the GC replication algorithm.

This paper implements Lisp2 algorithm developed by Donald E. Knuth, based on C language

In the mark-tidy algorithm, the marking stage is exactly the same as that in the mark-clean algorithm. The heap is then searched several times to sort through the live objects.

The collation algorithm is also mobile, does not have fragmentation problems, and does not sacrifice half the heap space compared to the replication algorithm

Noun explanation

object

Objects, in the GC world, represent collections of data and are the basic unit of garbage collection.

Pointer to the

It can be interpreted as a pointer (or handle) in C, and GC searches for objects based on a pointer.

mutator

This word is translated in some places as an assignment, but it’s still a bit weird, so it’s better not to be translated…

Mutator is a word coined by Edsger Dijkstra, meaning to change something. When it comes to changing something, it’s the reference relationships between GC objects. But just saying that may not be understood, in fact, in a word, its entity is “application”.

Mutator works in two ways:

  • To generate the object
  • Update the pointer

Mutator does all this while doing some processing (numerical calculations, web browsing, editing articles, and so on) for the user of the application. As these processes progress, the reference relationships between objects “change.” Garbage is generated along with these changes, and the mechanism responsible for collecting this garbage is the GC.

GC ROOTS

GC ROOTS is the starting point for references, such as stacks, global variables

Heap (Heap)

A heap is a chunk of dynamic memory in a process. In the GC world, a large chunk of heap memory is allocated and then a mutatar is allocated

Active and inactive objects

Live objects are objects that can be referenced by mutatar (GC ROOTS) and non-live objects that cannot be accessed.

The preparatory work

In the mark-collation algorithm, sequential allocation is used, and the sequential allocation flow is shown in the figure below

Maintains a free pointer that moves after each memory allocation. Limit-free is the size of the memory available in the heap

Data structure design

The first is the structure of the object type:

To access the properties of an “object” dynamically, the property offset is used here to record the position of the property, and then the property is obtained through the evaluation of the pointer

typedef struct class_descriptor {
    char *name;/ / class name
    int size;// Class size, i.e. Sizeof (struct)
    int num_fields;// Number of attributes
    int *field_offsets;// Attribute offset in a class, i.e. the offset of all attributes in the struct
} class_descriptor;
Copy the code

Then there is the structure of the object, which, although there is no concept of inheritance in C, can be implemented through structs with common attributes:

typedef struct _object {
    class_descriptor *class;// The type of the object
    byte marked;// Whether reachable
    object *forwarding;// Target position
} object;
Copy the code
Typedef struct emp {class_descriptor *class; typedef struct emp {class_descriptor *class; Byte marked; Object *forwarding; // Target position int id; dept *dept; } emp;Copy the code

With the basic data structure in place, it’s time to implement the algorithm

Algorithm implementation (Lisp2)

The Lisp2 algorithm makes room in the object header for the forwarding pointer. The forwarding pointer is used in the same way as in the GC replication algorithm.

Suppose we want to perform GC in the following case

tag

Firstly, the marking algorithm in mark-finishing is the same as that in mark-clearing. The heap state after the marking phase is shown below:

Mark code:

void mark(object *obj) {
    if(! obj || obj->marked) {return; }

    obj->marked = TRUE;
    printf("marking... \n");
    // Recursively marks a reference to an object
    for (int i = 0; i < obj->class->num_fields; ++i) {
        mark(*((object **) ((void*) obj + obj->class->field_offsets[i]))); }}Copy the code

finishing

Collating code:

void compact(a) {
    set_forwarding();
    adjust_ref();
    move_obj();
}
Copy the code

The finishing stage is divided into three steps:

Evaluates and sets the marshaled object’s Forwarding pointer

void set_forwarding(a) {
    int p = 0;
    int forwarding_offset = 0;

    // Walk through the used part of the heap, instead of walking through the whole heap
    // Because it is a sequential assignment, we only need to traverse to the assigned destination
    while (p < next_free_offset) {
        object *obj = (object *) (p + heap);

        // Set forwarding for reachable objects
        if(obj->marked) { obj->forwarding = (object *) (forwarding_offset + heap); forwarding_offset = forwarding_offset + obj->class->size; } p = p + obj->class->size; }}Copy the code

Adjust the reference of the object to the moved address

As you can see in the figure above, references to GC Roots and other objects have been updated to precomputed forwarding Pointers after adjusting references

void adjust_ref(a) {
    int p = 0;

    // Update the roots reference first
    for (int i = 0; i < _rp; ++i) {
        object *r_obj = _roots[i];
        _roots[i] = r_obj->forwarding;
    }

    // Iterate through the heap again, updating references to the surviving objects
    while (p < next_free_offset) {
        object *obj = (object *) (p + heap);

        if (obj->marked) {
            // Update the reference to forwarding
            for (int i = 0; i < obj->class->num_fields; i++) {
                object **field = (object **) ((void *) obj + obj->class->field_offsets[i]);
                if((*field) && (*field)->forwarding) { *field = (*field)->forwarding; } } } p = p + obj->class->size; }}Copy the code

Moving objects

void move_obj(a) {
    int p = 0;
    int new_next_free_offset = 0;
    while (p < next_free_offset) {
        object *obj = (object *) (p + heap);

        if (obj->marked) {
            // Move the object to forwarding
            obj->marked = FALSE;
            memcpy(obj->forwarding, obj, obj->class->size);
            new_next_free_offset = new_next_free_offset + obj->class->size;
        }
        p = p + obj->class->size;
    }

    // Clear the gap after the move
    memset((void *)(new_next_free_offset+heap),0,next_free_offset-new_next_free_offset);

    // Update free pointer to a new boundary pointer after the move is complete
    next_free_offset = new_next_free_offset;

}
Copy the code

According to the figure above, we can confirm that active objects B, C, D and F correspond to BꞋ, CꞋ, DꞋ and FꞋ respectively. In Lisp2, the collation phase does not change the order of objects, but rather Narrows the gaps between them and brings them together at one end of the heap.

So that’s the mark-tidy algorithm

advantages

There are no fragmentation problems, and you can use the whole heap instead of splitting it in half like a copy algorithm

disadvantages

Collation costs are too high, and in the above implementation, the heap is searched three times. That is to say, the time cost of this algorithm is proportional to the heap size and has nothing to do with the number of viable objects

The complete code

Github.com/kongwu-/gc_…

Related articles

  • Garbage collection algorithm implementation-mark-clean (complete executable C language code)
  • Garbage Collection Algorithm implementation – Reference counting (complete executable C language code)
  • Garbage collection algorithm implementation – Copy (complete runnable C language code)
  • Garbage collection algorithm implementation-mark-Collation (complete executable C language code)
  • Garbage Collection algorithm implementation – generation collection (complete executable C language code)

reference

  • Algorithms and Implementation of Garbage Recycling by Narayo Nakamura, Mitsu Aikawa, Iyuo Takeuchi
  • Handbook of Garbage Collection Algorithms the Art of Automatic Memory Management. By Richard Jones. Translated by Yaguang Wang