This is the 8th day of my participation in Gwen Challenge
The GC tag-collation (also known as compression) algorithm (Mark Compact GC) is a combination of the GC tag-sweep algorithm and the GC replication algorithm.
This paper implements Lisp2 algorithm developed by Donald E. Knuth, based on C language
In the mark-tidy algorithm, the marking stage is exactly the same as that in the mark-clean algorithm. The heap is then searched several times to sort through the live objects.
The collation algorithm is also mobile, does not have fragmentation problems, and does not sacrifice half the heap space compared to the replication algorithm
Noun explanation
object
Objects, in the GC world, represent collections of data and are the basic unit of garbage collection.
Pointer to the
It can be interpreted as a pointer (or handle) in C, and GC searches for objects based on a pointer.
mutator
This word is translated in some places as an assignment, but it’s still a bit weird, so it’s better not to be translated…
Mutator is a word coined by Edsger Dijkstra, meaning to change something. When it comes to changing something, it’s the reference relationships between GC objects. But just saying that may not be understood, in fact, in a word, its entity is “application”.
Mutator works in two ways:
- To generate the object
- Update the pointer
Mutator does all this while doing some processing (numerical calculations, web browsing, editing articles, and so on) for the user of the application. As these processes progress, the reference relationships between objects “change.” Garbage is generated along with these changes, and the mechanism responsible for collecting this garbage is the GC.
GC ROOTS
GC ROOTS is the starting point for references, such as stacks, global variables
Heap (Heap)
A heap is a chunk of dynamic memory in a process. In the GC world, a large chunk of heap memory is allocated and then a mutatar is allocated
Active and inactive objects
Live objects are objects that can be referenced by mutatar (GC ROOTS) and non-live objects that cannot be accessed.
The preparatory work
In the mark-collation algorithm, sequential allocation is used, and the sequential allocation flow is shown in the figure below
Maintains a free pointer that moves after each memory allocation. Limit-free is the size of the memory available in the heap
Data structure design
The first is the structure of the object type:
To access the properties of an “object” dynamically, the property offset is used here to record the position of the property, and then the property is obtained through the evaluation of the pointer
typedef struct class_descriptor {
char *name;/ / class name
int size;// Class size, i.e. Sizeof (struct)
int num_fields;// Number of attributes
int *field_offsets;// Attribute offset in a class, i.e. the offset of all attributes in the struct
} class_descriptor;
Copy the code
Then there is the structure of the object, which, although there is no concept of inheritance in C, can be implemented through structs with common attributes:
typedef struct _object {
class_descriptor *class;// The type of the object
byte marked;// Whether reachable
object *forwarding;// Target position
} object;
Copy the code
Typedef struct emp {class_descriptor *class; typedef struct emp {class_descriptor *class; Byte marked; Object *forwarding; // Target position int id; dept *dept; } emp;Copy the code
With the basic data structure in place, it’s time to implement the algorithm
Algorithm implementation (Lisp2)
The Lisp2 algorithm makes room in the object header for the forwarding pointer. The forwarding pointer is used in the same way as in the GC replication algorithm.
Suppose we want to perform GC in the following case
tag
Firstly, the marking algorithm in mark-finishing is the same as that in mark-clearing. The heap state after the marking phase is shown below:
Mark code:
void mark(object *obj) {
if(! obj || obj->marked) {return; }
obj->marked = TRUE;
printf("marking... \n");
// Recursively marks a reference to an object
for (int i = 0; i < obj->class->num_fields; ++i) {
mark(*((object **) ((void*) obj + obj->class->field_offsets[i]))); }}Copy the code
finishing
Collating code:
void compact(a) {
set_forwarding();
adjust_ref();
move_obj();
}
Copy the code
The finishing stage is divided into three steps:
Evaluates and sets the marshaled object’s Forwarding pointer
void set_forwarding(a) {
int p = 0;
int forwarding_offset = 0;
// Walk through the used part of the heap, instead of walking through the whole heap
// Because it is a sequential assignment, we only need to traverse to the assigned destination
while (p < next_free_offset) {
object *obj = (object *) (p + heap);
// Set forwarding for reachable objects
if(obj->marked) { obj->forwarding = (object *) (forwarding_offset + heap); forwarding_offset = forwarding_offset + obj->class->size; } p = p + obj->class->size; }}Copy the code
Adjust the reference of the object to the moved address
As you can see in the figure above, references to GC Roots and other objects have been updated to precomputed forwarding Pointers after adjusting references
void adjust_ref(a) {
int p = 0;
// Update the roots reference first
for (int i = 0; i < _rp; ++i) {
object *r_obj = _roots[i];
_roots[i] = r_obj->forwarding;
}
// Iterate through the heap again, updating references to the surviving objects
while (p < next_free_offset) {
object *obj = (object *) (p + heap);
if (obj->marked) {
// Update the reference to forwarding
for (int i = 0; i < obj->class->num_fields; i++) {
object **field = (object **) ((void *) obj + obj->class->field_offsets[i]);
if((*field) && (*field)->forwarding) { *field = (*field)->forwarding; } } } p = p + obj->class->size; }}Copy the code
Moving objects
void move_obj(a) {
int p = 0;
int new_next_free_offset = 0;
while (p < next_free_offset) {
object *obj = (object *) (p + heap);
if (obj->marked) {
// Move the object to forwarding
obj->marked = FALSE;
memcpy(obj->forwarding, obj, obj->class->size);
new_next_free_offset = new_next_free_offset + obj->class->size;
}
p = p + obj->class->size;
}
// Clear the gap after the move
memset((void *)(new_next_free_offset+heap),0,next_free_offset-new_next_free_offset);
// Update free pointer to a new boundary pointer after the move is complete
next_free_offset = new_next_free_offset;
}
Copy the code
According to the figure above, we can confirm that active objects B, C, D and F correspond to BꞋ, CꞋ, DꞋ and FꞋ respectively. In Lisp2, the collation phase does not change the order of objects, but rather Narrows the gaps between them and brings them together at one end of the heap.
So that’s the mark-tidy algorithm
advantages
There are no fragmentation problems, and you can use the whole heap instead of splitting it in half like a copy algorithm
disadvantages
Collation costs are too high, and in the above implementation, the heap is searched three times. That is to say, the time cost of this algorithm is proportional to the heap size and has nothing to do with the number of viable objects
The complete code
Github.com/kongwu-/gc_…
Related articles
- Garbage collection algorithm implementation-mark-clean (complete executable C language code)
- Garbage Collection Algorithm implementation – Reference counting (complete executable C language code)
- Garbage collection algorithm implementation – Copy (complete runnable C language code)
- Garbage collection algorithm implementation-mark-Collation (complete executable C language code)
- Garbage Collection algorithm implementation – generation collection (complete executable C language code)
reference
- Algorithms and Implementation of Garbage Recycling by Narayo Nakamura, Mitsu Aikawa, Iyuo Takeuchi
- Handbook of Garbage Collection Algorithms the Art of Automatic Memory Management. By Richard Jones. Translated by Yaguang Wang