Welcome to Tencent cloud community, get more Tencent mass technology practice dry goods oh ~
Author: Liang Kang
RCU(Read-copy Update) is an important synchronization mechanism in Linux. Just as the name implies, “read, copy and update”, or frankly, “read at will, but when updating data, you need to first copy a copy, modify on the copy, and then replace the old data once and for all”. This is a synchronization mechanism implemented by the Linux kernel for “read more than write less” shared data.
Is different from other synchronization mechanism, it allows multiple readers access to a Shared data at the same time, and the reader will not affect the performance of (read “random”), also do not need synchronization mechanism between reader and writer (but need to write after the “copy”), but if there are multiple writer, the writer put “copy” of the updated coverage to the original data, Writers need to use other synchronization mechanisms to ensure synchronization between writers.
A typical application scenario of RCU is linked lists. The Linux kernel also provides a header file (include/ Linux /rculist.h), which provides an interface for adding, deleting, and checking linked lists using the RCU mechanism. This article will use an example, using the interface provided by Rculist. H to add, delete and check the operation of the linked list, to tell the principle of RCU, and introduce the relevant API in the Linux kernel (based on Linux V3.4.0 source code).
Add linked entries
Linux kernel use RCU to add items to the list source as follows:
#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
static inline void __list_add_rcu(struct list_head *new,
struct list_head *prev, struct list_head *next)
{
new->next = next;
new->prev = prev;
rcu_assign_pointer(list_next_rcu(prev), new);
next->prev = new;
}Copy the code
Rcu in the list_next_rcu() function is a compilation option for code analysis tool Sparse, which states that Pointers with RCU tags cannot be used directly but must return an RCU-protected pointer using rcu_dereference(). The rcu_dereference() interface is covered later; this section focuses on the Rcu_assign_pointer () interface. Rcu_assign_pointer ()
#define __rcu_assign_pointer(p, v, space) \({ \ smp_wmb(); \ (p) = (typeof(*v) __force space *)(v); The \})Copy the code
The net effect of the above code is to assign the value of v to p, but the key is the memory barrier on line 3. What are Memory barriers? When the CPU uses pipeline technology to execute instructions, it only ensures the execution sequence of memory-dependent instructions, such as P = V. a = *p; , since the memory pointed to by the pointer P accessed by the second instruction depends on the first instruction, the CPU will ensure that the first instruction is completed before the second instruction is executed. For non-memory dependent instructions, such as the __list_add_rcu() interface above, if line 8 is written as prev->next = new; Prev ->next = new; prev->next = new; prev->next = new; prev->next = new New ->prev = prev; , this will cause the new pointer (that is, new item list) haven’t finished the initialization is joined the list, if just happened to have a reader traversal access to the new list item (because of RCU is an important feature can perform a read operation), it will have access to an unfinished initialization list item! This problem is solved by placing a memory barrier, which ensures that instructions in front of the barrier are executed before instructions behind the barrier. This ensures that items added to the list have already been initialized.
As a final note, it is important to note that if there may be multiple threads performing the operation of adding linked items at the same time, the operation of adding linked items needs to be protected with other synchronization mechanisms such as spin_lock, etc.
Access linked list entries
The common code pattern for accessing RCU items in the Linux kernel is:
rcu_read_lock();
list_for_each_entry_rcu(pos, head, member) {
// do something with `pos`
}
rcu_read_unlock();Copy the code
Rcu_read_lock () and rcu_read_unlock() are the key to RCU “read at will”. Their effect is to declare a read-side critical sections. Before we talk about read-side critical sections, let’s look at the macro function list_FOR_each_entry_rcu that reads linked list entries. Rcu_dereference (); rcu_dereference(); rcu_dereference()
#define __rcu_dereference_check(p, c, space) \
({ \
typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
rcu_lockdep_assert(c, "suspicious rcu_dereference_check()" \
" usage"); \ rcu_dereference_sparse(p, space); \ smp_read_barrier_depends(); \ ((typeof(*p) __force __kernel *)(_________p1)); \}) line 3: declare pointer _p1 = p; Line 7: smp_read_barrier_depends(); Line 8: Return _p1;Copy the code
The above two pieces of code can actually be seen as a pattern:
rcu_read_lock();
p1 = rcu_dereference(p);
if(p1 ! = NULL) { //do something with p1, such as:
printk("%d\n", p1->field);
}
rcu_read_unlock();Copy the code
According to the implementation of rcu_dereference(), the net effect is to assign one pointer to another. What if rcu_dereference() was written p1 = p in line 2 above? There is no problem with the general processor architecture. On alpha, however, the compiler’s value-speculation option would reportedly “guess” the value of p1, and then rearrange the order to take p1->field~, so in Linux, The implementation of smp_read_barrier_DEPENDS () is architecture-specific. Arm, x86 and other architectures have empty implementations, while alpha has a memory barrier to ensure that the real address of P is obtained before dereference. So the “__rcu” compilation option mentioned in the previous section, “Adding linked list items,” forces a check to see if rcu_dereference() is used to access RCU-protected data in order to make your code more portable.
Now back to the problem of read-side critical sections. Multiple read-side critical sections are not mutually exclusive, that is, multiple readers can be in the read-side critical section at the same time. However, once a piece of memory data can be referenced by a pointer in the read-side critical section, the data in this memory block must be released until the end of the read-side critical section. The Linux Kernel API that waits for read-side critical sections to end is synchronize_rcu(). Synchronize_rcu () blocks any code that is in a read-side critical section and does not return until all read-side critical sections are complete. To get a sense of the problem, use the following code example:
Rcu_read_lock (); /* rcu_read_lock(); p1 = rcu_dereference(p);if(p1 ! = NULL) { printk("%d\n", p1->field);
}
rcu_read_unlock();
/* free the memory */
p2 = p;
if(p2 ! = NULL) { p = NULL; synchronize_rcu(); kfree(p2); }Copy the code
The following diagram shows the sequential relationship between multiple readers and memory free threads:
In the figure above, each reader’s square represents the period between the reference to p (line 5) and the end of the read-side critical section; T1 stands for the time when p = NULL; T2 indicates the time when synchronize_rcu() starts; T3 represents the time returned by synchronize_rcu(). Let’s start with Reader1,2,3, which have different end times but all get a reference to the p address before t1. Synchronize_rcu () is called at t2 when Reader1’s read-side critical sections have expired but Reader2 and 3 are still in read-side critical sections, waiting until t3 and t3 are completed. Kfree (P2) is used to free memory. The period during which synchronize_rcu() blocks is called a Grace period. Reader4,5, and 6, regardless of the time relationship with the Grace period, cannot get a reference to the p pointer because the reference time is after T1, and therefore does not enter P1! = NULL branch.
Delete linked entries
With Grace periods in mind, it’s easy to understand the deletion of linked items. Common code patterns are:
p = seach_the_entry_to_delete();
list_del_rcu(p->list);
synchronize_rcu();
kfree(p);
其中 list_del_rcu() 的源码如下,把某一项移出链表:
/* list.h */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
prev->next = next;
}
/* rculist.h */
static inline void list_del_rcu(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
entry->prev = LIST_POISON2;
}Copy the code
According to the example in the previous section “Accessing linked items,” if a reader is able to retrieve the linked item we are trying to delete from the list, then synchronize_rcu() must have entered the read-side critical section before synchronize_rcu(), and synchronize_rcu() guarantees that the read-side critical section does not actually free the item’s memory until the read-side critical section ends. It does not release the linked list item that the reader is accessing.
Update linked list entries
As mentioned earlier, the RCU Update mechanism is “Copy Update”, and the RCU list items Update mechanism is also the typical code pattern:
p = search_the_entry_to_update();
q = kmalloc(sizeof(*p), GFP_KERNEL);
*q = *p;
q->field = new_value;
list_replace_rcu(&p->list, &q->list);
synchronize_rcu();
kfree(p);Copy the code
Lines 3 and 4 make a copy, update on the copy, and then call list_replace_rcu() to replace the old node with the new one. List_replace_rcu () replaces the old node with the new one and frees the memory of the old node. List_replace_rcu ()
static inline void list_replace_rcu(struct list_head *old,
struct list_head *new)
{
new->next = old->next;
new->prev = old->prev;
rcu_assign_pointer(list_next_rcu(new->prev), new);
new->next->prev = new;
old->prev = LIST_POISON2;
}Copy the code
References
[1] What is RCU, Fundamentally? [2] www.kernel.org/doc/Documen… [3] www.kernel.org/doc/Documen… [4] www.kernel.org/doc/Documen… [5] LINUX kernel memory barrier