The code is on GitHub.
This lab felt quite simple, especially the latter two small exercises. The main focus is a study of multithreading and locking.
Uthread: switching between threads
This exercise is to implement simple user-level threads. After writing it, it turns out that a basic user-level thread implementation is not as complex as one might imagine.
We first define a context structure to hold the thread context and add it to the thread structure. Only the callee-saved registers, sp and s0-s11, need to be saved in this context; the caller-saved registers are already preserved across the call to thread_switch by the ordinary calling convention. ra holds the return address of the thread, playing a role similar to the pc of a process.
struct thread_context {
  uint64 ra;
  uint64 sp;
  uint64 fp; // s0
  uint64 s1;
  uint64 s2;
  uint64 s3;
  uint64 s4;
  uint64 s5;
  uint64 s6;
  uint64 s7;
  uint64 s8;
  uint64 s9;
  uint64 s10;
  uint64 s11;
};

struct thread {
  char stack[STACK_SIZE];        /* the thread's stack */
  int state;                     /* FREE, RUNNING, RUNNABLE */
  struct thread_context context; /* context of thread */
};
We then add initialization code to thread_create so that ra points to the thread's entry function and sp and fp point to the bottom of the stack. Note that the bottom of the stack should be &t->stack[STACK_SIZE - 1], because the stack grows from high addresses toward low addresses.
void
thread_create(void (*func)())
{
  ...
  // YOUR CODE HERE
  t->context.ra = (uint64)func;
  t->context.sp = (uint64)&t->stack[STACK_SIZE - 1];
  t->context.fp = (uint64)&t->stack[STACK_SIZE - 1];
}
thread_schedule then performs the switch by calling thread_switch((uint64)&t->context, (uint64)&next_thread->context). thread_switch must save the current thread's context, restore the next thread's context, and resume execution of the next thread by loading its ra and returning with ret.
thread_switch:
/* YOUR CODE HERE */
sd ra, 0(a0)
sd sp, 8(a0)
sd fp, 16(a0)
sd s1, 24(a0)
sd s2, 32(a0)
sd s3, 40(a0)
sd s4, 48(a0)
sd s5, 56(a0)
sd s6, 64(a0)
sd s7, 72(a0)
sd s8, 80(a0)
sd s9, 88(a0)
sd s10, 96(a0)
sd s11, 104(a0)
ld sp, 8(a1)
ld fp, 16(a1)
ld s1, 24(a1)
ld s2, 32(a1)
ld s3, 40(a1)
ld s4, 48(a1)
ld s5, 56(a1)
ld s6, 64(a1)
ld s7, 72(a1)
ld s8, 80(a1)
ld s9, 88(a1)
ld s10, 96(a1)
ld s11, 104(a1)
ld ra, 0(a1) /* set return address to next thread */
ret /* return to ra */
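For reference, a minimal sketch of the call site in thread_schedule, assuming the t, current_thread and next_thread variables from the lab's uthread.c skeleton (not the full function):

if (current_thread != next_thread) {  /* switch threads? */
  next_thread->state = RUNNING;
  t = current_thread;
  current_thread = next_thread;
  /* save t's registers, restore next_thread's, and jump to its ra */
  thread_switch((uint64)&t->context, (uint64)&next_thread->context);
} else
  next_thread = 0;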
Using threads
This exercise practices the use of locks by performing parallel operations on a hash table. Only the bucket-lock version of the code is shown here.
Because the test program separates the put and get phases, only mutual exclusion among the put operations needs to be considered. The lock is acquired before put reads or writes the bucket and released at the end of the function.
pthread_mutex_t lock[NBUCKET];    // one lock per bucket

static
void put(int key, int value)
{
  int i = key % NBUCKET;

  // is the key already present?
  struct entry *e = 0;
  pthread_mutex_lock(&lock[i]);   // acquire the bucket lock
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if(e){
    // update the existing key.
    e->value = value;
  } else {
    // the new is new.
    insert(key, value, &table[i], table[i]);
  }
  pthread_mutex_unlock(&lock[i]); // release the bucket lock
}

int
main(int argc, char *argv[])
{
  ...
  // initialize the per-bucket locks
  for (int i = 0; i < NBUCKET; i++) {
    pthread_mutex_init(&lock[i], NULL);
  }
  ...
}
The results with a single table-level lock are as follows:
$ ./ph 1
100000 puts, 7.336 seconds, 13631 puts/second
0: 0 keys missing
100000 gets, 7.599 seconds, 13160 gets/second
$ ./ph 2
100000 puts, 8.965 seconds, 11155 puts/second
1: 0 keys missing
0: 0 keys missing
200000 gets, 7.397 seconds, 27036 gets/second
It can be seen that multi-threaded performance is even lower than single-threaded performance, because the table-level lock serializes all operations and cannot exploit multiple threads, while thread creation and switching, as well as acquiring and releasing the lock itself, also add some overhead.
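For comparison, a minimal sketch of what the table-level variant looks like, assuming the same ph.c definitions as above (table_lock is a name made up here): every put contends on one global mutex, no matter which bucket it touches.

pthread_mutex_t table_lock;           // single lock for the whole table (hypothetical name)

static
void put(int key, int value)
{
  int i = key % NBUCKET;
  struct entry *e = 0;
  pthread_mutex_lock(&table_lock);    // all puts serialize on this one lock
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if (e)
    e->value = value;                 // update the existing key
  else
    insert(key, value, &table[i], table[i]);
  pthread_mutex_unlock(&table_lock);
}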
The results of bucket locking are as follows:
$ ./ph 1
100000 puts, 7.429 seconds, 13461 puts/second
0: 0 keys missing
100000 gets, 7.242 seconds, 13809 gets/second
$ ./ph 2
100000 puts, 4.472 seconds, 22359 puts/second
0: 0 keys missing
1: 0 keys missing
200000 gets, 7.347 seconds, 27221 gets/second
It can be seen that with bucket locking, multi-threading does bring some speedup, because bucket locks allow operations on different buckets to proceed in parallel and thus take advantage of multiple threads.
Barrier
This exercise is to implement a barrier: a point in the program that all threads must reach before any of them is allowed to continue. It is practice in using POSIX condition variables.
All that is needed is to implement the barrier function. There is not much to the implementation: acquire the lock and count the threads that have reached the barrier. If all threads have arrived, advance the round, reset the counter, and call pthread_cond_broadcast to wake the other threads; otherwise call pthread_cond_wait to wait.
static void
barrier()
{
  pthread_mutex_lock(&bstate.barrier_mutex);
  bstate.nthread++;
  if (bstate.nthread == nthread) {
    bstate.round++;
    bstate.nthread = 0;
    pthread_cond_broadcast(&bstate.barrier_cond);
  } else {
    pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  }
  pthread_mutex_unlock(&bstate.barrier_mutex);
}
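For context, a rough sketch of how each worker thread in the lab's test program drives the barrier round after round (simplified from barrier.c, assuming the bstate and nthread definitions there): the assert only passes if barrier() really holds every thread back until all of them have arrived.

static void *
thread(void *xa)
{
  long n = (long) xa;        // thread index (unused in this sketch)
  (void) n;
  for (int i = 0; i < 20000; i++) {
    int t = bstate.round;
    assert(i == t);          // all threads must observe the same round number
    barrier();               // block here until every thread has arrived
    usleep(random() % 100);  // random delay so threads reach the barrier out of order
  }
  return 0;
}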