The code is on GitHub.

This lab felt fairly simple, especially the latter two exercises. Its main purpose is a study of multithreading and locks.

Uthread: switching between threads

The first exercise is to implement simple user-level threads; after writing it, it turns out that a basic user-level thread implementation is not as complex as one might imagine.

We first define a context structure to hold the thread context and add it to the thread structure. Only the callee-saved registers need to be kept in this context: ra, sp, and s0-s11. ra holds the thread's return address, playing a role similar to the PC of a process.

struct thread_context{
  uint64     ra;
  uint64     sp;
  uint64     fp; // s0
  uint64     s1;
  uint64     s2;
  uint64     s3;
  uint64     s4;
  uint64     s5;
  uint64     s6;
  uint64     s7;
  uint64     s8;
  uint64     s9;
  uint64     s10;
  uint64     s11;
};

struct thread {
  char       stack[STACK_SIZE]; /* the thread's stack */
  int        state;             /* FREE, RUNNING, RUNNABLE */
  struct thread_context context; /* context of thread */
};

We then add initialization code to thread_create so that ra points to the thread's entry function and sp and fp point to the high end of the stack, i.e. &t->stack[STACK_SIZE - 1], because the stack grows from high addresses to low addresses.

void 
thread_create(void (*func)())
{
  ...
  // YOUR CODE HERE
  t->context.ra = (uint64)func;                       // thread entry function
  t->context.sp = (uint64)&t->stack[STACK_SIZE - 1];  // stack grows down from the high end
  t->context.fp = (uint64)&t->stack[STACK_SIZE - 1];
}

In thread_schedule we then call thread_switch((uint64)&t->context, (uint64)&next_thread->context);. thread_switch must save the current thread's context, restore the next thread's, and resume the next thread by loading its ra register and executing ret.
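For context, this call lives in thread_schedule, which picks the next runnable thread and switches to it. The relevant part looks roughly like this (a sketch based on the lab's uthread.c skeleton; the surrounding scheduling loop is elided):

  if (current_thread != next_thread) {   /* switch threads? */
    next_thread->state = RUNNING;
    t = current_thread;
    current_thread = next_thread;
    /* YOUR CODE HERE: switch from t to next_thread */
    thread_switch((uint64)&t->context, (uint64)&next_thread->context);
  } else
    next_thread = 0;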

thread_switch:
	/* YOUR CODE HERE */
	/* save the current thread's callee-saved registers into *a0 */
	sd ra, 0(a0)

	sd sp, 8(a0)
	sd fp, 16(a0)
	sd s1, 24(a0)
	sd s2, 32(a0)
	sd s3, 40(a0)
	sd s4, 48(a0)
	sd s5, 56(a0)
	sd s6, 64(a0)
	sd s7, 72(a0)
	sd s8, 80(a0)
	sd s9, 88(a0)
	sd s10, 96(a0)
	sd s11, 104(a0)
    
	/* restore the next thread's callee-saved registers from *a1 */
	ld sp, 8(a1)
	ld fp, 16(a1)
	ld s1, 24(a1)
	ld s2, 32(a1)
	ld s3, 40(a1)
	ld s4, 48(a1)
	ld s5, 56(a1)
	ld s6, 64(a1)
	ld s7, 72(a1)
	ld s8, 80(a1)
	ld s9, 88(a1)
	ld s10, 96(a1)
	ld s11, 104(a1)

	ld ra, 0(a1) /* set return address to next thread */
	ret    /* return to ra */

Using threads

This exercise practices the use of locks by performing parallel operations on a hash table. Only the per-bucket-lock version of the code is shown below.

Because the test program separates the put and get phases, only mutual exclusion among put operations needs to be considered. The lock is acquired before put reads or writes its bucket and released at the end of the function.

pthread_mutex_t lock[NBUCKET];  // one lock per bucket

static 
void put(int key, int value)
{
  int i = key % NBUCKET;

  // is the key already present?
  struct entry *e = 0;
  pthread_mutex_lock(&lock[i]);    // acquire this bucket's lock
  for(e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if(e){
    // update the existing key.
    e->value = value;
  } else {
    // the key is new.
    insert(key, value, &table[i], table[i]);
  }
  pthread_mutex_unlock(&lock[i]);  // release this bucket's lock
}

int
main(int argc, char *argv[])
{
  ...
  // initialize the per-bucket locks
  for (int i = 0; i < NBUCKET; i++) {
    pthread_mutex_init(&lock[i], NULL);
  }
  ...
}

The results with a single table-level lock are as follows:

$ ./ph 1
100000 puts, 7.336 seconds, 13631 puts/second
0: 0 keys missing
100000 gets, 7.599 seconds, 13160 gets/second

$ ./ph 2
100000 puts, 8.965 seconds, 11155 puts/second
1: 0 keys missing
0: 0 keys missing
200000 gets, 7.397 seconds, 27036 gets/second

With two threads the put throughput is actually lower than with a single thread, because the table-level lock serializes all put operations and cannot exploit multiple cores, while thread creation and switching, as well as acquiring and releasing the lock, add their own overhead.
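For comparison, the table-level version measured above differs from the per-bucket code only in using a single mutex for the whole table; a minimal sketch (the lock name here is illustrative, not from the lab code):

pthread_mutex_t table_lock;  // one lock shared by every bucket
                             // (initialized once in main with pthread_mutex_init(&table_lock, NULL))

static void
put(int key, int value)
{
  int i = key % NBUCKET;
  struct entry *e = 0;
  pthread_mutex_lock(&table_lock);   // serializes all puts, regardless of bucket
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if (e) {
    e->value = value;                          // update the existing key
  } else {
    insert(key, value, &table[i], table[i]);   // the key is new
  }
  pthread_mutex_unlock(&table_lock);
}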

The results with per-bucket locks are as follows:

$ ./ph 1
100000 puts, 7.429 seconds, 13461 puts/second
0: 0 keys missing
100000 gets, 7.242 seconds, 13809 gets/second

$ ./ph 2
100000 puts, 4.472 seconds, 22359 puts/second
0: 0 keys missing
1: 0 keys missing
200000 gets, 7.347 seconds, 27221 gets/second

With per-bucket locks, multithreading does bring a speedup, because puts on different buckets can proceed in parallel, so the extra cores are actually put to use.

Barrier

This exercise implements a barrier: a point in the program that all threads must reach before any of them is allowed to continue. It practices the use of POSIX condition variables.

All that needs to be done is to implement the barrier function. The implementation is short: take the lock and count how many threads have reached the barrier. If all threads have arrived, advance the round, reset the count, and call pthread_cond_broadcast to wake the other threads; otherwise call pthread_cond_wait to sleep (it releases the mutex while waiting and reacquires it before returning).

static void 
barrier()
{
  pthread_mutex_lock(&bstate.barrier_mutex);

  bstate.nthread++;               // one more thread has reached the barrier

  if(bstate.nthread == nthread){
    // last thread to arrive: start a new round and wake everyone up
    bstate.round++;
    bstate.nthread = 0;
    pthread_cond_broadcast(&bstate.barrier_cond);
  }else{
    // not everyone is here yet: sleep; the wait releases the mutex
    pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  }

  pthread_mutex_unlock(&bstate.barrier_mutex);
}
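To see the same pattern outside the lab code, here is a minimal standalone demo of a condition-variable barrier (all names here are illustrative, not from barrier.c); the while loop around pthread_cond_wait additionally guards against spurious wakeups:

// barrier_demo.c - build with: cc barrier_demo.c -lpthread
#include <stdio.h>
#include <pthread.h>

#define NTHREAD 4

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int arrived = 0;    // threads that have reached the barrier this round
static int round_no = 0;   // completed rounds

static void
demo_barrier(void)
{
  pthread_mutex_lock(&mu);
  if (++arrived == NTHREAD) {
    round_no++;
    arrived = 0;
    pthread_cond_broadcast(&cv);      // last arrival wakes everyone
  } else {
    int my_round = round_no;
    while (my_round == round_no)      // guard against spurious wakeups
      pthread_cond_wait(&cv, &mu);    // releases mu while sleeping
  }
  pthread_mutex_unlock(&mu);
}

static void *
worker(void *arg)
{
  long id = (long)arg;
  for (int i = 0; i < 3; i++) {
    printf("thread %ld reached round %d\n", id, i);
    demo_barrier();
  }
  return 0;
}

int
main(void)
{
  pthread_t tid[NTHREAD];
  for (long i = 0; i < NTHREAD; i++)
    pthread_create(&tid[i], NULL, worker, (void *)i);
  for (int i = 0; i < NTHREAD; i++)
    pthread_join(tid[i], NULL);
  return 0;
}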