Machine-level representation of a program

Program code

compile

Raising the optimization level at compile time can make the generated code faster, but compilation takes longer and the resulting code is harder to debug

The build process

The C preprocessor expands the source code → the compiler translates it into assembly-code files → the assembler converts the assembly code into binary object code → the linker merges the object files with standard Unix library functions to produce the final executable file

Assembly code and object code

Assembly code is close to machine code; its distinguishing feature is that it is a readable text representation, while object code is in binary format

The data format

Intel uses the term “word” for 16-bit data types, so it calls a 32-bit quantity a double word and a 64-bit quantity a quad word

Optimizing program performance

Write efficient programs

  • Choose the best set of algorithms and data structures
  • Write source code that the compiler can optimize effectively to convert into efficient executable code

Compiler optimization techniques

  • Machine-dependent: optimizations that rely on the low-level details of a particular machine
  • Machine-independent: optimizations that do not depend on the characteristics of any particular machine

Memory aliasing

The compiler must assume that different pointers may point to the same memory location (aliasing), which blocks many optimizations
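
A minimal C sketch of the point above (the twiddle1/twiddle2 pair is the standard textbook illustration): if xp and yp alias, the two versions give different results, so the compiler may not rewrite the first as the second.

```c
/* If xp == yp, twiddle1 quadruples *xp while twiddle2 only triples it,
   so the compiler cannot safely replace the two reads of *yp with one. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;     /* must re-read *yp afterwards: *xp may alias *yp */
    *xp += *yp;
}

void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp; /* fewer memory accesses, but only equivalent when there is no aliasing */
}
```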

Eliminating loop inefficiencies

Amdahl’s law

When we speed up one part of a system, the effect on overall performance depends on both how significant that part was and how much it was sped up
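
Stated quantitatively (using the usual textbook notation, not taken from these notes): if the improved part originally accounted for a fraction α of the total time and is sped up by a factor of k, the overall speedup is S = 1 / ((1 - α) + α/k). For example, speeding up a part that takes 60% of the time (α = 0.6) by a factor of 3 gives S = 1 / (0.4 + 0.2) ≈ 1.67.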

Writing cache-friendly code

Make the most common cases run faster

Focus on the inner loops of the core functions

Within each loop, minimize the number of cache misses

Temporal locality

A memory location that has been referenced once is likely to be referenced again many times in the near future

Spatial locality

If a memory location has been referenced once, nearby memory locations are likely to be referenced in the near future

Memory mountain

  • The size and stride parameters of the run function let us control the degree of locality in the generated read sequence
  • The smaller size is, the smaller the working set and the better the temporal locality; the smaller stride is, the better the spatial locality
  • Calling the run function repeatedly with different values of size and stride yields a two-dimensional function of read bandwidth versus temporal and spatial locality, called the memory mountain

Part 2

Chapter 7 Linking

7.1 Compiler driver

Linking

  • Linking is the process of combining different parts of code and data into a single file that can be loaded into storage for execution
  • Links can be performed at compile time (source code -> machine code), load time (the program is loaded into memory for execution), or even run time by the application
  • In the early days, linking was performed manually; on modern systems it is performed automatically by programs called linkers

7.3 Object File

Relocatable object files

Contains binary code and data that can be merged with other relocatable object files at compile time to create an executable object file

Executable object file

Contains binary code and data that can be copied directly to storage and executed

Shared object files

A special type of relocatable object file that can be dynamically loaded into storage and linked at load time or run time

7.4 Relocatable Object Files

7.9 Loading an Executable object File

If P is not a built-in shell command, the shell treats P as an executable object file and runs it by invoking memory-resident operating system code known as the loader

  • Any Unix program can call the execve function to call the loader
  • The loader copies code and data from the executable object file from disk to memory, and then jumps to the first instruction of the program, the entry point, to run the program
  • The process of copying a program into storage and running it is called loading
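
A minimal sketch of invoking the loader explicitly with execve (the path "./P" and the empty argument and environment lists are illustrative; on success execve does not return):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *argv[] = { "./P", NULL };   /* illustrative program path */
    char *envp[] = { NULL };

    execve("./P", argv, envp);        /* the loader replaces this process image with P */
    perror("execve");                 /* reached only if the loader could not run P */
    return 1;
}
```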

How does the loader actually work?

  • Every program on a Unix system runs in a process context that has its own virtual address space
  • When the shell runs a program, the parent shell process generates a child process that is a copy of the parent process
  • The child process starts the loader with the execve system call, which removes the child process’s existing virtual memory segments and creates a new set of code, data, heap, and stack segments
  • The new stack and heap segments are initialized to zero; the new code and data segments are initialized to the contents of the executable file by mapping pages of the virtual address space to page-sized chunks of the executable file
  • The loader jumps to the _start address and eventually calls the application’s main function
  • Aside from some header information, no data is copied from disk to memory during loading; a page is copied only when the CPU first references a mapped virtual page, at which point the operating system transfers it from disk to memory using its paging mechanism

7.10 Dynamically Linking shared Libraries

7.12 Position-independent code

The purpose of shared libraries is to allow multiple running processes to share the same library code in memory, saving precious memory resources

How can multiple processes share a copy of a program?

  • Method 1

Assign each shared library implementation a block of dedicated address space and then require the loader to always load the shared library at that address

Drawbacks: 1. Even if a process never uses the library, that chunk of address space is still reserved for it

2. Difficult to manage: we must ensure that no chunks overlap; whenever a library is modified we must confirm that it still fits in its assigned chunk, and every new library needs a chunk of its own. Over time, a system with hundreds of libraries and library versions would fragment the address space into many small, unusable holes, and the assignment of libraries to memory would differ from one system to another

  • Method 2

Compile the library code so that it can be loaded and executed at any address without the linker having to modify it

Such code is called position-independent code (PIC)

7.13 Tools for processing object files

Chapter 8 Exceptional Control Flow

8.1 Exceptions

  • Exceptions are a form of exceptional control flow implemented partly by the hardware and partly by the operating system
  • The details vary from system to system
  • The basic idea is the same for every system

An exception is an abrupt change in the control flow in response to some change in the processor’s state

Exception Handling Procedure

When the processor detects that such an event has occurred, it makes an indirect procedure call (the exception), through a jump table called an exception table, to an operating system subroutine designed to handle this kind of event (the exception handler)

When the exception handler completes processing, one of three things can happen, depending on the exception type:

  1. Return control to the current instruction I_curr (the instruction that was executing when the exception occurred)
  2. Return control to I_next (the instruction that would have executed next had the exception not occurred)
  3. Abort the interrupted program

8.1.1 Exception Handling

  • Each type of exception is assigned a unique non-negative integer exception number, some by the processor’s designers and others by the designers of the operating system kernel (the memory-resident part of the operating system)
  • Assigned by processor designers: divide by zero, page faults, memory access violations, breakpoints, arithmetic overflow
  • Assigned by kernel designers: system calls, signals from external I/O devices
  • At system boot time, the operating system allocates and initializes a jump table called the exception table
  • The exception number is an index into the exception table, whose starting address is held in a special CPU register called the exception table base register

Exceptions are similar to procedure calls, except that:

  • If control is transferred from a user program to the kernel, all of these items are pushed onto the kernel stack, not onto the user stack
  • Exception handlers run in kernel mode, which means they have full access to all system resources
  • In a procedure call, the processor pushes the return address onto the stack before a jump, whereas when handling an exception, the return address is either the current instruction or the next instruction, depending on the exception type

8.1.3 Types of Exceptions

interrupt
  • Hardware interrupts are asynchronous in the sense that they are not caused by any specific instruction
  • I/O devices, such as network adapters, disk controllers, and timer chips, trigger interrupts by signaling a pin on the processor chip and placing an exception number on the system bus that identifies the device causing the interrupt
trap
  • Intentional exception, the result of executing an instruction
  • Returns control to the next instruction
  • Purpose: To provide a procedure-like interface between a user program and the kernel, called a system call
  • The difference between a system call and an ordinary function call is that an ordinary function runs in user mode, which restricts the types of instructions it can execute, and it accesses only the same stack as the calling function
  • System calls run in kernel mode, which allows system calls to execute instructions and access stacks defined in the kernel

Faults
  • Caused by an error condition that a fault handler may be able to correct
  • If the error can be corrected, control returns to the faulting instruction, which is re-executed
  • Otherwise, the handler returns to an abort routine in the kernel, which terminates the application that caused the fault

Aborts
  • Aborts result from unrecoverable fatal errors, typically hardware errors such as the parity errors that occur when DRAM or SRAM bits are corrupted
  • Control is returned to an abort routine, which terminates the application

8.2 Processes

  • When we run a program on a modern system, we get the illusion that ours is the only program running: it appears to have exclusive use of the processor and the memory, the processor seems to execute its instructions one after another without interruption, and its code and data appear to be the only objects in the system’s memory. These illusions are provided by the notion of a process
  • An instance of an executing program
  • Each program runs in the context of some process
  • A context consists of the state required for the program to run correctly, including the program’s code and data in memory, its stack, the contents of its general-purpose registers, its program counter, environment variables, and the set of open file descriptors

Key abstractions that a process provides to an application

  • A separate logical control flow gives us the feeling of exclusive use of the processor
  • A private address space allows us to feel exclusive use of the memory system

8.2.1 Logical Control Flow

  • In general, each logical control flow is independent of the other logical flows
  • The only exception occurs when a process explicitly interacts with other processes using interprocess communication (IPC) mechanisms such as pipes, sockets, shared memory, and semaphores
  • Logical flows whose execution overlaps in time are called concurrent processes, and such processes are said to run concurrently
  • The notion of a process taking turns with other processes is called multitasking, and each period during which a process executes part of its flow is called a time slice
  • Multitasking is therefore also called time slicing

8.2.2 Private Address Space

The structure of the address space for Linux processes
  • The bottom three quarters of the address space are reserved for user programs, including the usual text, data, heap, and stack segments
  • The top quarter is reserved for the kernel (such as the code, data, and stack used when making system calls)

8.2.3 User Mode and kernel Mode

  • The processor provides this capability with a mode bit in a control register that describes the privileges the process currently enjoys
  • A process running in kernel mode can execute any instruction in the instruction set and access any memory location in the system
  • A process in user mode cannot directly reference code and data in the kernel area of the address space; it must access kernel code and data indirectly through the system call interface
  • The only way a process can change from user mode to kernel mode is via an exception such as an interrupt, a fault, or a trapping system call

8.2.4 Context Switch

  • The operating system kernel implements multitasking using a more advanced form of exception control flow called context switch
  • The kernel can decide to preempt the current process and restart a previously preempted process, a decision called scheduling, which is handled by code in the kernel called a scheduler
Context switching process
  1. Saves the context of the current process
  2. Restores a context saved by a preempted process
  3. Pass control to the newly restored process
  • Interrupts can also trigger context switches. For example, all systems have some mechanism for generating periodic timer interrupts, typically every 1ms or 10ms. Each time a timer interrupt occurs, the kernel determines that the current process has run long enough and switches to a new process

The switching process

  1. It takes a long time (on the order of tens of milliseconds) for the disk to read the data, so instead of waiting, the kernel performs a context switch from process A to process B
  2. Note that before the switch, the kernel was executing instructions in user mode on behalf of process A
  3. In the first part of the switch, the kernel executes instructions in kernel mode on behalf of A; at some point it begins executing instructions (still in kernel mode) on behalf of B, and when the switch completes it is executing instructions in user mode on behalf of B
  4. B runs in user mode for a while until the disk raises an interrupt signaling that the data has arrived in memory; the kernel decides that B has run long enough, performs a context switch from B back to A, and returns control to A at the instruction immediately following the read system call. Process A then continues to run until the next exception occurs
Cache pollution and exceptional control flow
  • In general, hardware cache memories (L1, L2, and L3 are quite small) do not interact well with exceptional control flow such as interrupts and context switches
  • If the current process is interrupted, the cache is cold for the interrupt handler
  • If the handler then touches enough items in main memory, the cache will also be cold for the interrupted process when it resumes
  • When a process resumes after a context switch, the cache is likewise cold for that application and must be warmed up again

8.3 System Call and Error Handling

  • We refer interchangeably to system calls and their associated wrapper functions as system-level functions

8.4 Process Control

8.4.2 Creating and Terminating Processes

Three states of a process
  1. Running: the process is either executing on the CPU or waiting to execute and will eventually be scheduled
  2. Stopped: the process’s execution is suspended and it will not be scheduled. A process stops when it receives SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU, and it remains stopped until it receives a SIGCONT signal, at which point it starts running again
  3. Terminated: the process is stopped permanently. A process terminates for one of three reasons: it receives a signal whose default action is to terminate the process; it returns from the main routine; or it calls the exit function
The fork function
  • The parent creates a new running child by calling the fork function
  • The newly created child is almost, but not exactly, the same as the parent
  • The child gets an identical (but separate) copy of the parent’s user-level virtual address space, including the text, data, and bss segments, the heap, and the user stack. The child also gets identical copies of the parent’s open file descriptors, which means that when the parent calls fork, the child can read and write any files that were open in the parent
  • The most significant difference between the parent and the newly created child is that they have different PIDs
  • fork is called once but returns twice: once in the calling process (the parent) and once in the child. In the parent, fork returns the PID of the child; in the child, fork returns 0
  • The return value of fork tells the program whether it is executing in the parent or in the child (see the sketch after this list)
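
A minimal sketch of this call-once, return-twice behavior (the variable x just makes it visible that parent and child have separate copies of the address space):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int x = 1;
    pid_t pid = fork();

    if (pid == 0) {                      /* fork returned 0: this is the child */
        printf("child : x=%d\n", ++x);   /* child's private copy of x */
        exit(0);
    }
    /* fork returned the child's PID: this is the parent */
    printf("parent: x=%d\n", --x);       /* parent's copy is unaffected by the child */
    return 0;
}
```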

8.4.3 Reaping child processes

  • When a process terminates for any reason, the kernel does not remove it from the system immediately; instead the process is kept in a terminated state until it is reaped by its parent
  • When the parent reaps the terminated child, the kernel passes the child’s exit status to the parent and then discards the terminated process, which from that point on no longer exists
  • A process that has terminated but has not yet been reaped is called a zombie
  • If a parent terminates without reaping its dead children, the kernel arranges for the init process to reap them
  • The init process has PID 1 and is created by the kernel during system initialization (a reaping sketch follows this list)
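
A minimal sketch of reaping with waitpid so that no child is left as a zombie (the number of children, N, and the exit statuses are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 3   /* illustrative number of children */

int main(void) {
    for (int i = 0; i < N; i++)
        if (fork() == 0)          /* each child terminates immediately */
            exit(100 + i);

    int status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, 0)) > 0) {   /* reap children in any order */
        if (WIFEXITED(status))
            printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    }
    return 0;                     /* waitpid returns -1 once no children remain */
}
```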

8.4.4 Putting Processes to Sleep

8.5 Signals

  • Higher-level software exceptions, called Unix signals, allow processes to interrupt other processes
  • A signal is a message that informs a process that an event of some type has occurred in the system
  • Each signal type corresponds to a certain type of system event
  • Underlying hardware exceptions are handled by kernel exception handlers and are usually invisible to user processes
  • Signals provide a mechanism for notifying user processes of the occurrence of these exceptions

Examples of signals

  1. When the user types Ctrl-C, the kernel sends a SIGINT signal to the foreground process
  2. A process can forcibly terminate another process by sending a SIGKILL(number 9) signal
  3. When a child process terminates or pauses, the kernel sends a SIGCHLD(number 17) to the parent process

8.5.1 Signal Terminology

  • A signal that is sent but not received is called a pending signal.
  • There can be at most one pending signal of each type; if a process already has a pending signal of type k, subsequent signals of type k sent to it are not queued but simply discarded
  • A process can selectively block the receipt of a signal. A blocked signal can still be delivered, but the resulting pending signal is not received until the process unblocks that signal

8.5.2 Sending signals

Process group
  • Each process belongs to only one process group, which is identified by a positive integer process group ID
  • By default, a child process and its parent process belong to the same process group
  • A process can change its own or another process’s process group by using the setpgid function

8.5.3 Receiving signals

8.5.4 Signal processing Problems

  • System calls can be interrupted by signals. System calls such as read, write, and accept, which can potentially block the process for a long period of time, are called slow system calls
  • In some systems, when the handler catches a signal, the interrupted slow system call does not continue when the signal handler returns, but immediately returns an error condition to the user

8.5.5 Portable signal processing

  • The Posix standard defines the sigaction function, which lets users explicitly specify the signal-handling semantics they want
  • The Signal wrapper function (sketched after this list) installs a signal handler with the following semantics:
  1. Only the type of signal that the handler is currently processing is blocked
  2. As with all signal implementations, signals do not queue
  3. Interrupted system calls are restarted automatically whenever possible
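
A sketch of that Signal wrapper in the CSAPP style, built on sigaction: it blocks only the type of signal being handled, relies on the kernel's no-queuing behavior, and asks the kernel to restart interrupted slow system calls where possible:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

typedef void handler_t(int);

handler_t *Signal(int signum, handler_t *handler) {
    struct sigaction action, old_action;

    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);    /* block only signals of the type being handled */
    action.sa_flags = SA_RESTART;    /* restart interrupted system calls if possible */

    if (sigaction(signum, &action, &old_action) < 0) {
        perror("Signal error");
        exit(1);
    }
    return old_action.sa_handler;    /* previous handler, as signal() would have returned */
}
```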

8.6 Nonlocal Jumps

  • C provides a form of user-level exception control flow called nonlocal jump. It transfers control directly from one function to another currently executing function without going through the normal call-return sequence
  • An important use of non-local jumps is to allow an immediate return from a deeply nested function call, usually caused by the detection of an error condition
  • If you find an error condition in a deeply nested function call, you can use a non-local jump to go straight back to a normal localized error handler instead of laboriously unwinding the call stack
  • Another important use of nonlocal jumps is to let a signal handler branch to a particular code location rather than returning to the instruction that was interrupted by the arrival of the signal (a setjmp/longjmp sketch follows this list)
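
A minimal sketch of the first use above, error recovery from a deeply nested call via setjmp and longjmp (the helper names deep and middle are illustrative):

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void deep(int bad) {
    if (bad)
        longjmp(env, 1);       /* jump straight back to the setjmp site in main */
    printf("deep: no error\n");
}

static void middle(int bad) { deep(bad); }

int main(void) {
    if (setjmp(env) == 0) {            /* direct invocation returns 0 */
        middle(1);                     /* triggers the nonlocal jump */
        printf("normal return\n");     /* skipped when longjmp fires */
    } else {
        printf("error handled without unwinding each call by hand\n");
    }
    return 0;
}
```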

8.7 Tools for Operating processes

Unix systems provide a number of useful tools for monitoring and manipulating processes:

  • Strace: Prints the trace of each system call called by a program and its children
  • Ps: Lists the current processes in the system (including zombie processes)
  • Top: Prints information about the current process resource usage
  • Kill: Sends a signal to a process
  • /proc: a virtual file system that outputs the contents of a large kernel data structure in ASCII text format, which can be read by user programs; For example, type “cat /proc/loadavg” to see the current load average on your Linux system

Chapter 9 Measuring Program Execution Time

9.1 Time flow on computer systems

  • On a macro time scale, the processor is constantly switching between many tasks, assigning approximately 5 to 20ms to each task at a time
  • The user feels that the task is going on simultaneously, because the person is not aware of periods of time shorter than about 100ms, during which the processor can execute millions of instructions

9.1.1 Process scheduling and timer interrupts

  • External events, such as keyboard, disk operations, and network activity, generate interrupt signals that allow the operating system scheduler to run and possibly switch to another process
  • We also want processors to switch from one process to another so that the user looks like the processor is executing many programs at the same time
  • The computer has an external timer that periodically sends interrupt signals to the processor,
  • The time between interrupt signals is called interval time.
  • When an interrupt occurs, the scheduler can choose to either continue the currently executing process or switch to another process
  • The interval must be short enough to switch between tasks frequently
  • Switching from one process to another takes thousands of clock cycles to save the state of the current process and prepare the state for the next, and too short a time interval can lead to poor performance
  • Typical timer intervals range from 1 to 10ms
  • Activity performed by the operating system, such as handling page faults or servicing input and output (for example, on behalf of a print function), also consumes processor time

9.1.2 Time from an application perspective

9.2 Using Interval counting to measure the time

  • Useful for approximating program performance, granularity is too coarse to be used for measurements lasting less than 100ms
  • It has a systematic bias: computation times are overestimated, by about 4% on average
  • Advantages: Its accuracy is not very dependent on system load

9.3 Cycle Counter

  • A timer that runs at the clock cycle level is a special register that increments by 1 each clock cycle
  • Not all processors have such counters

9.4 Use cycle counter to measure program execution time

9.5 Measurement based on the gettimeofday function
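
A minimal sketch of timing a code region with gettimeofday (resolution is at best microseconds, and the loop being measured is just illustrative work):

```c
#include <stdio.h>
#include <sys/time.h>

int main(void) {
    struct timeval start, end;

    gettimeofday(&start, NULL);

    volatile long sum = 0;                 /* some work to measure */
    for (long i = 0; i < 10000000; i++)
        sum += i;

    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1e6;
    printf("elapsed: %.6f s (sum=%ld)\n", secs, sum);
    return 0;
}
```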

9.7 Looking ahead

Several characteristics are introduced into the system which have great influence on performance measurement

  • Frequency-changing clocks: To reduce power consumption, future systems will change the clock frequency because power consumption is directly related to clock frequency

9.8 In practice: the K-best measurement scheme

9.9 Lessons learned

  • Every system is different
  • Experiments can be very revealing
  • Getting accurate timing is particularly difficult on heavily loaded systems
  • A cache can greatly affect the execution time of a program. The traditional technique is to either empty the cache of all useful data before the timing starts, or to load in all data that would normally be in the cache at the start

Chapter 10 Virtual Memory

  • Memory is easily corrupted, and if one process accidentally writes memory that another process uses, the process may fail in some confusing way that has nothing to do with program logic
  • Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, and private address space

Virtual storage provides three important capabilities:

  1. It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory and transferring data back and forth between disk and main memory as needed
  2. It simplifies storage management by providing a consistent address space for each process
  3. It protects the address space of each process from being corrupted by other processes

10.1 Physical and Virtual addressing

  • The main memory of a computer system is organized into an array of M contiguous byte size cells. Each byte has a unique physical address. The first byte has an address of 0, the next byte has an address of 1, the next byte has an address of 2, and so on
  • The most natural way for a CPU to access memory is to use physical addresses, known as physical addressing
  • Modern processors designed for general-purpose computing use virtual Addressing

The process of virtual addressing
  • The CPU accesses main memory by generating a virtual address. Address translation translates the virtual address into a physical address
  • Address translation requires close cooperation between the CPU hardware and the operating system
  • Dedicated hardware on the CPU chip called the memory management unit (MMU) translates virtual addresses on the fly, using a lookup table stored in main memory whose contents are managed by the operating system

10.2 Address Space

  • An address space is an ordered collection of non-negative integer addresses
  • If the integers in the address space are continuous, it is a linear address space
  • In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2^n addresses, called the virtual address space

  • The size of an address space is described by the number of bits needed to represent the largest address
  • Modern systems typically support 32-bit or 64-bit virtual address Spaces
  • A system also has a physical address space that corresponds to M bytes of physical storage in the system

10.3 Virtual storage as a caching tool

10.3.1 Organization of the DRAM cache

10.3.2 page table

  • The page table maps virtual pages to physical pages and is read each time the address translation hardware translates a virtual address to a physical address
  • The operating system is responsible for maintaining the contents of the page table and transferring pages back and forth between disk and DRAM

10.3.3 Page hits

10.3.4 Page faults

  • A DRAM cache miss is called a page fault
  • The activity of transferring pages between disk and memory is known as swapping or paging
  • The strategy of swapping in pages only when a miss occurs is called demand paging

10.3.5 Assigning a Page

10.3.6 Locality to the rescue again

  • As long as our program has good time locality, the virtual memory system works quite well
  • If the size of the working set exceeds the size of the physical memory, the program incurs an unfortunate state called thrashing, in which pages are constantly moving in and out

10.4 Virtual storage as a storage management tool

10.4.1 Simplifying links

  • A separate address space allows each process to use the same basic format for its memory image, regardless of where the code and data are actually stored in physical storage

10.4.2 Simplifying Sharing

  • In some cases, processes are needed to share code and data; for example, each process must call the same operating system kernel code, and every C program calls a standard library program, such as printf
  • Instead of including separate copies of the kernel and C standard library in each process, the operating system shares a copy of this code by mapping the appropriate virtual pages in different processes to the same physical page

10.4.3 Simplifying Storage Allocation

  • When a process requests additional heap space, the operating system allocates an appropriate number (for example, K) of contiguous virtual storage pages and maps them to K arbitrary physical pages at any location in physical storage
  • Because of the way page tables work, there is no need for the operating system to allocate K contiguous physical memory pages; pages can be randomly scattered in physical memory

10.4.4 Simplified Loading

  • Mapping a collection of contiguous virtual pages to any location in any file is called memory mapping.

10.5 Virtual memory as a storage protection tool

  • The SUP bit indicates whether the process must be running in kernel mode in order to access the page
  • If an instruction violates these permission conditions, the CPU triggers a general protection fault and passes control to a kernel exception handler; the Unix shell reports this condition as a “segmentation fault”

10.6 Address Translation

  • Address translation is a mapping between elements in the virtual address space of an N-element and elements in the physical address space of an M-element

10.7 Case study :Pentium/Linux memory system

10.8 Memory Mapping

10.8.1 Shared objects revisited

  • An object can be mapped to an area of virtual storage, either as a shared object or as a private object
  • The virtual memory region that a shared object maps to is called a shared region, and the virtual memory region that a private object maps to is called a private region
  • Any writes that a process makes to a shared object are visible to other processes that also map the shared object to their virtual storage, and these changes are reflected in the original object on disk
  • Private objects are mapped to virtual storage using a clever technique called copy-on-write
  • Two processes that map the same private object share a single physical copy of the object. For each process mapping the private object, the page table entries for the corresponding private area are marked read-only, and the area struct is flagged as private copy-on-write
  • As long as neither process tries to write to its private area, they can keep sharing the single copy of the object in physical memory. As soon as one process tries to write to a page in the private area, the write triggers a protection fault
Copy-on-write process
  • When a write triggers a protection fault, the fault handler creates a new copy of the page in physical memory, updates the page table entry to point to the new copy, and restores write permission for the page. When the handler returns, the CPU re-executes the write instruction, which now succeeds on the newly created page
  • Copy-on-write makes the best use of scarce physical memory by delaying copying in private objects until the last possible moment

10.8.2 The fork function revisited

  • When the current process calls fork, the kernel creates the various data structures for the new process and assigns it a unique PID; it makes exact copies of the current process’s mm_struct, area structs, and page tables, marks every page in both processes read-only, and flags every area struct in both processes as private copy-on-write
  • When fork returns in the new process, the new process now has exactly the same virtual memory that existed when fork was called. When either process later writes, the copy-on-write mechanism creates a new page, thus preserving the abstraction of private address space for each process

10.8.3 The execve function revisited

The execve function loads and runs the program contained in the executable object file a.out within the current process, effectively replacing the current program with the a.out program

  1. Delete the existing user areas
  2. Map the private areas
  3. Map the shared areas
  4. Set the program counter

10.8.4 User-level memory mapping with the mmap function

Unix processes can use the mmap function to create new virtual memory areas and map objects into them
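
A minimal sketch of user-level memory mapping: map a file read-only into a fresh virtual memory area and write its bytes to standard output (a tiny cat); the file name comes from the command line and error handling is kept minimal:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        exit(1);
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); exit(1); }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }

    /* Create a new private, read-only virtual memory area backed by the file */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(1); }

    write(STDOUT_FILENO, p, st.st_size);   /* the file's bytes are now ordinary memory */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```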

10.9 Dynamic Memory Allocation

  • Although the low-level mmap and munmap functions can be used to create and delete areas of virtual memory, most C programs use a dynamic memory allocator when they need additional virtual memory at run time
  • A dynamic memory allocator maintains an area of a process’s virtual memory known as the heap
  • On most Unix systems the heap is an area of demand-zero memory
  • For each process, the kernel maintains a variable BRK (break) that points to the top of the heap

  • The allocator maintains the heap as a collection of blocks of different sizes
  • Each chunk is a contiguous chunk of virtual storage that is either allocated or free
  • An allocated block remains reserved for use by the application until it is freed, either explicitly by the application itself or implicitly by the memory allocator (garbage collection)
Explicit allocators
  • The application explicitly frees any allocated blocks, for example, the C library’s malloc function allocates blocks and the free function frees a block
Implicit Allocator
  • An implicit allocator, also known as a garbage collector, is required to detect when an allocated block is no longer used by a program and then release the block.
  • The process of automatically releasing unused allocated blocks is called garbage collection

10.9.1 Malloc and free functions

  • The malloc function returns a pointer to a memory block of at least size bytes
  • After free is called, the pointer still holds the address of the freed block, and the application must not use that pointer again until the block has been reallocated by a new call to malloc (a minimal usage sketch follows)
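
A minimal sketch of explicit allocation, where the array size is only known at run time:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = 10;                          /* imagine n comes from user input */
    int *a = malloc(n * sizeof(int));       /* block of at least n*sizeof(int) bytes */
    if (a == NULL) {
        perror("malloc");
        exit(1);
    }

    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;
    printf("a[%zu] = %d\n", n - 1, a[n - 1]);

    free(a);       /* a still holds the old address but must not be dereferenced now */
    a = NULL;      /* defensive: make accidental reuse fail loudly */
    return 0;
}
```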

10.9.2 Why Use Dynamic memory allocation

  • The size of the data structure is not known until the program is actually running

10.9.3 Requirements and objectives of the allocator

Explicit allocators must work under some constraints:
  • Handling arbitrary request sequences: the allocator cannot make assumptions about the order in which blocks are allocated and freed
  • Making immediate responses to requests: the allocator must respond to each allocation request immediately; it is not allowed to reorder or buffer requests to improve performance
  • Using only the heap: for the allocator to be scalable, any non-scalar data structures it uses must themselves be stored in the heap
  • Aligning blocks: the allocator must align blocks so that they can hold any type of data object; on most systems this means the blocks it returns are aligned on 8-byte (double-word) boundaries (int64, float64, and so on)
  • Not modifying allocated blocks: the allocator may manipulate or change only free blocks; once a block is allocated, it is not allowed to modify or move it
  • Goal 1: maximize throughput, the number of requests completed per unit time (for example, 500 allocation requests and 500 free requests per second is a throughput of 1,000 operations per second)
  • Goal 2: Maximize memory utilization

10.9.4 fragments

  • Fragmentation occurs when unused storage cannot be used to satisfy allocation requests
Internal fragmentation
  • Occurs when an allocated block is larger than its payload; for example, to satisfy the alignment constraint the allocator may allocate a minimum block of 8 bytes even though the payload is only 2 bytes
External fragmentation
  • Free storage adds up to enough to satisfy an allocation request, but no single free block is large enough to handle the request
  • External fragmentation is difficult to quantify and unpredictable, so allocators typically employ heuristic strategies to try to maintain a small number of large free blocks rather than a large number of small free blocks

10.9.5 Implementation Issues

  1. Free Block Organization: How do we record free blocks
  2. Placement: How do we select a suitable free block to place a newly allocated block
  3. Split: After placing a newly allocated block into a free block, what is done with the rest of the free block?
  4. Merge: What do we do with a block that has just been freed?

10.9.6 Implicitly Free linked lists

  • Block = a one-word header (4 bytes, 32 bits) + payload + padding
  • The header encodes the block size (including the header and all padding)

Implicit free lists
  • Pros: simplicity
  • Cons: the cost of any operation that requires searching the free list, such as placing an allocated block, is linear in the total number of allocated and free blocks in the heap (a header-encoding sketch follows this list)
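
A sketch of how such a one-word header can pack the block size and the allocated bit, in the style of the CSAPP allocator macros (since block sizes are multiples of 8, the low 3 bits of the size are free to hold the flag):

```c
#define WSIZE 4                                    /* word / header size in bytes */

#define PACK(size, alloc)  ((size) | (alloc))      /* combine size and allocated bit */
#define GET(p)        (*(unsigned int *)(p))       /* read the word at address p */
#define PUT(p, val)   (*(unsigned int *)(p) = (val))
#define GET_SIZE(p)   (GET(p) & ~0x7)              /* block size incl. header and padding */
#define GET_ALLOC(p)  (GET(p) & 0x1)               /* 1 = allocated, 0 = free */

/* Example: write a header describing a 24-byte allocated block, then read it back.
   After PUT(hdr, PACK(24, 1)): GET_SIZE(hdr) == 24 and GET_ALLOC(hdr) == 1. */
```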

10.9.7 Placing allocated Blocks

  • When an application requests a k-byte block, the allocator searches the free linked list for a free block that is large enough to place the requested block, and the way the allocator performs this search is determined by the Placement policy
Common Strategies
First Fit

Search the free list from scratch and select an appropriate free block

  • Advantages: Tends to keep large free blocks later in the linked list
  • Disadvantages: Leaves “shards” of small free blocks near the start of the list, increasing the search time for larger blocks
Next Fit

This is similar to the first adaptation, except that instead of starting each search at the beginning of the list, it starts where the last query ended

  • The idea is that if we found a match last time in a free block, we’re likely to find a match next time in the remaining block
  • The next adaptation runs a little faster than the first
  • The memory utilization of the next adaptation is much lower than that of the first adaptation
Best fit

Examine each free block and select the smallest free block that matches the required request size

  • The utilization rate is higher than the first adaptation and the next adaptation
  • Disadvantages: Even in simple free list organizations, such as implicit free lists, best fit requires a thorough search of the heap

10.9.8 Splitting an Idle Block

Once a matching free block is found, you must decide how much space to allocate in that free block

  • Use the whole free block, which is simple and fast, but has the disadvantage of causing internal fragmentation
  • Divide the free block into two parts, with the first part becoming the allocation block and the rest becoming a new free block

10.9.9 Obtaining additional heap storage

What happens if the allocator cannot find a suitable free block?

  1. Coalesce free blocks that are physically adjacent in memory to create larger free blocks
  2. If step 1 still does not produce a large enough block, or if the free blocks have already been coalesced as far as possible, the allocator asks the kernel for additional heap memory by calling the mmap or sbrk function
  3. In either case, the allocator turns the additional memory into one large free block, inserts it into the free list, and places the requested block in this new free block

10.9.10 Merging Free Blocks

  • When the allocator frees an allocated block, the newly freed block may be adjacent to other free blocks; such adjacent free blocks cause a phenomenon called false fragmentation
  • With false fragmentation, the free memory is chopped up into small, unusable free blocks; for example, two adjacent 3-word free blocks cannot satisfy a 4-word request unless they are merged
  • To solve the false fragmentation problem, any actual allocator must merge adjacent free blocks, a process known as coalescing.
When to perform a merge
Immediate coalescing

Merge a block with any adjacent free blocks each time it is freed; this can be done in constant time

  • Disadvantages: In some request patterns, blocks are merged repeatedly and then split immediately, resulting in a large number of unnecessary splits and merges
Deferred coalescing

Wait until a later date to merge free blocks

  • For example, the allocator can delay merging until an allocation request fails, then scan the entire heap to merge all free blocks

10.9.11 Merge with boundary tags

  • The head of the current block points to the head of the next block, and this pointer can be checked to determine whether the next block is free. If so, its size is simply added to the size of the current block head, and the two blocks are merged in constant time
  • Boundary tag allows merging of previous blocks in constant time
  • Add a footer (a boundary tag) at the end of each block, so that the starting position and status of the previous block can be determined by examining its footer

  • The concept of boundary marking is simple and elegant and is universal to different types of allocators and free list organizations
  • Disadvantages: requiring each block to hold both a header and a footer can introduce significant memory overhead when an application manipulates many small blocks. For example, if a graph application repeatedly calls malloc and free to dynamically create and destroy graph nodes, and each node needs only two memory words, then the header and footer consume half of each allocated block
Optimizing boundary tags

The footer is needed only when the previous block is free. If we store the allocated/free bit of the previous block in one of the unused low-order bits of the current block’s header, then allocated blocks no longer need footers; free blocks still do

10.9.12 Synthesis: Implement a simple allocator

10.9.13 Explicit free lists (doubly linked free lists)

  • The heap can be organized as a doubly linked free list, with each free block containing a pred (predecessor) and a succ (successor) pointer
  • Using a bidirectional linked list, the allocation time of the first fit is reduced from linear time of the total number of blocks to linear time of the number of free blocks
A strategy for sorting blocks in a free linked list
  • Last in first Out (LIFO)

The newly freed block is placed at the beginning of the linked list. Freeing blocks can be done in constant time, or merging can be done in constant time if boundary markers are used

  • Maintain linked lists in address order

Freeing a block requires a linear-time search to locate the appropriate predecessor; memory utilization is higher, close to optimal

Disadvantages of explicitly linked lists
  • A free block must be large enough to hold all the required pointers, plus the header and possibly a footer
  • This results in a larger minimum block size, potentially increasing the degree of internal fragmentation

10.9.14 Separated Free Linked Lists (Separated Storage)

  • A popular method to reduce allocation time, commonly known as Segregated Storage, maintains multiple free lists with blocks of roughly equal size each

  • The general idea is to divide all possible block sizes into equivalence classes, also known as size classes.

  • The allocator maintains an array of free lists, each size class being a free list, in ascending order of size. When the allocator needs a block of size N, it searches the corresponding free list. If it cannot find a suitable block to match, it searches the next list, and so on

  • There are many ways to separate storage, the main differences being: how to define size classes, when to merge, when to ask the operating system for additional heap storage, whether to allow partitioning, and so on

Simple Segregated Storage
  • The free list of each size class contains blocks of equal size. Each block is the size of the largest element in the size class. For example, if a size class is defined as {17-32}, then the free list of that class consists entirely of blocks of size 32
  • When allocating, check the corresponding free list. If the list is not empty, simply allocate the whole of the first block. The free block will not be split to satisfy the allocation request
  • If the linked list is empty, the allocator requests an additional memory block of a fixed size from the operating system, divides the chunk into equal chunks, and links these blocks together to form a new free linked list
  • To free a block, you simply insert the block into the front of the corresponding free list
Intuition
  • Free memory is divided into fixed-size blocks for easy management. For example, with a list made up of 32-byte blocks, whenever the application asks the allocator for between 17 and 32 bytes, one free block is taken from the 32-byte list
  • This effectively bounds the number of free lists and reduces the maintenance burden
  • Sometimes a range such as 17–32 needs to be narrowed in order to reduce internal fragmentation
advantages
  1. Allocating and releasing blocks are fast constant time operations
  2. All blocks in a given chunk are the same size and are never split or coalesced, so the per-block memory overhead is very small
  3. Since there is no coalescing, an allocated block needs no header (no allocated/free flag) and no footer
  4. Both allocation and freeing operate at the head of the free list, so the list only needs to be singly linked
  5. The only field needed in any block is a one-word succ (successor) pointer in each free block, so the minimum block size is one word
disadvantages
  1. Because free blocks are never split, internal fragmentation can result; for example, many 17-byte objects each occupying a 32-byte block
  2. Because free blocks are never coalesced, some reference patterns cause excessive external fragmentation; for example, if a program mostly requests large objects, the blocks sitting on the free lists of the smaller size classes go unused and, never being coalesced, are wasted
Merge mode to deal with external fragments

The allocator keeps track of the number of free blocks in each chunk returned by the operating system. Whenever a chunk (such as a size-class chunk of 32) consists entirely of free blocks, the allocator removes the block from the current size-class and frees up memory for use by other size-classes

Segregated Fit
  • Each free list is associated with a size class and is organized into some type of explicit or implicit list
  • Each linked list contains potentially different-sized blocks whose sizes are members of the size class
process
  1. Determine the size class of the request, first fit the appropriate free list, and find an appropriate block
  2. If one is found, split it up and insert the rest into the appropriate free linked list
  3. If no suitable block is found, the next free linked list of the larger size class is searched, and the process is repeated until a suitable block is found
  4. If no suitable block is eventually found, additional heap storage is requested, a block is allocated from this new heap storage, and the remaining portion is placed in the largest size class
  5. To free a block, we perform a merge, placing the result into the corresponding free linked list
The advantages and disadvantages
  • Segregated fits are a common choice; the GNU malloc package provided in the C standard library uses this approach
  • Both fast and efficient in the use of memory
  • The search time is reduced because the search is limited to a portion of the heap rather than the entire heap
  • Here’s an interesting fact: A simple first-fit search for a separated free linked list is equivalent to a best-fit search for the entire heap
Buddy systems
  • A buddy system is a special case of segregated fits in which every size class is a power of 2. The basic idea is that, given a heap of 2^m words, we maintain a separate free list for each block size 2^k, where 0 <= k <= m
  • A requested block size is rounded up to the nearest power of 2
  • Initially there is a single free block of 2^m words (a small sketch of the size arithmetic follows this list)
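
A minimal sketch of the two pieces of arithmetic a buddy allocator relies on: rounding a request up to a power of 2, and finding a block's buddy by flipping one address bit (function names are illustrative, and addresses are offsets from the start of the heap):

```c
#include <stdio.h>

/* Round a request size up to the nearest power of 2 (the buddy-system size rule). */
static size_t round_up_pow2(size_t n) {
    size_t size = 1;
    while (size < n)
        size <<= 1;
    return size;
}

/* A block of size 2^k starting at offset addr has its buddy at addr ^ 2^k,
   which is what makes splitting and coalescing fast. */
static size_t buddy_of(size_t addr, size_t size) {
    return addr ^ size;
}

int main(void) {
    printf("request of 19 bytes -> block of %zu\n", round_up_pow2(19));            /* 32 */
    printf("buddy of block at offset 96 (size 32) is at %zu\n", buddy_of(96, 32)); /* 64 */
    return 0;
}
```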
process

advantages
  • Fast search and fast merge
disadvantages
  • Requiring block sizes to be powers of 2 can cause significant internal fragmentation, so buddy system allocators are not well suited to general-purpose workloads
  • For application-specific workloads where the block sizes are known in advance to be powers of 2, buddy system allocators are attractive

10.10 Garbage Collection

  • With an explicit allocator such as the C malloc package, the application allocates and frees heap blocks by calling malloc and free, and it is responsible for freeing every allocated block it no longer needs
  • A garbage collector is a dynamic memory allocator that automatically frees allocated blocks, called garbage, that the program no longer needs
  • The process of automatically reclaiming heap storage is called garbage collection
  • In a system that supports garbage collection, applications explicitly allocate heap blocks but never explicitly free them; the garbage collector periodically identifies the garbage blocks and calls free on them, returning them to the free list

10.10.1 Basic elements of the garbage collector

  • Garbage collector treats memory as a reachability graph
  • A set of root nodes and a set of heap nodes, each corresponding to an allocated block in the heap
  • The root node corresponds to a location that is not in the heap and contains Pointers to the heap, which can be registers, variables in the stack, or global variables in the read/write area of virtual storage

  • A node P is reachable when there is a directed path from any root node to P.
  • At any point, unreachable nodes corresponding to garbage cannot be reused by the application
  • The role of the garbage collector is to maintain some representation of the reachable graph and periodically reclaim unreachable nodes by releasing them and returning them to the free linked list
  • A garbage collector in a language like Java, which has tight control over how an application creates and uses Pointers, maintains an accurate representation of the reachable graph, and therefore can collect all garbage
  • Collectors for languages such as C and C++, which generally do not maintain an accurate representation of the reachable graph, are called conservative garbage collectors. They are conservative in the sense that every reachable block is correctly labeled as reachable, while some unreachable nodes may be incorrectly labeled as reachable

10.10.2 Mark&Sweep garbage collector

  • It consists of a Mark phase and a sweep phase
  • The mark phase marks all reachable and allocated successors of the root node, followed by the clear phase that frees each unmarked allocated block
  • One bit in the free lower position in the block header is used to indicate whether the block is marked

10.10.3 Conservative Mark&Sweep for C programs

10.11 Common memory-related bugs in C programs

10.11.1 Dereferencing bad pointers

10.11.2 Reading Uninitialized memory

10.11.3 Allow stack buffer overflows

  • If a program writes to a target buffer on the stack without checking the size of the input string, it has a buffer overflow bug (a minimal sketch follows)
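
A minimal sketch of the bug and a bounded alternative (the 8-byte buffer size is just illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Buggy: strcpy does not check the source length, so a long argument
   overwrites memory beyond buf on the stack. */
static void copy_bad(const char *src) {
    char buf[8];
    strcpy(buf, src);                        /* overflow if strlen(src) >= 8 */
    printf("%s\n", buf);
}

/* Bounded: snprintf truncates instead of overflowing. */
static void copy_ok(const char *src) {
    char buf[8];
    snprintf(buf, sizeof(buf), "%s", src);   /* writes at most 7 chars + '\0' */
    printf("%s\n", buf);
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
        copy_ok(argv[1]);
        copy_bad(argv[1]);                   /* undefined behavior with a long argument */
    }
    return 0;
}
```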

10.11.4 Assume that Pointers and the objects they point to are of the same size

10.11.5 Off-by-one errors

  • For example, the program creates an array A of n pointers but then tries to initialize n+1 elements of it, overwriting some memory just past the end of A

10.11.6 Referencing a pointer instead of the object it points to

Part 3: Interaction and communication between programs

Chapter 11 System-level I/O

  • Input/output (I/O) is the process of copying data between main memory and external devices such as disk drives, terminals, and networks
  • Input: Copies data from the I/O device to main memory
  • Output: Copies data from main memory to the I/O device
  • On Unix systems, these higher-level I/O functions are implemented by using system-level Unix I/O functions provided by the kernel

11.1 Unix I/O

  • All I/O devices, such as networks, disks, and terminals, are modeled as files, and all inputs and outputs are performed as reads and writes to corresponding files
  1. Open the file
  2. Change the current file location
  3. Read and write files
  4. Close files: No matter what reason the process terminates, the kernel closes all open files and frees their memory resources

11.2 Opening and Closing the File

11.3 Reading and Writing files

11.5 Reading File Metadata

11.6 Sharing a File

11.7 I/O Redirection

Chapter 12 Network programming

12.1 Client-server programming model

  • Every web application is based on the client-server model
  • An application consists of a server process and one or more client processes
  • The basic operation is a transaction

Client-server transactions versus database transactions

A client-server transaction is not a database transaction and has none of a database transaction’s properties, such as atomicity; here a transaction is simply a sequence of steps carried out between a client and a server

12.2 Networks

  • Physically, a network is a hierarchical system organized by geographical proximity. The lowest level is a local area network (LAN), which spans a building or a campus
  • The most popular LAN technology is Ethernet, which has proven itself at speeds from 3 Mb/s to 1 Gb/s
  • Each Ethernet adapter has a globally unique 48-bit address
  • An Ethernet segment consists of some cables and a small box called a hub, and usually spans a small area such as a room or a floor. The hub indiscriminately copies every bit it receives on one port to all the other ports, so every host sees every bit
  • A host can send a chunk of bits, called a frame, to any other host on the segment. Every host adapter sees the frame, but only the destination host actually reads it
  • Frame = header (which identifies the source and destination addresses and the frame length) + payload

  • Bridges make better use of cable bandwidth than hubs: with a clever distributed algorithm they automatically learn, over time, which hosts are reachable through which ports, and then selectively copy frames from one port to another only when necessary
  • For example, if host A sends a frame to host B on the same segment, bridge X discards the frame when it arrives at its input port, saving bandwidth on the other segments
  • If host A sends a frame to host C on a different segment, bridge X copies the frame only to the port connected to bridge Y, and bridge Y copies it only to the port connected to host C
  • At a higher level, multiple incompatible LANs can be connected by specialized computers called routers to form an internet (an interconnected network)
  • The lowercase word internet describes the general concept, while the capitalized word Internet refers to one specific implementation, the global IP Internet
  • A wide area network (WAN) covers a larger geographical area than a LAN

  • An internet can consist of a variety of LANs and WANs built from completely different and incompatible technologies
How is it possible for one source host to send bits of data across all these incompatible networks to another destination host?

A layer of protocol software that runs on each host and router. It smoothes out the differences between different networks. This software implements a protocol that controls how hosts and routers work together to transfer data

  • Naming: different LAN technologies have different and incompatible ways of assigning addresses to hosts. The internet protocol smooths over these differences by defining a uniform host address format that uniquely identifies each host
  • Delivery mechanisms: different networking technologies have different and incompatible ways of encoding bits on cables and packaging them into frames. The internet protocol smooths over these differences by defining a uniform way of bundling data bits into discrete chunks called packets
  • A packet consists of a header and a payload. The header contains the size of the packet and the addresses of the source and destination hosts. The payload contains bits of data sent from the source host

12.3 Global IP Internet

The Internet can be thought of as a worldwide collection of hosts with the following characteristics:

  • The host set is mapped to a set of 32-bit IP addresses
  • This set of IP addresses is mapped to a set of identifiers called Internet Domain names
  • A process on an Internet host can communicate with a process on any other Internet host through a connection
12.3.1 IP address
  • An IP address is a 32-bit unsigned integer
  • IP addresses are usually written in dotted-decimal notation
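
As a small illustration (not from the text), the standard inet_pton and inet_ntop functions convert between dotted-decimal strings and the 32-bit addresses described above:

```c
/* Sketch: converting between dotted-decimal notation and a 32-bit IP address
   using the standard inet_pton/inet_ntop functions. */
#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct in_addr addr;                   /* 32-bit IP address in network byte order */
    char buf[INET_ADDRSTRLEN];

    /* dotted decimal -> 32-bit address */
    if (inet_pton(AF_INET, "127.0.0.1", &addr) != 1)
        return 1;
    printf("0x%x\n", ntohl(addr.s_addr));  /* prints 0x7f000001 */

    /* 32-bit address -> dotted decimal */
    inet_ntop(AF_INET, &addr, buf, sizeof(buf));
    printf("%s\n", buf);                   /* prints 127.0.0.1 */
    return 0;
}
```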
12.3.2 Internet Domain names
  • Internet clients and servers use IP addresses to communicate with each other
  • Large integers are hard for people to remember, so a more human set of domain names and a mechanism for mapping domain names to IP addresses were defined
  • Every Internet host has a locally defined domain name, localhost, which is always mapped to the local loopback address 127.0.0.1
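
A minimal sketch of the name-to-address mapping, using the standard getaddrinfo function ("localhost" is just an example hostname; error handling is kept simple):

```c
/* Sketch: resolving a domain name to its IPv4 addresses with getaddrinfo. */
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(void)
{
    struct addrinfo hints, *list, *p;
    char buf[INET_ADDRSTRLEN];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;             /* IPv4 only, matching the 32-bit addresses above */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo("localhost", NULL, &hints, &list) != 0)
        return 1;
    for (p = list; p != NULL; p = p->ai_next) {
        struct sockaddr_in *sa = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sa->sin_addr, buf, sizeof(buf));
        printf("%s\n", buf);               /* e.g. 127.0.0.1 */
    }
    freeaddrinfo(list);
    return 0;
}
```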

12.3.3 Internet Connection
  • Clients and servers communicate by sending and receiving byte streams over a connection
  • A connection is point-to-point in the sense that it connects a pair of processes
  • It is full-duplex in the sense that data flows in both directions
  • A socket is an end-point of a connection. Each socket has a corresponding socket address, which consists of an IP address and a 16-bit integer port, written "address:port"
  • When a client initiates a connection request, the port in the client socket address is automatically assigned by the kernel and is called the Ephemeral port.
  • The port in the server socket is usually a well-known port corresponding to the service, for example, port 80 is used by the Web server and port 25 is used by the email server
  • A connection is uniquely identified by the socket addresses at each end. These socket addresses are called socket pairs.

12.4 Socket Interfaces

  • The sockets interface is a set of functions that are used together with Unix I/O functions to build network applications; it is implemented on most modern systems

12.4.1 Socket Address Structure
12.4.2 socket function
  • The client and server use the socket function to create a socket descriptor.
12.4.3 connect function
  • The client establishes a connection with the server by calling connect
12.4.4 open_clientfd function
  • It is convenient to wrap the socket and connect functions into a helper function called open_clientfd. When the connect function returns, we return the socket descriptor to the client and the client can immediately start communicating with the server using Unix I/O
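
One possible implementation of such a helper, sketched with the standard getaddrinfo, socket, and connect functions (details may differ from the book's version; error handling is deliberately minimal):

```c
/* Sketch of an open_clientfd-style helper: connect to hostname:port and
   return a socket descriptor ready for Unix I/O, or -1 on error. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>

int open_clientfd(const char *hostname, const char *port)
{
    int clientfd;
    struct addrinfo hints, *listp, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;               /* connection-based */
    hints.ai_flags = AI_NUMERICSERV | AI_ADDRCONFIG;

    if (getaddrinfo(hostname, port, &hints, &listp) != 0)
        return -1;

    /* Walk the address list until one of them connects successfully */
    for (p = listp; p; p = p->ai_next) {
        if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;                              /* socket failed, try the next address */
        if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
            break;                                 /* success */
        close(clientfd);                           /* connect failed, try the next address */
    }

    freeaddrinfo(listp);
    return p ? clientfd : -1;                      /* NULL p means every attempt failed */
}
```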
12.4.5 bind function
  • Tells the kernel to associate the server socket address with the socket descriptor
12.4.6 listen function
12.4.7 open_listenfd function
  • It is helpful to combine the socket, bind, and listen functions into a helper function called open_listenfd that the server can use to create a listener descriptor
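
A matching sketch of an open_listenfd-style helper (the SO_REUSEADDR option and the backlog value are conventional choices, not mandated by the text):

```c
/* Sketch of an open_listenfd-style helper: return a listening descriptor
   bound to the given port, ready to accept connection requests, or -1 on error. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>

#define LISTENQ 1024   /* backlog hint for listen() */

int open_listenfd(const char *port)
{
    int listenfd, optval = 1;
    struct addrinfo hints, *listp, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG | AI_NUMERICSERV; /* for bind, any IP */

    if (getaddrinfo(NULL, port, &hints, &listp) != 0)
        return -1;

    for (p = listp; p; p = p->ai_next) {
        if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
            continue;
        /* Avoid "Address already in use" errors when the server is restarted */
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(int));
        if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
            break;                                 /* success */
        close(listenfd);
    }
    freeaddrinfo(listp);
    if (!p)
        return -1;

    /* Turn it into a listening socket ready to accept connection requests */
    if (listen(listenfd, LISTENQ) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```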
12.4.8 accept function
  • It is used by the server to wait for connection requests from clients
  • The accept function waits for the request from the client to arrive at the listening descriptor, listenfd, then fills in the client socket address in addr and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions
12.4.9 Echo Client and Server Examples
  • A simple echo server can handle only one client at a time, iterating through the clients one after another; this is called an iterative server (a sketch follows the EOF notes below).
What does EOF mean?
  • There is no such thing as an EOF character
  • EOF is a condition detected by the kernel; an application learns of the EOF condition when it receives a zero return code from the read function
  • For disk files, EOF occurs when the current file location exceeds the file length
  • For a network connection, EOF occurs when one process closes its end of the connection; the process at the other end detects EOF when it tries to read past the final byte in the stream
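
A minimal sketch of the iterative echo server (it reuses the open_listenfd helper sketched above and plain read/write instead of the book's buffered I/O wrappers; partial writes and most error checks are ignored for brevity):

```c
/* Sketch of an iterative echo server: serves exactly one client at a time. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

#define MAXLINE 8192

int open_listenfd(const char *port);        /* sketched earlier */

static void echo(int connfd)
{
    char buf[MAXLINE];
    ssize_t n;
    /* read returns 0 (EOF) once the client closes its end of the connection */
    while ((n = read(connfd, buf, MAXLINE)) > 0)
        write(connfd, buf, n);              /* echo the bytes back */
}

int main(int argc, char **argv)
{
    int listenfd, connfd;
    struct sockaddr_storage clientaddr;
    socklen_t clientlen;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        return 1;
    }
    listenfd = open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        echo(connfd);                       /* serve this client until it closes */
        close(connfd);
    }
}
```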

12.5 Web Server

12.5.1 Web basics
  • Web clients and servers interact using a text-based application-level Protocol called HTTP(Hypertext Transfer Protocol)
  • FTP, file retrieval service
  • HTML(Hypertext Markup Language)
12.5.2 Web content
  • The content is a sequence of bytes associated with a MIME(Multipurpose Internet Mail Extensions) type

The Web server provides content to clients in two ways
  • Fetching a disk file and returning its contents to the client: the disk file is called static content, and returning the file to a client is called serving static content.
  • Running an executable file and returning its output to the client: the output produced by the executable is called dynamic content, and running the program and returning its output to a client is called serving dynamic content.
12.5.3 HTTP transaction
  • The HTTP standard requires that each text line be terminated by a carriage return and line feed pair ("\r\n")
The HTTP request
  • HTTP/1.1 defines additional headers for advanced features such as caching and security, and supports a mechanism that allows clients and servers to perform multiple transactions over the same persistent connection
  • HTTP/1.1 is backward compatible with HTTP/1.0; HTTP/1.0 clients and servers simply ignore HTTP/1.1 headers
  • Request headers provide the server with additional information, such as the brand name of the browser or the MIME types the browser understands
The HTTP response
  • Response line + Response header + Response body
  • The response headers provide additional information about the response. The two most important are Content-Type, which tells the client the MIME type of the content in the response body, and Content-Length, which gives the size of the response body in bytes

12.5.4 Service Dynamic Content
  • How does the server serve dynamic content to the client? A de facto standard called CGI(Common Gateway Interface) addresses this problem
How does the client pass program parameters to the server?
  • The arguments of a GET request are passed in the URI: a "?" separates the filename from the arguments, and individual arguments are separated by "&". Spaces are not allowed in arguments and must be encoded as "%20"; similar encodings exist for other special characters
  • The parameters in the POST request are in the request body
How does the server pass parameters to child processes
  • After the server receives a request such as GET /cgi-bin/adder?15000&213 HTTP/1.1, it calls fork to create a child process, sets the CGI environment variable QUERY_STRING to "15000&213" (which the adder program can read at run time with the Unix getenv function), and calls execve to run the /cgi-bin/adder program in the context of the child. Such a program is often called a CGI program because it follows the CGI standard; CGI programs are frequently written as Perl scripts, so they are also known as CGI scripts.
  • For POST requests, the child process also needs to redirect standard input to the connected descriptor, from which the CGI program reads the parameters in the request body
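
A hedged sketch of the fork/dup2/execve sequence described above (the function name serve_dynamic and its parameters are illustrative; in a real server the status line and initial headers would already have been written to connfd before this point):

```c
/* Sketch: serving dynamic content for a request such as
   GET /cgi-bin/adder?15000&213 HTTP/1.1 */
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

extern char **environ;

void serve_dynamic(int connfd, char *filename, char *cgiargs)
{
    char *emptylist[] = { NULL };

    if (fork() == 0) {                        /* child process */
        setenv("QUERY_STRING", cgiargs, 1);   /* e.g. "15000&213" */
        dup2(connfd, STDOUT_FILENO);          /* CGI output goes straight to the client */
        execve(filename, emptylist, environ); /* run e.g. /cgi-bin/adder */
    }
    wait(NULL);                               /* parent blocks until the child finishes */
}
```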
How does the server pass additional information to child processes?

CGI defines a number of other environment variables that a CGI program can expect to be set when it runs

Where does the child process send its output?
  • Before the child process loads and runs the CGI program, it uses the Unix dup2 function to redirect the standard output to the connected descriptor associated with the client
  • A CGI program sends its dynamic content to standard output
  • The parent process does not know the type or size of the content generated by the child, so the child is responsible for generating the Content-Type and Content-Length response headers, as well as the empty line that terminates the headers
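
A minimal sketch of an adder-style CGI program: it reads its two arguments from QUERY_STRING, then writes the Content-Type and Content-Length headers, the terminating empty line, and the dynamic content to standard output (which the server has already redirected to the connected descriptor):

```c
/* Sketch of a CGI adder program (the output format is illustrative). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *query = getenv("QUERY_STRING");     /* e.g. "15000&213" */
    char body[256];
    int a = 0, b = 0;

    if (query != NULL)
        sscanf(query, "%d&%d", &a, &b);

    int n = snprintf(body, sizeof(body),
                     "The answer is: %d + %d = %d\r\n", a, b, a + b);

    /* The CGI program is responsible for these headers and the empty line */
    printf("Content-Type: text/html\r\n");
    printf("Content-Length: %d\r\n", n);
    printf("\r\n");
    printf("%s", body);
    fflush(stdout);
    return 0;
}
```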

Chapter 13 Concurrent Programming

  • If logical control flows overlap in time, they are concurrent. This phenomenon, called concurrency, occurs at many different levels of computer systems, such as hardware exception handlers, processes, and Unix signal handlers
  • Applications that use application-level concurrency are called concurrent programs.

Uses of application-level concurrency

  • Computing in parallel on multiprocessors

On a machine with only a single CPU, concurrent flows take turns: at any point in time, only one flow is actually executing on the CPU. On a machine with multiple CPUs, called a multiprocessor, several flows can truly execute at the same time, and applications that are partitioned into concurrent flows often run much faster on such a machine

  • Accessing a slow I/O device

When an application is waiting for data to arrive from slow I/O devices (such as disks), the kernel runs other processes to keep the CPU busy and use concurrency by alternating I/O requests with other useful work

  • Interact with people

People who interact with computers require the ability to perform multiple tasks at the same time. For example, when printing a document, you might want to resize a window. Each time a user requests an operation, a separate concurrent logical flow is created to perform that operation

  • Reduce execution time by postponing work

For example, a dynamic storage allocator can reduce the latency of individual free operations by deferring coalescing to a concurrent "coalescing" flow that runs at a lower priority, using otherwise idle CPU cycles

  • Serve multiple network clients

Create a concurrent server that creates a separate logical flow for each client, serving multiple clients simultaneously

Three basic ways to construct concurrent programs

  • process

Each logical control flow is a process that is scheduled and maintained by the kernel. Processes have a separate virtual address space, and to communicate with other processes, control flows must use some explicit interprocess communication (IPC) mechanism

  • I/O multiplexing

Applications explicitly schedule their own logical flows in the context of a single process. The logical flows are modeled as state machines, and the main program explicitly transitions from one state to another as data arrives on file descriptors. The program is one process, and all flows share the same address space

  • thread

A thread is a logical flow running in the context of a single process, scheduled by the kernel. It can be thought of as a hybrid of the other two approaches: flows are scheduled by the kernel as with processes, and they share one virtual address space as with I/O multiplexing flows

13.1 Process-based concurrent programming

  • A client connection request is accepted in the parent process, and a new child process is created to serve each new client
  • Each child process is forked from the server and gets a complete copy of the server's descriptor table; the child closes its copy of the listening descriptor, and the parent closes its copy of the connected descriptor
13.1.1 Process-based concurrent Servers
  • Typically the server runs for a long time, so it needs a SIGCHLD handler to reap zombie children. Because SIGCHLD is blocked while the SIGCHLD handler executes, and because Unix signals are not queued, the handler must be prepared to reap the resources of multiple zombie children in a single invocation
  • Parent and child must each close their own copy of connfd (the connected descriptor) to avoid memory leaks
  • Because of the reference count in the socket's file table entry, the connection to the client is not terminated until both the parent's and the child's copies of connfd are closed
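
A minimal sketch of this accept/fork loop (it reuses the open_listenfd and echo helpers sketched in Chapter 12; error checking and the argument check are omitted for brevity):

```c
/* Sketch of a process-based concurrent server. */
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>

int open_listenfd(const char *port);    /* sketched earlier */
void echo(int connfd);                  /* serves one client until it closes */

static void sigchld_handler(int sig)
{
    (void)sig;
    /* Signals are not queued, so reap as many zombie children as possible */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(int argc, char **argv)
{
    int listenfd, connfd;
    struct sockaddr_storage clientaddr;
    socklen_t clientlen;

    (void)argc;
    signal(SIGCHLD, sigchld_handler);
    listenfd = open_listenfd(argv[1]);

    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (fork() == 0) {              /* child */
            close(listenfd);            /* child does not need the listening descriptor */
            echo(connfd);
            close(connfd);
            exit(0);                    /* child exits; the handler reaps it */
        }
        close(connfd);                  /* parent must close its copy of connfd */
    }
}
```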
13.1.2 The pros and cons of the process
  • Sharing of state between parent and child processes: they share file tables but not user address spaces
Separate process address spaces
  • Advantages: It is impossible for one process to accidentally overwrite another process’s virtual storage
  • Disadvantages: it is more difficult for processes to share state information; to do so they must use explicit IPC mechanisms, and process control and IPC tend to be slow and expensive

13.2 Concurrent programming based on I/O multiplexing

  • The server must respond to two independent I/O events: a network client initiating a connection request, and the user typing a command line on the keyboard
  • I/O multiplexing asks the kernel to suspend the process and return control to the application only after one or more I/O events have occurred
  • A drawback: once the server is connected to a client, it keeps echoing input lines until the client closes the connection; during that time, a command typed on standard input gets no response until the session between the client and the server ends. A better approach is finer-grained multiplexing, in which each iteration of the server loop echoes at most one line of text
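
A minimal sketch of the coarse-grained select loop described above (command() stands in for whatever handles a line typed on standard input; it is illustrative, not from the text):

```c
/* Sketch: multiplexing standard input and the listening descriptor with select. */
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

int open_listenfd(const char *port);     /* sketched earlier */
void echo(int connfd);                   /* serves one client until it closes */
void command(void);                      /* handles a line from stdin (illustrative) */

int main(int argc, char **argv)
{
    int listenfd, connfd;
    fd_set read_set, ready_set;
    struct sockaddr_storage clientaddr;
    socklen_t clientlen;

    (void)argc;
    listenfd = open_listenfd(argv[1]);

    FD_ZERO(&read_set);                  /* the descriptors we want to watch */
    FD_SET(STDIN_FILENO, &read_set);
    FD_SET(listenfd, &read_set);

    while (1) {
        ready_set = read_set;            /* select overwrites its argument, so copy first */
        select(listenfd + 1, &ready_set, NULL, NULL, NULL);

        if (FD_ISSET(STDIN_FILENO, &ready_set))
            command();                   /* a command arrived on standard input */

        if (FD_ISSET(listenfd, &ready_set)) {
            clientlen = sizeof(clientaddr);
            connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
            echo(connfd);                /* coarse-grained: serve this client to completion */
            close(connfd);
        }
    }
}
```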
13.2.1 Concurrent event-driven servers based on I/O multiplexing
13.2.2 Advantages and disadvantages of I/O multiplexing
advantages
  • It gives the programmer more control over the behavior of the program than a process-based design. For example, an event-driven concurrent server could choose to give certain favored clients the service they require
  • Running in the context of a single process, each logical flow has access to the entire address space of the process, data is easily shared between the flows, and programs can be debugged using debugging tools (such as GDB) as they would for sequential programs
  • Event-driven designs are often significantly more efficient than process-based designs, requiring no process context switch to schedule new flows
disadvantages
  • The code is complex: an event-driven concurrent server requires roughly three times as much code as a process-based one, and the complexity increases as the granularity of concurrency decreases
  • Granularity: Indicates the number of instructions executed per time slice for each logical flow

13.3 Thread-based concurrent programming

  • A hybrid of the process-based and I/O multiplexing-based approaches
  • A thread is a logical flow running in a process context, automatically scheduled by the kernel. Each thread has its own thread context, including a unique integer thread ID (TID), a stack, a stack pointer, a program counter, general-purpose registers, and condition codes
  • All threads running in a process share the entire virtual address space of that process
13.3.1 Threaded execution model
  • Each process begins its life cycle with a single thread, called the main thread.
  • At some point, the main thread creates a peer thread, and from this point on, the two threads run concurrently

Process and thread differences
  • Thread contexts are much smaller than process contexts and switch much faster
  • Threads, unlike processes, are not organized in a strict parent-child hierarchy. The threads associated with a process form a pool of peers, independent of which thread created which; a thread can kill any of its peers or wait for any peer to terminate, and every peer can read and write the same shared data
13.3.2 Posix threads
  • Posix Threads (Pthreads) is a standard interface for handling threads in C programs and is available on most Unix systems
  • About 60 functions are defined that allow programs to create, kill, and reclaim threads, securely share data with peers, and notify peers of changes in system state
  • Thread code and local data are packaged in a thread routine that is passed to the thread when it is created (similar to passing a func in Go)
13.3.3 Creating a Thread
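
A minimal "hello, world" sketch of creating and reaping a thread with the Pthreads interface:

```c
/* Sketch: the main thread creates one peer thread and waits for it. */
#include <stdio.h>
#include <pthread.h>

void *thread(void *vargp)                        /* thread routine */
{
    (void)vargp;
    printf("Hello from a peer thread\n");
    return NULL;                                 /* implicit termination on return */
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);    /* create a peer thread */
    pthread_join(tid, NULL);                     /* wait for it and reclaim its resources */
    return 0;
}
```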
13.3.4 Terminating a Thread

A thread is terminated in one of the following ways

  • When the top-level thread routine returns, the thread terminates implicitly
  • pthread_exit: a thread terminates explicitly by calling the pthread_exit function, supplying a pointer thread_return as its return value
  • If the main thread calls pthread_exit, it waits for all other peer threads to terminate and then terminates the main thread and the entire process, with return value thread_return
  • Unix’s exit function: A peer thread calls Unix’s exit function to terminate a process and all threads associated with that process
  • pthread_cancel: a peer thread terminates the thread with the specified thread ID by calling the pthread_cancel function
13.3.5 Reclaiming Resources from a Terminated thread
  • The thread waits for the specified thread to terminate by calling the pthread_join function
  • The pthread_join function blocks until the thread TID terminates, and then reclaims any memory resources occupied by the terminated thread
13.3.6 Separating threads
  • At any point in time, threads are joinable or detached.
  • A joinable thread can be reaped or killed by other threads, and its memory resources (such as its stack) are not freed until it is reaped
  • A detached thread cannot be reclaimed or killed by other threads, and its memory resources are automatically released by the system when it terminates
  • By default, threads are created to be joinable
  • To avoid memory leaks, each joinable thread should either be explicitly reaped by another thread or detached by a call to the pthread_detach function
13.3.7 Initializing a Thread
  • The pthread_once function allows you to initialize the state associated with thread routines
13.3.8 A thread-based concurrent server
  • Passes a pointer to the connected descriptor to the child thread
  • Since the threads are not explicitly reaped, each thread must detach itself so that its resources are reclaimed by the system when it terminates (see the sketch below)
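
A minimal sketch of this thread-per-connection pattern (it reuses the open_listenfd and echo helpers sketched earlier; error checks are omitted):

```c
/* Sketch of a thread-based concurrent server. */
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>

int open_listenfd(const char *port);       /* sketched earlier */
void echo(int connfd);                     /* serves one client until it closes */

void *thread(void *vargp)
{
    int connfd = *((int *)vargp);
    pthread_detach(pthread_self());        /* let the system reclaim resources on exit */
    free(vargp);                           /* free the block malloc'd by the main thread */
    echo(connfd);
    close(connfd);
    return NULL;
}

int main(int argc, char **argv)
{
    int listenfd, *connfdp;
    pthread_t tid;
    struct sockaddr_storage clientaddr;
    socklen_t clientlen;

    (void)argc;
    listenfd = open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfdp = malloc(sizeof(int));     /* pass a pointer to avoid a race on connfd */
        *connfdp = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        pthread_create(&tid, NULL, thread, connfdp);
    }
}
```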

13.4 Shared Variables in multiple threads

13.4.1 Thread memory model
  • A group of concurrent threads runs within the context of a process, each with its own independent thread context (including a thread ID, stack, stack pointer, program counter, condition codes, and general-purpose register values)
  • Multiple threads share the process context, including the entire user virtual address space, and a collection of open files
  • Registers are never shared, and virtual storage is always shared
13.4.2 Mapping variables to storage

Multithreaded C program variables

  • Global variables are defined outside of functions, and the read/write area of virtual storage contains only one instance
  • Local automatic variables: variables defined inside a function without the static attribute. At run time, each thread's stack contains its own instances of all its local automatic variables; they are unique to each thread even when multiple threads execute the same function
  • Local static variables: variables defined inside a function with the static attribute. As with global variables, the read/write area of virtual storage contains only one instance
13.4.3 Sharing variables
  • A variable v is shared if and only if one of its run-time instances is referenced by more than one thread

13.5 Synchronizing Threads with semaphores

13.5.2 Accessing shared variables using semaphores
  • A semaphore s is a special type of global variable with a non-negative integer value that can be manipulated only by two special operations, called P and V
  • P(s): if s is non-zero, P decrements s and returns immediately; if s is zero, the thread is suspended and blocked until s becomes non-zero, at which point it is restarted by a V operation, completes the P operation by decrementing s, and regains control
  • V(s): increments s by 1; if any threads are blocked in a P operation waiting for s to become non-zero, the V operation restarts exactly one of them
Binary semaphore
  • Each shared variable (or related set of shared variables) is associated with a semaphore s, initially 1, and the corresponding critical section is surrounded by P(s) and V(s) operations. A semaphore used this way always has the value 0 or 1, so it is called a binary semaphore
  • Progress graphs (which model the execution of concurrent threads as a trajectory through state space) can be used to visualize why this works

  • The forbidden region created by the P and V operations makes it impossible for more than one thread to be executing instructions in the enclosed critical section at any point in time; the semaphore operations guarantee mutually exclusive access to the critical section, a property commonly referred to as mutual exclusion.
  • Binary semaphores whose purpose is to provide mutual exclusion are often called mutexes. A P operation on a mutex is called locking it, a V operation is called unlocking it, and a thread that has locked a mutex but not yet unlocked it is said to hold the mutex
13.5.3 Posix semaphores
  • The Posix standard defines a number of functions that operate on semaphores. The three basic operations are sem_init, sem_wait (the P operation), and sem_post (the V operation).
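
A minimal sketch of protecting a shared counter with a binary semaphore: sem_wait plays the role of P and sem_post plays the role of V (the counter and iteration count are illustrative):

```c
/* Sketch: two threads increment a shared counter inside a P/V-protected
   critical section, so the final value is deterministic. */
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define NITERS 100000

volatile long cnt = 0;       /* shared counter */
sem_t mutex;                 /* binary semaphore protecting cnt */

void *count(void *vargp)
{
    (void)vargp;
    for (long i = 0; i < NITERS; i++) {
        sem_wait(&mutex);    /* P(mutex) */
        cnt++;               /* critical section */
        sem_post(&mutex);    /* V(mutex) */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);  /* initial value 1 makes it a binary semaphore */

    pthread_create(&t1, NULL, count, NULL);
    pthread_create(&t2, NULL, count, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("cnt = %ld (expected %d)\n", cnt, 2 * NITERS);
    return 0;
}
```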
13.5.4 Scheduling shared resources using semaphores

Other synchronization mechanisms
  • Java threads are synchronized using a mechanism called the Java Monitor, which provides a higher level of abstraction of semaphore mutex and scheduling capabilities

13.6 Synthesis: Concurrent servers based on pre-threading

  • The drawback of creating a new thread for each new client is the cost of thread creation; a pre-threaded server avoids this by creating its worker threads in advance

13.7 Other concurrency problems

13.7.1 Thread Safety
  • A function that consistently produces correct results if and only if called repeatedly by multiple concurrent threads is said to be thread-safe.
Class 1: Functions that do not protect shared variables
  • Modify an unprotected variable
  • Solution: protect the shared variables with synchronization operations such as P and V. The advantage is that callers do not need to change; the disadvantage is that the synchronization operations slow the program down
Class 2: functions that protect state across multiple calls
  • Functions share a global variable
  • Solution: the caller passes the state information in the function's arguments. Disadvantage: the callers' code must be modified
Class 3: Functions that return Pointers to static variables
  • A global variable is shared
  • Solution: use the lock-and-copy technique to define a thread-safe wrapper function that performs the lock-and-copy, then replace all calls to the thread-unsafe function with calls to the wrapper (see the sketch after this list)
Class 4: Functions that call thread-unsafe functions
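
As an illustration of the lock-and-copy technique from Class 3, here is a sketch of a thread-safe wrapper around ctime, which returns a pointer to a static buffer (the wrapper name ctime_ts is illustrative; a Pthreads mutex is used here instead of a semaphore):

```c
/* Sketch: lock-and-copy wrapper for the thread-unsafe ctime function. */
#include <string.h>
#include <time.h>
#include <pthread.h>

static pthread_mutex_t ctime_mutex = PTHREAD_MUTEX_INITIALIZER;

/* buf must be able to hold at least 26 bytes, the size of ctime's result */
char *ctime_ts(const time_t *timep, char *buf)
{
    pthread_mutex_lock(&ctime_mutex);    /* lock */
    strcpy(buf, ctime(timep));           /* copy the static result while holding the lock */
    pthread_mutex_unlock(&ctime_mutex);  /* unlock */
    return buf;                          /* caller-private copy */
}
```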
13.7.2 Reentrancy
  • An important class of thread-safe functions is called reentrant functions.
  • Characteristic: they do not reference any shared data when called by multiple threads
  • Reentrant functions are generally more efficient than thread-safe functions that cannot be reentrant because no synchronization operations are required

  • If all function arguments are passed by value (no pointers) and all data references are to local automatic stack variables (i.e., no references to static or global variables), then the function is explicitly reentrant: it can be assumed to be reentrant regardless of how it is called
  • If we relax this so that some arguments may be pointers, we get an implicitly reentrant function: it is reentrant only if the calling threads are careful to pass pointers to unshared data
13.7.3 Using existing library functions in multithreading
  • Most Unix functions and functions defined in the standard C library are thread-safe, with a few exceptions
  • Unix systems provide reentrant versions of most thread-unsafe functions, always ending with the “_r” suffix
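
A small example of using one of these "_r" variants: rand_r keeps its state in a caller-supplied seed rather than in a hidden static variable, so each thread can safely use its own seed:

```c
/* Sketch: the reentrant counterpart of rand(). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned int seed = 1;               /* the per-caller state lives here */
    for (int i = 0; i < 3; i++)
        printf("%d\n", rand_r(&seed));   /* no hidden shared state is touched */
    return 0;
}
```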
13.7.4 Races
  • Cause: the programmer assumes that threads will take some particular trajectory through the execution state space
  • A multithreaded program must work correctly for any feasible trajectory
13.7.5 deadlock
  • A deadlock occurs when a collection of threads is blocked, each waiting for a condition that will never be true
Avoid deadlock
  • Mutex lock ordering rule: if, for every pair of mutexes (s, t) in a program, each thread that holds both s and t locks them in the same order, then the program is deadlock-free
  • That is, both threads acquire the locks in the order P(s) then P(t), as sketched below
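
A minimal sketch of the rule (the shared data and the work inside the critical section are illustrative): both threads acquire the two semaphores in the same order, P(s) then P(t), so no deadlock trajectory is possible:

```c
/* Sketch: deadlock-free locking order for two binary semaphores. */
#include <pthread.h>
#include <semaphore.h>

sem_t s, t;
long a = 0, b = 0;          /* shared data protected by s and t */

void *worker(void *vargp)
{
    (void)vargp;
    sem_wait(&s);           /* P(s) -- always acquired first  */
    sem_wait(&t);           /* P(t) -- always acquired second */
    a++;
    b++;                    /* critical section */
    sem_post(&t);           /* V(t) */
    sem_post(&s);           /* V(s) */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&s, 0, 1);
    sem_init(&t, 0, 1);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```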