Machine-level representation of a program
Program code
compile
Raising the compiler's optimization level can make the generated code faster, but compilation takes longer and the resulting code is harder to debug
The build process
C preprocessor expands the source code → compiler translates the preprocessed source into assembly code → assembler converts the assembly code into binary object code → linker merges the object files with standard Unix library routines to produce the final executable file
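As a concrete sketch, the stages can be driven by hand with the corresponding gcc flags (the file name hello.c is illustrative; exact flags may vary by toolchain):

```c
/* hello.c -- a minimal program for walking through the build stages.
 *
 *   gcc -E hello.c -o hello.i   # preprocessor: expand #include and macros
 *   gcc -S hello.i -o hello.s   # compiler: generate assembly code
 *   gcc -c hello.s -o hello.o   # assembler: produce relocatable object code
 *   gcc hello.o -o hello        # linker: merge with library code -> executable
 */
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}
```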
Assembly code and object code
Assembly code is a readable, textual representation that is very close to machine code; object code is the corresponding binary format
The data format
Intel uses the term "word" to refer to 16-bit data, so it calls 32 bits a "double word" and 64 bits a "quad word"
Optimizing program performance
Write efficient programs
- Choose the best set of algorithms and data structures
- Write source code that the compiler can optimize effectively to convert into efficient executable code
Compiler optimizations
- Machine-dependent: optimizations that depend on low-level details of the target machine
- Machine-independent: optimizations that do not depend on the characteristics of any particular machine
Memory aliasing
The compiler must assume that different pointers may refer to the same memory location, which blocks many optimizations
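The classic illustration is a pair of functions that are equivalent only when their arguments do not alias (a minimal sketch in C):

```c
/* If xp and yp point to the same location (memory aliasing),
 * twiddle1 and twiddle2 produce different results: twiddle1 ends up
 * multiplying *xp by 4, twiddle2 by 3. The compiler must therefore
 * assume possible aliasing and cannot rewrite twiddle1 as twiddle2. */
void twiddle1(long *xp, long *yp)
{
    *xp += *yp;       /* two separate loads of *yp */
    *xp += *yp;
}

void twiddle2(long *xp, long *yp)
{
    *xp += 2 * *yp;   /* one load of *yp -- fewer memory references */
}
```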
Eliminate circular inefficiencies
Amdahl’s law
When we speed up one part of a system, the effect on overall performance depends on how significant that part is and by how much it is sped up
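Expressed as a formula, with the usual notation where α is the fraction of the original time taken by the part being sped up and k is the factor by which that part is sped up:

```latex
S = \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{1}{(1-\alpha) + \alpha / k}
```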
Writing cache-friendly code
Make the most common cases run faster
Focus on the inner loops of the core functions
Minimize the number of cache misses in each inner loop
Temporal locality
A memory location that has been referenced once is likely to be referenced again many times in the near future
Spatial locality
If a memory location has been referenced once, nearby memory locations are likely to be referenced in the near future
Memory mountain
- The size and stride arguments of the run function let us control the degree of locality of the generated read sequence
- The smaller size is, the smaller the working set being read and the better the temporal locality; the smaller stride is, the better the spatial locality
- Calling run repeatedly with different values of size and stride yields read bandwidth as a two-dimensional function of temporal and spatial locality, called the memory mountain
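A minimal sketch of the kind of read loop that run measures; the array name, its size, and the surrounding measurement machinery (cycle counter, repeated timing) are assumptions and are omitted here:

```c
/* Stream through `data` with a given stride; smaller working sets give
 * better temporal locality, smaller strides give better spatial locality. */
#define MAXELEMS (1 << 20)
static int data[MAXELEMS];              /* the array we stream through */

int test(int elems, int stride)
{
    int i, result = 0;
    for (i = 0; i < elems; i += stride)
        result += data[i];
    return result;   /* returned so the compiler cannot drop the loop */
}
```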
The second part
Chapter 7 Links
7.1 Compiler driver
link
- Linking is the process of combining different parts of code and data into a single file that can be loaded into storage for execution
- Links can be performed at compile time (source code -> machine code), load time (the program is loaded into memory for execution), or even run time by the application
- In the early days, links were performed manually. Now, in the system, they are performed automatically by programs called linkers
7.3 Object File
Relocatable object file
Contains binary code and data that can be merged with other relocatable object files at compile time to create an executable object file
Executable object file
Contains binary code and data that can be copied directly to storage and executed
Shared object file
A special type of relocatable object file that can be dynamically loaded into storage and linked at load time or run time
7.4 Relocatable Object Files
7.9 Loading an Executable object File
P is not a built-in shell command, so the shell considers P to be an executable object file and runs P by calling operating system code that resides in memory called the loader
- Any Unix program can call the execve function to call the loader
- The loader copies code and data from the executable object file from disk to memory, and then jumps to the first instruction of the program, the entry point, to run the program
- The process of copying a program into storage and running it is called loading
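A minimal sketch of invoking the loader through execve; the program path and argument list are illustrative:

```c
/* Replace the current process image with /bin/ls by calling execve,
 * which invokes the loader. execve only returns if an error occurs. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "ls", "-l", NULL };
    char *envp[] = { NULL };

    execve("/bin/ls", argv, envp);
    perror("execve");    /* reached only on failure */
    return 1;
}
```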
How does the loader actually work?
- Every program on a Unix system runs in a process context that has its own virtual address space
- When the shell runs a program, the parent shell process generates a child process that is a copy of the parent process
- The child process starts the loader with the execve system call, which removes the child process’s existing virtual memory segments and creates a new set of code, data, heap, and stack segments
- The new stack and heap segments are initialized to zero, and the new code and data segments are initialized to the contents of the executable file by mapping pages of the virtual address space to page-sized chunks of the executable file
- The loader jumps to the _start address and eventually calls the application’s main function
- Aside from some header information, no data is actually copied from disk to memory during loading; copying is deferred until the CPU references a mapped virtual page, at which point the operating system's paging mechanism automatically transfers the page from disk to memory
7.10 Dynamically Linking shared Libraries
7.12 Position-independent code
The purpose of shared libraries is to allow multiple running processes to share the same library code in memory, saving precious memory resources
How can multiple processes share a copy of a program?
- Method 1
Assign each shared library a dedicated chunk of the address space, and require the loader to always load the library at that address
Drawbacks: 1. It is wasteful: the chunk is reserved even for processes that never use the library
2. It is hard to manage: we must ensure that no chunks overlap; every time a library is modified we must confirm that it still fits in its assigned chunk or find new space for it; over time, a system with hundreds of libraries and library versions fragments the address space into many small, unusable holes; and the assignment of libraries to memory would differ from system to system
- Method 2
Compile the library code so that it can be loaded and executed at any address without the linker having to modify it
Such code is called position-independent code (PIC)
7.13 Tools for processing object files
Chapter 8 Exceptional Control Flow
8.1 Exceptions
- Exceptions are a form of exceptional control flow implemented partly by the hardware and partly by the operating system
- The details vary from system to system
- The basic idea is the same for every system
An exception is an abrupt change in the control flow in response to some change in the processor's state
Exception Handling Procedure
In any case, when the processor detects that the event has occurred, it makes an indirect procedure call (the exception), through a jump table called an exception table, to an operating-system subroutine designed to handle this kind of event (the exception handler)
When the exception handler completes processing, one of three things can happen, depending on the exception type:
- Return control to the current instruction Icurr(the instruction that was executing when the exception occurred)
- Return control to Inext (the instruction that would have executed next had the exception not occurred)
- Terminate the interrupted program
8.1.1 Exception Handling
- Each type of exception is assigned a unique non-negative integer exception number, some by the designer of the processor and others by the designer of the operating system kernel (the part of the operating system resident memory)
- Assigned by the processor designers: divide-by-zero, page faults, memory access violations, breakpoints, and arithmetic overflow
- Operating system kernel allocation: system calls, signals from external I/O devices
- When the system starts, the operating system allocates and initializes a jump table called an exception table
- The exception number is an index into the exception table, whose starting address is held in a special CPU register called the exception table base register
Exceptions are similar to procedure calls, except that:
- If control is transferred from a user program to the kernel, all of these items are pushed onto the kernel stack, not onto the user stack
- Exception handlers run in kernel mode, which means they have full access to all system resources
- In a procedure call, the processor pushes the return address onto the stack before a jump, whereas when handling an exception, the return address is either the current instruction or the next instruction, depending on the exception type
8.1.3 Types of Exceptions
interrupt
- Hardware interrupts are asynchronous in the sense that they are not caused by any specific instruction
- I/O devices, such as network adapters, disk controllers, and timer chips, trigger interrupts by signaling a pin on the processor chip and placing an exception number on the system bus that identifies the device causing the interrupt
trap
- Intentional exception, the result of executing an instruction
- Returns control to the next instruction
- Purpose: To provide a procedure-like interface between a user program and the kernel, called a system call
- The difference between a system call and an ordinary function call is that an ordinary function runs in user mode, which restricts the types of instructions it can execute, and it accesses the same stack as the calling function
- System calls run in kernel mode, which allows system calls to execute instructions and access stacks defined in the kernel
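For example, an ordinary library call such as write is a thin wrapper around a trapping system call (a minimal sketch):

```c
/* write() traps into the kernel via the `write` system call; when the
 * kernel finishes, control returns to the next instruction. */
#include <unistd.h>

int main(void)
{
    /* file descriptor 1 is standard output */
    write(1, "hello, world\n", 13);
    return 0;
}
```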
Faults
- Caused by error conditions that a fault handler may be able to correct
- If the error can be corrected, control returns to the faulting instruction, which is re-executed
- Otherwise, the handler returns to an abort routine in the kernel, which terminates the application that caused the fault
Aborts
- Aborts result from unrecoverable fatal errors, typically hardware errors such as parity errors that occur when DRAM or SRAM bits are corrupted
- Control is returned to an abort routine, which terminates the application
8.2 Processes
- When we run a program on a modern system, we get the illusion that our program is the only one running: it appears to have exclusive use of the processor and the memory, the processor seems to execute the program's instructions one after another without interruption, and the program's code and data appear to be the only objects in the system's memory. These illusions are provided by the notion of a process
- An instance of an executing program
- Each program runs in the context of some process
- A context consists of the state the program needs to run correctly, including the program's code and data in memory, its stack, the contents of its general-purpose registers, its program counter, environment variables, and the set of open file descriptors
Key abstractions that a process provides to an application
- A separate logical control flow gives us the feeling of exclusive use of the processor
- A private address space allows us to feel exclusive use of the memory system
8.2.1 Logical Control Flow
- In general, each logical control flow wants to be independent of the other logical flows
- The only exception occurs when that process explicitly interacts with other processes using interprocess communication (IPC) mechanisms, such as pipes, sockets, shared storage, and semaphores
- Logical flows whose execution overlaps in time are called concurrent flows, and the corresponding processes are said to run concurrently
- The notion of a process taking turns with other processes is called multitasking; each period during which a process executes part of its flow is called a time slice
- Multitasking is therefore also called time slicing
8.2.2 Private Address Space
The structure of the address space for Linux processes
- The bottom three quarters of the address space are reserved for user programs, including the usual text, data, heap, and stack segments
- The top quarter is reserved for the kernel (such as the code, data, and stack used when making system calls)
8.2.3 User Mode and kernel Mode
- The processor provides this functionality with a mode bit in a control register that describes the current rights enjoyed by the process
- A process running in kernel mode can execute any instruction in the instruction set and access any memory location in the system
- Processes in user mode cannot directly refer to code and data in the kernel area of the address space and must access the kernel code and data indirectly through the system call interface
- The only way for a process to change from user mode to kernel mode is through an exception such as an interrupt, a fault, or a trapping system call
8.2.4 Context Switch
- The operating system kernel implements multitasking using a more advanced form of exception control flow called context switch
- The kernel can decide to preempt the current process and restart a previously preempted process, a decision called scheduling, which is handled by code in the kernel called a scheduler
Context switching process
- Saves the context of the current process
- Restores a context saved by a preempted process
- Pass control to the newly restored process
- Interrupts can also trigger context switches. For example, all systems have some mechanism for generating periodic timer interrupts, typically every 1ms or 10ms. Each time a timer interrupt occurs, the kernel determines that the current process has run long enough and switches to a new process
The switching process
- It takes a long time (on the order of tens of milliseconds) for the disk to fetch the data, so instead of waiting, the kernel performs a context switch from process A to process B
- Notice that prior to the switch, the kernel was executing instructions on behalf of process A in user mode
- In the first step of the switch, the kernel executes instructions in kernel mode on behalf of process A, then at some point the kernel starts executing instructions on behalf of process B (kernel mode), and when the switch is complete, the kernel executes instructions in user mode on behalf of process B
- Process B runs in user mode for a while until the disk raises an interrupt signaling that the data has been transferred into memory. The kernel decides that B has run long enough, performs a context switch from B back to A, and returns control to A at the instruction immediately following the read system call. Process A continues to run until the next exception occurs
Cache contamination and exception control flow
- In general, hardware cache memory (L1,L2,L3 are very small) does not interact well with exception control flows such as interrupts and context switches
- If the current process is interrupted, the cache is cold for the interrupt handler
- If the handler touches enough new entries in main memory, then when the interrupted process resumes, the cache will be cold for it as well
- When a process continues after a context switch, the cache is also cold to the application and must be warmed up again
8.3 System Call and Error Handling
- We refer interchangeably to system calls and their associated wrapper functions as system-level functions
8.4 Process Control
8.4.2
Three states of a process
- Running: the process is either executing on the CPU or waiting to be executed and will eventually be scheduled
- Stopped: the process's execution is suspended and it will not be scheduled. A process stops when it receives a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal, and it remains stopped until it receives a SIGCONT signal, at which point it starts running again
- Terminated: the process is stopped permanently. A process becomes terminated for one of three reasons: receiving a signal whose default action is to terminate the process; returning from the main routine; or calling the exit function
The fork function
- The parent creates a new running child by calling the fork function
- The newly created child is almost, but not exactly, the same as the parent
- The child gets an identical (but separate) copy of the parent's user-level virtual address space, including the text, data, and bss segments, the heap, and the user stack. The child also gets identical copies of the parent's open file descriptors, which means that when the parent calls fork, the child can read and write any files that were open in the parent
- The biggest difference between the parent process and the newly created child process is that they have different Pids
- Fork is called once but returns twice: once in the calling process (the parent) and once in the newly created child. In the parent, fork returns the child's PID; in the child, fork returns 0
- The return value of fork is used to determine whether the program is executing in the parent or in the child
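A minimal sketch showing both return values of fork:

```c
/* fork returns twice: 0 in the child and the child's PID in the parent.
 * Both processes run the same code below. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                       /* child */
        printf("child:  pid=%d\n", (int)getpid());
    } else {                              /* parent */
        printf("parent: pid=%d, child=%d\n", (int)getpid(), (int)pid);
    }
    return 0;
}
```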
8.4.3 Reclaiming a subprocess
- When a process terminates for any reason, the kernel does not remove it from the system immediately; instead, the process is kept in a terminated state until it is reaped by its parent
- When the parent reaps the terminated child, the kernel passes the child's exit status to the parent and then discards the terminated process; from that point on, the process no longer exists
- A process that has died but has not been reclaimed is called a zombie.
- If a parent process terminates without reaping its dead children, the kernel arranges for the init process to reap them
- The init process has a PID of 1 and is created by the kernel when the system is initialized
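A sketch of reaping a child with waitpid so that it does not linger as a zombie; the child's exit status of 7 is arbitrary:

```c
/* The parent blocks in waitpid until the child terminates, then reads
 * the child's exit status. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        exit(7);                          /* child terminates immediately */

    int status;
    waitpid(pid, &status, 0);             /* parent reaps the child */
    if (WIFEXITED(status))
        printf("child %d exited with status %d\n",
               (int)pid, WEXITSTATUS(status));
    return 0;
}
```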
8.4.4 Putting Processes to Sleep
8.5 Signals
- Higher-level software exceptions, called Unix signals, allow processes to interrupt other processes
- A signal is a message that informs a process that an event of some type has occurred in the system
- Each signal type corresponds to a certain type of system event
- Underlying hardware exceptions are handled by kernel exception handlers and are usually invisible to user processes
- Signals provide a mechanism for notifying user processes of the occurrence of these exceptions
The case of signals
- Ctrl-c, the kernel sends a SIGINT signal to the foreground process
- A process can forcibly terminate another process by sending a SIGKILL(number 9) signal
- When a child process terminates or pauses, the kernel sends a SIGCHLD(number 17) to the parent process
8.5.1 Signal Terminology
- A signal that is sent but not received is called a pending signal.
- There can be at most one pending signal of a given type at any time; subsequent signals of type k sent to a process that already has a pending signal of type k are not queued but simply discarded
- A process can selectively block the receipt of certain signals. When a signal is blocked, it can still be delivered (becoming pending), but the pending signal will not be received until the process unblocks it
8.5.2 Sending signals
Process group
- Each process belongs to only one process group, which is identified by a positive integer process group ID
- By default, a child process and its parent process belong to the same process group
- A process can change its own or another process’s process group by using the setpgid function
8.5.3 Receiving signals
8.5.4 Signal processing Problems
- System calls can be interrupted by signals. System calls such as read, write, and accept, which can potentially block the process for long periods of time, are called slow system calls
- On some systems, when a handler catches a signal, an interrupted slow system call does not resume when the handler returns; instead, it returns immediately to the user with an error condition
8.5.5 Portable signal processing
- The Posix standard defines the sigaction function, which lets users explicitly specify the signal-handling semantics they want
- The Signal wrapper function sets up a Signal handler
- Only the type of signal that the handler is currently processing is blocked
- As with all signal implementations, signals do not queue
- Interrupted system calls are restarted automatically whenever possible
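A sketch of such a wrapper built on sigaction, along the lines of the book's Signal function; reporting errors via perror here is a substitution for the book's error helper:

```c
#include <signal.h>
#include <stdio.h>

typedef void handler_t(int);

handler_t *Signal(int signum, handler_t *handler)
{
    struct sigaction action, old_action;

    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);   /* block only the signal being handled */
    action.sa_flags = SA_RESTART;   /* restart interrupted system calls if possible */

    if (sigaction(signum, &action, &old_action) < 0)
        perror("sigaction");
    return old_action.sa_handler;   /* previous handler, like signal() returns */
}
```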
8.6 Nonlocal Jumps
- C provides a form of user-level exception control flow called nonlocal jump. It transfers control directly from one function to another currently executing function without going through the normal call-return sequence
- An important use of non-local jumps is to allow an immediate return from a deeply nested function call, usually caused by the detection of an error condition
- If you find an error condition in a deeply nested function call, you can use a non-local jump to go straight back to a normal localized error handler instead of laboriously unwinding the call stack
- Another important use of nonlocal jumps is to have a signal handler branch to a particular code location rather than return to the instruction that was interrupted by the arrival of the signal
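A minimal setjmp/longjmp sketch of the first use case, returning directly to the caller from a deeply nested call:

```c
/* error path: longjmp transfers control straight back to the setjmp
 * point in main, bypassing the normal call-return sequence. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf buf;

static void deeply_nested(void)
{
    longjmp(buf, 1);        /* jump back; setjmp will return 1 */
}

int main(void)
{
    if (setjmp(buf) == 0) {
        deeply_nested();    /* first pass: call the nested function */
        printf("never reached\n");
    } else {
        printf("back in main after longjmp\n");
    }
    return 0;
}
```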
8.7 Tools for Operating processes
Unix systems provide a number of useful tools for monitoring and manipulating processes:
- Strace: Prints the trace of each system call called by a program and its children
- Ps: Lists the current processes in the system (including zombie processes)
- Top: Prints information about the current process resource usage
- Kill: Sends a signal to a process
- /proc: a virtual file system that outputs the contents of a large kernel data structure in ASCII text format, which can be read by user programs; For example, type “cat /proc/loadavg” to see the current load average on your Linux system
Chapter 9 Measuring Program Execution Time
9.1 Time flow on computer systems
- On a macro time scale, the processor is constantly switching between many tasks, assigning approximately 5 to 20ms to each task at a time
- The user feels that the task is going on simultaneously, because the person is not aware of periods of time shorter than about 100ms, during which the processor can execute millions of instructions
9.1.1 Process scheduling and timer interruption
- External events, such as keyboard, disk operations, and network activity, generate interrupt signals that allow the operating system scheduler to run and possibly switch to another process
- We also want processors to switch from one process to another so that the user looks like the processor is executing many programs at the same time
- The computer has an external timer that periodically sends interrupt signals to the processor,
- The time between interrupt signals is called interval time.
- When an interrupt occurs, the scheduler can choose to either continue the currently executing process or switch to another process
- The interval must be short enough to switch between tasks frequently
- Switching from one process to another takes thousands of clock cycles to save the state of the current process and prepare the state for the next, and too short a time interval can lead to poor performance
- Typical timer intervals range from 1 to 10ms
- Operating-system activities, such as handling page faults and performing input or output (for example, printing), also consume processor time
9.1.2 Time from an application perspective
9.2 Using Interval counting to measure the time
- Useful for approximating program performance, granularity is too coarse to be used for measurements lasting less than 100ms
- It has a systematic bias: it tends to overestimate computation times, by about 4% on average
- Advantages: Its accuracy is not very dependent on system load
9.3 Cycle Counter
- A timer that runs at the clock cycle level is a special register that increments by 1 each clock cycle
- Not all processors have such counters
9.4 Use cycle counter to measure program execution time
9.5 Measurement based on the gettimeofday function
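A minimal sketch of gettimeofday-based timing; the measured loop is just a placeholder workload:

```c
/* gettimeofday has at best microsecond resolution, so it only suits
 * measurements well above that scale. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, finish;
    volatile long sum = 0;

    gettimeofday(&start, NULL);
    for (long i = 0; i < 10000000; i++)   /* the work being measured */
        sum += i;
    gettimeofday(&finish, NULL);

    double secs = (finish.tv_sec - start.tv_sec)
                + (finish.tv_usec - start.tv_usec) / 1e6;
    printf("elapsed: %.6f s (sum=%ld)\n", secs, sum);
    return 0;
}
```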
9.7 Looking ahead
Several characteristics are introduced into the system which have great influence on performance measurement
- Frequency-changing clocks: To reduce power consumption, future systems will change the clock frequency because power consumption is directly related to clock frequency
9.8 Real life: the K-best measurement scheme
9.9 Lessons learned
- Every system is different
- Experiments can be very revealing
- Getting accurate timing is particularly difficult on heavily loaded systems
- A cache can greatly affect the execution time of a program. The traditional technique is to either empty the cache of all useful data before the timing starts, or to load in all data that would normally be in the cache at the start
Chapter 10 Virtual Memory
- Memory is easily corrupted, and if one process accidentally writes memory that another process uses, the process may fail in some confusing way that has nothing to do with program logic
- Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, private address space
Virtual storage provides three important capabilities:
- It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory and transferring data back and forth between disk and main memory as needed
- It simplifies storage management by providing a consistent address space for each process
- It protects the address space of each process from being corrupted by other processes
10.1 Physical and Virtual addressing
- The main memory of a computer system is organized into an array of M contiguous byte size cells. Each byte has a unique physical address. The first byte has an address of 0, the next byte has an address of 1, the next byte has an address of 2, and so on
- The most natural way for a CPU to access memory is to use physical addresses, known as physical addressing
- Modern processors designed for general-purpose computing use virtual Addressing
The process of virtual addressing
- The CPU accesses main memory by generating a virtual address. Address translation translates the virtual address into a physical address
- Address translation requires close cooperation between the CPU hardware and the operating system
- Dedicated hardware on the CPU chip called the memory management unit (MMU) translates virtual addresses on the fly, using a look-up table stored in main memory whose contents are managed by the operating system
10.2 Address Space
- An address space is an ordered collection of non-negative integer addresses
- If the integers in the address space are continuous, it is a linear address space
- In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2^n addresses, called the virtual address space
- The size of an address space is described by the number of bits needed to represent the largest address
- Modern systems typically support 32-bit or 64-bit virtual address Spaces
- A system also has a physical address space that corresponds to M bytes of physical storage in the system
10.3 Virtual storage as a caching tool
10.3.1 Organization of the DRAM Cache
10.3.2 page table
- The page table maps virtual pages to physical pages and is read each time the address translation hardware translates a virtual address to a physical address
- The operating system is responsible for maintaining the contents of the page table and transferring pages back and forth between disk and DRAM
10.3.3 page hit
10.3.4 Page faults
- A DRAM cache miss is called a page fault
- The activity of transferring pages between disk and memory is called swapping or paging
- The strategy of paging in pages only when a miss occurs is called Demand Paging
10.3.5 Assigning a Page
10.3.6 Locality to the Rescue Again
- As long as our program has good temporal locality, the virtual memory system works quite well
- If the size of the working set exceeds the size of the physical memory, the program incurs an unfortunate state called thrashing, in which pages are constantly moving in and out
10.4 Virtual storage as a storage management tool
10.4.1 Simplifying links
- A separate address space allows each process to use the same basic format for its memory image, regardless of where the code and data are actually stored in physical storage
10.4.2 Simplifying Sharing
- In some cases, processes are needed to share code and data; for example, each process must call the same operating system kernel code, and every C program calls a standard library program, such as printf
- Instead of including separate copies of the kernel and C standard library in each process, the operating system shares a copy of this code by mapping the appropriate virtual pages in different processes to the same physical page
10.4.3 Simplifying Storage Allocation
- When a process requests additional heap space, the operating system allocates an appropriate number (for example, K) of contiguous virtual storage pages and maps them to K arbitrary physical pages at any location in physical storage
- Because of the way page tables work, there is no need for the operating system to allocate K contiguous physical memory pages; pages can be randomly scattered in physical memory
10.4.4 Simplified Loading
- Mapping a collection of contiguous virtual pages to any location in any file is called memory mapping.
10.5 Virtual memory as a storage protection tool
- The SUP bit indicates whether the process must be running in kernel mode in order to access the page
- If an instruction violates these permissions, the CPU triggers a general protection fault and passes control to an exception handler in the kernel; the Unix shell typically reports this exception as a "segmentation fault"
10.6 Address Translation
- Address translation is a mapping between elements of an N-element virtual address space and elements of an M-element physical address space
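Formally, the mapping can be written as a partial function:

```latex
\mathrm{MAP}:\; VAS \rightarrow PAS \cup \varnothing, \qquad
\mathrm{MAP}(A) =
\begin{cases}
A' & \text{if the data at virtual address } A \text{ is present at physical address } A' \\
\varnothing & \text{if the data at virtual address } A \text{ is not in physical memory}
\end{cases}
```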
10.7 Case study :Pentium/Linux memory system
10.8 Memory Mapping
10.8.1 Shared Objects Revisited
- An object can be mapped to an area of virtual storage, either as a shared object or as a private object
- The virtual memory region that a shared object maps to is called a shared region, and the virtual memory region that a private object maps to is called a private region
- Any writes that a process makes to a shared object are visible to other processes that also map the shared object to their virtual storage, and these changes are reflected in the original object on disk
- Private objects are mapped to virtual storage using a clever technique called copy-on-write
- When two processes map a private object, they share a single physical copy of the object; for each process, the page table entries for the corresponding private area are flagged read-only, and the area struct is flagged as private copy-on-write
- As long as no process tries to write to its own private region, they can continue to share a single copy of an object in physical storage. However, as long as one process tries to write to a page in the private region, the write operation triggers a protection failure
Copy-on-write process
- When a write triggers the protection fault, the fault handler creates a new copy of the page in physical memory, updates the page table entry to point to the new copy, and then restores write permission to the page. When the handler returns, the CPU re-executes the write, which now proceeds normally on the newly created page
- Copy-on-write makes the best use of scarce physical memory by delaying copying in private objects until the last possible moment
10.8.2 The fork Function Revisited
- When fork is called by the current process, the kernel creates the various data structures for the new process, assigns it a unique PID, makes exact copies of the current process's mm_struct, area structs, and page tables, flags every page in both processes as read-only, and flags every area struct in both processes as private copy-on-write
- When fork returns in the new process, the new process now has exactly the same virtual memory that existed when fork was called. When either process later writes, the copy-on-write mechanism creates a new page, thus preserving the abstraction of private address space for each process
10.8.3 The execve Function Revisited
The execve function loads and runs the program contained in the executable object file a.out within the current process, effectively replacing the current program with the a.out program
- Delete the existing user areas
- Mapping private areas
- Mapping a shared Area
- Set program counter
10.8.4 User-level Memory Mapping using the MMAP function
Unix processes can use the mmap function to create new areas of virtual memory and map objects into them
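A minimal sketch of mapping a file read-only with mmap and printing its contents; the file name /etc/hostname is illustrative:

```c
/* Map a file into a fresh area of virtual memory and write it to stdout. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); exit(1); }

    struct stat sb;
    if (fstat(fd, &sb) < 0) { perror("fstat"); exit(1); }

    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(1); }

    write(STDOUT_FILENO, p, sb.st_size);  /* file contents now appear in our address space */
    munmap(p, sb.st_size);
    close(fd);
    return 0;
}
```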
10.9 Dynamic Memory Allocation
- While the low-level mmap and munmap functions can be used to create and delete areas of virtual memory, most C programs use a dynamic memory allocator when they need additional virtual memory at run time
- A dynamic memory allocator maintains a process’s virtual memory area, called the heap.
- On most Unix systems, the heap is an area that requests binary zeros
- For each process, the kernel maintains a variable BRK (break) that points to the top of the heap
- The allocator maintains the heap as a collection of blocks of different sizes
- Each chunk is a contiguous chunk of virtual storage that is either allocated or free
- An allocated block remains reserved for use by the application until it is freed, either explicitly by the application itself or implicitly by the memory allocator (garbage collection)
Explicit allocators
- The application explicitly frees any allocated blocks, for example, the C library’s malloc function allocates blocks and the free function frees a block
Implicit Allocator
- An implicit allocator, also known as a garbage collector, is required to detect when an allocated block is no longer used by a program and then release the block.
- The process of automatically releasing unused allocated blocks is called garbage collection
10.9.1 Malloc and free functions
- The malloc function returns a pointer to a memory block of at least size bytes
- After free is called, the pointer still points to the freed block; the application should not use the pointer again until the block has been reallocated by a new call to malloc
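A minimal malloc/free sketch:

```c
/* Allocate an array of n ints on the heap, use it, then free it and
 * stop using the pointer. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 10;
    int *a = malloc(n * sizeof(int));   /* at least n * sizeof(int) bytes */
    if (a == NULL) {
        perror("malloc");
        return 1;
    }
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    printf("a[9] = %d\n", a[9]);
    free(a);        /* a now points to freed memory; do not use it again */
    return 0;
}
```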
10.9.2 Why Use Dynamic memory allocation
- The size of the data structure is not known until the program is actually running
10.9.3 Requirements and objectives of the allocator
Explicit allocators must work under some constraints:
- Handling arbitrary request sequences: the allocator may not assume anything about the order in which allocate and free requests arrive
- Making immediate responses: the allocator must respond to each allocation request immediately and is not allowed to reorder or buffer requests to improve performance
- Using only the heap: for the allocator to be scalable, any non-scalar data structures it uses must themselves be stored in the heap
- Aligning blocks: the allocator must align blocks so that they can hold any type of data object; on most systems this means the blocks it returns are aligned on 8-byte (double-word) boundaries (int64, float64, and so on)
- Not modifying allocated blocks: the allocator may only manipulate or change free blocks; once a block is allocated, it is not allowed to modify or move it
- Target 1: Maximize throughput throughput: The number of requests completed per second (e.g. 500 allocation requests per second,500 release requests per second, i.e. 1000 operations per second)
- Goal 2: Maximize memory utilization
10.9.4 fragments
- Fragmentation occurs when unused storage cannot be used to satisfy allocation requests
Internal fragmentation
- Internal fragmentation occurs when an allocated block is larger than its payload; for example, the allocator may return a minimum 8-byte block to satisfy the alignment constraint even though the payload is only 2 bytes
External fragmentation
- Free storage adds up to enough to satisfy an allocation request, but no single free block is large enough to handle the request
- External fragmentation is difficult to quantify and unpredictable, so allocators typically employ heuristic strategies to try to maintain a small number of large free blocks rather than a large number of small free blocks
10.9.5 Implementation Issues
- Free Block Organization: How do we record free blocks
- Placement: How do we select a suitable free block to place a newly allocated block
- Split: After placing a newly allocated block into a free block, what is done with the rest of the free block?
- Merge: What do we do with a block that has just been freed?
10.9.6 Implicitly Free linked lists
- Block = one-word header (4 bytes, 32 bits) + payload + padding
- The header encodes the block size (including the header and all padding) and whether the block is allocated
Implicitly free linked lists
- Pros: Simplicity
- Disadvantage: the cost of any operation that requires a search of the free list, such as placing an allocated block, is linear in the total number of allocated and free blocks in the heap
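A sketch of the usual header encoding for an implicit free list, in the spirit of the book's macros; the 4-byte word size and 8-byte alignment are assumptions consistent with the text above:

```c
/* A block header packs the block size (a multiple of 8, so its low 3
 * bits are zero) together with an allocated bit in the low-order bit. */
#define WSIZE 4                              /* word/header size in bytes */
#define PACK(size, alloc)  ((size) | (alloc))
#define GET(p)             (*(unsigned int *)(p))
#define PUT(p, val)        (*(unsigned int *)(p) = (val))
#define GET_SIZE(p)        (GET(p) & ~0x7)   /* size includes header and padding */
#define GET_ALLOC(p)       (GET(p) & 0x1)    /* 1 = allocated, 0 = free */
```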
10.9.7 Placing allocated Blocks
- When an application requests a k-byte block, the allocator searches the free linked list for a free block that is large enough to place the requested block, and the way the allocator performs this search is determined by the Placement policy
Common Strategies
First Fit
Search the free list from the beginning and choose the first free block that fits
- Advantages: Tends to keep large free blocks later in the linked list
- Disadvantages: Leaves “shards” of small free blocks near the start of the list, increasing the search time for larger blocks
Next Fit
This is similar to first fit, except that instead of starting each search at the beginning of the list, it starts where the previous search left off
- The idea is that if we found a fit in some free block last time, we are likely to find a fit in the remainder of that block next time
- Next fit runs somewhat faster than first fit
- Next fit has significantly worse memory utilization than first fit
Best fit
Examine each free block and select the smallest free block that matches the required request size
- Memory utilization is higher than with first fit and next fit
- Disadvantages: Even in simple free list organizations, such as implicit free lists, best fit requires a thorough search of the heap
10.9.8 Splitting a Free Block
Once a matching free block is found, you must decide how much space to allocate in that free block
- Use the whole free block, which is simple and fast, but has the disadvantage of causing internal fragmentation
- Divide the free block into two parts, with the first part becoming the allocation block and the rest becoming a new free block
10.9.9 Obtaining additional heap storage
What happens if the allocator cannot find a suitable free block?
- 1. Merge free blocks that are physically adjacent in memory to create larger free blocks
- 2. If merging still does not produce a large enough block, or if the free blocks are already merged as far as possible, the allocator requests additional heap memory from the kernel by calling the mmap or sbrk function
- 3. In either case, the allocator turns the additional memory into one large free block, inserts it into the free list, and places the requested block in this new free block
10.9.10 Merging Free Blocks
- When an allocator frees an allocated block, the newly freed block may be adjacent to other free blocks; such adjacent free blocks can cause a phenomenon known as false fragmentation
- The free memory is chopped up into small, unusable free blocks; for example, two adjacent 3-word free blocks cannot satisfy a 4-word allocation request unless they are merged
- To solve the false fragmentation problem, any actual allocator must merge adjacent free blocks, a process known as coalescing.
When to perform a merge
Coalescing immediately
At each time a block is released, all adjacent blocks are merged, which can be done in constant time
- Disadvantages: In some request patterns, blocks are merged repeatedly and then split immediately, resulting in a large number of unnecessary splits and merges
Deferred coalescing
Wait until a later date to merge free blocks
- For example, the allocator can delay merging until an allocation request fails, then scan the entire heap to merge all free blocks
10.9.11 Merge with boundary tags
- From the header of the current block we can locate the header of the next block and check whether it is free; if so, we simply add its size to the size in the current block's header, merging the two blocks in constant time
- Boundary tags also allow the previous block to be merged in constant time
- A footer (the boundary tag) is added at the end of each block; by examining the footer we can determine the starting position and allocation status of the previous block
- The concept of boundary marking is simple and elegant and is universal to different types of allocators and free list organizations
- Disadvantage: requiring every block to keep both a header and a footer can introduce significant memory overhead when an application works with many small blocks. For example, if a graphics application repeatedly calls malloc and free to create and destroy graph nodes, and each node needs only two memory words, the header and footer consume half of each allocated block
An optimization of boundary tags
The footer is needed only when the previous block is free; if we store the allocated/free bit of the previous block in one of the unused low-order bits of the current block's header, then allocated blocks no longer need footers, while free blocks still do
10.9.12 Synthesis: Implement a simple allocator
10.9.13 Explicit Free Lists (Doubly Linked Free Lists)
- The heap can be organized as a doubly linked free list, with each free block containing a pred (predecessor) and a succ (successor) pointer
- Using a bidirectional linked list, the allocation time of the first fit is reduced from linear time of the total number of blocks to linear time of the number of free blocks
A strategy for sorting blocks in a free linked list
- Last in first Out (LIFO)
The newly freed block is placed at the beginning of the linked list. Freeing blocks can be done in constant time, or merging can be done in constant time if boundary markers are used
- Maintain linked lists in address order
Freeing a block requires a linear-time search to locate the appropriate predecessor; memory utilization is higher, close to optimal
Disadvantages of explicitly linked lists
- A free block must be large enough to hold all the required pointers, as well as a header and possibly a footer
- This results in a larger minimum block size, potentially increasing the degree of internal fragmentation
10.9.14 Separated Free Linked Lists (Separated Storage)
- A popular approach for reducing allocation time, generally known as segregated storage, maintains multiple free lists, where the blocks in each list have roughly the same size
- The general idea is to partition the set of all possible block sizes into equivalence classes, also known as size classes
- The allocator maintains an array of free lists, one per size class, ordered by increasing size. When the allocator needs a block of size n, it searches the corresponding free list; if it cannot find a block that fits, it searches the next list, and so on
- There are many variants of segregated storage; the main differences are how size classes are defined, when merging is performed, when the allocator asks the operating system for additional heap memory, and whether splitting is allowed
Simple Segregated Storage
- The free list of each size class contains blocks of equal size. Each block is the size of the largest element in the size class. For example, if a size class is defined as {17-32}, then the free list of that class consists entirely of blocks of size 32
- When allocating, check the corresponding free list. If the list is not empty, simply allocate the whole of the first block. The free block will not be split to satisfy the allocation request
- If the linked list is empty, the allocator requests an additional memory block of a fixed size from the operating system, divides the chunk into equal chunks, and links these blocks together to form a new free linked list
- To free a block, you simply insert the block into the front of the corresponding free list
understand
- The free memory is divided into blocks of fixed size for easy management. For example, a linked list composed of blocks of 32 size is used. When the system requests the allocator to allocate memory within the range of 17-32, a free block is used from the linked list of 32
- Effectively reduce the number of free linked lists, reduce maintenance difficulty
- Sometimes a range such as {17-32} needs to be narrowed in order to reduce internal fragmentation
advantages
- Allocating and releasing blocks are fast constant time operations
- Blocks in each list are all the same size and are never split or merged, so there is very little per-block memory overhead
- Since there is no coalescing, allocated blocks need no header (allocated/free flag) and no footer
- Both allocation and release operations operate at the beginning of the free list, so the list only needs to be one-way
- The only pointer each free block needs is a one-word succ (successor) pointer, so the minimum block size is one word
disadvantages
- Since free blocks are never split, internal fragmentation can result; for example, many 17-byte objects each occupying a 32-byte block
- Since free blocks are never merged, certain reference patterns cause excessive external fragmentation; for example, if a program mostly requests large objects, the blocks sitting in the small size classes go unused and, never being merged, are simply wasted
Merge mode to deal with external fragments
The allocator keeps track of the number of free blocks in each chunk returned by the operating system. Whenever a chunk (such as a size-class chunk of 32) consists entirely of free blocks, the allocator removes the block from the current size-class and frees up memory for use by other size-classes
Segregated Fit
- Each free list is associated with a size class and is organized into some type of explicit or implicit list
- Each linked list contains potentially different-sized blocks whose sizes are members of the size class
process
- Determine the size class of the request, first fit the appropriate free list, and find an appropriate block
- If one is found, split it up and insert the rest into the appropriate free linked list
- If no suitable block is found, the next free linked list of the larger size class is searched, and the process is repeated until a suitable block is found
- If no suitable block is eventually found, additional heap storage is requested, a block is allocated from this new heap storage, and the remaining portion is placed in the largest size class
- To free a block, we perform a merge, placing the result into the corresponding free linked list
The advantages and disadvantages
- Is a common choice, and the GNU Malloc package available in the C library takes this approach
- Both fast and efficient in the use of memory
- The search time is reduced because the search is limited to a portion of the heap rather than the entire heap
- Here’s an interesting fact: A simple first-fit search for a separated free linked list is equivalent to a best-fit search for the entire heap
Buddy system
- The buddy system is a special case of segregated fits in which each size class is a power of two. The basic idea: assume a heap of size 2^m words, and maintain a separate free list for each block size 2^k, where 0 <= k <= m
- The request block size is rounded up to the nearest power of 2
- You start with a free block of two to the power of m words
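The rounding step can be sketched as follows (a hypothetical helper written for illustration, not part of any particular allocator):

```c
/* In a buddy system, request sizes are rounded up to the nearest power
 * of two before searching the corresponding free list. */
#include <stdio.h>

static unsigned round_up_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

int main(void)
{
    printf("%u -> %u\n", 5u, round_up_pow2(5));    /* 5 -> 8 */
    printf("%u -> %u\n", 33u, round_up_pow2(33));  /* 33 -> 64 */
    return 0;
}
```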
process
advantages
- Fast search and fast merge
disadvantages
- Requiring block sizes to be powers of two can cause significant internal fragmentation, so buddy-system allocators are not appropriate for general-purpose workloads
- For certain application-specific workloads where block sizes are known in advance to be powers of two, buddy-system allocators are attractive
10.10 Garbage Collection
- With an explicit allocator such as the C malloc package, the application allocates and frees heap blocks by calling malloc and free, and is responsible for freeing every allocated block it no longer needs
- A garbage collector is a dynamic storage allocator that automatically frees allocated blocks, called garbage, that are no longer needed by the program
- The process of automatically reclaiming heap storage is called garbage collection
- In a system that supports garbage collection, applications explicitly allocate heap blocks but never explicitly free them; the garbage collector periodically identifies the garbage blocks and calls free to place them back on the free list
10.10.1 Basic elements of the garbage collector
- Garbage collector treats memory as a reachability graph
- A set of root nodes and a set of heap nodes, each corresponding to an allocated block in the heap
- Root nodes correspond to locations outside the heap that contain pointers into the heap; these can be registers, variables on the stack, or global variables in the read/write data area of virtual memory
- A node P is reachable when there is a directed path from any root node to P.
- At any point, unreachable nodes corresponding to garbage cannot be reused by the application
- The role of the garbage collector is to maintain some representation of the reachable graph and periodically reclaim unreachable nodes by releasing them and returning them to the free linked list
- A garbage collector for a language such as Java, which exerts tight control over how applications create and use pointers, can maintain an exact representation of the reachability graph and therefore collect all garbage
- Collectors for languages such as C and C++, which generally do not maintain an accurate representation of the reachable graph, are called conservative garbage collectors. They are conservative in the sense that every reachable block is correctly labeled as reachable, while some unreachable nodes may be incorrectly labeled as reachable
10.10.2 Mark&Sweep garbage collector
- It consists of a Mark phase and a sweep phase
- The mark phase marks all reachable and allocated successors of the root node, followed by the clear phase that frees each unmarked allocated block
- A spare low-order bit in the block header is used to indicate whether the block is marked
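A toy, self-contained sketch of the two phases over a fixed pool of nodes; real collectors walk the heap and the root set, and the node layout here is invented purely for illustration:

```c
/* Mark phase: mark everything reachable from the root.
 * Sweep phase: free allocated blocks that were not marked. */
#include <stdbool.h>
#include <stdio.h>

#define NNODES 5

struct node {
    bool allocated;
    bool marked;
    int  child;          /* index of a successor node, or -1 */
};

static struct node heap[NNODES] = {
    { true,  false,  2 },  /* 0: root -> 2 */
    { true,  false, -1 },  /* 1: allocated but unreachable (garbage) */
    { true,  false,  4 },  /* 2: -> 4 */
    { false, false, -1 },  /* 3: already free */
    { true,  false, -1 },  /* 4: leaf */
};

static void mark(int i)
{
    if (i < 0 || heap[i].marked)
        return;
    heap[i].marked = true;
    mark(heap[i].child);                     /* mark all successors */
}

static void sweep(void)
{
    for (int i = 0; i < NNODES; i++) {
        if (heap[i].allocated && !heap[i].marked) {
            heap[i].allocated = false;       /* "free" the garbage block */
            printf("swept node %d\n", i);
        }
        heap[i].marked = false;              /* reset for the next collection */
    }
}

int main(void)
{
    mark(0);      /* mark phase, starting from the root */
    sweep();      /* sweep phase */
    return 0;
}
```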
10.10.3 Conservative Mark&Sweep for C Programs
10.11 Common Memory-Related Bugs in C Programs
10.11.1 Dereferencing Bad Pointers
10.11.2 Reading Uninitialized memory
10.11.3 Allow stack buffer overflows
- If a program writes to the target buffer on the stack without checking the size of the input string, the program will have a buffer overflow bug.
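A minimal sketch contrasting the unsafe and safe idioms; the buffer size is arbitrary:

```c
/* gets() has no way to limit input, so it can overflow buf;
 * fgets() bounds the read to the buffer size. */
#include <stdio.h>

int main(void)
{
    char buf[64];

    /* gets(buf);  -- unsafe: writes past buf if the line exceeds 63 chars */
    if (fgets(buf, sizeof(buf), stdin) != NULL)   /* safe: at most 63 chars + '\0' */
        printf("read: %s", buf);
    return 0;
}
```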
10.11.4 Assume that Pointers and the objects they point to are of the same size
10.11.5 Off-by-One Errors
- For example, creating a pointer array of n elements but then attempting to initialize n+1 elements of it, overwriting memory just past the end of the array
10.11.6 Referencing a Pointer Instead of the Object It Points To
Part 3: Interaction and communication between programs
Chapter 11 System-level I/O
- Input/output (I/O) is the process of copying data between main memory and external devices such as disk drives, terminals, and networks
- Input: Copies data from the I/O device to main memory
- Output: Copies data from main memory to the I/O device
- On Unix systems, these higher-level I/O functions are implemented by using system-level Unix I/O functions provided by the kernel
11.1 Unix I/O
- All I/O devices, such as networks, disks, and terminals, are modeled as files, and all inputs and outputs are performed as reads and writes to corresponding files
- Open the file
- Change the current file location
- Read and write files
- Close files: No matter what reason the process terminates, the kernel closes all open files and frees their memory resources
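A minimal sketch of these operations using open, read, write, and close; the file name is illustrative:

```c
/* Open a file, copy its bytes to standard output in chunks, then close it. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);    /* open the file */
    if (fd < 0) { perror("open"); exit(1); }

    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) /* read, then write */
        write(STDOUT_FILENO, buf, n);

    close(fd);                                   /* close the file */
    return 0;
}
```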
11.2 Opening and Closing the File
11.3 Reading and Writing files
11.5 Reading File Metadata
11.6 Sharing a File
11.7 I/O Redirection
Chapter 12 Network programming
12.1 Client-server programming model
- Every web application is based on the client-server model
- An application consists of a server process and one or more client processes
- The basic operation is a transaction
Client-server transactions versus database transactions
A client-server transaction is not a database transaction and has none of a database transaction's properties, such as atomicity; here, a transaction is simply a sequence of steps carried out between a client and a server
12.2 the network
- Physically, a network is a hierarchical system organized by geographical proximity; the lowest level is the local area network (LAN), which spans a building or a campus
- The most popular LAN technology is Ethernet, which has proven remarkably adaptable, evolving from 3 Mb/s to 1 Gb/s
- Each Ethernet adapter has a globally unique 48-bit address
- An Ethernet segment consists of cables and a small box called a hub, usually serving a small area, such as a room or a floor. The hub indiscriminately copies each bit received from one port to all the other ports, so that each host can see each bit
- A host can send a chunk of bits, called a frame, to any other host on the segment; every host adapter sees the frame, but only the destination host actually reads it
- Frame = header (which identifies the source and destination addresses and the length of the frame)+ payload
- Bridges make better use of cable bandwidth than hubs; using a clever distributed algorithm, they learn over time which hosts are reachable from which ports, and then copy frames from one port to another only when necessary
- For example, if host A sends a frame destined for host B on the same segment, bridge X discards the frame when it arrives at its input port, saving bandwidth on the other segments
- If host A sends a frame to host C on a different segment, bridge X copies the frame only to the port connected to bridge Y, and bridge Y copies it only to the port connected to host C
- At a higher level, multiple incompatible LANs can be connected by specialized computers called routers to form an internet (an internetwork)
- The lowercase word internet describes the general concept, while the capitalized Internet refers to one specific implementation: the global IP Internet
- A wide area network (WAN) covers a much larger geographical area than a LAN
- An internet can consist of a variety of LANs and WANs that use completely different and incompatible technologies
How is it possible for one source host to send bits of data across all these incompatible networks to another destination host?
A layer of protocol software running on each host and router smooths over the differences between the networks; this software implements a protocol that governs how hosts and routers cooperate to transfer data
- Naming scheme: different LAN technologies have different, incompatible ways of assigning addresses to hosts; the internet protocol smooths these differences by defining a uniform host address format that uniquely identifies each host
- Delivery mechanism: different networking technologies have different, incompatible ways of encoding bits on wires and packaging them into frames; the internet protocol smooths these differences by defining a uniform way of bundling data bits into discrete chunks called packets
- A packet consists of a header and a payload. The header contains the size of the packet and the addresses of the source and destination hosts. The payload contains bits of data sent from the source host
12.3 Global IP Internet
The Internet can be thought of as a worldwide collection of hosts with the following characteristics:
- The host set is mapped to a set of 32-bit IP addresses
- This set of IP addresses is mapped to a set of identifiers called Internet Domain names
- A process on an Internet host can communicate with a process on any other Internet host through a connection
12.3.1 IP address
- An IP address is a 32-bit unsigned integer
- The IP address is in dotted decimal notation
12.3.2 Internet Domain names
- Internet clients and servers use IP addresses to communicate with each other
- Large integers are hard for people to remember, so a more human set of domain names and a mechanism for mapping domain names to IP addresses were defined
- Every Internet host has a locally defined domain name, localhost, which is always mapped to the local loopback address 127.0.0.1
12.3.3 Internet Connection
- Clients and servers communicate by sending and receiving byte streams over a connection
- A connection is point-to-point in the sense that it connects a pair of processes
- It is full-duplex in the sense that data flows in both directions
- A socket is an endpoint of a connection. Each socket has a corresponding socket address, consisting of an IP address and a 16-bit port number, written as "address:port"
- When a client initiates a connection request, the port in the client socket address is automatically assigned by the kernel and is called the Ephemeral port.
- The port in the server socket is usually a well-known port corresponding to the service, for example, port 80 is used by the Web server and port 25 is used by the email server
- A connection is uniquely identified by the socket addresses of its two endpoints; this pair of socket addresses is known as a socket pair.
12.4 Socket Interfaces
- The sockets interface is a set of functions that are used together with the Unix I/O functions to build network applications; it is implemented on most modern systems
12.4.1 Socket Address Structure
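This section concerns the IPv4 socket address structure, struct sockaddr_in, defined in <netinet/in.h>. A minimal sketch of filling one in (the address and port are arbitrary examples):

```c
#include <string.h>
#include <arpa/inet.h>    /* htons, inet_pton */
#include <netinet/in.h>   /* struct sockaddr_in */

/* Fill in an IPv4 socket address; all multi-byte fields are in network byte order. */
struct sockaddr_in make_example_addr(void) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));                       /* zero the unused/padding bytes */
    addr.sin_family = AF_INET;                            /* IPv4 address family */
    addr.sin_port = htons(80);                            /* 16-bit port, network byte order */
    inet_pton(AF_INET, "128.2.194.242", &addr.sin_addr);  /* 32-bit IP, network byte order */
    return addr;
}
/* The socket functions expect a generic pointer, so callers pass
   (struct sockaddr *)&addr together with sizeof(addr). */
```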
12.4.2 socket function
- The client and server use the socket function to create a socket descriptor.
12.4.3 connect function
- The client establishes a connection with the server by calling connect
12.4.4 open_clientfd function
- It is convenient to wrap the socket and connect functions into a helper function called open_clientfd. Once connect returns, open_clientfd returns the socket descriptor to the client, which can immediately start communicating with the server using Unix I/O
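A minimal sketch of such a helper, in the older gethostbyname/sockaddr_in style (modern code would typically use getaddrinfo instead); error handling is reduced to returning -1:

```c
#include <string.h>
#include <netdb.h>         /* gethostbyname */
#include <sys/socket.h>    /* socket, connect */
#include <netinet/in.h>    /* struct sockaddr_in, htons */

/* Open a connection to the server at hostname:port and return a socket
   descriptor ready for Unix I/O, or -1 on error. */
int open_clientfd(const char *hostname, int port) {
    int clientfd;
    struct hostent *hp;
    struct sockaddr_in serveraddr;

    if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;                                   /* create the socket descriptor */

    if ((hp = gethostbyname(hostname)) == NULL)
        return -1;                                   /* resolve the server's IP address */
    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    memcpy(&serveraddr.sin_addr.s_addr, hp->h_addr_list[0], hp->h_length);
    serveraddr.sin_port = htons((unsigned short)port);

    if (connect(clientfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) < 0)
        return -1;                                   /* establish the connection */
    return clientfd;
}
```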
12.4.5 bind function
- Tells the kernel to associate the server socket address with the socket descriptor
12.4.6 listen function
12.4.7 open_listenfd function
- It is helpful to combine the socket, bind, and listen functions into a helper function called open_listenfd that a server can use to create a listening descriptor
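A minimal sketch of such a helper (the backlog of 1024 is an arbitrary choice; error handling is reduced to returning -1):

```c
#include <string.h>
#include <sys/socket.h>    /* socket, setsockopt, bind, listen */
#include <netinet/in.h>    /* struct sockaddr_in, htonl, htons, INADDR_ANY */

/* Create a listening descriptor on the given port, or return -1 on error. */
int open_listenfd(int port) {
    int listenfd, optval = 1;
    struct sockaddr_in serveraddr;

    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    /* Allow the server to be restarted immediately without "address in use" errors */
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) < 0)
        return -1;

    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);      /* accept requests on any interface */
    serveraddr.sin_port = htons((unsigned short)port);
    if (bind(listenfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) < 0)
        return -1;                                       /* associate address with descriptor */

    if (listen(listenfd, 1024) < 0)                      /* mark as a listening socket */
        return -1;
    return listenfd;
}
```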
12.4.8 the accept function
- It is used by the server to wait for connection requests from clients
- The accept function waits for a connection request from a client to arrive on the listening descriptor listenfd, fills in the client's socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions
12.4.9 Echo Client and Server Examples
- A simple echo server can handle only one client at a time, iterating through clients one after another; such a server is called an iterative server.
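A minimal sketch of an iterative echo server built from the helpers above (the port number is an arbitrary example); note how read returning 0 signals the EOF condition discussed next:

```c
#include <unistd.h>        /* read, write, close */
#include <sys/socket.h>    /* accept */
#include <netinet/in.h>    /* struct sockaddr_in */

int open_listenfd(int port);   /* helper sketched under 12.4.7 above */

/* Echo bytes back to the client until read returns 0 (the client closed the connection). */
void echo(int connfd) {
    char buf[8192];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof(buf))) > 0)
        write(connfd, buf, (size_t)n);
}

int main(void) {
    int listenfd = open_listenfd(8000);      /* example port */
    for (;;) {                               /* iterative: one client at a time */
        struct sockaddr_in clientaddr;
        socklen_t clientlen = sizeof(clientaddr);
        int connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (connfd < 0)
            continue;
        echo(connfd);
        close(connfd);
    }
}
```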
What does EOF mean?
- There is no such thing as an EOF character
- EOF is a condition detected by the kernel; an application learns of it when it receives a zero return code from the read function
- For disk files, EOF occurs when the current file location exceeds the file length
- For network connections, EOF occurs when one process closes its end of the connection; the process at the other end detects EOF when it tries to read past the last byte in the stream
12.5 Web Server
12.5.1 Web basics
- Web clients and servers interact using a text-based application-level Protocol called HTTP(Hypertext Transfer Protocol)
- FTP: a file retrieval service
- HTML(Hypertext Markup Language)
12.5.2 Web content
- The content is a sequence of bytes associated with a MIME(Multipurpose Internet Mail Extensions) type
The Web server provides content to clients in two ways
- Taking a disk file and returning it to the client: the file is called static content, and returning it to the client is called serving static content.
- Running an executable file and returning its output to the client: the output produced by the executable is called dynamic content, and running the program and returning its output to the client is called serving dynamic content.
12.5.3 HTTP transaction
- The HTTP standard requires that each line of text be terminated by a carriage return and line feed pair ("\r\n")
The HTTP request
- HTTP/1.1 defines additional headers for advanced features such as caching and security, and supports a mechanism that allows a client and server to perform multiple transactions over the same persistent connection
- HTTP/1.0 and HTTP/1.1 are mutually compatible, and HTTP/1.0 clients and servers simply ignore HTTP/1.1 headers
- The request headers provide the server with additional information, such as the browser's product name or the MIME types that the browser understands
The HTTP response
- Response line + Response header + Response body
- The response headers provide additional information about the response. The two most important are Content-Type, which tells the client the MIME type of the content in the response body, and Content-Length, which gives the size of the response body in bytes
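A minimal illustrative transaction (hypothetical host and file; each line ends with "\r\n", and a blank line separates the headers from the body):

```
Request:
  GET /index.html HTTP/1.1
  Host: www.example.com

Response:
  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Length: 15

  <html>Hi</html>
```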
12.5.4 Serving Dynamic Content
- How does the server serve dynamic content to the client? A de facto standard called CGI(Common Gateway Interface) addresses this problem
How does the client pass program parameters to the server?
- The parameters of a GET request are passed in the URI: a "?" separates the file name from the arguments, and individual arguments are separated by "&". Spaces are not allowed in arguments and must be encoded as "%20"; similar encodings exist for other special characters
- The parameters in the POST request are in the request body
How does the server pass parameters to child processes
- After the server receives a request such as "GET /cgi-bin/adder?15000&213 HTTP/1.1", it calls fork to create a child process, sets the CGI environment variable QUERY_STRING to "15000&213" (which the adder program can read with the Unix getenv function), and calls execve to run the /cgi-bin/adder program in the child's context. Such a program is often called a CGI program; because many follow the CGI standard and are written as Perl scripts, they are also known as CGI scripts.
- For POST requests, the child process also needs to redirect standard input to the connected descriptor, from which the CGI program reads the parameters in the request body
How does the server pass additional information to child processes?
CGI defines a number of other environment variables that the server sets and that a CGI program can read at run time
Where does the child process send its output?
- Before the child process loads and runs the CGI program, it uses the Unix dup2 function to redirect the standard output to the connected descriptor associated with the client
- A CGI program sends its dynamic content to standard output
- The parent process does not know the type or size of the content generated by the child, so the child is responsible for generating the Content-Type and Content-Length response headers, as well as the empty line that terminates the headers
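A minimal sketch of a CGI program in the spirit of the adder example above (the exact output text is an assumption): it reads its arguments from QUERY_STRING and writes headers plus dynamic content to standard output, which the server has redirected to the client.

```c
#include <stdio.h>
#include <stdlib.h>    /* getenv */

int main(void) {
    char *qs = getenv("QUERY_STRING");   /* set by the server before execve, e.g. "15000&213" */
    int a = 0, b = 0;
    char body[256];

    if (qs != NULL)
        sscanf(qs, "%d&%d", &a, &b);
    int n = snprintf(body, sizeof(body), "The answer is: %d + %d = %d\r\n", a, b, a + b);

    /* The child must emit these headers and the blank line that ends them */
    printf("Content-Type: text/plain\r\n");
    printf("Content-Length: %d\r\n", n);
    printf("\r\n");
    printf("%s", body);
    fflush(stdout);                      /* stdout was dup2'ed to the connected descriptor */
    return 0;
}
```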
Chapter 13 Concurrent Programming
- If logical control flows overlap in time, they are concurrent. This phenomenon, called concurrency, occurs at many different levels of computer systems, such as hardware exception handlers, processes, and Unix signal handlers
- Applications that use application-level concurrency are called concurrent programs.
Uses of application-level concurrency
- Computing in parallel on multiprocessors
On a machine with only a single CPU, concurrent flows are interleaved: at any point in time, only one flow actually executes on the CPU. On a machine with multiple CPUs, called a multiprocessor, several flows can truly execute at the same time, and an application partitioned into parallel concurrent flows often runs much faster on such a machine
- Accessing a slow I/O device
When an application is waiting for data to arrive from a slow I/O device (such as a disk), the kernel keeps the CPU busy by running other processes; an application can exploit concurrency in a similar way, overlapping its I/O requests with other useful work
- Interact with people
People who interact with computers require the ability to perform multiple tasks at the same time. For example, when printing a document, you might want to resize a window. Each time a user requests an operation, a separate concurrent logical flow is created to perform that operation
- Reduce execution time by postponing work
For example, a dynamic storage allocator can reduce the latency of individual free operations by deferring coalescing to a concurrent "coalesce" flow that runs at a lower priority, soaking up idle CPU cycles
- Serve multiple network clients
Create a concurrent server that creates a separate logical flow for each client, serving multiple clients simultaneously
Three basic ways to construct concurrent programs
- process
Each logical control flow is a process that is scheduled and maintained by the kernel. Processes have a separate virtual address space, and to communicate with other processes, control flows must use some explicit interprocess communication (IPC) mechanism
- I/O multiplexing
Applications explicitly schedule their own logical flows in the context of a single process. The logical flows are modeled as state machines, and the main program explicitly transitions from one state to another as data arrives on file descriptors. Because the program is a single process, all flows share the same address space
- thread
A thread is a logical flow that runs in the context of a single process, scheduled by the kernel. It can be thought of as a hybrid of the other two approaches: threads are scheduled by the kernel like process flows, and they share a single virtual address space like I/O multiplexing flows
13.1 Process-based concurrent programming
- A client connection request is accepted in the parent process, and a new child process is created to serve each new client
- When the server forks a child, the child gets a complete copy of the server's descriptor table. The child closes its copy of the listening descriptor, and the parent closes its copy of the connected descriptor
13.1.1 Process-based concurrent Servers
- Servers typically run for a long time, so they need a SIGCHLD handler to reap the resources of zombie children. Because the SIGCHLD signal is blocked while the SIGCHLD handler executes, and Unix signals are not queued, the handler must be prepared to reap multiple zombie children on each invocation
- Parent and child must each close their own copies of connfd; otherwise the open file table entries are never released, leaking resources
- Because of the reference count in the socket's file table entry, the connection to the client is not terminated until both the parent's and the child's copies of connfd are closed
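A minimal sketch of a process-based concurrent echo server along these lines (the port is an example; echo and open_listenfd are the helpers sketched in Chapter 12):

```c
#include <signal.h>        /* signal, SIGCHLD */
#include <sys/wait.h>      /* waitpid, WNOHANG */
#include <unistd.h>        /* fork, close, _exit */
#include <sys/socket.h>    /* accept */
#include <netinet/in.h>    /* struct sockaddr_in */

int open_listenfd(int port);   /* sketched under 12.4.7 */
void echo(int connfd);         /* echoes until the client closes the connection */

/* Reap as many zombie children as possible: SIGCHLD is not queued, so one
   delivery of the signal may stand for several terminated children. */
static void sigchld_handler(int sig) {
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void) {
    signal(SIGCHLD, sigchld_handler);
    int listenfd = open_listenfd(8000);          /* example port */
    for (;;) {
        struct sockaddr_in clientaddr;
        socklen_t clientlen = sizeof(clientaddr);
        int connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (connfd < 0)
            continue;
        if (fork() == 0) {      /* child serves this client */
            close(listenfd);    /* child does not need the listening descriptor */
            echo(connfd);
            close(connfd);
            _exit(0);
        }
        close(connfd);          /* parent closes its copy; the refcount keeps the child's open */
    }
}
```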
13.1.2 Pros and cons of processes
- Model: parent and child share the open file table but not the user address space
Separate process address spaces
- Advantage: it is impossible for one process to accidentally overwrite the virtual memory of another process
- Disadvantage: it is harder for processes to share state information; to share anything they must use explicit IPC mechanisms, which tend to be slow because of the high overhead of process control and IPC
13.2 Concurrent programming based on I/O multiplexing
- The server must respond to two independent I/O events: a network client initiating a connection request, and a user typing a command on the keyboard
- The idea is to ask the kernel (for example, with the select function) to suspend the process, returning control to the application only after one or more I/O events have occurred
- A coarse-grained design is poor: once connected to a client, the server keeps echoing input lines until the client closes the connection, and during that time a command typed on standard input gets no response until the client-server session ends. A better approach is finer-grained multiplexing, in which each iteration of the server loop echoes at most one line of text
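A minimal sketch of the coarse-grained version using select, just to show the mechanism (the port is an example; echo and open_listenfd are as sketched in Chapter 12); as noted above, the echo call ignores stdin until the current client leaves:

```c
#include <stdio.h>         /* fgets, printf */
#include <unistd.h>        /* STDIN_FILENO, close */
#include <sys/select.h>    /* select, fd_set macros */
#include <sys/socket.h>    /* accept */
#include <netinet/in.h>    /* struct sockaddr_in */

int open_listenfd(int port);   /* sketched under 12.4.7 */
void echo(int connfd);         /* echoes until the client closes the connection */

int main(void) {
    int listenfd = open_listenfd(8000);          /* example port */
    fd_set read_set, ready_set;
    FD_ZERO(&read_set);
    FD_SET(STDIN_FILENO, &read_set);             /* watch the keyboard */
    FD_SET(listenfd, &read_set);                 /* and incoming connection requests */

    for (;;) {
        ready_set = read_set;
        select(listenfd + 1, &ready_set, NULL, NULL, NULL);   /* block until something is ready */

        if (FD_ISSET(STDIN_FILENO, &ready_set)) {             /* a command was typed */
            char cmd[128];
            if (fgets(cmd, sizeof(cmd), stdin) == NULL)
                return 0;                                      /* EOF on stdin: quit */
            printf("command: %s", cmd);
        }
        if (FD_ISSET(listenfd, &ready_set)) {                  /* a client is connecting */
            struct sockaddr_in clientaddr;
            socklen_t clientlen = sizeof(clientaddr);
            int connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
            if (connfd >= 0) {
                echo(connfd);    /* coarse-grained: stdin is ignored until this client leaves */
                close(connfd);
            }
        }
    }
}
```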
13.2.1 Concurrent event-driven servers based on I/O multiplexing
13.2.2 Advantages and disadvantages of I/O multiplexing
advantages
- It gives the programmer more control over program behavior than a process-based design; for example, an event-driven concurrent server could choose to give preferred service to certain clients
- Running in the context of a single process, each logical flow has access to the entire address space of the process, data is easily shared between the flows, and programs can be debugged using debugging tools (such as GDB) as they would for sequential programs
- Event-driven designs are often significantly more efficient than process-based designs, requiring no process context switch to schedule new flows
disadvantages
- The code is complex: for example, an event-driven concurrent server needs roughly three times as much code as a process-based one, and the complexity grows as the granularity of concurrency decreases
- Granularity: Indicates the number of instructions executed per time slice for each logical flow
13.3 Thread-based concurrent programming
- A mixture of process-based and I/O based multiplexing approaches
- A thread is a logical flow running in a process context, automatically scheduled by the kernel. Each thread has its own thread context, including a unique integer thread ID(Thread ID,TID), stack, stack pointer, program counter, general purpose registers, and condition codes
- All threads running in a process share the entire virtual address space of that process
13.3.1 Threaded execution model
- Each process begins its life cycle with a single thread, called the main thread.
- At some point, the main thread creates a peer thread, and from this point on, the two threads run concurrently
Process and thread differences
- Thread contexts are much smaller than process contexts and switch much faster
- Threads, unlike processes, are not organized in a strict parent-child hierarchy. The threads associated with a process form a pool of peers, independent of which thread created which; a thread can kill any of its peers or wait for any peer to terminate, and every peer can read and write the same shared data
13.3.2 Posix threads
- Posix Threads (Pthreads) is a standard interface for handling threads in C programs and is available on most Unix systems
- About 60 functions are defined that allow programs to create, kill, and reclaim threads, securely share data with peers, and notify peers of changes in system state
- Thread code and local data are wrapped in a thread routine (much like passing a func to a goroutine in Go): the routine takes a single generic pointer argument and returns a generic pointer
13.3.3 Creating a Thread
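A minimal sketch of creating and reaping a peer thread with Pthreads (the thread simply prints a message; the void * argument and return value follow the generic-pointer convention noted above):

```c
#include <stdio.h>
#include <pthread.h>

/* Thread routine: takes a generic pointer argument and returns a generic pointer. */
static void *thread(void *vargp) {
    (void)vargp;                               /* no argument needed in this example */
    printf("Hello from the peer thread\n");
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);  /* main thread creates a peer thread */
    pthread_join(tid, NULL);                   /* wait for the peer and reap its resources */
    printf("Main thread exiting\n");
    return 0;
}
```

Compile and link with the -pthread option.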
13.3.4 Terminating a Thread
A thread is terminated in one of the following ways
- When the top-level thread routine returns, the thread terminates implicitly
- pthread_exit: a thread terminates explicitly by calling the pthread_exit function, passing a pointer to a return value thread_return
- If the main thread calls pthread_exit, it waits for all other peer threads to terminate and then terminates the main thread and the entire process, with return value thread_return
- Unix exit function: if any peer thread calls the Unix exit function, the process and all threads associated with it are terminated
- pthread_cancel: a peer thread can terminate the thread with a given thread ID by calling the pthread_cancel function
13.3.5 Reclaiming Resources from a Terminated thread
- The thread waits for the specified thread to terminate by calling the pthread_join function
- The pthread_join function blocks until the thread TID terminates, and then reclaims any memory resources occupied by the terminated thread
13.3.6 Separating threads
- At any point in time, threads are joinable or detached.
- A joinable thread can be reaped or killed by other threads; its memory resources (such as its stack) are not freed until it is reaped
- A detached thread cannot be reclaimed or killed by other threads, and its memory resources are automatically released by the system when it terminates
- By default, threads are created to be joinable
- To avoid memory leaks, every joinable thread should either be explicitly reaped by another thread or detached by a call to the pthread_detach function
13.3.7 Initializing a Thread
- The pthread_once function allows you to initialize the state associated with thread routines
13.3.8 A thread-based concurrent server
- The main thread passes each peer thread a pointer to its own dynamically allocated connected descriptor, to avoid a race on a shared variable
- Because the threads are never explicitly reaped, each thread detaches itself so that its resources are reclaimed by the system when it terminates (see the sketch below)
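A minimal sketch of such a thread-based concurrent echo server (the port is an example; echo and open_listenfd are as sketched in Chapter 12):

```c
#include <stdlib.h>        /* malloc, free */
#include <unistd.h>        /* close */
#include <pthread.h>
#include <sys/socket.h>    /* accept */
#include <netinet/in.h>    /* struct sockaddr_in */

int open_listenfd(int port);   /* sketched under 12.4.7 */
void echo(int connfd);         /* echoes until the client closes the connection */

/* Each connection is served by its own peer thread.  The thread detaches itself so
   that its resources are reclaimed automatically when it terminates. */
static void *thread(void *vargp) {
    int connfd = *(int *)vargp;
    pthread_detach(pthread_self());
    free(vargp);                     /* free the slot malloc'ed by the main thread */
    echo(connfd);
    close(connfd);
    return NULL;
}

int main(void) {
    int listenfd = open_listenfd(8000);   /* example port */
    for (;;) {
        struct sockaddr_in clientaddr;
        socklen_t clientlen = sizeof(clientaddr);
        /* A separate heap slot per connection avoids a race on a shared connfd variable */
        int *connfdp = malloc(sizeof(int));
        *connfdp = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        pthread_t tid;
        pthread_create(&tid, NULL, thread, connfdp);
    }
}
```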
13.4 Shared Variables in multiple threads
13.4.1 Thread memory model
- A group of concurrent threads runs in the context of one process; each thread has its own independent thread context (thread ID, stack, stack pointer, program counter, condition codes, and general-purpose register values)
- Multiple threads share the process context, including the entire user virtual address space, and a collection of open files
- Registers are never shared, and virtual storage is always shared
13.4.2 Mapping variables to storage
Multithreaded C program variables
- Global variables are defined outside of functions, and the read/write area of virtual storage contains only one instance
- Local automatic variables: variables defined inside a function without the static attribute. At run time, each thread's stack contains its own instances of all local automatic variables; these instances are unique to each thread even when multiple threads execute the same function
- Local static variables: variables defined inside a function with the static attribute. As with global variables, the read/write area of virtual memory contains exactly one instance
13.4.3 Sharing variables
- A variable v is shared if and only if one of its run-time instances is referenced by more than one thread
13.5 Synchronizing Threads with semaphores
13.5.2 Accessing shared variables using semaphores
- A semaphore is a special type of global variable: the semaphore s holds a non-negative integer value and can only be manipulated by two special operations, called P and V
- P(s): if s is nonzero, P decrements s and returns immediately. If s is zero, the thread is suspended and blocked until s becomes nonzero and the thread is restarted by a V operation; after restarting, it completes the P operation by decrementing s and regains control
- V(s): increments s by 1. If any threads are blocked in a P operation waiting for s to become nonzero, then V restarts exactly one of them
Binary semaphore
- Associate each shared variable (or related set of shared variables) with a semaphore s, initially 1, and surround the corresponding critical section with P(s) and V(s) operations. A semaphore used this way always has the value 0 or 1, and is therefore called a binary semaphore
- Progress graphs model the execution of concurrent threads as trajectories through a state space
- The P and V operations create a forbidden region that trajectories cannot enter, making it impossible for multiple threads to be executing instructions in the enclosed critical section at any point in time. In other words, semaphore operations ensure mutually exclusive access to the critical section, a property commonly called mutual exclusion.
- A binary semaphore whose purpose is to provide mutual exclusion is often called a mutex. A P operation on a mutex is called locking it, a V operation is called unlocking it, and a thread that has locked a mutex but not yet unlocked it is said to hold the mutex
13.5.3 Posix semaphores
- The Posix standard defines many functions that operate on semaphores. The three basic operations are sem_init, sem_wait (the P operation), and sem_post (the V operation).
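A minimal sketch of protecting a shared counter with a binary semaphore using these three functions (niters is an arbitrary example; compile and link with -pthread):

```c
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

static volatile long cnt = 0;   /* shared counter */
static sem_t mutex;             /* binary semaphore protecting cnt */

static void *thread(void *vargp) {
    long niters = *(long *)vargp;
    for (long i = 0; i < niters; i++) {
        sem_wait(&mutex);       /* P: enter the critical section */
        cnt++;
        sem_post(&mutex);       /* V: leave the critical section */
    }
    return NULL;
}

int main(void) {
    long niters = 100000;
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);     /* binary semaphore, initial value 1 */
    pthread_create(&t1, NULL, thread, &niters);
    pthread_create(&t2, NULL, thread, &niters);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("cnt = %ld (expected %ld)\n", cnt, 2 * niters);
    return 0;
}
```

Without the P and V operations, the two threads race on cnt and the final value is usually wrong.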
13.5.4 Scheduling shared resources using semaphores
Other synchronization mechanisms
- Java threads are synchronized using a mechanism called the Java Monitor, which provides a higher level of abstraction of semaphore mutex and scheduling capabilities
13.6 Synthesis: Concurrent servers based on pre-threading
- The downside of creating a new thread for each new client is that thread creation is costly on every connection. A prethreaded server reduces this cost by creating a fixed pool of worker threads in advance: the main thread places connected descriptors into a shared buffer, and the workers remove and service them (a producer-consumer arrangement)
13.7 Other concurrency problems
13.7.1 Thread Safety
- A function that consistently produces correct results if and only if called repeatedly by multiple concurrent threads is said to be thread-safe.
Class 1: Functions that do not protect shared variables
- Modify an unprotected variable
- Solution: protect the variable with synchronization operations such as P and V. Advantage: calling programs do not need to change. Disadvantage: the synchronization operations slow the program down
Class 2: functions that protect state across multiple calls
- Functions share a global variable
- Solution: rewrite the function so that the caller passes the state information in the function's arguments. Disadvantage: callers are forced to modify their code as well
Class 3: Functions that return Pointers to static variables
- A global variable is shared
- Solution: use the lock-and-copy technique to define a thread-safe wrapper function that performs the lock-and-copy, and replace all calls to the thread-unsafe function with calls to the wrapper (see the sketch after this list)
Class 4: Functions that call thread-unsafe functions
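A minimal sketch of a lock-and-copy wrapper, using ctime (which returns a pointer to a static buffer) as an example of a class-3 function; the wrapper name ctime_ts and the caller-supplied buffer are assumptions for illustration:

```c
#include <string.h>    /* strcpy */
#include <time.h>      /* ctime */
#include <pthread.h>

static pthread_mutex_t ctime_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: lock, call the unsafe function, copy its static result
   into a buffer owned by the caller (at least 26 bytes for ctime), unlock. */
char *ctime_ts(const time_t *timep, char *privatep) {
    pthread_mutex_lock(&ctime_mutex);
    char *sharedp = ctime(timep);      /* result lives in a shared static buffer */
    strcpy(privatep, sharedp);         /* copy it out while holding the lock */
    pthread_mutex_unlock(&ctime_mutex);
    return privatep;
}
```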
13.7.2 Reentrancy
- An important class of thread-safe functions is called reentrant functions.
- Characteristic: when called by multiple threads, they do not reference any shared data
- Reentrant functions are usually more efficient than thread-safe functions that are not reentrant, because they require no synchronization operations
- If all function arguments are passed by value (no pointers) and all data references are to local automatic stack variables (that is, no references to static or global variables), the function is explicitly reentrant: it can be assumed reentrant no matter how it is called
- If we relax this slightly and allow some arguments to be passed by pointer, we get an implicitly reentrant function: it is reentrant only if the calling threads are careful to pass pointers to unshared data
13.7.3 Using existing library functions in multithreading
- Most Unix functions and functions defined in the standard C library are thread-safe, with a few exceptions
- Unix systems provide reentrant versions of most thread-unsafe functions, always ending with the “_r” suffix
13.7.4 Races
- Cause: a race occurs when the correctness of a program depends on one thread reaching a point in its control flow before another thread reaches some other point; programmers wrongly assume that threads will take some particular trajectory through the execution state space
- A multithreaded program must work correctly for any feasible trajectory
13.7.5 Deadlock
- Deadlock: a group of threads is blocked, each waiting for a condition that will never become true
Avoid deadlock
- Mutex lock ordering rule: if, for each pair of mutexes (s, t) in a program, every thread that holds both s and t locks them in the same order, then the program is deadlock-free
- That is, both threads acquire the locks in the order P(s) then P(t) (see the sketch below)
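A minimal sketch of the ordering rule with Pthreads mutexes (the names s and t and the shared counters are arbitrary examples): both threads lock s before t, so no circular wait can arise.

```c
#include <pthread.h>

static pthread_mutex_t s = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t t = PTHREAD_MUTEX_INITIALIZER;
static long a = 0, b = 0;          /* shared data protected by s and t */

static void *thread1(void *vargp) {
    (void)vargp;
    pthread_mutex_lock(&s);        /* P(s) first ... */
    pthread_mutex_lock(&t);        /* ... then P(t) */
    a++; b++;
    pthread_mutex_unlock(&t);      /* V(t) */
    pthread_mutex_unlock(&s);      /* V(s) */
    return NULL;
}

static void *thread2(void *vargp) {
    (void)vargp;
    pthread_mutex_lock(&s);        /* same order as thread1, so no circular wait */
    pthread_mutex_lock(&t);
    a--; b--;
    pthread_mutex_unlock(&t);
    pthread_mutex_unlock(&s);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```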