Main memory (RAM) is a critical resource and must be managed with care. Although the memory in a typical computer today dwarfs that of the 1960s-era IBM 7094, program sizes are growing even faster than memory capacities. As a memory version of Parkinson's Law puts it, no matter how big memory is, programs expand to fill it. Let's look at how the operating system abstracts memory and manages it.

Over the years, computer designers have converged on the concept of a memory hierarchy. The levels of that hierarchy are classified as follows.

The top level is the fastest but has the smallest capacity and the highest cost per byte. Each level below it is slower to access but larger and cheaper.

The part of the operating system that manages the memory hierarchy is called the Memory Manager. Its job is to manage memory efficiently: keep track of which parts of memory are in use, allocate memory to processes when they need it, and reclaim it when they are done.

We will explore several memory management models, from simple to complex, focusing on main memory and how it is managed, since the lowest levels of the hierarchy (the caches) are managed by hardware.

No memory abstraction

The simplest memory abstraction is no abstraction at all. Early mainframe computers (before the 1960s), minicomputers (before the 1970s), and personal computers (before the 1980s) had no memory abstraction: every program simply saw, and had direct access to, physical memory. When a program executed an instruction such as

MOV REGISTER1, 1000

the computer moved the contents of physical memory location 1000 to REGISTER1. Thus, the memory model presented to the programmer was simply physical memory: a set of addresses from 0 up to some maximum, each address corresponding to a cell holding some number of bits, commonly eight.

Under these conditions, it is essentially impossible to keep two programs in memory at the same time. If the first program writes a value to address 2000, it overwrites whatever value the second program stored there; both programs will crash almost immediately.

Even when the memory model is just physical memory, there are some options. Three variations are shown below.

In figure a, the operating system sits at the bottom of RAM (Random Access Memory); in figure b, it sits in ROM (Read-Only Memory) at the top of memory. In figure c, the device drivers are in ROM at the top and the rest of the operating system is in RAM at the bottom. The model of figure a was once used on mainframes and minicomputers but is rarely used any more; the model of figure b is typical of some handheld computers and embedded systems. The third model was used by early personal computers, where the portion of the system in ROM is called the Basic Input Output System (BIOS). Models a and c have the disadvantage that a bug in a user program can wipe out the operating system, possibly with catastrophic results.

When the system is organized this way, generally only one process can run at a time. As soon as the user types a command, the operating system copies the requested program from disk to memory and executes it. When the process finishes, the operating system displays a prompt on the terminal and waits for a new command; on receiving one, it loads the new program into memory, overwriting the previous one.

One way to get some parallelism in a system with no memory abstraction is to program with multiple threads. Since all the threads in a process share the same memory image, the lack of an abstraction is not a problem for them.

Running multiple programs

Even without memory abstraction, however, it is possible to run multiple programs after a fashion: the operating system saves the entire contents of memory to a disk file, then brings in and runs the next program. As long as only one program is in memory at a time, there are no conflicts.

With the help of some extra special hardware, it is even possible to run multiple programs concurrently without swapping. Early models of the IBM 360 solved the problem as follows.

The System/360 was IBM's groundbreaking mainframe computer, introduced on April 7, 1964. It was the world's first family of computers designed around a single, compatible instruction set.

On the IBM 360, memory was divided into 2-KB blocks, and each block was assigned a 4-bit protection key held in special registers inside the CPU. A machine with 1 MB of memory needed only 512 of these 4-bit keys, for a total of 256 bytes of key storage. The PSW (Program Status Word) also contained a 4-bit key. The 360 hardware trapped any attempt by a running process to access memory whose protection key differed from its PSW key. Since only the operating system could change the protection keys, user processes were prevented from interfering with one another and with the operating system itself.

This solution has a major drawback, illustrated below. Suppose we have two programs, each 16 KB in size.

The figure shows the two 16-KB programs as loaded separately (figures a and b). Program A begins with a jump to address 24, which contains a MOV instruction; program B begins with a jump to address 28, which contains a CMP instruction. Figure c shows the memory image when both programs are loaded consecutively starting at address 0. When this combined image runs, the first program starts at address 0 and executes JMP 24; the instructions that follow (many of them not drawn) then execute in sequence, and after a while the first program finishes and the second program starts. The second program's first instruction is JMP 28 — but address 28 now lies inside the first program and holds an ADD, not the intended CMP. The program will most likely crash well within a second because of these incorrect memory references.

The core problem is that both programs reference absolute physical addresses, which is not what we want. What we want is for each program to reference a private, local set of addresses. The IBM 360 used a technique called static relocation to modify the second program as it was loaded into memory. It worked like this: when a program was loaded at address 16384, the constant 16384 was added to every address in the program during the load (so JMP 28 became JMP 16412). While this mechanism can work when done correctly, it is not a general solution, and it slows down loading. Worse, it requires extra information in every executable program to indicate which words are (relocatable) addresses and which are not. After all, the 28 in JMP 28 in figure b must be relocated, but an instruction like MOV REGISTER1,28, which moves the number 28 into REGISTER1, must not be. The loader needs some way to tell addresses and constants apart.

A memory abstraction: address space

Exposing physical memory to processes has several major drawbacks. First, if user programs can address every byte of memory, they can easily corrupt the operating system and bring the whole machine down (unless there is special hardware protection, such as the IBM 360's lock-and-key scheme). This problem exists even when only one user program is running.

Second, with this model it is difficult to run more than one program at all (taking turns, if there is only one CPU). On a personal computer it is normal to have several applications open at once — an instant messenger, an e-mail client, a web browser — with one of them running at any given moment and the others brought to the foreground with a mouse click. This is hard to arrange when processes see raw physical memory.

The concept of address space

If multiple applications are to reside in memory at the same time, two problems must be solved: protection and relocation. We have already seen the IBM 360's primitive solution to the first: label blocks of memory with a protection key and compare the key of the executing process against the key of every memory word it touches. This solves protection, but it does not by itself solve relocation, so it still falls short of letting multiple processes coexist cleanly in memory.

A better approach is to invent a new abstraction for memory: the address space. Just as the process concept creates an abstract CPU to run programs, the address space creates an abstract memory for programs to live in. An address space is the set of addresses that a process can use to address memory. Each process has its own address space, independent of those belonging to other processes, although in some circumstances processes may wish to share address spaces.

Base and limit registers

The simplest solution uses dynamic relocation: each process's address space is mapped onto a different region of physical memory in a simple way. The classic approach, used on machines ranging from the CDC 6600 (the world's first supercomputer) to the Intel 8088 (the heart of the original IBM PC), is to equip each CPU with two special hardware registers, usually called the base register and the limit register. Programs are loaded into consecutive memory locations wherever there is room, with no relocation during loading. When a process runs, the physical address at which its program begins is loaded into the base register, and the length of the program is loaded into the limit register. In figure c, when the first program runs, the base and limit registers hold 0 and 16384; when the second program runs, they hold 16384 and 16384. If a third 16-KB program were loaded directly above the second one and run, its base and limit registers would hold 32768 and 16384. To summarize:

  • Base register: holds the physical address at which the program begins in memory.
  • Limit register: holds the length of the program.

Every time a process references memory, whether to fetch an instruction or to read or write a data word, the CPU hardware automatically adds the base value to the address generated by the process before sending the address out on the memory bus. At the same time, it checks whether the address supplied by the program is equal to or greater than the value in the limit register; if so, a fault is generated and the access is aborted. Thus, for the JMP 28 in figure c, the hardware treats it as JMP 16412, and the program lands on the intended CMP instruction.

Using base and limit registers is an easy way to give each process its own private address space, because every memory address generated automatically has the base register's contents added to it before being sent to memory. In many implementations, the base and limit registers are protected so that only the operating system can modify them. The CDC 6600 provided such protection; the Intel 8088 did not, and did not even have a limit register. It did provide multiple base registers, allowing a program's code and data, for example, to be relocated independently, but it offered no protection against out-of-range memory references.

The disadvantage of base and limit registers is now apparent: an addition and a comparison must be performed on every memory reference. Comparisons can be done quickly, but additions are slow because of carry-propagation time, unless special addition circuits are used.

Swapping

If the computer's physical memory is large enough to hold all the processes, the schemes described so far are more or less adequate. In practice, however, the total amount of RAM needed by all the processes is often far more than the memory available. On Windows, OS X, or Linux systems, some 50-100 processes may be started as soon as the computer boots. For example, when a Windows application is installed, it often issues commands so that at subsequent boots a process starts whose only job is to check for updates to that application; such a process can easily occupy 5-10 MB of memory. Other background processes check for e-mail, network connections, and many other such things — all before the first user program ever starts. Serious user applications such as Photoshop can require 500 MB just to start and many gigabytes once they begin processing data. Consequently, keeping all processes in memory all the time requires a huge amount of memory, and it cannot be done if there is insufficient memory.

One approach is swapping: bring each process into memory in its entirety, run it for a while, then put it back on disk. Idle processes sit on disk, taking up no memory while they are not running. The other strategy, called virtual memory, allows programs to run even when only part of them is in memory. Let's discuss swapping first.

The swapping process

The operation of a swapping system is shown below.

Initially, only process A is in memory. Then processes B and C are created or swapped in from disk. In figure d, A is swapped out to disk; finally, A is brought back in. Because A in figure g is now at a different location, its addresses must be relocated, either by software when it is swapped in, or (more likely) by hardware during program execution. Base and limit registers are exactly suited to this case.

Swapping leaves multiple holes (free areas) scattered through memory. They can be combined into one big free area by moving all the processes downward as far as possible; this technique is called memory compaction. It is usually not done, because it consumes a great deal of CPU time: on a machine with 16 GB of RAM that can copy 8 bytes every 8 ns, compacting all of memory takes about 16 seconds.

A point worth considering is how much memory should be allocated to a process when it is created or swapped in. If processes are created with a fixed size that never changes, the allocation strategy is simple: the operating system allocates exactly what is needed, no more and no less.

But if a process's data segment can grow, for example by dynamically allocating memory from a heap, a problem arises. A brief digression on segments first: at the logical level, the operating system organizes a process's data into segments (regions):

  • Code segment / text segment:

Also called the text segment, this is the region of memory that holds the program's instructions — the code that runs

Its size is determined before the code runs

It is generally read-only, although some architectures allow code to be writable

It may also contain read-only constants, such as string literals

  • Data segment:

Readable and writable

Stores initialized global variables and initialized static variables

Data in the data segment lives as long as the process: it exists when the process is created and disappears when the process exits

  • BSS segment:

Readable and writable

Stores uninitialized global variables and uninitialized static variables

Data in the BSS segment lives as long as the process

Data in the BSS segment defaults to 0

  • Rodata section:

Read-only data, such as the format strings of printf calls and the jump tables of switch statements — in other words, the constant region. For example, const int ival = 10; places ival in the .rodata section, and in printf("Hello world %d\n", c); the format string "Hello world %d\n" is likewise stored in the .rodata section.

  • Stack:

Readable and writable

Stores local (non-static) variables of functions and code blocks

The lifetime of stack data follows the enclosing code block: space is allocated when the block is entered and reclaimed automatically when the block exits

  • Heap:

Readable and writable

Stores memory dynamically allocated with malloc/realloc while the program runs

The lifetime of heap data follows the process, lasting from malloc/realloc until free

Here is the corresponding output when compiling with Borland C++:

_TEXT segment dword public use32 'CODE'
_TEXT ends
_DATA segment dword public use32 'DATA'
_DATA ends
_BSS segment dword public use32 'BSS'
_BSS ends

A segment marks off or divides a region of memory. In assembly language, the SEGMENT directive marks the beginning of a segment definition and the ENDS directive marks its end; a segment definition denotes a contiguous region of memory.

There are three ways to handle memory for a region that grows automatically:

  • If the process is adjacent to a free area, that free area can be allocated to the process so it can expand into it.

  • If the process is adjacent to another process, there are two options: move the growing process to a hole in memory that is large enough for it, or swap out one or more processes to create a sufficiently large hole.

  • If a process cannot grow in memory and the swap area on disk is also full, the process must be suspended until some space is freed up (or it can be killed).

The above applies when only one process, or a small number of them, needs to grow. If most processes are expected to grow as they run, a good strategy for reducing the overhead of swapping and moving processes whose memory has become too small is to allocate a little extra memory for a process whenever it is swapped in or moved. When a process is swapped out to disk, however, only the memory actually in use should be swapped; swapping the extra memory as well would be wasteful. The figure below shows a memory configuration in which space for growth has been allocated to two processes.

If a process has two growable segments — for example, a data segment used as a heap for variables that are dynamically allocated and released, and a stack segment for local variables and return addresses — an alternative arrangement suggests itself, shown in figure b. There, each process's stack grows downward from the top of the memory allotted to the process, and the data segment grows upward from just above the program text. If the memory reserved between them runs out, the process is handled by the same three methods listed above for automatically growing segments.

Free memory management

When memory is assigned dynamically, the operating system must manage it. In general terms, there are two ways to keep track of memory usage:

  • Bitmaps
  • Free lists

Let’s explore these two uses

Storage management using bitmaps

With the bitmap approach, memory is divided into allocation units, which may be as small as a few words or as large as several kilobytes. Each allocation unit corresponds to one bit in the bitmap, with 0 meaning free and 1 meaning occupied (or vice versa). A region of memory and its corresponding bitmap are shown below.

Figure a shows a stretch of memory holding five processes and three holes. The tick marks indicate allocation units; the shaded regions (0 in the bitmap) are free. Figure b shows the corresponding bitmap, and figure c shows the same information as a linked list.

The size of the allocation unit is an important design issue: the smaller the unit, the larger the bitmap. Even with an allocation unit as small as 4 bytes, however, 32 bits of memory require only 1 bit of bitmap — a memory of 32n bits needs an n-bit map, so the bitmap occupies only 1/32 of memory. With a larger allocation unit the bitmap is smaller still, but if process sizes are not exact multiples of the allocation unit, appreciable memory may be wasted in the last allocation unit of each process.

A bitmap provides a simple way to keep track of memory usage in a fixed amount of memory, because the size of the bitmap depends only on the sizes of memory and of the allocation unit. The problem is that when the memory manager decides to bring a k-unit process into memory, it must search the bitmap for a run of k consecutive 0 bits. Searching a bitmap for a run of a given length is a slow operation — think of scanning a jumbled array for a long stretch of free elements — and this is an argument against bitmaps.

Memory management with linked lists

Another way to keep track of memory is to maintain a linked list of allocated and free memory segments, where a segment either contains a process or is a hole between two processes. The memory of figure c above can be represented this way: each entry in the list specifies whether it is a hole (H) or a process (P), the address at which it starts, its length, and a pointer to the next entry.

In this example, the segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward. A terminating process normally has two neighbors (except when it is at the very top or bottom of memory); each neighbor may be either a process or a hole, giving four combinations of merging.

When processes and holes are kept on a list sorted by address, several algorithms can be used to allocate memory for a newly created (or swapped-in) process. We assume the memory manager knows how much memory to allocate. The simplest algorithm is first fit: the memory manager scans along the segment list until it finds a hole that is big enough. Unless the hole exactly matches the requested size, it is then split into two pieces, one for the process and one left over as a smaller hole. First fit is a fast algorithm because it searches as little of the list as possible.

A minor variation of first fit is next fit. It works the same way, except that it keeps track of where it found the last suitable hole, so that the next search starts from that point rather than from the beginning of the list, as first fit does. Simulations by Bays (1977) showed that next fit performs slightly worse than first fit.

Another well-known and widely used algorithm is best fit. Best fit searches the entire list and takes the smallest hole that is adequate, trying to find the hole closest to the actual size needed rather than breaking up a big hole that might be needed later. For example, if a block of size 2 is requested, first fit would allocate from the hole at position 5, while best fit would allocate from the hole at position 18, as shown below.

How does best fit perform? Because it must search the entire list, best fit is slower than first fit. Somewhat surprisingly, it also wastes more memory than first fit or next fit, because it tends to fill memory with tiny, useless holes, whereas first fit leaves behind larger holes.

Since best fit splinters memory into many very small holes, one might consider the opposite, worst fit: always take the largest available hole (now you can see why best fit produces all those small holes), so that the leftover hole remains big enough to be useful. Simulation has shown that worst fit is not a good idea either.

All four algorithms can be sped up by maintaining separate lists for processes and holes, so that each of them inspects only holes, not processes. The inevitable price of this speedup is additional complexity and slower deallocation, since a freed segment must be removed from the process list and inserted into the hole list.

If processes and holes are kept on different lists, the hole list can be sorted by size to make best fit faster. When best fit searches a hole list ordered from smallest to largest, the first hole that fits is also the smallest hole that will do the job — that is, the best fit — so no further searching is needed. With the hole list sorted by size, first fit and best fit are equally fast, and next fit becomes pointless.

Yet another allocation algorithm is quick fit, which maintains separate lists for holes of the commonly requested sizes. With a table of n entries, for example, the first entry might point to the head of a list of 4-KB holes, the second to a list of 8-KB holes, the third to 12-KB holes, and so on. A 21-KB hole could be put either on the 20-KB list or on a special list of odd-sized holes.

Quick fit makes finding a hole of a required size extremely fast, but it shares the disadvantage of all schemes that sort holes by size: when a process terminates or is swapped out, finding its neighbors to see whether a merge is possible is expensive. And without merging, memory quickly fragments into a large number of small holes into which no process fits.