One, introduction
The various memory management strategies of operating systems all serve the same purpose: keeping multiple processes in memory at the same time to allow multiprogramming. However, each of these strategies requires that the entire process be in memory before it can execute. Dynamic loading relaxes this limitation, but it requires careful application and extra work from the programmer.
Virtual memory, by contrast, allows a process to execute while only partially in memory; a significant advantage is that a program can be larger than physical memory. Moreover, virtual memory abstracts memory into a huge uniform array, separating logical memory as the user sees it from physical memory and freeing programmers from the limitations of physical storage. In short, virtual memory presents the programmer with a much larger, contiguous memory space than physical memory, one that is actually mapped onto fragmented physical memory and even onto disk.
\
However, virtual memory is not easy to implement and can significantly degrade performance when used incorrectly.
Two, demand paging
1. Basic concepts
Pages are brought into memory only when they are needed.
This scheme requires hardware support to distinguish which pages are in memory and which are on disk, in the form of a valid/invalid bit in each page-table entry. When the bit of an entry is valid, the page is legal and in memory; otherwise, the page is either illegal, or legal but not yet in memory.
When a process attempts to access a page that has not yet been brought into memory, it causes a page-fault trap, which is handled as follows (a sketch in C follows the steps):
1) Check an internal table for the process, usually kept with the PCB, to determine whether the reference is valid.
2) If the reference is illegal, terminate the process; otherwise, page it in:
3) Find a free frame.
4) Schedule a disk operation to read the required page into the newly allocated frame.
5) When the disk read completes, update the internal table and the page table to indicate that the page is now in memory.
6) Restart the instruction that was interrupted by the trap.
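As a concrete illustration, here is a minimal, self-contained C sketch of these six steps; all structures and helpers (the page-table layout, find_free_frame, disk_read) are hypothetical simplifications, not a real kernel API.

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state, just to make the steps concrete. */
#define NPAGES 8

typedef struct {
    bool valid;   /* valid/invalid bit */
    int  frame;   /* frame number when valid */
} pte_t;

static pte_t page_table[NPAGES];

static bool is_legal(int vpage)      { return vpage >= 0 && vpage < NPAGES; }
static int  find_free_frame(void)    { static int next = 0; return next++; }
static void disk_read(int vp, int f) { printf("read page %d into frame %d\n", vp, f); }

/* Steps 1-6 from the list above. */
void handle_page_fault(int vpage) {
    if (!is_legal(vpage)) {            /* 1-2: check validity, terminate if illegal */
        printf("illegal reference: terminate process\n");
        return;
    }
    int frame = find_free_frame();     /* 3: find a free frame */
    disk_read(vpage, frame);           /* 4: schedule the disk read */
    page_table[vpage].frame = frame;   /* 5: update the page table */
    page_table[vpage].valid = true;
    printf("restart faulting instruction\n");  /* 6 */
}

int main(void) { handle_page_fault(3); return 0; }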
\
2. Demand-paging performance
Reducing the page-fault rate is very important for demand paging.
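The standard way to quantify this (with p the probability of a page fault and ma the memory access time) is the effective access time:

effective access time = (1 − p) × ma + p × page-fault service time

Since servicing a page fault takes milliseconds while a memory access takes nanoseconds, even a very small p dominates the effective access time.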
Another consideration is the use of swap space. Disk I/O to swap space is usually faster than I/O to the file system, because swap space is allocated in large blocks rather than through file lookups and indirect allocation. Better performance can therefore be obtained by copying the entire file image into swap space at process startup and demand-paging from swap space afterwards.
Another option is to start demand paging from the file system, but to write displaced pages to swap space and read them back from swap space on subsequent faults. This approach ensures that only the required pages are ever read from the file system, while maintaining a reasonable level of performance.
\
Three, copy-on-write
Some processes, such as a child created by fork(), do not need demand paging at first. Instead, the child shares pages with the parent, and a shared page is copied only when one of them needs to modify it. This is copy-on-write.
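A minimal POSIX sketch of the idea: parent and child share the page holding shared_value until the child's store triggers a copy (the values are arbitrary).

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_value = 42; /* lives in a page shared copy-on-write after fork() */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {
        /* This store triggers a copy-on-write fault: only now does the
           kernel give the child its own copy of the page. */
        shared_value = 7;
        printf("child sees %d\n", shared_value);    /* 7 */
        _exit(0);
    }
    wait(NULL);
    printf("parent still sees %d\n", shared_value); /* 42 */
    return 0;
}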
Where free pages are allocated from matters when a copy-on-write page must be copied. Many operating systems provide a pool of free frames for exactly this purpose.
\
Four, page replacement
Memory is sometimes over-allocated, and processes may need more pages than can be allocated to them. In addition, memory is not used only for process pages; I/O buffering also consumes a large amount of memory, which can leave demand unmet. When a process page-faults, the operating system prepares to load the required page, only to find no free frame to allocate. Everyone wants the house and the car too; wanting everything is like hoping to win the lottery.
1. Page replacement
In this situation, the operating system can choose to kill the faulting process or swap out an unlucky one. More often, page replacement is used:
If there are no free frames, find a frame that is not currently in use, free it, and use it to hold the page the process faulted on (that is, the page that needs to be paged in).
If the victim page has been modified, it must also be written back to disk. Performance can be improved by keeping a modify bit (dirty bit), so that unmodified pages are not written back.
\
Page replacement algorithms:
2. FIFO page replacement
The simplest page replacement algorithm: select the oldest page for replacement. A FIFO queue manages all pages in memory; the page at the head of the queue is replaced, and a newly paged-in page is added to the tail.
FIFO is easy to understand and implement, but its performance is not always good: the replaced page may still be in active use and be requested back immediately after the fault.
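A minimal fault-counting simulation of FIFO replacement (the frame count and reference string are arbitrary choices):

#include <stdio.h>
#include <string.h>

#define NFRAMES 3

/* Count page faults for a reference string under FIFO replacement. */
int fifo_faults(const int *refs, int n) {
    int frames[NFRAMES];
    int next = 0, faults = 0;          /* next: index of the oldest frame */
    memset(frames, -1, sizeof frames); /* -1 marks an empty frame */
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < NFRAMES; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (!hit) {
            frames[next] = refs[i];    /* evict the oldest page */
            next = (next + 1) % NFRAMES;
            faults++;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
    printf("faults: %d\n", fifo_faults(refs, 13));
    return 0;
}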
\
3. Optimal replacement
Replace the page that will not be used for the longest period of time. This algorithm has the lowest possible page-fault rate.
Its problem is that it requires knowledge of future references, so it is difficult to implement; it serves mainly as a benchmark for other algorithms.
\
4. LRU page replacement
An approximation of optimal replacement. The key difference: FIFO uses the time at which a page was brought in, whereas optimal replacement looks at the time at which a page will next be used. If the recent past is taken as an approximation of the near future, we can replace the page that has not been used for the longest time: inferring the future from the past. This approach is called the least-recently-used (LRU) algorithm.
LRU can be implemented with counters or with a stack: a page is moved to the top of the stack whenever it is used, so unused pages sink to the bottom.
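A minimal counter-based sketch: every reference stamps the frame with a logical clock, and the smallest stamp marks the victim (all names hypothetical):

#include <stdio.h>

#define NFRAMES 3

static int pages[NFRAMES]  = {-1, -1, -1}; /* which page is in each frame */
static int stamps[NFRAMES] = {0, 0, 0};    /* logical time of last use */
static int clock_now = 0;

/* Return the frame holding `page`, faulting in and evicting the
   least recently used frame if necessary. */
int lru_access(int page) {
    int victim = 0;
    for (int f = 0; f < NFRAMES; f++) {
        if (pages[f] == page) {            /* hit: refresh the stamp */
            stamps[f] = ++clock_now;
            return f;
        }
        if (stamps[f] < stamps[victim]) victim = f;  /* oldest so far */
    }
    pages[victim] = page;                  /* fault: evict the oldest */
    stamps[victim] = ++clock_now;
    return victim;
}

int main(void) {
    int refs[] = {1, 2, 3, 1, 4};
    for (int i = 0; i < 5; i++)
        printf("page %d -> frame %d\n", refs[i], lru_access(refs[i]));
    return 0;
}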
\
5. Approximate LRU page replacement
Few computer systems provide enough hardware support for true LRU page replacement. However, many systems approximate it by means of a reference bit:
Each page-table entry is associated with a reference bit, which the hardware sets whenever the page is referenced.
Initially, all reference bits are cleared to zero; over time, many are set to 1. By examining the reference bits, you can tell which pages have been used and which have not. This information is the basis of the approximate-LRU replacement algorithms.
There are several approximate-LRU replacement algorithms:
1) Additional-reference-bits algorithm
Each page has an 8-bit byte recording its usage history. At regular intervals, the reference bit is shifted into the high-order position of the byte, the other bits shift right, and the lowest-order bit is discarded. The page whose byte has the smallest value can then be replaced.
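The periodic refresh might look like the following sketch, with one 8-bit history byte per page (names hypothetical):

#include <stdint.h>

#define NPAGES 64

uint8_t history[NPAGES];   /* one 8-bit history byte per page */
uint8_t ref_bit[NPAGES];   /* hardware reference bit, 0 or 1 */

/* Run from a timer interrupt: shift each reference bit into the
   high-order position of the history byte; the lowest bit falls off. */
void refresh_history(void) {
    for (int i = 0; i < NPAGES; i++) {
        history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
        ref_bit[i] = 0;    /* clear for the next interval */
    }
}
/* The replacement victim is the page with the smallest history value. */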
\
2) Second-chance algorithm
When a victim page is selected, check its reference bit. If it is 0, swap the page out; if it is 1, give the page a second chance, let it go, and move on to the next candidate. A page given a second chance has its reference bit cleared and its arrival time reset, so it will not be replaced until every other page has been considered. This is commonly implemented with a circular queue, which is why it is also known as the clock algorithm.
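A minimal clock-style sketch of the scan (names hypothetical):

#define NFRAMES 4

static int ref[NFRAMES];   /* reference bits, set by 'hardware' on use */
static int hand = 0;       /* clock hand: next candidate frame */

/* Sweep until a frame with reference bit 0 is found; pages with
   bit 1 get their second chance (bit cleared) and are passed over. */
int second_chance_victim(void) {
    for (;;) {
        if (ref[hand] == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        ref[hand] = 0;                 /* spend the second chance */
        hand = (hand + 1) % NFRAMES;
    }
}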
\
3) Enhanced second-chance algorithm
Consider the reference bit and the modify bit together as an ordered pair (a small classification sketch follows this list):
(0,0) neither recently used nor modified: replace it, don't hesitate
(0,1) not recently used but modified: must be written back to disk before replacement, please think again!
(1,0) recently used but not modified: may be used again soon
(1,1) recently used and modified: may be used again soon, and must be written back to disk before replacement, please think again!
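The four classes can be encoded as a two-bit number, so the scan just looks for the lowest occupied class; a sketch with hypothetical names:

/* (reference, modify) -> class 0..3; a lower class is a better victim.
   0:(0,0)  1:(0,1)  2:(1,0)  3:(1,1) */
int page_class(int referenced, int modified) {
    return (referenced << 1) | modified;
}

/* Pick the first frame belonging to the lowest nonempty class. */
int enhanced_victim(const int *ref, const int *mod, int nframes) {
    for (int cls = 0; cls <= 3; cls++)
        for (int f = 0; f < nframes; f++)
            if (page_class(ref[f], mod[f]) == cls)
                return f;
    return -1; /* unreachable if nframes > 0 */
}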
\
6. Counting-based page replacement
Keep a counter for each page; this gives rise to two schemes:
1) Least-frequently-used (LFU) page replacement
Replace the page with the smallest count, on the grounds that an actively used page should have a large reference count. But a problem can arise: a page may be used heavily at first and then never again. A solution is to periodically shift the count registers right by one bit, forming an exponentially decaying average usage count.
\
2) Most-frequently-used (MFU) page replacement
Replace the page with the largest count. Rationale: the page with the smallest count was probably just brought in and has not yet been used.
\
7. Page-buffering algorithm
Keep a buffer pool of free frames.
1) Maintain a list of modified pages. Whenever the paging device is idle, select a modified page, write it to disk, and clear its modify bit. This scheme increases the number of clean pages and reduces the probability that a page must be written out at replacement time.
2) Keep a pool of free frames, but remember which page was in each frame. If a page is needed again before its old frame has been reused, it can be reclaimed from the pool with no disk I/O.
\
8. Applications and page replacement
Sometimes an application that uses virtual memory through the operating system gets worse results. Databases are an example: a database can provide its own memory management and I/O buffering, because it understands its memory and disk usage better than the operating system does. For this reason, some operating systems allow special programs to use the disk as a raw array of logical blocks, bypassing the operating system's file system.
\
Five, frame allocation
How should free memory be allocated among the various processes?
A simple way is to keep all frames on a free-frame list and allocate one whenever a page fault occurs; when a process terminates, its frames are returned to the free-frame list.
Frame allocation is constrained in several ways. For example, the number of allocated frames cannot exceed the number of available frames, and each process must be allocated at least some minimum. One reason for the minimum is performance: more page faults slow a process down. Another is that if a page fault occurs before an instruction completes, the instruction must be re-executed, so a process needs enough frames to hold every page a single instruction can touch.
The minimum number of frames per process is determined by the architecture, while the maximum is determined by the amount of available physical memory.
1. Frame allocation algorithms
1) Equal allocation: every process gets the same number of frames
2) Proportional allocation according to process size (see the sketch after this list)
3) Allocation according to process priority
4) A combination of size and priority
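For item 2, proportional allocation is commonly expressed as a_i = s_i / S × m, where s_i is the size of process i, S the total size of all processes, and m the total number of frames. A sketch:

#include <stdio.h>

/* Proportional allocation: process i gets a_i = s_i / S * m frames. */
void allocate_frames(const int *size, int nprocs, int m, int *alloc) {
    long total = 0;
    for (int i = 0; i < nprocs; i++) total += size[i];
    for (int i = 0; i < nprocs; i++)
        alloc[i] = (int)((long)size[i] * m / total); /* truncates; leftover
                                                        frames are handed out separately */
}

int main(void) {
    int sizes[] = {10, 127};   /* e.g., a 10-page and a 127-page process */
    int alloc[2];
    allocate_frames(sizes, 2, 62, alloc);
    printf("%d %d\n", alloc[0], alloc[1]);  /* 4 and 57 with 62 frames */
    return 0;
}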
\
2. Global and local allocation
Global replacement allows a process to select a replacement frame from the set of all frames, even if that frame is currently allocated to another process; that is, one process can take frames from another, for example a high-priority process taking frames from low-priority ones. Local replacement requires each process to select only from its own allocated frames.
Global replacement generally gives better throughput and is the more commonly used method.
\
Six, thrashing
If a process does not have the frames it needs, it quickly page-faults and must replace some page. But if all its pages are in active use, a replaced page is needed again immediately and must be paged back in. Page faults occur again and again; this high paging activity is called thrashing.
Thrashing causes serious performance problems. Suppose the operating system monitors CPU utilization and introduces new processes when utilization is too low, using a global replacement algorithm that takes frames without regard to which process they belong to. Now suppose a process needs more frames and starts faulting, stealing frames from other processes. The robbed processes fault in turn and are removed from the ready queue, so CPU utilization drops. The CPU scheduler sees the low utilization and admits even more processes in an attempt to drive it up. More frames are stolen, the paging-device queue grows longer, utilization drops further, and the scheduler tries harder still to load more processes…
Eventually the processes are so busy paging that the system gets no work done.
Using local replacement can limit thrashing, but it does not completely solve the problem.
1. Working-set model
To prevent thrashing, a process must be given enough frames from the start. The operating system tracks each process's working set and allocates at least that many frames to it; if frames are left over, another process can be started. If the sum of the working-set sizes exceeds the total number of available frames, some process is suspended and its frames are reallocated to others; suspended processes are restarted later. This is the working-set model. The difficulty lies in keeping track of the working set.
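In the usual formulation, with Δ the working-set window (the most recent Δ page references), WSS_i the working-set size of process i, and m the total number of available frames, the total demand is:

D = Σ WSS_i

If D > m, thrashing will occur, and some process should be suspended.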
2. Page-fault frequency strategy
Besides the working-set model, another scheme to prevent thrashing is the page-fault frequency (PFF) policy.
If a process has too many page faults, it needs more frames: give it some! If its page-fault rate is too low, it has frames to spare: give some to others. Set upper and lower bounds on the page-fault rate for each process and adjust frame allocations accordingly.
As with the working-set model, if frames are needed and none are available to assign, the process should be suspended and its frames released to other processes whose page-fault rates are also high.
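A rough sketch of the control decision, with hypothetical bounds and bookkeeping (not a real kernel API):

/* Hypothetical page-fault-frequency controller. */
#define PFF_UPPER 0.10  /* faults per reference: too many -> needs frames */
#define PFF_LOWER 0.01  /* too few -> has frames to spare */

void pff_adjust(double fault_rate, int *frames, int *free_frames) {
    if (fault_rate > PFF_UPPER) {
        if (*free_frames > 0) { (*frames)++; (*free_frames)--; }
        /* else: suspend this process and release its frames */
    } else if (fault_rate < PFF_LOWER && *frames > 1) {
        (*frames)--; (*free_frames)++;
    }
}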
\
Seven, memory-mapped files
Typically, each access to a file requires a system call and a disk access, but there is another approach: use virtual memory techniques to treat file I/O as ordinary memory accesses. Accessing a file then looks just like accessing memory.
1. Basic mechanism
Disk blocks are mapped to memory pages (one or more pages). Initial accesses fault, and the page faults read the file contents into physical memory. After that, reading and writing the file is just memory access, which simplifies things: files are manipulated through memory instead of through the read() and write() system calls.
Note that writes to the file may not reach the disk immediately; data is written back when a dirty page is replaced, when the operating system checks periodically, or when the file is closed.
If a file is shared by multiple processes, it can be mapped into each of their virtual address spaces to allow data sharing. A modification made by one process in virtual memory is then visible to the others. Alternatively, the mapping may be copy-on-write, so each process modifies only its own copy. Mutual exclusion may also be required.
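On POSIX systems this mechanism is exposed through mmap(); a minimal sketch (the file name data.bin is hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDWR);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);
    /* Map the whole file; page faults pull its blocks into memory. */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    p[0] = 'X';                     /* a plain store, not a write() call */
    msync(p, st.st_size, MS_SYNC);  /* force dirty pages back to disk */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}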
2. Win32 API shared memory
To bring a file on disk into a process's virtual address space, an area is created in that address space to "hold" the file; this area is called a File View. The system also creates a File Mapping Object (kept in physical memory) to maintain the mapping. When multiple processes need to read and write data in that file, their File Views all correspond to the same File Mapping Object. Doing so saves memory, keeps the data synchronized, and achieves data sharing.
3. Memory-mapped I/O
By mapping an I/O device into memory, reading and writing that memory reads and writes the I/O device, with no need to manipulate the device directly. For example, if each point on the screen corresponds to a memory address, a program can control the screen display simply by writing to memory.
\
Eight, kernel memory allocation
When a user-mode process needs additional memory, pages can be taken from the list of free page frames maintained by the kernel, and those frames are typically scattered across physical memory. Kernel memory, however, is usually allocated from a separate free-memory pool, for two main reasons:
1) The kernel needs to allocate memory for data structures of varying sizes, so it must use memory sparingly and minimize fragmentation waste. Indeed, many operating systems exempt kernel code and data from the paging system.
2) Some hardware must deal directly with physical memory, rather than going through the virtual memory interface, and therefore requires memory that resides in physically contiguous pages.
The kernel manages memory in two main ways:
1. Buddy system
Memory is allocated from a physically contiguous, fixed-size segment in power-of-2 units, such as 4 KB, 8 KB, and so on. The advantage is that adjacent buddies can be merged quickly into larger segments, but rounding requests up to a power of 2 easily produces fragmentation.
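A sketch of the rounding step, assuming a hypothetical 4 KB minimum block; the gap between the request and the rounded size is the fragmentation mentioned above:

#include <stddef.h>
#include <stdio.h>

/* Round a request up to the buddy system's power-of-2 block size. */
size_t buddy_block(size_t request, size_t min_block) {
    size_t size = min_block;
    while (size < request) size <<= 1;  /* 4K -> 8K -> 16K -> ... */
    return size;
}

int main(void) {
    printf("%zu\n", buddy_block(5000, 4096));  /* 8192: ~3 KB wasted */
    return 0;
}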
\
2. Slab allocation
Memory blocks are pre-allocated according to the sizes required by the kernel's object data structures, and wait there to be used.
Specifically, each kind of kernel object gets a cache, and a cache consists of one or more slabs of memory. A slab can be in one of three states: full, empty, or partial. An allocation is satisfied first from a partial slab, then from an empty one; if neither is available, a new slab is allocated from physically contiguous pages.
Advantages:
1) Block sizes match the kernel objects' requirements exactly, so there is no fragmentation
2) Objects are prepared in advance, so requests can be satisfied quickly
\
Nine, other considerations
1. Prepaging
A notable feature of pure demand paging is the large number of page faults that occur when a process starts. The prepaging strategy loads all of the pages that will be needed into memory at once. The key question is whether the cost of prepaging is less than the cost of servicing the corresponding page faults.
\
2. Page size
Large pages or small pages, that is the question.
1) Large pages help keep the page table small
2) Small pages reduce fragmentation and make better use of memory
3) For I/O, transferring one large page is efficient because seek and latency costs are amortized; but small pages improve locality, and since only the data actually needed is transferred, total I/O may decrease. So, should we use small pages?
4) However, large pages can reduce the number of page faults
…
So, dear reader, you tell me: should we use the big pages or the small ones?
\
3. TLB reach
The TLB speeds up memory access. Without a TLB, each access would require two memory accesses: one to fetch the physical address from the page table, and one to fetch the data.
The TLB holds only a small subset of the page-table entries. During logical address translation, the TLB is searched first; on a hit, the physical address is immediately available. On a miss, a replacement algorithm brings the relevant entry into the TLB, and then the physical address is obtained.
So improving the TLB hit ratio is critical.
One way to raise the TLB hit ratio is to increase the number of TLB entries, but this comes at a cost: the associative memory used to build TLBs is expensive and power-hungry. Another approach is to increase the page size or to support multiple page sizes.
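The quantity at stake here is usually called the TLB reach, the amount of memory addressable through the TLB:

TLB reach = (number of TLB entries) × (page size)

Raising either factor raises the reach, which is exactly what the two approaches above do.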
\
4. Inverted page table
An inverted page table saves memory, but it no longer holds complete information about a process's logical address space: when a page referenced by the process is not in memory, an external page table is still needed to locate it. Fortunately, the external table is consulted only when a page fault occurs, so it can itself be paged in and out and need not be kept complete in memory.
\
5. Program structure
When we write programs, we usually don't think about memory. But sometimes a little knowledge of memory behavior can improve system performance.
For example, given a 128×128 two-dimensional array whose data is stored row by row, how can it be traversed with the best performance?
int i, j;
int data[128][128];
Suppose the outer loop runs over the columns:
for (int j = 0; j < 128; j++)
    for (int i = 0; i < 128; i++)
        data[i][j] = 0;
If a page is exactly 128 words in size, each row occupies one page, so every iteration of the inner loop touches a different page, each time just to change one number. If the number of frames allocated to the process is less than 128, a total of 128 * 128 = 16,384 page faults is generated!
But suppose you write it like this:
for (int i = 0; i < 128; i++)
    for (int j = 0; j < 128; j++)
        data[i][j] = 0;
\
Every number on a page is modified before moving on to the next page, so only 128 page faults are generated in total.
\
6. I/O interlock
I/O interlock allows pages to be locked in memory.
Consider a global replacement algorithm: a process issues an I/O request, joins the wait queue of the I/O device, and hands the CPU over to other processes. Those processes page-fault and, under global replacement, swap out the waiting process's I/O buffer pages. Later, the I/O device performs the transfer to the specified address, but the frame is now being used by different pages of other processes.
There are two common solutions to this problem:
1) Never do I/O directly to user memory; instead, copy the data between user memory and system memory and do I/O only on system memory. But the extra copying is expensive.
2) Give physical frames a lock bit that allows pages to be locked in memory. A locked frame cannot be selected for replacement; when the I/O completes, the page is unlocked.
The lock bit is useful elsewhere too: operating system kernel pages are usually locked, and a newly loaded page of a low-priority process can stay locked until the process has run at least once, so the page cannot be replaced before it is ever used.
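As a rough sketch of the second solution, a frame descriptor might carry a lock (pin) bit that the replacement scan skips; all names here are hypothetical:

/* Hypothetical frame descriptor with a lock (pin) bit. */
struct frame {
    int page;    /* which virtual page occupies this frame */
    int locked;  /* 1 while the frame is pinned for I/O */
};

/* The replacement scan simply skips pinned frames. */
int find_victim(struct frame *frames, int n, int start) {
    for (int i = 0; i < n; i++) {
        int f = (start + i) % n;
        if (!frames[f].locked)
            return f;
    }
    return -1; /* all frames are pinned */
}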