Series of articles

Virtual Memory in Computer Systems – Part 1 (Fundamentals)

Virtual Memory in Action (Continued)

Simplified memory management


The virtual memory mechanism makes memory convenient for processes to use by giving each of them a consistent, private address space. On a 32-bit system, every process gets its own 4GB of virtual memory, so even on a machine with only a few hundred megabytes of physical memory, each process behaves as if it had 4GB to itself. As shown in the figure, each process maintains its own page table and performs address translation independently, and multiple processes can map the same physical page, which is the basis for shared libraries and shared memory.

The combination of demand paging and separate address spaces has a profound impact on how memory is used and managed in a system. In particular, VM simplifies linking and loading, code and data sharing, and how applications allocate memory.

  • Simplified linking. Separate address spaces allow each process’s memory image to use the same basic layout, such as the ELF format used by object files on Linux, regardless of where the code and data actually reside in physical memory. On Linux, as shown below, every process uses a similar memory layout.

  • Simplified loading. Virtual memory simplifies loading executable files and shared objects into memory. The loader allocates virtual pages for the object file’s code and data and builds page table entries that point to those pages, and the virtual memory system does the rest: when the process first touches the object, page fault handling brings the corresponding pages into memory, completing the load on demand.

  • Simplified sharing. Virtual memory simplifies sharing between user processes and the operating system. The same region of every process’s address space maps the kernel’s code and data, and processes share dynamic libraries by pointing virtual pages at the same physical pages, such as the libc library that every process links against. This mechanism not only saves a great deal of memory, but also makes sharing code and data between processes straightforward.

  • Simplified memory allocation. Virtual memory gives user processes a simple mechanism for allocating additional memory. When a process needs more heap space, it asks for it via the malloc function, and the OS allocates an appropriate number of contiguous virtual pages and maps them to physical pages, which need not be contiguous at all (see the sketch after this list).
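From the process’s point of view, this is a minimal C sketch of the idea: malloc hands back a contiguous range of virtual addresses, and the kernel backs those pages with physical frames lazily as they are first touched. The 16MB size is an arbitrary choice for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Ask the allocator for 16MB of heap space. The backing virtual
     * pages are mapped lazily; the physical frames need not be
     * contiguous, or even allocated yet. */
    size_t len = 16 * 1024 * 1024;
    char *buf = malloc(len);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }

    /* Touching the pages forces the kernel to back them with
     * physical frames (via page faults), one page at a time. */
    memset(buf, 0, len);

    printf("allocated and touched %zu bytes at %p\n", len, (void *)buf);
    free(buf);
    return 0;
}
```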

Memory protection


Virtual memory systems provide a natural way to protect memory. Every time the CPU accesses main memory through a virtual address, the address translation hardware reads a PTE, so extra permission bits can be added to the PTE to control access to the virtual page. In this example, three permission bits are added to each PTE. The SUP bit indicates whether the process must be running in kernel mode to access the page; a process running in user mode can only access pages whose SUP bit is NO. The READ and WRITE bits control read and write access to the page. For example, process j cannot write to page VP 0.

If an instruction violates these permissions, the CPU triggers a general protection fault, reported in Linux as a “segmentation fault”, to indicate that the process has accessed memory illegally.
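On Linux, these permission bits are visible to user code through the POSIX mprotect call. A short sketch, assuming a POSIX system: we map an anonymous page, then drop its write permission, after which any write to it would raise SIGSEGV.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);

    /* Map one page with read and write permission. */
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    p[0] = 42;                 /* fine: the page is writable */

    /* Drop the WRITE permission; the kernel updates the page's PTE. */
    if (mprotect(p, page, PROT_READ) != 0) {
        perror("mprotect");
        return 1;
    }

    printf("read still works: %d\n", p[0]);
    /* p[0] = 7; */            /* would now trigger a segmentation fault */

    munmap(p, page);
    return 0;
}
```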

Address translation


Basic principles


In a virtual memory system, address translation is performed by the MMU (memory management unit) inside the CPU, a purely hardware process. The following figure shows how the MMU implements address translation through the page table; the terms involved are listed below, followed by a C sketch of the translation.

  1. PTBR: the page table base register, which points to the start of the current page table.
  2. The n-bit virtual address is split into two parts: the VPN and the VPO. The VPN indexes a PTE in the page table, and the VPO is the offset within the virtual page.
  3. The m-bit physical address is likewise split into the PPN and the PPO. The PPN is the number of the physical page; if the virtual page is cached in physical memory, the PPN of the corresponding physical page is recorded in its PTE. Because virtual and physical pages are the same size, the PPO equals the VPO.
  4. The valid bit records whether the virtual page is currently cached in physical memory.
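To make the mechanics concrete, here is a minimal C sketch of single-level translation under these definitions. The pte_t layout and the flat page_table array are illustrative assumptions, not any real MMU’s format.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                      /* 4KB pages: VPO/PPO are 12 bits */

/* A simplified PTE: a valid bit plus the physical page number. */
typedef struct {
    bool     valid;
    uint64_t ppn;
} pte_t;

/* One flat page table, indexed by VPN; the PTBR would point at its start. */
extern pte_t page_table[];

/* Translate a virtual address, or return -1 to signal a page fault. */
int64_t translate(uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;                 /* upper bits index the PTE */
    uint64_t vpo = va & ((1ull << PAGE_SHIFT) - 1);  /* offset within the page */

    pte_t pte = page_table[vpn];                     /* the MMU's PTE fetch */
    if (!pte.valid)
        return -1;                                   /* page not resident */

    return (int64_t)((pte.ppn << PAGE_SHIFT) | vpo); /* PA = PPN:PPO, PPO == VPO */
}
```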

The following figure shows what the CPU hardware does on a page hit:

  • Step 1: The processor generates a virtual address VA and passes it to the MMU.
  • Step 2: The MMU generates the PTE address (PTEA) and requests the corresponding PTE from cache/main memory.
  • Step 3: Cache/main memory returns the PTE.
  • Step 4: The MMU constructs the physical address PA from the PTE and sends it to cache/main memory.
  • Step 5: Cache/main memory returns the requested data word to the processor.

Unlike a page hit, which is handled entirely by the MMU hardware, a page fault requires the MMU and the OS to cooperate, as shown in the figure below.

  • Steps 1 to 3: Same as steps 1 to 3 of the page hit process.
  • Step 4: The valid bit in the PTE is 0, so a page fault exception is triggered and the CPU transfers control to the page fault handler.
  • Step 5: The page fault handler selects a victim page in physical memory and, if that page has been modified, pages it out to disk.
  • Step 6: The page fault handler pages in the new page and updates the PTE in memory.
  • Step 7: The page fault handler returns, and the CPU re-executes the instruction that caused the fault. The MMU translates the address and fetches the PTE; since the virtual page is now resident in memory, this time the access hits and the CPU gets the data word (the handler’s role is sketched after this list).
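The OS’s side of this sequence can be outlined in C. This is purely an illustrative sketch, not any real kernel’s handler; select_victim, frame_is_dirty, write_back, invalidate_pte_for, and read_from_disk are assumed helpers standing in for real kernel machinery.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool valid; uint64_t ppn; } pte_t;
extern pte_t page_table[];                 /* indexed by VPN */

/* Assumed helpers; a real kernel provides these. */
extern uint64_t select_victim(void);
extern bool     frame_is_dirty(uint64_t frame);
extern void     write_back(uint64_t frame);
extern void     invalidate_pte_for(uint64_t frame);
extern void     read_from_disk(uint64_t vpn, uint64_t frame);

void page_fault_handler(uint64_t vpn) {
    /* Step 5: choose a victim frame; write it back only if modified. */
    uint64_t frame = select_victim();
    if (frame_is_dirty(frame))
        write_back(frame);
    invalidate_pte_for(frame);             /* the victim's page now lives on disk */

    /* Step 6: page in the requested virtual page and update its PTE. */
    read_from_disk(vpn, frame);
    page_table[vpn].ppn   = frame;
    page_table[vpn].valid = true;

    /* Step 7: on return, the CPU re-executes the faulting instruction,
     * which now finds a valid PTE and hits. */
}
```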

Speeding up – caching improves translation speed


The following figure shows how address translation interacts with a physically addressed cache. As it shows, address translation occurs before the cache lookup, and page table entries can be cached just like any other data words in physical memory.

Faster – using the TLB


Every time the CPU generates a virtual address, the MMU must look up a PTE to translate it into a physical address. In the worst case, that costs an extra trip to memory, tens to hundreds of clock cycles; if the PTE happens to be in the cache, the overhead drops to one or two cycles. To eliminate even this overhead, MMUs include a small dedicated cache for page table entries: the translation lookaside buffer (TLB).

The TLB is a small, virtually addressed cache in which each line holds a block of PTEs. The following figure shows how the components of a virtual address are used to access the TLB. The TLBI selects the TLB set, and the TLBT is matched against the lines in that set. If the TLB has T = 2^t sets, the TLBI consists of the t low-order bits of the VPN, and the remaining bits of the VPN form the TLBT.
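A small C sketch of this field extraction, assuming t = 4 (i.e. T = 16 sets) purely for illustration:

```c
#include <stdint.h>

#define PAGE_SHIFT   12   /* 4KB pages */
#define TLB_SET_BITS 4    /* assume T = 2^4 = 16 sets for this sketch */

/* Split a virtual address into the fields used to probe the TLB. */
void tlb_fields(uint64_t va, uint64_t *tlbi, uint64_t *tlbt) {
    uint64_t vpn = va >> PAGE_SHIFT;
    *tlbi = vpn & ((1ull << TLB_SET_BITS) - 1);  /* low t bits: set index */
    *tlbt = vpn >> TLB_SET_BITS;                 /* remaining bits: tag */
}
```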

The following figure shows the steps involved when a TLB hit occurs, which is quite fast because every step is performed automatically in hardware.

  • Step 1: The CPU generates a virtual address.
  • Steps 2 and 3: The MMU fetches the corresponding PTE from the TLB.
  • Step 4: The MMU translates the VA into a PA and sends it to cache/main memory.
  • Step 5: Cache/main memory returns the requested data word to the CPU.

If the TLB misses, the MMU must fetch the corresponding PTE from the cache or main memory. The new PTE is then cached in the TLB, possibly overwriting an existing entry.
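Putting hit and miss together, here is a minimal C sketch of a set-associative TLB lookup with a page-table fallback. The 16-set, 4-way geometry and the fill into way 0 are arbitrary simplifications; real TLBs use an LRU-like replacement policy.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT   12
#define TLB_SET_BITS 4
#define TLB_SETS     (1u << TLB_SET_BITS)
#define TLB_WAYS     4                     /* 4-way set associative (assumed) */

typedef struct { bool valid; uint64_t ppn; } pte_t;
typedef struct { bool valid; uint64_t tag; pte_t pte; } tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];
extern pte_t page_table[];                 /* the in-memory page table */

/* On a hit, return the cached PTE; on a miss, walk the page table and
 * fill the TLB, possibly overwriting an existing entry. */
pte_t tlb_lookup(uint64_t vpn) {
    uint64_t set = vpn & (TLB_SETS - 1);   /* TLBI */
    uint64_t tag = vpn >> TLB_SET_BITS;    /* TLBT */

    for (int way = 0; way < TLB_WAYS; way++)
        if (tlb[set][way].valid && tlb[set][way].tag == tag)
            return tlb[set][way].pte;      /* TLB hit */

    /* TLB miss: fetch the PTE from cache/main memory and cache it. */
    pte_t pte = page_table[vpn];
    tlb[set][0] = (tlb_entry_t){ .valid = true, .tag = tag, .pte = pte };
    return pte;
}
```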

Stronger – multilevel page tables


So far, we have assumed that the system uses a single page table for address translation. A little math shows that level-1 page tables alone would overwhelm memory. Assume a 32-bit system with 4KB pages and 4-byte PTEs: the page table needs 2^32 / 2^12 = 2^20 entries, for a total of 4MB. Note that the page table must stay resident in memory, and each process carries its own copy, so page tables alone would burst memory!

So the page table needs to be compressed, and the way to do that is to use hierarchical page tables. Below is an example of the organization of a two-level page table, whose specifics are as follows:

  1. The virtual address is 32 bits, the virtual page size is 4KB, and each PTE is 4 bytes.
  2. The level-1 page table has 1K PTEs, each responsible for mapping a 4MB chunk of the virtual address space; each level-1 PTE points to a level-2 page table.
  3. Each level-2 page table also has 1K PTEs, each pointing to a virtual page.

This approach reduces the memory burden in two ways:

  1. If a PTE in the level-1 page table is null, the corresponding level-2 page table need not exist at all. Since most of a typical 4GB virtual address space is unallocated, this saves a significant amount of memory.
  2. Only the level-1 page table must be resident in memory; the VM system can create, page in, and page out level-2 page tables on demand, so only the most heavily used level-2 tables stay resident, further reducing the pressure on main memory (a walk over this structure is sketched after this list).
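A minimal C sketch of a walk over this two-level structure, using the 10+10+12 bit split implied by the numbers above (1K entries per table, 4KB pages); the table layout is an illustrative assumption.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                      /* 4KB pages */
#define L2_BITS    10                      /* 1K PTEs per level-2 table */
#define L1_BITS    10                      /* 1K PTEs in the level-1 table */

typedef struct { bool valid; uint64_t ppn; } pte_t;

/* Level-1 entries point at level-2 tables; NULL means the whole 4MB
 * chunk is unallocated and no level-2 table exists for it. */
extern pte_t *l1_table[1 << L1_BITS];

/* Walk the two-level page table; -1 signals unmapped or not resident. */
int64_t translate2(uint64_t va) {
    uint64_t vpn1 = (va >> (PAGE_SHIFT + L2_BITS)) & ((1u << L1_BITS) - 1);
    uint64_t vpn2 = (va >> PAGE_SHIFT) & ((1u << L2_BITS) - 1);
    uint64_t vpo  = va & ((1u << PAGE_SHIFT) - 1);

    pte_t *l2_table = l1_table[vpn1];
    if (l2_table == NULL)
        return -1;                         /* no level-2 table: unallocated */

    pte_t pte = l2_table[vpn2];
    if (!pte.valid)
        return -1;                         /* page not resident: page fault */

    return (int64_t)((pte.ppn << PAGE_SHIFT) | vpo);
}
```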

The page table structure of the Core i7 CPU is shown below: a four-level structure. As you can see, the VPN of a virtual address is divided into four parts, VPN1 through VPN4. VPN1 through VPN3 index the first three levels of page tables, each of whose entries points to the next-level table, while VPN4 indexes the level-4 page table, whose PTE holds the physical page number.
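On the Core i7 the virtual address is 48 bits, with 9 VPN bits per level above a 12-bit page offset. A small sketch of extracting VPN1 through VPN4:

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages */
#define LEVEL_BITS 9    /* 9 VPN bits per level on the Core i7 (48-bit VA) */

/* Extract VPN1..VPN4 from a 48-bit virtual address. vpn[0] (VPN1) indexes
 * the top-level table; vpn[3] (VPN4) indexes the level-4 table, whose PTE
 * holds the physical page number. */
void split_vpn(uint64_t va, uint64_t vpn[4]) {
    for (int level = 0; level < 4; level++) {
        int shift = PAGE_SHIFT + (3 - level) * LEVEL_BITS;
        vpn[level] = (va >> shift) & ((1u << LEVEL_BITS) - 1);
    }
}
```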

Conclusion

That concludes this introduction to the main principles of virtual memory. As an abstraction of main memory, VM provides great convenience for memory management and use, letting processes use this scarce system resource simply and efficiently; it is fair to say that VM is an indispensable part of modern computer systems. Understanding it helps us better understand how computer systems operate, and to write more efficient system software on that basis. Enjoy it :)