Before the main text

It feels like I haven't written anything in ages; in the blink of an eye, the last article was posted on 1-19, three months ago. Sure enough, I'm close to abandoning Jianshu, but there are so many memories here that I don't want to lose them. I'll post some of my translated papers here from time to time.

Main text

~ I extracted the original text with CAJViewer and cleaned it up; if you need it, please send me a private message ~ or ask via my email: [email protected]. Below I only include the Chinese translation (most of the article went through Google Translate and was then roughly revised after a read-through; some parts may still be confusing, which basically means I didn't understand them…). There are also some notes in [] brackets that I added myself and that can be ignored.


Unikernel specializes a minimalistic LibOS together with its target application into a standalone single-purpose virtual machine (VM) running on a hypervisor, which is referred to as a (virtual) appliance. Compared to traditional VMs, Unikernel appliances have a smaller memory footprint and lower overhead while guaranteeing the same level of isolation. On the downside, Unikernel strips the process abstraction out of its monolithic appliance, thus sacrificing flexibility, efficiency, and applicability.

This paper explores how to strike the best balance between Unikernel appliances (strong isolation) and processes (high flexibility/efficiency). We propose KylinX, a dynamic library operating system that enables simplified and efficient cloud virtualization by providing the pVM (process-like VM) abstraction. A pVM takes the hypervisor as its OS and the Unikernel appliance as a process, allowing dynamic mapping at both the page and library levels.

  • At the page level, KylinX supports pVM fork and a set of APIs for inter-pVM communication (IpC).
  • At the library level, KylinX supports shared libraries that are linked to a Unikernel appliance at runtime.

KylinX enforces mapping restrictions against potential threats. KylinX can fork a pVM in about 1.3 milliseconds and link a library to a running pVM in a few milliseconds, both comparable to process fork on Linux (about 1 millisecond). The latency of KylinX IpC is also comparable to that of UNIX IPC.

1 Introduction

Commodity clouds such as EC2 [5] provide a public platform where tenants rent virtual machines (VMs) to run their applications. These cloud-based VMs are usually dedicated to specific online applications, such as big data analytics [24] and game servers [20], and are referred to as (virtual) appliances [56,64]. A highly specialized single-purpose appliance needs only a small fraction of what a traditional OS provides to run its accommodated application, whereas current general-purpose OSes contain extensive libraries and features aimed at multi-user, multi-application scenarios. The mismatch between the single-purpose usage of appliances and the general-purpose design of traditional OSes incurs performance and security penalties, making the deployment and orchestration of appliance-based services cumbersome [62,52], inefficient [56], and vulnerable to unnecessary libraries [27].

This problem has recently motivated the design of Unikernel [56], a library operating system (LibOS) architecture aimed at efficient and secure appliances in the cloud. Unikernel refactors a traditional OS into libraries and seals the application binary together with the required libraries into a specialized application image that can run directly on a hypervisor such as Xen [30] or KVM [22]. By removing unused code, Unikernel appliances achieve, compared to traditional VMs:

  • A smaller memory footprint
  • Shorter boot times
  • Lower overhead
  • The same level of isolation

The stable interface of the hypervisor avoids the hardware compatibility problems encountered by earlier LibOSes [39]. [Note: see the early LibOS work.]

On the downside, Unikernel strips the process abstraction out of its statically-sealed monolithic appliance, thus sacrificing flexibility, efficiency, and applicability. For example, Unikernel does not support dynamic fork, which is the basis of the multi-process abstraction commonly used by traditional UNIX applications; and its compile-time immutability rules out runtime management such as online library updates and address space randomization. These drawbacks greatly reduce Unikernel's applicability and performance.

【In a multitasking operating system, a running process needs a way to create new processes, for example to run other applications. fork and its variants are usually the only way to do so in UNIX-like systems. If a process needs to start another program's executable, it first forks to create a copy of itself. This copy, the "child process", then invokes the exec system call to overlay itself with the other program: it stops running its previous program and executes the new one.

fork creates a separate address space for the child process. The child has an exact copy of all memory segments of the parent. In modern UNIX variants that follow the virtual memory model of SunOS-4.0, physical memory does not actually need to be copied thanks to copy-on-write semantics: the virtual memory pages of both processes may point to the same pages of physical memory until one of them writes to such a page. This optimization is important when fork is used in conjunction with exec to run a new program: typically, the child performs only a small set of actions before ceasing execution of its program in favour of the new one, and it requires very few of its parent's data structures.

When a process calls fork, it is considered the parent process and the newly created process is its child. After the fork, both processes run the same program and resume execution as if the fork system call had just returned. They can then inspect the call's return value to determine whether they are the parent or the child, and act accordingly.

The fork system call has been around since the first version of Unix [1] and was borrowed from the earlier GENIE time-sharing system [2]. fork is part of the POSIX standard.】
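As a concrete illustration of the fork-then-exec pattern described in the note above, here is a minimal, standard POSIX C example (not taken from the paper; the program being exec'ed, ls, is arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();              /* duplicate the calling process */
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        /* child: replace its image with another program */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");            /* only reached if exec fails */
        _exit(127);
    }
    /* parent: wait for the child to finish */
    int status;
    waitpid(pid, &status, 0);
    printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}
```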

In this paper, we examine whether there is a sweet spot that combines the best of Unikernel appliances (strong isolation) and processes (high flexibility/efficiency). Drawing an analogy between appliances on a hypervisor and processes on a traditional OS, we take one step beyond static Unikernels and present KylinX, a dynamic library operating system that enables simplified and efficient cloud virtualization by providing the process-like VM (pVM) abstraction. We treat the hypervisor as an OS and the Unikernel appliance as a process, allowing dynamic mapping for the pVM at both the page and library levels.

A hypervisor, also known as a virtual machine monitor (VMM), is software, firmware, or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called a host machine, and the virtual machines are called guest machines. The hypervisor presents a virtual operating platform to the guest operating systems and manages their execution, letting multiple guest OSes share the virtualized hardware resources.

  • At the page level, KylinX supports pVM fork and a set of APIs for inter-pVM communication (IpC), which are compatible with traditional UNIX inter-process communication (IPC). The security of IpC is guaranteed by only allowing communication between mutually-trusted pVMs forked from the same root pVM.
  • At the library level, KylinX supports shared libraries that are dynamically linked to a Unikernel appliance, enabling a pVM to (i) perform online library updates, replacing old libraries with new ones at runtime, and (ii) recycle in-memory domains for fast startup. We analyze the potential threats introduced by dynamic mapping and enforce corresponding restrictions.

We implemented a prototype of KylinX on Xen [30] (a type-1 hypervisor) by modifying MiniOS [14] (a Unikernel LibOS written in C) and Xen's toolstack. KylinX can fork a pVM in about 1.3 milliseconds and link a library to a running pVM in a few milliseconds, both comparable to process fork on Linux (about 1 millisecond). KylinX IpC latency is also comparable to that of UNIX IPC. Evaluations of real-world applications, including a Redis server [13] and a web server [11], show that KylinX achieves higher applicability and performance than static Unikernels while retaining their isolation guarantees.

The rest of this article is organized as follows. Section 2 covers the background and design choices. Section 3 describes the design of the dynamically-customizable KylinX LibOS together with its security restrictions. Section 4 reports the evaluation results of the KylinX prototype. Section 5 discusses related work. Section 6 concludes the paper and discusses future work.

2 Background

2.1 VM, Container, and Picoprocesses

There are several traditional models in the virtualization and isolation literature: OS processes, Jails, and VMs.

  1. OS processes. The process model targets a traditional (partially-trusted) OS environment and provides a rich ABI (application binary interface) and interactivity that make it unsuitable for hosting untrusted tenants.
  2. FreeBSD Jails [47]. The Jail model provides a lightweight mechanism to separate applications and their associated policies. It runs processes on a traditional OS but restricts several system-call interfaces to reduce vulnerabilities.
  3. VMs. The VM model builds an isolation boundary that matches the hardware. It offers the compatibility of running a full OS in a guest VM, but pays the cost of duplicated and redundant OS components.

Virtual machines (Figure 1, left) have been widely used in multi-tenant clouds because they guarantee strong (hypervisor-enforced) isolation [55]. However, the current virtualization architecture of VMs, which stacks hypervisors, VMs, OS kernels, processes, language runtimes (such as glibc [16] and the JVM [21]), libraries, and applications, is too complex to satisfy the efficiency requirements of commodity clouds.

Containers (such as LXC [9] and Docker [15]) use kernel features to package and isolate processes. They have recently been in high demand [25,7,6] because they are lightweight compared to VMs. However, containers provide weaker isolation than VMs, so they often run inside VMs for proper security [58].

[Note: an introduction to the implementation principles of Docker containers and the pitfalls of their isolation: http://dockone.io/article/8148]

Picoprocesses [38] (Figure 1, middle) can be viewed as containers with stronger isolation but a lighter weight. They use a small interface between the host OS and the guest to implement a LibOS realizing the host ABI, mapping the high-level guest API onto the small interface. Picoprocesses are particularly suitable for client-side software delivery, which needs to run on a variety of host hardware and OS combinations [38]. They can also run on hypervisors [62,32].

Recent studies on picoprocesses [67,32,54] relax the original static isolation model by allowing dynamics. For example, Graphene [67] supports picoprocess fork and multi-picoprocess APIs, and Bascule [32] allows OS-independent extensions to be attached to a picoprocess at runtime. Although these relaxations dilute the strict isolation model, they effectively extend the applicability of picoprocesses to a much broader range of applications.

2.2 Unikernel Appliances

Process-based virtualization and isolation techniques face challenges from the extensive kernel system-call API used to interact with the host OS (for process/thread management, IPC, networking, and so on). The number of Linux system calls has reached almost 400 [3] and keeps growing, and the system-call API is much harder to secure than the VM's ABI, which can leverage hardware memory isolation and CPU rings [58].

More recently, researchers have proposed reducing VMs, rather than augmenting processes, to achieve secure and efficient cloud virtualization. Unikernel [56] targets single-application VM appliances [26] and applies an Exokernel-style [39] LibOS to VM guests to gain performance benefits while preserving the strong isolation guarantee of type-1 hypervisors. It breaks with the traditional general-purpose virtualization architecture (Figure 1, left) and implements OS functionality (for example, device drivers and networking) as libraries. Unlike other hypervisor-based reduced VMs (such as Tiny Core Linux [19] and OSv [49]), Unikernel seals only the application and its required libraries into the image.

[Note: Exokernel is an operating system kernel developed by the MIT Parallel and Distributed Operating Systems group [1], and also a class of similar operating systems.

Operating systems generally present hardware resources to applications through high-level abstractions such as (virtual) file systems. The idea behind exokernels is to force as few abstractions as possible on application developers, enabling them to make as many decisions as possible about hardware abstractions.[2] Exokernels are tiny, since functionality is limited to ensuring protection and multiplexing of resources, which is considerably simpler than conventional microkernels’ implementation of message passing and monolithic kernels’ implementation of high-level abstractions.

Implemented applications are called library operating systems; they may request specific memory addresses, disk blocks, etc. The kernel only ensures that the requested resource is free, and the application is allowed to access it. This low-level hardware access allows the programmer to implement custom abstractions, and omit unnecessary ones, most commonly to improve a program’s performance. It also allows programmers to choose what level of abstraction they want, high, or low.

Exokernels can be seen as an application of the end-to-end principle to operating systems, in that they do not force an application program to layer its abstractions on top of other abstractions that were designed with different requirements in mind. For example, in the MIT Exokernel project, the Cheetah web server stores preformatted Internet Protocol packets on the disk; the kernel provides safe access to the disk by preventing unauthorized reading and writing, but how the disk is abstracted is up to the application or the libraries the application uses. https://en.wikipedia.org/wiki/Exokernel]

[Type 1: native or bare-metal hypervisor; Type 2: hosted hypervisor]

Since the hypervisor already provides many of the management functions of a traditional OS (such as isolation and scheduling), Unikernel adopts a minimalism philosophy [36]: it minimizes the VM not only by removing unnecessary libraries but also by stripping duplicated management functionality out of its LibOS. For example, Mirage [57] follows the multikernel model [31] and leverages the hypervisor for multicore scheduling so that its single-threaded runtime can achieve fast sequential performance; MiniOS [14] relies on the hypervisor (rather than an in-LibOS linker) to load/link the appliance at boot time; and LightVM [58] achieves fast VM boot by redesigning Xen's control plane.

2.3 Motivation and design choices

Unikernel appliances and traditional UNIX processes both serve as units of isolation, privilege, and execution state, and both rely on management functions such as memory mapping, execution cooperation, and scheduling. To achieve a low memory footprint and a small trusted computing base (TCB), Unikernel strips the process abstraction out of its monolithic appliance and links a minimalistic LibOS against the target application, demonstrating the benefit of relying on the hypervisor to eliminate duplicated functionality. On the downside, its monolithicity and compile-time determinism greatly reduce Unikernel's flexibility, efficiency, and applicability.

TCB stands for Trusted Computing Base: the totality of protection mechanisms within a computer, including hardware, firmware, and software, whose combination is responsible for enforcing the security policy. It establishes a basic protection environment and provides the additional user services required of a trusted computer system.

As shown in Figure 1 (right), KylinX provides the pVM abstraction by explicitly treating the hypervisor as an OS and the Unikernel appliance as a process. KylinX relaxes Unikernel's compile-time monolithicity requirement and allows page-level and library-level dynamic mapping, so that a pVM can combine the best of Unikernel appliances and UNIX processes. As shown in Table 1, KylinX can be viewed as an extension of Unikernel (providing the pVM abstraction), analogous to the extensions Graphene [67] (providing traditional multi-process compatibility) and Bascule [32] (providing runtime extensibility) make to picoprocesses.

We implement KylinX's dynamic mapping extensions in the hypervisor rather than in the guest LibOS for the following reasons.

  1. First, extensions outside the guest LibOS allow the hypervisor to enforce the mapping restrictions (Sections 3.2.3 and 3.3.4), improving security.

  2. Second, the hypervisor provides more flexibility for dynamic management, for example, restoring a library's runtime state during an online library update of a pVM (Section 3.3.2).

  3. Third, KylinX naturally follows Unikernel's minimalism philosophy (Section 2.2), leveraging the hypervisor to eliminate duplicated guest LibOS functionality.

Backward compatibility is another trade-off. The original Mirage Unikernel [56] takes an extreme position: existing applications and libraries have to be completely rewritten in OCaml [10] for type safety, which requires considerable engineering effort and may introduce new vulnerabilities. In contrast, KylinX is designed to support source-code (mostly C) compatibility, so a wide variety of legacy applications can run on KylinX with minimal adaptation.

Threat model. KylinX assumes a traditional threat model [56,49], the same setting as Unikernel [56], where a VM/pVM runs on the hypervisor and is expected to provide network-facing services in a public multi-tenant cloud. We assume that attackers can run untrusted code in their own VMs/pVMs, and that an application running in a VM/pVM may be threatened both by others in the same cloud and by malicious hosts connected via the Internet. KylinX treats the hypervisor (with its toolstack) and the control domain (Dom0) as part of the TCB and leverages the hypervisor to isolate attacks from other tenants. Security protocols such as SSL and SSH help a KylinX pVM trust external entities.

【www.ibm.com/developerwo…

——————————————————————————

Dom0 is the initial domain started by the Xen hypervisor on boot. Dom0 is an abbreviation of "Domain 0" (sometimes written as "domain zero" or the "host domain"). Dom0 is a privileged domain that starts first and manages the DomU unprivileged domains. The Xen hypervisor is not usable without Dom0.

A DomU is the counterpart to Dom0; it is an unprivileged domain with (by default) no access to the hardware. It must run a FrontendDriver for multiplexed hardware it wishes to share with other domains. A DomU is started by running…】

Recent hardware advances such as Intel Software Guard Extensions (SGX) [12] show the feasibility of shielded execution, protecting a VM/pVM from the privileged hypervisor and Dom0 [33,28,45]; we will investigate this in future work. We also assume that hardware devices are not compromised, although hardware-level threats have been identified [34].

3 KylinX Design

3.1 Overview

KylinX extends Unikernel to provide desirable features previously available only to processes. Instead of designing a new LibOS from scratch, we build KylinX on MiniOS [27], a Unikernel LibOS written in C. MiniOS runs as a user VM domain (domU) on the Xen hypervisor and uses front-end drivers to access hardware, connecting to the corresponding back-end drivers in the privileged Dom0 or in a dedicated driver domain. MiniOS has a single address space with no kernel/user space separation, and a simple non-preemptive scheduler. MiniOS is tiny but lends itself to a clean and efficient LibOS design on Xen; for example, Erlang on Xen [1], LuaJIT [2], ClickOS [59], and LightVM [58] use MiniOS to provide Erlang, Lua, Click, and fast-boot environments, respectively. [Note: a basic explanation of Xen virtualization: blog.51cto.com/wzlinux/172… ; Mini-OS: wiki.xenproject.org/wiki/Mini-O…]

As shown in Figure 2, the MiniOS-based KylinX design includes (i) a (restricted) dynamic page/library mapping extension to the Xen toolstack in Dom0, and (ii) process-abstraction support (including dynamic pVM fork/IpC and runtime pVM library linking) in DomU.

3.2 Dynamic Page Mapping

KylinX supports pVM fork and communication by leveraging Xen's shared memory and grant tables to perform cross-domain page mapping.

3.2.1 pVM Fork

The fork API is the basis for implementing the traditional multi-process abstraction for pVMs. KylinX treats each user domain (pVM) as a process, and spawns a new pVM when the application calls fork().

We leverage Xen's memory sharing mechanism to fork a child pVM by (i) duplicating the xc_dom_image structure of the parent and (ii) calling Xen's unpause() API to boot the child pVM and return its domain ID to the parent.

As shown in Figure 3, when fork() is called in the parent pVM, we use inline assembly to capture the current state of the CPU registers and pass them to the child. The control domain (Dom0) is responsible for forking and starting the child pVM. We modified libxc to keep the xc_dom_image structure in memory when the parent pVM is created, so that when fork() is called the structure can be mapped directly into the virtual address space of the child pVM, which the parent then shares with the child via the grant table; writable data is shared in copy-on-write (CoW) mode. [Copy-on-write (CoW), sometimes referred to as implicit sharing [1] or shadowing [2], is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources [3]. If a resource is duplicated but not modified, it is not necessary to create a new resource; the resource can be shared between the copy and the original. Modifications must still create a copy, hence the technique: the copy operation is deferred to the first write. By sharing resources in this way, it is possible to significantly reduce the resource consumption of unmodified copies, while adding a small overhead to resource-modifying operations. en.wikipedia.org/wiki/Copy-o…]

After the child pVM is started with unpause(), it (i) accepts the shared pages from its parent, (ii) restores the CPU registers and jumps to the instruction following fork(), and (iii) begins running as the child. After fork(), KylinX asynchronously initializes an event channel and shares dedicated pages between the parent and child pVMs to enable their IpC, as described in the next section.
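To make the flow above easier to follow, here is a self-contained toy sketch in C. Apart from the semantics of fork(), unpause(), and the xc_dom_image structure mentioned in the text, every name and stub below is hypothetical and merely stands in for the real Dom0/libxc operations:

```c
/* Hypothetical, self-contained sketch of the pVM fork flow described above.
 * The stubs stand in for hypercalls and libxc operations; they are not the
 * real KylinX/libxc code. */
#include <stdio.h>
#include <stdint.h>

struct cpu_regs     { uint64_t ip, sp, bp; };   /* registers captured at fork() */
struct xc_dom_image { const char *name; };      /* parent image kept in memory  */

static int  next_domid = 2;
static int  create_empty_domain(void)           { return next_domid++; }
static void share_pages_cow(int child, struct xc_dom_image *img)
                { printf("map %s into dom%d (copy-on-write)\n", img->name, child); }
static void pass_registers(int child, struct cpu_regs *r)
                { printf("dom%d resumes at ip=%#llx\n", child, (unsigned long long)r->ip); }
static void xen_unpause(int child)              { printf("unpause dom%d\n", child); }

/* fork() as seen from the parent pVM: returns the child's domain ID */
static int pvm_fork(struct xc_dom_image *parent_img, struct cpu_regs *regs)
{
    int child = create_empty_domain();      /* Dom0 creates the child domain       */
    share_pages_cow(child, parent_img);     /* share writable data copy-on-write   */
    pass_registers(child, regs);            /* child continues right after fork()  */
    xen_unpause(child);                     /* boot the child via Xen's unpause()  */
    return child;                           /* parent receives the child's ID      */
}

int main(void)
{
    struct xc_dom_image img  = { "redis-pvm" };
    struct cpu_regs     regs = { 0x400123, 0x7ffdead0000, 0 };
    printf("child domid = %d\n", pvm_fork(&img, &regs));
    return 0;
}
```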

3.2.2 Inter-PVM Communication (IpC)

KylinX supports multi-process (multi-pVM) applications in which all processes (pVMs) run cooperatively on the OS (hypervisor). Currently KylinX follows a strict isolation model [67] in which only mutually-trusted pVMs can communicate with each other, as detailed in Section 3.2.3.

Two communicating pVMs use an event channel and shared pages for inter-pVM communication. If two mutually-trusted pVMs have not yet initialized an event channel when they communicate for the first time (because they do not have a parent/child relationship via fork(), Section 3.2.1), KylinX will:

  • verify that the two pVMs belong to the same family (Section 3.2.3),
  • initialize the event channel, and
  • share dedicated pages between them.

Event channels are used for notification, and the shared pages are used to carry the data. KylinX implements the following four types of inter-pVM communication APIs (listed in Table 2).

  • (1) pipe(fd) creates a pipe and returns two file descriptors (fd[0] and fd[1]), one for writing and one for reading.
  • (2) kill(domid, sig) sends a signal (sig) to another pVM (domid) by writing sig to the shared page and notifying the target pVM (domid) to read the signal from that page; exit and wait are implemented on top of kill.
  • (3) ftok(path, projid) converts a path and a project ID (projid) into an IpC key, which is used by msgget(key, msgflg) to create a message queue with the given flag (msgflg) and return the queue ID (msgid); msgsnd(msgid, msg, len) and msgrcv(msgid, msg, len) write/read a msgbuf structure (msg) of length len to/from the queue (msgid).
  • (4) shmget(key, size, shmflg) creates a shared memory region with the given key, size, and flag (shmflg) and returns the shared memory region ID (shmid); the region can be attached and detached with shmat(shmid, shmaddr, shmflg) and shmdt(shmaddr).
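Since these APIs are stated to be compatible with traditional UNIX IPC, the following usage sketch simply uses the standard SysV/UNIX forms (error handling omitted); the KylinX variants listed in Table 2 take slightly reduced argument lists, so treat this only as an approximation of how an application would exercise them:

```c
/* Usage sketch of the message-queue and shared-memory IpC families; the
 * standard SysV calls are used here as stand-ins for KylinX's variants. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/shm.h>

struct msgbuf_ex { long mtype; char mtext[64]; };

int main(void)
{
    /* message queue: ftok -> msgget -> msgsnd/msgrcv */
    key_t key   = ftok("/tmp", 42);
    int   msgid = msgget(key, IPC_CREAT | 0600);
    struct msgbuf_ex m = { .mtype = 1 };
    strcpy(m.mtext, "hello from the parent pVM");
    msgsnd(msgid, &m, sizeof m.mtext, 0);
    msgrcv(msgid, &m, sizeof m.mtext, 1, 0);
    printf("received: %s\n", m.mtext);

    /* shared memory region: shmget -> shmat -> shmdt */
    int   shmid  = shmget(key, 4096, IPC_CREAT | 0600);
    char *region = shmat(shmid, NULL, 0);
    strcpy(region, "state shared between trusted pVMs");
    shmdt(region);
    return 0;
}
```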

3.2.3 Restrictions on dynamic Page mapping

During a dynamic pVM fork, the parent pVM shares pages only with its own (empty) child pVM, so the procedure introduces no new threats.

For IpC, KylinX guarantees security by only trusting pVMs forked from the same root pVM. For example, if pVM A forks pVM B, which in turn forks another pVM C, then the three pVMs A, B, and C belong to the same family. For simplicity, KylinX currently follows an all-or-nothing isolation model: only pVMs belonging to the same family are considered mutually trusted and are allowed to communicate with each other; KylinX rejects communication requests between untrusted pVMs.
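A toy illustration of this all-or-nothing family check (the structure and field names are invented for the example; the text only says that pVMs descend from a common root pVM):

```c
/* Toy model of the family-based trust check: each pVM records the root of
 * the fork tree it belongs to, and IpC is allowed only when the roots match. */
#include <stdbool.h>

struct pvm {
    int domid;        /* this pVM's domain ID                    */
    int root_domid;   /* domain ID of the root pVM of its family */
};

/* all-or-nothing policy: same family means trusted, otherwise reject */
static bool ipc_allowed(const struct pvm *a, const struct pvm *b)
{
    return a->root_domid == b->root_domid;
}
```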

3.3 Dynamic library mapping

3.3.1 pVM Library Links

Inherited from MiniOS, KylinX has a single flat virtual memory address space in which the application binary and libraries, the system libraries (for bootstrapping, memory allocation, etc.), and the data structures reside together. KylinX adds a dynamic segment to MiniOS's original memory layout to accommodate dynamic libraries after they are loaded.

As shown in Figure 2, we implement the dynamic library mapping mechanism in the Xen control library (libxc), which is used by the upper toolstack such as xm/xl/chaos. A pVM is actually a paravirtualized domU, which the toolstack builds through the following steps:

  • Create a domain,
  • Parse the kernel image file,
  • Initialize the boot memory,
  • Build the image in memory, and
  • Boot the domU from the built image.

In step 4 above, we extend libxc's original static linking procedure by adding a function (xc_dom_map_dyn()) that maps the shared libraries into the dynamic segment, as outlined below.

  1. First, KylinX reads the addresses, offsets, file sizes, and memory sizes of the shared libraries from the program header table of the appliance image.
  2. Second, it verifies that the restrictions are satisfied (Section 3.3.4). KylinX enforces restrictions on the identities of the libraries and their loaders; if the check fails, the procedure aborts.
  3. Third, for each dynamic library, KylinX retrieves the information of its dynamic sections, including the dynamic string table, the symbol table, and so on.
  4. Fourth, KylinX maps all required libraries in the entire dependency tree into the dynamic segment of the pVM, lazily relocating unresolved symbols to the correct virtual addresses when they are actually accessed.
  5. Finally, it jumps to the pVM's entry point.
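The paper names xc_dom_map_dyn() but does not show its code, so the following self-contained sketch only illustrates the ELF bookkeeping behind steps 1 and 3 on a 64-bit image that is already in memory; the restriction checks (step 2) and the actual mapping with lazy relocation (steps 4 and 5) are elided, and the simplifying assumption is made that addresses in the dynamic section can be reached directly from the image base:

```c
/* Illustrative only: not the real xc_dom_map_dyn(). It scans the program
 * header table of an appliance image resident at `base` and prints the
 * dynamic-segment layout and the DT_NEEDED libraries. */
#include <elf.h>
#include <stdint.h>
#include <stdio.h>

static void scan_appliance_image(const uint8_t *base)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);

    for (int i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type != PT_DYNAMIC)
            continue;

        /* step 1: address, offset, file size and memory size taken from
         * the program header table */
        printf("dynamic segment: vaddr=%#lx off=%#lx filesz=%#lx memsz=%#lx\n",
               (unsigned long)ph[i].p_vaddr, (unsigned long)ph[i].p_offset,
               (unsigned long)ph[i].p_filesz, (unsigned long)ph[i].p_memsz);

        /* step 3: walk the dynamic entries for the string table and the
         * required (DT_NEEDED) libraries of the dependency tree */
        const Elf64_Dyn *dyn = (const Elf64_Dyn *)(base + ph[i].p_offset);
        const char *strtab = NULL;
        for (const Elf64_Dyn *d = dyn; d->d_tag != DT_NULL; d++)
            if (d->d_tag == DT_STRTAB)
                strtab = (const char *)(base + d->d_un.d_ptr);
        for (const Elf64_Dyn *d = dyn; d->d_tag != DT_NULL; d++)
            if (d->d_tag == DT_NEEDED && strtab)
                printf("needed library: %s\n", strtab + d->d_un.d_val);

        /* step 2 (restriction checks) and steps 4-5 (mapping into the pVM's
         * dynamic segment, lazy relocation, jumping to the entry point) are
         * omitted from this sketch */
    }
}
```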

KylinX does not load/link a shared library until it is actually used, similar to the lazy binding of traditional processes [17]. As a result, the startup time of a KylinX pVM is lower than that of previous Unikernel VMs. In addition, compared to Unikernels that support only static libraries, KylinX's use of shared libraries effectively reduces the memory footprint in high-density deployments (e.g., 8K VMs per machine in LightVM [58] and 80K containers per machine in Flurries [71]), where memory footprint is reported to be the biggest factor limiting scalability and performance [58].

[Note: dynamic linking and lazy binding: www.jianshu.com/p/20faf72e0…]

Next, we discuss two simple applications of the KylinX pVM's dynamic library mapping.

3.3.2 Online pVM Library Updates

It is important to keep system/application libraries up to date to fix bugs and vulnerabilities. Static Unikernels [56] have to recompile and reboot the entire appliance image to apply updates to any of their libraries, which can cause a significant deployment burden when the appliance uses many third-party libraries.

Online library updates are more attractive than rolling reboots mainly because of the need to maintain connections with clients. First, when a server has many long-lived connections, rebooting causes high reconnection overhead. Second, it is uncertain whether third-party clients will re-establish their connections, which imposes complex design logic for reconnection after a reboot. Third, frequent reboots and reconnections can severely degrade the performance of critical applications such as high-frequency trading.

Dynamic mapping enables KylinX to perform online library updates. However, libraries may have their own state, for example for compression or encryption, so simply replacing stateless functions does not satisfy KylinX's requirements.

Like most library update mechanisms (including DYMOS [51], Ksplice [29], Ginseng [61], PoLUS [37], Katana [63], Kitsune [41], etc.), KylinX requires the old and new libraries to be binary compatible: new functions and variables may be added, but changing function interfaces, removing functions/variables, or changing structure fields is not allowed. As for library state, we expect all state to be stored in variables (or dynamically allocated structures), which are saved and restored during the update.

KylinX provides an update(domid, new_lib, old_lib) API that dynamically replaces old_lib with new_lib in the domU pVM (with the given domid), performing the necessary library state updates. We also provide an update command, "update domid new_lib old_lib", which parses the arguments and calls the update() API.

The difficulty of dynamic pVM updates lies in manipulating the symbol tables of a sealed VM appliance. We solve this problem with the help of Dom0. When the update API is called, Dom0 will (i) map the new library into Dom0's virtual address space; (ii) share the loaded library with domU; (iii) verify that the old library is quiescent by asking domU to examine the call stacks of all of domU's kernel threads; (iv) wait until the old library is no longer in use; (v) modify the affected symbol entries to the proper addresses; and finally (vi) release the old library. In step (v), the two kinds of symbols (functions and variables) are resolved as described below.

Functions. The dynamic resolution of functions is shown in Figure 4. We keep the relocation table, symbol table, and string table in Dom0, since they are not located in a loadable segment. We load the functions' global offset table (.got.plt) and the procedure linkage table (.plt) in Dom0 and share them with domU. To resolve symbols across domains, we modify the assembly in the 2nd line of the 1st entry of the .plt table (shown in blue in Figure 4) to point to KylinX's symbol resolution function (du_resolve). After the new library (new_lib) is loaded, each old_lib function's entry in the .got.plt table (for example, foo in Figure 4) is modified to point back to the corresponding entry in the .plt table, namely its 2nd instruction (push n), shown by the dotted green line in Figure 4. When a function of the library (foo) is first called after new_lib is loaded, du_resolve is invoked with two arguments (n, got+4), where n is the offset of the symbol (foo) in the .got.plt table and (got+4) identifies the current module. du_resolve then asks Dom0 to call the corresponding d0_resolve, which finds foo in new_lib and updates the corresponding entry in the current module's .got.plt table (located via n) to the correct address of foo (blue line in Figure 4).
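The following self-contained toy model mimics the lazy re-binding just described: after an update, every entry routes through a resolver that asks "Dom0" for the symbol's address in new_lib and patches the table so that later calls go directly to the new code. Only the names du_resolve and d0_resolve come from the text; the tables, types, and the stand-in for the cross-domain call are invented for illustration:

```c
/* Simplified model of lazy function re-binding after a library update. */
#include <stdio.h>

typedef void (*fn_t)(void);

#define NSYMS 2
static const char *symname[NSYMS] = { "foo", "bar" };
static fn_t got_plt[NSYMS];                 /* models the pVM's .got.plt entries */

/* "new_lib" implementations that Dom0 would locate for us */
static void foo_new(void) { puts("foo from new_lib"); }
static void bar_new(void) { puts("bar from new_lib"); }

/* stand-in for the cross-domain d0_resolve call performed in Dom0 */
static fn_t d0_resolve(int n)
{
    return n == 0 ? foo_new : bar_new;      /* look the symbol up in new_lib */
}

/* du_resolve(n): called on first use of symbol n after the update; it asks
 * Dom0 for the new address, patches the table entry, then continues */
static void du_resolve_and_call(int n)
{
    fprintf(stderr, "resolving %s via Dom0...\n", symname[n]);
    got_plt[n] = d0_resolve(n);             /* patch entry to the new address */
    got_plt[n]();                           /* continue into the new library  */
}

/* after the update, every old_lib entry is reset to go through the resolver */
static void trampoline_foo(void) { du_resolve_and_call(0); }
static void trampoline_bar(void) { du_resolve_and_call(1); }

int main(void)
{
    got_plt[0] = trampoline_foo;            /* state right after loading new_lib */
    got_plt[1] = trampoline_bar;
    got_plt[0]();                           /* first call: resolves, then runs   */
    got_plt[0]();                           /* later calls: direct to new_lib    */
    return 0;
}
```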

Variables. The dynamic resolution of variables is slightly more complex. For now, we simply assume that new_lib expects all of its variables to be set to their live state in old_lib, rather than to their initial values. Without this assumption, the compiler would need to be extended to let developers specify their intention for each variable.

(1) Global variables. If a global variable (g) of the library is accessed by the main program, then g is stored in the program's data segment (.bss) and the library's global offset table (.got) has an entry pointing to g. Otherwise, g is stored in the library's own data segment, and KylinX is responsible for copying the global variable g from old_lib to new_lib.

(2) Static variables. Since static variables are stored in the library's data segment and cannot be accessed from outside, KylinX simply copies them one by one from old_lib to new_lib after new_lib is loaded.

(3) Pointers. If a library pointer (p) points to a dynamically allocated structure, KylinX preserves the structure and sets p in new_lib to point to it. If p points to a global variable stored in the program's data segment, p is copied from old_lib to new_lib. If p points to a static variable (or a global variable stored in the library), p is set to the new address.
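A toy illustration of these migration rules, copying the live values of matching variables from old_lib to new_lib; the symbol-table representation is invented for the example and is not KylinX's actual data structure:

```c
/* Toy variable migration: matching globals/statics are copied byte-for-byte
 * from the old library's storage to the new library's storage. */
#include <stdio.h>
#include <string.h>

struct var_sym { const char *name; void *addr; size_t size; };

static void migrate_vars(const struct var_sym *old_tab,
                         const struct var_sym *new_tab, int n)
{
    for (int i = 0; i < n; i++)
        /* copy the live value of each variable kept across the update */
        memcpy(new_tab[i].addr, old_tab[i].addr, old_tab[i].size);
}

/* "old_lib" and "new_lib" state for the demo */
static int old_counter = 41, new_counter = 0;

int main(void)
{
    struct var_sym old_tab[] = { { "counter", &old_counter, sizeof old_counter } };
    struct var_sym new_tab[] = { { "counter", &new_counter, sizeof new_counter } };
    migrate_vars(old_tab, new_tab, 1);
    printf("counter after update: %d\n", new_counter);   /* prints 41 */
    return 0;
}
```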

3.3.3 pVM recycling

The standard boot of a KylinX pVM or a Unikernel VM [58] is relatively slow (Section 3.3.1). As shown in the evaluation in Section 4, booting a pVM or Unikernel VM takes more than 100 milliseconds, most of which is spent creating the empty domain. We therefore design a pVM recycling mechanism for KylinX that uses dynamic library mapping to bypass domain creation.

The basic idea of recycling is to reuse an in-memory domain and dynamically map the application (as a shared library) into that domain. Specifically, an empty recyclable domain is checkpointed before it calls the app_entry function of a placeholder dynamic library, waiting for an application to run. The application is compiled into a shared library instead of a bootable image, with app_entry as its entry point. To boot the application's pVM quickly, KylinX restores the checkpointed domain and links the application library by replacing the placeholder library, following the online update procedure (Section 3.3.2).
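A minimal sketch of how an application might be packaged for recycling, assuming app_entry (the only name taken from the text) has a conventional argc/argv signature; it would be built as a shared object (e.g. with something like gcc -shared -fPIC) rather than as a bootable image:

```c
/* Hypothetical example of an application packaged for pVM recycling:
 * compiled as a shared library exposing app_entry() instead of main().
 * The signature shown here is an assumption made for this sketch. */
#include <stdio.h>

int app_entry(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    printf("application running inside a recycled pVM\n");
    return 0;
}
```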

3.3.4 Dynamic library mapping restrictions

KylinX must not introduce new vulnerabilities, compared to the static and monolithic Unikernel, when performing dynamic library mapping. The main threats are that an attacker may load a malicious library into a pVM's address space, replace a library with a compromised one that has the same name and symbols, or modify entries in the symbol table of a shared library to point to bogus symbols/functions.

[Note: on the Linux process address space: http://www.choudan.net/2013/10/24/Linux%E8%BF%9B%E7%A8%8B%E5%9C%B0%E5%9D%80%E7%A9%BA%E9%97%B4%E5%AD%A6%E4%B9%A0(%E4%B8%80).html]

To defend against these threats, KylinX enforces restrictions on the identity of libraries and on library loaders. KylinX allows developers to specify signature, version, and loader restrictions for a dynamic library; these restrictions are stored in the header of the pVM image and verified before the library is linked.

Signature and version. The library developer first generates a SHA1 digest of the library, which is then encrypted with RSA (Rivest-Shamir-Adleman). The result is stored in a signature section of the dynamic library. If the appliance requires verification of the library's signature, KylinX reads and verifies the signature section using the public key. Version restrictions are specified and verified similarly.
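A hedged sketch of such a check using OpenSSL 1.x-style calls: hash the library bytes with SHA1 and verify the RSA signature taken from the library's signature section against the developer's public key. How KylinX actually locates the signature section and distributes keys is not specified in the text, so those parts are stand-ins:

```c
/* Returns 1 if the signature over the library bytes verifies, 0 otherwise. */
#include <stdio.h>
#include <openssl/sha.h>
#include <openssl/rsa.h>
#include <openssl/pem.h>
#include <openssl/objects.h>

static int verify_library(const unsigned char *lib, size_t lib_len,
                          const unsigned char *sig, unsigned int sig_len,
                          const char *pubkey_pem_path)
{
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1(lib, lib_len, digest);                    /* SHA1 digest of the library */

    FILE *fp = fopen(pubkey_pem_path, "r");
    if (!fp)
        return 0;
    RSA *pub = PEM_read_RSA_PUBKEY(fp, NULL, NULL, NULL);
    fclose(fp);
    if (!pub)
        return 0;

    /* verify the RSA signature of the SHA1 digest with the public key */
    int ok = RSA_verify(NID_sha1, digest, sizeof digest, sig, sig_len, pub);
    RSA_free(pub);
    return ok == 1;                                /* link only if this holds */
}
```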

Loader. Developers can require different levels of restriction on a library's loader: (i) only the pVM itself may be the loader; (ii) pVMs of the same application may also be loaders; or (iii) even pVMs of other applications are allowed. With the first two restrictions, a malicious library in one compromised application cannot affect other applications. A special case of loader checking is using an application binary as a library and linking it to an empty pVM for fast recycling (Section 3.3.3), where KylinX restricts the loader to be an empty pVM.

With these restrictions, KylinX introduces no new threats compared to the statically-sealed Unikernel. For example, a runtime library update of a pVM (Section 3.3.2) with restrictions on the signature (a trusted developer), version (a specific version number), and loader (the pVM itself) achieves the same level of security as recompiling and rebooting the appliance.

4 Evaluation

We implemented the KylinX prototype on Ubuntu 16.04 and Xen. Following the default settings of MiniOS [14], we used RedHat Newlib and lwIP for the libc/libm libraries and the TCP/IP stack, respectively. Our testbed has two machines, each equipped with an Intel 6-core Xeon E5-2640 CPU, 128 GB of RAM, and a 1 GbE NIC.

We have ported several applications to KylinX; in Section 4.6 we use a multi-process Redis server [13] and a multi-threaded web server [11] to evaluate application performance on KylinX. Due to the current limitations of MiniOS and RedHat Newlib, two adjustments are needed to port applications to KylinX. First, KylinX currently supports only select, not the more efficient epoll. Second, inter-process communication (IPC) is limited to the APIs listed in Table 2.

4.1 Standard Startup

We evaluated the time of the standard boot procedure (Section 3.3.1) of a KylinX pVM and compared it with that of a MiniOS VM and a Docker container, all running a Redis server. Redis is an in-memory key-value store supporting fast key-value storage/queries. Each key-value pair consists of a fixed-length key and a variable-length value. It uses a single-threaded process to serve user requests and implements (periodic) serialization by forking a new backup process.

We disabled XenStore logging to eliminate the interference of periodic log-file flushes. RedHat Newlib's C library (libc) is static in embedded systems and is difficult to convert into a shared library. Therefore, we compile libc as a static library and libm (Newlib's math library) as a shared library that is linked to the KylinX pVM at runtime. Since MiniOS does not support fork, we removed the serialization code from this experiment (for now).

Booting a single KylinX pVM takes about 124 milliseconds, which can be roughly divided into two phases: creating the domain/image in memory (steps 1-4, Section 3.3.1) and booting the image (step 5). Dynamic mapping is performed in the first phase. Most of the time (about 121 milliseconds) is spent in the first phase, which makes high-level calls to interact with the hypervisor. The second phase takes about 3 milliseconds to boot the pVM. In contrast, MiniOS takes about 133 milliseconds to boot a VM, while Docker takes about 210 milliseconds to start a container. KylinX takes less time than MiniOS mainly because the shared library is not read/linked during boot.

We then evaluated the total time to sequentially boot a large number of pVMs (up to 1K) on a single machine, and compared it with the total boot times of MiniOS VMs and Docker containers.

The results are shown in Figure 5. First, KylinX is much faster than MiniOS due to lazy loading/linking. Second, the boot times of MiniOS and KylinX increase superlinearly as the number of VMs/pVMs grows, while the startup time of Docker containers increases only linearly, mainly because XenStore is highly inefficient when serving a large number of VMs/pVMs.

4.2 Fork and recycle

Compared with containers, KylinX's standard boot does not scale well to a large number of pVMs due to the inefficiency of XenStore. Recently, LightVM [58] completely redesigned Xen's control plane, implementing chaos/libchaos, NoXS (no XenStore), a split toolstack, and many other optimizations to achieve millisecond-level boot times for large numbers of VMs. We adopted LightVM's NoXS to eliminate the XenStore bottleneck and tested the pVM fork mechanism running unmodified Redis to emulate traditional process fork. With LightVM's NoXS, the boot time of KylinX pVMs increases only linearly even for large numbers of pVMs. Forking a single pVM takes about 1.3 milliseconds (not shown due to lack of space), several times faster than LightVM's original boot procedure (about 4.5 milliseconds). The fork of a KylinX pVM is slightly slower than a process fork on Ubuntu (about 1 ms) because several operations, including page sharing and parameter passing, are time-consuming. Note that the initialization of the event channel and shared pages between the parent/child pVMs is performed asynchronously and therefore is not counted in the fork latency.

4.3 Memory Usage

We measured the memory usage of KylinX, MiniOS, and Docker (running Redis) for different numbers of pVMs/VMs/containers on one machine. The results (shown in Figure 6) demonstrate that a KylinX pVM has a smaller memory footprint than the statically-sealed MiniOS and than Docker containers. This is because KylinX allows all appliances of the same application to share libraries (except libc) (Section 3.3), so each shared library needs to be loaded at most once. The memory footprint advantage benefits virtualization [42], can be used to dynamically share physical memory between VM appliances, and enables KylinX to achieve memory efficiency comparable to page-level deduplication [40] with much lower complexity.

4.4 Inter-pVM Communication

We evaluated the performance of inter-pVM communication (IpC) by forking a parent pVM and measuring the parent/child communication latency. We call such a parent/child pair lineal pVMs. As described in Section 3.2.1, two lineal pVMs already have an event channel and shared pages, so they can communicate with each other directly. In contrast, a non-lineal pVM pair has to initialize the event channel and shared pages before their first communication.

The results are listed in Table 3 and compared with the corresponding IPC on Ubuntu. Thanks to Xen's high-performance event channel and shared memory mechanisms, the KylinX IpC latency between two lineal pVMs is comparable to the corresponding IPC latency on Ubuntu. Note that the latency of pipe includes not only creating the pipe but also writing and reading a value through it. The first-communication latency between non-lineal pVMs is several times higher due to the initialization cost.

4.5 Runtime Library Updates

We evaluated KylinX's runtime library update by dynamically replacing the default libm (RedHat Newlib 1.16) with a newer version (RedHat Newlib 1.18). libm is the math library used by MiniOS/KylinX and contains a set of 110 basic math functions.

To test KylinX's update of global variables, we also added 111 dummy global variables, together with a read_global function that reads all of them, to both the old and the new libm. The main function first sets the global variables to random values and then periodically validates them by calling the read_global function.

Therefore, there are a total of 111 functions and 111 variables to update in our test. The update procedure can be roughly divided into four phases, and we measured the execution time of each phase.

First, KylinX loads new_lib into Dom0's memory and shares it with domU. Second, KylinX modifies the functions' entries in the .got.plt table to point to the corresponding entries in the .plt table. Third, KylinX calls du_resolve for each function, which asks Dom0 to resolve the given function and return its address in new_lib, and then updates the corresponding entry with the returned address. Finally, KylinX resolves the corresponding entries of the global variables in new_lib's .got table to the proper addresses. In this evaluation we modified the third phase to update all 111 functions of libm at once, instead of lazily linking each function when it is actually called (Section 3.3.2), in order to measure the entire runtime update cost for libm.

The results are shown in Figure 7: the total cost of updating all the functions and variables is about 5 milliseconds. The overhead of phase 3 (resolving functions) is higher than that of the other phases, including phase 4 (resolving variables), because phase 3 involves several time-consuming operations: resolving symbols, making cross-domain calls to d0_resolve, returning the actual function addresses, and updating the corresponding entries.

4.6 Applications

In addition to the process-like flexibility and efficiency in pVM scheduling and management, KylinX also provides its hosted applications with performance comparable to their counterparts on Ubuntu, as shown in this section.

4.6.1 Redis server application

We evaluated the performance of the Redis server in a KylinX pVM and compared it with that on MiniOS/Ubuntu. Since MiniOS does not support fork(), we temporarily removed the serialization code. The Redis server uses select instead of epoll for asynchronous I/O, because the lwIP stack [4] used by MiniOS and KylinX does not yet support epoll.

We used the Redis benchmark [13] to evaluate performance; it uses a configurable number of busy loops to asynchronously write KVs. We ran different numbers of pVMs/VMs/processes (one server per instance) to serve client write requests and measured the write throughput as a function of the number of servers (Figure 8). The three kinds of Redis servers have similar write throughput (owing to the select constraint), which increases almost linearly with the number of concurrent servers (scaling linearly up to 8 instances, before the lwIP stack becomes the bottleneck).

4.6.2 Web server applications

We evaluated the JOS web server [11] in KylinX, which uses multiple threads to handle multiple connections. After the main thread accepts an incoming connection, the web server creates a worker thread to parse the header, read the file, and send its contents back to the client. We used the weighttp benchmark [8], which supports a small subset of the HTTP protocol (but enough for our web server), to measure web server performance. Similar to the Redis server evaluation, we tested the web server by running multiple weighttp clients on a single machine, each continuously sending GET requests to the web server.

We evaluated the throughput as a function of the number of concurrent clients and compared it with web servers running on MiniOS and Ubuntu, respectively. The results are shown in Figure 9: the KylinX web server achieves higher throughput than the MiniOS web server because it provides higher sequential performance. Both the KylinX and MiniOS web servers are slower than the Ubuntu web server because the asynchronous select is scheduled inefficiently with MiniOS's network driver [27].

5 Related Work

KylinX is related to static Unikernel appliances [56,27], reduced VMs [19,48,49], containers [66,9,15], and picoprocesses [38,62,32,54,67,33].

5.1 Unikernels and Reduced VMs

KylinX is an extension of Unikernel [56] implemented on top of MiniOS [27]. Unikernel OSes include Mirage [56], Jitsu [55], Unikraft [18], and so on. For example, Jitsu [55] uses Mirage [56] to build a power-efficient and responsive platform for hosting cloud services on edge networks. LightVM [58] uses Unikernel on Xen for fast boot.

MiniOS [27] designs and implements a C-style Unikernel LibOS that runs as a paravirtualized guest OS within a Xen domain. MiniOS has better backward compatibility than Mirage and supports single-process applications written in C. However, the original statically-sealed MiniOS appliances suffer from problems similar to those of other static Unikernels.

KylinX differs from static Unikernels such as Mirage [56], MiniOS [27], and EbbRT [65] in its pVM abstraction, which explicitly treats the hypervisor as an OS and supports process-like operations such as pVM fork/IpC and dynamic library mapping. The mapping restrictions (Section 3.3.4) ensure that KylinX introduces no new vulnerabilities, and its TCB is no larger than that of Mirage/MiniOS [56,55]. KylinX supports source-code (C) compatibility instead of rewriting the entire software stack in a type-safe language [56].

Recent studies [19,49,48] try to improve hypervisor-based type-1 VMs to achieve smaller memory footprints, shorter boot times, and higher execution performance. Tiny Core Linux [19] trims an existing Linux distribution down to reduce the guest overhead. OSv [49] implements a new guest OS for running a single application on a VM, resolving libc function calls into its kernel and applying optimization techniques such as spinlock-free mutexes [70] and a net-channel network stack [46]. RumpKernel [48] reduces the VM by implementing an optimized guest OS. Unlike KylinX, these general-purpose LibOS designs contain features unnecessary for the target application, resulting in a larger attack surface, and they do not support the multi-process abstraction. In addition, KylinX's pVM fork is much faster than the VM fork of SnowFlock [50].

5.2 Containers

Containers use OS-level virtualization [66] and leverage kernel features to package and isolate processes rather than relying on a hypervisor. In return, they do not need to trap system calls or emulate hardware and can run as normal OS processes. For example, Linux Containers (LXC) [9] and Docker [15] create containers by using Linux kernel features such as namespaces and cgroups to package resources and run container-based processes.

Containers use the host OS API [49], exposing hundreds of system calls and enlarging the host's attack surface. Therefore, although LXC and Docker containers are usually more efficient than traditional VMs, they provide weaker security, since attackers may compromise the processes running inside containers.

5.3 Picoprocess

A picoprocess is essentially a container that implements a LibOS between the host OS and the guest, mapping high-level guest APIs onto a small interface. The original picoprocess designs (Xax [38] and Embassies [43]) allow only a tiny system-call API, which can be small enough to be convincingly (even verifiably) isolated. Howell et al. show how to support a small subset of single-process applications on top of a minimal picoprocess interface [44] by providing a POSIX emulation layer and binding existing programs.

Recent studies have relaxed the static and rigid picoprocess isolation model. For example, Drawbridge [62] is a Windows translation of the Xax [38] picoprocess and creates a picoprocess LibOS that supports rich desktop applications. Graphene [67] broadens the LibOS paradigm by supporting multi-process APIs (via message passing) within a family of picoprocesses (a sandbox). Bascule [32] allows OS-independent extensions to be attached safely and efficiently at runtime. Tardigrade [54] uses picoprocesses to easily construct fault-tolerant services. The success of these relaxations on picoprocesses motivates our dynamic KylinX extension to Unikernel.

Containers and picoprocesses usually have large TCBs because their LibOSes contain unused features. In contrast, KylinX and other Unikernels leverage the hypervisor's virtual hardware abstraction to simplify their implementations and follow minimalism [36] by linking an application only with the libraries it requires, improving not only efficiency but also security.

Dune [34] uses Intel VT-x [69] to provide a process (rather than machine) abstraction that isolates processes while granting access to privileged hardware features. IX [35] integrates virtual devices into the Dune process model, achieving high throughput and low latency for networked systems. lwCs [53] provide independent units of protection, privilege, and execution state within a process.

In contrast to these technologies, KylinX runs directly on Xen (a type-1 hypervisor), which naturally provides strong isolation and allows KylinX to focus on flexibility and efficiency.

6 Conclusion

There has long been a tension between strong isolation and rich features in the literature on cloud virtualization. This paper exploits a new design space and proposes the pVM abstraction by adding two new features (dynamic page and library mapping) to the highly-specialized static Unikernel. The simplified virtualization architecture (KylinX) treats the hypervisor as an OS and securely supports flexible process-like operations such as pVM fork and inter-pVM communication, runtime updates, and fast recycling.

In the future, we will improve security through modularization [27], disaggregation [60], and SGX enclaves [33,28,45,68]. We will improve KylinX's performance by adopting the more efficient runtime MUSL [23] and adapt KylinX to the multi-LibOS model [65], which allows a pVM to span multiple machines. Currently, the pVM recycling mechanism is ad hoc and restrictive: it can only checkpoint an empty domain; recycled pVMs cannot communicate with other pVMs via event channels or shared memory; and applications must take the form of self-contained shared libraries that load no other shared libraries. There is also not yet a safeguard between the old and new pVMs after recycling to check for potential security threats. We will address these shortcomings in future work.

7 Acknowledgments

This work was supported by the National Basic Research Program of China (2014CB340303) and the National Natural Science Foundation of China (61772541). We thank Ziyang Li, Qiao Zhou, and the anonymous reviewers for helping improve this paper. This work was performed while the first author was visiting the NetOS group of the Computer Laboratory at the University of Cambridge, and we thank Professor Anil Madhavapeddy, Ripduman Sohan, and Liu Hao for the discussions.

After the main text

OK, I hope this is useful. There is nothing about these newer papers online, and the authors at the National University of Defense Technology have no public contact information, so if you picked this paper to present, good luck ~