GOPATH is the Go workspace path. Go looks up dependency packages according to GOPATH. It contains three directories: pkg stores compiled dependency packages, src stores the source code, and bin stores the compiled executable files.

1. Linux Namespace

A namespace is an isolation technology that lets a container isolate system resources, such as process IDs (PID), user IDs (UID), and the network.

An example: with ordinary Linux user isolation, users sometimes need root privileges to perform an operation. With namespaces, a user can be given root permission inside the namespace only, which implements UID isolation, a genuine isolation of users.

In addition to the User Namespace, PIDs can also be virtualized. Each child namespace has its own init process and so on, and its PIDs are mapped to PIDs in the parent namespace.

1.1 UTS Namespace

UTS Namespace isolates two system identifiers, nodename and domainname. Each UTS namespace is allowed to have its own hostname.

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in a new UTS namespace.
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS,
	}
	// Connect the shell to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

os/exec is used to run external commands. Its Cmd type makes it easy to start a process and to connect pipes and I/O streams to it for reading and writing.


The Command function returns a Cmd struct to execute the named program with the given arguments. It sets only the Path and Args fields in the returned struct. If name contains no path separator, Command uses LookPath to resolve name to a complete path if possible. Otherwise, it uses name directly as Path. The Args field of the returned Cmd is built from the command name followed by the elements of arg, so arg should not include the command name itself. For example, Command("echo", "hello"). Args[0] is always name, not the path that may have been resolved.
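To see this behavior directly, here is a minimal sketch that builds a Cmd without running it and prints the two fields. The helper name describe is only for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// describe builds a Cmd without running it and returns the two fields
// that exec.Command fills in.
func describe(name string, arg ...string) (path string, args []string) {
	cmd := exec.Command(name, arg...)
	return cmd.Path, cmd.Args
}

func main() {
	path, args := describe("echo", "hello")
	fmt.Println("Path:", path) // resolved through LookPath when possible
	fmt.Println("Args:", args) // Args[0] stays "echo", not the resolved path
}
```

The resolved Path depends on the machine, while Args[0] is always the name that was passed in.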


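SysProcAttr holds optional, operating-system-specific attributes for the new process. Here its Cloneflags field sets the clone flags that create the new namespaces.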


log.Fatal prints the message and then exits the application, so deferred functions do not run. panic runs the deferred functions and then propagates up the call stack level by level. That is the difference between log.Fatal and panic.
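The difference can be observed with a small sketch. The function unwind below is a hypothetical helper that panics and records what happens while the stack unwinds.

```go
package main

import "fmt"

// unwind panics inside a function that has deferred calls and records
// the order in which things happen.
func unwind() (events []string) {
	defer func() {
		// recover stops the panic so the example can return normally.
		if r := recover(); r != nil {
			events = append(events, fmt.Sprint("recovered: ", r))
		}
	}()
	defer func() { events = append(events, "deferred cleanup ran") }()
	events = append(events, "about to panic")
	panic("boom")
}

func main() {
	for _, e := range unwind() {
		fmt.Println(e)
	}
	// Calling log.Fatal instead of panic would exit the process
	// immediately and skip both deferred functions.
}
```

The deferred cleanup runs before the panic is recovered, which is exactly what log.Fatal would skip.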

This code forks a child process. What is fork? fork is a function on UNIX and UNIX-like systems that splits a running program into two (nearly) identical processes, each of which continues executing from the same point in the code, as if two users had started two copies of the application at the same time.

Here the program fork/execs /usr/bin/sh. Running echo $$ prints the current PID. To check whether the parent and child are in the same namespace, run pstree -pl to list the tree of all processes; the parent of sh is our project, awesomeProject (PID 51903). Then use hostname -b zhouxiaohao to change the hostname inside the shell. The hostname outside does not change, which means the two are isolated.
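Another way to compare namespaces is to read the symbolic links under /proc/<pid>/ns, whose targets look like uts:[4026531838]; two processes are in the same namespace when the numbers match. The sketch below assumes /proc is mounted, and parseNsLink is an illustrative helper, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink splits a link target such as "uts:[4026531838]" into the
// namespace type and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	kind, rest, ok := strings.Cut(target, ":[")
	if !ok || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, err
	}
	return kind, inode, nil
}

func main() {
	// /proc/self/ns/uts names the UTS namespace of the current process.
	target, err := os.Readlink("/proc/self/ns/uts")
	if err != nil {
		fmt.Println("cannot read namespace link:", err)
		return
	}
	kind, inode, err := parseNsLink(target)
	fmt.Println(kind, inode, err)
}
```

Running it once on the host and once inside the shell started above should print different inode numbers for uts.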

1.2 IPC Namespace

The IPC Namespace isolates System V IPC and POSIX message queues. System V is a Unix version that introduced three advanced inter-process communication mechanisms: message queues, shared memory, and semaphores. These are IPC objects, and each is referred to by its own ID.

ipcs -q: displays only message queues.
ipcs -s: displays only semaphores.
ipcs -m: displays only shared memory.
ipcs -h: shows the other options.

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in new UTS and IPC namespaces.
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC,
	}
	// Connect the shell to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

A message queue created on the host is not visible inside our container. Since message queues are normally shared by all processes and visible to them, this shows that the IPC namespace is isolated as well.

1.3 PID Namespace

PID Namespace is used to isolate process IDs. The same process can have different PIDs in different PID namespaces. We again modify the previous program slightly, then check pstree -pl on the host and run echo $$ in the container.

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in new UTS, IPC, and PID namespaces.
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID,
	}
	// Connect the shell to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

You can see that the PID shown on the host is different from the PID printed inside the container.
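On reasonably recent kernels, both views of the PID are also recorded on the NSpid line of /proc/<pid>/status, listed from the outer namespace to the inner one. The parser below is a sketch; the sample values in the usage note are illustrative, reusing the PID mentioned earlier.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid extracts the NSpid line from the text of /proc/<pid>/status.
// The values run from the outer PID namespace to the innermost one.
func parseNSpid(status string) []int {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "NSpid:" {
			continue
		}
		var pids []int
		for _, f := range fields[1:] {
			if n, err := strconv.Atoi(f); err == nil {
				pids = append(pids, n)
			}
		}
		return pids
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	fmt.Println(parseNSpid(string(data)))
}
```

For a shell in a nested PID namespace, a line such as "NSpid: 51903 1" would mean PID 51903 on the host and PID 1 in the container.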

1.4 Mount Namespace

The first thing to know is that the /proc directory on Linux is a file system, the proc file system. Unlike ordinary file systems, /proc is a pseudo (virtual) file system. It holds a set of special files that reflect the current state of the running kernel. Through these files users can view information about the system hardware and the running processes, and can even change the kernel's running state by modifying some of them.

Mount Namespace is used to isolate mount points, so the directory structure seen inside our container is different from that of the host. Processes in different mount namespaces see different file hierarchies. Calling mount() and umount() in a mount namespace affects only the file systems within the current namespace, not the global file system. This is similar to chroot, which turns a directory into the root directory, only more secure and flexible.

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in new UTS, IPC, PID, and mount namespaces.
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	// Connect the shell to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

Run mount -t proc proc /proc inside the container; /proc then contains far fewer entries. After that, ps -ef can be used to view the processes in the container.
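The same mount can be issued from Go with syscall.Mount. This is a Linux-only sketch; the flag choice and the PID 1 guard are assumptions made for this example so that it does nothing when run outside a new PID namespace.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// procMountFlags returns conservative flags for mounting /proc:
// no executables, no setuid binaries, no device files.
func procMountFlags() uintptr {
	return syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV
}

// mountProc is the Go counterpart of `mount -t proc proc /proc`.
func mountProc() error {
	return syscall.Mount("proc", "/proc", "proc", procMountFlags(), "")
}

func main() {
	// Guard (an assumption of this sketch): only mount when this process
	// is PID 1, that is, the first process of a new PID namespace.
	if os.Getpid() != 1 {
		fmt.Println("not PID 1, skipping the mount")
		return
	}
	if err := mountProc(); err != nil {
		fmt.Println("mount failed:", err)
	}
}
```

The call needs sufficient privileges and should run inside the new mount namespace.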

1.5 User Namespace

The User Namespace is used to isolate user and group IDs. That is, the user ID and group ID of a process can be different inside and outside a User Namespace. Most commonly, a User Namespace is created by a non-root user on the host, who is then mapped to root inside the User Namespace. This means that the process has root privileges inside the User Namespace, but not outside it. Starting from Linux kernel 3.8, non-root processes can also create a User Namespace, and the user can be mapped to root in the namespace and have root permission there.
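One way to express such a mapping in Go is through the UidMappings and GidMappings fields of syscall.SysProcAttr. The following is a hedged, Linux-only sketch: rootMapping is an illustrative helper, and whether an unprivileged user may create user namespaces depends on the system configuration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// rootMapping maps a single host ID onto ID 0 inside the new user namespace.
func rootMapping(hostID int) []syscall.SysProcIDMap {
	return []syscall.SysProcIDMap{{ContainerID: 0, HostID: hostID, Size: 1}}
}

func main() {
	// Run `id` in a new user namespace where the current user appears as root.
	cmd := exec.Command("id")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags:  syscall.CLONE_NEWUSER,
		UidMappings: rootMapping(os.Getuid()),
		GidMappings: rootMapping(os.Getgid()),
	}
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("could not run in a new user namespace:", err)
	}
}
```

When the namespace can be created, id is expected to report uid 0 inside it even though the program was started by an ordinary user.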

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER,
	}
	// Run the child process as the specified system user.
	cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	os.Exit(-1)
}

Running the program above produces a permission denied error. Opening /etc/passwd (vim /etc/passwd) lists the users' information in the following format:

user name:password:UID:GID:user information:home directory:user shell. It turns out that the child process is started as the daemon user, which may be the cause. If the UID is different, the User Namespace has taken effect.
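The seven-field format is easy to read programmatically. The sketch below parses one line; the sample line for the daemon user is illustrative, and its home directory and shell vary between distributions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// passwdEntry holds the fields of one /etc/passwd line used in this example.
type passwdEntry struct {
	Name, Home, Shell string
	UID, GID          int
}

// parsePasswdLine splits "name:password:UID:GID:info:home:shell".
func parsePasswdLine(line string) (passwdEntry, error) {
	f := strings.Split(line, ":")
	if len(f) != 7 {
		return passwdEntry{}, fmt.Errorf("want 7 fields, got %d", len(f))
	}
	uid, err := strconv.Atoi(f[2])
	if err != nil {
		return passwdEntry{}, err
	}
	gid, err := strconv.Atoi(f[3])
	if err != nil {
		return passwdEntry{}, err
	}
	return passwdEntry{Name: f[0], UID: uid, GID: gid, Home: f[5], Shell: f[6]}, nil
}

func main() {
	// Illustrative line, not copied from a particular system.
	e, err := parsePasswdLine("daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin")
	fmt.Println(e.Name, e.UID, e.GID, err)
}
```

UID 1 and GID 1 are the values passed to syscall.Credential in the program above, which is why the child shows up as daemon.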

Reference: the blog post "Linux: viewing users and user groups"

1.6 Network Namespace

A Network Namespace is used to isolate the network stack, including network devices, IP addresses, and ports. It allows each container to have its own independent (virtual) network devices, and applications in the container can bind to their own ports. Ports in different namespaces do not conflict with each other. Once a bridge is set up on the host, containers can communicate with each other easily, and applications in different containers can use the same port.

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
	}
	//cmd.SysProcAttr.Credential=&syscall.Credential{Uid:uint32(1),Gid:uint32(1)}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	os.Exit(-1)
}

If you run ifconfig on the host and inside the container, you will see that there is no network device inside the container, although a warning is printed
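The same check can be made from Go with net.Interfaces. This is a small sketch with an illustrative helper name. Unlike plain ifconfig, net.Interfaces also reports interfaces that are down, so inside a fresh network namespace it will typically still list a loopback interface that is not up.

```go
package main

import (
	"fmt"
	"net"
)

// interfaceNames returns the names of all network interfaces visible in
// the current network namespace, whether they are up or down.
func interfaceNames() ([]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(ifaces))
	for _, i := range ifaces {
		names = append(names, i.Name)
	}
	return names, nil
}

func main() {
	names, err := interfaceNames()
	if err != nil {
		fmt.Println("cannot list interfaces:", err)
		return
	}
	fmt.Println(names)
}
```

Running the binary on the host and again inside the container makes the difference in visible devices explicit.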

2. Linux Cgroup

2.1 What is a Control Group?

References: www.hangdaowangluo.com/archives/24… zhuanlan.zhihu.com/p/81668069 tech.meituan.com/2015/03/31/…

Cgroups, short for Control Groups, is a mechanism provided by the Linux kernel for isolating physical resources. With it, the resources used by Linux processes or process groups can be limited, isolated, and accounted for. (The related configuration files are /etc/cgconfig.conf and /etc/cgrules.conf.)

For example, you can use cgroups to limit the resources of a specific process, such as the number of CPU cores or the amount of memory it may use. If the process exceeds the limit, it is suspended or killed. I went through many reference documents and finally summarized my own understanding as follows


A few simple concepts to know:

  • Task: In a Cgroup, a task is a process.
  • Control Group: Resource control of a Cgroup is implemented in the form of a control group, which specifies resource quota limits. Processes can be added to one control group or migrated to another.
  • Hierarchy: The control group has a hierarchical structure similar to a tree. The control group of child nodes inherits the attributes (such as resource quotas and restrictions) of the parent control group.
  • Subsystem: A subsystem is essentially a resource controller. For example, the memory subsystem controls the memory usage of processes. A subsystem has to be attached to a hierarchy, and then all control groups in that hierarchy are controlled by that subsystem.

Categories of subsystems:

  • cpu: limits the CPU usage of processes.
  • cpuacct: reports the CPU usage of the processes in a cgroup.
  • cpuset: assigns dedicated CPUs or memory nodes to the processes in a cgroup.
  • memory: limits the memory usage of processes.
  • blkio: limits the block device I/O of processes.
  • devices: controls which devices processes can access.
  • net_cls: tags the network packets of processes in a cgroup so that they can be controlled with tc (traffic control).
  • net_prio: sets the priority of network traffic for processes.
  • hugetlb: limits the use of HugeTLB.
  • freezer: suspends or resumes the processes in a cgroup.
  • ns: makes processes in different cgroups use different namespaces.

Relationships between concepts:

A cgroup can configure many types of resources, that is, many types of subsystems. A hierarchy defines combinations of resource allocations through its tree of cgroups, and a task obtains such a combination by joining a node in each hierarchy. Note, however, that the same resource must not be configured in more than one place, which leads to a few rules:

  • The same subsystem cannot be attached to more than one hierarchy
  • A task cannot belong to more than one cgroup in the same hierarchy
  • A parent and the child it has just forked are in the same cgroup, but this can change later
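These rules are visible in /proc/<pid>/cgroup, where each line has the form hierarchy-ID:subsystems:path, so a task appears exactly once per hierarchy. The parser below is a sketch; the type name membership and the sample lines in the usage note are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// membership is one line of /proc/<pid>/cgroup.
type membership struct {
	HierarchyID int
	Subsystems  []string
	Path        string
}

// parseProcCgroup parses lines of the form "hierarchy-ID:subsystems:path".
func parseProcCgroup(text string) ([]membership, error) {
	var out []membership
	for _, line := range strings.Split(strings.TrimSpace(text), "\n") {
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		id, err := strconv.Atoi(parts[0])
		if err != nil {
			return nil, err
		}
		var subs []string
		if parts[1] != "" {
			subs = strings.Split(parts[1], ",")
		}
		out = append(out, membership{HierarchyID: id, Subsystems: subs, Path: parts[2]})
	}
	return out, nil
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read cgroup membership:", err)
		return
	}
	fmt.Println(parseProcCgroup(string(data)))
}
```

A line such as "4:memory:/testmemorylimits" would mean that, in the hierarchy with the memory subsystem attached, the task sits in the cgroup /testmemorylimits.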

For example, two tasks form a task group and use cgroups of the cpu and memory subsystems to control the isolation of CPU and memory resources


Install the tool with apt install cgroup-tools, then list the subsystems with lssubsys -a. (Colors in the terminal listing: green: executable files, including JARs. White: text files. Blinking red: broken symbolic links. Light blue: symbolic links. Yellow: device files.)

Solution: use mount -t proc proc /proc to mount the proc file system inside the container, and then check the subsystems. Why mount it every time? Because top, pstree, and lssubsys all need the /proc directory.
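One reason /proc matters here is that the kernel publishes the subsystems it knows about in /proc/cgroups, a table with the columns subsys_name, hierarchy, num_cgroups, and enabled. The sketch below lists the enabled ones; enabledSubsystems is an illustrative helper, and the sample rows in the usage note are made up.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// enabledSubsystems reads the text of /proc/cgroups and returns the names
// of the subsystems whose "enabled" column is 1.
func enabledSubsystems(text string) []string {
	var names []string
	for _, line := range strings.Split(text, "\n") {
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and the header
		}
		f := strings.Fields(line)
		// Columns: subsys_name, hierarchy, num_cgroups, enabled.
		if len(f) == 4 && f[3] == "1" {
			names = append(names, f[0])
		}
	}
	return names
}

func main() {
	data, err := os.ReadFile("/proc/cgroups")
	if err != nil {
		fmt.Println("cannot read /proc/cgroups (is /proc mounted?):", err)
		return
	}
	fmt.Println(enabledSubsystems(string(data)))
}
```

Without a mounted /proc the read fails, which matches the need to mount proc inside the container first.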

2.2 How to Implement CGroup in the Linux Kernel

Linux uses a virtual, tree-structured file system to represent a hierarchy, and directories within it to represent cgroups. The following figure shows the operations

  • cgroup.clone_children: read by the cpuset subsystem. The default value is 0; if it is 1, the child node inherits the parent's attributes
  • cgroup.procs: lists the process IDs (thread group IDs) in the cgroup at the current node. If the current node is the root node, all processes in the hierarchy are listed
  • notify_on_release and release_agent are used together. notify_on_release indicates whether release_agent is executed when the last process in the cgroup exits. release_agent is a path, usually to a program that cleans up unused cgroups after the processes exit
  • tasks: lists the IDs of the processes in the cgroup at the current node. Writing a PID to the tasks file adds that process to the cgroup

You can see that the child node inherits the attributes of the parent node

Move the current terminal into a cgroup as follows. However, this step does not attach any subsystem, so there is no restriction on our cgroup yet. In fact, the system has already created a default hierarchy for each subsystem, as shown below. For example, the hierarchy of the memory subsystem is /sys/fs/cgroup/memory
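The directory and tasks-file operations can be sketched in Go as follows. To keep the example runnable without root, demo uses a temporary directory that merely stands in for a real hierarchy such as /sys/fs/cgroup/memory; the names addToCgroup, cgroup-1, and PID 1234 are placeholders.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// addToCgroup creates the directory for the cgroup under root and writes
// pid into its tasks file.
func addToCgroup(root, name string, pid int) error {
	dir := filepath.Join(root, name)
	if err := os.MkdirAll(dir, 0755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "tasks"), []byte(strconv.Itoa(pid)), 0644)
}

// demo runs addToCgroup against a temporary directory (a stand-in for a
// mounted hierarchy) and returns what ended up in the tasks file.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "fake-hierarchy")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	if err := addToCgroup(root, "cgroup-1", 1234); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(root, "cgroup-1", "tasks"))
	return string(data), err
}

func main() {
	fmt.Println(demo())
}
```

On a real hierarchy the kernel creates the tasks file when the directory is made, and writing a PID into it moves that process into the cgroup.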


Start a stress process that takes up 200 MB of memory and place it in our cgroup with a memory limit of 100 MB to see what happens

This is the stress process viewed with top before we start limiting memory

The specific operation is shown below

Viewing it again with top, the memory footprint is halved
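Limits written to memory.limit_in_bytes may carry a k, m, or g suffix, so 100m stands for 100 × 1024 × 1024 bytes. The helper below is an illustrative sketch of that conversion, not the kernel's parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLimit converts a value such as "100m" into a number of bytes.
// It understands the suffixes k, m, and g in either case.
func parseLimit(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("empty limit")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'k', 'K':
		mult = 1 << 10
	case 'm', 'M':
		mult = 1 << 20
	case 'g', 'G':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	limit, _ := parseLimit("100m")
	wanted, _ := parseLimit("200m")
	fmt.Println(limit, wanted, wanted > limit)
}
```

Since the 200 MB requested by stress exceeds the 100 MB limit, the process cannot keep all of it resident, which fits the halved figure seen in top.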

2.3 Implementing Cgroup in Go

This step implements the memory restriction performed above through the kernel interface, this time in a Go program

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"os/exec"
	"path"
	"strconv"
	"syscall"
)
const cgroupMemoryHierarchyMounted = "/sys/fs/cgroup/memory"

func main() {
	if os.Args[0] == "/proc/self/exe" {
		// Container process: started by the parent through /proc/self/exe.
		fmt.Printf("current pid %d", syscall.Getpid())
		fmt.Println()
		cmd := exec.Command("sh", "-c", `stress --vm-bytes 200m --vm-keep -m 1`)
		cmd.SysProcAttr = &syscall.SysProcAttr{}
		cmd.Stdin = os.Stdin
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Println(err)
		}
		os.Exit(-1)
	}
	cmd := exec.Command("/proc/self/exe")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
	}
	//cmd.SysProcAttr.Credential=&syscall.Credential{Uid:uint32(0),Gid:uint32(0)}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	// Run starts the child and blocks until it exits.
	if err := cmd.Run(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	} else {
		fmt.Printf("%v", cmd.Process.Pid)
		// Create a cgroup on the hierarchy of the default memory subsystem.
		os.Mkdir(path.Join(cgroupMemoryHierarchyMounted, "testmemorylimits"), 0755)
		// Add the container process to the cgroup.
		ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMounted, "testmemorylimits", "tasks"), []byte(strconv.Itoa(cmd.Process.Pid)), 0644)
		// Limit the memory usage of the cgroup.
		ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMounted, "testmemorylimits", "memory.limit_in_bytes"), []byte("100m"), 0644)
	}
	// Wait waits for the Process to exit, and then returns a ProcessState
	// describing its status and an error, if any. Wait releases any resources
	// associated with the Process. On most operating systems, the Process must
	// be a child of the current process or an error will be returned.
	cmd.Process.Wait()
}

exec.Command(...).Run() has already forked the child, so placing that child into the limited cgroup afterwards does not affect the stress process it has started. Note also that Run blocks until the child exits, so the cgroup files are only written after the child has finished. The program is started from the console first, so os.Args[0] is not /proc/self/exe and the if block is skipped. The parent then starts itself again through /proc/self/exe; in that child os.Args[0] matches, so it enters the if block and runs stress. In the end, the stress process was not added to the restriction. As the results from top show, its memory usage is still 5%. Printing os.Args[0] makes the order of execution clear.

Some operational documentation for reference:
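A possible direction, sketched here under assumptions rather than taken from the original program, is to call Start instead of Run so that the parent can configure the cgroup while the child is still alive. The names startThenConfigure and demo are hypothetical, and demo uses a harmless shell command in place of the container process.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startThenConfigure starts cmd without waiting, calls configure with the
// child's PID while the child may still be running, then waits for it.
func startThenConfigure(cmd *exec.Cmd, configure func(pid int) error) error {
	if err := cmd.Start(); err != nil {
		return err
	}
	if err := configure(cmd.Process.Pid); err != nil {
		// Configuration failed: stop the child and reap it.
		cmd.Process.Kill()
		cmd.Wait()
		return err
	}
	return cmd.Wait()
}

// demo runs a short shell command and records the PID passed to configure.
func demo() (int, error) {
	seen := 0
	err := startThenConfigure(exec.Command("sh", "-c", "exit 0"), func(pid int) error {
		// A real caller would write pid into the cgroup's tasks file here.
		seen = pid
		return nil
	})
	return seen, err
}

func main() {
	fmt.Println(demo())
}
```

Even with Start there is a window in which the child may fork before it has been added to the cgroup, so a complete solution would also make the child wait until the parent signals that configuration is done.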

mount: www.linuxprobe.com/mount-detai…

3. Union File System

What is it? Early versions of Docker used AUFS, an advanced version of UnionFS. It lets a container combine several file systems into one view, which makes resources easy to use and manage.

AUFS file read operations

  • 1. The file exists in the container-layer: the file is read directly from the container-layer.
  • 2. The file does not exist in the container-layer: search downwards layer by layer and read the file from the first layer in which it is found.
  • 3. The file exists in both the container-layer and an image-layer: the file in the container-layer is read.

In short, the search starts at the container-layer and works its way down; once the file is found, it is read and the search stops.

Write operations that modify a file or directory under AUFS

  • 1. Writing to a file that already exists in the container-layer: the write is applied directly to that file (new files are also created and modified in the container-layer).
  • 2. Writing to a file that exists in the image-layers: the file is copied to the container-layer, and the write is applied to that copy.

delete

  • 1. Deleting a file or directory in the container-layer: it is deleted directly.
  • 2. Deleting a file in the image-layers: a whiteout file is created in the container-layer. The image-layer file is not deleted, but it becomes invisible to the container because of the whiteout.
  • 3. Deleting a directory in the image-layers: an opaque file is created in the container-layer, which works in the same way as a whiteout.

rename

  • 1. Renaming a file or directory in the container-layer: it is renamed directly.
  • 2. Renaming an image-layer file:
  • 3. Renaming an image-layer directory: not supported in Docker's AUFS; it triggers EXDEV.
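The read, write, and delete rules above can be modeled in a few lines of Go. This is a toy in-memory model for illustration only: layers are plain maps, the type unionFS and its methods are hypothetical, and rename is left out.

```go
package main

import "fmt"

// unionFS models one writable container layer on top of read-only image layers.
type unionFS struct {
	container map[string]string   // writable top layer
	whiteouts map[string]bool     // names hidden by a whiteout
	images    []map[string]string // read-only layers, topmost first
}

// read searches from the container layer downwards and stops at the first hit.
func (u *unionFS) read(name string) (string, bool) {
	if u.whiteouts[name] {
		return "", false
	}
	if v, ok := u.container[name]; ok {
		return v, true
	}
	for _, layer := range u.images {
		if v, ok := layer[name]; ok {
			return v, true
		}
	}
	return "", false
}

// write always lands in the container layer; for an image-layer file this
// plays the role of copying the file up and modifying the copy.
func (u *unionFS) write(name, data string) {
	delete(u.whiteouts, name)
	u.container[name] = data
}

// remove deletes the container copy and, if a lower layer still has the
// file, records a whiteout so that it stays hidden.
func (u *unionFS) remove(name string) {
	delete(u.container, name)
	for _, layer := range u.images {
		if _, ok := layer[name]; ok {
			u.whiteouts[name] = true
			return
		}
	}
}

// demo writes to and then deletes a file that exists in the image layer.
func demo() (afterWrite string, visibleAfterDelete bool, imageCopy string) {
	image := map[string]string{"a.txt": "from image"}
	u := &unionFS{
		container: map[string]string{},
		whiteouts: map[string]bool{},
		images:    []map[string]string{image},
	}
	u.write("a.txt", "from container")
	afterWrite, _ = u.read("a.txt")
	u.remove("a.txt")
	_, visibleAfterDelete = u.read("a.txt")
	return afterWrite, visibleAfterDelete, image["a.txt"]
}

func main() {
	fmt.Println(demo())
}
```

After the delete, the file is no longer visible through the union, yet the image layer still holds its original content.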

Copy-on-write: when a process is duplicated, for example when a parent starts a child process, the resources do not need to be copied immediately; they are shared instead. Only when the child writes to a resource is that resource copied. The copy belongs to the child's address space and does not affect the parent. Docker builds images and containers on exactly this technique.

If you want to see how AUFS is used, you have to create an AUFS mount yourself

3.1 Write AUFS by yourself

Create a new aufs directory with the following structure and a txt file within each layer.

I found that my kernel no longer supports AUFS. Use dmesg to see the true cause of the error.

What is docker0? docker0 is a virtual bridge on the host that handles the communication of containers within Docker.