Foreword
I have recently been researching monitoring for Docker clusters (Kubernetes). To understand it thoroughly, I took a quick look at the source code. Here is what I learned.
docker api: stats
First, the Docker API. The stats endpoint is used like this:
http://$dockerip:2375/containers/$containerid/stats
This returns the state of a container on that Docker host; the response is a stream that keeps writing new messages back (one per second).
http://$dockerip:2375/containers/$containerid/stats?stream=false
The stream parameter defaults to true; when set to false, the response is written only once.
This API is equivalent to the docker stats $CID command. It calls the ContainerStats() method (line 19 of /daemon/stats.go in the Docker source tree) and starts a collector:
daemon.SubscribeToContainerStats(name)
The collector periodically writes data to the update variable. A goroutine is then started to read that data, which is obtained in two ways:
update := v.(*execdriver.ResourceStats)
and
nwStats, err := daemon.getNetworkStats(name)
So the network state is read separately; but where are CPU and memory read? Following the calls step by step, we arrive at /daemon/execdriver/driver_linux.go line 112:
func Stats(containerDir string, containerMemoryLimit int64, machineMemory int64) (*ResourceStats, error)
This function makes it clear that "getting the state of a container" really just means reading files. Take a container that is already running under Docker:
cat /run/docker/execdriver/native/$containerID/state.json
cStats, err := mgr.GetStats() fetches the monitoring data. CPU status information, for example, is stored under a path like:
cd /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-9b479ceaf3bef1caee2c37dfdc982a968144700a2c42991b238b938788a8a91f.scope
General memory information, for instance, lives in:
cat /sys/fs/cgroup/memory/system.slice/docker-9b479ceaf3bef1caee2c37dfdc982a968144700a2c42991b238b938788a8a91f.scope/memory.stat
Working out what each of these figures means is left to you.
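memory.stat is a plain list of "name value" pairs, one per line, so reading it is straightforward. A minimal sketch, with a made-up sample of the file's contents:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat turns the contents of a cgroup memory.stat file
// into a map of counter name -> value (bytes or event counts).
func parseMemoryStat(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		stats[fields[0]] = v
	}
	return stats, sc.Err()
}

func main() {
	// Hypothetical sample of a container's memory.stat.
	sample := "cache 8192\nrss 4096\npgfault 120\n"
	stats, err := parseMemoryStat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("rss bytes:", stats["rss"])
}
```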
There is one more piece of data left, which is the focus of our discussion: network IO.
Back in getNetworkStats(name string): each Docker container gets a virtual network interface whose name starts with veth followed by a random hexadecimal suffix. We just need to find that interface and then, as with CPU and memory above, read the corresponding files. The interface name, however, appears to live only in memory; at least I do not see Docker storing it anywhere else. In the code, nw, err := daemon.netController.NetworkByID(c.NetworkSettings.NetworkID) returns the network, which has a Statistics() method. Tracing further, Docker calls a vendored package, github.com/docker/libcontainer, whose statistics method runs a cat command (sandbox/interface_linux.go line 174 in that package):
data, err := exec.Command("cat", netStatsFile).Output()
Yes, reading files again. This time it is /sys/class/net/, which contains a subdirectory for each network interface, including the one generated when Docker started the container.
Personal guess: Docker keeps the container-to-interface mapping in memory. From that mapping we can find the interface's statistics (the data is generated by the kernel and written to files on the machine).
Unifying all of this data is exactly what the docker stats command does.
How Kubernetes calls these monitoring functions
Kubernetes uses the cAdvisor component for monitoring. Because Kubernetes records container information (but not the container-to-interface mapping), the cAdvisor instance running on each node does not need docker stats to obtain container CPU and memory usage. But what about network IO data?
We know that every pod Kubernetes deploys also runs a pause container. And yes, network IO is recorded against the pause container; you can verify this in your own Kubernetes environment.
So as long as we have the containerID of the pause container, we can use the method above to read its network IO.
Because cAdvisor was not designed specifically for Kubernetes, it does not consider the pause container when fetching network IO. As a result, the current cAdvisor has no way to obtain a container's network IO.
Therefore, if you want to add network IO monitoring to a Kubernetes cluster, the cAdvisor bundled with Kubernetes is the place to start.
cAdvisor
How does cAdvisor get network IO? First, in Kubernetes (version 1.0.6), /pkg/kubelet/cadvisor/cadvisor_linux.go line 51, the New(port int) method is the entry point through which the kubelet calls cAdvisor. It instantiates a cadvisorClient and calls its Start() method, which leads us into the cAdvisor project (github.com/google/cadvisor/manager/manager.go line 195).
There we see the line
err := docker.Register(self, self.fsInfo)
Make a note of it for later, and continue reading:
glog.Infof("Starting recovery of all containers")
err = self.detectSubcontainers("/")
Here the program checks all running containers, compares them with those recorded in memory, and creates a new handler for each new container: a Container struct (abridged here), on which cont.Start() is then called (github.com/google/cadvisor/manager/manager.go line 766).
Tracing further, we find that it collects monitoring information once per second, via
stats, statsErr := c.handler.GetStats()
(cadvisor/manager/container.go line 401). handler is an interface, so we need to know what it was assigned. Without belaboring the details, let's go straight to the key method:
func (self *dockerContainerHandler) GetStats() (*info.ContainerStats, error)
(cadvisor/container/docker/handler.go line 305)
It calls again:
func GetStats(cgroupManager cgroups.Manager, networkInterfaces []string) (*info.ContainerStats, error)
(cadvisor/container/libcontainer/helpers.go line 77)
This is where the monitoring information is actually obtained. It calls the method of the same name in the libcontainer package, which ultimately means importing the third-party package github.com/docker/libcontainer/cgroups. Sound familiar? It is the same package docker stats uses to obtain its data. We find the function in that package, at libcontainer/cgroups/fs/apply_raw.go line 148:
func (m *Manager) GetStats() (*cgroups.Stats, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	stats := cgroups.NewStats()
	for name, path := range m.Paths {
		sys, ok := subsystems[name]
		if !ok || !cgroups.PathExists(path) {
			continue
		}
		if err := sys.GetStats(path, stats); err != nil {
			return nil, err
		}
	}
	return stats, nil
}
The libcontainer/cgroups/fs directory contains cpu.go, memory.go, and so on. Each file implements a GetStats() method through which the corresponding data is retrieved.
Now go back to the cAdvisor project, cadvisor/container/libcontainer/helpers.go line 77, and look a little further down:
// TODO(rjnagal): Use networking stats directly from libcontainer.
stats.Network.Interfaces = make([]info.InterfaceStats, len(networkInterfaces))
for i := range networkInterfaces {
	interfaceStats, err := sysinfo.GetNetworkStats(networkInterfaces[i])
	if err != nil {
		return stats, err
	}
	stats.Network.Interfaces[i] = interfaceStats
}
Here the official code again resorts to other means to find the network data. I do not know why, and it does not matter which system file it reads, because the data still cannot be obtained: no network interface is found for the container. The interface lookup is in cadvisor/container/docker/handler.go line 311:
var networkInterfaces []string
if len(config.Networks) > 0 {
	// ContainerStats only reports stat for one network device.
	// TODO(vmarmol): Handle multiple physical network devices.
	for _, n := range config.Networks {
		// Take the first non-loopback.
		if n.Type != "loopback" {
			networkInterfaces = []string{n.HostInterfaceName}
			break
		}
	}
}
stats, err := containerLibcontainer.GetStats(self.cgroupManager, networkInterfaces)
There is clearly room for improvement here (at least in a Kubernetes environment).
Note: in newer releases, cAdvisor wraps github.com/opencontainers instead; see github.com/opencontainers/runc/libcontainer/container_linux.go line 151. In addition, the correct way to obtain network monitoring data is to read /proc/$pid/net/dev for the pause container.
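A sketch of reading that file format: /proc/$pid/net/dev has two header lines followed by one line per interface, with sixteen counters split between receive and transmit. The sample content below is made up:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// InterfaceStats holds the counters we care about from /proc/<pid>/net/dev.
type InterfaceStats struct {
	RxBytes, RxPackets, TxBytes, TxPackets uint64
}

// parseNetDev parses the text of a /proc/<pid>/net/dev file.
// Each data line is "<iface>: <16 counters>"; receive bytes/packets
// are columns 0 and 1, transmit bytes/packets are columns 8 and 9.
func parseNetDev(content string) (map[string]InterfaceStats, error) {
	out := make(map[string]InterfaceStats)
	sc := bufio.NewScanner(strings.NewReader(content))
	line := 0
	for sc.Scan() {
		line++
		if line <= 2 { // skip the two header lines
			continue
		}
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) < 16 {
			continue
		}
		var vals [16]uint64
		for i := range vals {
			v, err := strconv.ParseUint(fields[i], 10, 64)
			if err != nil {
				return nil, err
			}
			vals[i] = v
		}
		out[strings.TrimSpace(name)] = InterfaceStats{
			RxBytes: vals[0], RxPackets: vals[1],
			TxBytes: vals[8], TxPackets: vals[9],
		}
	}
	return out, sc.Err()
}

func main() {
	// Hypothetical sample in /proc/net/dev format.
	sample := `Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
    lo:  500  5 0 0 0 0 0 0  500  5 0 0 0 0 0 0
`
	stats, err := parseNetDev(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("eth0 rx bytes:", stats["eth0"].RxBytes)
}
```

Reading the file through /proc/$pid (with $pid being the pause container's process) gives the counters as seen from inside that container's network namespace.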
Disk monitoring
When its main function initializes, cAdvisor uses init methods to build an in-memory model of the machine's file systems (which disks exist, which directories are mounted, and their major and minor device numbers). cAdvisor then monitors machine performance by watching disk usage and disk reads and writes.
Monitor disk usage
cAdvisor executes the syscall.Statfs Linux system call to obtain the total disk size, available space, total and used inodes, and so on.
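A minimal sketch of that call (Linux only); the totals are derived from the block counts in the returned Statfs_t:

```go
package main

import (
	"fmt"
	"syscall"
)

// fsUsage returns (total, free) bytes for the filesystem containing path,
// the way disk-usage monitoring derives them from statfs(2):
// total = Blocks * Bsize, free = Bfree * Bsize.
func fsUsage(path string) (total, free uint64, err error) {
	var buf syscall.Statfs_t
	if err := syscall.Statfs(path, &buf); err != nil {
		return 0, 0, err
	}
	bsize := uint64(buf.Bsize)
	return buf.Blocks * bsize, buf.Bfree * bsize, nil
}

func main() {
	total, free, err := fsUsage("/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%d free=%d used=%d\n", total, free, total-free)
}
```

Inode totals and usage come from the Files and Ffree fields of the same struct.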
Monitoring disk read and write status
It reads /proc/diskstats on the machine, which looks like this:
cat /proc/diskstats
254 0 vda 48560 0 994278 64304 2275295 70286 18962312 6814364 0 1205480 6877464
254 1 vda1 48177 0 991034 64008 1865714 70286 18962304 6777592 0 1170880 6840740
254 16 vdb 700 0 4955 284 0 0 0 0 0 284 284
From these figures, the read/write counts and read/write rates of each storage device can be calculated. (The meaning of each field and the calculations are explained at www.udpwork.com/item/12931….)
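A sketch of parsing those lines, using the standard /proc/diskstats field layout (device name in column 3, then reads completed, reads merged, sectors read, time reading, writes completed, writes merged, sectors written, and so on):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// DiskStats carries the fields needed to compute read/write counts
// and, by differencing two samples over time, read/write rates.
type DiskStats struct {
	ReadsCompleted, SectorsRead, WritesCompleted, SectorsWritten uint64
}

// parseDiskStats parses /proc/diskstats content. Per line:
// major minor device reads reads_merged sectors_read ms_reading
// writes writes_merged sectors_written ms_writing ...
func parseDiskStats(content string) (map[string]DiskStats, error) {
	out := make(map[string]DiskStats)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 10 {
			continue
		}
		nums := make([]uint64, 0, 7)
		for _, s := range f[3:10] {
			v, err := strconv.ParseUint(s, 10, 64)
			if err != nil {
				return nil, err
			}
			nums = append(nums, v)
		}
		out[f[2]] = DiskStats{
			ReadsCompleted:  nums[0],
			SectorsRead:     nums[2],
			WritesCompleted: nums[4],
			SectorsWritten:  nums[6],
		}
	}
	return out, sc.Err()
}

func main() {
	// The sample lines shown earlier in this section.
	sample := `254 0 vda 48560 0 994278 64304 2275295 70286 18962312 6814364 0 1205480 6877464
254 16 vdb 700 0 4955 284 0 0 0 0 0 284 284
`
	stats, err := parseDiskStats(sample)
	if err != nil {
		panic(err)
	}
	// Sector counts here are in 512-byte units, so bytes = sectors * 512.
	fmt.Println("vda bytes read:", stats["vda"].SectorsRead*512)
}
```

Rates follow by sampling twice and dividing the counter deltas by the interval.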
Exposed interfaces
When we call the cAdvisor endpoint /api/v2.1/machinestats, the returned JSON contains a filesystem field with both disk read/write performance and disk usage data.
Defects
As mentioned above, cAdvisor builds its abstract model of the machine's file systems during initialization, but that model is never re-checked or updated afterwards. When a data disk is mounted dynamically (for example, a pod PV backed by a Ceph RBD), cAdvisor is unaware of the newly mounted disk and produces no monitoring data for it.
There are currently no community optimizations for this feature.
Prometheus
Prometheus is mentioned here to show how the performance data above can be collected and used more efficiently.
When the kubelet starts, it listens on port 10250 and serves a /metrics URL as well as /metrics/cadvisor, exposing both the kubelet's own workflow metrics and the performance data provided by cAdvisor.
This information is returned to clients in Prometheus metrics format. It can be displayed with the corresponding Grafana components, and analyzed and alerted on with Prometheus.