“This is the 17th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”
This article tries out Chaos Mesh, one of the open source tools for chaos engineering. Chaos Mesh aims to be a universal chaos testing tool.
Chaos Mesh is designed to run on Kubernetes (K8s) and builds on its cloud native capabilities.
Basic workflow principles of Chaos Mesh
The schematic diagram shows the general workflow:
- Users create or update Chaos objects using YAML files or a K8s client.
- Chaos Mesh watches the API Server for create, update, or delete events on Chaos objects, and the controller-manager, chaos-daemon, and sidecar collaborate to provide the injection capability.
- Admission webhooks are used to receive HTTP callbacks and provide status information.
Chaos Mesh features
With that in mind, let's try it out in detail.
Chaos Mesh installation
Prerequisites:
- A K8s cluster with Helm 3 installed
The Chaos Mesh installation is relatively simple, and the steps are as follows:
[root@s5 ChaosMesh]# helm repo add chaos-mesh https://charts.chaos-mesh.org
[root@s5 ChaosMesh]# kubectl create ns chaos-testing
[root@s5 ChaosMesh]# helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing
Check the installation result:
[root@s5 ChaosMesh]# kubectl get pods --namespace chaos-testing -l app.kubernetes.io/instance=chaos-mesh
NAME READY STATUS RESTARTS AGE
chaos-controller-manager-58bc5ff9d8-bvwht 1/1 Running 0 99s
chaos-daemon-5bzjd 1/1 Running 0 99s
chaos-daemon-jjtnb 1/1 Running 0 99s
chaos-dashboard-5878548c46-rnz47 1/1 Running 0 99s
[root@s5 ChaosMesh]#
The controller manager, daemon, and dashboard pods were all created and are running normally.
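Instead of re-running the check by hand, the wait can be scripted. The following is a minimal sketch rather than part of the official install steps: the helper name `wait_for_chaos_mesh` and the 120s default timeout are assumptions made for illustration, while the namespace and label selector come from the commands above.

```shell
# Block until every Chaos Mesh pod reports Ready, or fail after the timeout.
# wait_for_chaos_mesh is a hypothetical helper; 120s is an arbitrary default.
wait_for_chaos_mesh() {
  wait_timeout="${1:-120s}"
  kubectl wait --for=condition=Ready pod \
    --namespace chaos-testing \
    -l app.kubernetes.io/instance=chaos-mesh \
    --timeout="$wait_timeout"
}
```

Calling `wait_for_chaos_mesh 180s` returns a non-zero exit code if any pod is still not Ready after three minutes, which makes it easy to chain with later steps.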
If you are interested, you can also install a simple test sample provided by ChaosMesh. Run the following command.
[root@s5 ChaosMesh]# curl -sSL https://mirrors.chaos-mesh.org/v1.2.1/web-show/deploy.sh | bash
Note: By default, this example is installed into the default namespace.
Chaos Mesh access
1. Look up the NodePort of the Chaos Mesh Dashboard service, then open the dashboard in a browser at a node's IP address and that port:
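One way to find the address from the command line is sketched below. It assumes the dashboard service is named `chaos-dashboard`, lives in the `chaos-testing` namespace used above, and exposes the web UI on its first port; the helper name `dashboard_url` is made up for this example, and any node IP reachable from your machine would work just as well as the first one.

```shell
# Print the dashboard address as http://<node-ip>:<node-port>.
# Assumptions: service name chaos-dashboard, first service port is the web UI.
dashboard_url() {
  node_port="$(kubectl get svc chaos-dashboard --namespace chaos-testing \
    -o jsonpath='{.spec.ports[0].nodePort}')"
  node_ip="$(kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
  echo "http://${node_ip}:${node_port}"
}
```

Running `dashboard_url` prints the address to paste into the browser.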
The login page walks you through generating a token. If you want to operate on the entire K8s cluster, select Cluster Scoped as the scope and Manager as the role. The corresponding RBAC manifest is generated below these options, and you can apply it directly.
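For orientation, a cluster-scoped manager role might look roughly like the manifest below. This is a sketch under assumptions, not a copy of what the dashboard generates: every object name is hypothetical, and the exact rules in your generated manifest may differ, so prefer the dashboard's own output when applying.

```shell
# Write an illustrative cluster-scoped "manager" RBAC manifest to rbac.yaml.
# All object names are hypothetical; the dashboard generates its own.
cat > rbac.yaml <<'EOF'
kind: ServiceAccount
apiVersion: v1
metadata:
  namespace: default
  name: account-cluster-manager-example
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: role-cluster-manager-example
rules:
  # Read access to pods and namespaces so experiment targets can be listed.
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "watch", "list"]
  # Full management of Chaos Mesh objects.
  - apiGroups: ["chaos-mesh.org"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: bind-cluster-manager-example
subjects:
  - kind: ServiceAccount
    name: account-cluster-manager-example
    namespace: default
roleRef:
  kind: ClusterRole
  name: role-cluster-manager-example
  apiGroup: rbac.authorization.k8s.io
EOF
```

The file can then be applied with `kubectl apply -f rbac.yaml`. The dashboard asks for the token of the service account it creates.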
2. The following page is displayed after login.
Simulation of CPU load using Chaos Mesh
1. Click NEW EXPERIMENT and select STRESS TEST (note that this is not stress testing in the performance testing sense).
2. Enter the number of CPU workers and the CPU load percentage. Then click Submit.
3. Then choose the test target. Here, just like any other chaos tool, we use a label selector. Then click Submit twice.
4. Then check the CPU usage on the worker node where the pod is located. You get the following results.
5. Check the processes on the corresponding worker, and you can see the following information.
top - 02:38:38 up 35 days, 12:33,  0 users,  load average: 5.07, 4.08, 2.55
Tasks:   7 total,   1 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu0  : 29.2 us,  3.0 sy,  0.0 ni, 67.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu1  : 34.0 us,  4.4 sy,  0.0 ni, 61.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   8008964 total,  7834456 used,   174508 free,    32984 buffers
KiB Swap:        0 total,        0 used,        0 free.  1203140 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   11 root      20   0   59088   3980   1096 S  45.3  0.0   7:45.77 stress-ng-cpu
    1 root      20   0    4436    652    548 S   0.0  0.0   0:00.00 sh
    6 root      20   0   17984   1448   1164 S   0.0  0.0   0:00.00 run.sh
    7 root      20   0   41508   2500   1476 S   0.0  0.0   0:03.83 redis-server
   10 root      20   0   58444   3864   3512 S   0.0  0.0   0:00.00 stress-ng
   12 root      20   0   19356   3148   1488 S   0.0  0.0   0:00.01 bash
   34 root      20   0   19896   1396   1004 R   0.0  0.0   0:00.00 top
You can see that the tool directly starts a process called stress-ng-cpu in the worker. As the name suggests, the process is launched with the stress-ng tool.
This logic is the same as in Chaos Toolkit and ChaosBlade: start a new process in the worker and let it consume CPU.
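The dashboard steps above end up creating a Chaos object, so the same experiment can also be written as YAML. The manifest below is a minimal sketch under assumptions: the object name, the target namespace, the `app: redis` label, and the worker and load values are illustrative and not taken from the experiment shown in this article.

```shell
# Write an illustrative StressChaos manifest to stress-cpu.yaml.
cat > stress-cpu.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-example   # hypothetical name
  namespace: chaos-testing
spec:
  mode: one                  # stress one of the matching pods
  selector:
    namespaces:
      - default              # assumed namespace of the target pod
    labelSelectors:
      app: redis             # hypothetical label; use your target's label
  stressors:
    cpu:
      workers: 2             # number of CPU stress workers
      load: 50               # target CPU load percentage per worker
EOF
```

Applying the file with `kubectl apply -f stress-cpu.yaml` should start the experiment, and `kubectl delete -f stress-cpu.yaml` should stop it. How the experiment duration is configured differs between Chaos Mesh versions, so it is left out here.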
Conclusion
I'll stop here for this article and continue organizing the material in later ones. While sorting through these tools, I found that the technology stack of chaos engineering is much smaller than that of performance engineering, so the work goes easily and happily. It shows how important a solid technical foundation is.
I leave you with two questions to ponder:
- In chaos engineering, which production scenarios can be covered by simulating CPU utilization with this logic? Which production scenarios can't be covered?
- In the scenarios that are covered, what does the abnormal response look like at the system level, given that the load comes from a newly started process?