Author: Ding Yuan, RadonDB Test Principal
Responsible for quality and performance testing and iterative verification of RadonDB cloud databases and containerized databases, with in-depth research on performance and high-availability solutions for cloud and containerized databases.
Following up on “Chaos Engineering Tool ChaosBlade Operator Series Introduction”, this installment uses the ChaosBlade Operator to test application scenarios for Node-class resources, including:
- CPU load scenario
- Network delay scenario
- Network packet loss scenario
- Kill the specified process
- Stop the specified process
| Experimental Environment
Test Object
The test object is a RadonDB MySQL containerized database deployed on the KubeSphere platform.
For details about how to deploy RadonDB MySQL, see Deploying the RadonDB MySQL Cluster in KubeSphere.
Environment Parameters
Cluster name | Host type | CPU | Memory | Total disk | Node count | Replica count | Shard count |
---|---|---|---|---|---|---|---|
KubeSphere | High availability | 8C | 16G | 500GB | 4 | – | – |
RadonDB MySQL | – | 4C | 16G | Pod: 50G, DataDir: 10G | 3 | 2 | 1 |
After the test environment is deployed, you can perform verification in the following five scenarios.
1. CPU load scenario
1.1 Test Objectives
Inject an 80% CPU load on a specified node and verify the result.
1.2 Starting the Test
Set the test parameter values in the YAML file.
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "worker-s001"      # name of the target node
    - name: cpu-percent
      value:
      - "80"               # node CPU load percentage
    - name: ip
      value:
      - "192.168.0.20"     # IP address of the target node
Select a node and change the names value in node_cpu_load.yaml.
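The start commands for this scenario are not shown in the article; assuming the manifest above is saved as node_cpu_load.yaml and keeps the metadata name cpu-load, it can presumably be applied and inspected the same way as the later scenarios:
$ kubectl apply -f node_cpu_load.yaml
$ kubectl get blade cpu-load -o json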
1.3 Test and Verification
On the target node, run the top command; you can see that the node's CPU load reaches 80%.
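As a non-interactive alternative (a convenience not shown in the original, assuming a standard Linux node with procps available), top can be run in batch mode and filtered to the CPU summary line:
$ top -bn1 | grep "Cpu(s)"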
2. Network delay scenario
2.1 Test Preparations
Log in to the node and run the ifconfig command to view the NIC information. The default NIC name used in this test is eth0.
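If ifconfig is not available on the node, an equivalent check (not part of the original steps) can be done with the ip tool:
$ ip -br link show     # list all NICs in brief form
$ ip addr show eth0    # confirm the address bound to eth0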
2.2 Test Objectives
Add a 3000 ms access delay to the specified node worker-s001, with the delay fluctuating by up to 1000 ms.
2.3 Starting the Test
Select a node and change the names value in delay_node_network_by_names.yaml to worker-s001.
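The content of delay_node_network_by_names.yaml is not reproduced in the article; a sketch of what it might look like, following the CPU-load manifest above and ChaosBlade's network delay parameters (interface, time, offset), is:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-node-network-by-names
spec:
  experiments:
  - scope: node
    target: network
    action: delay
    desc: "node network delay"
    matchers:
    - name: names
      value:
      - "worker-s001"   # target node name
    - name: interface
      value:
      - "eth0"          # NIC identified in 2.1
    - name: time
      value:
      - "3000"          # delay in milliseconds
    - name: offset
      value:
      - "1000"          # delay fluctuation in milliseconds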
Start testing.
kubectl apply -f delay_node_network_by_names.yaml
View experimental status.
kubectl get blade delay-node-network-by-names -o json
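If only the experiment state is needed, a jsonpath query keeps the output short (assuming the ChaosBlade CRD reports experiment results under status.expStatuses, as in the upstream chaosblade-operator):
$ kubectl get blade delay-node-network-by-names -o jsonpath='{.status.expStatuses[*].state}'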
2.4 Test and Verification
Access the Guestbook from the node.
$ time echo "" | telnet 192.168.0.18
echo "" 0.00s user 0.00s system 35% cpu 0.003 total
telnet 192.168.1.129 32436 0.01s user 0.00s system 0% cpu 3.248 total
The total time of about 3.2 s reflects the injected 3000 ms delay.
Stop the test. You can delete the experiment configuration or delete the blade resource directly.
kubectl delete -f delay_node_network_by_names.yaml
kubectl delete blade delay-node-network-by-names
3. Network packet loss scenario
3.1 Test Objective
Inject 100% packet loss on the specified node.
3.2 Starting the Test
Select a node and change the names value in loss_node_network_by_names.yaml.
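As with the delay scenario, the manifest itself is not reproduced here; a sketch based on ChaosBlade's network loss parameters (interface, percent) might look like:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-node-network-by-names
spec:
  experiments:
  - scope: node
    target: network
    action: loss
    desc: "node network loss"
    matchers:
    - name: names
      value:
      - "worker-s001"   # target node name
    - name: interface
      value:
      - "eth0"          # NIC to drop packets on
    - name: percent
      value:
      - "100"           # packet loss percentage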
Run the following command to start the test.
$ kubectl apply -f loss_node_network_by_names.yaml
Run the following command to view the experiment status.
kubectl get blade loss-node-network-by-names -o json
3.3 Test and Verification
The port used here is the Guestbook NodePort. Requests sent to the node under experiment receive no response, while access through nodes not under experiment works normally.
Obtain the node IP address.
$ kubectl get node -o wide
Access the Guestbook through the experimental node: it is inaccessible.
$ telnet 192.168.0.20
Access the Guestbook through a non-experimental node: access is normal.
$ telnet 192.168.0.18
In addition, you can access the address directly from the browser and verify the test results.
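For a scripted check instead of the browser (not part of the original steps; replace <nodePort> with the Guestbook NodePort reported by kubectl get svc), a curl with a connection timeout behaves the same way: it times out against the experimental node and succeeds against the others.
$ curl --connect-timeout 5 http://192.168.0.20:<nodePort>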
Stop the test. You can delete the experiment configuration or delete the blade resource directly.
kubectl delete -f loss_node_network_by_names.yaml
kubectl delete blade loss-node-network-by-names
4. Kill the specified process
4.1 Test Objective
Kill the MySQL process on the specified node.
4.2 Starting the Test
Select a node and change the names value in kill_node_process_by_names.yaml.
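The kill_node_process_by_names.yaml manifest is not reproduced in the article; a sketch based on ChaosBlade's process kill action (a process matcher naming the target process) might look like:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-node-process-by-names
spec:
  experiments:
  - scope: node
    target: process
    action: kill
    desc: "kill node process"
    matchers:
    - name: names
      value:
      - "worker-s001"   # target node name
    - name: process
      value:
      - "mysql"         # name of the process to kill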
Run the following command to start the test.
$ kubectl apply -f kill_node_process_by_names.yaml
Run the following command to view the experiment status.
kubectl get blade kill-node-process-by-names -o json
4.3 Test and Verification
Log in to the experimental node.
$ ssh 192.168.0.18
Check the MySQL process ID.
$ ps -ef | grep mysql
root 10913 10040 0 14:10 pts/0 00:00:00 grep --color=auto mysql
Run the command again; you can see that the process ID has changed.
$ ps -ef | grep mysql
The change in the MySQL process ID indicates that the process was killed and then restarted.
Stop the test. You can delete the experiment configuration or delete the blade resource directly.
kubectl delete -f kill_node_process_by_names.yaml
kubectl delete blade kill-node-process-by-names
5. Stop the specified process
5.1 Test Objectives
Suspend the MySQL process on the specified node.
5.2 Starting the Test
Select a node and change the names value in stop_node_process_by_names.yaml.
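The stop_node_process_by_names.yaml manifest is likewise not reproduced; a sketch, differing from the kill manifest only in the action and name, might look like:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: stop-node-process-by-names
spec:
  experiments:
  - scope: node
    target: process
    action: stop
    desc: "stop node process"
    matchers:
    - name: names
      value:
      - "worker-s001"   # target node name
    - name: process
      value:
      - "mysql"         # name of the process to suspend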
Run the following command to start the test.
$ kubectl apply -f stop_node_process_by_names.yaml
Run the following command to view the experiment status.
kubectl get blade stop-node-process-by-names -o json
5.3 Test and Verification
Log in to the experimental node.
$ ssh 192.168.0.18
Check the MySQL process.
$ ps -ef | grep mysql
root 10913 10040 0 14:10 pts/0 00:00:00 grep --color=auto mysql
Run the command again while the experiment is running.
$ ps -ef | grep mysql
Unlike the kill scenario, the MySQL process ID does not change: the process is suspended for the duration of the experiment (its state appears as T in the STAT column of ps aux) and resumes once the experiment is stopped.
Stop the test. You can delete the experiment configuration or delete the blade resource directly.
kubectl delete -f stop_node_process_by_names.yaml
kubectl delete blade stop-node-process-by-names
| Epilogue
Conducting chaos engineering experiments on KubeSphere Node resources with the ChaosBlade Operator leads to the following conclusions:
For Node-level resources, ChaosBlade still completes complex experiments with simple configuration and operation. Node-level faults of various kinds can be freely combined to verify the stability and availability of a Kubernetes cluster. And when a real fault does occur, because these fault situations have already been simulated, the source of the fault can be located quickly and handled calmly instead of in a panic.
Next up
The next installment will use the deployed ChaosBlade Operator tool to test and verify various scenarios for Pod-class resources.