Author | Xiao Changjun (Qionggu), technical expert, Alibaba Cloud Intelligence Business Group

Introduction: As cloud native systems evolve, keeping them stable is becoming a major challenge. Chaos engineering, built on the idea of antifragility, injects faults into a system to expose problems early and improve its fault tolerance. ChaosBlade runs chaos experiments through declarative configuration, which makes them simple and efficient to execute. This article focuses on ChaosBlade and its practice in cloud native experimental scenarios.

ChaosBlade introduction

ChaosBlade is an open source chaos experiment execution tool from Alibaba, built on a chaos experiment model. It offers a rich set of scenarios, is simple to use, and makes it easy to add new experimental scenarios. Soon after it was open-sourced, ChaosBlade was added to the CNCF Landscape and became one of the mainstream chaos tools.

Experimental scenarios

Currently, the supported experimental scenarios are as follows:

  • Basic resource scenarios: CPU load, memory usage, disk I/O load, disk usage, network delay, network packet loss, network blocking, unreachable domain names, shell script tampering, process killing, process hang, and machine restart.

  • Application service scenarios: experiments in Java and C++ applications. Java has a rich set of supported components, such as Dubbo, RocketMQ, HttpClient, Servlet, and Druid, and supports writing Java or Groovy scripts to implement complex experimental scenarios.

  • Container service scenarios: Kubernetes and Docker are supported, covering Node, Pod, and Container resources, for example Pod network delay and packet loss.

Chaos experimental model

All of the scenarios above follow the chaos experimental model, which has four layers:

  • Target: the experiment target, that is, the component on which the experiment takes place, such as a container or an application framework (Dubbo, Redis).
  • Scope: the scope in which the experiment is carried out, that is, the specific machine or cluster on which it is triggered.
  • Matcher: the experiment's matching rules, defined according to the configured Target. Multiple matching rules can be configured, and each Target may have its own matching conditions. For Dubbo, in the RPC field, rules can match the service offered by a provider or the service called by a consumer; for Redis, in the caching field, rules can match SET and GET operations.
  • Action: the fault to simulate. For a disk, this might be filling the disk or driving heavy read/write I/O. For applications, common actions can be abstracted as delays, exceptions, returning specified values (error codes, large objects, etc.), parameter tampering, and repeated calls.

For example, suppose an application on the machine with IP 10.0.0.1 calls the Dubbo service com.example.HelloService@1.0.0 and the call should be delayed by 3s. In terms of the model, this is an experiment on the Dubbo component (Target), carried out on host 10.0.0.1 (Scope), that delays calls (Action) to the service com.example.HelloService@1.0.0 (Matcher) by 3s. The corresponding chaosblade command is:

blade create dubbo delay --time 3000 --service com.example.HelloService --version 1.0.0

The model therefore expresses an experimental scenario simply, clearly, and in a way that is easy to understand. The cloud native experimental scenarios described below are defined with the same model.
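
To make the four layers concrete, here is a minimal Python sketch that holds the Dubbo example as data and renders it as command-line arguments. The Experiment class, its field names, and the rendering rule are illustrative assumptions, not ChaosBlade's own implementation; the Scope is kept only as a label because the command runs on the host itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """One chaos experiment described with the four-layer model (hypothetical class)."""
    target: str                      # Target: component under test, e.g. "dubbo"
    action: str                      # Action: fault to simulate, e.g. "delay"
    scope: str = "host"              # Scope: where it runs; implicit for a local command
    action_flags: Dict[str, str] = field(default_factory=dict)  # parameters of the Action
    matchers: Dict[str, str] = field(default_factory=dict)      # Matcher rules

    def to_command(self) -> List[str]:
        """Render the experiment as blade arguments (illustrative mapping only)."""
        args = ["blade", "create", self.target, self.action]
        # Action parameters first, then matchers, each as a --name value pair.
        for name, value in {**self.action_flags, **self.matchers}.items():
            args += [f"--{name}", value]
        return args


exp = Experiment(
    target="dubbo",
    action="delay",
    scope="10.0.0.1",
    action_flags={"time": "3000"},
    matchers={"service": "com.example.HelloService", "version": "1.0.0"},
)
print(" ".join(exp.to_command()))
```

Keeping matchers separate from action parameters mirrors the model: the matchers select which calls are affected, while the action parameters say what happens to them.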

Cloud native experimental scenarios

Implementation scheme

Following the experimental model above, each chaos experiment scenario is defined as a resource in Kubernetes and managed by a custom controller. Experiments can be run either through YAML configuration or directly with the blade command.

The ChaosBlade Operator defines the resource controller and deploys a chaosblade-tool Pod on every node as a DaemonSet to carry out chaos experiments. For Node scenarios, the experiment is executed inside the chaosblade-tool Pod deployed on that node; for Container scenarios, the controller copies the ChaosBlade package into the target Container and executes it there.

Usage

Install the necessary components

To install the ChaosBlade Operator, download chaosblade-operator-0.0.1.tgz and run the following command:

helm install --namespace kube-system --name chaosblade-operator chaosblade-operator-0.0.1.tgz

This installs into the kube-system namespace. Once the ChaosBlade Operator starts, it deploys a chaosblade-tool Pod on each node and one chaosblade-operator Pod. To view the installation result, run the following command:

kubectl get pod -n kube-system -o wide | grep chaosblade

Perform experiment

There are two execution modes:

  • One is to write a YAML configuration and apply it with kubectl.
  • The other is to run the blade command from the ChaosBlade package directly.

The following example drives the CPU load of a specified node to 80%.

YAML configuration mode

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "80"

Save the configuration above as chaosblade_cpu_load.yaml and run the following command to execute the scenario:

kubectl apply -f chaosblade_cpu_load.yaml
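
When the node name or load percentage changes from run to run, the same resource can be generated rather than edited by hand. The sketch below builds the cpu-load manifest as a plain Python dict and prints it as JSON, which kubectl apply -f accepts as well as YAML. The function name cpu_load_manifest and its parameters are hypothetical; the fields are copied from the YAML in this article.

```python
import json


def cpu_load_manifest(node_name: str, cpu_percent: int, name: str = "cpu-load") -> dict:
    """Build the ChaosBlade resource from this article as a dict (hypothetical helper)."""
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    return {
        "apiVersion": "chaosblade.io/v1alpha1",
        "kind": "ChaosBlade",
        "metadata": {"name": name},
        "spec": {
            "experiments": [
                {
                    "scope": "node",
                    "target": "cpu",
                    "action": "fullload",
                    "desc": "increase node cpu load by names",
                    "matchers": [
                        {"name": "names", "value": [node_name]},
                        # Matcher values are lists of strings, as in the YAML.
                        {"name": "cpu-percent", "value": [str(cpu_percent)]},
                    ],
                }
            ]
        },
    }


manifest = cpu_load_manifest("cn-hangzhou.192.168.0.205", 80)
# Save this output to a file and pass it to kubectl apply -f.
print(json.dumps(manifest, indent=2))
```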

You can view the execution status of each experiment by running the following command:

kubectl get blade cpu-load -o json

For more configuration examples, see the other experimental scenarios.

blade command execution mode

Download the ChaosBlade toolkit and unpack it; it is then ready to use. To run the same scenario with the blade command:

blade create k8s node-cpu fullload --names cn-hangzhou.192.168.0.205 --cpu-percent 80 --kubeconfig ~/.kube/config

Running the blade command returns the result of the experiment.

Modify the experiment

A scenario can be updated dynamically through its YAML configuration file. For example, to adjust the CPU load to 60%, change the cpu-percent value from 80 to 60:

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "60"

Then run kubectl apply -f chaosblade_cpu_load.yaml to apply the update.
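
If the manifest is kept as data, the same update can be scripted. The sketch below assumes the matchers layout of the cpu-load manifest shown earlier in this article; with_matcher is a hypothetical helper that returns a modified copy, so the original manifest stays available for comparison.

```python
import copy


def with_matcher(manifest: dict, matcher_name: str, value: str) -> dict:
    """Return a copy of the manifest with one matcher value replaced (hypothetical helper)."""
    updated = copy.deepcopy(manifest)
    found = False
    for experiment in updated["spec"]["experiments"]:
        for matcher in experiment.get("matchers", []):
            if matcher["name"] == matcher_name:
                matcher["value"] = [value]
                found = True
    if not found:
        raise KeyError(f"no matcher named {matcher_name!r}")
    return updated


# Trimmed-down version of the cpu-load manifest from this article.
original = {
    "apiVersion": "chaosblade.io/v1alpha1",
    "kind": "ChaosBlade",
    "metadata": {"name": "cpu-load"},
    "spec": {"experiments": [{
        "scope": "node", "target": "cpu", "action": "fullload",
        "matchers": [
            {"name": "names", "value": ["cn-hangzhou.192.168.0.205"]},
            {"name": "cpu-percent", "value": ["80"]},
        ],
    }]},
}

lowered = with_matcher(original, "cpu-percent", "60")
print(lowered["spec"]["experiments"][0]["matchers"][1])
```

Because metadata.name is unchanged, applying the modified manifest updates the existing experiment rather than creating a second one.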

Stop the experiment

You can stop the experiment in one of three ways:

Stop by experiment resource name

For example, for the cpu-load experiment above, run the following command to stop it:

kubectl delete chaosblade cpu-load

Stop through the YAML configuration file

To delete the experiment by pointing kubectl at the YAML file that created it, run the following command:

kubectl delete -f chaosblade_cpu_load.yaml

Stop with the blade command

This method only works for experiments created with the blade command. Stop the experiment as follows:

blade destroy <UID>

<UID> is the result returned by the blade create command. If you no longer have it, run blade status --type create to look it up.
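
In a script, the UID can be captured at creation time so that it never has to be looked up later. The sketch below assumes that blade create prints a JSON object with a success flag and the UID in a result field; that response shape and the sample UID are assumptions for illustration, so check the output of your own blade version before relying on it.

```python
import json


def extract_uid(blade_output: str) -> str:
    """Pull the experiment UID out of the JSON printed by blade create (assumed shape)."""
    response = json.loads(blade_output)
    if not response.get("success"):
        raise RuntimeError(f"experiment was not created: {response}")
    return response["result"]


# Hypothetical sample output; the UID value is made up for illustration.
sample = '{"code": 200, "success": true, "result": "0123456789abcdef"}'
uid = extract_uid(sample)
print(f"blade destroy {uid}")
```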

Uninstall the ChaosBlade Operator

Run helm del --purge chaosblade-operator to uninstall it. This stops all experiments and deletes all resources that were created.

Conclusion

Building on the chaos experimental model, ChaosBlade integrates cleanly with Kubernetes resource management: it is simple to deploy, simple to use, and keeps experiments under control. ChaosBlade also implements executors for a number of domains on top of the same model, so new experimental scenarios are easy to add. See the project list in the appendix.

Community building

Since ChaosBlade was open-sourced, more than 30 contributors and many companies have gotten involved. Thank you all very much. More people are welcome to join in and make ChaosBlade more powerful, cover more scenarios, and become a stable, general-purpose chaos engineering tool for enterprises of all kinds.

Contributions can take the form of bug reports, code submissions, documentation, additional unit tests, participation in issue discussions, and so on. ChaosBlade believes that in the open source world, any help is a contribution.

Appendix

The list of projects is as follows:

  • ChaosBlade CLI (command entry point)
  • ChaosBlade experimental model definition
  • Base resource scenario executor
  • Docker scene executor
  • Kubernetes scenario executor
  • Java application scenario executor
  • C++ application scenario executor
