WeChat official account: follow "kernel prince" to learn more about databases and JVM internals. If you have any questions, you can also add me on WeChat: pigpdong[1]

High availability architecture is the core of service stability.

Chaos engineering

We can think of chaos engineering as running experiments to uncover unknown weaknesses in distributed systems. By applying principles of empirical inquiry, chaos engineers observe how systems react. Just as scientists do experiments to learn the laws of physics, chaos engineers do experiments to learn about systems.

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's ability to withstand turbulent, out-of-control conditions in production. It was first proposed by Netflix and related teams.

Many chaos engineering projects these days have "Monkey" in their name, after the pesky monkey that jumps up and down in your system until it crashes. Through chaos experiments we can learn where the system is fragile, discover these problems before they harm users, and improve the resilience of the whole system.

Comparison with fault injection and fault testing

Chaos engineering, fault injection, and fault testing overlap significantly in their concerns and tools. However, chaos engineering focuses on generating new, unpredictable information through practice; it helps us understand the system better and carries more uncertainty. Fault injection and fault testing focus on confirming expectations, much like an assert statement in a program, which throws an exception when the expectation is not met. A fault test breaks the system in one specific, intended way, but does not explore the stranger scenarios that might occur. In essence, it is the difference between tests, which are mainly used to verify, and experiments, which are used to discover new knowledge.

Sample inputs for a chaos experiment:

  • Simulate the failure of an entire IDC (data center)
  • Make calls to an interface return after a 5-second delay
  • Drive a CPU to full load
  • Have a function throw an exception or return a fixed value
  • Force NTP time out of sync
  • Produce I/O errors

Principles of chaos engineering

The following figure shows a possible classification of faults. In-process faults can be simulated with Java bytecode techniques, and out-of-process faults with operating-system-level tools. With the emergence of Serverless, Docker, and other new architectures and technologies, the mechanisms used to implement faults, and the carriers that host them, will also change.

Application scenarios of chaos engineering

  • Improve system fault tolerance and stability
  • Evaluate the system's disaster-tolerance red line
  • Verify the dynamic scaling capabilities of cloud services
  • Verify that monitoring and alerting are effective and that the metrics are comprehensive
  • Run surprise fault drills to improve the team's ability to locate and troubleshoot problems

A brief introduction to ChaosBlade

ChaosBlade is a chaos engineering tool that follows the experimental principles of chaos engineering, provides a rich set of fault scenarios, and helps distributed systems improve their fault tolerance and recoverability. It can inject low-level faults, and it is simple to operate, non-intrusive, and highly extensible. The GitHub address is github.com/chaosblade-… Unlike other Alibaba open source tools, it is hosted not under the Alibaba organization but under chaosblade-io.

The name combines "chaos" with "blade", as in the blade of a knife.

The following figure shows the development history of ChaosBlade. AHAS is a cloud product for fault drills that is built on ChaosBlade.

ChaosBlade components and ecosystem

ChaosBlade manages experiments from the command line, with subcommands such as create to create an experiment, destroy to destroy one, and query. In-process experiments currently target mainly the Java environment and work through JavaAgent bytecode injection (the same approach used by Arthas, a troubleshooting tool). Machine-level experiments are carried out through shell commands (including Docker and K8s).
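
The create / query / destroy lifecycle described above might look like the following sketch. It assumes the blade binary is on the PATH and that the subcommands and flags shown (cpu fullload, --timeout, status --type create) match your ChaosBlade version; check `blade help` before relying on them. The `run` helper, the `DRY_RUN` switch, and the `EXPERIMENT_UID` placeholder are hypothetical names introduced here so that the script only prints the commands by default.

```shell
#!/usr/bin/env bash
# Sketch of one experiment lifecycle: create -> inspect -> destroy.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
# Set DRY_RUN=0 on a machine where blade is installed to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: hypothetical helper that prints the command in dry-run mode
# and executes it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Create a CPU full-load experiment that stops by itself after 60 seconds.
#    blade reports a UID that identifies the new experiment.
run blade create cpu fullload --timeout 60

# 2. List the experiments that have been created, with their status.
run blade status --type create

# 3. Destroy the experiment early. EXPERIMENT_UID is a placeholder:
#    copy the real UID from the output of step 1.
EXPERIMENT_UID="${EXPERIMENT_UID:-REPLACE_WITH_UID}"
run blade destroy "$EXPERIMENT_UID"
```

Setting a timeout as well as destroying the experiment explicitly is deliberate: if the operator loses the session, the fault still ends on its own.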

ChaosBlade characteristics

  • Rich set of scenarios
  • Simple to use and easy to understand
  • Dynamic loading, non-intrusive
  • Easy to extend with new scenarios
  • Supports setting the running time of an experiment

ChaosBlade supported experiments

  • CPU load
  • Network delay, packet loss, and filtering
  • Domain name blocking
  • Disk filling and disk I/O load
  • Killing a process
  • Deleting containers and Pods
  • RPC call delay
  • Java method injection: throw a specified exception or set the return value
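
As a rough illustration, several of the machine-level scenarios listed above map to blade commands as sketched below. The helper `scenario_cmd` and its scenario names are hypothetical, and the targets (eth0, /home, the process name tomcat) and numeric values are placeholders. The flags reflect my understanding of the ChaosBlade CLI and may differ between versions, so confirm them with `blade create -h`. The helper only prints commands; it does not run them.

```shell
#!/usr/bin/env bash
# scenario_cmd NAME: hypothetical helper that prints (never runs) a blade
# command for one machine-level scenario. All targets are placeholders.
scenario_cmd() {
  case "$1" in
    cpu)          echo "blade create cpu fullload" ;;
    net-delay)    echo "blade create network delay --time 3000 --interface eth0" ;;
    net-loss)     echo "blade create network loss --percent 50 --interface eth0" ;;
    disk-fill)    echo "blade create disk fill --path /home --size 1024" ;;
    disk-burn)    echo "blade create disk burn --read --path /home" ;;
    kill-process) echo "blade create process kill --process tomcat" ;;
    *)            echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Print the whole table so the commands can be reviewed before use.
for s in cpu net-delay net-loss disk-fill disk-burn kill-process; do
  printf '%-13s %s\n' "$s" "$(scenario_cmd "$s")"
done
```

The JVM and RPC scenarios are left out because they need an agent attached to the target Java process before an experiment can be created.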

Usage example

docker pull registry.cn-hangzhou.aliyuncs.com/chaosblade/chaosblade-demo:latest
docker run -it registry.cn-hangzhou.aliyuncs.com/chaosblade/chaosblade-demo:latest
