What is Chaosblade?

Chaosblade is an experiment tool that follows the principles of Chaos Engineering and simulates common failure scenarios, helping to improve the recoverability and fault tolerance of distributed systems.

Chaosblade is built on nearly a decade of Alibaba’s failure testing and drill practice, combining the best ideas and practices from across the group’s businesses.

Currently, the supported scenarios include operating-system resources (CPU, disk, process, and network); Dubbo, MySQL, and Servlet; delaying or throwing exceptions from custom methods of Java application classes; and killing containers and Pods. You can run blade create -h to view the specific scenarios.

Okay, so the intro above was lifted from Chaosblade’s GitHub page.

GitHub homepage: github.com/chaosblade-…

To put it bluntly, Chaosblade is a fault simulation tool. It can simulate a fully loaded server CPU, a full disk, a slow network, a slow response from a Dubbo service, a JVM method throwing an exception, slow MySQL calls, and so on. This makes it very useful for large companies: by simulating all kinds of failures in advance, they can verify and improve the availability and stability of their systems.

How does Chaosblade work?

Getting started takes just two steps:

  1. Download the zip package and unzip it: github.com/chaosblade-…
  2. The unzipped directory contains a blade executable, the client tool provided by Chaosblade, which we use for fault simulation.

For details about the blade parameters, see the GitHub homepage; I will not cover them here. I mainly want to show how fault simulation is used in practice and what effect it has.

Here are six Chaosblade usage scenarios:

  • The server CPU is fully loaded
  • The server disk is full
  • A call to a Dubbo service times out
  • A method in the JVM throws an exception or has its return value modified
  • A MySQL call times out or throws an exception
  • The server network is slow

Scenario 1: The server CPU is fully loaded

Before the fault drill, run top -o cpu to check the system's CPU status.

Conduct a fault drill:

$ ./blade create cpu fullload
{"code": 200,"success":true,"result":"a0682a98d0d7d900"}

If the command returns success, the fault has been injected. Then run top -o cpu again.

From the results we can see that Chaosblade appears to saturate the server's CPU by having its own process consume it.
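The idea of loading the CPU from within a single process can be illustrated with a minimal Java sketch. This is not Chaosblade's actual implementation; the class name CpuBurnSketch and the burn method are hypothetical, and the sketch simply spins one busy-loop thread per requested core for a fixed duration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: not how Chaosblade itself is implemented.
final class CpuBurnSketch {

    // Spin `threads` busy-loop threads for about `millis` milliseconds and
    // return the total number of loop iterations performed.
    static long burn(int threads, long millis) {
        AtomicLong iterations = new AtomicLong();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                long local = 0;
                // Busy loop: never sleeps or blocks, so the core stays fully used.
                while (System.nanoTime() - deadline < 0) {
                    local++;
                }
                iterations.addAndGet(local);
            });
            worker.start();
            workers.add(worker);
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long total = burn(cores, 2_000); // load every core for about two seconds
        System.out.println("Busy-loop iterations across " + cores + " cores: " + total);
    }
}
```

While main is running, top should show this Java process near the top of the CPU column, much like the blade process in the drill above.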

Scenario 2: The server disk is full

To simulate a full disk, you only need to generate a very large file in some folder, so here we create a /bladedisk folder.

Before the fault drill, the size of the /bladedisk folder is:

$ du -sh /bladedisk/
  0B	/bladedisk/

To test the fault, run the following commands:

./blade create disk fill -d --mount-point /bladedisk --size 1024

Under normal circumstances, a file named chaos_filldisk.log.dat is created in /bladedisk, with a size of 1024 MB (the --size flag is given in megabytes).

I say "under normal circumstances" because I am running macOS, where the command above fails with an error. I have filed the error as a GitHub issue; interested readers can follow it there.

Trivia: when submitting the issue, I wrote it in Chinese, but it was automatically translated into English by ChaosBlade-bot.

You can try this on your own system. Once the issue is resolved, I will update and extend this article. For now, it is enough to know that Chaosblade can simulate this scenario and how it does so.
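The idea described in this scenario, consuming disk space by writing one large file, can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not Chaosblade's implementation; the class DiskFillSketch and its methods are hypothetical names, and the sketch writes to a temporary file that it deletes afterwards rather than to /bladedisk.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: not Chaosblade's implementation.
final class DiskFillSketch {

    // Write sizeBytes zero bytes to target and return the resulting file size.
    // Bytes are actually written, rather than only setting the file length,
    // because setting the length alone may produce a sparse file that uses
    // little real disk space.
    static long fill(Path target, long sizeBytes) {
        byte[] chunk = new byte[8192];
        try {
            try (OutputStream out = Files.newOutputStream(target)) {
                long remaining = sizeBytes;
                while (remaining > 0) {
                    int n = (int) Math.min(chunk.length, remaining);
                    out.write(chunk, 0, n);
                    remaining -= n;
                }
            }
            return Files.size(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fill a temporary file, report its size, and always clean it up.
    static long fillTempFile(long sizeBytes) {
        try {
            Path file = Files.createTempFile("fill-sketch", ".dat");
            try {
                return fill(file, sizeBytes);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 1 MiB keeps the demonstration harmless; a real drill would use far more.
        System.out.println("Wrote " + fillTempFile(1024L * 1024L) + " bytes");
    }
}
```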

Scenario 3: A call to a Dubbo service times out

The official demo provides:

  • dubbo-provider
  • dubbo-consumer

After downloading the service provider and service consumer JARs above, go to the download directory and run the following commands:

# Start dubbo-provider
nohup java -Djava.net.preferIPv4Stack=true -Dproject.name=dubbo-provider -jar dubbo-provider-1.0-SNAPSHOT.jar > provider.nohup.log 2>&1 &
# Wait 2 seconds, then start dubbo-consumer
nohup java -Dserver.port=8080 -Djava.net.preferIPv4Stack=true -Dproject.name=dubbo-consumer -jar dubbo-consumer-1.0-SNAPSHOT.jar > consumer.nohup.log 2>&1 &

nohup is a Unix command that keeps a process running after the terminal session ends; combined with the trailing &, it runs the Java processes in the background.

Once both are up and running, the service can be invoked by visiting the following URL:

http://localhost:8080/hello?msg=world

Normally, the request will be completed quickly and return:

{
  "date": "Wed Jul 03 16:33:10 CST 2019",
  "msg": "Dubbo Service: Hello world"
}

Conduct a fault drill:

$ ./blade prepare jvm --process dubbo.consumer
{"code": 200,"success":true,"result":"5cdbc31f46a3d621"}
$ ./blade create dubbo delay --time 3000 --service com.alibaba.demo.HelloService --methodname hello --consumer --process dubbo.consumer
{"code": 200,"success":true,"result":"3e705e8babe8a86c"}

The command above adds a 3-second delay whenever the consumer calls the hello method of the com.alibaba.demo.HelloService service. If we now visit the URL above again, the response takes noticeably longer than before.
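To confirm the injected delay with a number rather than by feel, the request can be timed. The following is a minimal sketch, assuming Java 11 or later (for java.net.http) and the demo consumer listening on localhost:8080 as above; the class DelayCheckSketch and its methods are hypothetical helpers, not part of Chaosblade or the demo.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for checking the effect of the delay experiment.
final class DelayCheckSketch {

    // Run the task and return the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return Duration.ofNanos(System.nanoTime() - start).toMillis();
    }

    // Call the demo endpoint used in this article and return the response body,
    // or a short error message if the consumer is not reachable.
    static String callHello() {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://localhost:8080/hello?msg=world"))
                    .GET()
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            return "request failed: " + e;
        }
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> System.out.println(callHello()));
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

Running it once before and once after the blade create dubbo delay command should show the elapsed time growing by roughly the configured 3000 ms.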

Dubbo experiments actually support finer-grained scenarios, because a Dubbo call involves two roles: the consumer and the provider. To delay a request that a consumer sends to a provider, we can inject the delay either on the provider side, for the specified service, or on the consumer side, when it calls that service. The command above injects the delay on the consumer side, but the same command can also target the provider side. Run the following command:

blade create dubbo delay --help

You will see the following information in the help:

Flags:
      --appname string          The consumer or provider application name
      --consumer                To tag consumer role experiment.
      --effect-count string     The count of chaos experiment in effect
      --effect-percent string   The percent of chaos experiment in effect
  -h, --help                    help for delay
      --methodname string       The method name
      --offset string           delay offset for the time
      --process string          Application process name
      --provider                To tag provider experiment
      --service string          The service interface
      --time string             delay time (required)
      --timeout string          set timeout for experiment
      --version string          the service version

Among these flags are --consumer and --provider, which select the side of the service call that the experiment controls. So to make an interface time out on the provider side, we can run the same kind of fault drill with --provider instead of --consumer.

As for the underlying mechanism, you would need to know more about Dubbo. Dubbo has a dynamic configuration feature, so my guess is that Chaosblade builds on it.

Scenario 4: A method in the JVM throws an exception or modifies the method return value

Chaosblade supports direct manipulation of methods in the JVM to throw exceptions or modify their return values.

Start with a MockJvm class:

package com;

import java.util.concurrent.TimeUnit;

public class MockJvm {
    // The method targeted by the fault experiments below.
    public String test() {
        return "test...";
    }

    public static void main(String[] args) throws InterruptedException {
        MockJvm testJVM = new MockJvm();

        // Call test() every three seconds and print either its return
        // value or the message of any exception it throws.
        while (true) {
            try {
                System.out.println(testJVM.test());
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            TimeUnit.SECONDS.sleep(3);
        }
    }
}

This class calls the test method every three seconds and prints its return value; if test throws an exception, the exception is caught and its message is printed instead. By default, test returns "test...". We run the class and keep it running. In normal operation, the console prints:

test...
test...
test...
test...

Method throws an exception

$ ./blade prepare jvm --process MockJvm
{"code": 200,"success":true,"result":"5ff98509d2334906"}
$ ./blade create jvm throwCustomException --process MockJvm --classname com.MockJvm --methodname test --exception java.lang.Exception
{"code": 200,"success":true,"result":"f9052478db2f7ffc"}

The command above makes the test method of the com.MockJvm class in the MockJvm process throw a java.lang.Exception. Once the command succeeds, the console of the program we left running starts printing the exception message:

test...
test...
test...
chaosblade-mock-exception
chaosblade-mock-exception

To undo the simulation, use the following command:

# f9052478db2f7ffc is the result value returned by the create command above
./blade destroy f9052478db2f7ffc

After the experiment is destroyed, the console resumes normal printing:

chaosblade-mock-exception
chaosblade-mock-exception
chaosblade-mock-exception
chaosblade-mock-exception
test...
test...

Modify the return value of the method

The return value of a method can be modified using the following command:

$ ./blade create jvm return --process MockJvm --classname com.MockJvm --methodname test --value hahaha...
{"code": 200,"success":true,"result":"9ffce12b1fdc2580"}

The console will print:

test...
test...
test...
hahaha...
hahaha...
hahaha...

You can see that the return value of the test method was successfully modified.
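To make the two effects above more concrete (throwing an exception from a method, or replacing its return value, and then undoing it), here is a minimal sketch using a java.lang.reflect.Proxy. It is a deliberately simplified stand-in: Chaosblade works on an already running JVM without any code changes, which this sketch does not reproduce. The class FaultProxySketch, the Greeter interface, and the inject/destroy method names are all hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical illustration of fault injection around a method call.
final class FaultProxySketch {

    // java.lang.reflect.Proxy can only wrap interfaces, so the sketch uses one.
    interface Greeter {
        String test();
    }

    private volatile String overrideValue;     // non-null: replace the return value
    private volatile RuntimeException failure; // non-null: throw instead of calling

    void injectReturn(String value) { overrideValue = value; }

    void injectException(RuntimeException e) { failure = e; }

    // Remove all injected faults, similar in spirit to "blade destroy".
    void destroy() {
        overrideValue = null;
        failure = null;
    }

    // Wrap the real object so that calls to test() pass through the fault rules.
    Greeter wrap(Greeter real) {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("test")) {
                        RuntimeException toThrow = failure;
                        if (toThrow != null) {
                            throw toThrow;
                        }
                        String value = overrideValue;
                        if (value != null) {
                            return value;
                        }
                    }
                    try {
                        return method.invoke(real, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow what the real method threw
                    }
                });
    }

    public static void main(String[] args) {
        FaultProxySketch faults = new FaultProxySketch();
        Greeter greeter = faults.wrap(() -> "test...");

        System.out.println(greeter.test());
        faults.injectReturn("hahaha...");
        System.out.println(greeter.test());
        faults.destroy();
        faults.injectException(new RuntimeException("chaosblade-mock-exception"));
        try {
            greeter.test();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
        faults.destroy();
        System.out.println(greeter.test());
    }
}
```

The design point the sketch shares with the drill is that the target method itself is never edited: the fault lives in a layer around the call and disappears as soon as it is destroyed.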

Scenario 5: A MySQL call times out or throws an exception

For MySQL, Chaosblade currently supports two scenarios: a statement execution is delayed, or it throws an exception. Both are injected at the JDBC level; the MySQL server itself is not affected.

Write a test class in JDBC:

package com;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.TimeUnit;

public class JDBCConnection {
    public static String url_encrypt = "jdbc:mysql://127.0.0.1:3306/test?useSSL=false";
    public static String user = "root";
    public static String password = "Nice89163";

    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(url_encrypt, user, password);
        Statement stmt = conn.createStatement();

        // Run the query every three seconds, printing how long it took
        // or the message of any exception it threw.
        while (true) {
            try {
                LocalDateTime before = LocalDateTime.now();
                ResultSet rs = stmt.executeQuery("select * from t_test");
                LocalDateTime after = LocalDateTime.now();
                // Duration stays correct across minute boundaries, unlike
                // subtracting the two getSecond() values.
                System.out.println("Execution time: " + Duration.between(before, after).getSeconds());
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            TimeUnit.SECONDS.sleep(3);
        }
    }
}

The JDBCConnection class executes SQL directly through JDBC and depends on the mysql-connector-java JAR. In my tests, the fault simulation worked with some versions of mysql-connector-java but not with others; I have not yet investigated why.

The test class runs a SELECT query in a loop. If an exception is thrown during the query, it is caught and its message is printed; otherwise the time taken to execute the SELECT statement is printed.

Start by running the class above. The console will keep printing:

Execution time: 0
Execution time: 0
Execution time: 0

Calling MySQL throws an exception

Run the following commands to start the fault simulation:

$ ./blade prepare jvm --process JDBCConnection
{"code": 200,"success":true,"result":"f278e66ddb1b4e11"}
$ ./blade create mysql throwCustomException --database test --host 127.0.0.1 --port 3306 --process JDBCConnection --sqltype select --table t_test --exception java.lang.Exception
{"code": 200,"success":true,"result":"ddd6799da50f9201"}

After the command is executed successfully, the console displays an exception:

Unexpected exception encountered during query.
Unexpected exception encountered during query.

To undo the simulation, use the following command:

./blade destroy ddd6799da50f9201

After the experiment is destroyed, the console resumes normal printing:

Unexpected exception encountered during query.
Unexpected exception encountered during query.
Unexpected exception encountered during query.
Execution time: 0
Execution time: 0

Calling MySQL adds latency

The following command adds a 4-second delay to the execution of SELECT statements. Again, the delay is injected at the JDBC layer.

$ ./blade create mysql delay --database test --host 127.0.0.1 --port 3306 --process JDBCConnection --sqltype select --table t_test --time 4000
{"code": 200,"success":true,"result":"8e5b35e76098caab"}

After the command is executed, the console will print:

Execution time: 0
Execution time: 0
Execution time: 4
Execution time: 4
Execution time: 4

Scenario 6: The server network is slow

Chaosblade can also control the network. For example, the following command adds a 3-second delay to traffic passing through the eth0 interface:

./blade create network delay --interface eth0 --time 3000

However, this scenario is not supported on macOS, because it relies on the tc (Traffic Control) command of Linux. You need a Linux system to try it, so I will not demonstrate it here.
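For readers on Linux who want to see what such a tc rule looks like, here is a small sketch that only builds and prints the commands. The exact commands Chaosblade issues are not shown in this article, so the netem rule below is my assumption of an equivalent; TcDelaySketch and its methods are hypothetical names, and nothing is executed because tc requires root privileges.

```java
import java.util.List;

// Hypothetical illustration: builds tc/netem commands, does not run them.
final class TcDelaySketch {

    // Command that adds a fixed delay to all traffic leaving one interface.
    static List<String> delayCommand(String iface, int millis) {
        return List.of("tc", "qdisc", "add", "dev", iface,
                "root", "netem", "delay", millis + "ms");
    }

    // Command that removes the root qdisc again, undoing the delay.
    static List<String> clearCommand(String iface) {
        return List.of("tc", "qdisc", "del", "dev", iface, "root");
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", delayCommand("eth0", 3000)));
        System.out.println(String.join(" ", clearCommand("eth0")));
    }
}
```

The printed commands can be run manually as root on a Linux test machine; passing the list to a ProcessBuilder would be the natural next step if the sketch were to execute them itself.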

Conclusion

Originally, I planned to write a complete guide to using Chaosblade, but the tool is not yet mature, so I will stop here and go file issues on GitHub.

Still, I hope this article has given you a good understanding of what Chaosblade is for and what it can do. If you got something out of it, it has served its purpose.

Where there is a pain point, there is innovation: a technology always emerges to solve some pain point.