Background
In a large distributed microservice system, we need to simulate anomalies in every link of the service invocation chain. These fault simulations must be integrated into the system non-invasively, so that they can be activated on demand to check whether the system behaves as expected.
Requirements
- Functional requirements:
  - A microservice randomly becomes slow or unavailable.
  - In the storage system, disk I/O latency rises, I/O throughput drops, and flushing data to disk takes a long time.
  - A hotspot appears in the scheduling system and a scheduling command fails.
  - In the top-up (recharge) system, a third party repeatedly calls the callback interface to report the same successful top-up.
  - In game development, simulate an unstable player network, dropped frames, excessive latency, and various abnormal inputs (such as requests from cheat tools), and check that the system still works correctly.
- Solution requirements:
  - It must not affect normal functional logic or intrude into the functional code.
  - Fault injection code must not end up in the final released binaries.
  - Fault injection code must be readable, easy to write, and checked by the compiler.
  - It must support parallel testing, with explicit control over whether a given failure point is activated.
Choosing a solution
Two Go fault injection projects are widely used: gofail from the etcd team and failpoint from PingCAP. After investigating and trying both, we found that gofail has several problems:
- Fault injection code is written as comments, so the compiler cannot check its syntax, and the converted code is practically unreadable.
- It cannot be controlled precisely: once injection is enabled, every fault point is activated, which is unfriendly to parallel testing.
- Conversion can change line numbers, e.g. code originally on line 10 may end up on line 12, which makes troubleshooting and locating code harder.
In summary, we chose failpoint, the more user-friendly option.
The failpoint project, developed by PingCAP, is a Go implementation of FreeBSD failpoints. It allows errors or abnormal behavior to be injected into code and activated either through environment variables or dynamically from code. Failpoints can be used to exercise error handling in complex systems, improving their fault tolerance, correctness, and stability.
How it works
What are macros, in essence? Going back to first principles, a failpoint that meets the requirements above can be implemented in Go through AST rewriting. For any Go source file, you can parse the file into a syntax tree, traverse the whole tree, find all failpoint injection points, and then rewrite the tree into the desired logic.
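The parse, traverse, find, and rewrite steps can be sketched with the standard library alone. This is a minimal illustration, not the actual failpoint-ctl implementation: the `enabled` function in the generated `if` condition is a hypothetical placeholder, the sample source is only parsed (never compiled), and the code that the real tool generates is different.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// sample is marker-style source to rewrite; it is only parsed, never compiled.
const sample = `package main

import "github.com/pingcap/failpoint"

func main() {
	failpoint.Inject("testPanic", func() {
		panic("failpoint triggered")
	})
}
`

// isInject reports whether stmt has the form failpoint.Inject(name, func() {...})
// and, if so, returns the name argument and the closure.
func isInject(stmt ast.Stmt) (ast.Expr, *ast.FuncLit, bool) {
	es, ok := stmt.(*ast.ExprStmt)
	if !ok {
		return nil, nil, false
	}
	call, ok := es.X.(*ast.CallExpr)
	if !ok || len(call.Args) != 2 {
		return nil, nil, false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok || sel.Sel.Name != "Inject" {
		return nil, nil, false
	}
	pkg, ok := sel.X.(*ast.Ident)
	if !ok || pkg.Name != "failpoint" {
		return nil, nil, false
	}
	lit, ok := call.Args[1].(*ast.FuncLit)
	if !ok {
		return nil, nil, false
	}
	return call.Args[0], lit, true
}

// rewrite replaces every marker call with "if enabled(name) { body }" and
// returns the new source plus the number of rewritten sites.
// "enabled" is a hypothetical placeholder, not a function of the real library.
func rewrite(src string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return "", 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		block, ok := n.(*ast.BlockStmt)
		if !ok {
			return true
		}
		for i, stmt := range block.List {
			name, lit, found := isInject(stmt)
			if !found {
				continue
			}
			// Reuse the closure body as the body of the if statement.
			block.List[i] = &ast.IfStmt{
				Cond: &ast.CallExpr{Fun: ast.NewIdent("enabled"), Args: []ast.Expr{name}},
				Body: lit.Body,
			}
			count++
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), count, nil
}

func main() {
	out, n, err := rewrite(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("rewrote %d site(s):\n%s", n, out)
}
```

Because the closure body is reused as the body of the `if` statement, variables captured by the closure become ordinary references to the enclosing scope after rewriting.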
Setting up the failpoint environment
cd $GOPATH/src
mkdir -p github.com/pingcap
cd github.com/pingcap
git clone https://github.com/pingcap/failpoint.git
cd failpoint
make
GO111MODULE=on CGO_ENABLED=0 GO111MODULE=on go build -ldflags '-X "github.com/pingcap/failpoint/failpoint-ctl/version.releaseVersion=12f4ac2-dev" -X "github.com/pingcap/failpoint/failpoint-ctl/version.buildTS=2019-11-15 09:41:49" -X "github.com/pingcap/failpoint/failpoint-ctl/version.gitHash=12f4ac2fd11dfc6b2f7018b00bb90f61a5b6b692" -X "github.com/pingcap/failpoint/failpoint-ctl/version.gitBranch=master" -X "github.com/pingcap/failpoint/failpoint-ctl/version.goVersion=go version go1.13 darwin/amd64"' -o bin/failpoint-ctl failpoint-ctl/main.go
failpoint-ctl build successfully :-) !
After compiling, an executable file failpoint-ctl is generated:
ll bin
total 6840
-rwxr-xr-x  1 lanyang  staff  3.3M 11 15 17:41 failpoint-ctl
Fault injection and activation
1. Inject the fault code
package main

import "github.com/pingcap/failpoint"

func main() {
	// The closure runs only when the failpoint "testPanic" is activated.
	failpoint.Inject("testPanic", func() {
		panic("failpoint triggered")
	})
}
2. Code conversion
Rewrite the marker code into fault injection code:
$GOPATH/src/github.com/pingcap/failpoint/bin/failpoint-ctl enable
Once enabled, additional files are generated and the failpoint marker code is rewritten.
Restore the code:
$GOPATH/src/github.com/pingcap/failpoint/bin/failpoint-ctl disable
After restoring, the additionally generated files are deleted and the code returns to its original form.
3. Run the program and activate the fault
Normal execution:
./your-program
Activate the fault:
# You can specify which failpoints to activate
GO_FAILPOINTS="main/testPanic=return(true)" ./your-program
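To make the shape of the environment variable concrete, here is a minimal sketch of how a value such as `main/testPanic=return(true)` could be split into failpoint names and terms. It assumes entries take the form `name=term` and that multiple entries are separated by semicolons. `parseFailpoints` and the `main/slowDisk` name are hypothetical, and this is a simplified stand-in, not the library's own parser.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseFailpoints splits a GO_FAILPOINTS-style value ("name=term;name2=term2")
// into a map from failpoint name to its term. Hypothetical helper for
// illustration; the real library has its own parser and term grammar.
func parseFailpoints(env string) (map[string]string, error) {
	active := make(map[string]string)
	for _, entry := range strings.Split(env, ";") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // tolerate an unset variable or a trailing semicolon
		}
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("malformed failpoint entry %q", entry)
		}
		active[parts[0]] = parts[1]
	}
	return active, nil
}

func main() {
	active, err := parseFailpoints(os.Getenv("GO_FAILPOINTS"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if term, ok := active["main/testPanic"]; ok {
		fmt.Println("main/testPanic is active with term", term)
	} else {
		fmt.Println("main/testPanic is not active")
	}
}
```

Only the failpoints named in the variable are activated; every other injection point stays dormant, which is what makes selective activation possible.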
Advanced: fine-grained control
Sometimes, to run tests in parallel, you need to activate a fault injection point without affecting other people's tests. You can do this by attaching a hook through context.Context to control the failpoint precisely.
The WithHook function wraps a callback whose logic decides whether the failpoint is hit. The callback receives the context and the failpoint name. Inside it you can inspect a value stored in the context, or use any other criterion, to control activation as needed.
The demo code is as follows:
sctx := failpoint.WithHook(ctx, func(ctx context.Context, fpname string) bool {
	// Printing ctx is optional and can be omitted.
	fmt.Printf("hook ctx %v, %v\n", ctx, fpname)
	// Alternative: decide by a context key.
	// return ctx.Value(fpname) != nil
	// Hit the failpoint only for requests whose query string is mock=true.
	return c.Ctx.Request.URL.RawQuery == "mock=true"
})
failpoint.InjectContext(sctx, "common_info", func(val failpoint.Value) {
	fmt.Printf("mock error 2: %v\n", val)
	c.ResponseSuccess("ping mock point2")
})
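The hook mechanism can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not the library's code: `withHook`, `shouldFire`, `mockKey`, and the `enabled` map are hypothetical names, and it assumes that a context without a hook lets an enabled failpoint fire normally.

```go
package main

import (
	"context"
	"fmt"
)

type hookKey struct{} // context key under which the hook is stored
type mockKey struct{} // context key that marks a request as "mocked"

// hook decides whether the named failpoint may fire for this context.
type hook func(ctx context.Context, fpname string) bool

// withHook returns a child context carrying h (hypothetical stand-in for WithHook).
func withHook(ctx context.Context, h hook) context.Context {
	return context.WithValue(ctx, hookKey{}, h)
}

// shouldFire reports whether fpname is enabled and, if the context carries
// a hook, whether that hook allows it. Without a hook, enabled points fire.
func shouldFire(ctx context.Context, enabled map[string]bool, fpname string) bool {
	if !enabled[fpname] {
		return false
	}
	h, ok := ctx.Value(hookKey{}).(hook)
	if !ok {
		return true
	}
	return h(ctx, fpname)
}

// demo evaluates the same enabled failpoint under two contexts and returns
// whether it fires for the plain request and for the mocked request.
func demo() (plainFires, mockedFires bool) {
	enabled := map[string]bool{"common_info": true}
	onlyMock := func(ctx context.Context, fpname string) bool {
		return ctx.Value(mockKey{}) != nil
	}
	plain := withHook(context.Background(), onlyMock)
	mocked := withHook(context.WithValue(context.Background(), mockKey{}, true), onlyMock)
	return shouldFire(plain, enabled, "common_info"), shouldFire(mocked, enabled, "common_info")
}

func main() {
	plainFires, mockedFires := demo()
	fmt.Println("plain request fires:", plainFires)
	fmt.Println("mocked request fires:", mockedFires)
}
```

Because the decision travels with the context, two tests running in parallel against the same enabled failpoint can see different behavior without interfering with each other.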
Appendix: failpoint marker functions
Marker functions
Marker functions mark the parts of the code that the AST rewriting phase must rewrite. They have the following properties:
- They tell the rewriter to rewrite the call into an equivalent if statement.
- The parameters of a marker function are the parameters needed during rewriting.
- A marker function is an empty function, so the compiler can inline it and then eliminate it entirely.
- The failpoint body passed to a marker function is a closure. If the closure accesses outer variables, closure syntax lets it capture them, so the marker form compiles without errors. The converted code is an if statement, which accesses the same outer-scope variables directly. Closure capture therefore only keeps the marker form syntactically valid and adds no extra overhead in the end.
- They are simple and easy to read and write.
- They bring in compiler checking: if the arguments to a marker function are wrong, the program does not compile, which helps ensure the correctness of the converted code.
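The empty-function and closure-capture properties above can be seen in a small stand-alone sketch. Everything here is hypothetical and for illustration only: `inject` stands in for a marker function, `active` stands in for whatever the rewritten code consults at run time, and `forceRetry` is an invented failpoint name.

```go
package main

import "fmt"

// inject is a hypothetical stand-in for a marker function. Its body is empty,
// so calling it never runs the closure; it only marks the injection point.
func inject(fpname string, body func()) {}

// active is a hypothetical registry of activated failpoints, consulted only
// by the rewritten form below.
var active = map[string]bool{}

// markerForm is what the developer writes. The closure captures retries,
// which the compiler type-checks, but the closure is never executed.
func markerForm(retries int) int {
	inject("forceRetry", func() {
		retries++
	})
	return retries
}

// rewrittenForm is the equivalent if statement after conversion. The body
// now accesses retries directly, with no closure involved.
func rewrittenForm(retries int) int {
	if active["forceRetry"] {
		retries++
	}
	return retries
}

func main() {
	fmt.Println("marker form:", markerForm(1))
	fmt.Println("rewritten, inactive:", rewrittenForm(1))
	active["forceRetry"] = true
	fmt.Println("rewritten, active:", rewrittenForm(1))
}
```

The marker form behaves like ordinary code with no fault injected, while the rewritten form changes behavior only when the failpoint is activated.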
List of marker functions currently supported:
func Inject(fpname string, fpblock func(val Value)) {}
func InjectContext(ctx context.Context, fpname string, fpblock func(val Value)) {}
func Break(label ...string) {}
func Goto(label string) {}
func Continue(label ...string) {}
func Fallthrough() {}
func Return(results ...interface{}) {}
func Label(label string) {}
For more information about using marker functions, see the PingCAP documentation.
Series
"Chaos Engineering: System-level Fault Simulation"
"Fault Injection: Code-level Fault Simulation"