Author | Su He, senior development engineer at Alibaba
Sentinel is Alibaba's open source flow control component for distributed service architectures. It takes traffic as its entry point and helps developers keep microservices stable along several dimensions: flow control, traffic shaping, circuit breaking and degradation, and adaptive system protection. Over the past 10 years Sentinel has handled the core scenarios of Alibaba's Double 11 traffic peaks, such as flash sales, cold start, message peak shaving and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream services. It is a powerful tool for keeping microservices highly available, with support for Java, Go, C++, and other languages. It also provides global flow control for Istio/Envoy, bringing high availability protection to the Service Mesh.
Sentinel Go 0.3.0 was released recently and adds circuit breaking and degradation. It can automatically break unstable calls in a Go service to avoid cascading errors and avalanches, which is an important part of keeping a service highly available. Combined with Sentinel Go's adapters for gRPC, Gin, Dubbo, and other frameworks, developers can quickly configure circuit breaking rules at the Web and RPC call levels to protect their own services. The 0.3.0 release also includes an etcd dynamic data source module, which lets developers adjust circuit breaking rules dynamically through etcd.
Sentinel Go project address: github.com/alibaba/sen…
Why circuit breaking and degradation are needed
A service often calls other modules: another remote service, a database, a third-party API, and so on. For example, making a payment may require a remote call to an API provided by UnionPay, and querying the price of an item may require a database query. However, the stability of these dependencies is not guaranteed. If a dependency becomes unstable and its response time grows, the response time of the method that calls it grows as well, threads pile up, and eventually the caller's own thread pool may be exhausted, making the caller itself unavailable.
Modern microservice architectures are distributed and consist of a very large number of services. Services call one another and form complex call chains, in which the problem above is amplified: if one link in a complex chain is unstable, the failure may cascade and make the whole chain unavailable. Therefore, we need to apply circuit breaking to unstable services and temporarily cut off unstable calls, so that a local instability does not turn into an avalanche across the whole system.
Sentinel Go's circuit breaking feature is based on the circuit breaker pattern: when signs of instability appear (for example, response time grows or the error rate rises), it temporarily cuts off calls to the service and tries again after a certain period of time. On the one hand, this keeps the unstable service from getting worse; on the other hand, it protects the callers of the service from being dragged down. Sentinel supports two kinds of circuit breaking strategy, response time based (slow call ratio) and error based (error ratio / error count), which together protect against a variety of unstable scenarios.
Here are some best practices for flow control and degradation with Sentinel.
Best practices for flow control and degradation
In the service provider scenario, we need to protect the provider from being overwhelmed by traffic peaks. Flow control is usually based on the provider's service capacity, or applied to a particular caller. Based on a prior capacity evaluation, we can configure a QPS-mode flow control rule in Sentinel; when the number of requests per second exceeds the configured threshold, the excess requests are rejected automatically.
In the service consumer scenario, we need to protect the caller from being dragged down by unstable dependencies. Sentinel's semaphore isolation strategy (a concurrency-based flow control rule) limits the number of concurrent calls to a service, so that a large number of slow calls cannot crowd out the resources needed by normal requests. In addition, with a circuit breaking rule, the circuit breaker opens automatically when the error ratio or slow call ratio exceeds a threshold, and recovery is attempted after a period of time. We can provide default handling logic (a fallback) for the period when the breaker is open, so that calls return the fallback result instead of hitting a service that is already very unstable. Note that even with a circuit breaking mechanism on the caller side, we still need to configure request timeouts on the HTTP or RPC client as a last line of defense.
Sentinel also provides adaptive system protection at the global level. Using monitoring metrics such as system load, CPU usage, QPS, response time, and concurrency, its adaptive flow control strategy balances inbound traffic against system load, so that the system runs as close as possible to its maximum throughput while remaining stable overall. System rules can serve as a last-resort protection policy for the entire service to ensure it stays available.
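One plausible shape for such an adaptive check is sketched below. It is an assumption for illustration, not Sentinel Go's actual algorithm: inbound requests are admitted freely while load is under a threshold, and above it only while in-flight concurrency stays within a capacity estimated as maximum observed QPS times minimum observed response time. `Snapshot` and `Admit` are hypothetical names, and the metric values are made up.

```go
package main

import "fmt"

// Snapshot holds hypothetical system metrics sampled by a monitor.
type Snapshot struct {
	Load1       float64 // 1-minute load average
	Concurrency int     // requests currently in flight
	MaxQPS      float64 // highest recently observed passed QPS
	MinRTSec    float64 // lowest recently observed average response time, in seconds
}

// Admit decides whether one more inbound request may enter.
// Below the load threshold everything is admitted. Above it, requests
// are admitted only while concurrency does not exceed the estimated
// capacity MaxQPS * MinRTSec.
func Admit(s Snapshot, loadThreshold float64) bool {
	if s.Load1 <= loadThreshold {
		return true
	}
	capacity := s.MaxQPS * s.MinRTSec
	return float64(s.Concurrency) <= capacity
}

func main() {
	healthy := Snapshot{Load1: 1.0, Concurrency: 50, MaxQPS: 100, MinRTSec: 0.05}
	overloaded := Snapshot{Load1: 8.0, Concurrency: 50, MaxQPS: 100, MinRTSec: 0.05}
	fmt.Println(Admit(healthy, 4.0))
	fmt.Println(Admit(overloaded, 4.0))
}
```

The point of combining load with a capacity estimate, rather than rejecting on load alone, is to keep admitting as much traffic as the system has shown it can process while still shedding the excess when it is under pressure.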
Let’s start hacking!
Sentinel Go is evolving rapidly, and this would not be possible without contributions from the community. We welcome any developers interested in contributing and helping lead future versions. If you would like to contribute, feel free to contact us, join the Sentinel contributor team, and grow together (Sentinel open source discussion DingTalk group: 30150716).
Meanwhile, the annual Alibaba Summer of Code has begun! If you are a student interested in taking part in the development of the Sentinel project, don't miss this opportunity: pick an issue that interests you and submit a proposal at github.com/alibaba/Sen…
Now let’s start hacking!