Polly is a .NET resilience and transient-fault-handling library. It lets developers express policies such as retry, circuit breaker, timeout, bulkhead isolation, cache, and fallback in a fluent and thread-safe manner.

  • Polly
  • Polly.Extensions.Http

In a distributed system, keeping services fast and highly available requires traffic control: load balancing, service routing, circuit breaking, degradation, rate limiting, and traffic-related scheduling.

When a large number of requests arrive and every connection times out, socket connections cannot be released in time. Memory use keeps growing until the process runs out of memory (OOM).

Here are two real-life scenarios:

  1. The underlying permission service goes down, and every line of business that depends on it becomes unavailable.
  2. Recently our team used Redis without releasing connections promptly after use. The connection pool was capped at 128, and a colleague had wrapped a blocking Redis lock whose callers competed for the lock continuously. As a result, the Redis service became unavailable.

Background: we currently depend on standard RESTful APIs called over HTTP, and we have wrapped those requests in a set of transient-fault-handling policy components. The HTTP client library I use is RestSharp.

Retry strategy

An HTTP request can fail for transient reasons, such as a momentary network fault, or a peer service that cannot accept connections for a moment while it is being released online. In another case, the call frequency triggers the target service's rate limit, and we need to sleep (for example 1s) before calling again. In these low-probability scenarios, retrying improves the availability of the request. For standard RESTful APIs, we use the HTTP status code to decide whether the retry policy should handle a response.

// Requires: using System; using System.Net; using Polly; using Polly.Retry; using RestSharp;
public static RetryPolicy<IRestResponse> RequestExceptionRetryPolicy()
{
    var retryPolicy = Policy<IRestResponse>
        .HandleResult(response =>
            // Transient HTTP/network failure: transport error, timeout, or 5xx status
            response.ResponseStatus == ResponseStatus.Error
            || response.ResponseStatus == ResponseStatus.TimedOut
            || response.StatusCode >= HttpStatusCode.InternalServerError)
        // Retry up to 3 times, waiting 1 second before each retry
        .WaitAndRetry(3, i => TimeSpan.FromSeconds(1));
    return retryPolicy;
}
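To see the retry behaviour in isolation, here is a minimal sketch that does not need RestSharp. It assumes a simulated call that returns a bare `HttpStatusCode` and fails twice before succeeding; `RetryDemo`, `Run`, and the 10 ms wait are made up for illustration and are not part of the component described here.

```csharp
using System;
using System.Net;
using Polly;

public static class RetryDemo
{
    public static (HttpStatusCode status, int attempts) Run()
    {
        int attempts = 0;

        // Treat any 5xx status as a transient failure; retry up to 3 times.
        // The short wait is an assumption to keep the demo fast.
        var retry = Policy<HttpStatusCode>
            .HandleResult(status => status >= HttpStatusCode.InternalServerError)
            .WaitAndRetry(3, _ => TimeSpan.FromMilliseconds(10));

        // Simulated call: returns 503 on the first two attempts, then 200.
        HttpStatusCode result = retry.Execute(() =>
        {
            attempts++;
            return attempts < 3
                ? HttpStatusCode.ServiceUnavailable
                : HttpStatusCode.OK;
        });

        return (result, attempts);
    }
}
```

With this simulated call, the policy should stop retrying as soon as the third attempt returns `OK`, so the caller only ever sees the successful result.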

Circuit breaker and bulkhead isolation strategies

To avoid dragging down our own business services, we can use a circuit breaker when an upstream service is down or unavailable. The example code uses the following parameters:

  • The first parameter: failureThreshold = 0.8, the failure ratio at which the circuit breaks. For example, if 80 out of 100 requests fail, the circuit breaker is triggered.
  • The second parameter: samplingDuration = 30 seconds, the window over which request samples are evaluated. An upstream service may be unavailable for 5 seconds because of a transient network failure and recover in the 6th second. If the window is too short, the circuit breaker trips immediately and may then take 10 seconds to return to normal, which is not a reasonable design. This parameter should therefore be set neither too small nor too large.
  • The third parameter: minimumThroughput = 50, the minimum number of requests in the window before the circuit breaker can trip. At least 50 requests must have passed through.

With these settings, the circuit breaker trips when at least 50 requests are made within 30 seconds and 80% or more of them fail.
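The trigger condition is plain arithmetic, and a small sketch may make it easier to reason about. This is not Polly's implementation; `BreakerRule` and `ShouldBreak` are hypothetical names, and treating both bounds as inclusive is an assumption made for the illustration.

```csharp
using System;

public static class BreakerRule
{
    // Returns true when the sampled window would trip the breaker:
    // enough traffic to be meaningful, and a failure ratio at or above the threshold.
    public static bool ShouldBreak(
        int total,
        int failures,
        double failureThreshold = 0.8,
        int minimumThroughput = 50)
    {
        if (total < minimumThroughput)
        {
            return false; // too few samples in the window to judge
        }

        double failureRatio = (double)failures / total;
        return failureRatio >= failureThreshold;
    }
}
```

Under these assumptions, 80 failures out of 100 requests trips the breaker, 40 failures out of 40 requests does not (too little traffic), and 40 failures out of 60 requests does not (ratio of about 0.67).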

// Requires: using System; using Polly; using Polly.CircuitBreaker; using RestSharp;
/// <summary>
/// Circuit breaker policy. While the circuit is open, calls throw BrokenCircuitException.
/// </summary>
/// <param name="onBreakAction"></param>
/// <param name="onResetAction"></param>
/// <param name="onHalfOpenAction"></param>
/// <param name="failureThreshold"></param>
/// <returns></returns>
public static CircuitBreakerPolicy<IRestResponse> BulkheadPolicy(
    Action<DelegateResult<IRestResponse>, TimeSpan> onBreakAction,
    Action onResetAction,
    Action onHalfOpenAction,
    double failureThreshold = 0.8)
{
    // InvalidStatus is the predicate that marks a response as failed (defined elsewhere)
    var bulkhead = Policy<IRestResponse>
        .HandleResult(InvalidStatus)
        .AdvancedCircuitBreaker(
            failureThreshold: failureThreshold,
            samplingDuration: TimeSpan.FromSeconds(30),
            minimumThroughput: 50,
            durationOfBreak: TimeSpan.FromSeconds(30),
            onBreak: onBreakAction,
            onReset: onResetAction,
            onHalfOpen: onHalfOpenAction);
    return bulkhead;
}

Degradation (fallback) strategy

When a service request trips the circuit breaker, we usually degrade the call to keep the service available, for example by serving cached data for the time being, or by reporting the service exception directly and returning the error information promptly.

  • With Polly's circuit breaker, calls made while the circuit is open throw BrokenCircuitException. We degrade on this exception, either by returning cached data or by returning an error directly, so that the service responds quickly and does not collapse under a pile of requests waiting for timeouts.
// Requires: using System; using System.Net; using Polly; using Polly.CircuitBreaker; using Polly.Wrap; using RestSharp;
public static PolicyWrap<IRestResponse> BulkheadFailBackPolicy(
    Action<DelegateResult<IRestResponse>, TimeSpan> onBreakAction,
    Action onResetAction,
    Action onHalfOpenAction)
{
    // While the circuit is open, return a degraded response immediately
    var failBack = Policy<IRestResponse>
        .Handle<BrokenCircuitException>()
        .Fallback(() => new RestResponse
        {
            ResponseStatus = ResponseStatus.Error,
            StatusCode = HttpStatusCode.ExpectationFailed,
            ErrorMessage = "The service is unavailable: the circuit breaker is open and the request has been degraded."
        });

    // The fallback is the outer policy, so it can catch the breaker's exception
    return Policy.Wrap(failBack, BulkheadPolicy(onBreakAction, onResetAction, onHalfOpenAction));
}

Combine all strategies

We can easily implement dynamic proxies with Castle or Autofac, and put the control-level logic into an aspect (an interceptor) so that the standard business code is not affected. The order in which policies are combined matters, because the outermost policy fires first. If the retry policy is the outermost layer and the service has already tripped the circuit breaker, the retry policy keeps firing, which defeats the purpose of the circuit breaker. The recommended order is therefore circuit breaker, then degradation, then retry, so that retry is never the outermost layer.
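The effect of the ordering can be checked with a small sketch. It assumes string results, a call that always throws, a retry of 2 attempts without waiting, and a simple consecutive-failure breaker that opens after 2 failures. `PolicyOrderDemo` and `Run` are hypothetical names chosen for the example.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

public static class PolicyOrderDemo
{
    public static (int calls, string third) Run()
    {
        int calls = 0;

        var retry = Policy<string>
            .Handle<InvalidOperationException>()
            .Retry(2); // 1 initial attempt + 2 retries
        var breaker = Policy<string>
            .Handle<InvalidOperationException>()
            .CircuitBreaker(2, TimeSpan.FromSeconds(30));
        var fallback = Policy<string>
            .Handle<BrokenCircuitException>()
            .Fallback("degraded");

        // Policy.Wrap lists policies from outermost to innermost.
        var policy = Policy.Wrap<string>(fallback, breaker, retry);

        Func<string> failingCall = () =>
        {
            calls++;
            throw new InvalidOperationException("service down");
        };

        // Two executions fail after exhausting their retries; the breaker
        // counts one failure per execution and opens after the second.
        for (int i = 0; i < 2; i++)
        {
            try { policy.Execute(failingCall); }
            catch (InvalidOperationException) { }
        }

        // The circuit is now open: the call is short-circuited and degraded.
        string third = policy.Execute(failingCall);
        return (calls, third);
    }
}
```

In this setup the underlying call should run 6 times in total (3 per execution for the first two executions), and the third execution should return "degraded" without reaching the retry policy or the call at all.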

// Requires: using Castle.DynamicProxy; using Polly; using Polly.Wrap; using RestSharp;
public class RestClientPolicyInterceptor : IInterceptor
{
    public RestClientPolicyInterceptor()
    {
    }

    /// <summary>
    /// Proxy wrapper for RestClient calls.
    /// </summary>
    /// <param name="invocation"></param>
    public void Intercept(IInvocation invocation)
    {
        // 1. Check whether the circuit is open, to achieve bulkhead isolation.
        // 2. Apply the retry strategy for transient network failures.
        string methodName = invocation.Method.Name;

        // 3. Combine the policies. Policy.Wrap lists them from outermost to
        //    innermost, so the circuit breaker and fallback wrap the retry.
        //    Logger is the application's logger (defined elsewhere).
        PolicyWrap<IRestResponse> policy = Policy.Wrap(
            BulkheadFailBackPolicy(
                (result, span) => { Logger.Error($"{methodName}: circuit breaker opened."); },
                () => { Logger.Error($"{methodName}: circuit breaker reset."); },
                () => { Logger.Error($"{methodName}: circuit breaker half-open."); }),
            RequestExceptionRetryPolicy());

        // Hand the final result (possibly the degraded response) back to the caller
        invocation.ReturnValue = policy.Execute(() =>
        {
            invocation.Proceed();
            return invocation.ReturnValue as IRestResponse;
        });
    }
}
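For readers who have not used Castle DynamicProxy, the sketch below shows how an interceptor is attached to an object, assuming the Castle.Core package. `IGreeter`, `Greeter`, and `CallCountingInterceptor` are made-up stand-ins for the REST client and the policy interceptor. The wiring in a real project (for example through Autofac) may look different.

```csharp
using System;
using Castle.DynamicProxy;

public interface IGreeter
{
    string Greet(string name);
}

public class Greeter : IGreeter
{
    public string Greet(string name) => "Hello, " + name;
}

// Stand-in for a policy interceptor: it only counts calls, then
// lets the real method run.
public class CallCountingInterceptor : IInterceptor
{
    public int Calls { get; private set; }

    public void Intercept(IInvocation invocation)
    {
        Calls++;
        invocation.Proceed();
    }
}

public static class ProxyDemo
{
    public static (string greeting, int calls) Run()
    {
        var interceptor = new CallCountingInterceptor();
        var generator = new ProxyGenerator();

        // Every call on the proxy passes through the interceptor first.
        IGreeter proxy = generator.CreateInterfaceProxyWithTarget<IGreeter>(
            new Greeter(), interceptor);

        string greeting = proxy.Greet("Polly");
        return (greeting, interceptor.Calls);
    }
}
```

The business class stays unaware of the interceptor, which is why cross-cutting logic such as retry and circuit breaking fits naturally into this layer.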