
Recently our business has been growing very rapidly, which has been a big challenge for the backend: the core business interfaces hit performance bottlenecks, and not just one problem but several combined. We would fix one and push the fix online, only to discover the next bottleneck. My lack of experience meant I could not pinpoint the full solution in one go; and out of a stubborn pride in the whole backend team, I did not want to get through the traffic peaks simply by scaling out a large number of extra instances. As a result, problems of varying severity appeared online for several nights in a row, which certainly had an impact on our business growth. That is where I was immature and need to reflect. This series of articles mainly records the general technical optimizations of our backend microservice system during this period of business growth; the business-process and cache optimizations are specific to our business, so they will not be covered here. The series will be divided into the following parts:

  1. Improved the client load balancing algorithm
  2. Developed a plug-in to filter exception stacks out of the log output
  3. Improved the asynchronous log waiting strategy for x86 cloud environments
  4. Added monitoring of the HTTP request waiting queue for synchronous microservices, and carefully handled slow requests caused by hitting the upper limit of instance network traffic in cloud deployments
  5. Added the necessary intrusive monitoring for key system services

Improved the client load balancing algorithm

Spring Cloud LoadBalancer built-in polling algorithm and issues

We use Spring Cloud as our microservices framework and have optimized many of its components; please refer to my other series for details. Previously, our client load balancing algorithm was round robin, with polling independent between different requests. Because the microservices framework we built retries requests that are retryable, a retry must go to a different instance than the previous attempt. Without retries we could not publish new versions online without users noticing: we deploy instances of the same microservice across different availability zones, and failures usually hit some instances rather than all of them, so retries keep partial instance failures invisible to users. Retries also let a request move on to a less loaded instance when some instances are under too much pressure. None of the load balancing algorithms built into Spring Cloud LoadBalancer could meet these requirements, so we improved the round-robin algorithm. The original process is:

  1. Get the list of service instances
  2. All threads share the same atomic variable position, which is atomically incremented by 1 on each request
  3. position is taken modulo the number of instances, and the instance at the corresponding index is returned as the instance to call
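A minimal sketch of this built-in logic (simplified for illustration; this is not the actual Spring Cloud LoadBalancer source):

// One position shared by all threads and all requests, seeded randomly
private final AtomicInteger position = new AtomicInteger(new Random().nextInt(1000));

private ServiceInstance chooseByRoundRobin(List<ServiceInstance> instances) {
    // every call, from any request or any retry, advances the same counter
    int pos = Math.abs(position.incrementAndGet());
    return instances.get(pos % instances.size());
}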

The problem with this algorithm is the following. Suppose microservice A has two instances: instance 1 and instance 2. When request A arrives, RoundRobinLoadBalancer returns instance 1. When request B arrives, RoundRobinLoadBalancer returns instance 2. Then, if request A fails and is retried, RoundRobinLoadBalancer returns instance 1 again, which is not what we want to see.

Our load balancing algorithm before this optimization, and its problems

Before this sudden business growth, the first improved version we used worked as follows:

  1. Get the list of service instances and sort the list by IP and port. Without this sorting, even after position moves to the next index it might still point to an instance that the request called before
  2. Based on the traceId in the request, obtain an atomic variable position from a local cache keyed by traceId, with a random number as its initial value. This prevents every request from starting with the first instance and then moving on to the second and third
  3. position is atomically incremented by one, then taken modulo the number of instances, and the instance at the corresponding index is returned as the instance to call

The traceId in the request comes from spring-cloud-sleuth link tracing, which we use. Based on this mechanism, we can ensure that a request will not be retried against an instance it has already called. The source code is:

// must implement ReactorServiceInstanceLoadBalancer
// rather than ReactorLoadBalancer<ServiceInstance>,
// because ReactorServiceInstanceLoadBalancer is what gets registered
@Log4j2
public class RoundRobinWithRequestSeparatedPositionLoadBalancer implements ReactorServiceInstanceLoadBalancer {
    private final ServiceInstanceListSupplier serviceInstanceListSupplier;
    // With retries, a single request will not last longer than 1 minute,
    // so entries older than 1 minute can safely be evicted
    private final LoadingCache<Long, AtomicInteger> positionCache = Caffeine.newBuilder()
            .expireAfterWrite(1, TimeUnit.MINUTES)
            // random initial value, to prevent every request from starting with the first instance
            .build(k -> new AtomicInteger(ThreadLocalRandom.current().nextInt(0, 1000)));
    private final String serviceId;
    private final Tracer tracer;

    public RoundRobinWithRequestSeparatedPositionLoadBalancer(ServiceInstanceListSupplier serviceInstanceListSupplier, String serviceId, Tracer tracer) {
        this.serviceInstanceListSupplier = serviceInstanceListSupplier;
        this.serviceId = serviceId;
        this.tracer = tracer;
    }

    // choose() is called again on every retry
    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        return serviceInstanceListSupplier.get().next().map(serviceInstances -> getInstanceResponse(serviceInstances));
    }

    private Response<ServiceInstance> getInstanceResponse(List<ServiceInstance> serviceInstances) {
        if (serviceInstances.isEmpty()) {
            log.warn("No servers available for service: " + this.serviceId);
            return new EmptyResponse();
        }
        return getInstanceResponseByRoundRobin(serviceInstances);
    }

    private Response<ServiceInstance> getInstanceResponseByRoundRobin(List<ServiceInstance> serviceInstances) {
        if (serviceInstances.isEmpty()) {
            log.warn("No servers available for service: " + this.serviceId);
            return new EmptyResponse();
        }
        // To solve the problem that the original algorithm may retry a request against the same instance,
        // get the context of the current request from sleuth's Tracer
        Span currentSpan = tracer.currentSpan();
        if (currentSpan == null) {
            currentSpan = tracer.newTrace();
        }
        // Get the traceId from the request context, which uniquely identifies a request
        long l = currentSpan.context().traceId();
        AtomicInteger seed = positionCache.get(l);
        int s = seed.getAndIncrement();
        int pos = s % serviceInstances.size();
        log.info("position {}, seed: {}, instances count: {}", pos, s, serviceInstances.size());
        return new DefaultResponse(serviceInstances.stream()
                // sort first, to keep the instance order stable between calls
                .sorted(Comparator.comparing(ServiceInstance::getInstanceId))
                .collect(Collectors.toList()).get(pos));
    }
}

However, this load balancing algorithm still caused us problems in the case of a large increase in requests.

First of all, we did not scale out this time, so under this load the balanced distribution of pressure was critical. For example, suppose microservice A has nine instances. During the business peak, the ideal situation is that the load on all nine instances is perfectly balanced at every moment. Because we used a random number as the initial value of the atomic variable position, the pressure is certainly balanced when aggregated over a whole day, but within a short time window it is quite possible for most of the pressure to land on a few instances, causing them to collapse and be circuit-broken, after which the pressure shifts to a few other instances, which then collapse and are circuit-broken in turn, and so on.

Second, we deploy with Kubernetes, and many microservice pods may run on the same virtual machine (Node). In some cases, multiple pods of the same microservice end up on the same Node, which can be seen from the pods' IP segments. For example, a microservice has the following seven instances: 10.238.13.12:8181, 10.238.13.24:8181, 10.238.15.12:8181, 10.238.17.12:8181, 10.238.20.220:8181, 10.238.21.31:8181, 10.238.21.121:8181. Here 10.238.13.12:8181 and 10.238.13.24:8181 are likely on the same Node, and 10.238.21.31:8181 and 10.238.21.121:8181 are likely on the same Node. A retry needs to go to an instance on a different Node than the previous attempt, because if one instance on a Node has a problem or is under too much pressure, the other instances on the same Node are likely to be affected as well, as illustrated in the sketch below.
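A minimal illustration of the IP-prefix grouping used to approximate "same Node" (the prefix extraction matches the substring logic in the final load balancer code further below; the sample IPs are only examples):

// Group instances by the prefix of their pod IP (everything before the last dot),
// assuming pods whose IPs share this prefix run on the same Node.
String host = "10.238.13.24";
String nodePrefix = host.substring(0, host.lastIndexOf(".")); // "10.238.13"
// 10.238.13.12 and 10.238.13.24 share the prefix "10.238.13",
// so a retry should prefer an instance whose host does not match a prefix already called.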

Finally, if calls to an instance keep failing, that instance should be given lower priority than the other, healthy instances. This reduces the impact on users of fast rolling releases (many new instances start suddenly and then several old instances stop, with the number of instances in flux greater than the configured number of retries), of a sudden anomaly in one availability zone taking multiple instances offline, and of shutting down instances that are no longer needed once the business peak has passed and pressure has dropped. When a large number of instances are migrated at once, the impact on users would otherwise be significant.

The optimization scheme for the above problems

We propose an optimized solution to the above three problems:

  1. For each request, record:
     - which instances this request has already called -> the per-request cache of called instances
     - how many requests each called instance is currently handling -> the number of running requests per instance
     - the recent request error rate of each called instance -> the per-instance request error rate
  2. Randomly shuffle the list of instances, to prevent requests from always being sent to the same instance when all three of these metrics are equal
  3. Sort the instances: instances the current request has not yet called come first -> then the lower the error rate, the higher the priority -> then the fewer requests an instance is currently running, the higher the priority
  4. Take the first instance in the sorted list as the instance selected by load balancing

The following code comes from: github.com/JoJoTec/spr…

We use the following dependency:

<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
</dependency>
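The metrics class below needs a MetricRegistry instance. A minimal sketch of how one might be created and wired as Spring beans (this bean configuration is an assumption for illustration, not part of the original project):

import com.codahale.metrics.MetricRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LoadBalancerMetricsConfiguration {
    // One registry shared by the metrics recorder and the load balancer (assumed wiring)
    @Bean
    public MetricRegistry loadBalancerMetricRegistry() {
        return new MetricRegistry();
    }

    @Bean
    public ServiceInstanceMetrics serviceInstanceMetrics(MetricRegistry loadBalancerMetricRegistry) {
        return new ServiceInstanceMetrics(loadBalancerMetricRegistry);
    }
}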

The class that caches and records per-instance call data:

@Log4j2
public class ServiceInstanceMetrics {
    private static final String CALLING = "-Calling";
    private static final String FAILED = "-Failed";

    private MetricRegistry metricRegistry;

    ServiceInstanceMetrics() {
    }

    public ServiceInstanceMetrics(MetricRegistry metricRegistry) {
        this.metricRegistry = metricRegistry;
    }

    /**
     * Record that a call to the instance has started
     * @param serviceInstance
     */
    public void recordServiceInstanceCall(ServiceInstance serviceInstance) {
        String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
        metricRegistry.counter(key + CALLING).inc();
    }

    /**
     * Record that a call to the instance has finished
     * @param serviceInstance
     * @param isSuccess whether the call succeeded
     */
    public void recordServiceInstanceCalled(ServiceInstance serviceInstance, boolean isSuccess) {
        String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
        metricRegistry.counter(key + CALLING).dec();
        if (!isSuccess) {
            // record a failed call
            metricRegistry.meter(key + FAILED).mark();
        }
    }

    /**
     * Get the number of requests the instance is currently handling
     * @param serviceInstance
     * @return
     */
    public long getCalling(ServiceInstance serviceInstance) {
        String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
        long count = metricRegistry.counter(key + CALLING).getCount();
        log.debug("ServiceInstanceMetrics-getCalling: {} -> {}", key, count);
        return count;
    }

    /**
     * Get the number of call failures in the last minute
     * @param serviceInstance
     * @return
     */
    public double getFailedInRecentOneMin(ServiceInstance serviceInstance) {
        String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
        double rate = metricRegistry.meter(key + FAILED).getOneMinuteRate();
        log.debug("ServiceInstanceMetrics-getFailedInRecentOneMin: {} -> {}", key, rate);
        return rate;
    }
}
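The two record methods above are meant to be called around each actual HTTP call; the original project wires this into its retry and client glue code, which is not shown here. A minimal sketch of the intended call pattern (doHttpCall is a hypothetical placeholder, not a method from the original source):

serviceInstanceMetrics.recordServiceInstanceCall(instance);
boolean success = false;
try {
    doHttpCall(instance); // hypothetical placeholder for the real call
    success = true;
} finally {
    // decrements the "calling" counter and marks a failure if the call did not succeed
    serviceInstanceMetrics.recordServiceInstanceCalled(instance, success);
}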

Load balancing core code:

private final LoadingCache<Long, Set<String>> calledIpPrefixes = Caffeine.newBuilder()
        .expireAfterAccess(3, TimeUnit.MINUTES)
        .build(k -> Sets.newConcurrentHashSet());
private final String serviceId;
private final Tracer tracer;
private final ServiceInstanceMetrics serviceInstanceMetrics;

// choose() is called again on every retry
@Override
public Mono<Response<ServiceInstance>> choose(Request request) {
    Span span = tracer.currentSpan();
    return serviceInstanceListSupplier.get().next().map(serviceInstances -> {
        // keep the span inside choose() the same as the span of the caller (it would otherwise be cleared)
        try (Tracer.SpanInScope cleared = tracer.withSpanInScope(span)) {
            return getInstanceResponse(serviceInstances);
        }
    });
}

private Response<ServiceInstance> getInstanceResponse(List<ServiceInstance> serviceInstances) {
    if (serviceInstances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    // read the traceId of the current request
    Span currentSpan = tracer.currentSpan();
    if (currentSpan == null) {
        currentSpan = tracer.newTrace();
    }
    long l = currentSpan.context().traceId();
    return getInstanceResponseByRoundRobin(l, serviceInstances);
}

@VisibleForTesting
public Response<ServiceInstance> getInstanceResponseByRoundRobin(long traceId, List<ServiceInstance> serviceInstances) {
    // First randomly shuffle the order of the instances in the list
    Collections.shuffle(serviceInstances);
    // The comparator is called many times during sorting, and the per-instance statistics
    // keep changing concurrently, so snapshot each value into a map first
    Map<ServiceInstance, Integer> used = Maps.newHashMap();
    Map<ServiceInstance, Long> callings = Maps.newHashMap();
    Map<ServiceInstance, Double> failedInRecentOneMin = Maps.newHashMap();
    serviceInstances = serviceInstances.stream().sorted(
            // instances whose Node (IP prefix) has not been called by this request come first
            Comparator.<ServiceInstance>comparingInt(serviceInstance -> {
                return used.computeIfAbsent(serviceInstance, k -> {
                    return calledIpPrefixes.get(traceId).stream().anyMatch(prefix -> {
                        return serviceInstance.getHost().contains(prefix);
                    }) ? 1 : 0;
                });
            })
            // then the lower the current error rate, the higher the priority
            .thenComparingDouble(serviceInstance -> {
                return failedInRecentOneMin.computeIfAbsent(serviceInstance, k -> {
                    double value = serviceInstanceMetrics.getFailedInRecentOneMin(serviceInstance);
                    // keep only two decimal places to ignore differences that are too fine-grained
                    return ((int) (value * 100)) / 100.0;
                });
            })
            // then the fewer requests currently running on the instance, the higher the priority
            .thenComparingLong(serviceInstance -> {
                return callings.computeIfAbsent(serviceInstance,
                        k -> serviceInstanceMetrics.getCalling(serviceInstance));
            })
    ).collect(Collectors.toList());
    if (serviceInstances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    ServiceInstance serviceInstance = serviceInstances.get(0);
    // record the IP prefix (Node) of the instance chosen for this request
    calledIpPrefixes.get(traceId).add(serviceInstance.getHost().substring(0, serviceInstance.getHost().lastIndexOf(".")));
    positionCache.get(traceId).getAndIncrement();
    return new DefaultResponse(serviceInstance);
}

Some Q&A about the project design within the group

1. Why not use a cache shared by all microservices to hold call data to make it more accurate?

Options for a shared cache include putting these records into Redis, or into an in-memory data grid such as Apache Ignite. But there are two problems:

  1. If the records are placed in additional storage such as Redis, then when Redis is unavailable no load balancing can be performed at all. If they are placed in Apache Ignite and the corresponding node goes offline, the corresponding load balancing cannot be performed either. Neither of these is acceptable.
  2. Suppose microservice A needs to call microservice B. It may be that one instance of A has problems calling a particular instance of B while other instances of A have no problem calling that same instance of B, for example when the network between one pair of availability zones is congested. If the same cache key were used to record the data of all instances of A calling that instance of B, the result would obviously be inaccurate.

Having each microservice instance use a local cache to record its own calls to other instances is, in our view, not only easier to implement but also more accurate.

2. Why use EMA instead of a request window to calculate the recent error rate?

A request-window approach works like this: to calculate the error rate over the last minute, we cache all requests from the last minute and, when reading, add up the cached request data and take the average. But when requests surge, this caching consumes a lot of memory, and as the number of cached requests grows, computing the error rate also consumes more CPU. It is not worth it.

EMA (exponential moving average) is a sliding average commonly used in all kinds of performance-monitoring and statistics scenarios, such as the dynamic calculation of TLAB size in the JVM, the resizing of G1 GC regions, and many other places where the JVM needs to derive a suitable value dynamically. Instead of caching requests, it simply multiplies the latest value by a ratio and adds the old value multiplied by (1 - this ratio); the ratio is usually greater than 0.5, meaning the EMA is more strongly correlated with the latest value.
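A minimal sketch of this update rule (illustrative only; this is not the exact EWMA implementation inside Dropwizard's Meter, and the ratio value is an assumption):

// ema = ratio * latest + (1 - ratio) * previousEma, with ratio > 0.5
// so the result leans toward the most recent observations
double ratio = 0.6; // assumed value for illustration
double ema = 0.0;
for (double sample : new double[]{1.0, 0.0, 0.0, 1.0}) {
    ema = ratio * sample + (1 - ratio) * ema;
}
// ema now mostly reflects the latest samples, without storing the full request window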

But EMA has another problem: as the program runs, the values accumulate more and more decimal places, and we end up with values like 0.00000000123, 0.120000001 and 0.120000003. In order to ignore differences that are too fine-grained (and that are also caused by errors from long ago), we keep only two decimal places when sorting.
