1. Eureka’s self-protection

After a service registers with Eureka, it sends a heartbeat every 30 seconds by default. If Eureka receives no heartbeat from an instance within a certain period (90 seconds by default), it removes that instance. Sometimes, however, the service is healthy and the heartbeat simply fails to reach Eureka because of network jitter. If Eureka removes the instance at that moment, the service stays unreachable through Eureka until it registers again, even though it is running normally.

To prevent this kind of accidental removal, Eureka provides a self-protection mechanism. It is triggered when the number of heartbeats Eureka receives within 15 minutes falls below the number it should have received multiplied by the self-protection threshold (0.85 by default). While it is active, Eureka stops removing instances whose heartbeats are missing. The mechanism is enabled by default, and Eureka exits self-protection once the network recovers.

The general idea: it is better to keep some unhealthy instances than to blindly remove healthy ones.

For example, suppose we have 10 service instances. Each sends a heartbeat every 30 seconds, so under normal circumstances Eureka should receive 10 * (2 * 15) = 300 heartbeats within 15 minutes. When it receives fewer than 300 * 0.85 = 255, self-protection is triggered. There are two possible reasons for a missing heartbeat:

  1. Eureka did not receive the heartbeat because of a network problem. The instance resumes sending heartbeats once the network is restored.
  2. The service really is down and never had a chance to deregister. When is it removed from the registry? Only after the instances affected by network jitter start sending heartbeats again and Eureka exits self-protection.

So when should self-protection be turned on, and when should it be turned off?

  1. My own modest suggestion, offered as a starting point: enable it when you run many service instances, and disable it when you run only a few. With many instances, more than 15% of them missing heartbeats at the same time most likely points to a network problem. With few instances, losing more than 15% of the heartbeats more likely means the instances themselves have gone down, and if Eureka keeps protecting the failed instances, clients will get errors when they call them.
  2. Of course, to keep an online system robust, it is also reasonable to leave self-protection on all the time.
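The threshold arithmetic from the 10-instance example can be sketched in a few lines of Java. This is only an illustration of the numbers in this article, not Eureka's own code; the class and method names are made up for the example, and the threshold is passed as an integer percentage to keep the arithmetic exact.

```java
public class SelfPreservationThreshold {

    /** Heartbeats expected in the window if every instance renews on schedule. */
    static long expectedHeartbeats(int instances, int heartbeatIntervalSeconds, int windowMinutes) {
        long perInstance = (windowMinutes * 60L) / heartbeatIntervalSeconds;
        return instances * perInstance;
    }

    /** Self-protection triggers when the received count falls below this value. */
    static long triggerThreshold(long expected, int thresholdPercent) {
        return expected * thresholdPercent / 100;
    }

    static boolean selfProtectionTriggered(long received, long expected, int thresholdPercent) {
        return received < triggerThreshold(expected, thresholdPercent);
    }

    public static void main(String[] args) {
        // 10 instances, one heartbeat every 30 s, 15-minute window, threshold 0.85
        long expected = expectedHeartbeats(10, 30, 15);
        System.out.println("expected heartbeats: " + expected);
        System.out.println("trigger threshold:   " + triggerThreshold(expected, 85));
        System.out.println("triggered at 250:    " + selfProtectionTriggered(250, expected, 85));
    }
}
```

With only 5 instances the expected count is 150 and the threshold is 127, so a single dead instance (30 missing heartbeats) is already enough to trigger self-protection, which is the small-deployment case discussed above.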

The self-protection configuration is as follows:

eureka:
  server:
    # Enable self-preservation
    enable-self-preservation: true
    # Self-preservation threshold, which can be adjusted as needed
    renewal-percent-threshold: 0.85

2. Take instances offline quickly

Eureka Server creates a scheduled task when it starts. At a fixed interval (60 seconds by default), the task removes from the current service list any instance that has not renewed its lease in time (90 seconds by default). Shortening this interval takes dead instances offline sooner and keeps clients from pulling unusable services.

eureka:
  server:
    eviction-interval-timer-in-ms: 3000  # e.g. 3 s

3. Cache optimization

To avoid concurrency conflicts when the in-memory registry is read and written at the same time, and to speed up responses to registry requests, Eureka Server uses a three-level cache. A registry pull goes through these steps:

  1. First look up the cached registry from ReadOnlyCacheMap.
  2. If not, look for the registry cached in ReadWriteCacheMap.
  3. If not, get the actual registry data from memory.
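The lookup order above can be sketched with three plain maps. This is a simplified illustration, not Eureka's actual implementation: the class, its methods, and the use of ConcurrentHashMap with String payloads are assumptions made for the example, and the periodic synchronization is exposed as a method you call by hand instead of a timer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TieredRegistryCache {
    private final Map<String, String> readOnlyCacheMap = new ConcurrentHashMap<>();
    private final Map<String, String> readWriteCacheMap = new ConcurrentHashMap<>();
    private final Map<String, String> registry = new ConcurrentHashMap<>();

    /** A registry change updates the real data and invalidates the read-write cache entry. */
    void register(String app, String payload) {
        registry.put(app, payload);
        readWriteCacheMap.remove(app);
    }

    /** Lookup order: read-only cache, then read-write cache, then the registry itself. */
    String get(String app) {
        String cached = readOnlyCacheMap.get(app);
        if (cached != null) {
            return cached;
        }
        // Loads from the registry on a miss; stores nothing if the app is unknown.
        cached = readWriteCacheMap.computeIfAbsent(app, registry::get);
        if (cached != null) {
            readOnlyCacheMap.put(app, cached);
        }
        return cached;
    }

    /** Stand-in for the periodic task that copies read-write entries into the read-only cache. */
    void syncReadOnlyCache() {
        for (String app : readOnlyCacheMap.keySet()) {
            String fresh = readWriteCacheMap.computeIfAbsent(app, registry::get);
            if (fresh != null) {
                readOnlyCacheMap.put(app, fresh);
            } else {
                readOnlyCacheMap.remove(app);
            }
        }
    }

    public static void main(String[] args) {
        TieredRegistryCache cache = new TieredRegistryCache();
        cache.register("api", "v1");
        System.out.println(cache.get("api"));   // loaded through all three levels
        cache.register("api", "v2");
        System.out.println(cache.get("api"));   // still served from the read-only cache
        cache.syncReadOnlyCache();
        System.out.println(cache.get("api"));   // refreshed after the sync
    }
}
```

The second read in main shows why the read-only level trades freshness for speed: until the sync runs, readers keep seeing the old entry.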

When the registry changes, the change reaches the registry data and ReadWriteCacheMap right away, but ReadOnlyCacheMap only picks it up from ReadWriteCacheMap on its next synchronization, which runs every 30 seconds by default. To speed up service discovery, we can adjust a few settings.

  1. Bypass ReadOnlyCacheMap and serve registry pulls directly from ReadWriteCacheMap.

eureka:
  server:
    use-read-only-response-cache: false  # Disable pulling data from ReadOnlyCacheMap

  2. Shorten the synchronization interval between ReadWriteCacheMap and ReadOnlyCacheMap. The default is 30 seconds; it can be reduced to, say, 3 seconds, depending on your situation.

eureka:
  server:
    response-cache-update-interval-ms: 3000

One thing puzzled me while reading the source code:

if (shouldUseReadOnlyResponseCache) {
    timer.schedule(getCacheUpdateTask(),
            new Date(((System.currentTimeMillis() / responseCacheUpdateIntervalMs) * responseCacheUpdateIntervalMs)
                    + responseCacheUpdateIntervalMs),
            responseCacheUpdateIntervalMs);
}

Why is System.currentTimeMillis() divided by responseCacheUpdateIntervalMs and then multiplied by responseCacheUpdateIntervalMs again, instead of using System.currentTimeMillis() directly?
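One plausible reading is that this relies on integer division: dividing a long by the interval discards the remainder, so multiplying back rounds the current time down to a multiple of the interval, and adding one interval gives the next aligned boundary. The first run therefore lands on a round multiple of the interval rather than at an arbitrary offset from startup. The sketch below only reproduces that arithmetic; the class and method names are invented for the example, and why the authors wanted aligned start times is not stated in the source.

```java
public class AlignedTimerStart {

    /** Same expression as in the snippet above: the next multiple of intervalMs after nowMs. */
    static long firstRunAt(long nowMs, long intervalMs) {
        return (nowMs / intervalMs) * intervalMs + intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 30_000L;
        long nowMs = 100_500L; // 100.5 s into some epoch
        // 100_500 / 30_000 = 3 (remainder dropped), 3 * 30_000 = 90_000, plus 30_000 = 120_000
        System.out.println(firstRunAt(nowMs, intervalMs));
    }
}
```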

Using Timer also carries a hidden risk: a Timer runs all of its TimerTasks on a single thread, so if any one task throws an exception that it does not catch, that thread dies and all the other tasks stop as well. ScheduledExecutorService can be used instead.
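A minimal sketch of that alternative, using only the JDK: two periodic tasks share one ScheduledExecutorService, one of them throws on its first run, and the other keeps running. The class name, the task bodies, and the timings are made up for the demonstration. Note that the failing task itself is not rescheduled after it throws, so real tasks should still catch and log their own exceptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerIsolationDemo {

    /** Returns true if the healthy task completes three runs despite the failing one. */
    static boolean healthyTaskSurvives() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        CountDownLatch healthyRuns = new CountDownLatch(3);
        try {
            // This task fails on its first run; only its own future runs are cancelled.
            scheduler.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated task failure");
            }, 0, 10, TimeUnit.MILLISECONDS);

            // This task shares the same worker thread and keeps being scheduled.
            scheduler.scheduleAtFixedRate(healthyRuns::countDown, 5, 10, TimeUnit.MILLISECONDS);

            return healthyRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy task survived: " + healthyTaskSurvives());
    }
}
```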

4. Tips on client development

During client development, if the registry is not running, the client keeps reporting registry connection timeout errors. The following configuration can be used at development time to decouple the service from the registry.

eureka:
  client:
    ###

5. Make the client pull the registry more often

The api-client periodically pulls the registry from eureka-server, every 30 seconds by default. The pull interval can be shortened as required.

eureka:
  client:
    fetch-registry: true
    # Pull the registry every 3 seconds
    registry-fetch-interval-seconds: 3

6. client.serviceUrl.defaultZone optimization

The api-client pulls registry information from eureka-server in the order configured in defaultZone. When Eureka1 is unavailable, the api-client falls back to Eureka2 to register and fetch the registry. But as long as Eureka1 is up, every microservice goes to Eureka1 first, which puts excessive pressure on it. In production, each microservice can be given a different, randomly chosen defaultZone order, which acts as manual load balancing. For example, the defaultZone of clientA is Eureka1, Eureka2, Eureka3, and the defaultZone of clientB is Eureka2, Eureka3, Eureka1.

eureka:
  client:
    serviceUrl:
      defaultZone: eureka1,eureka2,eureka3
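One way to generate those different orderings is to rotate the server list by a per-client offset. The sketch below uses deterministic rotation rather than true random shuffling, and the client index is a hypothetical value (for example a deployment ordinal) that is not part of Eureka; the class and method names are likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DefaultZoneRotation {

    /** Builds a defaultZone value whose first server depends on the client's index. */
    static String defaultZoneFor(List<String> servers, int clientIndex) {
        List<String> rotated = new ArrayList<>(servers);
        // A negative distance rotates left, so index 1 moves the second server to the front.
        Collections.rotate(rotated, -Math.floorMod(clientIndex, servers.size()));
        return String.join(",", rotated);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("eureka1", "eureka2", "eureka3");
        for (int clientIndex = 0; clientIndex < 3; clientIndex++) {
            System.out.println("client " + clientIndex + ": " + defaultZoneFor(servers, clientIndex));
        }
    }
}
```

Each resulting string can be placed in that client's defaultZone setting, so the first-choice server is spread evenly across the cluster.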

7. Client heartbeat frequency

By default, the client sends a heartbeat to the server every 30 seconds. This interval can be reduced somewhat if needed.

eureka:
  instance:
    # Send a heartbeat to the server every 30 seconds to prove that the instance is still "alive"
    lease-renewal-interval-in-seconds: 30

8. The interval at which the server removes the client

By default, if the server receives no heartbeat from a client within 90 seconds, it removes that client. To make the server react to failures faster, this time can be reduced.

eureka:
  instance:
    lease-expiration-duration-in-seconds: 90
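Sections 2, 3, 5 and 8 each shorten one delay between an instance dying and clients no longer seeing it. A rough way to see how the delays combine is to add them up. The sketch below assumes a simplified model in which the delays are independent and purely additive (lease expiration, then the eviction task, then the read-only cache sync, then the client's next pull); it is an estimate for reasoning about the settings, not a guarantee of Eureka's behavior, and the class name is invented for the example.

```java
public class StaleInstanceWindow {

    /** Rough upper bound, in seconds, under the additive model described above. */
    static int worstCaseSeconds(int leaseExpirationSeconds,
                                int evictionIntervalSeconds,
                                int cacheUpdateIntervalSeconds,
                                int registryFetchIntervalSeconds) {
        return leaseExpirationSeconds
                + evictionIntervalSeconds
                + cacheUpdateIntervalSeconds
                + registryFetchIntervalSeconds;
    }

    public static void main(String[] args) {
        // Defaults mentioned in this article: 90 s lease, 60 s eviction, 30 s cache sync, 30 s pull
        System.out.println("defaults: " + worstCaseSeconds(90, 60, 30, 30) + " s");
        // With the 3-second values used in the examples for eviction, cache sync and pull
        System.out.println("tuned:    " + worstCaseSeconds(90, 3, 3, 3) + " s");
    }
}
```

Under this model the lease expiration dominates once the other intervals are shortened, which is why section 8 suggests reducing it as well.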

---------- Other issues ----------

Where does Eureka give up consistency, the C in CAP?

  1. The self-protection mechanism: even when the network is bad, clients can still pull the registry and make calls, although the registry may contain instances that are no longer alive.
  2. Cache synchronization: ReadOnlyCacheMap can be out of step with ReadWriteCacheMap, as described in the cache optimization section.
  3. Pulling the registry from other peers: state is replicated between cluster nodes asynchronously, so the nodes are not guaranteed to agree at any given moment, but they almost always converge to the same final state.

Cluster synchronization: clustering does not increase Eureka's capacity; it only makes Eureka highly available.

Under what circumstances is data synchronized between nodes? Let's look at each operation.

  1. Registration: the node that receives the registration replicates it to its peer nodes; the peers do not forward it any further.
  2. Renewal: a service's renewal is automatically replicated to the other Eureka servers.
  3. Deregistration: replicated to the other nodes in the cluster.
  4. Eviction: not replicated; each server runs its own eviction mechanism.

Estimating how much load it can handle

For example, there are 20 services, each deployed as 5 instances, so 20 * 5 = 100 instances.

  1. By default, an instance sends a heartbeat every 30 seconds and pulls the registry every 30 seconds, so Eureka Server receives 100 * 2 * 2 = 400 requests per minute. That is 400 * 60 * 24 = 576,000 requests per day, more than half a million.

Therefore, choosing appropriate registry pull and heartbeat intervals keeps the request pressure on Eureka Server manageable in large-scale systems.
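The estimate above is easy to recompute for other intervals. The sketch below simply encodes the same arithmetic (heartbeats plus registry pulls per instance per minute); the class and method names are invented for the example, and it ignores registration, deregistration, and peer replication traffic.

```java
public class RegistryLoadEstimate {

    /** Heartbeat and registry-pull requests per minute across all instances. */
    static long requestsPerMinute(int instances, int heartbeatIntervalSeconds, int fetchIntervalSeconds) {
        long heartbeatsPerMinute = 60L / heartbeatIntervalSeconds;
        long fetchesPerMinute = 60L / fetchIntervalSeconds;
        return instances * (heartbeatsPerMinute + fetchesPerMinute);
    }

    static long requestsPerDay(int instances, int heartbeatIntervalSeconds, int fetchIntervalSeconds) {
        return requestsPerMinute(instances, heartbeatIntervalSeconds, fetchIntervalSeconds) * 60 * 24;
    }

    public static void main(String[] args) {
        // 20 services * 5 instances, default 30 s heartbeat and 30 s registry pull
        System.out.println("per minute: " + requestsPerMinute(100, 30, 30));
        System.out.println("per day:    " + requestsPerDay(100, 30, 30));
        // Same fleet with the 3 s pull interval from section 5
        System.out.println("per day with 3 s pull: " + requestsPerDay(100, 30, 3));
    }
}
```

Shortening the pull interval from 30 seconds to 3 seconds raises the per-instance pull rate tenfold, which is the trade-off to weigh against faster discovery.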

A problem in production: while a service is restarting, it can still be reached, but calls to it return errors

When restarting a service, stop the service first and then manually trigger its deregistration. If you do not deregister it manually, requests may still be routed to the instance while it is restarting and unavailable. If you deregister it manually, the restarting instance is removed from the registry and is no longer handed out to clients.

Regional problems

When the number of users is relatively large, our services may be deployed in different regions and different data centers. When we bring microservices online, we want a service to call other services in the same data center first, and to call services in other data centers only when the local ones are unavailable. This is similar to a CDN, and it reduces network latency. Eureka provides two concepts for this kind of partitioning.

  1. Region: a geographical area, such as Beijing.
  2. Zone: a unit within a region, such as Beijing data center A and Beijing data center B.
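The "same data center first, others as fallback" preference can be sketched as a simple filter. This is a hypothetical illustration of the idea, not Eureka's or any load balancer's actual selection code; the Instance class, its fields, and the zone names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ZoneAwareSelection {

    /** Hypothetical view of a registered instance: where it runs and how to reach it. */
    static final class Instance {
        final String host;
        final String zone;

        Instance(String host, String zone) {
            this.host = host;
            this.zone = zone;
        }
    }

    /** Prefer instances in the caller's zone; fall back to every instance if none are local. */
    static List<Instance> candidates(List<Instance> all, String callerZone) {
        List<Instance> sameZone = all.stream()
                .filter(instance -> instance.zone.equals(callerZone))
                .collect(Collectors.toList());
        return sameZone.isEmpty() ? all : sameZone;
    }

    public static void main(String[] args) {
        List<Instance> all = Arrays.asList(
                new Instance("10.0.1.10", "beijing-a"),
                new Instance("10.0.2.10", "beijing-b"),
                new Instance("10.0.2.11", "beijing-b"));

        System.out.println("caller in beijing-b sees " + candidates(all, "beijing-b").size() + " instances");
        System.out.println("caller in beijing-c sees " + candidates(all, "beijing-c").size() + " instances");
    }
}
```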

That is a lot of writing; if you have any questions, let's discuss them.

This article has been included in the headline “Programmer’s Book Club”, 8 Eureka optimization tips to increase productivity by 100 times