The previous article, “Eureka Caching Mechanism”, introduced Eureka's caching mechanism, and you should now have a better understanding of Eureka. This article details how the API gateway achieves real-time awareness of services going offline.

1. Foreword

In cloud-based microservices applications, the network location of service instances is dynamically assigned. And service instances change dynamically frequently due to automatic scaling, failures, and upgrades. Therefore, the client code needs to use a more sophisticated service discovery mechanism.

At present, there are two main modes of service discovery: client discovery and server discovery.

  • Server discovery: The client sends requests through a load balancer. The load balancer queries the service registry and routes each request to an available service instance.
  • Client discovery: The client is responsible for determining the network addresses of available service instances and load balancing requests across them. The client queries the service registry, which is a database of available services, then uses a load-balancing algorithm to select an available instance and sends the request to it.

The biggest difference between client discovery and server discovery is that with client discovery, the client knows (caches) the registry's list of available services. If the client cache is not updated from the server in time, the client and server caches become inconsistent.
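To make the client-discovery mode concrete, here is a minimal sketch in plain Java, assuming a periodically refreshed registry snapshot and a simple round-robin strategy. The class and method names are illustrative, not Ribbon's actual API:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of client-side discovery: the client caches a snapshot of the
// registry locally and load-balances across it (hypothetical names, not Ribbon's API).
public class ClientSideDiscovery {
    private final List<String> cachedInstances; // e.g. fetched periodically from the registry
    private final AtomicInteger counter = new AtomicInteger();

    public ClientSideDiscovery(List<String> instancesFromRegistry) {
        this.cachedInstances = instancesFromRegistry;
    }

    // Simple round-robin selection over the locally cached list.
    public String choose() {
        int i = Math.floorMod(counter.getAndIncrement(), cachedInstances.size());
        return cachedInstances.get(i);
    }
}
```

The key point is that `choose()` never contacts the registry: routing decisions are made entirely against the local cache, which is what makes stale caches a correctness problem.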

2. Using the gateway with Eureka

Netflix OSS provides a good example of client-side service discovery: Eureka Server is the registry, and Zuul is a Eureka Client. Zuul caches the Eureka Server's service list locally and refreshes it with a scheduled task; it discovers other services through this local list and uses Ribbon for client-side load balancing.

Normally, a caller sends a request to the gateway and gets an immediate response. However, when a producer is scaled down, taken offline, or upgraded, the service list on the load-balancer side is not updated in time, because of Eureka's multi-level cache design and its periodic update mechanism (see the previous article on the Eureka caching mechanism). The consumer's worst-case time to perceive the change approaches 240 seconds. If a consumer sends a request to the gateway during that window, the load balancer forwards it to an instance that no longer exists, and the request times out.
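The roughly 240-second worst case can be reconstructed by summing Eureka's default timer intervals. The breakdown below is an illustration based on Eureka's and Ribbon's commonly cited defaults, not an exact trace of any one request:

```java
// Worst-case propagation delay for an instance going offline abnormally,
// summed from default timer intervals (illustrative breakdown).
public class EurekaDelayBudget {
    public static int worstCaseSeconds() {
        int leaseExpiry   = 90; // server waits up to 90s of missed heartbeats before expiring the lease
        int evictionTask  = 60; // the eviction task runs every 60s
        int readOnlySync  = 30; // readWriteCacheMap -> readOnlyCacheMap sync interval
        int clientFetch   = 30; // Eureka Client registry-fetch interval
        int ribbonRefresh = 30; // Ribbon server-list refresh interval
        return leaseExpiry + evictionTask + readOnlySync + clientFetch + ribbonRefresh;
    }
}
```

For a graceful offline (where the instance deregisters itself), the first two terms drop out, but the three 30-second cache layers still add up to about 90 seconds of staleness.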

3. Solution

3.1 Implementation Roadmap

After a producer goes offline, the change is perceived first by the readWriteCacheMap in the Eureka Server and last by the load balancer in the gateway core, because the load balancer discovers producers through a service list it maintains locally.

Therefore, for the gateway to perceive a producer going offline in real time, we can proceed as follows: first, the producer or the deployment platform actively notifies the Eureka Server; then the Eureka Client in Zuul is notified directly, bypassing the update intervals between Eureka's multi-level caches; finally, the updated service list in the Eureka Client is pushed to Ribbon.

However, if the offline-notification logic is placed in the producer itself, it causes problems such as code pollution and language differences across producers.

To borrow a famous phrase:

“Any problem in computer science can be solved by another level of indirection.”

gateway-synchspeed is such a proxy service. It provides REST APIs that respond to callers' offline requests and synchronizes the producer's state to the Eureka Server and the gateway core, acting as a state synchronizer and a “soft transaction”.

The idea: before a producer is scaled down, taken offline, or upgraded, the Spider platform actively notifies gateway-synchspeed that a producer instance is about to go offline. gateway-synchspeed then notifies the Eureka Server that the instance has gone offline; if the Eureka Server takes it offline successfully, gateway-synchspeed notifies the gateway core directly.

Design features

  • Non-invasive and easy to use. Regardless of the language the caller is implemented in, the caller simply sends an HTTP REST request to gateway-synchspeed; the real implementation logic is handled by the proxy rather than by the caller.

  • Atomicity. Taking the caller offline at the Eureka Server first and then in all relevant gateway cores is treated as the minimum unit of work. gateway-synchspeed acts as a “soft transaction”, guaranteeing a degree of atomicity for the offline operation.

3.2 Implementation Procedure

Step-by-step instructions

  • Step 1: Before the producer is scaled down, taken offline, or upgraded, the Spider platform notifies the gateway-synchspeed service via an HTTP request. The notification granularity is the IP address of the container where the service instance runs.

  • Step 2: After receiving the request, gateway-synchspeed verifies that the IP address is valid and then notifies the Eureka Server.

  • Step 3: The Eureka Server sets the producer to an invalid state and returns the result. (Eureka supports two forms of offline: one removes the producer from the service registry outright; the other sets the producer's status to OUT_OF_SERVICE. With the first form, Spider cannot guarantee that the producer process is killed immediately after the offline request is sent, so if the producer's heartbeat reaches the Eureka Server in the meantime, the service is re-registered. The status-override form avoids this.)

  • Step 4: gateway-synchspeed receives the result of the previous step. If it succeeded, continue to the next step; otherwise, stop.

  • Step 5: gateway-synchspeed is itself a Eureka Client. It uses the IP address to look up the producer's application-name in its local service registry.

  • Step 6: gateway-synchspeed uses the application-name to query the gateway core library for the names of all gateway groups related to the offline service.

  • Step 7: Based on the gateway group names, gateway-synchspeed looks up the addresses (IP:port) of all gateway services in those groups in its local service list.

  • Step 8: gateway-synchspeed asynchronously notifies all relevant gateway-core nodes.

  • Step 9: After receiving the notification, gateway-core takes the producer offline and records every successfully offlined instance in its DownServiceCache.

  • Step 10: gateway-core updates its local Ribbon service list.
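The ten steps above can be condensed into an orchestration sketch. All type and method names below are hypothetical stand-ins for gateway-synchspeed's real implementation:

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of gateway-synchspeed's offline flow: verify the IP,
// take the instance offline at the Eureka Server first, then fan the
// notification out to every gateway-core node in the related gateway groups.
public class OfflineOrchestrator {
    interface EurekaAdmin { boolean setOutOfService(String instanceIp); }
    interface GatewayRegistry { List<String> gatewayNodesFor(String instanceIp); }
    interface GatewayNotifier { void notifyOffline(String gatewayNode, String instanceIp); }

    private final Predicate<String> ipValidator;
    private final EurekaAdmin eureka;
    private final GatewayRegistry registry;
    private final GatewayNotifier notifier;

    public OfflineOrchestrator(Predicate<String> ipValidator, EurekaAdmin eureka,
                               GatewayRegistry registry, GatewayNotifier notifier) {
        this.ipValidator = ipValidator;
        this.eureka = eureka;
        this.registry = registry;
        this.notifier = notifier;
    }

    // Returns true only if Eureka accepted the offline request (steps 2-4);
    // in the real system the gateway notifications (steps 5-10) run asynchronously.
    public boolean offline(String instanceIp) {
        if (!ipValidator.test(instanceIp)) return false;          // step 2: verify the IP
        if (!eureka.setOutOfService(instanceIp)) return false;    // steps 3-4: Eureka first, stop on failure
        for (String node : registry.gatewayNodesFor(instanceIp))  // steps 5-7: resolve gateway nodes
            notifier.notifyOffline(node, instanceIp);             // step 8: fan out
        return true;
    }
}
```

Ordering is the important design choice here: the Eureka Server is updated before any gateway node, so a failure at the registry stops the flow and no gateway is left with a view the registry does not share.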

4. Compensation mechanism

Eureka provides a safeguard here. Before updating its service list from the Eureka Server, the Eureka Client checks whether the hash value has changed; if it has, the update switches from incremental to full. (As the Eureka caching mechanism article shows, the data in readOnlyCacheMap and readWriteCacheMap may diverge for up to 30 seconds.) If a full fetch served from readOnlyCacheMap overwrites the client's cached list, the Ribbon service list becomes inconsistent with readWriteCacheMap, and instances that were just taken offline can reappear.

To compensate for this, a listener, EurekaEventListener, is introduced. It listens for the Eureka Client's full-fetch events; any instance that was taken offline within the last 30 seconds but reappears in the fetched data has its status reset to OUT_OF_SERVICE.
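The compensation rule can be expressed as a pure function: given the instances reported UP by a full fetch and the timestamps at which the gateway offlined instances (the DownServiceCache), determine which must be reset to OUT_OF_SERVICE. The names below are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the compensation rule: an instance offlined less than 30s ago that
// reappears as UP in a full fetch was resurrected by stale readOnlyCacheMap
// data and must be pushed back to OUT_OF_SERVICE.
public class OfflineCompensator {
    static final long WINDOW_MS = 30_000;

    // downTimestamps: instanceId -> time the gateway offlined it (DownServiceCache).
    public static List<String> instancesToReset(List<String> upInstancesFromFullFetch,
                                                Map<String, Long> downTimestamps,
                                                long nowMs) {
        return upInstancesFromFullFetch.stream()
                .filter(id -> {
                    Long downAt = downTimestamps.get(id);
                    return downAt != null && nowMs - downAt < WINDOW_MS;
                })
                .collect(Collectors.toList());
    }
}
```

The 30-second window mirrors the maximum divergence between readOnlyCacheMap and readWriteCacheMap; older entries need no compensation because by then both caches agree.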

5. API security design

For system security: malicious access could take producers offline in the Eureka Server for no reason, leaving consumers unable to discover them through the Eureka Server.

The basic process for security filtering is as follows:

  • Configure a whitelisted network segment (an IP segment) in gateway-synchspeed.

  • Add a filter to gateway-synchspeed that verifies the IP address of the offline requester. If the requester's IP address falls within the whitelisted segment, the request is allowed; otherwise, it is rejected.
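The whitelist check itself is little code. A minimal sketch for IPv4 CIDR segments in plain Java (illustrative; a production filter would also need to handle IPv6 and malformed input defensively):

```java
// Minimal IPv4 CIDR whitelist check, of the kind the filter above might use.
public class IpWhitelist {
    private static int toInt(String ip) {
        String[] p = ip.split("\\.");
        return (Integer.parseInt(p[0]) << 24) | (Integer.parseInt(p[1]) << 16)
             | (Integer.parseInt(p[2]) << 8)  |  Integer.parseInt(p[3]);
    }

    // cidr like "10.100.0.0/16": allow the request only if ip falls inside it.
    public static boolean allowed(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefixLen = Integer.parseInt(parts[1]);
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }
}
```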

6. Log backtracking

Since gateway-synchspeed and gateway-core are deployed in Docker containers, all log files are lost if a container restarts. Therefore, both services write their logs to Elasticsearch, and Kibana queries Elasticsearch and displays the data visually.

7. Code snippets

gateway-synchspeed performs status synchronization
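A hedged sketch of the status-synchronization call, built on Eureka's REST endpoint for status overrides (`PUT /eureka/apps/{appName}/{instanceId}/status?value=...`); the server URL, application name, and instance id below are placeholders, and the real service would also handle the HTTP response and retries:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds the Eureka REST call that overrides an instance's status, which is
// the operation the status-synchronization step performs.
public class EurekaStatusSync {
    public static String statusOverrideUrl(String eurekaServer, String appName,
                                           String instanceId, String status) {
        return String.format("%s/eureka/apps/%s/%s/status?value=%s",
                eurekaServer, appName, instanceId, status);
    }

    // Builds (but does not send) the PUT request that marks the instance OUT_OF_SERVICE.
    public static HttpRequest outOfServiceRequest(String eurekaServer, String appName,
                                                  String instanceId) {
        String url = statusOverrideUrl(eurekaServer, appName, instanceId, "OUT_OF_SERVICE");
        return HttpRequest.newBuilder(URI.create(url))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Sending the request with `java.net.http.HttpClient` and checking for a 200 response would complete steps 2 through 4 of the procedure above.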

EurekaEventListener processes cached data
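A skeleton of the listener's shape: Eureka's client delivers a CacheRefreshedEvent to registered EurekaEventListeners after each registry fetch. The event and client types below are local stand-ins so the sketch stays self-contained, not Eureka's actual classes:

```java
import java.util.ArrayList;
import java.util.List;

// Skeleton of the compensation listener: after every cache-refresh event it
// re-applies OUT_OF_SERVICE to instances the gateway offlined recently,
// undoing any resurrection caused by a stale full fetch.
public class CompensationListener {
    interface EurekaEvent {}
    static class CacheRefreshedEvent implements EurekaEvent {}
    interface StatusClient { void setOutOfService(String instanceId); }

    private final List<String> recentlyDown;   // populated from DownServiceCache
    private final StatusClient statusClient;

    CompensationListener(List<String> recentlyDown, StatusClient statusClient) {
        this.recentlyDown = new ArrayList<>(recentlyDown);
        this.statusClient = statusClient;
    }

    public void onEvent(EurekaEvent event) {
        if (event instanceof CacheRefreshedEvent) {
            // A full fetch may have resurrected these instances; push them back down.
            recentlyDown.forEach(statusClient::setOutOfService);
        }
    }
}
```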

8. Supplementary notes

At present, the gateway's real-time awareness of services going offline is implemented with Spring Cloud Zuul 1.3.6.RELEASE and Spring Cloud Eureka 1.4.4.RELEASE.

At present, real-time awareness applies to the gateway's downstream services, and the following conditions must be met:

  • Producers are deployed on the Kubernetes container-management platform.
  • The producer goes offline through a normal scale-down, offline, or upgrade operation. Abnormal offline events, such as a service crash caused by insufficient container resources, are not supported.

Real-time awareness of service offline is an optional capability the gateway offers to business teams. It is disabled by default on the Spider platform; each business team decides whether to enable it based on its system requirements. For configuration details, see the “real-time awareness of gateway service offline” configuration documentation in the API Gateway Access Guide on Spider.

Author: Xie Guohui

Source: Creditease Institute of Technology