A look at service registration and discovery for microservices

Abstract:

Introduction

When service registration and discovery for microservices comes up, ZooKeeper, etcd, Consul, and Eureka are the components that immediately come to mind, followed by how each one handles CAP trade-offs, performance, high availability, and disaster recovery.

Before getting into that, I'd like to ask a few questions from the perspective of someone who uses these components:

How are the registered IP and port determined?
What other information needs to be registered to implement service governance?
How can a service be registered and taken offline gracefully?
How are health checks for registered services performed?
Can subscribers be notified when a node exits or a new node joins the service?
Can I easily see which services an application publishes and subscribes to, as well as which nodes it subscribes to?

After looking at these questions, you may find that service registration and discovery should focus first on the functionality of registration and discovery itself, and only then on performance and high availability.

A good service registration and discovery middleware should first fully cover the basic needs of service development and governance, and then deliver performance and high availability. If the features above have not been thought through, no amount of availability or performance will make up for it. Finally, security is equally important.

How well does the server perform?
What is the disaster recovery policy for service discovery?
What happens to my calls when there is a network problem between my application and the service discovery center?
What happens to my calls when one or all of the machines in the service registry go down?
Are the links for service registration and discovery secure, and are permissions well controlled?

Here are the main approaches to answering these questions in terms of service registration, service discovery, disaster recovery, and high availability.

Finally, we introduce ANS (Alibaba Naming Service), which combines the advantages of these approaches and is available through EDAS (Alibaba Enterprise Distributed Application Service), currently completely free.

Service registration

How are the registered IP and port determined?

How to determine the IP

There are several common ways to obtain the IP address.

The simplest way is to manually configure the IP address to register. This is not usable in production, because microservices are scaled horizontally and deployed on many machines. If the IP address is hardcoded in the configuration, the same build cannot be deployed to multiple machines, which drives up O&M costs.
Traverse the network adapters and take the first IP address that is not the loopback address. This works well in most cases, and frameworks such as Dubbo use this approach.
In standardized data centers with well-planned networks, you can specify the network adapter name (interfaceName) so that the IP address bound to that adapter is the one registered.
When none of the three methods above solves the problem, another option is to open a socket connection to the service registry and call socket.getLocalAddress() to obtain the local IP address that the connection actually uses.
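
The adapter traversal, the named-adapter lookup, and the socket fallback described above can be sketched in Java roughly as follows. This is a minimal illustration rather than what Dubbo or any registry client actually does: the class name LocalIpResolver and its method names are hypothetical, and a real client would usually add filters (for example for virtual adapters) that the text does not discuss.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class LocalIpResolver {

    /** Returns the first address in the list that is not a loopback address. */
    static Optional<InetAddress> firstNonLoopback(List<InetAddress> candidates) {
        for (InetAddress address : candidates) {
            if (!address.isLoopbackAddress()) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }

    /**
     * Lists the addresses bound to local adapters that are up.
     * Pass an adapter name (for example "eth0") to restrict the search, or null for all adapters.
     */
    static List<InetAddress> localAddresses(String interfaceName) throws SocketException {
        List<InetAddress> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            boolean nameMatches = interfaceName == null || interfaceName.equals(nic.getName());
            if (nic.isUp() && nameMatches) {
                result.addAll(Collections.list(nic.getInetAddresses()));
            }
        }
        return result;
    }

    /** Fallback: connect to the registry and ask the socket which local address it used. */
    static InetAddress viaRegistrySocket(String registryHost, int registryPort, int timeoutMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(registryHost, registryPort), timeoutMillis);
            return socket.getLocalAddress();
        }
    }
}
```

A caller would typically try firstNonLoopback(localAddresses(null)) first and fall back to viaRegistrySocket only when the result is empty or known to be wrong for the deployment.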

How to determine the port

There is no standardized approach to port acquisition.

For RPC applications, there is a configuration to specify the port on which the service listens during startup, and the port value of the configuration item is directly used during registration.
For HTTP applications provided by traditional WEB containers, there is also a configuration file to configure the container’s listening port, and the port value of the configuration item is directly used when registering.
In particular, in Java applications built on Spring Boot, the port can be obtained through EmbeddedServletContainerInitializedEvent.getEmbeddedServletContainer().getPort() (Spring Boot 1.x).

What other information needs to be registered to implement service governance?

Registering only the IP and port is enough for basic service invocation, but once the business grows past a certain point, further requirements appear:

Knowing whether TLS is enabled for an HTTP service.
Setting different weights for different nodes of the same service in order to schedule traffic.
Splitting a service into pre-release and production environments to support A/B testing.
Registering services with a data center label so that calls can prefer nodes in the same data center.

These advanced features ultimately depend on the load balancing and invocation strategies of the calling client, but if the service metadata was never registered, even the best strategy has nothing to work with, like a skilled cook asked to prepare a meal without rice. A good service registry should support these extended fields from the very start of its design.
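
To make the role of metadata concrete, here is a minimal Java sketch of an instance record carrying extended fields, together with two client-side strategies that consume them: same-data-center preference and weighted selection. Everything in it is an assumption made for illustration: the classes ServiceInstance and MetadataRouter, and the metadata key "dataCenter", are hypothetical and not taken from any particular registry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical instance record: IP and port plus the extended fields discussed above.
final class ServiceInstance {
    final String ip;
    final int port;
    final double weight;                 // 0 means "do not send traffic here"
    final Map<String, String> metadata;  // free-form fields such as tls, env, dataCenter

    ServiceInstance(String ip, int port, double weight, Map<String, String> metadata) {
        this.ip = ip;
        this.port = port;
        this.weight = weight;
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }
}

final class MetadataRouter {

    /** Keeps instances in the caller's data center; falls back to all instances if none match. */
    static List<ServiceInstance> preferSameDataCenter(List<ServiceInstance> all, String localDc) {
        List<ServiceInstance> local = new ArrayList<>();
        for (ServiceInstance instance : all) {
            if (localDc.equals(instance.metadata.get("dataCenter"))) {
                local.add(instance);
            }
        }
        return local.isEmpty() ? all : local;
    }

    /**
     * Weighted pick. The caller supplies roll in [0, 1), for example from
     * ThreadLocalRandom.current().nextDouble(). Returns null if no instance has positive weight.
     */
    static ServiceInstance pickByWeight(List<ServiceInstance> candidates, double roll) {
        double total = 0;
        for (ServiceInstance instance : candidates) {
            total += Math.max(0, instance.weight);
        }
        double point = roll * total;
        ServiceInstance lastPositive = null;
        for (ServiceInstance instance : candidates) {
            if (instance.weight <= 0) {
                continue; // weight 0 instances are never selected
            }
            lastPositive = instance;
            if (point < instance.weight) {
                return instance;
            }
            point -= instance.weight;
        }
        return lastPositive; // guards against floating-point rounding at the upper edge
    }
}
```

Passing the random roll in as a parameter keeps the selection deterministic and easy to test; neither strategy is possible unless the weight and the data center label were registered in the first place.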

How can a service be registered and taken offline gracefully?

Graceful registration

Although service registration generally happens during the startup phase, strictly speaking a service should not be registered until it has fully started and is ready to serve external requests.

Some RPC frameworks provide a way to determine whether the service has started. Thrift, for example, exposes this through server.isServing().
Other RPC frameworks provide no such method. In that case, we can check whether the port is already listening.
For HTTP services, whether the service has started can likewise be determined by checking whether the port is in the listening state.
In particular, in Java applications built on Spring Boot, the EmbeddedServletContainerInitializedEvent event signals that the container has finished starting (Spring Boot 1.x).
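
For frameworks without a readiness API, the port check mentioned above can be sketched as a plain TCP connection attempt. This is a minimal illustration; the class name PortProbe, the method names, and the fixed 500 ms connect timeout are assumptions, and an accepted connection only shows that the port is listening, not that the application behind it is fully initialized.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, for illustration only.
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Polls the port up to the given number of attempts, pausing after each failed attempt. */
    static boolean awaitListening(String host, int port, int attempts, long pauseMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (isListening(host, port, 500)) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return false;
    }
}
```

A startup routine would call awaitListening on the service's own port and register with the service registry only after it returns true.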

Graceful offline

Most service registries provide a health check that automatically removes a service's node once the application has stopped. However, we should not rely on this entirely: the application should actively call the registry's deregistration (service offline) interface when it stops.

In Java applications, the generic way to invoke the service offline interface is a JVM shutdown hook.
In particular, in Java applications built on Spring, the Spring bean lifecycle can be used to call the service offline interface when the application stops.
Neither method is fully graceful, however. There is no guarantee against an abrupt stop such as kill -9, and the offline call itself is only a best-effort attempt with no handling for failures such as a network outage. The calling client should therefore still implement load balancing and failover.
A more graceful approach is to set the application's weight to 0 first, so that upstream applications stop calling it. Stopping the application then has no impact on service subscribers. This does require subscribers to implement weighted load balancing, and it requires tight integration with the O&M deployment tooling.

How is the health check of the service done?

There are two health check methods: client heartbeat and server active detection.

Client Heartbeat
The client periodically sends a heartbeat message to the server to report that the service is alive. The heartbeat can be sent over TCP or HTTP.
A client heartbeat can also be implemented by keeping a long-lived socket connection open between client and server.
With ZooKeeper, the application does not send heartbeats itself. Instead it relies on ZooKeeper's ephemeral (temporary) nodes, which exist only as long as the session connected to ZooKeeper is alive.

With client heartbeats, however, both the long-lived connection and the actively sent heartbeat only show that the link is healthy, not necessarily that the service itself is.
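
On the registry side, client heartbeats usually translate into a last-seen timestamp per instance plus an expiry sweep. The following Java sketch shows that bookkeeping under stated assumptions: the class HeartbeatTable, its method names, and the TTL value used in the example are hypothetical, and the current time is passed in explicitly so the behavior is easy to reason about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry-side bookkeeping for client heartbeats.
final class HeartbeatTable {
    private final long ttlMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatTable(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Records a heartbeat received from the given instance at the given time. */
    void beat(String instanceId, long nowMillis) {
        lastSeen.put(instanceId, nowMillis);
    }

    /** Removes and returns every instance whose last heartbeat is older than the TTL. */
    List<String> evictExpired(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            boolean expired = nowMillis - entry.getValue() > ttlMillis;
            // remove(key, value) leaves the entry alone if a newer heartbeat arrived meanwhile
            if (expired && lastSeen.remove(entry.getKey(), entry.getValue())) {
                evicted.add(entry.getKey());
            }
        }
        return evicted;
    }

    boolean isRegistered(String instanceId) {
        return lastSeen.containsKey(instanceId);
    }
}
```

With a TTL of 15 seconds, an instance that last reported at t = 0 is evicted by a sweep at t = 16 s, while one that reported at t = 10 s survives. As noted above, surviving the sweep says only that heartbeats are arriving, not that the service can actually handle requests.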

A more accurate approach is for the server to actively call the service as a health check: a successful response shows that the service really is healthy.

Active detection by the server
The server calls an HTTP interface of the service publisher to perform the health check.
For RPC applications that do not expose HTTP, the server calls an RPC interface of the service publisher to perform the health check.
The check can also be performed by executing a script.

Active probing by the server has problems of its own. A service registry has no generic way to call an arbitrary interface of every RPC service. And in many deployments the service publisher is not reachable from the service registry over the network, so the server cannot initiate a health check at all.

Which method to choose therefore depends on the actual situation; different scenarios call for different strategies.

Service discovery

How does a client find the address of the service discovery server?

Specify the address of the service registry in the application configuration file, as is done with ZooKeeper and Eureka.
Specify the address of an address server and use this address server to obtain the address of the service registry. The results returned by the address server are updated as the service registry expands and shrinks.

How can subscribers be notified when a node exits or a new node joins the service?

This is the classic push versus pull problem.

There are two classic implementations of push. One is notification over a long-lived socket connection, of which ZooKeeper is the typical example. The other is long polling over HTTP.

However, both socket-based notification and HTTP long polling can lose notification messages.

Regular polling through pull is therefore also necessary, and the polling interval matters: the higher the frequency, the greater the load on the service registry. The trade-off has to be made based on the performance of the server and the scale of the business.
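
One way to combine the two is to route both pushed notifications and periodic pull results through the same cache update, so that a pull silently repairs whatever a lost notification left stale. The Java sketch below illustrates this with full node-list snapshots; the class SubscriptionCache and the snapshot-based update are assumptions made for illustration, not how any specific registry client works.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical subscriber-side cache for one service's node list.
final class SubscriptionCache {

    /** What changed between two snapshots of the node list. */
    static final class Change {
        final Set<String> added;
        final Set<String> removed;

        Change(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }

        boolean isEmpty() {
            return added.isEmpty() && removed.isEmpty();
        }
    }

    private Set<String> nodes = new TreeSet<>();

    /**
     * Replaces the cached node list with the latest snapshot and reports the difference.
     * Called with pushed data and with the result of every periodic pull.
     */
    synchronized Change update(Collection<String> latest) {
        Set<String> next = new TreeSet<>(latest);
        Set<String> added = new TreeSet<>(next);
        added.removeAll(nodes);
        Set<String> removed = new TreeSet<>(nodes);
        removed.removeAll(next);
        nodes = next;
        return new Change(added, removed);
    }

    synchronized Set<String> snapshot() {
        return new TreeSet<>(nodes);
    }
}
```

Listeners only need to be notified when the returned Change is non-empty, so a pull that merely confirms the cached state costs the application nothing beyond the request itself.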

There is one more option, a true push: the client starts a UDP server and the service registry pushes data to it over UDP. This, too, is limited by network connectivity.

Can I easily see which services I publish and subscribe to, and which nodes the subscribed services have?

A good product must offer a good experience to both users and operators. If the only way to see the services published and subscribed on a machine is to read logs or even take a heap dump with jmap, the experience is obviously terrible.
The service registry should provide a rich query interface that supports combined, multi-level queries by application name, IP address, subscribed service name, and published service name.
At the same time, the client should keep its publish and subscribe information in memory and offer an easy way to query it.
For example, in a Java application built on Spring Boot, an actuator endpoint can expose a local query function over HTTP that returns the services the application publishes, the services it subscribes to, and the nodes of each subscribed service.

Disaster recovery and high availability

How is the performance?

When the number of service nodes increases, the performance of the service registry becomes a bottleneck. In this case, horizontal capacity expansion is required to improve the performance of the service registry cluster.

For components that use a Paxos-like protocol for strong consistency, such as ZooKeeper, more than half of the nodes must confirm every write. Horizontal expansion can therefore only improve the read performance of the cluster, not its write performance.
For components that adopt eventual consistency, horizontal expansion improves both the write and the read performance of the cluster.
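
The "more than half" rule can be made concrete with a little arithmetic. The sketch below (the class name Quorum is hypothetical) computes how many acknowledgements a write needs and how many node failures a majority-based cluster can tolerate.

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
final class Quorum {

    /** Smallest number of nodes that is more than half of the cluster. */
    static int majority(int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        return clusterSize / 2 + 1;
    }

    /** Number of nodes that can fail while a majority is still reachable. */
    static int tolerableFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }
}
```

Growing a cluster from 3 to 5 nodes raises the acknowledgements required per write from 2 to 3, which is why adding nodes helps reads and fault tolerance but not write throughput.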

Client disaster recovery policies

First, the local in-memory cache. It allows services to be invoked normally when the connection to the service registry is lost at run time or when the service registry is completely down.
Second, the local cache file. If the application restarts while it is partitioned from the service registry, or while the registry is completely down, there is no data in memory, so the application reads the local cache file to recover its last subscription content.
Finally, the local disaster recovery folder. Normally this folder is empty. When the server is down, cannot be recovered for a long time, and the service providers have changed significantly, you can enable local disaster recovery by adding files to the folder. The client then ignores the original local cache file and reads its configuration from the disaster recovery files.
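
The three layers can be read as a lookup order. The Java sketch below is one possible interpretation, not a description of a specific client: the class NodeSourceResolver is hypothetical, each layer is modeled as a map from service name to node list, and it assumes that an entry in the disaster recovery folder overrides the in-memory cache as well as the cache file (the text above only states that it overrides the cache file).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver; each map stands in for one client-side data source.
final class NodeSourceResolver {

    /**
     * Looks up the node list for a service.
     * Assumed order: disaster recovery folder, then in-memory cache, then local cache file.
     */
    static Optional<List<String>> resolve(String service,
                                          Map<String, List<String>> drFolder,
                                          Map<String, List<String>> memoryCache,
                                          Map<String, List<String>> cacheFile) {
        List<String> nodes = drFolder.get(service);   // operator override, normally absent
        if (nodes == null) {
            nodes = memoryCache.get(service);          // last data received from the registry
        }
        if (nodes == null) {
            nodes = cacheFile.get(service);            // survives an application restart
        }
        return Optional.ofNullable(nodes);
    }
}
```

Because the disaster recovery folder is normally empty, the first lookup falls through in everyday operation and the client behaves as if only the memory cache and the cache file existed.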

Server disaster recovery and high availability

When a new node joins the cluster, it is automatically added to the address server once it has started. Through the address server it can find the other nodes and automatically synchronize data from them until its data is consistent.
When a node goes down, its entry is automatically removed from the address server, and clients learn promptly that the node is offline.

Because the server is stateless, disaster recovery and high availability can be kept very lightweight.

How is server-side security handled?

Link security. For service registries that use HTTP connections, the best way to secure the link is HTTPS. For service registries that use TCP connections, the application-layer protocol is usually proprietary, so ready-made TLS support may not be available.

Service security. Authentication information should be attached to every publish, subscribe, and heartbeat request so that the server can verify and authorize the caller.
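
The source does not say which authentication scheme is used, so the following is only one common possibility: signing each request with an HMAC over the action, the service name, and a timestamp, using a secret shared between client and registry. The class RequestSigner, the payload layout, and the clock-skew check are all assumptions made for illustration; the JDK calls (Mac, SecretKeySpec, MessageDigest.isEqual, Base64) are standard.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signing scheme, for illustration only.
final class RequestSigner {
    private static final String ALGORITHM = "HmacSHA256";

    /** Signs "action\nserviceName\ntimestamp" with the shared secret. */
    static String sign(String secretKey, String action, String serviceName, long timestampMillis)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance(ALGORITHM);
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), ALGORITHM));
        String payload = action + "\n" + serviceName + "\n" + timestampMillis;
        return Base64.getEncoder().encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }

    /** Server side: rejects stale timestamps, then compares signatures in constant time. */
    static boolean verify(String secretKey, String action, String serviceName, long timestampMillis,
                          String signature, long nowMillis, long maxSkewMillis)
            throws GeneralSecurityException {
        if (Math.abs(nowMillis - timestampMillis) > maxSkewMillis) {
            return false;
        }
        String expected = sign(secretKey, action, serviceName, timestampMillis);
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                signature.getBytes(StandardCharsets.UTF_8));
    }
}
```

The client would send the action, service name, timestamp, and signature with each publish, subscribe, or heartbeat request; the registry recomputes the signature with the caller's secret and rejects the request if verification fails.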

Alibaba Naming Service

ANS (Alibaba Naming Service) is an open source product that the Alibaba middleware team has refined through years of business practice. For service registration and discovery, ANS combines the advantages of the approaches above and is the most suitable service registration and discovery component for cloud native applications. ANS is available in EDAS (Alibaba Enterprise Distributed Application Service), and a Spring Cloud ANS Starter is provided so that Spring Cloud users can directly use a secure and reliable commercial-grade service registration and discovery service. ANS fully supports Eureka's features and is currently completely free. See the EDAS help documentation for more information.
