1. Preface

At present, there are five mainstream registries for microservices:

  • Zookeeper
  • Eureka
  • Nacos
  • Consul
  • Kubernetes

So how do you choose in real-world development? This is a question worth studying in depth. Don't worry: today Chen mou will walk you through these registries in depth and show you how to choose among them.

This is the fourth article in the Spring Cloud Advancements column, which follows:

  • How strong is Nacos, the soul ferryman of microservices? 55 diagrams tell you
  • Nine hard questions about OpenFeign: who can withstand them?
  • An Ali interviewer asks: how do you choose among the Nacos, Apollo, and Config configuration centers? These 10 dimensions tell you!

2. Why is a registry needed?

As monolithic applications are split into microservices, the first challenge is the sheer number of service instances and the dynamically changing addresses the services expose. The runtime state of service instances may change frequently due to scaling, failures, and upgrades, as shown in the following figure:

The product-details service needs to call three services: marketing, order, and inventory. The problems are as follows:

  • The addresses of the marketing, order, and inventory services can change dynamically. Managing them with static configuration alone means frequent changes, and if they are written into a configuration file, the system must also be restarted, which is far too unfriendly for production.

  • When these services are deployed as clusters, how does the caller implement load balancing?

The solution to the first problem follows the old saying attributed to a great man: there is no problem in computer science that cannot be solved by adding a layer of indirection. That middle layer is our registry.

The second problem, load balancing, has to be solved in combination with that same middle layer.

3. How to implement a registry?

To see how to implement a registry, we first abstract a model of how services interact with each other. Let's illustrate with a concrete case, taking the product service as an example:

  1. When we search for products, the product service is the provider;
  2. When we query product details, the product service is both a provider and a consumer, consuming the order, inventory, and other services. Thus we need to introduce three roles: the middle layer (registry), the provider, and the consumer, as shown below:

The overall execution process is as follows:

  1. When a service starts, the service provider registers its own service with the registry through its embedded registry client, including the host address, service name, and other information.

  2. When services start or change, the registry client on the service consumer side obtains the registered service instances from the registry and removes instances that have gone offline.

There is also a local route cache in the design. Caching routes locally improves the efficiency and fault tolerance of service routing: service consumers can use the cache to speed up routing, and, more importantly, they can fall back on locally cached routes to keep making reliable calls to known services when the registry itself is unavailable.
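As a concrete illustration, here is a minimal sketch of such a consumer-side client in Java. The names (RegistryBackend, CachingDiscoveryClient) are hypothetical stand-ins, not any real library's API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for any real registry (Zookeeper, Eureka, Nacos, ...).
interface RegistryBackend {
    List<String> fetchInstances(String serviceName) throws Exception; // e.g. ["10.0.0.1:8080"]
}

class CachingDiscoveryClient {
    private final RegistryBackend backend;
    // Local route cache: service name -> last known instance list.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    CachingDiscoveryClient(RegistryBackend backend) { this.backend = backend; }

    List<String> getInstances(String serviceName) {
        try {
            List<String> fresh = backend.fetchInstances(serviceName);
            cache.put(serviceName, fresh);            // refresh the cache on success
            return fresh;
        } catch (Exception registryUnavailable) {
            // Registry is down: fall back to the last known routes.
            return cache.getOrDefault(serviceName, List.of());
        }
    }
}
```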

One of the difficulties in the whole implementation is how the service consumer learns about changes on the producer side in a timely manner. This is essentially the classic producer-consumer problem, and there are two ways to solve it:

  1. Publish-subscribe: the service consumer watches service updates in real time, usually via listeners and callbacks; the classic example is Zookeeper.

  2. Active pull: the service consumer periodically calls the service-retrieval interface provided by the registry to get the latest service list and update its local cache; the classic example is Eureka. (A minimal sketch of both styles follows below.)
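The sketch below contrasts the two update strategies, reusing the hypothetical RegistryBackend interface from the earlier sketch; the 30-second period is an assumption mirroring Eureka's default fetch interval:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Style 1: publish-subscribe. The registry client pushes change events to a listener.
interface ServiceChangeListener {
    void onChange(String serviceName, List<String> instances); // invoked on each update
}

// Style 2: active pull. The consumer polls the registry on a fixed schedule.
class PollingWatcher {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void watch(String serviceName, Consumer<List<String>> onUpdate, RegistryBackend backend) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                onUpdate.accept(backend.fetchInstances(serviceName)); // refresh local cache
            } catch (Exception ignored) {
                // Keep the previous cache on failure; eventual consistency is acceptable.
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```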

Behind the choice between these two approaches there is actually a data-consistency problem worth discussing: if you choose timer-based pulling, you must abandon strong consistency and settle for eventual consistency. I won't expand on that here. You could also bring up features such as service removal, but in my view those are just extras: the key to a registry lies in service registration and discovery, and everything else is icing on the cake.

4. How to solve the load balancing problem?

Load balancing can be implemented in two ways:

  1. Load balancing on the server side;

  2. Load balancing on the client side. The two schemes are essentially similar in implementation; only the carrier differs: one lives on a server, the other inside the client, as shown below:

Load balancing on the server side gives service providers stronger traffic control, but it cannot meet the needs of different consumers who want to use different load balancing policies.

Client-side load balancing offers this flexibility and friendlier support for user extension. However, if client-side policies are misconfigured, they can create hot spots on some service providers, or leave the consumer with no available provider at all.

Nginx is the typical representative of server load balancing, and Ribbon is the typical representative of client load balancing. Each method has its classic representative, and we can learn more about it.

As for common load-balancer algorithm implementations, there are the following six:

1. Round-robin

Requests are distributed to the back-end servers in sequential rotation, treating every back-end server equally regardless of its actual connection count or current system load.
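A minimal round-robin selector in Java (the class name and server list are illustrative, not any particular library's API):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinBalancer {
    private final AtomicInteger counter = new AtomicInteger();

    String choose(List<String> servers) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```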

2. Random

A server is picked at random from the back-end list using the system's random number generator. By probability theory, as the number of client calls grows, the actual effect approaches an even distribution of call volume across the back-end servers, which is the same result round-robin achieves.

3. Source-address hash

The idea of the hash method is to compute a value from the client's IP address with a hash function, then take that value modulo the size of the server list; the result is the index of the server this client will access. With source-address hashing, as long as the back-end server list stays unchanged, clients with the same IP address are mapped to the same back-end server on every access.
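A sketch of source-address hashing under the same illustrative conventions:

```java
import java.util.List;

class IpHashBalancer {
    String choose(String clientIp, List<String> servers) {
        // Same IP -> same index, as long as the server list does not change.
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }
}
```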

4. Weighted round-robin

Back-end servers may differ in machine configuration and current load, so their capacity to handle pressure differs too. Machines with high configuration and low load get higher weights and handle more requests, while low-configuration, heavily loaded machines get lower weights to reduce their burden; weighted round-robin then distributes requests to the back end in order, according to these weights.

5. Weighted random

Like weighted round-robin, weighted random assigns different weights according to each back-end machine's configuration and system load. The difference is that it picks a back-end server at random in proportion to the weights rather than in sequence.
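A weighted-random sketch: picking a point in [0, totalWeight) and walking the list gives each server a selection probability proportional to its weight. Names are illustrative:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

record WeightedServer(String address, int weight) {}

class WeightedRandomBalancer {
    String choose(List<WeightedServer> servers) {
        int total = servers.stream().mapToInt(WeightedServer::weight).sum();
        int point = ThreadLocalRandom.current().nextInt(total); // 0 <= point < total
        for (WeightedServer s : servers) {
            point -= s.weight();
            if (point < 0) return s.address();   // landed inside this server's weight band
        }
        throw new IllegalStateException("unreachable when all weights are positive");
    }
}
```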

6. Least connections

The least-connections algorithm is more flexible and intelligent. Because back-end servers differ in configuration, they process requests at different speeds, so this algorithm looks at each server's current connection count and dynamically picks the server with the fewest outstanding connections to handle the current request, improving back-end utilization as much as possible and spreading the load across servers reasonably.
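A least-connections sketch that tracks in-flight requests per server; callers are assumed to pair acquire() with release() when a request finishes:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class LeastConnectionsBalancer {
    private final Map<String, AtomicInteger> inflight = new ConcurrentHashMap<>();

    String acquire(List<String> servers) {
        // Pick the server with the fewest outstanding requests right now.
        String chosen = servers.stream()
                .min(Comparator.comparingInt(
                        s -> inflight.computeIfAbsent(s, k -> new AtomicInteger()).get()))
                .orElseThrow();
        inflight.get(chosen).incrementAndGet();
        return chosen;
    }

    void release(String server) {  // call when the request completes
        inflight.get(server).decrementAndGet();
    }
}
```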

5. How to select the registry?

There are many registries to choose from these days; the more popular ones at this stage are the following:

You should already know something about CAP, Paxos, and Raft; I won't go into them here. Let's start with the five ways to implement a registry.

1. Zookeeper

What is interesting here is that Zookeeper is not officially positioned as a registry, yet a great many Dubbo deployments in China use Zookeeper to serve as one.

Of course, there are plenty of historical reasons for this that we won't revisit here; instead, let's talk about how Zookeeper performs when used as a registry.

Basic concepts of Zookeeper

1. Three roles

Leader role: a Zookeeper cluster has only one Leader at a time, which initiates and maintains heartbeats with the Followers and Observers. All write operations must go through the Leader, which then broadcasts the writes to the other servers.

Follower role: a Zookeeper cluster may have multiple Followers, which respond to the Leader's heartbeat. Followers handle read requests directly and return results to the client, forward write requests to the Leader, and vote on proposals when the Leader processes write requests.

Observer role: similar to Follower but without voting rights.

2. Four types of nodes

PERSISTENT - persistent node: the node remains on Zookeeper unless it is manually deleted.

EPHEMERAL - ephemeral node: the lifecycle of an ephemeral node is tied to the client session; once the session ends, all ephemeral nodes created by that client are removed.

PERSISTENT_SEQUENTIAL - persistent sequential node: same basic characteristics as a persistent node, plus a sequential property: the node name carries a self-incrementing integer suffix maintained by the parent node.

EPHEMERAL_SEQUENTIAL - ephemeral sequential node: same basic features as an ephemeral node, plus the sequential property: the node name is followed by an incrementing integer maintained by the parent node.

3. The Watch mechanism

Zookeeper's Watch mechanism is a lightweight design because it combines push and pull. When the server notices that watched data has changed, it pushes only an event type and the node path to the interested client, not the changed content itself, so the event is lightweight; that is the push part. The client that receives the notification then pulls the changed data on its own; that is the pull part.

How does Zookeeper implement a registry?

To put it simply, Zookeeper can act as a Service Registry: multiple service providers form a cluster, and a service consumer obtains a specific provider's access address (IP + port) from the registry in order to call it. As shown below:

Every time a service provider is deployed, it registers its service in a path of ZooKeeper: /{service}/{version}/{IP :port}.

For example, if our HelloWorldService is deployed on two machines, two directories will be created on Zookeeper:

  • /HelloWorldService/1.0.0/100.19.20.01:16888

  • /HelloWorldService/1.0.0/100.19.20.02:16888

If that is hard to picture, the following makes it a little more intuitive:

In Zookeeper, service registration is really just creating a Znode, which stores the service's IP address, port, and invocation details (protocol and serialization format).

This node carries the most important responsibility: it is created by the service provider (when the service is published) so that consumers can read it to locate the provider's real address and initiate calls. If the node is created as an ephemeral node, its data is not persisted on the ZooKeeper server.

When the client session that created an ephemeral node closes due to a timeout or an exception, the node is deleted from the ZooKeeper server as well. Whenever a node is deleted or comes online, ZooKeeper's Watch mechanism fires and sends a notification to consumers, so that consumers' information stays up to date.
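As a reference, here is a minimal sketch of that flow using the official ZooKeeper Java client. The connection string, session timeout, and paths are placeholders, and the parent path /HelloWorldService/1.0.0 is assumed to already exist:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkRegistrySketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {});

        // Provider side: register as an EPHEMERAL znode, so the entry disappears
        // automatically when this client's session dies.
        zk.create("/HelloWorldService/1.0.0/100.19.20.01:16888",
                new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Consumer side: read the provider list and leave a watch; the callback
        // fires once on the next change, after which the consumer re-reads the list.
        zk.getChildren("/HelloWorldService/1.0.0",
                event -> System.out.println("provider list changed: " + event));
    }
}
```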

In design, Zookeeper follows the CP principle: any read against Zookeeper returns a consistent result, and the system is tolerant of network partitions, but this comes at the cost of availability. When the service list is being fetched, if the Leader of the Zookeeper cluster has just gone down, the cluster must first elect a new Leader; and if more than half of the server nodes are unavailable, the cluster cannot serve requests at all (for example, in a three-node cluster, node three is only considered dead once both node one and node two have detected its failure). In either case, requests cannot be processed.

Therefore, Zookeeper cannot guarantee service availability.

2. Eureka

Eureka comes from Netflix. Since work on Eureka 2.x has been discontinued, let's focus on Eureka 1.x.

Eureka consists of two components: Eureka server and Eureka client. The Eureka server serves as the service registry server. The Eureka client is a Java client designed to simplify interactions with servers, act as a polling load balancer, and provide failover support for services.

The basic architecture of Eureka consists of three roles:

  1. Eureka Server: provides service registration and discovery;

  2. Service Provider: registers its own service with Eureka so that service consumers can find it;

  3. Service Consumer: obtains the list of registered services from Eureka and calls the services it needs.

Eureka Server is designed according to the AP principle. It can run as multiple instances forming a cluster to avoid a single point of failure; the instances register with each other to improve availability, and each node can be treated as a replica of every other node.

In a cluster environment, if one Eureka Server goes down, requests from Eureka Clients automatically switch to another Eureka Server node. After the failed Server recovers, Eureka brings it back under cluster management.

When a node starts accepting client requests, all operations are replicated across nodes, copying the requests to all other nodes that the Eureka Server is currently aware of.

When a new Eureka Server node starts, it first tries to fetch the full registration list from neighboring nodes to complete initialization. Eureka Server obtains all nodes through the getEurekaServiceUrls() method and refreshes them periodically through heartbeat renewals.

By default, if Eureka Server does not receive a heartbeat from a service instance within a certain period (instances renew every 30 seconds by default), it evicts that instance (after 90 seconds by default, configurable via eureka.instance.lease-expiration-duration-in-seconds).
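These timings map to standard Spring Cloud Netflix Eureka properties; here is a minimal application.yml sketch for a client, with the default values shown:

```yaml
eureka:
  instance:
    lease-renewal-interval-in-seconds: 30     # how often the client sends a heartbeat
    lease-expiration-duration-in-seconds: 90  # how long the server waits before evicting
  client:
    registry-fetch-interval-seconds: 30       # how often the client pulls the registry
```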

When an Eureka Server node loses too many heartbeats in a short period, it enters self-preservation mode; be aware of this in test environments.

As long as one Eureka is still in the cluster, the registration service is guaranteed to be available, but the information may not be up to date (strong consistency is not guaranteed).

In addition, Eureka has a self-preservation mechanism: if the proportion of instances with normal heartbeats drops below **85%** within 15 minutes, Eureka assumes there is a network failure between the clients and the registry, and then:

  • Eureka no longer removes services from the registry that have expired because they have not received heartbeats for a long time;
  • Eureka can still accept new service registration and query requests, but will not be synchronized to other nodes (i.e. the current node is still available).
  • When the network is stable, the newly registered information of the current instance is synchronized to other nodes.

3. Nacos

Nacos seamlessly supports some of the mainstream open source ecosystems, as shown below:

Nacos is dedicated to helping you discover, configure, and manage microservices. Nacos provides an easy-to-use feature set that helps you quickly implement dynamic service discovery, service configuration, service metadata, and traffic management.

Nacos helps you build, deliver, and manage microservice platforms with greater agility and ease. Nacos is the service infrastructure for building modern, service-centric application architectures (e.g., the microservices paradigm, the cloud-native paradigm).

In addition to service registration and discovery, Nacos also supports dynamic configuration services, which let you manage application and service configuration for all environments in a centralized, external, and dynamic way. Dynamic configuration removes the need to redeploy applications and services when configuration changes, making configuration management more efficient and agile. Centralized configuration management also makes it easier to build stateless services and to scale services elastically on demand.

Nacos characteristics

Service discovery and service health monitoring

Nacos supports both DNS-based and RPC-based service discovery. After a service provider registers a service via the native SDK, OpenAPI, or a standalone agent, service consumers can look it up and discover it via DNS or the HTTP API.

Nacos provides real-time health checks on services to prevent requests from being sent to unhealthy hosts or instances. It supports health checks at the transport layer (PING or TCP) and the application layer (e.g., HTTP, MySQL, or user-defined checks). For complex cloud environments and network topologies, such as VPCs and edge networks, Nacos offers two health-check modes: agent-reported and server-side active probing. Nacos also provides a unified health-check dashboard to help you manage service availability and traffic based on health status.
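As a reference, registration and subscription through the Nacos Java naming SDK (com.alibaba.nacos:nacos-client) look roughly like this; the server address, service name, and instance address are placeholders:

```java
import com.alibaba.nacos.api.naming.NamingFactory;
import com.alibaba.nacos.api.naming.NamingService;

public class NacosRegistrySketch {
    public static void main(String[] args) throws Exception {
        NamingService naming = NamingFactory.createNamingService("127.0.0.1:8848");

        // Provider side: register an instance (ephemeral by default, kept alive
        // by client heartbeats).
        naming.registerInstance("helloworld-service", "100.19.20.01", 16888);

        // Consumer side: subscribe to changes; the listener fires whenever
        // providers come or go.
        naming.subscribe("helloworld-service",
                event -> System.out.println("instances changed: " + event));
    }
}
```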

Dynamically configured service

Dynamically configured services allow you to manage application configuration and service configuration for all environments in a centralized, external, and dynamic manner.

Dynamic configuration eliminates the need to redeploy applications and services when configuration changes, making configuration management more efficient and agile.

Centralized configuration management makes it easier to implement stateless services and make it easier for services to scale flexibly on demand.

Nacos provides an easy-to-use UI (sample console Demo) to help you manage the configuration of all your services and applications. Nacos also provides a number of out-of-the-box configuration management features including configuration version tracking, Canary publishing, one-click rollback configuration, and client configuration update status tracking to help you more securely manage configuration changes and reduce the risks associated with configuration changes in a production environment.

Dynamic DNS Service

The dynamic DNS service supports weighted routing, making it easier to implement mid-tier load balancing, flexible routing policies, traffic control, and simple DNS resolution on the data-center intranet. It also makes it easier to implement DNS-based service discovery, helping to eliminate the risk of coupling to vendor-proprietary service discovery APIs.

Nacos provides some simple DNS APIs to help you manage your service's associated domain names and available IP:PORT lists.

Services and their metadata management

Nacos enables you to manage all services and metadata in your data center from a microservices platform construction perspective, including managing service descriptions, life cycles, static dependency analysis of services, health of services, traffic management of services, routing and security policies, SLA of services, and most importantly metrics statistics.

Nacos supports pluggable management. For data storage, Nacos supports both ephemeral and persistent instances.

As for CP versus AP, Nacos supports both, and switching between them is just a matter of one command. It also supports migrating from various other registries to Nacos. In a word, it has just about everything you could want.

4. Consul

Consul is an open-source tool from HashiCorp. Written in Go, it is very easy to deploy: only an executable and a few configuration files are needed, green and lightweight. Consul is distributed, highly available, and horizontally scalable, built for service discovery and configuration in distributed systems.

Characteristics of Consul

Service Discovery

Consul provides a DNS or HTTP interface for registering and discovering services. For external services, Consul makes it easy to find the service they depend on.

Health Checking

Consul's client can register any number of health checks, associated either with a given service ("does the web server return 200 OK?") or with the local node ("is memory utilization below 90%?"). Operators can use this information to monitor cluster health, and the service discovery layer uses it to route traffic away from unhealthy hosts.
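A minimal sketch of registering a service with a health check through Consul's HTTP agent API; the service name, address, port, and check URL are placeholders:

```bash
curl -X PUT http://127.0.0.1:8500/v1/agent/service/register -d '{
  "Name": "web",
  "Address": "10.0.0.1",
  "Port": 8080,
  "Check": {
    "HTTP": "http://10.0.0.1:8080/health",
    "Interval": "10s"
  }
}'
```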

Key/Value Store

Applications can use the Key/Value storage Consul provides according to their own needs. Consul offers an easy-to-use HTTP interface which, combined with other tools, enables dynamic configuration, feature flags, leader election, and more.

Secure service communication

Consul can generate and distribute TLS certificates for services to establish mutual TLS connections. Intentions can be used to define which services are allowed to communicate. Service segmentation can then be managed easily, with intentions that can be changed in real time, instead of relying on complex network topologies and static firewall rules.

Multi-data center

Consul supports multiple data centers out of the box. This means that users don’t need to worry about building additional layers of abstraction to scale their business across multiple regions.

Consul supports multiple data centers and in the figure above there are two datacenters that are connected over the Internet. Also note that to improve communication efficiency, only Server nodes participate in cross-data center communication.

In a data center, Consul is divided into Client and Server nodes (all nodes are also called Agents). The Server node stores data, while the Client performs health checks and forwards data requests to the Server. A Server node has one Leader and multiple followers. The Leader node synchronizes data to followers. The recommended number of servers is 3 or 5.

Consul nodes in a cluster use the Gossip protocol to maintain membership, meaning every node knows which nodes are in the cluster and whether each is a Client or a Server. The gossip protocol within a single data center communicates over both TCP and UDP on port 8301. Gossip across data centers also uses both TCP and UDP, on port 8302.

Read and write requests for data in the cluster can either be sent directly to a Server or forwarded to one by a Client over RPC; requests ultimately reach the Leader. When some data staleness is acceptable, reads can also be served by an ordinary Server node. Reads, writes, and replication within the cluster go over TCP port 8300.

Consul can also register services from inside the application, working together with the Spring Cloud family of components.

Let’s talk about Consul’s out-of-app registration:

Out-of-application registration relies on two components: Registrator and Consul Template.

Registrator: an open-source third-party service manager that handles registration and deregistration of service providers by monitoring the liveness of the Docker instances they are deployed in.

Consul Template: Regularly get the latest list of service provider nodes from the registry server and refresh LB configuration (e.g. Nginx upstream) so that service consumers can access Nginx to obtain the latest provider information, enabling dynamic load balancing.

The overall architecture diagram might look like this:

We use Registrator to monitor the status of each Server. When a new Server is launched, Registrator registers it in Consul, a registry.

Consul Template has already subscribed to Consul's service messages, so the Consul registry pushes the new Server's information to Consul Template, which rewrites the nginx.conf configuration file and then has Nginx reload the configuration, automatically updating the load balancing.
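A minimal sketch of what that Consul Template piece might look like; the service name "web" and the file paths are placeholders:

```bash
# upstream.ctmpl: consul-template re-renders this whenever the "web" service changes:
#
#   upstream web_backend {
#   {{ range service "web" }}
#     server {{ .Address }}:{{ .Port }};
#   {{ end }}
#   }
#
# Render the template into the Nginx config and reload on every change:
consul-template -template "upstream.ctmpl:/etc/nginx/conf.d/upstream.conf:nginx -s reload"
```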

5. Kubernetes

Kubernetes is a lightweight and extensible open source platform for managing containerized applications and services. Kubernetes enables automatic deployment and scaling of applications.

In Kubernetes, containers that make up an application are grouped into a logical unit for easier management and discovery. Kubernetes has accumulated 15 years of experience running workloads as a Google production environment, incorporating best ideas and practices from the community.

After years of rapid development, Kubernetes has formed a large ecosystem. Google open-sourced Kubernetes in 2014. Its key features include:

  • Automatic bin packing: automatically places containers based on their resource requirements and other constraints without sacrificing availability, and mixes critical and best-effort workloads to drive up utilization and save resources.
  • Self-healing: restarts containers that fail, redeploys and reschedules containers when their Node has a problem, kills containers that fail health checks, and withholds traffic until a container is up and serving.
  • Horizontal scaling: applications can be scaled up or down with a simple command, through the user interface, or automatically based on CPU usage.
  • Service discovery and load balancing: developers get service discovery and load balancing from Kubernetes itself, without wiring in an extra mechanism (see the Service manifest sketch after this list).
  • Automated rollouts and rollbacks: Kubernetes rolls out applications and related configuration programmatically, and if something goes wrong with a release it can roll the changes back.
  • Security and configuration management: security material and application configuration can be deployed and updated without rebuilding the image.
  • Storage orchestration: automatically mounts storage systems, whether local, from public cloud providers (such as GCP and AWS), or network storage (such as NFS, iSCSI, Gluster, Ceph, Cinder, and Flocker).
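To make the service-discovery point concrete, here is a minimal Kubernetes Service manifest; the names and ports are placeholders. Consumers simply call the stable DNS name (web.default.svc.cluster.local), and kube-proxy balances traffic across the matching Pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web             # becomes the DNS name web.<namespace>.svc.cluster.local
spec:
  selector:
    app: web            # routes to Pods labeled app=web
  ports:
    - port: 80          # port the Service exposes
      targetPort: 8080  # port the container listens on
```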

Kubernetes is a master-slave distributed architecture, mainly composed of Master Node and Worker Node, including client command line tool Kubectl and other add-ons.

A Master Node is the control node that schedules and manages the cluster. It consists of the following parts:

  1. Api Server is the gateway of K8S. All instruction requests must go through Api Server.

  2. Kubernetes Scheduler: uses scheduling algorithms to place requested resources onto a suitable Node;

  3. Controller: maintains K8S resource objects (CRUD operations);

  4. ETCD stores resource objects (for service registration, discovery, and so on);

Worker Node: the node that actually runs the business application containers. A Worker Node mainly consists of five parts:

  1. Docker: the container engine, the basic environment in which containers run;

  2. Kubelet: performs resource operations on the Node. The Scheduler sends the request to the Api Server, which stores the instruction data in ETCD; Kubelet scans ETCD for instruction requests and executes them.

  3. Kube-proxy is a proxy service that acts as a load balancer.

  4. Fluentd collects logs.

  5. Pod: The basic unit (minimum unit) managed by Kubernetes. Inside the Pod is the container. Kubernetes does not manage containers directly, but pods;

6. Summary

1. High availability

All of these open source products have considered how to build high availability clusters, but there are some differences;

2. The choice of CP or AP

For service discovery, the fact that different nodes in the registry store different service provider information for the same service is not disastrous.

However, for service consumers, it would be disastrous for the system if the consumption could not proceed normally due to the abnormality of the registry. Therefore, I think the registry selection should focus on availability rather than consistency, so I choose AP.

3. Technical system

Our technology stack is Java across the board, so on this point we lean toward Eureka and Nacos.

If the company has a dedicated middleware or operations team, Consul and Kubernetes are worth considering; after all, Kubernetes is the future. What we pursue is solving these problems at the platform level, without dragging them into in-application business development. We are actually in the latter situation, but may not have the capacity for that level of in-house development, so we can only go as far as our own abilities allow.

In-application solutions generally apply when service providers and service consumers belong to the same technology system; Out-of-application solutions are generally suitable for business scenarios where service providers and service consumers adopt different technology architectures.

As for choosing between Eureka and Nacos, the decision is easier: whichever leaves me less work to do is the one I pick, and Nacos clearly does more for us.

4. Product activity

These open source products are generally very active

7. One last word

Writing and organizing all this is not easy. If you found the article good, you are welcome to like it, share it, and tap "Looking". Thank you for your support!

If you are interested, follow Chen mou's WeChat public account: Code Ape Technology Column.