An overview of service discovery
Why use service discovery
Suppose you’re writing some code that calls a service with a REST API. In order to make the request, your code needs to know the network location (IP address and port) of the service instance. In traditional applications running on physical hardware, the network locations of service instances are relatively static; for example, your code could read them from an occasionally updated configuration file. However, in a modern cloud-based microservices application, this is a much harder problem to solve, as shown in the figure below:
The network location of the service instance is registered in the service registry at startup. When an instance terminates, it is removed from the service registry. A heartbeat mechanism is typically used to refresh the registry of service instances periodically.
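The pattern above can be summarized in a small interface. The sketch below is purely illustrative: the type and method names (ServiceRegistry, ServiceInstance, renew, lookup) are assumptions made for this article, not the API of any particular registry.

```java
import java.util.List;

// Hypothetical value type describing one instance's network location.
record ServiceInstance(String serviceName, String host, int port) {}

// Minimal sketch of the registry contract implied by the text above.
interface ServiceRegistry {
    void register(ServiceInstance instance);          // called when an instance starts up
    void deregister(ServiceInstance instance);        // called when an instance terminates
    void renew(ServiceInstance instance);             // periodic heartbeat that keeps the entry alive
    List<ServiceInstance> lookup(String serviceName); // callers resolve network locations here
}
```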
Theory of CAP
The birth of CAP theory
In the development of distributed systems, CAP theory is the most influential and most widely cited theory; it can be regarded as the theoretical foundation of distributed systems. Back in 1998, Eric Brewer, a computer scientist at the University of California, Berkeley, proposed three metrics for distributed systems. On this basis, he put forward the CAP conjecture two years later, in 2000. In 2002, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology proved the conjecture theoretically, and the CAP conjecture became the CAP theorem, also known as Brewer’s theorem. Since then, the CAP theorem has been a theoretical cornerstone of distributed system design and has had a far-reaching influence on its development.
The three elements of CAP
CAP theory states, in brief, that a distributed system cannot satisfy the three elements of Consistency, Availability, and Partition Tolerance at the same time. The conclusion is called CAP theory because the first letters of Consistency, Availability, and Partition Tolerance are C, A, and P respectively.
Consistency
The first element of CAP theory is Consistency. Consistency means that “all nodes see the same data at the same time”: whenever the system is accessed, every node returns the same data. Another interpretation, from CAP author Eric Brewer, is that a read issued after a write must return the value written by that write, or a value updated after it.
From the server’s perspective, after a client writes an update, the server side synchronizes the new value across the entire system so that the data is the same everywhere. From the client’s perspective, the question is how to read the latest value during concurrent access after the data has been changed.
Availability
The second element of CAP theory is Availability. Availability means “reads and writes always succeed”: the service cluster can always respond to user requests. Another of Brewer’s explanations is that any node that has not crashed or failed must always respond to user requests.
In other words, when a user accesses a properly functioning node, the system guarantees that the node gives the user some response: a correct response, or a stale or even incorrect one, but never no response at all. From the server’s perspective, service nodes always respond to user requests rather than swallowing or blocking them. From the client’s perspective, every request gets a response, and the service cluster as a whole never appears disconnected, timed out, or unresponsive.
Partition Tolerance
The third element is Partition Tolerance, defined as “the system continues to operate despite arbitrary message loss or failure of part of the system”. Even if a partition fails or communication between partitions is abnormal, the system still provides service externally.
In a distributed environment, no service node is completely reliable, and communication between nodes can fail. When some nodes fail, or some nodes lose connectivity with others, the system becomes partitioned.
From the server’s perspective, good partition tolerance means the cluster still provides stable service when a node fails or the network misbehaves. From the client’s perspective, these failures are transparent.
Normal Service Scenario
As shown in the figure, there are two service nodes, Node1 and Node2, connected by a network to form a distributed system. In the normal scenario, Node1 and Node2 are always running properly and the network stays connected. Assuming both nodes hold the same data at some initial time, say V0, a user accessing either Node1 or Node2 immediately gets V0 in response.
When the user updates Node1 and changes V0 to V1, the distributed system issues a data synchronization operation M to propagate V1 to Node2. Since Node1 and Node2 both work properly and communicate well, V0 on Node2 is also changed to V1. A user then requesting Node1 or Node2 gets V1 from both: the data remains consistent and every request gets a response.
Network Exception Scenario
In a distributed system there are always multiple nodes connected over a network, and the more nodes and the more complex the network, the higher the probability of node failures and network anomalies. For the three elements of CAP to be fully satisfied, the system would have to tolerate network anomalies between nodes (partition tolerance) while still meeting both consistency and availability.
Now continue to assume that both Node1 and Node2 start with V0, and then the network between them is severed. The user sends a change request to Node1 to change V0 to V1, and the system prepares to issue synchronization operation M. However, because Node1 and Node2 are disconnected, M cannot reach Node2 in time, so the data on Node2 is still V0.
At this point, a user sends a request to Node2. Because Node2 is cut off from Node1 and the data has not been synchronized, Node2 cannot immediately return the correct result V1. What should it do?
There are two options.
In the first option, at the expense of consistency, Node2 responds to the requesting user with the old data V0.
In the second option, at the expense of availability, Node2 blocks the request until the network between Node1 and Node2 is restored and update operation M completes on Node2, and only then returns the correct value V1 to the user.
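The two options can be illustrated with a toy in-memory node. This is only a sketch of the trade-off, not a real replication protocol; the class and method names are invented for the example.

```java
import java.util.concurrent.CountDownLatch;

// Toy model of Node2 during the partition described above.
class Node2 {
    private volatile String value = "V0";                 // local copy; V1 has not arrived yet
    private final CountDownLatch syncDone = new CountDownLatch(1);

    // Option 1 (AP): answer immediately, possibly with the stale value V0.
    String readFavoringAvailability() {
        return value;
    }

    // Option 2 (CP): block until replication catches up, sacrificing availability.
    String readFavoringConsistency() throws InterruptedException {
        syncDone.await();                                  // waits until the partition heals and M is applied
        return value;
    }

    // Called when the network recovers and synchronization operation M arrives.
    void applySync(String newValue) {
        value = newValue;
        syncDone.countDown();
    }
}
```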
This completes the brief proof. The analysis shows that once a distributed system must tolerate partitions, it cannot satisfy consistency and availability at the same time and must choose one of the two, which is exactly the claim that a distributed system cannot satisfy consistency, availability, and partition tolerance simultaneously.
CAP trade-offs
A distributed system keeps multiple replicas of data to ensure high availability. Network partitions are therefore an established reality, and a choice can only be made between availability and consistency. CAP theory describes the absolute case; in engineering practice, availability and consistency are not completely opposed, and the real question is how to improve availability while keeping the data reasonably consistent.
- CA architecture: Does not support partition fault tolerance, only consistency and availability.
Not supporting partition tolerance means partition failures are not allowed: devices and the network must always be in an ideal, available state, so that the distributed system can satisfy both consistency and availability.
However, a distributed system is built from many nodes communicating over a network. Equipment failures and network anomalies exist objectively, and the more widely the nodes are distributed, the higher the probability of such failures and anomalies. For a distributed system, partition tolerance (P) is therefore unavoidable; the only way to avoid P is to roll the distributed system back to a single-instance system.
- CP architecture: Because partition tolerance (P) objectively exists, this is equivalent to giving up system availability in exchange for consistency.
When a partition exception occurs, the system blocks the affected services until the partition fault is repaired, which guarantees data consistency. CP is widely used, especially by services that are sensitive to data consistency. For example, payment and transaction systems and distributed databases such as HBase must guarantee data consistency first; if a network exception occurs, they suspend processing. ZooKeeper, which publishes and subscribes to metadata in distributed systems, also prioritizes CP. Consistency is the basic requirement of these systems; otherwise, bank balance withdrawals or database reads that randomly return old or new data would cause a series of serious problems.
- AP architecture: Because partition tolerance (P) objectively exists, this is equivalent to giving up data consistency in exchange for availability.
When the system encounters a partition exception, nodes cannot communicate with each other and the data falls into an inconsistent state. To preserve availability, service nodes respond immediately to user requests, even though they may return data of differing freshness. Giving up consistency while keeping the system available during partition exceptions is very common in Internet systems. For example, if the network between regions is interrupted, users within one region can still post, comment, and like, but temporarily cannot see the new posts and interactions of users in other regions; the same applies to WeChat Moments. The 12306 train ticket system is another example: during holiday peaks, users occasionally see remaining tickets for a train again and again, yet are told no tickets are left each time they actually try to buy. Although a small part of the functionality is limited, the overall service remains stable and the impact is very limited; compared with CP, the user experience is better.
CAP problems and misunderstandings
CAP theory has greatly promoted the development of distributed systems, but as they have evolved it has become clear that the classic formulation is somewhat idealized and carries several problems and misunderstandings.
First, take the Internet as an example. A medium-to-large Internet system has a huge number of hosts deployed across multiple regions, each region with several IDCs, so node failures, network anomalies, and partitions are common. To protect the user experience, service availability must in theory be guaranteed, so choosing AP and temporarily sacrificing data consistency looks like the best choice.
However, when a partition occurs, if the system is not well designed there is no simple choice between availability and consistency. For example, if a system in one region must call dependent sub-services in another region to work properly, a network anomaly makes those sub-services unreachable and the service becomes unavailable, so availability cannot be upheld. At the same time, data consistency cannot be guaranteed either: data in different regions is temporarily inconsistent, and after the network recovers, the backlog of data to synchronize is large and complex, which can itself lead to inconsistency. In addition, some operations depend on execution order; even if all data is eventually synchronized across regions, differences in execution order can yield inconsistent results. Over a long period, repeated partition exceptions accumulate a large amount of inconsistent data and continuously hurt the user experience.
Second, partitions certainly occur in distributed systems, but they are infrequent, or even rare, relative to the time the system runs steadily. When no partition exists, you should not have to choose just C or A; the system can provide both consistency and availability.
Third, within the same system, different businesses, or different stages of the same business process, may choose different consistency and availability strategies when a partition occurs. In the 12306 ticketing system mentioned above, the train schedule query function chooses AP, the ticket purchase function chooses AP during its query stage, but chooses CP during the payment stage. So at the level of system architecture or feature design, the choice between AP and CP is not a single, simple one.
Moreover, none of the CAP elements is black and white when a system actually runs. Consistency, for example, comes in strong and weak forms; even if a large amount of data is temporarily inconsistent, the amount of inconsistent data and the inconsistency rate shrink over time. Availability, likewise, can be partial: some functions may fail while others work, or an overloaded system may only be able to serve part of the user requests. Even partitions have a range of intermediate states: a complete LAN outage is rare, while network quality can vary continuously from 0 to 100%, and different services, functions, and components in the system can perceive and configure partitions differently.
Finally, classic CAP theory ignores the network latency that real business systems always face. Latency is present from start to finish; even a partition exception can be viewed as a form of latency, and that latency can last any amount of time: one second, one minute, one hour, or one day. System architecture and feature design must consider how to define it, how to distinguish these cases, and what to do about them.
ZooKeeper-based Service Discovery (CP)
Process analysis
1. The service governance platform creates a root node for each service in ZooKeeper, named after the interface (for example /dubbo/com.foo.BarService), and under that path creates a service provider directory and a service caller directory (for example providers and consumers), which store the node information of service providers and service callers respectively.
2. When a service provider registers, a temporary (ephemeral) node is created under the providers directory to store the provider’s registration information (see the Curator sketch after this list).
A temporary node is used because its lifecycle is tied to the client session: if the provider’s machine fails and can no longer serve, the node is automatically removed from ZooKeeper.
3. When a service caller subscribes, a temporary node is created under the consumers directory to store the caller’s information. At the same time, the caller watches all provider nodes under the providers directory (/dubbo/com.foo.BarService/providers).
4. When node data changes under the provider directory, ZooKeeper notifies the service callers that have subscribed.
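A minimal sketch of steps 2–4 with Apache Curator might look like the following. The ZooKeeper address, service path, and provider address are placeholders, and error handling is omitted.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ZkDiscoverySketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Step 2: the provider registers itself as an EPHEMERAL node, so the entry
        // disappears automatically if the provider's ZooKeeper session is lost.
        String providersPath = "/dubbo/com.foo.BarService/providers";
        client.create().creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath(providersPath + "/10.0.0.5:20880");

        // Steps 3-4: a caller watches the providers directory and is notified
        // whenever a provider node is added, removed, or changed.
        PathChildrenCache cache = new PathChildrenCache(client, providersPath, true);
        cache.getListenable().addListener((c, event) ->
                System.out.println("provider list changed: " + event.getType()));
        cache.start();

        Thread.sleep(Long.MAX_VALUE); // keep the process alive to receive watch events
    }
}
```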
Existing problems
**Problem 1:**
ZooKeeper is characterized by strong consistency: when data on one ZooKeeper node is updated, the other ZooKeeper nodes are notified to apply the same update.
If too many nodes connect to ZooKeeper, read and write to it frequently, and the number of stored directories grows large, ZooKeeper becomes unstable, CPU usage keeps climbing, and ZooKeeper eventually goes down. After it goes down, all service nodes keep sending read and write requests, and ZooKeeper cannot withstand that instantaneous read/write pressure the moment it restarts.
**Problem 2:**
ZooKeeper handles network partitions poorly for service discovery. During a partition, clients on the side that cannot reach a quorum cannot read from or write to ZooKeeper at all, so its service discovery mechanism stops working for them entirely.
Eureka-based Service Discovery (AP)
Process analysis
1. During initialization, the Eureka client registers its service instance information with any Eureka server and then sends a heartbeat (lease renewal) request every 30 seconds (see the sketch after this list).
2. The Eureka server replicates registrations and heartbeat requests to the other Eureka servers in batches.
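Below is a rough sketch of the 30-second lease renewal, calling Eureka’s REST interface directly. The base URL, application name, and instance id are placeholders, and the endpoint path is an assumption to verify against your deployment; real services normally use the eureka-client library or Spring Cloud instead of raw HTTP.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EurekaHeartbeatSketch {
    public static void main(String[] args) {
        HttpClient http = HttpClient.newHttpClient();
        // Placeholder renewal URL: PUT /eureka/apps/{appName}/{instanceId}
        String renewUrl = "http://eureka-server:8761/eureka/apps/MY-SERVICE/my-instance-1";

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HttpRequest heartbeat = HttpRequest.newBuilder(URI.create(renewUrl))
                        .timeout(Duration.ofSeconds(5))
                        .PUT(HttpRequest.BodyPublishers.noBody())
                        .build();
                HttpResponse<Void> resp = http.send(heartbeat, HttpResponse.BodyHandlers.discarding());
                System.out.println("heartbeat status: " + resp.statusCode()); // 200 means the lease was renewed
            } catch (Exception e) {
                System.err.println("heartbeat failed: " + e.getMessage());
            }
        }, 0, 30, TimeUnit.SECONDS); // the 30-second renewal interval described above
    }
}
```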
Existing problems
**Problem 1:**
Subscribers receive the full list of service addresses. This consumes a lot of client memory, especially in multi-data-center deployments, where a subscriber in one data center usually only needs the service providers located in that same data center.
**Problem 2:**
Clients periodically pull service data from the server (client-side polling), which is neither sufficiently real-time nor free of unnecessary performance overhead.
**Problem 3:**
The consistency protocol across the replicas of a Eureka cluster is an AP-style “asynchronous multi-write”: each server forwards the write requests it receives to every other machine in the cluster (which Eureka calls peers). In particular, heartbeat messages flow periodically from client to server and then on to all peers. This consistency scheme leads to the following problems:
- Each server must store the full set of service data, so server memory becomes an obvious bottleneck.
- When the number of subscribers grows, the Eureka cluster must be scaled out to increase read capacity, but scaling out makes every server bear more write requests, so the benefit of adding servers is limited.
- All servers in a Eureka cluster need the same physical configuration and can only be scaled up (vertically) to accommodate more service data.
Extension: Eureka 2.0
Eureka 2.0 is designed to address these issues and includes the following improvements and enhancements:
- Data distribution moves from pull to push, with finer-grained, on-demand subscription to service addresses.
- Read/write separation: the write cluster stays stable and rarely needs expansion, while read clusters can be scaled out as needed to improve data push capacity.
- Added audit logging and more feature-rich dashboards.
Alibaba Nacos
Compared with Eureka 1.x, Nacos makes the following enhancements:
- The client-based synchronization mode is removed, replaced with batch data synchronization over long-lived connections plus periodic renewal, which guarantees data consistency.
- Clients subscribe only to the services they care about, and the server “pushes” to each client only the small amount of data that client is interested in.
- The cluster is split into a session cluster and a data cluster. Clients register sharded service data with the session cluster, which asynchronously writes it to the data cluster. Once the data cluster has aggregated the service data, it pushes the compressed result back to the session layer for caching, and clients subscribe to the service data they need directly from the session layer.
Note: the “push” of Nacos service information is still pull-based underneath (long polling). The process is as follows:
1. The Nacos client repeatedly requests changed data from the server, with the timeout set to 30 seconds; when the watched data changes, the pending request is answered immediately (see the sketch after this list).
2. When data changes on the server, the server finds the matching pending client request and writes the result directly into its response.
3. As a result, the Nacos client perceives server-side changes in near real time.
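The loop below illustrates the long-polling mechanism in a generic way; it is not the actual Nacos client, and the endpoint and parameters are invented placeholders. Because the server holds each request open, a change is usually delivered within the lifetime of a single poll rather than one full polling interval later, which is why the client appears to receive pushes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LongPollSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        URI listenUri = URI.create("http://config-server:8848/listen?dataId=demo"); // placeholder endpoint

        while (true) {
            HttpRequest poll = HttpRequest.newBuilder(listenUri)
                    .timeout(Duration.ofSeconds(30)) // matches the 30s hold described above
                    .GET()
                    .build();
            try {
                HttpResponse<String> resp = http.send(poll, HttpResponse.BodyHandlers.ofString());
                if (!resp.body().isEmpty()) {
                    System.out.println("changed data: " + resp.body()); // server answered early: data changed
                }
            } catch (java.net.http.HttpTimeoutException e) {
                // no change within the window; simply issue the next poll
            }
        }
    }
}
```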
Conclusion
For service discovery, having information that may be partly stale is better than having no information at all, so I personally think AP beats CP here.
In AP mode, even if a request is sent to an instance that has already gone down because the instance information was stale, the request can still be handled successfully as long as retry and load balancing are implemented properly (see the sketch below).
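Here is a minimal sketch of that fail-over idea: try each known instance in turn and move on when one is unreachable. The instance addresses and path are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

public class RetryingCaller {
    private final HttpClient http = HttpClient.newHttpClient();

    String call(List<String> instances, String path) throws IOException, InterruptedException {
        IOException last = null;
        for (String instance : instances) {   // try each known instance in order
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create("http://" + instance + path))
                        .timeout(Duration.ofSeconds(2))
                        .GET().build();
                return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
            } catch (IOException e) {
                last = e;                      // instance down or entry stale: fail over to the next one
            }
        }
        throw last != null ? last : new IOException("no instances available");
    }

    public static void main(String[] args) throws Exception {
        String body = new RetryingCaller().call(List.of("10.0.0.5:8080", "10.0.0.6:8080"), "/health");
        System.out.println(body);
    }
}
```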
Further reading
CP vs AP for service discovery (Consul)
Why not use Curator/Zookeeper as a service registry?
Eureka! Why You Shouldn’t Use ZooKeeper for Service Discovery
Why doesn’t Alibaba use ZooKeeper for service discovery?
References
Service Discovery in a Microservices Architecture
Geek Time: RPC in Action and Core Principles
Geek Time: Principles and Algorithms of Distributed Technology
Technical development review of Alibaba Service Registry product ConfigServer in 10 years