1. How do I publish and reference services
Service description: Service invocation first requires solving how a service is described to the outside world. Common service description methods include RESTful APIs, XML configuration, and IDL files.
RESTful API
It is primarily used to define interfaces over the HTTP or HTTPS protocol and is widely adopted even in non-microservice architectures.
Advantage:
The HTTP protocol itself is an open protocol with little learning cost for service consumers, so it works well as a service contract across business platforms.
Disadvantage:
Relatively low performance
XML configuration
Private RPC frameworks generally choose XML configuration to describe interfaces. Publishing and referencing services in this manner consists of three main steps:
The service provider defines and implements the interface
When the service provider process starts, it exposes the interface by loading the server.xml configuration file.
When the service consumer process starts, it imports the interface it needs to invoke by loading the client.xml configuration file.
Advantage:
The performance of the private RPC protocol is higher than that of HTTP, so XML configuration is appropriate in scenarios that require high performance.
Disadvantages:
It is highly intrusive to business code
When the XML configuration changes, both service consumers and service providers must be updated. (Recommendation: use this approach only between closely coupled businesses within the same company.)
IDL files
IDL stands for Interface Description Language. It describes interfaces in a language-neutral way, so that objects running on different platforms and programs written in different languages can communicate with each other. Common IDLs include Facebook's open-source Thrift and Google's open-source gRPC. Whether Thrift or gRPC, they work in broadly similar ways.
Advantage:
Used as an invocation between services across language platforms
Disadvantage:
When describing an interface definition, the IDL file must define the return values in detail. If an interface returns many fields that change frequently, defining the interface via an IDL file is not appropriate:
On the one hand, the IDL files become too large to maintain;
On the other hand, every change to a return value defined in an IDL file must be synchronized to all service consumers, which is too costly to manage.
Conclusion
The choice of service description depends on the actual situation. In general, if all invocations are intra-enterprise and written in Java, XML configuration is the easiest. If an enterprise has multiple services on different language platforms, IDL files are recommended for describing services. If service calls are opened to the outside world, RESTful APIs are the more common choice.
2. How do I register and discover services
Principles of the Registry
In the microservice architecture, there are three main roles: service provider (RPC Server), service consumer (RPC Client) and Service Registry. The interaction among the three roles is shown in the figure
RPC Server provides services. At startup, according to the information configured in the service publishing file server.xml, it registers itself with the Registry and periodically reports its heartbeat to the Registry to indicate that it is alive.
RPC Client invokes services. At startup, according to the information configured in the service reference file client.xml, it subscribes to the service with the Registry, caches the list of service nodes returned by the Registry in local memory, and establishes connections with the RPC Servers.
When the RPC Server node changes, Registry synchronizes the change, and the RPC Client refreshes the list of service nodes cached in the local memory.
The RPC Client selects an RPC Server from the locally cached service node list based on the load balancing algorithm to initiate the call.
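The consumer side of the four steps above can be sketched in a few lines. This is a minimal, illustrative Python sketch (the class and method names are invented for the example, and random selection stands in for whatever load-balancing policy a real framework would use):

```python
import random

class RpcClient:
    """Sketch of the consumer-side flow: cache the node list returned by
    the registry, refresh it on change notifications, pick a node per call."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # locally cached service node list

    def on_registry_change(self, nodes):
        # Step 3: the registry pushed a change; refresh the local cache
        self.nodes = list(nodes)

    def pick_node(self):
        # Step 4: load balancing; random choice is the simplest policy
        return random.choice(self.nodes)

client = RpcClient(["10.0.0.1:8080", "10.0.0.2:8080"])
client.on_registry_change(["10.0.0.2:8080", "10.0.0.3:8080"])
node = client.pick_node()
```

A real framework would plug in weighted round-robin or least-active policies here, but the cache-then-select structure is the same.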
Registry implementation
Registry API
Service registration interface: Service providers invoke the registration interface to complete service registration
Service unregister interface: The service provider invokes the service unregister interface to complete service unregister
Heartbeat reporting interface: The service provider invokes the heartbeat reporting interface to report the node's liveness status
Service subscription interface: The service consumer invokes the service subscription interface to complete the service subscription and get the list of available service provider nodes
Service change query interface: The service consumer invokes the service change query interface to get the latest list of available service nodes
Service query interface: Queries which services are currently registered in the registry
Service modification interface: Modifies information about a service in the registry
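To make the API list above concrete, here is a minimal in-memory sketch of a registry. All names (register, deregister, heartbeat, subscribe) are illustrative, not any real registry's API:

```python
import time

class Registry:
    """Toy in-memory registry implementing the core interfaces above."""

    def __init__(self):
        self.services = {}  # service name -> {node address: last heartbeat}

    def register(self, service, node):
        # service registration interface
        self.services.setdefault(service, {})[node] = time.time()

    def deregister(self, service, node):
        # service unregister interface
        self.services.get(service, {}).pop(node, None)

    def heartbeat(self, service, node):
        # heartbeat reporting: provider refreshes its liveness timestamp
        if node in self.services.get(service, {}):
            self.services[service][node] = time.time()

    def subscribe(self, service):
        # service subscription: consumer gets the available node list
        return sorted(self.services.get(service, {}))

reg = Registry()
reg.register("user-service", "10.0.0.1:8080")
reg.register("user-service", "10.0.0.2:8080")
reg.deregister("user-service", "10.0.0.1:8080")
```

A production registry adds persistence, clustering, and change push on top of this shape.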
Cluster deployment
Registries are typically clustered to ensure high availability, and distributed consistency protocols are used to ensure data consistency between nodes in the cluster.
How Zookeeper works:
Each Server stores a copy of the data in memory, and client read requests can be sent to any Server
When Zookeeper starts, a leader is elected from the instance (Paxos protocol)
The Leader handles operations such as data updates (ZAB protocol)
For each update operation, Zookeeper guarantees both high availability and data consistency
Directory storage
As a registry, ZooKeeper stores service information in a hierarchical directory structure:
Each directory node is called a ZNode in ZooKeeper and has a unique path identifier
A ZNode can contain data and child ZNodes.
Data in a ZNode can have multiple versions. If a path stores multiple versions of data, queries against that path must include the version information.
Service health status check
In addition to supporting the most basic functions of service registration and subscription, the registry must also have the function of checking the health status of service provider nodes to ensure that the service nodes stored in the registry are available.
Service health status detection is implemented on top of the long connection between the ZooKeeper client and server and the session timeout control mechanism.
In ZooKeeper, when a client establishes a connection with the server, a Session is also created and assigned a globally unique Session ID. The client and server maintain a long connection, and during the SESSION_TIMEOUT period the client periodically sends heartbeat (ping) messages to the server to show the link is alive; on receiving one, the server resets the next SESSION_TIMEOUT countdown. If no heartbeat arrives within SESSION_TIMEOUT, ZooKeeper considers the Session expired, treats the service node as unavailable, and deletes its information from the registry.
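The expiry side of that mechanism can be sketched as a simple timestamp check. This is not ZooKeeper's implementation, just an illustration of the rule "no heartbeat within SESSION_TIMEOUT means the session is dead" (the timeout value here is invented):

```python
import time

SESSION_TIMEOUT = 5.0  # seconds; illustrative value only

def expired_sessions(sessions, now=None):
    """Return session ids whose last heartbeat is older than the timeout,
    mimicking how a registry would expire sessions and drop their nodes."""
    now = time.time() if now is None else now
    return [sid for sid, last in sessions.items()
            if now - last > SESSION_TIMEOUT]

# last-heartbeat timestamps for two sessions
sessions = {"sess-1": 100.0, "sess-2": 103.5}
# at t=106, sess-1 has been silent for 6s (> timeout) and is expired
dead = expired_sessions(sessions, now=106.0)
```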
Service status change notification
As soon as the registry detects that a service provider node has been added or dropped, it must immediately notify all service consumers that subscribe to the service so they can refresh their locally cached service node information, ensuring that service invocations do not hit unavailable provider nodes.
Service status change notification to service consumers is implemented on top of ZooKeeper's Watcher mechanism. When a service consumer subscribes to a service by calling ZooKeeper's getData method, it is informed of changes through the Watcher's process method; it then calls getData again to obtain the changed data and refresh the locally cached service node information.
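The read-then-watch loop above can be illustrated with a toy stand-in for ZooKeeper. `ZkNodeStub` is invented for this sketch; the point it demonstrates is the real pattern: a watch is one-shot, so the change callback must re-read and re-subscribe:

```python
class ZkNodeStub:
    """Toy stand-in for ZooKeeper's getData + Watcher flow: subscribing
    reads the current value and registers a callback fired on change."""

    def __init__(self, data):
        self._data = data
        self._watchers = []

    def get_data(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self._data

    def set_data(self, data):
        self._data = data
        watchers, self._watchers = self._watchers, []  # one-shot, like ZK
        for w in watchers:
            w()  # the consumer's process() callback

cache = {}
node = ZkNodeStub(["10.0.0.1:8080"])

def on_change():
    # process(): re-read the data and re-register the watcher
    cache["nodes"] = node.get_data(watcher=on_change)

cache["nodes"] = node.get_data(watcher=on_change)   # subscribe
node.set_data(["10.0.0.1:8080", "10.0.0.2:8080"])   # provider list changes
```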
Whitelist mechanism
Registries can provide a whitelist mechanism so that only RPC servers added to the registry whitelist can call the registry’s registration interface, thus preventing nodes in the test environment from accidentally running into the online environment.
Conclusion
The registry is the key to making servitization work. Once services are split apart, service providers and service consumers no longer run in the same process; this decoupling requires a link to connect them, and the registry plays exactly that role. In addition, service providers can scale their nodes up and down at will; through service health detection, the registry keeps the service node information up to date and notifies the changes to the service consumers subscribing to the service.
Registries are generally deployed as distributed clusters to ensure high availability, and to support multi-region active-active setups some registries also adopt multi-IDC deployment, which places high demands on data consistency. These are problems every registry implementation must solve.
3. How to implement RPC remote service invocation
How do clients and servers establish network connections
HTTP communication
HTTP communication is based on the application-layer HTTP protocol, which in turn runs on the transport-layer TCP protocol. One HTTP communication is one HTTP call, and each HTTP call establishes a TCP connection through the "three-way handshake" shown in the figure below.
After the request completes, the connection is torn down through another "four-way wave" (the four-way handshake for closing).
Socket communication
Socket communication is based on TCP/IP protocol encapsulation. A Socket connection requires at least one pair of sockets, one of which runs on the client, called ClientSocket. The other runs on the server side, called ServerSocket.
Server listening: ServerSocket calls bind() to bind a specific port, then calls listen() to monitor the network state and wait for connection requests from clients.
Client request: ClientSocket calls the connect() function to initiate a connection request to the address and port bound to the ServerSocket.
Server connection confirmation: When the listening ServerSocket receives a connection request from the client, it calls accept() to respond to the client's request and establish the connection.
Data transmission: Once the connection between ClientSocket and ServerSocket is established, the ClientSocket calls send() and the ServerSocket calls receive(); after the ServerSocket processes the request, it calls send() and the ClientSocket calls receive() to obtain the returned result.
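The four steps above map directly onto the standard socket API. A minimal sketch in Python (the echo payload is invented; a background thread plays the server so the example is self-contained):

```python
import socket
import threading

def run_server(server_sock):
    # bind()/listen() were done by the caller; accept one client, echo once
    conn, _ = server_sock.accept()          # server connection confirmation
    data = conn.recv(1024)                  # receive()
    conn.sendall(b"echo:" + data)           # send()
    conn.close()

# Server listening: bind a port, then listen for connection requests
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))               # port 0: OS picks a free port
server.listen(1)
threading.Thread(target=run_server, args=(server,)).start()

# Client request: connect() to the server's bound address and port
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
client.sendall(b"ping")                     # send()
reply = client.recv(1024)                   # receive()
client.close()
server.close()
```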
When a network connection is established between the client and server, the request can be made. However, the network may not always be reliable. There are often various exceptions such as intermittent network disconnection, connection timeout, and server breakdown. There are two ways to deal with these exceptions:
Link liveness check: The client periodically sends heartbeats (usually ping requests) to the server. If the server fails to respond to n consecutive heartbeats, or does not respond within the specified time, the link is considered dead and the client must reconnect to the server.
Disconnection retry: A connection may be broken for many reasons, such as client shutdown, server crash, or network failure. In such cases the client needs to re-establish a connection with the server, but it should not reconnect immediately; instead it should wait a fixed interval before reconnecting, to avoid overwhelming a recovering server with a flood of simultaneous reconnections.
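The fixed-interval retry described above can be sketched as a small helper. The function name, attempt count, and interval are all illustrative choices for the example:

```python
import time

def reconnect_with_interval(connect, max_attempts=3, interval=0.01):
    """Retry a failed connection at a fixed interval rather than
    immediately. `connect` is any callable that raises on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise                      # give up after the last attempt
            time.sleep(interval)           # wait before the next attempt

attempts = []
def flaky_connect():
    # simulated server that only accepts the third connection attempt
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server not ready")
    return "connected"

result = reconnect_with_interval(flaky_connect)
```

Real clients usually add jitter or exponential backoff to the interval, but the structure is the same.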
How does the server handle requests
Synchronous blocking Mode (BIO)
Each time a client makes a request, the server spawns a thread to process it. When many requests arrive at once, the server must create many threads; once the system hits its maximum thread count, additional requests can no longer be processed.
BIO is suitable for business scenarios with a small number of connections; otherwise the system can run out of threads to process requests. Programs written this way are simple, intuitive, and easy to understand.
Synchronous non-blocking (NIO)
Instead of creating a new thread for each client request, the server uses I/O multiplexing: the blocking on many I/O streams is multiplexed onto the blocking of a single select call, allowing one thread to handle multiple client requests simultaneously. The advantage of this approach is lower cost, since no thread is created per request, which saves system overhead.
NIO is suitable for service scenarios with a large number of connections and light request consumption, such as chat servers. This approach is relatively more complex to program than BIO.
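The select-based multiplexing just described can be demonstrated with Python's selectors module: one thread watches two connections and reads whichever is ready. The two connections are simulated locally with socketpair so the sketch is self-contained:

```python
import selectors
import socket

# One thread multiplexes multiple connections via select/epoll,
# instead of one thread per connection as in BIO.
sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()  # two "client" connections, simulated locally
a2, b2 = socket.socketpair()
for conn in (b1, b2):
    conn.setblocking(False)               # non-blocking server-side sockets
    sel.register(conn, selectors.EVENT_READ)

a1.sendall(b"req-1")                      # both clients send a request
a2.sendall(b"req-2")

received = []
while len(received) < 2:
    # select() blocks once for all registered sockets at the same time
    for key, _ in sel.select(timeout=1):
        received.append(key.fileobj.recv(1024))  # only ready sockets

for s in (a1, b1, a2, b2):
    s.close()
sel.close()
```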
Asynchronous non-blocking (AIO)
The client only needs to initiate an I/O operation and return immediately; it is notified once the I/O operation completes. At that point the client only needs to process the resulting data, without performing the actual I/O, because the actual reading or writing has already been done by the kernel. The advantage is that the client does not have to wait, so there is no blocking-wait problem.
AIO is suitable for service scenarios with a large number of connections and heavy request workloads, such as photo-album servers involving heavy I/O. Of the three approaches, it is the most difficult to program, and such programs are harder to understand.
Advice
The safest approach is to use mature open source solutions such as Netty, MINA, etc., which are proven and reliable solutions that have been adopted on a large scale in the industry.
What protocol is used for data transmission
Both open and private protocols must define a "contract" so that service consumers and service providers can reach a consensus. The service consumer encodes the data according to the contract and transmits it over the network; the service provider receives the data, decodes it according to the contract, processes the request, then encodes the result and sends it back over the network; finally, the service consumer decodes the returned result to obtain the value produced by the service provider.
The HTTP protocol
Message header
Server: indicates the server type
Content-Length: indicates the length of the returned data
Content-Type: indicates the type of the returned data
Message body
The concrete returned result
How should data be serialized and deserialized
Generally, before data is transmitted over the network it must be encoded at the sending end, and after it travels across the network it is decoded at the receiving end. This process is serialization and deserialization.
The commonly used serialization methods are divided into two types: text type, such as XML/JSON, and binary type, such as PB/Thrift. The specific serialization method depends on three factors.
Richness of supported data structure types. The more data structure types a framework supports, the friendlier it is to program against. Some serialization frameworks, such as Hessian 2.0, also support complex data structures like Map and List.
Cross-language support.
Performance. Two main points: the compression ratio after serialization, and the serialization speed. Taking the commonly used PB and JSON serialization protocols as examples, PB's compression ratio and speed are much higher than JSON's, so PB is better suited to systems with high requirements on performance and storage space. JSON serialization performs worse but is more readable, which makes it better suited for services exposed externally.
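The size gap between text and binary serialization is easy to see in code. Here struct stands in for PB/Thrift (real PB adds field tags and varint encoding, but is likewise far more compact than JSON); the record and field layout are invented for the example:

```python
import json
import struct

# The same record serialized as text (JSON) and as a fixed binary layout.
record = {"user_id": 12345, "score": 98}

text = json.dumps(record).encode("utf-8")
# "!IH": network byte order, 4-byte unsigned int + 2-byte unsigned short
binary = struct.pack("!IH", record["user_id"], record["score"])  # 6 bytes

# Deserialization reverses each encoding
assert json.loads(text) == record
user_id, score = struct.unpack("!IH", binary)
```

The binary form carries no field names, which is exactly why a shared contract (the IDL) is required to decode it.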
Conclusion
Communication framework: It mainly deals with how clients and servers establish connections, manage connections, and how servers handle requests.
Communication protocol: it mainly solves the problem of which data transmission protocol the client and the server use.
Serialization and deserialization: It mainly addresses the problem of which data encoding is used by the client and the server.
The communication framework provides the basic communication capability, the communication protocol describes the communication contract, and serialization and deserialization handle data encoding and decoding. One communication framework can adapt to multiple communication protocols and use multiple serialization formats. For example, the servitization framework Dubbo supports not only the Dubbo protocol but also RMI, HTTP, and other protocols, and supports multiple serialization formats such as JSON, Hessian 2.0, and Java serialization.
4. How do I monitor microservice calls
Before talking about monitoring microservice calls, three questions need to be answered: What is being monitored? Which specific indicators are monitored? From which dimensions?
Monitoring object
Client monitoring: monitoring of the functions the service provides directly to users.
Interface monitoring: Usually refers to specific RPC interface monitoring for the functions provided by the service.
Resource monitoring: Monitoring the resources an interface depends on. (E.g., if Redis stores the follow list, monitoring Redis is resource monitoring.)
Basic monitoring: Monitoring the health of the server itself. (E.g., CPU, memory, I/O, NIC bandwidth, etc.)
Monitoring indicators
Request volume
QPS (Queries Per Second): measures the number of queries per second, reflecting real-time changes in service invocation
PV (Page Views): measures the number of page views over a period of time. E.g., one day's PV represents the service's page views for that day, typically used for statistical reports
Response time: In most cases, response time can be reflected by the average elapsed time of all invocations over a period. But the average only represents the overall speed of requests, and sometimes we care more about the number of slow requests. The response time should therefore be divided into intervals, such as 0~10ms, 10ms~50ms, 50ms~100ms, 100ms~500ms, and above 500ms; the number of requests above 500ms represents the slow requests. Under normal circumstances the count in this interval should be close to zero, while in the event of a problem it increases significantly, which may not show up in the average elapsed time at all. In addition, percentiles such as P90, P95, P99, and P999 can be used; for example, P99 = 500ms means that 99% of requests respond within 500ms. Percentiles represent the service quality of requests, i.e., the SLA.
Error rate: Usually measured by the ratio of failed calls to total calls within a period of time. For example, an interface's error rate is often expressed as the ratio of calls returning error code 503 to the total number of calls.
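The interval counting and percentile computation described above can be sketched directly. The bucket edges match the intervals in the text; the nearest-rank percentile is one of several common definitions, chosen here for simplicity:

```python
def percentile(samples, p):
    """Nearest-rank percentile; a sketch of how P90/P99 etc. can be computed."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def bucket_counts(samples, edges=(10, 50, 100, 500)):
    """Count requests per response-time interval (ms), as described above."""
    counts = [0] * (len(edges) + 1)
    for ms in samples:
        for i, edge in enumerate(edges):
            if ms <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # the slow-request bucket: above 500ms
    return counts

latencies = [5, 8, 20, 30, 60, 120, 700]   # sample response times in ms
slow = bucket_counts(latencies)[-1]        # requests slower than 500ms
p90 = percentile(latencies, 90)
```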
Monitoring dimensions
Global dimension: Monitors the request volume, average time, and error rate of an object from an overall perspective. The global dimension is used to give you an overall understanding of the call status of the monitored object.
Equipment room dimension: To ensure high service availability, services are deployed in more than one equipment room, and the indicators of a monitored object may vary greatly across equipment rooms and regions.
Single-machine dimension: Within the same equipment room, indicators may differ between machines due to different purchase years and hardware batches.
Time dimension: Indicators of the same monitored object are usually different at the same time every day. Such differences are either caused by business changes or operational activities. To understand the changes in indicators of monitored objects, you need to compare them with one day, one week, one month, or even three months ago.
Core dimension: In business terms, monitored objects are generally classified by importance, most simply into core and non-core business. Core and non-core services must be separated and monitored independently, so that the core business can be prioritized and guaranteed.
For a microservice, it is necessary to make clear which objects and indicators to monitor and monitor from different dimensions in order to master the invocation of microservices.
Principles of a monitoring system
Data collection: Collect detailed information about each call, including the response time of the call, whether the call is successful, and who is the initiator and receiver of the call. This process is called data collection.
Data transmission: After the data is collected, it must be transmitted to the data processing center in some way for processing. This process is called data transmission.
Data processing: After the data is transmitted, the data processing center aggregates the data according to the dimensions of services, calculates the request volume, response time, error rate and other information of different services and stores it. This process is called data processing.
Data display: Displays service invocation status in the form of interfaces or dashboards. This process is called data display.
Data collection
Active service reporting
Proxy collection: The details of service invocations are recorded in a local log file; an agent then parses the local log file and reports the invocation data.
Either way, the first thing to consider is the sampling rate, i.e., how frequently data is collected. In general, the higher the sampling rate, the better the real-time visibility and accuracy of monitoring. However, sampling affects system performance; in particular, when collected data must be written to local disk, a high sampling rate produces a large number of I/Os and can affect normal service invocation. A reasonable sampling rate is therefore the key to data collection. Ideally the sampling rate is controlled dynamically: increase it when the system is relatively idle, pursuing real-time visibility and accuracy, and decrease it when system load is high, prioritizing monitoring availability and system stability.
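The dynamic sampling-rate policy just described can be sketched as a simple load-to-rate mapping. The thresholds and rates below are made up for the sketch; a real system would tune them against its own load profile:

```python
import random

def sample_rate_for(load):
    """Illustrative policy: sample everything when idle, little when busy."""
    if load < 0.3:
        return 1.0    # idle: collect every call for accuracy
    if load < 0.7:
        return 0.5
    return 0.1        # heavy load: protect the service first

def should_collect(load, rng=random.random):
    """Decide per-call whether to record this invocation's details."""
    return rng() < sample_rate_for(load)
```

Usage: at each invocation, call `should_collect(current_load)` and only write the call record when it returns True.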
Data transmission
UDP transmission: The data processing unit exposes a server address, and after collection the data is sent to that server directly over the UDP protocol.
Kafka transmission: Collected data is sent to a designated Topic; the data processing unit then subscribes to that Topic and reads the data from the Kafka message queue.
Regardless of the transmission mode, the data format is very important, especially for bandwidth-sensitive and high-performance scenarios. Generally, there are two data formats used in data transmission:
Binary protocols: the most commonly used are PB objects
Text protocols: the most commonly used are JSON strings
Data processing
Interface dimension aggregation: Data received in real time is aggregated by the interface dimension of the call, yielding real-time request volume, average elapsed time, and other information for each interface.
Machine dimension aggregation: Data received in real time is aggregated by the node that handled the call, so that each interface's real-time request volume and average elapsed time can be viewed per machine.
The aggregated data needs to be persisted in a database for storage, which is generally divided into two types:
Index database: e.g., Elasticsearch, which stores data in an inverted-index structure and queries against the index when needed.
Time-series database: e.g., OpenTSDB, which stores data as time series and is queried along time dimensions such as 1min and 5min.
Data display
Curve chart: monitors change trends.
Pie chart: monitors proportional distribution.
Grid diagram: mainly for fine-grained monitoring.
Conclusion
The importance of service monitoring to a microservice transformation is self-evident. Without strong monitoring capability, after moving to a microservice architecture it becomes impossible to keep track of the state of the various services; if invocation failures occur and system problems cannot be found quickly, it is a disaster for the business.
To build a service monitoring system, you must design the data collection, data transmission, data processing, and data display stages, and each stage needs a solution chosen according to your own business characteristics.
5. How do I track microservice calls
Service tracing traces the invocations initiated and handled for each user request and records the detailed information of every call involved. If an invocation fails, this information can be used to locate the fault quickly.
The role of service tracking
Optimizing system bottlenecks
By recording the time spent on each link that the call passes through, you can quickly locate the bottleneck of the entire system. Possible causes are as follows:
Carrier network Latency
Gateway System Exception
A service is abnormal.
The cache or database is abnormal
Through service tracking, we can observe the bottleneck point of the whole system from a global perspective, and then make targeted optimization
Optimizing call links
Service tracing allows you to analyze the path a call took and then evaluate whether that path is reasonable
Generally, services are deployed in multiple data centers to implement remote disaster recovery. In this situation, service A may end up invoking service B in another data center instead of the instance of B in its own data center. Cross-data-center calls incur network latency; for example, between Beijing and Guangzhou, thousands of kilometers apart, the latency can exceed 30ms, which is almost unacceptable for some services. By analyzing the call link, cross-data-center invocations can be identified and then optimized to avoid such situations as far as possible.
Generating a Network Topology
Through the link information recorded in the service tracking system, a network call topology of the system can be generated. It can reflect which services the system depends on and what the call relationship between services is, which can be seen at a glance. In addition, the details of service invocation can also be marked on the network topology, which can also play a role of service monitoring.
Transparent transmission of data
In addition to service tracing, there is often a business requirement to pass some user data from the start of an invocation all the way down the chain, so that every service in the system can access it. For example, if a business wants to run A/B tests, it can pass the A/B switch logic down through the service tracing system; each layer of services can then obtain the switch value, so the A/B test is applied uniformly.
Service Tracing principles
The originator of service tracing is Dapper, described in "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure," a paper published by Google.
Core idea: concatenate the records of the same request, distributed across service nodes, with a globally unique ID to restore the original call relationship, and use that to trace system problems, analyze call data, and compute various system indicators
It can be said that later service tracing systems all derive from Dapper, such as Twitter's Zipkin, Alibaba's EagleEye, and Meituan's MTrace.
Some of the most basic concepts in service tracing systems:
TraceId: identifies a specific request ID.
SpanId: Used to identify the location of an RPC call in a distributed request.
Annotation: Custom buried data for the business, which can be data that the business is interested in uploading to the back end, such as a user UID for a request.
TraceId connects all the paths a request takes through the system, spanId distinguishes the calling order among different services, and annotation carries custom data the business cares about. Besides uploading basic information like traceId and spanId, you can add whatever information interests you.
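The traceId/spanId relationship can be sketched as a small context object: one traceId for the whole request, and a spanId that is extended for each downstream call, Dapper-style. The class and its methods are invented for this sketch:

```python
import uuid

class TraceContext:
    """Sketch of traceId/spanId propagation: one traceId per request,
    spanIds extended per downstream call to encode the call position."""

    def __init__(self, trace_id=None, span_id="0"):
        self.trace_id = trace_id or uuid.uuid4().hex  # globally unique ID
        self.span_id = span_id
        self._children = 0

    def child(self):
        # each downstream RPC gets the next position under this span
        self._children += 1
        return TraceContext(self.trace_id, f"{self.span_id}.{self._children}")

root = TraceContext(trace_id="123456")
first = root.child()    # spanId "0.1": first downstream call
second = root.child()   # spanId "0.2": sibling call from the same service
nested = first.child()  # spanId "0.1.1": call made by the first callee
```

Because every context carries the same traceId, the backend can group all spans of one request together and reconstruct the call tree from the spanId hierarchy.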
Service tracking system implementation
The above is the service tracking system architecture diagram. A service tracking system can be divided into three layers:
Data acquisition layer: responsible for data burying and reporting
Data processing layer: responsible for data storage and calculation
Data display layer: responsible for the graphical display of data
Data acquisition layer
Function: instrument ("bury points" in) different modules of the system, collect data, and report it to the data processing layer for processing.
Client Send (CS) phase: The Client initiates a request and generates the context for the invocation.
Server Receive (SR) phase: The server receives the request and generates the context.
Server Send (SS) phase: The server returns the response; the server context data is reported at this stage. The following figure shows the reported data: traceId=123456, spanId=0.1, appKey=B, method=B.thod, start=103, duration=38.
Client Receive (CR) phase: After the client receives the result, the client context data is reported at this stage. The reported data: traceId=123456, spanId=0.1, appKey=A, method=B.thod, start=103, duration=38.
Data processing layer
Function: aggregates and computes the reported data as needed, then persists it to storage for querying
Real-time data processing: Requires high computing efficiency. Generally, the collected link data can be aggregated in seconds for real-time query
For real-time data processing, Storm or Spark Streaming is generally used to aggregate link data in real time, with an OLTP store such as HBase for storage; using traceId as the RowKey naturally groups a call chain together and improves query efficiency.
Offline data processing: The computing efficiency is relatively low. Generally, the aggregation calculation of link data can be completed at the hour level, which is generally used for summary statistics.
For offline data processing, MapReduce or Spark batch jobs compute the link data offline; Hive is generally used for storage
Data display
Function: Displays the processed link information to users in graphical mode for fault location
Call link diagram (eg: Zipkin)
Service overview: total service time, network depth of service invocations, systems through each layer, and how many invocations. The following figure shows a call, which takes 209.323ms in total, passes through 5 different system modules, the call depth is 7 layers, and a total of 2
Pinpoint call topology map
Call topology is a global view. In actual projects, it is mainly used for global monitoring. Users can find system exceptions and make decisions quickly. For example, if a service suddenly becomes abnormal, the call time of the service becomes higher in the call link topology, which can be marked in red as a monitoring alarm.
Conclusion
Service tracing can help query the specific execution path of a user request in the system, as well as the details of the upstream and downstream of each path, which is very useful for tracing problems.
To achieve a service tracking system, design data acquisition, data processing and data display three processes, there are a variety of implementation methods, specific to take a certain to choose according to their own business conditions.
6. What are the means of micro-service governance
A service invocation can fail because of the service provider, the registry, or the network. What should the service consumer do to ensure the invocation succeeds? This is what service governance is about.
Node management
Service invocation failures generally have two types of causes:
The service provider itself has problems, such as server downtime or unexpected process exit
Network problems, such as network failures between service providers, the registry, and service consumers
Regardless of the cause, there are two node management methods:
Registry active removal mechanism
This mechanism requires the service provider to report heartbeats to the registry periodically. The registry compares the current time with each provider node's last heartbeat report time; if the gap exceeds a certain period, the provider is considered faulty, the node is removed from the service list, and the latest list of available service nodes is pushed to service consumers.
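The registry-side removal rule can be sketched as follows (timestamps and node addresses are illustrative): a node whose last heartbeat is older than the timeout is dropped from the available list.

```python
def alive_nodes(heartbeats, now, timeout):
    """heartbeats: {node: last_heartbeat_time}.
    Keep only nodes whose last heartbeat is within the timeout window."""
    return [n for n, t in heartbeats.items() if now - t <= timeout]

hb = {"10.0.0.1": 100, "10.0.0.2": 40}
# At time 130 with a 60-unit timeout, 10.0.0.2 (last seen at 40) is removed.
print(alive_nodes(hb, now=130, timeout=60))  # ['10.0.0.1']
```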
Service consumer removal mechanism
Although the registry's active removal mechanism handles abnormal provider nodes, a network anomaly between the registry and the service providers can, in the worst case, cause the registry to remove all service nodes, leaving consumers with no services to call even though the providers themselves are healthy. It therefore makes more sense to add a survival detection mechanism on the service consumer side: if the consumer fails to invoke a provider node, that node is removed from the consumer's in-memory list of available provider nodes.
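Consumer-side survival detection can be sketched like this (the failure threshold is an assumed parameter): after enough consecutive failures, the consumer drops the node from its own in-memory list, regardless of what the registry advertises.

```python
class NodeList:
    """Consumer-side list of available provider nodes with failure tracking."""
    def __init__(self, nodes, max_failures=3):
        self.nodes = list(nodes)
        self.max_failures = max_failures
        self.failures = {n: 0 for n in nodes}

    def report_success(self, node):
        self.failures[node] = 0            # healthy again, reset the counter

    def report_failure(self, node):
        self.failures[node] += 1
        if self.failures[node] >= self.max_failures and node in self.nodes:
            self.nodes.remove(node)        # stop routing to this node

nl = NodeList(["10.0.0.1", "10.0.0.2"])
for _ in range(3):
    nl.report_failure("10.0.0.2")
print(nl.nodes)  # ['10.0.0.1']
```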
Load balancing algorithm
Common load balancing algorithms include the following:
Random algorithm (uniform)
Round-robin algorithm (polls available service nodes according to fixed weights)
Least-active-calls algorithm (theoretically optimal performance)
Consistent hash algorithm (requests with the same parameters always go to the same service node)
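Two of the algorithms above can be sketched briefly. The consistent-hash version here is a simplified stand-in (hash modulo node count); a production consistent hash uses a hash ring with virtual nodes so that adding or removing a node remaps only a small fraction of keys.

```python
import hashlib
import itertools

def weighted_round_robin(nodes_with_weights):
    """Yield nodes in proportion to their fixed weights, round-robin style."""
    expanded = [n for n, w in nodes_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

def sticky_pick(nodes, key):
    """Same key always maps to the same node (simplified, not a real ring)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return sorted(nodes)[int(digest, 16) % len(nodes)]

wrr = weighted_round_robin([("a", 2), ("b", 1)])
print([next(wrr) for _ in range(3)])  # ['a', 'a', 'b']
# Same parameters -> same node, every time:
print(sticky_pick(["n1", "n2", "n3"], "user42")
      == sticky_pick(["n1", "n2", "n3"], "user42"))  # True
```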
Service routing
For service consumers, which node to choose from the list of available service nodes in memory is determined not only by load balancing algorithms but also by routing rules.
So-called routing rules limit the selection range of service nodes through rules such as conditional expressions or regular expressions.
Why specify routing rules? There are two main reasons:
There is a need for grayscale publishing
For example, a service provider makes a feature change but wants to expose it only to a subset of users first, then decides whether to roll it out fully based on that subset's feedback.
Multiple equipment rooms need to be accessed nearby
Calls across data centers incur network latency that grows with distance; for example, between Beijing and Guangzhou, thousands of kilometers apart, latency can exceed 30ms, which is nearly unacceptable for some businesses. Service calls should therefore preferably select nodes within the same IDC to minimize network overhead and improve performance. In this case, IP address segment rules can be used to control access: when selecting service nodes, nodes in the same IP segment are preferred.
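The same-IDC preference can be sketched with the standard-library ipaddress module (the addresses and segment are illustrative): prefer nodes whose IP falls in the consumer's own segment, falling back to the full list if none match.

```python
import ipaddress

def prefer_same_segment(nodes, segment):
    """Return nodes inside the given IP segment, or all nodes if none match."""
    net = ipaddress.ip_network(segment)
    local = [n for n in nodes if ipaddress.ip_address(n) in net]
    return local or nodes     # fall back when no same-segment node exists

nodes = ["10.1.0.5", "10.2.0.9"]
print(prefer_same_segment(nodes, "10.1.0.0/16"))  # ['10.1.0.5']
print(prefer_same_segment(nodes, "10.3.0.0/16"))  # falls back to all nodes
```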
How to configure routing rules?
Static configuration: the service consumer stores the routing rules locally; if the rules change, they take effect only after the consumer is redeployed
Dynamic configuration: routing rules are stored in a configuration center, and service consumers periodically request it to stay in sync. To change a consumer's routing configuration, change it in the configuration center; within one sync cycle the consumer will fetch the updated configuration, achieving dynamic changes
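The dynamic-configuration flow can be sketched as a periodic pull (the fetch callable stands in for a real config-center client, and the rule names are illustrative): the consumer swaps in fresh rules once per sync cycle.

```python
class RoutingConfig:
    """Consumer-side routing rules refreshed from a config center."""
    def __init__(self, fetch):
        self.fetch = fetch          # callable returning the current rules
        self.rules = fetch()

    def sync(self):
        """Called once per sync cycle (e.g. by a timer thread)."""
        self.rules = self.fetch()

# Stand-in for the config center's stored rules (immutable values, so each
# fetch returns an independent snapshot).
store = {"gray": ("10.1.0.5",)}
cfg = RoutingConfig(lambda: dict(store))

store["gray"] = ("10.1.0.5", "10.1.0.6")   # operator updates the config center
# Consumer still has the old snapshot until the next sync cycle:
print(cfg.rules["gray"])  # ('10.1.0.5',)
cfg.sync()
print(cfg.rules["gray"])  # ('10.1.0.5', '10.1.0.6')
```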
Service fault tolerance
The commonly used means mainly include the following:
FailOver: failure auto-switch (if a call fails or times out, automatically switch to another node; the number of retries can be configured)
FailBack: failure notification (if a call fails or times out, the subsequent execution policy is decided based on the details of the failure)
FailCache: failure caching (if a call fails or times out, do not retry immediately; retry after a period of time)
FailFast: fail fast (if a call fails, do not retry; log the failure and return immediately. Generally used for non-core service calls)
Generally, FailOver or FailCache is chosen for idempotent calls, and FailBack or FailFast for non-idempotent calls
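The FailOver strategy above can be sketched as a retry wrapper (the invoke callable and node names are illustrative): on failure or timeout, switch to the next node, up to a configured number of retries.

```python
def call_with_failover(nodes, invoke, max_retries=2):
    """Try nodes in order: one initial attempt plus up to max_retries switches.
    Suitable for idempotent calls, as noted above."""
    last_error = None
    for node in nodes[: max_retries + 1]:
        try:
            return invoke(node)
        except Exception as e:      # call failed or timed out
            last_error = e          # switch to the next node
    raise last_error                # all attempts exhausted

calls = []
def invoke(node):
    calls.append(node)
    if node == "bad":
        raise RuntimeError("node down")
    return f"ok from {node}"

print(call_with_failover(["bad", "good"], invoke))  # ok from good
print(calls)                                        # ['bad', 'good']
```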
Conclusion
Node management considers the health status of service nodes; load balancing and service routing consider the access priority of service nodes; and service fault tolerance considers how failures are handled once an invocation has been made.
In real microservice architectures, the service governance measures above are generally built into the service framework by default, for example Alibaba's Dubbo and Weibo's open-source service framework Motan.