This article elaborates on all aspects of microservices and will lead you into the world of microservices. It is a long read, totaling about 24,000 words; readers who already know the basics can jump to the chapters that interest them.
Please contact me before reprinting!
This article covers the following topics:
- Overview of Microservices
  - 1.1 Easy to expand
  - 1.2 Simple deployment
  - 1.3 Technology heterogeneity
- Database sharding: splitting databases and tables
  - 2.1 What is "splitting databases and tables"?
  - 2.2 Ways to scale a database
  - 2.3 Ways to split databases and tables
  - 2.4 Problems introduced by sharding middleware
  - 2.5 Comparison of existing sharding middleware
- Distributed transactions in a microservices architecture
  - 3.1 What are transactions?
  - 3.2 The four ACID properties of transactions
  - 3.3 Transaction isolation levels
  - 3.4 What are distributed transactions?
  - 3.5 CAP theory
  - 3.6 BASE theory
  - 3.7 Balancing ACID and BASE
  - 3.8 Distributed transaction protocols
  - 3.9 Distributed transaction solutions
- Service deployment
  - 4.1 Continuous integration, continuous deployment, and continuous delivery
  - 4.2 Microservices and continuous integration
  - 4.3 Building microservices
- References
1. What are microservices?
We will first give a definition of microservices and then explain it in detail.
Microservices are small services that can run independently and work together.
From this definition we can extract three key phrases: run independently, work together, and small. These three phrases capture the core features of microservices. Let's look at each in turn.
- Run independently
Microservices are systems or processes that can be developed, deployed, and run independently.
- Work together
Once a microservice architecture is adopted, the whole system is divided into multiple microservices. These services are rarely completely independent; there is some coupling at the business level, that is, one service may need to use functions provided by another. This is what "working together" means. Unlike a monolithic application, where everything is a local call, calls between microservices go through RPC, so communication costs are higher, but the benefits are considerable.
- Small and beautiful
The idea behind microservices is to divide a large, functionally complex system into multiple independent subsystems along business boundaries; these subsystems are the "microservices". Each microservice implements only one function, so each service is "small" compared with a monolithic application. Small also means each service carries fewer responsibilities. Following the single-responsibility principle, we should try to give each service exactly one responsibility when designing the system, so as to achieve "high cohesion".
Advantages of microservices
1.1 Easy to expand
In a monolithic application, if performance hits a bottleneck and can no longer support the current traffic, the usual approach is clustering: add nodes to the server cluster and copy the whole application to every node to improve overall performance. However, the granularity of this scaling is coarse. If only a small part of the system has a performance problem, the entire application still has to be scaled; this approach is simple and crude and does not target the actual problem. With a microservice architecture, if one service hits a performance bottleneck, we only need to add nodes for that service, and the other services stay unchanged. This scaling is more targeted and makes better use of hardware and software resources. Moreover, scaling only a single service has a smaller impact, so the probability of introducing errors is lower.
1.2 Simple deployment
For a monolithic application, all the code lives in a single project, so even a small change requires the whole project to be packaged, released, and deployed, which is costly. Over time, to reduce the frequency of releases, teams tend to bundle many changes into each release, and the more changes there are, the more likely errors become. With a microservice architecture, each service has only a few responsibilities, so each release only needs to ship the systems that actually changed while the others keep running, and the impact is small. In addition, each microservice changes relatively little code compared with a monolithic application, so the probability of errors after deployment is also lower.
1.3 Technology heterogeneity
In a monolithic application, all modules of the system are integrated into one project, so every module has to use the same technology. But sometimes a single technology cannot meet all business needs. For example, for a project's algorithm team, a functional programming language may be better suited to algorithm development, while for the business development team, a strongly typed language such as Java offers more stability. In a monolithic application, everyone has to compromise on one language; with a microservice architecture this problem goes away. We split the complete system into several independent services, and each service can choose the most appropriate technology stack according to its own characteristics.
Of course, not every microservice system needs to be technically heterogeneous, and to achieve it, all services must expose common interfaces. As we know, in a microservice system, services communicate through RPC interfaces, and there are many ways to implement RPC. Some RPC mechanisms are strongly coupled to a language, such as Java's RMI, which requires both sides of the communication to be written in Java. There are also language-independent RPC mechanisms, such as REST over HTTP, which places no restriction on the languages used by either side as long as the data exchanged follows the REST conventions. Of course, being language-independent also means there is no type checking between the two sides, which raises the probability of errors. So whether to choose a language-independent RPC mechanism or one that is strongly coupled to a language needs to be weighed against the actual business scenario.
2. Database sharding: splitting databases and tables
2.1 What is "splitting databases and tables"?
With the advent of the big-data era, the amount of data in business systems keeps growing, and data storage capacity becomes a bottleneck for system performance. A commonly cited practical upper limit for a single table in mainstream relational databases is on the order of 10 million records, which clearly cannot meet the storage requirements of business systems in a big-data context. With the emergence of microservice architectures and distributed storage, the data storage problem has started to turn a corner, and data sharding is now an important means of persisting and efficiently querying massive data. Splitting databases and tables is done at the system design stage: based on the expected business volume, the designers split the databases and tables that may become bottlenecks in the future into multiple databases and multiple tables according to certain rules. These databases and tables are then deployed on different servers, so that read and write load is distributed across the nodes of the cluster, improving the overall processing capacity of the database and avoiding read/write bottlenecks.
Currently, there are two methods of data sharding: discrete sharding and continuous sharding.
Discrete sharding hashes on some field of the data. As long as the hash function is chosen well, data is evenly distributed across the shards, so read and write pressure is spread evenly and overall throughput improves. However, discrete storage requires the data to be highly independent, which is rarely the case in real business systems: data in different shards is often related, so some scenarios need cross-shard join queries. Since relational databases generally do not support joins across database instances, cross-database operations have to be completed by the sharding middleware, which significantly hurts query efficiency. In addition, when storage capacity becomes a bottleneck and needs to be expanded, discrete sharding rules require all data to be re-hashed, which becomes an important factor limiting the system's scalability. Although consistent hashing can reduce data migration during expansion to some extent, migration is still unavoidable, and for a system already running online, the cost of stopping service to migrate data is too high.
The second data sharding method is continuous sharding, which solves the data migration problem during expansion. It requires data to be stored contiguously, for example by time or by continuously increasing primary key: data from the same time period, or with adjacent primary keys, is stored in the same shard. When a new shard needs to be added, existing shards are unaffected, so continuous sharding avoids migration caused by capacity expansion. However, read/write frequency is usually skewed toward the most recent data, so reads and writes concentrate on the newest shard; this creates hot spots and fails to spread the read/write pressure.
2.2 Ways to scale a database
There are four common ways to scale a database: vertical database splitting, vertical table splitting, horizontal table splitting, and horizontal database-and-table splitting (horizontal sharding). Each strategy has its own applicable scenarios.
- Vertical database splitting
Vertical database splitting divides a complete database into multiple independent databases along business-function boundaries. These databases can run on different servers, improving the overall read/write performance. This approach is very common in microservice architectures, whose core idea is to divide a complete application into several independent subsystems ("microservices") by business function, with services communicating through RPC interfaces. This structure reduces coupling and makes the system easier to scale. Vertical database splitting fits naturally with the microservice concept: the complete database is split along the same microservice boundaries into multiple independent databases, so that each microservice owns its own database and no single database node becomes a pressure point that drags down the whole system, as shown in the figure below.
- Vertical table splitting
If a table has a large number of fields, data rows may be stored across pages, which imposes extra performance overhead on the database; vertical table splitting solves this problem. Vertical table splitting moves the infrequently used fields of a table into a separate table, so that the first table has fewer fields, avoids cross-page storage, and queries become faster. The rows in the second table are associated with the first table through a foreign key, as shown in the figure below.
- Horizontal table splitting
If a table contains too many records (say more than 10 million), database read/write performance suffers badly: reads and writes still return correct results, but the speed can drop to a level the business cannot tolerate. In this case horizontal table splitting is needed: a table with many records is divided into several tables with the same structure. For example, suppose an order table currently stores 20 million orders and reads and writes have become extremely slow. We can split it into 100 order tables with identical structure, named order_1, order_2, ..., order_100, and then hash each order by the ID of the user it belongs to and distribute it evenly across the 100 tables, so each table stores only about 200,000 orders, greatly improving read/write efficiency, as shown in the figure below (a routing sketch follows). Of course, if the split tables all live on the same database node, the database will still become the bottleneck when the request volume is large enough, since a single server's processing capacity is limited. Horizontal database-and-table splitting exists to solve that problem.
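To make the routing concrete, here is a minimal sketch, assuming the order_1 to order_100 naming and the hash-by-user-ID rule from the example above; the class and method names are hypothetical illustrations, not any particular middleware's API.

```java
// A minimal routing sketch for the example above: 100 physical tables named order_1..order_100,
// with each order routed by a hash of the user ID it belongs to.
public class OrderTableRouter {
    private static final int TABLE_COUNT = 100;

    /** Returns the physical table that stores orders belonging to the given user. */
    public static String routeTable(long userId) {
        int bucket = Math.floorMod(Long.hashCode(userId), TABLE_COUNT); // always 0..99
        return "order_" + (bucket + 1);                                 // order_1 .. order_100
    }

    public static void main(String[] args) {
        System.out.println(routeTable(20200712L)); // prints the table for this user's orders
    }
}
```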
- Horizontal database-and-table splitting
Horizontal database-and-table splitting differs from horizontal table splitting in that it not only splits a table horizontally, but also distributes the resulting tables across multiple database servers according to a sharding rule. The pressure on a single database is thereby spread across multiple databases, avoiding the performance bottleneck of limited hardware resources on a single node, as shown in the figure below.
2.3 Ways to split databases and tables
There are two commonly used data sharding strategies: continuous sharding and discrete sharding.
- Discrete sharding
In discrete sharding, the data of a logical table is scattered evenly across its shards, so reads and writes of that logical table are spread over different tables in different databases, improving read/write speed. Discrete sharding is usually implemented by hashing and taking a modulus. For example, if a logical table has four shards, the middleware first computes the hash of the sharding field and then takes it modulo 4 to determine which shard a record lives in. As long as the hash function is chosen well, data is distributed fairly evenly, reads and writes are spread across all shards, and throughput is high. However, this approach also has a major drawback: the cost of expanding the database is high. If more shards need to be added, the original sharding rule becomes invalid and every record has to be re-placed. For a system that is already online, row-level data migration is very expensive, and because the system keeps running and generating new data during the migration, consistency cannot be guaranteed while it happens; stopping the system to avoid this has a huge impact on the business. Of course, migration can be avoided by provisioning a large number of shards up front, but that is not affordable for small and medium-sized companies.
- Continuous sharding
Continuous sharding stores data within a certain range in the same shard according to a sharding rule. For example, when sharding by time, a new physical table can be generated every month; when reading or writing, the shard is found directly from the current time. Another example is sharding by record ID, which requires IDs to be continuously increasing. Since a commonly recommended limit for a single MySQL table is around 10 million records, we can store 10 million records per shard by ID; when the current shard approaches its limit, we simply add a new shard, and existing data never needs to migrate. The biggest benefit of continuous sharding is easy expansion, because no data migration is required; see the sketch below. Its biggest drawback is the hot-spot problem: newly inserted data all lands in the same shard, and new data is exactly what is read and written most frequently, so reads and writes concentrate on the newest shard and the benefits of sharding are not realized.
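For comparison, here is a minimal sketch of continuous (range) routing by record ID, assuming the 10-million-rows-per-shard rule from the text; the class and table names are again hypothetical.

```java
// A minimal sketch of continuous (range) sharding by record ID: each shard holds 10 million
// rows, matching the example in the text.
public class RangeShardRouter {
    private static final long ROWS_PER_SHARD = 10_000_000L;

    /** Shard index for a continuously increasing record ID: 0 for IDs 0..9,999,999, and so on. */
    public static int shardIndex(long recordId) {
        return (int) (recordId / ROWS_PER_SHARD);
    }

    /** Adding capacity only means creating the next table; existing shards are untouched. */
    public static String tableName(long recordId) {
        return "order_" + shardIndex(recordId);
    }
}
```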
2.4 Problems introduced by sharding middleware
- Cross-database operations
In a relational database there are often relationships between tables, so during development we use JOINs to combine data from multiple tables. But once the database-and-table splitting model is used, two related tables may end up in different databases, and a single SQL JOIN can no longer connect them. We may instead have to run multiple SQL queries at the business-system level and assemble the results ourselves (see the sketch below). On the one hand this increases the complexity of the business system; on the other hand it increases its load. Therefore, when using database-and-table splitting, the sharding strategy and sharding field must be chosen sensibly for the specific business scenario, which will be covered in later chapters of this article.
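As a concrete illustration of assembling data at the business layer, here is a minimal sketch that replaces a cross-database JOIN with two queries and an in-memory merge; the Order/User types and the two DAO interfaces are hypothetical stand-ins for real data-access code.

```java
import java.util.*;
import java.util.stream.Collectors;

// A minimal sketch of replacing a cross-database JOIN with application-level assembly.
// Order/User and the two DAO interfaces are hypothetical stand-ins for code that talks
// to two different databases.
class Order { long id; long userId; String userName; }
class User  { long id; String name; }

interface OrderDao { List<Order> findRecentOrders(); }           // reads from the order database
interface UserDao  { Map<Long, User> findByIds(Set<Long> ids); } // reads from the user database

class CrossDatabaseAssembler {
    static List<Order> ordersWithUserNames(OrderDao orderDao, UserDao userDao) {
        List<Order> orders = orderDao.findRecentOrders();                 // query #1: order DB
        Set<Long> userIds = orders.stream().map(o -> o.userId).collect(Collectors.toSet());
        Map<Long, User> users = userDao.findByIds(userIds);               // query #2: user DB
        for (Order o : orders) {                                          // stitch in memory
            User u = users.get(o.userId);
            o.userName = (u != null) ? u.name : null;
        }
        return orders;
    }
}
```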
- Distributed transactions
As we know, databases provide transactions to guarantee data consistency, but a transaction only works within a single database; vendors do not provide cross-database transactions. Therefore, once databases and tables are split, distributed transactions have to be implemented at the business-system level. For details, see my other article, Common Distributed Transaction Solutions.
2.5 Comparison of existing sharding middleware
- Cobar makes database splitting transparent and thus simplifies the programming model: developers can operate the database cluster as if it were a single database. However, Cobar only implements database splitting, not table splitting. Splitting databases can relieve the IO, CPU, and memory bottlenecks of a single database, but it cannot solve the problem of a single table holding too much data. In addition, Cobar is a standalone system that sits between the application and the database, which adds deployment complexity and operational cost.
- To address these issues, the Cobar team also produced the Cobar-Client project, which is just a JAR included in the application rather than a standalone system, reducing deployment complexity to some extent. But like Cobar, it still only supports splitting databases, not tables, and it offers no read/write separation.
- MyCat is database middleware built on top of Cobar (a secondary development). Compared with Cobar it adds read/write separation and fixes some Cobar bugs. However, MyCat, like Cobar, has to be deployed as an independent system, which again increases deployment complexity and later operation and maintenance costs.
3. Distributed transactions in a microservices architecture
It is well known that databases implement local transactions: within a single database, a set of operations can be made to either all execute correctly or not execute at all. The emphasis here is on local: the database only supports transactions inside itself. Today's systems, however, tend to adopt a microservice architecture in which each business subsystem has its own independent database, so transactions that span multiple databases are needed; these are called "distributed transactions". Since databases do not support cross-database transactions, how do we implement them? This section first sorts out the basic concepts and theory behind distributed transactions, and then introduces several commonly used solutions. Without further ado, let's begin.
3.1 What are transactions?
A transaction is a set of operations that we want to execute correctly as a whole; if any step in the set fails, the operations that have already completed must be rolled back. In other words, all operations in the same transaction either all execute correctly or none of them do.
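As a concrete illustration of a local transaction, here is a minimal JDBC sketch in which two updates either both commit or both roll back; the connection URL, credentials, and table are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// A minimal sketch of a local transaction with plain JDBC: both updates take effect together
// or not at all. The JDBC URL, credentials, and the account table are hypothetical.
public class LocalTransactionExample {
    public static void transfer(long fromId, long toId, long amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {
            conn.setAutoCommit(false); // start the transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);
                debit.setLong(2, fromId);
                debit.executeUpdate();

                credit.setLong(1, amount);
                credit.setLong(2, toId);
                credit.executeUpdate();

                conn.commit();          // both updates become visible
            } catch (SQLException e) {
                conn.rollback();        // neither update takes effect
                throw e;
            }
        }
    }
}
```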
3.2 The four ACID properties of transactions
When talking about transactions, four properties are worth noting.
- Atomicity
Atomicity requires that a transaction be an indivisible unit of execution: all of its operations either execute completely or not at all.
- Consistency
Consistency requires that the integrity constraints of the database are not broken before or after a transaction.
- Isolation
Transactions execute independently and do not interfere with each other; one transaction cannot see the intermediate data of another transaction in progress.
- Durability
Durability requires that once a transaction completes, its result is persisted. Even if the database crashes, the committed result will not be lost after the database recovers.
Note: transactions only ensure the high reliability of the database, that is, committed data can still be recovered even if the database itself has problems. If the failure is not in the database itself, for example a corrupted disk, committed data may still be lost; that falls under "high availability". So transactions guarantee the database's "high reliability", while "high availability" requires the cooperation of the whole system.
3.3 Transaction isolation levels
Here we expand on transaction isolation in more detail.
The isolation required by ACID, in the strict sense, means that multiple transactions execute one after another without any interference. This fully guarantees data safety, but in real business systems the performance cost is too high. Databases therefore define four isolation levels that trade off against performance: the lower the isolation level, the better the database performs; the higher the isolation level, the worse it performs.
3.3.1 Problems arising from concurrent execution of transactions
Let’s take a look at some of the problems that can occur with databases at different isolation levels:
- Lost update
When two concurrent transactions update the same row, one transaction may overwrite the other's update. This happens when no locking is applied.
- Dirty read
A transaction reads data written by another, uncommitted transaction. That data may be rolled back and become invalid; if the first transaction processes the invalid data, an error occurs.
- Non-repeatable read
A transaction reads the same row twice and gets different results. This can be divided into two cases:
- Virtual read: while transaction 1 reads the same record twice, transaction 2 modifies the record, so transaction 1 sees different contents the second time.
- Phantom read: between transaction 1's two queries, transaction 2 inserts into or deletes from the table, so the result set of transaction 1's second query changes.
What is the difference between a non-repeatable read and a dirty read? A dirty read reads uncommitted data, while a non-repeatable read reads committed data that was modified by another transaction between the two reads.
3.3.2 Four isolation levels for databases
There are four isolation levels for databases:
- Read Uncommitted
At this level, while one transaction is modifying a row, other transactions are not allowed to modify that row, but they are allowed to read it. Therefore, at this level updates are not lost, but dirty reads and non-repeatable reads can occur.
- Read Committed
At this level, an uncommitted write transaction prevents other transactions from accessing the row, so dirty reads cannot occur; however, a transaction that is reading data still allows other transactions to access that row, so non-repeatable reads can occur.
- Repeatable Read
At this level, a read transaction blocks write transactions (other read transactions are still allowed), so the same transaction never reads different data on two reads of the same rows, and a write transaction blocks all other transactions.
- Serializable
This level requires all transactions to execute serially, which avoids every concurrency problem above but is inefficient.
A higher isolation level better guarantees data integrity and consistency, but has a greater impact on concurrency. For most applications, setting the isolation level to Read Committed is a good first choice (see the sketch below): it avoids dirty reads and has good concurrency. Although it can still lead to non-repeatable reads, phantom reads, and second-class lost updates, in the rare cases where such problems matter, the application can control them with pessimistic or optimistic locking.
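For reference, here is a minimal sketch of selecting an isolation level with plain JDBC; the connection details are hypothetical, and with Spring the equivalent would be @Transactional(isolation = Isolation.READ_COMMITTED).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// A minimal sketch of choosing an isolation level on a JDBC connection.
// The JDBC URL and credentials are hypothetical placeholders.
public class IsolationLevelExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {
            conn.setAutoCommit(false);
            // Read Committed: avoids dirty reads, but allows non-repeatable reads.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            // ... run queries and updates here ...
            conn.commit();
        }
    }
}
```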
3.4 What are distributed transactions?
So far we have been talking about local transactions based on a single database; current databases only support transactions within a single database, not across databases. With the popularity of microservice architectures, a large business system is often composed of several subsystems, each with its own independent database. A single business process frequently needs to be completed by several subsystems, and these operations may need to happen in one transaction. Such scenarios are common in microservice systems. At that point we need some mechanism on top of the databases to support cross-database transactions, which is what "distributed transaction" commonly refers to.
A typical example of a distributed transaction is the user ordering process. When a system adopts a microservice architecture, an e-commerce system is often split into subsystems such as the product system, order system, payment system, and points system. The whole ordering process looks like this:
- The user browses through the product system, sees a particular product, and clicks to place an order
- The order system generates an order
- After the order is successfully created, the payment system provides the payment function
- When the payment is completed, the points system will add points for the user
Steps 2, 3, and 4 above need to happen in one transaction. In a traditional monolithic application, implementing this transaction is as simple as putting the three steps into a single method annotated with Spring's @Transactional (see the sketch below); relying on the database's transaction support, Spring ensures that the steps are either all performed or none performed. But in a microservice architecture, these three steps involve three systems and three databases, so we have to support distributed transactions between the databases and the application systems with some extra machinery.
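For the monolithic case, a minimal sketch of that single @Transactional method might look like the following; the three DAO interfaces are hypothetical placeholders for the real order, payment, and points code.

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical data-access interfaces standing in for the real business code.
interface OrderDao   { long createOrder(long userId, long productId, long amount); }
interface PaymentDao { void pay(long orderId, long amount); }
interface PointsDao  { void addPoints(long userId, long amount); }

// A minimal sketch of the monolithic case: one local transaction wraps the three steps.
@Service
public class PlaceOrderService {

    private final OrderDao orderDao;
    private final PaymentDao paymentDao;
    private final PointsDao pointsDao;

    public PlaceOrderService(OrderDao orderDao, PaymentDao paymentDao, PointsDao pointsDao) {
        this.orderDao = orderDao;
        this.paymentDao = paymentDao;
        this.pointsDao = pointsDao;
    }

    @Transactional // all three steps commit together or roll back together
    public void placeOrder(long userId, long productId, long amount) {
        long orderId = orderDao.createOrder(userId, productId, amount); // step 2: create order
        paymentDao.pay(orderId, amount);                                // step 3: pay
        pointsDao.addPoints(userId, amount);                            // step 4: add points
    }
}
```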
3.5 CAP theory
CAP theory states that in a distributed system, at most two of the three requirements C, A, and P can be satisfied.
CAP stands for:
- C: Consistency
Whether multiple replicas of the same data are identical in real time.
- A: Availability
The system is available if it returns a definite result within a certain amount of time.
- P: Partition tolerance
The same service is distributed across multiple nodes, so that when one node goes down or the network is partitioned, the remaining nodes can still provide the service.
CAP theory tells us that in distributed systems, we can choose at most two of the three conditions C, A, and P. So the question is, which two conditions are more appropriate to choose?
Availability and partition tolerance are the two conditions a business system must satisfy, and they go hand in hand. Business systems adopt distributed architectures for two main reasons:
- To improve overall performance
When business volume soars and a single server can no longer meet the demand, we use a distributed system with multiple nodes providing the same function to improve overall performance. This is the first reason for using a distributed system.
- To achieve partition tolerance
If the system runs on a single node, or on multiple nodes in the same network, a power failure in the machine room or a natural disaster in the region can take the whole business system down. To prevent this, a distributed system spreads its subsystems across machine rooms in different regions to keep the system highly available.
This shows that partition tolerance is the foundation of a distributed system; if partition tolerance cannot be satisfied, using a distributed system is pointless.
In addition, availability is especially important for business systems. In an era when user experience is everything, a business system that frequently shows errors or responds too slowly will quickly lose users' goodwill; in today's fiercely competitive Internet industry, intermittent unavailability of your system sends customers straight to competitors offering the same service. Therefore, we can only sacrifice consistency in exchange for availability and partition tolerance. This is where the BASE theory comes in.
3.6 BASE theory
CAP theory tells us a sad but inescapable truth: we can only choose two of C, A, and P. For business systems, consistency is usually the one sacrificed in exchange for availability and partition tolerance. Note, however, that "sacrificing consistency" does not mean abandoning data consistency altogether; it means giving up strong consistency in favor of weak consistency. With that, let's talk about BASE theory.
- BA: Basically Available
The system remains "basically available" even when something goes wrong (force majeure), that is, it can still return a definite result within a certain time. "Basic availability" differs from "high availability" in that:
- During a major promotion, the response time may be extended appropriately.
- Some users may be served a degraded page directly, relieving pressure on the servers. Note, however, that a degraded page is still a definite result.
- S: Soft State
Different replicas of the same data do not need to be consistent in real time.
- E: Eventual Consistency
Different replicas of the same data do not need to be consistent in real time, but they must become consistent after a certain period of time.
3.7 Balancing ACID and BASE
ACID guarantees strong transactional consistency, that is, data is consistent in real time. This is no problem for local transactions, but in distributed transactions strong consistency severely hurts the performance of a distributed system, which is why distributed systems can follow the BASE theory instead. Different business scenarios, however, have different consistency requirements. For example, trading and payment scenarios require strong consistency and should follow ACID, while scenarios such as sending an SMS verification code after successful registration do not require real-time consistency and can follow BASE. So ACID and BASE need to be balanced according to your own business scenario.
3.8 Distributed transaction protocols
Several protocols for implementing distributed transactions are described below.
3.8.1 Two-phase commit protocol (2PC)
One of the hard problems in distributed systems is ensuring that a transactional operation is applied consistently across the nodes of the architecture. To achieve this, the two-phase commit algorithm is based on the following assumptions:
- In this distributed system, one node serves as a Coordinator and the other nodes serve as Cohorts. The nodes can communicate with each other on the network.
- All nodes use write-ahead logging; once written, the log is kept on reliable storage and survives even if the node crashes.
- No node is permanently damaged; any damaged node can eventually recover.
1. Phase 1 (voting phase):
- The coordinator node asks all participant nodes whether they can commit the transaction, and waits for each participant's response.
- Each participant node executes all transaction operations up to the point of the query and writes Undo and Redo information to its log. (Note: on success, each participant has actually already executed the transaction operations.)
- Each participant node responds to the coordinator's query: if its transaction operations actually succeeded, it returns an "agree" message; if they failed, it returns an "abort" message.
2. Phase 2 (commit phase):
If the coordinator node receives an "agree" message from every participant node:
- The coordinator node sends a "commit" request to all participant nodes.
- Each participant node formally completes the operation and releases the resources held during the whole transaction.
- Each participant node sends a "done" message to the coordinator node.
- The coordinator node completes the transaction after receiving a "done" message from every participant node.
If any participant node returned "abort" in the first phase, or the coordinator node fails to receive responses from all participant nodes before the first-phase query times out:
- The coordinator node sends a "rollback" request to all participant nodes.
- Each participant node rolls back using the previously written Undo information and releases the resources occupied during the whole transaction.
- Each participant node sends a "rollback complete" message to the coordinator node.
- The coordinator node cancels the transaction after receiving a "rollback complete" message from every participant node.
The second phase ends the current transaction regardless of the final outcome.
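To summarize the two phases, here is a minimal single-process sketch of the coordinator's decision logic; real 2PC involves network messages, write-ahead logging, and timeouts, and the Participant interface here is a hypothetical illustration rather than a production protocol.

```java
import java.util.List;

// A minimal, single-process sketch of the two-phase commit decision logic described above.
interface Participant {
    boolean prepare();   // phase 1: execute the transaction, write undo/redo log, vote
    void commit();       // phase 2: make the result permanent, release resources
    void rollback();     // phase 2: undo using the previously written undo information
}

class TwoPhaseCommitCoordinator {
    boolean execute(List<Participant> participants) {
        // Phase 1: collect votes from every participant.
        boolean allAgreed = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allAgreed = false; break; }
        }
        // Phase 2: commit only if every participant voted "agree"; otherwise roll back.
        for (Participant p : participants) {
            if (allAgreed) p.commit(); else p.rollback();
        }
        return allAgreed;
    }
}
```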
Two-phase commits do seem to provide atomic operations, but unfortunately, two-phase commits have several drawbacks:
- During execution, all participant nodes are blocked in the transaction. While one participant holds a shared resource, other nodes that need that resource have to wait.
- Participant failure: the coordinator needs an extra timeout mechanism for each participant, and after a timeout the whole transaction fails. (Limited fault tolerance.)
- Coordinator failure: the participants keep blocking. Extra standby machines are needed for fault tolerance. (This can rely on the Paxos protocol to implement HA.)
- A problem phase 2 cannot resolve: if the coordinator goes down after sending the commit message and the only participant that received it goes down at the same time, then even if a new coordinator is elected, the status of the transaction is uncertain; nobody knows whether it was committed.
For this reason, Dale Skeen and Michael Stonebraker proposed the three-phase commit protocol (3PC) in "A Formal Model of Crash Recovery in a Distributed System".
3.8.2 Three-phase commit protocol (3PC)
Unlike the two-phase commit, the three-phase commit has two change points.
- Timeouts are introduced for both the coordinator and the participants.
- A preparation phase is inserted between phases 1 and 2, so that the states of all participating nodes are consistent before the final commit phase.
That is, in addition to introducing a timeout mechanism, 3PC splits the preparation phase of 2PC in two again, so that there are three phases of CanCommit, PreCommit, and DoCommit.
1. CanCommit phase
The CanCommit phase of 3PC is very similar to the preparation phase of 2PC: the coordinator sends a commit request to each participant, and the participant returns a Yes response if it can commit, or a No response otherwise.
- Transaction inquiry
The coordinator sends a CanCommit request to the participants, asking whether the transaction commit operation can be performed, and then waits for their responses.
- Response
When a participant receives the CanCommit request, it returns a Yes response and enters the prepared state if it believes it can execute the transaction successfully; otherwise it returns No.
2. PreCommit phase
Based on the participants' responses, the coordinator decides whether the transaction's PreCommit operation can proceed. There are two possibilities. If the coordinator receives a Yes from every participant, the transaction is pre-executed:
- Send the pre-commit request
The coordinator sends a PreCommit request to the participants and enters the prepared phase.
- Transaction pre-commit
Upon receiving the PreCommit request, each participant executes the transaction and records undo and redo information in its transaction log.
- Response
If a participant executes the transaction successfully, it returns an ACK response and waits for the final instruction.
If any participant sends a No response to the coordinator, or the coordinator receives no response from some participant before the timeout, the transaction is interrupted:
- Send the interrupt request
The coordinator sends an abort request to all participants.
- Interrupt the transaction
After receiving the abort request from the coordinator (or receiving no request from the coordinator before its own timeout), the participant interrupts the transaction.
3. DoCommit phase
The actual transaction commit in this phase falls into the following two scenarios.
3.1 Executing the commit
- Send the commit request
When the coordinator has received ACK responses from the participants, it moves from the pre-commit state to the commit state and sends a DoCommit request to all participants.
- Transaction commit
After receiving the DoCommit request, each participant formally commits the transaction and releases all transaction resources once the commit is done.
- Response
After committing the transaction, the participant sends an ACK response to the coordinator.
- Complete the transaction
After receiving ACK responses from all participants, the coordinator completes the transaction.
3.2 Interrupting the transaction. If the coordinator does not receive the ACK responses from the participants (either a participant returned something other than ACK, or a response timed out), the transaction is interrupted:
- Send the interrupt request
The coordinator sends an abort request to all participants.
- Transaction rollback
After receiving the abort request, each participant rolls back the transaction using the undo information it recorded in phase two, and releases all transaction resources once the rollback is done.
- Report the result
After completing the rollback, the participant sends an ACK message to the coordinator.
- Interrupt the transaction
After receiving ACK messages from all participants, the coordinator interrupts the transaction.
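Putting the three phases together, here is a minimal sequential sketch of the coordinator side; real 3PC adds timeouts on both coordinator and participants and runs over the network, and the interface below is a hypothetical illustration only.

```java
import java.util.List;

// A minimal sequential sketch of the CanCommit / PreCommit / DoCommit flow described above.
interface ThreePhaseParticipant {
    boolean canCommit();  // phase 1: only answers "could this transaction succeed?"
    boolean preCommit();  // phase 2: executes the transaction, writes undo/redo, returns ACK
    void doCommit();      // phase 3: final commit, releases all transaction resources
    void abort();         // interrupt: roll back using the recorded undo information
}

class ThreePhaseCommitCoordinator {
    boolean execute(List<ThreePhaseParticipant> participants) {
        // Phase 1: CanCommit - purely a question, nothing is executed yet.
        for (ThreePhaseParticipant p : participants) {
            if (!p.canCommit()) { participants.forEach(ThreePhaseParticipant::abort); return false; }
        }
        // Phase 2: PreCommit - every participant executes and pre-commits.
        for (ThreePhaseParticipant p : participants) {
            if (!p.preCommit()) { participants.forEach(ThreePhaseParticipant::abort); return false; }
        }
        // Phase 3: DoCommit - final commit on every participant.
        participants.forEach(ThreePhaseParticipant::doCommit);
        return true;
    }
}
```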
3.9 Distributed transaction solutions
There are several solutions for distributed transactions:
- Global transactions (DTP model)
- Distributed transactions based on a reliable messaging service
- Best-effort notification
- TCC
3.9.1 Scheme 1: Global Transactions (DTP model)
Global transactions are implemented based on the DTP model. DTP (Distributed Transaction Processing) is a reference model proposed by the X/Open organization; it specifies that three roles are required to implement distributed transactions (a code sketch follows the list):
- AP: Application
This is the business system we develop; during development, we use the transaction interfaces provided by the transaction manager to implement distributed transactions.
- TM: Transaction Manager
- The transaction manager does the actual work of implementing distributed transactions. It provides the distributed-transaction operation interfaces for our business system to call; these are called the TX interfaces.
- The transaction manager also manages all the resource managers, coordinating them through the XA interfaces they provide in order to implement distributed transactions.
- DTP is only a set of specifications for implementing distributed transactions and does not define how to do so in detail; a TM may use 2PC, 3PC, Paxos, or other protocols to implement distributed transactions.
- RM: Resource Manager
- Anything that provides data services can act as a resource manager, such as databases, message middleware, and caches. In most scenarios, the resource manager in a distributed transaction is a database.
- A resource manager provides the transactional capabilities of a single database. It exposes the database's commit, rollback, and related capabilities to the transaction manager through the XA interface, helping the transaction manager implement distributed transaction management.
- XA is the interface defined by the DTP model through which the resource manager (the database) provides its commit, rollback, and other capabilities to the transaction manager.
- Again, DTP is only a set of specifications; the concrete implementation of the RM is done by the database vendors.
- Is there distributed transaction middleware based on the DTP model?
- What are the pros and cons of the DTP model?
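As a concrete example of the AP/TM/RM division of labor, here is a minimal sketch of driving a global transaction through JTA inside a Java EE / Jakarta EE container; the JNDI name is the standard one, but the two DataSources and the SQL are hypothetical, and both DataSources would need to be XA-capable for the TM to coordinate them through the XA interface.

```java
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

// A minimal sketch of a global (XA/DTP-style) transaction driven through JTA.
// The application (AP) calls the TX-style interface (UserTransaction) provided by the
// transaction manager (TM); the TM coordinates the two databases (RMs) through XA.
public class GlobalTransactionExample {
    public void transferAcrossDatabases(javax.sql.DataSource orderDb,
                                        javax.sql.DataSource pointsDb) throws Exception {
        UserTransaction tx = (UserTransaction)
                new InitialContext().lookup("java:comp/UserTransaction");
        tx.begin();                               // TM starts the global transaction
        try {
            try (java.sql.Connection c1 = orderDb.getConnection();
                 java.sql.Connection c2 = pointsDb.getConnection()) {
                c1.createStatement().executeUpdate(
                        "UPDATE orders SET status = 'PAID' WHERE id = 1");
                c2.createStatement().executeUpdate(
                        "UPDATE points SET balance = balance + 100 WHERE user_id = 1");
            }
            tx.commit();                          // TM asks each RM to commit via XA
        } catch (Exception e) {
            tx.rollback();                        // TM asks each RM to roll back via XA
            throw e;
        }
    }
}
```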
3.9.2 Scheme 2: Distributed transactions based on a reliable messaging service
This way of implementing distributed transactions relies on message middleware. Suppose there are two systems, A and B, which handle task A and task B respectively, and a business process in system A requires task A and task B to be completed in the same transaction. The following describes how this distributed transaction is implemented on top of message middleware.
- Before system A processes task A, it sends a message to the message middleware.
- The message middleware persists the message after receiving it, but does not deliver it yet; at this point downstream system B is still unaware of the message.
- After the message is persisted successfully, the middleware returns an acknowledgement to system A.
- After receiving the acknowledgement, system A starts processing task A.
- When task A is finished, system A sends a Commit request to the middleware. Once the request is sent, the transaction is complete as far as system A is concerned, and it can move on to other work. However, the Commit message may be lost in transit, in which case the middleware will not deliver the message to system B and the two systems become inconsistent. This is handled by the middleware's transaction back-check mechanism, described below.
- After receiving the Commit instruction, the middleware delivers the message to system B, triggering the execution of task B.
- When task B is finished, system B returns an acknowledgement to the middleware, telling it the message was consumed successfully, and the distributed transaction is complete.
From the above process we can draw the following conclusions:
- The message middleware plays the role of the distributed transaction coordinator.
- There is a time window between system A finishing task A and system B finishing task B. Within this window the whole system is in an inconsistent state, but this temporary inconsistency is acceptable: after a short time the system becomes consistent again, which satisfies the BASE theory.
In the process above, if task A fails, a rollback is needed, as shown in the following figure:
- If system A fails to process task A, it sends a Rollback request to the middleware. Just as with the Commit request, system A can consider the rollback done once the request is sent and move on to other work.
- After receiving the Rollback request, the middleware simply discards the message instead of delivering it to system B, so task B in system B is never triggered.
The system is consistent again, because neither task A nor task B was executed.
The Commit and Rollback flows described above are the ideal case; in a real-world system, both the Commit and the Rollback instruction can be lost in transit. How does the message middleware ensure data consistency when that happens? The answer is the timeout query mechanism.
Besides implementing its normal business process, system A also needs to provide a transaction status query interface for the message middleware to call. When the middleware receives a transactional message, it starts a timer; if it receives no Commit or Rollback instruction from system A within the timeout period, it proactively calls system A's query interface to ask about the current status. The interface returns one of three results:
- Committed
If the status obtained is "committed", the message is delivered to system B.
- Rolled back
If the status obtained is "rolled back", the message is discarded.
- Processing
If the status obtained is "processing", the middleware keeps waiting.
The middleware's timeout query mechanism prevents inconsistency caused by the loss of Commit/Rollback instructions in transit, and it also shortens the blocking time of the upstream system: after issuing Commit/Rollback, the upstream system can process other work without waiting for confirmation, with the timeout query compensating for any lost instructions. This greatly reduces upstream blocking and improves concurrency.
Now let's talk about the reliability of message delivery downstream. Once the upstream system has finished its task and submitted the Commit instruction to the middleware, it can move on to other work and consider the transaction complete; from then on, the message middleware guarantees that the message will be successfully consumed by the downstream system. How is that achieved? It is guaranteed by the middleware's delivery process.
After delivering the message to the downstream system, the middleware enters a blocking wait; the downstream system processes the task immediately and returns an acknowledgement when it is done. When the middleware receives the acknowledgement, it considers the transaction complete.
If a message is lost during delivery, or the acknowledgement is lost on the way back, the middleware waits for the acknowledgement to time out and then redelivers, until the downstream consumer returns a success response. Of course, middleware generally lets you configure the number of retries and the retry interval: for example, if the first delivery fails, retry every five minutes up to three times; if delivery still fails after three retries, the message requires human intervention.
Some students may ask: why don’t you roll back a failed message instead of trying to redeliver it?
This comes down to the implementation cost of the whole distributed transaction system. We know that system A moves on to other work after sending the Commit instruction to the middleware. If a failed delivery required a rollback, system A would have to provide a rollback interface in advance, which adds development cost and complexity to the business system. The design goal of a business system is to keep it as simple as possible while meeting performance requirements, thereby reducing operation and maintenance cost.
You may have noticed that the upstream system A submits Commit/Rollback messages to the middleware asynchronously: after submitting, it moves on to other work and entrusts the commit or rollback entirely to the middleware, fully trusting the middleware to complete it correctly. The middleware, however, delivers messages to the downstream system synchronously: after delivering a message, it blocks and waits, and only stops waiting when the downstream system has successfully processed the task and returned an acknowledgement. Why are the two designed differently?
First, asynchronous communication between the upstream system and the middleware is used to improve concurrency. The business system interacts directly with users, and user experience matters, so this asynchronous mode greatly reduces user waiting time; and compared with synchronous communication, there are no long blocking waits, so the concurrency of the system increases significantly. The downside is that Commit/Rollback instructions can be lost, which is compensated by the middleware's timeout query mechanism.
So why synchronous communication between message-oriented middleware and downstream systems?
Asynchrony can improve system performance but increases system complexity, while synchrony reduces concurrency but is cheaper to implement. So when the concurrency requirement is not very high, or server resources are plentiful, synchrony can be used to keep the system simple. We know that message middleware is third-party middleware independent of any business system: it is not directly coupled to any business system, nor does it interact directly with users. It is usually deployed on an independent server cluster with good scalability, so we do not need to worry too much about its performance; if its processing speed cannot meet our requirements, we can add machines. Moreover, some delay in message processing is acceptable, because the BASE theory introduced earlier tells us that we are pursuing eventual consistency rather than real-time consistency, so transient transaction inconsistency caused by message delay is acceptable.
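As one concrete realization of this pattern, here is a minimal sketch using RocketMQ's transactional messages (4.x API): executeLocalTransaction plays the role of "process task A, then Commit or Rollback", and checkLocalTransaction is the back-check interface system A must provide. The topic name, name server address, and the processTaskA/queryTaskAStatus helpers are hypothetical.

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

// A minimal sketch of system A as a RocketMQ transactional-message producer.
public class SystemAProducer {
    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("system_a_group");
        producer.setNamesrvAddr("127.0.0.1:9876");
        producer.setTransactionListener(new TransactionListener() {
            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                try {
                    processTaskA();                                // local task A
                    return LocalTransactionState.COMMIT_MESSAGE;   // deliver the message to system B
                } catch (Exception e) {
                    return LocalTransactionState.ROLLBACK_MESSAGE; // discard the message
                }
            }
            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                // Back-check: the broker asks "did task A succeed?" when no Commit/Rollback arrived.
                return queryTaskAStatus(msg.getTransactionId());
            }
        });
        producer.start();
        producer.sendMessageInTransaction(
                new Message("TASK_B_TOPIC", "half message for task B".getBytes()), null);
    }

    private static void processTaskA() { /* business logic for task A */ }

    private static LocalTransactionState queryTaskAStatus(String txId) {
        return LocalTransactionState.COMMIT_MESSAGE; // placeholder: look up the real status of task A
    }
}
```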
3.9.3 Scheme 3: Best-effort notification (periodic reconciliation)
Best-effort notification, also known as periodic reconciliation, is essentially already included in Scheme 2; it is introduced separately here mainly for the completeness of the knowledge system. This solution also requires message middleware, and works as follows:
- After completing its task, the upstream system synchronously sends a message to the middleware and ensures the message is persisted successfully; then the upstream system can move on to other work.
- After receiving the message, the middleware synchronously delivers it to the corresponding downstream system, triggering the downstream system's task.
- When the downstream system processes the task successfully, it returns an acknowledgement to the middleware, which then deletes the message, completing the transaction.
The above is an idealized process, but in a real scenario, the following unexpected situations often occur:
- The messaging middleware failed to deliver a message to the downstream system
- The upstream system failed to send a message to the messaging middleware
For the first case, the message middleware has a retry mechanism: we can configure the number of retries and the retry interval. For delivery failures caused by network instability, a few retries are usually enough. If delivery still fails after the retry limit is exceeded, the middleware stops delivering the message and records it in a failed-message table. The middleware needs to provide a failed-message query interface, and the downstream system periodically queries the failed messages and consumes them; this is the "periodic reconciliation".
If repeated delivery and periodic reconciliation both fail to solve the problem, it usually means something is seriously wrong with the downstream system, and human intervention is required.
In the second case, the upstream system needs a message retransmission mechanism. A local message table can be set up in the upstream system, and the task processing and the insertion of the message into the local message table are placed in one local transaction (see the sketch below). If inserting the message into the local table fails, the transaction rolls back and the task result is undone; if everything succeeds, the local transaction completes. A dedicated message sender then keeps publishing the messages from the local message table; if publishing fails, it retries. Of course, the sender should also have a retry limit; generally, exceeding the limit means something is seriously wrong with the message middleware, and only human intervention can resolve it.
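Here is a minimal sketch of that local message table, assuming Spring and a relational database; the table and column names are hypothetical, and a separate sender job would poll the table for pending rows and publish them with retries.

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// A minimal sketch of the "local message table": the business update and the pending message
// are written in one local transaction, so the message is never lost even if the process
// crashes before the message reaches the middleware.
@Service
public class UpstreamTaskService {

    private final JdbcTemplate jdbc;

    public UpstreamTaskService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Transactional // task A and the message record commit or roll back together
    public void processTaskA(long bizId, String payload) {
        jdbc.update("UPDATE task SET status = 'DONE' WHERE id = ?", bizId);            // task A
        jdbc.update("INSERT INTO local_message (biz_id, payload, status) VALUES (?, ?, 'PENDING')",
                bizId, payload);                                                       // message record
    }
}
```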
For message middleware that does not support transactional messages, this approach can be used to implement distributed transactions through a retry mechanism plus periodic reconciliation. Compared with the second scheme, however, it takes longer to reach data consistency, and the upstream system must implement its own message retransmission mechanism to guarantee that messages reach the middleware. This adds development cost, makes the business system less clean, and the extra logic consumes the business system's hardware resources, which affects performance.
Therefore, whenever possible, use message middleware that supports transactional messages to implement distributed transactions, such as RocketMQ.
3.9.4 Scheme 4: TCC (two-phase, compensating)
TCC stands for Try, Confirm, Cancel; it is a compensating style of distributed transaction. As the name suggests, TCC implements a distributed transaction in three steps:
- Try: attempt the business to be executed
- This step does not execute the business itself; it only checks consistency preconditions for all the business operations and reserves all the resources needed for execution.
- Confirm: execute the business
- This step actually executes the business. Because the consistency checks were done in the Try phase, it executes directly without any further checks, using the business resources reserved during the Try phase.
- Cancel: cancel the business
- If the business fails to execute, the Cancel phase releases all the reserved business resources and undoes the operations that have already been performed.
The following uses a transfer example to explain how TCC implements distributed transactions.
Suppose user A uses his account balance to send user B a 100-yuan red envelope, where the balance system and the red-envelope system are two separate systems. The three TCC steps are as follows (a code sketch follows the list):
- Try
  - Create a flow record and set its state to “in transaction”
  - Deduct 100 yuan (the reserved business resource) from user A’s account
  - If the Try step succeeds, the Confirm phase is entered
  - If any exception occurs during Try, the Cancel phase is entered
- Confirm
  - Add 100 yuan to user B’s red envelope account
  - Set the flow state to “transaction completed”
  - If any exception occurs during Confirm, the Cancel phase is entered
  - If Confirm succeeds, the transaction ends
- Cancel
  - Add the 100 yuan back to user A’s account
  - Set the flow state to “transaction failed”
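To make the three phases concrete, here is a minimal Java sketch of the transfer example above. The `TccParticipant` interface and the tiny coordinator are illustrative assumptions, not a real TCC framework:

```java
/** Minimal sketch of the TCC flow in the transfer example; names are illustrative. */
public class TransferTccSketch {

    /** Each participating service (balance system, red envelope system) exposes the three TCC operations. */
    interface TccParticipant {
        void tryPhase(String txId) throws Exception; // check consistency and reserve resources
        void confirm(String txId) throws Exception;  // actually execute the business
        void cancel(String txId);                    // release reserved resources / compensate
    }

    /** Try every participant first; if all Tries succeed, Confirm them all, otherwise Cancel them all. */
    static void execute(String txId, TccParticipant... participants) {
        try {
            for (TccParticipant p : participants) {
                p.tryPhase(txId);   // e.g. create the flow record, deduct (reserve) 100 yuan from A
            }
            for (TccParticipant p : participants) {
                p.confirm(txId);    // e.g. credit 100 yuan to B, mark the flow completed
            }
        } catch (Exception anyPhaseFailed) {
            for (TccParticipant p : participants) {
                p.cancel(txId);     // e.g. return the 100 yuan to A, mark the flow failed
            }
        }
    }
}
```

In a real framework the Confirm and Cancel calls are retried by a failure-recovery mechanism rather than abandoned, which is exactly why they must be idempotent, as discussed below.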
In the traditional transaction mechanism, the execution of business logic and the transaction processing are handled by different components at different stages: the business logic part accesses resources to store data and is handled by the business system, while the transaction processing part manages transactions by coordinating resource managers and is handled by the transaction manager. There is little interaction between the two, so a traditional transaction manager only needs to focus on the commit/rollback phase, not on the business execution phase.
TCC global transactions must be built on RM local transactions
A TCC service consists of the Try, Confirm, and Cancel operations. When these operations execute, they access data through the Resource Manager (RM), and these data accesses must take part in RM local transactions so that the data they change is either all committed or all rolled back.
This isn’t hard to understand. Consider the following scenario:
Assume that service B in the figure is not based on RM local transactions (for an RDBMS, this can be simulated by setting auto-commit to true). If [B:Try] fails midway and the TCC transaction framework later decides to roll back the global transaction, the [B:Cancel] operation needs to determine which operations in [B:Try] were written to the database and which were not. Suppose the [B:Try] business has five write operations to the database; the [B:Cancel] business then has to check, one by one, whether each of the five operations took effect and reverse the ones that did.
Unfortunately, since [B:Cancel] itself also has n (0<=n<=5) reverse write operations, if [B:Cancel] fails as well, subsequent executions of [B:Cancel] become even more burdensome: the next [B:Cancel] run has to determine which of the previous [B:Cancel]’s n (0<=n<=5) writes were executed and which were not, which brings in the idempotency problem. Guaranteeing idempotency may in turn require additional write operations, which suffer from the same problem due to the lack of RM local transaction support… As you can imagine, without being based on RM local transactions, the TCC transaction framework cannot effectively manage TCC global transactions.
By contrast, TCC transactions based on RM local transactions handle this easily: if the [B:Try] operation fails midway, the RM local transaction in which [B:Try] participates is rolled back. Later, when the TCC transaction framework decides to roll back the global transaction, it knows that the RM local transaction involved in [B:Try] has already been rolled back, so it does not need to execute the [B:Cancel] operation at all.
In other words, when the TCC transaction framework is implemented on top of RM local transactions, the Cancel business of a TCC service either executes completely or not at all; partial execution is impossible.
The TCC transaction framework should provide idempotency guarantees for the Confirm/Cancel service
The idempotency of a service generally means that multiple (n>1) requests to the service and a single (n=1) request to it have the same side effects.
In the TCC transaction model, the Confirm/Cancel business may be called repeatedly for a number of reasons. For example, the Confirm/Cancel business logic of each TCC service is invoked when a global transaction commits or rolls back. While these Confirm/Cancel operations are being performed, a failure such as a network outage may occur and prevent the global transaction from completing. The failure-recovery mechanism will therefore re-commit/re-roll back these unfinished global transactions, so the Confirm/Cancel business logic of each participating TCC service will be invoked again.
Since the Confirm/Cancel business may be called multiple times, it needs to be idempotent. So should the TCC transaction framework provide the idempotency guarantee, or should business systems ensure idempotency themselves? In my opinion, the TCC transaction framework should provide it. If only a few services had this problem, it would be acceptable for the business systems to take responsibility; however, this is a common problem: every TCC service’s Confirm/Cancel business faces the idempotency issue, and common problems of TCC services should be solved by the TCC transaction framework. Moreover, making business systems responsible for idempotency would undoubtedly increase their complexity.
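As a rough illustration of how a TCC framework might provide this guarantee, the following sketch records the transaction state and runs the Confirm/Cancel logic only on the first call. The in-memory map stands in for what would be a database table in practice, and all names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Minimal sketch of an idempotent Confirm/Cancel executor inside a TCC framework. */
public class IdempotentTccExecutor {

    enum TxState { TRIED, CONFIRMED, CANCELLED }

    // In a real framework this state would live in a database, and the state change would
    // share an RM local transaction with the Confirm/Cancel business writes.
    private final ConcurrentMap<String, TxState> txStateStore = new ConcurrentHashMap<>();

    /** Called after a successful Try phase. */
    public void markTried(String txId) {
        txStateStore.put(txId, TxState.TRIED);
    }

    /** Runs the Confirm business logic at most once per transaction, even if the
     *  failure-recovery mechanism re-invokes it (n > 1 calls, same side effect as n = 1). */
    public void confirm(String txId, Runnable confirmLogic) {
        boolean firstCall = txStateStore.replace(txId, TxState.TRIED, TxState.CONFIRMED);
        if (firstCall) {
            confirmLogic.run();   // executed exactly once
        }
        // Repeated calls find the state already CONFIRMED and return without side effects.
    }

    /** Same idea for Cancel: only the transition from TRIED to CANCELLED triggers the compensation. */
    public void cancel(String txId, Runnable cancelLogic) {
        if (txStateStore.replace(txId, TxState.TRIED, TxState.CANCELLED)) {
            cancelLogic.run();
        }
    }
}
```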
4. Service deployment
When we finish developing the business code, we move into the deployment phase. In this part we will introduce continuous integration, continuous delivery, and continuous deployment, and explain how to apply them to microservices.
4.1 Continuous integration, continuous deployment and continuous delivery
Before defining these three concepts, let’s look at what the software development process looks like once they are adopted, as shown in the figure below:
The first stage is code development. When the code is committed to the repository after development, it needs to be compiled and packaged. The packaged products are called “constructs”; for example, the WAR and JAR packages generated when a web project is packaged are constructs. A construct is free of syntax errors, but its quality is not yet guaranteed: it must pass a series of rigorous tests before it is eligible for deployment to production. We typically set up multiple environments for the system, such as a development environment, a test environment, a pre-release environment, and a production environment. Each environment has its own test criteria, and when a construct passes the tests of one environment and meets the delivery criteria, it automatically moves to the next environment. The construct passes through these four environments in turn, and each time it completes the verification of an environment it becomes eligible for delivery to the next one. Once it passes verification in the pre-release environment, it is qualified to go live.
Testing and delivery go hand in hand, and each environment has its own set of testing standards. For example, in the development environment, committed code is compiled and packaged into a construct; unit tests run during this build, and if any test case fails the whole build is aborted. The developer must fix the problem immediately, resubmit the code, and rebuild the package.
When the unit tests pass, the construct is eligible to enter the test environment, where it is automatically deployed for a new round of testing. In the test environment, interface testing and manual testing are generally required. Interface testing is done by automated scripts; once it is complete, manual functional testing follows. After manual testing is complete, a manual trigger is required to move to the next stage.
At this point the construct is deployed to the pre-release environment. The pre-release environment is a production-like environment: its server configuration must be kept highly consistent with that of the production environment. In the pre-release environment, performance tests are generally run against the construct to determine whether its performance indicators meet the requirements for going live. A construct that passes pre-release verification can go live at any time.
The above process covers continuous integration, continuous delivery, and continuous deployment, so let’s introduce these three concepts from a theoretical perspective.
4.1.1 Continuous integration
“Integration” refers to merging modified or newly added code into the code repository, and “continuous integration” means merging code at a high frequency. What is the benefit of that? If we integrate code more frequently, each integration contains less code; since every integration runs the unit tests, when a problem appears the scope to search is smaller, so it is easier to locate the error. In addition, frequent integration exposes problems earlier, when they are cheaper to solve, because a rule of software testing says that the cost of fixing a bug grows with the time it survives: the longer a bug lives, the more it costs to fix. So continuous integration lets us find problems early and fix them in time, which matters greatly for software quality.
4.1.2 Continuous Deployment
“Continuous deployment” means that, when there are multiple environments, after a construct finishes testing in one environment it is automatically deployed to the next environment and put through a further round of tests, until it is ready to go live.
4.1.3 Continuous delivery
When the system passes all the tests, it is ready for deployment to production, a process known as “delivery”. “Continuous delivery” means that every version of the construct is capable of going live. This requires that every new version of the code in the repository automatically triggers the build, test, deployment, and delivery process; whenever a construct fails the tests at some stage, the developers must fix the problem immediately and rebuild. This ensures that every construct is always ready to go live and can be deployed to production at any time.
4.2 Microservices and Continuous integration
Now that we understand continuous integration, let’s look at how microservices can be combined with it. After the system is split into microservices, the original single system becomes multiple independent microservices. Continuous integration for a single-service system is relatively simple: there is a one-to-one relationship among the code repository, the build, and the construct. Once we split the system into microservices, however, continuous integration becomes more complicated. Below, two approaches to continuous integration for microservices are introduced, single-repository multi-build and multi-repository multi-build, together with their advantages, disadvantages, and applicable scenarios.
4.2.1 Single-repository, multi-build
“Single repository” means a single code repository: the code for all modules of the entire system is maintained in one code repository. “Multiple builds” means that the continuous integration platform has multiple build projects, each of which produces its own construct, as shown below:
In this continuous integration pattern, all the code of the project is maintained in the same code repository, but on the continuous integration platform each service has its own build, so the platform can produce a separate construct for each service.
This pattern of continuous integration is clearly not well suited to a microservice architecture. First, a system may have many services; if the code of all these services is maintained in the same repository, a programmer developing service A may carelessly change the code of service B, and the construct built for service B afterwards will carry a hidden risk. If the problem is found before service B goes live, that is not too bad, although it certainly adds extra work; but if the problem is subtle enough not to be covered by the existing test cases, service B may carry it into production, which can be costly for the company. Therefore, in a microservice architecture, choose the multi-repository, multi-build mode for continuous integration whenever possible; it provides greater safety.
Although this mode is not ideal, it still has a reason to exist: in the early stage of a project it brings real convenience. At that stage the boundaries between services are often fuzzy, and a stable boundary only emerges after a period of evolution. If we adopt a microservice architecture right at the start of a project, the frequent adjustment of service boundaries greatly increases the complexity of development, since adjusting boundaries between multiple systems is far more expensive than adjusting them between the modules of a single system. Therefore, in the initial stage of a project, we can use a single-service structure and treat the modules inside the service as the boundaries of future microservices; once the system has evolved clear and stable boundaries, it can be split into multiple microservices. At that early stage it makes sense to maintain the code in one repository, which is also consistent with the idea of rapid iteration in agile development.
4.2.2 Multi-repository, multi-build
When our system has stable, clear boundaries, we can evolve it toward a microservice architecture. At the same time, the continuous integration pattern can evolve from single-repository, multi-build to multi-repository, multi-build.
In the multi-repository, multi-build pattern, each service has its own code repository, and the repositories do not interfere with one another. A development team only needs to pay attention to the code repositories of the services it owns, and each service has its own build. This approach has clear logic and low maintenance cost, and it avoids the problem in the single-repository, multi-build mode where one service’s changes affect other services.
4.3 Microservice constructs
The continuous integration platform calls the artifacts generated by compiling and packaging the source code “constructs”. Depending on the packaging granularity, constructs can be divided into three categories: platform constructs, operating system constructs, and image constructs.
4.3.1 Platform constructs
Platform constructs are artifacts generated for a specific platform, such as the JAR and WAR packages run on the JVM, or the eggs generated for Python. A platform construct needs to be deployed in a specific container to run; for example, a WAR must run in a Servlet container, which in turn depends on the JVM. So if you want to deploy a platform construct, you must first provide the environment it depends on.
4.3.2 Operating system constructs
An operating system construct packages the system as a program executable on the operating system, such as a CentOS RPM package or a Windows MSI package, which can be installed and run directly on the OS. But like platform constructs, operating system constructs often depend on other environments, so the dependencies required by the installation package must be set up before deployment. In addition, configuring an operating system construct is complex and building one is costly, so this approach is generally not used and is only mentioned here.
4.3.3 Image constructs
A common drawback of platform constructs and operating system constructs is that additional dependencies must be installed before the construct can run, which increases deployment complexity; image constructs address this problem well.
We can think of an image as a small operating system that contains all the dependencies the system needs to run, with the system itself already deployed inside this “operating system”. This way, once the continuous integration platform has built the image, it can be run directly without installing any dependencies, which greatly simplifies deployment. However, images tend to be large, and publishing an image from the continuous integration server to the deployment server takes a long time, which undoubtedly reduces deployment efficiency. Fortunately, Docker solves this problem: the continuous integration platform does not need to produce the image itself during the build; it only produces the image’s Dockerfile, which defines with commands what the image contains and how it is created. The continuous integration server only needs to publish this much smaller Dockerfile to the deployment server, and the deployment server runs the docker build command to create the image from the Dockerfile and then starts a container from the image, completing the deployment of the service.
Compared with platform constructs and operating system constructs, image constructs do not require additional environment dependencies to be installed at deployment time; the environment dependencies are already configured when the continuous integration platform produces the Dockerfile, which simplifies the deployment process.
5. References
Distributed transaction processing in large-scale SOA systems
Life beyond Distributed Transactions: an Apostate’s Opinion
Some thoughts on how to implement a TCC distributed transaction framework
How can a requestor ensure a consistent outcome across multiple, independent providers
About distributed transactions, two-phase commit protocol, three-stage commit protocol
Three-phase commit protocol
Microservice Architecture technology stack Selection manual
Analysis of application scenarios of the microservice architecture
Several common forms of sub – library sub – table and the difficulties that may be encountered