Microservices advocate splitting a complex monolithic application into several loosely coupled services, each with a simple function. This can reduce development difficulty, improve scalability, and facilitate agile development. The concept was put forward in 2012, quickly spread around the world, and has been embraced by more and more developers. Many Internet giants and open-source communities have started discussing and practicing microservices. According to Adrian Cockcroft, Director of Cloud Architecture at Netflix, Hailo is made up of 160 different services, while Netflix has about 600. In China, many Internet companies, such as Alibaba, Tencent, 360, JD.com, and 58.com, have carried out microservice transformations. There are dozens of microservice development frameworks, such as Dubbo, SpringCloud, Thrift, and gRPC.
Distributed transaction solution and its disadvantages
Although microservices are now in full swing, the practice is still in its infancy. Even at Internet giants, most practice remains experimental, and there are few cases of core business systems being converted to microservices. For many small and medium-sized Internet companies, limited experience and technical strength make adopting microservices even harder. The world-renowned software architect Chris Richardson identified the current problems of microservices directly in Introduction to Microservices:
- The complexity of turning a single application into a distributed system. Developers must choose or implement a message-based or RPC-based interprocess communication mechanism. They must also write additional code to handle partial failures, such as the destination service being slow or unavailable.
- The partitioned database architecture that results from splitting a single application. Business transactions that update multiple business records are common, and they are relatively simple to implement in a monolith. In a microservice architecture, however, the application has to call multiple microservices to update multiple databases. Distributed transactions are generally hard to use. One reason is the CAP theorem; another is that some popular NoSQL databases and message queue systems do not support them at all. In the end, developers fall back on eventual consistency, which is very challenging for them.
- Testing an application built on a microservice architecture is also more complex. Because services may call one another extensively, testing one service may require starting the other services it depends on.
- Deploying, operating, and maintaining an application with a microservice architecture is more difficult. A microservice application typically consists of a large number of services, each with multiple running instances. That means many more moving parts to configure, deploy, scale, and monitor. You also need a service discovery mechanism so that services can find the addresses of the services they need to communicate with.
As for the first and third problems, the author believes they are gradually being solved as RPC frameworks mature. For example, Dubbo supports RMI, Hessian, HTTP, Web Services, Thrift, Redis, and other protocols. SpringCloud supports RESTful calls very well, and testing applications built on Spring is getting easier. As for the fourth problem, several developments will make deploying, operating, and maintaining microservices easier and easier: Docker, DevOps, and automated operations tools on public cloud PaaS platforms.
As for the second problem Chris Richardson raised, there is no general solution to the transaction problems that microservices create. Distributed transactions have become the biggest obstacle to adopting microservices and the most challenging technical problem. This article therefore discusses in depth the solutions for distributed transactions in a microservice architecture. It focuses on Alibaba's distributed transaction solution, GTS. GTS is the industry's first and only general-purpose middleware for solving distributed transactions in microservices, and it can guarantee strong data consistency.
SOA distributed transaction solution
Before microservices, most services in information systems were designed according to SOA principles, and services were relatively heavyweight. The SOA era already had solutions to the distributed transaction problems caused by service invocation. Well-known ones include the XA-based scheme, the TCC scheme, and the message-based eventual consistency scheme.
2.1 XA protocol based scheme
This scheme is based on the XA specification standardized by X/Open (now The Open Group). It handles transactions that span multiple data resources, such as several databases. It is a strong consistency solution, implemented jointly by a transaction coordinator and local resource managers, which communicate through the XA protocol. The implementation principle of the XA protocol is shown in the following figure. The protocol has two phases, which is why it is commonly called the two-phase commit (2PC) protocol.
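The coordinator's two phases can be sketched in Java as follows. This is a minimal in-memory illustration, not the XA API (javax.transaction.xa.XAResource). `Participant`, `InMemoryParticipant`, and `TwoPhaseCoordinator` are hypothetical names. A real coordinator must also log its decision durably and recover after crashes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an XA resource manager (not javax.transaction.xa.XAResource).
interface Participant {
    boolean prepare(String xid); // phase 1: do the work, lock the data, vote yes/no
    void commit(String xid);     // phase 2: make the prepared work durable
    void rollback(String xid);   // phase 2: discard the prepared work
}

// Toy participant that only tracks its state, for demonstration.
class InMemoryParticipant implements Participant {
    private final boolean voteYes;
    String state = "ACTIVE";

    InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

    public boolean prepare(String xid) {
        // A "no" voter aborts its own work locally.
        state = voteYes ? "PREPARED" : "ABORTED";
        return voteYes;
    }
    public void commit(String xid) { state = "COMMITTED"; }
    public void rollback(String xid) { state = "ROLLED_BACK"; }
}

// Minimal two-phase commit coordinator: no durable log, no crash recovery.
class TwoPhaseCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    void enlist(Participant p) { participants.add(p); }

    /** Returns true if the global transaction committed. */
    boolean execute(String xid) {
        List<Participant> prepared = new ArrayList<>();
        // Phase 1: every participant prepares and keeps its locks until phase 2.
        for (Participant p : participants) {
            if (!p.prepare(xid)) {
                // One "no" vote aborts everything that was already prepared.
                for (Participant q : prepared) q.rollback(xid);
                return false;
            }
            prepared.add(p);
        }
        // Phase 2: all voted yes, so commit everywhere; only now are locks released.
        for (Participant p : participants) p.commit(xid);
        return true;
    }
}
```

The sketch shows where XA's cost comes from. Each participant holds its locks from `prepare` until the coordinator reaches phase 2, so the slowest participant sets the lock time for all of them.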
The two-phase scheme is widely used to solve distributed transaction problems in databases. Mainstream relational databases such as Oracle and MySQL support the XA protocol. Well-known distributed databases such as OceanBase and DCDB are also based on the two-phase protocol. For service transactions, XA applies not only to multiple resources within a single service but also to resources spread across services. In the cross-service case, however, an additional mechanism is needed to propagate the transaction context. Its major shortcoming is poor performance. The global transaction commits only after all branch transactions are prepared, so each branch holds its data locks for a long time. This makes it hard for XA to handle high-concurrency scenarios. These performance problems are magnified in microservices, because calling a service is far more complex than accessing a database, both in invocation style and in network environment. For example, an application and its database usually sit on the same LAN. The services it invokes through RPC, by contrast, may be on another network or even on the public Internet, which means higher latency and a higher probability of failure. This lengthens lock-holding time further and reduces system concurrency. Therefore, the XA scheme is generally not suitable for solving transaction problems in microservices.
2.2 TCC scheme
The TCC scheme is currently the most widely discussed and most widely deployed option. Open-source TCC frameworks include TCC-Transaction and ByteTCC. TCC improves on the two-phase scheme by moving the local resource manager's functionality into the business implementation. Business logic is split into three parts: Try, Confirm, and Cancel. Try performs business checks and reserves resources. Confirm commits the business operation using the reserved resources. Cancel releases the reservation to roll the transaction back. The basic principle is shown in the figure below.
When a transaction starts, the business application registers with the transaction coordinator to begin the transaction. It then invokes the Try interface of every service, which corresponds to the first phase of XA. If any service's Try call fails, the application sends a rollback request to the coordinator; otherwise it sends a commit request. On a commit request, the coordinator invokes each service's Confirm interface in turn. On a rollback request, it invokes each service's Cancel interface instead. This corresponds to the second phase of XA. If a second-phase call fails, it is retried.
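A minimal Java sketch of this control flow follows. `TccAction` and `TccCoordinator` are hypothetical names, not the API of TCC-Transaction or ByteTCC. The fixed budget of three retries is also an assumption. Production coordinators persist transaction state and keep retrying asynchronously.

```java
import java.util.List;

// Hypothetical TCC participant: one business operation split into three calls.
interface TccAction {
    boolean tryReserve(String txId); // Try: check business rules and reserve resources
    boolean confirm(String txId);    // Confirm: consume the reserved resources
    boolean cancel(String txId);     // Cancel: release the reservation
}

class TccCoordinator {
    static final int MAX_RETRIES = 3; // assumed retry budget for phase 2

    /** Returns true if the transaction was confirmed, false if it was cancelled. */
    static boolean run(String txId, List<TccAction> actions) {
        // Phase 1: call Try on every participant, stopping at the first failure.
        boolean allTried = true;
        for (TccAction a : actions) {
            if (!a.tryReserve(txId)) {
                allTried = false;
                break;
            }
        }
        // Phase 2: Confirm everyone on success, otherwise Cancel everyone.
        // Cancel also reaches participants whose Try never ran, so Cancel must
        // tolerate that case; retries mean Confirm/Cancel must be idempotent.
        for (TccAction a : actions) {
            boolean done = false;
            for (int attempt = 0; attempt < MAX_RETRIES && !done; attempt++) {
                done = allTried ? a.confirm(txId) : a.cancel(txId);
            }
            if (!done) {
                throw new IllegalStateException("phase 2 failed for " + txId + "; manual intervention needed");
            }
        }
        return allTried;
    }
}
```

Each Try, Confirm, and Cancel is expected to run as its own short local transaction, which is what keeps lock times short compared with XA.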
By splitting each operation into three interfaces, the TCC scheme avoids holding data locks for a long time. Each interface runs as a short local transaction, so business tables are released as soon as each call returns. This greatly improves business concurrency and is TCC's biggest advantage. As a result, TCC was widely adopted by financial and e-commerce systems in the SOA era. TCC also has shortcomings, mainly in the following two respects:
- Large development workload. Part of the resource manager's capabilities is moved into each service. Every service interface must therefore be implemented as Try, Confirm, and Cancel operations and integrated with the transaction coordinator, which can more than double the development effort.
- Difficult to implement. The system needs to record the service invocation chain of each application. As mentioned above, RPC calls are complicated. Call failures caused by network problems or system faults must be treated as normal events, and rollback policies must be implemented for different failure causes. To guarantee consistency, Confirm and Cancel calls must eventually succeed; if one fails, the transaction coordinator must retry it. This requires the Confirm and Cancel interfaces to be idempotent.
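The idempotency requirement can be illustrated with a hypothetical inventory participant. It records the state of each transaction ID, so repeated Confirm or Cancel calls from coordinator retries become no-ops. The in-memory map stands in for what would, in practice, be a status table updated in the same local transaction as the business data. The class and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical TCC participant for inventory whose Confirm and Cancel are idempotent.
// The map stands in for a transaction-status table that a real service would update
// in the same local transaction as the stock columns.
class IdempotentStockAction {
    enum Status { TRIED, CONFIRMED, CANCELLED }

    private static final class Entry {
        Status status;
        final int qty;
        Entry(Status status, int qty) { this.status = status; this.qty = qty; }
    }

    private final Map<String, Entry> log = new HashMap<>();
    int available;
    int reserved;

    IdempotentStockAction(int available) { this.available = available; }

    synchronized boolean tryReserve(String txId, int qty) {
        Entry e = log.get(txId);
        if (e != null) return e.status == Status.TRIED; // duplicate Try, or Try after Cancel
        if (available < qty) return false;
        available -= qty;
        reserved += qty;
        log.put(txId, new Entry(Status.TRIED, qty));
        return true;
    }

    synchronized boolean confirm(String txId) {
        Entry e = log.get(txId);
        if (e == null || e.status == Status.CANCELLED) return false; // nothing to confirm
        if (e.status == Status.TRIED) {
            reserved -= e.qty;            // reserved stock is now consumed
            e.status = Status.CONFIRMED;
        }
        return true;                      // a repeated Confirm is a no-op
    }

    synchronized boolean cancel(String txId) {
        Entry e = log.get(txId);
        if (e == null) {
            // Try never ran here: record the cancel so a late Try is rejected.
            log.put(txId, new Entry(Status.CANCELLED, 0));
            return true;
        }
        if (e.status == Status.TRIED) {
            reserved -= e.qty;
            available += e.qty;           // release the reservation
            e.status = Status.CANCELLED;
        }
        return e.status == Status.CANCELLED; // repeated Cancel is a no-op; CONFIRMED stays confirmed
    }
}
```

Every participant in a TCC system needs this kind of bookkeeping, which is a large part of the extra development effort described above.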
As a result, TCC is mostly adopted by large companies with strong R&D capabilities and pressing needs. It turns distributed transactions into a so-called “noble technology” that small and medium-sized enterprises, with limited staff and weaker technical strength, find hard to adopt. In addition, the author believes that microservices favor lightweight, easily deployed services. TCC, by contrast, builds many transaction-processing functions into the business code. This is too intrusive and complicates service logic, so TCC is better suited to heavyweight services.
2.3 Message transaction consistency scheme
The message consistency scheme uses message middleware to keep the data operations of upstream and downstream applications consistent. The basic idea is to put the local operation and the sending of the message into a single transaction, so that either both succeed or both fail. The downstream application subscribes to the message system and performs its operation after receiving the message.
Take order placement as an example. Placing an order involves first storing the order information and then deducting the inventory of the corresponding goods, and the two operations must be in one transaction. In the figure below, the business application first invokes the order service. After storing the order, the order service delivers an order message to MQ through the message processing service. When the inventory service receives the message from MQ, it deducts the inventory and, if that succeeds, notifies the message processing service. The message processing service monitors in real time whether each order message has timed out. If it has, the service reposts it to MQ to drive the inventory service to deduct the inventory again. If the deduction fails, the inventory service will later receive the same order message from MQ. This repeats until the deduction succeeds or a human intervenes, so the inventory service must be idempotent.
The messaging scheme essentially turns a distributed transaction into two local transactions and relies on the retry mechanism of the downstream business to achieve eventual consistency. Compared with TCC, it is technically simpler and easier to adopt, and it is a good choice for applications that are not sensitive to consistency. The American e-commerce company eBay and China's Mogujie have both tried it. It has two disadvantages. It is highly intrusive, because applications must be reworked around message interfaces. It also requires building a dedicated message system, which is costly.
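One common way to put the local operation and the message in a single transaction is the transactional outbox. The message is written into an outbox table in the same local database transaction as the business data, and a separate relay forwards it to MQ. This is a swapped-in illustration rather than the exact message-processing-service design described in this section. The in-memory `OrderDatabase` class is hypothetical; a real implementation would use one JDBC transaction over an orders table and an outbox table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one database holding the orders table and an outbox table.
class OrderDatabase {
    final Map<String, Integer> orders = new HashMap<>(); // orderId -> quantity
    final List<String> outbox = new ArrayList<>();        // messages waiting to go to MQ

    /** Stores the order and its message atomically: on failure neither is kept. */
    synchronized void placeOrder(String orderId, int qty, boolean simulateCrash) {
        Map<String, Integer> ordersBefore = new HashMap<>(orders);
        List<String> outboxBefore = new ArrayList<>(outbox);
        try {
            orders.put(orderId, qty);
            if (simulateCrash) throw new IllegalStateException("crash between the two writes");
            outbox.add("ORDER_CREATED:" + orderId + ":" + qty);
        } catch (RuntimeException e) {
            // "Roll back" the local transaction by restoring both tables.
            orders.clear();
            orders.putAll(ordersBefore);
            outbox.clear();
            outbox.addAll(outboxBefore);
            throw e;
        }
    }

    /** Relay step: hands pending messages over for publishing to MQ.
     *  A real relay would delete entries only after MQ acknowledges them. */
    synchronized List<String> drainOutbox() {
        List<String> batch = new ArrayList<>(outbox);
        outbox.clear();
        return batch;
    }
}
```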
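The downstream half of this flow, with timeout-driven redelivery and an idempotent inventory consumer, might look like the following sketch. `InventoryService` and `MessageProcessingService` are hypothetical names. MQ is replaced by a direct method call, and the current time is passed in explicitly so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical inventory service: deduplicates by order ID so that a redelivered
// order message deducts stock at most once (idempotent consumer).
class InventoryService {
    private final Set<String> processedOrders = new HashSet<>();
    int stock;

    InventoryService(int stock) { this.stock = stock; }

    /** Returns true (acknowledge) once the deduction for this order has been applied. */
    synchronized boolean onOrderMessage(String orderId, int qty) {
        if (processedOrders.contains(orderId)) return true; // duplicate: ack without deducting again
        if (stock < qty) return false;                      // failed: no ack, so it will be redelivered
        stock -= qty;
        processedOrders.add(orderId); // a real service records this in the same local transaction
        return true;
    }
}

// Hypothetical message processing service: remembers unacknowledged order messages
// and reposts any whose acknowledgement has timed out. MQ is replaced by a direct call.
class MessageProcessingService {
    private final Map<String, Long> lastSent = new HashMap<>();    // orderId -> send time (ms)
    private final Map<String, Integer> quantity = new HashMap<>(); // orderId -> quantity
    private final long timeoutMs;

    MessageProcessingService(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void post(String orderId, int qty, long nowMs) {
        lastSent.put(orderId, nowMs);
        quantity.put(orderId, qty);
    }

    void ack(String orderId) {
        lastSent.remove(orderId);
        quantity.remove(orderId);
    }

    boolean isPending(String orderId) { return lastSent.containsKey(orderId); }

    /** Reposts every timed-out message; acknowledged ones are forgotten. */
    void redeliverTimedOut(long nowMs, InventoryService consumer) {
        for (String orderId : new HashSet<>(lastSent.keySet())) {
            if (nowMs - lastSent.get(orderId) >= timeoutMs) {
                lastSent.put(orderId, nowMs);
                if (consumer.onOrderMessage(orderId, quantity.get(orderId))) ack(orderId);
            }
        }
    }
}
```

A message that keeps failing stays pending forever in this sketch. In practice, a retry limit would hand such messages over for manual handling, matching the "until success or human intervention" rule above.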
GTS: A distributed transaction solution for microservices
GTS is a distributed transaction middleware developed by the Alibaba Middleware Department, and it provides a one-stop solution for distributed transactions in a microservice architecture. The basic idea of GTS is to separate distributed transactions from specific business logic and to provide general transaction middleware at the platform level. The middleware coordinates the consistency of service invocations and manages the life cycle of distributed transactions. When a service invocation fails, it rolls back automatically.
The GTS scheme has three advantages. First, it frees microservices from distributed transaction concerns. A microservice does not need to deal with complex issues such as reverse interfaces, idempotency, and rollback strategies. It only needs to implement its own business interface, which greatly reduces the difficulty and workload of microservice development. Turning distributed transactions from a so-called “noble technology” into a “civilian technology” that everyone can use helps microservices spread. Second, GTS is barely intrusive to business code. Transaction boundaries are simply defined with the @TxcTransaction annotation, so the cost of connecting microservices to GTS is very low. Third, GTS performs well, at 8 to 10 times the performance of traditional XA solutions.
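An annotation can mark a transaction boundary without touching business code. The sketch below illustrates the idea with a JDK dynamic proxy. The proxy begins a global transaction and binds an XID before an annotated method runs, then commits or rolls back afterward. `@GlobalTx`, `OrderFacade`, and `TxBoundary` are hypothetical. The source does not describe how the real `@TxcTransaction` annotation is implemented, so this is only an analogy.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical annotation marking a global transaction boundary.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GlobalTx {
    long timeout() default 10_000; // milliseconds; not enforced in this sketch
}

// Hypothetical business interface; its implementation contains no transaction code.
interface OrderFacade {
    @GlobalTx(timeout = 10_000)
    void placeOrder(boolean stockCallFails);
}

class TxBoundary {
    static final ThreadLocal<String> CURRENT_XID = new ThreadLocal<>();
    static final List<String> events = new ArrayList<>(); // begin/commit/rollback trace

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (proxy, method, args) -> {
                    if (method.getAnnotation(GlobalTx.class) == null) {
                        return call(method, target, args); // not a boundary: pass through
                    }
                    CURRENT_XID.set(UUID.randomUUID().toString()); // "begin": obtain an XID
                    events.add("begin");
                    try {
                        Object result = call(method, target, args);
                        events.add("commit");
                        return result;
                    } catch (Throwable t) {
                        events.add("rollback"); // any exception triggers a global rollback
                        throw t;
                    } finally {
                        CURRENT_XID.remove();
                    }
                });
    }

    private static Object call(Method m, Object target, Object[] args) throws Throwable {
        try {
            return m.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the business exception itself
        }
    }
}
```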
3.1 Basic Principles
GTS middleware consists mainly of the GTS Client, the resource manager (GTS RM), and the transaction coordinator (GTS Server). The GTS Client starts and ends transactions. GTS RM starts, commits, and rolls back branch transactions. The GTS Server drives the overall progress of distributed transactions and manages their life cycle.
The integrated structure of GTS and microservices is shown in the figure above. The GTS Client must be integrated into business applications, and the RM must be integrated into microservices. When a business application initiates a service call, it first registers a new global transaction through the GTS Client. The registration goes to the GTS Server, which acts as the transaction coordinator (TC). The GTS Server returns a globally unique transaction number, the XID, to the business application. The XID is propagated to the server side whenever the business application invokes a service. When a microservice performs database operations, it registers a branch transaction with the GTS Server through GTS RM and commits the branch. If the calls to services A, B, and C all succeed, the GTS Client notifies the GTS Server to end the transaction. If the call to C fails, the GTS Client asks the GTS Server to initiate a global rollback, which each RM then carries out.
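This lifecycle can be modeled with a toy coordinator. The steps are: begin, propagate the XID, register branches, then commit or roll back globally. `ToyCoordinator` and `Branch` are hypothetical and say nothing about the GTS Server's actual protocol. In particular, this sketch keeps everything in memory and undoes branches in reverse registration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical branch handle that an RM registers with the coordinator.
interface Branch {
    void commit();   // phase 2 commit, e.g. discard undo data
    void rollback(); // phase 2 rollback, e.g. apply undo data
}

// Toy transaction coordinator (TC): hands out XIDs and tracks branches per XID.
class ToyCoordinator {
    private final Map<String, List<Branch>> branches = new HashMap<>();

    /** Starts a global transaction and returns its globally unique XID. */
    synchronized String begin() {
        String xid = UUID.randomUUID().toString();
        branches.put(xid, new ArrayList<>());
        return xid;
    }

    /** Called on the service side after it receives the XID with the RPC call. */
    synchronized void registerBranch(String xid, Branch b) {
        List<Branch> list = branches.get(xid);
        if (list == null) throw new IllegalStateException("unknown XID " + xid);
        list.add(b);
    }

    /** All service calls succeeded: tell every branch to finish. */
    synchronized void commit(String xid) {
        for (Branch b : remove(xid)) b.commit();
    }

    /** A service call failed: undo branches, newest first in this sketch. */
    synchronized void rollback(String xid) {
        List<Branch> list = remove(xid);
        for (int i = list.size() - 1; i >= 0; i--) list.get(i).rollback();
    }

    private List<Branch> remove(String xid) {
        List<Branch> list = branches.remove(xid);
        if (list == null) throw new IllegalStateException("unknown XID " + xid);
        return list;
    }
}
```

In the scenario above, services A and B register branches under the XID, C fails, and the client's rollback request makes the coordinator call back into the branches of A and B.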
3.2 Key mechanisms of GTS
- Availability: The GTS service is a high-availability cluster of multiple nodes that can scale out flexibly to handle highly concurrent client requests. It supports cross-data-center deployment, same-city disaster recovery, and geo-redundant disaster recovery, ensuring high availability under abnormal conditions.
- Automatic rollback policy: When a microservice invocation fails, the GTS service can drive the RM of each microservice to roll back. For example, a money transfer application usually invokes a debit service and a deposit service to complete a transfer. It calls the debit service to deduct 100 yuan from account A, then calls the deposit service to deposit 100 yuan into account B. If the call to the deposit service fails, the GTS Client asks the GTS Server to initiate a rollback. The GTS Server notifies the RM of the debit service, which adds the 100 yuan directly back to account A. The GTS Server then notifies the transfer application that the rollback succeeded. As this process shows, when a service call fails, the microservice itself does no work. The RM performs the reverse operation on its behalf, so the microservice never has to implement an idempotent reverse operation. In the TCC scheme, by contrast, the transaction coordinator must explicitly call the microservice's reverse (Cancel) interface and retry if that call fails.
- Extensibility: In some cases, applications need to call interfaces of third-party systems or of microservices not built on GTS, whose implementations GTS cannot access. In this case, the MT mode of GTS is needed; MT mode is essentially equivalent to TCC.
MT mode exposes the phase-one and phase-two commit interfaces so that applications can take part in GTS's two-phase commit. After the application registers its commit and rollback interfaces, GTS calls them automatically.
- Isolation level: GTS currently supports the read uncommitted and read committed isolation levels.
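The automatic rollback in the transfer example can be sketched as a resource manager that commits each branch locally but keeps a before-image (an undo log). With that image, it can restore account A without the debit service writing any reverse interface. Restoring a before-image is an assumption about how such an RM might work, not a description of GTS internals. The sketch also ignores concurrent writers, which a real RM would have to detect. The second class shows the MT-mode idea: for a third-party call that GTS cannot see into, the application supplies its own commit and rollback logic. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical RM: each branch commits locally at once, keeping a before-image
// per transaction so the global transaction can still be rolled back.
class UndoLogResourceManager {
    final Map<String, Integer> balances = new HashMap<>();
    private final Map<String, Map<String, Integer>> undoLog = new HashMap<>();

    /** Applies a balance change as a branch of global transaction xid. */
    synchronized void update(String xid, String account, int delta) {
        int before = balances.getOrDefault(account, 0);
        // Keep only the first before-image per account within a transaction.
        undoLog.computeIfAbsent(xid, k -> new HashMap<>()).putIfAbsent(account, before);
        balances.put(account, before + delta); // local commit; row locks can be released
    }

    /** Global commit: the undo data is simply discarded. */
    synchronized void commit(String xid) {
        undoLog.remove(xid);
    }

    /** Global rollback: restore before-images; repeating it finds nothing to undo. */
    synchronized void rollback(String xid) {
        Map<String, Integer> images = undoLog.remove(xid);
        if (images != null) {
            balances.putAll(images);
        }
    }
}

// MT-style branch: the application supplies phase-one and phase-two logic itself
// (hypothetical stand-in for a third-party loyalty-points API).
class ThirdPartyPointsBranch {
    int points;
    private final Map<String, Integer> granted = new HashMap<>();

    void phaseOne(String xid, int n) { points += n; granted.put(xid, n); }
    void commit(String xid) { granted.remove(xid); }
    void rollback(String xid) {
        Integer n = granted.remove(xid);
        if (n != null) points -= n; // a second call does nothing
    }
}
```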
3.3 Comparison between GTS and other schemes
1. Compare with XA scheme
Compared with the XA scheme, GTS is more general and hides low-level implementation details from the business layer, with little intrusion. This matters especially in the era of cloud services. Such services target large numbers of small and medium-sized enterprises, and even individual developers, with diverse business needs. A general, standardized distributed transaction product frees developers from low-level technical details. They can then focus on business logic and develop more quickly and efficiently. Both schemes comply with ACID properties and achieve strong transaction consistency, but GTS performs better than XA.
2. Compare with TCC scheme
The biggest difference between GTS and TCC is the layer at which distributed transactions are implemented. TCC implements distributed transaction functionality at the business layer, building rollback and retry into the microservices themselves. GTS, by contrast, solves the problem at the middleware layer and is almost non-intrusive to microservices. Both schemes perform well and keep service calls consistent. TCC is difficult to implement and suits teams with strong technical skills. GTS achieves strong transaction consistency and, once adopted, keeps microservices simpler and loosely coupled. TCC mainly provides a development framework whose implementation depends on the business side. GTS is a complete distributed transaction solution that requires no intervention from the business side.
3. Compare with the message-based eventual consistency scheme
Compared with the messaging scheme, GTS is far less intrusive and achieves strong data consistency. With the messaging scheme, upstream and downstream services are tightly coupled, testing and deployment are inconvenient, and a separate message system must be built. However, the messaging scheme is relatively simple to implement and is an option when consistency requirements are not strict.
3.4 Application Scenarios of GTS
GTS can be applied in many fields involving service invocation. These include, but are not limited to, financial payment, telecommunications, e-commerce, express logistics, advertising and marketing, social networking, instant messaging, mobile games, video, the Internet of Things, and the Internet of Vehicles. For a detailed introduction, see GTS – Alibaba’s New Solution for Distributed Transactions.
3.5 Delivery forms of GTS
GTS is offered in three forms: on the Alibaba Cloud public cloud platform, as a cloud service over the public Internet, and on private (proprietary) cloud platforms.
- 1 Delivery through the public cloud platform
This form is mainly for Alibaba Cloud (Aliyun) users. If a business system is already deployed on Alibaba Cloud, the user can apply directly to activate GTS on the public cloud. Once activated, the business application can guarantee the consistency of service invocations through GTS. In this scenario, the network between the business system and GTS is ideal, and GTS responds faster.
The public cloud offering provides a rich set of integration samples for Dubbo, SpringCloud, and other frameworks, which you can refer to.
- 2 Delivery as a cloud service over the public Internet
This form is mainly for users outside Alibaba Cloud and is more convenient and flexible to use. Any business system that can connect to the Internet can use the GTS cloud service. Even with network jitter and intermittent disconnections, GTS can still ensure the consistency of service invocations. Take a global transaction containing two local transactions as an example. Under normal network conditions, it completes in about 20 ms, and a service can easily exceed 1000 TPS of distributed transactions, which meets the needs of most business systems. This form can also be used for local development and testing.
Two samples are currently provided: sample-txc-simple and sample-txc-dubbo. sample-txc-simple is the basic getting-started sample for GTS; after downloading it and setting up a local database environment, you can run it directly. sample-txc-dubbo is a sample integrating GTS with the Dubbo framework and can also be run directly on a local machine.
- 3 Delivery through a proprietary cloud platform
This form is mainly for large users that have built their own private cloud platforms. GTS can be deployed directly on the user's private cloud to provide distributed transaction services there. At present, super-large enterprises such as State Grid Corporation of China, China Post, and Zhejiang Tobacco use GTS in their proprietary clouds to ensure data consistency.
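These latency and throughput figures can be checked with Little's law. The check assumes that transactions are issued concurrently and that 20 ms is the average end-to-end latency W. Under those assumptions, the law relates throughput λ and latency W to the average number of in-flight transactions L:

```latex
L = \lambda W = 1000\ \text{tx/s} \times 0.020\ \text{s} = 20\ \text{transactions in flight}
```

So sustaining 1000 TPS at about 20 ms per transaction needs roughly 20 concurrent transactions. A single client issuing transactions one after another would reach only 1 / 0.020 s = 50 TPS.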
3.6 Usage of GTS
GTS is minimally invasive to applications and very simple to use. The following example uses an order application. The order business application completes an order by calling the order service and the inventory service, and the service framework is Dubbo.
1 Client usage (order business application)
Annotate the business method with @TxcTransaction to start a distributed transaction. The Dubbo application propagates the GTS transaction XID to the server as an implicit parameter (a Dubbo attachment).
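import java.util.Date;
import com.alibaba.dubbo.rpc.RpcContext; // Dubbo 2.x package
// TxcTransaction and TxcContext come from the GTS client SDK

@TxcTransaction(timeout = 1000 * 10)
public void business(OrderServiceInterface os, StockServiceInterface ss) {
    // Get the XID of the current global transaction
    String xid = TxcContext.getCurrentXid();
    // Pass the XID to the order service as a Dubbo attachment (implicit parameter)
    RpcContext.getContext().setAttachment("xid", xid);
    int ret = os.setOrder(new Order(pid, num, new Date())); // pid and num: fields of the enclosing class
    // Dubbo clears attachments after each remote call, so set the XID again
    RpcContext.getContext().setAttachment("xid", xid);
    ret = ss.setStock(new Stock(pid, num)); // Stock constructor arguments assumed
}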
2 Server usage
The inventory service
public int setStock(Stock sk) {
    // Get the XID from the Dubbo context
    String xid = RpcContext.getContext().getAttachment("xid");
    // Bind the XID to the GTS transaction context
    TxcContext.bind(xid, null);
    // jdbcTemplate2: assumed Spring JdbcTemplate field for the inventory database
    int ret = jdbcTemplate2.update("update stock set number = number - ? where pid = ?",
            new Object[] {sk.getPnum(), sk.getPid()});
    return ret;
}
3.7 GTS in production
While satisfying ACID properties, GTS can exceed 15,000 TPS on a single server with an ordinary configuration, completing more than 100 million transactions in two hours. It is widely used in Alibaba business systems such as Taobao, Tmall, Alibaba Pictures, Taopiaopiao, Alimama, and 1688. It withstood the massive traffic of the Double 11 shopping festival in 2016 and 2017. The peak traffic of one online business system has reached 100,000 TPS (100,000 transactions per second). Since GTS became available on Alibaba Cloud and on proprietary clouds, many users have adopted it for distributed transactions in SpringCloud, Dubbo, EDAS, and other microservice systems. These users include State Grid, China Post, China Tobacco, Xtep, Zhejiang Public Security, Deppon Express, and One Step Sharing Technology. They span more than a dozen industries, such as power, logistics, ETC (electronic toll collection), tobacco, finance, retail, e-commerce, and shared mobility, and GTS has won unanimous recognition from them.
The figure above shows GTS integrated with SpringCloud in a shared mobility system. In this scenario, GTS keeps transactions consistent across the Internet of Things system, order system, payment system, operations system, analytics system, and others. It supports massive order volumes and tens of millions of transactions.
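The two throughput figures in the paragraph above are consistent with each other. Sustaining 15,000 TPS for two hours gives:

```latex
15\,000\ \text{tx/s} \times 2\ \text{h} \times 3600\ \text{s/h} = 1.08 \times 10^{8}\ \text{transactions}
```

This is just over 100 million transactions, as stated.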