In cloud native scenarios, the microservices architecture is favored by developers for its flexibility and efficiency. In the process of moving to microservices, a question naturally arises: why choose the microservices architecture at the start of an architectural transformation?

Author: Liu Guanjun

Source: Tencent Cloud middleware official account

Starting from banking architecture: why did systems evolve from centralized to microservices?

We have all used ATM transfers, online banking, and mobile banking. Behind them sits the banking system, which is the typical business architecture most banks still use today: a request first arrives at a channel layer, which integrates and filters the information; it then passes to a customer service layer, where much of the SOA design is embodied, and on to the more fine-grained application and business layer.

Below that are independent modules. The ones marked in red in the diagram form the typical core business system, whose top three modules are deposits, loans, and payment and settlement. Beneath them sit customer management and other product management modules, which are connected to the core business system. Generally speaking, these systems can be divided into OLTP (online transaction processing) systems and OLAP (online analytical processing) systems; the core business system is an OLTP system. With the continued development of big data, OLAP systems are becoming more and more important and valuable.

Let's focus on OLTP systems and simplify things by first looking at the general OLTP architecture model. Many Internet systems place a gateway behind Nginx, and banking systems likewise have a front gateway to protect the core system, so I have simplified the gateway away here. You can see that after a request is resolved by the DNS server, it passes through the load balancer layer to the web servers; the web servers, acting as clients, forward the request to the business layer, and from there it reaches the DB cache and the DB. This is a typical OLTP model.

  • How do traditional vendors like IBM build core banking systems?

Here is how that system is implemented with IBM products. The focus is on the TOR (Terminal Owning Region) and AOR (Application Owning Region), which are service instances based on the CICS transaction middleware. As we know, a core banking system mainly processes transactions. Here the CICS AOR plays the role of the application server, backed by the CF (Coupling Facility) for caching, and then DB2.

If you look at the HA mark on the far right, HA is in fact done at every level. In distributed architecture, two characteristics are essential: redundancy of services, and automatic migration of services when they fail; these same characteristics matter in a traditional centralized architecture too. The difference is that the IBM system rests on one precondition: highly reliable hardware. IBM specializes in servers and hardware devices, and builds redundancy into hardware, not just software. In today's x86-based distributed clusters, by contrast, individual x86 machines are basically assumed to fail, so the software architecture and workloads must provide a great deal of service redundancy.

IBM's systems enjoyed this hardware reliability bonus and applied the same idea in hardware: the CPUs, memory, and transmission channels are all designed with roughly 20% redundancy, so that if something goes wrong, backup hardware can be activated immediately to take over. I think the philosophy underlying IT systems is the same everywhere: redundancy of services plus failover.

This diagram shows the system architecture of an IBM mainframe. Large banking systems such as those of ICBC and the Agricultural Bank of China consist of four such machines, each roughly the size of a refrigerator and running two or more virtual machines, and the setup is much the same from bank to bank. What differs is the TOR and AOR, that is, the application servers: the business logic layers differ and must be deployed according to each bank's own business. Other components, such as DB2 and the CF cache, are identical. The CF is quite distinctive: it is an integrated hardware and software component through which things like global serial numbers can be generated very efficiently. This is the traditional core banking architecture that IBM built for banks.

Systems built on Oracle or other traditional vendors' products are essentially the same, though few vendors can reach this level of hardware reliability, and such hardware is scarce and expensive. The distributed systems that came later delivered what might be called a dimensionality-reduction strike on this business, changing the IT world from another direction by reducing the importance of hardware. A few characteristics of the old architecture are worth noting: it is split horizontally, but the application is a monolith and the data is centralized. The TOR plays the role of a gateway layer, handling authentication, dispatching, assembly, and so on, while the AOR actually processes the business, going through the cache layer and data access layer down to the DB. So it has the characteristic of horizontal splitting, yet the application remains a monolith and the data remains centralized.

The evolution of Internet architecture: what changes did this process go through?

Let's talk about centralized versus distributed. Many banks regard these systems as too important to touch, yet the products still need upgrades and patches. When a patch comes, they all ask: can we skip it? And if not, can you keep a large team of service staff on site and online for support? Why? Because the application is a monolith: a problem in the monolith becomes a global problem, requiring people from every team to coordinate the investigation, which is a very heavy burden. So this is very expensive, both for IBM and for the customer. Even if a problem occurs only once in a hundred upgrades, each occurrence consumes enormous manpower and resources. This has a lot to do with why we use microservices.

  • So why the shift to microservices?

Microservices are autonomous services that can be developed, deployed, and evolved by small teams, which overcomes the above problems to a large extent. Of course, the whole industry did not move to microservices overnight; first came horizontal decoupling and SOA (service-oriented architecture). Many people think SOA is similar to microservices, but they are quite different: SOA is mainly about transforming and connecting existing systems, while microservices are built on top of a business partition, which must be clarified and split into services before implementation.

Around the microservices architecture there are a couple of buzzwords, Service Mesh and cloud native architecture, both closely related to microservices. Service Mesh is a cross-language microservice framework: to implement services better, it separates service communication and governance into infrastructure and lets users focus on their own business logic; that is why Service Mesh was created. And once microservices are split out, you need to think about how to deploy them more efficiently, hence cloud native architecture, which is more about how you deploy and operate; it is a whole system. The diagram shows the whole process shading from red to green: it is a gradual evolution, not a dramatic leap; it is the natural development of things.

  • What does the microservices architecture model look like?

The core point is that each service together with its own data forms a self-contained unit that a small team can operate and maintain. This is based on Conway's law: the shape of an organization determines the shape of its system. Therefore, the move to microservices is not only a change of architectural model but also a change of organizational structure. In terms of characteristics, functions are split horizontally, the business is split vertically, and data is stored in a distributed way.

Let's take a quick look at the comparative characteristics of centralized and distributed architectures to understand why they evolved this way. A centralized architecture scales vertically (scale-up). A simple example: an IBM mainframe grows by adding hardware resources such as server CPUs, so a system that previously handled 10,000 requests per second can be expanded to 12,000 and beyond. Business growth is achieved through vertical expansion.

Keeping data stored together has its own advantages. For example, the accounting systems behind these big banks hold tens of terabytes of data; unlike data distributed across many servers, data stored in one centralized place makes strong consistency easy to guarantee. For distributed systems, where our development model is microservices and data is distributed, eventual consistency is what can more easily be achieved.

One phenomenon I find interesting when discussing microservices architecture with people is that nobody actually cares what hardware the machines are; everyone seems to share the same assumption that they are x86 servers. We now pay more attention to the service architecture itself, to how to achieve the system's goals through software architecture. Of course, the differences, advantages, and disadvantages of the two approaches are by now very clear: the monolith is comparatively slow to evolve and is no longer suited to the rapid development and iteration of Internet applications.

IBM is itself in the middle of a transformation, hoping to position itself well in the cloud and cloud native ecosystem so that it can support customers' microservice and data transformations. Microservice distributed systems, of course, bring new problems: splitting services well is genuinely hard and takes experience, and data consistency becomes an issue precisely because data is now stored separately.

Evolution and practice of distributed transactions under the microservices architecture

Next, let's look at the evolution, classification, and implementation of distributed transactions under the microservices architecture. Distributed transaction, or data consistency, schemes divide mainly into rigid transactions (strong consistency, ACID) and flexible transactions (eventual consistency, BASE).

  • How are rigid transactions implemented?

Start with 2PC: the traditional transaction middleware CICS and Tuxedo and the databases DB2 and Oracle all implement the 2PC protocol; later came 3PC, as well as distributed multi-node consensus protocols such as Paxos. 2PC consists of two phases, voting and commit, but it blocks, and in extreme cases a participant cannot tell whether to commit or roll back. Strictly, the full 2PC protocol should be drawn out to make this clearer; here I will just give a brief overview.

CICS and Tuxedo are the best-known transaction processing middleware from IBM and Oracle respectively, while DB2 and Oracle serve as the transaction participants, and both sides implement the XA protocol. The process suffers from synchronous blocking, and resources stay locked for a relatively long time. Worse, if coordination between the coordinator and a participant breaks down at the wrong moment, the participant cannot tell whether the directive was commit or rollback; within 2PC this is theoretically unsolvable, and when it happens the transaction hangs in an in-doubt state. This is why banks keep so many reconciliation lists and logs, and why in extreme cases manual intervention is needed to compensate.
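To make the two phases concrete, here is a minimal, illustrative sketch of a 2PC coordinator. The `Participant` interface and class names are hypothetical; real XA implementations such as those in CICS, Tuxedo, DB2, and Oracle additionally write recovery logs and handle timeouts.

```java
import java.util.List;

/** Hypothetical participant (resource manager) in the XA/2PC style. */
interface Participant {
    boolean prepare(String xid); // phase 1: vote yes (resources locked) or no
    void commit(String xid);     // phase 2: make the prepared work durable
    void rollback(String xid);   // phase 2: undo the prepared work (no-op if never prepared)
}

/** Minimal 2PC coordinator sketch; real coordinators also log for recovery. */
class TwoPhaseCommitCoordinator {
    boolean execute(String xid, List<Participant> participants) {
        // Common fast path, similar to the IBM single-resource-manager
        // optimization discussed later: one participant needs no vote.
        if (participants.size() == 1) {
            participants.get(0).commit(xid);
            return true;
        }
        // Phase 1: voting. Every participant must agree, holding locks meanwhile.
        for (Participant p : participants) {
            if (!p.prepare(xid)) {          // any "no" vote aborts the transaction
                participants.forEach(q -> q.rollback(xid));
                return false;
            }
        }
        // Phase 2: commit. If the coordinator crashes here, participants are
        // left in-doubt -- the blocking weakness described above, which in
        // banks shows up as reconciliation lists and manual compensation.
        for (Participant p : participants) {
            p.commit(xid);
        }
        return true;
    }
}
```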

Such cases are relatively rare, and every system has to weigh its cost-benefit ratio: if these cases are rare enough, 2PC is a tolerable transaction scheme. That is why a large number of bank transaction systems are still based on 2PC: it supports strong data consistency well, and what banks care about most is that accounting information stays consistent in real time.

Later came Paxos, used by distributed databases, along with algorithms such as ZAB and Raft that refine Paxos and optimize its performance to run faster.

Then come flexible transactions, whose core point is eventual consistency of data; that is the BASE theory: availability comes first, data consistency is only eventual, and it does not matter that there are soft states in between. Flexible transactions mainly include TCC, Saga, and AT, themselves a process of continuous optimization and evolution, followed by transactional messages, which are the most widely used; a sketch of the latter follows.
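One common way to realize transactional messages is the transactional outbox pattern: the business change and the outgoing message are written in one local transaction, and a separate relay process later publishes the message to the broker, yielding eventual (BASE) consistency across services. A minimal sketch, with hypothetical `account` and `outbox` tables:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch: the debit and the outbox record commit atomically in ONE local
 *  transaction; a relay polls the outbox and publishes to the message broker. */
class TransferOutbox {
    void debitAndRecordMessage(Connection conn, String account, long cents)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?");
             PreparedStatement outbox = conn.prepareStatement(
                 "INSERT INTO outbox(topic, payload) VALUES ('transfer.debited', ?)")) {
            debit.setLong(1, cents);
            debit.setString(2, account);
            debit.setLong(3, cents);
            if (debit.executeUpdate() != 1) {           // insufficient funds: abort
                conn.rollback();
                return;
            }
            outbox.setString(1, account + ":" + cents); // message rides the same commit
            outbox.executeUpdate();
            conn.commit();   // both rows become visible together, or neither does
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```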

From the transaction processing systems that IBM supported to today's compensating flexible transactions on the Internet, the essential structural difference lies in the scenario. Previously, a committed transaction was one independent service that involved multiple DB participants; you can think of it as a long transaction spanning more than one DB. The core move of this newer type of distributed transaction is to convert that long transaction into a series of local transactions: each service cares only about its own transaction, and if my data, and hence the overall data, becomes inconsistent, a compensating transaction framework or message-oriented middleware schedules and coordinates to make the data eventually consistent, as the sketch below shows.
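A minimal sketch of that conversion, with a hypothetical `SagaStep` interface: the long transaction becomes an ordered list of local transactions, and on failure the already completed steps are compensated in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** One local transaction in the chain, paired with its compensating action. */
interface SagaStep {
    void execute();     // local transaction of one service
    void compensate();  // semantic undo of that local transaction
}

/** Sketch of an orchestration-style Saga coordinator: run steps in order;
 *  on failure, compensate completed steps in reverse (backward recovery). */
class SagaCoordinator {
    void run(List<SagaStep> steps) {
        Deque<SagaStep> done = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                done.push(step);
            } catch (RuntimeException e) {
                while (!done.isEmpty()) {
                    done.pop().compensate(); // undo in reverse order
                }
                throw e;
            }
        }
    }
}
```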

  • Why does transaction processing middleware like IBM's perform so well?

In the IBM core banking system I just mentioned, all data is stored in DB2, so in most cases only a single resource manager participates in the transaction, and an optimization was made: the system does not rigidly follow the 2PC flow in which every participant must first answer whether it can commit or must roll back. If there is only one resource manager, it simply executes directly and compensates if a problem arises; if nothing goes wrong, it is done in a single step. This is essentially the Saga idea, so the thinking in many IBM products was quite advanced.

For flexible transactions, let's look at TCC first and then at Saga. In the TCC model, the transaction manager requests to start a transaction, obtains a global XID from the transaction coordinator, and registers sub-transactions. Each sub-transaction exposes three interfaces: Try, Confirm, and Cancel. Try reserves resources, for example checking that my account is usable and holds enough money. If every sub-transaction reserves successfully in the first phase, Confirm is executed; if any reservation fails, Cancel is executed to roll back. Confirm itself does little real work beyond confirming the reservation, so Saga optimizes away the separate Try and Confirm phases and executes directly; on failure it either retries forward or rolls the whole transaction back.
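As a sketch, the three interfaces of a TCC sub-transaction might look like this for the account example above. The names are hypothetical (real frameworks define their own conventions, and the Try method is spelled out because `try` is a Java keyword):

```java
/** Sketch of a TCC sub-transaction for the account-debit example.
 *  The coordinator calls tryDebit on every participant first; only if all
 *  succeed does it call confirmDebit, otherwise cancelDebit. */
interface AccountTccAction {
    /** Phase 1: reserve -- check the account is usable and freeze the amount. */
    boolean tryDebit(String xid, String account, long cents);

    /** Phase 2a: confirm -- turn the frozen amount into a real deduction.
     *  Little real work: the resources were already reserved by tryDebit. */
    void confirmDebit(String xid);

    /** Phase 2b: cancel -- release the frozen amount (undo the reservation). */
    void cancelDebit(String xid);
}
```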

Are TCC and 2PC the same thing? They appear to follow the same line of thinking, but they are actually quite different.

2PC's prepare phase has nothing to do with business logic, whereas TCC's phases are implemented inside the business code, and Saga is an optimization of TCC. The Saga mode guarantees only ACD, not isolation. Saga implementations divide into orchestration-based Saga and choreography-based Saga, and attention must be paid to idempotence, empty rollback, and suspension; the sketch below shows the usual guards.
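These pitfalls are typically handled by recording per-XID state on the participant side. A minimal sketch with a hypothetical in-memory store (a real system would persist this state in the database alongside the business data):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the standard guards for TCC/Saga participants:
 *  idempotence (ignore duplicate calls), empty rollback (Cancel arrives
 *  before Try ever ran), and suspension (a late Try arrives after Cancel). */
class TccGuard {
    enum State { TRIED, CANCELED }
    private final Map<String, State> states = new ConcurrentHashMap<>();

    boolean beforeTry(String xid) {
        // Suspension guard: if Cancel already ran, refuse the late Try.
        // putIfAbsent also makes duplicate Try calls idempotent.
        return states.putIfAbsent(xid, State.TRIED) == null;
    }

    boolean beforeCancel(String xid) {
        State prev = states.put(xid, State.CANCELED);
        if (prev == null) {
            // Empty rollback: Try never executed, so there is nothing to undo,
            // but we still record CANCELED to block a Try that arrives later.
            return false;
        }
        return prev == State.TRIED; // idempotence: a second Cancel is a no-op
    }
}
```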

Finally, in the practice of distributed transactions, prefer avoidance through business design: minimize distributed transaction scenarios, use BASE flexible transactions where possible, and fall back to CP rigid transactions only as a last resort. For flexible transactions, Saga or asynchronous message processing is usually recommended; for rigid transactions, 2PC. In reality, many customers have existing 2PC-based systems that they want to integrate into a microservices framework, and there is a great deal to consider in doing so.

About the author

Guanjun Liu, former Director of Mainframe Middleware at the IBM Systems Lab. He graduated from Nankai University with a bachelor's degree in Software Engineering in 2004, and in 2008 graduated from Peking University and joined the IBM China Development Center as a software engineer in the transaction processing middleware group. He moved into the role of Technical Support Manager in 2014, was promoted to Senior R&D Manager in 2017, and in 2019 became R&D Director of Mainframe Middleware, in charge of the R&D and technical support teams for mainframe platform middleware products in China.

Portal

  • Techo Live: Why is centralized architecture evolving to microservices architecture?
  • For more on microservices: Tencent microservice platform TSF
  • Share your feedback and suggestions: Tencent Cloud middleware Q&A community