Interview question

Do you understand distributed transactions? How do you solve distributed transaction problems?

Analysis of the interview question

Generally speaking, distributed transactions are implemented with one of the following five schemes:

  • XA scheme
  • TCC scheme
  • Local message table
  • Reliable-message eventual consistency scheme
  • Best-effort notification scheme

Two-phase commit scheme / XA scheme

The XA scheme is two-phase commit. A transaction manager coordinates a transaction across multiple databases (the resource managers). The transaction manager first asks each database: are you ready? If every database replies OK, the transaction is formally committed and the operation is executed on every database; if any database answers no, the transaction is rolled back.
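
The prepare-then-commit flow can be sketched in a few lines of Python. This is only an in-memory illustration, not a real XA implementation: the `ResourceManager` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names invented here, and real resource managers would also have to persist their prepared state and handle coordinator crashes.

```python
class ResourceManager:
    """Hypothetical stand-in for one database taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: decides how this database votes
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(resource_managers):
    """Return True if every participant committed, False if all were rolled back."""
    # Phase 1: the transaction manager asks every database "are you ready?"
    votes = [rm.prepare() for rm in resource_managers]

    # Phase 2: commit everywhere only on a unanimous yes, otherwise roll back everywhere.
    if all(votes):
        for rm in resource_managers:
            rm.commit()
        return True
    for rm in resource_managers:
        rm.rollback()
    return False
```

A single "no" vote is enough to roll back every participant, which is also why one slow or unavailable database holds up the whole transaction.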

This scheme suits distributed transactions that span multiple databases inside a monolithic application. Because it relies heavily on the databases themselves to handle a complex transaction, it is very inefficient and definitely not suitable for high-concurrency scenarios. If you want to try it, it can be done with Spring + JTA; search for a demo and you will see how it works.

This scheme is rarely used. Generally speaking, if a single system operates across multiple databases, that already breaks the rules. With microservices today, a large system is split into dozens or even hundreds of services, and our rules and specifications generally require each service to operate only on its own database.

If you need data that lives in another service's database, you are not allowed to connect to that database directly, because that violates the microservice architecture specification. With hundreds of services accessing each other's databases at will, everything falls apart: the services can no longer be governed or controlled, your data may be modified by others, your own database may be written to by others, and so on.

If you want to operate on another service's database, you must do so by calling that service's interface; cross-access to another service's database is never allowed.


TCC scheme

TCC stands for Try, Confirm, and Cancel.

  • Try phase: This phase checks the resources of each service and locks or reserves the resources.
  • Confirm phase: This phase performs the actual operation in each service.
  • Cancel phase: If the business method of any service fails, compensate by rolling back the business logic that has already executed successfully.
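
As a rough illustration of the three phases, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the `Account` class, its `frozen` field and the `debit_all` coordinator are hypothetical, and a real TCC participant would be a remote service whose Confirm and Cancel calls must be idempotent and retried on failure.

```python
class Account:
    """Hypothetical TCC participant: one account owned by one service."""

    def __init__(self, balance):
        self.balance = balance
        self.frozen = 0  # amount reserved by Try but not yet spent

    def try_debit(self, amount):
        # Try: check the resource and reserve it, without spending it yet.
        if self.balance - self.frozen < amount:
            return False
        self.frozen += amount
        return True

    def confirm_debit(self, amount):
        # Confirm: turn the reservation into the real deduction.
        self.frozen -= amount
        self.balance -= amount

    def cancel_debit(self, amount):
        # Cancel: release the reservation made by Try.
        self.frozen -= amount


def debit_all(steps):
    """steps is a list of (account, amount); either every debit happens or none does."""
    reserved = []
    for account, amount in steps:
        if account.try_debit(amount):
            reserved.append((account, amount))
        else:
            # Compensate only the participants whose Try already succeeded.
            for acc, amt in reserved:
                acc.cancel_debit(amt)
            return False
    for account, amount in reserved:
        account.confirm_debit(amount)
    return True
```

Even in this toy version, each business operation needs three hand-written methods, which hints at how quickly the compensation code grows.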

To be honest, this scheme is rarely used, and we use it relatively rarely ourselves, but there are scenarios where it fits. Because rollback depends heavily on code you write yourself to roll back and compensate, the compensation code can grow huge and is very unpleasant to write.

For example, for scenarios that deal with money, such as payments and trades, TCC can be used to strictly guarantee that the distributed transaction either succeeds everywhere or is rolled back everywhere, so that the funds stay correct.

It also works best when each business operation takes only a short time to execute.

But to be honest, avoid it where you can: hand-written rollback or compensation logic is painful, and the business code becomes hard to maintain.


Local message table

The local message table is an idea that originated at eBay.

It goes something like this:

  1. When system A performs an operation in its own local transaction, it also inserts a record into its message table;
  2. System A then sends this message to MQ;
  3. After receiving the message, system B, in a single transaction, inserts a record into its own local message table and performs its other business operations. If the message has already been processed, the transaction rolls back, which ensures that the message is not processed twice;
  4. After system B executes successfully, it updates the status in its own local message table and in system A's message table;
  5. If system B fails to process the message, the status in the message table is not updated. System A periodically scans its message table, and if it finds unprocessed messages, it sends them to MQ again for B to process;
  6. This scheme guarantees eventual consistency. Even if B's transaction fails, A keeps resending the message until B succeeds.
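
The steps above can be sketched with two in-memory SQLite databases standing in for system A and system B. The table names (`orders`, `message`, `processed`, `stock_log`) and all function names are assumptions made for this example, and the MQ is replaced by a plain `publish` callback, so this only shows the shape of the idea.

```python
import sqlite3


def make_producer_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    db.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
    return db


def make_consumer_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id INTEGER PRIMARY KEY)")
    db.execute("CREATE TABLE stock_log (msg_id INTEGER, item TEXT)")
    return db


def create_order(db_a, order_id, item):
    # Step 1: the business row and the message row share ONE local transaction.
    with db_a:
        db_a.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        db_a.execute(
            "INSERT INTO message (id, payload, status) VALUES (?, ?, 'pending')",
            (order_id, item),
        )


def resend_pending(db_a, publish):
    # Steps 2 and 5: publish every message that has not been marked done yet.
    rows = db_a.execute("SELECT id, payload FROM message WHERE status = 'pending'").fetchall()
    for msg_id, payload in rows:
        publish(msg_id, payload)


def handle(db_b, msg_id, payload):
    # Step 3: record the message and do the business work in one transaction.
    try:
        with db_b:
            db_b.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            db_b.execute("INSERT INTO stock_log (msg_id, item) VALUES (?, ?)", (msg_id, payload))
    except sqlite3.IntegrityError:
        # Duplicate delivery: the primary key conflict rolled the transaction back.
        pass


def mark_done(db_a, msg_id):
    # Step 4: report success back to system A's message table.
    with db_a:
        db_a.execute("UPDATE message SET status = 'done' WHERE id = ?", (msg_id,))
```

Because A keeps resending until the status changes, B will sometimes see the same message twice; the primary key on `processed` is what makes the second delivery harmless.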

To be honest, the biggest problem with this scheme is that it relies heavily on database message tables to manage transactions. What happens in a high-concurrency scenario? How do you scale it? So in practice it is rarely used.


Reliable-message eventual consistency scheme

This scheme drops the local message table and implements transactions directly on top of MQ. Alibaba's RocketMQ, for example, supports transactional messages.

It works roughly like this:

  1. System A first sends a prepared message to MQ. If sending the prepared message fails, the operation is cancelled and nothing is executed;
  2. If the message is sent successfully, system A executes its local transaction. If the transaction succeeds, it tells MQ to confirm the message; if it fails, it tells MQ to roll the message back;
  3. Once the confirmation has been sent, system B receives the confirmed message and executes its own local transaction;
  4. MQ automatically polls all prepared messages and calls back your interface to ask: did the local transaction for this message fail? For messages that were never confirmed, should they be retried or rolled back? Typically you check the database here to see whether the local transaction was executed; if it was rolled back, roll the message back as well. This covers the case where the local transaction succeeded but sending the confirmation failed;
  5. What if system B's transaction fails in this scheme? Retry automatically until it succeeds. If it really cannot succeed, then for important services such as funds, roll back: for example, after system B rolls back locally, find a way to notify system A to roll back too, or raise an alert so that people can roll back and compensate manually;
  6. This scheme is a reasonable fit, and it is how most Internet companies in China currently do it. Either you use RocketMQ's support, or you wrap similar logic yourself around something like ActiveMQ or RabbitMQ. Either way, the idea is the same.
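
To make the prepared/confirm/check-back flow concrete, here is a small in-memory sketch. It does not use the RocketMQ API: the `Broker` class, its methods, the `run_with_message` helper and the `confirm_lost` flag are all hypothetical names chosen for this example.

```python
class Broker:
    """Hypothetical in-memory stand-in for an MQ with transactional messages."""

    def __init__(self):
        self.prepared = {}   # msg_id -> payload, not yet visible to consumers
        self.delivered = []  # payloads that system B is allowed to consume

    def send_prepared(self, msg_id, payload):
        self.prepared[msg_id] = payload

    def commit(self, msg_id):
        self.delivered.append(self.prepared.pop(msg_id))

    def rollback(self, msg_id):
        self.prepared.pop(msg_id, None)

    def check_prepared(self, was_local_tx_committed):
        # Step 4: ask the producer about every message that is still prepared.
        for msg_id in list(self.prepared):
            if was_local_tx_committed(msg_id):
                self.commit(msg_id)
            else:
                self.rollback(msg_id)


def run_with_message(broker, msg_id, payload, local_tx, confirm_lost=False):
    """Send a prepared message, run the local transaction, then confirm or roll back."""
    broker.send_prepared(msg_id, payload)   # step 1
    try:
        local_tx()                          # step 2: system A's local transaction
    except Exception:
        broker.rollback(msg_id)
        return False
    if not confirm_lost:                    # confirm_lost simulates a lost confirmation
        broker.commit(msg_id)
    return True
```

When the confirmation is lost, the message stays in `prepared` until `check_prepared` asks the producer, which is where the database lookup described in step 4 would go.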


Best-effort notification scheme

The general idea of this scheme is:

  1. After its local transaction completes, system A sends a message to MQ;
  2. A dedicated best-effort notification service consumes from MQ, records the message in a database or puts it on an in-memory queue, and then calls system B's interface;
  3. If system B succeeds, all is well; if system B fails, the best-effort notification service periodically retries the call to system B, up to N times, and then gives up.
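
The retry-then-give-up behaviour in step 3 can be sketched as a single function. The name `notify_best_effort` and its parameters are assumptions for this example; a real notification service would also wait between attempts and persist the message, both of which are left out here.

```python
def notify_best_effort(call_system_b, message, max_attempts=3):
    """Call system B up to max_attempts times.

    Returns the number of the attempt that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if call_system_b(message):
                return attempt
        except Exception:
            # Treat an exception like a failed call and try again.
            pass
    # N attempts used up: give up, as the scheme allows.
    return None
```

Unlike the reliable-message scheme, nothing here forces system B to eventually succeed, so this only fits notifications that the business can afford to lose.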

How does your company handle distributed transactions?

If you do get asked, you can say: for scenarios with very strict requirements we use TCC to guarantee strong consistency; for other scenarios we implement distributed transactions on top of Alibaba's RocketMQ.

Pick a scenario with strict requirements, where funds must never be wrong, and say you use TCC there. For an ordinary distributed transaction scenario, for example calling the inventory service to update stock after an order is inserted, inventory data is not as sensitive as money, so the reliable-message eventual consistency scheme is enough.

As a friendly reminder, RocketMQ versions before 3.2.6 supported the approach described above, but the interface has changed since then; I won't go into the details here.

Of course, if you like, you can implement distributed transactions yourself by following the reliable-message eventual consistency scheme, for example on top of RocketMQ, as an exercise.