0. Introduction

Payment happens all the time in daily life, from offline supermarket shopping to online food delivery and e-commerce, whether by cash, POS card swiping, or third-party payment such as WeChat Pay and Alipay. Online payment is timely and convenient, and the experience has kept improving. There are still occasional rough spots, though. For example, in the early days of buying train tickets on the PC version of 12306, the order's payment status was often not updated promptly after payment completed. There was some delay, and sometimes the order stayed in the unpaid state for a long time.

During payment, the process can stop halfway for various reasons, such as processing problems in an external channel or a delayed asynchronous callback, and users are left at a loss when they see that the order is still unpaid. In this case, a mechanism is needed to push the transaction to completion. Taking the order replenishment mechanism in a third-party payment system as an example, this article introduces a fairly general compensation pattern for business orders.

1. Introduction to the third-party payment system

1.1 What is third-party payment

Third-party payment is provided by an independent institution that has signed contracts with major banks and is independent of both merchants and banks. Backed by a certain level of financial strength and reputation, it provides payment and settlement services for merchants and consumers. It acts as a trusted third party between buyer and seller, taking the role of guarantor and fund custodian. Third-party payment can also be called virtual account payment: consumers open a virtual account with a third-party payment institution and pay with the funds in that account. Common third-party payment products in the industry include Alipay, WeChat Pay, Meituan Pay, and JD Pay.

1.2 Transaction and payment systems in third-party payment

What is a transaction? The most intuitive description is "payment in one hand, goods in the other." A transaction creates a creditor-debtor relationship between buyer and seller. A transaction is the premise of a payment, and users complete the transaction through some payment method. The transaction drives the payment process: it combines different payment instructions according to the specific scenario to complete the transfer of funds.

Payment is the tool that handles the flow of funds, and its purpose is to settle the creditor-debtor relationship. The payment system supports a variety of payment methods (such as bank card payment, balance payment, combined payment with coupons, and credit-based payment), connects to fund-processing capabilities such as the account, accounting, and billing systems, receives payment instructions, and drives the exchange of funds to completion. It ties the actual payment behavior (real funds) to internal bookkeeping (virtual funds) so that the two stay consistent.

The overall business architecture of third-party payment is shown in Figure 1. In terms of business division, the transaction core and the payment core belong to the "order receiving and payment domain". They provide common functions for ordinary transactions, such as collection, payment, refund, top-up, transfer, and withdrawal, as well as capabilities that support e-commerce business, such as combined-order payment, guaranteed transactions, and account splitting. Both the transaction core and the payment core have an exception check module that contains all business compensation processes, and that module is the main subject of this article.

Figure 1. Business architecture of third-party payment

2. What is order replenishment and why is it needed

When a transaction is interrupted during payment by an anomaly somewhere along the link, it is left in an intermediate state. This is commonly known as a "stuck order": the order is stuck and does not move forward. In another case, the payment core initiates a deduction through the channel, and after the channel receives the request the bank card is debited successfully. However, for various reasons the payment is not completed, so the money has left the user's bank card but the user has not received the corresponding rights and interests. This is commonly called a "dropped order".

Whether an order is stuck or dropped, it is in an intermediate state. Order replenishment compensates an order in an intermediate state until it reaches a final state (success or failure). Replenishment has two key requirements. One is effectiveness: in extreme cases, compensation may fail many times in a row. The other is timeliness: the longer a transaction stays suspended, the worse the user experience.

Order replenishment in the transaction core and in the payment core complement each other and are similar in design and implementation. We take order replenishment in the payment core as the example for introducing the replenishment mechanism for abnormal orders.

3. How order replenishment is implemented

This chapter first walks through the business process and explains the prerequisites for order replenishment. It then describes how the replenishment mechanism evolved, the problems in each version, and how the next version solved them.

3.1 Finite state machines and idempotence

A finite state machine that tracks fund operations

We take a balance withdrawal initiated by the user as an example to illustrate the business process. The simplified flow is shown in Figure 2.

Figure 2. Balance withdrawal process

First, the payment order is generated. Next, the account system is asked to deduct the balance from the user's account, and then a payout is initiated through the external channel. After the fund operations complete, the results are processed uniformly and the order information is updated. Finally, asynchronous notifications are sent upstream and downstream, in the form of messages and RPC callbacks.

We record the status of each key fund operation in the database, as shown in the table below. A reversal returns the money to where it came from, that is, it rolls the transaction back. In a withdrawal, money moves from the user's payment account to the user's bank card. Other scenarios, such as top-up (bank card -> user payment account), have different fund flows. The scenarios in the last two rows of the table have not reached a final state, and these are the ones we need to compensate.

| Scenario at different stages of balance withdrawal | Deduction status | Payout status | Reversal status | Overall status |
|---|---|---|---|---|
| Initial state | INIT | INIT | INIT | PROCESSING |
| Withdrawal succeeded | SUCCESS | SUCCESS | INIT | SUCCESS |
| Balance deduction succeeded but payout to the bank card failed, so a reversal is initiated to the account system to roll back the user's balance | SUCCESS | FAILED | SUCCESS | REVERSED |
| Deduction failed because the user's balance was insufficient | FAILED | INIT | INIT | FAILED |
| The call to the account system timed out or returned "processing" | PROCESSING | INIT | INIT | PROCESSING |
| The channel payout to the bank card timed out or returned "processing" | SUCCESS | PROCESSING | INIT | PROCESSING |

To keep things simple, we omit the reversal-related operations here. The state machine transitions of a balance withdrawal are shown in Figure 3.

Figure 3. Balance withdrawal process with state machine transition
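The status combinations in the table above can be read as a small function from the three recorded sub-statuses to the overall status. The following is a minimal sketch of that reading, not the team's actual implementation. The names `Status`, `overall_status`, and `is_intermediate` are assumptions made for illustration, and combinations that the table does not list are simply treated as still processing.

```python
from enum import Enum


class Status(Enum):
    INIT = "INIT"
    PROCESSING = "PROCESSING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    REVERSED = "REVERSED"


# Overall statuses that count as final in the table.
FINAL_STATES = {Status.SUCCESS, Status.FAILED, Status.REVERSED}


def overall_status(deduct: Status, payout: Status, reversal: Status) -> Status:
    """Derive the overall status from the three recorded sub-statuses."""
    if deduct is Status.FAILED:
        return Status.FAILED       # nothing was deducted, nothing to undo
    if deduct is Status.SUCCESS and payout is Status.SUCCESS:
        return Status.SUCCESS      # balance deducted and paid out to the card
    if payout is Status.FAILED and reversal is Status.SUCCESS:
        return Status.REVERSED     # payout failed, balance rolled back
    return Status.PROCESSING       # everything else is still in flight


def is_intermediate(deduct: Status, payout: Status, reversal: Status) -> bool:
    """True when the order has not reached a final state yet."""
    return overall_status(deduct, payout, reversal) not in FINAL_STATES
```

Orders for which `is_intermediate` stays true after the forward process has ended are the candidates for replenishment.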

Reentrancy and idempotency guarantees

Initiating a payment involves calls across multiple systems, and communication timeouts caused by the network are common. The upstream system may also re-initiate a request, which requires our system to keep the result of the operation idempotent. A small number of users may trigger concurrent requests by operating on several devices at once, which requires us to guarantee that the interface is reentrant. Beyond our own service, our downstream dependencies must guarantee the same properties for their interfaces.

Let’s talk about reentrancy and idempotence.

  • Reentrant: Correctness is guaranteed under concurrent requests.
  • Idempotent: Repeating the same input any number of times yields the same output. Strictly speaking, idempotency also covers reentrancy.

In business terms, idempotency applies to a payment that has reached a final state. For a request that did not obtain a final business result the first time, later calls may return a different result (processing -> success or failure). So how do we make the business process reentrant and idempotent? Let's break it down step by step:

  1. Generate the payment order: the payment order uses the external order number passed in by the business side, which the business side guarantees to be unique, as a unique index. If inserting into the database hits a unique-index conflict, we query the existing record and run an idempotency check on the request parameters. If they match the current request exactly, it is a repeated request and we continue the subsequent process with the payment order already in the DB. If they do not match, an error is returned.
  2. Fund processing: the account and channel systems each guarantee the idempotency of their own interfaces. We also record the status of each downstream operation and rely on the state machine to decide whether to proceed, so that we avoid generating duplicate transaction records downstream where possible. For example, if the payment order has completed all fund processing and the state machine is in a final state, the interface can return the corresponding result directly.
  3. Update the payment order: first take a row-level exclusive lock on the payment order, then update it, so that only one of multiple concurrent requests succeeds.
  4. Asynchronous notification: performed after the payment order has advanced to a final state.
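Step 1 above can be sketched with a unique index and a conflict check. The example below uses an in-memory SQLite database purely as a stand-in for the real DB. The table name `pay_order`, its columns, and the function `create_or_load` are hypothetical, and comparing the serialized parameters is only one possible form of the idempotency check.

```python
import json
import sqlite3
from typing import Tuple


class ParameterMismatch(Exception):
    """Same external order number, but different request parameters."""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pay_order ("
        " out_order_no TEXT NOT NULL UNIQUE,"  # unique index on the external order number
        " params TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def create_or_load(conn: sqlite3.Connection, out_order_no: str, params: dict) -> Tuple[str, bool]:
    """Return (status, created). An identical repeated request reuses the stored row."""
    encoded = json.dumps(params, sort_keys=True)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pay_order (out_order_no, params, status) VALUES (?, ?, ?)",
                (out_order_no, encoded, "PROCESSING"),
            )
        return "PROCESSING", True
    except sqlite3.IntegrityError:
        # Unique-index conflict: load the existing order and compare parameters.
        row = conn.execute(
            "SELECT params, status FROM pay_order WHERE out_order_no = ?",
            (out_order_no,),
        ).fetchone()
        if row[0] != encoded:
            raise ParameterMismatch(out_order_no)
        return row[1], False
```

A caller that gets `created == False` continues the process from the stored status instead of starting over, which is what lets a replenishment request safely re-enter the same code path.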

With reentrancy and idempotency guaranteed, we can reuse much of the forward process to implement the replenishment interface.

3.2 Initial version

Generally, the most common form of order replenishment is a scheduled task that periodically scans the table to complete business compensation. This is relatively simple to implement, but it is not timely enough, and the user experience is poor for transactions such as collection and transfer. We therefore use instant compensation through a message queue, as shown in Figure 4. The compensation consumer does not perform the compensation itself. It parses the message and then calls, via RPC, the compensation interface exposed by the payment core. Why not compensate directly in the consumer? Mainly to keep the logic in one place for easier maintenance.

Figure 4. Order replenishment process when abnormal balance withdrawal occurs

Of course, replenishment may still fail, in which case we can send the compensation message again. This loop cannot go on forever, so a maximum number of retries must be set. When compensation fails repeatedly, it is usually because of a problem in a downstream system, so we need to lower the compensation frequency: using delayed messages in RocketMQ, each compensation interval grows as the retry count increases.
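The retry rule described above (a retry cap plus a growing interval) can be sketched as follows. The schedule values, the message fields `order_no` and `retries`, and the callables `compensate` and `send_delayed` are all hypothetical. In particular, `send_delayed` stands in for whatever delayed-message call the queue client provides and is not a RocketMQ API.

```python
from typing import Callable, Optional

# Hypothetical schedule: seconds to wait before retry 1, 2, 3, ...
RETRY_DELAYS_SECONDS = [5, 30, 60, 300, 1800]
MAX_RETRIES = len(RETRY_DELAYS_SECONDS)


def next_delay(retries_done: int) -> Optional[int]:
    """Delay before the next attempt, or None once the retry cap is reached."""
    if retries_done >= MAX_RETRIES:
        return None
    return RETRY_DELAYS_SECONDS[retries_done]


def handle(message: dict,
           compensate: Callable[[str], bool],
           send_delayed: Callable[[dict, int], None]) -> str:
    """Run one compensation attempt and schedule the next one if it failed."""
    if compensate(message["order_no"]):
        return "DONE"
    delay = next_delay(message["retries"])
    if delay is None:
        return "GIVE_UP"  # retry cap reached; the order needs another path
    retry_message = {"order_no": message["order_no"], "retries": message["retries"] + 1}
    send_delayed(retry_message, delay)
    return "RETRY_SCHEDULED"
```

Carrying the retry count inside the message keeps the consumer stateless: each delivery knows on its own how long the next wait should be.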

This raises three obvious problems:

  1. If sending the exception message fails and upstream has no retry mechanism, the order may hang forever, as shown in Figure 5.
  2. The compensation consumer's replenishment request to the payment core fails. Either the request timed out although the compensation actually succeeded, or the request never got through at all, as shown in Figure 6.
  3. If compensation still has not succeeded after the maximum number of retries, what should be done with the order?

Figure 5. Compensation message sending failure

Figure 6. Compensation message consumption failure

3.3 Improved version

For problem 1, if the message still fails to send after retrying, we introduce an exception message table and persist the failed message to the database. The table records the order number, current retry count, exception category, record status, message body, and other fields. If step 4 in Figure 6 fails to send the message, the order is written to the exception table in the DB and a scheduled task processes it. Given the current availability of RocketMQ, such abnormal data is rare. See Figure 7.

Figure 7. Improved version 1 for message production/consumption exceptions
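The fallback for problem 1 can be sketched as "try the queue, and persist to an exception table if sending fails", with a scheduled task draining the table later. SQLite is again only a stand-in for the real DB. The table name `exception_message`, the status values, and the `send` callable are assumptions, while the columns follow the fields listed in the text.

```python
import json
import sqlite3
from typing import Callable


def open_exception_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE exception_message ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " order_no TEXT NOT NULL,"
        " retry_count INTEGER NOT NULL DEFAULT 0,"
        " category TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'PENDING',"
        " body TEXT NOT NULL)"
    )
    return conn


def send_or_persist(conn, send: Callable[[dict], None],
                    order_no: str, category: str, body: dict) -> bool:
    """Try the queue first; persist the message if sending fails."""
    try:
        send(body)
        return True
    except Exception:
        with conn:
            conn.execute(
                "INSERT INTO exception_message (order_no, category, body) VALUES (?, ?, ?)",
                (order_no, category, json.dumps(body)),
            )
        return False


def drain_exception_table(conn, send: Callable[[dict], None]) -> int:
    """Scheduled task: re-send pending rows and mark those that go through."""
    rows = conn.execute(
        "SELECT id, body FROM exception_message WHERE status = 'PENDING'"
    ).fetchall()
    resent = 0
    for row_id, body in rows:
        try:
            send(json.loads(body))
        except Exception:
            with conn:  # still failing: only bump the retry counter
                conn.execute(
                    "UPDATE exception_message SET retry_count = retry_count + 1 WHERE id = ?",
                    (row_id,),
                )
            continue
        with conn:
            conn.execute("UPDATE exception_message SET status = 'DONE' WHERE id = ?", (row_id,))
        resent += 1
    return resent
```

Because the row keeps the full message body, the scheduled task can replay the message without reconstructing it from the order.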

For problem 2, if the compensation consumer fails to invoke the payment core, its HandleMessage reports an error to the upper layer and relies on RocketMQ's stepped retry mechanism. When the number of consumption retries reaches a certain limit, the message enters a dead-letter queue, as shown in Figure 8. This situation is typically caused by a service or network problem, and after recovery the messages can be pulled from the dead-letter queue and processed together.

Figure 8. Improved version 2 for message production/consumption exceptions
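The consumer side of problem 2 comes down to one rule: let the failure propagate so that the broker redelivers, and let the broker park the message in a dead-letter queue once the retry limit is reached. The sketch below imitates that behaviour with a toy in-process loop. It is not RocketMQ code, and `MAX_CONSUME_RETRIES`, `handle_message`, and `deliver` are hypothetical names.

```python
from typing import Callable, List, Optional

MAX_CONSUME_RETRIES = 3  # hypothetical limit; a real broker has its own setting


def handle_message(message: dict, call_payment_core: Callable[[str], None]) -> None:
    """Consumer callback: parse the message and delegate to the payment core.

    Any exception is deliberately left to propagate so the message is redelivered.
    """
    call_payment_core(message["order_no"])


def deliver(message: dict,
            call_payment_core: Callable[[str], None],
            dead_letters: List[dict],
            max_retries: int = MAX_CONSUME_RETRIES) -> Optional[int]:
    """Toy stand-in for broker redelivery. Returns the attempt that succeeded."""
    for attempt in range(1, max_retries + 2):  # first delivery plus retries
        try:
            handle_message(message, call_payment_core)
            return attempt
        except Exception:
            continue  # a real broker would wait longer before each redelivery
    dead_letters.append(message)  # retries exhausted: park for later handling
    return None
```

Since the payment core's replenishment interface is idempotent, a redelivery after a timeout that actually succeeded is harmless.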

There are of course more extreme cases: what if requests to both MQ and DB fail? Given the current availability of MQ and DB, simultaneous failure of these basic components does not need to be designed for. Raising an alarm for manual intervention is enough.

For problem 3, if compensation still fails after the maximum number of retries, the cause is usually a problem in a downstream dependency. In this case we also put the order in the exception table.

For these two kinds of orders that slip through the net, we need operational tooling that supports compensating payment orders one at a time or in batches, for manual intervention. It is also best to have a fallback task that runs during off-peak business periods, scans the business orders, and compensates those that have stayed incomplete for a certain period of time.

In addition, the fallback task may cause a temporary backlog of messages that affects the online real-time compensation process. This can be isolated with an independent queue.
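The selection step of the scanning task described above can be sketched as a filter over order records. The record fields (`order_no`, `status`, `updated_at`), the function name `find_stale_orders`, and the `batch_size` cap are assumptions. The cap is one simple way to limit how many compensation messages a single run produces.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def find_stale_orders(orders: Iterable[dict], now: datetime,
                      min_age: timedelta, batch_size: int) -> List[str]:
    """Pick orders that have sat in PROCESSING for at least min_age, oldest first."""
    stale = [
        o for o in orders
        if o["status"] == "PROCESSING" and now - o["updated_at"] >= min_age
    ]
    stale.sort(key=lambda o: o["updated_at"])
    return [o["order_no"] for o in stale[:batch_size]]
```

In a real system the filter would be a DB query, and the selected order numbers would be published to the separate queue mentioned above.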

3.4 Final version

In fact, if only an asynchronous notification operation failed, there is no need to run the whole business process again every time. It is enough to redo what is missing. So we divide exceptions into types, separate some asynchronous operations from the business process, and handle them in a more fine-grained way:

  • If notifying downstream via MQ fails, simply resend the message.
  • If the RPC callback that notifies the transaction side fails, serialize the RPC request into the message body. Compensation then deserializes the RPC request from the message body and issues the RPC again directly.
  • If a DB update fails, serialize the update parameters into the message body and perform the update again during replenishment.
  • If an exception occurs during business processing, the full business compensation must be run again.

Figure 9 illustrates the message parameters for each compensation type using these exceptions as examples.

Figure 9. Classified compensation
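The four compensation types listed above map naturally onto a dispatch table keyed by exception type. The sketch below assumes a JSON message with `type` and `payload` fields. The type names and payload keys are invented for illustration and are not the actual message parameters shown in Figure 9.

```python
import json
from typing import Callable, Dict


def build_handlers(resend_mq: Callable[[str, dict], None],
                   call_rpc: Callable[[str, dict], None],
                   update_db: Callable[[str, dict], None],
                   compensate_order: Callable[[str], None]) -> Dict[str, Callable[[dict], None]]:
    """Map each (hypothetical) exception type to the narrowest action that repairs it."""
    return {
        # MQ notification failed: resend the stored message.
        "MQ_NOTIFY": lambda p: resend_mq(p["topic"], p["message"]),
        # RPC callback failed: replay the serialized request.
        "RPC_CALLBACK": lambda p: call_rpc(p["method"], p["request"]),
        # DB update failed: apply the stored update parameters again.
        "DB_UPDATE": lambda p: update_db(p["order_no"], p["fields"]),
        # Business processing failed: run the full compensation flow.
        "BUSINESS": lambda p: compensate_order(p["order_no"]),
    }


def dispatch(raw_message: str, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Decode a compensation message and run the handler for its type."""
    message = json.loads(raw_message)
    handler = handlers.get(message["type"])
    if handler is None:
        raise ValueError("unknown compensation type: " + str(message["type"]))
    handler(message["payload"])
    return message["type"]
```

Only the `BUSINESS` type re-enters the full payment flow. The other three replay exactly the operation that failed, which keeps compensation cheap.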

Our final order replenishment system is shown in Figure 10. It ensures timely, proactive compensation through instant messages. It ensures effective compensation through delayed-message retries, an exception table that persists failed messages, and a fallback task, which together form multiple layers of protection. It can be used not only to compensate payment orders but also as a general solution for any process whose reentrancy is guaranteed. However, it is not suitable for stateless or non-reentrant business.

Figure 10. Anomaly compensation system

4. Summary

This article first introduced what order replenishment is. Then, based on the implementation in a third-party payment system, it described in detail how the replenishment mechanism evolved into a relatively general exception handling pattern: a business-layer eventual consistency guarantee built on message queues, a finite state machine, and multiple fallback tasks. It is offered for your reference, and corrections are welcome.

5. About us

We are the financial payment technology team. We work deeply in the payment field, supporting the rapid growth of the company's business while building up our own payment capabilities. If you are passionate about technology and want to witness rapid business growth, you are welcome to join the financial payment technology team. We are currently hiring in Beijing, Shenzhen, and Hangzhou. Email address: [email protected], email title: name – years of work – finance – payment.
