When designing a trading system, stability, scalability, and maintainability are the key concerns. This article discusses how applying state machines in a trading system addresses them.

Hornet’s Nest air ticket order trading system

Trading systems are typically characterized by multiple order dimensions, many states, long transaction chains, and complex processes. Take air ticket transactions in the Hornet’s Nest (Mafengwo) transportation business as an example: besides the air ticket itself, an order submitted by a user may contain additional products such as insurance. Insurance in turn comes in many types, such as aviation accident insurance, delay insurance, and combined insurance.

From the user’s perspective, an order consists of the main product (the air ticket) and additional products, and it is paid for as a whole. Refunding the ticket or cancelling the insurance can be done on part of the order. From the supplier’s perspective, each product in an order has its own independent supplier, such as the air ticket supplier and the insurance supplier. Each supplier’s order needs to be issued separately and settled independently.

The user’s purchase and payment process and the supplier’s ticket-issuing process are interwoven in the air ticket trading system and form an inseparable whole.

Applying and optimizing state machines in the air ticket trading system

The concept of finite state machines

A finite state machine (hereinafter, state machine) is a tool for modeling the behavior of things or objects.

A state machine reduces complex logic to a finite number of stable states, models behaviors such as the transitions between those states and the actions that accompany them, and evaluates events in each stable state.

When an event is fed into the state machine, the machine uniquely determines a state transition from the current state and the triggering event.
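As a rough illustration of this idea, the following Java sketch keeps the transitions in a lookup table keyed by the current state and the incoming event. The state and event names (CREATED, PAID, PAY_SUCCESS, and so on) and the class SimpleOrderFsm are assumptions made for this example, not the actual definitions used in the air ticket system.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical states and events of a main order; the names are illustrative only.
enum OrderState { CREATED, PAID, COMPLETED, CLOSED }

enum OrderEvent { PAY_SUCCESS, TICKET_ISSUED, CLOSE }

final class SimpleOrderFsm {
    // (current state, event) -> next state. A missing entry means the event is not allowed.
    private static final Map<OrderState, Map<OrderEvent, OrderState>> TRANSITIONS =
            new EnumMap<>(OrderState.class);

    static {
        permit(OrderState.CREATED, OrderEvent.PAY_SUCCESS, OrderState.PAID);
        permit(OrderState.CREATED, OrderEvent.CLOSE, OrderState.CLOSED);
        permit(OrderState.PAID, OrderEvent.TICKET_ISSUED, OrderState.COMPLETED);
    }

    private static void permit(OrderState from, OrderEvent on, OrderState to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<>(OrderEvent.class)).put(on, to);
    }

    private OrderState current;

    SimpleOrderFsm(OrderState initial) {
        this.current = initial;
    }

    OrderState current() {
        return current;
    }

    // Looks up the transition for (current, event) and applies it, or rejects the event.
    void fire(OrderEvent event) {
        Map<OrderEvent, OrderState> allowed = TRANSITIONS.get(current);
        OrderState next = (allowed == null) ? null : allowed.get(event);
        if (next == null) {
            throw new IllegalStateException(
                    "Event " + event + " is not allowed in state " + current);
        }
        current = next;
    }
}
```

Firing PAY_SUCCESS on an order in CREATED moves it to PAID. Firing it a second time throws, because the table has no entry for (PAID, PAY_SUCCESS).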

Business systems essentially describe the real world, so almost every business system contains the outline of a state machine. The order transaction flow is a natural fit for the state machine model.

Take the user payment flow as an example. Without a state machine, a whole series of actions has to be performed when the payment callback arrives: query the payment serial number, record the payment time, change the main order status to paid, notify the supplier to issue tickets, record the notification time, change the air ticket sub-order status to ticket issuing, and so on. The logic is cumbersome and the code is tightly coupled.

The order status also has to move forward strictly according to the designed flow: for example, an order that has already been paid must not be paid again, and a closed order must not trigger a ticketing notification. We therefore optimized the air ticket trading system with a state machine. All states, events, and actions are extracted, and the complex state transition logic is managed in one place instead of in tedious if/else checks. This decouples the tangled logic of the air ticket trading system, is intuitive and easy to work with, and makes the system easier to maintain and manage.

State machine design

At the database design level, we treat the whole order as a main order and each supplier’s order as a sub-order. Suppose a user buys an air ticket and insurance together: because the air ticket and the insurance come from different suppliers, one main order corresponds to two sub-orders. The main order (order) records user-level information (UID, contact information, total order price, etc.), and each sub-order (sub_order) records the product type, supplier order number, settlement price, etc.
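The relationship between the two tables can be pictured with a small Java sketch. The class names, field names, and amounts below are hypothetical and cover only the columns mentioned above; the real schema is not shown in this article.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

enum ProductType { AIR_TICKET, INSURANCE }

// One row of sub_order: a single supplier's share of the purchase.
final class SubOrderSketch {
    final ProductType productType;
    final String supplierOrderNo;
    final BigDecimal settlementPrice;

    SubOrderSketch(ProductType productType, String supplierOrderNo, BigDecimal settlementPrice) {
        this.productType = productType;
        this.supplierOrderNo = supplierOrderNo;
        this.settlementPrice = settlementPrice;
    }
}

// One row of order: what the user sees and pays for as a whole.
final class MainOrderSketch {
    final long uid;
    final BigDecimal totalPrice;
    final List<SubOrderSketch> subOrders = new ArrayList<>();

    MainOrderSketch(long uid, BigDecimal totalPrice) {
        this.uid = uid;
        this.totalPrice = totalPrice;
    }

    // Amount owed to all suppliers, summed over the sub-orders.
    BigDecimal totalSettlement() {
        BigDecimal sum = BigDecimal.ZERO;
        for (SubOrderSketch sub : subOrders) {
            sum = sum.add(sub.settlementPrice);
        }
        return sum;
    }
}
```

An order containing an air ticket and one insurance product would hold two SubOrderSketch entries, each settled with its own supplier.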

At the same time, we split the forward flow (ticket issuing) and the reverse flows (refund and change) into different subsystems. Each subsystem is then fully independent, which helps maintenance and extension.

The forward air ticket subsystem has two state machines. The main order state machine manages the state of order, including order created, payment succeeded, transaction succeeded, and order closed. The sub-order state machine manages the state of sub_order and covers the flow from successful booking to ticket issued. The reverse refund and change subsystems have their own state machines in the same way.

Selection of the framework

The state machine engine frameworks commonly used in the industry include Spring Statemachine, Stateless4j, and Squirrel Foundation. After comparing them against our actual business needs, we decided to use Squirrel Foundation, mainly because:

  1. It has a moderate amount of code and is relatively easy to extend and maintain;

  2. Its state machines are lightweight, so creating instances is cheap;

  3. It offers rich extension points: it supports listening at nodes such as state entry, state completion, and exceptions, leaving enough hooks throughout the transition process;

  4. It supports defining state transitions with annotations, which is easy to use;

  5. By design, singleton reuse of state machine instances is not supported; instances must be created with new. The lifecycle of each state machine is therefore very clear, and the problems that singleton reuse would cause cannot arise.

Design and implementation of MSM

To fit our high-traffic business logic, we extracted and encapsulated the concept of an Action on top of Squirrel Foundation, combined state migration with asynchronous messaging, and packaged the result as the MSM framework (MFW State Machine). It provides the business order state definitions, event definitions, and state machine definitions, and describes state migrations with annotations.

We assume that every state migration is accompanied by an asynchronous message. Database operations that must succeed together therefore go into a single transaction, while operations that can be retried after a failure and have loose real-time requirements are handled when the asynchronous message is consumed.

Take a successful payment of an air ticket order as an example. Changing the order status to paid, updating the payment serial number, and similar updates run in one transaction; notifying the supplier to issue tickets is handled when the asynchronous message is consumed. Asynchronous messaging is implemented with RocketMQ, which supports two-phase commit, reliable message delivery, retries, and multiple consumer groups.

The details are as follows:

1. Extract an Action class for each action that a state migration requires; each one extends AbstractAction. Several different state migrations can execute the same Action. This relies on the implementation of public List matchConditions(): to reuse an Action, matchConditions only needs to return several initial-state/event pairs as matching conditions. Each Action has a corresponding context class, extending MFWContext, which carries data between methods such as process and saveDB.

2. Register all Actions, and add listeners for the completion and for the failure of each state transition.

3. Asynchronous messages rely on RocketMQ. A Spring bean that extends BaseMessageSender acts as the asynchronous message producer. To use two-phase commit, a class that extends BaseMsgTransactionListener is also required; see OrderChangeMessageSender and OrderChangeMsgTransactionListener in the air ticket system for reference.

4. Finally, implement an event trigger class. It contains an apply method that takes the PO object, the event, and the context. On each execution it instantiates a new state machine, initializes the current state, and calls the fire method.

5. A state machine object is instantiated, its current state is set to the state stored in the database, and the fire method is called, which ultimately executes the callMethod declared in the annotations of the OrderStateMachine class. We configure callMethod = “action”, so the action method of the current class is invoked by reflection (super.action(from, to, event, context)), and execution then reaches the action method of MFWStateMachine. The factory pattern is used here, and the process method is executed. On success, the TransitionCompleteListener registered when MFWStateMachine was initialized runs and calls the Action’s afterProcess method, which modifies the database records and sends the message. On failure, the TransitionExceptionListener runs and calls the Action’s onException method to handle the error.
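The control flow described in these steps can be sketched in plain Java without any framework. Everything below is a simplified assumption for illustration: SketchAction, SketchContext, and SketchTrigger are hypothetical stand-ins, not the real AbstractAction, MFWContext, or trigger class, and the listeners are reduced to ordinary method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context object shared by the steps of one transition.
final class SketchContext {
    final String orderId;
    final List<String> log = new ArrayList<>();

    SketchContext(String orderId) {
        this.orderId = orderId;
    }
}

// Hypothetical base class: one subclass per business action.
abstract class SketchAction {
    // Business logic that may fail; nothing has been persisted yet.
    abstract void process(SketchContext ctx);

    // Runs only after process() succeeded: persist the new state and send the message.
    abstract void afterProcess(SketchContext ctx);

    // Runs only when process() threw.
    void onException(SketchContext ctx, RuntimeException e) {
        ctx.log.add("failed: " + e.getMessage());
    }
}

final class SketchTrigger {
    // Returns true when the transition completed, false when it failed.
    static boolean apply(SketchAction action, SketchContext ctx) {
        try {
            action.process(ctx);
        } catch (RuntimeException e) {
            action.onException(ctx, e);   // plays the role of the exception listener
            return false;
        }
        action.afterProcess(ctx);          // plays the role of the completion listener
        return true;
    }
}

final class PaySuccessAction extends SketchAction {
    @Override
    void process(SketchContext ctx) {
        if (ctx.orderId.isEmpty()) {
            throw new IllegalArgumentException("missing order id");
        }
        ctx.log.add("validated payment for " + ctx.orderId);
    }

    @Override
    void afterProcess(SketchContext ctx) {
        ctx.log.add("saved PAID and queued ticketing message");
    }
}
```

Keeping process separate from afterProcess means a failure in the business logic never reaches the code that writes to the database or sends a message.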

To sum up, MSM dynamically generates the Squirrel Foundation state machine definition from the declaration and configuration of the Action classes, so users do not have to define it again, which makes MSM more convenient to use.

Pitfalls we ran into

1. Transactions do not take effect

We initially used Spring annotations for transaction management, placing @Transactional on the database operation methods of the Action classes, but found that it had no effect in practice. On investigation, the reason is that Spring’s transaction annotations are implemented with AOP proxies. When a method of an object calls another method of the same object that carries an AOP annotation, the annotation on the called method is ignored, because calls made inside the same class do not go through the proxy. We later solved this by opening transactions manually.
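The effect can be reproduced without Spring. The sketch below uses a plain JDK dynamic proxy as a stand-in for a transactional proxy; the interface, the class names, and the recording handler are all assumptions made for this example. The handler records every call it intercepts, which is the point where a real proxy would open a transaction.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

interface OrderService {
    void handlePayment();

    void saveDb();
}

final class OrderServiceImpl implements OrderService {
    @Override
    public void handlePayment() {
        saveDb();   // internal call on "this": the proxy never sees it
    }

    @Override
    public void saveDb() {
        // database writes would go here
    }
}

final class ProxyDemo {
    // Wraps the target in a proxy that records the name of every intercepted method.
    static OrderService wrap(OrderService target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName());
            return method.invoke(target, args);
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                handler);
    }
}
```

Calling handlePayment through the proxy records only handlePayment: the nested saveDb call runs directly on the target object, so any behavior attached to saveDb by the proxy is skipped. Calling saveDb through the proxy is intercepted as expected.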

2. Matching Actions

Initially we matched Actions in two ways: precise and imprecise. Precise matching means an Action is matched only when both the initial state of the state migration and the triggered event match. Imprecise matching means an Action is matched as long as the triggered event is the same. We later found that imprecise matching caused problems in some cases, so we changed to precise matching on multiple conditions. When the state machine is triggered, the Action to execute is matched exactly on its declared conditions, and several state migrations can be mapped to the same Action, so Action code can be reused without problems.
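A minimal sketch of precise matching on multiple conditions follows, assuming a registry keyed by the pair (initial state, event). MatchKey, ActionRegistry, and the state and event names are hypothetical; the real matchConditions implementation is not shown in this article.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical matching key: an action is selected only when BOTH parts match.
final class MatchKey {
    final String fromState;
    final String event;

    MatchKey(String fromState, String event) {
        this.fromState = fromState;
        this.event = event;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MatchKey)) {
            return false;
        }
        MatchKey other = (MatchKey) o;
        return fromState.equals(other.fromState) && event.equals(other.event);
    }

    @Override
    public int hashCode() {
        return 31 * fromState.hashCode() + event.hashCode();
    }
}

final class ActionRegistry {
    private final Map<MatchKey, String> actionsByKey = new HashMap<>();

    // One action may declare several (state, event) conditions, which allows reuse.
    void register(String actionName, List<MatchKey> conditions) {
        for (MatchKey key : conditions) {
            String previous = actionsByKey.putIfAbsent(key, actionName);
            if (previous != null) {
                throw new IllegalStateException(
                        "Ambiguous match: " + previous + " vs " + actionName);
            }
        }
    }

    // Returns null when no action is declared for this exact (state, event) pair.
    String match(String fromState, String event) {
        return actionsByKey.get(new MatchKey(fromState, event));
    }
}
```

Registering one close action for both (WAIT_PAY, USER_CANCEL) and (WAIT_PAY, PAY_TIMEOUT) reuses the same code for two migrations, while the same event arriving in any other state matches nothing.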

3. Asynchronous message consistency

Some situations must never occur: a message is sent although the database update failed; the database update succeeded but the message was not sent; or the message is sent before the database transaction commits. To address this, we use RocketMQ’s transactional message feature, which supports two-phase commit: a prepared (half) message is sent first, the local transaction is then executed in a callback, and the message is finally committed or rolled back. This helps keep the database modification and the asynchronous message consistent.
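The ordering of the two phases can be illustrated with an in-memory simulation. This is not the RocketMQ API: FakeBroker and FakeOrderTable are hypothetical stand-ins, and the sketch leaves out the broker’s follow-up check of the local transaction state when the commit or rollback result is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// In-memory stand-in for a broker that supports transactional (half) messages.
final class FakeBroker {
    private final List<String> visible = new ArrayList<>();

    // Phase 1: the half message is held back from consumers.
    // Phase 2: it becomes visible only if the local transaction commits.
    boolean sendInTransaction(String message, BooleanSupplier localTransaction) {
        String half = message;                 // prepared, not yet visible
        boolean committed;
        try {
            committed = localTransaction.getAsBoolean();
        } catch (RuntimeException e) {
            committed = false;                 // treat an exception as a rollback
        }
        if (committed) {
            visible.add(half);                 // commit: consumers can now see it
        }
        return committed;                      // on rollback the half message is dropped
    }

    List<String> visibleMessages() {
        return visible;
    }
}

// In-memory stand-in for the order row.
final class FakeOrderTable {
    String status = "WAIT_PAY";

    // Local transaction: returns true only if the row was really updated.
    boolean markPaid() {
        if (!"WAIT_PAY".equals(status)) {
            return false;
        }
        status = "PAID";
        return true;
    }
}
```

Because the message only becomes visible after the local transaction reports success, a failed database update never produces a ticketing notification in this model.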

4. Concurrent execution of different events on the same order

In some cases, different events may be triggered for the same order at almost the same moment (within milliseconds). For example, while a main air ticket order is waiting for payment, it can receive the callback from the payment center, which triggers the payment-succeeded event. The close event can also be triggered, either by the user clicking to cancel the order or by a scheduled task when payment times out. Without any control, an order could end up both paid and cancelled.

We use optimistic locking to avoid this problem. When the transaction modifies the database, the UPDATE statement on the order carries the original state as a condition, and we decide whether to throw an exception by checking whether the number of updated rows is 1: UPDATE order SET ... WHERE order_id = '1234' AND order_status = 'to be paid'

In this way, if two events are triggered and executed simultaneously, the one that commits its transaction first succeeds. The event whose transaction commits later fails because the number of updated rows is 0, and its transaction is rolled back as if nothing had happened.
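The conditional update can be mimicked in memory with ConcurrentMap.replace(key, expected, next), which changes the value only if it still equals the expected one. The class OrderStatusStore and the status names are assumptions made for this sketch; in the real system the check is the row count returned by the UPDATE statement.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the order table. replace(key, expected, next) plays the role of
//   UPDATE ... SET order_status = next WHERE order_id = key AND order_status = expected
final class OrderStatusStore {
    private final ConcurrentMap<String, String> statusById = new ConcurrentHashMap<>();

    void insert(String orderId, String status) {
        statusById.put(orderId, status);
    }

    String statusOf(String orderId) {
        return statusById.get(orderId);
    }

    // Throws if the conditional update did not change the row.
    void transition(String orderId, String expected, String next) {
        boolean updated = statusById.replace(orderId, expected, next);
        if (!updated) {
            throw new IllegalStateException(
                    "Order " + orderId + " is no longer in state " + expected);
        }
    }
}
```

If a payment callback and a timeout job both try to leave WAIT_PAY, only the first transition succeeds; the second throws, and the caller can roll back its transaction.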

Pessimistic locks can also solve this problem: whoever acquires the lock first executes successfully. However, since scripts may batch-modify the database, pessimistic locks carry a potential risk of deadlock, so we ultimately adopted optimistic locking.

Conclusion

MSM builds its state machine definition and declaration on top of Squirrel Foundation. It extracts the concept of an Action and configures, on each Action class, the initial state, target state, triggering event, and context definition. From these declarations MSM dynamically generates the Squirrel Foundation state machine definition, so users do not need to define it again, which makes the system easier to operate and maintain.

Using a state machine brought the process complexity of the air ticket order trading system under control and improved the system’s stability, scalability, and maintainability.

State machines are not limited to order trading systems; they also apply to other systems with complex state-change processes. We hope this article is useful to readers who need to understand and use state machines.

Author: Dong Tian, R&D engineer for the air ticket trading system, Hornet’s Nest Transportation R&D team.
