The author of this article is Li Zhao, who runs the public account “Coffee Latte” and is a distributed-transaction contributor in the Seata community.

1. About Seata

Recently, I wrote an analysis of the distributed transaction middleware Fescar, and within days the Fescar team upgraded its brand to Seata (Simple Extensible Autonomous Transaction Architecture); the former name, Fescar, stood for Fast & EaSy Commit And Rollback. As you can see, the name Fescar was limited to commit and rollback, while the new brand, Seata, aims to build a one-stop distributed transaction solution. After the name change, I feel even more confident about its future development.

Here’s a quick overview of Seata’s entire process model:

  • TM: the initiator of the transaction, which tells the TC that a global transaction has started, committed, or rolled back.
  • RM: a concrete transaction resource. Each RM registers with the TC as a branch transaction.
  • TC: the transaction coordinator, also known as Fescar-server, which receives the registration, commit, and rollback of our transactions.

I gave an overview of all the roles in the previous article; in this one I will focus on the core role, the TC, that is, the transaction coordinator.

2. Transaction Coordinator

Why is the TC always emphasized as the core? Because the TC acts like a god, controlling the crowd of RMs and TMs. If the TC does not work well, then the RMs and TMs fall into chaos as soon as a small problem appears. So to understand Seata, you have to understand its TC.

So what are the qualities of a good transaction coordinator? I think there should be the following:

  • Correct coordination: it can correctly tell RMs and TMs what to do next, both when things go wrong and when things go right.
  • High availability: the transaction coordinator is crucial in distributed transactions; if high availability cannot be guaranteed, there is no point in having it.
  • High performance: the coordinator's performance must be high; if it becomes a bottleneck, the RMs and TMs it manages will hit frequent timeouts, leading to frequent rollbacks.
  • High extensibility: this is at the code level; a good framework leaves plenty of custom extension points for users, such as service registration/discovery, configuration reading, and so on.

I will explain, step by step, how Seata achieves these four points.

2.1 Design of Seata-Server

The overall module diagram of Seata-Server is shown above:

  • Coordinator Core: the bottom-level module containing the core code of the transaction coordinator, which handles coordination logic such as whether to commit or roll back.
  • Store: A storage module that keeps our data persistent and prevents data loss during a restart or downtime.
  • Discover: Service registration/discovery module used to expose the Server address to the Client.
  • Config: stores and searches server configurations.
  • Lock: The Lock module, which provides Seata with global locking capabilities.
  • Rpc: Used for communication with other ends.
  • Ha-cluster: the high-availability cluster, currently not open-sourced, which provides reliable high availability for Seata.

2.2 Discover

Let's start with the more basic Discover module, that is, the service registration/discovery module. After seata-server starts, its address needs to be exposed to other parties, which is what this module is for.

This module has a core interface, RegistryService, as shown in the image above (a rough sketch follows the list below):

  • Register: used on the server side to register the service.
  • Unregister: used on the server side, usually called in a JVM shutdown hook.
  • Subscribe: used on the client side to register a listener for address changes.
  • Unsubscribe: used on the client side to cancel such a listener.
  • Lookup: used on the client side to look up the list of service addresses by key.
  • Close: used to release the registry's resources.
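For orientation, here is a minimal sketch of what such an interface could look like. The method names follow the list above; the interface name, generic type, and exact signatures are assumptions for illustration, not the actual Seata source.

```java
// Method names follow the list above; generic type and signatures are assumptions.
import java.net.InetSocketAddress;
import java.util.List;

public interface RegistryServiceSketch<T> {
    void register(InetSocketAddress address) throws Exception;      // server side
    void unregister(InetSocketAddress address) throws Exception;    // server side, in a shutdown hook
    void subscribe(String cluster, T listener) throws Exception;    // client side: watch address changes
    void unsubscribe(String cluster, T listener) throws Exception;  // client side: stop watching
    List<InetSocketAddress> lookup(String key) throws Exception;    // client side: addresses by key
    void close() throws Exception;                                  // release registry resources
}
```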

If you need to add your own service registry/discovery, just implement this interface. So far four registries/discovery services are supported, namely Redis, ZK, Nacos, and Eureka, driven by the continuous development of the community. The Nacos implementation is briefly introduced below.

2.2.1 The register interface

Step1: verify that the address is valid;

Step2: get the Nacos naming instance and register the address under the current cluster name.
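A hedged sketch of these two steps, using the Nacos client API (NamingFactory/NamingService); the class name, SERVICE_NAME, getNacosAddr(), and getClusterName() are placeholders rather than the real Seata code:

```java
// Hedged sketch of the register flow described above; placeholders are marked.
import java.net.InetSocketAddress;

import com.alibaba.nacos.api.naming.NamingFactory;
import com.alibaba.nacos.api.naming.NamingService;

public class NacosRegistrySketch {
    static final String SERVICE_NAME = "seata-server";          // placeholder service name
    private volatile NamingService naming;

    public void register(InetSocketAddress address) throws Exception {
        // Step 1: reject illegal host/port combinations
        if (address == null || address.getPort() <= 0) {
            throw new IllegalArgumentException("invalid address: " + address);
        }
        // Step 2: obtain the naming instance and register under the current cluster name
        naming = NamingFactory.createNamingService(getNacosAddr());
        naming.registerInstance(SERVICE_NAME,
                address.getAddress().getHostAddress(),
                address.getPort(),
                getClusterName());
    }

    String getNacosAddr()   { return "127.0.0.1:8848"; }        // placeholder
    String getClusterName() { return "default"; }               // placeholder
}
```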

The unregister interface is similar and will not be explained in detail here.

2.2.2 The lookup interface

Step1: obtain the current cluster name;

Step2: check whether the addresses for this cluster have already been fetched; if so, take them from the Map.

Step3: get the address data from Nacos and convert it into what we need;

Step4: register our change-event listener with Nacos.
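Continuing the hypothetical NacosRegistrySketch above, the lookup flow might look roughly like this; clusterAddressMap is a simplified per-cluster cache, a change event simply invalidates it, and imports for List, ArrayList, Collections, ConcurrentHashMap, and the Nacos Instance type are assumed:

```java
// Added to the hypothetical NacosRegistrySketch above; cache and refresh logic simplified.
private final ConcurrentHashMap<String, List<InetSocketAddress>> clusterAddressMap =
        new ConcurrentHashMap<>();

public List<InetSocketAddress> lookup(String key) throws Exception {
    String clusterName = getClusterName();                                  // Step 1
    if (!clusterAddressMap.containsKey(clusterName)) {                      // Step 2: not cached yet
        List<String> clusters = Collections.singletonList(clusterName);
        // Step 3: pull the current instances from Nacos and convert them to addresses
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (Instance instance : naming.getAllInstances(SERVICE_NAME, clusters)) {
            addresses.add(new InetSocketAddress(instance.getIp(), instance.getPort()));
        }
        clusterAddressMap.put(clusterName, addresses);
        // Step 4: subscribe so a change event invalidates the cached list
        naming.subscribe(SERVICE_NAME, clusters, event -> clusterAddressMap.remove(clusterName));
    }
    return clusterAddressMap.get(clusterName);
}
```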

2.2.3 The subscribe interface

This interface is relatively simple and consists of two steps:

Step1: add the cluster and the listener to the Map.

Step2: register with Nacos.
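As a hedged fragment of the same hypothetical class, the two steps could look like this; listenerMap is a placeholder cache used later by unsubscribe, EventListener is the Nacos listener type, and the usual imports are assumed:

```java
// Part of the hypothetical NacosRegistrySketch; listenerMap is a local placeholder cache.
private final ConcurrentHashMap<String, List<EventListener>> listenerMap = new ConcurrentHashMap<>();

public void subscribe(String cluster, EventListener listener) throws Exception {
    // Step 1: remember the cluster -> listener mapping locally
    listenerMap.computeIfAbsent(cluster, k -> new ArrayList<>()).add(listener);
    // Step 2: register the listener with Nacos
    naming.subscribe(SERVICE_NAME, Collections.singletonList(cluster), listener);
}
```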

2.3 Config

The configuration module is another basic and relatively simple module. It covers parameters such as the number of select threads, the number of worker threads, and the maximum number of sessions; Seata has its own default settings for these.

Seata also provides an interface called Configuration, which defines where the configuration is read from (a rough sketch follows the list below):

  • getInt/getLong/getBoolean/getConfig(): obtain the corresponding value by dataId.
  • putConfig: add a configuration item.
  • removeConfig: delete a configuration item.
  • add/remove/getConfigListener: add, remove, or obtain a configuration listener, which is used to listen for configuration changes.
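A rough sketch of what such a configuration interface might look like, based purely on the method list above; the interface name, parameter names, and the listener type are assumptions:

```java
// Sketch of a configuration-access interface; names are illustrative only.
import java.util.List;

public interface ConfigurationSketch {

    int getInt(String dataId, int defaultValue);
    long getLong(String dataId, long defaultValue);
    boolean getBoolean(String dataId, boolean defaultValue);
    String getConfig(String dataId, String defaultValue);       // raw value by dataId

    boolean putConfig(String dataId, String content);           // add a configuration item
    boolean removeConfig(String dataId);                        // delete a configuration item

    void addConfigListener(String dataId, ConfigChangeListener listener);
    void removeConfigListener(String dataId, ConfigChangeListener listener);
    List<ConfigChangeListener> getConfigListeners(String dataId);

    // Placeholder listener type for configuration changes.
    interface ConfigChangeListener {
        void onChange(String dataId, String newValue);
    }
}
```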

So far there are four sources of configuration: File, Nacos, Apollo, and ZK. In Seata you first configure registry.conf to choose the configuration type. Implementing a configuration source is relatively simple, so I won't go into detail here.

2.4 Store

The implementation of the storage tier is critical to Seata’s high performance and reliability.

If the storage layer is not implemented well, the data of distributed transactions that are in flight in the TC will be lost when an outage occurs; since we are using distributed transactions, such loss cannot be tolerated. If the storage layer is reliable but its performance is poor, RMs may roll back frequently and the system will be completely unable to cope with high-concurrency scenarios.

Seata provides file storage by default. In the following, the stored data is referred to as a Session: the global transaction created by the TM is called a GlobalSession, and a branch transaction created by an RM is called a BranchSession. A GlobalSession can have multiple BranchSessions. Our goal is to store all of these sessions.

The code is in FileTransactionStoreManager#writeSession:

The above code is divided into the following steps:

Step1: Generate a TransactionWriteFuture.

Step2: put the future request into a LinkedBlockingQueue. Why throw all the data into a queue? This could also be done with locks, which is what RocketMQ, another Alibaba open-source project, uses. Whether it is a queue or a lock, the purpose is to ensure single-threaded writing. Why? Some people would say that single-threaded writing keeps writes sequential and therefore fast, but that is a misconception: FileChannel is thread-safe and writes sequentially anyway. The real reason is to make the whole write logic single-threaded, because it also involves things like handling a full file and recording the write position. That bookkeeping could be protected with explicit locks, but putting the entire write logic on a single thread is the simplest and most convenient approach.

Step3: call future.get() and wait for notification that the write has completed.
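A minimal sketch of this producer side, under the assumption that the write request carries its payload, creation time, and a completable result (the real TransactionWriteFuture is more involved):

```java
// Producer side: wrap the session in a future, hand it to a single consumer
// thread via a blocking queue, then wait for the write result.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WriteQueueSketch {

    static class TransactionWriteFuture {
        final byte[] payload;                                    // serialized session
        final CompletableFuture<Boolean> result = new CompletableFuture<>();
        final long createTime = System.currentTimeMillis();
        TransactionWriteFuture(byte[] payload) { this.payload = payload; }
    }

    private final LinkedBlockingQueue<TransactionWriteFuture> writeQueue =
            new LinkedBlockingQueue<>();

    public boolean writeSession(byte[] serializedSession) throws Exception {
        // Step 1: build the write request
        TransactionWriteFuture future = new TransactionWriteFuture(serializedSession);
        // Step 2: enqueue it; a single consumer thread drains the queue, so all
        // file-position bookkeeping stays single-threaded without explicit locks
        writeQueue.put(future);
        // Step 3: block until the consumer reports success or failure
        return future.result.get(5, TimeUnit.SECONDS);
    }
}
```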

After we have submitted the data to the queue, we need to consume it with the following code:

Here we submit a WriteDataFileRunnable() to the thread pool. The Runnable’s run() method looks like this:

Divided into the following steps:

Step1: check the stopping flag; if stopping is true, return null.

Step2: get data from the queue.

Step3: determine whether the future has timed out. If it has, set the result to false; at this point the producer's get() call stops blocking.

Step4: write the data to the file. At this point the data is still in the page cache and has not been flushed to disk. If the write succeeds, decide whether to flush according to the flush policy.

Step5: when the number of writes or the elapsed time reaches a threshold, save the current file as a history file, delete the earlier history files, and create a new file. This step prevents the file from growing without bound and wasting disk space with large amounts of stale data.
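A hedged sketch of the consumer side that drains the queue from the producer sketch above; writeDataFile(), flushIfNecessary(), rollFileIfNecessary(), and the timeout value are placeholders for the real file logic:

```java
// Consumer side: a single thread drains the queue, skips requests whose caller
// has already timed out, writes to the file, and flushes/rolls by policy.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class WriteDataFileRunnableSketch implements Runnable {
    private final LinkedBlockingQueue<WriteQueueSketch.TransactionWriteFuture> writeQueue;
    private volatile boolean stopping = false;
    private static final long MAX_WAIT_MS = 5_000;                      // assumption: caller timeout

    WriteDataFileRunnableSketch(
            LinkedBlockingQueue<WriteQueueSketch.TransactionWriteFuture> writeQueue) {
        this.writeQueue = writeQueue;
    }

    @Override
    public void run() {
        while (!stopping) {                                              // Step 1
            try {
                WriteQueueSketch.TransactionWriteFuture future =
                        writeQueue.poll(1, TimeUnit.SECONDS);            // Step 2
                if (future == null) {
                    continue;
                }
                if (System.currentTimeMillis() - future.createTime > MAX_WAIT_MS) {
                    future.result.complete(false);                       // Step 3: caller already timed out
                    continue;
                }
                boolean ok = writeDataFile(future.payload);              // Step 4: lands in the page cache
                future.result.complete(ok);
                if (ok) {
                    flushIfNecessary();                                  // flush by the configured policy
                }
                rollFileIfNecessary();                                   // Step 5: archive and start a new file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private boolean writeDataFile(byte[] data) { return true; }          // placeholder
    private void flushIfNecessary() { }                                  // placeholder
    private void rollFileIfNecessary() { }                               // placeholder
}
```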

In writeDataFile there is the following code:

Step1: first get a ByteBuffer. If the required size exceeds the maximum size of the reusable buffer, allocate a new buffer; otherwise reuse the cached one. This step noticeably reduces GC pressure.

Step2: Then add the data into the ByteBuffer.

Step3: finally write the ByteBuffer to the fileChannel, retrying up to three times. At this point the data is still only in the page cache, which is a concern for two reasons: the OS has its own flush policy, which the application cannot control, so to prevent a crash or power loss from losing large amounts of data, the application has to control the flush itself.
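At its core such a flush is a FileChannel force; here is a minimal sketch of the idea (an illustration, not the actual Seata code):

```java
// Force the page-cache contents of the file to disk; passing false skips
// forcing file metadata, which is slightly cheaper than force(true).
import java.io.IOException;
import java.nio.channels.FileChannel;

final class FlushSketch {
    static void flush(FileChannel fileChannel) throws IOException {
        fileChannel.force(false);
    }
}
```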

If the machine loses power, data that is still in the page cache and has not been flushed can still be lost, so a small amount of data loss is possible. A synchronous mode, in which every record is flushed before returning, is not supported yet; it would guarantee that every message reaches disk, but performance would suffer significantly. Support for it is planned.

The core flow of the Store module consists mainly of the methods above. There are other parts, such as session reconstruction, that are relatively simple; readers can explore them on their own.

2.5 Lock

As we all know, database isolation levels are implemented mainly through locks; similarly, Seata, as a distributed transaction framework, also needs locks to implement isolation. Databases offer four isolation levels: read uncommitted, read committed, repeatable read, and serializable. In Seata, writes are guaranteed to be mutually exclusive, while reads are at read uncommitted by default, with a mechanism provided to achieve read committed.

The Lock module is the core module with which Seata implements isolation levels. It provides an interface for managing locks:

It has three methods (a rough sketch follows the list):

  • acquireLock: acquires the locks for a BranchSession.
  • isLockable: queries, based on the transaction ID, resource ID, and lock key, whether the corresponding rows are already locked.
  • CleanAllLocks: Removes all locks.
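A rough sketch of a lock-manager interface matching these three methods; the interface name is illustrative and BranchSession is reduced to an opaque placeholder type here:

```java
// Sketch of a lock-manager interface; signatures are assumptions.
public interface LockManagerSketch {

    // Acquire the global locks held by a branch transaction; false means a conflict.
    boolean acquireLock(Object branchSession) throws Exception;

    // Query whether the given rows (lockKey) are locked for this transaction/resource.
    boolean isLockable(long transactionId, String resourceId, String lockKey) throws Exception;

    // Drop every lock, e.g. for maintenance.
    void cleanAllLocks() throws Exception;
}
```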

We can implement locks locally, or use Redis or MySQL to implement them. The local global lock implementation is provided by default:



There are two constants to look at in the implementation of local locks:

  • BUCKET_PER_TABLE: Defines the number of buckets per table to reduce contention for subsequent locking of the same table.
  • LOCK_MAP: its declared type is deeply nested, with several layers of maps, as laid out below.

    Layer  Map            Key                                  Value
    1      LOCK_MAP       resourceId (jdbcUrl)                 dbLockMap
    2      dbLockMap      tableName                            tableLockMap
    3      tableLockMap   pk.hashCode() % BUCKET_PER_TABLE     bucketLockMap
    4      bucketLockMap  pk (primary key)                     transactionId

As you can see, the actual locks live in the bucketLockMap. The locking method itself is relatively simple: it mainly walks down the maps level by level to find the bucketLockMap and inserts the current transactionId. If the primary key already holds a transactionId, it is compared with the current one: if it is the same transaction, locking succeeds; otherwise locking fails.
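A hedged sketch of this nested structure and the level-by-level lookup described above; the class name, bucket count, and parameter names are assumptions:

```java
// Four-level map: resourceId -> tableName -> (pk.hashCode() % buckets) -> pk -> transactionId.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalLockSketch {
    private static final int BUCKET_PER_TABLE = 128;  // assumption: bucket count per table

    private static final ConcurrentHashMap<String /* resourceId (jdbc url) */,
            ConcurrentHashMap<String /* table name */,
                    ConcurrentHashMap<Integer /* bucket */,
                            Map<String /* pk */, Long /* transactionId */>>>> LOCK_MAP
            = new ConcurrentHashMap<>();

    // Locking succeeds if the pk slot is free or already held by the same transaction.
    public boolean acquireLock(String resourceId, String table, String pk, long transactionId) {
        Map<String, Long> bucketLockMap = LOCK_MAP
                .computeIfAbsent(resourceId, k -> new ConcurrentHashMap<>())
                .computeIfAbsent(table, k -> new ConcurrentHashMap<>())
                .computeIfAbsent(Math.abs(pk.hashCode()) % BUCKET_PER_TABLE,
                        k -> new ConcurrentHashMap<>());
        Long holder = bucketLockMap.putIfAbsent(pk, transactionId);
        return holder == null || holder == transactionId;
    }
}
```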

2.6 RPC

One of the keys to Seata’s high performance is the use of Netty as the RPC framework. The thread model with the default configuration is shown below:

With the default base configuration, there is an Acceptor thread that handles client connections and two NIO threads. These threads do not do heavy work; they only perform fast operations such as codec, heartbeat events, and TM registration. Time-consuming business operations are dispatched to the business thread pool, which by default has a minimum of 100 and a maximum of 500 threads.

The Seata heartbeat mechanism is implemented with Netty's IdleStateHandler, as follows:

On the server side, no maximum idle time is set for writes, only for reads, with a default of 15 seconds. If the read idle time exceeds 15s, the connection is closed and its resources are released.

Step1: on an idle-state event, check whether it is a read-idle event;

Step2: if yes, disconnect the link and close the resource.
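A minimal sketch of this read-idle check using Netty's IdleStateHandler in a channel initializer; the 15-second value comes from the text, while the class name and handler wiring are illustrative rather than the actual Seata pipeline:

```java
// Only the reader idle timeout is set; a read-idle event closes the channel.
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class ServerIdleCheckSketch extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // Reader idle timeout of 15s; write/all idle checks are disabled (0).
        ch.pipeline().addLast(new IdleStateHandler(15, 0, 0, TimeUnit.SECONDS));
        ch.pipeline().addLast(new ChannelDuplexHandler() {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                // Step 1: check whether the event is a read-idle event
                if (evt instanceof IdleStateEvent
                        && ((IdleStateEvent) evt).state() == IdleState.READER_IDLE) {
                    ctx.close();  // Step 2: break the connection and release resources
                } else {
                    super.userEventTriggered(ctx, evt);
                }
            }
        });
    }
}
```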

2.7 HA-Cluster

The HA-Cluster has not been officially announced yet, but based on other middleware and some official hints, it could be designed as follows:

The specific process is as follows:

Step1: when publishing information, the client routes the same transaction to the same master according to the transactionId, and concurrent processing capacity is provided by scaling out multiple masters horizontally.

Step2: on the server side, one master has multiple slaves. The data in the master is synchronized to the slaves in near real time, so that when the master goes down, a slave can take over.

All of this is speculation, of course; the concrete design and implementation will have to wait for version 0.5. There is currently a Go version of Seata-Server, also being donated to Seata (still in progress), that implements replica consistency through Raft; other details are not yet clear.

2.8 Metrics & Tracing

This module has not been published either, although it will probably provide plugin points so that third-party metrics systems can hook in. In addition, Apache SkyWalking is currently discussing with the Seata team how to integrate.

3. Coordinator Core

We have covered the basic Server modules, so you should have a good picture of how Seata is implemented. Next, I will explain how the transaction coordinator logic is implemented, to give you an even better understanding of Seata.

3.1 Startup Process

The Server class has a main method that defines our startup process:

Step1: create an RpcServer, which encapsulates our network operations and is implemented with a Netty server.

Step2: parse the port number and file address.

Step3: initialize the SessionHolder. The most important part is recovering the data in the dataDir folder and rebuilding our sessions.

Step4: create a Coordinator, which contains the core logic of our transaction coordinator, and initialize it. The internal initialization logic creates four scheduled tasks (a sketch of these timers follows the list):

  • RetryRollbacking: a rollback-retry task that runs every 5 ms to retry failed rollbacks.
  • RetryCommitting: a commit-retry task that runs every 5 ms to retry failed commits.
  • AsyncCommitting: an asynchronous-commit task that runs every 10 ms to perform asynchronous commits.
  • TimeoutCheck: a timeout-detection task that runs every 2 ms to execute the timeout logic.
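A hedged sketch of these four timers using a plain scheduler; the class and handler names are placeholders, and the intervals mirror the list above (the real period and task names may differ):

```java
// Placeholder timer wiring; intervals follow the list above.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CoordinatorTimersSketch {
    private final ScheduledExecutorService timers = Executors.newScheduledThreadPool(4);

    public void init() {
        timers.scheduleAtFixedRate(this::handleRetryRollbacking, 0, 5, TimeUnit.MILLISECONDS);
        timers.scheduleAtFixedRate(this::handleRetryCommitting, 0, 5, TimeUnit.MILLISECONDS);
        timers.scheduleAtFixedRate(this::handleAsyncCommitting, 0, 10, TimeUnit.MILLISECONDS);
        timers.scheduleAtFixedRate(this::timeoutCheck, 0, 2, TimeUnit.MILLISECONDS);
    }

    private void handleRetryRollbacking() { /* retry failed rollbacks */ }
    private void handleRetryCommitting()  { /* retry failed commits */ }
    private void handleAsyncCommitting()  { /* drive asynchronous commits */ }
    private void timeoutCheck()           { /* mark timed-out global transactions */ }
}
```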

Step5: initialize the UUIDGenerator, which is the base generator for transactionIds and branchIds.

Step6: set the local IP address and listening port on the XID, initialize the rpcServer, and wait for clients to connect.

The startup process is simple, and I’ll show you how Seata handles some of the common business logic in a distributed transaction framework.

3.2 Begin – Start global transactions

A distributed transaction must start with a global transaction. First, let’s look at how Seata implements global transactions:

Step1: create a GlobalSession based on the application ID, the transaction group, the name, and the timeout.

Step2: add a RootSessionManager to it to listen for events. There are currently four kinds of listeners in Seata (all SessionManagers implement SessionLifecycleListener):

  • ROOT_SESSION_MANAGER: the most complete and the largest; it owns all sessions.
  • ASYNC_COMMITTING_SESSION_MANAGER: manages the sessions that need to be committed asynchronously.
  • RETRY_COMMITTING_SESSION_MANAGER: manages the sessions whose commit needs to be retried.
  • RETRY_ROLLBACKING_SESSION_MANAGER: manages the sessions whose rollback needs to be retried.

Since we are only starting the transaction here, we do not need the other SessionManagers; we just add the RootSessionManager.

Step3: start the GlobalSession.

This step changes the state to Begin, records the start time, and calls the onBegin listening method of the RootSessionManager to save the Session to the Map and write it to our file.

Step4: finally, return the XID, which is composed of IP + port + transactionId. This ID is very important: after the TM obtains it, it needs to pass the XID on to the RMs, and each RM uses the XID to decide which server to connect to.
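One plausible encoding of such an XID, shown only to make the routing idea concrete; the exact format in Seata may differ:

```java
// Illustrative only: build an XID from ip + port + transactionId and extract
// the server address an RM would connect to. The separator is an assumption.
public final class XidSketch {

    public static String generate(String ip, int port, long transactionId) {
        return ip + ":" + port + ":" + transactionId;
    }

    // The "ip:port" prefix tells the RM which server owns this global transaction.
    public static String serverAddress(String xid) {
        return xid.substring(0, xid.lastIndexOf(':'));
    }
}
```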

3.3 BranchRegister – Register branch transactions

When a global transaction is enabled on TM, RM branch transactions must also be registered with the global transaction. Here is how to handle this:

Step1: obtain the global transaction by transactionId and verify its status.

Step2: create a new branch transaction, i.e. BranchSession.

Step3: Apply a global lock to the branch transaction. The logic here is that of the lock module.

Step4: add branchSession, mainly by adding it to the globalSession object and writing it to our file.

Step5: return the branchId. This ID is also important. We need to use it later to roll back our transaction or update the status of our branch transaction.

After registering a branch transaction, the branch needs to report its subsequent status, success or failure. At present the server simply records it. The purpose of the report is that, even if a branch transaction fails, when the TM still insists on committing the global transaction, the failed branch does not need to be committed when the branch transactions are committed.

3.4 GlobalCommit – Global commit

When the branch transaction is completed, it is up to the TM-transaction manager to decide whether to commit or roll back. If it is committed, the following logic is applied:

Step1: first find the globalSession. If it is null, the transaction has already been committed, so the operation is idempotent and success is returned directly.

Step2: close the GlobalSession to prevent new branches from registering.

Step3: if the status is Begin, meaning it has not been committed yet, change its status to Committing.

Step4: determine whether the commit can be done asynchronously. Currently only AT mode can commit asynchronously, because it works through undo logs; both MT and TCC require synchronous commit.

Step5: if the commit is asynchronous, put the session directly into ASYNC_COMMITTING_SESSION_MANAGER and let a background thread handle it asynchronously; if the commit is synchronous, execute Step6 directly.

Step6: commit the branches. If a branch transaction fails, whether to retry depends on the circumstances. Asynchronous commits do not need to retry here, because their sessions remain in the manager: as long as they have not succeeded, they are not removed and will keep being retried.

3.5 GlobalRollback – Global rollback

If our TM decides to roll back globally, it will go to the following logic:

This logic is basically the same as the commit process and can be regarded as its reverse, which will not be expanded here.

4. Summary

At the beginning of this article, we proposed four key qualities of a transaction coordinator; here is how Seata addresses each of them:

  • Correct coordination: background scheduled tasks perform the various retries correctly, and in the future, once the monitoring platform is released, it may be possible to roll back manually.
  • High availability: ensured by the HA-Cluster.
  • High performance: sequential file writes and RPC implemented with Netty; in the future Seata can also be scaled out to further improve processing performance.
  • High extensibility: users are given room to provide their own implementations, such as configuration, service discovery and registration, and global locks.

Finally, I hope this article helps you understand the core design principles of Seata-Server. Of course, you can also think about how you would design a distributed transaction server yourself.

Related links covered in this article

  • Seata GitHub repository: https://github.com/seata/seata
  • Further reading: Ant Financial's distributed transaction open source and practice | a gift for the SOFA open-source anniversary

Official Account: Financial Grade Distributed Architecture (Antfin_SOFA)