Phenomenon: A previous project was built on Spring Cloud microservices, with Nacos + Feign for RPC calls, and deployed on K8S with multiple pods. Problems still occurred in use: order table data was processed repeatedly in the order center…
So let’s do a quick summary
I. Idempotence (Review)
What is idempotency: for the same request (same parameters, same address), the system produces the same result no matter how many times the request is made. Interface idempotency ensures that a distributed system does not process the same work twice because of repeated requests in a complex environment, which also simplifies fault handling.
Solutions commonly used by industry experts
- MVCC (multi-version concurrency control): use optimistic locking so that data is not processed repeatedly; each version number can succeed only once.
- Deduplication table: the database enforces a unique constraint on rows so that data is not processed repeatedly; one row, one success.
- Token: a mechanism commonly used in long-running business flows. The business is passed between systems under one ID, and each service can succeed only once for the same ID.
- Pessimistic locking: lock the data rows so that other threads cannot commit their transactions; serialization guarantees only one success.
- Distributed locks: use Redis or ZK nodes to keep other threads from committing their transactions; serialization guarantees only one success.
- Asynchronous processing: let Insert operations through without an idempotency check, then filter out the duplicates with scheduled tasks or other means.
- State machine idempotency: in a long-running flow, if the state field has already changed, a request that belongs to an earlier state should not be processed.
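To make the first item concrete, here is a minimal sketch of the optimistic-lock idea. It is only an illustration: the `VersionedOrder` class and its in-memory row are hypothetical stand-ins for a database row with a version column, where the real statement would be an UPDATE with `AND version = ?` in its WHERE clause.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for one database row with a version column.
class VersionedOrder {
    // Immutable snapshot of the row: business status plus version number.
    static final class Row {
        final String status;
        final long version;

        Row(String status, long version) {
            this.status = status;
            this.version = version;
        }
    }

    private final AtomicReference<Row> row = new AtomicReference<>(new Row("NEW", 0));

    Row read() {
        return row.get();
    }

    // Mirrors: UPDATE t SET status = ?, version = version + 1 WHERE id = ? AND version = ?
    // Succeeds only if the row is still the snapshot the caller read.
    boolean updateIfVersionMatches(Row expected, String newStatus) {
        return row.compareAndSet(expected, new Row(newStatus, expected.version + 1));
    }

    public static void main(String[] args) {
        VersionedOrder order = new VersionedOrder();
        Row seen = order.read(); // two duplicate requests both read version 0
        System.out.println("first update applied:  " + order.updateIfVersionMatches(seen, "PAID"));
        System.out.println("second update applied: " + order.updateIfVersionMatches(seen, "PAID"));
        System.out.println("version now: " + order.read().version);
    }
}
```

The duplicate request fails because the version it read is no longer current, which is the "one chance of success per version" rule.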
Idempotency of interfaces
- GET: needs no extra handling. A query interface is idempotent because the data stays the same no matter how many times the request is made.
- POST: not idempotent. The HTTP specification does not guarantee idempotency for POST, but in practice there are often cases where it has to be handled.
- PUT: must be idempotent; make sure data is not submitted and updated repeatedly.
- DELETE: must be idempotent. However many times the request to delete the same resource is made, the resource ends up deleted.
II. Finding the root cause
- Check the logs. Querying the Client and Server logs for one order number shows that the Client invoked the call only once, but the Server consumed it three times across two pods (K8S environment), 10 seconds apart.
- All three Server threads blocked after receiving the request. At a certain point all three requests succeeded and each printed a success log.
- The Client got a response and continued processing.
The cause is easy to work out. For remote RPC we had chosen Nacos + Feign. Under the hood the call is an HTTP request, as with Spring's RestTemplate, and a single HTTP call cannot reach three servers. What misled me was that the business side printed only one log line per request.

In fact, with the Feign defaults we were running, a request that gets no response within 10 seconds is abandoned (this is configurable), and Feign then picks another server according to its load-balancing rule and sends the request again. We had not set a rule ourselves, so the default, round-robin, applied. Many requests were blocked on the database, so each retry abandoned the result of the previous HTTP connection. But dropping the HTTP connection does not stop the Server thread that is already handling the request, and that is what caused this problem.
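The failure mode can be reproduced without any framework. The sketch below is a simulation, not Feign itself: the thread pool plays the Server pods, the latch plays the blocked database, and the names (`RetryDuplicationDemo`, `simulate`) and the short timeout are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class RetryDuplicationDemo {
    // Runs ONE logical client call with timeout-and-retry and returns how many
    // times the simulated server processed the same order.
    static int simulate(int maxAttempts, long timeoutMillis) {
        ExecutorService serverThreads = Executors.newFixedThreadPool(maxAttempts);
        CountDownLatch databaseUnblocked = new CountDownLatch(1);
        AtomicInteger timesProcessed = new AtomicInteger();
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                // Each attempt lands on a server thread (another pod in the real system).
                Future<String> response = serverThreads.submit(() -> {
                    databaseUnblocked.await();        // blocked on the database
                    timesProcessed.incrementAndGet(); // no idempotency check: processed again
                    return "success";
                });
                if (attempt == maxAttempts) {
                    databaseUnblocked.countDown();    // the database recovers during the last attempt
                }
                try {
                    response.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    break;                            // the client sees a single success
                } catch (TimeoutException e) {
                    // The client abandons this attempt, but the server thread keeps running.
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            serverThreads.shutdown();
            serverThreads.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timesProcessed.get();
    }

    public static void main(String[] args) {
        System.out.println("client called once, server processed "
                + simulate(3, 200) + " times");
    }
}
```

The client only ever observes one successful response, which matches the logs: one call on the Client side, three executions on the Server side.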
We solved it in three stages. Don't ask me why, the factory director is not my cousin (order data safety had to be considered):
III. Applying the fix
3.1 Idempotent design 1.0
- Use the trade order number as the idempotency key.
- The first step of every request is a query by order number; if a record is found, the original result is returned.
- Do the business processing: fetch the account information of all trading parties, add flow records, and change the balances.
Cause analysis:
Step 3 above stalled because the database was congested, yet the Server threads kept running. This time the trigger was a database operation that lasted more than 10 seconds. But repeated interface submissions, memory, CPU, disk IO, network and many other conditions can also affect response time, so the idempotency design had to be redone.
Where to start (resolving repeated requests)
Token mechanism
For repeated, rapid clicks in the front end, such as a user submitting an order while shopping, the order submission interface can use the Token mechanism to prevent duplicate submissions.
The main process is:
- The server provides an interface that issues tokens. When analyzing the business, any operation with an idempotency problem must obtain a token before it runs, and the server saves that token in Redis. (With microservices the store has to be shared, hence Redis; on a single machine a JVM cache would do.)
- The token is then carried along when the business interface is called, usually in the request header.
- The server checks whether the token exists in Redis. If it does, this is the first request: the server deletes the token from Redis and carries on with the business logic.
- If the token does not exist in Redis, the operation is a repeat, and a duplicate-request marker is returned to the client directly. This way the business code is not executed repeatedly.
This operation is an INSERT + UPDATE, which is inherently a POST request, yet it has to be idempotent to satisfy the upstream system. The original design decided whether a request had already been processed by checking for a record in the database, but the data only changes in the database after the transaction commits. Such a design is not idempotent at all under high concurrency, so the starting point is to fix repeated submissions.
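The four steps above can be sketched roughly as follows. This is an illustration only: the `TokenGuard` class and its method names are hypothetical, and a concurrent in-memory set stands in for Redis so the example runs on its own. The point it tries to show is that "check" and "delete" must be a single atomic operation, otherwise two concurrent requests can both pass the check.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class TokenGuard {
    // Stand-in for Redis: tokens that were issued and not yet used.
    private final Set<String> issuedTokens = ConcurrentHashMap.newKeySet();

    // Step 1: the client asks for a token before calling the business interface.
    String issueToken() {
        String token = UUID.randomUUID().toString();
        issuedTokens.add(token);
        return token;
    }

    // Steps 3 and 4: check and delete in ONE atomic operation.
    // remove() returns true only for the first caller, in the same way that
    // deleting a key in Redis reports whether a key was actually removed.
    boolean consume(String token) {
        return token != null && issuedTokens.remove(token);
    }

    // Step 2: the token arrives with the business request (a header in practice).
    String submitOrder(String token, String orderNo) {
        if (!consume(token)) {
            return "DUPLICATE";      // repeated or unknown token: skip the business code
        }
        return "CREATED " + orderNo; // the real business logic would run here
    }

    public static void main(String[] args) {
        TokenGuard guard = new TokenGuard();
        String token = guard.issueToken();
        System.out.println(guard.submitOrder(token, "A1")); // first click
        System.out.println(guard.submitOrder(token, "A1")); // second click, same token
    }
}
```

A real deployment would also give each token an expiration time so that unused tokens do not pile up.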
Deduplication table in the database
Use the database's unique index to guarantee uniqueness when inserting data into the table. The unique serial number can be a single field, such as the order number of an order, or a unique combination of several fields. For example, design the following database table.
CREATE TABLE `t_idempotent` (
  `id` int(11) NOT NULL COMMENT 'ID',
  `serial_no` varchar(255) NOT NULL COMMENT 'unique serial number',
  `source_type` varchar(255) NOT NULL COMMENT 'resource type',
  `status` int(4) DEFAULT NULL COMMENT 'status',
  `remark` varchar(255) NOT NULL COMMENT 'remark',
  `create_by` bigint(20) DEFAULT NULL COMMENT 'creator',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  `modify_by` bigint(20) DEFAULT NULL COMMENT 'modifier',
  `modify_time` datetime DEFAULT NULL COMMENT 'modify time',
  PRIMARY KEY (`id`),
  UNIQUE KEY `key_s` (`serial_no`, `source_type`, `remark`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='idempotency check table';
Let's take a look at the key fields:
- serial_no: the unique serial number. Here I mark the identifying fields of the request object with the @IdempotentKey annotation and derive the value by hashing their values with MD5.
- source_type: the business type, which distinguishes different businesses such as order, payment, and so on.
- remark: the identifying fields joined into one string, with "|" as the separator.
Because the table has a unique index on the combination of serial_no, source_type and remark, a second insert for the same request fails, and that is how the interface becomes idempotent.
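As a rough sketch of the idea (not the original implementation): the `DedupTable` class below is hypothetical, an in-memory set stands in for the unique index, and it assumes that serial_no is the MD5 hex digest of the "|"-joined identifying fields. The @IdempotentKey annotation processing is left out; the identifying field values are passed in directly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DedupTable {
    // Stand-in for the unique index on (serial_no, source_type, remark).
    private final Set<String> uniqueIndex = ConcurrentHashMap.newKeySet();

    // remark: the identifying field values joined with "|".
    static String remark(String... keyFields) {
        return String.join("|", keyFields);
    }

    // serial_no (assumed here): MD5 hex digest of the joined identifying fields.
    static String serialNo(String remark) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(remark.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Mirrors the INSERT into the check table. Returns false in the case where
    // the database would reject the row with a duplicate-key error.
    boolean tryInsert(String sourceType, String... keyFields) {
        String remark = remark(keyFields);
        return uniqueIndex.add(serialNo(remark) + "/" + sourceType + "/" + remark);
    }

    public static void main(String[] args) {
        DedupTable table = new DedupTable();
        System.out.println(table.tryInsert("ORDER", "A1", "user-7"));   // first request
        System.out.println(table.tryInsert("ORDER", "A1", "user-7"));   // repeated request
        System.out.println(table.tryInsert("PAYMENT", "A1", "user-7")); // other business type
    }
}
```

With a real table, the insert and the business changes would go in the same transaction, so a failed business step also rolls back the check row.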
3.2 Idempotent design 2.0
Before business processing starts, we must make sure that only one thread on the Server side does the processing and commits the data. We use Redis to build a distributed lock and run the operation under that lock. The modified Server-side steps are as follows:
- Use the order number as the idempotency key.
- Use Redis SetNX to put the order number into Redis with a moderate expiration time of 1 minute; spin if the distributed lock is not acquired.
- The first step of every request is a query by order number; if a record is found, the original result is returned.
- Do the business processing: handle all flow records and balances for this transaction.
- Commit the transaction.
- Release the Redis lock.
Problems found:
- The Redis SetNX method does not set the timeout in the same call, so the original scheme is actually a two-step operation with no guarantee of atomicity (something I found out for myself).
- The lifetime of the distributed lock does not match the wait time of the database transaction. The 1-minute lifetime of the lock is much shorter than the configured transaction wait time, so the idempotency check can still be bypassed once the wait exceeds 1 minute.
The solution
- For the first problem there are two possible solutions:
  - Use a Redis transaction
  - Set the expiration time together with the Redis lock
- For the second problem, set the transaction wait time and the distributed lock lifetime to the same value N. Within that time only one thread holds the lock and does the processing. A caller that fails to get the lock retries.
Since developers' skill levels vary, I chose the second option and wrapped the Redis access in a helper, to avoid potential risks caused by environment problems.
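A minimal sketch of the 2.0 flow with the fix applied is shown below. It is not the project's wrapper: `OrderLockDemo` and its methods are hypothetical, in-memory maps stand in for Redis and for the order table, and the owner check on release is an extra safeguard that the text above does not mention. The key point is that set-if-absent and the expiration happen in one atomic step, which in Redis corresponds to a single `SET key value NX EX seconds` command.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class OrderLockDemo {
    // One lock entry: who holds it and when it expires.
    private static final class LockEntry {
        final String owner;
        final long expiresAtMillis;

        LockEntry(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, LockEntry> locks = new ConcurrentHashMap<>();     // stand-in for Redis
    private final Map<String, String> orderResults = new ConcurrentHashMap<>(); // stand-in for the order table

    // Set-if-absent plus expiration in ONE atomic step.
    boolean tryLock(String key, String owner, long ttlMillis) {
        long now = System.currentTimeMillis();
        LockEntry holder = locks.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis <= now)
                        ? new LockEntry(owner, now + ttlMillis)
                        : current);
        return holder.owner.equals(owner);
    }

    // Release only if this caller still owns the lock (extra safeguard, see lead-in).
    void unlock(String key, String owner) {
        locks.computeIfPresent(key, (k, current) -> current.owner.equals(owner) ? null : current);
    }

    // Lock, query by order number, process, "commit", release.
    String process(String orderNo, long ttlMillis, Supplier<String> business) {
        String key = "lock:order:" + orderNo;
        String owner = UUID.randomUUID().toString();
        if (!tryLock(key, owner, ttlMillis)) {
            return "BUSY";                      // the caller retries later
        }
        try {
            String previous = orderResults.get(orderNo);
            if (previous != null) {
                return previous;                // already processed: return the original result
            }
            String result = business.get();     // flow records and balance changes
            orderResults.put(orderNo, result);  // stands in for the transaction commit
            return result;
        } finally {
            unlock(key, owner);
        }
    }

    public static void main(String[] args) {
        OrderLockDemo demo = new OrderLockDemo();
        System.out.println(demo.process("A1", 60_000, () -> "processed A1"));
        System.out.println(demo.process("A1", 60_000, () -> "processed A1 again"));
    }
}
```

The `ttlMillis` argument is where the value N from the solution would go, matched to the database transaction wait time.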
3.3 Idempotent design 3.0
The state machine
Many businesses have a flow of business states, in which each state has a pre-state, a post-state, and a final end state. For example, an approval flow: pending approval, under approval, rejected, resubmitted, approved, approval refused. Or an order: to be submitted, to be paid, paid, cancelled.
Taking orders as an example, the pre-state of "paid" can only be "to be paid", and the pre-state of "cancelled" can only be "to be paid". We can control the idempotency of requests through the transitions of this state machine.
- Use the order number as the idempotency key.
- Use the following states:
UN_SUBMIT(0, 0, "to be submitted"),
UN_PADING(0, 1, "to be paid"),
PAYED(1, 2, "paid, awaiting shipment"),
DELIVERING(2, 3, "shipped"),
COMPLETE(3, 4, "completed"),
CANCEL(0, 5, "cancelled")
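Here is one way the state check could look, as a sketch under assumptions: I read the first number of each state as the code of its pre-state and the second as its own code, the `OrderStateMachine` class is hypothetical, and an in-memory map stands in for the order table. In a database the same check would be an UPDATE whose WHERE clause includes the expected current status.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OrderStateMachine {
    // (preCode, code, label) taken from the state list; identifiers kept as spelled there.
    enum State {
        UN_SUBMIT(0, 0, "to be submitted"),
        UN_PADING(0, 1, "to be paid"),
        PAYED(1, 2, "paid, awaiting shipment"),
        DELIVERING(2, 3, "shipped"),
        COMPLETE(3, 4, "completed"),
        CANCEL(0, 5, "cancelled");

        final int preCode;
        final int code;
        final String label;

        State(int preCode, int code, String label) {
            this.preCode = preCode;
            this.code = code;
            this.label = label;
        }

        static State byCode(int code) {
            for (State s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("unknown state code " + code);
        }
    }

    // Stand-in for the status column of the order table, keyed by order number.
    private final Map<String, State> orders = new ConcurrentHashMap<>();

    void create(String orderNo) {
        orders.put(orderNo, State.UN_SUBMIT);
    }

    // Mirrors: UPDATE orders SET status = :target WHERE order_no = :no AND status = :pre
    // Returns false for a repeated or out-of-order request, so nothing is processed twice.
    boolean transition(String orderNo, State target) {
        if (target.code == target.preCode) {
            return false; // the initial state is only set at creation
        }
        State required = State.byCode(target.preCode);
        return orders.replace(orderNo, required, target); // atomic compare-and-set
    }

    public static void main(String[] args) {
        OrderStateMachine machine = new OrderStateMachine();
        machine.create("A1");
        System.out.println(machine.transition("A1", State.UN_PADING)); // submit
        System.out.println(machine.transition("A1", State.PAYED));     // pay
        System.out.println(machine.transition("A1", State.PAYED));     // duplicate pay request
    }
}
```

The duplicate pay request is rejected because the order is no longer in the pre-state that "paid" requires.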
To be updated when I'm in the mood.