1. Scenario description
While providing on-site support for a customer's performance stress test, I ran into a problem: the user order status statistics interface had a long response time (RT), which made it the bottleneck for the project's overall performance and meant it had to be optimized. This interface is a capability provided by the standard product. Because of the complexity of its business logic and the particular way its data is stored (both described below), a single request issues several ES queries, which results in a long RT and drags down overall performance.
1.1 Service Logic Description
The original storage and query logic is as follows:
- A user places an order, and the order is persisted to RDS
- DTS listens for binlog changes and sends order change messages via Kafka
- The trading center consumes the Kafka messages and writes the orders to ES
- When the touch side queries, the trading center queries the order data in ES directly and returns it
The overall link timing diagram is as follows:
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: Change order status
    Trade Center ->> RDS: Persist order information
    RDS ->> DTS: Binlog change
    DTS ->> Kafka: Order change message
    Kafka ->> Trade Center: Consume order change message
    Trade Center ->> ES: Write order to ES
    Touch Side ->> Trade Center: Query order statistics
    Trade Center ->> ES: Query ES
    ES ->> Trade Center: Return ES order information
    Trade Center ->> Touch Side: Return order statistics
1.2 Core Issues
In the original business logic of the standard product, two problems are particularly troublesome:
1.2.1 Order Status Mapping
The first is the complexity of the order status mapping. The touch side needs to show four order statuses: to be paid, to be shipped, to be received, and completed. Each touch-side status is jointly determined by several persisted fields, as the following table shows:
Touch-side order status | Payment status | Delivery status | Receiving status
---|---|---|---
To be paid | To be paid | |
To be shipped | Paid | To be shipped |
To be received | Paid | Shipped | Unreceived
Completed | | | Received
PS: The condition for the completed status is open to discussion. For example, is an order completed as soon as its receiving status is received? Strictly speaking, the completed status should require all three: the payment status is paid, the delivery status is shipped, and the receiving status is received. There should be no implicit convention that "received must mean paid and shipped"; the more such conventions accumulate, the more unexpected behavior appears.
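The mapping above can be sketched as a small function. This is only an illustration: the function name and the snake_case status values are hypothetical, and the completed branch follows the strict rule from the note (all three fields must agree) rather than whatever the standard product actually does.

```python
from typing import Optional


def touch_side_status(payment: str,
                      delivery: Optional[str],
                      receiving: Optional[str]) -> str:
    """Map the three persisted fields to the status shown on the touch side."""
    if payment == "to_be_paid":
        return "to_be_paid"
    if payment == "paid" and delivery == "to_be_shipped":
        return "to_be_shipped"
    if payment == "paid" and delivery == "shipped" and receiving == "unreceived":
        return "to_be_received"
    # Strict rule: completed requires paid + shipped + received,
    # with no implicit "received implies paid and shipped" convention.
    if payment == "paid" and delivery == "shipped" and receiving == "received":
        return "completed"
    raise ValueError(
        f"unmapped combination: {payment!r}, {delivery!r}, {receiving!r}"
    )
```

Raising on an unmapped combination is a design choice made for this sketch: it surfaces inconsistent field combinations instead of silently assigning them a status.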
1.2.2 ES Data Store Structure
When the trading center writes data to ES, it keeps the same structure as the persisted RDS data. In other words, a query against ES cannot directly return the order status the touch side needs, which is why ES has to be queried several times to produce the order statistics.
Because the data in ES mirrors the persisted RDS structure, we can think of ES as a faster MySQL. To count the to-be-received orders of the user with ID 1, the SQL statement is as follows:
SELECT COUNT(*) FROM `order` WHERE user_id = 1 AND payment_status = 'Paid' AND delivery_status = 'Shipped' AND receiving_status = 'Unreceived';
If we need to query the quantity of four order states at the same time, we can only get it through four queries:
-- To be paid
SELECT COUNT(*) FROM `order` WHERE user_id = 1 AND payment_status = 'To be paid';
-- To be shipped
SELECT COUNT(*) FROM `order` WHERE user_id = 1 AND payment_status = 'Paid' AND delivery_status = 'To be shipped';
-- To be received
SELECT COUNT(*) FROM `order` WHERE user_id = 1 AND payment_status = 'Paid' AND delivery_status = 'Shipped' AND receiving_status = 'Unreceived';
-- Completed
SELECT COUNT(*) FROM `order` WHERE user_id = 1 AND payment_status = 'Paid' AND delivery_status = 'Shipped' AND receiving_status = 'Received';
This is why the RT of the user order statistics interface is long.
2. Solutions
Based on the above analysis of business and core problems, we propose the following solutions:
2.1 Modifying the ES Storage Data Structure
Add an order status field (the touch-side status) to each document before writing it to ES.
This reduces the N queries to one, with the order status as the only additional query condition. However, for reasons that cannot be described here, this plan was abandoned (mainly to avoid major changes to the standard product).
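To illustrate why a derived status field collapses the N queries into one, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for ES. The table name `orders`, the column `order_status`, and the status values are assumptions made for the example; in ES the equivalent would presumably be a single request with a terms aggregation on the status field.

```python
import sqlite3


def count_by_status(conn: sqlite3.Connection, user_id: int) -> dict:
    """Return {order_status: count} for one user with a single grouped query."""
    rows = conn.execute(
        "SELECT order_status, COUNT(*) FROM orders "
        "WHERE user_id = ? GROUP BY order_status",
        (user_id,),
    ).fetchall()
    return dict(rows)


# In-memory stand-in for the order store; the data is made up for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (user_id, order_status) VALUES (?, ?)",
    [(1, "to_be_paid"), (1, "to_be_shipped"), (1, "to_be_shipped"), (2, "completed")],
)
counts = count_by_status(conn, 1)
```

Statuses with no orders are simply absent from the result, so a caller would need to fill in zeros for them.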
2.2 New cache mechanism
After abandoning the first solution, we proposed a second one: cache the user's order statistics.
This removes the need to query ES N times on every request and then assemble the result set. However, the accuracy and robustness of the cache are a concern:
- How do we keep the data accurate in asynchronous scenarios, so that cases such as repeated message consumption do not corrupt the cached data?
- How do we make the cached data robust? If the cache is wrong, how does it correct itself without manual back-office intervention?
- How do we make sure the cache is hit consistently in specific scenarios (e.g., performance stress tests)?
- …
These are all things to consider when designing a caching mechanism.
3. Detailed design
3.1 Cache process design
Having settled on cache-based optimization for this scenario, the first thing to determine is the overall optimized business process. The earlier steps remain unchanged:
- A user places an order, and the order is persisted to RDS
- DTS listens for binlog changes and sends order change messages via Kafka
- The trading center consumes the Kafka messages and writes the orders to ES
A step for synchronizing the cache is added:
- The trading center synchronizes the order information to the cache
When the touch side queries order statistics, the cache is queried first. There are several cases:
- First query (cache miss)
  - The trading center queries the order data in ES directly
  - The query results are cached. There are two types of cache:
    - User order statistics cache: caches the number of orders in each status
    - User order ID cache: caches the order IDs in each status
  - The query result is returned to the touch side
- Nth query (cache hit), which splits into two cases
  - The two types of cache are consistent
    - The query result is returned to the touch side
  - The two types of cache are inconsistent
    - The trading center queries the order data in ES directly
    - The cache is repaired with the query results
    - The query result is returned to the touch side
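The statistics query path described above can be sketched as follows. Plain dictionaries and sets stand in for Redis, `query_es` is a hypothetical callback that returns the order IDs per status, and the status names are made up; this is an illustration of the flow under those assumptions, not the project's code.

```python
from typing import Callable, Dict, Set

STATUSES = ("to_be_paid", "to_be_shipped", "to_be_received", "completed")


def is_consistent(entry: dict) -> bool:
    """The two caches agree when every count equals the size of its ID set."""
    return all(
        entry["counts"].get(s, 0) == len(entry["ids"].get(s, set()))
        for s in STATUSES
    )


def query_statistics(user_id: int, cache: dict,
                     query_es: Callable[[int], Dict[str, Set[str]]]) -> Dict[str, int]:
    entry = cache.get(user_id)
    if entry is not None and is_consistent(entry):
        return dict(entry["counts"])  # cache hit and the two caches agree
    # Cache miss or inconsistent caches: query ES and rebuild both caches.
    ids = query_es(user_id)
    entry = {
        "ids": {s: set(ids.get(s, set())) for s in STATUSES},
        "counts": {s: len(ids.get(s, set())) for s in STATUSES},
    }
    cache[user_id] = entry
    return dict(entry["counts"])


es_calls = []


def fake_es(user_id: int) -> Dict[str, Set[str]]:
    """Hypothetical ES stand-in that records how often it is called."""
    es_calls.append(user_id)
    return {"to_be_paid": {"order-1001"},
            "to_be_shipped": {"order-1002", "order-1003"}}


cache: dict = {}
first = query_statistics(1, cache, fake_es)    # cache miss, ES is queried
second = query_statistics(1, cache, fake_es)   # cache hit, ES is not queried
cache[1]["counts"]["to_be_paid"] = 2           # simulate a duplicated +1
third = query_statistics(1, cache, fake_es)    # inconsistent, ES is queried again
```

In this run ES is called twice: once for the initial miss and once after the simulated corruption.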
In addition, when the user queries all orders in a specified status, we verify the cache and compensate for any inaccuracy:
- The touch side queries all orders in the specified status
- The trading center queries ES for all orders in the specified status
- The result is checked against the data in the cache
- If they are inconsistent, the cache is updated to compensate
- All orders in the specified status are returned to the touch side
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: Change order status
    Trade Center ->> RDS: Persist order information
    RDS ->> DTS: Binlog change
    DTS ->> Kafka: Order change message
    Kafka ->> Trade Center: Consume order change message
    Trade Center ->> ES: Write order to ES
    Trade Center ->> Redis: Synchronize order information to the cache
    Touch Side ->> Trade Center: Query order statistics
    Trade Center ->> Redis: Query cache
    Redis ->> Trade Center: Cache hit
    Trade Center ->> Trade Center: Check whether the two caches are consistent
    Note right of Trade Center: The cache is not hit or the cache data is inconsistent
    Trade Center -->> ES: Query ES
    Trade Center -->> Redis: Data compensation and correction
    Trade Center -->> Touch Side: Return order statistics
    Touch Side ->> Trade Center: Query all orders in the specified status
    Trade Center ->> ES: Query all orders in the specified status
    Trade Center ->> Redis: Query cache
    Trade Center ->> Trade Center: Check the accuracy of cache data
    Note right of Trade Center: The cache data is inconsistent with ES
    Trade Center -->> Redis: Data compensation and correction
    Trade Center -->> Touch Side: Return all orders in the specified status
3.2 Design of Redis data structure
The caching middleware used in the customer's project is Redis.
As mentioned in the cache process design, the cache is divided into two types, with the following meanings and data structures:
- User order statistics cache: caches the number of orders in each status
  - Redis data structure: Hash
  - Key: UserOrderCountCache:userId
  - Value: (order status: count)
- User order ID cache: caches the order IDs in each status
  - Redis data structure: Zset
  - Key: UserOrderDetailCache:userId:order status
  - Value: (order ID)
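As a rough illustration of this layout, the sketch below builds the two kinds of keys and mirrors the Hash and Zset shapes with plain Python dictionaries. The helper names, status values, and order IDs are hypothetical. The article does not say what the Zset score is, so the scores here are placeholder numbers, not the project's choice.

```python
def count_key(user_id: int) -> str:
    """Key of the Hash that holds the per-status order counts of one user."""
    return f"UserOrderCountCache:{user_id}"


def detail_key(user_id: int, status: str) -> str:
    """Key of the Zset that holds the order IDs of one user in one status."""
    return f"UserOrderDetailCache:{user_id}:{status}"


# Plain dicts standing in for Redis (assumption: no real Redis client is used).
# Hash: field = order status, value = count.
hashes = {count_key(1): {"to_be_paid": 1, "to_be_shipped": 2}}

# Zset: member = order ID, score = placeholder (the source does not specify it).
zsets = {
    detail_key(1, "to_be_paid"): {"order-1001": 1.0},
    detail_key(1, "to_be_shipped"): {"order-1002": 2.0, "order-1003": 3.0},
}

# The consistency check used elsewhere compares a Hash field with the Zset size.
paid_count = hashes[count_key(1)]["to_be_paid"]
paid_ids = len(zsets[detail_key(1, "to_be_paid")])
```

Against a real Redis instance these shapes would map onto hash and sorted-set commands such as HINCRBY, ZADD, ZREM, and ZCARD.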
3.3 Compensation mechanism design
Because we do not fully trust the cache, we designed two compensation mechanisms.
3.3.1 Active compensation
- Query order statistics (Redis cache internal correction)
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: Send order statistics query request
    Trade Center ->> ES: First query
    ES ->> Trade Center: Return order statistics query result
    Trade Center ->> Redis: Nth query
    Redis ->> Trade Center: Return cached result
    Trade Center ->> Trade Center: Check order ID cache against order statistics cache
    Trade Center ->> Touch Side: Return order statistics query result
    Note right of Trade Center: Handling when the caches are inconsistent
    Trade Center -->> ES: Query all orders of the user
    Trade Center -->> Redis: Update the order ID cache and the order statistics cache
    Trade Center -->> Touch Side: Update the cache and return the order statistics query result
- Query all orders in the specified status (lazy-loading active compensation)
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: Request all orders in the specified status
    Trade Center ->> ES: Query all orders in the specified status
    Trade Center ->> Redis: Query user order ID cache and statistics cache
    Trade Center ->> Trade Center: Check the cache against the ES result
    Note right of Trade Center: Handling when the cache is inconsistent with ES
    Trade Center -->> Redis: Update the order ID cache and the order statistics cache
    Trade Center -->> Touch Side: Update the cache and return all orders in the specified status
3.3.2 Timing compensation
A scheduled task checks whether the two types of cache are consistent. If they are not, ES is queried and the cache is updated to compensate.
sequenceDiagram
    autonumber
    Schedule ->> Trade Center: Scheduled task starts
    Trade Center ->> Redis: Query cache
    Redis ->> Trade Center: Return cached data
    Trade Center ->> Trade Center: Check whether the two caches are consistent
    Note right of Trade Center: Handling when the caches are inconsistent
    Trade Center -->> ES: Query all orders of the user
    Trade Center -->> Redis: Update order ID cache and order statistics cache
    Trade Center -->> Schedule: Compensation update finished, scheduled task ends
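The body of such a scheduled task might look like the sketch below. It assumes the cache is a dictionary keyed by user ID and that `query_es` is a hypothetical callback returning the order IDs per status; the scheduling itself (cron, a job framework, and so on) is left out.

```python
from typing import Callable, Dict, List, Set

STATUSES = ("to_be_paid", "to_be_shipped", "to_be_received", "completed")


def compensate_all(cache: dict,
                   query_es: Callable[[int], Dict[str, Set[str]]]) -> List[int]:
    """Repair every user whose two caches disagree; return the repaired user IDs."""
    repaired = []
    for user_id, entry in cache.items():
        consistent = all(
            entry["counts"].get(s, 0) == len(entry["ids"].get(s, set()))
            for s in STATUSES
        )
        if consistent:
            continue  # nothing to do, ES is not queried for this user
        ids = query_es(user_id)
        entry["ids"] = {s: set(ids.get(s, set())) for s in STATUSES}
        entry["counts"] = {s: len(entry["ids"][s]) for s in STATUSES}
        repaired.append(user_id)
    return repaired


# Demo data (made up): user 1 is consistent, user 2 has a drifted count.
cache = {
    1: {"counts": {"to_be_paid": 1}, "ids": {"to_be_paid": {"order-1001"}}},
    2: {"counts": {"to_be_paid": 2}, "ids": {"to_be_paid": {"order-2001"}}},
}
repaired_users = compensate_all(cache, lambda user_id: {"to_be_paid": {"order-2001"}})
```

Only users whose caches disagree trigger an ES query, which keeps the extra load of the task proportional to the amount of drift.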
3.4 Rationale and explanation
3.4.1 Are the two types of cache overdesigned?
The two types of cache are designed to guard against repeated consumption.
If only one cache (order statistics) were used, a Kafka message consumed twice would increase the count by 2 instead of 1, leaving the cached data wrong with no way to detect it.
With the current two-cache design, even if the statistics cache becomes wrong, the order ID cache stores order IDs and therefore cannot contain duplicates. On the user's next query, the mismatch between the two caches reveals that the cached data is wrong, and the query falls back to ES.
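A tiny sketch makes the argument concrete: a counter is not idempotent under repeated consumption, while a set of order IDs is, so comparing the two exposes the drift. The cache here is a plain dictionary and the order ID is made up.

```python
def apply_order_created(cache: dict, order_id: str) -> None:
    """Apply one 'order created' message to both caches."""
    cache["count"] += 1          # counter: a duplicate message adds 1 again
    cache["ids"].add(order_id)   # ID set: a duplicate message changes nothing


cache = {"count": 0, "ids": set()}
apply_order_created(cache, "order-1001")
apply_order_created(cache, "order-1001")  # the same Kafka message consumed twice

# The count has drifted to 2 while the ID set still holds one order,
# so the mismatch is detectable on the next query.
drift_detected = cache["count"] != len(cache["ids"])
```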
3.4.2 Is it reasonable to validate the cache by comparison?
One could argue that relying on a comparison between the two caches is not safe. Suppose that, instead of repeated consumption, a message is consumed and the order is written to ES correctly, but the corresponding update is lost in both caches. The two caches are then consistent with each other, so ES is not queried and wrong data is returned to the touch side.
This is indeed possible, which is why we add an active compensation mechanism when the touch side queries all orders in a specified status (see section 3.3.1).
The active compensation mechanism relies on the premise that the result of querying ES for all orders in a specified status is accurate. Instead of returning the query result immediately, the trading center also reads the cache and checks it against that result. If the two disagree, the cached data is wrong and the cache is compensated.
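Under the same premise (the ES result is treated as accurate), active compensation could be sketched like this. The function and parameter names are hypothetical, and a dictionary stands in for Redis. The demo starts from exactly the failure case discussed above: both caches agree with each other but are missing one order.

```python
from typing import Callable, Iterable, List


def query_orders_in_status(user_id: int, status: str, cache: dict,
                           query_es_ids: Callable[[int, str], Iterable[str]]) -> List[str]:
    """Return all order IDs in one status and repair the cache if it disagrees."""
    fresh_ids = set(query_es_ids(user_id, status))  # treated as the accurate result
    entry = cache.setdefault(user_id, {"counts": {}, "ids": {}})
    cached_ids = entry["ids"].get(status)
    cached_count = entry["counts"].get(status)
    if cached_ids != fresh_ids or cached_count != len(fresh_ids):
        # The cache is wrong or missing: compensate both cache types.
        entry["ids"][status] = set(fresh_ids)
        entry["counts"][status] = len(fresh_ids)
    return sorted(fresh_ids)


# Both caches are mutually consistent, yet order-1002 was lost from both.
cache = {1: {"counts": {"to_be_paid": 1}, "ids": {"to_be_paid": {"order-1001"}}}}
orders = query_orders_in_status(
    1, "to_be_paid", cache, lambda user_id, status: ["order-1001", "order-1002"]
)
```

The response itself always comes from ES, so the user sees correct data even before the cache has been repaired.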
4. Sequence diagram of main scenes
This part enumerates the business scenarios and, for each one, lists the operations the trading center needs to perform on the cache.
4.1 Placing an Order (Creating an order)
Non-core nodes such as DTS/Kafka are omitted.
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: Create order
    Trade Center ->> RDS: Persist order information
    Trade Center ->> ES: Write order to ES
    Trade Center ->> Redis: Update cache
    Note right of Redis: Statistics cache (to be paid) +1
    Note right of Redis: Order details cache (to be paid) add OrderId
    Trade Center ->> Touch Side: Return order result
4.2 User Payment (Order Status change)
Non-core nodes such as DTS/Kafka are omitted.
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: User pays
    Trade Center ->> RDS: Update order status
    Trade Center ->> ES: Update order in ES
    Trade Center ->> Redis: Update cache
    Note right of Redis: Statistics cache (to be paid) -1, (to be shipped) +1
    Note right of Redis: Order details cache (to be paid) remove OrderId, (to be shipped) add OrderId
    Trade Center ->> Touch Side: Return payment result
4.3 Merchant Delivery (Order Status Change)
Non-core nodes such as DTS/Kafka are omitted.
sequenceDiagram
    autonumber
    Management Platform ->> Trade Center: Merchant ships the order
    Trade Center ->> RDS: Update order status
    Trade Center ->> ES: Update order in ES
    Trade Center ->> Redis: Update cache
    Note right of Redis: Statistics cache (to be shipped) -1, (to be received) +1
    Note right of Redis: Order details cache (to be shipped) remove OrderId, (to be received) add OrderId
    Trade Center ->> Management Platform: Return shipment result
4.4 Customer confirmation of Receipt (Order Status Change)
Non-core nodes such as DTS/Kafka are omitted.
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: User confirms receipt of goods
    Trade Center ->> RDS: Update order status
    Trade Center ->> ES: Update order in ES
    Trade Center ->> Redis: Update cache
    Note right of Redis: Statistics cache (to be received) -1, (completed) +1
    Note right of Redis: Order details cache (to be received) remove OrderId, (completed) add OrderId
    Trade Center ->> Touch Side: Return receipt confirmation result
4.5 User Return (Cancel order)
Non-core nodes such as DTS/Kafka are omitted.
sequenceDiagram
    autonumber
    Touch Side ->> Trade Center: User cancels order
    Trade Center ->> RDS: Update order status
    Trade Center ->> ES: Update order in ES
    Trade Center ->> Redis: Update cache
    Note right of Redis: Statistics cache for the order's current status -1
    Note right of Redis: Order details cache for that status delete OrderId
    Trade Center ->> Touch Side: Return cancellation result
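The cache operations in the scenarios above all follow one pattern: decrement and remove in the old status, increment and add in the new one. A minimal sketch of that pattern is shown below, with dictionaries standing in for Redis and hypothetical status names. The point in the lifecycle at which an order is cancelled is an assumption made for the demo.

```python
from typing import Optional

STATUSES = ("to_be_paid", "to_be_shipped", "to_be_received", "completed")


def new_cache() -> dict:
    return {"counts": {s: 0 for s in STATUSES}, "ids": {s: set() for s in STATUSES}}


def move_order(cache: dict, order_id: str,
               from_status: Optional[str], to_status: Optional[str]) -> None:
    """Apply one status change to both caches (None means 'no such side')."""
    if from_status is not None:
        cache["counts"][from_status] -= 1
        cache["ids"][from_status].discard(order_id)
    if to_status is not None:
        cache["counts"][to_status] += 1
        cache["ids"][to_status].add(order_id)


cache = new_cache()
move_order(cache, "order-1001", None, "to_be_paid")                 # 4.1 create
move_order(cache, "order-1001", "to_be_paid", "to_be_shipped")      # 4.2 pay
move_order(cache, "order-1001", "to_be_shipped", "to_be_received")  # 4.3 ship
move_order(cache, "order-1001", "to_be_received", "completed")      # 4.4 confirm receipt
move_order(cache, "order-1002", None, "to_be_paid")                 # second order
move_order(cache, "order-1002", "to_be_paid", None)                 # 4.5 cancel (assumed unpaid)
```

The counter updates are deliberately unconditional here, matching the design in which duplicated messages may skew the counts and the ID sets are what allow the skew to be detected and compensated.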
5. Fallback measures
- Scheduled task: periodically compensates and corrects all caches
- Open compensation interface: can be called on demand to correct a specified cache or all caches
- Compensation runs in batches, so that a burst of slow queries does not make services unavailable