I. Demand background and design
One of the recently developed requirements requires real-time statistics of the average score of the evaluation based on the content of the evaluation, allowing a second-level delay.
This is a very traditional statistical model, where streams are stored through evaluated services and statistics are stored in the form of MQ asynchronous insert or update. Because of the need to count the average score, the library table is designed to store the total score of the evaluation and the number of evaluators. The simplified synchronization process is shown in figure 1.
Advantages of the scheme:
- It prevents repeated query of the historical data of the day and reduces unnecessary calculation.
- There are direct data updates, MySQL unique constraints to handle concurrent situations, no need to lock the query and update.
Disadvantages of this scheme: 1. The midempotency of MQ needs to be guaranteed to the maximum extent, and a record of a class ID can only be consumed once. If messages are repeatedly consumed, subsequent data statistics are affected and the flow data needs to be manually computed again.
Second, ensure the idempotency of MQ
There are concurrent requests in the front end of this demand. It is possible that multiple unique identification streams may be sent at the same time. Only one of them needs to be counted.
2.1 Common midempotent SCHEMES for MQ
Traditional schemes to ensure idempotency of MQ consumption are based on consistency identification to prevent repeated sending – or repeated consumption.
- For message producers:
In concurrent cases, it is only necessary to ensure that the first data update process upstream cannot be queried and updated by other threads, because subsequent incoming requests will query the database first, as shown in the figure.
- For message consumers:
1. Perform status query before performing service operations:
When the consumer starts to perform business operations, it first queries the business status by idempotent ID, for example, modifying the order status. When the order status is successful/failed, no further processing is required. Before the consumption logic is executed, the order status is queried by the order number. Once the order status is obtained, the message is submitted to inform the broker that the message status is consumeStatus.consume_SUCCESS.
2. Uniqueness constraint: set the database unique constraint, ensure the uniqueness of data under the worst conditions.
The first operation does not guarantee must not duplicate data, such as: concurrent insert scenarios, if not optimistic locking, distributed lock as guarantee under the premise of data is very likely to be repeated injection operation, so we must add the uniqueness of idempotent id index, so it can guarantee in concurrent scenarios also can guarantee the uniqueness of the data.
3. Introduce locking
In the first point above, in the case of concurrent update, if the update is carried out without pessimistic lock, optimistic lock, distributed lock and other mechanisms, it is likely that multiple updates will lead to inaccurate status. For example, to update the order status, the service requires that the order can only be updated from the initial -> Processing, processing -> Success, processing -> failure, not across the status update. If there is no locking mechanism, it is likely that initialized orders will be updated as successful, successful orders will be updated as failed, and so on. In the case of high concurrency, it is recommended to define service state changes through state machines. Optimistic locking and distributed locking mechanisms are used to ensure that the results of multiple updates are determined. Pessimistic locking is not recommended in the case of concurrency, which is not conducive to the improvement of service throughput.
4. Message record table: this scheme is similar to the idempotent operation done by the business layer, specifying unique constraints, storing message flow, and indirectly realizing idempotent consumption.
First prepare a message form, at the same time in the consumption of successful insert a has been successfully processed the message id of the record to the table, pay attention to must with business operations in the same things, when new messages arrive, according to the new message id query whether the id already exists in the table, if there is suggests that message has been consumption, Then, you can discard the message and no further business operations can be performed.
2.2 Select solutions based on services
First, define unique identifiers. According to business requirements, the class ID and the date of the day are taken as unique identifiers. (There are other business fields that need to be unique identifiers, which are simplified in this article)
Refer to the above solution to eliminate the unusable solution
2.2.1 Scheme selection
After discussion, the lock of the evaluation service and the statistic service are coupled. To decouple and improve the code readability of the evaluation service. Decide to use a message log for message de-duplication.
The storage scheme of message record table can be selected according to the service level from statistical dimension to day level. Automatic expiration support is preferred.
2.2.2 Additional protection ensures data correctness
-
Front end: do a good job of anti-shake mechanism to prevent misoperation
-
Back-end: Notifies event alarms, generates alarms when SQL update failures or network problems occur, and keeps records. Data will be restored in a unified manner.