When making REST or Dubbo remote calls, client-side errors and socket timeouts are unavoidable. But when the client sees a timeout, did the server execute the request or not? High-availability designs often assume that the server went down before it handled the call. Is that really true? Starting from these two questions, this post analyzes how timeouts affect services.

When a timeout is caused by the server going down, there are three points in time to analyze: the server goes down before the request arrives, while it is processing the request, or after it has finished processing the request.

The server goes down before the request arrives

For example, the connection between the client and the server is broken. The client call fails with a socket timeout error, but no data is affected, so the client can simply retry.
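
Because nothing reached the server in this case, a bounded retry is enough. Below is a minimal sketch of that idea, not a Dubbo or REST client API: `call_with_retry` and `FlakyConnection` are hypothetical names introduced only for illustration, and the attempt count and delay are assumptions.

```python
import socket
import time


def call_with_retry(call, attempts=3, delay_seconds=0.0):
    """Invoke `call` and retry on socket timeout, up to `attempts` tries in total."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except socket.timeout as exc:  # alias of TimeoutError on Python 3.10+
            last_error = exc
            time.sleep(delay_seconds)
    # Every attempt timed out: surface the last timeout to the caller.
    raise last_error


class FlakyConnection:
    """Hypothetical client whose first `failures` requests never reach the server."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def request(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise socket.timeout("request never reached the server")
        return "ok"
```

Retrying blindly like this is only safe because the request had no effect on the server; the later cases show where it stops being safe.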

The server goes down while processing the request

The outcome here depends on how far the server got in processing the request, which is closely tied to the specific business logic of the service, so the cases are discussed separately below:

The server is MySQL

A MySQL server processes a statement in the following steps:

  1. Query cache
  2. Lexical and syntax analysis (parsing)
  3. Query optimizer
  4. Execution plan
  5. Storage engine execution

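Steps 1-5 are all read operations with no persistent writes. If the server goes down during this period, the client can safely retry after receiving the timeout.

  6. Write the redo log and mark the transaction as prepared.
  7. If the buffer pool (InnoDB's page cache) does not contain the record, perform a read I/O to load it into the buffer pool.
  8. Write the binlog, then commit in the redo log. (If MySQL crashes around the binlog write, the commit may be missing or inconsistent, so how is atomicity guaranteed?)

Steps 6-8 are the write operations of MySQL’s InnoDB engine. If MySQL restarts during step 6, the client receives a socket timeout and can show a “please try again” message; nothing has been persisted, so there is no problem. If MySQL restarts during step 7, crash recovery restores committed transactions from the redo log, but a transaction that is only in the prepared state and has no binlog record is rolled back, so that write is lost. A retry also solves this case. If MySQL restarts during step 8, the result depends on how far the commit got: InnoDB coordinates the binlog write and the redo log commit with an internal two-phase commit, and on recovery a prepared transaction is committed if its binlog event is complete and rolled back otherwise. If the two logs are not flushed to disk on every commit (for example, sync_binlog or innodb_flush_log_at_trx_commit is not set to 1), they can still diverge after a crash, and primary and replica data can become inconsistent. In every one of these cases the client only sees a timeout and cannot tell whether the write was committed.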
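
The recovery rule for a single transaction can be written down as a small decision function. This is a simplified model for illustration only, assuming both logs are flushed durably on every commit; `Outcome` and `recover` are hypothetical names, not part of MySQL.

```python
from enum import Enum


class Outcome(Enum):
    NOTHING_TO_RECOVER = "nothing to recover"
    ROLLED_BACK = "rolled back"
    COMMITTED = "committed"


def recover(redo_prepared, binlog_complete, redo_committed):
    """Simplified model of the crash-recovery decision for one transaction."""
    if redo_committed:
        # The commit record is already in the redo log.
        return Outcome.COMMITTED
    if not redo_prepared:
        # The crash happened before step 6 finished; nothing was persisted.
        return Outcome.NOTHING_TO_RECOVER
    # Prepared but not committed: the binlog is the tie-breaker.
    if binlog_complete:
        return Outcome.COMMITTED
    return Outcome.ROLLED_BACK
```

For example, `recover(True, False, False)` models a crash between steps 6 and 8 and yields a rollback, while `recover(True, True, False)` models a crash after the binlog write and yields a commit, even though the client saw a timeout in both cases.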

The server is Dubbo Provider

For example, suppose the server is an inventory deduction service and the provider restarts while the call is in progress. The final state of the data depends on the provider's logic. Typically, a provider deducts inventory in the database, then writes an order, and so on. This kind of service can also return “please try again” when a timeout occurs, but it must use database transactions to keep the data consistent.
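
To make the role of the transaction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The table names, column names, and the `fail_before_order` flag are assumptions made up for this example; a real provider would run the same pattern against its own schema.

```python
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, sku TEXT NOT NULL)")


def place_order(conn, order_id, sku, fail_before_order=False):
    """Deduct one unit and record the order in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        updated = conn.execute(
            "UPDATE inventory SET quantity = quantity - 1 WHERE sku = ? AND quantity > 0",
            (sku,),
        )
        if updated.rowcount != 1:
            raise ValueError("out of stock")
        if fail_before_order:
            # Simulates the provider dying between the two writes.
            raise RuntimeError("provider restarted")
        conn.execute("INSERT INTO orders (order_id, sku) VALUES (?, ?)", (order_id, sku))
```

If the failure happens between the two writes, the deduction is rolled back together with the missing order, so a later retry starts from a consistent state.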

The server goes down after it finishes processing the request

In this scenario, the provider has finished processing the call, but the server restarts while the TCP response is being returned. The response is never sent, so the client reports a timeout. This case cannot be solved by a simple retry. Before writing, the client needs to check whether the data has already been written: if it exists, the client reads it back instead; if not, it writes it. This requires a primary key (or another unique key) that identifies the data. Without such a uniqueness check, the retry fails with a primary key conflict error.
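
The check-then-write step can be sketched as follows, again with `sqlite3` standing in for the real database. The `writes` table, the `request_id` key, and the function names are hypothetical and exist only for this example.

```python
import sqlite3


def open_store():
    """Create an in-memory store whose primary key identifies each logical write."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE writes (request_id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def write_once(conn, request_id, payload):
    """Insert the row unless `request_id` already exists; return the stored payload."""
    with conn:
        row = conn.execute(
            "SELECT payload FROM writes WHERE request_id = ?", (request_id,)
        ).fetchone()
        if row is not None:
            return row[0]  # an earlier attempt already succeeded
        conn.execute(
            "INSERT INTO writes (request_id, payload) VALUES (?, ?)",
            (request_id, payload),
        )
        return payload
```

Calling `write_once` twice with the same `request_id` leaves a single row, whereas a second plain `INSERT` with that key raises `sqlite3.IntegrityError`. The primary key also acts as the last line of defense if two retries race past the check at the same time.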