Preface
While the company's product was running in production, some users reported that they had submitted a single accounting request for 100 yuan. The transaction completed normally at the front end, but the back end recorded two entries of 100 yuan, causing a very dangerous short-payment accounting problem.
After verification by several parties, it was confirmed that the client had sent the request twice and had received only the response to the second one, which caused the problem.
Rather than starting by asking why the server side had not implemented idempotency and duplicate-request protection, we began with self-examination and first analyzed whether the client really had a problem. The analysis and troubleshooting process below is the result.
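For context, the server-side idempotency and duplicate-request protection mentioned above could look roughly like the following TypeScript sketch. It assumes the client attaches a unique request ID to every accounting request; `IdempotentLedger`, `record` and `requestId` are hypothetical names, and a real service would keep this state in a database rather than in memory.

```typescript
// Hypothetical sketch: deduplicate accounting requests by a client-supplied ID.
type LedgerResult = { outcome: "recorded" | "duplicate"; entryId: number };

class IdempotentLedger {
  // requestId -> index of the entry that was recorded for it
  private readonly seen: Record<string, number> = {};
  // recorded amounts, in order of arrival
  private readonly entries: number[] = [];

  // Record an amount once per requestId; a replay returns the original entry.
  record(requestId: string, amount: number): LedgerResult {
    if (Object.prototype.hasOwnProperty.call(this.seen, requestId)) {
      return { outcome: "duplicate", entryId: this.seen[requestId] };
    }
    const entryId = this.entries.push(amount) - 1;
    this.seen[requestId] = entryId;
    return { outcome: "recorded", entryId };
  }

  // Sum of all amounts that were actually recorded.
  total(): number {
    return this.entries.reduce((sum, amount) => sum + amount, 0);
  }
}
```

With such a guard in place, a request that is replayed with the same ID would be recognized and would not be booked a second time.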
Problem analysis
The client uses the CEF framework, based on Chromium 76. Requests are sent with the axios library, and the front end is built with Vue.js 2.0.
1. Summarize the overall network deployment architecture diagram:
Figure 1. Network deployment architecture diagram
2. Following the network architecture in the figure above, collect the production logs from back to front. First, compare the server logs of the two requests; they turn out to be completely identical.
3. The service gateway logs show that the two requests arrived 19 s apart; no other information is found.
Figure 2. Service gateway log screenshot
4. The reverse proxy server log shows that a WebSocket re-handshake and reconnection attempt occurred between the two requests:
Figure 3. Reverse proxy server log screenshot
The log shows that the network was disconnected between the first and second POST requests, which are identical, and that the WebSocket was reconnected.
5. The client logs confirm that a network disconnection did occur. The WebSocket tried to reconnect and eventually succeeded.
Figure 4. Screenshot of client log
6. The client logs in the figure above show that, after the outgoing request was logged, the client recorded no HTTP exceptions; only WebSocket exceptions were logged.
To sum up: the client application layer initiated only one HTTP request, and the network fluctuated at that moment. The reverse proxy server recorded two HTTP requests from the client, with a WebSocket reconnection in between, and the server also received two requests. The next step is therefore to find out what caused the duplication between the client and the reverse proxy server.
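The log comparison in steps 2 and 3 can also be done mechanically. As a rough TypeScript sketch, assuming the gateway log entries have already been parsed into objects (the `RequestLogEntry` shape and `findDuplicateRequests` are hypothetical names, not part of any tool used here), identical requests arriving within a time window can be flagged like this:

```typescript
// Hypothetical log shape; real gateway logs would need to be parsed first.
interface RequestLogEntry {
  timestampMs: number; // when the gateway saw the request
  method: string;
  path: string;
  body: string; // raw request body as logged
}

// Return pairs of entries with identical method, path and body
// whose timestamps are at most windowMs apart.
function findDuplicateRequests(
  entries: RequestLogEntry[],
  windowMs: number
): Array<[RequestLogEntry, RequestLogEntry]> {
  // Sort a copy by time so that each entry is compared with its predecessor.
  const sorted = entries.slice().sort((a, b) => a.timestampMs - b.timestampMs);
  const lastSeen: Record<string, RequestLogEntry> = {};
  const pairs: Array<[RequestLogEntry, RequestLogEntry]> = [];

  for (const entry of sorted) {
    const key = [entry.method, entry.path, entry.body].join("\n");
    const previous = Object.prototype.hasOwnProperty.call(lastSeen, key)
      ? lastSeen[key]
      : undefined;
    if (previous !== undefined && entry.timestampMs - previous.timestampMs <= windowMs) {
      pairs.push([previous, entry]);
    }
    lastSeen[key] = entry;
  }
  return pairs;
}
```

A window somewhat larger than the 19 s gap seen in the gateway log would be needed to catch the pair from this incident.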
Reproducing the scene
1. Who performed the retransmission between the client and the reverse proxy server and caused the problem?
2. Nginx re-requests the backend service when an exception occurs (see: Nginx retransmission mechanism – yxy_Linux – Blogpark (cnblogs.com)). However, in the architecture described at the beginning, the Nginx parameters have been modified. Retransmission does occur during testing, but only one entry is recorded in the gateway log, which is inconsistent with what was observed in production.
3. The next suspect was a retransmission by axios. There are related posts on GitHub:
Axios send the same request twice and ignore the first response, only receives the second response. · Issue #2825 · axios/axios · GitHub
Looking at the axios source code, the underlying send is implemented with XMLHttpRequest, and there is no retransmission logic anywhere in the axios request flow.
4. Having ruled out the Nginx and axios mechanisms, we turned our attention to the HTTP protocol. An online search showed that HTTP has a retransmission process (see: HTTP request retransmission – SegmentFault).
Try to reproduce the problem based on the scenarios above.
5. Disconnect the client network and try to reproduce. The client sends a POST request directly with axios. After receiving the request, the server sleeps for several seconds to simulate business processing and then returns. Wireshark is used to capture packets and observe the traffic between client and server while trying to reproduce the problem.
The client clicks the button to send a POST request, the client's network cable is unplugged immediately, and after a few seconds it is plugged back in. The client immediately receives a Disconnect exception. No such exception occurred in the production environment, so this scenario does not match.
Figure 5. Browser test screenshot
6. Disconnect the server network and try again. The client clicks the button to send the POST request. Once the server has received the request, the server's network cable is unplugged immediately and plugged back in after a few seconds, while the client's Network panel and the packet capture are observed:
The client's Network panel shows that two requests were sent: an OPTIONS preflight request and the normal POST. After a few milliseconds the OPTIONS preflight returns, while the POST request remains pending (Figure 6).
Figure 6. Browser test screenshot
The Wireshark capture shows that after the server's network cable is unplugged, left out for a while and plugged back in, the client sends the request again. The client application is unaware of this second request, and the Network panel still shows the original request as pending. The server then processes the second request and returns the response, and the client receives a normal response with status code 200. The scene is completely consistent with production, and the problem reproduces 100% of the time (Figures 7 and 8).
Figure 7. Wireshark test screenshot
Figure 8. Browser test screenshot
Conclusion:
1. To summarize the analysis above: the client browser sent a request over HTTP/1.1; when the server's network was abnormally disconnected, a RESET was sent to the browser, and the browser spontaneously sent the request a second time without the application layer being aware of it. The next step is to consider why the browser does this, leaving the application layer believing that it sent one request and received one response, which was in fact the response to the second request.
2. First of all, the Chromium 76 kernel on the platform uses HTTP/1.1 by default, with Connection: keep-alive. This connection mode keeps the TCP connection open: a TCP connection is established only once (as shown in Figure 7), and multiple requests can be sent over it without disconnecting. The keep-alive mechanism reduces the number of TCP connections established, which also means fewer connections in TIME_WAIT. This improves performance and increases the throughput of the HTTP server (fewer TCP connections mean fewer kernel calls such as socket accept() and close()). Can this problem be solved by turning off keep-alive?
3. The browser does not allow the Connection: keep-alive header to be modified, so apart from keep-alive, what else could cause the duplicate request? The Wireshark capture and the browser's Network panel both show that an OPTIONS request is sent before the normal POST request. The browser first sends a preflight request with the OPTIONS method to check whether the interface can be reached. If the check fails, it does not send the real request; if the check succeeds, it sends the real request. So could the OPTIONS request have had some effect?
4. From this angle, we first considered and verified which Content-Type values cause an OPTIONS request to be sent. We used a plain XMLHttpRequest to send a POST request to the server and captured it in Wireshark. The plain XMLHttpRequest does not send a second request even when the server is disconnected; only axios sends the second request. The next step is to compare the captured request headers in the two cases.
Figure 9. Wireshark test screenshot
5. Comparing the two requests (Figure 9), XMLHttpRequest sends Content-Type text/plain, while axios sends Content-Type application/json. We then manually changed the default application/json Content-Type of axios to text/plain and found that the browser no longer sends the second request automatically after the server network is disconnected again (Figures 10 and 11).
Figure 10. Wireshark test screenshot
Figure 11. Browser test screenshot
6. Therefore, the root cause of the two requests received by the whole chain behind the reverse proxy server is Content-Type: application/json. With this content type, the browser automatically resends the request once the server network is restored, and the application layer is not aware of it.
Solution:
So what kinds of Content-Type are there, and how are they defined and categorized? It turns out that HTTP requests are divided into simple requests and complex (preflighted) requests. Simple requests do not trigger an OPTIONS preflight. A simple request must meet the following conditions: the method is GET, HEAD or POST; no headers are set manually other than the CORS-safelisted ones (such as Accept, Accept-Language, Content-Language and Content-Type); and the Content-Type is one of the values listed below.
Therefore, the Content-Type of a simple request is text/plain, multipart/form-data or application/x-www-form-urlencoded.
After repeated validation with various Content-Type values and different browsers (Chrome and Firefox), we summarized the cases as follows:
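The conditions for a simple request can be approximated in code. The following is a minimal TypeScript sketch, assuming only the method and header checks from the CORS rules; `needsPreflight` and the constant names are hypothetical, and restrictions on header values (for example on Range) are not checked.

```typescript
// Methods that a simple request may use.
const SIMPLE_METHODS = ["GET", "HEAD", "POST"];
// Header names that may be set without triggering a preflight
// (value restrictions, e.g. on Range, are not checked in this sketch).
const SAFELISTED_HEADERS = ["accept", "accept-language", "content-language", "content-type", "range"];
// Content-Type values allowed for a simple request.
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];

// Rough check of whether a cross-origin request would trigger an OPTIONS preflight.
function needsPreflight(method: string, headers: Record<string, string>): boolean {
  if (SIMPLE_METHODS.indexOf(method.toUpperCase()) === -1) {
    return true;
  }
  for (const headerName of Object.keys(headers)) {
    const lower = headerName.toLowerCase();
    if (SAFELISTED_HEADERS.indexOf(lower) === -1) {
      return true;
    }
    if (lower === "content-type") {
      // Compare only the MIME type, ignoring parameters such as "; charset=utf-8".
      const rawValue = headers[headerName] || "";
      const mime = (rawValue.split(";")[0] || "").trim().toLowerCase();
      if (SIMPLE_CONTENT_TYPES.indexOf(mime) === -1) {
        return true;
      }
    }
  }
  return false;
}
```

Under these assumptions, a POST with application/json is classified as preflighted, while the same POST with text/plain or application/x-www-form-urlencoded is classified as simple, which matches the OPTIONS requests seen in the captures.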
Content-Type | Chrome resends a second time by default | Firefox resends a second time by default |
---|---|---|
application/json | yes | yes |
text/plain | no | yes |
application/x-www-form-urlencoded | no | no |
multipart/form-data | not tested | not tested |
Finally, we concluded that the most suitable Content-Type for the client's HTTP requests is application/x-www-form-urlencoded.
The final solution is as follows:
The Content-Type of the client's axios requests was manually changed to application/x-www-form-urlencoded. Since the body is now form data, the qs library is needed to serialize the body.
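As an illustration of the body serialization this change requires, here is a minimal TypeScript sketch that encodes a flat object as application/x-www-form-urlencoded without any dependency. `toFormUrlEncoded` and `buildFormRequest` are hypothetical helper names; the actual fix relies on qs, and nested objects or arrays are not handled here.

```typescript
type FormValue = string | number | boolean;

// Serialize a flat object into key=value pairs joined by "&",
// percent-encoding both keys and values.
function toFormUrlEncoded(body: Record<string, FormValue>): string {
  return Object.keys(body)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(body[key])))
    .join("&");
}

// Build the header and payload that would be handed to the HTTP client.
function buildFormRequest(
  body: Record<string, FormValue>
): { headers: Record<string, string>; data: string } {
  return {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    data: toFormUrlEncoded(body),
  };
}
```

The result would then be passed to the HTTP client, for example `axios.post(url, req.data, { headers: req.headers })`, with `qs.stringify(body)` taking the place of `toFormUrlEncoded` in the real code.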
Open questions
What are the underlying causes of browser retransmissions, and which requests does the browser retransmit and which does it not?