Introduction: A CDN caches all kinds of Internet content on edge servers close to users by deploying those servers at a distributed, global scale. This reduces user access latency and greatly reduces traffic across the Internet's core network. Using a CDN has become an inevitable choice for Internet businesses.
Challenges facing enterprise edge applications
Traditional website protection focuses on the origin site: customers buy firewalls, WAFs, and similar products to keep their core business content from being maliciously scraped. However, this traditional defense model cannot fully meet the needs of services whose traffic is distributed through a CDN:
- Deployment position: traditional protection sits in front of the origin site to protect the origin. In a CDN architecture, pages are largely cached on the CDN, so crawlers can scrape sensitive business data directly from the CDN without ever reaching the origin.
- Identification method: it mainly relies on embedding JavaScript in the user's pages, which modifies the pages and is highly intrusive. It also only works for Web traffic and has no effect on API traffic.
- Disposal method: it generally applies frequency control to high-frequency IPs and similar features, which is easy to bypass. Crawlers now commonly use IP proxy pools and randomize request header fields, leaving few stable features for frequency control (see the sketch after this list).
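To illustrate why per-IP frequency control is so easily bypassed, here is a minimal sketch of a naive rate limiter; the thresholds and proxy pool are hypothetical values chosen for illustration, not the behavior of any real product. A crawler that rotates through a large proxy pool spreads its requests across many source IPs, so no single IP ever crosses the threshold.

```python
import time
from collections import defaultdict

# Hypothetical per-IP frequency control: at most LIMIT requests per WINDOW seconds.
LIMIT = 100
WINDOW = 60

_hits = defaultdict(list)  # source IP -> recent request timestamps

def allow_request(source_ip, now=None):
    """Return True if the request stays within the per-IP rate limit."""
    now = time.time() if now is None else now
    recent = [t for t in _hits[source_ip] if now - t < WINDOW]
    recent.append(now)
    _hits[source_ip] = recent
    return len(recent) <= LIMIT

# A crawler rotating through a proxy pool: each IP only sees a handful of
# requests, so the limiter never fires -- the bypass described above.
proxy_pool = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
blocked = sum(not allow_request(proxy_pool[i % len(proxy_pool)])
              for i in range(50_000))
print(f"blocked {blocked} of 50000 crawler requests")  # typically 0
```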
CDN now carries a large volume of master-site business, so it must preserve the browsing and trading experience while preventing content from being maliciously scraped. As more and more business data is cached on CDN edge servers, the weight of edge security keeps growing. Machine traffic management based on the edge cloud emerged to address these hidden dangers in CDN edge security and protect user application data.
Implementation and advantages of edge-cloud machine traffic management
The analysis and processing flow of machine traffic management on a CDN edge node is shown in the figure below:
Internet access generally falls into normal user access, commercial search engine access, malicious crawler access, and so on. Machine traffic management extracts request packet features at the edge, identifies the request type from those features, blocks malicious crawler access at the edge, and protects the resources cached on the CDN from malicious crawling.
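A minimal sketch of this edge-side classification is shown below. The actual feature set used by the CDN is not public; the header order, User-Agent, and TLS fingerprint string used here are assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, Set

KNOWN_SEARCH_ENGINES = {"Googlebot", "bingbot", "Baiduspider"}

def fingerprint(headers: Dict[str, str], tls_fp: str) -> str:
    """Derive a stable request-type ID from packet-level features
    (assumed features: header order, User-Agent, TLS fingerprint)."""
    header_order = ",".join(headers.keys())
    raw = "|".join([header_order, headers.get("User-Agent", ""), tls_fp])
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

def classify(headers: Dict[str, str], tls_fp: str,
             malicious_types: Set[str]) -> str:
    """Label a request as search_engine, malicious_bot, or normal."""
    ua = headers.get("User-Agent", "")
    if any(bot in ua for bot in KNOWN_SEARCH_ENGINES):
        return "search_engine"
    if fingerprint(headers, tls_fp) in malicious_types:
        return "malicious_bot"
    return "normal"
```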
Advantages of machine traffic management are as follows:
- It builds machine traffic management on top of the CDN edge network architecture. The request type of each domain name is identified from the characteristics of the request packets, distinguishing normal from malicious machine requests, which helps users manage their own traffic and block malicious requests.
- By identifying request types per domain name and tagging request packet types in real time, it gives an intuitive view of the current distribution of request types. Customers can directly see the access-type distribution of their own websites and handle abnormal packet types.
- Because disposal is keyed on the packet type rather than the IP address, an attacker cannot bypass it by randomizing header fields or using a per-second dial-up proxy IP pool, as long as the packet type of the malicious request remains unchanged (see the sketch after this list).
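The sketch below illustrates disposal keyed on request type rather than source IP; the action names and policy table are illustrative assumptions. Because the decision never consults the source IP, rotating proxy IPs or randomizing header values does not change the outcome while the fingerprint stays the same.

```python
# Hypothetical policy table: request type -> action.
POLICY = {
    "search_engine": "allow",
    "normal": "allow",
    "malicious_bot": "block",
}

def dispose(request_type: str) -> str:
    """Choose an action from the request type alone; the source IP
    is not an input, so IP rotation does not affect the decision."""
    return POLICY.get(request_type, "allow")
```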
Verifying the actual results of machine traffic management
In the Double 11 service scenario, machine traffic management identified all traffic visiting the master site's detail pages and classified the bot traffic. The core strategy was to allow legitimate commercial crawlers such as search engines while restricting or intercepting malicious crawlers.
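One common way to allowlist legitimate search-engine crawlers, shown as a minimal sketch below, is a reverse-then-forward DNS check on the claimed crawler's IP. The trusted domain suffixes here are assumptions for illustration, not the CDN's actual allowlist.

```python
import socket

TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".baidu.com")

def is_verified_search_engine(client_ip: str) -> bool:
    """Verify a claimed search-engine crawler via reverse DNS, then
    confirm the hostname resolves back to the same IP (forward DNS)."""
    try:
        host = socket.gethostbyaddr(client_ip)[0]
        if not host.endswith(TRUSTED_SUFFIXES):
            return False
        return client_ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```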
Analysis of detail-page traffic and request behavior found that nearly 40% of the requests were malicious. Before Double 11, by enabling the disposal policy, one master-site business successfully intercepted more than 70% of crawler traffic. The figure below compares traffic before and after the disposal policy was enabled: the blue line shows the traffic trend before the policy was enabled, and the green line shows the trend after. The interception effect is very obvious, and actual services were not affected.
On November 11, the access characteristics of the requests remained basically unchanged. In the end, hundreds of millions of malicious requests, millions of malicious IP addresses, and tens of millions of malicious commodity IDs were intercepted.
As CDN machine traffic management took on more protection of master-site services, it was found that some requests crawling master-site content could pass through the protection policy, meaning the crawling behavior had mutated. Analysis of online QPS showed that the mutated crawler mainly used the IE browser engine, and its source IPs made heavy use of per-second dial-up proxy IPs, showing clear commercial-crawler characteristics. After this was reported, an emergency plan was quickly formed and the abnormal type was quickly handled.
This article is original content from Aliyun and may not be reproduced without permission.