▌ Overview

The network boundaries of an Internet company above a certain scale are usually complex and span multiple zones. Protecting and controlling them effectively is both a focus and a difficulty in building a security system. To meet this challenge, the iQIYI security team developed the network traffic analysis engine QNSM and applied it to a range of cross-zone security detection and control scenarios based on traffic analysis. It has become a key foundational engine of iQIYI's security defense system. This article is compiled from the talk "iQIYI Network Traffic Security Detection Capacity Construction Practice" given by Lu Mingfan, senior director at iQIYI, at QCon 2019.

 

▌ The boundary complexity of Internet enterprises

The figure above shows the network architecture of a typical medium or large Internet company, which is usually divided into:

· Office networks, possibly including many small branch offices (shown in the lower right corner), some as small as a single office room.

· Core data centers. These may be distributed across the country and are interconnected through dedicated lines. Private clouds are built on top of them.

· CDN network. The CDN nodes are also connected to the core data centers to some extent.

· Public cloud. If public cloud infrastructure services are used, they are interconnected with the core data centers, forming a hybrid cloud.

· Many of the above partitions are connected to the Internet in different ways.

· Finally, the emergence of BYOD, wireless hotspots, and mobile wireless hotspots creates a large number of fragmented "new boundaries".

This brings new challenges for enterprise security defense:

· Security defenses become fragmented and multi-layered, and large boundaries may contain many smaller boundaries.

· Traffic at a single boundary may be very large, especially at Internet interconnection boundaries, where more than 100 Gbps is the norm. Traffic detection at these boundaries must be high-performance and must scale horizontally to keep up with growing boundary traffic.

· Enterprises still face severe internal and external threats, such as traffic-based attacks, vulnerability exploits, boundary penetration, and data leakage, sabotage, and control by insiders.

· The security defense system keeps evolving: from simple boundary defense, to defense in depth combined with multi-level internal defenses, to security models based on zero trust or assumed breach that tightly bind roles and permissions. All of these push our defenses to keep upgrading and iterating against increasingly severe threats. This evolution does not mean the boundary disappears. Boundary defense remains a basic and important link, and effective traffic analysis and control has become both an important data source and a control point that can be orchestrated together with other defenses in the new system.

Therefore, we developed the bypass traffic analysis engine QNSM (iQIYI Network Security Monitor). By jointly analyzing service traffic and bypass traffic and integrating various schedulable defense capabilities, we can respond in a coordinated way to attacks of different types and levels. QNSM is also connected to the security operations system, gradually forming a closed loop that covers access, emergency plans, response, coordination, defense, and tracing.

▌ QNSM overview

Full traffic analysis is very important. It can be used for asset discovery and for network monitoring and visualization. For security, analyzing network traffic lets us build traffic baselines and models, find anomalies and risks, and detect attacks. It also lets us extract data content from traffic and discover flows or leaks of potentially sensitive data. In addition, ACL policies can be verified against real traffic, and feature data generated from network traffic can be analyzed by machine learning and by experts to mine more information and to support forensic tracing and event lookback.

The design goal of QNSM (iQIYI Network Security Monitor) is to be a full-traffic, real-time, high-performance network security monitoring engine. High performance, real-time operation, scalability, and multivariate feature extraction are the key properties we need.

· High performance: built on DPDK (www.dpdk.org/), it can process more than 10 Gbps of traffic on ordinary x86 servers by bypassing the kernel's complex protocol stack and using polling to send and receive packets. It uses a master-slave, pipelined architecture with a lock-free design.

· Strong scalability: bypass deployment, combined with optical splitting and traffic distribution, supports fast horizontal scaling. The pipeline design makes it easy to insert custom components, and a unified resource management model based on configuration files (covering queues, CPUs, MEMPOOLs, and so on) makes it quick to set up the data exchange network and speeds up development.

· Multiple features: multi-dimensional DDoS detection feature data, basic DFI/DPI support, Suricata IDPS integrated as a library, and IPv4/IPv6 dual-stack support.

· Real-time: the integrated IDPS detects in real time, and DDoS detection outputs aggregated data in 10 s (adjustable) cycles. The extracted multivariate features can be delivered through Kafka to the security intelligence analysis service for analysis.

▌ QNSM Architecture Design

As shown in the figure above, QNSM runs as a service on ordinary multi-core x86 servers. Each server can have multiple 10-Gigabit interfaces and receives the traffic to be analyzed through optical splitting or switch port mirroring. If the traffic volume is larger, a traffic splitter can further distribute it across different servers for analysis. QNSM uses DPDK to take advantage of multiple multi-core processors and multiple multi-queue NICs for high-speed packet processing. Its high performance comes from:

· Zero-copy

· Prefetching and batched packet reception to reduce cache misses and improve throughput

· A share-nothing design, which avoids locks and CPU context switches

· Full use of the NIC's RSS (Receive Side Scaling) feature, binding each packet receive queue to a CPU core
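The interplay between RSS and the share-nothing design can be illustrated with a small sketch. It is not QNSM code: real NICs compute RSS in hardware (typically with a Toeplitz hash), while this example substitutes CRC32 over a sorted 5-tuple, and the names `flow_queue`, `receive`, and `NUM_QUEUES` are made up for illustration. The point is only that when every packet of a flow lands on the same queue, the core that owns that queue can keep its state private and update it without locks.

```python
import zlib
from collections import defaultdict

NUM_QUEUES = 4  # assumption: one receive queue per worker core


def flow_queue(src_ip, src_port, dst_ip, dst_port, proto, num_queues=NUM_QUEUES):
    """Map a 5-tuple to a queue index; both directions of a flow get the same index."""
    # Sort the two endpoints so that (A -> B) and (B -> A) hash identically.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    # CRC32 stands in for the NIC's hardware hash in this sketch.
    return zlib.crc32(key) % num_queues


# One private table per "core": no other core touches it, so no lock is needed.
per_core_packets = [defaultdict(int) for _ in range(NUM_QUEUES)]


def receive(pkt):
    """Account one packet (a 5-tuple) on the core that owns its queue."""
    q = flow_queue(*pkt)
    per_core_packets[q][pkt] += 1
    return q
```

For example, `flow_queue("10.0.0.1", 1234, "10.0.0.2", 80, 6)` and the reversed tuple return the same queue index, so request and response packets of one connection are counted by the same core.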

1. Base library

On top of DPDK, QNSM encapsulates a set of base libraries that provide the basic capabilities for building the upper-layer pipeline, including:

· PORT: a logical encapsulation of NIC queues and the ring queues between cores. It is the basis for parallel processing and linear scaling.

· MSG: encapsulates the messages passed between CPU cores, with callback support. Messages can be policy messages or data messages. These non-blocking inter-core messages support one-to-one and one-to-many communication, which separates the data plane from the control plane and separates data set output from packet processing.

· ACL: a policy description based on the 5-tuple, with callback support. For example, it specifies whether packets matching a policy are aggregated, processed, dumped, and so on.

· TBL: encapsulates DPDK's rte_hash table, stores data sets, provides a CRUD interface, and allocates entry resources from a mempool.

· SCHED: an encapsulation of worker threads that supports custom packet processing, policy execution, custom calculation logic, message distribution, timed callbacks, and so on.
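The idea behind the ACL library, a 5-tuple policy whose match triggers a callback, can be sketched as follows. This is an illustrative model only and does not reflect QNSM's actual C API; `Rule`, `apply_acl`, and the wildcard convention (`None` matches anything) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    # (src_ip, dst_ip, src_port, dst_port, proto); None acts as a wildcard.
    pattern: tuple
    # Callback invoked with the packet's 5-tuple when the rule matches.
    action: Callable[[tuple], str]

    def matches(self, pkt: tuple) -> bool:
        return all(want is None or want == got
                   for want, got in zip(self.pattern, pkt))


def apply_acl(rules, pkt) -> Optional[str]:
    """Run the callback of the first matching rule; return None if none match."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action(pkt)
    return None


# Hypothetical policies: aggregate TCP traffic to one VIP on port 80, dump all UDP.
rules = [
    Rule((None, "10.0.0.2", None, 80, 6), lambda pkt: "aggregate"),
    Rule((None, None, None, None, 17), lambda pkt: "dump"),
]
```

A linear scan is enough to show the concept. A production engine handling line-rate traffic would need a much faster classifier.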

2. Pipeline

QNSM builds different pipeline components to serve security applications in different scenarios. To support more scenarios, further pipeline components can be built on the base libraries, giving a diverse set of network traffic processing capabilities.

· SESSM: responsible for packet parsing, flow data aggregation, packet replication and forwarding, processing policy messages, and so on.

· SIP_AGG: aggregates and outputs features by source IP address. It is enabled by policy messages during an attack.

· VIP_AGG: self-learns target VIPs (the business IPs that need protection), then extracts features per target VIP and outputs them to EDGE.

· DUMP: saves packets as PCAP files for later event tracing. It is enabled by policy messages during an attack.

· EDGE: outputs the multidimensional data sent by upstream components to an external Kafka for further analysis.

· DETECT: integrates the Suricata library and supports IDPS detection.

Using the existing QNSM pipeline components, we have applied it to DDoS attack detection, IDPS detection and protection, traffic monitoring, network DLP, and other scenarios, and it supports the development of a variety of upper-layer security products. You can design and plug in your own components according to the needs of your own security applications.
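To make the role of an aggregation stage such as VIP_AGG more concrete, here is a minimal sketch of per-VIP aggregation over fixed time windows. It assumes the 10 s (adjustable) cycle mentioned earlier and counts only packets and bytes. The class name `VipAggregator` and the record layout are hypothetical, and the finished records are simply returned where the real pipeline would hand them to EDGE for output to Kafka.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # the article describes the cycle as adjustable


class VipAggregator:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.window_start = None
        self.stats = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def add(self, ts, dst_ip, length):
        """Account one packet; return the previous window's records if it just closed."""
        flushed = []
        if self.window_start is None:
            self.window_start = ts - ts % self.window
        if ts >= self.window_start + self.window:
            flushed = self.flush()
            self.window_start = ts - ts % self.window
        entry = self.stats[dst_ip]
        entry["packets"] += 1
        entry["bytes"] += length
        return flushed

    def flush(self):
        """Emit one record per VIP for the current window and reset the counters."""
        records = [
            {"vip": vip, "window_start": self.window_start, **counters}
            for vip, counters in sorted(self.stats.items())
        ]
        self.stats.clear()
        return records
```

Feeding packets at timestamps 100 and 105 for one VIP and then a packet at 111 returns a single record for the window starting at 100, with 2 packets, on the third call.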

3. Control layer

Master is the main controller of the engine. It receives policy messages from the management platform through Kafka and delivers them to the pipeline components to configure and control the pipeline.

4. Security applications

The figure above shows how iQIYI applies QNSM to meet various security requirements. iQIYI's QNSM service nodes are distributed at the boundaries of each network partition and are managed and maintained through the edge control center (Aegis). As the core service of iQIYI network security protection, Aegis has the following functions:

· Manage and configure all QNSM clusters, and control the interaction with QNSM through Kafka.

· Manage and configure the IDPS (Suricata) integrated in QNSM through the IDPS gateway.

· Serve as the unified back-end service for iQIYI WAF, which is not covered in detail here.

· iQIYI's internal security data analysis engine, combined with threat intelligence and other external data, analyzes the data that the QNSM clusters' EDGE components output to Kafka and sends the resulting network attack events to the edge control center. The edge control center then connects with the situational awareness system and the security operations system according to policy, achieving closed-loop operation.

· Driven by events, the edge control center can have QNSM dump network traffic into PCAP files, which are then loaded (ETL) into Moloch (github.com/aol/moloch, a large-scale, open source, indexed packet capture and search system) to build a packet index. This facilitates packet-level analysis and tracing of events.

We will introduce the structure and design of the edge control center in a later article and will not go into it here. Below, we briefly describe how we use QNSM for DDoS attack detection and for extended IDPS support.

4.1 DDoS Attack Detection

When a service is onboarded to the edge control center, it provides the virtual IP addresses (VIPs) to be protected. In addition, QNSM proactively discovers target VIPs in traffic and compares them with the CMDB to find further VIPs that need protection. DDoS attack detection mainly aggregates feature data from the traffic of protected VIPs, which the security big data analysis engine uses to detect and confirm DDoS attacks. The overall approach is to build traffic baselines for the target VIP and for the data center where it is located, compute traffic features, and perform multidimensional anomaly detection to identify attacks. Common baselines include the traffic composition baseline of the VIP and of the data center, the traffic volume baseline of the data center, and the upper bound of traffic. Current traffic is compared with the baselines in real time to construct multidimensional indicator features that represent the deviation between the two. Interpretable models (such as a scorecard model) are then used for detection and judgment, and the baselines and trained models are further revised using positive and negative feedback on events from the operators of the edge control center.
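The baseline-and-scorecard idea can be sketched in a few lines. The article does not say how baselines are computed or which features and weights the scorecard uses, so everything specific below is an assumption for illustration: the exponentially weighted moving average used as the baseline, the feature names, the thresholds, the points, and the alert cut-off.

```python
def update_baseline(baseline, observed, alpha=0.1):
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    return (1 - alpha) * baseline + alpha * observed


def deviation(observed, baseline):
    """Relative deviation of the current value from its baseline."""
    if baseline <= 0:
        return 0.0
    return (observed - baseline) / baseline


# Hypothetical scorecard rows: (feature, deviation threshold, points awarded).
SCORECARD = [
    ("bps", 3.0, 40),
    ("pps", 3.0, 40),
    ("syn_ratio", 1.0, 20),
]
ALERT_SCORE = 60  # assumed cut-off for raising an attack event


def score(observed, baselines):
    """Return the total score and the features that contributed to it."""
    total = 0
    reasons = []
    for feature, threshold, points in SCORECARD:
        if deviation(observed[feature], baselines[feature]) >= threshold:
            total += points
            reasons.append(feature)
    return total, reasons
```

Because each row contributes a fixed number of points, an analyst can read off why an alert fired from the returned list of features, which is the main appeal of a scorecard over an opaque model.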

Once a traffic attack has been identified by the multidimensional anomaly detection described above, the edge control center receives the attack event and alarm, and the following steps are taken immediately:

· The edge control center delivers policy messages to Master through Kafka to manage and direct the pipeline. These include policy messages for packet dump forensics, attack source IP discovery, attack source port extraction, and DFI of reflection attack protocols.

· Master wakes up SIP_AGG and DUMP according to the policy. SIP_AGG aggregates feature data by source IP address, which helps later discovery of attack source IPs. DUMP dumps packets, and the resulting PCAP files are sent to Moloch for indexing and expert analysis.

· The VIP_AGG component aggregates feature data by VIP+SPORT, and the SESSM component performs protocol DFI identification for the attacked VIPs to help determine whether a reflection attack using a particular protocol is under way.

· QNSM outputs the aggregated data to Kafka through the EDGE component. The data serves as a source for security big data analysis and can be used in coordination with other security services.

· After the DDoS attack ends, the edge control center delivers shutdown policy messages to Master through Kafka, which stops the heavyweight components in the pipeline.
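The enable-during-attack, disable-afterwards control flow can be modeled with a small sketch. The article does not document the policy message format, so the JSON layout (`op`, `component`, `vip`) and the `Master` class below are purely hypothetical. Messages are passed in as strings so the example runs without a Kafka broker.

```python
import json

# Components the article describes as enabled only while an attack is ongoing.
ON_DEMAND = {"SIP_AGG", "DUMP"}


class Master:
    def __init__(self):
        # vip -> set of on-demand components currently enabled for that VIP
        self.enabled = {}

    def handle(self, raw_message):
        """Apply one policy message (hypothetical JSON layout); return the VIP's state."""
        msg = json.loads(raw_message)
        vip, component, op = msg["vip"], msg["component"], msg["op"]
        if component not in ON_DEMAND:
            raise ValueError(f"unknown on-demand component: {component}")
        active = self.enabled.setdefault(vip, set())
        if op == "enable":
            active.add(component)
        elif op == "disable":
            active.discard(component)
        else:
            raise ValueError(f"unknown op: {op}")
        if not active:
            # Nothing left running for this VIP: drop its entry entirely.
            del self.enabled[vip]
        return self.enabled.get(vip, set())
```

Keeping the heavyweight components off by default and keyed by VIP means the cost of per-source aggregation and packet dumping is only paid for the addresses that are actually under attack.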

When an attack is detected on inbound traffic to a VIP, it is usually necessary to determine further whether it is a reflection attack. For this we use the VIP+SPORT aggregated feature data produced by QNSM and the DFI protocol identification feature data from the SESSM component. In the security data engine, we compute how traffic volume and packet counts are distributed across source ports and then compute the entropy of each distribution. The lower the entropy, the higher the risk (if all traffic and packets are concentrated on a single port, the entropy is 0), and the higher the share held by a single port, the higher the risk. We combine these with other features to build a multidimensional scorecard model, which finally determines whether the attack is a reflection attack using a particular protocol.
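The entropy calculation can be sketched as follows. This is a minimal illustration rather than the production logic: it uses Shannon entropy in bits, and the function names and the returned feature names are assumptions. The input is a non-empty list of (source port, packet length) pairs observed towards one VIP.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def sport_features(packets):
    """packets: non-empty iterable of (source_port, byte_length) towards one VIP."""
    pkts_by_port = Counter()
    bytes_by_port = Counter()
    for sport, length in packets:
        pkts_by_port[sport] += 1
        bytes_by_port[sport] += length
    top_port, top_bytes = bytes_by_port.most_common(1)[0]
    return {
        "packet_entropy": shannon_entropy(pkts_by_port.values()),
        "byte_entropy": shannon_entropy(bytes_by_port.values()),
        "top_port": top_port,
        "top_port_byte_share": top_bytes / sum(bytes_by_port.values()),
    }
```

Traffic arriving entirely from source port 123, as in an NTP reflection attack, gives an entropy of 0 and a top-port share of 1.0, while traffic spread evenly over eight source ports gives an entropy of 3 bits. Both values would then be inputs to the scorecard rather than verdicts on their own.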

After the attack is confirmed, traffic is diverted according to the emergency plan. iQIYI has built its own private traffic scrubbing center and combines it with cloud scrubbing and carriers' near-source scrubbing to form a three-part scrubbing capability.

4.2 IDPS Capability Integration

Suricata is a network-traffic-based IDPS engine. It uses an extensive rule set to monitor network traffic and raises alerts when intrusion events occur. QNSM integrates Suricata as a library and receives updates to Suricata detection rules issued by the edge control center through the IDPS gateway. Packets copied and forwarded by the SESSM component are processed by Suricata, which is invoked through the DETECT component, to detect intrusions in real time and trigger events and alarms. These are output through Kafka to the security big data analysis engine for further analysis and processing. By integrating Suricata, QNSM is compatible with a large number of open source and custom IDPS rule sets, and rule management stays fully consistent.

Using Suricata's DFI capability, QNSM also quickly gained the ability to identify access traffic to databases, caches, and other cloud services, and it supports extracting file information from traffic (including file name, file size, file type, MD5, and so on), which is output through Kafka to the security big data analysis engine. Finally, the system outputs data leakage and illegal access events to the DLP platform (Green Shield).

iQIYI currently supports monitoring of HTTP, MySQL, Redis, CouchBase, Memcached, MongoDB, Elasticsearch, Kafka, VNC, RSYNC, and other common protocols and related tools, as well as FTP file transfer channels.

▌ Open source

QNSM is currently used at iQIYI in a variety of security detection scenarios, including DDoS attack detection, IDPS, and network DLP. In total, 22+ clusters and 130+ analysis nodes have been deployed, with a total analysis bandwidth capacity of 1 Tbps.

We invite you to work with us to improve QNSM and make it more powerful, and we welcome collaboration and contributions that cover more scenarios than just security applications.

The QNSM project has removed the parts that were deeply integrated with iQIYI's internal platforms, and the core code is open source on GitHub: www.github.com/iqiyi/qnsm. You are welcome to use it, report issues, and submit pull requests.

We use a Git workflow. For details, please refer to the Contributing document in the source repository. More developers are welcome to participate, and you can also reach us through the mailing list: qnsm_developer #qiyi.com (please replace # with @).

▌ Follow-up Planning

In the future, the iQIYI security team will continue to optimize QNSM. The development plan is as follows:

· Improve ease of use and simplify configuration

· Further optimize performance and reduce resource footprint

· Add the ability to analyze non-content and content characteristics of encrypted traffic

· Advanced DPI and DFI capabilities

· Support NetFlow and other standard outputs

· Integrate Bro/Zeek and sandbox analysis

· Bypass TAP (traffic splitting) and inline firewall (filtering) capabilities

· Expand to more scenarios, such as business security and intelligent operations and maintenance

· Gradually open up the cluster management, monitoring, and event handling capabilities of the edge control center

The latest developments will gradually be synchronized to the open source community.
