How to sustain network performance under tens of thousands of concurrent connections
Over the past few decades, the explosive growth of the Internet, ever richer content, and endless DDoS attacks have posed great challenges to network performance and have also driven the rapid development of network infrastructure. As carrier bandwidth has grown, so has the performance of hardware such as CPUs and network adapters. For a long time, however, software performance lagged behind hardware and severely limited applications, which mostly had to cope by piling on more machines, wasting resources and increasing costs.
As software evolved, the C10K problem was solved in the first decade of the new century through multi-threading and event-driven I/O (kqueue, epoll, and so on). In the second decade, however, these techniques were overwhelmed, and new solutions were needed to cope with the growth in traffic.
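To make the event-driven model concrete, here is a minimal sketch in Python that uses only the standard library. `selectors.DefaultSelector` picks epoll on Linux and kqueue on BSD-style systems. The function name `run_echo_once` and the upper-casing "service" are invented for illustration, and a real server would accept connections from a listening socket instead of using socket pairs.

```python
import selectors
import socket


def run_echo_once(messages):
    """Serve one request per connection from a single event loop (illustrative)."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    pairs = [socket.socketpair() for _ in messages]

    # Register the "server" end of every connection for read readiness.
    for _client, server in pairs:
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)

    # Each client sends its request before the loop starts.
    for (client, _server), msg in zip(pairs, messages):
        client.sendall(msg)

    pending = len(pairs)
    while pending:
        # One thread waits on all connections; only ready ones are touched.
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(4096)
            conn.sendall(data.upper())
            sel.unregister(conn)
            pending -= 1

    replies = [client.recv(4096) for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    sel.close()
    return replies
```

The point of the design is that no thread is ever parked on a single connection: the loop only does work for sockets the kernel reports as ready.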
For example, requests to the HttpDNS service provided by Tencent Cloud double every few months, which creates strong demand for high-performance network processing and a user-mode protocol stack. On the kernel stack used in its early days, HttpDNS could only serve fewer than 100,000 QPS of short-lived TCP connections per machine. With later improvements such as REUSEPORT, the kernel protocol stack can reach several hundred thousand QPS, but it still has a severe horizontal scaling bottleneck. Facing this bottleneck, Tencent Cloud urgently needed a high-performance network service framework, so it chose DPDK plus a user-mode protocol stack to bypass the kernel and improve network performance.
In his 2013 talk on C10M, Robert David Graham discussed how to reach ten million concurrent connections. His main argument was that the kernel is the problem holding back performance, and that we should bypass it (kernel bypass) and apply a number of other optimizations such as polling, zero copy, and huge pages.
eBPF and XDP, introduced later in the Linux kernel, can also greatly improve network performance, but in essence they still gain that performance by bypassing the kernel. So far they have not had a substantial impact on the Intel DPDK ecosystem, mainly because they depend on recent kernel versions and NIC driver support, which severely restricts their adoption in enterprises.
Related technologies were already in use before this talk, such as PF_RING, Netmap, Intel DPDK, and the other packet processing technologies mentioned in the talk. Tencent Cloud DNSPod had completed its research and selection of the relevant software and hardware in 2012, and finally chose DPDK (not yet open source at the time) to build a new generation of authoritative DNS server that reached 11 million QPS per machine on 10GE, greatly improving regular DNS resolution and attack resistance. But it was not until this talk that the technology took off across the industry, and DPDK in particular became almost standard for high-performance network applications. In 2016 we extracted the DPDK-based network module from the authoritative DNS server into an independent, general-purpose network framework that multiple services can reuse to improve network performance. That framework is today's F-Stack.
F-Stack introduction and technical features
F-Stack is a high-performance network development kit that runs entirely in user mode. It is built on DPDK, the FreeBSD protocol stack, a microthread interface, and other components. Users only need to focus on their service logic and integrate with F-Stack to build a high-performance network server. Bypassing the kernel and delivering network packets directly to the application layer greatly improves network performance, but it also means the kernel's network protocol stack can no longer be used. This matters little for layer 4 applications and simple UDP-based layer 7 applications, but other layer 7 applications need a mature user-mode protocol stack. F-Stack is the solution offered by Tencent Cloud DNSPod.
F-Stack is essentially a complete network programming framework. It glues together the DPDK network I/O module, the FreeBSD protocol stack ported to user mode, a POSIX-like API, an asynchronous programming interface, and some upper-layer applications, and offers them to users as one package.
It is developed in pure C (some third-party components are written in C++ and wrapped by F-Stack). It is easy to use, although users still need some DPDK background. It is released under the business-friendly BSD 2-Clause license. So what are the technical features of F-Stack? The following sections go through them.
Multi-process architecture, polling mode
F-Stack's basic architecture uses a multi-process model that runs entirely in user mode. Each process is bound to one CPU core and one NIC receive/transmit queue, which gives better memory locality and helps avoid cache misses. Inside each process, a polling loop runs with no locks, no scheduling, and no context switches.
Each process has its own protocol stack, application interface, and application-layer business logic, which avoids the various performance bottlenecks of the kernel. No data is shared between processes, so the architecture scales out very well.
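The share-nothing idea can be sketched as follows, under stated assumptions: each worker owns a private connection table, and every packet of a flow is steered to the same worker by hashing its 4-tuple. The names `pick_worker`, `Worker`, and `dispatch` are hypothetical, and CRC32 is only a stand-in for the hash a NIC computes in hardware. This is not F-Stack code.

```python
import zlib


def pick_worker(src_ip, src_port, dst_ip, dst_port, n_workers):
    """Map a flow to a worker index; CRC32 stands in for the NIC's hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_workers


class Worker:
    """One worker per core, with state that no other worker ever touches."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.connections = {}  # private table: flow -> bytes received

    def handle(self, flow, payload):
        self.connections[flow] = self.connections.get(flow, 0) + len(payload)


def dispatch(packets, n_workers):
    """Deliver (flow, payload) pairs to the worker that owns each flow."""
    workers = [Worker(i) for i in range(n_workers)]
    for flow, payload in packets:
        workers[pick_worker(*flow, n_workers)].handle(flow, payload)
    return workers
```

Because a given flow always hashes to the same worker, no table needs a lock, and adding workers adds capacity without adding contention.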
DPDK development kit
DPDK is a widely used data plane development suite, and I won’t talk too much about it here.
Apart from the initial open source release, which used DPDK 16.07, F-Stack has tracked the LTS releases of DPDK (XX.11), although the upgrade usually lands in the dev branch several months after a new LTS version is released. For example, F-Stack 1.20 and 1.21 use DPDK 18.11.x and 19.11.x respectively, while 20.11.x is supported in the dev branch.
FreeBSD protocol stack
A lot of thought and experimentation went into choosing the FreeBSD protocol stack for the user-mode port. Only a few of its advantages are listed here; more background can be found in the F-Stack history section below.
- The protocol stack is fully featured, and a large number of tools are available for network debugging and analysis, such as sysctl, ifconfig, netstat, netgraph, ipfw, and ndp.
- Improvements from the community can be followed without developing and maintaining them in-house. Existing user-mode ports such as libplebnet and libuinet can serve as references, which greatly reduces the workload.
- The Linux protocol stack implementation is more complex, while the FreeBSD code is clearer and easier to understand. Linux is also licensed under the GPL, which may be restrictive for some users.
All current F-Stack releases are based on FreeBSD Releng 11.0, with some patches from later versions backported. The stack is feature-complete, if somewhat heavy (some modules, such as SCTP and IPsec, have been removed and are not compiled into F-Stack), the debugging and analysis tools are mature, and it runs stably. It will be upgraded to FreeBSD Releng 13.0 and will continue to follow major community improvements.
POSIX compatible interface
F-Stack provides a POSIX-like interface whose functions carry the ff_ prefix, such as ff_socket and ff_bind. It also provides the event-driven ff_kqueue interface, and an ff_epoll interface wrapped on top of kqueue. Apart from ff_epoll, which differs slightly from the Linux system interface, the interfaces are used in the same way as their system counterparts, so existing programs can be ported with simple changes.
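As a rough illustration of what wrapping epoll on top of kqueue involves, the sketch below translates an epoll_ctl request into the kqueue changes that would express it. The constant values are copied from the Linux and FreeBSD headers. The function `epoll_ctl_to_kevents` and its tuple format are hypothetical, and the mapping is a simplification, not F-Stack's actual implementation.

```python
# Linux <sys/epoll.h> values
EPOLLIN, EPOLLOUT, EPOLLET = 0x001, 0x004, 1 << 31
EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD = 1, 2, 3

# FreeBSD <sys/event.h> values
EVFILT_READ, EVFILT_WRITE = -1, -2
EV_ADD, EV_DELETE, EV_CLEAR = 0x0001, 0x0002, 0x0020


def epoll_ctl_to_kevents(op, fd, events):
    """Return (fd, filter, flags) kqueue changes for one epoll_ctl call."""
    if op == EPOLL_CTL_DEL:
        # kqueue tracks read and write interest as two separate filters.
        return [(fd, EVFILT_READ, EV_DELETE), (fd, EVFILT_WRITE, EV_DELETE)]
    if op not in (EPOLL_CTL_ADD, EPOLL_CTL_MOD):
        raise ValueError(f"unsupported epoll op: {op}")

    # Edge-triggered epoll corresponds to EV_CLEAR in kqueue.
    edge = EV_CLEAR if events & EPOLLET else 0
    changes = []
    for mask, filt in ((EPOLLIN, EVFILT_READ), (EPOLLOUT, EVFILT_WRITE)):
        if events & mask:
            changes.append((fd, filt, EV_ADD | edge))
        elif op == EPOLL_CTL_MOD:
            # MOD replaces the interest set, so drop filters no longer wanted.
            changes.append((fd, filt, EV_DELETE))
    return changes
```

The one-to-two relationship between an epoll event mask and kqueue filters is one reason a wrapped epoll can behave slightly differently from the native Linux one.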
Note that although the interfaces are used in the same way, many flag bits are defined differently on Linux and FreeBSD, so the F-Stack interface converts these definitions internally. Full coverage cannot be guaranteed, especially for newly added flags, so the conversion has to be updated and maintained continuously.
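The kind of conversion involved can be sketched with a small lookup table. The numeric values below are taken from the Linux and FreeBSD socket headers. The table covers only a handful of options, and the function name `linux_sockopt_to_freebsd` is hypothetical, so treat this as an illustration of the idea, not the F-Stack conversion code.

```python
LINUX_SOL_SOCKET = 1
FREEBSD_SOL_SOCKET = 0xFFFF

# Linux option value -> FreeBSD option value (small, deliberately partial table)
LINUX_TO_FREEBSD_SO = {
    2: 0x0004,   # SO_REUSEADDR
    9: 0x0008,   # SO_KEEPALIVE
    15: 0x0200,  # SO_REUSEPORT
    7: 0x1001,   # SO_SNDBUF
    8: 0x1002,   # SO_RCVBUF
}


def linux_sockopt_to_freebsd(level, optname):
    """Translate a Linux (level, optname) pair to FreeBSD numbering."""
    if level != LINUX_SOL_SOCKET:
        # Simplification: other levels are passed through unchanged here.
        return level, optname
    try:
        return FREEBSD_SOL_SOCKET, LINUX_TO_FREEBSD_SO[optname]
    except KeyError:
        # An option missing from the table is exactly the "not 100% supported" case.
        raise ValueError(f"unsupported Linux socket option: {optname}") from None
```

Failing loudly on an unknown option is a design choice for this sketch. It makes a gap in the table visible instead of silently passing a Linux value to a FreeBSD stack, where the same number means something else.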
The POSIX-like interface makes porting existing applications easy and safe. However, because it involves memory copies, its performance is not optimal. F-Stack will also provide a separate zero-copy API for users who need it.
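The cost being described is the difference between duplicating packet data and handing out a reference to the buffer it already lives in. The following Python sketch shows only that distinction, using `memoryview`. The function names are hypothetical and it says nothing about how the F-Stack zero-copy API is shaped.

```python
def payload_copy(packet, header_len):
    """Copying path: the payload is duplicated into a new bytes object."""
    return bytes(packet[header_len:])


def payload_view(packet, header_len):
    """Zero-copy path: the result is a window onto the original buffer."""
    return memoryview(packet)[header_len:]
```

A view costs no extra memory traffic, but the caller must not reuse or release the underlying buffer while the view is still in use. That ownership burden is the usual trade-off of a zero-copy interface.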
Microthread framework
F-Stack applications must be written against an asynchronous interface, but F-Stack also provides a microthread (coroutine) framework that lets users write code in a synchronous style while it executes asynchronously.
The microthread framework reuses part of micro_thread from MSEC, another Tencent open source project. Note that the microthread module is licensed under GPL-2.0. It is not a core module of F-Stack and does not affect F-Stack's main license, but if you use the micro_thread module in your application, you need to consider the implications of its license.
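The idea of writing synchronous-looking code that runs asynchronously can be sketched with Python generators. This is a toy round-robin scheduler. The names `lookup` and `run` and the dictionary standing in for the network are invented for illustration and are unrelated to the micro_thread API.

```python
from collections import deque


def lookup(name, answers):
    """Reads like blocking code; the yield hands control back to the scheduler."""
    reply = yield ("query", name)
    answers[name] = reply


def run(tasks, backend):
    """Round-robin scheduler: resume each task once its 'I/O' has completed."""
    ready = deque((task, None) for task in tasks)
    issued = []
    while ready:
        task, value = ready.popleft()
        try:
            _kind, arg = task.send(value)  # run until the task waits again
        except StopIteration:
            continue  # task finished
        issued.append(arg)
        # Pretend the I/O completes later and requeue the task with its result.
        ready.append((task, backend[arg]))
    return issued
```

For example, running two `lookup` tasks through `run` issues both queries before either task receives its reply, even though each task body is written as straight-line code.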
Application porting
F-Stack is currently provided as a library that must be compiled and packaged together with the business application. Ported versions of Nginx and Redis are also provided for direct use.
To achieve better performance and scale-out capability, we recommend that applications with a traditional multi-threaded architecture be split up so that they share as few resources as possible. Where this is not possible, F-Stack will consider providing separate network I/O and protocol stack modules, but some performance loss will be inevitable.
Applicable scenarios
We compared the performance of Nginx on F-Stack and on the kernel stack for short-lived and long-lived connections. Note that the kernel stack results were also measured after tuning, including NIC queue configuration, worker CPU affinity binding, REUSEPORT, and other kernel network parameter optimizations.
In this comparison F-Stack significantly outperforms the kernel protocol stack, especially for short-lived connections beyond 12 cores. F-Stack brings good performance gains and practical value to most high-concurrency network applications, and it is best suited to TCP services with a very large number of concurrent short-lived connections. This is also our main business scenario, HttpDNS.
Of course, to fully understand the business application of F-Stack, we must start from the beginning of its development history.
F-Stack history
F-Stack is currently at version 3.0. Version 1.0 originated in DNSPod's authoritative DNS in 2012-2013: when DPDK was chosen to improve performance, a simple user-mode TCP stack was written to support DNS over TCP. It went online in 2013 and has been running in production ever since. All deployments have been upgraded to 3.0 in the last two years.
Supporting the rapid growth of DNS services requires a high-performance user-mode protocol stack, but maintaining a fully functional TCP stack takes a great deal of effort. That was a very important reason for developing F-Stack 2.0 and 3.0.
In 2016, with the approval of our manager at the time, we stopped maintaining the 1.0 protocol stack and decided to adapt an open source protocol stack, upgrade to it, and open source the result. After some research we chose Seastar (ruling out mTCP, lwIP, and others) and built version 2.0 that year, adapting several applications to it, such as HttpDNS and Tencent Cloud's dynamic acceleration CDN (DSA, now merged into the whole-site acceleration product ECDN). But ideals are beautiful and reality is harsh. HttpDNS on F-Stack 2.0 looked perfect in the lab, with excellent performance, an extensible plug-in architecture, and so on, yet it hit many pitfalls during a small-scale grayscale (canary) rollout on the live network. This was related to Seastar's own use case: as a component of ScyllaDB, it mainly targets internal networks and did not adapt well to the complex environment of the public Internet.
After the team fixed many of these problems and submitted multiple pull requests to Seastar, we found ourselves stuck in the same cycle as with 1.0, so after persisting for a while we gave up on Seastar. Of the more mature Linux and FreeBSD stacks, we chose FreeBSD to develop F-Stack 3.0, which is the version that is now open source. The F-Stack 2.0 framework was not abandoned entirely: although it did not suit the public-facing HttpDNS service, it ran for many years in DSA, the dynamic acceleration CDN whose main scenario is acceleration over the internal network, before being upgraded.
In the first half of 2017 we completed F-Stack 3.0 based on DPDK and the FreeBSD protocol stack, released it as open source, and quickly adapted HttpDNS to it. HttpDNS was prioritised because its request volume had been growing rapidly and performance pressure on the service was very high. It was then gradually brought online to serve external traffic. Some problems came up later, but they were quickly fixed and the service stabilized. So far it has been able to support a daily volume of trillions of HttpDNS requests and maintain 10 times
F-Stack open source version history
- 2017.4.14 Officially open source
- 2017.11.27 Release 1.11
- 2018.5.21 Release 1.12
- 2019.11.15 Release 1.13
- 2019.11.23 Release 1.20
- 2021.1.29 Release 1.21
F-Stack ROADMAP
F-Stack remains under continuous maintenance. Version 1.22 is expected around late 2021 or early 2022 and may include the following new features:
- DPDK 20.11 support, already available in the dev branch. Building and using it differs significantly from 19.11 and earlier, since only Meson/Ninja is supported.
- FreeBSD 13.0 support, already available in the dev branch but not yet fully stable. Known problems remain: BBR/RACK does not work properly, multi-process performance needs optimization, and some tool functions misbehave (for example, viewing listening ports with ff_netstat). Further debugging and optimization are needed.
- New zero-copy interface support.
- LD_PRELOAD and similar mechanisms to lower the barrier to porting applications, at the cost of some unavoidable performance loss.
- Nginx 1.20 support.
- Redis 6 support.
- Support for switching the NIC's receive-side packet distribution mode from RSS to Flow Director, while RSS remains the existing default policy.
[Note] The above functions will be adjusted depending on the specific time schedule. Some functions may not be included in version 1.22 and will be supported in subsequent versions.
F-Stack practice cases
Since it was open sourced, F-Stack has been recognized by a large number of research institutions, universities, and companies around the world, and is used both for technical research and in production commercial projects. Only cases where F-Stack serves real live-network traffic are presented here.
Tencent Cloud HttpDNS
HttpDNS is mainly used by mobile apps to solve problems common with default DNS, such as frequent resolution failures, resolution results that point across carrier networks, and resolution hijacking. Most of the top apps now use this kind of technology, and Tencent Cloud DNSPod was the earliest to launch a commercial HttpDNS service. HttpDNS has since gone through several iterations. The new professional edition supports more features, such as IPv6, DNSPod authoritative data push, user-defined domain name resolution, dangerous domain name interception (users choose whether to enable it and which types of dangerous domains to block), blacklists and whitelists, and request statistics, all built on the F-Stack infrastructure.
DNSPod authoritative DNS
As the parent project of F-Stack, DNSPod authoritative DNS provides authoritative resolution for nearly 10 million domain names. Thanks to the high-performance network services of F-Stack, the latest version of authoritative DNS has reached 100 million QPS per machine on 100GE. For details, please refer to my previous article "Performance Optimization Practice of 100 million QPS authoritative DNS single machine Based on F-Stack". At present, the total capacity of DNSPod has reached billions of QPS. Combined with Tencent's large-bandwidth nodes deployed all over the world and its advanced protection equipment and algorithms, DNSPod has successfully defended against multiple DDoS attacks of TB level and above without customers noticing. The latest attack occurred on Friday afternoon of August 27, 2019. It mixed multiple attack modes, and the total peak attack traffic on the platform exceeded 5T.
Introduction to other user-mode protocol stacks
VPP
VPP is led by Cisco, with participation from many large vendors. Its user-mode Host Stack was developed from Cisco's switch stack and was open sourced later than F-Stack, but it is currently the user-mode protocol stack with the most active community.
mTCP
The mTCP stack comes from KAIST in South Korea and is widely used in the industry. Its main limitation is that, as the name suggests, it only supports TCP.
Seastar
As a subproject of ScyllaDB, Seastar's native stack performs well on internal networks and is mostly used in internal network scenarios.
lwIP
lwIP is a lightweight protocol stack from the Swedish Institute of Computer Science (SICS), used mainly in embedded systems. Many vendors have also modified and ported lwIP to support their own applications.