Having worked on gateways recently, I would like to share my understanding of them. In terms of responsibilities and positioning, gateways inside an enterprise fall into two categories: API gateways and application gateways. Why make this distinction? Mainly because the two focus on different concerns.
In terms of traffic, the API gateway mainly handles north-south traffic, while the application gateway mainly handles east-west traffic inside the mesh.
Functionally, the two overlap somewhat: authentication, rate limiting, circuit breaking, path rewriting, health checks, multi-protocol support, and so on. The key difference is that the API gateway stays completely decoupled from business logic, whereas business-facing concerns such as service aggregation and orchestration or service discovery belong to the application gateway.
In terms of deployment, the API gateway, like LVS, is usually deployed on high-performance physical servers (although some big tech companies may deploy it on the cloud) and is a company-level application. The application gateway can be deployed on the cloud or on virtual machines (VMs) and is a department-level application.
In terms of technology selection, performance is the first criterion for an API gateway, so high-performance gateways built on OpenResty + Lua, such as Kong and APISIX, are usually chosen (thanks to the high-performance non-blocking network I/O model of Nginx, which is written in C). An application gateway is usually chosen to match the team's own technology stack, for example Spring Cloud Gateway or Zuul. This is by no means absolute: if you are familiar with Kong, it is not out of the question to use it as an application gateway.
Examples of open source gateway projects:
- Kong
- APISIX
- Envoy
- Traefik
- Spring Cloud Gateway
- Zuul / Zuul2
Next we will focus on application gateways. Within the mesh, an application gateway focuses on the following functions (unlike an API gateway):
- Dynamic routing
- Service discovery
- Service aggregation/orchestration
- Observability
If you are on the Spring stack, Spring Cloud Gateway or Zuul makes it easy to reuse existing libraries: you can integrate your registry and use Hystrix or Resilience4j for circuit breaking and rate limiting, quickly arriving at a production-grade application gateway, whereas introducing a new and complex technology stack can make costs skyrocket. Depending on the scenario, performance is sometimes not the first metric, yet it is easy to fall into the trap of optimizing for it.
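To make this concrete, below is a minimal sketch of such a route, assuming spring-cloud-starter-gateway and the Reactor Resilience4j circuit breaker starter are on the classpath; the user-service name, route id, and fallback path are hypothetical:

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutes {

    // Dynamic routing + service discovery + circuit breaking in one route:
    // "lb://user-service" resolves the target instance through the registry.
    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("user-service", r -> r.path("/api/users/**")
                        .filters(f -> f.stripPrefix(1) // drop the /api prefix before forwarding
                                .circuitBreaker(c -> c.setName("userCb")
                                        .setFallbackUri("forward:/fallback")))
                        .uri("lb://user-service"))
                .build();
    }
}
```

The `lb://` URI scheme delegates target resolution to the registry, so this single route already exercises the dynamic routing, service discovery, and circuit breaking items from the list above.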
So what is the difference between Spring Cloud Gateway and Zuul? The biggest one is that Spring Cloud Gateway uses a reactive architecture, while Zuul uses a blocking architecture: Spring Cloud Gateway is built on Reactor Netty, whereas Zuul (1.x) is built on the Servlet API.
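As a small taste of the reactive model, here is a sketch of a Spring Cloud Gateway global filter (the TimingFilter name and log output are illustrative): the filter composes a Mono pipeline and returns immediately, so no thread sits parked waiting for the downstream response.

```java
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class TimingFilter implements GlobalFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        long start = System.currentTimeMillis();
        // chain.filter() hands back a Mono immediately; the event-loop thread
        // is free to serve other connections until the response completes.
        return chain.filter(exchange)
                .doFinally(signal -> System.out.println(
                        exchange.getRequest().getURI() + " took "
                                + (System.currentTimeMillis() - start) + " ms"));
    }
}
```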
What is the problem with a blocking architecture? The explanation below draws on the "Solving the C10k problem" section of Reactive Programming with RxJava.
A blocking architecture requires one thread per request; with 10,000 concurrent requests, that means 10,000 threads, and the following happens (a sketch of this model follows the list):
- Several gigabytes of RAM are consumed to store stack space.
- It puts a lot of pressure on the garbage collector, even though the stack space itself is not garbage collected (lots of GC roots and live objects).
- A lot of CPU time is wasted just switching cores to run various threads (context switching).
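To see where those threads come from, here is a minimal sketch of the classic thread-per-connection model in plain Java (the port and echo behavior are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Classic thread-per-connection server: every accepted connection
// ties up one dedicated thread, even while it just waits on I/O.
public class BlockingEchoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();          // blocks until a client connects
                new Thread(() -> handle(socket)).start(); // 10,000 connections => 10,000 threads
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) { // the whole thread blocks on each read
                out.println(line);
            }
        } catch (IOException ignored) {
            // connection dropped; thread exits
        }
    }
}
```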
In some scenarios the classic thread-per-socket model works fine, and in fact it still works fine for many applications today. Beyond a certain level of concurrency, however, the thread count becomes dangerous. It is not uncommon for a single commodity server to handle 1,000 concurrent connections, especially with long-lived TCP/IP connections such as HTTP with keep-alive headers, server-sent events, and WebSockets. And each thread occupies memory (stack space) whether it is performing a computation or merely waiting for data to arrive.
There are two independent approaches to scalability: horizontal and vertical. To handle more concurrent connections, we can deploy more servers, each managing a subset of the load. This requires a front-end load balancer, and it sidesteps rather than solves the original C10k problem, which is about a single server handling the load. Vertical scaling, on the other hand, means buying bigger and more powerful servers. But with blocking I/O, memory is consumed disproportionately compared to the underutilized CPU. Even if a large enterprise server could handle hundreds of thousands of concurrent connections (at great expense), it is nowhere near the C10M problem of 10 million concurrent connections. That number is no coincidence: a few years ago, a well-designed Java application on a typical server reached such a staggering level.
What are the advantages of an event-driven architecture? In the blocking model, one thread per request clearly does not scale. An event-driven architecture instead manages a large number of client connections with only a few threads (see the sketch after this list). This approach has the following advantages:
- Lower memory consumption and fewer thread context switches.
- Better utilization of the CPU and the CPU cache.
- Significantly improved scalability on a single node.
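For contrast, here is a minimal sketch of the event-driven model using Java NIO (again with an illustrative echo server on port 8080): a single Selector thread multiplexes all connections instead of dedicating one thread to each.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Event-driven counterpart of the blocking server above: one thread
// services thousands of connections through a single Selector.
public class NonBlockingEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            selector.select(); // blocks until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer);
                    if (read == -1) {
                        client.close();       // peer closed the connection
                    } else if (read > 0) {
                        buffer.flip();
                        client.write(buffer); // echo back (simplified: assumes full write)
                    }
                }
            }
        }
    }
}
```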
From the above discussion, it would be easy to assume that gateway performance would increase by an order of magnitude due to reduced thread context switching and increased CPU cache utilization, but is this really the case?
Netflix ran the same core business logic on both blocking and non-blocking architectures in production for several months, and concluded that the more CPU-bound the system, the smaller the efficiency gain from the asynchronous architecture.
This also makes sense: reactive programming pays off by freeing the CPU from waiting on I/O. If the system itself does a lot of CPU-heavy work, such as metrics calculation, encryption and decryption, and response compression, then blocking and non-blocking are essentially equivalent from a capacity and CPU perspective, so per-node throughput should be about the same as before.
If, on the other hand, the system handles a large number of requests (heavy I/O) with very small responses and no encryption, switching from synchronous to asynchronous increases throughput by roughly 25% while reducing CPU utilization by about 25%. According to Netflix's production experience, the less work the gateway itself does, the more efficiency we gain from asynchrony.