This article is based on a talk given by Jia Dao, senior technical expert at Ant Financial, at the Service Mesh Meetup in Hangzhou on December 28th.

MOSN completes incubation and launches an independent group

On December 18, 2019, Han Chang, head of the MOSN project and of the application network group at Ant Financial, announced that MOSN had completed incubation within SOFAStack and would operate as an independent group from then on. Everyone is welcome to help build the community.

MOSN is a network proxy written in Go. As a cloud native network data plane, MOSN aims to provide services with multi-protocol, modular, intelligent, and secure proxy capabilities. MOSN is short for Modular Open Smart Network-Proxy. It can be integrated with any Service Mesh that supports the xDS API, and can also be used as a standalone layer 4 and layer 7 load balancer, an API Gateway, a cloud native Ingress, and so on.

Project address: github.com/mosn/mosn

Introduction:

In a Service Mesh microservice architecture, we often hear the terms east-west traffic and north-south traffic. Ant Financial’s open source Service Mesh Sidecar, Modular Observable Smart Network (MOSN), has been presented to the community many times. Previous talks focused on service discovery and routing for east-west traffic. So what is Ant Financial’s thinking on north-south traffic?

In this talk, we look at the history of Ant Financial’s API gateway, what the Mesh gateway architecture looks like, which problems it solves, how it performed during Double 11, and our thinking about the future.

Today’s talk is divided into three parts:

  1. The definition of API Gateway Mesh: When I googled the term API Gateway Mesh, all the results were about API Gateway vs Service Mesh. So we will first compare API Gateway with Service Mesh, and then I will explain how I personally understand the term.
  2. The practice of API Gateway Mesh at Ant Financial: This year, Alibaba’s core systems went 100% cloud native and supported the world-class traffic peak of Double 11. Ant Financial’s Service Mesh played a prominent part, with the core links fully meshed at a scale of tens of thousands of containers. Our API Gateway also carried part of the wallet link and 100% of the requests on the payment link. In this part, I will go through the history of Ant Financial’s API Gateway, why we built API Gateway Mesh, what our architecture looks like, and some of the risks and tests along the way.
  3. Thinking about API Gateway under cloud native: Everyone is talking about cloud native now, but real cloud native practice raises many questions. Which API Gateway solution and form best suits your business? In a cloud native architecture, Service Mesh and API Gateway are among the most important components. How should we position the API Gateway in a cloud native architecture built on Service Mesh? And what are our plans for the future? These questions are covered in this part.

Definition of API Gateway Mesh

The diagram above shows a cloud native architecture with north-south and east-west traffic. It contains several core components, which I will introduce briefly:

  • LB / Ingress: handles SSL offloading and load balancing for inbound traffic, and usually does some simple routing;
  • API Gateway: responsible for more business-oriented logic, such as API signature verification, rate limiting, protocol conversion, user sessions, and load balancing;
  • Sidecars in Pods: the Sidecar in a business system is the forwarding proxy for east-west traffic inside the data center. This traffic is usually internal RPC (such as SOFARPC, Dubbo, Thrift, or Spring Cloud) and is carried by the Sidecar proxy of the Service Mesh. The Sidecar is responsible for routing (unitized, grayscale, canary), load balancing, service authentication, and so on;
  • Control Plane: the “chief steward” of traffic control. The most mainstream solution in cloud native is Istio, which is responsible for delivering and controlling routing policies, security, authentication, and so on.
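
As an illustration of the kind of business-oriented check an API Gateway performs, here is a minimal Go sketch of API signature verification. The scheme (hex-encoded HMAC-SHA256 over the raw request body with a per-app secret) and the names sign, verify, and demo-secret are assumptions made for this example; the talk does not describe the signature scheme Ant Financial actually uses.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under secret.
// The scheme is an assumption for illustration only.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether signature matches payload under secret.
// hmac.Equal compares in constant time to avoid timing side channels.
func verify(secret, payload []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false // malformed signatures are rejected outright
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("demo-secret") // hypothetical per-app secret
	body := []byte(`{"api":"demo.user.query","uid":"42"}`)

	sig := sign(secret, body)
	fmt.Println("original body accepted:", verify(secret, body, sig))
	fmt.Println("tampered body accepted:", verify(secret, []byte(`{"uid":"43"}`), sig))
}
```

A gateway would run a check like this before routing, so that tampered or unsigned requests never reach the business system.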

As the description above shows, the API Gateway and the Service Mesh Sidecar have many similar capabilities. For example, both are network proxies with load balancing, rate limiting, and authentication capabilities. Next, we compare the API Gateway with the Service Mesh.

API Gateway vs Service Mesh

Conceptually, the API Gateway exposes internal services in a more controlled and manageable manner; in one sentence: “Exposes your services as managed APIs.” Service Mesh can also be summarized in one sentence: “An infrastructure to decouple the application network from your service code.” The key word here is “decoupling.”

In terms of traffic, the API Gateway manages north-south traffic, while the Sidecar in the Service Mesh generally acts as a proxy for east-west traffic. Both provide load balancing. The API Gateway is typically load balanced centrally through LVS and NGINX, which we call hard load balancing. In a Service Mesh, Sidecars usually find each other through service discovery and call each other point to point, which we call soft load balancing.
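
To make “soft load balancing” more concrete, the following Go sketch shows a caller-side round-robin picker over a list of endpoints, as a Sidecar might hold after service discovery. The type RoundRobin, the endpoint addresses, and the choice of round robin are assumptions for illustration; the talk does not say which balancing algorithm the Sidecar uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// RoundRobin picks endpoints in rotation on the caller side, with no
// central load balancer in the request path.
type RoundRobin struct {
	counter   uint64 // kept first so it stays 64-bit aligned for atomic access
	endpoints []string
}

// NewRoundRobin copies the endpoint list obtained from service discovery.
func NewRoundRobin(endpoints []string) *RoundRobin {
	return &RoundRobin{endpoints: append([]string(nil), endpoints...)}
}

// Pick returns the next endpoint; it is safe for concurrent use.
func (r *RoundRobin) Pick() (string, error) {
	if len(r.endpoints) == 0 {
		return "", errors.New("no endpoints available")
	}
	n := atomic.AddUint64(&r.counter, 1) - 1
	return r.endpoints[n%uint64(len(r.endpoints))], nil
}

func main() {
	// Hypothetical addresses, standing in for a service registry result.
	rr := NewRoundRobin([]string{"10.0.0.1:12200", "10.0.0.2:12200", "10.0.0.3:12200"})
	for i := 0; i < 4; i++ {
		ep, err := rr.Pick()
		if err != nil {
			fmt.Println("pick failed:", err)
			return
		}
		fmt.Println("call", i, "->", ep)
	}
}
```

With hard load balancing, the same decision is made inside the centralized LVS or NGINX tier rather than in each caller.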

In terms of communication protocols, the API Gateway generally accepts open protocols, such as HTTP and gRPC, and may perform protocol conversion, for example from HTTP to an internal RPC protocol. The internal traffic proxied by the Service Mesh generally uses internal private RPC protocols (WebService, Dubbo, SOFABolt, Thrift, and so on). In terms of traffic control, such as authentication, flow control, and security, the API Gateway treats these checks as hard requirements, since external traffic has to be kept under control. For the internal traffic proxied by the Service Mesh, they are generally soft requirements, because that traffic stays inside the intranet.

Our real understanding of Service Mesh

As you can see, the API Gateway and Service Mesh have a lot in common and also many differences. So how is API Gateway Mesh defined? Let us first look at how we actually understand Service Mesh.

The Sidecar in a Service Mesh is like the sidecar of a motorcycle. It decouples the service code from the RPC logic used for internal communication. However, the sidecar seat can carry more than “RPC for internal communication”; other middleware can ride in it too. API Gateway + Sidecar = API Gateway Mesh. We can also put the MessageQueue client in the Sidecar, which gives Message Mesh.

So Service Mesh is a pattern and an architecture, and the key idea is to “decouple” your service code from your middleware.

API Gateway Mesh definition

So API Gateway Mesh is defined as: “An infrastructure to expose your services as managed APIs in the form of a decoupled sidecar proxy.” In other words, it is infrastructure that exposes your service code as controlled APIs through a Sidecar that is decoupled from that code.

OK, so far the definition of API Gateway Mesh is clear, but why would we want to structure our API Gateway this way? What problems does it solve? To answer that, we need to look at the history of the Alipay API gateway.

Ant Financial API Gateway Mesh practice

The predecessor of Alipay Mobile Gateway

The first version of the Alipay app was released in 2009, when Nokia’s Symbian still ruled the market. The mobile app was not the main traffic entrance, so the server architecture for the app was very simple: all business code was piled into a single system called Mobile, which exposed HTTPS RESTful services externally. The advantage of this architecture was that it was simple and direct. As time passed, the mobile Internet took off in 2013 and smartphones (Android and iOS) became widespread. More and more of the company’s business moved to mobile, and the single Mobile system became a bottleneck for development. In addition, stability problems of the monolithic system began to surface.

In 2013, the company proposed the “ALL IN” wireless strategy, which gave rise to the mobile microservices gateway (the concept of microservices itself was only put forward by Martin Fowler in 2014), mainly to solve the problem of collaboration between multiple business teams.

Microservices Gateway architecture

In this gateway architecture, we designed the Ant Financial wireless RPC protocol (similar to gRPC). It supports multi-language RPC code generation for iOS and Android clients, hides the details of network communication, and adds more security, authentication, and monitoring capabilities. Because the traditional servlet threading model is sensitive to the response time (RT) of back-end systems, we made the API Gateway’s communication fully asynchronous on Netty. To address how poorly HTTP behaves on weak mobile networks, we designed a private long-connection protocol based on TCP. This architecture supported the rapid growth of the business for 3-4 years.

But by the end of 2016, the centralized gateway had exposed a number of problems, such as:

  • Growing change risk: once a change to the gateway logic is released, it affects all services;
  • Tiered service isolation: core service APIs want resource isolation from non-core service APIs;
  • Capacity evaluation: it is difficult to estimate the QPS of thousands of APIs for the annual Double 11 and New Year red envelope campaigns. The RT, body size, and QPS of different APIs affect gateway performance differently. To keep the gateway entrance stable, the usual response was to expand capacity aggressively.

Decentralized gateway

Based on the problems above, we decided to eliminate the gateway as a separate tier, which led to the next generation of gateway architecture: the decentralized gateway.

We split up the centralized gateway. The routing module, whose logic is simple, was migrated to the Spanner load balancer. The complex gateway logic, such as authentication, LDC routing, and security, was abstracted into a gateway.jar. A business system that integrates this jar gains gateway capabilities, so business systems are isolated from one another. Changes to the centralized gateway no longer put these systems at risk, because each of them is itself a “gateway”, and capacity evaluation is no longer a problem.

A new architecture solves some problems, but also introduces some new ones.

The decentralized architecture has been running smoothly for 2 years. It is connected to more than 30 systems (out of hundreds in total) and carries 60%-80% of the traffic. Why only 30 systems? Because the current decentralized gateway architecture has many problems that make it hard to adopt and promote:

  • Access difficulties: gateway.jar depends on dozens of jars plus configuration, and new versions keep adding dependencies;
  • Jar conflicts: in one case, gateway.jar depended on an earlier version of Netty, and a middleware upgrade indirectly upgraded Netty, which broke the gateway jar;
  • Difficulty in upgrading: at the beginning we expected that a decentralized gateway would lead to many versions that were hard to upgrade. However, we naively thought that after so many years of development the gateway was stable and would not change often, and that even if it did, upgrading only the systems that needed the change would be enough. Reality turned out differently: whenever there was an upgrade, the business side would say that they had no time for development integration and regression testing. New functions could not be rolled out, and the cost of a network-wide upgrade was extremely high;
  • Heterogeneous system support: part of Alipay’s business is built on the Node.js stack. The Node.js middleware team is excellent and spent 1-2 months porting the gateway’s Java code to JavaScript, but later gave up keeping it up to date. Copying every new function is too costly, and it gives the developers no sense of achievement.

Service Mesh solves similar problems: it decouples gateway code from business code, allows independent upgrades, and supports heterogeneous systems. So we moved the decentralized gateway jar into the Sidecar of the Service Mesh, which brought us to the next generation gateway architecture: the Mesh gateway architecture.

Mesh gateway architecture

To sum up:

  • Microservices Gateway architecture: decouple services and gateways;
  • Decentralized gateway architecture: solve problems such as stability, hierarchical service isolation, and capacity evaluation;
  • Mesh gateway architecture: it solves problems such as difficulty in decentralized upgrade and support for heterogeneous systems.

Ant Financial API Gateway Mesh architecture

The following describes the architecture of the Ant Financial API Gateway Mesh and the problems met during its implementation.

The diagram above shows the architecture of the API Gateway Mesh, which has three flows:

  • Data flow: Spanner forwards service data directly to the Sidecar in a system’s Pod. After passing the gateway’s various checks, the request is delivered to the SOFA service logic, either locally or by forwarding;
  • Control flow: in a Service Mesh, the control plane is generally the Pilot component of Istio. However, because the native Pilot performs poorly at large scale, we do not use Pilot at present and connect directly to the gateway management backend instead;
  • Ops flow: the channel for operation and maintenance (O&M). Sidecars are injected through a K8s operator, which gives services the capabilities of the gateway Mesh.

API Gateway Mesh is built on MOSN, Ant Financial’s open source Sidecar proxy. Using MOSN’s modular extension capabilities, we added a Gateway Core Module layer on top. It includes core models such as Server, Router, Pipeline, Service, and Config, integrates dynamic scripting with Lua, JavaScript, and other languages to make the gateway more dynamic, and uses MOSN’s protocol extension capabilities to implement Ant Financial’s private MMTP protocol with little effort. With Gateway Core in place, gateway products for different scenarios can be built by plugging different filters and Config in and out, such as the Ant Financial wireless gateway, the open platform gateway, and the financial cloud gateway. On the control plane, we support several configuration delivery channels, including Istio xDS, an Admin REST API, and K8s ConfigMap.

MOSN: github.com/mosn/mosn

The launch of new technology is not a simple matter!

  • Function: because MOSN is developed in Go, we had to move from the Java stack to Go. Rather than copying the Java code, we rewrote it to fit the characteristics of Go, aiming for correct functionality and better performance;
  • Performance: in the final online stress test, we found that the Mesh version performed somewhat better than the original Java version. The reasons are that we changed serialization from Hessian to Protobuf, and that the switch from the Java thread model to Go goroutines also brought some improvement;
  • O&M: operations should lean further towards the K8s cloud native approach;
  • Risk: known risks are not really risks. How do we reduce the unknown ones?

The biggest difference between Internet companies and traditional software companies is agility, so we put more effort into the “three axes” of a release. Typically, we might spend 30% of the effort on the feature itself and 70% on grayscale release, rollback, and monitoring.

How do we grayscale and quickly roll back the API Gateway Mesh?

Here is an example of how Spanner shifts traffic to the new gateway Sidecar. We support shifting traffic by percentage, which allows a slow grayscale rollout and a fast rollback. In addition, MOSN Sidecar injection is not a one-shot rollout across a whole cluster. Using labels, we can inject MOSN into individual machines in a cluster and verify them with shifted traffic.
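
Percentage-based traffic shifting with fast rollback can be sketched in Go as below. This is only an illustration under stated assumptions: the type Shifter, the use of an FNV hash over a request key, and the 0-99 bucket scheme are hypothetical, and the talk does not describe how Spanner implements the switch.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

// Shifter decides whether a request takes the new Sidecar path.
// Only the percentage is stored, so a rollback is a single atomic write.
type Shifter struct{ percent int32 }

// SetPercent clamps p to [0, 100]; setting 0 rolls everything back.
func (s *Shifter) SetPercent(p int32) {
	if p < 0 {
		p = 0
	}
	if p > 100 {
		p = 100
	}
	atomic.StoreInt32(&s.percent, p)
}

// UseMesh hashes a stable key (for example a user id) into buckets 0..99.
// A key that is shifted at some percentage stays shifted at any higher one.
func (s *Shifter) UseMesh(key string) bool {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int32(h.Sum32()%100) < atomic.LoadInt32(&s.percent)
}

func main() {
	s := &Shifter{}
	for _, p := range []int32{1, 10, 50, 0} { // the final 0 models a rollback
		s.SetPercent(p)
		hits := 0
		for i := 0; i < 10000; i++ {
			if s.UseMesh(fmt.Sprintf("user-%d", i)) {
				hits++
			}
		}
		fmt.Printf("percent=%d -> %d of 10000 requests on the new Sidecar\n", p, hits)
	}
}
```

Hashing a stable key rather than drawing a random number keeps each caller on one path during the rollout, which makes problems easier to attribute to the new Sidecar.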

Thinking about API Gateway under cloud native

Cloud native north-south traffic solutions

The sections above described Ant Financial’s experience with API Gateway Mesh. Next, I would like to share how to choose among some standard north-south traffic solutions under cloud native.

The figure above shows three mainstream north-south solutions in the industry. The first is K8s Ingress, which has relatively simple functions. The second is Istio Gateway, which has more routing functions than Ingress. The third is the API Gateway, which is more powerful and can control APIs and traffic at a finer granularity. Choose the north-south traffic product that fits the characteristics of your own services.

The multifaceted nature of MOSN under cloud native

Next, let us look at the many roles MOSN can play.

As mentioned earlier, the Sidecar of the Service Mesh is not only for RPC on east-west traffic. In fact, it can serve as a Sidecar for all traffic.

In the future, MOSN will be positioned as a fully functional cloud native network proxy. It can be deployed alongside an LB as an LB Sidecar, deployed independently as a centralized gateway, deployed with a business Pod as a decentralized gateway or MessageQueue client, or act as a gateway for cross-cloud communication.

Service Mesh is here, get on board! That is all for today’s talk.

About the author

Jin Wenxiang (alias: Jia Dao) is a senior technical expert at Ant Financial. He joined the Alipay wireless team after graduating in 2011 and has worked on mobile network access, API gateways, microservices, and related areas. He is currently responsible for the design and optimization of Ant Financial’s mobile network access architecture.

Video replay and slides for this talk:

Tech.antfin.com/community/a…

Financial-Grade Distributed Architecture (Antfin_SOFA)