Introduction to Nacos

Nacos (/nɑːkəʊs/) is an acronym for Dynamic Naming and Configuration Service. Its goal is to provide a dynamic service discovery, configuration management, and service management platform that makes it easier to build cloud native applications.

Within Alibaba, Nacos originated from the Multicolored Stone project in 2008 (the project that split the business into microservices and built the business middle platform), and it matured over ten years of Double 11 peak traffic. In that period, Nacos mainly helped the business solve the scalability and high availability of microservices, including scaling from 100,000 to 1,000,000 instances. In 2018, we recognized the influence of open source software on the industry, so we decided to open source Nacos, share Alibaba's decade of experience in service discovery and management, promote the development of the microservices industry, and accelerate the digital transformation of enterprises.

With the development of cloud native technology and the emergence of service mesh technology in recent years, more and more companies are trying to migrate from a microservice architecture to a service mesh architecture. This raises a new requirement for Nacos: how to better support the service mesh ecosystem.

Nacos seamlessly supports the service mesh

Let’s take a look at the architecture under Microservices 1.0, where traffic comes in through Tengine, passes through the microservices gateway, and then enters the microservices architecture.

Why are there two layers of gateways? The first layer, Tengine, is responsible for traffic access. Its core capabilities are withstanding heavy traffic, security protection, and HTTPS certificate support, and it aims for generality, stability, and high performance. The second layer is the microservice gateway, which focuses on microservice-related capabilities such as authentication, service governance, protocol conversion, and dynamic routing. The open source Spring Cloud Gateway and Zuul, for example, are both microservice gateways.

After traffic enters the microservice system, services call each other through a microservice framework such as HSF/Dubbo or Spring Cloud. The core role of Nacos here is service discovery. For example, a Consumer first obtains the provider's service address list from Nacos and then makes the call; the microservice gateway also obtains its list of upstream services from Nacos. These capabilities are mainly provided through the SDK, which also adds some load balancing and overload protection policies.
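The discovery flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Nacos SDK: the `Registry`, `Instance`, and `Consumer` classes, the service name, and the IP addresses are all hypothetical, and round-robin is assumed here only as one example of a load balancing policy that an SDK might embed.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    ip: str
    port: int


class Registry:
    """Toy in-memory stand-in for a registry such as Nacos (hypothetical API)."""

    def __init__(self):
        self._services = defaultdict(list)

    def register(self, service, instance):
        # Providers announce themselves under a logical service name.
        if instance not in self._services[service]:
            self._services[service].append(instance)

    def instances(self, service):
        # Consumers and gateways ask for the current address list.
        return list(self._services[service])


class Consumer:
    """Client-side discovery plus round-robin load balancing inside the SDK."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def choose(self, service):
        candidates = self._registry.instances(service)
        if not candidates:
            raise LookupError(f"no instances registered for {service}")
        return candidates[next(self._counter) % len(candidates)]


registry = Registry()
registry.register("order-service", Instance("10.0.0.1", 8080))
registry.register("order-service", Instance("10.0.0.2", 8080))

consumer = Consumer(registry)
picks = [consumer.choose("order-service").ip for _ in range(3)]
print(picks)  # ['10.0.0.1', '10.0.0.2', '10.0.0.1']
```

Because the selection policy lives inside `Consumer`, changing it means shipping a new client library to every application, which is the coupling that the problems listed next refer to.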

The microservices 1.0 architecture has the following major problems:

1. Tengine does not support dynamic configuration, and neither does open source Nginx natively. Alibaba applies configuration changes by reloading the configuration periodically, so changes cannot take effect promptly, which hurts R&D efficiency;

2. In Fat SDK mode, the service governance and service discovery logic is tightly coupled to the SDK. Changing that logic means modifying the SDK and then pushing every business team to upgrade;

3. An SDK must be maintained for each programming language, which is costly and makes it difficult to unify service governance policies.
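The first problem can be illustrated with a small simulation that contrasts periodic reloading with push-based dynamic configuration. Everything here is a hypothetical sketch: `ConfigStore`, `ReloadingProxy`, and `DynamicProxy` are illustrative names, the five-tick reload interval is an arbitrary assumption, and none of this reflects how Tengine or Nginx is actually implemented.

```python
class ConfigStore:
    """Holds the desired configuration and notifies subscribers on change."""

    def __init__(self, value):
        self.value = value
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)
        callback(self.value)  # deliver the current value immediately

    def publish(self, value):
        self.value = value
        for callback in self._listeners:
            callback(value)


class ReloadingProxy:
    """Re-reads the configuration only every `interval` ticks."""

    def __init__(self, store, interval):
        self._store = store
        self._interval = interval
        self.active = store.value

    def tick(self, now):
        if now % self._interval == 0:
            self.active = self._store.value


class DynamicProxy:
    """Receives configuration changes as soon as they are published."""

    def __init__(self, store):
        self.active = None
        store.subscribe(self._on_change)

    def _on_change(self, value):
        self.active = value


store = ConfigStore("route-v1")
reloading = ReloadingProxy(store, interval=5)
dynamic = DynamicProxy(store)

store.publish("route-v2")  # the change is made just before tick 1

stale_ticks = 0
for now in range(1, 6):
    reloading.tick(now)
    if reloading.active != store.value:
        stale_ticks += 1

print(dynamic.active, reloading.active, stale_ticks)  # route-v2 route-v2 4
```

In this toy run the push-based proxy sees the change immediately, while the reloading proxy serves the old configuration for four ticks before its next reload.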

With the development of cloud native technology and the microservices 2.0 architecture, many companies are trying to solve the problems of the microservices 1.0 architecture through service mesh technology. In the microservices 2.0 architecture, traffic enters the microservice system through the Ingress gateway. Unlike the 1.0 architecture, it introduces Envoy as the data plane and Istio as the control plane. Envoy is deployed as a sidecar in the same Pod as the application and intercepts the application's inbound and outbound traffic. Traffic control, security, and observability are then implemented through the xDS configuration that the Istio control plane delivers. The advantage of this architecture is that it decouples service governance from business logic: most SDK capabilities are stripped out of the service framework and moved down into the sidecar, which allows unified governance across languages.
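The decoupling described above can be pictured with a toy sidecar that owns the routing rules while the application only names a logical service. This is a simplified sketch under stated assumptions: the `Sidecar` class, `apply_config`, `app_call`, the service name, and the addresses are hypothetical, and `apply_config` merely stands in for an xDS push; real Envoy configuration is far richer.

```python
class Sidecar:
    """Toy proxy that owns routing rules so the application does not."""

    def __init__(self):
        self._routes = {}

    def apply_config(self, routes):
        # Stand-in for a configuration push from the control plane.
        self._routes = dict(routes)

    def outbound(self, service, request):
        # Intercept an outgoing call and resolve its upstream address.
        upstream = self._routes.get(service)
        if upstream is None:
            raise LookupError(f"no route for {service}")
        return f"{request} -> {upstream}"


def app_call(sidecar, payload):
    # Application code refers only to the logical service name.
    return sidecar.outbound("inventory", payload)


sidecar = Sidecar()
sidecar.apply_config({"inventory": "10.0.1.5:9000"})
first = app_call(sidecar, "GET /stock")

# A governance change arrives from the control plane; app code is untouched.
sidecar.apply_config({"inventory": "10.0.1.9:9000"})
second = app_call(sidecar, "GET /stock")

print(first)   # GET /stock -> 10.0.1.5:9000
print(second)  # GET /stock -> 10.0.1.9:9000
```

The routing change takes effect without modifying or redeploying `app_call`, and the same sidecar could front an application written in any language.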

The service mesh has many technical advantages, but a new architecture also brings new problems, especially for companies with a heavy technical legacy: sidecar performance, support for proprietary protocols, how to migrate smoothly between the old and new architectures, and so on.

This article focuses on smooth migration between the old and new architectures. Smooth migration inevitably raises two service discovery problems:

1. How applications in the old and new architectures discover each other. The two systems are bound to coexist during migration, and their applications need to call each other;

2. How the registry supports the service mesh ecosystem, given that Istio currently supports only the K8s Service discovery mechanism by default.

Let's look at how Nacos solves these problems in the service mesh ecosystem. In the architecture diagram, traffic comes in through the cloud native gateway and then enters the microservice system. The cloud native gateway is compatible with the microservice architecture, so it can act as a microservice gateway, and it also conforms to the cloud native architecture by supporting the standard K8s Ingress. Inside the microservice system, 1.0 applications (non-meshed applications) coexist with meshed applications.

First, consider how non-meshed applications access meshed applications. As the architecture diagram shows, non-meshed applications register services with, and subscribe to services from, Nacos through the SDK. Providers that have already been meshed also register with Nacos, so non-meshed applications can obtain the service information of meshed applications as well. Providers usually register through the SDK, because open source Envoy does not support registering on the application's behalf. In our internal implementation, however, we have moved service registration down into the sidecar.

The other question is how meshed applications perform service discovery. As the lower part of the architecture diagram shows, Nacos already supports acting as an MCP Server. Istio obtains the full service list from Nacos via the MCP protocol, converts it into xDS configuration, and delivers it to Envoy. This gives meshed applications service discovery among themselves and also lets them reach non-meshed services, so service discovery carries over seamlessly during mesh migration without any application changes.
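The conversion step in this flow can be sketched as two plain functions: one that turns a registry snapshot into a ServiceEntry-like record, and one that turns that record into an EDS-style endpoint list. This is only an illustration under stated assumptions: the dictionaries are heavily abridged versions of the real Istio and Envoy resources, the `.nacos` host suffix and the use of the host as the cluster name are hypothetical simplifications, and the function names are invented for this example.

```python
def to_service_entry(service, instances):
    """Registry view -> simplified ServiceEntry-like dict (fields abridged)."""
    return {
        "hosts": [f"{service}.nacos"],  # hypothetical host naming scheme
        "endpoints": [{"address": ip, "port": port} for ip, port in instances],
    }


def to_cluster_load_assignment(entry):
    """ServiceEntry-like dict -> simplified EDS-style endpoint list."""
    return {
        "cluster_name": entry["hosts"][0],  # simplified; not Istio's real naming
        "endpoints": [{
            "lb_endpoints": [
                {"endpoint": {"address": {"socket_address": {
                    "address": ep["address"],
                    "port_value": ep["port"],
                }}}}
                for ep in entry["endpoints"]
            ]
        }],
    }


# Snapshot of what the registry knows (hypothetical data).
registry_snapshot = {
    "order-service": [("10.0.0.1", 8080), ("10.0.0.2", 8080)],
}

eds = [
    to_cluster_load_assignment(to_service_entry(name, instances))
    for name, instances in registry_snapshot.items()
]
addresses = [
    lb["endpoint"]["address"]["socket_address"]["address"]
    for lb in eds[0]["endpoints"][0]["lb_endpoints"]
]
print(eds[0]["cluster_name"], addresses)
# order-service.nacos ['10.0.0.1', '10.0.0.2']
```

In the architecture described here, the first step corresponds roughly to what Nacos exposes over MCP and the second to what Istio sends to Envoy; the sidecar never talks to the registry directly in this scheme.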

MCP is a protocol proposed by the Istio community for synchronizing configuration between components. It was deprecated after Istio 1.8 and replaced by MCP over xDS. Nacos is compatible with both protocols.

Besides synchronization over the MCP protocol, there are other schemes for synchronizing the registry's service data into the service mesh architecture. We compared these schemes, as described below:

The Nacos service mesh ecosystem in practice at Alibaba

Finally, I would like to introduce how the Nacos service mesh ecosystem is used in practice at Alibaba. The following picture summarizes two scenarios in which Alibaba has put it into production.

Scenario 1: Communication between Alibaba Group and DingTalk on the cloud, which is essentially application communication in a hybrid cloud scenario. We use gateways to connect the two environments. The DingTalk VPC uses the MSE cloud native gateway, and the group side communicates over the Dubbo 3.0 Triple protocol. The gateways use Istio as their control plane, and Istio synchronizes the service list from Nacos through MCP.

Using this architecture solves two problems:

1. Secure communication between the private cloud and public cloud networks, because the gateways encrypt traffic with mTLS;

2. Smooth support for the microservice architecture, because applications call the gateway over the Triple protocol, the business needs no code changes, and service discovery data is synchronized through Nacos MCP.

This architecture is also used for interaction between the group and Ant Group, shown on the left of the picture, where the Ant gateway uses the MOSN on Envoy architecture.

Scenario 2: The group's own microservice mesh, which corresponds to the middle and bottom parts of the diagram. It differs from the community approach in that Envoy connects directly to the Nacos registry. The main reason for this design is performance: some of our applications have tens of thousands of instance IP addresses, and delivering them through Istio can cause Istio to run out of memory (OOM) or cause CPU spikes on the Envoy data plane.

Replay of the live stream: yqh.aliyun.com/live/detail… You can also scan the QR code to watch the replay in the DingTalk group.

Alibaba Cloud MSE has completed the integration of Nacos with the service mesh ecosystem. You are welcome to try it: www.aliyun.com/product/ali…