This article does not include much source code; it mostly explains through pictures and text.

The full text contains 2582 words, and the expected reading time is 12 minutes.

  • What is Nacos
  • Configuration center architecture
  • Nacos example
    • Official code example
    • Interpreting the Properties
    • Hierarchical design of configuration items
  • Nacos client analysis
    • Getting the configuration
    • Registering listeners
    • Long polling for configuration
  • Nacos server analysis
    • Configuration dump
    • Configuration registration
    • Handling long polling
  • Summary

What is Nacos

Nacos is an open source project initiated by Alibaba, hosted at github.com/alibaba/nac… . Nacos mainly provides two kinds of services. The first is a configuration center, which supports configuration registration, change delivery, and hierarchical management, so that configuration items can be updated dynamically without stopping the service. The second is a naming service, which provides service registration and discovery. It usually sits between the client and server of an RPC framework, and also offers health monitoring, load balancing, and other functions.

This article focuses on the first of these, the implementation of the configuration center. It describes the components a configuration center typically requires and explores how those designs are reflected in the Nacos 1.1.4 source code.

Configuration center architecture

A configuration center is not complicated in itself, if you leave the CAP trade-offs aside. Its basic function is to store key-value pairs: users publish a value under a configKey, and clients fetch the configValue. The more advanced feature is that when a configuration item changes, the client is notified so that it can refresh the old value.

The architecture diagram below sketches a typical configuration center. Users publish configurations through the management platform, which registers them with the server over HTTP; the server stores them in a persistent storage engine such as MySQL. Applications read configuration from the server through the client SDK and establish HTTP long polling to listen for changes to configuration items. In addition, to reduce pressure on the server and to provide disaster recovery, the client saves a snapshot in a local file after it pulls a configuration item, and the SDK reads from that file first.

Many details are omitted here: the hierarchical design of configuration items, permission checks, the client's long-polling interval, whether the server must access MySQL on every query, and whether configuration changes are pushed actively or picked up by periodic polling. Also omitted are the operational aspects of high availability (which I personally consider the essence of a configuration center), such as deploying nodes across regions and ensuring that changes can still be written and pushed during a network partition. Building a truly high-quality configuration center takes a long time of polishing.

Nacos example

The source code described below is based on Nacos 1.1.4.

Official code example

The first step is to pass in the configuration and create a new ConfigService instance. The second step is to fetch the configuration through the corresponding interface and register a configuration listener. Usage is simple and easy to understand.

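import java.util.Properties;
import java.util.concurrent.Executor;

import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;
import com.alibaba.nacos.api.exception.NacosException;

// Wrapper class added so that the snippet compiles; its name is arbitrary.
public class ConfigExample {
    public static void main(String[] args) {
        try {
            // Pass the configuration; replace the {placeholders} with real values
            String serverAddr = "{serverAddr}";
            String dataId = "{dataId}";
            String group = "{group}";
            Properties properties = new Properties();
            properties.put("serverAddr", serverAddr);

            // Create the ConfigService and fetch the configuration (5000 ms timeout)
            ConfigService configService = NacosFactory.createConfigService(properties);
            String content = configService.getConfig(dataId, group, 5000);
            System.out.println(content);

            // Register the listener
            configService.addListener(dataId, group, new Listener() {
                @Override
                public void receiveConfigInfo(String configInfo) {
                    System.out.println("receive1:" + configInfo);
                }

                @Override
                public Executor getExecutor() {
                    // null means no custom executor is supplied
                    return null;
                }
            });
        } catch (NacosException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}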

Interpreting the Properties

serverAddr carries the address list of the configuration center's servers. The internal class ServerListManager parses it into an address list and manages it. When making an HTTP call, the client picks a live machine and builds the URL from its address. If a call to that address throws an exception, the client takes action, such as switching to another node for the next call. Note that in practice this value is usually not hardcoded; it can be stored in ZooKeeper or a service registry and pulled dynamically at startup.
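The failover behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual ServerListManager code: the class name ServerAddressList, its methods, and the comma-separated address format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keeps a parsed server list and moves on to the
// next address when the current one fails. Not the Nacos implementation.
class ServerAddressList {
    private final List<String> servers = new ArrayList<>();
    private int current = 0;

    // Assumes a comma-separated list such as "10.0.0.1:8848,10.0.0.2:8848".
    ServerAddressList(String serverAddr) {
        for (String part : serverAddr.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no server address given");
        }
    }

    // Address to use for the next HTTP call.
    synchronized String currentServer() {
        return servers.get(current);
    }

    // Called when a request to the current server throws; rotates to the next one.
    synchronized String markFailedAndNext() {
        current = (current + 1) % servers.size();
        return servers.get(current);
    }

    // Builds the request URL by joining the chosen address and a path.
    synchronized String urlFor(String path) {
        return "http://" + servers.get(current) + path;
    }
}
```

A simple rotation is used here for brevity; a real client might also track which nodes are healthy before choosing one.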

Hierarchical design of configuration items

The official Nacos design is as follows:

dataId can be understood as a user-defined configuration key, and group as the name of a configuration group; both belong to the hierarchical design of configuration. Put simply, a configuration center uses this hierarchy to partition configuration by environment, by team, or even by developer, which supports needs such as grayscale releases and testing during development. Any scheme works as long as it is meaningful; even the one in the picture below is not out of the question.

Nacos client analysis

Getting the configuration

The main method for obtaining configuration is getConfigInner in the NacosConfigService class. In general, this method reads the configuration value directly from a local file. If the local file does not exist or is empty, it pulls the configuration from the server with an HTTP GET and saves it to the local snapshot.
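The lookup order described above (local file first, then the remote server, then save a snapshot) might look like the sketch below. It is a simplified illustration under stated assumptions: the class LocalFirstConfigReader, the group/dataId directory layout, and the remoteFetch supplier standing in for the HTTP GET are all hypothetical and do not come from the Nacos source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch of a "local file first, then remote" lookup.
class LocalFirstConfigReader {
    private final Path snapshotDir;

    LocalFirstConfigReader(Path snapshotDir) {
        this.snapshotDir = snapshotDir;
    }

    // remoteFetch stands in for the HTTP GET against the server.
    String getConfig(String dataId, String group, Supplier<String> remoteFetch) {
        Path file = snapshotDir.resolve(group).resolve(dataId);
        try {
            if (Files.exists(file)) {
                String local = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                if (!local.isEmpty()) {
                    return local; // served from disk, no network call
                }
            }
            String remote = remoteFetch.get();
            if (remote != null) {
                Files.createDirectories(file.getParent());
                // Save the snapshot so that later reads can be served locally.
                Files.write(file, remote.getBytes(StandardCharsets.UTF_8));
            }
            return remote;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a second call for the same dataId and group is answered from the snapshot file and never reaches the supplier.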

Nacos applies two safeguards when fetching remote configuration over HTTP: a timeout and a maximum number of retries (three by default).
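As a rough illustration of combining these two limits, the sketch below stops either when an overall time budget is spent or when the attempt count is exhausted. The class BoundedRetry and its parameters are hypothetical, and in this sketch maxRetries counts total attempts; the actual Nacos code may count differently.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "give up on timeout or after N attempts".
class BoundedRetry {
    // maxRetries counts total attempts in this sketch; timeoutMs is the overall budget.
    static <T> T call(Supplier<T> request, int maxRetries, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (System.currentTimeMillis() > deadline) {
                break; // overall time budget used up
            }
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("request failed within the retry and time limits", last);
    }
}
```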

Registering listeners

A configuration center client commonly registers a listener on a configuration item so that a callback runs when the item changes.

iconfig.addListener(dataId, group, ml);
iconfig.getConfigAndSignListener(dataId, group, 1000, ml);

Nacos can register listeners in either of these two ways. Internally, both call addCacheDataIfAbsent in the ClientWorker class. A CacheData instance maintains one configuration item together with all listeners registered on it.

All CacheData instances are stored in the atomic cacheMap field of the ClientWorker class. The core members of CacheData are:

Here, content is the configuration content, and the MD5 value is the key to detecting whether the configuration has changed. CacheData also keeps an internal list of listeners, which are called back in turn once a change occurs.
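To make the role of the MD5 value concrete, here is a simplified stand-in for such an entry. It is not the real CacheData class: the names ConfigEntry, Wrap, and notifyIfChanged are hypothetical, and listeners are modelled as plain Consumer callbacks.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a cached configuration entry.
class ConfigEntry {
    // Pairs a listener with the MD5 it was last notified about.
    private static final class Wrap {
        final Consumer<String> listener;
        String lastCallMd5;

        Wrap(Consumer<String> listener, String md5) {
            this.listener = listener;
            this.lastCallMd5 = md5;
        }
    }

    private String content = "";
    private String md5 = md5Of(content);
    private final List<Wrap> listeners = new ArrayList<>();

    synchronized void addListener(Consumer<String> listener) {
        listeners.add(new Wrap(listener, md5)); // starts in sync with the current content
    }

    synchronized void setContent(String newContent) {
        content = newContent;
        md5 = md5Of(newContent);
    }

    // Calls back, in registration order, every listener whose last-seen MD5 differs.
    synchronized void notifyIfChanged() {
        for (Wrap wrap : listeners) {
            if (!md5.equals(wrap.lastCallMd5)) {
                wrap.lastCallMd5 = md5;
                wrap.listener.accept(content);
            }
        }
    }

    static String md5Of(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every standard JVM
        }
    }
}
```

Comparing digests rather than full content keeps the change check cheap, and storing the last-notified MD5 per listener means a listener is not called twice for the same content.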

Long polling for configuration

ClientWorker implements long polling with two thread pools. The first is a single-threaded executor that runs every 10 ms and splits the cacheData instances into polling batches of 3,000 entries each. Each batch is wrapped as a LongPollingTask and submitted to the second thread pool, executorService, for processing.
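The batching arithmetic can be illustrated with a small sketch. The class PollingTaskPlanner and its method are hypothetical; only the figure of 3,000 entries per task comes from the text above. The periodic check would call something like this and submit one new polling task per unit returned.

```java
// Hypothetical sketch: decides how many new polling tasks to start
// so that every batch of 3,000 cached entries has one task.
class PollingTaskPlanner {
    static final int PER_TASK_CONFIG_SIZE = 3000;

    private int runningTasks = 0;

    // Returns the number of additional tasks needed for the given cache size.
    int tasksToStart(int cacheSize) {
        // Ceiling division: 1..3000 entries need 1 task, 3001..6000 need 2, and so on.
        int needed = (cacheSize + PER_TASK_CONFIG_SIZE - 1) / PER_TASK_CONFIG_SIZE;
        int toStart = Math.max(0, needed - runningTasks);
        runningTasks = Math.max(runningTasks, needed);
        return toStart;
    }
}
```

Because the planner only ever adds tasks, running the check very frequently is cheap: most invocations return zero.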

The long polling task has four steps:

  1. Check the local configuration. Ignore configuration items that do not exist in the local snapshot, and check whether any items require a listener callback.
  2. For items with no local configuration, query the server, which returns the list of keys whose values have changed.
  3. For each changed key, fetch the latest configuration from the server, update the local snapshot, and fill in any missing configuration.
  4. Check whether the MD5 values match (checkMD5); if they do not, call back the listeners.

If a polling task throws an exception, the client waits for a while before starting the next call, to relieve the server. Nacos also includes rate-limiting code in its HTTP utility class, reducing the risk of runaway polling or traffic spikes in several ways. As we will see below, if the server finds no changed key, it holds the HTTP request for a period of time (the client-side timeout defaults to 30 seconds), which further reduces the client's polling frequency and the pressure on the server.

Nacos server analysis

Configuration dump

When the server starts, the init method of DumpService loads the configuration from the database, stores it on the local disk, and caches important metadata, such as MD5 values, in memory. Based on the last heartbeat time saved in the heartbeat file, the server decides whether to dump all configuration data from the database or only an incremental portion (the latter if the last heartbeat was less than 6 hours ago).

A full dump clears the disk cache and then, ordered by primary key ID, flushes 1,000 configurations at a time to disk and memory. An incremental dump retrieves the configurations added in the last 6 hours (including updated and deleted ones), refreshes memory and files from that batch, and then compares all the data in memory against the database, synchronizing again if anything differs. Compared with a full dump, this reduces the amount of database I/O and disk I/O.
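The choice between the two dump modes comes down to one time comparison, sketched below. The class DumpPlanner, the Mode enum, and the handling of a missing heartbeat (treated here as requiring a full dump) are assumptions for illustration; only the 6-hour window comes from the text above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of choosing a dump mode from the last heartbeat time.
class DumpPlanner {
    enum Mode { FULL, INCREMENTAL }

    static final Duration WINDOW = Duration.ofHours(6);

    // lastHeartbeat may be null when no heartbeat file exists (assumed to mean FULL).
    static Mode choose(Instant lastHeartbeat, Instant now) {
        if (lastHeartbeat == null) {
            return Mode.FULL;
        }
        Duration sinceHeartbeat = Duration.between(lastHeartbeat, now);
        // A recent heartbeat means the local cache is mostly current.
        return sinceHeartbeat.compareTo(WINDOW) < 0 ? Mode.INCREMENTAL : Mode.FULL;
    }
}
```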

Configuration registration

The Nacos server is a Spring Boot service. The main code for registering configuration is in ConfigController and ConfigServletInner. The server is typically a multi-node cluster, so a publish request first reaches a single machine, which inserts the configuration into MySQL for persistence. This code is simple enough that it is not covered in detail here.

Because the server does not access MySQL on every configuration query and instead relies on the dump function to cache configuration in local files, a machine that has saved a configuration must notify the other machines to refresh their memory and local disk files. It therefore publishes an event named ConfigDataChangeEvent, which notifies every cluster node (including itself) through HTTP calls and triggers a refresh of local files and memory.
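The fan-out step can be pictured with the sketch below. It is not the Nacos notification code: the class ChangeBroadcaster is hypothetical, the send function stands in for the HTTP call to a node, and returning the unreachable nodes so that a caller could retry them is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of fanning a change out to every cluster node.
class ChangeBroadcaster {
    private final List<String> members;
    // (node, changedKey) -> true if the node acknowledged the notification.
    private final BiPredicate<String, String> send;

    ChangeBroadcaster(List<String> members, BiPredicate<String, String> send) {
        this.members = new ArrayList<>(members);
        this.send = send;
    }

    // Notifies every member (the local node included) and returns the ones that failed.
    List<String> broadcast(String changedKey) {
        List<String> failed = new ArrayList<>();
        for (String node : members) {
            boolean delivered;
            try {
                delivered = send.test(node, changedKey);
            } catch (RuntimeException e) {
                delivered = false; // treat an exception like an unreachable node
            }
            if (!delivered) {
                failed.add(node);
            }
        }
        return failed;
    }
}
```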

Handling long polling

As mentioned above, the client runs a long polling task to pull configuration changes from the server. How does the server handle this task? The logic is in the LongPollingService class, which contains a Runnable task named ClientLongPolling. The server wraps each polling request as a ClientLongPolling task. The task holds an AsyncContext object for the response (a mechanism introduced in Servlet 3.0) and is scheduled to run 29.5 s later on a scheduled thread pool.

Why return 500 ms earlier than the client's 30 s timeout? To ensure that the client does not time out because of network delay.

Note that when the ClientLongPolling task is submitted to the thread pool, the server also keeps every held polling request in a queue named allSubs. While a request is held, a user may change a configuration item through the management platform, or the server node may receive a dump refresh notification from another node. In either case the server must cancel the pending task immediately and notify the client of the data change in time.

To do this, the LongPollingService class is itself an event listener: it implements the onEvent method, and the event type it handles is LocalDataChangeEvent.

When a server node receives a configuration change while a request is being held, it publishes an event of type LocalDataChangeEvent (note the difference from ConfigDataChangeEvent above). The handler wraps the change in a DataChangeTask and executes it asynchronously; the task finds the held ClientLongPolling request in allSubs and writes the change into the response, forcing it to return immediately.
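The hold-then-release mechanism described in this section can be sketched with standard concurrency utilities. This is an illustration only: the class LongPollHolder and its methods are hypothetical, a CompletableFuture stands in for the AsyncContext response, and the hold time is a constructor parameter (the article's value would be 29,500 ms).

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of holding a poll until a change arrives or a timeout fires.
class LongPollHolder {
    private static final class Sub {
        final String key;
        final CompletableFuture<String> response = new CompletableFuture<>();
        ScheduledFuture<?> timeout;

        Sub(String key) {
            this.key = key;
        }
    }

    private final Queue<Sub> allSubs = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "long-poll-timeout");
            t.setDaemon(true); // do not keep the JVM alive for pending timeouts
            return t;
        });
    private final long holdMillis;

    LongPollHolder(long holdMillis) {
        this.holdMillis = holdMillis;
    }

    // Holds a poll for key. The future completes with "" on timeout,
    // or with the changed key as soon as a change arrives.
    synchronized CompletableFuture<String> poll(String key) {
        Sub sub = new Sub(key);
        allSubs.add(sub);
        sub.timeout = scheduler.schedule(() -> {
            allSubs.remove(sub);
            sub.response.complete(""); // nothing changed while the request was held
        }, holdMillis, TimeUnit.MILLISECONDS);
        return sub.response;
    }

    // Called when a local data change event for key is received.
    synchronized void onChange(String key) {
        for (Sub sub : allSubs) {
            if (sub.key.equals(key) && allSubs.remove(sub)) {
                sub.timeout.cancel(false);  // the timeout task is no longer needed
                sub.response.complete(key); // release the held request immediately
            }
        }
    }
}
```

Completing a future twice is harmless, so a timeout that fires at the same moment as a change cannot produce two responses for one request.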

The complete process is as follows. A node that did not receive the publish request itself skips the first step, persisting the configuration, and starts from the next one:

Summary

This article has focused on the source code of Nacos as a configuration center, covering both the client and the server. The content covers the key points of configuration center functionality. It serves as a learning summary, and I hope it also helps readers.