Background

  1. Registry: ZooKeeper, 3 nodes (2 cores, 4 GB each)
  2. There are 7,000+ services and 100,000 nodes
  3. Production ZK failure: the CPU usage of one ZK node suddenly spiked and the node hung. The node was then upgraded, and ZK data was lost. The root cause is unknown. The outage paralyzed almost all services
  4. Many more services will be added in the coming years

In fact, this kind of failure has occurred several times, and we never found the specific cause of the ZK problems (tracking it down would take too much learning). Given this background, we want to replace the ZK registry. This is not a spur-of-the-moment idea; it has been raised several times before. So we compared some common registries: etcd, Nacos, and ZooKeeper

  1. etcd and ZooKeeper are CP systems; Nacos supports both AP and CP modes, so if you use Nacos as the registry for Dubbo, it is natural to use its AP mode
  2. Although etcd and ZooKeeper are both CP, etcd is more stable under the same configuration
  3. To make the comparison easier, the daily and pre-release environments were connected to all three registries

Alternative

Before we get to the alternatives, let's briefly discuss the purpose of the registry in Dubbo: it is a repository of provider addresses. So what are the requirements for this registry?

  1. Stability: The registry must be stable. If it is unstable, providers cannot register themselves, and consumers cannot get provider addresses from it, so calls fail
  2. Consistency: If the data on the registry's nodes is inconsistent, different consumers get different provider lists. A consumer that sees too few providers may load them unevenly; a consumer that sees too many may call a node that has already gone offline
  3. Real-time: When provider nodes change, consumers need to be notified promptly

In CAP terms, AP and CP correspond to points 1 and 2 above, respectively. So which matters more for the Dubbo registry? The online article “Why Alibaba does not use ZooKeeper to do service discovery” analyzes this question and is worth reading. I also think stability is more important:

  1. Smaller scope of impact: Take ZK as an example. If ZK is unavailable for a long time, all Dubbo ephemeral nodes go offline, and services cannot be called normally unless the RPC framework layer has some degradation measure. Data inconsistency, by contrast, may affect only some services
  2. Dubbo's default Failover cluster policy retries a failed call on another provider, which reduces the impact of data inconsistency
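
To make point 2 concrete, here is a minimal sketch of the failover idea in plain Java. It is not Dubbo's actual implementation: the class name, the `online` set (standing in for providers that are really alive), and the `retries` parameter are assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a failover call; not Dubbo's FailoverClusterInvoker.
class FailoverSketch {

    /**
     * Tries providers in list order and returns the first successful reply.
     * The provider list may be stale and still contain offline addresses;
     * `online` models which addresses actually answer.
     */
    static String invoke(List<String> providers, Set<String> online, int retries) {
        int attempts = Math.min(retries + 1, providers.size());
        for (int i = 0; i < attempts; i++) {
            String address = providers.get(i);
            if (online.contains(address)) {
                return "reply from " + address;
            }
            // Stale address: this provider is gone, so move on to the next one.
        }
        throw new IllegalStateException("no provider reachable after " + attempts + " attempts");
    }

    public static void main(String[] args) {
        // The consumer still sees 10.0.0.1 although it has gone offline.
        List<String> staleList = List.of("10.0.0.1:20880", "10.0.0.2:20880");
        Set<String> online = Set.of("10.0.0.2:20880");
        System.out.println(invoke(staleList, online, 2)); // prints "reply from 10.0.0.2:20880"
    }
}
```

A stale entry costs one wasted attempt instead of a failed business call, which is why an inconsistent list is usually less harmful than an unavailable registry.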

With the registry's role in Dubbo in mind, how do we migrate from one registry to another? The simplest approach that comes to mind is “multiple registries”:

  1. Prepare the new registries (for example etcd or Nacos) alongside the existing ZK registry
  2. Client: An upgrade is required so that applications can register with and subscribe to multiple registries at the same time
  3. Provider: The application registers with every registry at startup, so that it can be called by both unupgraded and upgraded consumers
  4. Consumer: The application subscribes to every registry at startup, so that it can call both unupgraded and upgraded providers
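
The visibility rules in the steps above can be sketched with a toy in-memory registry. Everything here is hypothetical (class names, service name, addresses). In particular, Dubbo does not merge address lists into one set as `subscribeAll` does; the merge only illustrates which providers a consumer can see.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a migration with multiple registries.
class MultiRegistryMigrationSketch {

    /** A toy in-memory registry: service name -> provider addresses. */
    static class Registry {
        private final Map<String, Set<String>> services = new HashMap<>();

        void register(String service, String address) {
            services.computeIfAbsent(service, k -> new LinkedHashSet<>()).add(address);
        }

        Set<String> lookup(String service) {
            return services.getOrDefault(service, Set.of());
        }
    }

    /** Provider side: register the same address with every configured registry. */
    static void registerAll(List<Registry> registries, String service, String address) {
        for (Registry registry : registries) {
            registry.register(service, address);
        }
    }

    /** Consumer side: look the service up in every configured registry. */
    static Set<String> subscribeAll(List<Registry> registries, String service) {
        Set<String> visible = new LinkedHashSet<>();
        for (Registry registry : registries) {
            visible.addAll(registry.lookup(service));
        }
        return visible;
    }

    public static void main(String[] args) {
        Registry zk = new Registry();
        Registry nacos = new Registry();
        // An unupgraded provider only knows ZK; an upgraded one registers with both.
        registerAll(List.of(zk), "DemoService", "10.0.0.1:20880");
        registerAll(List.of(zk, nacos), "DemoService", "10.0.0.2:20880");
        // An unupgraded consumer (ZK only) still sees both providers.
        System.out.println(subscribeAll(List.of(zk), "DemoService"));
        // An upgraded consumer sees both providers as well.
        System.out.println(subscribeAll(List.of(zk, nacos), "DemoService"));
    }
}
```

In this model, a consumer that subscribed only to the new registry would miss the unupgraded provider, which is why the old registry has to stay online until every application has been upgraded.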

Multi-registry

Dubbo already supports multiple registries. To use them, you only need to configure them on the client side. However, we still need to consider how to make adoption easier for the business side, and how Dubbo's multi-registry support works internally.

Multi-registry access mode

1. The business side can directly access and configure multiple registries

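import org.apache.dubbo.config.RegistryConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// The class name is illustrative; any Spring @Configuration class works.
@Configuration
public class RegistryConfiguration {

    // ZooKeeper registry
    @Bean
    public RegistryConfig zkRegistryConfig() {
        RegistryConfig registryConfig = new RegistryConfig();
        registryConfig.setProtocol("zookeeper");
        registryConfig.setAddress("localhost:2181");
        return registryConfig;
    }

    // Nacos registry
    @Bean
    public RegistryConfig nacosRegistryConfig() {
        RegistryConfig registryConfig = new RegistryConfig();
        registryConfig.setProtocol("nacos");
        registryConfig.setAddress("localhost:8808");
        return registryConfig;
    }
}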


# Or in the application's properties file
dubbo.registries.zk.protocol=zookeeper
dubbo.registries.zk.address=xxx

dubbo.registries.nacos.protocol=nacos
dubbo.registries.nacos.address=xxx

  1. This approach requires every business team to configure all registries in its own application, which is costly and hard to roll out
  2. The registry metadata is scattered across all clients, making it difficult to manage. Imagine we later wanted to switch from Nacos to etcd: would every client have to be changed again?

2. Configure the registry through Dubbo’s configuration center

dubbo.registries.zk.protocol=zookeeper
dubbo.registries.zk.address=xxx

dubbo.registries.nacos.protocol=nacos
dubbo.registries.nacos.address=xxx

  1. Add registry information to the global configuration of the Dubbo console
  2. This way registry information is centrally managed and manual configuration for each client is avoided

3. Built-in registry

  1. Generally suited to in-house versions, for example a company-internal build developed on top of Dubbo
  2. You can put the registry-related information into Apollo, load it from Apollo during application startup, and generate the registry configuration to inject into Dubbo
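
A minimal sketch of the loading step, assuming the configuration arrives as properties text in the same `dubbo.registries.<id>.<attribute>` form used above. No Apollo client call is shown: how the text is fetched is left out, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: groups registry properties by registry id.
class RegistryPropertiesSketch {

    private static final String PREFIX = "dubbo.registries.";

    /** Returns registry id -> (attribute -> value), e.g. "zk" -> {protocol=zookeeper}. */
    static Map<String, Map<String, String>> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Map<String, String>> registries = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // not a registry property
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot <= 0) {
                continue; // no "<id>.<attribute>" part
            }
            String id = rest.substring(0, dot);
            String attribute = rest.substring(dot + 1);
            registries.computeIfAbsent(id, k -> new TreeMap<>())
                      .put(attribute, props.getProperty(key));
        }
        return registries;
    }

    public static void main(String[] args) {
        String text = "dubbo.registries.zk.protocol=zookeeper\n"
                    + "dubbo.registries.zk.address=localhost:2181\n"
                    + "dubbo.registries.nacos.protocol=nacos\n"
                    + "dubbo.registries.nacos.address=localhost:8808\n";
        System.out.println(parse(text));
    }
}
```

Each entry could then be turned into a `RegistryConfig` with `setProtocol` and `setAddress`, as in the bean example for approach 1.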

Each approach can serve the purpose of accessing multiple registries, depending on the situation.

Principle analysis of multiple registries

It mainly comes down to one core question: how does Dubbo interact with the registry? How do providers write to it, and how do consumers read from it? Once this is clear, the principle behind multiple registries follows naturally, as shown in the figure:

  1. There is an additional layer of routing and load balancing in front of the usual one
  2. Each registry can be viewed as an Invoker. Pre-routing first narrows down the candidate registry Invokers, a pre-load-balancing policy then picks one of them, and from there the call follows Dubbo's usual, more complex routing logic
  3. By default, there is no registry-level routing or load balancing policy, so a fixed registry (the first one) is selected

The corresponding call chain:
AbstractClusterInvoker#invoke =>
ZoneAwareClusterInvoker#doInvoke =>
MockClusterInvoker#invoke =>
AbstractClusterInvoker#invoke =>
RegistryDirectory#doList =>
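
The two-level selection can be sketched in plain Java as follows. This is a simplified model, not Dubbo's `ZoneAwareClusterInvoker`: the class names, the `available` flag, and the round-robin provider choice are assumptions for illustration. Level 1 picks the first usable registry, as described for the default case, and level 2 picks a provider inside it.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of registry-level selection followed by provider selection.
class RegistrySelectionSketch {

    /** Stand-in for a registry-level Invoker: one per configured registry. */
    static final class RegistryInvoker {
        final String name;
        final boolean available;
        final List<String> providers;

        RegistryInvoker(String name, boolean available, List<String> providers) {
            this.name = name;
            this.available = available;
            this.providers = providers;
        }
    }

    /** Level 1: the first registry that is available and has providers. */
    static Optional<RegistryInvoker> selectRegistry(List<RegistryInvoker> registries) {
        return registries.stream()
                .filter(r -> r.available && !r.providers.isEmpty())
                .findFirst();
    }

    /** Level 2: simple round robin over the providers of the chosen registry. */
    static String selectProvider(RegistryInvoker registry, int callIndex) {
        List<String> providers = registry.providers;
        return providers.get(Math.floorMod(callIndex, providers.size()));
    }

    static String route(List<RegistryInvoker> registries, int callIndex) {
        RegistryInvoker registry = selectRegistry(registries)
                .orElseThrow(() -> new IllegalStateException("no registry available"));
        return registry.name + " -> " + selectProvider(registry, callIndex);
    }

    public static void main(String[] args) {
        RegistryInvoker zk = new RegistryInvoker("zk", false, List.of("10.0.0.1:20880"));
        RegistryInvoker nacos = new RegistryInvoker("nacos", true, List.of("10.0.0.2:20880"));
        // ZK is marked unavailable, so the call goes through the Nacos registry.
        System.out.println(route(List.of(zk, nacos), 0)); // prints "nacos -> 10.0.0.2:20880"
    }
}
```

In this model, skipping an unavailable registry at level 1 is what lets a second registry act as a fallback when the first one fails.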

Q&A

Configuration center issues under multiple registries

  1. By default, if an application has multiple registries and no configuration center is configured manually, Dubbo automatically creates one configuration center per registry, and their configurations overwrite one another. In Dubbo 2.7.8, etcd cannot be used as the default configuration center because of a bug (an SPI issue involving etcd and etcd3) that needs to be fixed
  2. With multiple registries, if you want a single, manually configured configuration center instead of several automatic ones, the business side has to change application code to configure it. This has two major drawbacks: it makes a later configuration center migration harder (for example, using etcd now and moving to Nacos later), and it raises the adoption cost, since a code change is needed on top of the client upgrade
  3. You can instead place the configuration center information in Apollo and load it before Dubbo starts. The client then does not need to maintain the configuration center information itself

Metadata center issues under multiple registries

The metadata center information is kept in the Dubbo configuration center, so it is also centrally managed

Does it matter if a registry fails?

  1. Multiple registries are a good degradation mechanism
  2. With multiple registries, if one registry dies, its Invoker is destroyed, and pre-routing automatically selects one of the other registries

Offline problems with existing registries

  1. Once all applications are registered with multiple registries, ZK can be taken offline
  2. By default, all clients have ZK configured as a registry in their code. If the ZK server goes offline in this state, the client reports errors from its retry tasks
  3. Does this force every client to upgrade and remove the ZK registry from its code? No: if the registry information is kept in Apollo, simply delete the ZK registry from the Apollo configuration center

Problems to be solved

  1. By default, the Dubbo console displays the service information of only one registry. With multiple registries, how can the service information of all registries be viewed conveniently?
  2. The Dubbo console does not currently support etcd, so it needs to be adapted