Read with these questions in mind:
- How is the server started? When the server starts it binds the corresponding RequestHandler, so once you know how the server starts you can trace how a request is processed after it is received.
- How is the client started? Presumably it establishes a connection and starts some background tasks.
- How do the client and server communicate? How many connections does each client create? Long connections or short connections? What happens if a server changes while in use (restart, outage, etc.)?
- What is the configuration push process? What if it fails? Does the client wait while the server retries? If a server fails, does the client switch to another server?
- What is the service registration process?
- How is the data stored?
Server startup
- Functionally: Nacos supports both a configuration center and a registry, and the registry decides whether to use AP mode or CP mode depending on whether the registered node is ephemeral. From this point of view, the functionality is fairly powerful;
- Operationally: Nacos supports deploying the configuration center and the registry independently, and also supports deploying them in the same process, depending on your needs. From this point of view, deployment is fairly flexible;
Both the configuration center and the registry are Spring Boot projects in Nacos, which makes them easy to deploy. From my perspective, though, there are some drawbacks, such as a rather scattered startup process. Reading the code, we find that the core functionality of Nacos is triggered by @PostConstruct annotations distributed across different modules and packages. There is nothing wrong with using Spring's features to trigger the startup of Nacos's core functionality, but it is inconvenient for anyone who wants a quick overview of the startup process. Since these annotations are scattered, to understand the startup process you need to know which @PostConstruct annotations exist, and then understand the logic behind each one in detail to form the overall picture.
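For illustration, a minimal sketch of such a @PostConstruct-driven component (not actual Nacos code) looks like this: Spring builds the bean, then the annotated method runs and kicks off the component's work.

```java
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.springframework.stereotype.Component;

// Illustrative only: a Spring component whose startup logic is triggered by
// @PostConstruct, the same mechanism Nacos uses to start its core modules.
@Component
public class DemoRpcServer {

    private volatile boolean running;

    @PostConstruct
    public void start() {
        // Runs automatically after Spring has built this bean;
        // a real server would bind handlers and open ports here.
        running = true;
        System.out.println("server component started");
    }

    @PreDestroy
    public void stop() {
        running = false;
        System.out.println("server component stopped");
    }
}
```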
Below is a list of the @PostConstruct-annotated classes in the different modules of 2.0.0-alpha.2:
1. console com.alibaba.nacos.console.config.ConsoleConfig
2. cmdb com.alibaba.nacos.cmdb.memory.CmdbProvider
3. config com.alibaba.nacos.config.server.auth.ExternalPermissionPersistServiceImpl
4. config com.alibaba.nacos.config.server.auth.ExternalRolePersistServiceImpl
5. config com.alibaba.nacos.config.server.auth.ExternalUserPersistServiceImpl
6. config com.alibaba.nacos.config.server.controller.HealthController
7. config com.alibaba.nacos.config.server.filter.CurcuitFilter
8. config com.alibaba.nacos.config.server.service.capacity.CapacityService
9. config com.alibaba.nacos.config.server.service.capacity.GroupCapacityPersistService
10. config com.alibaba.nacos.config.server.service.capacity.TenantCapacityPersistService
11. config com.alibaba.nacos.config.server.service.datasource.LocalDataSourceServiceImpl
12. config com.alibaba.nacos.config.server.service.dump.EmbeddedDumpService
13. config com.alibaba.nacos.config.server.service.dump.ExternalDumpService
14. config com.alibaba.nacos.config.server.service.repository.embedded.EmbeddedStoragePersistServiceImpl
15. config com.alibaba.nacos.config.server.service.repository.embedded.StandaloneDatabaseOperateImpl
16. config com.alibaba.nacos.config.server.service.repository.extrnal.ExternalStoragePersistServiceImpl
17. core com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy
18. core com.alibaba.nacos.core.remote.AbstractRequestFilter
19. core com.alibaba.nacos.core.remote.BaseRpcServer
20. core com.alibaba.nacos.core.remote.ClientConnectionEventListener
21. core com.alibaba.nacos.core.remote.ConnectionManager
22. istio com.alibaba.nacos.istio.mcp.NacosMcpServer
23. istio com.alibaba.nacos.istio.mcp.NacosMcpService
24. naming com.alibaba.nacos.naming.cluster.ServerListManager
25. naming com.alibaba.nacos.naming.cluster.ServerStatusManager
26. naming com.alibaba.nacos.naming.consistency.ephemeral.distro.DistroConsistencyServiceImpl
27. naming com.alibaba.nacos.naming.consistency.ephemeral.distro.DistroHttpRegistry
28. naming com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientComponentRegistry
29. naming com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore
30. naming com.alibaba.nacos.naming.consistency.persistent.raft.RaftPeerSet
31. naming com.alibaba.nacos.naming.consistency.persistent.raft.RaftConsistencyServiceImpl
32. naming com.alibaba.nacos.naming.core.DistroMapper
33. naming com.alibaba.nacos.naming.core.ServiceManager
34. naming com.alibaba.nacos.naming.misc.GlobalConfig
35. naming com.alibaba.nacos.naming.misc.SwitchManager
36. naming com.alibaba.nacos.naming.monitor.PerformanceLoggerThread
In summary, server startup covers the following core steps:
- Load the configuration file
- Start the log handler
- Start the ServerMemberManager
- Start the MemberLookup
- Start the gRPC server
- Start the Distro protocol, which is associated with the AP mode of the registry
- Start the Raft protocol, which is associated with the CP mode of the registry
Cluster Node Management
Nacos server startup can be divided into standalone mode and cluster mode. Standalone mode is mainly convenient for debugging; we can start Nacos in standalone mode by adding the -Dnacos.standalone=true startup parameter. Cluster mode has several variants. In cluster mode, whether AP or CP, each server generally needs to know the list of server nodes so that the servers can communicate with each other. How does Nacos discover the server nodes? Nacos provides the related APIs LookupFactory and MemberLookup. LookupFactory is a higher-level API that lets us quickly get or switch a MemberLookup, so we focus on MemberLookup. MemberLookup has three implementations in Nacos:
- StandaloneMemberLookup: corresponds to Nacos standalone mode; the core of this class is to obtain the local IP and port
- FileConfigMemberLookup: obtains the server list from ${user.home}/nacos/conf/cluster.conf
- AddressServerMemberLookup: obtains the server list from a separate address server
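As a minimal sketch of what the file-based lookup does, assuming a cluster.conf with one ip:port per line (this is not the actual FileConfigMemberLookup code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: read ${user.home}/nacos/conf/cluster.conf and
// collect the "ip:port" entries, which is essentially what the
// file-based member lookup produces.
public class ClusterConfReader {

    public static List<String> readMembers() throws IOException {
        Path conf = Paths.get(System.getProperty("user.home"), "nacos", "conf", "cluster.conf");
        return Files.readAllLines(conf).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        readMembers().forEach(System.out::println);
    }
}
```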
1. For a simple test, you can start in standalone mode, which corresponds to StandaloneMemberLookup:
```
-Dnacos.standalone=true
```
2. Start in cluster mode with each server node on its own machine. Either add the startup parameter -Dnacos.member.list=IP1:8848,IP2:8848,IP3:8848, or list the nodes in the configuration file `/Users/luoxy/nacos/conf/cluster.conf`, one ip:port per line.
3. Start in cluster mode with three server nodes deployed on the same machine, distinguished by different ports. Note that this mode has some limitations:
```
-Dserver.port=8848 -Dnacos.home=/Users/luoxy/nacos8848 -Ddebug=true -Dnacos.member.list=172.16.120.249:8848,172.16.120.249:8858,172.16.120.249:8868
```
The `nacos.home` parameter must be specified, because the user directory is used by default; if all three server nodes are started on the same machine, their JRaft data directories will conflict and the last two nodes will fail to start.
Client startup
The main work on the client side is to establish a connection to the server.
Configuration center
```
NacosFactory#createConfigService
  => new NacosConfigService
  => new ClientWorker
  => new ServerListManager
  => ConfigRpcTransportClient
```
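For reference, a minimal client that triggers this chain and registers a listener might look like the following; the dataId and group values are placeholders:

```java
import java.util.Properties;
import java.util.concurrent.Executor;
import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.PropertyKeyConst;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;

public class ConfigClientDemo {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.put(PropertyKeyConst.SERVER_ADDR, "localhost:8848");

        // Internally builds NacosConfigService -> ClientWorker -> ServerListManager
        // and the gRPC transport client described above.
        ConfigService configService = NacosFactory.createConfigService(properties);

        // Read a configuration (dataId/group are placeholders).
        String content = configService.getConfig("demo-dataId", "DEFAULT_GROUP", 3000);
        System.out.println("current config: " + content);

        // Listen for subsequent changes.
        configService.addListener("demo-dataId", "DEFAULT_GROUP", new Listener() {
            @Override
            public Executor getExecutor() {
                return null; // use the default notification thread
            }

            @Override
            public void receiveConfigInfo(String configInfo) {
                System.out.println("config changed: " + configInfo);
            }
        });

        Thread.sleep(Long.MAX_VALUE);
    }
}
```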
The registry
```
NacosFactory#createNamingService
  => new NacosNamingService
  => new NamingClientProxyDelegate
  => new NamingGrpcClientProxy
  => RpcClientFactory#createClient
  => GrpcSdkClient#start
```
Connection management
1. Clients (connections)
- ConnectionBasedClient: long connection, for version 2.x
- IpPortBasedClient: for version 1.x
2. Connection managers
- ConnectionBasedClientManager: manages ConnectionBasedClient
- EphemeralIpPortClientManager: manages IpPortBasedClient, for ephemeral (temporary) nodes
- PersistentIpPortClientManager: manages IpPortBasedClient, for persistent nodes
Their relationships and behavior are as follows:
- When a connection is created or disconnected, the ClientConnectionEventListenerRegistry#notifyClientxxx methods are executed, which in turn call the ConnectionBasedClientManager#clientxx methods to update the corresponding client cache
- ConnectionBasedClientManager and EphemeralIpPortClientManager start a scheduled task during initialization to check whether clients have expired; the default expiration times are 30s and 5s (see the sketch at the end of this section). Why doesn't PersistentIpPortClientManager start such a task? Because the first two manage ephemeral nodes, while PersistentIpPortClientManager manages persistent nodes.
3. Heartbeats
- Heartbeat sending: during service registration, the HTTP client creates a heartbeat task that sends a heartbeat to the server every 5s. With gRPC, no heartbeat task is created; clients are refreshed based on the TCP connection status, for example invalid clients are removed
- Heartbeat receiving on the server: gRPC clients do not send heartbeats, so the server does not receive any from them; it only receives the heartbeats sent by HTTP clients
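A minimal sketch of the expiry check mentioned above, assuming a 30s expiration and a 5s sweep interval; the class and field names here are made up, not the actual manager code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the "expired client" sweep that the ephemeral
// client managers run on a schedule; names and intervals are assumptions.
public class ClientExpirySweeper {

    private static final long EXPIRE_MILLIS = 30_000L; // assumed default

    // clientId -> last renew timestamp (millis)
    private final Map<String, Long> lastRenewTime = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Run every 5 seconds and drop clients that have not renewed in time.
        scheduler.scheduleWithFixedDelay(this::sweep, 5, 5, TimeUnit.SECONDS);
    }

    public void renew(String clientId) {
        lastRenewTime.put(clientId, System.currentTimeMillis());
    }

    private void sweep() {
        long now = System.currentTimeMillis();
        lastRenewTime.entrySet().removeIf(e -> now - e.getValue() > EXPIRE_MILLIS);
    }
}
```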
Configuration updates
To quickly understand the overall flow of publishing a configuration, create a configuration directly from the console:
- Request the HTTP interface of the console backend: /v1/cs/configs; the corresponding controller is ConfigController
- Update the configuration through the external storage service ExternalStoragePersistServiceImpl (a PersistService implementation), which ultimately executes the SQL that writes to the database
- Publish a ConfigDataChangeEvent event: the event is pushed to a BlockingQueue and then all subscribers are notified in turn. Note that "subscriber" here does not mean a listening client but a subscriber on the server side; that is, the processing the server performs after receiving the ConfigDataChangeEvent, which is essentially the observer pattern
- Execute notify for each subscriber
- Log and return the result
So, what are the subscribers on the server side? By debugging the source code we find that subscribers are registered through the static method NotifyCenter#registerSubscriber. The core subscribers are:
- RpcConfigChangeNotifier: corresponds to gRPC clients; handles the ConfigDataChangeEvent event and notifies the clients
- LongPollingService: corresponds to HTTP clients; handles the LocalDataChangeEvent event and notifies the clients
- AsyncNotifyService: handles the ConfigDataChangeEvent event; notifies the other server nodes of the configuration change and updates this server's dump cache
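The publish/subscribe mechanism described above is essentially an event queue drained by a worker thread that notifies each registered subscriber in turn. A generic sketch of that pattern (this is not the actual NotifyCenter implementation):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of the server-side observer pattern: events are pushed
// to a BlockingQueue and a background thread notifies every subscriber in turn.
public class SimpleNotifyCenter {

    public interface Subscriber {
        void onEvent(Object event);
    }

    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final List<Subscriber> subscribers = new CopyOnWriteArrayList<>();

    public void registerSubscriber(Subscriber subscriber) {
        subscribers.add(subscriber);
    }

    public void publishEvent(Object event) {
        queue.offer(event);
    }

    public void start() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Object event = queue.take();
                    for (Subscriber s : subscribers) {
                        s.onEvent(event);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "notify-center-worker");
        worker.setDaemon(true);
        worker.start();
    }
}
```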
Service registration
The registry has two modes: AP and CP.
AP mode
AP mode has one characteristic: the registered node type is ephemeral and the data is not persisted. It uses the Distro protocol, a consistency protocol for temporary data:
```java
Properties properties = new Properties();
// Address of the Nacos server
properties.put(PropertyKeyConst.SERVER_ADDR, "localhost:8848");
NamingService nameService = NacosFactory.createNamingService(properties);
Instance instance = new Instance();
instance.setIp("127.0.0.1");
instance.setPort(8009);
instance.setMetadata(new HashMap<>());
instance.setEphemeral(true);
nameService.registerInstance("nacos-api", instance);
Thread.sleep(Integer.MAX_VALUE);
```
- The gRPC client connects to the Server
- The client sends a request to the server (RpcClient#request)
- The server receives the request. Before looking at that, one question needs to be clarified: how is the server started? We need to know this because the server binds the corresponding RequestHandlers when it starts; knowing this, we can easily set breakpoints and quickly grasp the whole process. So how is the server started in Nacos? The core is that BaseRpcServer is annotated with @PostConstruct, and when the server starts it binds handlers such as GrpcRequestAcceptor -> InstanceRequestHandler (see the dispatch sketch after this list)
- First enter the GrpcRequestAcceptor#request method; then RequestHandler#handleRequest, which handles permissions first; then InstanceRequestHandler#handle, which decides whether to register or deregister the service (we take registration as the example); then EphemeralClientOperationServiceImpl#registerInstance. EphemeralClientOperationServiceImpl is an implementation of ClientOperationService; PersistentClientOperationServiceImpl is another implementation; they correspond to AP mode and CP mode respectively
- Execute the AbstractClient#addServiceInstance method, which updates the Map cache and publishes a ClientEvent.ClientChangedEvent event
- Publish a ClientOperationEvent.ClientRegisterServiceEvent event
- Publish a MetadataEvent.InstanceMetadataEvent event, then return the response to the client
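As referenced in the list above, the GrpcRequestAcceptor -> RequestHandler routing boils down to a lookup table from request type to handler that is filled while the server starts. A simplified sketch under that assumption (these are not the actual Nacos classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of handler dispatch: handlers are registered by request
// type at startup, and incoming requests are routed to the matching handler.
public class RequestDispatcher {

    public interface Handler {
        String handle(String payload);
    }

    private final Map<String, Handler> handlers = new ConcurrentHashMap<>();

    // Called while the server starts (e.g. from a @PostConstruct method).
    public void register(String requestType, Handler handler) {
        handlers.put(requestType, handler);
    }

    public String dispatch(String requestType, String payload) {
        Handler handler = handlers.get(requestType);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for " + requestType);
        }
        // A real server would check permissions here before handling.
        return handler.handle(payload);
    }

    public static void main(String[] args) {
        RequestDispatcher dispatcher = new RequestDispatcher();
        dispatcher.register("InstanceRequest", payload -> "registered: " + payload);
        System.out.println(dispatcher.dispatch("InstanceRequest", "nacos-api@127.0.0.1:8009"));
    }
}
```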
The main flow may look simple, but some core logic is hidden in the handling of these events. For example, when one server node updates its cache, how is the data synchronized to the other server nodes? You will find there are many questions like this.
During a service registration, 3 events are published, but 4 events are involved in total: the listener logic of some events publishes new events. Let's look at what each of these events involves.
- ClientEvent.ClientChangedEvent: the corresponding listener is DistroClientDataProcessor. When is DistroClientDataProcessor registered? DistroClientComponentRegistry#doRegister is annotated with @PostConstruct, and DistroClientDataProcessor is registered in that method. After DistroClientDataProcessor receives the ClientEvent.ClientChangedEvent event, it synchronizes the data change to the other server nodes through DistroProtocol. This answers the first question; how the synchronization works is analyzed in detail later
- ClientOperationEvent.ClientRegisterServiceEvent: the corresponding listener is ClientServiceIndexesManager, which is a Spring bean. After ClientServiceIndexesManager receives the ClientOperationEvent.ClientRegisterServiceEvent event, it first updates the cache ConcurrentMap<Service, Set<String>> publisherIndexes and then publishes another ServiceEvent.ServiceChangedEvent event. What exactly does publisherIndexes do? We analyze it later (a simplified sketch follows this list)
- MetadataEvent.InstanceMetadataEvent: the corresponding listener is NamingMetadataManager, which is a Spring bean. After NamingMetadataManager receives the MetadataEvent.InstanceMetadataEvent event, it updates the information in two caches: expiredMetadataInfos and serviceMetadataMap
- ServiceEvent.ServiceChangedEvent: the corresponding listener is NamingSubscriberServiceV2Impl, which is a Spring bean. After NamingSubscriberServiceV2Impl receives the ServiceEvent.ServiceChangedEvent event, it submits a task to PushDelayTaskExecuteEngine; the intent is not yet clear to me
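To make the publisherIndexes cache above more concrete, here is a simplified sketch of a service-to-publisher index; the real ClientServiceIndexesManager keys on a Service object rather than a plain String, so treat this purely as an illustration:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch of an index like publisherIndexes: for each service,
// keep the set of client ids that have registered (published) an instance.
public class ServicePublisherIndex {

    private final ConcurrentMap<String, Set<String>> publisherIndexes = new ConcurrentHashMap<>();

    // Called when a client registers an instance of the service.
    public void addPublisher(String service, String clientId) {
        publisherIndexes
                .computeIfAbsent(service, s -> ConcurrentHashMap.newKeySet())
                .add(clientId);
    }

    // Called when a client deregisters or disconnects.
    public void removePublisher(String service, String clientId) {
        Set<String> clients = publisherIndexes.get(service);
        if (clients != null) {
            clients.remove(clientId);
        }
    }

    // Used when assembling the instance list to push to subscribers.
    public Set<String> getPublishers(String service) {
        return publisherIndexes.getOrDefault(service, Collections.emptySet());
    }
}
```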
There are still some things I don’t understand, such as the purpose of these caches
Distro protocol
The Distro protocol is positioned as a consistency protocol for temporary data. That is, there is no need to store the data on disk or in a database; temporary data usually corresponds to a session with the server, and the data is not lost as long as the session exists.
CP mode
CP mode has one characteristic: the registered node type is persistent (non-ephemeral) and the data is persisted:
```java
Properties properties = new Properties();
// Address of the Nacos server
properties.put(PropertyKeyConst.SERVER_ADDR, "localhost:8848");
NamingService nameService = NacosFactory.createNamingService(properties);
Instance instance = new Instance();
instance.setIp("127.0.0.1");
instance.setPort(8009);
instance.setMetadata(new HashMap<>());
instance.setEphemeral(false);
nameService.registerInstance("nacos-api", instance);
Thread.sleep(Integer.MAX_VALUE);
```
It turns out that an ephemeral node is registered anyway. This is the result of local debugging, and it was confirmed with the Nacos maintainers: CP mode cannot be used over the long-connection (gRPC) channel.
However, persistent nodes can be registered via HTTP, as follows:
```bash
curl -X POST 'http://127.0.0.1:8848/nacos/v1/ns/instance?port=8848&healthy=true&ip=11.11.11.11&weight=1.0&serviceName=nacos.test.3&encoding=GBK&namespaceId=n1&ephemeral=false'
```
However, an error message is returned. (The current deployment is a single node started with the parameters -Dnacos.standalone=true -Ddebug=true.)
```
caused: java.util.concurrent.ExecutionException: com.alibaba.nacos.consistency.exception.ConsistencyException: com.alibaba.nacos.core.distributed.raft.exception.NoLeaderException: The Raft Group [naming_persistent_service_v2] did not find the Leader node;
caused: com.alibaba.nacos.consistency.exception.ConsistencyException: com.alibaba.nacos.core.distributed.raft.exception.NoLeaderException: The Raft Group [naming_persistent_service_v2] did not find the Leader node;
caused: com.alibaba.nacos.core.distributed.raft.exception.NoLeaderException: The Raft Group [naming_persistent_service_v2] did not find the Leader node
```
It seems to be related to leader election; the preliminary judgment is that this is because the server is deployed as a single node.
The core of the CP mode implementation is based on the open-source JRaft: www.sofastack.tech/projects/so…
```bash
# send a heartbeat
curl -v -X PUT "http://localhost:8848/nacos/v1/ns/instance/beat?serviceName=xxx"
```
Classes involved in the server-side beat check:
- IpPortBasedClient, ClientBeatCheckTaskV2
- InstanceBeatCheckTaskInterceptorChain
- Interceptors: HealthCheckEnableInterceptor, HealthCheckResponsibleInterceptor, InstanceEnableBeatCheckInterceptor, ServiceEnableBeatCheckInterceptor
- InstanceBeatCheckTask, UnhealthyInstanceChecker, ExpiredInstanceChecker
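Judging from the class names above, the beat check appears to run through an interceptor chain in which any interceptor can stop the check. A generic sketch of that pattern (the behavior of each interceptor here is an assumption, not the real implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative interceptor-chain sketch: each interceptor can veto the beat
// check (e.g. health check disabled, not the responsible server, instance or
// service has beat checking turned off); only if all pass does the task run.
public class BeatCheckChain {

    public interface Interceptor {
        boolean intercept(); // return true to stop the check
    }

    private final List<Interceptor> interceptors = new ArrayList<>();
    private final Runnable beatCheckTask;

    public BeatCheckChain(Runnable beatCheckTask) {
        this.beatCheckTask = beatCheckTask;
    }

    public BeatCheckChain add(Interceptor interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    public void run() {
        for (Interceptor interceptor : interceptors) {
            if (interceptor.intercept()) {
                return; // one interceptor vetoed: skip the beat check
            }
        }
        beatCheckTask.run(); // e.g. mark unhealthy / remove expired instances
    }
}
```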
Configuration management
If there are 3 server nodes in a cluster, client 1 is connected to server1 and client 2 is connected to server2. When client 1 writes a configuration, how does client 2 learn about the configuration change? In other words, when a configuration is changed, how is it synchronized between the different server nodes?
1. server1 receives the request, updates the database, and then updates the local dump file.
2. After receiving the ConfigDataChangeEvent, AsyncNotifyService obtains all server nodes in the cluster.
3. A NotifySingleRpcTask is created for each server node based on the ConfigDataChangeEvent; the tasks are then merged into an AsyncRpcTask and submitted to a thread pool for execution. (For HTTP requests, …)
4. Execution logic of AsyncRpcTask: take each NotifySingleRpcTask and check whether the server node in it is the current node; if so, dump the local file; otherwise send a gRPC request to that server node (see the sketch after this list).
5. What if an error occurs while notifying the other servers? The retries continue, and the retry delay increases with the number of retries.
6. GrpcRequestAcceptor -> ConfigChangeClusterSyncRequestHandler: performs the local dump operation, querying the information from the database and updating the cache file.
7. Dump operation: the latest value is read from the database, the local cache file is updated, and a LocalDataChangeEvent is published. Both RpcConfigChangeNotifier and LongPollingService listen for this event; after receiving it, they find all listeners (clients) listening on that key and send notifications to those clients. After receiving a notification, a client finds the corresponding listeners by key and executes them in turn.
The local cache file exists mainly to reduce the read pressure on the database. (As noted earlier, the gRPC server itself is started by a @PostConstruct-annotated method in BaseRpcServer.)
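As a rough sketch of step 4's decision between dumping locally and calling another node, with simplified stand-in names (this is not the real AsyncRpcTask code):

```java
import java.util.List;

// Illustrative sketch of how a config-change task is fanned out to the
// cluster: the current node dumps locally, other nodes get an RPC call.
public class ConfigChangeFanout {

    private final String selfAddress;

    public ConfigChangeFanout(String selfAddress) {
        this.selfAddress = selfAddress;
    }

    public void notifyCluster(List<String> memberAddresses, String dataId, String group) {
        for (String member : memberAddresses) {
            if (member.equals(selfAddress)) {
                dumpLocally(dataId, group);                    // update this node's cache file
            } else {
                sendClusterSyncRequest(member, dataId, group); // gRPC to the other node
            }
        }
    }

    private void dumpLocally(String dataId, String group) {
        System.out.printf("dump %s/%s from DB to local cache file%n", dataId, group);
    }

    private void sendClusterSyncRequest(String member, String dataId, String group) {
        System.out.printf("send ConfigChangeClusterSync for %s/%s to %s%n", dataId, group, member);
    }
}
```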
Service discovery
Service discovery has two modes: AP and CP.
1. In AP mode, ephemeral nodes are registered; no persistence is involved; the data is stored in a Map, and data synchronization between servers is done by broadcasting. If synchronization fails, it is retried (see the example below).
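On the consumer side, service discovery through the client SDK is a query plus a subscription. A minimal example, reusing the placeholder service name from the registration snippets above:

```java
import java.util.Properties;
import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.PropertyKeyConst;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.listener.NamingEvent;
import com.alibaba.nacos.api.naming.pojo.Instance;

public class DiscoveryDemo {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.put(PropertyKeyConst.SERVER_ADDR, "localhost:8848");
        NamingService namingService = NacosFactory.createNamingService(properties);

        // One-off query for the healthy instances of the service.
        for (Instance instance : namingService.selectInstances("nacos-api", true)) {
            System.out.println(instance.getIp() + ":" + instance.getPort());
        }

        // Subscribe to be pushed future changes of the instance list.
        namingService.subscribe("nacos-api", event -> {
            if (event instanceof NamingEvent) {
                System.out.println("instances changed: " + ((NamingEvent) event).getInstances());
            }
        });

        Thread.sleep(Long.MAX_VALUE);
    }
}
```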
Configuration center
About connections
- There are two transport options: HTTP and gRPC. gRPC is only available from version 2.0; we only discuss gRPC here
- Connections are created only when the client actually uses them, for example when a ConfigService is initialized or when a configuration is published
- In the current version (2.0.0-alpha.2), each client establishes only one gRPC connection, but judging from the intent of the code, later versions may allow one client to hold multiple connections and assign different taskIds to them (ClientWorker#ensureRpcClient)
- Before establishing a connection, the client first obtains the list of all servers and picks the first one to connect to. If the connection succeeds, it returns; if it fails, the next server is chosen and the connection is retried, up to 3 times (RpcClient#start; see the sketch after this list)
- What if the connection still fails after those 3 attempts? In that step, besides establishing the connection, two background threads are started: one handles connection failures (reconnection), the other handles the callback after a successful connection
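As mentioned in the list above, the connect logic is a loop over the server list with a bounded retry count. A simplified sketch (not the real RpcClient#start):

```java
import java.util.List;

// Illustrative sketch of the client's connect logic: walk the server list,
// try to connect, and give up after a bounded number of attempts. Reconnection
// after that point is left to the background failure-handling thread.
public class ConnectWithRetry {

    private static final int MAX_RETRY = 3;

    public static String connectToAny(List<String> servers) {
        for (int attempt = 0; attempt < MAX_RETRY; attempt++) {
            String server = servers.get(attempt % servers.size());
            if (tryConnect(server)) {
                return server; // connected, remember the current server
            }
        }
        return null; // hand over to the background reconnect task
    }

    private static boolean tryConnect(String server) {
        // Placeholder: a real client would open a gRPC channel and do a health check.
        System.out.println("trying " + server);
        return false;
    }
}
```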
About requests
- The client sends a publish request to the selected server and retries on failure, at most 3 times, with no timeout. If the long connection between the client and the server is abnormally disconnected during the request (for example, when a server node goes offline), the client re-establishes a connection with an available server through the background tasks mentioned above, so that the connection stays usable
- The server receives the request, updates the database, then updates the local cache file, and then sends notifications. The notifications target two parties: clients and servers. The clients are the ones listening to the configuration; the servers are the other server nodes, which are told that the configuration has changed
- A client that receives a configuration change notification executes its listener logic. A server that receives the notification updates its local dump file and in turn notifies the clients listening on it; those clients then execute their listener logic
About storage
- Database: for persistence
- Local cache, which is a Map, to reduce the database read load (ConfigCacheService#dump)
References
- Distro protocol: cloud.tencent.com/developer/a…