Preface
The purpose of this article is to give beginners the basic concepts so that they can build a solid foundation as they continue learning.
Starting today, I will keep publishing a series of advanced Spring Cloud tutorials.
Eureka introduction
Eureka is a service discovery framework developed by Netflix and is itself a REST-based service. Spring Cloud integrates it in its sub-project Spring Cloud Netflix to provide service registration and discovery.
Eureka overall architecture diagram
This section describes Eureka components
- Service registry cluster: deployed across the IDC1, IDC2, and IDC3 data centers.
- Service providers: one deployed in IDC1 and one in IDC3.
- Service consumers: deployed in IDC1 and IDC2.
Call relationships between components
Service provider
- At startup: the service provider sends a Register request to the service registry to register itself.
- Running: the service provider periodically sends a Renew heartbeat to the registry, telling it “I’m still alive.”
- On shutdown: the service provider sends a Cancel request to the registry, asking it to clear the service’s registration information.
Service consumer
- After startup: Pull service registration information from the service registry.
- Running: Update service registration information periodically.
- Initiating a remote call:
- A service consumer prefers a provider in its own machine room (data center) when it looks one up in the registry, and falls back to a provider in another machine room only when the same-room providers are down.
- If the consumer finds no provider in its own machine room at all, it picks a provider in another machine room according to the load-balancing algorithm and invokes it remotely (a minimal sketch of this preference follows).
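The sketch below illustrates the same-machine-room (zone) preference described above, assuming a simple random pick as the load-balancing step. The Provider and ZonePreferenceChooser names are invented for this example and are not part of Eureka or Ribbon.

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical provider descriptor; the field names are illustrative, not Eureka's InstanceInfo.
class Provider {
    final String instanceId;
    final String zone;
    Provider(String instanceId, String zone) { this.instanceId = instanceId; this.zone = zone; }
}

class ZonePreferenceChooser {
    // Prefer providers in the caller's own machine room/zone; fall back to all zones
    // only when no same-zone provider is available (the rule described above).
    static Provider choose(List<Provider> all, String myZone) {
        List<Provider> sameZone = new ArrayList<Provider>();
        for (Provider p : all) {
            if (Objects.equals(p.zone, myZone)) {
                sameZone.add(p);
            }
        }
        List<Provider> candidates = sameZone.isEmpty() ? all : sameZone;
        if (candidates.isEmpty()) {
            return null; // nothing registered at all
        }
        // A random pick stands in for the real load-balancing algorithm (round-robin by default).
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }
}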
The registry
- After startup: Pull service registration information from other nodes
- During operation:
- The Evict task runs periodically and removes service providers that have not sent a Renew on time. Note that this eviction targets services that stopped abnormally (for example because of a network failure); services that shut down normally send a Cancel request instead.
- Received Register, Renew, and Cancel requests are synchronized to other registry nodes.
Eureka Server provides service registration, discovery, and heartbeat detection through interfaces such as Register, Renew, and Get Registry.
Eureka Client is a Java client that simplifies interaction with Eureka Server. The client has a built-in load balancer (round-robin by default). After startup, the client sends heartbeats to Eureka Server at a default interval of 30 seconds. If Eureka Server does not receive a heartbeat from a client instance within the lease expiration window (90 seconds by default), it removes that instance from the service registry.
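To make the timing concrete: with the defaults, a client that misses three consecutive 30-second heartbeats exceeds the 90-second lease window and becomes a candidate for eviction. Below is a minimal sketch of the idea behind the lease check; it is a simplification, not Eureka’s actual Lease class, which keeps additional state such as the eviction and service-up timestamps.

import java.util.concurrent.TimeUnit;

// Simplified sketch of a lease: renewed on every heartbeat, expired when the
// time since the last renewal exceeds the lease duration.
final class SimpleLease {
    private final long durationMs;          // e.g. 90 s by default
    private volatile long lastRenewalMs;    // refreshed on every Renew (heartbeat)

    SimpleLease(long durationSeconds) {
        this.durationMs = TimeUnit.SECONDS.toMillis(durationSeconds);
        this.lastRenewalMs = System.currentTimeMillis();
    }

    void renew() {                          // called when a heartbeat arrives, every ~30 s
        this.lastRenewalMs = System.currentTimeMillis();
    }

    boolean isExpired() {                   // checked by the Evict task
        return System.currentTimeMillis() > lastRenewalMs + durationMs;
    }
}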
The data structure
The data structure of the service store can be easily understood as a two-tier HashMap (ConcurrentHashMap for thread safety). For details, see the AbstractInstanceRegistry class in the source code:
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
The key of the first-layer ConcurrentHashMap is spring.application.name, that is, the application name; its value is another ConcurrentHashMap.
The key of the second-layer ConcurrentHashMap is the service’s unique instanceId; its value is a Lease object representing the specific instance. Lease wraps an InstanceInfo and holds the instance information, registration time, and so on. For details, see the InstanceInfo source.
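To make the two-layer structure concrete, here is an illustrative lookup sketch. The LeaseLike and RegistryLookup names are invented for this example; the real field and logic live in AbstractInstanceRegistry.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for Lease<InstanceInfo>.
class LeaseLike {
    final String instanceInfo;        // stands in for the InstanceInfo payload
    final long registrationTimestamp;
    LeaseLike(String instanceInfo, long registrationTimestamp) {
        this.instanceInfo = instanceInfo;
        this.registrationTimestamp = registrationTimestamp;
    }
}

class RegistryLookup {
    // layer 1 key: application name (spring.application.name); layer 2 key: instanceId
    private final ConcurrentHashMap<String, Map<String, LeaseLike>> registry =
            new ConcurrentHashMap<String, Map<String, LeaseLike>>();

    LeaseLike find(String appName, String instanceId) {
        Map<String, LeaseLike> instances = registry.get(appName);    // layer 1: app name -> instance map
        return instances == null ? null : instances.get(instanceId); // layer 2: instanceId -> lease
    }
}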
Data storage process
Eureka provides services externally through the REST interface.
Take registration (ApplicationResource) as an example: a PeerAwareInstanceRegistry instance is first injected into ApplicationResource’s registry member variable.
- Upon receiving the request, ApplicationResource calls the registry.register() method.
@POST
@Consumes({"application/json", "application/xml"})
public Response addInstance(InstanceInfo info,
                            @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
    logger.debug("Registering instance {} (replication={})", info.getId(), isReplication);
    // validate that the instanceinfo contains all the necessary required fields
    if (isBlank(info.getId())) {
        return Response.status(400).entity("Missing instanceId").build();
    } else if (isBlank(info.getHostName())) {
        return Response.status(400).entity("Missing hostname").build();
    } else if (isBlank(info.getIPAddr())) {
        return Response.status(400).entity("Missing ip address").build();
    } else if (isBlank(info.getAppName())) {
        return Response.status(400).entity("Missing appName").build();
    } else if (!appName.equals(info.getAppName())) {
        return Response.status(400).entity("Mismatched appName, expecting " + appName + " but was " + info.getAppName()).build();
    } else if (info.getDataCenterInfo() == null) {
        return Response.status(400).entity("Missing dataCenterInfo").build();
    } else if (info.getDataCenterInfo().getName() == null) {
        return Response.status(400).entity("Missing dataCenterInfo Name").build();
    }

    // handle cases where clients may be registering with bad DataCenterInfo with missing data
    DataCenterInfo dataCenterInfo = info.getDataCenterInfo();
    if (dataCenterInfo instanceof UniqueIdentifier) {
        String dataCenterInfoId = ((UniqueIdentifier) dataCenterInfo).getId();
        if (isBlank(dataCenterInfoId)) {
            boolean experimental = "true".equalsIgnoreCase(serverConfig.getExperimental("registration.validation.dataCenterInfoId"));
            if (experimental) {
                String entity = "DataCenterInfo of type " + dataCenterInfo.getClass() + " must contain a valid id";
                return Response.status(400).entity(entity).build();
            } else if (dataCenterInfo instanceof AmazonInfo) {
                AmazonInfo amazonInfo = (AmazonInfo) dataCenterInfo;
                String effectiveId = amazonInfo.get(AmazonInfo.MetaDataKey.instanceId);
                if (effectiveId == null) {
                    amazonInfo.getMetadata().put(AmazonInfo.MetaDataKey.instanceId.getName(), info.getId());
                }
            } else {
                logger.warn("Registering DataCenterInfo of type {} without an appropriate id", dataCenterInfo.getClass());
            }
        }
    }

    registry.register(info, "true".equals(isReplication));
    return Response.status(204).build();  // 204 to be backwards compatible
}
- AbstractInstanceRegistry stores the service information in its register method.
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
    try {
        read.lock();
        Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
        REGISTER.increment(isReplication);
        if (gMap == null) {
            final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
            gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
            if (gMap == null) {
                gMap = gNewMap;
            }
        }
        Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
        // Retain the last dirty timestamp without overwriting it, if there is already a lease
        if (existingLease != null && (existingLease.getHolder() != null)) {
            Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
            Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
            logger.debug("Existing lease found (existing={}, provided={}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
            // this is a > instead of a >= because if the timestamps are equal, we still take the remote transmitted
            // InstanceInfo instead of the server local copy.
            if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
                logger.warn("There is an existing lease and the existing lease's dirty timestamp {} is greater" +
                        " than the one that is being registered {}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
                logger.warn("Using the existing instanceInfo instead of the new instanceInfo as the registrant");
                registrant = existingLease.getHolder();
            }
        } else {
            // The lease does not exist and hence it is a new registration
            synchronized (lock) {
                if (this.expectedNumberOfClientsSendingRenews > 0) {
                    // Since the client wants to register it, increase the number of clients sending renews
                    this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
                    updateRenewsPerMinThreshold();
                }
            }
            logger.debug("No previous lease information found; it is new registration");
        }
        Lease<InstanceInfo> lease = new Lease<InstanceInfo>(registrant, leaseDuration);
        if (existingLease != null) {
            lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
        }
        gMap.put(registrant.getId(), lease);
        synchronized (recentRegisteredQueue) {
            recentRegisteredQueue.add(new Pair<Long, String>(
                    System.currentTimeMillis(),
                    registrant.getAppName() + "(" + registrant.getId() + ")"));
        }
        // This is where the initial state transfer of overridden status happens
        if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
            logger.debug("Found overridden status {} for instance {}. Checking to see if needs to be add to the "
                    + "overrides", registrant.getOverriddenStatus(), registrant.getId());
            if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
                logger.info("Not found overridden id {} and hence adding it", registrant.getId());
                overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
            }
        }
        InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
        if (overriddenStatusFromMap != null) {
            logger.info("Storing overridden status {} from map", overriddenStatusFromMap);
            registrant.setOverriddenStatus(overriddenStatusFromMap);
        }
        // Set the status based on the overridden status rules
        InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
        registrant.setStatusWithoutDirty(overriddenInstanceStatus);
        // If the lease is registered with UP status, set lease service up timestamp
        if (InstanceStatus.UP.equals(registrant.getStatus())) {
            lease.serviceUp();
        }
        registrant.setActionType(ActionType.ADDED);
        recentlyChangedQueue.add(new RecentlyChangedItem(lease));
        registrant.setLastUpdatedTimestamp();
        invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
        logger.info("Registered instance {}/{} with status {} (replication={})",
                registrant.getAppName(), registrant.getId(), registrant.getStatus(), isReplication);
    } finally {
        read.unlock();
    }
}
It is easy to see from the source code that the stored data structure is a two-tier HashMap.
Eureka also maintains a two-level cache for the service information it returns to clients:
- Level 1 cache: essentially a HashMap with no expiration time; it holds the rendered service information that is sent to clients.
  private final ConcurrentMap<Key, Value> readOnlyCacheMap = new ConcurrentHashMap<Key, Value>();
- Level 2 cache: a Guava cache with an expiration (invalidation) mechanism; it also holds the rendered service information.
  private final LoadingCache<Key, Value> readWriteCacheMap;
- Cache update:
  Deleting the level-2 cache:
  - When the client sends a Register, Renew, or Cancel request and the registry is updated, the relevant level-2 cache entries are deleted.
  - The server’s own Evict task also deletes the level-2 cache entries after it removes a service.
  - Entries additionally expire through Guava’s invalidation mechanism on readWriteCacheMap.
  Loading the level-2 cache:
  - When the client sends a Get Registry request and the entry is not in the level-2 cache, Guava’s load mechanism is triggered: the raw service information is read from the registry, processed, and then put into the level-2 cache.
  - When the server refreshes the level-1 cache, Guava’s load mechanism is also triggered if the entry is missing from the level-2 cache.
  Updating the level-1 cache:
  - A built-in timer task on the server periodically synchronizes data from the level-2 cache to the level-1 cache, including deletions and updates.
The caching mechanism can be viewed in the ResponseCacheImpl source code.
To summarize, Eureka stores registration data in a two-layer map and serves it to clients through a two-level cache.
Service registration mechanism
The service provider, the service consumer, and the registry node itself all send register requests to the service registry after startup (provided registration is enabled in their configuration).
When the registry receives a Register request, it:
- Saves the service information to the registry map;
- Adds the change to the recently-changed queue, which Eureka Clients use for incremental synchronization;
- Clears the level-2 cache (readWriteCacheMap) to keep the data consistent;
- Updates the renewal threshold;
- Synchronizes the service information to the other registry nodes.
Service renewal
After a service is registered, the client periodically sends a Renew request (heartbeat) to show that it is still alive so that its registration is not cleared. The default interval is 30 seconds.
When the registry receives a Renew request, it:
- Updates the instance’s last renewal time (lastUpdateTimestamp);
- Synchronizes the renewal to the other registry nodes (a simplified sketch of this flow follows).
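The sketch below is a stripped-down illustration of the renew flow, reusing the SimpleLease type from the lease sketch above. It is not the real AbstractInstanceRegistry.renew, which also handles overridden status, renewal counters, and replication to peer nodes.

import java.util.Map;

final class RenewHandler {
    // Simplified renew: find the lease for this instance and refresh its last-renewal timestamp.
    static boolean renew(Map<String, Map<String, SimpleLease>> registry,
                         String appName, String instanceId) {
        Map<String, SimpleLease> instances = registry.get(appName);
        SimpleLease lease = (instances == null) ? null : instances.get(instanceId);
        if (lease == null) {
            return false;   // unknown instance: the client is expected to re-register
        }
        lease.renew();      // refresh the timestamp so the Evict task keeps this instance
        return true;
    }
}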
Service cancellation
When a service shuts down normally, it first sends a logout (Cancel) request to tell the registry it is going offline.
When the registry receives a Cancel request, it:
- Deletes the service information from the registry map;
- Updates the recently-changed queue;
- Clears the level-2 cache;
- Updates the renewal threshold;
- Synchronizes the cancellation to the other registry nodes.
Note: a Cancel request is sent only when the service stops normally. If the service stops abnormally, the instance is removed by Eureka Server’s eviction mechanism instead.
Service eviction
Service eviction is in fact a fallback mechanism: it clears the registration information of services that stopped abnormally, or that for other reasons were unable to send a Cancel request.
Eviction proceeds in three steps:
- Checking the eviction conditions
- Identifying expired services
- Removing the expired services
Eviction conditions:
- Self-protection is disabled; or
- Self-protection is enabled, but the threshold check shows that the problem lies with individual clients rather than with the server, in which case the faulty instances are removed.
Self-protection mechanism: Eureka’s self-protection mechanism guards against evicting healthy instances by mistake. If a large number of services fail to renew at once, Eureka assumes the problem is on its own side (for example, it has lost network connectivity) and does not evict anyone; otherwise it assumes the individual clients are at fault and evicts them.
The self-protection threshold is what decides whether the fault is on the server side or the client side. If the number of renews per minute stays above the threshold, most services are still renewing and only a few are unavailable, so the failures are attributed to those clients and they are evicted. If the count falls below the threshold, a large number of services are failing to renew at the same time, which points to a problem on the server side, so eviction stops.
Calculation of the threshold:
- Self-protection threshold = total number of instances × renewals per minute per instance × self-protection threshold factor;
- Renewals per minute per instance = 60 s / client renewal interval.
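For example, with the default settings (client renewal interval 30 s, self-protection threshold factor 0.85), a registry holding 10 instances expects 10 × (60 / 30) = 20 renews per minute, so the self-protection threshold is 20 × 0.85 = 17; if fewer than 17 renews arrive within a minute, self-protection kicks in and eviction is suspended.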
Expired services:
If the time since an instance’s last renewal exceeds the allowed lease duration, the instance is marked as expired and added to the set of expired services.
Evicting services:
Before removing anything, the registry first calculates how many instances it is allowed to evict in this round, then walks through the expired services and uses a shuffle algorithm to pick the victims fairly before removing them (see the sketch below).
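A minimal sketch of the shuffle-based selection described above; the names here are illustrative, and the real implementation (including how the eviction limit is computed) lives in AbstractInstanceRegistry.evict.

import java.util.Collections;
import java.util.List;
import java.util.Random;

final class EvictionPicker {
    // Randomly pick which expired leases to evict this round, so eviction pressure
    // is spread fairly instead of always hitting the same applications first.
    static void evict(List<String> expiredInstanceIds, int evictionLimit) {
        Random random = new Random(System.currentTimeMillis());
        int toEvict = Math.min(expiredInstanceIds.size(), evictionLimit);
        for (int i = 0; i < toEvict; i++) {
            // Knuth-shuffle step: swap a random untouched element into position i ...
            int next = i + random.nextInt(expiredInstanceIds.size() - i);
            Collections.swap(expiredInstanceIds, i, next);
            String victim = expiredInstanceIds.get(i);
            System.out.println("Evicting expired instance " + victim);
            // ... and the real code would then cancel that instance's registration.
        }
    }
}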
After evicting a service, the registry:
- Deletes the service information from the registry map;
- Updates the recently-changed queue;
- Clears the level-2 cache to ensure data consistency.
Service access
When an Eureka Client fetches service information, the server reads it from the cache; on a miss, the data is loaded into the cache first and then returned from it. Clients can obtain the information either by full synchronization or by incremental synchronization.
Only the raw data structures are stored in the registry map; the rendered service information returned to clients is held in the cache.
- First, read the level-1 cache:
  - Check whether the level-1 cache is enabled;
  - If it is enabled, read from the level-1 cache; on a miss, fall through to the level-2 cache;
  - If it is not enabled, read directly from the level-2 cache.
- Then, read the level-2 cache:
  - If the entry exists in the level-2 cache, return it directly;
  - If it does not exist, the data is first loaded into the level-2 cache and then read from it.
Note: when loading the level-2 cache, the server distinguishes between full and incremental requests: an incremental request is loaded from recentlyChangedQueue, while a full request is loaded from the registry map. A simplified sketch of this read path follows.
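The sketch below is a simplified illustration of the read path described above, with plain maps standing in for the real structures; the actual logic lives in ResponseCacheImpl, where readWriteCacheMap is a Guava LoadingCache and a timer task periodically refreshes the level-1 map.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class TwoLevelCacheSketch {
    private final boolean useReadOnlyCache;                          // "level-1 cache enabled?"
    private final ConcurrentMap<String, String> readOnlyCacheMap = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, String> readWriteCacheMap = new ConcurrentHashMap<>();
    private final Function<String, String> loader;                   // stands in for the Guava load function

    TwoLevelCacheSketch(boolean useReadOnlyCache, Function<String, String> loader) {
        this.useReadOnlyCache = useReadOnlyCache;
        this.loader = loader;                                        // e.g. renders payload from the registry map
    }

    String get(String key) {
        if (useReadOnlyCache) {
            String cached = readOnlyCacheMap.get(key);
            if (cached != null) {
                return cached;                                       // level-1 hit
            }
            String loaded = readWriteCacheMap.computeIfAbsent(key, loader); // level-2, loading on a miss
            readOnlyCacheMap.put(key, loaded);                       // populate level-1 for the next read
            return loaded;
        }
        return readWriteCacheMap.computeIfAbsent(key, loader);       // level-1 disabled: go straight to level-2
    }
}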
Service synchronization
Service synchronization is data synchronization between Server nodes. It is divided into boot time synchronization and run time synchronization.
- Startup synchronization
When a node starts, it iterates over the service information it has fetched from a peer (the Applications object) and registers each instance into its own registry. See the syncUp method of PeerAwareInstanceRegistryImpl:
public int syncUp() {
    // Copy entire entry from neighboring DS node
    int count = 0;
    for (int i = 0; ((i < serverConfig.getRegistrySyncRetries()) && (count == 0)); i++) {
        if (i > 0) {
            try {
                Thread.sleep(serverConfig.getRegistrySyncRetryWaitMs());
            } catch (InterruptedException e) {
                logger.warn("Interrupted during registry transfer..");
                break;
            }
        }
        Applications apps = eurekaClient.getApplications();
        for (Application app : apps.getRegisteredApplications()) {
            for (InstanceInfo instance : app.getInstances()) {
                try {
                    if (isRegisterable(instance)) {
                        register(instance, instance.getLeaseInfo().getDurationInSecs(), true);
                        count++;
                    }
                } catch (Throwable t) {
                    logger.error("During DS init copy", t);
                }
            }
        }
    }
    return count;
}
Note that this method uses nested loops: the outer loop retries until service information has actually been pulled (count stays 0 otherwise), and the inner loops iterate over each application and its instances to register them.
- Runtime synchronization
When Register, Renew, or Cancel requests arrive, the server wraps each request in a task and puts it on a queue; after a series of processing steps (such as batching), the tasks are moved to another queue and replicated to the peer nodes. See the BatchWorkerRunnable class referenced from PeerAwareInstanceRegistryImpl; the source is not pasted here.
Conclusion
The principles of Eureka covered here look simple overall, but the implementation details are intricate; you have to read the source code a few times to understand the design intent.
As a registration and discovery service, Eureka follows the AP side of CAP, that is, eventual consistency of data. Many companies also use ZooKeeper or Nacos as their service registry; a brief comparison of service registries will be published later and is not covered here.
- Writing is not easy; please credit the source when reposting. If you like this article, follow the official account for more like it.
- Contact: [email protected]
- QQ:95472323
- wx:ffj2000