Each ZStack service is stateless, which makes services highly available and means that scaling out can be as simple as starting additional service instances and load balancing across them. In addition, ZStack packages all services into a single process called a management node, which makes deployment and administration very simple.

Motivation

ZStack’s Scalability Secrets Part 1: Asynchronous Architecture described the asynchronous architecture that allows a single ZStack management node to handle most cloud workloads. However, when users want to set up a highly available production environment, or must deal with extremely large concurrent workloads, one management node is not enough. The solution is to build a distributed system in which the workload is spread across multiple management nodes. Expanding the capacity of the whole system by adding new nodes in this way is called scaling out.

The problem

Designing a distributed system is not easy. A distributed system, especially a stateful one, must deal with consistency, availability, and partition tolerance (see CAP Theorem), all of which are complex. A stateless distributed system, on the other hand, avoids much of this complexity. First, because no state is shared between nodes, the system naturally stays consistent. Second, since the nodes are interchangeable, the system usually keeps working when it encounters a network partition. Given this, it is generally preferable for a distributed system to be stateless rather than stateful. However, designing a stateless distributed system is also difficult, and often more difficult than designing a stateful one. ZStack builds a stateless distributed system from stateless services, relying on the message bus and the database.

Before discussing what stateless services are, let’s understand what “state” is, since stateless services are the foundation for keeping the entire system stateless. In ZStack, each kind of resource, such as hosts, virtual machines, images, and users, is managed by its own service. When there is more than one instance of a service in the system, the resources are divided among the instances. For example, with 10,000 virtual machines and two virtual machine service instances, each instance would ideally manage 5,000 virtual machines.

Since there are two service instances, before sending a request for a virtual machine, the requester must know which instance manages that virtual machine; otherwise, it has no way of knowing where to send the request. Knowledge such as “which service instance manages which resource” is the state we are talking about. If a service is stateful, this state lives inside the service: requesters need to look it up somewhere, and services need to exchange state whenever the number of service instances changes, for example when a new instance joins or an existing instance leaves.

Exchanging state is troublesome: it is error-prone and often limits the scalability of the system. To make the system more reliable and easier to scale out, the ideal approach is to keep services stateless by separating state from the services (see Service Statelessness Principle). With stateless services, requesters no longer need to ask where to send requests, and services no longer need to exchange state when new service instances join or old ones leave.

Note: In what follows, the terms “service” and “service instance” are used interchangeably for simplicity.

Services and management nodes

Services, which communicate with each other through a central message bus (RabbitMQ), are first-class citizens in ZStack.

Unlike the usual microservices architecture, where each service runs in a separate process or on a separate machine, ZStack bundles all the services into a single process called the management node. There are good reasons for this so-called in-process microservices architecture; see The In-process Microservices Architecture for details.

A management node is a fully functional instance of the ZStack software. Because its services are stateless, management nodes share no state other than heartbeat records and a consistent hashing ring, both covered in more detail below. Heartbeats are used to monitor the health of management nodes: once a management node fails to update its heartbeat within a given interval, the other management nodes evict it and begin to take over the resources it was managing.
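The heartbeat check described above can be sketched roughly as follows. This is a minimal Python illustration, not ZStack's implementation: the `HeartbeatTable` class, its method names, and the timeout value are all assumptions made for the example, and an in-memory dictionary stands in for wherever the heartbeat records are actually kept.

```python
import time


class HeartbeatTable:
    """Hypothetical in-memory stand-in for shared heartbeat records."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock      # injectable so the example can be tested
        self.last_seen = {}     # management node UUID -> time of last heartbeat

    def beat(self, node_uuid):
        """Record a heartbeat for a management node."""
        self.last_seen[node_uuid] = self.clock()

    def expired_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )

    def evict_expired(self):
        """Remove expired nodes; peers would then take over their resources."""
        dead = self.expired_nodes()
        for node in dead:
            del self.last_seen[node]
        return dead
```

Each surviving node would call something like `evict_expired()` periodically and then shrink its hashing ring for every node returned.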

Stateless services

The core technique for implementing stateless services, especially for ZStack’s business logic, is the consistent hashing algorithm. At startup, each management node is assigned a version 4 UUID (the management node UUID), which is combined with each service name to register service queues on the message bus. For example, a management node might register the following service queues:

zstack.message.ansible.3694776ab31a45709259254a018913ca
zstack.message.api.portal
zstack.message.applianceVm.3694776ab31a45709259254a018913ca
zstack.message.cloudbus.3694776ab31a45709259254a018913ca
zstack.message.cluster.3694776ab31a45709259254a018913ca
zstack.message.configuration.3694776ab31a45709259254a018913ca
zstack.message.console.3694776ab31a45709259254a018913ca
zstack.message.eip.3694776ab31a45709259254a018913ca
zstack.message.globalConfig.3694776ab31a45709259254a018913ca
zstack.message.host.3694776ab31a45709259254a018913ca
zstack.message.host.allocator.3694776ab31a45709259254a018913ca
zstack.message.identity.3694776ab31a45709259254a018913ca
zstack.message.image.3694776ab31a45709259254a018913ca
zstack.message.managementNode.3694776ab31a45709259254a018913ca
zstack.message.network.l2.3694776ab31a45709259254a018913ca
zstack.message.network.l2.vlan.3694776ab31a45709259254a018913ca
zstack.message.network.l3.3694776ab31a45709259254a018913ca
zstack.message.network.service.3694776ab31a45709259254a018913ca
zstack.message.portForwarding.3694776ab31a45709259254a018913ca
zstack.message.query.3694776ab31a45709259254a018913ca
zstack.message.securityGroup.3694776ab31a45709259254a018913ca
zstack.message.snapshot.volume.3694776ab31a45709259254a018913ca
zstack.message.storage.backup.3694776ab31a45709259254a018913ca

Note: All queues except zstack.message.api.portal end with the same UUID, which is the UUID of the management node.
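The naming scheme visible in the queue list above can be reproduced with a few lines of Python. This is only a sketch of the naming pattern: the function names are hypothetical, the service list is a small subset copied from the list above, and no real message bus is involved.

```python
import uuid

# Assumed subset of service names, copied from the queue list above.
SERVICE_NAMES = ["host", "image", "cluster", "network.l2"]
QUEUE_PREFIX = "zstack.message"


def new_management_node_uuid():
    """Version 4 UUID without dashes, matching the format in the queue list."""
    return uuid.uuid4().hex


def service_queue_name(service_name, node_uuid):
    """Per-node queue name: prefix, service name, then the management node UUID."""
    return f"{QUEUE_PREFIX}.{service_name}.{node_uuid}"


def queues_for_node(node_uuid, service_names=SERVICE_NAMES):
    """All per-node queue names one management node would register."""
    return [service_queue_name(name, node_uuid) for name in service_names]
```

Because the node UUID is part of every per-node queue name, a sender that knows the service name and the target node UUID can address the right queue without any lookup.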

Resources such as hosts, volumes, and virtual machines are also identified by UUIDs. Messages, which are usually associated with resources, are passed between services. Before sending a message, the sender must choose the receiving service based on the UUID of the resource; this is where the consistent hashing algorithm comes into play.

Consistent hashing is a special kind of hashing in which resizing the hash table requires only a fraction of the keys to be remapped. In ZStack, the management nodes form a consistent hashing ring.

Each management node maintains a copy of the consistent hashing ring, which contains the UUIDs of all management nodes in the system. As management nodes join or leave, lifecycle events are broadcast over the message bus to the other nodes, causing them to expand or shrink their rings to reflect the current state of the system. When sending a message, the sender service uses the resource UUID to hash out the UUID of the target management node. For example, to send a StartVmInstanceMsg for the VM whose UUID is 932763162d054c04adaab6ab498c9139, the pseudocode is as follows:

msg = new StartVmInstanceMsg();
// hash the resource UUID to find the management node that owns it
destinationManagementNodeUUID = consistent_hashing_algorithm("932763162d054c04adaab6ab498c9139");
msg.setServiceId("vmInstance." + destinationManagementNodeUUID);
cloudBus.send(msg);
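To make the lookup step concrete, here is a minimal consistent hashing ring in Python. It is a sketch under stated assumptions rather than ZStack's actual code: the `HashRing` class, the use of MD5 for ring positions, and the 100 virtual points per node are all choices made for illustration.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a point on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, node_uuids=(), replicas=100):
        self.replicas = replicas   # virtual points per node, to even out load
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> management node UUID
        for node in node_uuids:
            self.add_node(node)

    def add_node(self, node_uuid):
        """Expand the ring when a management node joins."""
        for i in range(self.replicas):
            point = _position(f"{node_uuid}#{i}")
            self._owner[point] = node_uuid
            bisect.insort(self._points, point)

    def remove_node(self, node_uuid):
        """Shrink the ring when a management node leaves."""
        self._points = [p for p in self._points if self._owner[p] != node_uuid]
        self._owner = {p: n for p, n in self._owner.items() if n != node_uuid}

    def node_for(self, resource_uuid):
        """Return the node owning the first point clockwise from the resource."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._points, _position(resource_uuid))
        return self._owner[self._points[index % len(self._points)]]


def service_id_for(ring, service_name, resource_uuid):
    """Build a service ID such as 'vmInstance.<management node UUID>'."""
    return f"{service_name}.{ring.node_for(resource_uuid)}"
```

Two nodes that hold rings built from the same set of node UUIDs compute the same owner for a given resource UUID, which is what lets every node route messages without consulting anyone else.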

As long as the ring is stable, messages carrying the same resource UUID are always routed to the same service on the same management node, which is the basis of ZStack’s lock-free architecture (see ZStack’s Scalability Secrets Part 3: Lock-free Architecture).

When the consistent hashing ring shrinks or expands, only a few nodes are slightly affected, thanks to the nature of consistent hashing.
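This limited impact can be checked with a small experiment: build a ring, remove one node, and see which keys change owner. The Python sketch below uses hypothetical helper names and MD5-based ring positions, both assumptions made for illustration, and is independent of ZStack's real implementation.

```python
import bisect
import hashlib


def position(key):
    """Map a string to a point on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def build_ring(node_uuids, replicas=100):
    """Return a sorted list of (ring position, node UUID) pairs."""
    return sorted(
        (position(f"{node}#{i}"), node)
        for node in node_uuids
        for i in range(replicas)
    )


def owner(ring, resource_uuid):
    """Node owning the first ring point clockwise from the resource's position."""
    points = [point for point, _ in ring]
    index = bisect.bisect_right(points, position(resource_uuid))
    return ring[index % len(ring)][1]


def moved_keys(before, after, keys):
    """Keys whose owning node differs between two versions of the ring."""
    return [key for key in keys if owner(before, key) != owner(after, key)]
```

Comparing a four-node ring with the same ring minus one node, `moved_keys` returns only keys that belonged to the departed node; every key owned by a surviving node stays where it was.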

Because of the consistent hashing ring, the sender does not need to know which service instance will process the message; the message simply gets routed to the right one. Services do not need to maintain or exchange information about which resources they manage; all they need to do is process incoming messages, because the ring ensures that each message reaches the correct service instance. This is how services stay simple and stateless.

In addition to messages that carry a resource UUID (such as StartVmInstanceMsg and DownloadImageMsg), there is a class of messages that carry none. Typically these are creation messages (such as CreateVolumeMsg) and non-resource messages (such as AllocateHostMsg), which do not operate on an individual existing resource. Because these messages can be sent to the service on any management node, they are deliberately sent to the local management node: since sender and receiver are on the same node, the receiver is guaranteed to be reachable when the sender sends the message.
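The two routing cases can be summarized in a short Python sketch. The `Message` dataclass, the `route` function, and the `node_for` callable (which stands in for the consistent hashing lookup) are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    name: str
    resource_uuid: Optional[str] = None  # None for creation and non-resource messages
    service_id: Optional[str] = None     # filled in by route()


def route(msg: Message, service_name: str, local_node_uuid: str,
          node_for: Callable[[str], str]) -> Message:
    """Pick the destination management node and set the message's service ID."""
    if msg.resource_uuid is not None:
        # Resource message: the ring decides which node owns the resource.
        target = node_for(msg.resource_uuid)
    else:
        # No resource UUID: any node will do, so stay on the local node.
        target = local_node_uuid
    msg.service_id = f"{service_name}.{target}"
    return msg
```

For example, a `StartVmInstanceMsg` with a VM UUID would be routed through `node_for`, while a `CreateVolumeMsg` without a resource UUID would get the local node's service ID.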

API messages (for example, APIStartVmInstanceMsg) are handled specially: they are always sent to a well-known service ID, api.portal. On the message bus there is a global queue called zstack.message.api.portal, which is shared by the API services of all management nodes. Messages with the service ID api.portal are automatically load balanced to one of the API services, which then routes and forwards each message to the correct destination using the consistent hashing ring. By doing so, ZStack hides the details of routing and forwarding messages from API clients and simplifies writing a ZStack API client.

msg = new APICreateVmInstanceMsg()
msg.setServiceId("api.portal")
cloudBus.send(msg)
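The shared-queue behaviour can be imitated in a few lines of Python. This toy model is an assumption-laden sketch, not ZStack or RabbitMQ code: `SharedQueue` dispatches to its consumers in round-robin order, `make_api_service` is a hypothetical factory, and `node_for` stands in for the consistent hashing lookup.

```python
import itertools


class SharedQueue:
    """Toy stand-in for one queue consumed by several API services in turn."""

    def __init__(self, consumers):
        self._next_consumer = itertools.cycle(consumers)

    def publish(self, message):
        """Hand the message to the next consumer and return its result."""
        consumer = next(self._next_consumer)
        return consumer(message)


def make_api_service(node_uuid, node_for):
    """Build a hypothetical API service running on one management node."""
    def handle(message):
        # Whichever node receives the API message, the ring picks the real target.
        target = node_for(message["resource_uuid"])
        return {
            "received_by": node_uuid,
            "forwarded_to": f"{message['service']}.{target}",
        }
    return handle
```

Publishing the same message twice lands on two different API services, yet both forward it to the same destination service, so the client never needs to know which node owns the resource.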

Summary

In this article, we showed how ZStack builds a scalable distributed system. Because management nodes share very little information, it is easy to build a large cluster, perhaps with dozens or even hundreds of management nodes. In practice, however, two management nodes are usually enough for a private cloud, while for a public cloud administrators can add management nodes according to the workload. With its asynchronous architecture and stateless services, ZStack can handle a large number of concurrent tasks that existing IaaS software cannot handle.