Today, Ecological Jun brings you an article by Mr. Sun Jianbo on etcd: a comprehensive interpretation from application scenarios to implementation principles. It is a long read, so feel free to bookmark it and come back later. Original: www.cnblogs.com/sunsky303/p… Typical application scenarios: blog.51cto.com/nileader/10…

With projects such as CoreOS and Kubernetes gaining popularity in the open source community, etcd, a component used in both, is attracting developers' attention as a highly available, strongly consistent store for service discovery. In the cloud computing era, how to let services join a computing cluster quickly and transparently, how to let all machines in a cluster quickly discover shared configuration information, and, more importantly, how to build such a highly available, secure, easy-to-deploy, and fast-responding service cluster have become pressing problems. Etcd solves these problems well. This article takes an in-depth look, starting from etcd's application scenarios and moving on to how etcd is implemented, so that developers can take full advantage of the convenience etcd brings.

Classic Application Scenarios

What is etcd? Many people’s first thought might be “a key-value store”, but that overlooks the second half of the official definition: for shared configuration and service discovery.

A highly-available key value store for shared configuration and service discovery.

In fact, etcd, a project inspired by ZooKeeper and Doozer, focuses on four things in addition to offering similar features.

  • Simple: an HTTP+JSON API that is easy to use with curl.
  • Secure: an optional SSL client certificate authentication mechanism.
  • Fast: each instance supports a thousand writes per second.
  • Reliable: distributed consistency is fully implemented using the Raft algorithm.

With the continuous development of cloud computing, more and more attention is being paid to the problems of distributed systems. Inspired by the article “Overview of Typical ZooKeeper Application Scenarios” [1] by the Alibaba middleware team, the author summarizes some classic application scenarios of etcd based on his own understanding. Let’s take a look at what etcd, a distributed store built on the strongly consistent Raft algorithm, can do.

It is worth noting that data in distributed systems falls into control data and application data. The scenarios discussed here deal with control data by default; storing application data in etcd is recommended only when the volume of data is small but updates and accesses are frequent.

Scenario 1: Service Discovery

Service discovery is one of the most common problems in distributed systems: how processes or services in the same distributed cluster find each other and establish connections. Essentially, service discovery means learning whether some process in the cluster is listening on a UDP or TCP port, and being able to look it up by name and connect to it. Solving service discovery requires three pillars, none of which is dispensable.

  1. A strongly consistent, highly available store for the service directory. Etcd, built on the Raft algorithm, is naturally such a directory store.
  2. A mechanism for registering services and monitoring their health. Users can register services in etcd and attach a TTL to each registered key, renewing it periodically as a heartbeat so that the service’s health status can be monitored.
  3. A mechanism for finding and connecting to services. Services registered under a given topic in etcd can be found under that topic. To ensure connectivity, a Proxy-mode etcd can be deployed on each service machine, so that anything able to reach the local etcd can reach the whole cluster.
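To make the three pillars concrete, here is a minimal sketch in Python that simulates etcd’s key/TTL semantics with an in-memory store (the Registry class, the /services/... key layout, and the register/discover helpers are all hypothetical illustrations, not etcd’s real client API): services register under a directory with a TTL, heartbeats refresh the key, and consumers list the directory to find live instances.

```python
import time

class Registry:
    """Minimal in-memory stand-in for etcd's key/TTL semantics."""
    def __init__(self):
        self._keys = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ttl):
        # Register or refresh a key with a time-to-live, like etcd's TTL keys.
        self._keys[key] = (value, time.time() + ttl)

    def list_dir(self, prefix):
        # Return live keys under a directory, dropping expired entries.
        now = time.time()
        return {k: v for k, (v, exp) in self._keys.items()
                if k.startswith(prefix) and exp > now}

def register(reg, service, instance, addr, ttl=5):
    # A service instance heartbeats by calling this periodically.
    reg.set(f"/services/{service}/{instance}", addr, ttl)

def discover(reg, service):
    # Consumers list the service directory to find live instances.
    return sorted(reg.list_dir(f"/services/{service}/").values())
```

An instance that stops heartbeating simply disappears from the directory once its TTL elapses, which is exactly how the health-monitoring pillar works.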

Figure 1 service discovery diagram

Let’s look at a specific scenario for service discovery.

  • In a microservices architecture, services are added dynamically. With the popularity of Docker containers, more and more systems are assembled from multiple microservices working together into a relatively powerful whole, and the need to add these services transparently and dynamically keeps growing. With a service discovery mechanism, a directory named after the service is registered in etcd, and the IP addresses of available service nodes are stored under that directory. Consumers simply look up the available nodes under the service directory and use them.

Figure 2. Microservices work together

  • Transparent multi-instance deployment and failure restarts on a PaaS platform. Applications on a PaaS platform usually run multiple instances, accessed transparently through a domain name, which also provides load balancing. However, an instance may fail and restart at any time, at which point the domain-name resolution (routing) information must be reconfigured dynamically. Etcd’s service discovery capability easily solves this dynamic configuration problem.

Figure 3 Cloud platform multi-instance transparency

Scenario 2: Message Publishing and Subscription

In a distributed system, publish/subscribe is the most applicable communication model between components: build a configuration-sharing center where data providers publish messages and consumers subscribe to the topics they care about; when a message is published to a topic, subscribers are notified in real time. In this way, distributed system configuration can be centrally managed and dynamically updated.

  • Configuration information used by applications is centrally managed in etcd. The usual pattern: an application fetches the configuration from etcd once at startup, registers a Watcher on the relevant node, and waits. Each time the configuration is updated, etcd notifies the subscriber in real time, so the application always has the latest configuration.
  • In a distributed search service, the index’s metadata and the state of the server cluster’s machines are stored in etcd for individual clients to subscribe to. Etcd’s key TTL feature ensures the machine state stays up to date in real time.
  • A distributed log collection system. Its core job is to collect logs scattered across different machines. Collectors usually assign collection units by application (or topic), so you can create a directory P in etcd named after the application (topic), store all of that application’s machine IPs as subkeys under P, and set a recursive Watcher on P to monitor changes to everything under the directory. The collector is then notified in real time whenever a machine IP (message) changes and can adjust its assignments.
  • Some information in a system needs to be retrieved dynamically and automatically, with manual intervention to modify it. This is usually done by exposing an interface, such as JMX, to fetch runtime information. With etcd, instead of implementing a solution of your own, you simply store the information in designated etcd directories, which can then be accessed externally over HTTP.
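The publish/subscribe pattern in the bullets above can be sketched as a small simulation of the Watcher mechanism (the WatchStore class and its method names are invented for illustration; real etcd exposes watches over its HTTP API):

```python
class WatchStore:
    """In-memory sketch of etcd's watch mechanism: a write to a key
    notifies every watcher registered on a matching key or prefix."""
    def __init__(self):
        self._data = {}
        self._watchers = []  # (prefix, callback, recursive)

    def watch(self, prefix, callback, recursive=False):
        # recursive=True mimics etcd's recursive watch on a directory.
        self._watchers.append((prefix, callback, recursive))

    def set(self, key, value):
        self._data[key] = value
        for prefix, cb, recursive in self._watchers:
            if key == prefix or (recursive and key.startswith(prefix + "/")):
                cb(key, value)  # real-time notification to the subscriber
```

A non-recursive watcher models the configuration-center case; a recursive watcher on a directory models the log-collector case, where any machine IP added under the topic directory triggers a notification.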

Figure 4 Message publishing and subscription

Scenario 3: Load balancing

Load balancing was already mentioned in Scenario 1; here we mean soft load balancing. In a distributed system, to ensure high availability of services and data consistency, data and services are usually deployed in multiple copies as peers, so that even if one copy fails, its peers are unaffected. The downside is reduced write performance; the benefit is load balancing on reads. Since each peer node holds the complete data, user traffic can be split across different machines.

  • Etcd itself balances access to the information it stores. When etcd runs as a cluster, every core node can serve user requests. So storing small but frequently accessed message data directly in etcd is a good choice, such as the lookup (code) tables common in business systems: the table stores codes, etcd stores what each code means, and the business system resolves a code’s meaning through a table lookup.
  • Maintain a load-balancing node list with etcd. Etcd can monitor the status of multiple nodes in a cluster, and when a request arrives, forward it in round-robin fashion to the surviving nodes. Much as KafkaMQ uses ZooKeeper to maintain load balancing between producers and consumers, etcd can take over ZooKeeper’s job here.
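As a sketch of the second bullet, the following simulation (the function name and the liveness map are hypothetical illustrations) round-robins requests across only the nodes currently marked alive, the way a balancer fed by etcd’s node list would:

```python
import itertools

def balanced_requests(nodes, alive, n):
    """Round-robin n requests across surviving nodes only,
    mimicking a balancer driven by an etcd-maintained node list."""
    live = [node for node in nodes if alive.get(node, False)]
    if not live:
        raise RuntimeError("no live nodes")
    rr = itertools.cycle(live)  # endless round-robin over live nodes
    return [next(rr) for _ in range(n)]
```

When etcd reports a node as dead (its TTL key expired), the balancer simply stops handing it traffic.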

Figure 5 load balancing

Scenario 4: Distributed Notification and coordination

Distributed notification and coordination is similar in concept to message publishing and subscription. Both rely on etcd’s Watcher mechanism: through registration and asynchronous notification, different systems in a distributed environment are notified of data changes in real time and coordinate around them. The usual implementation: different systems register the same directory in etcd and set Watchers on it (recursive mode if subdirectory changes matter). When one system updates the directory, the systems watching it are notified and react accordingly.

  • Loosely coupled heartbeat detection via etcd. The detecting and detected systems are associated through a directory in etcd rather than directly, which greatly reduces coupling between them.
  • System scheduling via etcd. A system consists of a console and a push service; the console’s job is to direct the push service’s work. Actions an administrator takes in the console actually change the state of certain directory nodes in etcd; etcd notifies the push service, which has registered Watchers, of these changes, and the push service then carries out the corresponding push tasks.
  • Work-progress reporting via etcd. In most task-distribution systems of this kind, a subtask registers a temporary working directory in etcd after starting and reports its progress there periodically (by writing progress into the temporary directory), so the task manager knows the task’s progress in real time.

Figure 6. Distributed collaboration

Scenario 5: Distributed lock

Because etcd uses the Raft algorithm to keep data consistent, a value stored in the cluster is globally consistent, so distributed locks are easy to implement. The lock service can be used in two ways: to maintain exclusivity, or to control ordering.

  • Maintaining exclusivity means that of all users trying to acquire the lock, only one succeeds in the end. Etcd provides an API for the atomic CompareAndSwap (CAS) operation that distributed locks are built on. By setting prevExist, when multiple nodes try to create the same directory simultaneously, only one succeeds; that successful user is considered to hold the lock.
  • Controlling ordering means that every user who wants the lock eventually acquires it, but in a globally unique order that determines execution order. Etcd also provides an API for this (automatic creation of in-order keys): a POST to a directory makes etcd generate a key from the directory’s current largest index and store the new value (the client’s identifier) under it. The API can also list all keys in the directory in order; the order of these keys is the order of the clients, and the values stored under them can be numbers identifying the clients.
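Both lock styles can be modeled in a few lines. The sketch below simulates create-if-absent (etcd’s CAS with prevExist=false) and in-order POST keys on top of a plain dictionary; the CASStore class and its method names are illustrative stand-ins, not the etcd API:

```python
class CASStore:
    """Sketch of the two etcd lock styles: create-if-absent for an
    exclusive lock, and in-order keys for ordered acquisition."""
    def __init__(self):
        self._data = {}
        self._seq = 0

    def create(self, key, value):
        # Like CAS with prevExist=false: only one creator can succeed.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def post_in_order(self, directory, value):
        # Like etcd's in-order keys: each POST gets an increasing key.
        self._seq += 1
        key = f"{directory}/{self._seq:08d}"
        self._data[key] = value
        return key

    def list_in_order(self, directory):
        # Keys sort lexicographically, so insertion order is preserved.
        return [self._data[k] for k in sorted(self._data)
                if k.startswith(directory + "/")]
```

The first client whose create succeeds holds the exclusive lock; clients posted into the waiting directory acquire it later in exactly the order they arrived.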

Figure 7 distributed lock

Scenario 6: Distributed queues

Distributed queues are generally used much like the order-controlling distributed locks described in Scenario 5: create a first-in, first-out queue to guarantee order.

Another interesting implementation is to execute the queue only once certain conditions are met. One way to implement this is to create an additional /queue/condition node alongside the /queue directory.

  • The condition can represent queue size. For example, a large task can only run once many small tasks are ready. Each time a small task becomes ready, the condition counter is incremented by 1, until it reaches the number the large task requires; the queued small tasks then run, and finally the large task runs.
  • The condition can indicate whether a particular task is in the queue. This task may be the first in some ordering, or a node in the dependency topology with no dependencies; typically it must run before the other tasks in the queue may run.
  • The condition can also be any other kind of signal to start executing tasks. A controller can then kick off the queued tasks by changing the condition.
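A minimal simulation of the first variant (the ConditionQueue class is a hypothetical illustration of the /queue plus /queue/condition idea, not etcd code): tasks accumulate in FIFO order, and nothing runs until the condition counter reaches the required threshold.

```python
class ConditionQueue:
    """Sketch of a /queue guarded by a /queue/condition counter:
    queued tasks run only once the counter reaches a threshold."""
    def __init__(self, required):
        self.required = required
        self.condition = 0   # the /queue/condition value
        self.tasks = []      # FIFO /queue contents

    def enqueue(self, task):
        self.tasks.append(task)

    def mark_ready(self):
        # e.g. one small task became ready; bump the condition counter.
        self.condition += 1

    def try_run(self):
        # Drain the queue in FIFO order only when the condition is met.
        if self.condition < self.required:
            return []
        done, self.tasks = self.tasks, []
        return done
```

In a real deployment the counter would live in etcd and a Watcher on the condition node would trigger try_run.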

Figure 8 distributed queue

Scenario 7: Cluster monitoring and Leader election

Monitoring with etcd is very simple and real-time.

  1. As in the previous scenarios, a Watcher notices immediately when a node disappears or changes.
  2. A node can set a TTL key, for example sending a heartbeat every 30 seconds to keep the key representing the machine alive; otherwise the key expires and disappears.

In this way, the health status of each node is detected immediately, meeting the cluster’s monitoring requirements.
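The TTL-based health check boils down to a simple rule: a node counts as alive only if its last heartbeat arrived within the TTL. A sketch (function name and data layout are assumptions made for illustration):

```python
def live_nodes(heartbeats, ttl, now):
    """Given each node's last heartbeat timestamp, return the nodes
    whose TTL key would still exist (i.e. the healthy nodes)."""
    return sorted(n for n, t in heartbeats.items() if now - t <= ttl)
```

With a 30-second TTL, a node whose heartbeat is more than 30 seconds old has let its key expire and is treated as down.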

In addition, Leader election can be built on distributed locks. This scenario usually involves long-running CPU-intensive computation or I/O. Only the elected Leader needs to perform the computation or processing once, and the result can then be copied to the other Follower nodes, avoiding duplicated work and saving compute resources.

The classic case is full index building in a search system. If every machine built the index itself, it would not only be time-consuming but also make consistency of the index impossible to guarantee. Instead, all machines try to create the same node via etcd’s CAS mechanism; the machine that succeeds becomes the Leader, performs the index computation, and distributes the result to the other nodes.
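The create-one-node election can be simulated as follows (the elect_leader helper and the /election key layout are hypothetical illustrations): every candidate attempts a create-if-absent on the same key, and exactly one succeeds per term.

```python
def elect_leader(store, candidates, term):
    """All candidates race to create the same key; the single node
    whose create-if-absent succeeds is the Leader for this term."""
    key = f"/election/term-{term}"
    for node in candidates:
        if key not in store:   # create-if-absent, like CAS with prevExist=false
            store[key] = node  # this candidate wins
    return store.get(key)      # everyone sees the same winner
```

Later arrivals see the key already present, so they learn who the Leader is instead of becoming one; a new term means a fresh key and a fresh election.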

Figure 9 Leader election

Scenario 8: Why etcd instead of ZooKeeper?

Readers of “Overview of Typical ZooKeeper Application Scenarios” [2] may have noticed that ZooKeeper can implement everything etcd does. So why use etcd instead of ZooKeeper directly?

ZooKeeper has the following disadvantages:

  1. Complexity. ZooKeeper is complex to deploy and maintain, and administrators must master a range of knowledge and skills. The Paxos family of strong consistency algorithms is also notoriously hard to understand. Moreover, ZooKeeper is complicated to use, requires installing a client, and officially provides only Java and C interfaces.
  2. Written in Java. This is not a bias against Java, but Java leans toward heavyweight applications that pull in many dependencies, while operations teams generally want a strongly consistent, highly available cluster of machines to be as simple and mistake-proof to maintain as possible.
  3. Slow development. The Apache Foundation’s distinctive “Apache Way” [3] is controversial in the open source community, partly because the foundation’s huge structure and loose management can slow a project’s development.

Etcd, as an up-and-comer, has equally obvious advantages.

  1. Simple. Written in Go, it is easy to deploy; using HTTP as the interface makes it simple to use; and the Raft algorithm both guarantees strong consistency and is easy for users to understand.
  2. Data persistence. By default, etcd persists data as soon as it is updated.
  3. Security. Etcd supports SSL client certificate authentication.

Finally, etcd is a young project that is iterating and developing rapidly, which is both a strength and a weakness: the strength is its unlimited future possibilities; the weakness is that it lacks the long-term, large-project battle testing of its elders. However, well-known projects such as CoreOS, Kubernetes, and Cloud Foundry already use etcd in production environments, so overall etcd is well worth a try.

Interpreting etcd's implementation principles

In the previous section, we surveyed many classic etcd scenarios. In this section, we start from etcd’s architecture and dive into the source code to see how etcd works.

1 Architecture

Figure 10 etcd architecture diagram

The architecture diagram shows that etcd is mainly divided into four parts.

  • HTTP Server: handles API requests sent by users, as well as synchronization and heartbeat requests from other etcd nodes.
  • Store: handles the transactions for the various features etcd supports, including data indexing, node state changes, monitoring and feedback, and event handling and execution. It is the concrete implementation of most of the API features etcd exposes to users.
  • Raft: the concrete implementation of the Raft strong consistency algorithm, the core of etcd.
  • WAL: Write-Ahead Log, etcd’s storage format. Besides holding all data states and node indexes in memory, etcd persists through the WAL: every piece of data is logged before being committed. A Snapshot is a state snapshot taken to keep the data from growing too large; an Entry is a concrete log record.

Typically, a user request is forwarded by the HTTP Server to the Store for the specific transaction. If node-state changes are involved, it is handed to the Raft module, which changes state, writes the log, and synchronizes with the other etcd nodes to confirm the commit. Finally, the data is committed and synchronized once more.

2 Important changes in the new version of etcd

  • IANA-assigned ports: 2379 for client communication and 2380 for peer communication, alongside the legacy ports (4001 for clients, 7001 for peers).
  • Each node can listen on multiple addresses: the number of listen addresses grows from one to many, so users can build more complex cluster environments as needed, for example one public IP plus one VM (container) private IP.
  • Etcd can proxy requests to the leader node, so as long as you can reach any etcd node, you can read and write the entire cluster regardless of network topology.
  • The etcd cluster and the nodes in it each have a unique ID. This prevents configuration mix-ups: requests from etcd nodes outside the cluster are rejected.
  • The configuration of an etcd cluster is now fully fixed at startup, which helps users configure and start the cluster correctly.
  • Runtime reconfiguration. Users can change the etcd cluster membership without restarting the etcd service; the cluster configuration can be changed dynamically after startup.
  • The Raft algorithm was redesigned and reimplemented to be faster, easier to understand, and better covered by tests.
  • The Raft log is now strictly append-only: a write-ahead log with a CRC check on each record.
  • The _etcd/* keys used at startup are no longer exposed to users.
  • Standby mode, which automatically adjusted the cluster and made cluster maintenance harder for users, was removed.
  • The new Proxy mode does not join the etcd consensus cluster; it only forwards requests.
  • The ETCD_NAME (-name) parameter is now optional, since the name is no longer used to uniquely identify a node.
  • Etcd properties are no longer configured through configuration files; environment variables (or command-line flags) are used instead.
  • To start a cluster in self-discovery mode, you must provide the cluster size so that the actual number of nodes in the cluster can be determined.

3 Etcd concept vocabulary

  • Raft: the algorithm etcd uses to ensure strong consistency in a distributed system.
  • Node: an instance of the Raft state machine.
  • Member: an etcd instance. It manages one Node and can serve client requests.
  • Cluster: an etcd cluster consisting of multiple Members that can work together.
  • Peer: the name for another Member in the same etcd cluster.
  • Client: a client that sends HTTP requests to the etcd cluster.
  • WAL: the write-ahead log format etcd uses for persistent storage.
  • Snapshot: a snapshot of etcd’s data state, taken to keep WAL files from growing too large.
  • Proxy: an etcd mode that provides a reverse-proxy service in front of an etcd cluster.
  • Leader: the node produced by election in the Raft algorithm; it handles all data commits.
  • Follower: a node that lost the election; it serves as a subordinate in Raft and helps guarantee consistency.
  • Candidate: when a Follower receives no heartbeat from the Leader for a certain period, it becomes a Candidate and starts an election.
  • Term: the period from a node becoming Leader until the next election starts.
  • Index: the sequence number of a data item. Raft locates data using Term and Index.

4 Cluster application Practice

As a highly available key-value store, etcd is designed for clustering from the ground up. Since the Raft algorithm decides by majority vote, an odd number of nodes is generally recommended when deploying an etcd cluster: 3, 5, or 7 nodes.
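The majority-vote rule explains the odd-number recommendation: a cluster of n members needs ⌊n/2⌋+1 votes, so adding a fourth node to a three-node cluster raises write cost without raising fault tolerance. A quick check:

```python
def quorum(n):
    """Majority needed for a Raft cluster of n members."""
    return n // 2 + 1

def fault_tolerance(n):
    """How many members can fail while the cluster stays available."""
    return n - quorum(n)
```

A 3-node and a 4-node cluster both tolerate exactly one failure, which is why even sizes buy nothing; 5 nodes tolerate two.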

4.1 Starting a Cluster

Etcd offers three ways to configure cluster startup: static configuration, etcd self-discovery, and DNS discovery.

Which to choose depends on what configuration information you have in advance. Notably, this is one of the features that distinguishes the new version of etcd from its predecessors: instead of configuration files, etcd is configured through command-line parameters or environment variables.

4.1.1. Static Configuration

This approach suits offline environments where, before starting the cluster, you already know its size and the address and port of every node. You start the etcd cluster by setting the initial-cluster parameter.

When starting etcd on each machine, set environment variables or add startup parameters as follows.

ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380"
ETCD_INITIAL_CLUSTER_STATE=new

Or as command-line parameters:

-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 -initial-cluster-state new

Note that the URLs configured in the -initial-cluster parameter must match the initial-advertise-peer-urls parameter set when each node is started (initial-advertise-peer-urls is the address on which the node listens for synchronization traffic from the other nodes).

If your network hosts multiple etcd clusters, it is best to use the -initial-cluster-token parameter to give each cluster its own token. This ensures each cluster and its members get unique IDs.

In summary, to configure a cluster of three etcd nodes, the startup commands on the three machines would look like this.

$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
  -listen-peer-urls http://10.0.1.10:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  -initial-cluster-state new
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
  -listen-peer-urls http://10.0.1.11:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  -initial-cluster-state new
$ etcd -name infra2 -initial-advertise-peer-urls http://10.0.1.12:2380 \
  -listen-peer-urls http://10.0.1.12:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  -initial-cluster-state new

After initialization, etcd supports dynamically adding, deleting, and modifying nodes in the etcd cluster; these operations use the etcdctl command.

4.1.2. Etcd self-discovery mode

To start an etcd cluster in self-discovery mode, you first need an existing etcd cluster. If you already have one, set the size of the new cluster (say, 3) by executing the following command.

$ curl -X PUT http://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83/_config/size -d value=3

Here the address http://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83 on the existing cluster acts as the registration and discovery service for the new etcd cluster.

You then start etcd on each machine with a command like the following.

$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
  -listen-peer-urls http://10.0.1.10:2380 \
  -discovery http://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83

If you have no etcd cluster available locally, the etcd website provides a publicly accessible etcd storage address. The following command obtains an etcd service directory that you can use as the -discovery parameter.

$ curl http://discovery.etcd.io/new?size=3
http://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de

Likewise, once cluster initialization is complete this information is no longer used; when you need to add nodes later, use etcdctl.

For security, be sure to register a new discovery token each time you start a new etcd cluster. In addition, if more than the specified number of nodes start during initialization, the extra nodes are automatically converted to Proxy-mode etcd instances.

4.1.3. DNS Self-discovery mode

Etcd also supports bootstrapping from DNS SRV records. See RFC 2782 [4] for details on how DNS SRV records are used for service discovery; you will need to configure the DNS server accordingly.

(1) Enable SRV record query on the DNS server and add the corresponding domain name record. The query result is similar to the following:

$ dig +noall +answer SRV _etcd-server._tcp.example.com
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra0.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra2.example.com.

(2) Configure an A record for each domain name pointing to the IP of the corresponding etcd core node. The query result looks like the following.

$ dig +noall +answer infra0.example.com infra1.example.com infra2.example.com
infra0.example.com. 300 IN A 10.0.1.10
infra1.example.com. 300 IN A 10.0.1.11
infra2.example.com. 300 IN A 10.0.1.12

With these two DNS configuration steps done, you can start the etcd cluster via DNS: pass the domain to be resolved in the -discovery-srv parameter and run the following command to start a node:

$ etcd -name infra0 \
  -discovery-srv example.com \
  -initial-advertise-peer-urls http://infra0.example.com:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster-state new \
  -advertise-client-urls http://infra0.example.com:2379 \
  -listen-client-urls http://infra0.example.com:2379 \
  -listen-peer-urls http://infra0.example.com:2380

Of course, you can also replace the node’s domain name with its IP address when starting.

4.2 Key parts of the source code analysis

Etcd starts from main.go in the repository root, then moves to etcdmain/etcd.go to load the configuration parameters. If the mode is Proxy, it enters startProxy; otherwise it enters startEtcd, which starts the etcd service module and the HTTP request handling module.

When starting the HTTP listeners, etcd uses transport.NewTimeoutListener for connections with the other etcd machines in the cluster (peers), so that a timeout error occurs if no response arrives within the specified time. For listening to client requests, it uses transport.NewKeepAliveListener, which helps keep connections stable.

As the setupCluster function in etcdmain/etcd.go shows, cluster startup differs slightly depending on the etcd parameters, but ultimately all that is needed is a list of IPs and ports.

In static-configuration startup, all the cluster information is already given, so it is enough to parse the comma-separated cluster URLs directly.

DNS discovery works similarly: it first issues a DNS SRV query to check whether the cluster’s records exist under _etcd-server-ssl._tcp.example.com and, if not found, under _etcd-server._tcp.example.com. From the domain names found, it resolves the IP addresses and ports, i.e. the cluster’s URLs.

More complex is etcd-style self-discovery startup. The node first forms a cluster from its own single URL, then, per its startup parameters, enters the JoinCluster function in discovery/discovery.go. Since the etcd discovery token address is known in advance and contains the cluster size, startup becomes a process of continuous watching and waiting: the node first registers its own information under the token directory, then watches the number of nodes in that directory, looping while the count is below target; once the count is reached, it ends the wait and proceeds with normal startup.

Two easily confused kinds of URLs are used to configure etcd. One is used by the etcd cluster itself to synchronize information and maintain connections, usually called peer-urls; the other receives HTTP requests from clients, usually called client-urls.

  • peer-urls: usually listen on port 2380 (older versions used 7001) and include the addresses of all nodes already working in the cluster.
  • client-urls: usually listen on port 2379 (older versions used 4001). To accommodate complex network environments, the new version of etcd listens on multiple client URLs rather than just one. In this way, etcd can work with multiple network interfaces and serve requests on different networks at the same time.

4.3 Runtime reconfiguration

After the etcd cluster has started, you can reconfigure it while it runs, including adding, deleting, migrating, and replacing core nodes. Runtime reconfiguration lets the etcd cluster change its configuration without a restart, a feature the new version of etcd adds over the old.

Runtime configuration management is only possible while a majority of the cluster’s nodes are healthy. Because configuration changes are themselves stored and synchronized through etcd, the cluster loses the ability to write data once too many nodes fail. It is therefore strongly recommended to configure at least three core nodes.

4.3.1. Node migration and replacement

When the machine hosting a node suffers a hardware failure, or the node itself becomes permanently unrecoverable (for example because its data directory is corrupted), you need to migrate or replace the node. A failed node must be repaired as soon as possible, because an etcd cluster requires a majority of its nodes to be working properly.

To migrate a node, perform the following steps:

  • Stop the node process that is currently running.
  • Copy the data directory from the old machine to the new machine.
  • Use the members API to update the node's peer URL record in etcd to point to the new machine's IP.
  • Start etcd on the new machine with the same configuration and the copied data directory.

4.3.2. Node Addition

Adding nodes makes etcd more highly available. For example, with three nodes, at most one node may fail; with five nodes, two may fail. Adding nodes also gives the cluster better read performance: since etcd nodes synchronize in real time and every node holds all the data, more nodes raise overall read throughput.
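
The fault-tolerance figures above are simple majority arithmetic, which can be sketched as:

```go
package main

import "fmt"

// quorum returns the number of nodes that must agree for a write to
// commit, and faultTolerance the number of failures a cluster of n
// nodes can survive while keeping a quorum.
func quorum(n int) int         { return n/2 + 1 }
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 3, 5, 7} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d failures\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

This also shows why even cluster sizes buy nothing: four nodes tolerate one failure, the same as three.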

To add a node, perform two steps:

  • Use the members API to add the new node's URL record to the cluster, and obtain the generated cluster information.
  • Start the new etcd node with the obtained cluster information.

4.3.3. Node Removal

Sometimes you have to trade off between better etcd write performance and higher cluster availability. When the Leader commits a write record, it synchronizes the message to every node, and the data is actually written only after a majority agree. So the more nodes there are, the worse write performance becomes. When there are too many nodes, you may need to remove one or more.

Removing a node is very simple and takes only one step: delete that node's record from the cluster. The etcd process on the corresponding machine then stops automatically.
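
With etcdctl, the membership operations from the last two subsections look roughly like this (member IDs, names, and URLs are placeholders; the member subcommands exist in etcdctl, but check the exact syntax for your version):

```shell
etcdctl member list                              # show current members and their IDs
etcdctl member add infra3 http://10.0.1.13:2380  # addition step 1: register the new node
# addition step 2: start etcd on the new machine with the cluster
# settings that "member add" prints out
etcdctl member remove 272e204152                 # removal: the node then stops itself
```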

4.3.4. Forcibly restart the cluster

When more than half of the nodes in a cluster fail, you need to manually designate one node to act as the Leader and start a new cluster from the original data.

At this point you need to do two steps.

  • Back up the original data to the new machine.
  • Restart the node with the -force-new-cluster option using the backed-up data.

Note: a forced restart breaks the safety guaranteed by the consistency protocol (errors occur if other nodes of the old cluster are still running during the operation), so be sure to back up your data before performing it.
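
A hedged sketch of the recovery steps (paths and hostnames are placeholders; flag spelling may vary by etcd version):

```shell
# 1. copy the surviving data directory to the new machine
rsync -a old-host:/var/lib/etcd/ /var/lib/etcd/
# 2. restart with the force-new-cluster option, which discards the old
#    membership and forms a one-node cluster from the backed-up data;
#    healthy nodes can then be added back one by one
etcd --data-dir /var/lib/etcd --force-new-cluster
```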

5 Proxy Mode

Proxy mode is another important change in the new version of etcd. It acts as a reverse proxy, forwarding client requests to an available etcd cluster. This way, you can deploy a Proxy-mode etcd on every machine as a local service; as long as these proxies keep running, your service discovery remains stable and reliable.

Figure 11 Proxy mode schematic diagram

Note that the Proxy itself does not join the strongly consistent etcd cluster. Accordingly, the Proxy neither increases the cluster's reliability nor reduces its write performance.

5.1 Why Proxy Replaces Standby Mode

So why have a Proxy mode instead of simply adding more etcd core nodes? Every core node (peer) added to etcd puts extra network, CPU, and disk load on the Leader, because every change must be synchronized to one more backup. Adding core nodes makes the whole cluster more reliable, but past a certain point the reliability gains become marginal while write synchronization performance actually drops. Adding a lightweight Proxy-mode etcd node is therefore an effective alternative to adding a core node directly.

Users familiar with the older etcd 0.4.6 will notice that Proxy mode effectively replaces the old Standby mode. Besides forwarding requests, a Standby node would also switch itself to a normal node when failures left the cluster short of core nodes, and switch back to Standby once the failed nodes recovered and the core count reached the preset value.

In the new version of etcd, however, Proxy mode is only entered automatically when, at initial cluster startup, a node finds that the cluster already has enough core nodes; a Proxy is never promoted to a core node afterward. The main reasons are as follows.

  • etcd, as a component used to guarantee high availability, needs its required system resources (memory, disk, CPU, and so on) fully guaranteed. Letting the cluster switch roles automatically and change core members at will cannot guarantee machine performance. For large deployments, the etcd authors therefore encourage dedicating a cluster of machines to running etcd.
  • Because an etcd cluster is highly available, partial machine failure does not break its functionality. When a machine fails, the administrator therefore has enough time to examine and repair it.
  • Automatic role conversion complicates etcd clustering, especially now that etcd supports listening on and interacting with multiple network environments. Switching across different networks is more error-prone and would make the cluster unstable.

For the above reasons, the current Proxy mode only forwards requests; it does not perform role conversion.

5.2 Key parts of the source code analysis

As the code shows, the essence of Proxy mode is an HTTP proxy server that forwards client requests to other etcd nodes.

etcd proxies currently support read-write and read-only modes. By default both read and write requests are forwarded; in read-only mode only read requests are forwarded, and all other requests are answered with an HTTP 501 error.

Note that, in addition to being enabled explicitly with the proxy startup parameter, a node can also be demoted to Proxy mode during startup: when it registers itself and finds that the cluster already has the required number of nodes, it falls back to running as a Proxy.

6 Data Storage

etcd's storage is divided into in-memory storage and persistent (on-disk) storage. In memory, besides sequentially recording all changes users make to node data, etcd also indexes user data and builds a heap for fast queries. For persistence, etcd uses a WAL (Write Ahead Log) for record storage.

In a WAL architecture, all data is logged before it is committed. etcd's persistent storage directory has two subdirectories: wal, which stores the change records of all transactions, and snapshot, which stores the state of all etcd directories at a given point in time. Combining the WAL with snapshots lets etcd store data and recover failed nodes effectively.

Why is a snapshot needed when the WAL already records every change in real time? To keep disk usage from growing without bound, etcd makes a snapshot every 10000 records by default; WAL files covered by a snapshot can then be deleted. The number of historical etcd operations queryable through the API is 1000 by default.
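
The trigger condition amounts to comparing the applied index against the last snapshot (the 10000 default is from the text; the function and parameter names here are illustrative):

```go
package main

import "fmt"

const snapshotCount = 10000 // default: snapshot every 10000 applied records

// shouldSnapshot reports whether enough entries have been applied
// since the last snapshot to trigger a new one. Whether the real
// comparison is > or >= is an implementation detail; > is assumed here.
func shouldSnapshot(appliedIndex, snapshotIndex uint64) bool {
	return appliedIndex-snapshotIndex > snapshotCount
}

func main() {
	fmt.Println(shouldSnapshot(10000, 0)) // false: exactly at the threshold
	fmt.Println(shouldSnapshot(10001, 0)) // true: one past it
}
```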

etcd stores its startup configuration in the data directory specified by the data-dir parameter when it first starts. The configuration includes the local node ID, the cluster ID, and the initial cluster membership. Users should avoid restarting etcd from a stale data directory, because a node started with stale data will be inconsistent with the rest of the cluster (for example, it may previously have cast votes or acknowledged writes through the Leader that its stale directory no longer remembers). To maximize cluster safety, whenever there is any possibility of data corruption or loss, remove that node from the cluster and add a new node with an empty data directory.

6.1 Write-Ahead Logs (WAL)

The Write Ahead Log (WAL) records the entire history of data changes. In etcd, all data changes are written to the WAL before being committed. Using a WAL for data storage gives etcd two important capabilities.

  • Fast recovery from failure: when data is corrupted, replaying the modifications recorded in the WAL quickly restores the state from before the corruption.
  • Undo/redo: because every change is recorded in the WAL, undoing or redoing only requires executing the logged operations backward or forward.

Naming rules for WAL and snapshot files in etcd

WAL files are stored in the etcd data directory with names of the form seq-index.wal. The initial WAL file is 0000000000000000-0000000000000000.wal, meaning it is WAL file number 0 of all WAL files, starting from Raft index 0. After running for a while, the log may need to be split, putting new entries into a new WAL file.

For example, when the cluster reaches Raft index 20 and the WAL needs to be split, the next WAL file becomes 0000000000000001-0000000000000021.wal. If the log is split again after 10 more operations, the file name becomes 0000000000000002-0000000000000031.wal. The number before the hyphen increases by 1 with each split, while the number after the hyphen is the Raft index from which the file actually starts.

Snapshot files are named in the term-index.snap format, which is easier to understand: term and index record the state of the Raft node when the snapshot was taken, namely the current term number and the position of the last stored data item.
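
The two naming schemes can be reproduced with a pair of small helpers. Note that real etcd zero-pads both numbers as 16-digit hexadecimal; for the initial file the result is identical to the example above either way, and walName/snapName are illustrative names:

```go
package main

import "fmt"

// walName sketches the seq-index.wal naming described above,
// zero-padded to 16 hexadecimal digits.
func walName(seq, index uint64) string {
	return fmt.Sprintf("%016x-%016x.wal", seq, index)
}

// snapName sketches the term-index naming for snapshot files.
func snapName(term, index uint64) string {
	return fmt.Sprintf("%016x-%016x.snap", term, index)
}

func main() {
	fmt.Println(walName(0, 0)) // the initial WAL file
	fmt.Println(snapName(2, 20))
}
```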

6.2 Key parts of the source code analysis

As the code logic shows, a WAL has two modes, read mode and append mode, which cannot be active at the same time. A newly created WAL file is in append mode and never enters read mode. An existing WAL file must be opened in read mode, and only after all its records have been read can it enter append mode; once in append mode, it cannot return to read mode. This helps ensure data integrity and accuracy.

When a node starts and enters the NewServer function of etcdserver/server.go, it checks for WAL data left over from a previous run.

The first step is to check whether the snapshot folder contains files that meet the naming specification. If a snapshot in the v0.4 format is found, a conversion function upgrades it to v0.5. The cluster configuration, including the token and the other nodes' information, is read from the snapshot, and the contents of the WAL directory are loaded, sorted from smallest to largest. Based on the term and index recorded in the snapshot, etcd locates the WAL entries that follow the snapshot and replays them forward until all entries have been traversed; the resulting records are kept in memory in the ents variable. The WAL then enters append mode, ready for new data items.

If the number of data items in the WAL reaches a threshold (10000 by default), a WAL split and a snapshot are performed together; this can be seen in the snapshot function of etcdserver/server.go. As a result, only one snapshot file and one WAL file are actively used in the data directory at any time, and etcd keeps five historical files of each by default.

7 Raft

The raft package in the new etcd is a concrete implementation of the Raft consensus algorithm. There are plenty of articles about Raft on the web; if you are interested, read the Raft paper [5]. Rather than describing the algorithm in detail, this article introduces its key points in Q&A form in the context of etcd. If any Raft terms are unfamiliar, see the concept glossary section.

7.1 FAQ for Raft

  • What does a term mean in Raft? In terms of time, a term lasts from the start of one election to the start of the next. Functionally, if a Follower fails to receive heartbeats from the Leader, it ends the current term and becomes a Candidate to start an election, which helps the cluster recover when the Leader fails. A node with a smaller term cannot win an election it initiates. If the cluster does not fail, a term can last indefinitely, and a split vote leads to a new election.

FIG. 12 Schematic diagram of Term

  • How does the Raft state machine switch states? When Raft starts, a node enters the Follower state by default and waits for heartbeats from the Leader. If the wait times out, it changes from Follower to Candidate and starts an election for the next term. After receiving votes from a majority of the cluster's nodes, the Candidate becomes Leader. A Leader cut off by a network fault may find that the other nodes have elected a Leader for a newer term, in which case the original Leader steps down to Follower. A Candidate likewise switches back to Follower if, while waiting for votes, it finds that another node has already become Leader.

Figure 13 Raft state machine

  • How is a Leader elected in the shortest time while preventing election conflicts? As the Raft state machine figure shows, the Candidate state has a timeout, and the timeout is a random value: after becoming a Candidate, each machine waits a different amount of time before launching a new campaign, which creates a time gap between them. Within that gap, if Candidate1 receives election messages carrying a larger term than its own (i.e. from a newer term), and the new would-be Leader Candidate2 holds all the committed data, then Candidate1 votes for Candidate2. This keeps the chance of an election conflict small.

  • How is a node missing some data prevented from winning votes and becoming Leader? Raft uses the randomized timeout to decide who campaigns first: the first node to time out raises the term number and starts a new round of voting. Normally the other nodes vote when they receive the election notice, but a node will refuse to vote for a campaigner whose committed data from earlier terms is incomplete. This mechanism prevents a node that is missing data from becoming Leader.

  • What happens when a node fails in Raft? Normally, if Follower nodes fail, the cluster keeps functioning with little impact as long as more than half of the nodes remain available. If the Leader fails, Followers stop receiving heartbeats and time out; they then campaign for votes, and one becomes Leader of the new term and continues serving the cluster. Note that etcd currently has no mechanism to automatically change the cluster's total node count: without an explicit API call, a crashed node is still counted in the total, and any acknowledged request must receive votes from more than half of that total.

Figure 14 Node breakdown

  • Why doesn't Raft need to consider the Byzantine generals problem when counting available nodes? The Byzantine problem says that for a distributed system to keep serving correctly while n nodes fail, the total node count must be 3n+1, whereas Raft needs only 2n+1. The main reason is that the Byzantine problem involves nodes that lie, while etcd assumes all nodes are honest. Before an election, a candidate must announce its term number and the index at the end of the previous term; these values are accurate, and the other nodes decide whether to vote based on them. In addition, etcd strictly restricts data to flow from the Leader to the Followers, which guarantees consistency without errors.

  • Which node in the cluster does a user read and write data from? To guarantee strong consistency, all data in Raft flows in one direction, from the Leader to the Followers: every Follower's data must match the Leader's, and inconsistent data is overwritten. Every user update request is first received by the Leader, which saves it and tells the other nodes to save it too; once a majority of nodes respond, the data is committed. A committed entry is one that Raft has stored stably and that will never change, and it is then synchronized to the remaining Followers. Because every node holds an exact copy of the data Raft has committed (at worst slightly behind), any node can handle a read request.

  • How does etcd's Raft implementation perform? A single-instance node supports 1000 data writes per second. With more nodes, writes become slower because of network latency between nodes, while read performance improves, since every node can serve user requests.

7.2 Key parts of the source code analysis

In the etcd code, Node, the concrete implementation of the Raft state machine, is the key to the whole algorithm and the entry point for using it.

In etcd, the Raft algorithm is invoked as follows; you can find this in startNode in etcdserver/raft.go:

storage := raft.NewMemoryStorage()
n := raft.StartNode(0x01, []int64{0x02, 0x03}, 3, 1, storage)

This code shows that Raft keeps its data and state in memory, and that the Node started by raft.StartNode is the Raft state machine node. After starting a Node, the application must do the following.

First, push the messages received from other machines in the cluster into the Node, as the Process function in etcdserver/server.go shows.

func (s *EtcdServer) Process(ctx context.Context, m raftpb.Message) error {
    if m.Type == raftpb.MsgApp {
        s.stats.RecvAppendReq(types.ID(m.From).String(), m.Size())
    }
    return s.node.Step(ctx, m)
}

After checking that the sender of a request is a member of the cluster, and treating the sender of an append message as the Leader when the local node is a Follower, messages are pushed into the Node through the node.Step() function.

Second, store the log entries, apply the committed entries in your application, and then signal completion to the other nodes in the cluster, listening on Node.Ready() for the next batch of work. It is important that a log entry is reliably and stably stored before you send a completion message to other nodes.

Finally, keep the heartbeat going with Tick(). Raft uses timeouts in two important places: heartbeat keep-alive and the Leader election. The user must periodically call the Tick() function on the Raft Node to drive the timeout mechanism.

To sum up, the state machine cycle for the entire RAFT node looks like this:

for {
    select {
    case <-s.Ticker:
        n.Tick()
    case rd := <-s.Node.Ready():
        saveToStorage(rd.State, rd.Entries)
        send(rd.Messages)
        process(rd.CommittedEntries)
        s.Node.Advance()
    case <-s.done:
        return
    }
}

The actual location of this state machine loop is the run function in etcdserver/server.go.

To change the state of the state machine (for example, to update user data), call the n.Propose(ctx, data) function. The data is serialized before being stored; once a majority of the other nodes confirm it, it is committed and recorded as committed.

As mentioned earlier, an etcd cluster needs another etcd cluster or DNS to bootstrap; after startup these external aids are no longer needed, because etcd stores its cluster information as part of its own state. Changing the number of nodes in the cluster therefore means adding entries to the Raft state machine just like changing user data, which is done through n.ProposeConfChange(ctx, cc). Once the configuration-change request has been confirmed by a majority of nodes, the change is formally applied, as in the following code.

var cc raftpb.ConfChange
cc.Unmarshal(data)
n.ApplyConfChange(cc)

Note: a unique ID represents one cluster. To avoid message confusion between different etcd clusters, IDs must be unique, and old token data cannot be reused as an ID.

8 Store

The Store module, as the name implies, is like a storefront built on all the underlying support etcd has prepared, exposing a rich set of APIs to handle user requests. To understand the Store, just look at etcd's API: see the etcd v2 API list [6], the v3 command line [7], and the v3 HTTP API [8]. The APIs that operate on keys stored in etcd are provided by the Store. The directories and keys mentioned in the API may also be called etcd nodes.

For newer versions of etcd server, the v2 HTTP API is disabled by default. If you need it, add the --enable-v2 flag. The following exercises are based on v2:

  • Assign values to keys stored in etCD
curl -s http://127.0.0.1:2379/v2/keys/message -XPUT -d value="Hello world" | jq .
{
  "action": "set",
  "node": {
    "key": "/message",
    "value": "Hello world",
    "modifiedIndex": 9,
    "createdIndex": 9
  },
  "prevNode": {
    "key": "/message",
    "value": "Hello world",
    "modifiedIndex": 8,
    "createdIndex": 8
  }
}

The fields in the response mean the following:

- action: the name of the action performed.
- node.key: the HTTP path of the request. etcd presents the contents of its key-value store in a filesystem-like way.
- node.value: the value of the requested key.
- node.createdIndex: a counter that is incremented on every change to etcd. Besides user requests, etcd-internal operations (such as startup and cluster membership changes) can also change nodes and bump this value.
- node.modifiedIndex: like node.createdIndex; operations that change modifiedIndex include set, delete, update, create, compareAndSwap and compareAndDelete.
  • Query the value stored by a key in etcd
curl http://127.0.0.1:2379/v2/keys/message
  • Change the key value

This is almost the same as setting a new value, except that the prevNode field reflects what was stored before the change.

curl -s http://127.0.0.1:2379/v2/keys/message -XPUT -d value="Hello etcd" | jq .
{
  "action": "set",
  "node": {
    "key": "/message",
    "value": "Hello etcd",
    "modifiedIndex": 12,
    "createdIndex": 12
  },
  "prevNode": {
    "key": "/message",
    "value": "Hello etcd",
    "modifiedIndex": 11,
    "createdIndex": 11
  }
}
  • Delete a value

curl -s http://127.0.0.1:2379/v2/keys/message -XDELETE | jq .

{
  "action": "delete",
  "node": {
    "key": "/message",
    "modifiedIndex": 20,
    "createdIndex": 12
  },
  "prevNode": {
    "key": "/message",
    "value": "Hello etcd",
    "modifiedIndex": 12,
    "createdIndex": 12
  }
}
  • Delete a key on a schedule

etcd can delete keys on a schedule: set a TTL value, and the key is deleted when the TTL expires. In the response, the expiration field gives the expiry time and the ttl field the configured lifetime.

curl -s http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl=5 | jq .
{
  "action": "set",
  "node": {
    "key": "/foo",
    "value": "bar",
    "expiration": "2021-02-04T03:40:41.517258Z",
    "ttl": 5,
    "modifiedIndex": 13,
    "createdIndex": 13
  }
}
  • Cancel the scheduled deletion
curl -s http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl= -d prevExist=true | jq .
{
  "action": "update",
  "node": {
    "key": "/foo",
    "value": "bar",
    "modifiedIndex": 16,
    "createdIndex": 15
  },
  "prevNode": {
    "key": "/foo",
    "value": "bar",
    "expiration": "2021-02-04T03:44:46.401423Z",
    "ttl": 3,
    "modifiedIndex": 15,
    "createdIndex": 15
  }
}
  • Monitor key value changes

The API lets a user watch a value, or recursively watch a directory and the values under it; etcd actively notifies the watcher when the directory or value changes.

curl -s 'http://127.0.0.1:2379/v2/keys/foo?wait=true' | jq .
  • Query past key-value operations: similar to the watch above, but by supplying the index of an earlier modification you can query the history of operations. By default the last 1000 records can be queried.
curl -s 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=7' | jq .
{
  "action": "set",
  "node": {
    "key": "/foo",
    "value": "bar",
    "expiration": "2021-02-04T03:40:41.517258Z",
    "ttl": 5,
    "modifiedIndex": 13,
    "createdIndex": 13
  }
}
  • Automatically create ordered keys in a directory. POSTing to a directory automatically creates a value in it keyed by its createdIndex, so entries sort strictly by creation time. This API is useful for scenarios such as distributed queues.
curl -s http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job1 | jq .
{
  "action": "create",
  "node": {
    "key": "/queue/00000000000000000021",
    "value": "Job1",
    "modifiedIndex": 21,
    "createdIndex": 21
  }
}
curl -s http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job1 | jq .
{
  "action": "create",
  "node": {
    "key": "/queue/00000000000000000022",
    "value": "Job1",
    "modifiedIndex": 22,
    "createdIndex": 22
  }
}
  • List the ordered keys in creation order
curl -s 'http://127.0.0.1:2379/v2/keys/queue?recursive=true&sorted=true' | jq .
{
  "action": "get",
  "node": {
    "key": "/queue",
    "dir": true,
    "nodes": [
      {
        "key": "/queue/00000000000000000021",
        "value": "Job1",
        "modifiedIndex": 21,
        "createdIndex": 21
      },
      {
        "key": "/queue/00000000000000000022",
        "value": "Job1",
        "modifiedIndex": 22,
        "createdIndex": 22
      }
    ],
    "modifiedIndex": 21,
    "createdIndex": 21
  }
}
  • Create a directory with scheduled deletion: similar to a key with a TTL. When the directory expires and is deleted, everything under it is deleted with it.
curl -s http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true | jq .
{
  "action": "set",
  "node": {
    "key": "/dir",
    "dir": true,
    "expiration": "2021-02-04T04:02:56.043769Z",
    "ttl": 30,
    "modifiedIndex": 23,
    "createdIndex": 23
  }
}
  • Refresh the timeout period.
curl -s http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true -d prevExist=true | jq .
{
  "action": "update",
  "node": {
    "key": "/dir",
    "dir": true,
    "expiration": "2021-02-04T04:03:59.826027Z",
    "ttl": 30,
    "modifiedIndex": 26,
    "createdIndex": 25
  },
  "prevNode": {
    "key": "/dir",
    "dir": true,
    "expiration": "2021-02-04T04:03:57.610303Z",
    "ttl": 21,
    "modifiedIndex": 25,
    "createdIndex": 25
  }
}
  • Atomic CAS (compare-and-swap) operations: this API is the most direct expression of etcd's strong consistency. It uses preconditions to prevent a node from being created or modified by multiple parties at once; the user's instruction is executed if and only if the CAS condition holds. The conditions are as follows.

- prevValue: the previous value of the node; the operation is allowed only if it matches the value supplied.
- prevIndex: the previous modifiedIndex of the node; the operation is allowed only if it matches the index supplied.
- prevExist: whether the node already exists. With prevExist=false, the operation fails if the node exists; this is often used to acquire a distributed lock uniquely.

Suppose we first set the value of foo:

curl -s http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=one | jq .
{
  "action": "set",
  "node": {
    "key": "/foo",
    "value": "one",
    "modifiedIndex": 28,
    "createdIndex": 28
  },
  "prevNode": {
    "key": "/foo",
    "value": "bar",
    "modifiedIndex": 19,
    "createdIndex": 18
  }
}

Then do the operation:

curl -s 'http://127.0.0.1:2379/v2/keys/foo?prevExist=false' -XPUT -d value=three | jq .
{
  "errorCode": 105,
  "message": "Key already exists",
  "cause": "/foo",
  "index": 28
}

A failed creation error is returned.

  • Conditional compare-and-delete: like CAS, the deletion is performed only when the preconditions hold.

  • Create a directory

curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d dir=true

  • List all nodes in a directory by ending the path with a slash (/). The recursive parameter also lists all subdirectory information recursively.
curl -s http://127.0.0.1:2379/v2/keys/ | jq .
{
  "action": "get",
  "node": {
    "dir": true,
    "nodes": [
      {
        "key": "/queue",
        "dir": true,
        "modifiedIndex": 21,
        "createdIndex": 21
      },
      {
        "key": "/foo",
        "value": "one",
        "modifiedIndex": 28,
        "createdIndex": 28
      }
    ]
  }
}
  • Delete directories: by default only empty directories can be deleted. To delete a directory with contents, add recursive=true.
curl -s 'http://127.0.0.1:2379/v2/keys/foo?dir=true' -XDELETE | jq .
{
  "action": "delete",
  "node": {
    "key": "/foo",
    "modifiedIndex": 29,
    "createdIndex": 28
  },
  "prevNode": {
    "key": "/foo",
    "value": "one",
    "modifiedIndex": 28,
    "createdIndex": 28
  }
}
  • Create a hidden node: a name beginning with an underscore (_) is hidden by default.
curl -s http://127.0.0.1:2379/v2/keys/_message -XPUT -d value="Hello hidden world" | jq .
{
  "action": "set",
  "node": {
    "key": "/_message",
    "value": "Hello hidden world",
    "modifiedIndex": 30,
    "createdIndex": 30
  }
}

After reading through so many APIs, you have a basic idea of what the Store does. It organizes the data stored in etcd into a filesystem-like tree for fast queries; it maintains a WatcherHub that pushes change notifications to all Watcher subscribers; it keeps a min-heap of keys with TTLs so the next key to expire can be found quickly; and all API requests are stored as events in a queue awaiting processing.

9 Summary

Through this journey from application scenarios to source code analysis, we have seen that etcd is not just a simple distributed key-value store. It solves the most common consistency problems in distributed systems, provides a stable and highly available registry for service discovery, and opens up rich possibilities for collaborative microservice architectures. We believe more and more large systems will be built on etcd in the near future.

10 Introduction to the Author

Sun Jianbo, a master's student at the SEL Laboratory of Zhejiang University [9], currently works on research and development in the cloud platform team. The Zhejiang University team has deep research and secondary-development experience with PaaS, Docker, big data, and mainstream open-source cloud computing technologies. The team contributes technical articles in the hope of helping readers.

The resources

[1] ZooKeeper typical application scenarios: blog.51cto.com/nileader/10…

[2] “ZooKeeper Typical Application Scenarios”: blog.51cto.com/nileader/10…

[3] “Apache Way”: www.infoworld.com/article/261…

[4] RFC2782: tools.ietf.org/html/rfc278…

[5] Raft algorithm paper: ramcloud.stanford.edu/raft.pdf

[6] etcd API list: github.com/etcd-io/etc…

[7] v3 command line reference: github.com/etcd-io/etc…

[8] v3 HTTP API reference: github.com/etcd-io/etc…

[9] Zhejiang University SEL Laboratory Master: www.sel.zju.edu.cn


Welcome to Go Ecology. Ecological Jun shares content related to the Go language ecosystem from time to time.