Background

Note: For simplicity, this article will refer to ClickHouse as CK and ZooKeeper as ZK.

At the end of last year, our company started relocating its data center from Hong Kong to Singapore. All ClickHouse cluster instances have already been moved to Singapore, while the ZooKeeper cluster is still in Hong Kong, so we planned to smoothly move the ZooKeeper cluster to Singapore as well in the near future.

0.1 Goals and challenges

0.1.1 ZK’s transcontinental relocation must be imperceptible to users

The CK cluster has grown to support the real-time data analysis needs of the entire company and backs many online services. This requires the CK cluster to stay available at all times, with no downtime. The CK cluster performs data insertion, queries and table changes around the clock, and by design CK relies heavily on ZK for metadata storage, replica synchronization and table changes. Once ZK becomes unavailable, CK reads and writes are affected. The ZK cluster migration will inevitably involve leader switches, so keeping reads and writes healthy throughout the relocation is no small challenge.

0.1.2 Hot Upgrade + Dynamic Configuration Update

To achieve the above goal, during the migration we need, on the one hand, solid retry logic at the write layer so that writes survive the ZK leader switchover, and on the other hand, to minimize the time ZK is unavailable. All ZK operations should be performed as rolling, hot upgrades. At the same time, because the cluster IPs of ZK change, a lot of configuration must be modified, and wherever possible these configuration changes should take effect via reload rather than a service restart.

1. Overall plan

1.1 Step 1: Upgrade ZK from statically configured version to dynamically configured version

Dynamic configuration is supported since ZK 3.5.0. The dynamic configuration feature makes capacity expansion and reduction easy, without a rolling restart of every instance in the ZK cluster. Unfortunately, the ZK cluster used by CK does not use dynamic configuration. Therefore, the first step of the ZK cluster relocation is to smoothly upgrade the ZK cluster from the statically configured version to the dynamically configured version, which simplifies the subsequent expansion and reduction operations. For details on the ZK dynamic configuration version, see: backendhouse.github.io/post/zookee…
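
For illustration, here is a minimal sketch of what the dynamically configured version looks like on disk (paths and hostnames are placeholders, not our actual deployment): the cluster membership moves out of zoo.cfg into a dynamic file that reconfig rewrites automatically.

    # zoo.cfg -- static part, shared settings only (illustrative)
    dataDir=/data/zookeeper
    reconfigEnabled=true                 # required on 3.5.3+ to allow reconfig
    dynamicConfigFile=/data/zookeeper/conf/zoo.cfg.dynamic

    # zoo.cfg.dynamic -- membership, rewritten automatically by reconfig
    server.1=A1:2888:3888:participant;2181
    server.2=A2:2888:3888:participant;2181
    server.3=A3:2888:3888:participant;2181
    server.4=A4:2888:3888:participant;2181
    server.5=A5:2888:3888:participant;2181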

1.2 Step 2: ZK capacity expansion and reduction to achieve relocation

After the ZK cluster is upgraded to the dynamically configured version, the ZK cluster is smoothly relocated from the old Hong Kong machines to the new Singapore machines through capacity expansion and reduction operations:

  • Capacity expansion: add the new Singapore machines to the ZK cluster one by one
  • Configuration change: point CK’s ZK configuration from the old machines to the new machines
  • Capacity reduction: remove the old Hong Kong machines from the ZK cluster one by one.

2. Problems encountered and solutions

To ensure that there were no problems during the online ZK relocation, we conducted a thorough impact assessment and offline drills in advance. In the process we found the following problems:

2.1 The statically configured ZK version is incompatible with the dynamically configured version

During the upgrade described in 1.1, when a Follower instance in the ZK cluster was upgraded from the static configuration to the dynamic configuration, the Follower reported the following error:

2021-02-25 11:07:03,081 [myid:5] - WARN [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:2185)(secure=disabled):Follower@96] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
    at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:336)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2021-02-25 11:07:03,081 [myid:5] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:2185)(secure=disabled):Follower@201] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)

Meanwhile, the Leader instance that had not yet been upgraded reported the following error:

2021-02-26 19:35:08,065 [myid:6] - WARN [LearnerHandler-/xx.xx.xx.xx:52906:LearnerHandler@644] - ******* GOODBYE /xx.xx.xx.xx:52906 ********
2021-02-26 19:35:08,066 [myid:6] - INFO [WorkerReceiver[myid=6]:FastLeaderElection$Messenger$WorkerReceiver@285] - 6 Received version: b00000000 my version: 0
2021-02-26 19:35:08,066 [myid:6] - INFO [WorkerReceiver[myid=6]:FastLeaderElection@679] - Notification: 2 (message format version), 5 (n.leader), 0xb0000000c (n.zxid), 0x2 (n.round), LOOKING (n.state), 5 (n.sid), 0xb (n.peerEPoch), LEADING (my state)b00000000 (n.config version)
2021-02-26 19:35:08,067 [myid:6] - ERROR [LearnerHandler-/xx.xx.xx.xx:52908:LearnerHandler@629] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Follower is ahead of the leader (has a later activated configuration)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:398)

It turns out that a Leader instance running the static version and a Follower instance running the dynamic version cannot coexist. During the serial upgrade of ZK, to minimize the unavailability of the ZK cluster, we upgraded all Followers first and the Leader last. After a Follower is upgraded from the statically configured version to the dynamically configured version, the Leader is still on the static version with a config version of 0, while the Follower is on the dynamic version with a config version greater than 0.

Upon receiving the Follower’s registration request, the Leader checks the Follower’s config version. If the Leader finds that the Follower’s config version is larger than its own, it considers the Follower to be ahead of the leader and actively closes the connection:

                if (learnerInfoData.length >= 20) {
                    long configVersion = bbsid.getLong();
                    if (configVersion > leader.self.getQuorumVerifier().getVersion()) {
                        throw new IOException("Follower is ahead of the leader (has a later activated configuration)");
                    }
                }

On the Follower side, the closed connection surfaces as an EOFException, so the Follower keeps throwing the exception and retrying.

There are two solutions on the table:

  • Scheme 1: change the serial upgrade to a parallel upgrade, so that static-version and dynamic-version instances never coexist.
  • Scheme 2: since the static and dynamic versions only coexist for a short time, build a temporary ZK version that removes the Leader’s config version check on Followers, bypassing the problem above.

Considering the large amount of data in the ZK cluster and the long startup time of a ZK instance, a parallel upgrade would make the ZK cluster unavailable for 2 to 4 minutes; if a problem forced a rollback, the cluster would be unavailable for twice as long, which is too great a risk. So instead of Scheme 1 we chose Scheme 2: first upgrade the static version to a static version without the config version check, then upgrade to the dynamic version.
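
The temporary version in Scheme 2 only needs to relax the check shown above. A rough sketch of the idea (not the exact patch we shipped), based on the LearnerHandler snippet in 2.1:

    if (learnerInfoData.length >= 20) {
        long configVersion = bbsid.getLong();
        if (configVersion > leader.self.getQuorumVerifier().getVersion()) {
            // Temporary behaviour for the rolling upgrade: tolerate a Follower whose config
            // version is ahead, instead of throwing IOException and dropping the connection.
            LOG.warn("Follower config version is ahead of the leader's; ignoring during upgrade");
        }
    }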

2.2 CK cannot dynamically load the ZK configuration

In the offline drill, when the ZK server list in the ClickHouse configuration file was modified as described in 1.2, the configuration change was not dynamically loaded by CK. There are many applications on ClickHouse, some of them to-business (2B) or highly real-time services, so restarting the CK cluster just to load a new ZK server list is not acceptable.

The final solution was to add support in ClickHouse for dynamically loading the ZK configuration, so that users are not affected by a CK cluster restart. This optimization has been merged into the community; PR: github.com/ClickHouse/…

2.3 ZK Instance Restart Causes a Few CK Query Failures

CK queries generally do not involve ZK interaction, so in most cases the ZK relocation does not affect CK queries. However, when the ClickHouse setting optimize_trivial_count_query is enabled (PR: github.com/ClickHouse/…), simple count queries are affected by the ZK relocation.

Tracing the code, we found that with optimize_trivial_count_query enabled, for a simple select count query CK skips the regular query pipeline and obtains the total row count from metadata (a process that accesses ZK) in order to speed the query up. Therefore, we set optimize_trivial_count_query to 0 on the ClickHouse cluster before the ZK relocation and will turn it back on after the relocation is complete.
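
For reference, the switch can be turned off cluster-wide through the user profile in users.xml (an illustrative fragment; it can also be set per session with SET optimize_trivial_count_query = 0):

    <profiles>
        <default>
            <!-- disable the trivial-count shortcut while the ZK cluster is being relocated -->
            <optimize_trivial_count_query>0</optimize_trivial_count_query>
        </default>
    </profiles>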

2.4 ZK Instance Restart Causes CK Write Failures

When writing data to CK, CK relies on the ZK cluster to allocate block ids and to synchronize data from the current replica to the other replicas. Therefore, the ZK relocation inevitably affects CK writes. What we need to do is, first, make the window during which writes are affected as short as possible, and second, retry failed writes against the other replicas in the same shard, so that once the ZK cluster recovers, CK writes resume automatically.

Data is currently written to CK through Flink, Spark and clickhouse_sinker, and we needed to confirm that all of them had a retry mechanism in place before the ZK relocation. A minimal retry sketch is shown below.
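
The concrete retry logic lives in each ingestion path, but the shape is the same everywhere. Below is a minimal, generic Java sketch (a hypothetical helper, not the actual Flink/Spark/clickhouse_sinker code): on failure, back off and retry the same batch against another replica of the same shard until ZK recovers.

    import java.util.List;
    import java.util.function.Consumer;

    public class InsertRetry {
        // replicaHosts: all replicas of one shard; insertAction: runs the INSERT against the given host.
        // Retries round-robin across replicas with linear backoff, then rethrows the last error.
        public static void insertWithRetry(List<String> replicaHosts, Consumer<String> insertAction,
                                           int maxAttempts, long backoffMillis) throws InterruptedException {
            RuntimeException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                String host = replicaHosts.get(attempt % replicaHosts.size());
                try {
                    insertAction.accept(host);                    // fails with e.g. "table in readonly mode" while ZK is down
                    return;                                       // success
                } catch (RuntimeException e) {
                    last = e;
                    Thread.sleep(backoffMillis * (attempt + 1));  // linear backoff before trying the next replica
                }
            }
            throw last;                                           // give up after maxAttempts
        }
    }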

2.5 ZK Instance Restart Causes CK Table Change Failures

Table change operations include creating/deleting tables and adding/deleting/modifying fields. CK must access the ZK cluster when performing table changes, so the ZK relocation affects table changes on the CK cluster. Because CK table changes are far less frequent than CK writes and queries, during the ZK relocation we degraded the CK platform, that is, CK table change operations were temporarily not supported, to avoid table change failures caused by ZK instance restarts.

3. Final relocation plan

3.0 Initial Status

Assume the Hong Kong machines are A1, A2, A3, A4, A5 and the Singapore machines are B1, B2, B3, B4, B5. In the initial state, the ZK cluster is deployed in the Hong Kong machine room and runs the statically configured version. Our goal is to move the ZK cluster to the new machine room, running the dynamically configured version at the end.

3.1 Upgrading to the Dynamically Configured Version

Version upgrade path: Static version -> static version (without config version check) -> Dynamic version

3.1.1 Static Version -> Static Version (without Config Version Check)

Serial upgrade: upgrade the Followers first and the Leader last. After each machine is upgraded, check the Leader and Follower status in the cluster and check whether CK queries and writes are abnormal. Wait until the upgraded ZK instance has fully started before upgrading the next one.
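
As an example, one convenient way to check each instance’s role and sync state after a restart is ZK’s four-letter-word commands (on 3.5+ they need to be whitelisted via 4lw.commands.whitelist); A1 here is just a placeholder host:

    echo srvr | nc A1 2181    # prints "Mode: leader" or "Mode: follower" plus the current zxid
    echo mntr | nc A1 2181    # prints zk_server_state, zk_synced_followers, zk_outstanding_requests, ...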

Expected impact:

  • When a Follower instance is upgraded, the connections between CK and that ZK instance are dropped, and some CK writes may report a "zookeeper session expired" error; CK then reconnects to other ZK instances and recovers within 40s.
  • When the Leader instance is upgraded, ZK enters a new election round and elects a new Leader, and CK writes report a "table in readonly mode" error. Once the new Leader is elected, CK writes recover; the recovery time does not exceed 3 minutes.

Rollback scheme: Roll back directly to the original static version in parallel

3.1.2 Static Version (without Config Version Check) -> Dynamic Version

The upgrade procedure, expected impact and rollback scheme are the same as in 3.1.1.

3.2 Dynamic Capacity Expansion and Reduction

3.2.1 Capacity expansion: Add the new Singapore machines to the ZK cluster

Serial expansion steps:

  • Deploy the ZK instance on new machine B1, with A1-A5 plus B1 in its configuration. Add B1 to the cluster with reconfig -add 6=B1:2888:3888;2181 (an example session is shown after this list). Then check whether the local configuration of every ZK instance has been updated, check the Leader/Follower status, and check whether CK reads and writes are normal. If no problem is found, expand the next machine.
  • Deploy the ZK instance on new machine B2, with A1-A5 plus B1-B2 in its configuration. Add B2 to the cluster with reconfig -add 7=B2:2888:3888;2181. The checks are the same as above.
  • .
  • Repeat the above steps until all the new machines have been added to the ZK cluster.
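
An illustrative zkCli session for one expansion step (hostnames and server ids follow the plan above; output abridged). The config command prints the currently activated membership, which is an easy way to verify the change has taken effect:

    zkCli.sh -server A1:2181
    [zk: A1:2181(CONNECTED) 0] reconfig -add 6=B1:2888:3888;2181
    [zk: A1:2181(CONNECTED) 1] config
    server.1=A1:2888:3888:participant;2181
    ...
    server.5=A5:2888:3888:participant;2181
    server.6=B1:2888:3888:participant;2181
    version=...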

Expected impact: None

Rollback step: If an exception occurs at any step during the serial capacity expansion, run the reconfig -remove command to remove the new instance from the cluster.

3.2.2 Modify the CK configuration: point the ZK configuration at the new Singapore machines

Modify the CK configuration: change zookeeper-servers from the old machines A1-A5 to the new machines B1-B5 and push it to all CK instances, then use the netstat command to check whether CK has established connections to the new ZK instances. The block before the change looks like this, with an illustrative post-change version shown after it:

<zookeeper-servers>  
      <node index="0">  
          <host>A1</host>  
          <port>2181</port>  
      </node>  
      <node index="1">  
          <host>A2</host>  
          <port>2181</port>  
      </node>  
      <node index="2">  
          <host>A3</host>  
          <port>2181</port>  
      </node>  
      <node index="3">  
          <host>A4</host>  
          <port>2181</port>  
      </node>  
      <node index="4">  
          <host>A5</host>  
          <port>2181</port>  
      </node>  
</zookeeper-servers>  
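
After the change, the same block simply points at the new machines. An illustrative post-change version (index values kept parallel to the block above):

<zookeeper-servers>
      <node index="0">
          <host>B1</host>
          <port>2181</port>
      </node>
      <node index="1">
          <host>B2</host>
          <port>2181</port>
      </node>
      <node index="2">
          <host>B3</host>
          <port>2181</port>
      </node>
      <node index="3">
          <host>B4</host>
          <port>2181</port>
      </node>
      <node index="4">
          <host>B5</host>
          <port>2181</port>
      </node>
</zookeeper-servers>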

Expected impact: None

Roll back: If any exception is found during the check, roll back the CK configuration and deliver it again.

3.2.3 Capacity reduction: Remove old Hong Kong machines from ZK cluster

During the serial capacity reduction, remove the Follower instances first and the Leader instance last:

  • Remove A1: remove old machine A1 from the cluster with reconfig -remove 1. Then check whether the local configuration of every ZK instance was automatically updated, check the Leader/Follower status, and check whether CK reads and writes are abnormal. After confirming there is no problem, take the ZK instance on old machine A1 offline.
  • Remove A2: the operation is the same as above.
  • .
  • Repeat until all the old Hong Kong machines have been removed from the ZK cluster.

Expected impact: same as 3.1.1

Rollback: If an exception occurs at any step during the serial capacity reduction, run the reconfig -add <id>=<ip>:2888:3888;2181 command to add the instance back to the ZK cluster.

4. Summary

Through extensive ZK relocation drills in the offline environment, we were able to identify and resolve the various issues arising during the ZK relocation in time. In the end, the ZK cluster was smoothly migrated from Hong Kong to Singapore with essentially no impact noticed by CK users.
