1. Set up the Zookeeper cluster

To ensure high availability, a ZooKeeper cluster should have an odd number of nodes, and at least three of them. Therefore, a three-node cluster is set up here.
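Why odd and at least three: ZooKeeper stays available only while a majority quorum of floor(n/2)+1 nodes is up, so an even-numbered fourth node would add no extra failure tolerance. A quick sketch of the arithmetic (the quorum formula is standard ZooKeeper behavior, not something specific to this article):

```shell
# A ZooKeeper ensemble needs a majority quorum of floor(n/2)+1 nodes;
# a 3-node cluster therefore tolerates the loss of exactly one node.
n=3
quorum=$(( n / 2 + 1 ))
tolerated=$(( n - quorum ))
echo "nodes=$n quorum=$quorum tolerated_failures=$tolerated"
```

Running the same arithmetic with n=4 still gives tolerated=1, which is why four nodes are no better than three.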

1.1 Download & Decompress

Download the required ZooKeeper version, 3.4.14 in this case. The official download address: archive.apache.org/dist/zookee…

# download
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
# unpack
tar -zxvf zookeeper-3.4.14.tar.gz

1.2 Modifying the Configuration

Make three copies of the ZooKeeper installation package. In each copy, go to the conf directory under the installation directory, copy the zoo_sample.cfg sample configuration to zoo.cfg, and modify the configuration file.

Zookeeper01 configuration:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/01
dataLogDir=/usr/local/zookeeper-cluster/log/01
clientPort=2181

# In server.1, the 1 is the server identifier; it can be any meaningful number identifying the server node.
# This identifier is written to the myid file under the dataDir directory.
# The line format is server.id=host:communication port:election port
server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389

If the nodes are deployed on multiple servers, the communication port and election port of each node in the cluster can be the same, and the IP address should be that of the host where each node resides.
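The multi-server case can be sketched as follows. The IP addresses below are hypothetical placeholders; with one node per host, every node can reuse the same communication and election ports:

```shell
# Generate server.N lines for a hypothetical three-host deployment.
hosts="192.168.0.1 192.168.0.2 192.168.0.3"
id=1
lines=""
for h in $hosts; do
  lines="${lines}server.${id}=${h}:2287:3387
"
  id=$(( id + 1 ))
done
printf '%s' "$lines"
```

Appending the generated lines to each node's zoo.cfg yields the same cluster membership on every host.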

Zookeeper02 configuration; only dataDir, dataLogDir, and clientPort differ from zookeeper01:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/02
dataLogDir=/usr/local/zookeeper-cluster/log/02
clientPort=2182

server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389

Zookeeper03 configuration; only dataDir, dataLogDir, and clientPort differ from zookeeper01 and zookeeper02:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/03
dataLogDir=/usr/local/zookeeper-cluster/log/03
clientPort=2183

server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389

Parameter description:

  • tickTime: the basic time unit used for calculations; for example, a session timeout is N * tickTime;
  • initLimit: for clusters, the time allowed for follower nodes to connect to and sync with the leader on initial connection, expressed as a multiple of tickTime;
  • syncLimit: for clusters, the maximum time allowed between a request and a reply when the leader exchanges messages with followers (heartbeat mechanism), expressed as a multiple of tickTime;
  • dataDir: location where snapshot data is stored;
  • dataLogDir: location where transaction logs are stored;
  • clientPort: port used for client connections; the default is 2181.
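To make the tickTime multiples concrete: by default, ZooKeeper negotiates client session timeouts within the range 2 * tickTime to 20 * tickTime. That default range is standard ZooKeeper behavior assumed here, not stated in this article; a minimal sketch using the tickTime configured above:

```shell
# Default session-timeout bounds derived from tickTime=2000 as configured above.
tickTime=2000
minSessionTimeout=$(( 2 * tickTime ))    # lower bound in milliseconds
maxSessionTimeout=$(( 20 * tickTime ))   # upper bound in milliseconds
echo "minSessionTimeout=${minSessionTimeout}ms maxSessionTimeout=${maxSessionTimeout}ms"
```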

1.3 Identifying a Node

Create a myid file in the data storage directory of each of the three nodes and write the corresponding node ID into it. The ZooKeeper cluster identifies cluster members through the myid file, communicates between nodes through the communication and election ports configured above, and elects a leader node.

Create the storage directories:

# dataDir of zookeeper01
mkdir -vp /usr/local/zookeeper-cluster/data/01
# dataDir of zookeeper02
mkdir -vp /usr/local/zookeeper-cluster/data/02
# dataDir of zookeeper03
mkdir -vp /usr/local/zookeeper-cluster/data/03

Create the myid file on each node and write its node ID:

#server1
echo "1" > /usr/local/zookeeper-cluster/data/01/myid
#server2
echo "2" > /usr/local/zookeeper-cluster/data/02/myid
#server3
echo "3" > /usr/local/zookeeper-cluster/data/03/myid

1.4 Starting a Cluster

Start three nodes respectively:

#Start Node 1
/usr/app/zookeeper-cluster/zookeeper01/bin/zkServer.sh start
#Start Node 2
/usr/app/zookeeper-cluster/zookeeper02/bin/zkServer.sh start
#Start Node 3
/usr/app/zookeeper-cluster/zookeeper03/bin/zkServer.sh start

1.5 Cluster Verification

Run the jps command to view the processes and the zkServer.sh status command to view the status of each node in the cluster. If all three node processes started successfully, two nodes will be followers and one will be the leader.
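The verification commands can be sketched as follows, assuming the three installation copies live under /usr/app/zookeeper-cluster as in the start commands above; adjust the paths to your own layout:

```shell
# Each ZooKeeper node runs as a QuorumPeerMain Java process, visible via jps.
jps
# Query the role (leader or follower) of each node.
/usr/app/zookeeper-cluster/zookeeper01/bin/zkServer.sh status
/usr/app/zookeeper-cluster/zookeeper02/bin/zkServer.sh status
/usr/app/zookeeper-cluster/zookeeper03/bin/zkServer.sh status
```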

2. Set up the Kafka cluster

2.1 Download & Decompress

Download the Kafka installation package from the official site: kafka.apache.org/downloads. Version 2.2.0 is used here; download command:

# download
wget https://www-eu.apache.org/dist/kafka/2.2.0/kafka_2.12-2.2.0.tgz
# unpack
tar -xzf kafka_2.12-2.2.0.tgz

Kafka installation package naming rule: for example, in kafka_2.12-2.2.0.tgz, 2.12 is the Scala version (Kafka is developed in Scala) and 2.2.0 is the Kafka version.
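The naming rule can be checked mechanically. A small sketch using shell parameter expansion (the variable names are illustrative, not part of Kafka):

```shell
# kafka_<scala version>-<kafka version>.tgz
pkg="kafka_2.12-2.2.0.tgz"
scala_version="${pkg#kafka_}"          # strip the "kafka_" prefix
scala_version="${scala_version%%-*}"   # keep everything before the first "-"
kafka_version="${pkg##*-}"             # keep everything after the last "-"
kafka_version="${kafka_version%.tgz}"  # strip the ".tgz" suffix
echo "Scala: $scala_version, Kafka: $kafka_version"
```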

2.2 Copying the Configuration File

Go to the config directory under the decompressed directory and make three copies of server.properties:

cp server.properties server-1.properties
cp server.properties server-2.properties
cp server.properties server-3.properties

2.3 Modifying the Configuration

Modify some configurations in the three configuration files as follows:

server-1.properties:

# broker id; must be unique for each node in the cluster
broker.id=0
# listening address
listeners=PLAINTEXT://hadoop001:9092
# data storage location
log.dirs=/usr/local/kafka-logs/00
# Zookeeper connection address
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183

server-2.properties:

broker.id=1
listeners=PLAINTEXT://hadoop001:9093
log.dirs=/usr/local/kafka-logs/01
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183

server-3.properties:

broker.id=2
listeners=PLAINTEXT://hadoop001:9094
log.dirs=/usr/local/kafka-logs/02
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183

It is important to note that log.dirs refers to the location of the data log, specifically, the partition data storage location, not the application run log location. The location of the program run logs is configured using log4j.properties in the same directory.
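For instance, once the test topic from section 2.5 exists, its partition data lands under log.dirs in a directory named after the topic and partition. The `<topic>-<partition>` naming is standard Kafka layout; the path below reuses the log.dirs value configured above:

```shell
# Kafka stores each partition's data under log.dirs in a <topic>-<partition> directory.
log_dir="/usr/local/kafka-logs/00"
topic="my-replicated-topic"
partition=0
partition_dir="${log_dir}/${topic}-${partition}"
echo "$partition_dir"
```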

2.4 Starting a Cluster

Start the three Kafka nodes with their respective configuration files. After starting, use jps to view the processes; at this point there should be three ZooKeeper processes and three Kafka processes.

bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
bin/kafka-server-start.sh config/server-3.properties

2.5 Creating a Test Topic

Create a test topic:

bin/kafka-topics.sh --create --bootstrap-server hadoop001:9092 \
					--replication-factor 3 \
					--partitions 1 --topic my-replicated-topic

You can run the following command to view information about the created topic:

bin/kafka-topics.sh --describe --bootstrap-server hadoop001:9092 --topic my-replicated-topic

Partition 0 has three replicas: 0, 1, and 2. All three replicas are available and in the in-sync replica (ISR) list, and replica 1 is the leader. At this point, the cluster has been set up successfully.

See the GitHub open-source project Getting Started with Big Data for more articles in the big data series.