[TOC]
Section 1: Kafka Cluster Setup
Before we begin
If you're a developer with no interest in building Kafka clusters yourself, you can skip this chapter and jump straight to tomorrow's.
If you think it wouldn't hurt to know a little more, read on.
Mind you, this chapter is full of figures.
Kafka cluster setup
Overview
Setting up a Kafka cluster is tedious. It is only downloading files and modifying configurations, but there are a lot of them.
The base environment requires three ZooKeeper servers and three Kafka servers.
Operation process
Look at the diagram.
It looks rather long, so instead of doing all that by hand, I'm going to use Docker to simplify the process a little.
Quick Kafka cluster setup
Install Docker
Check the kernel, then install:

```shell
uname -a
yum -y install docker
service docker start
# or:
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
```

Configure a registry mirror

```shell
vi /etc/docker/daemon.json
```

```json
{
  "registry-mirrors": ["https://uzoia35z.mirror.aliyuncs.com"]
}
```
ZooKeeper cluster
docker-compose should already be installed.
Create a Docker network:

```shell
docker network create --driver bridge --subnet 172.29.0.0/25 \
  --gateway 172.29.0.1 elk_zoo
docker network ls
```
Yml script
The configuration is too long, so only the structure goes here; the source file will be posted on the blog later.
Every listed item needs to be configured. Pay particular attention to:
- ports: # exposed ports
- volumes: # mounted volumes
- environment: # environment variables
- networks: # two parts, the fixed IP and the shared network
Check your work against the configuration file.
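As a rough sketch of that structure (this is not the actual file — the image tag, data path, and fixed IP are assumptions), one ZooKeeper service entry might look like:

```yaml
version: '2'
services:
  zoo1:
    image: zookeeper              # assumed image; the text does not pin a tag
    restart: always
    hostname: zoo1
    container_name: zoo1
    ports:
      - "2181:2181"               # client port
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /root/zookeeper/zoo1/data:/data   # assumed mount path
    networks:
      default:
        ipv4_address: 172.29.0.11         # inside the elk_zoo subnet created above
networks:
  default:
    external:
      name: elk_zoo
```

zoo2 and zoo3 would repeat the same pattern with their own ZOO_MY_ID values and addresses.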
```shell
docker-compose up -d
```
Validation
ZooInspector

```shell
cd zookeeper/src/contrib/zooinspector/
# it failed to open for me; still to be verified
```

Kafka cluster
Pull the images

```shell
docker pull wurstmeister/kafka
docker pull sheepkiller/kafka-manager
```
Yml script
The configuration is too long, so only the structure goes here; the source file is attached at the end of this article.
Every listed item needs to be configured. Pay particular attention to:
- ports: # exposed ports
- volumes: # mounted volumes
- environment: # environment variables
- external_links: # links to containers outside this compose file
- networks: # two parts, the fixed IP and the shared network
Check your work against the configuration file.
```shell
docker-compose up -d
```
Validation
Use the kafka-manager admin page: the local IP plus port 9000.
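As a quick reachability check (hypothetical host and port — use whichever port you actually mapped; the compose file at the end of this article maps host port 9002 to the container's 9000):

```shell
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9000
# any HTTP status code coming back means kafka-manager is answering
```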
Wrap it up
True to my faith in the god of laziness, Docker got the cluster built in no time. A round of applause.
Stay tuned for command-line practice tomorrow.
Today's three diagrams are fairly complicated and don't need to be memorized; just work out the flow against the configuration file.
Section 2: Cluster Management Tools
Let's start with a question. Yesterday I finished building the Kafka cluster and installed the management tool, as shown in the screenshot.
Can anyone see, or guess, what is wrong in the cluster? If you're confident, add me as a friend and message me privately; if your reasoning is right, I'll even send a small red envelope as encouragement.
Cluster Management Tool
Overview
kafka-manager is a common Kafka cluster management tool. There are many similar tools, including in-house ones companies build themselves.
Operation process
After the cluster is configured, log in to kafka-manager from a browser and add the cluster to manage it.
Once added, it looks like this:
View broker information.
Click a topic to view it.
Click through again to see the settings of a single topic.
Other
Preferred Replica Election, Reassign Partitions, Consumers
Replica elections, partitions, and consumers will be discussed in more detail later.
Because the cluster has only just been built, much of the information is not visible yet; the next few articles will show it alongside command-line operations.
Cluster Issues
The following are some common faults and troubleshooting ideas:
- Works on a single machine, but the cluster fails to send messages
  The host name cannot be set to 127.0.0.1.
- Cannot consume messages after an upgrade
  Check the default topic:
  __consumer_offsets
- Slow response
  Use the performance test script:
  kafka-producer-perf-test.sh
  Analyze the results and generate a report, then check the jstack output or go through the source code.
- The logs keep reporting exceptions
  Check the Kafka logs and GC logs, the ZooKeeper logs and GC logs, and the node memory monitoring.
  In the end, taking the faulty node offline and re-adding it solved the problem.
- Docker restarts endlessly when mounting data volumes
  The logs showed missing permissions; configure:
  privileged: true
- Error in Docker: Kafka address already in use
  unset JMX_PORT; bin/kafka-topics.sh ..
  A hackier alternative is to remove the JMX_PORT variable defined in the kafka-env.sh script.
---
Section 3: Operating the Cluster from the Command Line
Normally, Kafka is used from code.
Occasionally, though, you want to check whether Kafka is at fault or your code is.
Or, when you don't have the time or the environment to write code, you can simply use the command line.
docker

```shell
docker inspect zookeeper
```
zookeeper
Cluster view
Log in to the cluster and check its status:

```shell
docker exec -it zoo1 bash
zkServer.sh status
# ZooKeeper JMX enabled by default
# Using config: /conf/zoo.cfg
# Mode: leader
# note: "Mode: standalone" means the node is running standalone, not clustered
```
The configuration file
If the state is standalone, check the following:

```shell
# conf/zoo.cfg must list every server:
# server.1=zoo1:2888:3888
# and the container environment should carry:
ZOO_MY_ID=3 \
ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888"
```
Start the ZK cluster

```shell
./zkServer.sh start
jps   # QuorumPeerMain
```
Kafka view

```shell
docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids
```
topic
Create a topic
Note: the following commands are all executed from Kafka's directory.

```shell
cd /opt/kafka_2.12-2.3.0/
unset JMX_PORT; bin/kafka-topics.sh --create --zookeeper zoo1:2181 \
  --replication-factor 1 --partitions 1 --topic test1
# optional: --config delete.retention.ms=21600000
```
Create a cluster topic
Replication factor 1, three partitions, named test.

```shell
unset JMX_PORT; bin/kafka-topics.sh --create --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --replication-factor 1 --partitions 3 --topic test
```
View topics
List and describe:

```shell
unset JMX_PORT; bin/kafka-topics.sh --list --zookeeper zoo1:2181,zoo2:2181,zoo3:2181
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic __consumer_offsets
```
Delete a topic
By default this only marks the topic for deletion.

```shell
unset JMX_PORT; bin/kafka-topics.sh --delete --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test
# real deletion requires delete.topic.enable=true
```
producers
Send a message

```shell
cat config/server.properties | grep listeners   # find the listening address
unset JMX_PORT; bin/kafka-console-producer.sh --broker-list broker1:9091 --topic test2
# once it is running, type to send messages
```
Throughput test

```shell
unset JMX_PORT; bin/kafka-producer-perf-test.sh --num-records 100000 --topic test \
  --producer-props bootstrap.servers=broker1:9091,broker2:9092,broker3:9093 \
  --throughput 5000 --record-size 102400 --print-metrics
# 3501 records sent, 699.2 records/sec (68.28 MB/sec), 413.5 ms avg latency, 1019.0 ms max latency.
```
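Note that --record-size 102400 writes 100 KB per record, so 100,000 records is roughly 10 GB — which is what exhausted my disk (see the next section). A gentler sketch, with values that are only suggestions:

```shell
unset JMX_PORT
bin/kafka-producer-perf-test.sh --num-records 10000 --topic test \
  --producer-props bootstrap.servers=broker1:9091,broker2:9092,broker3:9093 \
  --throughput 1000 --record-size 1024 --print-metrics
```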
consumers
Receive messages

```shell
unset JMX_PORT; bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test2
# receives in real time; add --from-beginning to read from the start
```
List consumers

```shell
unset JMX_PORT; bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 --list
# KafkaManagerOffsetCache
# console-consumer-26390
```
View partition messages
View the latest messages received by a given partition:

```shell
unset JMX_PORT; bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test2 --offset latest --partition 0
```
Throughput test

```shell
bin/kafka-consumer-perf-test.sh --topic test --messages 100000 --num-fetch-threads 10 --threads 10 --broker-list broker1:9091,broker2:9092,broker3:9093 --group console-consumer-26390
```
Fault tolerance

```shell
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2
docker stop broker3   # kill one broker
# describe again: partitions led by the stopped broker show Leader: -1
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2
```
All the commands were typed out by hand to make sure they work.
The commands involved are quite long; copy each code box in one go and don't worry about the line wrapping.
Section 4: Kafka Terminology
There was a glitch yesterday during the command-line work on the Kafka cluster.
The cluster hung while running the producer throughput test.
The kafka-producer-perf-test.sh command filled all the disk space in a very short time.
Today we'll cover some Kafka basics. This is for newcomers; old hands, please skip ahead.
Introduction
- Kafka is written in Scala.
- Official homepage: kafka.apache.org.
- It is defined as a distributed real-time stream processing platform.
- Its performance depends heavily on disk performance.
- Messages are stateless and must be deleted periodically or once a volume threshold is reached.
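That periodic or quantitative deletion is driven by per-topic retention settings; as a sketch (the topic name and the six-hour value are just examples, echoing the delete.retention.ms value used earlier):

```shell
unset JMX_PORT
bin/kafka-configs.sh --zookeeper zoo1:2181 --alter --entity-type topics \
  --entity-name test --add-config retention.ms=21600000
```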
Uses
Messaging system
Nothing much to say here: famous message-oriented middleware.
Application monitoring
Mainly used together with ELK for monitoring.
User behavior tracking
Recording massive amounts of user information and handing it to big data software such as Hadoop, Spark, and Storm for processing.
Stream processing
Collecting stream data.
This one is a gap of mine: yesterday's command-line run hit a configuration file error, to be filled in later.
Persistent log
Mainly exploits Kafka's performance characteristics; combined with Flume + HDFS it is quite easy to use.
Performance
Kafka is said to handle tens of millions of messages; we don't have that kind of volume, so I won't comment, but millions I can vouch for.
The good performance comes from heavy use of the operating system's page cache without direct involvement in physical I/O, plus append-only writes, which avoid the disk-performance nightmare caused by random writes.
In addition, zero-copy technology (sendfile being the representative) completes data copying in kernel space, bypassing the user-space buffer.
Saved data
Kafka saves its information in several ZooKeeper directories. To view them:

```shell
docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids
...
```
| Directory name | Use |
| --- | --- |
| brokers | Stores cluster and topic information |
| controller | Stores node election information |
| admin | Stores the output of script commands |
| isr_change_notification | Records ISR changes |
| config | Records the cluster ID and version number |
| controller_epoch | Records the controller version number, to avoid tombstone issues |
Terms used

| Name | Use |
| --- | --- |
| broker | A Kafka server |
| cluster | A working unit composed of multiple brokers |
| message | The most basic data unit |
| batch | A group of messages |
| replica | A redundant copy of messages |
| message schema | The way messages are serialized |
| commit | Updating the current position within a partition |
| topic | Analogous to a MySQL table; `--topic` on the command line |
| partition | `--partitions` on the command line |
| producer | Responsible for message input |
| consumer | Responsible for message output |
Supplement:
- Message location: a message is uniquely identified by topic, partition, and offset.
- Replicas are divided into leader replicas and follower replicas.
  Followers replicate data from the leader; when the leader fails, a new leader is elected from among the followers.
- A topic can have multiple partitions, each holding its messages in multiple segments.
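Those three coordinates can be used directly from the console consumer to fetch one specific message — a sketch reusing the test2 topic from Section 3 (the offset value is a placeholder; --max-messages 1 stops after a single record):

```shell
unset JMX_PORT
bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 \
  --topic test2 --partition 0 --offset 5 --max-messages 1
```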
The configuration files
There are four configuration files:

| Use | File name |
| --- | --- |
| Broker configuration | server.properties |
| ZooKeeper configuration | zookeeper.properties |
| Consumer configuration | consumer.properties |
| Producer configuration | producer.properties |
The basics are the basics. Only after learning them and coming back will Yann be able to look down on the content above. So, keep going.
Section 5: How a Kafka Cluster Works
Before we begin
Yesterday I showed my public account to a big shot and got criticized: the formatting was too messy to read on. So I went on a formatting spree, posting dozens of previews in a row, and nearly passed out.
So today's content is a little thin; forgive me.
Cluster theory
Here is a brief description of Kafka's clustering principle. As explained earlier, our Kafka cluster consists of three ZooKeeper servers and three Kafka servers.
The relationship is similar to the following diagram:
The exact relationship doesn't matter, so long as you see ZooKeeper as the database and Kafka as the instances. Each side is strong on its own (three nodes each), and together they are stronger still.
So why does Kafka cling to ZooKeeper's thigh? ZooKeeper exists to solve the problem of distributed consistency: its three nodes sit on three servers, yet the data stays consistent. Many systems maintain consistency themselves, but Kafka hands it off to an external service.
However, ZooKeeper alone is not enough; Kafka itself also has to put in considerable effort.
Kafka's clustering relies on data replication and leader election to ensure consistency.
For data replication, there are three replicas, but only the leader serves requests. Followers watch the leader's movements and pull new data over to themselves.
Leader election means that if the leader fails, a new leader is chosen from the followers whose data is closest to the leader's.
Each Kafka instance registers itself with the ZooKeeper service as a session. When a Kafka instance fails, its session with ZooKeeper lapses.
It's like clocking in at work: if someone hasn't clocked in for a while, you know that leader has gone cold.
One more term
ISR: the leader node keeps track of the list of replicas that are in sync with it. This list is called the ISR (In-Sync Replicas).
The working process
Now that you know how the cluster is organized, look at the workflow.
The application first connects to the ZooKeeper cluster to obtain some information about the Kafka cluster; most importantly, who the leader is. After that, things are simple:
- The application sends the message to the leader
- The leader writes the message to a local file
- The followers notice the new messages and synchronize them
- Once a follower has synchronized, it reports an ACK to the leader
- After the leader has collected ACKs from all replicas, it notifies the application
That's the general flow, but there are details, and they can be fine-tuned with parameters.
For example, the leader does not write a message to disk the moment it arrives; there is a threshold of time or message count. A partition physically corresponds to a folder, and multiple replicas of one partition are generally not placed on the same physical machine. Whether to acknowledge the application first or ensure synchronization first, and which partition a message is written to, all depend on parameters.
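The "acknowledge first or synchronize first" choice is the producer's acks setting; with the console producer it can be sketched roughly like this (verify the flag against your Kafka version):

```shell
unset JMX_PORT
bin/kafka-console-producer.sh --broker-list broker1:9091 --topic test2 \
  --request-required-acks -1
# -1 (all): wait for every in-sync replica; 1: leader only; 0: don't wait at all
```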
One important feature of Kafka is that it guarantees message order within a single partition. The reason is that Kafka sets aside dedicated disk space and writes data to it sequentially. A partition contains multiple groups of segment files, and when the conditions are met, a new segment file is written to disk.
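You can see those per-partition folders and segment files on a broker's mounted data volume; the log directory name below is an assumption based on the wurstmeister image's defaults, so adjust to your setup:

```shell
docker exec -it broker1 bash
ls /kafka/kafka-logs-broker1/test2-0/
# segment data files (*.log) and their offset indexes (*.index) live here
```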
Consumption mechanism
Finally, consumers are applications too. The application actively pulls messages from Kafka, and of course it also pulls from the leader. Thanks to Kafka's strong performance, multiple consumers can be attached at the same time, and consumers can form consumer groups. Consumers in the same consumer group consume data from different partitions of the same topic.
When there are plenty of partitions, one consumer may consume several of them; but if consumers outnumber partitions, some consumers will do nothing and sit idle on standby. So don't let the number of consumers exceed the number of the topic's partitions.
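Which consumer in a group owns which partition can be checked with the group tool from Section 3 — a sketch reusing the group name listed there:

```shell
unset JMX_PORT
bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 \
  --describe --group console-consumer-26390
# output includes TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, CONSUMER-ID
```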
Ownership
How messages are handled when a client crashes:
- The consumer group shares the workload
- A rebalance transfers ownership
- Consumers send heartbeats to the brokers to maintain ownership
- A client pulls data and records its consumption position
Log compaction
- Operates per partition of a topic
- Compaction does not reorder messages
- A message's offset does not change
- Offsets remain in order
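Compaction is switched on per topic via cleanup.policy; a sketch (the topic name is a placeholder):

```shell
unset JMX_PORT
bin/kafka-topics.sh --create --zookeeper zoo1:2181 --replication-factor 1 \
  --partitions 1 --topic test-compact --config cleanup.policy=compact
```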
---
Conclusion
I'm sorry it's a little anticlimactic. The first two sections are written in great detail, the rest only cursorily. Kafka is, after all, middleware, not a platform; going any deeper would mean writing a production architecture or describing a business process, which strays from the original intent. After all, this was supposed to be simple Kafka.
Let me press pause here; I'll add more when another idea comes along — probably when I get to ELK.
Thanks for reading.
Kafka configuration file attached:
```yaml
# docker network create --driver bridge --subnet 172.69.0.0/25 --gateway 172.69.0.1 kafka_zoo
version: '2'
services:
  broker1:
    image: wurstmeister/kafka
    restart: always
    hostname: broker1
    container_name: broker1
    ports:
      - "9091:9091"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_HOST_NAME: broker1
      KAFKA_ADVERTISED_PORT: 9091
      KAFKA_HOST_NAME: broker1
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker1:9091
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9091
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker1/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.11
  broker2:
    image: wurstmeister/kafka
    restart: always
    hostname: broker2
    container_name: broker2
    ports:
      - "9092:9092"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_HOST_NAME: broker2
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_HOST_NAME: broker2
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker2:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker2/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.12
  broker3:
    image: wurstmeister/kafka
    restart: always
    hostname: broker3
    container_name: broker3
    ports:
      - "9093:9093"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ADVERTISED_HOST_NAME: broker3
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_HOST_NAME: broker3
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker3:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9093
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker3/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.13
  kafka-manager:
    image: sheepkiller/kafka-manager
    restart: always
    container_name: kafka-manager
    hostname: kafka-manager
    ports:
      - "9002:9000"
    links:            # containers created by this compose file
      - broker1
      - broker2
      - broker3
    external_links:   # containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    environment:
      ZK_HOSTS: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_BROKERS: broker1:9091,broker2:9092,broker3:9093
      APPLICATION_SECRET: letmein
      KM_ARGS: -Djava.net.preferIPv4Stack=true
    networks:
      default:
        ipv4_address: 172.69.0.10
networks:
  default:
    external:
      name: kafka_zoo
# Create the data directories first:
# mkdir -p /root/kafka/broker1
# mkdir -p /root/kafka/broker2
# mkdir -p /root/kafka/broker3
```