[TOC]
Section 1: Kafka Cluster Setup
Before we begin
If you're a developer with no interest in building Kafka clusters yourself, you can skip this chapter and jump straight to tomorrow's.
If you think it wouldn't hurt to know a little more, read on.
Mind you, this chapter is full of figures.
Kafka cluster setup
Overview
Setting up a Kafka cluster is tedious. It is only downloading files and modifying configurations, but there are a lot of them.
The base environment requires three ZooKeeper servers and three Kafka servers.
Operation process
Look at the diagram.
It looks rather long, so instead of doing all that by hand, I'm going to use Docker to simplify the process a little.
Quick Kafka cluster setup
Install Docker
Check the kernel, then install:

```shell
uname -a
yum -y install docker
service docker start
# or:
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
```

Configure a registry mirror

```shell
vi /etc/docker/daemon.json
```

```json
{
  "registry-mirrors": ["https://uzoia35z.mirror.aliyuncs.com"]
}
```
ZooKeeper cluster
docker-compose should already be installed.
Create a Docker network:

```shell
docker network create --driver bridge --subnet 172.29.0.0/25 \
  --gateway 172.29.0.1 elk_zoo
docker network ls
```
Yml script
The configuration is too long, so only the structure goes here; the source file will be posted on the blog later.
Every listed item needs to be configured. Pay particular attention to:
- ports: # exposed ports
- volumes: # mounted volumes
- environment: # environment variables
- networks: # two parts, the fixed IP and the shared network
Check your work against the configuration file.
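As a rough sketch of that structure (this is not the actual file — the image tag, data path, and fixed IP are assumptions), one ZooKeeper service entry might look like:

```yaml
version: '2'
services:
  zoo1:
    image: zookeeper              # assumed image; the text does not pin a tag
    restart: always
    hostname: zoo1
    container_name: zoo1
    ports:
      - "2181:2181"               # client port
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /root/zookeeper/zoo1/data:/data   # assumed mount path
    networks:
      default:
        ipv4_address: 172.29.0.11         # inside the elk_zoo subnet created above
networks:
  default:
    external:
      name: elk_zoo
```

zoo2 and zoo3 would repeat the same pattern with their own ZOO_MY_ID values and addresses.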
```shell
docker-compose up -d
```
Validation
ZooInspector

```shell
cd zookeeper/src/contrib/zooinspector/
# it failed to open for me; still to be verified
```

Kafka cluster
Pull the images

```shell
docker pull wurstmeister/kafka
docker pull sheepkiller/kafka-manager
```
Yml script
The configuration is too long, so only the structure goes here; the source file is attached at the end of this article.
Every listed item needs to be configured. Pay particular attention to:
- ports: # exposed ports
- volumes: # mounted volumes
- environment: # environment variables
- external_links: # links to containers outside this compose file
- networks: # two parts, the fixed IP and the shared network
Check your work against the configuration file.
```shell
docker-compose up -d
```
Validation
Use the kafka-manager admin page: the local IP plus port 9000.
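As a quick reachability check (hypothetical host and port — use whichever port you actually mapped; the compose file at the end of this article maps host port 9002 to the container's 9000):

```shell
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9000
# any HTTP status code coming back means kafka-manager is answering
```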
Wrap it up
True to my faith in the god of laziness, Docker got the cluster built in no time. A round of applause.
Stay tuned for command-line practice tomorrow.
Today's three diagrams are fairly complicated and don't need to be memorized; just work out the flow against the configuration file.
Section 2: Cluster Management Tools
Let's start with a question. Yesterday I finished building the Kafka cluster and installed the management tool, as shown in the screenshot.
Can anyone see, or guess, what is wrong in the cluster? If you're confident, add me as a friend and message me privately; if your reasoning is right, I'll even send a small red envelope as encouragement.
Cluster Management Tool
Overview
kafka-manager is a common Kafka cluster management tool. There are many similar tools, including in-house ones companies build themselves.
Operation process
After the cluster is configured, log in to kafka-manager from a browser and add the cluster to manage it.
Once added, it looks like this:
View broker information.
Click a topic to view it.
Click through again to see the settings of a single topic.
Other
Preferred Replica Election, Reassign Partitions, Consumers
Replica elections, partitions, and consumers will be discussed in more detail later.
Because the cluster has only just been built, much of the information is not visible yet; the next few articles will show it alongside command-line operations.
Cluster Issues
The following are some common faults and troubleshooting ideas:
- Works on a single machine, but the cluster fails to send messages
  The host name cannot be set to 127.0.0.1.
- Cannot consume messages after an upgrade
  Check the default topic:
  __consumer_offsets
- Slow response
  Use the performance test script:
  kafka-producer-perf-test.sh
  Analyze the results and generate a report, then check the jstack output or go through the source code.
- The logs keep reporting exceptions
  Check the Kafka logs and GC logs, the ZooKeeper logs and GC logs, and the node memory monitoring.
  In the end, taking the faulty node offline and re-adding it solved the problem.
- Docker restarts endlessly when mounting data volumes
  The logs showed missing permissions; configure:
  privileged: true
- Error in Docker: Kafka address already in use
  unset JMX_PORT; bin/kafka-topics.sh ..
  A hackier alternative is to remove the JMX_PORT variable defined in the kafka-env.sh script.
---
Section 3: Operating the Cluster from the Command Line
Normally, Kafka is used from code.
Occasionally, though, you want to check whether Kafka is at fault or your code is.
Or, when you don't have the time or the environment to write code, you can simply use the command line.
docker

```shell
docker inspect zookeeper
```
zookeeper
Cluster view
Log in to the cluster and check its status:

```shell
docker exec -it zoo1 bash
zkServer.sh status
# ZooKeeper JMX enabled by default
# Using config: /conf/zoo.cfg
# Mode: leader
# note: "Mode: standalone" means the node is running standalone, not clustered
```
The configuration file
If the state is standalone, check the following:

```shell
# conf/zoo.cfg must list every server:
# server.1=zoo1:2888:3888
# and the container environment should carry:
ZOO_MY_ID=3 \
ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888"
```
Start the ZK cluster

```shell
./zkServer.sh start
jps   # QuorumPeerMain
```
Kafka view

```shell
docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids
```
topic
Create a topic
Note: the following commands are all executed from Kafka's directory.

```shell
cd /opt/kafka_2.12-2.3.0/
unset JMX_PORT; bin/kafka-topics.sh --create --zookeeper zoo1:2181 \
  --replication-factor 1 --partitions 1 --topic test1
# optional: --config delete.retention.ms=21600000
```
Create a cluster topic
Replication factor 1, three partitions, named test.

```shell
unset JMX_PORT; bin/kafka-topics.sh --create --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --replication-factor 1 --partitions 3 --topic test
```
View topics
List and describe:

```shell
unset JMX_PORT; bin/kafka-topics.sh --list --zookeeper zoo1:2181,zoo2:2181,zoo3:2181
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic __consumer_offsets
```
Delete a topic
By default this only marks the topic for deletion.

```shell
unset JMX_PORT; bin/kafka-topics.sh --delete --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test
# real deletion requires delete.topic.enable=true
```
producers
Send a message

```shell
cat config/server.properties | grep listeners   # find the listening address
unset JMX_PORT; bin/kafka-console-producer.sh --broker-list broker1:9091 --topic test2
# once it is running, type to send messages
```
Throughput test

```shell
unset JMX_PORT; bin/kafka-producer-perf-test.sh --num-records 100000 --topic test \
  --producer-props bootstrap.servers=broker1:9091,broker2:9092,broker3:9093 \
  --throughput 5000 --record-size 102400 --print-metrics
# 3501 records sent, 699.2 records/sec (68.28 MB/sec), 413.5 ms avg latency, 1019.0 ms max latency.
```
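Note that --record-size 102400 writes 100 KB per record, so 100,000 records is roughly 10 GB — which is what exhausted my disk (see the next section). A gentler sketch, with values that are only suggestions:

```shell
unset JMX_PORT
bin/kafka-producer-perf-test.sh --num-records 10000 --topic test \
  --producer-props bootstrap.servers=broker1:9091,broker2:9092,broker3:9093 \
  --throughput 1000 --record-size 1024 --print-metrics
```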
consumers
Receive messages

```shell
unset JMX_PORT; bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test2
# receives in real time; add --from-beginning to read from the start
```
List consumers

```shell
unset JMX_PORT; bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 --list
# KafkaManagerOffsetCache
# console-consumer-26390
```
View partition messages
View the latest messages received by a given partition:

```shell
unset JMX_PORT; bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 --topic test2 --offset latest --partition 0
```
Throughput test

```shell
bin/kafka-consumer-perf-test.sh --topic test --messages 100000 --num-fetch-threads 10 --threads 10 --broker-list broker1:9091,broker2:9092,broker3:9093 --group console-consumer-26390
```
Fault tolerance

```shell
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2
docker stop broker3   # kill one broker
# describe again: partitions led by the stopped broker show Leader: -1
unset JMX_PORT; bin/kafka-topics.sh --describe --zookeeper zoo1:2181,zoo2:2181,zoo3:2181 --topic test2
```
All the commands were typed out by hand to make sure they work.
The commands involved are quite long; copy each code box in one go and don't worry about the line wrapping.
Section 4: Kafka Terminology
There was a glitch yesterday during the command-line work on the Kafka cluster.
The cluster hung while running the producer throughput test.
The kafka-producer-perf-test.sh command filled all the disk space in a very short time.
Today we'll cover some Kafka basics. This is for newcomers; old hands, please skip ahead.
Introduction
- Kafka is written in Scala.
- Official homepage: kafka.apache.org.
- It is defined as a distributed real-time stream processing platform.
- Its performance depends heavily on disk performance.
- Messages are stateless and must be deleted periodically or once a volume threshold is reached.
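That periodic or quantitative deletion is driven by per-topic retention settings; as a sketch (the topic name and the six-hour value are just examples, echoing the delete.retention.ms value used earlier):

```shell
unset JMX_PORT
bin/kafka-configs.sh --zookeeper zoo1:2181 --alter --entity-type topics \
  --entity-name test --add-config retention.ms=21600000
```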
Uses
Messaging system
Nothing much to say here: famous message-oriented middleware.
Application monitoring
Mainly used together with ELK for monitoring.
User behavior tracking
Recording massive amounts of user information and handing it to big data software such as Hadoop, Spark, and Storm for processing.
Stream processing
Collecting stream data.
This one is a gap of mine: yesterday's command-line run hit a configuration file error, to be filled in later.
Persistent log
Mainly exploits Kafka's performance characteristics; combined with Flume + HDFS it is quite easy to use.
Performance
Kafka is said to handle tens of millions of messages; we don't have that kind of volume, so I won't comment, but millions I can vouch for.
The good performance comes from heavy use of the operating system's page cache without direct involvement in physical I/O, plus append-only writes, which avoid the disk-performance nightmare caused by random writes.
In addition, zero-copy technology (sendfile being the representative) completes data copying in kernel space, bypassing the user-space buffer.
Saved data
Kafka saves its information in several ZooKeeper directories. To view them:

```shell
docker exec -it zoo1 bash
zkCli.sh
ls /
ls /brokers/ids
...
```
| Directory name | Use |
| --- | --- |
| brokers | Stores cluster and topic information |
| controller | Stores node election information |
| admin | Stores the output of script commands |
| isr_change_notification | Records ISR changes |
| config | Records the cluster ID and version number |
| controller_epoch | Records the controller version number, to avoid tombstone issues |
Terms used

| Name | Use |
| --- | --- |
| broker | A Kafka server |
| cluster | A working unit composed of multiple brokers |
| message | The most basic data unit |
| batch | A group of messages |
| replica | A redundant copy of messages |
| message schema | The way messages are serialized |
| commit | Updating the current position within a partition |
| topic | Analogous to a MySQL table; `--topic` on the command line |
| partition | `--partitions` on the command line |
| producer | Responsible for message input |
| consumer | Responsible for message output |
Supplement:
- Message location: a message is uniquely identified by topic, partition, and offset.
- Replicas are divided into leader replicas and follower replicas.
  Followers replicate data from the leader; when the leader fails, a new leader is elected from among the followers.
- A topic can have multiple partitions, each holding its messages in multiple segments.
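Those three coordinates can be used directly from the console consumer to fetch one specific message — a sketch reusing the test2 topic from Section 3 (the offset value is a placeholder; --max-messages 1 stops after a single record):

```shell
unset JMX_PORT
bin/kafka-console-consumer.sh --bootstrap-server broker1:9091 \
  --topic test2 --partition 0 --offset 5 --max-messages 1
```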
The configuration files
There are four configuration files:

| Use | File name |
| --- | --- |
| Broker configuration | server.properties |
| ZooKeeper configuration | zookeeper.properties |
| Consumer configuration | consumer.properties |
| Producer configuration | producer.properties |
The basics are the basics. Only after learning them and coming back will Yann be able to look down on the content above. So, keep going.
Section 5: How a Kafka Cluster Works
Before we begin
Yesterday I showed my public account to a big shot and got criticized: the formatting was too messy to read on. So I went on a formatting spree, posting dozens of previews in a row, and nearly passed out.
So today's content is a little thin; forgive me.
Cluster theory
Here is a brief description of Kafka's clustering principle. As explained earlier, our Kafka cluster consists of three ZooKeeper servers and three Kafka servers.
The relationship is similar to the following diagram:
The exact relationship doesn't matter, so long as you see ZooKeeper as the database and Kafka as the instances. Each side is strong on its own (three nodes each), and together they are stronger still.
So why does Kafka cling to ZooKeeper's thigh? ZooKeeper exists to solve the problem of distributed consistency: its three nodes sit on three servers, yet the data stays consistent. Many systems maintain consistency themselves, but Kafka hands it off to an external service.
However, ZooKeeper alone is not enough; Kafka itself also has to put in considerable effort.
Kafka's clustering relies on data replication and leader election to ensure consistency.
For data replication, there are three replicas, but only the leader serves requests. Followers watch the leader's movements and pull new data over to themselves.
Leader election means that if the leader fails, a new leader is chosen from the followers whose data is closest to the leader's.
Each Kafka instance registers itself with the ZooKeeper service as a session. When a Kafka instance fails, its session with ZooKeeper lapses.
It's like clocking in at work: if someone hasn't clocked in for a while, you know that leader has gone cold.
One more term
ISR: the leader node keeps track of the list of replicas that are in sync with it. This list is called the ISR (In-Sync Replicas).
The working process
Now that you know how the cluster is organized, look at the workflow.
The application first connects to the ZooKeeper cluster to obtain some information about the Kafka cluster; most importantly, who the leader is. After that, things are simple:
- The application sends the message to the leader
- The leader writes the message to a local file
- The followers notice the new messages and synchronize them
- Once a follower has synchronized, it reports an ACK to the leader
- After the leader has collected ACKs from all replicas, it notifies the application
That's the general flow, but there are details, and they can be fine-tuned with parameters.
For example, the leader does not write a message to disk the moment it arrives; there is a threshold of time or message count. A partition physically corresponds to a folder, and multiple replicas of one partition are generally not placed on the same physical machine. Whether to acknowledge the application first or ensure synchronization first, and which partition a message is written to, all depend on parameters.
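The "acknowledge first or synchronize first" choice is the producer's acks setting; with the console producer it can be sketched roughly like this (verify the flag against your Kafka version):

```shell
unset JMX_PORT
bin/kafka-console-producer.sh --broker-list broker1:9091 --topic test2 \
  --request-required-acks -1
# -1 (all): wait for every in-sync replica; 1: leader only; 0: don't wait at all
```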
One important feature of Kafka is that it guarantees message order within a single partition. The reason is that Kafka sets aside dedicated disk space and writes data to it sequentially. A partition contains multiple groups of segment files, and when the conditions are met, a new segment file is written to disk.
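You can see those per-partition folders and segment files on a broker's mounted data volume; the log directory name below is an assumption based on the wurstmeister image's defaults, so adjust to your setup:

```shell
docker exec -it broker1 bash
ls /kafka/kafka-logs-broker1/test2-0/
# segment data files (*.log) and their offset indexes (*.index) live here
```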
Consumption mechanism
Finally, consumers are applications too. The application actively pulls messages from Kafka, and of course it also pulls from the leader. Thanks to Kafka's strong performance, multiple consumers can be attached at the same time, and consumers can form consumer groups. Consumers in the same consumer group consume data from different partitions of the same topic.
When there are plenty of partitions, one consumer may consume several of them; but if consumers outnumber partitions, some consumers will do nothing and sit idle on standby. So don't let the number of consumers exceed the number of the topic's partitions.
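Which consumer in a group owns which partition can be checked with the group tool from Section 3 — a sketch reusing the group name listed there:

```shell
unset JMX_PORT
bin/kafka-consumer-groups.sh --bootstrap-server broker1:9091 \
  --describe --group console-consumer-26390
# output includes TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, CONSUMER-ID
```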
Ownership
How messages are handled when a client crashes:
- The consumer group shares the workload
- A rebalance transfers ownership
- Consumers send heartbeats to the brokers to maintain ownership
- A client pulls data and records its consumption position
Log compaction
- Operates per partition of a topic
- Compaction does not reorder messages
- A message's offset does not change
- Offsets remain in order
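Compaction is switched on per topic via cleanup.policy; a sketch (the topic name is a placeholder):

```shell
unset JMX_PORT
bin/kafka-topics.sh --create --zookeeper zoo1:2181 --replication-factor 1 \
  --partitions 1 --topic test-compact --config cleanup.policy=compact
```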
---
Conclusion
I'm sorry it's a little anticlimactic. The first two sections are written in great detail, the rest only cursorily. Kafka is, after all, middleware, not a platform; going any deeper would mean writing a production architecture or describing a business process, which strays from the original intent. After all, this was supposed to be simple Kafka.
Let me press pause here; I'll add more when another idea comes along — probably when I get to ELK.
Thanks for reading.
Kafka configuration file attached:
```yaml
# docker network create --driver bridge --subnet 172.69.0.0/25 --gateway 172.69.0.1 kafka_zoo
version: '2'
services:
  broker1:
    image: wurstmeister/kafka
    restart: always
    hostname: broker1
    container_name: broker1
    ports:
      - "9091:9091"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_HOST_NAME: broker1
      KAFKA_ADVERTISED_PORT: 9091
      KAFKA_HOST_NAME: broker1
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker1:9091
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9091
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker1/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.11
  broker2:
    image: wurstmeister/kafka
    restart: always
    hostname: broker2
    container_name: broker2
    ports:
      - "9092:9092"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_HOST_NAME: broker2
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_HOST_NAME: broker2
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker2:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker2/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.12
  broker3:
    image: wurstmeister/kafka
    restart: always
    hostname: broker3
    container_name: broker3
    ports:
      - "9093:9093"
    external_links:
      - zoo1
      - zoo2
      - zoo3
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ADVERTISED_HOST_NAME: broker3
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_HOST_NAME: broker3
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_LISTENERS: PLAINTEXT://broker3:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9093
      JMX_PORT: 9988
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - "/root/kafka/broker3/:/kafka"
    networks:
      default:
        ipv4_address: 172.69.0.13
  kafka-manager:
    image: sheepkiller/kafka-manager
    restart: always
    container_name: kafka-manager
    hostname: kafka-manager
    ports:
      - "9002:9000"
    links:            # containers created by this compose file
      - broker1
      - broker2
      - broker3
    external_links:   # containers outside this compose file
      - zoo1
      - zoo2
      - zoo3
    environment:
      ZK_HOSTS: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_BROKERS: broker1:9091,broker2:9092,broker3:9093
      APPLICATION_SECRET: letmein
      KM_ARGS: -Djava.net.preferIPv4Stack=true
    networks:
      default:
        ipv4_address: 172.69.0.10
networks:
  default:
    external:
      name: kafka_zoo
# Create the data directories first:
# mkdir -p /root/kafka/broker1
# mkdir -p /root/kafka/broker2
# mkdir -p /root/kafka/broker3
```