Hello everyone! I’m Dongci777. Today I’m going to give you a quick overview of Kafka and how to send and consume messages.

I. Overview of Kafka

1. Definition

Kafka is a distributed message queue based on the publish/subscribe model, used mainly for real-time processing of big data; Spark plus Kafka is a classic combination. Kafka is the preferred messaging middleware at most companies developing big data applications.

2. Message queues

A typical application scenario for message queues is asynchronous processing.

The classic example is sending an SMS message after user registration.

Synchronous processing:

  • Step 1: The user fills in the registration information
  • Step 2: Write the registration information to the database
  • Step 3: Invoke the SMS sending interface
  • Step 4: Send SMS to the user’s mobile phone
  • Step 5: The page responds successfully

Asynchronous processing:

  • Step 1: The user fills in the registration information
  • Step 2: The registration information is written to the database; the page immediately tells the user that registration succeeded, and the request to send the SMS is written to the message queue

Obviously, asynchronous processing gives the user a better experience: there is no waiting.
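
To make this concrete, here is a rough Java sketch of the asynchronous step 2, assuming a Kafka producer as the message queue client. Everything in it is hypothetical for illustration: the RegistrationService class, the saveToDatabase helper, and the sms-requests topic are assumptions, not code from a real system.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RegistrationService {
    private final KafkaProducer<String, String> producer;  // configured elsewhere

    public RegistrationService(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    public void register(String userId, String phone) {
        saveToDatabase(userId, phone);                      // write registration to the database
        producer.send(new ProducerRecord<>("sms-requests",  // enqueue the SMS request...
                userId, phone));
        // ...and return immediately; the SMS service consumes
        // the queue at its own pace, so the user never waits for it
    }

    private void saveToDatabase(String userId, String phone) {
        // hypothetical database write
    }
}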

In fact, introducing a message queue also brings decoupling and buffering (peak shaving):

1) Decoupling: the producing and consuming services no longer need to be online at the same time

2) Recoverability: if one component fails, the rest of the system is unaffected

3) Buffering: smooths out the mismatch between the speed at which messages are produced and the speed at which they are consumed

4) Flexibility: since message queues are distributed, nodes can be brought up or taken down at any time

II. Two modes of message queuing

1. Point-to-point mode

One-to-one: the consumer actively pulls data, and a message is cleared from the queue once it has been received.

The message producer sends messages to the Queue, and the message consumer retrieves and consumes them from the Queue. Once a message is consumed it is removed from the Queue, so a consumer cannot consume a message that has already been consumed. The Queue supports multiple consumers, but any single message can be consumed by only one of them.
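
To get a feel for this "consumed once, then gone" behavior, here is a tiny in-process analogy using Java's BlockingQueue. This is not Kafka code, just an illustration of point-to-point semantics.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PointToPointDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        queue.put("hello");            // the producer puts one message on the queue
        String msg = queue.take();     // one consumer pulls it off
        System.out.println("consumed: " + msg);

        // The message is now gone from the queue: another take() would block,
        // because point-to-point delivery removes a message on consumption
        System.out.println("queue is now empty: " + queue.isEmpty());
    }
}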

2. Publish/subscribe mode

One-to-many: messages are not cleared after a consumer consumes them.

A message producer (publisher) publishes a message to a topic, and multiple message consumers (subscribers) consume it. Unlike the point-to-point model, a message published to a topic is consumed by all subscribers. Kafka is based on this model, in its pull variant: consumers pull messages themselves.

Disadvantage of push: messages are pushed at a uniform rate, so consumers with insufficient capacity can be overwhelmed while faster consumers sit idle and waste resources.

Disadvantage of pull: consumers have to keep polling the queue to see whether there are messages to consume, which means constant polling even when the queue is empty.

III. Kafka infrastructure

1. Topic

In Kafka, messages are divided into classes, and the attribute that names a class is the Topic. Simply put, a topic is a category, like classifying people into men, women, children, and the elderly.

2. Partition

A topic is divided into one or more partitions. A partition is a physical concept, corresponding to one or more directories on the system. A partition is a commit log: messages are appended to it and read in order.

Since a topic generally contains multiple partitions, there is no ordering guarantee across the entire topic, but a single partition can guarantee order. Messages can only be appended to the end of a partition. Kafka achieves data redundancy and scalability through partitioning.

Partitions can be distributed across different servers, that is, a topic can span multiple servers and so provide greater performance than a single server could.
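
Because order is only guaranteed inside one partition, a common technique is to give related messages the same key: Kafka's default partitioner hashes the key, so all messages with the same key land in the same partition and keep their order. A minimal sketch, assuming a producer configured as in the producer example further below; the user-events topic and the key are made up:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedByKey {
    static void sendUserEvents(KafkaProducer<String, String> producer) {
        String key = "user-42";  // hypothetical key: one user's id
        // Same key => same partition => these three events stay in order
        producer.send(new ProducerRecord<>("user-events", key, "registered"));
        producer.send(new ProducerRecord<>("user-events", key, "verified"));
        producer.send(new ProducerRecord<>("user-events", key, "logged-in"));
    }
}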

3. Segment

A partition is further subdivided into several segments, and each segment file has the same size.

4. Broker

A Kafka cluster consists of one or more servers. Each server in a Kafka cluster is called a broker; it receives messages from producers, assigns offsets to them, and commits them to disk for storage. The broker also serves consumers, responding to requests to read a partition by returning the messages that have been committed to disk.

A broker is part of a cluster. In each cluster, one broker acts as the cluster leader (controller), elected from the active members of the cluster. The leader is responsible for administrative work, including assigning partitions to brokers and monitoring brokers. Partition replication happens within the cluster: each partition has one leader broker, but the partition can also be assigned to several other brokers (followers). This replication mechanism provides message redundancy for the partition; if a broker fails, the remaining active brokers elect a new leader to take over.

5. Producer

Producers, or message publishers, publish messages to the appropriate partition of a topic. By default, the producer does not care which partition a particular message is written to and balances messages evenly across all of a topic's partitions. In some cases, however, the producer writes messages directly to a specified partition.
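
As a minimal sketch of what a Java producer looks like, sending to the test topic created later in this article; the broker address localhost:9092 and the payload are assumptions:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // No key and no explicit partition: the producer balances
            // messages across the topic's partitions on its own
            producer.send(new ProducerRecord<>("test", "hello kafka"));
        }
    }
}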

6. Consumer

Consumers, or message consumers, can consume messages from multiple topics, but within a consumer group a topic partition can be consumed by only one consumer.

Note: before version 0.9, consumer offsets were stored in ZooKeeper; from 0.9 on, they are stored in Kafka itself (in an internal topic).
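
A matching minimal consumer sketch; the demo-group group id is an assumption, and group.id matters because partition assignment happens within a consumer group:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "demo-group");               // consumers in one group share partitions
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                // Pull model: poll() blocks up to the timeout when no data is available
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}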

IV. Kafka installation and deployment

Because Kafka depends on ZooKeeper, you need to start ZooKeeper before you start Kafka.

1. Install ZooKeeper

1.1 Download and install

Download the ZooKeeper installation package from the official site: zookeeper.apache.org/releases.ht…

Download the version you need; I used version 3.5.8 here.

Once the download is complete, put the package in any directory on your Linux server, such as /tmp, and run the following commands:

tar -zxf zookeeper-3.5.8.tar.gz            # decompress the zookeeper-3.5.8.tar.gz file
mv zookeeper-3.5.8 /opt/Bigdata/zookeeper  # move it to a directory of your choice

Then go to the conf directory under the ZooKeeper directory and copy the zoo_sample.cfg file to zoo.cfg:

cp zoo_sample.cfg zoo.cfg

Modify the zoo.cfg file:

tickTime=2000                    # basic time unit, in milliseconds
initLimit=10                     # upper limit for a follower's initial connection to the leader: 10 x 2000 ms
syncLimit=5                      # upper limit for a follower to sync with the leader: 5 x 2000 ms
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181

The ZooKeeper configuration is now complete.

1.2 Start ZooKeeper

Switch to the ZooKeeper home directory and run the following command:

./bin/zkServer.sh start

If the startup succeeded, you can verify it as follows.

Run the following command to connect to the ZooKeeper port:

telnet localhost 2181

Then send the srvr command; if ZooKeeper responds with its status, it is correctly installed.

ZooKeeper installation, configuration, and startup are now complete.

2. Install the Kafka broker

2.1 Download and install

Download the Kafka installation package from the official site: kafka.apache.org/downloads

Download the corresponding version; I used version 2.6.0 here, built for Scala 2.12.

Once the download is complete, put the package in any directory on your Linux server, such as /tmp, and run the following commands:

tar -zxf kafka_2.12-2.6.0.tgz
mv kafka_2.12-2.6.0 /opt/Bigdata/

Now that Kafka is installed, configure server.properties in the config directory and set the log directory:

log.dirs=/data/kafka/logs

2.2 Start Kafka

Go to the bin directory under the Kafka home directory and run the following command:

./kafka-server-start.sh -daemon /opt/Bigdata/kafka_2.12-2.6.0/config/server.properties

After Kafka starts, create a new test topic:

 ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

If the command succeeds, the output confirms that the test topic has been created.

Use the following command to list all topics:

./kafka-topics.sh --list --zookeeper localhost:2181

Once we have a topic, we can use the following command to publish messages to it:

./kafka-console-producer.sh --broker-list localhost:9092 --topic test

After the producer has sent messages, we can use the consumer to read them:

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Note: the --zookeeper option is deprecated; newer versions use --bootstrap-server instead (for example, ./kafka-topics.sh --list --bootstrap-server localhost:9092).
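
For completeness, topics can also be created programmatically with the AdminClient API. This sketch mirrors the CLI command above; the broker address is an assumption:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // topic "test" with 1 partition and replication factor 1,
            // the same settings as the kafka-topics.sh command above
            NewTopic topic = new NewTopic("test", 1, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}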

This completes simple sending and consuming of messages.

I’ll cover more of Kafka’s configuration in detail next time.