Apache Pulsar has been getting a lot of attention lately and is often billed as the next generation of messaging middleware. Today, let's take a look at what makes it stand out.
Overview
Apache Pulsar is a pub/sub messaging platform that uses Apache BookKeeper for persistence. It is server-side messaging middleware, originally developed at Yahoo and open-sourced in 2016. It entered the Apache Incubator and has since graduated to a top-level Apache project. It provides the following features:
- Cross-region replication
- Multi-tenancy
- Zero data loss
- Zero rebalancing time
- Unified queuing and streaming model
- High scalability
- High throughput
- Pulsar Proxy
- Pulsar Functions
Architecture
Pulsar uses a layered architecture that separates the storage layer from the brokers. This design gives Pulsar the following benefits:
- Brokers can be scaled independently
- Storage (bookies) can be scaled independently
- ZooKeeper, brokers, and bookies are easier to containerize
- ZooKeeper provides configuration and state storage for the cluster
In a Pulsar cluster:
- One or more brokers handle and load-balance incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store to handle various coordination tasks, store messages in BookKeeper instances (aka bookies), rely on a cluster-specific ZooKeeper cluster for certain tasks, and so on.
- A BookKeeper cluster consisting of one or more bookies handles persistent storage of messages.
- A ZooKeeper cluster specific to that cluster handles coordination tasks between Pulsar clusters.
For more on Pulsar's architecture, see: https://pulsar.apache.org/doc…
Four subscription modes
Pulsar has four subscription modes: exclusive, shared, failover, and key_shared. These modes are shown in the figure below.
For details, see: https://pulsar.apache.org/doc…
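To make the difference between the four modes more concrete, here is a toy Python model of how messages might be handed to consumers under each one. It does not use the Pulsar client at all: the function `dispatch`, the consumer names, and the hash-modulo rule used for key_shared are assumptions made for illustration, and a real broker also takes flow control and consumer availability into account.

```python
from itertools import cycle
from zlib import crc32


def dispatch(mode, consumers, messages):
    """Toy model: return {consumer: [payloads]} for a list of (key, payload) messages."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    if mode == "exclusive" and len(consumers) > 1:
        # Exclusive allows only a single consumer on the subscription.
        raise ValueError("exclusive subscriptions accept only one consumer")

    received = {name: [] for name in consumers}
    if mode in ("exclusive", "failover"):
        # One active consumer gets everything; in failover the others are standbys.
        active = consumers[0]
        for _key, payload in messages:
            received[active].append(payload)
    elif mode == "shared":
        # Messages are spread round-robin across all connected consumers.
        for (_key, payload), name in zip(messages, cycle(consumers)):
            received[name].append(payload)
    elif mode == "key_shared":
        # Same key -> same consumer (plain hash-modulo; an assumption of this sketch).
        for key, payload in messages:
            name = consumers[crc32(key.encode()) % len(consumers)]
            received[name].append(payload)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return received


messages = [("k1", "m1"), ("k2", "m2"), ("k1", "m3"), ("k2", "m4")]
print(dispatch("shared", ["c1", "c2"], messages))
print(dispatch("failover", ["c1", "c2"], messages))
```

In the shared case the two consumers each receive every other message, while in the failover case the first consumer receives all of them and the second stays idle as a standby.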
Better performance than Kafka
Performance is Pulsar's biggest selling point. According to the benchmark cited below, Pulsar is 2.5 times faster than Kafka and has 40% lower latency.
Source: https://streaml.io/pdf/Gigaom…
Note: The comparison uses 1 topic with 1 partition and 100-byte messages. In that setup, Pulsar can send 220,000+ messages per second.
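As a rough sanity check, the message rate and message size quoted in the note can be turned into a payload bandwidth figure. The snippet below is only back-of-the-envelope arithmetic: it uses 220,000 as a lower bound and ignores protocol and storage overhead.

```python
MESSAGES_PER_SECOND = 220_000  # lower bound quoted in the note above
MESSAGE_SIZE_BYTES = 100       # message size used in the comparison

bytes_per_second = MESSAGES_PER_SECOND * MESSAGE_SIZE_BYTES
megabytes_per_second = bytes_per_second / 1_000_000  # decimal megabytes

print(f"{megabytes_per_second:.0f} MB/s of payload")
```

So the quoted rate corresponds to roughly 22 MB/s of message payload on a single partition.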
Installation
Installing Pulsar from the binary release
# Download the official binary package
[root@centos7 ~]# wget https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz
# Extract it
[root@centos7 ~]# tar zxf apache-pulsar-2.8.0-bin.tar.gz
[root@centos7 ~]# cd apache-pulsar-2.8.0
[root@centos7 apache-pulsar-2.8.0]# ll
total 72
drwxr-xr-x 3 root root   225 Jan 22  2020 bin
drwxr-xr-x 5 root root  4096 Jan 22  2020 conf
drwxr-xr-x 3 root root   132 Jul  6 11:47 examples
drwxr-xr-x 4 root root    66 Jul  6 11:47 instances
drwxr-xr-x 3 root root 16384 Jul  6 11:47 lib
-rw-r--r-- 1 root root 31639 Jan 22  2020 LICENSE
drwxr-xr-x 2 root root  4096 Jan 22  2020 licenses
-rw-r--r-- 1 root root  6612 Jan 22  2020 NOTICE
-rw-r--r-- 1 root root  1269 Jan 22  2020 README
Docker installation (the main approach)
[root@centos7 ~]# docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.8.0 \
  bin/pulsar standalone
Port 8080 is used for HTTP access, and port 6650 is used for the Pulsar binary protocol (Java, Python, and other clients).
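The two port mappings translate directly into the URLs that clients and tools need. The following is a minimal sketch using only the Python standard library. The helper names `standalone_endpoints` and `port_is_open` are made up for this example, and the port check is a best-effort TCP probe rather than a real health check.

```python
import socket


def standalone_endpoints(host="localhost"):
    """Return the two URLs exposed by the docker run command above."""
    return {
        "service_url": f"pulsar://{host}:6650",  # binary protocol, used by client libraries
        "http_url": f"http://{host}:8080",       # HTTP access
    }


def port_is_open(host, port, timeout=1.0):
    """Best-effort TCP check: True if something accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in standalone_endpoints().items():
        print(name, url)
    print("6650 open:", port_is_open("localhost", 6650))
    print("8080 open:", port_is_open("localhost", 8080))
```

If either check prints False after the container has started, the port mapping in the docker run command is the first thing to look at.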
Pulsar Manager, the official web-based management tool, lets you manage multiple Pulsar clusters visually. https://pulsar.apache.org/doc…
[root@centos7 ~]# docker pull apachepulsar/pulsar-manager:v0.2.0
[root@centos7 ~]# docker run -it \
  -p 9527:9527 -p 7750:7750 \
  -e SPRING_CONFIGURATION_FILE=/pulsar-manager/pulsar-manager/application.properties \
  apachepulsar/pulsar-manager:v0.2.0
Set the administrator user name and password
[root@centos7 ~]# CSRF_TOKEN=$(curl http://localhost:7750/pulsar-manager/csrf-token)
# The token headers use double quotes so that the shell expands $CSRF_TOKEN
[root@centos7 ~]# curl \
  -H "X-XSRF-TOKEN: $CSRF_TOKEN" \
  -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
  -H "Content-Type: application/json" \
  -X PUT http://localhost:7750/pulsar-manager/users/superuser \
  -d '{"name": "admin", "password": "admin123", "description": "test", "email": "[email protected]"}'
{"message":"Add super user success, please login"}
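The same two-step flow (fetch a CSRF token, then send it back in a header and a cookie with the PUT request) can also be scripted. Below is a hedged sketch using only Python's standard library. The function names are hypothetical, and the assumption that the token comes back as the plain response body is inferred from how the curl example uses it.

```python
import json
import urllib.request

MANAGER_URL = "http://localhost:7750/pulsar-manager"  # backend address from the curl example


def build_superuser_request(csrf_token, name, password, email, description="test"):
    """Build (but do not send) the PUT request that creates the super user."""
    body = json.dumps({
        "name": name,
        "password": password,
        "description": description,
        "email": email,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{MANAGER_URL}/users/superuser",
        data=body,
        method="PUT",
        headers={
            # The token is sent twice: once as a header, once as a cookie.
            "X-XSRF-TOKEN": csrf_token,
            "Cookie": f"XSRF-TOKEN={csrf_token};",
            "Content-Type": "application/json",
        },
    )


def create_superuser(name, password, email):
    """Fetch a CSRF token, then send the request; needs a running Pulsar Manager."""
    with urllib.request.urlopen(f"{MANAGER_URL}/csrf-token") as resp:
        # Assumption: the token is returned as the plain response body.
        token = resp.read().decode("utf-8").strip()
    request = build_superuser_request(token, name, password, email)
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without a running Pulsar Manager instance.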
Open http://server_ip:9527 in your browser to reach the login page, as shown below.
Log in with the user name and password you just created, then configure the server to be managed.
The list of
Topic list
Topic details
Client configuration
Java client
Here is an example of a Java consumer configuration using a shared subscription:
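import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    private static final String SERVICE_URL = "pulsar://localhost:6650";
    private static final String TOPIC = "persistent://public/default/mq-topic-1";
    private static final String SUBSCRIPTION = "sub-1";

    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl(SERVICE_URL)
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic(TOPIC)
                .subscriptionName(SUBSCRIPTION)
                .subscriptionType(SubscriptionType.Shared)
                // If you'd like to restrict the receiver queue size
                .receiverQueueSize(10)
                .subscribe();

        // Close the consumer and the client when you are done with them
        consumer.close();
        client.close();
    }
}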
Python client
Here is an example of a Python consumer configuration using a shared subscription:
from pulsar import Client, ConsumerType

SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://public/default/mq-topic-1"
SUBSCRIPTION = "sub-1"

client = Client(SERVICE_URL)
consumer = client.subscribe(
    TOPIC,
    SUBSCRIPTION,
    # If you'd like to restrict the receiver queue size
    receiver_queue_size=10,
    consumer_type=ConsumerType.Shared)
C++ client
Here is an example of a C++ consumer configuration using shared subscriptions:
#include <pulsar/Client.h>
#include <string>

using namespace pulsar;

int main() {
    std::string serviceUrl = "pulsar://localhost:6650";
    std::string topic = "persistent://public/default/mq-topic-1";
    std::string subscription = "sub-1";
    Client client(serviceUrl);
    ConsumerConfiguration consumerConfig;
    // ConsumerShared is an enumerator of pulsar::ConsumerType
    consumerConfig.setConsumerType(ConsumerShared);
    // If you'd like to restrict the receiver queue size
    consumerConfig.setReceiverQueueSize(10);
    Consumer consumer;
    Result result = client.subscribe(topic, subscription, consumerConfig, consumer);
    client.close();
    return result == ResultOk ? 0 : 1;
}
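All three clients take the topic as a single string of the form `persistent://tenant/namespace/topic`, and a missing slash in that string is easy to overlook. The sketch below shows one way to validate such names before passing them to a client. The function `parse_topic` is a hypothetical helper rather than part of any Pulsar client, and it deliberately accepts only the three-part form.

```python
import re

# Assumption of this sketch: only the three-part form
# domain://tenant/namespace/topic is accepted; anything else is rejected.
_TOPIC_RE = re.compile(r"^(persistent|non-persistent)://([^/]+)/([^/]+)/([^/]+)$")


def parse_topic(name):
    """Split a full topic name into (domain, tenant, namespace, topic)."""
    match = _TOPIC_RE.match(name)
    if match is None:
        raise ValueError(f"not a full topic name: {name!r}")
    return match.groups()


print(parse_topic("persistent://public/default/mq-topic-1"))
# prints ('persistent', 'public', 'default', 'mq-topic-1')
```

A name such as `persistent://public/defaultmq-topic-1`, where the slash between namespace and topic is missing, raises `ValueError` instead of being passed on to the client.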
For more configuration options and operational guidance, see the official documentation, which is clearly written: https://pulsar.apache.org/docs/
Conclusion
As a next-generation distributed message queue, Pulsar has a number of attractive features that make up for some of the weaknesses of competing products, such as geo-replication, multi-tenancy, scalability, and read/write isolation.