Original: Sijie Guo

Translation: Zhai Jia

In the previous article, we delved into the messaging model of Apache Pulsar, which unified high-performance flows with flexible queues. This comparison shows how Apache Pulsar and Apache Kafka implement message consumption, acknowledgement, and retention in the messaging model. In this article, we will introduce some of the system architecture and design ideas behind Apache Pulsar and conclude with some comparisons with Apache Kafka’s architecture.

Pulsar’s layered architecture

The fundamental difference between Apache Pulsar and other messaging systems is the layered architecture. The Apache Pulsar cluster consists of two layers: the stateless service layer, which consists of a set of brokers that receive and deliver messages; And a stateful persistence layer consisting of a set of Apache BookKeeper storage nodes called Bookies that store messages persistently. The following figure shows a typical deployment of Apache Pulsar.

A Producer & Consumer interface is provided in the Pulsar client, which is used by applications to connect to brokers to publish and consume messages.

The Pulsar client does not interact directly with the storage layer Apache BookKeeper. The client does not have direct access to Zookeeper. This isolation provides the foundation for Pulsar to implement a secure multi-tenant unified authentication model.

Apache Pulsar provides support for clients in multiple languages, including Java, C ++, Python, Go, and Websockets.

Apache Pulsar also provides a set of Kafka-compatible apis that users can migrate existing Kafka applications by simply updating dependencies and pointing clients to Pulsar clusters so that existing Kafka applications can be immediately used with Apache Pulsar, No code changes are required.

Broker layer – Stateless service layer

The Broker cluster forms a stateless service layer in Apache Pulsar. The service layer is “stateless” because the Broker does not actually store any message data locally. Messages about Pulsar topics are stored in a distributed log storage system (Apache BookKeeper). We’ll talk more about BookKeeper in the next section.

Each Topic Partition is assigned by Pulsar to a Broker, who is called the owner of the Topic Partition. Pulsar producers and consumers connect to the owner Broker of the topic partition to send and consume messages to the owner Broker.

If a Broker fails, Pulsar automatically moves the topic partitions it owns to one of the remaining available brokers in the cluster. One thing to note here is that since brokers are stateless, Pulsar simply transfers ownership from one Broker to another when Topic migrates, and no data replication takes place in the process.

The following figure shows a Pulsar cluster with four brokers, with four topic partitions spread across four brokers. Each Broker owns and provides messaging services for a topic partition.

BookKeeper layer – Persistent storage layer

Apache BookKeeper is a persistent storage layer for Apache Pulsar. Each topic partition in Apache Pulsar is essentially a distributed log stored in Apache BookKeeper.

Each distributed log is divided into segments. Each Segment Segment acts as a Ledger in Apache BookKeeper, evenly distributed and stored in multiple Bookie (the storage node of Apache BookKeeper) in the BookKeeper cluster. Create a Segment based on the size of the Segment. Configuration-based rolling time; Or when the owner of the Segment is switched.

Messages from the topic partition are evenly and evenly distributed across all Bookie’s in the cluster by way of Segment segmentation. This means that the size of topic partitions is not only limited by the capacity of a node; Instead, it can scale to the total capacity of the entire BookKeeper cluster.

The figure below illustrates a topic partition divided into X segments. Each Segment stores three copies. All segments are distributed and stored in four Bookie.

Segment-centered storage

The layered architecture of storage services and segment-centric storage are two key design concepts behind Apache Pulsar (using Apache BookKeeper). These two foundations provide a number of important benefits for Pulsar:

  • Unlimited topic partitioned storage
  • Instant scaling, no data migration required
  • Seamless Broker fault recovery
  • Seamless cluster scaling
  • Seamless storage (Bookie) failure recovery
  • Independent scalability

Let’s look at some of the benefits separately.

Unlimited topic partitioned storage

Because topic partitions are split into segments and stored distributed in Apache BookKeeper, the size of topic partitions is not limited by any single node capacity. Instead, theme partitions can scale to the total capacity of the entire BookKeeper cluster by simply adding a Bookie node. This is the key to Apache Pulsar’s ability to store streaming data of unlimited size and process data in an efficient, distributed manner. Distributed log storage using Apache BookKeeper is critical to unified messaging services and storage.

Instant scaling, no data migration required

Because message services and message stores are two-tier, moving topic partitions from one Broker to another can be done almost instantaneously without any data rebalancing (re-copying data from one node to another). This feature is critical for many aspects of high availability, such as cluster scaling; Quick response to Broker and Bookie failures. I’ll use examples to explain in more detail below.

  • Seamless Broker fault recovery

The following diagram illustrates an example of how Pulsar handles Broker failures. In the example Broker 2 is disconnected for some reason (such as a power outage). Pulsar detects that Broker 2 is closed and immediately transfers ownership of Topic1-Part2 from Broker 2 to Broker 3. In Pulsar the data store is separated from the data service, so when agent 3 takes over ownership of Topic1-Part2, it does not need to copy Partiton’s data. If new data arrives, it is immediately appended and stored as Segment X + 1 in TopIC1-part2. Segment X + 1 is distributed and stored on Bookie1, 2, and 4. Because it does not need to replicate the data again, the ownership transfer occurs immediately without sacrificing the availability of the subject partitions.

  • Seamless cluster capacity expansion

The following figure illustrates how Pulsar handles cluster capacity expansion. Bookie X and Bookie Y are added to the cluster when Broker 2 writes the message to Segment X of Topic1-part2. Broker 2 immediately found the newly added Bookies X and Y. The Broker will then attempt to store the messages for Segment X + 1 and X + 2 into the newly added Bookie. The newly added Bookie is immediately used and traffic immediately increases without any data being copied again. In addition to rack-aware and area-aware strategies, Apache BookKeeper also provides resource-aware placement strategies to ensure that traffic is balanced across all storage nodes in a cluster.

  • Seamless storage (Bookie) failure recovery

The following figure illustrates how Pulsar (via Apache BookKeeper) handles disk failures in Bookie. There was a disk failure that destroyed Segment 4 stored on Bookie 2. The Apache BookKeeper background will detect this error and copy and fix it.

Copy fixes in Apache BookKeeper are quick many-to-many fixes at the Segment (or even Entry) level, which are more subtle than replicating an entire subject partition, copying only as much data as is necessary. This means that Apache BookKeeper can read messages in Segment 4 from bookie 3 and Bookie 4 and fix Segment 4 at Bookie 1. All replica repair is done in the background, transparent to the Broker and the application. Even when a Bookie node fails, all brokers can continue to accept writes without sacrificing the availability of the theme partition by replacing the failed Bookie with a new available Bookie.

Independent scalability

Because the message service layer and the persistent storage layer are separate, Apache Pulsar can extend the storage and service layers independently. This standalone expansion is more cost-effective:

When you need to support more consumers or producers, you can simply add more brokers. The theme partitions are immediately balanced among Brokers, and ownership of some theme partitions is immediately transferred to the new Broker.

When you need more storage to keep messages for longer, you simply add more Bookie. With intelligent resource awareness and data placement, traffic will automatically switch to the new Bookie. There is no unnecessary data relocation involved in Apache Pulsar, where old data is not copied from an existing storage node to a new storage node.

Compare that to Kafka

Apache Kafka and Apache Pulsar both have similar message concepts. The client interacts with the messaging system through topics. Each topic can be divided into multiple partitions. However, the fundamental difference between Apache Pulsar and Apache Kafka is that Apache Kafka centers on partitions while Apache Pulsar centers on segments.

The figure above shows the difference between partition-centric and segment-centric systems.

In Apache Kafka, partitions can only be stored on a single node and copied to other nodes, and their capacity is limited by the minimum node capacity. This means that capacity expansion requires rebalancing of partitions, which in turn requires replicating the entire partition to balance the data and traffic of the newly added agents. Retransmitting data is expensive and error-prone, and consumes network bandwidth and I/O. Maintenance personnel must be very careful when performing this operation to avoid damaging the production system.

The re-copy of partitioned data in Kafka does not only occur on cluster extensions in partition-centric systems. Many other things can also trigger data re-copying, such as copy failure, disk failure, or computer failure. During data replication, partitions are usually unavailable until data replication is complete. For example, if you configure a partition to store three copies, if one copy is lost, the entire partition must be copied again before the partition is available again.

This defect is usually ignored until the user encounters a failure, because in many cases it is just a read of cached data in memory for a short period of time. After data is saved to disks, users will inevitably encounter data loss and recovery problems, especially when data needs to be saved for a long time.

In contrast, Apache Pulsar also uses partitions as logical units but segments as physical storage units. Partitions are segmented over time and evenly distributed throughout the cluster, designed to scale efficiently and quickly.

Pulsar is segment-centric, so there is no need for data rebalancing and copying when expanding capacity, and old data is not copied again, thanks to the use of an extensible segment-centric distributed log storage system in Apache BookKeeper.

By leveraging distributed log storage, Pulsar can maximize Segment placement options for high write and high read availability. For example, with BookKeeper, the replica setting is equal to 2, and the subject partition can be written as soon as any two Bookies are started. For read availability, as long as one replica of a topic partition is active, the user can read it without any inconsistencies.

conclusion

In summary, Apache Pulsar’s unique segments-centric publish/subscribe messaging system based on distributed log storage can provide many advantages, such as a reliable streaming system, including unlimited log storage, instant scaling without partition rebalancing, Fast copy fixes and high write and read availability options by maximizing data placement.

In this article, we gave an architectural overview of Apache Pulsar. The layered architecture and segment-centric design in Apache Pulsar are two unique design concepts compared to other streaming messaging systems. Hopefully the reader will get a better understanding of Apache Pulsar from an architectural and developer perspective, and understand the differences between Apache Pulsar and Apache Kafka.


If you are interested in Pulsar, you can participate in the Pulsar community in the following ways:

  • Pulsar Slack channel: apache-pulsar.slack.com/ You can register here: apache-pulsar.herokuapp.com/
  • Pulsar mailing list: pulsar.incubator.apache.org/contact

General information about the Pulsar Apache project, please visit the website: pulsar.incubator.apache.org/ @ apache_pulsar can also focus on Twitter account.