1. Introduction

Why is the Apache TubeMQ Benchmark designed the way it is? What does each scenario tell us, and are the scenarios defined specifically to amplify TubeMQ's strengths? How does it compare with the Benchmark test cases of other MQs? In this article, we interpret TubeMQ's Benchmark test items and test results.

2. A few words up front

The Benchmark test items and test result report of TubeMQ, tubemq_perf_test_vs_Kafka, are included in the project website but are not displayed directly on its pages. The report was mainly used for the team's discussions during the open-sourcing period and was never intended as a head-to-head competition, so the page has to be reached directly through its link.

The document tubemq_perf_test_vs_Kafka contains only 8 test scenarios, but each scenario contains multiple test items, and the total number of test items is close to 100. The actual test report is much more detailed than this; even so, the published test items and results already give a clear picture of how TubeMQ actually behaves.

The latest version of TubeMQ certainly produces better numbers than it did in the early open-source days; it is simply that, since the initial test results were already published, there is no need to spend time and resources updating them. With an understanding of how the TubeMQ Benchmark test items were conceived, you can analyze and review them in a more targeted way and verify their truth for yourself.

3. Design ideas of TubeMQ Benchmark test items

When designing the Benchmark test items for Apache TubeMQ, we mainly considered these points: what are the strengths and weaknesses of the system's core design, and what are its boundary values in application? How will we use the system in a real environment, and how do the corresponding configuration adjustments affect it? As the load increases, how do the system's indicators trend within the limits allowed by the design?

Each test scenario in the report is presented as [scenario, conclusion, data], together with our views on the various indicators. Here we present them as [scenario, data interpretation] for your convenience and reference.

4. Interpretation of TubeMQ Benchmark test results

4.1 Scenario 1: Basic scenario, single topic, one-in-two-out model, with different consumption modes and message packets of different sizes, partitions scaled out horizontally step by step, comparing the performance of TubeMQ and Kafka

From this test item we obtain a piece of baseline information: the throughput of a TubeMQ topic does not improve as the number of partitions increases; throughput has no relationship with the partition count. Meanwhile, the throughput of a single TubeMQ storage instance is lower than that of a single Kafka Partition. Combining our storage design introduction with these test results, we can see that a Partition in TubeMQ is not the actual storage unit and must be distinguished from the Kafka Partition concept: a TubeMQ Partition is a logical partition, tied only to the parallelism of consumption.

Why the one-part-production, two-parts-consumption prerequisite: most of the test scenarios use 1 part production and 2 parts consumption. Why this prerequisite? In our view, a pure-production scenario does not match how an MQ is actually used online: online data is consumed at least once and at most dozens of times, so testing pure production alone cannot reflect how the system actually runs after going live. We have also run tests on pure production, on 1 part production with 1 part consumption, and on production with several different numbers of consumption copies. The test data show that the system's TPS (Transactions Per Second, the number of successfully answered requests per second) is directly affected by the number of consumption copies. In our environment data is usually read twice, so our main test scenarios benchmark against two consumptions.

Why 1K is selected as the baseline single-packet size: the packet length was also chosen through this test case. According to the data of scenario 1, as the packet length increases the traffic grows but the system's TPS gradually drops, and the larger the packet, the more the TPS drops. Based on our test data, the system's TPS, cost and other aspects are all acceptable in the 1 ~ 4K range. Taking 1K as the benchmark is neither too long nor too short, and it is closer to the actual use of the system after it goes online.

4.2 Scenario 2: Single topic, one-in-two-out model, fixed message packet size, instances scaled out horizontally, and comparison of TubeMQ and Kafka performance

From this table, we can get some information:

  1. As noted in scenario 1, a single instance of Apache TubeMQ is not as capable as a single Kafka Partition. However, with 4 instances, Apache TubeMQ reaches the same throughput as Kafka under the same configuration.
  2. As the count grows from 1 to 10, the throughput of TubeMQ keeps increasing while Kafka's decreases.
  3. By adjusting the consumption mode of TubeMQ consumers (qryPriorityId, the consumption priority ID), the throughput capacity of the system changes accordingly. Online users can adjust a consumer group's consumption capability according to the actual running situation, provide differentiated service, and improve the throughput the system delivers per unit of resource (a minimal call sketch follows this list).
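
For readers who want to try point 3, the qryPriorityId of a consumer group is adjusted on the server side through the TubeMQ master's HTTP admin interface rather than in client code. The sketch below (Java, JDK HTTP client) only shows the shape of such a call; the host/port, the admin method name, the priority value and the auth token are assumptions that should be checked against the master web API documentation of the version you run.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AdjustQryPriorityIdSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical admin call: method name, group, priority value and token
        // are placeholders to verify against your TubeMQ master's web API docs.
        String url = "http://master-host:8080/webapi.htm"
                + "?type=op_modify&method=admin_set_group_flow_control_rule" // assumed method name
                + "&groupName=bench_consumer_group"                          // target consumer group
                + "&qryPriorityId=301"                                       // assumed priority value
                + "&confModAuthToken=abc";                                   // admin auth token

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // the master answers with a JSON result
    }
}
```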

4.3 Scenario 3: Multi-topic scenario, with fixed message packet size and fixed instance and partition counts, examining the performance of TubeMQ and Kafka with 100, 200, 500 and 1000 topics

Why do we test so many topics? Some readers have raised this doubt: why don't we use a single topic with multiple partitions, or just a few dozen topics, like other MQ Benchmark tests?

It is based on the requirements of real online applications: on TubeMQ's production network, each Broker is usually configured with dozens to hundreds of topics, and our designed capacity is 1000. Through this scenario we need to obtain the load curve of the system as the number of topics increases, so a handful, or even a few dozen, topics cannot meet our actual requirements.

In fact, the Benchmark test items published by the various MQs do not match how the systems are actually used, especially in big-data scenarios. Just think: with a cluster of dozens of machines, would each Broker really be configured with only a few topics and partitions? If each machine can only carry a few dozen topics, its resource utilization cannot be improved, so our benchmark tests are load tests with hundreds to thousands of topics.

Through loads of different scales we analyzed and compared the stability of the system, the change in throughput, and the maximum the system can reach within its design range. In the appendix of the document we also provide the traffic changes under the different topic scenarios. From these we can clearly see how the system performs in practical application.

4.4 Scenario 4: 100 topics, one in, one full out and five filtered out: the full consumption Pull-consumes all topics; the filtered consumption uses 5 different consumer groups to filter out 10% of the message content from the same 20 topics

This scenario reflects the influence of Apache TubeMQ's filtered consumption on the system. When businesses report data online, not every business gets a Topic of its own; many businesses are mixed into the same Topic for data reporting and consumption, so there is a real demand for filtered consumption of the data.

This use case reflects the difference between client-side filtering and server-side filtering. At the same time, the figures show that during filtered consumption the throughput of the system increased by about 50,000 TPS compared with full consumption, even though the consumption changed from 2 full copies to 1 full copy plus 5 filtered copies, indicating that the load pressure filtered consumption puts on the system is lower than that of full consumption.
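
To make the filtered-consumption setup more concrete, here is a minimal pull-consumer sketch written against the TubeMQ Java client as we recall it from the project's examples; the master address, consumer group, topic name and filter item are placeholders, and the package paths (org.apache.tubemq.*) may differ between releases, so treat the exact class names as assumptions to verify. The point it illustrates is that the filter conditions are passed to subscribe(), so filtering happens on the server side instead of pulling everything and discarding it on the client.

```java
import java.util.TreeSet;
import org.apache.tubemq.client.config.ConsumerConfig;
import org.apache.tubemq.client.consumer.ConsumerResult;
import org.apache.tubemq.client.consumer.PullMessageConsumer;
import org.apache.tubemq.client.factory.MessageSessionFactory;
import org.apache.tubemq.client.factory.TubeSingleSessionFactory;
import org.apache.tubemq.corebase.Message;

public class FilterConsumerSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder master address and consumer group.
        ConsumerConfig consumerConfig = new ConsumerConfig("master-host:8715", "filter_group_1");
        MessageSessionFactory sessionFactory = new TubeSingleSessionFactory(consumerConfig);
        PullMessageConsumer consumer = sessionFactory.createPullConsumer(consumerConfig);

        // Filter items: only messages tagged with these stream ids are returned,
        // so the partial selection happens on the Broker rather than on the client.
        TreeSet<String> filterConds = new TreeSet<>();
        filterConds.add("stream_01"); // hypothetical filter item
        consumer.subscribe("bench_topic_0", filterConds);
        consumer.completeSubscribe();

        while (true) {
            ConsumerResult result = consumer.getMessage();
            if (result.isSuccess()) {
                for (Message message : result.getMessageList()) {
                    // process message.getData() here
                }
                // Acknowledge this batch so the partition can be pulled again.
                consumer.confirmConsume(result.getConfirmContext(), true);
            }
        }
    }
}
```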

4.5 Scenario 5: Comparison of data consumption latency between TubeMQ and Kafka

Is Kafka's end-to-end latency really like this? It seems different from what everyone experiences. Some readers gave this feedback under the post "How to evaluate Tencent's open source messaging middleware TubeMQ?".

Why the difference? Because, in order to improve Kafka's throughput, we changed its producer configuration: linger.ms to 200 ms and batch.size to 50000 bytes. If these two settings are removed, Kafka's end-to-end latency is similar to TubeMQ's, but its TPS then falls far short of the result given in the test report. Since the report had already been released, and to avoid unnecessary misunderstanding, we keep the data in it as they are, because with the test parameter configuration we provided the data are indeed like this.

If you are interested in this question, you can directly verify our test results and analysis in your own environment to see if it is true.
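
If you do want to reproduce the latency comparison, the two producer settings mentioned above are ordinary kafka-clients options; the sketch below shows where they go. The broker address and topic name are placeholders, and commenting out the two batching lines should bring Kafka's end-to-end latency back down at the cost of throughput, as described above.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        // The two settings used in the benchmark to raise Kafka's throughput:
        // wait up to 200 ms and accumulate up to 50000 bytes per batch.
        // Removing them lowers end-to-end latency but also lowers TPS.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "200");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "50000");

        byte[] payload = new byte[1024]; // 1K packet, the benchmark's baseline size
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("bench_topic_0", payload)); // placeholder topic
            }
            producer.flush();
        }
    }
}
```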

4.6 Scenario 6: Impact on throughput of changing the memory cache size (memCacheMsgSizeInMB) in the topic configuration

This scenario reflects how the system changes when the size of Apache TubeMQ's memory cache is adjusted. The size of the memory block affects TubeMQ's throughput; this is consistent with our design, and we will cover the specific amounts and their impact in another document.
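
As a pointer for experimenting with this scenario: memCacheMsgSizeInMB is a per-topic attribute, so changing it goes through the master's web admin interface, in the same call pattern as the qryPriorityId sketch in 4.2. The method name, topic value and token below are assumptions to verify against your version's admin API documentation; the snippet only assembles the URL.

```java
public class TopicMemCacheSketch {
    public static void main(String[] args) {
        // Hypothetical admin call; issue it with any HTTP client and check the JSON result.
        String url = "http://master-host:8080/webapi.htm"
                + "?type=op_modify&method=admin_modify_topic_info" // assumed method name
                + "&topicName=bench_topic_0"                       // placeholder topic
                + "&memCacheMsgSizeInMB=5"                         // the memory cache size varied in this scenario
                + "&confModAuthToken=abc";                         // admin auth token
        System.out.println(url);
    }
}
```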

4.7 Scenario 7: Performance of the two systems under severe consumption lag

Disadvantages of disk-based systems: from this test we can see the disadvantage shared by disk-based MQs. The advantages of disk are its lower cost and better write endurance than SSD, and disks are likely to remain with us for quite a while before the hardware is upgraded. This test also validated one of our original ideas, which was to use SSDs to cache lagging data in exchange for relieving the disk I/O problems caused by lagging reads.

The idea of caching lagging data on SSDs is also discussed in "How to evaluate Tencent's open source messaging middleware TubeMQ?". We concluded that it is not as direct as simply scaling out Broker nodes, so we removed the SSD dumping logic in the new version. What we need to take from this test is a clear picture of how a disk-based system performs under lagging reads, and how to deal with it.

4.8 Scenario 8: Performance of the two systems on different machine models

Adaptability to different machine models: from this test we can see that on disk-based machines the throughput of TubeMQ increases greatly as memory and CPU increase, while Kafka performs much better on SSDs. Our analysis is that this relates to the read patterns: when storage is not the bottleneck, Kafka's block reads and block writes perform much better than TubeMQ's block writes with random reads. The test also gives very clear feedback that every system has the surface it adapts to; the key is the environment and scenario in which the system is applied.

Considering that much of our online consumption is multi-group filtered consumption, even if storage is not the bottleneck the network will become one again, so as a compromise random reads will not disappear. How to make better use of SSDs is something TubeMQ still needs to improve in the future.

5. Conclusion

In the Apache TubeMQ data reliability introduction, we presented a different perspective on data reliability; in this article, we present another perspective, on the design of the Benchmark test scheme. Through this Benchmark you can get a clear picture of TubeMQ, including its strengths, weaknesses and boundary values. If you have anything to add, let's discuss it together in the TubeMQ community.