1. Introduction:
When deploying Apache TubeMQ, how should its parameters be configured for optimal performance? Which pitfalls that affect performance should we watch for in advance? How do production and consumption loads affect overall inbound and outbound traffic?
This document is based on test data from 2018, but that does not diminish its relevance to these questions. Each test adjusts one parameter and measures its effect on TubeMQ performance, so you can see how each factor influences the system and use that as a basis for online operations. The number of possible parameter combinations is very large, so you may also want to rerun and refine the tests following our plan before deployment. If you find a better parameter combination, contributions to the community are welcome.
2. Test Summary:
Analysis of the test data yielded the following guidance for operating Apache TubeMQ online:
- On a 10-gigabit machine with a message size of 1K, a single TubeMQ storage instance configured with a 2M memory buffer and a disk flush threshold of 3K messages can sustain inbound traffic of more than 1G.
- Adding storage instances to a Topic significantly increases its inbound traffic, but more is not always better. Increasing the number of Topic partitions does not increase inbound traffic.
- Consumption of a Topic directly affects its production capacity.
- When a Topic's inbound traffic is large, its number of partitions must be increased to improve consumption parallelism; otherwise consumption will not keep up with production. Under continuous production and consumption, a Topic needs at least 4 partitions to keep up with production. Because real consumers also spend time processing data, large businesses need more partitions in online operation to improve consumption speed.
- Disk IO-util has a direct impact on production and consumption. When operating online, avoid letting a business consume with a large lag for an extended period.
3. Test details of production and consumption impact factors:
3.1 Pure production performance:
3.1.1 Optimal incoming flow test for pure production of single Topic and single Partition
Data interpretation: This test fixed the number of Topics and Partitions, the read/write I/O thread settings, and the data flush interval, in order to analyze how the memory buffer size and the disk flush threshold (number of messages per flush) affect the throughput of a Topic.
The test shows that the memory buffer size and the flush frequency (both the flush threshold and the flush interval) have a direct impact on production. The larger the memory buffer, the greater the throughput; the lower the flush frequency, the greater the throughput. With a 2M buffer and a flush threshold of 3K messages, the Broker can meet an inbound traffic requirement of 1G per Topic.
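The relationship between these two settings and flush frequency can be illustrated with simple arithmetic. The sketch below is a toy model, not TubeMQ's implementation: the function name `flush_counts` and its parameters are hypothetical, and it assumes the memory buffer is written out each time it fills and the disk is flushed each time the configured number of messages has accumulated. The time-based flush interval is ignored.

```python
def flush_counts(num_messages, msg_size, buffer_bytes, flush_msg_threshold):
    """Toy model (assumption, not TubeMQ code).

    Returns (buffer_writes, disk_flushes):
    - buffer_writes: how many times a buffer of buffer_bytes fills up
    - disk_flushes: how many times flush_msg_threshold messages accumulate
    """
    if msg_size <= 0 or buffer_bytes <= 0 or flush_msg_threshold <= 0:
        raise ValueError("sizes and thresholds must be positive")
    total_bytes = num_messages * msg_size
    buffer_writes = total_bytes // buffer_bytes
    disk_flushes = num_messages // flush_msg_threshold
    return buffer_writes, disk_flushes


if __name__ == "__main__":
    # 1K messages, 2M buffer, flush every 3K messages (the settings named above).
    print(flush_counts(6000, 1024, 2 * 1024 * 1024, 3000))
    # A larger buffer and a higher threshold both mean fewer write operations.
    print(flush_counts(6000, 1024, 4 * 1024 * 1024, 6000))
```

In this model, doubling either setting halves the corresponding number of write operations for the same message stream, which is consistent with the trend observed in the test.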
3.1.2 Impact test of the number of read/write I/O threads on pure production for a single Partition
Data interpretation: In testing, the number of Broker read/write I/O threads has no direct effect on the production performance of a Topic. More read/write I/O threads are not always better.
Building on test 3.1.1, this item verifies how changing the memory buffer size and the disk flush threshold affects throughput as the number of read/write I/O threads varies.
The comparison shows that the number of read/write I/O threads has a slight impact on Topic performance, but far less than the memory buffer size and the disk flush threshold. For example, the performance of A.2-1-1 relative to A.1-6-1, and of A.2-1-2 relative to A.1-3-4, decreases as the number of read/write I/O threads is reduced. However, when the memory buffer of A.2-2-2 is adjusted by 8M relative to A.2-1-2, inbound traffic increases by 100M.
3.1.3 Impact test of memory buffer size on pure production for a single Topic and single Partition
Data interpretation: The test shows that the disk flush threshold has the most direct impact on system performance. Adjusting the memory buffer size also affects performance, but beyond a certain level the gain is limited, which indicates that the disk is the system's performance bottleneck.
This test adjusts parameters on the basis of 3.1.1 and 3.1.2. As the figure shows, A.4-1-1 differs from A.2-1-2 only in its memory buffer size, and the resulting increase in inbound traffic is limited.
3.1.4 Impact test of the number of instances on pure production for a single Topic and single Partition
Data interpretation: According to the test, increasing the number of instances effectively raises the inbound traffic of a Topic, but the final bottleneck is still the disk. Once the volume written exceeds what the disk can handle, overall inbound traffic trends downward.
According to the test results, when a single instance of a single Topic does not meet the requirements, configuring no more than 10 instances on a single machine is enough to fully use the capacity of the system.
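Because inbound traffic rises with the instance count and then falls once the disk saturates, the instance count is best chosen from your own measurements rather than by always adding more. The sketch below shows one way to do that. The function name `best_instance_count` is hypothetical, the cap of 10 comes from the result above, and the numbers in the usage example are placeholders, not test data.

```python
def best_instance_count(measured_inflow, max_instances=10):
    """Pick the instance count with the highest measured inbound traffic.

    measured_inflow maps instance count -> measured inbound traffic
    (any consistent unit). Counts above max_instances are ignored.
    On a tie, the smaller instance count wins.
    """
    candidates = {
        n: rate for n, rate in measured_inflow.items() if 1 <= n <= max_instances
    }
    if not candidates:
        raise ValueError("no measurements within the allowed instance range")
    # Sort key: highest rate first, then fewest instances.
    return max(candidates, key=lambda n: (candidates[n], -n))


if __name__ == "__main__":
    # Placeholder measurements for illustration only.
    samples = {1: 100, 5: 300, 10: 280, 20: 250}
    print(best_instance_count(samples))
```

Preferring the smaller count on a tie reflects the observation that extra instances add disk write pressure without adding throughput.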
3.2 Tracking production and consumption performance:
3.2.1 Influence of different Partition numbers on consumption under a single Topic
Data interpretation: Several conclusions can be drawn from the test data of this scenario:
- Consumption has a substantial impact on production: as consumption increases, production capacity trends downward.
- The number of Partitions of a Topic determines whether the data can be consumed quickly. The more partitions, the better.
- When consumers only pull data without processing it, about 4 partitions are enough to keep up with the production speed without lagging behind.
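The partition guidance above can be turned into a rough sizing estimate. The sketch below is a minimal illustration under stated assumptions: the function name `partitions_needed` and the `per_partition_consume_rate` parameter are hypothetical, consumption is assumed to scale linearly with the number of partitions, and the floor of 4 partitions is taken from the pure-pull result above. Measure the per-partition rate in your own environment, including data processing time.

```python
import math


def partitions_needed(inflow_rate, per_partition_consume_rate, min_partitions=4):
    """Estimate the partitions needed for consumption to keep up with production.

    inflow_rate and per_partition_consume_rate must use the same unit.
    Assumes consumption throughput scales linearly with partition count.
    """
    if inflow_rate < 0:
        raise ValueError("inflow_rate must not be negative")
    if per_partition_consume_rate <= 0:
        raise ValueError("per_partition_consume_rate must be positive")
    required = math.ceil(inflow_rate / per_partition_consume_rate)
    # Never go below the floor observed in the pure-pull test.
    return max(min_partitions, required)


if __name__ == "__main__":
    print(partitions_needed(1000, 100))  # slow consumers need more partitions
    print(partitions_needed(100, 100))   # the floor of 4 still applies
```

The slower each consumer processes data, the lower the per-partition rate and the more partitions the estimate returns, which matches the advice to add partitions for large businesses.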
3.2.2 Analysis of factors affecting consumption power
Data interpretation: This group of tests mainly analyzes the factors that affect consumption speed. According to the test, disk io-util has the greatest impact on consumption speed. Even a small number of clients performing lagging reads can push disk io-util to its limit, so in actual operation, prevent services from reading with a long-term lag.
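One way to act on this in operation is to flag consumers whose lag stays high for several consecutive checks. The sketch below is only an illustration: the function name `has_sustained_lag`, the thresholds, and the way lag samples are collected are all assumptions, since this document does not specify a monitoring interface.

```python
def has_sustained_lag(lag_samples, lag_threshold, min_consecutive):
    """Return True if lag exceeds lag_threshold for at least
    min_consecutive samples in a row.

    lag_samples is a sequence of periodic lag measurements for one
    consumer group (for example, unconsumed message counts).
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    run = 0
    for lag in lag_samples:
        # Extend the current run of high-lag samples, or reset it.
        run = run + 1 if lag > lag_threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Placeholder samples for illustration only.
    print(has_sustained_lag([0, 5, 12, 15, 20, 3], lag_threshold=10, min_consecutive=3))
    print(has_sustained_lag([0, 12, 3, 15, 2], lag_threshold=10, min_consecutive=2))
```

Requiring consecutive samples avoids reacting to brief spikes while still catching the long-term lagging reads that drive up disk io-util.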