Brief introduction: The DpCA big data acceleration engine combines software and hardware optimizations for commonly used big data components, such as Spark, Hadoop, and Alluxio, with the characteristics of the Alibaba Cloud DpCA architecture, giving it a distinctive performance advantage. In complex SQL query scenarios, performance improves by 2-3 times compared with the Spark community edition, and eRDMA acceleration improves Spark performance by 30%.
Recently, the latest world ranking for the TPC Express Big Bench benchmark (TPCx-BB for short) was released, and the Shenlong big data acceleration engine developed in-house by Alibaba Cloud took first place in the TPCx-BB@3000 ranking.
TPCx-BB results are ranked along two dimensions: performance and cost performance. On performance, Alibaba Cloud reached 2187.42 BBQpm, 41.6% ahead of second place. On cost performance, it led second place by 40%, with the cost reduced to 346.53 USD/BBQpm.
(TPCx-BB@3000 Performance dimension ranking)
(TPCx-BB@3000 Cost-performance dimension ranking)
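To put the two percentage leads in perspective, the short sketch below backs out the runner-up figures they imply. It assumes that "41.6% ahead" means the leading score is 41.6% higher than the runner-up's, and that the 40% cost-performance lead means the leading price is 40% lower than the runner-up's. The function names are illustrative, and the derived values are rough implied numbers, not published results.

```python
def implied_runner_up_performance(leader_bbqpm: float, lead_pct: float) -> float:
    """Runner-up score, assuming the leader is lead_pct percent higher."""
    return leader_bbqpm / (1 + lead_pct / 100)


def implied_runner_up_price(leader_price: float, reduction_pct: float) -> float:
    """Runner-up price, assuming the leader is reduction_pct percent cheaper."""
    return leader_price / (1 - reduction_pct / 100)


if __name__ == "__main__":
    # Figures quoted in the article; the interpretation of the percentages
    # is an assumption, so treat the printed values as estimates only.
    perf = implied_runner_up_performance(2187.42, 41.6)
    price = implied_runner_up_price(346.53, 40.0)
    print(f"implied runner-up performance: {perf:.2f} BBQpm")
    print(f"implied runner-up price: {price:.2f} USD/BBQpm")
```

Because the published percentages are rounded, the official ranking tables remain the authoritative source for the runner-up results.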
TPCx-BB is an end-to-end big data benchmark released by the TPC, the international benchmark standards body, and is built around a retail scenario. It supports the mainstream distributed big data processing engines and simulates the whole process of online and offline business. It consists of 30 queries covering descriptive procedural queries, data mining, and machine learning algorithms. The TPCx-BB test is characterized by large data volumes, complex data characteristics, and diverse data sources, which brings it close to real business scenarios and makes it an important reference for infrastructure selection across industries.
TPCx-BB test results can fully and accurately reflect the overall performance of an end-to-end big data system. The test covers structured, semi-structured, and unstructured data, and can comprehensively evaluate the software and hardware performance, cost performance, service, and power consumption of big data systems from the perspective of customers' actual scenarios.
The Shenlong big data acceleration engine MRACC (ApsaraCompute MapReduce Accelerator), developed in-house by Alibaba Cloud, was the key to taking first place in the world this time. MRACC combines software and hardware optimizations for commonly used big data components, such as Spark, Hadoop, and Alluxio, with the characteristics of the Alibaba Cloud DpCA architecture, giving it a distinctive performance advantage. In complex SQL query scenarios, performance improves by 2-3 times compared with the Spark community edition, and eRDMA acceleration improves Spark performance by 30%.
Specifically, because big data tasks are IO-heavy, MRACC draws on the network and storage advantages of the cloud architecture to accelerate both software and hardware. On the software side, it optimizes the SQL engine with caching, file pruning, indexing, and other techniques, and it experiments with offloading compression operations to heterogeneous devices. In addition, eRDMA is used for network acceleration: data exchange in the shuffle phase runs over the eRDMA network, which reduces latency and greatly improves CPU utilization.
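As an illustration of the file pruning idea mentioned above, here is a minimal sketch in plain Python. It assumes each data file carries min/max statistics for a filter column, so that files which cannot contain matching rows are skipped before any IO happens. The FileStats class, the prune_files function, and the file names are hypothetical; the article does not describe how MRACC actually implements pruning.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one filter column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: List[FileStats], lower: int, upper: int) -> List[FileStats]:
    """Keep only files whose [min, max] range overlaps the query range [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Three illustrative files with non-overlapping value ranges.
files = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

if __name__ == "__main__":
    # A filter such as "value BETWEEN 150 AND 220" only needs part-1 and part-2.
    selected = prune_files(files, 150, 220)
    print([f.path for f in selected])
```

The fewer files a query has to open, the less data crosses the storage and network layers, which is why this kind of optimization matters for IO-heavy workloads.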
The combination of MRACC and DpCA servers opens up new possibilities for big data on the cloud and delivers higher performance and better cost performance to users.
This article is the original content of Aliyun and shall not be reproduced without permission.