When you find that a database query is particularly slow, and you can’t find the cause in terms of hardware configuration, SQL optimization, and indexing, you might want to look at the performance of the database’s computing engine itself.

How important is database computing engine performance? We can make a simple analogy with cars. The hardware configuration of the server is the infrastructure, which is equivalent to the road of the car. The driving effect of the expressway and the dirt road in the mountain village is certainly different. SQL query optimization is equivalent to driving level; The database computing engine is equivalent to the automobile engine, which is not only the source power of database performance, but also the core technical barrier of various manufacturers.

Then, we will explore how to improve database performance from the realization of database computing engine technology. The following figure simplifies the process from sending an SQL statement from the client to sending the result back to the client.

If you think of the database kernel as an organization, the optimizer sits at the top of the organization as the head of the organization calling the shots. The executor, located in the middle of the organization, strictly executes the plan delivered by the optimizer, reads data from the storage space for processing, and ultimately returns it to the client.

The optimizer

How to visualize the optimizer? Take the query of “Zhihu’s answer with more than ten thousand likes” as an example, the user tells the database through SQL to “find me the answer with more than ten thousand likes”, and the optimizer converts the user’s demand into the strategy and method of “how to find the answer with more than ten thousand likes”, namely the query plan.

There can be thousands of different execution plans for the same SQL, and there can be a huge difference in performance between a good execution plan and a bad one.

How to select the best query plan from thousands of query plans? Heuristic Rule Based Optimization (RBP) is usually adopted by the query optimizer of the early database, which is not accurate enough to obtain the optimal execution plan. Cost Based Optimization (CBO) can efficiently select the execution plan with the best performance for most scenarios.

Therefore, the high-performance database engines we see often use cost-based optimizers.

actuator

The executor is one of the most important parts of the database kernel. Improving the performance of the actuator will greatly improve the performance of the database, so the major database manufacturers have put a lot of energy into the research and development of the actuator technology.

There are two main techniques to improve actuator performance: Vectorized execution and code generation. The mainstream database vendors use one of these actuator optimization techniques, for example, Snowflake uses vector computation, Impala uses code generation, Spark uses both, and OushuDB uses vector computation plus SIMD optimization. Some traditional databases do not yet implement any of these performance techniques.

You might be wise enough to ask, which technology route wins? On this question, many studies and papers give the answer: the two technologies have different focuses but can improve performance, different statements will also have different levels of performance improvement, vector computing is more suitable for parallel processing of data SIMD. Therefore, to further improve database engine performance on the basis of parallel computing, you can make full use of CPU hardware instructions (such as SIMD) in conjunction with parallel processing of data.

SIMD

Single Instruction multi-data (SIMD) is a single instruction multi-data stream that synchronously executes the same instruction at the same time. Compared with single-instruction single-data stream (SISD), single-instruction multi-data stream obtains all operands at once, thus speeding up operations, especially data-intensive operations.

As shown in the figure above, scalar operations can only perform one pair of data multiplication operations at a time, while SIMD multiplication instructions can perform four pairs of data multiplication operations simultaneously. As a vector architecture, SIMD uses a vector instruction to initiate a set of data operations in which data is loaded, stored, and computed in a pipeline.

Through testing on the international standard data set TPCH, we found OushuDB 4.x to be about an order of magnitude faster than the latest version of SparkSQL 3.x.

Based on the above analysis, from the perspective of improving database performance, we can adopt cost-based optimization + vector calculation + SIMD as the preferred method to improve database performance.