Abstract: Make database performance UP! The UP! The UP! Huawei cloud GaussDB(for MySQL) does this
This article is shared in the Huawei cloud community. What is the Power of vertical integration of GaussDB(forMySQL) Cloud Stack? By Lv Manyi, Chief scientist of Database Lab at Huawei Swedish Research Institute.
Nowadays, cloud computing is becoming more and more popular. As an important service mode of cloud computing, cloud stack is one of its key components, which is transactional database service. In real business scenarios, applications rely on scalable, high-performance managed database services to fully benefit from the cloud platform. Cloud databases also need to make efficient use of the underlying cloud infrastructure to unlock the potential of cloud-scale operations.
Huawei cloud GaussDB(forMySQL) isa high-performance enterprise cloud native distributed database based on the next-generation DataFunction Virtualisation (DFV) distributed storage, which uses computing and storage separation architecture and is fully compatible with MySQL. To provide professional services to Internet and corporate clients in the form of fully hosted services. In this article, I’ll explain common customer workloads and how we can leverage the unique capabilities of Huawei’s cloud stack to handle them.
Who are the customers on the cloud? What is their workload?
In China, people usually think that only Internet start-ups use cloud platforms, and MySQL is favored by many domestic enterprises due to its popularity among Internet companies. But in fact, companies have been embracing the cloud concept for years and are continuing to develop, which is also the current trend in China. As the most popular open source database in the world, MySQL is widely adopted by all industries and Internet companies.
So what is a typical workload for a cloud database customer? Two characteristics we observe are: 1) The volume of data is increasing. From a few terabytes to tens of terabytes of data to start with, and it gets bigger and bigger as time goes on. 2) A mixture of simple insert/delete/update/lookup and complex analysis queries. In addition, there are occasional DDL operations.
Currently, customers are facing a big challenge, namely how to improve database performance in the case of large data volume. Customers want to maintain throughput of their core transaction workloads while performing complex queries. Queries are often complex because of the business logic nature of the enterprise. Fortunately, MySQL 8.0 adds long-awaited parsing row SQL support, such as Windowing Function and recursive CTE. For unstructured data, MySQL’s JSON support has been very popular.
Overview of GaussDB(for MySQL) architecture
The architecture of GaussDB(for MySQL) is built on a distributed storage system shared by multiple tenants. Currently, a maximum of 128TB of data can be stored in a database. A primary node is used for read/write loads, and a maximum of 15 read-only nodes are used for read/write loads. The SQL engine is a deeply modified MySQL 8.0, so it is 100% syntactically and semantically compatible with MySQL. RDMA network is used between compute nodes and storage.
The storage system used by GaussDB(for MySQL) is a highly reliable cross-AZ cloud storage system. On a public cloud, the storage system can be a large cluster with dozens or hundreds of nodes, with scalability many times higher than in a single-tenant offline solution. SQL nodes write redo logs to the storage layer, and pages are stored at the storage layer Materialize. This design significantly reduces network traffic for update-intensive workloads. The pages belonging to a single database are organized in slice form, with slices distributed across multiple storage nodes, and this data distribution is the basis for distributed queries.
Huawei cloud GaussDB(for MySQL) architecture diagram
Huawei’s unique advantage: vertical integration
Unlike traditional offline databases, cloud databases have the ability to vertically integrate all layers of the cloud stack. As a leading provider in all layers of the cloud stack, Huawei has a unique position in the cloud field and is capable of becoming a leader in the industry.
The closest thing to a database in the cloud stack is storage. Offline pure soft databases need to be used together with general storage and standard file systems, so there is almost no room for optimization in vertical integration. But on the cloud, the integration of storage and databases can play a bigger role because cloud storage is highly scalable in terms of storage nodes and allows customers to scale dynamically based on data volume and load. Because cloud storage is shared among multiple tenants, and not all tenants have large scans all the time, we can offload some query processing to the storage tier to achieve higher resource utilization.
- Improve performance through parallelism (parallel query: PQ)
One common way to improve performance is parallelism, which can be implemented on multiple levels. The community version of MySQL 8.0 only supports single-threaded query execution and cannot take full advantage of all the cores provided by the hardware to execute complex queries. We modified the MySQL executor to allow multiple threads to execute a single query in parallel. Unlike offline solutions, cloud infrastructure allows us to leverage its vertically scalable capabilities on computing nodes. The largest compute node currently has 64 cores, which represents the maximum parallelism we can achieve with parallel queries. This optimization works best when most of the hot data can be stored in a BufferPool. Parallel queries will be explained in detail in another article.
The customer workload includes not only DML but ALSO DDL, such as index creation and changing the data types of columns. Although most DDLS are handled online in MySQL, some operations can be blocked, and using logical replication can amplify the blockage. GaussDB(for MySQL) uses physical replication to avoid this problem. When tables are large, DDL operations can take hours to complete. The need to optimize DDL is obvious in order to support the amount of data we see on the cloud. We already have an innovative approach to DDL that will be explored in a later article.
Another layer that allows for higher parallelism is the storage layer, since storage systems may have hundreds of nodes and thousands of cores. This cloud-scale distributed storage used by GaussDB(for MySQL) is a key foundation for improving query performance, with the potential to improve query performance by more than 100 times when combined with parallel queries.
Compute layer single threaded query execution
The computing and storage tiers execute in parallel
- Using cloud storage to improve query performance (operator down: NDP)
Data in GaussDB(for MySQL) is organized in slice format and distributed on multiple storage nodes. We use this data distribution to unload the operator to the storage node where the data resides and execute it using locally available computing resources without reading the data to the compute node. In database terms, we refer to this as near data processing (NDP) or operator push-down. Its basic principle is: will be part of the query processing work pushed down to WHERE the data storage nodes, under the push of the query is data-intensive queries, such as a full table scan and index scans, projection and certain conditions WHERE the filtering, and aggregation in storage layer, just match the row and column back to compute nodes, rather than the full page. In addition to parallel execution, this approach reduces network IO because the amount of data extracted to compute nodes is significantly reduced.
In addition, NDP operator pushdown allows full use of the local bandwidth of caches and storage media, and offloads to storage work best when queries need to scan large amounts of data and the data is not in the Innodb buffer pool. For example, the figure below shows that NDP operator push-down and parallel queries optimize the execution time of TCP-H Q12 by 34 times. A separate article will cover the technical details of NDP and provide a comprehensive performance analysis.
The future direction
GaussDB(for MySQL) is designed for the cloud. This architecture has extremely powerful and flexible vertical integration capabilities. Computing and storage resources are decoupled and can be independently extended, while functions are tightly integrated and database operations can be performed in multiple layers. In the future, database functionality could also be offloaded to network cards and other cloud components, not limited to compute nodes and storage.
We believe that deep integration of cloud stacks is the key to unlocking the power of cloud databases, and Huawei is uniquely positioned to achieve this, as GaussDB(for MySQL) has shown and will lead the way in the future in the cloud space.
To sum up, Huawei cloud GaussDB(forMySQL) uses advanced technologies, such as parallel query PQ and operator push-down NDP, to greatly improve database performance, maximize the vertical integration of cloud stacks, and make computing power faster and more powerful. Article described function have been online, welcome everybody to huawei cloud website experience: www.huaweicloud.com/product/gau… , please also follow us, there will be more technical information to share with you!
Lv Manyi is currently the chief scientist of Database Lab at Huawei Swedish Research Institute and the head of the European R&D team of cloud database. He has more than 20 years of experience in the database field. He has participated in the development of distributed high-availability database in the telecom industry, and has been deeply engaged in MySQL technology for ten years in an international well-known software company. In 2020, he joined Huawei, determined to build the world’s top enterprise cloud database.
Huawei will hold Huawei Connect 2021 at Shanghai World Expo Center & Expo Convention & Exhibition Center from September 23 to 25, 2021. With the theme of “Deep Digitization”, huawei will gather industry thought leaders, business elites, technology giants, pioneer enterprises, ecological partners, application service providers and developers to discuss how to deepen the industry scene. Combine digital technology with industry knowledge deeply, truly integrate into the main business process of government and enterprise, solve core business problems, promote experience improvement, efficiency improvement and model innovation; We also release scenario-oriented products and solutions, share the latest achievements and practices of customers and partners, and build an open and win-win healthy ecology. For more information, please visit the website www.huawei.com/cn/events/h…
Click to follow, the first time to learn about Huawei cloud fresh technology ~