Abstract: This article describes GaussDB(for openGauss) in terms of its overall architecture, main scenarios, and key technical features.
This article is shared from the Huawei Cloud community post "Technical Live Interpretation, Session 1: Understanding Huawei Cloud Database GaussDB(for openGauss)".
1. Background
On March 16, during the first live lecture in the GaussDB(for openGauss) series hosted by Huawei Cloud, "Understanding Huawei Cloud Database GaussDB(for openGauss)", one viewer asked: if open source databases are so appealing, why did Huawei put effort into developing GaussDB(for openGauss)?
In fact, many open source databases are weak in ease of use, support capability, and other areas, and they need constant maintenance. Moreover, once data is lost, it is hard to recover quickly, and the losses can be incalculable. As a result, moving an open source database to the cloud mainly addresses the needs of small and medium-sized enterprises, such as simplified deployment, simpler operation and maintenance, tuning, and very good price-performance.
Open source databases also carry costs large and small, such as servers, database maintenance and upgrades, and manual operation and maintenance, which makes it hard to support rapid business expansion and sustainable growth. Large enterprises in finance, government, and other sectors have strict requirements for data security, response speed, reliability, and availability. They need an enterprise database service with very high availability, complete functionality, excellent performance, an open ecosystem, and a high degree of elasticity.
GaussDB(for openGauss) is an enterprise-level distributed relational database developed by Huawei based on years of experience in the database field and the requirements of enterprise-level scenarios. Currently, it supports two deployment modes: single-shard and distributed. While supporting traditional services, it continues to build competitive features that help enterprises meet the challenges of the 5G era.
To help you quickly understand GaussDB(for openGauss), the Huawei Cloud database team has prepared a series of GaussDB(for openGauss) technical live broadcasts. This article covers the content of the first broadcast: overall architecture, main scenarios, and key technical features.
2. Overall architecture: unified distributed architecture based on data sharding
GaussDB(for openGauss) uses a shared-nothing architecture based on data sharding. Underlying data is distributed across data nodes according to rules such as hash, list, and range, and multiple data nodes take part in computation. Data nodes can be scaled out, and SQL parsing and forwarding are handled by the coordinator nodes at the upper layer.
As the figure shows, there are three main types of nodes: coordinator nodes (CN), data nodes (DN), and cluster management nodes (most importantly, the global transaction manager, GTM). The coordinator node is responsible for SQL parsing and forwarding and acts as a proxy. The data node is responsible for computing and data storage. The global transaction manager is responsible for keeping reads consistent across global transactions.
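The three distribution rules mentioned above (hash, list, and range) can be sketched as small routing functions. This is only an illustration of the idea: the node names, range boundaries, list mapping, and the use of CRC32 as the hash are all assumptions, not GaussDB(for openGauss) internals.

```python
import bisect
import zlib

# Hypothetical data node names, for illustration only.
DATA_NODES = ["dn1", "dn2", "dn3"]

def route_hash(key: str) -> str:
    """Hash distribution: a stable hash of the key selects the node."""
    return DATA_NODES[zlib.crc32(key.encode("utf-8")) % len(DATA_NODES)]

# Range distribution: exclusive upper bounds for the first two nodes (assumed values).
RANGE_BOUNDS = [1000, 2000]  # value < 1000 -> dn1, value < 2000 -> dn2, otherwise dn3

def route_range(value: int) -> str:
    """Range distribution: the interval that contains the value selects the node."""
    return DATA_NODES[bisect.bisect_right(RANGE_BOUNDS, value)]

# List distribution: explicit values are mapped to nodes (assumed mapping).
LIST_MAP = {"north": "dn1", "south": "dn2"}

def route_list(region: str) -> str:
    """List distribution: listed values go to their node, anything else to a default."""
    return LIST_MAP.get(region, "dn3")
```

A coordinator would apply one of these rules to the distribution column of each row to decide which data node stores it.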
This architecture gives GaussDB(for openGauss) the following core advantages:
(1) Very high availability: two-site, three-center architecture with real-time cross-region disaster recovery (DR)
(2) Data security: strong data consistency across AZs, ensuring zero data loss
(3) High scalability: containerized deployment, on-demand horizontal expansion of performance and capacity, up to 1000+ nodes
(4) Strong performance: 12 million tpmC on 32 nodes of two-socket Kunpeng servers (Huawei internal test)
(5) Full-stack, self-developed, controllable hardware and software: industry-leading Kunpeng plus the self-developed, open openGauss kernel
3. Main scenarios
Scenario 1: Traditional core transactions
For traditional applications, you can use the single-shard mode, which works like the traditional active/standby mode. Combined with deep optimization for Kunpeng, GaussDB(for openGauss) delivers excellent performance and much improved usability, making it well suited to replacing traditional commercial databases.
Scenario 2: Massive future transactions
With the arrival of the 5G era, a single node struggles to cope with growing data volumes while maintaining performance. A database that spans nodes and scales horizontally can meet the computing and storage needs of large-scale, massive data. The distributed mode of GaussDB(for openGauss) supports 1000+ nodes, PB-level storage, and strongly consistent distributed transactions, meeting the "Internet Plus" requirements of the government, transportation, finance, and energy industries.
Key roles
To better understand how GaussDB(for openGauss) runs, the following describes some of its key roles:
4. Key technical features
GaussDB(for openGauss) builds six core technical features on its distributed architecture with separated computing and storage. The following describes each of them.
Key technology 1: High performance – Distributed execution framework
This feature works as follows:
- A service application sends SQL to a Coordinator. The SQL can contain create, read, update, and delete (CRUD) operations on data.
- The Coordinator uses the database optimizer to generate an execution plan. Each DN processes data as the execution plan requires.
- Data is distributed across the DNs by a consistent hash algorithm, so a DN may need to fetch data from other DNs during processing. GaussDB provides three streams (broadcast stream, aggregation stream, and redistribution stream) to transfer data between DNs.
- Each DN returns its result set to the Coordinator for summarization.
- The Coordinator returns the summarized result to the service application.
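The flow above is essentially a scatter-gather pattern. The following minimal sketch shows it for a "sum of amount per region" query. The shard contents, table shape, and function names are hypothetical, and the real Coordinator and DN processes communicate over the network rather than through function calls.

```python
from collections import defaultdict

# Hypothetical shards: each DN holds part of an orders table as (region, amount) rows.
SHARDS = {
    "dn1": [("north", 10), ("south", 5)],
    "dn2": [("north", 7)],
    "dn3": [("south", 3), ("east", 4)],
}

def dn_partial_sum(rows):
    """Runs on each DN: aggregate only the rows stored locally."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)

def coordinator_sum_by_region(shards):
    """Runs on the Coordinator: send the plan to every DN, then merge the partial results."""
    total = defaultdict(int)
    for rows in shards.values():
        for region, subtotal in dn_partial_sum(rows).items():
            total[region] += subtotal
    return dict(total)
```

Calling `coordinator_sum_by_region(SHARDS)` returns `{"north": 17, "south": 8, "east": 4}`. Each DN only touches its own rows, and the Coordinator only merges small partial results.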
Huawei has years of experience in SQL execution optimization. Even complex SQL and hybrid transaction/analytical processing (HTAP) scenarios can be executed efficiently, for example through:
Cost-based optimization
• Cardinality estimation: feedback enhancement, AI-based cardinality enhancement
• Cost estimation: row/column storage cost estimation, network communication cost estimation
• Search algorithm: dynamic programming, genetic algorithm, AI search
Distributed execution planning capability
• LightProxy
• FastQuery Shipping
• RemoteQuery Shipping
Self-developed Cascade optimizer
• Rule application and search tasks modeled as objects
• Pruning based on branch and bound
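To make the "dynamic programming" search item concrete, here is a textbook-style sketch of join-order search over left-deep plans. It is not the GaussDB(for openGauss) optimizer: the cost model (sum of intermediate result sizes), the table cardinalities, and the selectivities are all assumptions chosen for illustration.

```python
from itertools import combinations

def best_join_order(card, sel):
    """Find the cheapest left-deep join order by dynamic programming.

    card: {table: row count}
    sel:  {frozenset({a, b}): join selectivity}; missing pairs count as 1.0
    Cost is the sum of intermediate result sizes (a toy cost model).
    """
    tables = sorted(card)
    # best[subset] = (cost so far, estimated rows, join order)
    best = {frozenset([t]): (0.0, float(card[t]), (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for t in sorted(subset):  # t is the table joined last
                rest = subset - {t}
                cost, rows, order = best[rest]
                out = rows * card[t]
                for r in rest:
                    out *= sel.get(frozenset((r, t)), 1.0)
                candidate = (cost + out, out, order + (t,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    cost, _, order = best[frozenset(tables)]
    return cost, order
```

With `card = {"a": 1000, "b": 10, "c": 100}` and join predicates only between `a`/`b` and `b`/`c`, the search avoids joining `a` and `c` first, because that pair has no predicate and would produce a large intermediate result.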
Using its distributed query engine, distributed scheduling engine, and distributed storage engine, GaussDB(for openGauss) shards data automatically, and its query optimizer balances load automatically to make execution plans more efficient. On the data nodes, stream operators (broadcast stream, aggregation stream, and redistribution stream) handle different data scenarios, improving the efficiency of interaction between data nodes across shards. Results are summarized automatically, and the global consistency of distributed transactions is ensured.
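The three stream types can be pictured as three ways of moving rows between data nodes. The sketch below models each DN as a list of rows in a dictionary. The function names, the CRC32 hash, and the in-memory representation are assumptions for illustration and do not reflect the actual stream operators.

```python
import zlib

def broadcast(rows_by_dn):
    """Broadcast stream: every DN receives a full copy of all rows."""
    all_rows = [row for dn in sorted(rows_by_dn) for row in rows_by_dn[dn]]
    return {dn: list(all_rows) for dn in rows_by_dn}

def redistribute(rows_by_dn, key_index):
    """Redistribution stream: re-shard rows by hashing the column at key_index."""
    dns = sorted(rows_by_dn)
    out = {dn: [] for dn in dns}
    for dn in dns:
        for row in rows_by_dn[dn]:
            h = zlib.crc32(str(row[key_index]).encode("utf-8"))
            out[dns[h % len(dns)]].append(row)
    return out

def gather(rows_by_dn):
    """Aggregation stream: collect the rows from all DNs at a single node."""
    return [row for dn in sorted(rows_by_dn) for row in rows_by_dn[dn]]
```

After `redistribute`, all rows that share a join or grouping key sit on the same DN, so each DN can join or aggregate them locally. `broadcast` suits small tables that every DN needs in full.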
Key technology 2: High performance – Distributed transaction processing performance, GTM-Lite technology
The advantages of this feature are:
- High-performance transaction management: supports lock-free, multi-version, high-concurrency transaction technology.
- Distributed strong consistency: the distributed GTM-Lite solution provides global transaction snapshot and commit number management, achieving strong consistency without a performance bottleneck on the central node.
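The combination of multi-version data and commit numbers can be illustrated with a small visibility check. This is a generic sketch of snapshot reads based on a commit sequence number (CSN). The class and field names are hypothetical, and the article does not describe the actual GTM-Lite protocol.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

class CommitSequencer:
    """Hands out monotonically increasing commit sequence numbers (CSNs)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.last_committed = 0

    def snapshot(self) -> int:
        """A reader's snapshot is simply the latest committed CSN."""
        return self.last_committed

    def commit(self) -> int:
        """Assign the next CSN to a committing transaction."""
        self.last_committed = next(self._counter)
        return self.last_committed

@dataclass
class RowVersion:
    value: str
    commit_csn: Optional[int] = None  # None while the writing transaction is uncommitted

def visible(version: RowVersion, snapshot_csn: int) -> bool:
    """A version is visible if it was committed at or before the snapshot."""
    return version.commit_csn is not None and version.commit_csn <= snapshot_csn
```

A reader that takes its snapshot before a writer commits does not see that writer's row version, and it needs no lock to decide this. It only compares two numbers.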
Key technology 3: High performance – Scale-up capability: a Kunpeng 4P NUMA-aware architecture that delivers a breakthrough in 4P server performance
GaussDB(for openGauss) uses NUMA-aware techniques to apply a series of optimizations tailored to the multi-core NUMA architecture of the Kunpeng processor. Core binding is used to avoid remote (cross-NUMA-node) memory access and reduce latency. Key techniques such as batched redo log insertion, NUMA-aware distribution of hot data, and Clog partitioning make full use of multi-core computing power and steadily reduce access latency, log write conflicts, and index update conflicts. Currently, TPC-C stress-test performance on TaiShan Kunpeng servers is 1.5 times that of x86 servers of the same specification.
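The general idea behind partitioning a shared structure such as the commit log can be sketched with per-partition locks: writers that land in different partitions do not contend with each other. The class below is a hypothetical illustration in Python, not the actual Clog implementation. The partition count and the modulo mapping are assumptions.

```python
import threading

class PartitionedCommitLog:
    """Commit-status map split into partitions, each guarded by its own lock."""

    def __init__(self, partitions: int = 4):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._status = [dict() for _ in range(partitions)]

    def _partition(self, xid: int) -> int:
        # Assumed mapping: transaction id modulo the partition count.
        return xid % len(self._locks)

    def set_committed(self, xid: int) -> None:
        p = self._partition(xid)
        with self._locks[p]:  # only writers in the same partition wait here
            self._status[p][xid] = "committed"

    def is_committed(self, xid: int) -> bool:
        p = self._partition(xid)
        with self._locks[p]:
            return self._status[p].get(xid) == "committed"
```

With a single global lock, every commit would serialize on one point. With partitions, contention drops roughly in proportion to the number of partitions, as long as transaction ids spread evenly.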
Key technology 4: High availability – Cluster HA and multi-layer redundancy ensure that the system has no single point of failure
GaussDB(for openGauss) implements hardware redundancy, instance redundancy, and data redundancy to prevent single points of failure in software and hardware across the entire system. Unlike traditional database software, GaussDB(for openGauss) focuses on providing high availability and reliability through software capabilities. Building on both hardware and software foundations, Huawei Cloud provides end-to-end database high availability and supports end-to-end monitoring and detection across all scenarios, keeping applications online, protecting data from loss in a timely and reliable way, and eliminating single points of failure across the full stack.
High availability technology points
Hardware high availability:
• Storage: RAID redundancy for disks
• Network: dual-switch redundancy
• NIC: multiple redundant NICs
• Host: UPS power protection
Software high availability:
• Multi-active redundancy of coordinator node (CN) instances
• Active/standby redundancy of data node, global transaction manager, and cluster manager instances
Fault detection:
• Network fault detection and handling (switches, routers, and so on)
• NIC fault detection and handling (local NIC fault detection)
• Disk fault detection and handling: disk heartbeat, handling of error codes returned by the file system
• Host power failure detection and handling: heartbeat mechanism
• Cluster instance fault detection and handling (a CN/DN/GTM process terminates abnormally)
• Cluster software failures
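The heartbeat mechanism mentioned in the fault detection list can be sketched as a monitor that records the last time each instance reported in and flags the ones that have been silent for too long. The class name, the timeout value, and the instance names are hypothetical. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Flags instances whose last heartbeat is older than timeout_s seconds."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, instance: str, now: float) -> None:
        """Record a heartbeat from an instance at time `now` (in seconds)."""
        self._last_seen[instance] = now

    def failed(self, now: float) -> list:
        """Return the instances that have missed the heartbeat deadline."""
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )
```

For example, with a 3-second timeout, an instance that last reported at t=0 is flagged at t=4, while one that reported at t=2.5 is not. A cluster manager could then trigger an active/standby switchover for the flagged instances.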
Key technology 5: High availability – Cross-AZ/Region DR
GaussDB(for openGauss) supports cross-AZ active-active deployment within a same-city cluster, with RPO = 0 and RTO < 60s. Two-cluster, cross-region, geo-redundant DR achieves an RPO of less than 10s and an RTO of less than 10 minutes. Cross-region DR can run with a minimal number of DR nodes, which effectively reduces DR costs, and users can expand DR nodes online after a fault occurs. This ensures service continuity and improves the reliability and availability of the original DR instances.
Key technology 6: High scalability – Scale-out online horizontal expansion
GaussDB(for openGauss) supports 1000+ compute nodes in a cluster and scales close to linearly.
When shards are added to a single cluster, data is redistributed automatically online, and storage can expand to PB-level volumes of transactional data.
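The article does not describe how data is redistributed during expansion, but it does mention consistent hashing for data distribution. The sketch below shows why consistent hashing suits online scale-out: when a node is added, only the keys that now belong to the new node move. The ring layout, the MD5-based hash, the virtual-node count, and the node names are all assumptions.

```python
import bisect
import hashlib

def _hash(text: str) -> int:
    """Stable 64-bit hash derived from MD5 (chosen only for determinism)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 64):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """The first ring point clockwise from the key's hash owns the key."""
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

def moved_keys(keys, before: HashRing, after: HashRing):
    """Keys whose owner changes when the ring changes from `before` to `after`."""
    return [k for k in keys if before.node_for(k) != after.node_for(k)]
```

Comparing `HashRing(["dn1", "dn2", "dn3"])` with a ring that also contains `"dn4"`, every key returned by `moved_keys` ends up on `dn4`. The existing nodes never exchange data among themselves, which keeps the amount of data copied during expansion small.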
To sum up, GaussDB(for openGauss) supports enterprise-level mixed transaction workloads, strongly consistent distributed transactions, same-city cross-AZ deployment, zero data loss, expansion to 1000+ compute nodes, and PB-level massive storage. On the cloud, it also offers high availability, high reliability, high security, elastic scaling, one-click deployment, fast backup and recovery, alarm monitoring, and other key capabilities. It provides enterprises with a comprehensive, stable, reliable, highly scalable, and high-performance enterprise database service, and it is now commercially available across the whole network. It is also an open ecosystem product. The source code of the single-shard version is open source, and the community address is opengauss.org. You are welcome to download, install, and try it yourself.
P.S. If you missed the GaussDB(for openGauss) live broadcast, you can watch the replay here: bbs.huaweicloud.com/live/cloud_…