POLARDB advantage: minute-level elasticity

Pulse computing is everywhere

Alibaba has Double 11, China has the Spring Festival travel rush, students rush to check their college entrance exam scores the day they are released, and fans flood in the moment Jay Chou concert tickets go on sale… Wherever there are people, there are rivalries; and wherever there are people, there is also pulse computing. Behind each of these hot events, a large amount of computing resources is needed for support, and the demand arrives suddenly and sharply, like a pulse, which is why we call it pulse computing. It is not only ECS servers: the database, too, must handle these sudden pulse fluctuations to keep the whole system smooth and stable.

Separating storage from compute

As we know, one of POLARDB's defining characteristics is the separation of storage and compute: the compute nodes (DB Engine) and the storage nodes (DB Store) run on different physical servers, so any I/O operation that lands on a storage device is network I/O. Some may ask: if everything goes over the network, what about latency and performance? In an earlier article I briefly presented test results of PolarFS accessing PolarStore over the network, which are nearly identical to a local single-replica SSD, so I will not repeat them here.

Beyond reducing storage costs and guaranteeing strong consistency between primary and replica data with no data loss, POLARDB's storage-compute separation architecture brings another huge advantage: it makes "elastic scaling" of the database simple and convenient.

The challenge of database elasticity

Elastic scaling is one of the hallmarks of the cloud, and many people see it as a reason to move their IT systems there. Yet the elastic scaling of databases has always been a challenge in the industry. Unlike ECS, which provides pure computing services, a database must solve the following problems before it can be elastic:

  • First, horizontal scaling is hard. Databases are often the core of a business system, and data is only valuable when it flows and is shared, so at moderate scale databases are usually deployed centrally. That makes them convenient to use: for example, one SQL statement can query across several business databases. As a result, it is difficult for a database to achieve linear scaling simply by adding more servers.
  • Second, zero-downtime requirements. The database's central role means that once it fails, the real business is paralyzed. A database must therefore be highly available, masking any hardware failure to keep the business uninterrupted. Achieving elastic scaling on top of high availability is like changing an engine on a plane in mid-flight; the difficulty is easy to imagine.
  • Third, data is heavier than compute. The essence of a database is data, and data ultimately lives on storage devices; when you discover that a device's I/O performance is insufficient, upgrading it is not easy. Similarly, if data and compute share one physical machine, that machine's CPU core count and clock frequency set a hard ceiling on computing power, making expansion difficult.

Now that the performance bottleneck of separating storage from compute has been broken through, combined with an architecture in which multiple nodes share the same data, we can finally make new progress in the elastic scaling of databases.

The elasticity advantages of POLARDB

As shown in the figure above, POLARDB is a layered architecture: at the top, the proxy PolarProxy provides read/write splitting, SQL acceleration, and other features; in the middle, the POLARDB database engine nodes form a one-writer, multi-reader database cluster; at the bottom, the distributed storage PolarStore provides shared data that multiple nodes can mount. Each layer plays its own role, and together they form the POLARDB cloud database cluster.

In the POLARDB product definition, the node count and specification a user purchases (for example, 4 cores and 16 GB) refer to the middle layer, the POLARDB configuration. The upper-layer PolarProxy adapts automatically to that configuration; users do not need to buy it or worry about its performance and capacity. The capacity of the underlying PolarStore expands automatically and is billed only for the capacity actually used.

In general, there are two kinds of scaling: scale-up and scale-out. Scale-up means raising the configuration, while scale-out means adding nodes at the same configuration. Databases usually scale vertically first: when 4 cores are not enough, go up to 8. Eventually, however, bottlenecks appear. On the one hand, performance gains are nonlinear, which relates both to the design of the database engine itself and to the application's access pattern (for example, with MySQL's multi-threaded design, a single session can hardly exploit multiple cores); on the other hand, the configuration of physical servers has a ceiling. The ultimate solution is therefore to scale horizontally by adding nodes.

In a word, __POLARDB can scale horizontally to up to 16 nodes and vertically to up to 88 cores, and storage capacity expands dynamically with no configuration required.__

Vertical scaling (upgrade/downgrade configuration)

Thanks to the separation of storage and compute, the configuration of a POLARDB database node can be upgraded or downgraded on its own; if the current server's resources are insufficient, the node can also be quickly migrated to another server. The whole process takes only 5 to 10 minutes (and is being continuously optimized), with no data movement involved. Only when a cross-machine migration is required may there be a connection flash cut of a few tens of seconds; in the future this effect can be eliminated through PolarProxy, so that upgrades have no impact on business applications at all.

At present, all nodes in the same cluster must be upgraded together, so we adopt a very gentle rolling-upgrade approach, further reducing the unavailability window by controlling the upgrade rhythm and coordinating it with the primary/replica switchover.
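The rolling-upgrade idea above can be sketched in a few lines: upgrade the read nodes first so a healthy, already-upgraded replica is always available, then switch the writer role over and upgrade the former writer last. This is an illustrative model only, not the real POLARDB control plane; `Node`, `rolling_upgrade`, and the spec strings are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    spec: str            # e.g. "4c16g" (illustrative spec string)
    is_writer: bool = False

def rolling_upgrade(cluster: list[Node], new_spec: str) -> list[str]:
    """Upgrade every node to new_spec, writer last, logging each step."""
    log = []
    # 1. Upgrade the read nodes one by one.
    for node in [n for n in cluster if not n.is_writer]:
        node.spec = new_spec
        log.append(f"upgraded reader {node.name} to {new_spec}")
    # 2. Move the writer role onto an already-upgraded replica.
    writer = next(n for n in cluster if n.is_writer)
    replica = next(n for n in cluster if not n.is_writer)
    writer.is_writer, replica.is_writer = False, True
    log.append(f"failover: writer role moved {writer.name} -> {replica.name}")
    # 3. Finally upgrade the former writer, now serving only reads.
    writer.spec = new_spec
    log.append(f"upgraded former writer {writer.name} to {new_spec}")
    return log
```

The point of the ordering is that the cluster never has fewer than one writable node, and the switchover happens onto a node that already runs the new specification.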

Horizontal scaling (add/subtract nodes)

Because storage is shared, nodes can be added quickly without copying any data; the whole process again takes only 5 to 10 minutes (continuously being optimized). Adding a node has no impact on business applications; removing a node affects only the connections that happen to land on that node.

When a node is added, PolarProxy senses it dynamically and automatically adds it to the read backend of the read/write endpoint, so applications connected to POLARDB through the cluster (read/write) address immediately gain performance and throughput.
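To make the behavior concrete, here is a minimal sketch of read/write splitting behind a single cluster address, assuming the proxy can discover read nodes at runtime the way PolarProxy does when a node joins. The class and method names are illustrative, not the real PolarProxy API.

```python
import itertools

class ReadWriteProxy:
    """Toy single-address proxy: writes go to the writer, reads round-robin."""

    def __init__(self, writer: str, readers: list[str]):
        self.writer = writer
        self.readers = list(readers)
        self._rr = itertools.cycle(self.readers)

    def add_reader(self, endpoint: str) -> None:
        """A newly added node joins the read backend immediately."""
        self.readers.append(endpoint)
        self._rr = itertools.cycle(self.readers)  # rebuild the round-robin ring

    def route(self, sql: str) -> str:
        """SELECTs round-robin across readers; everything else hits the writer."""
        if sql.lstrip().lower().startswith("select"):
            return next(self._rr)
        return self.writer
```

The application only ever sees one address; capacity grows the moment `add_reader` runs, which is why adding a node is transparent to the business.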

Maintenance-free storage space

You do not need to manage POLARDB storage space: you pay only for what you actually use, settled automatically every hour.
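The billing model is simple enough to state as arithmetic: each hour's charge is the capacity used in that hour times the per-GB-hour rate. The rate below is a made-up placeholder, not a real POLARDB price.

```python
PRICE_PER_GB_HOUR = 0.0005  # hypothetical rate, for illustration only

def storage_bill(used_gb_by_hour: list[float]) -> float:
    """Sum the per-hour charges for the observed hourly usage samples."""
    return round(sum(gb * PRICE_PER_GB_HOUR for gb in used_gb_by_hour), 6)
```

Because capacity expands automatically, usage can differ from hour to hour and the bill simply tracks it; there is no provisioned size to configure up front.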

I/O capability is provisioned according to the database node's specification: the larger the specification, the higher the IOPS and I/O throughput. I/O is isolated and capped at the node level to avoid I/O contention between database clusters.
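One simple way to picture per-node I/O isolation is a per-window budget: each node gets an IOPS allowance that scales with its specification, and requests beyond the allowance in the current window are throttled. The spec-to-IOPS table and the fixed 1-second window are invented for illustration; real POLARDB limits and enforcement differ.

```python
# Hypothetical spec -> IOPS cap table (illustrative numbers only).
ASSUMED_IOPS_BY_SPEC = {"4c16g": 32_000, "8c32g": 64_000, "16c64g": 128_000}

class IoThrottle:
    """Caps a node's I/O per 1-second window in proportion to its spec."""

    def __init__(self, spec: str):
        self.budget = ASSUMED_IOPS_BY_SPEC[spec]
        self.used = 0

    def try_io(self, ops: int = 1) -> bool:
        """Admit the I/O if budget remains in this window, else throttle it."""
        if self.used + ops > self.budget:
            return False
        self.used += ops
        return True

    def next_window(self) -> None:
        self.used = 0  # the budget refills at the start of each window
```

Because each node enforces its own budget, a burst from one cluster cannot starve the I/O of another cluster sharing the same storage pool.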

Under the hood, data is stored in a storage pool made up of a large number of servers. For reliability, every data block is kept in three replicas, stored on different servers in different racks. The storage pool manages itself, expanding and rebalancing dynamically to avoid storage fragmentation and data hotspots.
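The rack-spreading rule above can be sketched as a placement function: pick three servers for a data block such that no two replicas share a rack. The server list format and the greedy first-fit policy are illustrative assumptions, not PolarStore's actual placement algorithm.

```python
def place_replicas(servers: list[tuple[str, str]], copies: int = 3) -> list[str]:
    """servers: (server_id, rack_id) pairs.

    Returns `copies` server ids, each on a distinct rack, so that the loss
    of one server or one whole rack leaves surviving replicas.
    """
    chosen: list[str] = []
    used_racks: set[str] = set()
    for server_id, rack_id in servers:
        if rack_id not in used_racks:   # never put two replicas on one rack
            chosen.append(server_id)
            used_racks.add(rack_id)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct racks for the requested replica count")
```

A real system would also weigh free capacity and load when choosing among eligible servers, which is what the dynamic balancing mentioned above refers to.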

A typical scenario

A Beijing-based online education company runs a cloud-hosted quiz and exam system for primary school students, with 50,000 to 100,000 users online on weekdays, 200,000 on weekends, and peaks of 500,000 to 1,000,000 during exams, while the data itself stays under 500 GB. The main difficulty is the high concurrency: heavy read/write contention and high I/O. Always buying the highest configuration would make the cost unacceptable. With POLARDB, the ability to quickly and flexibly raise the database configuration and enlarge the cluster just for peak periods cut the overall cost by 70% compared with the previous solution.

Author: B Hugh