Introduction: In the digital era there is a lot to think about. How do databases differ from what they used to be? What is a cloud-native database? As a developer, how have your database requirements changed? What do we demand of databases today? This article answers each of these questions.

I. Overview of cloud native databases

(1) Cloud computing is digital infrastructure

As we all know, cloud computing has become digital infrastructure, and society as a whole is being digitized. Digital technology permeates our daily lives: not only food, clothing, and transportation, but also education, health care, games, and more.

Take the medical field as an example. In earlier years, a hospital visit for a blood test or a chest X-ray always ended with a paper report and an X-ray printed on plastic film. In the past year or two, however, hospitals beyond the top-tier grade A ones have also largely begun to provide reports, chest radiographs, and similar materials to patients over the Internet. Digitalization in the medical field is very visible.

Once all this data is digitized, we face a very big question: on which platforms should it live, and how should it be handled? Alibaba Cloud (Aliyun) is an important link in the digitization process. The database carries the whole data lifecycle of production, integration, real-time processing, and analysis. Around the database sit hardware, security, elastic computing, and other capabilities, and together these pieces, large and small, make up the Alibaba Cloud platform.

(2) What is cloud native database technology

Cloud computing is reshaping database technology and business.

Today, upper-layer business changes very fast; Alibaba's Taobao faced the same problem internally. Rapid business change makes it a challenge for developers to adapt quickly. Before the cloud, the process was slow: setting up servers, then the network, then installing the operating system and database, and so on. The whole process took a very long time.

Developers' demands on the database can be summarized as follows.

The first is focus on the business. We want to concentrate on business development and not spend too much time configuring the underlying hardware, software, machine rooms, networks, and other facilities.

The second is out-of-the-box use. We hope the database can be used directly once it is created, with no need for complicated, time-consuming, specialist work such as configuration and tuning.

The third is security and trust, a very basic requirement when you put your data on a third-party platform.

The fourth is open compatibility. We do not want to be locked in to one cloud vendor; we want to migrate in and out freely.

The fifth is massive scalability. When business grows explosively, system load can quickly reach several times, or even tens of times, its original level. Without a database system that scales well both horizontally and vertically, supporting the business becomes very difficult.

The sixth is globalization. Many Chinese game companies have done very well developing and promoting games abroad, especially in Southeast Asia, and some games from Europe and the United States have been very successful in Japan. Developers increasingly face globalization, so the database, as infrastructure, should consider how to provide global capabilities.

The seventh is continuous availability. When we built our own database systems, continuous availability was already one of the core considerations.

In addition, there is reliability: data must not be lost.

Finally, low cost. Once the business is mature, we focus on keeping costs low.

In response to these customer demands, we considered what features the next-generation database should have, namely the product capabilities of a cloud native database, as shown below.

The first is full hosting. Users no longer need to worry about installation, deployment, backup, monitoring, high availability, and so on; one click creates an instance that already has all of these.

The second is pay-as-you-go, which keeps the cost of starting a business very low. Otherwise, the cost of machine rooms, hardware, networks, and other facilities would be very high.

The third is on-demand elasticity, which has two sides. One is scaling up: when the business is growing rapidly, the database should be able to scale up quickly as well. The other is scaling down: when the business peak is over, resource usage needs to come down quickly to reduce costs.

The fourth is ecosystem compatibility. Whether the user currently runs MySQL, Oracle, or another database, data can be migrated in and migrated out.

These are what we consider the product capabilities of a cloud native database.

A lot of technology sits behind these product capabilities.

The six core technologies are intelligence, multi-model support, hardware and software integration, security and trust, HTAP (integration of big data and database), and cloud native + distributed. They underpin the product capabilities above and address developers' needs.

(3) PolarDB, a cloud native relational database

PolarDB is a new generation of cloud native database developed by Alibaba. Built on an architecture that separates storage from compute, PolarDB exploits the combination of hardware and software to provide highly elastic, high-performance, massive-capacity, secure, and reliable database services. It is 100% compatible with MySQL 5.6/5.7/8.0 and PostgreSQL 11, and highly compatible with Oracle.

PolarDB-X is the distributed version of PolarDB. It integrates a distributed SQL engine with X-DB, a self-developed distributed storage engine, and focuses on problems such as massive data storage, ultra-high concurrent throughput, and complex computation and analysis.

(4) Product architecture of PolarDB, a cloud native relational database

PolarDB product architecture diagram

PolarDB products have the following features:

  • Separation of storage and computing

1) Elastic upgrades and downgrades within minutes

2) Read-only nodes added or removed within minutes

  • Intelligent proxy forwarding

1) Transparent database scale-out

2) Multiple consistency levels

3) Custom endpoints

  • Distributed storage

1) Support for 100 TB

2) Fast backup and recovery

3) Higher single-instance I/O capability

  • libpfs + RDMA + Optane

1) High-performance, transparent three-replica storage with RPO = 0

2) High-performance writes with high write concurrency

  • Redo-based replication

1) Millisecond-level latency on read-only instances

2) Resolves the consistency and performance problems of dual binlog/redo logs

  • Parallel execution

1) Accelerates query and analysis in some scenarios

2) Freely controllable degree of parallelism to ensure performance and stability

Here we will focus on one feature that is more relevant to the developer’s use: intelligent proxy forwarding.

Databases have a particular difficulty that application servers do not. When an application server tier is under heavy load, it is relatively easy to scale: add a group of application servers and direct the relevant traffic to them.

Databases usually cannot do this, because data is interconnected in how it is queried and used and cannot simply be split. PolarDB has an intelligent proxy layer on top, called Proxy, that solves this problem for developers. When the database is under heavy load, some queries can be distributed automatically to read-only nodes through the proxy. For example, a system running one primary node and one read-only node can be changed to one primary node and three read-only nodes, and traffic is then distributed across the three read-only nodes automatically.

You might ask: isn't this the same as adding a few replica databases?

The key problem the intelligent proxy solves is that, after read-only nodes are added, the connection configuration on the application servers does not need to change, so nodes can be added at any time. The proxy forwards each query automatically after receiving it.

Take a real business scenario. One day the front-end business team tells us that a promotion will start at 10 o'clock tomorrow morning and asks us to expand the database.

In the past, adding a read-only node could mean that the front-end application servers could not reach it at all, or that they could reach it only after a configuration change that might require restarting them. PolarDB's intelligent proxy solves this problem and makes capacity expansion easy.
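The routing idea can be sketched in a few lines of Python. This is a toy model, not PolarDB's actual proxy: the class `ProxyEndpoint`, the node names, and the rule that only plain SELECT statements count as reads are all assumptions made for illustration. What it shows is that the application keeps talking to one endpoint while read-only nodes are added behind it.

```python
import itertools


class ProxyEndpoint:
    """Toy stand-in for a proxy endpoint; not PolarDB's actual implementation."""

    def __init__(self, primary, read_only_nodes):
        self.primary = primary
        self._set_readers(read_only_nodes)

    def _set_readers(self, nodes):
        self.read_only_nodes = list(nodes)
        # Round-robin over read-only nodes; fall back to the primary if none exist.
        self._cycle = itertools.cycle(self.read_only_nodes or [self.primary])

    def add_read_only_node(self, node):
        # Scaling out happens behind the endpoint; clients keep the same address.
        self._set_readers(self.read_only_nodes + [node])

    def route(self, sql):
        # Simplifying assumption: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._cycle) if is_read else self.primary


proxy = ProxyEndpoint(primary="primary", read_only_nodes=["ro-1"])
before = [proxy.route("SELECT * FROM orders") for _ in range(3)]

# Before the promotion: add two read-only nodes without touching the application.
proxy.add_read_only_node("ro-2")
proxy.add_read_only_node("ro-3")
after = [proxy.route("SELECT * FROM orders") for _ in range(3)]
write_target = proxy.route("UPDATE orders SET status = 'paid' WHERE id = 1")

print(before)        # ['ro-1', 'ro-1', 'ro-1']
print(after)         # ['ro-1', 'ro-2', 'ro-3']
print(write_target)  # primary
```

The application code calling `proxy.route(...)` is identical before and after the scale-out, which is the property the promotion scenario depends on.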

II. Migrating traditional relational databases to a cloud native environment

(1) The challenges of replacing traditional commercial databases

Today, there are several major challenges to migrating to PolarDB from other commercial databases, such as Oracle databases.

The first challenge is tight coupling with applications. The database and the application are usually tightly coupled: any action on the database requires the front-end application to cooperate, which may affect front-end availability. Because databases usually carry the more critical business workloads, changing the database often means changing the front-end applications too.

The second challenge is stability. When the database goes down, the front-end business goes down with it, so database changes are often performed at night.

The third challenge is data volume. Because businesses are now larger, the core database usually holds more data.

The fourth challenge is syntax compatibility. Although everyone uses SQL, SQL differs from database to database. If moving from Oracle to PolarDB required extensive SQL rewriting, the transformation of the front-end business systems would be very large and complicated.

(2) Use PolarDB, a cloud native database, to replace the traditional commercial database

The replacement follows a methodical, standardized, and productized process.

Migration flow chart

On Alibaba Cloud, we provide a set of standardized processes and products to help users move from their original database to PolarDB.

First, we give the user a tool or script to run inside their own system. It collects characteristics of the user's database: which features it uses, including SQL, stored procedures, and functions; which of them do not match what the target database supports; and traits of the source database itself, such as whether it is under particularly heavy system load or has particularly pronounced hot data. When these points are detected, the user is told what to watch for in the later stages of the transformation.

The table above was produced by running the script against a real business system.

From this table, we can see that the original database is highly compatible overall when migrating to PolarDB. We probed a total of 6029 objects, which may include stored procedures, tables, indexes, sequences, synonyms, and related items, and only two of them were incompatible, a relatively small number. The report identifies the two specific tables and gives concrete suggestions for changing them, after which the migration can proceed.
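To put the two figures from the report in proportion, the compatibility rate can be worked out directly. The short Python calculation below uses only the numbers quoted above; the variable names are illustrative.

```python
total_objects = 6029   # objects probed, as quoted from the report
incompatible = 2       # objects flagged as incompatible

compatible = total_objects - incompatible
compatibility_rate = compatible / total_objects

print(f"{compatible} of {total_objects} objects compatible "
      f"({compatibility_rate:.2%})")
# 6027 of 6029 objects compatible (99.97%)
```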

The following is a more specific process, which will not be elaborated here.

Alibaba Cloud has worked with the China Academy of Information and Communications Technology to turn this standardized, productized process into a standard guide for database migration. Developers can look it up online and follow it when migrating a database.

III. Managing the PolarDB O engine (compatible with Oracle syntax)

(1) PolarDB provides full-stack compatibility with Oracle

PolarDB is compatible with Oracle in many respects: in addition to the syntax layer, compatibility extends to the physical storage layer, the logical layer, and the interface layer.

(2) Managing the PolarDB O engine (compatible with Oracle syntax): common tools

If users migrate from Oracle, how will using and managing PolarDB differ?

For management, users can use DMS, Alibaba Cloud's data management platform: find the entry for logging in to the database on the console, then log in to DMS, as shown below.

Users can also use pgAdmin, an open source data management platform. It covers basic data management operations, including viewing basic information, querying data, and inspecting execution plans, tables, and other objects, as shown below.

IV. PolarDB O engine (compatible with Oracle syntax) development practice: basic database specifications

Managing the PolarDB O engine (compatible with Oracle syntax): development specifications (1)

In addition, Alibaba Cloud has a set of commonly used development specifications. They were worked out internally, are also known as the regulations, and are strictly followed within Alibaba. They will be published in the developer community and in Alibaba Cloud's documentation in the future. The specifications cover several areas; those most relevant to developers using PolarDB are briefly described below.

Some of the specifications are mandatory internally and others are recommendations; users can adopt them according to their own situation.

Above is the table specification. For example, field names must consist of lowercase letters and numbers and must not be keywords. Why? Because a field name is costly to change, and the change usually cannot be “pre-posted.”

In production we found that changing a field name is very cumbersome. The business is already running, so changing a field name means the business system can no longer run properly. In practice, most teams added a new field instead. That is why we have rules for field names, such as lowercase only and no keywords.

The second concerns required fields: every table should have create_time and update_time fields. This has several benefits. First, if something goes wrong with the data, you can quickly tell when a row was created and last changed. Second, when an upstream or downstream system needs to pull changed data, it can quickly find which rows have changed and process them accordingly.

In addition, every table must have a primary key, for two reasons. First, queries by primary key perform very well. Second, when a downstream system pulls changed data, it can fetch rows quickly through the primary key.
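The table rules described here are mechanical enough to check automatically. The following Python sketch is one possible checker, not an Alibaba tool: the function `check_table`, the short keyword list, and the decision to allow underscores (the required field names themselves contain them) are assumptions for illustration.

```python
import re

# Illustrative subset of reserved words; a real check would use the full
# keyword list of the target database.
RESERVED_WORDS = {"select", "table", "order", "group", "user", "index"}
REQUIRED_FIELDS = {"create_time", "update_time"}
# Assumption: underscores are allowed, since the required field names use them.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def check_table(fields, primary_key):
    """Return a list of rule violations for one table definition."""
    problems = []
    for name in fields:
        if not NAME_PATTERN.match(name):
            problems.append(f"field '{name}': use lowercase letters and digits only")
        elif name in RESERVED_WORDS:
            problems.append(f"field '{name}': reserved keyword")
    for missing in sorted(REQUIRED_FIELDS - set(fields)):
        problems.append(f"missing required field '{missing}'")
    if not primary_key:
        problems.append("table has no primary key")
    return problems


good = check_table(["id", "buyer_id", "create_time", "update_time"], primary_key=["id"])
bad = check_table(["ID", "order", "create_time"], primary_key=[])
print(good)  # []
for problem in bad:
    print(problem)
```

Running such a check before a table is created is far cheaper than renaming a field once the business is live.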

There is also a series of index specifications, as shown in the figure above.

The specification says that the columns of an index should be created in a deliberate order, matching the order of the fields in the WHERE condition and in the ORDER BY clause. Overall performance is better only when the index order matches both.
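The effect of index column order can be observed with any SQL engine that exposes its query plan. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; PolarDB's planner and plan output differ, and the table `orders` and index `idx_buyer_time` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, buyer_id INTEGER, "
    "create_time TEXT, amount REAL)"
)
# Column order follows the query: equality filter first, then the sort column.
conn.execute("CREATE INDEX idx_buyer_time ON orders (buyer_id, create_time)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# WHERE and ORDER BY both line up with the index, so no separate sort is needed.
matching = plan("SELECT * FROM orders WHERE buyer_id = 7 ORDER BY create_time")
# ORDER BY uses a column outside the index, so the engine must sort the matches.
mismatched = plan("SELECT * FROM orders WHERE buyer_id = 7 ORDER BY amount")

print(matching)
print(mismatched)
```

In SQLite, the second plan is expected to contain an extra `USE TEMP B-TREE FOR ORDER BY` step, which is the sort that a well-ordered index avoids.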

In addition, use covering index queries wherever you can; they greatly improve efficiency.
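A covering index query is one whose selected columns are all contained in the index, so the table itself never has to be read. The Python sketch below again uses sqlite3 as a stand-in engine with a hypothetical `orders` table; it illustrates the concept and says nothing about PolarDB's own plan output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, buyer_id INTEGER, "
    "create_time TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_buyer_time ON orders (buyer_id, create_time)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every selected column is in the index, so the table itself is never read.
covered = plan("SELECT buyer_id, create_time FROM orders WHERE buyer_id = 7")
# 'amount' is not in the index, so each match needs an extra table lookup.
not_covered = plan("SELECT buyer_id, amount FROM orders WHERE buyer_id = 7")

print(covered)
print(not_covered)
```

SQLite marks the first plan with `USING COVERING INDEX`; the second falls back to a plain index search followed by lookups into the table.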

The specification also recommends optimizing deep pagination with a deferred join or a subquery, which reflects our experience with index optimization. In a paged query, when the user jumps to page 500 or page 1000 and each page shows 10 rows, first find only the primary key IDs of those 10 rows, then go back to the table with those IDs to fetch the complete rows. This is a common recommendation.
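The two query shapes can be compared side by side. The Python sketch below uses sqlite3 and a hypothetical `articles` table to show that the deferred-join form returns the same page as the naive form; it demonstrates the query pattern only and does not measure the performance gain, which depends on the engine and on row width.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [(i, f"title {i}", "x" * 200) for i in range(1, 20001)],
)

PAGE_SIZE, PAGE = 10, 1000
offset = (PAGE - 1) * PAGE_SIZE

# Naive deep paging: full rows are read and discarded for every skipped row.
naive = conn.execute(
    "SELECT id, title, body FROM articles ORDER BY id LIMIT ? OFFSET ?",
    (PAGE_SIZE, offset),
).fetchall()

# Deferred join: page through primary keys only, then fetch just those rows.
deferred = conn.execute(
    "SELECT a.id, a.title, a.body FROM articles AS a "
    "JOIN (SELECT id FROM articles ORDER BY id LIMIT ? OFFSET ?) AS page "
    "ON a.id = page.id ORDER BY a.id",
    (PAGE_SIZE, offset),
).fetchall()

print(naive == deferred)              # True
print([row[0] for row in deferred])   # ids 9991 through 10000
```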

The index specification also calls for attention to field types: avoid implicit type conversions as far as possible, because an implicit conversion can prevent the index from being used at all.

Managing the PolarDB O engine (compatible with Oracle syntax): development specifications (2)

SQL and operations also have a number of specifications, and I will focus on a few of them here.

The first is data correction. Before modifying data, developers must first query the rows in question and review them, and only then delete; otherwise it is easy to delete the wrong data.

We also recommend the data management product DMS. When you correct data through DMS, you can tick the backup option, and DMS automatically backs up all the data about to be corrected. If the correction goes wrong, you can find that automatic backup and restore the data.
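The same habit of querying first, backing up, and only then deleting can be followed by hand when no tool does it for you. The Python sketch below uses sqlite3 with hypothetical `orders` and `orders_backup` tables; it mirrors the practice described above and is not how DMS implements its backup option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE orders_backup (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(1, "paid"), (2, "test"), (3, "test"), (4, "shipped")],
)
conn.commit()

condition, params = "status = ?", ("test",)

# 1. Query first: inspect exactly which rows the correction will touch.
preview = conn.execute(
    f"SELECT id, status FROM orders WHERE {condition} ORDER BY id", params
).fetchall()
print("rows to delete:", preview)

# 2. Keep a copy of the affected rows before changing anything.
conn.execute(f"INSERT INTO orders_backup SELECT * FROM orders WHERE {condition}", params)
conn.commit()

# 3. Delete, and roll back if the affected row count differs from the preview.
cursor = conn.execute(f"DELETE FROM orders WHERE {condition}", params)
if cursor.rowcount == len(preview):
    conn.commit()
else:
    conn.rollback()

remaining = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
backed_up = conn.execute("SELECT id FROM orders_backup ORDER BY id").fetchall()
print(remaining, backed_up)  # [(1,), (4,)] [(2,), (3,)]
```

Reusing one `condition` string for the preview, the backup, and the delete ensures that all three statements act on the same set of rows.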

The remaining specifications will be released in the developer community and in Alibaba Cloud's documentation in the future.

V. PolarDB O engine (compatible with Oracle syntax) development practice: common SQL optimization

(1) Managing the PolarDB O engine (compatible with Oracle syntax): SQL optimization case 1, parallel query

For queries that involve complex calculation, parallel query can greatly speed up execution.

The above is a simple example: a GROUP BY with a very simple calculation. When a query needs to scan a lot of data, enabling parallel query can cut the time from more than 100 seconds to 10 seconds. This is a handy trick for PolarDB users.
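Conceptually, a parallel GROUP BY splits the scan into slices, aggregates each slice in a separate worker, and merges the partial results. The Python sketch below illustrates that structure only; it is not PolarDB's executor, the data and function names are made up, and Python threads will not reproduce the speedup quoted above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of (category, amount); stands in for a large table scan.
rows = [("book", 12), ("toy", 5), ("book", 8), ("food", 3), ("toy", 7), ("food", 4)] * 1000


def partial_sum(chunk):
    """Aggregate one slice of the table, as a single worker would."""
    totals = Counter()
    for category, amount in chunk:
        totals[category] += amount
    return totals


def parallel_group_sum(data, degree_of_parallelism):
    # Split the scan into roughly equal slices, one per worker.
    size = -(-len(data) // degree_of_parallelism)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Final step: merge the per-worker partial results.
    merged = Counter()
    for part in partials:
        merged.update(part)
    return dict(merged)


serial = parallel_group_sum(rows, 1)
parallel = parallel_group_sum(rows, 4)
print(parallel == serial)        # True
print(sorted(parallel.items()))  # [('book', 20000), ('food', 7000), ('toy', 12000)]
```

The `degree_of_parallelism` argument plays the role of the controllable degree of parallelism mentioned in the feature list: more workers mean smaller slices, at the price of more resources in use at once.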

(2) Managing the PolarDB O engine (compatible with Oracle syntax): SQL optimization case 2, choosing the appropriate join method

PolarDB supports hash join, merge join, and nested-loop join. Users can choose the appropriate join method for each scenario.

As you can see, in the above example the nested-loop join is the fastest.
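For readers unfamiliar with the three join methods, the Python sketch below implements textbook versions of each on small in-memory lists. These are generic algorithms, not PolarDB's implementations, and the sample rows are hypothetical. All three produce the same result; they differ in how much work they do for a given input size and ordering, which is why the best choice depends on the scenario.

```python
def nested_loop_join(left, right):
    # Compare every left row with every right row.
    return [(a, b) for a in left for b in right if a[0] == b[0]]


def hash_join(left, right):
    # Build a hash table on one input, then probe it with the other.
    table = {}
    for b in right:
        table.setdefault(b[0], []).append(b)
    return [(a, b) for a in left for b in table.get(a[0], [])]


def merge_join(left, right):
    # Sort both inputs on the key, then advance two cursors in step.
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair each left row carrying this key with the matching right run.
            while i < len(left) and left[i][0] == key:
                out.extend((left[i], b) for b in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical rows: (key, payload), joined on the first element.
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(2, "pen"), (1, "ink"), (2, "pad"), (4, "cap")]

results = [sorted(f(users, orders)) for f in (nested_loop_join, hash_join, merge_join)]
print(results[0] == results[1] == results[2], len(results[0]))  # True 3
```

A nested-loop join tends to win when one side is small or an index makes each inner lookup cheap, while hash and merge joins pay a setup cost (building the table, sorting) that is recovered on larger inputs.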

VI. Cases and recognition

(1) A complete database ecosystem

PolarDB is a standalone product, but it sits within a very complete product ecosystem that includes data management (DMS), database autonomy service (DAS), data transmission (DTS), database backup (DBS), and data and application migration (ADAM). Together they cover users' various scenarios and provide a full range of services.

(2) Case: PolarDB helps PrestoMall smoothly migrate from Oracle to the cloud

PrestoMall is a Southeast Asian e-commerce company founded in 2014. To cope with the rapid growth of its business, PrestoMall migrated smoothly from Oracle to the cloud with the help of PolarDB.

The main business challenges behind the migration were as follows:

  1. As the business grew rapidly, IT costs rose, and Oracle is expensive;
  2. The application could scale horizontally with business growth, but the database lacked elasticity;
  3. The migration was complex and the team lacked experience, so it wanted professional evaluation and guidance;
  4. Keeping migration cost optimal and controlling risk were difficult problems.

Based on the customer's business needs, we developed a plan to migrate to the PolarDB O engine (compatible with Oracle syntax) because:

  1. As a cloud database, the PolarDB O engine (compatible with Oracle syntax) carries no expensive license fees;
  2. Its cloud native elasticity solves the customer's database elasticity problem;
  3. ADAM provides professional database and application compatibility evaluation reports and a comprehensive migration plan; combined with the high Oracle compatibility of the PolarDB O engine, this greatly improves transformation efficiency;
  4. DTS live migration and backflow, together with expert services, dramatically reduce cutover time and risk.

After the move to the PolarDB O engine (compatible with Oracle syntax), the migration delivered the following customer value:

  1. The PolarDB O engine (compatible with Oracle syntax) successfully supported the customer's business while reducing overall IT costs by 40%;
  2. The PolarDB O engine (compatible with Oracle syntax) scales elastically, so load peaks are handled with ease;
  3. ADAM together with the PolarDB O engine (compatible with Oracle syntax) helped the customer reduce the cost of code transformation by 93%;
  4. The cutover was completed smoothly and on schedule, and the business runs stably.

(3) PolarDB, a widely recognized cloud-based relational database

PolarDB is widely recognized in the industry, with more than 10 papers at top academic conferences, a first prize from the Chinese Institute of Electronics this year, and a number of other prestigious honors.

This article is original Alibaba Cloud content and may not be reproduced without permission.