Introduction | Every era brings new technologies, and their endless emergence keeps reinvigorating database technology. Distributed databases such as Spanner, CockroachDB and TDSQL are the trendsetters of this era. This article is organized from the speech "Evolution of Distributed Databases" given by Tencent Cloud database expert engineer Li Haixiang at the Techo TVP Developer Summit "Song of Ice and Fire of Data — from Online Database Technology to Mass Data Analysis Technology". It walks you through distributed database architecture, cutting-edge technology, and TDSQL engineering practice, so you can experience the technical beauty of distributed databases.
Distributed database architecture
What I share today focuses on database technology, and is closely tied to the development of Tencent's distributed database technology over the past decade. There are three main parts: first, the historical development and evolution of distributed databases; second, the core technical content of distributed databases and the knowledge points around it; third, the frontier work Tencent has done in TDSQL. TDSQL is a distributed HTAP database system with a particular emphasis on strong consistency. In 2017–2018 we proposed the concept of a "full-temporal database" and implemented a cluster architecture we called HTAC, which is very close to HTAP: HTAC is our engineering term, summed up by the theoretical term HTAP. We launched our original product at that time, and its evolution over the last two years has focused on strong consistency. Last year we launched a product grounded in both theory and practice that clearly articulated the concept of "strong consistency". After a period of internal polishing, the TDSQL release containing this technology will soon be available on TDSQL public cloud and other products.
1. Overview of distributed system classical architecture
Let's start with the first part, the evolution of distributed databases. What does this picture show? It shows the infrastructure options: Shared Nothing, Shared Memory, Shared Disk, and Shared Everything. What are these, and where did they come from? Hardware is the foundation of software: the development of the hardware layer determines the development of software technology. Once the basic hardware framework is in place, database software and system- and application-layer software are stacked on top of it like building blocks. The inside of a database is the same: it is divided into modules and layers, which are then assembled together. But databases are characterized by strong coupling; once assembled, they are hard to take apart. A current trend in distributed databases is to try to decompose these pieces, like taking building blocks apart and rebuilding them: decide which components are needed where, decouple module from module, make assembly easier after decoupling, and build systems that will be more scalable in the future. The underlying foundation of a distributed database system is closely tied to hardware.
2. Classic mainstream technology of distributed system architecture
Let me show you the representative database technologies from a technical point of view. The first person in this picture is the second Turing Award winner from the database field — Dr. Edgar F. Codd, the founder of the relational model, who laid the foundation for relational databases with a paper in 1970. In 1974, two typical technologies were born: one is the SQL language, the other is transaction processing technology. Around that time, Jim Gray, the third Turing Award winner from the database field, began to study transaction processing and produced a series of groundbreaking results; the transaction processing framework established in the 1970s remains with us today. In the same year, IBM, which was then building database systems, developed the pioneering structured query language, SQL.
After that came the ER model, the entity-relationship model that helps us model database applications. In the development of database technology there have been many models, including the relational model and, before 1970, the hierarchical and network models. These models share the same original intention as the ER model: to map the real world from the perspective of data and data hierarchy, so that the data world can express and compute about the physical world. The ER model just happened to end up being used mainly for relational modeling (textbooks capture the essence, but a reader's understanding of it may no longer be comprehensive), yet the ideas behind it apply equally to the relational and hierarchical models. If we look back at history and restore its original intention, we can see some of these basic things from history itself.
By 1980, cost-based query optimization techniques had emerged in the database world, making it possible to select a near-optimal execution plan. Databases then evolved executors based on the Volcano model, which pushed database technology further forward. As the figure shows, database technology progressed from having no transactions to establishing the concept of a transaction. In 1993 a fork appeared between AP and TP, again attributable to Dr. Codd: in addition to the relational model, he put forward the concept of OLAP, online analytical processing. From then on, the main line split into OLTP and OLAP branches.
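To make the Volcano model mentioned above concrete, here is a minimal sketch (class and function names are invented for illustration, not from any real engine): every operator exposes a `next()` interface and pulls one row at a time from its child, so a query plan executes as a chain of iterators.

```python
# Toy Volcano-model (iterator) executor: each operator pulls rows from its
# child via next(); None signals end-of-stream.

class Scan:
    def __init__(self, rows):
        self.rows = iter(rows)

    def next(self):
        return next(self.rows, None)  # None = end-of-stream

class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        row = self.child.next()
        while row is not None and not self.predicate(row):
            row = self.child.next()   # skip rows failing the predicate
        return row

class Project:
    def __init__(self, child, columns):
        self.child, self.columns = child, columns

    def next(self):
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.columns}

# Roughly: SELECT name FROM users WHERE age >= 30, as a pulled pipeline.
users = [{"name": "ann", "age": 34}, {"name": "bob", "age": 25}]
plan = Project(Filter(Scan(users), lambda r: r["age"] >= 30), ["name"])
out = []
row = plan.next()
while row is not None:
    out.append(row)
    row = plan.next()
```

The pull-based design is why operators compose freely in a plan tree: each one only needs its child's `next()`, regardless of what that child is.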
As the years went on, something interesting happened in 2014: Gartner, an industry research firm rather than an academic institution, introduced a concept called HTAP, which aims to add analytics capability to transactional systems. The concept has been popular in recent years and seems to remedy a weakness of transactional databases, whose analytical ability is poor under heavy transaction processing (separating the concepts changed the guiding ideology, but that separation has its drawbacks). There is always a desire to do everything in one system; it has to do with human needs and changing perceptions.
But before that, in 2012, Google's Spanner system was born, marking the shift from NoSQL back to embracing SQL and database transaction processing technology — the NewSQL systems.
The technologies mentioned above are the classical technologies of databases; every database, standalone or distributed, is built on them. Although TDSQL is a distributed database, 90% or more of its basic core functionality comes from the standalone database system, so the evolution of the technology stands on what came before: distributed database technology is inseparable from the relational model, transaction processing, and the other fundamentals we discussed. I therefore believe you cannot build a distributed database without the standalone database system; to understand distributed databases you should start from the standalone system, which already contains, quite comprehensively, the internal organs of a distributed database. On top of those fundamentals, however, the distributed database faces new challenges brought by the change in system architecture.
Let's take the MySQL database system architecture as an example and look at what modules and components a standalone database system contains.
The SQL in the upper left corner is the entry point; a statement travels through the database system and its results are returned to the user. On the left is MySQL Server and on the right is its storage engine. The entire database can actually be divided into three layers: the Server on the left, the storage engine on the right, and, below the storage engine and tightly integrated with the operating system, the layer that deals with external files. The Server receives and parses the user's SQL, like a compiler: parsing the statement yields a syntax tree, which the query optimizer transforms into a logical query plan and then a physical query plan, applying many optimizations along the way, such as subquery optimization and expression rewriting and reduction. The plan is then handed to the executor, and the executor is tightly bound to the storage system. The execution layer has two parts: one is the executor, which runs the various kinds of SQL statements — DDL, DML, DQL, and so on — and orthogonal to it is transaction processing. Under the control of the transaction processing system, the various SQL statements execute with high concurrency, passing through the various modules.
Looking at the modules from the bottom up: at the bottom of the database system are files, because the system sits on an operating system and organizes data in operating system files. So at the bottom are physical files, which have their own formats — the various data formats the database defines for itself. The data divides into two parts: user data, and the log data the database system itself must maintain. Data is read and written through physical I/O, which is handled between the executor and the storage engine. When data is read into memory, each database has its own specific format that must be parsed: what arrives are physical pages, which are first loaded into the buffer pool and then format-converted — the physical pages are parsed into individual records and columns for the upper layers to use. Once parsing is complete, suppose two clients connect and read and write the same data: there is concurrency, which may produce data anomalies. This is where the transaction processing system comes in, to avoid anomalies and ensure data consistency; the computed results are then returned up through the Server. A distributed database system cannot do without this execution flow, nor without the basic modules and components it contains.
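The page flow described above can be sketched in a few lines. This is a toy model under invented assumptions (a 16-byte "page" holding four integers, a dict as the cache), not any real engine's format: physical pages are read from a file, cached in a buffer pool, and parsed into records before the executor can use them.

```python
# Toy buffer pool: pages are fetched from disk on a cache miss, then
# "format-converted" (parsed) into records and cached for reuse.
import struct

PAGE_SIZE = 16  # toy page: four 4-byte big-endian integers

def write_pages(path, pages):
    with open(path, "wb") as f:
        for page in pages:
            f.write(struct.pack(">4i", *page))

class BufferPool:
    def __init__(self, path):
        self.path = path
        self.cache = {}                  # page_no -> parsed records

    def get_page(self, page_no):
        if page_no not in self.cache:    # cache miss: physical I/O
            with open(self.path, "rb") as f:
                f.seek(page_no * PAGE_SIZE)
                raw = f.read(PAGE_SIZE)
            # parse the physical page into records for the upper layers
            self.cache[page_no] = list(struct.unpack(">4i", raw))
        return self.cache[page_no]

write_pages("pages.bin", [[1, 2, 3, 4], [5, 6, 7, 8]])
pool = BufferPool("pages.bin")
records = pool.get_page(1)               # physical I/O, then cached
```

Real systems add eviction, dirty-page writeback, and latching on top of this basic miss-then-parse loop, but the layering is the same.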
Database systems develop and evolve gradually. Everyone who worked on standalone databases in the early days is familiar with the master-slave architecture. MySQL's master-slave replication is based on a mixture of logical and physical formats, but leans toward logical transmission of data between master and slave. A step beyond the pure standalone system is something that resembles distribution: there are more physical nodes, but still one master with multiple replicas used only for reads — not yet a true distributed database system. So the development of database architecture really falls into two generations: the first is the pure standalone system, the second is the distributed system, and between them is a transitional phase I call generation 1.5, which still belongs to the standalone world — the master-slave architecture. Databases typically replicate based on the physical log, as in Oracle and PostgreSQL streaming replication, but MySQL's master-slave replication is based on a logical log format.
Seven or eight years ago, Amazon's Aurora system was born. It was still master-slave in nature — one master, many standbys — and what it improved was becoming a storage-compute-separated system on the cloud. So the products of the 1.5 era, typified by the early master-slave architecture plus Aurora, belong to a transitional era. What is the hallmark of a truly distributed database? Writes are peer-to-peer: every node can accept writes. The technologies here vary widely — some are pseudo-distributed, concentrating all write-transaction operations on a single node, while a truly distributed system uses distributed concurrency control algorithms to guarantee data consistency on every node.
3. Summary
The evolution of basic database architecture has gone through such a process. To sum up, what factors drive the evolution toward distributed database systems from a technical perspective? Database systems have some inherent, essential requirements pushing them forward. We used to speak of the "three highs and one ease" of database systems: high reliability, high availability, high performance, and ease of use. Driven constantly by these basic factors, the technology eventually evolved into the distributed database system, putting the requirement of scalability on the agenda. So my first summary point is scalability — not just vertical scaling but horizontal scaling — and for the multi-read, multi-write scaling scenario, the structure of a distributed database becomes a purely peer-to-peer structure. In a distributed system, availability must be emphasized: availability at the data level, such as multi-replica mechanisms under a consensus protocol, and availability at the level of overall system functionality. And if you get high performance at lower cost, you can support more business externally. So on the way from the standalone system to the distributed system, the inherent requirements of the database have not changed. This is the first part of my share.
Distributed transactions and consistency
1. Data anomalies
In the second part, let's look at what the technical aspects of a distributed database system involve. The core of a transactional distributed database system must be transaction processing technology. Database operations, once abstracted, come in two kinds: reads and writes. Under concurrency, when at least two transactions read and write the same data item, data anomalies may occur.
The figure on the left shows that when the reads and writes of two transactions are crossed, there are 2×2 = 4 cases. Of the four, only read-read produces no data anomaly; all other combinations can produce anomalies. The root cause of data anomalies is concurrent reads and writes of the same data items, and transactions are designed to solve this problem. A transaction has the four ACID properties, in which C is consistency and I is isolation. C and I are really about the same thing, like two sides of a coin: full isolation guarantees consistency, while weaker isolation levels relax consistency and allow some anomalies to exist. That is what the right side shows — some specific data anomalies that can occur.
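One of the anomaly-producing combinations above can be shown in a few lines. This is a deterministic, hand-interleaved sketch of a lost update (the write-write conflict case), not real database behavior: two "transactions" read the same item, then both write it, and one update disappears.

```python
# Lost update: two transactions read the same balance, then both write it;
# the second write silently discards the first. Interleaving is hard-coded
# so the anomaly is deterministic.

balance = {"acct": 100}

t1_read = balance["acct"]        # T1 reads 100
t2_read = balance["acct"]        # T2 reads 100 (concurrent read, same item)
balance["acct"] = t1_read + 50   # T1 deposits 50 -> 150
balance["acct"] = t2_read + 30   # T2 deposits 30 -> 130; T1's +50 is lost

# Any serial order of the two deposits would end at 180, not 130.
assert balance["acct"] == 130
```

This is exactly what isolation is meant to prevent: under a serializable schedule the final balance must match some serial execution, here 180.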
This figure summarizes some of the data anomalies, but there are more than these few, and TDSQL is conducting a systematic study of how many data anomalies there are. So far, no one has been able to say exactly how many anomalies exist.
The SQL standard defines four data anomalies and four isolation levels. A 1995 paper co-authored by Jim Gray defines eight data anomalies and eight isolation levels. Now, if a ninth data anomaly were suddenly discovered, under which isolation level of these two systems should it be placed? Such questions could not be answered before. This is one of the problems TDSQL set out to solve during the research and development of its distributed database: only after answering it can a system be truly strong. Fortunately, we have given a clear answer to this basic question and, upholding Tencent's open source spirit, have open-sourced the research results in the 3TS system (Tencent Transaction Processing Testbed System).
The techniques that resolve data anomalies are concurrency control algorithms, of which there are many: lock-based, timestamp-based, optimistic, and so on. The TDSQL open source project for this basic research is the Tencent Transaction Processing Testbed System; our system is called 3TS because it takes the three initial T's in that name.
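As an illustration of the lock-based family mentioned above, here is a minimal sketch of strict two-phase locking (all names invented; real lock managers also handle shared locks, deadlock detection, and lock queues): a transaction acquires locks as it goes and releases none of them until commit, which is what prevents the lost update shown earlier.

```python
# Strict two-phase locking (S2PL), write locks only: growing phase acquires,
# shrinking phase releases everything at commit.
import threading

class LockManager:
    def __init__(self):
        self.locks = {}                   # data item -> threading.Lock
        self.guard = threading.Lock()     # protects the lock table itself

    def lock_for(self, item):
        with self.guard:
            return self.locks.setdefault(item, threading.Lock())

class Transaction:
    def __init__(self, lm):
        self.lm = lm
        self.held = []

    def write(self, store, item, value):
        lock = self.lm.lock_for(item)
        lock.acquire()                    # growing phase: never release early
        self.held.append(lock)
        store[item] = value

    def commit(self):
        for lock in self.held:            # shrinking phase: release at commit
            lock.release()
        self.held.clear()

lm, store = LockManager(), {}
t1 = Transaction(lm)
t1.write(store, "x", 1)
t1.write(store, "y", 2)
t1.commit()                               # both locks released together
```

Holding every lock until commit is what makes the protocol "strict": a concurrent transaction cannot read or overwrite `x` between T1's write and T1's commit, at the cost of blocking (and, in real systems, possible deadlock).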
There is one more data anomaly that is especially important to distributed transaction technology: read half committed. This anomaly arises from the physical distribution of the system. A data item on one node has committed, while its counterpart in the same transfer on another node has not yet committed. When a second transaction then reads both nodes, it reads the committed data on the first node but cannot read the uncommitted data on the second; that is, what it reads on the uncommitted node is old data, and consistency between that old data and the newly committed data cannot be guaranteed. This produces the anomaly called read half committed, and it is the problem that distributed database transaction processing must solve: consistency.
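The read-half-committed anomaly just described can be made concrete with a toy two-node transfer (dicts stand in for the two nodes' committed state; the interleaving is hard-coded): the commit lands on node A before node B, and a reader in between sees money vanish.

```python
# Read half committed: a cross-node transfer commits on node A first; a
# concurrent reader sees the new value on A but the old value on B.

node_a = {"acct1": 100}   # committed state visible to readers on node A
node_b = {"acct2": 100}   # committed state visible to readers on node B

# T1 transfers 50 from acct1 (node A) to acct2 (node B).
node_a["acct1"] = 50      # T1's commit applies on node A first...

# ...and before node B applies its part, T2 reads both nodes:
seen = node_a["acct1"] + node_b["acct2"]   # 50 + 100 = 150

node_b["acct2"] = 150     # node B's part of the commit lands too late

# A consistent snapshot always totals 200; T2 observed 150.
assert seen == 150
```

Preventing this requires a global consistency mechanism spanning nodes — for example a globally consistent read point — which is exactly the distributed-consistency problem discussed next; atomic commit alone does not stop the in-between read.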
2. Consistency challenges and industry solutions
But distributed database systems face more problems than these. As the database expands from a centralized system into logically independent subsystems that must still behave like a single database system, it faces new challenges. A standalone database must guarantee ACID, and a distributed system must guarantee ACID too; but a distributed system also has distributed consistency — linearizability, sequential consistency, causal consistency, read-your-writes, monotonic writes, monotonic reads, and so on. When the two meet, new problems arise, and these are what a distributed system must solve.
The problems we face are more complicated still. The following diagram summarizes the various notions of distributed consistency; it is very complex, with around 60 different kinds. Understanding this diagram and each notion in it is not easy, and it becomes even harder when combined with transactional ACID.
The industry has studied the consistency of distributed transactions. Let me explain what this picture is about: the upper part is transactional consistency, the lower left corner is distributed consistency, and the top right corner is transactional consistency under isolation levels — what the database's transaction processing handles. But in the red box in the picture, where distributed consistency and transaction processing meet, there has so far been no relevant theory or technology to support it, and this is exactly the problem TDSQL is committed to solving in building a distributed database system. In the industry, Google's Spanner achieves true strong consistency; it is the only system I have seen so far, apart from TDSQL, that does. Spanner fuses linearizability with ACID data consistency — what the industry calls strict serializability. It can guarantee data consistency globally across a distributed system, and that is real strong consistency (note the emphasis on "global": reading from any node, everyone reads the same result; different nodes cannot obtain different observations).
However, if you test Spanner or derive its behavior theoretically, you will find that its transaction processing performance is very poor. Where is Spanner used? In processing data for non-real-time advertising systems. Have you ever heard of Spanner being used in a financial system? No. This poses a new challenge for TDSQL, because TDSQL serves financial business: we need to guarantee both correctness and performance.
Building a distributed database also requires the MPP techniques needed for highly concurrent executors.
From the beginning we mentioned that databases are built on hardware. Constrained by hardware, what problems does a distributed database face? We are studying how it relates to new hardware. The previous speaker just talked about Oracle 21c, which combines persistent memory with the database, but in a standalone setting; TDSQL is doing the combination in a distributed system. Early on we worked with hardware such as SSDs; now we are working with persistent memory and RDMA.
All of this is driven by an external demand: the change in data volume, which has several dimensions. The first is important but you may not be fully aware of it: metadata. For a distributed database system, metadata grows dramatically and poses new challenges to the database. What else is there? Everyone is talking about AI for DB. An AI system needs a large amount of data, and that data must be stored somewhere. Some systems store the data AI needs outside the database, but we are considering whether it can be stored inside the database — and if so, in what form. These are all basic problems to be studied in distributed technology.
TDSQL strong consistency practice based on HTAP
A distributed database therefore covers a great deal: transaction processing, data storage, computation, and more. Based on these requirements we carried out basic research and built the TDSQL HTAC system, which contains all of this. This is the infrastructure diagram of our system: it looks like a standalone system, but many small details show that it is distributed.
This system involves a variety of technologies, and at their core is what I just discussed about transaction processing — how to guarantee strong consistency.
According to our research, there are infinitely many data anomalies — an infinite number of things. How do you recognize them? That is a new challenge. One job TDSQL has done is to classify data anomalies, reducing them to a finite number of categories so they can be understood. On that basis we can define what an isolation level is, what consistency is, and how they influence each other; examine all existing concurrency control algorithms; and work out how to combine consistency with distribution. These are some of the basic technologies TDSQL is working on, and the basic content of the consistency technology I just described.
Let me share some of the work we have done. The top left corner is academia's understanding; the lower left corner is the degree to which existing industry products support strong consistency; the lower right corner is the result we produced. We turned the tree in the top left corner into the net in the lower right corner, meaning that many seemingly unrelated nodes of the tree are in fact intertwined — that is the technical content involved in strong consistency.
Based on the above, we developed multi-level consistency technology, that is, multiple levels of serializability, so that in a distributed system serializability can be divided into several levels. We bought the Spanner service on Google Cloud and compared against Spanner, implemented multi-level consistency on Greenplum, and compared multiple concurrency control algorithms on an open source system from MIT. The experimental results showed that TDSQL performs better (the work was publicly shared at DTCC 2020).
Finally, what are the challenges and problems of distributed databases? I list them in a directory structure with two parts: the left part covers the data processing technologies driven by the database's own internal demands, the right part covers the driving factors from the database's external environment. You can look through the basic factors listed in this directory to understand the technical points associated with distributed databases. The internal requirements relate to distribution and storage architecture: data distribution, storage management, multiple replicas, storage-compute separation, multi-read and multi-write, query optimization, MPP. This matches what I shared earlier and relates to high availability and transaction processing — the inherent requirements of distributed databases that keep driving the technology forward.
From the outside: new hardware, intelligent databases, and cloud computing are computing demands on the database system, while HTAP and the next generation of so-called NewSQL show databases still evolving, with new technologies and content emerging along the way. So building a distributed database mostly means starting from the standalone database system and then adding the distributed parts — roughly the mainstream technologies I mentioned earlier — and this directory structure covers the basic core content not yet discussed.
One point in particular is decentralization. A distributed database must consider decentralization, that is, every node being a peer. When considering a concurrency control algorithm, ask: is the algorithm decentralized? Are the nodes' relationships equal? All of this must be considered.
Finally, if you are interested in the core basic technology, you can follow our 3TS system, whose research objectives mainly include: what distributed transaction processing should do — for example its consistency, scalability, security, and performance, and the relationships among them; how to evaluate a distributed database system; how many data anomalies there are (we have answered that they are countless, but can be classified into a finite set); and how to improve technologies such as concurrency control algorithms.
I look forward to in-depth exchanges with you. Thank you.
About the speaker
Li Haixiang
Tencent Cloud database expert engineer, responsible for Tencent TDSQL research and development; master's industry tutor at the School of Information, Renmin University of China; member of the CCF Database Technical Committee and the DTCC expert committee. He has published The Art of the Database Query Optimizer: Principle Analysis and SQL Performance Optimization, The Art of Database Transaction Processing: Transaction Management and Concurrency Control, and Big Data Management. He has applied for and been granted 50+ patents and has several papers at VLDB and other conferences. He has participated in national 863 major projects and projects of the "core, high-end, basic" national program, the Ministry of Industry and Information Technology, and the Ministry of Science and Technology, and won a first prize for Beijing Science and Technology Progress.