Through the core Java learning notes, we can see in this article, find out what distributed SQL is, and look at the database architecture, business benefits, etc. Code a lot of professional knowledge, share for your reference.

What is distributed SQL?

SQL has been the de facto language for relational databases (aka RDBMSS) for nearly 40 years. Therefore, relational databases are also called SQL databases. However, from an architectural point of view, the original SQL databases such as Oracle, PostgreSQL, and MySQL are monolithic. They cannot automatically allocate data and queries across multiple instances. The advent of NewSQL databases makes SQL scalable. But they also offered painful compromises of their own.

Microservices-based applications have been on the rise since 2015, when Docker containers and Kubernetes choreography were introduced to create flexible, composable infrastructures. The cloud-native principles of built-in scaling, elasticity, and geographic distribution are at the heart of this architectural shift. The time is ripe for the introduction of a new type of database called “distributed SQL.” The defining characteristic of a distributed SQL database is that the entire database cluster (regardless of the number of nodes in it) is treated as a single logical SQL database to the application.

Database architecture

Distributed SQL databases have a three-tier architecture.





1. SQL API

As the name implies, distributed SQL databases must have SQL apis so that applications can model relational data and execute queries that involve those relationships. Typical data modeling constructs that are unique to SQL databases are indexes, foreign key constraints, JOIN queries, and multi-row ACID transactions.

2. Distributed query execution

Queries should be automatically distributed across multiple nodes in the cluster so that no single node becomes a bottleneck for query processing. Any node in the cluster should accept the incoming query, and that node should then request other nodes to process a portion of its query, including the amount of data transferred between nodes on the network, in a way that minimizes its processing wait. The original node receiving the request should then send the summary result back to the client application.

3. Distributed data storage

Data containing indexes should be automatically distributed (also known as sharding) among multiple nodes of the cluster so that no single node becomes a bottleneck to ensure high performance and high availability. In addition, database clusters should support highly consistent replication and multi-row (also known as distributed) ACID transactions to ensure a single logical database concept.

Highly consistent replication

Supporting a powerful SQL API layer essentially requires that the underlying storage layer be built on strong consistent replication across database cluster nodes. This means that writes to the database are committed synchronously on multiple nodes to ensure availability during failures. The read should serve the last committed write or error. This property is often called linearizable. According to the famous CAP theorem, distributed SQL databases are classified into consistency and partition tolerance (CP).

Distributed ACID transaction

The database storage tier should also support distributed ACID transactions, where transactions are coordinated across multiple rows on multiple nodes. Typically, this requires the use of the two-phase commit (2PC) protocol. The isolation level represents the I in ACID, which indicates how strict the database is with concurrent data access. Distributed SQL databases are expected to support serialization as the most stringent isolation level and other weaker isolation levels such as Snapshot.

Business interests

The above architecture brings four key advantages.



Developer agility using SQL and transactions

Even as NoSQL databases such as Amazon DynamoDB, MongoDB, and FaunaDB begin to make certain operations transactional, application developers continue to make SQL databases intimate.

One reason for this affinity is the inherent power of SQL as a data modeling language for easily modeling relationships and multi-row operations. For example, SQL goes beyond traditional key-value NoSQL by allowing multi-row transactions both explicitly (using BEGIN and END TRANSACTION syntax) and implicitly (using secondary indexes, foreign keys, and JOIN queries).

In addition, developers like the ease of using SQL to model (and store) data only once, and then simply change the JOIN to change the query as the business requirements change.

2. Super elastic with local failover/recovery

Techniques such as distributed consensus replication per shard in a distributed SQL database ensure that each shard, rather than each instance, remains highly available in the event of a failure.

Infrastructure failures always affect only a subset of the data (only those pieces whose leaders are fragmented), not the entire cluster. And, since the remaining shard copies can automatically elect a new leader in seconds, the cluster will repair itself, thus showing self-healing characteristics in the event of failure. The application is transparent to these cluster configuration changes and can continue to function normally without interruption or deceleration.

3. On-demand scaling with horizontal write scalability


How to distribute
Data fragmentation is performed in the SQL databaseShows how automatic data sharding is generally implemented in a distributed SQL database. Sharding is automatically balanced among all available nodes as new nodes are added or existing nodes are removed.

Microservices that require transactional applications to have write scalability can now rely directly on the database without having to add a new infrastructure, such as in-memory caching (offloading read requests from the database so they can be retained to process write requests) or

NoSQL database. (Extensible write but abandon ACID guarantee).

4. Low user latency with geographic data distribution

As”
Nine techniques for building cloud-native, geographically distributed SQL applications with low latencyAs highlighted, distributed SQL databases can provide a wide range of techniques for building geographically distributed applications that not only help to automatically tolerate regional failures, but also reduce fault tolerance. Create delays for end users by bringing the data closer to its local area.