In Internet projects, as the business grows and the data volume increases, the pressure on the database keeps increasing.

We can optimize in various ways, such as the caching schemes and SQL optimization mentioned in the previous article. Beyond those, here are two general methods for database optimization: “data read and write separation” and “database Sharding”. Both are very common solutions in medium and large Internet projects.

Let’s take a closer look.

I. From read/write separation to CQRS

  


(Photo by Aliyun)

Since most Internet business scenarios involve far more reads than writes, separating database reads from writes is a simple and efficient solution.

Read/write separation simply means routing read operations and write operations to different databases, which reduces the pressure on any single database.

For example, in the figure above, the database will have a “master instance” that provides writes (and occasionally reads), and in addition to the “master instance” there will be multiple “slave instances” (shown in the figure as read-only instances) that provide reads only.

There is now more than one database. How do you keep the data consistent across them?

In fact, common databases have their own synchronization functions. For example, MySQL has a master-slave replication function, based on the binary log, that automatically synchronizes data from the master database to the slave database. Write operations performed on the master are recorded in its binary log, which is shipped asynchronously to the slave. The slave replays the log automatically, so its data converges to that of the master; because replication is asynchronous, a slave can briefly lag behind.

In addition to MySQL, commercial databases such as Oracle have similar functions, and there are also many mature open source third-party data synchronization tools.
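The log-shipping idea can be illustrated with a toy model. The Python sketch below is not MySQL's actual binary log format or replication protocol; the `Master` and `Replica` classes and the `catch_up` method are hypothetical names, chosen only to show why an asynchronous slave can briefly return stale data.

```python
class Master:
    """Toy master: applies writes and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered change events, here simply (key, value)

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class Replica:
    """Toy slave: replays the master's log from its last known position."""

    def __init__(self):
        self.data = {}
        self.position = 0  # index of the next log event to apply

    def catch_up(self, master):
        # Apply every event recorded since the last replay, in order.
        for key, value in master.binlog[self.position:]:
            self.data[key] = value
        self.position = len(master.binlog)


master = Master()
replica = Replica()

master.write("user:1", "alice")
stale = replica.data.get("user:1")  # the event has not been replayed yet

replica.catch_up(master)
fresh = replica.data.get("user:1")  # visible once the log is replayed
```

Between the write and the call to `catch_up`, a read served by the replica misses the new row. That window is the replication lag that read/write separation has to tolerate.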

With data synchronization between the master and slave instances solved, the remaining question is how to implement the routing rule in a project, so that write requests go to the master and read requests go to a slave.

There are two conventional ways:

Using application code

This method relies on developers, while coding, calling different data sources depending on whether an operation reads or writes. For example, in the data access layer (DAO layer), separate reads and writes into two methods (functions), and then specify a different database for each method.

But this approach is somewhat hardcoded. It is an extra concern for developers, adds coding cost, and is easy to forget.
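A minimal sketch of the DAO approach, in Python with the standard `sqlite3` module. The `UserDao` class and the `users` table are hypothetical. Two separate in-memory SQLite databases stand in for the master and a read-only instance, and a hand-written copy step stands in for real replication.

```python
import sqlite3


def make_db():
    # Each call creates an independent in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


class UserDao:
    """DAO that hardcodes which data source each method uses."""

    def __init__(self, master, replica):
        self.master = master    # used by write methods
        self.replica = replica  # used by read methods

    def add_user(self, user_id, name):
        # Write path: always goes to the master instance.
        with self.master:  # commits on success, rolls back on error
            self.master.execute(
                "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )

    def get_user_name(self, user_id):
        # Read path: always goes to the read-only instance.
        row = self.replica.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


master = make_db()
replica = make_db()
dao = UserDao(master, replica)

dao.add_user(1, "alice")
before_sync = dao.get_user_name(1)  # the replica has not received the row

# Stand-in for replication: copy the master's rows to the replica.
rows = master.execute("SELECT id, name FROM users").fetchall()
with replica:
    replica.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)

after_sync = dao.get_user_name(1)
```

Every new DAO method has to make the same master-or-replica choice by hand, which is the coding cost described above.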

Using middleware

MySQL Proxy is a middleware provided by MySQL that implements read/write separation. However, this component is not widely used in practice. We can choose other open source components instead, such as MyCat or ProxySQL; the general principle is similar and the pattern is easy to understand from this diagram.
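The core decision such a middleware makes can be sketched in a few lines of Python. This is only an illustration of the general principle, not how MySQL Proxy, MyCat, or ProxySQL are implemented or configured. The `ReadWriteRouter` class and the instance names are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Picks a target instance for each SQL statement."""

    def __init__(self, master, replicas):
        self.master = master
        # Spread reads across the replicas in round-robin order.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        words = sql.split()
        keyword = words[0].upper() if words else ""
        normalized = " ".join(words).upper()
        # SELECT ... FOR UPDATE takes locks, so it is treated as a write.
        locking_read = "FOR UPDATE" in normalized
        if keyword == "SELECT" and not locking_read:
            return next(self._replicas)
        # Everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the master.
        return self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT name FROM users WHERE id = 1"),
    router.route("select * from orders"),
    router.route("UPDATE users SET name = 'bob' WHERE id = 1"),
    router.route("SELECT * FROM stock WHERE id = 7 FOR UPDATE"),
]
```

Because the routing lives in one place between the application and the databases, application code no longer has to choose a data source for every method.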


OK, basic read/write separation is done. It is practical, but it is not particularly sophisticated.

If you want something more sophisticated, let’s talk about another separation concept: “CQRS”.

CQRS: Command Query Responsibility Segregation

Separation of responsibility for commands (add, delete, modify) and queries

  


Let’s take a look at the picture first, from which we can easily understand CQRS.

CQRS emphasizes the separation of Query (read) and Command (write), which separates responsibilities clearly at the business level. Command is mainly responsible for executing business logic, while Query is responsible for querying and displaying data. The two sides can also be backed by different data sources; one may be a relational database while the other is not. The Query data source can store data directly in the shape of the domain model rather than the data model, so query results can be displayed immediately without conversion, which improves query efficiency.
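A minimal sketch of this split, in Python. All names here (`PlaceOrder`, `OrderCommandHandler`, `OrderQueryService`, and the dictionary-based stores) are hypothetical. The read model is updated synchronously for brevity; this is an assumption of the sketch, and real systems often update it asynchronously.

```python
from dataclasses import dataclass


@dataclass
class PlaceOrder:
    """Command: a request to change state."""
    order_id: int
    customer: str
    items: list  # list of (name, unit_price, quantity)


class OrderCommandHandler:
    """Command side: runs business logic and writes the normalized model."""

    def __init__(self, write_store, read_store):
        self.write_store = write_store
        self.read_store = read_store

    def handle(self, cmd):
        if not cmd.items:
            raise ValueError("an order needs at least one item")
        # Write model: data kept in table-like, normalized form.
        self.write_store["orders"][cmd.order_id] = {"customer": cmd.customer}
        self.write_store["order_items"][cmd.order_id] = list(cmd.items)
        # Read model: stored in the exact shape the display needs.
        self.read_store[cmd.order_id] = {
            "order_id": cmd.order_id,
            "customer": cmd.customer,
            "item_count": len(cmd.items),
            "total": sum(price * qty for _, price, qty in cmd.items),
        }


class OrderQueryService:
    """Query side: returns ready-to-display data and has no business logic."""

    def __init__(self, read_store):
        self.read_store = read_store

    def order_summary(self, order_id):
        return self.read_store.get(order_id)


write_store = {"orders": {}, "order_items": {}}
read_store = {}
commands = OrderCommandHandler(write_store, read_store)
queries = OrderQueryService(read_store)

commands.handle(PlaceOrder(1, "alice", [("sword", 30, 1), ("potion", 5, 4)]))
summary = queries.order_summary(1)
```

The query side never joins `orders` with `order_items` or computes totals; it returns the stored summary as is.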

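The term CQRS was coined by Greg Young, building on Bertrand Meyer’s Command Query Separation (CQS) principle, and it was popularized in part by Martin Fowler’s writing. For a more in-depth look at CQRS, check out Martin Fowler’s article on the topic.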

II. Sharding (Database and Table)

With that covered, let’s talk about database Sharding.

As the data in the database grows, the performance of queries against a single table can no longer meet service requirements. In this case, the large table needs to be split into several smaller tables, each holding a different part of the data. This spreads the query pressure and improves processing efficiency.

However, when the number of tables increases and all data is still stored in one database, network IO and file IO are concentrated on that database and may exceed the capacity of a single server. CPU, memory, file IO, and network IO become the system bottleneck, and the required QPS/TPS exceeds what a single database instance can handle. At this point we need to shard the database itself.

Because the ideas behind splitting tables and splitting databases are similar, the technical solutions below cover both.

In fact, “splitting databases and tables” is just the popular, easy-to-understand name; the more accurate term is data sharding.

There are two modes of data sharding:

Vertical split

Horizontal split

The scenarios the two modes apply to are different:

Vertical split refers to splitting by service module. Simply put, it means putting the fields/tables of closely related business modules together in the same database or server, and separating the fields/tables of different services into different databases or servers. For example, in a game system, basic player information can be separated from item and guild information.

Figure example:


(Image source: Internet)
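In code, a vertical split often comes down to a mapping from tables to the database that owns them. The Python sketch below uses the game example; the table and database names are hypothetical.

```python
# Tables of closely related modules share a database.
TABLE_TO_DATABASE = {
    "player_basic": "player_db",
    "player_login": "player_db",
    "item": "item_guild_db",
    "guild": "item_guild_db",
}


def database_for(table):
    """Return the database that owns the given table."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


def databases_for(tables):
    """Return every database a query touching these tables must visit."""
    return {database_for(table) for table in tables}


same_module = databases_for(["player_basic", "player_login"])
cross_module = databases_for(["player_basic", "guild"])
```

A query that stays within one module touches a single database. A query that spans modules touches more than one, so the join has to happen in the application rather than in SQL.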

Horizontal split refers to splitting purely according to some rule or format of the data. For example: split by a hash of the data’s unique ID, split by the date of the data, split by some range, and so on. Note that with a horizontal split, as the data changes dynamically, the number of shards may also need to be adjusted dynamically. In addition, a horizontal split does not take business characteristics into account, so summary queries and transactions that span shards are more troublesome to handle.
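The three rules mentioned above can be sketched as small routing functions in Python. The function names, the `orders` table prefix, and the shard counts are hypothetical. `zlib.crc32` is used here only as one example of a hash that is stable across processes.

```python
import zlib
from datetime import date


def shard_by_hash(record_id, shard_count):
    # crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(str(record_id).encode("utf-8")) % shard_count


def shard_by_month(day, table="orders"):
    # One table per calendar month, e.g. orders_2024_05.
    return f"{table}_{day.year:04d}_{day.month:02d}"


def shard_by_range(record_id, rows_per_shard):
    # IDs 0..rows_per_shard-1 go to shard 0, the next block to shard 1, ...
    return record_id // rows_per_shard


hash_shard = shard_by_hash("player-42", 4)
month_table = shard_by_month(date(2024, 5, 17))
range_shard = shard_by_range(2_500_000, 1_000_000)

# Changing the shard count under simple modulo hashing relocates keys:
# count how many of 1000 IDs land on a different shard with 5 shards
# than they did with 4.
moved = sum(
    1 for i in range(1000) if shard_by_hash(i, 4) != shard_by_hash(i, 5)
)
```

The `moved` count shows why adjusting the number of shards is costly under simple modulo hashing: existing rows have to be migrated to their new shard.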

Figure example:


In addition, in practical applications, the two split modes are generally used together for better results.

That covers the “data read and write separation” and “database Sharding” methods of database performance optimization. Comments and discussion are welcome.