Reading time: about 3 minutes. Bookmark it for later.

As the company's business grew rapidly, the amount of data in the database soared and access slowed down, so optimization became urgent.

What is the problem? A relational database easily becomes the system bottleneck, because the storage capacity, number of connections, and processing capacity of a single machine are all limited. When a single table reaches about 10 million rows or 100 GB, and queries span many dimensions, many operations still degrade severely even after adding replicas (slave databases) and optimizing indexes.

Plan 1:

Upgrade the server hardware, for example by adding storage capacity or CPU, to increase data processing capability. This is costly, and the gain is limited if the bottleneck is MySQL itself.

Plan 2:

Spread the data across different databases so that each database holds less data. This relieves the performance problems of a single database and improves overall database performance.

As shown in the figure below, the e-commerce database is split into several independent databases, and large tables are split into several smaller tables. Splitting the database this way resolves its performance problems. It is like putting eggs in multiple baskets.

Database and table splitting (sharding) addresses the performance degradation caused by large data volumes. The original database is split into several independent databases, and large tables are split into several smaller tables, so that each database and each table holds less data and database performance improves.

Sharding has two parts: splitting databases and splitting tables. In production it usually takes four forms: vertical database splitting, horizontal database splitting, vertical table splitting, and horizontal table splitting.

@[toc]

Vertical table splitting

Definition: a table is divided into multiple tables by field (column), and each table stores a subset of the original fields.

The improvements it brings are:

  1. It avoids I/O contention and reduces the chance of table locking.
  2. It lets hot data be operated on efficiently: operations on product information are no longer dragged down by the slow product description field.

Why is I/O on large fields inefficient? First, the data itself is large, so it takes longer to read. Second, large fields span pages. The page is the database's storage unit, and many search and positioning operations work page by page. The more rows a single page holds, the better the overall performance of the database. Large fields take up a lot of space, so fewer rows fit in a single page and I/O efficiency drops. Third, the database loads data into memory in units of rows. If the fields in a table are short and frequently accessed, memory can hold more rows, the hit ratio is higher, and disk I/O is reduced, which improves database performance.
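The effect of row size on rows per page can be estimated with simple arithmetic. The sketch below assumes a 16 KB page (the InnoDB default) and two hypothetical row sizes. It ignores page and row overhead, and real engines may store very large values outside the page, so the numbers are only illustrative.

```python
PAGE_SIZE_BYTES = 16 * 1024  # assumed page size: 16 KB


def rows_per_page(row_size_bytes: int, page_size_bytes: int = PAGE_SIZE_BYTES) -> int:
    """Rough upper bound on rows per page, ignoring page and row overhead."""
    return page_size_bytes // row_size_bytes


def pages_needed(row_count: int, row_size_bytes: int) -> int:
    """Number of pages needed to hold row_count rows, rounded up."""
    per_page = rows_per_page(row_size_bytes)
    return -(-row_count // per_page)  # ceiling division


# Hypothetical row sizes: 200 bytes without the large field, 4,000 bytes with it.
narrow_pages = pages_needed(1_000_000, 200)
wide_pages = pages_needed(1_000_000, 4000)
print(narrow_pages, wide_pages)
```

Under these assumptions, scanning one million narrow rows touches far fewer pages than scanning the same number of wide rows, which is the gain vertical table splitting aims for.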

Generally, the data items of a business entity are accessed at different frequencies, and some of them may be BLOB or TEXT fields that occupy a lot of storage space, such as the product description in the example above. Therefore, if a table holds a large amount of data, you can split it by field and store the hot and cold fields in different databases. These databases can be placed on different storage devices to avoid I/O contention. The performance gains from vertical splitting come mainly from efficient operations on hot data and reduced disk contention.

Usually we split vertically according to the following principles:

  1. Put rarely used fields in a separate table.
  2. Move large fields such as TEXT and BLOB into an attached table.
  3. Put columns that are frequently queried together in the same table.
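These principles can be sketched with Python's built-in `sqlite3` module. The table names `product_info` and `product_desc`, the columns, and the helper functions are hypothetical, chosen to match the product example. This is a minimal illustration, not a production schema.

```python
import sqlite3


def build_split_tables() -> sqlite3.Connection:
    """Create the two tables that replace one wide product table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hot, frequently read columns stay in the main table.
        CREATE TABLE product_info (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            price      REAL NOT NULL
        );
        -- The large, rarely read column moves to an attached table
        -- that shares the same primary key.
        CREATE TABLE product_desc (
            product_id  INTEGER PRIMARY KEY REFERENCES product_info(product_id),
            description TEXT
        );
        """
    )
    return conn


def add_product(conn, product_id, name, price, description):
    """Write one logical product as two rows that share the same key."""
    with conn:  # commit both inserts together, or roll both back
        conn.execute(
            "INSERT INTO product_info (product_id, name, price) VALUES (?, ?, ?)",
            (product_id, name, price),
        )
        conn.execute(
            "INSERT INTO product_desc (product_id, description) VALUES (?, ?)",
            (product_id, description),
        )


def list_products(conn):
    """List pages read only the narrow table and never touch descriptions."""
    return conn.execute(
        "SELECT product_id, name, price FROM product_info ORDER BY product_id"
    ).fetchall()


def get_description(conn, product_id):
    """Detail pages fetch the large field by key only when it is needed."""
    row = conn.execute(
        "SELECT description FROM product_desc WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else None
```

The listing query never reads the description column, so the hot path works on short rows, and the large field is loaded only on demand.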

Vertical database splitting

Vertical table splitting improves performance to some extent, but it may still fall short of requirements, and disk space may be running out, because the data is still confined to a single server. Splitting tables vertically within one database only solves the problem of a single table holding too much data. It does not distribute the tables across different servers, so every table still competes for the CPU, memory, network I/O, and disk of the same physical machine.

Vertical database splitting means classifying tables by business and distributing them across different databases. Each database can be placed on a different server. The core idea is a dedicated database for each business.

The improvements it brings are:

  1. It removes coupling at the business level and makes business boundaries clear.
  2. It allows tiered management, maintenance, monitoring, and scaling of the data of different businesses.
  3. In high-concurrency scenarios, it relieves I/O and database connection limits to some extent and eases the hardware resource bottleneck of a single machine.

By classifying tables by business and distributing them across different databases, vertical database splitting lets those databases be deployed on different servers, so that the load is shared by multiple servers. However, it still does not solve the problem of a single table holding too much data.
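A minimal sketch of the idea: each table belongs to exactly one business database, and the application looks up where a table lives before connecting. All table names, database names, hosts, and the `dsn_for_table` helper are hypothetical.

```python
# Hypothetical mapping from table name to the business database that owns it.
TABLE_TO_DATABASE = {
    "product_info": "product_db",
    "product_desc": "product_db",
    "order_main": "order_db",
    "order_item": "order_db",
    "user_account": "user_db",
}

# Hypothetical connection strings; each business database can sit on its own server.
DATABASE_TO_DSN = {
    "product_db": "mysql://db-host-1:3306/product_db",
    "order_db": "mysql://db-host-2:3306/order_db",
    "user_db": "mysql://db-host-3:3306/user_db",
}


def dsn_for_table(table: str) -> str:
    """Return the connection string of the database that owns the table."""
    try:
        database = TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None
    return DATABASE_TO_DSN[database]
```

Because the mapping is by table, a query that needs `order_main` and `user_account` together can no longer be a single SQL join. That is the cross-database complexity this kind of split introduces.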

Horizontal database splitting

Horizontal database splitting means distributing the rows of the same table across different databases according to certain rules. Each database can be placed on a different server.

Vertical database splitting puts different tables into different databases. Horizontal splitting divides the rows of a table and leaves the table structure unchanged.

The improvements brought by horizontal database splitting are:

  1. It removes the performance bottleneck that large data volumes and high concurrency cause in a single database.
  2. It improves the stability and availability of the system.

Stability improves because I/O conflicts and lock contention are reduced. Availability improves because when one database fails, the others keep working, so the system remains partially available.

When an application is hard to split vertically at a finer granularity, or the tables still hold a huge number of rows after vertical splitting, a single database hits read/write and storage bottlenecks. At that point horizontal splitting is needed, and it can usually resolve the storage and performance bottlenecks of a single database. However, because the same table is distributed across different databases, data operations need additional routing work, which greatly increases system complexity.
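The routing work mentioned above can be sketched as follows. The rule used here, taking the order ID modulo the number of databases, is only one common choice and is an assumption, as are the `order_db_N` names and both helper functions.

```python
DATABASE_COUNT = 2  # assumed number of order databases


def database_for_order(order_id: int, database_count: int = DATABASE_COUNT) -> str:
    """Pick the database that holds the row for this order ID."""
    if order_id < 0:
        raise ValueError("order_id must be non-negative")
    return f"order_db_{order_id % database_count}"


def group_by_database(order_ids):
    """Group IDs so that a batch lookup issues one query per database."""
    groups = {}
    for order_id in order_ids:
        groups.setdefault(database_for_order(order_id), []).append(order_id)
    return groups
```

Every read and write has to go through a rule like this, and a query that does not include the routing key has to be sent to all databases and merged in the application. This is where the extra complexity comes from.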


Horizontal table splitting

Horizontal table splitting means distributing the rows of the same table across multiple tables in the same database according to certain rules.

The improvements it brings are:

  1. Optimize performance problems caused by large data volumes in a single table
  2. Avoid IO contention and reduce table locking

Horizontal table splitting within a database solves the problem of a single table holding too much data. Each small table contains only part of the data, so the amount of data per table shrinks and retrieval performance improves.
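As a rough illustration, the sketch below uses Python's built-in `sqlite3` module to split one logical orders table into several physical tables in the same database. The table count, the `orders_N` naming, and the modulo rule are assumptions made for the example.

```python
import sqlite3

TABLE_COUNT = 4  # assumed number of physical tables


def table_for_order(order_id: int) -> str:
    """Map an order ID to the physical table that stores it."""
    return f"orders_{order_id % TABLE_COUNT}"


def create_tables(conn: sqlite3.Connection) -> None:
    """Create orders_0 .. orders_3, all with the same structure."""
    for index in range(TABLE_COUNT):
        conn.execute(
            f"CREATE TABLE orders_{index} ("
            "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
        )


def insert_order(conn: sqlite3.Connection, order_id: int, amount: float) -> None:
    """Insert the row into the table chosen by the routing rule."""
    conn.execute(
        f"INSERT INTO {table_for_order(order_id)} (order_id, amount) VALUES (?, ?)",
        (order_id, amount),
    )


def find_order(conn: sqlite3.Connection, order_id: int):
    """Look up one order; only a single small table is searched."""
    return conn.execute(
        f"SELECT order_id, amount FROM {table_for_order(order_id)} WHERE order_id = ?",
        (order_id,),
    ).fetchone()
```

The table name is computed from an integer inside the code, never from user input, so building it with string formatting is safe here. The row values still go through bound parameters.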


Conclusion

Vertical table splitting: a wide table can be split into multiple tables based on access frequency and on whether fields are large. This makes the business clearer and improves performance. After splitting, try to avoid join queries across the split tables at the business level, otherwise the loss outweighs the gain.

Vertical database splitting: multiple tables can be classified by how tightly they are coupled by business and placed in different databases. These databases can be distributed across different servers so that the access load is shared by multiple servers, which greatly improves performance. It also makes the business boundaries of the overall architecture clearer, and each business database can have an optimization plan tailored to its own situation. However, it has to deal with all the complexity that cross-database access brings.

Horizontal database splitting: the data of one table can be split by row into multiple databases, each holding only part of the table's data. These databases can be distributed across different servers so that the access load is shared by multiple servers, which greatly improves performance. It has to deal not only with all the complexity that cross-database access brings, but also with data routing (which I'll cover later).

Horizontal table splitting: the data of one table can be split by row into multiple tables in the same database, each holding only part of the data. This improves performance slightly, but it is only a supplementary optimization to horizontal database splitting.

Generally speaking, in the system design stage the vertical database splitting and vertical table splitting schemes should be decided according to how tightly the business is coupled. When data volume and access pressure are not particularly large, consider caching, read/write separation, indexing, and similar techniques first. If the data volume is very large and keeps growing, then consider horizontal database splitting and horizontal table splitting.


JavaPub reference tutorial: https://blog.csdn.net/weixin_44062339/article/details/100491744

Afterword

It is now March 2021. As you may know, I have recently been writing a series of JavaPub interview questions. Some of them have already been published, and more will follow.

Preface

No matter how sophisticated a framework is, it needs a solid foundation to be useful, and high-frequency interview questions are precisely the high-frequency practical points of that foundation.

Intended readers

Java learners and enthusiasts, technical people with work experience, people preparing for interviews, and so on.

Reading advice

This is a series of tutorials covering Java basics, JVM, containers, multithreading, reflection, exceptions, networking, object copying, JavaWeb, design patterns, Spring/Spring MVC, Spring Boot/Spring Cloud, MyBatis/Hibernate, Kafka, RocketMQ, ZooKeeper, MySQL, Redis, Elasticsearch, and Lucene.

WeChat search: JavaPub, to read the full series of interview question tutorials.